Are your Docker containers crashing in production without warning? In most cases, the issue isn’t Docker itself but a handful of recurring configuration mistakes. Here are the five most expensive ones, and how to fix them today.
Table of contents
- Why Docker fails in production
- Mistake #1 – Running your containers as root
- Mistake #2 – Building bloated images without multi-stage
- Mistake #3 – Storing secrets and data inside the image
- Mistake #4 – Skipping health checks and resource limits
- Mistake #5 – Flying blind with no logs or monitoring
- The Docker production checklist
- Conclusion
- Docker in production FAQ

Docker remains the most widely deployed container engine in the world. These five mistakes are among the most frequent root causes of Docker incidents seen in production by DevOps teams in French-speaking Switzerland. They are not Docker bugs. Instead, they are configuration shortcuts taken in development that come back to haunt you the day traffic spikes or an attacker comes knocking.
In production, a container that keeps restarting, an image that ships 2 GB of unused weight, or a secret committed inside a Dockerfile costs time, money, and sometimes customer trust. The good news is that all these mistakes are preventable with the right method and reflexes. The catch is knowing which ones to look for first.
Why Docker fails in production

First, a container is not a virtual machine. It is an isolated process that shares the host kernel. Therefore, any misconfiguration can ripple straight back to the host. Furthermore, unlike development where a quick restart erases the issue, production is unforgiving of the smallest oversight.
According to a 2024 Snyk study, more than 60% of scanned Docker images contain at least one critical vulnerability. In short, the gap between a robust Docker setup and a fragile one usually comes down to five key decisions.
Three symptoms that should set off alarms
- Containers stuck in a restart loop: typically a missing health check, an unset environment variable, or a permission issue.
- Memory creeping up for no reason: missing --memory and --cpus limits, a memory leak, or unrotated logs.
- Latency exploding under load: oversized image at startup, a database sharing the default Docker network, or no orchestration in place.
Of course, these symptoms are only the visible tip of the iceberg. However, in production they are exactly what wakes up your team at 3 AM. Now let’s break down the five mistakes that cause them.
Mistake #1 – Running your containers as root

By default, a Docker container runs as root. That’s convenient in development, but it is a serious security flaw in production. If an attacker compromises your application, they immediately gain root privileges inside the container. Moreover, in some configurations they can break out toward the host.
The typical attack scenario
A developer publishes an image based on node:18 without setting a user, so the application runs as root. In production, the Node app exposes a command injection flaw, and the attacker uses it to run arbitrary commands inside the container. Because they are root, they can modify files, install tools, or abuse a poorly configured volume to reach the host system.
How to fix the mistake
Fortunately, the fix is straightforward. Add a dedicated user in your Dockerfile and switch to it before the entrypoint command.
- Create an application user: RUN addgroup --system app && adduser --system --ingroup app app
- Switch to it: USER app right before CMD or ENTRYPOINT.
- Verify at runtime: docker exec my-container whoami must return something other than root.
In addition, use the --read-only flag to make the container's root filesystem immutable, and declare the few paths that must stay writable as tmpfs mounts or volumes. As a result, even an attacker with root inside the container cannot persist files in the image filesystem. This is a recommendation of the CIS Docker Benchmark, the reference for container security.
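The steps above can be sketched as a small shell script that writes the Dockerfile and wraps the two runtime commands. This is a minimal sketch under assumptions, not a reference setup: the image tag my-app, the server.js entry point, and the helper names run_hardened and check_user are placeholders, and the adduser syntax is the Debian one used by node:18.

```shell
set -eu

# Scratch directory so the sketch never overwrites a real Dockerfile.
workdir="$(mktemp -d)"

# Dockerfile sketch for a Debian-based Node image.
# "server.js" and the "app" user/group are placeholder names.
cat > "$workdir/Dockerfile" <<'EOF'
FROM node:18
WORKDIR /app
COPY . .
# Create an unprivileged system user and group (Debian adduser syntax).
RUN addgroup --system app && adduser --system --ingroup app app
# Every later instruction, and the running process, now uses "app".
USER app
CMD ["node", "server.js"]
EOF

# Start the image with an immutable root filesystem; /tmp stays writable
# through an in-memory tmpfs. Requires a host with Docker installed.
run_hardened() {
  docker run -d --name my-container --read-only --tmpfs /tmp "$1"
}

# Print the effective user inside the running container.
check_user() {
  docker exec my-container whoami
}
```

After building the image from that directory, run_hardened my-app starts the container and check_user should print the unprivileged user rather than root.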
Mistake #2 – Building bloated images without multi-stage

A 2 GB Docker image for an application that ships 50 MB of code is the most visible and most common mistake. It slows every deployment, inflates your storage bill, and increases the attack surface. Specifically, the more packages an image carries, the more potential vulnerabilities it ships.
The three main causes of overweight images
- Bloated base image: ubuntu:latest weighs 80 MB, alpine:3.19 only 7 MB, and distroless a tiny 2 MB.
- Build tools left in the final image: compilers, package managers, temporary files, and source maps.
- Unoptimized layers: each RUN creates a layer, and a poorly written RUN apt-get install can double the image size.
The fix: multi-stage builds
The multi-stage pattern lets you compile inside a heavy image and then copy only the final binary into a minimal runtime image. In other words, you keep the comfort of a full build while shipping a featherweight runtime.
For example, here is a typical Dockerfile for a Go application:
- Stage 1 (builder): FROM golang:1.22 AS builder, copy sources, run go build.
- Stage 2 (runtime): FROM gcr.io/distroless/static, then COPY --from=builder /app/bin /app.
- Result: a final image of 15 MB instead of 800 MB.
Consequently, deployments drop from 45 seconds to 3 seconds, and the attack surface shrinks dramatically. To go further, read the official Docker building best practices, which detail the advanced patterns.
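The two stages described above could look like the following sketch. It assumes a Go module at the root of the build context with a main package that builds into a static binary; the /src working directory, the build_image helper, and the image tag are illustrative choices rather than part of the original recipe.

```shell
set -eu

# Scratch directory holding the generated Dockerfile.
workdir="$(mktemp -d)"

cat > "$workdir/Dockerfile" <<'EOF'
# Stage 1: full Go toolchain, used only to compile.
FROM golang:1.22 AS builder
WORKDIR /src
COPY . .
# CGO is disabled so the binary is static and runs on distroless/static.
RUN mkdir -p /app && CGO_ENABLED=0 go build -o /app/bin .

# Stage 2: minimal runtime image without shell or package manager.
FROM gcr.io/distroless/static
# Only the compiled binary crosses the stage boundary.
COPY --from=builder /app/bin /app
ENTRYPOINT ["/app"]
EOF

# Build the image from the generated Dockerfile. Requires Docker.
build_image() {
  docker build -t "$1" "$workdir"
}
```

In a real project the Dockerfile sits next to go.mod; build_image my-go-app followed by docker image ls shows the size of the resulting image, which depends on the application.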
Mistake #3 – Storing secrets and data inside the image

Hardcoding a database password into a Dockerfile, or passing it as an environment variable at startup, looks harmless. In fact, it is one of the most common leaks Snyk identifies every year in public images. Furthermore, once a secret is committed inside a layer, it stays there even if you delete it later.
Why environment variables are not enough
Variables passed via docker run -e DB_PASS=... or a .env file are visible in many places. For instance, docker inspect exposes them, and any process inside the container can read them in /proc/1/environ. In short, this is not a secret, it is plaintext configuration.
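To see the exposure for yourself, the two read paths mentioned above can be wrapped in small helpers. This is an illustrative sketch: inspect_env and proc_env are hypothetical names, and the second helper assumes the container image ships sh and tr.

```shell
set -eu

# List the environment Docker stored for a container, one variable per line.
inspect_env() {
  docker inspect --format '{{range .Config.Env}}{{println .}}{{end}}' "$1"
}

# Read the environment of PID 1 from inside the container.
# /proc/1/environ is NUL-separated, so tr turns it into lines.
proc_env() {
  docker exec "$1" sh -c "tr '\0' '\n' < /proc/1/environ"
}
```

On a container started with docker run -e DB_PASS=..., both inspect_env my-container and proc_env my-container print the DB_PASS line in clear text.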
The three solutions to favor
- Docker Secrets (Swarm): injects secrets in memory under /run/secrets/, never on disk.
- HashiCorp Vault or AWS Secrets Manager: secrets fetched at runtime via an authenticated API, with automatic rotation.
- BuildKit secret mounts: RUN --mount=type=secret during build, without persisting the value into the final layer.
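As an illustration of the BuildKit option, the sketch below mounts a secret for a single build step. It assumes a Docker version where BuildKit is the active builder; the secret id db_pass, the alpine:3.19 base, the build_with_secret helper, and the my-app tag are placeholders.

```shell
set -eu

# Scratch directory holding the generated Dockerfile.
workdir="$(mktemp -d)"

cat > "$workdir/Dockerfile" <<'EOF'
# syntax=docker/dockerfile:1
FROM alpine:3.19
# The secret is mounted at /run/secrets/db_pass for this RUN step only
# and is not written to any image layer.
RUN --mount=type=secret,id=db_pass \
    DB_PASS="$(cat /run/secrets/db_pass)" && \
    echo "secret length: ${#DB_PASS}"
EOF

# $1: path to a local file holding the secret, kept out of the build context.
build_with_secret() {
  docker build --secret id=db_pass,src="$1" -t my-app "$workdir"
}
```

A real build step would pass the value to the tool that needs it (a private package registry login, for example) instead of printing its length.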
Moreover, never commit a .env file to Git. Configure a pre-commit hook with Gitleaks to block any leak. This is standard practice among the Swiss DevOps teams running critical CI/CD pipelines.
The special case of user data
In addition, never store data inside the container itself. Instead, always rely on persistent volumes. However, watch out for poorly configured mounts. A bind mount that exposes a sensitive host path lets the container overwrite host files. Worse, it may enable a host escape. Always review your mounts with docker volume inspect and docker inspect before going to production.
Mistake #4 – Skipping health checks and resource limits

A running container is not a working container. That nuance is one many teams learn the hard way. Without a HEALTHCHECK directive in your Dockerfile, Docker has no idea whether your application actually answers requests. Consequently, the orchestrator keeps routing traffic to a zombie container.
Anatomy of a clean health check
For example, here is a typical health check for an HTTP API:
HEALTHCHECK --interval=30s --timeout=5s --start-period=10s --retries=3 CMD curl -fsS http://localhost:8080/health || exit 1
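Docker then reports the container as healthy or unhealthy. Docker Swarm acts on that status and replaces unhealthy tasks automatically. Kubernetes and Nomad rely on probes or checks declared in their own configuration, so define an equivalent check there. In particular, the start-period avoids false positives when slow-starting apps (Java, .NET) take time to boot.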
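A deploy script can also read the health status Docker records for a container. The sketch below polls it until the container reports healthy. It assumes the image defines a HEALTHCHECK (the query fails otherwise); health_status, wait_healthy, and the default retry values are hypothetical.

```shell
set -eu

# Print the health status Docker recorded: starting, healthy, or unhealthy.
health_status() {
  docker inspect --format '{{.State.Health.Status}}' "$1"
}

# Poll until the container is healthy.
# $1: container name, $2: number of attempts, $3: seconds between attempts.
wait_healthy() {
  name="$1"
  tries="${2:-10}"
  delay="${3:-3}"
  while [ "$tries" -gt 0 ]; do
    if [ "$(health_status "$name")" = "healthy" ]; then
      return 0
    fi
    tries=$((tries - 1))
    sleep "$delay"
  done
  return 1
}
```

For example, wait_healthy my-container 20 3 || exit 1 stops a deployment when the new container never becomes healthy. Note that the curl-based check shown earlier needs curl inside the image, which minimal bases such as distroless do not ship.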
Resource limits: your safeguard against noisy neighbors
Without limits, a leaking container can devour all the host RAM and crash every other container next door. This is the classic noisy neighbor effect. Therefore, always configure:
- Memory: --memory=512m --memory-swap=512m to prevent any overflow.
- CPU: --cpus=1.5 to cap processor usage.
- PIDs: --pids-limit=200 to prevent fork bombs.
- Logs: --log-opt max-size=10m --log-opt max-file=3 to avoid filling the disk.
Of course, the right values depend on your actual workload. First, measure with docker stats under nominal load. Then, add a 30% safety margin. This is the method recommended by reference DevOps guides such as Stéphane Robert’s Docker hub.
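The measure-then-add-margin method can be sketched as follows. The 400 MiB figure is a made-up measurement standing in for what docker stats reports under nominal load, and with_margin, snapshot_stats, and run_limited are hypothetical helper names.

```shell
set -eu

# Add a safety margin to a measured value.
# $1: measured value (MiB), $2: margin in percent.
with_margin() {
  echo $(( $1 * (100 + $2) / 100 ))
}

measured_mib=400                               # hypothetical measured peak
limit_mib="$(with_margin "$measured_mib" 30)"  # 400 MiB + 30% = 520 MiB

# One-shot snapshot of memory and CPU per container. Requires Docker.
snapshot_stats() {
  docker stats --no-stream --format '{{.Name}} {{.MemUsage}} {{.CPUPerc}}'
}

# Start a container with every limit from the list above.
# Equal --memory and --memory-swap values mean the container gets no swap.
run_limited() {
  docker run -d \
    --memory="${limit_mib}m" --memory-swap="${limit_mib}m" \
    --cpus=1.5 \
    --pids-limit=200 \
    --log-opt max-size=10m --log-opt max-file=3 \
    "$1"
}
```

The CPU and PID values are the ones from the list and should be tuned the same way, from measurements rather than guesses.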
Mistake #5 – Flying blind with no logs or monitoring

The last trap is arguably the most underestimated one. In practice, many teams ship Docker to production without centralized monitoring. They rely on docker logs on demand. That works for a single container. However, it becomes unmanageable the moment you have ten of them.
The minimum recommended stack
For a calm production, three complementary building blocks are essential:
- Centralized logs: Grafana Loki, ELK, or a managed service like Datadog. Configure Docker to ship logs via the json-file driver with rotation, then aggregate with Promtail or Filebeat.
- Metrics: Prometheus + cAdvisor expose CPU, RAM, and I/O per container. Grafana renders the dashboards.
- Alerting: Alertmanager or PagerDuty with thresholds on failed health checks, restart counts, and OOM kills.
The three metrics to monitor first
- Restart count: a container restarting more than 3 times an hour signals an unresolved issue.
- Memory usage trend: a curve that climbs and never comes back down indicates a memory leak.
- Latency p95 and p99: these percentiles surface the slowdowns that the average hides.
Furthermore, consider capturing Docker events with docker events. The stream records every container lifecycle action reported by the daemon (start, stop, kill, OOM) and proves invaluable when investigating an incident afterwards. Combined with a SIEM, it also helps detect abnormal behavior.
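Before a full monitoring stack is in place, the restart signal and the event stream can be checked from a shell. The sketch below is a stopgap under assumptions: the helper names are hypothetical, and because Docker's restart counter is cumulative, the hourly rule is approximated by comparing two samples taken an hour apart.

```shell
set -eu

# Cumulative number of restarts Docker recorded for a container.
restart_count() {
  docker inspect --format '{{.RestartCount}}' "$1"
}

# Succeeds when restarts between two samples exceed the threshold.
# $1: earlier sample, $2: later sample, $3: threshold (default 3).
too_many_restarts() {
  [ $(( $2 - $1 )) -gt "${3:-3}" ]
}

# Stream the container events most useful during an incident.
watch_events() {
  docker events --filter type=container \
    --filter event=die --filter event=oom --filter event=restart \
    --format '{{.Time}} {{.Action}} {{.Actor.Attributes.name}}'
}
```

For example, a cron job can store restart_count my-container every hour and raise an alert when too_many_restarts succeeds on the last two samples.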
The Docker production checklist
Finally, here is the condensed checklist to plug into your deployment pipeline before any production release. If any box is empty, you carry an unmanaged risk into production.
| Area | Check | Tool or command |
|---|---|---|
| Security | Container runs as non-root | USER app + docker exec whoami |
| Security | Read-only filesystem | --read-only + targeted tmpfs |
| Image | Multi-stage and minimal base | Alpine or distroless |
| Image | Vulnerability scan | Trivy or Snyk |
| Secrets | No secret inside the image | Vault or Docker Secrets |
| Runtime | Active health check | HEALTHCHECK directive |
| Runtime | Memory and CPU limits | --memory and --cpus |
| Observability | Centralized logs | Loki, ELK, or Datadog |
| Observability | Metrics + alerting | Prometheus + Grafana |
This checklist covers the main Docker risks we regularly see in production environments at the teams we train. It does not replace a code review or a security audit, but it helps you avoid the most expensive mistakes.
Conclusion
Docker is neither complex nor moody. However, it does not forgive shortcuts taken in production. The five mistakes we just covered (running as root, bloated images, exposed secrets, missing health checks, and neglected monitoring) account for the vast majority of incidents Swiss DevOps teams hit on any given week.
Fixing these five points stays within reach of an organized team, and the return on investment is fast. Your deployments get faster, your infrastructure becomes more stable, and your attack surface shrinks noticeably. The next time a container falls over in production, ask yourself this: which of these five mistakes did we let slip through?
To go further, complete your Docker mastery with a structured path covering Docker, Kubernetes, and cloud-native orchestration. That is what our DevOps trainings in Geneva and Lausanne are designed for: teams shipping to real production environments.
Docker in production FAQ
What is the leading cause of Docker container crashes in production?
In most cases, it’s the absence of a health check combined with the absence of resource limits. The container falls over with no safeguard to restart it, or it eats all the host RAM and drags the other services down with it.
Should you really avoid running Docker as root in production?
Yes, no exceptions. Running as root massively increases the impact of a compromise. A simple USER app in your Dockerfile removes the root privileges that many container escape techniques rely on.
How do you reduce the size of a Docker image in production?
Use multi-stage builds to separate compilation from runtime, pick a minimal base image like Alpine or distroless, and group your RUN commands to limit the number of layers.
Which tools should you use to monitor Docker in production?
The standard stack combines Prometheus + cAdvisor for metrics, Grafana for visualization, and Loki or ELK for logs. For teams that prefer managed services, Datadog or New Relic cover all three needs in a single offering.
Is Docker suitable for critical workloads in production?
Yes, provided you pair it with an orchestrator like Kubernetes or Docker Swarm and stick to the security, monitoring, and resource-limit best practices. Without an orchestrator, standalone Docker remains viable only for non-critical workloads.
