Top 5 Docker Mistakes that Crash your Containers in Production

Are your Docker containers crashing in production without warning? In most cases, the issue isn’t Docker itself but a handful of recurring configuration mistakes. Here are the five most expensive ones, and how to fix them today.

Table of contents

  1. Why Docker fails in production
  2. Mistake #1 – Running your containers as root
  3. Mistake #2 – Building bloated images without multi-stage
  4. Mistake #3 – Storing secrets and data inside the image
  5. Mistake #4 – Skipping health checks and resource limits
  6. Mistake #5 – Flying blind with no logs or monitoring
  7. The Docker production checklist
  8. Conclusion
  9. Docker in production FAQ

Docker remains the most widely deployed container engine in the world. These five mistakes are among the most frequent root causes of Docker incidents seen in production by DevOps teams in French-speaking Switzerland. They are not Docker bugs. Instead, they are configuration shortcuts taken in development that come back to haunt you the day traffic spikes or an attacker comes knocking.

In production, a container that keeps restarting, an image that ships 2 GB of unused weight, or a secret committed inside a Dockerfile costs time, money, and sometimes customer trust. The good news is that all these mistakes are preventable with the right method and reflexes. The catch is knowing which ones to look for first.

Why Docker fails in production

First, a container is not a virtual machine. It is an isolated process that shares the host kernel, so a misconfiguration can ripple straight back to the host. And unlike development, where a quick restart erases the issue, production is unforgiving of the smallest oversight.

According to a 2024 Snyk study, more than 60% of scanned Docker images contain at least one critical vulnerability. In short, the gap between a robust Docker setup and a fragile one usually comes down to five key decisions.

Three symptoms that should set off alarms

  • Containers stuck in a restart loop: typically a missing health check, an unset environment variable, or a permission issue.
  • Memory or disk usage creeping up for no apparent reason: a memory leak, missing --memory and --cpus limits, or unrotated logs.
  • Latency exploding under load: oversized image at startup, a database sharing the default Docker network, or no orchestration in place.

Of course, these symptoms are only the visible tip of the iceberg. However, in production they are exactly what wakes up your team at 3 AM. Now let’s break down the five mistakes that cause them.

Mistake #1 – Running your containers as root

By default, a Docker container runs as root unless the image or the run command sets another user. That’s convenient in development, but it is a serious security flaw in production. If an attacker compromises your application, they immediately gain root privileges inside the container. Moreover, in some configurations they can break out toward the host.

The typical attack scenario

A developer publishes an image based on node:18 without setting a USER, so the application runs as root. In production, the Node app exposes a command injection flaw, and the attacker uses it to run arbitrary commands inside the container. Because they are root, they can modify files, install tools, or abuse a poorly configured volume to reach the host system.

How to fix the mistake

Fortunately, the fix is straightforward. Add a dedicated user in your Dockerfile and switch to it before the entrypoint command.

  • Create an application user: RUN addgroup --system app && adduser --system --ingroup app app
  • Switch to it: USER app right before CMD or ENTRYPOINT.
  • Verify at runtime: docker exec my-container whoami must return something other than root.
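
Put together, the three steps above might look like the following Dockerfile. This is a minimal sketch, assuming a Debian-based image (node:18 is one) and a hypothetical server.js entry point. On Alpine-based images the BusyBox equivalents are addgroup -S app and adduser -S -G app app.

```dockerfile
# Sketch only: server.js is a placeholder, not taken from a real project.
FROM node:18

# Create an unprivileged system group and user, both named "app".
RUN addgroup --system app && adduser --system --ingroup app app

WORKDIR /app
# Copy the application files and make the app user their owner.
COPY --chown=app:app . .

# Every instruction from here on, and the running container, uses "app".
USER app
CMD ["node", "server.js"]

# Check after starting the container:
#   docker exec my-container whoami    (expected to print: app)
```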

In addition, use the --read-only flag to mount the container's root filesystem read-only, and add a tmpfs only for the paths that genuinely need writes. As a result, even an attacker with root inside the container cannot persist files on that filesystem. This is one of the recommendations of the CIS Docker Benchmark, the reference for container security.

Mistake #2 – Building bloated images without multi-stage

A 2 GB Docker image for an application that ships 50 MB of code is the most visible and most common mistake. It slows every deployment, inflates your storage bill, and increases the attack surface. Specifically, the more packages an image carries, the more potential vulnerabilities it ships.

The three main causes of overweight images

  1. Bloated base image: ubuntu:latest weighs roughly 80 MB, alpine:3.19 only about 7 MB, and distroless static around 2 MB.
  2. Build tools left in the final image: compilers, package managers, temporary files, and source maps.
  3. Unoptimized layers: each RUN creates a layer, so files deleted in a later RUN still count toward the image size. An apt-get install that pulls in recommended packages and leaves the package lists behind can double the image size.
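
The third cause is usually fixed by doing the whole package installation in a single RUN. The fragment below is a sketch, assuming an Ubuntu base pinned to 22.04 and two example packages (ca-certificates and curl) chosen only for illustration.

```dockerfile
FROM ubuntu:22.04

# Update, install, and clean up in ONE layer so the package lists
# never end up stored in the image.
RUN apt-get update \
 && apt-get install -y --no-install-recommends ca-certificates curl \
 && rm -rf /var/lib/apt/lists/*
```

Splitting these three commands across separate RUN instructions would keep the package lists in an earlier layer, even if a later one deletes them.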

The fix: multi-stage builds

The multi-stage pattern lets you compile inside a heavy image and then copy only the final binary into a minimal runtime image. In other words, you keep the comfort of a full build while shipping a featherweight runtime.

For example, here is a typical Dockerfile for a Go application:

  • Stage 1 (builder): FROM golang:1.22 AS builder, copy sources, run go build.
  • Stage 2 (runtime): FROM gcr.io/distroless/static, then COPY --from=builder /app/bin /app.
  • Result: a final image of 15 MB instead of 800 MB.
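
Written out in full, the two stages could look like this. It is a sketch under a few assumptions: the main package sits at the root of the build context, and the build directory /app/src is an arbitrary choice. CGO_ENABLED=0 is added because the distroless static image contains no C library, so the binary has to be statically linked.

```dockerfile
# Stage 1 (builder): full Go toolchain, discarded after the build.
FROM golang:1.22 AS builder
WORKDIR /app/src
COPY . .
# Produce a statically linked binary at /app/bin.
RUN CGO_ENABLED=0 go build -o /app/bin .

# Stage 2 (runtime): no shell, no package manager, no compiler.
FROM gcr.io/distroless/static
COPY --from=builder /app/bin /app
ENTRYPOINT ["/app"]
```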

Consequently, deployments drop from 45 seconds to 3 seconds, and the attack surface shrinks dramatically. To go further, read the official Docker building best practices, which detail the advanced patterns.

Mistake #3 – Storing secrets and data inside the image

Hardcoding a database password into a Dockerfile, or passing it as an environment variable at startup, looks harmless. In fact, it is one of the most common leaks Snyk identifies every year in public images. Furthermore, once a secret is written into a layer, it stays in the image even if a later instruction deletes it: anyone who can pull the image can extract the earlier layer.

Why environment variables are not enough

Variables passed via docker run -e DB_PASS=... or a .env file are visible in many places. For instance, docker inspect exposes them, and a process inside the container running as the same user or as root can read them in /proc/1/environ. In short, this is not a secret, it is plaintext configuration.

The three solutions to favor

  • Docker Secrets (Swarm): stores secrets encrypted in the cluster and mounts them on an in-memory filesystem under /run/secrets/, never on the container's disk.
  • HashiCorp Vault or AWS Secrets Manager: secrets fetched at runtime via an authenticated API, with automatic rotation.
  • BuildKit secret mounts: RUN --mount=type=secret during build, without persisting the value into the final layer.
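
As an illustration of the third option, here is a sketch that gives npm access to a private registry during the build. The secret id npmrc and the use of a private npm registry are assumptions made for the example.

```dockerfile
# syntax=docker/dockerfile:1
FROM node:18
WORKDIR /app
COPY package.json package-lock.json ./

# The secret file is mounted at /root/.npmrc for this RUN step only.
# It is not stored in the resulting layer.
RUN --mount=type=secret,id=npmrc,target=/root/.npmrc npm ci

# Build with:
#   docker build --secret id=npmrc,src=$HOME/.npmrc .
```

Without the target option, the secret is mounted at /run/secrets/ followed by its id.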

Moreover, never commit a .env file to Git. Configure a pre-commit hook with Gitleaks to block any leak. This is standard practice among the Swiss DevOps teams running critical CI/CD pipelines.

The special case of user data

In addition, never store data inside the container itself. Instead, always rely on persistent volumes. However, watch out for poorly configured mounts. A bind mount that exposes a sensitive host path with write access lets the container overwrite host files. Worse, it may enable a host escape. Before going to production, review what each container mounts with docker volume inspect and docker inspect.

Mistake #4 – Skipping health checks and resource limits

A running container is not a working container. That nuance is one many teams learn the hard way. Without a HEALTHCHECK directive in your Dockerfile, Docker has no idea whether your application actually answers requests. Consequently, the orchestrator keeps routing traffic to a zombie container.

Anatomy of a clean health check

For example, here is a typical health check for an HTTP API:

  • HEALTHCHECK --interval=30s --timeout=5s --start-period=10s --retries=3
  • CMD curl -fsS http://localhost:8080/health || exit 1
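
In a Dockerfile, the two parts above form a single instruction. The sketch below assumes the application listens on port 8080, serves a /health endpoint, starts from a placeholder server.js, and ships in an image that contains curl (node:18 does; slim and distroless images generally do not).

```dockerfile
FROM node:18
WORKDIR /app
COPY . .
EXPOSE 8080

# Probe every 30s; after 3 consecutive failures the container is marked unhealthy.
# Failures during the first 10s (start period) do not count toward the retries.
HEALTHCHECK --interval=30s --timeout=5s --start-period=10s --retries=3 \
  CMD curl -fsS http://localhost:8080/health || exit 1

CMD ["node", "server.js"]

# Read the current status with:
#   docker inspect --format '{{.State.Health.Status}}' my-container
```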

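Docker then marks the container as healthy or unhealthy, and Docker Swarm automatically replaces a task whose container turns unhealthy. Standalone Docker only reports the status and does not restart the container by itself. Kubernetes ignores the Dockerfile HEALTHCHECK and relies on its own liveness and readiness probes, and Nomad is typically configured with its own service checks. In particular, the start-period avoids false positives when slow-starting apps (Java, .NET) take time to boot.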

Resource limits: your safeguard against noisy neighbors

Without limits, a leaking container can devour all the host RAM and crash every other container next door. This is the classic noisy neighbor effect. Therefore, always configure:

  • Memory: --memory=512m --memory-swap=512m to cap RAM at 512 MB. Setting --memory-swap to the same value prevents the container from using swap.
  • CPU: --cpus=1.5 to cap processor usage.
  • PIDs: --pids-limit=200 to prevent fork bombs.
  • Logs: --log-opt max-size=10m --log-opt max-file=3 to avoid filling the disk.

Of course, the right values depend on your actual workload. First, measure with docker stats under nominal load. Then, add a 30% safety margin. This is the method recommended by reference DevOps guides such as Stéphane Robert’s Docker hub.

Mistake #5 – Flying blind with no logs or monitoring

The last trap is arguably the most underestimated one. In practice, many teams ship Docker to production without centralized monitoring. They rely on docker logs on demand. That works for a single container. However, it becomes unmanageable the moment you have ten of them.

The minimum recommended stack

For a calm production, three complementary building blocks are essential:

  1. Centralized logs: Grafana Loki, ELK, or a managed service like Datadog. Keep Docker on the json-file driver with rotation enabled, then collect and forward the log files with Promtail or Filebeat.
  2. Metrics: Prometheus + cAdvisor expose CPU, RAM, and I/O per container. Grafana renders the dashboards.
  3. Alerting: Alertmanager or PagerDuty with thresholds on failed health checks, restart counts, and OOM kills.

The three metrics to monitor first

  • Restart count: a container restarting more than 3 times an hour signals an unresolved issue.
  • Memory usage trend: a curve that climbs and never comes back down indicates a memory leak.
  • Latency p95 and p99: these percentiles surface the slowdowns that the average hides.

Furthermore, consider capturing the Docker event stream with docker events. It reports every action on the daemon (start, stop, kill, OOM), but the daemon keeps only a limited history, so forward the events to your log pipeline if you want them available when investigating an incident afterwards. Combined with a SIEM, they also help detect abnormal behavior.

The Docker production checklist

Finally, here is the condensed checklist to plug into your deployment pipeline before any production release. If any box is empty, you carry an unmanaged risk into production.

Area | Check | Tool or command
Security | Container runs as non-root | USER app + docker exec whoami
Security | Read-only filesystem | --read-only + targeted tmpfs
Image | Multi-stage and minimal base | Alpine or distroless
Image | Vulnerability scan | Trivy or Snyk
Secrets | No secret inside the image | Vault or Docker Secrets
Runtime | Active health check | HEALTHCHECK directive
Runtime | Memory and CPU limits | --memory and --cpus
Observability | Centralized logs | Loki, ELK, or Datadog
Observability | Metrics + alerting | Prometheus + Grafana

This checklist covers the main Docker risks we regularly see in production environments at the teams we train. It does not replace a code review or a security audit, but it helps you avoid the most expensive mistakes.

Conclusion

Docker is neither complex nor moody. However, it does not forgive shortcuts taken in production. The five mistakes we just covered (running as root, bloated images, exposed secrets, missing health checks, and neglected monitoring) account for the vast majority of incidents Swiss DevOps teams hit on any given week.

Fixing these five points stays within reach of an organized team, and the return on investment is fast. Your deployments get faster, your infrastructure becomes more stable, and your attack surface shrinks noticeably. The next time a container falls over in production, ask yourself this: which of these five mistakes did we let slip through?

To go further, complete your Docker mastery with a structured path covering Docker, Kubernetes, and cloud-native orchestration. That is what our DevOps trainings in Geneva and Lausanne are designed for: teams shipping to real production environments.

Docker in production FAQ

What is the leading cause of Docker container crashes in production?

In most cases, it’s the absence of a health check combined with the absence of resource limits. The container falls over with no safeguard to restart it, or it eats all the host RAM and drags the other services down with it.

Should you really avoid running Docker as root in production?

Yes, no exceptions. Running as root massively increases the impact of a compromise. A simple USER app in your Dockerfile blocks many known container escape techniques, which typically rely on root privileges inside the container.

How do you reduce the size of a Docker image in production?

Use multi-stage builds to separate compilation from runtime, pick a minimal base image like Alpine or distroless, and group your RUN commands to limit the number of layers.

Which tools should you use to monitor Docker in production?

The standard stack combines Prometheus + cAdvisor for metrics, Grafana for visualization, and Loki or ELK for logs. For teams that prefer managed services, Datadog or New Relic cover all three needs in a single offering.

Is Docker suitable for critical workloads in production?

Yes, provided you pair it with an orchestrator like Kubernetes or Docker Swarm and stick to the security, monitoring, and resource-limit best practices. Without an orchestrator, standalone Docker remains viable only for non-critical workloads.

About the author

ITTA is the leader in IT training and project management solutions and services in French-speaking Switzerland.

Contact

ITTA
Route des jeunes 35
1227 Carouge, Suisse

Opening hours

Monday to Friday
8:30 AM to 6:00 PM
Tel. 058 307 73 00
