A Major Docker Engine Upgrade on an Active Production Server

Upgrading Docker Engine on a server that is actively running over a hundred containers is not a task that rewards improvisation. When the time came to move from Docker Engine 28 to 29 and Docker Compose v2 to v5 on our Hetzner production server, we treated it with the same discipline we would apply to a database migration: a written preparation checklist, a tested rollback path, a precise execution sequence, and a post-upgrade verification phase before declaring the window closed.

What follows is a complete account of how we approached it — the reasoning behind each step, the commands we actually ran, and what to watch for when your container count is high enough that a misconfigured restart policy can cascade into a difficult recovery.

Why This Upgrade Warranted a Formal Runbook

Docker Engine 29 introduced changes to the containerd integration path. Docker Compose v5, meanwhile, is a significant internal rewrite of the v2 architecture. It adjusts how dependency resolution is handled in depends_on blocks with condition: service_healthy and resolves several long-standing issues with parallel startup ordering. It also deprecates a handful of Compose file directives that v2 silently ignored.

On a server where a single docker-compose.yml might govern twelve interdependent services — databases, reverse proxies, application containers, background workers — a silent behavioural change in dependency resolution is exactly the kind of problem that does not surface immediately. It surfaces at 03:00 when a healthcheck never fires and a dependent container loops indefinitely.

The practical answer is preparation, not caution for its own sake.

Preparation Checklist

Before touching a single package, we worked through the following checklist. Each item has a reason.

1. Inventory all running containers and their restart policies.

docker ps --format "table {{.Names}}\t{{.Status}}\t{{.Image}}" | sort
docker inspect $(docker ps -q) --format '{{.Name}} restart={{.HostConfig.RestartPolicy.Name}}' | sort

This gives a baseline. Containers with restart: always or unless-stopped will be started again automatically when the Docker daemon comes back. That is what you want, but only if the underlying compose files and images are in a known good state. Containers with no restart policy stay stopped until someone starts them. Any container in a crash-loop at the time of the upgrade will still be in a crash-loop afterward, and it is better to know that now.
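The second command's output can be reduced to just the containers that will stay down after a daemon restart. This is a minimal sketch, not part of our original runbook. no_restart_policy is a hypothetical helper name. It assumes the "/name restart=<policy>" lines printed above and treats an empty policy the same as no, since an unset policy can be reported as an empty string.

```shell
# Print the names of containers that have no restart policy, reading lines
# of the form "/name restart=<policy>" as produced by the docker inspect
# command above. An empty policy is treated the same as "no".
no_restart_policy() {
  awk '{
    policy = $2
    sub(/^restart=/, "", policy)
    if (policy == "" || policy == "no") {
      name = $1
      sub(/^\//, "", name)   # drop the leading slash docker adds to names
      print name
    }
  }'
}
```

Piping the docker inspect command above into no_restart_policy prints one bare container name per line. Anything listed there needs a manual start after the upgrade.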

2. Validate all compose files against the v5 schema before upgrading.

docker compose config --quiet 2>&1 | grep -i warning

Run this in every directory that contains a docker-compose.yml. Compose v5 is stricter about deprecated keys — particularly version: at the top of compose files (now ignored but generates a warning), and any use of the deprecated links: directive. Warnings in v5 can become errors in subsequent releases; it is worth resolving them now.
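To see exactly where the two keys named above appear, a plain grep over each compose file is usually enough. This is a rough sketch that assumes top-level keys start in column one. scan_legacy_keys is a hypothetical name. The check is a text heuristic rather than a YAML parser, so docker compose config remains the authority.

```shell
# Print "file:line:text" for every line that looks like a top-level
# "version:" key or an indented "links:" key in the given compose files.
# Usage: scan_legacy_keys /opt/*/docker-compose.yml
scan_legacy_keys() {
  local file
  for file in "$@"; do
    # grep exits 1 when a file is clean; that is not an error here.
    grep -H -n -E '^version:|^[[:space:]]+links:' "$file" || true
  done
}
```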

3. Confirm available disk space.

df -h /var/lib/docker
docker system df

The upgrade will pull a new containerd shim and replace the Docker Engine binary. The old binary and its dependencies are not always cleaned up automatically. On a server that has been running for a year, docker system df will often reveal several gigabytes of reclaimable image layers. Clean these before the upgrade, not during.

docker system prune -f --volumes

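Use --volumes only if you are certain no anonymous volumes contain data you need. On our server, all persistent data is mounted from named, explicitly declared volumes, so this was safe. Be aware that docker system prune also deletes every stopped container. Without -a it removes only dangling images, so part of the reclaimable space that docker system df reports may remain afterwards.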
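A free-space check can also gate the rest of the run. The sketch below assumes POSIX df output. has_free_space and the 10 GiB figure in its usage comment are illustrative choices of ours, not a Docker requirement.

```shell
# Succeed if the filesystem holding PATH has at least MIN_GIB GiB available.
# Usage (threshold is illustrative):
#   has_free_space /var/lib/docker 10 || echo "Free up space first" >&2
has_free_space() {
  local path="$1" min_gib="$2" avail_kb
  # -P: POSIX output format, -k: 1024-byte blocks; column 4 is "Available".
  avail_kb="$(df -Pk "$path" | awk 'NR == 2 { print $4 }')"
  [ "$avail_kb" -ge $(( min_gib * 1024 * 1024 )) ]
}
```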

4. Export the current Docker Engine version and compose version to a reference file.

docker version > /root/docker-pre-upgrade.txt
docker compose version >> /root/docker-pre-upgrade.txt
docker ps -a >> /root/docker-pre-upgrade.txt

This takes ten seconds and provides an unambiguous rollback reference if something goes wrong and you need to identify which containers were running at upgrade time.

5. Notify downstream systems.

Our server runs a self-hosted Supabase stack and several n8n workflow automation instances. Any webhook that fires during a container restart cycle will fail silently on the sending side. We set a maintenance window in our uptime monitoring tool and disabled inbound webhooks in n8n for the duration.

The Rollback Plan

A rollback plan is only useful if you have decided in advance what condition triggers it. Ours was simple: if any container that was running before the upgrade is not running ten minutes after the upgrade completes, and cannot be recovered with a docker compose up -d, we roll back the Docker Engine binary.
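The trigger can be checked mechanically instead of by eye. This is a minimal sketch. It assumes the names of running containers were saved before the upgrade with docker ps --format '{{.Names}}' > /root/running-before.txt, which is a hypothetical path. wait_for_containers is a hypothetical helper that polls for up to 600 seconds, our ten-minute window, and reports whatever is still missing. Containers stuck in a restart loop are not counted as running.

```shell
# Wait until every container named in EXPECTED_FILE (one name per line) is
# running, or give up after TIMEOUT seconds (default 600, ten minutes).
# On failure, print the names still missing to stderr and return 1.
wait_for_containers() {
  local expected_file="$1" timeout="${2:-600}" interval="${3:-5}"
  local deadline missing
  if [ ! -r "$expected_file" ]; then
    echo "Cannot read $expected_file" >&2
    return 2
  fi
  deadline=$(( $(date +%s) + timeout ))
  while :; do
    # Load the expected names, then strike off each running container.
    # status=running excludes containers stuck in "restarting".
    missing="$(docker ps --filter status=running --format '{{.Names}}' |
      awk -v f="$expected_file" '
        BEGIN { while ((getline name < f) > 0) want[name] = 1 }
        { delete want[$0] }
        END { for (name in want) print name }' | sort)"
    if [ -z "$missing" ]; then
      echo "All expected containers are running."
      return 0
    fi
    if [ "$(date +%s)" -ge "$deadline" ]; then
      echo "Not running after ${timeout}s:" >&2
      printf '%s\n' "$missing" >&2
      return 1
    fi
    sleep "$interval"
  done
}
```

A non-zero return is the point at which we would try docker compose up -d for the affected stack and, failing that, roll back.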

Rolling back Docker Engine on a Debian-based system means reinstalling the exact previous package versions, so we noted the version strings before starting:

apt-cache policy docker-ce docker-ce-cli containerd.io docker-compose-plugin

Output on our server before the upgrade:

docker-ce:
  Installed: 5:28.1.1-1~debian.12~bookworm
  Candidate: 5:29.0.1-1~debian.12~bookworm

docker-compose-plugin:
  Installed: 2.35.1-1~debian.12~bookworm
  Candidate: 2.36.0-1~debian.12~bookworm

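The rollback command, should it be needed. The --allow-downgrades flag is required: when -y is used, apt-get refuses to install an older version without it.

apt-get install -y --allow-downgrades \
  docker-ce=5:28.1.1-1~debian.12~bookworm \
  docker-ce-cli=5:28.1.1-1~debian.12~bookworm \
  containerd.io \
  docker-compose-plugin=2.35.1-1~debian.12~bookworm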

We kept this command in a text file on the server at /root/docker-rollback.sh with execute permissions, ready to run without typing under pressure.
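Here is a sketch of what such a script can contain, using the versions recorded above. The dry-run default and the apt-mark hold step are additions for this example rather than part of our original file. Without --yes, the script only prints what it would do. The hold stops a routine apt-get upgrade from immediately reinstalling the version we just backed out. containerd.io is left unpinned because its previous version is not in the notes above.

```shell
#!/bin/sh
# Sketch of /root/docker-rollback.sh. Without --yes it only prints the
# commands; with --yes it runs them. Versions are the ones recorded above.
set -eu

DOCKER_VERSION="5:28.1.1-1~debian.12~bookworm"
COMPOSE_VERSION="2.35.1-1~debian.12~bookworm"

rollback_docker() {
  run=""
  if [ "${DRY_RUN:-0}" = "1" ]; then
    run="echo"  # print each command instead of executing it
  fi
  # --allow-downgrades is needed because -y alone refuses to downgrade.
  # containerd.io is unpinned: its previous version was not recorded.
  $run apt-get install -y --allow-downgrades \
    "docker-ce=$DOCKER_VERSION" \
    "docker-ce-cli=$DOCKER_VERSION" \
    containerd.io \
    "docker-compose-plugin=$COMPOSE_VERSION"
  # Keep a routine "apt-get upgrade" from reinstalling the newer versions.
  $run apt-mark hold docker-ce docker-ce-cli docker-compose-plugin
}

case "${1:-}" in
  --yes) rollback_docker ;;
  *)
    DRY_RUN=1 rollback_docker
    echo "Dry run only; re-run with --yes to roll back."
    ;;
esac
```

Once the cause of a rollback is fixed, apt-mark unhold on the same packages re-enables upgrades.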

The Upgrade Process

The actual upgrade is straightforward once the preparation is complete. The sequence matters: update the package index first, verify what will be installed, then install.

Step 1: Update the apt index for the Docker repository only.

apt-get update -o Dir::Etc::sourcelist="sources.list.d/docker.list" \
  -o Dir::Etc::sourceparts="-" \
  -o APT::Get::List-Cleanup="0"

This restricts the update to the Docker repository rather than refreshing all sources. On a production server, an unintended package update from an unrelated source during a Docker maintenance window is a variable you do not need.

Step 2: Confirm the candidate versions.

apt-cache policy docker-ce docker-ce-cli containerd.io docker-compose-plugin

Verify the candidate matches what you expect before proceeding. If the candidate shows a version newer than what you tested against in staging, pause and evaluate.
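With four packages in play, it helps to reduce the apt-cache policy output to one line per package. Below is a small awk sketch. It assumes the layout shown earlier: an unindented "package:" header followed by indented Installed: and Candidate: lines. policy_summary is a hypothetical name.

```shell
# Turn apt-cache policy output into "package installed candidate" lines.
# Usage:
#   apt-cache policy docker-ce docker-ce-cli containerd.io docker-compose-plugin | policy_summary
policy_summary() {
  awk '
    /^[^ \t].*:$/       { pkg = substr($0, 1, length($0) - 1) }
    /^[ \t]+Installed:/ { installed = $2 }
    /^[ \t]+Candidate:/ { print pkg, installed, $2 }
  '
}
```

For a further check, apt-get -s install docker-ce docker-ce-cli containerd.io docker-compose-plugin simulates the transaction without changing anything. That also shows any dependencies apt intends to pull in.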

Step 3: Install.

apt-get install -y docker-ce docker-ce-cli containerd.io docker-compose-plugin

This will stop the Docker daemon, replace the binaries, and restart the daemon. Containers with restart: always or unless-stopped will be brought back up by the daemon when it starts again. The window where containers are not running is typically under thirty seconds on a reasonably sized server, though this varies with how long containerd takes to initialise.

Watch the restart in a second terminal:

watch -n 2 'docker ps --format "table {{.Names}}\t{{.Status}}" | sort'

Step 4: Verify the installed versions.

docker version
docker compose version

On our server, the output confirmed:

Docker Engine: 29.0.1
Docker Compose: v2.36.0

Note that, on our server, the Compose release we refer to as v5 arrived as version 2.36.0 of the docker-compose-plugin package. The "v5" designation refers to the internal rewrite version, not the package semver. This causes some confusion in documentation. The package version is what the apt index tracks, so it is the one to record and compare.
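The same check can be scripted, which is useful when the verification pass runs unattended. This sketch assumes GNU sort with -V. version_ge is a hypothetical helper, and the minimum versions in its comments are the ones from this upgrade.

```shell
# Succeed if version $1 is greater than or equal to version $2.
# sort -V orders version strings; -C only checks the order and prints nothing.
version_ge() {
  printf '%s\n%s\n' "$2" "$1" | sort -V -C
}

# Example on the server:
#   engine="$(docker version --format '{{.Server.Version}}')"
#   compose="$(docker compose version --short)"
#   version_ge "$engine" 29.0.1 || echo "Unexpected Engine version: $engine"
#   version_ge "$compose" 2.36.0 || echo "Unexpected Compose version: $compose"
```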

Post-Upgrade Verification

Ten minutes after the upgrade, we ran a structured verification pass. This is not optional — the daemon restart will bring containers up, but it does not guarantee that containers which depend on healthchecks have actually passed them.

Check all containers are in the expected state.

docker ps -a --format "table {{.Names}}\t{{.Status}}\t{{.RunningFor}}" | sort

Any container showing Restarting or Exited needs immediate investigation. On our server, one container — a Supabase analytics service — showed Restarting (1). It had been in a low-frequency crash loop before the upgrade as well; the upgrade did not cause it, but it surfaced more visibly in the post-upgrade scan. This is exactly the kind of pre-existing issue the pre-upgrade inventory exists to catch.
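On a host with over a hundred containers, scanning that table by eye is error-prone, so a filter that prints only the problem states helps. This sketch assumes a tab-separated name and status on each line. flag_bad_states is a hypothetical helper. Exited (0) can be perfectly normal for one-shot containers such as migrations, so treat the output as a list to review.

```shell
# Read "name<TAB>status" lines and print the containers whose status
# starts with Restarting, Exited or Dead.
# Usage: docker ps -a --format '{{.Names}}\t{{.Status}}' | flag_bad_states
flag_bad_states() {
  awk -F '\t' '$2 ~ /^(Restarting|Exited|Dead)/ { print $1 ": " $2 }'
}
```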

Verify healthchecks are passing.

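docker inspect $(docker ps -q) \
  --format '{{.Name}} health={{if .State.Health}}{{.State.Health.Status}}{{else}}none{{end}}' \
  2>/dev/null | grep -v "health=none" | sort

Containers without explicit healthchecks report health=none; filter these out. The ones that matter are those your other containers depend on via condition: service_healthy. The {{if .State.Health}} guard means the output does not depend on how a missing Health field renders, which can be a template error rather than <no value> depending on the Docker version.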
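A companion filter can list only the containers whose healthcheck is not yet healthy. This sketch reads "/name health=<status>" lines and skips containers reported as none or <no value>, so it works with either form of the inspect output. not_healthy is a hypothetical name.

```shell
# Read "/name health=<status>" lines and print each container whose
# healthcheck is not "healthy" (typically "starting" or "unhealthy").
# Containers without a healthcheck ("none" or "<no value>") are skipped.
not_healthy() {
  awk '
    / health=(none|<no value>)$/ { next }
    {
      status = $2
      sub(/^health=/, "", status)
      if (status != "healthy") {
        name = $1
        sub(/^\//, "", name)
        print name, status
      }
    }'
}
```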

Run a compose config validation pass on all stacks.

for dir in /opt/*/; do
  if [ -f "${dir}docker-compose.yml" ]; then
    echo "--- $dir ---"
    docker compose -f "${dir}docker-compose.yml" config --quiet 2>&1
  fi
done

This catches any compose file directives that Compose v5 now treats as warnings or errors. On our server, three stacks emitted the version key deprecation warning. These are cosmetic but worth resolving in the next maintenance cycle.
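The loop above only finds files named docker-compose.yml, but Compose also looks for compose.yaml, compose.yml and docker-compose.yaml. Below is a sketch of a finder that covers all four names. It assumes the same /opt/<stack>/ layout. find_compose_files is a hypothetical helper. If a directory holds more than one of these files, Compose prefers the canonical compose.yaml, so both are listed here but only one is used by default.

```shell
# List compose files one directory below ROOT under any default name.
# Usage:
#   find_compose_files /opt | while read -r f; do
#     echo "--- $f ---"
#     docker compose -f "$f" config --quiet 2>&1
#   done
find_compose_files() {
  local root="${1:-/opt}"
  find "$root" -mindepth 2 -maxdepth 2 -type f \
    \( -name compose.yaml -o -name compose.yml \
       -o -name docker-compose.yaml -o -name docker-compose.yml \) | sort
}
```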

Check the Docker daemon logs for errors.

journalctl -u docker --since "1 hour ago" | grep -i -E "error|warn|fatal"

A clean upgrade produces no errors in the daemon log. Warnings about deprecated configuration are worth noting but not acting on immediately.
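Grepping for the words error and warn also matches informational messages that merely mention them. Counting lines by the daemon's level= field gives a quicker summary. This sketch assumes dockerd's default text log format (time=... level=... msg=...). log_level_counts is a hypothetical helper.

```shell
# Count daemon log lines by their level= field (info, warning, error, ...),
# most frequent first.
# Usage: journalctl -u docker --since "1 hour ago" --no-pager | log_level_counts
log_level_counts() {
  grep -o 'level=[a-z]*' | sort | uniq -c | sort -rn
}
```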

Confirm network connectivity between containers.

docker network ls
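for net in $(docker network ls --filter driver=bridge --format '{{.Name}}'); do
  echo "$net: $(docker network inspect "$net" --format '{{range .Containers}}{{.Name}} {{end}}')"
done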

Docker Engine 29 did not change the default network driver behaviour, but it is worth confirming that custom bridge networks still have the expected container memberships after the daemon restart.

What We Observed

The upgrade completed in under two minutes. All containers that were running before the upgrade were running after it. The Docker Compose v5 change we noticed most immediately was a cleaner --dry-run output format — the new version produces structured, coloured output that makes it easier to review what a docker compose up will actually do before executing it.

The depends_on healthcheck ordering behaved identically to v2 on our stacks. This was expected — we had validated the compose files beforehand — but it was reassuring to confirm.

One substantive change: Compose v5 no longer silently ignores the container_name field in combination with scale. We had one stack that used both — an old holdover from before we moved to proper replicas. Compose v5 emitted a clear error on docker compose up for that stack, where v2 had silently ignored the conflict. The fix was a one-line edit to remove the redundant container_name field.
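A coarse file-level check can find other stacks with the same conflict before Compose reports it. This is a sketch only. name_and_scale is a hypothetical helper. Because it does not parse YAML, it cannot tell whether container_name and scale (or deploy.replicas) belong to the same service, so treat a hit as a pointer, not a verdict.

```shell
# Print each compose file that sets container_name and also uses scale
# or replicas somewhere in the file.
# Usage: name_and_scale /opt/*/docker-compose.yml
name_and_scale() {
  local file
  for file in "$@"; do
    if grep -q -E '^[[:space:]]+container_name:' "$file" &&
       grep -q -E '^[[:space:]]+(scale|replicas):' "$file"; then
      echo "$file"
    fi
  done
}
```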