Docker Health Checks
Docker health checks let the daemon actively probe whether a container is working correctly, not just whether its process is running. A container whose process hasn't crashed but is silently failing (e.g. unable to respond to requests) will be marked unhealthy instead of showing a plain Up status.
This page documents the health checks configured across the homelab stack, the reasoning behind each, and why certain containers deliberately have no health check defined.
How Health Checks Work
Each check runs a command inside the container on a repeating interval. Docker tracks the result and marks the container as one of:
| Status | Meaning |
|---|---|
| starting | Within the start_period grace window; failures don't count yet |
| healthy | Last check passed |
| unhealthy | The number of consecutive failed checks reached the retries setting |
Standalone Docker and Docker Compose do not restart unhealthy containers on their own; that is handled by an autoheal container or an orchestrator. However, unhealthy status is visible in docker ps, Dozzle, Portainer, and Uptime Kuma, making it a useful early-warning signal.
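As a minimal sketch of the autoheal approach, assuming the community willfarrell/autoheal image (this page does not say which autoheal tool, if any, the stack uses): the sidecar watches the Docker socket and restarts containers that turn unhealthy. By default it only acts on containers labelled autoheal=true; its README also documents AUTOHEAL_CONTAINER_LABEL=all to watch every container.

```yaml
services:
  autoheal:
    image: willfarrell/autoheal:latest   # assumed image; check its README for current options
    restart: always
    volumes:
      # Needed to read health status and issue restarts via the Docker API
      - /var/run/docker.sock:/var/run/docker.sock
    # environment:
    #   AUTOHEAL_CONTAINER_LABEL: all    # watch all containers instead of labelled ones

  grafana:
    image: grafana/grafana
    labels:
      autoheal: "true"                   # opt this container in to automatic restarts
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:3000/api/health"]
      # interval/timeout/retries/start_period as in the Grafana section
```

Mounting the Docker socket gives the sidecar root-equivalent control over the host, so it is worth limiting this to a trusted image.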
Healthchecks run inside the container
The check command executes inside the container's filesystem and network namespace. Always use the internal port, not the host-mapped port. A container listening on port 3000 internally but mapped to 8080 on the host must use localhost:3000 in its health check.
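For illustration, assuming Grafana (which listens on 3000 inside its container) were published on host port 8080, the mapping and the health check would target different ports:

```yaml
services:
  grafana:
    image: grafana/grafana
    ports:
      - "8080:3000"   # host port 8080 -> container port 3000
    healthcheck:
      # Runs inside the container, so it must use the internal port 3000.
      # localhost:8080 would fail: nothing listens on 8080 inside the container.
      test: ["CMD", "curl", "-f", "http://localhost:3000/api/health"]
```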
Standard Parameters
All health checks across the stack follow these defaults unless noted otherwise:
```yaml
healthcheck:
  interval: 30s      # How often to run the check
  timeout: 10s       # How long before a single check is considered failed
  retries: 3         # Consecutive failures before marking unhealthy
  start_period: 30s  # Grace period after container start before failures count
```
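start_period is particularly important for services with slow startup times (databases, Grafana, Ghost): failures during this window don't count towards retries. If a check succeeds inside the window, the container is marked healthy straight away and any later failures count as normal.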
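Since most services share these values, one way to avoid repeating them is a Compose extension field plus a YAML anchor. This is a sketch, not necessarily how the stack's Compose files are written; the x-healthcheck-defaults name is arbitrary. Keys set explicitly next to the merge key override the merged defaults, as start_period does for Ghost here:

```yaml
# Compose ignores top-level keys starting with x-, so they can hold shared fragments
x-healthcheck-defaults: &healthcheck-defaults
  interval: 30s
  timeout: 10s
  retries: 3
  start_period: 30s

services:
  ghost:
    # image, volumes and environment omitted
    healthcheck:
      <<: *healthcheck-defaults
      test: ["CMD-SHELL", "wget -qO- http://localhost:2368 || exit 1"]
      start_period: 60s   # overrides the 30s default for Ghost's slower startup
```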
Health Checks in Use
Prometheus
```yaml
healthcheck:
  test: ["CMD-SHELL", "wget -qO /dev/null http://localhost:9090/-/healthy || exit 1"]
  interval: 30s
  timeout: 10s
  retries: 3
  start_period: 30s
```
Why wget and not curl? The prom/prometheus image is minimal and does not include curl. This was confirmed during initial setup when a curl-based check caused the container to immediately show as unhealthy despite Prometheus starting correctly. wget is available and achieves the same result.
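Why /-/healthy? Prometheus exposes a dedicated health endpoint at /-/healthy that returns HTTP 200 whenever the server is up. A separate endpoint, /-/ready, returns 200 only once Prometheus is ready to serve queries. Either is preferred over hitting /metrics directly, which returns Prometheus's full self-monitoring payload on every check.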
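If the check should only pass once Prometheus can actually answer queries, a readiness variant is a one-line change. This is a sketch for comparison; the check in use above targets /-/healthy:

```yaml
healthcheck:
  # /-/ready returns 200 only after startup work such as TSDB/WAL replay has finished;
  # /-/healthy returns 200 as soon as the HTTP server is up.
  test: ["CMD-SHELL", "wget -qO /dev/null http://localhost:9090/-/ready || exit 1"]
  interval: 30s
  timeout: 10s
  retries: 3
  start_period: 30s   # may need raising if WAL replay on a large TSDB takes longer
```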
Grafana
```yaml
healthcheck:
  test: ["CMD", "curl", "-f", "http://localhost:3000/api/health"]
  interval: 30s
  timeout: 10s
  retries: 3
  start_period: 30s
```
curl is available in the Grafana image. The /api/health endpoint is Grafana's official health check endpoint, returning a JSON response with database connectivity status — more meaningful than a simple process check.
cAdvisor
```yaml
healthcheck:
  test: ["CMD", "wget", "--quiet", "--tries=1", "--spider", "http://localhost:8080/healthz"]
  interval: 30s
  timeout: 10s
  retries: 3
  start_period: 15s
```
cAdvisor exposes a dedicated /healthz endpoint specifically intended for health checking. This is preferred over /metrics as it doesn't trigger a full metrics collection pass. The start_period of 15s reflects the time cAdvisor takes to enumerate running containers on startup.
node_exporter
```yaml
healthcheck:
  test: ["CMD", "wget", "--quiet", "--tries=1", "--spider", "http://localhost:9100/metrics"]
  interval: 30s
  timeout: 10s
  retries: 3
  start_period: 5s
```
node_exporter starts almost instantly with no external dependencies, hence the shorter start_period. Hitting /metrics directly confirms the exporter is both running and actively serving data.
MySQL (application databases)
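```yaml
healthcheck:
  test: ["CMD-SHELL", "mysqladmin ping -h localhost -u root --password=$$MYSQL_ROOT_PASSWORD"]
  interval: 30s
  timeout: 10s
  retries: 3
  start_period: 30s
```
mysqladmin ping is the standard MySQL liveness check: it verifies the daemon is accepting connections, not just that the process is running. Its exit status is 0 even if the server rejects the credentials (Access denied), because a rejection still proves the server is up. The $$ double-dollar sign escapes Compose's own variable interpolation, so the literal $MYSQL_ROOT_PASSWORD reaches the container. The CMD-SHELL form is what makes this work: it runs the command through the container's shell, which expands the variable from the container's environment. With the exec form (CMD), no shell is involved and mysqladmin would receive the unexpanded string $MYSQL_ROOT_PASSWORD as the password.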
Ghost (CMS)
```yaml
healthcheck:
  test: ["CMD-SHELL", "wget -qO- http://localhost:2368 || exit 1"]
  interval: 30s
  timeout: 10s
  retries: 3
  start_period: 60s
```
Ghost can take a significant amount of time to start (theme compilation, database migrations), so start_period is extended to 60s. Port 2368 is Ghost's internal default — the external port mapping is irrelevant here.
Pi-hole
```yaml
healthcheck:
  test: ["CMD", "curl", "-f", "http://localhost/api/info/login"]
  interval: 30s
  timeout: 10s
  retries: 3
  start_period: 30s
```
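The /api/info/login endpoint is used rather than /admin/ because /admin/ can answer with a 301 redirect. curl does not follow redirects unless -L is given, and -f only treats HTTP 400 and above as a failure, so a check against /admin/ would pass on the redirect alone without proving the web interface works. The API endpoint returns a clean 200 directly and confirms the web interface is functional.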
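For comparison, a sketch of what checking /admin/ directly would need (not what the stack uses): -L makes curl follow the redirect, so -f judges the final page rather than the 301.

```yaml
healthcheck:
  # -L follows the 301; -f then fails only if the final response is HTTP 400 or above
  test: ["CMD", "curl", "-fL", "http://localhost/admin/"]
```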
Containers Without Health Checks
The following containers have no health check defined. This is a deliberate decision in each case, not an oversight.
UnPoller
UnPoller uses a scratch-based image — it contains only the compiled Go binary with no shell, no wget, no curl, and no nc. There is no mechanism to execute a health check command inside the container.
```bash
# Confirmed on Tethys:
docker exec unpoller which wget   # → OCI runtime exec failed: "which" not found
docker exec unpoller which curl   # → OCI runtime exec failed: "which" not found
docker exec unpoller ls /bin      # → OCI runtime exec failed: "ls" not found
```
Effective alternative: If UnPoller stops exporting metrics, Prometheus scrape failures will immediately surface in Grafana. The Prometheus scrape target itself acts as a functional health check.
pihole-exporter
Same situation as UnPoller — scratch-based image with no available tooling:
```bash
# Confirmed on Tethys:
docker exec pihole-exporter wget ...   # → OCI runtime exec failed: "wget" not found
```
Effective alternative: Prometheus scrape failure detection covers this, as with UnPoller.
General rule for exporter images
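Third-party Prometheus exporter images (pihole-exporter, unpoller, etc.) are commonly built on scratch or distroless base images to minimise attack surface and image size. Assume no shell tooling is available unless confirmed otherwise. Images published by the Prometheus project itself, such as node_exporter, use a BusyBox base that includes wget, which is why the node_exporter check above works. If an exporter stops working, Prometheus records a scrape failure (up == 0 for that target) on its next scrape, usually at least as quickly as a Docker health check would flag it with the defaults above (three consecutive failures 30s apart).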
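A minimal sketch of turning those scrape failures into an alert, assuming Prometheus alerting rules are in use and that the scrape jobs are named unpoller and pihole-exporter (hypothetical job names; match them to the actual scrape config):

```yaml
groups:
  - name: exporter-availability
    rules:
      - alert: ExporterDown
        # Prometheus sets up=1 for a target whose last scrape succeeded and up=0 when it failed
        expr: up{job=~"unpoller|pihole-exporter"} == 0
        # Require the failure to persist so a single missed scrape doesn't fire the alert
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "{{ $labels.job }} on {{ $labels.instance }} is not being scraped"
```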
Checking Health Status
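```bash
# View health status for all containers
# (Status shows e.g. "Up 5 minutes (healthy)", "(unhealthy)" or "(health: starting)")
docker ps --format "table {{.Names}}\t{{.Status}}"

# Show the current health state: status, failing streak and recent check results
docker inspect --format='{{json .State.Health}}' <container_name> | jq

# View only the check history (Docker keeps the last five results, with exit code and output)
docker inspect <container_name> | jq '.[0].State.Health.Log'
```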