cpr monitors the health of your docker containers and restarts them when necessary.
Why was cpr created, when there are plenty of other alternatives on the market, such as docker-autoheal?
Well, unfortunately the alternatives all require forking a process for each healthcheck. This is quite costly,
especially on smaller cloud instances.
For example, here we see the difference in CPU utilization between docker-autoheal and cpr on
an AWS EC2 t3a.small instance:
We can see the CPU usage drop from ~13% to ~5% after having switched to cpr just before 08:00.
For reference, this was a t3a.small instance with healthchecks enabled for 3 containers, each of which had
a default interval of 2 seconds. The instance was essentially idling, with no traffic from outside. (The spikes
we see early in the morning are cronjobs running.)
Running cpr using docker-compose:
version: '3.0'
services:
cpr:
container_name: cpr
image: kopf/cpr:latest
restart: "unless-stopped"
volumes:
- /var/run/docker.sock:/var/run/docker.sock
Once cpr is running, you need to mark containers with labels in order to let it know what healthchecks to perform. For example:
version: '3.0'
services:
nginx:
image: nginx
container_name: nginx
volumes:
...
ports:
...
labels:
cpr.enabled: "true" # required
cpr.url: "http://nginx/_nginxhealthcheck/" # required; url to be probed
cpr.headers: '{"Host":"www.mywebsite.com","X-Forwarded-Proto":"https"}' # optional; additional headers to send in healthcheck
cpr.start_period: 10 # optional; number of seconds to wait before checking
cpr.retries: 3 # optional; number of retries to make
cpr.timeout: 2.5 # optional; number of seconds before timing out
cpr's defaults can be configured by setting environment variables on the cpr container itself. Here is an overview:
CPR_DEFAULT_START_PERIOD(default:8) - The length of time (in seconds) to wait before probing a container.CPR_DEFAULT_INTERVAL(default:3) - The length of time (in seconds) to wait between probes.CPR_DEFAULT_RETRIES(default:2) - The number of retries before marking a container as unhealthy and restarting it.CPR_DEFAULT_TIMEOUT(default:1) - The default HTTP timeout (in seconds) to use when probing a container.CPR_REFRESH_TIME(default:60) - The default amount of time (in seconds) to wait before scanning for newcpr-enabled containers to probe.CPR_LOGLEVEL(default:INFO) - The default log level. Set toDEBUGfor more verbose logging.
cprwon't detect changes to your containers' logs after they've been scanned. In order to ensure cpr respects changes after the fact, be sure to restartcpronce you've redeployed your services with new labels.
