Collect Docker metrics with Prometheus

Prometheus is an open-source systems monitoring and alerting toolkit. You can configure Docker as a Prometheus target. This topic shows you how to configure Docker, set up Prometheus to run as a Docker container, and monitor your Docker instance using Prometheus.

Warning: The available metrics and the names of those metrics are in active development and may change at any time.

Currently, you can only monitor Docker itself; you cannot use the Docker target to monitor your application.

Configure Docker

To configure the Docker daemon as a Prometheus target, you need to specify a metrics address with the metrics-addr option. The best way to do this is via the daemon.json file, which is located at one of the following locations by default. If the file does not exist, create it.

  • Linux: /etc/docker/daemon.json
  • Windows Server: C:\ProgramData\docker\config\daemon.json
  • Docker for Mac / Docker for Windows: Click the Docker icon in the toolbar, select Preferences, then select Daemon. Click Advanced.

If the file is currently empty, paste the following:

{
  "metrics-addr" : "127.0.0.1:9323",
  "experimental" : true
}

If the file is not empty, add those two keys, making sure that the resulting file is valid JSON: every key-value pair except the last must be followed by a comma (,), as in the sketch below.
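For example, if your daemon.json already sets a logging driver (shown here purely as an illustration), the merged file could look like this:

{
  "log-driver" : "json-file",
  "metrics-addr" : "127.0.0.1:9323",
  "experimental" : true
}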

Save the file, or in the case of Docker for Mac or Docker for Windows, save the configuration. Restart Docker.
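On a Linux host that uses systemd, for example, you can restart the daemon with:

$ sudo systemctl restart docker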

Docker now exposes Prometheus-compatible metrics on port 9323.
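To verify that the endpoint is working, you can query it directly; the daemon serves the metrics at the /metrics path on the address you configured:

$ curl http://127.0.0.1:9323/metrics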

Configure and run Prometheus

In this example, Prometheus runs as a Docker container on the same host.

Copy one of the following configuration files and save it to /tmp/prometheus.yml (Linux or Mac) or C:\tmp\prometheus.yml (Windows). This is a stock Prometheus configuration file, except for the addition of the Docker job definition at the bottom of the file. Docker for Mac and Docker for Windows need a slightly different configuration, shown second, because the docker job targets a different address.

Linux:

# my global config
global:
  scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

  # Attach these labels to any time series or alerts when communicating with
  # external systems (federation, remote storage, Alertmanager).
  external_labels:
      monitor: 'codelab-monitor'

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first.rules"
  # - "second.rules"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'docker'

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
      - targets: ['localhost:9323']

Docker for Mac or Docker for Windows:

# my global config
global:
  scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

  # Attach these labels to any time series or alerts when communicating with
  # external systems (federation, remote storage, Alertmanager).
  external_labels:
      monitor: 'codelab-monitor'

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first.rules"
  # - "second.rules"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'docker'

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
      - targets: ['192.168.65.1:9323']

Next, start a single-replica Prometheus service using this configuration.

Linux or Mac:

$ docker service create --replicas 1 --name my-prometheus \
    --mount type=bind,source=/tmp/prometheus.yml,destination=/etc/prometheus/prometheus.yml \
    --publish 9090:9090/tcp \
    prom/prometheus

Windows:

PS C:\> docker service create --replicas 1 --name my-prometheus `
    --mount type=bind,source=C:/tmp/prometheus.yml,destination=/etc/prometheus/prometheus.yml `
    --publish 9090:9090/tcp `
    prom/prometheus
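The docker service create command requires the host to be part of a swarm (run docker swarm init first on a single-node setup). If you prefer not to use swarm mode, a plain container works just as well for this example, for instance on Linux:

$ docker run -d --name my-prometheus \
    --mount type=bind,source=/tmp/prometheus.yml,destination=/etc/prometheus/prometheus.yml \
    -p 9090:9090 \
    prom/prometheus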

Verify that the Docker target is listed at http://localhost:9090/targets/.

Prometheus targets page

You will not be able to access the endpoint URLs directly if you use Docker for Mac or Docker for Windows.
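As an alternative check that works from the command line, you can query the Prometheus HTTP API for the list of scrape targets and their health:

$ curl http://localhost:9090/api/v1/targets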

Use Prometheus

Create a graph. Click the Graphs link in the Prometheus UI. Choose a metric from the combo box to the right of the Execute button, and click Execute. The screenshot below shows the graph for engine_daemon_network_actions_seconds_count.

Prometheus engine_daemon_network_actions_seconds_count report

The above graph shows a pretty idle Docker instance. Your graph might look different if you are running active workloads.
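Counter metrics such as engine_daemon_network_actions_seconds_count are often easier to read when graphed as a per-second rate rather than as a raw total; one way to do this is a PromQL expression like the following, entered into the same query box:

rate(engine_daemon_network_actions_seconds_count[1m])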

To make the graph more interesting, create some network actions by starting a service with 10 tasks that just ping Docker non-stop (you can change the ping target to anything you like):

$ docker service create \
  --replicas 10 \
  --name ping_service \
  alpine ping docker.com

Wait a few minutes (the configured scrape interval is 15 seconds) and reload your graph.

Prometheus engine_daemon_network_actions_seconds_count report

When you are ready, stop and remove the ping_service service, so that you are not flooding a host with pings for no reason.

$ docker service remove ping_service

Wait a few minutes and you should see that the graph falls back to the idle level.
