Troubleshoot swarm configurations

These are the docs for UCP version 2.2.4

UCP automatically tries to heal itself by monitoring its internal components and trying to bring them to a healthy state.

In most cases, if a single UCP component stays in a failed state, you can restore the cluster to a healthy state by removing the unhealthy node from the cluster and joining it again. Learn how to remove and join nodes.

Troubleshoot the etcd key-value store

UCP persists configuration data on an etcd key-value store and RethinkDB database that are replicated on all manager nodes of the UCP swarm. These data stores are for internal use only and should not be used by other applications.

With the HTTP API

In this example we’ll use curl for making requests to the key-value store REST API, and jq to process the responses.

You can install these tools on an Ubuntu distribution by running:

$ sudo apt-get update && sudo apt-get install curl jq
  1. Use a client bundle to authenticate your requests. Learn more.

  2. Use the REST API to access the cluster configurations. The $DOCKER_HOST and $DOCKER_CERT_PATH environment variables are set when using the client bundle.

    $ export KV_URL="https://$(echo $DOCKER_HOST | cut -f3 -d/ | cut -f1 -d:):12379"
    
    $ curl -s \
         --cert ${DOCKER_CERT_PATH}/cert.pem \
         --key ${DOCKER_CERT_PATH}/key.pem \
         --cacert ${DOCKER_CERT_PATH}/ca.pem \
         ${KV_URL}/v2/keys | jq "."
    

To learn more about the key-value store REST API, check the etcd official documentation.
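The KV_URL derivation in step 2 can be checked on its own before making any requests. The sketch below wraps the same cut pipeline in a hypothetical helper, kv_url_from_docker_host (not part of UCP), and assumes the client bundle sets DOCKER_HOST to a value of the form tcp://HOST:PORT. The sample address is only an illustration.

```shell
# kv_url_from_docker_host is a hypothetical helper, not a UCP command.
# It takes a DOCKER_HOST value and prints the key-value store URL.
kv_url_from_docker_host() {
  # Field 3 of "tcp://HOST:PORT" split on "/" is "HOST:PORT";
  # field 1 of that split on ":" is the bare host.
  host=$(echo "$1" | cut -f3 -d/ | cut -f1 -d:)
  echo "https://${host}:12379"
}

# Assumed sample value; a real client bundle sets DOCKER_HOST for you.
kv_url_from_docker_host "tcp://192.168.122.64:443"
# prints https://192.168.122.64:12379
```

If the printed URL does not contain the address of a manager node, check that the client bundle environment is loaded in the current shell.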

With the CLI client

The containers running the key-value store include etcdctl, a command-line client for etcd. You can run it using the docker exec command.

The examples below assume you are logged in with ssh into a UCP manager node.

$ docker exec -it ucp-kv etcdctl \
        --endpoint https://127.0.0.1:2379 \
        --ca-file /etc/docker/ssl/ca.pem \
        --cert-file /etc/docker/ssl/cert.pem \
        --key-file /etc/docker/ssl/key.pem \
        cluster-health

member 16c9ae1872e8b1f0 is healthy: got healthy result from https://192.168.122.64:12379
member c5a24cfdb4263e72 is healthy: got healthy result from https://192.168.122.196:12379
member ca3c1bb18f1b30bf is healthy: got healthy result from https://192.168.122.223:12379
cluster is healthy

On failure, the command exits with an error code and no output.

To learn more about the etcdctl utility, check the etcd official documentation.
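Because cluster-health signals failure through its exit code, a script can branch on that status. The following is a minimal sketch, not a UCP feature: check_health is a hypothetical wrapper that runs whatever command it is given and reports the result. The usage line drops the -it flags, since docker exec -t fails when no terminal is attached, as is usually the case in scripts.

```shell
# check_health is a hypothetical wrapper: it runs the command passed as
# arguments and reports success or failure based on the exit status.
check_health() {
  if "$@"; then
    echo "etcd cluster is healthy"
  else
    echo "etcd cluster-health check failed" >&2
    return 1
  fi
}

# Usage on a UCP manager node (commented out so the snippet has no side effects):
# check_health docker exec ucp-kv etcdctl \
#   --endpoint https://127.0.0.1:2379 \
#   --ca-file /etc/docker/ssl/ca.pem \
#   --cert-file /etc/docker/ssl/cert.pem \
#   --key-file /etc/docker/ssl/key.pem \
#   cluster-health
```

The wrapper returns a non-zero status on failure, so it can be chained with other checks or used in a monitoring job.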

RethinkDB Database

User and organization data for Docker Enterprise Edition is stored in a RethinkDB database which is replicated across all manager nodes in the UCP swarm.

UCP’s own configuration management processes typically handle replication and failover of this database automatically. A command-line tool that ships with UCP provides detailed database status and lets you manually reconfigure database replication.

The examples below assume you are logged in with ssh into a UCP manager node.

Check the status of the database


# NODE_ADDRESS will be the IP address of this Docker Swarm manager node
NODE_ADDRESS=$(docker info --format '{{.Swarm.NodeAddr}}')
# VERSION will be your most recent version of the docker/ucp-auth image
VERSION=$(docker image ls --format '{{.Tag}}' docker/ucp-auth | head -n 1)
# This command will output detailed status of all servers and database tables
# in the RethinkDB cluster.
docker container run --rm -v ucp-auth-store-certs:/tls docker/ucp-auth:${VERSION} --db-addr=${NODE_ADDRESS}:12383 db-status

Server Status: [
  {
    "ID": "ffa9cd5a-3370-4ccd-a21f-d7437c90e900",
    "Name": "ucp_auth_store_192_168_1_25",
    "Network": {
      "CanonicalAddresses": [
        {
          "Host": "192.168.1.25",
          "Port": 12384
        }
      ],
      "TimeConnected": "2017-07-14T17:21:44.198Z"
    }
  }
]
...

Manually reconfigure database replication


# NODE_ADDRESS will be the IP address of this Docker Swarm manager node
NODE_ADDRESS=$(docker info --format '{{.Swarm.NodeAddr}}')
# NUM_MANAGERS will be the current number of manager nodes in the cluster
NUM_MANAGERS=$(docker node ls --filter role=manager -q | wc -l)
# VERSION will be your most recent version of the docker/ucp-auth image
VERSION=$(docker image ls --format '{{.Tag}}' docker/ucp-auth | head -n 1)
# This reconfigure-db command will repair the RethinkDB cluster to have a
# number of replicas equal to the number of manager nodes in the cluster.
docker container run --rm -v ucp-auth-store-certs:/tls docker/ucp-auth:${VERSION} --db-addr=${NODE_ADDRESS}:12383 --debug reconfigure-db --num-replicas ${NUM_MANAGERS} --emergency-repair

time="2017-07-14T20:46:09Z" level=debug msg="Connecting to db ..." 
time="2017-07-14T20:46:09Z" level=debug msg="connecting to DB Addrs: [192.168.1.25:12383]" 
time="2017-07-14T20:46:09Z" level=debug msg="Reconfiguring number of replicas to 1" 
time="2017-07-14T20:46:09Z" level=debug msg="(00/16) Emergency Repairing Tables..." 
time="2017-07-14T20:46:09Z" level=debug msg="(01/16) Emergency Repaired Table \"grant_objects\"" 
...
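Since the reconfigure-db command above passes NUM_MANAGERS straight to --num-replicas, it can be worth confirming that the value is a positive integer first, for example in case docker node ls returned nothing. The sketch below is an optional guard under that assumption; is_positive_int is a hypothetical helper, not part of UCP.

```shell
# is_positive_int is a hypothetical helper: it succeeds only when its
# argument consists solely of digits and is greater than zero.
is_positive_int() {
  case "$1" in
    ''|*[!0-9]*) return 1 ;;   # empty, or contains a non-digit character
  esac
  [ "$1" -gt 0 ]
}

# Usage before running reconfigure-db (commented out; needs a manager node):
# NUM_MANAGERS=$(docker node ls --filter role=manager -q | wc -l)
# is_positive_int "${NUM_MANAGERS}" || { echo "unexpected manager count" >&2; exit 1; }
```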
