Use a load balancer

Once you’ve joined multiple DTR replicas for high availability, you can configure your own load balancer to balance user requests across all replicas.

This allows users to access DTR using a centralized domain name. If a replica goes down, the load balancer can detect that and stop forwarding requests to it, so that the failure goes unnoticed by users.

Load balancing DTR

DTR does not provide a load balancing service. You can use an on-premises or cloud-based load balancer to balance requests across multiple DTR replicas.

Make sure you configure your load balancer to:

  • Load balance TCP traffic on ports 80 and 443
  • Not buffer requests
  • Forward the Host HTTP header correctly
  • Not time out idle connections, or set the timeout to more than 10 minutes
  • Use the unauthenticated /_ping endpoint (note the lack of /api/v0/ in the path) on each DTR replica to check whether the replica is healthy and should remain in the load balancing pool (see the health check examples below)

Health check endpoints

The /_ping endpoint returns a JSON object for the replica being queried of the form:

{
  "Error": "error message",
  "Healthy": true
}

A response of "Healthy": true means the replica is suitable for taking requests. It is also sufficient to check whether the HTTP status code is 200.
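
For example, you can probe a single replica directly with curl. The replica address below is a placeholder, and -k is only needed if DTR uses self-signed certificates:

# A healthy replica returns HTTP 200 and "Healthy": true.
curl -k -s -o /dev/null -w '%{http_code}\n' https://<DTR_REPLICA_1_IP>/_ping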

An unhealthy replica will return 503 as the status code and populate "Error" with more details on any one of these services:

  • Storage container (registry)
  • Authorization (garant)
  • Metadata persistence (rethinkdb)
  • Content trust (notary)

Note that this endpoint only reports the health of the replica being queried. To get real-time health for every replica in a cluster, query each replica individually, as in the sketch below.
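
A minimal sketch of such a check, using the same placeholder replica addresses as the configuration examples below:

# Check /_ping on each replica and print its HTTP status code.
# The addresses are placeholders; substitute your own replicas.
for replica in <DTR_REPLICA_1_IP> <DTR_REPLICA_2_IP> <DTR_REPLICA_N_IP>; do
  code=$(curl -k -s -o /dev/null -w '%{http_code}' "https://${replica}/_ping")
  echo "${replica}: ${code}"
done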

The /api/v0/meta/cluster_status endpoint returns a JSON object for the entire cluster as observed by the replica being queried, and it takes the form:

{
  "replica_health": {
    "replica id": "OK",
    "another replica id": "error message"
  },
  "replica_timestamp": {
    "replica id": "2006-01-02T15:04:05Z07:00",
    "another replica id": "2006-01-02T15:04:05Z07:00"
  },
  "(other fields go here)": "..."
}

Health statuses for the replicas are available in the "replica_health" object. These statuses are taken from a cache that each replica updates individually; the time of the last update is given in the "replica_timestamp" object.

The response also contains information about the internal DTR storage state, which amounts to around 45 KB of data. This, combined with the fact that the endpoint requires admin credentials, makes it a poor fit for load balancer health checks. Use /_ping for those checks instead.
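
If you do need the cluster-wide view, here is a minimal sketch, assuming a DTR admin user named admin and that the jq tool is available to filter the response:

# Query the cluster status as seen by one replica (prompts for the admin password).
curl -k -s -u admin https://<DTR_REPLICA_1_IP>/api/v0/meta/cluster_status | jq '.replica_health'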

Configuration examples

Use the following examples to configure your load balancer for DTR. Configurations are shown for NGINX, HAProxy, and an AWS Elastic Load Balancer.

NGINX (nginx.conf):

user  nginx;
worker_processes  1;

error_log  /var/log/nginx/error.log warn;
pid        /var/run/nginx.pid;

events {
    worker_connections  1024;
}

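# Proxy TCP connections straight through to the DTR replicas; DTR terminates TLS itself.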
stream {
    upstream dtr_80 {
        server <DTR_REPLICA_1_IP>:80  max_fails=2 fail_timeout=30s;
        server <DTR_REPLICA_2_IP>:80  max_fails=2 fail_timeout=30s;
        server <DTR_REPLICA_N_IP>:80   max_fails=2 fail_timeout=30s;
    }
    upstream dtr_443 {
        server <DTR_REPLICA_1_IP>:443 max_fails=2 fail_timeout=30s;
        server <DTR_REPLICA_2_IP>:443 max_fails=2 fail_timeout=30s;
        server <DTR_REPLICA_N_IP>:443  max_fails=2 fail_timeout=30s;
    }
    server {
        listen 443;
        proxy_pass dtr_443;
    }

    server {
        listen 80;
        proxy_pass dtr_80;
    }
}

HAProxy (haproxy.cfg):

global
    log /dev/log    local0
    log /dev/log    local1 notice

defaults
        mode    tcp
        option  dontlognull
        timeout connect 5000
        timeout client 50000
        timeout server 50000
### frontends
# Optional HAProxy Stats Page accessible at http://<host-ip>:8181/haproxy?stats
frontend dtr_stats
        mode http
        bind 0.0.0.0:8181
        default_backend dtr_stats
frontend dtr_80
        mode tcp
        bind 0.0.0.0:80
        default_backend dtr_upstream_servers_80
frontend dtr_443
        mode tcp
        bind 0.0.0.0:443
        default_backend dtr_upstream_servers_443
### backends
backend dtr_stats
        mode http
        option httplog
        stats enable
        stats admin if TRUE
        stats refresh 5m
backend dtr_upstream_servers_80
        mode tcp
        option httpchk GET /_ping HTTP/1.1\r\nHost:\ <DTR_FQDN>
        server node01 <DTR_REPLICA_1_IP>:80 check weight 100
        server node02 <DTR_REPLICA_2_IP>:80 check weight 100
        server node03 <DTR_REPLICA_N_IP>:80 check weight 100
backend dtr_upstream_servers_443
        mode tcp
        option httpchk GET /_ping HTTP/1.1\r\nHost:\ <DTR_FQDN>
        server node01 <DTR_REPLICA_1_IP>:443 weight 100 check check-ssl verify none
        server node02 <DTR_REPLICA_2_IP>:443 weight 100 check check-ssl verify none
        server node03 <DTR_REPLICA_N_IP>:443 weight 100 check check-ssl verify none

AWS Elastic Load Balancer (an example load balancer description):

{
    "Subnets": [
        "subnet-XXXXXXXX",
        "subnet-YYYYYYYY",
        "subnet-ZZZZZZZZ"
    ],
    "CanonicalHostedZoneNameID": "XXXXXXXXXXX",
    "CanonicalHostedZoneName": "XXXXXXXXX.us-west-XXX.elb.amazonaws.com",
    "ListenerDescriptions": [
        {
            "Listener": {
                "InstancePort": 443,
                "LoadBalancerPort": 443,
                "Protocol": "TCP",
                "InstanceProtocol": "TCP"
            },
            "PolicyNames": []
        }
    ],
    "HealthCheck": {
        "HealthyThreshold": 2,
        "Interval": 10,
        "Target": "HTTPS:443/_ping",
        "Timeout": 2,
        "UnhealthyThreshold": 4
    },
    "VPCId": "vpc-XXXXXX",
    "BackendServerDescriptions": [],
    "Instances": [
        {
            "InstanceId": "i-XXXXXXXXX"
        },
        {
            "InstanceId": "i-XXXXXXXXX"
        },
        {
            "InstanceId": "i-XXXXXXXXX"
        }
    ],
    "DNSName": "XXXXXXXXXXXX.us-west-2.elb.amazonaws.com",
    "SecurityGroups": [
        "sg-XXXXXXXXX"
    ],
    "Policies": {
        "LBCookieStickinessPolicies": [],
        "AppCookieStickinessPolicies": [],
        "OtherPolicies": []
    },
    "LoadBalancerName": "ELB-DTR",
    "CreatedTime": "2017-02-13T21:40:15.400Z",
    "AvailabilityZones": [
        "us-west-2c",
        "us-west-2a",
        "us-west-2b"
    ],
    "Scheme": "internet-facing",
    "SourceSecurityGroup": {
        "OwnerAlias": "XXXXXXXXXXXX",
        "GroupName":  "XXXXXXXXXXXX"
    }
}
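
One way to create a comparable Classic ELB with the AWS CLI is sketched below. The load balancer name, subnets, security group, and instance IDs are placeholders, and the listener only covers port 443, matching the description above:

# Create a Classic ELB that passes TCP 443 straight through to the DTR replicas.
aws elb create-load-balancer \
  --load-balancer-name ELB-DTR \
  --listeners "Protocol=TCP,LoadBalancerPort=443,InstanceProtocol=TCP,InstancePort=443" \
  --subnets subnet-XXXXXXXX subnet-YYYYYYYY subnet-ZZZZZZZZ \
  --security-groups sg-XXXXXXXXX

# Health check each replica on the /_ping endpoint.
aws elb configure-health-check \
  --load-balancer-name ELB-DTR \
  --health-check Target=HTTPS:443/_ping,Interval=10,Timeout=2,UnhealthyThreshold=4,HealthyThreshold=2

# Register the instances that run DTR replicas (list each instance ID).
aws elb register-instances-with-load-balancer \
  --load-balancer-name ELB-DTR \
  --instances i-XXXXXXXXX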

You can deploy the NGINX or HAProxy load balancer using Docker:

# Create the nginx.conf file, then
# deploy the load balancer

docker run --detach \
  --name dtr-lb \
  --restart=unless-stopped \
  --publish 80:80 \
  --publish 443:443 \
  --volume ${PWD}/nginx.conf:/etc/nginx/nginx.conf:ro \
  nginx:stable-alpine

# Create the haproxy.cfg file, then
# deploy the load balancer

docker run --detach \
  --name dtr-lb \
  --publish 443:443 \
  --publish 80:80 \
  --publish 8181:8181 \
  --restart=unless-stopped \
  --volume ${PWD}/haproxy.cfg:/usr/local/etc/haproxy/haproxy.cfg:ro \
  haproxy:1.7-alpine haproxy -d -f /usr/local/etc/haproxy/haproxy.cfg
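
Once the load balancer is running, you can verify that it forwards requests to a healthy replica, assuming <DTR_FQDN> resolves to the load balancer:

# Returns HTTP 200 and "Healthy": true when at least one replica is healthy.
# Use -k only if DTR uses self-signed certificates.
curl -k https://<DTR_FQDN>/_ping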
