cAdvisor: cAdvisor (Container Advisor) gives container users an understanding of the resource usage and performance characteristics of their running containers. It is a running daemon that collects, aggregates, processes, and exports information about running containers.
Node Exporter: Node Exporter is a Prometheus exporter for hardware and OS metrics.
Alert Manager: Alertmanager handles alerts sent by client applications such as the Prometheus server. It takes care of deduplicating, grouping, and routing them to the correct receiver integrations such as email, Slack, etc.
I'd encourage you to spend some time reading https://github.com/stefanprodan/swarmprom. Stefan has included detailed explanations of which elements perform which functions, as well as how to customize your stack. (This is only a starting point, after all.)
DNS entry for the hostname you intend to use (or a wildcard), pointed to your keepalived IP
Related:
Traefik Forward Auth or Authelia to secure your Traefik-exposed services with an additional layer of authentication
Preparation
This is basically a rehash of stefanprodan's instructions to match the way I've configured other recipes.
Set up OAuth provider
Grafana includes decent login protections, but from what I can see, Prometheus, AlertManager, and Unsee perform no authentication of their own. To expose these publicly for your own consumption (my assumption for the rest of this recipe), prepare to run an oauth_proxy container in front of each of the 4 web UIs in this recipe.
Set up metrics
Edit (or create, depending on your OS) /etc/docker/daemon.json, and add the following, to enable the experimental export of metrics to Prometheus:
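A minimal sketch of that change, assuming the two daemon settings used by the upstream swarmprom instructions (metrics-addr and experimental) and port 9323 for the metrics listener. The write_daemon_json helper and its default path are hypothetical, and it overwrites the target file, so merge the keys by hand if you already have a daemon.json:

```shell
#!/bin/sh
# write_daemon_json [DEST]: write a daemon.json that turns on Docker's
# experimental Prometheus metrics endpoint.
# Assumptions: port 9323, and that no other daemon settings need preserving.
# DEST defaults to ./daemon.json so nothing under /etc is touched by accident.
write_daemon_json() {
  dest="${1:-./daemon.json}"
  cat > "$dest" <<'EOF'
{
  "metrics-addr": "0.0.0.0:9323",
  "experimental": true
}
EOF
}
```

Run it as write_daemon_json /etc/docker/daemon.json on each node, then restart the Docker daemon. Afterwards, curl http://localhost:9323/metrics should return Prometheus-format metrics.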
We'll need several files to bind-mount into our containers, so create directories for them and get the latest copies:
mkdir -p /var/data/swarmprom/dockerd-exporter/
cd /var/data/swarmprom/dockerd-exporter/
wget https://raw.githubusercontent.com/stefanprodan/swarmprom/master/dockerd-exporter/Caddyfile
mkdir -p /var/data/swarmprom/prometheus/rules/
cd /var/data/swarmprom/prometheus/rules/
wget https://raw.githubusercontent.com/stefanprodan/swarmprom/master/prometheus/rules/swarm_task.rules.yml
wget https://raw.githubusercontent.com/stefanprodan/swarmprom/master/prometheus/rules/swarm_node.rules.yml
# Directories for holding runtime data (paths match the bind mounts in the stack definition)
mkdir -p /var/data/runtime/swarmprom/grafana/
mkdir -p /var/data/runtime/swarmprom/alertmanager/
mkdir -p /var/data/runtime/swarmprom/prometheus
chown nobody:nogroup /var/data/runtime/swarmprom/prometheus
Prepare Grafana
Grafana will make all the data we collect from our swarm beautiful.
Create /var/data/config/swarmprom/grafana.env (the path referenced by env_file in the stack definition), and populate it with the following variables:
OAUTH2_PROXY_CLIENT_ID=
OAUTH2_PROXY_CLIENT_SECRET=
OAUTH2_PROXY_COOKIE_SECRET=

# Disable basic auth (it conflicts with oauth_proxy)
GF_AUTH_BASIC_ENABLED=false

# Set this to the real-world URL to your grafana install (else you get screwy CSS thanks to oauth_proxy)
GF_SERVER_ROOT_URL=https://grafana.example.com
GF_SERVER_DOMAIN=grafana.example.com

# Set your default admin/pass here
GF_SECURITY_ADMIN_USER=admin
GF_SECURITY_ADMIN_PASSWORD=ilovemybatmanunderpants
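The client ID and secret come from your OAuth provider, but the cookie secret is a value you generate yourself. A minimal sketch, assuming 16 random bytes in URL-safe base64 is acceptable to your oauth2_proxy version (check its documentation for the length it expects); gen_cookie_secret is a hypothetical helper name:

```shell
#!/bin/sh
# gen_cookie_secret: print a random value suitable for pasting after
# OAUTH2_PROXY_COOKIE_SECRET=
# Assumption: 16 random bytes, base64-encoded with the URL-safe alphabet.
gen_cookie_secret() {
  head -c 16 /dev/urandom | base64 | tr '+/' '-_'
}
```

Paste the output into each env file that an oauth2_proxy container reads. Change the default Grafana admin password shown in the listing while you are editing the file.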
Swarmprom Docker Swarm config
Create a Docker Swarm config file in docker-compose syntax (v3), based on the original swarmprom docker-compose.yml file:
Fast-track with premix! 🚀
"Premix" is a git repository which includes necessary docker-compose and env files for all published recipes. This means that you can launch any recipe with just a git pull and a docker stack deploy 👍.
🚀 Update: Premix now includes an ansible playbook, enabling you to deploy an entire stack + recipes, with a single ansible command! (more here)
version: "3.3"

networks:
  net:
    driver: overlay
    attachable: true

volumes:
  prometheus: {}
  grafana: {}
  alertmanager: {}

configs:
  dockerd_config:
    file: /var/data/swarmprom/dockerd-exporter/Caddyfile
  node_rules:
    file: /var/data/swarmprom/prometheus/rules/swarm_node.rules.yml
  task_rules:
    file: /var/data/swarmprom/prometheus/rules/swarm_task.rules.yml

services:
  dockerd-exporter:
    image: stefanprodan/caddy
    networks:
      - internal
    environment:
      - DOCKER_GWBRIDGE_IP=172.18.0.1
    configs:
      - source: dockerd_config
        target: /etc/caddy/Caddyfile
    deploy:
      mode: global
      resources:
        limits:
          memory: 128M
        reservations:
          memory: 64M

  cadvisor:
    image: google/cadvisor
    networks:
      - internal
    command: -logtostderr -docker_only
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro
      - /:/rootfs:ro
      - /var/run:/var/run
      - /sys:/sys:ro
      - /var/lib/docker/:/var/lib/docker:ro
    deploy:
      mode: global
      resources:
        limits:
          memory: 128M
        reservations:
          memory: 64M

  grafana:
    image: stefanprodan/swarmprom-grafana:5.3.4
    networks:
      - internal
    env_file: /var/data/config/swarmprom/grafana.env
    environment:
      - GF_USERS_ALLOW_SIGN_UP=false
      - GF_SMTP_ENABLED=${GF_SMTP_ENABLED:-false}
      - GF_SMTP_FROM_ADDRESS=${GF_SMTP_FROM_ADDRESS:-grafana@test.com}
      - GF_SMTP_FROM_NAME=${GF_SMTP_FROM_NAME:-Grafana}
      - GF_SMTP_HOST=${GF_SMTP_HOST:-smtp:25}
      - GF_SMTP_USER=${GF_SMTP_USER}
      - GF_SMTP_PASSWORD=${GF_SMTP_PASSWORD}
    volumes:
      - /var/data/runtime/swarmprom/grafana:/var/lib/grafana
    deploy:
      mode: replicated
      replicas: 1
      placement:
        constraints:
          - node.role == manager
      resources:
        limits:
          memory: 128M
        reservations:
          memory: 64M

  grafana-proxy:
    image: a5huynh/oauth2_proxy
    env_file: /var/data/config/swarmprom/grafana.env
    networks:
      - internal
      - traefik_public
    deploy:
      labels:
        - traefik.frontend.rule=Host:grafana.swarmprom.example.com
        - traefik.docker.network=traefik_public
        - traefik.port=4180
    volumes:
      - /var/data/config/swarmprom/authenticated-emails.txt:/authenticated-emails.txt
    command: |
      -cookie-secure=false
      -upstream=http://grafana:3000
      -redirect-url=https://grafana.swarmprom.example.com
      -http-address=http://0.0.0.0:4180
      -email-domain=example.com
      -provider=github
      -authenticated-emails-file=/authenticated-emails.txt

  alertmanager:
    image: stefanprodan/swarmprom-alertmanager:v0.14.0
    networks:
      - internal
    environment:
      - SLACK_URL=${SLACK_URL:-https://hooks.slack.com/services/TOKEN}
      - SLACK_CHANNEL=${SLACK_CHANNEL:-general}
      - SLACK_USER=${SLACK_USER:-alertmanager}
    command:
      - '--config.file=/etc/alertmanager/alertmanager.yml'
      - '--storage.path=/alertmanager'
    volumes:
      - /var/data/runtime/swarmprom/alertmanager:/alertmanager
    deploy:
      mode: replicated
      replicas: 1
      placement:
        constraints:
          - node.role == manager
      resources:
        limits:
          memory: 128M
        reservations:
          memory: 64M

  alertmanager-proxy:
    image: a5huynh/oauth2_proxy
    env_file: /var/data/config/swarmprom/alertmanager.env
    networks:
      - internal
      - traefik_public
    deploy:
      labels:
        - traefik.frontend.rule=Host:alertmanager.swarmprom.example.com
        - traefik.docker.network=traefik_public
        - traefik.port=4180
    volumes:
      - /var/data/config/swarmprom/authenticated-emails.txt:/authenticated-emails.txt
    command: |
      -cookie-secure=false
      -upstream=http://alertmanager:9093
      -redirect-url=https://alertmanager.swarmprom.example.com
      -http-address=http://0.0.0.0:4180
      -email-domain=example.com
      -provider=github
      -authenticated-emails-file=/authenticated-emails.txt

  unsee:
    image: cloudflare/unsee:v0.8.0
    networks:
      - internal
    environment:
      - "ALERTMANAGER_URIS=default:http://alertmanager:9093"
    deploy:
      mode: replicated
      replicas: 1

  unsee-proxy:
    image: a5huynh/oauth2_proxy
    env_file: /var/data/config/swarmprom/unsee.env
    networks:
      - internal
      - traefik_public
    deploy:
      labels:
        - traefik.frontend.rule=Host:unsee.swarmprom.example.com
        - traefik.docker.network=traefik_public
        - traefik.port=4180
    volumes:
      - /var/data/config/swarmprom/authenticated-emails.txt:/authenticated-emails.txt
    command: |
      -cookie-secure=false
      -upstream=http://unsee:8080
      -redirect-url=https://unsee.swarmprom.example.com
      -http-address=http://0.0.0.0:4180
      -email-domain=example.com
      -provider=github
      -authenticated-emails-file=/authenticated-emails.txt

  node-exporter:
    image: stefanprodan/swarmprom-node-exporter:v0.16.0
    networks:
      - internal
    environment:
      - NODE_ID={{.Node.ID}}
    volumes:
      - /proc:/host/proc:ro
      - /sys:/host/sys:ro
      - /:/rootfs:ro
      - /etc/hostname:/etc/nodename
    command:
      - '--path.sysfs=/host/sys'
      - '--path.procfs=/host/proc'
      - '--collector.textfile.directory=/etc/node-exporter/'
      - '--collector.filesystem.ignored-mount-points=^/(sys|proc|dev|host|etc)($$|/)'
      # no collectors are explicitly enabled here, because the defaults are just fine,
      # see https://github.com/prometheus/node_exporter
      # disable ipvs collector because it barfs the node-exporter logs full with errors on my centos 7 vm's
      - '--no-collector.ipvs'
    deploy:
      mode: global
      resources:
        limits:
          memory: 128M
        reservations:
          memory: 64M

  prometheus:
    image: stefanprodan/swarmprom-prometheus:v2.5.0
    networks:
      - internal
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--web.console.libraries=/etc/prometheus/console_libraries'
      - '--web.console.templates=/etc/prometheus/consoles'
      - '--storage.tsdb.path=/prometheus'
      - '--storage.tsdb.retention=24h'
    volumes:
      - /var/data/runtime/swarmprom/prometheus:/prometheus
    configs:
      - source: node_rules
        target: /etc/prometheus/swarm_node.rules.yml
      - source: task_rules
        target: /etc/prometheus/swarm_task.rules.yml
    deploy:
      mode: replicated
      replicas: 1
      placement:
        constraints:
          - node.role == manager
      resources:
        limits:
          memory: 2048M
        reservations:
          memory: 128M

  prometheus-proxy:
    image: a5huynh/oauth2_proxy
    env_file: /var/data/config/swarmprom/prometheus.env
    networks:
      - internal
      - traefik_public
    deploy:
      labels:
        - traefik.frontend.rule=Host:prometheus.swarmprom.example.com
        - traefik.docker.network=traefik_public
        - traefik.port=4180
    volumes:
      - /var/data/config/swarmprom/authenticated-emails.txt:/authenticated-emails.txt
    command: |
      -cookie-secure=false
      -upstream=http://prometheus:9090
      -redirect-url=https://prometheus.swarmprom.example.com
      -http-address=http://0.0.0.0:4180
      -email-domain=example.com
      -provider=github
      -authenticated-emails-file=/authenticated-emails.txt

networks:
  traefik_public:
    external: true
  internal:
    driver: overlay
    ipam:
      config:
        - subnet: 172.16.29.0/24
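Besides grafana.env, the stack definition reads prometheus.env, alertmanager.env, unsee.env, and authenticated-emails.txt from /var/data/config/swarmprom/. A minimal sketch for scaffolding them, assuming the three proxy env files take the same OAUTH2_PROXY_ variables as grafana.env; scaffold_proxy_env is a hypothetical helper, and it leaves existing files alone:

```shell
#!/bin/sh
# scaffold_proxy_env DIR: create placeholder env files for the oauth2_proxy
# sidecars that do not share grafana.env, plus an empty email allow-list.
# Assumption: each proxy needs only the three OAUTH2_PROXY_ variables below.
scaffold_proxy_env() {
  dir="${1:?usage: scaffold_proxy_env DIR}"
  mkdir -p "$dir"
  for svc in prometheus alertmanager unsee; do
    f="$dir/$svc.env"
    # Never clobber a file that has already been filled in.
    [ -e "$f" ] && continue
    printf '%s\n' \
      'OAUTH2_PROXY_CLIENT_ID=' \
      'OAUTH2_PROXY_CLIENT_SECRET=' \
      'OAUTH2_PROXY_COOKIE_SECRET=' > "$f"
  done
  # File read by -authenticated-emails-file; one address per line.
  [ -e "$dir/authenticated-emails.txt" ] || : > "$dir/authenticated-emails.txt"
}
```

Run scaffold_proxy_env /var/data/config/swarmprom, then fill in the blank values and add your own address to authenticated-emails.txt before deploying.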
Note
Set up unique static subnets for every stack you deploy. This avoids IP/gateway conflicts, which can otherwise occur when you create and remove stacks frequently. See my list here.
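One way to check a subnet before assigning it is to compare it against the networks Docker already knows about. A minimal sketch, assuming an exact-match comparison is enough (it does not detect overlapping ranges of different sizes); subnet_in_use is a hypothetical helper that reads "name subnet" pairs on stdin:

```shell
#!/bin/sh
# subnet_in_use SUBNET: read "name subnet" lines on stdin and succeed
# (exit 0) if SUBNET is already listed. Exact string match only.
subnet_in_use() {
  awk -v want="$1" '$2 == want { found = 1 } END { exit !found }'
}

# Example, on a swarm node (not run here):
#   docker network ls -q \
#     | xargs docker network inspect \
#         --format '{{.Name}} {{range .IPAM.Config}}{{.Subnet}}{{end}}' \
#     | subnet_in_use 172.16.29.0/24 && echo "172.16.29.0/24 is already taken"
```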
Serving
Launch Swarmprom stack
Launch the Swarm stack by running docker stack deploy swarmprom -c <path-to-docker-compose.yml>
Log into your new Grafana instance and check out your beautiful graphs. Then move on to drooling over Prometheus, AlertManager, and Unsee.
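If one of the UIs does not load, it helps to check each hostname in turn. A minimal sketch, assuming your hostnames follow the <ui>.swarmprom.<domain> pattern used in the Traefik labels of the stack definition; swarmprom_hosts is a hypothetical helper:

```shell
#!/bin/sh
# swarmprom_hosts DOMAIN: print the four UI hostnames, one per line,
# following the <ui>.swarmprom.<domain> pattern from the Traefik labels.
swarmprom_hosts() {
  domain="${1:?usage: swarmprom_hosts DOMAIN}"
  for ui in grafana prometheus alertmanager unsee; do
    printf '%s.swarmprom.%s\n' "$ui" "$domain"
  done
}

# Example (not run here): show the first HTTP response line for each UI.
#   for h in $(swarmprom_hosts example.com); do
#     printf '%s: ' "$h"; curl -sI "https://$h" | head -n 1
#   done
```

A redirect to your OAuth provider's login page is the expected response for an unauthenticated request, since every UI sits behind an oauth_proxy container.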
Chef's notes 📓
Pay close attention to the grafana.env config. If you encounter errors about failed basic auth, or broken CSS, the likely cause is a misconfigured Grafana environment variable.