Collecting metrics with Prometheus

Introduction

  • Prometheus is an open-source monitoring system including:

    • multiple service discovery backends to figure out which metrics to collect

    • a scraper to collect these metrics

    • an efficient time series database to store these metrics

    • a specific query language (PromQL) to query these time series

    • an alert manager to notify us according to metrics values or trends

  • We are going to deploy it on our Kubernetes cluster and see how to query it

Why Prometheus?

  • We don't endorse Prometheus more or less than any other system

  • It's relatively well integrated within the Cloud Native ecosystem

  • It can be self-hosted (this is useful for tutorials like this)

  • It can be used for deployments of varying complexity:

    • one binary and 10 lines of configuration to get started

    • all the way to thousands of nodes and millions of metrics

Exposing metrics to Prometheus

  • Prometheus obtains metrics and their values by querying exporters

  • An exporter serves metrics over HTTP, in plain text (a short sample follows this list)

  • This is what the node exporter looks like:

    http://demo.robustperception.io:9100/metrics

  • Prometheus itself exposes its own internal metrics, too:

    http://demo.robustperception.io:9090/metrics

  • If you want to expose custom metrics to Prometheus:

    • serve a text page like these, and you're good to go

    • libraries are available in various languages to help with quantiles etc.
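
For illustration, here is what a couple of lines of this plain text format can look like, using the classic http_requests_total example (metric name and labels are illustrative; the values are made up):

# HELP http_requests_total Total number of HTTP requests handled.
# TYPE http_requests_total counter
http_requests_total{method="get",code="200"} 1027
http_requests_total{method="get",code="500"} 3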

How Prometheus gets these metrics

  • The Prometheus server will scrape URLs like these at regular intervals

    (by default: every minute; this can be adjusted in the configuration, as shown after this list)

  • If you're worried about parsing overhead: exporters can also use protobuf

  • The list of URLs to scrape (the scrape targets) is defined in configuration
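
Note that the scrape interval itself is also set in the configuration; for instance, to scrape every 30 seconds globally:

global:
  scrape_interval: 30s    # the default is 1m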

Defining scrape targets

This is maybe the simplest configuration file for Prometheus:

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
  • In this configuration, Prometheus collects its own internal metrics

  • A typical configuration file will have multiple scrape_configs

  • In this configuration, the list of targets is fixed

  • A typical configuration file will use dynamic service discovery

Service discovery

This configuration file will leverage existing DNS A records:

scrape_configs:
  - ...
  - job_name: 'node'
    dns_sd_configs:
      - names: ['api-backends.dc-paris-2.enix.io']
        type: 'A'
        port: 9100
  • In this configuration, Prometheus resolves the provided name(s)

    (here, api-backends.dc-paris-2.enix.io)

  • Each resulting IP address is added as a target on port 9100

Dynamic service discovery

  • In the DNS example, the names are re-resolved at regular intervals

  • As DNS records are created/updated/removed, scrape targets change as well

  • Existing data (previously collected metrics) is not deleted

  • Other service discovery backends work in a similar fashion

Other service discovery mechanisms

  • Prometheus can connect to e.g. a cloud API to list instances

  • Or to the Kubernetes API to list nodes, pods, services ... (see the sketch after this list)

  • Or a service like Consul, Zookeeper, etcd, to list applications

  • The resulting configuration files are way more complex

    (but don't worry, we won't need to write them ourselves)
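
For the curious, here is roughly the smallest possible scrape_config using the Kubernetes backend (a real-world one also needs TLS/authentication settings and relabel_configs, which is where most of the complexity comes from):

scrape_configs:
  - job_name: 'kubernetes-nodes'
    kubernetes_sd_configs:
      - role: node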

Time series database

  • We could wonder, "why do we need a specialized database?"

  • One metrics data point = metrics ID + timestamp + value

  • With a classic SQL or NoSQL data store, that's at least 160 bits of data + indexes

    (e.g. a 64-bit timestamp + a 64-bit value + a 32-bit metrics ID)

  • Prometheus is way more efficient, without sacrificing performance

    (it will even be gentler on the I/O subsystem since it needs to write less)

For more details about the storage engine, see the talk "Storage in Prometheus 2.0" by Goutham V at DC17EU.

Running Prometheus on our cluster

We need to:

  • Run the Prometheus server in a pod

    (using e.g. a Deployment to ensure that it keeps running)

  • Expose the Prometheus server web UI (e.g. with a NodePort)

  • Run the node exporter on each node (with a Daemon Set; a sketch follows this list)

  • Set up a Service Account so that Prometheus can query the Kubernetes API

  • Configure the Prometheus server

    (storing the configuration in a Config Map for easy updates)
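
For reference, here is a minimal sketch of what such a node exporter Daemon Set could look like (assuming the prom/node-exporter image; the Helm Chart used below generates a more complete version for us, e.g. one that also mounts /proc and /sys from the host for accurate node metrics):

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-exporter
spec:
  selector:
    matchLabels:
      app: node-exporter
  template:
    metadata:
      labels:
        app: node-exporter
    spec:
      hostNetwork: true          # expose port 9100 directly on each node
      containers:
      - name: node-exporter
        image: prom/node-exporter
        ports:
        - containerPort: 9100
          name: metrics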

Helm Charts to the rescue

  • To make our lives easier, we are going to use a Helm Chart

  • The Helm Chart will take care of all the steps explained above

    (including some extra features that we don't need, but won't hurt)

Step 1: install Helm

  • If we already installed Helm earlier, these commands won't break anything

Exercise

  • Install Tiller (Helm's server-side component) on our cluster:

    helm init
    
  • Give Tiller permission to deploy things on our cluster:

    kubectl create clusterrolebinding add-on-cluster-admin \
        --clusterrole=cluster-admin --serviceaccount=kube-system:default
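
  • Optionally, verify that Tiller came up (helm version should report both a Client and a Server version):

    helm version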
    

Step 2: install Prometheus

  • Skip this if we already installed Prometheus earlier

    (when in doubt, check with helm list)

Exercise

  • Install Prometheus on our cluster:

    helm install stable/prometheus \
           --set server.service.type=NodePort \
           --set server.persistentVolume.enabled=false
    

The provided flags:

  • expose the server web UI (and API) on a NodePort

  • use an ephemeral volume for metrics storage
    (instead of requesting a Persistent Volume through a Persistent Volume Claim)
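
  • To check that the chart deployed correctly, we can list the pods that it created (their names should contain prometheus):

    kubectl get pods | grep prometheus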

Connecting to the Prometheus web UI

  • Let's connect to the web UI and see what we can do

Exercise

  • Figure out the NodePort that was allocated to the Prometheus server:

    kubectl get svc | grep prometheus-server
    
  • With your browser, connect to that port

Querying some metrics

  • This is easy ... if you are familiar with PromQL

Exercise

  • Click on "Graph", and in "expression", paste the following:

      sum by (instance) (
        irate(
          container_cpu_usage_seconds_total{
            pod_name=~"worker.*"
          }[5m]
        )
      )
    
  • Click on the blue "Execute" button and on the "Graph" tab just below

  • We see the combined CPU usage of worker pods, for each node
    (if we just deployed Prometheus, there won't be much data to see, though)

Getting started with PromQL

  • We can't learn PromQL in just 5 minutes

  • But we can cover the basics to get an idea of what is possible

    (and have some keywords and pointers)

  • We are going to break down the query above

    (building it one step at a time)

Graphing one metric across all tags

This query will show us CPU usage across all containers:

container_cpu_usage_seconds_total
  • The suffix of the metrics name tells us:

    • the unit (seconds of CPU)

    • that it's the total used since the container creation

  • Since it's a "total", it is an increasing quantity

    (we need to compute the derivative if we want e.g. CPU % over time)

  • We see that the metrics retrieved have tags attached to them
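
For illustration, one of the time series returned could look like this (the tags shown here are a made-up subset; the exact set depends on the exporter and on our cluster):

container_cpu_usage_seconds_total{instance="node1",namespace="default",pod_name="worker-7d9c4b8f6-xk2lp"}  1234.56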

Selecting metrics with tags

This query will show us only metrics for worker containers:

container_cpu_usage_seconds_total{pod_name=~"worker.*"}
  • The =~ operator allows regex matching (a few other matchers are shown after this list)

  • We select all the pods with a name starting with worker

    (it would be better to use labels to select pods; more on that later)

  • The result is a smaller set of containers
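
Besides =~, a few other label matchers are available (illustrative queries, using the same metric):

container_cpu_usage_seconds_total{namespace="default"}        # exact match
container_cpu_usage_seconds_total{namespace!="kube-system"}   # negated exact match
container_cpu_usage_seconds_total{pod_name!~"worker.*"}       # negated regex match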

Transforming counters into rates

This query will show us CPU usage % instead of total seconds used:

100*irate(container_cpu_usage_seconds_total{pod_name=~"worker.*"}[5m])
  • The irate operator computes the "per-second instant rate of increase"

    • it only looks at the last two samples; rate is similar, but averages over the whole time window

    • both compensate for counter resets: if a counter goes back to zero, we don't get a negative spike

  • The [5m] tells how far to look back if there is a gap in the data

  • And we multiply by 100 to get CPU % usage
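
For comparison, replacing irate with rate averages over the whole 5-minute window, which gives a smoother (but less reactive) graph:

100*rate(container_cpu_usage_seconds_total{pod_name=~"worker.*"}[5m])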

Aggregation operators

This query sums the CPU usage per node:

sum by (instance) (
  irate(container_cpu_usage_seconds_total{pod_name=~"worker.*"}[5m])
)
  • instance corresponds to the node on which the container is running

  • sum by (instance) (...) computes the sum for each instance

  • Note: all the other tags are collapsed

    (in other words, the resulting graph only shows the instance tag)

  • PromQL supports many more aggregation operators
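
A couple of examples, reusing the same expression (to be tried one at a time):

# maximum (rather than summed) CPU usage per node
max by (instance) (irate(container_cpu_usage_seconds_total{pod_name=~"worker.*"}[5m]))

# the 3 containers using the most CPU
topk(3, irate(container_cpu_usage_seconds_total[5m]))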

What kind of metrics can we collect?

  • Node metrics (related to physical or virtual machines)

  • Container metrics (resource usage per container)

  • Databases, message queues, load balancers, ...

    (check out this list of exporters!)

  • Instrumentation (=deluxe printf for our code)

  • Business metrics (customers served, revenue, ...)

Node metrics

  • CPU, RAM, disk usage on the whole node

  • Total number of processes running, and their states

  • Number of open files, sockets, and their states

  • I/O activity (disk, network), per operation or volume

  • Physical/hardware (when applicable): temperature, fan speed ...

  • ... and much more!

Container metrics

  • Similar to node metrics, but not totally identical

  • RAM breakdown will be different

    • active vs inactive memory
    • some memory is shared between containers, and accounted specially
  • I/O activity is also harder to track

    • async writes can cause deferred "charges"
    • some page-ins are also shared between containers

For details about container metrics, see:
http://jpetazzo.github.io/2013/10/08/docker-containers-metrics/

Application metrics

  • Arbitrary metrics related to your application and business

  • System performance: request latency, error rate ...

  • Volume information: number of rows in database, message queue size ...

  • Business data: inventory, items sold, revenue ...

Detecting scrape targets

  • Prometheus can leverage Kubernetes service discovery

    (with proper configuration)

  • Services or pods can be annotated with:

    • prometheus.io/scrape: true to enable scraping
    • prometheus.io/port: 9090 to indicate the port number
    • prometheus.io/path: /metrics to indicate the URI (/metrics by default)
  • Prometheus will detect and scrape these (without needing a restart or reload)
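
For instance, a Service for a hypothetical myapp application exposing metrics on port 8080 could be annotated like this (note that annotation values must be strings, hence the quotes):

apiVersion: v1
kind: Service
metadata:
  name: myapp
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/port: "8080"
spec:
  selector:
    app: myapp
  ports:
  - port: 8080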

Querying labels

  • What if we want to get metrics for containers belonging to pods labeled worker?

  • The cAdvisor exporter does not give us Kubernetes labels

  • Kubernetes labels are exposed through another exporter (kube-state-metrics)

  • We can see Kubernetes labels through the kube_pod_labels metrics

    (each pod appears as a time series with a constant value of 1)

  • Prometheus kind of supports "joins" between time series

  • But only if the names of the tags match exactly

Unfortunately ...

  • The cAdvisor exporter uses tag pod_name for the name of a pod

  • The Kubernetes service endpoints exporter uses tag pod instead

  • Alas, Prometheus cannot "join" time series with different labels

    (see Prometheus issue #2204 for the rationale)

  • There is a workaround involving relabeling, but it's "not cheap"

  • See this blog post or this other one for examples of how to perform such "joins"
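
To give an idea, such a "join" could look like the (untested) sketch below: label_replace copies the pod_name tag into a pod tag, and multiplying by kube_pod_labels (whose value is always 1) keeps only the containers whose pod carries an app=worker Kubernetes label. This assumes that kube-state-metrics exposes a label_app tag for these pods and that pod names are unique:

sum by (instance) (
  label_replace(
    irate(container_cpu_usage_seconds_total[5m]),
    "pod", "$1", "pod_name", "(.+)"
  )
  * on (pod) group_left(label_app)
  kube_pod_labels{label_app="worker"}
)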