complete monitoring

17/04/2020

A complete monitoring solution collects logs and metrics from multiple servers, sends them to a central server for aggregation and storage, and provides visualisation and alerting tools.

all in one : netdata

It automatically collects a huge range of metrics but no logs, and it doesn't keep more than a few hours of data.
This is why a more complete stack is often needed.

collectors

Collectors need to be installed on every server; they forward data to the master server (database).

promtail reads logs and pushes them to loki

filebeat reads logs and pushes them to elasticsearch

metricbeat reads metrics and pushes them to elasticsearch

telegraf reads metrics and pushes them to influxdb

databases

Databases need to be installed on a master server to gather data from every source.

influxdb is a metric database

loki is a log database (light)

elasticsearch is a log database (heavy)

visualisation

Visualisation tools need to connect to the databases.

grafana can visualize almost every datasource (influxdb, elasticsearch, loki, mysql, ...).

kibana can visualize an elasticsearch database

chronograf can visualize an influxdb database

alerting

Still to be tested: kapacitor / alertmanager / kibana?

stacks

These tools are usually organised in stacks:

  • TICK (metrics) : telegraf => influxdb => chronograf => kapacitor
  • prometheus stack (metrics) : prometheus => grafana + alertmanager
  • ELK (logs) : filebeat + metricbeat + logstash => elasticsearch => kibana
  • Loki (logs) : promtail => loki => grafana

loki

healthcheck

curl -s http://localhost:3100/ready
curl -s http://localhost:3100/metrics

querying

curl -s "http://localhost:3100/loki/api/v1/labels"
curl -s "http://localhost:3100/loki/api/v1/label/filename/values"
curl -G -s  "http://localhost:3100/loki/api/v1/query" --data-urlencode 'query={job="varlogs"}' | jq
curl -G -s  "http://localhost:3100/loki/api/v1/query" --data-urlencode 'query={stream=~".+"}' | jq
curl -G -H 'Sec-WebSocket-Version: 13' -H 'Sec-WebSocket-Extensions: permessage-deflate' -H 'Sec-WebSocket-Key: v4vMUSLqpDDrrvhrCqfE+Q==' -H 'Connection: keep-alive, Upgrade' -H 'Upgrade: websocket' 'http://localhost:3100/loki/api/v1/tail' --data-urlencode 'query={job="varlogs"}'
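
The queries above hit the instant query endpoint. To read log lines over a time window, Loki also exposes /loki/api/v1/query_range, which takes start and end as nanosecond unix timestamps. Recent Loki releases may refuse log selectors on the instant endpoint, so this form is usually the safer one. A minimal sketch follows; the helper names ns_ago and loki_range and the LOKI_URL variable are made up for these notes and are not part of Loki.

```shell
# Base URL of the Loki server (assumption: same host/port as the notes above).
LOKI_URL="${LOKI_URL:-http://localhost:3100}"

# Print a nanosecond unix timestamp <seconds> before <now>.
# Usage: ns_ago <seconds> [now_epoch_seconds]   (now defaults to the current time)
ns_ago() {
  now="${2:-$(date +%s)}"
  echo "$(( now - $1 ))000000000"
}

# Fetch up to 100 log lines for a selector over the last <seconds> seconds.
# Usage: loki_range '<logql selector>' [seconds]   (seconds defaults to 3600)
loki_range() {
  curl -G -s "$LOKI_URL/loki/api/v1/query_range" \
    --data-urlencode "query=$1" \
    --data-urlencode "start=$(ns_ago "${2:-3600}")" \
    --data-urlencode "end=$(ns_ago 0)" \
    --data-urlencode "limit=100"
}
```

Example: loki_range '{job="varlogs"}' 900 | jq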

inserting

curl -v -H "Content-Type: application/json" -XPOST -s "http://localhost:3100/loki/api/v1/push" --data-raw '{"streams": [{ "stream": { "foo": "bar2" }, "values": [ [ "1570818238000000000", "fizzbuzz" ] ] }]}'
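
The timestamp in the push payload is a nanosecond unix epoch, and depending on its limits configuration Loki may reject entries that are too old, so a hard-coded value tends to stop working over time. A minimal sketch that stamps the entry with the current time is shown here. The function names loki_payload and loki_push are hypothetical, and the message is not JSON-escaped, so it is assumed to contain no quotes or backslashes.

```shell
# Build a Loki push payload for a single log line.
# Usage: loki_payload <value of label foo> <message> [timestamp_ns]
# Assumption: <message> contains no characters that need JSON escaping.
loki_payload() {
  ts="${3:-$(date +%s)000000000}"   # default: current time in nanoseconds
  printf '{"streams": [{ "stream": { "foo": "%s" }, "values": [ [ "%s", "%s" ] ] }]}\n' \
    "$1" "$ts" "$2"
}

# Send the payload to the push endpoint used above.
loki_push() {
  curl -s -H "Content-Type: application/json" -XPOST \
    "http://localhost:3100/loki/api/v1/push" \
    --data-raw "$(loki_payload "$@")"
}
```

Example: loki_push bar2 "fizzbuzz"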

influxdb

healthcheck

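curl -i http://localhost:8086/ping
curl http://localhost:8086/health

/ping answers with 204 No Content; /health is available on influxdb 1.8 and later.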

create database

curl -XPOST 'http://localhost:8086/query' --data-urlencode 'q=CREATE DATABASE "mydb"'

insert

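curl -i -X POST 'http://localhost:8086/write?db=mydb&precision=ms&u=root&p=superpassword' --data-binary 'grostest value=0.64 1582032137924'
curl -i -X POST 'http://localhost:8086/write?db=mydb&precision=ms&u=root&p=superpassword' --data-binary 'grostest value=0.64 1582032137934'
curl -i -X POST 'http://localhost:8086/write?db=mydb&precision=ms&u=root&p=superpassword' --data-binary 'grostest value=0.64 1582031716000'

Timestamps are unix epochs, in nanoseconds unless precision is given (hence precision=ms here). The line protocol does not accept dates such as 2020-02-18T13:15:16Z, so that point is written as 1582031716000.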
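
Since the write endpoint only takes integer epoch timestamps, a small helper can build the points. This is a sketch under the assumptions of these notes (database mydb, a single field named value, millisecond precision); influx_line and influx_write are hypothetical names, not influxdb commands.

```shell
# Print one line-protocol point with a millisecond timestamp.
# Usage: influx_line <measurement> <value> [epoch_seconds]
influx_line() {
  secs="${3:-$(date +%s)}"            # default: current time
  printf '%s value=%s %s000\n' "$1" "$2" "$secs"
}

# Write the point; precision=ms tells influxdb how to read the timestamp.
influx_write() {
  curl -i -X POST 'http://localhost:8086/write?db=mydb&precision=ms&u=root&p=superpassword' \
    --data-binary "$(influx_line "$@")"
}
```

Example: influx_write grostest 0.64 1582031716
With GNU date, a readable date can be converted first: date -u -d '2020-02-18T13:15:16Z' +%s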

query

curl -G 'http://localhost:8086/query?db=mydb&u=root&p=superpassword' --data-urlencode "q=SHOW MEASUREMENTS"
curl -G 'http://localhost:8086/query?db=mydb&u=root&p=superpassword' --data-urlencode "q=SELECT count(*) FROM grostest"

portainer

healthcheck

curl http://localhost:9000/api/status

generate a password

docker run --rm httpd:2.4-alpine htpasswd -nbB admin 'superpassword' | cut -d ":" -f 2

ps: when using the generated hash in docker compose, don't forget to double every $ (write $$), otherwise compose interpolates $xxxx as an environment variable.
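
The doubling can be done with sed instead of by hand. A minimal sketch; escape_compose is a hypothetical helper name.

```shell
# Double every "$" on stdin so docker compose keeps it literal.
escape_compose() {
  sed 's/\$/$$/g'
}
```

Example: docker run --rm httpd:2.4-alpine htpasswd -nbB admin 'superpassword' | cut -d ":" -f 2 | escape_compose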

prometheus

healthcheck

http://localhost:9090/-/healthy
http://localhost:9090/-/ready

grafana

healthcheck

http://localhost:3000/api/health
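
all healthchecks at once

The individual checks can be grouped in one script. This is only a sketch: the ports are the defaults used in these notes, the influxdb and prometheus entries use their /ping and /-/healthy endpoints, and health_urls and check_all are hypothetical names.

```shell
# One "name|url" pair per service (assumption: everything runs on localhost).
health_urls() {
  cat <<'EOF'
loki|http://localhost:3100/ready
influxdb|http://localhost:8086/ping
portainer|http://localhost:9000/api/status
prometheus|http://localhost:9090/-/healthy
grafana|http://localhost:3000/api/health
EOF
}

# Print OK or DOWN for each service; return non-zero if any check fails.
check_all() {
  rc=0
  while IFS='|' read -r name url; do
    # -f makes curl fail on HTTP errors, --max-time avoids hanging
    if curl -fs -o /dev/null --max-time 5 "$url"; then
      echo "OK   $name"
    else
      echo "DOWN $name"
      rc=1
    fi
  done <<EOF
$(health_urls)
EOF
  return $rc
}
```

Example: check_all || echo "at least one service is down"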