In a production Kubernetes deployment it is important to be able to monitor the status of the cluster effectively. The Kubernetes ecosystem offers Prometheus to gather metrics from the different components of Kubernetes, and Grafana to query those metrics and provide real-time plotting and inspection. Both also provide ways to send alerts when some condition on the state of the cluster is met, e.g. when more than 90% of RAM or CPU is in use.
The only downside is that the pods that handle monitoring consume some resources themselves. This can be significant for small clusters of fewer than 5 nodes or so, but shouldn't be a problem for typical larger production deployments.
Both Prometheus and Grafana can be installed separately with Helm recipes or using the Prometheus operator Helm recipe; however, those deployments do not have any preconfigured dashboards. It is easier to get started thanks to the kube-prometheus project, which not only installs Prometheus and Grafana, but also preconfigures about 10 different Grafana dashboards to explore in depth the status of a Kubernetes cluster.
The main issue is that customizing it is really complicated: it requires modifying jsonnet templates and recompiling them with a jsonnet builder, which in turn requires Go. However, I don't foresee the need to do that for most users.
Unfortunately it is not based on Helm, so you first need to check out the repository:
git clone https://github.com/coreos/kube-prometheus
and then follow the instructions in the documentation, copied here for convenience:
kubectl create -f manifests/
Wait a moment, and do not worry if some of the steps fail; they should get fixed by running:
kubectl apply -f manifests/
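Since the first pass can fail while some resources are still being registered, the re-run can be wrapped in a small retry loop. The sketch below is only an illustration: `retry` is a hypothetical helper name (not part of kubectl or kube-prometheus), and the attempt count and delay in the example are arbitrary assumptions.

```shell
# retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Re-runs COMMAND until it succeeds or MAX_ATTEMPTS is reached.
# "retry" is a hypothetical helper, not a kubectl feature.
retry() {
  local max_attempts=$1
  local delay=$2
  local attempt=1
  shift 2
  until "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after $attempt attempts: $*" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
}

# Example, run from the root of the kube-prometheus checkout:
#   retry 5 10 kubectl apply -f manifests/
```

With these example values the apply step is attempted up to five times, ten seconds apart, and the helper returns a non-zero status if every attempt fails.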
This creates several pods in the monitoring namespace:
kubectl get pods -n monitoring
NAME                                   READY   STATUS    RESTARTS   AGE
alertmanager-main-0                    2/2     Running   0          13m
alertmanager-main-1                    2/2     Running   0          13m
alertmanager-main-2                    2/2     Running   0          13m
grafana-9d97dfdc7-zkfft                1/1     Running   0          14m
kube-state-metrics-7c7979b6bc-srcvk    4/4     Running   0          12m
node-exporter-b6n2w                    2/2     Running   0          14m
node-exporter-cgp46                    2/2     Running   0          14m
prometheus-adapter-b7d894c9c-z2ph7     1/1     Running   0          14m
prometheus-k8s-0                       3/3     Running   1          13m
prometheus-k8s-1                       3/3     Running   1          13m
prometheus-operator-65c44fb7b7-8ltzs   1/1     Running   0          14m
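Instead of polling the pod list by hand, `kubectl wait` can block until the pods report the Ready condition. This is a minimal sketch: `wait_for_pods` is a hypothetical wrapper, and the default namespace and the 300s timeout are assumptions chosen for illustration.

```shell
# Block until every pod in a namespace reports the Ready condition.
# wait_for_pods is a hypothetical helper; the defaults below are assumptions.
wait_for_pods() {
  namespace=${1:-monitoring}
  timeout=${2:-300s}
  kubectl wait --for=condition=Ready pods --all \
    --namespace "$namespace" --timeout="$timeout"
}

# Example:
#   wait_for_pods monitoring 600s
```

The command exits with a non-zero status if the timeout expires first, so it can gate later steps in a script.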
Then you can set up port forwarding on your laptop to expose Grafana locally:
kubectl --namespace monitoring port-forward svc/grafana 3000
Open localhost:3000 with your browser and you should be able to navigate through all the statistics of your cluster,
see for example this screenshot. The credentials are user
admin and password
Access the UI from a different machine
In case you are running the configuration on a remote server and you would like to access the Grafana UI (or any other service) from your laptop, you can install kubectl on your laptop as well, then copy the .kube/config to the laptop with:
scp -r KUBECTLMACHINE:~/.kube/config ~/.kube
and run:
ssh ubuntu@$IP -f -L 6443:localhost:6443 sleep 3h &
from the laptop and then run the
port-forward command locally on the laptop.
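The tunnel only helps if the copied kubeconfig addresses the API server through the forwarded local port. The sketch below checks that; it assumes the API server listens on port 6443 as in the ssh command, and the function names are hypothetical. If the kubeconfig lists a different address, it would need to be adjusted, and the server certificate must also be valid for the name used.

```shell
# Print the API server URL of the current kubeconfig context.
current_api_server() {
  kubectl config view --minify -o jsonpath='{.clusters[0].cluster.server}'
}

# Succeed only if that URL points at the local end of the SSH tunnel.
# Port 6443 is taken from the ssh command; the helper names are hypothetical.
uses_local_tunnel() {
  case "$(current_api_server)" in
    https://localhost:6443|https://127.0.0.1:6443) return 0 ;;
    *) return 1 ;;
  esac
}

# Example:
#   uses_local_tunnel || echo "kubeconfig does not point at the tunnel" >&2
```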
Once we have deployed JupyterHub with Helm, we can pull up the "namespace" dashboard and select the jhub namespace to visualize resource usage, as well as the resource requests and limits, of all pods created by JupyterHub and its users. See a screenshot below.
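The requests and limits shown in the dashboard can also be cross-checked from the command line. This is a rough sketch using kubectl custom columns; `pod_resources` is a hypothetical helper and the column names are arbitrary labels. Pods with several containers show one comma-separated value per container.

```shell
# List CPU and memory requests and limits for every pod in a namespace.
# pod_resources is a hypothetical helper; jhub is the namespace used in this post.
pod_resources() {
  namespace=${1:-jhub}
  columns='NAME:.metadata.name'
  columns="$columns,CPU_REQ:.spec.containers[*].resources.requests.cpu"
  columns="$columns,MEM_REQ:.spec.containers[*].resources.requests.memory"
  columns="$columns,CPU_LIM:.spec.containers[*].resources.limits.cpu"
  columns="$columns,MEM_LIM:.spec.containers[*].resources.limits.memory"
  kubectl get pods --namespace "$namespace" -o custom-columns="$columns"
}

# Example:
#   pod_resources jhub
```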
Grafana supports email alerts, but it needs an SMTP server, which is not easy to set up, and it is hard to keep the messages from being filtered as spam. The easiest way is to set up an alert to Slack, and optionally be notified via email of Slack messages.
- Create a Slack app, name it e.g. Grafana
- Add feature "Incoming webhook"
- Create an incoming webhook in the workspace and channel you prefer on Slack
- In the Grafana Alerting menu, set the incoming webhook URL and the channel name
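Before relying on Grafana to deliver alerts, the webhook itself can be tested with a plain HTTP POST. The sketch below assumes the webhook URL is stored in an environment variable named `SLACK_WEBHOOK_URL` (a name chosen here for illustration); the helper functions are hypothetical and escape only backslashes and double quotes, so they are not a general JSON encoder.

```shell
# Build the JSON payload for a plain-text Slack message.
# Only backslashes and double quotes are escaped; this is a sketch,
# not a full JSON encoder.
slack_payload() {
  escaped=$(printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')
  printf '{"text":"%s"}' "$escaped"
}

# POST the message to the incoming webhook.
# SLACK_WEBHOOK_URL is an assumed environment variable holding the webhook URL.
send_slack_test() {
  curl -sS -X POST -H 'Content-type: application/json' \
    --data "$(slack_payload "$1")" "$SLACK_WEBHOOK_URL"
}

# Example:
#   SLACK_WEBHOOK_URL=... send_slack_test "Test message from the monitoring setup"
```

If the message shows up in the chosen channel, the same URL should work when pasted into Grafana.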