---
title: Monitoring Stack Architecture
description: Reference architecture for a monitoring stack in a self-hosted or homelab environment
tags:
  - monitoring
  - observability
  - architecture
category: systems
created: 2026-03-14
updated: 2026-03-14
---

# Monitoring Stack Architecture

## Summary

A monitoring stack architecture defines how metrics, probes, dashboards, and alerts fit together. In self-hosted environments, the stack should stay small enough to operate but broad enough to cover infrastructure, ingress, and critical services.

## Why it matters

Monitoring that is bolted on late often misses the services operators actually depend on. A planned stack architecture makes it easier to understand where signals come from and how alerts reach the right people.

## Core concepts

- Collection: exporters and scrape targets
- Storage and evaluation: Prometheus
- Visualization: Grafana
- Alert routing: Alertmanager
- External validation: `blackbox_exporter` or equivalent endpoint probes that test services from the outside

## Practical usage

Typical architecture:

```text
Hosts and services -> Exporters / probes -> Prometheus
Prometheus -> Grafana dashboards
Prometheus -> Alertmanager -> notification channel
```
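To make the first hop concrete, here is a minimal sketch of what an exporter does: it serves current values over HTTP in the Prometheus text exposition format, and Prometheus scrapes that endpoint on an interval. It uses only the Python standard library. The metric names, the port, and the `collect()` and `render()` helpers are hypothetical; a real deployment would normally run `node_exporter` or use an official client library instead of hand-formatting output.

```python
"""Minimal exporter sketch: serves gauges in the Prometheus text exposition format."""
import os
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Dict, Tuple

# Hypothetical port; choose one that does not clash with other exporters.
LISTEN_PORT = 9900


def collect() -> Dict[str, Tuple[str, float]]:
    """Return {metric_name: (help_text, value)} for the current scrape."""
    load1, load5, _ = os.getloadavg()  # Unix-only
    return {
        "demo_load1": ("1-minute load average.", load1),
        "demo_load5": ("5-minute load average.", load5),
    }


def render(metrics: Dict[str, Tuple[str, float]]) -> str:
    """Format each metric as HELP, TYPE, and sample lines, sorted by name."""
    lines = []
    for name, (help_text, value) in sorted(metrics.items()):
        lines.append(f"# HELP {name} {help_text}")
        lines.append(f"# TYPE {name} gauge")
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render(collect()).encode("utf-8")
        self.send_response(200)
        # Content type of the text-based exposition format, version 0.0.4.
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


if __name__ == "__main__":
    HTTPServer(("0.0.0.0", LISTEN_PORT), MetricsHandler).serve_forever()
```

Pointing a Prometheus scrape job at `host:9900` would then make these series available for dashboards and alert rules. The exposition format is the shared contract, so any exporter or probe that speaks it fits into the first hop of the diagram.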

Recommended coverage:

- Host metrics for compute and storage systems
- Endpoint checks for user-facing services
- Backup freshness and certificate expiry
- Platform services such as DNS, reverse proxy, and identity provider
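Backup freshness and certificate expiry are easiest to alert on when they are exported as ages or remaining time rather than pass/fail flags, because the alert rule can then pick its own threshold. The sketch below assumes that backups land as files whose modification time marks the last successful run, and that certificates are read from a live TLS endpoint with Python's `ssl` module. All function names are hypothetical.

```python
"""Sketch: turn backup files and certificate dates into alertable numbers."""
import socket
import ssl
import time
from pathlib import Path
from typing import Optional


def newest_backup_age_seconds(backup_dir: str, now: Optional[float] = None) -> float:
    """Age in seconds of the most recently modified file in backup_dir.

    Returns infinity when no files exist, so a "backup too old" alert
    fires instead of silently passing.
    """
    now = time.time() if now is None else now
    mtimes = [p.stat().st_mtime for p in Path(backup_dir).iterdir() if p.is_file()]
    if not mtimes:
        return float("inf")
    return now - max(mtimes)


def cert_days_remaining(not_after: str, now: Optional[float] = None) -> float:
    """Days until expiry for a notAfter string such as 'Jun  1 12:00:00 2030 GMT'."""
    now = time.time() if now is None else now
    expires_at = ssl.cert_time_to_seconds(not_after)  # interpreted as UTC
    return (expires_at - now) / 86400


def fetch_not_after(host: str, port: int = 443) -> str:
    """Read notAfter from a TLS endpoint; the certificate must pass verification."""
    ctx = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=5) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]
```

An alert rule would then compare these values against a policy, for example a backup older than 26 hours or fewer than 14 days of certificate validity; those thresholds are illustrative. In a Prometheus stack the same values are commonly obtained without custom code: `blackbox_exporter` TLS probes expose `probe_ssl_earliest_cert_expiry`, and a backup job can write its completion timestamp for the `node_exporter` textfile collector.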

## Best practices

- Monitor the path users depend on, not only the host underneath it
- Keep the monitoring stack itself backed up and access controlled
- Alert on actionable failures rather than every threshold crossing
- Document ownership for critical alerts and dashboards
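One common way to keep alerts actionable is to require a condition to hold continuously before notifying. This is what the `for` field on a Prometheus alerting rule does: a rule whose expression is true becomes pending, and it only fires once it has stayed true for the configured duration. The sketch below models that pending/firing behaviour to show why a brief spike never pages anyone. It illustrates the idea and is not Prometheus's evaluator; `HoldDownAlert` is a hypothetical name.

```python
"""Sketch of pending -> firing behaviour, modelled on an alert rule's `for` duration."""
from dataclasses import dataclass
from typing import Optional


@dataclass
class HoldDownAlert:
    hold_seconds: float                   # how long the condition must stay true
    active_since: Optional[float] = None  # when the condition first became true

    def evaluate(self, condition: bool, now: float) -> str:
        """Return 'inactive', 'pending', or 'firing' for this evaluation."""
        if not condition:
            # Any evaluation where the condition is false resets the timer.
            self.active_since = None
            return "inactive"
        if self.active_since is None:
            self.active_since = now
        if now - self.active_since >= self.hold_seconds:
            return "firing"
        return "pending"


if __name__ == "__main__":
    # Evaluate every 60 s with a 5-minute hold: a spike seen on two evaluations
    # stays pending, while a sustained failure fires 5 minutes after onset.
    rule = HoldDownAlert(hold_seconds=300)
    samples = [True, True, False] + [True] * 6
    for i, failing in enumerate(samples):
        print(i * 60, rule.evaluate(failing, now=i * 60))
```

The trade-off is detection latency: a longer hold suppresses more noise but delays notification of real outages by the same amount, so the duration is best chosen per alert.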

## Pitfalls

- Monitoring only CPU and memory while ignoring ingress and backups
- Running a complex stack with no retention or alert review policy
- Depending on dashboards alone for outage detection
- Forgetting to monitor the monitoring components themselves
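Monitoring the monitoring components needs a check that does not depend on those same components. A minimal sketch, meant to run from a separate host or scheduler, polls the health endpoints that Prometheus (`/-/healthy`), Alertmanager (`/-/healthy`), and Grafana (`/api/health`) expose. The hostname is a placeholder, the ports are each component's default, and `check_stack` is a hypothetical helper whose `fetch` parameter is injectable so the logic can be exercised without a network.

```python
"""Sketch: an out-of-band health check for the monitoring stack itself."""
import urllib.error
import urllib.request
from typing import Callable, Dict

# Placeholder host; ports are the usual defaults for each component.
TARGETS: Dict[str, str] = {
    "prometheus": "http://monitor.example:9090/-/healthy",
    "alertmanager": "http://monitor.example:9093/-/healthy",
    "grafana": "http://monitor.example:3000/api/health",
}


def http_status(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status code, or 0 if the request could not complete."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:  # must precede URLError (its parent)
        return err.code
    except (urllib.error.URLError, OSError):
        return 0


def check_stack(targets: Dict[str, str],
                fetch: Callable[[str], int] = http_status) -> Dict[str, bool]:
    """Map each component name to True if its health endpoint returned 200."""
    return {name: fetch(url) == 200 for name, url in targets.items()}


if __name__ == "__main__":
    results = check_stack(TARGETS)
    for name, ok in results.items():
        print(f"{name}: {'ok' if ok else 'FAILED'}")
    # A non-zero exit lets cron or another scheduler report the failure
    # through a channel that does not route through Alertmanager.
    raise SystemExit(0 if all(results.values()) else 1)
```

A complementary pattern is an always-firing "watchdog" alert routed to an external dead-man's-switch service, so that silence from Alertmanager itself becomes the signal. Which approach fits depends on what infrastructure sits outside the monitoring host.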

## References

- [Prometheus overview](https://prometheus.io/docs/introduction/overview/)
- [Prometheus Alertmanager overview](https://prometheus.io/docs/alerting/latest/overview/)
- [Prometheus `node_exporter`](https://github.com/prometheus/node_exporter)
- [Grafana documentation](https://grafana.com/docs/grafana/latest/)