---
title: Monitoring Stack Architecture
description: Reference architecture for a monitoring stack in a self-hosted or homelab environment
tags:
  - monitoring
  - observability
  - architecture
category: systems
created: 2026-03-14
updated: 2026-03-14
---

# Monitoring Stack Architecture

## Summary

A monitoring stack architecture defines how metrics, probes, dashboards, and alerts fit together. In self-hosted environments, the stack should stay small enough to operate but broad enough to cover infrastructure, ingress, and critical services.

## Why it matters

Monitoring that is bolted on late often misses the services operators actually depend on. A planned stack architecture makes it easier to understand where signals come from and how alerts reach the right people.

## Core concepts

- Collection: exporters and scrape targets
- Storage and evaluation: Prometheus
- Visualization: Grafana
- Alert routing: Alertmanager
- External validation: blackbox or equivalent endpoint checks

## Practical usage

Typical architecture:

```text
Hosts and services -> Exporters / probes -> Prometheus
Prometheus -> Grafana dashboards
Prometheus -> Alertmanager -> notification channel
```

Recommended coverage:

- Host metrics for compute and storage systems
- Endpoint checks for user-facing services
- Backup freshness and certificate expiry
- Platform services such as DNS, reverse proxy, and identity provider

## Best practices

- Monitor the path users depend on, not only the host underneath it
- Keep the monitoring stack itself backed up and access controlled
- Alert on actionable failures rather than every threshold crossing
- Document ownership for critical alerts and dashboards

## Pitfalls

- Monitoring only CPU and memory while ignoring ingress and backups
- Running a complex stack with no retention or alert review policy
- Depending on dashboards alone for outage detection
- Forgetting to monitor the monitoring components themselves

## References

- [Prometheus overview](https://prometheus.io/docs/introduction/overview/)
- [Prometheus Alertmanager overview](https://prometheus.io/docs/alerting/latest/overview/)
- [Prometheus `node_exporter`](https://github.com/prometheus/node_exporter)
- [Grafana documentation](https://grafana.com/docs/grafana/latest/)
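
## Example: sustained-threshold alerting

The best practice of alerting on actionable failures rather than every threshold crossing can be illustrated with a small sketch. The idea loosely resembles the `for` clause in Prometheus alerting rules, but this is not how Prometheus implements it. The names `Sample` and `should_alert`, and the threshold and hold values, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    timestamp: float  # seconds, any monotonic reference (assumed)
    value: float      # e.g. disk usage percent (assumed)


def should_alert(samples, threshold, hold_seconds):
    """Return True only if the latest sample breaches the threshold and
    the unbroken run of breaching samples ending at it spans at least
    hold_seconds. A single spike therefore does not alert."""
    if not samples:
        return False
    ordered = sorted(samples, key=lambda s: s.timestamp)
    latest = ordered[-1]
    if latest.value <= threshold:
        return False
    # Walk backwards to find where the current breach streak began.
    streak_start = latest.timestamp
    for sample in reversed(ordered):
        if sample.value <= threshold:
            break
        streak_start = sample.timestamp
    return latest.timestamp - streak_start >= hold_seconds


# Hypothetical data: one sample per minute.
sustained = [Sample(t, 95.0) for t in range(0, 301, 60)]
spike = [Sample(0, 50.0), Sample(60, 50.0), Sample(120, 95.0),
         Sample(180, 50.0), Sample(240, 95.0)]

print(should_alert(sustained, threshold=90.0, hold_seconds=300))  # True
print(should_alert(spike, threshold=90.0, hold_seconds=300))      # False
```

Requiring the breach to persist trades some detection latency for fewer pages from transient spikes. A suitable hold time depends on how quickly the failure becomes user-visible.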