Prometheus

Summary

Prometheus is an open-source monitoring system built around time-series metrics, pull-based scraping over HTTP, alert rule evaluation, and queryable historical data. It is a widely used choice for infrastructure and service monitoring in self-hosted environments.

Why it matters

Prometheus gives operators a consistent way to collect metrics from hosts, applications, and infrastructure components. Its main strength is that a single server handles collection, storage, and alert evaluation, which keeps the operational model simple.

Core concepts

Scrape targets and exporters
Time-series storage
PromQL for querying and aggregation
Alerting rules for actionable conditions
Service discovery integrations for dynamic environments
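
To make the scrape model concrete, here is a minimal sketch of what a target exposes: plain text in the Prometheus exposition format, served over HTTP at a path such as /metrics. It uses only the Python standard library. The metric name app_queue_depth, its label, the port, and the helper names are hypothetical and chosen for illustration; a real service would more likely use an official client library.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


def escape_label_value(value):
    """Escape backslash, double quote, and newline as the text format requires."""
    return str(value).replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_gauge(name, help_text, samples):
    """Render one gauge metric in the Prometheus text exposition format.

    `samples` is a list of (labels_dict, value) pairs.
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} gauge"]
    for labels, value in samples:
        if labels:
            pairs = ",".join(
                f'{key}="{escape_label_value(val)}"' for key, val in sorted(labels.items())
            )
            lines.append(f"{name}{{{pairs}}} {value}")
        else:
            lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


# Hypothetical sample data; a real exporter would read live values here.
SAMPLES = [({"queue": "email"}, 3), ({"queue": "reports"}, 0)]


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_gauge("app_queue_depth", "Jobs waiting in the queue.", SAMPLES)
        payload = body.encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(port=8000):
    """Serve /metrics on localhost until interrupted (port is an assumption)."""
    HTTPServer(("127.0.0.1", port), MetricsHandler).serve_forever()
```

Calling serve() and pointing a scrape job at 127.0.0.1:8000 would let Prometheus pull these values on each scrape interval.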

Practical usage

Prometheus typically sits in the middle of the monitoring pipeline:

Targets and exporters -> Prometheus -> dashboards and alerts

Typical uses:

Scraping node, container, and application metrics
Evaluating alert rules for outages and resource pressure
Providing metrics data to Grafana
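
Dashboards and scripts read metrics through the Prometheus HTTP API. The sketch below runs an instant PromQL query against the /api/v1/query endpoint and flattens the result, assuming a server reachable at a base URL such as http://localhost:9090 (an assumption, as are the function names). It handles only the "vector" result type.

```python
import json
import urllib.parse
import urllib.request


def build_query_url(base_url, promql):
    """Build the instant-query URL for a PromQL expression."""
    query = urllib.parse.urlencode({"query": promql})
    return base_url.rstrip("/") + "/api/v1/query?" + query


def parse_instant_vector(payload):
    """Turn an instant-query response into a list of (labels, value) pairs."""
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload.get('error')}")
    data = payload["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"unsupported result type: {data['resultType']}")
    # Each sample value is a [timestamp, "value-as-string"] pair.
    return [(item["metric"], float(item["value"][1])) for item in data["result"]]


def instant_query(base_url, promql, timeout=10):
    """Run the query against a live server (requires network access)."""
    url = build_query_url(base_url, promql)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return parse_instant_vector(json.load(response))
```

For example, instant_query("http://localhost:9090", "up == 0") would list scrape targets that Prometheus currently considers down.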

Best practices

Start with critical infrastructure and user-facing services
Keep retention and scrape frequency aligned with actual operational needs
Write alerts that map to a human response
Protect Prometheus access because metrics can reveal sensitive system details
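
To check whether retention and scrape frequency fit the available disk, a rough estimate helps. The sketch below follows the capacity-planning formula from the Prometheus storage documentation (retention time in seconds, times ingested samples per second, times bytes per sample), with the documented average of roughly 1 to 2 bytes per sample. The function name and the example figures are assumptions, and real usage varies with churn and compression.

```python
def estimate_disk_bytes(active_series, scrape_interval_s, retention_days, bytes_per_sample=2.0):
    """Rough on-disk size estimate for the local time-series database.

    Assumes every active series yields one sample per scrape interval.
    """
    samples_per_second = active_series / scrape_interval_s
    retention_seconds = retention_days * 86400
    return retention_seconds * samples_per_second * bytes_per_sample


if __name__ == "__main__":
    # Hypothetical setup: 100,000 series, 10 s interval, 15 days of retention.
    size = estimate_disk_bytes(100_000, 10, 15)
    print(f"{size / 1e9:.1f} GB")
```

Doubling the scrape interval or halving the retention period halves the estimate, which makes the trade-off easy to reason about before changing configuration.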

Pitfalls

Collecting too many high-cardinality metrics without a clear reason
Treating every metric threshold as an alert
Forgetting to monitor backup freshness, certificate expiry, or ingress paths
Running Prometheus without a retention and storage plan
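
The cardinality pitfall is easiest to see with arithmetic: every distinct combination of label values is a separate time series, so the worst case for one metric is the product of the distinct-value counts of its labels. The sketch below computes that upper bound; the label names and counts are hypothetical.

```python
import math


def max_series(label_value_counts):
    """Upper bound on series for one metric: product of distinct values per label."""
    return math.prod(label_value_counts.values())


if __name__ == "__main__":
    # Hypothetical HTTP request metric across 20 instances.
    bounded = {"method": 5, "status": 10, "instance": 20}
    print(max_series(bounded))

    # Adding an unbounded label such as a user ID multiplies the total.
    with_user_id = dict(bounded, user_id=10_000)
    print(max_series(with_user_id))
```

In practice not every combination occurs, so real counts are usually lower, but the bound shows why identifiers such as user IDs or request IDs rarely belong in labels.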
