Observability
Latte edited this page 2026-06-26 12:58:15 +02:00

Logging

  • Structured JSON logs.
  • Request correlation via X-Request-ID.
  • Security events and policy denials are audit logged.
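As an illustration of the first two points, here is a minimal sketch of a JSON formatter that stamps every record with the current request ID. The names JsonFormatter, request_id_var and resolve_request_id are hypothetical and not taken from the codebase. Reusing an incoming X-Request-ID header instead of always generating a fresh ID is also an assumption.

```python
import json
import logging
import uuid
from contextvars import ContextVar

# Hypothetical context variable holding the ID of the request being handled.
request_id_var: ContextVar[str] = ContextVar("request_id", default="-")


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            "request_id": request_id_var.get(),
        }
        return json.dumps(payload)


def resolve_request_id(headers: dict[str, str]) -> str:
    """Reuse the caller's X-Request-ID if present (assumed), else generate one.

    Note: this plain dict lookup is case-sensitive; a real framework's
    header mapping is normally case-insensitive.
    """
    request_id = headers.get("X-Request-ID") or uuid.uuid4().hex
    request_id_var.set(request_id)
    return request_id
```

A context variable keeps the ID scoped to the current request even when requests are handled concurrently, so log calls do not need to pass it explicitly.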

Structured event helpers

logging_utils exposes reusable helpers so endpoints emit consistent, secret-safe structured events instead of ad-hoc inline logging:

  • log_event(logger, level, event, **context) — emit a named event with a context mapping; keys in SENSITIVE_CONTEXT_KEYS (e.g. token, authorization, password) are masked as ***.
  • log_nullable_field(logger, event, field, value) — record whether a parsed response field is None and its runtime type, without dumping its contents.
  • sanitize_context(context) — the masking primitive used by the above.

The context mapping is serialized into the JSON log payload under a context key. These events are logged at DEBUG level, so they produce no output unless LOG_LEVEL=DEBUG.
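The masking behaviour described above could look roughly like the following. This is a sketch under stated assumptions, not the actual logging_utils code: the key set contains only the three examples named on this page, matching is assumed to be case-insensitive, and passing the mapping through `extra` is one possible way to get it into the JSON payload.

```python
import logging

# Assumed contents: only the three example keys named on this page.
SENSITIVE_CONTEXT_KEYS = frozenset({"token", "authorization", "password"})
MASK = "***"


def sanitize_context(context: dict) -> dict:
    """Return a copy of context with sensitive values replaced by MASK."""
    return {
        key: MASK if str(key).lower() in SENSITIVE_CONTEXT_KEYS else value
        for key, value in context.items()
    }


def log_event(logger: logging.Logger, level: int, event: str, **context) -> None:
    """Emit a named event with a masked context mapping attached to the record."""
    # The formatter is assumed to read record.context and serialize it
    # under the "context" key of the JSON payload.
    logger.log(level, event, extra={"context": sanitize_context(context)})
```

A call such as `log_event(logger, logging.DEBUG, "get_issue.start", repo="demo", token="abc")` would then carry `{"repo": "demo", "token": "***"}` as its context.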

get_issue is instrumented with these helpers (get_issue.start, get_issue.payload_shape, get_issue.field_check) to make nullable-field parsing failures diagnosable. The same pattern can be reused for other parsing-heavy endpoints (get_pull_request, list_issues, get_commit_diff).
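To show how the field-check pattern might transfer to another endpoint, here is a hedged sketch. The body of log_nullable_field is a guess based on its description above, and check_issue_fields together with the field names body, assignee and milestone are hypothetical examples.

```python
import logging
from typing import Any


def log_nullable_field(logger: logging.Logger, event: str, field: str, value: Any) -> None:
    """Record whether a field is None and its runtime type, never its contents."""
    context = {
        "field": field,
        "is_none": value is None,
        "type": type(value).__name__,
    }
    logger.debug(event, extra={"context": context})


def check_issue_fields(logger: logging.Logger, payload: dict) -> None:
    """Log one field_check event per nullable field before parsing it."""
    # Hypothetical list of response fields that may legitimately be null.
    for field in ("body", "assignee", "milestone"):
        log_nullable_field(logger, "get_issue.field_check", field, payload.get(field))
```

Because only the field name, the None flag and the type name are logged, a parsing failure can be traced to the offending field without writing response contents to the log.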

Metrics

Prometheus-compatible endpoint: GET /metrics.

Current metrics:

  • aegis_http_requests_total{method,path,status}
  • aegis_tool_calls_total{tool,status}
  • aegis_tool_duration_seconds_sum{tool}
  • aegis_tool_duration_seconds_count{tool}
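The two duration series can be combined into an average latency per tool. The sketch below assumes both are cumulative counters, as the _sum and _count naming suggests, and that the caller supplies values from two scrapes of the same tool label. The function name is hypothetical.

```python
from typing import Optional


def mean_tool_latency(
    sum_before: float,
    count_before: float,
    sum_after: float,
    count_after: float,
) -> Optional[float]:
    """Average seconds per tool call between two scrapes, or None if no calls."""
    calls = count_after - count_before
    if calls <= 0:
        # No calls in the window, or the counters were reset by a restart.
        return None
    return (sum_after - sum_before) / calls
```

For example, if the sum grows from 10.0 to 16.0 seconds while the count grows from 20 to 32 calls, the average over that window is 6.0 / 12 = 0.5 seconds per call.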

Tracing and Correlation

  • Request IDs are propagated in the X-Request-ID response header.
  • Tool-level correlation IDs are included in MCP responses.

Operational Guidance

  • Alert on spikes in 401/403/429 rates.
  • Alert on repeated access_denied and auth-rate-limit events.
  • Track tool latency trends for incident triage.
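One way to turn the first alert into a number is to compute the share of requests that ended in 401, 403 or 429 between two scrapes of aegis_http_requests_total. This is only a sketch: the function name, the input shape (status code mapped to cumulative count, summed over method and path) and any threshold applied to the result are assumptions.

```python
DENIED_STATUSES = ("401", "403", "429")


def denied_ratio(before: dict[str, float], after: dict[str, float]) -> float:
    """Share of requests in the window that ended in 401, 403, or 429."""
    # Per-status increase between the two scrapes; a status that is
    # missing from the earlier scrape is treated as starting at zero.
    deltas = {status: after[status] - before.get(status, 0.0) for status in after}
    total = sum(deltas.values())
    if total <= 0:
        return 0.0
    denied = sum(deltas.get(status, 0.0) for status in DENIED_STATUSES)
    return denied / total
```

Comparing this ratio against a baseline for the deployment, instead of alerting on raw counts, keeps the alert meaningful as overall traffic grows.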