feat: harden gateway with policy engine, secure tools, and governance docs

2026-02-14 16:05:56 +01:00
parent e17d34e6d7
commit 5969892af3
55 changed files with 4711 additions and 1587 deletions
--- a/docs/api-reference.md
+++ b/docs/api-reference.md
@@ -1,255 +1,61 @@
 # API Reference

-## HTTP Endpoints
-
-### `GET /`
-
-Returns basic server information. No authentication required.
-
-**Response**
-
-```json
-{
-  "name": "AegisGitea MCP",
-  "version": "0.1.0",
-  "status": "running"
-}
-```
-
---
-
-### `GET /health`
-
-Health check endpoint. No authentication required.
-
-**Response**
-
-```json
-{
-  "status": "healthy",
-  "gitea_connected": true
-}
-```
-
-Returns HTTP 200 when healthy. Returns HTTP 503 when Gitea is unreachable.
-
---
-
-### `GET /mcp/tools`
-
-Returns the list of available MCP tools. No authentication required (needed for ChatGPT tool discovery).
-
-**Response**
-
-```json
-{
-  "tools": [
-    {
-      "name": "list_repositories",
-      "description": "...",
-      "inputSchema": { ... }
-    }
-  ]
-}
-```
-
---
-
-### `POST /mcp/tool/call`
-
-Executes an MCP tool. **Authentication required.**
-
-**Request headers**
-
-```
-Authorization: Bearer <api-key>
-Content-Type: application/json
-```
-
-**Request body**
-
-```json
-{
-  "name": "<tool-name>",
-  "arguments": { ... }
-}
-```
-
-**Response**
-
-```json
-{
-  "content": [
-    {
-      "type": "text",
-      "text": "..."
-    }
-  ],
-  "isError": false
-}
-```
-
-On error, `isError` is `true` and `text` contains the error message.
-
---
-
-### `GET /mcp/sse`
-
-Server-Sent Events stream endpoint. Authentication required. Used for streaming MCP sessions.
-
---
-
-### `POST /mcp/sse`
-
-Sends a client message over an active SSE session. Authentication required.
-
---
-
-## Authentication
-
-All authenticated endpoints require a bearer token:
-
-```
-Authorization: Bearer <api-key>
-```
-
-Alternatively, the key can be passed as a query parameter (useful for tools that do not support custom headers):
-
-```
-GET /mcp/tool/call?api_key=<api-key>
-```
-
---
-
-## MCP Tools
-
-### `list_repositories`
-
-Lists all Gitea repositories accessible to the bot user.
-
-**Arguments:** none
-
-**Example response text**
-
-```
-Found 3 repositories:
-
-1. myorg/backend - Backend API service [Python] ★ 42
-2. myorg/frontend - React frontend [TypeScript] ★ 18
-3. myorg/infra - Infrastructure as code [HCL] ★ 5
-```
-
---
-
-### `get_repository_info`
-
-Returns metadata for a single repository.
-
-**Arguments**
-
-| Name | Type | Required | Description |
-|---|---|---|---|
-| `owner` | string | Yes | Repository owner (user or organisation) |
-| `repo` | string | Yes | Repository name |
-
-**Example response text**
-
-```
-Repository: myorg/backend
-Description: Backend API service
-Language: Python
-Stars: 42
-Forks: 3
-Default branch: main
-Private: false
-URL: https://gitea.example.com/myorg/backend
-```
-
---
-
-### `get_file_tree`
-
-Returns the file and directory structure of a repository.
-
-**Arguments**
-
-| Name | Type | Required | Default | Description |
-|---|---|---|---|---|
-| `owner` | string | Yes | — | Repository owner |
-| `repo` | string | Yes | — | Repository name |
-| `ref` | string | No | default branch | Branch, tag, or commit SHA |
-| `recursive` | boolean | No | `false` | Recursively list all subdirectories |
-
-> **Note:** Recursive mode is disabled by default to limit response size. Enable with care on large repositories.
-
-**Example response text**
-
-```
-File tree for myorg/backend (ref: main):
-
-src/
-src/main.py
-src/config.py
-tests/
-tests/test_main.py
-README.md
-requirements.txt
-```
-
---
-
-### `get_file_contents`
-
-Returns the contents of a single file.
-
-**Arguments**
-
-| Name | Type | Required | Default | Description |
-|---|---|---|---|---|
-| `owner` | string | Yes | — | Repository owner |
-| `repo` | string | Yes | — | Repository name |
-| `filepath` | string | Yes | — | Path to the file within the repository |
-| `ref` | string | No | default branch | Branch, tag, or commit SHA |
-
-**Limits**
-
- Files larger than `MAX_FILE_SIZE_BYTES` (default 1 MB) are rejected.
- Binary files that cannot be decoded as UTF-8 are returned as raw base64.
-
-**Example response text**
-
-```
-Contents of myorg/backend/src/main.py (ref: main):
-
-import fastapi
-...
-```
-
---
-
-## Error Responses
-
-All errors follow this structure:
-
-```json
-{
-  "content": [
-    {
-      "type": "text",
-      "text": "Error: <description>"
-    }
-  ],
-  "isError": true
-}
-```
-
-Common error scenarios:
-
-| Scenario | HTTP Status | `isError` |
-|---|---|---|
-| Missing or invalid API key | 401 | — (rejected before tool runs) |
-| Rate limited IP address | 429 | — |
-| Tool not found | 404 | — |
-| Repository not found in Gitea | 200 | `true` |
-| File too large | 200 | `true` |
-| Gitea API unavailable | 200 | `true` |
+## Endpoints
+
+- `GET /`: server metadata.
+- `GET /health`: health probe.
+- `GET /metrics`: Prometheus metrics (when enabled).
+- `POST /automation/webhook`: ingest policy-controlled webhook events.
+- `POST /automation/jobs/run`: run policy-controlled automation jobs.
+- `GET /mcp/tools`: list tool definitions.
+- `POST /mcp/tool/call`: execute a tool (`Authorization: Bearer <api-key>` required unless authentication is explicitly disabled).
+- `GET /mcp/sse` and `POST /mcp/sse`: MCP SSE transport.
+
+## Automation Jobs
+
+`POST /automation/jobs/run` supports:
+- `dependency_hygiene_scan` (read-only scaffold).
+- `stale_issue_detection` (read-only issue age analysis).
+- `auto_issue_creation` (write-mode + whitelist + policy required).
+
+## Read Tools
+
+- `list_repositories`.
+- `get_repository_info` (`owner`, `repo`).
+- `get_file_tree` (`owner`, `repo`, optional `ref`, `recursive`).
+- `get_file_contents` (`owner`, `repo`, `filepath`, optional `ref`).
+- `search_code` (`owner`, `repo`, `query`, optional `ref`, `page`, `limit`).
+- `list_commits` (`owner`, `repo`, optional `ref`, `page`, `limit`).
+- `get_commit_diff` (`owner`, `repo`, `sha`).
+- `compare_refs` (`owner`, `repo`, `base`, `head`).
+- `list_issues` (`owner`, `repo`, optional `state`, `page`, `limit`, `labels`).
+- `get_issue` (`owner`, `repo`, `issue_number`).
+- `list_pull_requests` (`owner`, `repo`, optional `state`, `page`, `limit`).
+- `get_pull_request` (`owner`, `repo`, `pull_number`).
+- `list_labels` (`owner`, `repo`, optional `page`, `limit`).
+- `list_tags` (`owner`, `repo`, optional `page`, `limit`).
+- `list_releases` (`owner`, `repo`, optional `page`, `limit`).
+
+## Write Tools (Write Mode Required)
+
+- `create_issue` (`owner`, `repo`, `title`, optional `body`, `labels`, `assignees`).
+- `update_issue` (`owner`, `repo`, `issue_number`, one or more of `title`, `body`, `state`).
+- `create_issue_comment` (`owner`, `repo`, `issue_number`, `body`).
+- `create_pr_comment` (`owner`, `repo`, `pull_number`, `body`).
+- `add_labels` (`owner`, `repo`, `issue_number`, `labels`).
+- `assign_issue` (`owner`, `repo`, `issue_number`, `assignees`).
+
+## Validation and Limits
+
+- All tool argument schemas reject unknown fields.
+- List responses are capped by `MAX_TOOL_RESPONSE_ITEMS`.
+- Text payloads are capped by `MAX_TOOL_RESPONSE_CHARS`.
+- File reads are capped by `MAX_FILE_SIZE_BYTES`.
+
+## Error Model
+
+- Policy denial: HTTP `403`.
+- Validation error: HTTP `400`.
+- Auth error: HTTP `401`.
+- Rate limit: HTTP `429`.
+- Internal errors: HTTP `500` without stack traces in production.
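Note on `docs/api-reference.md`: the tool-call endpoint and error model above could be exercised from a client roughly as follows. This is a minimal sketch, not part of the commit. The request body shape (`name`, `arguments`) is taken from the earlier version of the reference and is assumed to still apply; `build_tool_call`, `describe_status`, and the base URL are hypothetical names chosen for illustration.

```python
import json
import urllib.request

# Status codes listed in the "Error Model" section of the reference.
ERROR_MEANINGS = {
    400: "validation error",
    401: "auth error",
    403: "policy denial",
    429: "rate limit",
    500: "internal error",
}


def build_tool_call(base_url: str, api_key: str, name: str, arguments: dict) -> urllib.request.Request:
    """Build (but do not send) a POST /mcp/tool/call request.

    Assumption: the body is {"name": ..., "arguments": ...}, as in the
    previous version of the API reference.
    """
    body = json.dumps({"name": name, "arguments": arguments}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/mcp/tool/call",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def describe_status(status: int) -> str:
    """Map an HTTP status code to the error class named in the reference."""
    return ERROR_MEANINGS.get(status, "unexpected status")
```

A caller would pass the returned request to `urllib.request.urlopen` and, on `urllib.error.HTTPError`, feed `err.code` to `describe_status`.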
--- a/docs/audit.md
+++ b/docs/audit.md
@@ -0,0 +1,33 @@
+# Audit Logging
+
+## Design
+
+Audit logs are append-only JSON lines with hash chaining:
+- `prev_hash`: previous entry hash.
+- `entry_hash`: hash of current entry payload + previous hash.
+
+This makes tampering detectable.
+
+## Event Types
+
+- `tool_invocation`
+- `access_denied`
+- `security_event`
+
+Each event includes timestamps and correlation context.
+
+## Integrity Validation
+
+Use:
+
+```bash
+python3 scripts/validate_audit_log.py --path /var/log/aegis-mcp/audit.log
+```
+
+Exit code `0` indicates a valid chain; a non-zero exit code indicates tampering or corruption.
+
+## Operational Expectations
+
+- Persist audit logs to durable storage.
+- Protect write permissions (service account only).
+- Validate integrity during incident response and release checks.
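Note on `docs/audit.md`: the hash chaining described above can be illustrated with a short sketch. It is not the project's implementation. The use of SHA-256, the canonical JSON encoding, the all-zero genesis hash, and the `payload` field name are assumptions made for the example; only `prev_hash` and `entry_hash` come from the document.

```python
import hashlib
import json

# Assumption: the first entry chains from a fixed all-zero hash.
GENESIS_HASH = "0" * 64


def entry_hash(payload: dict, prev_hash: str) -> str:
    """Hash the previous entry hash together with a canonical JSON payload."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + canonical).encode("utf-8")).hexdigest()


def append_entry(chain: list, payload: dict) -> dict:
    """Append a new entry whose prev_hash points at the last entry."""
    prev_hash = chain[-1]["entry_hash"] if chain else GENESIS_HASH
    entry = {
        "payload": payload,
        "prev_hash": prev_hash,
        "entry_hash": entry_hash(payload, prev_hash),
    }
    chain.append(entry)
    return entry


def verify_chain(chain: list) -> bool:
    """Recompute every hash; any edited or reordered entry breaks the chain."""
    prev_hash = GENESIS_HASH
    for entry in chain:
        if entry["prev_hash"] != prev_hash:
            return False
        if entry["entry_hash"] != entry_hash(entry["payload"], prev_hash):
            return False
        prev_hash = entry["entry_hash"]
    return True
```

Because each `entry_hash` depends on the previous one, changing any earlier payload invalidates that entry and every entry after it, which is what makes tampering detectable.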
--- a/docs/automation.md
+++ b/docs/automation.md
@@ -0,0 +1,27 @@
+# Automation
+
+## Scope
+
+Current automation capabilities:
+- Webhook ingestion endpoint (`POST /automation/webhook`).
+- On-demand scheduled-job execution endpoint (`POST /automation/jobs/run`).
+- Dependency hygiene scan job scaffold (`dependency_hygiene_scan`).
+- Stale issue detection job (`stale_issue_detection`).
+- Auto issue creation job scaffold (`auto_issue_creation`, write-mode and policy required).
+
+Planned extensions:
+- Background scheduler orchestration.
+
+## Control Requirements
+
+All automation must be:
+- Policy-controlled.
+- Independently disableable.
+- Fully audited.
+- Explicitly documented with runbook guidance.
+
+## Enablement
+
+- `AUTOMATION_ENABLED=true` to allow automation endpoints.
+- `AUTOMATION_SCHEDULER_ENABLED=true` is reserved for a future built-in scheduler loop.
+- Policy rules must allow automation pseudo-tools (`automation_*`) per repository.
--- a/docs/deployment.md
+++ b/docs/deployment.md
@@ -1,126 +1,46 @@
 # Deployment

-## Local / Development
+## Secure Defaults
+
+- Default bind: `MCP_HOST=127.0.0.1`.
+- Binding `0.0.0.0` requires explicit `ALLOW_INSECURE_BIND=true`.
+- Write mode disabled by default.
+- Policy file path configurable via `POLICY_FILE_PATH`.
+
+## Local Development

 ```bash
 make install-dev
-source venv/bin/activate    # Linux/macOS
-# venv\Scripts\activate     # Windows
-
 cp .env.example .env
-# Edit .env
-make generate-key           # Add key to .env
+make generate-key
 make run
 ```

-The server listens on `http://0.0.0.0:8080` by default.
-
---
-
 ## Docker

-### Build
+- Use `docker/Dockerfile` (non-root runtime).
+- Use compose profiles:
+  - `prod`: hardened runtime profile.
+  - `dev`: local development profile (localhost-only port bind).
+
+Run examples:

 ```bash
-make docker-build
-# or: docker build -f docker/Dockerfile -t aegis-gitea-mcp .
+docker compose --profile prod up -d
+docker compose --profile dev up -d
 ```

-### Configure
+## Environment Validation

-Create a `.env` file (copy from `.env.example`) with your settings before starting the container.
+Startup validates:
+- Required Gitea settings.
+- API keys (when auth enabled).
+- Insecure bind opt-in.
+- Write whitelist when write mode enabled.

-### Run
+## Production Recommendations

-```bash
-make docker-up
-# or: docker-compose up -d
-```
-
-### Logs
-
-```bash
-make docker-logs
-# or: docker-compose logs -f
-```
-
-### Stop
-
-```bash
-make docker-down
-# or: docker-compose down
-```
-
---
-
-## docker-compose.yml Overview
-
-The included `docker-compose.yml` provides:
-
- **Health check:** polls `GET /health` every 30 seconds
- **Audit log volume:** mounts a named volume at `/var/log/aegis-mcp` so logs survive container restarts
- **Resource limits:** 1 CPU, 512 MB memory
- **Security:** non-root user, `no-new-privileges`
- **Traefik labels:** commented out — uncomment and set `MCP_DOMAIN` to enable automatic HTTPS via Traefik
-
-### Enabling Traefik
-
-1. Set `MCP_DOMAIN=mcp.yourdomain.com` in `.env`.
-2. Uncomment the Traefik labels in `docker-compose.yml`.
-3. Make sure Traefik is running with a `web` and `websecure` entrypoint and Let's Encrypt configured.
-
---
-
-## Dockerfile Details
-
-The image uses a multi-stage build:
-
-| Stage | Base image | Purpose |
-|---|---|---|
-| `builder` | `python:3.11-slim` | Install dependencies |
-| `final` | `python:3.11-slim` | Minimal runtime image |
-
-The final image:
- Runs as user `aegis` (UID 1000, GID 1000)
- Exposes port `8080`
- Entry point: `python -m aegis_gitea_mcp.server`
-
---
-
-## Production Checklist
-
- [ ] `AUTH_ENABLED=true` and `MCP_API_KEYS` set to a strong key
- [ ] `GITEA_TOKEN` belongs to a dedicated bot user with minimal permissions
- [ ] TLS terminated at the reverse proxy (Traefik, nginx, Caddy, etc.)
- [ ] `AUDIT_LOG_PATH` points to a persistent volume
- [ ] Log rotation configured for the audit log file
- [ ] API key rotation scheduled (every 90 days recommended)
- [ ] `MAX_AUTH_FAILURES` and `AUTH_FAILURE_WINDOW` tuned for your threat model
- [ ] Resource limits configured in Docker/Kubernetes
-
---
-
-## Kubernetes (Basic)
-
-A minimal Kubernetes deployment is not included, but the server is stateless and the Docker image is suitable for use in Kubernetes. Key considerations:
-
- Store `.env` values as a `Secret` and expose them as environment variables.
- Mount an `emptyDir` or PersistentVolumeClaim at the audit log path.
- Use a `readinessProbe` and `livenessProbe` on `GET /health`.
- Set `resources.requests` and `resources.limits` for CPU and memory.
-
---
-
-## Updating
-
-```bash
-git pull
-make docker-build
-make docker-up
-```
-
-If you added a new key via `make generate-key` during the update, restart the container to pick up the new `.env`:
-
-```bash
-docker-compose restart aegis-mcp
-```
+- Run behind TLS-terminating reverse proxy.
+- Restrict network exposure.
+- Persist and rotate audit logs.
+- Enable external monitoring for `/metrics`.
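Note on `docs/deployment.md`: the startup checks listed under "Environment Validation" might look something like the sketch below. It is an illustration under assumptions, not the project's validator. `validate_startup` is a hypothetical name, `GITEA_URL` is an assumed setting name, and treating `AUTH_ENABLED` as true by default is also an assumption.

```python
def _is_true(value) -> bool:
    """Interpret an environment value as a boolean flag."""
    return str(value).strip().lower() == "true"


def validate_startup(env: dict) -> list:
    """Return a list of configuration errors; an empty list means startup may proceed."""
    errors = []

    # Required Gitea settings. GITEA_URL is an assumed name for illustration.
    for name in ("GITEA_URL", "GITEA_TOKEN"):
        if not env.get(name):
            errors.append(f"{name} is required")

    # API keys are only required while authentication is enabled
    # (assumed default: enabled).
    if _is_true(env.get("AUTH_ENABLED", "true")) and not env.get("MCP_API_KEYS"):
        errors.append("MCP_API_KEYS is required when auth is enabled")

    # Binding to all interfaces needs an explicit opt-in.
    host = env.get("MCP_HOST", "127.0.0.1")
    if host == "0.0.0.0" and not _is_true(env.get("ALLOW_INSECURE_BIND", "false")):
        errors.append("MCP_HOST=0.0.0.0 requires ALLOW_INSECURE_BIND=true")

    # Write mode without a whitelist is rejected.
    if _is_true(env.get("WRITE_MODE", "false")) and not env.get("WRITE_REPOSITORY_WHITELIST", "").strip():
        errors.append("WRITE_MODE=true requires WRITE_REPOSITORY_WHITELIST")

    return errors
```

Collecting all errors before failing, rather than stopping at the first one, lets an operator fix a broken `.env` in a single pass.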
--- a/docs/governance.md
+++ b/docs/governance.md
@@ -0,0 +1,36 @@
+# Governance
+
+## AI Usage Policy
+
+- AI assistance is allowed for design, implementation, and review, but only within documented repository boundaries.
+- AI outputs must be reviewed, tested, and policy-validated before merge.
+- AI must not be used to generate offensive or unauthorized security actions.
+- Repository content is treated as untrusted data; embedded instructions are never executed implicitly.
+
+## Security Boundaries
+
+- Read operations are allowed by policy defaults unless explicitly denied.
+- Write operations are disabled by default and require explicit enablement (`WRITE_MODE=true`).
+- Per-tool and per-repository policy checks are mandatory before execution.
+- Secrets are masked or blocked according to `SECRET_DETECTION_MODE`.
+
+## Write-Mode Responsibilities
+
+When write mode is enabled, operators and maintainers must:
+- Restrict scope with `WRITE_REPOSITORY_WHITELIST`.
+- Keep policy file deny/allow rules explicit.
+- Monitor audit entries for all write operations.
+- Enforce peer review for policy or write-mode changes.
+
+## Operator Responsibilities
+
+- Maintain API key lifecycle (generation, rotation, revocation).
+- Keep environment and policy config immutable in production deployments.
+- Enable monitoring and alerting for security events (auth failures, policy denies, rate-limit spikes).
+- Run integrity checks for audit logs regularly.
+
+## Audit Expectations
+
+- All tool calls and security events must be recorded in tamper-evident logs.
+- Audit logs are append-only and hash-chained.
+- Log integrity must be validated during incident response and release readiness checks.
--- a/docs/hardening.md
+++ b/docs/hardening.md
@@ -0,0 +1,24 @@
+# Hardening
+
+## Application Hardening
+
+- Secure defaults: localhost bind, write mode disabled, policy-enforced writes.
+- Strict config validation at startup.
+- Redacted secret handling in logs and responses.
+- Policy deny/allow model with path restrictions.
+- Non-leaking production error responses.
+
+## Container Hardening
+
+- Non-root runtime user.
+- `no-new-privileges` and dropped Linux capabilities.
+- Read-only filesystem where practical.
+- Explicit health checks.
+- Separate dev and production compose profiles.
+
+## Operational Hardening
+
+- Rotate API keys regularly.
+- Minimize Gitea bot permissions.
+- Keep policy file under change control.
+- Alert on repeated policy denials and auth failures.
--- a/docs/observability.md
+++ b/docs/observability.md
@@ -0,0 +1,28 @@
+# Observability
+
+## Logging
+
+- Structured JSON logs.
+- Request correlation via `X-Request-ID`.
+- Security events and policy denials are audit logged.
+
+## Metrics
+
+Prometheus-compatible endpoint: `GET /metrics`.
+
+Current metrics:
+- `aegis_http_requests_total{method,path,status}`
+- `aegis_tool_calls_total{tool,status}`
+- `aegis_tool_duration_seconds_sum{tool}`
+- `aegis_tool_duration_seconds_count{tool}`
+
+## Tracing and Correlation
+
+- Request IDs propagate in response header (`X-Request-ID`).
+- Tool-level correlation IDs included in MCP responses.
+
+## Operational Guidance
+
+- Alert on spikes in 401/403/429 rates.
+- Alert on repeated `access_denied` and auth-rate-limit events.
+- Track tool latency trends for incident triage.
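Note on `docs/observability.md`: mean tool latency can be derived from the two duration series as `sum / count`. The sketch below parses a Prometheus-style text payload and computes that ratio per tool. The exact sample line format (a single `tool` label, one sample per line) is an assumption about what `GET /metrics` returns, and `mean_tool_latency` is a hypothetical helper.

```python
import re

# Assumed sample format: metric_name{tool="..."} value
SAMPLE = re.compile(
    r'^(aegis_tool_duration_seconds_(?:sum|count))\{tool="([^"]+)"\}\s+(\S+)$'
)


def mean_tool_latency(metrics_text: str) -> dict:
    """Return mean latency in seconds per tool, computed as sum / count."""
    sums, counts = {}, {}
    for line in metrics_text.splitlines():
        match = SAMPLE.match(line.strip())
        if not match:
            continue  # skip comments, blank lines, and unrelated metrics
        name, tool, value = match.groups()
        target = sums if name.endswith("_sum") else counts
        target[tool] = float(value)
    # Skip tools with a zero or missing count to avoid division by zero.
    return {tool: sums[tool] / counts[tool] for tool in sums if counts.get(tool)}
```

In a real deployment the same ratio is usually computed in the monitoring system itself, over a time window, rather than from a single scrape.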
--- a/docs/policy.md
+++ b/docs/policy.md
@@ -0,0 +1,50 @@
+# Policy Engine
+
+## Overview
+
+Aegis uses a YAML policy engine to authorize tool execution before any Gitea API call is made.
+
+## Behavior Summary
+
+- Global tool allow/deny supported.
+- Per-repository tool allow/deny supported.
+- Optional repository path allow/deny supported.
+- Write operations are denied by default.
+- Write operations also require `WRITE_MODE=true` and `WRITE_REPOSITORY_WHITELIST` match.
+
+## Example Configuration
+
+```yaml
+defaults:
+  read: allow
+  write: deny
+
+tools:
+  deny:
+    - search_code
+
+repositories:
+  acme/service-a:
+    tools:
+      allow:
+        - get_file_contents
+        - list_commits
+    paths:
+      allow:
+        - src/*
+      deny:
+        - src/secrets/*
+```
+
+## Failure Behavior
+
+- Invalid YAML or invalid schema: startup failure (fail closed).
+- Denied tool call: HTTP `403` + audit `access_denied` entry.
+- Path traversal attempt in path-scoped tools: denied by validation/policy checks.
+
+## Operational Guidance
+
+- Keep policy files version-controlled and code-reviewed.
+- Prefer explicit deny entries for sensitive tools.
+- Use repository-specific allow lists for high-risk environments.
+- Test policy updates in staging before production rollout.
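Note on `docs/policy.md`: one way to read the example configuration is sketched below, with the YAML already loaded into a dictionary. The precedence rules (global deny first, then per-repository deny, then a per-repository allow list that acts as an exhaustive list when present, then defaults, with path deny winning over path allow) are assumptions for illustration; the document does not spell out the engine's actual precedence. `is_allowed` is a hypothetical function, and path traversal checks are left out.

```python
from fnmatch import fnmatchcase

# The example policy from the document, as it would look after YAML parsing.
POLICY = {
    "defaults": {"read": "allow", "write": "deny"},
    "tools": {"deny": ["search_code"]},
    "repositories": {
        "acme/service-a": {
            "tools": {"allow": ["get_file_contents", "list_commits"]},
            "paths": {"allow": ["src/*"], "deny": ["src/secrets/*"]},
        }
    },
}


def is_allowed(policy: dict, tool: str, repo: str, path=None, is_write: bool = False) -> bool:
    """Evaluate a tool call against the policy (assumed precedence, see lead-in)."""
    # 1. A global deny always wins.
    if tool in policy.get("tools", {}).get("deny", []):
        return False

    repo_rules = policy.get("repositories", {}).get(repo, {})
    repo_tools = repo_rules.get("tools", {})

    # 2. Per-repository deny, then per-repository allow list.
    if tool in repo_tools.get("deny", []):
        return False
    allow_list = repo_tools.get("allow")
    if allow_list is not None:
        if tool not in allow_list:
            return False
    else:
        # 3. No repository allow list: fall back to defaults (missing means deny).
        kind = "write" if is_write else "read"
        if policy.get("defaults", {}).get(kind, "deny") != "allow":
            return False

    # 4. Optional path rules: deny patterns take precedence over allow patterns.
    if path is not None:
        paths = repo_rules.get("paths", {})
        if any(fnmatchcase(path, pattern) for pattern in paths.get("deny", [])):
            return False
        allow_paths = paths.get("allow")
        if allow_paths is not None and not any(fnmatchcase(path, p) for p in allow_paths):
            return False

    return True
```

Note that `fnmatch` lets `*` match across `/`, so `src/*` also covers nested paths; checking deny patterns first is what keeps `src/secrets/*` excluded in this sketch.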
--- a/docs/roadmap.md
+++ b/docs/roadmap.md
@@ -0,0 +1,72 @@
+# Roadmap
+
+## High-Level Evolution Plan
+
+1. Hardened read-only gateway baseline.
+2. Policy-driven authorization and observability.
+3. Controlled write-mode rollout.
+4. Automation and event-driven workflows.
+5. Continuous hardening and enterprise controls.
+
+## Threat Model Updates
+
+- Primary threats: credential theft, over-permissioned automation, prompt injection via repo data, policy bypass, audit tampering.
+- Secondary threats: denial-of-service, misconfiguration drift, unsafe deployment defaults.
+
+## Security Model
+
+- API key authentication + auth failure throttling.
+- Per-IP and per-token request rate limits.
+- Secret detection and outbound sanitization.
+- Tamper-evident audit logs with integrity verification.
+- No production stack-trace disclosure.
+
+## Policy Model
+
+- YAML policy with global and per-repository allow/deny rules.
+- Optional path restrictions for file-oriented tools.
+- Default write deny.
+- Write-mode repository whitelist enforcement.
+
+## Capability Matrix Concept
+
+- `Read` capabilities: enabled by default but policy-filtered.
+- `Write` capabilities: disabled by default, policy + whitelist gated.
+- `Automation` capabilities: disabled by default, policy-controlled.
+
+## Audit Log Design
+
+- JSON lines.
+- `prev_hash` + `entry_hash` chain.
+- Correlation/request IDs for traceability.
+- Validation script for chain integrity.
+
+## Write-Mode Architecture
+
+- Separate write tool set with strict schemas.
+- Global toggle (`WRITE_MODE`) + per-repo whitelist.
+- Policy engine still authoritative.
+- No merge, branch deletion, or force push endpoints.
+
+## Deployment Architecture
+
+- Non-root container runtime.
+- Read-only filesystem where practical.
+- Explicit opt-in for insecure bind.
+- Separate dev and prod compose profiles.
+
+## Observability Architecture
+
+- Structured JSON logs with request correlation.
+- Prometheus-compatible `/metrics` endpoint.
+- Tool execution counters and duration aggregates.
+
+## Risk Analysis
+
+- Highest risk: write-mode misuse and policy misconfiguration.
+- Mitigations: deny-by-default, whitelist, audit chain, tests, docs, reviews.
+
+## Extensibility Notes
+
+- Add new tools only through schema + policy + docs + tests path.
+- Keep transport-agnostic execution core for webhook/scheduler integrations.
--- a/docs/security.md
+++ b/docs/security.md
@@ -1,155 +1,39 @@
 # Security

-## Authentication
+## Core Controls

-AegisGitea MCP uses bearer token authentication. Clients must include a valid API key with every tool call.
+- API key authentication with constant-time comparison.
+- Auth failure throttling.
+- Per-IP and per-token request rate limits.
+- Strict input validation via Pydantic schemas (`extra=forbid`).
+- Policy engine authorization before tool execution.
+- Secret detection with mask/block behavior.
+- Production-safe error responses (no stack traces).

-### How It Works
+## Prompt Injection Hardening

-1. The client sends `Authorization: Bearer <key>` with its request.
-2. The server extracts the token and validates it against the configured `MCP_API_KEYS`.
-3. Comparison is done in **constant time** to prevent timing attacks.
-4. If validation fails, the failure is counted against the client's IP address.
+Repository content is treated strictly as data.

-### Generating API Keys
+- Tool outputs are bounded and sanitized.
+- No instruction execution from repository text.
+- Untrusted content handling helpers enforce maximum output size.

-Use the provided script to generate cryptographically secure 64-character hex keys:
+## Secret Detection

-```bash
-make generate-key
-# or: python scripts/generate_api_key.py
-```
+Detected classes include:
+- API keys and generic token patterns.
+- JWT-like tokens.
+- Private key block markers.
+- Common provider token formats.

-Keys must be at least 32 characters long. The script also saves metadata (creation date, expiration) to a `keys/` directory.
+Behavior:
+- `SECRET_DETECTION_MODE=mask`: redact in place.
+- `SECRET_DETECTION_MODE=block`: replace secret-bearing field values.
+- `SECRET_DETECTION_MODE=off`: disable sanitization (not recommended).

-### Multiple Keys (Grace Period During Rotation)
+## Authentication and Key Lifecycle

-You can configure multiple keys separated by commas. This allows you to add a new key and remove the old one without downtime:
-
-```env
-MCP_API_KEYS=newkey...,oldkey...
-```
-
-Remove the old key from the list after all clients have been updated.
-
---
-
-## Key Rotation
-
-Rotate keys regularly (recommended: every 90 days).
-
-```bash
-make rotate-key
-# or: python scripts/rotate_api_key.py
-```
-
-The rotation script:
-1. Reads the current key from `.env`
-2. Generates a new key
-3. Offers to replace the key immediately or add it alongside the old key (grace period)
-4. Creates a backup of your `.env` before modifying it
-
-### Checking Key Age
-
-```bash
-make check-key-age
-# or: python scripts/check_key_age.py
-```
-
-Exit codes: `0` = OK, `1` = expiring within 7 days (warning), `2` = already expired (critical).
-
---
-
-## Rate Limiting
-
-Failed authentication attempts are tracked per client IP address.
-
-| Setting | Default | Description |
-|---|---|---|
-| `MAX_AUTH_FAILURES` | `5` | Maximum failures before the IP is blocked |
-| `AUTH_FAILURE_WINDOW` | `300` | Rolling window in seconds |
-
-Once an IP exceeds the threshold, all further requests from that IP return HTTP 429 until the window resets. This is enforced entirely in memory — a server restart resets the counters.
-
---
-
-## Audit Logging
-
-All security-relevant events are written to a structured JSON log file.
-
-### Log Location
-
-Default: `/var/log/aegis-mcp/audit.log`  
-Configurable via `AUDIT_LOG_PATH`.
-
-The directory is created automatically on startup.
-
-### What Is Logged
-
-| Event | Description |
-|---|---|
-| Tool invocation | Every call to a tool: tool name, arguments, result status, correlation ID |
-| Access denied | Failed authentication attempts: IP address, reason |
-| Security event | Rate limit triggers, invalid key formats, startup authentication status |
-
-### Log Format
-
-Each entry is a JSON object on a single line:
-
-```json
-{
-  "timestamp": "2026-02-13T10:00:00Z",
-  "event": "tool_invocation",
-  "correlation_id": "a1b2c3d4-...",
-  "tool": "get_file_contents",
-  "owner": "myorg",
-  "repo": "backend",
-  "path": "src/main.py",
-  "result": "success",
-  "client_ip": "10.0.0.1"
-}
-```
-
-### Using Logs for Monitoring
-
-Because entries are newline-delimited JSON, they are easy to parse:
-
-```bash
-# Show all failed tool calls
-grep '"result": "error"' /var/log/aegis-mcp/audit.log | jq .
-
-# Show all access-denied events
-grep '"event": "access_denied"' /var/log/aegis-mcp/audit.log | jq .
-```
-
---
-
-## Access Control Model
-
-AegisGitea MCP does **not** implement its own repository access control. Access to repositories is determined entirely by the Gitea bot user's permissions:
-
- If the bot user has no access to a repository, it will not appear in `list_repositories` and `get_repository_info` will return an error.
- Grant the bot user the minimum set of repository permissions needed.
-
-**Principle of least privilege:** create a dedicated bot user and grant it read-only access only to the repositories that the AI needs to see.
-
---
-
-## Network Security Recommendations
-
- Run the MCP server behind a reverse proxy (e.g. Traefik or nginx) with TLS.
- Do not expose the server directly on a public port without TLS.
- Restrict inbound connections to known AI client IP ranges where possible.
- The `/mcp/tools` endpoint is intentionally public (required for ChatGPT plugin discovery). If this is undesirable, restrict it at the network/proxy level.
-
---
-
-## Container Security
-
-The provided Docker image runs with:
-
- A non-root user (`aegis`, UID 1000)
- `no-new-privileges` security option
- CPU and memory resource limits (1 CPU, 512 MB)
-
-See [Deployment](deployment.md) for details.
+- Keys must be at least 32 characters.
+- Rotate keys regularly (`scripts/rotate_api_key.py`).
+- Check key age and expiry (`scripts/check_key_age.py`).
+- Prefer dedicated bot credentials with least privilege.
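Note on `docs/security.md`: two of the controls above, constant-time key comparison and secret masking, can be sketched with the standard library. This is an illustration, not the project's code. The regular expressions cover only two of the listed secret classes, and the `[REDACTED]` and `[BLOCKED]` placeholders, the function names, and the choice to block a whole string are assumptions.

```python
import hmac
import re

MIN_KEY_LENGTH = 32  # "Keys must be at least 32 characters."


def is_valid_api_key(presented: str, configured_keys: list) -> bool:
    """Check a presented key against all configured keys in constant time per key."""
    if len(presented) < MIN_KEY_LENGTH:
        return False
    presented_bytes = presented.encode("utf-8")
    matched = False
    for key in configured_keys:
        # No early exit: every configured key is compared.
        if hmac.compare_digest(presented_bytes, key.encode("utf-8")):
            matched = True
    return matched


# Illustrative patterns only: private key block markers and JWT-like tokens.
SECRET_PATTERNS = [
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    re.compile(r"\beyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+"),
]


def sanitize(text: str, mode: str = "mask") -> str:
    """Apply the SECRET_DETECTION_MODE behavior to a single string value."""
    if mode == "off":
        return text
    if mode == "block":
        # Assumed behavior: replace the whole value when any secret is found.
        if any(pattern.search(text) for pattern in SECRET_PATTERNS):
            return "[BLOCKED]"
        return text
    # mode == "mask": redact each match in place.
    for pattern in SECRET_PATTERNS:
        text = pattern.sub("[REDACTED]", text)
    return text
```

`hmac.compare_digest` is the standard-library primitive for timing-safe comparison; a plain `==` on strings can return early at the first differing character.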
--- a/docs/todo.md
+++ b/docs/todo.md
@@ -0,0 +1,92 @@
+# TODO
+
+## Phase 0 Governance
+
+- [x] Add `CODE_OF_CONDUCT.md`.
+- [x] Add governance policy documentation.
+- [x] Upgrade `AGENTS.md` as authoritative AI contract.
+
+## Phase 1 Architecture
+
+- [x] Publish roadmap and threat/security model updates.
+- [x] Publish phased TODO tracker.
+
+## Phase 2 Expanded Read Tools
+
+- [x] Implement `search_code`.
+- [x] Implement `list_commits`.
+- [x] Implement `get_commit_diff`.
+- [x] Implement `compare_refs`.
+- [x] Implement `list_issues`.
+- [x] Implement `get_issue`.
+- [x] Implement `list_pull_requests`.
+- [x] Implement `get_pull_request`.
+- [x] Implement `list_labels`.
+- [x] Implement `list_tags`.
+- [x] Implement `list_releases`.
+- [x] Add input validation and response bounds.
+- [x] Add unit/failure-mode tests.
+
+## Phase 3 Policy Engine
+
+- [x] Implement YAML policy loader and validator.
+- [x] Implement per-tool and per-repo allow/deny.
+- [x] Implement optional path restrictions.
+- [x] Enforce default write deny.
+- [x] Add policy unit tests.
+
+## Phase 4 Write Mode
+
+- [x] Implement write tools (`create_issue`, `update_issue`, comments, labels, assignment).
+- [x] Keep write mode disabled by default.
+- [x] Enforce repository whitelist.
+- [x] Ensure no merge/deletion/force-push capabilities.
+- [x] Add write denial tests.
+
+## Phase 5 Hardening
+
+- [x] Add secret detection + mask/block controls.
+- [x] Add prompt-injection defensive model (data-only handling).
+- [x] Add tamper-evident audit chaining and validation.
+- [x] Add per-IP and per-token rate limiting.
+
+## Phase 6 Automation
+
+- [x] Implement webhook ingestion pipeline.
+- [x] Implement on-demand scheduled jobs runner endpoint.
+- [x] Implement auto issue creation job scaffold from findings.
+- [x] Implement dependency hygiene scan orchestration scaffold.
+- [x] Implement stale issue detection automation.
+- [x] Add automation endpoint tests.
+
+## Phase 7 Deployment
+
+- [x] Harden Docker runtime defaults.
+- [x] Separate dev/prod compose profiles.
+- [x] Preserve non-root runtime and health checks.
+
+## Phase 8 Observability
+
+- [x] Add Prometheus metrics endpoint.
+- [x] Add structured JSON logging.
+- [x] Add request ID correlation.
+- [x] Add tool timing metrics.
+
+## Phase 9 Testing and Release Readiness
+
+- [x] Extend unit tests.
+- [x] Add policy tests.
+- [x] Add secret detection tests.
+- [x] Add write-mode denial tests.
+- [x] Add audit integrity tests.
+- [ ] Add integration-tagged tests against live Gitea (optional CI stage).
+- [ ] Final security review sign-off.
+- [ ] Release checklist execution.
+
+## Release Checklist
+
+- [ ] `make lint`
+- [ ] `make test`
+- [ ] Documentation review complete
+- [ ] Policy file reviewed for production scope
+- [ ] Write mode remains disabled unless explicitly approved
--- a/docs/write-mode.md
+++ b/docs/write-mode.md
@@ -0,0 +1,40 @@
+# Write Mode
+
+## Threat Model
+
+Write mode introduces mutation risk (issue/PR changes, metadata updates). Risks include unauthorized actions, accidental mass updates, and audit evasion.
+
+## Default Posture
+
+- `WRITE_MODE=false` by default.
+- Even when enabled, writes require repository whitelist membership.
+- Policy engine remains authoritative and may deny specific write tools.
+
+## Supported Write Tools
+
+- `create_issue`
+- `update_issue`
+- `create_issue_comment`
+- `create_pr_comment`
+- `add_labels`
+- `assign_issue`
+
+Not supported (explicitly forbidden): merge actions, branch deletion, force push.
+
+## Enablement Steps
+
+1. Set `WRITE_MODE=true`.
+2. Set `WRITE_REPOSITORY_WHITELIST=owner/repo,...`.
+3. Review policy file for write-tool scope.
+4. Verify audit logging and alerting before rollout.
+
+## Safe Operations
+
+- Start with one repository in whitelist.
+- Use narrowly scoped bot credentials.
+- Require peer review for whitelist/policy changes.
+- Disable write mode during incident response if abuse is suspected.
+
+## Risk Tradeoffs
+
+Write mode improves automation and triage speed but increases blast radius. Use least privilege, tight policy, and strong monitoring.
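Note on `docs/write-mode.md`: the gating described above (global toggle plus repository whitelist) can be sketched as a single check. This is a minimal illustration, not the gateway's code. `write_permitted` and `parse_whitelist` are hypothetical names, and the policy engine, which the document says remains authoritative, would still have to approve the call separately.

```python
# The supported write tools listed in the document.
WRITE_TOOLS = frozenset({
    "create_issue",
    "update_issue",
    "create_issue_comment",
    "create_pr_comment",
    "add_labels",
    "assign_issue",
})


def parse_whitelist(raw: str) -> frozenset:
    """Parse WRITE_REPOSITORY_WHITELIST=owner/repo,... into a set of repositories."""
    return frozenset(item.strip() for item in raw.split(",") if item.strip())


def write_permitted(tool: str, repo: str, env: dict) -> bool:
    """Return True only if the tool is a known write tool, write mode is on,
    and the repository is whitelisted. Policy checks are not covered here."""
    if tool not in WRITE_TOOLS:
        return False
    if env.get("WRITE_MODE", "false").strip().lower() != "true":
        return False
    return repo in parse_whitelist(env.get("WRITE_REPOSITORY_WHITELIST", ""))
```

Starting with a single repository in the whitelist, as recommended under "Safe Operations", means this check fails closed for every other repository.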