feat: harden gateway with policy engine, secure tools, and governance docs

2026-02-14 16:05:56 +01:00
parent e17d34e6d7
commit 5969892af3
55 changed files with 4711 additions and 1587 deletions


@@ -1,255 +1,61 @@
# API Reference
## HTTP Endpoints
### `GET /`
Returns basic server information. No authentication required.
**Response**
```json
{
"name": "AegisGitea MCP",
"version": "0.1.0",
"status": "running"
}
```
---
### `GET /health`
Health check endpoint. No authentication required.
**Response**
```json
{
"status": "healthy",
"gitea_connected": true
}
```
Returns HTTP 200 when healthy. Returns HTTP 503 when Gitea is unreachable.
---
### `GET /mcp/tools`
Returns the list of available MCP tools. No authentication required (needed for ChatGPT tool discovery).
**Response**
```json
{
"tools": [
{
"name": "list_repositories",
"description": "...",
"inputSchema": { ... }
}
]
}
```
---
### `POST /mcp/tool/call`
Executes an MCP tool. **Authentication required.**
**Request headers**
```
Authorization: Bearer <api-key>
Content-Type: application/json
```
**Request body**
```json
{
"name": "<tool-name>",
"arguments": { ... }
}
```
**Response**
```json
{
"content": [
{
"type": "text",
"text": "..."
}
],
"isError": false
}
```
On error, `isError` is `true` and `text` contains the error message.
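The request and response shapes above can be exercised from a small client. The following is a minimal sketch, not the project's own client: the helper names `build_tool_call` and `first_text` are hypothetical, and the base URL passed in is a placeholder for wherever the server is actually reachable.

```python
import json
import urllib.request


def build_tool_call(base_url: str, api_key: str, name: str, arguments: dict) -> urllib.request.Request:
    """Build (but do not send) a POST /mcp/tool/call request."""
    body = json.dumps({"name": name, "arguments": arguments}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/mcp/tool/call",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def first_text(response: dict) -> str:
    """Return the first text item, raising if the tool reported an error."""
    text = next(item["text"] for item in response["content"] if item["type"] == "text")
    if response.get("isError"):
        raise RuntimeError(text)
    return text
```

Sending the request is then a matter of passing it to `urllib.request.urlopen` and decoding the JSON body before calling `first_text`.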
---
### `GET /mcp/sse`
Server-Sent Events stream endpoint. Authentication required. Used for streaming MCP sessions.
---
### `POST /mcp/sse`
Sends a client message over an active SSE session. Authentication required.
---
## Authentication
All authenticated endpoints require a bearer token:
```
Authorization: Bearer <api-key>
```
Alternatively, the key can be passed as a query parameter (useful for tools that do not support custom headers):
```
POST /mcp/tool/call?api_key=<api-key>
```
---
## MCP Tools
### `list_repositories`
Lists all Gitea repositories accessible to the bot user.
**Arguments:** none
**Example response text**
```
Found 3 repositories:
1. myorg/backend - Backend API service [Python] ★ 42
2. myorg/frontend - React frontend [TypeScript] ★ 18
3. myorg/infra - Infrastructure as code [HCL] ★ 5
```
---
### `get_repository_info`
Returns metadata for a single repository.
**Arguments**
| Name | Type | Required | Description |
|---|---|---|---|
| `owner` | string | Yes | Repository owner (user or organisation) |
| `repo` | string | Yes | Repository name |
**Example response text**
```
Repository: myorg/backend
Description: Backend API service
Language: Python
Stars: 42
Forks: 3
Default branch: main
Private: false
URL: https://gitea.example.com/myorg/backend
```
---
### `get_file_tree`
Returns the file and directory structure of a repository.
**Arguments**
| Name | Type | Required | Default | Description |
|---|---|---|---|---|
| `owner` | string | Yes | — | Repository owner |
| `repo` | string | Yes | — | Repository name |
| `ref` | string | No | default branch | Branch, tag, or commit SHA |
| `recursive` | boolean | No | `false` | Recursively list all subdirectories |
> **Note:** Recursive mode is disabled by default to limit response size. Enable with care on large repositories.
**Example response text**
```
File tree for myorg/backend (ref: main):
src/
src/main.py
src/config.py
tests/
tests/test_main.py
README.md
requirements.txt
```
---
### `get_file_contents`
Returns the contents of a single file.
**Arguments**
| Name | Type | Required | Default | Description |
|---|---|---|---|---|
| `owner` | string | Yes | — | Repository owner |
| `repo` | string | Yes | — | Repository name |
| `filepath` | string | Yes | — | Path to the file within the repository |
| `ref` | string | No | default branch | Branch, tag, or commit SHA |
**Limits**
- Files larger than `MAX_FILE_SIZE_BYTES` (default 1 MB) are rejected.
- Binary files that cannot be decoded as UTF-8 are returned as raw base64.
**Example response text**
```
Contents of myorg/backend/src/main.py (ref: main):
import fastapi
...
```
---
## Error Responses
All errors follow this structure:
```json
{
"content": [
{
"type": "text",
"text": "Error: <description>"
}
],
"isError": true
}
```
Common error scenarios:
| Scenario | HTTP Status | `isError` |
|---|---|---|
| Missing or invalid API key | 401 | — (rejected before tool runs) |
| Rate limited IP address | 429 | — |
| Tool not found | 404 | — |
| Repository not found in Gitea | 200 | `true` |
| File too large | 200 | `true` |
| Gitea API unavailable | 200 | `true` |
## Endpoints
- `GET /`: server metadata.
- `GET /health`: health probe.
- `GET /metrics`: Prometheus metrics (when enabled).
- `POST /automation/webhook`: ingest policy-controlled webhook events.
- `POST /automation/jobs/run`: run policy-controlled automation jobs.
- `GET /mcp/tools`: list tool definitions.
- `POST /mcp/tool/call`: execute a tool (`Authorization: Bearer <api-key>` required unless authentication is explicitly disabled).
- `GET /mcp/sse` and `POST /mcp/sse`: MCP SSE transport.
## Automation Jobs
`POST /automation/jobs/run` supports:
- `dependency_hygiene_scan` (read-only scaffold).
- `stale_issue_detection` (read-only issue age analysis).
- `auto_issue_creation` (write-mode + whitelist + policy required).
## Read Tools
- `list_repositories`.
- `get_repository_info` (`owner`, `repo`).
- `get_file_tree` (`owner`, `repo`, optional `ref`, `recursive`).
- `get_file_contents` (`owner`, `repo`, `filepath`, optional `ref`).
- `search_code` (`owner`, `repo`, `query`, optional `ref`, `page`, `limit`).
- `list_commits` (`owner`, `repo`, optional `ref`, `page`, `limit`).
- `get_commit_diff` (`owner`, `repo`, `sha`).
- `compare_refs` (`owner`, `repo`, `base`, `head`).
- `list_issues` (`owner`, `repo`, optional `state`, `page`, `limit`, `labels`).
- `get_issue` (`owner`, `repo`, `issue_number`).
- `list_pull_requests` (`owner`, `repo`, optional `state`, `page`, `limit`).
- `get_pull_request` (`owner`, `repo`, `pull_number`).
- `list_labels` (`owner`, `repo`, optional `page`, `limit`).
- `list_tags` (`owner`, `repo`, optional `page`, `limit`).
- `list_releases` (`owner`, `repo`, optional `page`, `limit`).
## Write Tools (Write Mode Required)
- `create_issue` (`owner`, `repo`, `title`, optional `body`, `labels`, `assignees`).
- `update_issue` (`owner`, `repo`, `issue_number`, one or more of `title`, `body`, `state`).
- `create_issue_comment` (`owner`, `repo`, `issue_number`, `body`).
- `create_pr_comment` (`owner`, `repo`, `pull_number`, `body`).
- `add_labels` (`owner`, `repo`, `issue_number`, `labels`).
- `assign_issue` (`owner`, `repo`, `issue_number`, `assignees`).
## Validation and Limits
- All tool argument schemas reject unknown fields.
- List responses are capped by `MAX_TOOL_RESPONSE_ITEMS`.
- Text payloads are capped by `MAX_TOOL_RESPONSE_CHARS`.
- File reads are capped by `MAX_FILE_SIZE_BYTES`.
## Error Model
- Policy denial: HTTP `403`.
- Validation error: HTTP `400`.
- Auth error: HTTP `401`.
- Rate limit: HTTP `429`.
- Internal errors: HTTP `500` without stack traces in production.

33
docs/audit.md Normal file

@@ -0,0 +1,33 @@
# Audit Logging
## Design
Audit logs are append-only JSON lines with hash chaining:
- `prev_hash`: previous entry hash.
- `entry_hash`: hash of current entry payload + previous hash.
This makes tampering detectable.
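To illustrate why chaining makes tampering detectable, here is a minimal sketch of the idea. It is not the logic of `scripts/validate_audit_log.py`: the use of SHA-256, the canonical JSON serialization, and the all-zero genesis value are assumptions made for the example.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed prev_hash for the first entry


def entry_hash(payload: dict, prev_hash: str) -> str:
    """Hash the previous hash together with a canonical form of the payload."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + canonical).encode("utf-8")).hexdigest()


def append_entry(log: list, payload: dict) -> None:
    """Append a payload, linking it to the hash of the previous entry."""
    prev = log[-1]["entry_hash"] if log else GENESIS
    log.append({**payload, "prev_hash": prev, "entry_hash": entry_hash(payload, prev)})


def verify_chain(log: list) -> bool:
    """Recompute every hash; any edited, removed, or reordered entry breaks the chain."""
    prev = GENESIS
    for entry in log:
        payload = {k: v for k, v in entry.items() if k not in ("prev_hash", "entry_hash")}
        if entry["prev_hash"] != prev or entry["entry_hash"] != entry_hash(payload, prev):
            return False
        prev = entry["entry_hash"]
    return True
```

Because each `entry_hash` depends on the one before it, changing an early entry invalidates that entry and every later one unless the attacker rewrites the whole tail of the log.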
## Event Types
- `tool_invocation`
- `access_denied`
- `security_event`
Each event includes timestamps and correlation context.
## Integrity Validation
Use:
```bash
python3 scripts/validate_audit_log.py --path /var/log/aegis-mcp/audit.log
```
Exit code `0` indicates a valid chain; any non-zero exit code indicates tampering or corruption.
## Operational Expectations
- Persist audit logs to durable storage.
- Protect write permissions (service account only).
- Validate integrity during incident response and release checks.

27
docs/automation.md Normal file

@@ -0,0 +1,27 @@
# Automation
## Scope
Current automation capabilities:
- Webhook ingestion endpoint (`POST /automation/webhook`).
- On-demand scheduled-job execution endpoint (`POST /automation/jobs/run`).
- Dependency hygiene scan job scaffold (`dependency_hygiene_scan`).
- Stale issue detection job (`stale_issue_detection`).
- Auto issue creation job scaffold (`auto_issue_creation`, write-mode and policy required).
Planned extensions:
- Background scheduler orchestration.
## Control Requirements
All automation must be:
- Policy-controlled.
- Independently disableable.
- Fully audited.
- Explicitly documented with runbook guidance.
## Enablement
- Set `AUTOMATION_ENABLED=true` to enable the automation endpoints.
- `AUTOMATION_SCHEDULER_ENABLED=true` is reserved for a future built-in scheduler loop.
- Policy rules must allow automation pseudo-tools (`automation_*`) per repository.


@@ -1,126 +1,46 @@
# Deployment
## Local / Development
## Secure Defaults
- Default bind: `MCP_HOST=127.0.0.1`.
- Binding `0.0.0.0` requires explicit `ALLOW_INSECURE_BIND=true`.
- Write mode disabled by default.
- Policy file path configurable via `POLICY_FILE_PATH`.
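The bind rule above amounts to a startup check along these lines. This is a sketch under assumptions: the function name `validate_bind` is hypothetical, and the real validation may parse booleans or report errors differently.

```python
def validate_bind(env: dict) -> str:
    """Return the host to bind, refusing 0.0.0.0 without an explicit opt-in."""
    host = env.get("MCP_HOST", "127.0.0.1")  # secure default: localhost only
    insecure_ok = env.get("ALLOW_INSECURE_BIND", "false").strip().lower() == "true"
    if host == "0.0.0.0" and not insecure_ok:
        # Fail closed: starting on all interfaces must be a deliberate choice.
        raise ValueError("Binding 0.0.0.0 requires ALLOW_INSECURE_BIND=true")
    return host
```

In practice the mapping passed in would be `os.environ`; a plain dict is used here so the check can be tested in isolation.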
## Local Development
```bash
make install-dev
source venv/bin/activate # Linux/macOS
# venv\Scripts\activate # Windows
cp .env.example .env
# Edit .env
make generate-key # Add key to .env
make generate-key
make run
```
The server listens on `http://0.0.0.0:8080` by default.
---
## Docker
### Build
- Use `docker/Dockerfile` (non-root runtime).
- Use compose profiles:
- `prod`: hardened runtime profile.
- `dev`: local development profile (localhost-only port bind).
Run examples:
```bash
make docker-build
# or: docker build -f docker/Dockerfile -t aegis-gitea-mcp .
docker compose --profile prod up -d
docker compose --profile dev up -d
```
### Configure
## Environment Validation
Create a `.env` file (copy from `.env.example`) with your settings before starting the container.
Startup validates:
- Required Gitea settings.
- API keys (when auth enabled).
- Insecure bind opt-in.
- Write whitelist when write mode enabled.
### Run
## Production Recommendations
```bash
make docker-up
# or: docker-compose up -d
```
### Logs
```bash
make docker-logs
# or: docker-compose logs -f
```
### Stop
```bash
make docker-down
# or: docker-compose down
```
---
## docker-compose.yml Overview
The included `docker-compose.yml` provides:
- **Health check:** polls `GET /health` every 30 seconds
- **Audit log volume:** mounts a named volume at `/var/log/aegis-mcp` so logs survive container restarts
- **Resource limits:** 1 CPU, 512 MB memory
- **Security:** non-root user, `no-new-privileges`
- **Traefik labels:** commented out — uncomment and set `MCP_DOMAIN` to enable automatic HTTPS via Traefik
### Enabling Traefik
1. Set `MCP_DOMAIN=mcp.yourdomain.com` in `.env`.
2. Uncomment the Traefik labels in `docker-compose.yml`.
3. Make sure Traefik is running with a `web` and `websecure` entrypoint and Let's Encrypt configured.
---
## Dockerfile Details
The image uses a multi-stage build:
| Stage | Base image | Purpose |
|---|---|---|
| `builder` | `python:3.11-slim` | Install dependencies |
| `final` | `python:3.11-slim` | Minimal runtime image |
The final image:
- Runs as user `aegis` (UID 1000, GID 1000)
- Exposes port `8080`
- Entry point: `python -m aegis_gitea_mcp.server`
---
## Production Checklist
- [ ] `AUTH_ENABLED=true` and `MCP_API_KEYS` set to a strong key
- [ ] `GITEA_TOKEN` belongs to a dedicated bot user with minimal permissions
- [ ] TLS terminated at the reverse proxy (Traefik, nginx, Caddy, etc.)
- [ ] `AUDIT_LOG_PATH` points to a persistent volume
- [ ] Log rotation configured for the audit log file
- [ ] API key rotation scheduled (every 90 days recommended)
- [ ] `MAX_AUTH_FAILURES` and `AUTH_FAILURE_WINDOW` tuned for your threat model
- [ ] Resource limits configured in Docker/Kubernetes
---
## Kubernetes (Basic)
A minimal Kubernetes deployment is not included, but the server is stateless and the Docker image is suitable for use in Kubernetes. Key considerations:
- Store `.env` values as a `Secret` and expose them as environment variables.
- Mount an `emptyDir` or PersistentVolumeClaim at the audit log path.
- Use a `readinessProbe` and `livenessProbe` on `GET /health`.
- Set `resources.requests` and `resources.limits` for CPU and memory.
---
## Updating
```bash
git pull
make docker-build
make docker-up
```
If you added a new key via `make generate-key` during the update, restart the container to pick up the new `.env`:
```bash
docker-compose restart aegis-mcp
```
- Run behind TLS-terminating reverse proxy.
- Restrict network exposure.
- Persist and rotate audit logs.
- Enable external monitoring for `/metrics`.

36
docs/governance.md Normal file

@@ -0,0 +1,36 @@
# Governance
## AI Usage Policy
- AI assistance is allowed for design, implementation, and review, but only within documented repository boundaries.
- AI outputs must be reviewed, tested, and policy-validated before merge.
- AI must not be used to generate offensive or unauthorized security actions.
- Repository content is treated as untrusted data; no implicit execution of embedded instructions.
## Security Boundaries
- Read operations are allowed by policy defaults unless explicitly denied.
- Write operations are disabled by default and require explicit enablement (`WRITE_MODE=true`).
- Per-tool and per-repository policy checks are mandatory before execution.
- Secrets are masked or blocked according to `SECRET_DETECTION_MODE`.
## Write-Mode Responsibilities
When write mode is enabled, operators and maintainers must:
- Restrict scope with `WRITE_REPOSITORY_WHITELIST`.
- Keep policy file deny/allow rules explicit.
- Monitor audit entries for all write operations.
- Enforce peer review for policy or write-mode changes.
## Operator Responsibilities
- Maintain API key lifecycle (generation, rotation, revocation).
- Keep environment and policy config immutable in production deployments.
- Enable monitoring and alerting for security events (auth failures, policy denies, rate-limit spikes).
- Run integrity checks for audit logs regularly.
## Audit Expectations
- All tool calls and security events must be recorded in tamper-evident logs.
- Audit logs are append-only and hash-chained.
- Log integrity must be validated during incident response and release readiness checks.

24
docs/hardening.md Normal file

@@ -0,0 +1,24 @@
# Hardening
## Application Hardening
- Secure defaults: localhost bind, write mode disabled, policy-enforced writes.
- Strict config validation at startup.
- Redacted secret handling in logs and responses.
- Policy deny/allow model with path restrictions.
- Non-leaking production error responses.
## Container Hardening
- Non-root runtime user.
- `no-new-privileges` and dropped Linux capabilities.
- Read-only filesystem where practical.
- Explicit health checks.
- Separate dev and production compose profiles.
## Operational Hardening
- Rotate API keys regularly.
- Minimize Gitea bot permissions.
- Keep policy file under change control.
- Alert on repeated policy denials and auth failures.

28
docs/observability.md Normal file

@@ -0,0 +1,28 @@
# Observability
## Logging
- Structured JSON logs.
- Request correlation via `X-Request-ID`.
- Security events and policy denials are audit logged.
## Metrics
Prometheus-compatible endpoint: `GET /metrics`.
Current metrics:
- `aegis_http_requests_total{method,path,status}`
- `aegis_tool_calls_total{tool,status}`
- `aegis_tool_duration_seconds_sum{tool}`
- `aegis_tool_duration_seconds_count{tool}`
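The `_sum` and `_count` series can be combined into a mean latency per tool. The sketch below parses the Prometheus text format directly; it assumes each sample line carries exactly one `tool` label, and the function name `mean_tool_latency` is hypothetical.

```python
import re

# Matches lines such as: aegis_tool_duration_seconds_sum{tool="list_commits"} 1.5
SAMPLE = re.compile(r'^aegis_tool_duration_seconds_(sum|count)\{tool="([^"]+)"\}\s+(\S+)$')


def mean_tool_latency(metrics_text: str) -> dict:
    """Return mean duration in seconds per tool (sum divided by count)."""
    sums, counts = {}, {}
    for line in metrics_text.splitlines():
        match = SAMPLE.match(line.strip())
        if not match:
            continue  # skip comments, blank lines, and unrelated metrics
        kind, tool, value = match.groups()
        (sums if kind == "sum" else counts)[tool] = float(value)
    # Skip tools whose count is missing or zero to avoid dividing by zero.
    return {tool: sums[tool] / counts[tool] for tool in sums if counts.get(tool)}
```

In a Prometheus deployment the same ratio is normally computed in a query over a time window instead, which gives a recent average and not a lifetime one.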
## Tracing and Correlation
- Request IDs are propagated in the `X-Request-ID` response header.
- Tool-level correlation IDs included in MCP responses.
## Operational Guidance
- Alert on spikes in 401/403/429 rates.
- Alert on repeated `access_denied` and auth-rate-limit events.
- Track tool latency trends for incident triage.

50
docs/policy.md Normal file

@@ -0,0 +1,50 @@
# Policy Engine
## Overview
Aegis uses a YAML policy engine to authorize tool execution before any Gitea API call is made.
## Behavior Summary
- Global tool allow/deny supported.
- Per-repository tool allow/deny supported.
- Optional repository path allow/deny supported.
- Write operations are denied by default.
- Write operations also require `WRITE_MODE=true` and `WRITE_REPOSITORY_WHITELIST` match.
## Example Configuration
```yaml
defaults:
read: allow
write: deny
tools:
deny:
- search_code
repositories:
acme/service-a:
tools:
allow:
- get_file_contents
- list_commits
paths:
allow:
- src/*
deny:
- src/secrets/*
```
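One way to read the example configuration is sketched below. The evaluation order is an assumption, not the engine's documented behaviour: deny beats allow, a per-repository `allow` list is exclusive, and path patterns are matched with `fnmatch`, where `*` also matches `/`. The policy is written as a Python dict mirroring the YAML so the sketch has no third-party dependency.

```python
from fnmatch import fnmatchcase

POLICY = {
    "defaults": {"read": "allow", "write": "deny"},
    "tools": {"deny": ["search_code"]},
    "repositories": {
        "acme/service-a": {
            "tools": {"allow": ["get_file_contents", "list_commits"]},
            "paths": {"allow": ["src/*"], "deny": ["src/secrets/*"]},
        }
    },
}


def is_allowed(policy, tool, repo, path=None, is_write=False):
    """Hypothetical evaluation: deny wins, then allow lists, then defaults."""
    if tool in policy.get("tools", {}).get("deny", []):
        return False  # a global deny always wins
    repo_rules = policy.get("repositories", {}).get(repo, {})
    tool_rules = repo_rules.get("tools", {})
    if tool in tool_rules.get("deny", []):
        return False
    allow_list = tool_rules.get("allow")
    if allow_list is not None:
        if tool not in allow_list:
            return False  # an allow list is treated as exclusive
    elif policy["defaults"]["write" if is_write else "read"] != "allow":
        return False  # no repository rule: fall back to the defaults
    if path is not None:
        path_rules = repo_rules.get("paths", {})
        if any(fnmatchcase(path, pat) for pat in path_rules.get("deny", [])):
            return False
        allowed = path_rules.get("allow")
        if allowed is not None and not any(fnmatchcase(path, pat) for pat in allowed):
            return False
    return True
```

Under these assumptions, `get_file_contents` on `acme/service-a` is permitted for `src/main.py` but refused for `src/secrets/key.pem` and for `README.md`. Path normalization (for example rejecting `..` segments) is deliberately left out of the sketch and would need to happen before matching.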
## Failure Behavior
- Invalid YAML or invalid schema: startup failure (fail closed).
- Denied tool call: HTTP `403` + audit `access_denied` entry.
- Path traversal attempt in path-scoped tools: denied by validation/policy checks.
## Operational Guidance
- Keep policy files version-controlled and code-reviewed.
- Prefer explicit deny entries for sensitive tools.
- Use repository-specific allow lists for high-risk environments.
- Test policy updates in staging before production rollout.

72
docs/roadmap.md Normal file

@@ -0,0 +1,72 @@
# Roadmap
## High-Level Evolution Plan
1. Hardened read-only gateway baseline.
2. Policy-driven authorization and observability.
3. Controlled write-mode rollout.
4. Automation and event-driven workflows.
5. Continuous hardening and enterprise controls.
## Threat Model Updates
- Primary threats: credential theft, over-permissioned automation, prompt injection via repo data, policy bypass, audit tampering.
- Secondary threats: denial-of-service, misconfiguration drift, unsafe deployment defaults.
## Security Model
- API key authentication + auth failure throttling.
- Per-IP and per-token request rate limits.
- Secret detection and outbound sanitization.
- Tamper-evident audit logs with integrity verification.
- No production stack-trace disclosure.
## Policy Model
- YAML policy with global and per-repository allow/deny rules.
- Optional path restrictions for file-oriented tools.
- Default write deny.
- Write-mode repository whitelist enforcement.
## Capability Matrix Concept
- `Read` capabilities: enabled by default but policy-filtered.
- `Write` capabilities: disabled by default, policy + whitelist gated.
- `Automation` capabilities: disabled by default, policy-controlled.
## Audit Log Design
- JSON lines.
- `prev_hash` + `entry_hash` chain.
- Correlation/request IDs for traceability.
- Validation script for chain integrity.
## Write-Mode Architecture
- Separate write tool set with strict schemas.
- Global toggle (`WRITE_MODE`) + per-repo whitelist.
- Policy engine still authoritative.
- No merge, branch deletion, or force push endpoints.
## Deployment Architecture
- Non-root container runtime.
- Read-only filesystem where practical.
- Explicit opt-in for insecure bind.
- Separate dev and prod compose profiles.
## Observability Architecture
- Structured JSON logs with request correlation.
- Prometheus-compatible `/metrics` endpoint.
- Tool execution counters and duration aggregates.
## Risk Analysis
- Highest risk: write-mode misuse and policy misconfiguration.
- Mitigations: deny-by-default, whitelist, audit chain, tests, docs, reviews.
## Extensibility Notes
- Add new tools only through the schema + policy + docs + tests path.
- Keep the execution core transport-agnostic for webhook/scheduler integrations.


@@ -1,155 +1,39 @@
# Security
## Authentication
## Core Controls
AegisGitea MCP uses bearer token authentication. Clients must include a valid API key with every tool call.
- API key authentication with constant-time comparison.
- Auth failure throttling.
- Per-IP and per-token request rate limits.
- Strict input validation via Pydantic schemas (`extra=forbid`).
- Policy engine authorization before tool execution.
- Secret detection with mask/block behavior.
- Production-safe error responses (no stack traces).
### How It Works
## Prompt Injection Hardening
1. The client sends `Authorization: Bearer <key>` with its request.
2. The server extracts the token and validates it against the configured `MCP_API_KEYS`.
3. Comparison is done in **constant time** to prevent timing attacks.
4. If validation fails, the failure is counted against the client's IP address.
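Steps 1 to 3 above can be sketched with the standard library. This is an illustrative sketch only: `extract_bearer` and `is_valid_key` are hypothetical names, and the server's actual implementation may differ.

```python
import hmac
from typing import List, Optional


def extract_bearer(header_value: str) -> Optional[str]:
    """Return the token from an 'Authorization: Bearer <key>' header value."""
    scheme, _, token = header_value.partition(" ")
    token = token.strip()
    if scheme.lower() != "bearer" or not token:
        return None
    return token


def is_valid_key(presented: str, configured_keys: List[str]) -> bool:
    """Compare against every configured key without short-circuiting."""
    presented_bytes = presented.encode("utf-8")
    valid = False
    for key in configured_keys:
        # hmac.compare_digest takes time independent of where the inputs differ.
        if hmac.compare_digest(presented_bytes, key.encode("utf-8")):
            valid = True
    return valid
```

A plain `==` comparison can return as soon as the first differing character is found, which is the timing signal that constant-time comparison is meant to remove.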
Repository content is treated strictly as data.
### Generating API Keys
- Tool outputs are bounded and sanitized.
- No instruction execution from repository text.
- Untrusted content handling helpers enforce maximum output size.
Use the provided script to generate cryptographically secure 64-character hex keys:
## Secret Detection
```bash
make generate-key
# or: python scripts/generate_api_key.py
```
Detected classes include:
- API keys and generic token patterns.
- JWT-like tokens.
- Private key block markers.
- Common provider token formats.
Keys must be at least 32 characters long. The script also saves metadata (creation date, expiration) to a `keys/` directory.
Behavior:
- `SECRET_DETECTION_MODE=mask`: redact in place.
- `SECRET_DETECTION_MODE=block`: replace secret-bearing field values.
- `SECRET_DETECTION_MODE=off`: disable sanitization (not recommended).
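The difference between the three modes can be illustrated as follows. The patterns, the placeholder strings, and the `sanitize` function are assumptions chosen for the example; the project's real detectors are not shown in this document.

```python
import re

# Illustrative patterns only; a real detector would cover many more formats.
PATTERNS = [
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),                  # private key block marker
    re.compile(r"\beyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+"),  # JWT-like token
    re.compile(r"\bAKIA[0-9A-Z]{16}\b"),                                # AWS access key ID format
]


def sanitize(value: str, mode: str = "mask") -> str:
    """Apply the chosen secret-handling mode to one field value."""
    if mode == "off":
        return value  # no sanitization (not recommended)
    if mode == "block":
        # Replace the whole field if anything in it looks like a secret.
        if any(p.search(value) for p in PATTERNS):
            return "[BLOCKED: secret detected]"
        return value
    if mode == "mask":
        # Redact only the matching spans and keep the surrounding text.
        for pattern in PATTERNS:
            value = pattern.sub("[REDACTED]", value)
        return value
    raise ValueError(f"unknown SECRET_DETECTION_MODE: {mode}")
```

`mask` keeps the surrounding context readable, while `block` trades context for a stronger guarantee that no partial secret leaks.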
### Multiple Keys (Grace Period During Rotation)
## Authentication and Key Lifecycle
You can configure multiple keys separated by commas. This allows you to add a new key and remove the old one without downtime:
```env
MCP_API_KEYS=newkey...,oldkey...
```
Remove the old key from the list after all clients have been updated.
---
## Key Rotation
Rotate keys regularly (recommended: every 90 days).
```bash
make rotate-key
# or: python scripts/rotate_api_key.py
```
The rotation script:
1. Reads the current key from `.env`
2. Generates a new key
3. Offers to replace the key immediately or add it alongside the old key (grace period)
4. Creates a backup of your `.env` before modifying it
### Checking Key Age
```bash
make check-key-age
# or: python scripts/check_key_age.py
```
Exit codes: `0` = OK, `1` = expiring within 7 days (warning), `2` = already expired (critical).
---
## Rate Limiting
Failed authentication attempts are tracked per client IP address.
| Setting | Default | Description |
|---|---|---|
| `MAX_AUTH_FAILURES` | `5` | Maximum failures before the IP is blocked |
| `AUTH_FAILURE_WINDOW` | `300` | Rolling window in seconds |
Once an IP exceeds the threshold, all further requests from that IP return HTTP 429 until the window resets. This is enforced entirely in memory — a server restart resets the counters.
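The rolling window described above can be modelled with one timestamp queue per IP. This is a sketch, not the server's code: the class name is hypothetical, and it assumes an IP is blocked once it reaches `MAX_AUTH_FAILURES` failures inside the window.

```python
import time
from collections import defaultdict, deque


class AuthFailureTracker:
    """In-memory rolling-window counter of failed auth attempts per IP."""

    def __init__(self, max_failures=5, window_seconds=300, clock=time.monotonic):
        self.max_failures = max_failures
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so tests can control time
        self._failures = defaultdict(deque)

    def _prune(self, ip, now):
        # Drop failures that have aged out of the rolling window.
        queue = self._failures[ip]
        while queue and now - queue[0] >= self.window_seconds:
            queue.popleft()
        return queue

    def record_failure(self, ip):
        now = self.clock()
        self._prune(ip, now).append(now)

    def is_blocked(self, ip):
        # A blocked request would be answered with HTTP 429.
        return len(self._prune(ip, self.clock())) >= self.max_failures
```

Because the state lives in a process-local dictionary, restarting the process clears it, which matches the restart behaviour noted above.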
---
## Audit Logging
All security-relevant events are written to a structured JSON log file.
### Log Location
Default: `/var/log/aegis-mcp/audit.log`
Configurable via `AUDIT_LOG_PATH`.
The directory is created automatically on startup.
### What Is Logged
| Event | Description |
|---|---|
| Tool invocation | Every call to a tool: tool name, arguments, result status, correlation ID |
| Access denied | Failed authentication attempts: IP address, reason |
| Security event | Rate limit triggers, invalid key formats, startup authentication status |
### Log Format
Each entry is a JSON object on a single line:
```json
{
"timestamp": "2026-02-13T10:00:00Z",
"event": "tool_invocation",
"correlation_id": "a1b2c3d4-...",
"tool": "get_file_contents",
"owner": "myorg",
"repo": "backend",
"path": "src/main.py",
"result": "success",
"client_ip": "10.0.0.1"
}
```
### Using Logs for Monitoring
Because entries are newline-delimited JSON, they are easy to parse:
```bash
# Show all failed tool calls
grep '"result": "error"' /var/log/aegis-mcp/audit.log | jq .
# Show all access-denied events
grep '"event": "access_denied"' /var/log/aegis-mcp/audit.log | jq .
```
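The `grep` patterns above depend on the exact spacing after each colon. Where that is a concern, the entries can be parsed as JSON instead. A minimal sketch, with `filter_events` as a hypothetical helper name:

```python
import json


def filter_events(lines, **criteria):
    """Yield audit entries whose fields equal all of the given criteria."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip truncated or corrupted lines
        if all(entry.get(key) == value for key, value in criteria.items()):
            yield entry
```

For example, iterating over `filter_events(open_file, event="access_denied")` yields the same entries as the second `grep` command, regardless of how the JSON was whitespace-formatted.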
---
## Access Control Model
AegisGitea MCP does **not** implement its own repository access control. Access to repositories is determined entirely by the Gitea bot user's permissions:
- If the bot user has no access to a repository, it will not appear in `list_repositories` and `get_repository_info` will return an error.
- Grant the bot user the minimum set of repository permissions needed.
**Principle of least privilege:** create a dedicated bot user and grant it read-only access only to the repositories that the AI needs to see.
---
## Network Security Recommendations
- Run the MCP server behind a reverse proxy (e.g. Traefik or nginx) with TLS.
- Do not expose the server directly on a public port without TLS.
- Restrict inbound connections to known AI client IP ranges where possible.
- The `/mcp/tools` endpoint is intentionally public (required for ChatGPT plugin discovery). If this is undesirable, restrict it at the network/proxy level.
---
## Container Security
The provided Docker image runs with:
- A non-root user (`aegis`, UID 1000)
- `no-new-privileges` security option
- CPU and memory resource limits (1 CPU, 512 MB)
See [Deployment](deployment.md) for details.
- Keys must be at least 32 characters.
- Rotate keys regularly (`scripts/rotate_api_key.py`).
- Check key age and expiry (`scripts/check_key_age.py`).
- Prefer dedicated bot credentials with least privilege.

92
docs/todo.md Normal file

@@ -0,0 +1,92 @@
# TODO
## Phase 0 Governance
- [x] Add `CODE_OF_CONDUCT.md`.
- [x] Add governance policy documentation.
- [x] Upgrade `AGENTS.md` as authoritative AI contract.
## Phase 1 Architecture
- [x] Publish roadmap and threat/security model updates.
- [x] Publish phased TODO tracker.
## Phase 2 Expanded Read Tools
- [x] Implement `search_code`.
- [x] Implement `list_commits`.
- [x] Implement `get_commit_diff`.
- [x] Implement `compare_refs`.
- [x] Implement `list_issues`.
- [x] Implement `get_issue`.
- [x] Implement `list_pull_requests`.
- [x] Implement `get_pull_request`.
- [x] Implement `list_labels`.
- [x] Implement `list_tags`.
- [x] Implement `list_releases`.
- [x] Add input validation and response bounds.
- [x] Add unit/failure-mode tests.
## Phase 3 Policy Engine
- [x] Implement YAML policy loader and validator.
- [x] Implement per-tool and per-repo allow/deny.
- [x] Implement optional path restrictions.
- [x] Enforce default write deny.
- [x] Add policy unit tests.
## Phase 4 Write Mode
- [x] Implement write tools (`create_issue`, `update_issue`, comments, labels, assignment).
- [x] Keep write mode disabled by default.
- [x] Enforce repository whitelist.
- [x] Ensure no merge/deletion/force-push capabilities.
- [x] Add write denial tests.
## Phase 5 Hardening
- [x] Add secret detection + mask/block controls.
- [x] Add prompt-injection defensive model (data-only handling).
- [x] Add tamper-evident audit chaining and validation.
- [x] Add per-IP and per-token rate limiting.
## Phase 6 Automation
- [x] Implement webhook ingestion pipeline.
- [x] Implement on-demand scheduled jobs runner endpoint.
- [x] Implement auto issue creation job scaffold from findings.
- [x] Implement dependency hygiene scan orchestration scaffold.
- [x] Implement stale issue detection automation.
- [x] Add automation endpoint tests.
## Phase 7 Deployment
- [x] Harden Docker runtime defaults.
- [x] Separate dev/prod compose profiles.
- [x] Preserve non-root runtime and health checks.
## Phase 8 Observability
- [x] Add Prometheus metrics endpoint.
- [x] Add structured JSON logging.
- [x] Add request ID correlation.
- [x] Add tool timing metrics.
## Phase 9 Testing and Release Readiness
- [x] Extend unit tests.
- [x] Add policy tests.
- [x] Add secret detection tests.
- [x] Add write-mode denial tests.
- [x] Add audit integrity tests.
- [ ] Add integration-tagged tests against live Gitea (optional CI stage).
- [ ] Final security review sign-off.
- [ ] Release checklist execution.
## Release Checklist
- [ ] `make lint`
- [ ] `make test`
- [ ] Documentation review complete
- [ ] Policy file reviewed for production scope
- [ ] Write mode remains disabled unless explicitly approved

40
docs/write-mode.md Normal file

@@ -0,0 +1,40 @@
# Write Mode
## Threat Model
Write mode introduces mutation risk (issue/PR changes, metadata updates). Risks include unauthorized actions, accidental mass updates, and audit evasion.
## Default Posture
- `WRITE_MODE=false` by default.
- Even when enabled, writes require repository whitelist membership.
- Policy engine remains authoritative and may deny specific write tools.
## Supported Write Tools
- `create_issue`
- `update_issue`
- `create_issue_comment`
- `create_pr_comment`
- `add_labels`
- `assign_issue`
Not supported (explicitly forbidden): merge actions, branch deletion, force push.
## Enablement Steps
1. Set `WRITE_MODE=true`.
2. Set `WRITE_REPOSITORY_WHITELIST=owner/repo,...`.
3. Review policy file for write-tool scope.
4. Verify audit logging and alerting before rollout.
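The layered gate implied by these steps can be expressed as a single conjunction. This is a sketch under assumptions: the function names are hypothetical, whitelist matching is assumed to be an exact, case-sensitive `owner/repo` comparison, and the policy decision is passed in as a boolean.

```python
def parse_whitelist(raw: str) -> set:
    """Split a comma-separated WRITE_REPOSITORY_WHITELIST value into owner/repo entries."""
    return {item.strip() for item in raw.split(",") if item.strip()}


def write_permitted(write_mode: bool, whitelist: set, owner: str, repo: str, policy_allows: bool) -> bool:
    """A write proceeds only if every layer agrees: toggle, whitelist, and policy."""
    return write_mode and f"{owner}/{repo}" in whitelist and policy_allows
```

Each layer can veto independently, so leaving any one of them at its default keeps writes disabled.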
## Safe Operations
- Start with one repository in the whitelist.
- Use narrowly scoped bot credentials.
- Require peer review for whitelist/policy changes.
- Disable write mode during incident response if abuse is suspected.
## Risk Tradeoffs
Write mode improves automation and triage speed but increases blast radius. Use least privilege, tight policy, and strong monitoring.