Add OAuth2/OIDC per-user Gitea authentication

Introduce a GiteaOAuthValidator for JWT and userinfo validation with fallbacks, add an /oauth/token proxy, and thread per-user tokens through the request context and automation paths. Update config and .env.example for OAuth-first mode; add OpenAPI, extensive unit/integration tests, GitHub/Gitea CI workflows, docs, and lint/test enforcement (>= 80% coverage).
2026-02-25 16:54:01 +01:00
parent a00b6a0ba2
commit 59e1ea53a8
31 changed files with 2575 additions and 660 deletions
--- a/docs/security.md
+++ b/docs/security.md
@@ -2,38 +2,57 @@

 ## Core Controls

-- API key authentication with constant-time comparison.
-- Auth failure throttling.
-- Per-IP and per-token request rate limits.
-- Strict input validation via Pydantic schemas (`extra=forbid`).
-- Policy engine authorization before tool execution.
-- Secret detection with mask/block behavior.
-- Production-safe error responses (no stack traces).
+- OAuth2/OIDC bearer-token authentication for MCP tool execution.
+- OIDC discovery + JWKS validation cache for JWT tokens.
+- Userinfo validation fallback for opaque OAuth tokens.
+- Scope enforcement:
+  - `read:repository` for read tools.
+  - `write:repository` for write tools.
+- Policy engine checks before tool execution.
+- Per-IP and per-token rate limiting.
+- Strict schema validation (`extra=forbid`).
+- Tamper-evident audit logging with hash chaining.
+- Secret sanitization for logs and tool output.
+- Production-safe error responses (no internal stack traces).
+
+## Threat Model
+
+### Why shared bot tokens are dangerous
+
+- A single leaked bot token can expose all repositories that bot can access.
+- Access is not naturally bounded per end user.
+- Blast radius is large and cross-tenant.
+
+### Why token-in-URL is insecure
+
+- URLs can be captured by reverse proxy logs, browser history, referer headers, and monitoring pipelines.
+- Bearer tokens must be passed in `Authorization` headers only.
+
+### Why per-user OAuth reduces lateral access
+
+- Each MCP request executes with the signed-in user token.
+- Gitea authorization stays source-of-truth for repository visibility.
+- A compromised token is limited to that user’s permissions.

 ## Prompt Injection Hardening

-Repository content is treated strictly as data.
+Repository content is treated as untrusted data.

 - Tool outputs are bounded and sanitized.
-- No instruction execution from repository text.
-- Untrusted content handling helpers enforce maximum output size.
+- No instructions from repository text are executed.
+- Text fields are size-limited before returning to LLM clients.

 ## Secret Detection

 Detected classes include:
-- API keys and generic token patterns.
+
+- API key and token patterns.
 - JWT-like tokens.
 - Private key block markers.
-- Common provider token formats.
+- Common provider credential formats.

 Behavior:
+
 - `SECRET_DETECTION_MODE=mask`: redact in place.
-- `SECRET_DETECTION_MODE=block`: replace secret-bearing field values.
+- `SECRET_DETECTION_MODE=block`: replace secret-bearing values.
 - `SECRET_DETECTION_MODE=off`: disable sanitization (not recommended).
-
-## Authentication and Key Lifecycle
-
-- Keys must be at least 32 characters.
-- Rotate keys regularly (`scripts/rotate_api_key.py`).
-- Check key age and expiry (`scripts/check_key_age.py`).
-- Prefer dedicated bot credentials with least privilege.
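
The updated docs/security.md states two rules that are easy to get wrong: bearer tokens are accepted from the `Authorization` header only, and read and write tools require `read:repository` and `write:repository` respectively. The following is a minimal sketch of both checks, not the code from this commit. The function names, the `REQUIRED_SCOPE` table, and the "read"/"write" tool kinds are hypothetical; only the scope strings come from the doc.

```python
from typing import Mapping, Optional, Set

# Scope names are taken from docs/security.md; the "read"/"write" tool kinds
# used as keys are an assumption made for this sketch.
REQUIRED_SCOPE = {
    "read": "read:repository",
    "write": "write:repository",
}


def extract_bearer_token(headers: Mapping[str, str]) -> Optional[str]:
    """Return the bearer token from the Authorization header, or None.

    Only headers are inspected, so a token placed in the URL or query
    string is never picked up.
    """
    for name, value in headers.items():
        # HTTP header names are case-insensitive.
        if name.lower() == "authorization":
            scheme, _, token = value.partition(" ")
            token = token.strip()
            if scheme.lower() == "bearer" and token:
                return token
    return None


def is_tool_allowed(tool_kind: str, granted_scopes: Set[str]) -> bool:
    """Check that the token's scopes include the scope the tool kind needs.

    Unknown tool kinds are denied. This sketch does not treat
    write:repository as implying read:repository; the doc does not say
    whether it should.
    """
    required = REQUIRED_SCOPE.get(tool_kind)
    return required is not None and required in granted_scopes
```

A caller would first extract the token, validate it (JWT via JWKS, or userinfo for opaque tokens, as the doc describes), and only then pass the validated scopes to `is_tool_allowed`.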