
# Security

## Core Controls

- OAuth2/OIDC bearer-token authentication for MCP tool execution.
- OIDC discovery and a JWKS cache for validating JWTs.
- Userinfo validation fallback for opaque OAuth tokens.
- Scope enforcement:
  - `read:repository` for read tools.
  - `write:repository` for write tools.
- Policy engine checks before tool execution.
- Per-IP and per-token rate limiting.
- Strict schema validation (`extra=forbid`).
- Tamper-evident audit logging with hash chaining.
- Secret sanitization for logs and tool output.
- Production-safe error responses (no internal stack traces).
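
As an illustration of the hash-chained audit log listed above, the following is a minimal sketch, not the project's actual implementation. The entry layout, the `GENESIS_HASH` value, and the function names `append_entry` and `verify_chain` are assumptions made for this example.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # assumed starting value for the chain


def _digest(prev_hash: str, event: dict) -> str:
    # Canonical JSON so the same event always hashes to the same value.
    payload = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append_entry(log: list, event: dict) -> dict:
    """Append an event whose hash also covers the previous entry's hash."""
    prev_hash = log[-1]["hash"] if log else GENESIS_HASH
    entry = {"event": event, "prev_hash": prev_hash, "hash": _digest(prev_hash, event)}
    log.append(entry)
    return entry


def verify_chain(log: list) -> bool:
    """Recompute every hash; an edited, reordered, or mid-chain-deleted entry fails."""
    prev_hash = GENESIS_HASH
    for entry in log:
        if entry["prev_hash"] != prev_hash:
            return False
        if entry["hash"] != _digest(prev_hash, entry["event"]):
            return False
        prev_hash = entry["hash"]
    return True
```

Note that a chain like this does not detect truncation of the newest entries on its own; detecting that would require anchoring the latest hash somewhere outside the log.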

## Threat Model

### Why shared bot tokens are dangerous

- A single leaked bot token can expose all repositories that bot can access.
- Access is not naturally bounded per end user.
- Blast radius is large and cross-tenant.

### Why token-in-URL is insecure

- URLs can be captured by reverse proxy logs, browser history, referer headers, and monitoring pipelines.
- Bearer tokens must be passed in `Authorization` headers only.
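
The header-only rule can be sketched as follows. This is an illustrative helper under stated assumptions: the function name `extract_bearer_token`, the query parameter names in `TOKEN_QUERY_KEYS`, and the choice to raise `ValueError` are hypothetical and not taken from the project.

```python
from typing import Mapping, Optional
from urllib.parse import parse_qs, urlsplit

# Assumed names of query parameters that would carry a token in the URL.
TOKEN_QUERY_KEYS = {"token", "access_token"}


def extract_bearer_token(url: str, headers: Mapping[str, str]) -> Optional[str]:
    """Return the bearer token from the Authorization header, or None.

    Raises ValueError when the URL itself carries a token-like parameter,
    so a token in the query string is rejected rather than silently used.
    """
    query = parse_qs(urlsplit(url).query, keep_blank_values=True)
    if TOKEN_QUERY_KEYS & {key.lower() for key in query}:
        raise ValueError("tokens in the URL are not accepted")

    # Assumes the caller passes headers with canonical capitalization.
    scheme, _, value = headers.get("Authorization", "").partition(" ")
    token = value.strip()
    if scheme.lower() != "bearer" or not token:
        return None
    return token
```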

### Why per-user OAuth reduces lateral access

- Each MCP request executes with the signed-in user's token.
- Gitea authorization remains the source of truth for repository visibility.
- A compromised token is limited to that user's permissions.

## Resource-type-aware authorization

The public server runs in *service-PAT mode*: a privileged bot token makes the
actual Gitea calls while the per-user OAuth identity decides what the user may
reach. Repository calls are gated by the user's collaborator permission on
`owner/repo`. The rest of the Gitea surface — reachable through the
`gitea_request` escape hatch — is gated by **resource-type-aware authorization**
(`authz.py`). Every call is classified by `(method, path)` and enforced against
a type-specific rule. **Every decision fails closed**: a call that cannot be
classified, or whose permission cannot be positively verified against Gitea, is
denied and audited.

| Resource type | Rule (service-PAT mode) |
|---------------|--------------------------|
| `repository` | Per-user collaborator permission on `owner/repo` (existing check). A repo path that cannot be parsed to `owner/repo` is denied. |
| `org` | The signed-in user must be a **verified member** of the target org (checked against Gitea, fail closed). |
| `user_owned` | A resource owned by a named user/org (`/users/{name}`, `/packages/{owner}`): allowed only when the owner is the caller, or the caller is a verified member of the owning org. |
| `user_self` | Token-owner-scoped endpoints (`/user`, `/notifications`): **denied** — in service-PAT mode the data belongs to the bot, not the caller. |
| `misc_global` | Instance-wide read-only utilities (markdown render, version, gitignore templates): reads allowed; writes denied. |
| `admin` | **Default deny.** Allowed only when the operator opts in (`RAW_API_ALLOW_SENSITIVE=true`) **and** the signed-in user is a verified Gitea site administrator. |
| `unknown` | Denied. |

This gate runs *in addition to* the policy engine and the `WRITE_MODE` gate — a
write call is denied unless write mode is on, policy allows it, and the
resource-type rule passes. In pure-OAuth mode (no service PAT) the user's own
token already scopes every call at Gitea, so the extra gate is unnecessary.
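
The rules in the table above can be sketched as a small fail-closed decision function. This is a simplified illustration, not the contents of `authz.py`: the `Caller` type, the path prefixes in `classify`, and the `authorize` signature are assumptions, and the verified facts (org membership, site-admin status, collaborator access) are passed in as plain data rather than checked against Gitea.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Caller:
    """Hypothetical view of the signed-in user; all fields assumed pre-verified."""
    username: str
    org_memberships: frozenset = frozenset()
    repo_access: frozenset = frozenset()  # "owner/repo" strings
    is_site_admin: bool = False


def classify(path: str) -> str:
    """Map an API path to a resource type; unrecognized paths are 'unknown'."""
    parts = [p for p in path.split("/") if p]
    head = parts[0] if parts else ""
    table = {
        "admin": "admin", "repos": "repository", "orgs": "org",
        "users": "user_owned", "packages": "user_owned",
        "user": "user_self", "notifications": "user_self",
        "markdown": "misc_global", "version": "misc_global",
        "gitignore": "misc_global",
    }
    return table.get(head, "unknown")


def authorize(method: str, path: str, caller: Caller,
              allow_sensitive: bool = False) -> bool:
    parts = [p for p in path.split("/") if p]
    kind = classify(path)
    if kind == "repository":
        # A repo path that cannot be parsed to owner/repo is denied.
        return len(parts) >= 3 and f"{parts[1]}/{parts[2]}" in caller.repo_access
    if kind == "org":
        return len(parts) >= 2 and parts[1] in caller.org_memberships
    if kind == "user_owned":
        if len(parts) < 2:
            return False
        return parts[1] == caller.username or parts[1] in caller.org_memberships
    if kind == "misc_global":
        # Simplification: only GET/HEAD count as reads in this sketch.
        return method.upper() in ("GET", "HEAD")
    if kind == "admin":
        return allow_sensitive and caller.is_site_admin
    # user_self and unknown are always denied (fail closed).
    return False
```

Every branch returns `True` only on a positive match, so a new or misparsed path falls through to the final `return False`.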

Positive verification results (org membership, site-admin) are cached briefly
and bounded; only successful checks are cached, so a transient failure never
grants access.
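
One way to get this behavior is a small TTL cache that stores only positive results. The sketch below is an assumption-laden illustration: the class name `PositiveCache`, the default TTL and size, and the eviction order are all hypothetical, not the project's values.

```python
import time
from collections import OrderedDict
from typing import Callable, Hashable


class PositiveCache:
    """Bounded, short-lived cache that remembers only successful checks."""

    def __init__(self, ttl_seconds: float = 30.0, max_entries: int = 1024,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._max = max_entries
        self._clock = clock
        self._expiry: "OrderedDict[Hashable, float]" = OrderedDict()

    def check(self, key: Hashable, verify: Callable[[], bool]) -> bool:
        now = self._clock()
        expires = self._expiry.get(key)
        if expires is not None and expires > now:
            return True  # fresh positive result
        self._expiry.pop(key, None)  # drop a stale entry, if any

        try:
            ok = verify() is True
        except Exception:
            ok = False  # a failed lookup is treated as a denial

        if ok:
            self._expiry[key] = now + self._ttl
            while len(self._expiry) > self._max:
                self._expiry.popitem(last=False)  # evict the oldest entry
        return ok
```

Because denials and errors are never stored, a transient Gitea failure produces a denial for that request only, and the next request re-verifies.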

## Full-API coverage: classified `gitea_request`

`gitea_request` safely exposes the long tail of the Gitea API that the curated
typed tools do not cover. Three checks keep it safe:

- **Deterministic read/write classifier.** `GET`/`HEAD` are reads; everything
  else is a write. A small, explicit override table may only *downgrade*
  provably side-effect-free render endpoints (markdown/markup) to reads — never
  the reverse — so a mutating call can never be misclassified as a read and slip
  past the `WRITE_MODE` gate.
- **Known-path gate.** A request whose top path segment is not a recognized
  Gitea `/api/v1` route prefix is denied (fail closed): unknown paths are never
  passed straight through.
- **Admin/credential denylist.** `/admin`, `*tokens*`, `*secrets*`, `*hooks*`,
  `*keys*`, `applications/oauth2`, and runner registration tokens are blocked for
  every method (including `GET`) and cannot be re-opened from `policy.yaml` —
  only `RAW_API_ALLOW_SENSITIVE=true` overrides them, and admin then still
  requires a verified site administrator (see above).
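
The three checks above can be sketched together as follows. This is an illustration under stated assumptions rather than the project's code: the contents of `READ_OVERRIDES`, `KNOWN_PREFIXES`, and `DENY_WORDS` are example values, and the function names are hypothetical. The resource-type gate and the policy engine described elsewhere in this document are not modeled here and would still apply.

```python
READ_METHODS = {"GET", "HEAD"}

# Assumed override table: render endpoints treated as reads despite using POST.
READ_OVERRIDES = {("POST", "/markdown"), ("POST", "/markdown/raw"), ("POST", "/markup")}

# Assumed subset of recognized top-level route prefixes.
KNOWN_PREFIXES = {"repos", "orgs", "users", "user", "admin", "packages",
                  "notifications", "markdown", "markup", "version", "gitignore"}

# Assumed substrings standing in for the *tokens*, *secrets*, ... patterns.
DENY_WORDS = ("tokens", "secrets", "hooks", "keys", "registration-token")


def _segments(path: str) -> list:
    return [s for s in path.lower().split("/") if s]


def is_write(method: str, path: str) -> bool:
    """GET/HEAD are reads; anything else is a write unless explicitly downgraded."""
    method = method.upper()
    if method in READ_METHODS:
        return False
    return (method, "/" + "/".join(_segments(path))) not in READ_OVERRIDES


def is_denylisted(path: str) -> bool:
    segs = _segments(path)
    if segs and segs[0] == "admin":
        return True
    if "applications/oauth2" in "/".join(segs):
        return True
    return any(word in seg for seg in segs for word in DENY_WORDS)


def passes_request_gates(method: str, path: str, write_mode: bool,
                         allow_sensitive: bool = False) -> bool:
    segs = _segments(path)
    if not segs or segs[0] not in KNOWN_PREFIXES:
        return False  # known-path gate: unknown prefixes fail closed
    if is_denylisted(path) and not allow_sensitive:
        return False  # denylist applies to every method, including GET
    if is_write(method, path) and not write_mode:
        return False  # WRITE_MODE gate
    return True
```

The override table can only turn a non-`GET` call into a read, so adding an entry never widens what counts as a write-free method.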

## Prompt Injection Hardening

Repository content is treated as untrusted data.

- Tool outputs are bounded and sanitized.
- No instructions from repository text are executed.
- Text fields are size-limited before returning to LLM clients.
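
Size-limiting text fields before they reach the LLM client might look like the sketch below. The limit `MAX_FIELD_CHARS`, the truncation marker, and the function names are assumptions for illustration; the project's actual limits are not stated here.

```python
MAX_FIELD_CHARS = 4000                 # assumed per-field limit
TRUNCATION_MARKER = "...[truncated]"   # assumed marker text


def bound_text(value: str, limit: int = MAX_FIELD_CHARS) -> str:
    """Return value unchanged if short enough, else cut it to at most `limit` chars."""
    if len(value) <= limit:
        return value
    keep = max(limit - len(TRUNCATION_MARKER), 0)
    return (value[:keep] + TRUNCATION_MARKER)[:limit]


def bound_payload(payload, limit: int = MAX_FIELD_CHARS):
    """Apply bound_text to every string inside nested dicts and lists."""
    if isinstance(payload, str):
        return bound_text(payload, limit)
    if isinstance(payload, dict):
        return {key: bound_payload(val, limit) for key, val in payload.items()}
    if isinstance(payload, list):
        return [bound_payload(item, limit) for item in payload]
    return payload  # numbers, booleans, None pass through untouched
```

Bounding is only one part of the hardening: the content stays data in the tool result, and nothing in it is interpreted as an instruction by the server.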

## Secret Detection

Detected classes include:

- API key and token patterns.
- JWT-like tokens.
- Private key block markers.
- Common provider credential formats.

Behavior:

- `SECRET_DETECTION_MODE=mask`: redact in place.
- `SECRET_DETECTION_MODE=block`: replace secret-bearing values.
- `SECRET_DETECTION_MODE=off`: disable sanitization (not recommended).
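
The three modes can be illustrated with a minimal sketch. The patterns, placeholder strings, and `sanitize` function below are assumptions made for this example and cover far fewer secret classes than a real detector. In particular, `block` is interpreted here as replacing the entire value that contains a secret, which is one reading of the behavior described above.

```python
import re

# Small illustrative pattern set; a real detector would cover many more formats.
SECRET_PATTERNS = [
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),                   # private key marker
    re.compile(r"\beyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+"),  # JWT-like token
    re.compile(r"\bAKIA[0-9A-Z]{16}\b"),                                 # AWS access key ID shape
]

MASK = "[REDACTED]"                   # assumed placeholder for mask mode
BLOCKED = "[BLOCKED: secret detected]"  # assumed placeholder for block mode


def sanitize(value: str, mode: str = "mask") -> str:
    """Apply the configured secret-detection mode to a single text value."""
    if mode == "off":
        return value
    if mode == "mask":
        # Redact each match in place and keep the surrounding text.
        for pattern in SECRET_PATTERNS:
            value = pattern.sub(MASK, value)
        return value
    if mode == "block":
        # Replace the whole value if any pattern matches.
        if any(pattern.search(value) for pattern in SECRET_PATTERNS):
            return BLOCKED
        return value
    raise ValueError(f"unknown secret detection mode: {mode!r}")
```

Raising on an unrecognized mode avoids silently falling back to `off` when the setting is misspelled.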