Latte 385b442b6f docs: local vs server quickstart, authz model, packaging
Reframe the README around two transports and add a local stdio quickstart with
uvx/pip and Claude Desktop / Claude Code wiring. New docs: local-quickstart.md
and packaging.md (uv build/publish). Document resource-type-aware authorization
and classified gitea_request in security.md; stdio env vars + audit-log
fallback in configuration.md; local install in deployment.md; core+adapters in
architecture.md. Add the missing root AGENTS.md contract, update CLAUDE.md with
the core/adapter layout, fail-closed invariants, and the branching flow
(HEAD -> feature -> dev -> main). Update roadmap/todo and .env.example.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-27 11:17:01 +02:00


# Security
## Core Controls
- OAuth2/OIDC bearer-token authentication for MCP tool execution.
- OIDC discovery with a cached JWKS for validating JWTs.
- Userinfo validation fallback for opaque OAuth tokens.
- Scope enforcement:
  - `read:repository` for read tools.
  - `write:repository` for write tools.
- Policy engine checks before tool execution.
- Per-IP and per-token rate limiting.
- Strict schema validation (`extra=forbid`).
- Tamper-evident audit logging with hash chaining.
- Secret sanitization for logs and tool output.
- Production-safe error responses (no internal stack traces).
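
Hash chaining can be illustrated with a short sketch. It is not the project's audit logger: the entry layout, the `GENESIS_HASH` value, and the function names `append_entry` and `verify_chain` are assumptions made for illustration. Each entry's SHA-256 digest covers the previous entry's hash, so editing or removing an earlier entry invalidates every later one.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # assumed placeholder "previous hash" for the first entry


def _digest(prev_hash: str, event: dict) -> str:
    # Canonical JSON so the same event always hashes to the same value.
    payload = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append_entry(chain: list, event: dict) -> dict:
    """Append an event whose hash covers the previous entry's hash."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_HASH
    entry = {"event": event, "prev_hash": prev_hash, "hash": _digest(prev_hash, event)}
    chain.append(entry)
    return entry


def verify_chain(chain: list) -> bool:
    """Recompute every hash; an edited or removed entry breaks the chain."""
    prev_hash = GENESIS_HASH
    for entry in chain:
        if entry["prev_hash"] != prev_hash:
            return False
        if entry["hash"] != _digest(prev_hash, entry["event"]):
            return False
        prev_hash = entry["hash"]
    return True
```

A chain on its own does not reveal that the newest entries were cut off; detecting that requires storing the latest hash somewhere the log writer cannot modify.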
## Threat Model
### Why shared bot tokens are dangerous
- A single leaked bot token can expose all repositories that bot can access.
- Access is not naturally bounded per end user.
- Blast radius is large and cross-tenant.
### Why token-in-URL is insecure
- URLs can be captured by reverse proxy logs, browser history, referer headers, and monitoring pipelines.
- Bearer tokens must be passed in `Authorization` headers only.
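
A minimal sketch of a header-only rule follows. The function name `extract_bearer_token` and the set of rejected query parameter names are assumptions for illustration, not the server's actual code. The idea is to refuse any request that carries a token-like query parameter and to accept credentials only from the `Authorization` header.

```python
from typing import Mapping, Optional
from urllib.parse import parse_qs, urlsplit

# Assumed list of query parameter names that would carry a credential in a URL.
TOKEN_QUERY_PARAMS = {"access_token", "token"}


def extract_bearer_token(url: str, headers: Mapping[str, str]) -> Optional[str]:
    """Return the bearer token from the Authorization header, or None."""
    query = parse_qs(urlsplit(url).query, keep_blank_values=True)
    if TOKEN_QUERY_PARAMS & {name.lower() for name in query}:
        return None  # token-in-URL is rejected even if a header is also present

    # HTTP header names are case-insensitive.
    auth = next((v for k, v in headers.items() if k.lower() == "authorization"), "")
    scheme, _, token = auth.partition(" ")
    token = token.strip()
    if scheme.lower() != "bearer" or not token:
        return None
    return token
```

Rejecting the request outright, rather than silently ignoring the query parameter, makes a misconfigured client visible before its token ends up in proxy logs again.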
### Why per-user OAuth reduces lateral access
- Each MCP request executes with the signed-in user's token.
- Gitea authorization stays the source of truth for repository visibility.
- A compromised token is limited to that user's permissions.
## Resource-type-aware authorization
The public server runs in *service-PAT mode*: a privileged bot token makes the
actual Gitea calls while the per-user OAuth identity decides what the user may
reach. Repository calls are gated by the user's collaborator permission on
`owner/repo`. The rest of the Gitea surface — reachable through the
`gitea_request` escape hatch — is gated by **resource-type-aware authorization**
(`authz.py`). Every call is classified by `(method, path)` and enforced against
a type-specific rule. **Every decision fails closed**: a call that cannot be
classified, or whose permission cannot be positively verified against Gitea, is
denied and audited.
| Resource type | Rule (service-PAT mode) |
|---------------|--------------------------|
| `repository` | Per-user collaborator permission on `owner/repo` (existing check). A repo path that cannot be parsed to `owner/repo` is denied. |
| `org` | The signed-in user must be a **verified member** of the target org (checked against Gitea, fail closed). |
| `user_owned` | A resource owned by a named user/org (`/users/{name}`, `/packages/{owner}`): allowed only when the owner is the caller, or the caller is a verified member of the owning org. |
| `user_self` | Token-owner-scoped endpoints (`/user`, `/notifications`): **denied** — in service-PAT mode the data belongs to the bot, not the caller. |
| `misc_global` | Instance-wide read-only utilities (markdown render, version, gitignore templates): reads allowed; writes denied. |
| `admin` | **Default deny.** Allowed only when the operator opts in (`RAW_API_ALLOW_SENSITIVE=true`) **and** the signed-in user is a verified Gitea site administrator. |
| `unknown` | Denied. |
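
The rules in the table can be read as a single fail-closed decision function. The sketch below is an illustration, not the contents of `authz.py`: the `Checks` container, the `decide` signature, and the way a target name is passed in are all assumptions. It covers only the resource-type rule and leaves out the policy engine and the `WRITE_MODE` gate.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Checks:
    """Hypothetical verifiers; each returns True only on positive proof from Gitea."""
    repo_permission: Callable[[str, str], bool]  # (username, "owner/repo")
    org_member: Callable[[str, str], bool]       # (username, org name)
    site_admin: Callable[[str], bool]            # (username)


def decide(resource_type: str, is_write: bool, username: str,
           target: Optional[str], checks: Checks,
           allow_sensitive: bool = False) -> bool:
    """Return True only when the type-specific rule is positively satisfied."""
    try:
        if resource_type == "repository":
            # target is "owner/repo"; None means the path could not be parsed.
            return target is not None and checks.repo_permission(username, target) is True
        if resource_type == "org":
            return target is not None and checks.org_member(username, target) is True
        if resource_type == "user_owned":
            if target is None:
                return False
            return target == username or checks.org_member(username, target) is True
        if resource_type == "misc_global":
            return not is_write
        if resource_type == "admin":
            return allow_sensitive and checks.site_admin(username) is True
        return False  # user_self, unknown, and any unrecognised type
    except Exception:
        return False  # a verification that errors out never grants access
```

The final `return False` and the broad `except` are what make the sketch fail closed: a new resource type or an unreachable Gitea instance results in a denial rather than an accidental allow.
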
This gate runs *in addition to* the policy engine and the `WRITE_MODE` gate — a
write call is denied unless write mode is on, policy allows it, and the
resource-type rule passes. In pure-OAuth mode (no service PAT) the user's own
token already scopes every call at Gitea, so the extra gate is unnecessary.
Positive verification results (org membership, site-admin) are cached briefly
and bounded; only successful checks are cached, so a transient failure never
grants access.
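
One way to picture a bounded, positive-only cache is the sketch below. The class name `PositiveCache`, the 30-second TTL, and the 1024-entry bound are assumptions; the source only states that results are cached briefly, bounded, and stored on success alone.

```python
import time
from collections import OrderedDict
from typing import Callable, Hashable


class PositiveCache:
    """Remembers successful checks for a short time; never stores a failure."""

    def __init__(self, ttl_seconds: float = 30.0, max_entries: int = 1024,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._max = max_entries
        self._clock = clock
        self._expires = OrderedDict()  # key -> expiry time, oldest insert first

    def check(self, key: Hashable, verify: Callable[[], bool]) -> bool:
        now = self._clock()
        expiry = self._expires.get(key)
        if expiry is not None and expiry > now:
            return True  # a still-valid positive result
        self._expires.pop(key, None)  # drop a stale entry, if any
        try:
            ok = verify() is True
        except Exception:
            return False  # transient failure: deny and cache nothing
        if ok:
            self._expires[key] = now + self._ttl
            while len(self._expires) > self._max:
                self._expires.popitem(last=False)  # evict the oldest entry
        return ok
```

Because denials and errors are never stored, a user who gains membership is recognized on the next request, while a revoked membership can linger for at most one TTL.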
## Full-API coverage: classified `gitea_request`
`gitea_request` exposes the long tail of the Gitea API that the curated typed
tools do not cover, safely:
- **Deterministic read/write classifier.** `GET`/`HEAD` are reads; everything
else is a write. A small, explicit override table may only *downgrade*
provably side-effect-free render endpoints (markdown/markup) to reads — never
the reverse — so a mutating call can never be misclassified as a read and slip
past the `WRITE_MODE` gate.
- **Known-path gate.** A request whose top path segment is not a recognized
Gitea `/api/v1` route prefix is denied (fail closed): unknown paths are never
passed straight through.
- **Admin/credential denylist.** `/admin`, `*tokens*`, `*secrets*`, `*hooks*`,
`*keys*`, `applications/oauth2`, and runner registration tokens are blocked for
every method (including `GET`) and cannot be re-opened from `policy.yaml`.
Only `RAW_API_ALLOW_SENSITIVE=true` overrides them, and admin access then still
requires a verified site administrator (see above).
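
The three checks above can be sketched together as follows. Everything named here is illustrative: the override table, the prefix set, and the glob patterns are assumed subsets chosen for the example, and `classify` and `gate` are hypothetical helper names. Paths are taken relative to `/api/v1`, and the site-administrator check that still applies to admin paths is not modelled.

```python
from fnmatch import fnmatchcase

# Assumed override table: render endpoints that are POST but have no side effects.
READ_OVERRIDES = {("POST", "/markdown"), ("POST", "/markdown/raw"), ("POST", "/markup")}

# Assumed subset of recognized top-level path segments.
KNOWN_PREFIXES = {"admin", "markdown", "markup", "orgs", "packages",
                  "repos", "user", "users", "version"}

# Glob patterns mirroring the denylist described above ("*" also matches "/").
SENSITIVE_PATTERNS = ["/admin", "/admin/*", "*tokens*", "*secrets*", "*hooks*",
                      "*keys*", "*applications/oauth2*", "*registration-token*"]


def classify(method: str, path: str) -> str:
    """Return "read" or "write"; overrides can only turn a write into a read."""
    method = method.upper()
    if method in ("GET", "HEAD"):
        return "read"
    if (method, path.rstrip("/")) in READ_OVERRIDES:
        return "read"
    return "write"


def is_sensitive(path: str) -> bool:
    lowered = path.lower()
    return any(fnmatchcase(lowered, pattern) for pattern in SENSITIVE_PATTERNS)


def gate(method: str, path: str, write_mode: bool = False,
         allow_sensitive: bool = False) -> bool:
    """Return True only if the call passes all three checks."""
    segments = [s for s in path.split("/") if s]
    if not segments or segments[0] not in KNOWN_PREFIXES:
        return False  # unknown path: fail closed
    if is_sensitive(path) and not allow_sensitive:
        return False  # denylist applies to every method, including GET
    if classify(method, path) == "write" and not write_mode:
        return False
    return True
```

Since the default branch of `classify` is `"write"`, a method the table does not know about is treated as mutating, which keeps it behind the `WRITE_MODE` gate.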
## Prompt Injection Hardening
Repository content is treated as untrusted data.
- Tool outputs are bounded and sanitized.
- No instructions from repository text are executed.
- Text fields are size-limited before returning to LLM clients.
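
Size limiting can be sketched as a recursive pass over a tool result before it is returned. The 4000-character limit, the truncation marker, and the helper names `bound_text` and `bound_payload` are assumptions for illustration; the source does not state the actual limits.

```python
from typing import Any

MAX_FIELD_CHARS = 4000              # assumed limit per text field
TRUNCATION_MARKER = " [truncated]"  # assumed marker appended to cut fields


def bound_text(value: str, limit: int = MAX_FIELD_CHARS) -> str:
    """Return value unchanged if short enough, else cut it to exactly `limit` chars."""
    if len(value) <= limit:
        return value
    keep = max(limit - len(TRUNCATION_MARKER), 0)
    return (value[:keep] + TRUNCATION_MARKER)[:limit]


def bound_payload(payload: Any, limit: int = MAX_FIELD_CHARS) -> Any:
    """Apply bound_text to every string nested inside dicts and lists."""
    if isinstance(payload, str):
        return bound_text(payload, limit)
    if isinstance(payload, dict):
        return {key: bound_payload(item, limit) for key, item in payload.items()}
    if isinstance(payload, (list, tuple)):
        return [bound_payload(item, limit) for item in payload]
    return payload  # numbers, booleans, None pass through untouched
```

Truncation limits how much untrusted repository text reaches the model, but it does not make that text trustworthy; the content still has to be handled as data rather than as instructions.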
## Secret Detection
Detected classes include:
- API key and token patterns.
- JWT-like tokens.
- Private key block markers.
- Common provider credential formats.
Behavior:
- `SECRET_DETECTION_MODE=mask`: redact in place.
- `SECRET_DETECTION_MODE=block`: replace secret-bearing values.
- `SECRET_DETECTION_MODE=off`: disable sanitization (not recommended).
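
The difference between the modes can be sketched as follows, under the assumption that `mask` replaces only the matched substring while `block` replaces the whole value. The three patterns, the replacement strings, and the `sanitize` helper are illustrative and much narrower than a real detector.

```python
import re

# Illustrative patterns only; a real detector would cover many more formats.
SECRET_PATTERNS = [
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),                   # private key marker
    re.compile(r"\beyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+"),  # JWT-like token
    re.compile(r"\bAKIA[0-9A-Z]{16}\b"),                                 # AWS access key ID format
]


def sanitize(value: str, mode: str = "mask") -> str:
    """Apply the assumed semantics of SECRET_DETECTION_MODE to one string."""
    if mode == "off":
        return value
    if mode == "block":
        # Replace the entire value as soon as any pattern matches.
        if any(pattern.search(value) for pattern in SECRET_PATTERNS):
            return "[BLOCKED: secret detected]"
        return value
    if mode == "mask":
        # Redact only the matched spans and keep the surrounding text.
        for pattern in SECRET_PATTERNS:
            value = pattern.sub("[REDACTED]", value)
        return value
    raise ValueError(f"unknown secret detection mode: {mode!r}")
```

For example, `sanitize("key=AKIAABCDEFGHIJKLMNOP ok", "mask")` yields `key=[REDACTED] ok`, whereas the same call with `"block"` discards the surrounding text as well.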