2026-01-29 19:53:36 +01:00
parent 1bda2013bb
commit a9708b33e2
27 changed files with 3745 additions and 4 deletions
--- a/README.md
+++ b/README.md
@@ -1,5 +1,322 @@
-# AegisGitea-MCP
+# AegisGitea MCP

-AegisGitea MCP is a private, security-first MCP (Model Context Protocol) server that enables controlled, auditable, read-only AI access to a self-hosted Gitea environment.
-
-The system allows ChatGPT (Business / Developer environment) to inspect repositories, code, commits, issues, and pull requests only through explicit MCP tool calls, while all access control is dynamically managed through a dedicated bot user inside Gitea itself.
+**A private, security-first MCP server for controlled AI access to self-hosted Gitea**
+
+---
+
+## Overview
+
+AegisGitea MCP is a Model Context Protocol (MCP) server that enables controlled, auditable, read-only AI access to a self-hosted Gitea environment.
+
+The system allows ChatGPT (Business / Developer environment) to inspect repositories, code, commits, issues, and pull requests **only through explicit MCP tool calls**, while all access control is dynamically managed through a dedicated bot user inside Gitea itself.
+
+### Core Principles
+
+- **Strong separation of concerns**: Clear boundaries between AI, MCP server, and Gitea
+- **Least-privilege access**: Bot user has minimal necessary permissions
+- **Full auditability**: Every AI action is logged with context
+- **Dynamic authorization**: Access control via Gitea permissions (no redeployment needed)
+- **Privacy-first**: Designed for homelab and private infrastructure
+
+---
+
+## Architecture
+
+```
+┌─────────────────────────────────────────────────────────────┐
+│  ChatGPT (Business/Developer)                               │
+│  - Initiates explicit MCP tool calls                        │
+│  - Human-in-the-loop decision making                        │
+└────────────────────┬────────────────────────────────────────┘
+                     │ HTTPS (MCP over SSE)
+                     ▼
+┌─────────────────────────────────────────────────────────────┐
+│  AegisGitea MCP Server (Python, Docker)                     │
+│  - Implements MCP protocol                                  │
+│  - Translates tool calls → Gitea API requests               │
+│  - Enforces access, logging, and safety constraints         │
+│  - Provides bounded, single-purpose tools                   │
+└────────────────────┬────────────────────────────────────────┘
+                     │ Gitea API (Bot User Token)
+                     ▼
+┌─────────────────────────────────────────────────────────────┐
+│  Gitea Instance (Docker)                                    │
+│  - Source of truth for authorization                        │
+│  - Hosts dedicated read-only bot user                       │
+│  - Determines AI-visible repositories dynamically           │
+└─────────────────────────────────────────────────────────────┘
+```
+
+### Trust Model
+
+| Component | Responsibility |
+|-----------|----------------|
+| **Gitea** | Authorization (what the AI can see) |
+| **MCP Server** | Policy enforcement (how the AI accesses data) |
+| **ChatGPT** | Decision initiation (when the AI acts) |
+| **Human** | Final decision authority (why the AI acts) |
+
+---
+
+## Features
+
+### Phase 1 — Foundation (Current)
+
+- MCP protocol handling with SSE lifecycle
+- Secure Gitea API communication via bot user token
+- Health and readiness endpoints
+- ChatGPT MCP registration flow
+
+### Phase 2 — Authorization & Data Access (Planned)
+
+- Repository discovery based on bot user permissions
+- File tree and content retrieval with size limits
+- Dynamic access control (changes in Gitea apply instantly)
+
+### Phase 3 — Audit & Hardening (Planned)
+
+- Comprehensive audit logging (timestamp, tool, repo, path, correlation ID)
+- Request correlation and tracing
+- Input validation and rate limiting
+- Defensive bounds on all operations
+
+### Phase 4 — Extended Context (Future)
+
+- Commit history and diff inspection
+- Issue and pull request visibility
+- Full contextual understanding while maintaining read-only guarantees
+
+---
+
+## Authorization Model
+
+### Bot User Strategy
+
+A dedicated Gitea bot user represents "the AI":
+
+- The MCP server authenticates as this user using a read-only token
+- The bot user's repository permissions define AI visibility
+- **No admin privileges**
+- **No write permissions**
+- **No implicit access**
+
+This allows dynamic enable/disable of AI access **without restarting or reconfiguring** the MCP server.
+
+**Example:**
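+```bash
+# Grant AI access: add the bot user as a collaborator with Read permission
+# (Gitea UI: repository Settings -> Collaborators), or via the API as a
+# repository admin. "aegis-bot" and $ADMIN_TOKEN are placeholders.
+curl -X PUT "https://gitea.example.com/api/v1/repos/org/repo/collaborators/aegis-bot" \
+  -H "Authorization: token $ADMIN_TOKEN" \
+  -H "Content-Type: application/json" \
+  -d '{"permission": "read"}'
+
+# Revoke AI access: remove the bot user from the repository
+curl -X DELETE "https://gitea.example.com/api/v1/repos/org/repo/collaborators/aegis-bot" \
+  -H "Authorization: token $ADMIN_TOKEN"
+```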
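
A minimal sketch of how the MCP server might discover what the bot user can see, assuming it calls Gitea's `GET /api/v1/user/repos` endpoint with the bot token (the project may use a different endpoint, such as repository search). The function name and paging defaults are illustrative and not taken from the project. The request is only built here, not sent.

```python
import urllib.request


def build_repo_list_request(
    gitea_url: str, token: str, page: int = 1, limit: int = 50
) -> urllib.request.Request:
    """Build (but do not send) a request for repositories the token's user can access."""
    base = gitea_url.rstrip("/")
    url = f"{base}/api/v1/user/repos?page={page}&limit={limit}"
    return urllib.request.Request(
        url,
        headers={"Authorization": f"token {token}", "Accept": "application/json"},
        method="GET",
    )


# Example with placeholder values; no network traffic happens here.
req = build_repo_list_request("https://gitea.example.com/", "example-token")
print(req.get_method(), req.full_url)
```

Because the result depends only on what Gitea returns for the bot user, adding or removing the bot as a collaborator changes the visible set on the next call without touching the server.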
+
+---
+
+## MCP Tool Design
+
+All tools are:
+
+- **Explicit**: Single-purpose, no hidden behavior
+- **Deterministic**: The same input against the same repository state produces the same output
+- **Bounded**: Size limits, path constraints, no wildcards
+- **Auditable**: Full logging of every invocation
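
One way to keep tools explicit is a fixed registry that maps each tool name to exactly one handler and rejects everything else. The sketch below assumes hypothetical tool names (`list_repositories`, `get_file_contents`) and placeholder handlers; the real names and signatures in `tools/` may differ.

```python
from typing import Callable, Dict


def list_repositories(arguments: dict) -> dict:
    """Placeholder handler; a real one would call the Gitea API."""
    return {"tool": "list_repositories", "arguments": arguments}


def get_file_contents(arguments: dict) -> dict:
    """Placeholder handler; a real one would fetch one bounded file."""
    return {"tool": "get_file_contents", "arguments": arguments}


# Explicit allowlist: a tool that is not registered here cannot be called.
TOOL_HANDLERS: Dict[str, Callable[[dict], dict]] = {
    "list_repositories": list_repositories,
    "get_file_contents": get_file_contents,
}


def dispatch_tool_call(name: str, arguments: dict) -> dict:
    """Route a tool call to exactly one registered handler."""
    handler = TOOL_HANDLERS.get(name)
    if handler is None:
        raise KeyError(f"unknown tool: {name!r}")
    return handler(arguments)


print(dispatch_tool_call("list_repositories", {}))
```

A static dictionary, rather than dynamic lookup by attribute or module name, means the set of callable tools can be reviewed in one place.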
+
+### Tool Categories
+
+1. **Repository Discovery**
+   - List repositories visible to bot user
+   - Get repository metadata
+
+2. **File Operations**
+   - Get file tree for a repository
+   - Read file contents (with size limits)
+
+3. **Commit History** (Phase 4)
+   - List commits for a repository
+   - Get commit details and diffs
+
+4. **Issues & PRs** (Phase 4)
+   - List issues and pull requests
+   - Read issue/PR details and comments
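
For the "read file contents (with size limits)" tool, a sketch of the bounding step might look like the following. It assumes the input is a dict shaped like the JSON returned by Gitea's repository contents endpoint (`type`, `size`, base64 `content`). The 256 KiB limit, the function name, and the exception class are hypothetical; the README does not specify them.

```python
import base64

MAX_FILE_BYTES = 256 * 1024  # hypothetical limit; the README does not state one


class FileTooLargeError(Exception):
    """Raised when a file exceeds the configured size limit."""


def decode_file_content(contents: dict, max_bytes: int = MAX_FILE_BYTES) -> str:
    """Decode a contents-style response, refusing oversized or non-file entries."""
    if contents.get("type") != "file":
        raise ValueError("target is not a regular file")
    size = contents.get("size", 0)
    if size > max_bytes:
        raise FileTooLargeError(f"{size} bytes exceeds limit of {max_bytes}")
    raw = base64.b64decode(contents.get("content") or "")
    if len(raw) > max_bytes:  # do not trust the declared size alone
        raise FileTooLargeError(f"{len(raw)} bytes exceeds limit of {max_bytes}")
    # Replace undecodable bytes instead of failing on binary-ish files.
    return raw.decode("utf-8", errors="replace")


sample = {
    "type": "file",
    "size": 5,
    "encoding": "base64",
    "content": base64.b64encode(b"hello").decode("ascii"),
}
print(decode_file_content(sample))
```

Checking both the declared size and the decoded length keeps the bound effective even if the two disagree.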
+
+### Explicit Constraints
+
+- No wildcard search tools
+- No full-text indexing
+- No recursive "read everything" operations
+- No hidden or implicit data access
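
The constraints above can be partly enforced at the input boundary. The following is a sketch only: the function name, the forbidden-character set, and the length limit are assumptions chosen for illustration, and a real implementation might need to allow or reject different characters.

```python
import posixpath

# Assumed policy: reject glob wildcards, backslashes, and NUL bytes.
FORBIDDEN_CHARS = set("*?\\\x00")


def validate_repo_path(path: str, max_length: int = 1024) -> str:
    """Return a normalized repository-relative path or raise ValueError."""
    if not path or len(path) > max_length:
        raise ValueError("path is empty or too long")
    if any(ch in FORBIDDEN_CHARS for ch in path):
        raise ValueError("wildcards and special characters are not allowed")
    if path.startswith("/"):
        raise ValueError("path must be relative to the repository root")
    if ".." in path.split("/"):
        raise ValueError("parent-directory segments are not allowed")
    return posixpath.normpath(path)


print(validate_repo_path("src/./aegis_gitea_mcp/server.py"))
```

Rejecting bad input outright, instead of trying to sanitize it, keeps the tool's behavior predictable and easy to audit.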
+
+---
+
+## Audit & Observability
+
+Every MCP tool invocation logs:
+
+- **Timestamp** (UTC)
+- **Tool name**
+- **Repository identifier**
+- **Target** (path / commit / issue)
+- **Correlation ID**
+
+Logs are:
+
+- Append-only
+- Human-readable JSON
+- Machine-parseable
+- Stored locally by default
+
+**Audit Philosophy**: The system must answer "What exactly did the AI see, and when?" without ambiguity.
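
A minimal sketch of an audit writer that records the fields listed above as one JSON object per line. The field names, the function name, and the use of a random UUID as the correlation ID are assumptions; `audit.py` may be structured differently.

```python
import json
import tempfile
import uuid
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def write_audit_entry(
    log_path: str,
    tool: str,
    repository: str,
    target: str,
    correlation_id: Optional[str] = None,
) -> dict:
    """Append one JSON line describing a tool invocation and return the entry."""
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "tool": tool,
        "repository": repository,
        "target": target,
        "correlation_id": correlation_id or str(uuid.uuid4()),
    }
    path = Path(log_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    # Mode "a" only ever appends; existing lines are never rewritten by this code.
    with path.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(entry, sort_keys=True) + "\n")
    return entry


# Demo against a temporary directory rather than /var/log.
demo_log = str(Path(tempfile.mkdtemp()) / "audit.log")
print(write_audit_entry(demo_log, "get_file_contents", "org/repo", "README.md"))
```

Opening the file in append mode covers normal operation only. Protecting the log against tampering would need something outside the process, such as file permissions or log shipping.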
+
+---
+
+## Deployment
+
+### Prerequisites
+
+- Docker and Docker Compose
+- Self-hosted Gitea instance
+- Gitea bot user with read-only access token
+
+### Quick Start
+
+```bash
+# Clone repository
+git clone https://gitea.example.com/your-org/AegisGitea-MCP.git
+cd AegisGitea-MCP
+
+# Configure environment
+cp .env.example .env
+# Edit .env with your Gitea URL and bot token
+
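+# Start MCP server
+# (if the compose file lives under docker/, as in the project structure below,
+# add: -f docker/docker-compose.yml)
+docker-compose up -d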
+
+# Check logs
+docker-compose logs -f aegis-mcp
+```
+
+### Environment Variables
+
+| Variable | Description | Required |
+|----------|-------------|----------|
+| `GITEA_URL` | Base URL of Gitea instance | Yes |
+| `GITEA_TOKEN` | Bot user access token | Yes |
+| `MCP_HOST` | MCP server listen host | No (default: 0.0.0.0) |
+| `MCP_PORT` | MCP server listen port | No (default: 8080) |
+| `LOG_LEVEL` | Logging verbosity | No (default: INFO) |
+| `AUDIT_LOG_PATH` | Audit log file path | No (default: /var/log/aegis-mcp/audit.log) |
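
The table above could be loaded roughly as follows. This is a sketch assuming a frozen dataclass and a `load_settings` helper, both hypothetical names; `config.py` may use a different mechanism. The defaults mirror the table.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    gitea_url: str
    gitea_token: str
    mcp_host: str = "0.0.0.0"
    mcp_port: int = 8080
    log_level: str = "INFO"
    audit_log_path: str = "/var/log/aegis-mcp/audit.log"


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read settings from the environment, failing fast on missing required values."""
    env = os.environ if env is None else env
    missing = [name for name in ("GITEA_URL", "GITEA_TOKEN") if not env.get(name)]
    if missing:
        raise RuntimeError("missing required environment variables: " + ", ".join(missing))
    return Settings(
        gitea_url=env["GITEA_URL"].rstrip("/"),
        gitea_token=env["GITEA_TOKEN"],
        mcp_host=env.get("MCP_HOST", "0.0.0.0"),
        mcp_port=int(env.get("MCP_PORT", "8080")),
        log_level=env.get("LOG_LEVEL", "INFO").upper(),
        audit_log_path=env.get("AUDIT_LOG_PATH", "/var/log/aegis-mcp/audit.log"),
    )


# Example with placeholder values passed in explicitly instead of os.environ.
settings = load_settings({"GITEA_URL": "https://gitea.example.com/", "GITEA_TOKEN": "example-token"})
print(settings.gitea_url, settings.mcp_port)
```

Failing at startup when `GITEA_URL` or `GITEA_TOKEN` is absent avoids a server that starts successfully but cannot reach Gitea.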
+
+### Security Considerations
+
+1. **Never expose the MCP server port directly to the internet**: publish it only through a reverse proxy that terminates TLS
+2. **Rotate bot tokens regularly**
+3. **Monitor audit logs** for unexpected access patterns
+4. **Keep Docker images updated**
+5. **Use a dedicated bot user**: never use a personal account token
+
+---
+
+## Development
+
+### Setup
+
+```bash
+# Create virtual environment
+python3 -m venv venv
+source venv/bin/activate
+
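+# Install dependencies
+pip install -r requirements-dev.txt
+# With the src/ layout, also install the package itself (editable) unless
+# requirements-dev.txt already does so
+pip install -e .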
+
+# Run tests
+pytest tests/
+
+# Run server locally
+python -m aegis_gitea_mcp.server
+```
+
+### Project Structure
+
+```
+AegisGitea-MCP/
+├── src/
+│   └── aegis_gitea_mcp/
+│       ├── __init__.py
+│       ├── server.py          # MCP server entry point
+│       ├── mcp_protocol.py    # MCP protocol implementation
+│       ├── gitea_client.py    # Gitea API client
+│       ├── audit.py           # Audit logging
+│       ├── config.py          # Configuration management
+│       └── tools/             # MCP tool implementations
+│           ├── __init__.py
+│           ├── repository.py  # Repository discovery tools
+│           └── files.py       # File access tools
+├── tests/
+│   ├── test_mcp_protocol.py
+│   ├── test_gitea_client.py
+│   └── test_tools.py
+├── docker/
+│   ├── Dockerfile
+│   └── docker-compose.yml
+├── .env.example
+├── pyproject.toml
+├── requirements.txt
+├── requirements-dev.txt
+└── README.md
+```
+
+---
+
+## Non-Goals
+
+Explicitly **out of scope**:
+
+- No write access to Gitea (no commits, comments, merges, edits)
+- No autonomous or background scanning
+- No global search or unrestricted crawling
+- No public exposure of repositories or credentials
+- No coupling to GitHub or external VCS platforms
+
+---
+
+## Roadmap
+
+- [x] Project initialization and architecture design
+- [ ] **Phase 1**: MCP server foundation and Gitea integration
+- [ ] **Phase 2**: Repository discovery and file access tools
+- [ ] **Phase 3**: Audit logging and security hardening
+- [ ] **Phase 4**: Commit history, issues, and PR support
+
+---
+
+## Contributing
+
+This project prioritizes security and privacy. Contributions should:
+
+1. Maintain read-only guarantees
+2. Add comprehensive audit logging for new tools
+3. Include tests for authorization and boundary cases
+4. Document security implications
+
+---
+
+## License
+
+MIT License - See LICENSE file for details
+
+---
+
+## Acknowledgments
+
+Built on the [Model Context Protocol](https://modelcontextprotocol.io/) by Anthropic.
+
+---
+
+## Support
+
+For issues, questions, or security concerns, please open an issue in the Gitea repository.
+
+**Remember**: This is designed to be **boring, predictable, and safe** — not clever, not magical, and not autonomous.