# AegisGitea MCP - Architecture Documentation

---

## System Overview

```
┌─────────────────────────────────────────────────────────────────────┐
│                          ChatGPT Business                           │
│                      (AI Assistant Interface)                       │
│                                                                     │
│                User: "Show me the files in my-repo"                 │
└────────────────────────────┬────────────────────────────────────────┘
                             │ HTTPS (MCP over SSE)
                             │ Tool: get_file_tree(owner, repo)
                             ▼
┌─────────────────────────────────────────────────────────────────────┐
│                    Reverse Proxy (Traefik/Nginx)                    │
│                           TLS Termination                           │
└────────────────────────────┬────────────────────────────────────────┘
                             │ HTTP
                             ▼
┌─────────────────────────────────────────────────────────────────────┐
│                   AegisGitea MCP Server (Docker)                    │
│                                                                     │
│  ┌───────────────────────────────────────────────────────────────┐  │
│  │                      FastAPI Application                      │  │
│  │                                                               │  │
│  │  Endpoints:                                                   │  │
│  │  - GET  /health           (Health check)                      │  │
│  │  - GET  /mcp/tools        (List available tools)              │  │
│  │  - POST /mcp/tool/call    (Execute tool)                      │  │
│  │  - GET  /mcp/sse          (Server-sent events)                │  │
│  └───────────────────────┬───────────────────────────────────────┘  │
│                          │                                          │
│  ┌───────────────────────┴───────────────────────────────────────┐  │
│  │                     MCP Protocol Handler                      │  │
│  │  - Tool validation                                            │  │
│  │  - Request/response mapping                                   │  │
│  │  - Correlation ID management                                  │  │
│  └───────────────────────┬───────────────────────────────────────┘  │
│                          │                                          │
│  ┌───────────────────────┴───────────────────────────────────────┐  │
│  │                     Tool Implementations                      │  │
│  │                                                               │  │
│  │  - list_repositories()       - get_repository_info()          │  │
│  │  - get_file_tree()           - get_file_contents()            │  │
│  └───────────────────────┬───────────────────────────────────────┘  │
│                          │                                          │
│  ┌────────────┬──────────┴────────┬──────────────────────────────┐  │
│  │            │                   │                              │  │
│  │  ┌─────────▼─────────┐   ┌─────▼──────┐   ┌────────────────┐  │  │
│  │  │  Gitea Client     │   │ Config     │   │ Audit Logger   │  │  │
│  │  │  - Auth           │   │ Manager    │   │ - Structured   │  │  │
│  │  │  - API calls      │   │ - Env vars │   │ - JSON logs    │  │  │
│  │  │  - Error handling │   │ - Defaults │   │ - Correlation  │  │  │
│  │  └─────────┬─────────┘   └────────────┘   └────────┬───────┘  │  │
│  │            │                                       │          │  │
│  └────────────┼───────────────────────────────────────┼──────────┘  │
│               │                                       │             │
└───────────────┼───────────────────────────────────────┼─────────────┘
                │ Gitea API                             │
                │ (Authorization: token XXX)            │ Audit Logs
                ▼                                       ▼
┌─────────────────────────────────────┐    ┌──────────────────────────┐
│           Gitea Instance            │    │    Persistent Volume     │
│          (Self-hosted VCS)          │    │   /var/log/aegis-mcp/    │
│                                     │    │        audit.log         │
│  Repositories:                      │    └──────────────────────────┘
│  ┌─────────────────────────────┐    │
│  │ org/repo-1  (bot has access)│    │
│  │ org/repo-2  (bot has access)│    │
│  │ org/private (NO ACCESS)     │    │
│  └─────────────────────────────┘    │
│                                     │
│  Bot User: aegis-bot                │
│  Permissions: Read-only             │
└─────────────────────────────────────┘
```

---

## Component Responsibilities

### 1. ChatGPT (External)
**Responsibility**: Initiate explicit tool calls based on user requests

- Receives MCP tool definitions
- Constructs tool call requests
- Presents results to user
- Human-in-the-loop decision making

### 2. Reverse Proxy
**Responsibility**: TLS termination and routing

- Terminates HTTPS connections
- Routes to MCP server container
- Handles SSL certificates
- Optional: IP filtering, rate limiting

### 3. AegisGitea MCP Server (Core)
**Responsibility**: MCP protocol implementation and policy enforcement

#### 3a. FastAPI Application
- HTTP server with async support
- Server-Sent Events endpoint
- Health and status endpoints
- Request routing

#### 3b. MCP Protocol Handler
- Tool definition management
- Request validation
- Response formatting
- Correlation ID tracking

#### 3c. Tool Implementations
- Repository discovery
- File tree navigation
- File content retrieval
- Bounded, single-purpose operations

#### 3d. Gitea Client
- Async HTTP client for Gitea API
- Bot user authentication
- Error handling and retries
- Response parsing
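
As a rough illustration of the bot user authentication above, the sketch below builds (but does not send) the tree request shown elsewhere in this document. It uses only the standard library rather than the async client the server uses, and `build_tree_request`, the example host name, and the `Accept` header are assumptions made for illustration.

```python
import urllib.request
from urllib.parse import quote


def build_tree_request(base_url: str, token: str, owner: str, repo: str,
                       ref: str = "main") -> urllib.request.Request:
    """Build a read-only GET request for a repository tree as the bot user."""
    # Percent-encode each path segment so user-supplied names cannot alter the route.
    path = "/api/v1/repos/{}/{}/git/trees/{}".format(
        quote(owner, safe=""), quote(repo, safe=""), quote(ref, safe="")
    )
    return urllib.request.Request(
        base_url.rstrip("/") + path,
        headers={"Authorization": f"token {token}", "Accept": "application/json"},
        method="GET",
    )


# Hypothetical host and placeholder token, for illustration only.
req = build_tree_request("https://gitea.example.com", "XXX", "org", "my-repo")
```

Encoding each segment separately is a design choice that keeps input validation close to the point where the URL is assembled.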

#### 3e. Config Manager
- Environment variable loading
- Validation with Pydantic
- Default values
- Type safety
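
The loading and validation steps could look roughly like the sketch below. It deliberately uses a standard-library dataclass as a dependency-free stand-in for the Pydantic model. `MAX_FILE_SIZE_BYTES` and `REQUEST_TIMEOUT_SECONDS` appear in this document; the `GITEA_URL` and `GITEA_TOKEN` names and both default values are assumptions.

```python
import os
from dataclasses import dataclass

# Assumed defaults; the real values are not stated in this document.
DEFAULT_MAX_FILE_SIZE_BYTES = 1_048_576
DEFAULT_REQUEST_TIMEOUT_SECONDS = 30.0


@dataclass(frozen=True)
class Settings:
    gitea_url: str
    gitea_token: str
    max_file_size_bytes: int = DEFAULT_MAX_FILE_SIZE_BYTES
    request_timeout_seconds: float = DEFAULT_REQUEST_TIMEOUT_SECONDS


def load_settings(env=None) -> Settings:
    """Read settings from environment variables, applying defaults and basic checks."""
    env = os.environ if env is None else env
    missing = [key for key in ("GITEA_URL", "GITEA_TOKEN") if not env.get(key)]
    if missing:
        raise ValueError("missing required settings: " + ", ".join(missing))
    settings = Settings(
        gitea_url=env["GITEA_URL"].rstrip("/"),
        gitea_token=env["GITEA_TOKEN"],
        max_file_size_bytes=int(env.get("MAX_FILE_SIZE_BYTES", DEFAULT_MAX_FILE_SIZE_BYTES)),
        request_timeout_seconds=float(
            env.get("REQUEST_TIMEOUT_SECONDS", DEFAULT_REQUEST_TIMEOUT_SECONDS)
        ),
    )
    if settings.max_file_size_bytes <= 0 or settings.request_timeout_seconds <= 0:
        raise ValueError("limits must be positive")
    return settings


settings = load_settings({"GITEA_URL": "https://gitea.example.com/", "GITEA_TOKEN": "XXX"})
```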

#### 3f. Audit Logger
- Structured JSON logging
- Correlation ID tracking
- Timestamp (UTC)
- Append-only logs
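
A minimal sketch of what one audit entry might look like, assuming one JSON object per line. The field names `tool_name`, `repository`, and `status` follow the data flow in this document; `timestamp`, `event`, and the helper names are illustrative assumptions.

```python
import json
import uuid
from datetime import datetime, timezone


def audit_record(event: str, correlation_id: str, **fields) -> str:
    """Serialize one audit entry as a single JSON line with a UTC timestamp."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "event": event,
        "correlation_id": correlation_id,
        **fields,
    }
    return json.dumps(record, sort_keys=True)


def append_audit(path: str, line: str) -> None:
    """Append the entry; mode 'a' never rewrites earlier lines."""
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(line + "\n")


correlation_id = str(uuid.uuid4())
line = audit_record(
    "tool_invocation",
    correlation_id,
    tool_name="get_file_tree",
    repository="org/my-repo",
    status="pending",
)
```

Opening the file in append mode only covers the application side of "append-only"; enforcing it against other writers would need filesystem or volume-level controls.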

### 4. Gitea Instance
**Responsibility**: Authorization and data storage

- Source of truth for permissions
- Repository data storage
- Bot user management
- Access control enforcement

### 5. Persistent Volume
**Responsibility**: Audit log storage

- Durable storage for audit logs
- Survives container restarts
- Accessible for review/analysis

---

## Data Flow: Tool Invocation

```
1. User Request
   ├─> "Show me files in org/my-repo"
   └─> ChatGPT decides to call: get_file_tree(owner="org", repo="my-repo")

2. MCP Request
   ├─> POST /mcp/tool/call
   ├─> Body: {"tool": "get_file_tree", "arguments": {"owner": "org", "repo": "my-repo"}}
   └─> Generate correlation_id: uuid4()

3. Audit Log (Entry)
   ├─> Log: tool_invocation
   ├─> tool_name: "get_file_tree"
   ├─> repository: "org/my-repo"
   └─> status: "pending"

4. Gitea API Call
   ├─> GET /api/v1/repos/org/my-repo/git/trees/main
   ├─> Header: Authorization: token XXX
   └─> Response: {"tree": [...files...]}

5. Authorization Check
   ├─> 200 OK → Bot has access
   ├─> 403 Forbidden → Log access_denied, raise error
   └─> 404 Not Found → Repository doesn't exist or no access

6. Response Processing
   ├─> Extract file tree
   ├─> Transform to simplified format
   └─> Apply size/count limits

7. Audit Log (Success)
   ├─> Log: tool_invocation
   ├─> status: "success"
   └─> params: {"count": 42}

8. MCP Response
   ├─> 200 OK
   ├─> Body: {"success": true, "result": {...files...}}
   └─> correlation_id: same as request

9. ChatGPT Processing
   ├─> Receive file tree data
   ├─> Format for user presentation
   └─> "Here are the files in org/my-repo: ..."
```
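
Steps 2 and 6 of the flow above can be sketched as two small functions. This is an illustration under assumptions, not the server's actual code: the function names, the `MAX_TREE_ENTRIES` cap, and the simplified output shape are hypothetical, and each tree entry is assumed to carry `path` and `type` fields.

```python
import uuid

# Tool names taken from the system overview; the entry cap is an assumed value.
KNOWN_TOOLS = {"list_repositories", "get_repository_info", "get_file_tree", "get_file_contents"}
MAX_TREE_ENTRIES = 500


def parse_tool_call(body: dict) -> tuple:
    """Validate a /mcp/tool/call body and attach a fresh correlation ID (step 2)."""
    tool = body.get("tool")
    arguments = body.get("arguments", {})
    if not isinstance(tool, str) or tool not in KNOWN_TOOLS:
        raise ValueError(f"unknown tool: {tool!r}")
    if not isinstance(arguments, dict):
        raise ValueError("arguments must be a JSON object")
    return tool, arguments, str(uuid.uuid4())


def simplify_tree(gitea_response: dict, limit: int = MAX_TREE_ENTRIES) -> dict:
    """Reduce a tree response to path/type pairs and cap the entry count (step 6)."""
    entries = [
        {"path": entry.get("path"), "type": entry.get("type")}
        for entry in gitea_response.get("tree", [])
    ]
    return {
        "files": entries[:limit],
        "count": min(len(entries), limit),
        "truncated": len(entries) > limit,
    }


tool, arguments, correlation_id = parse_tool_call(
    {"tool": "get_file_tree", "arguments": {"owner": "org", "repo": "my-repo"}}
)
```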

---

## Security Boundaries

```
┌───────────────────────────────────────────────────────────────┐
│                       Trust Boundary 1                        │
│                    (Internet ↔ MCP Server)                    │
│                                                               │
│  Controls:                                                    │
│  - HTTPS/TLS encryption                                       │
│  - Reverse proxy authentication (optional)                    │
│  - Rate limiting                                              │
│  - Firewall rules                                             │
└───────────────────────────────────────────────────────────────┘

┌───────────────────────────────────────────────────────────────┐
│                       Trust Boundary 2                        │
│                   (MCP Server ↔ Gitea API)                    │
│                                                               │
│  Controls:                                                    │
│  - Bot user token authentication                              │
│  - Gitea's access control (authoritative)                     │
│  - API request timeouts                                       │
│  - Input validation                                           │
└───────────────────────────────────────────────────────────────┘

┌───────────────────────────────────────────────────────────────┐
│                       Trust Boundary 3                        │
│                   (Container ↔ Host System)                   │
│                                                               │
│  Controls:                                                    │
│  - Non-root container user                                    │
│  - Resource limits (CPU, memory)                              │
│  - No new privileges                                          │
│  - Read-only filesystem (where possible)                      │
└───────────────────────────────────────────────────────────────┘
```

---

## Authorization Flow

```
┌──────────────────────────────────────────────────────────────────┐
│            AI requests access to "org/private-repo"             │
└────────────────────────┬─────────────────────────────────────────┘
                         │
                         ▼
     ┌───────────────────────────────────────┐
     │   MCP Server: Forward to Gitea API    │
     │          with bot user token          │
     └───────────────────┬───────────────────┘
                         │
                         ▼
     ┌───────────────────────────────────────┐
     │   Gitea: Check bot user permissions   │
     │        for "org/private-repo"         │
     └───────────────────┬───────────────────┘
                         │
               Bot is collaborator?
              ┌──────────┴──────────┐
              │                     │
      ┌───────▼──────┐       ┌──────▼──────┐
      │     YES      │       │     NO      │
      │ (Read access)│       │ (No access) │
      └───────┬──────┘       └──────┬──────┘
              │                     │
              ▼                     ▼
      ┌───────────────┐    ┌─────────────────┐
      │ Return data   │    │ Return 403      │
      │ Log: success  │    │ Log: denied     │
      └───────────────┘    └─────────────────┘
```

**Key Insight**: The MCP server never makes authorization decisions - it only forwards requests and respects Gitea's response.
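
One way to picture this is a function that sees only Gitea's status code, with no repository name and no local allow-list, so it has nothing to base a decision of its own on. The status codes come from this document; the outcome and event labels are hypothetical.

```python
# Maps Gitea's HTTP status to (outcome returned to the caller, audit event).
# Labels are illustrative assumptions, not the server's actual identifiers.
OUTCOMES = {
    200: ("ok", "success"),
    401: ("auth_error", "security_event"),
    403: ("access_denied", "access_denied"),
    404: ("not_found_or_no_access", "not_found"),
}


def interpret_gitea_status(status_code: int) -> tuple:
    """Translate Gitea's answer without consulting any local permission data."""
    return OUTCOMES.get(status_code, ("upstream_error", "error"))


outcome, audit_event = interpret_gitea_status(403)
```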

---

## Failure Modes & Handling

### 1. Gitea Unavailable
- **Detection**: HTTP connection error
- **Response**: Return error to ChatGPT
- **Logging**: Log connection failure
- **Recovery**: Automatic retry on next request

### 2. Invalid Bot Token
- **Detection**: 401 Unauthorized from Gitea
- **Response**: Log security event, return auth error
- **Logging**: High-severity security log
- **Recovery**: Operator must rotate token

### 3. Bot Lacks Permission
- **Detection**: 403 Forbidden from Gitea
- **Response**: Return authorization error
- **Logging**: Access denied event
- **Recovery**: Grant permission in Gitea UI

### 4. File Too Large
- **Detection**: File size exceeds MAX_FILE_SIZE_BYTES
- **Response**: Return size limit error
- **Logging**: Security event (potential abuse)
- **Recovery**: Increase limit or reject request
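
A minimal sketch of the size check, assuming the file size is known before the content is returned (for example from metadata in the Gitea response). `FileTooLargeError` and `ensure_within_limit` are hypothetical names.

```python
class FileTooLargeError(Exception):
    """Raised when a file exceeds the configured MAX_FILE_SIZE_BYTES."""


def ensure_within_limit(size_bytes: int, max_file_size_bytes: int) -> None:
    """Reject files strictly larger than the limit; a file exactly at the limit passes."""
    if size_bytes > max_file_size_bytes:
        raise FileTooLargeError(
            f"file is {size_bytes} bytes; limit is {max_file_size_bytes} bytes"
        )


ensure_within_limit(1024, 1_048_576)  # within the limit, returns None
```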

### 5. Network Timeout
- **Detection**: Request exceeds REQUEST_TIMEOUT_SECONDS
- **Response**: Return timeout error
- **Logging**: Log timeout event
- **Recovery**: Automatic retry possible
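
The timeout behaviour can be illustrated with `asyncio.wait_for`, which cancels the awaited call once the deadline passes. This is a sketch only: `slow_gitea_call` stands in for a real Gitea request, and the error payload shape is an assumption.

```python
import asyncio


async def call_with_timeout(coro, timeout_seconds: float) -> dict:
    """Await a Gitea call, converting a timeout into an error result."""
    try:
        return await asyncio.wait_for(coro, timeout=timeout_seconds)
    except asyncio.TimeoutError:
        return {"success": False, "error": "timeout"}


async def slow_gitea_call() -> dict:
    """Hypothetical stand-in for a Gitea request that takes too long."""
    await asyncio.sleep(1.0)
    return {"success": True}


result = asyncio.run(call_with_timeout(slow_gitea_call(), timeout_seconds=0.05))
```

In a real handler, `timeout_seconds` would come from `REQUEST_TIMEOUT_SECONDS`.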

### 6. Rate Limit Exceeded
- **Detection**: Too many requests per minute
- **Response**: Return 429 Too Many Requests
- **Logging**: Log rate limit event
- **Recovery**: Wait and retry
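
This document does not say which rate-limiting algorithm is used. As one possibility, a sliding-window limiter over the last 60 seconds could be sketched as follows; the class name and parameters are hypothetical.

```python
import time
from collections import deque
from typing import Optional


class SlidingWindowLimiter:
    """Allow at most max_requests within any window_seconds-long window."""

    def __init__(self, max_requests: int, window_seconds: float = 60.0) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = deque()  # timestamps of accepted requests, oldest first

    def allow(self, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        # Drop timestamps that have aged out of the window.
        while self._hits and now - self._hits[0] >= self.window_seconds:
            self._hits.popleft()
        if len(self._hits) >= self.max_requests:
            return False  # caller would answer 429 Too Many Requests
        self._hits.append(now)
        return True


limiter = SlidingWindowLimiter(max_requests=2, window_seconds=60.0)
decisions = [limiter.allow(now=t) for t in (0.0, 1.0, 2.0, 60.0)]
```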

---

## Scaling Considerations

### Vertical Scaling (Single Instance)
- **Current**: 128-512 MB RAM, minimal CPU
- **Bottleneck**: Gitea API response time
- **Max throughput**: ~100-200 requests/second

### Horizontal Scaling (Multiple Instances)
- **Stateless design**: Each instance independent
- **Load balancing**: Standard HTTP load balancer
- **Shared state**: None (all state in Gitea)
- **Audit logs**: Each instance writes to own log (or use centralized logging)

### Performance Optimization (Future)
- Add Redis caching layer
- Implement connection pooling
- Use HTTP/2 for Gitea API
- Batch multiple file reads

---

## Observability

### Metrics to Monitor
1. **Request rate**: Requests per minute
2. **Error rate**: Failed requests / total requests
3. **Response time**: P50, P95, P99 latency
4. **Gitea API health**: Success rate to Gitea
5. **Auth failures**: 401/403 responses
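
For reference, the error rate and latency percentiles above can be computed as in the sketch below. It uses the nearest-rank percentile definition, which is one of several common conventions; a monitoring system may interpolate instead. The sample data is made up.

```python
import math


def percentile(samples, pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def error_rate(failed: int, total: int) -> float:
    """Failed requests divided by total requests; 0.0 when there is no traffic."""
    return failed / total if total else 0.0


latencies_ms = list(range(1, 101))  # illustrative data: 1 ms through 100 ms
p50, p95, p99 = (percentile(latencies_ms, p) for p in (50, 95, 99))
```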

### Logs to Track
1. **Audit logs**: Every tool invocation
2. **Access denied**: Permission violations
3. **Security events**: Rate limits, size limits
4. **Errors**: Exceptions and failures

### Alerts to Configure
1. **High error rate**: > 5% errors
2. **Auth failures**: Any 401 responses
3. **Gitea unreachable**: Connection failures
4. **Disk space**: Audit logs filling disk

---

## Future Enhancements

### Phase 2: Extended Context
```
New Tools:
├── get_commits(owner, repo, limit)
├── get_commit_diff(owner, repo, sha)
├── list_issues(owner, repo)
├── get_issue(owner, repo, number)
├── list_pull_requests(owner, repo)
└── get_pull_request(owner, repo, number)
```

### Phase 3: Advanced Features
```
Capabilities:
├── Caching layer (Redis)
├── Webhook support for real-time updates
├── OAuth2 flow instead of static tokens
├── Per-client rate limiting
├── Multi-tenant support (multiple bot users)
└── GraphQL API for more efficient queries
```

---

## Deployment Patterns

### Pattern 1: Single Homelab Instance
```
[Homelab Server]
├── Gitea container
├── AegisGitea MCP container
└── Caddy reverse proxy
    └── Exposes HTTPS endpoint
```

### Pattern 2: Kubernetes Deployment
```
[Kubernetes Cluster]
├── Namespace: aegis-mcp
├── Deployment: aegis-mcp (3 replicas)
├── Service: ClusterIP
├── Ingress: HTTPS with cert-manager
└── PersistentVolume: Audit logs
```

### Pattern 3: Cloud Deployment
```
[AWS/GCP/Azure]
├── Container service (ECS/Cloud Run/ACI)
├── Load balancer (ALB/Cloud Load Balancing)
├── Secrets manager (Secrets Manager/Secret Manager/Key Vault)
└── Log aggregation (CloudWatch/Cloud Logging/Monitor)
```

---

## Testing Strategy

### Unit Tests
- Configuration loading
- Gitea client methods
- Tool implementations
- Audit logging

### Integration Tests
- Full MCP protocol flow
- Gitea API interactions (mocked)
- Error handling paths
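
A mocked Gitea interaction might be tested along the lines below, using `unittest.mock.AsyncMock` from the standard library. The `get_file_tree` function and the client's `get_tree` method are hypothetical stand-ins, since the real interfaces are not shown in this document.

```python
import asyncio
from unittest.mock import AsyncMock


async def get_file_tree(client, owner: str, repo: str) -> list:
    """Hypothetical tool under test: return the paths from the client's tree response."""
    response = await client.get_tree(owner, repo)
    return [entry["path"] for entry in response["tree"]]


# The mock replaces the Gitea client, so no network access is needed.
mock_client = AsyncMock()
mock_client.get_tree.return_value = {
    "tree": [{"path": "README.md"}, {"path": "src/main.py"}]
}

paths = asyncio.run(get_file_tree(mock_client, "org", "my-repo"))
mock_client.get_tree.assert_awaited_once_with("org", "my-repo")
```

The final assertion checks that the tool forwarded exactly the owner and repository it was given, which is the behaviour the authorization model depends on.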

### End-to-End Tests
- Real Gitea instance
- Real bot user
- Real tool invocations

---

## Conclusion

This architecture prioritizes:
1. **Security**: Read-only, auditable, fail-safe
2. **Simplicity**: Straightforward data flow
3. **Maintainability**: Clear separation of concerns
4. **Observability**: Comprehensive logging

The design is intentionally boring and predictable, which is what a security-critical system needs.