
# AegisGitea MCP

**A private, security-first MCP server for controlled AI access to self-hosted Gitea**

---

## Overview

AegisGitea MCP is a Model Context Protocol (MCP) server that enables controlled, auditable, read-only AI access to a self-hosted Gitea environment.

The system allows ChatGPT (Business / Developer environment) to inspect repositories, code, commits, issues, and pull requests **only through explicit MCP tool calls**, while all access control is dynamically managed through a dedicated bot user inside Gitea itself.

### Core Principles

- **Strong separation of concerns**: Clear boundaries between AI, MCP server, and Gitea
- **Least-privilege access**: Bot user has minimal necessary permissions
- **Full auditability**: Every AI action is logged with context
- **Dynamic authorization**: Access control via Gitea permissions (no redeployment needed)
- **Privacy-first**: Designed for homelab and private infrastructure

---

## Architecture

```
┌─────────────────────────────────────────────────────────────┐
│  ChatGPT (Business/Developer)                               │
│  - Initiates explicit MCP tool calls                        │
│  - Human-in-the-loop decision making                        │
└────────────────────┬────────────────────────────────────────┘
                     │ HTTPS (MCP over SSE)
                     ▼
┌─────────────────────────────────────────────────────────────┐
│  AegisGitea MCP Server (Python, Docker)                     │
│  - Implements MCP protocol                                  │
│  - Translates tool calls → Gitea API requests               │
│  - Enforces access, logging, and safety constraints         │
│  - Provides bounded, single-purpose tools                   │
└────────────────────┬────────────────────────────────────────┘
                     │ Gitea API (Bot User Token)
                     ▼
┌─────────────────────────────────────────────────────────────┐
│  Gitea Instance (Docker)                                    │
│  - Source of truth for authorization                        │
│  - Hosts dedicated read-only bot user                       │
│  - Determines AI-visible repositories dynamically           │
└─────────────────────────────────────────────────────────────┘
```

### Trust Model

| Component | Responsibility |
|-----------|----------------|
| **Gitea** | Authorization (what the AI can see) |
| **MCP Server** | Policy enforcement (how the AI accesses data) |
| **ChatGPT** | Decision initiation (when the AI acts) |
| **Human** | Final decision authority (why the AI acts) |

---

## Features

### Phase 1 — Foundation (Current)

- MCP protocol handling with SSE lifecycle
- Secure Gitea API communication via bot user token
- Health and readiness endpoints
- ChatGPT MCP registration flow
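
As an illustration of the SSE side of the protocol handling, the sketch below frames a JSON payload as a single Server-Sent Events message. It assumes messages are sent as compact JSON in the `data` field; the function name `format_sse_event` and the `message` event name in the usage line are illustrative and not taken from this project's code.

```python
import json
from typing import Any, Optional


def format_sse_event(data: Any, event: Optional[str] = None, event_id: Optional[str] = None) -> str:
    """Serialize one payload as a Server-Sent Events frame (hypothetical helper)."""
    lines = []
    if event_id is not None:
        lines.append(f"id: {event_id}")
    if event is not None:
        lines.append(f"event: {event}")
    # Compact JSON never contains raw newlines, so one data line is enough.
    payload = json.dumps(data, separators=(",", ":"))
    lines.append(f"data: {payload}")
    # A blank line terminates the event.
    return "\n".join(lines) + "\n\n"


frame = format_sse_event({"ok": True}, event="message")
```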

### Phase 2 — Authorization & Data Access (Planned)

- Repository discovery based on bot user permissions
- File tree and content retrieval with size limits
- Dynamic access control (permission changes in Gitea take effect on the next tool call, with no restart)

### Phase 3 — Audit & Hardening (Planned)

- Comprehensive audit logging (timestamp, tool, repo, path, correlation ID)
- Request correlation and tracing
- Input validation and rate limiting
- Defensive bounds on all operations
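
One possible shape for the planned rate limiting is a sliding-window limiter. This is a minimal sketch, not the project's implementation: the class name `SlidingWindowLimiter`, its parameters, and the injectable clock are assumptions made for illustration and testability.

```python
import time
from collections import deque
from typing import Callable, Deque


class SlidingWindowLimiter:
    """Allow at most max_calls within any window_seconds span (hypothetical)."""

    def __init__(self, max_calls: int, window_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._max_calls = max_calls
        self._window = window_seconds
        self._clock = clock
        self._calls: Deque[float] = deque()

    def allow(self) -> bool:
        """Record the call and return True, or return False if over the limit."""
        now = self._clock()
        # Drop timestamps that have aged out of the window.
        while self._calls and now - self._calls[0] >= self._window:
            self._calls.popleft()
        if len(self._calls) >= self._max_calls:
            return False
        self._calls.append(now)
        return True
```

A tool handler could call `allow()` before contacting Gitea and return an error to the client when it yields `False`.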

### Phase 4 — Extended Context (Future)

- Commit history and diff inspection
- Issue and pull request visibility
- Full contextual understanding while maintaining read-only guarantees

---

## Authorization Model

### Bot User Strategy

A dedicated Gitea bot user represents "the AI":

- The MCP server authenticates as this user using a read-only token
- The bot user's repository permissions define AI visibility
- **No admin privileges**
- **No write permissions**
- **No implicit access**

This allows dynamic enable/disable of AI access **without restarting or reconfiguring** the MCP server.

**Example:**
```bash
# Grant AI access to a repository (no clone or server restart required):
#   In the Gitea UI, open the repository's Settings → Collaborators
#   and add the bot user with Read permission.

# Revoke AI access:
#   Remove the bot user from the repository's collaborators in the Gitea UI.
```
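
Because Gitea is the source of truth, the server can ask Gitea about a repository on every tool call instead of caching permissions. The sketch below builds such a request with the standard library. It assumes Gitea's `GET /api/v1/repos/{owner}/{repo}` endpoint and the `Authorization: token …` header form; the function name `build_repo_lookup_request` is hypothetical.

```python
from urllib.parse import quote
from urllib.request import Request


def build_repo_lookup_request(gitea_url: str, token: str, owner: str, repo: str) -> Request:
    """Build (without sending) a GET request for one repository's metadata."""
    if not owner or not repo:
        raise ValueError("owner and repo must be non-empty")
    # Percent-encode each segment so a crafted name cannot add path components.
    path = f"/api/v1/repos/{quote(owner, safe='')}/{quote(repo, safe='')}"
    request = Request(gitea_url.rstrip("/") + path, method="GET")
    # Gitea accepts personal access tokens in this header form.
    request.add_header("Authorization", f"token {token}")
    request.add_header("Accept", "application/json")
    return request
```

Sending the request with `urllib.request.urlopen` and treating an HTTP error response as "not visible to the AI" would keep the decision entirely on the Gitea side.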

---

## MCP Tool Design

All tools are:

- **Explicit**: Single-purpose, no hidden behavior
- **Deterministic**: The same input against the same repository state produces the same output
- **Bounded**: Size limits, path constraints, no wildcards
- **Auditable**: Full logging of every invocation

### Tool Categories

1. **Repository Discovery**
   - List repositories visible to bot user
   - Get repository metadata

2. **File Operations**
   - Get file tree for a repository
   - Read file contents (with size limits)

3. **Commit History** (Phase 4)
   - List commits for a repository
   - Get commit details and diffs

4. **Issues & PRs** (Phase 4)
   - List issues and pull requests
   - Read issue/PR details and comments

### Explicit Constraints

- No wildcard search tools
- No full-text indexing
- No recursive "read everything" operations
- No hidden or implicit data access
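
The bounds and constraints above could be enforced with small validation helpers that run before any Gitea request is made. This is a sketch under assumptions: the names `validate_repo_path` and `check_size` are hypothetical, and the 1 MB limit is a placeholder because this README does not specify a value.

```python
from pathlib import PurePosixPath

MAX_FILE_BYTES = 1_000_000  # placeholder limit; not specified by the project
_WILDCARD_CHARS = set("*?")


def validate_repo_path(path: str) -> str:
    """Return a normalized repository-relative path or raise ValueError."""
    if not path or "\x00" in path:
        raise ValueError("path must be non-empty and free of NUL bytes")
    if _WILDCARD_CHARS & set(path):
        raise ValueError("wildcard characters are not allowed")
    candidate = PurePosixPath(path)
    # Reject absolute paths and any attempt to climb out of the repository.
    if candidate.is_absolute() or ".." in candidate.parts:
        raise ValueError("path must be relative and must not contain '..'")
    return str(candidate)


def check_size(size_bytes: int, limit: int = MAX_FILE_BYTES) -> None:
    """Raise ValueError when a file exceeds the configured size limit."""
    if size_bytes > limit:
        raise ValueError(f"file is {size_bytes} bytes; limit is {limit}")
```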

---

## Audit & Observability

Every MCP tool invocation logs:

- **Timestamp** (UTC)
- **Tool name**
- **Repository identifier**
- **Target** (path / commit / issue)
- **Correlation ID**

Logs are:

- Append-only
- Human-readable JSON
- Machine-parseable
- Stored locally by default

**Audit Philosophy**: The system must answer "What exactly did the AI see, and when?" without ambiguity.
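
A log entry with the fields listed above could be written as one JSON object per line. The following is a minimal sketch rather than the project's `audit.py`: the function names, the JSON key names, and the `get_file` tool name in the usage line are assumptions for illustration.

```python
import json
import uuid
from datetime import datetime, timezone
from typing import Optional


def build_audit_record(tool: str, repository: str, target: str,
                       correlation_id: Optional[str] = None,
                       now: Optional[datetime] = None) -> str:
    """Return one audit entry as a single-line JSON string.

    `now` must be timezone-aware when supplied; it is converted to UTC.
    """
    moment = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    record = {
        "timestamp": moment.isoformat(),
        "tool": tool,
        "repository": repository,
        "target": target,
        "correlation_id": correlation_id or str(uuid.uuid4()),
    }
    return json.dumps(record, sort_keys=True)


def append_audit_record(path: str, line: str) -> None:
    """Append one entry to the log file; mode 'a' never rewrites earlier lines."""
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(line + "\n")


entry = build_audit_record("get_file", "org/repo", "README.md")
```

Opening the file in append mode only prevents this process from overwriting earlier entries; making the log tamper-resistant would need filesystem or log-shipping controls outside this sketch.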

---

## Deployment

### Prerequisites

- Docker and Docker Compose
- Self-hosted Gitea instance
- Gitea bot user with read-only access token

### Quick Start

```bash
# Clone repository
git clone https://gitea.example.com/your-org/AegisGitea-MCP.git
cd AegisGitea-MCP

# Configure environment
cp .env.example .env
# Edit .env with your Gitea URL and bot token

# Start MCP server
docker-compose up -d

# Check logs
docker-compose logs -f aegis-mcp
```

### Environment Variables

| Variable | Description | Required |
|----------|-------------|----------|
| `GITEA_URL` | Base URL of Gitea instance | Yes |
| `GITEA_TOKEN` | Bot user access token | Yes |
| `MCP_HOST` | MCP server listen host | No (default: 0.0.0.0) |
| `MCP_PORT` | MCP server listen port | No (default: 8080) |
| `LOG_LEVEL` | Logging verbosity | No (default: INFO) |
| `AUDIT_LOG_PATH` | Audit log file path | No (default: /var/log/aegis-mcp/audit.log) |
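
The table above could be loaded and validated at startup roughly as follows. This is a sketch, not the project's actual `config.py`: the `Settings` dataclass and `load_settings` function are hypothetical names, while the variable names and defaults are taken from the table.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    gitea_url: str
    gitea_token: str
    mcp_host: str = "0.0.0.0"
    mcp_port: int = 8080
    log_level: str = "INFO"
    audit_log_path: str = "/var/log/aegis-mcp/audit.log"


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read settings from the environment, failing fast on missing values."""
    missing = [name for name in ("GITEA_URL", "GITEA_TOKEN") if not env.get(name)]
    if missing:
        raise RuntimeError(f"missing required environment variables: {', '.join(missing)}")
    return Settings(
        gitea_url=env["GITEA_URL"].rstrip("/"),
        gitea_token=env["GITEA_TOKEN"],
        mcp_host=env.get("MCP_HOST", "0.0.0.0"),
        mcp_port=int(env.get("MCP_PORT", "8080")),
        log_level=env.get("LOG_LEVEL", "INFO").upper(),
        audit_log_path=env.get("AUDIT_LOG_PATH", "/var/log/aegis-mcp/audit.log"),
    )
```

Failing at startup when `GITEA_URL` or `GITEA_TOKEN` is absent avoids a server that appears healthy but cannot reach Gitea.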

### Security Considerations

1. **Never expose the MCP server directly to the internet**: place it behind a reverse proxy that terminates TLS
2. **Rotate bot tokens regularly**
3. **Monitor audit logs** for unexpected access patterns
4. **Keep Docker images updated**
5. **Use a dedicated bot user** — never use a personal account token

---

## Development

### Setup

```bash
# Create virtual environment
python3 -m venv venv
source venv/bin/activate

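# Install dependencies
pip install -r requirements-dev.txt

# Install the package in editable mode so the src/ layout is importable
# (assumes pyproject.toml declares the aegis_gitea_mcp package)
pip install -e .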

# Run tests
pytest tests/

# Run server locally
python -m aegis_gitea_mcp.server
```

### Project Structure

```
AegisGitea-MCP/
├── src/
│   └── aegis_gitea_mcp/
│       ├── __init__.py
│       ├── server.py          # MCP server entry point
│       ├── mcp_protocol.py    # MCP protocol implementation
│       ├── gitea_client.py    # Gitea API client
│       ├── audit.py           # Audit logging
│       ├── config.py          # Configuration management
│       └── tools/             # MCP tool implementations
│           ├── __init__.py
│           ├── repository.py  # Repository discovery tools
│           └── files.py       # File access tools
├── tests/
│   ├── test_mcp_protocol.py
│   ├── test_gitea_client.py
│   └── test_tools.py
├── docker/
│   ├── Dockerfile
│   └── docker-compose.yml
├── .env.example
├── pyproject.toml
├── requirements.txt
├── requirements-dev.txt
└── README.md
```

---

## Non-Goals

Explicitly **out of scope**:

- No write access to Gitea (no commits, comments, merges, edits)
- No autonomous or background scanning
- No global search or unrestricted crawling
- No public exposure of repositories or credentials
- No coupling to GitHub or external VCS platforms

---

## Roadmap

- [x] Project initialization and architecture design
- [ ] **Phase 1**: MCP server foundation and Gitea integration
- [ ] **Phase 2**: Repository discovery and file access tools
- [ ] **Phase 3**: Audit logging and security hardening
- [ ] **Phase 4**: Commit history, issues, and PR support

---

## Contributing

This project prioritizes security and privacy. Contributions should:

1. Maintain read-only guarantees
2. Add comprehensive audit logging for new tools
3. Include tests for authorization and boundary cases
4. Document security implications

---

## License

MIT License - See LICENSE file for details

---

## Acknowledgments

Built on the [Model Context Protocol](https://modelcontextprotocol.io/) by Anthropic.

---

## Support

For issues, questions, or security concerns, please open an issue in the Gitea repository.

**Remember**: This is designed to be **boring, predictable, and safe** — not clever, not magical, and not autonomous.