first commit

2025-12-21 13:42:30 +01:00
parent 823b825acb
commit f9b24fe248
47 changed files with 8222 additions and 1 deletions

docs/README.md

@@ -0,0 +1,52 @@
# AI Code Review Workflow Documentation
Enterprise-grade AI code review system for Gitea with automated issue triage, PR review, and codebase analysis.
## 📚 Documentation
| Document | Description |
|----------|-------------|
| [Getting Started](getting-started.md) | Quick setup guide |
| [Configuration](configuration.md) | All configuration options |
| [Agents](agents.md) | Detailed agent documentation |
| [Security](security.md) | Security scanning features |
| [API Reference](api-reference.md) | Client and agent APIs |
| [Workflows](workflows.md) | Gitea workflow examples |
| [Troubleshooting](troubleshooting.md) | Common issues and solutions |
## Quick Links
- **Setup**: See [Getting Started](getting-started.md)
- **Configuration**: See [Configuration](configuration.md)
- **Enterprise Features**: See [Enterprise](enterprise.md)
## Architecture Overview
```
┌─────────────────────────────────────────────────────────────┐
│                        Event Sources                        │
│     ┌──────────┐      ┌──────────┐      ┌──────────┐        │
│     │ PR Event │      │  Issue   │      │ Schedule │        │
│     └────┬─────┘      └────┬─────┘      └────┬─────┘        │
└──────────┼─────────────────┼─────────────────┼──────────────┘
           │                 │                 │
           └─────────────────┼─────────────────┘
                             ▼
                     ┌───────────────┐
                     │  Dispatcher   │
                     └───────┬───────┘
              ┌──────────────┼──────────────┐
              ▼              ▼              ▼
        ┌───────────┐  ┌───────────┐  ┌───────────┐
        │   Issue   │  │    PR     │  │ Codebase  │
        │   Agent   │  │   Agent   │  │   Agent   │
        └─────┬─────┘  └─────┬─────┘  └─────┬─────┘
              │              │              │
              └──────────────┼──────────────┘
                             ▼
                    ┌─────────────────┐
                    │    Gitea API    │
                    │  LLM Provider   │
                    └─────────────────┘
```

docs/agents.md

@@ -0,0 +1,298 @@
# Agents Documentation
The AI Code Review system includes four specialized agents.
## Issue Agent
Handles issue triage, classification, and interaction.
### Triggers
- `issues.opened` - New issue created (handled by `run_issue_triage`)
- `issues.labeled` - Label added to issue
- `issue_comment.created` - Comment with @mention (handled by `run_issue_comment`)
### Features
**Automatic Triage:**
- Classifies issue type: bug, feature, question, documentation, support
- Assigns priority: high, medium, low
- Calculates confidence score
**Auto-Labeling:**
- Applies type labels (`type: bug`, etc.)
- Applies priority labels (`priority: high`, etc.)
- Adds `ai-reviewed` status label
**@Mention Commands:**
| Command | Description |
|---------|-------------|
| `@ai-bot summarize` | Generate concise summary |
| `@ai-bot explain` | Detailed explanation |
| `@ai-bot suggest` | Solution suggestions |
### Output
Posts a triage comment:
```markdown
## AI Issue Triage
| Field | Value |
|-------|--------|
| **Type** | Bug |
| **Priority** | High |
| **Confidence** | 85% |
### Additional Information Needed
- Steps to reproduce
- Error logs
---
*Classification based on issue content*
```
---
## PR Agent
Comprehensive pull request review with security scanning.
### Triggers
- `pull_request.opened` - New PR created
- `pull_request.synchronize` - PR updated with new commits
### Features
**AI Code Review:**
- Analyzes diff for issues
- Categorizes: Security, Correctness, Performance, Maintainability
- Assigns severity: HIGH, MEDIUM, LOW
**Inline Comments:**
- Posts comments on specific lines
- Links to file and line number
- Provides recommendations
**Security Scanning:**
- 17 OWASP-aligned rules
- Detects hardcoded secrets, SQL injection, XSS
- Fails CI on HIGH severity
**Label Management:**
- `ai-approved` - No blocking issues
- `ai-changes-required` - HIGH severity issues found
### Output
Posts summary comment:
```markdown
## AI Code Review
Review of changes in this PR.
### Summary
| Severity | Count |
|----------|-------|
| HIGH | 1 |
| MEDIUM | 2 |
| LOW | 3 |
### Security Issues
- **[HIGH]** `src/auth.py:45` - Hardcoded API key detected
### Review Findings
- **[MEDIUM]** `src/db.py:12` - SQL query uses string formatting
- **[LOW]** `src/utils.py:30` - Missing docstring
---
**Overall Severity:** `HIGH`
**AI Recommendation:** Changes Requested
```
---
## Codebase Agent
Repository-wide quality and health analysis.
### Triggers
- `schedule` - Cron schedule (default: weekly)
- `workflow_dispatch` - Manual trigger
- `@ai-bot codebase` - Comment command
### Features
**Metrics Collection:**
- Total files and lines of code
- Language distribution
- TODO/FIXME/DEPRECATED counts
**AI Analysis:**
- Overall health score (0-100)
- Architecture observations
- Technical debt identification
- Improvement recommendations
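The marker counts above can be sketched with a small line scanner. This is an illustrative helper, not the agent's actual implementation:

```python
from collections import Counter

MARKERS = ("TODO", "FIXME", "DEPRECATED")

def count_markers(source: str) -> Counter:
    """Count occurrences of tracked comment markers, line by line."""
    counts = Counter()
    for line in source.splitlines():
        for marker in MARKERS:
            if marker in line:
                counts[marker] += 1
    return counts
```

For example, `count_markers("# TODO: fix\nx = 1  # FIXME later")` yields one `TODO` and one `FIXME`.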
### Output
Creates/updates report issue:
```markdown
# AI Codebase Quality Report
## Health Score: 72/100
The codebase is in reasonable condition with some areas for improvement.
---
## Metrics
| Metric | Value |
|--------|-------|
| Total Files | 45 |
| Total Lines | 12,500 |
| TODO Comments | 23 |
| FIXME Comments | 8 |
### Languages
- **Python**: 35 files
- **JavaScript**: 10 files
## Issues Found
### [MEDIUM] Code Quality
Missing docstrings in 15 functions.
**Recommendation:** Add docstrings for public functions.
## Recommendations
1. Add comprehensive test coverage
2. Document API endpoints
3. Reduce TODO backlog
```
---
## Chat Agent (Bartender)
Interactive AI chat assistant with tool-calling capabilities.
### Triggers
- `issue_comment.created` - Any @ai-bot mention that isn't a specific command
- `chat` - Direct CLI invocation
### Features
**Tool Calling:**
The Chat Agent uses LLM function calling to gather information before responding:
| Tool | Description |
|------|-------------|
| `search_codebase` | Search repository files and code patterns |
| `read_file` | Read specific files from the repository |
| `search_web` | Search the web via SearXNG instance |
**Iterative Reasoning:**
- Makes up to 5 tool calls per request
- Combines information from multiple sources
- Provides comprehensive, contextual answers
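The iterative loop above can be sketched roughly as follows; `llm.call`, the reply shape, and the tool registry are hypothetical stand-ins for the real client:

```python
def run_chat(llm, tools: dict, question: str, max_iterations: int = 5) -> str:
    """Iterative tool-calling loop (illustrative sketch only)."""
    messages = [{"role": "user", "content": question}]
    for _ in range(max_iterations):
        reply = llm.call(messages)  # the model either requests a tool or answers
        if reply.get("tool") is None:
            return reply["content"]  # final answer, no more tools needed
        # Run the requested tool and feed the result back to the model
        result = tools[reply["tool"]](**reply.get("args", {}))
        messages.append({"role": "tool", "content": str(result)})
    return "Reached tool-call limit without a final answer."
```

Each iteration either returns a final answer or appends one tool result, so the loop is bounded by `max_iterations`.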
**Web Search:**
- Requires SearXNG instance URL (via `SEARXNG_URL` env var or config)
- Searches for documentation, tutorials, external resources
### Configuration
```yaml
agents:
chat:
enabled: true
name: "Bartender"
max_iterations: 5
tools:
- search_codebase
- read_file
- search_web
searxng_url: "" # Or set SEARXNG_URL env var
```
### CLI Usage
```bash
# Simple chat
python main.py chat owner/repo "How does authentication work?"
# Chat and post response to issue
python main.py chat owner/repo "Explain this bug" --issue 123
```
### Issue Comment Usage
```
@ai-bot How do I configure rate limiting?
@ai-bot Find all files that handle user authentication
@ai-bot What does the dispatcher module do?
```
### Output
Posts a response comment:
```markdown
**Note:** This review was generated by an AI assistant...
---
Based on my analysis of the codebase, rate limiting is configured in
`tools/ai-review/config.yml` under the `enterprise.rate_limit` section:
- `requests_per_minute`: Maximum requests per minute (default: 30)
- `max_concurrent`: Maximum concurrent requests (default: 4)
The rate limiting logic is implemented in `enterprise/rate_limiter.py`...
```
---
## Agent Interface
All agents extend `BaseAgent`:
```python
from agents import BaseAgent, AgentContext, AgentResult

class CustomAgent(BaseAgent):
    def can_handle(self, event_type: str, event_data: dict) -> bool:
        # Return True if this agent handles the event
        return event_type == "custom_event"

    def execute(self, context: AgentContext) -> AgentResult:
        # Perform agent logic
        return AgentResult(
            success=True,
            message="Custom action completed",
            actions_taken=["action1", "action2"],
        )
```
Register with dispatcher:
```python
from dispatcher import get_dispatcher
from agents import CustomAgent
dispatcher = get_dispatcher()
dispatcher.register_agent(CustomAgent())
```

docs/api-reference.md

@@ -0,0 +1,280 @@
# API Reference
## Gitea Client
`clients/gitea_client.py`
### Initialization
```python
from clients import GiteaClient

client = GiteaClient(
    api_url="https://gitea.example.com/api/v1",
    token="your_token",
    timeout=30,
)
```
### Issue Methods
```python
# List issues
issues = client.list_issues(
    owner="user",
    repo="repo",
    state="open",  # open, closed, all
    labels=["bug"],
    page=1,
    limit=30,
)

# Get single issue
issue = client.get_issue(owner, repo, index=123)

# Create comment
comment = client.create_issue_comment(owner, repo, index=123, body="Comment text")

# Update comment
client.update_issue_comment(owner, repo, comment_id=456, body="Updated text")

# List comments
comments = client.list_issue_comments(owner, repo, index=123)

# Add labels
client.add_issue_labels(owner, repo, index=123, labels=[1, 2, 3])

# Get repo labels
labels = client.get_repo_labels(owner, repo)
```
### Pull Request Methods
```python
# Get PR
pr = client.get_pull_request(owner, repo, index=123)

# Get diff
diff = client.get_pull_request_diff(owner, repo, index=123)

# List changed files
files = client.list_pull_request_files(owner, repo, index=123)

# Create review with inline comments
client.create_pull_request_review(
    owner, repo, index=123,
    body="Review summary",
    event="COMMENT",  # APPROVE, REQUEST_CHANGES, COMMENT
    comments=[
        {"path": "file.py", "line": 10, "body": "Issue here"},
    ],
)
```
### Repository Methods
```python
# Get repository info
repo = client.get_repository(owner, repo)
# Get file contents (base64 encoded)
content = client.get_file_contents(owner, repo, "path/to/file.py", ref="main")
# Get branch
branch = client.get_branch(owner, repo, "main")
```
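Since `get_file_contents` returns base64-encoded data, callers need to decode it before use. A minimal sketch, assuming the response carries the payload in a `content` field as in the Gitea contents API (verify the field name against your Gitea version):

```python
import base64

def decode_file_contents(response: dict) -> str:
    """Decode the base64-encoded `content` field of a contents response.
    The response shape is an assumption based on the Gitea contents API."""
    return base64.b64decode(response["content"]).decode("utf-8")
```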
---
## LLM Client
`clients/llm_client.py`
### Initialization
```python
from clients import LLMClient

# Direct initialization
client = LLMClient(
    provider="openai",  # openai, openrouter, ollama
    config={"model": "gpt-4", "temperature": 0},
)

# From config file
client = LLMClient.from_config(config_dict)
```
### Methods
```python
# Basic call
response = client.call("Explain this code")
print(response.content)
print(response.tokens_used)
# JSON response
result = client.call_json("Return JSON: {\"key\": \"value\"}")
print(result["key"])
```
### Response Object
```python
from dataclasses import dataclass

@dataclass
class LLMResponse:
    content: str        # Generated text
    model: str          # Model used
    provider: str       # Provider name
    tokens_used: int    # Token count
    finish_reason: str  # stop, length, etc.
```
---
## Base Agent
`agents/base_agent.py`
### Creating Custom Agent
```python
from agents import BaseAgent, AgentContext, AgentResult

class MyAgent(BaseAgent):
    def can_handle(self, event_type: str, event_data: dict) -> bool:
        return event_type == "my_event"

    def execute(self, context: AgentContext) -> AgentResult:
        # Use built-in methods
        prompt = self.load_prompt("my_prompt")
        response = self.call_llm(prompt)
        self.upsert_comment(
            context.owner,
            context.repo,
            issue_index=123,
            body=response.content,
        )
        return AgentResult(
            success=True,
            message="Done",
            actions_taken=["posted comment"],
        )
```
### Built-in Methods
```python
# Load prompt template
prompt = self.load_prompt("prompt_name") # From prompts/prompt_name.md
# LLM calls (with rate limiting)
response = self.call_llm(prompt)
json_result = self.call_llm_json(prompt)
# Comment management
comment_id = self.find_ai_comment(owner, repo, issue_index)
self.upsert_comment(owner, repo, issue_index, body)
# Format with disclaimer
formatted = self.format_with_disclaimer(content)
```
### Context Object
```python
from dataclasses import dataclass

@dataclass
class AgentContext:
    owner: str        # Repository owner
    repo: str         # Repository name
    event_type: str   # Event type
    event_data: dict  # Event payload
    config: dict      # Configuration
```
### Result Object
```python
from dataclasses import dataclass, field

@dataclass
class AgentResult:
    success: bool
    message: str
    data: dict = field(default_factory=dict)                # mutable defaults need a factory
    actions_taken: list[str] = field(default_factory=list)  # (bare {} / [] would raise ValueError)
    error: str | None = None
```
---
## Dispatcher
`dispatcher.py`
### Usage
```python
from dispatcher import Dispatcher, get_dispatcher

# Get global dispatcher
dispatcher = get_dispatcher()

# Register agents
dispatcher.register_agent(MyAgent())

# Dispatch event
result = dispatcher.dispatch(
    event_type="pull_request",
    event_data={"action": "opened", ...},
    owner="user",
    repo="repo",
)

# Async dispatch
future = dispatcher.dispatch_async(event_type, event_data, owner, repo)
result = future.result()
```
---
## Security Scanner
`security/security_scanner.py`
### Usage
```python
from security import SecurityScanner

scanner = SecurityScanner()

# Scan content
for finding in scanner.scan_content(code, "file.py"):
    print(finding.rule_id, finding.severity, finding.line)

# Scan diff (only added lines)
for finding in scanner.scan_diff(diff):
    print(finding.file, finding.line, finding.code_snippet)

# Summary
findings = list(scanner.scan_diff(diff))
summary = scanner.get_summary(findings)
```
### Finding Object
```python
from dataclasses import dataclass

@dataclass
class SecurityFinding:
    rule_id: str         # SEC001, SEC002, etc.
    rule_name: str       # Human-readable name
    severity: str        # HIGH, MEDIUM, LOW
    category: str        # OWASP category
    file: str            # File path
    line: int            # Line number
    code_snippet: str    # Matched code
    description: str     # Issue description
    recommendation: str  # How to fix
    cwe: str | None      # CWE reference
```

docs/configuration.md

@@ -0,0 +1,196 @@
# Configuration Reference
All configuration is managed in `tools/ai-review/config.yml`.
## Provider Settings
```yaml
# LLM Provider: openai | openrouter | ollama
provider: openai

# Model per provider
model:
  openai: gpt-4.1-mini
  openrouter: anthropic/claude-3.5-sonnet
  ollama: codellama:13b

# Generation settings
temperature: 0    # 0 = deterministic
max_tokens: 4096  # Max response tokens
```
## Review Settings
```yaml
review:
  fail_on_severity: HIGH  # Fail CI on this severity
  max_diff_lines: 800     # Truncate large diffs
  inline_comments: true   # Post inline PR comments
  security_scan: true     # Run security scanner
```
## Agent Configuration
### Issue Agent
```yaml
agents:
  issue:
    enabled: true
    auto_label: true           # Apply labels automatically
    auto_triage: true          # Run triage on new issues
    duplicate_threshold: 0.85  # Similarity threshold
    events:
      - opened
      - labeled
```
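One plausible reading of `duplicate_threshold`: two issues are flagged as duplicates when a text-similarity score reaches 0.85. A sketch using the standard library's `difflib`; the agent's real similarity measure may differ:

```python
from difflib import SequenceMatcher

def is_probable_duplicate(a: str, b: str, threshold: float = 0.85) -> bool:
    """Flag two issue titles/bodies as likely duplicates when their
    similarity ratio meets the configured threshold."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold
```

Raising the threshold makes duplicate detection stricter; lowering it catches more near-matches at the cost of false positives.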
### PR Agent
```yaml
agents:
  pr:
    enabled: true
    inline_comments: true  # Post inline comments
    security_scan: true    # Run security scanner
    events:
      - opened
      - synchronize
```
### Codebase Agent
```yaml
agents:
  codebase:
    enabled: true
    schedule: "0 0 * * 0"  # Cron schedule (weekly)
```
### Chat Agent (Bartender)
```yaml
agents:
  chat:
    enabled: true
    name: "Bartender"  # Display name for the bot
    max_iterations: 5  # Max tool calls per chat
    tools:
      - search_codebase  # Search repository files
      - read_file        # Read file contents
      - search_web       # Web search via SearXNG
    searxng_url: ""  # SearXNG instance URL (or use SEARXNG_URL env var)
```
## Interaction Settings
### Customizing the Bot Name
The `mention_prefix` setting controls the trigger the bot responds to. You can change it to any name you prefer:
```yaml
interaction:
  mention_prefix: "@bartender"  # Users will type @bartender to invoke the bot
```
**Important:** When changing the bot name, you must also update the workflow files:
1. Edit `.github/workflows/ai-comment-reply.yml` and `ai-chat.yml` (for GitHub)
2. Edit `.gitea/workflows/ai-comment-reply.yml` and `ai-chat.yml` (for Gitea)
3. Change the `if:` condition to match your new prefix:
```yaml
if: contains(github.event.comment.body, '@bartender')
```
**Example bot names:**
- `@ai-bot` - Default, generic
- `@bartender` - Friendly, conversational
- `@uni` - Short, quick to type
- `@joey` - Personal assistant
- `@codebot` - Technical focus
```yaml
interaction:
  respond_to_mentions: true
  mention_prefix: "@ai-bot"
  commands:
    - explain    # Explain code/issue
    - suggest    # Suggest solutions
    - security   # Run security check
    - summarize  # Summarize content
```
## Label Mappings
```yaml
labels:
  priority:
    high: "priority: high"
    medium: "priority: medium"
    low: "priority: low"
  type:
    bug: "type: bug"
    feature: "type: feature"
    question: "type: question"
    docs: "type: documentation"
  status:
    ai_approved: "ai-approved"
    ai_changes_required: "ai-changes-required"
    ai_reviewed: "ai-reviewed"
```
## Enterprise Settings
```yaml
enterprise:
  audit_log: true
  audit_path: "/var/log/ai-review/"
  metrics_enabled: true
  rate_limit:
    requests_per_minute: 30
    max_concurrent: 4
```
## Security Configuration
```yaml
security:
  enabled: true
  fail_on_high: true
  rules_file: "security/security_rules.yml"  # Custom rules
```
## Environment Variables
These override config file settings:
| Variable | Description |
|----------|-------------|
| `AI_REVIEW_TOKEN` | Gitea/GitHub API token |
| `AI_REVIEW_API_URL` | API base URL (`https://api.github.com` or Gitea URL) |
| `AI_REVIEW_REPO` | Target repository (owner/repo) |
| `OPENAI_API_KEY` | OpenAI API key |
| `OPENROUTER_API_KEY` | OpenRouter API key |
| `OLLAMA_HOST` | Ollama server URL |
| `SEARXNG_URL` | SearXNG instance URL for web search |
| `AI_AUDIT_PATH` | Audit log directory |
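The override order can be sketched as a one-line resolver (illustrative only; the real config loader may differ):

```python
import os

def resolve_setting(env_var: str, config: dict, key: str, default=None):
    """Environment variables take precedence over config-file values.
    Note: an empty-string env var is treated as unset here."""
    return os.environ.get(env_var) or config.get(key, default)
```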
## Per-Repository Overrides
Create `.ai-review.yml` in repository root:
```yaml
# Override global config for this repo
agents:
  pr:
    security_scan: false  # Disable security scan
  issue:
    auto_label: false     # Disable auto-labeling

# Custom labels
labels:
  priority:
    high: "P0"
    medium: "P1"
    low: "P2"
```

docs/enterprise.md

@@ -0,0 +1,223 @@
# Enterprise Features
Advanced features for enterprise deployments.
## Audit Logging
All AI actions are logged for compliance and debugging.
### Configuration
```yaml
enterprise:
  audit_log: true
  audit_path: "/var/log/ai-review/"
```
### Log Format
Logs are stored as JSONL (JSON Lines) with daily rotation:
```
/var/log/ai-review/audit-2024-01-15.jsonl
```
Each line is a JSON object:
```json
{
  "timestamp": "2024-01-15T10:30:45.123Z",
  "action": "review_pr",
  "agent": "PRAgent",
  "repository": "org/repo",
  "success": true,
  "details": {
    "pr_number": 123,
    "severity": "MEDIUM",
    "issues_found": 3
  }
}
```
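Writing such an entry with daily rotation might look like this sketch; the real logger lives in the `enterprise` package, and this helper is hypothetical:

```python
import json
from datetime import datetime, timezone
from pathlib import Path

def write_audit_entry(audit_dir: str, action: str, agent: str,
                      repository: str, success: bool, **details) -> Path:
    """Append one JSON line to today's audit file (illustrative sketch)."""
    now = datetime.now(timezone.utc)
    # Daily rotation: one file per UTC date, e.g. audit-2024-01-15.jsonl
    path = Path(audit_dir) / f"audit-{now.date().isoformat()}.jsonl"
    entry = {
        "timestamp": now.isoformat(),
        "action": action,
        "agent": agent,
        "repository": repository,
        "success": success,
        "details": details,
    }
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(entry) + "\n")
    return path
```

Appending one self-contained JSON object per line keeps the log greppable and lets readers process it incrementally.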
### Actions Logged
| Action | Description |
|--------|-------------|
| `review_pr` | PR review completed |
| `triage_issue` | Issue triaged |
| `llm_call` | LLM API call made |
| `comment_posted` | Comment created/updated |
| `labels_applied` | Labels added |
| `security_scan` | Security scan completed |
### Querying Logs
```python
from enterprise import get_audit_logger

logger = get_audit_logger()

# Get all logs for a date range
logs = logger.get_logs(
    start_date="2024-01-01",
    end_date="2024-01-31",
    action="review_pr",
    repository="org/repo",
)

# Generate summary report
report = logger.generate_report(
    start_date="2024-01-01",
    end_date="2024-01-31",
)
print(f"Total events: {report['total_events']}")
print(f"Success rate: {report['success_rate']:.1%}")
```
---
## Metrics & Observability
Track performance and usage metrics.
### Configuration
```yaml
enterprise:
  metrics_enabled: true
```
### Available Metrics
**Counters:**
- `ai_review_requests_total` - Total requests processed
- `ai_review_requests_success` - Successful requests
- `ai_review_requests_failed` - Failed requests
- `ai_review_llm_calls_total` - Total LLM API calls
- `ai_review_llm_tokens_total` - Total tokens consumed
- `ai_review_comments_posted` - Comments posted
- `ai_review_security_findings` - Security issues found
**Gauges:**
- `ai_review_active_requests` - Currently processing
**Histograms:**
- `ai_review_request_duration_seconds` - Request latency
- `ai_review_llm_duration_seconds` - LLM call latency
### Getting Metrics
```python
from enterprise import get_metrics
metrics = get_metrics()
# Get summary
summary = metrics.get_summary()
print(f"Total requests: {summary['requests']['total']}")
print(f"Success rate: {summary['requests']['success_rate']:.1%}")
print(f"Avg latency: {summary['latency']['avg_ms']:.0f}ms")
print(f"P95 latency: {summary['latency']['p95_ms']:.0f}ms")
print(f"LLM tokens used: {summary['llm']['tokens']}")
# Export Prometheus format
prometheus_output = metrics.export_prometheus()
```
### Prometheus Integration
Expose metrics endpoint:
```python
from flask import Flask
from enterprise import get_metrics

app = Flask(__name__)

@app.route("/metrics")
def metrics():
    return get_metrics().export_prometheus()
```
---
## Rate Limiting
Prevent API overload and manage costs.
### Configuration
```yaml
enterprise:
  rate_limit:
    requests_per_minute: 30
    max_concurrent: 4
```
### Built-in Rate Limiting
The `BaseAgent` class includes automatic rate limiting:
```python
import time

class BaseAgent:
    def __init__(self):
        self._min_request_interval = 1.0  # seconds
        self._last_request_time = 0.0

    def _rate_limit(self):
        elapsed = time.time() - self._last_request_time
        if elapsed < self._min_request_interval:
            time.sleep(self._min_request_interval - elapsed)
        self._last_request_time = time.time()
```
---
## Queue Management
The dispatcher handles concurrent execution:
```python
dispatcher = Dispatcher(max_workers=4)
```
For high-volume environments, use async dispatch:
```python
future = dispatcher.dispatch_async(event_type, event_data, owner, repo)
# Continue with other work
result = future.result() # Block when needed
```
---
## Security Considerations
### Token Permissions
Minimum required permissions for `AI_REVIEW_TOKEN`:
- `repo:read` - Read repository contents
- `repo:write` - Create branches (if needed)
- `issue:read` - Read issues and PRs
- `issue:write` - Create comments, labels
### Network Isolation
For air-gapped environments, use Ollama:
```yaml
provider: ollama
# Internal network address
# Set via environment: OLLAMA_HOST=http://ollama.internal:11434
```
### Data Privacy
By default:
- Code is sent to LLM provider for analysis
- Review comments are stored in Gitea
- Audit logs are stored locally
For sensitive codebases:
1. Use self-hosted Ollama
2. Disable external LLM providers
3. Review audit log retention policies

docs/future_roadmap.md

@@ -0,0 +1,82 @@
# Future Features Roadmap
This document outlines the strategic plan for evolving the AI Code Review system. These features are proposed for future implementation to enhance security coverage, context awareness, and user interaction.
---
## Phase 1: Advanced Security Scanning
Expand the current 17-rule regex scanner with dedicated industry-standard tools for **Static Application Security Testing (SAST)** and **Software Composition Analysis (SCA)**.
### Proposed Integrations
| Tool | Type | Purpose | Implementation Plan |
|------|------|---------|---------------------|
| **Bandit** | SAST | Analyze Python code for common vulnerability patterns (e.g., `exec`, weak crypto). | Run `bandit -r . -f json` and parse results into the review report. |
| **Semgrep** | SAST | Polyglot scanning with custom rule support. | Integrate `semgrep --config=p/security-audit` for broader language support (JS, Go, Java). |
| **Safety** | SCA | Check installed dependencies against known vulnerability databases. | Run `safety check --json` during CI to flag vulnerable packages in `requirements.txt`. |
| **Trivy** | SCA/Container | Scan container images (Dockerfiles) and filesystem. | Add a workflow step to run Trivy for container-based projects. |
**Impact:** Significantly reduces false negatives and covers dependency-chain risks (supply chain security).
---
## Phase 2: "Chat with Codebase" (RAG)
Move beyond single-file context by implementing **Retrieval-Augmented Generation (RAG)**. This allows the AI to answer questions like *"Where is authentication handled?"* by searching the entire codebase semantically.
### Architecture
1. **Vector Database:**
* **ChromaDB** or **Qdrant**: Lightweight, open-source choices for storing code embeddings.
2. **Embeddings Model:**
* **OpenAI `text-embedding-3-small`** or **FastEmbed**: To convert code chunks (functions/classes) into vectors.
3. **Workflow:**
* **Index:** Run a nightly job to parse the codebase -> chunk it -> embed it -> store in Vector DB.
* **Query:** When `@ai-bot` receives a question, convert the question to a vector -> search Vector DB -> inject relevant snippets into the LLM prompt.
**Impact:** Enables high-accuracy architectural advice and deep-dive explanations spanning multiple files.
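The query path could be sketched as follows; every interface here (`embed`, `vector_db`, `llm`) is a placeholder, since this phase is a proposal and nothing is implemented yet:

```python
def answer_with_rag(question: str, embed, vector_db, llm, top_k: int = 5) -> str:
    """Query-side RAG flow from the proposal: embed the question, retrieve
    the nearest code chunks, and inject them into the LLM prompt.
    All three callables are hypothetical stand-ins."""
    query_vector = embed(question)
    snippets = vector_db.search(query_vector, limit=top_k)
    context = "\n---\n".join(s["text"] for s in snippets)
    prompt = f"Answer using only this code context:\n{context}\n\nQuestion: {question}"
    return llm(prompt)
```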
---
## Phase 3: Interactive Code Repair
Transform the bot from a passive reviewer into an active collaborator.
### Features
* **`@ai-bot apply <suggestion_id>`**:
* The bot generates a secure `git patch` for a specific recommendation.
* The system commits the patch directly to the PR branch.
* **Refactoring Assistance**:
* Command: `@ai-bot refactor this function to use dependency injection`.
* Bot proposes the changed code block and offers to commit it.
**Risk Mitigation:**
* Require human approval (comment reply) before any commit is pushed.
* Run tests automatically after bot commits.
---
## Phase 4: Enterprise Dashboard
Provide a high-level view of engineering health across the organization.
### Metrics to Visualize
* **Security Health:** Trend of High/Critical issues over time.
* **Code Quality:** Technical debt accumulation vs. reduction rate.
* **Review Velocity:** Average time to AI review vs. Human review.
* **Bot Usage:** Most frequent commands and value-add interactions.
### Tech Stack
* **Prometheus** (already implemented) + **Grafana**: For time-series tracking.
* **Streamlit** / **Next.js**: For a custom management console to configure rules and view logs.
---
## Strategic Recommendations
1. **Immediate Win:** Implement **Bandit** integration. It is low-effort (Python library) and high-value (detects real vulnerabilities).
2. **High Impact:** **Safety** dependency scanning. Vulnerable dependencies are the #1 attack vector for modern apps.
3. **Long Term:** Work on **Vector DB** integration only after the core review logic is flawless, as it introduces significant infrastructure complexity.

docs/getting-started.md

@@ -0,0 +1,142 @@
# Getting Started
This guide will help you set up the AI Code Review system for your Gitea repositories.
## Prerequisites
- Gitea instance (self-hosted or managed)
- Python 3.11+
- LLM API access (OpenAI, OpenRouter, or Ollama)
---
## Step 1: Create a Bot Account
1. Create a new Gitea user account for the bot (e.g., `ai-reviewer`)
2. Generate an access token with these permissions:
- `repo` - Full repository access
- `issue` - Issue read/write access
3. Save the token securely
---
## Step 2: Configure Organization Secrets
In your Gitea organization or repository settings, add these secrets:
| Secret | Description |
|--------|-------------|
| `AI_REVIEW_TOKEN` | Bot's Gitea access token |
| `OPENAI_API_KEY` | OpenAI API key (if using OpenAI) |
| `OPENROUTER_API_KEY` | OpenRouter key (if using OpenRouter) |
| `OLLAMA_HOST` | Ollama URL (if using Ollama, e.g., `http://localhost:11434`) |
---
## Step 3: Add Workflows to Your Repository
Copy the workflow files from this repository to your target repo:
```bash
# Create workflows directory
mkdir -p .gitea/workflows
# Copy workflow files
# Option 1: Copy manually from this repo's .gitea/workflows/
# Option 2: Reference this repo in your workflows (see README)
```
### Workflow Files:
| File | Trigger | Purpose |
|------|---------|---------|
| `enterprise-ai-review.yml` | PR opened/updated | Run AI code review |
| `ai-issue-review.yml` | Issue opened, @ai-bot | Triage issues & respond to commands |
| `ai-codebase-review.yml` | Weekly/manual | Analyze codebase health |
---
## Step 4: Create Labels
Create these labels in your repository for auto-labeling:
**Priority Labels:**
- `priority: high` (red)
- `priority: medium` (yellow)
- `priority: low` (green)
**Type Labels:**
- `type: bug`
- `type: feature`
- `type: question`
- `type: documentation`
**AI Status Labels:**
- `ai-approved`
- `ai-changes-required`
- `ai-reviewed`
---
## Step 5: Test the Setup
### Test PR Review:
1. Create a new pull request
2. Wait for the AI review workflow to run
3. Check for the AI review comment
### Test Issue Triage:
1. Create a new issue
2. The AI should automatically triage and comment
### Test @ai-bot Commands:
1. On any issue, comment: `@ai-bot summarize`
2. The AI should respond with a summary
---
## Troubleshooting
### Common Issues:
**"Missing token" error:**
- Verify `AI_REVIEW_TOKEN` is set in secrets
- Ensure the token has correct permissions
**"LLM call failed" error:**
- Verify your LLM API key is set
- Check the `provider` setting in `config.yml`
**Workflow not triggering:**
- Verify workflow files are in `.gitea/workflows/`
- Check that Actions are enabled for your repository
See [Troubleshooting Guide](troubleshooting.md) for more.
---
## Helper: CLI Usage
If you need to run the agents manually (e.g. for debugging or local testing), you can use the CLI:
```bash
# Review a pull request
python main.py pr owner/repo 123
# Triage a new issue
python main.py issue owner/repo 456
# Handle @ai-bot command in comment
python main.py comment owner/repo 456 "@ai-bot summarize"
# Analyze codebase
python main.py codebase owner/repo
```
---
## Next Steps
- [Configuration Reference](configuration.md) - Customize behavior
- [Agents Documentation](agents.md) - Learn about each agent
- [Security Scanning](security.md) - Understand security rules

docs/security.md

@@ -0,0 +1,163 @@
# Security Scanning
The security scanner detects vulnerabilities aligned with OWASP Top 10.
## Supported Rules
### A01:2021 Broken Access Control
| Rule | Severity | Description |
|------|----------|-------------|
| SEC001 | HIGH | Hardcoded credentials (passwords, API keys) |
| SEC002 | HIGH | Exposed private keys |
### A02:2021 Cryptographic Failures
| Rule | Severity | Description |
|------|----------|-------------|
| SEC003 | MEDIUM | Weak hash algorithms (MD5, SHA1) |
| SEC004 | MEDIUM | Non-cryptographic random for security |
### A03:2021 Injection
| Rule | Severity | Description |
|------|----------|-------------|
| SEC005 | HIGH | SQL injection via string formatting |
| SEC006 | HIGH | Command injection in subprocess |
| SEC007 | HIGH | eval() usage |
| SEC008 | MEDIUM | XSS via innerHTML |
### A04:2021 Insecure Design
| Rule | Severity | Description |
|------|----------|-------------|
| SEC009 | MEDIUM | Debug mode enabled |
### A05:2021 Security Misconfiguration
| Rule | Severity | Description |
|------|----------|-------------|
| SEC010 | MEDIUM | CORS wildcard (*) |
| SEC011 | HIGH | SSL verification disabled |
### A07:2021 Authentication Failures
| Rule | Severity | Description |
|------|----------|-------------|
| SEC012 | HIGH | Hardcoded JWT secrets |
### A08:2021 Integrity Failures
| Rule | Severity | Description |
|------|----------|-------------|
| SEC013 | MEDIUM | Pickle deserialization |
### A09:2021 Logging Failures
| Rule | Severity | Description |
|------|----------|-------------|
| SEC014 | MEDIUM | Logging sensitive data |
### A10:2021 Server-Side Request Forgery
| Rule | Severity | Description |
|------|----------|-------------|
| SEC015 | MEDIUM | SSRF via dynamic URLs |
### Additional Rules
| Rule | Severity | Description |
|------|----------|-------------|
| SEC016 | LOW | Hardcoded IP addresses |
| SEC017 | MEDIUM | Security-related TODO/FIXME |
## Usage
### In PR Reviews
Security scanning runs automatically during PR review:
```yaml
agents:
  pr:
    security_scan: true
```
### Standalone
```python
from security import SecurityScanner

scanner = SecurityScanner()

# Scan file content
for finding in scanner.scan_content(code, "file.py"):
    print(f"[{finding.severity}] {finding.rule_name}")
    print(f"  Line {finding.line}: {finding.code_snippet}")
    print(f"  {finding.description}")

# Scan git diff
for finding in scanner.scan_diff(diff):
    print(f"{finding.file}:{finding.line} - {finding.rule_name}")
```
### Get Summary
```python
findings = list(scanner.scan_content(code, "file.py"))
summary = scanner.get_summary(findings)
print(f"Total: {summary['total']}")
print(f"HIGH: {summary['by_severity']['HIGH']}")
print(f"Categories: {summary['by_category']}")
```
## Custom Rules
Create `security/security_rules.yml`:
```yaml
rules:
  - id: "CUSTOM001"
    name: "Custom Pattern"
    pattern: "dangerous_function\\s*\\("
    severity: "HIGH"
    category: "Custom"
    cwe: "CWE-xxx"
    description: "Usage of dangerous function detected"
    recommendation: "Use safe_function() instead"
```
Load custom rules:
```python
scanner = SecurityScanner(rules_file="security/custom_rules.yml")
```
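Under the hood, a rule like `CUSTOM001` boils down to per-line regex matching. The scanner's actual internals may differ; this is only a sketch of the matching model the rules file implies:

```python
import re

# Sketch: apply one rule's pattern line by line, as the rules file implies.
# SecurityScanner's real implementation may differ.
rule = {"id": "CUSTOM001", "pattern": re.compile(r"dangerous_function\s*\(")}

code = "x = 1\nresult = dangerous_function(x)\n"

findings = [
    (lineno, rule["id"])
    for lineno, line in enumerate(code.splitlines(), start=1)
    if rule["pattern"].search(line)
]
print(findings)  # [(2, 'CUSTOM001')]
```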
## CI Integration
Fail CI on HIGH severity findings:
```yaml
security:
fail_on_high: true
```
Or in code:
```python
findings = list(scanner.scan_diff(diff))
high_count = sum(1 for f in findings if f.severity == "HIGH")
if high_count > 0:
sys.exit(1)
```
## CWE References
All rules include CWE (Common Weakness Enumeration) references:
- [CWE-78](https://cwe.mitre.org/data/definitions/78.html): OS Command Injection
- [CWE-79](https://cwe.mitre.org/data/definitions/79.html): XSS
- [CWE-89](https://cwe.mitre.org/data/definitions/89.html): SQL Injection
- [CWE-798](https://cwe.mitre.org/data/definitions/798.html): Hardcoded Credentials

# Troubleshooting
Common issues and solutions for the AI Code Review system.
## Installation Issues
### `ModuleNotFoundError: No module named 'requests'`
Install dependencies:
```bash
pip install requests pyyaml
```
### `ImportError: cannot import name 'BaseAgent'`
Ensure you're running from the correct directory:
```bash
cd tools/ai-review
python main.py pr owner/repo 123
```
---
## Authentication Issues
### `repository not found`
**Causes:**
- Bot token lacks access to the repository
- Repository path is incorrect
**Solutions:**
1. Verify token has `repo` permissions
2. Check repository path format: `owner/repo`
3. Ensure token can access both the target repo and the AI tooling repo
### `401 Unauthorized`
**Causes:**
- Invalid or expired token
- Missing token in environment
**Solutions:**
1. Regenerate the bot token
2. Verify `AI_REVIEW_TOKEN` is set correctly
3. Check organization secret scope is "All Repositories"
### `403 Forbidden`
**Causes:**
- Token lacks write permissions
- Repository is private and token doesn't have access
**Solutions:**
1. Ensure token has `issue:write` permission
2. Add bot account as collaborator to private repos
---
## LLM Issues
### `OPENAI_API_KEY not set`
Set the environment variable:
```bash
export OPENAI_API_KEY="sk-..."
```
Or in workflow:
```yaml
env:
OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
```
### `Rate limit exceeded`
**Causes:**
- Too many requests to LLM provider
- API quota exhausted
**Solutions:**
1. Increase rate limit interval in config
2. Switch to a different provider temporarily
3. Check your API plan limits
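If switching providers isn't an option, a simple client-side retry with exponential backoff can smooth over transient rate limits. This is a hypothetical helper, not part of the tool:

```python
import time

def with_backoff(call, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Retry `call` on RuntimeError with exponential backoff (1s, 2s, 4s, ...).

    Hypothetical helper -- the real client may already retry internally.
    """
    for attempt in range(max_attempts):
        try:
            return call()
        except RuntimeError:
            if attempt == max_attempts - 1:
                raise
            sleep(base_delay * (2 ** attempt))

# Demo: a call that hits the rate limit twice, then succeeds.
attempts = []
def flaky():
    attempts.append(1)
    if len(attempts) < 3:
        raise RuntimeError("rate limit exceeded")
    return "ok"

print(with_backoff(flaky, sleep=lambda s: None))  # prints "ok" after 3 attempts
```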
### `JSON decode error` from LLM
**Causes:**
- LLM returned non-JSON response
- Response was truncated
**Solutions:**
1. Increase `max_tokens` in config
2. Check LLM response in logs
3. Improve prompt to enforce JSON output
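A defensive parser on the caller's side also helps. The sketch below (not part of the tool) strips leading and trailing prose before decoding:

```python
import json

def extract_json(text):
    """Best-effort: decode the first {...} object in an LLM reply.

    Sketch only -- copes with surrounding prose, not with every
    malformed response.
    """
    start = text.find("{")
    end = text.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object in LLM response")
    return json.loads(text[start : end + 1])

reply = 'Sure! Here is the result:\n{"labels": ["bug"], "priority": "high"}\nLet me know!'
print(extract_json(reply))  # {'labels': ['bug'], 'priority': 'high'}
```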
---
## Workflow Issues
### Workflow doesn't trigger
**Causes:**
- Workflow file not in correct location
- Event type not configured
**Solutions:**
1. Ensure workflow is in `.gitea/workflows/`
2. Check event types match your needs:
```yaml
on:
pull_request:
types: [opened, synchronize]
```
3. Verify Gitea Actions is enabled for the repository
### `review.py not found`
**Causes:**
- Central repo checkout failed
- Path is incorrect
**Solutions:**
1. Verify the checkout step has correct repository and path
2. Check bot token has access to the AI tooling repo
3. Ensure path matches: `.ai-review/tools/ai-review/main.py`
### PR comments not appearing
**Causes:**
- Token lacks issue write permission
- API URL is incorrect
**Solutions:**
1. Check `AI_REVIEW_API_URL` is correct
2. Verify token has `issue:write` permission
3. Check workflow logs for API errors
### @ai-bot edits the issue instead of replying
**Causes:**
- Workflow is using the wrong CLI command for comments
- `event_type` is incorrectly set to "issues"
**Solutions:**
1. Ensure your workflow uses the `comment` command for mentions:
```yaml
python main.py comment owner/repo 123 "@ai-bot ..."
```
2. Verify you have separate jobs for `issues` vs `issue_comment` events (see [Workflows](workflows.md))
---
## Label Issues
### Labels not being applied
**Causes:**
- Labels don't exist in repository
- Label names don't match config
**Solutions:**
1. Create labels matching your config:
- `priority: high`
- `type: bug`
- `ai-approved`
2. Or update config to match existing labels:
```yaml
labels:
priority:
high: "P0" # Your label name
```
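Labels can also be pre-created in bulk through Gitea's API (`POST /repos/{owner}/{repo}/labels`). A minimal sketch, with label names and colors as placeholder assumptions:

```python
import json
import urllib.request

# Placeholder label set -- match these to your configuration.
LABELS = {
    "priority: high": "#ee0701",
    "type: bug": "#d73a4a",
    "ai-approved": "#0e8a16",
}

def label_payloads(labels):
    """Build request bodies for Gitea's create-label endpoint."""
    return [{"name": name, "color": color} for name, color in labels.items()]

def create_labels(api_url, token, repo, labels=LABELS):
    """POST each label to /repos/{repo}/labels (Gitea API v1)."""
    for payload in label_payloads(labels):
        req = urllib.request.Request(
            f"{api_url}/repos/{repo}/labels",
            data=json.dumps(payload).encode(),
            headers={
                "Authorization": f"token {token}",
                "Content-Type": "application/json",
            },
            method="POST",
        )
        urllib.request.urlopen(req, timeout=30)
```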
### `label not found` error
The agent gracefully handles missing labels. Create labels manually or disable auto-labeling:
```yaml
agents:
issue:
auto_label: false
```
---
## Performance Issues
### Reviews are slow
**Causes:**
- Large diffs taking long to process
- LLM response time
**Solutions:**
1. Reduce max diff lines:
```yaml
review:
max_diff_lines: 500
```
2. Use a faster model:
```yaml
model:
openai: gpt-4.1-mini # Faster than gpt-4
```
3. Consider Ollama for local, faster inference
### Timeout errors
Increase the per-request timeout on API calls:
```python
client = GiteaClient(timeout=60) # Increase from default 30
```
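To bound an entire review step rather than a single HTTP request, a thread-pool wrapper with a hard deadline is one option. A sketch, independent of the tool's own API:

```python
import concurrent.futures

def call_with_deadline(fn, seconds, *args, **kwargs):
    """Run a blocking call under a hard deadline.

    Raises concurrent.futures.TimeoutError if exceeded; note the worker
    thread still runs to completion in the background on timeout.
    """
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(fn, *args, **kwargs)
        return future.result(timeout=seconds)

print(call_with_deadline(lambda x: x * 2, 5, 21))  # 42
```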
---
## Debugging
### Enable verbose logging
```bash
python main.py -v pr owner/repo 123
```
### Check workflow logs
1. Go to repository -> Actions
2. Click on the failed workflow run
3. Expand job steps to see output
### Test locally
```bash
# Set environment variables
export AI_REVIEW_TOKEN="your_token"
export AI_REVIEW_API_URL="https://your-gitea/api/v1"
export OPENAI_API_KEY="sk-..."
# Run locally
cd tools/ai-review
python main.py pr owner/repo 123
```
### Validate Python syntax
```bash
python -m py_compile main.py
```
---
## Getting Help
1. Check the [documentation](README.md)
2. Search existing issues in the repository
3. Create a new issue with:
- Steps to reproduce
- Error messages
- Environment details (Gitea version, Python version)

# Workflows
This document provides ready-to-use workflow files for integrating AI code review into your repositories. Workflows are provided for both **GitHub Actions** and **Gitea Actions**.
---
## Platform Comparison
| Feature | GitHub | Gitea |
|---------|--------|-------|
| Context variable | `github.*` | `gitea.*` |
| Default token | `GITHUB_TOKEN` | `AI_REVIEW_TOKEN` (custom) |
| API URL | `https://api.github.com` | Your Gitea instance URL |
| Tools location | Same repo (`tools/ai-review`) | Checkout from central repo |
---
## GitHub Workflows
### PR Review Workflow
```yaml
# .github/workflows/ai-review.yml
name: AI Code Review
on:
pull_request:
types: [opened, synchronize]
jobs:
ai-review:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
with:
fetch-depth: 0
- uses: actions/setup-python@v5
with:
python-version: "3.11"
- run: pip install requests pyyaml
- name: Run AI Review
env:
AI_REVIEW_TOKEN: ${{ secrets.GITHUB_TOKEN }}
AI_REVIEW_REPO: ${{ github.repository }}
AI_REVIEW_API_URL: https://api.github.com
OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
run: |
cd tools/ai-review
python main.py pr ${{ github.repository }} ${{ github.event.pull_request.number }}
```
### Issue Triage Workflow
```yaml
# .github/workflows/ai-issue-triage.yml
name: AI Issue Triage
on:
issues:
types: [opened, labeled]
jobs:
ai-triage:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: actions/setup-python@v5
with:
python-version: "3.11"
- run: pip install requests pyyaml
- name: Run AI Issue Triage
env:
AI_REVIEW_TOKEN: ${{ secrets.GITHUB_TOKEN }}
AI_REVIEW_REPO: ${{ github.repository }}
AI_REVIEW_API_URL: https://api.github.com
OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
run: |
cd tools/ai-review
python main.py issue ${{ github.repository }} ${{ github.event.issue.number }} \
--title "${{ github.event.issue.title }}"
```
### Comment Reply Workflow (includes Bartender Chat)
```yaml
# .github/workflows/ai-comment-reply.yml
name: AI Comment Reply
on:
issue_comment:
types: [created]
jobs:
ai-reply:
runs-on: ubuntu-latest
if: contains(github.event.comment.body, '@ai-bot')
steps:
- uses: actions/checkout@v4
- uses: actions/setup-python@v5
with:
python-version: "3.11"
- run: pip install requests pyyaml
- name: Run AI Comment Response
env:
AI_REVIEW_TOKEN: ${{ secrets.GITHUB_TOKEN }}
AI_REVIEW_REPO: ${{ github.repository }}
AI_REVIEW_API_URL: https://api.github.com
OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
SEARXNG_URL: ${{ secrets.SEARXNG_URL }}
run: |
cd tools/ai-review
python main.py comment ${{ github.repository }} ${{ github.event.issue.number }} \
"${{ github.event.comment.body }}"
```
### Codebase Analysis Workflow
```yaml
# .github/workflows/ai-codebase-review.yml
name: AI Codebase Analysis
on:
schedule:
- cron: "0 0 * * 0" # Weekly on Sunday
workflow_dispatch: # Manual trigger
jobs:
ai-codebase:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
with:
fetch-depth: 0
- uses: actions/setup-python@v5
with:
python-version: "3.11"
- run: pip install requests pyyaml
- name: Run Codebase Analysis
env:
AI_REVIEW_TOKEN: ${{ secrets.GITHUB_TOKEN }}
AI_REVIEW_REPO: ${{ github.repository }}
AI_REVIEW_API_URL: https://api.github.com
OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
run: |
cd tools/ai-review
python main.py codebase ${{ github.repository }}
```
---
## Gitea Workflows
### PR Review Workflow
```yaml
# .gitea/workflows/enterprise-ai-review.yml
name: AI Code Review
on:
pull_request:
types: [opened, synchronize]
jobs:
ai-review:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
with:
fetch-depth: 0
- uses: actions/checkout@v4
with:
repository: YourOrg/OpenRabbit
path: .ai-review
token: ${{ secrets.AI_REVIEW_TOKEN }}
- uses: actions/setup-python@v5
with:
python-version: "3.11"
- run: pip install requests pyyaml
- name: Run AI Review
env:
AI_REVIEW_TOKEN: ${{ secrets.AI_REVIEW_TOKEN }}
AI_REVIEW_REPO: ${{ gitea.repository }}
AI_REVIEW_API_URL: https://your-gitea.example.com/api/v1
OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
run: |
cd .ai-review/tools/ai-review
python main.py pr ${{ gitea.repository }} ${{ gitea.event.pull_request.number }}
```
### Issue Triage Workflow
```yaml
# .gitea/workflows/ai-issue-triage.yml
name: AI Issue Triage
on:
issues:
types: [opened, labeled]
jobs:
ai-triage:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: actions/checkout@v4
with:
repository: YourOrg/OpenRabbit
path: .ai-review
token: ${{ secrets.AI_REVIEW_TOKEN }}
- uses: actions/setup-python@v5
with:
python-version: "3.11"
- run: pip install requests pyyaml
- name: Run AI Issue Triage
env:
AI_REVIEW_TOKEN: ${{ secrets.AI_REVIEW_TOKEN }}
AI_REVIEW_REPO: ${{ gitea.repository }}
AI_REVIEW_API_URL: https://your-gitea.example.com/api/v1
OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
run: |
cd .ai-review/tools/ai-review
python main.py issue ${{ gitea.repository }} ${{ gitea.event.issue.number }} \
--title "${{ gitea.event.issue.title }}"
```
### Comment Reply Workflow (includes Bartender Chat)
```yaml
# .gitea/workflows/ai-comment-reply.yml
name: AI Comment Reply
on:
issue_comment:
types: [created]
jobs:
ai-reply:
runs-on: ubuntu-latest
    if: contains(gitea.event.comment.body, '@ai-bot')
steps:
- uses: actions/checkout@v4
- uses: actions/checkout@v4
with:
repository: YourOrg/OpenRabbit
path: .ai-review
token: ${{ secrets.AI_REVIEW_TOKEN }}
- uses: actions/setup-python@v5
with:
python-version: "3.11"
- run: pip install requests pyyaml
- name: Run AI Comment Response
env:
AI_REVIEW_TOKEN: ${{ secrets.AI_REVIEW_TOKEN }}
AI_REVIEW_REPO: ${{ gitea.repository }}
AI_REVIEW_API_URL: https://your-gitea.example.com/api/v1
OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
SEARXNG_URL: ${{ secrets.SEARXNG_URL }}
run: |
cd .ai-review/tools/ai-review
python main.py comment ${{ gitea.repository }} ${{ gitea.event.issue.number }} \
"${{ gitea.event.comment.body }}"
```
### Codebase Analysis Workflow
```yaml
# .gitea/workflows/ai-codebase-review.yml
name: AI Codebase Analysis
on:
schedule:
- cron: "0 0 * * 0" # Weekly on Sunday
workflow_dispatch: # Manual trigger
jobs:
ai-codebase:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
with:
fetch-depth: 0
- uses: actions/checkout@v4
with:
repository: YourOrg/OpenRabbit
path: .ai-review
token: ${{ secrets.AI_REVIEW_TOKEN }}
- uses: actions/setup-python@v5
with:
python-version: "3.11"
- run: pip install requests pyyaml
- name: Run Codebase Analysis
env:
AI_REVIEW_TOKEN: ${{ secrets.AI_REVIEW_TOKEN }}
AI_REVIEW_REPO: ${{ gitea.repository }}
AI_REVIEW_API_URL: https://your-gitea.example.com/api/v1
OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
run: |
cd .ai-review/tools/ai-review
python main.py codebase ${{ gitea.repository }}
```
---
## Required Secrets
### GitHub
| Secret | Required | Description |
|--------|----------|-------------|
| `GITHUB_TOKEN` | Auto | Provided automatically by GitHub Actions |
| `OPENAI_API_KEY` | Choose one | OpenAI API key |
| `OPENROUTER_API_KEY` | Choose one | OpenRouter API key |
| `OLLAMA_HOST` | Choose one | Ollama server URL |
| `SEARXNG_URL` | Optional | SearXNG instance for web search |
### Gitea
| Secret | Required | Description |
|--------|----------|-------------|
| `AI_REVIEW_TOKEN` | Yes | Gitea bot access token |
| `OPENAI_API_KEY` | Choose one | OpenAI API key |
| `OPENROUTER_API_KEY` | Choose one | OpenRouter API key |
| `OLLAMA_HOST` | Choose one | Ollama server URL |
| `SEARXNG_URL` | Optional | SearXNG instance for web search |
---
## Customization
### For GitHub
The tools are included in the same repository under `tools/ai-review`, so no additional checkout is needed.
### For Gitea
Replace the repository reference with your OpenRabbit fork:
```yaml
repository: YourOrg/OpenRabbit
```
Replace the API URL with your Gitea instance:
```yaml
AI_REVIEW_API_URL: https://your-gitea.example.com/api/v1
```
---
## Chat/Bartender Workflow
Both platforms support the Bartender chat agent through the comment reply workflow. When `@ai-bot` is mentioned with a question (rather than a specific command like `summarize`), the Chat Agent handles it with tool-calling capabilities.
To enable web search, set the `SEARXNG_URL` secret to your SearXNG instance URL.
**Example usage:**
```
@ai-bot How do I configure rate limiting?
@ai-bot Find all authentication-related files
@ai-bot What does the dispatcher module do?
```