# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Overview

OpenRabbit is an enterprise-grade AI code review system for Gitea (and GitHub). It provides automated PR review, issue triage, interactive chat, and codebase analysis through a collection of specialized AI agents.

## Commands

### Development

```bash
# Run tests
pytest tests/ -v

# Run specific test file
pytest tests/test_ai_review.py -v

# Install dependencies
pip install -r tools/ai-review/requirements.txt

# Run a PR review locally
cd tools/ai-review
python main.py pr owner/repo 123

# Run issue triage
python main.py issue owner/repo 456

# Test chat functionality
python main.py chat owner/repo "How does authentication work?"

# Run with custom config
python main.py pr owner/repo 123 --config /path/to/config.yml
```

### Testing Workflows

```bash
# Validate workflow YAML syntax (run from the repository root)
python -c "import yaml; yaml.safe_load(open('.github/workflows/ai-review.yml'))"

# The remaining checks import modules that live under tools/ai-review
cd tools/ai-review

# Test security scanner
python -c "from security.security_scanner import SecurityScanner; s = SecurityScanner(); print(list(s.scan_content('password = \"secret123\"', 'test.py')))"

# Test webhook sanitization
python -c "from utils.webhook_sanitizer import sanitize_webhook_data; print(sanitize_webhook_data({'user': {'email': 'test@example.com'}}))"

# Test safe dispatch
python utils/safe_dispatch.py issue_comment owner/repo '{"action": "created", "issue": {"number": 1}, "comment": {"body": "test"}}'
```

## Architecture

### Agent System

The codebase uses an **agent-based architecture** where specialized agents handle different types of events:

1. **BaseAgent** (`agents/base_agent.py`) - Abstract base class providing:
   - Gitea API client integration
   - LLM client integration with rate limiting
   - Common comment management (upsert, find AI comments)
   - Prompt loading from `prompts/` directory
   - Standard execution flow with error handling

2. **Specialized Agents** - Each agent implements:
   - `can_handle(event_type, event_data)` - Determines if agent should process the event
   - `execute(context)` - Main execution logic
   - Returns `AgentResult` with success status, message, data, and actions taken

   **Core Agents:**
   - **PRAgent** - Reviews pull requests with inline comments and security scanning
   - **IssueAgent** - Triages issues and responds to @codebot commands
   - **CodebaseAgent** - Analyzes entire codebase health and tech debt
   - **ChatAgent** - Interactive assistant with tool calling (search_codebase, read_file, search_web)

   **Specialized Agents:**
   - **DependencyAgent** - Scans dependencies for security vulnerabilities (Python, JavaScript)
   - **TestCoverageAgent** - Analyzes code for test coverage gaps and suggests test cases
   - **ArchitectureAgent** - Enforces layer separation and detects architecture violations

3. **Dispatcher** (`dispatcher.py`) - Routes events to appropriate agents:
   - Registers agents at startup
   - Determines which agents can handle each event
   - Executes agents (supports concurrent execution)
   - Returns aggregated results
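
The register / `can_handle` / `execute` flow described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the project's `dispatcher.py`: `EchoAgent`, the `Dispatcher.register` and `Dispatcher.dispatch` method names, and the simplified `AgentResult` are assumptions made for the example, and agents run sequentially here whereas the real dispatcher supports concurrent execution.

```python
from dataclasses import dataclass, field


@dataclass
class AgentResult:
    # Simplified stand-in for the project's AgentResult (assumption).
    success: bool
    message: str
    actions_taken: list = field(default_factory=list)


class EchoAgent:
    """Hypothetical agent that only accepts 'issue_comment' events."""

    def can_handle(self, event_type: str, event_data: dict) -> bool:
        return event_type == "issue_comment"

    def execute(self, context: dict) -> AgentResult:
        return AgentResult(True, "echoed", ["Posted comment"])


class Dispatcher:
    """Registers agents and routes each event to the agents that accept it."""

    def __init__(self) -> None:
        self.agents = []

    def register(self, agent) -> None:
        self.agents.append(agent)

    def dispatch(self, event_type: str, event_data: dict) -> list:
        context = {"event_type": event_type, "event_data": event_data}
        results = []
        for agent in self.agents:
            # Only agents that claim the event are executed.
            if agent.can_handle(event_type, event_data):
                results.append(agent.execute(context))
        return results


dispatcher = Dispatcher()
dispatcher.register(EchoAgent())
handled = dispatcher.dispatch("issue_comment", {"comment": {"body": "hi"}})
ignored = dispatcher.dispatch("pull_request", {})
```

Here `handled` holds one result and `ignored` is empty, because no registered agent accepts `pull_request` events.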

### Multi-Provider LLM Client

The `LLMClient` (`clients/llm_client.py`) provides a unified interface for multiple LLM providers:

**Core Providers (in llm_client.py):**
- **OpenAI** - Primary provider (gpt-4.1-mini default)
- **OpenRouter** - Multi-provider access (claude-3.5-sonnet)
- **Ollama** - Self-hosted models (codellama:13b)

**Additional Providers (in clients/providers/):**
- **AnthropicProvider** - Direct Anthropic Claude API (claude-3.5-sonnet)
- **AzureOpenAIProvider** - Azure OpenAI Service with API key auth
- **AzureOpenAIWithAADProvider** - Azure OpenAI with Azure AD authentication
- **GeminiProvider** - Google Gemini API (public)
- **VertexAIGeminiProvider** - Google Vertex AI Gemini (enterprise GCP)

Key features:
- Tool/function calling support via `call_with_tools(messages, tools)`
- JSON response parsing with fallback extraction
- Provider-specific configuration via `config.yml`
- Configurable timeouts per provider
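
"JSON response parsing with fallback extraction" could look something like the sketch below. The function name `parse_json_response` and the first-brace-to-last-brace strategy are assumptions for illustration; the actual logic in `LLMClient` may differ.

```python
import json


def parse_json_response(text: str) -> dict:
    """Parse a JSON object from an LLM reply, tolerating surrounding prose."""
    try:
        parsed = json.loads(text)
        if isinstance(parsed, dict):
            return parsed
    except json.JSONDecodeError:
        pass
    # Fallback: take the span from the first "{" to the last "}".
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in response")
    return json.loads(text[start:end + 1])


clean = parse_json_response('{"severity": "HIGH"}')
wrapped = parse_json_response('Here is the result: {"a": {"b": 2}} Hope that helps.')
```

The fallback is deliberately simple: it handles a single object wrapped in prose or a Markdown fence, but not replies containing several separate objects.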

### Platform Abstraction

The `GiteaClient` (`clients/gitea_client.py`) provides a unified REST API client for **Gitea** (also compatible with GitHub API):

- Issue operations (create, update, list, get, comments, labels)
- PR operations (get, diff, files, reviews)
- Repository operations (get repo, file contents, branches)

Environment variables:
- `AI_REVIEW_API_URL` - API base URL (e.g., `https://api.github.com` or `https://gitea.example.com/api/v1`)
- `AI_REVIEW_TOKEN` - Authentication token

### Security Scanner

The `SecurityScanner` (`security/security_scanner.py`) uses **pattern-based detection** with 17 built-in rules covering:

- OWASP Top 10 categories (A01-A10)
- Common vulnerabilities (SQL injection, XSS, hardcoded secrets, weak crypto)
- Returns `SecurityFinding` objects with severity (HIGH/MEDIUM/LOW), CWE references, and recommendations

Can scan:
- File content via `scan_content(content, filename)`
- Git diffs via `scan_diff(diff)` - only scans added lines
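
To make "only scans added lines" concrete, here is a minimal sketch of pattern-based scanning over a unified diff. It is not the real `SecurityScanner`: the single regex rule, the `Finding` dataclass, and `scan_added_lines` are illustrative assumptions (it needs Python 3.9+ for `str.removeprefix`).

```python
import re
from dataclasses import dataclass

# One illustrative rule (assumption); the real scanner ships 17 rules.
HARDCODED_SECRET = re.compile(
    r"""(password|secret|api_key)\s*=\s*["'][^"']+["']""", re.IGNORECASE
)


@dataclass
class Finding:
    file: str
    line: int
    severity: str
    message: str


def scan_added_lines(diff: str) -> list:
    """Report rule matches on added ('+') lines of a unified diff."""
    findings = []
    current_file, new_line = "", 0
    for raw in diff.splitlines():
        if raw.startswith("+++ "):
            current_file = raw[4:].removeprefix("b/")
        elif raw.startswith("@@"):
            # Hunk header: pick up the starting line number in the new file.
            match = re.match(r"@@ -\d+(?:,\d+)? \+(\d+)", raw)
            new_line = int(match.group(1)) if match else 0
        elif raw.startswith("+"):
            if HARDCODED_SECRET.search(raw[1:]):
                findings.append(Finding(current_file, new_line, "HIGH", "Hardcoded secret"))
            new_line += 1
        elif raw.startswith(" "):
            new_line += 1  # context lines advance the new-file counter
        # Removed lines ("-") are neither scanned nor counted.
    return findings


sample_diff = (
    "--- a/app.py\n"
    "+++ b/app.py\n"
    "@@ -1,2 +1,3 @@\n"
    " import os\n"
    '-password = "old"\n'
    '+password = "secret123"\n'
    "+print(os.name)\n"
)
findings = scan_added_lines(sample_diff)
```

The removed `password = "old"` line is ignored, so only the newly added secret on line 2 of `app.py` is reported.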

### Chat Agent Tool Calling

The `ChatAgent` implements an **iterative tool calling loop**:

1. Send user message + system prompt to LLM with available tools
2. If LLM returns tool calls, execute each tool and append results to conversation
3. Repeat until LLM returns a final response (max 5 iterations)

Available tools:
- `search_codebase` - Searches repository files and code patterns
- `read_file` - Reads specific file contents (truncated at 8KB)
- `search_web` - Queries SearXNG instance (requires `SEARXNG_URL`)
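
The loop above can be sketched with a stubbed LLM so it runs without network access. `run_tool_loop`, `ToolCall`, `LLMResponse`, and `fake_llm` are hypothetical names for this example, not the `ChatAgent` API. Real provider APIs usually also expect the assistant's tool-call message in the history; that step is omitted here for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class ToolCall:
    id: str
    name: str
    arguments: dict


@dataclass
class LLMResponse:
    content: str = ""
    tool_calls: list = field(default_factory=list)


def run_tool_loop(llm_call, tools: dict, messages: list, max_iterations: int = 5) -> str:
    """llm_call(messages) -> LLMResponse; tools maps a tool name to a callable."""
    for _ in range(max_iterations):
        response = llm_call(messages)
        if not response.tool_calls:
            return response.content  # final answer, stop looping
        for call in response.tool_calls:
            handler = tools.get(call.name)
            result = handler(**call.arguments) if handler else f"Unknown tool: {call.name}"
            messages.append({"role": "tool", "tool_call_id": call.id, "content": str(result)})
    return "Stopped: reached the tool-calling iteration limit."


def fake_llm(messages: list) -> LLMResponse:
    # Ask for one tool call first; answer once a tool result is in the history.
    if any(m["role"] == "tool" for m in messages):
        return LLMResponse(content="Found login() in auth.py")
    return LLMResponse(tool_calls=[ToolCall("1", "search_codebase", {"query": "login"})])


answer = run_tool_loop(
    fake_llm,
    {"search_codebase": lambda query: f"auth.py: def {query}()"},
    [{"role": "user", "content": "Where is login?"}],
)
```

The iteration cap matters: a model that keeps requesting tools would otherwise loop indefinitely, which is why the limit is configurable via `agents.chat.max_iterations`.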

## Configuration

### Primary Config File: `tools/ai-review/config.yml`

Critical settings:

```yaml
provider: openai  # openai | openrouter | ollama

model:
  openai: gpt-4.1-mini
  openrouter: anthropic/claude-3.5-sonnet
  ollama: codellama:13b

interaction:
  mention_prefix: "@codebot"  # Bot trigger name - update workflows too!
  commands:
    - explain       # Explain what the issue is about
    - suggest       # Suggest solutions or next steps
    - security      # Security analysis
    - summarize     # Summarize the issue
    - triage        # Full triage with labeling
    - review-again  # Re-run PR review (PR comments only)

review:
  fail_on_severity: HIGH  # Fail CI if HIGH severity issues found
  max_diff_lines: 800     # Skip review if diff too large

agents:
  chat:
    max_iterations: 5  # Tool calling loop limit
```

**Important**: When changing `mention_prefix`, also update all workflow files in `.gitea/workflows/`:
- `ai-comment-reply.yml`
- `ai-chat.yml`
- `ai-issue-triage.yml`

Look for: `if: contains(github.event.comment.body, '@codebot')` and update to your new bot name.

Current bot name: `@codebot`

### Environment Variables

Required:
- `AI_REVIEW_API_URL` - Platform API URL
- `AI_REVIEW_TOKEN` - Bot authentication token
- `OPENAI_API_KEY` - OpenAI API key (or provider-specific key)

Optional:
- `SEARXNG_URL` - SearXNG instance for web search
- `OPENROUTER_API_KEY` - OpenRouter API key
- `OLLAMA_HOST` - Ollama server URL

## Workflow Architecture

Workflows are located in `.gitea/workflows/` and are **mutually exclusive** to prevent duplicate runs:

- **enterprise-ai-review.yml** - Triggered on PR open/sync
- **ai-issue-triage.yml** - Triggered ONLY on `@codebot triage` in comments
- **ai-comment-reply.yml** - Triggered on specific commands: `help`, `explain`, `suggest`, `security`, `summarize`, `changelog`, `explain-diff`, `review-again`, `setup-labels`
- **ai-chat.yml** - Triggered on `@codebot` mentions that are NOT specific commands (free-form questions)
- **ai-codebase-review.yml** - Scheduled weekly analysis

**Workflow Routing Logic:**
1. If comment contains `@codebot triage` → ai-issue-triage.yml only
2. If comment contains specific command (e.g., `@codebot help`) → ai-comment-reply.yml only
3. If comment contains `@codebot <question>` (no command) → ai-chat.yml only

This prevents the issue where all three workflows would trigger on every `@codebot` mention, causing massive duplication.
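
The routing rules can be modelled in a few lines of Python. This is a simplified sketch for reasoning about which workflow should fire, not the workflow implementation: the real workflows use `contains(...)` expressions in their `if:` conditions, whereas the hypothetical `route_comment` below only inspects the first word after the mention.

```python
# Commands handled by ai-comment-reply.yml, taken from the list above.
COMMANDS = (
    "help", "explain", "suggest", "security", "summarize",
    "changelog", "explain-diff", "review-again", "setup-labels",
)


def route_comment(body: str, mention: str = "@codebot"):
    """Return the workflow file that should handle a comment, or None."""
    if mention not in body:
        return None
    # Words following the first mention, e.g. ["explain-diff", "please"].
    words = body.split(mention, 1)[1].split()
    command = words[0].lower() if words else ""
    if command == "triage":
        return "ai-issue-triage.yml"
    if command in COMMANDS:
        return "ai-comment-reply.yml"
    return "ai-chat.yml"  # free-form question


routes = [
    route_comment("@codebot triage"),
    route_comment("@codebot explain-diff"),
    route_comment("@codebot how does auth work?"),
    route_comment("looks good to me"),
]
```

Exactly one workflow (or none) is selected per comment, which is the mutual-exclusion property the workflow conditions are meant to guarantee.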

**CRITICAL: Bot Self-Trigger Prevention**

All workflows include `github.event.comment.user.login != 'Bartender'` to prevent infinite loops. Without this check:
- Bot posts comment mentioning `@codebot`
- Workflow triggers, bot posts another comment with `@codebot`
- Triggers again infinitely → 10+ duplicate runs

**If you change the bot username**, update all three workflow files:
- `.gitea/workflows/ai-comment-reply.yml`
- `.gitea/workflows/ai-chat.yml`
- `.gitea/workflows/ai-issue-triage.yml`

Look for: `github.event.comment.user.login != 'Bartender'` and replace `'Bartender'` with your bot's username.

**Note**: Issue triage is now **opt-in** via `@codebot triage` command, not automatic on issue creation.

Key workflow pattern:
1. Checkout repository
2. Setup Python 3.11
3. Install dependencies (`pip install requests pyyaml`)
4. Set environment variables
5. Run `python main.py <command> <args>`

## Prompt Templates

Prompts are stored in `tools/ai-review/prompts/` as Markdown files:

- `base.md` - Base instructions for all reviews
- `pr_summary.md` - PR summary generation template
- `changelog.md` - Keep a Changelog format generation template
- `explain_diff.md` - Plain-language diff explanation template
- `issue_triage.md` - Issue classification template
- `issue_response.md` - Issue response template

**Important**: JSON examples in prompts must use **double curly braces** (`{{` and `}}`) to escape Python's `.format()` method. This is tested in `tests/test_ai_review.py::TestPromptFormatting`.
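
A short demonstration of why the escaping matters. The prompt text here is made up for the example; the real templates live in `prompts/*.md`.

```python
# Hypothetical prompt text: {title} is a placeholder, the JSON braces are escaped.
template = (
    "Summarize the issue titled: {title}\n"
    "Respond with JSON shaped like:\n"
    '{{"summary": "...", "priority": "high"}}\n'
)
rendered = template.format(title="Login fails on Safari")

# With single braces, str.format() reads the JSON body as a replacement
# field and raises KeyError because no such argument was supplied.
broken = 'Respond with: {"summary": "..."}'
try:
    broken.format(title="x")
    failed = False
except KeyError:
    failed = True
```

After formatting, `rendered` contains the JSON example with single braces, while the unescaped template fails before the prompt ever reaches the LLM.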

## Code Patterns

### Creating a New Agent

```python
from agents.base_agent import BaseAgent, AgentContext, AgentResult


class MyAgent(BaseAgent):
    def can_handle(self, event_type: str, event_data: dict) -> bool:
        # Check if agent is enabled in config
        if not self.config.get("agents", {}).get("my_agent", {}).get("enabled", True):
            return False
        return event_type == "my_event_type"

    def execute(self, context: AgentContext) -> AgentResult:
        # Load prompt template
        prompt = self.load_prompt("my_prompt")
        formatted = prompt.format(data=context.event_data.get("field"))

        # Call LLM with rate limiting
        response = self.call_llm(formatted)

        # Issue/PR number to comment on (assumes the payload carries issue.number)
        issue_index = context.event_data["issue"]["number"]

        # Post comment to issue/PR
        self.upsert_comment(
            context.owner,
            context.repo,
            issue_index,
            response.content
        )

        return AgentResult(
            success=True,
            message="Agent completed",
            actions_taken=["Posted comment"]
        )
```

### Calling LLM with Tools

```python
messages = [
    {"role": "system", "content": "You are a helpful assistant"},
    {"role": "user", "content": "Search for authentication code"}
]

tools = [{
    "type": "function",
    "function": {
        "name": "search_code",
        "description": "Search codebase",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"]
        }
    }
}]

response = self.llm.call_with_tools(messages, tools=tools)

if response.tool_calls:
    for tc in response.tool_calls:
        result = execute_tool(tc.name, tc.arguments)
        messages.append({
            "role": "tool",
            "tool_call_id": tc.id,
            "content": result
        })
```

### Adding Security Rules

Edit `tools/ai-review/security/security_scanner.py` or create `security/security_rules.yml`:

```yaml
rules:
  - id: SEC018
    name: Custom Rule Name
    pattern: 'regex_pattern_here'
    severity: HIGH  # HIGH, MEDIUM, LOW
    category: A03:2021 Injection
    cwe: CWE-XXX
    description: What this detects
    recommendation: How to fix it
```

## Security Best Practices

**CRITICAL**: Always follow these security guidelines when modifying workflows or handling webhook data.

### Workflow Security Rules

1. **Never pass full webhook data to environment variables**
   ```yaml
   # ❌ NEVER DO THIS
   env:
     EVENT_DATA: ${{ toJSON(github.event) }}  # Exposes emails, tokens, etc.

   # ✅ ALWAYS DO THIS
   run: |
     EVENT_DATA=$(cat <<EOF
     {
       "issue": {"number": ${{ github.event.issue.number }}},
       "comment": {"body": $(echo '${{ github.event.comment.body }}' | jq -Rs .)}
     }
     EOF
     )
     python utils/safe_dispatch.py issue_comment "$REPO" "$EVENT_DATA"
   ```

2. **Always validate repository format**
   ```bash
   # Validate before use
   if ! echo "$REPO" | grep -qE '^[a-zA-Z0-9_-]+/[a-zA-Z0-9_-]+$'; then
     echo "Error: Invalid repository format"
     exit 1
   fi
   ```

3. **Use safe_dispatch.py for webhook processing**
   ```bash
   # Instead of inline Python with os.environ, use:
   python utils/safe_dispatch.py issue_comment owner/repo "$EVENT_JSON"
   ```

### Input Validation

Always use `webhook_sanitizer.py` utilities:

```python
from utils.webhook_sanitizer import (
    sanitize_webhook_data,       # Remove sensitive fields
    validate_repository_format,  # Validate owner/repo format
    extract_minimal_context,     # Extract only necessary fields
)

# Validate repository input
owner, repo = validate_repository_format(repo_string)  # Raises ValueError if invalid

# Sanitize webhook data
sanitized = sanitize_webhook_data(raw_event_data)

# Extract minimal context (reduces attack surface)
minimal = extract_minimal_context(event_type, sanitized)
```
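
For intuition, a repository-format check equivalent to the shell `grep` rule shown earlier might look like this. `validate_repo` is a hypothetical stand-in written for this example; the real `validate_repository_format` in `webhook_sanitizer.py` may accept a different character set or perform extra checks.

```python
import re

# Same character class as the shell check: letters, digits, "_" and "-".
_REPO_RE = re.compile(r"[A-Za-z0-9_-]+/[A-Za-z0-9_-]+")


def validate_repo(repo_string: str) -> tuple:
    """Return (owner, repo) or raise ValueError for anything else."""
    if not _REPO_RE.fullmatch(repo_string):
        raise ValueError(f"Invalid repository format: {repo_string!r}")
    owner, repo = repo_string.split("/")
    return owner, repo


owner, repo = validate_repo("owner/repo")
```

Using `fullmatch` rejects inputs with trailing shell metacharacters, path traversal segments, or extra slashes. Note that this pattern also rejects names containing dots.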

### Pre-commit Security Scanning

Install pre-commit hooks to catch security issues before commit:

```bash
# Install pre-commit
pip install pre-commit

# Install hooks
pre-commit install

# Run manually
pre-commit run --all-files
```

The hooks will:
- Scan Python files for security vulnerabilities
- Validate workflow files for security anti-patterns
- Detect hardcoded secrets
- Run security scanner on code changes

### Security Resources

- **SECURITY.md** - Complete security guidelines and best practices
- **tools/ai-review/utils/webhook_sanitizer.py** - Input validation utilities
- **tools/ai-review/utils/safe_dispatch.py** - Safe webhook dispatch wrapper
- **.pre-commit-config.yaml** - Pre-commit hook configuration

## Testing

The test suite covers:

1. **Prompt Formatting** (`tests/test_ai_review.py`) - Ensures prompts don't have unescaped `{}` that break `.format()`
2. **Module Imports** - Verifies all modules can be imported
3. **Security Scanner** - Tests pattern detection and false positive rate
4. **Agent Context** - Tests dataclass creation and validation
5. **Security Utilities** (`tests/test_security_utils.py`) - Tests webhook sanitization, validation, and safe dispatch
6. **Safe Dispatch** (`tests/test_safe_dispatch.py`) - Tests secure event dispatching
7. **Metrics** - Tests enterprise metrics collection

Run specific test classes:
```bash
pytest tests/test_ai_review.py::TestPromptFormatting -v
pytest tests/test_ai_review.py::TestSecurityScanner -v
```

## Common Development Tasks

### PR Summary Generation

The PR summary feature automatically generates comprehensive summaries for pull requests.

**Key Features:**
- Auto-generates summary for PRs with empty descriptions
- Can be manually triggered with `@codebot summarize` in PR comments
- Analyzes diff to extract key changes, files affected, and impact
- Categorizes change type (Feature/Bugfix/Refactor/Documentation/Testing)
- Posts as comment or updates PR description (configurable)

**Implementation Details:**

1. **Auto-Summary on PR Open** - `PRAgent.execute()`:
   - Checks if PR body is empty and `auto_summary.enabled` is true
   - Calls `_generate_pr_summary()` automatically
   - Continues with normal PR review after posting summary

2. **Manual Trigger** - `@codebot summarize` in PR comments:
   - `PRAgent.can_handle()` detects `summarize` command in PR comments
   - Routes to `_handle_summarize_command()`
   - Generates and posts summary on demand

3. **Summary Generation** - `_generate_pr_summary()`:
   - Fetches PR diff using `_get_diff()`
   - Loads `prompts/pr_summary.md` template
   - Calls LLM with diff to analyze changes
   - Returns structured JSON with summary data
   - Formats using `_format_pr_summary()`
   - Posts as comment or updates description based on config

4. **Configuration** - `config.yml`:
   ```yaml
   agents:
     pr:
       auto_summary:
         enabled: true          # Auto-generate for empty PRs
         post_as_comment: true  # true = comment, false = update description
   ```

**Summary Structure:**
- Brief 2-3 sentence overview
- Change type categorization (Feature/Bugfix/Refactor/etc)
- Key changes (Added/Modified/Removed)
- Files affected with descriptions
- Impact assessment (scope: small/medium/large)

**Common Use Cases:**
- Developers who forget to write PR descriptions
- Quick understanding of complex changes
- Standardized documentation format
- Pre-review context for reviewers

### PR Changelog Generation

The `@codebot changelog` command generates Keep a Changelog format entries from PR diffs.

**Key Features:**
- Generates structured changelog entries following Keep a Changelog format
- Categorizes changes: Added/Changed/Deprecated/Removed/Fixed/Security
- Automatically detects breaking changes
- Includes technical details (files changed, LOC, components)
- Output is ready to copy-paste into CHANGELOG.md

**Implementation Details:**

1. **Command Handler** - `PRAgent._handle_changelog_command()`:
   - Triggered by `@codebot changelog` in PR comments
   - Fetches PR title, description, and diff
   - Loads `prompts/changelog.md` template
   - Formats prompt with PR context

2. **LLM Analysis** - Generates structured JSON:
   ```json
   {
     "changelog": {
       "added": ["New features"],
       "changed": ["Changes to existing functionality"],
       "fixed": ["Bug fixes"],
       "security": ["Security fixes"]
     },
     "breaking_changes": ["Breaking changes"],
     "technical_details": {
       "files_changed": 15,
       "insertions": 450,
       "deletions": 120,
       "main_components": ["auth/", "api/"]
     }
   }
   ```

3. **Formatting** - `_format_changelog()`:
   - Converts JSON to Keep a Changelog markdown format
   - Uses emojis for visual categorization (✨ Added, 🔄 Changed, 🐛 Fixed)
   - Highlights breaking changes prominently
   - Includes technical summary at the end
   - Omits empty sections for clean output

4. **Prompt Engineering** - `prompts/changelog.md`:
   - User-focused language (not developer jargon)
   - Filters noise (formatting, typos, minor refactoring)
   - Groups related changes
   - Active voice, concise entries
   - Maximum 100 characters per entry
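
As a rough illustration of the formatting step, the sketch below turns the JSON shape shown above into Markdown and skips empty sections. It is not the project's `_format_changelog()`: the function name, heading levels, and the wording of the technical summary line are assumptions, and only the three emojis named above are used.

```python
# Section order follows Keep a Changelog; headings without an emoji are assumptions.
SECTIONS = [
    ("added", "✨ Added"),
    ("changed", "🔄 Changed"),
    ("deprecated", "Deprecated"),
    ("removed", "Removed"),
    ("fixed", "🐛 Fixed"),
    ("security", "Security"),
]


def format_changelog(data: dict) -> str:
    lines = []
    breaking = data.get("breaking_changes") or []
    if breaking:
        # Breaking changes go first so they are hard to miss.
        lines.append("### Breaking Changes")
        lines += [f"- {item}" for item in breaking]
        lines.append("")
    changelog = data.get("changelog") or {}
    for key, heading in SECTIONS:
        entries = changelog.get(key) or []
        if not entries:
            continue  # omit empty sections
        lines.append(f"### {heading}")
        lines += [f"- {entry}" for entry in entries]
        lines.append("")
    details = data.get("technical_details")
    if details:
        lines.append(
            f"_{details.get('files_changed', 0)} files changed, "
            f"+{details.get('insertions', 0)} / -{details.get('deletions', 0)}_"
        )
    return "\n".join(lines).strip()


example = format_changelog({
    "changelog": {"added": ["New login page"], "changed": [], "fixed": ["Crash on save"]},
    "breaking_changes": [],
    "technical_details": {"files_changed": 2, "insertions": 10, "deletions": 3},
})
```

In `example`, the empty `changed` list and the empty `breaking_changes` list produce no headings at all, which keeps the posted comment compact.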

**Common Use Cases:**
- Preparing release notes
- Maintaining CHANGELOG.md
- Customer-facing announcements
- Version documentation

**Workflow Safety:**
- Only triggers on PR comments (not issue comments)
- Included in ai-comment-reply.yml workflow conditions
- Excluded from ai-chat.yml to prevent duplicate runs
- No automatic triggering - manual command only

### Code Diff Explainer

The `@codebot explain-diff` command translates technical code changes into plain language for non-technical stakeholders.

**Key Features:**
- Plain-language explanations without jargon
- File-by-file breakdown with "what" and "why" context
- Architecture impact analysis
- Breaking change detection
- Perfect for PMs, designers, and new team members

**Implementation Details:**

1. **Command Handler** - `PRAgent._handle_explain_diff_command()`:
   - Triggered by `@codebot explain-diff` in PR comments
   - Fetches PR title, description, and full diff
   - Loads `prompts/explain_diff.md` template
   - Formats prompt with PR context

2. **LLM Analysis** - Generates plain-language JSON:
   ```json
   {
     "overview": "High-level summary in everyday language",
     "key_changes": [
       {
         "file": "path/to/file.py",
         "status": "new|modified|deleted",
         "explanation": "What changed (no jargon)",
         "why_it_matters": "Business/user impact"
       }
     ],
     "architecture_impact": {
       "description": "System-level effects explained simply",
       "new_dependencies": ["External libraries added"],
       "affected_components": ["System parts impacted"]
     },
     "breaking_changes": ["User-facing breaking changes"],
     "technical_details": { /* Stats for reference */ }
   }
   ```

3. **Formatting** - `_format_diff_explanation()`:
   - Converts JSON to readable markdown
   - Uses emojis for visual categorization (➕ new, 📝 modified, 🗑️ deleted)
   - Highlights breaking changes prominently
   - Includes technical summary for developers
   - Omits empty sections for clean output

4. **Prompt Engineering** - `prompts/explain_diff.md`:
   - **Avoids jargon**: "API" → "connection point between systems"
   - **Explains why**: Not just what changed, but why it matters
   - **Uses analogies**: "Caching" → "memory system for faster loading"
   - **Focus on impact**: Who is affected and how
   - **Groups changes**: Combines related files into themes
   - **Translates concepts**: Technical terms → everyday language

**Plain Language Rules:**
- ❌ "Refactored authentication middleware" → ✅ "Updated login system for better security"
- ❌ "Implemented Redis caching" → ✅ "Added memory to make pages load 10x faster"
- ❌ "Database migration" → ✅ "Updated how data is stored"

**Common Use Cases:**
- New team members understanding large PRs
- Non-technical reviewers (PMs, designers) reviewing features
- Documenting architectural decisions
- Learning from other developers' code

**Workflow Safety:**
- Only triggers on PR comments (not issue comments)
- Included in ai-comment-reply.yml workflow conditions
- Excluded from ai-chat.yml to prevent duplicate runs
- No automatic triggering - manual command only

### Review-Again Command Implementation

The `@codebot review-again` command allows manual re-triggering of PR reviews without new commits.

**Key Features:**
- Detects `@codebot review-again` in PR comments (not issue comments)
- Compares new review with previous review to show resolved/new issues
- Updates existing AI review comment instead of creating duplicates
- Updates PR labels based on new severity assessment

**Implementation Details:**

1. **PRAgent.can_handle()** - Handles `issue_comment` events on PRs containing "review-again"
2. **PRAgent._handle_review_again()** - Main handler that:
   - Fetches previous review comment
   - Re-runs full PR review (security scan + AI analysis)
   - Compares findings using `_compare_reviews()`
   - Generates diff report with `_format_review_update()`
   - Updates comment and labels

3. **Review Comparison** - Uses finding keys (file:line:description) to match issues:
   - **Resolved**: Issues in previous but not in current review
   - **New**: Issues in current but not in previous review
   - **Still Present**: Issues in both reviews
   - **Severity Changed**: Same issue with different severity

4. **Workflow Integration** - `.gitea/workflows/ai-comment-reply.yml`:
   - Detects if comment is on PR or issue
   - Uses `dispatch` command for PRs to route to PRAgent
   - Preserves backward compatibility with issue commands
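
The key-based comparison in step 3 can be sketched as below. The dictionary shape of a finding and the names `finding_key` and `compare_reviews` are assumptions for this example; the real `_compare_reviews()` may store findings differently. Here "still present" means every finding found in both reviews, and "severity changed" is the subset whose severity differs.

```python
def finding_key(finding: dict) -> str:
    # file:line:description, as described in the list above
    return f"{finding['file']}:{finding['line']}:{finding['description']}"


def compare_reviews(previous: list, current: list) -> dict:
    prev = {finding_key(f): f for f in previous}
    curr = {finding_key(f): f for f in current}
    shared = [key for key in curr if key in prev]
    return {
        "resolved": [prev[k] for k in prev if k not in curr],
        "new": [curr[k] for k in curr if k not in prev],
        "still_present": [curr[k] for k in shared],
        "severity_changed": [
            curr[k] for k in shared if curr[k]["severity"] != prev[k]["severity"]
        ],
    }


previous_review = [
    {"file": "a.py", "line": 1, "description": "SQL injection", "severity": "HIGH"},
    {"file": "b.py", "line": 2, "description": "Weak hash", "severity": "LOW"},
]
current_review = [
    {"file": "a.py", "line": 1, "description": "SQL injection", "severity": "MEDIUM"},
    {"file": "c.py", "line": 3, "description": "Hardcoded secret", "severity": "LOW"},
]
report = compare_reviews(previous_review, current_review)
```

One consequence of keying on `file:line:description`: if code above a finding is edited and its line number shifts, the same issue shows up as one "resolved" and one "new" entry.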

**Usage:**
```bash
# In a PR comment:
@codebot review-again
```

**Common Use Cases:**
- Re-evaluate after explaining false positives in comments
- Test new `.ai-review.yml` configuration
- Update severity after code clarification
- Faster iteration without empty commits

### Adding a New Command to @codebot

1. Add command to `config.yml` under `interaction.commands`
2. Add handler method in `IssueAgent` (e.g., `_command_yourcommand()`)
3. Update `_handle_command()` to route the command to your handler
4. Update README.md with command documentation
5. Add tests in `tests/test_ai_review.py`

Example commands:
- `@codebot help` - Show all available commands with examples
- `@codebot triage` - Full issue triage with labeling
- `@codebot explain` - Explain the issue
- `@codebot suggest` - Suggest solutions
- `@codebot summarize` - Generate PR summary or issue summary (works on both)
- `@codebot changelog` - Generate Keep a Changelog format entries (PR comments only)
- `@codebot explain-diff` - Explain code changes in plain language (PR comments only)
- `@codebot setup-labels` - Automatic label setup (built-in, not in config)
- `@codebot review-again` - Re-run PR review without new commits (PR comments only)

### Changing the Bot Name

1. Edit `config.yml`: `interaction.mention_prefix: "@newname"`
2. Update all Gitea workflow files in `.gitea/workflows/` (search for `contains(github.event.comment.body`)
3. Update README.md and documentation

### Supporting a New LLM Provider

1. Create provider class in `clients/llm_client.py` inheriting from `BaseLLMProvider`
2. Implement `call()` and optionally `call_with_tools()`
3. Register in `LLMClient.PROVIDERS` dict
4. Add model config to `config.yml`
5. Document in README.md

## Repository Labels

### Automatic Label Setup (Recommended)

Use the `@codebot setup-labels` command to automatically configure labels. This command:

**For repositories with existing labels:**
- Detects naming patterns: `Kind/Bug`, `Priority - High`, `type: bug`
- Maps existing labels to OpenRabbit schema using aliases
- Creates only missing labels following detected pattern
- Zero duplicate labels

**For fresh repositories:**
- Creates OpenRabbit's default label set
- Uses standard naming: `type:`, `priority:`, status labels

**Example with existing `Kind/` and `Priority -` labels:**
```
@codebot setup-labels

✅ Found 18 existing labels with pattern: prefix_slash

Proposed Mapping:
| OpenRabbit Expected | Your Existing Label | Status |
|---------------------|---------------------|--------|
| type: bug | Kind/Bug | ✅ Map |
| type: feature | Kind/Feature | ✅ Map |
| priority: high | Priority - High | ✅ Map |
| ai-reviewed | (missing) | ⚠️ Create |

✅ Created Kind/Question
✅ Created Status - AI Reviewed

Setup Complete! Auto-labeling will use your existing label schema.
```

### Manual Label Setup

The system expects these labels to exist in repositories for auto-labeling:

- `priority: critical`, `priority: high`, `priority: medium`, `priority: low`
- `type: bug`, `type: feature`, `type: question`, `type: documentation`, `type: security`, `type: testing`
- `ai-approved`, `ai-changes-required`, `ai-reviewed`

Labels are mapped in `config.yml` under the `labels` section.

### Label Configuration Format

Labels support two formats for backwards compatibility:

**New format (with colors and aliases):**
```yaml
labels:
  type:
    bug:
      name: "type: bug"
      color: "d73a4a"  # Red
      description: "Something isn't working"
      aliases: ["Kind/Bug", "bug", "Type: Bug"]  # For auto-detection
```

**Old format (strings only):**
```yaml
labels:
  type:
    bug: "type: bug"  # Still works, uses default blue color
```
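
One way to support both formats is to normalize every entry to the new shape when the config is loaded. The sketch below assumes the YAML has already been parsed into Python objects; `normalize_label` is a hypothetical helper, and the `DEFAULT_COLOR` hex value is a placeholder because the actual default blue is not specified here.

```python
DEFAULT_COLOR = "1d76db"  # placeholder for the "default blue" (assumption)


def normalize_label(entry) -> dict:
    """Accept either the old string format or the new dict format."""
    if isinstance(entry, str):
        return {"name": entry, "color": DEFAULT_COLOR, "description": "", "aliases": []}
    return {
        "name": entry["name"],
        "color": entry.get("color", DEFAULT_COLOR),
        "description": entry.get("description", ""),
        "aliases": list(entry.get("aliases", [])),
    }


old_style = normalize_label("type: bug")
new_style = normalize_label({
    "name": "type: bug",
    "color": "d73a4a",
    "description": "Something isn't working",
    "aliases": ["Kind/Bug", "bug", "Type: Bug"],
})
```

After normalization the rest of the code can rely on `name`, `color`, `description`, and `aliases` always being present.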

### Label Pattern Detection

The `setup-labels` command detects these patterns (configured in `label_patterns`):

1. **prefix_slash**: `Kind/Bug`, `Type/Feature`, `Category/X`
2. **prefix_dash**: `Priority - High`, `Status - Blocked`
3. **colon**: `type: bug`, `priority: high`

When creating missing labels, the bot follows the detected pattern to maintain consistency.
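
Pattern detection of this kind can be approximated by matching each existing label against one regex per pattern and picking the most frequent match. The regexes and the `detect_label_pattern` helper below are illustrative assumptions; the project's actual patterns are configured under `label_patterns`.

```python
import re
from collections import Counter

# Illustrative regexes (assumptions), one per pattern named above.
PATTERNS = {
    "prefix_slash": re.compile(r"^[A-Za-z]+/\S"),
    "prefix_dash": re.compile(r"^[A-Za-z]+ - \S"),
    "colon": re.compile(r"^[A-Za-z]+: \S"),
}


def detect_label_pattern(label_names: list):
    """Return the most common naming pattern, or None if nothing matches."""
    counts = Counter()
    for name in label_names:
        for pattern_name, regex in PATTERNS.items():
            if regex.match(name):
                counts[pattern_name] += 1
                break
    return counts.most_common(1)[0][0] if counts else None


mixed = detect_label_pattern(["Kind/Bug", "Kind/Feature", "Priority - High"])
colon = detect_label_pattern(["type: bug", "priority: high"])
plain = detect_label_pattern(["bug", "wontfix"])
```

With mixed conventions the majority wins, so `mixed` is `prefix_slash`; a repository whose labels match none of the patterns yields `None`, the case where the default label set would be created.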