# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Overview

OpenRabbit is an enterprise-grade AI code review system for Gitea (and GitHub). It provides automated PR review, issue triage, interactive chat, and codebase analysis through a collection of specialized AI agents.

## Commands

### Development

```bash
# Run tests
pytest tests/ -v

# Run specific test file
pytest tests/test_ai_review.py -v

# Install dependencies
pip install -r tools/ai-review/requirements.txt

# Run a PR review locally
cd tools/ai-review
python main.py pr owner/repo 123

# Run issue triage
python main.py issue owner/repo 456

# Test chat functionality
python main.py chat owner/repo "How does authentication work?"

# Run with custom config
python main.py pr owner/repo 123 --config /path/to/config.yml
```

### Testing Workflows

```bash
# Validate workflow YAML syntax
python -c "import yaml; yaml.safe_load(open('.github/workflows/ai-review.yml'))"

# Test security scanner
python -c "from security.security_scanner import SecurityScanner; s = SecurityScanner(); print(list(s.scan_content('password = \"secret123\"', 'test.py')))"

# Test webhook sanitization
cd tools/ai-review
python -c "from utils.webhook_sanitizer import sanitize_webhook_data; print(sanitize_webhook_data({'user': {'email': 'test@example.com'}}))"

# Test safe dispatch
python utils/safe_dispatch.py issue_comment owner/repo '{"action": "created", "issue": {"number": 1}, "comment": {"body": "test"}}'
```

## Architecture

### Agent System

The codebase uses an **agent-based architecture** where specialized agents handle different types of events:

1. **BaseAgent** (`agents/base_agent.py`) - Abstract base class providing:
   - Gitea API client integration
   - LLM client integration with rate limiting
   - Common comment management (upsert, find AI comments)
   - Prompt loading from `prompts/` directory
   - Standard execution flow with error handling
2. **Specialized Agents** - Each agent implements:
   - `can_handle(event_type, event_data)` - Determines whether the agent should process the event
   - `execute(context)` - Main execution logic
   - Returns `AgentResult` with success status, message, data, and actions taken

   The agents are:
   - **PRAgent** - Reviews pull requests with inline comments and security scanning
   - **IssueAgent** - Triages issues and responds to @ai-bot commands
   - **CodebaseAgent** - Analyzes overall codebase health and tech debt
   - **ChatAgent** - Interactive assistant with tool calling (search_codebase, read_file, search_web)

3. **Dispatcher** (`dispatcher.py`) - Routes events to the appropriate agents:
   - Registers agents at startup
   - Determines which agents can handle each event
   - Executes agents (supports concurrent execution)
   - Returns aggregated results

### Multi-Provider LLM Client

The `LLMClient` (`clients/llm_client.py`) provides a unified interface for multiple LLM providers:

- **OpenAI** - Primary provider (gpt-4.1-mini default)
- **OpenRouter** - Multi-provider access (claude-3.5-sonnet)
- **Ollama** - Self-hosted models (codellama:13b)

Key features:

- Tool/function calling support via `call_with_tools(messages, tools)`
- JSON response parsing with fallback extraction
- Provider-specific configuration via `config.yml`

### Platform Abstraction

The `GiteaClient` (`clients/gitea_client.py`) provides a unified REST API client for **Gitea** (also compatible with the GitHub API):

- Issue operations (create, update, list, get, comments, labels)
- PR operations (get, diff, files, reviews)
- Repository operations (get repo, file contents, branches)

Environment variables:

- `AI_REVIEW_API_URL` - API base URL (e.g., `https://api.github.com` or `https://gitea.example.com/api/v1`)
- `AI_REVIEW_TOKEN` - Authentication token

### Security Scanner

The `SecurityScanner` (`security/security_scanner.py`) uses **pattern-based detection** with 17 built-in rules covering:

- OWASP Top 10 categories (A01-A10)
- Common vulnerabilities (SQL injection,
  XSS, hardcoded secrets, weak crypto)
- Returns `SecurityFinding` objects with severity (HIGH/MEDIUM/LOW), CWE references, and recommendations

It can scan:

- File content via `scan_content(content, filename)`
- Git diffs via `scan_diff(diff)` - only scans added lines

### Chat Agent Tool Calling

The `ChatAgent` implements an **iterative tool calling loop**:

1. Send the user message + system prompt to the LLM with the available tools
2. If the LLM returns tool calls, execute each tool and append the results to the conversation
3. Repeat until the LLM returns a final response (max 5 iterations)

Available tools:

- `search_codebase` - Searches repository files and code patterns
- `read_file` - Reads specific file contents (truncated at 8KB)
- `search_web` - Queries a SearXNG instance (requires `SEARXNG_URL`)

## Configuration

### Primary Config File: `tools/ai-review/config.yml`

Critical settings:

```yaml
provider: openai  # openai | openrouter | ollama

model:
  openai: gpt-4.1-mini
  openrouter: anthropic/claude-3.5-sonnet
  ollama: codellama:13b

interaction:
  mention_prefix: "@codebot"  # Bot trigger name - update workflows too!
  commands:
    - explain       # Explain what the issue is about
    - suggest       # Suggest solutions or next steps
    - security      # Security analysis
    - summarize     # Summarize the issue
    - triage        # Full triage with labeling
    - review-again  # Re-run PR review (PR comments only)

review:
  fail_on_severity: HIGH  # Fail CI if HIGH severity issues found
  max_diff_lines: 800     # Skip review if diff too large

agents:
  chat:
    max_iterations: 5  # Tool calling loop limit
```

**Important**: When changing `mention_prefix`, also update all workflow files in `.gitea/workflows/`:

- `ai-comment-reply.yml`
- `ai-chat.yml`
- `ai-issue-triage.yml`

Look for `if: contains(github.event.comment.body, '@codebot')` and update it to your new bot name.
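After renaming the bot, a small script along these lines can flag workflow conditions still wired to the old prefix. This is a hypothetical helper (`find_stale_mentions` is not part of the repo), shown only to illustrate the consistency check:

```python
import re


def find_stale_mentions(workflow_text: str, prefix: str) -> list[str]:
    """Return `contains(...)` conditions whose mention string differs from `prefix`."""
    stale = []
    # Match conditions like: contains(github.event.comment.body, '@codebot')
    for m in re.finditer(
        r"contains\(github\.event\.comment\.body,\s*'(@\w+)'\)", workflow_text
    ):
        if m.group(1) != prefix:
            stale.append(m.group(0))
    return stale


snippet = "if: contains(github.event.comment.body, '@codebot')"
print(find_stale_mentions(snippet, "@mybot"))    # flags the stale '@codebot' condition
print(find_stale_mentions(snippet, "@codebot"))  # []
```

Running this over each file in `.gitea/workflows/` would surface any trigger missed during a rename.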
Current bot name: `@codebot`

### Environment Variables

Required:

- `AI_REVIEW_API_URL` - Platform API URL
- `AI_REVIEW_TOKEN` - Bot authentication token
- `OPENAI_API_KEY` - OpenAI API key (or provider-specific key)

Optional:

- `SEARXNG_URL` - SearXNG instance for web search
- `OPENROUTER_API_KEY` - OpenRouter API key
- `OLLAMA_HOST` - Ollama server URL

## Workflow Architecture

Workflows are located in `.gitea/workflows/` and are **mutually exclusive** to prevent duplicate runs:

- **enterprise-ai-review.yml** - Triggered on PR open/sync
- **ai-issue-triage.yml** - Triggered ONLY on `@codebot triage` in comments
- **ai-comment-reply.yml** - Triggered on specific commands: `help`, `explain`, `suggest`, `security`, `summarize`, `review-again`, `setup-labels`
- **ai-chat.yml** - Triggered on `@codebot` mentions that are NOT specific commands (free-form questions)
- **ai-codebase-review.yml** - Scheduled weekly analysis

**Workflow Routing Logic:**

1. If a comment contains `@codebot triage` → ai-issue-triage.yml only
2. If a comment contains a specific command (e.g., `@codebot help`) → ai-comment-reply.yml only
3. If a comment contains `@codebot ` with no command → ai-chat.yml only

This prevents the issue where all three workflows would trigger on every `@codebot` mention, causing massive duplication.

**Note**: Issue triage is now **opt-in** via the `@codebot triage` command, not automatic on issue creation.

Key workflow pattern:

1. Checkout the repository
2. Set up Python 3.11
3. Install dependencies (`pip install requests pyyaml`)
4. Set environment variables
5. Run `python main.py <command> <args>`

## Prompt Templates

Prompts are stored in `tools/ai-review/prompts/` as Markdown files:

- `base.md` - Base instructions for all reviews
- `pr_summary.md` - PR summary generation template
- `issue_triage.md` - Issue classification template
- `issue_response.md` - Issue response template

**Important**: JSON examples in prompts must use **double curly braces** (`{{` and `}}`) to escape Python's `.format()` method. This is tested in `tests/test_ai_review.py::TestPromptFormatting`.

## Code Patterns

### Creating a New Agent

```python
from agents.base_agent import BaseAgent, AgentContext, AgentResult


class MyAgent(BaseAgent):
    def can_handle(self, event_type: str, event_data: dict) -> bool:
        # Check if the agent is enabled in config
        if not self.config.get("agents", {}).get("my_agent", {}).get("enabled", True):
            return False
        return event_type == "my_event_type"

    def execute(self, context: AgentContext) -> AgentResult:
        # Load prompt template
        prompt = self.load_prompt("my_prompt")
        formatted = prompt.format(data=context.event_data.get("field"))

        # Call LLM with rate limiting
        response = self.call_llm(formatted)

        # Post comment to issue/PR
        self.upsert_comment(
            context.owner, context.repo, issue_index, response.content
        )

        return AgentResult(
            success=True,
            message="Agent completed",
            actions_taken=["Posted comment"],
        )
```

### Calling LLM with Tools

```python
messages = [
    {"role": "system", "content": "You are a helpful assistant"},
    {"role": "user", "content": "Search for authentication code"},
]

tools = [{
    "type": "function",
    "function": {
        "name": "search_code",
        "description": "Search codebase",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
}]

response = self.llm.call_with_tools(messages, tools=tools)

if response.tool_calls:
    for tc in response.tool_calls:
        result = execute_tool(tc.name, tc.arguments)
        messages.append({
            "role": "tool",
            "tool_call_id": tc.id,
            "content": result,
        })
```

### Adding Security Rules

Edit
`tools/ai-review/security/security_scanner.py` or create `security/security_rules.yml`:

```yaml
rules:
  - id: SEC018
    name: Custom Rule Name
    pattern: 'regex_pattern_here'
    severity: HIGH  # HIGH, MEDIUM, LOW
    category: A03:2021 Injection
    cwe: CWE-XXX
    description: What this detects
    recommendation: How to fix it
```

## Security Best Practices

**CRITICAL**: Always follow these security guidelines when modifying workflows or handling webhook data.

### Workflow Security Rules

1. **Never pass full webhook data to environment variables**

   ```yaml
   # ❌ NEVER DO THIS
   env:
     EVENT_DATA: ${{ toJSON(github.event) }}  # Exposes emails, tokens, etc.

   # ✅ ALWAYS DO THIS
   run: |
     EVENT_DATA=$(cat "$GITHUB_EVENT_PATH")
   ```
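The same file-based pattern can be applied on the Python side before event data is used anywhere. This is a minimal sketch, not the repo's actual `sanitize_webhook_data` implementation (which should be preferred); the key set here is purely illustrative:

```python
import json
import os

# Illustrative only - the real sanitizer in utils/webhook_sanitizer.py
# defines its own list of sensitive fields.
SENSITIVE_KEYS = {"email", "token", "private_key"}


def strip_sensitive(data):
    """Recursively drop keys that commonly contain PII or credentials."""
    if isinstance(data, dict):
        return {k: strip_sensitive(v) for k, v in data.items() if k not in SENSITIVE_KEYS}
    if isinstance(data, list):
        return [strip_sensitive(v) for v in data]
    return data


def load_event(path=None):
    """Read the webhook payload from disk instead of interpolating it into env."""
    path = path or os.environ["GITHUB_EVENT_PATH"]
    with open(path) as f:
        return strip_sensitive(json.load(f))


print(strip_sensitive({"user": {"email": "test@example.com", "login": "alice"}}))
# {'user': {'login': 'alice'}}
```

This mirrors the `sanitize_webhook_data` smoke test shown under Testing Workflows: sensitive fields are removed before the payload reaches logs, prompts, or comments.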