CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Overview

OpenRabbit is an enterprise-grade AI code review system for Gitea (and GitHub). It provides automated PR review, issue triage, interactive chat, and codebase analysis through a collection of specialized AI agents.

Commands

Development

# Run tests
pytest tests/ -v

# Run specific test file
pytest tests/test_ai_review.py -v

# Install dependencies
pip install -r tools/ai-review/requirements.txt

# Run a PR review locally
cd tools/ai-review
python main.py pr owner/repo 123

# Run issue triage
python main.py issue owner/repo 456

# Test chat functionality
python main.py chat owner/repo "How does authentication work?"

# Run with custom config
python main.py pr owner/repo 123 --config /path/to/config.yml

Testing Workflows

# Validate workflow YAML syntax
python -c "import yaml; yaml.safe_load(open('.gitea/workflows/enterprise-ai-review.yml'))"

# Test security scanner
python -c "from security.security_scanner import SecurityScanner; s = SecurityScanner(); print(list(s.scan_content('password = \"secret123\"', 'test.py')))"

# Test webhook sanitization
cd tools/ai-review
python -c "from utils.webhook_sanitizer import sanitize_webhook_data; print(sanitize_webhook_data({'user': {'email': 'test@example.com'}}))"

# Test safe dispatch
python utils/safe_dispatch.py issue_comment owner/repo '{"action": "created", "issue": {"number": 1}, "comment": {"body": "test"}}'

Architecture

Agent System

The codebase uses an agent-based architecture where specialized agents handle different types of events:

  1. BaseAgent (agents/base_agent.py) - Abstract base class providing:

    • Gitea API client integration
    • LLM client integration with rate limiting
    • Common comment management (upsert, find AI comments)
    • Prompt loading from prompts/ directory
    • Standard execution flow with error handling
  2. Specialized Agents - Each agent implements:

    • can_handle(event_type, event_data) - Determines if agent should process the event

    • execute(context) - Main execution logic

    • Returns AgentResult with success status, message, data, and actions taken

    • PRAgent - Reviews pull requests with inline comments and security scanning

    • IssueAgent - Triages issues and responds to @codebot commands

    • CodebaseAgent - Analyzes entire codebase health and tech debt

    • ChatAgent - Interactive assistant with tool calling (search_codebase, read_file, search_web)

  3. Dispatcher (dispatcher.py) - Routes events to appropriate agents:

    • Registers agents at startup
    • Determines which agents can handle each event
    • Executes agents (supports concurrent execution)
    • Returns aggregated results
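
This flow can be sketched in miniature (an illustrative toy, not the real dispatcher.py, which also handles concurrency and AgentResult aggregation):

```python
class MiniDispatcher:
    """Toy dispatcher illustrating the register/route/execute flow."""

    def __init__(self):
        self.agents = []

    def register(self, agent):
        # Called once per agent at startup
        self.agents.append(agent)

    def dispatch(self, event_type, event_data, context):
        # Ask each agent whether it handles this event, run those
        # that do, and return the collected results.
        return [
            agent.execute(context)
            for agent in self.agents
            if agent.can_handle(event_type, event_data)
        ]
```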

Multi-Provider LLM Client

The LLMClient (clients/llm_client.py) provides a unified interface for multiple LLM providers:

  • OpenAI - Primary provider (gpt-4.1-mini default)
  • OpenRouter - Multi-provider access (claude-3.5-sonnet)
  • Ollama - Self-hosted models (codellama:13b)

Key features:

  • Tool/function calling support via call_with_tools(messages, tools)
  • JSON response parsing with fallback extraction
  • Provider-specific configuration via config.yml
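
The fallback extraction can be sketched like this (a minimal assumption-based sketch, not the actual LLMClient parsing code):

```python
import json
import re

def parse_json_response(text):
    """Parse an LLM reply as JSON; if the model wrapped it in prose or
    code fences, fall back to extracting the first {...} span.
    (Sketch only; the real client's parsing may differ.)"""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        match = re.search(r"\{.*\}", text, re.DOTALL)
        if match:
            return json.loads(match.group(0))
        raise
```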

Platform Abstraction

The GiteaClient (clients/gitea_client.py) provides a unified REST API client for Gitea (also compatible with GitHub API):

  • Issue operations (create, update, list, get, comments, labels)
  • PR operations (get, diff, files, reviews)
  • Repository operations (get repo, file contents, branches)

Environment variables:

  • AI_REVIEW_API_URL - API base URL (e.g., https://api.github.com or https://gitea.example.com/api/v1)
  • AI_REVIEW_TOKEN - Authentication token

Security Scanner

The SecurityScanner (security/security_scanner.py) uses pattern-based detection with 17 built-in rules covering:

  • OWASP Top 10 categories (A01-A10)
  • Common vulnerabilities (SQL injection, XSS, hardcoded secrets, weak crypto)

Each match is returned as a SecurityFinding object with severity (HIGH/MEDIUM/LOW), a CWE reference, and a remediation recommendation.

Can scan:

  • File content via scan_content(content, filename)
  • Git diffs via scan_diff(diff) - only scans added lines
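
The pattern-based approach can be shown with a toy scanner (the two rules and the Finding fields below are invented for illustration; the real module ships 17 rules and richer SecurityFinding objects):

```python
import re
from dataclasses import dataclass

@dataclass
class Finding:
    rule_id: str
    severity: str
    line: int

# Two toy rules in the spirit of the built-in set (illustrative only)
RULES = [
    ("SEC001", "HIGH", re.compile(r"(password|secret|api_key)\s*=\s*['\"]")),
    ("SEC002", "MEDIUM", re.compile(r"\bmd5\s*\(")),
]

def scan_content(content):
    # Scan line by line; each rule match yields one finding
    findings = []
    for lineno, line in enumerate(content.splitlines(), start=1):
        for rule_id, severity, pattern in RULES:
            if pattern.search(line):
                findings.append(Finding(rule_id, severity, lineno))
    return findings
```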

Chat Agent Tool Calling

The ChatAgent implements an iterative tool calling loop:

  1. Send user message + system prompt to LLM with available tools
  2. If LLM returns tool calls, execute each tool and append results to conversation
  3. Repeat until LLM returns a final response (max 5 iterations)

Available tools:

  • search_codebase - Searches repository files and code patterns
  • read_file - Reads specific file contents (truncated at 8KB)
  • search_web - Queries SearXNG instance (requires SEARXNG_URL)
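
The loop can be sketched as follows (the llm object, execute_tool helper, and response shape are stand-ins mirroring the names used in this document, not the actual ChatAgent code):

```python
def chat_loop(llm, messages, tools, execute_tool, max_iterations=5):
    """Iterate tool calls until the LLM returns a final answer or the
    iteration cap is reached (sketch of the ChatAgent loop)."""
    for _ in range(max_iterations):
        response = llm.call_with_tools(messages, tools=tools)
        if not response.tool_calls:
            return response  # final answer
        for tc in response.tool_calls:
            # Run each requested tool and feed its result back
            result = execute_tool(tc.name, tc.arguments)
            messages.append({
                "role": "tool",
                "tool_call_id": tc.id,
                "content": result,
            })
    return response  # iteration cap reached
```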

Configuration

Primary Config File: tools/ai-review/config.yml

Critical settings:

provider: openai  # openai | openrouter | ollama

model:
  openai: gpt-4.1-mini
  openrouter: anthropic/claude-3.5-sonnet
  ollama: codellama:13b

interaction:
  mention_prefix: "@codebot"  # Bot trigger name - update workflows too!
  commands:
    - explain      # Explain what the issue is about
    - suggest      # Suggest solutions or next steps
    - security     # Security analysis
    - summarize    # Summarize the issue
    - triage       # Full triage with labeling
    - review-again # Re-run PR review (PR comments only)
  
review:
  fail_on_severity: HIGH  # Fail CI if HIGH severity issues found
  max_diff_lines: 800     # Skip review if diff too large
  
agents:
  chat:
    max_iterations: 5  # Tool calling loop limit

Important: When changing mention_prefix, also update all workflow files in .gitea/workflows/:

  • ai-comment-reply.yml
  • ai-chat.yml
  • ai-issue-triage.yml

Look for: if: contains(github.event.comment.body, '@codebot') and update to your new bot name.

Current bot name: @codebot

Environment Variables

Required:

  • AI_REVIEW_API_URL - Platform API URL
  • AI_REVIEW_TOKEN - Bot authentication token
  • OPENAI_API_KEY - OpenAI API key (or provider-specific key)

Optional:

  • SEARXNG_URL - SearXNG instance for web search
  • OPENROUTER_API_KEY - OpenRouter API key
  • OLLAMA_HOST - Ollama server URL

Workflow Architecture

Workflows are located in .gitea/workflows/ and are mutually exclusive to prevent duplicate runs:

  • enterprise-ai-review.yml - Triggered on PR open/sync
  • ai-issue-triage.yml - Triggered ONLY on @codebot triage in comments
  • ai-comment-reply.yml - Triggered on specific commands: help, explain, suggest, security, summarize, review-again, setup-labels
  • ai-chat.yml - Triggered on @codebot mentions that are NOT specific commands (free-form questions)
  • ai-codebase-review.yml - Scheduled weekly analysis

Workflow Routing Logic:

  1. If comment contains @codebot triage → ai-issue-triage.yml only
  2. If comment contains specific command (e.g., @codebot help) → ai-comment-reply.yml only
  3. If comment contains @codebot <question> (no command) → ai-chat.yml only

This prevents the issue where all three workflows would trigger on every @codebot mention, causing massive duplication.

Note: Issue triage is now opt-in via @codebot triage command, not automatic on issue creation.
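
The precedence can be expressed as plain Python for clarity (a sketch only; the actual routing lives in the workflows' if: conditions, and the command list is the one from config.yml):

```python
COMMANDS = {"help", "explain", "suggest", "security",
            "summarize", "review-again", "setup-labels"}

def route(comment_body, prefix="@codebot"):
    """Return the workflow file that should handle this comment."""
    if prefix not in comment_body:
        return None  # no mention: no AI workflow runs
    rest = comment_body.split(prefix, 1)[1].strip()
    first_word = rest.split()[0].lower() if rest else ""
    if first_word == "triage":
        return "ai-issue-triage.yml"
    if first_word in COMMANDS:
        return "ai-comment-reply.yml"
    return "ai-chat.yml"  # free-form question fallback
```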

Key workflow pattern:

  1. Checkout repository
  2. Setup Python 3.11
  3. Install dependencies (pip install requests pyyaml)
  4. Set environment variables
  5. Run python main.py <command> <args>

Prompt Templates

Prompts are stored in tools/ai-review/prompts/ as Markdown files:

  • base.md - Base instructions for all reviews
  • pr_summary.md - PR summary generation template
  • issue_triage.md - Issue classification template
  • issue_response.md - Issue response template

Important: JSON examples in prompts must use double curly braces ({{ and }}) to escape Python's .format() method. This is tested in tests/test_ai_review.py::TestPromptFormatting.
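
A quick demonstration of why the doubling matters (hypothetical templates, mirroring the JSON examples in the prompt files):

```python
# .format() treats single braces as replacement fields, so a literal
# JSON example breaks the template; doubled braces render literally.
template_bad = 'Respond as JSON: {"label": "bug"} for issue {title}'
template_good = 'Respond as JSON: {{"label": "bug"}} for issue {title}'

# template_bad.format(...) raises KeyError on the '"label"' field;
# template_good renders the braces as-is.
print(template_good.format(title="Login fails"))
# Respond as JSON: {"label": "bug"} for issue Login fails
```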

Code Patterns

Creating a New Agent

from agents.base_agent import BaseAgent, AgentContext, AgentResult

class MyAgent(BaseAgent):
    def can_handle(self, event_type: str, event_data: dict) -> bool:
        # Check if agent is enabled in config
        if not self.config.get("agents", {}).get("my_agent", {}).get("enabled", True):
            return False
        return event_type == "my_event_type"
    
    def execute(self, context: AgentContext) -> AgentResult:
        # Load prompt template
        prompt = self.load_prompt("my_prompt")
        formatted = prompt.format(data=context.event_data.get("field"))
        
        # Call LLM with rate limiting
        response = self.call_llm(formatted)
        
        # Post comment to the issue/PR (issue number comes from the event payload)
        issue_index = context.event_data.get("issue", {}).get("number")
        self.upsert_comment(
            context.owner,
            context.repo,
            issue_index,
            response.content
        )
        
        return AgentResult(
            success=True,
            message="Agent completed",
            actions_taken=["Posted comment"]
        )

Calling LLM with Tools

messages = [
    {"role": "system", "content": "You are a helpful assistant"},
    {"role": "user", "content": "Search for authentication code"}
]

tools = [{
    "type": "function",
    "function": {
        "name": "search_code",
        "description": "Search codebase",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"]
        }
    }
}]

response = self.llm.call_with_tools(messages, tools=tools)

if response.tool_calls:
    for tc in response.tool_calls:
        result = execute_tool(tc.name, tc.arguments)
        messages.append({
            "role": "tool",
            "tool_call_id": tc.id,
            "content": result
        })
    # Call call_with_tools(messages, tools=tools) again with the appended
    # results, and repeat until the response contains no tool calls

Adding Security Rules

Edit tools/ai-review/security/security_scanner.py or create security/security_rules.yml:

rules:
  - id: SEC018
    name: Custom Rule Name
    pattern: 'regex_pattern_here'
    severity: HIGH  # HIGH, MEDIUM, LOW
    category: A03:2021 Injection
    cwe: CWE-XXX
    description: What this detects
    recommendation: How to fix it

Security Best Practices

CRITICAL: Always follow these security guidelines when modifying workflows or handling webhook data.

Workflow Security Rules

  1. Never pass full webhook data to environment variables

    # ❌ NEVER DO THIS
    env:
      EVENT_DATA: ${{ toJSON(github.event) }}  # Exposes emails, tokens, etc.
    
    # ✅ ALWAYS DO THIS
    run: |
      EVENT_DATA=$(cat <<EOF
      {
        "issue": {"number": ${{ github.event.issue.number }}},
        "comment": {"body": $(echo '${{ github.event.comment.body }}' | jq -Rs .)}
      }
      EOF
      )
      python utils/safe_dispatch.py issue_comment "$REPO" "$EVENT_DATA"
    
  2. Always validate repository format

    # Validate before use
    if ! echo "$REPO" | grep -qE '^[a-zA-Z0-9_-]+/[a-zA-Z0-9_-]+$'; then
        echo "Error: Invalid repository format"
        exit 1
    fi
    
  3. Use safe_dispatch.py for webhook processing

    # Instead of inline Python with os.environ, use:
    python utils/safe_dispatch.py issue_comment owner/repo "$EVENT_JSON"
    

Input Validation

Always use webhook_sanitizer.py utilities:

from utils.webhook_sanitizer import (
    sanitize_webhook_data,      # Remove sensitive fields
    validate_repository_format,  # Validate owner/repo format
    extract_minimal_context,     # Extract only necessary fields
)

# Validate repository input
owner, repo = validate_repository_format(repo_string)  # Raises ValueError if invalid

# Sanitize webhook data
sanitized = sanitize_webhook_data(raw_event_data)

# Extract minimal context (reduces attack surface)
minimal = extract_minimal_context(event_type, sanitized)

Pre-commit Security Scanning

Install pre-commit hooks to catch security issues before commit:

# Install pre-commit
pip install pre-commit

# Install hooks
pre-commit install

# Run manually
pre-commit run --all-files

The hooks will:

  • Scan Python files for security vulnerabilities
  • Validate workflow files for security anti-patterns
  • Detect hardcoded secrets
  • Run security scanner on code changes

Security Resources

  • SECURITY.md - Complete security guidelines and best practices
  • tools/ai-review/utils/webhook_sanitizer.py - Input validation utilities
  • tools/ai-review/utils/safe_dispatch.py - Safe webhook dispatch wrapper
  • .pre-commit-config.yaml - Pre-commit hook configuration

Testing

The test suite covers:

  1. Prompt Formatting (tests/test_ai_review.py) - Ensures prompts don't have unescaped {} that break .format()
  2. Module Imports - Verifies all modules can be imported
  3. Security Scanner - Tests pattern detection and false positive rate
  4. Agent Context - Tests dataclass creation and validation
  5. Security Utilities (tests/test_security_utils.py) - Tests webhook sanitization, validation, and safe dispatch
  6. Safe Dispatch (tests/test_safe_dispatch.py) - Tests secure event dispatching
  7. Metrics - Tests enterprise metrics collection

Run specific test classes:

pytest tests/test_ai_review.py::TestPromptFormatting -v
pytest tests/test_ai_review.py::TestSecurityScanner -v

Common Development Tasks

PR Summary Generation

The PR summary feature automatically generates comprehensive summaries for pull requests.

Key Features:

  • Auto-generates summary for PRs with empty descriptions
  • Can be manually triggered with @codebot summarize in PR comments
  • Analyzes diff to extract key changes, files affected, and impact
  • Categorizes change type (Feature/Bugfix/Refactor/Documentation/Testing)
  • Posts as comment or updates PR description (configurable)

Implementation Details:

  1. Auto-Summary on PR Open - PRAgent.execute():

    • Checks if PR body is empty and auto_summary.enabled is true
    • Calls _generate_pr_summary() automatically
    • Continues with normal PR review after posting summary
  2. Manual Trigger - @codebot summarize in PR comments:

    • PRAgent.can_handle() detects summarize command in PR comments
    • Routes to _handle_summarize_command()
    • Generates and posts summary on demand
  3. Summary Generation - _generate_pr_summary():

    • Fetches PR diff using _get_diff()
    • Loads prompts/pr_summary.md template
    • Calls LLM with diff to analyze changes
    • Returns structured JSON with summary data
    • Formats using _format_pr_summary()
    • Posts as comment or updates description based on config
  4. Configuration - config.yml:

    agents:
      pr:
        auto_summary:
          enabled: true  # Auto-generate for empty PRs
          post_as_comment: true  # true = comment, false = update description
    

Summary Structure:

  • Brief 2-3 sentence overview
  • Change type categorization (Feature/Bugfix/Refactor/etc)
  • Key changes (Added/Modified/Removed)
  • Files affected with descriptions
  • Impact assessment (scope: small/medium/large)

Common Use Cases:

  • Developers who forget to write PR descriptions
  • Quick understanding of complex changes
  • Standardized documentation format
  • Pre-review context for reviewers

Review-Again Command Implementation

The @codebot review-again command allows manual re-triggering of PR reviews without new commits.

Key Features:

  • Detects @codebot review-again in PR comments (not issue comments)
  • Compares new review with previous review to show resolved/new issues
  • Updates existing AI review comment instead of creating duplicates
  • Updates PR labels based on new severity assessment

Implementation Details:

  1. PRAgent.can_handle() - Handles issue_comment events on PRs containing "review-again"

  2. PRAgent._handle_review_again() - Main handler that:

    • Fetches previous review comment
    • Re-runs full PR review (security scan + AI analysis)
    • Compares findings using _compare_reviews()
    • Generates diff report with _format_review_update()
    • Updates comment and labels
  3. Review Comparison - Uses finding keys (file:line:description) to match issues:

    • Resolved: Issues in previous but not in current review
    • New: Issues in current but not in previous review
    • Still Present: Issues in both reviews
    • Severity Changed: Same issue with different severity
  4. Workflow Integration - .gitea/workflows/ai-comment-reply.yml:

    • Detects if comment is on PR or issue
    • Uses dispatch command for PRs to route to PRAgent
    • Preserves backward compatibility with issue commands
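
The comparison in step 3 can be sketched as follows (the function name and finding fields here are illustrative, not the real _compare_reviews() signature):

```python
def compare_reviews(previous, current):
    """Diff two lists of finding dicts keyed on file:line:description."""
    prev = {f"{f['file']}:{f['line']}:{f['description']}": f for f in previous}
    curr = {f"{f['file']}:{f['line']}:{f['description']}": f for f in current}
    return {
        # In the previous review but gone now
        "resolved": [prev[k] for k in prev.keys() - curr.keys()],
        # Newly introduced since the previous review
        "new": [curr[k] for k in curr.keys() - prev.keys()],
        # Present in both reviews
        "still_present": [curr[k] for k in prev.keys() & curr.keys()],
        # Same key, different severity assessment
        "severity_changed": [
            curr[k] for k in prev.keys() & curr.keys()
            if prev[k]["severity"] != curr[k]["severity"]
        ],
    }
```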

Usage:

# In a PR comment:
@codebot review-again

Common Use Cases:

  • Re-evaluate after explaining false positives in comments
  • Test new .ai-review.yml configuration
  • Update severity after code clarification
  • Faster iteration without empty commits

Adding a New Command to @codebot

  1. Add command to config.yml under interaction.commands
  2. Add handler method in IssueAgent (e.g., _command_yourcommand())
  3. Update _handle_command() to route the command to your handler
  4. Update README.md with command documentation
  5. Add tests in tests/test_ai_review.py

Example commands:

  • @codebot help - Show all available commands with examples
  • @codebot triage - Full issue triage with labeling
  • @codebot explain - Explain the issue
  • @codebot suggest - Suggest solutions
  • @codebot summarize - Generate PR summary or issue summary (works on both)
  • @codebot setup-labels - Automatic label setup (built-in, not in config)
  • @codebot review-again - Re-run PR review without new commits (PR comments only)

Changing the Bot Name

  1. Edit config.yml: interaction.mention_prefix: "@newname"
  2. Update all Gitea workflow files in .gitea/workflows/ (search for contains(github.event.comment.body, '@codebot'))
  3. Update README.md and documentation

Supporting a New LLM Provider

  1. Create provider class in clients/llm_client.py inheriting from BaseLLMProvider
  2. Implement call() and optionally call_with_tools()
  3. Register in LLMClient.PROVIDERS dict
  4. Add model config to config.yml
  5. Document in README.md
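
The shape of steps 1-3 can be sketched as follows (BaseLLMProvider and LLMClient are redefined as stand-ins here so the example runs on its own; the real classes live in clients/llm_client.py):

```python
class BaseLLMProvider:  # stand-in for the real abstract base
    def call(self, messages, **kwargs):
        raise NotImplementedError

    def call_with_tools(self, messages, tools=None, **kwargs):
        raise NotImplementedError  # optional to override

class LLMClient:  # stand-in; the real class already exists
    PROVIDERS = {}

class MyProvider(BaseLLMProvider):
    def call(self, messages, **kwargs):
        # Real code would hit the provider's API here
        return {"content": "stub response"}

# Step 3: register the provider so config.yml can select it by name
LLMClient.PROVIDERS["myprovider"] = MyProvider
```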

Repository Labels

Use the @codebot setup-labels command to automatically configure labels. This command:

For repositories with existing labels:

  • Detects naming patterns: Kind/Bug, Priority - High, type: bug
  • Maps existing labels to OpenRabbit schema using aliases
  • Creates only missing labels following detected pattern
  • Zero duplicate labels

For fresh repositories:

  • Creates OpenRabbit's default label set
  • Uses standard naming: type:, priority:, status labels

Example with existing Kind/ and Priority - labels:

@codebot setup-labels

✅ Found 18 existing labels with pattern: prefix_slash

Proposed Mapping:
| OpenRabbit Expected | Your Existing Label | Status |
|---------------------|---------------------|--------|
| type: bug          | Kind/Bug            | ✅ Map |
| type: feature      | Kind/Feature        | ✅ Map |
| priority: high     | Priority - High     | ✅ Map |
| ai-reviewed        | (missing)           | ⚠️ Create |

✅ Created Kind/Question
✅ Created Status - AI Reviewed

Setup Complete! Auto-labeling will use your existing label schema.

Manual Label Setup

The system expects these labels to exist in repositories for auto-labeling:

  • priority: critical, priority: high, priority: medium, priority: low
  • type: bug, type: feature, type: question, type: documentation, type: security, type: testing
  • ai-approved, ai-changes-required, ai-reviewed

Labels are mapped in config.yml under the labels section.

Label Configuration Format

Labels support two formats for backwards compatibility:

New format (with colors and aliases):

labels:
  type:
    bug:
      name: "type: bug"
      color: "d73a4a"  # Red
      description: "Something isn't working"
      aliases: ["Kind/Bug", "bug", "Type: Bug"]  # For auto-detection

Old format (strings only):

labels:
  type:
    bug: "type: bug"  # Still works, uses default blue color

Label Pattern Detection

The setup-labels command detects these patterns (configured in label_patterns):

  1. prefix_slash: Kind/Bug, Type/Feature, Category/X
  2. prefix_dash: Priority - High, Status - Blocked
  3. colon: type: bug, priority: high

When creating missing labels, the bot follows the detected pattern to maintain consistency.
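
The detection idea can be sketched like so (the regexes and function name are assumptions for illustration, not the real setup-labels implementation):

```python
import re
from collections import Counter

# One regex per naming pattern (illustrative approximations)
PATTERNS = {
    "prefix_slash": re.compile(r"^[A-Za-z]+/\S"),   # Kind/Bug
    "prefix_dash": re.compile(r"^[A-Za-z]+ - \S"),  # Priority - High
    "colon": re.compile(r"^[a-z-]+: \S"),           # type: bug
}

def detect_pattern(labels):
    """Return the dominant naming pattern among existing labels."""
    counts = Counter()
    for label in labels:
        for name, pattern in PATTERNS.items():
            if pattern.match(label):
                counts[name] += 1
                break
    return counts.most_common(1)[0][0] if counts else None
```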