# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## Overview
OpenRabbit is an enterprise-grade AI code review system for Gitea (and GitHub). It provides automated PR review, issue triage, interactive chat, and codebase analysis through a collection of specialized AI agents.
## Commands

### Development

```bash
# Run tests
pytest tests/ -v

# Run specific test file
pytest tests/test_ai_review.py -v

# Install dependencies
pip install -r tools/ai-review/requirements.txt

# Run a PR review locally
cd tools/ai-review
python main.py pr owner/repo 123

# Run issue triage
python main.py issue owner/repo 456

# Test chat functionality
python main.py chat owner/repo "How does authentication work?"

# Run with custom config
python main.py pr owner/repo 123 --config /path/to/config.yml
```
### Testing Workflows

```bash
# Validate workflow YAML syntax
python -c "import yaml; yaml.safe_load(open('.github/workflows/ai-review.yml'))"

# Test security scanner
python -c "from security.security_scanner import SecurityScanner; s = SecurityScanner(); print(list(s.scan_content('password = \"secret123\"', 'test.py')))"

# Test webhook sanitization
cd tools/ai-review
python -c "from utils.webhook_sanitizer import sanitize_webhook_data; print(sanitize_webhook_data({'user': {'email': 'test@example.com'}}))"

# Test safe dispatch
python utils/safe_dispatch.py issue_comment owner/repo '{"action": "created", "issue": {"number": 1}, "comment": {"body": "test"}}'
```
## Architecture

### Agent System

The codebase uses an agent-based architecture where specialized agents handle different types of events:

- **BaseAgent** (`agents/base_agent.py`) - Abstract base class providing:
  - Gitea API client integration
  - LLM client integration with rate limiting
  - Common comment management (upsert, find AI comments)
  - Prompt loading from the `prompts/` directory
  - Standard execution flow with error handling

- **Specialized Agents** - Each agent implements:
  - `can_handle(event_type, event_data)` - Determines whether the agent should process the event
  - `execute(context)` - Main execution logic
  - Returns an `AgentResult` with success status, message, data, and actions taken
Core Agents:
- PRAgent - Reviews pull requests with inline comments and security scanning
- IssueAgent - Triages issues and responds to @codebot commands
- CodebaseAgent - Analyzes entire codebase health and tech debt
- ChatAgent - Interactive assistant with tool calling (search_codebase, read_file, search_web)
Specialized Agents:
- DependencyAgent - Scans dependencies for security vulnerabilities (Python, JavaScript)
- TestCoverageAgent - Analyzes code for test coverage gaps and suggests test cases
- ArchitectureAgent - Enforces layer separation and detects architecture violations
- **Dispatcher** (`dispatcher.py`) - Routes events to appropriate agents:
  - Registers agents at startup
  - Determines which agents can handle each event
  - Executes agents (supports concurrent execution)
  - Returns aggregated results
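The register/route/aggregate flow above can be sketched as follows. This is a minimal illustration with stub classes, not the actual `dispatcher.py` implementation; all names here are hypothetical.

```python
class StubAgent:
    """Stand-in for a real agent: claims one event type, returns a string result."""
    def __init__(self, name: str, event_type: str):
        self.name = name
        self.event_type = event_type

    def can_handle(self, event_type: str, event_data: dict) -> bool:
        return event_type == self.event_type

    def execute(self, event_data: dict) -> str:
        return f"{self.name} handled #{event_data['number']}"

class Dispatcher:
    def __init__(self):
        self.agents = []  # Registered at startup

    def register(self, agent):
        self.agents.append(agent)

    def dispatch(self, event_type: str, event_data: dict) -> list:
        # Run every agent that claims this event and aggregate the results
        return [a.execute(event_data)
                for a in self.agents if a.can_handle(event_type, event_data)]

dispatcher = Dispatcher()
dispatcher.register(StubAgent("PRAgent", "pull_request"))
dispatcher.register(StubAgent("IssueAgent", "issues"))
results = dispatcher.dispatch("pull_request", {"number": 123})
print(results)  # ['PRAgent handled #123']
```

The real dispatcher executes matching agents concurrently; the list comprehension here is the sequential equivalent.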
### Multi-Provider LLM Client

The `LLMClient` (`clients/llm_client.py`) provides a unified interface for multiple LLM providers.
Core Providers (in llm_client.py):
- OpenAI - Primary provider (gpt-4.1-mini default)
- OpenRouter - Multi-provider access (claude-3.5-sonnet)
- Ollama - Self-hosted models (codellama:13b)
Additional Providers (in clients/providers/):
- AnthropicProvider - Direct Anthropic Claude API (claude-3.5-sonnet)
- AzureOpenAIProvider - Azure OpenAI Service with API key auth
- AzureOpenAIWithAADProvider - Azure OpenAI with Azure AD authentication
- GeminiProvider - Google Gemini API (public)
- VertexAIGeminiProvider - Google Vertex AI Gemini (enterprise GCP)
Key features:

- Tool/function calling support via `call_with_tools(messages, tools)`
- JSON response parsing with fallback extraction
- Provider-specific configuration via `config.yml`
- Configurable timeouts per provider
### Platform Abstraction

The `GiteaClient` (`clients/gitea_client.py`) provides a unified REST API client for Gitea (also compatible with the GitHub API):
- Issue operations (create, update, list, get, comments, labels)
- PR operations (get, diff, files, reviews)
- Repository operations (get repo, file contents, branches)
Environment variables:

- `AI_REVIEW_API_URL` - API base URL (e.g., `https://api.github.com` or `https://gitea.example.com/api/v1`)
- `AI_REVIEW_TOKEN` - Authentication token
### Security Scanner

The `SecurityScanner` (`security/security_scanner.py`) uses pattern-based detection with 17 built-in rules covering:

- OWASP Top 10 categories (A01-A10)
- Common vulnerabilities (SQL injection, XSS, hardcoded secrets, weak crypto)
- Returns `SecurityFinding` objects with severity (HIGH/MEDIUM/LOW), CWE references, and recommendations

It can scan:

- File content via `scan_content(content, filename)`
- Git diffs via `scan_diff(diff)` - only scans added lines
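Pattern-based detection of this kind can be sketched with a couple of regex rules. The two rules and the `SecurityFinding` fields below are assumptions for illustration, not the scanner's real 17-rule set.

```python
import re
from dataclasses import dataclass

@dataclass
class SecurityFinding:
    rule_id: str
    severity: str  # HIGH / MEDIUM / LOW
    cwe: str
    filename: str
    line: int

# Hypothetical rules: hardcoded password, weak hash
RULES = [
    ("SEC001", re.compile(r"password\s*=\s*['\"][^'\"]+['\"]"), "HIGH", "CWE-798"),
    ("SEC002", re.compile(r"\bmd5\s*\("), "MEDIUM", "CWE-327"),
]

def scan_content(content: str, filename: str):
    """Yield a finding for every rule that matches a line."""
    for lineno, line in enumerate(content.splitlines(), start=1):
        for rule_id, pattern, severity, cwe in RULES:
            if pattern.search(line):
                yield SecurityFinding(rule_id, severity, cwe, filename, lineno)

findings = list(scan_content('password = "secret123"', "test.py"))
print(findings[0].rule_id, findings[0].severity)  # SEC001 HIGH
```

This matches the smoke test from the Commands section, which expects `scan_content('password = "secret123"', 'test.py')` to produce a finding.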
### Chat Agent Tool Calling

The ChatAgent implements an iterative tool calling loop:

1. Send the user message + system prompt to the LLM with the available tools
2. If the LLM returns tool calls, execute each tool and append the results to the conversation
3. Repeat until the LLM returns a final response (max 5 iterations)

Available tools:

- `search_codebase` - Searches repository files and code patterns
- `read_file` - Reads specific file contents (truncated at 8 KB)
- `search_web` - Queries a SearXNG instance (requires `SEARXNG_URL`)
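The three-step loop above can be sketched with a fake LLM that returns one tool call and then a final answer. `FakeLLM` and `execute_tool` are stand-ins; the real ChatAgent drives `clients/llm_client.py`.

```python
MAX_ITERATIONS = 5  # Mirrors agents.chat.max_iterations in config.yml

class FakeLLM:
    """Returns one tool call on the first request, then a final answer."""
    def __init__(self):
        self.calls = 0

    def call_with_tools(self, messages, tools):
        self.calls += 1
        if self.calls == 1:
            return {"tool_calls": [{"id": "1", "name": "search_codebase",
                                    "arguments": {"query": "auth"}}]}
        return {"tool_calls": None, "content": "auth lives in auth/middleware.py"}

def execute_tool(name: str, arguments: dict) -> str:
    return f"results for {arguments['query']}"  # Stubbed tool execution

def chat_loop(llm, user_message: str) -> str:
    messages = [{"role": "system", "content": "You are a code assistant"},
                {"role": "user", "content": user_message}]
    for _ in range(MAX_ITERATIONS):
        response = llm.call_with_tools(messages, tools=[])
        if not response["tool_calls"]:
            return response["content"]  # Final answer: exit the loop
        # Append each tool result so the next LLM call sees it
        for tc in response["tool_calls"]:
            messages.append({"role": "tool", "tool_call_id": tc["id"],
                             "content": execute_tool(tc["name"], tc["arguments"])})
    return "Reached iteration limit"

answer = chat_loop(FakeLLM(), "How does authentication work?")
print(answer)  # auth lives in auth/middleware.py
```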
## Configuration

**Primary Config File:** `tools/ai-review/config.yml`

Critical settings:

```yaml
provider: openai  # openai | openrouter | ollama

model:
  openai: gpt-4.1-mini
  openrouter: anthropic/claude-3.5-sonnet
  ollama: codellama:13b

interaction:
  mention_prefix: "@codebot"  # Bot trigger name - update workflows too!
  commands:
    - explain       # Explain what the issue is about
    - suggest       # Suggest solutions or next steps
    - security      # Security analysis
    - summarize     # Summarize the issue
    - triage        # Full triage with labeling
    - review-again  # Re-run PR review (PR comments only)

review:
  fail_on_severity: HIGH  # Fail CI if HIGH severity issues found
  max_diff_lines: 800     # Skip review if diff too large

agents:
  chat:
    max_iterations: 5  # Tool calling loop limit
```
**Important:** When changing `mention_prefix`, also update all workflow files in `.gitea/workflows/`:

- `ai-comment-reply.yml`
- `ai-chat.yml`
- `ai-issue-triage.yml`

Look for `if: contains(github.event.comment.body, '@codebot')` and update it to your new bot name.

Current bot name: `@codebot`
## Environment Variables

Required:

- `AI_REVIEW_API_URL` - Platform API URL
- `AI_REVIEW_TOKEN` - Bot authentication token
- `OPENAI_API_KEY` - OpenAI API key (or provider-specific key)

Optional:

- `SEARXNG_URL` - SearXNG instance for web search
- `OPENROUTER_API_KEY` - OpenRouter API key
- `OLLAMA_HOST` - Ollama server URL
## Workflow Architecture

Workflows are located in `.gitea/workflows/` and are mutually exclusive to prevent duplicate runs:

- **enterprise-ai-review.yml** - Triggered on PR open/sync
- **ai-issue-triage.yml** - Triggered ONLY on `@codebot triage` in comments
- **ai-comment-reply.yml** - Triggered on specific commands: `help`, `explain`, `suggest`, `security`, `summarize`, `changelog`, `explain-diff`, `review-again`, `setup-labels`
- **ai-chat.yml** - Triggered on `@codebot` mentions that are NOT specific commands (free-form questions)
- **ai-codebase-review.yml** - Scheduled weekly analysis

**Workflow Routing Logic:**

- If a comment contains `@codebot triage` → ai-issue-triage.yml only
- If a comment contains a specific command (e.g., `@codebot help`) → ai-comment-reply.yml only
- If a comment contains `@codebot <question>` (no command) → ai-chat.yml only

This prevents the issue where all three workflows would trigger on every @codebot mention, causing massive duplication.
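The routing decision can be expressed as a single function. The actual routing lives in the workflows' `if:` conditions, so the Python below is only a sketch of the equivalent logic.

```python
# Commands handled by ai-comment-reply.yml (from the list above)
COMMANDS = {"help", "explain", "suggest", "security", "summarize",
            "changelog", "explain-diff", "review-again", "setup-labels"}

def route(comment_body: str) -> str:
    """Return the single workflow that should handle this comment."""
    body = comment_body.strip()
    if "@codebot" not in body:
        return "none"
    text = body.split("@codebot", 1)[1].strip()
    first_word = text.split()[0].lower() if text else ""
    if first_word == "triage":
        return "ai-issue-triage.yml"
    if first_word in COMMANDS:
        return "ai-comment-reply.yml"
    return "ai-chat.yml"  # Free-form question

print(route("@codebot triage"))                # ai-issue-triage.yml
print(route("@codebot help"))                  # ai-comment-reply.yml
print(route("@codebot why is this failing?"))  # ai-chat.yml
```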
### CRITICAL: Bot Self-Trigger Prevention

All workflows include `github.event.comment.user.login != 'Bartender'` to prevent infinite loops. Without this check:

1. The bot posts a comment mentioning `@codebot`
2. The workflow triggers, and the bot posts another comment with `@codebot`
3. The workflow triggers again, infinitely → 10+ duplicate runs

If you change the bot username, update all three workflow files:

- `.gitea/workflows/ai-comment-reply.yml`
- `.gitea/workflows/ai-chat.yml`
- `.gitea/workflows/ai-issue-triage.yml`

Look for `github.event.comment.user.login != 'Bartender'` and replace `'Bartender'` with your bot's username.

**Note:** Issue triage is now opt-in via the `@codebot triage` command, not automatic on issue creation.

Key workflow pattern:

1. Checkout repository
2. Set up Python 3.11
3. Install dependencies (`pip install requests pyyaml`)
4. Set environment variables
5. Run `python main.py <command> <args>`
## Prompt Templates

Prompts are stored in `tools/ai-review/prompts/` as Markdown files:

- `base.md` - Base instructions for all reviews
- `pr_summary.md` - PR summary generation template
- `changelog.md` - Keep a Changelog format generation template
- `explain_diff.md` - Plain-language diff explanation template
- `issue_triage.md` - Issue classification template
- `issue_response.md` - Issue response template

**Important:** JSON examples in prompts must use double curly braces (`{{` and `}}`) to escape Python's `.format()` method. This is tested in `tests/test_ai_review.py::TestPromptFormatting`.
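A quick demonstration of why the escaping is needed: `.format()` treats any single `{...}` as a placeholder, so literal JSON braces in a template break formatting unless doubled.

```python
# Single braces: .format() tries to interpret {"summary": ...} as a placeholder
template_bad = 'Respond with JSON like {"summary": "..."} for {title}'
# Doubled braces: {{ and }} render as literal { and }
template_good = 'Respond with JSON like {{"summary": "..."}} for {title}'

try:
    template_bad.format(title="My PR")
except KeyError as e:
    print("bad template raises KeyError:", e)

print(template_good.format(title="My PR"))
# Respond with JSON like {"summary": "..."} for My PR
```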
## Code Patterns

### Creating a New Agent

```python
from agents.base_agent import BaseAgent, AgentContext, AgentResult

class MyAgent(BaseAgent):
    def can_handle(self, event_type: str, event_data: dict) -> bool:
        # Check if agent is enabled in config
        if not self.config.get("agents", {}).get("my_agent", {}).get("enabled", True):
            return False
        return event_type == "my_event_type"

    def execute(self, context: AgentContext) -> AgentResult:
        # Load prompt template
        prompt = self.load_prompt("my_prompt")
        formatted = prompt.format(data=context.event_data.get("field"))

        # Call LLM with rate limiting
        response = self.call_llm(formatted)

        # Post comment to issue/PR
        self.upsert_comment(
            context.owner,
            context.repo,
            issue_index,
            response.content,
        )

        return AgentResult(
            success=True,
            message="Agent completed",
            actions_taken=["Posted comment"],
        )
```
### Calling LLM with Tools

```python
messages = [
    {"role": "system", "content": "You are a helpful assistant"},
    {"role": "user", "content": "Search for authentication code"},
]

tools = [{
    "type": "function",
    "function": {
        "name": "search_code",
        "description": "Search codebase",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
}]

response = self.llm.call_with_tools(messages, tools=tools)
if response.tool_calls:
    for tc in response.tool_calls:
        result = execute_tool(tc.name, tc.arguments)
        messages.append({
            "role": "tool",
            "tool_call_id": tc.id,
            "content": result,
        })
```
### Adding Security Rules

Edit `tools/ai-review/security/security_scanner.py` or create `security/security_rules.yml`:

```yaml
rules:
  - id: SEC018
    name: Custom Rule Name
    pattern: 'regex_pattern_here'
    severity: HIGH  # HIGH, MEDIUM, LOW
    category: A03:2021 Injection
    cwe: CWE-XXX
    description: What this detects
    recommendation: How to fix it
```
## Security Best Practices

**CRITICAL:** Always follow these security guidelines when modifying workflows or handling webhook data.

### Workflow Security Rules

1. **Never pass full webhook data to environment variables**

   ```yaml
   # ❌ NEVER DO THIS
   env:
     EVENT_DATA: ${{ toJSON(github.event) }}  # Exposes emails, tokens, etc.

   # ✅ ALWAYS DO THIS
   run: |
     EVENT_DATA=$(cat <<EOF
     {
       "issue": {"number": ${{ github.event.issue.number }}},
       "comment": {"body": $(echo '${{ github.event.comment.body }}' | jq -Rs .)}
     }
     EOF
     )
     python utils/safe_dispatch.py issue_comment "$REPO" "$EVENT_DATA"
   ```

2. **Always validate repository format**

   ```bash
   # Validate before use
   if ! echo "$REPO" | grep -qE '^[a-zA-Z0-9_-]+/[a-zA-Z0-9_-]+$'; then
     echo "Error: Invalid repository format"
     exit 1
   fi
   ```

3. **Use `safe_dispatch.py` for webhook processing**

   ```bash
   # Instead of inline Python with os.environ, use:
   python utils/safe_dispatch.py issue_comment owner/repo "$EVENT_JSON"
   ```
### Input Validation

Always use the `webhook_sanitizer.py` utilities:

```python
from utils.webhook_sanitizer import (
    sanitize_webhook_data,       # Remove sensitive fields
    validate_repository_format,  # Validate owner/repo format
    extract_minimal_context,     # Extract only necessary fields
)

# Validate repository input
owner, repo = validate_repository_format(repo_string)  # Raises ValueError if invalid

# Sanitize webhook data
sanitized = sanitize_webhook_data(raw_event_data)

# Extract minimal context (reduces attack surface)
minimal = extract_minimal_context(event_type, sanitized)
```
### Pre-commit Security Scanning

Install pre-commit hooks to catch security issues before commit:

```bash
# Install pre-commit
pip install pre-commit

# Install hooks
pre-commit install

# Run manually
pre-commit run --all-files
```
The hooks will:
- Scan Python files for security vulnerabilities
- Validate workflow files for security anti-patterns
- Detect hardcoded secrets
- Run security scanner on code changes
### Security Resources

- `SECURITY.md` - Complete security guidelines and best practices
- `tools/ai-review/utils/webhook_sanitizer.py` - Input validation utilities
- `tools/ai-review/utils/safe_dispatch.py` - Safe webhook dispatch wrapper
- `.pre-commit-config.yaml` - Pre-commit hook configuration
## Testing

The test suite covers:

- **Prompt Formatting** (`tests/test_ai_review.py`) - Ensures prompts don't have unescaped `{}` that break `.format()`
- **Module Imports** - Verifies all modules can be imported
- **Security Scanner** - Tests pattern detection and false positive rate
- **Agent Context** - Tests dataclass creation and validation
- **Security Utilities** (`tests/test_security_utils.py`) - Tests webhook sanitization, validation, and safe dispatch
- **Safe Dispatch** (`tests/test_safe_dispatch.py`) - Tests secure event dispatching
- **Metrics** - Tests enterprise metrics collection

Run specific test classes:

```bash
pytest tests/test_ai_review.py::TestPromptFormatting -v
pytest tests/test_ai_review.py::TestSecurityScanner -v
```
## Common Development Tasks

### PR Summary Generation

The PR summary feature automatically generates comprehensive summaries for pull requests.

**Key Features:**

- Auto-generates a summary for PRs with empty descriptions
- Can be manually triggered with `@codebot summarize` in PR comments
- Analyzes the diff to extract key changes, files affected, and impact
- Categorizes change type (Feature/Bugfix/Refactor/Documentation/Testing)
- Posts as a comment or updates the PR description (configurable)
**Implementation Details:**

1. **Auto-Summary on PR Open** - `PRAgent.execute()`:
   - Checks whether the PR body is empty and `auto_summary.enabled` is true
   - Calls `_generate_pr_summary()` automatically
   - Continues with the normal PR review after posting the summary

2. **Manual Trigger** - `@codebot summarize` in PR comments:
   - `PRAgent.can_handle()` detects the `summarize` command in PR comments
   - Routes to `_handle_summarize_command()`
   - Generates and posts the summary on demand

3. **Summary Generation** - `_generate_pr_summary()`:
   - Fetches the PR diff using `_get_diff()`
   - Loads the `prompts/pr_summary.md` template
   - Calls the LLM with the diff to analyze changes
   - Returns structured JSON with summary data
   - Formats the result using `_format_pr_summary()`
   - Posts as a comment or updates the description based on config

4. **Configuration** - `config.yml`:

   ```yaml
   agents:
     pr:
       auto_summary:
         enabled: true          # Auto-generate for empty PRs
         post_as_comment: true  # true = comment, false = update description
   ```
Summary Structure:
- Brief 2-3 sentence overview
- Change type categorization (Feature/Bugfix/Refactor/etc)
- Key changes (Added/Modified/Removed)
- Files affected with descriptions
- Impact assessment (scope: small/medium/large)
Common Use Cases:
- Developers who forget to write PR descriptions
- Quick understanding of complex changes
- Standardized documentation format
- Pre-review context for reviewers
### PR Changelog Generation

The `@codebot changelog` command generates Keep a Changelog format entries from PR diffs.

**Key Features:**

- Generates structured changelog entries following the Keep a Changelog format
- Categorizes changes: Added/Changed/Deprecated/Removed/Fixed/Security
- Automatically detects breaking changes
- Includes technical details (files changed, LOC, components)
- Output is ready to copy-paste into CHANGELOG.md
**Implementation Details:**

1. **Command Handler** - `PRAgent._handle_changelog_command()`:
   - Triggered by `@codebot changelog` in PR comments
   - Fetches PR title, description, and diff
   - Loads the `prompts/changelog.md` template
   - Formats the prompt with PR context

2. **LLM Analysis** - Generates structured JSON:

   ```json
   {
     "changelog": {
       "added": ["New features"],
       "changed": ["Changes to existing functionality"],
       "fixed": ["Bug fixes"],
       "security": ["Security fixes"]
     },
     "breaking_changes": ["Breaking changes"],
     "technical_details": {
       "files_changed": 15,
       "insertions": 450,
       "deletions": 120,
       "main_components": ["auth/", "api/"]
     }
   }
   ```

3. **Formatting** - `_format_changelog()`:
   - Converts the JSON to Keep a Changelog markdown format
   - Uses emojis for visual categorization (✨ Added, 🔄 Changed, 🐛 Fixed)
   - Highlights breaking changes prominently
   - Includes a technical summary at the end
   - Omits empty sections for clean output

4. **Prompt Engineering** - `prompts/changelog.md`:
   - User-focused language (not developer jargon)
   - Filters noise (formatting, typos, minor refactoring)
   - Groups related changes
   - Active voice, concise entries
   - Maximum 100 characters per entry
Common Use Cases:
- Preparing release notes
- Maintaining CHANGELOG.md
- Customer-facing announcements
- Version documentation
Workflow Safety:
- Only triggers on PR comments (not issue comments)
- Included in ai-comment-reply.yml workflow conditions
- Excluded from ai-chat.yml to prevent duplicate runs
- No automatic triggering - manual command only
### Code Diff Explainer

The `@codebot explain-diff` command translates technical code changes into plain language for non-technical stakeholders.

**Key Features:**

- Plain-language explanations without jargon
- File-by-file breakdown with "what" and "why" context
- Architecture impact analysis
- Breaking change detection
- Perfect for PMs, designers, and new team members
**Implementation Details:**

1. **Command Handler** - `PRAgent._handle_explain_diff_command()`:
   - Triggered by `@codebot explain-diff` in PR comments
   - Fetches PR title, description, and full diff
   - Loads the `prompts/explain_diff.md` template
   - Formats the prompt with PR context

2. **LLM Analysis** - Generates plain-language JSON:

   ```json
   {
     "overview": "High-level summary in everyday language",
     "key_changes": [
       {
         "file": "path/to/file.py",
         "status": "new|modified|deleted",
         "explanation": "What changed (no jargon)",
         "why_it_matters": "Business/user impact"
       }
     ],
     "architecture_impact": {
       "description": "System-level effects explained simply",
       "new_dependencies": ["External libraries added"],
       "affected_components": ["System parts impacted"]
     },
     "breaking_changes": ["User-facing breaking changes"],
     "technical_details": { /* Stats for reference */ }
   }
   ```

3. **Formatting** - `_format_diff_explanation()`:
   - Converts the JSON to readable markdown
   - Uses emojis for visual categorization (➕ new, 📝 modified, 🗑️ deleted)
   - Highlights breaking changes prominently
   - Includes a technical summary for developers
   - Omits empty sections for clean output

4. **Prompt Engineering** - `prompts/explain_diff.md`:
   - Avoids jargon: "API" → "connection point between systems"
   - Explains why: not just what changed, but why it matters
   - Uses analogies: "caching" → "memory system for faster loading"
   - Focuses on impact: who is affected and how
   - Groups changes: combines related files into themes
   - Translates concepts: technical terms → everyday language
Plain Language Rules:
- ❌ "Refactored authentication middleware" → ✅ "Updated login system for better security"
- ❌ "Implemented Redis caching" → ✅ "Added memory to make pages load 10x faster"
- ❌ "Database migration" → ✅ "Updated how data is stored"
Common Use Cases:
- New team members understanding large PRs
- Non-technical reviewers (PMs, designers) reviewing features
- Documenting architectural decisions
- Learning from other developers' code
Workflow Safety:
- Only triggers on PR comments (not issue comments)
- Included in ai-comment-reply.yml workflow conditions
- Excluded from ai-chat.yml to prevent duplicate runs
- No automatic triggering - manual command only
### Review-Again Command Implementation

The `@codebot review-again` command allows manual re-triggering of PR reviews without new commits.

**Key Features:**

- Detects `@codebot review-again` in PR comments (not issue comments)
- Compares the new review with the previous review to show resolved/new issues
- Updates the existing AI review comment instead of creating duplicates
- Updates PR labels based on the new severity assessment
Implementation Details:
-
PRAgent.can_handle() - Handles
issue_commentevents on PRs containing "review-again" -
PRAgent._handle_review_again() - Main handler that:
- Fetches previous review comment
- Re-runs full PR review (security scan + AI analysis)
- Compares findings using
_compare_reviews() - Generates diff report with
_format_review_update() - Updates comment and labels
-
Review Comparison - Uses finding keys (file:line:description) to match issues:
- Resolved: Issues in previous but not in current review
- New: Issues in current but not in previous review
- Still Present: Issues in both reviews
- Severity Changed: Same issue with different severity
-
Workflow Integration -
.gitea/workflows/ai-comment-reply.yml:- Detects if comment is on PR or issue
- Uses
dispatchcommand for PRs to route to PRAgent - Preserves backward compatibility with issue commands
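The finding-key comparison reduces to set operations once each review is keyed by `file:line:description`. The function below is an illustrative sketch, not the real `_compare_reviews()`; it maps each key to its severity so severity changes can be detected too.

```python
def compare_reviews(previous: dict, current: dict) -> dict:
    """Compare two reviews given as {finding_key: severity} dicts."""
    prev_keys, curr_keys = set(previous), set(current)
    both = prev_keys & curr_keys
    return {
        "resolved": sorted(prev_keys - curr_keys),         # gone from the new review
        "new": sorted(curr_keys - prev_keys),              # appeared in the new review
        "still_present": sorted(k for k in both if previous[k] == current[k]),
        "severity_changed": sorted(k for k in both if previous[k] != current[k]),
    }

previous = {"app.py:10:SQL injection": "HIGH", "auth.py:5:weak hash": "MEDIUM"}
current = {"auth.py:5:weak hash": "HIGH", "api.py:22:XSS": "MEDIUM"}
diff = compare_reviews(previous, current)
print(diff["resolved"])          # ['app.py:10:SQL injection']
print(diff["new"])               # ['api.py:22:XSS']
print(diff["severity_changed"])  # ['auth.py:5:weak hash']
```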
**Usage** (in a PR comment):

```
@codebot review-again
```
**Common Use Cases:**

- Re-evaluate after explaining false positives in comments
- Test a new `.ai-review.yml` configuration
- Update severity after code clarification
- Faster iteration without empty commits
### Adding a New Command to @codebot

1. Add the command to `config.yml` under `interaction.commands`
2. Add a handler method in `IssueAgent` (e.g., `_command_yourcommand()`)
3. Update `_handle_command()` to route the command to your handler
4. Update README.md with command documentation
5. Add tests in `tests/test_ai_review.py`

Example commands:

- `@codebot help` - Show all available commands with examples
- `@codebot triage` - Full issue triage with labeling
- `@codebot explain` - Explain the issue
- `@codebot suggest` - Suggest solutions
- `@codebot summarize` - Generate a PR summary or issue summary (works on both)
- `@codebot changelog` - Generate Keep a Changelog format entries (PR comments only)
- `@codebot explain-diff` - Explain code changes in plain language (PR comments only)
- `@codebot setup-labels` - Automatic label setup (built-in, not in config)
- `@codebot review-again` - Re-run PR review without new commits (PR comments only)
### Changing the Bot Name

1. Edit `config.yml`: `interaction.mention_prefix: "@newname"`
2. Update all Gitea workflow files in `.gitea/workflows/` (search for `contains(github.event.comment.body`)
3. Update README.md and documentation
### Supporting a New LLM Provider

1. Create a provider class in `clients/llm_client.py` inheriting from `BaseLLMProvider`
2. Implement `call()` and optionally `call_with_tools()`
3. Register it in the `LLMClient.PROVIDERS` dict
4. Add model config to `config.yml`
5. Document it in README.md
## Repository Labels

### Automatic Label Setup (Recommended)

Use the `@codebot setup-labels` command to automatically configure labels.

For repositories with existing labels, the command:

- Detects naming patterns: `Kind/Bug`, `Priority - High`, `type: bug`
- Maps existing labels to the OpenRabbit schema using aliases
- Creates only the missing labels, following the detected pattern
- Produces zero duplicate labels

For fresh repositories, it:

- Creates OpenRabbit's default label set
- Uses standard naming: `type:`, `priority:`, and status labels
Example with existing `Kind/` and `Priority - ` labels:

```text
@codebot setup-labels

✅ Found 18 existing labels with pattern: prefix_slash

Proposed Mapping:
| OpenRabbit Expected | Your Existing Label | Status    |
|---------------------|---------------------|-----------|
| type: bug           | Kind/Bug            | ✅ Map    |
| type: feature       | Kind/Feature        | ✅ Map    |
| priority: high      | Priority - High     | ✅ Map    |
| ai-reviewed         | (missing)           | ⚠️ Create |

✅ Created Kind/Question
✅ Created Status - AI Reviewed

Setup Complete! Auto-labeling will use your existing label schema.
```
### Manual Label Setup

The system expects these labels to exist in repositories for auto-labeling:

- `priority: critical`, `priority: high`, `priority: medium`, `priority: low`
- `type: bug`, `type: feature`, `type: question`, `type: documentation`, `type: security`, `type: testing`
- `ai-approved`, `ai-changes-required`, `ai-reviewed`

Labels are mapped in `config.yml` under the `labels` section.
### Label Configuration Format

Labels support two formats for backwards compatibility.

New format (with colors and aliases):

```yaml
labels:
  type:
    bug:
      name: "type: bug"
      color: "d73a4a"  # Red
      description: "Something isn't working"
      aliases: ["Kind/Bug", "bug", "Type: Bug"]  # For auto-detection
```

Old format (strings only):

```yaml
labels:
  type:
    bug: "type: bug"  # Still works, uses default blue color
```
### Label Pattern Detection

The setup-labels command detects these patterns (configured in `label_patterns`):

- **prefix_slash:** `Kind/Bug`, `Type/Feature`, `Category/X`
- **prefix_dash:** `Priority - High`, `Status - Blocked`
- **colon:** `type: bug`, `priority: high`

When creating missing labels, the bot follows the detected pattern to maintain consistency.