
# OpenRabbit

Enterprise-grade AI code review system for **Gitea** and **GitHub** with automated PR review, issue triage, interactive chat, and codebase analysis.

---

## Features

| Feature | Description |
|---------|-------------|
| **PR Review** | Inline comments, security scanning, severity-based CI failure |
| **PR Summaries** | Auto-generate comprehensive PR summaries with change analysis and impact assessment |
| **Issue Triage** | On-demand classification, labeling, priority assignment via `@codebot triage` |
| **Chat** | Interactive AI chat with codebase search and web search tools |
| **@codebot Commands** | `@codebot summarize`, `changelog`, `explain-diff`, `explain`, `suggest`, `triage`, `review-again` in comments |
| **Codebase Analysis** | Health scores, tech debt tracking, weekly reports |
| **Security Scanner** | 17 OWASP-aligned rules + SAST integration (Bandit, Semgrep, Trivy) |
| **Dependency Scanning** | Vulnerability detection for Python and JavaScript dependencies |
| **Test Coverage** | AI-powered test suggestions for untested code |
| **Architecture Compliance** | Layer separation enforcement, circular dependency detection |
| **Notifications** | Slack/Discord alerts for security findings and reviews |
| **Compliance** | Audit trail, CODEOWNERS enforcement, regulatory support |
| **Multi-Provider LLM** | OpenAI, Anthropic Claude, Azure OpenAI, Google Gemini, Ollama |
| **Enterprise Ready** | Audit logging, metrics, Prometheus export |
| **Gitea Native** | Built for Gitea workflows and API (also works with GitHub) |
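Circular dependency detection (listed under Architecture Compliance) comes down to finding a cycle in a module import graph. The sketch below uses a depth-first search with three-color marking. The graph format (a dict mapping each module to the modules it imports) and the `find_cycle` name are assumptions for illustration. They are not the API of `architecture_agent.py`.

```python
from typing import Dict, List, Optional

def find_cycle(graph: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one import cycle as a list of module names, or None if acyclic.

    Hypothetical helper: `graph` maps each module to the modules it imports.
    """
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current DFS path, finished
    color: Dict[str, int] = {}
    path: List[str] = []

    def visit(node: str) -> Optional[List[str]]:
        color[node] = GRAY
        path.append(node)
        for dep in graph.get(node, []):
            state = color.get(dep, WHITE)
            if state == GRAY:
                # Back edge to a module on the current path: that segment is a cycle.
                return path[path.index(dep):] + [dep]
            if state == WHITE:
                cycle = visit(dep)
                if cycle:
                    return cycle
        path.pop()
        color[node] = BLACK
        return None

    for module in graph:
        if color.get(module, WHITE) == WHITE:
            cycle = visit(module)
            if cycle:
                return cycle
    return None
```

For example, `find_cycle({"api": ["services"], "services": ["db"], "db": ["api"]})` returns `["api", "services", "db", "api"]`. Because the search is recursive, very deep import chains would need an iterative version.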

---

## 📦 Installation

**Quick Setup (5 minutes):**

```bash
# Clone OpenRabbit
git clone https://github.com/YourOrg/openrabbit.git
cd openrabbit

# Run interactive setup wizard
./setup.sh
```

The wizard will generate workflow files, create configuration, and guide you through the remaining steps.

**📖 See [INSTALL.md](INSTALL.md) for:**
- Detailed installation instructions
- Manual setup guide
- Platform-specific differences (Gitea vs GitHub)
- Troubleshooting common issues

---

## Quick Start

### 1. Set Repository/Organization Secrets

```
AI_PROVIDER         - LLM provider: openai | openrouter | ollama | anthropic | azure | gemini
AI_MODEL            - Model to use for the active provider (e.g. gpt-4.1-mini, claude-3-5-sonnet-20241022)
OPENAI_API_KEY      - OpenAI API key (other providers use their own key; see Environment Variables)
SEARXNG_URL         - (Optional) SearXNG instance URL for web search
```

**For Gitea:**
```
AI_REVIEW_TOKEN     - Bot token with repo + issue permissions
```

**For GitHub:**
The built-in `GITHUB_TOKEN` is used automatically.

### 2. Add Workflows to Repository

Example workflows are provided in `.gitea/workflows/` (Gitea) and `.github/workflows/` (GitHub).

#### Gitea PR Review Workflow

```yaml
# .gitea/workflows/enterprise-ai-review.yml
name: AI PR Review
on: [pull_request]

jobs:
  ai-review:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0

      - uses: actions/checkout@v4
        with:
          repository: YourOrg/OpenRabbit
          path: .ai-review
          token: ${{ secrets.AI_REVIEW_TOKEN }}

      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"

      - run: pip install requests pyyaml

      - name: Run AI Review
        env:
          AI_REVIEW_TOKEN: ${{ secrets.AI_REVIEW_TOKEN }}
          AI_REVIEW_REPO: ${{ gitea.repository }}
          AI_REVIEW_API_URL: https://your-gitea.example.com/api/v1
          AI_PROVIDER: ${{ secrets.AI_PROVIDER }}
          AI_MODEL: ${{ secrets.AI_MODEL }}
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
        run: |
          cd .ai-review/tools/ai-review
          python main.py pr ${{ gitea.repository }} ${{ gitea.event.pull_request.number }}
```

See `.gitea/workflows/` (Gitea) or `.github/workflows/` (GitHub) for all workflow examples.

### 3. Create Labels

**Option A: Automatic Setup (Recommended)**

Create an issue and comment:
```
@codebot setup-labels
```

The bot will automatically:
- Detect your existing label schema (e.g., `Kind/Bug`, `Priority - High`)
- Map existing labels to OpenRabbit's auto-labeling system
- Create only the missing labels you need
- Follow your repository's naming convention

**Option B: Manual Setup**

Create these labels in your repository for auto-labeling:
- `priority: critical`, `priority: high`, `priority: medium`, `priority: low`
- `type: bug`, `type: feature`, `type: question`, `type: documentation`
- `ai-approved`, `ai-changes-required`, `ai-reviewed`
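If you would rather script Option B than click through the UI, the labels can be created through Gitea's create-label endpoint (`POST /repos/{owner}/{repo}/labels` under `/api/v1`). The sketch below uses only the standard library. The hex colors are arbitrary placeholders, and `build_label_requests` / `create_labels` are hypothetical helpers, not part of OpenRabbit.

```python
import json
import urllib.request
from typing import List, Tuple

# Label names from Option B above; the hex colors are arbitrary placeholders.
LABELS = {
    "priority: critical": "#b60205",
    "priority: high": "#d93f0b",
    "priority: medium": "#fbca04",
    "priority: low": "#0e8a16",
    "type: bug": "#ee0701",
    "type: feature": "#1d76db",
    "type: question": "#cc317c",
    "type: documentation": "#0075ca",
    "ai-approved": "#2cbe4e",
    "ai-changes-required": "#e99695",
    "ai-reviewed": "#5319e7",
}

def build_label_requests(api_url: str, repo: str) -> List[Tuple[str, bytes]]:
    """Return (url, JSON body) pairs for Gitea's create-label endpoint."""
    owner, name = repo.split("/", 1)
    url = f"{api_url.rstrip('/')}/repos/{owner}/{name}/labels"
    return [(url, json.dumps({"name": label, "color": color}).encode("utf-8"))
            for label, color in LABELS.items()]

def create_labels(api_url: str, repo: str, token: str) -> None:
    """POST each label; urlopen raises HTTPError if Gitea rejects a request."""
    for url, body in build_label_requests(api_url, repo):
        request = urllib.request.Request(
            url, data=body, method="POST",
            headers={"Authorization": f"token {token}", "Content-Type": "application/json"},
        )
        with urllib.request.urlopen(request, timeout=30):
            pass
```

Typical usage would be `create_labels("https://your-gitea.example.com/api/v1", "owner/repo", token)` with a token that has repository permissions, such as `AI_REVIEW_TOKEN`.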

---

## Project Structure

```
tools/ai-review/
├── agents/                 # Agent implementations
│   ├── base_agent.py       # Abstract base agent
│   ├── issue_agent.py      # Issue triage & @codebot commands
│   ├── pr_agent.py         # PR review with security scan
│   ├── codebase_agent.py   # Codebase health analysis
│   ├── chat_agent.py       # Interactive chat with tool calling
│   ├── dependency_agent.py # Dependency vulnerability scanning
│   ├── test_coverage_agent.py # Test coverage analysis
│   └── architecture_agent.py  # Architecture compliance checking
├── clients/                # API clients
│   ├── gitea_client.py     # Gitea REST API wrapper
│   ├── llm_client.py       # Multi-provider LLM client with tool support
│   └── providers/          # Additional LLM providers
│       ├── anthropic_provider.py  # Direct Anthropic Claude API
│       ├── azure_provider.py      # Azure OpenAI Service
│       └── gemini_provider.py     # Google Gemini API
├── security/               # Security scanning
│   ├── security_scanner.py # 17 OWASP-aligned rules
│   └── sast_scanner.py     # Bandit, Semgrep, Trivy integration
├── notifications/          # Alerting system
│   └── notifier.py         # Slack, Discord, webhook notifications
├── compliance/             # Compliance & audit
│   ├── audit_trail.py      # Audit logging with integrity verification
│   └── codeowners.py       # CODEOWNERS enforcement
├── utils/                  # Utility functions
│   ├── ignore_patterns.py  # .ai-reviewignore support
│   └── webhook_sanitizer.py # Input validation
├── enterprise/             # Enterprise features
│   ├── audit_logger.py     # JSONL audit logging
│   └── metrics.py          # Prometheus-compatible metrics
├── prompts/                # AI prompt templates
├── main.py                 # CLI entry point
└── config.yml              # Configuration

.github/workflows/          # GitHub Actions workflows
├── ai-review.yml           # PR review workflow
├── ai-issue-triage.yml     # Issue triage workflow
├── ai-codebase-review.yml  # Codebase analysis
├── ai-comment-reply.yml    # @codebot command responses
└── ai-chat.yml             # Interactive AI chat

.gitea/workflows/           # Gitea Actions workflows
├── enterprise-ai-review.yml
├── ai-issue-triage.yml
├── ai-codebase-review.yml
├── ai-comment-reply.yml
└── ai-chat.yml
```

---

## CLI Commands

Run these from the `tools/ai-review/` directory:

```bash
# Review a pull request
python main.py pr owner/repo 123

# Triage an issue
python main.py issue owner/repo 456

# Respond to @codebot command
python main.py comment owner/repo 456 "@codebot explain"

# Analyze codebase
python main.py codebase owner/repo

# Chat with the AI bot
python main.py chat owner/repo "How does authentication work?"
python main.py chat owner/repo "Find all API endpoints" --issue 789
```
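The command shapes above map naturally onto `argparse` subcommands. The following hypothetical parser reproduces only the argument layouts documented here, as a way to read the CLI. It is not the actual code in `main.py`, which may accept additional options.

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented `main.py` command shapes."""
    parser = argparse.ArgumentParser(prog="main.py")
    sub = parser.add_subparsers(dest="command", required=True)

    # `pr` and `issue` share the same shape: repository plus a number.
    for name in ("pr", "issue"):
        p = sub.add_parser(name)
        p.add_argument("repo", help="owner/repo")
        p.add_argument("number", type=int)

    comment = sub.add_parser("comment")
    comment.add_argument("repo")
    comment.add_argument("number", type=int)
    comment.add_argument("body", help='Comment text, e.g. "@codebot explain"')

    codebase = sub.add_parser("codebase")
    codebase.add_argument("repo")

    chat = sub.add_parser("chat")
    chat.add_argument("repo")
    chat.add_argument("message")
    chat.add_argument("--issue", type=int, default=None, help="Issue number to reply on")

    return parser
```

For example, `build_parser().parse_args(["chat", "owner/repo", "Find all API endpoints", "--issue", "789"])` yields `command="chat"` and `issue=789`.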

---

## @codebot Commands

### Issue Commands

In any issue comment:

| Command | Description |
|---------|-------------|
| `@codebot help` | **Help:** Show all available commands with examples |
| `@codebot setup-labels` | **Setup:** Automatically create/map repository labels for auto-labeling |
| `@codebot triage` | Full issue triage with auto-labeling and analysis |
| `@codebot summarize` | Summarize the issue in 2-3 sentences |
| `@codebot explain` | Explain what the issue is about |
| `@codebot suggest` | Suggest solutions or next steps |
| `@codebot check-deps` | Scan dependencies for security vulnerabilities |
| `@codebot suggest-tests` | Suggest test cases for changed code |
| `@codebot refactor-suggest` | Suggest refactoring opportunities |
| `@codebot architecture` | Check architecture compliance (alias: `arch-check`) |
| `@codebot` (any question) | Chat with AI using codebase/web search tools |

### Pull Request Commands

In any PR comment:

| Command | Description |
|---------|-------------|
| `@codebot summarize` | Generate a comprehensive PR summary with changes, files affected, and impact |
| `@codebot changelog` | Generate Keep a Changelog format entries ready for CHANGELOG.md |
| `@codebot explain-diff` | Explain code changes in plain language for non-technical stakeholders |
| `@codebot review-again` | Re-run AI code review on current PR state without new commits |
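Conceptually, each comment is checked for the mention prefix and a known command, and anything else falls through to the chat agent. Here is a minimal sketch that assumes a single router for both tables above. `route_comment` is a hypothetical function, and the real dispatch in the agents may differ, for example in how aliases or context-specific commands are handled.

```python
from typing import Optional, Tuple

# Command names from the issue and PR tables above; one router serves both.
COMMANDS = {
    "help", "setup-labels", "triage", "summarize", "explain", "suggest",
    "check-deps", "suggest-tests", "refactor-suggest", "architecture",
    "changelog", "explain-diff", "review-again",
}
ALIASES = {"arch-check": "architecture"}

def route_comment(body: str, prefix: str = "@codebot") -> Optional[Tuple[str, str]]:
    """Return (command, remaining_text), ("chat", question), or None if not mentioned."""
    idx = body.find(prefix)
    if idx == -1:
        return None
    rest = body[idx + len(prefix):].strip()
    parts = rest.split(maxsplit=1)
    word = ALIASES.get(parts[0].lower(), parts[0].lower()) if parts else ""
    if word in COMMANDS:
        return word, parts[1] if len(parts) > 1 else ""
    # Anything that is not a known command is treated as a free-form question.
    return "chat", rest
```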

#### PR Summary (`@codebot summarize`)

**Features:**
- 📋 Generates a structured summary of the PR's changes
- ✨ Categorizes the change type (Feature/Bugfix/Refactor/Documentation/Testing)
- 📝 Lists what was added, modified, and removed
- 📁 Shows all affected files with descriptions
- 🎯 Assesses impact scope (small/medium/large)
- 🤖 Runs automatically on PRs with an empty description

**When to use:**
- When a PR lacks a description
- To quickly understand what changed
- For standardized PR documentation
- Before reviewing complex PRs

**Example output:**
```markdown
## 📋 Pull Request Summary
This PR implements automatic PR summary generation...

**Type:** ✨ Feature

## Changes
✅ Added:
- PR summary generation in PRAgent
- Auto-summary for empty PR descriptions

📝 Modified:
- Updated config.yml with new settings

## Files Affected
- ➕ `tools/ai-review/prompts/pr_summary.md` - New prompt template
- 📝 `tools/ai-review/agents/pr_agent.py` - Added summary methods

## Impact
🟡 **Scope:** Medium
Adds new feature without affecting existing functionality
```

#### Changelog Generator (`@codebot changelog`)

**Features:**
- 📋 Generates Keep a Changelog format entries
- 🏷️ Categorizes changes (Added/Changed/Fixed/Removed/Security)
- ⚠️ Detects breaking changes automatically
- 📊 Includes technical details (files, LOC, components)
- 📝 Ready to copy-paste into CHANGELOG.md

**When to use:**
- Preparing release notes
- Maintaining CHANGELOG.md
- Customer-facing announcements
- Version documentation

**Example output:**
```markdown
## 📋 Changelog for PR #123

### ✨ Added
- User authentication system with JWT tokens
- Password reset functionality via email

### 🔄 Changed
- Updated database schema for user table
- Refactored login endpoint for better error handling

### 🐛 Fixed
- Session timeout bug causing premature logouts
- Security vulnerability in password validation

### 🔒 Security
- Fixed XSS vulnerability in user input validation

---

### ⚠️ BREAKING CHANGES
- **Removed legacy API endpoint /api/v1/old - migrate to /api/v2**

---

### 📊 Technical Details
- **Files changed:** 15
- **Lines:** +450 / -120
- **Main components:** auth/, api/users/, database/
```
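The output follows the Keep a Changelog convention of grouping entries under fixed headings. If you post-process these entries yourself (for example, to merge several PRs into one release section), a small renderer like the following sketch keeps the section order stable. The function name and input shape are assumptions, and the emoji headings from the bot's output are omitted.

```python
from typing import Dict, List

# Category order used by the Keep a Changelog format.
SECTIONS = ["Added", "Changed", "Deprecated", "Removed", "Fixed", "Security"]

def render_changelog(version: str, entries: Dict[str, List[str]]) -> str:
    """Render categorized entries as a Keep a Changelog release section."""
    unknown = set(entries) - set(SECTIONS)
    if unknown:
        raise ValueError(f"Unknown sections: {sorted(unknown)}")
    lines = [f"## [{version}]"]
    for section in SECTIONS:
        items = entries.get(section, [])
        if not items:
            continue  # skip categories with no entries
        lines.append("")
        lines.append(f"### {section}")
        lines.extend(f"- {item}" for item in items)
    return "\n".join(lines) + "\n"
```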

#### Diff Explainer (`@codebot explain-diff`)

**Features:**
- 📖 Translates technical changes into plain language
- 🎯 Written for non-technical stakeholders (PMs, designers)
- 🔍 File-by-file breakdown with "what" and "why"
- 🏗️ Architecture impact analysis
- ⚠️ Breaking change detection
- 📊 Technical summary for reference

**When to use:**
- New team members reviewing complex PRs
- Non-technical reviewers need to understand changes
- Documenting architectural decisions
- Learning from others' code

**Example output:**
```markdown
## 📖 Code Changes Explained (PR #123)

### 🎯 Overview
This PR adds user authentication using secure tokens that expire after 24 hours, so users stay signed in without re-entering their password on every request.

### 🔍 What Changed

#### ➕ `auth/jwt.py` (new)
**What changed:** Creates secure tokens for logged-in users
**Why it matters:** Enables the app to remember who you are without constantly asking for your password

#### 📝 `api/users.py` (modified)
**What changed:** Added a login endpoint where users can sign in
**Why it matters:** Users can now sign in and access their personal data

---

### 🏗️ Architecture Impact
Introduces a security layer across the entire application, ensuring only authenticated users can access protected features.

**New dependencies:**
- PyJWT (for creating secure tokens)
- bcrypt (for password hashing)

**Affected components:**
- API (all endpoints now check authentication)
- Database (added user credentials storage)

---

### ⚠️ Breaking Changes
- **All API endpoints now require authentication - existing scripts need to be updated**

---

### 📊 Technical Summary
- **Files changed:** 5
- **Lines:** +200 / -10
- **Components:** auth/, api/
```

#### Review Again (`@codebot review-again`)

**Features:**
- ✅ Shows what changed since the previous review (resolved, new, and changed issues)
- 🏷️ Updates labels based on new severity
- ⚡ No need for empty commits to trigger review
- 🔧 Respects latest `.ai-review.yml` configuration

**When to use:**
- After addressing review feedback in comments
- When AI flagged a false positive and you explained it
- After updating `.ai-review.yml` security rules
- To re-evaluate severity after code clarification

**Example:**
```
The hardcoded string at line 45 is a public API URL, not a secret.
@codebot review-again
```
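The resolved/new/changed breakdown amounts to comparing two sets of findings. The following minimal sketch assumes each finding is identified by file, line, and rule. That key is hypothetical, and the real PR agent may match findings differently (for instance, tolerating shifted line numbers).

```python
from typing import Dict, List, NamedTuple

class Finding(NamedTuple):
    file: str
    line: int
    rule: str
    severity: str  # e.g. "LOW", "MEDIUM", "HIGH" (assumed labels)

def diff_reviews(previous: List[Finding], current: List[Finding]) -> Dict[str, List[Finding]]:
    """Split findings into resolved, new, and changed (same key, different severity)."""
    prev = {(f.file, f.line, f.rule): f for f in previous}
    curr = {(f.file, f.line, f.rule): f for f in current}
    return {
        "resolved": [f for k, f in prev.items() if k not in curr],
        "new": [f for k, f in curr.items() if k not in prev],
        "changed": [f for k, f in curr.items() if k in prev and prev[k].severity != f.severity],
    }
```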

**New to OpenRabbit?** Just type `@codebot help` in any issue to see all available commands!

### Label Setup Command

The `@codebot setup-labels` command intelligently detects your existing label schema and sets up auto-labeling:

**For repositories with existing labels (e.g., `Kind/Bug`, `Priority - High`):**
- Detects your naming pattern (prefix/slash, prefix-dash, or colon-style)
- Maps your existing labels to OpenRabbit's schema
- Creates only missing labels following your pattern
- Never creates duplicate labels

**For fresh repositories:**
- Creates OpenRabbit's default label set
- Uses `type:`, `priority:`, and status labels

**Example output:**
```
@codebot setup-labels

✅ Found 18 existing labels with pattern: prefix_slash

Detected Categories:
- Kind (7 labels): Bug, Feature, Documentation, Security, Testing
- Priority (4 labels): Critical, High, Medium, Low

Proposed Mapping:
| OpenRabbit Expected | Your Existing Label | Status |
|---------------------|---------------------|--------|
| type: bug          | Kind/Bug            | ✅ Map |
| priority: high     | Priority - High     | ✅ Map |
| ai-reviewed        | (missing)           | ⚠️ Create |

✅ Created Kind/Question (#cc317c)
✅ Created Status - AI Reviewed (#1d76db)

Setup Complete! Auto-labeling will use your existing label schema.
```
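Pattern detection can be approximated by classifying each existing label by its separator and taking the most common style. The following sketch is a simplified, hypothetical version (the regexes and the `detect_label_pattern` name are assumptions). It only illustrates the idea behind a result such as `prefix_slash` above.

```python
import re
from collections import Counter
from typing import Iterable, Optional

# Separator styles named in this README; the regexes are illustrative assumptions.
PATTERNS = {
    "prefix_slash": re.compile(r"^[^/\s]+/\S"),   # Kind/Bug
    "prefix_dash": re.compile(r"^\S+ - \S"),      # Priority - High
    "colon": re.compile(r"^[^:\s]+: ?\S"),        # type: bug
}

def detect_label_pattern(labels: Iterable[str]) -> Optional[str]:
    """Return the most common naming style among labels, or None if none match."""
    counts: Counter = Counter()
    for label in labels:
        for style, regex in PATTERNS.items():
            if regex.match(label):
                counts[style] += 1
                break  # count each label under at most one style
    return counts.most_common(1)[0][0] if counts else None
```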

---

## Interactive Chat

The chat agent is an interactive AI assistant with tool-calling capabilities:

**Tools Available:**
- `search_codebase` - Search repository files and code
- `read_file` - Read specific files
- `search_web` - Search the web via SearXNG

**Example:**
```
@codebot How do I configure rate limiting in this project?
```

The bot will search the codebase, read relevant files, and provide a comprehensive answer.
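Tool calling works by advertising each tool to the model as a JSON schema and then executing whichever tool the model requests. The sketch below declares the three tools in the OpenAI-style `tools` format and dispatches a returned tool call to local Python functions. The parameter names and handlers are assumptions, not the actual definitions in `chat_agent.py`.

```python
import json
from typing import Any, Callable, Dict

def _tool(name: str, description: str, **params: str) -> Dict[str, Any]:
    """Build an OpenAI-style function tool whose parameters are all required strings."""
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": description,
            "parameters": {
                "type": "object",
                "properties": {p: {"type": "string", "description": d} for p, d in params.items()},
                "required": list(params),
            },
        },
    }

# Parameter names below are assumptions for illustration.
TOOLS = [
    _tool("search_codebase", "Search repository files and code", query="Text or symbol to find"),
    _tool("read_file", "Read a specific file", path="Repository-relative file path"),
    _tool("search_web", "Search the web via SearXNG", query="Search query"),
]

def dispatch(tool_call: Dict[str, Any], handlers: Dict[str, Callable[..., str]]) -> str:
    """Run the handler named in a tool call; the model sends arguments as a JSON string."""
    fn = tool_call["function"]
    handler = handlers.get(fn["name"])
    if handler is None:
        return f"Unknown tool: {fn['name']}"
    return handler(**json.loads(fn["arguments"]))
```

The string returned by `dispatch` would be sent back to the model as the tool result, and the loop repeats until the model answers without requesting another tool.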

---

## Configuration

Edit `tools/ai-review/config.yml`:

```yaml
# Set via the AI_PROVIDER secret, or hardcode here as a fallback
provider: openai   # openai | openrouter | ollama | anthropic | azure | gemini

# Set via the AI_MODEL secret, or hardcode per provider here
model:
  openai: gpt-4.1-mini
  openrouter: anthropic/claude-3.5-sonnet
  ollama: codellama:13b

agents:
  issue:
    enabled: true
    auto_label: true
  pr:
    enabled: true
    inline_comments: true
    security_scan: true
  codebase:
    enabled: true
  chat:
    enabled: true
    searxng_url: ""  # Or set SEARXNG_URL env var

interaction:
  respond_to_mentions: true
  mention_prefix: "@codebot"  # Customize your bot name here!
  commands:
    - summarize
    - explain
    - suggest
```

---

## Customizing the Bot Name

The default bot name is `@codebot`. To change it:

**Step 1:** Edit `tools/ai-review/config.yml`:
```yaml
interaction:
  mention_prefix: "@yourname"  # e.g., "@assistant", "@reviewer", etc.
```

**Step 2:** Update all workflow files in `.gitea/workflows/` (or `.github/workflows/` on GitHub):
- `ai-comment-reply.yml`
- `ai-chat.yml`
- `ai-issue-triage.yml`

Look for and update:
```yaml
if: contains(github.event.comment.body, '@codebot')
```

Change `@codebot` to your new bot name.

**Step 3 (CRITICAL):** Update the bot username to prevent infinite loops:

In all three workflow files, find:
```yaml
github.event.comment.user.login != 'Bartender'
```

Replace `'Bartender'` with your bot's Gitea username. Without this check, every comment the bot posts that contains `@codebot` triggers the workflow again, causing a runaway loop of 10+ duplicate workflow runs.
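Both workflow conditions express a single rule: respond only to comments that mention the bot and were not written by the bot itself. The following hypothetical helper (not code shipped in the workflows) expresses that rule in Python. `bot_login` stands in for your bot's username. GitHub Actions expressions compare strings case-insensitively, so the sketch does the same.

```python
def should_respond(comment_body: str, author_login: str,
                   mention_prefix: str = "@codebot", bot_login: str = "Bartender") -> bool:
    """Mirror the workflow `if:` guard: bot is mentioned and did not write the comment."""
    # contains() and != in GitHub Actions expressions ignore case, so compare casefolded.
    body = comment_body.casefold()
    return (mention_prefix.casefold() in body
            and author_login.casefold() != bot_login.casefold())
```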

---

## Security Scanning

17 rules aligned with the OWASP Top 10, including:

| Category | Examples |
|----------|----------|
| Injection | SQL injection, command injection, XSS |
| Access Control | Hardcoded secrets, private keys |
| Crypto Failures | Weak hashing (MD5/SHA1), insecure random |
| Misconfiguration | Debug mode, CORS wildcard, SSL bypass |
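Pattern-based rules like these are typically applied line by line to source text. The following sketch shows that approach with three illustrative regexes. The rule IDs, patterns, and severities are assumptions and are far simpler than the 17 rules in `security_scanner.py`.

```python
import re
from typing import List, NamedTuple, Tuple

class Rule(NamedTuple):
    rule_id: str
    severity: str
    pattern: re.Pattern

# Illustrative rules only; IDs, severities, and regexes are assumptions.
RULES = [
    Rule("hardcoded-secret", "HIGH",
         re.compile(r"(?i)\b(password|secret|api_key)\s*=\s*['\"][^'\"]+['\"]")),
    Rule("weak-hash", "MEDIUM", re.compile(r"\bhashlib\.(md5|sha1)\(")),
    Rule("debug-mode", "LOW", re.compile(r"\bDEBUG\s*=\s*True\b")),
]

def scan(source: str) -> List[Tuple[int, str, str]]:
    """Return (line_number, rule_id, severity) for every rule that matches a line."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for rule in RULES:
            if rule.pattern.search(line):
                findings.append((lineno, rule.rule_id, rule.severity))
    return findings
```

Regex rules are fast but context-blind, which is why the scanner pairs them with SAST tools and why `@codebot review-again` exists for explaining false positives.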

---

## Documentation

| Document | Description |
|----------|-------------|
| [Getting Started](docs/getting-started.md) | Quick setup guide |
| [Configuration](docs/configuration.md) | All options explained |
| [Agents](docs/agents.md) | Agent documentation |
| [Security](docs/security.md) | Security rules reference |
| [Workflows](docs/workflows.md) | GitHub & Gitea workflow examples |
| [API Reference](docs/api-reference.md) | Client and agent APIs |
| [Enterprise](docs/enterprise.md) | Audit logging, metrics |
| [Troubleshooting](docs/troubleshooting.md) | Common issues |

---

## LLM Providers

| Provider | Example Model | Use Case |
|----------|---------------|----------|
| OpenAI | gpt-4.1-mini | Fast, reliable, default |
| Anthropic | claude-3-5-sonnet-20241022 | Direct Claude API access |
| Azure OpenAI | gpt-4 (deployment) | Enterprise Azure deployments |
| Google Gemini | gemini-1.5-pro | GCP customers, Vertex AI |
| OpenRouter | anthropic/claude-3.5-sonnet | Multi-provider access |
| Ollama | codellama:13b | Self-hosted, private |

### Provider Configuration

The provider and model can be set via repository or organization secrets (Gitea or GitHub), so you don't need to edit `config.yml`:

| Secret | Description | Example |
|--------|-------------|---------|
| `AI_PROVIDER` | Which LLM provider to use | `openrouter` |
| `AI_MODEL` | Model for the active provider | `google/gemini-2.0-flash` |

The `config.yml` values are used as fallback when secrets are not set.

```yaml
# In config.yml (fallback defaults)
provider: openai  # openai | anthropic | azure | gemini | openrouter | ollama

# Azure OpenAI
azure:
  endpoint: ""  # Set via AZURE_OPENAI_ENDPOINT env var
  deployment: "gpt-4"
  api_version: "2024-02-15-preview"

# Google Gemini (Vertex AI)
gemini:
  project: ""  # Set via GOOGLE_CLOUD_PROJECT env var
  region: "us-central1"
```
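The precedence described above (secret or environment variable first, then `config.yml`) can be sketched as follows. This sketch assumes the parsed `config.yml` is a plain dict shaped like the examples in this README, where `model` maps provider names to model IDs. `resolve_llm_settings` is a hypothetical name, not a function in `llm_client.py`.

```python
import os
from typing import Mapping, Optional, Tuple

def resolve_llm_settings(config: dict,
                         env: Optional[Mapping[str, str]] = None) -> Tuple[str, Optional[str]]:
    """Return (provider, model): AI_PROVIDER/AI_MODEL win, config.yml is the fallback."""
    env = os.environ if env is None else env
    # Empty strings count as unset: an undefined secret expands to "" in workflows.
    provider = env.get("AI_PROVIDER") or config.get("provider", "openai")
    model = env.get("AI_MODEL") or config.get("model", {}).get(provider)
    return provider, model
```

Treating empty values as unset matters here because `${{ secrets.AI_MODEL }}` still defines the environment variable (as an empty string) when the secret does not exist.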

### Environment Variables

| Variable | Provider | Description |
|----------|----------|-------------|
| `AI_PROVIDER` | All | Override the active provider (e.g. `openrouter`) |
| `AI_MODEL` | All | Override the model for the active provider |
| `OPENAI_API_KEY` | OpenAI | API key |
| `ANTHROPIC_API_KEY` | Anthropic | API key |
| `AZURE_OPENAI_ENDPOINT` | Azure | Service endpoint URL |
| `AZURE_OPENAI_API_KEY` | Azure | API key |
| `AZURE_OPENAI_DEPLOYMENT` | Azure | Deployment name |
| `GOOGLE_API_KEY` | Gemini | API key (public API) |
| `GOOGLE_CLOUD_PROJECT` | Vertex AI | GCP project ID |
| `OPENROUTER_API_KEY` | OpenRouter | API key |
| `OLLAMA_HOST` | Ollama | Server URL (default: localhost:11434) |
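A pre-flight check can catch a missing key before a workflow run fails partway through. The mapping below is derived from the table above. Which variables are strictly required for each provider is an assumption (for example, Gemini through the public API versus Vertex AI), and `missing_env_vars` is a hypothetical helper.

```python
import os
from typing import Dict, List, Mapping, Optional, Tuple

# Provider -> variables assumed required, derived from the table above.
REQUIRED: Dict[str, Tuple[str, ...]] = {
    "openai": ("OPENAI_API_KEY",),
    "anthropic": ("ANTHROPIC_API_KEY",),
    "azure": ("AZURE_OPENAI_ENDPOINT", "AZURE_OPENAI_API_KEY"),
    "gemini": ("GOOGLE_API_KEY",),  # assumption: public API rather than Vertex AI
    "openrouter": ("OPENROUTER_API_KEY",),
    "ollama": (),  # OLLAMA_HOST is optional and defaults to localhost:11434
}

def missing_env_vars(provider: str, env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the required variables for `provider` that are unset or empty."""
    env = os.environ if env is None else env
    if provider not in REQUIRED:
        raise ValueError(f"Unknown provider: {provider}")
    return [name for name in REQUIRED[provider] if not env.get(name)]
```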

---

## Enterprise Features

- **Audit Logging**: JSONL logs with integrity checksums and daily rotation
- **Compliance**: HIPAA, SOC 2, PCI-DSS, and GDPR support with configurable rules
- **CODEOWNERS Enforcement**: Validate approvals against CODEOWNERS file
- **Notifications**: Slack/Discord webhooks for critical findings
- **SAST Integration**: Bandit, Semgrep, Trivy for advanced security scanning
- **Metrics**: Prometheus-compatible export
- **Rate Limiting**: Configurable request limits and timeouts
- **Custom Security Rules**: Define your own patterns via YAML
- **Tool Calling**: LLM function calling for interactive chat
- **Ignore Patterns**: `.ai-reviewignore` for excluding files from review
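CODEOWNERS enforcement starts by resolving who owns each changed file. In CODEOWNERS files, the last matching pattern takes precedence. The sketch below follows that rule but approximates gitignore-style matching with `fnmatch`, so it does not handle every nuance (such as `**` or unanchored directory patterns). The helper names are hypothetical, not the API of `compliance/codeowners.py`, and owners are compared as plain strings without resolving team membership.

```python
import fnmatch
from typing import List, Tuple

Rules = List[Tuple[str, List[str]]]

def parse_codeowners(text: str) -> Rules:
    """Parse `pattern owner1 owner2 ...` lines, skipping blanks and # comments."""
    rules: Rules = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if line:
            pattern, *owners = line.split()
            rules.append((pattern, owners))
    return rules

def _matches(pattern: str, path: str) -> bool:
    pattern = pattern.lstrip("/")
    if pattern.endswith("/"):
        # Simplification: directory patterns are treated as anchored at the repo root.
        return path.startswith(pattern)
    # Note: fnmatch's "*" also matches "/", unlike gitignore-style globs.
    return fnmatch.fnmatchcase(path, pattern)

def owners_for(path: str, rules: Rules) -> List[str]:
    owners: List[str] = []
    for pattern, rule_owners in rules:
        if _matches(pattern, path):
            owners = rule_owners  # the last matching pattern wins
    return owners

def unapproved_files(changed: List[str], approvers: List[str], rules: Rules) -> List[str]:
    """Changed files that have owners but no approval from any of them."""
    result = []
    for path in changed:
        owners = owners_for(path, rules)
        if owners and not set(owners) & set(approvers):
            result.append(path)
    return result
```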

### Notifications Configuration

```yaml
# In config.yml
notifications:
  enabled: true
  threshold: "warning"  # info | warning | error | critical

  slack:
    enabled: true
    webhook_url: ""  # Set via SLACK_WEBHOOK_URL env var
    channel: "#code-review"

  discord:
    enabled: true
    webhook_url: ""  # Set via DISCORD_WEBHOOK_URL env var
```
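The `threshold` setting implies an ordering of severities: a notification is sent only when a finding meets or exceeds the threshold. The following minimal sketch assumes the four levels listed in the comment above. The JSON bodies use the minimal documented shapes for Slack incoming webhooks (`text`) and Discord webhooks (`content`). The function names are hypothetical.

```python
import json
import urllib.request
from typing import List, Tuple

LEVELS = ["info", "warning", "error", "critical"]  # lowest to highest

def should_notify(severity: str, threshold: str = "warning") -> bool:
    """True when `severity` is at or above the configured threshold."""
    return LEVELS.index(severity.lower()) >= LEVELS.index(threshold.lower())

def build_payloads(severity: str, message: str) -> List[Tuple[str, bytes]]:
    """Return (target, JSON body) pairs for Slack and Discord webhooks."""
    text = f"[{severity.upper()}] {message}"
    return [
        ("slack", json.dumps({"text": text}).encode("utf-8")),
        ("discord", json.dumps({"content": text[:2000]}).encode("utf-8")),  # Discord caps content at 2000 chars
    ]

def post_json(url: str, body: bytes) -> int:
    """POST a JSON body to a webhook URL and return the HTTP status code."""
    request = urllib.request.Request(url, data=body, method="POST",
                                     headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status
```

The sketch leaves out the `channel` setting because current Slack incoming webhooks post to the channel selected when the webhook was created.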

### Compliance Configuration

```yaml
compliance:
  enabled: true
  audit:
    enabled: true
    log_file: "audit.log"
    retention_days: 90
  codeowners:
    enabled: true
    require_approval: true
```
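One common way to add integrity verification to an append-only JSONL log is to chain entries with hashes, so that editing or removing an entry invalidates every later checksum. The sketch below demonstrates that technique with SHA-256. It illustrates hash chaining in general; OpenRabbit's `audit_trail.py` may use a different scheme and field names. Truncating the tail of the log is not detected unless the latest hash is also stored elsewhere.

```python
import hashlib
import json
from typing import Any, Dict, List

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry

def _digest(prev_hash: str, event: Dict[str, Any]) -> str:
    # sort_keys makes the serialization, and therefore the hash, deterministic.
    payload = prev_hash + json.dumps(event, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def append_entry(lines: List[str], event: Dict[str, Any]) -> None:
    """Append one JSONL line that records the previous entry's hash."""
    prev = json.loads(lines[-1])["hash"] if lines else GENESIS
    lines.append(json.dumps({"event": event, "prev": prev, "hash": _digest(prev, event)}))

def verify(lines: List[str]) -> bool:
    """Recompute the chain; any edited or removed entry breaks it."""
    prev = GENESIS
    for line in lines:
        entry = json.loads(line)
        if entry["prev"] != prev or entry["hash"] != _digest(prev, entry["event"]):
            return False
        prev = entry["hash"]
    return True
```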

---

## License

MIT