Hiddenden/openrabbit

Fork 0

Files

latte f94d21580c

Enterprise AI Code Review / ai-review (pull_request) Successful in 26s

Details

security fixes

2025-12-28 19:55:05 +00:00

9.9 KiB

Raw Blame History

Security Guidelines for OpenRabbit

This document outlines security best practices and requirements for OpenRabbit development.

Workflow Security
Webhook Data Handling
Input Validation
Secret Management
Security Scanning
Reporting Vulnerabilities

Workflow Security

Principle: Minimize Data Exposure

Problem: GitHub Actions/Gitea Actions can expose sensitive data through:

Environment variables visible in logs
Debug output
Error messages
Process listings

Solution: Use minimal data in workflows and sanitize all inputs.

❌ Bad: Exposing Full Webhook Data

# NEVER DO THIS - exposes all user data, emails, tokens
env:
  EVENT_JSON: ${{ toJSON(github.event) }}
run: |
  python process.py "$EVENT_JSON"

Why this is dangerous:

Full webhook payloads can contain user emails, private repo URLs, installation tokens
Data appears in workflow logs if debug mode is enabled
Environment variables can be dumped by malicious code
Violates principle of least privilege

✅ Good: Minimal Data Extraction

# SAFE: Only extract necessary fields
run: |
  EVENT_DATA=$(cat <<EOF
  {
    "issue": {
      "number": ${{ github.event.issue.number }}
    },
    "comment": {
      "body": $(echo '${{ github.event.comment.body }}' | jq -Rs .)
    }
  }
  EOF
  )
  python utils/safe_dispatch.py issue_comment "$REPO" "$EVENT_DATA"

Why this is safe:

Only includes necessary fields (number, body)
Agents fetch full data from API with proper auth
Reduces attack surface
Follows data minimization principle

Input Validation Requirements

All workflow inputs MUST be validated before use:

Repository Format

# Validate owner/repo format
if ! echo "$REPO" | grep -qE '^[a-zA-Z0-9_-]+/[a-zA-Z0-9_-]+$'; then
    echo "Error: Invalid repository format"
    exit 1
fi

Numeric Inputs

# Validate issue/PR numbers are numeric
if ! [[ "$ISSUE_NUMBER" =~ ^[0-9]+$ ]]; then
    echo "Error: Invalid issue number"
    exit 1
fi

String Sanitization

# Use jq for JSON string escaping
BODY=$(echo "$RAW_BODY" | jq -Rs .)

Boolean Comparison

# ❌ WRONG: String comparison on boolean
if [ "$IS_PR" = "true" ]; then

# ✅ CORRECT: Use workflow expression
IS_PR="${{ gitea.event.issue.pull_request != null }}"
if [ "$IS_PR" = "true" ]; then

Webhook Data Handling

Using the Sanitization Utilities

Always use utils/webhook_sanitizer.py when handling webhook data:

from utils.webhook_sanitizer import (
    sanitize_webhook_data,
    validate_repository_format,
    extract_minimal_context,
)

# Sanitize data before logging or storing
sanitized = sanitize_webhook_data(raw_event_data)

# Extract only necessary fields
minimal = extract_minimal_context(event_type, sanitized)

# Validate repository input
owner, repo = validate_repository_format(repo_string)

Sensitive Fields (Automatically Redacted)

The sanitizer removes these fields:

email, private_email, email_addresses
token, access_token, refresh_token, api_key
secret, password, private_key, ssh_key
phone, address, ssn, credit_card
installation_id, node_id

Large Field Truncation

These fields are truncated to prevent log flooding:

body: 500 characters
description: 500 characters
message: 500 characters

Input Validation

Repository Name Validation

from utils.webhook_sanitizer import validate_repository_format

try:
    owner, repo = validate_repository_format(user_input)
except ValueError as e:
    logger.error(f"Invalid repository: {e}")
    return

Checks performed:

Format is owner/repo
No path traversal (..)
No shell injection characters (;, |, &, `, etc.)
Non-empty owner and repo name

Event Data Size Limits

# Maximum event size: 10MB
MAX_EVENT_SIZE = 10 * 1024 * 1024

if len(event_json) > MAX_EVENT_SIZE:
    raise ValueError("Event data too large")

JSON Validation

try:
    data = json.loads(event_json)
except json.JSONDecodeError as e:
    raise ValueError(f"Invalid JSON: {e}")

if not isinstance(data, dict):
    raise ValueError("Event data must be a JSON object")

Secret Management

Environment Variables

Required secrets (set in CI/CD settings):

AI_REVIEW_TOKEN - Gitea/GitHub API token (read/write access)
OPENAI_API_KEY - OpenAI API key
OPENROUTER_API_KEY - OpenRouter API key (optional)
OLLAMA_HOST - Ollama server URL (optional)

❌ Never Commit Secrets

# NEVER DO THIS
api_key = "sk-1234567890abcdef"  # ❌ Hardcoded secret

# NEVER DO THIS
config = {
    "openai_key": "sk-1234567890abcdef"  # ❌ Secret in config
}

✅ Always Use Environment Variables

# CORRECT
api_key = os.environ.get("OPENAI_API_KEY")
if not api_key:
    raise ValueError("OPENAI_API_KEY not set")

Secret Scanning

The security scanner checks for:

Hardcoded API keys (pattern: sk-[a-zA-Z0-9]{32,})
AWS keys (AKIA[0-9A-Z]{16})
Private keys (-----BEGIN.*PRIVATE KEY-----)
Passwords in code (password\s*=\s*["'][^"']+["'])

Security Scanning

Automated Scanning

All code is scanned for vulnerabilities:

PR Reviews - Automatic security scan on every PR
Pre-commit Hooks - Local scanning before commit
Pattern-based Detection - 17 built-in security rules

Running Manual Scans

# Scan a specific file
python -c "
from security.security_scanner import SecurityScanner
s = SecurityScanner()
with open('myfile.py') as f:
    findings = s.scan_content(f.read(), 'myfile.py')
    for f in findings:
        print(f'{f.severity}: {f.description}')
"

# Scan a git diff
git diff | python tools/ai-review/security/scan_diff.py

Security Rule Categories

A01: Broken Access Control - Missing auth, insecure file operations
A02: Cryptographic Failures - Weak crypto, hardcoded secrets
A03: Injection - SQL injection, command injection, XSS
A06: Vulnerable Components - Insecure imports
A07: Authentication Failures - Weak auth mechanisms
A09: Logging Failures - Security logging issues

Severity Levels

HIGH: Critical vulnerabilities requiring immediate fix
- SQL injection, command injection, hardcoded secrets
MEDIUM: Important issues requiring attention
- Missing input validation, weak crypto, XSS
LOW: Best practice violations
- TODO comments with security keywords, eval() usage

CI Failure Threshold

Configure in config.yml:

review:
  fail_on_severity: HIGH  # Fail CI if HIGH severity found

Webhook Signature Validation

Future GitHub Integration

When accepting webhooks directly (not through Gitea Actions):

from utils.webhook_sanitizer import validate_webhook_signature

# Validate webhook is from GitHub
signature = request.headers.get("X-Hub-Signature-256")
payload = request.get_data(as_text=True)
secret = os.environ["WEBHOOK_SECRET"]

if not validate_webhook_signature(payload, signature, secret):
    return "Unauthorized", 401

Important: Always validate webhook signatures to prevent:

Replay attacks
Forged webhook events
Unauthorized access

Reporting Vulnerabilities

Security Issues

If you discover a security vulnerability:

DO NOT create a public issue
Email security contact: [maintainer email]
Include:
- Description of the vulnerability
- Steps to reproduce
- Potential impact
- Suggested fix (if available)

Response Timeline

Acknowledgment: Within 48 hours
Initial Assessment: Within 1 week
Fix Development: Depends on severity
- HIGH: Within 1 week
- MEDIUM: Within 2 weeks
- LOW: Next release cycle

Security Checklist for Contributors

Before submitting a PR:

No secrets in code or config files
All user inputs are validated
No SQL injection vulnerabilities
No command injection vulnerabilities
No XSS vulnerabilities
Sensitive data is sanitized before logging
Environment variables are not exposed in workflows
Repository format validation is used
Error messages don't leak sensitive info
Security scanner passes (no HIGH severity)

Security Tools

Webhook Sanitizer

Location: tools/ai-review/utils/webhook_sanitizer.py

Functions:

sanitize_webhook_data(data) - Remove sensitive fields
extract_minimal_context(event_type, data) - Minimal payload
validate_repository_format(repo) - Validate owner/repo
validate_webhook_signature(payload, sig, secret) - Verify webhook

Safe Dispatch Utility

Location: tools/ai-review/utils/safe_dispatch.py

Usage:

python utils/safe_dispatch.py issue_comment owner/repo '{"action": "created", ...}'

Features:

Input validation
Size limits (10MB max)
Automatic sanitization
Comprehensive error handling

Security Scanner

Location: tools/ai-review/security/security_scanner.py