Security Guidelines for OpenRabbit

This document outlines security best practices and requirements for OpenRabbit development.

Table of Contents

  1. Workflow Security
  2. Webhook Data Handling
  3. Input Validation
  4. Secret Management
  5. Security Scanning
  6. Webhook Signature Validation
  7. Reporting Vulnerabilities
  8. Security Checklist for Contributors
  9. Security Tools
  10. Best Practices Summary

Workflow Security

Principle: Minimize Data Exposure

Problem: GitHub Actions/Gitea Actions can expose sensitive data through:

  • Environment variables visible in logs
  • Debug output
  • Error messages
  • Process listings

Solution: Use minimal data in workflows and sanitize all inputs.

Bad: Exposing Full Webhook Data

# NEVER DO THIS - exposes all user data, emails, tokens
env:
  EVENT_JSON: ${{ toJSON(github.event) }}
run: |
  python process.py "$EVENT_JSON"

Why this is dangerous:

  • Full webhook payloads can contain user emails, private repo URLs, installation tokens
  • Data appears in workflow logs if debug mode is enabled
  • Environment variables can be dumped by malicious code
  • Violates principle of least privilege

Good: Minimal Data Extraction

# SAFE: Only extract necessary fields; untrusted text enters via an
# environment variable, never via script interpolation
env:
  COMMENT_BODY: ${{ github.event.comment.body }}
run: |
  EVENT_DATA=$(jq -n \
    --argjson number "${{ github.event.issue.number }}" \
    --arg body "$COMMENT_BODY" \
    '{issue: {number: $number}, comment: {body: $body}}')
  python utils/safe_dispatch.py issue_comment "$REPO" "$EVENT_DATA"

Why this is safe:

  • Only includes necessary fields (number, body)
  • Agents fetch full data from API with proper auth
  • Reduces attack surface
  • Follows data minimization principle

Input Validation Requirements

All workflow inputs MUST be validated before use:

  1. Repository Format

    # Validate owner/repo format
    if ! echo "$REPO" | grep -qE '^[a-zA-Z0-9_-]+/[a-zA-Z0-9_-]+$'; then
        echo "Error: Invalid repository format"
        exit 1
    fi
    
  2. Numeric Inputs

    # Validate issue/PR numbers are numeric
    if ! [[ "$ISSUE_NUMBER" =~ ^[0-9]+$ ]]; then
        echo "Error: Invalid issue number"
        exit 1
    fi
    
  3. String Sanitization

    # Use jq for JSON string escaping
    BODY=$(echo "$RAW_BODY" | jq -Rs .)
    

Boolean Comparison

# ❌ WRONG: $IS_PR was never assigned from the workflow context,
# so this string comparison always fails
if [ "$IS_PR" = "true" ]; then

# ✅ CORRECT: Evaluate the workflow expression into a shell variable
# first, then string-compare against "true"
IS_PR="${{ gitea.event.issue.pull_request != null }}"
if [ "$IS_PR" = "true" ]; then

Webhook Data Handling

Using the Sanitization Utilities

Always use utils/webhook_sanitizer.py when handling webhook data:

from utils.webhook_sanitizer import (
    sanitize_webhook_data,
    validate_repository_format,
    extract_minimal_context,
)

# Sanitize data before logging or storing
sanitized = sanitize_webhook_data(raw_event_data)

# Extract only necessary fields
minimal = extract_minimal_context(event_type, sanitized)

# Validate repository input
owner, repo = validate_repository_format(repo_string)

Sensitive Fields (Automatically Redacted)

The sanitizer removes these fields:

  • email, private_email, email_addresses
  • token, access_token, refresh_token, api_key
  • secret, password, private_key, ssh_key
  • phone, address, ssn, credit_card
  • installation_id, node_id

Large Field Truncation

These fields are truncated to prevent log flooding:

  • body: 500 characters
  • description: 500 characters
  • message: 500 characters
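
The redaction and truncation behavior described above can be sketched as follows. This is a hypothetical illustration using the field names and limits from this document; the actual implementation in utils/webhook_sanitizer.py may differ.

```python
# Sketch of the recursive redact-and-truncate pass (illustrative only;
# field names and limits are taken from the lists above).
SENSITIVE_FIELDS = {
    "email", "private_email", "email_addresses",
    "token", "access_token", "refresh_token", "api_key",
    "secret", "password", "private_key", "ssh_key",
    "phone", "address", "ssn", "credit_card",
    "installation_id", "node_id",
}
TRUNCATED_FIELDS = {"body": 500, "description": 500, "message": 500}

def sanitize(data):
    """Recursively redact sensitive fields and truncate large ones."""
    if isinstance(data, dict):
        out = {}
        for key, value in data.items():
            if key in SENSITIVE_FIELDS:
                out[key] = "[REDACTED]"
            elif key in TRUNCATED_FIELDS and isinstance(value, str):
                out[key] = value[:TRUNCATED_FIELDS[key]]
            else:
                out[key] = sanitize(value)
        return out
    if isinstance(data, list):
        return [sanitize(item) for item in data]
    return data
```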

Input Validation

Repository Name Validation

from utils.webhook_sanitizer import validate_repository_format

try:
    owner, repo = validate_repository_format(user_input)
except ValueError as e:
    logger.error(f"Invalid repository: {e}")
    return

Checks performed:

  • Format is owner/repo
  • No path traversal (..)
  • No shell injection characters (;, |, &, `, etc.)
  • Non-empty owner and repo name
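
The checks listed above can be sketched in a few lines. This is a hedged approximation of what validate_repository_format does; the real utility may order checks differently or reject additional characters.

```python
import re

# Approximate sketch of validate_repository_format (illustrative only).
REPO_RE = re.compile(r"^[A-Za-z0-9_-]+/[A-Za-z0-9_-]+$")
SHELL_CHARS = set(";|&`$(){}<>\n")

def validate_repository_format(repo: str) -> tuple[str, str]:
    """Validate owner/repo input; raise ValueError with a specific reason."""
    if ".." in repo:
        raise ValueError("path traversal detected")
    if any(c in SHELL_CHARS for c in repo):
        raise ValueError("shell metacharacters not allowed")
    if not REPO_RE.match(repo):
        raise ValueError("expected owner/repo format")
    owner, name = repo.split("/", 1)
    return owner, name
```

The explicit traversal and metacharacter checks are redundant with the allowlist regex, but they produce clearer error messages matching the checks listed above.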

Event Data Size Limits

# Maximum event size: 10MB
MAX_EVENT_SIZE = 10 * 1024 * 1024

if len(event_json) > MAX_EVENT_SIZE:
    raise ValueError("Event data too large")

JSON Validation

try:
    data = json.loads(event_json)
except json.JSONDecodeError as e:
    raise ValueError(f"Invalid JSON: {e}")

if not isinstance(data, dict):
    raise ValueError("Event data must be a JSON object")

Secret Management

Environment Variables

Required secrets (set in CI/CD settings):

  • AI_REVIEW_TOKEN - Gitea/GitHub API token (read/write access)
  • OPENAI_API_KEY - OpenAI API key
  • OPENROUTER_API_KEY - OpenRouter API key (optional)
  • OLLAMA_HOST - Ollama server URL (optional)

Never Commit Secrets

# NEVER DO THIS
api_key = "sk-1234567890abcdef"  # ❌ Hardcoded secret

# NEVER DO THIS
config = {
    "openai_key": "sk-1234567890abcdef"  # ❌ Secret in config
}

Always Use Environment Variables

# CORRECT
api_key = os.environ.get("OPENAI_API_KEY")
if not api_key:
    raise ValueError("OPENAI_API_KEY not set")

Secret Scanning

The security scanner checks for:

  • Hardcoded API keys (pattern: sk-[a-zA-Z0-9]{32,})
  • AWS keys (AKIA[0-9A-Z]{16})
  • Private keys (-----BEGIN.*PRIVATE KEY-----)
  • Passwords in code (password\s*=\s*["'][^"']+["'])
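
The patterns above can be applied directly with Python's re module. This sketch covers only the four rules listed here; the actual scanner's rule set is larger.

```python
import re

# The four secret patterns documented above, compiled for scanning.
SECRET_PATTERNS = {
    "openai_key": re.compile(r"sk-[a-zA-Z0-9]{32,}"),
    "aws_key": re.compile(r"AKIA[0-9A-Z]{16}"),
    "private_key": re.compile(r"-----BEGIN.*PRIVATE KEY-----"),
    "password": re.compile(r"password\s*=\s*[\"'][^\"']+[\"']"),
}

def find_secrets(text):
    """Return (rule_name, matched_text) pairs for every pattern hit."""
    hits = []
    for name, pattern in SECRET_PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((name, match.group(0)))
    return hits
```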

Security Scanning

Automated Scanning

All code is scanned for vulnerabilities:

  1. PR Reviews - Automatic security scan on every PR
  2. Pre-commit Hooks - Local scanning before commit
  3. Pattern-based Detection - 17 built-in security rules

Running Manual Scans

# Scan a specific file
python -c "
from security.security_scanner import SecurityScanner
scanner = SecurityScanner()
with open('myfile.py') as fh:
    findings = scanner.scan_content(fh.read(), 'myfile.py')
for finding in findings:
    print(f'{finding.severity}: {finding.description}')
"

# Scan a git diff
git diff | python tools/ai-review/security/scan_diff.py

Security Rule Categories

  • A01: Broken Access Control - Missing auth, insecure file operations
  • A02: Cryptographic Failures - Weak crypto, hardcoded secrets
  • A03: Injection - SQL injection, command injection, XSS
  • A06: Vulnerable Components - Insecure imports
  • A07: Authentication Failures - Weak auth mechanisms
  • A09: Logging Failures - Security logging issues

Severity Levels

  • HIGH: Critical vulnerabilities requiring immediate fix

    • SQL injection, command injection, hardcoded secrets
  • MEDIUM: Important issues requiring attention

    • Missing input validation, weak crypto, XSS
  • LOW: Best practice violations

    • TODO comments with security keywords, eval() usage

CI Failure Threshold

Configure in config.yml:

review:
  fail_on_severity: HIGH  # Fail CI if HIGH severity found
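
The threshold semantics can be illustrated with a small gating function. This is a sketch of how fail_on_severity could be applied; the review tool's actual config handling may differ.

```python
# Illustrative severity gate for CI (not the tool's actual implementation).
SEVERITY_RANK = {"LOW": 0, "MEDIUM": 1, "HIGH": 2}

def should_fail(findings, fail_on_severity="HIGH"):
    """Return True if any (severity, description) finding meets the threshold."""
    threshold = SEVERITY_RANK[fail_on_severity]
    return any(SEVERITY_RANK[sev] >= threshold for sev, _ in findings)
```

With fail_on_severity: HIGH, a PR full of MEDIUM findings still passes CI; lowering the threshold to MEDIUM would block it.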

Webhook Signature Validation

Future GitHub Integration

When accepting webhooks directly (not through Gitea Actions):

from utils.webhook_sanitizer import validate_webhook_signature

# Validate webhook is from GitHub
signature = request.headers.get("X-Hub-Signature-256")
payload = request.get_data(as_text=True)
secret = os.environ["WEBHOOK_SECRET"]

if not validate_webhook_signature(payload, signature, secret):
    return "Unauthorized", 401

Important: Always validate webhook signatures to prevent:

  • Replay attacks
  • Forged webhook events
  • Unauthorized access
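
For reference, GitHub's X-Hub-Signature-256 scheme is an HMAC-SHA256 of the raw payload, hex-encoded and prefixed with "sha256=". A sketch of the validation (the project's validate_webhook_signature helper may differ in details):

```python
import hashlib
import hmac

def validate_webhook_signature(payload: str, signature: str, secret: str) -> bool:
    """Verify an X-Hub-Signature-256 style HMAC-SHA256 signature."""
    if not signature or not signature.startswith("sha256="):
        return False
    expected = "sha256=" + hmac.new(
        secret.encode(), payload.encode(), hashlib.sha256
    ).hexdigest()
    # Constant-time comparison prevents timing attacks on the signature
    return hmac.compare_digest(expected, signature)
```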

Reporting Vulnerabilities

Security Issues

If you discover a security vulnerability:

  1. DO NOT create a public issue
  2. Email security contact: [maintainer email]
  3. Include:
    • Description of the vulnerability
    • Steps to reproduce
    • Potential impact
    • Suggested fix (if available)

Response Timeline

  • Acknowledgment: Within 48 hours
  • Initial Assessment: Within 1 week
  • Fix Development: Depends on severity
    • HIGH: Within 1 week
    • MEDIUM: Within 2 weeks
    • LOW: Next release cycle

Security Checklist for Contributors

Before submitting a PR:

  • No secrets in code or config files
  • All user inputs are validated
  • No SQL injection vulnerabilities
  • No command injection vulnerabilities
  • No XSS vulnerabilities
  • Sensitive data is sanitized before logging
  • Environment variables are not exposed in workflows
  • Repository format validation is used
  • Error messages don't leak sensitive info
  • Security scanner passes (no HIGH severity)

Security Tools

Webhook Sanitizer

Location: tools/ai-review/utils/webhook_sanitizer.py

Functions:

  • sanitize_webhook_data(data) - Remove sensitive fields
  • extract_minimal_context(event_type, data) - Minimal payload
  • validate_repository_format(repo) - Validate owner/repo
  • validate_webhook_signature(payload, sig, secret) - Verify webhook

Safe Dispatch Utility

Location: tools/ai-review/utils/safe_dispatch.py

Usage:

python utils/safe_dispatch.py issue_comment owner/repo '{"action": "created", ...}'

Features:

  • Input validation
  • Size limits (10MB max)
  • Automatic sanitization
  • Comprehensive error handling

Security Scanner

Location: tools/ai-review/security/security_scanner.py

Features:

  • 17 built-in security rules
  • OWASP Top 10 coverage
  • CWE references
  • Severity classification
  • Pattern-based detection

Best Practices Summary

  1. Minimize Data: Only pass necessary data to workflows
  2. Validate Inputs: Always validate external inputs
  3. Sanitize Outputs: Remove sensitive data before logging
  4. Use Utilities: Leverage webhook_sanitizer.py and safe_dispatch.py
  5. Scan Code: Run security scanner before committing
  6. Rotate Secrets: Regularly rotate API keys and tokens
  7. Review Changes: Manual security review for sensitive changes
  8. Test Security: Add tests for security-critical code

Updates and Maintenance

This security policy is reviewed quarterly and updated as needed.

Last Updated: 2025-12-28
Next Review: 2026-03-28