# Security Guidelines for OpenRabbit

This document outlines security best practices and requirements for OpenRabbit development.

## Table of Contents

1. [Workflow Security](#workflow-security)
2. [Webhook Data Handling](#webhook-data-handling)
3. [Input Validation](#input-validation)
4. [Secret Management](#secret-management)
5. [Security Scanning](#security-scanning)
6. [Reporting Vulnerabilities](#reporting-vulnerabilities)

---

## Workflow Security

### Principle: Minimize Data Exposure

**Problem:** GitHub Actions/Gitea Actions can expose sensitive data through:

- Environment variables visible in logs
- Debug output
- Error messages
- Process listings

**Solution:** Use minimal data in workflows and sanitize all inputs.

### ❌ Bad: Exposing Full Webhook Data

```yaml
# NEVER DO THIS - exposes all user data, emails, tokens
env:
  EVENT_JSON: ${{ toJSON(github.event) }}
run: |
  python process.py "$EVENT_JSON"
```

**Why this is dangerous:**

- Full webhook payloads can contain user emails, private repo URLs, installation tokens
- Data appears in workflow logs if debug mode is enabled
- Environment variables can be dumped by malicious code
- Violates principle of least privilege

### ✅ Good: Minimal Data Extraction

```yaml
# SAFE: Only extract necessary fields
run: |
  EVENT_DATA=$(cat <<EOF
  ...
  EOF
  )
```

---

## Input Validation

### Size Limits

```python
if len(event_json) > MAX_EVENT_SIZE:
    raise ValueError("Event data too large")
```

### JSON Validation

```python
try:
    data = json.loads(event_json)
except json.JSONDecodeError as e:
    raise ValueError(f"Invalid JSON: {e}")

if not isinstance(data, dict):
    raise ValueError("Event data must be a JSON object")
```

---

## Secret Management

### Environment Variables

Required secrets (set in CI/CD settings):

- `AI_REVIEW_TOKEN` - Gitea/GitHub API token (read/write access)
- `OPENAI_API_KEY` - OpenAI API key
- `OPENROUTER_API_KEY` - OpenRouter API key (optional)
- `OLLAMA_HOST` - Ollama server URL (optional)

### ❌ Never Commit Secrets

```python
# NEVER DO THIS
api_key = "sk-1234567890abcdef"  # ❌ Hardcoded secret
```
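Hardcoded literals like the key above are exactly what pattern-based secret scanning catches. A minimal sketch of that kind of detection, built from the patterns listed under Secret Scanning below (the `find_secrets` helper is illustrative, not part of the project):

```python
import re

# Illustrative subset of the secret-scanning patterns described below
SECRET_PATTERNS = {
    "api_key": re.compile(r"sk-[a-zA-Z0-9]{32,}"),
    "aws_key": re.compile(r"AKIA[0-9A-Z]{16}"),
    "private_key": re.compile(r"-----BEGIN.*PRIVATE KEY-----"),
    "password": re.compile(r"password\s*=\s*[\"'][^\"']+[\"']"),
}

def find_secrets(source: str) -> list[tuple[str, str]]:
    """Return (rule_name, matched_text) pairs for every pattern hit in `source`."""
    return [
        (name, match.group())
        for name, pattern in SECRET_PATTERNS.items()
        for match in pattern.finditer(source)
    ]
```

Note that a short fake key such as `sk-1234567890abcdef` falls below the 32-character threshold of the first pattern; real OpenAI-style keys are long enough to match.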
```python
# NEVER DO THIS
config = {
    "openai_key": "sk-1234567890abcdef"  # ❌ Secret in config
}
```

### ✅ Always Use Environment Variables

```python
# CORRECT
api_key = os.environ.get("OPENAI_API_KEY")
if not api_key:
    raise ValueError("OPENAI_API_KEY not set")
```

### Secret Scanning

The security scanner checks for:

- Hardcoded API keys (pattern: `sk-[a-zA-Z0-9]{32,}`)
- AWS keys (`AKIA[0-9A-Z]{16}`)
- Private keys (`-----BEGIN.*PRIVATE KEY-----`)
- Passwords in code (`password\s*=\s*["'][^"']+["']`)

---

## Security Scanning

### Automated Scanning

All code is scanned for vulnerabilities:

1. **PR Reviews** - Automatic security scan on every PR
2. **Pre-commit Hooks** - Local scanning before commit
3. **Pattern-based Detection** - 17 built-in security rules

### Running Manual Scans

```bash
# Scan a specific file
python -c "
from security.security_scanner import SecurityScanner
s = SecurityScanner()
with open('myfile.py') as f:
    findings = s.scan_content(f.read(), 'myfile.py')
for finding in findings:
    print(f'{finding.severity}: {finding.description}')
"

# Scan a git diff
git diff | python tools/ai-review/security/scan_diff.py
```

### Security Rule Categories

- **A01: Broken Access Control** - Missing auth, insecure file operations
- **A02: Cryptographic Failures** - Weak crypto, hardcoded secrets
- **A03: Injection** - SQL injection, command injection, XSS
- **A06: Vulnerable Components** - Insecure imports
- **A07: Authentication Failures** - Weak auth mechanisms
- **A09: Logging Failures** - Security logging issues

### Severity Levels

- **HIGH**: Critical vulnerabilities requiring immediate fix
  - SQL injection, command injection, hardcoded secrets
- **MEDIUM**: Important issues requiring attention
  - Missing input validation, weak crypto, XSS
- **LOW**: Best practice violations
  - TODO comments with security keywords, eval() usage

### CI Failure Threshold

Configure in `config.yml`:

```yaml
review:
  fail_on_severity: HIGH  # Fail CI if HIGH severity found
```

---

## Webhook Signature Validation

### Future GitHub Integration
When accepting webhooks directly (not through Gitea Actions):

```python
from utils.webhook_sanitizer import validate_webhook_signature

# Validate webhook is from GitHub
signature = request.headers.get("X-Hub-Signature-256")
payload = request.get_data(as_text=True)
secret = os.environ["WEBHOOK_SECRET"]

if not validate_webhook_signature(payload, signature, secret):
    return "Unauthorized", 401
```

**Important:** Always validate webhook signatures to prevent:

- Payload tampering
- Forged webhook events
- Unauthorized access

---

## Reporting Vulnerabilities

### Security Issues

If you discover a security vulnerability:

1. **DO NOT** create a public issue
2. Email security contact: [maintainer email]
3. Include:
   - Description of the vulnerability
   - Steps to reproduce
   - Potential impact
   - Suggested fix (if available)

### Response Timeline

- **Acknowledgment**: Within 48 hours
- **Initial Assessment**: Within 1 week
- **Fix Development**: Depends on severity
  - HIGH: Within 1 week
  - MEDIUM: Within 2 weeks
  - LOW: Next release cycle

---

## Security Checklist for Contributors

Before submitting a PR:

- [ ] No secrets in code or config files
- [ ] All user inputs are validated
- [ ] No SQL injection vulnerabilities
- [ ] No command injection vulnerabilities
- [ ] No XSS vulnerabilities
- [ ] Sensitive data is sanitized before logging
- [ ] Environment variables are not exposed in workflows
- [ ] Repository format validation is used
- [ ] Error messages don't leak sensitive info
- [ ] Security scanner passes (no HIGH severity)

---

## Security Tools

### Webhook Sanitizer

Location: `tools/ai-review/utils/webhook_sanitizer.py`

Functions:

- `sanitize_webhook_data(data)` - Remove sensitive fields
- `extract_minimal_context(event_type, data)` - Minimal payload
- `validate_repository_format(repo)` - Validate owner/repo
- `validate_webhook_signature(payload, sig, secret)` - Verify webhook

### Safe Dispatch Utility

Location: `tools/ai-review/utils/safe_dispatch.py`
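The shape of the checks this utility's Features list describes can be sketched as a small validation pipeline. This is a sketch under the documented behavior (size limit, owner/repo format check, JSON validation); the `safe_dispatch` function name and return shape here are illustrative, not the module's actual API:

```python
import json

MAX_PAYLOAD_BYTES = 10 * 1024 * 1024  # 10 MB cap, per the Features list

def safe_dispatch(event_type: str, repo: str, payload_json: str) -> dict:
    """Validate a raw event payload before handing it to the review pipeline."""
    if len(payload_json.encode("utf-8")) > MAX_PAYLOAD_BYTES:
        raise ValueError("Event data too large")
    owner, sep, name = repo.partition("/")
    if not sep or not owner or not name or "/" in name:
        raise ValueError("Repository must be in owner/repo format")
    try:
        payload = json.loads(payload_json)
    except json.JSONDecodeError as e:
        raise ValueError(f"Invalid JSON: {e}")
    if not isinstance(payload, dict):
        raise ValueError("Event data must be a JSON object")
    # Sanitization (dropping emails, tokens, private URLs) would happen here
    return {"event": event_type, "repo": repo, "payload": payload}
```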
Usage:

```bash
python utils/safe_dispatch.py issue_comment owner/repo '{"action": "created", ...}'
```

Features:

- Input validation
- Size limits (10MB max)
- Automatic sanitization
- Comprehensive error handling

### Security Scanner

Location: `tools/ai-review/security/security_scanner.py`

Features:

- 17 built-in security rules
- OWASP Top 10 coverage
- CWE references
- Severity classification
- Pattern-based detection

---

## Best Practices Summary

1. **Minimize Data**: Only pass necessary data to workflows
2. **Validate Inputs**: Always validate external inputs
3. **Sanitize Outputs**: Remove sensitive data before logging
4. **Use Utilities**: Leverage `webhook_sanitizer.py` and `safe_dispatch.py`
5. **Scan Code**: Run security scanner before committing
6. **Rotate Secrets**: Regularly rotate API keys and tokens
7. **Review Changes**: Manual security review for sensitive changes
8. **Test Security**: Add tests for security-critical code

---

## Updates and Maintenance

This security policy is reviewed quarterly and updated as needed.

**Last Updated**: 2025-12-28
**Next Review**: 2026-03-28