Security Guidelines for OpenRabbit
This document outlines security best practices and requirements for OpenRabbit development.
Table of Contents
- Workflow Security
- Webhook Data Handling
- Input Validation
- Secret Management
- Security Scanning
- Reporting Vulnerabilities
Workflow Security
Principle: Minimize Data Exposure
Problem: GitHub Actions/Gitea Actions can expose sensitive data through:
- Environment variables visible in logs
- Debug output
- Error messages
- Process listings
Solution: Use minimal data in workflows and sanitize all inputs.
❌ Bad: Exposing Full Webhook Data
# NEVER DO THIS - exposes all user data, emails, tokens
env:
  EVENT_JSON: ${{ toJSON(github.event) }}
run: |
  python process.py "$EVENT_JSON"
Why this is dangerous:
- Full webhook payloads can contain user emails, private repo URLs, installation tokens
- Data appears in workflow logs if debug mode is enabled
- Environment variables can be dumped by malicious code
- Violates principle of least privilege
✅ Good: Minimal Data Extraction
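# SAFE: Only extract necessary fields, and pass them through env
# so untrusted text is never interpolated into the script itself
env:
  ISSUE_NUMBER: ${{ github.event.issue.number }}
  COMMENT_BODY: ${{ github.event.comment.body }}
run: |
  EVENT_DATA=$(jq -n \
    --argjson number "$ISSUE_NUMBER" \
    --arg body "$COMMENT_BODY" \
    '{issue: {number: $number}, comment: {body: $body}}')
  python utils/safe_dispatch.py issue_comment "$REPO" "$EVENT_DATA"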
Why this is safe:
- Only includes necessary fields (number, body)
- Agents fetch full data from API with proper auth
- Reduces attack surface
- Follows data minimization principle
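On the Python side, the same minimal payload can be built without any manual quoting. The following is a minimal sketch, not code from the repository: `build_minimal_event` is a hypothetical helper, and the payload shape is assumed to match the workflow example above.

```python
import json


def build_minimal_event(issue_number: int, comment_body: str) -> str:
    """Return a JSON string holding only the issue number and comment body."""
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(issue_number, bool) or not isinstance(issue_number, int):
        raise ValueError("issue number must be an integer")
    if issue_number <= 0:
        raise ValueError("issue number must be positive")
    payload = {
        "issue": {"number": issue_number},
        "comment": {"body": comment_body},
    }
    # json.dumps escapes quotes, backslashes, and newlines in the body,
    # so hostile comment text cannot break out of the JSON string.
    return json.dumps(payload)


event = build_minimal_event(42, 'He said "hi"; rm -rf /\n$(whoami)')
```

Because the body is serialized rather than pasted into a template, shell metacharacters in a comment stay inert data.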
Input Validation Requirements
All workflow inputs MUST be validated before use:
- Repository Format
  # Validate owner/repo format
  if ! echo "$REPO" | grep -qE '^[a-zA-Z0-9_-]+/[a-zA-Z0-9_-]+$'; then
    echo "Error: Invalid repository format"
    exit 1
  fi
- Numeric Inputs
  # Validate issue/PR numbers are numeric
  if ! [[ "$ISSUE_NUMBER" =~ ^[0-9]+$ ]]; then
    echo "Error: Invalid issue number"
    exit 1
  fi
- String Sanitization
  # Use jq for JSON string escaping
  BODY=$(echo "$RAW_BODY" | jq -Rs .)
Boolean Comparison
# ❌ WRONG: Compare a variable that may not hold the literal string "true" or "false"
if [ "$IS_PR" = "true" ]; then
# ✅ CORRECT: Set the variable from a workflow expression, which renders as "true" or "false"
IS_PR="${{ gitea.event.issue.pull_request != null }}"
if [ "$IS_PR" = "true" ]; then
Webhook Data Handling
Using the Sanitization Utilities
Always use utils/webhook_sanitizer.py when handling webhook data:
from utils.webhook_sanitizer import (
sanitize_webhook_data,
validate_repository_format,
extract_minimal_context,
)
# Sanitize data before logging or storing
sanitized = sanitize_webhook_data(raw_event_data)
# Extract only necessary fields
minimal = extract_minimal_context(event_type, sanitized)
# Validate repository input
owner, repo = validate_repository_format(repo_string)
Sensitive Fields (Automatically Redacted)
The sanitizer removes these fields:
- email, private_email, email_addresses
- token, access_token, refresh_token, api_key
- secret, password, private_key, ssh_key
- phone, address, ssn, credit_card
- installation_id, node_id
Large Field Truncation
These fields are truncated to prevent log flooding:
- body: 500 characters
- description: 500 characters
- message: 500 characters
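To make the redaction and truncation rules concrete, here is a minimal sketch of how such a pass could work. It is not the actual code in utils/webhook_sanitizer.py: `redact_and_truncate` is a hypothetical name, and dropping keys outright (rather than masking their values) and cutting long strings without a marker are both assumptions.

```python
from typing import Any

# Field names and limits copied from the lists in this section.
SENSITIVE_FIELDS = {
    "email", "private_email", "email_addresses",
    "token", "access_token", "refresh_token", "api_key",
    "secret", "password", "private_key", "ssh_key",
    "phone", "address", "ssn", "credit_card",
    "installation_id", "node_id",
}
TRUNCATE_LIMITS = {"body": 500, "description": 500, "message": 500}


def redact_and_truncate(data: Any) -> Any:
    """Return a copy of data without sensitive keys and with long fields cut."""
    if isinstance(data, dict):
        cleaned = {}
        for key, value in data.items():
            if key.lower() in SENSITIVE_FIELDS:
                continue  # assumption: drop the field entirely
            limit = TRUNCATE_LIMITS.get(key)
            if limit is not None and isinstance(value, str) and len(value) > limit:
                cleaned[key] = value[:limit]
            else:
                cleaned[key] = redact_and_truncate(value)
        return cleaned
    if isinstance(data, list):
        return [redact_and_truncate(item) for item in data]
    return data  # scalars pass through unchanged
```

The function builds a new structure instead of mutating its input, so the caller's original event is left untouched.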
Input Validation
Repository Name Validation
from utils.webhook_sanitizer import validate_repository_format
try:
    owner, repo = validate_repository_format(user_input)
except ValueError as e:
    logger.error(f"Invalid repository: {e}")
    return
Checks performed:
- Format is owner/repo
- No path traversal (..)
- No shell injection characters (;, |, &, `, etc.)
- Non-empty owner and repo name
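The checks above could be implemented along these lines. This is a minimal sketch under stated assumptions, not the real `validate_repository_format`: the name `check_repository_format` is hypothetical, and the allowed character set is borrowed from the shell pattern shown under Input Validation Requirements.

```python
import re
from typing import Tuple

# Assumption: same allowlist as the shell regex earlier in this document.
_NAME = re.compile(r"[A-Za-z0-9_-]+")


def check_repository_format(repo: str) -> Tuple[str, str]:
    """Split 'owner/repo' and raise ValueError if any check fails."""
    if not isinstance(repo, str):
        raise ValueError("repository must be a string")
    parts = repo.split("/")
    if len(parts) != 2:
        raise ValueError("expected owner/repo")
    owner, name = parts
    if not owner or not name:
        raise ValueError("owner and repo name must be non-empty")
    if ".." in repo:
        raise ValueError("path traversal is not allowed")
    for part in (owner, name):
        # fullmatch rejects anything outside the allowlist, including
        # ; | & ` and trailing newlines.
        if not _NAME.fullmatch(part):
            raise ValueError("invalid characters in repository")
    return owner, name
```

An allowlist is used instead of a denylist of shell characters, so any character not explicitly permitted is rejected by default.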
Event Data Size Limits
# Maximum event size: 10MB
MAX_EVENT_SIZE = 10 * 1024 * 1024
if len(event_json) > MAX_EVENT_SIZE:
    raise ValueError("Event data too large")
JSON Validation
try:
    data = json.loads(event_json)
except json.JSONDecodeError as e:
    raise ValueError(f"Invalid JSON: {e}")
if not isinstance(data, dict):
    raise ValueError("Event data must be a JSON object")
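The size limit and the JSON checks fit naturally into one function. The sketch below is illustrative only: `parse_event` is a hypothetical name, and measuring the size in UTF-8 bytes (rather than characters) is an assumption about what the 10MB limit is meant to bound.

```python
import json

MAX_EVENT_SIZE = 10 * 1024 * 1024  # 10 MiB


def parse_event(event_json: str) -> dict:
    """Validate size and structure, then return the decoded event."""
    # Check the size first so oversized input is never handed to the parser.
    if len(event_json.encode("utf-8")) > MAX_EVENT_SIZE:
        raise ValueError("Event data too large")
    try:
        data = json.loads(event_json)
    except json.JSONDecodeError as e:
        raise ValueError(f"Invalid JSON: {e}") from e
    if not isinstance(data, dict):
        raise ValueError("Event data must be a JSON object")
    return data
```

Callers then only need to handle `ValueError`, whichever check failed.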
Secret Management
Environment Variables
Required secrets (set in CI/CD settings):
- AI_REVIEW_TOKEN - Gitea/GitHub API token (read/write access)
- OPENAI_API_KEY - OpenAI API key
- OPENROUTER_API_KEY - OpenRouter API key (optional)
- OLLAMA_HOST - Ollama server URL (optional)
❌ Never Commit Secrets
# NEVER DO THIS
api_key = "sk-1234567890abcdef" # ❌ Hardcoded secret
# NEVER DO THIS
config = {
"openai_key": "sk-1234567890abcdef" # ❌ Secret in config
}
✅ Always Use Environment Variables
# CORRECT
import os
api_key = os.environ.get("OPENAI_API_KEY")
if not api_key:
    raise ValueError("OPENAI_API_KEY not set")
Secret Scanning
The security scanner checks for:
- Hardcoded API keys (pattern: sk-[a-zA-Z0-9]{32,})
- AWS keys (AKIA[0-9A-Z]{16})
- Private keys (-----BEGIN.*PRIVATE KEY-----)
- Passwords in code (password\s*=\s*["'][^"']+["'])
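The patterns above can be tried out directly with Python's `re` module. This is a minimal sketch, not the scanner's implementation: `find_secrets` and the rule names are hypothetical, and scanning line by line is an assumption.

```python
import re
from typing import List, Tuple

# Regular expressions copied from the list above; rule names are illustrative.
SECRET_PATTERNS = {
    "api_key": re.compile(r"sk-[a-zA-Z0-9]{32,}"),
    "aws_access_key": re.compile(r"AKIA[0-9A-Z]{16}"),
    "private_key": re.compile(r"-----BEGIN.*PRIVATE KEY-----"),
    "password": re.compile(r"password\s*=\s*[\"'][^\"']+[\"']"),
}


def find_secrets(text: str) -> List[Tuple[int, str]]:
    """Return (line number, rule name) for every pattern hit in text."""
    findings = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for name, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                findings.append((line_no, name))
    return findings


# Build the sample at runtime so this file does not itself contain a match.
sample = "token = 'sk-" + "a" * 32 + "'\nname = 'harmless'\n"
hits = find_secrets(sample)
```

Note that the API key pattern requires at least 32 characters after `sk-`, so a shorter value such as the 16-character placeholder in the "Never Commit Secrets" example would not match it.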
Security Scanning
Automated Scanning
All code is scanned for vulnerabilities:
- PR Reviews - Automatic security scan on every PR
- Pre-commit Hooks - Local scanning before commit
- Pattern-based Detection - 17 built-in security rules
Running Manual Scans
# Scan a specific file
python -c "
from security.security_scanner import SecurityScanner
s = SecurityScanner()
with open('myfile.py') as fh:
    findings = s.scan_content(fh.read(), 'myfile.py')
for finding in findings:
    print(f'{finding.severity}: {finding.description}')
"
# Scan a git diff
git diff | python tools/ai-review/security/scan_diff.py
Security Rule Categories
- A01: Broken Access Control - Missing auth, insecure file operations
- A02: Cryptographic Failures - Weak crypto, hardcoded secrets
- A03: Injection - SQL injection, command injection, XSS
- A06: Vulnerable Components - Insecure imports
- A07: Authentication Failures - Weak auth mechanisms
- A09: Logging Failures - Security logging issues
Severity Levels
- HIGH: Critical vulnerabilities requiring immediate fix
  - SQL injection, command injection, hardcoded secrets
- MEDIUM: Important issues requiring attention
  - Missing input validation, weak crypto, XSS
- LOW: Best practice violations
  - TODO comments with security keywords, eval() usage
CI Failure Threshold
Configure in config.yml:
review:
  fail_on_severity: HIGH # Fail CI if HIGH severity found
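One plausible way to evaluate such a threshold is to rank the three severity levels and fail when any finding reaches the configured level. The sketch below is hypothetical: `should_fail_ci` is not a function from the project, and the "at or above the threshold" semantics are an assumption.

```python
from typing import Iterable

# Ranking of the severity levels described in this document.
SEVERITY_ORDER = {"LOW": 1, "MEDIUM": 2, "HIGH": 3}


def should_fail_ci(severities: Iterable[str], fail_on_severity: str = "HIGH") -> bool:
    """Return True if any finding is at or above the configured threshold."""
    threshold = SEVERITY_ORDER[fail_on_severity.upper()]
    return any(SEVERITY_ORDER[s.upper()] >= threshold for s in severities)


# With the default threshold, only a HIGH finding fails the build.
blocked = should_fail_ci(["LOW", "HIGH"])
passed = should_fail_ci(["LOW", "MEDIUM"])
```

Under this reading, lowering `fail_on_severity` to MEDIUM would make both MEDIUM and HIGH findings fail CI.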
Webhook Signature Validation
Future GitHub Integration
When accepting webhooks directly (not through Gitea Actions):
import os
from utils.webhook_sanitizer import validate_webhook_signature
# Validate webhook is from GitHub
signature = request.headers.get("X-Hub-Signature-256")
payload = request.get_data(as_text=True)
secret = os.environ["WEBHOOK_SECRET"]
if not validate_webhook_signature(payload, signature, secret):
    return "Unauthorized", 401
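Important: Always validate webhook signatures to prevent:
- Forged webhook events
- Tampered payloads
- Unauthorized access
A valid signature does not by itself stop replay attacks. To cover those, also reject deliveries whose ID (for GitHub, the X-GitHub-Delivery header) has already been processed.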
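For reference, GitHub sends the `X-Hub-Signature-256` header as `sha256=` followed by the hex HMAC-SHA256 of the raw request body, keyed with the webhook secret. The check can be sketched with the standard library as follows; `verify_signature_sha256` is a hypothetical stand-in and may differ from the project's `validate_webhook_signature`.

```python
import hashlib
import hmac
from typing import Optional

PREFIX = "sha256="


def verify_signature_sha256(payload: bytes, signature_header: Optional[str], secret: str) -> bool:
    """Return True only if the header matches the HMAC of the raw payload."""
    if not signature_header or not signature_header.startswith(PREFIX):
        return False  # missing or malformed header
    expected = hmac.new(secret.encode("utf-8"), payload, hashlib.sha256).hexdigest()
    received = signature_header[len(PREFIX):]
    # compare_digest runs in constant time, which avoids leaking how many
    # leading characters of the signature were correct.
    return hmac.compare_digest(expected.encode("utf-8"), received.encode("utf-8"))


# Usage with a made-up secret and body.
body = b'{"action": "created"}'
header = PREFIX + hmac.new(b"example-secret", body, hashlib.sha256).hexdigest()
ok = verify_signature_sha256(body, header, "example-secret")
```

The HMAC must be computed over the exact bytes received; re-serializing the parsed JSON before hashing would change the bytes and break verification.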
Reporting Vulnerabilities
Security Issues
If you discover a security vulnerability:
- DO NOT create a public issue
- Email security contact: [maintainer email]
- Include:
  - Description of the vulnerability
  - Steps to reproduce
  - Potential impact
  - Suggested fix (if available)
Response Timeline
- Acknowledgment: Within 48 hours
- Initial Assessment: Within 1 week
- Fix Development: Depends on severity
  - HIGH: Within 1 week
  - MEDIUM: Within 2 weeks
  - LOW: Next release cycle
Security Checklist for Contributors
Before submitting a PR:
- No secrets in code or config files
- All user inputs are validated
- No SQL injection vulnerabilities
- No command injection vulnerabilities
- No XSS vulnerabilities
- Sensitive data is sanitized before logging
- Environment variables are not exposed in workflows
- Repository format validation is used
- Error messages don't leak sensitive info
- Security scanner passes (no HIGH severity)
Security Tools
Webhook Sanitizer
Location: tools/ai-review/utils/webhook_sanitizer.py
Functions:
- sanitize_webhook_data(data) - Remove sensitive fields
- extract_minimal_context(event_type, data) - Minimal payload
- validate_repository_format(repo) - Validate owner/repo
- validate_webhook_signature(payload, sig, secret) - Verify webhook
Safe Dispatch Utility
Location: tools/ai-review/utils/safe_dispatch.py
Usage:
python utils/safe_dispatch.py issue_comment owner/repo '{"action": "created", ...}'
Features:
- Input validation
- Size limits (10MB max)
- Automatic sanitization
- Comprehensive error handling
Security Scanner
Location: tools/ai-review/security/security_scanner.py
Features:
- 17 built-in security rules
- OWASP Top 10 coverage
- CWE references
- Severity classification
- Pattern-based detection
Best Practices Summary
- Minimize Data: Only pass necessary data to workflows
- Validate Inputs: Always validate external inputs
- Sanitize Outputs: Remove sensitive data before logging
- Use Utilities: Leverage webhook_sanitizer.py and safe_dispatch.py
- Scan Code: Run security scanner before committing
- Rotate Secrets: Regularly rotate API keys and tokens
- Review Changes: Manual security review for sensitive changes
- Test Security: Add tests for security-critical code
Updates and Maintenance
This security policy is reviewed quarterly and updated as needed.
Last Updated: 2025-12-28
Next Review: 2026-03-28