remove documentation that are no longer needed
This commit is contained in:
419
docs/SECURITY.md
Normal file
419
docs/SECURITY.md
Normal file
@@ -0,0 +1,419 @@
|
||||
# Security Guidelines for OpenRabbit
|
||||
|
||||
This document outlines security best practices and requirements for OpenRabbit development.
|
||||
|
||||
## Table of Contents
|
||||
|
||||
1. [Workflow Security](#workflow-security)
|
||||
2. [Webhook Data Handling](#webhook-data-handling)
|
||||
3. [Input Validation](#input-validation)
|
||||
4. [Secret Management](#secret-management)
|
||||
5. [Security Scanning](#security-scanning)
|
||||
6. [Reporting Vulnerabilities](#reporting-vulnerabilities)
|
||||
|
||||
---
|
||||
|
||||
## Workflow Security
|
||||
|
||||
### Principle: Minimize Data Exposure
|
||||
|
||||
**Problem:** GitHub Actions/Gitea Actions can expose sensitive data through:
|
||||
- Environment variables visible in logs
|
||||
- Debug output
|
||||
- Error messages
|
||||
- Process listings
|
||||
|
||||
**Solution:** Use minimal data in workflows and sanitize all inputs.
|
||||
|
||||
### ❌ Bad: Exposing Full Webhook Data
|
||||
|
||||
```yaml
|
||||
# NEVER DO THIS - exposes all user data, emails, tokens
|
||||
env:
|
||||
EVENT_JSON: ${{ toJSON(github.event) }}
|
||||
run: |
|
||||
python process.py "$EVENT_JSON"
|
||||
```
|
||||
|
||||
**Why this is dangerous:**
|
||||
- Full webhook payloads can contain user emails, private repo URLs, installation tokens
|
||||
- Data appears in workflow logs if debug mode is enabled
|
||||
- Environment variables can be dumped by malicious code
|
||||
- Violates principle of least privilege
|
||||
|
||||
### ✅ Good: Minimal Data Extraction
|
||||
|
||||
```yaml
|
||||
# SAFE: Only extract necessary fields
|
||||
run: |
|
||||
EVENT_DATA=$(cat <<EOF
|
||||
{
|
||||
"issue": {
|
||||
"number": ${{ github.event.issue.number }}
|
||||
},
|
||||
"comment": {
|
||||
"body": $(echo '${{ github.event.comment.body }}' | jq -Rs .)
|
||||
}
|
||||
}
|
||||
EOF
|
||||
)
|
||||
python utils/safe_dispatch.py issue_comment "$REPO" "$EVENT_DATA"
|
||||
```
|
||||
|
||||
**Why this is safe:**
|
||||
- Only includes necessary fields (number, body)
|
||||
- Agents fetch full data from API with proper auth
|
||||
- Reduces attack surface
|
||||
- Follows data minimization principle
|
||||
|
||||
### Input Validation Requirements
|
||||
|
||||
All workflow inputs MUST be validated before use:
|
||||
|
||||
1. **Repository Format**
|
||||
```bash
|
||||
# Validate owner/repo format
|
||||
if ! echo "$REPO" | grep -qE '^[a-zA-Z0-9_-]+/[a-zA-Z0-9_-]+$'; then
|
||||
echo "Error: Invalid repository format"
|
||||
exit 1
|
||||
fi
|
||||
```
|
||||
|
||||
2. **Numeric Inputs**
|
||||
```bash
|
||||
# Validate issue/PR numbers are numeric
|
||||
if ! [[ "$ISSUE_NUMBER" =~ ^[0-9]+$ ]]; then
|
||||
echo "Error: Invalid issue number"
|
||||
exit 1
|
||||
fi
|
||||
```
|
||||
|
||||
3. **String Sanitization**
|
||||
```bash
|
||||
# Use jq for JSON string escaping
|
||||
BODY=$(echo "$RAW_BODY" | jq -Rs .)
|
||||
```
|
||||
|
||||
### Boolean Comparison
|
||||
|
||||
```bash
|
||||
# ❌ WRONG: String comparison on boolean
|
||||
if [ "$IS_PR" = "true" ]; then
|
||||
|
||||
# ✅ CORRECT: Use workflow expression
|
||||
IS_PR="${{ gitea.event.issue.pull_request != null }}"
|
||||
if [ "$IS_PR" = "true" ]; then
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Webhook Data Handling
|
||||
|
||||
### Using the Sanitization Utilities
|
||||
|
||||
Always use `utils/webhook_sanitizer.py` when handling webhook data:
|
||||
|
||||
```python
|
||||
from utils.webhook_sanitizer import (
|
||||
sanitize_webhook_data,
|
||||
validate_repository_format,
|
||||
extract_minimal_context,
|
||||
)
|
||||
|
||||
# Sanitize data before logging or storing
|
||||
sanitized = sanitize_webhook_data(raw_event_data)
|
||||
|
||||
# Extract only necessary fields
|
||||
minimal = extract_minimal_context(event_type, sanitized)
|
||||
|
||||
# Validate repository input
|
||||
owner, repo = validate_repository_format(repo_string)
|
||||
```
|
||||
|
||||
### Sensitive Fields (Automatically Redacted)
|
||||
|
||||
The sanitizer removes these fields:
|
||||
- `email`, `private_email`, `email_addresses`
|
||||
- `token`, `access_token`, `refresh_token`, `api_key`
|
||||
- `secret`, `password`, `private_key`, `ssh_key`
|
||||
- `phone`, `address`, `ssn`, `credit_card`
|
||||
- `installation_id`, `node_id`
|
||||
|
||||
### Large Field Truncation
|
||||
|
||||
These fields are truncated to prevent log flooding:
|
||||
- `body`: 500 characters
|
||||
- `description`: 500 characters
|
||||
- `message`: 500 characters
|
||||
|
||||
---
|
||||
|
||||
## Input Validation
|
||||
|
||||
### Repository Name Validation
|
||||
|
||||
```python
|
||||
from utils.webhook_sanitizer import validate_repository_format
|
||||
|
||||
try:
|
||||
owner, repo = validate_repository_format(user_input)
|
||||
except ValueError as e:
|
||||
logger.error(f"Invalid repository: {e}")
|
||||
return
|
||||
```
|
||||
|
||||
**Checks performed:**
|
||||
- Format is `owner/repo`
|
||||
- No path traversal (`..`)
|
||||
- No shell injection characters (`;`, `|`, `&`, `` ` ``, etc.)
|
||||
- Non-empty owner and repo name
|
||||
|
||||
### Event Data Size Limits
|
||||
|
||||
```python
|
||||
# Maximum event size: 10MB
|
||||
MAX_EVENT_SIZE = 10 * 1024 * 1024
|
||||
|
||||
if len(event_json) > MAX_EVENT_SIZE:
|
||||
raise ValueError("Event data too large")
|
||||
```
|
||||
|
||||
### JSON Validation
|
||||
|
||||
```python
|
||||
try:
|
||||
data = json.loads(event_json)
|
||||
except json.JSONDecodeError as e:
|
||||
raise ValueError(f"Invalid JSON: {e}")
|
||||
|
||||
if not isinstance(data, dict):
|
||||
raise ValueError("Event data must be a JSON object")
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Secret Management
|
||||
|
||||
### Environment Variables
|
||||
|
||||
Required secrets (set in CI/CD settings):
|
||||
- `AI_REVIEW_TOKEN` - Gitea/GitHub API token (read/write access)
|
||||
- `OPENAI_API_KEY` - OpenAI API key
|
||||
- `OPENROUTER_API_KEY` - OpenRouter API key (optional)
|
||||
- `OLLAMA_HOST` - Ollama server URL (optional)
|
||||
|
||||
### ❌ Never Commit Secrets
|
||||
|
||||
```python
|
||||
# NEVER DO THIS
|
||||
api_key = "sk-1234567890abcdef" # ❌ Hardcoded secret
|
||||
|
||||
# NEVER DO THIS
|
||||
config = {
|
||||
"openai_key": "sk-1234567890abcdef" # ❌ Secret in config
|
||||
}
|
||||
```
|
||||
|
||||
### ✅ Always Use Environment Variables
|
||||
|
||||
```python
|
||||
# CORRECT
|
||||
api_key = os.environ.get("OPENAI_API_KEY")
|
||||
if not api_key:
|
||||
raise ValueError("OPENAI_API_KEY not set")
|
||||
```
|
||||
|
||||
### Secret Scanning
|
||||
|
||||
The security scanner checks for:
|
||||
- Hardcoded API keys (pattern: `sk-[a-zA-Z0-9]{32,}`)
|
||||
- AWS keys (`AKIA[0-9A-Z]{16}`)
|
||||
- Private keys (`-----BEGIN.*PRIVATE KEY-----`)
|
||||
- Passwords in code (`password\s*=\s*["'][^"']+["']`)
|
||||
|
||||
---
|
||||
|
||||
## Security Scanning
|
||||
|
||||
### Automated Scanning
|
||||
|
||||
All code is scanned for vulnerabilities:
|
||||
|
||||
1. **PR Reviews** - Automatic security scan on every PR
|
||||
2. **Pre-commit Hooks** - Local scanning before commit
|
||||
3. **Pattern-based Detection** - 17 built-in security rules
|
||||
|
||||
### Running Manual Scans
|
||||
|
||||
```bash
|
||||
# Scan a specific file
|
||||
python -c "
|
||||
from security.security_scanner import SecurityScanner
|
||||
s = SecurityScanner()
|
||||
with open('myfile.py') as f:
|
||||
findings = s.scan_content(f.read(), 'myfile.py')
|
||||
for f in findings:
|
||||
print(f'{f.severity}: {f.description}')
|
||||
"
|
||||
|
||||
# Scan a git diff
|
||||
git diff | python tools/ai-review/security/scan_diff.py
|
||||
```
|
||||
|
||||
### Security Rule Categories
|
||||
|
||||
- **A01: Broken Access Control** - Missing auth, insecure file operations
|
||||
- **A02: Cryptographic Failures** - Weak crypto, hardcoded secrets
|
||||
- **A03: Injection** - SQL injection, command injection, XSS
|
||||
- **A06: Vulnerable Components** - Insecure imports
|
||||
- **A07: Authentication Failures** - Weak auth mechanisms
|
||||
- **A09: Logging Failures** - Security logging issues
|
||||
|
||||
### Severity Levels
|
||||
|
||||
- **HIGH**: Critical vulnerabilities requiring immediate fix
|
||||
- SQL injection, command injection, hardcoded secrets
|
||||
|
||||
- **MEDIUM**: Important issues requiring attention
|
||||
- Missing input validation, weak crypto, XSS
|
||||
|
||||
- **LOW**: Best practice violations
|
||||
- TODO comments with security keywords, eval() usage
|
||||
|
||||
### CI Failure Threshold
|
||||
|
||||
Configure in `config.yml`:
|
||||
|
||||
```yaml
|
||||
review:
|
||||
fail_on_severity: HIGH # Fail CI if HIGH severity found
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Webhook Signature Validation
|
||||
|
||||
### Future GitHub Integration
|
||||
|
||||
When accepting webhooks directly (not through Gitea Actions):
|
||||
|
||||
```python
|
||||
from utils.webhook_sanitizer import validate_webhook_signature
|
||||
|
||||
# Validate webhook is from GitHub
|
||||
signature = request.headers.get("X-Hub-Signature-256")
|
||||
payload = request.get_data(as_text=True)
|
||||
secret = os.environ["WEBHOOK_SECRET"]
|
||||
|
||||
if not validate_webhook_signature(payload, signature, secret):
|
||||
return "Unauthorized", 401
|
||||
```
|
||||
|
||||
**Important:** Always validate webhook signatures to prevent:
|
||||
- Replay attacks
|
||||
- Forged webhook events
|
||||
- Unauthorized access
|
||||
|
||||
---
|
||||
|
||||
## Reporting Vulnerabilities
|
||||
|
||||
### Security Issues
|
||||
|
||||
If you discover a security vulnerability:
|
||||
|
||||
1. **DO NOT** create a public issue
|
||||
2. Email security contact: [maintainer email]
|
||||
3. Include:
|
||||
- Description of the vulnerability
|
||||
- Steps to reproduce
|
||||
- Potential impact
|
||||
- Suggested fix (if available)
|
||||
|
||||
### Response Timeline
|
||||
|
||||
- **Acknowledgment**: Within 48 hours
|
||||
- **Initial Assessment**: Within 1 week
|
||||
- **Fix Development**: Depends on severity
|
||||
- HIGH: Within 1 week
|
||||
- MEDIUM: Within 2 weeks
|
||||
- LOW: Next release cycle
|
||||
|
||||
---
|
||||
|
||||
## Security Checklist for Contributors
|
||||
|
||||
Before submitting a PR:
|
||||
|
||||
- [ ] No secrets in code or config files
|
||||
- [ ] All user inputs are validated
|
||||
- [ ] No SQL injection vulnerabilities
|
||||
- [ ] No command injection vulnerabilities
|
||||
- [ ] No XSS vulnerabilities
|
||||
- [ ] Sensitive data is sanitized before logging
|
||||
- [ ] Environment variables are not exposed in workflows
|
||||
- [ ] Repository format validation is used
|
||||
- [ ] Error messages don't leak sensitive info
|
||||
- [ ] Security scanner passes (no HIGH severity)
|
||||
|
||||
---
|
||||
|
||||
## Security Tools
|
||||
|
||||
### Webhook Sanitizer
|
||||
|
||||
Location: `tools/ai-review/utils/webhook_sanitizer.py`
|
||||
|
||||
Functions:
|
||||
- `sanitize_webhook_data(data)` - Remove sensitive fields
|
||||
- `extract_minimal_context(event_type, data)` - Minimal payload
|
||||
- `validate_repository_format(repo)` - Validate owner/repo
|
||||
- `validate_webhook_signature(payload, sig, secret)` - Verify webhook
|
||||
|
||||
### Safe Dispatch Utility
|
||||
|
||||
Location: `tools/ai-review/utils/safe_dispatch.py`
|
||||
|
||||
Usage:
|
||||
```bash
|
||||
python utils/safe_dispatch.py issue_comment owner/repo '{"action": "created", ...}'
|
||||
```
|
||||
|
||||
Features:
|
||||
- Input validation
|
||||
- Size limits (10MB max)
|
||||
- Automatic sanitization
|
||||
- Comprehensive error handling
|
||||
|
||||
### Security Scanner
|
||||
|
||||
Location: `tools/ai-review/security/security_scanner.py`
|
||||
|
||||
Features:
|
||||
- 17 built-in security rules
|
||||
- OWASP Top 10 coverage
|
||||
- CWE references
|
||||
- Severity classification
|
||||
- Pattern-based detection
|
||||
|
||||
---
|
||||
|
||||
## Best Practices Summary
|
||||
|
||||
1. **Minimize Data**: Only pass necessary data to workflows
|
||||
2. **Validate Inputs**: Always validate external inputs
|
||||
3. **Sanitize Outputs**: Remove sensitive data before logging
|
||||
4. **Use Utilities**: Leverage `webhook_sanitizer.py` and `safe_dispatch.py`
|
||||
5. **Scan Code**: Run security scanner before committing
|
||||
6. **Rotate Secrets**: Regularly rotate API keys and tokens
|
||||
7. **Review Changes**: Manual security review for sensitive changes
|
||||
8. **Test Security**: Add tests for security-critical code
|
||||
|
||||
---
|
||||
|
||||
## Updates and Maintenance
|
||||
|
||||
This security policy is reviewed quarterly and updated as needed.
|
||||
|
||||
**Last Updated**: 2025-12-28
|
||||
**Next Review**: 2026-03-28
|
||||
Reference in New Issue
Block a user