feat: Complete minimal bot refactor - AI providers, models, docs, and migration
Changes:
- Strip AI providers to image-only analysis (remove text/phishing methods)
- Simplify guild models (remove BannedWord, reduce GuildSettings columns)
- Create migration to drop unused tables and columns
- Rewrite README for minimal bot focus
- Update CLAUDE.md architecture documentation

Result: -992 lines, +158 lines (net -834 lines). Cost-conscious bot ready for deployment.
CLAUDE.md (74 changed lines)
@@ -4,7 +4,7 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co
 
 ## Project Overview
 
-GuardDen is a Discord moderation bot built with discord.py, PostgreSQL, and optional AI integration (Claude/OpenAI). Self-hosted with Docker support.
+GuardDen is a minimal, cost-conscious Discord moderation bot focused on spam detection and NSFW image filtering. Built with discord.py, PostgreSQL, and optional AI integration (Claude/OpenAI) for image analysis only. Self-hosted with Docker support.
 
 ## Commands
 
@@ -19,7 +19,7 @@ python -m guardden
 pytest
 
 # Run single test
-pytest tests/test_verification.py::TestVerificationService::test_verify_correct
+pytest tests/test_automod.py::TestAutomodService::test_spam_detection
 
 # Lint and format
 ruff check src tests
@@ -30,73 +30,48 @@ mypy src
 
 # Docker deployment
 docker compose up -d
 
+# Database migrations
+alembic upgrade head
 ```
 
 ## Architecture
 
 - `src/guardden/bot.py` - Main bot class (`GuardDen`) extending `commands.Bot`, manages lifecycle and services
 - `src/guardden/config.py` - Pydantic settings loaded from environment variables (prefix: `GUARDDEN_`)
-- `src/guardden/models/` - SQLAlchemy 2.0 async models for PostgreSQL
+- `src/guardden/models/guild.py` - SQLAlchemy 2.0 async models for guilds and settings
-- `src/guardden/services/` - Business logic (database, guild config, automod, AI, verification, rate limiting)
+- `src/guardden/services/` - Business logic (database, guild config, automod, AI, rate limiting)
-- `src/guardden/cogs/` - Discord command groups (events, moderation, admin, automod, ai_moderation, verification)
+- `src/guardden/cogs/` - Discord command groups (automod, ai_moderation, owner)
+- `config.yml` - Single YAML file for bot configuration
 
 ## Key Patterns
 
 - All database operations use async SQLAlchemy with `asyncpg`
-- Guild configurations are cached in `GuildConfigService._cache`
+- Guild configurations loaded from single `config.yml` file (not per-guild)
 - Discord snowflake IDs stored as `BigInteger` in PostgreSQL
-- Moderation actions logged to `ModerationLog` table with automatic strike escalation
+- No moderation logging or strike system
-- Environment variables: `GUARDDEN_DISCORD_TOKEN`, `GUARDDEN_DATABASE_URL`
+- Environment variables: `GUARDDEN_DISCORD_TOKEN`, `GUARDDEN_DATABASE_URL`, AI keys
 
 ## Automod System
 
-- `AutomodService` in `services/automod.py` handles rule-based content filtering
+- `AutomodService` in `services/automod.py` handles spam detection
-- Checks run in order: banned words → scam links → spam → invite links
+- Checks: message rate limit → duplicate messages → mass mentions
 - Spam tracking uses per-guild, per-user trackers with automatic cleanup
-- Scam detection uses compiled regex patterns in `SCAM_PATTERNS` list
 - Results return `AutomodResult` dataclass with actions to take
-- **Whitelist**: Users in `GuildSettings.whitelisted_user_ids` bypass ALL automod checks
+- Everyone gets moderated (no whitelist, no bypass for permissions)
-- Users with "Manage Messages" permission also bypass automod
 
 ## AI Moderation System
 
 - `services/ai/` contains provider abstraction and implementations
-- `AIProvider` base class defines interface: `moderate_text()`, `analyze_image()`, `analyze_phishing()`
+- `AIProvider` base class defines interface: `analyze_image()` only
 - `AnthropicProvider` and `OpenAIProvider` implement the interface
 - `NullProvider` used when AI is disabled (returns empty results)
 - Factory pattern via `create_ai_provider(provider, api_key)`
-- `ModerationResult` includes severity scoring based on confidence + category weights
+- `ImageAnalysisResult` includes NSFW categories, severity, confidence
-- Sensitivity setting (0-100) adjusts thresholds per guild
+- Sensitivity setting (0-100) adjusts thresholds
-- **NSFW-Only Filtering** (default: `True`): When enabled, only sexual content is filtered; violence, harassment, etc. are allowed
+- **NSFW-Only Filtering** (default: `True`): Only sexual content is filtered
-- Filtering controlled by `nsfw_only_filtering` field in `GuildSettings`
+- **Cost Controls**: Rate limiting, deduplication, file size limits, max images per message
-- **Whitelist**: Users in `GuildSettings.whitelisted_user_ids` bypass ALL AI moderation checks
+- `AIRateLimiter` in `services/ai_rate_limiter.py` tracks usage
 
-## Verification System
-
-- `VerificationService` in `services/verification.py` manages challenges
-- Challenge types: button, captcha, math, emoji (via `ChallengeGenerator` classes)
-- `PendingVerification` tracks user challenges with expiry and attempt limits
-- Discord UI components in `cogs/verification.py`: `VerifyButton`, `EmojiButton`, `CaptchaModal`
-- Background task cleans up expired verifications every 5 minutes
-
-## Rate Limiting System
-
-- `RateLimiter` in `services/ratelimit.py` provides general-purpose rate limiting
-- Scopes: USER (global), MEMBER (per-guild), CHANNEL, GUILD
-- `@ratelimit()` decorator for easy command rate limiting
-- `get_rate_limiter()` returns singleton instance
-- Default limits configured for commands, moderation, verification, messages
-
-## Notification System
-
-- `utils/notifications.py` contains `send_moderation_notification()` utility
-- Handles sending moderation warnings to users with DM → in-channel fallback
-- **In-Channel Warnings** (default: `False`): Optional PUBLIC channel messages when DMs fail
-- **IMPORTANT**: In-channel messages are PUBLIC, visible to all users (Discord API limitation)
-- Temporary messages auto-delete after 10 seconds to minimize clutter
-- Used by automod, AI moderation, and manual moderation commands
-- Controlled by `send_in_channel_warnings` field in `GuildSettings`
-- Disabled by default for privacy reasons
 
 ## Adding New Cogs
 
@@ -110,10 +85,3 @@ docker compose up -d
 2. Implement `AIProvider` abstract class
 3. Add to factory in `services/ai/factory.py`
 4. Add config option in `config.py`
-
-## Adding New Challenge Type
-
-1. Create new `ChallengeGenerator` subclass in `services/verification.py`
-2. Add to `ChallengeType` enum
-3. Register in `VerificationService._generators`
-4. Create corresponding UI components in `cogs/verification.py` if needed
README.md (588 changed lines)
@@ -1,41 +1,19 @@
 # GuardDen
 
-GuardDen is a comprehensive Discord moderation bot designed to protect your community while maintaining a warm, welcoming environment. Built with privacy and self-hosting in mind, GuardDen combines AI-powered content filtering with traditional moderation tools to create a safe space for your members.
+A lightweight, cost-conscious Discord moderation bot focused on essential protection. Built for self-hosting with minimal resource usage and AI costs.
 
 ## Features
 
-### Core Moderation
+### Spam Detection
-- **Warn, Kick, Ban, Timeout** - Standard moderation commands with logging
-- **Strike System** - Configurable point-based system with automatic escalation
-- **Moderation History** - Track all actions taken against users
-- **Bulk Message Deletion** - Purge up to 100 messages at once
-
-### Automod
-- **Banned Words Filter** - Block words/phrases with regex support
-- **Scam Detection** - Automatic detection of phishing/scam links
 - **Anti-Spam** - Rate limiting, duplicate detection, mass mention protection
-- **Link Filtering** - Block Discord invites and suspicious URLs
+- **Automatic Actions** - Message deletion and user timeout for spam violations
 
-### AI Moderation
+### AI-Powered NSFW Image Detection
-- **Text Analysis** - AI-powered content moderation using Claude or GPT
+- **Smart Image Analysis** - AI-powered detection of inappropriate images using Claude or GPT
-- **NSFW Image Detection** - Automatic flagging of inappropriate images
+- **Cost Controls** - Conservative rate limits (25 checks/hour/guild by default)
-- **NSFW-Only Filtering** - Enabled by default - only filters sexual content, allows violence/harassment
+- **Embed Support** - Optional checking of Discord GIF embeds
-- **Phishing Analysis** - AI-enhanced detection of scam URLs
+- **NSFW Video Domain Blocking** - Block known NSFW video domains
-- **Configurable Sensitivity** - Adjust strictness per server (0-100)
+- **Configurable Sensitivity** - Adjust strictness (0-100)
-- **Public In-Channel Warnings** - Optional: sends temporary public channel messages when users have DMs disabled
-
-### Verification System
-- **Multiple Challenge Types** - Button, captcha, math problems, emoji selection
-- **Automatic New Member Verification** - Challenge users on join
-- **Configurable Verified Role** - Auto-assign role on successful verification
-- **Rate Limited** - Prevents verification spam
-
-### Logging
-- Member joins/leaves
-- Message edits and deletions
-- Voice channel activity
-- Ban/unban events
-- All moderation actions
 
 ## Quick Start
 
@@ -55,32 +33,22 @@ GuardDen is a comprehensive Discord moderation bot designed to protect your comm
 - Disable **Public Bot** if you only want yourself to add it
 - Copy the **Token** (click "Reset Token") - this is your `GUARDDEN_DISCORD_TOKEN`
 
-5. **Enable Privileged Gateway Intents** (all three required):
+5. **Enable Privileged Gateway Intents** (required):
-   - **Presence Intent** - for user status tracking
-   - **Server Members Intent** - for member join/leave events, verification
-   - **Message Content Intent** - for reading messages (automod, AI moderation)
+   - **Message Content Intent** - for reading messages (spam detection, image checking)
 
 6. **Generate Invite URL** - Go to **OAuth2** > **URL Generator**:
 
 **Scopes:**
 - `bot`
-- `applications.commands`
 
 **Bot Permissions:**
-- Manage Roles
-- Kick Members
-- Ban Members
 - Moderate Members (timeout)
-- Manage Channels
 - View Channels
 - Send Messages
 - Manage Messages
-- Embed Links
-- Attach Files
 - Read Message History
-- Add Reactions
 
-Or use permission integer: `1239943348294`
+Or use permission integer: `275415089216`
 
 7. Use the generated URL to invite the bot to your server
 
@@ -134,310 +102,94 @@ GuardDen is a comprehensive Discord moderation bot designed to protect your comm
 
 ## Configuration
 
-GuardDen now supports **file-based configuration** as the primary method for managing bot settings. This replaces Discord commands for configuration, providing better version control, easier management, and more reliable deployments.
+GuardDen uses a **single YAML configuration file** (`config.yml`) for managing all bot settings across all guilds.
 
-### File-Based Configuration (Recommended)
+### Configuration File (`config.yml`)
 
-#### Directory Structure
+Create a `config.yml` file in your project root:
-```
-config/
-├── guilds/
-│   ├── guild-123456789.yml      # Per-server configuration
-│   ├── guild-987654321.yml
-│   └── default-template.yml     # Template for new servers
-├── wordlists/
-│   ├── banned-words.yml         # Custom banned words
-│   ├── domain-allowlists.yml    # Allowed domains whitelist
-│   └── external-sources.yml     # Managed wordlist sources
-├── schemas/
-│   ├── guild-schema.yml         # Configuration validation
-│   └── wordlists-schema.yml
-└── templates/
-    └── guild-default.yml        # Default configuration template
-```
 
-#### Quick Start with File Configuration
+```yaml
+bot:
+  prefix: "!"
+  owner_ids:
+    - 123456789012345678  # Your Discord user ID
 
-1. **Create your first server configuration:**
+# Spam detection settings
-```bash
+automod:
-python -m guardden.cli.config guild create 123456789012345678 "My Discord Server"
+  enabled: true
-```
+  anti_spam_enabled: true
+  message_rate_limit: 5    # Max messages per window
+  message_rate_window: 5   # Window in seconds
+  duplicate_threshold: 3   # Duplicates to trigger
+  mention_limit: 5         # Max mentions per message
+  mention_rate_limit: 10   # Max mentions per window
+  mention_rate_window: 60  # Window in seconds
 
-2. **Edit the configuration file:**
+# AI moderation settings
-```bash
+ai_moderation:
-nano config/guilds/guild-123456789012345678.yml
-```
 
-3. **Customize settings (example):**
-```yaml
-# Basic server information
-guild_id: 123456789012345678
-name: "My Discord Server"
-
-settings:
-  # AI Moderation
-  ai_moderation:
   enabled: true
   sensitivity: 80              # 0-100 (higher = stricter)
-  nsfw_only_filtering: true    # Only block sexual content, allow violence
+  nsfw_only_filtering: true    # Only filter sexual content
+  max_checks_per_hour_per_guild: 25  # Cost control
+  max_checks_per_user_per_hour: 5    # Cost control
+  max_images_per_message: 2          # Analyze max 2 images/msg
+  max_image_size_mb: 3               # Skip images > 3MB
+  check_embed_images: true           # Check Discord GIF embeds
+  check_video_thumbnails: false      # Skip video thumbnails
+  url_image_check_enabled: false     # Skip URL image downloads
 
-  # Automod settings
+  # Known NSFW video domains (auto-block)
-  automod:
+  nsfw_video_domains:
-    message_rate_limit: 5      # Max messages per 5 seconds
+    - pornhub.com
-    scam_allowlist:
+    - xvideos.com
-      - "discord.com"
+    - xnxx.com
-      - "github.com"
+    - redtube.com
-```
+    - youporn.com
 
-4. **Validate your configuration:**
-```bash
-python -m guardden.cli.config guild validate 123456789012345678
-```
 
-5. **Start the bot** (configurations auto-reload):
-```bash
-python -m guardden
-```
 
-#### Configuration Management CLI
 
-**Guild Management:**
-```bash
-# List all configured servers
-python -m guardden.cli.config guild list
 
-# Create new server configuration
-python -m guardden.cli.config guild create <guild_id> "Server Name"
 
-# Edit specific settings
-python -m guardden.cli.config guild edit <guild_id> ai_moderation.sensitivity 75
-python -m guardden.cli.config guild edit <guild_id> ai_moderation.nsfw_only_filtering true
 
-# Validate configurations
-python -m guardden.cli.config guild validate
-python -m guardden.cli.config guild validate <guild_id>
 
-# Backup configuration
-python -m guardden.cli.config guild backup <guild_id>
 ```
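The flat `config.yml` shown above maps naturally onto typed settings objects. Here is a minimal sketch of that mapping, assuming PyYAML has already parsed the file into a dict upstream; `AutomodConfig` and `AIModerationConfig` are illustrative names, not the bot's actual classes, and the defaults mirror the values in the example file.

```python
from dataclasses import dataclass


@dataclass
class AutomodConfig:
    enabled: bool = True
    anti_spam_enabled: bool = True
    message_rate_limit: int = 5
    message_rate_window: int = 5
    duplicate_threshold: int = 3
    mention_limit: int = 5


@dataclass
class AIModerationConfig:
    enabled: bool = True
    sensitivity: int = 80
    nsfw_only_filtering: bool = True
    max_checks_per_hour_per_guild: int = 25


def load_config(raw: dict) -> tuple[AutomodConfig, AIModerationConfig]:
    """Map a parsed YAML mapping onto typed settings.

    Keys present in the file override the dataclass defaults; unknown keys
    are ignored so older/newer config files still load.
    """
    automod = AutomodConfig(**{k: v for k, v in raw.get("automod", {}).items()
                               if k in AutomodConfig.__dataclass_fields__})
    ai = AIModerationConfig(**{k: v for k, v in raw.get("ai_moderation", {}).items()
                               if k in AIModerationConfig.__dataclass_fields__})
    return automod, ai
```

With this shape, a missing section or a missing key silently falls back to defaults, which matches the "single file, sensible defaults" design the README describes.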
 
-**Migration from Discord Commands:**
+### Key Configuration Options
-```bash
 
-# Export existing Discord command settings to files
-python -m guardden.cli.config migrate from-database
 
-# Verify migration was successful
+**AI Moderation (NSFW Image Detection):**
-python -m guardden.cli.config migrate verify
+- `sensitivity`: 0-100 scale (higher = stricter detection)
-```
+- `nsfw_only_filtering`: Only flag sexual content (violence/harassment allowed)
+- `max_checks_per_hour_per_guild`: Cost control - limits AI API calls
+- `check_embed_images`: Whether to analyze Discord GIF embeds
 
-**Wordlist Management:**
+**Spam Detection:**
-```bash
+- `message_rate_limit`: Max messages allowed per window
-# View wordlist status
+- `duplicate_threshold`: How many duplicate messages trigger action
-python -m guardden.cli.config wordlist info
+- `mention_limit`: Max @mentions allowed per message
 
-# View available templates
+**Cost Controls:**
-python -m guardden.cli.config template info
+The bot includes multiple layers of cost control:
-```
+- Rate limiting (25 AI checks/hour/guild, 5/hour/user by default)
+- Image deduplication (tracks last 1000 analyzed messages)
-#### Key Configuration Options
+- File size limits (skip images > 3MB)
+- Max images per message (analyze max 2 images)
-**AI Moderation Settings:**
+- Optional embed checking (disable to save costs)
-```yaml
-ai_moderation:
-  enabled: true                # Enable AI content analysis
-  sensitivity: 80              # 0-100 scale (higher = stricter)
-  confidence_threshold: 0.7    # 0.0-1.0 confidence required
-  nsfw_only_filtering: true    # true = only sexual content (DEFAULT), false = all content
-  log_only: false              # true = log only, false = take action
-
-notifications:
-  send_in_channel_warnings: false  # Send temporary PUBLIC channel messages when DMs fail (DEFAULT: false)
-```
 
-**NSFW-Only Filtering Guide (Default: Enabled):**
-- `true` = Only block sexual/nude content, allow violence and other content types **(DEFAULT)**
-- `false` = Block ALL inappropriate content (sexual, violence, harassment, hate speech)
 
-**Public In-Channel Warnings (Default: Disabled):**
-- **IMPORTANT**: These messages are PUBLIC and visible to everyone in the channel, NOT private
-- When enabled and a user has DMs disabled, sends a temporary public message in the channel
-- Messages auto-delete after 10 seconds to minimize clutter
-- **Privacy Warning**: The user's violation and reason will be visible to all users for 10 seconds
-- Set to `true` only if you prefer public transparency over privacy
 
-**Automod Configuration:**
-```yaml
-automod:
-  message_rate_limit: 5      # Max messages per time window
-  message_rate_window: 5     # Time window in seconds
-  duplicate_threshold: 3     # Duplicate messages to trigger
-  scam_allowlist:            # Domains that bypass scam detection
-    - "discord.com"
-    - "github.com"
-```
 
-**Banned Words Management:**
-Edit `config/wordlists/banned-words.yml`:
-```yaml
-global_patterns:
-  - pattern: "badword"
-    action: delete
-    is_regex: false
-    category: profanity
-
-guild_patterns:
-  123456789:  # Specific server overrides
-    - pattern: "server-specific-rule"
-      action: warn
-      override_global: false
-```
 
-#### Hot-Reloading
 
-Configuration changes are automatically detected and applied without restarting the bot:
-- ✅ Edit YAML files directly
-- ✅ Changes apply within seconds
-- ✅ Invalid configs are rejected with error logs
-- ✅ Automatic rollback on errors
 
 ### Environment Variables
 
 | Variable | Description | Default |
 |----------|-------------|---------|
-| `GUARDDEN_DISCORD_TOKEN` | Your Discord bot token | Required |
+| `GUARDDEN_DISCORD_TOKEN` | Your Discord bot token | **Required** |
-| `GUARDDEN_DISCORD_PREFIX` | Default command prefix | `!` |
-| `GUARDDEN_ALLOWED_GUILDS` | Comma-separated guild allowlist | (empty = all) |
-| `GUARDDEN_OWNER_IDS` | Comma-separated owner user IDs | (empty = admins) |
 | `GUARDDEN_DATABASE_URL` | PostgreSQL connection URL | `postgresql://guardden:guardden@localhost:5432/guardden` |
 | `GUARDDEN_LOG_LEVEL` | Logging level | `INFO` |
 | `GUARDDEN_AI_PROVIDER` | AI provider (anthropic/openai/none) | `none` |
 | `GUARDDEN_ANTHROPIC_API_KEY` | Anthropic API key (if using Claude) | - |
 | `GUARDDEN_OPENAI_API_KEY` | OpenAI API key (if using GPT) | - |
-| `GUARDDEN_WORDLIST_ENABLED` | Enable managed wordlist sync | `true` |
-| `GUARDDEN_WORDLIST_UPDATE_HOURS` | Managed wordlist sync interval | `168` |
-| `GUARDDEN_WORDLIST_SOURCES` | JSON array of wordlist sources | (empty = defaults) |
 
-### Per-Guild Settings
+## Owner Commands
 
-Each server can be configured via YAML files in `config/guilds/`:
+GuardDen includes a minimal set of owner-only commands for bot management:
 
-**General Settings:**
-- Command prefix and locale
-- Channel IDs (log, moderation, welcome)
-- Role IDs (mute, verified, moderator)
 
-**Content Moderation:**
-- AI moderation (enabled, sensitivity, NSFW-only mode)
-- Automod thresholds and rate limits
-- Banned words and domain allowlists
-- Strike system and escalation actions
 
-**Member Verification:**
-- Verification challenges (button, captcha, math, emoji)
-- Auto-role assignment
 
-**All settings support hot-reloading** - edit files and changes apply immediately!
 
-## Commands
 
-> **Note:** Configuration commands (`!config`, `!ai`, `!automod`, etc.) have been replaced with file-based configuration. See the [Configuration](#configuration) section above for managing settings via YAML files and the CLI tool.
 
-### Moderation
 
-| Command | Permission | Description |
-|---------|------------|-------------|
-| `!warn <user> [reason]` | Kick Members | Warn a user |
-| `!strike <user> [points] [reason]` | Kick Members | Add strikes to a user |
-| `!strikes <user>` | Kick Members | View user's strikes |
-| `!timeout <user> <duration> [reason]` | Moderate Members | Timeout a user (e.g., 1h, 30m, 7d) |
-| `!untimeout <user>` | Moderate Members | Remove timeout |
-| `!kick <user> [reason]` | Kick Members | Kick a user |
-| `!ban <user> [reason]` | Ban Members | Ban a user |
-| `!unban <user_id> [reason]` | Ban Members | Unban a user by ID |
-| `!purge <amount>` | Manage Messages | Delete multiple messages (max 100) |
-| `!modlogs <user>` | Kick Members | View moderation history |
 
-### Configuration Management
 
-Configuration is now managed via **YAML files** instead of Discord commands. Use the CLI tool:
 
-```bash
-# Configuration Management CLI
-python -m guardden.cli.config guild create <guild_id> "Server Name"
-python -m guardden.cli.config guild list
-python -m guardden.cli.config guild edit <guild_id> <setting> <value>
-python -m guardden.cli.config guild validate [guild_id]
 
-# Migration from old Discord commands
-python -m guardden.cli.config migrate from-database
-python -m guardden.cli.config migrate verify
 
-# Wordlist management
-python -m guardden.cli.config wordlist info
-```
 
-**Read-only Status Commands (Still Available):**
 
 | Command | Description |
 |---------|-------------|
-| `!config` | View current configuration (read-only) |
+| `!status` | Show bot status (uptime, guilds, latency, AI provider) |
-| `!ai` | View AI moderation settings (read-only) |
+| `!reload` | Reload all cogs |
-| `!automod` | View automod status (read-only) |
+| `!ping` | Check bot latency |
-| `!bannedwords` | List banned words (read-only) |
 
-**Configuration Examples:**
+**Note:** All configuration is done via the `config.yml` file. There are no in-Discord configuration commands.
 
-```bash
-# Set AI sensitivity to 75 (0-100 scale)
-python -m guardden.cli.config guild edit 123456789 ai_moderation.sensitivity 75
 
-# Enable NSFW-only filtering (only block sexual content)
-python -m guardden.cli.config guild edit 123456789 ai_moderation.nsfw_only_filtering true
 
-# Add domain to scam allowlist
-Edit config/wordlists/domain-allowlists.yml
 
-# Add banned word pattern
-Edit config/wordlists/banned-words.yml
-```
 
-### Whitelist Management (Admin only)
 
-| Command | Description |
-|---------|-------------|
-| `!whitelist` | View all whitelisted users |
-| `!whitelist add @user` | Add a user to the whitelist (bypasses all moderation) |
-| `!whitelist remove @user` | Remove a user from the whitelist |
-| `!whitelist clear` | Clear the entire whitelist |
 
-**What is the whitelist?**
-- Whitelisted users bypass **ALL** moderation checks (automod and AI moderation)
-- Useful for trusted members, bots, or staff who need to post content that might trigger filters
-- Users with "Manage Messages" permission are already exempt from moderation
 
-### Diagnostics (Admin only)
 
-| Command | Description |
-|---------|-------------|
-| `!health` | Check database and AI provider status |
 
-### Verification (Admin only)
 
-| Command | Description |
-|---------|-------------|
-| `!verify` | Request verification (for users) |
-| `!verify setup` | View verification setup status |
-| `!verify enable` | Enable verification for new members |
-| `!verify disable` | Disable verification |
-| `!verify role @role` | Set the verified role |
-| `!verify type <type>` | Set verification type (button/captcha/math/emoji) |
-| `!verify test [type]` | Test a verification challenge |
-| `!verify reset @user` | Reset verification for a user |
 
-## CI (Gitea Actions)
 
-Workflows live under `.gitea/workflows/` and mirror the previous GitHub Actions
-pipeline for linting, tests, and Docker builds.
 
 ## Project Structure
 
@@ -447,167 +199,55 @@ guardden/
|
|||||||
│ ├── bot.py # Main bot class
|
│ ├── bot.py # Main bot class
|
||||||
│ ├── config.py # Settings management
|
│ ├── config.py # Settings management
|
||||||
│ ├── cogs/ # Discord command groups
|
│ ├── cogs/ # Discord command groups
|
||||||
│ │ ├── admin.py # Configuration commands (read-only)
|
│ │ ├── automod.py # Spam detection
|
||||||
│ │ ├── ai_moderation.py # AI-powered moderation
|
│ │ ├── ai_moderation.py # NSFW image detection
|
||||||
│ │ ├── automod.py # Automatic moderation
|
│ │ └── owner.py # Owner commands
|
||||||
│ │ ├── events.py # Event logging
|
|
||||||
│ │ ├── moderation.py # Moderation commands
|
|
||||||
│ │ └── verification.py # Member verification
|
|
||||||
│ ├── models/ # Database models
|
│ ├── models/ # Database models
|
||||||
│ │ ├── guild.py # Guild settings, banned words
|
│ │ └── guild.py # Guild settings
|
||||||
│ │ └── moderation.py # Logs, strikes, notes
|
|
||||||
│ ├── services/ # Business logic
|
│ ├── services/ # Business logic
|
||||||
│ │ ├── ai/ # AI provider implementations
|
│ │ ├── ai/ # AI provider implementations
|
||||||
│ │ ├── automod.py # Content filtering
|
│ │ ├── automod.py # Spam detection logic
|
||||||
|
│ │ ├── config_loader.py # YAML config loading
|
||||||
|
│ │ ├── ai_rate_limiter.py # AI cost control
|
||||||
│ │ ├── database.py # DB connections
|
│ │ ├── database.py # DB connections
|
||||||
│ │ ├── guild_config.py # Config caching
|
│ │ └── guild_config.py # Config caching
|
||||||
│ │ ├── file_config.py # File-based configuration system
|
│ └── __main__.py # Entry point
|
||||||
│ │ ├── config_migration.py # Migration from DB to files
|
├── config.yml # Bot configuration
|
||||||
│ │ ├── ratelimit.py # Rate limiting
|
|
||||||
│ │ └── verification.py # Verification challenges
|
|
||||||
│ └── cli/ # Command-line tools
|
|
||||||
│ └── config.py # Configuration management CLI
|
|
||||||
├── config/ # File-based configuration
|
|
||||||
│ ├── guilds/ # Per-server configuration files
|
|
||||||
│ ├── wordlists/ # Banned words and allowlists
|
|
||||||
│ ├── schemas/ # Configuration validation schemas
|
|
||||||
│ └── templates/ # Configuration templates
|
|
||||||
├── tests/ # Test suite
|
├── tests/ # Test suite
|
||||||
├── migrations/ # Database migrations
|
├── migrations/ # Database migrations
|
||||||
├── docker-compose.yml # Docker deployment
|
├── docker-compose.yml # Docker deployment
|
||||||
├── pyproject.toml # Dependencies
|
├── pyproject.toml # Dependencies
|
||||||
├── README.md # This file
|
└── README.md # This file
|
||||||
└── MIGRATION.md # Migration guide for file-based config
|
|
||||||
```
|
```
|
||||||
-## Verification System
+## How It Works
 
-GuardDen includes a verification system to protect your server from bots and raids.
+### Spam Detection
+
+1. Bot monitors message rate per user
+2. Detects duplicate messages
+3. Counts @mentions (mass mention detection)
+4. Violations result in message deletion + timeout
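The spam checks described in the new "Spam Detection" steps can be sketched in a few lines. This is a simplified, hypothetical illustration (the class name, method signature, and structure are assumptions, not GuardDen's actual code); the thresholds mirror the documented defaults (5 messages per 5 s window, 3 duplicates, 5 mentions).

```python
import time
from collections import defaultdict, deque


class SpamDetector:
    """Illustrative sketch of per-user rate, duplicate, and mention checks."""

    def __init__(self, rate_limit=5, rate_window=5.0, duplicate_threshold=3, mention_limit=5):
        self.rate_limit = rate_limit
        self.rate_window = rate_window
        self.duplicate_threshold = duplicate_threshold
        self.mention_limit = mention_limit
        self._timestamps = defaultdict(deque)                 # user_id -> recent send times
        self._recent = defaultdict(lambda: deque(maxlen=10))  # user_id -> recent contents

    def check(self, user_id, content, mention_count, now=None):
        """Record one message and return the violation name, or None."""
        now = time.monotonic() if now is None else now
        times = self._timestamps[user_id]
        times.append(now)
        # Drop timestamps that fell out of the sliding window.
        while times and now - times[0] > self.rate_window:
            times.popleft()
        recent = self._recent[user_id]
        recent.append(content)
        if mention_count > self.mention_limit:
            return "mass_mentions"
        if len(times) > self.rate_limit:
            return "message_rate"
        if list(recent).count(content) >= self.duplicate_threshold:
            return "duplicate_messages"
        return None
```

In the real bot these thresholds come from `GuildSettings` (`message_rate_limit`, `duplicate_threshold`, `mention_limit`); the sketch hard-codes them only for clarity.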
-### Challenge Types
+### NSFW Image Detection
+
+1. Bot checks attachments and embeds for images
+2. Applies rate limiting and deduplication
+3. Downloads image and sends to AI provider
+4. AI analyzes for NSFW content categories
+5. Violations result in message deletion + timeout
+6. Optionally checks known NSFW video domain links
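Step 1 above (deciding which attachments are worth sending to the AI provider) can be sketched as a small pure helper. The field names mirror discord.py's `Attachment` (`content_type`, `size`, `url`), but the function itself and its use of plain dicts are illustrative assumptions, honoring the documented 3 MB size cap and two-images-per-message limit.

```python
# Assumed limits, matching the README's documented defaults.
MAX_IMAGE_BYTES = 3 * 1024 * 1024
MAX_IMAGES_PER_MESSAGE = 2


def select_images(attachments: list[dict]) -> list[str]:
    """Return URLs of attachments worth sending for AI analysis."""
    urls = []
    for att in attachments:
        if not (att.get("content_type") or "").startswith("image/"):
            continue  # skip non-image files entirely
        if att.get("size", 0) > MAX_IMAGE_BYTES:
            continue  # skip oversized images to control API cost
        urls.append(att["url"])
        if len(urls) >= MAX_IMAGES_PER_MESSAGE:
            break  # analyze at most two images per message
    return urls
```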
-| Type | Description |
-|------|-------------|
-| `button` | Simple button click (default, easiest) |
-| `captcha` | Text-based captcha code entry |
-| `math` | Solve a simple math problem |
-| `emoji` | Select the correct emoji from options |
+### Cost Management
+
+The bot includes aggressive cost controls for AI usage:
+
+- **Rate Limiting**: 25 checks/hour/guild, 5/hour/user (configurable)
+- **Deduplication**: Skips recently analyzed message IDs
+- **File Size Limits**: Skips images larger than 3MB
+- **Max Images**: Analyzes max 2 images per message
+- **Optional Features**: Embed checking, video thumbnails, URL downloads all controllable
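The first two cost controls (hourly caps plus message-ID deduplication) can be sketched together. This is a hypothetical, in-memory illustration; the class name, `allow()` interface, and dedup window size are assumptions, not the actual `ai_rate_limiter.py` API. The caps mirror the documented defaults (25/hour/guild, 5/hour/user).

```python
import time
from collections import defaultdict, deque


class AIRateLimiter:
    """Illustrative sketch of per-guild/per-user hourly caps and dedup."""

    def __init__(self, guild_per_hour=25, user_per_hour=5, dedup_size=1000):
        self.guild_per_hour = guild_per_hour
        self.user_per_hour = user_per_hour
        self._guild_calls = defaultdict(deque)  # guild_id -> call times
        self._user_calls = defaultdict(deque)   # (guild_id, user_id) -> call times
        self._seen = deque(maxlen=dedup_size)   # recently analyzed message IDs

    @staticmethod
    def _prune(calls, now):
        # Discard calls older than one hour.
        while calls and now - calls[0] > 3600:
            calls.popleft()

    def allow(self, guild_id, user_id, message_id, now=None):
        """Return True if this message may be sent for AI analysis."""
        now = time.monotonic() if now is None else now
        if message_id in self._seen:
            return False  # deduplication: already analyzed
        g = self._guild_calls[guild_id]
        u = self._user_calls[(guild_id, user_id)]
        self._prune(g, now)
        self._prune(u, now)
        if len(g) >= self.guild_per_hour or len(u) >= self.user_per_hour:
            return False  # hourly cap reached
        g.append(now)
        u.append(now)
        self._seen.append(message_id)
        return True
```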
-### Setup
-
-1. Create a verified role in your server
-2. Configure the role permissions (verified members get full access)
-3. Set up verification:
-
-```
-!verify role @Verified
-!verify type captcha
-!verify enable
-```
-
-### How It Works
-
-1. New member joins the server
-2. Bot sends verification challenge via DM (or channel if DMs disabled)
-3. Member completes the challenge
-4. Bot assigns the verified role
-5. Member gains access to the server
-
-## AI Moderation
-
-GuardDen supports AI-powered content moderation using either Anthropic's Claude or OpenAI's GPT models.
-
-### Setup
-
-1. Set the AI provider in your environment:
-
-```bash
-GUARDDEN_AI_PROVIDER=anthropic        # or "openai"
-GUARDDEN_ANTHROPIC_API_KEY=sk-ant-... # if using Claude
-GUARDDEN_OPENAI_API_KEY=sk-...        # if using OpenAI
-```
-
-2. Enable AI moderation per server:
-
-```
-!ai enable
-!ai sensitivity 50  # 0=lenient, 100=strict
-!ai nsfw true       # Enable NSFW image detection
-```
-
-### Content Categories
-
-The AI analyzes content for:
-
-- **Harassment** - Personal attacks, bullying
-- **Hate Speech** - Discrimination, slurs
-- **Sexual Content** - Explicit material
-- **Violence** - Threats, graphic content
-- **Self-Harm** - Suicide/self-injury content
-- **Scams** - Phishing, fraud attempts
-- **Spam** - Promotional, low-quality content
-
-### How It Works
-
-1. Messages are analyzed by the AI provider
-2. Results include confidence scores and severity ratings
-3. Actions are taken based on guild sensitivity settings
-4. All AI actions are logged to the mod log channel
-
-### NSFW-Only Filtering Mode (Enabled by Default)
-
-**Default Behavior:**
-GuardDen is configured to only filter sexual/NSFW content by default. This allows communities to have mature discussions about violence, politics, and controversial topics while still maintaining a "safe for work" environment.
-
-**When enabled (DEFAULT):**
-- ✅ **Blocked:** Sexual content, nude images, explicit material
-- ❌ **Allowed:** Violence, harassment, hate speech, self-harm content
-
-**When disabled (strict mode):**
-- ✅ **Blocked:** All inappropriate content categories
-
-**To change to strict mode:**
-```yaml
-# Edit config/guilds/guild-<id>.yml
-ai_moderation:
-  nsfw_only_filtering: false
-```
-
-This default is useful for:
-- Gaming communities (violence in gaming discussions)
-- Mature discussion servers (politics, news)
-- Communities with specific content policies that allow violence but prohibit sexual material
-
-### Public In-Channel Warnings (Disabled by Default)
-
-**IMPORTANT PRIVACY NOTICE**: In-channel warnings are **PUBLIC** and visible to all users in the channel, NOT private messages. This is a Discord API limitation.
-
-When enabled and users have DMs disabled, moderation warnings are sent as temporary public messages in the channel where the violation occurred.
-
-**How it works:**
-1. Bot tries to DM the user about the violation
-2. If DM fails (user has DMs disabled):
-   - If `send_in_channel_warnings: true`: Sends a **PUBLIC** temporary message in the channel mentioning the user
-   - If `send_in_channel_warnings: false` (DEFAULT): Silent failure, no notification sent
-   - Message includes violation reason and any timeout information
-   - Message auto-deletes after 10 seconds
-3. If DM succeeds, no channel message is sent
-
-**To enable in-channel warnings:**
-```yaml
-# Edit config/guilds/guild-<id>.yml
-notifications:
-  send_in_channel_warnings: true
-```
-
-**Considerations:**
-
-**Pros:**
-- Users are always notified of moderation actions, even with DMs disabled
-- Public transparency about what content is not allowed
-- Educational for other members
-
-**Cons:**
-- **NOT PRIVATE** - Violation details visible to all users for 10 seconds
-- May embarrass users publicly
-- Could expose sensitive moderation information
-- Privacy-conscious communities may prefer silent failures
+**Estimated Costs** (with defaults):
+
+- Small server (< 100 users): ~$5-10/month
+- Medium server (100-500 users): ~$15-25/month
+- Large server (500+ users): Consider increasing rate limits or disabling embeds
 
 ## Development
 
@@ -638,15 +278,9 @@ MIT License - see LICENSE file for details.
 - **Documentation**: See `docs/` directory
 - **Configuration Help**: Check `CLAUDE.md` for developer guidance
 
-## Roadmap
+## Future Considerations
 
-- [x] AI-powered content moderation (Claude/OpenAI integration)
-- [x] NSFW image detection
-- [x] NSFW-only filtering mode (default)
-- [x] Optional public in-channel warnings when DMs disabled
-- [x] Verification/captcha system
-- [x] Rate limiting
-- [ ] Voice channel moderation
-- [ ] Slash commands with true ephemeral messages
-- [ ] Custom notification templates
-- [ ] Advanced analytics dashboard
+- [ ] Per-guild sensitivity settings (currently global)
+- [ ] Slash commands
+- [ ] Custom NSFW category thresholds
+- [ ] Whitelist for trusted image sources
214 migrations/versions/20260127_minimal_bot_cleanup.py Normal file
@@ -0,0 +1,214 @@
+"""Minimal bot cleanup - remove unused tables and columns.
+
+Revision ID: 20260127_minimal_bot_cleanup
+Revises: 20260125_add_whitelist
+Create Date: 2026-01-27 00:00:00.000000
+"""
+
+import sqlalchemy as sa
+from alembic import op
+from sqlalchemy.dialects import postgresql
+
+# revision identifiers, used by Alembic.
+revision = "20260127_minimal_bot_cleanup"
+down_revision = "20260125_add_whitelist"
+branch_labels = None
+depends_on = None
+
+
+def upgrade() -> None:
+    """Remove tables and columns not needed for minimal bot."""
+    # Drop unused tables
+    op.drop_table("user_activity")
+    op.drop_table("message_activity")
+    op.drop_table("ai_checks")
+    op.drop_table("banned_words")
+    op.drop_table("user_notes")
+    op.drop_table("strikes")
+    op.drop_table("moderation_logs")
+
+    # Drop unused columns from guild_settings
+    op.drop_column("guild_settings", "verification_enabled")
+    op.drop_column("guild_settings", "verification_type")
+    op.drop_column("guild_settings", "verified_role_id")
+    op.drop_column("guild_settings", "strike_actions")
+    op.drop_column("guild_settings", "mute_role_id")
+    op.drop_column("guild_settings", "mod_role_ids")
+    op.drop_column("guild_settings", "welcome_channel_id")
+    op.drop_column("guild_settings", "whitelisted_user_ids")
+    op.drop_column("guild_settings", "scam_allowlist")
+    op.drop_column("guild_settings", "send_in_channel_warnings")
+    op.drop_column("guild_settings", "ai_log_only")
+    op.drop_column("guild_settings", "ai_confidence_threshold")
+    op.drop_column("guild_settings", "log_channel_id")
+    op.drop_column("guild_settings", "mod_log_channel_id")
+    op.drop_column("guild_settings", "link_filter_enabled")
+
+
+def downgrade() -> None:
+    """Restore removed tables and columns (WARNING: Data will be lost!)."""
+    # Restore guild_settings columns
+    op.add_column(
+        "guild_settings",
+        sa.Column("link_filter_enabled", sa.Boolean, nullable=False, default=False),
+    )
+    op.add_column(
+        "guild_settings",
+        sa.Column("mod_log_channel_id", sa.BigInteger, nullable=True),
+    )
+    op.add_column(
+        "guild_settings",
+        sa.Column("log_channel_id", sa.BigInteger, nullable=True),
+    )
+    op.add_column(
+        "guild_settings",
+        sa.Column("ai_confidence_threshold", sa.Float, nullable=False, default=0.7),
+    )
+    op.add_column(
+        "guild_settings",
+        sa.Column("ai_log_only", sa.Boolean, nullable=False, default=False),
+    )
+    op.add_column(
+        "guild_settings",
+        sa.Column("send_in_channel_warnings", sa.Boolean, nullable=False, default=False),
+    )
+    op.add_column(
+        "guild_settings",
+        sa.Column(
+            "scam_allowlist",
+            postgresql.JSONB().with_variant(sa.JSON(), "sqlite"),
+            nullable=False,
+            default=list,
+        ),
+    )
+    op.add_column(
+        "guild_settings",
+        sa.Column(
+            "whitelisted_user_ids",
+            postgresql.JSONB().with_variant(sa.JSON(), "sqlite"),
+            nullable=False,
+            default=list,
+        ),
+    )
+    op.add_column(
+        "guild_settings",
+        sa.Column("welcome_channel_id", sa.BigInteger, nullable=True),
+    )
+    op.add_column(
+        "guild_settings",
+        sa.Column(
+            "mod_role_ids",
+            postgresql.JSONB().with_variant(sa.JSON(), "sqlite"),
+            nullable=False,
+            default=list,
+        ),
+    )
+    op.add_column(
+        "guild_settings",
+        sa.Column("mute_role_id", sa.BigInteger, nullable=True),
+    )
+    op.add_column(
+        "guild_settings",
+        sa.Column(
+            "strike_actions",
+            postgresql.JSONB().with_variant(sa.JSON(), "sqlite"),
+            nullable=False,
+        ),
+    )
+    op.add_column(
+        "guild_settings",
+        sa.Column("verified_role_id", sa.BigInteger, nullable=True),
+    )
+    op.add_column(
+        "guild_settings",
+        sa.Column("verification_type", sa.String(20), nullable=False, default="button"),
+    )
+    op.add_column(
+        "guild_settings",
+        sa.Column("verification_enabled", sa.Boolean, nullable=False, default=False),
+    )
+
+    # Restore tables (empty, data lost)
+    op.create_table(
+        "moderation_logs",
+        sa.Column("id", sa.Integer, primary_key=True, autoincrement=True),
+        sa.Column("guild_id", sa.BigInteger, nullable=False),
+        sa.Column("user_id", sa.BigInteger, nullable=False),
+        sa.Column("moderator_id", sa.BigInteger, nullable=False),
+        sa.Column("action", sa.String(20), nullable=False),
+        sa.Column("reason", sa.Text, nullable=True),
+        sa.Column("created_at", sa.DateTime, nullable=False),
+        sa.ForeignKeyConstraint(["guild_id"], ["guilds.id"], ondelete="CASCADE"),
+    )
+
+    op.create_table(
+        "strikes",
+        sa.Column("id", sa.Integer, primary_key=True, autoincrement=True),
+        sa.Column("guild_id", sa.BigInteger, nullable=False),
+        sa.Column("user_id", sa.BigInteger, nullable=False),
+        sa.Column("reason", sa.Text, nullable=True),
+        sa.Column("created_at", sa.DateTime, nullable=False),
+        sa.ForeignKeyConstraint(["guild_id"], ["guilds.id"], ondelete="CASCADE"),
+    )
+
+    op.create_table(
+        "user_notes",
+        sa.Column("id", sa.Integer, primary_key=True, autoincrement=True),
+        sa.Column("guild_id", sa.BigInteger, nullable=False),
+        sa.Column("user_id", sa.BigInteger, nullable=False),
+        sa.Column("moderator_id", sa.BigInteger, nullable=False),
+        sa.Column("note", sa.Text, nullable=False),
+        sa.Column("created_at", sa.DateTime, nullable=False),
+        sa.ForeignKeyConstraint(["guild_id"], ["guilds.id"], ondelete="CASCADE"),
+    )
+
+    op.create_table(
+        "banned_words",
+        sa.Column("id", sa.Integer, primary_key=True, autoincrement=True),
+        sa.Column("guild_id", sa.BigInteger, nullable=False),
+        sa.Column("pattern", sa.Text, nullable=False),
+        sa.Column("is_regex", sa.Boolean, nullable=False, default=False),
+        sa.Column("action", sa.String(20), nullable=False, default="delete"),
+        sa.Column("reason", sa.Text, nullable=True),
+        sa.Column("source", sa.String(100), nullable=True),
+        sa.Column("category", sa.String(20), nullable=True),
+        sa.Column("managed", sa.Boolean, nullable=False, default=False),
+        sa.Column("added_by", sa.BigInteger, nullable=False),
+        sa.Column("created_at", sa.DateTime, nullable=False),
+        sa.Column("updated_at", sa.DateTime, nullable=False),
+        sa.ForeignKeyConstraint(["guild_id"], ["guilds.id"], ondelete="CASCADE"),
+    )
+
+    op.create_table(
+        "ai_checks",
+        sa.Column("id", sa.Integer, primary_key=True, autoincrement=True),
+        sa.Column("guild_id", sa.BigInteger, nullable=False),
+        sa.Column("user_id", sa.BigInteger, nullable=False),
+        sa.Column("message_id", sa.BigInteger, nullable=False),
+        sa.Column("check_type", sa.String(20), nullable=False),
+        sa.Column("flagged", sa.Boolean, nullable=False),
+        sa.Column("confidence", sa.Float, nullable=False),
+        sa.Column("created_at", sa.DateTime, nullable=False),
+        sa.ForeignKeyConstraint(["guild_id"], ["guilds.id"], ondelete="CASCADE"),
+    )
+
+    op.create_table(
+        "message_activity",
+        sa.Column("id", sa.Integer, primary_key=True, autoincrement=True),
+        sa.Column("guild_id", sa.BigInteger, nullable=False),
+        sa.Column("user_id", sa.BigInteger, nullable=False),
+        sa.Column("channel_id", sa.BigInteger, nullable=False),
+        sa.Column("message_count", sa.Integer, nullable=False),
+        sa.Column("date", sa.Date, nullable=False),
+        sa.ForeignKeyConstraint(["guild_id"], ["guilds.id"], ondelete="CASCADE"),
+    )
+
+    op.create_table(
+        "user_activity",
+        sa.Column("id", sa.Integer, primary_key=True, autoincrement=True),
+        sa.Column("guild_id", sa.BigInteger, nullable=False),
+        sa.Column("user_id", sa.BigInteger, nullable=False),
+        sa.Column("last_seen", sa.DateTime, nullable=False),
+        sa.Column("message_count", sa.Integer, nullable=False, default=0),
+        sa.ForeignKeyConstraint(["guild_id"], ["guilds.id"], ondelete="CASCADE"),
+    )
@@ -1,17 +1,10 @@
 """Guild-related database models."""
 
-from datetime import datetime
-from typing import TYPE_CHECKING
-
-from sqlalchemy import JSON, Boolean, Float, ForeignKey, Integer, String, Text
-from sqlalchemy.dialects.postgresql import JSONB
+from sqlalchemy import Boolean, ForeignKey, Integer, String
 from sqlalchemy.orm import Mapped, mapped_column, relationship
 
 from guardden.models.base import Base, SnowflakeID, TimestampMixin
 
-if TYPE_CHECKING:
-    from guardden.models.moderation import ModerationLog, Strike
-
 
 class Guild(Base, TimestampMixin):
     """Represents a Discord guild (server) configuration."""
@@ -27,15 +20,6 @@ class Guild(Base, TimestampMixin):
     settings: Mapped["GuildSettings"] = relationship(
         back_populates="guild", uselist=False, cascade="all, delete-orphan"
     )
-    banned_words: Mapped[list["BannedWord"]] = relationship(
-        back_populates="guild", cascade="all, delete-orphan"
-    )
-    moderation_logs: Mapped[list["ModerationLog"]] = relationship(
-        back_populates="guild", cascade="all, delete-orphan"
-    )
-    strikes: Mapped[list["Strike"]] = relationship(
-        back_populates="guild", cascade="all, delete-orphan"
-    )
 
 
 class GuildSettings(Base, TimestampMixin):
@@ -51,94 +35,21 @@ class GuildSettings(Base, TimestampMixin):
     prefix: Mapped[str] = mapped_column(String(10), default="!", nullable=False)
     locale: Mapped[str] = mapped_column(String(10), default="en", nullable=False)
 
-    # Channel configuration (stored as snowflake IDs)
-    log_channel_id: Mapped[int | None] = mapped_column(SnowflakeID, nullable=True)
-    mod_log_channel_id: Mapped[int | None] = mapped_column(SnowflakeID, nullable=True)
-    welcome_channel_id: Mapped[int | None] = mapped_column(SnowflakeID, nullable=True)
-
-    # Role configuration
-    mute_role_id: Mapped[int | None] = mapped_column(SnowflakeID, nullable=True)
-    verified_role_id: Mapped[int | None] = mapped_column(SnowflakeID, nullable=True)
-    mod_role_ids: Mapped[dict] = mapped_column(
-        JSONB().with_variant(JSON(), "sqlite"), default=list, nullable=False
-    )
-
-    # Moderation settings
+    # Spam detection settings
     automod_enabled: Mapped[bool] = mapped_column(Boolean, default=True, nullable=False)
     anti_spam_enabled: Mapped[bool] = mapped_column(Boolean, default=True, nullable=False)
-    link_filter_enabled: Mapped[bool] = mapped_column(Boolean, default=False, nullable=False)
-
-    # Automod thresholds
     message_rate_limit: Mapped[int] = mapped_column(Integer, default=5, nullable=False)
     message_rate_window: Mapped[int] = mapped_column(Integer, default=5, nullable=False)
     duplicate_threshold: Mapped[int] = mapped_column(Integer, default=3, nullable=False)
     mention_limit: Mapped[int] = mapped_column(Integer, default=5, nullable=False)
    mention_rate_limit: Mapped[int] = mapped_column(Integer, default=10, nullable=False)
     mention_rate_window: Mapped[int] = mapped_column(Integer, default=60, nullable=False)
-    scam_allowlist: Mapped[list[str]] = mapped_column(
-        JSONB().with_variant(JSON(), "sqlite"), default=list, nullable=False
-    )
-
-    # Strike thresholds (actions at each threshold)
-    strike_actions: Mapped[dict] = mapped_column(
-        JSONB().with_variant(JSON(), "sqlite"),
-        default=lambda: {
-            "1": {"action": "warn"},
-            "3": {"action": "timeout", "duration": 300},
-            "5": {"action": "kick"},
-            "7": {"action": "ban"},
-        },
-        nullable=False,
-    )
 
     # AI moderation settings
     ai_moderation_enabled: Mapped[bool] = mapped_column(Boolean, default=True, nullable=False)
-    ai_sensitivity: Mapped[int] = mapped_column(Integer, default=80, nullable=False)  # 0-100 scale
-    ai_confidence_threshold: Mapped[float] = mapped_column(Float, default=0.7, nullable=False)
-    ai_log_only: Mapped[bool] = mapped_column(Boolean, default=False, nullable=False)
+    ai_sensitivity: Mapped[int] = mapped_column(Integer, default=80, nullable=False)
     nsfw_detection_enabled: Mapped[bool] = mapped_column(Boolean, default=True, nullable=False)
     nsfw_only_filtering: Mapped[bool] = mapped_column(Boolean, default=True, nullable=False)
-
-    # Notification settings
-    send_in_channel_warnings: Mapped[bool] = mapped_column(Boolean, default=False, nullable=False)
-
-    # Whitelist settings
-    whitelisted_user_ids: Mapped[list[int]] = mapped_column(
-        JSONB().with_variant(JSON(), "sqlite"), default=list, nullable=False
-    )
-
-    # Verification settings
-    verification_enabled: Mapped[bool] = mapped_column(Boolean, default=False, nullable=False)
-    verification_type: Mapped[str] = mapped_column(
-        String(20), default="button", nullable=False
-    )  # button, captcha, questions
 
     # Relationship
     guild: Mapped["Guild"] = relationship(back_populates="settings")
-
-
-class BannedWord(Base, TimestampMixin):
-    """Banned words/phrases for a guild with regex support."""
-
-    __tablename__ = "banned_words"
-
-    id: Mapped[int] = mapped_column(Integer, primary_key=True, autoincrement=True)
-    guild_id: Mapped[int] = mapped_column(
-        SnowflakeID, ForeignKey("guilds.id", ondelete="CASCADE"), nullable=False
-    )
-
-    pattern: Mapped[str] = mapped_column(Text, nullable=False)
-    is_regex: Mapped[bool] = mapped_column(Boolean, default=False, nullable=False)
-    action: Mapped[str] = mapped_column(
-        String(20), default="delete", nullable=False
-    )  # delete, warn, strike
-    reason: Mapped[str | None] = mapped_column(Text, nullable=True)
-    source: Mapped[str | None] = mapped_column(String(100), nullable=True)
-    category: Mapped[str | None] = mapped_column(String(20), nullable=True)
-    managed: Mapped[bool] = mapped_column(Boolean, default=False, nullable=False)
-
-    # Who added this and when
-    added_by: Mapped[int] = mapped_column(SnowflakeID, nullable=False)
-
-    # Relationship
-    guild: Mapped["Guild"] = relationship(back_populates="banned_words")
@@ -3,41 +3,10 @@
 import logging
 from typing import Any
 
-from guardden.services.ai.base import (
-    AIProvider,
-    ImageAnalysisResult,
-    ModerationResult,
-    PhishingAnalysisResult,
-    parse_categories,
-    run_with_retries,
-)
+from guardden.services.ai.base import AIProvider, ImageAnalysisResult, run_with_retries
 
 logger = logging.getLogger(__name__)
 
-# Content moderation system prompt
-MODERATION_SYSTEM_PROMPT = """You are a content moderation AI for a Discord server. Analyze the given message and determine if it violates community guidelines.
-
-Categories to check:
-- harassment: Personal attacks, bullying, intimidation
-- hate_speech: Discrimination, slurs, dehumanization based on identity
-- sexual: Explicit sexual content, sexual solicitation
-- violence: Threats, graphic violence, encouraging harm
-- self_harm: Suicide, self-injury content or encouragement
-- spam: Repetitive, promotional, or low-quality content
-- scam: Phishing attempts, fraudulent offers, impersonation
-- misinformation: Dangerous false information
-
-Respond in this exact JSON format:
-{
-    "is_flagged": true/false,
-    "confidence": 0.0-1.0,
-    "categories": ["category1", "category2"],
-    "explanation": "Brief explanation",
-    "suggested_action": "none/warn/delete/timeout/ban"
-}
-
-Be balanced - flag genuinely problematic content but allow normal conversation, jokes, and mild language. Consider context."""
-
 IMAGE_ANALYSIS_PROMPT = """Analyze this image for content moderation. Classify NSFW content with precision.
 
 NSFW CATEGORIES (in order of severity):
@@ -76,24 +45,6 @@ NSFW SEVERITY GUIDELINES:
 - nudity: 60-75
 - explicit: 80-100"""
 
-PHISHING_ANALYSIS_PROMPT = """Analyze this URL and message context for phishing or scam indicators.
-
-Check for:
-- Domain impersonation (typosquatting, lookalike domains)
-- Urgency tactics ("act now", "limited time")
-- Requests for credentials or personal info
-- Too-good-to-be-true offers
-- Suspicious redirects or URL shorteners
-- Mismatched or hidden URLs
-
-Respond in this exact JSON format:
-{
-    "is_phishing": true/false,
-    "confidence": 0.0-1.0,
-    "risk_factors": ["factor1", "factor2"],
-    "explanation": "Brief explanation"
-}"""
-
 
 class AnthropicProvider(AIProvider):
     """AI provider using Anthropic's Claude API."""
@@ -150,47 +101,6 @@ class AnthropicProvider(AIProvider):
 
         return json.loads(text)
 
-    async def moderate_text(
-        self,
-        content: str,
-        context: str | None = None,
-        sensitivity: int = 50,
-    ) -> ModerationResult:
-        """Analyze text content for policy violations."""
-        # Adjust prompt based on sensitivity
-        sensitivity_note = ""
-        if sensitivity < 30:
-            sensitivity_note = "\n\nBe lenient - only flag clearly problematic content."
-        elif sensitivity > 70:
-            sensitivity_note = "\n\nBe strict - flag anything potentially problematic."
-
-        system = MODERATION_SYSTEM_PROMPT + sensitivity_note
-
-        user_message = f"Message to analyze:\n{content}"
-        if context:
-            user_message = f"Context: {context}\n\n{user_message}"
-
-        try:
-            response = await self._call_api(system, user_message)
-            data = self._parse_json_response(response)
-
-            categories = parse_categories(data.get("categories", []))
-
-            return ModerationResult(
-                is_flagged=data.get("is_flagged", False),
-                confidence=float(data.get("confidence", 0.0)),
-                categories=categories,
-                explanation=data.get("explanation", ""),
-                suggested_action=data.get("suggested_action", "none"),
-            )
-
-        except Exception as e:
-            logger.error(f"Error moderating text: {e}")
-            return ModerationResult(
-                is_flagged=False,
-                explanation=f"Error analyzing content: {str(e)}",
-            )
-
     async def analyze_image(
         self,
         image_url: str,
@@ -276,31 +186,6 @@ SENSITIVITY: BALANCED
             logger.error(f"Error analyzing image: {e}")
             return ImageAnalysisResult(description=f"Error analyzing image: {str(e)}")
 
-    async def analyze_phishing(
-        self,
-        url: str,
-        message_content: str | None = None,
-    ) -> PhishingAnalysisResult:
-        """Analyze a URL for phishing/scam indicators."""
-        user_message = f"URL to analyze: {url}"
-        if message_content:
-            user_message += f"\n\nFull message context:\n{message_content}"
-
-        try:
-            response = await self._call_api(PHISHING_ANALYSIS_PROMPT, user_message)
-            data = self._parse_json_response(response)
-
-            return PhishingAnalysisResult(
-                is_phishing=data.get("is_phishing", False),
-                confidence=float(data.get("confidence", 0.0)),
-                risk_factors=data.get("risk_factors", []),
-                explanation=data.get("explanation", ""),
-            )
-
-        except Exception as e:
-            logger.error(f"Error analyzing phishing: {e}")
-            return PhishingAnalysisResult(explanation=f"Error analyzing URL: {str(e)}")
|
|
||||||
|
|
||||||
async def close(self) -> None:
|
async def close(self) -> None:
|
||||||
"""Clean up resources."""
|
"""Clean up resources."""
|
||||||
await self.client.close()
|
await self.client.close()
|
||||||
|
|||||||
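The retained `analyze_image` error path returns a placeholder `ImageAnalysisResult` instead of raising, so a failed or malformed API reply never blocks message handling. A minimal sketch of that pattern, with a reduced `ImageAnalysisResult` standing in for the real dataclass (field set trimmed for illustration):

```python
import json
from dataclasses import dataclass


@dataclass
class ImageAnalysisResult:
    """Reduced stand-in for the real result dataclass."""
    description: str = ""
    nsfw_severity: int = 0  # 0-100


def parse_image_reply(raw: str) -> ImageAnalysisResult:
    """Parse a provider JSON reply; on any failure, return a harmless
    zero-severity result, mirroring the retained except-branch."""
    try:
        data = json.loads(raw)
        return ImageAnalysisResult(
            description=data.get("description", ""),
            nsfw_severity=int(data.get("nsfw_severity", 0)),
        )
    except Exception as e:
        return ImageAnalysisResult(description=f"Error analyzing image: {e}")


good = parse_image_reply('{"description": "a cat photo", "nsfw_severity": 10}')
bad = parse_image_reply("not json at all")
print(good.nsfw_severity, bad.nsfw_severity)  # 10 0
```

Failing open (severity 0) is a deliberate cost-conscious choice here: an AI outage degrades to "no image moderation" rather than blocking every upload.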
@@ -91,53 +91,6 @@ async def run_with_retries(
     raise RuntimeError("Retry loop exited unexpectedly")
 
 
-@dataclass
-class ModerationResult:
-    """Result of AI content moderation."""
-
-    is_flagged: bool = False
-    confidence: float = 0.0  # 0.0 to 1.0
-    categories: list[ContentCategory] = field(default_factory=list)
-    explanation: str = ""
-    suggested_action: Literal["none", "warn", "delete", "timeout", "ban"] = "none"
-    severity_override: int | None = None  # Direct severity for NSFW images
-
-    @property
-    def severity(self) -> int:
-        """Get severity score 0-100 based on confidence and categories."""
-        if not self.is_flagged:
-            return 0
-
-        # Use override if provided (e.g., from NSFW image analysis)
-        if self.severity_override is not None:
-            return min(self.severity_override, 100)
-
-        # Base severity from confidence
-        severity = int(self.confidence * 50)
-
-        # Add severity based on category
-        high_severity = {
-            ContentCategory.HATE_SPEECH,
-            ContentCategory.SELF_HARM,
-            ContentCategory.SCAM,
-        }
-        medium_severity = {
-            ContentCategory.HARASSMENT,
-            ContentCategory.VIOLENCE,
-            ContentCategory.SEXUAL,
-        }
-
-        for cat in self.categories:
-            if cat in high_severity:
-                severity += 30
-            elif cat in medium_severity:
-                severity += 20
-            else:
-                severity += 10
-
-        return min(severity, 100)
-
-
 @dataclass
 class ImageAnalysisResult:
     """Result of AI image analysis."""
@@ -152,38 +105,8 @@ class ImageAnalysisResult:
     nsfw_severity: int = 0  # 0-100 specific NSFW severity score
 
 
-@dataclass
-class PhishingAnalysisResult:
-    """Result of AI phishing/scam analysis."""
-
-    is_phishing: bool = False
-    confidence: float = 0.0
-    risk_factors: list[str] = field(default_factory=list)
-    explanation: str = ""
-
-
 class AIProvider(ABC):
-    """Abstract base class for AI providers."""
+    """Abstract base class for AI providers - Image analysis only."""
 
-    @abstractmethod
-    async def moderate_text(
-        self,
-        content: str,
-        context: str | None = None,
-        sensitivity: int = 50,
-    ) -> ModerationResult:
-        """
-        Analyze text content for policy violations.
-
-        Args:
-            content: The text to analyze
-            context: Optional context about the conversation/server
-            sensitivity: 0-100, higher means more strict
-
-        Returns:
-            ModerationResult with analysis
-        """
-        pass
-
     @abstractmethod
     async def analyze_image(
@@ -203,24 +126,6 @@ class AIProvider(ABC):
         """
         pass
 
-    @abstractmethod
-    async def analyze_phishing(
-        self,
-        url: str,
-        message_content: str | None = None,
-    ) -> PhishingAnalysisResult:
-        """
-        Analyze a URL for phishing/scam indicators.
-
-        Args:
-            url: The URL to analyze
-            message_content: Optional full message for context
-
-        Returns:
-            PhishingAnalysisResult with analysis
-        """
-        pass
-
     @abstractmethod
     async def close(self) -> None:
         """Clean up resources."""
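After these removals, the abstract interface is just `analyze_image` and `close`. A sketch of the trimmed ABC with a dummy concrete provider (the real `analyze_image` signature is truncated at the hunk boundary above, so any extra parameters are elided here):

```python
import asyncio
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class ImageAnalysisResult:
    description: str = ""
    nsfw_severity: int = 0  # 0-100 specific NSFW severity score


class AIProvider(ABC):
    """Abstract base class for AI providers - Image analysis only."""

    @abstractmethod
    async def analyze_image(self, image_url: str) -> ImageAnalysisResult:
        """Analyze an image attachment."""

    @abstractmethod
    async def close(self) -> None:
        """Clean up resources."""


class DummyProvider(AIProvider):
    """Concrete stand-in; a real provider calls a vision API here."""

    async def analyze_image(self, image_url: str) -> ImageAnalysisResult:
        return ImageAnalysisResult(description=f"stub for {image_url}", nsfw_severity=0)

    async def close(self) -> None:
        pass


async def main() -> ImageAnalysisResult:
    provider = DummyProvider()
    try:
        return await provider.analyze_image("https://example.com/pic.png")
    finally:
        await provider.close()


result = asyncio.run(main())
print(result.description)  # stub for https://example.com/pic.png
```

Keeping the ABC (rather than collapsing to a single class) preserves the Claude/OpenAI provider swap without either concrete class leaking into callers.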
@@ -3,14 +3,7 @@
 import logging
 from typing import Any
 
-from guardden.services.ai.base import (
-    AIProvider,
-    ContentCategory,
-    ImageAnalysisResult,
-    ModerationResult,
-    PhishingAnalysisResult,
-    run_with_retries,
-)
+from guardden.services.ai.base import AIProvider, ImageAnalysisResult, run_with_retries
 
 logger = logging.getLogger(__name__)
 
@@ -35,107 +28,12 @@ class OpenAIProvider(AIProvider):
         self.model = model
         logger.info(f"Initialized OpenAI provider with model: {model}")
 
-    async def _call_api(
-        self,
-        system: str,
-        user_content: Any,
-        max_tokens: int = 500,
-    ) -> str:
-        """Make an API call to OpenAI."""
-
-        async def _request() -> str:
-            response = await self.client.chat.completions.create(
-                model=self.model,
-                max_tokens=max_tokens,
-                messages=[
-                    {"role": "system", "content": system},
-                    {"role": "user", "content": user_content},
-                ],
-                response_format={"type": "json_object"},
-            )
-            return response.choices[0].message.content or ""
-
-        try:
-            return await run_with_retries(
-                _request,
-                logger=logger,
-                operation_name="OpenAI chat completion",
-            )
-        except Exception as e:
-            logger.error(f"OpenAI API error: {e}")
-            raise
-
     def _parse_json_response(self, response: str) -> dict:
         """Parse JSON from response."""
         import json
 
         return json.loads(response)
 
-    async def moderate_text(
-        self,
-        content: str,
-        context: str | None = None,
-        sensitivity: int = 50,
-    ) -> ModerationResult:
-        """Analyze text content for policy violations."""
-        # First, use OpenAI's built-in moderation API for quick check
-        try:
-
-            async def _moderate() -> Any:
-                return await self.client.moderations.create(input=content)
-
-            mod_response = await run_with_retries(
-                _moderate,
-                logger=logger,
-                operation_name="OpenAI moderation",
-            )
-            results = mod_response.results[0]
-
-            # Map OpenAI categories to our categories
-            category_mapping = {
-                "harassment": ContentCategory.HARASSMENT,
-                "harassment/threatening": ContentCategory.HARASSMENT,
-                "hate": ContentCategory.HATE_SPEECH,
-                "hate/threatening": ContentCategory.HATE_SPEECH,
-                "self-harm": ContentCategory.SELF_HARM,
-                "self-harm/intent": ContentCategory.SELF_HARM,
-                "self-harm/instructions": ContentCategory.SELF_HARM,
-                "sexual": ContentCategory.SEXUAL,
-                "sexual/minors": ContentCategory.SEXUAL,
-                "violence": ContentCategory.VIOLENCE,
-                "violence/graphic": ContentCategory.VIOLENCE,
-            }
-
-            flagged_categories = []
-            max_score = 0.0
-
-            for category, score in results.category_scores.model_dump().items():
-                if score > 0.5:  # Threshold
-                    if category in category_mapping:
-                        flagged_categories.append(category_mapping[category])
-                    max_score = max(max_score, score)
-
-            # Adjust threshold based on sensitivity
-            threshold = 0.3 + (0.4 * (100 - sensitivity) / 100)  # 0.3 to 0.7
-
-            if results.flagged or max_score > threshold:
-                return ModerationResult(
-                    is_flagged=True,
-                    confidence=max_score,
-                    categories=list(set(flagged_categories)),
-                    explanation="Content flagged by moderation API",
-                    suggested_action="delete" if max_score > 0.8 else "warn",
-                )
-
-            return ModerationResult(is_flagged=False, confidence=1.0 - max_score)
-
-        except Exception as e:
-            logger.error(f"Error moderating text: {e}")
-            return ModerationResult(
-                is_flagged=False,
-                explanation=f"Error analyzing content: {str(e)}",
-            )
-
     async def analyze_image(
         self,
         image_url: str,
@@ -223,41 +121,6 @@ NSFW SEVERITY GUIDELINES: none=0, suggestive=20-35, partial_nudity=40-55, nudity
             logger.error(f"Error analyzing image: {e}")
             return ImageAnalysisResult(description=f"Error analyzing image: {str(e)}")
 
-    async def analyze_phishing(
-        self,
-        url: str,
-        message_content: str | None = None,
-    ) -> PhishingAnalysisResult:
-        """Analyze a URL for phishing/scam indicators."""
-        system = """Analyze the URL for phishing/scam indicators. Respond in JSON:
-{
-    "is_phishing": true/false,
-    "confidence": 0.0-1.0,
-    "risk_factors": ["factor1"],
-    "explanation": "Brief explanation"
-}
-
-Check for: domain impersonation, urgency tactics, credential requests, too-good-to-be-true offers."""
-
-        user_message = f"URL: {url}"
-        if message_content:
-            user_message += f"\n\nMessage context: {message_content}"
-
-        try:
-            response = await self._call_api(system, user_message)
-            data = self._parse_json_response(response)
-
-            return PhishingAnalysisResult(
-                is_phishing=data.get("is_phishing", False),
-                confidence=float(data.get("confidence", 0.0)),
-                risk_factors=data.get("risk_factors", []),
-                explanation=data.get("explanation", ""),
-            )
-
-        except Exception as e:
-            logger.error(f"Error analyzing phishing: {e}")
-            return PhishingAnalysisResult(explanation=f"Error analyzing URL: {str(e)}")
-
     async def close(self) -> None:
         """Clean up resources."""
         await self.client.close()
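One detail worth recording from the deleted `moderate_text` path: the guild sensitivity knob mapped linearly onto the flagging threshold, so sensitivity 0 meant a lenient 0.7 cutoff and sensitivity 100 a strict 0.3. The mapping in isolation:

```python
def flag_threshold(sensitivity: int) -> float:
    """Score threshold from the deleted OpenAI moderate_text path:
    higher sensitivity -> lower threshold -> stricter flagging."""
    return 0.3 + (0.4 * (100 - sensitivity) / 100)  # ranges 0.3 to 0.7


for s in (0, 50, 100):
    print(s, round(flag_threshold(s), 2))
```

A message was flagged when `results.flagged` was set or the highest category score exceeded this threshold; anything comparable would need to be reimplemented if text moderation ever returns.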