docs: Rewrite README with comprehensive feature documentation

Complete overhaul of README.md with better structure and clarity.

New sections:
- Clear overview of what GuardDen is (and isn't)
- Feature comparison table with costs
- Detailed feature descriptions
- Prerequisites table
- Step-by-step Discord bot setup
- Configuration options explained
- Detection flow diagram
- Cost controls breakdown
- Troubleshooting guide
- Project structure
- Development guide

Improvements:
- Professional formatting with tables
- Clear cost transparency
- Better quick start instructions
- Comprehensive configuration guide
- Troubleshooting section for common issues
2026-01-27 20:19:49 +01:00
parent 562c92dae6
commit b9bc2cf0b5
1 changed files with 333 additions and 193 deletions
--- a/README.md
+++ b/README.md
@@ -1,158 +1,231 @@
 # GuardDen

-A lightweight, cost-conscious Discord moderation bot focused on essential protection. Built for self-hosting with minimal resource usage and AI costs.
+A lightweight, cost-conscious Discord moderation bot focused on automated protection against spam and NSFW content. Built for self-hosting with minimal resource usage and predictable AI costs.
+
+## Overview
+
+GuardDen is a minimal Discord bot for owners of one or two small to medium servers (guilds) who need automated moderation without the complexity of a full-featured moderation system. It focuses on two core areas:
+
+1. **Spam Detection** - Automatic rate limiting, duplicate detection, and mass mention protection
+2. **NSFW Content Filtering** - AI-powered image analysis with aggressive cost controls
+
+**What GuardDen is NOT:**
+- Not a full moderation suite (no manual mod commands, logging, or strike systems)
+- Not a verification/captcha system
+- Not a chat moderation bot (no text analysis, banned words, or scam detection)
+
+**Target Users:**
+- Small community servers that need automated spam + NSFW protection
+- Budget-conscious server owners (~$5-25/month AI costs)
+- Self-hosters who want a simple, maintainable bot
+
+---

 ## Features

-### Spam Detection
- **Anti-Spam** - Rate limiting, duplicate detection, mass mention protection
- **Automatic Actions** - Message deletion and user timeout for spam violations
+| Feature | Description | Cost |
+|---------|-------------|------|
+| **Spam Detection** | Rate limiting, duplicate messages, mass mentions | Free |
+| **NSFW Image Detection** | AI-powered analysis of images/GIFs using Claude or GPT | ~$5-25/month |
+| **User Blocklist** | Block ALL media from specific users instantly | Free |
+| **NSFW Domain Blocking** | Instant blocking of known NSFW video domains | Free |
+| **Cost Controls** | Rate limits, deduplication, file size limits | Built-in |
+| **Single Config File** | One YAML file for all settings | Free |
+| **Owner Commands** | Status, reload, ping | Free |

-### AI-Powered NSFW Image Detection
- **Smart Image Analysis** - AI-powered detection of inappropriate images using Claude or GPT
- **Cost Controls** - Conservative rate limits (25 checks/hour/guild by default)
- **Embed Support** - Optional checking of Discord GIF embeds
- **NSFW Video Domain Blocking** - Block known NSFW video domains
- **Configurable Sensitivity** - Adjust strictness (0-100)
+### Spam Detection
+
+Automatically detects and deletes spam messages based on:
+- **Message Rate Limiting**: Max 5 messages per 5 seconds (configurable)
+- **Duplicate Detection**: Flags repeated identical messages
+- **Mass Mentions**: Limits @mentions per message and per time window
+- **Actions**: Deletes message, no notifications to user
+
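The rate-limit and duplicate rules above can be sketched as a per-user sliding window. This is a minimal illustration under stated assumptions, not GuardDen's actual implementation: the class name `SpamTracker` and its `is_spam` method are hypothetical, duplicates are assumed to be counted inside the same window, and the thresholds simply mirror the documented defaults (5 messages per 5 seconds, 3 duplicates).

```python
from collections import defaultdict, deque


class SpamTracker:
    """Hypothetical per-user sliding-window spam check (illustration only)."""

    def __init__(self, rate_limit=5, rate_window=5.0, duplicate_threshold=3):
        self.rate_limit = rate_limit
        self.rate_window = rate_window
        self.duplicate_threshold = duplicate_threshold
        # user_id -> deque of (timestamp, content) for messages inside the window
        self._recent = defaultdict(deque)

    def is_spam(self, user_id, content, now):
        """Record a message and report whether it breaks either rule."""
        window = self._recent[user_id]
        # Drop entries that have aged out of the window.
        while window and now - window[0][0] >= self.rate_window:
            window.popleft()
        window.append((now, content))

        # More than `rate_limit` messages inside the window counts as flooding.
        too_fast = len(window) > self.rate_limit
        # Assumption: identical messages are counted within the same window.
        duplicates = sum(1 for _, text in window if text == content)
        return too_fast or duplicates >= self.duplicate_threshold
```

Passing `now` explicitly (for example from `time.monotonic()`) keeps the sketch easy to test without sleeping.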
+### NSFW Image Detection
+
+AI-powered analysis of images and GIFs with strict cost controls:
+- **Supported Providers**: Anthropic Claude, OpenAI GPT
+- **Content Types**: Image attachments, Discord GIF embeds (optional)
+- **NSFW Categories**: Suggestive, Partial Nudity, Nudity, Explicit
+- **Filtering Mode**: NSFW-only by default (only blocks sexual content)
+- **Cost Controls**:
+  - 25 AI checks/hour/guild (default)
+  - 5 AI checks/hour/user (default)
+  - Image deduplication (tracks 1000 recent messages)
+  - File size limit (skips files larger than 3 MB)
+  - Max images per message (2 by default)
+- **Actions**: Deletes message, no notifications to user
+
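The two hourly caps listed above could be enforced with a pair of sliding windows, one per guild and one per user. The sketch below is hypothetical: `AICheckLimiter` and `try_acquire` are illustrative names, and it assumes the per-user cap is scoped to a guild, which the README does not state.

```python
from collections import defaultdict, deque


class AICheckLimiter:
    """Hypothetical hourly limiter mirroring the documented defaults."""

    def __init__(self, per_guild=25, per_user=5, window_seconds=3600.0):
        self.per_guild = per_guild
        self.per_user = per_user
        self.window = window_seconds
        self._guild_hits = defaultdict(deque)
        # Assumption: the per-user budget is tracked separately for each guild.
        self._user_hits = defaultdict(deque)

    def _prune(self, hits, now):
        # Forget checks that happened more than one window ago.
        while hits and now - hits[0] >= self.window:
            hits.popleft()

    def try_acquire(self, guild_id, user_id, now):
        """Return True and record the check only if both budgets have room."""
        guild_hits = self._guild_hits[guild_id]
        user_hits = self._user_hits[(guild_id, user_id)]
        self._prune(guild_hits, now)
        self._prune(user_hits, now)
        if len(guild_hits) >= self.per_guild or len(user_hits) >= self.per_user:
            return False
        guild_hits.append(now)
        user_hits.append(now)
        return True
```

Because a denied request is never recorded, a user who keeps posting after hitting the cap does not extend their own lockout.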
+### User Blocklist
+
+Instantly delete ALL media from specific users:
+- **Blocks**: Images, GIFs, embeds, URLs
+- **No AI Cost**: Instant deletion without analysis
+- **Use Case**: Known problematic users, spam accounts
+
+### NSFW Domain Blocking
+
+Pre-configured list of known NSFW video domains:
+- **Blocks**: pornhub.com, xvideos.com, xnxx.com, etc.
+- **No AI Cost**: Pattern matching only
+- **Instant**: Deletes message immediately
+
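Domain blocking of this kind needs no AI call; it can be done by extracting URLs and comparing host names. The following is a rough sketch, not the bot's real code: `contains_blocked_domain` and `URL_PATTERN` are hypothetical names, and matching subdomains (such as `www.`) is an assumption about the intended behavior.

```python
import re
from urllib.parse import urlparse

# Example entries taken from the README's domain list.
NSFW_VIDEO_DOMAINS = {"pornhub.com", "xvideos.com", "xnxx.com"}

# Deliberately simple URL matcher; real messages may need stricter parsing.
URL_PATTERN = re.compile(r"https?://\S+", re.IGNORECASE)


def contains_blocked_domain(message_text, blocked=NSFW_VIDEO_DOMAINS):
    """Return True if any URL in the text points at a blocked domain or subdomain."""
    for url in URL_PATTERN.findall(message_text):
        try:
            host = (urlparse(url).hostname or "").lower()
        except ValueError:
            # Malformed URL (for example a broken IPv6 literal): ignore it.
            continue
        # Match the domain itself or any subdomain, but not look-alike hosts.
        if any(host == domain or host.endswith("." + domain) for domain in blocked):
            return True
    return False
```

Comparing on the parsed host name rather than searching the raw text avoids false positives on hosts that merely contain a blocked name as a substring.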
+---

 ## Quick Start

 ### Prerequisites
- Python 3.11+
- PostgreSQL 15+
- Discord Bot Token (see setup below)
- (Optional) Anthropic or OpenAI API key for AI features

-### Discord Bot Setup
+| Requirement | Version | Purpose |
+|-------------|---------|---------|
+| Python | 3.11+ | Bot runtime |
+| PostgreSQL | 15+ | Database |
+| Discord Bot Token | - | Bot authentication |
+| AI API Key (optional) | - | Anthropic or OpenAI key for NSFW detection |

-1. Go to the [Discord Developer Portal](https://discord.com/developers/applications)
-2. Click **New Application** and give it a name (e.g., "GuardDen")
-3. Go to the **Bot** tab and click **Add Bot**
+### 1. Discord Bot Setup

-4. **Configure Bot Settings:**
-   - Disable **Public Bot** if you only want yourself to add it
-   - Copy the **Token** (click "Reset Token") - this is your `GUARDDEN_DISCORD_TOKEN`
+1. **Create Application**
+   - Go to [Discord Developer Portal](https://discord.com/developers/applications)
+   - Click **New Application** → Name it (e.g., "GuardDen")
+   - Go to **Bot** tab → **Add Bot**

-5. **Enable Privileged Gateway Intents** (required):
-   - **Message Content Intent** - for reading messages (spam detection, image checking)
+2. **Get Bot Token**
+   - Click **Reset Token** → Copy the token
+   - Save as `GUARDDEN_DISCORD_TOKEN` in `.env`

-6. **Generate Invite URL** - Go to **OAuth2** > **URL Generator**:
-   
-   **Scopes:**
-   - `bot`
-   
-   **Bot Permissions:**
-   - Moderate Members (timeout)
-   - View Channels
-   - Send Messages
-   - Manage Messages
-   - Read Message History
-   
-   Or use permission integer: `275415089216`
+3. **Enable Intents**
+   - Enable **Message Content Intent** (required for reading messages)

-7. Use the generated URL to invite the bot to your server
+4. **Generate Invite URL**
+   - Go to **OAuth2** → **URL Generator**
+   - Select scopes: `bot`
+   - Select permissions:
+     - Moderate Members (timeout)
+     - View Channels
+     - Send Messages
+     - Manage Messages
+     - Read Message History
+   - Or use permission integer: `275415089216`
+   - Copy generated URL and invite to your server

-### Docker Deployment (Recommended)
+### 2. Installation

-1. Clone the repository:
-   ```bash
-   git clone https://git.hiddenden.cafe/Hiddenden/GuardDen.git
-   cd guardden
-   ```
+**Option A: Docker (Recommended)**

-2. Create your configuration files:
-   ```bash
-   # Environment variables
-   cp .env.example .env
-   # Edit .env and add your Discord token
-   
-   # Bot configuration
-   cp config.example.yml config.yml
-   # Edit config.yml with your settings
-   ```
+```bash
+# Clone repository
+git clone https://git.hiddenden.cafe/Hiddenden/GuardDen.git
+cd GuardDen

-3. Start with Docker Compose:
-   ```bash
-   docker compose up -d
-   ```
+# Create configuration files
+cp .env.example .env
+cp config.example.yml config.yml

-### Local Development
+# Edit .env - Add your Discord token
+nano .env

-1. Create a virtual environment:
-   ```bash
-   python -m venv venv
-   source venv/bin/activate  # On Windows: venv\Scripts\activate
-   ```
+# Edit config.yml - Configure settings
+nano config.yml

-2. Install dependencies:
-   ```bash
-   pip install -e ".[dev,ai]"
-   ```
+# Start with Docker Compose
+docker compose up -d

-3. Set up configuration:
-   ```bash
-   # Environment variables
-   cp .env.example .env
-   # Edit .env with your Discord token
-   
-   # Bot configuration
-   cp config.example.yml config.yml
-   # Edit config.yml with your settings
-   ```
+# View logs
+docker logs guardden-bot -f
+```

-4. Start PostgreSQL (or use Docker):
-   ```bash
-   docker compose up db -d
-   ```
+**Option B: Local Development**

-5. Run the bot:
-   ```bash
-   python -m guardden
-   ```
+```bash
+# Clone repository
+git clone https://git.hiddenden.cafe/Hiddenden/GuardDen.git
+cd GuardDen
+
+# Create virtual environment
+python -m venv venv
+source venv/bin/activate  # Windows: venv\Scripts\activate
+
+# Install dependencies
+pip install -e ".[dev,ai]"
+
+# Create configuration files
+cp .env.example .env
+cp config.example.yml config.yml
+
+# Edit configuration
+nano .env
+nano config.yml
+
+# Start PostgreSQL (or use Docker)
+docker compose up db -d
+
+# Run database migrations
+alembic upgrade head
+
+# Start bot
+python -m guardden
+```
+
+---

 ## Configuration

-GuardDen uses a **single YAML configuration file** (`config.yml`) for managing all bot settings across all guilds.
+### Environment Variables (`.env`)

-### Configuration File (`config.yml`)
+| Variable | Required | Description | Default |
+|----------|----------|-------------|---------|
+| `GUARDDEN_DISCORD_TOKEN` | ✅ | Discord bot token | - |
+| `GUARDDEN_DATABASE_URL` | No | PostgreSQL connection URL | `postgresql://guardden:guardden@localhost:5432/guardden` |
+| `GUARDDEN_LOG_LEVEL` | No | Logging level (DEBUG/INFO/WARNING/ERROR) | `INFO` |
+| `GUARDDEN_AI_PROVIDER` | No | AI provider (`anthropic`/`openai`/`none`) | `none` |
+| `GUARDDEN_ANTHROPIC_API_KEY` | No* | Anthropic API key | - |
+| `GUARDDEN_OPENAI_API_KEY` | No* | OpenAI API key | - |

-Create a `config.yml` file in your project root:
+\*Required if `GUARDDEN_AI_PROVIDER` is set to `anthropic` or `openai`
+
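To make the table's defaults and the conditional API-key requirement concrete, here is a minimal sketch of how these variables might be read. `load_env_settings` is a hypothetical helper written for illustration; GuardDen's own settings code in `config.py` may work differently.

```python
import os

DEFAULT_DATABASE_URL = "postgresql://guardden:guardden@localhost:5432/guardden"


def load_env_settings(env=None):
    """Hypothetical helper: read the documented variables and apply their defaults."""
    env = os.environ if env is None else env

    token = env.get("GUARDDEN_DISCORD_TOKEN", "").strip()
    if not token:
        raise ValueError("GUARDDEN_DISCORD_TOKEN is required")

    provider = env.get("GUARDDEN_AI_PROVIDER", "none").lower()
    if provider not in {"anthropic", "openai", "none"}:
        raise ValueError(f"Unknown AI provider: {provider!r}")

    api_key = None
    if provider != "none":
        # The key variable is only required once a provider is selected.
        key_name = f"GUARDDEN_{provider.upper()}_API_KEY"
        api_key = env.get(key_name)
        if not api_key:
            raise ValueError(f"{key_name} is required when provider is {provider!r}")

    return {
        "discord_token": token,
        "database_url": env.get("GUARDDEN_DATABASE_URL", DEFAULT_DATABASE_URL),
        "log_level": env.get("GUARDDEN_LOG_LEVEL", "INFO").upper(),
        "ai_provider": provider,
        "ai_api_key": api_key,
    }
```

Accepting a plain mapping instead of always reading `os.environ` makes the rules checkable without touching the real environment.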
+### Bot Configuration (`config.yml`)

 ```yaml
+# Bot Settings
 bot:
  prefix: "!"
  owner_ids:
-    - 123456789012345678  # Your Discord user ID
+    - 123456789012345678  # Your Discord user ID (for owner commands)

-# Spam detection settings
+# Spam Detection
 automod:
  enabled: true
  anti_spam_enabled: true
  message_rate_limit: 5           # Max messages per window
  message_rate_window: 5          # Window in seconds
-  duplicate_threshold: 3          # Duplicates to trigger
+  duplicate_threshold: 3          # Duplicate messages to trigger
  mention_limit: 5                # Max mentions per message
  mention_rate_limit: 10          # Max mentions per window
-  mention_rate_window: 60         # Window in seconds
+  mention_rate_window: 60         # Mention window in seconds

-# AI moderation settings
+# AI Moderation (NSFW Detection)
 ai_moderation:
  enabled: true
  sensitivity: 80                  # 0-100 (higher = stricter)
  nsfw_only_filtering: true        # Only filter sexual content
-  max_checks_per_hour_per_guild: 25  # Cost control
-  max_checks_per_user_per_hour: 5    # Cost control
-  max_images_per_message: 2          # Analyze max 2 images/msg
-  max_image_size_mb: 3               # Skip images > 3MB
-  check_embed_images: true           # Check Discord GIF embeds
+  
+  # Cost Controls
+  max_checks_per_hour_per_guild: 25  # Conservative limit
+  max_checks_per_user_per_hour: 5    # Prevent abuse
+  max_images_per_message: 2          # Analyze max 2 images
+  max_image_size_mb: 3               # Skip large files
+  
+  # Feature Toggles
+  check_embed_images: true           # Check Discord GIFs
  check_video_thumbnails: false      # Skip video thumbnails
-  url_image_check_enabled: false     # Skip URL image downloads
+  url_image_check_enabled: false     # Skip URL downloads

-# User blocklist (blocks ALL media from specific users)
+# User Blocklist (instant deletion)
 blocked_user_ids:
  - 123456789012345678  # Discord user ID to block

-# Known NSFW video domains (auto-block)
+# NSFW Domain Blocklist (instant blocking)
 nsfw_video_domains:
  - pornhub.com
  - xvideos.com
@@ -161,64 +234,100 @@ nsfw_video_domains:
  - youporn.com
 ```

-### Key Configuration Options
-
-**AI Moderation (NSFW Image Detection):**
- `sensitivity`: 0-100 scale (higher = stricter detection)
- `nsfw_only_filtering`: Only flag sexual content (violence/harassment allowed)
- `max_checks_per_hour_per_guild`: Cost control - limits AI API calls
- `check_embed_images`: Whether to analyze Discord GIF embeds
+### Configuration Options Explained

 **Spam Detection:**
- `message_rate_limit`: Max messages allowed per window
- `duplicate_threshold`: How many duplicate messages trigger action
+- `message_rate_limit`: How many messages allowed in time window
+- `duplicate_threshold`: How many identical messages trigger spam detection
 - `mention_limit`: Max @mentions allowed per message

+**AI Moderation:**
+- `sensitivity`: Detection strictness (80 = balanced, 100 = very strict, 50 = lenient)
+- `nsfw_only_filtering`: `true` = only block sexual content (default), `false` = block all inappropriate content
+- `max_checks_per_hour_per_guild`: Hard limit on AI API calls per guild (cost control)
+- `max_checks_per_user_per_hour`: Per-user limit to prevent spam/abuse
+
 **User Blocklist:**
- `blocked_user_ids`: List of Discord user IDs to block
- Automatically deletes ALL images, GIFs, embeds, and URLs from these users
- No AI cost - instant deletion
- Useful for known problematic users or spam accounts
+- Add Discord user IDs to instantly delete ALL their media
+- No AI cost - a simple user ID lookup
+- Useful for repeat offenders or spam bots

-**Cost Controls:**
-The bot includes multiple layers of cost control:
- Rate limiting (25 AI checks/hour/guild, 5/hour/user by default)
- Image deduplication (tracks last 1000 analyzed messages)
- File size limits (skip images > 3MB)
- Max images per message (analyze max 2 images)
- Optional embed checking (disable to save costs)
+**Cost Estimation:**
+- Small server (< 100 users): ~$5-10/month
+- Medium server (100-500 users): ~$15-25/month
+- Large server (500+ users): Expect higher costs; tune the rate limits or disable embed checking to stay within budget
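The rate limits also give a hard upper bound on AI calls that is independent of server size. The sketch below works through that arithmetic; both function names are hypothetical, and `price_per_check_usd` is a placeholder you would have to fill in from your provider's pricing, since the README does not state a per-check price.

```python
def monthly_check_ceiling(checks_per_hour=25, hours_per_day=24, days=30):
    """Most AI checks one guild can trigger in a month at the given hourly cap."""
    return checks_per_hour * hours_per_day * days


def monthly_cost_ceiling(price_per_check_usd, checks_per_hour=25, days=30):
    """Worst-case monthly spend per guild for a caller-supplied price per check."""
    return monthly_check_ceiling(checks_per_hour, days=days) * price_per_check_usd
```

With the default cap, `monthly_check_ceiling()` returns 18000 checks per guild per 30-day month. Real usage only approaches that ceiling if the limit is hit every hour of every day.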

-### Environment Variables
-
-| Variable | Description | Default |
-|----------|-------------|---------|
-| `GUARDDEN_DISCORD_TOKEN` | Your Discord bot token | **Required** |
-| `GUARDDEN_DATABASE_URL` | PostgreSQL connection URL | `postgresql://guardden:guardden@localhost:5432/guardden` |
-| `GUARDDEN_LOG_LEVEL` | Logging level | `INFO` |
-| `GUARDDEN_AI_PROVIDER` | AI provider (anthropic/openai/none) | `none` |
-| `GUARDDEN_ANTHROPIC_API_KEY` | Anthropic API key (if using Claude) | - |
-| `GUARDDEN_OPENAI_API_KEY` | OpenAI API key (if using GPT) | - |
+---

 ## Owner Commands

-GuardDen includes a minimal set of owner-only commands for bot management:
-
 | Command | Description |
 |---------|-------------|
 | `!status` | Show bot status (uptime, guilds, latency, AI provider) |
-| `!reload` | Reload all cogs |
+| `!reload` | Reload all cogs (apply code changes without restart) |
 | `!ping` | Check bot latency |

-**Note:** All configuration is done via the `config.yml` file. There are no in-Discord configuration commands.
+**Note:** All configuration is done via `config.yml`. There are no in-Discord configuration commands.

-## Project Structure
+---
+
+## How It Works
+
+### Detection Flow
+
+```
+Message Received
+    ↓
+[1] User Blocklist Check (instant)
+    ↓ (if not blocked)
+[2] NSFW Domain Check (instant)
+    ↓ (if no NSFW domain)
+[3] Spam Detection (free)
+    ↓ (if not spam)
+[4] Has Images/Embeds?
+    ↓ (if yes)
+[5] AI Rate Limit Check
+    ↓ (if under limit)
+[6] Image Deduplication
+    ↓ (if not analyzed recently)
+[7] AI Analysis (cost)
+    ↓
+[8] Action: Delete if violation
+```
+
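The ordering in the diagram matters because the free checks short-circuit before the paid AI call. A minimal sketch of that control flow follows. The `decide` function and the step names in `checks` are hypothetical, and the behavior when the rate limit is hit or the image was already analyzed (skip AI and allow the message) is an assumption, since the diagram only shows the path that continues.

```python
def decide(message, checks):
    """Run the documented steps in order and return (action, reason).

    `checks` maps step names to callables taking the message and returning bool.
    """
    # Steps 1-3 are free, so they run before anything that costs money.
    for name in ("user_blocked", "nsfw_domain", "spam"):
        if checks[name](message):
            return ("delete", name)
    # Step 4: nothing to analyze.
    if not checks["has_images"](message):
        return ("allow", "no_images")
    # Steps 5-6 (assumption): over the limit or already analyzed means skip AI.
    if not checks["under_ai_rate_limit"](message):
        return ("allow", "rate_limited")
    if checks["recently_analyzed"](message):
        return ("allow", "already_analyzed")
    # Steps 7-8: the only paid call, reached last.
    if checks["ai_flags_violation"](message):
        return ("delete", "ai_violation")
    return ("allow", "clean")
```

Passing the checks in as callables keeps the ordering logic separate from how each individual check is implemented.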
+### Action Behavior
+
+When a violation is detected:
+- ✅ **Message deleted** immediately
+- ✅ **Action logged** to console/log file
+- ❌ **No DM sent** to user (silent)
+- ❌ **No timeout** applied (delete only)
+- ❌ **No moderation log** in Discord
+
+### Cost Controls
+
+Multiple layers to keep AI costs predictable:
+
+1. **User Blocklist** - Skip AI entirely for known bad actors
+2. **Domain Blocklist** - Skip AI for known NSFW domains
+3. **Rate Limiting** - Hard caps per guild and per user
+4. **Deduplication** - Don't re-analyze the same message
+5. **File Size Limits** - Skip very large files
+6. **Max Images** - Limit images analyzed per message
+7. **Optional Features** - Disable embed checking to save costs
+
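The deduplication layer can be pictured as a bounded cache of recently analyzed message IDs. The sketch below is illustrative only: `RecentlyAnalyzed` is a hypothetical class, and evicting the least recently seen ID once the cache holds 1000 entries is an assumption about how "tracks 1000 recent messages" is enforced.

```python
from collections import OrderedDict


class RecentlyAnalyzed:
    """Hypothetical bounded cache of already-analyzed message IDs."""

    def __init__(self, max_size=1000):
        self.max_size = max_size
        self._seen = OrderedDict()

    def check_and_add(self, message_id):
        """Return True if the ID was already seen; otherwise remember it."""
        if message_id in self._seen:
            # Refresh the entry so frequently repeated IDs stay cached.
            self._seen.move_to_end(message_id)
            return True
        self._seen[message_id] = None
        if len(self._seen) > self.max_size:
            self._seen.popitem(last=False)  # evict the oldest entry
        return False
```

The fixed size keeps memory use bounded regardless of how busy the server is.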
+---
+
+## Development
+
+### Project Structure

 ```
 guardden/
 ├── src/guardden/
 │   ├── bot.py                  # Main bot class
 │   ├── config.py               # Settings management
-│   ├── cogs/                   # Discord command groups
+│   ├── cogs/                   # Discord command modules
 │   │   ├── automod.py          # Spam detection
 │   │   ├── ai_moderation.py    # NSFW image detection
 │   │   └── owner.py            # Owner commands
@@ -228,86 +337,117 @@ guardden/
 │   │   ├── ai/                 # AI provider implementations
 │   │   ├── automod.py          # Spam detection logic
 │   │   ├── config_loader.py    # YAML config loading
-│   │   ├── ai_rate_limiter.py  # AI cost control
-│   │   ├── database.py         # DB connections
-│   │   └── guild_config.py     # Config caching
+│   │   ├── ai_rate_limiter.py  # Cost control
+│   │   └── database.py         # DB connections
 │   └── __main__.py             # Entry point
-├── config.yml                  # Bot configuration
+├── config.yml                  # Bot configuration (not in git)
+├── config.example.yml          # Configuration template
+├── .env                        # Environment variables (not in git)
+├── .env.example                # Environment template
 ├── tests/                      # Test suite
 ├── migrations/                 # Database migrations
 ├── docker-compose.yml          # Docker deployment
-├── pyproject.toml              # Dependencies
-└── README.md                   # This file
+└── pyproject.toml              # Dependencies
 ```

-## How It Works
-
-### User Blocklist (Instant, No AI Cost)
-1. Checks if message author is in `blocked_user_ids` list
-2. If message contains ANY media (images, embeds, URLs), instantly deletes it
-3. No AI analysis needed - immediate action
-4. Useful for known spam accounts or problematic users
-
-### Spam Detection
-1. Bot monitors message rate per user
-2. Detects duplicate messages
-3. Counts @mentions (mass mention detection)
-4. Violations result in message deletion + timeout
-
-### NSFW Image Detection
-1. Checks user blocklist first (instant deletion if matched)
-2. Checks NSFW video domain blocklist (instant deletion)
-3. Bot checks attachments and embeds for images
-4. Applies rate limiting and deduplication
-5. Downloads image and sends to AI provider
-6. AI analyzes for NSFW content categories
-7. Violations result in message deletion + timeout
-
-### Cost Management
-The bot includes aggressive cost controls for AI usage:
- **Rate Limiting**: 25 checks/hour/guild, 5/hour/user (configurable)
- **Deduplication**: Skips recently analyzed message IDs
- **File Size Limits**: Skips images larger than 3MB
- **Max Images**: Analyzes max 2 images per message
- **Optional Features**: Embed checking, video thumbnails, URL downloads all controllable
-
-**Estimated Costs** (with defaults):
- Small server (< 100 users): ~$5-10/month
- Medium server (100-500 users): ~$15-25/month
- Large server (500+ users): Consider increasing rate limits or disabling embeds
-
-## Development
-
 ### Running Tests

 ```bash
+# Run all tests
 pytest
-pytest -v                           # Verbose output
-pytest tests/test_automod.py        # Specific file
-pytest -k "test_scam"               # Filter by name
+
+# Run specific tests
+pytest tests/test_automod.py
+
+# Run with coverage
+pytest --cov=src/guardden --cov-report=html
 ```

 ### Code Quality

 ```bash
-ruff check src tests                # Linting
-ruff format src tests               # Formatting
-mypy src                            # Type checking
+# Linting
+ruff check src tests
+
+# Formatting
+ruff format src tests
+
+# Type checking
+mypy src
 ```

+### Database Migrations
+
+```bash
+# Apply migrations
+alembic upgrade head
+
+# Create new migration
+alembic revision --autogenerate -m "description"
+
+# Rollback one migration
+alembic downgrade -1
+```
+
+---
+
+## Troubleshooting
+
+### Bot won't start
+
+**Error: `Config file not found: config.yml`**
+- Solution: Copy `config.example.yml` to `config.yml` and edit settings
+
+**Error: `Discord token cannot be empty`**
+- Solution: Add `GUARDDEN_DISCORD_TOKEN` to `.env` file
+
+**Error: `Cannot import name 'ModerationResult'`**
+- Solution: Pull latest changes and rebuild: `docker compose up -d --build`
+
+### Bot doesn't respond to commands
+
+**Check:**
+1. Bot is online in Discord
+2. Bot has correct permissions (Manage Messages, View Channels)
+3. Your user ID is in `owner_ids` in config.yml
+4. Check logs: `docker logs guardden-bot -f`
+
+### AI not working
+
+**Check:**
+1. `ai_moderation.enabled: true` in config.yml
+2. `GUARDDEN_AI_PROVIDER` set to `anthropic` or `openai` in .env
+3. API key is set in .env (`GUARDDEN_ANTHROPIC_API_KEY` or `GUARDDEN_OPENAI_API_KEY`)
+4. Check logs for API errors
+
+### High AI costs
+
+**Reduce costs by:**
+1. Lower `max_checks_per_hour_per_guild` in config.yml
+2. Set `check_embed_images: false` to skip GIF embeds
+3. Add known offenders to `blocked_user_ids`
+4. Lower `max_image_size_mb` so more large files are skipped
+
+---
+
 ## License

 MIT License - see LICENSE file for details.

+---
+
 ## Support

- **Issues**: Report bugs at https://github.com/anthropics/claude-code/issues
- **Documentation**: See `docs/` directory
- **Configuration Help**: Check `CLAUDE.md` for developer guidance
+- **Issues**: [Report bugs](https://git.hiddenden.cafe/Hiddenden/GuardDen/issues)
+- **Configuration**: See `CLAUDE.md` for developer guidance
+- **Testing**: See `TESTING_TODO.md` for test status

-## Future Considerations
+---

- [ ] Per-guild sensitivity settings (currently global)
+## Roadmap
+
+- [ ] Per-guild configuration support
 - [ ] Slash commands
- [ ] Custom NSFW category thresholds
+- [ ] Custom NSFW thresholds per category
 - [ ] Whitelist for trusted image sources
+- [ ] Dashboard for viewing stats