first version of the knowledge base :)

2026-03-14 11:41:54 +01:00
commit 27965301ad
47 changed files with 4356 additions and 0 deletions


@@ -0,0 +1,125 @@
---
title: Docker Basics
description: Practical introduction to Docker images, containers, and everyday command-line workflows
tags:
- containers
- docker
- linux
category: containers
created: 2026-03-14
updated: 2026-03-14
---
# Docker Basics
## Introduction
Docker packages applications and their dependencies into images that run as isolated containers. For homelab and developer workflows, it is commonly used to deploy repeatable services without building a full virtual machine for each workload.
## Purpose
Docker is useful when you need:
- Repeatable application packaging
- Simple local development environments
- Fast service deployment on Linux hosts
- Clear separation between host OS and application runtime
## Architecture Overview
Core Docker concepts:
- Image: immutable application package template
- Container: running instance of an image
- Registry: source for pulling and pushing images
- Volume: persistent storage outside the writable container layer
- Network: connectivity boundary for one or more containers
Typical flow:
```text
Dockerfile -> Image -> Registry or local cache -> Container runtime
```
## Step-by-Step Guide
### 1. Verify Docker is installed
```bash
docker version
docker info
```
### 2. Pull and run a container
```bash
docker pull nginx:stable
docker run -d --name web -p 8080:80 nginx:stable
```
### 3. Inspect the running container
```bash
docker ps
docker logs web
docker exec -it web sh
```
### 4. Stop and remove it
```bash
docker stop web
docker rm web
```
## Configuration Example
Run a service with a persistent named volume:
```bash
docker volume create app-data
docker run -d \
--name app \
-p 3000:3000 \
-v app-data:/var/lib/app \
ghcr.io/example/app:latest
```
Inspect resource usage:
```bash
docker stats
```
## Troubleshooting Tips
### Container starts and exits immediately
- Check `docker logs <container>`
- Verify the image's default command is valid
- Confirm required environment variables or mounted files exist
### Port publishing does not work
- Verify the service is listening inside the container
- Confirm the host port is not already in use
- Check host firewall rules
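The second check can be done before a container is even started. A minimal sketch using bash's built-in `/dev/tcp` redirection, so no extra tools are required; port 8080 matches the earlier `docker run` example:

```shell
#!/usr/bin/env bash
# Sketch: check whether a host TCP port is already bound before publishing it.
set -euo pipefail

port_in_use() {
  local port="$1"
  # A successful connect means something is already listening on the port;
  # the subshell closes the descriptor again when it exits.
  (exec 3<>"/dev/tcp/127.0.0.1/${port}") 2>/dev/null
}

if port_in_use 8080; then
  echo "port 8080: in use - pick another host port for -p"
else
  echo "port 8080: free"
fi
```

This only tests loopback reachability; a service bound to a specific non-loopback address would need the matching address in the check.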
### Data disappears after recreation
- Use a named volume or bind mount instead of the writable container layer
- Confirm the application writes data to the mounted path
## Best Practices
- Pin images to a known tag and update intentionally
- Use named volumes for application state
- Prefer non-root containers when supported by the image
- Keep containers single-purpose and externalize configuration
- Use Compose for multi-service stacks instead of long `docker run` commands
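As an example of the last point, the `docker run` command from the Configuration Example above translates directly into a small Compose file (service and volume names carried over unchanged):

```yaml
services:
  app:
    image: ghcr.io/example/app:latest
    ports:
      - "3000:3000"
    volumes:
      - app-data:/var/lib/app

volumes:
  app-data:
```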
## References
- [Docker: Docker overview](https://docs.docker.com/get-started/docker-overview/)
- [Docker: Get started](https://docs.docker.com/get-started/)
- [Docker: Volumes](https://docs.docker.com/engine/storage/volumes/)


@@ -0,0 +1,156 @@
---
title: Docker Compose Patterns
description: Reusable patterns for structuring Docker Compose applications in homelab and development environments
tags:
- containers
- docker
- compose
category: containers
created: 2026-03-14
updated: 2026-03-14
---
# Docker Compose Patterns
## Introduction
Docker Compose defines multi-container applications in a single declarative file. It is a good fit for homelab stacks, local development, and small self-hosted services that do not require a full orchestrator.
## Purpose
Compose helps when you need:
- Repeatable service definitions
- Shared networks and volumes for a stack
- Environment-specific overrides
- A clear deployment artifact that can live in Git
## Architecture Overview
A Compose application usually includes:
- One or more services
- One or more shared networks
- Persistent volumes
- Environment variables and mounted configuration
- Optional health checks and startup dependencies
## Step-by-Step Guide
### 1. Start with a minimal Compose file
```yaml
services:
app:
image: ghcr.io/example/app:1.2.3
ports:
- "8080:8080"
```
Start it:
```bash
docker compose up -d
docker compose ps
```
### 2. Add persistent storage and configuration
```yaml
services:
app:
image: ghcr.io/example/app:1.2.3
ports:
- "8080:8080"
environment:
APP_BASE_URL: "https://app.example.com"
volumes:
- app-data:/var/lib/app
volumes:
app-data:
```
### 3. Add dependencies with health checks
```yaml
services:
db:
image: postgres:16
environment:
POSTGRES_DB: app
POSTGRES_USER: app
POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
healthcheck:
test: ["CMD-SHELL", "pg_isready -U app"]
interval: 10s
timeout: 5s
retries: 5
volumes:
- db-data:/var/lib/postgresql/data
app:
image: ghcr.io/example/app:1.2.3
depends_on:
db:
condition: service_healthy
environment:
DATABASE_URL: postgres://app:${POSTGRES_PASSWORD}@db:5432/app
ports:
- "8080:8080"
volumes:
db-data:
```
## Common Patterns
### Use one project directory per stack
Keep the Compose file, `.env` example, and mounted config together in one directory.
### Use user-defined networks
Private internal services should communicate over Compose networks rather than the host network.
### Prefer explicit volumes
Named volumes are easier to back up and document than anonymous ones.
### Use profiles for optional services
Profiles are useful for dev-only services, one-shot migration jobs, or optional observability components.
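A sketch of the profile pattern, using a hypothetical one-shot migration service (the image and command are placeholders):

```yaml
services:
  migrate:
    image: ghcr.io/example/app:1.2.3
    command: ["app", "migrate"]
    profiles:
      - migration
```

The service is skipped by default and only starts when the profile is activated, for example with `docker compose --profile migration up`.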
## Troubleshooting Tips
### Services start in the wrong order
- Use health checks instead of only container start order
- Ensure the application retries database or dependency connections
### Configuration drift between hosts
- Commit the Compose file to Git
- Keep secrets out of the file and inject them separately
- Avoid host-specific bind mount paths when portability matters
### Containers cannot resolve each other
- Check that the services share the same Compose network
- Use the service name as the hostname
- Verify the application is not hard-coded to `localhost`
## Best Practices
- Omit the deprecated top-level `version` field in new Compose files
- Keep secrets outside the Compose YAML when possible
- Pin images to intentional versions
- Use health checks for stateful dependencies
- Treat Compose as deployment code and review changes like application code
## References
- [Docker: Compose file reference](https://docs.docker.com/reference/compose-file/)
- [Docker: Compose application model](https://docs.docker.com/compose/intro/compose-application-model/)
- [Docker: Control startup and shutdown order in Compose](https://docs.docker.com/compose/how-tos/startup-order/)
- [Compose Specification](https://compose-spec.io/)


@@ -0,0 +1,124 @@
---
title: Tailscale Exit Nodes
description: Guide to publishing and using Tailscale exit nodes for internet-bound traffic
tags:
- networking
- tailscale
- vpn
category: networking
created: 2026-03-14
updated: 2026-03-14
---
# Tailscale Exit Nodes
## Introduction
An exit node is a Tailscale device that forwards a client's default route. When enabled, internet-bound traffic leaves through that node instead of the client's local network.
## Purpose
Exit nodes are commonly used for:
- Secure browsing on untrusted networks
- Reaching the internet through a trusted home or lab connection
- Testing geo-dependent behavior from another site
- Concentrating egress through a monitored network path
## Architecture Overview
With an exit node, the selected client sends default-route traffic through Tailscale to the exit node, which then forwards it to the public internet.
```text
Client -> Tailscale tunnel -> Exit node -> Internet
```
Important implications:
- The exit node becomes part of the trust boundary
- Bandwidth, DNS behavior, and logging depend on the exit node's network
- Local LAN access on the client may need explicit allowance
## Step-by-Step Guide
### 1. Prepare the exit node host
Choose a stable host with sufficient upstream bandwidth and a network path you trust. Typical choices are a home server, small VPS, or a utility VM.
### 2. Advertise the node as an exit node
On the node:
```bash
sudo tailscale up --advertise-exit-node
```
You can combine this with tags:
```bash
sudo tailscale up --advertise-exit-node --advertise-tags=tag:exit-node
```
### 3. Approve or review the role
Approve the exit node in the admin console if required by policy. Restrict who can use it with ACLs or grants.
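A sketch of such a restriction in the tailnet policy file, assuming a hypothetical `admins` group; `autogroup:internet` scopes the rule to internet traffic through exit nodes:

```json
{
  "acls": [
    {
      "action": "accept",
      "src": ["group:admins"],
      "dst": ["autogroup:internet:*"]
    }
  ]
}
```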
### 4. Select the exit node on a client
From a client, choose the exit node in the Tailscale UI or configure it from the CLI:
```bash
sudo tailscale up --exit-node=<exit-node-name-or-ip>
```
If the client still needs to reach its local LAN directly, allow that explicitly, for example with `tailscale up --exit-node-allow-lan-access` or the equivalent toggle in the client UI.
## Configuration Example
Example for a dedicated Linux exit node:
```bash
sudo tailscale up \
--advertise-exit-node \
--advertise-tags=tag:exit-node
```
Client-side example:
```bash
sudo tailscale up --exit-node=home-gateway
curl https://ifconfig.me
```
## Troubleshooting Tips
### Internet access stops after selecting the exit node
- Confirm the exit node is online in `tailscale status`
- Verify the exit node host itself has working internet access
- Check the exit node's local firewall and forwarding configuration
### Local printers or NAS become unreachable
- Enable local LAN access on the client if that behavior is required
- Split administrative traffic from internet egress if the use case is mixed
### Performance is poor
- Verify the client is using a nearby and healthy exit node
- Check the exit node's CPU, uplink bandwidth, and packet loss
- Avoid placing an exit node behind overloaded or unstable consumer hardware
## Best Practices
- Use exit nodes for specific trust and egress requirements, not as a default for every device
- Restrict usage to approved groups or devices
- Keep exit nodes patched because they handle broad traffic scopes
- Log and monitor egress hosts like any other shared network gateway
- Separate personal browsing, admin traffic, and production service egress when the risk model requires it
## References
- [Tailscale: Exit nodes](https://tailscale.com/kb/1103/exit-nodes)
- [Tailscale: What is Tailscale?](https://tailscale.com/kb/1151/what-is-tailscale)
- [Tailscale: Access controls](https://tailscale.com/kb/1018/acls)


@@ -0,0 +1,143 @@
---
title: Tailscale Subnet Routing
description: Guide to publishing LAN subnets into a Tailscale tailnet with subnet routers
tags:
- networking
- tailscale
- routing
category: networking
created: 2026-03-14
updated: 2026-03-14
---
# Tailscale Subnet Routing
## Introduction
Subnet routing allows Tailscale clients to reach devices that are not running the Tailscale agent directly. This is useful for printers, storage appliances, hypervisors, IoT controllers, and legacy systems on a homelab LAN.
## Purpose
Use subnet routing when:
- A device cannot run the Tailscale client
- A full site-to-site VPN is unnecessary
- Remote users need access to one or more internal networks
- You want to publish access to a specific VLAN without exposing the entire environment
## Architecture Overview
A subnet router is a Tailscale node with IP forwarding enabled. It advertises one or more LAN prefixes to the tailnet.
```text
Remote client -> Tailscale tunnel -> Subnet router -> LAN target
```
Recommended placement:
- One router per routed network or security zone
- Prefer stable hosts such as small Linux VMs, routers, or dedicated utility nodes
- Apply restrictive ACLs so only approved identities can use the route
## Step-by-Step Guide
### 1. Prepare the router host
Install Tailscale on a Linux host that already has reachability to the target subnet.
Enable IPv4 forwarding:
```bash
echo 'net.ipv4.ip_forward = 1' | sudo tee /etc/sysctl.d/99-tailscale.conf
sudo sysctl --system
```
If the subnet is IPv6-enabled, also enable IPv6 forwarding:
```bash
echo 'net.ipv6.conf.all.forwarding = 1' | sudo tee -a /etc/sysctl.d/99-tailscale.conf
sudo sysctl --system
```
### 2. Advertise the subnet
Start Tailscale and advertise the route:
```bash
sudo tailscale up --advertise-routes=192.168.10.0/24
```
Multiple routes can be advertised as a comma-separated list:
```bash
sudo tailscale up --advertise-routes=192.168.10.0/24,192.168.20.0/24
```
### 3. Approve the route
Approve the advertised route in the Tailscale admin console, or pre-authorize it with `autoApprovers` if that matches your policy model.
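If pre-authorization fits the policy model, an `autoApprovers` sketch for the route and tag used elsewhere in this guide would look like this:

```json
{
  "autoApprovers": {
    "routes": {
      "192.168.10.0/24": ["tag:subnet-router"]
    }
  }
}
```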
### 4. Restrict access
Use ACLs or grants so only the necessary users or tagged devices can reach the routed subnet.
Example policy intent:
- `group:admins` can reach `192.168.10.0/24`
- `group:developers` can only reach specific hosts or ports
- IoT and management subnets require separate approval
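That intent could be sketched in the policy file like this; the group names and the single developer-reachable host and port are hypothetical:

```json
{
  "acls": [
    {
      "action": "accept",
      "src": ["group:admins"],
      "dst": ["192.168.10.0/24:*"]
    },
    {
      "action": "accept",
      "src": ["group:developers"],
      "dst": ["192.168.10.20:443"]
    }
  ]
}
```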
## Configuration Example
Example server-side command for a dedicated subnet router:
```bash
sudo tailscale up \
--advertise-routes=192.168.10.0/24 \
--advertise-tags=tag:subnet-router
```
Example policy idea:
```json
{
"tagOwners": {
"tag:subnet-router": ["group:admins"]
}
}
```
## Troubleshooting Tips
### Clients can see the route but cannot reach hosts
- Verify IP forwarding is enabled on the router
- Confirm local firewall rules permit forwarding traffic
- Make sure the router has normal LAN connectivity to the destination hosts
- Check whether the destination host has a host firewall blocking the source
### Route does not appear in the tailnet
- Confirm the router is online in `tailscale status`
- Check that the route was approved in the admin console
- Review whether policy requires a specific tag owner or auto-approval
### Asymmetric routing or reply failures
- Make sure the subnet router is in the normal return path for the destination subnet
- Avoid overlapping subnets across multiple sites unless routing precedence is intentional
- Do not advertise broad prefixes when a narrower one is sufficient
## Best Practices
- Advertise the smallest subnet that solves the use case
- Run subnet routers on stable infrastructure, not laptops
- Use separate routers for management and user-facing networks where possible
- Combine routing with ACLs; route advertisement alone is not authorization
- Monitor route health and document ownership of every advertised prefix
## References
- [Tailscale: Subnet routers](https://tailscale.com/kb/1019/subnets)
- [Tailscale: Access controls](https://tailscale.com/kb/1018/acls)
- [Tailscale: Policy file syntax](https://tailscale.com/kb/1337/policy-syntax)


@@ -0,0 +1,135 @@
---
title: SSH Hardening
description: Practical SSH server hardening guidance for Linux systems in homelab and self-hosted environments
tags:
- security
- ssh
- linux
category: security
created: 2026-03-14
updated: 2026-03-14
---
# SSH Hardening
## Introduction
SSH is the primary administrative entry point for many Linux systems. Hardening it reduces the likelihood of credential attacks, accidental privilege exposure, and overly broad remote access.
## Purpose
This guide focuses on making SSH safer by:
- Disabling weak authentication paths
- Reducing exposure to brute-force attacks
- Limiting which users can log in
- Preserving maintainability by relying on modern OpenSSH defaults where possible
## Architecture Overview
SSH hardening has three layers:
- Transport and daemon configuration
- Network exposure and firewall policy
- Operational practices such as key handling and logging
For most self-hosted systems, the best model is:
```text
Admin workstation -> VPN or trusted network -> SSH server
```
## Step-by-Step Guide
### 1. Use key-based authentication
Generate a key on the client and copy the public key to the server:
```bash
ssh-keygen -t ed25519 -C "admin@example.com"
ssh-copy-id admin@server.example
```
### 2. Harden `sshd_config`
Baseline example:
```text
PermitRootLogin no
PasswordAuthentication no
KbdInteractiveAuthentication no
PubkeyAuthentication yes
MaxAuthTries 3
LoginGraceTime 30
X11Forwarding no
AllowTcpForwarding no
AllowAgentForwarding no
AllowUsers admin
```
If you need port forwarding for a specific workflow, enable it deliberately instead of leaving it broadly available.
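One way to do that is a `Match` block in `sshd_config` that re-enables forwarding only for a specific account; the user and destination here are hypothetical:

```text
Match User deploy
    AllowTcpForwarding local
    PermitOpen localhost:5432
```

This keeps the global `AllowTcpForwarding no` intact while granting one account a narrowly scoped local forward.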
### 3. Validate the configuration
```bash
sudo sshd -t
```
### 4. Reload safely
Keep an existing SSH session open while reloading:
```bash
sudo systemctl reload sshd
```
Distribution-specific service names may be `ssh` or `sshd`.
### 5. Restrict network exposure
- Prefer VPN-only or management-VLAN-only access
- Allow SSH from trusted subnets only
- Do not expose SSH publicly unless it is necessary and monitored
## Configuration Example
Example host firewall intent:
```text
Allow TCP 22 from 192.168.10.0/24
Allow TCP 22 from Tailscale tailnet range
Deny TCP 22 from all other sources
```
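A sketch of that intent as an nftables ruleset; the LAN subnet is an example, and `100.64.0.0/10` is the CGNAT range Tailscale assigns tailnet addresses from:

```text
table inet filter {
  chain input {
    type filter hook input priority 0; policy accept;
    tcp dport 22 ip saddr 192.168.10.0/24 accept
    tcp dport 22 ip saddr 100.64.0.0/10 accept
    tcp dport 22 drop
  }
}
```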
## Troubleshooting Tips
### Locked out after config change
- Keep the original session open until a new login succeeds
- Validate the daemon config with `sshd -t`
- Check the service name and logs with `journalctl -u sshd` or `journalctl -u ssh`
### Key authentication fails
- Check file permissions on `~/.ssh` and `authorized_keys`
- Confirm the server allows public key authentication
- Verify the client is offering the intended key with `ssh -v`
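The permission check can be scripted. A self-contained sketch that sets up a scratch directory with the modes sshd's `StrictModes` expects (`700` on `~/.ssh`, `600` on `authorized_keys`) and prints them back; point it at a real home directory in practice:

```shell
#!/usr/bin/env bash
# Sketch: show the ~/.ssh permissions sshd's StrictModes checking expects.
# A temporary directory stands in for a real home directory here.
set -euo pipefail
home=$(mktemp -d)
mkdir -p "$home/.ssh"
touch "$home/.ssh/authorized_keys"
chmod 700 "$home/.ssh"
chmod 600 "$home/.ssh/authorized_keys"

file_mode() {
  # GNU stat first, BSD stat as a fallback.
  stat -c '%a' "$1" 2>/dev/null || stat -f '%Lp' "$1"
}

for path in "$home/.ssh" "$home/.ssh/authorized_keys"; do
  echo "$path -> $(file_mode "$path")"
done
```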
### Automation jobs break
- Review whether the workload depended on password auth, port forwarding, or agent forwarding
- Create narrowly scoped exceptions rather than reverting the whole hardening change
## Best Practices
- Rely on current OpenSSH defaults for ciphers and algorithms unless you have a specific compliance need
- Disable password-based interactive logins on internet-reachable systems
- Use individual user accounts and `sudo` instead of direct root SSH
- Combine SSH hardening with network-level restrictions
- Review SSH logs regularly on administrative systems
## References
- [OpenBSD `sshd_config` manual](https://man.openbsd.org/sshd_config)
- [OpenSSH](https://www.openssh.com/)
- [Mozilla OpenSSH guidelines](https://infosec.mozilla.org/guidelines/openssh)


@@ -0,0 +1,113 @@
---
title: YubiKey Usage
description: Guide to using a YubiKey for SSH, authentication, and key protection in self-hosted environments
tags:
- security
- yubikey
- ssh
category: security
created: 2026-03-14
updated: 2026-03-14
---
# YubiKey Usage
## Introduction
A YubiKey is a hardware token that can protect authentication and cryptographic operations. In homelab and engineering workflows, it is commonly used for MFA, SSH keys, and protection of GPG subkeys.
## Purpose
Use a YubiKey when you want:
- Stronger authentication than password-only login
- Private keys that require physical presence
- Portable hardware-backed credentials for administrative access
## Architecture Overview
YubiKeys can be used through different interfaces:
- FIDO2 or WebAuthn: MFA and modern hardware-backed authentication
- OpenSSH security keys: SSH keys such as `ed25519-sk`
- OpenPGP applet: card-resident GPG subkeys
- PIV: smart-card style certificate workflows
Choose the interface based on the workflow instead of trying to use one mode for everything.
## Step-by-Step Guide
### 1. Use the key for MFA first
Register the YubiKey with identity providers and critical services before moving on to SSH or GPG workflows.
### 2. Create a hardware-backed SSH key
On a system with OpenSSH support for security keys:
```bash
ssh-keygen -t ed25519-sk -C "admin@example.com"
```
This creates a key pair whose private-key operations require the physical token; the file written to `~/.ssh` is only a handle that is useless without the hardware.
### 3. Install the public key on servers
```bash
ssh-copy-id -i ~/.ssh/id_ed25519_sk.pub admin@server.example
```
### 4. Test login
```bash
ssh admin@server.example
```
Expect a touch prompt when required by the device policy.
## Configuration Example
Example client SSH config for a dedicated administrative target:
```text
Host lab-admin
HostName server.example
User admin
IdentityFile ~/.ssh/id_ed25519_sk
```
For GPG workflows, move only subkeys onto the YubiKey and keep the primary key offline when possible.
## Troubleshooting Tips
### The key is not detected
- Confirm USB or NFC access is available
- Check whether another smart-card daemon has locked the device
- Verify the client OS has support for the intended mode
### SSH prompts repeatedly or fails
- Make sure the correct public key is installed on the server
- Confirm the client is offering the security-key identity
- Check that the OpenSSH version supports the selected key type
### GPG or smart-card workflows are inconsistent
- Verify which YubiKey applet is in use
- Avoid mixing PIV and OpenPGP instructions unless the workflow requires both
- Keep backup tokens or recovery paths for administrative access
## Best Practices
- Use the YubiKey as part of a broader account recovery plan, not as the only path back in
- Keep at least one spare token for high-value admin accounts
- Prefer hardware-backed SSH keys for administrator accounts
- Document which services rely on the token and how recovery works
- Separate MFA usage from certificate and signing workflows unless there is a clear operational reason to combine them
## References
- [Yubico: SSH](https://developers.yubico.com/SSH/)
- [Yubico: YubiKey and OpenPGP](https://developers.yubico.com/PGP/)
- [Yubico developer documentation](https://developers.yubico.com/)


@@ -0,0 +1,121 @@
---
title: Backup Strategies
description: Practical backup strategy guidance for self-hosted services, containers, and virtualized homelabs
tags:
- backup
- self-hosting
- operations
category: self-hosting
created: 2026-03-14
updated: 2026-03-14
---
# Backup Strategies
## Introduction
Backups protect against deletion, corruption, hardware failure, ransomware, and operational mistakes. In self-hosted environments, a backup strategy should cover both data and the information needed to restore services correctly.
## Purpose
This guide covers:
- What to back up
- How often to back it up
- Where to store copies
- How to validate restore readiness
## Architecture Overview
A good strategy includes:
- Primary data backups
- Configuration and infrastructure backups
- Off-site or offline copies
- Restore testing
The 3-2-1 rule is a strong baseline:
- 3 copies of data
- 2 different media or storage systems
- 1 copy off-site
For higher assurance, extend this with one immutable or offline copy and verified, error-free restores, a pattern often summarized as 3-2-1-1-0.
## Step-by-Step Guide
### 1. Inventory what matters
Back up:
- Databases
- Application data directories
- Compose files and infrastructure code
- DNS, reverse proxy, and secrets configuration
- Hypervisor or VM backup metadata
### 2. Choose backup tools by workload
- File-level backups: restic, Borg, rsync-based workflows
- VM backups: hypervisor-integrated backup jobs
- Database-aware backups: logical dumps or physical backup tools where needed
### 3. Schedule and retain intelligently
Use a retention policy that matches recovery needs. Short retention for frequent snapshots and longer retention for off-site backups is common.
### 4. Test restores
Backups are incomplete until you can restore and start the service successfully.
## Configuration Example
Restic backup example:
```bash
export RESTIC_REPOSITORY=/backup/restic
export RESTIC_PASSWORD_FILE=/run/secrets/restic_password
restic backup /srv/app-data /srv/compose
restic snapshots
```
Example restore check:
```bash
restic restore latest --target /tmp/restore-check
```
## Troubleshooting Tips
### Backups exist but restores are incomplete
- Confirm databases were backed up consistently, not mid-write without support
- Verify application config and secret material were included
- Check permissions and ownership in the restored data
### Repository size grows too quickly
- Review retention rules and pruning behavior
- Exclude caches, transient files, and rebuildable artifacts
- Split hot data from archival data if retention needs differ
### Backups run but nobody notices failures
- Alert on backup freshness and last successful run
- Record the restore procedure for each critical service
- Test restores on a schedule, not only after incidents
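A minimal freshness check along those lines; the marker file and 24-hour threshold are illustrative, with a real backup job expected to touch the marker after each successful run:

```shell
#!/usr/bin/env bash
# Sketch: alert when the newest backup success marker is older than a threshold.
# mktemp stands in for something like /backup/restic/last-success.
set -euo pipefail
marker=$(mktemp)
max_age=$((24 * 60 * 60))

marker_age() {
  local now mtime
  now=$(date +%s)
  # GNU stat first, BSD stat as a fallback.
  mtime=$(stat -c '%Y' "$1" 2>/dev/null || stat -f '%m' "$1")
  echo $((now - mtime))
}

age=$(marker_age "$marker")
if [ "$age" -gt "$max_age" ]; then
  echo "backup STALE: last success ${age}s ago"
  exit 1
else
  echo "backup fresh: last success ${age}s ago"
fi
```

Wired into cron or a systemd timer, the nonzero exit status becomes the alert signal.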
## Best Practices
- Back up both data and the configuration needed to use it
- Keep at least one copy outside the main failure domain
- Prefer encrypted backup repositories for off-site storage
- Automate backup jobs and monitor their success
- Practice restores for your most important services first
## References
- [restic documentation](https://restic.readthedocs.io/en/latest/)
- [BorgBackup documentation](https://borgbackup.readthedocs.io/en/stable/)
- [Proxmox VE Backup and Restore](https://pve.proxmox.com/pve-docs/chapter-vzdump.html)


@@ -0,0 +1,125 @@
---
title: Service Monitoring
description: Guide to building a basic monitoring stack for self-hosted services and infrastructure
tags:
- monitoring
- self-hosting
- observability
category: self-hosting
created: 2026-03-14
updated: 2026-03-14
---
# Service Monitoring
## Introduction
Monitoring turns a self-hosted environment from a collection of services into an operable system. At minimum, that means collecting metrics, checking service availability, and alerting on failures that need human action.
## Purpose
This guide focuses on:
- Host and service metrics
- Uptime checks
- Dashboards and alerting
- Monitoring coverage for common homelab services
## Architecture Overview
A small monitoring stack often includes:
- Prometheus for scraping metrics
- Exporters such as `node_exporter` for host metrics
- Blackbox probing for endpoint availability
- Grafana for dashboards
- Alertmanager for notifications
Typical flow:
```text
Exporter or target -> Prometheus -> Grafana dashboards
Prometheus alerts -> Alertmanager -> notification channel
```
## Step-by-Step Guide
### 1. Start with host metrics
Install `node_exporter` on important Linux hosts or run it in a controlled containerized setup.
### 2. Scrape targets from Prometheus
Example scrape config:
```yaml
scrape_configs:
- job_name: node
static_configs:
- targets:
- "server-01.internal.example:9100"
- "server-02.internal.example:9100"
```
### 3. Add endpoint checks
Use a blackbox probe or equivalent to test HTTPS and TCP reachability for user-facing services.
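A sketch of a blackbox-exporter scrape job for HTTPS checks; the target URL and exporter address are placeholders, and the relabeling forwards each target to the exporter's `/probe` endpoint:

```yaml
scrape_configs:
  - job_name: blackbox-https
    metrics_path: /probe
    params:
      module: [http_2xx]
    static_configs:
      - targets:
          - "https://app.internal.example"
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: "blackbox-exporter:9115"
```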
### 4. Add dashboards and alerts
Alert only on conditions that require action, such as:
- Host down
- Disk nearly full
- Backup job missing
- TLS certificate near expiry
## Configuration Example
Example alert concept:
```yaml
groups:
- name: infrastructure
rules:
- alert: HostDown
expr: up == 0
for: 5m
labels:
severity: critical
```
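Certificate expiry can be covered in the same style once blackbox probing is in place; `probe_ssl_earliest_cert_expiry` is exposed by the blackbox exporter, and the 14-day window is a judgment call:

```yaml
groups:
  - name: certificates
    rules:
      - alert: CertificateExpiringSoon
        expr: probe_ssl_earliest_cert_expiry - time() < 14 * 24 * 3600
        for: 1h
        labels:
          severity: warning
```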
## Troubleshooting Tips
### Metrics are missing for one host
- Check exporter health on that host
- Confirm firewall rules allow scraping
- Verify the target name and port in the Prometheus config
### Alerts are noisy
- Add `for` durations to avoid alerting on short blips
- Remove alerts that never trigger action
- Tune thresholds per service class rather than globally
### Dashboards look healthy while the service is down
- Add blackbox checks in addition to internal metrics
- Monitor the reverse proxy or external entry point, not only the app process
- Track backups and certificate expiry separately from CPU and RAM
## Best Practices
- Monitor the services users depend on, not only the hosts they run on
- Keep alert volume low enough that alerts remain meaningful
- Document the owner and response path for each critical alert
- Treat backup freshness and certificate expiry as first-class signals
- Start simple, then add coverage where operational pain justifies it
## References
- [Prometheus overview](https://prometheus.io/docs/introduction/overview/)
- [Prometheus Alertmanager overview](https://prometheus.io/docs/alerting/latest/overview/)
- [Prometheus `node_exporter`](https://github.com/prometheus/node_exporter)
- [Grafana documentation](https://grafana.com/docs/grafana/latest/)


@@ -0,0 +1,124 @@
---
title: Update Management
description: Practical update management for Linux hosts, containers, and self-hosted services
tags:
- updates
- patching
- self-hosting
category: self-hosting
created: 2026-03-14
updated: 2026-03-14
---
# Update Management
## Introduction
Update management keeps systems secure and supportable without turning every patch cycle into an outage. In self-hosted environments, the challenge is balancing security, uptime, and limited operator time.
## Purpose
This guide focuses on:
- Operating system updates
- Container and dependency updates
- Scheduling, staging, and rollback planning
## Architecture Overview
A practical update process has four layers:
- Inventory: know what you run
- Detection: know when updates are available
- Deployment: apply updates in a controlled order
- Validation: confirm services still work
## Step-by-Step Guide
### 1. Separate systems by risk
Create update rings such as:
- Ring 1: non-critical test systems
- Ring 2: internal services
- Ring 3: critical stateful services and edge entry points
### 2. Automate security updates where safe
For Linux hosts, automated security updates can reduce patch delay for low-risk packages. Review distribution guidance and keep reboots controlled.
### 3. Automate update discovery
Use tools that open reviewable pull requests or dashboards for:
- Container image updates
- Dependency updates
- Operating system patch reporting
### 4. Validate after rollout
Confirm:
- Service health
- Reverse proxy reachability
- Backup jobs
- Monitoring and alerting
## Configuration Example
Ubuntu unattended upgrades example (`/etc/apt/apt.conf.d/20auto-upgrades`):
```text
APT::Periodic::Update-Package-Lists "1";
APT::Periodic::Unattended-Upgrade "1";
```
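The behavior itself is tuned in `/etc/apt/apt.conf.d/50unattended-upgrades`; a sketch that keeps reboots under manual control and reports results by mail (the address is a placeholder):

```text
Unattended-Upgrade::Automatic-Reboot "false";
Unattended-Upgrade::Mail "admin@example.com";
```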
Dependency update automation example (Renovate `renovate.json`):
```json
{
"extends": ["config:recommended"],
"schedule": ["before 5am on monday"],
"packageRules": [
{
"matchUpdateTypes": ["major"],
"automerge": false
}
]
}
```
## Troubleshooting Tips
### Updates are applied but regressions go unnoticed
- Add post-update health checks
- Review dashboards and key alerts after patch windows
- Keep rollback or restore steps documented for stateful services
### Too many update notifications create fatigue
- Group low-risk updates into maintenance windows
- Separate critical security issues from routine version bumps
- Use labels or dashboards to prioritize by service importance
### Containers stay outdated even though automation exists
- Verify image digests and registry visibility
- Confirm the deployment process actually recreates containers after image updates
- Prefer reviewed rebuild and redeploy workflows over blind runtime mutation for important services
## Best Practices
- Patch internet-exposed and admin-facing services first
- Stage risky or major updates through lower-risk environments
- Prefer reviewable dependency automation over silent uncontrolled updates
- Keep maintenance windows small and predictable
- Document rollback expectations before making large version jumps
## References
- [Ubuntu Community Help Wiki: Automatic Security Updates](https://help.ubuntu.com/community/AutomaticSecurityUpdates)
- [Debian Wiki: UnattendedUpgrades](https://wiki.debian.org/UnattendedUpgrades)
- [Renovate documentation](https://docs.renovatebot.com/)
- [GitHub Docs: Configuring Dependabot version updates](https://docs.github.com/code-security/dependabot/dependabot-version-updates/configuring-dependabot-version-updates)