first version of the knowledge base :)

2026-03-14 11:41:54 +01:00
commit 27965301ad
47 changed files with 4356 additions and 0 deletions


@@ -0,0 +1,125 @@
---
title: Docker Basics
description: Practical introduction to Docker images, containers, and everyday command-line workflows
tags:
- containers
- docker
- linux
category: containers
created: 2026-03-14
updated: 2026-03-14
---
# Docker Basics
## Introduction
Docker packages applications and their dependencies into images that run as isolated containers. For homelab and developer workflows, it is commonly used to deploy repeatable services without building a full virtual machine for each workload.
## Purpose
Docker is useful when you need:
- Repeatable application packaging
- Simple local development environments
- Fast service deployment on Linux hosts
- Clear separation between host OS and application runtime
## Architecture Overview
Core Docker concepts:
- Image: immutable application package template
- Container: running instance of an image
- Registry: source for pulling and pushing images
- Volume: persistent storage outside the writable container layer
- Network: connectivity boundary for one or more containers
Typical flow:
```text
Dockerfile -> Image -> Registry or local cache -> Container runtime
```
## Step-by-Step Guide
### 1. Verify Docker is installed
```bash
docker version
docker info
```
### 2. Pull and run a container
```bash
docker pull nginx:stable
docker run -d --name web -p 8080:80 nginx:stable
```
### 3. Inspect the running container
```bash
docker ps
docker logs web
docker exec -it web sh
```
### 4. Stop and remove it
```bash
docker stop web
docker rm web
```
## Configuration Example
Run a service with a persistent named volume:
```bash
docker volume create app-data
docker run -d \
--name app \
-p 3000:3000 \
-v app-data:/var/lib/app \
ghcr.io/example/app:latest
```
Inspect resource usage:
```bash
docker stats
```
## Troubleshooting Tips
### Container starts and exits immediately
- Check `docker logs <container>`
- Verify the image's default command is valid
- Confirm required environment variables or mounted files exist
### Port publishing does not work
- Verify the service is listening inside the container
- Confirm the host port is not already in use
- Check host firewall rules
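The second check can be done before a container is even started. A minimal sketch using bash's built-in `/dev/tcp` redirection, so no extra tools are required; port 8080 matches the earlier `docker run` example:

```shell
#!/usr/bin/env bash
# Sketch: check whether a host TCP port is already bound before publishing it.
set -euo pipefail

port_in_use() {
  local port="$1"
  # A successful connect means something is already listening on the port;
  # the subshell closes the descriptor again when it exits.
  (exec 3<>"/dev/tcp/127.0.0.1/${port}") 2>/dev/null
}

if port_in_use 8080; then
  echo "port 8080: in use - pick another host port for -p"
else
  echo "port 8080: free"
fi
```

This only tests loopback reachability; a service bound to a specific non-loopback address would need the matching address in the check.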
### Data disappears after recreation
- Use a named volume or bind mount instead of the writable container layer
- Confirm the application writes data to the mounted path
## Best Practices
- Pin images to a known tag and update intentionally
- Use named volumes for application state
- Prefer non-root containers when supported by the image
- Keep containers single-purpose and externalize configuration
- Use Compose for multi-service stacks instead of long `docker run` commands
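As an example of the last point, the `docker run` command from the Configuration Example above translates directly into a small Compose file (service and volume names carried over unchanged):

```yaml
services:
  app:
    image: ghcr.io/example/app:latest
    ports:
      - "3000:3000"
    volumes:
      - app-data:/var/lib/app

volumes:
  app-data:
```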
## References
- [Docker: Docker overview](https://docs.docker.com/get-started/docker-overview/)
- [Docker: Get started](https://docs.docker.com/get-started/)
- [Docker: Volumes](https://docs.docker.com/engine/storage/volumes/)


@@ -0,0 +1,156 @@
---
title: Docker Compose Patterns
description: Reusable patterns for structuring Docker Compose applications in homelab and development environments
tags:
- containers
- docker
- compose
category: containers
created: 2026-03-14
updated: 2026-03-14
---
# Docker Compose Patterns
## Introduction
Docker Compose defines multi-container applications in a single declarative file. It is a good fit for homelab stacks, local development, and small self-hosted services that do not require a full orchestrator.
## Purpose
Compose helps when you need:
- Repeatable service definitions
- Shared networks and volumes for a stack
- Environment-specific overrides
- A clear deployment artifact that can live in Git
## Architecture Overview
A Compose application usually includes:
- One or more services
- One or more shared networks
- Persistent volumes
- Environment variables and mounted configuration
- Optional health checks and startup dependencies
## Step-by-Step Guide
### 1. Start with a minimal Compose file
```yaml
services:
app:
image: ghcr.io/example/app:1.2.3
ports:
- "8080:8080"
```
Start it:
```bash
docker compose up -d
docker compose ps
```
### 2. Add persistent storage and configuration
```yaml
services:
app:
image: ghcr.io/example/app:1.2.3
ports:
- "8080:8080"
environment:
APP_BASE_URL: "https://app.example.com"
volumes:
- app-data:/var/lib/app
volumes:
app-data:
```
### 3. Add dependencies with health checks
```yaml
services:
db:
image: postgres:16
environment:
POSTGRES_DB: app
POSTGRES_USER: app
POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
healthcheck:
test: ["CMD-SHELL", "pg_isready -U app"]
interval: 10s
timeout: 5s
retries: 5
volumes:
- db-data:/var/lib/postgresql/data
app:
image: ghcr.io/example/app:1.2.3
depends_on:
db:
condition: service_healthy
environment:
DATABASE_URL: postgres://app:${POSTGRES_PASSWORD}@db:5432/app
ports:
- "8080:8080"
volumes:
db-data:
```
## Common Patterns
### Use one project directory per stack
Keep the Compose file, `.env` example, and mounted config together in one directory.
### Use user-defined networks
Private internal services should communicate over Compose networks rather than the host network.
### Prefer explicit volumes
Named volumes are easier to back up and document than anonymous ones.
### Use profiles for optional services
Profiles are useful for dev-only services, one-shot migration jobs, or optional observability components.
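A sketch of the profile pattern, using a hypothetical one-shot migration service (the image and command are placeholders):

```yaml
services:
  migrate:
    image: ghcr.io/example/app:1.2.3
    command: ["app", "migrate"]
    profiles:
      - migration
```

The service is skipped by default and only starts when the profile is activated, for example with `docker compose --profile migration up`.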
## Troubleshooting Tips
### Services start in the wrong order
- Use health checks instead of only container start order
- Ensure the application retries database or dependency connections
### Configuration drift between hosts
- Commit the Compose file to Git
- Keep secrets out of the file and inject them separately
- Avoid host-specific bind mount paths when portability matters
### Containers cannot resolve each other
- Check that the services share the same Compose network
- Use the service name as the hostname
- Verify the application is not hard-coded to `localhost`
## Best Practices
- Omit the deprecated top-level `version` field in new Compose files
- Keep secrets outside the Compose YAML when possible
- Pin images to intentional versions
- Use health checks for stateful dependencies
- Treat Compose as deployment code and review changes like application code
## References
- [Docker: Compose file reference](https://docs.docker.com/reference/compose-file/)
- [Docker: Compose application model](https://docs.docker.com/compose/intro/compose-application-model/)
- [Docker: Control startup and shutdown order in Compose](https://docs.docker.com/compose/how-tos/startup-order/)
- [Compose Specification](https://compose-spec.io/)


@@ -0,0 +1,124 @@
---
title: Tailscale Exit Nodes
description: Guide to publishing and using Tailscale exit nodes for internet-bound traffic
tags:
- networking
- tailscale
- vpn
category: networking
created: 2026-03-14
updated: 2026-03-14
---
# Tailscale Exit Nodes
## Introduction
An exit node is a Tailscale device that forwards a client's default route. When enabled, internet-bound traffic leaves through that node instead of the client's local network.
## Purpose
Exit nodes are commonly used for:
- Secure browsing on untrusted networks
- Reaching the internet through a trusted home or lab connection
- Testing geo-dependent behavior from another site
- Concentrating egress through a monitored network path
## Architecture Overview
With an exit node, the selected client sends default-route traffic through Tailscale to the exit node, which then forwards it to the public internet.
```text
Client -> Tailscale tunnel -> Exit node -> Internet
```
Important implications:
- The exit node becomes part of the trust boundary
- Bandwidth, DNS behavior, and logging depend on the exit node's network
- Local LAN access on the client may need explicit allowance
## Step-by-Step Guide
### 1. Prepare the exit node host
Choose a stable host with sufficient upstream bandwidth and a network path you trust. Typical choices are a home server, small VPS, or a utility VM.
### 2. Advertise the node as an exit node
On the node:
```bash
sudo tailscale up --advertise-exit-node
```
You can combine this with tags:
```bash
sudo tailscale up --advertise-exit-node --advertise-tags=tag:exit-node
```
### 3. Approve or review the role
Approve the exit node in the admin console if required by policy. Restrict who can use it with ACLs or grants.
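A sketch of such a restriction in the tailnet policy file, assuming a hypothetical `admins` group; `autogroup:internet` scopes the rule to internet traffic through exit nodes:

```json
{
  "acls": [
    {
      "action": "accept",
      "src": ["group:admins"],
      "dst": ["autogroup:internet:*"]
    }
  ]
}
```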
### 4. Select the exit node on a client
From a client, choose the exit node in the Tailscale UI or configure it from the CLI:
```bash
sudo tailscale up --exit-node=<exit-node-name-or-ip>
```
If the client still needs to reach its local LAN directly, allow that explicitly, for example with `tailscale up --exit-node-allow-lan-access` or the equivalent toggle in the client UI.
## Configuration Example
Example for a dedicated Linux exit node:
```bash
sudo tailscale up \
--advertise-exit-node \
--advertise-tags=tag:exit-node
```
Client-side example:
```bash
sudo tailscale up --exit-node=home-gateway
curl https://ifconfig.me
```
## Troubleshooting Tips
### Internet access stops after selecting the exit node
- Confirm the exit node is online in `tailscale status`
- Verify the exit node host itself has working internet access
- Check the exit node's local firewall and forwarding configuration
### Local printers or NAS become unreachable
- Enable local LAN access on the client if that behavior is required
- Split administrative traffic from internet egress if the use case is mixed
### Performance is poor
- Verify the client is using a nearby and healthy exit node
- Check the exit node's CPU, uplink bandwidth, and packet loss
- Avoid placing an exit node behind overloaded or unstable consumer hardware
## Best Practices
- Use exit nodes for specific trust and egress requirements, not as a default for every device
- Restrict usage to approved groups or devices
- Keep exit nodes patched because they handle broad traffic scopes
- Log and monitor egress hosts like any other shared network gateway
- Separate personal browsing, admin traffic, and production service egress when the risk model requires it
## References
- [Tailscale: Exit nodes](https://tailscale.com/kb/1103/exit-nodes)
- [Tailscale: What is Tailscale?](https://tailscale.com/kb/1151/what-is-tailscale)
- [Tailscale: Access controls](https://tailscale.com/kb/1018/acls)


@@ -0,0 +1,143 @@
---
title: Tailscale Subnet Routing
description: Guide to publishing LAN subnets into a Tailscale tailnet with subnet routers
tags:
- networking
- tailscale
- routing
category: networking
created: 2026-03-14
updated: 2026-03-14
---
# Tailscale Subnet Routing
## Introduction
Subnet routing allows Tailscale clients to reach devices that are not running the Tailscale agent directly. This is useful for printers, storage appliances, hypervisors, IoT controllers, and legacy systems on a homelab LAN.
## Purpose
Use subnet routing when:
- A device cannot run the Tailscale client
- A full site-to-site VPN is unnecessary
- Remote users need access to one or more internal networks
- You want to publish access to a specific VLAN without exposing the entire environment
## Architecture Overview
A subnet router is a Tailscale node with IP forwarding enabled. It advertises one or more LAN prefixes to the tailnet.
```text
Remote client -> Tailscale tunnel -> Subnet router -> LAN target
```
Recommended placement:
- One router per routed network or security zone
- Prefer stable hosts such as small Linux VMs, routers, or dedicated utility nodes
- Apply restrictive ACLs so only approved identities can use the route
## Step-by-Step Guide
### 1. Prepare the router host
Install Tailscale on a Linux host that already has reachability to the target subnet.
Enable IPv4 forwarding:
```bash
echo 'net.ipv4.ip_forward = 1' | sudo tee /etc/sysctl.d/99-tailscale.conf
sudo sysctl --system
```
If the subnet is IPv6-enabled, also enable IPv6 forwarding:
```bash
echo 'net.ipv6.conf.all.forwarding = 1' | sudo tee -a /etc/sysctl.d/99-tailscale.conf
sudo sysctl --system
```
### 2. Advertise the subnet
Start Tailscale and advertise the route:
```bash
sudo tailscale up --advertise-routes=192.168.10.0/24
```
Multiple routes can be advertised as a comma-separated list:
```bash
sudo tailscale up --advertise-routes=192.168.10.0/24,192.168.20.0/24
```
### 3. Approve the route
Approve the advertised route in the Tailscale admin console, or pre-authorize it with `autoApprovers` if that matches your policy model.
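If pre-authorization fits the policy model, an `autoApprovers` sketch for the route and tag used elsewhere in this guide would look like this:

```json
{
  "autoApprovers": {
    "routes": {
      "192.168.10.0/24": ["tag:subnet-router"]
    }
  }
}
```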
### 4. Restrict access
Use ACLs or grants so only the necessary users or tagged devices can reach the routed subnet.
Example policy intent:
- `group:admins` can reach `192.168.10.0/24`
- `group:developers` can only reach specific hosts or ports
- IoT and management subnets require separate approval
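That intent could be sketched in the policy file like this; the group names and the single developer-reachable host and port are hypothetical:

```json
{
  "acls": [
    {
      "action": "accept",
      "src": ["group:admins"],
      "dst": ["192.168.10.0/24:*"]
    },
    {
      "action": "accept",
      "src": ["group:developers"],
      "dst": ["192.168.10.20:443"]
    }
  ]
}
```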
## Configuration Example
Example server-side command for a dedicated subnet router:
```bash
sudo tailscale up \
--advertise-routes=192.168.10.0/24 \
--advertise-tags=tag:subnet-router
```
Example policy idea:
```json
{
"tagOwners": {
"tag:subnet-router": ["group:admins"]
}
}
```
## Troubleshooting Tips
### Clients can see the route but cannot reach hosts
- Verify IP forwarding is enabled on the router
- Confirm local firewall rules permit forwarding traffic
- Make sure the router has normal LAN connectivity to the destination hosts
- Check whether the destination host has a host firewall blocking the source
### Route does not appear in the tailnet
- Confirm the router is online in `tailscale status`
- Check that the route was approved in the admin console
- Review whether policy requires a specific tag owner or auto-approval
### Asymmetric routing or reply failures
- Make sure the subnet router is in the normal return path for the destination subnet
- Avoid overlapping subnets across multiple sites unless routing precedence is intentional
- Do not advertise broad prefixes when a narrower one is sufficient
## Best Practices
- Advertise the smallest subnet that solves the use case
- Run subnet routers on stable infrastructure, not laptops
- Use separate routers for management and user-facing networks where possible
- Combine routing with ACLs; route advertisement alone is not authorization
- Monitor route health and document ownership of every advertised prefix
## References
- [Tailscale: Subnet routers](https://tailscale.com/kb/1019/subnets)
- [Tailscale: Access controls](https://tailscale.com/kb/1018/acls)
- [Tailscale: Policy file syntax](https://tailscale.com/kb/1337/policy-syntax)


@@ -0,0 +1,135 @@
---
title: SSH Hardening
description: Practical SSH server hardening guidance for Linux systems in homelab and self-hosted environments
tags:
- security
- ssh
- linux
category: security
created: 2026-03-14
updated: 2026-03-14
---
# SSH Hardening
## Introduction
SSH is the primary administrative entry point for many Linux systems. Hardening it reduces the likelihood of credential attacks, accidental privilege exposure, and overly broad remote access.
## Purpose
This guide focuses on making SSH safer by:
- Disabling weak authentication paths
- Reducing exposure to brute-force attacks
- Limiting which users can log in
- Preserving maintainability by relying on modern OpenSSH defaults where possible
## Architecture Overview
SSH hardening has three layers:
- Transport and daemon configuration
- Network exposure and firewall policy
- Operational practices such as key handling and logging
For most self-hosted systems, the best model is:
```text
Admin workstation -> VPN or trusted network -> SSH server
```
## Step-by-Step Guide
### 1. Use key-based authentication
Generate a key on the client and copy the public key to the server:
```bash
ssh-keygen -t ed25519 -C "admin@example.com"
ssh-copy-id admin@server.example
```
### 2. Harden `sshd_config`
Baseline example:
```text
PermitRootLogin no
PasswordAuthentication no
KbdInteractiveAuthentication no
PubkeyAuthentication yes
MaxAuthTries 3
LoginGraceTime 30
X11Forwarding no
AllowTcpForwarding no
AllowAgentForwarding no
AllowUsers admin
```
If you need port forwarding for a specific workflow, enable it deliberately instead of leaving it broadly available.
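One way to do that is a `Match` block in `sshd_config` that re-enables forwarding only for a specific account; the user and destination here are hypothetical:

```text
Match User deploy
    AllowTcpForwarding local
    PermitOpen localhost:5432
```

This keeps the global `AllowTcpForwarding no` intact while granting one account a narrowly scoped local forward.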
### 3. Validate the configuration
```bash
sudo sshd -t
```
### 4. Reload safely
Keep an existing SSH session open while reloading:
```bash
sudo systemctl reload sshd
```
Distribution-specific service names may be `ssh` or `sshd`.
### 5. Restrict network exposure
- Prefer VPN-only or management-VLAN-only access
- Allow SSH from trusted subnets only
- Do not expose SSH publicly unless it is necessary and monitored
## Configuration Example
Example host firewall intent:
```text
Allow TCP 22 from 192.168.10.0/24
Allow TCP 22 from Tailscale tailnet range
Deny TCP 22 from all other sources
```
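A sketch of that intent as an nftables ruleset; the LAN subnet is an example, and `100.64.0.0/10` is the CGNAT range Tailscale assigns tailnet addresses from:

```text
table inet filter {
  chain input {
    type filter hook input priority 0; policy accept;
    tcp dport 22 ip saddr 192.168.10.0/24 accept
    tcp dport 22 ip saddr 100.64.0.0/10 accept
    tcp dport 22 drop
  }
}
```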
## Troubleshooting Tips
### Locked out after config change
- Keep the original session open until a new login succeeds
- Validate the daemon config with `sshd -t`
- Check the service name and logs with `journalctl -u sshd` or `journalctl -u ssh`
### Key authentication fails
- Check file permissions on `~/.ssh` and `authorized_keys`
- Confirm the server allows public key authentication
- Verify the client is offering the intended key with `ssh -v`
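The permission check can be scripted. A self-contained sketch that sets up a scratch directory with the modes sshd's `StrictModes` expects (`700` on `~/.ssh`, `600` on `authorized_keys`) and prints them back; point it at a real home directory in practice:

```shell
#!/usr/bin/env bash
# Sketch: show the ~/.ssh permissions sshd's StrictModes checking expects.
# A temporary directory stands in for a real home directory here.
set -euo pipefail
home=$(mktemp -d)
mkdir -p "$home/.ssh"
touch "$home/.ssh/authorized_keys"
chmod 700 "$home/.ssh"
chmod 600 "$home/.ssh/authorized_keys"

file_mode() {
  # GNU stat first, BSD stat as a fallback.
  stat -c '%a' "$1" 2>/dev/null || stat -f '%Lp' "$1"
}

for path in "$home/.ssh" "$home/.ssh/authorized_keys"; do
  echo "$path -> $(file_mode "$path")"
done
```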
### Automation jobs break
- Review whether the workload depended on password auth, port forwarding, or agent forwarding
- Create narrowly scoped exceptions rather than reverting the whole hardening change
## Best Practices
- Rely on current OpenSSH defaults for ciphers and algorithms unless you have a specific compliance need
- Disable password-based interactive logins on internet-reachable systems
- Use individual user accounts and `sudo` instead of direct root SSH
- Combine SSH hardening with network-level restrictions
- Review SSH logs regularly on administrative systems
## References
- [OpenBSD `sshd_config` manual](https://man.openbsd.org/sshd_config)
- [OpenSSH](https://www.openssh.com/)
- [Mozilla OpenSSH guidelines](https://infosec.mozilla.org/guidelines/openssh)


@@ -0,0 +1,113 @@
---
title: YubiKey Usage
description: Guide to using a YubiKey for SSH, authentication, and key protection in self-hosted environments
tags:
- security
- yubikey
- ssh
category: security
created: 2026-03-14
updated: 2026-03-14
---
# YubiKey Usage
## Introduction
A YubiKey is a hardware token that can protect authentication and cryptographic operations. In homelab and engineering workflows, it is commonly used for MFA, SSH keys, and protection of GPG subkeys.
## Purpose
Use a YubiKey when you want:
- Stronger authentication than password-only login
- Private keys that require physical presence
- Portable hardware-backed credentials for administrative access
## Architecture Overview
YubiKeys can be used through different interfaces:
- FIDO2 or WebAuthn: MFA and modern hardware-backed authentication
- OpenSSH security keys: SSH keys such as `ed25519-sk`
- OpenPGP applet: card-resident GPG subkeys
- PIV: smart-card style certificate workflows
Choose the interface based on the workflow instead of trying to use one mode for everything.
## Step-by-Step Guide
### 1. Use the key for MFA first
Register the YubiKey with identity providers and critical services before moving on to SSH or GPG workflows.
### 2. Create a hardware-backed SSH key
On a system with OpenSSH support for security keys:
```bash
ssh-keygen -t ed25519-sk -C "admin@example.com"
```
This creates a key pair whose private-key operations require the physical token; the file written to `~/.ssh` is only a handle that is useless without the hardware.
### 3. Install the public key on servers
```bash
ssh-copy-id -i ~/.ssh/id_ed25519_sk.pub admin@server.example
```
### 4. Test login
```bash
ssh admin@server.example
```
Expect a touch prompt when required by the device policy.
## Configuration Example
Example client SSH config for a dedicated administrative target:
```text
Host lab-admin
HostName server.example
User admin
IdentityFile ~/.ssh/id_ed25519_sk
```
For GPG workflows, move only subkeys onto the YubiKey and keep the primary key offline when possible.
## Troubleshooting Tips
### The key is not detected
- Confirm USB or NFC access is available
- Check whether another smart-card daemon has locked the device
- Verify the client OS has support for the intended mode
### SSH prompts repeatedly or fails
- Make sure the correct public key is installed on the server
- Confirm the client is offering the security-key identity
- Check that the OpenSSH version supports the selected key type
### GPG or smart-card workflows are inconsistent
- Verify which YubiKey applet is in use
- Avoid mixing PIV and OpenPGP instructions unless the workflow requires both
- Keep backup tokens or recovery paths for administrative access
## Best Practices
- Use the YubiKey as part of a broader account recovery plan, not as the only path back in
- Keep at least one spare token for high-value admin accounts
- Prefer hardware-backed SSH keys for administrator accounts
- Document which services rely on the token and how recovery works
- Separate MFA usage from certificate and signing workflows unless there is a clear operational reason to combine them
## References
- [Yubico: SSH](https://developers.yubico.com/SSH/)
- [Yubico: YubiKey and OpenPGP](https://developers.yubico.com/PGP/)
- [Yubico developer documentation](https://developers.yubico.com/)


@@ -0,0 +1,121 @@
---
title: Backup Strategies
description: Practical backup strategy guidance for self-hosted services, containers, and virtualized homelabs
tags:
- backup
- self-hosting
- operations
category: self-hosting
created: 2026-03-14
updated: 2026-03-14
---
# Backup Strategies
## Introduction
Backups protect against deletion, corruption, hardware failure, ransomware, and operational mistakes. In self-hosted environments, a backup strategy should cover both data and the information needed to restore services correctly.
## Purpose
This guide covers:
- What to back up
- How often to back it up
- Where to store copies
- How to validate restore readiness
## Architecture Overview
A good strategy includes:
- Primary data backups
- Configuration and infrastructure backups
- Off-site or offline copies
- Restore testing
The 3-2-1 rule is a strong baseline:
- 3 copies of data
- 2 different media or storage systems
- 1 copy off-site
For higher assurance, extend this with one immutable or offline copy and verified, error-free restores, a pattern often summarized as 3-2-1-1-0.
## Step-by-Step Guide
### 1. Inventory what matters
Back up:
- Databases
- Application data directories
- Compose files and infrastructure code
- DNS, reverse proxy, and secrets configuration
- Hypervisor or VM backup metadata
### 2. Choose backup tools by workload
- File-level backups: restic, Borg, rsync-based workflows
- VM backups: hypervisor-integrated backup jobs
- Database-aware backups: logical dumps or physical backup tools where needed
### 3. Schedule and retain intelligently
Use a retention policy that matches recovery needs. Short retention for frequent snapshots and longer retention for off-site backups is common.
### 4. Test restores
Backups are incomplete until you can restore and start the service successfully.
## Configuration Example
Restic backup example:
```bash
export RESTIC_REPOSITORY=/backup/restic
export RESTIC_PASSWORD_FILE=/run/secrets/restic_password
restic backup /srv/app-data /srv/compose
restic snapshots
```
Example restore check:
```bash
restic restore latest --target /tmp/restore-check
```
## Troubleshooting Tips
### Backups exist but restores are incomplete
- Confirm databases were backed up consistently, not mid-write without support
- Verify application config and secret material were included
- Check permissions and ownership in the restored data
### Repository size grows too quickly
- Review retention rules and pruning behavior
- Exclude caches, transient files, and rebuildable artifacts
- Split hot data from archival data if retention needs differ
### Backups run but nobody notices failures
- Alert on backup freshness and last successful run
- Record the restore procedure for each critical service
- Test restores on a schedule, not only after incidents
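A minimal freshness check along those lines; the marker file and 24-hour threshold are illustrative, with a real backup job expected to touch the marker after each successful run:

```shell
#!/usr/bin/env bash
# Sketch: alert when the newest backup success marker is older than a threshold.
# mktemp stands in for something like /backup/restic/last-success.
set -euo pipefail
marker=$(mktemp)
max_age=$((24 * 60 * 60))

marker_age() {
  local now mtime
  now=$(date +%s)
  # GNU stat first, BSD stat as a fallback.
  mtime=$(stat -c '%Y' "$1" 2>/dev/null || stat -f '%m' "$1")
  echo $((now - mtime))
}

age=$(marker_age "$marker")
if [ "$age" -gt "$max_age" ]; then
  echo "backup STALE: last success ${age}s ago"
  exit 1
else
  echo "backup fresh: last success ${age}s ago"
fi
```

Wired into cron or a systemd timer, the nonzero exit status becomes the alert signal.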
## Best Practices
- Back up both data and the configuration needed to use it
- Keep at least one copy outside the main failure domain
- Prefer encrypted backup repositories for off-site storage
- Automate backup jobs and monitor their success
- Practice restores for your most important services first
## References
- [restic documentation](https://restic.readthedocs.io/en/latest/)
- [BorgBackup documentation](https://borgbackup.readthedocs.io/en/stable/)
- [Proxmox VE Backup and Restore](https://pve.proxmox.com/pve-docs/chapter-vzdump.html)


@@ -0,0 +1,125 @@
---
title: Service Monitoring
description: Guide to building a basic monitoring stack for self-hosted services and infrastructure
tags:
- monitoring
- self-hosting
- observability
category: self-hosting
created: 2026-03-14
updated: 2026-03-14
---
# Service Monitoring
## Introduction
Monitoring turns a self-hosted environment from a collection of services into an operable system. At minimum, that means collecting metrics, checking service availability, and alerting on failures that need human action.
## Purpose
This guide focuses on:
- Host and service metrics
- Uptime checks
- Dashboards and alerting
- Monitoring coverage for common homelab services
## Architecture Overview
A small monitoring stack often includes:
- Prometheus for scraping metrics
- Exporters such as `node_exporter` for host metrics
- Blackbox probing for endpoint availability
- Grafana for dashboards
- Alertmanager for notifications
Typical flow:
```text
Exporter or target -> Prometheus -> Grafana dashboards
Prometheus alerts -> Alertmanager -> notification channel
```
## Step-by-Step Guide
### 1. Start with host metrics
Install `node_exporter` on important Linux hosts or run it in a controlled containerized setup.
### 2. Scrape targets from Prometheus
Example scrape config:
```yaml
scrape_configs:
- job_name: node
static_configs:
- targets:
- "server-01.internal.example:9100"
- "server-02.internal.example:9100"
```
### 3. Add endpoint checks
Use a blackbox probe or equivalent to test HTTPS and TCP reachability for user-facing services.
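A sketch of a blackbox-exporter scrape job for HTTPS checks; the target URL and exporter address are placeholders, and the relabeling forwards each target to the exporter's `/probe` endpoint:

```yaml
scrape_configs:
  - job_name: blackbox-https
    metrics_path: /probe
    params:
      module: [http_2xx]
    static_configs:
      - targets:
          - "https://app.internal.example"
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: "blackbox-exporter:9115"
```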
### 4. Add dashboards and alerts
Alert only on conditions that require action, such as:
- Host down
- Disk nearly full
- Backup job missing
- TLS certificate near expiry
## Configuration Example
Example alert concept:
```yaml
groups:
- name: infrastructure
rules:
- alert: HostDown
expr: up == 0
for: 5m
labels:
severity: critical
```
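Certificate expiry can be covered in the same style once blackbox probing is in place; `probe_ssl_earliest_cert_expiry` is exposed by the blackbox exporter, and the 14-day window is a judgment call:

```yaml
groups:
  - name: certificates
    rules:
      - alert: CertificateExpiringSoon
        expr: probe_ssl_earliest_cert_expiry - time() < 14 * 24 * 3600
        for: 1h
        labels:
          severity: warning
```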
## Troubleshooting Tips
### Metrics are missing for one host
- Check exporter health on that host
- Confirm firewall rules allow scraping
- Verify the target name and port in the Prometheus config
### Alerts are noisy
- Add `for` durations to avoid alerting on short blips
- Remove alerts that never trigger action
- Tune thresholds per service class rather than globally
### Dashboards look healthy while the service is down
- Add blackbox checks in addition to internal metrics
- Monitor the reverse proxy or external entry point, not only the app process
- Track backups and certificate expiry separately from CPU and RAM
## Best Practices
- Monitor the services users depend on, not only the hosts they run on
- Keep alert volume low enough that alerts remain meaningful
- Document the owner and response path for each critical alert
- Treat backup freshness and certificate expiry as first-class signals
- Start simple, then add coverage where operational pain justifies it
## References
- [Prometheus overview](https://prometheus.io/docs/introduction/overview/)
- [Prometheus Alertmanager overview](https://prometheus.io/docs/alerting/latest/overview/)
- [Prometheus `node_exporter`](https://github.com/prometheus/node_exporter)
- [Grafana documentation](https://grafana.com/docs/grafana/latest/)


@@ -0,0 +1,124 @@
---
title: Update Management
description: Practical update management for Linux hosts, containers, and self-hosted services
tags:
- updates
- patching
- self-hosting
category: self-hosting
created: 2026-03-14
updated: 2026-03-14
---
# Update Management
## Introduction
Update management keeps systems secure and supportable without turning every patch cycle into an outage. In self-hosted environments, the challenge is balancing security, uptime, and limited operator time.
## Purpose
This guide focuses on:
- Operating system updates
- Container and dependency updates
- Scheduling, staging, and rollback planning
## Architecture Overview
A practical update process has four layers:
- Inventory: know what you run
- Detection: know when updates are available
- Deployment: apply updates in a controlled order
- Validation: confirm services still work
## Step-by-Step Guide
### 1. Separate systems by risk
Create update rings such as:
- Ring 1: non-critical test systems
- Ring 2: internal services
- Ring 3: critical stateful services and edge entry points
### 2. Automate security updates where safe
For Linux hosts, automated security updates can reduce patch delay for low-risk packages. Review distribution guidance and keep reboots controlled.
### 3. Automate update discovery
Use tools that open reviewable pull requests or dashboards for:
- Container image updates
- Dependency updates
- Operating system patch reporting
### 4. Validate after rollout
Confirm:
- Service health
- Reverse proxy reachability
- Backup jobs
- Monitoring and alerting
## Configuration Example
Ubuntu unattended upgrades example (`/etc/apt/apt.conf.d/20auto-upgrades`):
```text
APT::Periodic::Update-Package-Lists "1";
APT::Periodic::Unattended-Upgrade "1";
```
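The behavior itself is tuned in `/etc/apt/apt.conf.d/50unattended-upgrades`; a sketch that keeps reboots under manual control and reports results by mail (the address is a placeholder):

```text
Unattended-Upgrade::Automatic-Reboot "false";
Unattended-Upgrade::Mail "admin@example.com";
```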
Dependency update automation example (Renovate `renovate.json`):
```json
{
"extends": ["config:recommended"],
"schedule": ["before 5am on monday"],
"packageRules": [
{
"matchUpdateTypes": ["major"],
"automerge": false
}
]
}
```
## Troubleshooting Tips
### Updates are applied but regressions go unnoticed
- Add post-update health checks
- Review dashboards and key alerts after patch windows
- Keep rollback or restore steps documented for stateful services
### Too many update notifications create fatigue
- Group low-risk updates into maintenance windows
- Separate critical security issues from routine version bumps
- Use labels or dashboards to prioritize by service importance
### Containers stay outdated even though automation exists
- Verify image digests and registry visibility
- Confirm the deployment process actually recreates containers after image updates
- Prefer reviewed rebuild and redeploy workflows over blind runtime mutation for important services
## Best Practices
- Patch internet-exposed and admin-facing services first
- Stage risky or major updates through lower-risk environments
- Prefer reviewable dependency automation over silent uncontrolled updates
- Keep maintenance windows small and predictable
- Document rollback expectations before making large version jumps
## References
- [Ubuntu Community Help Wiki: Automatic Security Updates](https://help.ubuntu.com/community/AutomaticSecurityUpdates)
- [Debian Wiki: UnattendedUpgrades](https://wiki.debian.org/UnattendedUpgrades)
- [Renovate documentation](https://docs.renovatebot.com/)
- [GitHub Docs: Configuring Dependabot version updates](https://docs.github.com/code-security/dependabot/dependabot-version-updates/configuring-dependabot-version-updates)