first version of the knowledge base :)

125
40 - Guides/containers/docker-basics.md
Normal file
@@ -0,0 +1,125 @@
---
title: Docker Basics
description: Practical introduction to Docker images, containers, and everyday command-line workflows
tags:
  - containers
  - docker
  - linux
category: containers
created: 2026-03-14
updated: 2026-03-14
---

# Docker Basics

## Introduction

Docker packages applications and their dependencies into images that run as isolated containers. For homelab and developer workflows, it is commonly used to deploy repeatable services without building a full virtual machine for each workload.

## Purpose

Docker is useful when you need:

- Repeatable application packaging
- Simple local development environments
- Fast service deployment on Linux hosts
- Clear separation between host OS and application runtime

## Architecture Overview

Core Docker concepts:

- Image: immutable application package template
- Container: running instance of an image
- Registry: source for pulling and pushing images
- Volume: persistent storage outside the writable container layer
- Network: connectivity boundary for one or more containers

Typical flow:

```text
Dockerfile -> Image -> Registry or local cache -> Container runtime
```
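The Dockerfile end of that flow can be sketched as follows. This is illustrative only: the base image, copied path, and port are assumptions, not a specific application.

```dockerfile
# Illustrative sketch: package static content into an nginx-based image.
FROM nginx:stable
# Copy site content into the image's default web root.
COPY ./site /usr/share/nginx/html
# Document the port the service listens on inside the container.
EXPOSE 80
```

Building it with `docker build -t my-site .` produces a local image that the run commands below can start.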
## Step-by-Step Guide

### 1. Verify Docker is installed

```bash
docker version
docker info
```

### 2. Pull and run a container

```bash
docker pull nginx:stable
docker run -d --name web -p 8080:80 nginx:stable
```

### 3. Inspect the running container

```bash
docker ps
docker logs web
docker exec -it web sh
```

### 4. Stop and remove it

```bash
docker stop web
docker rm web
```

## Configuration Example

Run a service with a persistent named volume:

```bash
docker volume create app-data
docker run -d \
  --name app \
  -p 3000:3000 \
  -v app-data:/var/lib/app \
  ghcr.io/example/app:latest
```
Inspect resource usage:

```bash
docker stats
```

## Troubleshooting Tips

### Container starts and exits immediately

- Check `docker logs <container>`
- Verify the image's default command is valid
- Confirm required environment variables or mounted files exist

### Port publishing does not work

- Verify the service is listening inside the container
- Confirm the host port is not already in use
- Check host firewall rules

### Data disappears after recreation

- Use a named volume or bind mount instead of the writable container layer
- Confirm the application writes data to the mounted path

## Best Practices

- Pin images to a known tag and update intentionally
- Use named volumes for application state
- Prefer non-root containers when supported by the image
- Keep containers single-purpose and externalize configuration
- Use Compose for multi-service stacks instead of long `docker run` commands

## References

- [Docker: Docker overview](https://docs.docker.com/get-started/docker-overview/)
- [Docker: Get started](https://docs.docker.com/get-started/)
- [Docker: Volumes](https://docs.docker.com/engine/storage/volumes/)
156
40 - Guides/containers/docker-compose-patterns.md
Normal file
@@ -0,0 +1,156 @@
---
title: Docker Compose Patterns
description: Reusable patterns for structuring Docker Compose applications in homelab and development environments
tags:
  - containers
  - docker
  - compose
category: containers
created: 2026-03-14
updated: 2026-03-14
---

# Docker Compose Patterns

## Introduction

Docker Compose defines multi-container applications in a single declarative file. It is a good fit for homelab stacks, local development, and small self-hosted services that do not require a full orchestrator.

## Purpose

Compose helps when you need:

- Repeatable service definitions
- Shared networks and volumes for a stack
- Environment-specific overrides
- A clear deployment artifact that can live in Git

## Architecture Overview

A Compose application usually includes:

- One or more services
- One or more shared networks
- Persistent volumes
- Environment variables and mounted configuration
- Optional health checks and startup dependencies

## Step-by-Step Guide

### 1. Start with a minimal Compose file

```yaml
services:
  app:
    image: ghcr.io/example/app:1.2.3
    ports:
      - "8080:8080"
```

Start it:

```bash
docker compose up -d
docker compose ps
```

### 2. Add persistent storage and configuration

```yaml
services:
  app:
    image: ghcr.io/example/app:1.2.3
    ports:
      - "8080:8080"
    environment:
      APP_BASE_URL: "https://app.example.com"
    volumes:
      - app-data:/var/lib/app

volumes:
  app-data:
```

### 3. Add dependencies with health checks

```yaml
services:
  db:
    image: postgres:16
    environment:
      POSTGRES_DB: app
      POSTGRES_USER: app
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U app"]
      interval: 10s
      timeout: 5s
      retries: 5
    volumes:
      - db-data:/var/lib/postgresql/data

  app:
    image: ghcr.io/example/app:1.2.3
    depends_on:
      db:
        condition: service_healthy
    environment:
      DATABASE_URL: postgres://app:${POSTGRES_PASSWORD}@db:5432/app
    ports:
      - "8080:8080"

volumes:
  db-data:
```
## Common Patterns

### Use one project directory per stack

Keep the Compose file, `.env` example, and mounted config together in one directory.
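As a sketch of that layout, a committed `.env.example` can document the variables the Compose file interpolates. The variable names here are examples taken from the snippets above:

```text
# .env.example — copy to .env and fill in real values (never commit .env)
POSTGRES_PASSWORD=change-me
APP_BASE_URL=https://app.example.com
```

Compose reads `.env` from the project directory automatically, so `${POSTGRES_PASSWORD}` in the YAML resolves without extra tooling.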
### Use user-defined networks

Private internal services should communicate over Compose networks rather than the host network.

### Prefer explicit volumes

Named volumes are easier to back up and document than anonymous ones.

### Use profiles for optional services

Profiles are useful for dev-only services, one-shot migration jobs, or optional observability components.
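A minimal sketch of the profile mechanism; the `migrate` service name and its command are assumptions for illustration:

```yaml
services:
  app:
    image: ghcr.io/example/app:1.2.3

  # One-shot job that only exists when the "tools" profile is active.
  migrate:
    image: ghcr.io/example/app:1.2.3
    command: ["./migrate", "--apply"]
    profiles:
      - tools
```

Services under a profile are ignored by default and start only when the profile is activated, for example with `docker compose --profile tools up -d`.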
## Troubleshooting Tips

### Services start in the wrong order

- Use health checks instead of only container start order
- Ensure the application retries database or dependency connections

### Configuration drift between hosts

- Commit the Compose file to Git
- Keep secrets out of the file and inject them separately
- Avoid host-specific bind mount paths when portability matters

### Containers cannot resolve each other

- Check that the services share the same Compose network
- Use the service name as the hostname
- Verify the application is not hard-coded to `localhost`

## Best Practices

- Omit the deprecated top-level `version` field in new Compose files
- Keep secrets outside the Compose YAML when possible
- Pin images to intentional versions
- Use health checks for stateful dependencies
- Treat Compose as deployment code and review changes like application code

## References

- [Docker: Compose file reference](https://docs.docker.com/reference/compose-file/)
- [Docker: Compose application model](https://docs.docker.com/compose/intro/compose-application-model/)
- [Docker: Control startup and shutdown order in Compose](https://docs.docker.com/compose/how-tos/startup-order/)
- [Compose Specification](https://compose-spec.io/)
124
40 - Guides/networking/tailscale-exit-nodes.md
Normal file
@@ -0,0 +1,124 @@
---
title: Tailscale Exit Nodes
description: Guide to publishing and using Tailscale exit nodes for internet-bound traffic
tags:
  - networking
  - tailscale
  - vpn
category: networking
created: 2026-03-14
updated: 2026-03-14
---

# Tailscale Exit Nodes

## Introduction

An exit node is a Tailscale device that forwards a client's default route. When enabled, internet-bound traffic leaves through that node instead of the client's local network.

## Purpose

Exit nodes are commonly used for:

- Secure browsing on untrusted networks
- Reaching the internet through a trusted home or lab connection
- Testing geo-dependent behavior from another site
- Concentrating egress through a monitored network path

## Architecture Overview

With an exit node, the selected client sends default-route traffic through Tailscale to the exit node, which then forwards it to the public internet.

```text
Client -> Tailscale tunnel -> Exit node -> Internet
```

Important implications:

- The exit node becomes part of the trust boundary
- Bandwidth, DNS behavior, and logging depend on the exit node's network
- Local LAN access on the client may need explicit allowance

## Step-by-Step Guide

### 1. Prepare the exit node host

Choose a stable host with sufficient upstream bandwidth and a network path you trust. Typical choices are a home server, small VPS, or a utility VM.

### 2. Advertise the node as an exit node

On the node:

```bash
sudo tailscale up --advertise-exit-node
```

You can combine this with tags:

```bash
sudo tailscale up --advertise-exit-node --advertise-tags=tag:exit-node
```

### 3. Approve or review the role

Approve the exit node in the admin console if required by policy. Restrict who can use it with ACLs or grants.
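In the tailnet policy file, exit-node usage is restricted by controlling who may reach `autogroup:internet`. A sketch, assuming an example `group:admins`:

```json
{
  "acls": [
    {
      "action": "accept",
      "src": ["group:admins"],
      "dst": ["autogroup:internet:*"]
    }
  ]
}
```

Devices outside the listed source cannot route internet traffic through the exit node even if they can select it.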
### 4. Select the exit node on a client

From a client, choose the exit node in the Tailscale UI or configure it from the CLI:

```bash
sudo tailscale up --exit-node=<exit-node-name-or-ip>
```

If the client still needs to reach the local LAN directly, enable local LAN access in the client configuration or UI.
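On the CLI, that option is exposed as a flag; the node name is an example:

```bash
sudo tailscale up --exit-node=home-gateway --exit-node-allow-lan-access
```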
## Configuration Example

Example for a dedicated Linux exit node:

```bash
sudo tailscale up \
  --advertise-exit-node \
  --advertise-tags=tag:exit-node
```

Client-side example:

```bash
sudo tailscale up --exit-node=home-gateway
curl https://ifconfig.me
```

## Troubleshooting Tips

### Internet access stops after selecting the exit node

- Confirm the exit node is online in `tailscale status`
- Verify the exit node host itself has working internet access
- Check the exit node's local firewall and forwarding configuration

### Local printers or NAS become unreachable

- Enable local LAN access on the client if that behavior is required
- Split administrative traffic from internet egress if the use case is mixed

### Performance is poor

- Verify the client is using a nearby and healthy exit node
- Check the exit node's CPU, uplink bandwidth, and packet loss
- Avoid placing an exit node behind overloaded or unstable consumer hardware

## Best Practices

- Use exit nodes for specific trust and egress requirements, not as a default for every device
- Restrict usage to approved groups or devices
- Keep exit nodes patched because they handle broad traffic scopes
- Log and monitor egress hosts like any other shared network gateway
- Separate personal browsing, admin traffic, and production service egress when the risk model requires it

## References

- [Tailscale: Exit nodes](https://tailscale.com/kb/1103/exit-nodes)
- [Tailscale: What is Tailscale?](https://tailscale.com/kb/1151/what-is-tailscale)
- [Tailscale: Access controls](https://tailscale.com/kb/1018/acls)
143
40 - Guides/networking/tailscale-subnet-routing.md
Normal file
@@ -0,0 +1,143 @@
---
title: Tailscale Subnet Routing
description: Guide to publishing LAN subnets into a Tailscale tailnet with subnet routers
tags:
  - networking
  - tailscale
  - routing
category: networking
created: 2026-03-14
updated: 2026-03-14
---

# Tailscale Subnet Routing

## Introduction

Subnet routing allows Tailscale clients to reach devices that are not running the Tailscale agent directly. This is useful for printers, storage appliances, hypervisors, IoT controllers, and legacy systems on a homelab LAN.

## Purpose

Use subnet routing when:

- A device cannot run the Tailscale client
- A full site-to-site VPN is unnecessary
- Remote users need access to one or more internal networks
- You want to publish access to a specific VLAN without exposing the entire environment

## Architecture Overview

A subnet router is a Tailscale node with IP forwarding enabled. It advertises one or more LAN prefixes to the tailnet.

```text
Remote client -> Tailscale tunnel -> Subnet router -> LAN target
```

Recommended placement:

- One router per routed network or security zone
- Prefer stable hosts such as small Linux VMs, routers, or dedicated utility nodes
- Apply restrictive ACLs so only approved identities can use the route

## Step-by-Step Guide

### 1. Prepare the router host

Install Tailscale on a Linux host that already has reachability to the target subnet.

Enable IPv4 forwarding:

```bash
echo 'net.ipv4.ip_forward = 1' | sudo tee /etc/sysctl.d/99-tailscale.conf
sudo sysctl --system
```

If the subnet is IPv6-enabled, also enable IPv6 forwarding:

```bash
echo 'net.ipv6.conf.all.forwarding = 1' | sudo tee -a /etc/sysctl.d/99-tailscale.conf
sudo sysctl --system
```

### 2. Advertise the subnet

Start Tailscale and advertise the route:

```bash
sudo tailscale up --advertise-routes=192.168.10.0/24
```

Multiple routes can be advertised as a comma-separated list:

```bash
sudo tailscale up --advertise-routes=192.168.10.0/24,192.168.20.0/24
```

### 3. Approve the route

Approve the advertised route in the Tailscale admin console, or pre-authorize it with `autoApprovers` if that matches your policy model.
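An `autoApprovers` sketch for the policy file, assuming routers carry the `tag:subnet-router` tag used in the configuration example:

```json
{
  "autoApprovers": {
    "routes": {
      "192.168.10.0/24": ["tag:subnet-router"]
    }
  }
}
```

With this in place, a node tagged `tag:subnet-router` that advertises `192.168.10.0/24` is approved without a manual console step.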
### 4. Restrict access

Use ACLs or grants so only the necessary users or tagged devices can reach the routed subnet.

Example policy intent:

- `group:admins` can reach `192.168.10.0/24`
- `group:developers` can only reach specific hosts or ports
- IoT and management subnets require separate approval

## Configuration Example

Example server-side command for a dedicated subnet router:

```bash
sudo tailscale up \
  --advertise-routes=192.168.10.0/24 \
  --advertise-tags=tag:subnet-router
```

Example policy idea:

```json
{
  "tagOwners": {
    "tag:subnet-router": ["group:admins"]
  }
}
```

## Troubleshooting Tips

### Clients can see the route but cannot reach hosts

- Verify IP forwarding is enabled on the router
- Confirm local firewall rules permit forwarding traffic
- Make sure the router has normal LAN connectivity to the destination hosts
- Check whether the destination host has a host firewall blocking the source

### Route does not appear in the tailnet

- Confirm the router is online in `tailscale status`
- Check that the route was approved in the admin console
- Review whether policy requires a specific tag owner or auto-approval

### Asymmetric routing or reply failures

- Make sure the subnet router is in the normal return path for the destination subnet
- Avoid overlapping subnets across multiple sites unless routing precedence is intentional
- Do not advertise broad prefixes when a narrower one is sufficient

## Best Practices

- Advertise the smallest subnet that solves the use case
- Run subnet routers on stable infrastructure, not laptops
- Use separate routers for management and user-facing networks where possible
- Combine routing with ACLs; route advertisement alone is not authorization
- Monitor route health and document ownership of every advertised prefix

## References

- [Tailscale: Subnet routers](https://tailscale.com/kb/1019/subnets)
- [Tailscale: Access controls](https://tailscale.com/kb/1018/acls)
- [Tailscale: Policy file syntax](https://tailscale.com/kb/1337/policy-syntax)
135
40 - Guides/security/ssh-hardening.md
Normal file
@@ -0,0 +1,135 @@
---
title: SSH Hardening
description: Practical SSH server hardening guidance for Linux systems in homelab and self-hosted environments
tags:
  - security
  - ssh
  - linux
category: security
created: 2026-03-14
updated: 2026-03-14
---

# SSH Hardening

## Introduction

SSH is the primary administrative entry point for many Linux systems. Hardening it reduces the likelihood of credential attacks, accidental privilege exposure, and overly broad remote access.

## Purpose

This guide focuses on making SSH safer by:

- Disabling weak authentication paths
- Reducing exposure to brute-force attacks
- Limiting which users can log in
- Preserving maintainability by relying on modern OpenSSH defaults where possible

## Architecture Overview

SSH hardening has three layers:

- Transport and daemon configuration
- Network exposure and firewall policy
- Operational practices such as key handling and logging

For most self-hosted systems, the best model is:

```text
Admin workstation -> VPN or trusted network -> SSH server
```

## Step-by-Step Guide

### 1. Use key-based authentication

Generate a key on the client and copy the public key to the server:

```bash
ssh-keygen -t ed25519 -C "admin@example.com"
ssh-copy-id admin@server.example
```

### 2. Harden `sshd_config`

Baseline example:

```text
PermitRootLogin no
PasswordAuthentication no
KbdInteractiveAuthentication no
PubkeyAuthentication yes
MaxAuthTries 3
LoginGraceTime 30
X11Forwarding no
AllowTcpForwarding no
AllowAgentForwarding no
AllowUsers admin
```

If you need port forwarding for a specific workflow, enable it deliberately instead of leaving it broadly available.
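One way to scope such an exception is a `Match` block, which must come after the global settings in `sshd_config`. The `deploy` user here is an example:

```text
# Global default stays restrictive (AllowTcpForwarding no above).
# Re-enable forwarding only for a specific automation account.
Match User deploy
    AllowTcpForwarding yes
```

Everything after a `Match` line applies only to connections matching its criteria, so the global hardening remains in force for all other users.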
### 3. Validate the configuration

```bash
sudo sshd -t
```

### 4. Reload safely

Keep an existing SSH session open while reloading:

```bash
sudo systemctl reload sshd
```

Distribution-specific service names may be `ssh` or `sshd`.

### 5. Restrict network exposure

- Prefer VPN-only or management-VLAN-only access
- Allow SSH from trusted subnets only
- Do not expose SSH publicly unless it is necessary and monitored

## Configuration Example

Example host firewall intent:

```text
Allow TCP 22 from 192.168.10.0/24
Allow TCP 22 from Tailscale tailnet range
Deny TCP 22 from all other sources
```

## Troubleshooting Tips

### Locked out after config change

- Keep the original session open until a new login succeeds
- Validate the daemon config with `sshd -t`
- Check the service name and logs with `journalctl -u sshd` or `journalctl -u ssh`

### Key authentication fails

- Check file permissions on `~/.ssh` and `authorized_keys`
- Confirm the server allows public key authentication
- Verify the client is offering the intended key with `ssh -v`

### Automation jobs break

- Review whether the workload depended on password auth, port forwarding, or agent forwarding
- Create narrowly scoped exceptions rather than reverting the whole hardening change

## Best Practices

- Rely on current OpenSSH defaults for ciphers and algorithms unless you have a specific compliance need
- Disable password-based interactive logins on internet-reachable systems
- Use individual user accounts and `sudo` instead of direct root SSH
- Combine SSH hardening with network-level restrictions
- Review SSH logs regularly on administrative systems

## References

- [OpenBSD `sshd_config` manual](https://man.openbsd.org/sshd_config)
- [OpenSSH](https://www.openssh.com/)
- [Mozilla OpenSSH guidelines](https://infosec.mozilla.org/guidelines/openssh)
113
40 - Guides/security/yubikey-usage.md
Normal file
@@ -0,0 +1,113 @@
---
title: YubiKey Usage
description: Guide to using a YubiKey for SSH, authentication, and key protection in self-hosted environments
tags:
  - security
  - yubikey
  - ssh
category: security
created: 2026-03-14
updated: 2026-03-14
---

# YubiKey Usage

## Introduction

A YubiKey is a hardware token that can protect authentication and cryptographic operations. In homelab and engineering workflows, it is commonly used for MFA, SSH keys, and protection of GPG subkeys.

## Purpose

Use a YubiKey when you want:

- Stronger authentication than password-only login
- Private keys that require physical presence
- Portable hardware-backed credentials for administrative access

## Architecture Overview

YubiKeys can be used through different interfaces:

- FIDO2 or WebAuthn: MFA and modern hardware-backed authentication
- OpenSSH security keys: SSH keys such as `ed25519-sk`
- OpenPGP applet: card-resident GPG subkeys
- PIV: smart-card style certificate workflows

Choose the interface based on the workflow instead of trying to use one mode for everything.

## Step-by-Step Guide

### 1. Use the key for MFA first

Register the YubiKey with identity providers and critical services before moving on to SSH or GPG workflows.

### 2. Create a hardware-backed SSH key

On a system with OpenSSH support for security keys:

```bash
ssh-keygen -t ed25519-sk -C "admin@example.com"
```

This creates an SSH key tied to the hardware token.
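If the token and OpenSSH build support it, a resident (discoverable) key that also requires user verification can be generated with extra options. This is a sketch; exact behavior depends on the device firmware and PIN configuration:

```bash
ssh-keygen -t ed25519-sk -O resident -O verify-required -C "admin@example.com"
```

A resident key can later be loaded onto a new machine from the token itself with `ssh-keygen -K`.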
### 3. Install the public key on servers

```bash
ssh-copy-id -i ~/.ssh/id_ed25519_sk.pub admin@server.example
```

### 4. Test login

```bash
ssh admin@server.example
```

Expect a touch prompt when required by the device policy.

## Configuration Example

Example client SSH config for a dedicated administrative target:

```text
Host lab-admin
    HostName server.example
    User admin
    IdentityFile ~/.ssh/id_ed25519_sk
```

For GPG workflows, move only subkeys onto the YubiKey and keep the primary key offline when possible.

## Troubleshooting Tips

### The key is not detected

- Confirm USB or NFC access is available
- Check whether another smart-card daemon has locked the device
- Verify the client OS has support for the intended mode

### SSH prompts repeatedly or fails

- Make sure the correct public key is installed on the server
- Confirm the client is offering the security-key identity
- Check that the OpenSSH version supports the selected key type

### GPG or smart-card workflows are inconsistent

- Verify which YubiKey applet is in use
- Avoid mixing PIV and OpenPGP instructions unless the workflow requires both
- Keep backup tokens or recovery paths for administrative access

## Best Practices

- Use the YubiKey as part of a broader account recovery plan, not as the only path back in
- Keep at least one spare token for high-value admin accounts
- Prefer hardware-backed SSH keys for administrator accounts
- Document which services rely on the token and how recovery works
- Separate MFA usage from certificate and signing workflows unless there is a clear operational reason to combine them

## References

- [Yubico: SSH](https://developers.yubico.com/SSH/)
- [Yubico: YubiKey and OpenPGP](https://developers.yubico.com/PGP/)
- [Yubico developer documentation](https://developers.yubico.com/)
121
40 - Guides/self-hosting/backup-strategies.md
Normal file
@@ -0,0 +1,121 @@
---
title: Backup Strategies
description: Practical backup strategy guidance for self-hosted services, containers, and virtualized homelabs
tags:
  - backup
  - self-hosting
  - operations
category: self-hosting
created: 2026-03-14
updated: 2026-03-14
---

# Backup Strategies

## Introduction

Backups protect against deletion, corruption, hardware failure, ransomware, and operational mistakes. In self-hosted environments, a backup strategy should cover both data and the information needed to restore services correctly.

## Purpose

This guide covers:

- What to back up
- How often to back it up
- Where to store copies
- How to validate restore readiness

## Architecture Overview

A good strategy includes:

- Primary data backups
- Configuration and infrastructure backups
- Off-site or offline copies
- Restore testing

The 3-2-1 rule is a strong baseline:

- 3 copies of data
- 2 different media or storage systems
- 1 copy off-site

For higher assurance, also consider an immutable or offline copy and zero-error verification.

## Step-by-Step Guide

### 1. Inventory what matters

Back up:

- Databases
- Application data directories
- Compose files and infrastructure code
- DNS, reverse proxy, and secrets configuration
- Hypervisor or VM backup metadata

### 2. Choose backup tools by workload

- File-level backups: restic, Borg, rsync-based workflows
- VM backups: hypervisor-integrated backup jobs
- Database-aware backups: logical dumps or physical backup tools where needed

### 3. Schedule and retain intelligently

Use a retention policy that matches recovery needs. Short retention for frequent snapshots and longer retention for off-site backups is common.
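With restic, such a policy can be sketched with `restic forget`; the keep counts here are examples to adjust to your recovery needs:

```bash
# Keep recent snapshots densely, older ones sparsely, then reclaim space.
restic forget \
  --keep-hourly 24 \
  --keep-daily 7 \
  --keep-weekly 4 \
  --keep-monthly 6 \
  --prune
```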
### 4. Test restores

Backups are incomplete until you can restore and start the service successfully.

## Configuration Example

Restic backup example:

```bash
export RESTIC_REPOSITORY=/backup/restic
export RESTIC_PASSWORD_FILE=/run/secrets/restic_password

restic backup /srv/app-data /srv/compose
restic snapshots
```

Example restore check:

```bash
restic restore latest --target /tmp/restore-check
```

## Troubleshooting Tips

### Backups exist but restores are incomplete

- Confirm databases were backed up consistently, not mid-write without support
- Verify application config and secret material were included
- Check permissions and ownership in the restored data

### Repository size grows too quickly

- Review retention rules and pruning behavior
- Exclude caches, transient files, and rebuildable artifacts
- Split hot data from archival data if retention needs differ

### Backups run but nobody notices failures

- Alert on backup freshness and last successful run
- Record the restore procedure for each critical service
- Test restores on a schedule, not only after incidents
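A freshness check can be a small script run from cron or a monitoring agent. This is a minimal sketch; the path and age threshold are example values:

```shell
# Minimal backup-freshness check (illustrative).
# Succeeds when any file under the backup path is newer than the threshold.
check_backup_freshness() {
  path="$1"
  max_age_min="$2"
  # Any file modified within the window counts as a fresh backup artifact.
  recent=$(find "$path" -type f -mmin "-$max_age_min" 2>/dev/null | head -n 1)
  if [ -n "$recent" ]; then
    echo "OK: recent backup found"
    return 0
  else
    echo "STALE: no backup newer than ${max_age_min} minutes"
    return 1
  fi
}

# Example: alert when /backup/restic has nothing newer than 26 hours.
# check_backup_freshness /backup/restic 1560 || notify-somehow "backup stale"
```

Wiring the non-zero exit code into your alerting channel turns silent backup failures into actionable notifications.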
## Best Practices

- Back up both data and the configuration needed to use it
- Keep at least one copy outside the main failure domain
- Prefer encrypted backup repositories for off-site storage
- Automate backup jobs and monitor their success
- Practice restores for your most important services first

## References

- [restic documentation](https://restic.readthedocs.io/en/latest/)
- [BorgBackup documentation](https://borgbackup.readthedocs.io/en/stable/)
- [Proxmox VE Backup and Restore](https://pve.proxmox.com/pve-docs/chapter-vzdump.html)
125
40 - Guides/self-hosting/service-monitoring.md
Normal file
@@ -0,0 +1,125 @@
|
||||
---
title: Service Monitoring
description: Guide to building a basic monitoring stack for self-hosted services and infrastructure
tags:
- monitoring
- self-hosting
- observability
category: self-hosting
created: 2026-03-14
updated: 2026-03-14
---

# Service Monitoring

## Introduction

Monitoring turns a self-hosted environment from a collection of services into an operable system. At minimum, that means collecting metrics, checking service availability, and alerting on failures that need human action.

## Purpose

This guide focuses on:

- Host and service metrics
- Uptime checks
- Dashboards and alerting
- Monitoring coverage for common homelab services

## Architecture Overview

A small monitoring stack often includes:

- Prometheus for scraping metrics
- Exporters such as `node_exporter` for host metrics
- Blackbox probing for endpoint availability
- Grafana for dashboards
- Alertmanager for notifications

Typical flow:

```text
Exporter or target -> Prometheus -> Grafana dashboards
Prometheus alerts -> Alertmanager -> notification channel
```

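At the Alertmanager end of that flow, even a minimal configuration needs a route and a receiver. A sketch, assuming a hypothetical internal webhook endpoint; swap in the receiver type your notification channel actually uses:

```yaml
route:
  receiver: default
receivers:
  - name: default
    webhook_configs:
      - url: "https://notify.internal.example/hook"  # hypothetical endpoint
```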
## Step-by-Step Guide

### 1. Start with host metrics

Install `node_exporter` on important Linux hosts or run it in a controlled containerized setup.

### 2. Scrape targets from Prometheus

Example scrape config:

```yaml
scrape_configs:
  - job_name: node
    static_configs:
      - targets:
          - "server-01.internal.example:9100"
          - "server-02.internal.example:9100"
```

### 3. Add endpoint checks

Use a blackbox probe or equivalent to test HTTPS and TCP reachability for user-facing services.

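With the Prometheus blackbox exporter, probe behavior is defined as a module in its own config file. A minimal sketch of an HTTPS check module (the module name `http_2xx` is conventional but arbitrary):

```yaml
modules:
  http_2xx:
    prober: http
    timeout: 5s
    http:
      fail_if_not_ssl: true
```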
### 4. Add dashboards and alerts

Alert only on conditions that require action, such as:

- Host down
- Disk nearly full
- Backup job missing
- TLS certificate near expiry

## Configuration Example

Example alert concept:

```yaml
groups:
  - name: infrastructure
    rules:
      - alert: HostDown
        expr: up == 0
        for: 5m
        labels:
          severity: critical
```

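The same rule-group pattern covers certificate expiry, assuming blackbox probing exposes the `probe_ssl_earliest_cert_expiry` metric. A sketch that warns two weeks before expiry:

```yaml
groups:
  - name: certificates
    rules:
      - alert: CertificateExpiringSoon
        expr: probe_ssl_earliest_cert_expiry - time() < 14 * 86400
        for: 1h
        labels:
          severity: warning
```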
## Troubleshooting Tips

### Metrics are missing for one host

- Check exporter health on that host
- Confirm firewall rules allow scraping
- Verify the target name and port in the Prometheus config

### Alerts are noisy

- Add `for` durations to avoid alerting on short blips
- Remove alerts that never trigger action
- Tune thresholds per service class rather than globally

### Dashboards look healthy while the service is down

- Add blackbox checks in addition to internal metrics
- Monitor the reverse proxy or external entry point, not only the app process
- Track backups and certificate expiry separately from CPU and RAM

## Best Practices

- Monitor the services users depend on, not only the hosts they run on
- Keep alert volume low enough that alerts remain meaningful
- Document the owner and response path for each critical alert
- Treat backup freshness and certificate expiry as first-class signals
- Start simple, then add coverage where operational pain justifies it

## References

- [Prometheus overview](https://prometheus.io/docs/introduction/overview/)
- [Prometheus Alertmanager overview](https://prometheus.io/docs/alerting/latest/overview/)
- [Prometheus `node_exporter`](https://github.com/prometheus/node_exporter)
- [Grafana documentation](https://grafana.com/docs/grafana/latest/)

40 - Guides/self-hosting/update-management.md

---
title: Update Management
description: Practical update management for Linux hosts, containers, and self-hosted services
tags:
- updates
- patching
- self-hosting
category: self-hosting
created: 2026-03-14
updated: 2026-03-14
---

# Update Management

## Introduction

Update management keeps systems secure and supportable without turning every patch cycle into an outage. In self-hosted environments, the challenge is balancing security, uptime, and limited operator time.

## Purpose

This guide focuses on:

- Operating system updates
- Container and dependency updates
- Scheduling, staging, and rollback planning

## Architecture Overview

A practical update process has four layers:

- Inventory: know what you run
- Detection: know when updates are available
- Deployment: apply updates in a controlled order
- Validation: confirm services still work

## Step-by-Step Guide

### 1. Separate systems by risk

Create update rings such as:

- Ring 1: non-critical test systems
- Ring 2: internal services
- Ring 3: critical stateful services and edge entry points

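Ring membership is worth recording somewhere machine-readable so humans and automation agree on rollout order. A hypothetical inventory sketch; the structure and host names are illustrative, not any specific tool's format:

```yaml
update_rings:
  ring1: [test-vm-01]
  ring2: [wiki, git, media]
  ring3: [reverse-proxy, postgres]
```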
### 2. Automate security updates where safe

For Linux hosts, automated security updates can reduce patch delay for low-risk packages. Review distribution guidance and keep reboots controlled.

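On Debian and Ubuntu, `unattended-upgrades` applies patches but leaves reboot behavior configurable; keeping automatic reboots off pushes kernel reboots into a planned maintenance window. The relevant option, typically set in `/etc/apt/apt.conf.d/50unattended-upgrades`:

```text
Unattended-Upgrade::Automatic-Reboot "false";
```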
### 3. Automate update discovery

Use tools that open reviewable pull requests or dashboards for:

- Container image updates
- Dependency updates
- Operating system patch reporting

### 4. Validate after rollout

Confirm:

- Service health
- Reverse proxy reachability
- Backup jobs
- Monitoring and alerting

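Post-rollout validation stays honest when it is one script that runs every check and fails loudly. A minimal sketch; the real probes are up to you, for example `systemctl is-active nginx` or `curl -fsS https://wiki.internal/health` (both hypothetical here):

```shell
#!/usr/bin/env bash
# Run each smoke check passed as an argument; report failures and
# return non-zero if any check fails.
post_update_check() {
  local failed=0 c
  for c in "$@"; do
    if ! bash -c "$c" >/dev/null 2>&1; then
      echo "FAIL: $c"
      failed=1
    fi
  done
  [ "$failed" -eq 0 ] && echo "all checks passed"
  return $failed
}
```

Usage: `post_update_check "systemctl is-active nginx" "curl -fsS https://wiki.internal/health"` after each patch window, wired into cron or the deployment script.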
## Configuration Example

Ubuntu unattended upgrades example:

```text
APT::Periodic::Update-Package-Lists "1";
APT::Periodic::Unattended-Upgrade "1";
```

Dependency update automation example (Renovate `renovate.json`):

```json
{
  "extends": ["config:recommended"],
  "schedule": ["before 5am on monday"],
  "packageRules": [
    {
      "matchUpdateTypes": ["major"],
      "automerge": false
    }
  ]
}
```

## Troubleshooting Tips

### Updates are applied but regressions go unnoticed

- Add post-update health checks
- Review dashboards and key alerts after patch windows
- Keep rollback or restore steps documented for stateful services

### Too many update notifications create fatigue

- Group low-risk updates into maintenance windows
- Separate critical security issues from routine version bumps
- Use labels or dashboards to prioritize by service importance

### Containers stay outdated even though automation exists

- Verify image digests and registry visibility
- Confirm the deployment process actually recreates containers after image updates
- Prefer reviewed rebuild and redeploy workflows over blind runtime mutation for important services

## Best Practices

- Patch internet-exposed and admin-facing services first
- Stage risky or major updates through lower-risk environments
- Prefer reviewable dependency automation over silent uncontrolled updates
- Keep maintenance windows small and predictable
- Document rollback expectations before making large version jumps

## References

- [Ubuntu Community Help Wiki: Automatic Security Updates](https://help.ubuntu.com/community/AutomaticSecurityUpdates)
- [Debian Wiki: UnattendedUpgrades](https://wiki.debian.org/UnattendedUpgrades)
- [Renovate documentation](https://docs.renovatebot.com/)
- [GitHub Docs: Configuring Dependabot version updates](https://docs.github.com/code-security/dependabot/dependabot-version-updates/configuring-dependabot-version-updates)