first version of the knowledge base :)

2026-03-14 11:41:54 +01:00
commit 27965301ad
47 changed files with 4356 additions and 0 deletions


@@ -0,0 +1,133 @@
---
title: Container Networking
description: Overview of Docker container networking modes and practical networking patterns
tags:
- containers
- docker
- networking
category: containers
created: 2026-03-14
updated: 2026-03-14
---
# Container Networking
## Introduction
Container networking determines how workloads talk to each other, the host, and the rest of the network. In Docker environments, understanding bridge networks, published ports, and special drivers is essential for secure and predictable service deployment.
## Purpose
This document explains how container networking works so you can:
- Choose the right network mode for a workload
- Avoid unnecessary host exposure
- Troubleshoot service discovery and connectivity problems
- Design cleaner multi-service stacks
## Architecture Overview
Docker commonly uses these networking approaches:
- Default bridge: basic isolated network for containers on one host
- User-defined bridge: preferred for most application stacks because it adds built-in DNS and cleaner isolation
- Host network: container shares the host network namespace
- Macvlan or ipvlan: container appears directly on the physical network
- Overlay: multi-host networking for orchestrated environments such as Swarm
## Network Modes
### User-defined bridge
This is the normal choice for single-host multi-container applications. Containers on the same network can resolve each other by service or container name.
Example:
```bash
docker network create app-net
docker run -d --name db --network app-net postgres:16
docker run -d --name app --network app-net ghcr.io/example/app:1.2.3
```
### Published ports
Publishing a port maps traffic from the host into the container:
```bash
docker run -d -p 8080:80 nginx:stable
```
This exposes the service on the host's IP addresses, so publish only the ports you actually need, and bind to a specific address (for example `-p 127.0.0.1:8080:80`) when a service should only be reached locally or through a reverse proxy.
### Host networking
Host networking removes network namespace isolation. It can be useful for performance-sensitive agents or software that depends on broadcast-heavy behavior, but it increases the chance of port conflicts and broad host exposure.
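A minimal sketch, assuming a metrics agent that genuinely benefits from seeing host interfaces (the image choice is illustrative):
```bash
# No -p flags: the container binds directly to host interfaces,
# so its port 9100 is the host's port 9100
docker run -d --name node-exporter --network host prom/node-exporter:latest
```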
### Macvlan or ipvlan
These drivers give a container its own presence on the LAN. They can be useful for software that needs a direct network identity, but they give up some of the simplicity and isolation of bridge networking.
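A hedged macvlan sketch; the subnet, gateway, parent interface, and container address are placeholders for your LAN:
```bash
# parent must be the host NIC attached to this LAN
docker network create -d macvlan \
  --subnet=192.168.30.0/24 \
  --gateway=192.168.30.1 \
  -o parent=eth0 lan-net
# The container gets its own LAN identity; note that macvlan
# blocks direct host<->container traffic by default
docker run -d --name lan-app --network lan-net --ip 192.168.30.50 nginx:stable
```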
## Configuration Example
Compose network example:
```yaml
services:
reverse-proxy:
image: caddy:2
ports:
- "80:80"
- "443:443"
networks:
- edge
app:
image: ghcr.io/example/app:1.2.3
networks:
- edge
- backend
db:
image: postgres:16
networks:
- backend
networks:
edge:
backend:
internal: true
```
In this pattern, the database is never published to the host and the `backend` network has no external route, so only containers attached to that network can reach it.
## Troubleshooting Tips
### Container can reach the internet but not another container
- Verify both containers are attached to the same user-defined network
- Use container or service names rather than host loopback addresses
### Service is reachable internally but not from another host
- Confirm the port is published on the host
- Check host firewall rules and upstream routing
### Random connectivity issues after custom network changes
- Inspect network configuration with `docker network inspect <name>`
- Check for overlapping subnets between Docker networks and the physical LAN
- Restart affected containers after major network topology changes
## Best Practices
- Use user-defined bridge networks instead of the legacy default bridge where possible
- Publish only reverse proxy or explicitly required service ports
- Keep databases and internal backends on private internal networks
- Avoid `network_mode: host` unless there is a clear technical reason
- Document custom subnets to avoid conflicts with VPN and LAN address plans
## References
- [Docker: Bridge network driver](https://docs.docker.com/network/drivers/bridge/)
- [Docker: Networking overview](https://docs.docker.com/engine/network/)
- [Docker: Published ports](https://docs.docker.com/get-started/docker-concepts/running-containers/publishing-ports/)


@@ -0,0 +1,125 @@
---
title: Persistent Volumes
description: Storage patterns for keeping container data durable across restarts and upgrades
tags:
- containers
- docker
- storage
category: containers
created: 2026-03-14
updated: 2026-03-14
---
# Persistent Volumes
## Introduction
Containers are disposable, but application data usually is not. Persistent volumes provide storage that survives container restarts, recreation, and image upgrades.
## Purpose
Use persistent volumes to:
- Preserve databases, uploads, and application state
- Separate data lifecycle from container lifecycle
- Simplify backup and restore workflows
- Reduce accidental data loss during redeployments
## Architecture Overview
Docker storage typically falls into three categories:
- Named volumes: managed by Docker and usually the best default for persistent app data
- Bind mounts: direct host paths mounted into a container
- Tmpfs mounts: memory-backed storage for temporary data
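The tmpfs option rarely needs more than a flag; a short sketch, with the mount path and size as assumptions:
```bash
# /run/cache lives in memory only and disappears when the container stops
docker run -d --name scratch-app --tmpfs /run/cache:rw,size=64m nginx:stable
```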
## Storage Patterns
### Named volumes
Named volumes are managed by Docker, portable within a host, and avoid coupling deployments to specific host directory layouts.
```bash
docker volume create postgres-data
docker run -d \
--name db \
-v postgres-data:/var/lib/postgresql/data \
postgres:16
```
### Bind mounts
Bind mounts are useful when:
- The application expects editable configuration files
- You need direct host visibility into files
- Backups are based on host file paths
Example:
```bash
docker run -d \
--name caddy \
-v /srv/caddy/Caddyfile:/etc/caddy/Caddyfile:ro \
-v /srv/caddy/data:/data \
caddy:2
```
### Permissions and ownership
Many container storage issues come from mismatched UID and GID values between the host and the containerized process. Check the image documentation and align ownership before assuming the application is broken.
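A concrete illustration; the host path is hypothetical, and `999:999` is the documented UID/GID of the official postgres image at the time of writing, so verify it for your image:
```bash
# See the ownership the containerized process will see
ls -ln /srv/postgres-data
# Align host ownership with the UID/GID the image runs as
sudo chown -R 999:999 /srv/postgres-data
```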
## Configuration Example
Compose example with named volumes:
```yaml
services:
app:
image: ghcr.io/example/app:1.2.3
volumes:
- app-data:/var/lib/app
db:
image: postgres:16
volumes:
- db-data:/var/lib/postgresql/data
volumes:
app-data:
db-data:
```
## Troubleshooting Tips
### Data disappears after updating a container
- Verify the service is writing to the mounted path
- Check whether a bind mount accidentally hides expected image content
- Inspect mounts with `docker inspect <container>`
### Permission denied errors
- Check ownership and mode bits on bind-mounted directories
- Match container user expectations to host permissions
- Avoid mounting sensitive directories with broad write access
### Backups restore but the app still fails
- Confirm the restored data matches the application version
- Restore metadata such as permissions and database WAL files if applicable
- Test restores on a separate host before using them in production
## Best Practices
- Use named volumes for most stateful container data
- Use bind mounts deliberately for human-managed configuration
- Keep backups separate from the production host
- Record where every service stores its critical state
- Test restore procedures, not only backup creation
## References
- [Docker: Volumes](https://docs.docker.com/engine/storage/volumes/)
- [Docker: Bind mounts](https://docs.docker.com/engine/storage/bind-mounts/)
- [Docker: Tmpfs mounts](https://docs.docker.com/engine/storage/tmpfs/)


@@ -0,0 +1,112 @@
---
title: CI/CD Basics
description: Introduction to continuous integration and continuous delivery pipelines for application and infrastructure repositories
tags:
- ci
- cd
- devops
category: devops
created: 2026-03-14
updated: 2026-03-14
---
# CI/CD Basics
## Introduction
Continuous integration and continuous delivery reduce manual deployment risk by automating validation, packaging, and release steps. Even small self-hosted projects benefit from predictable pipelines that lint, test, and package changes before they reach live systems.
## Purpose
CI/CD pipelines help with:
- Fast feedback on changes
- Repeatable build and test execution
- Safer promotion of artifacts between environments
- Reduced manual drift in deployment procedures
## Architecture Overview
A basic pipeline usually includes:
- Trigger: push, pull request, tag, or schedule
- Jobs: isolated units such as lint, test, build, or deploy
- Artifacts: build outputs or packages passed to later stages
- Environments: dev, staging, production, or similar release targets
Typical flow:
```text
Commit -> CI checks -> Build artifact -> Approval or policy gate -> Deploy
```
## Core Concepts
### Continuous integration
Every meaningful change should run automated checks quickly and consistently.
### Continuous delivery
Artifacts are always kept in a releasable state, even if production deployment requires a manual approval.
### Continuous deployment
Every validated change is deployed automatically. This is powerful but requires strong tests, rollback paths, and change confidence.
## Configuration Example
GitHub Actions workflow example:
```yaml
name: ci
on:
pull_request:
push:
branches: [main]
jobs:
test:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: actions/setup-node@v4
with:
node-version: 22
- run: npm ci
- run: npm test
```
## Troubleshooting Tips
### Pipeline is slow and developers stop trusting it
- Run fast checks early
- Cache dependencies carefully (see the sketch below)
- Separate heavyweight integration tests from every small change if needed
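One low-effort option for the Node workflow shown above: `actions/setup-node` can cache npm's download cache, keyed on the lockfile.
```yaml
# Replaces the plain setup-node step in the workflow above;
# 'cache: npm' keys the dependency cache on package-lock.json
- uses: actions/setup-node@v4
  with:
    node-version: 22
    cache: npm
```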
### Deployments succeed but services still break
- Add health checks and post-deploy validation
- Make environment-specific configuration explicit
- Track which artifact version reached which environment
### CI and local results disagree
- Match tool versions between local and CI environments
- Keep pipeline setup code in version control
- Avoid hidden mutable runners when reproducibility matters
## Best Practices
- Keep CI feedback fast enough to be used during active development
- Require checks before merging to shared branches
- Build once and promote the same artifact when possible
- Separate validation, packaging, and deployment concerns
- Treat pipeline configuration as production code
## References
- [GitHub Docs: Understanding GitHub Actions](https://docs.github.com/actions/about-github-actions/understanding-github-actions)
- [GitHub Docs: Workflow syntax for GitHub Actions](https://docs.github.com/actions/reference/workflows-and-actions/workflow-syntax)


@@ -0,0 +1,111 @@
---
title: Git Workflows
description: Practical Git workflow patterns for teams and personal infrastructure repositories
tags:
- git
- devops
- workflow
category: devops
created: 2026-03-14
updated: 2026-03-14
---
# Git Workflows
## Introduction
A Git workflow defines how changes move from local work to reviewed and deployable history. The right workflow keeps collaboration predictable without adding unnecessary ceremony.
## Purpose
This document covers the most common workflow choices for:
- Application repositories
- Infrastructure-as-code repositories
- Self-hosted service configuration
## Architecture Overview
A Git workflow usually combines:
- Branching strategy
- Review policy
- Merge policy
- Release or deployment trigger
The two patterns most teams evaluate first are:
- Trunk-based development with short-lived branches
- Feature branches with pull or merge requests
## Common Workflow Patterns
### Trunk-based with short-lived branches
Changes are kept small and integrated frequently into the default branch. This works well for active teams, automated test pipelines, and repositories that benefit from continuous deployment.
### Longer-lived feature branches
This can be useful for larger changes or teams with less frequent integration, but it increases drift and merge complexity.
### Infrastructure repositories
For IaC and self-hosting repos, prefer small reviewed changes with strong defaults:
- Protected main branch
- Required checks before merge
- Clear rollback path
- Commit messages that explain operational impact
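A small local safeguard that complements these defaults, shown as a sketch:
```bash
# Refuse pulls that would create an accidental merge commit;
# rebase or merge deliberately instead
git config --global pull.ff only
```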
## Configuration Example
Example daily workflow:
```bash
git switch main
git pull --ff-only
git switch -c feature/update-grafana
git add .
git commit -m "Update Grafana image and alert rules"
git push -u origin feature/update-grafana
```
Before merge:
```bash
git fetch origin
git rebase origin/main
```
## Troubleshooting Tips
### Merge conflicts happen constantly
- Reduce branch lifetime
- Split large changes into smaller reviewable commits
- Rebase or merge from the default branch more frequently
### History becomes hard to audit
- Use meaningful commit messages
- Avoid mixing unrelated infrastructure and application changes in one commit
- Document the operational reason for risky changes in the pull request
### Reverts are painful
- Keep commits cohesive
- Avoid squash-merging unrelated fixes together
- Ensure deployments can be tied back to a specific Git revision
## Best Practices
- Prefer short-lived branches and small pull requests
- Protect the default branch and require review for shared repos
- Use fast-forward pulls locally to avoid accidental merge noise
- Keep configuration and deployment code in Git, not in ad hoc host edits
- Align the Git workflow with deployment automation instead of treating them as separate processes
## References
- [Git: `gitworkflows`](https://git-scm.com/docs/gitworkflows)
- [Pro Git: Branching workflows](https://git-scm.com/book/en/v2/Git-Branching-Branching-Workflows)


@@ -0,0 +1,58 @@
---
title: Monitoring and Observability
description: Core concepts behind monitoring, alerting, and observability for self-hosted systems
tags:
- monitoring
- observability
- operations
category: infrastructure
created: 2026-03-14
updated: 2026-03-14
---
# Monitoring and Observability
## Summary
Monitoring and observability provide visibility into system health, failure modes, and operational behavior. For self-hosted systems, they turn infrastructure from a black box into an environment that can be maintained intentionally.
## Why it matters
Without visibility, teams discover failures only after users notice them. Observability reduces diagnosis time, helps verify changes safely, and supports day-two operations such as capacity planning and backup validation.
## Core concepts
- Metrics: numerical measurements over time
- Logs: event records produced by systems and applications
- Traces: request-path visibility across components
- Alerting: notifications triggered by actionable failure conditions
- Service-level thinking: monitoring what users experience, not only host resource usage
## Practical usage
A practical starting point often includes:
- Host metrics from exporters
- Availability checks for critical endpoints
- Dashboards for infrastructure and core services
- Alerts for outages, storage pressure, certificate expiry, and failed backups
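As a concrete starting point for the host-metrics item above, a minimal Prometheus scrape configuration; the target address assumes a node_exporter on its default port and is a placeholder:
```yaml
# prometheus.yml fragment; list one target per monitored host
scrape_configs:
  - job_name: node
    static_configs:
      - targets: ["192.168.20.10:9100"]
```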
## Best practices
- Monitor both infrastructure health and service reachability
- Alert on conditions that require action
- Keep dashboards focused on questions operators actually ask
- Use monitoring data to validate upgrades and incident recovery
## Pitfalls
- Treating dashboards as a substitute for alerts
- Collecting far more data than anyone reviews
- Monitoring only CPU and RAM while ignoring ingress, DNS, and backups
- Sending noisy alerts that train operators to ignore them
## References
- [Prometheus overview](https://prometheus.io/docs/introduction/overview/)
- [Prometheus Alertmanager overview](https://prometheus.io/docs/alerting/latest/overview/)
- [Grafana documentation](https://grafana.com/docs/grafana/latest/)


@@ -0,0 +1,114 @@
---
title: Proxmox Cluster Basics
description: Overview of how Proxmox VE clusters work, including quorum, networking, and operational constraints
tags:
- proxmox
- virtualization
- clustering
category: infrastructure
created: 2026-03-14
updated: 2026-03-14
---
# Proxmox Cluster Basics
## Introduction
A Proxmox VE cluster groups multiple Proxmox nodes into a shared management domain. This allows centralized administration of virtual machines, containers, storage definitions, and optional high-availability workflows.
## Purpose
Use a Proxmox cluster when you want:
- Centralized management for multiple hypervisor nodes
- Shared visibility of guests, storage, and permissions
- Live migration or controlled workload movement between nodes
- A foundation for HA services backed by shared or replicated storage
## Architecture Overview
A Proxmox cluster relies on several core components:
- `pvecm`: the cluster management tool used to create and join clusters
- Corosync: provides the cluster communication layer
- `pmxcfs`: the Proxmox cluster file system used to distribute cluster configuration
- Quorum: majority voting used to protect cluster consistency
Important operational behavior:
- Each node normally has one vote
- A majority of votes must be online for state-changing operations
- Loss of quorum causes the cluster to become read-only for protected operations
## Cluster Design Notes
### Network requirements
Proxmox expects a reliable low-latency network for cluster traffic. Corosync is sensitive to packet loss, jitter, and unstable links. In homelabs, this generally means wired LAN links, stable switching, and avoiding Wi-Fi for cluster communication.
### Odd node counts
Three nodes is the common minimum for a healthy quorum-based design. Two-node designs can work, but they need either an external QDevice to provide a third vote or explicit acceptance of reduced fault tolerance.
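A sketch of the QDevice path, assuming `corosync-qdevice` is installed on the cluster nodes, `corosync-qnetd` runs on the external host, and `192.0.2.50` is a placeholder address:
```bash
# Run once from a cluster node; registers the external vote provider
pvecm qdevice setup 192.0.2.50
```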
### Storage considerations
Clustering does not automatically provide shared storage. Features such as live migration and HA depend on storage design:
- Shared storage: NFS, iSCSI, Ceph, or other shared backends
- Replicated local storage: possible for some workflows, but requires careful planning
- Backup storage: separate from guest runtime storage
## Configuration Example
Create a new cluster on the first node:
```bash
pvecm create lab-cluster
```
Check cluster status:
```bash
pvecm status
```
Join another node by running this on the new node, pointing at an existing cluster member:
```bash
pvecm add 192.0.2.10
```
Use placeholder management addresses in documentation and never expose real administrative IPs publicly.
## Troubleshooting Tips
### Cluster is read-only
- Check quorum status with `pvecm status`
- Look for network instability between nodes
- Verify time synchronization and general host health
### Node join fails
- Confirm name resolution and basic IP reachability
- Make sure cluster traffic is not filtered by a firewall
- Verify the node is not already part of another cluster
### Random cluster instability
- Review packet loss, duplex mismatches, and switch reliability
- Keep corosync on stable wired links with low latency
- Separate heavy storage replication traffic from cluster messaging when possible
## Best Practices
- Use at least three voting members for a stable quorum model
- Keep cluster traffic on reliable wired networking
- Document node roles, storage backends, and migration dependencies
- Treat the Proxmox management network as a high-trust segment
- Test backup and restore separately from cluster failover assumptions
## References
- [Proxmox VE Administration Guide: Cluster Manager](https://pve.proxmox.com/pve-docs/chapter-pvecm.html)
- [Proxmox VE `pvecm` manual](https://pve.proxmox.com/pve-docs/pvecm.1.html)


@@ -0,0 +1,125 @@
---
title: Reverse Proxy Patterns
description: Common reverse proxy design patterns for self-hosted services and internal platforms
tags:
- reverse-proxy
- networking
- self-hosting
category: infrastructure
created: 2026-03-14
updated: 2026-03-14
---
# Reverse Proxy Patterns
## Introduction
A reverse proxy accepts client requests and forwards them to upstream services. It commonly handles TLS termination, host-based routing, request header forwarding, and policy enforcement in front of self-hosted applications.
## Purpose
Reverse proxies are used to:
- Publish multiple services behind one or a few public entry points
- Centralize TLS certificates
- Apply authentication, authorization, or rate-limiting controls
- Simplify backend service placement and migration
## Architecture Overview
Typical request flow:
```text
Client -> Reverse proxy -> Upstream application
```
Common proxy responsibilities:
- TLS termination and certificate management
- Routing by hostname, path, or protocol
- Forwarding of `Host`, client IP, and other headers
- Optional load balancing across multiple backends
## Common Patterns
### Edge proxy for many internal services
One proxy handles traffic for multiple hostnames:
- `grafana.example.com`
- `gitea.example.com`
- `vault.example.com`
This is a good default for small homelabs and internal platforms.
### Internal proxy behind a VPN
Administrative services are reachable only through a private network such as Tailscale, WireGuard, or a dedicated management VLAN. This reduces public attack surface.
### Path-based routing
Useful when hostnames are limited, but more fragile than host-based routing because some applications assume they live at `/`.
### Dynamic discovery proxy
Tools such as Traefik can watch container metadata and update routes automatically. This reduces manual config for dynamic container environments, but it also makes label hygiene and network policy more important.
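A hedged sketch of the label-driven style; the service name, hostname, and port are assumptions:
```yaml
# Compose fragment: Traefik watches the Docker socket and builds this route from labels
services:
  app:
    image: ghcr.io/example/app:1.2.3
    labels:
      - traefik.enable=true
      - traefik.http.routers.app.rule=Host(`app.example.com`)
      - traefik.http.services.app.loadbalancer.server.port=8080
```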
## Configuration Example
NGINX example:
```nginx
server {
    listen 443 ssl;
    http2 on;
    server_name app.example.com;

    # Certificate paths are placeholders; supply real certificates or the server will not start
    ssl_certificate     /etc/nginx/certs/app.example.com.crt;
    ssl_certificate_key /etc/nginx/certs/app.example.com.key;

    location / {
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_pass http://127.0.0.1:8080;
    }
}
```
Caddy example:
```caddyfile
app.example.com {
reverse_proxy 127.0.0.1:8080
}
```
## Troubleshooting Tips
### Application redirects to the wrong URL
- Check forwarded headers such as `Host` and `X-Forwarded-Proto`
- Verify the application's configured external base URL
- Confirm TLS termination behavior matches application expectations
### WebSocket or streaming traffic fails
- Check proxy support for upgraded connections
- Review buffering behavior if the application expects streaming responses
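For NGINX in particular, upgraded connections need explicit handling; a sketch of the directives to add inside the proxied `location`:
```nginx
# Required for WebSocket upgrades; HTTP/1.1 keeps the connection open
proxy_http_version 1.1;
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection "upgrade";
```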
### Backend works locally but not through the proxy
- Verify the proxy can reach the upstream host and port
- Check the proxy network namespace if running in a container
- Confirm firewall rules permit the proxy-to-upstream path
## Best Practices
- Prefer host-based routing over deep path rewriting
- Publish only the services that need an edge entry point
- Keep proxy configuration under version control
- Use separate internal and public entry points when trust boundaries differ
- Standardize upstream headers and base URL settings across applications
## References
- [NGINX: Reverse Proxy](https://docs.nginx.com/nginx/admin-guide/web-server/reverse-proxy/)
- [Traefik: Routing overview](https://doc.traefik.io/traefik/routing/overview/)
- [Caddy: `reverse_proxy` directive](https://caddyserver.com/docs/caddyfile/directives/reverse_proxy)


@@ -0,0 +1,66 @@
---
title: Service Architecture Patterns
description: Common service architecture patterns for self-hosted platforms and small engineering environments
tags:
- architecture
- services
- infrastructure
category: infrastructure
created: 2026-03-14
updated: 2026-03-14
---
# Service Architecture Patterns
## Summary
Service architecture patterns describe how applications are packaged, connected, exposed, and operated. In self-hosted environments, the most useful patterns balance simplicity, isolation, and operability rather than chasing scale for its own sake.
## Why it matters
Architecture decisions affect deployment complexity, failure domains, recovery steps, and long-term maintenance. Small environments benefit from choosing patterns that remain understandable without full-time platform engineering overhead.
## Core concepts
- Single-service deployment: one service per VM or container stack
- Shared platform services: DNS, reverse proxy, monitoring, identity, backups
- Stateful versus stateless workloads
- Explicit ingress, persistence, and dependency boundaries
- Loose coupling through DNS, reverse proxies, and documented interfaces
## Practical usage
Useful patterns for self-hosted systems include:
- Reverse proxy plus multiple backend services
- Dedicated database service with application separation
- Utility VMs or containers for platform services
- Private admin interfaces with public application ingress kept separate
Example dependency view:
```text
Client -> Reverse proxy -> Application -> Database
-> Identity provider
-> Monitoring and logs
```
## Best practices
- Keep stateful services isolated and clearly backed up
- Make ingress paths and dependencies easy to trace
- Reuse shared platform services where they reduce duplication
- Prefer a small number of well-understood patterns across the environment
## Pitfalls
- Putting every service into one giant stack with unclear boundaries
- Mixing public ingress and administrative paths without review
- Scaling architecture complexity before operational need exists
- Depending on undocumented local assumptions between services
## References
- [Martin Fowler: MonolithFirst](https://martinfowler.com/bliki/MonolithFirst.html)
- [The Twelve-Factor App](https://12factor.net/)
- [NGINX: Reverse Proxy](https://docs.nginx.com/nginx/admin-guide/web-server/reverse-proxy/)


@@ -0,0 +1,122 @@
---
title: Service Discovery
description: Concepts and practical patterns for finding services in self-hosted and homelab environments
tags:
- networking
- service-discovery
- dns
category: infrastructure
created: 2026-03-14
updated: 2026-03-14
---
# Service Discovery
## Introduction
Service discovery is the process of locating services by identity instead of hard-coded IP addresses and ports. It becomes more important as workloads move between hosts, IPs change, or multiple service instances exist behind one logical name.
## Purpose
Good service discovery helps with:
- Decoupling applications from fixed network locations
- Supporting scaling and failover
- Simplifying service-to-service communication
- Reducing manual DNS and inventory drift
## Architecture Overview
There are several discovery models commonly used in self-hosted environments:
- Static DNS: manually managed A, AAAA, CNAME, or SRV records
- DNS-based service discovery: clients query DNS or DNS-SD metadata
- mDNS: local-link multicast discovery for small LANs
- Registry-based discovery: a central catalog such as Consul tracks service registration and health
## Discovery Patterns
### Static DNS
Best for stable infrastructure services such as hypervisors, reverse proxies, storage appliances, and monitoring endpoints.
Example:
```text
proxy.internal.example A 192.168.20.10
grafana.internal.example CNAME proxy.internal.example
```
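To check what a client actually resolves, query the internal resolver directly; the resolver address is a placeholder:
```bash
# Bypasses client-side caching by asking the resolver itself
dig +short proxy.internal.example @192.168.20.1
dig +short grafana.internal.example @192.168.20.1
```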
### DNS-SD and mDNS
Useful for local networks where clients need to discover services such as printers or media endpoints. This works well for small trusted LAN segments, but it does not cross routed boundaries cleanly without extra relays or reflectors.
### Registry-based discovery
A service catalog stores registrations and health checks. Clients query the catalog or use DNS interfaces exposed by the registry.
This is useful when:
- Service instances are dynamic
- Health-aware routing matters
- Multiple nodes host the same service
## Configuration Example
Consul service registration example:
```json
{
"service": {
"name": "gitea",
"port": 3000,
"checks": [
{
"http": "http://127.0.0.1:3000/api/healthz",
"interval": "10s"
}
]
}
}
```
DNS-SD example concept:
```text
_https._tcp.internal.example SRV 0 0 443 proxy.internal.example
```
## Troubleshooting Tips
### Clients resolve a name but still fail to connect
- Check whether the resolved port is correct
- Verify firewall policy and reverse proxy routing
- Confirm the service is healthy, not just registered
### Discovery works on one VLAN but not another
- Review routed DNS access
- Check whether the workload depends on multicast discovery such as mDNS
- Avoid relying on broadcast or multicast across segmented networks unless intentionally supported
### Service records become stale
- Use health checks where possible
- Remove hand-managed DNS entries that no longer match current placements
- Prefer stable canonical names in front of dynamic backends
## Best Practices
- Use DNS as the default discovery mechanism for stable infrastructure
- Add service registries only when the environment is dynamic enough to justify them
- Pair discovery with health checks when multiple instances or failover paths exist
- Keep discovery names human-readable and environment-specific
- Avoid hard-coding IP addresses in application configuration unless there is no realistic alternative
## References
- [Consul: Discover services overview](https://developer.hashicorp.com/consul/docs/discover)
- [Consul: Service discovery explained](https://developer.hashicorp.com/consul/docs/use-case/service-discovery)
- [RFC 6762: Multicast DNS](https://www.rfc-editor.org/rfc/rfc6762)
- [RFC 6763: DNS-Based Service Discovery](https://www.rfc-editor.org/rfc/rfc6763)


@@ -0,0 +1,66 @@
---
title: DNS Architecture
description: Core DNS architecture patterns for self-hosted and homelab environments
tags:
- dns
- networking
- infrastructure
category: networking
created: 2026-03-14
updated: 2026-03-14
---
# DNS Architecture
## Summary
DNS architecture defines how names are assigned, resolved, delegated, and operated across internal and external systems. In self-hosted environments, good DNS design reduces configuration drift, improves service discoverability, and simplifies remote access.
## Why it matters
DNS is a foundational dependency for reverse proxies, TLS, service discovery, monitoring, and operator workflows. Weak DNS design creates brittle systems that depend on hard-coded IP addresses and manual recovery steps.
## Core concepts
- Authoritative DNS: the source of truth for a zone
- Recursive resolution: the process clients use to resolve names
- Internal DNS: records intended only for private services
- Split-horizon DNS: different answers depending on the client context
- TTL: cache lifetime that affects propagation and change speed
## Practical usage
A practical self-hosted DNS model often includes:
- Public DNS for internet-facing records
- Internal DNS for management and private services
- Reverse proxy hostnames for application routing
- Stable names for infrastructure services such as hypervisors, backup targets, and monitoring systems
Example record set:
```text
proxy.example.net A 198.51.100.20
grafana.internal.example A 192.0.2.20
gitea.internal.example CNAME proxy.internal.example
```
## Best practices
- Use DNS names instead of embedding IP addresses in application config
- Separate public and private naming where trust boundaries differ
- Keep TTLs appropriate for the change rate of the record
- Treat authoritative DNS as critical infrastructure with backup and access control
## Pitfalls
- Reusing the same name for unrelated services over time
- Forgetting that split DNS can confuse troubleshooting if undocumented
- Leaving DNS ownership unclear across platforms and providers
- Building service dependencies on local `/etc/hosts` entries
## References
- [Cloudflare Learning Center: What is DNS?](https://www.cloudflare.com/learning/dns/what-is-dns/)
- [RFC 1034: Domain Concepts and Facilities](https://www.rfc-editor.org/rfc/rfc1034)
- [RFC 1035: Domain Implementation and Specification](https://www.rfc-editor.org/rfc/rfc1035)


@@ -0,0 +1,131 @@
---
title: Network Segmentation for Homelabs
description: Practical network segmentation patterns for separating trust zones in a homelab
tags:
- networking
- security
- homelab
category: networking
created: 2026-03-14
updated: 2026-03-14
---
# Network Segmentation for Homelabs
## Introduction
Network segmentation reduces blast radius by separating devices and services into smaller trust zones. In a homelab, this helps isolate management systems, user devices, public services, and less trusted endpoints such as IoT equipment.
## Purpose
Segmentation is useful for:
- Limiting lateral movement after a compromise
- Keeping management interfaces off general user networks
- Isolating noisy or untrusted devices
- Applying different routing, DNS, and firewall policies per zone
## Architecture Overview
A practical homelab usually benefits from separate L3 segments or VLANs for at least the following areas:
- Management: hypervisors, switches, storage admin interfaces
- Servers: application VMs, container hosts, databases
- Clients: laptops, desktops, mobile devices
- IoT: cameras, media devices, printers, controllers
- Guest: devices that should only reach the internet
- Storage or backup: optional dedicated replication path
Example layout:
```text
VLAN 10 Management 192.168.10.0/24
VLAN 20 Servers 192.168.20.0/24
VLAN 30 Clients 192.168.30.0/24
VLAN 40 IoT 192.168.40.0/24
VLAN 50 Guest 192.168.50.0/24
```
Traffic should pass through a firewall or router between zones instead of being bridged freely.
## Design Guidelines
### Segment by trust and function
Start with simple boundaries:
- High trust: management, backup, secrets infrastructure
- Medium trust: internal application servers
- Lower trust: personal devices, guest devices, consumer IoT
### Route between zones with policy
Use inter-VLAN routing with explicit firewall rules. Default deny between segments is easier to reason about than a flat network with ad hoc exceptions.
### Use DNS intentionally
- Give internal services stable names
- Avoid exposing management DNS records to guest or IoT segments
- Consider split DNS for remote access through Tailscale or another VPN
### Minimize overlap
Use clean RFC 1918 address plans and document them. Overlapping subnets complicate VPN routing, container networking, and future site expansion.
## Configuration Example
Example policy intent for a firewall:
```text
Allow Clients -> Servers : TCP 80,443
Allow Management -> Servers : any
Allow Servers -> Storage : TCP 2049,445,3260 as needed
Deny IoT -> Management : any
Deny Guest -> Internal RFC1918 ranges : any
```
Example address planning notes:
```text
192.168.10.0/24 Management
192.168.20.0/24 Server workloads
192.168.30.0/24 User devices
192.168.40.0/24 IoT
192.168.50.0/24 Guest
fd00:10::/64 IPv6 management ULA
```
## Troubleshooting Tips
### Service works from one VLAN but not another
- Check the inter-VLAN firewall rule order
- Confirm DNS resolves to the intended internal address
- Verify the destination service is listening on the right interface
### VPN users can reach too much
- Review ACLs or firewall policy for routed VPN traffic
- Publish only the required subnets through subnet routers
- Avoid combining management and user services in the same routed segment
### Broadcast-dependent services break across segments
- Use unicast DNS or service discovery where possible
- For mDNS-dependent workflows, consider a reflector only where justified
- Do not flatten the network just to support one legacy discovery method
## Best Practices
- Keep management on its own segment from the beginning
- Treat IoT and guest networks as untrusted
- Document every VLAN, subnet, DHCP scope, and routing rule
- Prefer L3 policy enforcement over broad L2 access
- Revisit segmentation when new services expose public endpoints or remote admin paths
## References
- [RFC 1918: Address Allocation for Private Internets](https://www.rfc-editor.org/rfc/rfc1918)
- [RFC 4193: Unique Local IPv6 Unicast Addresses](https://www.rfc-editor.org/rfc/rfc4193)
- [Tailscale: Subnet routers](https://tailscale.com/kb/1019/subnets)
- [Tailscale: Access controls](https://tailscale.com/kb/1018/acls)

View File

@@ -0,0 +1,123 @@
---
title: Tailscale Overview
description: Conceptual overview of how Tailscale works and where it fits in a homelab or engineering environment
tags:
- networking
- tailscale
- vpn
category: networking
created: 2026-03-14
updated: 2026-03-14
---
# Tailscale Overview
## Introduction
Tailscale is a mesh VPN built on WireGuard. It provides secure connectivity between devices without requiring a traditional hub-and-spoke VPN concentrator for day-to-day traffic. In practice, it is often used to reach homelab services, administrative networks, remote workstations, and private developer environments.
## Purpose
The main purpose of Tailscale is to make private networking easier to operate:
- Identity-based access instead of exposing services directly to the internet
- Encrypted device-to-device connectivity
- Simple onboarding across laptops, servers, phones, and virtual machines
- Optional features for routing subnets, advertising exit nodes, and publishing services
## Architecture Overview
Tailscale separates coordination from data transfer.
- Control plane: devices authenticate to Tailscale and exchange node information, keys, policy, and routing metadata
- Data plane: traffic is encrypted with WireGuard and sent directly between peers whenever possible
- Relay fallback: when direct peer-to-peer connectivity is blocked, traffic can traverse DERP relays
Typical flow:
```text
Client -> Tailscale control plane for coordination
Client <-> Peer direct WireGuard tunnel when possible
Client -> DERP relay -> Peer when direct connectivity is unavailable
```
Important components:
- Tailnet: the private network that contains your devices and policies
- ACLs or grants: rules that control which identities can reach which resources
- Tags: non-human identities for servers and automation
- MagicDNS: tailnet DNS names for easier service discovery
- Subnet routers: devices that advertise non-Tailscale LAN routes
- Exit nodes: devices that forward default internet-bound traffic
## Core Concepts
### Identity first
Tailscale access control is tied to users, groups, devices, and tags rather than only source IP addresses. This works well for environments where laptops move between networks and services are distributed across cloud and on-prem hosts.
### Peer-to-peer by default
When NAT traversal succeeds, traffic goes directly between devices. This reduces latency and avoids creating a permanent bottleneck on one VPN server.
### Overlay networking
Each device keeps its normal local network connectivity and also gains a Tailscale address space. This makes it useful for remote administration without redesigning the entire local network.
## Configuration Example
Install and authenticate a Linux node:
```bash
curl -fsSL https://tailscale.com/install.sh | sh
sudo tailscale up
tailscale status
```
Advertise the node as infrastructure with a tag:
```bash
sudo tailscale up --advertise-tags=tag:server
```
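Subnet routing and exit-node roles use the same command. A sketch, noting that `tailscale up` applies its flags as a complete set (so repeat existing ones), the advertised subnet is a placeholder, Linux nodes need IP forwarding enabled, and both roles require approval in the admin console:
```bash
# Re-specify existing flags: tailscale up treats settings as one set
sudo tailscale up --advertise-tags=tag:server \
  --advertise-routes=192.168.20.0/24 \
  --advertise-exit-node
```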
## Operational Notes
- Use ACLs or grants early instead of leaving the entire tailnet flat
- Use tags for servers, containers, and automation agents
- Prefer MagicDNS or split DNS over hard-coded IP lists
- Treat subnet routers and exit nodes as infrastructure roles with extra review
## Troubleshooting
### Device is connected but cannot reach another node
- Check whether ACLs or grants allow the connection
- Confirm the target device is online with `tailscale status`
- Verify the service is listening on the expected interface and port
### Traffic is slower than expected
- Confirm whether the connection is direct or relayed through DERP (`tailscale ping <peer>` reports which path is in use)
- Inspect firewall and NAT behavior on both sides
- Check whether the path crosses an exit node or subnet router unnecessarily
### DNS names do not resolve
- Verify MagicDNS is enabled
- Check the client resolver configuration
- Confirm the hostname exists in the tailnet admin UI
## Best Practices
- Use identity-based policies and avoid broad any-to-any access
- Separate human users from infrastructure with groups and tags
- Limit high-trust roles such as subnet routers and exit nodes
- Document which services are intended for tailnet-only access
- Keep the local firewall enabled; Tailscale complements it rather than replacing it
## References
- [Tailscale: What is Tailscale?](https://tailscale.com/kb/1151/what-is-tailscale)
- [Tailscale: How NAT traversal works](https://tailscale.com/blog/how-nat-traversal-works)
- [Tailscale: Access controls](https://tailscale.com/kb/1018/acls)
- [Tailscale: MagicDNS](https://tailscale.com/kb/1081/magicdns)


@@ -0,0 +1,120 @@
---
title: GPG Basics
description: Overview of core GnuPG concepts, key management, and common operational workflows
tags:
- security
- gpg
- encryption
category: security
created: 2026-03-14
updated: 2026-03-14
---
# GPG Basics
## Introduction
GPG (GnuPG, the GNU Privacy Guard) provides public-key encryption, signing, and verification. It remains common for signing Git commits and tags, exchanging encrypted files, and maintaining long-term personal or team keys.
## Purpose
This document covers:
- What GPG keys and subkeys are
- Common encryption and signing workflows
- Key management practices that matter operationally
## Architecture Overview
A practical GPG setup often includes:
- Primary key: used mainly for certification and identity management
- Subkeys: used for signing, encryption, or authentication
- Revocation certificate: lets you invalidate a lost or compromised key
- Public key distribution: keyserver, WKD, or direct sharing
The primary key should be treated as more sensitive than everyday-use subkeys.
## Core Workflows
### Generate a key
Interactive generation:
```bash
gpg --full-generate-key
```
List keys:
```bash
gpg --list-secret-keys --keyid-format=long
```
### Export the public key
```bash
gpg --armor --export KEYID
```
### Encrypt a file for a recipient
```bash
gpg --encrypt --recipient KEYID secrets.txt
```
### Sign a file
```bash
gpg --detach-sign --armor release.tar.gz
```
### Verify a signature
```bash
gpg --verify release.tar.gz.asc release.tar.gz
```
## Configuration Example
Export a revocation certificate after key creation:
```bash
gpg --output revoke-KEYID.asc --gen-revoke KEYID
```
Store that revocation certificate offline in a secure location.
## Troubleshooting Tips
### Encryption works but trust warnings appear
- Confirm you imported the correct public key
- Verify fingerprints out of band before marking a key as trusted
- Do not treat keyserver availability as proof of identity
### Git signing fails
- Check that Git points to the expected key ID
- Confirm the GPG agent is running
- Verify terminal pinentry integration on the local system
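A minimal Git signing setup, with the key ID as a placeholder:
```bash
# Point Git at the signing key and sign all commits by default
git config --global user.signingkey 3AA5C34371567BD2
git config --global commit.gpgsign true
```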
### Lost laptop or corrupted keyring
- Restore from secure backups
- Revoke compromised keys if needed
- Reissue or rotate subkeys while keeping identity documentation current
## Best Practices
- Keep the primary key offline when practical and use subkeys day to day
- Generate and safely store a revocation certificate immediately
- Verify key fingerprints through a trusted secondary channel
- Back up secret keys securely before relying on them operationally
- Use GPG where it fits existing tooling; do not force it into workflows that are better served by simpler modern tools
## References
- [GnuPG Documentation](https://www.gnupg.org/documentation/)
- [The GNU Privacy Handbook](https://www.gnupg.org/gph/en/manual/book1.html)
- [GnuPG manual](https://www.gnupg.org/documentation/manuals/gnupg/)


@@ -0,0 +1,65 @@
---
title: Identity and Authentication
description: Core concepts and patterns for identity, authentication, and authorization in self-hosted systems
tags:
- security
- identity
- authentication
category: security
created: 2026-03-14
updated: 2026-03-14
---
# Identity and Authentication
## Summary
Identity and authentication define who or what is requesting access and how that claim is verified. In self-hosted environments, a clear identity model is essential for secure remote access, service-to-service trust, and administrative control.
## Why it matters
As environments grow, per-application local accounts become hard to manage and harder to audit. Shared identity patterns reduce duplicated credentials, improve MFA coverage, and make access revocation more predictable.
## Core concepts
- Identity: the user, service, or device being represented
- Authentication: proving that identity
- Authorization: deciding what the identity may do
- Federation: delegating identity verification to a trusted provider
- MFA: requiring more than one authentication factor
## Practical usage
Common self-hosted patterns include:
- Central identity provider for user login
- SSO using OIDC or SAML for web applications
- SSH keys or hardware-backed credentials for administrative access
- Service accounts with narrowly scoped machine credentials
Example pattern:
```text
User -> Identity provider -> OIDC token -> Reverse proxy or application
Admin -> VPN -> SSH key or hardware-backed credential -> Server
```
## Best practices
- Centralize user identity where possible
- Enforce MFA for admin and internet-facing accounts
- Separate human accounts from machine identities
- Review how account disablement or key rotation propagates across services
## Pitfalls
- Leaving critical systems on isolated local accounts with no lifecycle control
- Reusing the same credentials across multiple services
- Treating authentication and authorization as the same problem
- Forgetting account recovery and break-glass access paths
## References
- [OpenID Connect Core 1.0](https://openid.net/specs/openid-connect-core-1_0.html)
- [NIST Digital Identity Guidelines](https://pages.nist.gov/800-63-3/)
- [Yubico developer documentation](https://developers.yubico.com/)


@@ -0,0 +1,109 @@
---
title: Secrets Management
description: Principles and tool choices for handling secrets safely in self-hosted and engineering environments
tags:
- security
- secrets
- devops
category: security
created: 2026-03-14
updated: 2026-03-14
---
# Secrets Management
## Introduction
Secrets management is the practice of storing, distributing, rotating, and auditing sensitive values such as API tokens, database passwords, SSH private keys, and certificate material.
## Purpose
Good secrets management helps you:
- Keep credentials out of Git and chat logs
- Reduce accidental disclosure in deployment pipelines
- Rotate credentials without rewriting every system by hand
- Apply least privilege to applications and operators
## Architecture Overview
A practical secrets strategy distinguishes between:
- Human secrets: admin credentials, recovery codes, hardware token backups
- Machine secrets: database passwords, API tokens, TLS private keys
- Dynamic secrets: short-lived credentials issued on demand
- Encrypted configuration: secrets stored in version control in encrypted form
Common tooling patterns:
- Vault for centrally managed and dynamic secrets
- SOPS for Git-managed encrypted secret files
- Platform-native secret stores for specific runtimes
## Operational Model
### Centralized secret service
A service such as Vault handles storage, access policy, audit logging, and secret issuance. This is most useful when you need rotation, leasing, or many consumers across multiple environments.
### Encrypted files in Git
Tools such as SOPS allow you to keep encrypted configuration alongside deployment code. This is useful for small teams and GitOps-style workflows, as long as decryption keys are managed carefully.
### Runtime injection
Applications should receive secrets at runtime through a controlled delivery path rather than through hard-coded values inside images or repositories.
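One concrete delivery path, sketched with Docker Compose file-based secrets; the service name, secret name, and file path are assumptions:
```yaml
# The value is mounted at /run/secrets/db_password inside the container,
# never baked into the image or committed to Git in plaintext
services:
  app:
    image: ghcr.io/example/app:1.2.3
    secrets:
      - db_password
secrets:
  db_password:
    file: ./secrets/db_password.txt
```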
## Configuration Example
Example placeholder environment file layout:
```text
APP_DATABASE_URL=postgres://app:${DB_PASSWORD}@db.internal.example/app
APP_SMTP_PASSWORD=<provided-at-runtime>
```
Example SOPS-managed YAML structure:
```yaml
database:
user: app
password: ENC[AES256_GCM,data:...,type:str]
smtp:
password: ENC[AES256_GCM,data:...,type:str]
```
## Troubleshooting Tips
### Secret appears in logs or shell history
- Remove it from the source immediately if exposure is ongoing
- Rotate the credential instead of assuming it stayed private
- Review the delivery path that leaked it
### Encrypted config exists but deployments still fail
- Verify the deployment environment has access to the correct decryption keys
- Check whether placeholders or environment interpolation are incomplete
- Confirm the application reads secrets from the documented location
### Secret sprawl grows over time
- Inventory where secrets live and who owns them
- Standardize naming and rotation intervals
- Remove stale credentials from old hosts and repos
## Best Practices
- Never commit plaintext secrets to Git
- Prefer short-lived or scoped credentials where the platform supports them
- Separate secret storage from application images
- Rotate credentials after incidents, staff changes, and major platform migrations
- Document ownership, rotation method, and recovery path for every critical secret
## References
- [HashiCorp Vault: What is Vault?](https://developer.hashicorp.com/vault/docs/what-is-vault)
- [HashiCorp Vault documentation](https://developer.hashicorp.com/vault/docs)
- [SOPS documentation](https://getsops.io/docs/)
- [The Twelve-Factor App: Config](https://12factor.net/config)