first version of the knowledge base :)

2026-03-14 11:41:54 +01:00
commit 27965301ad
47 changed files with 4356 additions and 0 deletions


@@ -0,0 +1,133 @@
---
title: Container Networking
description: Overview of Docker container networking modes and practical networking patterns
tags:
- containers
- docker
- networking
category: containers
created: 2026-03-14
updated: 2026-03-14
---
# Container Networking
## Introduction
Container networking determines how workloads talk to each other, the host, and the rest of the network. In Docker environments, understanding bridge networks, published ports, and special drivers is essential for secure and predictable service deployment.
## Purpose
This document explains how container networking works so you can:
- Choose the right network mode for a workload
- Avoid unnecessary host exposure
- Troubleshoot service discovery and connectivity problems
- Design cleaner multi-service stacks
## Architecture Overview
Docker commonly uses these networking approaches:
- Default bridge: basic isolated network for containers on one host
- User-defined bridge: preferred for most application stacks because it adds built-in DNS and cleaner isolation
- Host network: container shares the host network namespace
- Macvlan or ipvlan: container appears directly on the physical network
- Overlay: multi-host networking for orchestrated environments such as Swarm
## Network Modes
### User-defined bridge
This is the normal choice for single-host multi-container applications. Containers on the same network can resolve each other by service or container name.
Example:
```bash
docker network create app-net
docker run -d --name db --network app-net postgres:16
docker run -d --name app --network app-net ghcr.io/example/app:1.2.3
```
### Published ports
Publishing a port maps traffic from the host into the container:
```bash
docker run -d -p 8080:80 nginx:stable
```
This exposes the service on the host's IP addresses, so publish only the ports you actually need, and bind to a specific address (for example `-p 127.0.0.1:8080:80`) when a service should only be reached locally or through a reverse proxy.
### Host networking
Host networking removes network namespace isolation. It can be useful for performance-sensitive agents or software that depends on broadcast-heavy behavior, but it increases the chance of port conflicts and broad host exposure.
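A minimal sketch, assuming a metrics agent that genuinely benefits from seeing host interfaces (the image choice is illustrative):
```bash
# No -p flags: the container binds directly to host interfaces,
# so its port 9100 is the host's port 9100
docker run -d --name node-exporter --network host prom/node-exporter:latest
```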
### Macvlan or ipvlan
These drivers give a container its own presence on the LAN. They can be useful for software that needs a direct network identity, but they give up some of the simplicity and isolation of bridge networking.
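A hedged macvlan sketch; the subnet, gateway, parent interface, and container address are placeholders for your LAN:
```bash
# parent must be the host NIC attached to this LAN
docker network create -d macvlan \
  --subnet=192.168.30.0/24 \
  --gateway=192.168.30.1 \
  -o parent=eth0 lan-net
# The container gets its own LAN identity; note that macvlan
# blocks direct host<->container traffic by default
docker run -d --name lan-app --network lan-net --ip 192.168.30.50 nginx:stable
```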
## Configuration Example
Compose network example:
```yaml
services:
reverse-proxy:
image: caddy:2
ports:
- "80:80"
- "443:443"
networks:
- edge
app:
image: ghcr.io/example/app:1.2.3
networks:
- edge
- backend
db:
image: postgres:16
networks:
- backend
networks:
edge:
backend:
internal: true
```
In this pattern, the database is never published to the host and the `backend` network has no external route, so only containers attached to that network can reach it.
## Troubleshooting Tips
### Container can reach the internet but not another container
- Verify both containers are attached to the same user-defined network
- Use container or service names rather than host loopback addresses
### Service is reachable internally but not from another host
- Confirm the port is published on the host
- Check host firewall rules and upstream routing
### Random connectivity issues after custom network changes
- Inspect network configuration with `docker network inspect <name>`
- Check for overlapping subnets between Docker networks and the physical LAN
- Restart affected containers after major network topology changes
## Best Practices
- Use user-defined bridge networks instead of the legacy default bridge where possible
- Publish only reverse proxy or explicitly required service ports
- Keep databases and internal backends on private internal networks
- Avoid `network_mode: host` unless there is a clear technical reason
- Document custom subnets to avoid conflicts with VPN and LAN address plans
## References
- [Docker: Bridge network driver](https://docs.docker.com/network/drivers/bridge/)
- [Docker: Networking overview](https://docs.docker.com/engine/network/)
- [Docker: Published ports](https://docs.docker.com/get-started/docker-concepts/running-containers/publishing-ports/)


@@ -0,0 +1,125 @@
---
title: Persistent Volumes
description: Storage patterns for keeping container data durable across restarts and upgrades
tags:
- containers
- docker
- storage
category: containers
created: 2026-03-14
updated: 2026-03-14
---
# Persistent Volumes
## Introduction
Containers are disposable, but application data usually is not. Persistent volumes provide storage that survives container restarts, recreation, and image upgrades.
## Purpose
Use persistent volumes to:
- Preserve databases, uploads, and application state
- Separate data lifecycle from container lifecycle
- Simplify backup and restore workflows
- Reduce accidental data loss during redeployments
## Architecture Overview
Docker storage typically falls into three categories:
- Named volumes: managed by Docker and usually the best default for persistent app data
- Bind mounts: direct host paths mounted into a container
- Tmpfs mounts: memory-backed storage for temporary data
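The tmpfs option rarely needs more than a flag; a short sketch, with the mount path and size as assumptions:
```bash
# /run/cache lives in memory only and disappears when the container stops
docker run -d --name scratch-app --tmpfs /run/cache:rw,size=64m nginx:stable
```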
## Storage Patterns
### Named volumes
Named volumes are managed by Docker, portable within a host, and avoid coupling deployments to specific host directory layouts.
```bash
docker volume create postgres-data
docker run -d \
--name db \
-v postgres-data:/var/lib/postgresql/data \
postgres:16
```
### Bind mounts
Bind mounts are useful when:
- The application expects editable configuration files
- You need direct host visibility into files
- Backups are based on host file paths
Example:
```bash
docker run -d \
--name caddy \
-v /srv/caddy/Caddyfile:/etc/caddy/Caddyfile:ro \
-v /srv/caddy/data:/data \
caddy:2
```
### Permissions and ownership
Many container storage issues come from mismatched UID and GID values between the host and the containerized process. Check the image documentation and align ownership before assuming the application is broken.
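A concrete illustration; the host path is hypothetical, and `999:999` is the documented UID/GID of the official postgres image at the time of writing, so verify it for your image:
```bash
# See the ownership the containerized process will see
ls -ln /srv/postgres-data
# Align host ownership with the UID/GID the image runs as
sudo chown -R 999:999 /srv/postgres-data
```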
## Configuration Example
Compose example with named volumes:
```yaml
services:
app:
image: ghcr.io/example/app:1.2.3
volumes:
- app-data:/var/lib/app
db:
image: postgres:16
volumes:
- db-data:/var/lib/postgresql/data
volumes:
app-data:
db-data:
```
## Troubleshooting Tips
### Data disappears after updating a container
- Verify the service is writing to the mounted path
- Check whether a bind mount accidentally hides expected image content
- Inspect mounts with `docker inspect <container>`
### Permission denied errors
- Check ownership and mode bits on bind-mounted directories
- Match container user expectations to host permissions
- Avoid mounting sensitive directories with broad write access
### Backups restore but the app still fails
- Confirm the restored data matches the application version
- Restore metadata such as permissions and database WAL files if applicable
- Test restores on a separate host before using them in production
## Best Practices
- Use named volumes for most stateful container data
- Use bind mounts deliberately for human-managed configuration
- Keep backups separate from the production host
- Record where every service stores its critical state
- Test restore procedures, not only backup creation
## References
- [Docker: Volumes](https://docs.docker.com/engine/storage/volumes/)
- [Docker: Bind mounts](https://docs.docker.com/engine/storage/bind-mounts/)
- [Docker: Tmpfs mounts](https://docs.docker.com/engine/storage/tmpfs/)


@@ -0,0 +1,112 @@
---
title: CI/CD Basics
description: Introduction to continuous integration and continuous delivery pipelines for application and infrastructure repositories
tags:
- ci
- cd
- devops
category: devops
created: 2026-03-14
updated: 2026-03-14
---
# CI/CD Basics
## Introduction
Continuous integration and continuous delivery reduce manual deployment risk by automating validation, packaging, and release steps. Even small self-hosted projects benefit from predictable pipelines that lint, test, and package changes before they reach live systems.
## Purpose
CI/CD pipelines help with:
- Fast feedback on changes
- Repeatable build and test execution
- Safer promotion of artifacts between environments
- Reduced manual drift in deployment procedures
## Architecture Overview
A basic pipeline usually includes:
- Trigger: push, pull request, tag, or schedule
- Jobs: isolated units such as lint, test, build, or deploy
- Artifacts: build outputs or packages passed to later stages
- Environments: dev, staging, production, or similar release targets
Typical flow:
```text
Commit -> CI checks -> Build artifact -> Approval or policy gate -> Deploy
```
## Core Concepts
### Continuous integration
Every meaningful change should run automated checks quickly and consistently.
### Continuous delivery
Artifacts are always kept in a releasable state, even if production deployment requires a manual approval.
### Continuous deployment
Every validated change is deployed automatically. This is powerful but requires strong tests, rollback paths, and change confidence.
## Configuration Example
GitHub Actions workflow example:
```yaml
name: ci
on:
pull_request:
push:
branches: [main]
jobs:
test:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: actions/setup-node@v4
with:
node-version: 22
- run: npm ci
- run: npm test
```
## Troubleshooting Tips
### Pipeline is slow and developers stop trusting it
- Run fast checks early
- Cache dependencies carefully (see the sketch below)
- Separate heavyweight integration tests from every small change if needed
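One low-effort option for the Node workflow shown above: `actions/setup-node` can cache npm's download cache, keyed on the lockfile.
```yaml
# Replaces the plain setup-node step in the workflow above;
# 'cache: npm' keys the dependency cache on package-lock.json
- uses: actions/setup-node@v4
  with:
    node-version: 22
    cache: npm
```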
### Deployments succeed but services still break
- Add health checks and post-deploy validation
- Make environment-specific configuration explicit
- Track which artifact version reached which environment
### CI and local results disagree
- Match tool versions between local and CI environments
- Keep pipeline setup code in version control
- Avoid hidden mutable runners when reproducibility matters
## Best Practices
- Keep CI feedback fast enough to be used during active development
- Require checks before merging to shared branches
- Build once and promote the same artifact when possible
- Separate validation, packaging, and deployment concerns
- Treat pipeline configuration as production code
## References
- [GitHub Docs: Understanding GitHub Actions](https://docs.github.com/actions/about-github-actions/understanding-github-actions)
- [GitHub Docs: Workflow syntax for GitHub Actions](https://docs.github.com/actions/reference/workflows-and-actions/workflow-syntax)


@@ -0,0 +1,111 @@
---
title: Git Workflows
description: Practical Git workflow patterns for teams and personal infrastructure repositories
tags:
- git
- devops
- workflow
category: devops
created: 2026-03-14
updated: 2026-03-14
---
# Git Workflows
## Introduction
A Git workflow defines how changes move from local work to reviewed and deployable history. The right workflow keeps collaboration predictable without adding unnecessary ceremony.
## Purpose
This document covers the most common workflow choices for:
- Application repositories
- Infrastructure-as-code repositories
- Self-hosted service configuration
## Architecture Overview
A Git workflow usually combines:
- Branching strategy
- Review policy
- Merge policy
- Release or deployment trigger
The two patterns most teams evaluate first are:
- Trunk-based development with short-lived branches
- Feature branches with pull or merge requests
## Common Workflow Patterns
### Trunk-based with short-lived branches
Changes are kept small and integrated frequently into the default branch. This works well for active teams, automated test pipelines, and repositories that benefit from continuous deployment.
### Longer-lived feature branches
This can be useful for larger changes or teams with less frequent integration, but it increases drift and merge complexity.
### Infrastructure repositories
For IaC and self-hosting repos, prefer small reviewed changes with strong defaults:
- Protected main branch
- Required checks before merge
- Clear rollback path
- Commit messages that explain operational impact
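A small local safeguard that complements these defaults, shown as a sketch:
```bash
# Refuse pulls that would create an accidental merge commit;
# rebase or merge deliberately instead
git config --global pull.ff only
```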
## Configuration Example
Example daily workflow:
```bash
git switch main
git pull --ff-only
git switch -c feature/update-grafana
git add .
git commit -m "Update Grafana image and alert rules"
git push -u origin feature/update-grafana
```
Before merge:
```bash
git fetch origin
git rebase origin/main
```
## Troubleshooting Tips
### Merge conflicts happen constantly
- Reduce branch lifetime
- Split large changes into smaller reviewable commits
- Rebase or merge from the default branch more frequently
### History becomes hard to audit
- Use meaningful commit messages
- Avoid mixing unrelated infrastructure and application changes in one commit
- Document the operational reason for risky changes in the pull request
### Reverts are painful
- Keep commits cohesive
- Avoid squash-merging unrelated fixes together
- Ensure deployments can be tied back to a specific Git revision
## Best Practices
- Prefer short-lived branches and small pull requests
- Protect the default branch and require review for shared repos
- Use fast-forward pulls locally to avoid accidental merge noise
- Keep configuration and deployment code in Git, not in ad hoc host edits
- Align the Git workflow with deployment automation instead of treating them as separate processes
## References
- [Git: `gitworkflows`](https://git-scm.com/docs/gitworkflows)
- [Pro Git: Branching workflows](https://git-scm.com/book/en/v2/Git-Branching-Branching-Workflows)


@@ -0,0 +1,58 @@
---
title: Monitoring and Observability
description: Core concepts behind monitoring, alerting, and observability for self-hosted systems
tags:
- monitoring
- observability
- operations
category: infrastructure
created: 2026-03-14
updated: 2026-03-14
---
# Monitoring and Observability
## Summary
Monitoring and observability provide visibility into system health, failure modes, and operational behavior. For self-hosted systems, they turn infrastructure from a black box into an environment that can be maintained intentionally.
## Why it matters
Without visibility, teams discover failures only after users notice them. Observability reduces diagnosis time, helps verify changes safely, and supports day-two operations such as capacity planning and backup validation.
## Core concepts
- Metrics: numerical measurements over time
- Logs: event records produced by systems and applications
- Traces: request-path visibility across components
- Alerting: notifications triggered by actionable failure conditions
- Service-level thinking: monitoring what users experience, not only host resource usage
## Practical usage
A practical starting point often includes:
- Host metrics from exporters
- Availability checks for critical endpoints
- Dashboards for infrastructure and core services
- Alerts for outages, storage pressure, certificate expiry, and failed backups
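As a concrete starting point for the host-metrics item above, a minimal Prometheus scrape configuration; the target address assumes a node_exporter on its default port and is a placeholder:
```yaml
# prometheus.yml fragment; list one target per monitored host
scrape_configs:
  - job_name: node
    static_configs:
      - targets: ["192.168.20.10:9100"]
```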
## Best practices
- Monitor both infrastructure health and service reachability
- Alert on conditions that require action
- Keep dashboards focused on questions operators actually ask
- Use monitoring data to validate upgrades and incident recovery
## Pitfalls
- Treating dashboards as a substitute for alerts
- Collecting far more data than anyone reviews
- Monitoring only CPU and RAM while ignoring ingress, DNS, and backups
- Sending noisy alerts that train operators to ignore them
## References
- [Prometheus overview](https://prometheus.io/docs/introduction/overview/)
- [Prometheus Alertmanager overview](https://prometheus.io/docs/alerting/latest/overview/)
- [Grafana documentation](https://grafana.com/docs/grafana/latest/)


@@ -0,0 +1,114 @@
---
title: Proxmox Cluster Basics
description: Overview of how Proxmox VE clusters work, including quorum, networking, and operational constraints
tags:
- proxmox
- virtualization
- clustering
category: infrastructure
created: 2026-03-14
updated: 2026-03-14
---
# Proxmox Cluster Basics
## Introduction
A Proxmox VE cluster groups multiple Proxmox nodes into a shared management domain. This allows centralized administration of virtual machines, containers, storage definitions, and optional high-availability workflows.
## Purpose
Use a Proxmox cluster when you want:
- Centralized management for multiple hypervisor nodes
- Shared visibility of guests, storage, and permissions
- Live migration or controlled workload movement between nodes
- A foundation for HA services backed by shared or replicated storage
## Architecture Overview
A Proxmox cluster relies on several core components:
- `pvecm`: the cluster management tool used to create and join clusters
- Corosync: provides the cluster communication layer
- `pmxcfs`: the Proxmox cluster file system used to distribute cluster configuration
- Quorum: majority voting used to protect cluster consistency
Important operational behavior:
- Each node normally has one vote
- A majority of votes must be online for state-changing operations
- Loss of quorum causes the cluster to become read-only for protected operations
## Cluster Design Notes
### Network requirements
Proxmox expects a reliable low-latency network for cluster traffic. Corosync is sensitive to packet loss, jitter, and unstable links. In homelabs, this generally means wired LAN links, stable switching, and avoiding Wi-Fi for cluster communication.
### Odd node counts
Three nodes is the common minimum for a healthy quorum-based design. Two-node designs can work, but they need either an external QDevice to provide a third vote or explicit acceptance of reduced fault tolerance.
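A sketch of the QDevice path, assuming `corosync-qdevice` is installed on the cluster nodes, `corosync-qnetd` runs on the external host, and `192.0.2.50` is a placeholder address:
```bash
# Run once from a cluster node; registers the external vote provider
pvecm qdevice setup 192.0.2.50
```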
### Storage considerations
Clustering does not automatically provide shared storage. Features such as live migration and HA depend on storage design:
- Shared storage: NFS, iSCSI, Ceph, or other shared backends
- Replicated local storage: possible for some workflows, but requires careful planning
- Backup storage: separate from guest runtime storage
## Configuration Example
Create a new cluster on the first node:
```bash
pvecm create lab-cluster
```
Check cluster status:
```bash
pvecm status
```
Join another node by running this on the new node, pointing at an existing cluster member:
```bash
pvecm add 192.0.2.10
```
Use placeholder management addresses in documentation and never expose real administrative IPs publicly.
## Troubleshooting Tips
### Cluster is read-only
- Check quorum status with `pvecm status`
- Look for network instability between nodes
- Verify time synchronization and general host health
### Node join fails
- Confirm name resolution and basic IP reachability
- Make sure cluster traffic is not filtered by a firewall
- Verify the node is not already part of another cluster
### Random cluster instability
- Review packet loss, duplex mismatches, and switch reliability
- Keep corosync on stable wired links with low latency
- Separate heavy storage replication traffic from cluster messaging when possible
## Best Practices
- Use at least three voting members for a stable quorum model
- Keep cluster traffic on reliable wired networking
- Document node roles, storage backends, and migration dependencies
- Treat the Proxmox management network as a high-trust segment
- Test backup and restore separately from cluster failover assumptions
## References
- [Proxmox VE Administration Guide: Cluster Manager](https://pve.proxmox.com/pve-docs/chapter-pvecm.html)
- [Proxmox VE `pvecm` manual](https://pve.proxmox.com/pve-docs/pvecm.1.html)


@@ -0,0 +1,125 @@
---
title: Reverse Proxy Patterns
description: Common reverse proxy design patterns for self-hosted services and internal platforms
tags:
- reverse-proxy
- networking
- self-hosting
category: infrastructure
created: 2026-03-14
updated: 2026-03-14
---
# Reverse Proxy Patterns
## Introduction
A reverse proxy accepts client requests and forwards them to upstream services. It commonly handles TLS termination, host-based routing, request header forwarding, and policy enforcement in front of self-hosted applications.
## Purpose
Reverse proxies are used to:
- Publish multiple services behind one or a few public entry points
- Centralize TLS certificates
- Apply authentication, authorization, or rate-limiting controls
- Simplify backend service placement and migration
## Architecture Overview
Typical request flow:
```text
Client -> Reverse proxy -> Upstream application
```
Common proxy responsibilities:
- TLS termination and certificate management
- Routing by hostname, path, or protocol
- Forwarding of `Host`, client IP, and other headers
- Optional load balancing across multiple backends
## Common Patterns
### Edge proxy for many internal services
One proxy handles traffic for multiple hostnames:
- `grafana.example.com`
- `gitea.example.com`
- `vault.example.com`
This is a good default for small homelabs and internal platforms.
### Internal proxy behind a VPN
Administrative services are reachable only through a private network such as Tailscale, WireGuard, or a dedicated management VLAN. This reduces public attack surface.
### Path-based routing
Useful when hostnames are limited, but more fragile than host-based routing because some applications assume they live at `/`.
### Dynamic discovery proxy
Tools such as Traefik can watch container metadata and update routes automatically. This reduces manual config for dynamic container environments, but it also makes label hygiene and network policy more important.
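A hedged sketch of the label-driven style; the service name, hostname, and port are assumptions:
```yaml
# Compose fragment: Traefik watches the Docker socket and builds this route from labels
services:
  app:
    image: ghcr.io/example/app:1.2.3
    labels:
      - traefik.enable=true
      - traefik.http.routers.app.rule=Host(`app.example.com`)
      - traefik.http.services.app.loadbalancer.server.port=8080
```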
## Configuration Example
NGINX example:
```nginx
server {
    listen 443 ssl;
    http2 on;
    server_name app.example.com;

    # Certificate paths are placeholders; supply real certificates or the server will not start
    ssl_certificate     /etc/nginx/certs/app.example.com.crt;
    ssl_certificate_key /etc/nginx/certs/app.example.com.key;

    location / {
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_pass http://127.0.0.1:8080;
    }
}
```
Caddy example:
```caddyfile
app.example.com {
reverse_proxy 127.0.0.1:8080
}
```
## Troubleshooting Tips
### Application redirects to the wrong URL
- Check forwarded headers such as `Host` and `X-Forwarded-Proto`
- Verify the application's configured external base URL
- Confirm TLS termination behavior matches application expectations
### WebSocket or streaming traffic fails
- Check proxy support for upgraded connections
- Review buffering behavior if the application expects streaming responses
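For NGINX in particular, upgraded connections need explicit handling; a sketch of the directives to add inside the proxied `location`:
```nginx
# Required for WebSocket upgrades; HTTP/1.1 keeps the connection open
proxy_http_version 1.1;
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection "upgrade";
```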
### Backend works locally but not through the proxy
- Verify the proxy can reach the upstream host and port
- Check the proxy network namespace if running in a container
- Confirm firewall rules permit the proxy-to-upstream path
## Best Practices
- Prefer host-based routing over deep path rewriting
- Publish only the services that need an edge entry point
- Keep proxy configuration under version control
- Use separate internal and public entry points when trust boundaries differ
- Standardize upstream headers and base URL settings across applications
## References
- [NGINX: Reverse Proxy](https://docs.nginx.com/nginx/admin-guide/web-server/reverse-proxy/)
- [Traefik: Routing overview](https://doc.traefik.io/traefik/routing/overview/)
- [Caddy: `reverse_proxy` directive](https://caddyserver.com/docs/caddyfile/directives/reverse_proxy)


@@ -0,0 +1,66 @@
---
title: Service Architecture Patterns
description: Common service architecture patterns for self-hosted platforms and small engineering environments
tags:
- architecture
- services
- infrastructure
category: infrastructure
created: 2026-03-14
updated: 2026-03-14
---
# Service Architecture Patterns
## Summary
Service architecture patterns describe how applications are packaged, connected, exposed, and operated. In self-hosted environments, the most useful patterns balance simplicity, isolation, and operability rather than chasing scale for its own sake.
## Why it matters
Architecture decisions affect deployment complexity, failure domains, recovery steps, and long-term maintenance. Small environments benefit from choosing patterns that remain understandable without full-time platform engineering overhead.
## Core concepts
- Single-service deployment: one service per VM or container stack
- Shared platform services: DNS, reverse proxy, monitoring, identity, backups
- Stateful versus stateless workloads
- Explicit ingress, persistence, and dependency boundaries
- Loose coupling through DNS, reverse proxies, and documented interfaces
## Practical usage
Useful patterns for self-hosted systems include:
- Reverse proxy plus multiple backend services
- Dedicated database service with application separation
- Utility VMs or containers for platform services
- Private admin interfaces with public application ingress kept separate
Example dependency view:
```text
Client -> Reverse proxy -> Application -> Database
-> Identity provider
-> Monitoring and logs
```
## Best practices
- Keep stateful services isolated and clearly backed up
- Make ingress paths and dependencies easy to trace
- Reuse shared platform services where they reduce duplication
- Prefer a small number of well-understood patterns across the environment
## Pitfalls
- Putting every service into one giant stack with unclear boundaries
- Mixing public ingress and administrative paths without review
- Scaling architecture complexity before operational need exists
- Depending on undocumented local assumptions between services
## References
- [Martin Fowler: MonolithFirst](https://martinfowler.com/bliki/MonolithFirst.html)
- [The Twelve-Factor App](https://12factor.net/)
- [NGINX: Reverse Proxy](https://docs.nginx.com/nginx/admin-guide/web-server/reverse-proxy/)


@@ -0,0 +1,122 @@
---
title: Service Discovery
description: Concepts and practical patterns for finding services in self-hosted and homelab environments
tags:
- networking
- service-discovery
- dns
category: infrastructure
created: 2026-03-14
updated: 2026-03-14
---
# Service Discovery
## Introduction
Service discovery is the process of locating services by identity instead of hard-coded IP addresses and ports. It becomes more important as workloads move between hosts, IPs change, or multiple service instances exist behind one logical name.
## Purpose
Good service discovery helps with:
- Decoupling applications from fixed network locations
- Supporting scaling and failover
- Simplifying service-to-service communication
- Reducing manual DNS and inventory drift
## Architecture Overview
There are several discovery models commonly used in self-hosted environments:
- Static DNS: manually managed A, AAAA, CNAME, or SRV records
- DNS-based service discovery: clients query DNS or DNS-SD metadata
- mDNS: local-link multicast discovery for small LANs
- Registry-based discovery: a central catalog such as Consul tracks service registration and health
## Discovery Patterns
### Static DNS
Best for stable infrastructure services such as hypervisors, reverse proxies, storage appliances, and monitoring endpoints.
Example:
```text
proxy.internal.example A 192.168.20.10
grafana.internal.example CNAME proxy.internal.example
```
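To check what a client actually resolves, query the internal resolver directly; the resolver address is a placeholder:
```bash
# Bypasses client-side caching by asking the resolver itself
dig +short proxy.internal.example @192.168.20.1
dig +short grafana.internal.example @192.168.20.1
```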
### DNS-SD and mDNS
Useful for local networks where clients need to discover services such as printers or media endpoints. This works well for small trusted LAN segments, but it does not cross routed boundaries cleanly without extra relays or reflectors.
### Registry-based discovery
A service catalog stores registrations and health checks. Clients query the catalog or use DNS interfaces exposed by the registry.
This is useful when:
- Service instances are dynamic
- Health-aware routing matters
- Multiple nodes host the same service
## Configuration Example
Consul service registration example:
```json
{
"service": {
"name": "gitea",
"port": 3000,
"checks": [
{
"http": "http://127.0.0.1:3000/api/healthz",
"interval": "10s"
}
]
}
}
```
DNS-SD example concept:
```text
_https._tcp.internal.example SRV 0 0 443 proxy.internal.example
```
## Troubleshooting Tips
### Clients resolve a name but still fail to connect
- Check whether the resolved port is correct
- Verify firewall policy and reverse proxy routing
- Confirm the service is healthy, not just registered
### Discovery works on one VLAN but not another
- Review routed DNS access
- Check whether the workload depends on multicast discovery such as mDNS
- Avoid relying on broadcast or multicast across segmented networks unless intentionally supported
### Service records become stale
- Use health checks where possible
- Remove hand-managed DNS entries that no longer match current placements
- Prefer stable canonical names in front of dynamic backends
## Best Practices
- Use DNS as the default discovery mechanism for stable infrastructure
- Add service registries only when the environment is dynamic enough to justify them
- Pair discovery with health checks when multiple instances or failover paths exist
- Keep discovery names human-readable and environment-specific
- Avoid hard-coding IP addresses in application configuration unless there is no realistic alternative
## References
- [Consul: Discover services overview](https://developer.hashicorp.com/consul/docs/discover)
- [Consul: Service discovery explained](https://developer.hashicorp.com/consul/docs/use-case/service-discovery)
- [RFC 6762: Multicast DNS](https://www.rfc-editor.org/rfc/rfc6762)
- [RFC 6763: DNS-Based Service Discovery](https://www.rfc-editor.org/rfc/rfc6763)


@@ -0,0 +1,66 @@
---
title: DNS Architecture
description: Core DNS architecture patterns for self-hosted and homelab environments
tags:
- dns
- networking
- infrastructure
category: networking
created: 2026-03-14
updated: 2026-03-14
---
# DNS Architecture
## Summary
DNS architecture defines how names are assigned, resolved, delegated, and operated across internal and external systems. In self-hosted environments, good DNS design reduces configuration drift, improves service discoverability, and simplifies remote access.
## Why it matters
DNS is a foundational dependency for reverse proxies, TLS, service discovery, monitoring, and operator workflows. Weak DNS design creates brittle systems that depend on hard-coded IP addresses and manual recovery steps.
## Core concepts
- Authoritative DNS: the source of truth for a zone
- Recursive resolution: the process clients use to resolve names
- Internal DNS: records intended only for private services
- Split-horizon DNS: different answers depending on the client context
- TTL: cache lifetime that affects propagation and change speed
## Practical usage
A practical self-hosted DNS model often includes:
- Public DNS for internet-facing records
- Internal DNS for management and private services
- Reverse proxy hostnames for application routing
- Stable names for infrastructure services such as hypervisors, backup targets, and monitoring systems
Example record set:
```text
proxy.example.net A 198.51.100.20
grafana.internal.example A 192.0.2.20
gitea.internal.example CNAME proxy.internal.example
```
## Best practices
- Use DNS names instead of embedding IP addresses in application config
- Separate public and private naming where trust boundaries differ
- Keep TTLs appropriate for the change rate of the record
- Treat authoritative DNS as critical infrastructure with backup and access control
## Pitfalls
- Reusing the same name for unrelated services over time
- Forgetting that split DNS can confuse troubleshooting if undocumented
- Leaving DNS ownership unclear across platforms and providers
- Building service dependencies on local `/etc/hosts` entries
## References
- [Cloudflare Learning Center: What is DNS?](https://www.cloudflare.com/learning/dns/what-is-dns/)
- [RFC 1034: Domain Concepts and Facilities](https://www.rfc-editor.org/rfc/rfc1034)
- [RFC 1035: Domain Implementation and Specification](https://www.rfc-editor.org/rfc/rfc1035)


@@ -0,0 +1,131 @@
---
title: Network Segmentation for Homelabs
description: Practical network segmentation patterns for separating trust zones in a homelab
tags:
- networking
- security
- homelab
category: networking
created: 2026-03-14
updated: 2026-03-14
---
# Network Segmentation for Homelabs
## Introduction
Network segmentation reduces blast radius by separating devices and services into smaller trust zones. In a homelab, this helps isolate management systems, user devices, public services, and less trusted endpoints such as IoT equipment.
## Purpose
Segmentation is useful for:
- Limiting lateral movement after a compromise
- Keeping management interfaces off general user networks
- Isolating noisy or untrusted devices
- Applying different routing, DNS, and firewall policies per zone
## Architecture Overview
A practical homelab usually benefits from separate L3 segments or VLANs for at least the following areas:
- Management: hypervisors, switches, storage admin interfaces
- Servers: application VMs, container hosts, databases
- Clients: laptops, desktops, mobile devices
- IoT: cameras, media devices, printers, controllers
- Guest: devices that should only reach the internet
- Storage or backup: optional dedicated replication path
Example layout:
```text
VLAN 10 Management 192.168.10.0/24
VLAN 20 Servers 192.168.20.0/24
VLAN 30 Clients 192.168.30.0/24
VLAN 40 IoT 192.168.40.0/24
VLAN 50 Guest 192.168.50.0/24
```
Traffic should pass through a firewall or router between zones instead of being bridged freely.
## Design Guidelines
### Segment by trust and function
Start with simple boundaries:
- High trust: management, backup, secrets infrastructure
- Medium trust: internal application servers
- Lower trust: personal devices, guest devices, consumer IoT
### Route between zones with policy
Use inter-VLAN routing with explicit firewall rules. Default deny between segments is easier to reason about than a flat network with ad hoc exceptions.
### Use DNS intentionally
- Give internal services stable names
- Avoid exposing management DNS records to guest or IoT segments
- Consider split DNS for remote access through Tailscale or another VPN
### Minimize overlap
Use clean RFC 1918 address plans and document them. Overlapping subnets complicate VPN routing, container networking, and future site expansion.
## Configuration Example
Example policy intent for a firewall:
```text
Allow Clients -> Servers : TCP 80,443
Allow Management -> Servers : any
Allow Servers -> Storage : TCP 2049,445,3260 as needed
Deny IoT -> Management : any
Deny Guest -> Internal RFC1918 ranges : any
```
Example address planning notes:
```text
192.168.10.0/24 Management
192.168.20.0/24 Server workloads
192.168.30.0/24 User devices
192.168.40.0/24 IoT
192.168.50.0/24 Guest
fd00:10::/64 IPv6 management ULA
```
## Troubleshooting Tips
### Service works from one VLAN but not another
- Check the inter-VLAN firewall rule order
- Confirm DNS resolves to the intended internal address
- Verify the destination service is listening on the right interface
### VPN users can reach too much
- Review ACLs or firewall policy for routed VPN traffic
- Publish only the required subnets through subnet routers
- Avoid combining management and user services in the same routed segment
### Broadcast-dependent services break across segments
- Use unicast DNS or service discovery where possible
- For mDNS-dependent workflows, consider a reflector only where justified
- Do not flatten the network just to support one legacy discovery method
## Best Practices
- Keep management on its own segment from the beginning
- Treat IoT and guest networks as untrusted
- Document every VLAN, subnet, DHCP scope, and routing rule
- Prefer L3 policy enforcement over broad L2 access
- Revisit segmentation when new services expose public endpoints or remote admin paths
## References
- [RFC 1918: Address Allocation for Private Internets](https://www.rfc-editor.org/rfc/rfc1918)
- [RFC 4193: Unique Local IPv6 Unicast Addresses](https://www.rfc-editor.org/rfc/rfc4193)
- [Tailscale: Subnet routers](https://tailscale.com/kb/1019/subnets)
- [Tailscale: Access controls](https://tailscale.com/kb/1018/acls)

View File

@@ -0,0 +1,123 @@
---
title: Tailscale Overview
description: Conceptual overview of how Tailscale works and where it fits in a homelab or engineering environment
tags:
- networking
- tailscale
- vpn
category: networking
created: 2026-03-14
updated: 2026-03-14
---
# Tailscale Overview
## Introduction
Tailscale is a mesh VPN built on WireGuard. It provides secure connectivity between devices without requiring a traditional hub-and-spoke VPN concentrator for day-to-day traffic. In practice, it is often used to reach homelab services, administrative networks, remote workstations, and private developer environments.
## Purpose
The main purpose of Tailscale is to make private networking easier to operate:
- Identity-based access instead of exposing services directly to the internet
- Encrypted device-to-device connectivity
- Simple onboarding across laptops, servers, phones, and virtual machines
- Optional features for routing subnets, advertising exit nodes, and publishing services
## Architecture Overview
Tailscale separates coordination from data transfer.
- Control plane: devices authenticate to Tailscale and exchange node information, keys, policy, and routing metadata
- Data plane: traffic is encrypted with WireGuard and sent directly between peers whenever possible
- Relay fallback: when direct peer-to-peer connectivity is blocked, traffic can traverse DERP relays
Typical flow:
```text
Client -> Tailscale control plane for coordination
Client <-> Peer direct WireGuard tunnel when possible
Client -> DERP relay -> Peer when direct connectivity is unavailable
```
Important components:
- Tailnet: the private network that contains your devices and policies
- ACLs or grants: rules that control which identities can reach which resources
- Tags: non-human identities for servers and automation
- MagicDNS: tailnet DNS names for easier service discovery
- Subnet routers: devices that advertise non-Tailscale LAN routes
- Exit nodes: devices that forward default internet-bound traffic
## Core Concepts
### Identity first
Tailscale access control is tied to users, groups, devices, and tags rather than only source IP addresses. This works well for environments where laptops move between networks and services are distributed across cloud and on-prem hosts.
### Peer-to-peer by default
When NAT traversal succeeds, traffic goes directly between devices. This reduces latency and avoids creating a permanent bottleneck on one VPN server.
### Overlay networking
Each device keeps its normal local network connectivity and also gains a Tailscale address space. This makes it useful for remote administration without redesigning the entire local network.
## Configuration Example
Install and authenticate a Linux node:
```bash
curl -fsSL https://tailscale.com/install.sh | sh
sudo tailscale up
tailscale status
```
Advertise the node as infrastructure with a tag:
```bash
sudo tailscale up --advertise-tags=tag:server
```
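Subnet routing and exit-node roles use the same command. A sketch, noting that `tailscale up` applies its flags as a complete set (so repeat existing ones), the advertised subnet is a placeholder, Linux nodes need IP forwarding enabled, and both roles require approval in the admin console:
```bash
# Re-specify existing flags: tailscale up treats settings as one set
sudo tailscale up --advertise-tags=tag:server \
  --advertise-routes=192.168.20.0/24 \
  --advertise-exit-node
```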
## Operational Notes
- Use ACLs or grants early instead of leaving the entire tailnet flat
- Use tags for servers, containers, and automation agents
- Prefer MagicDNS or split DNS over hard-coded IP lists
- Treat subnet routers and exit nodes as infrastructure roles with extra review
## Troubleshooting
### Device is connected but cannot reach another node
- Check whether ACLs or grants allow the connection
- Confirm the target device is online with `tailscale status`
- Verify the service is listening on the expected interface and port
### Traffic is slower than expected
- Confirm whether the connection is direct or relayed through DERP (`tailscale ping <peer>` reports which path is in use)
- Inspect firewall and NAT behavior on both sides
- Check whether the path crosses an exit node or subnet router unnecessarily
### DNS names do not resolve
- Verify MagicDNS is enabled
- Check the client resolver configuration
- Confirm the hostname exists in the tailnet admin UI
## Best Practices
- Use identity-based policies and avoid broad any-to-any access
- Separate human users from infrastructure with groups and tags
- Limit high-trust roles such as subnet routers and exit nodes
- Document which services are intended for tailnet-only access
- Keep the local firewall enabled; Tailscale complements it rather than replacing it
## References
- [Tailscale: What is Tailscale?](https://tailscale.com/kb/1151/what-is-tailscale)
- [Tailscale: How NAT traversal works](https://tailscale.com/blog/how-nat-traversal-works)
- [Tailscale: Access controls](https://tailscale.com/kb/1018/acls)
- [Tailscale: MagicDNS](https://tailscale.com/kb/1081/magicdns)


@@ -0,0 +1,120 @@
---
title: GPG Basics
description: Overview of core GnuPG concepts, key management, and common operational workflows
tags:
- security
- gpg
- encryption
category: security
created: 2026-03-14
updated: 2026-03-14
---
# GPG Basics
## Introduction
GPG (GnuPG, the GNU Privacy Guard) provides public-key encryption, signing, and verification. It remains common for signing Git commits and tags, exchanging encrypted files, and maintaining long-term personal or team keys.
## Purpose
This document covers:
- What GPG keys and subkeys are
- Common encryption and signing workflows
- Key management practices that matter operationally
## Architecture Overview
A practical GPG setup often includes:
- Primary key: used mainly for certification and identity management
- Subkeys: used for signing, encryption, or authentication
- Revocation certificate: lets you invalidate a lost or compromised key
- Public key distribution: keyserver, WKD, or direct sharing
The primary key should be treated as more sensitive than everyday-use subkeys.
## Core Workflows
### Generate a key
Interactive generation:
```bash
gpg --full-generate-key
```
List keys:
```bash
gpg --list-secret-keys --keyid-format=long
```
### Export the public key
```bash
gpg --armor --export KEYID
```
### Encrypt a file for a recipient
```bash
gpg --encrypt --recipient KEYID secrets.txt
```
### Sign a file
```bash
gpg --detach-sign --armor release.tar.gz
```
### Verify a signature
```bash
gpg --verify release.tar.gz.asc release.tar.gz
```
## Configuration Example
Export a revocation certificate after key creation:
```bash
gpg --output revoke-KEYID.asc --gen-revoke KEYID
```
Store that revocation certificate offline in a secure location.
## Troubleshooting Tips
### Encryption works but trust warnings appear
- Confirm you imported the correct public key
- Verify fingerprints out of band before marking a key as trusted
- Do not treat keyserver availability as proof of identity
### Git signing fails
- Check that Git points to the expected key ID
- Confirm the GPG agent is running
- Verify terminal pinentry integration on the local system
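A minimal Git signing setup, with the key ID as a placeholder:
```bash
# Point Git at the signing key and sign all commits by default
git config --global user.signingkey 3AA5C34371567BD2
git config --global commit.gpgsign true
```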
### Lost laptop or corrupted keyring
- Restore from secure backups
- Revoke compromised keys if needed
- Reissue or rotate subkeys while keeping identity documentation current
## Best Practices
- Keep the primary key offline when practical and use subkeys day to day
- Generate and safely store a revocation certificate immediately
- Verify key fingerprints through a trusted secondary channel
- Back up secret keys securely before relying on them operationally
- Use GPG where it fits existing tooling; do not force it into workflows that are better served by simpler modern tools
## References
- [GnuPG Documentation](https://www.gnupg.org/documentation/)
- [The GNU Privacy Handbook](https://www.gnupg.org/gph/en/manual/book1.html)
- [GnuPG manual](https://www.gnupg.org/documentation/manuals/gnupg/)


@@ -0,0 +1,65 @@
---
title: Identity and Authentication
description: Core concepts and patterns for identity, authentication, and authorization in self-hosted systems
tags:
- security
- identity
- authentication
category: security
created: 2026-03-14
updated: 2026-03-14
---
# Identity and Authentication
## Summary
Identity and authentication define who or what is requesting access and how that claim is verified. In self-hosted environments, a clear identity model is essential for secure remote access, service-to-service trust, and administrative control.
## Why it matters
As environments grow, per-application local accounts become hard to manage and harder to audit. Shared identity patterns reduce duplicated credentials, improve MFA coverage, and make access revocation more predictable.
## Core concepts
- Identity: the user, service, or device being represented
- Authentication: proving that identity
- Authorization: deciding what the identity may do
- Federation: delegating identity verification to a trusted provider
- MFA: requiring more than one authentication factor
## Practical usage
Common self-hosted patterns include:
- Central identity provider for user login
- SSO using OIDC or SAML for web applications
- SSH keys or hardware-backed credentials for administrative access
- Service accounts with narrowly scoped machine credentials
Example pattern:
```text
User -> Identity provider -> OIDC token -> Reverse proxy or application
Admin -> VPN -> SSH key or hardware-backed credential -> Server
```
## Best practices
- Centralize user identity where possible
- Enforce MFA for admin and internet-facing accounts
- Separate human accounts from machine identities
- Review how account disablement or key rotation propagates across services
## Pitfalls
- Leaving critical systems on isolated local accounts with no lifecycle control
- Reusing the same credentials across multiple services
- Treating authentication and authorization as the same problem
- Forgetting account recovery and break-glass access paths
## References
- [OpenID Connect Core 1.0](https://openid.net/specs/openid-connect-core-1_0.html)
- [NIST Digital Identity Guidelines](https://pages.nist.gov/800-63-3/)
- [Yubico developer documentation](https://developers.yubico.com/)


@@ -0,0 +1,109 @@
---
title: Secrets Management
description: Principles and tool choices for handling secrets safely in self-hosted and engineering environments
tags:
- security
- secrets
- devops
category: security
created: 2026-03-14
updated: 2026-03-14
---
# Secrets Management
## Introduction
Secrets management is the practice of storing, distributing, rotating, and auditing sensitive values such as API tokens, database passwords, SSH private keys, and certificate material.
## Purpose
Good secrets management helps you:
- Keep credentials out of Git and chat logs
- Reduce accidental disclosure in deployment pipelines
- Rotate credentials without rewriting every system by hand
- Apply least privilege to applications and operators
## Architecture Overview
A practical secrets strategy distinguishes between:
- Human secrets: admin credentials, recovery codes, hardware token backups
- Machine secrets: database passwords, API tokens, TLS private keys
- Dynamic secrets: short-lived credentials issued on demand
- Encrypted configuration: secrets stored in version control in encrypted form
Common tooling patterns:
- Vault for centrally managed and dynamic secrets
- SOPS for Git-managed encrypted secret files
- Platform-native secret stores for specific runtimes
## Operational Model
### Centralized secret service
A service such as Vault handles storage, access policy, audit logging, and secret issuance. This is most useful when you need rotation, leasing, or many consumers across multiple environments.
### Encrypted files in Git
Tools such as SOPS allow you to keep encrypted configuration alongside deployment code. This is useful for small teams and GitOps-style workflows, as long as decryption keys are managed carefully.
### Runtime injection
Applications should receive secrets at runtime through a controlled delivery path rather than through hard-coded values inside images or repositories.
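One concrete delivery path, sketched with Docker Compose file-based secrets; the service name, secret name, and file path are assumptions:
```yaml
# The value is mounted at /run/secrets/db_password inside the container,
# never baked into the image or committed to Git in plaintext
services:
  app:
    image: ghcr.io/example/app:1.2.3
    secrets:
      - db_password
secrets:
  db_password:
    file: ./secrets/db_password.txt
```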
## Configuration Example
Example placeholder environment file layout:
```text
APP_DATABASE_URL=postgres://app:${DB_PASSWORD}@db.internal.example/app
APP_SMTP_PASSWORD=<provided-at-runtime>
```
Example SOPS-managed YAML structure:
```yaml
database:
user: app
password: ENC[AES256_GCM,data:...,type:str]
smtp:
password: ENC[AES256_GCM,data:...,type:str]
```
## Troubleshooting Tips
### Secret appears in logs or shell history
- Remove it from the source immediately if exposure is ongoing
- Rotate the credential instead of assuming it stayed private
- Review the delivery path that leaked it
### Encrypted config exists but deployments still fail
- Verify the deployment environment has access to the correct decryption keys
- Check whether placeholders or environment interpolation are incomplete
- Confirm the application reads secrets from the documented location
### Secret sprawl grows over time
- Inventory where secrets live and who owns them
- Standardize naming and rotation intervals
- Remove stale credentials from old hosts and repos
## Best Practices
- Never commit plaintext secrets to Git
- Prefer short-lived or scoped credentials where the platform supports them
- Separate secret storage from application images
- Rotate credentials after incidents, staff changes, and major platform migrations
- Document ownership, rotation method, and recovery path for every critical secret
## References
- [HashiCorp Vault: What is Vault?](https://developer.hashicorp.com/vault/docs/what-is-vault)
- [HashiCorp Vault documentation](https://developer.hashicorp.com/vault/docs)
- [SOPS documentation](https://getsops.io/docs/)
- [The Twelve-Factor App: Config](https://12factor.net/config)