first version of the knowledge base :)

30 - Systems/homelab/backup-architecture.md

---
title: Backup Architecture
description: Reference backup architecture for self-hosted services, data, and infrastructure components
tags:
- backup
- architecture
- self-hosting
category: systems
created: 2026-03-14
updated: 2026-03-14
---

# Backup Architecture

## Summary

A backup architecture defines what is protected, where copies live, and how recovery is validated. In self-hosted environments, the architecture must account for application data, infrastructure configuration, and the operational steps needed to restore service safely.

## Why it matters

Many backup failures are architectural rather than tool-specific. Storing copies on the wrong system, skipping configuration, or never testing restores can make an otherwise successful backup job useless during an incident.

## Core concepts

- Multiple copies across different failure domains
- Separation of live storage, backup storage, and off-site retention
- Consistent backups for databases and stateful services
- Restore validation as part of the architecture

## Practical usage

A practical backup architecture usually includes:

- Host or VM backups for infrastructure nodes
- File or repository backups for application data
- Separate backup of configuration, Compose files, and DNS or proxy settings
- Off-site encrypted copy of critical repositories

Example model:

```text
Primary workloads     -> Local backup repository      -> Off-site encrypted copy
Infrastructure config -> Git + encrypted secret store -> Off-site mirror
```

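The copy-count and placement rules above are often summarized as the 3-2-1 guideline: at least three copies, on two different media, with one off-site. A minimal sketch of checking a copy inventory against it — the `Copy` record and its field names are illustrative, not any real tool's schema:

```python
from dataclasses import dataclass

@dataclass
class Copy:
    """One stored copy of a dataset (hypothetical inventory record)."""
    location: str   # e.g. "nas-01", "backup-01", "cloud-bucket"
    medium: str     # e.g. "zfs", "restic-repo", "object-storage"
    offsite: bool

def check_3_2_1(copies: list[Copy]) -> list[str]:
    """Return violations of the 3-2-1 guideline for one dataset."""
    problems = []
    if len(copies) < 3:
        problems.append(f"only {len(copies)} copies, want 3")
    if len({c.medium for c in copies}) < 2:
        problems.append("all copies share one storage medium")
    if not any(c.offsite for c in copies):
        problems.append("no off-site copy")
    return problems
```

A check like this fits naturally into a periodic review job: the architecture's copy rules become testable rather than tribal knowledge.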
## Best practices

- Back up both data and the metadata needed to use it
- Keep at least one copy outside the main site or storage domain
- Use backup tooling that supports verification and restore inspection
- Make restore order and dependency assumptions explicit

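The last practice lends itself to code: restore order can be expressed as a dependency graph and sorted topologically, so the assumptions are written down and checkable. A sketch using Python's standard `graphlib`, with hypothetical service names:

```python
from graphlib import TopologicalSorter  # stdlib, Python 3.9+

# Hypothetical restore dependencies: each service lists what must be
# restored and healthy before it can come back.
restore_deps = {
    "dns":           [],
    "secrets":       ["dns"],
    "reverse-proxy": ["dns"],
    "database":      ["secrets"],
    "app":           ["database", "reverse-proxy"],
}

# static_order() yields services with dependencies first; it raises
# CycleError if someone documents a circular dependency.
restore_order = list(TopologicalSorter(restore_deps).static_order())
```

Keeping this graph next to the backup documentation makes restore drills repeatable: the drill follows the computed order instead of memory.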
## Pitfalls

- Treating snapshots as the only backup mechanism
- Backing up encrypted data without preserving key recovery paths
- Assuming application consistency without database-aware handling
- Skipping restore drills for high-value services

## References

- [restic documentation](https://restic.readthedocs.io/en/latest/)
- [BorgBackup documentation](https://borgbackup.readthedocs.io/en/stable/)
- [Proxmox VE Backup and Restore](https://pve.proxmox.com/pve-docs/chapter-vzdump.html)


30 - Systems/homelab/homelab-architecture.md

---
title: Homelab Architecture
description: Reference architecture for building a maintainable homelab with clear trust zones and operational boundaries
tags:
- homelab
- architecture
- infrastructure
category: systems
created: 2026-03-14
updated: 2026-03-14
---

# Homelab Architecture

## Introduction

A homelab architecture should make experimentation possible without turning the environment into an undocumented collection of one-off systems. The most effective designs separate compute, networking, storage, identity, and operations concerns so each layer can evolve without breaking everything above it.

## Purpose

This document describes a reusable architecture for:

- Self-hosted services
- Virtualization and container workloads
- Secure remote access
- Monitoring, backup, and update workflows

## Architecture Overview

A practical homelab can be viewed as layered infrastructure:

```text
Edge and Access
  -> ISP/router, firewall, VPN, reverse proxy

Network Segmentation
  -> management, servers, clients, IoT, guest

Compute
  -> Proxmox nodes, VMs, container hosts

Platform Services
  -> DNS, reverse proxy, identity, secrets, service discovery

Application Services
  -> dashboards, git forge, media, automation, monitoring

Data Protection
  -> backups, snapshots, off-site copy, restore testing
```

## Recommended Building Blocks

### Access and identity

- VPN or zero-trust access layer for administrative entry
- SSH with keys only for infrastructure access
- DNS as the primary naming system for internal services

### Compute

- Proxmox for VM and LXC orchestration
- Dedicated container hosts for Docker or another runtime
- Utility VMs for DNS, reverse proxy, monitoring, and automation

### Storage

- Fast local storage for active workloads
- Separate backup target with different failure characteristics
- Clear distinction between snapshots and real backups

### Observability and operations

- Metrics collection with Prometheus-compatible exporters
- Dashboards and alerting through Grafana and Alertmanager
- Centralized backup jobs and restore validation
- Controlled update workflow for host OS, containers, and dependencies

## Example Layout

```text
VLAN 10  Management  hypervisors, switches, storage admin
VLAN 20  Servers     reverse proxy, app VMs, databases
VLAN 30  Clients     desktops, laptops, admin workstations
VLAN 40  IoT         cameras, smart home, media devices
VLAN 50  Guest       internet-only devices
```

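One way to keep such a layout consistent is a simple addressing convention, for example mapping VLAN N to `10.0.N.0/24` — an assumption for illustration, not a requirement of the design. Python's standard `ipaddress` module can then classify any host by zone:

```python
import ipaddress

# Illustrative convention: VLAN N maps to 10.0.N.0/24.
VLANS = {10: "Management", 20: "Servers", 30: "Clients", 40: "IoT", 50: "Guest"}

def subnet_for(vlan_id: int) -> ipaddress.IPv4Network:
    """Derive the subnet for a VLAN under the assumed convention."""
    return ipaddress.ip_network(f"10.0.{vlan_id}.0/24")

def zone_of(ip: str) -> str:
    """Classify an address by the VLAN subnet that contains it."""
    addr = ipaddress.ip_address(ip)
    for vlan_id, name in VLANS.items():
        if addr in subnet_for(vlan_id):
            return name
    return "unknown"
```

A convention like this keeps firewall rules, DNS records, and documentation derivable from the VLAN number instead of maintained by hand.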
Service placement example:

- Reverse proxy and DNS on small utility VMs
- Stateful applications in dedicated VMs or clearly documented persistent containers
- Monitoring and backup services isolated from guest and IoT traffic

## Configuration Example

Example inventory model:

```yaml
edge:
  router: gateway-01
  vpn: tailscale

compute:
  proxmox:
    - pve-01
    - pve-02
    - pve-03
  docker_hosts:
    - docker-01

platform:
  dns:
    - dns-01
  reverse_proxy:
    - proxy-01
  monitoring:
    - mon-01
  backup:
    - backup-01
```

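An inventory like this is also a natural input for automated checks. A hedged sketch, operating on the parsed structure as a plain dict (e.g. the output of `yaml.safe_load`) and enforcing two illustrative rules: every platform role has at least one host, and no hostname is reused:

```python
def validate_inventory(inv: dict) -> list[str]:
    """Check a parsed inventory for empty platform roles and
    duplicate hostnames. Rules are illustrative, not exhaustive."""
    errors = []
    hosts = list(inv.get("compute", {}).get("proxmox", []))
    hosts += inv.get("compute", {}).get("docker_hosts", [])
    for role, members in inv.get("platform", {}).items():
        if not members:
            errors.append(f"platform role '{role}' has no host")
        hosts += members
    seen = set()
    for h in hosts:
        if h in seen:
            errors.append(f"duplicate hostname: {h}")
        seen.add(h)
    return errors

# Mirrors the YAML example above as a Python literal.
inventory = {
    "compute": {"proxmox": ["pve-01", "pve-02", "pve-03"],
                "docker_hosts": ["docker-01"]},
    "platform": {"dns": ["dns-01"], "reverse_proxy": ["proxy-01"],
                 "monitoring": ["mon-01"], "backup": ["backup-01"]},
}
```

Running such a check in CI keeps the inventory honest as the lab grows.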
## Troubleshooting Tips

### Services are easy to deploy but hard to operate

- Add inventory, ownership, and restore notes
- Separate platform services from experimental application stacks
- Avoid hiding critical dependencies inside one large Compose file

### Changes in one area break unrelated systems

- Recheck network boundaries and shared credentials
- Remove unnecessary coupling between storage, reverse proxy, and app hosts
- Keep DNS, secrets, and backup dependencies explicit

### Remote access becomes risky over time

- Review which services are internet-exposed
- Prefer tailnet-only or VPN-only admin paths
- Keep management interfaces off user-facing networks

## Best Practices

- Design around failure domains, not only convenience
- Keep a small number of core platform services well documented
- Prefer simple, replaceable building blocks over fragile all-in-one stacks
- Maintain an asset inventory with hostnames, roles, and backup coverage
- Test recovery paths for DNS, identity, and backup infrastructure first

## References

- [Proxmox VE Administration Guide: Cluster Manager](https://pve.proxmox.com/pve-docs/chapter-pvecm.html)
- [Docker: Docker overview](https://docs.docker.com/get-started/docker-overview/)
- [Tailscale: What is Tailscale?](https://tailscale.com/kb/1151/what-is-tailscale)
- [Prometheus](https://prometheus.io/)


30 - Systems/homelab/homelab-network-architecture.md

---
title: Homelab Network Architecture
description: Reference network architecture for a segmented homelab with private access and clear service boundaries
tags:
- homelab
- networking
- architecture
category: systems
created: 2026-03-14
updated: 2026-03-14
---

# Homelab Network Architecture

## Summary

A homelab network architecture should separate trust zones, keep administrative paths private, and make service traffic easy to reason about. The goal is not enterprise complexity, but a structure that reduces blast radius and operational confusion.

## Why it matters

Flat networks are easy to start with and difficult to secure later. A basic segmented design helps isolate management, servers, clients, guest devices, and less trusted endpoints such as IoT hardware.

## Core concepts

- Segmentation by trust and function
- Routed inter-VLAN policy instead of unrestricted layer-2 reachability
- Separate administrative access paths from public ingress
- DNS and reverse proxy as shared network-facing platform services

## Practical usage

Example logical layout:

```text
Management  -> hypervisors, switches, storage admin
Servers     -> applications, databases, utility VMs
Clients     -> workstations and laptops
IoT         -> low-trust devices
Guest       -> internet-only access
VPN overlay -> remote access for administrators and approved services
```

This model works well with:

- A firewall or router handling inter-segment policy
- Private access through Tailscale or another VPN
- Reverse proxy entry points for published applications

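The inter-segment policy itself can be kept explicit as a default-deny allowlist. A minimal sketch with illustrative zone names and rules — the real matrix lives in the firewall; this just documents the intended policy in a testable form:

```python
# Default-deny inter-segment policy as an explicit allowlist.
# Zone pairs are (source, destination); rules are illustrative.
ALLOWED = {
    ("Clients",    "Servers"),   # users reach published apps
    ("Management", "Servers"),   # admin access to the server segment
    ("Management", "IoT"),       # admin access to the device segment
    ("Servers",    "Servers"),   # east-west within the server zone
}

def is_allowed(src: str, dst: str) -> bool:
    """Return True only for explicitly allowed zone pairs."""
    if src == dst:
        return True  # intra-segment traffic is not routed anyway
    return (src, dst) in ALLOWED
```

Keeping the matrix in version control makes "who can reach what" reviewable, and drift between intent and firewall configuration easier to spot.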
## Best practices

- Keep management services on a dedicated segment
- Use DNS names and documented routes instead of ad hoc host entries
- Limit which segments can reach storage, backup, and admin systems
- Treat guest and IoT networks as untrusted

## Pitfalls

- Publishing management interfaces through the same path as public apps
- Allowing lateral access between all segments for convenience
- Forgetting to document routing and firewall dependencies
- Relying on multicast-based discovery across routed segments without a plan

## References

- [RFC 1918: Address Allocation for Private Internets](https://www.rfc-editor.org/rfc/rfc1918)
- [RFC 4193: Unique Local IPv6 Unicast Addresses](https://www.rfc-editor.org/rfc/rfc4193)
- [Tailscale: Subnet routers](https://tailscale.com/kb/1019/subnets)


30 - Systems/homelab/identity-management-patterns.md

---
title: Identity Management Patterns
description: System-level identity management patterns for self-hosted and homelab environments
tags:
- identity
- authentication
- architecture
category: systems
created: 2026-03-14
updated: 2026-03-14
---

# Identity Management Patterns

## Summary

Identity management patterns describe how users, devices, and services are authenticated and governed across a self-hosted environment. Strong patterns reduce credential sprawl and make account lifecycle management more consistent.

## Why it matters

As services multiply, local account management becomes a source of weak passwords, missed offboarding, and inconsistent MFA coverage. A system-level identity pattern helps centralize trust while preserving operational fallback paths.

## Core concepts

- Central identity provider for users
- Federated login to applications through OIDC or SAML
- Strong admin authentication for infrastructure access
- Separate handling for service accounts and machine credentials

## Practical usage

A practical identity pattern often looks like:

```text
Users    -> Identity provider -> Web applications
Admins   -> VPN + SSH key or hardware-backed credential -> Infrastructure
Services -> Scoped machine credentials -> Databases and APIs
```

Supporting services may include:

- MFA-capable identity provider
- Reverse proxy integration for auth-aware routing
- Secrets management for service credentials

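The "scoped machine credentials" leg can be made concrete: each credential is bound to one consuming service, one audience, and a narrow set of actions, and any use outside those bounds is rejected. A sketch with hypothetical names — a real deployment would issue tokens from the secrets manager or identity provider rather than in-memory records:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ServiceCredential:
    """Illustrative machine credential bound to one consumer and purpose."""
    service: str         # which service holds it, e.g. "grafana"
    audience: str        # which system accepts it, e.g. "postgres-01"
    scopes: frozenset    # narrow set of permitted actions

def authorize(cred: ServiceCredential, target: str, action: str) -> bool:
    """Reject any use outside the credential's audience and scopes."""
    return cred.audience == target and action in cred.scopes

# A read-only credential for one dashboard-to-database path.
grafana_db = ServiceCredential("grafana", "postgres-01", frozenset({"read"}))
```

The design point is that a leaked credential compromises one path, not every system that happens to share a password.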
## Best practices

- Centralize user login where applications support it
- Require MFA for administrative and internet-exposed access
- Keep service credentials scoped to one system or purpose
- Maintain documented break-glass and recovery procedures

## Pitfalls

- Treating shared admin accounts as acceptable long-term practice
- Leaving old local users in place after federation is introduced
- Using one service credential across many applications
- Forgetting to protect the identity provider as critical infrastructure

## References

- [OpenID Connect Core 1.0](https://openid.net/specs/openid-connect-core-1_0.html)
- [NIST Digital Identity Guidelines](https://pages.nist.gov/800-63-3/)
- [Yubico developer documentation](https://developers.yubico.com/)


30 - Systems/observability/monitoring-stack-architecture.md

---
title: Monitoring Stack Architecture
description: Reference architecture for a monitoring stack in a self-hosted or homelab environment
tags:
- monitoring
- observability
- architecture
category: systems
created: 2026-03-14
updated: 2026-03-14
---

# Monitoring Stack Architecture
|
||||
|
||||
## Summary
|
||||
|
||||
A monitoring stack architecture defines how metrics, probes, dashboards, and alerts fit together. In self-hosted environments, the stack should stay small enough to operate but broad enough to cover infrastructure, ingress, and critical services.
|
||||
|
||||
## Why it matters
|
||||
|
||||
Monitoring that is bolted on late often misses the services operators actually depend on. A planned stack architecture makes it easier to understand where signals come from and how alerts reach the right people.
|
||||
|
||||
## Core concepts
|
||||
|
||||
- Collection: exporters and scrape targets
|
||||
- Storage and evaluation: Prometheus
|
||||
- Visualization: Grafana
|
||||
- Alert routing: Alertmanager
|
||||
- External validation: blackbox or equivalent endpoint checks
|
||||
|
||||
## Practical usage

Typical architecture:

```text
Hosts and services -> Exporters / probes -> Prometheus
Prometheus -> Grafana dashboards
Prometheus -> Alertmanager -> notification channel
```

Recommended coverage:

- Host metrics for compute and storage systems
- Endpoint checks for user-facing services
- Backup freshness and certificate expiry
- Platform services such as DNS, reverse proxy, and identity provider

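The backup-freshness and certificate-expiry items reduce to time comparisons that a small exporter or scheduled job can evaluate and publish to Prometheus. A minimal sketch — function names and thresholds are illustrative:

```python
from datetime import datetime, timedelta, timezone

def is_stale(last_snapshot: datetime, max_age: timedelta,
             now: datetime) -> bool:
    """Flag a backup repo whose newest snapshot is older than max_age."""
    return now - last_snapshot > max_age

def cert_expiry_alert(not_after: datetime, now: datetime,
                      warn_days: int = 14) -> bool:
    """Warn when a certificate expires within warn_days."""
    return not_after - now < timedelta(days=warn_days)
```

Passing `now` explicitly keeps both checks trivially testable; in production the job would read snapshot timestamps and certificate `notAfter` dates from the systems being monitored.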
## Best practices

- Monitor the path users depend on, not only the host underneath it
- Keep the monitoring stack itself backed up and access controlled
- Alert on actionable failures rather than every threshold crossing
- Document ownership for critical alerts and dashboards

## Pitfalls

- Monitoring only CPU and memory while ignoring ingress and backups
- Running a complex stack with no retention or alert review policy
- Depending on dashboards alone for outage detection
- Forgetting to monitor the monitoring components themselves

## References

- [Prometheus overview](https://prometheus.io/docs/introduction/overview/)
- [Prometheus Alertmanager overview](https://prometheus.io/docs/alerting/latest/overview/)
- [Prometheus `node_exporter`](https://github.com/prometheus/node_exporter)
- [Grafana documentation](https://grafana.com/docs/grafana/latest/)