first version of the knowledge base :)

2026-03-14 11:41:54 +01:00
commit 27965301ad
47 changed files with 4356 additions and 0 deletions


@@ -0,0 +1,59 @@
---
title: About Den Vault
description: Purpose and scope of Den Vault as a public technical knowledge base within the Hidden Den ecosystem
tags:
- about
- documentation
- knowledge-base
category: about
created: 2026-03-14
updated: 2026-03-14
---
# About Den Vault
## Summary
Den Vault is the public technical knowledge base for Hidden Den. It is designed as an engineering notebook, operations handbook, and reusable documentation archive for self-hosting, homelab engineering, DevOps workflows, Linux systems, and infrastructure design.
## Why it matters
Technical notes are most useful when they outlive the moment that produced them. Den Vault turns ad hoc knowledge into structured documentation that can help both the original operator and other engineers facing similar problems.
## Core concepts
- Public technical writing with reusable examples
- Documentation that favors systems, patterns, and operational guidance over personal notes
- Cross-linkable knowledge organized by domain, system, guide, and tool
- Research-backed articles that prefer official sources and current practices
- A human-scale technical archive that is meant to remain readable and calm rather than over-produced
## Practical usage
Den Vault is intended to document:
- Core infrastructure concepts
- Practical guides for setup and operations
- Tool references and architectural patterns
- Day-two operations such as monitoring, backups, and updates
- Public project context where it helps explain the surrounding ecosystem
## Best practices
- Write for a technically capable stranger, not only for the current operator
- Keep examples generic and safe for public publication
- Update documents when practices or tooling change
- Link claims to upstream documentation or standards when possible
## Pitfalls
- Turning public docs into private scratch notes
- Publishing configuration fragments that include sensitive details
- Writing tool-specific instructions without explaining the underlying concept
- Letting pages drift away from current upstream behavior
## References
- [Diataxis](https://diataxis.fr/)
- [Write the Docs: Docs as Code](https://www.writethedocs.org/guide/docs-as-code/)
- [Markdown Guide](https://www.markdownguide.org/)


@@ -0,0 +1,64 @@
---
title: About Hidden Den
description: Overview of Hidden Den as a self-hosted engineering environment focused on privacy, durability, and human-scale infrastructure
tags:
- about
- homelab
- infrastructure
category: about
created: 2026-03-14
updated: 2026-03-14
---
# About Hidden Den
## Summary
Hidden Den is a self-hosted engineering environment centered on privacy, technical autonomy, durability, and human-scale infrastructure. It combines homelab systems, open source tooling, and practical DevOps workflows into a platform for running services, testing ideas, and documenting repeatable engineering patterns in a way that stays understandable over time.
## Why it matters
Many engineers operate personal infrastructure but leave the reasoning behind their systems undocumented or let it drift into a collection of tools without a clear operating model. Hidden Den exists to make the architecture, tradeoffs, and operating practices explicit while keeping the environment calm, maintainable, and fully owned by its operator.
## Core concepts
- Self-hosting as a way to understand and control critical services
- Privacy as both a philosophy and an implementation requirement
- Durable systems that can be migrated, backed up, repaired, and replaced
- Small, composable systems instead of opaque all-in-one stacks
- Documentation as part of the system, not separate from it
- Human-scale design that keeps technology legible and understandable
## Practical usage
Within the Hidden Den ecosystem, infrastructure topics typically include:
- Private access using VPN or zero-trust networking
- Virtualization and container workloads
- Reverse proxies, DNS, and service discovery
- Monitoring, backups, and update management
- Tooling that can be reproduced on standard Linux-based infrastructure
- Static or low-dependency publishing patterns when they reduce operational drag
## Best practices
- Prefer documented systems over convenient but fragile one-off fixes
- Keep infrastructure services understandable enough to rebuild
- Choose open standards and open source tools where practical
- Treat access control, backup, and observability as core services
- Favor warm, legible, low-friction systems over polished but opaque stacks
## Pitfalls
- Adding too many overlapping tools without a clear ownership model
- Relying on memory instead of written operational notes
- Exposing administrative services publicly when a private access layer is sufficient
- Allowing convenience to override maintainability
- Optimizing for image, novelty, or feature count instead of long-term operability
## References
- [The Twelve-Factor App](https://12factor.net/)
- [Tailscale: What is Tailscale?](https://tailscale.com/kb/1151/what-is-tailscale)
- [Docker: Docker overview](https://docs.docker.com/get-started/docker-overview/)
- [Proxmox VE Administration Guide](https://pve.proxmox.com/pve-docs/)


@@ -0,0 +1,65 @@
---
title: Design Principles
description: Architectural and operational design principles used to shape Den Vault content and the systems it describes
tags:
- about
- design
- architecture
category: about
created: 2026-03-14
updated: 2026-03-14
---
# Design Principles
## Summary
Den Vault favors design principles that make systems easier to understand, operate, and recover. These principles apply to both the infrastructure being documented and the way documentation itself is structured.
## Why it matters
Without stable design principles, infrastructure turns into a collection of local optimizations that are difficult to audit and harder to maintain. Shared principles make it easier to evaluate tools, architecture choices, and documentation quality consistently.
## Core concepts
- Systems should be understandable by inspection
- Keep it durable: back up, migrate, repair, and replace
- Leave room for care in interfaces, naming, and operational ergonomics
- Stay practical when choosing tradeoffs and tooling
- Clear trust boundaries
- Minimal exposed surface area
- Declarative configuration where possible
- Explicit ownership of data, identity, and ingress paths
- Observability and recovery designed in from the start
## Practical usage
These principles usually lead to patterns such as:
- Private administrative access through VPN or dedicated management networks
- Reverse proxies and DNS as shared platform services
- Container and VM workloads with documented persistence boundaries
- Backup and restore strategy treated as a design requirement
- Low-dependency site and service architecture when that improves long-term maintainability
## Best practices
- Prefer one clear pattern over multiple overlapping ones
- Keep the path of a request or dependency easy to trace
- Design around failure domains, not only nominal behavior
- Make operational boundaries visible in diagrams, inventory, and docs
- Treat usability and readability as part of technical quality
## Pitfalls
- Mixing management, storage, and user traffic without a reasoned boundary
- Depending on defaults without documenting them
- Building fragile chains of dependencies for simple services
- Adding security controls that cannot be operated consistently
- Making systems colder, more complex, or more abstract than the problem requires
## References
- [The Twelve-Factor App](https://12factor.net/)
- [NGINX: Reverse Proxy](https://docs.nginx.com/nginx/admin-guide/web-server/reverse-proxy/)
- [Prometheus overview](https://prometheus.io/docs/introduction/overview/)


@@ -0,0 +1,61 @@
---
title: Documentation Philosophy
description: Principles for writing durable, public-facing technical documentation in Den Vault
tags:
- about
- documentation
- writing
category: about
created: 2026-03-14
updated: 2026-03-14
---
# Documentation Philosophy
## Summary
Documentation in Den Vault is written to be durable, public, and technically useful. Each document should explain a concept clearly enough for another engineer to apply it without needing access to private context.
## Why it matters
Unstructured notes decay quickly. Public technical documentation only becomes valuable when it is scoped well, written clearly, and maintained as part of the normal engineering workflow.
## Core concepts
- Write reusable technical documentation, not private memory aids
- Prefer explanation plus implementation guidance over checklist-only pages
- Keep documents scoped so they can be updated without rewriting the entire vault
- Use frontmatter and predictable structure for discoverability
- Keep writing human-scale, legible, and maintainable rather than optimized for feed-style consumption
## Practical usage
In practice, this means:
- Explaining why a technology or pattern matters before diving into commands
- Using placeholders instead of sensitive values
- Linking to upstream docs for authoritative behavior
- Revising pages when tooling or best practices change
- Preserving enough warmth and clarity that the docs feel written for people, not dashboards
## Best practices
- Prefer official documentation and standards as primary sources
- State assumptions clearly when a topic has multiple valid designs
- Keep examples small enough to understand but complete enough to use
- Separate conceptual articles, operational guides, and tool references
- Avoid unnecessary jargon when simpler language communicates the same idea accurately
## Pitfalls
- Mixing personal environment details into public docs
- Copying large blocks from upstream sources instead of summarizing them
- Writing instructions that depend on outdated versions or defaults
- Creating giant pages that cover many unrelated concerns
- Writing in a polished but impersonal style that obscures the actual operational intent
## References
- [Diataxis](https://diataxis.fr/)
- [Write the Docs: Docs as Code](https://www.writethedocs.org/guide/docs-as-code/)
- [Google Developer Documentation Style Guide](https://developers.google.com/style)


@@ -0,0 +1,64 @@
---
title: Engineering Philosophy
description: Engineering principles that guide system design, operations, and documentation in Den Vault
tags:
- about
- engineering
- philosophy
category: about
created: 2026-03-14
updated: 2026-03-14
---
# Engineering Philosophy
## Summary
The engineering philosophy behind Den Vault favors clarity, repeatability, small composable systems, and operational ownership. The goal is not to collect tools, but to build infrastructure that can be reasoned about, repaired, and evolved with confidence.
## Why it matters
Homelab and self-hosted environments often fail for social reasons rather than technical ones: unclear ownership, undocumented dependencies, and systems that only work while the original operator remembers every detail. A defined philosophy helps prevent that drift.
## Core concepts
- Understand the system before scaling the system
- Durability over short-term convenience
- Explicit dependencies over hidden coupling
- Safe defaults over permissive convenience
- Automation where it reduces toil, not where it removes understanding
- Documentation as part of operational readiness
- Practicality over ideological purity
## Practical usage
This philosophy affects decisions such as:
- Preferring documented deployment code over manual changes on hosts
- Choosing tools that can be inspected and reasoned about directly
- Separating management and user-facing services by trust boundary
- Using monitoring and backup validation as part of system design
- Keeping change scopes small enough to review and roll back
- Favoring systems that are calm and maintainable over systems that look impressive but are fragile
## Best practices
- Start with the smallest architecture that can be operated reliably
- Prefer tools with clear operational models and good documentation
- Review failure domains before adding new dependencies
- Keep recovery steps visible and tested
- Prefer systems that can be migrated and replaced without platform lock-in
## Pitfalls
- Choosing tools for feature count rather than supportability
- Building tightly coupled stacks that cannot be upgraded safely
- Automating critical changes without health checks or rollback thinking
- Treating production-like services as disposable experiments
- Mistaking complexity or aesthetics for engineering quality
## References
- [The Twelve-Factor App](https://12factor.net/)
- [Google SRE Book](https://sre.google/books/)
- [Diataxis](https://diataxis.fr/)


@@ -0,0 +1,62 @@
---
title: How This Knowledge Base Is Organized
description: Explanation of the folder structure and content model used by Den Vault
tags:
- about
- documentation
- structure
category: about
created: 2026-03-14
updated: 2026-03-14
---
# How This Knowledge Base Is Organized
## Summary
Den Vault is organized by document purpose rather than by one large mixed topic tree. The top-level structure separates foundational context, reusable knowledge, systems architecture, hands-on guides, reference material, learning notes, and tool documentation.
## Why it matters
A public knowledge base becomes hard to navigate when conceptual explanations, operational guides, and tool notes are mixed together. A consistent structure helps readers find the right kind of document quickly.
## Core concepts
- `00 - About`: project purpose, philosophy, and organizational context
- `10 - Projects`: project-specific documentation and initiatives
- `20 - Knowledge`: reusable concepts and architecture explanations
- `30 - Systems`: system-level patterns and higher-level design views
- `40 - Guides`: step-by-step implementation material
- `50 - Reference`: compact reference material
- `60 - Learning`: study notes and topic overviews
- `70 - Tools`: tool-specific deep dives and operational context
- `90 - Archive`: retired or superseded material
## Practical usage
When adding a document:
- Put conceptual explanations in `20 - Knowledge`
- Put architecture views and system patterns in `30 - Systems`
- Put procedural setup or operations documents in `40 - Guides`
- Put tool deep-dives and platform notes in `70 - Tools`
## Best practices
- Keep filenames specific and searchable
- Use frontmatter consistently for metadata and indexing
- Avoid duplicating the same topic in multiple places unless the purpose differs
- Link related concept, guide, and tool pages when it improves navigation
## Pitfalls
- Putting every page into one category because it is convenient in the moment
- Creating new top-level structures without a strong reason
- Mixing reference snippets with conceptual architecture writing
- Letting archived or outdated pages appear current
## References
- [Diataxis](https://diataxis.fr/)
- [Write the Docs: Docs as Code](https://www.writethedocs.org/guide/docs-as-code/)
- [GitHub Docs: Organizing information with collapsed sections](https://docs.github.com/get-started/writing-on-github/working-with-advanced-formatting/organizing-information-with-collapsed-sections)


@@ -0,0 +1,63 @@
---
title: Infrastructure Overview
description: High-level overview of the infrastructure domains documented within Hidden Den and Den Vault
tags:
- about
- infrastructure
- overview
category: about
created: 2026-03-14
updated: 2026-03-14
---
# Infrastructure Overview
## Summary
The infrastructure documented in Den Vault is built around a small set of repeatable domains: networking, compute, platform services, observability, security, and data protection. Together, these form a practical blueprint for self-hosted engineering environments.
## Why it matters
Readers need a clear map before diving into individual guides. An infrastructure overview helps explain how virtualization, containers, DNS, reverse proxying, monitoring, identity, and backups fit together as one operating model.
## Core concepts
- Networking and access: segmentation, VPN, DNS, ingress
- Compute: hypervisors, VMs, and container hosts
- Platform services: reverse proxy, service discovery, identity, secrets
- Operations: monitoring, alerting, backups, updates
- Tooling: documented platforms used to implement these layers
## Practical usage
A typical self-hosted environment described by Den Vault includes:
- Proxmox or equivalent compute hosts
- Docker-based application workloads
- Tailscale or another private access layer
- Reverse proxy and TLS termination with tools such as Caddy, Traefik, or NGINX
- Prometheus and Grafana for observability
- Backup tooling with regular validation
- Static or low-dependency site infrastructure where that aligns with privacy and maintainability goals
## Best practices
- Keep core platform services few in number and well understood
- Separate public ingress from administrative access paths
- Maintain inventory of systems, dependencies, and backup coverage
- Prefer architectures that can be rebuilt from documented source material
## Pitfalls
- Treating infrastructure as a pile of tools instead of a coherent system
- Running critical services without monitoring or backup validation
- Allowing naming, routing, and authentication patterns to drift over time
- Adding redundancy without understanding operational complexity
## References
- [Proxmox VE Administration Guide](https://pve.proxmox.com/pve-docs/)
- [Docker: Docker overview](https://docs.docker.com/get-started/docker-overview/)
- [Tailscale: What is Tailscale?](https://tailscale.com/kb/1151/what-is-tailscale)
- [Prometheus overview](https://prometheus.io/docs/introduction/overview/)
- [restic documentation](https://restic.readthedocs.io/en/latest/)


@@ -0,0 +1,62 @@
---
title: Projects Overview
description: Public overview of the current Hidden Den project landscape and how those projects relate to the wider ecosystem
tags:
- projects
- overview
- hidden-den
category: projects
created: 2026-03-14
updated: 2026-03-14
---
# Projects Overview
## Summary
The Hidden Den ecosystem includes a small set of public projects that reflect its broader themes: self-hosting, operational clarity, community tooling, and maintainable personal infrastructure.
## Why it matters
Project documentation helps readers understand where Den Vault fits. It connects the knowledge base to the practical tools and experiments that motivate many of the architectural and workflow notes documented elsewhere in the repository.
## Core concepts
- Stable tools are documented differently from in-progress experiments
- Project status should be visible so readers know what is operational versus exploratory
- Projects should be framed by what problem they solve, not only by their implementation details
## Practical usage
Publicly visible projects in the Hidden Den ecosystem currently include:
- `GuardDen`: security and moderation tooling for community operations; shown publicly as stable
- `openrabbit`: an open-source tool with an accessibility and community-development focus; shown publicly as stable
- `DevDen`: a development environment concept for the broader Den ecosystem; shown publicly as concept-stage
- `loyal_companion`: a companion bot project; shown publicly as work in progress
These projects suggest several documentation priorities for Den Vault:
- Self-hosted Git and forge workflows
- Bot and automation deployment patterns
- Security and moderation tooling architecture
- Durable development environments for small independent platforms
## Best practices
- Keep project descriptions short, public, and non-sensitive
- Record project maturity so readers understand whether a document is a reference, an experiment, or a roadmap item
- Link project docs to related tool, guide, and architecture pages when those pages exist
- Avoid over-documenting early concepts before the architecture stabilizes
## Pitfalls
- Treating project listings as a substitute for real technical documentation
- Publishing internal implementation details before they are safe for public release
- Letting project status drift out of date
- Mixing aspirational ideas with stable operating guidance without labeling the difference
## References
- [Gitea Documentation](https://docs.gitea.com/)
- [Write the Docs: Docs as Code](https://www.writethedocs.org/guide/docs-as-code/)


@@ -0,0 +1,133 @@
---
title: Container Networking
description: Overview of Docker container networking modes and practical networking patterns
tags:
- containers
- docker
- networking
category: containers
created: 2026-03-14
updated: 2026-03-14
---
# Container Networking
## Introduction
Container networking determines how workloads talk to each other, the host, and the rest of the network. In Docker environments, understanding bridge networks, published ports, and special drivers is essential for secure and predictable service deployment.
## Purpose
This document explains how container networking works so you can:
- Choose the right network mode for a workload
- Avoid unnecessary host exposure
- Troubleshoot service discovery and connectivity problems
- Design cleaner multi-service stacks
## Architecture Overview
Docker commonly uses these networking approaches:
- Default bridge: basic isolated network for containers on one host
- User-defined bridge: preferred for most application stacks because it adds built-in DNS and cleaner isolation
- Host network: container shares the host network namespace
- Macvlan or ipvlan: container appears directly on the physical network
- Overlay: multi-host networking for orchestrated environments such as Swarm
## Network Modes
### User-defined bridge
This is the normal choice for single-host multi-container applications. Containers on the same network can resolve each other by service or container name.
Example:
```bash
docker network create app-net
docker run -d --name db --network app-net postgres:16
docker run -d --name app --network app-net ghcr.io/example/app:1.2.3
```
### Published ports
Publishing a port maps traffic from the host into the container:
```bash
docker run -d -p 8080:80 nginx:stable
```
This exposes a service through the host IP and should be limited to the ports you actually need.
### Host networking
Host networking removes network namespace isolation. It can be useful for performance-sensitive agents or software that depends on broadcast-heavy behavior, but it increases the chance of port conflicts and broad host exposure.
### Macvlan or ipvlan
These drivers give a container its own presence on the LAN. They can be useful for software that needs direct network identity, but they also bypass some of the simplicity and isolation of bridge networking.
## Configuration Example
Compose network example:
```yaml
services:
reverse-proxy:
image: caddy:2
ports:
- "80:80"
- "443:443"
networks:
- edge
app:
image: ghcr.io/example/app:1.2.3
networks:
- edge
- backend
db:
image: postgres:16
networks:
- backend
networks:
edge:
backend:
internal: true
```
In this pattern, the database is not reachable directly from the host or external clients.
## Troubleshooting Tips
### Container can reach the internet but not another container
- Verify both containers are attached to the same user-defined network
- Use container or service names rather than host loopback addresses
### Service is reachable internally but not from another host
- Confirm the port is published on the host
- Check host firewall rules and upstream routing
### Random connectivity issues after custom network changes
- Inspect network configuration with `docker network inspect <name>`
- Check for overlapping subnets between Docker networks and the physical LAN
- Restart affected containers after major network topology changes
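The checks above map to a few commands. The container and network names (`app`, `db`, `app-net`) follow the earlier examples and are assumptions; adapt them to your stack:
```shell
# List the containers attached to a user-defined network and their addresses
docker network inspect app-net \
  --format '{{range .Containers}}{{.Name}} {{.IPv4Address}}{{"\n"}}{{end}}'

# Verify built-in DNS resolution from inside a container
# (getent is present in most Debian-based images)
docker exec app getent hosts db

# Show the subnet Docker assigned, to spot overlaps with the LAN or VPN plan
docker network inspect app-net --format '{{(index .IPAM.Config 0).Subnet}}'
```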
## Best Practices
- Use user-defined bridge networks instead of the legacy default bridge where possible
- Publish only reverse proxy or explicitly required service ports
- Keep databases and internal backends on private internal networks
- Avoid `network_mode: host` unless there is a clear technical reason
- Document custom subnets to avoid conflicts with VPN and LAN address plans
## References
- [Docker: Bridge network driver](https://docs.docker.com/network/drivers/bridge/)
- [Docker: Networking overview](https://docs.docker.com/engine/network/)
- [Docker: Published ports](https://docs.docker.com/get-started/docker-concepts/running-containers/publishing-ports/)


@@ -0,0 +1,125 @@
---
title: Persistent Volumes
description: Storage patterns for keeping container data durable across restarts and upgrades
tags:
- containers
- docker
- storage
category: containers
created: 2026-03-14
updated: 2026-03-14
---
# Persistent Volumes
## Introduction
Containers are disposable, but application data usually is not. Persistent volumes provide storage that survives container restarts, recreation, and image upgrades.
## Purpose
Use persistent volumes to:
- Preserve databases, uploads, and application state
- Separate data lifecycle from container lifecycle
- Simplify backup and restore workflows
- Reduce accidental data loss during redeployments
## Architecture Overview
Docker storage typically falls into three categories:
- Named volumes: managed by Docker and usually the best default for persistent app data
- Bind mounts: direct host paths mounted into a container
- Tmpfs mounts: memory-backed storage for temporary data
## Storage Patterns
### Named volumes
Named volumes are portable within a host and reduce the chance of coupling to host directory layouts.
```bash
docker volume create postgres-data
docker run -d \
--name db \
-v postgres-data:/var/lib/postgresql/data \
postgres:16
```
### Bind mounts
Bind mounts are useful when:
- The application expects editable configuration files
- You need direct host visibility into files
- Backups are based on host file paths
Example:
```bash
docker run -d \
--name caddy \
-v /srv/caddy/Caddyfile:/etc/caddy/Caddyfile:ro \
-v /srv/caddy/data:/data \
caddy:2
```
### Permissions and ownership
Many container storage issues come from mismatched UID and GID values between the host and containerized process. Check the image documentation and align ownership before assuming the application is broken.
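A quick sketch of that check, using a stand-in directory. Only the numeric UID and GID matter inside the container (user names do not cross the namespace boundary), and the UID to match depends on the image, so verify it against the image documentation rather than assuming:
```shell
# Create a stand-in data directory and inspect its numeric ownership,
# which is what the containerized process will actually see
datadir=$(mktemp -d)
touch "$datadir/example.db"
stat -c '%u:%g' "$datadir/example.db"

# If the container process runs as a different UID, align ownership on the host:
#   sudo chown -R 999:999 "$datadir"
# (999 is only an illustrative value; check what UID the image really uses)
```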
## Configuration Example
Compose example with named volumes:
```yaml
services:
app:
image: ghcr.io/example/app:1.2.3
volumes:
- app-data:/var/lib/app
db:
image: postgres:16
volumes:
- db-data:/var/lib/postgresql/data
volumes:
app-data:
db-data:
```
## Troubleshooting Tips
### Data disappears after updating a container
- Verify the service is writing to the mounted path
- Check whether a bind mount accidentally hides expected image content
- Inspect mounts with `docker inspect <container>`
### Permission denied errors
- Check ownership and mode bits on bind-mounted directories
- Match container user expectations to host permissions
- Avoid mounting sensitive directories with broad write access
### Backups restore but the app still fails
- Confirm the restored data matches the application version
- Restore metadata such as permissions and database WAL files if applicable
- Test restores on a separate host before using them in production
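A minimal restore-testing sketch with restic (referenced below). The repository path, password file, and backed-up paths are assumptions, and running databases should be stopped or dumped first for a consistent snapshot:
```shell
# Assumed repository location and password handling -- adapt to your environment
export RESTIC_REPOSITORY=/mnt/backup/restic-repo
export RESTIC_PASSWORD_FILE=/root/.restic-pass

# Back up the persistent data paths (examples only)
restic backup /srv/caddy /srv/app-data

# Periodically prove the backup is usable, not just present
restic restore latest --target /tmp/restore-test
restic check
```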
## Best Practices
- Use named volumes for most stateful container data
- Use bind mounts deliberately for human-managed configuration
- Keep backups separate from the production host
- Record where every service stores its critical state
- Test restore procedures, not only backup creation
## References
- [Docker: Volumes](https://docs.docker.com/engine/storage/volumes/)
- [Docker: Bind mounts](https://docs.docker.com/engine/storage/bind-mounts/)
- [Docker: Tmpfs mounts](https://docs.docker.com/engine/storage/tmpfs/)


@@ -0,0 +1,112 @@
---
title: CI/CD Basics
description: Introduction to continuous integration and continuous delivery pipelines for application and infrastructure repositories
tags:
- ci
- cd
- devops
category: devops
created: 2026-03-14
updated: 2026-03-14
---
# CI/CD Basics
## Introduction
Continuous integration and continuous delivery reduce manual deployment risk by automating validation, packaging, and release steps. Even small self-hosted projects benefit from predictable pipelines that lint, test, and package changes before they reach live systems.
## Purpose
CI/CD pipelines help with:
- Fast feedback on changes
- Repeatable build and test execution
- Safer promotion of artifacts between environments
- Reduced manual drift in deployment procedures
## Architecture Overview
A basic pipeline usually includes:
- Trigger: push, pull request, tag, or schedule
- Jobs: isolated units such as lint, test, build, or deploy
- Artifacts: build outputs or packages passed to later stages
- Environments: dev, staging, production, or similar release targets
Typical flow:
```text
Commit -> CI checks -> Build artifact -> Approval or policy gate -> Deploy
```
## Core Concepts
### Continuous integration
Every meaningful change should run automated checks quickly and consistently.
### Continuous delivery
Artifacts are always kept in a releasable state, even if production deployment requires a manual approval.
### Continuous deployment
Every validated change is deployed automatically. This is powerful but requires strong tests, rollback paths, and change confidence.
## Configuration Example
GitHub Actions workflow example:
```yaml
name: ci
on:
pull_request:
push:
branches: [main]
jobs:
test:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: actions/setup-node@v4
with:
node-version: 22
- run: npm ci
- run: npm test
```
## Troubleshooting Tips
### Pipeline is slow and developers stop trusting it
- Run fast checks early
- Cache dependencies carefully
- Separate heavyweight integration tests from every small change if needed
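For the Node example earlier in this page, careful dependency caching can be as small as one extra line: `actions/setup-node` has built-in caching keyed on the lock file, so the cache invalidates exactly when dependencies change:
```yaml
- uses: actions/setup-node@v4
  with:
    node-version: 22
    cache: npm   # caches the npm download cache, keyed on package-lock.json
```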
### Deployments succeed but services still break
- Add health checks and post-deploy validation
- Make environment-specific configuration explicit
- Track which artifact version reached which environment
### CI and local results disagree
- Match tool versions between local and CI environments
- Keep pipeline setup code in version control
- Avoid hidden mutable runners when reproducibility matters
## Best Practices
- Keep CI feedback fast enough to be used during active development
- Require checks before merging to shared branches
- Build once and promote the same artifact when possible
- Separate validation, packaging, and deployment concerns
- Treat pipeline configuration as production code
## References
- [GitHub Docs: Understanding GitHub Actions](https://docs.github.com/actions/about-github-actions/understanding-github-actions)
- [GitHub Docs: Workflow syntax for GitHub Actions](https://docs.github.com/actions/reference/workflows-and-actions/workflow-syntax)


@@ -0,0 +1,111 @@
---
title: Git Workflows
description: Practical Git workflow patterns for teams and personal infrastructure repositories
tags:
- git
- devops
- workflow
category: devops
created: 2026-03-14
updated: 2026-03-14
---
# Git Workflows
## Introduction
A Git workflow defines how changes move from local work to reviewed and deployable history. The right workflow keeps collaboration predictable without adding unnecessary ceremony.
## Purpose
This document covers the most common workflow choices for:
- Application repositories
- Infrastructure-as-code repositories
- Self-hosted service configuration
## Architecture Overview
A Git workflow usually combines:
- Branching strategy
- Review policy
- Merge policy
- Release or deployment trigger
The two patterns most teams evaluate first are:
- Trunk-based development with short-lived branches
- Feature branches with pull or merge requests
## Common Workflow Patterns
### Trunk-based with short-lived branches
Changes are kept small and integrated frequently into the default branch. This works well for active teams, automated test pipelines, and repositories that benefit from continuous deployment.
### Longer-lived feature branches
This can be useful for larger changes or teams with less frequent integration, but it increases drift and merge complexity.
### Infrastructure repositories
For IaC and self-hosting repos, prefer small reviewed changes with strong defaults:
- Protected main branch
- Required checks before merge
- Clear rollback path
- Commit messages that explain operational impact
## Configuration Example
Example daily workflow:
```bash
git switch main
git pull --ff-only
git switch -c feature/update-grafana
git add .
git commit -m "Update Grafana image and alert rules"
git push -u origin feature/update-grafana
```
Before merge:
```bash
git fetch origin
git rebase origin/main
```
## Troubleshooting Tips
### Merge conflicts happen constantly
- Reduce branch lifetime
- Split large changes into smaller reviewable commits
- Rebase or merge from the default branch more frequently
### History becomes hard to audit
- Use meaningful commit messages
- Avoid mixing unrelated infrastructure and application changes in one commit
- Document the operational reason for risky changes in the pull request
### Reverts are painful
- Keep commits cohesive
- Avoid squash-merging unrelated fixes together
- Ensure deployments can be tied back to a specific Git revision
## Best Practices
- Prefer short-lived branches and small pull requests
- Protect the default branch and require review for shared repos
- Use fast-forward pulls locally to avoid accidental merge noise
- Keep configuration and deployment code in Git, not in ad hoc host edits
- Align the Git workflow with deployment automation instead of treating them as separate processes
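The branch hygiene above can be enforced in local configuration. This throwaway-repo sketch shows fast-forward-only pulls and recording the deployable revision; the identity values are illustrative:

```shell
# Sketch: fast-forward-only pulls plus a deployable revision record (throwaway repo)
set -eu
repo=$(mktemp -d)
git -C "$repo" init -q
git -C "$repo" config user.email "ops@example.com"   # illustrative identity
git -C "$repo" config user.name "Ops"
git -C "$repo" config pull.ff only                   # refuse pulls that would create merge commits
git -C "$repo" commit -q --allow-empty -m "deploy: update grafana stack"
rev=$(git -C "$repo" rev-parse --short HEAD)
echo "deployed revision: ${rev}"
```

With `pull.ff only` set, a pull that cannot fast-forward fails loudly instead of quietly creating a merge commit, keeping history linear for audits and reverts.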
## References
- [Git: `gitworkflows`](https://git-scm.com/docs/gitworkflows)
- [Pro Git: Branching workflows](https://git-scm.com/book/en/v2/Git-Branching-Branching-Workflows)

---
title: Monitoring and Observability
description: Core concepts behind monitoring, alerting, and observability for self-hosted systems
tags:
- monitoring
- observability
- operations
category: infrastructure
created: 2026-03-14
updated: 2026-03-14
---
# Monitoring and Observability
## Summary
Monitoring and observability provide visibility into system health, failure modes, and operational behavior. For self-hosted systems, they turn infrastructure from a black box into an environment that can be maintained intentionally.
## Why it matters
Without visibility, teams discover failures only after users notice them. Observability reduces diagnosis time, helps verify changes safely, and supports day-two operations such as capacity planning and backup validation.
## Core concepts
- Metrics: numerical measurements over time
- Logs: event records produced by systems and applications
- Traces: request-path visibility across components
- Alerting: notifications triggered by actionable failure conditions
- Service-level thinking: monitoring what users experience, not only host resource usage
## Practical usage
A practical starting point often includes:
- Host metrics from exporters
- Availability checks for critical endpoints
- Dashboards for infrastructure and core services
- Alerts for outages, storage pressure, certificate expiry, and failed backups
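As a concrete example of alerting on storage pressure, a minimal cron-friendly check might look like this; the 90% threshold is illustrative:

```shell
# Sketch: cron-friendly check for storage pressure (threshold is illustrative)
threshold=90
usage=$(df -P / | awk 'NR==2 { gsub("%", "", $5); print $5 }')
if [ "$usage" -ge "$threshold" ]; then
  echo "ALERT: root filesystem at ${usage}% used"
else
  echo "OK: root filesystem at ${usage}% used"
fi
```

A real deployment would send the alert through a notification channel rather than stdout, but the shape is the same: measure, compare against an actionable threshold, notify only when action is needed.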
## Best practices
- Monitor both infrastructure health and service reachability
- Alert on conditions that require action
- Keep dashboards focused on questions operators actually ask
- Use monitoring data to validate upgrades and incident recovery
## Pitfalls
- Treating dashboards as a substitute for alerts
- Collecting far more data than anyone reviews
- Monitoring only CPU and RAM while ignoring ingress, DNS, and backups
- Sending noisy alerts that train operators to ignore them
## References
- [Prometheus overview](https://prometheus.io/docs/introduction/overview/)
- [Prometheus Alertmanager overview](https://prometheus.io/docs/alerting/latest/overview/)
- [Grafana documentation](https://grafana.com/docs/grafana/latest/)

---
title: Proxmox Cluster Basics
description: Overview of how Proxmox VE clusters work, including quorum, networking, and operational constraints
tags:
- proxmox
- virtualization
- clustering
category: infrastructure
created: 2026-03-14
updated: 2026-03-14
---
# Proxmox Cluster Basics
## Introduction
A Proxmox VE cluster groups multiple Proxmox nodes into a shared management domain. This allows centralized administration of virtual machines, containers, storage definitions, and optional high-availability workflows.
## Purpose
Use a Proxmox cluster when you want:
- Centralized management for multiple hypervisor nodes
- Shared visibility of guests, storage, and permissions
- Live migration or controlled workload movement between nodes
- A foundation for HA services backed by shared or replicated storage
## Architecture Overview
A Proxmox cluster relies on several core components:
- `pvecm`: the cluster management tool used to create and join clusters
- Corosync: provides the cluster communication layer
- `pmxcfs`: the Proxmox cluster file system used to distribute cluster configuration
- Quorum: majority voting used to protect cluster consistency
Important operational behavior:
- Each node normally has one vote
- A majority of votes must be online for state-changing operations
- Loss of quorum causes the cluster to become read-only for protected operations
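The majority rule can be made concrete with a little arithmetic:

```shell
# Quorum arithmetic: a strict majority of expected votes must be online
nodes=3
quorum=$(( nodes / 2 + 1 ))
echo "${nodes} voting nodes -> ${quorum} votes required for quorum"
# prints: 3 voting nodes -> 2 votes required for quorum
```

This is also why even node counts add little: four nodes still tolerate only one failure beyond what three tolerate in the worst split, while adding another machine to keep healthy.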
## Cluster Design Notes
### Network requirements
Proxmox expects a reliable low-latency network for cluster traffic. Corosync is sensitive to packet loss, jitter, and unstable links. In homelabs, this generally means wired LAN links, stable switching, and avoiding Wi-Fi for cluster communication.
### Odd node counts
Three nodes is the common minimum for a healthy quorum-based design. Two-node designs can work, but they need extra planning such as a QDevice or acceptance of reduced fault tolerance.
### Storage considerations
Clustering does not automatically provide shared storage. Features such as live migration and HA depend on storage design:
- Shared storage: NFS, iSCSI, Ceph, or other shared backends
- Replicated local storage: possible for some workflows, but requires careful planning
- Backup storage: separate from guest runtime storage
## Configuration Example
Create a new cluster on the first node:
```bash
pvecm create lab-cluster
```
Check cluster status:
```bash
pvecm status
```
To join another node, run this on the new node, giving the address of an existing cluster member:
```bash
pvecm add 192.0.2.10
```
Use placeholder management addresses in documentation and never expose real administrative IPs publicly.
## Troubleshooting Tips
### Cluster is read-only
- Check quorum status with `pvecm status`
- Look for network instability between nodes
- Verify time synchronization and general host health
### Node join fails
- Confirm name resolution and basic IP reachability
- Make sure cluster traffic is not filtered by a firewall
- Verify the node is not already part of another cluster
### Random cluster instability
- Review packet loss, duplex mismatches, and switch reliability
- Keep corosync on stable wired links with low latency
- Separate heavy storage replication traffic from cluster messaging when possible
## Best Practices
- Use at least three voting members for a stable quorum model
- Keep cluster traffic on reliable wired networking
- Document node roles, storage backends, and migration dependencies
- Treat the Proxmox management network as a high-trust segment
- Test backup and restore separately from cluster failover assumptions
## References
- [Proxmox VE Administration Guide: Cluster Manager](https://pve.proxmox.com/pve-docs/chapter-pvecm.html)
- [Proxmox VE `pvecm` manual](https://pve.proxmox.com/pve-docs/pvecm.1.html)

---
title: Reverse Proxy Patterns
description: Common reverse proxy design patterns for self-hosted services and internal platforms
tags:
- reverse-proxy
- networking
- self-hosting
category: infrastructure
created: 2026-03-14
updated: 2026-03-14
---
# Reverse Proxy Patterns
## Introduction
A reverse proxy accepts client requests and forwards them to upstream services. It commonly handles TLS termination, host-based routing, request header forwarding, and policy enforcement in front of self-hosted applications.
## Purpose
Reverse proxies are used to:
- Publish multiple services behind one or a few public entry points
- Centralize TLS certificates
- Apply authentication, authorization, or rate-limiting controls
- Simplify backend service placement and migration
## Architecture Overview
Typical request flow:
```text
Client -> Reverse proxy -> Upstream application
```
Common proxy responsibilities:
- TLS termination and certificate management
- Routing by hostname, path, or protocol
- Forwarding of `Host`, client IP, and other headers
- Optional load balancing across multiple backends
## Common Patterns
### Edge proxy for many internal services
One proxy handles traffic for multiple hostnames:
- `grafana.example.com`
- `gitea.example.com`
- `vault.example.com`
This is a good default for small homelabs and internal platforms.
### Internal proxy behind a VPN
Administrative services are reachable only through a private network such as Tailscale, WireGuard, or a dedicated management VLAN. This reduces public attack surface.
### Path-based routing
Useful when hostnames are limited, but more fragile than host-based routing because some applications assume they live at `/`.
### Dynamic discovery proxy
Tools such as Traefik can watch container metadata and update routes automatically. This reduces manual config for dynamic container environments, but it also makes label hygiene and network policy more important.
## Configuration Example
NGINX example:
```nginx
server {
    listen 443 ssl http2;
    server_name app.example.com;

    # Certificate paths are illustrative; point these at your real files
    ssl_certificate     /etc/ssl/certs/app.example.com.pem;
    ssl_certificate_key /etc/ssl/private/app.example.com.key;

    location / {
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_pass http://127.0.0.1:8080;
    }
}
```
Caddy example:
```caddyfile
app.example.com {
    reverse_proxy 127.0.0.1:8080
}
```
## Troubleshooting Tips
### Application redirects to the wrong URL
- Check forwarded headers such as `Host` and `X-Forwarded-Proto`
- Verify the application's configured external base URL
- Confirm TLS termination behavior matches application expectations
### WebSocket or streaming traffic fails
- Check proxy support for upgraded connections
- Review buffering behavior if the application expects streaming responses
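For NGINX specifically, WebSocket upgrades typically need the directives below inside the relevant `location` block; this is a sketch of the upgrade handling, not a complete server configuration:

```nginx
# Required for WebSocket upgrades through NGINX (add inside the location block)
proxy_http_version 1.1;
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection "upgrade";
```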
### Backend works locally but not through the proxy
- Verify the proxy can reach the upstream host and port
- Check the proxy network namespace if running in a container
- Confirm firewall rules permit the proxy-to-upstream path
## Best Practices
- Prefer host-based routing over deep path rewriting
- Publish only the services that need an edge entry point
- Keep proxy configuration under version control
- Use separate internal and public entry points when trust boundaries differ
- Standardize upstream headers and base URL settings across applications
## References
- [NGINX: Reverse Proxy](https://docs.nginx.com/nginx/admin-guide/web-server/reverse-proxy/)
- [Traefik: Routing overview](https://doc.traefik.io/traefik/routing/overview/)
- [Caddy: `reverse_proxy` directive](https://caddyserver.com/docs/caddyfile/directives/reverse_proxy)

---
title: Service Architecture Patterns
description: Common service architecture patterns for self-hosted platforms and small engineering environments
tags:
- architecture
- services
- infrastructure
category: infrastructure
created: 2026-03-14
updated: 2026-03-14
---
# Service Architecture Patterns
## Summary
Service architecture patterns describe how applications are packaged, connected, exposed, and operated. In self-hosted environments, the most useful patterns balance simplicity, isolation, and operability rather than chasing scale for its own sake.
## Why it matters
Architecture decisions affect deployment complexity, failure domains, recovery steps, and long-term maintenance. Small environments benefit from choosing patterns that remain understandable without full-time platform engineering overhead.
## Core concepts
- Single-service deployment: one service per VM or container stack
- Shared platform services: DNS, reverse proxy, monitoring, identity, backups
- Stateful versus stateless workloads
- Explicit ingress, persistence, and dependency boundaries
- Loose coupling through DNS, reverse proxies, and documented interfaces
## Practical usage
Useful patterns for self-hosted systems include:
- Reverse proxy plus multiple backend services
- Dedicated database service with application separation
- Utility VMs or containers for platform services
- Private admin interfaces with public application ingress kept separate
Example dependency view:
```text
Client -> Reverse proxy -> Application -> Database
-> Identity provider
-> Monitoring and logs
```
## Best practices
- Keep stateful services isolated and clearly backed up
- Make ingress paths and dependencies easy to trace
- Reuse shared platform services where they reduce duplication
- Prefer a small number of well-understood patterns across the environment
## Pitfalls
- Putting every service into one giant stack with unclear boundaries
- Mixing public ingress and administrative paths without review
- Scaling architecture complexity before operational need exists
- Depending on undocumented local assumptions between services
## References
- [Martin Fowler: MonolithFirst](https://martinfowler.com/bliki/MonolithFirst.html)
- [The Twelve-Factor App](https://12factor.net/)
- [NGINX: Reverse Proxy](https://docs.nginx.com/nginx/admin-guide/web-server/reverse-proxy/)

---
title: Service Discovery
description: Concepts and practical patterns for finding services in self-hosted and homelab environments
tags:
- networking
- service-discovery
- dns
category: infrastructure
created: 2026-03-14
updated: 2026-03-14
---
# Service Discovery
## Introduction
Service discovery is the process of locating services by identity instead of hard-coded IP addresses and ports. It becomes more important as workloads move between hosts, IPs change, or multiple service instances exist behind one logical name.
## Purpose
Good service discovery helps with:
- Decoupling applications from fixed network locations
- Supporting scaling and failover
- Simplifying service-to-service communication
- Reducing manual DNS and inventory drift
## Architecture Overview
There are several discovery models commonly used in self-hosted environments:
- Static DNS: manually managed A, AAAA, CNAME, or SRV records
- DNS-based service discovery: clients query DNS or DNS-SD metadata
- mDNS: local-link multicast discovery for small LANs
- Registry-based discovery: a central catalog such as Consul tracks service registration and health
## Discovery Patterns
### Static DNS
Best for stable infrastructure services such as hypervisors, reverse proxies, storage appliances, and monitoring endpoints.
Example:
```text
proxy.internal.example A 192.168.20.10
grafana.internal.example CNAME proxy.internal.example
```
### DNS-SD and mDNS
Useful for local networks where clients need to discover services such as printers or media endpoints. This works well for small trusted LAN segments, but it does not cross routed boundaries cleanly without extra relays or reflectors.
### Registry-based discovery
A service catalog stores registrations and health checks. Clients query the catalog or use DNS interfaces exposed by the registry.
This is useful when:
- Service instances are dynamic
- Health-aware routing matters
- Multiple nodes host the same service
## Configuration Example
Consul service registration example:
```json
{
  "service": {
    "name": "gitea",
    "port": 3000,
    "checks": [
      {
        "http": "http://127.0.0.1:3000/api/healthz",
        "interval": "10s"
      }
    ]
  }
}
```
DNS-SD example concept:
```text
_https._tcp.internal.example SRV 0 0 443 proxy.internal.example
```
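When debugging discovery, it helps to separate "does the name resolve" from "is the service healthy". This sketch uses `localhost` as a stand-in for a real internal name such as `proxy.internal.example`:

```shell
# Sketch: check name resolution independently of service health
# (localhost stands in for a real internal name)
name=localhost
addr=$(getent hosts "$name" | awk '{ print $1; exit }')
echo "${name} resolves to ${addr:-nothing}"
```

If resolution succeeds but connections still fail, the problem is in routing, firewalling, or the service itself rather than in discovery.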
## Troubleshooting Tips
### Clients resolve a name but still fail to connect
- Check whether the resolved port is correct
- Verify firewall policy and reverse proxy routing
- Confirm the service is healthy, not just registered
### Discovery works on one VLAN but not another
- Review routed DNS access
- Check whether the workload depends on multicast discovery such as mDNS
- Avoid relying on broadcast or multicast across segmented networks unless intentionally supported
### Service records become stale
- Use health checks where possible
- Remove hand-managed DNS entries that no longer match current placements
- Prefer stable canonical names in front of dynamic backends
## Best Practices
- Use DNS as the default discovery mechanism for stable infrastructure
- Add service registries only when the environment is dynamic enough to justify them
- Pair discovery with health checks when multiple instances or failover paths exist
- Keep discovery names human-readable and environment-specific
- Avoid hard-coding IP addresses in application configuration unless there is no realistic alternative
## References
- [Consul: Discover services overview](https://developer.hashicorp.com/consul/docs/discover)
- [Consul: Service discovery explained](https://developer.hashicorp.com/consul/docs/use-case/service-discovery)
- [RFC 6762: Multicast DNS](https://www.rfc-editor.org/rfc/rfc6762)
- [RFC 6763: DNS-Based Service Discovery](https://www.rfc-editor.org/rfc/rfc6763)

---
title: DNS Architecture
description: Core DNS architecture patterns for self-hosted and homelab environments
tags:
- dns
- networking
- infrastructure
category: networking
created: 2026-03-14
updated: 2026-03-14
---
# DNS Architecture
## Summary
DNS architecture defines how names are assigned, resolved, delegated, and operated across internal and external systems. In self-hosted environments, good DNS design reduces configuration drift, improves service discoverability, and simplifies remote access.
## Why it matters
DNS is a foundational dependency for reverse proxies, TLS, service discovery, monitoring, and operator workflows. Weak DNS design creates brittle systems that depend on hard-coded IP addresses and manual recovery steps.
## Core concepts
- Authoritative DNS: the source of truth for a zone
- Recursive resolution: the process clients use to resolve names
- Internal DNS: records intended only for private services
- Split-horizon DNS: different answers depending on the client context
- TTL: cache lifetime that affects propagation and change speed
## Practical usage
A practical self-hosted DNS model often includes:
- Public DNS for internet-facing records
- Internal DNS for management and private services
- Reverse proxy hostnames for application routing
- Stable names for infrastructure services such as hypervisors, backup targets, and monitoring systems
Example record set:
```text
proxy.example.net A 198.51.100.20
grafana.internal.example A 192.0.2.20
gitea.internal.example CNAME proxy.internal.example
```
## Best practices
- Use DNS names instead of embedding IP addresses in application config
- Separate public and private naming where trust boundaries differ
- Keep TTLs appropriate for the change rate of the record
- Treat authoritative DNS as critical infrastructure with backup and access control
## Pitfalls
- Reusing the same name for unrelated services over time
- Forgetting that split DNS can confuse troubleshooting if undocumented
- Leaving DNS ownership unclear across platforms and providers
- Building service dependencies on local `/etc/hosts` entries
## References
- [Cloudflare Learning Center: What is DNS?](https://www.cloudflare.com/learning/dns/what-is-dns/)
- [RFC 1034: Domain Concepts and Facilities](https://www.rfc-editor.org/rfc/rfc1034)
- [RFC 1035: Domain Implementation and Specification](https://www.rfc-editor.org/rfc/rfc1035)

---
title: Network Segmentation for Homelabs
description: Practical network segmentation patterns for separating trust zones in a homelab
tags:
- networking
- security
- homelab
category: networking
created: 2026-03-14
updated: 2026-03-14
---
# Network Segmentation for Homelabs
## Introduction
Network segmentation reduces blast radius by separating devices and services into smaller trust zones. In a homelab, this helps isolate management systems, user devices, public services, and less trusted endpoints such as IoT equipment.
## Purpose
Segmentation is useful for:
- Limiting lateral movement after a compromise
- Keeping management interfaces off general user networks
- Isolating noisy or untrusted devices
- Applying different routing, DNS, and firewall policies per zone
## Architecture Overview
A practical homelab usually benefits from separate L3 segments or VLANs for at least the following areas:
- Management: hypervisors, switches, storage admin interfaces
- Servers: application VMs, container hosts, databases
- Clients: laptops, desktops, mobile devices
- IoT: cameras, media devices, printers, controllers
- Guest: devices that should only reach the internet
- Storage or backup: optional dedicated replication path
Example layout:
```text
VLAN 10 Management 192.168.10.0/24
VLAN 20 Servers 192.168.20.0/24
VLAN 30 Clients 192.168.30.0/24
VLAN 40 IoT 192.168.40.0/24
VLAN 50 Guest 192.168.50.0/24
```
Traffic should pass through a firewall or router between zones instead of being bridged freely.
## Design Guidelines
### Segment by trust and function
Start with simple boundaries:
- High trust: management, backup, secrets infrastructure
- Medium trust: internal application servers
- Lower trust: personal devices, guest devices, consumer IoT
### Route between zones with policy
Use inter-VLAN routing with explicit firewall rules. Default deny between segments is easier to reason about than a flat network with ad hoc exceptions.
### Use DNS intentionally
- Give internal services stable names
- Avoid exposing management DNS records to guest or IoT segments
- Consider split DNS for remote access through Tailscale or another VPN
### Minimize overlap
Use clean RFC 1918 address plans and document them. Overlapping subnets complicate VPN routing, container networking, and future site expansion.
## Configuration Example
Example policy intent for a firewall:
```text
Allow Clients -> Servers : TCP 80,443
Allow Management -> Servers : any
Allow Servers -> Storage : TCP 2049,445,3260 as needed
Deny IoT -> Management : any
Deny Guest -> Internal RFC1918 ranges : any
```
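The policy intent above could be expressed as an nftables ruleset; this sketch uses illustrative interface names and assumes the firewall routes between the VLANs:

```text
table inet fw {
  chain forward {
    type filter hook forward priority 0; policy drop;
    iifname "vlan30" oifname "vlan20" tcp dport { 80, 443 } accept
    iifname "vlan10" accept comment "management reaches everything"
  }
}
```

The `policy drop` default matches the default-deny guideline: anything not explicitly allowed between segments is blocked.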
Example address planning notes:
```text
192.168.10.0/24 Management
192.168.20.0/24 Server workloads
192.168.30.0/24 User devices
192.168.40.0/24 IoT
192.168.50.0/24 Guest
fd00:10::/64 IPv6 management ULA
```
## Troubleshooting Tips
### Service works from one VLAN but not another
- Check the inter-VLAN firewall rule order
- Confirm DNS resolves to the intended internal address
- Verify the destination service is listening on the right interface
### VPN users can reach too much
- Review ACLs or firewall policy for routed VPN traffic
- Publish only the required subnets through subnet routers
- Avoid combining management and user services in the same routed segment
### Broadcast-dependent services break across segments
- Use unicast DNS or service discovery where possible
- For mDNS-dependent workflows, consider a reflector only where justified
- Do not flatten the network just to support one legacy discovery method
## Best Practices
- Keep management on its own segment from the beginning
- Treat IoT and guest networks as untrusted
- Document every VLAN, subnet, DHCP scope, and routing rule
- Prefer L3 policy enforcement over broad L2 access
- Revisit segmentation when new services expose public endpoints or remote admin paths
## References
- [RFC 1918: Address Allocation for Private Internets](https://www.rfc-editor.org/rfc/rfc1918)
- [RFC 4193: Unique Local IPv6 Unicast Addresses](https://www.rfc-editor.org/rfc/rfc4193)
- [Tailscale: Subnet routers](https://tailscale.com/kb/1019/subnets)
- [Tailscale: Access controls](https://tailscale.com/kb/1018/acls)

---
title: Tailscale Overview
description: Conceptual overview of how Tailscale works and where it fits in a homelab or engineering environment
tags:
- networking
- tailscale
- vpn
category: networking
created: 2026-03-14
updated: 2026-03-14
---
# Tailscale Overview
## Introduction
Tailscale is a mesh VPN built on WireGuard. It provides secure connectivity between devices without requiring a traditional hub-and-spoke VPN concentrator for day-to-day traffic. In practice, it is often used to reach homelab services, administrative networks, remote workstations, and private developer environments.
## Purpose
The main purpose of Tailscale is to make private networking easier to operate:
- Identity-based access instead of exposing services directly to the internet
- Encrypted device-to-device connectivity
- Simple onboarding across laptops, servers, phones, and virtual machines
- Optional features for routing subnets, advertising exit nodes, and publishing services
## Architecture Overview
Tailscale separates coordination from data transfer.
- Control plane: devices authenticate to Tailscale and exchange node information, keys, policy, and routing metadata
- Data plane: traffic is encrypted with WireGuard and sent directly between peers whenever possible
- Relay fallback: when direct peer-to-peer connectivity is blocked, traffic can traverse DERP relays
Typical flow:
```text
Client -> Tailscale control plane for coordination
Client <-> Peer direct WireGuard tunnel when possible
Client -> DERP relay -> Peer when direct connectivity is unavailable
```
Important components:
- Tailnet: the private network that contains your devices and policies
- ACLs or grants: rules that control which identities can reach which resources
- Tags: non-human identities for servers and automation
- MagicDNS: tailnet DNS names for easier service discovery
- Subnet routers: devices that advertise non-Tailscale LAN routes
- Exit nodes: devices that forward default internet-bound traffic
## Core Concepts
### Identity first
Tailscale access control is tied to users, groups, devices, and tags rather than only source IP addresses. This works well for environments where laptops move between networks and services are distributed across cloud and on-prem hosts.
### Peer-to-peer by default
When NAT traversal succeeds, traffic goes directly between devices. This reduces latency and avoids creating a permanent bottleneck on one VPN server.
### Overlay networking
Each device keeps its normal local network connectivity and also gains a Tailscale address space. This makes it useful for remote administration without redesigning the entire local network.
## Configuration Example
Install and authenticate a Linux node:
```bash
curl -fsSL https://tailscale.com/install.sh | sh
sudo tailscale up
tailscale status
```
Advertise the node as infrastructure with a tag:
```bash
sudo tailscale up --advertise-tags=tag:server
```
## Operational Notes
- Use ACLs or grants early instead of leaving the entire tailnet flat
- Use tags for servers, containers, and automation agents
- Prefer MagicDNS or split DNS over hard-coded IP lists
- Treat subnet routers and exit nodes as infrastructure roles with extra review
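A tailnet policy implementing these notes might look like the following hypothetical sketch, where the group membership, tag name, and ports are all illustrative:

```json
{
  "groups": {
    "group:admins": ["alice@example.com"]
  },
  "tagOwners": {
    "tag:server": ["group:admins"]
  },
  "acls": [
    { "action": "accept", "src": ["group:admins"], "dst": ["tag:server:22,443"] }
  ]
}
```

Because there is no broad any-to-any rule, devices outside `group:admins` cannot reach the tagged servers at all, which is the desired default posture.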
## Troubleshooting
### Device is connected but cannot reach another node
- Check whether ACLs or grants allow the connection
- Confirm the target device is online with `tailscale status`
- Verify the service is listening on the expected interface and port
### Traffic is slower than expected
- Confirm whether the connection is direct or using DERP
- Inspect firewall and NAT behavior on both sides
- Check whether the path crosses an exit node or subnet router unnecessarily
### DNS names do not resolve
- Verify MagicDNS is enabled
- Check the client resolver configuration
- Confirm the hostname exists in the tailnet admin UI
## Best Practices
- Use identity-based policies and avoid broad any-to-any access
- Separate human users from infrastructure with groups and tags
- Limit high-trust roles such as subnet routers and exit nodes
- Document which services are intended for tailnet-only access
- Keep the local firewall enabled; Tailscale complements it rather than replacing it
## References
- [Tailscale: What is Tailscale?](https://tailscale.com/kb/1151/what-is-tailscale)
- [Tailscale: How NAT traversal works](https://tailscale.com/blog/how-nat-traversal-works)
- [Tailscale: Access controls](https://tailscale.com/kb/1018/acls)
- [Tailscale: MagicDNS](https://tailscale.com/kb/1081/magicdns)

---
title: GPG Basics
description: Overview of core GnuPG concepts, key management, and common operational workflows
tags:
- security
- gpg
- encryption
category: security
created: 2026-03-14
updated: 2026-03-14
---
# GPG Basics
## Introduction
GPG, implemented by GnuPG, is used for public-key encryption, signing, and verification. It remains common for signing Git commits and tags, exchanging encrypted files, and maintaining long-term personal or team keys.
## Purpose
This document covers:
- What GPG keys and subkeys are
- Common encryption and signing workflows
- Key management practices that matter operationally
## Architecture Overview
A practical GPG setup often includes:
- Primary key: used mainly for certification and identity management
- Subkeys: used for signing, encryption, or authentication
- Revocation certificate: lets you invalidate a lost or compromised key
- Public key distribution: keyserver, WKD, or direct sharing
The primary key should be treated as more sensitive than everyday-use subkeys.
## Core Workflows
### Generate a key
Interactive generation:
```bash
gpg --full-generate-key
```
List keys:
```bash
gpg --list-secret-keys --keyid-format=long
```
### Export the public key
```bash
gpg --armor --export KEYID
```
### Encrypt a file for a recipient
```bash
gpg --encrypt --recipient KEYID secrets.txt
```
### Sign a file
```bash
gpg --detach-sign --armor release.tar.gz
```
### Verify a signature
```bash
gpg --verify release.tar.gz.asc release.tar.gz
```
## Configuration Example
Export a revocation certificate after key creation:
```bash
gpg --output revoke-KEYID.asc --gen-revoke KEYID
```
Store that revocation certificate offline in a secure location.
## Troubleshooting Tips
### Encryption works but trust warnings appear
- Confirm you imported the correct public key
- Verify fingerprints out of band before marking a key as trusted
- Do not treat keyserver availability as proof of identity
### Git signing fails
- Check that Git points to the expected key ID
- Confirm the GPG agent is running
- Verify terminal pinentry integration on the local system
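When the key ID is the problem, it usually shows up in Git configuration. This sketch wires Git to a signing key, where `KEYID` is a placeholder for your real key ID:

```shell
# Sketch: point Git at a specific signing key (KEYID is a placeholder)
git config --global user.signingkey KEYID
git config --global commit.gpgsign true
# Confirm what Git will actually use
git config --global user.signingkey
```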
### Lost laptop or corrupted keyring
- Restore from secure backups
- Revoke compromised keys if needed
- Reissue or rotate subkeys while keeping identity documentation current
## Best Practices
- Keep the primary key offline when practical and use subkeys day to day
- Generate and safely store a revocation certificate immediately
- Verify key fingerprints through a trusted secondary channel
- Back up secret keys securely before relying on them operationally
- Use GPG where it fits existing tooling; do not force it into workflows that are better served by simpler modern tools
## References
- [GnuPG Documentation](https://www.gnupg.org/documentation/)
- [The GNU Privacy Handbook](https://www.gnupg.org/gph/en/manual/book1.html)
- [GnuPG manual](https://www.gnupg.org/documentation/manuals/gnupg/)

---
title: Identity and Authentication
description: Core concepts and patterns for identity, authentication, and authorization in self-hosted systems
tags:
- security
- identity
- authentication
category: security
created: 2026-03-14
updated: 2026-03-14
---
# Identity and Authentication
## Summary
Identity and authentication define who or what is requesting access and how that claim is verified. In self-hosted environments, a clear identity model is essential for secure remote access, service-to-service trust, and administrative control.
## Why it matters
As environments grow, per-application local accounts become hard to manage and harder to audit. Shared identity patterns reduce duplicated credentials, improve MFA coverage, and make access revocation more predictable.
## Core concepts
- Identity: the user, service, or device being represented
- Authentication: proving that identity
- Authorization: deciding what the identity may do
- Federation: delegating identity verification to a trusted provider
- MFA: requiring more than one authentication factor
## Practical usage
Common self-hosted patterns include:
- Central identity provider for user login
- SSO using OIDC or SAML for web applications
- SSH keys or hardware-backed credentials for administrative access
- Service accounts with narrowly scoped machine credentials
Example pattern:
```text
User -> Identity provider -> OIDC token -> Reverse proxy or application
Admin -> VPN -> SSH key or hardware-backed credential -> Server
```
## Best practices
- Centralize user identity where possible
- Enforce MFA for admin and internet-facing accounts
- Separate human accounts from machine identities
- Review how account disablement or key rotation propagates across services
## Pitfalls
- Leaving critical systems on isolated local accounts with no lifecycle control
- Reusing the same credentials across multiple services
- Treating authentication and authorization as the same problem
- Forgetting account recovery and break-glass access paths
## References
- [OpenID Connect Core 1.0](https://openid.net/specs/openid-connect-core-1_0.html)
- [NIST Digital Identity Guidelines](https://pages.nist.gov/800-63-3/)
- [Yubico developer documentation](https://developers.yubico.com/)

---
title: Secrets Management
description: Principles and tool choices for handling secrets safely in self-hosted and engineering environments
tags:
- security
- secrets
- devops
category: security
created: 2026-03-14
updated: 2026-03-14
---
# Secrets Management
## Introduction
Secrets management is the practice of storing, distributing, rotating, and auditing sensitive values such as API tokens, database passwords, SSH private keys, and certificate material.
## Purpose
Good secrets management helps you:
- Keep credentials out of Git and chat logs
- Reduce accidental disclosure in deployment pipelines
- Rotate credentials without rewriting every system by hand
- Apply least privilege to applications and operators
## Architecture Overview
A practical secrets strategy distinguishes between:
- Human secrets: admin credentials, recovery codes, hardware token backups
- Machine secrets: database passwords, API tokens, TLS private keys
- Dynamic secrets: short-lived credentials issued on demand
- Encrypted configuration: secrets stored in version control in encrypted form
Common tooling patterns:
- Vault for centrally managed and dynamic secrets
- SOPS for Git-managed encrypted secret files
- Platform-native secret stores for specific runtimes
## Operational Model
### Centralized secret service
A service such as Vault handles storage, access policy, audit logging, and secret issuance. This is most useful when you need rotation, leasing, or many consumers across multiple environments.
### Encrypted files in Git
Tools such as SOPS allow you to keep encrypted configuration alongside deployment code. This is useful for small teams and GitOps-style workflows, as long as decryption keys are managed carefully.
### Runtime injection
Applications should receive secrets at runtime through a controlled delivery path rather than through hard-coded values inside images or repositories.
## Configuration Example
Example placeholder environment file layout:
```text
APP_DATABASE_URL=postgres://app:${DB_PASSWORD}@db.internal.example/app
APP_SMTP_PASSWORD=<provided-at-runtime>
```
Example SOPS-managed YAML structure:
```yaml
database:
user: app
password: ENC[AES256_GCM,data:...,type:str]
smtp:
password: ENC[AES256_GCM,data:...,type:str]
```
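SOPS decides what and how to encrypt based on a `.sops.yaml` policy file in the repository root. A minimal sketch, assuming age recipients (the recipient value is a placeholder):

```yaml
# .sops.yaml - repository encryption policy (illustrative values)
creation_rules:
  - path_regex: secrets/.*\.yaml$
    # Encrypt only values whose keys match; file structure stays readable
    encrypted_regex: ^(password|token|secret)$
    age: age1examplerecipientkeyplaceholder
```

With a rule like this, running `sops secrets/app.yaml` opens an editor and encrypts matching values on save, while non-sensitive fields remain plaintext for review.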
## Troubleshooting Tips
### Secret appears in logs or shell history
- Remove it from the source immediately if exposure is ongoing
- Rotate the credential instead of assuming it stayed private
- Review the delivery path that leaked it
### Encrypted config exists but deployments still fail
- Verify the deployment environment has access to the correct decryption keys
- Check whether placeholders or environment interpolation are incomplete
- Confirm the application reads secrets from the documented location
### Secret sprawl grows over time
- Inventory where secrets live and who owns them
- Standardize naming and rotation intervals
- Remove stale credentials from old hosts and repos
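An inventory pass can start as a crude grep over environment files for literal credential assignments. This is only a sketch (the pattern and demo file are illustrative; dedicated scanners are far more thorough):

```shell
# Create a small demo env file so the example is self-contained
demo=$(mktemp)
cat > "$demo" <<'EOF'
APP_BASE_URL=https://app.example.com
APP_SMTP_PASSWORD=plaintext-value
EOF

# Count lines assigning a literal value to a credential-looking variable;
# values starting with $ or < are treated as injected placeholders
hits=$(grep -cE '(PASSWORD|TOKEN|SECRET)=[^<$]' "$demo")
echo "suspicious lines: $hits"
rm -f "$demo"
```

Anything the scan flags should be rotated, not just deleted from the file.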
## Best Practices
- Never commit plaintext secrets to Git
- Prefer short-lived or scoped credentials where the platform supports them
- Separate secret storage from application images
- Rotate credentials after incidents, staff changes, and major platform migrations
- Document ownership, rotation method, and recovery path for every critical secret
## References
- [HashiCorp Vault: What is Vault?](https://developer.hashicorp.com/vault/docs/what-is-vault)
- [HashiCorp Vault documentation](https://developer.hashicorp.com/vault/docs)
- [SOPS documentation](https://getsops.io/docs/)
- [The Twelve-Factor App: Config](https://12factor.net/config)

---
title: Backup Architecture
description: Reference backup architecture for self-hosted services, data, and infrastructure components
tags:
- backup
- architecture
- self-hosting
category: systems
created: 2026-03-14
updated: 2026-03-14
---
# Backup Architecture
## Summary
A backup architecture defines what is protected, where copies live, and how recovery is validated. In self-hosted environments, the architecture must account for application data, infrastructure configuration, and the operational steps needed to restore service safely.
## Why it matters
Many backup failures are architectural rather than tool-specific. Storing copies on the wrong system, skipping configuration, or never testing restores can make an otherwise successful backup job useless during an incident.
## Core concepts
- Multiple copies across different failure domains
- Separation of live storage, backup storage, and off-site retention
- Consistent backups for databases and stateful services
- Restore validation as part of the architecture
## Practical usage
A practical backup architecture usually includes:
- Host or VM backups for infrastructure nodes
- File or repository backups for application data
- Separate backup of configuration, Compose files, and DNS or proxy settings
- Off-site encrypted copy of critical repositories
Example model:
```text
Primary workloads -> Local backup repository -> Off-site encrypted copy
Infrastructure config -> Git + encrypted secret store -> Off-site mirror
```
## Best practices
- Back up both data and the metadata needed to use it
- Keep at least one copy outside the main site or storage domain
- Use backup tooling that supports verification and restore inspection
- Make restore order and dependency assumptions explicit
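Restore validation can start very small. The sketch below flags a backup directory as stale when it contains nothing recent; the path and threshold are illustrative, and real repositories should also use their tooling's own verification (for example `restic check`):

```shell
# Self-contained demo: create a fake repository with one fresh file
repo=$(mktemp -d)
touch "$repo/snapshot-latest.tar"

# Flag the repository as stale when no file is newer than max_age_days
max_age_days=2
if find "$repo" -type f -mtime -"$max_age_days" | grep -q .; then
  status=OK
else
  status=STALE
fi
echo "backup freshness: $status"
rm -rf "$repo"
```

A check like this belongs in the monitoring stack, so a silently failing backup job becomes an alert instead of a surprise.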
## Pitfalls
- Treating snapshots as the only backup mechanism
- Backing up encrypted data without preserving key recovery paths
- Assuming application consistency without database-aware handling
- Skipping restore drills for high-value services
## References
- [restic documentation](https://restic.readthedocs.io/en/latest/)
- [BorgBackup documentation](https://borgbackup.readthedocs.io/en/stable/)
- [Proxmox VE Backup and Restore](https://pve.proxmox.com/pve-docs/chapter-vzdump.html)

---
title: Homelab Architecture
description: Reference architecture for building a maintainable homelab with clear trust zones and operational boundaries
tags:
- homelab
- architecture
- infrastructure
category: systems
created: 2026-03-14
updated: 2026-03-14
---
# Homelab Architecture
## Introduction
A homelab architecture should make experimentation possible without turning the environment into an undocumented collection of one-off systems. The most effective designs separate compute, networking, storage, identity, and operations concerns so each layer can evolve without breaking everything above it.
## Purpose
This document describes a reusable architecture for:
- Self-hosted services
- Virtualization and container workloads
- Secure remote access
- Monitoring, backup, and update workflows
## Architecture Overview
A practical homelab can be viewed as layered infrastructure:
```text
Edge and Access
-> ISP/router, firewall, VPN, reverse proxy
Network Segmentation
-> management, servers, clients, IoT, guest
Compute
-> Proxmox nodes, VMs, container hosts
Platform Services
-> DNS, reverse proxy, identity, secrets, service discovery
Application Services
-> dashboards, git forge, media, automation, monitoring
Data Protection
-> backups, snapshots, off-site copy, restore testing
```
## Recommended Building Blocks
### Access and identity
- VPN or zero-trust access layer for administrative entry
- SSH with keys only for infrastructure access
- DNS as the primary naming system for internal services
### Compute
- Proxmox for VM and LXC orchestration
- Dedicated container hosts for Docker or another runtime
- Utility VMs for DNS, reverse proxy, monitoring, and automation
### Storage
- Fast local storage for active workloads
- Separate backup target with different failure characteristics
- Clear distinction between snapshots and real backups
### Observability and operations
- Metrics collection with Prometheus-compatible exporters
- Dashboards and alerting through Grafana and Alertmanager
- Centralized backup jobs and restore validation
- Controlled update workflow for host OS, containers, and dependencies
## Example Layout
```text
VLAN 10 Management hypervisors, switches, storage admin
VLAN 20 Servers reverse proxy, app VMs, databases
VLAN 30 Clients desktops, laptops, admin workstations
VLAN 40 IoT cameras, smart home, media devices
VLAN 50 Guest internet-only devices
```
Service placement example:
- Reverse proxy and DNS on small utility VMs
- Stateful applications in dedicated VMs or clearly documented persistent containers
- Monitoring and backup services isolated from guest and IoT traffic
## Configuration Example
Example inventory model:
```yaml
edge:
router: gateway-01
vpn: tailscale
compute:
proxmox:
- pve-01
- pve-02
- pve-03
docker_hosts:
- docker-01
platform:
dns:
- dns-01
reverse_proxy:
- proxy-01
monitoring:
- mon-01
backup:
- backup-01
```
## Troubleshooting Tips
### Services are easy to deploy but hard to operate
- Add inventory, ownership, and restore notes
- Separate platform services from experimental application stacks
- Avoid hiding critical dependencies inside one large Compose file
### Changes in one area break unrelated systems
- Recheck network boundaries and shared credentials
- Remove unnecessary coupling between storage, reverse proxy, and app hosts
- Keep DNS, secrets, and backup dependencies explicit
### Remote access becomes risky over time
- Review which services are internet-exposed
- Prefer tailnet-only or VPN-only admin paths
- Keep management interfaces off user-facing networks
## Best Practices
- Design around failure domains, not only convenience
- Keep a small number of core platform services well documented
- Prefer simple, replaceable building blocks over fragile all-in-one stacks
- Maintain an asset inventory with hostnames, roles, and backup coverage
- Test recovery paths for DNS, identity, and backup infrastructure first
## References
- [Proxmox VE Administration Guide: Cluster Manager](https://pve.proxmox.com/pve-docs/chapter-pvecm.html)
- [Docker: Docker overview](https://docs.docker.com/get-started/docker-overview/)
- [Tailscale: What is Tailscale?](https://tailscale.com/kb/1151/what-is-tailscale)
- [Prometheus](https://prometheus.io/)

---
title: Homelab Network Architecture
description: Reference network architecture for a segmented homelab with private access and clear service boundaries
tags:
- homelab
- networking
- architecture
category: systems
created: 2026-03-14
updated: 2026-03-14
---
# Homelab Network Architecture
## Summary
A homelab network architecture should separate trust zones, keep administrative paths private, and make service traffic easy to reason about. The goal is not enterprise complexity, but a structure that reduces blast radius and operational confusion.
## Why it matters
Flat networks are easy to start with and difficult to secure later. A basic segmented design helps isolate management, servers, clients, guest devices, and less trusted endpoints such as IoT hardware.
## Core concepts
- Segmentation by trust and function
- Routed inter-VLAN policy instead of unrestricted layer-2 reachability
- Separate administrative access paths from public ingress
- DNS and reverse proxy as shared network-facing platform services
## Practical usage
Example logical layout:
```text
Management -> hypervisors, switches, storage admin
Servers -> applications, databases, utility VMs
Clients -> workstations and laptops
IoT -> low-trust devices
Guest -> internet-only access
VPN overlay -> remote access for administrators and approved services
```
This model works well with:
- A firewall or router handling inter-segment policy
- Private access through Tailscale or another VPN
- Reverse proxy entry points for published applications
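On a Linux-based firewall, the inter-segment policy above can be sketched with nftables. This is a simplified illustration under assumed interface names and ports, not a complete ruleset:

```text
table inet homelab {
  chain forward {
    type filter hook forward priority 0; policy drop;

    # Replies to established flows may return
    ct state established,related accept

    # Management reaches everything; clients reach published services only
    iifname "vlan10" accept
    iifname "vlan30" oifname "vlan20" tcp dport { 80, 443 } accept

    # IoT and guest segments get no forwarding into internal networks
  }
}
```

The default-drop forward policy is the key design choice: every inter-segment path must be stated explicitly, which keeps the ruleset documenting itself.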
## Best practices
- Keep management services on a dedicated segment
- Use DNS names and documented routes instead of ad hoc host entries
- Limit which segments can reach storage, backup, and admin systems
- Treat guest and IoT networks as untrusted
## Pitfalls
- Publishing management interfaces through the same path as public apps
- Allowing lateral access between all segments for convenience
- Forgetting to document routing and firewall dependencies
- Relying on multicast-based discovery across routed segments without a plan
## References
- [RFC 1918: Address Allocation for Private Internets](https://www.rfc-editor.org/rfc/rfc1918)
- [RFC 4193: Unique Local IPv6 Unicast Addresses](https://www.rfc-editor.org/rfc/rfc4193)
- [Tailscale: Subnet routers](https://tailscale.com/kb/1019/subnets)

---
title: Identity Management Patterns
description: System-level identity management patterns for self-hosted and homelab environments
tags:
- identity
- authentication
- architecture
category: systems
created: 2026-03-14
updated: 2026-03-14
---
# Identity Management Patterns
## Summary
Identity management patterns describe how users, devices, and services are authenticated and governed across a self-hosted environment. Strong patterns reduce credential sprawl and make account lifecycle management more consistent.
## Why it matters
As services multiply, local account management becomes a source of weak passwords, missed offboarding, and inconsistent MFA coverage. A system-level identity pattern helps centralize trust while preserving operational fallback paths.
## Core concepts
- Central identity provider for users
- Federated login to applications through OIDC or SAML
- Strong admin authentication for infrastructure access
- Separate handling for service accounts and machine credentials
## Practical usage
A practical identity pattern often looks like:
```text
Users -> Identity provider -> Web applications
Admins -> VPN + SSH key or hardware-backed credential -> Infrastructure
Services -> Scoped machine credentials -> Databases and APIs
```
Supporting services may include:
- MFA-capable identity provider
- Reverse proxy integration for auth-aware routing
- Secrets management for service credentials
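As one concrete shape of reverse proxy integration, a Caddy site block can delegate authentication to a forward-auth provider such as Authelia. This is a sketch under those assumptions; hostnames and ports are illustrative, and the verification endpoint path varies by Authelia version:

```text
app.example.com {
    forward_auth authelia:9091 {
        uri /api/verify?rd=https://auth.example.com
        copy_headers Remote-User Remote-Groups
    }
    reverse_proxy app-backend:3000
}
```

The proxy enforces login before requests reach the application, and the copied headers let auth-aware applications identify the user without their own login flow.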
## Best practices
- Centralize user login where applications support it
- Require MFA for administrative and internet-exposed access
- Keep service credentials scoped to one system or purpose
- Maintain documented break-glass and recovery procedures
## Pitfalls
- Treating shared admin accounts as acceptable long-term practice
- Leaving old local users in place after federation is introduced
- Using one service credential across many applications
- Forgetting to protect the identity provider as critical infrastructure
## References
- [OpenID Connect Core 1.0](https://openid.net/specs/openid-connect-core-1_0.html)
- [NIST Digital Identity Guidelines](https://pages.nist.gov/800-63-3/)
- [Yubico developer documentation](https://developers.yubico.com/)

---
title: Monitoring Stack Architecture
description: Reference architecture for a monitoring stack in a self-hosted or homelab environment
tags:
- monitoring
- observability
- architecture
category: systems
created: 2026-03-14
updated: 2026-03-14
---
# Monitoring Stack Architecture
## Summary
A monitoring stack architecture defines how metrics, probes, dashboards, and alerts fit together. In self-hosted environments, the stack should stay small enough to operate but broad enough to cover infrastructure, ingress, and critical services.
## Why it matters
Monitoring that is bolted on late often misses the services operators actually depend on. A planned stack architecture makes it easier to understand where signals come from and how alerts reach the right people.
## Core concepts
- Collection: exporters and scrape targets
- Storage and evaluation: Prometheus
- Visualization: Grafana
- Alert routing: Alertmanager
- External validation: blackbox or equivalent endpoint checks
## Practical usage
Typical architecture:
```text
Hosts and services -> Exporters / probes -> Prometheus
Prometheus -> Grafana dashboards
Prometheus -> Alertmanager -> notification channel
```
Recommended coverage:
- Host metrics for compute and storage systems
- Endpoint checks for user-facing services
- Backup freshness and certificate expiry
- Platform services such as DNS, reverse proxy, and identity provider
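The collection side of this architecture maps to a short Prometheus configuration. A minimal sketch covering host metrics and one external endpoint check (targets and job names are illustrative):

```yaml
# prometheus.yml fragment: scrape host metrics and probe an HTTPS endpoint
scrape_configs:
  - job_name: node
    static_configs:
      - targets: ["pve-01:9100", "docker-01:9100"]

  - job_name: blackbox-http
    metrics_path: /probe
    params:
      module: [http_2xx]
    static_configs:
      - targets: ["https://app.example.com"]
    relabel_configs:
      # Standard blackbox exporter pattern: the probed URL becomes a
      # parameter, and the scrape itself goes to the exporter
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: blackbox-exporter:9115
```

The relabeling indirection is what lets one blackbox exporter probe many endpoints on behalf of Prometheus.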
## Best practices
- Monitor the path users depend on, not only the host underneath it
- Keep the monitoring stack itself backed up and access controlled
- Alert on actionable failures rather than every threshold crossing
- Document ownership for critical alerts and dashboards
## Pitfalls
- Monitoring only CPU and memory while ignoring ingress and backups
- Running a complex stack with no retention or alert review policy
- Depending on dashboards alone for outage detection
- Forgetting to monitor the monitoring components themselves
## References
- [Prometheus overview](https://prometheus.io/docs/introduction/overview/)
- [Prometheus Alertmanager overview](https://prometheus.io/docs/alerting/latest/overview/)
- [Prometheus `node_exporter`](https://github.com/prometheus/node_exporter)
- [Grafana documentation](https://grafana.com/docs/grafana/latest/)

---
title: Docker Basics
description: Practical introduction to Docker images, containers, and everyday command-line workflows
tags:
- containers
- docker
- linux
category: containers
created: 2026-03-14
updated: 2026-03-14
---
# Docker Basics
## Introduction
Docker packages applications and their dependencies into images that run as isolated containers. For homelab and developer workflows, it is commonly used to deploy repeatable services without building a full virtual machine for each workload.
## Purpose
Docker is useful when you need:
- Repeatable application packaging
- Simple local development environments
- Fast service deployment on Linux hosts
- Clear separation between host OS and application runtime
## Architecture Overview
Core Docker concepts:
- Image: immutable application package template
- Container: running instance of an image
- Registry: source for pulling and pushing images
- Volume: persistent storage outside the writable container layer
- Network: connectivity boundary for one or more containers
Typical flow:
```text
Dockerfile -> Image -> Registry or local cache -> Container runtime
```
## Step-by-Step Guide
### 1. Verify Docker is installed
```bash
docker version
docker info
```
### 2. Pull and run a container
```bash
docker pull nginx:stable
docker run -d --name web -p 8080:80 nginx:stable
```
### 3. Inspect the running container
```bash
docker ps
docker logs web
docker exec -it web sh
```
### 4. Stop and remove it
```bash
docker stop web
docker rm web
```
## Configuration Example
Run a service with a persistent named volume:
```bash
docker volume create app-data
docker run -d \
--name app \
-p 3000:3000 \
-v app-data:/var/lib/app \
ghcr.io/example/app:latest
```
Inspect resource usage:
```bash
docker stats
```
## Troubleshooting Tips
### Container starts and exits immediately
- Check `docker logs <container>`
- Verify the image's default command is valid
- Confirm required environment variables or mounted files exist
### Port publishing does not work
- Verify the service is listening inside the container
- Confirm the host port is not already in use
- Check host firewall rules
### Data disappears after recreation
- Use a named volume or bind mount instead of the writable container layer
- Confirm the application writes data to the mounted path
## Best Practices
- Pin images to a known tag and update intentionally
- Use named volumes for application state
- Prefer non-root containers when supported by the image
- Keep containers single-purpose and externalize configuration
- Use Compose for multi-service stacks instead of long `docker run` commands
## References
- [Docker: Docker overview](https://docs.docker.com/get-started/docker-overview/)
- [Docker: Get started](https://docs.docker.com/get-started/)
- [Docker: Volumes](https://docs.docker.com/engine/storage/volumes/)

---
title: Docker Compose Patterns
description: Reusable patterns for structuring Docker Compose applications in homelab and development environments
tags:
- containers
- docker
- compose
category: containers
created: 2026-03-14
updated: 2026-03-14
---
# Docker Compose Patterns
## Introduction
Docker Compose defines multi-container applications in a single declarative file. It is a good fit for homelab stacks, local development, and small self-hosted services that do not require a full orchestrator.
## Purpose
Compose helps when you need:
- Repeatable service definitions
- Shared networks and volumes for a stack
- Environment-specific overrides
- A clear deployment artifact that can live in Git
## Architecture Overview
A Compose application usually includes:
- One or more services
- One or more shared networks
- Persistent volumes
- Environment variables and mounted configuration
- Optional health checks and startup dependencies
## Step-by-Step Guide
### 1. Start with a minimal Compose file
```yaml
services:
app:
image: ghcr.io/example/app:1.2.3
ports:
- "8080:8080"
```
Start it:
```bash
docker compose up -d
docker compose ps
```
### 2. Add persistent storage and configuration
```yaml
services:
app:
image: ghcr.io/example/app:1.2.3
ports:
- "8080:8080"
environment:
APP_BASE_URL: "https://app.example.com"
volumes:
- app-data:/var/lib/app
volumes:
app-data:
```
### 3. Add dependencies with health checks
```yaml
services:
db:
image: postgres:16
environment:
POSTGRES_DB: app
POSTGRES_USER: app
POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
healthcheck:
test: ["CMD-SHELL", "pg_isready -U app"]
interval: 10s
timeout: 5s
retries: 5
volumes:
- db-data:/var/lib/postgresql/data
app:
image: ghcr.io/example/app:1.2.3
depends_on:
db:
condition: service_healthy
environment:
DATABASE_URL: postgres://app:${POSTGRES_PASSWORD}@db:5432/app
ports:
- "8080:8080"
volumes:
db-data:
```
## Common Patterns
### Use one project directory per stack
Keep the Compose file, `.env` example, and mounted config together in one directory.
### Use user-defined networks
Private internal services should communicate over Compose networks rather than the host network.
### Prefer explicit volumes
Named volumes are easier to back up and document than anonymous ones.
### Use profiles for optional services
Profiles are useful for dev-only services, one-shot migration jobs, or optional observability components.
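A profile assignment looks like this; the service only starts when its profile is explicitly requested (service names are illustrative):

```yaml
services:
  app:
    image: ghcr.io/example/app:1.2.3
  debug-toolbox:
    image: busybox:stable
    profiles: ["debug"]
```

`docker compose up -d` starts only `app`, while `docker compose --profile debug up -d` also starts `debug-toolbox`.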
## Troubleshooting Tips
### Services start in the wrong order
- Use health checks instead of only container start order
- Ensure the application retries database or dependency connections
### Configuration drift between hosts
- Commit the Compose file to Git
- Keep secrets out of the file and inject them separately
- Avoid host-specific bind mount paths when portability matters
### Containers cannot resolve each other
- Check that the services share the same Compose network
- Use the service name as the hostname
- Verify the application is not hard-coded to `localhost`
## Best Practices
- Omit the deprecated top-level `version` field in new Compose files
- Keep secrets outside the Compose YAML when possible
- Pin images to intentional versions
- Use health checks for stateful dependencies
- Treat Compose as deployment code and review changes like application code
## References
- [Docker: Compose file reference](https://docs.docker.com/reference/compose-file/)
- [Docker: Compose application model](https://docs.docker.com/compose/intro/compose-application-model/)
- [Docker: Control startup and shutdown order in Compose](https://docs.docker.com/compose/how-tos/startup-order/)
- [Compose Specification](https://compose-spec.io/)

---
title: Tailscale Exit Nodes
description: Guide to publishing and using Tailscale exit nodes for internet-bound traffic
tags:
- networking
- tailscale
- vpn
category: networking
created: 2026-03-14
updated: 2026-03-14
---
# Tailscale Exit Nodes
## Introduction
An exit node is a Tailscale device that other tailnet devices can route their internet-bound traffic through. When a client enables one, its internet-bound traffic leaves through that node instead of the client's local network.
## Purpose
Exit nodes are commonly used for:
- Secure browsing on untrusted networks
- Reaching the internet through a trusted home or lab connection
- Testing geo-dependent behavior from another site
- Concentrating egress through a monitored network path
## Architecture Overview
With an exit node, the selected client sends default-route traffic through Tailscale to the exit node, which then forwards it to the public internet.
```text
Client -> Tailscale tunnel -> Exit node -> Internet
```
Important implications:
- The exit node becomes part of the trust boundary
- Bandwidth, DNS behavior, and logging depend on the exit node's network
- Local LAN access on the client may need to be explicitly enabled
## Step-by-Step Guide
### 1. Prepare the exit node host
Choose a stable host with sufficient upstream bandwidth and a network path you trust. Typical choices are a home server, small VPS, or a utility VM.
### 2. Advertise the node as an exit node
On the node:
```bash
sudo tailscale up --advertise-exit-node
```
You can combine this with tags:
```bash
sudo tailscale up --advertise-exit-node --advertise-tags=tag:exit-node
```
### 3. Approve or review the role
Approve the exit node in the admin console if required by policy. Restrict who can use it with ACLs or grants.
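In the ACL policy, exit-node usage is expressed as access to `autogroup:internet`. A sketch restricting it to one group (the group name is illustrative):

```json
{
  "acls": [
    {
      "action": "accept",
      "src": ["group:roaming-laptops"],
      "dst": ["autogroup:internet:*"]
    }
  ]
}
```

Without a rule granting `autogroup:internet`, devices cannot use the exit node even after it is approved.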
### 4. Select the exit node on a client
From a client, choose the exit node in the Tailscale UI or configure it from the CLI:
```bash
sudo tailscale up --exit-node=<exit-node-name-or-ip>
```
If the client still needs to reach the local LAN directly, enable local LAN access in the client configuration or UI (on Linux, the `--exit-node-allow-lan-access` flag).
## Configuration Example
Example for a dedicated Linux exit node:
```bash
sudo tailscale up \
--advertise-exit-node \
--advertise-tags=tag:exit-node
```
Client-side example:
```bash
sudo tailscale up --exit-node=home-gateway
curl https://ifconfig.me
```
## Troubleshooting Tips
### Internet access stops after selecting the exit node
- Confirm the exit node is online in `tailscale status`
- Verify the exit node host itself has working internet access
- Check the exit node's local firewall and forwarding configuration
### Local printers or NAS become unreachable
- Enable local LAN access on the client if that behavior is required
- Split administrative traffic from internet egress if the use case is mixed
### Performance is poor
- Verify the client is using a nearby and healthy exit node
- Check the exit node's CPU, uplink bandwidth, and packet loss
- Avoid placing an exit node behind overloaded or unstable consumer hardware
## Best Practices
- Use exit nodes for specific trust and egress requirements, not as a default for every device
- Restrict usage to approved groups or devices
- Keep exit nodes patched because they handle broad traffic scopes
- Log and monitor egress hosts like any other shared network gateway
- Separate personal browsing, admin traffic, and production service egress when the risk model requires it
## References
- [Tailscale: Exit nodes](https://tailscale.com/kb/1103/exit-nodes)
- [Tailscale: What is Tailscale?](https://tailscale.com/kb/1151/what-is-tailscale)
- [Tailscale: Access controls](https://tailscale.com/kb/1018/acls)

---
title: Tailscale Subnet Routing
description: Guide to publishing LAN subnets into a Tailscale tailnet with subnet routers
tags:
- networking
- tailscale
- routing
category: networking
created: 2026-03-14
updated: 2026-03-14
---
# Tailscale Subnet Routing
## Introduction
Subnet routing allows Tailscale clients to reach devices that are not running the Tailscale agent directly. This is useful for printers, storage appliances, hypervisors, IoT controllers, and legacy systems on a homelab LAN.
## Purpose
Use subnet routing when:
- A device cannot run the Tailscale client
- A full site-to-site VPN is unnecessary
- Remote users need access to one or more internal networks
- You want to publish access to a specific VLAN without exposing the entire environment
## Architecture Overview
A subnet router is a Tailscale node with IP forwarding enabled. It advertises one or more LAN prefixes to the tailnet.
```text
Remote client -> Tailscale tunnel -> Subnet router -> LAN target
```
Recommended placement:
- One router per routed network or security zone
- Prefer stable hosts such as small Linux VMs, routers, or dedicated utility nodes
- Apply restrictive ACLs so only approved identities can use the route
## Step-by-Step Guide
### 1. Prepare the router host
Install Tailscale on a Linux host that already has reachability to the target subnet.
Enable IPv4 forwarding:
```bash
echo 'net.ipv4.ip_forward = 1' | sudo tee /etc/sysctl.d/99-tailscale.conf
sudo sysctl --system
```
If the subnet is IPv6-enabled, also enable IPv6 forwarding:
```bash
echo 'net.ipv6.conf.all.forwarding = 1' | sudo tee -a /etc/sysctl.d/99-tailscale.conf
sudo sysctl --system
```
### 2. Advertise the subnet
Start Tailscale and advertise the route:
```bash
sudo tailscale up --advertise-routes=192.168.10.0/24
```
Multiple routes can be advertised as a comma-separated list:
```bash
sudo tailscale up --advertise-routes=192.168.10.0/24,192.168.20.0/24
```
### 3. Approve the route
Approve the advertised route in the Tailscale admin console, or pre-authorize it with `autoApprovers` if that matches your policy model.
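With `autoApprovers`, routes advertised by appropriately tagged nodes are approved automatically instead of waiting for a manual console step. A policy fragment sketch:

```json
{
  "autoApprovers": {
    "routes": {
      "192.168.10.0/24": ["tag:subnet-router"]
    }
  }
}
```

Auto-approval trades review friction for speed, so scope it to specific prefixes and tags rather than broad ranges.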
### 4. Restrict access
Use ACLs or grants so only the necessary users or tagged devices can reach the routed subnet.
Example policy intent:
- `group:admins` can reach `192.168.10.0/24`
- `group:developers` can only reach specific hosts or ports
- IoT and management subnets require separate approval
## Configuration Example
Example server-side command for a dedicated subnet router:
```bash
sudo tailscale up \
--advertise-routes=192.168.10.0/24 \
--advertise-tags=tag:subnet-router
```
Example policy idea:
```json
{
"tagOwners": {
"tag:subnet-router": ["group:admins"]
}
}
```
## Troubleshooting Tips
### Clients can see the route but cannot reach hosts
- Verify IP forwarding is enabled on the router
- Confirm local firewall rules permit forwarding traffic
- Make sure the router has normal LAN connectivity to the destination hosts
- Check whether the destination host has a host firewall blocking the source
### Route does not appear in the tailnet
- Confirm the router is online in `tailscale status`
- Check that the route was approved in the admin console
- Review whether policy requires a specific tag owner or auto-approval
### Asymmetric routing or reply failures
- Make sure the subnet router is in the normal return path for the destination subnet
- Avoid overlapping subnets across multiple sites unless routing precedence is intentional
- Do not advertise broad prefixes when a narrower one is sufficient
## Best Practices
- Advertise the smallest subnet that solves the use case
- Run subnet routers on stable infrastructure, not laptops
- Use separate routers for management and user-facing networks where possible
- Combine routing with ACLs; route advertisement alone is not authorization
- Monitor route health and document ownership of every advertised prefix
## References
- [Tailscale: Subnet routers](https://tailscale.com/kb/1019/subnets)
- [Tailscale: Access controls](https://tailscale.com/kb/1018/acls)
- [Tailscale: Policy file syntax](https://tailscale.com/kb/1337/policy-syntax)
---
title: SSH Hardening
description: Practical SSH server hardening guidance for Linux systems in homelab and self-hosted environments
tags:
- security
- ssh
- linux
category: security
created: 2026-03-14
updated: 2026-03-14
---
# SSH Hardening
## Introduction
SSH is the primary administrative entry point for many Linux systems. Hardening it reduces the likelihood of credential attacks, accidental privilege exposure, and overly broad remote access.
## Purpose
This guide focuses on making SSH safer by:
- Disabling weak authentication paths
- Reducing exposure to brute-force attacks
- Limiting which users can log in
- Preserving maintainability by relying on modern OpenSSH defaults where possible
## Architecture Overview
SSH hardening has three layers:
- Transport and daemon configuration
- Network exposure and firewall policy
- Operational practices such as key handling and logging
For most self-hosted systems, the best model is:
```text
Admin workstation -> VPN or trusted network -> SSH server
```
## Step-by-Step Guide
### 1. Use key-based authentication
Generate a key on the client and copy the public key to the server:
```bash
ssh-keygen -t ed25519 -C "admin@example.com"
ssh-copy-id admin@server.example
```
### 2. Harden `sshd_config`
Baseline example:
```text
PermitRootLogin no
PasswordAuthentication no
KbdInteractiveAuthentication no
PubkeyAuthentication yes
MaxAuthTries 3
LoginGraceTime 30
X11Forwarding no
AllowTcpForwarding no
AllowAgentForwarding no
AllowUsers admin
```
If you need port forwarding for a specific workflow, enable it deliberately instead of leaving it broadly available.
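One way to scope such an exception is a `Match` block at the end of `sshd_config`; the `deploy` user and forwarding target below are hypothetical examples:

```text
Match User deploy
    AllowTcpForwarding yes
    PermitOpen 127.0.0.1:5432
```

A `Match` block keeps the global `AllowTcpForwarding no` intact while carving out exactly one permitted destination for one account.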
### 3. Validate the configuration
```bash
sudo sshd -t
```
### 4. Reload safely
Keep an existing SSH session open while reloading:
```bash
sudo systemctl reload sshd
```
The service unit name varies by distribution and may be `ssh` or `sshd`.
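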
### 5. Restrict network exposure
- Prefer VPN-only or management-VLAN-only access
- Allow SSH from trusted subnets only
- Do not expose SSH publicly unless it is necessary and monitored
## Configuration Example
Example host firewall intent:
```text
Allow TCP 22 from 192.168.10.0/24
Allow TCP 22 from Tailscale tailnet range
Deny TCP 22 from all other sources
```
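As one concrete interpretation, the same intent in nftables rule syntax might look like this; the second source is the default Tailscale CGNAT block (100.64.0.0/10) and should be checked against your tailnet's actual addressing:

```text
tcp dport 22 ip saddr { 192.168.10.0/24, 100.64.0.0/10 } accept
tcp dport 22 drop
```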
## Troubleshooting Tips
### Locked out after config change
- Keep the original session open until a new login succeeds
- Validate the daemon config with `sshd -t`
- Check the service name and logs with `journalctl -u sshd` or `journalctl -u ssh`
### Key authentication fails
- Check file permissions on `~/.ssh` and `authorized_keys`
- Confirm the server allows public key authentication
- Verify the client is offering the intended key with `ssh -v`
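OpenSSH refuses keys when `~/.ssh` or `authorized_keys` are too permissive. The expected modes can be demonstrated in a scratch directory before applying the same `chmod` calls to the real paths:

```bash
# Demonstrate the expected permissions in a throwaway directory;
# on a real host, apply the same modes to ~/.ssh and ~/.ssh/authorized_keys
demo=$(mktemp -d)
mkdir -p "$demo/.ssh"
touch "$demo/.ssh/authorized_keys"
chmod 700 "$demo/.ssh"                   # directory: owner-only access
chmod 600 "$demo/.ssh/authorized_keys"   # file: owner read/write only
stat -c '%a %n' "$demo/.ssh" "$demo/.ssh/authorized_keys"
```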
### Automation jobs break
- Review whether the workload depended on password auth, port forwarding, or agent forwarding
- Create narrowly scoped exceptions rather than reverting the whole hardening change
## Best Practices
- Rely on current OpenSSH defaults for ciphers and algorithms unless you have a specific compliance need
- Disable password-based interactive logins on internet-reachable systems
- Use individual user accounts and `sudo` instead of direct root SSH
- Combine SSH hardening with network-level restrictions
- Review SSH logs regularly on administrative systems
## References
- [OpenBSD `sshd_config` manual](https://man.openbsd.org/sshd_config)
- [OpenSSH](https://www.openssh.com/)
- [Mozilla OpenSSH guidelines](https://infosec.mozilla.org/guidelines/openssh)
---
title: YubiKey Usage
description: Guide to using a YubiKey for SSH, authentication, and key protection in self-hosted environments
tags:
- security
- yubikey
- ssh
category: security
created: 2026-03-14
updated: 2026-03-14
---
# YubiKey Usage
## Introduction
A YubiKey is a hardware token that can protect authentication and cryptographic operations. In homelab and engineering workflows, it is commonly used for MFA, SSH keys, and protection of GPG subkeys.
## Purpose
Use a YubiKey when you want:
- Stronger authentication than password-only login
- Private keys that require physical presence
- Portable hardware-backed credentials for administrative access
## Architecture Overview
YubiKeys can be used through different interfaces:
- FIDO2 or WebAuthn: MFA and modern hardware-backed authentication
- OpenSSH security keys: SSH keys such as `ed25519-sk`
- OpenPGP applet: card-resident GPG subkeys
- PIV: smart-card style certificate workflows
Choose the interface based on the workflow instead of trying to use one mode for everything.
## Step-by-Step Guide
### 1. Use the key for MFA first
Register the YubiKey with identity providers and critical services before moving on to SSH or GPG workflows.
### 2. Create a hardware-backed SSH key
On a system with OpenSSH support for security keys:
```bash
ssh-keygen -t ed25519-sk -C "admin@example.com"
```
This creates a key pair whose private-key operations require the physical token; the file written to disk is only a key handle and cannot be used without the hardware.
### 3. Install the public key on servers
```bash
ssh-copy-id -i ~/.ssh/id_ed25519_sk.pub admin@server.example
```
### 4. Test login
```bash
ssh admin@server.example
```
Expect a touch prompt when required by the device policy.
## Configuration Example
Example client SSH config for a dedicated administrative target:
```text
Host lab-admin
HostName server.example
User admin
IdentityFile ~/.ssh/id_ed25519_sk
```
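If the client offers several other keys before the security-key identity, adding `IdentitiesOnly yes` to the same host block restricts authentication attempts to the configured key:

```text
Host lab-admin
    HostName server.example
    User admin
    IdentityFile ~/.ssh/id_ed25519_sk
    IdentitiesOnly yes
```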
For GPG workflows, move only subkeys onto the YubiKey and keep the primary key offline when possible.
## Troubleshooting Tips
### The key is not detected
- Confirm USB or NFC access is available
- Check whether another smart-card daemon has locked the device
- Verify the client OS has support for the intended mode
### SSH prompts repeatedly or fails
- Make sure the correct public key is installed on the server
- Confirm the client is offering the security-key identity
- Check that the OpenSSH version supports the selected key type
### GPG or smart-card workflows are inconsistent
- Verify which YubiKey applet is in use
- Avoid mixing PIV and OpenPGP instructions unless the workflow requires both
- Keep backup tokens or recovery paths for administrative access
## Best Practices
- Use the YubiKey as part of a broader account recovery plan, not as the only path back in
- Keep at least one spare token for high-value admin accounts
- Prefer hardware-backed SSH keys for administrator accounts
- Document which services rely on the token and how recovery works
- Separate MFA usage from certificate and signing workflows unless there is a clear operational reason to combine them
## References
- [Yubico: SSH](https://developers.yubico.com/SSH/)
- [Yubico: YubiKey and OpenPGP](https://developers.yubico.com/PGP/)
- [Yubico developer documentation](https://developers.yubico.com/)
---
title: Backup Strategies
description: Practical backup strategy guidance for self-hosted services, containers, and virtualized homelabs
tags:
- backup
- self-hosting
- operations
category: self-hosting
created: 2026-03-14
updated: 2026-03-14
---
# Backup Strategies
## Introduction
Backups protect against deletion, corruption, hardware failure, ransomware, and operational mistakes. In self-hosted environments, a backup strategy should cover both data and the information needed to restore services correctly.
## Purpose
This guide covers:
- What to back up
- How often to back it up
- Where to store copies
- How to validate restore readiness
## Architecture Overview
A good strategy includes:
- Primary data backups
- Configuration and infrastructure backups
- Off-site or offline copies
- Restore testing
The 3-2-1 rule is a strong baseline:
- 3 copies of data
- 2 different media or storage systems
- 1 copy off-site
For higher assurance, extend this toward 3-2-1-1-0: add one immutable or offline copy and verify backups restore with zero errors.
## Step-by-Step Guide
### 1. Inventory what matters
Back up:
- Databases
- Application data directories
- Compose files and infrastructure code
- DNS, reverse proxy, and secrets configuration
- Hypervisor or VM backup metadata
### 2. Choose backup tools by workload
- File-level backups: restic, Borg, rsync-based workflows
- VM backups: hypervisor-integrated backup jobs
- Database-aware backups: logical dumps or physical backup tools where needed
### 3. Schedule and retain intelligently
Use a retention policy that matches recovery needs. Short retention for frequent snapshots and longer retention for off-site backups is common.
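With restic, such a policy can be expressed with `forget` and pruning in one step; the keep counts here are illustrative, not a recommendation:

```bash
restic forget --keep-daily 7 --keep-weekly 4 --keep-monthly 6 --prune
```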
### 4. Test restores
Backups are incomplete until you can restore and start the service successfully.
## Configuration Example
Restic backup example:
```bash
export RESTIC_REPOSITORY=/backup/restic
export RESTIC_PASSWORD_FILE=/run/secrets/restic_password
restic backup /srv/app-data /srv/compose
restic snapshots
```
Example restore check:
```bash
restic restore latest --target /tmp/restore-check
```
## Troubleshooting Tips
### Backups exist but restores are incomplete
- Confirm databases were backed up with an application-consistent method rather than copied mid-write
- Verify application config and secret material were included
- Check permissions and ownership in the restored data
### Repository size grows too quickly
- Review retention rules and pruning behavior
- Exclude caches, transient files, and rebuildable artifacts
- Split hot data from archival data if retention needs differ
### Backups run but nobody notices failures
- Alert on backup freshness and last successful run
- Record the restore procedure for each critical service
- Test restores on a schedule, not only after incidents
## Best Practices
- Back up both data and the configuration needed to use it
- Keep at least one copy outside the main failure domain
- Prefer encrypted backup repositories for off-site storage
- Automate backup jobs and monitor their success
- Practice restores for your most important services first
## References
- [restic documentation](https://restic.readthedocs.io/en/latest/)
- [BorgBackup documentation](https://borgbackup.readthedocs.io/en/stable/)
- [Proxmox VE Backup and Restore](https://pve.proxmox.com/pve-docs/chapter-vzdump.html)
---
title: Service Monitoring
description: Guide to building a basic monitoring stack for self-hosted services and infrastructure
tags:
- monitoring
- self-hosting
- observability
category: self-hosting
created: 2026-03-14
updated: 2026-03-14
---
# Service Monitoring
## Introduction
Monitoring turns a self-hosted environment from a collection of services into an operable system. At minimum, that means collecting metrics, checking service availability, and alerting on failures that need human action.
## Purpose
This guide focuses on:
- Host and service metrics
- Uptime checks
- Dashboards and alerting
- Monitoring coverage for common homelab services
## Architecture Overview
A small monitoring stack often includes:
- Prometheus for scraping metrics
- Exporters such as `node_exporter` for host metrics
- Blackbox probing for endpoint availability
- Grafana for dashboards
- Alertmanager for notifications
Typical flow:
```text
Exporter or target -> Prometheus -> Grafana dashboards
Prometheus alerts -> Alertmanager -> notification channel
```
## Step-by-Step Guide
### 1. Start with host metrics
Install `node_exporter` on important Linux hosts or run it in a controlled containerized setup.
### 2. Scrape targets from Prometheus
Example scrape config:
```yaml
scrape_configs:
- job_name: node
static_configs:
- targets:
- "server-01.internal.example:9100"
- "server-02.internal.example:9100"
```
### 3. Add endpoint checks
Use a blackbox probe or equivalent to test HTTPS and TCP reachability for user-facing services.
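With the Prometheus blackbox exporter, this is usually wired up through the standard `/probe` relabeling pattern; the exporter address and target URL below are assumptions to adapt:

```yaml
scrape_configs:
  - job_name: blackbox-https
    metrics_path: /probe
    params:
      module: [http_2xx]
    static_configs:
      - targets:
          - "https://app.internal.example"
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: "blackbox-exporter.internal.example:9115"
```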
### 4. Add dashboards and alerts
Alert only on conditions that require action, such as:
- Host down
- Disk nearly full
- Backup job missing
- TLS certificate near expiry
## Configuration Example
Example alert concept:
```yaml
groups:
- name: infrastructure
rules:
- alert: HostDown
expr: up == 0
for: 5m
labels:
severity: critical
```
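A "disk nearly full" condition can be sketched the same way using standard `node_exporter` filesystem metrics; the 10% threshold is an example value:

```yaml
groups:
  - name: capacity
    rules:
      - alert: DiskAlmostFull
        expr: (node_filesystem_avail_bytes{fstype!~"tmpfs|overlay"} / node_filesystem_size_bytes) < 0.10
        for: 15m
        labels:
          severity: warning
```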
## Troubleshooting Tips
### Metrics are missing for one host
- Check exporter health on that host
- Confirm firewall rules allow scraping
- Verify the target name and port in the Prometheus config
### Alerts are noisy
- Add `for` durations to avoid alerting on short blips
- Remove alerts that never trigger action
- Tune thresholds per service class rather than globally
### Dashboards look healthy while the service is down
- Add blackbox checks in addition to internal metrics
- Monitor the reverse proxy or external entry point, not only the app process
- Track backups and certificate expiry separately from CPU and RAM
## Best Practices
- Monitor the services users depend on, not only the hosts they run on
- Keep alert volume low enough that alerts remain meaningful
- Document the owner and response path for each critical alert
- Treat backup freshness and certificate expiry as first-class signals
- Start simple, then add coverage where operational pain justifies it
## References
- [Prometheus overview](https://prometheus.io/docs/introduction/overview/)
- [Prometheus Alertmanager overview](https://prometheus.io/docs/alerting/latest/overview/)
- [Prometheus `node_exporter`](https://github.com/prometheus/node_exporter)
- [Grafana documentation](https://grafana.com/docs/grafana/latest/)
---
title: Update Management
description: Practical update management for Linux hosts, containers, and self-hosted services
tags:
- updates
- patching
- self-hosting
category: self-hosting
created: 2026-03-14
updated: 2026-03-14
---
# Update Management
## Introduction
Update management keeps systems secure and supportable without turning every patch cycle into an outage. In self-hosted environments, the challenge is balancing security, uptime, and limited operator time.
## Purpose
This guide focuses on:
- Operating system updates
- Container and dependency updates
- Scheduling, staging, and rollback planning
## Architecture Overview
A practical update process has four layers:
- Inventory: know what you run
- Detection: know when updates are available
- Deployment: apply updates in a controlled order
- Validation: confirm services still work
## Step-by-Step Guide
### 1. Separate systems by risk
Create update rings such as:
- Ring 1: non-critical test systems
- Ring 2: internal services
- Ring 3: critical stateful services and edge entry points
### 2. Automate security updates where safe
For Linux hosts, automated security updates can reduce patch delay for low-risk packages. Review distribution guidance and keep reboots controlled.
### 3. Automate update discovery
Use tools that open reviewable pull requests or dashboards for:
- Container image updates
- Dependency updates
- Operating system patch reporting
### 4. Validate after rollout
Confirm:
- Service health
- Reverse proxy reachability
- Backup jobs
- Monitoring and alerting
## Configuration Example
Ubuntu unattended upgrades example:
```text
APT::Periodic::Update-Package-Lists "1";
APT::Periodic::Unattended-Upgrade "1";
```
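Reboot behavior is controlled separately in `50unattended-upgrades`; for example, to keep reboots manual, or to pin automatic reboots to a maintenance window:

```text
Unattended-Upgrade::Automatic-Reboot "false";
// Or allow automatic reboots only at a fixed time:
// Unattended-Upgrade::Automatic-Reboot "true";
// Unattended-Upgrade::Automatic-Reboot-Time "03:30";
```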
Dependency update automation example:
```json
{
"extends": ["config:recommended"],
"schedule": ["before 5am on monday"],
"packageRules": [
{
"matchUpdateTypes": ["major"],
"automerge": false
}
]
}
```
## Troubleshooting Tips
### Updates are applied but regressions go unnoticed
- Add post-update health checks
- Review dashboards and key alerts after patch windows
- Keep rollback or restore steps documented for stateful services
### Too many update notifications create fatigue
- Group low-risk updates into maintenance windows
- Separate critical security issues from routine version bumps
- Use labels or dashboards to prioritize by service importance
### Containers stay outdated even though automation exists
- Verify image digests and registry visibility
- Confirm the deployment process actually recreates containers after image updates
- Prefer reviewed rebuild and redeploy workflows over blind runtime mutation for important services
## Best Practices
- Patch internet-exposed and admin-facing services first
- Stage risky or major updates through lower-risk environments
- Prefer reviewable dependency automation over silent uncontrolled updates
- Keep maintenance windows small and predictable
- Document rollback expectations before making large version jumps
## References
- [Ubuntu Community Help Wiki: Automatic Security Updates](https://help.ubuntu.com/community/AutomaticSecurityUpdates)
- [Debian Wiki: UnattendedUpgrades](https://wiki.debian.org/UnattendedUpgrades)
- [Renovate documentation](https://docs.renovatebot.com/)
- [GitHub Docs: Configuring Dependabot version updates](https://docs.github.com/code-security/dependabot/dependabot-version-updates/configuring-dependabot-version-updates)
---
title: Caddy
description: Tool overview for Caddy as a web server and reverse proxy with automatic HTTPS
tags:
- caddy
- reverse-proxy
- web
category: tools
created: 2026-03-14
updated: 2026-03-14
---
# Caddy
## Summary
Caddy is a web server and reverse proxy known for automatic HTTPS and a simple configuration model. In self-hosted environments, it is often used as an easy-to-operate edge or internal reverse proxy for web applications.
## Why it matters
For many homelab and small infrastructure setups, Caddy offers a faster path to a secure reverse proxy than more manual alternatives. It is especially effective when a small team wants readable configuration and low TLS management overhead.
## Core concepts
- Caddyfile as the high-level configuration format
- Automatic HTTPS and certificate management
- `reverse_proxy` as the core upstream routing primitive
- Site blocks for host-based routing
- JSON configuration for advanced automation cases
## Practical usage
Caddy commonly fits into infrastructure as:
```text
Client -> Caddy -> upstream application
```
Typical uses:
- Terminating TLS for self-hosted apps
- Routing multiple hostnames to different backends
- Serving simple static sites alongside proxied services
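A minimal Caddyfile covering the proxy role above; the hostname and upstream port are placeholders:

```text
app.example.com {
    reverse_proxy 127.0.0.1:8080
}
```

With public DNS pointing at the host and ports 80/443 reachable, this single site block also triggers automatic certificate issuance.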
## Best practices
- Keep hostnames and upstream targets explicit
- Use Caddy as a shared ingress layer instead of publishing many app ports
- Back up Caddy configuration and persistent state if certificates or ACME state matter
- Keep external base URLs aligned with proxy behavior
## Pitfalls
- Assuming automatic HTTPS removes the need to understand DNS and port reachability
- Mixing public and private services without clear routing boundaries
- Forgetting that proxied apps may need forwarded header awareness
- Leaving Caddy state or config out of the backup plan
## References
- [Caddy documentation](https://caddyserver.com/docs/)
- [Caddy: `reverse_proxy` directive](https://caddyserver.com/docs/caddyfile/directives/reverse_proxy)
- [Caddyfile concepts](https://caddyserver.com/docs/caddyfile/concepts)
---
title: Cloudflare
description: Tool overview for Cloudflare as a DNS, edge, and access platform in self-hosted environments
tags:
- cloudflare
- dns
- edge
category: tools
created: 2026-03-14
updated: 2026-03-14
---
# Cloudflare
## Summary
Cloudflare is an edge platform commonly used for DNS hosting, proxying, TLS, tunnels, and access control. In self-hosted environments, it is often the public-facing layer in front of privately managed infrastructure.
## Why it matters
Cloudflare can reduce operational burden for public DNS, certificates, and internet exposure. It becomes especially useful when services need a controlled edge while the underlying infrastructure remains private or partially private.
## Core concepts
- Authoritative DNS hosting
- Proxy mode, which routes HTTP(S) and selected other traffic through Cloudflare's edge
- Zero Trust and Access controls
- Tunnels for publishing services without opening inbound ports directly
- CDN and caching features for web workloads
## Practical usage
Cloudflare commonly fits into infrastructure like this:
```text
Client -> Cloudflare edge -> reverse proxy or tunnel -> application
```
Typical uses:
- Public DNS for domains and subdomains
- Cloudflare Tunnel for selected internal apps
- Access policies in front of sensitive web services
## Best practices
- Keep public DNS records documented and intentional
- Use tunnels or private access controls for admin-facing services when appropriate
- Understand which services are proxied and which are DNS-only
- Review TLS mode and origin certificate behavior carefully
## Pitfalls
- Assuming proxy mode works identically for every protocol
- Forgetting that Cloudflare becomes part of the trust and availability path
- Mixing internal admin services with public publishing defaults
- Losing track of which records are authoritative in Cloudflare versus internal DNS
## References
- [Cloudflare Docs](https://developers.cloudflare.com/)
- [Cloudflare Learning Center: What is DNS?](https://www.cloudflare.com/learning/dns/what-is-dns/)
- [Cloudflare Zero Trust documentation](https://developers.cloudflare.com/cloudflare-one/)
---
title: Docker
description: Tool overview for Docker as a container runtime and packaging platform
tags:
- docker
- containers
- infrastructure
category: tools
created: 2026-03-14
updated: 2026-03-14
---
# Docker
## Summary
Docker is a container platform used to package and run applications with their dependencies in isolated environments. In self-hosted systems, it is often the default runtime for lightweight service deployment and reproducible application stacks.
## Why it matters
Docker reduces packaging inconsistency and makes service deployment more repeatable than hand-built application installs. It also provides a practical base for Compose-managed stacks in small to medium self-hosted environments.
## Core concepts
- Images and containers
- Registries as image distribution points
- Volumes for persistent data
- Networks for service connectivity
- Compose for multi-service application definitions
## Practical usage
Docker commonly fits into infrastructure as:
```text
Image registry -> Docker host -> containerized services -> reverse proxy or internal clients
```
Typical uses:
- Hosting web apps, dashboards, automation tools, and utility services
- Running small multi-container stacks with Compose
- Keeping application deployment separate from the base OS lifecycle
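A minimal Compose sketch reflecting these uses; the image reference, port, and volume name are placeholders:

```yaml
services:
  app:
    image: ghcr.io/example/app:1.4.2
    restart: unless-stopped
    ports:
      - "127.0.0.1:8080:8080"
    volumes:
      - app-data:/data

volumes:
  app-data:
```

Pinning a specific tag and naming the volume keeps both the update path and the persistent state explicit, in line with the practices below.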
## Best practices
- Pin images intentionally and update them through a reviewed process
- Use named volumes or clearly documented bind mounts for state
- Put multi-service stacks under Compose and version control
- Keep ingress and persistence boundaries explicit
## Pitfalls
- Treating containers as ephemeral while silently storing irreplaceable state inside them
- Publishing too many host ports directly
- Using `latest` everywhere without a maintenance workflow
- Running every unrelated workload inside one large Compose project
## References
- [Docker: Docker overview](https://docs.docker.com/get-started/docker-overview/)
- [Docker: Networking overview](https://docs.docker.com/engine/network/)
- [Docker: Volumes](https://docs.docker.com/engine/storage/volumes/)
- [Compose Specification](https://compose-spec.io/)
---
title: Gitea
description: Tool overview for Gitea as a lightweight self-hosted Git forge
tags:
- gitea
- git
- self-hosting
category: tools
created: 2026-03-14
updated: 2026-03-14
---
# Gitea
## Summary
Gitea is a lightweight self-hosted Git forge that provides repositories, issues, pull requests, user and organization management, and optional automation features. It is commonly used as a self-hosted alternative to centralized Git hosting platforms.
## Why it matters
For self-hosted environments, Gitea offers source control and collaboration without the operational weight of larger enterprise platforms. It is often a good fit for homelabs, small teams, and private infrastructure repositories.
## Core concepts
- Repositories, organizations, and teams
- Authentication and user management
- Webhooks and integrations
- Actions or CI integrations depending on deployment model
- Persistent storage for repository data and attachments
## Practical usage
Gitea commonly fits into infrastructure as:
```text
Users and automation -> Gitea -> Git repositories -> CI or deployment systems
```
Typical uses:
- Hosting application and infrastructure repositories
- Managing issues and pull requests in a private environment
- Acting as a central source of truth for docs-as-code workflows
## Best practices
- Back up repository data, configuration, and the database together
- Integrate with centralized identity when possible
- Put Gitea behind a reverse proxy with a stable external URL
- Protect administrator access with MFA or a private access layer
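Gitea's built-in `dump` subcommand supports the first practice by bundling repositories, the database, and configuration into one archive; the service user and config path vary by installation:

```bash
su git -c "gitea dump -c /etc/gitea/app.ini"
```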
## Pitfalls
- Treating Git repository data as sufficient without backing up the database and config
- Allowing base URL and reverse proxy headers to drift out of sync
- Running a forge without monitoring, backup validation, or update planning
- Using one shared administrator account for normal operations
## References
- [Gitea Documentation](https://docs.gitea.com/)
- [Gitea administration docs](https://docs.gitea.com/administration)
- [Gitea installation docs](https://docs.gitea.com/installation)
---
title: Repository Labeling Strategies
description: A practical GitHub label taxonomy for issues and pull requests
tags:
- github
- devops
- workflow
category: tools
created: 2026-03-14
updated: 2026-03-14
---
# Repository Labeling Strategies
## Introduction
Labels make issue trackers easier to triage, search, automate, and report on. A good label system is small enough to stay consistent and expressive enough to support planning and maintenance.
## Purpose
This document provides a reusable label taxonomy for:
- Bugs and incidents
- Features and enhancements
- Operations and maintenance work
- Pull request triage
## Architecture Overview
A useful label strategy separates labels by function instead of creating one long undifferentiated list. A practical model uses these groups:
- Type: what kind of work item it is
- Priority: how urgent it is
- Status: where it is in the workflow
- Area: which subsystem it affects
- Effort: rough size or complexity
## Suggested Taxonomy
### Type labels
- `type:bug`
- `type:feature`
- `type:docs`
- `type:maintenance`
- `type:security`
- `type:question`
### Priority labels
- `priority:p0`
- `priority:p1`
- `priority:p2`
- `priority:p3`
### Status labels
- `status:needs-triage`
- `status:blocked`
- `status:in-progress`
- `status:ready-for-review`
### Area labels
- `area:networking`
- `area:containers`
- `area:security`
- `area:ci`
- `area:docs`
### Effort labels
- `size:small`
- `size:medium`
- `size:large`
## Configuration Example
Example policy:
```text
Every new issue gets exactly one type label and one status label.
High-impact incidents also get one priority label.
Area labels are optional but recommended for owned systems.
```
Example automation targets:
- Auto-add `status:needs-triage` to new issues
- Route `type:security` to security reviewers
- Build dashboards using `priority:*` and `area:*`
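The first automation target can be sketched as a small GitHub Actions workflow using the `gh` CLI; the label name matches the taxonomy above, and the rest is a sketch to adapt:

```yaml
name: triage-new-issues
on:
  issues:
    types: [opened]
permissions:
  issues: write
jobs:
  label:
    runs-on: ubuntu-latest
    steps:
      - run: gh issue edit "$NUMBER" --add-label "status:needs-triage" --repo "$GITHUB_REPOSITORY"
        env:
          GH_TOKEN: ${{ github.token }}
          NUMBER: ${{ github.event.issue.number }}
```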
## Troubleshooting Tips
### Too many labels and nobody uses them
- Reduce the taxonomy to the labels that drive decisions
- Remove near-duplicate labels such as `bug` and `kind:bug`
- Standardize prefixes so labels sort clearly
### Labels stop reflecting reality
- Review automation rules and board filters
- Make status changes part of the pull request or issue workflow
- Archive labels that no longer map to current processes
### Teams interpret labels differently
- Document label meaning in the repository
- Reserve priority labels for response urgency, not personal preference
- Keep type and status labels mutually understandable
## Best Practices
- Use prefixes such as `type:` and `priority:` for readability and automation
- Keep the total label count manageable
- Apply a small mandatory label set and leave the rest optional
- Review labels quarterly as workflows change
- Match label taxonomy to how the team searches and reports on work
## References
- [GitHub Docs: Managing labels](https://docs.github.com/issues/using-labels-and-milestones-to-track-work/managing-labels)
- [GitHub Docs: Filtering and searching issues and pull requests](https://docs.github.com/issues/tracking-your-work-with-issues/using-issues/filtering-and-searching-issues-and-pull-requests)
---
title: Grafana
description: Tool overview for Grafana as a dashboarding and observability interface
tags:
- grafana
- monitoring
- dashboards
category: tools
created: 2026-03-14
updated: 2026-03-14
---
# Grafana
## Summary
Grafana is a visualization and observability platform used to build dashboards, explore metrics, and manage alerting workflows across multiple data sources. In self-hosted environments, it is commonly paired with Prometheus to make infrastructure and service health easier to understand.
## Why it matters
Metrics data is more useful when operators can navigate it quickly during incidents and routine reviews. Grafana helps turn raw monitoring data into operational context that supports troubleshooting, reporting, and change validation.
## Core concepts
- Data sources such as Prometheus, Loki, or other backends
- Dashboards and panels for visualization
- Variables for reusable filtered views
- Alerting and notification integration
- Role-based access to shared observability data
## Practical usage
Grafana commonly fits into infrastructure as:
```text
Prometheus and other data sources -> Grafana dashboards and alerts -> operators
```
Typical uses:
- Infrastructure overview dashboards
- Service-specific health views
- Incident triage and post-change validation
## Best practices
- Keep dashboards tied to operational questions
- Build separate views for platform health and service health
- Use variables and naming conventions consistently
- Protect Grafana access and treat it as part of the observability platform
## Pitfalls
- Creating dashboards that look impressive but answer no real question
- Treating dashboards as enough without proper alerts
- Allowing panel sprawl and inconsistent naming
- Failing to back up dashboard definitions and provisioning config
## References
- [Grafana documentation](https://grafana.com/docs/grafana/latest/)
- [Grafana dashboards](https://grafana.com/docs/grafana/latest/dashboards/)
- [Grafana alerting](https://grafana.com/docs/grafana/latest/alerting/)
---
title: Prometheus
description: Tool overview for Prometheus as a metrics collection, query, and alerting platform
tags:
- prometheus
- monitoring
- observability
category: tools
created: 2026-03-14
updated: 2026-03-14
---
# Prometheus
## Summary
Prometheus is an open source monitoring system built around time-series metrics, pull-based scraping, alert evaluation, and queryable historical data. It is a standard choice for infrastructure and service monitoring in self-hosted environments.
## Why it matters
Prometheus gives operators a consistent way to collect metrics from hosts, applications, and infrastructure components. It is especially valuable because it pairs collection, storage, and alert evaluation in one practical operational model.
## Core concepts
- Scrape targets and exporters
- Time-series storage
- PromQL for querying and aggregation
- Alerting rules for actionable conditions
- Service discovery integrations for dynamic environments
## Practical usage
Prometheus commonly fits into infrastructure as:
```text
Targets and exporters -> Prometheus -> dashboards and alerts
```
Typical uses:
- Scraping node, container, and application metrics
- Evaluating alert rules for outages and resource pressure
- Providing metrics data to Grafana
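A minimal configuration sketch, assuming two hosts running node_exporter on its default port and one alert rule file (hostnames and the `backup-store`-style specifics are examples):

```yaml
# prometheus.yml (minimal sketch)
global:
  scrape_interval: 30s
  evaluation_interval: 30s

rule_files:
  - alerts.yml

scrape_configs:
  - job_name: node
    static_configs:
      - targets: ["host-1:9100", "host-2:9100"]
```

```yaml
# alerts.yml: a single rule that maps to a clear human response
groups:
  - name: availability
    rules:
      - alert: TargetDown
        expr: up == 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Scrape target {{ $labels.instance }} has been down for 5 minutes"
```

The `for: 5m` clause is the part that keeps this actionable: a target must stay down for five minutes before the alert fires, which filters out scrape blips.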
## Best practices
- Start with critical infrastructure and user-facing services
- Keep retention and scrape frequency aligned with actual operational needs
- Write alerts that map to a human response
- Protect Prometheus access because metrics can reveal sensitive system details
## Pitfalls
- Collecting too many high-cardinality metrics without a clear reason
- Treating every metric threshold as an alert
- Forgetting to monitor backup freshness, certificate expiry, or ingress paths
- Running Prometheus without a retention and storage plan
## References
- [Prometheus overview](https://prometheus.io/docs/introduction/overview/)
- [Prometheus concepts](https://prometheus.io/docs/concepts/)
- [Prometheus configuration](https://prometheus.io/docs/prometheus/latest/configuration/configuration/)

---
title: Proxmox VE
description: Tool overview for Proxmox VE as a virtualization and clustering platform
tags:
- proxmox
- virtualization
- infrastructure
category: tools
created: 2026-03-14
updated: 2026-03-14
---
# Proxmox VE
## Summary
Proxmox VE is a virtualization platform for managing KVM virtual machines, Linux containers, storage, networking, and optional clustering. It is widely used in homelabs because it combines a web UI, CLI tooling, and strong documentation around core virtualization workflows.
## Why it matters
Proxmox provides a practical base layer for self-hosted environments that need flexible compute without managing every VM entirely by hand. It is especially useful when services need isolation that is stronger or more flexible than containers alone.
## Core concepts
- Nodes as individual hypervisor hosts
- Virtual machines and LXC containers as workload types
- Storage backends for disks, ISOs, backups, and templates
- Clustering and quorum for multi-node management
- Backup and restore tooling for guest protection
## Practical usage
Proxmox commonly fits into infrastructure as:
```text
Physical host or cluster -> Proxmox VE -> VMs and containers -> platform and application services
```
Typical uses:
- Hosting Docker VMs, DNS VMs, monitoring systems, and utility appliances
- Separating critical services into dedicated guests
- Running a small cluster for shared management and migration workflows
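The same workflows are scriptable from the node shell. A CLI sketch for creating an LXC guest and backing it up, where the VMID, hostname, storage names, and template filename are all examples and will differ per environment:

```shell
# Create and start an LXC container from a downloaded template
# (VMID 101, storage names, and the template filename are examples)
pct create 101 local:vztmpl/debian-12-standard_12.7-1_amd64.tar.zst \
  --hostname dns-1 --memory 1024 --cores 1 \
  --net0 name=eth0,bridge=vmbr0,ip=dhcp
pct start 101

# Back up the guest with vzdump; snapshots alone are not a backup strategy
vzdump 101 --storage backup-store --mode snapshot
```

Keeping such commands in runbooks makes it easier to document which guests are stateful and how they are protected.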
## Best practices
- Keep Proxmox management access on a trusted network segment
- Document which workloads are stateful and how they are backed up
- Use clustering only when the network and storage model support it
- Treat hypervisors as core infrastructure with tighter change control
## Pitfalls
- Assuming clustering alone provides shared storage or HA guarantees
- Mixing experimental and critical workloads on the same host without planning
- Ignoring quorum behavior in small clusters
- Treating snapshots as a complete backup strategy
## References
- [Proxmox VE documentation](https://pve.proxmox.com/pve-docs/)
- [Proxmox VE Administration Guide: Cluster Manager](https://pve.proxmox.com/pve-docs/chapter-pvecm.html)
- [Proxmox VE Backup and Restore](https://pve.proxmox.com/pve-docs/chapter-vzdump.html)

---
title: Tailscale
description: Tool overview for Tailscale as a private networking and remote access layer
tags:
- tailscale
- vpn
- networking
category: tools
created: 2026-03-14
updated: 2026-03-14
---
# Tailscale
## Summary
Tailscale is a WireGuard-based mesh VPN that provides identity-aware connectivity between devices. It is frequently used to reach homelab services, private admin interfaces, and remote systems without exposing them directly to the public internet.
## Why it matters
Tailscale simplifies remote access and private service connectivity without requiring a traditional central VPN gateway for all traffic. It is especially useful for small environments where easy onboarding and policy-driven access matter more than complex appliance-based VPN design.
## Core concepts
- Tailnet as the private network boundary
- Identity-based access controls
- Peer-to-peer encrypted connectivity with DERP fallback
- MagicDNS for tailnet name resolution
- Subnet routers and exit nodes for advanced routing roles
## Practical usage
Tailscale commonly fits into infrastructure as:
```text
Admin or device -> tailnet -> private service or subnet router
```
Typical uses:
- Remote SSH access to servers
- Private access to dashboards and management services
- Routing selected LAN subnets into a private network overlay
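A CLI sketch of the subnet-router and remote-access flows above; the subnet, tag, and hostnames are examples, and advertised routes still need approval in the admin console:

```shell
# On the router node: advertise a LAN subnet into the tailnet
tailscale up --advertise-routes=192.168.10.0/24 --advertise-tags=tag:router

# On an admin machine: reach hosts by MagicDNS name instead of raw addresses
ssh admin@server-1
```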
## Best practices
- Use tags and access controls early instead of keeping the tailnet flat
- Treat exit nodes and subnet routers as high-trust infrastructure roles
- Use MagicDNS or split DNS instead of memorized addresses
- Limit which services are intended for tailnet-only access
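Tags and access rules are defined in the tailnet policy file (HuJSON). A minimal sketch that avoids a flat tailnet by letting only admins reach tagged servers on SSH and HTTPS (the tag name is an example):

```json
{
  "tagOwners": {
    "tag:server": ["autogroup:admin"]
  },
  "acls": [
    {
      "action": "accept",
      "src": ["autogroup:admin"],
      "dst": ["tag:server:22,443"]
    }
  ]
}
```

Starting from a deny-by-default policy like this is easier than retrofitting restrictions onto an already-flat tailnet.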
## Pitfalls
- Advertising broad routes without matching access policy
- Treating overlay connectivity as a substitute for local firewalling
- Leaving unused devices enrolled in the tailnet
- Using one large unrestricted trust domain for every user and service
## References
- [Tailscale: What is Tailscale?](https://tailscale.com/kb/1151/what-is-tailscale)
- [Tailscale: Access controls](https://tailscale.com/kb/1018/acls)
- [Tailscale: MagicDNS](https://tailscale.com/kb/1081/magicdns)

---
title: Traefik
description: Tool overview for Traefik as a modern reverse proxy and dynamic ingress controller
tags:
- traefik
- reverse-proxy
- ingress
category: tools
created: 2026-03-14
updated: 2026-03-14
---
# Traefik
## Summary
Traefik is a reverse proxy and ingress tool designed for dynamic environments. It is especially popular in containerized setups because it can discover services from providers such as Docker and build routes from metadata.
## Why it matters
When services are created or moved frequently, static proxy configuration becomes a maintenance burden. Traefik reduces manual route management by linking service discovery with ingress configuration.
## Core concepts
- EntryPoints as listening ports or addresses
- Routers for request matching
- Services for upstream destinations
- Middlewares for auth, redirects, headers, and rate controls
- Providers such as Docker or file-based configuration
## Practical usage
Traefik commonly fits into infrastructure as:
```text
Client -> Traefik entrypoint -> router -> middleware -> service backend
```
Typical uses:
- Reverse proxying containerized services
- Automatic route generation from Docker labels
- Central TLS termination for a container platform
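With the Docker provider, routes are declared as labels on the service itself. A Compose fragment sketch, where the hostname, router name, and `websecure` entrypoint are example names that must match the Traefik static configuration:

```yaml
# docker-compose.yml fragment (hostname and entrypoint name are examples)
services:
  app:
    image: nginx:alpine
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.app.rule=Host(`app.example.internal`)"
      - "traefik.http.routers.app.entrypoints=websecure"
      - "traefik.http.services.app.loadbalancer.server.port=80"
```

Setting `exposedByDefault: false` on the Docker provider pairs well with this pattern: only containers that explicitly opt in with `traefik.enable=true` get routed, which guards against the accidental-exposure pitfall below.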
## Best practices
- Keep provider metadata minimal and standardized
- Separate public and internal entrypoints where trust boundaries differ
- Review middleware behavior as part of security policy
- Monitor certificate and routing health
## Pitfalls
- Hiding important routing logic in inconsistent labels across stacks
- Exposing internal services accidentally through default provider behavior
- Letting Docker label sprawl become the only source of ingress documentation
- Assuming dynamic config removes the need for change review
## References
- [Traefik documentation](https://doc.traefik.io/traefik/)
- [Traefik: Routing overview](https://doc.traefik.io/traefik/routing/overview/)
- [Traefik Docker provider](https://doc.traefik.io/traefik/providers/docker/)