Live Infrastructure

Climacs Homelab

Public Edge + Homelab Platform — Caddy edge routing, Kubernetes, Observability, AI Applications

Last verified: 2026-05-21 · Manually maintained
9 VMs & LXCs
9 Public Routes
3 K8s Nodes
3.6T NAS Storage
64G Proxmox RAM

🌐 Public Edge Traffic Flow

🖥️ Browser (*.climacs.net) → 🌍 Route 53 (AWS DNS) → 📡 Home Router / NAT (WAN → LAN forward) → 🔒 Caddy (.60, TLS :80/:443) → Origin (static / reverse proxy)
All public HTTPS terminates at Caddy on the edge VM. Origins are LAN-only services.

🗺️ Public Route Table

Hostname Type Origin Description
climacs.net STATIC edge VM /srv Portfolio site — Astro build output
www.climacs.net REDIRECT edge VM Caddy 301 redirect to apex
vault.climacs.net PROXY vault LXC :8080 Vaultwarden — password manager (isolated LXC)
copilot.climacs.net PROXY edge VM :3000 EKS Triage Copilot — AI operations UI
nero.climacs.net PROXY apps VM :3000 NeroCamp — scheduling app (Next.js)
finops.climacs.net PROXY apps VM :8002 AWS Cost Dashboard — FinOps demo
runbook.climacs.net K8S k8s worker-2 :80 Homelab Runbooks — Kubernetes Ingress
prometheus.climacs.net K8S k8s worker-1 :31041 Prometheus — monitoring metrics (HTTPS origin)
grafana.climacs.net K8S k8s worker-1 :30080 Grafana — dashboards (NodePort 30080)

🖥️ Infrastructure

🏗️

Proxmox Hypervisor

hypervisor · 64GB RAM · 1TB SSD

Virtual Machines

cka-cp-v2 (VM 300)
K8s Control Plane · kubeadm v1.32
LIVE
cka-worker-v2 (VM 311)
K8s Worker 1 · 6GB · 4 CPU
LIVE
worker-v2-node-2 (VM 312)
K8s Worker 2 · 6GB · 2 CPU
LIVE
web-climacs-01 (VM 600)
Public Edge · Docker Compose · Caddy
LIVE
web-climacs-test (VM 601)
Apps/Test · Nero, FinOps, Uptime Kuma
LIVE
openrag-mvp-test (VM 630)
OpenRAG Platform · 16GB RAM
LIVE

LXC Containers

pg-db-01 (LXC 800)
PostgreSQL 16 · copilot_db
LIVE
arr (LXC 120)
Media automation
LIVE
Vaultwarden (LXC 910)
Password manager · isolated LXC
LIVE
💾

UGREEN NAS

DXP2800 · UGOS (Debian 12)

Docker Services

Gitea Git Server
:3001 (web) · :2222 (SSH)
LIVE
Gitea Actions Runner
CI/CD for site deploy
LIVE
Docker Registry v2
:5000 · Container image storage
LIVE

NFS Shares

/volume1/git
Gitea data + repos
MOUNTED
/volume1/registry
Container image layers
MOUNTED
/volume1/backups
PG dumps, VM backups
MOUNTED
/volume1/nfs-appdata
App persistent data
MOUNTED
/volume1/dev-share
SMB human-accessible
MOUNTED

Storage

Total Capacity
3.6 TB (HDD + SSD)
620GB FREE

🔒 Public Edge VM

🌐

web-climacs-01 (VM 600)

edge VM · Ubuntu 24.04 · 6GB RAM · Docker Compose

The single public entry point for all *.climacs.net traffic. The home router forwards ports 80/443 to this VM. Caddy handles automatic TLS and routes requests to local containers or LAN services.

Caddy (production)
caddy:2-alpine · :80/:443 · auto-TLS
LIVE
Caddy (pre-prod)
caddy:2-alpine · :6767 · LAN only
LIVE
Copilot Frontend
React · :3000 → internal :80
LIVE
Copilot BFF
FastAPI · :8000 internal only
LIVE

Security Hardening

UFW Firewall
Allow 22, 80, 443 only
ACTIVE
fail2ban
SSH brute-force protection
ACTIVE
SSH Key-Only
cloud-init provisioned
OK
HSTS Header
max-age=31536000; includeSubDomains
OK

☸️ Kubernetes Cluster

🎛️

Control Plane

cka-cp-v2 · control plane
kubeadm v1.32.11
LIVE
etcd
single-node, stacked
OK
containerd
trusts NAS registry :5000 (HTTP)
OK
⚙️

Worker 1

cka-worker-v2 · worker node 1
Cilium CNI
LIVE
containerd 1.7
6GB RAM · 4 CPU · 38GB disk
OK
NodePorts
Prometheus :31041 · Grafana :30080
LIVE
⚙️

Worker 2

worker-v2-node-2 · worker node 2
Cilium CNI
LIVE
containerd 2.2
6GB RAM · 2 CPU · 30GB disk
OK
Ingress
Runbook site via :80
LIVE
📦

Platform Stack

kubeadm cluster config
CNI
Cilium
LIVE
Storage
local-path-provisioner
LIVE
GitOps
Argo CD
LIVE
DNS
CoreDNS
LIVE
🗂️

Namespaces

active workloads
monitoring
Prometheus, Alertmanager, Grafana
LIVE
argocd
GitOps delivery
LIVE
bedrock-copilot
Copilot BFF + frontend
LIVE
wordpress
MariaDB + WordPress on worker-2
LIVE
🗃️

PostgreSQL

pg-db-01 · database LXC
PostgreSQL 16
DB: copilot_db
LIVE
Backup to NAS
pg_dump cron · daily 03:00 · 14d retention
LIVE

📊 Observability

🔥

Prometheus

in-cluster · monitoring namespace
kube-prometheus-stack
145 alert rules
LIVE
Public URL
prometheus.climacs.net
LIVE
NodePort
worker-1 NodePort :31041 (HTTPS)
OK
🔔

Alertmanager

in-cluster · Telegram receiver
Telegram Bot
@climacs_homelab_bot
LIVE
Critical route
repeat_interval: 1h
OK
Warning route
repeat_interval: 4h
OK
📈

Grafana

in-cluster · public dashboards
Public URL
grafana.climacs.net
LIVE
NodePort
worker-1 NodePort :30080
OK
🩺

Uptime Kuma

apps VM :3001 · external prober
Synthetic checks
LAN-based health checks for all services
LIVE
📡

Node Exporters

Prometheus scrape targets · :9100
edge VM · apps VM · RAG VM
Host metrics: CPU, memory, disk, network
LIVE

🛡️ Application Stack — BFF Guardrails

🔐

Backend-for-Frontend (FastAPI / Python)

web-ui/bff/main.py + guardrails.py · Uvicorn ASGI · runs in Docker / Kubernetes

The BFF sits between the React frontend and AWS API Gateway. The browser never touches AWS directly — the API key is injected server-side only. Every request passes through a layered guardrail stack before being forwarded.

⏱️
Rate Limiting
slowapi · RATE_LIMIT=10/minute
Per-IP rate limiter on all /api/<endpoint> POST routes. Returns HTTP 429 when exceeded. Limit is configurable via .env without code changes.
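
The per-IP behavior described above can be illustrated without slowapi. The following is a standard-library sketch of a sliding-window limiter, not the BFF's actual code; `parse_rate` and `SlidingWindowLimiter` are hypothetical names, and the injectable `clock` exists only to make the example easy to exercise.

```python
import time
from collections import defaultdict, deque


def parse_rate(spec: str) -> tuple:
    """Parse a '10/minute'-style limit into (max_requests, window_seconds)."""
    count, unit = spec.split("/")
    seconds = {"second": 1.0, "minute": 60.0, "hour": 3600.0}[unit.strip()]
    return int(count), seconds


class SlidingWindowLimiter:
    """Hypothetical per-IP limiter; stands in for slowapi in this sketch."""

    def __init__(self, spec: str = "10/minute", clock=time.monotonic):
        self.max_requests, self.window = parse_rate(spec)
        self.clock = clock
        self.hits = defaultdict(deque)  # client IP -> timestamps of recent requests

    def check(self, client_ip: str) -> int:
        """Return 200 if the request is allowed, 429 if the limit is exceeded."""
        now = self.clock()
        recent = self.hits[client_ip]
        # Drop timestamps that have fallen out of the window.
        while recent and now - recent[0] >= self.window:
            recent.popleft()
        if len(recent) >= self.max_requests:
            return 429
        recent.append(now)
        return 200
```

Reading the limit string from the environment (for example `os.environ.get("RATE_LIMIT", "10/minute")`) would keep it configurable without code changes, as the card describes.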
🔑
Secret Pattern Detection
guardrails.py · 6 regex patterns
Scans every request body before forwarding to AWS:
• AWS Access Keys (AKIA…)
• aws_access_key_id / aws_secret_access_key
• RSA / PEM private keys
• Slack bot tokens (xoxb-)
• GitHub PATs (ghp_…)
Returns HTTP 400 if any match found.
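
The real patterns in guardrails.py are not shown on this page. As a rough sketch of how such a scan might look, the block below uses six illustrative regexes; the exact expressions, token lengths, and the names `SECRET_PATTERNS`, `find_secrets`, and `scan_status` are all assumptions.

```python
import re

# Illustrative approximations only; the patterns in guardrails.py may differ.
SECRET_PATTERNS = {
    "aws_access_key": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "aws_access_key_id": re.compile(r"aws_access_key_id\s*[=:]", re.IGNORECASE),
    "aws_secret_access_key": re.compile(r"aws_secret_access_key\s*[=:]", re.IGNORECASE),
    "pem_private_key": re.compile(r"-----BEGIN (?:RSA )?PRIVATE KEY-----"),
    "slack_bot_token": re.compile(r"\bxoxb-[0-9A-Za-z-]{10,}"),
    "github_pat": re.compile(r"\bghp_[0-9A-Za-z]{36}\b"),
}


def find_secrets(body: str) -> list:
    """Return the names of all patterns that match the request body."""
    return [name for name, pattern in SECRET_PATTERNS.items() if pattern.search(body)]


def scan_status(body: str) -> int:
    """Map the scan result to the status code described above."""
    return 400 if find_secrets(body) else 200
```

Returning the matched pattern names rather than the matched text means the rejection can be logged without echoing the secret itself.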
📏
Payload Size Cap
MAX_REQUEST_BYTES=51200 (50 KB)
Body size checked before secret scan or AWS forwarding. Returns HTTP 413 for oversized payloads. Prevents prompt-stuffing and runaway Lambda costs.
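
A minimal sketch of the size check, assuming the limit applies to the encoded body in bytes rather than to the character count; `check_body_size` is a hypothetical helper name.

```python
MAX_REQUEST_BYTES = 51200  # 50 KB, matching the value shown above


def check_body_size(body: bytes, limit: int = MAX_REQUEST_BYTES) -> int:
    """Return 413 for oversized payloads, 200 otherwise."""
    # Measure raw bytes: multi-byte UTF-8 text can exceed the cap
    # even when its character count is below it.
    return 413 if len(body) > limit else 200
```

Running this check first keeps the regex scan and the upstream call from ever seeing an oversized body.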
🚧
Endpoint Allowlist
ALLOWED_ENDPOINTS = {triage, explain, runbook-snippet}
Only 3 paths are routable to AWS. Any other path returns HTTP 404 before touching the network. Prevents endpoint-probing attacks.
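
One way to express the allowlist is an exact-match lookup, sketched below. The `/api/` prefix is inferred from the routes mentioned in the rate-limiting card, and `resolve_endpoint` is a hypothetical name rather than a function from main.py.

```python
from typing import Optional, Tuple

ALLOWED_ENDPOINTS = frozenset({"triage", "explain", "runbook-snippet"})
API_PREFIX = "/api/"  # assumed from the /api/<endpoint> routes


def resolve_endpoint(path: str) -> Tuple[int, Optional[str]]:
    """Return (200, name) for an allowlisted path, (404, None) otherwise."""
    if not path.startswith(API_PREFIX):
        return 404, None
    name = path[len(API_PREFIX):]
    # Exact set membership: nested paths, traversal attempts, and
    # trailing segments never match an allowlisted name.
    if name not in ALLOWED_ENDPOINTS:
        return 404, None
    return 200, name
```

Because the decision is a local set lookup, a rejected path produces no outbound request at all.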
🔇
No-Prompt Logging Policy
logger.info(metadata only)
Logs emit only conv_id[:8], endpoint, status_code, and latency_ms. Full prompts and responses are never logged to stdout; they are persisted only in the PostgreSQL history (encrypted at rest).
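
A metadata-only policy is easiest to keep when the logging helper cannot receive a prompt in the first place. The sketch below assumes a logger named `bff` and a hypothetical `log_request` helper; the line format is illustrative.

```python
import logging

logger = logging.getLogger("bff")  # logger name is an assumption


def log_request(conv_id: str, endpoint: str, status_code: int, latency_ms: float) -> str:
    """Log request metadata only and return the emitted line."""
    # The signature has no prompt or response parameter, so request
    # content cannot leak into the log line by accident.
    line = "conv=%s endpoint=%s status=%d latency_ms=%.0f" % (
        conv_id[:8], endpoint, status_code, latency_ms
    )
    logger.info("%s", line)
    return line
```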
🔧
LLM Response Normalization
_cleanup_response() · per-endpoint
Handles real-world LLM inconsistencies from Lambda:
• Strips markdown code fences (```json)
• Re-parses broken JSON payloads
• Converts "key": "value" dict responses to readable text
• Filters JSON noise ({, } lines) from steps lists
• Handles escaped \n and \" in runbook markdown
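
The cleanup steps listed above could look roughly like the following. This is a sketch, not the real `_cleanup_response()`: the function names are hypothetical, and the dict-to-text conversion is left out.

```python
import json
import re

FENCE = "`" * 3  # a markdown code fence, built here to keep the example readable
FENCE_RE = re.compile(
    r"^" + FENCE + r"[A-Za-z]*[ \t]*\n(.*?)\n?" + FENCE + r"\s*$", re.DOTALL
)


def strip_code_fence(text: str) -> str:
    """Remove a surrounding markdown fence (with optional language tag)."""
    match = FENCE_RE.match(text.strip())
    return match.group(1).strip() if match else text.strip()


def cleanup_response_sketch(raw: str):
    """Strip fences, then try to parse JSON; fall back to plain text."""
    cleaned = strip_code_fence(raw)
    try:
        return json.loads(cleaned)
    except json.JSONDecodeError:
        return cleaned


def clean_steps(steps: list) -> list:
    """Drop empty lines and stray JSON punctuation from a steps list."""
    noise = {"{", "}", "[", "]", "},", "],"}
    return [s.strip() for s in steps if s.strip() and s.strip() not in noise]


def unescape_markdown(text: str) -> str:
    """Turn literal backslash-n and backslash-quote sequences into real characters."""
    return text.replace("\\n", "\n").replace('\\"', '"')
```

Falling back to the cleaned text when parsing fails means a malformed model reply still reaches the user instead of raising an error.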
🗄️
PostgreSQL Conversation History
database.py · SQLite (dev) / PostgreSQL (prod)
Every request is saved with conv_id, endpoint, model_id, confidence, latency_ms, and status_code. The store supports project-based organization, archiving, soft-delete, and full-text search, and it survives container restarts.
Request Timeout + Error Budget
REQUEST_TIMEOUT_SECONDS=30
The HTTPX async client enforces a 30 s hard timeout on calls to AWS. Timeouts return a structured HTTP 504 and network errors an HTTP 502; both are also persisted to conversation history for debugging.
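
The timeout-to-status mapping can be sketched with the standard library alone. In the block below, `send` is a stand-in for the real HTTPX call, `forward_with_budget` is a hypothetical name, and the 504-for-timeout / 502-for-network-error split is an assumption based on the usual meaning of those codes.

```python
import asyncio

REQUEST_TIMEOUT_SECONDS = 30


async def forward_with_budget(send, timeout: float = REQUEST_TIMEOUT_SECONDS) -> dict:
    """Await the upstream call and map failures to structured errors."""
    try:
        body = await asyncio.wait_for(send(), timeout=timeout)
        # A real BFF would pass through the upstream status; 200 is a placeholder.
        return {"status_code": 200, "body": body}
    except asyncio.TimeoutError:
        # Must be caught before OSError: on Python 3.11+ it is a subclass of it.
        return {"status_code": 504, "error": "upstream_timeout"}
    except OSError as exc:
        return {
            "status_code": 502,
            "error": "upstream_unreachable",
            "detail": type(exc).__name__,
        }
```

Returning a dict in every branch gives the history writer one shape to persist, whether the call succeeded or failed.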
Request Flow Through Guardrails
Browser POST → Rate Limit ✓ → Size ≤ 50KB ✓ → Secret Scan ✓ → Inject x-api-key → AWS API GW → Normalize JSON → Save to PG → Response ✅

📋 Open Items & Risks

🔧

Platform

Confirm ingress controller class
nginx vs Traefik — currently unclear
CONFIRM
etcd snapshot cron
Verify schedule and retention policy
CONFIRM
Proxmox backup schedule
All VMs/LXCs to NAS
CONFIRM
📡

Monitoring Gaps

External hostname checks
Add for every public *.climacs.net route
TODO
Caddyfile drift check
Align with inventory.yaml on every deploy
TODO
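
One possible shape for such a drift check is sketched below. It assumes the expected hostnames have already been loaded from inventory.yaml into a set (that file's layout is not shown here), and it only recognizes simple, unindented top-level site blocks; `caddyfile_hosts` and `drift` are hypothetical names.

```python
import re

# Matches unindented lines such as "vault.climacs.net {" or "a.example, b.example {".
SITE_RE = re.compile(
    r"^([A-Za-z0-9.*-]+(?:[ \t]*,[ \t]*[A-Za-z0-9.*-]+)*)[ \t]*\{[ \t]*$",
    re.MULTILINE,
)


def caddyfile_hosts(text: str) -> set:
    """Collect site addresses from top-level blocks in Caddyfile text."""
    hosts = set()
    for match in SITE_RE.finditer(text):
        for host in match.group(1).split(","):
            hosts.add(host.strip())
    return hosts


def drift(caddyfile_text: str, expected: set) -> dict:
    """Report hostnames present on only one side of the comparison."""
    actual = caddyfile_hosts(caddyfile_text)
    return {
        "missing_in_caddy": sorted(expected - actual),
        "unlisted_in_inventory": sorted(actual - expected),
    }
```

A deploy step could fail the pipeline whenever either list is non-empty.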
vault.climacs.net monitor
Prevent repeat of 2026-05-05 outage
PLANNED
📝

Documentation

Mac Mini kubectl access
Document kubeconfig path and context
TODO
Router forwarding rules
Document router forwarding config
TODO
Restrict SSH to LAN CIDR
private LAN CIDR only
TODO