CDC-to-Lakehouse Framework

❄️ FrostStream

Real-time Change Data Capture from PostgreSQL to S3-compatible object storage, powered by Debezium, Kafka in KRaft mode, and MinIO.

  • 🐘 PostgreSQL (WAL / pgoutput): v16 Alpine, ● Healthy
      ↓ WAL stream
  • Debezium (Source Connector): v2.5.4, ● Running
      ↓ CDC events
  • 📨 Kafka (KRaft Mode): v7.6.0 (CP), ● Healthy
      ↓ topic consume
  • 📤 S3 Sink (Confluent Connector): v10.5.7, ● Running
      ↓ JSON files
  • 🪣 MinIO (S3-Compatible Storage): latest, ● Healthy

  • Checks Passed: 11/12
  • Docker Services: 5
  • Rows → Files Reconciled: 4 = 4
  • Monthly Cost (Proxmox): ~€0
  • CDC Operations Tested: 3

🔄 CDC Operations Verified

  • Snapshot — Initial 3 seed rows captured
  • INSERT — New row → CDC event with "op": "c"
  • UPDATE — Modified row → before/after delta
  • DELETE — Removed row → delete event with "op": "d", followed by a tombstone
  • Recovery — Connector restart → offsets preserved
  • Reconciliation — Source rows = sink files
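
The checks above can be sketched in a few lines of Python. This is a minimal illustration, not the project's actual test suite: it assumes the standard Debezium envelope (`op`, `before`, `after`), an optional `{"schema", "payload"}` wrapper as written by JsonConverter with schemas enabled, and a hypothetical primary-key column named `id`. The function names `unwrap`, `classify`, and `replay` are made up for this example.

```python
import json
from typing import Iterable, Optional

# Debezium "op" codes: r = snapshot read, c = create, u = update, d = delete.
OP_NAMES = {"r": "snapshot", "c": "insert", "u": "update", "d": "delete"}


def unwrap(raw_value: Optional[str]) -> Optional[dict]:
    """Parse one Kafka record value and return the Debezium envelope, or None."""
    if raw_value is None:  # a tombstone is a record whose value is null
        return None
    event = json.loads(raw_value)
    if event is None:
        return None
    # With schemas enabled, the envelope sits under "payload"; otherwise it is the root.
    return event.get("payload", event)


def classify(raw_value: Optional[str]) -> str:
    """Label a record as snapshot, insert, update, delete, tombstone, or unknown."""
    envelope = unwrap(raw_value)
    if envelope is None:
        return "tombstone"
    return OP_NAMES.get(envelope.get("op"), "unknown")


def replay(raw_values: Iterable[Optional[str]], key: str = "id") -> dict:
    """Fold events into the current table state, keyed by the (assumed) primary key."""
    rows = {}
    for raw in raw_values:
        envelope = unwrap(raw)
        if envelope is None:
            continue  # tombstones carry no row data
        op = envelope.get("op")
        if op == "d":
            # "before" is assumed to hold at least the primary-key column.
            rows.pop(envelope["before"][key], None)
        elif op in ("r", "c", "u"):
            rows[envelope["after"][key]] = envelope["after"]
    return rows


def wrap(envelope: dict) -> str:
    """Helper for the sample data: mimic the schemas-enabled JSON layout."""
    return json.dumps({"schema": {}, "payload": envelope})


SAMPLE = [
    wrap({"op": "r", "before": None, "after": {"id": 1, "name": "a"}}),
    wrap({"op": "c", "before": None, "after": {"id": 2, "name": "b"}}),
    wrap({"op": "u", "before": {"id": 1, "name": "a"}, "after": {"id": 1, "name": "a2"}}),
    wrap({"op": "d", "before": {"id": 2}, "after": None}),
    None,  # tombstone that follows the delete
]

if __name__ == "__main__":
    print([classify(v) for v in SAMPLE])
    print(len(replay(SAMPLE)))
```

Comparing `len(replay(...))` with a `SELECT count(*)` on the source table is one way to express the reconciliation check; how the real check counts rows against sink files is not stated here.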

⚙️ Key Configuration

Kafka Mode: KRaft (no ZooKeeper)
CDC Plugin: pgoutput
Output Format: JSON (→ Parquet in Phase 2)
Partitioner: TimeBasedPartitioner (hourly)
Converter: JsonConverter + schemas.enable
Heartbeat: 10s interval
Flush Size: 3 records (MVP)
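
As a rough sketch, the settings above might map onto Kafka Connect connector properties as follows. The property names are the documented Debezium PostgreSQL and Confluent S3 sink options; the hostnames, credentials, database name, topic prefix, bucket, region, topic name, and path format are placeholders invented for illustration, and `schemas.enable` is assumed to be set to `true`.

```python
import json


def debezium_source_config() -> dict:
    """Debezium PostgreSQL source: pgoutput plugin, 10s heartbeat."""
    return {
        "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
        "plugin.name": "pgoutput",
        "database.hostname": "froststream-postgres",  # assumed host name
        "database.port": "5432",
        "database.user": "postgres",                  # placeholder
        "database.password": "change-me",             # placeholder
        "database.dbname": "appdb",                   # placeholder
        "topic.prefix": "froststream",                # placeholder
        "heartbeat.interval.ms": "10000",
    }


def s3_sink_config(topics: str) -> dict:
    """Confluent S3 sink pointed at MinIO: JSON output, hourly partitions, flush every 3 records."""
    return {
        "connector.class": "io.confluent.connect.s3.S3SinkConnector",
        "topics": topics,
        "storage.class": "io.confluent.connect.s3.storage.S3Storage",
        "store.url": "http://froststream-minio:9000",  # assumed MinIO endpoint
        "s3.bucket.name": "cdc-events",                # placeholder
        "s3.region": "us-east-1",                      # placeholder
        "format.class": "io.confluent.connect.s3.format.json.JsonFormat",
        "partitioner.class": "io.confluent.connect.storage.partitioner.TimeBasedPartitioner",
        "partition.duration.ms": "3600000",            # one hour
        "path.format": "'year'=YYYY/'month'=MM/'day'=dd/'hour'=HH",
        "locale": "en-US",
        "timezone": "UTC",
        "flush.size": "3",
        "value.converter": "org.apache.kafka.connect.json.JsonConverter",
        "value.converter.schemas.enable": "true",      # assumption
    }


def as_connect_request(name: str, config: dict) -> dict:
    """Wrap a config in the body shape expected by POST /connectors."""
    return {"name": name, "config": config}


if __name__ == "__main__":
    source = as_connect_request("froststream-source", debezium_source_config())
    sink = as_connect_request("froststream-s3-sink", s3_sink_config("froststream.public.customers"))
    print(json.dumps(source, indent=2))
    print(json.dumps(sink, indent=2))
```

Each printed document can be sent to the Kafka Connect REST API with `POST /connectors`. Pointing the sink at real S3 would mostly mean dropping `store.url` and setting the actual bucket and region.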

🗺️ Phased Roadmap

  • Phase 1 — Lean CDC MVP (PostgreSQL → MinIO)
  • Phase 2 — Iceberg Sink + Nessie Catalog + Trino
  • Phase 3 — K8s + Monitoring + Disaster Recovery
  • AWS — Same compose, real S3, ~$57-60/mo

🚀 Deployment Comparison

Proxmox: ✅ Active (~€0/mo)
AWS EC2 + S3: ○ Planned (~$57/mo)
Portability: 95% same compose
CI/CD: Gitea Actions
Registry: Private (self-hosted)

🖥️ Infrastructure — Dedicated VM (froststream-01)

  • vCPU (host type): 4
  • RAM (no balloon): 8 GB
  • SSD (ZFS): 100 GB

  • froststream-postgres: postgres:16-alpine
  • froststream-kafka: cp-kafka:7.6.0
  • froststream-connect: custom (Debezium + S3)
  • froststream-minio: minio/minio:latest
  • froststream-minio-init: minio/mc (one-shot)