Technical Architecture

OpenRAG Platform
Engineering Overview

Infrastructure, container topology, AI pipeline, security posture, and hardware specifications for IT architects, platform engineers, and technical decision-makers.

Reference Architecture

Current deployment on a dedicated VM (live)

This is the actual architecture running on a self-hosted Proxmox VM — not a theoretical diagram. Every component listed below is deployed and operational.

Client

👤 User / Browser

Accesses Chainlit chat UI or API directly

https://<your-host>:8080/chainlit

Container

🧠 openrag-cpu

Main API + Chainlit + Ray actor pool + Document Serializer

FastAPI Ray Whisper ~4.7 GB RAM

Container

📤 indexer-ui

React file upload dashboard

:8060 ~31 MB RAM

AI

🔤 vllm-cpu (Embedder)

BAAI/bge-small-en-v1.5 — 33M params

384-dim ~1.1 GB RAM

AI

🔀 LiteLLM Proxy

Routes OpenAI-compatible calls → AWS Bedrock

:4000 --drop_params ~800 MB

Stopped

📊 reranker-cpu

gte-multilingual-reranker-base

~2.1 GB Enable for 50+ docs

Store

🗄️ Milvus

Vector database — stores document embeddings

~190 MB RAM

Store

📦 MinIO + etcd

Object storage + metadata for Milvus

~163 MB combined

Store

🐘 rdb (PostgreSQL)

Internal metadata, partition config

~43 MB RAM

↓ via HTTPS
Cloud

☁️ AWS Bedrock (us-east-1)

Claude 3.5 Haiku — inference profile: us.anthropic.claude-3-5-haiku-20241022-v1:0

DPA available. Data not used for training. ~$0.25/M input tokens.

AI Pipeline

Query flow: question → answer

When a user asks a question, this is the exact sequence of operations.

💬 User Query
🔤 Embed Query
BGE-small-en
🔍 Milvus Search
Top-K vectors
📊 Rerank
(optional)
🧠 LiteLLM → Bedrock
Claude 3.5 Haiku
📎 Answer + Sources
Container Inventory

Live container status

9 containers total. Reranker stopped to save 2.1GB RAM — enable for 50+ documents.

ContainerImageRAMPortStatus
openrag-cpu linagoraai/openrag 4.7 GB :8080 Running
litellm-proxy ghcr.io/berriai/litellm 800 MB :4000 Running
vllm-cpu vllm (BGE-small) 1.1 GB :8000 Healthy
milvus milvusdb/milvus 190 MB :19530 Healthy
minio minio/minio 116 MB :9000 Running
etcd quay.io/coreos/etcd 47 MB :2379 Running
rdb postgres 43 MB :5432 Running
indexer-ui linagoraai/indexer-ui 31 MB :8060 Running
reranker-cpu gte-reranker-base 2.1 GB :7997 Stopped

Total idle RAM: ~6.4 GB (without reranker) / ~8.5 GB (with reranker) of 16 GB allocated.

Hardware Requirements

Specifications per implementation level

🟢 Level 1 — Proof of Concept

Single VM on Proxmox, VMware, Hyper-V, or cloud VPS

Minimum

CPU4 vCPU (AVX2)
RAM16 GB
Storage100 GB SSD
Network100 Mbps
GPUNone (cloud LLM)

Recommended

CPU8 vCPU
RAM32 GB
Storage200 GB NVMe
Network1 Gbps
GPURTX 3060 12GB (opt)

Software Stack

OSUbuntu 22/24 LTS
RuntimeDocker + Compose
LLMBedrock / Ollama
EmbedderBGE-small-en
VectorDBMilvus

Cost Estimate

Cloud VPS€30–50/mo
LLM API€5–30/mo
On-prem HW€1,500–2,000
USD equiv$45–95/mo
Deploy time1–2 weeks

🔵 Level 2 — Secured Production

1–2 VMs with reverse proxy, SSO, encrypted storage, audit logging

Minimum

CPU8 vCPU total
RAM32 GB
Storage500 GB SSD (LUKS)
Backup1 TB encrypted
GPURTX 4060 Ti 16GB

Recommended

CPU12–16 vCPU
RAM64 GB
Storage1 TB NVMe
Network1 Gbps + VPN
UPSRecommended

Added Components

AuthKeycloak SSO
ProxyNginx + TLS
RerankerBGE-reranker
AuditTamper-proof log
BackupDaily encrypted

Cost Estimate

Managed€150–300/mo
Support€500–1,000/mo
On-prem HW€3,000–5,000
USD equiv$770–1,650/mo
Deploy time4–6 weeks

🟣 Level 3 — Enterprise Platform

Kubernetes cluster with HA, multi-tenant isolation, GPU inference, observability

Minimum (3-node)

CPU16+ cores/node
RAM64 GB/node
Storage2 TB NVMe/node
Network10 Gbps inter-node
GPU2× RTX 4090 / A100

Recommended

Nodes3–5 (EPYC/Xeon)
RAM128 GB/node
HAActive-passive DR
BackupGeo-redundant
SLA99.9% uptime

Enterprise Stack

OrchestrationKubernetes
IngressNginx + WAF
AuthKeycloak + MFA
MonitorPrometheus+Grafana
LogsLoki + SIEM

Cost Estimate

Monthly€2,600–8,000
HW (USD)$13,200–16,500
HW (NL)€10,000–15,000
LLM (local)€0 (air-gapped)
Deploy time8–16 weeks
Security & Compliance

Security posture per level

🟢 Level 1

  • Network perimeter access only (VPN/firewall)
  • Documents stored locally on server
  • Embeddings generated on-premise
  • TLS for LLM API calls
  • No user authentication
  • No audit trail
  • No encryption at rest
  • No right-to-erasure workflow

🔵 Level 2

  • SSO / OIDC authentication (Keycloak)
  • Role-based access control (RBAC)
  • HTTPS everywhere (Let's Encrypt)
  • LUKS full-disk encryption
  • Audit log (who, what, when)
  • Daily encrypted backups
  • Right-to-erasure (partition delete)
  • Department/client partitioning
  • VPN-only or IP whitelist access

🟣 Level 3

  • Everything in Level 2
  • Multi-factor authentication (MFA)
  • Document-level ACLs
  • Immutable append-only audit log
  • SIEM integration
  • WAF + DDoS protection
  • Tenant-level data isolation
  • Penetration tested annually
  • EU AI Act transparency reporting
  • ISO 27001 / SOC 2 ready
  • Disaster recovery (RPO <1h, RTO <4h)
AI Strategy

Model selection per privacy requirement

ComponentLevel 1Level 2Level 3
Embedder BGE-small-en
33M params · 130 MB
BGE-base-en
110M params · 400 MB
BGE-large / multilingual
GPU-accelerated
LLM Cloud: Bedrock Haiku
OR Ollama (Llama 3.1 8B)
Cloud: Bedrock + DPA
OR local Mistral 7B
Fully local: Llama 70B
OR Mixtral 8×7B (air-gap)
Reranker None
<50 documents
BGE-reranker-base
~500 MB
BGE-reranker-v2-m3
GPU-accelerated
Privacy Embeddings local
LLM via cloud DPA
Everything local possible
Cloud fallback with DPA
Fully air-gapped
Zero cloud dependency
Operational

Configuration file paths

Source of truth on the OpenRAG VM: /opt/openrag/quick_start/.env

FilePath on VMPurpose
Main .env /opt/openrag/quick_start/.env All environment variables
Docker Compose /opt/openrag/quick_start/docker-compose.yaml Container definitions + volumes
Pipeline patch /opt/openrag/pipeline_patched.py Removes vLLM-specific params (Bedrock compat)
LiteLLM config /opt/openrag/litellm_config.yaml Bedrock model routing
Hydra config /opt/openrag/.hydra_config/config.yaml OpenRAG internal settings