Infrastructure, container topology, AI pipeline, security posture, and hardware specifications for IT architects, platform engineers, and technical decision-makers.
This is the actual architecture running on a self-hosted Proxmox VM — not a theoretical diagram. Every component listed below is deployed and operational.
- **User**: accesses the Chainlit chat UI or the API directly at https://<your-host>:8080/chainlit
- **openrag-cpu**: main API + Chainlit + Ray actor pool + document serializer (FastAPI, Ray, Whisper); ~4.7 GB RAM
- **indexer-ui**: React file-upload dashboard; :8060; ~31 MB RAM
- **vllm-cpu (embedder)**: BAAI/bge-small-en-v1.5, 33M params, 384-dim embeddings; ~1.1 GB RAM
- **litellm-proxy**: routes OpenAI-compatible calls → AWS Bedrock; :4000, started with --drop_params; ~800 MB RAM
- **reranker-cpu**: gte-multilingual-reranker-base; ~2.1 GB RAM; enable for 50+ docs
- **milvus**: vector database storing document embeddings; ~190 MB RAM
- **minio + etcd**: object storage + metadata for Milvus; ~163 MB RAM combined
- **rdb (Postgres)**: internal metadata and partition config; ~43 MB RAM
- **AWS Bedrock**: Claude 3.5 Haiku, inference profile us.anthropic.claude-3-5-haiku-20241022-v1:0; DPA available; data not used for training; ~$0.25/M input tokens
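Because litellm-proxy exposes an OpenAI-compatible endpoint on :4000, any OpenAI-style client can target it directly. A minimal sketch using only the Python standard library; the model alias `claude-haiku`, the bearer token, and localhost are placeholders, not values from the deployed config:

```python
import json
import urllib.request

def chat_request(question: str, base: str = "http://localhost:4000") -> urllib.request.Request:
    """Build an OpenAI-style chat-completions request for the LiteLLM proxy.

    LiteLLM forwards this to Bedrock (Claude 3.5 Haiku) and, with
    drop_params enabled, strips parameters Bedrock would reject.
    """
    payload = {
        "model": "claude-haiku",  # placeholder alias; must match litellm_config.yaml
        "messages": [{"role": "user", "content": question}],
    }
    return urllib.request.Request(
        f"{base}/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer sk-placeholder",  # placeholder key
        },
    )

req = chat_request("Summarise the indexed documents.")
# urllib.request.urlopen(req) would send it; omitted here.
```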
When a user asks a question, this is the exact sequence of operations.
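That sequence can be sketched end-to-end with toy stand-ins. Every function below is illustrative (toy embeddings, squared-distance ranking), not OpenRAG's actual code, but the stage order matches the deployed components: embed the query (vllm-cpu, BGE-small), search Milvus, optionally rerank, then call the LLM via litellm-proxy:

```python
# Toy stand-ins for the deployed stages; names and math are illustrative only.

def embed(text: str) -> list[float]:
    # Stand-in for BAAI/bge-small-en-v1.5 (real output: a 384-dim vector).
    return [float(ord(c) % 7) for c in text[:8].ljust(8)]

def search(query_vec: list[float], corpus: list[dict], top_k: int = 2) -> list[dict]:
    # Stand-in for a Milvus similarity search: rank by squared distance.
    def dist(doc):
        return sum((a - b) ** 2 for a, b in zip(query_vec, doc["vec"]))
    return sorted(corpus, key=dist)[:top_k]

def answer(question: str, corpus: list[dict]) -> str:
    hits = search(embed(question), corpus)        # Milvus lookup
    # reranker-cpu would re-score `hits` here; it is stopped by default.
    context = "\n".join(d["text"] for d in hits)
    # The real pipeline now POSTs question + context to litellm-proxy (:4000),
    # which forwards it to Bedrock Claude 3.5 Haiku; stubbed as a string here.
    return f"[LLM answer grounded in:]\n{context}"

docs = ["Milvus stores embeddings", "MinIO holds objects"]
corpus = [{"text": t, "vec": embed(t)} for t in docs]
print(answer("where are embeddings stored", corpus))
```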
Nine containers in total. The reranker is stopped to save 2.1 GB of RAM; enable it for corpora of 50+ documents.
| Container | Image | RAM | Port | Status |
|---|---|---|---|---|
| openrag-cpu | linagoraai/openrag | 4.7 GB | :8080 | Running |
| litellm-proxy | ghcr.io/berriai/litellm | 800 MB | :4000 | Running |
| vllm-cpu | vllm (BGE-small) | 1.1 GB | :8000 | Healthy |
| milvus | milvusdb/milvus | 190 MB | :19530 | Healthy |
| minio | minio/minio | 116 MB | :9000 | Running |
| etcd | quay.io/coreos/etcd | 47 MB | :2379 | Running |
| rdb | postgres | 43 MB | :5432 | Running |
| indexer-ui | linagoraai/indexer-ui | 31 MB | :8060 | Running |
| reranker-cpu | gte-reranker-base | 2.1 GB | :7997 | Stopped |
Total idle RAM: ~6.4 GB (without reranker) / ~8.5 GB (with reranker) of 16 GB allocated.
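As a sanity check, the per-container figures in the table can be summed directly; note the straight sum comes out slightly above the quoted idle totals, which may reflect measurement at different times or shared pages between containers:

```python
# Per-container idle RAM from the table above, in GB.
ram_gb = {
    "openrag-cpu": 4.7,
    "litellm-proxy": 0.800,
    "vllm-cpu": 1.1,
    "milvus": 0.190,
    "minio": 0.116,
    "etcd": 0.047,
    "rdb": 0.043,
    "indexer-ui": 0.031,
}
reranker_gb = 2.1  # reranker-cpu, stopped by default

without_reranker = sum(ram_gb.values())
with_reranker = without_reranker + reranker_gb
print(f"~{without_reranker:.1f} GB without reranker, ~{with_reranker:.1f} GB with")
```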
- **Level 1**: single VM on Proxmox, VMware, Hyper-V, or a cloud VPS
- **Level 2**: 1–2 VMs with a reverse proxy, SSO, encrypted storage, and audit logging
- **Level 3**: Kubernetes cluster with HA, multi-tenant isolation, GPU inference, and observability
| Component | Level 1 | Level 2 | Level 3 |
|---|---|---|---|
| Embedder | BGE-small-en, 33M params, 130 MB | BGE-base-en, 110M params, 400 MB | BGE-large / multilingual, GPU-accelerated |
| LLM | Cloud: Bedrock Haiku, or Ollama (Llama 3.1 8B) | Cloud: Bedrock + DPA, or local Mistral 7B | Fully local: Llama 70B or Mixtral 8×7B (air-gapped) |
| Reranker | None (<50 documents) | BGE-reranker-base, ~500 MB | BGE-reranker-v2-m3, GPU-accelerated |
| Privacy | Embeddings local; LLM via cloud with DPA | Everything local possible; cloud fallback with DPA | Fully air-gapped; zero cloud dependency |
Source of truth on the OpenRAG VM: /opt/openrag/quick_start/.env
| File | Path on VM | Purpose |
|---|---|---|
| Main .env | /opt/openrag/quick_start/.env | All environment variables |
| Docker Compose | /opt/openrag/quick_start/docker-compose.yaml | Container definitions + volumes |
| Pipeline patch | /opt/openrag/pipeline_patched.py | Removes vLLM-specific params (Bedrock compatibility) |
| LiteLLM config | /opt/openrag/litellm_config.yaml | Bedrock model routing |
| Hydra config | /opt/openrag/.hydra_config/config.yaml | OpenRAG internal settings |
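The Bedrock routing in litellm_config.yaml follows LiteLLM's standard proxy schema. A minimal sketch of what such a file looks like; the model alias and comments are assumptions, not a copy of the deployed file:

```yaml
model_list:
  - model_name: claude-haiku            # alias exposed on the OpenAI-compatible API
    litellm_params:
      model: bedrock/us.anthropic.claude-3-5-haiku-20241022-v1:0
      # AWS credentials and region are read from the environment (.env)

litellm_settings:
  drop_params: true   # config-file equivalent of the --drop_params flag
```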