The open-source platform
for AI chat agents.
Deploy intelligent RAG chat agents powered by your own data. Start for free on our managed cloud, or self-host natively inside your own AWS account.
Live demo – RAG chat with document upload, SQL querying, and streaming responses
How VegaRAG Actually Works
A production-grade RAG platform built on AWS Fargate, LangGraph StateGraph agents, Pinecone vector search, PostgreSQL Row-Level Security, and Amazon Bedrock Nova – guarded by Microsoft Presidio PII redaction and dual-LLM hallucination checks.
Layer 1 – Cloud Infrastructure
Route 53 + ACM
Custom domain with HTTPS. SSL certificate auto-renewed via ACM. All traffic encrypted in transit end-to-end.
Application Load Balancer
Single ALB handles all traffic via priority rules: /api/* → Backend, /chat/* → Chat UI, /* → Frontend.
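A minimal boto3 sketch of how those path-based priority rules can be registered on the HTTPS listener; the listener and target-group ARNs below are placeholders, not the real VegaRAG resources.

```python
# Sketch only: register path-based forwarding rules on the ALB's HTTPS listener.
# All ARNs are placeholders; real values come from the deployed stack.
import boto3

elbv2 = boto3.client("elbv2", region_name="us-east-1")

RULES = [
    (10, "/api/*",  "arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/TG-Backend/abc"),
    (20, "/chat/*", "arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/TG-ChatUI/def"),
    # /* falls through to the listener's default action -> Frontend target group.
]

for priority, pattern, tg_arn in RULES:
    elbv2.create_rule(
        ListenerArn="arn:aws:elasticloadbalancing:us-east-1:123456789012:listener/app/vegarag/ghi/jkl",
        Priority=priority,
        Conditions=[{"Field": "path-pattern", "Values": [pattern]}],
        Actions=[{"Type": "forward", "TargetGroupArn": tg_arn}],
    )
```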
ECS Fargate Cluster
Serverless containers – no EC2 to manage. Auto-scales on demand, zero idle cost. Each service in its own task definition with separate IAM task roles.
VPC + Security Groups
All services share one SG. Inbound: 80, 443, 3000, 3001, 8000. No hardcoded keys – IAM task role auth only.
Layer 2 – Three Fargate Microservices
Frontend Dashboard
- AWS Cognito authentication (JWT + Refresh tokens)
- Agent CRUD โ create, configure, deploy agents
- Data ingestion: URLs, PDFs, CSVs, plain text
- Workflow Studio (ReactFlow visual canvas)
- Analytics charts, chat logs, deploy embed codes
- Settings: system prompt, brand color, chat UI branding (title + logo per agent)
Backend AI Engine
- LangGraph StateGraph orchestrates every agent run end-to-end
- Semantic Caching (Pinecone-backed <50ms exact-match replies)
- Token Bucket Rate Limiting (Multi-tenant noisy-neighbor protection)
- Microsoft Presidio PII Redaction (SSN/email anonymization; see the sketch after this list)
- PostgreSQL Data Warehouse with mandatory Row-Level Security (RLS)
- Output Guardrails (Dual-LLM entailment checks to block hallucinations)
- Asynchronous Background Ingestion (No ALB timeouts on large PDFs)
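As a rough sketch of the Presidio step in the list above, the snippet below strips SSNs and email addresses from a query before it reaches the LLM. The entity list, the replacement token, and the assumption that a default spaCy model is installed are illustrative choices, not the exact production configuration.

```python
# Sketch: PII redaction with Microsoft Presidio (the default analyzer needs
# a spaCy model such as en_core_web_lg installed).
from presidio_analyzer import AnalyzerEngine
from presidio_anonymizer import AnonymizerEngine
from presidio_anonymizer.entities import OperatorConfig

analyzer = AnalyzerEngine()
anonymizer = AnonymizerEngine()

def redact_pii(text: str) -> str:
    findings = analyzer.analyze(
        text=text,
        entities=["US_SSN", "EMAIL_ADDRESS"],  # illustrative entity set
        language="en",
    )
    redacted = anonymizer.anonymize(
        text=text,
        analyzer_results=findings,
        operators={"DEFAULT": OperatorConfig("replace", {"new_value": "<REDACTED>"})},
    )
    return redacted.text

print(redact_pii("My SSN is 123-45-6789, reach me at jane@example.com"))
# -> "My SSN is <REDACTED>, reach me at <REDACTED>"
```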
Chat UI Service
- LangGraph SDK React – full streaming state management
- Next.js API proxy at /chat/api/langgraph/* (server-side)
- Translates LangGraph SDK wire calls → VegaRAG REST API
- Thread history sidebar grouped by assistantId
- Per-agent branding: custom name + logo from backend config
- Artifact renderer for structured outputs, tool call display
- SSE event reconstruction with stable message IDs
Layer 3 – LangGraph StateGraph Agent Topology
User query enters the StateGraph. Session ID, bot_id, and conversation history are loaded from DynamoDB into the graph context.
Bedrock Nova Lite classifies the query as casual, rag, or sql using strict JSON-schema structured output parsing.
A conditional edge dispatches to the RAG retriever node, the SQL executor node, or the direct casual LLM node (see the sketch after these steps).
Retrieved Pinecone chunks or DuckDB SQL rows are injected into the prompt inside <context>...</context> XML markers.
Bedrock Nova Pro streams tokens. FastAPI yields each chunk as SSE to the Chat UI proxy and then to the browser.
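Expressed as code, the topology above maps onto a LangGraph StateGraph roughly like this sketch; the node names, GraphState fields, and stubbed node bodies are illustrative, not the actual VegaRAG source.

```python
# Sketch: intent routing with a conditional edge into RAG, SQL, or casual branches.
from typing import List, TypedDict
from langgraph.graph import END, START, StateGraph

class GraphState(TypedDict):
    query: str
    bot_id: str
    session_id: str
    history: List[dict]
    intent: str   # "casual" | "rag" | "sql"
    context: str  # retrieved chunks or SQL rows
    answer: str

def load_session(state: GraphState) -> dict:
    # Load conversation history for (bot_id, session_id) from DynamoDB.
    return {"history": []}

def classify_intent(state: GraphState) -> dict:
    # Nova Lite call with a strict JSON schema; stubbed here.
    return {"intent": "rag"}

def retrieve(state: GraphState) -> dict:
    # Titan embedding + Pinecone top-5 ANN search; stubbed here.
    return {"context": "<context>retrieved chunks</context>"}

def run_sql(state: GraphState) -> dict:
    # Text-to-SQL generation + read-only execution; stubbed here.
    return {"context": "<context>result rows</context>"}

def generate(state: GraphState) -> dict:
    # Nova Pro streams the final answer from query + context + history.
    return {"answer": "stubbed answer"}

graph = StateGraph(GraphState)
graph.add_node("load_session", load_session)
graph.add_node("classify_intent", classify_intent)
graph.add_node("retrieve", retrieve)
graph.add_node("run_sql", run_sql)
graph.add_node("generate", generate)

graph.add_edge(START, "load_session")
graph.add_edge("load_session", "classify_intent")
graph.add_conditional_edges(
    "classify_intent",
    lambda s: s["intent"],
    {"rag": "retrieve", "sql": "run_sql", "casual": "generate"},
)
graph.add_edge("retrieve", "generate")
graph.add_edge("run_sql", "generate")
graph.add_edge("generate", END)

agent = graph.compile()
```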
Layer 4 – Persistent Data Stores
DynamoDB (Single Table)
- PK: USER#{email} → agent list per user
- PK: AGENT#{id} / SK: CONFIG → prompt, brand, chat title/logo
- PK: AGENT#{id} / SK: SOURCE#* → data sources
- PK: ACTIVITY#{id} / SK: ENTRY#* → full chat logs
- PK: STATS#{id} / SK: DAY#* → daily query metrics (access-pattern sketch below)
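A short boto3 sketch of those access patterns; the table name, example IDs, and item attributes are assumptions for illustration.

```python
# Sketch: single-table reads against the key schema listed above.
import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("vegarag")  # table name assumed

# All agents owned by one user.
agents = table.query(
    KeyConditionExpression=Key("PK").eq("USER#jane@example.com")
)["Items"]

# One agent's config (prompt, brand, chat title/logo).
config = table.get_item(
    Key={"PK": "AGENT#bot_123", "SK": "CONFIG"}
).get("Item")

# Every chat-log entry for an agent, newest first.
entries = table.query(
    KeyConditionExpression=Key("PK").eq("ACTIVITY#bot_123")
    & Key("SK").begins_with("ENTRY#"),
    ScanIndexForward=False,
)["Items"]
```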
Pinecone Vector DB
- Namespace-per-agent isolation (no cross-contamination)
- Amazon Titan v2 embeddings (1024-dim)
- Top-5 cosine similarity ANN retrieval (see the query sketch below)
- Semantic Caching: sub-50ms cache hits bypassing the LLM
- Sub-50ms query latency at scale
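A hedged sketch of the retrieval path: embed the query with Titan on Bedrock, then run a top-5 search scoped to the agent's own namespace. The index name, model ID, API key placeholder, and the `text` metadata field are assumptions.

```python
# Sketch: Titan embedding + namespace-scoped Pinecone ANN search.
import json
import boto3
from pinecone import Pinecone

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")
index = Pinecone(api_key="YOUR_API_KEY").Index("vegarag")  # names assumed

def embed(text: str) -> list[float]:
    resp = bedrock.invoke_model(
        modelId="amazon.titan-embed-text-v2:0",
        body=json.dumps({"inputText": text}),
    )
    return json.loads(resp["body"].read())["embedding"]

def retrieve_chunks(agent_id: str, query: str) -> list[str]:
    hits = index.query(
        vector=embed(query),
        top_k=5,
        namespace=agent_id,   # namespace-per-agent isolation
        include_metadata=True,
    )
    return [m.metadata["text"] for m in hits.matches]
```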
PostgreSQL Data Warehouse
- Mandatory Row-Level Security (SET LOCAL app.current_tenant; see the sketch after this list)
- Text-to-SQL via Bedrock Nova structured output
- Read-only session enforcement (blocks DROP/DELETE)
- Cross-tenant data leakage structurally impossible
- Enterprise-scale persistence and execution
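Roughly how a tenant-scoped, read-only warehouse query can be executed from Python: `set_config(..., true)` is the parameterizable equivalent of SET LOCAL, and the table, column, and policy names are illustrative only.

```python
# Sketch: assumes an RLS policy along these lines already exists:
#   ALTER TABLE orders ENABLE ROW LEVEL SECURITY;
#   CREATE POLICY tenant_isolation ON orders
#       USING (tenant_id = current_setting('app.current_tenant'));
import psycopg

def run_tenant_query(dsn: str, tenant_id: str, generated_sql: str) -> list[tuple]:
    with psycopg.connect(dsn) as conn:
        with conn.transaction():
            with conn.cursor() as cur:
                # Read-only transaction: writes like DROP/DELETE are rejected.
                cur.execute("SET TRANSACTION READ ONLY")
                # Pin the tenant for this transaction only (is_local = true).
                cur.execute(
                    "SELECT set_config('app.current_tenant', %s, true)",
                    (tenant_id,),
                )
                cur.execute(generated_sql)  # text-to-SQL output from Nova
                return cur.fetchall()
```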
OpenTelemetry + CloudWatch
- Structured JSON Logging (TraceID/SpanID injection; sketch after this list)
- AWS X-Ray Distributed Waterfall Tracing
- CloudWatch Logs: /ecs/vegarag-* log streams
- CloudWatch Metrics: task CPU + memory graphs
- ALB access logs for traffic + error analysis
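One possible way to inject the active trace and span IDs into structured JSON log lines using the OpenTelemetry API and the standard logging module; the logger name and field layout are illustrative.

```python
# Sketch: JSON log records carrying the current OpenTelemetry trace context,
# so CloudWatch Logs lines can be correlated with X-Ray traces.
import json
import logging
from opentelemetry import trace

class TraceContextFilter(logging.Filter):
    def filter(self, record: logging.LogRecord) -> bool:
        ctx = trace.get_current_span().get_span_context()
        record.trace_id = format(ctx.trace_id, "032x") if ctx.is_valid else ""
        record.span_id = format(ctx.span_id, "016x") if ctx.is_valid else ""
        return True

class JsonFormatter(logging.Formatter):
    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "level": record.levelname,
            "message": record.getMessage(),
            "trace_id": getattr(record, "trace_id", ""),
            "span_id": getattr(record, "span_id", ""),
        })

handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("vegarag")
logger.addHandler(handler)
logger.addFilter(TraceContextFilter())
logger.setLevel(logging.INFO)
```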
Complete Request Lifecycle – A to Z
Every single hop from the browser to Bedrock and back, in exact order.
User visits vegarag.com/chat?assistantId=bot_xxx
Browser hits Route 53 → resolves to ALB → HTTPS listener on port 443
ALB matches /chat/* priority rule → forwards to TG-ChatUI on port 3001
Chat UI Next.js container (basePath=/chat) receives the request and serves the React app
React app boots, LangGraph SDK reads assistantId from URL query param
SDK fetches /chat/api/langgraph/info → Next.js proxy returns {version} confirming the proxy is alive
SDK calls POST /chat/api/langgraph/threads/search to load thread history
Next.js proxy hits backend GET /api/agents/{id}/activity → groups by session_id → returns thread list
User types a message and submits the chat form
SDK fires POST /chat/api/langgraph/threads/{thread_id}/runs/stream with message payload
Next.js proxy receives request, extracts query text + bot_id
Forwards to backend POST /api/chat as {bot_id, session_id, query} JSON body
FastAPI receives request, starts LangGraph StateGraph run
Entry node loads conversation history from DynamoDB, initializes GraphState
Intent Router node calls Bedrock Nova Lite with structured schema
LLM returns JSON: {intent: 'rag'|'sql'|'casual'} → conditional edge dispatches to the correct branch
RAG branch: Titan v2 embeds query → Pinecone top-5 cosine ANN search
SQL branch: Nova generates SQL → DuckDB executes in-memory → structured rows returned
Context injected into Nova Pro prompt via <context>...</context> XML markers
FastAPI AsyncIterator chunks the Bedrock token stream → yields SSE events: data: {text: '...'} (see the endpoint sketch after these steps)
Next.js proxy re-wraps backend SSE as valid LangGraph values events
Browser LangGraph SDK React hook receives events, updates message state, renders tokens progressively
Stream ends. SDK fires GET /chat/api/langgraph/threads/{id}/state
Proxy fetches full session from DynamoDB, reconstructs LangGraph message format with stable IDs
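Seen from the backend, the streaming half of this lifecycle might look like the sketch below: POST /api/chat starts the graph run and relays token chunks to the proxy as SSE. The request fields match the lifecycle description; the `agent_graph` module and the streaming details are assumptions rather than the exact production code.

```python
# Sketch: FastAPI endpoint that runs the agent and streams tokens as SSE.
import json
from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from pydantic import BaseModel

from agent_graph import agent  # hypothetical module exposing the compiled Layer 3 graph

app = FastAPI()

class ChatRequest(BaseModel):
    bot_id: str
    session_id: str
    query: str

@app.post("/api/chat")
async def chat(req: ChatRequest) -> StreamingResponse:
    async def event_stream():
        # stream_mode="messages" yields (message_chunk, metadata) pairs as the
        # LLM node streams; each non-empty chunk becomes one SSE `data:` event.
        async for chunk, _meta in agent.astream(
            {"query": req.query, "bot_id": req.bot_id, "session_id": req.session_id},
            stream_mode="messages",
        ):
            if chunk.content:
                yield f"data: {json.dumps({'text': chunk.content})}\n\n"

    return StreamingResponse(event_stream(), media_type="text/event-stream")
```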
Deploy exactly how you want.
Choose zero-config managed hosting with up to 250k free tokens per month, or own your data completely by deploying the open-source platform into your own AWS account.
The Free SaaS Trial
Test your agents instantly without touching AWS. We provide Amazon Bedrock compute and Pinecone vector hosting free for up to 250,000 tokens per month and 100MB of storage.
Self-Host Open Source
Use our exhaustive GitHub deployment instructions to provision 100% of the infrastructure inside your own AWS VPC, with strict per-service ECS Fargate task IAM roles and no long-lived access keys.