The open-source platform
for AI chat agents.
Deploy intelligent RAG chat agents powered by your own data. Start for free on our managed cloud, or self-host natively inside your own AWS account.
Live demo – RAG chat with document upload, SQL querying, and streaming responses
How VegaRAG Actually Works
A production-grade RAG platform built on AWS Fargate, LangGraph StateGraph agents, Pinecone vector search, PostgreSQL Row-Level Security, and Amazon Bedrock Nova – guarded by Microsoft Presidio PII redaction and dual-LLM hallucination checks.
Layer 1 – Cloud Infrastructure
Route 53 + ACM
Custom domain with HTTPS. SSL certificate auto-renewed via ACM. All traffic encrypted in transit end-to-end.
Application Load Balancer
Single ALB handles all traffic via priority rules: /api/* → Backend, /chat/* → Chat UI, /* → Frontend.
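A minimal boto3 sketch of how those path-based priority rules can be registered on the HTTPS listener; the listener and target-group ARNs below are placeholders, not the real VegaRAG resources.

```python
# Sketch only: register path-based forwarding rules on the ALB's HTTPS listener.
# All ARNs are placeholders; real values come from the deployed stack.
import boto3

elbv2 = boto3.client("elbv2", region_name="us-east-1")

RULES = [
    (10, "/api/*",  "arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/TG-Backend/abc"),
    (20, "/chat/*", "arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/TG-ChatUI/def"),
    # /* falls through to the listener's default action -> Frontend target group.
]

for priority, pattern, tg_arn in RULES:
    elbv2.create_rule(
        ListenerArn="arn:aws:elasticloadbalancing:us-east-1:123456789012:listener/app/vegarag/ghi/jkl",
        Priority=priority,
        Conditions=[{"Field": "path-pattern", "Values": [pattern]}],
        Actions=[{"Type": "forward", "TargetGroupArn": tg_arn}],
    )
```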
ECS Fargate Cluster
Serverless containers – no EC2 to manage. Auto-scales on demand, zero idle cost. Each service in its own task definition with separate IAM task roles.
VPC + Security Groups
All services share one SG. Inbound: 80, 443, 3000, 3001, 8000. No hardcoded keys – IAM task role auth only.
Layer 2 – Three Fargate Microservices
Frontend Dashboard
- AWS Cognito authentication (JWT + Refresh tokens)
- Agent CRUD โ create, configure, deploy agents
- Data ingestion: URLs, PDFs, CSVs, plain text
- Workflow Studio (ReactFlow visual canvas)
- Analytics charts, chat logs, deploy embed codes
- Settings: system prompt, brand color, chat UI branding (title + logo per agent)
Backend AI Engine
- LangGraph StateGraph orchestrates every agent run end-to-end
- Semantic Caching (Pinecone-backed <50ms exact-match replies)
- Token Bucket Rate Limiting (Multi-tenant noisy-neighbor protection)
- Microsoft Presidio PII Redaction (SSN/email anonymization; see the sketch after this list)
- PostgreSQL Data Warehouse with mandatory Row-Level Security (RLS)
- Output Guardrails (Dual-LLM entailment checks to block hallucinations)
- Asynchronous Background Ingestion (No ALB timeouts on large PDFs)
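As a rough sketch of the Presidio step in the list above, the snippet below strips SSNs and email addresses from a query before it reaches the LLM. The entity list, the replacement token, and the assumption that a default spaCy model is installed are illustrative choices, not the exact production configuration.

```python
# Sketch: PII redaction with Microsoft Presidio (the default analyzer needs
# a spaCy model such as en_core_web_lg installed).
from presidio_analyzer import AnalyzerEngine
from presidio_anonymizer import AnonymizerEngine
from presidio_anonymizer.entities import OperatorConfig

analyzer = AnalyzerEngine()
anonymizer = AnonymizerEngine()

def redact_pii(text: str) -> str:
    findings = analyzer.analyze(
        text=text,
        entities=["US_SSN", "EMAIL_ADDRESS"],  # illustrative entity set
        language="en",
    )
    redacted = anonymizer.anonymize(
        text=text,
        analyzer_results=findings,
        operators={"DEFAULT": OperatorConfig("replace", {"new_value": "<REDACTED>"})},
    )
    return redacted.text

print(redact_pii("My SSN is 123-45-6789, reach me at jane@example.com"))
# -> "My SSN is <REDACTED>, reach me at <REDACTED>"
```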
Chat UI Service
- LangGraph SDK React – full streaming state management
- Next.js API proxy at /chat/api/langgraph/* (server-side)
- Translates LangGraph SDK wire calls → VegaRAG REST API
- Thread history sidebar grouped by assistantId
- Per-agent branding: custom name + logo from backend config
- Artifact renderer for structured outputs, tool call display
- SSE event reconstruction with stable message IDs
Layer 3 – LangGraph StateGraph Agent Topology
User query enters the StateGraph. Session ID, bot_id, and conversation history are loaded from DynamoDB into the graph context.
Bedrock Nova Lite classifies the query as casual, rag, or sql using strict JSON-schema structured output parsing.
A conditional edge dispatches to the RAG retriever node, the SQL executor node, or the direct casual LLM node (see the sketch after these steps).
Retrieved Pinecone chunks or DuckDB SQL rows are injected into the prompt inside <context>...</context> XML markers.
Bedrock Nova Pro streams tokens. FastAPI yields each chunk as SSE to the Chat UI proxy and then to the browser.
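Expressed as code, the topology above maps onto a LangGraph StateGraph roughly like this sketch; the node names, GraphState fields, and stubbed node bodies are illustrative, not the actual VegaRAG source.

```python
# Sketch: intent routing with a conditional edge into RAG, SQL, or casual branches.
from typing import List, TypedDict
from langgraph.graph import END, START, StateGraph

class GraphState(TypedDict):
    query: str
    bot_id: str
    session_id: str
    history: List[dict]
    intent: str   # "casual" | "rag" | "sql"
    context: str  # retrieved chunks or SQL rows
    answer: str

def load_session(state: GraphState) -> dict:
    # Load conversation history for (bot_id, session_id) from DynamoDB.
    return {"history": []}

def classify_intent(state: GraphState) -> dict:
    # Nova Lite call with a strict JSON schema; stubbed here.
    return {"intent": "rag"}

def retrieve(state: GraphState) -> dict:
    # Titan embedding + Pinecone top-5 ANN search; stubbed here.
    return {"context": "<context>retrieved chunks</context>"}

def run_sql(state: GraphState) -> dict:
    # Text-to-SQL generation + read-only execution; stubbed here.
    return {"context": "<context>result rows</context>"}

def generate(state: GraphState) -> dict:
    # Nova Pro streams the final answer from query + context + history.
    return {"answer": "stubbed answer"}

graph = StateGraph(GraphState)
graph.add_node("load_session", load_session)
graph.add_node("classify_intent", classify_intent)
graph.add_node("retrieve", retrieve)
graph.add_node("run_sql", run_sql)
graph.add_node("generate", generate)

graph.add_edge(START, "load_session")
graph.add_edge("load_session", "classify_intent")
graph.add_conditional_edges(
    "classify_intent",
    lambda s: s["intent"],
    {"rag": "retrieve", "sql": "run_sql", "casual": "generate"},
)
graph.add_edge("retrieve", "generate")
graph.add_edge("run_sql", "generate")
graph.add_edge("generate", END)

agent = graph.compile()
```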
Layer 4 – Persistent Data Stores
DynamoDB (Single Table)
- PK: USER#{email} → agent list per user
- PK: AGENT#{id} / SK: CONFIG → prompt, brand, chat title/logo
- PK: AGENT#{id} / SK: SOURCE#* → data sources
- PK: ACTIVITY#{id} / SK: ENTRY#* → full chat logs
- PK: STATS#{id} / SK: DAY#* → daily query metrics (access-pattern sketch below)
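A short boto3 sketch of those access patterns; the table name, example IDs, and item attributes are assumptions for illustration.

```python
# Sketch: single-table reads against the key schema listed above.
import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("vegarag")  # table name assumed

# All agents owned by one user.
agents = table.query(
    KeyConditionExpression=Key("PK").eq("USER#jane@example.com")
)["Items"]

# One agent's config (prompt, brand, chat title/logo).
config = table.get_item(
    Key={"PK": "AGENT#bot_123", "SK": "CONFIG"}
).get("Item")

# Every chat-log entry for an agent, newest first.
entries = table.query(
    KeyConditionExpression=Key("PK").eq("ACTIVITY#bot_123")
    & Key("SK").begins_with("ENTRY#"),
    ScanIndexForward=False,
)["Items"]
```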
Pinecone Vector DB
- Namespace-per-agent isolation (no cross-contamination)
- Amazon Titan v2 embeddings (1024-dim)
- Top-5 cosine similarity ANN retrieval (see the query sketch below)
- Semantic Caching: sub-50ms cache hits bypassing the LLM
- Sub-50ms query latency at scale
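A hedged sketch of the retrieval path: embed the query with Titan on Bedrock, then run a top-5 search scoped to the agent's own namespace. The index name, model ID, API key placeholder, and the `text` metadata field are assumptions.

```python
# Sketch: Titan embedding + namespace-scoped Pinecone ANN search.
import json
import boto3
from pinecone import Pinecone

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")
index = Pinecone(api_key="YOUR_API_KEY").Index("vegarag")  # names assumed

def embed(text: str) -> list[float]:
    resp = bedrock.invoke_model(
        modelId="amazon.titan-embed-text-v2:0",
        body=json.dumps({"inputText": text}),
    )
    return json.loads(resp["body"].read())["embedding"]

def retrieve_chunks(agent_id: str, query: str) -> list[str]:
    hits = index.query(
        vector=embed(query),
        top_k=5,
        namespace=agent_id,   # namespace-per-agent isolation
        include_metadata=True,
    )
    return [m.metadata["text"] for m in hits.matches]
```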
PostgreSQL Data Warehouse
- Mandatory Row-Level Security (SET LOCAL app.current_tenant; see the sketch after this list)
- Text-to-SQL via Bedrock Nova structured output
- Read-only session enforcement (blocks DROP/DELETE)
- Cross-tenant data leakage structurally impossible
- Enterprise-scale persistence and execution
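Roughly how a tenant-scoped, read-only warehouse query can be executed from Python: `set_config(..., true)` is the parameterizable equivalent of SET LOCAL, and the table, column, and policy names are illustrative only.

```python
# Sketch: assumes an RLS policy along these lines already exists:
#   ALTER TABLE orders ENABLE ROW LEVEL SECURITY;
#   CREATE POLICY tenant_isolation ON orders
#       USING (tenant_id = current_setting('app.current_tenant'));
import psycopg

def run_tenant_query(dsn: str, tenant_id: str, generated_sql: str) -> list[tuple]:
    with psycopg.connect(dsn) as conn:
        with conn.transaction():
            with conn.cursor() as cur:
                # Read-only transaction: writes like DROP/DELETE are rejected.
                cur.execute("SET TRANSACTION READ ONLY")
                # Pin the tenant for this transaction only (is_local = true).
                cur.execute(
                    "SELECT set_config('app.current_tenant', %s, true)",
                    (tenant_id,),
                )
                cur.execute(generated_sql)  # text-to-SQL output from Nova
                return cur.fetchall()
```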
OpenTelemetry + CloudWatch
- Structured JSON Logging (TraceID/SpanID injection; sketch after this list)
- AWS X-Ray Distributed Waterfall Tracing
- CloudWatch Logs: /ecs/vegarag-* log streams
- CloudWatch Metrics: task CPU + memory graphs
- ALB access logs for traffic + error analysis
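One possible way to inject the active trace and span IDs into structured JSON log lines using the OpenTelemetry API and the standard logging module; the logger name and field layout are illustrative.

```python
# Sketch: JSON log records carrying the current OpenTelemetry trace context,
# so CloudWatch Logs lines can be correlated with X-Ray traces.
import json
import logging
from opentelemetry import trace

class TraceContextFilter(logging.Filter):
    def filter(self, record: logging.LogRecord) -> bool:
        ctx = trace.get_current_span().get_span_context()
        record.trace_id = format(ctx.trace_id, "032x") if ctx.is_valid else ""
        record.span_id = format(ctx.span_id, "016x") if ctx.is_valid else ""
        return True

class JsonFormatter(logging.Formatter):
    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "level": record.levelname,
            "message": record.getMessage(),
            "trace_id": getattr(record, "trace_id", ""),
            "span_id": getattr(record, "span_id", ""),
        })

handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("vegarag")
logger.addHandler(handler)
logger.addFilter(TraceContextFilter())
logger.setLevel(logging.INFO)
```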
Complete Request Lifecycle – A to Z
Every single hop from the browser to Bedrock and back, in exact order.
User visits vegarag.com/chat?assistantId=bot_xxx
Browser hits Route 53 → resolves to ALB → HTTPS listener on port 443
ALB matches /chat/* priority rule → forwards to TG-ChatUI on port 3001
Chat UI Next.js container (basePath=/chat) receives the request and serves the React app
React app boots, LangGraph SDK reads assistantId from URL query param
SDK fetches /chat/api/langgraph/info → Next.js proxy returns {version} confirming the proxy is alive
SDK calls POST /chat/api/langgraph/threads/search to load thread history
Next.js proxy hits backend GET /api/agents/{id}/activity → groups by session_id → returns thread list
User types a message and submits the chat form
SDK fires POST /chat/api/langgraph/threads/{thread_id}/runs/stream with message payload
Next.js proxy receives request, extracts query text + bot_id
Forwards to backend POST /api/chat as {bot_id, session_id, query} JSON body
FastAPI receives request, starts LangGraph StateGraph run
Entry node loads conversation history from DynamoDB, initializes GraphState
Intent Router node calls Bedrock Nova Lite with structured schema
LLM returns JSON: {intent: 'rag'|'sql'|'casual'} → conditional edge dispatches to the correct branch
RAG branch: Titan v2 embeds query → Pinecone top-5 cosine ANN search
SQL branch: Nova generates SQL → DuckDB executes in-memory → structured rows returned
Context injected into Nova Pro prompt via <context>...</context> XML markers
FastAPI AsyncIterator chunks the Bedrock token stream → yields SSE events: data: {text: '...'} (see the endpoint sketch after these steps)
Next.js proxy re-wraps backend SSE as valid LangGraph values events
Browser LangGraph SDK React hook receives events, updates message state, renders tokens progressively
Stream ends. SDK fires GET /chat/api/langgraph/threads/{id}/state
Proxy fetches full session from DynamoDB, reconstructs LangGraph message format with stable IDs
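Seen from the backend, the streaming half of this lifecycle might look like the sketch below: POST /api/chat starts the graph run and relays token chunks to the proxy as SSE. The request fields match the lifecycle description; the `agent_graph` module and the streaming details are assumptions rather than the exact production code.

```python
# Sketch: FastAPI endpoint that runs the agent and streams tokens as SSE.
import json
from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from pydantic import BaseModel

from agent_graph import agent  # hypothetical module exposing the compiled Layer 3 graph

app = FastAPI()

class ChatRequest(BaseModel):
    bot_id: str
    session_id: str
    query: str

@app.post("/api/chat")
async def chat(req: ChatRequest) -> StreamingResponse:
    async def event_stream():
        # stream_mode="messages" yields (message_chunk, metadata) pairs as the
        # LLM node streams; each non-empty chunk becomes one SSE `data:` event.
        async for chunk, _meta in agent.astream(
            {"query": req.query, "bot_id": req.bot_id, "session_id": req.session_id},
            stream_mode="messages",
        ):
            if chunk.content:
                yield f"data: {json.dumps({'text': chunk.content})}\n\n"

    return StreamingResponse(event_stream(), media_type="text/event-stream")
```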
Deploy exactly how you want.
Choose zero-config managed hosting with up to 250k free tokens per month, or own your data completely by deploying the open-source platform into your own AWS account.
The Free SaaS Trial
Test your agents instantly without touching AWS. We provide Amazon Bedrock compute and Pinecone vector hosting free for up to 250,000 tokens per month and 100MB of storage.
Self-Host Open Source
Use our exhaustive GitHub deployment instructions to provision 100% of the infrastructure inside your own AWS VPC, with strict per-service ECS Fargate task IAM roles and no long-lived access keys.