Open-source · Self-host on AWS Fargate

The open-source platform
for AI chat agents.

Deploy intelligent RAG chat agents powered by your own data. Start for free on our managed cloud, or self-host natively inside your own AWS account.

Amazon Bedrock Nova
PostgreSQL RLS
Pinecone Semantic Cache
Microsoft Presidio PII
AWS Fargate
Using VegaRAG? Tell us what you're building → 2-min Feedback Form · GitHub Discussions
vegarag.com/chat

Live demo — RAG chat with document upload, SQL querying, and streaming responses

Full Stack Architecture

How VegaRAG Actually Works

A production-grade RAG platform built on AWS Fargate, LangGraph StateGraph agents, Pinecone vector search, PostgreSQL Row-Level Security, and Amazon Bedrock Nova — guarded by Microsoft Presidio PII redaction and dual-LLM hallucination checks.

AWS

Layer 1 โ€” Cloud Infrastructure

DNS + TLS

Route 53 + ACM

Custom domain with HTTPS. SSL certificate auto-renewed via ACM. All traffic encrypted in transit end-to-end.

ALB + Listener Rules

Application Load Balancer

Single ALB handles all traffic via priority rules: /api/* → Backend, /chat/* → Chat UI, /* → Frontend.

3 Services

ECS Fargate Cluster

Serverless containers — no EC2 to manage. Auto-scales on demand, zero idle cost. Each service in its own task definition with separate IAM task roles.

Network Isolation

VPC + Security Groups

All services share one security group. Inbound: 80, 443, 3000, 3001, 8000. No hardcoded keys — IAM task role auth only.

ECS

Layer 2 โ€” Three Fargate Microservices

Next.js 15 · Port 3000

Frontend Dashboard

  • AWS Cognito authentication (JWT + Refresh tokens)
  • Agent CRUD — create, configure, deploy agents
  • Data ingestion: URLs, PDFs, CSVs, plain text
  • Workflow Studio (ReactFlow visual canvas)
  • Analytics charts, chat logs, deploy embed codes
  • Settings: system prompt, brand color, chat UI branding (title + logo per agent)
FastAPI + LangGraph · Port 8000

Backend AI Engine

  • LangGraph StateGraph orchestrates every agent run end-to-end
  • Semantic Caching (Pinecone-backed, sub-50 ms replies for semantically similar queries)
  • Token Bucket Rate Limiting (Multi-tenant noisy-neighbor protection)
  • Microsoft Presidio PII Redaction (SSN/Email anonymisation)
  • PostgreSQL Data Warehouse with mandatory Row-Level Security (RLS)
  • Output Guardrails (Dual-LLM entailment checks to block hallucinations)
  • Asynchronous Background Ingestion (No ALB timeouts on large PDFs)
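A minimal sketch of the token-bucket idea behind the rate limiter; the class and function names here are illustrative, not VegaRAG's actual API:

```python
import time

class TokenBucket:
    """Per-tenant token bucket: refills `rate` tokens/sec, bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity          # start full
        self.last = time.monotonic()

    def allow(self, cost: float = 1.0) -> bool:
        now = time.monotonic()
        # Refill in proportion to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False

# One bucket per tenant: a burst from one tenant never drains another's budget.
buckets: dict[str, TokenBucket] = {}

def check_rate_limit(tenant_id: str) -> bool:
    bucket = buckets.setdefault(tenant_id, TokenBucket(rate=5.0, capacity=10.0))
    return bucket.allow()
```

Keeping the buckets keyed per tenant is what gives the noisy-neighbor protection: exhausting one bucket rejects only that tenant's requests.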
Next.js 15 · Port 3001 · basePath /chat

Chat UI Service

  • LangGraph SDK React — full streaming state management
  • Next.js API proxy at /chat/api/langgraph/* (server-side)
  • Translates LangGraph SDK wire calls → VegaRAG REST API
  • Thread history sidebar grouped by assistantId
  • Per-agent branding: custom name + logo from backend config
  • Artifact renderer for structured outputs, tool call display
  • SSE event reconstruction with stable message IDs
LG

Layer 3 โ€” LangGraph StateGraph Agent Topology

01
Entry Node

User query enters the StateGraph. Session ID, bot_id, and conversation history are loaded from DynamoDB.

02
Intent Router

Bedrock Nova Lite classifies the query as casual, rag, or sql using strict JSON-schema structured output parsing.
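The router's contract can be illustrated with a small validator for the model's JSON reply; `parse_intent` and the fallback-to-rag policy are assumptions for this sketch, not the documented implementation:

```python
import json

VALID_INTENTS = {"casual", "rag", "sql"}

def parse_intent(raw_model_output: str) -> str:
    """Validate the router LLM's JSON reply; fall back to 'rag' on anything
    malformed, since a retrieval-grounded answer is the safest default."""
    try:
        payload = json.loads(raw_model_output)
    except json.JSONDecodeError:
        return "rag"
    if not isinstance(payload, dict):
        return "rag"
    intent = payload.get("intent", "")
    return intent if intent in VALID_INTENTS else "rag"
```

Treating any unexpected output as "rag" means a flaky classification never crashes the graph; it just takes the most conservative branch.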

03
Branch Executor

Conditional edge dispatches to RAG retriever node, SQL executor node, or direct casual LLM node.

04
Context Injector

Retrieved Pinecone chunks or DuckDB SQL rows are injected into the prompt as <context>...</context> XML markers.
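A sketch of that injection step; the exact prompt wording is hypothetical, only the <context> markers come from the description above:

```python
def build_prompt(question: str, chunks: list[str]) -> str:
    """Wrap retrieved material in explicit XML markers so the model can
    distinguish grounding context from the user's question."""
    context = "\n\n".join(chunks)
    return (
        f"<context>\n{context}\n</context>\n\n"
        "Answer using only the context above.\n"
        f"Question: {question}"
    )
```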

05
SSE Streamer

Bedrock Nova Pro streams tokens. FastAPI yields each chunk as SSE to the Chat UI proxy and then to the browser.
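The streamer can be sketched as an async generator emitting SSE frames; in the real service FastAPI's StreamingResponse (media_type="text/event-stream") would consume such a generator, and the helper names here are illustrative:

```python
import asyncio
import json

def format_sse(data: dict) -> str:
    """Serialize one chunk in Server-Sent Events wire format."""
    return f"data: {json.dumps(data)}\n\n"

async def sse_stream(token_iter):
    """Yield each model token as an SSE event, then a terminal marker."""
    async for token in token_iter:
        yield format_sse({"text": token})
    yield format_sse({"done": True})

# Stand-in for the Bedrock token stream, just for demonstration.
async def fake_tokens():
    for t in ["Hel", "lo"]:
        yield t

async def collect():
    return [event async for event in sse_stream(fake_tokens())]

events = asyncio.run(collect())
```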

DB

Layer 4 โ€” Persistent Data Stores

Primary Store

DynamoDB (Single Table)

  • PK: USER#{email} → agent list per user
  • PK: AGENT#{id} / SK: CONFIG → prompt, brand, chat title/logo
  • PK: AGENT#{id} / SK: SOURCE#* → data sources
  • PK: ACTIVITY#{id} / SK: ENTRY#* → full chat logs
  • PK: STATS#{id} / SK: DAY#* → daily query metrics
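The key patterns above translate directly into small helpers; these builders are illustrative of the single-table layout, not the platform's actual code:

```python
def agent_config_key(agent_id: str) -> dict:
    """Composite key for an agent's configuration item."""
    return {"PK": f"AGENT#{agent_id}", "SK": "CONFIG"}

def daily_stats_key(agent_id: str, day: str) -> dict:
    """Composite key for one day's query metrics."""
    return {"PK": f"STATS#{agent_id}", "SK": f"DAY#{day}"}

def user_partition(email: str) -> str:
    """Partition that holds every agent owned by one user; a Query on this
    PK returns the full agent list in a single request."""
    return f"USER#{email}"
```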
Semantic Engine

Pinecone Vector DB

  • Namespace-per-agent isolation (no cross-contamination)
  • Amazon Titan v2 embeddings (1536-dim)
  • Top-5 cosine similarity ANN retrieval
  • Semantic Caching: sub-50 ms cache hits that bypass the LLM
  • Sub-50 ms query latency at scale
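Conceptually, a semantic cache is a nearest-neighbor lookup with a similarity cutoff. A toy in-memory version of the idea (the 0.95 threshold is a hypothetical tuning value, and production uses Pinecone, not a Python list):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

CACHE_THRESHOLD = 0.95  # hypothetical cutoff; tune per workload

def cache_lookup(query_vec: list[float], cached: list[tuple[list[float], str]]):
    """Return the cached answer closest to the query embedding, but only if
    similarity clears the threshold; otherwise it's a miss and the LLM runs."""
    best_answer, best_score = None, -1.0
    for vec, answer in cached:
        score = cosine(query_vec, vec)
        if score > best_score:
            best_answer, best_score = answer, score
    return best_answer if best_score >= CACHE_THRESHOLD else None
```

The threshold is the whole trade-off: too low and distinct questions get each other's answers; too high and near-duplicates still pay full LLM latency.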
SQL Analytics

PostgreSQL Data Warehouse

  • Mandatory Row-Level Security (SET LOCAL app.current_tenant)
  • Text-to-SQL via Bedrock Nova structured output
  • Read-only session enforcement (blocks DROP/DELETE)
  • Cross-tenant data leakage blocked at the database layer, not in app code
  • Enterprise-scale persistence and execution
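The RLS pattern can be sketched as the statements a tenant-scoped transaction issues; `rls_session_statements`, the `documents` table, and the policy name are assumptions for illustration (a real driver would set the tenant via a parameterized `SELECT set_config('app.current_tenant', %s, true)` rather than string interpolation):

```python
# Policy created once at migration time: every query on the table is
# silently filtered to the current tenant, no WHERE clause required.
RLS_POLICY = """
CREATE POLICY tenant_isolation ON documents
  USING (tenant_id = current_setting('app.current_tenant'));
"""

def rls_session_statements(tenant_id: str) -> list[str]:
    """Statements issued at the start of every tenant-scoped transaction.
    SET LOCAL scopes the setting to this transaction only, so a pooled
    connection can never leak one tenant's identity into the next request."""
    return [
        "BEGIN",
        "SET TRANSACTION READ ONLY",                   # blocks DROP/DELETE/UPDATE
        f"SET LOCAL app.current_tenant = '{tenant_id}'",
    ]
```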
Observability

OpenTelemetry + CloudWatch

  • Structured JSON Logging (TraceID/SpanID injection)
  • AWS X-Ray Distributed Waterfall Tracing
  • CloudWatch Logs: /ecs/vegarag-* log streams
  • CloudWatch Metrics: task CPU + memory graphs
  • ALB access logs for traffic + error analysis

Complete Request Lifecycle โ€” A to Z

Every single hop from the browser to Bedrock and back, in exact order.

1

User visits vegarag.com/chat?assistantId=bot_xxx

Browser hits Route 53 → resolves to ALB → HTTPS listener on port 443

2

ALB matches /chat/* priority rule → forwards to TG-ChatUI on port 3001

Chat UI Next.js container (basePath=/chat) receives the request and serves the React app

3

React app boots, LangGraph SDK reads assistantId from URL query param

SDK fetches /chat/api/langgraph/info → Next.js proxy returns {version} confirming proxy is alive

4

SDK calls POST /chat/api/langgraph/threads/search to load thread history

Next.js proxy hits backend GET /api/agents/{id}/activity → groups by session_id → returns thread list
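Step 4's grouping can be sketched in Python (the language of the FastAPI backend); field names like `timestamp` are assumptions for this sketch:

```python
from collections import defaultdict

def group_into_threads(entries: list[dict]) -> list[dict]:
    """Collapse flat activity-log entries into one thread per session_id,
    mirroring the shape the proxy returns for the SDK's threads/search call."""
    by_session: dict[str, list[dict]] = defaultdict(list)
    for entry in entries:
        by_session[entry["session_id"]].append(entry)
    threads = []
    for session_id, msgs in by_session.items():
        msgs.sort(key=lambda m: m["timestamp"])
        threads.append({
            "thread_id": session_id,
            "messages": msgs,
            "updated_at": msgs[-1]["timestamp"],
        })
    # Newest thread first, matching a typical history sidebar.
    threads.sort(key=lambda t: t["updated_at"], reverse=True)
    return threads
```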

5

User types a message and submits the chat form

SDK fires POST /chat/api/langgraph/threads/{thread_id}/runs/stream with message payload

6

Next.js proxy receives request, extracts query text + bot_id

Forwards to backend POST /api/chat as {bot_id, session_id, query} JSON body

7

FastAPI receives request, starts LangGraph StateGraph run

Entry node loads conversation history from DynamoDB, initializes GraphState

8

Intent Router node calls Bedrock Nova Lite with structured schema

LLM returns JSON: {intent: 'rag'|'sql'|'casual'} — conditional edge dispatches to correct branch

9

RAG branch: Titan v2 embeds query → Pinecone top-5 cosine ANN search

SQL branch: Nova generates SQL → DuckDB executes in-memory → structured rows returned

10

Context injected into Nova Pro prompt via <context>...</context> XML markers

FastAPI AsyncIterator chunks Bedrock token stream → yields SSE events: data: {text: '...'}

11

Next.js proxy re-wraps backend SSE as valid LangGraph values events

Browser LangGraph SDK React hook receives events, updates message state, renders tokens progressively

12

Stream ends. SDK fires GET /chat/api/langgraph/threads/{id}/state

Proxy fetches full session from DynamoDB, reconstructs LangGraph message format with stable IDs
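One way to get stable IDs is to derive them deterministically from position in the conversation; this Python sketch illustrates the idea, though the actual proxy is a Next.js service and the helper name is hypothetical:

```python
import hashlib

def stable_message_id(session_id: str, index: int, role: str) -> str:
    """Derive a deterministic ID from session, position, and role, so
    re-fetching thread state after a stream yields the same IDs and the
    SDK's React layer never remounts already-rendered message bubbles."""
    digest = hashlib.sha256(f"{session_id}:{index}:{role}".encode()).hexdigest()
    return f"msg-{digest[:12]}"
```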

Deploy exactly how you want.

Choose zero-config managed hosting with up to 250k free tokens, or own your data completely by deploying the open-source platform into your own AWS account.

The Free SaaS Trial

Test your agents instantly without touching AWS. We provide Amazon Bedrock compute and Pinecone vector hosting free for up to 250,000 tokens per month and 100MB of storage.

Self-Host Open Source

Follow our step-by-step GitHub deployment guide to provision all of the infrastructure inside your own AWS VPC, with zero-key ECS Fargate task IAM roles throughout.

Ready to deploy?

Free forever. Self-host on AWS Fargate in under an hour.

See Pricing โ†’