Building a Production RAG Pipeline with AWS Bedrock and Pinecone
A complete guide to chunking, embedding with Amazon Titan v2, and serving sub-100ms queries from a Pinecone serverless index — including multi-tenant namespace isolation.
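The chunking step the guide covers can be sketched as a fixed-size splitter with overlap (a sketch only — the `chunk_size` and `overlap` values here are illustrative defaults, not the guide's tuned settings):

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character chunks with overlap, so content
    near a boundary appears in two chunks and survives retrieval."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must exceed overlap")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap  # step forward, keeping the overlap
    return chunks
```

Each chunk is then embedded (with Titan v2 via a Bedrock `invoke_model` call) and upserted into the Pinecone index under the tenant's namespace.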
Deep technical content on RAG pipelines, LangGraph agents, Text-to-SQL, and production AI on AWS.
How we built a single LangGraph agent that classifies user intent and routes each request to the right pipeline — direct LLM for casual chat, Pinecone for document queries, and DuckDB for CSV analytics.
Mar 22, 2025
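The routing idea in that post can be sketched with a plain-Python stand-in for the classification node (the keyword rules below are purely illustrative — the post uses an actual LLM call inside a LangGraph node to classify intent):

```python
# Hypothetical stand-in for the intent-classification node.
# In the real graph this is an LLM call; keyword rules keep the sketch
# self-contained and runnable.
def classify_intent(message: str) -> str:
    m = message.lower()
    if any(k in m for k in ("csv", "spreadsheet", "column", "average", "sum(")):
        return "duckdb_analytics"   # route to Text-to-SQL over DuckDB
    if any(k in m for k in ("document", "pdf", "according to")):
        return "pinecone_rag"       # route to vector retrieval
    return "direct_llm"             # casual chat, no retrieval

# Illustrative pipeline stubs keyed by intent label.
PIPELINES = {
    "direct_llm": lambda q: f"[chat] {q}",
    "pinecone_rag": lambda q: f"[rag] {q}",
    "duckdb_analytics": lambda q: f"[sql] {q}",
}

def route(message: str) -> str:
    return PIPELINES[classify_intent(message)](message)
```

In LangGraph terms, `classify_intent` would back a conditional edge from the entry node, with each pipeline as its own node.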
Users upload spreadsheets — the agent automatically generates SQL, executes it in-process with DuckDB, and returns results in natural language. Here's the exact implementation.
Mar 24, 2025
The exact Docker → ECR → ECS Fargate deployment flow for VegaRAG: two services (frontend on port 3000, backend on 8000), one ALB, path-based routing, and Cognito auth.
Mar 26, 2025
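The two-service layout the post walks through can be sketched as an ECS task-definition fragment (names and values here are illustrative, not VegaRAG's actual config):

```json
{
  "containerDefinitions": [
    { "name": "frontend", "portMappings": [{ "containerPort": 3000, "protocol": "tcp" }] },
    { "name": "backend",  "portMappings": [{ "containerPort": 8000, "protocol": "tcp" }] }
  ]
}
```

On the ALB side, path-based routing means a listener rule forwards a path pattern such as `/api/*` to the backend target group (port 8000) while the default rule sends everything else to the frontend (port 3000); Cognito auth can be attached at the listener with the `authenticate-cognito` rule action.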
We ran 10,000 queries through both models on identical RAG tasks. Nova Micro wins on price/latency for short-context answers. Full benchmark data included.
Mar 27, 2025
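A benchmark like that reduces to a harness around the model call; here is a minimal sketch with a stubbed call in place of the real Bedrock invocation (so it runs without AWS credentials — the percentiles and iteration count are illustrative):

```python
import statistics
import time

def bench(call, n: int = 1000) -> dict[str, float]:
    """Time n invocations of `call` and report p50/p95 latency in ms."""
    samples = []
    for _ in range(n):
        t0 = time.perf_counter()
        call()
        samples.append((time.perf_counter() - t0) * 1000.0)
    qs = statistics.quantiles(samples, n=100)  # 99 cut points
    return {"p50": qs[49], "p95": qs[94]}

# Stubbed "model call" so the sketch is self-contained.
stats = bench(lambda: sum(range(1000)), n=200)
```

Swapping the lambda for the real inference call (and running each model against the same query set) yields the per-model latency distributions a comparison like this needs.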
One index, thousands of tenants — using Pinecone namespaces as the isolation boundary. Covers cost model, query pattern, and why we chose this over separate indexes.
Mar 28, 2025
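The query pattern amounts to scoping every read and write by the tenant's namespace. A pure-Python sketch of that boundary (the sanitization rules and helper names are illustrative; in the real pipeline the returned dict would be passed to `index.query(**kwargs)` on a Pinecone index):

```python
import re

def namespace_for(tenant_id: str) -> str:
    """Derive a stable, sanitized namespace from a tenant id.
    (Sanitization rules here are illustrative, not Pinecone's.)"""
    ns = re.sub(r"[^a-z0-9_-]", "-", tenant_id.strip().lower())
    if not ns:
        raise ValueError("empty tenant id")
    return f"tenant-{ns}"

def build_query(tenant_id: str, vector: list[float], top_k: int = 5) -> dict:
    """Build kwargs for index.query(); the namespace is the isolation
    boundary, so one tenant can never match another tenant's vectors."""
    return {
        "vector": vector,
        "top_k": top_k,
        "namespace": namespace_for(tenant_id),
        "include_metadata": True,
    }
```

Because every upsert and query carries the same derived namespace, isolation holds without per-tenant indexes — which is what keeps the cost model flat as tenant count grows.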