Building a Production RAG Pipeline with AWS Bedrock and Pinecone
A complete guide to chunking, embedding with Amazon Titan v2, and serving sub-100ms queries from a Pinecone serverless index — including multi-tenant namespace isolation.
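The chunking step the guide covers can be sketched as a fixed-size splitter with overlap (a sketch only — the `chunk_size` and `overlap` values here are illustrative defaults, not the guide's tuned settings):

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character chunks with overlap, so content
    near a boundary appears in two chunks and survives retrieval."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must exceed overlap")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap  # step forward, keeping the overlap
    return chunks
```

Each chunk is then embedded (with Titan v2 via a Bedrock `invoke_model` call) and upserted into the Pinecone index under the tenant's namespace.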
Deep technical content on RAG pipelines, LangGraph agents, Text-to-SQL, and production AI on AWS.
How we built a single LangGraph agent that classifies user intent and routes each request to the right pipeline — direct LLM for casual chat, Pinecone for document queries, and DuckDB for CSV analytics.
Mar 22, 2025
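The routing idea in that post can be sketched with a plain-Python stand-in for the classification node (the keyword rules below are purely illustrative — the post uses an actual LLM call inside a LangGraph node to classify intent):

```python
# Hypothetical stand-in for the intent-classification node.
# In the real graph this is an LLM call; keyword rules keep the sketch
# self-contained and runnable.
def classify_intent(message: str) -> str:
    m = message.lower()
    if any(k in m for k in ("csv", "spreadsheet", "column", "average", "sum(")):
        return "duckdb_analytics"   # route to Text-to-SQL over DuckDB
    if any(k in m for k in ("document", "pdf", "according to")):
        return "pinecone_rag"       # route to vector retrieval
    return "direct_llm"             # casual chat, no retrieval

# Illustrative pipeline stubs keyed by intent label.
PIPELINES = {
    "direct_llm": lambda q: f"[chat] {q}",
    "pinecone_rag": lambda q: f"[rag] {q}",
    "duckdb_analytics": lambda q: f"[sql] {q}",
}

def route(message: str) -> str:
    return PIPELINES[classify_intent(message)](message)
```

In LangGraph terms, `classify_intent` would back a conditional edge from the entry node, with each pipeline as its own node.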
Users upload spreadsheets — the agent automatically generates SQL, executes it in-process with DuckDB, and returns results in natural language. Here's the exact implementation.
Mar 24, 2025
The exact Docker → ECR → ECS Fargate deployment flow for VegaRAG: two services (frontend on port 3000, backend on 8000), one ALB, path-based routing, and Cognito auth.
Mar 26, 2025
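The two-service layout the post walks through can be sketched as an ECS task-definition fragment (names and values here are illustrative, not VegaRAG's actual config):

```json
{
  "containerDefinitions": [
    { "name": "frontend", "portMappings": [{ "containerPort": 3000, "protocol": "tcp" }] },
    { "name": "backend",  "portMappings": [{ "containerPort": 8000, "protocol": "tcp" }] }
  ]
}
```

On the ALB side, path-based routing means a listener rule forwards a path pattern such as `/api/*` to the backend target group (port 8000) while the default rule sends everything else to the frontend (port 3000); Cognito auth can be attached at the listener with the `authenticate-cognito` rule action.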
We ran 10,000 queries through both models on identical RAG tasks. Nova Micro wins on price/latency for short-context answers. Full benchmark data included.
Mar 27, 2025
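A benchmark like that reduces to a harness around the model call; here is a minimal sketch with a stubbed call in place of the real Bedrock invocation (so it runs without AWS credentials — the percentiles and iteration count are illustrative):

```python
import statistics
import time

def bench(call, n: int = 1000) -> dict[str, float]:
    """Time n invocations of `call` and report p50/p95 latency in ms."""
    samples = []
    for _ in range(n):
        t0 = time.perf_counter()
        call()
        samples.append((time.perf_counter() - t0) * 1000.0)
    qs = statistics.quantiles(samples, n=100)  # 99 cut points
    return {"p50": qs[49], "p95": qs[94]}

# Stubbed "model call" so the sketch is self-contained.
stats = bench(lambda: sum(range(1000)), n=200)
```

Swapping the lambda for the real inference call (and running each model against the same query set) yields the per-model latency distributions a comparison like this needs.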
One index, thousands of tenants — using Pinecone namespaces as the isolation boundary. Covers cost model, query pattern, and why we chose this over separate indexes.
Mar 28, 2025
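The query pattern amounts to scoping every read and write by the tenant's namespace. A pure-Python sketch of that boundary (the sanitization rules and helper names are illustrative; in the real pipeline the returned dict would be passed to `index.query(**kwargs)` on a Pinecone index):

```python
import re

def namespace_for(tenant_id: str) -> str:
    """Derive a stable, sanitized namespace from a tenant id.
    (Sanitization rules here are illustrative, not Pinecone's.)"""
    ns = re.sub(r"[^a-z0-9_-]", "-", tenant_id.strip().lower())
    if not ns:
        raise ValueError("empty tenant id")
    return f"tenant-{ns}"

def build_query(tenant_id: str, vector: list[float], top_k: int = 5) -> dict:
    """Build kwargs for index.query(); the namespace is the isolation
    boundary, so one tenant can never match another tenant's vectors."""
    return {
        "vector": vector,
        "top_k": top_k,
        "namespace": namespace_for(tenant_id),
        "include_metadata": True,
    }
```

Because every upsert and query carries the same derived namespace, isolation holds without per-tenant indexes — which is what keeps the cost model flat as tenant count grows.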