AI Architecture · 15 min read · March 22, 2025

LangGraph Multi-Agent Routing: Casual Chat, RAG, and SQL in One Graph

How we built a single LangGraph that classifies user intent and routes to the right pipeline — direct LLM for casual chat, Pinecone for document queries, and DuckDB for CSV analytics.


Moving Beyond Simple Prompting

Most AI agents fail because they blindly inject vector-store context into every prompt, even when the user is making small talk or asking a numerical question about tabular data. We quickly realized VegaRAG needed a deterministic routing mechanism, and LangGraph provided it.

The Architectural Switch to LangGraph

LangGraph turns an LLM application from a linear chain into a cyclic state machine. Our state definition carries all the data the nodes need between transitions:

from typing import TypedDict

class AgentState(TypedDict):
    bot_id: str       # which bot configuration to serve
    query: str        # the raw user question
    intent: str       # set by the router: "casual", "rag", or "sql"
    context: str      # text chunks retrieved from Pinecone (RAG path)
    sql_result: str   # query output from DuckDB (SQL path)
The Classification Step (Router Node)

Every incoming query from our FastAPI chat endpoint goes through an LLM router powered by Amazon Nova Micro. A strict system prompt forces the model to output exactly one word: casual, rag, or sql.

  • Casual: Small talk, greetings, general knowledge. Skips retrieval entirely.
  • RAG: Semantic knowledge retrieval. Hits Pinecone and returns text chunks.
  • SQL: Triggers our DuckDB Text-to-SQL pipeline for analyzing tabular files uploaded by the user.
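Even with a strict one-word prompt, small models occasionally add punctuation, capitalization, or a stray sentence, so the raw output should be normalized and given a safe fallback. A sketch of that defensive parsing, where `call_llm` is a hypothetical callable standing in for the Bedrock invocation and the prompt text is our assumption, not VegaRAG's exact wording:

```python
VALID_INTENTS = {"casual", "rag", "sql"}

ROUTER_SYSTEM_PROMPT = (
    "Classify the user message. Respond with exactly one word: "
    "casual, rag, or sql. No punctuation, no explanation."
)

def classify_intent(query: str, call_llm) -> str:
    """Return one of the three intents, falling back to 'rag' if the model strays.

    call_llm(system_prompt, user_message) -> str is a stand-in for the
    actual Amazon Nova Micro call via Bedrock.
    """
    raw = call_llm(ROUTER_SYSTEM_PROMPT, query)
    label = raw.strip().lower().rstrip(".!")
    return label if label in VALID_INTENTS else "rag"
```

Falling back to rag on an unparseable label is a judgment call: retrieving context unnecessarily is cheaper than answering a document question without it.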

Streaming and the 'Retrieve-Only' Pattern

In our architecture, the LangGraph only routes and retrieves. It does not generate the final answer! Why? Because generating the answer synchronously inside the graph blocks Server-Sent Events (SSE) streaming to the client. Instead, the graph returns the intent, context, and sql_result to the FastAPI endpoint, which constructs the final prompt and streams the Bedrock Nova response token-by-token to the frontend.
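The endpoint-side half of this pattern can be sketched in two small pieces: assembling the final prompt from whatever the graph returned, and wrapping the model's token stream in SSE frames. The prompt templates and the `[DONE]` sentinel are our assumptions for illustration, not VegaRAG's exact strings; in FastAPI the generator would typically be handed to a `StreamingResponse` with `media_type="text/event-stream"`.

```python
def build_final_prompt(intent: str, query: str,
                       context: str = "", sql_result: str = "") -> str:
    # Assemble the generation prompt from whatever the graph retrieved.
    if intent == "rag":
        return f"Answer using only this context:\n{context}\n\nQuestion: {query}"
    if intent == "sql":
        return f"Explain this query result to the user:\n{sql_result}\n\nQuestion: {query}"
    return query  # casual: no retrieval, pass the query straight through

def sse_frames(token_stream):
    # Wrap each model token in a Server-Sent Events data frame.
    for token in token_stream:
        yield f"data: {token}\n\n"
    yield "data: [DONE]\n\n"  # sentinel so the frontend knows the stream ended
```

Because the graph has already finished by the time generation starts, nothing upstream blocks the event loop while tokens trickle out.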

Build exactly what you just read.

VegaRAG is entirely open-source and ready for production on AWS.