Moving Beyond Simple Prompting
Most AI agents fail because they blindly inject vector context into the prompt, even when the user asks a conversational question or makes a numerical data request. We quickly realized VegaRAG needed a deterministic routing mechanism: LangGraph.
The Architectural Switch to LangGraph
LangGraph turns an LLM application from a linear chain into a cyclic state machine. Our state definition carries all the requested data across node transitions:
```python
from typing import TypedDict

class AgentState(TypedDict):
    bot_id: str       # which bot/tenant the query belongs to
    query: str        # the raw user question
    intent: str       # router output: casual, rag, or sql
    context: str      # retrieved text chunks (RAG path)
    sql_result: str   # tabular query result (SQL path)
```

The Classification Step (Router Node)
Every incoming query from our FastAPI chat endpoint passes through an LLM router powered by Amazon Nova Micro. Using a strict system prompt, we force the model to output exactly one word: casual, rag, or sql.
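Even with a strict one-word contract, models occasionally wrap the label in punctuation or casing, so the router node should normalize defensively. A minimal sketch, where `call_router_llm` is a hypothetical stand-in for the actual Nova Micro call:

```python
# Router node sketch: classify a query as casual, rag, or sql.
VALID_INTENTS = {"casual", "rag", "sql"}

def call_router_llm(query: str) -> str:
    # Hypothetical stand-in for the Bedrock Nova Micro call made with
    # the strict one-word system prompt; a real implementation would
    # invoke the model here.
    return "rag"

def normalize_intent(raw: str) -> str:
    """Coerce the model's reply to one of the three labels, defaulting to 'rag'."""
    label = raw.strip().strip(".\"'`").lower()
    return label if label in VALID_INTENTS else "rag"

def router_node(state: dict) -> dict:
    # Returns only the key this node updates, as LangGraph nodes do.
    return {"intent": normalize_intent(call_router_llm(state["query"]))}
```

Defaulting unrecognized output to the RAG path is a judgment call; it makes a misbehaving router degrade to "retrieve and answer" rather than to an error.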
- Casual: Small talk, greetings, general knowledge. Skips retrieval entirely.
- RAG: Semantic knowledge retrieval. Hits Pinecone and returns text chunks.
- SQL: Triggers our DuckDB Text-to-SQL pipeline for analyzing tabular files uploaded by the user.
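The three-way split amounts to a lookup on the router's output; LangGraph's conditional edges perform essentially this dispatch. A library-free sketch with hypothetical placeholder nodes standing in for the real retrieval and SQL pipelines:

```python
# Library-free sketch of the conditional routing the graph performs:
# the router's intent selects exactly one downstream node.

def casual_node(state: dict) -> dict:
    return state  # small talk: skip retrieval entirely

def rag_node(state: dict) -> dict:
    # Placeholder for the Pinecone query; real code fetches text chunks here.
    return {**state, "context": "<retrieved chunks>"}

def sql_node(state: dict) -> dict:
    # Placeholder for the DuckDB Text-to-SQL pipeline.
    return {**state, "sql_result": "<query result>"}

EDGES = {"casual": casual_node, "rag": rag_node, "sql": sql_node}

def run_after_router(state: dict) -> dict:
    """Dispatch on the intent set by the router node."""
    return EDGES[state["intent"]](state)
```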
Streaming and the 'Retrieve-Only' Pattern
In our architecture, the graph only routes and retrieves. It does not generate the final answer! Why? Because generating the answer synchronously inside the graph would block Server-Sent Events (SSE) streaming to the client. Instead, the graph returns the intent, context, and sql_result to the FastAPI endpoint, which constructs the final prompt and streams the Bedrock Nova response byte-by-byte to the frontend.
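The endpoint-side half of this pattern can be sketched as two small pieces: folding the graph output into a prompt, and framing model chunks as SSE events. The helper names and the prompt layout below are illustrative, not our exact implementation:

```python
# Endpoint-side sketch of the retrieve-only pattern: the graph's output
# becomes a prompt, and the model's streamed chunks are framed as SSE
# "data:" events for the client.

def build_prompt(graph_out: dict) -> str:
    """Fold whichever fields the graph populated into the final prompt."""
    parts = [graph_out["query"]]
    if graph_out.get("context"):
        parts.append(f"Context:\n{graph_out['context']}")
    if graph_out.get("sql_result"):
        parts.append(f"SQL result:\n{graph_out['sql_result']}")
    return "\n\n".join(parts)

def sse_frames(chunks):
    """Wrap each model chunk in SSE framing; a FastAPI StreamingResponse
    (media_type='text/event-stream') would yield these to the browser."""
    for chunk in chunks:
        yield f"data: {chunk}\n\n"
    yield "data: [DONE]\n\n"
```

Keeping generation outside the graph means the graph call returns quickly with structured state, and only the final model call holds the long-lived streaming connection.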