Toyota Agentic RAG AI Assistant

View the source code on GitHub | Watch the Demo

Overview

This project is an agentic Retrieval-Augmented Generation (RAG) AI assistant designed and deployed for a Toyota web application. The assistant is capable of answering vehicle-specific support questions in real time; understanding the intent behind a query, determining whether it has enough information to answer, and intelligently asking follow-up questions when needed before retrieving a semantically accurate response.

The system was built using a LangGraph multi-turn pipeline, OpenAI's GPT-4o mini for language understanding and response generation, and MongoDB Atlas Vector Search for semantic retrieval from a knowledge base of Toyota support documents stored as vector embeddings.

How It Works

The assistant follows a structured, multi-step pipeline from the moment a user submits a query to the moment a response is delivered back to the frontend. Each step is handled by a dedicated node in the LangGraph agent graph.

LangGraph pipeline diagram showing the order of steps

Step 1: User Submits a Query

The pipeline begins when a user types a question into the Toyota web application's support interface. This query is passed into the LangGraph agent as the initial input to the graph.

Step 2: Follow-Up Question Check

Before retrieving any information, the LangGraph agent evaluates whether the query contains enough context to generate a useful answer.

Example — No follow-up needed:
"Does my 2008 Sienna have wireless CarPlay?"
This query includes both the model year (2008) and the vehicle model (Sienna), so the agent has all the information it needs to proceed directly to retrieval.

Example — Follow-up required:
"Does my car have wireless CarPlay?"
This query is missing the vehicle model and year. The agent detects the ambiguity and asks the user: "What is the model of your vehicle?" before continuing. This back-and-forth continues until all necessary context has been collected.

Once all required information is in hand, the agent proceeds to the next step.

Step 3: Query Embedding

With a fully-resolved query, the agent passes the question into an embeddings model — included in the project repository. This model converts the natural language query into a high-dimensional numeric vector (an embedding) that captures the semantic meaning of the question.

This vector representation is what allows the system to search for meaning, not just matching keywords. Two questions that are worded differently but mean the same thing will produce similar vectors, enabling accurate retrieval even when phrasing varies.

Step 4: MongoDB Atlas Vector Search

The query vector is sent to MongoDB Atlas Vector Search, which searches the Toyota support knowledge base for the most semantically similar documents. This database contains answers to a wide range of vehicle support questions — all pre-processed and stored as vector embeddings at indexing time.

The search returns the most relevant document(s) from the knowledge base, representing the best-matched answer to the user's query.

Step 5: Answer Generation (GPT-4o mini)

The original user query and the retrieved document(s) from the vector search are passed together into OpenAI's GPT-4o mini model as a prompt. GPT-4o mini acts as the answer generation layer. It takes the raw retrieved content and synthesizes it into a clear, natural-language response tailored to what the user asked. This specific model was chosen for its balance of strong language understanding capabilities and cost-effectiveness, making it ideal for real-time applications like this one.

This step is what distinguishes a RAG system from a plain database lookup: rather than returning a raw document, the LLM understands the context and formats the answer in a way that directly addresses the user's question.

Step 6: Response Delivered to the User

The generated response is returned through the pipeline and displayed on the Toyota web application's frontend, completing the interaction. The entire process happens in real time. From query submission to response delivery, the system is designed to operate with minimal latency, providing users with quick and accurate answers to their support questions.

Tech Stack

LangGraph — multi-turn agentic pipeline; each stage of the workflow (follow-up check, embedding, retrieval, generation) is a node in the graph
OpenAI GPT-4o mini — answer generation LLM that converts the retrieved document and user query into a natural-language response
OpenAI Embeddings Model — converts queries into vector representations for semantic search
MongoDB Atlas Vector Search — stores the Toyota knowledge base as vectors and performs semantic similarity search at query time

Key Highlights

Context-Aware Follow-Up Logic:
The agent does not blindly retrieve on every query. It first reasons about whether the query is complete enough to answer, and only proceeds once it has gathered sufficient context. This prevents incorrect or irrelevant responses caused by underspecified questions.

Semantic Retrieval Over Keyword Search:
By converting queries to embeddings before searching, the system understands the meaning of a question rather than looking for exact word matches. This makes it significantly more robust to phrasing variations and natural language diversity.