Agentic RAG

7 min read · Jun 9, 2024

What is Agentic RAG?

Agentic Retrieval-Augmented Generation (RAG) is an advanced framework designed to handle complex information retrieval tasks using a network of intelligent agents. These agents collaborate to perform nuanced tasks such as synthesizing information from multiple documents, summarizing content, and comparing data points across various sources. Agentic RAG infuses autonomy and intelligence into traditional retrieval systems, enabling them to act not merely as passive tools but as proactive entities that understand context, evaluate data quality, and make informed decisions.

Core Components and Their Functionalities

Agentic RAG is built around several key components that work in harmony to provide a seamless and efficient user experience:

Context-Aware Agents: These agents understand the broader conversational context, making interactions more coherent and responses more relevant.
Intelligent Retrieval Strategies: Unlike static retrieval rules, these strategies dynamically adapt to the query and contextual cues, ensuring the most relevant information is retrieved.
Multi-Agent Orchestration: Complex queries are handled by coordinating multiple specialized agents, each an expert in their respective domain, providing a holistic response.
Advanced Reasoning Capabilities: Agents in Agentic RAG are equipped with capabilities to evaluate, correct, and perform quality checks on the retrieved data.
Post-Generation Verification: This feature ensures the reliability of the information provided by verifying the generated content and selecting the best outputs.
Learning and Adaptability: Agentic RAG systems incorporate learning mechanisms that allow them to improve and adapt their performance over time.
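
To make post-generation verification a little more concrete, here is a minimal sketch in plain Python. It is not how any particular framework does it: the scoring rule (the fraction of answer words that also appear in the retrieved context) is a deliberately crude stand-in for an LLM-based grader, and the names support_score and select_best are hypothetical.

```python
import re


def tokens(text: str) -> set:
    """Lowercase word tokens; a crude stand-in for real text normalization."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def support_score(answer: str, context: str) -> float:
    """Fraction of the answer's distinct words that also occur in the context."""
    answer_tokens = tokens(answer)
    if not answer_tokens:
        return 0.0
    return len(answer_tokens & tokens(context)) / len(answer_tokens)


def select_best(candidates: list, context: str, threshold: float = 0.5):
    """Return the best-supported candidate, or None if none clears the threshold."""
    if not candidates:
        return None
    scored = [(support_score(c, context), c) for c in candidates]
    best_score, best = max(scored, key=lambda pair: pair[0])
    return best if best_score >= threshold else None


context = "Exporters must file a shipping bill with customs before goods leave the port."
candidates = [
    "Exporters file a shipping bill with customs.",
    "Exports are tax free for ten years.",
]
best = select_best(candidates, context)
print(best)
```

In a real system the scoring step would more likely be another model call, but the shape stays the same: produce several candidates, score each against the evidence, and keep only an answer that passes.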

Implementation Example with LangGraph and LanceDB

To better understand the practical application of Agentic RAG, let’s explore a scenario that uses LangGraph for orchestration alongside LanceDB for data management.

LangGraph serves as the backbone of our agent orchestration, coordinating the interactions between agents, each responsible for a specific aspect of the retrieval process. LanceDB, on the other hand, manages the storage and retrieval of document segments, so that agents can query the data they need at run time.

Let’s take a look at the data. We will use export-import-related data for this use case:

https://content.dgft.gov.in/Website/CIEP.pdf
https://content.dgft.gov.in/Website/GAE.pdf
https://content.dgft.gov.in/Website/HTE.pdf

LanceDB as Retriever

In our first step, we use LanceDB as a vector database to serve as a retriever. LanceDB stores the embedded document chunks and answers similarity searches over them.

We will begin by indexing the content of three PDF documents.

from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import LanceDB
from langchain_community.document_loaders import PyPDFLoader  # requires the pypdf package
from langchain_text_splitters import RecursiveCharacterTextSplitter

# The three DGFT export-import PDFs listed above
urls = [
    "https://content.dgft.gov.in/Website/CIEP.pdf",
    "https://content.dgft.gov.in/Website/GAE.pdf",
    "https://content.dgft.gov.in/Website/HTE.pdf",
]

# PyPDFLoader returns one Document per page; flatten the per-file lists
docs = [PyPDFLoader(url).load() for url in urls]
docs_list = [item for sublist in docs for item in sublist]

text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
    chunk_size=500, chunk_overlap=100
)
doc_splits = text_splitter.split_documents(docs_list)

# Add to lancedb 
vectorstore = LanceDB.from_documents(
    documents=doc_splits,
    embedding=OpenAIEmbeddings(),
)
retriever = vectorstore.as_retriever()
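
The chunk_size and chunk_overlap settings above are easier to picture with a small example. The sketch below is a simplified sliding window over a list of words. It is only an illustration: the real RecursiveCharacterTextSplitter measures length in tokens (via tiktoken here) and splits recursively on separators, and the helper name chunk_with_overlap is hypothetical.

```python
def chunk_with_overlap(items: list, chunk_size: int, chunk_overlap: int) -> list:
    """Split items into windows of chunk_size that share chunk_overlap items."""
    if chunk_size <= 0 or not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(items), step):
        chunks.append(items[start:start + chunk_size])
        if start + chunk_size >= len(items):
            break  # this window already reached the end of the input
    return chunks


words = [f"w{i}" for i in range(12)]
chunks = chunk_with_overlap(words, chunk_size=5, chunk_overlap=2)
for chunk in chunks:
    print(chunk)
```

Each chunk repeats the last two words of the previous one, so a sentence that straddles a boundary still appears whole in at least one chunk. That is the purpose of the overlap of 100 on chunks of 500 in the indexing code.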

Next, create a retriever tool.

Given our specific use case related to export-import data, the retriever tool needs a description that matches the content. We are working with three different PDFs, each covering a different topic. The tool description below gives a brief overview of what they contain:

"Search and return information about customs import export procedure, GST & EXPORTS, How to export"

The agent reads this description to decide when a user’s question calls for the retriever, which then finds and returns the most relevant passages from the PDFs.
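
To illustrate how descriptions can steer tool choice, here is a toy sketch with one tool per topic. Everything in it is an assumption made for illustration: the ToolSpec class, the three tool names, and the keyword-overlap rule are hypothetical, and in the actual agent the LLM makes this choice from the descriptions rather than by counting shared words.

```python
import re
from dataclasses import dataclass


@dataclass
class ToolSpec:
    name: str         # hypothetical tool name
    description: str  # what the tool's documents cover


def words(text: str) -> set:
    return set(re.findall(r"[a-z]+", text.lower()))


def pick_tool(question: str, tools: list):
    """Pick the tool whose description shares the most words with the question."""
    question_words = words(question)

    def overlap(tool: ToolSpec) -> int:
        return len(question_words & words(tool.description))

    best = max(tools, key=overlap)
    return best if overlap(best) > 0 else None  # None: no tool looks relevant


tools = [
    ToolSpec("customs_procedures", "customs import export procedure"),
    ToolSpec("gst_and_exports", "GST & EXPORTS"),
    ToolSpec("how_to_export", "How to export"),
]

chosen = pick_tool("What is the GST refund on exports?", tools)
print(chosen.name if chosen else "no tool")
```

The article's implementation keeps things simpler and registers a single retriever tool over all three PDFs, as shown next.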



from langchain.tools.retriever import create_retriever_tool

retriever_tool = create_retriever_tool(
    retriever,
    "retrieve_export_import_docs",
    "Search and return information about customs import export procedure, GST & EXPORTS, How to export",
)

# The graph below runs this tool through a ToolNode
tools = [retriever_tool]

Now, let’s implement it with LangGraph. For more details, please refer to the LangGraph documentation.

In this graph system, each node represents a decision or a processing step, and each edge represents the possible paths that the agent can take based on conditions evaluated at each node.

Agent State

The AgentState is essentially a dynamically updated record of the process. As the agent progresses through the graph:

State: The state is represented as a list of messages. These messages can contain any data that nodes might need to process or that might be generated as a result of processing.
Updating State: Nodes in the graph modify the state by appending messages to it. This is an accumulative process where the state grows as the agent moves from one node to another.

from typing import Annotated, Sequence, TypedDict

from langchain_core.messages import BaseMessage
from langgraph.graph.message import add_messages


class AgentState(TypedDict):
    # The add_messages function defines how an update should be processed
    # Default is to replace. add_messages says "append"
    messages: Annotated[Sequence[BaseMessage], add_messages]
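
The role of the Annotated reducer can be sketched without LangGraph. The example below is a simplified model, not LangGraph's implementation: it uses operator.add as the reducer and a hypothetical apply_update helper to show the difference between "append" and "replace". As far as I know, the real add_messages does more than this (for example, it replaces a message that has the same ID instead of appending a duplicate).

```python
import operator
from typing import Annotated, TypedDict, get_type_hints


class ToyState(TypedDict):
    messages: Annotated[list, operator.add]  # reducer: concatenate old and new lists
    attempts: int                            # no reducer: new value replaces the old


def apply_update(state: dict, update: dict) -> dict:
    """Merge a node's partial update into the state, honoring any reducer."""
    hints = get_type_hints(ToyState, include_extras=True)
    new_state = dict(state)
    for key, value in update.items():
        metadata = getattr(hints[key], "__metadata__", ())
        if metadata and key in state:
            reducer = metadata[0]
            new_state[key] = reducer(state[key], value)  # append-style merge
        else:
            new_state[key] = value                       # default: overwrite
    return new_state


state = {"messages": ["user: How do I export?"], "attempts": 0}
state = apply_update(state, {"messages": ["agent: calling retriever"], "attempts": 1})
print(state)
```

This is why each node can return only {"messages": [response]}: the reducer takes care of adding the new message to the history that is already there.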

Nodes and Edges in the Graph

Nodes: Each node in the graph processes the current state or adds new information to it. When a node processes the state, it appends new messages which can be simple text messages, data objects, or anything that fits within the defined structure of a message.
Edges: Edges are conditional; they determine the path the agent takes based on the current state. For example, if a message indicates a certain condition is met, the agent may take one path; otherwise, it takes another.
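
Before the LangGraph version, the node-and-edge mechanics described above can be sketched in plain Python. This is a toy model under stated assumptions: the node functions, the keyword check in grade, and the run loop are all hypothetical stand-ins, with strings in place of message objects and no LLM involved.

```python
from typing import Dict, List

END = "__end__"
State = Dict[str, List[str]]


def retrieve(state: State) -> State:
    # Toy retriever: the first attempt misses, the second one hits.
    attempt = sum(m.startswith("doc:") for m in state["messages"])
    doc = "doc: unrelated text" if attempt == 0 else "doc: export procedure steps"
    return {"messages": [doc]}


def rewrite(state: State) -> State:
    return {"messages": ["question: what is the export procedure?"]}


def generate(state: State) -> State:
    return {"messages": ["answer: based on " + state["messages"][-1]]}


def grade(state: State) -> str:
    # Conditional edge: inspect the latest message and choose the next node.
    return "generate" if "export" in state["messages"][-1] else "rewrite"


NODES = {"retrieve": retrieve, "rewrite": rewrite, "generate": generate}
# An edge is either a fixed node name or a function of the state.
EDGES = {"retrieve": grade, "rewrite": "retrieve", "generate": END}


def run(entry: str, state: State, max_steps: int = 10):
    current, path = entry, []
    for _ in range(max_steps):  # guard against endless rewrite loops
        if current == END:
            break
        update = NODES[current](state)
        state = {"messages": state["messages"] + update["messages"]}  # append
        path.append(current)
        edge = EDGES[current]
        current = edge(state) if callable(edge) else edge
    return state, path


final_state, path = run("retrieve", {"messages": ["question: how to ship goods abroad?"]})
print(path)
print(final_state["messages"][-1])
```

The loop mirrors the flow built later in the article: a failed relevance check routes to rewrite and back to retrieval, and a successful one routes to generate and then to the end.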

from typing import Annotated, Literal, Sequence, TypedDict

from langchain import hub
from langchain_core.messages import BaseMessage, HumanMessage
from langchain_core.pydantic_v1 import BaseModel, Field
from langchain_openai import ChatOpenAI
from langgraph.prebuilt import tools_condition
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import PromptTemplate

### Edges


def grade_documents(state) -> Literal["generate", "rewrite"]:
    """
    Determines whether the retrieved documents are relevant to the question.

    Args:
        state (messages): The current state

    Returns:
        str: A decision for whether the documents are relevant or not
    """

    print("---CHECK RELEVANCE---")

    # Data model
    class grade(BaseModel):
        """Binary score for relevance check."""

        binary_score: str = Field(description="Relevance score 'yes' or 'no'")

    # LLM
    model = ChatOpenAI(temperature=0, model="gpt-4-0125-preview", streaming=True)

    # LLM with tool and validation
    llm_with_tool = model.with_structured_output(grade)

    # Prompt
    prompt = PromptTemplate(
        template="""You are a grader assessing relevance of a retrieved document to a user question. \n 
        Here is the retrieved document: \n\n {context} \n\n
        Here is the user question: {question} \n
        If the document contains keyword(s) or semantic meaning related to the user question, grade it as relevant. \n
        Give a binary score 'yes' or 'no' score to indicate whether the document is relevant to the question.""",
        input_variables=["context", "question"],
    )

    # Chain
    chain = prompt | llm_with_tool

    messages = state["messages"]
    last_message = messages[-1]

    question = messages[0].content
    docs = last_message.content

    scored_result = chain.invoke({"question": question, "context": docs})

    score = scored_result.binary_score

    if score == "yes":
        print("---DECISION: DOCS RELEVANT---")
        return "generate"

    else:
        print("---DECISION: DOCS NOT RELEVANT---")
        print(score)
        return "rewrite"


### Nodes


def agent(state):
    """
    Invokes the agent model to generate a response based on the current state. Given
    the question, it will decide to retrieve using the retriever tool, or simply end.

    Args:
        state (messages): The current state

    Returns:
        dict: The updated state with the agent response appended to messages
    """
    print("---CALL AGENT---")
    messages = state["messages"]
    model = ChatOpenAI(temperature=0, streaming=True, model="gpt-4-turbo")
    model = model.bind_tools(tools)
    response = model.invoke(messages)
    # We return a list, because this will get added to the existing list
    return {"messages": [response]}


def rewrite(state):
    """
    Transform the query to produce a better question.

    Args:
        state (messages): The current state

    Returns:
        dict: The updated state with re-phrased question
    """

    print("---TRANSFORM QUERY---")
    messages = state["messages"]
    question = messages[0].content

    msg = [
        HumanMessage(
            content=f""" \n 
    Look at the input and try to reason about the underlying semantic intent / meaning. \n 
    Here is the initial question:
    \n ------- \n
    {question} 
    \n ------- \n
    Formulate an improved question: """,
        )
    ]

    # LLM that rewrites the question
    model = ChatOpenAI(temperature=0, model="gpt-4-0125-preview", streaming=True)
    response = model.invoke(msg)
    return {"messages": [response]}


def generate(state):
    """
    Generate answer

    Args:
        state (messages): The current state

    Returns:
         dict: The updated state with the generated answer appended to messages
    """
    print("---GENERATE---")
    messages = state["messages"]
    question = messages[0].content
    # The last message is the retriever tool's output (the retrieved text)
    docs = messages[-1].content

    # Prompt
    prompt = hub.pull("rlm/rag-prompt")

    # LLM
    llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0, streaming=True)

    # Chain
    rag_chain = prompt | llm | StrOutputParser()

    # Run
    response = rag_chain.invoke({"context": docs, "question": question})
    return {"messages": [response]}


print("*" * 20 + "prompt[rlm/rag-prompt]" + "*" * 20)
hub.pull("rlm/rag-prompt").pretty_print()  # Show what the prompt looks like

Graph

To begin with, let’s start with the agent node. The agent decides whether to call a function. If it does, the retrieve node calls the tool (the retriever) and adds its output to the messages (state). The retrieved documents are then graded: if they are relevant, the generate node produces the answer; otherwise, the rewrite node rephrases the question and hands it back to the agent.

Below is a detailed breakdown of the workflow implementation:

from langgraph.graph import END, StateGraph
from langgraph.prebuilt import ToolNode

# Define a new graph
workflow = StateGraph(AgentState)

# Define the nodes we will cycle between
workflow.add_node("agent", agent)  # agent
retrieve = ToolNode([retriever_tool])
workflow.add_node("retrieve", retrieve)  # retrieval
workflow.add_node("rewrite", rewrite)  # Re-writing the question
workflow.add_node(
    "generate", generate
)  # Generating a response after we know the documents are relevant
# Call agent node to decide to retrieve or not
workflow.set_entry_point("agent")

# Decide whether to retrieve
workflow.add_conditional_edges(
    "agent",
    # Assess agent decision
    tools_condition,
    {
        # Translate the condition outputs to nodes in our graph
        "tools": "retrieve",
        END: END,
    },
)

# Edges taken after the `retrieve` node is called.
workflow.add_conditional_edges(
    "retrieve",
    # Assess agent decision
    grade_documents,
)
workflow.add_edge("generate", END)
workflow.add_edge("rewrite", "agent")

# Compile
graph = workflow.compile()

import pprint

inputs = {
    "messages": [
        ("user", "explain me in short what is PM Gati Shakti National Master Plan (NMP)?"),
    ]
}
for output in graph.stream(inputs):
    for key, value in output.items():
        pprint.pprint(f"Output from node '{key}':")
        pprint.pprint("---")
        pprint.pprint(value, indent=2, width=80, depth=None)
    pprint.pprint("\n---\n")
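
If only the final answer matters, the streamed events can be reduced to the last message. The helper below is a small sketch, assuming each event is a dict that maps a node name to an update with a "messages" list, as in the loop above. The name final_answer and the fake events are hypothetical; the fake events stand in for a real graph.stream(inputs) call so the snippet runs on its own.

```python
def final_answer(events):
    """Return the content of the last message seen across all streamed events."""
    answer = None
    for event in events:
        for node_name, update in event.items():
            for message in update.get("messages", []):
                # Message objects expose .content; plain strings are used as-is.
                answer = getattr(message, "content", message)
    return answer


# Fake events in the same shape as the streamed output above.
fake_events = [
    {"agent": {"messages": ["(tool call)"]}},
    {"retrieve": {"messages": ["retrieved text"]}},
    {"generate": {"messages": ["PM Gati Shakti is ..."]}},
]
answer = final_answer(fake_events)
print(answer)
```

With the real graph, the call would be final_answer(graph.stream(inputs)).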

You can try running this code on Google Colab, where I have added support for Gradio to make it easier for you to use. Gradio provides a simple way to create UIs for your machine-learning models, making it more interactive and user-friendly.