Part 2: From RAG to an agent

In Part 1 we built a classic RAG pipeline and saw where it breaks: typos derail retrieval, chunking loses context, and the LLM has no way to go back for more information. Here we hand control to the LLM by turning search into a tool and letting the model decide when and how to use it.

The agentic flow

In an agent, the LLM decides what to do next. We give it tools and it chooses when and how to use them. The flow:

  1. The user asks a question.
  2. We send the question to the LLM along with the list of available tools.
  3. The LLM either replies directly or asks us to call one of the tools with specific arguments.
  4. If the LLM asks for a tool call, we run it and return the result.
  5. The LLM looks at the result and either calls another tool or replies.
  6. We repeat until the LLM produces a final answer.

What makes this agentic is step 3: the LLM, not us, decides when and how to search. Under the hood this is a small request-response loop called the agentic loop. We send messages, run tool calls, append results, and ask again until the model stops asking for tools.
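The loop above can be sketched in a few lines of Python. This is an illustration of the control flow, not toyaikit's actual implementation: `call_model` and `run_tool` are hypothetical stand-ins, and the "model" here is a stub that asks for one search call and then answers, so the loop can run end to end.

```python
# Minimal sketch of an agentic loop. `call_model` and `run_tool` are
# hypothetical stand-ins for a real LLM client and tool dispatcher.

def call_model(messages):
    # A real implementation would send `messages` to an LLM API.
    # This stub requests a search on the first turn, then answers.
    if not any(m["role"] == "tool" for m in messages):
        return {"tool_call": {"name": "search", "arguments": {"query": "dashboard"}}}
    return {"content": "Here is what the docs say about dashboards."}

def run_tool(name, arguments):
    # A real implementation would dispatch to the registered tool.
    return f"results for {arguments['query']}"

def agentic_loop(question):
    messages = [{"role": "user", "content": question}]
    while True:
        reply = call_model(messages)
        if "tool_call" not in reply:
            return reply["content"]  # final answer: stop looping
        call = reply["tool_call"]
        result = run_tool(call["name"], call["arguments"])
        messages.append({"role": "tool", "content": result})

print(agentic_loop("How do I create a dashboard in Evidently?"))
```

The important part is the `while True`: the model, not our code, decides when the loop ends, by replying with content instead of a tool call.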

Using toyaikit

We will not implement the agentic loop ourselves. We use toyaikit, a small framework built for teaching and workshops. It is simple enough to read the source in a few hours, and it shows how the loop actually works.

Install it:

uv add toyaikit

toyaikit requires tools to have type hints and docstrings. This information gets passed to the LLM so it knows when to call each function. Let's annotate search properly:

from typing import Any, Dict, List

def search(query: str) -> List[Dict[str, Any]]:
    """
    Search the documentation database.

    Args:
        query: The search query to look up in the index.

    Returns:
        List of matching documents.
    """
    return index.search(query=query, num_results=5)

Wrap the function in a Tools collection:

from toyaikit.tools import Tools

agent_tools = Tools()
agent_tools.add_tool(search)

Inspect the schema toyaikit generated:

agent_tools.get_tools()
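The description and parameter types come from the docstring and type hints we just added. The exact output depends on your toyaikit version, but since we use the OpenAI Responses API below, the schema for search should look roughly like this hand-written approximation (not captured output):

```python
# Approximate shape of the schema toyaikit generates for `search`.
# Name, description, and parameter docs are derived from the function's
# signature, type hints, and docstring.
expected_schema = {
    "type": "function",
    "name": "search",
    "description": "Search the documentation database.",
    "parameters": {
        "type": "object",
        "properties": {
            "query": {
                "type": "string",
                "description": "The search query to look up in the index.",
            },
        },
        "required": ["query"],
    },
}
```

This is what the LLM sees: a clear docstring directly improves the model's ability to decide when and how to call the tool.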

Creating the agent

Set up the imports:

from toyaikit.llm import OpenAIClient
from toyaikit.chat.interface import IPythonChatInterface
from toyaikit.chat.runners import (
    OpenAIResponsesRunner,
    DisplayingRunnerCallback,
)

Initialize the helpers:

llm_client = OpenAIClient(
    model="gpt-4o-mini",
    client=openai_client,
)
chat_interface = IPythonChatInterface()
runner_callback = DisplayingRunnerCallback(
    chat_interface=chat_interface
)

Create the agent:

instructions = """
You're a documentation assistant.
Answer the user question using the documentation knowledge base.
Use only facts from the knowledge base when answering.
If you cannot find the answer, inform the user.
""".strip()

agent = OpenAIResponsesRunner(
    tools=agent_tools,
    developer_prompt=instructions,
    chat_interface=chat_interface,
    llm_client=llm_client,
)

Running on a single question

Note the deliberate typo in the query; we will see the agent quietly fix it before searching:

result = agent.loop(
    "How do I create a dahsbord in Evidently?",
    callback=runner_callback,
)
print(result.last_message)

We solved the typo problem from Part 1: the LLM rewrote "dahsbord" to "dashboard" before calling the search tool. The agent can also run as an interactive chat (type stop to exit):

result = agent.run()

But we still have the chunking problem: if the right answer spans several chunks, we are guessing. The agent has one tool and one tool only. In Part 3: Agentic search - going beyond RAG we give it a second tool so it can open full documents on demand.
