Part 3: Agentic search - going beyond RAG

In Part 2 the agent had one tool: search. It could fix typos and decide when to search, but it still returned chunks and could not open a full document. Here we replace the single chunked index with two tools that mirror how humans read documentation: search for snippets, then open the document that looks most promising.

The chunking problem

Traditional RAG takes a big document, chunks it, and hopes the retrieved chunks contain what we need. If we retrieve chunks 2 and 4 out of 5, we miss information from 1, 3, and 5.
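To make the failure mode concrete, here is a toy sketch (pure Python, the document text is made up for illustration): a five-step document split into five chunks, where retrieval happens to return only chunks 2 and 4.

```python
def chunk(text, size):
    """Split text into fixed-size chunks of words (a naive chunker)."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

doc = (
    "step one install the evidently package "
    "step two import the Report class "
    "step three load your reference data "
    "step four configure the preset metrics "
    "step five render the final dashboard"
)

chunks = chunk(doc, 6)              # 5 chunks, one per step
retrieved = [chunks[1], chunks[3]]  # pretend retrieval hit chunks 2 and 4

# The retrieved context covers importing and configuring, but never
# mentions installing, loading data, or rendering - those steps live
# in chunks 1, 3, and 5, which retrieval missed.
print(retrieved)
```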

Think about how you actually read documentation:

  1. Search for something.
  2. Look at the snippets and titles.
  3. Open the most relevant document.
  4. Read through it to find what you need.

With agents we can replicate that flow. Instead of chunking upfront, we give the agent two tools:

  • search - returns short highlighted snippets so the agent can decide which document is worth opening
  • get_file - returns the full document when the agent wants to read it end-to-end

The agent stays in charge: it searches, looks at the snippets, picks the most promising filename, opens the file, and synthesizes the answer.

Re-indexing without chunking

We rebuild the index over full documents this time, not chunks. This replaces the chunked index from Part 1:

from gitsource import GithubRepositoryDataReader
from minsearch import Index

reader = GithubRepositoryDataReader(
    repo_owner="evidentlyai",
    repo_name="docs",
    allowed_extensions={"md", "mdx"},
)
files = reader.read()
parsed_docs = [doc.parse() for doc in files]

index = Index(
    text_fields=["title", "description", "content"],
    keyword_fields=["filename"],
)
index.fit(parsed_docs)
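If you are curious what such an index does conceptually, here is a minimal sketch - not minsearch's actual implementation, and the scoring is deliberately simplistic: tokenize the text fields and rank whole documents by query-term overlap, with no chunking anywhere.

```python
from collections import Counter

def tokenize(text):
    """Lowercase and split on whitespace - a stand-in for a real tokenizer."""
    return text.lower().split()

class TinyIndex:
    """A toy full-document index: score = query-term overlap count."""

    def __init__(self, text_fields):
        self.text_fields = text_fields
        self.docs = []

    def fit(self, docs):
        self.docs = docs
        return self

    def search(self, query, num_results=5):
        q = set(tokenize(query))
        scored = []
        for doc in self.docs:
            terms = Counter()
            for field in self.text_fields:
                terms.update(tokenize(doc.get(field, "")))
            score = sum(terms[t] for t in q)  # occurrences of query terms
            if score:
                scored.append((score, doc))
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [doc for _, doc in scored[:num_results]]

docs = [
    {"filename": "dashboard.md", "title": "Dashboards",
     "content": "create a dashboard with panels"},
    {"filename": "install.md", "title": "Install",
     "content": "pip install the package"},
]
tiny = TinyIndex(text_fields=["title", "content"]).fit(docs)
print(tiny.search("create a dashboard")[0]["filename"])  # dashboard.md
```

The key difference from Part 1 is what a result *is*: an entire document with its filename, which the agent can later open in full.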

Highlighter: showing snippets instead of full documents

When we search, returning the full document is too much context. We want concise snippets that help the agent decide which documents to open. minsearch has a highlighter (the same idea as Lucene/Elasticsearch highlighting):

from minsearch import Highlighter, Tokenizer
from minsearch.tokenizer import DEFAULT_ENGLISH_STOP_WORDS

stopwords = DEFAULT_ENGLISH_STOP_WORDS | {"evidently"}
tokenizer = Tokenizer(stemmer="snowball", stop_words=stopwords)

highlighter = Highlighter(
    highlight_fields=["content"],
    max_matches=3,
    snippet_size=50,
    tokenizer=tokenizer,
)

Try it on a search query:

query = "how to create a dashboard"
search_results = index.search(query=query, num_results=5)

snippets = highlighter.highlight(query, search_results)
snippets[0]

Each result keeps the same fields, but content is replaced with a short list of snippets around the matched terms. What the parameters do:

  • highlight_fields - which fields to extract snippets from
  • max_matches=3 - up to 3 snippets per document
  • snippet_size=50 - roughly 50 tokens per snippet
  • stop_words - skip noise words; evidently appears everywhere in this corpus so we treat it as a stop word
  • stemmer="snowball" - matches word variations (use/using)
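The mechanics are simple enough to sketch in a few lines. This is an illustration of the idea, not minsearch's implementation (no stemming here, and the punctuation handling is naive): find query-term positions, take a window of tokens around each, and cap the number of snippets.

```python
def highlight(query, text, snippet_size=10, max_matches=3, stop_words=()):
    """Return up to max_matches windows of ~snippet_size tokens,
    each centered on a query-term occurrence (stop words ignored)."""
    q_terms = {t.lower() for t in query.split()} - set(stop_words)
    tokens = text.split()
    half = snippet_size // 2
    snippets = []
    for i, tok in enumerate(tokens):
        if tok.strip(".,").lower() in q_terms:
            window = tokens[max(0, i - half):i + half]
            snippets.append(" ".join(window))
            if len(snippets) == max_matches:
                break
    return snippets

text = ("To create a dashboard, first define panels. "
        "A dashboard can show metrics over time. "
        "Share the dashboard with your team.")
print(highlight("create a dashboard", text,
                snippet_size=6, stop_words={"a"}))
```

Each snippet gives the agent just enough surrounding context to judge relevance without paying for the whole document.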

File index: opening full documents on demand

We need a way to look up the full content of a document by filename:

file_index = {
    doc["filename"]: doc["content"] for doc in parsed_docs
}

Verify it works:

filename = next(iter(file_index))
print(filename)
print(file_index[filename][:500])

The two-tool class

Both search and get_file need access to the same shared state - the index, the highlighter, the file index. A class holds that state in one place instead of relying on globals, and adding more tools later is a one-line change.

from typing import Any, Dict, List

class SearchTools:
    def __init__(
        self, index, highlighter, file_index: Dict[str, str]
    ):
        self.index = index
        self.highlighter = highlighter
        self.file_index = file_index

    def search(self, query: str) -> List[Dict[str, Any]]:
        """
        Search the index and return highlighted snippets.
        Use this to find relevant documents.
        Args:
            query: The search query.
        Returns:
            List of results with highlighted content snippets.
        """
        results = self.index.search(query=query, num_results=5)
        return self.highlighter.highlight(query, results)

    def get_file(self, filename: str) -> str:
        """
        Retrieve the full content of a file by filename.
        Use this when you need the complete document.
        Args:
            filename: The name of the file to retrieve.
        Returns:
            The full file contents.
        """
        return self.file_index.get(
            filename, f"file {filename} does not exist"
        )

Initialize and smoke-test both methods:

search_tools = SearchTools(index, highlighter, file_index)
snippets = search_tools.search("create a dashboard")
snippets[0]

Then fetch a full document by filename:

filename = snippets[0]["filename"]
search_tools.get_file(filename)[:500]

The first call returns snippets for decision-making. The second call returns the full document.

Wiring up the agent

Swap the single search function for the SearchTools instance:

from toyaikit.tools import Tools

agent_tools = Tools()
agent_tools.add_tools(search_tools)

Note add_tools (plural) - it discovers all public methods on the instance and registers them as tools.
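Roughly how such discovery can work - a sketch of the general technique, not toyaikit's actual code: introspect the instance, keep public callables, and read each method's name, docstring, and signature to build tool descriptions.

```python
import inspect

def discover_tools(instance):
    """Collect public bound methods of an object as name -> (doc, signature)."""
    tools = {}
    for name in dir(instance):
        if name.startswith("_"):
            continue  # skip private and dunder attributes
        attr = getattr(instance, name)
        if callable(attr):
            tools[name] = {
                "doc": inspect.getdoc(attr),
                "signature": str(inspect.signature(attr)),
            }
    return tools

class Demo:
    def search(self, query: str):
        """Search the index."""
        return []

    def get_file(self, filename: str):
        """Return file content."""
        return ""

print(sorted(discover_tools(Demo())))  # ['get_file', 'search']
```

This is why the docstrings on search and get_file matter: whatever the discovery mechanism extracts is what the LLM sees when deciding which tool to call.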

Create the agent with minimal instructions first:

instructions = """
You're a documentation assistant.
Answer the user question using only the documentation knowledge base.
""".strip()

agent = OpenAIResponsesRunner(
    tools=agent_tools,
    developer_prompt=instructions,
    chat_interface=chat_interface,
    llm_client=llm_client,
)

Running with a simple prompt

Run the agent on a question and watch the tool calls:

result = agent.loop(
    "how can I create evidently dahsbord? show me the code",
    callback=runner_callback,
)

With minimal instructions, the agent often does the bare minimum: one search, then it answers from the snippets. Sometimes that is enough, but often the snippets do not contain the code or miss the part that actually answers the question.

Tightening the prompt

We push the agent toward more thorough behavior by being explicit about the stages we want:

instructions = """
You're a documentation assistant.
Answer the user question using only the documentation knowledge base.
Make 3 iterations:
1) First iteration:
   - Perform one search using the search tool to identify
     potentially relevant documents.
   - Explain (in 2-3 sentences) why this search query is
     appropriate for the user question.
2) Second iteration:
   - Analyze the results from the previous search.
   - Based on the filenames or documents returned, perform:
       - Up to 2 additional search queries to refine or expand
         coverage, and
       - One or more get_file calls to retrieve the full content
         of the most relevant documents.
   - For each search or get_file call, explain (in 2-3 sentences)
     why this action is necessary and how it helps answer the
     question.
3) Third iteration:
   - Analyze the retrieved document contents from get_file.
   - Synthesize the information into a final answer to the user.
IMPORTANT:
- At every step, explicitly explain your reasoning for each
  search query or file retrieval.
- Use only facts found in the documentation knowledge base.
- Do not introduce outside knowledge or assumptions.
- If the answer cannot be found in the retrieved documents,
  clearly inform the user.
Additional notes:
- The knowledge base is entirely about Evidently, so you do not
  need to include the word "evidently" in search queries.
- Prefer retrieving and analyzing full documents (via get_file)
  before producing the final answer.
""".strip()

Why three explicit stages? Without staging, the agent tends to stop early - one search, one answer. Naming the stages forces it to start broad, use the snippets to decide what to read in full, and only then commit to an answer grounded in the full documents. The 2-3 sentence explanation at each step makes the agent's reasoning visible so you can diagnose problems.
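The staged behavior the prompt asks for can be sketched deterministically, with stub data standing in for the real index and LLM (everything below is illustrative):

```python
# Stub corpus standing in for the real snippet search and file lookup.
snippets_by_query = {
    "create dashboard": [
        {"filename": "dashboard.mdx", "content": ["...create a dashboard..."]},
        {"filename": "install.mdx", "content": ["...pip install..."]},
    ]
}
file_index = {"dashboard.mdx": "Full doc: ds = Dashboard(); ds.add_panel(...)"}

def answer(question, query):
    # Stage 1: one broad search returns snippets, not full documents.
    results = snippets_by_query[query]
    # Stage 2: pick the most promising filename and open it in full.
    best = results[0]["filename"]
    full_doc = file_index[best]
    # Stage 3: synthesize from the full document, not from the snippets.
    return f"Based on {best}: {full_doc}"

print(answer("how can I create a dashboard?", "create dashboard"))
```

The agent does the same thing, except the query choice, the filename pick, and the synthesis are all LLM decisions rather than hardcoded ones.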

Recreate the agent with the new instructions:

agent = OpenAIResponsesRunner(
    tools=agent_tools,
    developer_prompt=instructions,
    chat_interface=chat_interface,
    llm_client=llm_client,
)

Then run the same question:

result = agent.loop(
    "how can I create evidently dahsbord? show me the code",
    callback=runner_callback,
)

This time the agent searches with a corrected query, looks at the snippets, calls get_file on the most promising filenames, and synthesizes the answer from the full content - including code examples from the actual document. This is agentic search: the agent decides what to open based on what it sees in the snippets, and reads only the documents it needs.

Continue with Part 4: From toyaikit to PydanticAI to migrate this agent to a production framework.
