Part 3: Agentic search - going beyond RAG
In Part 2 the agent had one tool: search. It could fix typos and decide when to search, but it still returned chunks and could not open a full document. Here we replace the single chunked index with two tools that mirror how humans read documentation: search for snippets, then open the document that looks most promising.
The chunking problem
Traditional RAG takes a big document, chunks it, and hopes the retrieved chunks contain what we need. If we retrieve chunks 2 and 4 out of 5, we miss information from 1, 3, and 5.
Think about how you actually read documentation:
- Search for something.
- Look at the snippets and titles.
- Open the most relevant document.
- Read through it to find what you need.
With agents we can replicate that flow. Instead of chunking upfront, we give the agent two tools:
- search - returns short highlighted snippets so the agent can decide which document is worth opening
- get_file - returns the full document when the agent wants to read it end-to-end
The agent stays in charge: it searches, looks at the snippets, picks the most promising filename, opens the file, and synthesizes the answer.
Re-indexing without chunking
We rebuild the index over full documents this time, not chunks. This replaces the chunked index from Part 1:
from gitsource import GithubRepositoryDataReader
from minsearch import Index
reader = GithubRepositoryDataReader(
    repo_owner="evidentlyai",
    repo_name="docs",
    allowed_extensions={"md", "mdx"},
)
files = reader.read()
parsed_docs = [doc.parse() for doc in files]
index = Index(
    text_fields=["title", "description", "content"],
    keyword_fields=["filename"],
)
index.fit(parsed_docs)
Highlighter: showing snippets instead of full documents
When we search, returning the full document is too much context. We want concise snippets that help the agent decide which documents to open. minsearch has a highlighter (the same idea as Lucene/Elasticsearch highlighting):
from minsearch import Highlighter, Tokenizer
from minsearch.tokenizer import DEFAULT_ENGLISH_STOP_WORDS
stopwords = DEFAULT_ENGLISH_STOP_WORDS | {"evidently"}
tokenizer = Tokenizer(stemmer="snowball", stop_words=stopwords)
highlighter = Highlighter(
    highlight_fields=["content"],
    max_matches=3,
    snippet_size=50,
    tokenizer=tokenizer,
)
Try it on a search query:
query = "how to create a dashboard"
search_results = index.search(query=query, num_results=5)
snippets = highlighter.highlight(query, search_results)
snippets[0]
Each result keeps the same fields, but content is replaced with a short list of snippets around the matched terms. What the parameters do:
- highlight_fields - which fields to extract snippets from
- max_matches=3 - up to 3 snippets per document
- snippet_size=50 - roughly 50 tokens per snippet
- stop_words - skip noise words; "evidently" appears everywhere in this corpus, so we treat it as a stop word
- stemmer="snowball" - matches word variations (use/using)
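The mechanics behind snippet extraction are easy to see in miniature. This toy version (a simplified sketch, not minsearch's actual implementation) tokenizes the text, finds the positions of query terms, and slices a small window of tokens around each match:

```python
import re

def toy_snippets(text, query, window=5, max_matches=3):
    # Tokenize document and query into lowercase word tokens.
    tokens = re.findall(r"\w+", text)
    terms = {t.lower() for t in re.findall(r"\w+", query)}
    out = []
    for i, tok in enumerate(tokens):
        if tok.lower() in terms:
            # Keep `window` tokens of context on each side of the match.
            lo, hi = max(0, i - window), i + window + 1
            out.append(" ".join(tokens[lo:hi]))
            if len(out) == max_matches:
                break
    return out

text = "To create a dashboard, open the dashboard tab and add panels."
print(toy_snippets(text, "create dashboard", window=2))
# ['To create a dashboard', 'create a dashboard open the', 'open the dashboard tab and']
```

A real highlighter additionally applies stemming and stop-word filtering before matching, which is what the Tokenizer configured above provides.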
File index: opening full documents on demand
We need a way to look up the full content of a document by filename:
file_index = {
    doc["filename"]: doc["content"] for doc in parsed_docs
}
Verify it works:
filename = next(iter(file_index))
print(filename)
print(file_index[filename][:500])
The two-tool class
Both search and get_file need access to the same shared state - the index, the highlighter, the file index. A class holds that state in one place instead of relying on globals, and adding more tools later is a one-line change.
from typing import Any, Dict, List

class SearchTools:
    def __init__(
        self, index, highlighter, file_index: Dict[str, str]
    ):
        self.index = index
        self.highlighter = highlighter
        self.file_index = file_index

    def search(self, query: str) -> List[Dict[str, Any]]:
        """
        Search the index and return highlighted snippets.

        Use this to find relevant documents.

        Args:
            query: The search query.

        Returns:
            List of results with highlighted content snippets.
        """
        results = self.index.search(query=query, num_results=5)
        return self.highlighter.highlight(query, results)

    def get_file(self, filename: str) -> str:
        """
        Retrieve the full content of a file by filename.

        Use this when you need the complete document.

        Args:
            filename: The name of the file to retrieve.

        Returns:
            The full file contents.
        """
        return self.file_index.get(
            filename, f"file {filename} does not exist"
        )
Initialize and smoke-test both methods:
search_tools = SearchTools(index, highlighter, file_index)
snippets = search_tools.search("create a dashboard")
snippets[0]
Then fetch a full document by filename:
filename = snippets[0]["filename"]
search_tools.get_file(filename)[:500]
The first call returns snippets for decision-making. The second call returns the full document.
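Because both tools live on one class, extending the toolset really is just another method. For example, a hypothetical list_files tool (not part of the original code) that lets the agent browse the available filenames, shown here on a minimal stand-in for SearchTools:

```python
from typing import Dict, List

class SearchToolsExt:
    # Minimal stand-in for SearchTools, keeping only the file index.
    def __init__(self, file_index: Dict[str, str]):
        self.file_index = file_index

    def list_files(self) -> List[str]:
        """
        List all filenames in the knowledge base.

        Use this to browse the available documents.
        """
        return sorted(self.file_index)

tools = SearchToolsExt({"docs/dashboard.mdx": "...", "docs/intro.md": "..."})
print(tools.list_files())  # ['docs/dashboard.mdx', 'docs/intro.md']
```

The docstring matters: it is what the agent sees when deciding whether to call the tool.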
Wiring up the agent
Swap the single search function for the SearchTools instance:
from toyaikit.tools import Tools
agent_tools = Tools()
agent_tools.add_tools(search_tools)
Note add_tools (plural) - it discovers all public methods on the instance and registers them as tools.
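toyaikit's discovery code isn't shown here, but the general mechanism can be sketched with Python's inspect module (a simplified illustration, not toyaikit's actual implementation):

```python
import inspect

class Demo:
    def search(self, query: str):
        """Find relevant documents."""
        return []

    def get_file(self, filename: str):
        """Fetch a full document."""
        return ""

    def _helper(self):
        pass  # private: should not become a tool

def discover_tools(instance):
    # Collect every public bound method; the docstring becomes
    # the tool description shown to the LLM.
    tools = {}
    for name, method in inspect.getmembers(instance, inspect.ismethod):
        if not name.startswith("_"):
            tools[name] = inspect.getdoc(method)
    return tools

print(discover_tools(Demo()))
# {'get_file': 'Fetch a full document.', 'search': 'Find relevant documents.'}
```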
Create the agent with minimal instructions first:
instructions = """
You're a documentation assistant.
Answer the user question using only the documentation knowledge base.
""".strip()
# OpenAIResponsesRunner, chat_interface, and llm_client
# carry over from the setup in Part 2.
agent = OpenAIResponsesRunner(
    tools=agent_tools,
    developer_prompt=instructions,
    chat_interface=chat_interface,
    llm_client=llm_client,
)
Running with a simple prompt
Run the agent on a question and watch the tool calls:
result = agent.loop(
    # the "dahsbord" typo is intentional - the agent should correct it
    "how can I create evidently dahsbord? show me the code",
    callback=runner_callback,
)
With minimal instructions, the agent often does the bare minimum: one search, then it answers from the snippets. Sometimes that is enough, but often the snippets do not contain the code or miss the part that actually answers the question.
Tightening the prompt
We push the agent toward more thorough behavior by being explicit about the stages we want:
instructions = """
You're a documentation assistant.
Answer the user question using only the documentation knowledge base.
Make 3 iterations:
1) First iteration:
- Perform one search using the search tool to identify
potentially relevant documents.
- Explain (in 2-3 sentences) why this search query is
appropriate for the user question.
2) Second iteration:
- Analyze the results from the previous search.
- Based on the filenames or documents returned, perform:
- Up to 2 additional search queries to refine or expand
coverage, and
- One or more get_file calls to retrieve the full content
of the most relevant documents.
- For each search or get_file call, explain (in 2-3 sentences)
why this action is necessary and how it helps answer the
question.
3) Third iteration:
- Analyze the retrieved document contents from get_file.
- Synthesize the information into a final answer to the user.
IMPORTANT:
- At every step, explicitly explain your reasoning for each
search query or file retrieval.
- Use only facts found in the documentation knowledge base.
- Do not introduce outside knowledge or assumptions.
- If the answer cannot be found in the retrieved documents,
clearly inform the user.
Additional notes:
- The knowledge base is entirely about Evidently, so you do not
need to include the word "evidently" in search queries.
- Prefer retrieving and analyzing full documents (via get_file)
before producing the final answer.
""".strip()
Why three explicit stages? Without staging, the agent tends to stop early - one search, one answer. Naming the stages forces it to start broad, use the snippets to decide what to read in full, and only then commit to an answer grounded in the full documents. The 2-3 sentence explanation at each step makes the agent's reasoning visible so you can diagnose problems.
Recreate the agent with the new instructions:
agent = OpenAIResponsesRunner(
    tools=agent_tools,
    developer_prompt=instructions,
    chat_interface=chat_interface,
    llm_client=llm_client,
)
Then run the same question:
result = agent.loop(
    "how can I create evidently dahsbord? show me the code",
    callback=runner_callback,
)
This time the agent searches with a corrected query, looks at the snippets, calls get_file on the most promising filenames, and synthesizes the answer from the full content - including code examples from the actual document. This is agentic search: the agent decides what to open based on what it sees in the snippets, and reads only the documents it needs.
Continue with Part 4: From toyaikit to PydanticAI to migrate this agent to a production framework.