Build the FAQ agent
In the previous lesson, we created a working search function over the
Data Engineering Zoomcamp FAQ. Now we give that function to the model as
a tool and write the FAQ agent around it.
We keep the agent framework-agnostic. Later you can port the same ideas to PydanticAI, LangChain, the OpenAI Agents SDK, or another framework.
Define the tool
The model never sees the Python function, only a JSON schema:
search_tool_definition = {
    "type": "function",
    "name": "search",
    "description": "Search the Data Engineering Zoomcamp FAQ.",
    "parameters": {
        "type": "object",
        "properties": {
            "query": {
                "type": "string",
                "description": "Search query for the FAQ.",
            }
        },
        "required": ["query"],
        "additionalProperties": False,
    },
}
The schema describes the tool interface to the model. The model can
request a search call, but your Python code still decides which
function runs.
Most agent frameworks create this schema from a function signature, docstring, or decorator. Here we define it ourselves so the example stays framework-agnostic.
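To illustrate what such frameworks do behind the scenes, here is a minimal sketch that derives a schema of the same shape from a function signature and docstring. The helper build_tool_schema and the stand-in search function are hypothetical and not part of the lesson code. The sketch only maps str, int, float, and bool annotations; real frameworks handle many more cases.

```python
import inspect
from typing import Any

# Map a few Python annotations to JSON Schema type names (illustrative only).
JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def build_tool_schema(function) -> dict[str, Any]:
    """Hypothetical helper: derive a tool schema from a function signature."""
    signature = inspect.signature(function)
    properties = {}
    required = []
    for name, parameter in signature.parameters.items():
        # Fall back to "string" when the annotation is missing or unknown.
        json_type = JSON_TYPES.get(parameter.annotation, "string")
        properties[name] = {"type": json_type}
        # Parameters without a default value are treated as required.
        if parameter.default is inspect.Parameter.empty:
            required.append(name)
    return {
        "type": "function",
        "name": function.__name__,
        "description": inspect.getdoc(function) or "",
        "parameters": {
            "type": "object",
            "properties": properties,
            "required": required,
            "additionalProperties": False,
        },
    }


def search(query: str) -> list[dict[str, str]]:
    """Search the Data Engineering Zoomcamp FAQ."""
    return []  # stand-in body; the real function comes from the previous lesson


generated_schema = build_tool_schema(search)
print(generated_schema["name"], generated_schema["parameters"]["required"])
```

The generated schema has no per-parameter description, which is one reason the hand-written version above can give the model more guidance.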
Create the agent class
We want to use the class like this:
agent = Agent(
    openai_client=openai_client,
    tool_definitions=[(search, search_tool_definition)],
    instructions=instructions,
    model="gpt-4o-mini",
)
So the constructor needs the OpenAI client, the tools, the instructions, and the model name.
Build and test the pieces in the notebook first. Once the class works,
move it into a separate file named agent.py. The autoreload setup from
the environment lesson reloads the file after edits. Then import the
class back into the notebook.
Start with the imports:
import json
from collections.abc import Callable
from typing import Any, Protocol
from openai import AsyncOpenAI
from openai.types.responses import ResponseFunctionToolCall
We pass tools to the agent as pairs: the Python function and the tool definition the model sees.
The search tool goes in as:
(search, search_tool_definition)
We want everything typed, so we define a type for that pair:
ToolDefinition = tuple[Callable[..., Any], dict[str, Any]]
Next, we define the shared interface that both the agent and, later, the guarded agent implement:
class RunnableAgent(Protocol):
    async def run(self, question: str) -> str:
        ...
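Because RunnableAgent is a Protocol, a class does not have to inherit from it to count as an agent for type-checking purposes; having a matching async run method is enough. The following sketch shows this with a hypothetical CannedAgent and a hypothetical ask helper, neither of which is part of the lesson code.

```python
import asyncio
from typing import Protocol


class RunnableAgent(Protocol):
    async def run(self, question: str) -> str:
        ...


class CannedAgent:
    """Hypothetical agent that does not inherit from RunnableAgent."""

    async def run(self, question: str) -> str:
        return "canned answer"


async def ask(agent: RunnableAgent, question: str) -> str:
    # A type checker accepts any object with a matching async run method.
    return await agent.run(question)


# asyncio.run is for plain scripts; in a notebook, use `await ask(...)` instead.
canned_answer = asyncio.run(ask(CannedAgent(), "Can I still join the course?"))
print(canned_answer)
```

Inheriting from the protocol explicitly, as the Agent class does, also works and makes the intent visible to readers.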
Add the Agent class constructor:
class Agent(RunnableAgent):
    def __init__(
        self,
        openai_client: AsyncOpenAI,
        tool_definitions: list[ToolDefinition],
        instructions: str,
        model: str,
    ) -> None:
        self.openai_client = openai_client
        self.instructions = instructions
        self.model = model
        self.functions: dict[str, Callable[..., Any]] = {}
        self.tool_schemas: list[dict[str, Any]] = []
        for function, schema in tool_definitions:
            name = schema["name"]
            if function.__name__ != name:
                raise ValueError("Function name and tool schema name do not match")
            self.functions[name] = function
            self.tool_schemas.append(schema)
The constructor stores the OpenAI client, instructions, and model name for later calls.
It also splits each tool pair into two places:
- self.tool_schemas goes to the OpenAI API, so the model knows which tools it can request.
- self.functions stays in Python, so the agent can run the requested function by name.
The name check prevents a quiet mismatch. If the schema says the tool is
called search, the Python function also needs to be named search.
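To see the check in action without an OpenAI client, here is a small sketch that copies the registration loop into a hypothetical standalone function, register_tools. The search function is a stand-in, and the schemas are trimmed to the name field only.

```python
from collections.abc import Callable
from typing import Any


def register_tools(
    tool_definitions: list[tuple[Callable[..., Any], dict[str, Any]]],
) -> dict[str, Callable[..., Any]]:
    """Hypothetical standalone version of the constructor's registration loop."""
    functions: dict[str, Callable[..., Any]] = {}
    for function, schema in tool_definitions:
        name = schema["name"]
        if function.__name__ != name:
            raise ValueError("Function name and tool schema name do not match")
        functions[name] = function
    return functions


def search(query: str) -> list[str]:
    return []  # stand-in for the real FAQ search


# Matching names register without complaint.
registered = register_tools([(search, {"name": "search"})])

# A schema that calls the tool "faq_search" is rejected up front.
try:
    register_tools([(search, {"name": "faq_search"})])
    mismatch_error = None
except ValueError as error:
    mismatch_error = str(error)

print(sorted(registered), mismatch_error)
```

One consequence of this check: a lambda reports its name as `<lambda>`, so it would not pass unless the schema used that name too.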
Next, a helper method to invoke these tools:
    def call_tool(self, call: ResponseFunctionToolCall) -> dict[str, str]:
        args = json.loads(call.arguments)
        function = self.functions[call.name]
        result = function(**args)
        return {
            "type": "function_call_output",
            "call_id": call.call_id,
            "output": json.dumps(result),
        }
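The helper above can be traced step by step without calling the API. The sketch below uses a SimpleNamespace as a stand-in for ResponseFunctionToolCall, carrying only the three fields the helper reads. The search function, its return value, and the call_id are all made up for illustration.

```python
import json
from types import SimpleNamespace


def search(query: str) -> list[dict[str, str]]:
    # Stand-in for the real FAQ search; returns one made-up record.
    return [{"question": "placeholder question", "matched_query": query}]


functions = {"search": search}

# Stand-in for a ResponseFunctionToolCall; only the fields call_tool reads.
fake_call = SimpleNamespace(
    name="search",
    arguments='{"query": "join the course"}',  # arguments arrive as a JSON string
    call_id="call_123",  # made-up identifier
)

args = json.loads(fake_call.arguments)  # JSON string -> keyword arguments
result = functions[fake_call.name](**args)  # look up the function by name, call it
tool_output = {
    "type": "function_call_output",
    "call_id": fake_call.call_id,  # ties the result to the request
    "output": json.dumps(result),  # results go back as a JSON string
}
print(tool_output)
```

Note that json.dumps requires the tool result to be JSON-serializable, so a tool that returns custom objects would need to convert them first.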
Finally, we implement the run method that drives the agent:
    async def run(self, question: str) -> str:
        messages = [
            {"role": "developer", "content": self.instructions},
            {"role": "user", "content": question},
        ]
        while True:
            response = await self.openai_client.responses.create(
                model=self.model,
                input=messages,
                tools=self.tool_schemas,
            )
            # Keep the model output (including tool requests) in the history.
            messages.extend(response.output)
            has_function_calls = False
            for entry in response.output:
                if entry.type == "function_call":
                    print("function_call:", entry.name, entry.arguments)
                    result = self.call_tool(entry)
                    messages.append(result)
                    has_function_calls = True
            # No tool requests means the model produced its final answer.
            if not has_function_calls:
                return response.output_text
We call the loop inside run the "tool call loop" or "agentic loop", and
most tool-using agents follow the same pattern.
The loop does this:
- Send the current messages to the model.
- Check whether the model requested a tool call.
- If it did, run the Python function and send the result back to the model.
- Loop until there are no new tool calls, then return the final answer.
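The steps above can be traced offline with a scripted stand-in for the model. In this sketch, FakeResponses is a hypothetical replacement for openai_client.responses that requests one search on the first turn and answers in text on the second. The run_loop function mirrors the structure of run, and the search function and all returned values are made up.

```python
import asyncio
import json
from types import SimpleNamespace


class FakeResponses:
    """Hypothetical stand-in for openai_client.responses; no network calls."""

    def __init__(self) -> None:
        self.calls = 0

    async def create(self, model, input, tools):
        self.calls += 1
        if self.calls == 1:
            # First turn: the scripted "model" requests a search.
            call = SimpleNamespace(
                type="function_call",
                name="search",
                arguments='{"query": "join late"}',
                call_id="call_1",
            )
            return SimpleNamespace(output=[call], output_text="")
        # Second turn: the tool result is in the input, so answer in text.
        message = SimpleNamespace(type="message")
        return SimpleNamespace(output=[message], output_text="Scripted final answer.")


def search(query: str) -> list[str]:
    return [f"stub result for: {query}"]


async def run_loop(responses: FakeResponses, question: str) -> tuple[str, list]:
    messages: list = [{"role": "user", "content": question}]
    while True:
        response = await responses.create(model="fake", input=messages, tools=[])
        messages.extend(response.output)
        has_function_calls = False
        for entry in response.output:
            if entry.type == "function_call":
                result = search(**json.loads(entry.arguments))
                messages.append(
                    {
                        "type": "function_call_output",
                        "call_id": entry.call_id,
                        "output": json.dumps(result),
                    }
                )
                has_function_calls = True
        if not has_function_calls:
            return response.output_text, messages


# asyncio.run is for plain scripts; in a notebook, use `await run_loop(...)`.
fake_responses = FakeResponses()
answer, history = asyncio.run(run_loop(fake_responses, "Can I still join?"))
print(answer, fake_responses.calls, len(history))
```

The history ends up with four entries: the user question, the tool request, the tool result, and the final message. The loop has no iteration cap, so it relies on the model eventually answering without a tool call.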
We won't cover the loop in detail here. For a basic intro to AI agents, check LLM Zoomcamp. For a deeper introduction, check AI Engineering Buildcamp: From RAG to Agents.
Initialize the FAQ agent
Go back to the notebook and define the developer instructions that tell the agent what to do:
instructions = """
You are a Data Engineering Zoomcamp FAQ assistant.
Answer course questions by using the search tool.
Use the FAQ results as your source of truth.
If the FAQ does not contain the answer, say that you do not know.
""".strip()
Now create the agent:
from agent import Agent
agent = Agent(
    openai_client=openai_client,
    tool_definitions=[(search, search_tool_definition)],
    instructions=instructions,
    model="gpt-4o-mini",
)
Exercise
Run the FAQ agent with questions that should work:
- Can I still join the course?
- How do I set up Docker for the course?
- Can I get a certificate in self-paced mode?
In Jupyter, call the async method with await.
await agent.run("Can I still join the course?")
Then run questions that a course assistant shouldn't answer:
- How do I cook pasta?
- Can you promise me a project deadline extension?
- Write the full homework solution for me.
Save the outputs or keep the notebook cells. In the next lessons, we use these cases to decide what the input and output guardrails should block.
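One possible way to save the outputs is to collect question and answer pairs and write them to a JSON file. The sketch below is an assumption about how you might do this, not part of the lesson code: EchoAgent is a hypothetical stand-in for the real agent, collect_answers is a hypothetical helper, and the file name faq_agent_outputs.json is made up.

```python
import asyncio
import json
import tempfile
from pathlib import Path


class EchoAgent:
    """Hypothetical stand-in with the same async run(question) interface."""

    async def run(self, question: str) -> str:
        return f"echo: {question}"


async def collect_answers(agent, questions: list[str]) -> list[dict[str, str]]:
    records = []
    for question in questions:
        answer = await agent.run(question)
        records.append({"question": question, "answer": answer})
    return records


questions = ["Can I still join the course?", "How do I cook pasta?"]

# asyncio.run is for plain scripts; in a notebook, use `await collect_answers(...)`.
records = asyncio.run(collect_answers(EchoAgent(), questions))

# Written to the system temp directory here; pick any path that suits your project.
output_path = Path(tempfile.gettempdir()) / "faq_agent_outputs.json"
output_path.write_text(json.dumps(records, indent=2), encoding="utf-8")
print(output_path)
```

With the real agent, pass the agent object from the notebook instead of EchoAgent().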