Build the FAQ agent

In the previous lesson, we created a working search function over the Data Engineering Zoomcamp FAQ. Now we give that function to the model as a tool and write the FAQ agent around it.

We keep the agent framework agnostic. Later you can port the same ideas to PydanticAI, LangChain, the OpenAI Agents SDK, or another framework.

Define the tool

The model never sees the Python function, only a JSON schema:

search_tool_definition = {
    "type": "function",
    "name": "search",
    "description": "Search the Data Engineering Zoomcamp FAQ.",
    "parameters": {
        "type": "object",
        "properties": {
            "query": {
                "type": "string",
                "description": "Search query for the FAQ.",
            }
        },
        "required": ["query"],
        "additionalProperties": False,
    },
}

The schema describes the tool interface to the model. The model can request a search call, but your Python code still decides which function runs.

Most agent frameworks create this schema from a function signature, docstring, or decorator. Here we define it ourselves so the example stays framework agnostic.
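To see roughly how that generation works, here is a minimal sketch that builds a similar definition from a function's name, docstring, and signature using the standard inspect module. The helper schema_from_function and its type mapping are hypothetical, not any framework's API. Unlike the hand-written search_tool_definition, it produces no per-parameter descriptions.

```python
import inspect
from typing import Any, Callable

# Hypothetical mapping from a few Python annotations to JSON Schema types.
_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def schema_from_function(function: Callable[..., Any]) -> dict[str, Any]:
    """Build a tool definition from a function's name, docstring, and signature."""
    properties: dict[str, Any] = {}
    required: list[str] = []

    for name, param in inspect.signature(function).parameters.items():
        # Fall back to "string" when the annotation is missing or unknown.
        json_type = _JSON_TYPES.get(param.annotation, "string")
        properties[name] = {"type": json_type}
        # Parameters without a default value become required arguments.
        if param.default is inspect.Parameter.empty:
            required.append(name)

    return {
        "type": "function",
        "name": function.__name__,
        "description": inspect.getdoc(function) or "",
        "parameters": {
            "type": "object",
            "properties": properties,
            "required": required,
            "additionalProperties": False,
        },
    }


def search(query: str) -> list[dict[str, Any]]:
    """Search the Data Engineering Zoomcamp FAQ."""
    return []  # placeholder body; the real search comes from the previous lesson


print(schema_from_function(search))
```

Frameworks handle many more cases (nested types, optional values, per-argument docs), which is one reason this lesson writes the schema by hand.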

Create the agent class

We want to use the class like this:

agent = Agent(
    openai_client=openai_client,
    tool_definitions=[(search, search_tool_definition)],
    instructions=instructions,
    model="gpt-4o-mini",
)

So the constructor needs the OpenAI client, the tools, the instructions, and the model name.

Build and test the pieces in the notebook first. Once the class works, move it into a separate file named agent.py. The autoreload setup from the environment lesson reloads the file after edits. Then import the class back into the notebook.

Start with the imports:

import json
from collections.abc import Callable
from typing import Any, Protocol

from openai import AsyncOpenAI
from openai.types.responses import ResponseFunctionToolCall

We pass tools to the agent as pairs: the Python function and the tool definition the model sees.

The search tool goes in as:

(search, search_tool_definition)

We want everything typed, so we define a type for that pair:

ToolDefinition = tuple[Callable[..., Any], dict[str, Any]]

Next, we define the shared interface that the agent (and later the guarded agent) implements:

class RunnableAgent(Protocol):
    async def run(self, question: str) -> str:
        ...

Add the Agent class constructor:

class Agent(RunnableAgent):
    def __init__(
        self,
        openai_client: AsyncOpenAI,
        tool_definitions: list[ToolDefinition],
        instructions: str,
        model: str,
    ) -> None:
        self.openai_client = openai_client
        self.instructions = instructions
        self.model = model

        self.functions: dict[str, Callable[..., Any]] = {}
        self.tool_schemas: list[dict[str, Any]] = []

        for function, schema in tool_definitions:
            name = schema["name"]

            if function.__name__ != name:
                raise ValueError("Function name and tool schema name do not match")

            self.functions[name] = function
            self.tool_schemas.append(schema)

The constructor stores the OpenAI client, instructions, and model name for later calls.

It also splits each tool pair into two places:

  • self.tool_schemas goes to the OpenAI API, so the model knows which tools it can request.
  • self.functions stays in Python, so the agent can run the requested function by name.

The name check prevents a quiet mismatch. If the schema says the tool is called search, the Python function also needs to be named search.
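To try the check without an OpenAI client, you can pull the constructor's loop into a standalone function. This is a sketch: register_tools, faq_search, and the trimmed search_schema are hypothetical, and the error message here also names both sides, a small change from the constructor's message.

```python
from collections.abc import Callable
from typing import Any

ToolDefinition = tuple[Callable[..., Any], dict[str, Any]]


def register_tools(
    tool_definitions: list[ToolDefinition],
) -> tuple[dict[str, Callable[..., Any]], list[dict[str, Any]]]:
    """Split (function, schema) pairs the same way the Agent constructor does."""
    functions: dict[str, Callable[..., Any]] = {}
    tool_schemas: list[dict[str, Any]] = []

    for function, schema in tool_definitions:
        name = schema["name"]
        if function.__name__ != name:
            raise ValueError(
                f"Function name {function.__name__!r} does not match tool schema name {name!r}"
            )
        functions[name] = function  # looked up later by the name the model sends
        tool_schemas.append(schema)  # sent to the API

    return functions, tool_schemas


def search(query: str) -> list[dict[str, Any]]:
    return []  # hypothetical stub


def faq_search(query: str) -> list[dict[str, Any]]:
    return []  # same body, but the name does not match the schema


search_schema = {
    "type": "function",
    "name": "search",
    "parameters": {"type": "object", "properties": {}},
}

functions, tool_schemas = register_tools([(search, search_schema)])

try:
    register_tools([(faq_search, search_schema)])
    mismatch_error = None
except ValueError as error:
    mismatch_error = str(error)

print(mismatch_error)
# prints "Function name 'faq_search' does not match tool schema name 'search'"
```

Failing at construction time is cheaper than discovering the mismatch later as a KeyError in the middle of a conversation.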

Next, add a helper method that runs the tool the model requested:

    def call_tool(self, call: ResponseFunctionToolCall) -> dict[str, str]:
        args = json.loads(call.arguments)
        function = self.functions[call.name]
        result = function(**args)

        return {
            "type": "function_call_output",
            "call_id": call.call_id,
            "output": json.dumps(result),
        }
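The model sends arguments as a JSON-encoded string, so call_tool decodes them with json.loads before calling the function, and it encodes the result with json.dumps because the function_call_output item carries a string. A sketch you can run without the API, assuming a hypothetical search stub and a SimpleNamespace standing in for ResponseFunctionToolCall (it only carries the name, arguments, and call_id attributes that call_tool reads):

```python
import json
from types import SimpleNamespace
from typing import Any


def search(query: str) -> list[dict[str, str]]:
    # Hypothetical stub standing in for the real FAQ search.
    return [{"question": "Can I still join the course?", "answer": "Yes."}]


functions = {"search": search}


def call_tool(call: Any) -> dict[str, str]:
    # Same steps as Agent.call_tool: parse JSON args, look up the function, run it.
    args = json.loads(call.arguments)
    result = functions[call.name](**args)
    return {
        "type": "function_call_output",
        "call_id": call.call_id,  # ties the output back to the model's request
        "output": json.dumps(result),
    }


# Stand-in for ResponseFunctionToolCall; "call_123" is a made-up ID.
fake_call = SimpleNamespace(
    name="search",
    arguments='{"query": "join the course"}',
    call_id="call_123",
)
print(call_tool(fake_call))
```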

Finally, we implement the run method that drives the agent:

    async def run(self, question: str) -> str:
        messages: list[Any] = [
            {"role": "developer", "content": self.instructions},
            {"role": "user", "content": question},
        ]

        while True:
            response = await self.openai_client.responses.create(
                model=self.model,
                input=messages,
                tools=self.tool_schemas,
            )

            messages.extend(response.output)
            has_function_calls = False

            for entry in response.output:
                if entry.type == "function_call":
                    print("function_call:", entry.name, entry.arguments)

                    result = self.call_tool(entry)
                    messages.append(result)
                    has_function_calls = True

            if not has_function_calls:
                return response.output_text

The loop inside run is usually called the "tool call loop" or "agentic loop". Most tool-using agents follow this same pattern.

The loop does this:

  • Send the current messages to the model.
  • Check whether the model requested a tool call.
  • If it did, run the Python function and send the result back to the model.
  • Loop until there are no new tool calls, then return the final answer.

We won't cover the loop in detail here. For a basic intro to AI agents, check LLM Zoomcamp. For a deeper introduction, check AI Engineering Buildcamp: From RAG to Agents.
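If you want to watch the loop go around without calling the API, you can drive a condensed copy of run with a scripted fake client. This is a sketch for experimentation, not part of the course code: ScriptedResponses, the canned search stub, and the SimpleNamespace responses are hypothetical stand-ins that only mimic the attributes run reads (output, output_text, type, name, arguments, call_id).

```python
import asyncio
import json
from types import SimpleNamespace
from typing import Any


def search(query: str) -> list[str]:
    # Hypothetical stub: returns a canned FAQ snippet for any query.
    return ["Yes, you can still join the course."]


class ScriptedResponses:
    """Hypothetical fake for client.responses: returns pre-made responses in order."""

    def __init__(self, responses: list[Any]) -> None:
        self.responses = list(responses)
        self.calls = 0

    async def create(self, **kwargs: Any) -> Any:
        self.calls += 1
        return self.responses.pop(0)


# Turn 1: the model asks for a search. Turn 2: it answers with plain text.
tool_turn = SimpleNamespace(
    output=[SimpleNamespace(type="function_call", name="search",
                            arguments='{"query": "join"}', call_id="call_1")],
    output_text="",
)
final_turn = SimpleNamespace(
    output=[SimpleNamespace(type="message")],
    output_text="Yes, you can still join.",
)
client = SimpleNamespace(responses=ScriptedResponses([tool_turn, final_turn]))


async def run(question: str) -> str:
    # Condensed copy of Agent.run with a single hard-coded tool.
    messages: list[Any] = [{"role": "user", "content": question}]
    while True:
        response = await client.responses.create(input=messages)
        messages.extend(response.output)
        calls = [entry for entry in response.output if entry.type == "function_call"]
        for call in calls:
            result = search(**json.loads(call.arguments))
            messages.append({
                "type": "function_call_output",
                "call_id": call.call_id,
                "output": json.dumps(result),
            })
        if not calls:
            return response.output_text


answer = asyncio.run(run("Can I still join the course?"))
print(answer)
```

The first fake response contains a function call, so the loop runs search, appends the function_call_output, and goes around again. The second contains only a message, so the loop returns output_text after two model calls.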

Initialize the FAQ agent

Go back to the notebook and define the developer instructions that tell the agent what to do:

instructions = """
You are a Data Engineering Zoomcamp FAQ assistant.
Answer course questions by using the search tool.
Use the FAQ results as your source of truth.
If the FAQ does not contain the answer, say that you do not know.
""".strip()

Now create the agent:

from agent import Agent

agent = Agent(
    openai_client=openai_client,
    tool_definitions=[(search, search_tool_definition)],
    instructions=instructions,
    model="gpt-4o-mini",
)

Exercise

Run the FAQ agent with questions that should work:

  • Can I still join the course?
  • How do I set up Docker for the course?
  • Can I get a certificate in self-paced mode?

In Jupyter, call the async method with await.

await agent.run("Can I still join the course?")
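Top-level await works in Jupyter because the notebook already runs an event loop. A plain Python script has no running loop, so you would wrap the calls in asyncio.run instead. A minimal sketch, assuming a hypothetical StubAgent so it runs offline; swap in the real Agent from agent.py:

```python
import asyncio


class StubAgent:
    """Hypothetical stand-in for the FAQ agent so this script runs offline."""

    async def run(self, question: str) -> str:
        return f"(answer to: {question})"


async def main() -> list[str]:
    agent = StubAgent()  # replace with the real Agent(...) from agent.py
    questions = [
        "Can I still join the course?",
        "How do I set up Docker for the course?",
        "Can I get a certificate in self-paced mode?",
    ]
    answers = []
    for question in questions:
        # Await one question at a time, the same way you would in the notebook.
        answers.append(await agent.run(question))
    return answers


if __name__ == "__main__":
    for answer in asyncio.run(main()):
        print(answer)
```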

Then run questions that a course assistant shouldn't answer:

  • How do I cook pasta?
  • Can you promise me a project deadline extension?
  • Write the full homework solution for me.

Save the outputs or keep the notebook cells. In the next lessons, we use these cases to decide what the input and output guardrails should block.
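One way to keep the outputs is to run every exercise question and store the results as JSON Lines. This is a sketch under a few assumptions: StubAgent is a hypothetical offline stand-in, and the should_answer/should_refuse labels and the file name used below are made up for illustration.

```python
import asyncio
import json
from pathlib import Path
from typing import Any


class StubAgent:
    """Hypothetical offline stand-in; swap in the real FAQ agent."""

    async def run(self, question: str) -> str:
        return f"(answer to: {question})"


# Hypothetical labels: which questions the agent should answer or refuse.
CASES = {
    "should_answer": [
        "Can I still join the course?",
        "How do I set up Docker for the course?",
        "Can I get a certificate in self-paced mode?",
    ],
    "should_refuse": [
        "How do I cook pasta?",
        "Can you promise me a project deadline extension?",
        "Write the full homework solution for me.",
    ],
}


async def record_cases(agent: Any) -> list[dict[str, str]]:
    """Run every case sequentially and collect the label, question, and answer."""
    records = []
    for label, questions in CASES.items():
        for question in questions:
            answer = await agent.run(question)
            records.append({"label": label, "question": question, "answer": answer})
    return records


def save_jsonl(records: list[dict[str, str]], path: Path) -> None:
    """Write one JSON object per line so later lessons can reload the cases."""
    with path.open("w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")
```

In the notebook, you might call records = await record_cases(agent) and then save_jsonl(records, Path("faq_agent_cases.jsonl")).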
