Part 3: Make the instructions stronger
So far we can handle one function call by hand. A real agent loop needs to handle messages, function calls, tool results, and repeated API requests until the model stops asking for tools.
Start by giving the agent more explicit instructions:
developer_prompt = """
You're a course teaching assistant.
You're given a question from a course student and your task is to answer it.
If you want to look up the answer, explain why before making the call. Use as many
keywords from the user question as possible in your first request.
Make multiple searches. Try to expand your search by using new keywords based on the results you
get from the search.
At the end, ask a clarifying question based on what you presented and check whether
there are other areas the user wants to explore.
""".strip()
This prompt changes the behavior. The model may now produce a short assistant message explaining that it will search, then produce one or more function calls.
Add a generic function-call helper
Instead of hard-coding search, create a helper that reads the function
name and arguments from the model output. This is still notebook code,
so it uses globals() for simplicity.
def make_call(call):
    args = json.loads(call.arguments)
    f_name = call.name
    f = globals()[f_name]
    result = f(**args)
    result_json = json.dumps(result, indent=2)

    return {
        "type": "function_call_output",
        "call_id": call.call_id,
        "output": result_json,
    }
The helper returns the exact object shape the Responses API expects for a tool result. That keeps the loop focused on control flow.
Gotcha: this globals() lookup is fine for a notebook demonstration. In application code, use an explicit registry like {"search": search} so the model can only call functions you intentionally expose.
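A registry-based version of the helper is a small change. This is a self-contained sketch: the toy search function, the SimpleNamespace stand-in for the SDK's call object, and the example values are all hypothetical, added here only so the snippet runs on its own.

```python
import json
from types import SimpleNamespace  # stand-in for the SDK's function-call object

def search(query):
    # Toy search backend, just for this sketch.
    return [{"question": "demo question", "answer": "demo answer"}]

# Explicit registry: the model can only call what is listed here.
TOOLS = {"search": search}

def make_call(call):
    args = json.loads(call.arguments)
    f = TOOLS[call.name]  # raises KeyError for any function you did not expose
    result = f(**args)

    return {
        "type": "function_call_output",
        "call_id": call.call_id,
        "output": json.dumps(result, indent=2),
    }

# Simulated function call, shaped like a model output entry:
call = SimpleNamespace(name="search", arguments='{"query": "course start"}', call_id="c1")
print(make_call(call)["call_id"])  # c1
```

The behavior is identical for the functions you registered; the only difference is that an unexpected function name fails loudly instead of reaching into the notebook's global namespace.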
Process one model response
When a response comes back, append every output entry to
chat_messages. If the entry is a message, display it. If it is a
function call, run the function and append the result.
response = openai_client.responses.create(
    model="gpt-4o-mini",
    input=chat_messages,
    tools=[search_tool],
)

has_function_calls = False

for entry in response.output:
    chat_messages.append(entry)

    if entry.type == "message":
        print(entry.content[0].text)

    if entry.type == "function_call":
        print("function_call:", entry.name, entry.arguments)
        result = make_call(entry)
        chat_messages.append(result)
        has_function_calls = True
The flag tells us whether the model needs another API call. If the
response contains a function call, the new chat_messages contains tool
output the model has not seen yet.
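Stripped of the side effects, the flag is just this check. The SimpleNamespace objects below are hypothetical stand-ins for the SDK's output entries, used so the sketch runs without an API call:

```python
from types import SimpleNamespace

def needs_another_call(output):
    # True when the output contains a function call, meaning
    # chat_messages now holds tool results the model has not seen.
    return any(entry.type == "function_call" for entry in output)

print(needs_another_call([SimpleNamespace(type="message")]))  # False
print(needs_another_call([SimpleNamespace(type="message"),
                          SimpleNamespace(type="function_call")]))  # True
```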
Repeat until there are no tool calls
Wrap the response processing in a loop. The loop stops only when the model returns messages without function calls.
while True:
    response = openai_client.responses.create(
        model="gpt-4o-mini",
        input=chat_messages,
        tools=[search_tool],
    )
    chat_messages.extend(response.output)

    has_function_calls = False

    for entry in response.output:
        if entry.type == "message":
            print(entry.content[0].text)

        if entry.type == "function_call":
            print("function_call:", entry.name, entry.arguments)
            result = make_call(entry)
            chat_messages.append(result)
            has_function_calls = True

    if not has_function_calls:
        break
This is the core agent loop. The model reasons about the next action, your code performs the action, and the model sees the result on the next turn.
Turn it into a chat loop
The outer loop asks the user for the next question. The inner loop keeps calling the model until one user question is fully answered.
while True:
    question = input()
    if question == "stop":
        break

    message = {"role": "user", "content": question}
    chat_messages.append(message)

    while True:
        response = openai_client.responses.create(
            model="gpt-4o-mini",
            input=chat_messages,
            tools=[search_tool],
        )

        has_tool_calls = False

        for entry in response.output:
            chat_messages.append(entry)

            if entry.type == "function_call":
                print("function_call:", entry)
                result = make_call(entry)
                chat_messages.append(result)
                has_tool_calls = True
            elif entry.type == "message":
                print(entry.content[0].text)

        if not has_tool_calls:
            break
This handwritten version is the best way to understand what agent frameworks later hide from you.
Note: with the Responses API, you append the model output objects directly. With the older Chat Completions API, the message format is different and role fields matter more.
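As a rough sketch of that difference (the field values here are illustrative placeholders, not real API output):

```python
# Responses API: a tool result is a standalone input item.
responses_item = {
    "type": "function_call_output",
    "call_id": "call_123",  # illustrative id
    "output": '{"answer": "..."}',
}

# Chat Completions API: a tool result is a message with role "tool",
# linked back to the assistant's tool call via tool_call_id.
chat_completions_message = {
    "role": "tool",
    "tool_call_id": "call_123",
    "content": '{"answer": "..."}',
}
```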
RAG and text search
This FAQ assistant is already RAG: retrieval augmented generation. The
retrieval part is minsearch, and the generation part is the LLM answer
that uses the retrieved FAQ entries.
The retrieval backend can be text search, vector search, a database
query, an API call, or anything else. The agent pattern does not depend
on minsearch. In this workshop, search is a small function so we can
focus on tool use.
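To make that concrete, here is a hypothetical drop-in replacement for search() with the same signature but a different backend: a naive keyword-overlap scorer over a small in-memory FAQ list instead of minsearch. The FAQ entries are invented for the sketch.

```python
# Invented FAQ data, just for this sketch.
FAQ = [
    {"question": "When does the course start?", "answer": "In January."},
    {"question": "Can I join late?", "answer": "Yes, submit homework on time."},
]

def search(query, num_results=2):
    # Score each document by how many query words it shares with it.
    q_words = set(query.lower().split())
    scored = []
    for doc in FAQ:
        doc_words = set((doc["question"] + " " + doc["answer"]).lower().split())
        scored.append((len(q_words & doc_words), doc))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # Keep only documents that matched at least one word.
    return [doc for score, doc in scored[:num_results] if score > 0]

print(search("when does the course start")[0]["answer"])  # In January.
```

Because the function name, arguments, and return shape stay the same, neither the tool definition nor the agent loop needs to change when the backend is swapped.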