Part 3: Make the instructions stronger
So far we can handle one function call by hand. A real agent loop needs to handle messages, function calls, tool results, and repeated API requests until the model stops asking for tools.
Start by giving the agent more explicit instructions:
developer_prompt = """
You're a course teaching assistant.
You're given a question from a course student and your task is to answer it.
If you want to look up the answer, explain why before making the call. Use as many
keywords from the user question as possible in your first request.
Make multiple searches. Try to expand your search by using new keywords based on the results you
get from the search.
At the end, ask a clarifying question based on what you presented and check whether
there are other areas the user wants to explore.
""".strip()
This prompt changes the behavior. The model may now produce a short assistant message explaining that it will search, then produce one or more function calls.
Add a generic function-call helper
Instead of hard-coding search, create a helper that reads the function
name and arguments from the model output. This is still notebook code,
so it uses globals() for simplicity.
def make_call(call):
    args = json.loads(call.arguments)
    f_name = call.name
    f = globals()[f_name]
    result = f(**args)
    result_json = json.dumps(result, indent=2)

    return {
        "type": "function_call_output",
        "call_id": call.call_id,
        "output": result_json,
    }
The helper returns the exact object shape the Responses API expects for a tool result. That keeps the loop focused on control flow.
Gotcha: this globals() lookup is fine for a notebook demonstration. In application code, use an explicit registry like {"search": search} so the model can only call functions you intentionally expose.
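A registry-based version of the helper is a small change. This is a self-contained sketch: the toy search function, the SimpleNamespace stand-in for the SDK's call object, and the example values are all hypothetical, added here only so the snippet runs on its own.

```python
import json
from types import SimpleNamespace  # stand-in for the SDK's function-call object

def search(query):
    # Toy search backend, just for this sketch.
    return [{"question": "demo question", "answer": "demo answer"}]

# Explicit registry: the model can only call what is listed here.
TOOLS = {"search": search}

def make_call(call):
    args = json.loads(call.arguments)
    f = TOOLS[call.name]  # raises KeyError for any function you did not expose
    result = f(**args)

    return {
        "type": "function_call_output",
        "call_id": call.call_id,
        "output": json.dumps(result, indent=2),
    }

# Simulated function call, shaped like a model output entry:
call = SimpleNamespace(name="search", arguments='{"query": "course start"}', call_id="c1")
print(make_call(call)["call_id"])  # c1
```

The behavior is identical for the functions you registered; the only difference is that an unexpected function name fails loudly instead of reaching into the notebook's global namespace.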
Process one model response
When a response comes back, append every output entry to
chat_messages. If the entry is a message, display it. If it is a
function call, run the function and append the result.
response = openai_client.responses.create(
    model="gpt-4o-mini",
    input=chat_messages,
    tools=[search_tool],
)

has_function_calls = False

for entry in response.output:
    chat_messages.append(entry)

    if entry.type == "message":
        print(entry.content[0].text)

    if entry.type == "function_call":
        print("function_call:", entry.name, entry.arguments)
        result = make_call(entry)
        chat_messages.append(result)
        has_function_calls = True
The flag tells us whether the model needs another API call. If the
response contains a function call, the new chat_messages contains tool
output the model has not seen yet.
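Stripped of the side effects, the flag is just this check. The SimpleNamespace objects below are hypothetical stand-ins for the SDK's output entries, used so the sketch runs without an API call:

```python
from types import SimpleNamespace

def needs_another_call(output):
    # True when the output contains a function call, meaning
    # chat_messages now holds tool results the model has not seen.
    return any(entry.type == "function_call" for entry in output)

print(needs_another_call([SimpleNamespace(type="message")]))  # False
print(needs_another_call([SimpleNamespace(type="message"),
                          SimpleNamespace(type="function_call")]))  # True
```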
Repeat until there are no tool calls
Wrap the response processing in a loop. The loop stops only when the model returns messages without function calls.
while True:
    response = openai_client.responses.create(
        model="gpt-4o-mini",
        input=chat_messages,
        tools=[search_tool],
    )
    chat_messages.extend(response.output)

    has_function_calls = False

    for entry in response.output:
        if entry.type == "message":
            print(entry.content[0].text)

        if entry.type == "function_call":
            print("function_call:", entry.name, entry.arguments)
            result = make_call(entry)
            chat_messages.append(result)
            has_function_calls = True

    if not has_function_calls:
        break
This is the core agent loop. The model reasons about the next action, your code performs the action, and the model sees the result on the next turn.
Turn it into a chat loop
The outer loop asks the user for the next question. The inner loop keeps calling the model until one user question is fully answered.
while True:
    question = input()
    if question == "stop":
        break

    message = {"role": "user", "content": question}
    chat_messages.append(message)

    while True:
        response = openai_client.responses.create(
            model="gpt-4o-mini",
            input=chat_messages,
            tools=[search_tool],
        )

        has_tool_calls = False

        for entry in response.output:
            chat_messages.append(entry)

            if entry.type == "function_call":
                print("function_call:", entry)
                result = make_call(entry)
                chat_messages.append(result)
                has_tool_calls = True
            elif entry.type == "message":
                print(entry.content[0].text)

        if not has_tool_calls:
            break
This handwritten version is the best way to understand what agent frameworks later hide from you.
Note: with the Responses API, you append the model output objects directly. With the older Chat Completions API, the message format is different and role fields matter more.
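As a rough sketch of that difference (the field values here are illustrative placeholders, not real API output):

```python
# Responses API: a tool result is a standalone input item.
responses_item = {
    "type": "function_call_output",
    "call_id": "call_123",  # illustrative id
    "output": '{"answer": "..."}',
}

# Chat Completions API: a tool result is a message with role "tool",
# linked back to the assistant's tool call via tool_call_id.
chat_completions_message = {
    "role": "tool",
    "tool_call_id": "call_123",
    "content": '{"answer": "..."}',
}
```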
RAG and text search
This FAQ assistant is already RAG: retrieval augmented generation. The
retrieval part is minsearch, and the generation part is the LLM answer
that uses the retrieved FAQ entries.
The retrieval backend can be text search, vector search, a database
query, an API call, or anything else. The agent pattern does not depend
on minsearch. In this workshop, search is a small function so we can
focus on tool use.
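To make that concrete, here is a hypothetical drop-in replacement for search() with the same signature but a different backend: a naive keyword-overlap scorer over a small in-memory FAQ list instead of minsearch. The FAQ entries are invented for the sketch.

```python
# Invented FAQ data, just for this sketch.
FAQ = [
    {"question": "When does the course start?", "answer": "In January."},
    {"question": "Can I join late?", "answer": "Yes, submit homework on time."},
]

def search(query, num_results=2):
    # Score each document by how many query words it shares with it.
    q_words = set(query.lower().split())
    scored = []
    for doc in FAQ:
        doc_words = set((doc["question"] + " " + doc["answer"]).lower().split())
        scored.append((len(q_words & doc_words), doc))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # Keep only documents that matched at least one word.
    return [doc for score, doc in scored[:num_results] if score > 0]

print(search("when does the course start")[0]["answer"])  # In January.
```

Because the function name, arguments, and return shape stay the same, neither the tool definition nor the agent loop needs to change when the backend is swapped.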