Add output guardrails
In the previous part, we started the input guardrail and the FAQ agent together.
If the input guardrail blocks the request, we cancel the FAQ agent. That stops it from spending more tokens on an answer we won't show.
That handles bad inputs, but we still need another check after the agent answers.
Output guardrails check the answer before the user sees it. We use them when the question is allowed, but the answer may still violate the product policy.
For example, a learner can ask about deadlines, and that question belongs to the course FAQ. But the assistant shouldn't promise a deadline extension.
In this lesson, we add output guardrails that reuse these classes from guardrails.py:
- GuardrailDecision
- InputGuardrail
- LLMInputGuardrail
- GuardedAgent
- RunnableAgent
The new code is small: an output guardrail class and an update to GuardedAgent so that it checks the answer after the FAQ agent returns it.
Safety policy
We already have LLMInputGuardrail, and the output guardrail uses the same structured-output pattern with a different method name.
For output checking, we only need different instructions:
safety_guardrail_instructions = """
Decide if the FAQ assistant answer is safe to show to the learner.
Block answers that promise deadline extensions, refunds, grades,
private student information, medical advice, legal advice, or full
homework solutions.
Allow answers that explain the official policy, point to the FAQ, or ask
the learner to contact course staff.
""".strip()
This policy is separate from the topic policy. A question can be on topic and still lead to an unsafe answer.
Output guardrail interface
Input and output guardrails protect different points in the agent flow:
- Input guardrails get the user question.
- Output guardrails get the agent answer.
That difference is important enough to make it visible in the interface.
Add this protocol to guardrails.py:
class OutputGuardrail(Protocol):
    async def check_output(self, answer: str) -> GuardrailDecision:
        ...
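Because OutputGuardrail is a Protocol, any class with a matching check_output method satisfies it, with or without an LLM behind it. The sketch below illustrates that with a rule-based guardrail. KeywordOutputGuardrail, run_check, and the dataclass stand-in for GuardrailDecision are hypothetical names used only for this example; the real GuardrailDecision lives in guardrails.py and may have more fields.

```python
import asyncio
import re
from dataclasses import dataclass
from typing import Protocol


# Stand-in for the course's GuardrailDecision (assumed fields: reasoning, fail).
@dataclass
class GuardrailDecision:
    reasoning: str
    fail: bool


class OutputGuardrail(Protocol):
    async def check_output(self, answer: str) -> GuardrailDecision:
        ...


class KeywordOutputGuardrail:
    """Hypothetical rule-based guardrail: blocks answers that match a pattern."""

    def __init__(self, patterns: list[str]):
        self.patterns = [re.compile(p, re.IGNORECASE) for p in patterns]

    async def check_output(self, answer: str) -> GuardrailDecision:
        for pattern in self.patterns:
            if pattern.search(answer):
                return GuardrailDecision(
                    reasoning=f"matched blocked pattern: {pattern.pattern}",
                    fail=True,
                )
        return GuardrailDecision(reasoning="no blocked pattern found", fail=False)


async def run_check(guardrail: OutputGuardrail, answer: str) -> GuardrailDecision:
    # Accepts any object whose check_output matches the protocol.
    return await guardrail.check_output(answer)


keyword_guardrail = KeywordOutputGuardrail([r"grant you .*extension"])
blocked = asyncio.run(
    run_check(keyword_guardrail, "Yes, I can grant you a deadline extension.")
)
allowed = asyncio.run(
    run_check(keyword_guardrail, "Please contact course staff about deadlines.")
)
```

In a notebook, replace asyncio.run(...) with await, because the notebook already runs an event loop. A cheap check like this could sit in the same list as an LLM-based guardrail, since both expose the same method.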
Safety guardrail
The output guardrail uses the same structured decision as the input guardrail. It receives an answer instead of a question.
Add the LLM output guardrail to guardrails.py:
class LLMOutputGuardrail(OutputGuardrail):
    def __init__(
        self,
        openai_client: AsyncOpenAI,
        instructions: str,
        name: str,
    ):
        self.openai_client = openai_client
        self.instructions = instructions
        self.name = name

    async def check_output(self, answer: str) -> GuardrailDecision:
        print(f"[output:{self.name}] checking:", answer)
        response = await self.openai_client.responses.parse(
            model="gpt-4o-mini",
            input=[
                {"role": "developer", "content": self.instructions},
                {"role": "user", "content": answer},
            ],
            text_format=GuardrailDecision,
        )
        decision = response.output_parsed
        print(f"[output:{self.name}] decision:", decision)
        return decision
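check_output returns response.output_parsed as is. We assume here that this value can be None when the model returns nothing that parses into GuardrailDecision, and that the API call itself can raise. For a safety check, one option is to fail closed, which means treating a missing or failed decision as a block. The sketch below shows that idea with a hypothetical wrapper, FailClosedOutputGuardrail, and stub guardrails that stand in for the LLM call. None of these names are part of guardrails.py.

```python
import asyncio
from dataclasses import dataclass


# Stand-in for the course's GuardrailDecision (assumed fields: reasoning, fail).
@dataclass
class GuardrailDecision:
    reasoning: str
    fail: bool


class FailClosedOutputGuardrail:
    """Hypothetical wrapper: block when the inner check errors or returns None."""

    def __init__(self, inner):
        self.inner = inner

    async def check_output(self, answer: str) -> GuardrailDecision:
        try:
            decision = await self.inner.check_output(answer)
        except Exception as error:
            # Task cancellation is not an Exception subclass, so it still propagates.
            return GuardrailDecision(reasoning=f"guardrail error: {error}", fail=True)
        if decision is None:
            return GuardrailDecision(reasoning="guardrail returned no decision", fail=True)
        return decision


# Stubs that imitate the three situations without calling a model.
class EmptyGuardrail:
    async def check_output(self, answer: str):
        return None


class BrokenGuardrail:
    async def check_output(self, answer: str):
        raise RuntimeError("model unavailable")


class PassingGuardrail:
    async def check_output(self, answer: str):
        return GuardrailDecision(reasoning="looks fine", fail=False)


empty_result = asyncio.run(
    FailClosedOutputGuardrail(EmptyGuardrail()).check_output("any answer")
)
broken_result = asyncio.run(
    FailClosedOutputGuardrail(BrokenGuardrail()).check_output("any answer")
)
passing_result = asyncio.run(
    FailClosedOutputGuardrail(PassingGuardrail()).check_output("any answer")
)
```

Failing closed trades availability for safety: a model outage blocks every answer. Whether that is the right trade depends on the product, so treat it as a design choice, not a requirement.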
Create the safety guardrail:
from guardrails import LLMOutputGuardrail

safety_guardrail = LLMOutputGuardrail(
    openai_client=openai_client,
    instructions=safety_guardrail_instructions,
    name="safety",
)
Test it before putting it behind the agent:
answer = """
Yes, I can grant you a deadline extension for the project.
""".strip()
decision = await safety_guardrail.check_output(answer)
decision
The guardrail should return fail=True for this answer.
Add output checks to the guarded agent
Now update GuardedAgent so it can receive both lists:
class GuardedAgent(RunnableAgent):
    def __init__(
        self,
        agent: RunnableAgent,
        input_guardrails: list[InputGuardrail] | None = None,
        output_guardrails: list[OutputGuardrail] | None = None,
    ):
        self.agent = agent
        self.input_guardrails = input_guardrails or []
        self.output_guardrails = output_guardrails or []

    async def run(self, question: str) -> str:
        # Start the input checks and the agent at the same time.
        guardrail_tasks = [
            asyncio.create_task(guardrail.check_input(question))
            for guardrail in self.input_guardrails
        ]
        agent_task = asyncio.create_task(self.agent.run(question))

        # Handle input decisions as they arrive; the first failure cancels the agent.
        for task in asyncio.as_completed(guardrail_tasks):
            decision = await task
            if decision.fail:
                agent_task.cancel()
                try:
                    await agent_task
                except asyncio.CancelledError:
                    pass
                return f"[INPUT BLOCKED] {decision.reasoning}"

        # All input checks passed, so wait for the agent's answer.
        answer = await agent_task

        # Check the answer in order; the first failing guardrail blocks it.
        for guardrail in self.output_guardrails:
            decision = await guardrail.check_output(answer)
            if decision.fail:
                return "[OUTPUT BLOCKED] I cannot provide that answer."
        return answer
This wrapper works even when the agent comes from another framework, as long as the agent is exposed through the RunnableAgent interface. The framework can keep its own tool loop. GuardedAgent still runs its checks around that loop.
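To make the framework point concrete, here is a minimal sketch of an adapter. It assumes that RunnableAgent only requires an async run(question) -> str method, as GuardedAgent.run suggests. FakeFrameworkAgent and its invoke method are invented for the example and do not describe any real framework's API.

```python
import asyncio


class FakeFrameworkAgent:
    """Hypothetical third-party agent with its own blocking API and tool loop."""

    def invoke(self, payload: dict) -> dict:
        return {"output": f"FAQ answer for: {payload['input']}"}


class FrameworkAgentAdapter:
    """Exposes the framework agent through the run(question) -> str shape."""

    def __init__(self, framework_agent: FakeFrameworkAgent):
        self.framework_agent = framework_agent

    async def run(self, question: str) -> str:
        # Run the blocking call in a worker thread so the event loop stays
        # free for guardrail checks that run at the same time.
        result = await asyncio.to_thread(
            self.framework_agent.invoke, {"input": question}
        )
        return result["output"]


adapted_agent = FrameworkAgentAdapter(FakeFrameworkAgent())
adapter_answer = asyncio.run(adapted_agent.run("How do I set up Docker?"))
```

An adapter like this could be passed as the agent argument of GuardedAgent. One caveat: cancelling the task does not interrupt a thread that is already running, so the token savings from cancellation apply mainly to agents with a native async API.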
Create the output-guarded agent:
from guardrails import GuardedAgent

output_guarded_agent = GuardedAgent(
    agent=agent,
    output_guardrails=[safety_guardrail],
)
Run it with a deadline question:
await output_guarded_agent.run(
    "I'm running late on my project. Can I get a deadline extension?"
)
The output check runs after the FAQ agent finishes, so it doesn't save the agent any work. It only stops unsafe answers from reaching the user.
Input and output
Input and output guardrails protect different points in the agent flow.
Input guardrails check the question while the FAQ agent starts working:
input_guarded_agent = GuardedAgent(
    agent=agent,
    input_guardrails=[topic_guardrail],
)
Output guardrails check the answer after the FAQ agent returns it:
output_guarded_agent = GuardedAgent(
    agent=agent,
    output_guardrails=[safety_guardrail],
)
We can put both lists on one guarded agent:
fully_guarded_agent = GuardedAgent(
    agent=agent,
    input_guardrails=[topic_guardrail],
    output_guardrails=[safety_guardrail],
)
fully_guarded_agent is still a RunnableAgent. The guardrail lists only add checks around the main agent call: input checks run alongside it, and output checks run after it.
Exercise
Build the output guardrail and test it with answers that should pass and answers that should be blocked.
Use these example answers:
- To set up Docker, follow the course setup guide and check the FAQ if your container doesn't start.
- Yes, I can grant you a deadline extension for the project.
- Here's the full homework solution you can submit.
- The FAQ doesn't say whether extensions are available. Please contact course staff.
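One way to organize these checks is a small table of answers and expected outcomes. The sketch below is only a harness: StubSafetyGuardrail is a hypothetical keyword stand-in so the code runs offline, and the expected labels are our reading of the safety policy. To run the real check, pass safety_guardrail to evaluate instead of the stub.

```python
import asyncio
from dataclasses import dataclass


# Stand-in for the course's GuardrailDecision (assumed fields: reasoning, fail).
@dataclass
class GuardrailDecision:
    reasoning: str
    fail: bool


class StubSafetyGuardrail:
    """Hypothetical offline stand-in for the LLM-based safety guardrail."""

    async def check_output(self, answer: str) -> GuardrailDecision:
        lowered = answer.lower()
        risky = "grant you" in lowered or "full homework solution" in lowered
        return GuardrailDecision(reasoning="stub keyword check", fail=risky)


# Each case pairs an answer with whether we expect it to be blocked.
cases = [
    ("To set up Docker, follow the course setup guide and check the FAQ "
     "if your container doesn't start.", False),
    ("Yes, I can grant you a deadline extension for the project.", True),
    ("Here's the full homework solution you can submit.", True),
    ("The FAQ doesn't say whether extensions are available. "
     "Please contact course staff.", False),
]


async def evaluate(guardrail, cases) -> list[str]:
    """Return the answers whose decision differs from the expected label."""
    mismatches = []
    for answer, expected_fail in cases:
        decision = await guardrail.check_output(answer)
        if decision.fail != expected_fail:
            mismatches.append(answer)
    return mismatches


mismatches = asyncio.run(evaluate(StubSafetyGuardrail(), cases))
```

An empty mismatches list means every decision matched its label. With the LLM-based guardrail, decisions can vary between runs, so a mismatch is a prompt to read the reasoning field and adjust the instructions.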
Then add safety_guardrail to GuardedAgent and try a deadline question.
Example question:
I'm running late on my project. Can I get a deadline extension?
The safe behavior blocks deadline-extension promises. The answer can still explain the official policy and point the learner to course staff.
After the wrapper works, move the updated GuardedAgent and LLMOutputGuardrail into guardrails.py. Then import them back into the notebook before creating the guarded agent.
Next we combine multiple guardrails, so topic checks, output checks, and academic integrity checks work together.