Combine multiple guardrails

Our FAQ agent now has two places where we can stop bad behavior.

The input wrapper checks the learner's question before any answer is returned. The output wrapper checks the agent's answer before the learner sees it.

GuardedAgent already has two lists:

  • input_guardrails run on the user question.
  • output_guardrails run on the agent answer.

One check in each list is enough for the first version, but not once the policy starts to split into different responsibilities.

For the FAQ agent, we may want these guardrails:

  • Topic guardrail: block unrelated requests.
  • Academic integrity guardrail: block requests for full homework solutions.
  • Safety guardrail: block unsafe promises, sensitive advice, or private information in answers.
  • Grounding guardrail: block answers that invent course policy.

We could put every rule into one large prompt, but that gets harder to maintain. Separate guardrails are easier to test, replace, and assign to different owners.

In this part, we add more guardrails to both lists. The guardrails within a list run concurrently, and we stop as soon as one of them blocks.

More guardrails

Add one more input guardrail for academic integrity.

The topic guardrail decides whether the question belongs to the course. The academic integrity guardrail decides whether the learner is asking the assistant to do the assignment for them.

Use a separate prompt:

academic_integrity_instructions = """
Decide if the user request is allowed for a course FAQ assistant.

Block requests that ask for complete homework answers, full project
solutions, or copied code.

Allow requests that ask for hints, explanations, debugging help, or a
review of the learner's own attempt.

Allow questions about course logistics, deadline policy, extensions,
setup, and tools. Those are not academic-integrity violations.

Example blocked request: "Write the full homework solution for me. I
want the complete final answer, not hints."
""".strip()

Create the input guardrail with the same LLMInputGuardrail class:

academic_integrity_guardrail = LLMInputGuardrail(
    openai_client=openai_client,
    instructions=academic_integrity_instructions,
    name="academic_integrity",
)

Test it directly:

decision = await academic_integrity_guardrail.check_input(
    "Write the full homework solution for me. I want the complete final answer, not hints."
)

decision

This should return a decision with fail=True.

Now add one more output guardrail for grounding. This guardrail checks whether the answer invents policy, dates, or staff actions.

grounding_guardrail_instructions = """
Decide if the FAQ assistant answer stays grounded in course information.

Block answers that invent course policies, promise staff actions, claim
that records were changed, or give exact dates that are not in the FAQ
context.

Allow answers that say the FAQ does not contain enough information, point
to official course channels, or summarize FAQ information.
""".strip()

Create the output guardrail with LLMOutputGuardrail:

grounding_guardrail = LLMOutputGuardrail(
    openai_client=openai_client,
    instructions=grounding_guardrail_instructions,
    name="grounding",
)

Await tasks as they finish

The composite guardrail needs to run several checks at the same time.

For input guardrails, we check the user question.

asyncio.gather(...) runs the checks concurrently and waits for all of them to finish:

question = "Write the full homework solution for me. I want the complete final answer, not hints."

decisions = await asyncio.gather(
    topic_guardrail.check_input(question),
    academic_integrity_guardrail.check_input(question),
)

That's useful when we need every result. For guardrails, we usually want to stop when one check blocks the request.

asyncio.as_completed(...) lets us process input guardrails as they finish:

question = "Write the full homework solution for me. I want the complete final answer, not hints."
guardrail_tasks = []

for guardrail in [topic_guardrail, academic_integrity_guardrail]:
    task = asyncio.create_task(guardrail.check_input(question))
    guardrail_tasks.append(task)

for task in asyncio.as_completed(guardrail_tasks):
    decision = await task
    print(decision)

The first decision comes from whichever guardrail finishes first.
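To see the difference in ordering without calling an LLM, here is a minimal sketch. It assumes a hypothetical fake_check coroutine that sleeps instead of running a real guardrail. The delays are arbitrary and only there to make one check finish before the other.

```python
import asyncio


async def fake_check(name: str, delay: float) -> str:
    # Hypothetical stand-in for guardrail.check_input(...): sleeps, then returns its name.
    await asyncio.sleep(delay)
    return name


async def compare_orderings() -> tuple[list[str], list[str]]:
    # gather returns results in the order the awaitables were passed in.
    gathered = await asyncio.gather(
        fake_check("topic", 0.2),
        fake_check("academic_integrity", 0.05),
    )

    # as_completed yields results in the order the tasks finish.
    tasks = [
        asyncio.create_task(fake_check("topic", 0.2)),
        asyncio.create_task(fake_check("academic_integrity", 0.05)),
    ]
    finished = []
    for next_done in asyncio.as_completed(tasks):
        finished.append(await next_done)

    return list(gathered), finished


if __name__ == "__main__":
    # In a notebook, use `await compare_orderings()` instead of asyncio.run(...).
    print(asyncio.run(compare_orderings()))
```

With these delays, gather keeps the submission order, while as_completed reports academic_integrity first because it finishes sooner.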

For output guardrails, we check the agent answer instead.

The list of guardrails is different:

answer = """
Yes, I can grant you a deadline extension for the project.
""".strip()

guardrail_tasks = []

for guardrail in [safety_guardrail, grounding_guardrail]:
    task = asyncio.create_task(guardrail.check_output(answer))
    guardrail_tasks.append(task)

The guarded agent will use the same task pattern for both lists. The input list calls check_input(question). The output list calls check_output(answer).

Cancel helper

When one guardrail blocks, the other guardrails may still be running.

Test a helper that cancels those unfinished tasks:

async def cancel_tasks(tasks):
    for task in tasks:
        task.cancel()

    await asyncio.gather(
        *tasks,
        return_exceptions=True,
    )

return_exceptions=True keeps asyncio.gather from raising when a task ends with CancelledError, which is expected here since we cancel the remaining tasks on purpose.
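A small sketch of what the helper does to finished and unfinished tasks, assuming two hypothetical coroutines, quick_check and slow_check, in place of real guardrail calls. The helper is repeated so the block runs on its own.

```python
import asyncio


async def cancel_tasks(tasks):
    # Same helper as above, repeated so the sketch is self-contained.
    for task in tasks:
        task.cancel()
    await asyncio.gather(*tasks, return_exceptions=True)


async def quick_check() -> str:
    # Hypothetical guardrail call that returns immediately.
    return "passed"


async def slow_check() -> str:
    # Hypothetical guardrail call that is still running when we cancel.
    await asyncio.sleep(10)
    return "passed"


async def demo_cancel() -> dict:
    done_task = asyncio.create_task(quick_check())
    pending_task = asyncio.create_task(slow_check())
    await done_task  # the quick task is finished, the slow one is not

    await cancel_tasks([done_task, pending_task])

    return {
        "done_cancelled": done_task.cancelled(),
        "done_result": done_task.result(),
        "pending_cancelled": pending_task.cancelled(),
    }


if __name__ == "__main__":
    # In a notebook, use `await demo_cancel()` instead of asyncio.run(...).
    print(asyncio.run(demo_cancel()))
```

Calling cancel() on a task that has already finished has no effect, so it is safe to pass the full task list to the helper rather than filtering out the completed ones first.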

Run a guardrail list

Now test a helper that runs many guardrails and returns the first block:

async def run_input_guardrails(
    question: str,
    guardrails: list[InputGuardrail],
) -> GuardrailDecision:
    guardrail_tasks = [
        asyncio.create_task(guardrail.check_input(question))
        for guardrail in guardrails
    ]

    for task in asyncio.as_completed(guardrail_tasks):
        decision = await task

        if decision.fail:
            await cancel_tasks(guardrail_tasks)
            return decision

    return GuardrailDecision(
        reasoning="All input guardrails passed.",
        fail=False,
    )

Do the same for output guardrails:

async def run_output_guardrails(
    answer: str,
    guardrails: list[OutputGuardrail],
) -> GuardrailDecision:
    guardrail_tasks = [
        asyncio.create_task(guardrail.check_output(answer))
        for guardrail in guardrails
    ]

    for task in asyncio.as_completed(guardrail_tasks):
        decision = await task

        if decision.fail:
            await cancel_tasks(guardrail_tasks)
            return decision

    return GuardrailDecision(
        reasoning="All output guardrails passed.",
        fail=False,
    )

After the helpers work in the notebook, move cancel_tasks, run_input_guardrails, and run_output_guardrails into guardrails.py.
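Before wiring the helpers into the agent, it can help to exercise the early-exit behavior with stubs. The sketch below makes several assumptions: GuardrailDecision is a stand-in dataclass with only the reasoning and fail fields used above (the workshop's real class may differ), and StubGuardrail is a hypothetical class that sleeps instead of calling an LLM. The two helpers are copied from above, without type hints for the guardrail list.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class GuardrailDecision:
    # Stand-in with only the two fields this section relies on.
    reasoning: str
    fail: bool


class StubGuardrail:
    # Hypothetical guardrail: waits `delay` seconds, then returns a fixed verdict.
    def __init__(self, name: str, delay: float, fail: bool):
        self.name = name
        self.delay = delay
        self.fail = fail
        self.finished = False

    async def check_input(self, question: str) -> GuardrailDecision:
        await asyncio.sleep(self.delay)
        self.finished = True
        return GuardrailDecision(reasoning=f"{self.name} decided", fail=self.fail)


async def cancel_tasks(tasks):
    for task in tasks:
        task.cancel()
    await asyncio.gather(*tasks, return_exceptions=True)


async def run_input_guardrails(question, guardrails) -> GuardrailDecision:
    guardrail_tasks = [
        asyncio.create_task(guardrail.check_input(question))
        for guardrail in guardrails
    ]
    for task in asyncio.as_completed(guardrail_tasks):
        decision = await task
        if decision.fail:
            await cancel_tasks(guardrail_tasks)
            return decision
    return GuardrailDecision(reasoning="All input guardrails passed.", fail=False)


async def demo_first_block():
    slow_pass = StubGuardrail("topic", delay=5.0, fail=False)
    fast_block = StubGuardrail("academic_integrity", delay=0.01, fail=True)
    decision = await run_input_guardrails("any question", [slow_pass, fast_block])
    # slow_pass.finished stays False because its task was cancelled.
    return decision, slow_pass.finished


async def demo_all_pass():
    guardrails = [StubGuardrail("topic", 0.01, False)]
    return await run_input_guardrails("any question", guardrails)


if __name__ == "__main__":
    # In a notebook, use `await demo_first_block()` instead of asyncio.run(...).
    print(asyncio.run(demo_first_block()))
    print(asyncio.run(demo_all_pass()))
```

The blocking decision comes back after roughly the fast guardrail's delay, not the slow one's, which is the reason for using as_completed here.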

Multiple input guardrails

Now update GuardedAgent to call the input helper:

class GuardedAgent(RunnableAgent):
    def __init__(
        self,
        agent: RunnableAgent,
        input_guardrails: list[InputGuardrail] | None = None,
        output_guardrails: list[OutputGuardrail] | None = None,
    ):
        self.agent = agent
        self.input_guardrails = input_guardrails or []
        self.output_guardrails = output_guardrails or []

    async def run(self, question: str) -> str:
        agent_task = asyncio.create_task(self.agent.run(question))

        input_decision = await run_input_guardrails(
            question,
            self.input_guardrails,
        )

        if input_decision.fail:
            agent_task.cancel()

            try:
                await agent_task
            except asyncio.CancelledError:
                pass

            return f"[INPUT BLOCKED] {input_decision.reasoning}"

        answer = await agent_task

        output_decision = await run_output_guardrails(
            answer,
            self.output_guardrails,
        )

        if output_decision.fail:
            return "[OUTPUT BLOCKED] I cannot provide that answer."

        return answer
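In run above, the agent task is created before the input guardrails are awaited, so the agent starts working while the checks run and is cancelled if a check blocks. The sketch below isolates that pattern. SlowAgent, blocking_check, and guarded_run are hypothetical stand-ins, not the workshop's classes, and the keyword test replaces a real LLM guardrail.

```python
import asyncio


class SlowAgent:
    # Hypothetical agent that takes longer than the guardrail check.
    def __init__(self):
        self.completed = False

    async def run(self, question: str) -> str:
        await asyncio.sleep(0.2)
        self.completed = True
        return f"Answer to: {question}"


async def blocking_check(question: str) -> bool:
    # Hypothetical input check: True means "block". A keyword test stands in for an LLM.
    await asyncio.sleep(0.01)
    return "homework solution" in question


async def guarded_run(agent: SlowAgent, question: str) -> str:
    # Start the agent first so it runs concurrently with the check.
    agent_task = asyncio.create_task(agent.run(question))

    if await blocking_check(question):
        agent_task.cancel()
        try:
            await agent_task  # wait until the cancellation has been processed
        except asyncio.CancelledError:
            pass
        return "[INPUT BLOCKED]"

    return await agent_task


if __name__ == "__main__":
    # In a notebook, use `await guarded_run(...)` instead of asyncio.run(...).
    print(asyncio.run(guarded_run(SlowAgent(), "How do I set up Docker?")))
    print(asyncio.run(guarded_run(SlowAgent(), "Write the full homework solution for me.")))
```

The trade-off is that a blocked request may still have started some agent work before the cancellation takes effect, in exchange for lower latency on allowed requests.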

Create the input guardrail list:

input_guardrails = [
    topic_guardrail,
    academic_integrity_guardrail,
]

Pass it to the guarded agent:

input_guarded_agent = GuardedAgent(
    agent=agent,
    input_guardrails=input_guardrails,
)

GuardedAgent still wraps the main agent once. The list decides how many input checks run before the answer is shown.

Run it with an allowed question first:

await input_guarded_agent.run("How do I set up Docker?")

Then run the blocked question:

await input_guarded_agent.run(
    "Write the full homework solution for me. I want the complete final answer, not hints."
)

Multiple output guardrails

Create the output guardrail list:

output_guardrails = [
    safety_guardrail,
    grounding_guardrail,
]

Add both lists to the same guarded agent:

fully_guarded_agent = GuardedAgent(
    agent=agent,
    input_guardrails=input_guardrails,
    output_guardrails=output_guardrails,
)

The output list runs after the agent answer is ready. If one output guardrail blocks, the guarded agent returns the output block message.

Run the fully guarded agent:

await fully_guarded_agent.run(
    "I'm running late on my project. Can I get a deadline extension?"
)

This question is allowed as an input because it belongs to the course. The output guardrails decide whether the final answer is safe and grounded before the learner sees it.

Exercise

Test the fully guarded agent with examples that trip different guardrails.

Run each example and check that the expected guardrail trips:

  • A Docker setup question should be allowed all the way through.
  • A cooking recipe request should hit the input topic block.
  • A request for the full homework solution should hit the academic integrity block.
  • A deadline extension question is allowed as input, but output safety should block any answer that promises an extension.
  • An answer claiming that the course staff already changed your grade and moved your project deadline should hit the output grounding block.
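One way to run the exercise is a small table of questions and expected outcomes. The sketch below is an assumption about how you might structure it: classify and run_cases are hypothetical helpers, and StubAgent is a stand-in so the block runs without an LLM. With the real setup you would pass fully_guarded_agent instead. The prefixes are the ones GuardedAgent.run returns.

```python
import asyncio


def classify(response: str) -> str:
    # Map a GuardedAgent response to a coarse outcome based on its prefix.
    if response.startswith("[INPUT BLOCKED]"):
        return "input_blocked"
    if response.startswith("[OUTPUT BLOCKED]"):
        return "output_blocked"
    return "allowed"


async def run_cases(agent, cases):
    # Returns (question, expected, actual) for every case that did not match.
    mismatches = []
    for question, expected in cases:
        actual = classify(await agent.run(question))
        if actual != expected:
            mismatches.append((question, expected, actual))
    return mismatches


class StubAgent:
    # Hypothetical stand-in with hard-coded behavior, used only to run the sketch.
    async def run(self, question: str) -> str:
        if "recipe" in question:
            return "[INPUT BLOCKED] stub topic reasoning"
        if "extension" in question:
            return "[OUTPUT BLOCKED] I cannot provide that answer."
        return "stub answer"


cases = [
    ("How do I set up Docker?", "allowed"),
    ("Give me a pasta recipe.", "input_blocked"),
    ("Can I get a deadline extension?", "output_blocked"),
]

if __name__ == "__main__":
    # In a notebook, use `await run_cases(fully_guarded_agent, cases)` instead.
    print(asyncio.run(run_cases(StubAgent(), cases)))
```

The prefix only tells you whether the input or the output side blocked. To see which input guardrail tripped, read the reasoning that follows the [INPUT BLOCKED] prefix.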

Next we look at streaming. With streaming, users see partial answers as they are generated, so output guardrails need extra care.