Summary and next steps

We built the same safety idea in several forms. Start with the SDK guardrails when your framework supports them. Reach for tool-based or DIY guardrails when you need the pattern in a different framework.

Takeaways

These are the main points:

  1. Guardrails can be agents. They use LLMs to make pass/fail decisions.
  2. Input guardrails block irrelevant or harmful input before the main agent sees it.
  3. Output guardrails catch unsafe responses before the user sees them.
  4. Tool-based guardrails work in any framework that supports tools.
  5. Parallel guardrails reduce added latency.
  6. Cancellation saves tokens when a guardrail trips before the agent finishes.

We saw each point play out in the FAQ assistant. The same base agent could answer course questions, drift into a pizza recipe, or produce a risky policy answer. Each guardrail moved one of those outcomes into an explicit check.

Picking a guardrail type

Use input guardrails for checks on the user message:

  • Topic restriction, such as "only answer about this course".
  • PII filtering before the agent sees the input.
  • Blocking requests that shouldn't start an agent run.

Use output guardrails for checks on the agent response:

  • Deadline extensions, refunds, legal advice, or medical advice.
  • Offensive language.
  • Personal information leakage.
  • Response format or style policies.
  • Homework or exam answers.

Use both when the application has meaningful user-facing risk. A course assistant is a mild example of that risk. Customer support, medical triage, financial advice, and internal tools with private data need stricter versions of the same pattern.
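
As a rough illustration of using both checks around one agent call, here is a minimal sketch. It is not the workshop's implementation: the names (Blocked, check_input, check_output, guarded_answer) are hypothetical, and the keyword and regex rules are simple stand-ins for the LLM-based checks described above.

```python
import re
from typing import Callable

# Hypothetical rules for illustration; real guardrails would use an LLM check.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
RISKY_TOPICS = ("refund", "deadline extension", "legal advice", "medical advice")


class Blocked(Exception):
    """Raised when a check fails; records which stage blocked the run."""

    def __init__(self, stage: str, reason: str):
        super().__init__(f"{stage} guardrail: {reason}")
        self.stage = stage
        self.reason = reason


def check_input(message: str) -> str:
    """Input guardrail: reject off-topic requests, then redact email addresses."""
    if "pizza" in message.lower():
        raise Blocked("input", "off-topic request")
    return EMAIL.sub("[email removed]", message)


def check_output(response: str) -> str:
    """Output guardrail: block responses that touch a risky topic."""
    lowered = response.lower()
    for topic in RISKY_TOPICS:
        if topic in lowered:
            raise Blocked("output", f"response touches on {topic}")
    return response


def guarded_answer(message: str, agent: Callable[[str], str]) -> str:
    """Run the input check, then the agent, then the output check."""
    cleaned = check_input(message)  # the agent never sees blocked input
    return check_output(agent(cleaned))  # the user never sees a blocked response
```

The caller catches Blocked and can use its stage and reason fields to decide what to show the user.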

Framework choices

The OpenAI Agents SDK gives you first-class input and output guardrail hooks. That's the simplest option when you're already using the SDK.

The tool-based approach moves across frameworks more easily. It relies on the agent following the instruction to call the guardrail tool first. Use it when you need something quick or when the framework has no guardrail hook.
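
A tool-based guardrail can be sketched as a plain function plus an instruction that tells the agent to call it first. The function name, keyword list, and instruction text below are hypothetical, and the keyword test is only a stand-in for an LLM check. How you register the function as a tool depends on your framework.

```python
# Hypothetical keyword list; a real guardrail tool would likely call an LLM.
COURSE_KEYWORDS = ("course", "homework", "deadline", "module", "certificate")


def check_course_topic(question: str) -> dict:
    """Guardrail tool: return a verdict the agent reads before answering."""
    lowered = question.lower()
    allowed = any(keyword in lowered for keyword in COURSE_KEYWORDS)
    reason = (
        "question is about the course"
        if allowed
        else "question is not about the course"
    )
    return {"allowed": allowed, "reason": reason}


# The pattern only works if the agent follows this instruction.
AGENT_INSTRUCTIONS = (
    "Before answering, always call check_course_topic with the user's question. "
    "If the result has allowed=false, refuse politely and include the reason. "
    "Only answer when allowed=true."
)
```

Because nothing in the code forces the tool call, this version is easier to port but weaker than a framework-level hook.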

The DIY asyncio runner is useful when you have async agent calls and want cancellation. It takes more code.

In return, you control the run mechanics directly:

  • Start the agent and guardrails together.
  • Return the agent result when all checks pass.
  • Cancel unfinished work when a guardrail raises.
  • Let the caller decide how to present the blocked reason.
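
The four steps above can be sketched with plain asyncio. This is a minimal sketch under stated assumptions, not the workshop's runner: fake_agent, topic_guardrail, GuardrailTripped, and run_with_guardrails are hypothetical names, and the sleeps and keyword test stand in for real LLM calls.

```python
import asyncio

completed: list[str] = []  # records agent calls that ran to the end


class GuardrailTripped(Exception):
    """Raised by a guardrail to block the run (hypothetical name)."""

    def __init__(self, reason: str):
        super().__init__(reason)
        self.reason = reason


async def fake_agent(question: str) -> str:
    # Stand-in for a slow async agent call.
    await asyncio.sleep(0.2)
    completed.append(question)
    return f"Answer to: {question}"


async def topic_guardrail(question: str) -> None:
    # Stand-in for an LLM-based check; finishes faster than the agent.
    await asyncio.sleep(0.01)
    if "pizza" in question.lower():
        raise GuardrailTripped("off-topic: not about the course")


async def run_with_guardrails(agent_coro, guardrail_coros):
    # Start the agent and all guardrails together.
    agent_task = asyncio.create_task(agent_coro)
    guardrail_tasks = [asyncio.create_task(c) for c in guardrail_coros]
    tasks = [agent_task, *guardrail_tasks]
    try:
        # gather re-raises the first guardrail exception as soon as it occurs.
        await asyncio.gather(*guardrail_tasks)
        # All checks passed, so return the agent result.
        return await agent_task
    finally:
        # Cancel whatever is still running, then wait for the cancellations.
        for task in tasks:
            if not task.done():
                task.cancel()
        await asyncio.gather(*tasks, return_exceptions=True)
```

The exception propagates to the caller, which can catch GuardrailTripped and decide how to present its reason. When the guardrail trips first, the agent task is cancelled before it finishes, which is where the token savings come from.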

To learn more

Two concrete references are useful after the workshop:

We run this workshop as part of AI Bootcamp: From RAG to Agents. For a deeper walkthrough of docs.py and the first-agent setup, see the AI Hero email course linked from the repo.

The next topic after guardrails is building Skills.md from scratch.
