Workshops ... Overview and workshop outline

Overview and workshop outline

We start with a simple Data Engineering Zoomcamp FAQ bot. At first it's a normal RAG pipeline that searches the FAQ, builds a prompt, and asks the model to answer. Then we turn it into an agent with one tool and add guardrails around the agent loop.

The workshop stays in a notebook so the moving parts stay small enough to see what's happening. We still use the same patterns you would use in a real support bot.

Many ideas here come from the OpenAI Agents SDK guardrails. The SDK has built-in input and output guardrails. Here we implement the same pattern ourselves, so it works with any agent framework, including frameworks that don't include guardrails by default.

The FAQ bot

The bot answers questions about Data Engineering Zoomcamp.

It uses the FAQ JSON that DataTalks.Club publishes:

https://datatalks.club/faq/json/data-engineering-zoomcamp.json

Each FAQ entry has the course slug, the section, the question, and the answer. That gives us a clean knowledge base with real student questions.

We use three failure cases throughout the workshop:

  • The user asks about something unrelated.
  • The agent promises something it shouldn't.
  • The user asks for a homework answer instead of help.

Segment 1: Introduction to agents (25 minutes)

We start by comparing fixed RAG with agentic RAG.

A fixed RAG pipeline always follows the same path:

  • Search once.
  • Build a prompt.
  • Ask the model to answer.

An agent is different: we give the model one tool, search, and let it decide when to call it. We implement that loop ourselves so the control flow is visible before we add guardrails.

We leave 5 minutes for Q&A after this segment.

Segment 2: Build the FAQ agent (30 minutes)

Next we build the bot that the rest of the workshop protects. We set up the notebook and load the Data Engineering Zoomcamp FAQ JSON. Then we index it with minsearch and wrap search in one function.

In this part, we:

  • Set up the environment and dependencies.
  • Load the FAQ data.
  • Build the search index.
  • Try search directly before involving the model.
  • Define the search tool schema.
  • Wrap the OpenAI Responses API loop in a small class.
  • Try normal and off-topic questions.

Now the agent can answer course questions, but it can also drift. That gives us concrete examples to fix.

The exercise is to run the agent and test it with normal questions, off-topic questions, and unsafe requests.

We leave 5 minutes for Q&A after this segment, then take a 5 minute break.

Segment 3: Add input guardrails (35 minutes)

Now we add the first safety layer before await agent.run(...) runs. The input guardrail checks whether the user question belongs to the Data Engineering Zoomcamp assistant.

In this part, we:

  • Define the topic policy.
  • Return a structured Pydantic decision.
  • Block unrelated questions before the agent runs.
  • Return a clear message when the guardrail trips.

This stops the agent from spending tokens on cooking questions, generic advice, or prompt-injection attempts.

The exercise is to implement and test a topic guardrail that blocks off-topic questions.

We leave 5 minutes for Q&A after this segment.

Segment 4: Add output guardrails (35 minutes)

Input checks aren't enough on their own, because some questions are allowed but the answer can still be unsafe. For example, a student can ask about deadlines, but the assistant shouldn't promise an extension.

In this part, we:

  • Define the response safety policy.
  • Check the answer before the user sees it.
  • Block unsafe promises.
  • Keep safe FAQ answers working.

The key idea is simple: input guardrails decide whether the request may reach the agent. Output guardrails decide whether the answer may reach the user.

The exercise is to build a safety guardrail that blocks deadline extension promises.

We leave 5 minutes for Q&A after this segment, then take a 5 minute break.

Segment 5: Combine multiple guardrails (30 minutes)

Real systems usually need more than one rule. We split the checks so each guardrail has one job and a specific failure message.

In this part, we:

  • Chain multiple guardrails.
  • Add an academic integrity check.
  • Discuss streaming with guardrails.
  • Run checks concurrently when latency matters.

The first input guardrail runner is sequential, which is fine for some applications. But once we have several checks, waiting for each one before starting the FAQ agent adds latency.

So this segment also includes a short async lesson. We use asyncio.create_task to start the guardrail and FAQ agent together. If the guardrail blocks the request, we cancel the agent task.

The exercise is to test the fully guarded agent with streaming enabled.

We leave 5 minutes for Q&A after this segment.

Wrap-up and next steps (15 minutes)

By the end, we have a handwritten agent loop with guardrails around it. The same idea can move into PydanticAI, LangChain, OpenAI Agents SDK, or your own runner.

We close with the main takeaways, when to use each guardrail type, and the final Q&A.

Questions & Answers

Sign up to ask questions, track your progress, and get access to other workshops · Already have an account? Sign in