Workshop Resource

Build a Production-Ready YouTube AI Agent with Temporal

Build a durable data ingestion pipeline, handle IP blocking with proxies, index transcripts into ElasticSearch, and design a multi-stage research agent with Temporal orchestration.

December 16, 2025 · Advanced
ai-agents · data-engineering · temporal · elasticsearch · production-systems

Timestamps


00:00
Introduction & Project Overview

Welcome and overview of building a production-ready YouTube AI agent with Temporal orchestration

02:15
Environment Setup (GitHub Codespaces vs. Local)

Setting up the development environment, comparing GitHub Codespaces and local setup options

07:13
Part 1: Data Ingestion Workflow Setup

Beginning the data ingestion workflow: planning and initial setup

12:00
Formatting Transcripts & Subtitles

Processing and formatting YouTube transcripts and subtitles for indexing
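
As a minimal sketch of this formatting step, the snippet below flattens transcript segments shaped like youtube-transcript-api output (dicts with "text", "start", "duration") into timestamped lines ready for indexing; the exact line format is an illustrative assumption, not the workshop's code:

```python
def format_timestamp(seconds: float) -> str:
    """Render seconds as H:MM:SS, or M:SS when under an hour."""
    s = int(seconds)
    h, rem = divmod(s, 3600)
    m, sec = divmod(rem, 60)
    return f"{h}:{m:02d}:{sec:02d}" if h else f"{m}:{sec:02d}"

def format_transcript(segments: list[dict]) -> str:
    """Join segments into '[M:SS] text' lines, skipping empty cues."""
    lines = []
    for seg in segments:
        text = seg.get("text", "").replace("\n", " ").strip()
        if text:
            lines.append(f"[{format_timestamp(seg['start'])}] {text}")
    return "\n".join(lines)

print(format_transcript([
    {"text": "Welcome to the\nworkshop", "start": 0.0, "duration": 2.1},
    {"text": "Let's get started", "start": 2.1, "duration": 1.8},
]))
# [0:00] Welcome to the workshop
# [0:02] Let's get started
```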

16:05
Deploying ElasticSearch with Docker

Setting up ElasticSearch using Docker for search infrastructure

19:39
Configuring ElasticSearch Indices & Stop Words

Configuring ElasticSearch indices, mappings, and stop words for optimal search
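
A sketch of what such an index definition can look like, with an analyzer that layers extra spoken-language filler words on top of the built-in English stop-word list; the index, field, and filter names here are illustrative assumptions, and the dicts would be passed to elasticsearch-py's `es.indices.create`:

```python
# Custom analysis chain: lowercase tokens, then drop English stop
# words plus filler words common in podcast transcripts.
TRANSCRIPT_SETTINGS = {
    "analysis": {
        "filter": {
            "podcast_stop": {
                "type": "stop",
                # "_english_" expands to Elasticsearch's built-in
                # English stop-word list.
                "stopwords": ["_english_", "um", "uh", "gonna", "like"],
            }
        },
        "analyzer": {
            "transcript_analyzer": {
                "type": "custom",
                "tokenizer": "standard",
                "filter": ["lowercase", "podcast_stop"],
            }
        },
    }
}

# Mappings: exact-match keyword for the ID, date for ordering, and the
# custom analyzer on the transcript body.
TRANSCRIPT_MAPPINGS = {
    "properties": {
        "video_id": {"type": "keyword"},
        "title": {"type": "text"},
        "published_at": {"type": "date"},
        "transcript": {"type": "text", "analyzer": "transcript_analyzer"},
    }
}

# es.indices.create(index="transcripts",
#                   settings=TRANSCRIPT_SETTINGS,
#                   mappings=TRANSCRIPT_MAPPINGS)
```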

30:52
Iterating Through Videos & Progress Tracking

Building the video iteration logic and implementing progress tracking
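
One simple way to track progress across a long ingestion run, sketched below under the assumption of a JSON checkpoint file (the file layout is illustrative): completed video IDs are persisted after each video, so a crashed or interrupted run resumes with only the pending ones.

```python
import json
from pathlib import Path

def load_done(checkpoint: Path) -> set[str]:
    """Read the set of already-ingested video IDs, if any."""
    if checkpoint.exists():
        return set(json.loads(checkpoint.read_text()))
    return set()

def mark_done(done: set[str], video_id: str, checkpoint: Path) -> None:
    """Record one more finished video and persist the checkpoint."""
    done.add(video_id)
    checkpoint.write_text(json.dumps(sorted(done)))

def pending(video_ids: list[str], checkpoint: Path) -> list[str]:
    """Return the videos still left to process, in original order."""
    done = load_done(checkpoint)
    return [v for v in video_ids if v not in done]
```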

35:47
The Challenge: Handling Real-Time YouTube IP Blocking

Understanding the IP blocking problem when scraping YouTube at scale

36:52
The Solution: Implementing Proxies for Scraping

Implementing residential proxies to handle IP blocking and rate limiting
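
The retry-through-proxies idea can be sketched generically as below; `fetch_fn` stands in for whatever call hits YouTube (e.g. a transcript fetch), and the proxy URLs, attempt count, and backoff schedule are illustrative assumptions:

```python
import itertools
import time
from typing import Callable

def fetch_with_proxies(
    fetch_fn: Callable[[str], str],
    proxy_urls: list[str],
    max_attempts: int = 5,
    backoff_s: float = 1.0,
) -> str:
    """Rotate through a proxy pool, retrying with exponential backoff
    until a fetch succeeds or attempts are exhausted."""
    proxies = itertools.cycle(proxy_urls)
    last_err: Exception | None = None
    for attempt in range(max_attempts):
        proxy = next(proxies)
        try:
            return fetch_fn(proxy)
        except Exception as err:  # e.g. an IP-blocked / HTTP 429 error
            last_err = err
            time.sleep(backoff_s * (2 ** attempt))
    raise RuntimeError(f"all {max_attempts} attempts failed") from last_err
```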

47:09
Intro to Temporal: Making Workflows Durable

Introduction to Temporal for building durable, fault-tolerant workflows

58:25
Migrating Logic into Temporal Activities

Refactoring ingestion logic into Temporal activities for reliability

1:06:49
Defining the Ingestion Workflow

Composing the ingestion activities into a complete Temporal workflow definition

1:16:43
Implementing the Temporal Worker

Building the Temporal worker to execute workflows and activities
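
The three Temporal pieces covered above (activities, workflow definition, worker) can be sketched with the Python SDK (temporalio) roughly as follows; the activity body, task-queue name, and timeouts are illustrative placeholders, not the workshop's exact code:

```python
import asyncio
from datetime import timedelta

from temporalio import activity, workflow
from temporalio.client import Client
from temporalio.worker import Worker

@activity.defn
async def ingest_video(video_id: str) -> str:
    # Real logic: fetch the transcript (via proxy), format it,
    # and index it into ElasticSearch.
    return f"indexed {video_id}"

@workflow.defn
class IngestionWorkflow:
    @workflow.run
    async def run(self, video_ids: list[str]) -> list[str]:
        results = []
        for vid in video_ids:
            # Temporal retries failed activities per its retry policy
            # and persists progress, so a crash resumes mid-list.
            results.append(
                await workflow.execute_activity(
                    ingest_video,
                    vid,
                    start_to_close_timeout=timedelta(minutes=5),
                )
            )
        return results

async def main() -> None:
    client = await Client.connect("localhost:7233")
    worker = Worker(
        client,
        task_queue="ingestion-queue",
        workflows=[IngestionWorkflow],
        activities=[ingest_video],
    )
    await worker.run()

if __name__ == "__main__":
    asyncio.run(main())
```

Running this requires a Temporal server (e.g. `temporal server start-dev`) listening on localhost:7233.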

1:29:08
Part 2: Building the Research Agent with PydanticAI

Starting Part 2: building the research agent with the PydanticAI framework

1:39:29
Configuring Agent Instructions & Models

Setting up agent instructions, prompts, and model configuration
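
A minimal sketch of this configuration step with PydanticAI is shown below; the model name, instructions text, and `search_transcripts` tool body are illustrative assumptions, not the workshop's exact setup:

```python
from pydantic_ai import Agent

research_agent = Agent(
    "openai:gpt-4o",
    instructions=(
        "You are a research assistant. Answer questions using the "
        "search_transcripts tool over the podcast transcript index, "
        "and cite video titles and timestamps in your answer."
    ),
)

@research_agent.tool_plain
def search_transcripts(query: str) -> str:
    """Search the ElasticSearch transcript index for relevant passages."""
    # Real logic: es.search(index="transcripts",
    #                       query={"match": {"transcript": query}})
    return "…matching transcript passages…"

# result = research_agent.run_sync("What have guests said about RAG?")
```

Running it requires the pydantic-ai package and an OPENAI_API_KEY in the environment.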

1:46:37
Optimization: Adding a Summarization Agent

Adding a secondary summarization agent to handle long contexts effectively
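
The pattern behind such a summarization stage can be sketched as a map-reduce over chunks, assuming a word-count budget stands in for the model's context limit and `summarize_fn` stands in for a call to the secondary agent:

```python
from typing import Callable

def chunk_text(text: str, max_words: int = 2000) -> list[str]:
    """Split text into chunks of at most max_words words."""
    words = text.split()
    return [
        " ".join(words[i : i + max_words])
        for i in range(0, len(words), max_words)
    ]

def summarize_long(
    text: str,
    summarize_fn: Callable[[str], str],
    max_words: int = 2000,
) -> str:
    """Summarize each chunk, then summarize the combined summaries."""
    chunks = chunk_text(text, max_words)
    if len(chunks) == 1:
        return summarize_fn(chunks[0])
    partials = [summarize_fn(c) for c in chunks]
    return summarize_fn("\n".join(partials))
```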

1:57:34
Converting the Script to a Durable Temporal Agent

Migrating the research agent to use Temporal for durability and reliability

2:10:00
Running the Full Durable Research Agent

Executing the complete system: durable ingestion + research agent

2:19:45
Final Results & Next Steps

Reviewing results, key takeaways, and next steps for production deployment

Core Tools

YouTube Transcript API · ElasticSearch · Docker · Temporal · PydanticAI · OpenAI API

What You'll Learn

  • Building a durable data ingestion pipeline
  • Handling IP blocking and retries with proxies
  • Indexing long-form text into ElasticSearch
  • Designing a multi-stage research agent with tool use and summarization
  • Orchestrating workflows with Temporal
  • Handling retries, state, and recovery in production
  • Working with long contexts effectively

Expected Outcome

A production-oriented deep research agent that can answer questions using years of podcast transcripts, backed by a fault-tolerant ingestion workflow