Build a Production-Ready YouTube AI Agent with Temporal
Build a durable data ingestion pipeline, handle IP blocking with proxies, index transcripts into Elasticsearch, and design a multi-stage research agent with Temporal orchestration.
Timestamps
Welcome and overview of building a production-ready YouTube AI agent with Temporal orchestration
Setting up the development environment, comparing GitHub Codespaces and local setup options
Beginning the data ingestion workflow: planning and initial setup
Processing and formatting YouTube transcripts and subtitles for indexing
Setting up Elasticsearch with Docker for search infrastructure
Configuring Elasticsearch indices, mappings, and stop words for better search quality
Building the video iteration logic and implementing progress tracking
Understanding the IP blocking problem when scraping YouTube at scale
Implementing residential proxies to handle IP blocking and rate limiting
Introduction to Temporal for building durable, fault-tolerant workflows
Refactoring ingestion logic into Temporal activities for reliability
Defining the complete ingestion workflow with Temporal workflow definitions
Building the Temporal worker to execute workflows and activities
Starting Part 2: Building the research agent using PydanticAI framework
Setting up agent instructions, prompts, and model configuration
Adding a secondary summarization agent to handle long contexts effectively
Migrating the research agent to use Temporal for durability and reliability
Executing the complete system: durable ingestion + research agent
Reviewing results, key takeaways, and next steps for production deployment
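To give a flavor of the transcript-processing chapters above, here is a minimal sketch of merging raw subtitle cues into chunks sized for indexing. The function name, the `(start_seconds, text)` cue shape, and the field names are illustrative assumptions, not the exact code from the video:

```python
def merge_cues(cues, max_chars=1000):
    """Merge raw subtitle cues into indexable chunks.

    cues: list of (start_seconds, text) pairs, as produced by a
    hypothetical transcript fetcher (names are illustrative).
    Returns a list of dicts ready to index into a search engine.
    """
    chunks, buf, chunk_start = [], [], None
    for start, text in cues:
        text = " ".join(text.split())  # collapse whitespace and newlines
        if not text:
            continue
        if chunk_start is None:
            chunk_start = start  # remember where this chunk begins
        buf.append(text)
        if sum(len(t) + 1 for t in buf) >= max_chars:
            chunks.append({"start": chunk_start, "text": " ".join(buf)})
            buf, chunk_start = [], None
    if buf:  # flush the trailing partial chunk
        chunks.append({"start": chunk_start, "text": " ".join(buf)})
    return chunks
```

Keeping the start timestamp on each chunk lets the agent cite the exact moment in a video when it answers a question.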
What You'll Learn
- Building a durable data ingestion pipeline
- Handling IP blocking and retries with proxies
- Indexing long-form text into Elasticsearch
- Designing a multi-stage research agent with tool use and summarization
- Orchestrating workflows with Temporal
- Handling retries, state, and recovery in production
- Working with long contexts effectively
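As a sketch of the indexing topic above, an Elasticsearch index body for transcript chunks might look like the following. The analyzer, filter choices, and field names are assumptions for illustration; the video's exact mapping may differ:

```python
# Assumed index settings/mappings for transcript chunks; field names
# are illustrative, not taken from the video.
TRANSCRIPT_INDEX = {
    "settings": {
        "analysis": {
            "analyzer": {
                "transcript_text": {
                    "type": "custom",
                    "tokenizer": "standard",
                    # Lowercasing plus English stop-word removal, in the
                    # spirit of the stop-words chapter.
                    "filter": ["lowercase", "stop"],
                }
            }
        }
    },
    "mappings": {
        "properties": {
            "video_id": {"type": "keyword"},  # exact-match filtering
            "start": {"type": "float"},       # seconds into the video
            "text": {"type": "text", "analyzer": "transcript_text"},
        }
    },
}
```

A body like this is what you would pass to the client's index-creation call; `keyword` on `video_id` keeps it filterable without analysis, while `text` fields go through the custom analyzer.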
Expected Outcome
A production-oriented deep research agent that can answer questions using years of podcast transcripts, backed by a fault-tolerant ingestion workflow.
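The fault-tolerance theme can be sketched with a plain-Python backoff loop that rotates proxies on an IP block. In the video's final design, Temporal's built-in retry policies replace hand-rolled loops like this; the names here are illustrative assumptions:

```python
import random
import time


class BlockedError(Exception):
    """Raised when the upstream service rejects our IP (illustrative)."""


def fetch_with_retries(fetch, proxies, max_attempts=5, base_delay=1.0):
    """Call fetch(proxy) with exponential backoff, rotating proxies.

    `fetch` is any callable that raises BlockedError on an IP block;
    `proxies` is a list of proxy URLs (residential proxies in the video).
    """
    last_err = None
    for attempt in range(max_attempts):
        proxy = proxies[attempt % len(proxies)]  # rotate through the pool
        try:
            return fetch(proxy)
        except BlockedError as err:
            last_err = err
            # Exponential backoff with jitter to avoid hammering YouTube
            # from every worker at the same instant.
            time.sleep(base_delay * (2 ** attempt) + random.random() * 0.1)
    raise last_err
```

The point of moving this into a Temporal activity is that the retry schedule, the attempt counter, and the partially completed work all survive a worker crash instead of living in local variables.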