Overview and setup
The system you will build
We build a deep research agent over the DataTalks.Club podcast archive. The data side downloads YouTube transcripts, formats them as timestamped subtitles, indexes them in Elasticsearch, and makes ingestion durable with Temporal. The agent side searches those indexed transcripts, summarizes long episodes when needed, and runs the research agent inside a Temporal workflow.
The final shape has two projects and three supporting services:
- flow/ contains the ingestion workflow.
- agent/ contains the Pydantic AI research agent.
- Elasticsearch stores podcast titles and transcript subtitles.
- Temporal runs the ingestion workflow and later the agent workflow.
- OpenAI powers the research and summarization agents.
The full system looks like this:
This is not a web app workshop. You leave with a working local research workflow and the code structure you can reuse in a production pipeline.
Prerequisites
You can run the workshop locally or in GitHub Codespaces. Codespaces gives you Python, Docker, and a Linux environment, but YouTube transcript fetching from cloud IP addresses can fail. That is why the workshop includes both a cached transcript fallback and a proxy path.
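The cached fallback mentioned above can be sketched roughly as follows. This is a minimal illustration, not the workshop's actual code: the `load_transcript` helper, the `transcripts_cache` directory, and the injected `fetch` callable are all hypothetical names, and the transcript is assumed to be a list of dictionaries with `start` and `text` keys.

```python
import json
from pathlib import Path
from typing import Callable

# Hypothetical cache location; the workshop's real layout may differ.
CACHE_DIR = Path("transcripts_cache")


def load_transcript(
    video_id: str,
    fetch: Callable[[str], list[dict]],
    cache_dir: Path = CACHE_DIR,
) -> list[dict]:
    """Try a live fetch first, then fall back to a cached JSON file."""
    cache_file = cache_dir / f"{video_id}.json"
    try:
        entries = fetch(video_id)
    except Exception as exc:  # blocked cloud IPs surface as library-specific errors
        if not cache_file.exists():
            raise RuntimeError(
                f"no live or cached transcript for {video_id}"
            ) from exc
        return json.loads(cache_file.read_text(encoding="utf-8"))

    # Save the fresh copy so later runs can work without network access.
    cache_dir.mkdir(parents=True, exist_ok=True)
    cache_file.write_text(json.dumps(entries), encoding="utf-8")
    return entries
```

Passing the fetcher in as an argument keeps the fallback logic independent of how the transcript is actually downloaded, so the same function works with a direct request or one routed through a proxy.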
Install or prepare these tools:
- Python 3.13.
- Docker.
- uv for Python package management.
- Jupyter for the notebook phases.
- An OpenAI API key for the agent phase.
- Optional residential proxy credentials for direct YouTube transcript fetching at scale.
- Temporal CLI for local durable workflow execution.
Install uv if you do not have it:
pip install uv
For secrets, create .env files and add them to .gitignore.
The workshop uses these variables:
PROXY_BASE_URL=...
PROXY_USER=...
PROXY_PASSWORD=...
OPENAI_API_KEY=...
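One way to read these variables in Python is sketched below, using only the standard library. The `parse_env` and `load_settings` helpers are hypothetical, and treating `OPENAI_API_KEY` as required and the proxy variables as optional is an assumption based on the prerequisites list.

```python
import os
from pathlib import Path

# Assumption: only the OpenAI key is mandatory; the proxy settings are optional.
REQUIRED = ["OPENAI_API_KEY"]
OPTIONAL = ["PROXY_BASE_URL", "PROXY_USER", "PROXY_PASSWORD"]


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_settings(env_path: Path = Path(".env")) -> dict[str, str]:
    """Merge the .env file with the process environment, which takes priority."""
    settings: dict[str, str] = {}
    if env_path.exists():
        settings = parse_env(env_path.read_text(encoding="utf-8"))
    for name in REQUIRED + OPTIONAL:
        if name in os.environ:
            settings[name] = os.environ[name]
    missing = [name for name in REQUIRED if not settings.get(name)]
    if missing:
        raise RuntimeError("missing settings: " + ", ".join(missing))
    return settings
```

A library such as python-dotenv covers the same ground; the hand-rolled parser here only shows what loading a .env file involves.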
Keep .env out of git:
cat > .gitignore <<'EOF'
.env
.venv
__pycache__/
*.py[oc]
.ipynb_checkpoints/
EOF
Project layout
Create an empty folder for the workshop.
Use flow/ for ingestion and agent/ for research. Only flow/ is needed for now:
mkdir temporal-workshop
cd temporal-workshop
mkdir flow
Initialize the ingestion project with Python 3.13:
cd flow
uv init --python=3.13
Install the first dependencies:
uv add youtube-transcript-api
uv add --dev jupyter
Start Jupyter and create a notebook named notebook.ipynb:
uv run jupyter notebook
We start in a notebook because the first task is exploratory. We want to fetch one transcript, look at the shape, create subtitles, and index one document. Only then do we turn the working notebook code into modules and a Temporal workflow.
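As a preview of the "create subtitles" step, here is a minimal sketch of turning transcript entries into timestamped lines. It assumes each entry is a dictionary with a `start` offset in seconds and a `text` field; the function names and the exact output format are illustrative, not the workshop's final code.

```python
def format_timestamp(seconds: float) -> str:
    """Render an offset in seconds as M:SS, or H:MM:SS past the first hour."""
    total = int(seconds)
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    if hours:
        return f"{hours}:{minutes:02}:{secs:02}"
    return f"{minutes}:{secs:02}"


def make_subtitles(entries: list[dict]) -> str:
    """Join transcript entries into one line per entry: '<timestamp> <text>'."""
    lines = []
    for entry in entries:
        # Collapse line breaks inside an entry so each subtitle stays on one line.
        text = entry["text"].replace("\n", " ")
        lines.append(f"{format_timestamp(entry['start'])} {text}")
    return "\n".join(lines)


sample = [
    {"start": 0.0, "text": "Welcome to the podcast."},
    {"start": 75.4, "text": "Today we talk about\ndata engineering."},
]
print(make_subtitles(sample))
# 0:00 Welcome to the podcast.
# 1:15 Today we talk about data engineering.
```

Keeping the timestamp at the start of each line means a search hit in the indexed text can be traced back to a position in the episode.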
You are ready for Part 1: Fetch one transcript.