Overview and setup
The system you will build
We build a deep research agent over the DataTalks.Club podcast archive. The data side downloads YouTube transcripts, formats them as timestamped subtitles, indexes them in Elasticsearch, and then makes the ingestion durable with Temporal. The agent side searches those indexed transcripts, summarizes long episodes when needed, and then runs the research agent itself inside a Temporal workflow.
The final shape has two projects:
- flow/ contains the ingestion workflow.
- agent/ contains the Pydantic AI research agent.
- Elasticsearch stores podcast titles and transcript subtitles.
- Temporal runs the ingestion workflow and later the agent workflow.
- OpenAI powers the research and summarization agents.
The full system looks like this:
This is not a web app workshop. You leave with a working local research workflow and the code structure you can reuse in a production pipeline.
Prerequisites
You can run the workshop locally or in GitHub Codespaces. Codespaces gives you Python, Docker, and a Linux environment, but YouTube transcript fetching from cloud IP addresses can fail. That is why the workshop includes both a cached transcript fallback and a proxy path.
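The fallback logic can be sketched like this. Note that load_transcript, fetch_live, and the cache layout are hypothetical stand-ins for the workshop code that comes later, not the actual implementation:

```python
import json
from pathlib import Path


def load_transcript(video_id: str, cache_dir: Path, fetch_live) -> list[dict]:
    """Try a live YouTube fetch first; fall back to a cached JSON copy.

    fetch_live is any callable returning raw transcript entries
    (a list of {"text", "start", "duration"} dicts).
    """
    cache_file = cache_dir / f"{video_id}.json"
    try:
        entries = fetch_live(video_id)
        # Refresh the cache whenever the live fetch succeeds.
        cache_file.write_text(json.dumps(entries))
        return entries
    except Exception:
        # Cloud IPs are often blocked by YouTube; serve the cached copy.
        if cache_file.exists():
            return json.loads(cache_file.read_text())
        raise
```

The same shape works whether fetch_live hits YouTube directly or goes through the proxy path.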
Install or prepare:
- Python 3.13.
- Docker.
- uv for Python package management.
- Jupyter for the notebook phases.
- An OpenAI API key for the agent phase.
- Optional residential proxy credentials for direct YouTube transcript fetching at scale.
- Temporal CLI for local durable workflow execution.
Install uv if you do not have it:
pip install uv
For secrets, create .env files and add them to .gitignore. The workshop
uses these variables:
PROXY_BASE_URL=...
PROXY_USER=...
PROXY_PASSWORD=...
OPENAI_API_KEY=...
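The three PROXY_* values are typically combined into one authenticated proxy URL. A sketch, assuming a requests-style proxies dict and an http scheme (the exact scheme depends on your provider):

```python
import os


def build_proxies() -> dict[str, str]:
    """Assemble a requests-style proxies dict from the PROXY_* variables."""
    base = os.environ["PROXY_BASE_URL"]      # host:port from your provider
    user = os.environ["PROXY_USER"]
    password = os.environ["PROXY_PASSWORD"]
    url = f"http://{user}:{password}@{base}"
    # Route both plain and TLS traffic through the same proxy.
    return {"http": url, "https": url}
```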
Keep .env out of git:
cat > .gitignore <<'EOF'
.env
.venv
__pycache__/
*.py[oc]
.ipynb_checkpoints/
EOF
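If you do not want to pull in python-dotenv, a simplified loader is easy to write yourself; this sketch ignores comments and blank lines and does not handle quoting:

```python
import os
from pathlib import Path


def load_env(path: str = ".env") -> None:
    """Read KEY=VALUE lines from a .env file into os.environ."""
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments, and malformed lines
        key, _, value = line.partition("=")
        # setdefault so real environment variables win over the file.
        os.environ.setdefault(key.strip(), value.strip())
```

The OpenAI SDK picks up OPENAI_API_KEY from the environment, so loading the file once at startup is enough.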
Project layout
Create an empty folder for the workshop. Use flow/ for ingestion and
agent/ for research:
mkdir temporal-workshop
cd temporal-workshop
mkdir flow
Initialize the ingestion project with Python 3.13:
cd flow
uv init --python=3.13
Install the first dependencies:
uv add youtube-transcript-api
uv add --dev jupyter
Start Jupyter and create a notebook named notebook.ipynb:
uv run jupyter notebook
We start in a notebook because the first task is exploratory. We want to fetch one transcript, look at the shape, create subtitles, index one document, and only then turn the working notebook code into modules and a Temporal workflow.
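As a preview of the subtitle step, here is one possible formatter over the entries youtube-transcript-api returns (a list of dicts with text, start, and duration fields; recent versions expose this shape via .to_raw_data()). The exact format the workshop settles on may differ:

```python
def to_subtitles(entries: list[dict]) -> str:
    """Render transcript entries as timestamped MM:SS subtitle lines."""
    lines = []
    for entry in entries:
        # start is a float offset in seconds from the beginning of the video.
        minutes, seconds = divmod(int(entry["start"]), 60)
        lines.append(f"{minutes:02d}:{seconds:02d} {entry['text']}")
    return "\n".join(lines)
```

Indexing these timestamped lines, rather than one giant text blob, is what later lets the agent point at specific moments in an episode.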
You are ready for Part 1: Fetch one transcript.