Part 7: Summarize long transcripts

The research agent can search and fetch full subtitles, but long transcripts can exhaust the model's context window. Instead of passing a whole episode to the main research agent, we add a second agent whose only job is to summarize one transcript for the current user query and search history.

Create the summarization instructions:

summarization_instructions = """
Your task is to summarize the provided YouTube transcript for a specific topic.

Select the parts of the transcript that are relevant to the topic and search queries.

Format:
paragraph with discussion (timestamp)
""".strip()

Create the sub-agent:

from pydantic_ai import Agent

summarization_agent = Agent(
    name='summarization',
    instructions=summarization_instructions,
    model='openai:gpt-4o-mini'
)

Test it directly before turning it into a tool. This test gives the summarizer the user query, the search queries, and one full transcript:

user_query = 'how do I get rich with AI?'
search_queries = [
    "investment opportunities in AI",
    "starting AI-focused businesses",
    "AI applications in wealth generation"
]

subtitles = get_subtitles_by_id('1aMuynlLM3o')['subtitles']

Build the prompt:

# Join outside the f-string: backslashes inside f-string
# expressions require Python 3.12+.
search_queries_text = '\n'.join(search_queries)

prompt = f"""
user query:
{user_query}

search engine queries:
{search_queries_text}

subtitles:
{subtitles}
""".strip()

summary_result = await summarization_agent.run(prompt)
print(summary_result.output)

Now the sub-agent has enough context to summarize the transcript for the topic the user asked about, not as a generic episode summary.

Turn summarization into a tool

The summarize tool needs access to the current run context. It reads the original user prompt and the previous search_videos tool calls, fetches the full subtitles, and asks the summarization agent to summarize only the relevant parts.

Start with imports:

import json
import textwrap

from pydantic_ai import RunContext

Define the tool:

async def summarize(ctx: RunContext, video_id: str) -> str:
    """
    Generate a summary for a video based on the conversation history,
    search queries, and the video's subtitles.
    """
    user_queries = []
    search_queries = []

Extract user prompts and search queries from the message history:

    for m in ctx.messages:
        for p in m.parts:
            kind = p.part_kind
            if kind == 'user-prompt':
                user_queries.append(p.content)
            if kind == 'tool-call':
                if p.tool_name == 'search_videos':
                    args = json.loads(p.args)
                    query = args['query']
                    search_queries.append(query)

Fetch subtitles and create the summarization prompt:

    subtitles = get_subtitles_by_id(video_id)['subtitles']

    # Join outside the f-string: backslashes inside f-string
    # expressions require Python 3.12+.
    user_queries_text = '\n'.join(user_queries)
    search_queries_text = '\n'.join(search_queries)

    prompt = textwrap.dedent(f"""
        user query:
        {user_queries_text}

        search engine queries:
        {search_queries_text}

        subtitles:
        {subtitles}
    """).strip()

    summary_result = await summarization_agent.run(prompt)
    return summary_result.output
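Because the history-extraction loop is easy to get wrong, it helps to exercise it without a live model run. The SimpleNamespace objects below are hypothetical stand-ins for Pydantic AI's message parts, shaped just closely enough to drive the loop:

```python
import json
from types import SimpleNamespace

# Hypothetical stand-ins for pydantic_ai message parts (not the real types)
messages = [
    SimpleNamespace(parts=[
        SimpleNamespace(part_kind='user-prompt',
                        content='how do I get rich with AI?'),
    ]),
    SimpleNamespace(parts=[
        SimpleNamespace(part_kind='tool-call',
                        tool_name='search_videos',
                        args=json.dumps({'query': 'investment opportunities in AI'})),
    ]),
]

user_queries = []
search_queries = []

# Same filtering logic as the summarize tool
for m in messages:
    for p in m.parts:
        if p.part_kind == 'user-prompt':
            user_queries.append(p.content)
        if p.part_kind == 'tool-call' and p.tool_name == 'search_videos':
            search_queries.append(json.loads(p.args)['query'])

print(user_queries)    # ['how do I get rich with AI?']
print(search_queries)  # ['investment opportunities in AI']
```

Note that the `tool-call` check guards access to `tool_name`, so user-prompt parts (which have no tool name) pass through safely.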

Replace get_subtitles_by_id with summarize in the main agent tools:

research_agent = Agent(
    name='research_agent',
    instructions=research_instructions,
    model='openai:gpt-4o-mini',
    tools=[search_videos, summarize]
)

Run it with the same callback:

result = await research_agent.run(
    user_prompt='how do I get rich with AI?',
    event_stream_handler=research_agent_callback
)

print(result.output)

If the model does not call summarize when you expect it to, treat that as an instruction design issue, not an Elasticsearch issue. Stronger tool instructions and structured output are good follow-up work.
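If you do experiment with stronger tool instructions, the docstring is the natural place to start, since Pydantic AI passes it to the model as the tool description. A sketch of a more directive version (the wording here is illustrative, not the course's final text):

```python
# Illustrative only: a more directive docstring for the summarize tool.
async def summarize(ctx: 'RunContext', video_id: str) -> str:
    """
    Summarize one video's transcript for the user's research question.

    Always call this instead of fetching raw subtitles: full transcripts
    are too long for the main agent's context window. Pass a video_id
    returned by search_videos.
    """
    ...
```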

Move tools into modules

Convert the notebook into a script:

uv run jupyter nbconvert --to=script agent.ipynb

Create tools.py. Wrap the search functions in a class so the Elasticsearch client and index name are injected as dependencies instead of living in globals:

import json
import textwrap

from pydantic_ai import Agent, RunContext
from elasticsearch import Elasticsearch

class SearchTools:
    def __init__(self, es_client: Elasticsearch, index_name: str):
        self.es_client = es_client
        self.index_name = index_name

The search_videos method is the same Elasticsearch query from the notebook:

    def search_videos(self, query: str, size: int = 5) -> list[dict]:
        body = {
            "size": size,
            "query": {
                "multi_match": {
                    "query": query,
                    "fields": ["title^3", "subtitles"],
                    "type": "best_fields",
                    "analyzer": "english_with_stop_and_stem"
                }
            },
            "highlight": {
                "pre_tags": ["*"],
                "post_tags": ["*"],
                "fields": {
                    "title": {"fragment_size": 150, "number_of_fragments": 1},
                    "subtitles": {"fragment_size": 150, "number_of_fragments": 1}
                }
            }
        }

Search and return snippets:

        response = self.es_client.search(index=self.index_name, body=body)
        hits = response.body['hits']['hits']

        results = []
        for hit in hits:
            highlight = hit['highlight']
            highlight['video_id'] = hit['_id']
            results.append(highlight)

        return results
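The snippet-shaping loop at the end is plain dict manipulation, so it can be sanity-checked against a hand-written hit. The field values below are made up; the shape mirrors an Elasticsearch highlight response:

```python
# Made-up hit shaped like an Elasticsearch search response with highlights
hits = [
    {
        '_id': '1aMuynlLM3o',
        'highlight': {
            'title': ['*AI* business ideas'],
            'subtitles': ['...how to build an *AI* product...'],
        },
    },
]

results = []
for hit in hits:
    highlight = hit['highlight']
    highlight['video_id'] = hit['_id']  # attach the document id to the snippet dict
    results.append(highlight)

print(results[0]['video_id'])  # 1aMuynlLM3o
```

Note that this mutates the highlight dict in place, which is fine here because the response object is not reused afterwards.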

The full transcript retrieval stays in the same class:

    def get_subtitles_by_id(self, video_id: str) -> dict:
        result = self.es_client.get(index=self.index_name, id=video_id)
        return result['_source']

Now create SummarizationTools, which depends on both SearchTools and the summarization agent:

class SummarizationTools:
    def __init__(
        self,
        search_tools: SearchTools,
        summarization_agent: Agent
    ):
        self.search_tools = search_tools
        self.summarization_agent = summarization_agent

Its summarize method is the tool version from the notebook:

    async def summarize(self, ctx: RunContext, video_id: str) -> str:
        user_queries = []
        search_queries = []

        for m in ctx.messages:
            for p in m.parts:
                kind = p.part_kind
                if kind == 'user-prompt':
                    user_queries.append(p.content)
                if kind == 'tool-call' and p.tool_name == 'search_videos':
                    args = json.loads(p.args)
                    search_queries.append(args['query'])

Finish the method with the prompt and sub-agent call:

        subtitles = self.search_tools.get_subtitles_by_id(video_id)['subtitles']

        # Join outside the f-string: backslashes inside f-string
        # expressions require Python 3.12+.
        user_queries_text = '\n'.join(user_queries)
        search_queries_text = '\n'.join(search_queries)

        prompt = textwrap.dedent(f"""
            user query:
            {user_queries_text}

            search engine queries:
            {search_queries_text}

            subtitles:
            {subtitles}
        """).strip()

        summary_result = await self.summarization_agent.run(prompt)
        return summary_result.output
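One caveat about textwrap.dedent in these prompt builders: it removes only the whitespace margin common to all lines, so once flush-left transcript text is interpolated there is no common margin and the template's own indentation survives on the label lines. The model tolerates this, but it is worth knowing. A quick stdlib check:

```python
import textwrap

content = 'line one\nline two'  # stands in for interpolated, flush-left text

prompt = textwrap.dedent(f"""
    subtitles:
    {content}
""").strip()

# 'line two' has no indent, so there is no common margin and dedent
# leaves the template's indentation in place for the lines that have it.
assert prompt == 'subtitles:\n    line one\nline two'
```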

Now we have a clean non-Temporal agent. The last implementation step wraps it in Temporal.
