Saving Money on API Calls with Batch and Flex Processing
This event is for Main members
Registering for this event requires a Main membership or above.
API costs get expensive fast when you run many model calls. In this workshop we'll look at where you can deliberately trade latency for lower cost, especially for evaluations where results usually do not need to be immediate.
We'll focus on: - When evaluations can run asynchronously instead of inline - Using OpenAI Batch API for eval runs and other delayed jobs - Using Flex processing for lower-priority Responses / Chat Completions calls - Prompt caching, model choice, and request shaping as cost levers - How to structure an eval pipeline so slower, cheaper calls are acceptable - Failure modes: timeouts, retries, expired batches, and when not to delay
Bring an existing eval idea or a workflow with repeated API calls if you want to reason through it live.
References: - https://developers.openai.com/api/docs/guides/batch - https://developers.openai.com/api/docs/guides/flex-processing
Hosted by
Alexey Grigorev
Chief Agent Officer at AI Shipping Labs
Software engineer and machine learning practitioner with 15+ years of experience building production ML systems. I focus on practical, production-grade ML and AI systems, from early prototypes to reliable systems in production.
I'm the founder of DataTalks.Club, a free community that connects tens of thousands of practitioners worldwide, and the creator of the Zoomcamp series, free, code-first programs that have reached 100,000+ learners globally.
At AI Shipping Labs, I'm building the kind of environment that would have accelerated my own career growth. After years of teaching at scale, I wanted something more focused: a space for action-oriented builders who want to turn AI ideas into real projects. The community gives members the structure, accountability, and peer support to ship practical AI products consistently, even alongside their main jobs.