Solving a Real AI Engineer Take-Home Assignment Live
We solve a real AI engineer take-home assignment live. The product is a small investment coaching bot that answers public-company research questions without crossing into personalized financial advice.
The flow of the session:
- Brainstorm scope and survey data sources.
- Build a CLI agent with PydanticAI and OpenAI.
- Pivot from a paid market-data API to free SEC EDGAR data after the free tier blocks us.
- Clean up the pointless tests the coding agent generated and write test guidelines we actually want to follow.
- Add a small evaluation harness with scenarios and an LLM judge.
- Wrap the result as a Telegram bot.
Every step is done alongside a coding agent. The prompts are included verbatim so you can reproduce the workflow with any agent you like.
Links
Useful resources for this workshop:
The app you will build
The final app looks like this:
The agent has four tools, all backed by free SEC EDGAR endpoints:
search_company(query)resolves a ticker or company name to a CIK.get_financial_snapshot(ticker_or_cik)fetches recent annual revenue, profit, balance sheet, and cash facts from companyfacts XBRL data.get_latest_filings(ticker_or_cik)returns recent filing metadata and SEC URLs.get_filing_digest(ticker_or_cik, form_type)pulls the latest 10-K or 10-Q text and extracts business, revenue, risk, and MD&A snippets.
The agent has a strict safety boundary baked into its instructions:
- no buy/sell/hold recommendations
- no position sizing
- no predictions of price direction
A buy/sell question gets transformed into an educational research brief instead.
The evaluation harness lives in evals/manual/. It runs a list of
scenarios from scenarios.csv through the agent and captures the
full tool-call trajectory. An LLM-as-judge then scores whether the
answer stayed inside the safety boundary and addressed the question.
Tutorial pages
- Overview and setup
- Pick a task and scope the agent
- First CLI agent with PydanticAI
- Pivot from FMP to SEC EDGAR
- Cull pointless tests and write guidelines
- Write real agent and SEC tests
- Manual evaluation with scenarios and a judge
- Wrap the agent as a Telegram bot
- Deferred items and how to clean up
- Wrap-up and reflections
Upgrade to Basic to access this workshop
The workshop overview and page list are visible now; membership unlocks the step-by-step tutorial.
Basic or above required
View Pricing