Name: Solving a Real AI Engineer Take-Home Assignment Live
Start: 2026-05-19T00:00:00+00:00
Location: Online

We solve a real AI engineer take-home assignment live. The product is a small investment coaching bot that answers public-company research questions without crossing into personalized financial advice.

The flow of the session:

Brainstorm scope and survey data sources.
Build a CLI agent with PydanticAI and OpenAI.
Pivot from a paid market-data API to free SEC EDGAR data after the free tier blocks us.
Clean up the pointless tests the coding agent generated and write test guidelines we actually want to follow.
Add a small evaluation harness with scenarios and an LLM judge.
Wrap the result as a Telegram bot.

Every step is done alongside a coding agent. The prompts are included verbatim so you can reproduce the workflow with any agent you like.

Links

Useful resources for this workshop:

The app you will build

The final app looks like this:

flowchart LR USER["User"] TG["Telegram bot"] CLI["CLI app"] AGENT["PydanticAI agent<br/>safety + structured output"] TOOLS["SEC tools<br/>search, snapshot,<br/>filings, digest"] EDGAR["SEC EDGAR<br/>company facts + filings"] OPENAI["OpenAI Responses API"] USER --> TG USER --> CLI TG --> AGENT CLI --> AGENT AGENT -->|tool call| TOOLS TOOLS -->|HTTP JSON| EDGAR AGENT -->|model call| OPENAI

The agent has four tools, all backed by free SEC EDGAR endpoints:

search_company(query) resolves a ticker or company name to a CIK.
get_financial_snapshot(ticker_or_cik) fetches recent annual revenue, profit, balance sheet, and cash facts from companyfacts XBRL data.
get_latest_filings(ticker_or_cik) returns recent filing metadata and SEC URLs.
get_filing_digest(ticker_or_cik, form_type) pulls the latest 10-K or 10-Q text and extracts business, revenue, risk, and MD&A snippets.

The agent has a strict safety boundary baked into its instructions:

no buy/sell/hold recommendations
no position sizing
no predictions of price direction

A buy/sell question gets transformed into an educational research brief instead.

The evaluation harness lives in evals/manual/. It runs a list of scenarios from scenarios.csv through the agent and captures the full tool-call trajectory. An LLM-as-judge then scores whether the answer stayed inside the safety boundary and addressed the question.