Solving a Real AI Engineer Take-Home Assignment Live

May 19, 2026, 02:00 Europe/Berlin

We solve a real AI engineer take-home assignment live. The product is a small investment coaching bot that answers public-company research questions without crossing into personalized financial advice.

The flow of the session:

  • Brainstorm scope and survey data sources.
  • Build a CLI agent with PydanticAI and OpenAI.
  • Pivot from a paid market-data API to free SEC EDGAR data after the free tier blocks us.
  • Clean up the pointless tests the coding agent generated and write test guidelines we actually want to follow.
  • Add a small evaluation harness with scenarios and an LLM judge.
  • Wrap the result as a Telegram bot.

Every step is done alongside a coding agent. The prompts are included verbatim so you can reproduce the workflow with any agent you like.

The app you will build

The final app looks like this:

flowchart LR
    USER["User"]
    TG["Telegram bot"]
    CLI["CLI app"]
    AGENT["PydanticAI agent<br/>safety + structured output"]
    TOOLS["SEC tools<br/>search, snapshot,<br/>filings, digest"]
    EDGAR["SEC EDGAR<br/>company facts + filings"]
    OPENAI["OpenAI Responses API"]
    USER --> TG
    USER --> CLI
    TG --> AGENT
    CLI --> AGENT
    AGENT -->|tool call| TOOLS
    TOOLS -->|HTTP JSON| EDGAR
    AGENT -->|model call| OPENAI

The agent has four tools, all backed by free SEC EDGAR endpoints:

  • search_company(query) resolves a ticker or company name to a CIK.
  • get_financial_snapshot(ticker_or_cik) fetches recent annual revenue, profit, balance sheet, and cash facts from companyfacts XBRL data.
  • get_latest_filings(ticker_or_cik) returns recent filing metadata and SEC URLs.
  • get_filing_digest(ticker_or_cik, form_type) pulls the latest 10-K or 10-Q text and extracts business, revenue, risk, and MD&A snippets.
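To make the first tool concrete, here is a minimal sketch of how a ticker or company name could be resolved to a CIK. It is not the workshop's code. It assumes the lookup table has the shape of SEC's public `company_tickers.json` file (entries with `cik_str`, `ticker`, and `title`), uses a tiny in-memory sample instead of a live download, and the helper names `normalize_cik` and `companyfacts_url` are hypothetical.

```python
from __future__ import annotations

# Tiny in-memory sample in the shape of SEC's company_tickers.json.
# A real tool would download the full file (SEC asks for a descriptive
# User-Agent header on every request).
SAMPLE_TICKERS = {
    "0": {"cik_str": 320193, "ticker": "AAPL", "title": "Apple Inc."},
    "1": {"cik_str": 789019, "ticker": "MSFT", "title": "MICROSOFT CORP"},
}


def normalize_cik(cik: int | str) -> str:
    """Zero-pad a CIK to the 10 digits used in EDGAR JSON URLs."""
    return str(int(cik)).zfill(10)


def search_company(query: str, tickers: dict) -> dict | None:
    """Resolve a ticker or company name to a CIK (illustrative only).

    Tries an exact ticker match first, then a case-insensitive
    substring match on the company title.
    """
    q = query.strip().lower()
    if not q:
        return None
    entries = list(tickers.values())
    for entry in entries:
        if entry["ticker"].lower() == q:
            return _to_result(entry)
    for entry in entries:
        if q in entry["title"].lower():
            return _to_result(entry)
    return None


def _to_result(entry: dict) -> dict:
    return {
        "cik": normalize_cik(entry["cik_str"]),
        "ticker": entry["ticker"],
        "name": entry["title"],
    }


def companyfacts_url(cik: int | str) -> str:
    """Build the companyfacts XBRL endpoint URL for a CIK."""
    return f"https://data.sec.gov/api/xbrl/companyfacts/CIK{normalize_cik(cik)}.json"


match = search_company("aapl", SAMPLE_TICKERS)
print(match, companyfacts_url(match["cik"]))
```

Trying the ticker before the name keeps short queries such as "MS" from matching an unrelated company title by accident. The other tools can then take the padded CIK as their input.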

The agent has a strict safety boundary baked into its instructions:

  • no buy/sell/hold recommendations
  • no position sizing
  • no predictions of price direction

A buy/sell question gets transformed into an educational research brief instead.
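One way to picture the boundary and the "research brief" response is as an instructions string paired with a structured output type. The sketch below is an assumption about how that could look, not the workshop's actual prompt or schema. The wording of `SAFETY_INSTRUCTIONS` and every field of the hypothetical `ResearchBrief` class are invented for illustration, and plain dataclasses stand in for whatever output model the real agent uses.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical wording; the three rules mirror the list above.
SAFETY_INSTRUCTIONS = """\
You are a research coach for public-company questions.
Never give buy/sell/hold recommendations.
Never suggest position sizing.
Never predict price direction.
If the user asks for any of these, answer with an educational
research brief built from SEC filings instead.
"""


@dataclass
class ResearchBrief:
    """Hypothetical structured output returned for every question."""

    question: str
    summary: str
    key_facts: list[str] = field(default_factory=list)
    sources: list[str] = field(default_factory=list)
    # True when the question asked for advice and was redirected.
    advice_declined: bool = False


# What a redirected buy/sell question might produce (made-up content).
brief = ResearchBrief(
    question="Should I buy AAPL?",
    summary="Here is how to read the latest 10-K before deciding yourself.",
    key_facts=["Revenue and cash figures come from companyfacts XBRL data."],
    sources=["https://www.sec.gov/"],
    advice_declined=True,
)
print(brief.advice_declined, len(brief.key_facts))
```

An explicit `advice_declined` flag would give tests and the evaluation harness something to check without parsing free text.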

The evaluation harness lives in evals/manual/. It runs a list of scenarios from scenarios.csv through the agent and captures the full tool-call trajectory. An LLM-as-judge then scores whether the answer stayed inside the safety boundary and addressed the question.
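The loop described above can be sketched in a few lines. This is a simplified illustration, not the contents of evals/manual/. It assumes `scenarios.csv` has `id` and `question` columns, and it replaces the real agent and the LLM judge with stub functions (`stub_agent`, `stub_judge`) so that it runs offline.

```python
from __future__ import annotations

import csv
import io
from dataclasses import dataclass
from typing import Callable

# An agent returns (answer, tool-call names); a judge returns named verdicts.
AgentFn = Callable[[str], "tuple[str, list[str]]"]
JudgeFn = Callable[[str, str, "list[str]"], "dict[str, bool]"]


@dataclass
class RunResult:
    scenario_id: str
    question: str
    answer: str
    tool_calls: list[str]  # the captured trajectory
    verdict: dict[str, bool]


def run_scenarios(csv_text: str, agent: AgentFn, judge: JudgeFn) -> list[RunResult]:
    """Run every scenario row through the agent, then score it with the judge."""
    results = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        answer, tool_calls = agent(row["question"])
        verdict = judge(row["question"], answer, tool_calls)
        results.append(
            RunResult(row["id"], row["question"], answer, tool_calls, verdict)
        )
    return results


def stub_agent(question: str) -> tuple[str, list[str]]:
    # Stand-in for the real agent run.
    answer = f"Educational research brief for: {question}"
    return answer, ["search_company", "get_financial_snapshot"]


def stub_judge(question: str, answer: str, tool_calls: list[str]) -> dict[str, bool]:
    # Stand-in for the LLM judge: a crude keyword check, not a real grader.
    lowered = answer.lower()
    banned = ("you should buy", "you should sell", "you should hold")
    return {
        "in_boundary": not any(phrase in lowered for phrase in banned),
        "addressed": bool(answer.strip()),
    }


# Assumed column layout for scenarios.csv.
SAMPLE_CSV = "id,question\nbuy-1,Should I buy AAPL?\nfacts-1,What was Apple's revenue?\n"

results = run_scenarios(SAMPLE_CSV, stub_agent, stub_judge)
for r in results:
    print(r.scenario_id, r.tool_calls, r.verdict)
```

Passing the agent and the judge in as plain callables means the real versions can be swapped in later without touching the loop.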
