Where to go from here
We built a search engine from scratch, progressing from keyword matching to semantic embeddings. The concepts here scale to real systems. This page covers the tools and techniques that make that scaling possible.
The minsearch library
The TextSearch class we built became minsearch,
a production-ready text search library. It adds appendable indices, multiple
filter fields, and a cleaner API on top of the same TF-IDF approach.
Install it with:
uv add minsearch
For search over a small-to-medium dataset, minsearch is the practical version
of what we built here.
Inverted indexes
Our implementation computes similarity against every document. That works for a few thousand FAQ entries, but it does not scale to millions. An inverted index maps each word to the list of documents that contain it, so the search engine only scores documents that share at least one term with the query. This is the core data structure behind Lucene-based engines such as Elasticsearch.
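As a rough illustration of the idea, here is a minimal inverted index sketch. The function names (`tokenize`, `build_inverted_index`, `candidates`) and the sample documents are made up for this example, and the whitespace tokenizer is a simplifying assumption; real engines also handle stemming, stop words, and scoring.

```python
from collections import defaultdict


def tokenize(text):
    # Simplifying assumption: lowercase and split on whitespace.
    return text.lower().split()


def build_inverted_index(docs):
    # Map each token to the set of document ids that contain it.
    index = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def candidates(index, query):
    # Collect ids of documents that share at least one token with the query.
    result = set()
    for token in tokenize(query):
        result |= index.get(token, set())
    return result


# Hypothetical sample documents, not the workshop's FAQ data.
docs = [
    "how do I join the course",
    "can I still enroll after the start date",
    "how to install docker on windows",
]

index = build_inverted_index(docs)
matching = candidates(index, "install docker")
print(matching)
```

Only the documents returned by `candidates` would then need to be scored with TF-IDF, instead of the whole collection.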
Vector search at scale
For vector embeddings, comparing a query against every document vector is also linear in the number of documents.
Two common techniques make vector search faster by returning approximate rather than exact nearest neighbors:
- LSH (Locality-Sensitive Hashing) uses random projections to hash vectors so that similar vectors are likely to land in the same bucket. The search only checks vectors in the same bucket as the query.
- Product quantization splits each vector into sub-vectors and replaces each one with a short code. It trades a small amount of accuracy for much faster distance computation.
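The LSH idea can be sketched with random hyperplanes: each hyperplane contributes one bit depending on which side of it a vector falls, and the bits together form the bucket key. This is a minimal sketch using only the standard library, not how FAISS or any particular vector database implements it. All names (`make_hyperplanes`, `bucket_key`, `build_buckets`, `lsh_candidates`) and the toy vectors are assumptions for illustration.

```python
import random


def make_hyperplanes(num_planes, dim, seed=0):
    # Each hyperplane is a random Gaussian vector; the seed keeps runs repeatable.
    rng = random.Random(seed)
    return [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(num_planes)]


def bucket_key(vector, hyperplanes):
    # One bit per hyperplane: 1 if the dot product is non-negative, else 0.
    return tuple(
        int(sum(v * h for v, h in zip(vector, plane)) >= 0)
        for plane in hyperplanes
    )


def build_buckets(vectors, hyperplanes):
    # Group vector ids by their bucket key.
    buckets = {}
    for i, vec in enumerate(vectors):
        buckets.setdefault(bucket_key(vec, hyperplanes), []).append(i)
    return buckets


def lsh_candidates(query, buckets, hyperplanes):
    # Only vectors in the query's bucket are considered.
    return buckets.get(bucket_key(query, hyperplanes), [])


# Toy 3-dimensional vectors standing in for real embeddings.
vectors = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [-1.0, 0.0, 0.0]]
planes = make_hyperplanes(num_planes=4, dim=3)
buckets = build_buckets(vectors, planes)

# A query pointing in the same direction as vector 0 lands in its bucket.
print(lsh_candidates([2.0, 0.0, 0.0], buckets, planes))
```

Vectors pointing in nearly the same direction usually share a bucket, but this is probabilistic: a close neighbor can end up in a different bucket. Practical setups typically use several hash tables to reduce such misses, then rank the candidates by exact similarity.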
Tools and databases
For real projects, use established tools instead of building from scratch:
- Elasticsearch (built on Lucene) for text search with inverted indexes
- FAISS for fast vector similarity search
- Qdrant, Weaviate, or Chroma as dedicated vector databases
Each of these handles the indexing, storage, and retrieval concerns that we skipped for clarity.
Follow-up: Agentic RAG
The search engine we built retrieves FAQ entries. The natural next step is to feed those results into a language model so that it can generate answers based on them.
The From RAG to Agents workshop picks up where this one leaves off. It starts with classic RAG over the same FAQ data, then evolves into an agentic workflow in which the LLM decides what to search for and whether to open a full document.