2025 · Sole engineer
Search Engine for Internal Docs
Built an internal semantic search engine over 2.4M company docs using local embeddings and a custom HNSW index. Query latency p95 under 80ms.
- stack: Rust, Postgres, pgvector, Tantivy, Cloudflare Workers
- tags: backend, data, search
- status: shipped
Problem
The wiki was 14 years old and 2.4M documents deep. Full-text search returned 800 hits for “oncall rotation policy.” Nobody trusted it. Slack searches for old conversations were the de facto institutional memory.
Constraints
- Couldn’t ship documents to an external embedding API — most of them were internal-only.
- p95 query latency target: 100ms (it was 4s on the legacy system).
- Had to support both lexical and semantic search; the old system had only lexical.
What I did
Combined three things: a Tantivy lexical index for keyword recall, a pgvector HNSW index for semantic similarity using locally hosted Jina embeddings, and a fusion step that merges the two result lists with reciprocal rank fusion. The frontend is a Cloudflare Worker that fans out to both backends in parallel.
// Reciprocal rank fusion. Trivial math, surprisingly hard to beat empirically.
use std::collections::HashMap;

fn rrf(rankings: &[Vec<DocId>], k: f32) -> Vec<(DocId, f32)> {
    let mut scores: HashMap<DocId, f32> = HashMap::new();
    for ranking in rankings {
        for (rank, doc) in ranking.iter().enumerate() {
            // Each list contributes 1 / (k + rank + 1); k damps the head of the list.
            *scores.entry(*doc).or_insert(0.0) += 1.0 / (k + rank as f32 + 1.0);
        }
    }
    let mut out: Vec<_> = scores.into_iter().collect();
    // Sort by fused score, descending. total_cmp avoids a panic on NaN.
    out.sort_by(|a, b| b.1.total_cmp(&a.1));
    out
}
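To show how the pieces fit end to end, here is a minimal, self-contained sketch of the fan-out plus fusion. The backend calls are stubbed with fixed results (the real system queries Tantivy and pgvector), plain threads stand in for the Worker's async fan-out, and the names `lexical_search`, `semantic_search`, and `search` are illustrative, not the production API. The RRF function is repeated so the sketch compiles on its own.

```rust
use std::collections::HashMap;
use std::thread;

type DocId = u64;

// Hypothetical stubs: the real system queries Tantivy (lexical) and a
// pgvector HNSW index (semantic). Fixed results here for illustration.
fn lexical_search(_query: &str) -> Vec<DocId> {
    vec![1, 2, 3]
}
fn semantic_search(_query: &str) -> Vec<DocId> {
    vec![3, 1, 4]
}

// Same reciprocal rank fusion as above, repeated for self-containment.
fn rrf(rankings: &[Vec<DocId>], k: f32) -> Vec<(DocId, f32)> {
    let mut scores: HashMap<DocId, f32> = HashMap::new();
    for ranking in rankings {
        for (rank, doc) in ranking.iter().enumerate() {
            *scores.entry(*doc).or_insert(0.0) += 1.0 / (k + rank as f32 + 1.0);
        }
    }
    let mut out: Vec<_> = scores.into_iter().collect();
    out.sort_by(|a, b| b.1.total_cmp(&a.1));
    out
}

// Fan out to both backends in parallel, then fuse the two rankings.
// Threads stand in for the async fan-out a Cloudflare Worker would do.
fn search(query: &str) -> Vec<(DocId, f32)> {
    let (q1, q2) = (query.to_owned(), query.to_owned());
    let lex = thread::spawn(move || lexical_search(&q1));
    let sem = thread::spawn(move || semantic_search(&q2));
    let rankings = vec![lex.join().unwrap(), sem.join().unwrap()];
    rrf(&rankings, 60.0) // k = 60 is a common default for RRF
}

fn main() {
    for (doc, score) in search("oncall rotation policy") {
        println!("doc {doc}: {score:.4}");
    }
}
```

A document that appears in both lists (doc 1 and doc 3 here) outranks one that appears high in only one list, which is the property that makes the fused results feel "right" for both keyword and paraphrased queries.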
Outcome
- p95 query latency: 4s → 78ms
- Search usage (queries/week): 1.2k → 9.4k after 60 days
- One team retired their internal “Slack archive” Notion page that existed solely because search didn’t work.
What I’d do differently
I should have shipped lexical-only first and added semantic in a second pass. The unified launch was harder to evaluate — when results were bad, you couldn’t tell which pipeline to blame.