2025 · Sole engineer

Search Engine for Internal Docs

Built an internal semantic search engine over 2.4M company docs using local embeddings and a custom HNSW index. Query latency p95 under 80ms.

stack
Rust, Postgres, pgvector, Tantivy, Cloudflare Workers
tags
backend, data, search
status
shipped


Problem

The wiki was 14 years old and 2.4M documents deep. Full-text search returned 800 hits for “oncall rotation policy.” Nobody trusted it. Slack searches for old conversations were the de facto institutional memory.

Constraints

  • Couldn’t ship documents to an external embedding API — most of them were internal-only.
  • p95 query latency target: 100ms (it was 4s on the legacy system).
  • Had to support both lexical and semantic search; the old system had only lexical.

What I did

Combined three things: a Tantivy lexical index for keyword recall, a pgvector HNSW index for semantic similarity using locally hosted Jina embeddings, and a reranker that fuses the two with reciprocal rank fusion. The frontend is a Cloudflare Worker that fans out to both backends in parallel.
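The fan-out itself is simple. A minimal sketch using std threads (the production path is a Cloudflare Worker calling both backends over HTTP; `lexical_search` and `semantic_search` here are illustrative stand-ins, not the real client code):

```rust
use std::thread;

type DocId = u64; // assumption: stands in for the real document ID type

// Illustrative stand-ins for the Tantivy and pgvector backends.
fn lexical_search(query: &str) -> Vec<DocId> {
    let _ = query;
    vec![1, 2, 3]
}

fn semantic_search(query: &str) -> Vec<DocId> {
    let _ = query;
    vec![3, 4, 5]
}

// Fan out to both backends in parallel and return both rankings.
fn fan_out(query: &str) -> (Vec<DocId>, Vec<DocId>) {
    let q = query.to_owned();
    // Lexical search runs on a spawned thread...
    let lexical = thread::spawn(move || lexical_search(&q));
    // ...while the semantic query runs concurrently on the current thread.
    let semantic = semantic_search(query);
    let lexical = lexical.join().expect("lexical search thread panicked");
    (lexical, semantic)
}

fn main() {
    let (lex, sem) = fan_out("oncall rotation policy");
    println!("{} lexical hits, {} semantic hits", lex.len(), sem.len());
}
```

Since neither backend depends on the other's result, the fused latency is roughly max(lexical, semantic) plus the rerank, not their sum.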

// Reciprocal rank fusion. Trivial math, surprisingly hard to beat empirically.
use std::collections::HashMap;

fn rrf(rankings: &[Vec<DocId>], k: f32) -> Vec<(DocId, f32)> {
    let mut scores: HashMap<DocId, f32> = HashMap::new();
    for ranking in rankings {
        // enumerate() is 0-based; rank + 1 recovers the 1-based rank RRF expects.
        for (rank, doc) in ranking.iter().enumerate() {
            *scores.entry(*doc).or_insert(0.0) += 1.0 / (k + rank as f32 + 1.0);
        }
    }
    let mut out: Vec<_> = scores.into_iter().collect();
    // Highest fused score first; total_cmp can't panic the way partial_cmp().unwrap() can.
    out.sort_by(|a, b| b.1.total_cmp(&a.1));
    out
}
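A worked example of the fusion, with k = 60 (the usual default) and `DocId` assumed to be a u64 alias; `rrf` is included so the snippet compiles on its own:

```rust
use std::collections::HashMap;

type DocId = u64; // assumption: stands in for the real document ID type

fn rrf(rankings: &[Vec<DocId>], k: f32) -> Vec<(DocId, f32)> {
    let mut scores: HashMap<DocId, f32> = HashMap::new();
    for ranking in rankings {
        for (rank, doc) in ranking.iter().enumerate() {
            *scores.entry(*doc).or_insert(0.0) += 1.0 / (k + rank as f32 + 1.0);
        }
    }
    let mut out: Vec<_> = scores.into_iter().collect();
    out.sort_by(|a, b| b.1.total_cmp(&a.1));
    out
}

fn main() {
    // Doc 3 is the last hit in the lexical list and the first in the
    // semantic list, but it's the only doc present in both rankings...
    let lexical: Vec<DocId> = vec![1, 2, 3];
    let semantic: Vec<DocId> = vec![3, 4, 5];
    let fused = rrf(&[lexical, semantic], 60.0);
    // ...so its fused score (1/63 + 1/61) beats every single-list score.
    assert_eq!(fused[0].0, 3);
}
```

This is also why RRF needs no score normalization: it consumes only ranks, so BM25 scores and cosine similarities never have to live on the same scale.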

Outcome

  • p95 query latency: 4s → 78ms
  • Search usage (queries/week): 1.2k → 9.4k after 60 days
  • One team retired their internal “Slack archive” Notion page that existed solely because search didn’t work.

What I’d do differently

I should have shipped lexical-only first and added semantic in a second pass. The unified launch was harder to evaluate — when results were bad, you couldn’t tell which pipeline to blame.