Skip to content

pgvector has a reputation for 'toy-scale'. The reputation is outdated. We run production RAG on 10M+ row pgvector instances with p95 query latency under 80ms. The key is choosing the right index and writing filter-friendly queries. Here is how.

HNSW vs IVFFlat

  • HNSW: better recall, higher memory, slower build. Default choice 2024+.
  • IVFFlat: faster build, less memory, worse recall. Use only if you rebuild frequently.
  • Neither: sequential scan fine up to ~500k rows with good selectivity.

Filtered search is the hard part

Most RAG queries filter by tenant_id, document_type, or recency before similarity. pgvector 0.5+ added proper filtered HNSW, but naive queries still scan too much. Always apply the selective filter first.

-- GOOD: tenant filter narrows first, vector search on small set
SELECT * FROM chunks
WHERE tenant_id = $1 AND created_at > now() - interval '30 days'
ORDER BY embedding <=> $2
LIMIT 10;

-- Index: btree on (tenant_id, created_at) + HNSW on embedding

Real numbers

  • Law firm RAG · 12M chunks · HNSW m=16 ef=64 · p50 38ms, p95 72ms.
  • News aggregator · 8M articles · HNSW m=12 ef=40 · p50 24ms, p95 58ms.
  • SaaS support bot · 4M tickets · HNSW + tenant filter · p50 18ms, p95 44ms.

If p95 creeps above 200ms, 95% of the time the index is not being used. Run EXPLAIN ANALYZE, confirm the HNSW index is hit, not a sequential scan. Usually it is a WHERE clause that disables the index.

ShareXLinkedIn#
Dezso Mezo

By

Dezso Mezo

Founder, DField Solutions

I've shipped production products from fintech to creator-tooling · for startups and enterprises, from Budapest to San Francisco.

Keep reading

RELATED PROJECTS

Would rather build together?

Let's talk about your project. 30 minutes, no strings.

Let's talk