pgvector at 10M+ rows: index, queries, real numbers

pgvector at 10M rows is not scary · if you pick the right index. HNSW vs IVFFlat, filter patterns, real numbers.

Last verified: 22 April 2026

By Dezso Mezo, Founder, DField Solutions

pgvector has a reputation for 'toy-scale'. The reputation is outdated. We run production RAG on 10M+ row pgvector instances with p95 query latency under 80ms. The key is choosing the right index and writing filter-friendly queries. Here is how.

HNSW vs IVFFlat

HNSW: better recall, higher memory, slower build. The default choice since 2024.
IVFFlat: faster build, less memory, worse recall. Use it only if you rebuild the index frequently.
Neither: an exact sequential scan is fine up to ~500k rows when a selective filter shrinks the candidate set.
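
As a rough sketch of what the two options look like in SQL: the table name `chunks` and the `embedding` column come from the query later in this post, the index names are made up for illustration, and the parameter values are pgvector's defaults or the starting points suggested in the pgvector README, not tuned numbers from our deployments.

```sql
-- Option A: HNSW. m = 16 and ef_construction = 64 are pgvector's defaults.
-- vector_cosine_ops matches the <=> (cosine distance) operator.
SET maintenance_work_mem = '8GB';  -- illustrative value; size it to your RAM
CREATE INDEX chunks_embedding_hnsw
    ON chunks USING hnsw (embedding vector_cosine_ops)
    WITH (m = 16, ef_construction = 64);

-- Option B: IVFFlat. Build it after the table is loaded, because the
-- list centroids are computed from the rows present at build time.
CREATE INDEX chunks_embedding_ivfflat
    ON chunks USING ivfflat (embedding vector_cosine_ops)
    WITH (lists = 3162);  -- about sqrt(10M), the README starting point above 1M rows

-- Query-time recall knob for IVFFlat; sqrt(lists) is the suggested start.
SET ivfflat.probes = 56;
```

You would normally create only one of the two on a given column. Raising `m` or `ef_construction` tends to improve HNSW recall at the cost of build time and memory.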

Filtered search is the hard part

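Most RAG queries filter by tenant_id, document_type, or recency before similarity. pgvector 0.5.0 introduced HNSW, and 0.8.0 added iterative index scans, which keep reading the index until enough rows pass the filter. Even so, naive queries still scan too much. Give the planner a selective, indexed filter so it can narrow the candidate set before the vector comparison.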

-- GOOD: tenant filter narrows first, vector search on small set
SELECT * FROM chunks
WHERE tenant_id = $1 AND created_at > now() - interval '30 days'
ORDER BY embedding <=> $2
LIMIT 10;

-- Index: btree on (tenant_id, created_at) + HNSW on embedding
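
The index comment above could be spelled out roughly as follows. This is a sketch under assumptions: the index name is hypothetical, and the two `SET` lines only apply to pgvector 0.8.0 or later, where iterative scans exist.

```sql
-- B-tree that serves the tenant + recency filter.
CREATE INDEX chunks_tenant_created_idx
    ON chunks (tenant_id, created_at);

-- pgvector 0.8.0+: when the planner picks the HNSW index, keep scanning
-- until LIMIT rows pass the WHERE clause instead of returning too few.
SET hnsw.iterative_scan = strict_order;  -- relaxed_order trades exact ordering for speed
SET hnsw.max_scan_tuples = 20000;        -- the default; caps how far the scan may go
```

With a very selective filter, the planner may use the B-tree alone and sort the few matching rows exactly, which is often the fastest plan and gives perfect recall.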

Real numbers

Law firm RAG · 12M chunks · HNSW m=16 ef=64 · p50 38ms, p95 72ms.
News aggregator · 8M articles · HNSW m=12 ef=40 · p50 24ms, p95 58ms.
SaaS support bot · 4M tickets · HNSW + tenant filter · p50 18ms, p95 44ms.
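
The figures above do not say whether `ef` means the build-time `ef_construction` or the query-time `hnsw.ef_search`. Assuming the query-time setting, one way to scope it to a single transaction looks like this (the `id` column is an assumption about the schema):

```sql
BEGIN;
-- SET LOCAL reverts at COMMIT/ROLLBACK, so other queries keep the default (40).
-- ef_search bounds the candidate list, so it should be >= LIMIT.
SET LOCAL hnsw.ef_search = 64;

SELECT id
FROM chunks
WHERE tenant_id = $1
ORDER BY embedding <=> $2
LIMIT 10;
COMMIT;
```

Higher `ef_search` generally buys recall at the cost of latency, so it is worth measuring both when changing it.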

If p95 creeps above 200ms, 95% of the time the index is not being used. Run EXPLAIN ANALYZE and confirm the plan hits the HNSW index rather than a sequential scan. The culprit is usually a WHERE clause that keeps the planner from using the index.
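
A minimal version of that check might look like the following. The tenant id and the three-element vector literal are placeholders: the literal must have the same number of dimensions as your `embedding` column, and the index name assumes you called it `chunks_embedding_hnsw`.

```sql
EXPLAIN (ANALYZE, BUFFERS)
SELECT id
FROM chunks
WHERE tenant_id = 42
ORDER BY embedding <=> '[0.1, 0.2, 0.3]'
LIMIT 10;

-- Healthy:   "Index Scan using chunks_embedding_hnsw on chunks"
-- Unhealthy: "Seq Scan on chunks" feeding a Sort node
--
-- Query shapes that keep pgvector from using the vector index:
--   ORDER BY 1 - (embedding <=> ...)   -- expression instead of the bare operator
--   ORDER BY embedding <=> ... DESC    -- only ascending distance is supported
--   index built with vector_l2_ops but queried with <=>  -- operator class mismatch
```

A plan that uses the B-tree on tenant_id and then sorts a small result set is not a problem in itself; the case to worry about is a sequential scan over millions of rows.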
