LOADING
1519 words
8 minutes
How Vector Databases Are Changing Enterprise Search

Exploring the engine behind modern semantic search and knowledge retrieval.

In the previous post I explained why RAG, Retrieval-Augmented Generation, is the architectural pattern that makes AI systems actually useful with proprietary enterprise data. The retrieval step is the heart of that pattern: fetching the right context at query time, injecting it into the prompt, grounding the model’s output in your actual documents.

What I didn’t go into was the infrastructure that makes that retrieval possible. That’s what this post is about.

The short version: traditional databases aren’t built for this. You need a different kind of storage and search, one optimized for finding meaning rather than matching text. That’s what vector databases do. And understanding what they are, what they’re good at, and where they have limits is increasingly a requirement for anyone thinking seriously about enterprise AI architecture.

Why traditional search falls short

Enterprise search has been a solved problem — or at least a well-understood problem — for decades. You index your documents, you build an inverted index that maps terms to documents, and when someone searches, you find documents that contain the query terms, ranked by relevance signals like term frequency.

Why Traditional Search Falls Short

This works well for what it was designed for: finding documents that contain specific words. It handles scale well. It’s fast. It’s predictable. Elasticsearch, Solr, and similar systems have been doing this reliably for years.

But keyword search has a fundamental limitation that becomes painful when you’re building AI systems: it finds the words, not the meaning.

Ask a keyword search system for documents about “reducing latency in distributed systems” and it’ll return documents that contain those specific words. Ask it about “making microservices faster” and it may return completely different results (even though the documents that answer both questions might be identical. Synonyms, paraphrases, conceptual relationships) traditional search doesn’t understand these. It matches strings.

For AI systems, this is a serious problem. Users don’t phrase their questions the way documents are written. They use different vocabulary, different levels of specificity, different conceptual framings. A retrieval system that only finds exact or near-exact matches is going to miss relevant content constantly, and that content is what the model needs to give a good answer.

What semantic search actually means

The alternative is to search by meaning rather than by words. To do that, you need a way to represent meaning mathematically, and that’s what embeddings are.

An embedding is a numerical vector, a list of hundreds or thousands of numbers, that represents the semantic content of a piece of text. The key property is that texts with similar meanings have vectors that are close together in that high-dimensional space. “Reducing latency” and “making things faster” end up near each other. “Quarterly revenue report” ends up somewhere very different.

This representation is learned, not hand-crafted. Embedding models, trained on vast amounts of text, develop these representations by learning which texts tend to appear in similar contexts. The result is a geometry of meaning: distance in the vector space corresponds to semantic similarity.

Semantic search works by embedding the query, embedding the documents, and finding the documents whose embeddings are closest to the query embedding. No keyword matching. No index of terms. Just distance in meaning-space.

The results are qualitatively different from keyword search. Semantically relevant documents surface even when they share no words with the query. Jargon differences across departments stop being a barrier. The system finds what the user means, not just what they typed.

Where vector databases come in

Storing and searching embeddings at scale requires infrastructure that traditional databases weren’t built for. A single document chunk might have an embedding with 768 or 1536 dimensions. A knowledge base with a hundred thousand chunks has a hundred thousand such vectors. And for every query, you need to find the closest vectors (a nearest-neighbor search in very high-dimensional space) fast enough that users don’t notice the latency.

This is a non-trivial computational problem. Exact nearest-neighbor search scales poorly. The solutions that work at scale involve approximate algorithms, HNSW (Hierarchical Navigable Small World graphs) is the most widely used, that trade a small amount of accuracy for dramatically better performance. Getting these algorithms right, tuning their parameters for your data and query patterns, and managing the index as your knowledge base grows are real engineering challenges.

Vector databases are purpose-built to handle this. They store embeddings, build and maintain the search indices, handle approximate nearest-neighbor queries efficiently, and typically provide filtering capabilities so you can combine semantic search with metadata constraints, find the semantically closest documents that are also from a specific department, or newer than a specific date.

The current landscape includes several options with different characteristics. Weaviate and Qdrant are purpose-built vector databases with rich filtering capabilities and active development communities. Pinecone is a managed cloud service that abstracts most of the operational complexity. Chroma is a lightweight option well-suited for development and smaller-scale use cases. pgvector extends PostgreSQL with vector search capabilities, an attractive option for teams that already run Postgres and want to avoid adding another infrastructure component.

Each has different performance profiles, operational requirements, filtering capabilities, and integration patterns. The right choice depends on your scale, your existing infrastructure, your team’s operational capacity, and how you need to combine vector search with other query types.

The hybrid search question

Pure semantic search isn’t always the right answer. Keyword search is still better for certain query types: exact phrase matching, searching for specific identifiers like product codes or document numbers, queries where the exact wording matters. Semantic search can return irrelevant results when queries are short or ambiguous, because there isn’t enough signal for the embedding to capture the right meaning.

The Hybrid Search Question

The pattern that’s emerging in production systems is hybrid search: combining semantic search with keyword search (BM25 is the standard algorithm), then using a re-ranking step to merge the results. The semantic search catches conceptually relevant documents that keyword search would miss. The keyword search catches exact matches that semantic search might rank lower than it should. The re-ranker sorts the combined results by overall relevance.

This is more complex than either approach alone. It requires running two search systems in parallel, then reconciling their results. But for enterprise knowledge bases with varied content (technical documentation alongside policy documents alongside product information alongside support records) it tends to produce significantly better retrieval quality than either approach in isolation.

And retrieval quality, as I noted in the previous post, is the most consequential variable in a RAG system. The model can’t compensate for what the retrieval step misses.

What this means for architecture decisions

If you’re evaluating whether to add vector search to an existing system, or planning the infrastructure for a new AI product, a few things are worth thinking about carefully.

The embedding model matters as much as the vector database. The quality of your semantic search is determined first by how well your embedding model captures meaning for your specific type of content. A model trained primarily on general web text may not represent technical documentation, legal language, or domain-specific jargon as well as a model fine-tuned for those domains. Choosing a vector database without first validating your embedding model’s performance on representative queries is building on uncertain ground.

Chunking strategy affects retrieval more than most people expect. Documents need to be split into chunks before embedding, you can’t embed a hundred-page document as a single vector. How you split them matters enormously. Chunks that are too small lose context. Chunks that are too large dilute the semantic signal. Splits that cut across logical boundaries produce chunks that don’t mean anything coherently. There’s no universal right answer; it depends on your content type and your query patterns.

Operational complexity is real and shouldn’t be underestimated. A vector database is another stateful system to run, monitor, back up, and scale. Index rebuilds take time and resources. Embedding updates (when you change your embedding model) require re-embedding your entire knowledge base. These operational realities belong in your architecture evaluation, not just the query performance benchmarks.

The managed vs self-hosted decision has long-term implications. Managed services reduce operational burden significantly but introduce vendor dependency and ongoing costs that scale with usage. Self-hosted gives you control and predictable costs but requires the operational capacity to run it well. This is the same tradeoff as any managed infrastructure decision, the parameters are just different.

The bigger picture

Vector databases are infrastructure. They’re not inherently interesting in the way that model capabilities are interesting, they’re a means to an end, the foundation that makes semantic retrieval possible at scale.

But infrastructure decisions have long tails. The vector database choice you make early in a project shapes what’s easy and what’s painful for years afterward. Getting it right requires understanding not just the technology but the operational reality of running it in production, the query patterns your use case actually generates, and how it fits into the rest of your data architecture.

That’s the lens I bring to all of these decisions: not what’s impressive in a benchmark, but what works reliably over time in a real system.

Next: the reliability question directly. Because retrieval and vector search are prerequisites for reliable enterprise AI, but they’re not sufficient. There’s a layer above the infrastructure that most teams still haven’t figured out how to build.

How Vector Databases Are Changing Enterprise Search
Author
Raúl Ferrer
Published at
2024-08-13
License
CC BY-NC-SA 4.0

Some information may be outdated

Profile Image of the Author
Raúl Ferrer
Software Architect & Tech Lead. Applying software and systems engineering principles in production to build reliable, observable, and maintainable AI. Author of iOS Architecture Patterns (Apress).

Loading stats...