Hybrid Search Is Not One Thing

How to Balance Keyword and Vector Retrieval

The moment I realized vector search alone wasn’t enough

I asked the system: “What is Article 9 of the EU AI Act?”

The system retrieved four documents. They all showed high semantic similarity and were technically relevant to AI regulation. The system returned a response with complete certainty.

The response was wrong.

Not because the vector search had hallucinated, but because the retrieved documents didn’t contain information about Article 9. They covered other articles of the AI Act: compliance requirements, audit logs, and documentation. The vector search had found “items similar to Article 9”, but not “Article 9 itself.”

This is the failure mode that pure vector search doesn’t handle correctly: exact matches and unique identifiers.

When a user searches for a regulation by name (“Article 9”), they aren’t looking for semantic similarity, but rather an exact match of the text string. The embedding model has never seen that exact string during training, so it can’t encode it with enough precision to reliably match it.

That’s where hybrid search comes in.

The two-signal problem

By now you’ve probably read about naive retrieval and re-ranking. Both use a single signal: vector similarity. The query is embedded. The vector store finds similar vectors. Done.

Hybrid search uses two signals:

  1. BM25 (keyword-based). Does the document contain the exact words?

  2. Vector (semantic) search. Is the document semantically related to the query?

The results are combined. BM25 detects exact matches. Vector search detects paraphrases and related concepts. Together, they detect what neither can on its own.

This is a more powerful model, but also a more complex one. Here’s where the decision needs to be made: Which signal is more important? How do they combine? What do you do when they disagree?
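To make the disagreement concrete, here is a minimal sketch of the two signals computed on a toy corpus. The documents, scores, and hand-made “embeddings” are illustrative assumptions, not the article’s real data: BM25 prefers the document with the exact tokens, while cosine similarity prefers the paraphrase.

```python
import math

# Toy corpus (illustrative, not the article's real data): doc A contains
# the literal tokens "article" and "9"; doc B is a related paraphrase
# that never mentions them.
docs = {
    "A": "article 9 of the eu ai act lists risk management duties".split(),
    "B": "compliance audit logs and documentation for ai systems".split(),
}

def bm25_score(query_terms, doc, corpus, k1=1.5, b=0.75):
    """Minimal BM25: sum of IDF-weighted, length-normalized term frequencies."""
    n = len(corpus)
    avgdl = sum(len(d) for d in corpus.values()) / n
    score = 0.0
    for term in query_terms:
        df = sum(1 for d in corpus.values() if term in d)
        idf = math.log(1 + (n - df + 0.5) / (df + 0.5))
        tf = doc.count(term)
        if tf:
            score += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(doc) / avgdl))
    return score

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

# Hand-made 3-d "embeddings" (stand-ins for a real model) under which the
# paraphrase B sits closer to the query than the exact-match doc A.
q_vec = [0.1, 0.9, 0.4]
vecs = {"A": [0.7, 0.3, 0.1], "B": [0.2, 0.8, 0.5]}

query = "article 9".split()
# BM25 ranks A above B (B scores exactly 0.0: no keyword overlap);
# cosine ranks B above A. The two signals disagree.
```

This disagreement is exactly the case hybrid search must arbitrate, which is what the alpha parameter below controls.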

How hybrid search works

In Weaviate, the hybrid query looks like this:

{
  Get {
    EnterpriseDocument(
      hybrid: {
        query: "Article 9"
        vector: [0.12, -0.04, 0.53, ...]
        alpha: 0.5
      }
      limit: 5
    ) {
      text
      _additional {
        score
      }
    }
  }
}

There are three parameters:

query. This is the user’s original question in the form of a text string. Weaviate uses BM25 to search for exact and near-exact matches in the text field.

vector. This is the vector representation of the user’s question. Weaviate uses this for semantic similarity searching.

alpha. The balance factor between the two signals.

  • alpha=0.0: Pure BM25 (keywords only)
  • alpha=1.0: Pure vector search (semantics only)
  • alpha=0.5: 50% keywords, 50% semantics

The default value is 0.5, and it’s a good starting point.
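The fusion itself can be sketched in a few lines. This assumes min-max normalization of each signal before blending, which is similar in spirit to Weaviate’s relative score fusion; the real engine’s normalization may differ, and all scores here are toy values.

```python
# hybrid = alpha * normalized_vector + (1 - alpha) * normalized_bm25

def hybrid_scores(bm25, vec, alpha):
    """Min-max normalize each signal across candidates, then blend by alpha."""
    def norm(scores):
        lo, hi = min(scores.values()), max(scores.values())
        span = (hi - lo) or 1.0
        return {doc: (s - lo) / span for doc, s in scores.items()}
    nb, nv = norm(bm25), norm(vec)
    return {doc: alpha * nv[doc] + (1 - alpha) * nb[doc] for doc in bm25}

# Toy candidates: "exact" wins on keywords, "para" wins on semantics.
bm25 = {"exact": 4.2, "para": 0.0, "other": 1.0}
vec = {"exact": 0.55, "para": 0.86, "other": 0.40}

for alpha in (0.0, 0.5, 1.0):
    ranked = hybrid_scores(bm25, vec, alpha)
    best = max(ranked, key=ranked.get)
    # alpha=0.0 and 0.5 put "exact" on top; alpha=1.0 flips to "para".
```

Note how the winner flips as alpha moves: the parameter is not a tweak, it decides which document reaches the LLM.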

Why alpha matters (more than you think)

Keep one thing in mind about setting alpha: there’s no one-size-fits-all answer.

For example, if your documents are technical specifications with unique identifiers, you should use an alpha value close to 0.0 (i.e., more in line with BM25). Users will be searching by regulation number, product code, or acronym. Therefore, an exact match is crucial.

If your documents are blog posts or general information, you should use an alpha value close to 1.0 (more in line with vector search). This is because users are looking for concepts and ideas, not exact text strings.

If you have a mix of different document types, as is often the case in most business environments, it’s best to keep the value at 0.5 and monitor the results.
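For mixed corpora, one option (a hypothetical heuristic, not something the article’s system does) is to nudge alpha per query: identifier-like tokens such as article numbers or digit-bearing product codes suggest the user wants an exact match, so lean toward BM25; otherwise keep a semantic-leaning value. The pattern and the 0.25/0.75 values below are illustrative assumptions.

```python
import re

# Identifier-like tokens: "article 9", product codes like "ax-200",
# or standalone multi-digit numbers. Deliberately rough.
ID_PATTERN = re.compile(r"\barticle\s+\d+\b|\b[a-z]+-\d+\b|\b\d{2,}\b", re.IGNORECASE)

def choose_alpha(query: str) -> float:
    """Keyword-leaning alpha for identifier queries, semantic-leaning otherwise."""
    return 0.25 if ID_PATTERN.search(query) else 0.75

print(choose_alpha("What is Article 9 of the EU AI Act?"))  # 0.25
print(choose_alpha("How does AI affect compliance?"))       # 0.75
```

Whether this beats a fixed 0.5 depends on your traffic; measure before adopting it.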

The real difference: hybrid search detects edge cases that pure vector misses

Let’s look at a case where hybrid search clearly wins.

Question: “What is Article 5 of the GDPR?”

Corpus contains: a document about Article 5 of the GDPR (General Data Protection Regulation) concerning data minimization.

Results of Pure Vector Search

The question is embedded, and this embedding captures terms like “legal requirement” and “data regulation.” The vector database then finds documents about the GDPR and data protection. However, since the question uses the acronym “GDPR” and pure vector search relies on semantic understanding, it may rank documents about other regulations (such as the EU AI Act) with the same importance if they use similar language about data protection.

All of this results in mediocre retrieval.

Hybrid Search Results

BM25 detects “Article 5 of the GDPR” and performs an exact match of the string. It immediately finds the document containing those exact words. Furthermore, the vector search provides additional context (“legal regulations on data”).

The hybrid result then combines both signals:

  1. BM25 indicates: “This document contains the exact string.”

  2. The vector search indicates: “This document is semantically related.”

  3. Hybrid conclusion: “This is undoubtedly the correct document.”

As a result, we obtain a precise retrieval.

This is why hybrid search is a standard in enterprise knowledge management systems. It’s not a theoretical matter, but a practical one.

Hybrid search in production

I tested this with three queries against a real corpus:

Query 1: “What are the governance requirements for AI systems in the EU?”

Chunks retrieved: 4
Alpha: 0.5
Answer quality: Excellent
Answer included: Section 1 (Purpose), Section 3 (Compliance), reliability requirements

Query 2: “How does AI affect compliance?” The system transformed the query (remember CompressingRagService?) and then ran hybrid search:

Original query: "How does AI affect compliance?"
Transformed query: "What are key regulatory implications of artificial intelligence on organizational compliance?"
Chunks retrieved: 3
Alpha: 0.5
Answer quality: Excellent
Answer included: EU AI Act high-risk requirements, audit trails, transparency

Query 3: “What is Article 9 of the EU AI Act?”

Chunks retrieved: 4
Alpha: 0.5
Answer quality: Degraded (context insufficient)
System response: "Insufficient context provided. The question refers to Article 9 of the EU AI Act, but the context only mentions Section 3: Compliance and does not provide information about Article 9 specifically."

In query 3, we can observe something important: the system recognizes the deficiency. It didn’t return a categorically incorrect answer. It simply indicated: “I don’t have enough context”. That’s the desired behavior in production.

The trade-off: simplicity vs. precision

Hybrid search is more complex than simple retrieval, and that implies:

  • Two retrieval signals (BM25 + vector)
  • An alpha parameter to adjust
  • The need to monitor both signals separately

Naive retrieval, by contrast, keeps things minimal:

  • One signal (vector similarity)
  • A threshold to adjust (minimum score)
  • Easier to debug

But this simplicity comes at a price. With pure vector search, exact matches are missed. With hybrid search, they are detected.

As I’ve written before in “Architecture Matters More Than Prompts”, choosing hybrid over naive isn’t a performance optimization. It’s an architectural decision that determines what your system can reliably answer.

How hybrid search connects with re-ranking

Below is the pattern that should be followed:

  1. Naive Retrieval. One filter (minScore threshold)

  2. Hybrid Search. Two signals (BM25 + vector)

  3. Re-ranking. Two filters (retrieval + relevance assessment)

Each layer adds discrimination, and each layer forces the system to make a decision: Is this document truly relevant?

With re-ranking on top of hybrid search (coming soon), we have three filtering layers:

  1. BM25 indicates: “Contains the keywords”
  2. Vector indicates: “Is semantically related”
  3. The re-ranker indicates: “Truly answers the question”

Only documents that pass all three filters reach the LLM context. That’s why enterprise RAG systems use all three. It’s not overly complex. It’s the minimum required for reliability.
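The three-filter pipeline can be sketched as follows. This is a hypothetical illustration, not the repository’s HybridSearchService: the predicates stand in for real components (a BM25 match check, a vector-similarity threshold, and an LLM re-ranker verdict, mocked here as a flag), and all candidate data is made up.

```python
def three_layer_filter(candidates, has_keywords, is_similar, judge):
    """Return only the documents that pass all three filters, in order."""
    survivors = []
    for doc in candidates:
        if not has_keywords(doc):   # layer 1: BM25 says "contains the keywords"
            continue
        if not is_similar(doc):     # layer 2: vector says "semantically related"
            continue
        if not judge(doc):          # layer 3: re-ranker says "truly answers it"
            continue
        survivors.append(doc)
    return survivors

candidates = [
    {"id": "d1", "bm25": 5.1, "sim": 0.84, "answers": True},
    {"id": "d2", "bm25": 0.0, "sim": 0.90, "answers": True},   # fails layer 1
    {"id": "d3", "bm25": 3.2, "sim": 0.41, "answers": True},   # fails layer 2
    {"id": "d4", "bm25": 4.0, "sim": 0.78, "answers": False},  # fails layer 3
]

context = three_layer_filter(
    candidates,
    has_keywords=lambda d: d["bm25"] > 0.0,
    is_similar=lambda d: d["sim"] >= 0.6,
    judge=lambda d: d["answers"],
)
# Only d1 survives all three filters and reaches the LLM context.
```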

Why this matters for EU AI compliance

Article 13 of the EU AI Act requires transparency. A system must be able to explain why it retrieved certain documents.

With hybrid search, a suitable explanation is available:

  • “We retrieved this document because it contains the exact keywords you requested (BM25 signal) and is also semantically related to your query (vector signal)”.

If you also add re-ranking:

  • “We retrieved this document using hybrid search and then assessed its relevance using LLM, and it passed the test”.

This is an auditable decision-making process. This is what compliance requires.
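In practice that means persisting both signals per retrieved document. A minimal sketch of such an audit record, with hypothetical field names (not the article’s HybridSearchService API) and made-up scores:

```python
from dataclasses import dataclass, asdict
import datetime
import json

@dataclass
class RetrievalAuditEntry:
    """One auditable retrieval decision: both signals plus the re-ranker verdict."""
    query: str
    document_id: str
    bm25_score: float        # keyword signal
    vector_score: float      # semantic signal
    alpha: float             # balance factor used for this query
    reranker_verdict: str    # "accepted" or "rejected"
    timestamp: str

entry = RetrievalAuditEntry(
    query="What is Article 5 of the GDPR?",
    document_id="gdpr-article-5",
    bm25_score=7.3,
    vector_score=0.82,
    alpha=0.5,
    reranker_verdict="accepted",
    timestamp=datetime.datetime.now(datetime.timezone.utc).isoformat(),
)
# Serialize for the audit log; an auditor can replay why this document won.
print(json.dumps(asdict(entry), indent=2))
```

Recording rejections with the same structure matters just as much: compliance reviews ask why documents were excluded, not only why they were included.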

Pure vector search only allows you to say: “This vector was similar to your query vector”. And that is weaker evidence, as it doesn’t explain why this document is more relevant than other semantically similar documents.

Honest evaluation

Hybrid search is not the ultimate solution. It doesn’t solve the fundamental problem of RAG: if the input data is garbage, the output will be too. If the corpus doesn’t contain the information the user is looking for, hybrid search won’t find it.

What hybrid search does is maximize the probability of finding what does exist. It detects exact matches that vector search wouldn’t. It detects concepts that BM25 wouldn’t. It’s pragmatic, not magical.

What comes next

The next article will cover what happens after retrieval: audit logging as an architectural requirement, not a debugging tool. With hybrid search pulling from two signals, your audit log needs to capture both. With re-ranking filtering candidates, it needs to record what was rejected and why.

That’s when observability becomes compliance infrastructure.

The code

The full implementation is in Module 02 of the reliable-enterprise-ai repository, in the HybridSearchService class.

Conclusion

Naive RAG retrieval uses one signal, hybrid search uses two, and re-ranking adds a third. Each layer increases reliability at the cost of higher latency.

Hybrid search should be used when the corpus combines exact identifiers (item numbers, product codes) with conceptual information.

In enterprise systems, especially for compliance with the EU AI Act, hybrid search is essential. It is the minimum requirement, as it allows us to explain to auditors the reason for retrieving the data.

The advantage is real: increased latency in exchange for higher retrieval quality and complete auditability. This is advantageous when answering compliance questions or making critical decisions.

Author
Raúl Ferrer
Published at
2026-04-06
License
CC BY-NC-SA 4.0


Raúl Ferrer
Software Architect studying Reliable Enterprise AI systems. I document RAG architecture patterns, retrieval failure modes, and EU AI Act compliance requirements for production environments. Author of iOS Architecture Patterns (Apress).
