Beyond Semantic Similarity: The Case for Agentic and Lexical Retrieval

The current trend in AI-driven information retrieval (IR) has leaned heavily toward semantic similarity—using embeddings to find documents that are 'conceptually' related to a query. While this approach solves the problem of synonymy (finding 'dog' when searching for 'canine'), it has introduced a new set of challenges: a lack of explainability, unpredictable results, and a loss of fine-grained control.

Recent discussions around the paper "Beyond Semantic Similarity" suggest a shift in perspective. Instead of relying solely on a black-box embedding space, there is a growing interest in "agentic search," where an LLM acts as an orchestrator that can iteratively refine search terms, use lexical tools (like grep), and navigate a corpus based on discovered context.

The Friction of Semantic Search

For many developers and power users, the move toward purely semantic search has been frustrating. The primary issue is the lack of transparency. When a keyword search fails, it is clear why: the word isn't there. When a semantic search fails or returns irrelevant results, it is often impossible to understand why the embedding model associated the query with a specific document.

As one practitioner noted:

"Semantic search explainability sucks. Keyword is great because you can understand exactly what it is and isn’t finding, and can intelligently iterate on it."

This lack of control mirrors the current frustration with modern web search engines, which often attempt to interpret user intent rather than providing the exact matches requested. In an agentic workflow, the ability to provide precise feedback to a search tool is critical for the agent to converge on the correct answer.

The Power of Lexical Retrieval in Agentic Workflows

While semantic search is often viewed as "modern" and keyword search as "legacy," the latter remains powerful when paired with an LLM. An agent can use a lexical tool such as grep or a BM25-ranked index to find exact matches, then use the resulting context to refine its next query. This iterative loop lets the agent handle synonyms by generating additional search terms from the domain vocabulary it discovers along the way.
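The loop described above can be sketched in a few lines. This is a minimal illustration, not a reference implementation: the corpus is an in-memory dictionary, lexical_search is a plain substring match standing in for grep, and propose_terms is a hypothetical callback standing in for the LLM that reads a hit and suggests new terms.

```python
from typing import Callable, Dict, List


def lexical_search(corpus: Dict[str, str], term: str) -> List[str]:
    """Return ids of documents containing the term (case-insensitive substring match)."""
    needle = term.lower()
    return [doc_id for doc_id, text in corpus.items() if needle in text.lower()]


def agentic_search(
    corpus: Dict[str, str],
    first_term: str,
    propose_terms: Callable[[str], List[str]],  # hypothetical stand-in for an LLM call
    max_turns: int = 3,
) -> List[str]:
    """Search, read the hits, derive new terms from them, and search again."""
    seen_terms = set()
    hits: List[str] = []
    queue = [first_term]
    for _ in range(max_turns):
        if not queue:
            break
        next_queue: List[str] = []
        for term in queue:
            if term in seen_terms:
                continue
            seen_terms.add(term)
            for doc_id in lexical_search(corpus, term):
                if doc_id not in hits:
                    hits.append(doc_id)
                    # Each new hit may reveal vocabulary worth searching for.
                    next_queue.extend(propose_terms(corpus[doc_id]))
        queue = next_queue
    return hits


def stub_propose_terms(text: str) -> List[str]:
    """Toy term generator (assumption): a fixed synonym table instead of a model."""
    synonyms = {"canine": ["dog"], "dog": ["puppy"]}
    lowered = text.lower()
    return [alt for word, alts in synonyms.items() if word in lowered for alt in alts]


example_corpus = {
    "a": "Canine health basics",
    "b": "How to train your dog",
    "c": "Puppy feeding schedule",
    "d": "Cat grooming",
}
found = agentic_search(example_corpus, "canine", stub_propose_terms)
# found == ["a", "b", "c"]: each turn reaches one more synonym.
```

Every step in this loop is inspectable: the list of terms tried and the documents each one matched can be logged directly, which is the explainability property the section argues for.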

Certain domains benefit more from this approach than others. Technical documentation and source code, for instance, have highly precise terminology. In these environments, a fixed-string search is often more reliable than a semantic approximation. Conversely, medical or legal texts—where the same concept can be expressed in dozens of different ways—may still require a semantic first pass to ensure coverage.

Some developers have even found that leveraging existing version control tools can simplify their agent harnesses. By using git grep, git log, or git diff, agents can navigate complex codebases with a level of precision that embedding-based RAG (Retrieval-Augmented Generation) often misses.
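As a rough sketch of what such a tool could look like, the wrapper below shells out to git grep and parses its output into structured hits. The function names are illustrative assumptions; the flags used (-n for line numbers, -I to skip binary files, -F for fixed strings, -e for the pattern) are standard git grep options. The parser assumes file paths contain no colons.

```python
import subprocess
from typing import List, Optional, Tuple


def git_grep_command(pattern: str, pathspec: Optional[str] = None) -> List[str]:
    """Build a fixed-string git grep invocation that prints line numbers."""
    cmd = ["git", "grep", "-n", "-I", "-F", "-e", pattern]
    if pathspec:
        cmd += ["--", pathspec]
    return cmd


def parse_grep_output(output: str) -> List[Tuple[str, int, str]]:
    """Parse 'path:line:text' lines into (path, line_number, text) tuples."""
    hits = []
    for line in output.splitlines():
        parts = line.split(":", 2)
        if len(parts) == 3 and parts[1].isdigit():
            hits.append((parts[0], int(parts[1]), parts[2]))
    return hits


def git_grep(repo_dir: str, pattern: str, pathspec: Optional[str] = None):
    """Run git grep inside repo_dir and return the parsed hits."""
    result = subprocess.run(
        git_grep_command(pattern, pathspec),
        cwd=repo_dir,
        capture_output=True,
        text=True,
    )
    # git grep exits with status 1 when nothing matches; treat that as "no hits".
    if result.returncode not in (0, 1):
        raise RuntimeError(result.stderr.strip())
    return parse_grep_output(result.stdout)
```

Returning file paths and line numbers gives the agent something concrete to act on in its next turn, such as opening the file around the match.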

Practical Constraints: Latency, Scale, and Language

Despite the theoretical appeal of agentic lexical search, several production hurdles remain:

1. Latency and Predictability

Iterative search is inherently slower than a single vector lookup. A vector database typically returns results in milliseconds, whereas an agent that "wanders" through a corpus pays for at least one LLM call and one tool call per turn, so its total latency grows with the number of turns and is hard to predict. For enterprise systems requiring sub-five-second response times, this approach may be too slow unless it is limited to background tasks.
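One way to make the latency more predictable is to cap the loop with both a turn limit and a wall-clock budget. The sketch below assumes a hypothetical run_turn callback that performs one search turn and reports whether the agent is satisfied. The deadline is checked only between turns, so a turn already in progress is not interrupted.

```python
import time
from typing import Callable, List, Tuple


def search_with_deadline(
    run_turn: Callable[[int], Tuple[List[str], bool]],  # hypothetical: one agent turn
    budget_seconds: float,
    max_turns: int = 10,
    clock: Callable[[], float] = time.monotonic,
) -> Tuple[List[str], int]:
    """Run search turns until done, out of turns, or out of time."""
    deadline = clock() + budget_seconds
    results: List[str] = []
    turns = 0
    while turns < max_turns and clock() < deadline:
        new_results, done = run_turn(turns)
        results.extend(new_results)
        turns += 1
        if done:
            break
    return results, turns
```

The clock parameter exists so the budget logic can be exercised with a fake clock instead of real sleeps.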

2. The Scale Problem

Searching a small local corpus with grep is fast. Searching hundreds of gigabytes of enterprise data is a different story. Without a distributed filesystem or a highly optimized index, the cost of data egress and the time required to scan massive datasets can make raw lexical search prohibitive.
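To illustrate why an index changes the cost profile, here is a toy inverted index. It is only a sketch under simple assumptions (whitespace-like tokenization, everything in memory); production systems use far more elaborate structures. The point is that a query touches only the posting sets for its terms rather than scanning every document.

```python
import re
from collections import defaultdict
from typing import Dict, Set

TOKEN = re.compile(r"\w+")


def build_index(corpus: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each lowercase token to the set of document ids that contain it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for doc_id, text in corpus.items():
        for token in TOKEN.findall(text.lower()):
            index[token].add(doc_id)
    return index


def lookup(index: Dict[str, Set[str]], query: str) -> Set[str]:
    """Return ids of documents containing every token of the query."""
    tokens = TOKEN.findall(query.lower())
    if not tokens:
        return set()
    postings = [index.get(token, set()) for token in tokens]
    return set.intersection(*postings)
```

The scan cost is paid once at build time; each subsequent lookup is proportional to the size of the posting sets involved, not to the size of the corpus.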

3. Cross-Language Retrieval

Semantic embeddings are often superior for cross-language retrieval (e.g., querying in English to find a document in Hindi). Lexical search relies on exact string matches, so it fails completely in these scenarios unless a translation layer is added: an English query term and its Hindi equivalent written in Devanagari have no characters in common.

Emerging Patterns: Hybrid Search and Map-Reduce

To balance these trade-offs, the industry is converging on hybrid models. The most robust production systems typically employ a multi-stage pipeline:

1. Initial Retrieval: A combination of BM25 (lexical) and embedding-based (semantic) search to cast a wide but relevant net.
2. Reranking: Using a more expensive "LLM-as-judge" or a cross-encoder to refine the top results for precision.
3. Agentic Refinement: Allowing an agent to iteratively query the index if the initial results are insufficient.
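The initial retrieval stage can be sketched as follows. The text does not say how the lexical and semantic result lists are combined, so this example assumes reciprocal rank fusion (RRF), one common choice. The BM25 scorer is a compact textbook version with a non-negative IDF variant, the semantic ranking is supplied by the caller as a plain list (no embedding model is involved), and the reranking and agentic stages are omitted.

```python
import math
from collections import Counter
from typing import Dict, List


def bm25_rank(
    docs: Dict[str, List[str]], query: List[str], k1: float = 1.5, b: float = 0.75
) -> List[str]:
    """Rank pre-tokenized documents against query tokens with BM25."""
    n_docs = len(docs)
    avg_len = sum(len(tokens) for tokens in docs.values()) / n_docs
    doc_freq = Counter(term for tokens in docs.values() for term in set(tokens))
    scores: Dict[str, float] = {}
    for doc_id, tokens in docs.items():
        tf = Counter(tokens)
        score = 0.0
        for term in query:
            if tf[term] == 0:
                continue
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1)
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        if score > 0:
            scores[doc_id] = score
    return sorted(scores, key=lambda d: scores[d], reverse=True)


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Fuse several rankings: each document earns 1 / (k + rank) per list."""
    fused: Dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(fused, key=lambda d: fused[d], reverse=True)
```

RRF uses only rank positions, so it sidesteps the problem that BM25 scores and embedding similarities live on different scales. The fused list would then be passed to the reranking stage.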

There is also interest in reviving the Map-Reduce pattern for IR. By mapping shards of a corpus to different agents and reducing their findings, systems can achieve high localization and coverage, iteratively traversing the corpus until the desired information is found.
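A single pass of this pattern might look like the sketch below. It makes strong simplifying assumptions: each "agent" is replaced by a plain substring search over its shard, shards are in-memory dictionaries, and the iterative traversal is left out. Only the map and reduce steps are shown.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List


def map_shard(shard: Dict[str, str], term: str) -> List[str]:
    """Map step: search one shard (stand-in for a per-shard agent)."""
    needle = term.lower()
    return [doc_id for doc_id, text in shard.items() if needle in text.lower()]


def reduce_hits(per_shard_hits: List[List[str]]) -> List[str]:
    """Reduce step: merge per-shard findings, dropping duplicates, keeping order."""
    merged: List[str] = []
    seen = set()
    for hits in per_shard_hits:
        for doc_id in hits:
            if doc_id not in seen:
                seen.add(doc_id)
                merged.append(doc_id)
    return merged


def map_reduce_search(
    shards: List[Dict[str, str]], term: str, max_workers: int = 4
) -> List[str]:
    """Search all shards concurrently, then merge the results."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as the input shards.
        per_shard = list(pool.map(lambda shard: map_shard(shard, term), shards))
    return reduce_hits(per_shard)
```

In a real system the map step would be an LLM-driven search over each shard, and the reduced findings would seed the next round of queries.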

Conclusion

The move "beyond semantic similarity" isn't about abandoning embeddings, but about recognizing that they are one tool among many. For high-precision tasks, especially in technical domains, the transparency and control of lexical search—when orchestrated by an intelligent agent—often outperform the "black box" of semantic similarity.
