Semble: Optimizing Code Search for AI Agents with 98% Token Reduction

For developers building AI coding agents, the "grep-and-read" cycle is a notorious bottleneck. When an agent needs to understand a codebase, it typically greps for a keyword, identifies a file, and then reads the entire file into its context window. This process is not only slow but incredibly wasteful, often consuming thousands of tokens on irrelevant code just to find a single function definition.

Semble is a code search library designed to break this cycle. By providing agents with a way to perform natural-language and semantic queries that return only the most relevant code chunks, Semble claims to reduce token usage by up to 98% compared to traditional grep-based exploration.

The Architecture of Semble

Unlike heavyweight transformer-based search engines, Semble is designed for speed and local execution. It runs entirely on the CPU with no requirement for GPUs, API keys, or external services.

How it Works

Semble employs a hybrid retrieval strategy to ensure both precision and recall:

- Code-Aware Chunking: Using the Chonkie library, Semble splits files into chunks that respect the logical structure of the code.
- Hybrid Retrieval: It combines two complementary methods:
  - Semantic Search: Utilizes static Model2Vec embeddings (via the potion-code-16M model) for conceptual similarity.
  - Lexical Search: Uses BM25 for exact matches on identifiers and API names.
- Reciprocal Rank Fusion (RRF): The results from both retrievers are fused to create a unified ranking.
- Code-Aware Reranking: The final results are refined using several specific signals:
  - Adaptive Weighting: Symbol-like queries (e.g., getUserById) prioritize lexical matches, while natural language queries remain balanced.
  - Definition Boosts: Chunks that define a class or function are ranked higher than those that simply reference it.
  - Identifier Stemming: Query tokens are stemmed to match variations like parseConfig and ConfigParser.
  - Noise Penalties: Test files, legacy shims, and declaration stubs are down-ranked to surface canonical implementations first.
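
The fusion and adaptive-weighting steps can be sketched in a few lines of Python. This is a minimal illustration, not Semble's actual implementation: the function names, the k=60 constant (a common default for RRF), the symbol-detection heuristic, and the 2x lexical weight are all assumptions made for the example.

```python
import re
from collections import defaultdict


def looks_like_symbol(query: str) -> bool:
    """Hypothetical heuristic: one token containing camelCase, '_', '.' or '::'."""
    q = query.strip()
    if not q or " " in q:
        return False
    return bool(re.search(r"[a-z][A-Z]|_|\.|::", q))


def rrf_fuse(semantic, lexical, query, k=60):
    """Fuse two ranked lists of chunk ids with weighted Reciprocal Rank Fusion.

    Each list is ordered best-first. A chunk's fused score is the sum of
    weight / (k + rank) over the lists it appears in.
    """
    # Assumed weights: favor the lexical ranking for symbol-like queries.
    w_lex = 2.0 if looks_like_symbol(query) else 1.0
    w_sem = 1.0

    scores = defaultdict(float)
    for weight, ranking in ((w_sem, semantic), (w_lex, lexical)):
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += weight / (k + rank)

    # Highest score first; ties are broken by chunk id for stable output.
    return sorted(scores, key=lambda c: (-scores[c], c))


if __name__ == "__main__":
    semantic_hits = ["a", "b", "c"]  # toy chunk ids from the embedding retriever
    lexical_hits = ["c", "a", "d"]   # toy chunk ids from BM25
    print(rrf_fuse(semantic_hits, lexical_hits, "how is config parsed"))
    print(rrf_fuse(semantic_hits, lexical_hits, "getUserById"))
```

In this toy run the natural-language query keeps the two retrievers balanced, while the symbol-like query pushes the top BM25 hit to the front. The definition boosts and noise penalties described above would be applied as further adjustments to these fused scores.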

Performance and Benchmarks

Semble's primary value proposition is the combination of speed and accuracy. According to the project's benchmarks, it achieves an NDCG@10 of 0.854, nearly identical to that of much larger transformer-based setups (such as CodeRankEmbed Hybrid), at drastically lower latency.
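
For readers unfamiliar with the metric, NDCG@10 rewards rankings that place relevant results near the top of the first ten hits. The sketch below shows the standard formula with linear gain. It simplifies by computing the ideal ordering from the retrieved list only, and it is not taken from Semble's benchmark harness.

```python
import math


def dcg(relevances, k=10):
    """Discounted cumulative gain over the first k results (linear gain)."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))


def ndcg_at_k(ranked_relevances, k=10):
    """NDCG@k for a list of relevance grades in the order they were returned.

    Simplification: the ideal ranking is built from the same list, so relevant
    items that were never retrieved are not counted against the score.
    """
    ideal_dcg = dcg(sorted(ranked_relevances, reverse=True), k)
    if ideal_dcg == 0:
        return 0.0
    return dcg(ranked_relevances, k) / ideal_dcg


if __name__ == "__main__":
    # One relevant chunk returned in second place instead of first.
    print(round(ndcg_at_k([0, 1, 0]), 3))
```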

- Indexing Speed: An average repository can be indexed in ~250ms.
- Query Latency: Queries typically resolve in ~1.5ms.
- Token Efficiency: The project reports that Semble can reach 94% recall using only 2k tokens, whereas a grep+read approach would require a 100k context window to reach only 85% recall.
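
One plausible reading of the headline 98% figure is the ratio between these two token budgets. The source does not state this derivation explicitly, so treat the arithmetic below as an inference. The comparison is also not strictly like-for-like, since the two approaches reach different recall levels.

```python
def token_reduction(baseline_tokens: int, tool_tokens: int) -> float:
    """Fraction of tokens saved relative to the baseline budget."""
    return 1 - tool_tokens / baseline_tokens


if __name__ == "__main__":
    # 2k tokens with Semble versus a 100k grep+read budget, as reported above.
    print(f"{token_reduction(100_000, 2_000):.0%}")  # prints 98%
```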

Integration and Workflow

Semble is designed to be a "drop-in" tool for modern agent harnesses. It can be integrated in two primary ways:

1. Model Context Protocol (MCP) Server

For agents that support MCP (such as Claude Code, Cursor, or Codex), Semble can be run as a server. This allows the agent to call search and find_related tools directly. Repositories are cloned and indexed on demand, and local paths are watched for automatic re-indexing.

2. Bash Integration

For sub-agents or CLI-based harnesses, a short instruction can be added to AGENTS.md or CLAUDE.md that tells the agent to run semble search "query" ./path instead of relying on grep.
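
A harness written in Python could wrap the same command. The sketch below uses only the invocation form quoted above, semble search "query" ./path. The helper names are hypothetical, and the output format is not documented here, so the wrapper returns raw stdout without trying to parse it.

```python
import shutil
import subprocess


def build_semble_command(query: str, path: str = ".") -> list:
    """Build the argument list for: semble search "query" ./path"""
    return ["semble", "search", query, path]


def semble_search(query: str, path: str = "."):
    """Run the search and return raw stdout, or None if semble is unavailable.

    Assumption: the semble executable is on PATH. No extra flags are passed
    because none are described in this article.
    """
    if shutil.which("semble") is None:
        return None
    result = subprocess.run(
        build_semble_command(query, path),
        capture_output=True,
        text=True,
        check=False,  # leave error handling to the caller
    )
    return result.stdout


if __name__ == "__main__":
    print(build_semble_command("where is auth handled", "./src"))
```

Passing the query as a separate list element avoids shell quoting problems when the query contains spaces or quotes.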

Community Perspectives and Critical Analysis

While the benchmarks are impressive, the Hacker News community raised several critical points regarding the practical application of semantic search in agentic workflows.

The "Trust" Problem

One significant concern is whether LLMs will actually trust the results of a semantic search tool. As one user (@jerezzprime) noted, many models are heavily RL-tuned for grep. If a model doesn't trust a semantic result, it may still perform a grep or re-read the file anyway, nullifying the token savings.

Probabilistic vs. Deterministic Search

Unlike grep, which deterministically returns every literal match, Semble returns a ranked, similarity-based result set, which commenters described as probabilistic. Some users expressed concern that a small embedding model might miss critical, obscure identifiers that a literal string search would find instantly.

The "Cognitive Deficit" Hypothesis

A more philosophical critique suggests that by providing a "shortcut" tool, we might be reducing the agent's cognitive capability. Some argue that the agent's ability to navigate a codebase using tree and grep is a core part of its reasoning process, and replacing that with a black-box retrieval tool could lead to a net negative in "deployable intelligence" over long-horizon tasks.

Real-World Testing

Despite these critiques, some users reported positive results. One user (@aadishv) conducted a side-by-side test and found that while the non-Semble version was sometimes more detailed, the Semble version was consistently more context-efficient and cost-effective for specific tracing tasks.

Conclusion

Semble represents a shift toward "agent-native" code search. By moving away from the expensive and imprecise process of reading entire files, it lets agents operate with a much smaller context footprint. While the debate continues over whether probabilistic search can fully replace deterministic grep, the reported reduction in token overhead makes Semble a compelling option for anyone scaling AI-driven development.
