
Subquadratic and the Quest for the 12 Million Token Context Window

May 10, 2026

The landscape of Large Language Models (LLMs) has long been shaped by the "context window", the limit on how much information a model can process in a single prompt. While industry leaders have pushed this limit to 1 million or 2 million tokens, a new entrant, Subquadratic, claims to have shattered that ceiling with a 12 million token window.

This leap represents a potential paradigm shift in how AI handles massive datasets, allowing for the ingestion of entire codebases, thousands of pages of documentation, or exhaustive legal archives without the need for aggressive RAG (Retrieval-Augmented Generation) or complex chunking strategies.

The Promise of Subquadratic

The core value proposition of Subquadratic is the ability to maintain coherence and retrieval accuracy across a massive 12M token span. In practical terms, this would allow developers to feed an entire enterprise-scale project into a model, enabling the AI to understand deep architectural dependencies that are often lost when using smaller windows or fragmented retrieval methods.

Some see this as a fundamental evolution in data compression and processing for AI. As one observer noted, the impact of such a breakthrough could be compared to the introduction of the JPG format for images: a way to handle vast amounts of data more efficiently without sacrificing the essential utility of the output.

Technical Skepticism and the "Black Box" Problem

Despite the ambitious claims, the technical community remains cautious. The primary point of contention is the absence of a formal technical report, whitepaper, or open model weights. In AI research, claims of this magnitude are typically accompanied by results from a "needle-in-a-haystack" test, which checks whether a model can retrieve a single fact buried at an arbitrary position in a very long input, or by a detailed explanation of the attention mechanism used to achieve subquadratic scaling.
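To make the kind of evidence being asked for concrete, here is a minimal sketch of how a needle-in-a-haystack prompt could be assembled and scored. The function names, the filler text, and the substring-based scoring are illustrative assumptions, not any vendor's published benchmark, and the call to an actual model is deliberately left out.

```python
def build_haystack(filler: str, needle: str, n_sentences: int, depth: float) -> str:
    """Bury `needle` among copies of `filler` at a relative depth.

    depth = 0.0 places the needle first, depth = 1.0 places it last.
    """
    if not 0.0 <= depth <= 1.0:
        raise ValueError("depth must be between 0.0 and 1.0")
    position = round(depth * n_sentences)
    sentences = [filler] * n_sentences
    sentences.insert(position, needle)
    return " ".join(sentences)


def needle_recalled(model_answer: str, expected: str) -> bool:
    """Crude pass/fail check (an assumption): the answer contains the expected fact."""
    return expected.lower() in model_answer.lower()


if __name__ == "__main__":
    prompt = build_haystack(
        filler="The sky was clear that day.",
        needle="The secret code is 7421.",
        n_sentences=1000,
        depth=0.5,
    )
    # A real evaluation would send `prompt` plus a question to the model under
    # test and repeat over many depths and context lengths.
    print(len(prompt.split()), "words in the haystack")
```

A published report would typically show the pass rate as a grid over context length and needle depth, which is the sort of result skeptics are waiting for here.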

Community members on Hacker News have expressed significant skepticism, with several users stating they will only believe the claims once a model card or peer-reviewed paper is released. The lack of transparency is often attributed to the nature of VC-funded startups, where proprietary advantages are guarded closely to maintain investment valuation.

Speculations on the Underlying Architecture

While Subquadratic has not disclosed its methodology, experts speculate that the technology likely relies on advanced attention mechanisms. One theory suggests the use of native sparse attention with content-based granularity, similar to approaches seen in DeepSeek's architectures. Standard self-attention compares every token with every other token, so its computational cost grows quadratically with sequence length: going from 1M to 12M tokens multiplies the work by 144, not 12. To handle a 12M token load, Subquadratic may be using a method whose cost grows linearly or near-linearly (for example, O(n log n)) instead.
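The scaling argument can be checked with simple arithmetic. The sketch below counts query-key score entries for dense attention and for a sparse scheme in which each query attends to a fixed budget of keys. The budget of 4,096 keys is an arbitrary illustrative assumption and says nothing about what Subquadratic actually does.

```python
def dense_attention_pairs(n_tokens: int) -> int:
    """Score entries in full self-attention: every query against every key."""
    return n_tokens * n_tokens


def sparse_attention_pairs(n_tokens: int, keys_per_query: int) -> int:
    """Score entries when each query attends to at most `keys_per_query` keys."""
    return n_tokens * min(keys_per_query, n_tokens)


if __name__ == "__main__":
    for n in (1_000_000, 12_000_000):
        dense = dense_attention_pairs(n)
        sparse = sparse_attention_pairs(n, keys_per_query=4096)  # assumed budget
        print(f"{n:>12,} tokens: dense={dense:.2e}  sparse={sparse:.2e}")
```

Under these assumptions, a 12x longer input costs 144x more with dense attention but only 12x more with the fixed-budget scheme, which is why some form of sparsity or linear-time mechanism is the usual guess for context windows of this size.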

The Practicality of Massive Context

Beyond technical feasibility, there is debate over whether a 12M token window is actually necessary for most use cases. For instance, users of tools like Claude Code have noted that 1 million tokens is often sufficient for the majority of coding tasks.

This raises a critical question for the future of LLM design: is the goal to keep expanding the context window indefinitely, or to improve how efficiently models manage the context they already have? Potential alternatives to massive windows include:

  • Context Compaction: Summarizing previous interactions to save space.
  • Memory Tools: Implementing external databases that the model can query dynamically.
  • Dynamic Windowing: Adjusting the context size based on the complexity of the task.
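As one illustration of the first alternative, here is a minimal sketch of context compaction: the newest turns are kept verbatim while they fit a token budget, and everything older is folded into a single summary entry. The whitespace-based token count and the placeholder summarizer are simplifying assumptions. A real system would use the model's tokenizer and a model-generated summary, and would also count the summary against the budget.

```python
def count_tokens(text: str) -> int:
    """Rough stand-in for a tokenizer (assumption): whitespace-separated words."""
    return len(text.split())


def compact(turns: list[str], budget: int, summarize=None) -> list[str]:
    """Keep the newest turns that fit in `budget`; fold older ones into one summary."""
    if summarize is None:
        # Placeholder summarizer: a real system would call a model here.
        summarize = lambda old: f"[summary of {len(old)} earlier turns]"
    kept: list[str] = []
    used = 0
    # Walk backwards from the most recent turn until the budget is exhausted.
    for turn in reversed(turns):
        cost = count_tokens(turn)
        if used + cost > budget:
            break
        kept.append(turn)
        used += cost
    kept.reverse()
    older = turns[: len(turns) - len(kept)]
    return ([summarize(older)] if older else []) + kept


if __name__ == "__main__":
    history = ["a b c", "d e", "f g h i", "j"]
    print(compact(history, budget=5))
```

The design choice here is to favor recency: recent turns stay exact, and older material degrades to a summary instead of being dropped outright.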

Conclusion

Subquadratic's claim of a 12 million token window is a bold assertion that could redefine the boundaries of AI memory. However, until the company moves beyond marketing claims and provides a technical foundation for its achievements, it remains a speculative milestone. The tension between proprietary corporate secrecy and the scientific need for verification continues to be a defining characteristic of the current AI arms race.
