Beyond Statelessness: Why LLMs Are Challenging Traditional System Design
For two decades, the blueprint for scalable web architecture has been remarkably consistent: state lives in the database, and compute is stateless. In this model, any request can hit any server behind a load balancer, and the database serves as the absolute single source of truth. This design allowed the internet to scale to billions of users by treating application servers as disposable commodities.
However, the emergence of Large Language Models (LLMs) and autonomous agents is beginning to fracture this long-standing assumption. While the "stateless compute" model works for traditional CRUD applications, it is increasingly ill-suited for the requirements of agentic AI.
The Three Fractures of Stateless Architecture
Agentic workflows introduce three specific challenges that traditional stateless request-response cycles cannot efficiently handle:
- Long-Running Work: An AI agent performing a complex task—such as researching a topic and writing a report—might take ten minutes to complete. This is no longer a "request" in the HTTP sense; it is a long-running asynchronous process.
- Stateful Compute: Agents rely on accumulated context, memory of previous turns in a conversation, and the results of multiple tool calls. This state is not merely "database state" (like a user profile); it is the active memory of a running process.
- Bi-directional Interaction: Users don't just want a final answer; they want to watch an agent "think," interrupt its progress, or redirect its trajectory in real-time. This transforms the interaction from a query to a stateless API into a live conversation with a running process.
The Routing Problem: Why Polling Isn't Enough
To handle the execution part of this problem, the industry has turned to "durable execution" frameworks (such as Temporal, Inngest, or Restate). These tools make the process resilient, ensuring that if a server crashes, the workflow can resume from where it left off.
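The resume-after-crash idea can be illustrated with a minimal sketch. It is not how Temporal, Inngest, or Restate are implemented; the `Journal` class, the JSON file, and the step names are assumptions made for illustration. Each completed step is recorded, so a restarted workflow replays recorded results instead of redoing the work.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable


class Journal:
    """Hypothetical record of completed step results, persisted as JSON."""

    def __init__(self, path: Path) -> None:
        self.path = path
        self.results: dict[str, object] = (
            json.loads(path.read_text()) if path.exists() else {}
        )

    def step(self, name: str, fn: Callable[[], object]) -> object:
        # A step that already completed is replayed from the journal,
        # so a resumed workflow does not execute it again.
        if name in self.results:
            return self.results[name]
        result = fn()
        self.results[name] = result
        self.path.write_text(json.dumps(self.results))
        return result


calls: list[str] = []  # tracks which steps actually ran


def research() -> str:
    calls.append("research")
    return "notes"


def write_report(notes: str) -> str:
    calls.append("write")
    return f"report from {notes}"


def workflow(journal: Journal, crash_after_research: bool = False) -> str:
    notes = journal.step("research", research)
    if crash_after_research:
        raise RuntimeError("simulated server crash")
    return journal.step("write", lambda: write_report(notes))


journal_path = Path(tempfile.mkdtemp()) / "workflow-x.json"

try:
    workflow(Journal(journal_path), crash_after_research=True)
except RuntimeError:
    pass  # the "server" died after the first step was journaled

# A fresh Journal instance stands in for a new process picking up the work.
final = workflow(Journal(journal_path))
```

After the second run, `calls` contains `"research"` only once: the expensive first step was not repeated.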
But durable execution solves the process problem, not the interaction problem. Because standard load balancers and HTTP cannot route a request to a specific running process, developers often fall back to polling: the client repeatedly queries a database-backed endpoint to see whether the durable process has written an update.
This essentially treats the database as a message bus, a workaround for a fundamental routing deficiency. Polling adds latency (an update is only seen at the next poll), increases database load (every poll is a read, whether or not anything changed), and makes for a poor user experience when streaming data.
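A rough simulation makes the latency and load costs concrete. The update times, polling interval, and function name below are made-up values for illustration, not measurements.

```python
def simulate_polling(
    update_times_ms: list[int], interval_ms: int, duration_ms: int
) -> tuple[int, float]:
    """Return (reads issued, mean delay in ms before an update is seen)."""
    reads = 0
    delays: list[int] = []
    pending = sorted(update_times_ms)
    for now in range(interval_ms, duration_ms + 1, interval_ms):
        reads += 1  # every poll costs a read, even when nothing changed
        while pending and pending[0] <= now:
            delays.append(now - pending.pop(0))
    mean_delay = sum(delays) / len(delays) if delays else 0.0
    return reads, mean_delay


# Three updates over ten seconds, polled once per second (hypothetical numbers).
reads, mean_delay = simulate_polling(
    [250, 1300, 4100], interval_ms=1000, duration_ms=10_000
)
```

In this scenario the client issues 10 reads to pick up 3 updates, and each update waits several hundred milliseconds before it is noticed. Shortening the interval reduces the delay but raises the read count proportionally.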
Seeking a New Routing Primitive
To move beyond polling, we need a routing primitive that can address a process rather than just a server or a database. The goal is to be able to say: "Deliver this message to whoever is currently producing output for workflow X," without needing to know the specific server replica or IP address.
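One way to picture such a primitive is a lookup keyed by workflow ID rather than by server. The sketch below is an in-memory toy, and `WorkflowRouter`, `claim`, and `deliver` are hypothetical names. The sender only names the workflow, and whichever replica most recently claimed it receives the message.

```python
from typing import Callable

Handler = Callable[[str], None]


class WorkflowRouter:
    """Hypothetical router: maps a workflow ID to its current owner."""

    def __init__(self) -> None:
        self._owners: dict[str, Handler] = {}

    def claim(self, workflow_id: str, handler: Handler) -> None:
        # The latest claimant wins, e.g. after the workflow resumes elsewhere.
        self._owners[workflow_id] = handler

    def deliver(self, workflow_id: str, message: str) -> bool:
        handler = self._owners.get(workflow_id)
        if handler is None:
            return False  # nobody is currently running this workflow
        handler(message)
        return True


replica_a: list[str] = []  # stand-ins for the inboxes of two server replicas
replica_b: list[str] = []

router = WorkflowRouter()
router.claim("workflow-x", replica_a.append)
router.deliver("workflow-x", "focus on pricing")

# The workflow moves to another replica; the sender's code does not change.
router.claim("workflow-x", replica_b.append)
router.deliver("workflow-x", "stop")
```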
Why WebSockets Aren't the Answer
WebSockets provide bi-directional communication, but a WebSocket is a connection, not an address. If the connection drops, perhaps because a user enters a tunnel, the route to the process is lost with it. A new connection goes back through the load balancer and may land on a different server, and there is no native way to reach the exact same process state without an external routing mechanism.
The Case for Pub/Sub Channels
A more robust solution is the use of named pub/sub channels. In this model, neither the client nor the server process is the address; the transport channel is the address. Both the client and the server connect to a named channel. If the connection drops, the client simply reconnects to the same named channel and resumes the stream of data.
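The reconnect-and-resume behavior can be sketched with an in-memory channel. This toy assumes the channel retains its messages and that the client remembers how many it has consumed; real pub/sub systems differ in whether and how they keep history, and the `Broker`, `Channel`, and `Client` classes here are hypothetical.

```python
class Channel:
    """A named channel that keeps its messages so readers can resume."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._log: list[str] = []

    def publish(self, message: str) -> None:
        self._log.append(message)

    def read_from(self, offset: int) -> list[str]:
        return self._log[offset:]


class Broker:
    def __init__(self) -> None:
        self._channels: dict[str, Channel] = {}

    def channel(self, name: str) -> Channel:
        # Both sides look the channel up by name; neither knows the
        # other's server or IP address.
        if name not in self._channels:
            self._channels[name] = Channel(name)
        return self._channels[name]


class Client:
    def __init__(self, broker: Broker, channel_name: str) -> None:
        self.broker = broker
        self.channel_name = channel_name
        self.next_offset = 0
        self.received: list[str] = []

    def sync(self) -> None:
        # Fetch everything published since the last message we saw.
        channel = self.broker.channel(self.channel_name)
        messages = channel.read_from(self.next_offset)
        self.received.extend(messages)
        self.next_offset += len(messages)


broker = Broker()
agent_out = broker.channel("workflow-x")
client = Client(broker, "workflow-x")

agent_out.publish("The")
agent_out.publish(" quick")
client.sync()  # connected and up to date

# The connection drops here; the agent keeps publishing.
agent_out.publish(" brown")
agent_out.publish(" fox")

client.sync()  # reconnect to the same name and resume from the stored offset
text = "".join(client.received)
```

The client ends up with every token exactly once, without the agent knowing the client ever disconnected.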
By combining durable execution with durable transport (pub/sub), developers can create agentic applications where the workflow is resilient and the communication is seamless, eliminating the need to thread every token through a database for the sake of resilience.
Counterpoints and Industry Perspectives
The shift toward stateful routing is not without its critics. Some engineers argue that these "problems" are not new. Long-running jobs, webhooks, and job IDs have been used for decades to handle asynchronous work. Critics suggest that the "single source of truth" in the database is a proven pattern that should remain the core of system design.
Others point to existing technologies that already solve these issues, such as:
- Virtual Actors: The actor model (seen in Elixir, and in its virtual-actor form in Microsoft Orleans) provides a way to address specific stateful entities.
- Durable Objects: Cloudflare's Durable Objects provide a way to coordinate state and compute in a single location.
- Session Pinning: A traditional method to ensure a client stays connected to a specific server.
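Of these, session pinning is the simplest to sketch. One common approach is to hash a session identifier to choose a server, so the same session always maps to the same machine. The server names and the `pick_server` function are hypothetical.

```python
import hashlib


def pick_server(session_id: str, servers: list[str]) -> str:
    """Deterministically map a session ID to one server in the pool."""
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]


servers = ["app-1", "app-2", "app-3"]  # hypothetical server pool
first = pick_server("session-42", servers)
second = pick_server("session-42", servers)
```

The mapping is stable only while the pool is stable. With this modulo scheme, adding or removing a server can remap existing sessions, and any state held in the old server's memory does not follow them.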
Why LLMs Make This Urgent
If these patterns existed before, why does it matter now? The difference lies in the nature of LLMs: they are non-deterministic and expensive.
In a traditional system, if a connection drops, you can often retry the request and get the same result. With LLMs, retrying a request might produce a different answer and cost more tokens. You cannot afford to waste expensive compute or lose the state of a complex, non-deterministic chain of thought simply because of a network flicker.
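The cost side of this argument can be put in rough numbers. The sketch below assumes that a dropped connection without resumable transport forces generation to restart from zero, and that tokens produced before the drop are still paid for. The token counts are invented for illustration.

```python
def tokens_generated(
    total_tokens: int, drop_points: list[int], resumable: bool
) -> int:
    """Count tokens the model produces for one answer of `total_tokens`.

    `drop_points` lists how many tokens had been generated each time the
    client connection dropped.
    """
    if resumable:
        # Generation continues server-side; the client catches up later.
        return total_tokens
    # Each drop discards the partial run and restarts from scratch.
    return sum(drop_points) + total_tokens


# A 2,000-token answer with two connection drops (hypothetical numbers).
with_restart = tokens_generated(2000, [1500, 400], resumable=False)
with_resume = tokens_generated(2000, [1500, 400], resumable=True)
```

Under these assumptions the restart path generates 3,900 tokens to deliver a 2,000-token answer, and because the model is non-deterministic, the final answer may not match the partial output the user already saw.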
LLMs have not invented these problems, but they have amplified them, making the trade-offs of the 20-year-old stateless web architecture more painful and visible than ever before.