OpenClaw Issue Digest: Session Lane Starvation and Critical Gateway Stability
Open Issues
Recent activity in the OpenClaw repository reveals several critical stability and performance problems, particularly in session management and gateway reliability. The primary concern is "session lane starvation," where background work such as followup drains and compaction cycles monopolizes the session lane and blocks inbound user messages for up to 30 minutes (#54488). Gateway memory problems compound this: some users report RSS growing from 389MB to nearly 15GB over four days (#54155), and others report OOM crashes during filesystem scans (#57349).
On the integration front, the Discord and Telegram channels are facing delivery and routing regressions. Discord users report that outbound attachments are silently dropped despite success codes (#53641), and that messages are lost during WebSocket reconnect windows (#56610). In Telegram, a critical bug causes callback_query events from inline buttons to be ignored, or leads the agent to hallucinate a result, instead of triggering the configured HTTP tools (#54909).
Finally, the ACP (Agent Client Protocol) and subagent workflows are experiencing "stuck" states. Parent sessions often remain unresponsive after a child subagent completes until the UI is manually refreshed (#52249), and there is a documented failure in the direct announce flow for stale requester wake paths (#83699).
Key Themes
1. Session Lane & Resource Contention
A recurring problem is "lane blocking": because each session processes its work sequentially, one long-running task delays everything queued behind it.
- Lane Starvation: System events and compaction cycles are processed in the same lane as user messages, delaying inbound messages by up to 30 minutes (#54488).
- Resource Leaks: Unbounded growth in sessions.json due to duplicated skillsSnapshot data is leading to gateway OOMs (#55334).
- Concurrency Gaps: Users are requesting concurrent message handling within a single session to prevent long-running tasks from blocking quick follow-up questions (#56880).
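One way to picture the decoupling requested in #54488 is a per-session scheduler with two queues, where maintenance work is only picked up when no user message is waiting. The following is a minimal TypeScript sketch under that assumption; `LaneScheduler`, `WorkKind`, and the two-queue layout are hypothetical names chosen for illustration, not OpenClaw's actual internals.

```typescript
// Hypothetical sketch: not OpenClaw's real scheduler.
type WorkKind = "user" | "maintenance";

interface WorkItem {
  kind: WorkKind;
  label: string;
}

class LaneScheduler {
  private userQueue: WorkItem[] = [];
  private maintenanceQueue: WorkItem[] = [];

  enqueue(item: WorkItem): void {
    if (item.kind === "user") {
      this.userQueue.push(item);
    } else {
      this.maintenanceQueue.push(item);
    }
  }

  // Returns the next item to run, or undefined when both queues are empty.
  // Waiting user messages always go first, so queued compaction or
  // followup drains cannot hold back an inbound message.
  next(): WorkItem | undefined {
    return this.userQueue.shift() ?? this.maintenanceQueue.shift();
  }
}

const lane = new LaneScheduler();
lane.enqueue({ kind: "maintenance", label: "compaction" });
lane.enqueue({ kind: "maintenance", label: "followup-drain" });
lane.enqueue({ kind: "user", label: "inbound message" });

const order: string[] = [];
let item = lane.next();
while (item !== undefined) {
  order.push(item.label);
  item = lane.next();
}
console.log(order.join(" -> "));
// prints "inbound message -> compaction -> followup-drain"
```

This sketch only reorders queued work. It does not interrupt a maintenance task that is already running, and strict priority could starve maintenance under constant user traffic, so a real fix would likely also need time-slicing or a separate lane.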
2. Reliability of Outbound Delivery
Several issues highlight a gap between the gateway reporting "success" and the actual delivery of content to the end user.
- Silent Drops: Discord attachments are not reaching the server despite valid message IDs (#53641).
- Delivery Gaps: The lack of a delivery queue for Discord during WebSocket reconnects leads to permanent message loss (#56610).
- Internal Failures: The agent --deliver CLI command reports success (exit 0) but fails to actually deliver messages because it triggers an LLM turn instead of a direct send (#57284).
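The delivery gap described in #56610 could in principle be closed by buffering outbound messages while the socket is down and flushing them in order on reconnect. Below is a minimal sketch of that idea; `OutboundBuffer`, `SendFn`, and the `maxPending` limit are hypothetical and do not correspond to any existing OpenClaw or Discord API. The send function is assumed to return true only when delivery is confirmed.

```typescript
// Hypothetical sketch: buffers outbound messages across reconnect windows.
type SendFn = (payload: string) => boolean; // true only on confirmed delivery
type DeliveryStatus = "sent" | "queued" | "rejected";

class OutboundBuffer {
  private pending: string[] = [];
  private connected = true;
  private readonly send: SendFn;
  private readonly maxPending: number;

  constructor(send: SendFn, maxPending = 1000) {
    this.send = send;
    this.maxPending = maxPending;
  }

  setConnected(connected: boolean): void {
    this.connected = connected;
    if (connected) this.flush();
  }

  // Reports "sent" only when the payload actually went out, so callers
  // never see success for a message that is still waiting.
  deliver(payload: string): DeliveryStatus {
    if (this.pending.length >= this.maxPending) return "rejected";
    this.pending.push(payload);
    if (this.connected) this.flush();
    return this.pending.length === 0 ? "sent" : "queued";
  }

  get pendingCount(): number {
    return this.pending.length;
  }

  // Sends queued payloads in order and stops at the first failure,
  // leaving the remainder for the next attempt.
  private flush(): void {
    while (this.pending.length > 0) {
      const head = this.pending[0];
      if (head === undefined || !this.send(head)) return;
      this.pending.shift();
    }
  }
}

// Simulated socket: sends fail while socketUp is false.
const delivered: string[] = [];
let socketUp = true;
const buffer = new OutboundBuffer((payload) => {
  if (!socketUp) return false;
  delivered.push(payload);
  return true;
});

const first = buffer.deliver("a");
socketUp = false;
buffer.setConnected(false);
const second = buffer.deliver("b");
socketUp = true;
buffer.setConnected(true); // flushes "b"
```

The in-memory queue here would not survive a gateway restart; a durable variant would need to persist pending payloads.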
3. Security & Configuration Fragility
Recent updates have introduced regressions in how configuration and secrets are handled.
- Config Loss: Updates to v2026.3.24 have been reported to silently drop configurations when the HOME environment variable changes (#54634).
- Secret Provider Loops: 1Password secret providers can enter a crash-loop that exhausts account-wide daily rate limits due to a lack of exponential backoff (#56217).
- Sandbox Escapes: Reports indicate that agents can occasionally access or modify files outside their designated working space (#54518).
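To illustrate what the missing protection in the secret provider loop (#56217) might look like, here is a minimal sketch combining capped exponential backoff with a simple circuit breaker. The function and class names, the thresholds, and the injected clock are all assumptions made for illustration; nothing here reflects the real 1Password provider code. Jitter is left out to keep the delays deterministic.

```typescript
// Hypothetical sketch; names and thresholds are illustrative assumptions.

// Delay before retry number `attempt` (0-based): base, 2x base, 4x base, ... up to capMs.
function backoffDelayMs(attempt: number, baseMs = 1000, capMs = 300000): number {
  return Math.min(capMs, baseMs * 2 ** attempt);
}

class CircuitBreaker {
  private failures = 0;
  private openedAt: number | null = null;
  private readonly maxFailures: number;
  private readonly cooldownMs: number;

  constructor(maxFailures: number, cooldownMs: number) {
    this.maxFailures = maxFailures;
    this.cooldownMs = cooldownMs;
  }

  // While the breaker is open, calls are refused until the cooldown has
  // elapsed; after that a single probe attempt is allowed through.
  canAttempt(now: number): boolean {
    if (this.openedAt === null) return true;
    return now - this.openedAt >= this.cooldownMs;
  }

  recordSuccess(): void {
    this.failures = 0;
    this.openedAt = null;
  }

  // Opens (or re-opens) the breaker once consecutive failures reach the limit.
  recordFailure(now: number): void {
    this.failures += 1;
    if (this.failures >= this.maxFailures) this.openedAt = now;
  }
}

const delays = [0, 1, 2, 3].map((attempt) => backoffDelayMs(attempt));
// delays is [1000, 2000, 4000, 8000]

const breaker = new CircuitBreaker(3, 60000);
breaker.recordFailure(0);
breaker.recordFailure(1000);
breaker.recordFailure(3000); // third consecutive failure opens the breaker
const blocked = !breaker.canAttempt(4000);     // still cooling down
const probeAllowed = breaker.canAttempt(63000); // cooldown elapsed
```

Passing timestamps in explicitly keeps the breaker testable without timers; a real provider would call it with `Date.now()` and persist the open state across process restarts so that a crash-loop cannot reset the counter.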
Action Required
High Severity / Blocked
- #54488 (Lane Starvation): Immediate attention is needed to decouple context engine maintenance and followup drains from the primary session lane to prevent 20-30 minute inbound stalls.
- #54909 (Telegram Callback Routing): This is a critical failure of the inline button workflow, causing agents to hallucinate confirmations instead of executing real API calls.
- #53641 (Discord Attachment Loss): A regression that breaks the ability to send files via Discord, which is a core capability for many agents.
- #56217 (1Password Rate Limit Exhaustion): The current crash-loop behavior can lock out an entire organization's 1Password account for 24 hours; a circuit breaker is urgently required.
Contributor Focus
- Memory Management: Addressing the sessions.json bloat (#55334) and the session-resource-loader growth on Windows (#83943) is essential for long-term stability.
- ACP Stabilization: Implementing the "Phase B" architecture for explicit yieldWait state tracking (#52249) to fix the "stuck" parent session issue.
- Configuration Safety: Fixing the HOME directory migration bug (#54634) to prevent silent data loss during updates.
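For the sessions.json bloat (#55334), one possible direction is to store each distinct skillsSnapshot once and have sessions refer to it by index. The sketch below assumes a simplified, hypothetical shape for session entries (`SessionEntry`, `CompactStore`, and `compactSessions` are invented for this example); the real file layout may differ.

```typescript
// Hypothetical sketch: the real sessions.json schema is not shown in this digest.
interface SessionEntry {
  id: string;
  skillsSnapshot: unknown;
}

interface CompactStore {
  snapshots: string[]; // each distinct snapshot, serialized once
  sessions: { id: string; snapshotRef: number }[];
}

function compactSessions(entries: SessionEntry[]): CompactStore {
  const indexByContent = new Map<string, number>();
  const snapshots: string[] = [];

  const sessions = entries.map((entry) => {
    // Serialized JSON acts as the identity key. This is sensitive to
    // property order, so equal objects with reordered keys are not merged.
    const serialized = JSON.stringify(entry.skillsSnapshot);
    let ref = indexByContent.get(serialized);
    if (ref === undefined) {
      ref = snapshots.length;
      snapshots.push(serialized);
      indexByContent.set(serialized, ref);
    }
    return { id: entry.id, snapshotRef: ref };
  });

  return { snapshots, sessions };
}

const store = compactSessions([
  { id: "s1", skillsSnapshot: { skills: ["search", "browser"] } },
  { id: "s2", skillsSnapshot: { skills: ["search", "browser"] } },
  { id: "s3", skillsSnapshot: { skills: ["search"] } },
]);
console.log(store.snapshots.length); // prints 2
```

With this layout the file grows with the number of distinct snapshots rather than the number of sessions. A content hash could replace the serialized string as the map key if snapshots are large.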