OpenClaw Issue Digest: System Stability, Provider Regressions, and Sandbox Architecture
Open Issues
Recent activity in the OpenClaw repository reveals a mix of high-severity stability regressions and ambitious architectural proposals. The most critical reports center on system-wide hangs and authentication failures across various providers.
Critical Stability & Performance
Several reports indicate severe event-loop starvation. Issue #78402 describes a scenario where a single stuck exec tool call can block the entire runtime for over 20 minutes, causing WebSocket disconnects (codes 1000/1005/1006) and making the gateway unresponsive. Similarly, #76562 reports extreme control-plane RPC latency and 100% CPU utilization following upgrades to v2026.4.29 and v2026.5.2, suggesting a regression in how the gateway handles polling and status collection.
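One way to surface this kind of starvation early is to measure timer drift: a repeating timer that fires far later than scheduled means the loop was blocked in between. The sketch below is a minimal illustration under that assumption, not OpenClaw code; `LoopLagMonitor`, its interval, and its threshold are hypothetical. Node.js also ships `monitorEventLoopDelay` in `node:perf_hooks`, which exposes the same signal as a histogram.

```typescript
// Hypothetical helper for illustration; not part of OpenClaw.
// Detects event-loop stalls by measuring how late a repeating timer fires.
class LoopLagMonitor {
  private readonly intervalMs: number;
  private readonly thresholdMs: number;
  private lastTick: number | null = null;
  private worstLagMs = 0;

  constructor(intervalMs: number, thresholdMs: number) {
    this.intervalMs = intervalMs;
    this.thresholdMs = thresholdMs;
  }

  // Call once per timer tick with a monotonic timestamp in milliseconds.
  // Returns how much later than scheduled this tick arrived.
  tick(nowMs: number): number {
    const lag =
      this.lastTick === null
        ? 0
        : Math.max(0, nowMs - this.lastTick - this.intervalMs);
    this.lastTick = nowMs;
    this.worstLagMs = Math.max(this.worstLagMs, lag);
    return lag;
  }

  // True when a single tick arrived late enough to count as a stall.
  isStall(lagMs: number): boolean {
    return lagMs >= this.thresholdMs;
  }

  // Largest lag observed so far, in milliseconds.
  get worst(): number {
    return this.worstLagMs;
  }
}

// Simulated timeline: ticks scheduled every 100 ms, the last one arrives 5 s late.
const monitor = new LoopLagMonitor(100, 1000);
const lags = [0, 100, 200, 5300].map((t) => monitor.tick(t));
```

In a live process, `tick` would be driven by `setInterval` with `performance.now()` as the timestamp. A monitor running on the same loop can only report a stall after the loop resumes, so interrupting stuck work would still need a separate process or worker thread.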
Provider & Integration Regressions
Model-specific issues are surfacing across several major providers:
- Google Gemini: Users report that Gemini models (3.1 Pro and 2.5 Pro) hang and time out on main sessions while working fine in isolated subagents (#78502). Additionally, streaming 4xx responses lose their JSON error bodies, hiding critical provider details (#78180).
- DeepSeek: A recurring "incomplete turn" error (stopReason=stop payloads=0) affects the DeepSeek provider across versions v2026.5.3 through v2026.5.6 (#79061).
- Mistral: A bug causes [object Object] strings to appear in agent messages and memory when using Mistral thinking models (#78846).
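The `[object Object]` symptom in the Mistral report is what JavaScript produces when an object is coerced to a string, for example through `String(value)` or template interpolation. A plausible cause, stated here as an assumption and not confirmed by the issue, is that thinking models return content as an array of typed parts while some code path expects a plain string. The `ContentPart` shape and both function names below are hypothetical.

```typescript
// Hypothetical content shapes for illustration; the real provider payload may differ.
type ContentPart =
  | { type: "text"; text: string }
  | { type: "thinking"; thinking: string };

type MessageContent = string | ContentPart[];

// Naive flattening: coerces each part to a string, which yields "[object Object]".
function naiveToText(content: MessageContent): string {
  return typeof content === "string"
    ? content
    : content.map((part) => String(part)).join("");
}

// Safer flattening: keeps visible text parts and drops thinking parts.
function contentToText(content: MessageContent): string {
  if (typeof content === "string") return content;
  return content
    .filter((part): part is { type: "text"; text: string } => part.type === "text")
    .map((part) => part.text)
    .join("");
}

const reply: MessageContent = [
  { type: "thinking", thinking: "compare both options" },
  { type: "text", text: "Option B is cheaper." },
];
const broken = naiveToText(reply); // "[object Object][object Object]"
const fixed = contentToText(reply); // "Option B is cheaper."
```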
Channel-Specific Bugs
- Telegram: Issues include silent drops of subagent completion announcements (#75663) and a failure to resolve SecretRef bot tokens in the inbound message path (#79060).
- Discord: A regression in v2026.5.7 causes the message tool to fail with "Unknown Channel" when attempting outbound-initiated sends to user DMs (#79109).
- WhatsApp: Users report the channel becoming unavailable after upgrading to v2026.5.5 unless the @openclaw/whatsapp plugin is manually re-installed (#78593).
Architectural Proposals
Two major RFCs aim to harden the system:
- Unified Sandbox Architecture (#12505): A comprehensive proposal to move away from a single Node.js process. It suggests a tiered preset system (minimal, standard, strict, paranoid) using platform-native isolation (e.g., bubblewrap on Linux, AppContainer on Windows) to prevent plugin vulnerabilities from compromising the entire system.
- Multi-Session Architecture (#48874): A proposal to decouple the LLM layer from isolated session layers and a shared public knowledge base, intended to address current issues with session isolation and multi-channel routing.
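To make the sandbox proposal concrete, the sketch below models the four presets and the two backends named in the summary. Only the preset names and the bubblewrap/AppContainer pairing come from the digest; the ordering of presets, the function names, the "none" fallback, and the loading policy are assumptions made for illustration and should not be read as the RFC's design.

```typescript
// Preset names and backends come from the RFC summary; everything else
// (ordering helper, function names, the "none" fallback) is hypothetical.
const PRESETS = ["minimal", "standard", "strict", "paranoid"] as const;
type Preset = (typeof PRESETS)[number];

type Backend = "bubblewrap" | "AppContainer" | "none";

// Map a Node.js-style platform string to an isolation backend.
function backendFor(platform: string): Backend {
  switch (platform) {
    case "linux":
      return "bubblewrap";
    case "win32":
      return "AppContainer";
    default:
      return "none"; // other platforms are not covered by the summary
  }
}

// Assumption: presets are ordered from least to most restrictive.
function atLeast(actual: Preset, required: Preset): boolean {
  return PRESETS.indexOf(actual) >= PRESETS.indexOf(required);
}

// Assumed policy: refuse to load a plugin when the preset demands
// isolation but no backend is available on this platform.
function canLoadPlugin(preset: Preset, platform: string): boolean {
  if (!atLeast(preset, "strict")) return true;
  return backendFor(platform) !== "none";
}
```

Failing closed at the stricter presets is one possible design choice: a preset that silently degrades to no isolation would give operators a false sense of safety.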
Key Themes
1. The "Silent Failure" Pattern
Across multiple issues, a theme of silent failures emerges. Whether it is cron jobs failing without logs (#13593), subagent wake events being dropped due to unrecognized error patterns (#78581), or Telegram group forum replies being silently skipped (#79062), the system often fails to notify the operator when a background process dies.
2. SecretRef Resolution Gaps
There is a recurring pattern of SecretRef resolution failing in specific code paths. While secrets audit may report success, the actual runtime often fails to resolve these references in channel startup paths (Discord #79073) or inbound message handlers (Telegram #79060).
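One way to close this kind of gap is to route every code path through a single resolver that fails loudly when a reference cannot be resolved, so an unresolved reference never reaches a channel client as if it were a token. The sketch below assumes a hypothetical `SecretRef` shape, an in-memory store, and illustrative config paths; none of these reflect OpenClaw's actual format.

```typescript
// Hypothetical shapes: the real SecretRef format in OpenClaw may differ.
type SecretRef = { secretRef: string };
type SecretInput = string | SecretRef;

function isSecretRef(value: SecretInput): value is SecretRef {
  return typeof value === "object" && value !== null && "secretRef" in value;
}

// Resolve a config value to a usable string, or throw naming the path that failed.
function resolveSecret(
  value: SecretInput,
  secretStore: Map<string, string>,
  configPath: string,
): string {
  if (!isSecretRef(value)) return value; // literal value, nothing to resolve
  const resolved = secretStore.get(value.secretRef);
  if (resolved === undefined) {
    throw new Error(`unresolved SecretRef "${value.secretRef}" at ${configPath}`);
  }
  return resolved;
}

// Illustrative data only; the key, token, and config path are made up.
const secretStore = new Map<string, string>([["telegram-bot", "123:abc"]]);
const botToken = resolveSecret(
  { secretRef: "telegram-bot" },
  secretStore,
  "channels.telegram.botToken",
);
```

Including the config path in the error message means a failure in a startup or inbound path points directly at the setting that was not resolved.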
3. Tool Execution Risks
The exec tool remains a primary source of instability. From event-loop starvation (#78402) to agents fabricating successful output after a "command not found" error (#60497), the lack of strict isolation and validation for shell execution is a recurring pain point.
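A possible mitigation for fabricated output is to hand the model a structured result in which failure is explicit, instead of raw terminal text it can misread. The sketch below relies on the POSIX shell convention that exit status 127 means the command was not found and 126 means it was found but could not be executed. The `ExecOutcome` shape and `classifyExec` name are hypothetical and do not describe OpenClaw's exec tool contract.

```typescript
// Hypothetical result shape for illustration; not OpenClaw's actual exec tool contract.
type ExecOutcome = {
  ok: boolean;
  exitCode: number;
  reason: "success" | "command_not_found" | "not_executable" | "failed";
  stdout: string;
  stderr: string;
};

// POSIX shells report 127 when a command is not found and 126 when it is
// found but cannot be executed.
function classifyExec(exitCode: number, stdout: string, stderr: string): ExecOutcome {
  let reason: ExecOutcome["reason"];
  if (exitCode === 0) reason = "success";
  else if (exitCode === 127) reason = "command_not_found";
  else if (exitCode === 126) reason = "not_executable";
  else reason = "failed";
  return { ok: exitCode === 0, exitCode, reason, stdout, stderr };
}

// Example: a shell reporting a missing binary (the command name is made up).
const outcome = classifyExec(127, "", "sh: 1: frobnicate: not found");
```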
Action Required
High Priority / Blocked
- Event-Loop Guarding: Immediate attention is needed to prevent exec tool calls from blocking the main Node.js event loop. Implementing a watchdog or forcing async execution is critical to prevent total gateway collapse.
- Provider Fixes: The FallbackSummaryError in subagent announcements (#78581) and the Gemini streaming error body loss (#78180) are high-impact bugs that hinder production debuggability.
- Security Hardening: The trusted-proxy auth mode bypass via local password fallback (#78684) is a critical security footgun that needs immediate remediation.
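For the error-body loss mentioned under Provider Fixes, the general remedy is to keep the raw response body alongside whatever message can be parsed from it, so nothing is discarded when parsing fails. The sketch below assumes the common `{"error": {"message": ...}}` JSON layout; the `ProviderError` type and `describeProviderError` function are hypothetical, and real provider payloads may differ.

```typescript
// Hypothetical helper; field names follow the common
// {"error": {"message": ...}} layout, but real payloads may differ.
type ProviderError = { httpStatus: number; message: string; rawBody: string };

function describeProviderError(httpStatus: number, rawBody: string): ProviderError {
  let message = `HTTP ${httpStatus}`;
  try {
    const parsed: unknown = JSON.parse(rawBody);
    if (typeof parsed === "object" && parsed !== null && "error" in parsed) {
      const err = (parsed as { error: unknown }).error;
      if (typeof err === "object" && err !== null && "message" in err) {
        const msg = (err as { message: unknown }).message;
        if (typeof msg === "string") message = `HTTP ${httpStatus}: ${msg}`;
      }
    }
  } catch {
    // Body was not JSON; keep the generic message and the raw body.
  }
  // The raw body is always returned so operators can inspect it in logs.
  return { httpStatus, message, rawBody };
}

// Illustrative payload only.
const parsedError = describeProviderError(
  400,
  '{"error":{"code":400,"message":"API key not valid","status":"INVALID_ARGUMENT"}}',
);
```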
Contributor Attention Needed
- Installation Fixes: Multiple reports of installation failures on macOS (#79048) and Ubuntu (#72382) suggest the install script needs better dependency handling and path validation.
- CLI Consistency: The gateway probe and gateway health commands exhibit inconsistent behavior regarding port flags and reachability reporting on Windows (#79100, #79099).
- Voice-Call Stability: The Twilio stale-call reaper is prematurely ending active conversations (#79121), and Telnyx inbound calls are failing to trigger auto-responses (#79118).