OpenClaw Issue Digest: System Stability, Provider Regressions, and Sandbox Architecture

Open Issues

Recent activity in the OpenClaw repository reveals a mix of high-severity stability regressions and ambitious architectural proposals. The most critical reports center on system-wide hangs and authentication failures across various providers.

Critical Stability & Performance

Several reports indicate severe event-loop starvation. Issue #78402 describes a scenario where a single stuck exec tool call can block the entire runtime for over 20 minutes, causing WebSocket disconnects (codes 1000/1005/1006) and making the gateway unresponsive. Similarly, #76562 reports extreme control-plane RPC latency and 100% CPU utilization following upgrades to v2026.4.29 and v2026.5.2, suggesting a regression in how the gateway handles polling and status collection.
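To make the starvation symptom concrete, the following is a minimal sketch of a generic Node.js technique: measuring how late a repeating timer fires. It is not taken from the OpenClaw codebase; startLagMonitor and blockFor are hypothetical names introduced for illustration.

```typescript
// Hypothetical helper, not part of OpenClaw: reports how late each timer tick fires.
// A blocked event loop cannot run timer callbacks, so lateness approximates the stall.
function startLagMonitor(
  intervalMs: number,
  onLag: (lagMs: number) => void,
): () => void {
  let expected = Date.now() + intervalMs;
  const timer = setInterval(() => {
    const now = Date.now();
    onLag(Math.max(0, now - expected));
    expected = now + intervalMs;
  }, intervalMs);
  // The caller stops the monitor by invoking the returned function.
  return () => clearInterval(timer);
}

// Simulates synchronous work that hogs the event loop (a stand-in for a stuck handler).
function blockFor(ms: number): void {
  const end = Date.now() + ms;
  while (Date.now() < end) {
    // Busy-wait on purpose: no timers or I/O callbacks can run during this loop.
  }
}
```

A gateway could log a warning whenever the reported lag crosses a threshold, which would surface a stall well before WebSocket clients disconnect. Node.js also ships perf_hooks.monitorEventLoopDelay, which collects similar data as a histogram.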

Provider & Integration Regressions

Model-specific issues are surfacing across several major providers:

Google Gemini: Users report that Gemini models (3.1 Pro and 2.5 Pro) hang and time out on main sessions while working normally in isolated subagents (#78502). Additionally, streaming 4xx responses lose their JSON error bodies, which hides provider error details (#78180).
DeepSeek: A recurring "incomplete turn" error (stopReason=stop payloads=0) is affecting the DeepSeek provider across versions v2026.5.3 through v2026.5.6 (#79061).
Mistral: A bug causes literal [object Object] strings to appear in agent messages and memory when using Mistral thinking models (#78846).
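The [object Object] symptom in the Mistral report is what JavaScript produces when a plain object is coerced to a string. The sketch below shows one way that can happen and a safer alternative. It assumes message content may arrive as an array of typed blocks; the ContentBlock shape and both function names are assumptions for illustration, not taken from the Mistral API or the OpenClaw source.

```typescript
// Assumed shape for illustration only; not taken from the Mistral API or OpenClaw.
type ContentBlock =
  | { type: "text"; text: string }
  | { type: "thinking"; thinking: string };

type MessageContent = string | ContentBlock[];

// Naive flattening: string coercion turns every object into "[object Object]".
function naiveToText(content: MessageContent): string {
  return `${content}`;
}

// Safer flattening: handle known block types explicitly and keep only text blocks.
function contentToText(content: MessageContent): string {
  if (typeof content === "string") return content;
  return content
    .filter((b): b is { type: "text"; text: string } => b.type === "text")
    .map((b) => b.text)
    .join("");
}
```

Whether reasoning blocks should be dropped, stored separately, or shown to the user is a product decision; the point of the sketch is only that structured content needs explicit handling before it reaches messages or memory.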

Channel-Specific Bugs

Telegram: Issues include silent drops of subagent completion announcements (#75663) and a failure to resolve SecretRef bot tokens in the inbound message path (#79060).
Discord: A regression in v2026.5.7 causes the message tool to fail with Unknown Channel when attempting outbound-initiated sends to user DMs (#79109).
WhatsApp: Users report that the channel becomes unavailable after upgrading to v2026.5.5 unless the @openclaw/whatsapp plugin is manually reinstalled (#78593).

Architectural Proposals

Two major RFCs aim to harden the system:

Unified Sandbox Architecture (#12505): A comprehensive proposal to move away from a single Node.js process. It suggests a tiered preset system (minimal, standard, strict, paranoid) using platform-native isolation (e.g., bubblewrap on Linux, AppContainer on Windows) to prevent plugin vulnerabilities from compromising the entire system.
Multi-Session Architecture (#48874): A proposal to decouple the LLM layer from isolated session layers and a shared public knowledge base, with the goal of resolving current problems with session isolation and multi-channel routing.

Key Themes

1. The "Silent Failure" Pattern

Silent failures recur across multiple issues: cron jobs failing without logs (#13593), subagent wake events dropped because of unrecognized error patterns (#78581), and Telegram group forum replies silently skipped (#79062). In each case, the system often fails to notify the operator when background work fails.
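One general way to avoid dropping failures that match no known pattern is to give the classifier an explicit fallback category, so that every error yields a report. The sketch below illustrates that idea only; the categories, patterns, and names are hypothetical and do not reflect OpenClaw's actual error handling.

```typescript
// Hypothetical categories and patterns, for illustration; not OpenClaw's real list.
type ErrorKind = "rate_limit" | "timeout" | "unrecognized";

const KNOWN_PATTERNS: Array<{ kind: ErrorKind; pattern: RegExp }> = [
  { kind: "rate_limit", pattern: /rate limit|429/i },
  { kind: "timeout", pattern: /timed? ?out/i },
];

interface FailureReport {
  source: string;
  kind: ErrorKind;
  message: string;
}

// Always returns a report: an unmatched message is labeled "unrecognized", never dropped.
function reportFailure(source: string, error: unknown): FailureReport {
  const message = error instanceof Error ? error.message : String(error);
  const match = KNOWN_PATTERNS.find((p) => p.pattern.test(message));
  return { source, kind: match ? match.kind : "unrecognized", message };
}
```

With this shape, a new error type still reaches the operator as an "unrecognized" report with its original message intact, instead of disappearing because no branch handled it.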

2. SecretRef Resolution Gaps

There is a recurring pattern of SecretRef resolution failing in specific code paths. While secrets audit may report success, the actual runtime often fails to resolve these references in channel startup paths (Discord #79073) or inbound message handlers (Telegram #79060).
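A common way to close this kind of gap is to route every code path through one resolver that either returns a plain string or fails loudly with the config path named. The sketch below assumes a simplified, hypothetical SecretRef shape that reads from environment variables; OpenClaw's real type and resolution logic may differ.

```typescript
// Hypothetical SecretRef shape for illustration; OpenClaw's real type may differ.
interface SecretRef {
  source: "env";
  id: string;
}

type SecretInput = string | SecretRef;

// Resolves a config value to a plain string, or throws with the failing path named.
function resolveSecret(
  input: SecretInput,
  env: Record<string, string | undefined>,
  configPath: string,
): string {
  if (typeof input === "string") return input;
  const value = env[input.id];
  if (value === undefined || value === "") {
    throw new Error(`Unresolved SecretRef at ${configPath} (env:${input.id})`);
  }
  return value;
}
```

If channel startup, inbound handlers, and the audit command all called the same function, an audit success would imply that the runtime paths resolve as well.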

3. Tool Execution Risks

The exec tool remains a primary source of instability. From event-loop starvation (#78402) to agents fabricating successful output after a "command not found" error (#60497), the lack of strict isolation and validation for shell execution is a recurring pain point.
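Both failure modes can be narrowed with standard Node.js facilities: run the command asynchronously with a hard timeout, and derive the reported outcome from the real exit status rather than from model-generated text. The following is a minimal sketch under those assumptions. runWithTimeout, describeOutcome, and ExecResult are hypothetical names, not OpenClaw APIs.

```typescript
import { spawn } from "node:child_process";

// Hypothetical result shape for illustration.
interface ExecResult {
  exitCode: number | null;
  signal: string | null;
  timedOut: boolean;
  stdout: string;
  stderr: string;
}

// Runs a command without blocking the event loop and kills it after timeoutMs.
function runWithTimeout(
  command: string,
  args: string[],
  timeoutMs: number,
): Promise<ExecResult> {
  return new Promise((resolve, reject) => {
    const child = spawn(command, args);
    child.stdin.end(); // the command gets no input, so it cannot wait on stdin
    let stdout = "";
    let stderr = "";
    let timedOut = false;
    const timer = setTimeout(() => {
      timedOut = true;
      child.kill("SIGKILL");
    }, timeoutMs);
    child.stdout.on("data", (chunk: Buffer) => { stdout += chunk.toString(); });
    child.stderr.on("data", (chunk: Buffer) => { stderr += chunk.toString(); });
    // Emitted when the process cannot be started, e.g. the binary does not exist.
    child.on("error", (err) => { clearTimeout(timer); reject(err); });
    child.on("close", (exitCode, signal) => {
      clearTimeout(timer);
      resolve({ exitCode, signal, timedOut, stdout, stderr });
    });
  });
}

// Builds the status line from the actual process outcome, never from guessed output.
function describeOutcome(r: ExecResult): string {
  if (r.timedOut) return "timed out and was killed";
  if (r.signal !== null) return `terminated by signal ${r.signal}`;
  if (r.exitCode === 0) return "succeeded";
  return `failed with exit code ${r.exitCode}`;
}
```

Passing the describeOutcome string back to the agent alongside stderr gives the model an unambiguous failure signal; POSIX shells, for example, use exit code 127 for "command not found". This sketch does not provide isolation, which is what the sandbox proposal (#12505) addresses.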

Action Required

High Priority / Blocked

Event-Loop Guarding: Immediate attention is needed to prevent exec tool calls from blocking the main Node.js event loop. Implementing a watchdog or forcing async execution is critical to prevent total gateway collapse.
Provider Fixes: The FallbackSummaryError in subagent announcements (#78581) and the Gemini streaming error body loss (#78180) are high-impact bugs that hinder production debuggability.
Security Hardening: The trusted-proxy auth mode bypass via local password fallback (#78684) is a critical security footgun that needs immediate remediation.
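For the error-body loss listed under Provider Fixes (#78180), the general remedy is to carry the raw response body on the error object even when it cannot be parsed. The sketch below shows that pattern in isolation; ProviderError and toProviderError are hypothetical names, and how OpenClaw actually reads streaming responses is not covered here.

```typescript
// Hypothetical error shape; field names are illustrative, not OpenClaw's.
interface ProviderError {
  status: number;
  detail: unknown; // parsed JSON when the body is valid JSON, otherwise null
  rawBody: string; // always preserved so no provider detail is lost
}

// Builds an error record from an HTTP status and the body text read from the response.
function toProviderError(status: number, rawBody: string): ProviderError {
  let detail: unknown = null;
  try {
    detail = JSON.parse(rawBody);
  } catch {
    // Not JSON: rawBody still carries whatever the provider sent.
  }
  return { status, detail, rawBody };
}
```

Logging rawBody alongside the status keeps provider messages such as quota or validation errors visible to operators, regardless of whether the response was streamed.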

Contributor Attention Needed

Installation Fixes: Multiple reports of installation failures on macOS (#79048) and Ubuntu (#72382) suggest the install script needs better dependency handling and path validation.
CLI Consistency: The gateway probe and gateway health commands exhibit inconsistent behavior regarding port flags and reachability reporting on Windows (#79100, #79099).
Voice-Call Stability: The Twilio stale-call reaper is prematurely ending active conversations (#79121), and Telnyx inbound calls are failing to trigger auto-responses (#79118).