OpenClaw Issue Digest: Stability Regressions and Runtime Isolation Gaps

Open Issues

Recent activity in the OpenClaw repository reveals a mix of high-severity stability regressions and a strong push toward more granular security and runtime isolation. While feature requests for UI enhancements and i18n continue, the core focus has shifted toward resolving critical failures in the gateway and session management layers.

Critical Stability & Runtime Regressions

Several reports indicate a pattern of "silent failures" where the system remains active but stops responding to users:

Gateway WebSocket Timeouts: Issue #79032 describes a critical regression in version 2026.5.3-1 where local WebSocket handshakes and HTTP fetches are hitting 10-second timeouts. This specifically blocks sub-agent "announce-back" paths, leaving parent agents unable to receive child output and causing chat sessions to appear non-responsive.
Session Persistence Failures: A regression in version 2026.5.6 (#79019) lets Telegram direct messages route and receive replies correctly, but they are not persisted to sessions.json. After a gateway restart, the conversation history cannot be recovered.
Claude-CLI Context Loss: Issue #77974 highlights a race condition where OpenClaw invalidates claude-cli sessions as missing-transcript before the CLI has finished flushing the transcript to disk, leading to intermittent "amnesia" in Telegram DM sessions.
Provider Cooldown Cascades: Reports (#76829, #77228) show that single-request format errors (such as assistant message prefill errors) are triggering provider-wide cooldowns, blocking all requests across all sessions for up to 42 minutes.
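
The cooldown cascade in #76829 and #77228 is essentially a scoping problem: an error caused by one malformed request is being treated as evidence that the whole provider is unhealthy. A minimal sketch of scoping cooldowns by error kind might look like the following. The error-kind strings, the `ProviderState` class, and the method names are all hypothetical illustrations, not OpenClaw's actual API or error taxonomy.

```python
from dataclasses import dataclass

# Hypothetical error kinds that plausibly indicate a provider-wide problem.
# Anything else (for example a prefill/format error) is treated as specific
# to the single request that caused it.
PROVIDER_SCOPED = frozenset({"rate_limited", "overloaded", "auth_failed"})


def triggers_provider_cooldown(error_kind: str) -> bool:
    """Return True only for errors that should pause the whole provider."""
    return error_kind in PROVIDER_SCOPED


@dataclass
class ProviderState:
    # Timestamp (in seconds) until which the provider is considered unavailable.
    cooldown_until: float = 0.0

    def record_error(self, error_kind: str, now: float, cooldown_s: float) -> None:
        # Request-scoped errors fail the request but leave the provider usable.
        if triggers_provider_cooldown(error_kind):
            self.cooldown_until = max(self.cooldown_until, now + cooldown_s)

    def available(self, now: float) -> bool:
        return now >= self.cooldown_until
```

In this sketch unknown error kinds default to request scope, which favors availability; a real implementation might reasonably choose the opposite default.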

Security & Isolation Themes

A recurring theme is security configuration that silently becomes a no-op under CLI runtimes:

CLI Runtime Isolation Gaps: Issue #78879 warns that sandbox, workspaceOnly, and sessions_send configurations are silently ignored when using CLI runtimes (like claude-cli). Because the LLM uses the binary's native tools rather than PI-tools, agents may have full host filesystem access despite the configuration suggesting otherwise.
Local User Sandboxing: To address these gaps, there is a proposal (#78965) to implement a local user sandbox backend, allowing each agent to run under a dedicated OS user account to prevent credential collision and home-directory interference.
Tool-Level Isolation: Request #13543 proposes a "selective isolation" mode, allowing specific high-risk tools (like web_fetch) to always run in a Docker sandbox while keeping trusted tools on the host for performance.
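
The "selective isolation" idea in #13543 amounts to a per-tool routing decision. The sketch below illustrates that decision under assumed names: `IsolationPolicy`, `sandboxed_tools`, and the backend labels "docker" and "host" are hypothetical and do not reflect an existing OpenClaw configuration schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IsolationPolicy:
    # Hypothetical config: tools named here always run in the sandbox backend.
    sandboxed_tools: frozenset = frozenset({"web_fetch"})
    sandbox_backend: str = "docker"
    default_backend: str = "host"

    def backend_for(self, tool_name: str) -> str:
        """Pick the execution backend for a single tool call."""
        if tool_name in self.sandboxed_tools:
            return self.sandbox_backend
        return self.default_backend
```

For example, `IsolationPolicy().backend_for("web_fetch")` selects the sandbox backend, while a tool that is not listed stays on the host.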

Channel-Specific Issues

Slack Routing: A high-severity bug (#78666) reports that Slack replies are delivered only to the web gateway and are not posted back to the originating Slack channel unless the message tool is explicitly called with a hardcoded channel ID.
Feishu Streaming: Issues #79042 and #55027 indicate that block streaming in Feishu is either dropping text-only replies or relying on a slow, character-by-character playback animation rather than true progressive delivery.
Matrix Rendering: Markdown tables in the Matrix channel are being incorrectly wrapped in fenced code blocks (#78990), degrading readability.

Key Themes

1. The "Silent Failure" Pattern

Many of the most severe bugs reported do not crash the process; instead they leave the system in a "zombie" state. Whether it is the 10s gateway timeout (#79032), the Telegram sticky monotonic index (#77088), or the session lock stuck in processing (#70334), the common thread is a system that looks healthy in logs but fails to deliver responses to the user.
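
Because these failures leave the process alive, a liveness check alone would not catch them. One possible mitigation, sketched here purely as an illustration, is to track inbound messages against delivered replies and flag any that wait too long. The `DeliveryWatchdog` class and its method names are hypothetical and are not part of OpenClaw.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DeliveryWatchdog:
    # Maximum time (seconds) an inbound message may wait for a delivered reply.
    max_wait_s: float = 30.0
    # Maps message id -> time the inbound message was received.
    _pending: Dict[str, float] = field(default_factory=dict)

    def on_inbound(self, message_id: str, now: float) -> None:
        self._pending[message_id] = now

    def on_delivered(self, message_id: str) -> None:
        # A delivered reply clears the pending entry; unknown ids are ignored.
        self._pending.pop(message_id, None)

    def stalled(self, now: float) -> List[str]:
        """Return ids of messages that have waited longer than max_wait_s."""
        return sorted(
            mid for mid, received in self._pending.items()
            if now - received > self.max_wait_s
        )
```

A non-empty `stalled()` result would indicate that the system is accepting messages without answering them, even though nothing has crashed.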

2. Runtime vs. Tooling Disconnect

There is a growing tension between the PI-runtime (which OpenClaw controls) and CLI-runtimes (which are external binaries). This has led to security misconceptions where users believe their agents are sandboxed when they are actually running with full user permissions via a CLI binary (#78879).

3. UX Friction in Multi-Agent Orchestration

Users are requesting better visibility into sub-agent behavior, including the ability to suppress unverified sub-agent announcements (#8299) and first-class tracking of session/task chains to reconstruct execution DAGs (#11040).

Action Required

Immediate Attention (High Severity)

#79032 (Gateway Timeouts): A primary blocker for sub-agent orchestration; the cluster of 10-second timeouts needs urgent investigation.
#79019 (Telegram Session Persistence): A regression in session indexing that causes data loss upon restart.
#78666 (Slack Reply Routing): A fundamental failure in the Slack integration's ability to reply to originating channels.
#78879 (CLI Security Warning): openclaw doctor warnings are critically needed so that users do not deploy unsandboxed agents while believing their configuration is secure.

Blocked or High-Priority Enhancements

#79026 (Active-Memory Deadlock): A predictable self-block on the main lane that requires a simple one-line fix to assign a dedicated lane for recall sub-agents.
#79038 (Webhook Session Security): A vulnerability where run_task accepts child sessions outside the route session tree, potentially allowing unauthorized session targeting.
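
The check that #79038 describes as missing can be pictured as an ancestry test: a target session is acceptable only if walking up its parent links reaches the route's own session. The sketch below assumes a simple parent-pointer map; the function name, the `parents` structure, and the session ids are hypothetical and do not reflect how run_task or OpenClaw actually store session relationships.

```python
from typing import Dict, Optional


def is_in_session_tree(
    parents: Dict[str, Optional[str]], root: str, candidate: str
) -> bool:
    """Return True if candidate is root or a descendant of root.

    parents maps each session id to its parent id (None for a top-level
    session). Unknown ids and cyclic parent links are rejected.
    """
    seen = set()
    current: Optional[str] = candidate
    while current is not None and current not in seen:
        if current == root:
            return True
        seen.add(current)
        current = parents.get(current)
    return False
```

A handler following this approach would reject any request whose target session fails the check, rather than trusting the caller-supplied session id.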