OpenClaw Issue Digest: Event Loop Bottlenecks and Critical System Deadlocks

The recent activity in the OpenClaw repository reveals a period of significant instability following the v2026.5.6 release. While the community continues to push for advanced orchestration and security features, the core system is currently battling severe performance regressions that threaten the basic usability of the gateway.

This digest covers a critical cluster of event loop bottlenecks, a catastrophic "Triple-Lock Deadlock" scenario, and several high-impact regressions affecting channel integrations and configuration management.

Open Issues

System Performance & Stability

Several reports highlight a fundamental architectural bottleneck in the single-threaded Node.js event loop.

Event Loop Saturation: Issue #78861 describes a critical state where WebSocket responses take up to 100 seconds and agent dispatch overhead reaches 26 seconds before the model is even called. The report attributes this to synchronous operations in model-resolution, bootstrap-context, and core-plugin-tools that block the main thread.
The Triple-Lock Deadlock: Issue #78908 details a "worst-case" scenario where a zombie dashboard session, 100% event loop utilization (ELU), and repeated model API timeouts compound to leave the system unresponsive for over 20 minutes.
Resource Leaks: Issue #76171 reports a load average spike (up to 31) caused by the accumulation of stale worker processes that fail to exit after completing cron or agent turns.
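The ELU figure cited in these reports can be sampled with Node's built-in perf_hooks API. The following is a minimal sketch, not code from the OpenClaw repository: the names createEluSampler and isSaturated and the 0.9 threshold are assumptions made for illustration.

```typescript
import { performance } from "node:perf_hooks";

// Hypothetical alert threshold; not a value taken from the issues.
export const SATURATION_THRESHOLD = 0.9;

// Returns a function that reports event loop utilization (0..1)
// for the interval since its previous call.
export function createEluSampler(): () => number {
  let last = performance.eventLoopUtilization();
  return () => {
    const current = performance.eventLoopUtilization();
    // Passing two readings yields the utilization between them.
    const delta = performance.eventLoopUtilization(current, last);
    last = current;
    // Guard against a 0/0 result when two readings are taken back to back.
    return isFinite(delta.utilization) ? delta.utilization : 0;
  };
}

export function isSaturated(
  utilization: number,
  threshold: number = SATURATION_THRESHOLD,
): boolean {
  return utilization >= threshold;
}
```

A gateway could call the sampler from a periodic timer and log a warning whenever isSaturated returns true. Readings are zero until the event loop has started, so the sampler is only meaningful once the process is serving requests.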

Channel & Integration Regressions

Discord Gateway Loops: Users report a rapid disconnect loop with WebSocket close code 1006 (abnormal closure) in v2026.5.6 (#78910): the WebSocket opens but never receives the initial HELLO frame, likely because of event loop saturation.
Feishu Connectivity: Multiple issues (#78840, #78702) indicate that the Feishu channel is suffering from synchronous startup blocking and silent failures following the externalization of the Feishu plugin.
WhatsApp Media Failures: A regression in v2026.5.6 (#78578) causes MEDIA: directives in assistant replies to be dropped, even though direct CLI media sends continue to work.

Model & Runtime Bugs

DeepSeek Integration: Issue #78903 reports that deepseek-v4-pro-thinking fails immediately when used as a subagent model due to request schema rejections, despite working in direct chat.
Codex Runtime Hangs: The native Codex runtime reportedly hangs during the sampling step that follows the first tool call, leading to high CPU usage and host unresponsiveness (#78870).
Transcript Corruption: A critical bug in repairSessionFileIfNeeded (#78883) deletes finalized plain-text assistant replies, which causes models to repeat prior turns and doubles token costs.

Key Themes

1. Architectural Scaling Limits

The recurring theme across #78861, #78908, and #78910 is the failure of the single-threaded event loop under load. The community is calling for a transition to a multi-threaded gateway architecture or the offloading of heavy preparation tasks (like prompt building and model resolution) to Worker Threads.
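The offloading idea can be sketched with Node's worker_threads module. This is an illustration under stated assumptions, not OpenClaw's design: buildPromptOffThread is a hypothetical name, and the string join stands in for the real (unknown) prompt-building work.

```typescript
import { Worker } from "node:worker_threads";

// Worker body kept as a string so the sketch fits in one file.
// The join is a placeholder for real prompt-building work.
const WORKER_SOURCE = `
  const { parentPort, workerData } = require("worker_threads");
  parentPort.postMessage(workerData.sections.join("\\n\\n"));
`;

// Hypothetical helper: runs the CPU-bound step on a separate thread so the
// main event loop stays free to service WebSocket traffic.
export function buildPromptOffThread(sections: string[]): Promise<string> {
  return new Promise<string>((resolve, reject) => {
    const worker = new Worker(WORKER_SOURCE, {
      eval: true,
      workerData: { sections },
    });
    worker.once("message", (prompt: string) => resolve(prompt));
    worker.once("error", reject);
    worker.once("exit", (code) => {
      if (code !== 0) reject(new Error(`worker exited with code ${code}`));
    });
  });
}
```

Spawning one worker per call keeps the sketch short, but thread startup has a cost of its own. A long-lived pool of workers would likely suit a busy gateway better.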

2. Enhanced Agent Orchestration

There is a strong push for more "agent-aware" capabilities:

Dynamic Discovery: Request #7490 suggests adding a description field to agent configs so orchestrators can intelligently select sub-agents at runtime.
Self-Management: Request #6757 proposes a self-compact tool, allowing agents to trigger their own context compaction to avoid hitting limits during long-running tasks.
Sub-agent UX: Requests for graceful timeouts with pre-timeout warnings (#6625) aim to prevent total loss of work when a sub-agent is killed.
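The graceful-timeout request (#6625) can be illustrated with two timers: one that warns shortly before the deadline and one that enforces it. The sketch below is an assumption about how such a wrapper might look. The function name, option names, and error message are hypothetical and do not come from the request.

```typescript
export interface GracefulTimeoutOptions {
  timeoutMs: number; // hard deadline for the sub-agent
  warnBeforeMs: number; // how long before the deadline to warn
  onWarn: (remainingMs: number) => void; // e.g. ask the agent to save its work
}

// Hypothetical wrapper: resolves with the task result, or rejects at the
// deadline after having fired onWarn once beforehand.
export async function runWithGracefulTimeout<T>(
  task: Promise<T>,
  opts: GracefulTimeoutOptions,
): Promise<T> {
  let warnTimer: ReturnType<typeof setTimeout> | undefined;
  let killTimer: ReturnType<typeof setTimeout> | undefined;
  const deadline = new Promise<never>((_, reject) => {
    const warnAt = Math.max(0, opts.timeoutMs - opts.warnBeforeMs);
    warnTimer = setTimeout(() => opts.onWarn(opts.warnBeforeMs), warnAt);
    killTimer = setTimeout(
      () => reject(new Error("sub-agent timed out")),
      opts.timeoutMs,
    );
  });
  try {
    return await Promise.race([task, deadline]);
  } finally {
    // Always clear both timers so they cannot fire after the task settles.
    clearTimeout(warnTimer);
    clearTimeout(killTimer);
  }
}
```

The wrapper only stops waiting for the task; it does not cancel the underlying work. A real implementation would also need a way to signal the sub-agent to stop, for example through an AbortSignal.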

3. Data-Centric Security

A comprehensive proposal for "Security Profile v1.1" (#8719) advocates for moving away from LLM-based safety and toward hard enforcement. This includes defining data sensitivity levels (Safe, Critical, Secret) and gating destructive actions (Write/Delete/Export) behind human approval based on the resource's security level.
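Hard enforcement of this kind reduces to a deterministic lookup rather than a model judgment. The sketch below uses the sensitivity levels and destructive actions named in the proposal. The Read action, the decide function, and the exact rule (any destructive action on a non-Safe resource needs approval) are assumptions for illustration, not the policy defined in #8719.

```typescript
export type Sensitivity = "Safe" | "Critical" | "Secret";
export type Action = "Read" | "Write" | "Delete" | "Export";
export type Decision = "allow" | "require_approval";

// Actions the proposal lists as destructive.
const DESTRUCTIVE: ReadonlyArray<Action> = ["Write", "Delete", "Export"];

// Assumed rule: destructive actions on anything above Safe need a human.
export function decide(action: Action, sensitivity: Sensitivity): Decision {
  const destructive = DESTRUCTIVE.indexOf(action) !== -1;
  if (destructive && sensitivity !== "Safe") {
    return "require_approval";
  }
  return "allow";
}
```

Because the decision depends only on the action and the resource's label, it cannot be talked around by a prompt, which is the point of moving enforcement out of the LLM.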

Action Required

Immediate Attention (P0)

Event Loop Fixes: Address the synchronous blocking in agent preparation and the zombie session recovery logic to prevent system-wide deadlocks (#78861, #78908).
Transcript Repair: Fix the isTrimmableTrailingAssistantEntry logic in compaction-successor-transcript.ts to stop the deletion of valid assistant replies (#78883).
Config Preservation: Fix doctor --fix to stop silently deleting unrecognized but valid configuration fields like the mcp section (#78858, #78848).
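One way to avoid the config-deletion class of bug is to have a repair pass start from the full input and touch only the keys it knows how to fix. The sketch below is not the doctor --fix implementation. The names applyFixes and Fixer, and the port key in the usage note, are hypothetical.

```typescript
export type ConfigObject = Record<string, unknown>;
export type Fixer = (value: unknown) => unknown;

// Hypothetical repair pass: copies every key from the input, then rewrites
// only the keys that have a registered fixer. Unrecognized sections such as
// mcp pass through unchanged instead of being dropped.
export function applyFixes(
  config: ConfigObject,
  fixers: Record<string, Fixer>,
): ConfigObject {
  const repaired: ConfigObject = { ...config };
  for (const key of Object.keys(fixers)) {
    const fix = fixers[key];
    if (fix && key in repaired) {
      repaired[key] = fix(repaired[key]);
    }
  }
  return repaired;
}
```

For example, applyFixes({ port: "8080", mcp: { servers: {} } }, { port: (v) => Number(v) }) would normalize port while leaving the mcp section as it was. The copy is shallow, so nested sections are shared with the input rather than cloned.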

High Priority (P1)

Discord/Feishu Stability: Resolve the WS 1006 loops and startup blocking that are causing channel disconnects (#78910, #78840).
Sub-agent Routing: Fix the sessions_spawn scope errors (missing scope: operator.write) that are blocking native sub-agent delegation (#77807).