OpenClaw Issue Digest: Critical Event Loop Bottlenecks and Multi-Agent Instability

Open Issues

The recent activity window reveals a system under significant architectural strain, particularly regarding the Node.js event loop and the stability of multi-agent orchestration. While several quality-of-life feature requests for the Control UI and channel adapters (Telegram, Discord, Feishu) have been submitted, the core focus remains on resolving critical performance bottlenecks and data-loss risks.

Critical Performance and Stability Issues

Several reports indicate that the OpenClaw Gateway is suffering from severe event loop blocking. One critical report describes WebSocket response times jumping to 100+ seconds and agent dispatch taking up to 26 seconds before a model is even called. The root cause is identified as a single-threaded architecture where heavy synchronous operations—such as macOS keychain access via execSync and various readFileSync calls—monopolize the main thread, effectively freezing the system for all users.

Similarly, issues with the openai-codex harness have surfaced, where a regression in version 2026.5.12 causes a "harness not registered" error, silently forcing agents onto fallback models and degrading performance and cost predictability.

Multi-Agent and Session Routing Failures

Orchestration instability is a recurring theme. Users report that concurrent agents add commands race on the global config file, leading to lost agent configurations. Furthermore, isolated agent runs are tripping session lock timeouts, and some runs appear to fail at the CLI layer while continuing to mutate worktrees in the background as "detached" processes.

Routing errors are also prevalent in channel-specific implementations:

Feishu: Reports indicate that all messages are being routed to a single agent regardless of bindings, and group chat replies are erroneously dispatching to the main webchat session.
Discord: A critical preflight bug allows messages to pass through mention-gating when a mention appears in quoted text rather than as an active target.
Telegram: Forum-topic voice notes are reaching agents as raw audio, bypassing the transcription and echo paths that work in direct messages.

Memory and Context Management

Memory management remains a point of contention, with reports of "chaos" in how different users experience chunking and embedding. Specific technical failures include MEMORY.md silent truncation at 20K characters and a bug where max_tokens is not subtracting used input tokens, leading to API "too large" errors for models with large context windows.

Key Themes

1. Architectural Bottlenecks

There is a clear push toward moving away from synchronous I/O and single-threaded dispatch. The current model of rebuilding system prompts and resolving model configurations on every turn is creating unacceptable latency.

2. Security and Data Integrity

Security concerns are rising, with requests for automatic output sanitization to prevent API key leakage and a critical report regarding the gh-issues skill, which injects untrusted issue bodies directly into sub-agent prompts, creating a prompt injection vector for agents with shell access.

3. Tooling and UX Refinement

There is a strong demand for better developer and operator tooling, including a stable Plugin SDK to prevent "correctness drift" in third-party skills and a more robust openclaw doctor that provides actionable repair hints rather than raw schema errors.

Action Required

High Severity / Blockers

Event Loop Optimization: Immediate refactoring of execSync and readFileSync in the critical path is required to prevent gateway freezes.
Codex Harness Registration: Fix the harness name mismatch in 2026.5.12 to restore openai-codex functionality.
Discord Mention Gating: Tighten the attention window for mention detection to prevent incorrect routing.
Prompt Injection in gh-issues: Implement sanitization or isolation for untrusted issue bodies to prevent arbitrary command execution.

Blocked or High-Attention Issues

Multi-Agent Config Races: Implement file-locking or serialization for agents add to prevent config overwrites.
Feishu Routing: Resolve the regression where multiple agents are collapsed into a single session.
Subagent Completion Loss: Address the silent loss of subagent results when announce-back fails, which currently leaves parent sessions in a permanent "waiting" state.