OpenClaw Issue Digest: Concurrency Bottlenecks and Subagent Reliability
Open Issues
Recent activity in the OpenClaw repository reveals a significant cluster of issues related to concurrency, session management, and the reliability of subagent orchestration. While the platform continues to expand its feature set, several high-severity regressions are impacting the stability of multi-agent workflows and the responsiveness of interactive sessions.
Critical System Bottlenecks
Several reports highlight severe event-loop starvation and concurrency failures. A critical issue (#4337H) describes a scenario in which multiple concurrent agents cause all LLM API calls to time out simultaneously, even though the APIs are reachable. The suspected cause is that the default libuv thread pool size (UV_THREADPOOL_SIZE, 4 threads) is too small, so concurrent file I/O for session locks and logs becomes a bottleneck. Similarly, issue #82773 reports that sessions.usage requests can block the gateway for minutes on large profiles, causing WebSocket handshake timeouts and event-loop starvation.
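The thread-pool hypothesis rests on how Node.js schedules work. Asynchronous fs operations and dns.lookup, which http and https use by default to resolve hostnames, all run on libuv's thread pool. The sketch below is a toy FIFO model of a fixed-size pool, not a measurement of OpenClaw. The task names, durations, and the simulatePool helper are hypothetical. It shows one DNS lookup queued behind eight slow file operations: with 4 threads it waits for two full rounds of work, and with 16 threads it starts immediately.

```typescript
// Hypothetical model of a fixed-size worker pool such as libuv's thread pool
// (default size 4, overridable via UV_THREADPOOL_SIZE). Tasks are dispatched
// FIFO to whichever worker frees up first; nothing here calls libuv itself.
interface PoolTask {
  name: string;
  durationMs: number; // assumed service time for the task
}

/** Returns the time (ms from t=0) at which each task starts running. */
function simulatePool(poolSize: number, tasks: PoolTask[]): Map<string, number> {
  // freeAt[i] is when worker i next becomes idle.
  const freeAt: number[] = new Array(poolSize).fill(0);
  const starts = new Map<string, number>();
  for (const task of tasks) {
    // Pick the worker that becomes idle earliest (FIFO queue semantics).
    let worker = 0;
    for (let i = 1; i < poolSize; i++) {
      if (freeAt[i] < freeAt[worker]) worker = i;
    }
    const start = freeAt[worker];
    starts.set(task.name, start);
    freeAt[worker] = start + task.durationMs;
  }
  return starts;
}

// Illustrative workload (numbers are made up): 8 slow session-lock/log writes
// queued ahead of one DNS lookup needed before an LLM API request can be sent.
const workload: PoolTask[] = [
  ...Array.from({ length: 8 }, (_, i) => ({ name: `fs-${i}`, durationMs: 100 })),
  { name: "dns-lookup", durationMs: 1 },
];

console.log(simulatePool(4, workload).get("dns-lookup"));  // 200
console.log(simulatePool(16, workload).get("dns-lookup")); // 0
```

If the hypothesis holds, the usual mitigation is to set a larger pool size in the environment before the gateway process starts (for example UV_THREADPOOL_SIZE=16), because libuv sizes the pool when it is first used. Reducing the number of blocking file operations per turn attacks the same contention from the other side.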
Subagent and Orchestration Failures
Subagent reliability has emerged as a primary pain point. Users report "silent loss" of subagent completions (#44925), where results disappear without notification because of announce-back failures or timeouts. In addition, a critical concurrency bug (#82758) in the native Codex app-server lets one agent's new turn abort another agent's in-flight turn, leaving the affected agent "wedged" until the gateway is fully restarted.
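One plausible reading of #82758 is a shared-state problem. If every agent's turn goes through one process-global client, then cancelling "the current turn" cancels whichever agent happens to own it. The minimal sketch below shows the opposite design: it keys in-flight turns by agent ID, so a new turn can only supersede the same agent's previous turn. TurnRegistry and its methods are hypothetical names, not the Codex app-server's actual code. The sketch assumes Node.js 16+, where AbortController is a global.

```typescript
// Hypothetical sketch: track in-flight turns per agent instead of through one
// process-global client, so starting a turn for agent B cannot abort agent A.
// TurnRegistry, startTurn and finishTurn are illustrative, not OpenClaw APIs.
class TurnRegistry {
  private readonly inFlight = new Map<string, AbortController>();

  /** Starts a new turn for `agentId`, superseding only that agent's previous turn. */
  startTurn(agentId: string): AbortSignal {
    const previous = this.inFlight.get(agentId);
    previous?.abort(); // only the same agent's stale turn is cancelled
    const controller = new AbortController();
    this.inFlight.set(agentId, controller);
    return controller.signal;
  }

  /** Clears the slot when a turn completes, so a finished turn is never aborted later. */
  finishTurn(agentId: string, signal: AbortSignal): void {
    if (this.inFlight.get(agentId)?.signal === signal) {
      this.inFlight.delete(agentId);
    }
  }
}

const registry = new TurnRegistry();
const turnA = registry.startTurn("agent-a");
const turnB = registry.startTurn("agent-b");
console.log(turnA.aborted, turnB.aborted); // false false

const turnA2 = registry.startTurn("agent-a"); // agent A starts a follow-up turn
console.log(turnA.aborted, turnA2.aborted); // true false
```

The key property is that the abort scope matches the ownership scope. A bug in one agent's turn handling can at worst cancel that agent's own work, so no other agent is left wedged waiting for a turn that was cancelled out from under it.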
Channel and Delivery Regressions
Delivery failures are appearing across multiple channels. In Telegram, inbound messages from Supergroup Forum Topics are not being processed (#81530), and interactive responses in groups are silently dropped (#82742). On Discord, internal tool-call traces (such as NO_REPLY and raw JSON arguments) are leaking into public channels (#44905), posing a security and UX risk.
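The Discord leak calls for outbound filtering. A hedged sketch of that filtering: every outgoing message passes through one function that suppresses messages consisting only of a NO_REPLY marker and drops lines that parse as a bare JSON object. The marker handling and the "whole line is a JSON object" heuristic are assumptions for illustration. OpenClaw's real trace format may need different rules, and sanitizeOutbound is a hypothetical name.

```typescript
// Hypothetical outbound sanitizer applied on every channel's send path.
// The marker format and the "raw JSON line" heuristic are assumptions for
// illustration; OpenClaw's actual trace format may differ.
const NO_REPLY = "NO_REPLY";

/** Returns the text to deliver, or null if the message should be suppressed. */
function sanitizeOutbound(text: string): string | null {
  const kept = text
    .split("\n")
    .filter((line) => !looksLikeRawJsonObject(line)) // drop leaked tool arguments
    .map((line) => line.split(NO_REPLY).join("").trimEnd()) // strip marker tokens
    .join("\n")
    .trim();
  // A message that was only a marker or only trace output is not sent at all.
  return kept.length > 0 ? kept : null;
}

/** True when a whole line parses as a JSON object, e.g. leaked tool arguments. */
function looksLikeRawJsonObject(line: string): boolean {
  const t = line.trim();
  if (!t.startsWith("{") || !t.endsWith("}")) return false;
  try {
    const parsed: unknown = JSON.parse(t);
    return typeof parsed === "object" && parsed !== null;
  } catch {
    return false; // not valid JSON, so treat it as ordinary prose
  }
}

console.log(sanitizeOutbound("NO_REPLY")); // null
console.log(sanitizeOutbound('{"path": "/tmp/x", "limit": 5}\nDone, file updated.'));
// Done, file updated.
```

Putting this check at the single point where all channels send, rather than in each channel adapter, is what makes it universal. The trade-off is false positives. If a user legitimately asks for a one-line JSON object, the sanitizer would strip it. A production version would more likely key on structured metadata from the tool-call layer than on text patterns.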
Key Themes
1. Concurrency and Resource Contention
There is a recurring theme of "silent stalls" and "wedged" sessions. Whether it is the Codex app-server global client contention (#82758) or the shared timeoutSeconds budget across fallback chains (#43374), the system is struggling to isolate concurrent workloads. The evidence suggests that the transition to multi-agent orchestration has outpaced the underlying resource management and locking mechanisms.
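A small model makes the shared-budget problem in #43374 concrete. In the sketch below, the names, numbers, and runFallbackChain helper are hypothetical. A primary model that hangs consumes the entire 60-second budget when that budget is shared, so the fallback never gets a chance to run. With a per-attempt budget, the fallback answers.

```typescript
// Hypothetical model of a fallback chain under a timeout budget (#43374).
// `succeedsAfterSeconds: null` means the attempt hangs until it is cut off.
interface Attempt {
  model: string;
  succeedsAfterSeconds: number | null;
}

/** Returns the model that answered, or null if the chain ran out of time. */
function runFallbackChain(
  chain: Attempt[],
  timeoutSeconds: number,
  sharedBudget: boolean,
): string | null {
  let remaining = timeoutSeconds; // only consulted when the budget is shared
  for (const attempt of chain) {
    const allowance = sharedBudget ? remaining : timeoutSeconds;
    if (allowance <= 0) return null; // earlier attempts used up the shared budget
    const needed = attempt.succeedsAfterSeconds;
    if (needed !== null && needed <= allowance) return attempt.model;
    // The attempt timed out and burned its whole allowance.
    if (sharedBudget) remaining -= allowance;
  }
  return null;
}

const chain: Attempt[] = [
  { model: "primary-model", succeedsAfterSeconds: null }, // hangs
  { model: "fallback-model", succeedsAfterSeconds: 5 },
];

console.log(runFallbackChain(chain, 60, true)); // null
console.log(runFallbackChain(chain, 60, false)); // fallback-model
```

Per-attempt budgets have their own cost. Worst-case wall time grows to timeoutSeconds multiplied by the chain length, so a real fix might pair a per-attempt cap with an overall ceiling.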
2. The "Silent Failure" Pattern
Across subagents, cron jobs, and channel delivery, there is a pattern of failures that provide no feedback to the operator.
- Subagents: Time out without notifying the parent (#82784).
- Cron Jobs: Skip runs silently when local providers are unreachable, ignoring cloud fallbacks (#79329).
- Delivery: Responses are generated but never sent to the channel (#82742).
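For the cron case (#79329), one way to remove the silent path is to have provider selection return an explicit outcome instead of a bare skip. The sketch assumes a hypothetical reachability probe and made-up provider names. pickProvider and describeSkip are illustrative helpers, not OpenClaw functions.

```typescript
// Hypothetical provider selection for a cron run (#79329): try the local
// provider first, then cloud fallbacks, and report an explicit outcome.
type ProviderOutcome =
  | { kind: "run"; provider: string }
  | { kind: "skipped"; reason: string; tried: string[] };

function pickProvider(
  chain: string[],
  isReachable: (provider: string) => boolean, // assumed health probe
): ProviderOutcome {
  const tried: string[] = [];
  for (const provider of chain) {
    tried.push(provider);
    if (isReachable(provider)) return { kind: "run", provider };
  }
  return { kind: "skipped", reason: "no reachable provider", tried };
}

/** Turns a skip into an operator-visible message; a run needs no message. */
function describeSkip(jobId: string, outcome: ProviderOutcome): string | null {
  if (outcome.kind === "run") return null;
  return `cron ${jobId} skipped: ${outcome.reason} (tried ${outcome.tried.join(", ")})`;
}

const providerChain = ["local-llm", "cloud-llm"]; // hypothetical provider names
const localDown = pickProvider(providerChain, (p) => p !== "local-llm");
console.log(localDown); // { kind: 'run', provider: 'cloud-llm' }

const allDown = pickProvider(providerChain, () => false);
console.log(describeSkip("nightly-report", allDown));
// cron nightly-report skipped: no reachable provider (tried local-llm, cloud-llm)
```

The same shape applies to subagent timeouts and dropped deliveries. When the failure branch is a typed value rather than an early return, the compiler forces every caller to decide where that failure is reported.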
3. Configuration and Secret Management UX
Users are reporting friction with the openclaw.json workflow. Issues include the lack of a chat-based interface for managing Codex plugins (#82218) and the "plaintext" audit warnings that persist even when using environment variable references (#53998).
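For #53998, the audit only needs to distinguish a literal secret from a reference to one. The sketch below assumes a ${VAR_NAME} reference syntax purely for illustration. The syntax openclaw.json actually accepts may differ, and auditSecretField is a hypothetical helper.

```typescript
// Hypothetical audit rule for #53998. The `${VAR_NAME}` reference syntax is an
// assumption for illustration, not confirmed openclaw.json syntax.
const ENV_REFERENCE = /^\$\{[A-Za-z_][A-Za-z0-9_]*\}$/;

/** Returns a plaintext warning for a secret field, or null if none is needed. */
function auditSecretField(path: string, value: string): string | null {
  const v = value.trim();
  if (v === "") return null; // unset: nothing is stored
  if (ENV_REFERENCE.test(v)) return null; // only a reference: the secret lives in the environment
  return `${path}: secret is stored as plaintext`;
}

console.log(auditSecretField("providers.example.apiKey", "${EXAMPLE_API_KEY}")); // null
console.log(auditSecretField("providers.example.apiKey", "abc123"));
// providers.example.apiKey: secret is stored as plaintext
```

Values that merely contain a reference, such as a literal prefix joined to ${VAR}, are still flagged. That choice errs on the side of warning, since part of the value is stored in the file.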
Action Required
Immediate contributor attention is needed for the following high-severity items:
- [Critical] Codex Concurrency (#82758): Fix the process-global shared client in the Codex app-server to prevent cross-agent turn abortions.
- [High] Discord Leakage (#44905): Implement a universal outbound sanitizer to prevent internal tool traces and NO_REPLY markers from reaching end users.
- [High] Telegram Forum Processing (#81530, #82742): Resolve the regression preventing inbound message processing and outbound response delivery in Supergroup Forum Topics.
- [High] Event Loop Starvation (#43374, #82773): Address the UV_THREADPOOL_SIZE limitations and the synchronous nature of usage cache rebuilds to prevent gateway-wide freezes.
- [Medium] Subagent Reliability (#44925, #82787): Implement a reconciliation mechanism for late subagent success and a robust retry/notification system for completion announces.
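The subagent reliability item combines two mechanisms, sketched here under stated assumptions. The first is a status reconciler that lets a late success overwrite a recorded timeout. The second is an announce loop with exponential backoff that reports failure explicitly instead of dropping the result. The status names, reconcile, and announceWithRetry are hypothetical. To keep the example synchronous, the loop records the delays it would wait rather than sleeping.

```typescript
// Hypothetical sketch for #44925 / #82787. Names are illustrative only.
type SubagentStatus = "running" | "timed_out" | "succeeded" | "failed";
type SubagentEvent = "timeout" | "success" | "failure";

/** Applies an event; a late success upgrades a recorded timeout instead of being dropped. */
function reconcile(current: SubagentStatus, event: SubagentEvent): SubagentStatus {
  if (event === "success" && (current === "running" || current === "timed_out")) {
    return "succeeded";
  }
  if (current !== "running") return current; // other late events do not change a settled status
  return event === "timeout" ? "timed_out" : "failed";
}

/**
 * Retries the completion announce with exponential backoff. `send` returns true
 * once the parent acknowledges. Delays are recorded, not slept, in this sketch.
 */
function announceWithRetry(
  send: () => boolean,
  maxAttempts: number,
  baseDelayMs: number,
): { delivered: boolean; attempts: number; backoffMs: number[] } {
  const backoffMs: number[] = [];
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    if (send()) return { delivered: true, attempts: attempt, backoffMs };
    if (attempt < maxAttempts) backoffMs.push(baseDelayMs * 2 ** (attempt - 1));
  }
  // Not delivered: the caller must notify the operator rather than drop the result.
  return { delivered: false, attempts: maxAttempts, backoffMs };
}

let calls = 0;
const flakySend = (): boolean => ++calls >= 3; // fails twice, then the parent acknowledges
console.log(reconcile("timed_out", "success")); // succeeded
console.log(announceWithRetry(flakySend, 5, 500));
// { delivered: true, attempts: 3, backoffMs: [ 500, 1000 ] }
```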