OpenClaw Issue Digest: Routing Failures, Performance Regressions, and Sandbox Conflicts
Open Issues
Recent activity in the OpenClaw repository reveals several high-severity regressions and architectural bottlenecks, particularly affecting the Codex runtime and messaging channel reliability.
Critical Runtime & Auth Failures
One of the most severe issues involves the Codex harness, where users are reporting a total failure of the OpenAI Codex OAuth path. Issue #83380 and #81941 highlight a critical bug where valid OAuth profiles are not bound to requests, resulting in 401 Unauthorized errors and token_expired messages even for fresh logins. This is compounded by a performance collapse in v2026.5.12, where the Codex runtime path causes severe latency, high CPU usage, and stuck sessions (#82065), making the gateway effectively unusable for some users.
Messaging & Routing Regressions
Routing integrity is a recurring theme. A critical bug in the Signal channel (#83393) causes final assistant replies to be silently dropped or routed to internal surfaces (like Codex/VSCode) instead of returning to the original Signal recipient. Similarly, the Feishu/Lark channel is suffering from a massive performance regression (#82073), with response delays of 26-46 seconds per message due to a lack of caching in the core-plugin-tools and system prompt assembly stages.
Furthermore, Anthropic API compatibility is being broken by group chat context injection (#83419). By injecting metadata as a separate {role: "user"} message, OpenClaw creates consecutive same-role messages, which the Anthropic API strictly rejects. This results in silent fallbacks to Gemini models, meaning agents do not run on their configured primary models.
Infrastructure & Sandbox Conflicts
Sandbox nesting has emerged as a blocker for Docker users. Issue #83018 describes a conflict where the Codex inner bwrap sandbox fails when running inside an OpenClaw-managed Docker sandbox, leading to Operation not permitted errors during shell execution. This creates a nested isolation failure that prevents basic tool use.
Key Themes
1. The "Silent Failure" Pattern
Across multiple plugins, there is a trend of failures that do not surface clearly to the operator:
- Telegram Initialization: Bots stuck in "Bot not initialized" loops due to missing
bot.init()calls in isolated polling ingress (#81973). - Discord Token Resolution: Ref-based token resolution fails silently, skipping WebSocket connections without logging errors (#81926).
- Claude-CLI Classification: Empty subprocess responses are misclassified as
billingfailures, triggering unnecessary cooldowns (#83231).
2. Resource Leaks & Event Loop Starvation
Memory and file descriptor management are causing gateway instability:
- FD Leaks: A massive leak of ~14,000 file descriptors has been observed over 7 hours of uptime (#77327), eventually leading to
spawn EBADFand complete gateway unresponsiveness. - Watcher Overhead: The
memorySearchsync watcher leaks thousands of FDs when tracking large directory trees (#78224), saturating ulimits. - Event Loop Delay: Severe event loop starvation is being reported on Raspberry Pi canary builds (#83456), where cron forced-runs close the gateway and trigger CPU warnings.
3. Tool-Level Governance & Security
There is a growing demand for more granular control over agent actions. Feature requests like #48304 (Tool-level authorization) and #6615 (Exec-approval denylists) highlight a gap in the current security model, where prompt-based rules are consistently ignored by models in favor of being "helpful."
Action Required
Immediate Attention (P1/Critical)
- Codex Auth Fix: Resolve the
profile=-and401 token_expiredissues in the Codex harness to restore functionality for OAuth users (#83380, #81941). - Anthropic Role Alternation: Merge the fix to squash consecutive user messages in group chats to prevent forced fallbacks to Gemini (#83419).
- Signal Routing Invariant: Implement a delivery invariant to ensure Signal inbound turns always return to the Signal channel, regardless of the active runner surface (#83393).
High Priority (Performance & Stability)
- Feishu Latency: Implement caching for
core-plugin-toolsand system prompt assembly to reduce the 30s+ prep time (#82073). - FD Leak Triage: Investigate the
REGfile descriptor leak in the gateway to preventspawn EBADFcrashes (#77327). - Codex Sandbox Toggle: Add a configuration option to disable the inner Codex sandbox when an outer Docker sandbox is already active (#83018).
Blocked/Needs Decision
- Computer-Use Integration: Decide on the architectural placement of
screenshotandclicktools. The community is pushing for these to be native first-class primitives rather than skills to enable organic LLM iteration (#82083).