OpenClaw Issue Digest: Tooling Regressions and Multi-Agent Orchestration Gaps

The recent window of activity in the OpenClaw repository highlights a period of significant architectural tension. While the project continues to expand its multi-agent capabilities, several high-severity regressions have emerged in the core execution engine, particularly in tool calls and the Codex runtime, and they threaten the stability of automated workflows.

Simultaneously, there is a clear trend toward "production-grade" requirements. Contributors are increasingly requesting deterministic cost governance, structured agent handoffs, and better observability for long-running tasks, signaling a shift from experimental agent use to deployed automation.

Open Issues

Critical Regressions & Stability

Several issues point to a breakdown in the reliability of tool execution and runtime stability:

Codex Runtime Stalls: Issue #83109 reports a critical regression where Codex-runtime agents stall indefinitely during tool-using turns. This is attributed to hardcoded features.code_mode_only: true flags in the @openclaw/codex plugin, which force a synthetic JS-eval tool that fails to trigger the necessary task_complete events.
Tool Call Hangs: Issue #83546 describes a regression in v2026.5.12 where tool calls frequently hang in WebChat, specifically when a tool produces large output. Related reports describe commands.log no longer being written at all after gateway restarts.
Codex Dynamic Tooling: Issue #83474 highlights sessions getting stuck in blocked_tool_call state even after successful execution of dynamic bash commands in the Codex harness.
Event Loop Degradation: Multiple reports (#82936, #77115) indicate severe event-loop stalls and high CPU usage under subagent load, with some cases seeing P99 delays of over 12 seconds, leading to CLI timeouts and SIGKILLs.
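
To make the event-loop figures above concrete, here is a minimal sketch of how loop lag and its P99 can be measured. It uses Python's asyncio as a stand-in for the gateway's runtime, which is not reproduced here; the function names (sample_loop_lag, blocking_work) and all timings are hypothetical. The idea is timer drift: ask to sleep for a fixed interval and record how late each wake-up arrives.

```python
import asyncio
import math
import time


def percentile(samples: list[float], q: float) -> float:
    """Nearest-rank percentile: smallest sample with at least q% of values at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(q / 100 * len(ordered)))
    return ordered[rank - 1]


async def sample_loop_lag(interval: float, count: int) -> list[float]:
    """Sleep `interval` seconds `count` times and record how late each wake-up is."""
    lags = []
    for _ in range(count):
        start = time.perf_counter()
        await asyncio.sleep(interval)
        lags.append(max(0.0, time.perf_counter() - start - interval))
    return lags


async def blocking_work(seconds: float) -> None:
    """Hypothetical stand-in for a synchronous handler that starves the loop."""
    time.sleep(seconds)  # deliberately blocks the event loop


async def main() -> float:
    sampler = asyncio.create_task(sample_loop_lag(interval=0.01, count=20))
    await asyncio.sleep(0.03)   # let the sampler collect a few clean samples
    await blocking_work(0.2)    # one blocking burst delays the next wake-up
    lags = await sampler
    return percentile(lags, 99)


if __name__ == "__main__":
    print(f"P99 loop lag: {asyncio.run(main()):.3f}s")
```

A single 200 ms blocking call is enough to dominate the P99 of a short sample window, which is why tail latency is a more useful signal here than the mean.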

Multi-Agent & Orchestration Gaps

As users deploy more complex agent swarms, the limitations of the current hierarchical delegation model have become apparent:

Silent Spawn Failures: Issue #83557 reveals that ad-hoc subagent spawns on OpenAI GPT models fail silently if any thinking level other than off is requested.
Information Silos: A comprehensive RFC (#35203) proposes a "Multi-Agent Collaboration Stack" to solve the problem of isolated workspaces. The proposal suggests a shared "Blackboard" for discoveries and a layered memory system (Private/Team/Global) to prevent redundant research.
Handoff Fragility: Issue #33478 argues that the current REPLY_SKIP logic for agent-to-agent handoffs is too fragile, as any conversational chatter from the LLM (e.g., "Success!") kills the internal announce loop.
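
The layered memory idea from the RFC summary above can be illustrated with a small sketch. Only the layer names (Private/Team/Global) and the shared "Blackboard" concept come from the digest; the class name, the lookup order, and the example keys and values are assumptions made for illustration, not the RFC's design.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class LayeredMemory:
    """One agent's view of memory: its own private store plus shared layers."""

    team: dict[str, str]      # shared by reference; plays the role of the blackboard
    global_: dict[str, str]   # shared by reference across all teams
    private: dict[str, str] = field(default_factory=dict)

    def lookup(self, key: str) -> Optional[tuple[str, str]]:
        """Return (layer, value) from the narrowest layer that has the key."""
        # Assumed order: private first, then team, then global.
        for name, store in (
            ("private", self.private),
            ("team", self.team),
            ("global", self.global_),
        ):
            if key in store:
                return name, store[key]
        return None

    def publish(self, key: str, value: str) -> None:
        """Post a discovery to the team layer so peers can reuse it."""
        self.team[key] = value


# Two agents share the same team and global stores (illustrative values).
team_board: dict[str, str] = {}
global_facts = {"project": "openclaw"}
researcher = LayeredMemory(team=team_board, global_=global_facts)
writer = LayeredMemory(team=team_board, global_=global_facts)

researcher.private["scratch"] = "draft notes"
researcher.publish("api_rate_limit", "60 req/min")
```

Here writer.lookup("api_rate_limit") finds the researcher's discovery in the team layer without repeating the research, while writer.lookup("scratch") returns None because private entries stay with the agent that wrote them.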

Infrastructure & Security

Sandbox Escapes: Issue #17931 points out a security gap where skill directories are copied into writable sandbox workspaces, allowing agents to potentially modify their own instructions.
SSRF Risks: Issue #38931 requests a "confirm" mode for private network access to balance the need for local NAS/router management with the risk of malicious internal scanning.
Auth Regressions: Issue #83558 reports that the device-code authentication method for OpenAI Codex was dropped in v2026.5.12, blocking headless VPS installs.
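
As an illustration of the "confirm" mode requested in #38931, the following sketch classifies an outbound target with Python's standard ipaddress module. The function name, the Decision enum, and the three mode strings are hypothetical and do not reflect OpenClaw's actual configuration surface.

```python
import ipaddress
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    CONFIRM = "confirm"   # pause and ask the user before connecting
    DENY = "deny"


def classify_target(resolved_ip: str, private_mode: str = "confirm") -> Decision:
    """Decide how to treat an outbound request to an already-resolved IP.

    The check must run on the resolved address, not the hostname, so that a
    public-looking name pointing at an internal address is still caught.
    """
    if private_mode not in ("allow", "confirm", "deny"):
        raise ValueError(f"unknown private_mode: {private_mode!r}")

    addr = ipaddress.ip_address(resolved_ip)
    internal = addr.is_private or addr.is_loopback or addr.is_link_local
    if not internal:
        return Decision.ALLOW
    return Decision(private_mode)


if __name__ == "__main__":
    for ip in ("8.8.8.8", "192.168.1.10", "169.254.169.254"):
        print(ip, classify_target(ip).value)
```

With the default mode, a NAS at 192.168.1.10 yields a confirmation prompt instead of a silent allow or a hard block, which is the trade-off the issue asks for.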

Key Themes

1. The "Production-Grade" Shift

There is a recurring theme of moving away from "best-effort" AI behavior toward deterministic control. This is evident in requests for:

Cost Governance: Requests for per-turn model overrides (#83565) and global token budgets (#35203) to prevent "token runaway" in multi-agent loops.
Deterministic Execution: Proposals for a payload.kind = "exec" option for cron jobs (#18160) that bypasses the LLM entirely for simple scripts.
Observability: A strong demand for human-readable live progress logs (#83441) to replace the need for parsing raw trajectory JSONL files.
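
A global token budget of the kind requested above could work roughly as follows. This is a minimal sketch, not the proposed implementation: the TokenBudget class, the BudgetExceeded exception, and the per-turn cost figures are all hypothetical, and costs are treated as known up front even though real token counts are typically only available after a model call.

```python
class BudgetExceeded(RuntimeError):
    """Raised when a charge would push usage past the shared limit."""


class TokenBudget:
    """A single counter shared by every agent in a multi-agent run."""

    def __init__(self, limit: int) -> None:
        if limit <= 0:
            raise ValueError("limit must be positive")
        self.limit = limit
        self.used = 0

    @property
    def remaining(self) -> int:
        return self.limit - self.used

    def charge(self, agent: str, tokens: int) -> int:
        """Record `tokens` against the budget and return what is left."""
        if tokens < 0:
            raise ValueError("tokens must be non-negative")
        if self.used + tokens > self.limit:
            raise BudgetExceeded(
                f"{agent} needs {tokens} tokens, only {self.remaining} remain"
            )
        self.used += tokens
        return self.remaining


def run_loop(budget: TokenBudget, turn_costs: list[tuple[str, int]]) -> int:
    """Run turns in order and stop at the first one the budget cannot cover."""
    completed = 0
    for agent, cost in turn_costs:
        try:
            budget.charge(agent, cost)
        except BudgetExceeded:
            break  # hard stop instead of letting the loop run away
        completed += 1
    return completed


if __name__ == "__main__":
    budget = TokenBudget(limit=1000)
    turns = [("planner", 400), ("worker", 400), ("planner", 400)]
    print(run_loop(budget, turns), budget.remaining)
```

The point of the hard stop is determinism: the loop ends at a known boundary regardless of what the agents decide to do next.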

2. Modality Expansion

Users are pushing the boundaries of what agents can "sense" and "do":

Audio Integration: Requests to treat audio files as multimodal attachments (#35835) rather than passing them through as raw binary text.
Native Search: A push to leverage the free native web search capabilities of Gemini and GLM (#17925) instead of relying on paid third-party APIs.

3. UX Refinement for Power Users

As the toolset grows, the UI is lagging behind. Key requests include a persistent "Active Agent" indicator in the dashboard (#30861) and better conversation management and categorization in WebChat (#27526).

Action Required

Immediate Attention (P0/P1)

Fix Codex Runtime Flags: Resolve the hardcoded features.code_mode_only: true flags in @openclaw/codex to restore tool-using capabilities for Codex agents (#83109).
Address Event Loop Stalls: Investigate the diagnostic event dispatch path to prevent the gateway from starving the main event loop during concurrent agent bursts (#82936).
Restore Device-Code Auth: Re-implement the device-code flow for Codex to unblock headless installations (#83558).

Blocked or High-Severity

Subagent Spawn Logic: Fix the silent failure of ad-hoc subagent spawns on OpenAI GPT models when a thinking level other than off is requested (#83557).
Sandbox Security: Implement read-only bind mounts for skill directories to prevent agent self-modification (#17931).
WebChat Hangs: Diagnose the I/O or streaming issue causing tool outputs to stall in v2026.5.12 (#83546).