OpenClaw Issue Digest: Tooling Regressions and Multi-Agent Orchestration Gaps

06:30–12:30 UTC May 18, 2026

The latest window of activity in the OpenClaw repository shows significant architectural tension. While the project continues to expand its multi-agent capabilities, several high-severity regressions have emerged in the core execution engine, particularly around tool calls and the Codex runtime, and they threaten the stability of automated workflows.

Simultaneously, there is a clear trend toward "production-grade" requirements. Contributors are increasingly requesting deterministic cost governance, structured agent handoffs, and better observability for long-running tasks, signaling a shift from experimental agent use to deployed automation.

Open Issues

Critical Regressions & Stability

Several issues point to a breakdown in the reliability of tool execution and runtime stability:

  • Codex Runtime Stalls: Issue #83109 reports a critical regression where Codex-runtime agents stall indefinitely during tool-using turns. This is attributed to hardcoded features.code_mode_only: true flags in the @openclaw/codex plugin, which force a synthetic JS-eval tool that fails to trigger the necessary task_complete events.
  • Tool Call Hangs: Issue #83546 describes a regression in v2026.5.12 where tool outputs frequently hang in WebChat, specifically when tools produce large outputs. This is compounded by reports of commands.log stopping entirely after gateway restarts.
  • Codex Dynamic Tooling: Issue #83474 highlights sessions getting stuck in the blocked_tool_call state even after dynamic bash commands execute successfully in the Codex harness.
  • Event Loop Degradation: Multiple reports (#82936, #77115) indicate severe event-loop stalls and high CPU usage under subagent load, with some cases seeing P99 delays of over 12 seconds, leading to CLI timeouts and SIGKILLs.
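
The P99 delay figure in the event-loop reports can be made concrete with a small measurement sketch. The code below is not OpenClaw's diagnostic path; it is an illustrative Python asyncio analogue in which every name (sample_loop_lag, percentile, blocking_burst) is hypothetical. It schedules short sleeps and records how late each one wakes up, which approximates how long the loop was starved.

```python
import asyncio
import math
import time


async def sample_loop_lag(interval: float, samples: int) -> list[float]:
    """Record how late each sleep(interval) wakes up, in seconds."""
    lags = []
    for _ in range(samples):
        start = time.perf_counter()
        await asyncio.sleep(interval)
        # Anything beyond the requested interval is time the loop was busy.
        lags.append(max(0.0, time.perf_counter() - start - interval))
    return lags


def percentile(values: list[float], pct: float) -> float:
    """Nearest-rank percentile of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


async def blocking_burst(duration: float) -> None:
    """Stand-in for synchronous work that starves the loop."""
    await asyncio.sleep(0.02)
    time.sleep(duration)  # deliberately blocks the event loop


async def measure() -> list[float]:
    # Run the sampler and the blocking work on the same loop.
    lags, _ = await asyncio.gather(
        sample_loop_lag(0.01, 20),
        blocking_burst(0.2),
    )
    return lags
```

Calling percentile(asyncio.run(measure()), 99) yields the worst observed wake-up delay for this run. A real gateway would sample continuously and alert when the value crosses a threshold.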

Multi-Agent & Orchestration Gaps

As users deploy more complex agent swarms, the limitations of the current hierarchical delegation model have become apparent:

  • Silent Spawn Failures: Issue #83557 reveals that ad-hoc subagent spawns on OpenAI GPT models fail silently if any thinking level other than off is requested.
  • Information Silos: A comprehensive RFC (#35203) proposes a "Multi-Agent Collaboration Stack" to solve the problem of isolated workspaces. The proposal suggests a shared "Blackboard" for discoveries and a layered memory system (Private/Team/Global) to prevent redundant research.
  • Handoff Fragility: Issue #33478 argues that the current REPLY_SKIP logic for agent-to-agent handoffs is too fragile, as any conversational chatter from the LLM (e.g., "Success!") kills the internal announce loop.
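
The layered memory idea from the RFC can be illustrated with a short sketch. This is not the design proposed in #35203; the LayeredMemory class and its method names are assumptions made for illustration. It shows one plausible lookup order, where an agent's private notes shadow team notes, which in turn shadow global ones.

```python
from typing import Optional


class LayeredMemory:
    """Hypothetical Private/Team/Global store with most-specific-first reads."""

    def __init__(self) -> None:
        self._private: dict[tuple[str, str], str] = {}  # keyed by (agent, key)
        self._team: dict[tuple[str, str], str] = {}     # keyed by (team, key)
        self._global: dict[str, str] = {}               # keyed by key

    def write_private(self, agent: str, key: str, value: str) -> None:
        self._private[(agent, key)] = value

    def write_team(self, team: str, key: str, value: str) -> None:
        self._team[(team, key)] = value

    def write_global(self, key: str, value: str) -> None:
        self._global[key] = value

    def read(self, agent: str, team: str, key: str) -> Optional[str]:
        """Return the most specific value visible to this agent, or None."""
        if (agent, key) in self._private:
            return self._private[(agent, key)]
        if (team, key) in self._team:
            return self._team[(team, key)]
        return self._global.get(key)
```

With this layout, a discovery written once at the team layer is visible to every agent on that team, which is the redundancy the RFC aims to remove, while agents on other teams see nothing unless it is promoted to the global layer.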

Infrastructure & Security

  • Sandbox Escapes: Issue #17931 points out a security gap where skill directories are copied into writable sandbox workspaces, allowing agents to potentially modify their own instructions.
  • SSRF Risks: Issue #38931 requests a "confirm" mode for private network access to balance the need for local NAS/router management with the risk of malicious internal scanning.
  • Auth Regressions: Issue #83558 reports that the device-code authentication method for OpenAI Codex was dropped in v2026.5.12, blocking headless VPS installs.
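
The "confirm" mode requested in #38931 amounts to a three-way policy for internal addresses. The sketch below is a minimal illustration using Python's standard ipaddress module; the function names and the mode strings "allow", "confirm", and "deny" are assumptions, not OpenClaw configuration. It only classifies IP literals, so a hostname would need to be resolved first and the resolved address checked.

```python
import ipaddress

MODES = {"allow", "confirm", "deny"}  # hypothetical policy values


def is_internal(address: str) -> bool:
    """True for private, loopback, or link-local IP literals."""
    ip = ipaddress.ip_address(address)
    return ip.is_private or ip.is_loopback or ip.is_link_local


def decide(address: str, private_mode: str) -> str:
    """Return the action for a request to `address` under the given policy."""
    if private_mode not in MODES:
        raise ValueError(f"unknown mode: {private_mode}")
    if not is_internal(address):
        return "allow"  # public addresses are not subject to this policy
    return private_mode  # internal targets follow the configured mode
```

Under this sketch, decide("192.168.1.10", "confirm") returns "confirm", so a NAS on the local network would prompt the user rather than being blocked outright or reached silently.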

Key Themes

1. The "Production-Grade" Shift

There is a recurring theme of moving away from "best-effort" AI behavior toward deterministic control. This is evident in requests for:

  • Cost Governance: Requests for per-turn model overrides (#83565) and global token budgets (#35203) to prevent "token runaway" in multi-agent loops.
  • Deterministic Execution: A proposal for a payload.kind = "exec" option for cron jobs (#18160), which would bypass the LLM entirely for simple scripts.
  • Observability: A strong demand for human-readable live progress logs (#83441) to replace the need for parsing raw trajectory JSONL files.
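
A global token budget of the kind requested for cost governance can be sketched as a small guard that every agent turn charges against. The TokenBudget and BudgetExceeded names are hypothetical and do not reflect any existing OpenClaw API. The point is that the check is deterministic: a turn that would exceed the limit is refused before it runs.

```python
class BudgetExceeded(RuntimeError):
    """Raised when a charge would push usage past the limit."""


class TokenBudget:
    """Hypothetical shared budget for a multi-agent run."""

    def __init__(self, limit: int) -> None:
        if limit < 0:
            raise ValueError("limit must be non-negative")
        self.limit = limit
        self.used = 0

    def charge(self, tokens: int) -> int:
        """Reserve `tokens` and return the remaining budget."""
        if tokens < 0:
            raise ValueError("tokens must be non-negative")
        if self.used + tokens > self.limit:
            raise BudgetExceeded(
                f"requested {tokens}, only {self.limit - self.used} left"
            )
        self.used += tokens
        return self.limit - self.used
```

An orchestrator could call charge() with an estimated token count before dispatching each turn and stop the loop on BudgetExceeded, which bounds the cost of a runaway agent-to-agent exchange.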

2. Modality Expansion

Users are pushing the boundaries of what agents can "sense" and "do":

  • Audio Integration: Requests to treat audio files as multimodal attachments (#35835) rather than raw binary text.
  • Native Search: A push to leverage the free native web search capabilities of Gemini and GLM (#17925) instead of relying on paid third-party APIs.

3. UX Refinement for Power Users

As the toolset grows, the UI is lagging. Key requests include a persistent "Active Agent" indicator in the dashboard (#30861) and better conversation management/categorization in WebChat (#27526).

Action Required

Immediate Attention (P0/P1)

  • Fix Codex Runtime Flags: Resolve the hardcoded code_mode_only flags in @openclaw/codex to restore tool-using capabilities for Codex agents (#83109).
  • Address Event Loop Stalls: Investigate the diagnostic event dispatch path to prevent the gateway from starving the main event loop during concurrent agent bursts (#82936).
  • Restore Device-Code Auth: Re-implement the device-code flow for Codex to unblock headless installations (#83558).

Blocked or High-Severity

  • Subagent Spawn Logic: Fix the silent failure of OpenAI-family subagent spawns when reasoning is enabled (#83557).
  • Sandbox Security: Implement read-only bind mounts for skill directories to prevent agent self-modification (#17931).
  • WebChat Hangs: Diagnose the I/O or streaming issue causing tool outputs to stall in v2026.5.12 (#83546).
