OpenClaw Issue Digest: Runtime Parity, Tooling Loops, and Infrastructure Stability

The recent activity window for the OpenClaw repository shows the project moving toward Codex as the default runtime for OpenAI agent turns. The transition raises parity questions, and a QA harness is being built to check that tool-call shapes, token efficiency, and auth-profile selection stay consistent between the legacy Pi runtime and the new Codex runtime.

Beyond the runtime migration, the community has identified several critical behavior bugs, most notably infinite tool-call loops and session-management regressions, that reduce the reliability of autonomous agents across channel integrations including Telegram, Mattermost, and Microsoft Teams.

Open Issues

Runtime Parity and Codex Migration

Central to current development is the Codex-vs-Pi runtime parity effort. A new QA harness is being implemented to track drift across scenarios, runtimes, and auth shapes. Key areas of concern include:

Token Efficiency: Reports indicate that while Codex is often cheaper, certain fixtures (like runtime-tool-fs-read) show significant token regressions compared to Pi.
Tool Exposure: There are ongoing reports that MCP server tools are not reaching the outbound tools[] array in the LLM request body, despite servers being healthy and registered. This has persisted across multiple stable releases (4.26 through 5.7).
Auth Profile Selection: Issues have surfaced where the Pi runtime silently uses orphaned credentials from auth-profiles.json even after they are removed from the main openclaw.json configuration, leading to unexpected billing sources.
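
The token-efficiency comparison above can be sketched as a small drift check. This is only an illustration: the FixtureUsage record shape, the findTokenRegressions helper, the 10% tolerance, and the sample numbers are all assumptions, not the harness's actual format or measured data.

```typescript
// Hypothetical record shape; the real harness output format is not shown in this digest.
interface FixtureUsage {
  fixture: string;
  piTokens: number;
  codexTokens: number;
}

interface TokenRegression {
  fixture: string;
  ratio: number; // codexTokens / piTokens
}

// Flags fixtures where Codex uses more tokens than Pi by more than `tolerance`
// (0.1 = 10%). Fixtures without a positive Pi baseline are skipped.
function findTokenRegressions(rows: FixtureUsage[], tolerance = 0.1): TokenRegression[] {
  const regressions: TokenRegression[] = [];
  for (const row of rows) {
    if (row.piTokens <= 0) continue;
    const ratio = row.codexTokens / row.piTokens;
    if (ratio > 1 + tolerance) {
      regressions.push({ fixture: row.fixture, ratio });
    }
  }
  // Worst regression first.
  return regressions.sort((a, b) => b.ratio - a.ratio);
}

// Illustrative numbers only, not measurements from the issue tracker.
const sampleUsage: FixtureUsage[] = [
  { fixture: "runtime-tool-fs-read", piTokens: 1000, codexTokens: 1600 },
  { fixture: "hypothetical-chat-turn", piTokens: 1000, codexTokens: 700 },
];
console.log(findTokenRegressions(sampleUsage));
```

Reporting a ratio per fixture, rather than a single aggregate, keeps a regression in one fixture from being hidden by savings elsewhere.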

Tooling and Execution Loops

Several reports highlight a failure in the "fail-fast" mechanism for tool execution:

Infinite Loops: The Kimi Code model has been observed entering infinite loops, repeatedly calling the same tools with identical parameters. This is attributed to a bug in the Kimi stream wrapper that unconditionally rewrites stopReason from stop to toolUse.
Iteration Limits: There is a strong request for maxTurns and maxToolCalls configuration options to prevent runaway agent loops, particularly for models that ignore system prompt instructions to stop.
Execution Failures: Windows users are reporting spawn EPERM errors when using the exec tool, effectively blocking all shell commands.
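
The requested iteration limits could look roughly like the guard below. It is a minimal sketch under stated assumptions: maxTurns and maxToolCalls are the option names from the request, while LoopGuard, maxIdenticalCalls, and the ToolCall shape are hypothetical and do not come from the OpenClaw codebase.

```typescript
// Hypothetical shapes for illustration only.
interface ToolCall {
  name: string;
  args: Record<string, unknown>;
}

interface LoopLimits {
  maxTurns: number; // cap on model turns per run
  maxToolCalls: number; // cap on total tool invocations per run
  maxIdenticalCalls: number; // cap on repeats of the same name + args (assumed extra option)
}

type GuardVerdict = { ok: true } | { ok: false; reason: string };

// Serializes with sorted object keys so that argument order does not
// hide a repeated call. Values JSON cannot represent become "undefined".
function stableStringify(value: unknown): string {
  if (Array.isArray(value)) {
    return `[${value.map(stableStringify).join(",")}]`;
  }
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const entries = Object.keys(obj)
      .sort()
      .map((k) => `${JSON.stringify(k)}:${stableStringify(obj[k])}`);
    return `{${entries.join(",")}}`;
  }
  return String(JSON.stringify(value));
}

class LoopGuard {
  private readonly limits: LoopLimits;
  private turns = 0;
  private toolCalls = 0;
  private readonly seen = new Map<string, number>();

  constructor(limits: LoopLimits) {
    this.limits = limits;
  }

  // Call once per model turn, before dispatching that turn's tool calls.
  beginTurn(): GuardVerdict {
    this.turns += 1;
    if (this.turns > this.limits.maxTurns) {
      return { ok: false, reason: `maxTurns (${this.limits.maxTurns}) exceeded` };
    }
    return { ok: true };
  }

  // Call before executing each tool call.
  checkToolCall(call: ToolCall): GuardVerdict {
    this.toolCalls += 1;
    if (this.toolCalls > this.limits.maxToolCalls) {
      return { ok: false, reason: `maxToolCalls (${this.limits.maxToolCalls}) exceeded` };
    }
    const key = `${call.name}:${stableStringify(call.args)}`;
    const count = (this.seen.get(key) ?? 0) + 1;
    this.seen.set(key, count);
    if (count > this.limits.maxIdenticalCalls) {
      return { ok: false, reason: `identical call to ${call.name} repeated ${count} times` };
    }
    return { ok: true };
  }
}
```

Enforcing the limits in the runtime rather than in the system prompt matters for the case described above, where the model ignores instructions to stop.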

Infrastructure and Stability

Stability regressions have been noted across different operating systems and filesystem types:

Filesystem Locks: A critical bug in session lock acquisition uses fs.promises.link() (hard links), which causes ENOTSUP crashes on SMB, NFS, and virtiofs mounts.
macOS SIGKILLs: Users report intermittent SIGKILL failures during broad diagnostic scans of large state directories, with insufficient metadata provided to distinguish between OOM kills and supervisor timeouts.
Windows Hard Crashes: The gateway has been observed hard-crashing with STATUS_STACK_BUFFER_OVERRUN (0xC0000409) during Mattermost streaming replies, often leaving the bot in a half-finished state.
Memory Dreaming: The dreaming pipeline is suffering from "noise pollution," where heartbeat pings and duplicate context blocks are ingested into the corpus, drowning out actual signal and stalling the promotion of memories to REM sleep.
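
For the filesystem lock item above, one way to avoid hard links is to create the lock file with the exclusive-create flag. The sketch below uses Node's synchronous fs API for brevity; tryAcquireLock and releaseLock are hypothetical names, not the project's lock implementation, and exclusive create has its own caveats on some network filesystems, so it would still need testing on the affected mounts.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical helper: returns true if this caller created the lock file,
// false if another holder already owns it.
function tryAcquireLock(lockPath: string): boolean {
  try {
    // "wx" opens for writing and fails with EEXIST if the path already exists,
    // so creation and the existence check happen in a single call.
    const fd = fs.openSync(lockPath, "wx");
    fs.writeSync(fd, String(process.pid)); // record the owner for diagnostics
    fs.closeSync(fd);
    return true;
  } catch (err) {
    if ((err as { code?: string }).code === "EEXIST") return false;
    throw err; // anything else (EACCES, ENOENT, ...) is a real error
  }
}

function releaseLock(lockPath: string): void {
  // force: true makes a missing lock file a no-op instead of an error.
  fs.rmSync(lockPath, { force: true });
}

// Usage: the first caller wins, the second is refused until release.
const demoDir = fs.mkdtempSync(path.join(os.tmpdir(), "lock-demo-"));
const lockPath = path.join(demoDir, "session.lock");
console.log(tryAcquireLock(lockPath)); // lock file created
console.log(tryAcquireLock(lockPath)); // refused, file already exists
releaseLock(lockPath);
```

This sketch does not handle stale locks left behind by a crashed process; a real fix would also need an expiry or owner-liveness check.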

Key Themes

1. The "Silent Failure" Pattern

Across multiple subsystems, operations report success while failing silently:

Auth Login: On Windows, openclaw models auth login may report success but fail to persist tokens if openclaw.json was created by PowerShell, because a size-drop safety guard reacts to the BOM and indentation changes.
Cron Delivery: Cron jobs with --announce report delivered: true but messages never arrive on WeChat or Feishu channels.
Lobby/Help UX: openclaw <unknown-command> --help exits with code 0 and shows generic help instead of erroring, misleading users into thinking the command exists.
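
For the Auth Login item above, a size-drop guard could compare canonical content rather than raw file length, so that a BOM or wide indentation does not look like data loss. This is a sketch only: isSuspiciousShrink, the 50% threshold, and the assumption that the config is strict JSON are all hypothetical and not taken from the OpenClaw source.

```typescript
// Removes a leading UTF-8 byte order mark (U+FEFF) if present.
function stripBom(text: string): string {
  return text.charCodeAt(0) === 0xfeff ? text.slice(1) : text;
}

// Length of the config after re-serializing without whitespace.
// Assumes strict JSON; JSON.parse throws on comments or trailing commas.
function canonicalSize(text: string): number {
  return JSON.stringify(JSON.parse(stripBom(text))).length;
}

// Hypothetical guard: true when the new content is smaller than the old
// content by more than maxDropRatio, measured on canonical size.
function isSuspiciousShrink(before: string, after: string, maxDropRatio = 0.5): boolean {
  return canonicalSize(after) < canonicalSize(before) * (1 - maxDropRatio);
}

// Illustrative config, not a real openclaw.json.
const exampleConfig = {
  models: { default: "example-model" },
  channels: ["telegram", "mattermost"],
};
// BOM plus wide indentation, standing in for a PowerShell-written file.
const bomIndentedText = "\uFEFF" + JSON.stringify(exampleConfig, null, 8);
// The same config plus a token, written with compact indentation.
const rewrittenText = JSON.stringify({ ...exampleConfig, token: "abc" }, null, 2);

console.log(bomIndentedText.length > rewrittenText.length); // raw size dropped
console.log(isSuspiciousShrink(bomIndentedText, rewrittenText)); // canonical size did not
```

Whatever the guard decides, the command should surface a refusal as an error so that it cannot report success without persisting the token.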

2. Memory and Context Management

Efforts to optimize the "brain" of the agent are focusing on reducing overhead:

Prompt Assembly: Prompt-assembly latency is reported to grow linearly with tool-schema size. Trivial turns (e.g., "say hello") pay the full cost of the ~18k token schema, leading to calls for "lite-mode" or lazy tool registration.
Context Overflow: Long-running WebChat sessions can silently orphan the agent:main:main mapping during compaction failures, causing the UI to rotate to a new session and lose continuity.

3. Channel-Specific Gaps

Telegram: Issues include the fabrication of "silent-reply chatter" (e.g., "No added response from me") for turns that should remain truly silent.
Microsoft Teams: A need for configurable thread-session isolation has emerged, as some channel-bound agents need a single continuous memory rather than fragmented per-thread sessions.

Action Required

High Severity / Blockers

Hard Link Lock Fix: src/infra/session-cost-usage.ts needs immediate attention. The fs.promises.link() hard-link lock should be replaced with a portable exclusive-create primitive so that session locking works on SMB, NFS, and virtiofs mounts.
Kimi Loop Fix: The rewriteKimiTaggedToolCallsInMessage logic in the Kimi stream wrapper needs to be gated by a check for actual tool calls to stop infinite loops.
MCP Tool Exposure: Investigation into why server__* tools are dropped before serialization is required to unblock MCP-reliant agents.
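
The gating described in the Kimi Loop Fix item can be sketched as follows. The message and content-block shapes and the finalizeStopReason name are assumptions made for illustration; only the stopReason values stop and toolUse come from the report, and this is not the actual rewriteKimiTaggedToolCallsInMessage code.

```typescript
// Hypothetical message shapes for illustration only.
type StopReason = "stop" | "toolUse" | "length";

interface ContentBlock {
  type: "text" | "toolCall";
  text?: string;
  toolName?: string;
}

interface AssistantMessage {
  stopReason: StopReason;
  content: ContentBlock[];
}

// Rewrites stop to toolUse only when the message actually contains a tool
// call. A text-only message keeps stopReason "stop", so the agent loop ends
// instead of scheduling another turn.
function finalizeStopReason(message: AssistantMessage): AssistantMessage {
  const hasToolCall = message.content.some((block) => block.type === "toolCall");
  if (message.stopReason === "stop" && hasToolCall) {
    return { ...message, stopReason: "toolUse" };
  }
  return message;
}

const textOnlyMessage: AssistantMessage = {
  stopReason: "stop",
  content: [{ type: "text", text: "Done." }],
};
const toolCallMessage: AssistantMessage = {
  stopReason: "stop",
  content: [{ type: "toolCall", toolName: "read" }],
};
console.log(finalizeStopReason(textOnlyMessage).stopReason);
console.log(finalizeStopReason(toolCallMessage).stopReason);
```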

Blocked / Needs Attention

Windows EPERM: The spawn EPERM issue in the exec tool on Windows 11 needs a root-cause analysis to restore basic shell functionality.
Auth Persistence: The size-drop guard in the config rewrite path needs to be updated to handle BOM/PowerShell formatting to prevent silent auth failures.
Dreaming Pipeline: Implementation of corpus pre-filtering and weighted scoring is necessary to prevent the dreaming process from stalling on noise.
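
The corpus pre-filtering mentioned in the Dreaming Pipeline item could start as simply as the sketch below. The CorpusEntry shape, the source field used to identify heartbeats, and the prefilterCorpus name are assumptions; the digest does not say how heartbeat pings are marked, and weighted scoring is not covered here.

```typescript
// Hypothetical entry shape; how the real corpus tags heartbeats is not stated.
interface CorpusEntry {
  source: "heartbeat" | "message" | "context";
  text: string;
}

// Drops heartbeat entries, empty entries, and repeats of text already seen.
// Duplicates are matched after trimming, collapsing whitespace, and
// lowercasing, so re-injected context blocks with minor spacing differences
// count as the same entry. The first occurrence is kept.
function prefilterCorpus(entries: CorpusEntry[]): CorpusEntry[] {
  const seen = new Set<string>();
  const kept: CorpusEntry[] = [];
  for (const entry of entries) {
    if (entry.source === "heartbeat") continue;
    const key = entry.text.trim().replace(/\s+/g, " ").toLowerCase();
    if (key === "" || seen.has(key)) continue;
    seen.add(key);
    kept.push(entry);
  }
  return kept;
}

// Illustrative input only.
const rawCorpus: CorpusEntry[] = [
  { source: "heartbeat", text: "ping" },
  { source: "context", text: "Project notes:  deploy on Friday" },
  { source: "message", text: "Remember the deploy checklist" },
  { source: "context", text: "project notes: deploy on friday" },
];
console.log(prefilterCorpus(rawCorpus).map((entry) => entry.text));
```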