OpenClaw Issue Digest: Runtime Parity, Tooling Loops, and Infrastructure Stability
The recent activity window for the OpenClaw repository reveals a significant architectural shift as the project moves toward Codex as the default runtime for OpenAI agent turns. This transition has introduced a complex set of parity challenges, necessitating the development of a comprehensive QA harness to ensure that tool-call shapes, token efficiency, and auth-profile selection remain consistent between the legacy Pi and new Codex runtimes.
Beyond the runtime migration, the community has identified several critical behavior bugs, most notably infinite tool-call loops and session-management regressions, that impact the reliability of autonomous agents across various channel integrations including Telegram, Mattermost, and Microsoft Teams.
Open Issues
Runtime Parity and Codex Migration
Central to current development is the Codex-vs-Pi runtime parity effort. A new QA harness is being implemented to track drift across scenarios, runtimes, and auth shapes. Key areas of concern include:
- Token Efficiency: Reports indicate that while Codex is often cheaper, certain fixtures (like `runtime-tool-fs-read`) show significant token regressions compared to Pi.
- Tool Exposure: There are ongoing reports that MCP server tools are not reaching the outbound `tools[]` array in the LLM request body, despite servers being healthy and registered. This has persisted across multiple stable releases (4.26 through 5.7).
- Auth Profile Selection: Issues have surfaced where the Pi runtime silently uses orphaned credentials from `auth-profiles.json` even after they are removed from the main `openclaw.json` configuration, leading to unexpected billing sources.
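The token-efficiency comparison described above can be sketched as a per-fixture drift check. This is a minimal illustration, not the harness's actual code: the `FixtureRun` and `DriftReport` shapes, the `tokenDrift` function, and the `maxRatio` threshold are all hypothetical names introduced here.

```typescript
// Hypothetical shapes: field names are illustrative, not the harness's real schema.
type Runtime = "pi" | "codex";

interface FixtureRun {
  fixture: string; // e.g. "runtime-tool-fs-read"
  runtime: Runtime;
  totalTokens: number;
}

interface DriftReport {
  fixture: string;
  piTokens: number;
  codexTokens: number;
  ratio: number; // codexTokens / piTokens
  regressed: boolean; // true when Codex exceeds Pi by more than maxRatio
}

// Pair up Pi and Codex runs of the same fixture and flag token regressions.
// maxRatio is an assumed tolerance (1.1 means Codex may use up to 10% more tokens).
function tokenDrift(runs: FixtureRun[], maxRatio = 1.1): DriftReport[] {
  const byFixture = new Map<string, { pi?: number; codex?: number }>();
  for (const run of runs) {
    const entry = byFixture.get(run.fixture) ?? {};
    entry[run.runtime] = run.totalTokens;
    byFixture.set(run.fixture, entry);
  }

  const reports: DriftReport[] = [];
  byFixture.forEach((tokens, fixture) => {
    const { pi, codex } = tokens;
    // Skip fixtures that were not run on both runtimes, or have no Pi baseline.
    if (pi === undefined || codex === undefined || pi === 0) return;
    const ratio = codex / pi;
    reports.push({
      fixture,
      piTokens: pi,
      codexTokens: codex,
      ratio,
      regressed: ratio > maxRatio,
    });
  });
  return reports;
}
```

Reporting a ratio rather than an absolute difference keeps small and large fixtures comparable. A real harness would presumably also track tool-call shapes and auth-profile selection, which this sketch leaves out.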
Tooling and Execution Loops
Several reports highlight a failure in the "fail-fast" mechanism for tool execution:
- Infinite Loops: The Kimi Code model has been observed entering infinite loops, repeatedly calling the same tools with identical parameters. This is attributed to a bug in the Kimi stream wrapper that unconditionally rewrites `stopReason` from `stop` to `toolUse`.
- Iteration Limits: There is a strong request for `maxTurns` and `maxToolCalls` configuration options to prevent runaway agent loops, particularly for models that ignore system prompt instructions to stop.
- Execution Failures: Windows users are reporting `spawn EPERM` errors when using the `exec` tool, effectively blocking all shell commands.
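One way the requested iteration limits could behave is sketched below. The option names `maxTurns` and `maxToolCalls` come from the request itself. Everything else is an assumption made for illustration: the `createLoopGuard` helper, the `maxIdenticalCalls` option, and the idea of detecting consecutive identical calls. None of it is existing OpenClaw API.

```typescript
interface LoopLimits {
  maxTurns: number;
  maxToolCalls: number;
  maxIdenticalCalls: number; // hypothetical: consecutive calls with same name and args
}

type GuardVerdict = { ok: true } | { ok: false; reason: string };

// Creates a per-session guard that fails fast once any limit is exceeded.
function createLoopGuard(limits: LoopLimits) {
  let turns = 0;
  let toolCalls = 0;
  let lastKey: string | null = null;
  let repeatCount = 0;

  return {
    startTurn(): GuardVerdict {
      turns += 1;
      return turns > limits.maxTurns
        ? { ok: false, reason: `maxTurns (${limits.maxTurns}) exceeded` }
        : { ok: true };
    },

    recordToolCall(name: string, args: unknown): GuardVerdict {
      toolCalls += 1;
      if (toolCalls > limits.maxToolCalls) {
        return { ok: false, reason: `maxToolCalls (${limits.maxToolCalls}) exceeded` };
      }
      // JSON.stringify is key-order sensitive; good enough to catch a model
      // replaying the exact same call, which is the failure mode reported.
      const key = `${name}:${JSON.stringify(args)}`;
      repeatCount = key === lastKey ? repeatCount + 1 : 1;
      lastKey = key;
      if (repeatCount > limits.maxIdenticalCalls) {
        return { ok: false, reason: `"${name}" repeated ${repeatCount} times with identical arguments` };
      }
      return { ok: true };
    },
  };
}
```

Because the guard lives in the agent loop rather than in the prompt, it also covers models that ignore system-prompt instructions to stop.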
Infrastructure and Stability
Stability regressions have been noted across different operating systems and filesystem types:
- Filesystem Locks: A critical bug in session lock acquisition uses `fs.promises.link()` (hard links), which causes `ENOTSUP` crashes on SMB, NFS, and virtiofs mounts.
- macOS SIGKILLs: Users report intermittent `SIGKILL` failures during broad diagnostic scans of large state directories, with insufficient metadata provided to distinguish between OOM kills and supervisor timeouts.
- Windows Hard Crashes: The gateway has been observed hard-crashing with `STATUS_STACK_BUFFER_OVERRUN` (0xC0000409) during Mattermost streaming replies, often leaving the bot in a half-finished state.
- Memory Dreaming: The dreaming pipeline is suffering from "noise pollution," where heartbeat pings and duplicate context blocks are ingested into the corpus, drowning out actual signal and stalling the promotion of memories to REM sleep.
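For the filesystem-lock bug, a lock that avoids hard links can be built on exclusive file creation. The sketch below uses Node's `fs.promises.open` with the `"wx"` flag, which fails with `EEXIST` if the file already exists. The helper names `tryAcquireLock` and `releaseLock` and the lock-file payload are hypothetical, and this is not the project's actual lock code. Whether exclusive create is fully atomic on a given SMB, NFS, or virtiofs mount depends on that mount, so treat this as a starting point to test rather than a guaranteed fix.

```typescript
import { promises as fs } from "node:fs";

// Try to take the lock by creating the lock file exclusively.
// Returns false if another holder already created it.
async function tryAcquireLock(lockPath: string): Promise<boolean> {
  try {
    // "wx" = open for writing, fail if the path already exists.
    const handle = await fs.open(lockPath, "wx");
    try {
      // Record the owner so a stale lock can be diagnosed later (illustrative payload).
      await handle.writeFile(JSON.stringify({ pid: process.pid, acquiredAt: Date.now() }));
    } finally {
      await handle.close();
    }
    return true;
  } catch (err) {
    if ((err as { code?: string }).code === "EEXIST") return false;
    throw err; // Unexpected errors (permissions, missing directory) are surfaced.
  }
}

// Release the lock; force: true makes this a no-op if the file is already gone.
async function releaseLock(lockPath: string): Promise<void> {
  await fs.rm(lockPath, { force: true });
}
```

Stale-lock recovery (for example, checking whether the recorded `pid` is still alive) is deliberately left out of this sketch.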
Key Themes
1. The "Silent Failure" Pattern
Across multiple subsystems, there is a recurring theme of operations claiming success while failing silently:
- Auth Login: On Windows, `openclaw models auth login` may report success but fail to persist tokens if the `openclaw.json` was created by PowerShell (due to a size-drop safety guard reacting to BOM/indentation changes).
- Cron Delivery: Cron jobs with `--announce` report `delivered: true` but messages never arrive on WeChat or Feishu channels.
- Lobby/Help UX: `openclaw <unknown-command> --help` exits with code 0 and shows generic help instead of erroring, misleading users into thinking the command exists.
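The help-path behavior users expect can be illustrated with a small resolver that validates the command name before honoring `--help`. This sketch does not reflect the CLI's real parser: `resolveHelp`, `HelpResult`, the output strings, and the command set passed in are all hypothetical.

```typescript
interface HelpResult {
  exitCode: number;
  output: string;
}

// Decide what `--help` should do for a given argv (arguments after the binary name).
function resolveHelp(argv: string[], knownCommands: ReadonlySet<string>): HelpResult {
  // The first non-flag argument is treated as the command name.
  const command = argv.find((arg) => !arg.startsWith("-"));
  if (command === undefined) {
    return { exitCode: 0, output: "usage: openclaw <command> [options]" };
  }
  if (!knownCommands.has(command)) {
    // Fail loudly instead of falling back to generic help with exit code 0.
    return { exitCode: 1, output: `error: unknown command "${command}"` };
  }
  return { exitCode: 0, output: `help for "${command}"` };
}
```

With an illustrative command set such as `new Set(["models"])`, a call like `resolveHelp(["bogus", "--help"], commands)` returns a non-zero exit code, so scripts and users can tell that the command does not exist.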
2. Memory and Context Management
Efforts to optimize the "brain" of the agent are focusing on reducing overhead:
- Prompt Assembly: There is a documented linear latency increase based on tool-schema size. Trivial turns (e.g., "say hello") pay the full cost of the ~18k token schema, leading to calls for "lite-mode" or lazy tool registration.
- Context Overflow: Long-running WebChat sessions can silently orphan the `agent:main:main` mapping during compaction failures, causing the UI to rotate to a new session and lose continuity.
3. Channel-Specific Gaps
- Telegram: Issues include the fabrication of "silent-reply chatter" (e.g., "No added response from me") for turns that should remain truly silent.
- MSTeams: A need for configurable thread-session isolation has emerged, as some channel-bound agents need a single continuous memory rather than fragmented per-thread sessions.
Action Required
High Severity / Blockers
- Hard Link Lock Fix: Immediate attention is needed for `src/infra/session-cost-usage.ts` to replace `fs.link` with a portable exclusive-create primitive to support network filesystems.
- Kimi Loop Fix: The `rewriteKimiTaggedToolCallsInMessage` logic in the Kimi stream wrapper needs to be gated by a check for actual tool calls to stop infinite loops.
- MCP Tool Exposure: Investigation into why `server__*` tools are dropped before serialization is required to unblock MCP-reliant agents.
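The gating proposed for the Kimi loop fix might look like the following. This is a simplified sketch and not the actual `rewriteKimiTaggedToolCallsInMessage` implementation: the message and content-block shapes, the `"toolCall"` block type, and the `normalizeStopReason` helper are assumptions made for illustration. Only the `stop` and `toolUse` values come from the report.

```typescript
// Hypothetical message shapes; the real stream wrapper's types may differ.
type StopReason = "stop" | "toolUse";

interface ContentBlock {
  type: string; // "toolCall" is an assumed marker for a parsed tool call
}

interface AssistantMessage {
  stopReason: StopReason;
  content: ContentBlock[];
}

// Rewrite stop -> toolUse only when the message really contains a tool call.
// An unconditional rewrite makes every final answer look like a pending tool
// turn, so the agent loop never terminates.
function normalizeStopReason(message: AssistantMessage): AssistantMessage {
  const hasToolCall = message.content.some((block) => block.type === "toolCall");
  if (message.stopReason === "stop" && hasToolCall) {
    return { ...message, stopReason: "toolUse" };
  }
  return message;
}
```

A plain text reply keeps `stopReason: "stop"` under this gate, which lets the agent loop end the turn normally.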
Blocked / Needs Attention
- Windows EPERM: The `spawn EPERM` issue in the `exec` tool on Windows 11 needs a root-cause analysis to restore basic shell functionality.
- Auth Persistence: The size-drop guard in the config rewrite path needs to be updated to handle BOM/PowerShell formatting to prevent silent auth failures.
- Dreaming Pipeline: Implementation of corpus pre-filtering and weighted scoring is necessary to prevent the dreaming process from stalling on noise.
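For the auth-persistence item, one possible approach is to have the size-drop guard compare canonical content instead of raw file size, so that a BOM or a change in indentation does not register as data loss. The sketch below assumes the config is strict JSON and that the guard works on a drop ratio. The helper names `stripBom`, `canonicalSize`, and `isSuspiciousSizeDrop`, and the `maxDropRatio` default, are hypothetical and do not describe the existing guard.

```typescript
const BOM = "\uFEFF";

// Remove a leading byte-order mark, which JSON.parse would otherwise reject.
function stripBom(text: string): string {
  return text.startsWith(BOM) ? text.slice(1) : text;
}

// Size of the config after parsing and re-serializing without whitespace,
// so BOM, indentation, and line-ending differences no longer affect it.
// Throws if the text is not valid JSON (assumed format for this sketch).
function canonicalSize(jsonText: string): number {
  return JSON.stringify(JSON.parse(stripBom(jsonText))).length;
}

// Flag a rewrite only when the canonical content shrinks by more than
// maxDropRatio (an assumed threshold; 0.5 means "lost more than half").
function isSuspiciousSizeDrop(previousText: string, nextText: string, maxDropRatio = 0.5): boolean {
  const before = canonicalSize(previousText);
  const after = canonicalSize(nextText);
  return (before - after) / before > maxDropRatio;
}
```

Under this comparison, a PowerShell-written file with a BOM and wide indentation and a compact rewrite of the same data have the same canonical size, so the guard would not block the token write. A rewrite that genuinely drops most of the keys would still be flagged.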