OpenClaw Issue Digest: Context Overflows, Auth Regressions, and Sandbox Stability
Open Issues
Recent activity in the OpenClaw repository reveals a cluster of high-impact regressions and stability concerns, particularly surrounding context management and authentication.
Critical Regressions & Bugs
- Token Accounting & Premature Compaction: Multiple reports indicate that prompt token counting is inflated for certain providers. Specifically, for MiniMax models (#68470) and Anthropic Opus 1M variants (#72964), cacheRead tokens are being double-counted. This triggers premature session compaction and memory flushes at as little as 20% of actual context usage, leading to unnecessary loss of conversation history.
- Authentication & Billing Drifts: Users of the claude-cli backend are reporting a critical regression where valid OAuth sessions are suddenly marked as disabled:billing (#82212). This is suspected to be a mismatch in how OpenClaw's CLI backend is treated by Anthropic compared to direct CLI usage, triggering "extra usage" billing errors.
- Session Amnesia in Group Channels: A significant bug in the claude-cli implementation (#69118) causes sessions to reset on every turn in group channels. This is due to a hash mismatch in the extraSystemPrompt when the groupIntro block is removed after the first turn, effectively giving the agent amnesia within seconds.
- Security Authorization Bypass: A high-severity vulnerability (#68703) was identified in the Discord integration. While moderation actions are gated, guild-admin mutation actions (like channel-delete) bypass requester authorization checks, allowing any guild member to trigger privileged bot actions.
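The gap described in #68703 is that only some action families pass through a requester check. A minimal sketch of a uniform gate follows, assuming a single dispatcher through which every privileged action flows. The action names, permission names, and the functions isAuthorized and dispatch are hypothetical illustrations, not OpenClaw's actual code.

```typescript
// Hypothetical action and permission names; not taken from OpenClaw's source.
type GuildAction = "timeout-member" | "ban-member" | "channel-delete" | "role-edit";
type Permission = "moderate-members" | "manage-channels" | "manage-roles";

interface Requester {
  id: string;
  permissions: ReadonlySet<Permission>;
}

// Every privileged action maps to the permission its requester must hold.
// Typing the table as Record<GuildAction, Permission> makes the compiler
// reject a newly added action that has no authorization rule.
const REQUIRED_PERMISSION: Record<GuildAction, Permission> = {
  "timeout-member": "moderate-members",
  "ban-member": "moderate-members",
  "channel-delete": "manage-channels",
  "role-edit": "manage-roles",
};

function isAuthorized(requester: Requester, action: GuildAction): boolean {
  return requester.permissions.has(REQUIRED_PERMISSION[action]);
}

// Single entry point: moderation and guild-admin actions take the same path.
function dispatch(requester: Requester, action: GuildAction): string {
  if (!isAuthorized(requester, action)) {
    return `denied: ${requester.id} lacks ${REQUIRED_PERMISSION[action]}`;
  }
  return `executed: ${action}`;
}
```

Keeping the permission table exhaustive over the action type is one way to keep a new mutation action from shipping without a check, whatever the real dispatcher looks like.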
Stability & Infrastructure
- Sandbox Resource Exhaustion: Reports indicate that sandboxed sessions can accumulate zombie processes under PID 1 (#68691), risking pids.max exhaustion. Additionally, misconfigured MCP servers can trigger "retry storms" (#68527), spawning hundreds of child processes and consuming gigabytes of RSS, potentially wedging the entire VM.
- Gateway & Connectivity Issues: Several reports highlight instability in the Telegram channel, including polling stalls (#68494) and silent outbound message loss during recovery windows (#50040). On Windows, editing openclaw.json while the gateway is running can trigger a crash loop due to stale lock files and EADDRINUSE errors (#68493).
Feature Requests & UX Improvements
- Voice & TTS Enhancements: Proposals include adding xAI Realtime Voice Agent support via a shared OpenAI-Realtime protocol adapter (#73019) and implementing a before_tts modifying hook to allow per-message voice routing (#69307).
- UI/UX Polish: Requests for ultra-widescreen support in the Control UI (#72772) and the addition of a completion notification sound for agent turns (#69186) to improve background-tab usability.
Key Themes
1. The "Context-Compaction" Feedback Loop
There is a recurring theme of failure in the interaction between token estimation and session compaction. When the system overestimates context usage (due to cache-token double-counting), it compacts too early. This is compounded by reports that compaction can sometimes emit empty fallback summaries (#72964) or break session invariants (#69269), turning a performance optimization into a data-loss event.
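The arithmetic behind the early trigger can be sketched as follows. This assumes, purely for illustration, that the provider's reported input count already includes cached tokens, so adding the cache-read count again inflates the estimate. The field names, the 200000-token window, and the 0.8 threshold are hypothetical, not values from OpenClaw.

```typescript
interface Usage {
  inputTokens: number;     // assumed here to already include cached tokens
  cacheReadTokens: number; // portion of the prompt served from cache
}

// Inflated estimate: cache reads added on top of a count that already includes them.
function promptTokensDoubleCounted(u: Usage): number {
  return u.inputTokens + u.cacheReadTokens;
}

// Estimate under the same assumption, with cache reads counted once.
function promptTokensCountedOnce(u: Usage): number {
  return u.inputTokens;
}

// Compaction fires when the estimated fill ratio reaches the threshold.
function shouldCompact(tokens: number, contextWindow: number, threshold: number): boolean {
  return tokens / contextWindow >= threshold;
}

const usage: Usage = { inputTokens: 90000, cacheReadTokens: 85000 };
const contextWindow = 200000; // hypothetical window size
const threshold = 0.8;        // hypothetical compaction threshold

// 175000 / 200000 = 0.875, so the inflated estimate triggers compaction.
const compactsWhenInflated = shouldCompact(promptTokensDoubleCounted(usage), contextWindow, threshold);
// 90000 / 200000 = 0.45, so the single-counted estimate does not.
const compactsWhenCorrect = shouldCompact(promptTokensCountedOnce(usage), contextWindow, threshold);
```

The more of the prompt that is served from cache, the larger the gap between the two estimates, which fits the reports coming from long, cache-heavy sessions.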
2. CLI Backend Fragility
The claude-cli and other CLI-based backends are showing significant fragility. From session resets in group chats to billing-disabled false positives, the abstraction layer between the OpenClaw gateway and the external CLI binaries is currently a primary source of instability.
3. Enterprise & Governance Gaps
Several requests (#72991, #73082) highlight a gap in enterprise readiness. The lack of machine-wide hook policies and the need for read-only auto-approval scopes for canary skills suggest that OpenClaw's current security model is too user-centric and lacks the administrative controls required for regulated environments.
Action Required
Immediate Attention (High Severity)
- Fix Discord Authorization Bypass (#68703): Implement requester authorization checks for all guild-admin mutation actions to prevent unauthorized channel/role modifications.
- Resolve Token Double-Counting (#68470, #72964): Correct the normalizeUsage and derivePromptTokens logic to ensure cacheRead tokens do not trigger premature compaction.
- Repair claude-cli Session Resets (#69118): Remove extraSystemPromptHash from the session reuse key to stop the turn-2 amnesia in group channels.
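The session-reset mechanism in #69118 can be illustrated with a small sketch: if the reuse key embeds a hash of the extra system prompt, any change to that prompt between turns produces a different key and therefore a fresh session. The TurnContext shape, the key layout, and the FNV-1a hash are assumptions made for this example; only the idea of dropping the prompt hash from the key comes from the issue.

```typescript
// FNV-1a, a small non-cryptographic string hash, used only to keep the sketch dependency-free.
function fnv1a(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16);
}

// Hypothetical per-turn context; field names are illustrative.
interface TurnContext {
  channelId: string;
  agentId: string;
  extraSystemPrompt: string;
}

// Key that changes whenever the extra system prompt changes.
function reuseKeyWithPromptHash(ctx: TurnContext): string {
  return [ctx.channelId, ctx.agentId, fnv1a(ctx.extraSystemPrompt)].join(":");
}

// Key that depends only on stable identifiers.
function reuseKeyStable(ctx: TurnContext): string {
  return [ctx.channelId, ctx.agentId].join(":");
}

const basePrompt = "You are a helpful bot.";
// Turn 1 carries the group intro; turn 2 does not.
const turnOne: TurnContext = { channelId: "g1", agentId: "a1", extraSystemPrompt: "Group intro: be brief.\n" + basePrompt };
const turnTwo: TurnContext = { channelId: "g1", agentId: "a1", extraSystemPrompt: basePrompt };

const resetsOnTurnTwo = reuseKeyWithPromptHash(turnOne) !== reuseKeyWithPromptHash(turnTwo);
const reusedOnTurnTwo = reuseKeyStable(turnOne) === reuseKeyStable(turnTwo);
```

One trade-off of the stable key is that a deliberate system prompt change no longer forces a new session, so that case would need its own handling.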
Blocked or High-Priority Stability
- MCP Circuit Breaker (#68527): Implement exponential backoff and a circuit breaker for MCP server restarts to prevent VM-level resource exhaustion.
- Sandbox Zombie Reaping (#68691): Ensure the sandbox PID 1 correctly reaps orphaned children to prevent process limit exhaustion.
- PDF Tool Timeout (#68649): Add a mandatory timeout to the pdf tool to prevent indefinite hangs that zombie the entire agent session.
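For the MCP item (#68527), exponential backoff combined with a circuit breaker might look like the sketch below. The class name, its parameters, and the injected clock are assumptions made for illustration and do not reflect OpenClaw's actual restart logic.

```typescript
// Tracks consecutive restart failures for one MCP server (hypothetical helper).
class RestartBreaker {
  private failures = 0;
  private nextAllowedAt = 0;
  private readonly baseDelayMs: number;
  private readonly maxDelayMs: number;
  private readonly maxFailures: number;

  constructor(baseDelayMs: number, maxDelayMs: number, maxFailures: number) {
    this.baseDelayMs = baseDelayMs;
    this.maxDelayMs = maxDelayMs;
    this.maxFailures = maxFailures;
  }

  // True when the breaker is closed and the backoff window has elapsed at time `now` (ms).
  canAttempt(now: number): boolean {
    return this.failures < this.maxFailures && now >= this.nextAllowedAt;
  }

  // Doubles the wait after each consecutive failure, capped at maxDelayMs.
  recordFailure(now: number): void {
    this.failures += 1;
    const delay = Math.min(this.baseDelayMs * Math.pow(2, this.failures - 1), this.maxDelayMs);
    this.nextAllowedAt = now + delay;
  }

  // A successful start closes the breaker and clears the backoff.
  recordSuccess(): void {
    this.failures = 0;
    this.nextAllowedAt = 0;
  }

  // Open means no further restarts until recordSuccess() is called.
  isOpen(): boolean {
    return this.failures >= this.maxFailures;
  }
}
```

Passing the current time in as an argument keeps the class free of timers, so a supervisor loop can call canAttempt before each spawn and stop respawning once isOpen returns true.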