
OpenClaw Issue Digest: Addressing Session Stability and Codex Harness Latency

12:30–18:30 UTC May 20, 2026

Open Issues

Recent activity in the OpenClaw repository reveals a cluster of high-severity issues primarily affecting session stability, the Codex harness runtime, and security-critical credential handling.

Session Stability and Data Integrity

One of the most critical regressions is the EmbeddedAttemptSessionTakeoverError (#84059), which has rendered the system non-functional for some users. This error stems from an overly sensitive session file fingerprint mechanism in pi-agent-core@0.75.1 that triggers a takeover error on nanosecond-precision mtime changes, even those caused by internal writes. This is further exacerbated by cron announce deliveries (#84583), where background job completions modify session files while a user is actively chatting, leading to immediate turn failure.
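One possible mitigation is to have the process acknowledge its own writes so that they never count as a takeover. The sketch below is illustrative only: `SessionFingerprintGuard` and its methods are hypothetical names, not the pi-agent-core implementation. The nanosecond mtime is kept as a decimal string (for example, the `mtimeNs` value from Node's `fs.stat` with `bigint: true`, converted with `toString()`) so that the example does not depend on BigInt support.

```typescript
// Hypothetical sketch: not the pi-agent-core fingerprint code.
interface Fingerprint {
  size: number;
  mtimeNs: string; // nanosecond mtime as a decimal string
}

class SessionFingerprintGuard {
  constructor(private baseline: Fingerprint) {}

  // Call after each internal write with the file's post-write fingerprint,
  // so the write becomes the new baseline instead of a "foreign" change.
  acknowledgeOwnWrite(fp: Fingerprint): void {
    this.baseline = fp;
  }

  // True only when the file differs from the last fingerprint this
  // process either observed at startup or produced itself.
  isForeignChange(current: Fingerprint): boolean {
    return (
      current.size !== this.baseline.size ||
      current.mtimeNs !== this.baseline.mtimeNs
    );
  }
}
```

With this shape, a background writer in the same process (such as a cron announce delivery) would call `acknowledgeOwnWrite` after writing. Only a change the process did not make would surface as a takeover.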

Additionally, a race condition in session-write-lock.ts (#57019) allows an async release to delete newly-acquired locks, potentially leading to session transcript corruption. This is compounded by reports of tasks/runs.sqlite corruption (#71689), where malformed database images prevent the restoration of the durable task registry during gateway startup.
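The lock race can be illustrated with an ownership token: a release deletes the lock only if the caller still owns it. This is a minimal in-memory sketch under assumed names (`OwnedLockTable`, `forceReclaim`); it is not the code in session-write-lock.ts, which works against lock files rather than a map.

```typescript
// Hypothetical sketch: an in-memory stand-in for a lock file.
type LockToken = string;

class OwnedLockTable {
  private holders = new Map<string, LockToken>();
  private counter = 0;

  // Returns a fresh token on success, or null when the lock is held.
  tryAcquire(key: string): LockToken | null {
    if (this.holders.has(key)) return null;
    this.counter += 1;
    const token = `${key}#${this.counter}`;
    this.holders.set(key, token);
    return token;
  }

  // Simulates stale-lock reclamation, for example after a timeout.
  forceReclaim(key: string): void {
    this.holders.delete(key);
  }

  // Async release: ownership is checked after the await, so a late
  // release cannot delete a lock acquired by someone else meanwhile.
  async release(key: string, token: LockToken): Promise<boolean> {
    await Promise.resolve(); // stands in for async I/O before the delete
    if (this.holders.get(key) !== token) return false;
    this.holders.delete(key);
    return true;
  }
}
```

The key design choice is that the ownership check and the delete happen together after the last `await`. An unconditional delete at that point is what would remove a newly acquired lock.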

Codex Harness and Runtime Performance

Performance bottlenecks are emerging within the Codex app-server. Users report significant "hidden" latency between attempt-dispatch and session.started (#84640). The delay cannot yet be attributed to a specific stage because the thread lifecycle (binding reads, compatibility checks, and RPC requests) is not sufficiently instrumented.
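Per-stage timing is one way to localize such a gap. The sketch below wraps each lifecycle stage and emits a summary with the slowest stage first. `LifecycleTimer` and the stage names are hypothetical and do not correspond to existing OpenClaw APIs; the injectable clock is there only to make the example testable.

```typescript
// Hypothetical sketch: stage names mirror the lifecycle steps listed above.
interface StageTiming {
  stage: string;
  ms: number;
}

class LifecycleTimer {
  private timings: StageTiming[] = [];

  constructor(private now: () => number = () => Date.now()) {}

  // Runs one stage and records its wall-clock duration, even if it throws.
  async time<T>(stage: string, fn: () => Promise<T>): Promise<T> {
    const start = this.now();
    try {
      return await fn();
    } finally {
      this.timings.push({ stage, ms: this.now() - start });
    }
  }

  // One-line summary, slowest stage first.
  summary(): string {
    return this.timings
      .slice()
      .sort((a, b) => b.ms - a.ms)
      .map((t) => `${t.stage}=${t.ms}ms`)
      .join(" ");
  }
}
```

A caller would wrap each step, for example `await timer.time("compatibilityCheck", () => check())`, and log `timer.summary()` once the session starts.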

Stability issues also persist in the Codex bundled harness, specifically regarding isolated cron jobs that deterministically time out during setup (#84567). Furthermore, memory growth is being driven by unreaped chrome-devtools-mcp sidecars (#84413), which accumulate under the gateway cgroup and eventually require a full restart to clear.
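A bounded reaper for the sidecars could follow a simple policy: terminate anything idle beyond a limit, then cap the number of live processes. The function below sketches only the selection step. The `Sidecar` shape, the parameter names, and the policy itself are assumptions for illustration, not a description of how the gateway tracks chrome-devtools-mcp processes.

```typescript
// Hypothetical sketch: picks which tracked sidecars to terminate.
interface Sidecar {
  pid: number;
  lastUsedMs: number;
}

// Returns the pids to reap: every sidecar idle longer than maxIdleMs,
// plus the least recently used ones beyond the maxLive cap.
function selectSidecarsToReap(
  sidecars: Sidecar[],
  nowMs: number,
  maxIdleMs: number,
  maxLive: number
): number[] {
  // Most recently used first, without mutating the caller's array.
  const byRecency = sidecars.slice().sort((a, b) => b.lastUsedMs - a.lastUsedMs);
  const reap: number[] = [];
  let kept = 0;
  for (const sidecar of byRecency) {
    const idle = nowMs - sidecar.lastUsedMs > maxIdleMs;
    if (idle || kept >= maxLive) {
      reap.push(sidecar.pid);
    } else {
      kept += 1;
    }
  }
  return reap;
}
```

A periodic task could pass the returned pids to Node's `process.kill(pid, "SIGTERM")`. Keeping the selection separate from the kill makes the policy easy to test.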

Security and Auth Regressions

A significant security regression has been identified in openclaw models status --probe (#84632), which rewrites models.json with resolved plaintext API keys for non-CORE custom providers. This bypasses the SecretRef system and exposes sensitive credentials in plaintext on disk.
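The underlying principle is to keep the resolved value in memory and write only the original reference back to disk. The sketch below illustrates that separation. The `{ secretRef: ... }` shape, the `ProviderConfig` type, and both function names are hypothetical; the real SecretRef and models.json formats may differ.

```typescript
// Hypothetical shapes: the real SecretRef and models.json formats may differ.
type SecretRef = { secretRef: string };
type ApiKeyField = string | SecretRef;

interface ProviderConfig {
  name: string;
  apiKey: ApiKeyField;
}

// Resolves the key for in-memory use only, for example during a probe.
function resolveForProbe(
  provider: ProviderConfig,
  lookup: (ref: string) => string
): string {
  return typeof provider.apiKey === "string"
    ? provider.apiKey
    : lookup(provider.apiKey.secretRef);
}

// Serializes the original config, so references stay references on disk.
function serializeForDisk(providers: ProviderConfig[]): string {
  return JSON.stringify({ providers }, null, 2);
}
```

Because `serializeForDisk` never sees the output of `resolveForProbe`, a resolved key cannot reach the file through this path.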

Other security concerns include the exec tool returning raw stdout/stderr without secret redaction (#71211), and a scope deadlock in the CLI (#74484) where a paired CLI with only operator.read scope cannot approve or reject repair requests because those actions require operator.pairing scope.
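A redaction pass over exec output can be as simple as masking every known secret value before the text leaves the tool. The helper below is a hypothetical sketch: the name `redactSecrets`, its signature, and the `[REDACTED]` mask are not taken from the OpenClaw codebase. It matches exact values only, so encoded or truncated forms of a secret would still pass through.

```typescript
// Hypothetical sketch: masks known secret values in exec tool output.
function redactSecrets(
  output: string,
  secrets: string[],
  mask: string = "[REDACTED]"
): string {
  // Longest first, so a secret that contains another is masked whole.
  const ordered = secrets
    .filter((s) => s.length > 0)
    .sort((a, b) => b.length - a.length);
  let result = output;
  for (const secret of ordered) {
    // split/join replaces every occurrence without regex escaping.
    result = result.split(secret).join(mask);
  }
  return result;
}
```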

Key Themes

1. The "Fragile Session" Pattern

There is a recurring theme of session-state fragility. Whether it is the EmbeddedAttemptSessionTakeoverError or the session write-lock race, the system is struggling to manage concurrent access to session files. The transition to more aggressive fingerprinting for security/takeover detection has inadvertently introduced instability in standard operational flows.

2. Observability Gaps in the Codex Pipeline

While the core embedded runner is traced, the Codex app-server's internal lifecycle remains a "black box." The gap between dispatch and session start is a primary source of perceived latency, and the lack of lifecycle logging for MCP sidecars makes memory leaks difficult to diagnose without manual ps audits.

3. Regression in Provider-Specific Logic

Several issues highlight regressions in how specific providers are handled:

  • Kimi/Moonshot: Reasoning content is lost after LCM compaction, leading to 400 errors (#71491).
  • DeepSeek: High thinking levels occasionally produce empty visible content, which is then dispatched as blank messages to channels (#84591).
  • GitHub Copilot: GPT models fail in isolated/cron sessions due to missing API provider registration for secondary LLM calls (#84614).
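For the blank-message case, one defensive option is a guard at the dispatch boundary that drops content with nothing visible in it. The function name and signature below are hypothetical, and this is only one possible mitigation; it does not address why the provider returns empty content.

```typescript
// Hypothetical sketch: decides whether a model reply is worth sending.
function hasVisibleContent(content: string | null | undefined): boolean {
  // Whitespace-only replies count as empty.
  return typeof content === "string" && content.trim().length > 0;
}
```

A channel dispatcher could check `hasVisibleContent(reply)` and, when it returns false, log the event or retry instead of sending.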

Action Required

High Severity / Blocked

  • #84059 & #84583 (Session Takeover): Immediate attention is needed to relax the mtimeNs precision or exclude internal writes from the fingerprint check to restore basic functionality for Feishu and Telegram users.
  • #84632 (Plaintext API Keys): This is a critical security leak. The regenerator must be patched to persist non-secret markers instead of resolved values for all providers.
  • #84604 (Auth Migration): The 4.x → 5.x migration path for claude-cli is broken, leaving harnesses unregistered and causing crash-loops for upgraded users.

Contributor Attention Needed

  • #84413 (MCP Sidecar Leak): Implement a shared pool or a bounded reaper for chrome-devtools-mcp processes to prevent cgroup memory exhaustion.
  • #84640 (Codex Latency): Add a thread lifecycle stage summary to OPENCLAW_LOG_LEVEL=trace output to localize the 6s+ gap before session.started.
  • #71211 (Exec Redaction): Add a secret-redaction pass for exec tool output so that raw stdout/stderr cannot expose credentials.
