OpenClaw Issue Digest: Sandbox Escapes, Memory Leaks, and Delivery Regressions

Open Issues

The recent window of activity in the OpenClaw repository shows a mix of critical security vulnerabilities, significant performance regressions, and several UX-breaking bugs in the Telegram and Discord channel integrations.

Critical Security and Isolation Failures

Of primary concern is a reported sandbox escape in the Codex runtime (#83796). While PI-runtime agents are correctly contained within Docker sandboxes, Codex-native shell and code execution currently run within the gateway container itself. This bypasses the configured sandbox boundary and allows Codex-backed agents to read or mutate gateway-container state. A related report (#83018) describes a conflict between Codex's internal bwrap sandbox and OpenClaw's Docker sandbox: nesting the two produces "Operation not permitted" errors that block basic shell execution.

Memory and Performance Degradation

Significant resource leaks have been identified in the Active Memory preflight path (#83792). On Linux VPS deployments, triggering Active Memory can cause the gateway's RSS to jump from ~500 MB to over 1 GB, and the memory stays elevated after the turn completes. Profiling indicates that local embedding model mappings (GGUF files) are retained in the parent process whether the recall succeeded or timed out.

Additionally, a severe performance bottleneck exists when commands.ownerAllowFrom contains large user lists (#50289). With 9,000+ entries, message processing latency spikes to 15-27 seconds due to O(n) authorization checks and expensive JSON parsing on config cache misses.
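
The difference between the two lookup strategies in #50289 can be sketched as follows. This is a minimal illustration, not OpenClaw's actual code: the OwnerAllowList type, isOwnerLinear, and OwnerAllowIndex are hypothetical names, and the real commands.ownerAllowFrom schema may differ. The idea is to build the Set once per config load, so each message pays for a hash lookup instead of a scan over 9,000+ entries.

```typescript
// Hypothetical shape: the real commands.ownerAllowFrom schema may differ.
type OwnerAllowList = readonly string[];

// Linear scan: every check walks the array, so cost is O(n) per message.
function isOwnerLinear(allowFrom: OwnerAllowList, senderId: string): boolean {
  return allowFrom.indexOf(senderId) !== -1;
}

// Index built once per config load; each check is then O(1) on average.
class OwnerAllowIndex {
  private readonly ids: Set<string>;

  constructor(allowFrom: OwnerAllowList) {
    this.ids = new Set(allowFrom);
  }

  has(senderId: string): boolean {
    return this.ids.has(senderId);
  }
}

// Example with a list the size described in the report.
const allowFrom: string[] = [];
for (let i = 0; i < 9000; i++) {
  allowFrom.push(`user-${i}`);
}
const index = new OwnerAllowIndex(allowFrom);
```

Both checks return the same answers; only the per-message cost changes. The index would need rebuilding whenever the config is reloaded, and it does not address the JSON parsing cost on config cache misses, which is a separate part of the report.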

Delivery and Integration Regressions

Several delivery bugs have been reported, including regressions in the latest releases (v2026.5.18 and v2026.5.12):

Telegram Truncation: Responses containing angle-bracket tags (e.g., <think>) are silently truncated when using HTML parse mode (#49104).
Discord Message Loss: Long replies split into chunks are seeing chunks 2+ dropped silently in the new sendDurableMessageBatch wrapper (#82858).
Discord State Persistence: A failed progress state can persist across runs, suppressing successful final replies in subsequent turns (#83744).
Runtime-Only Prompt Loss: In v2026.5.18, reply targets and inbound context are dropped when the runtimeOnly prompt path is triggered, leaving the bot unaware of what was being replied to (#83767).
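
For the Discord chunk loss (#82858), one way to make dropped chunks visible is to record the outcome of every chunk rather than letting a batch end quietly. The sketch below is illustrative only and is not the sendDurableMessageBatch implementation: SendChunk, BatchReport, splitIntoChunks, and sendAllChunks are hypothetical names. The default limit of 2000 reflects Discord's per-message character cap.

```typescript
// Hypothetical transport signature; resolves once the platform accepts the chunk.
type SendChunk = (chunk: string, index: number) => Promise<void>;

interface BatchReport {
  delivered: number[];
  failed: { index: number; error: string }[];
}

// Split text into pieces no longer than `limit` characters.
function splitIntoChunks(text: string, limit: number = 2000): string[] {
  const chunks: string[] = [];
  for (let i = 0; i < text.length; i += limit) {
    chunks.push(text.slice(i, i + limit));
  }
  return chunks;
}

// Send every chunk in order and record each outcome, so a failure on one
// chunk is reported to the caller instead of silently ending the batch.
async function sendAllChunks(chunks: string[], send: SendChunk): Promise<BatchReport> {
  const report: BatchReport = { delivered: [], failed: [] };
  for (let index = 0; index < chunks.length; index++) {
    try {
      await send(chunks[index], index);
      report.delivered.push(index);
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err);
      report.failed.push({ index, error: message });
    }
  }
  return report;
}
```

This version keeps sending after a failure, which can leave a gap in the visible reply. A caller that prefers an all-or-nothing reply could stop at the first failed index and retry from there. Either way, the report makes it possible to assert that delivered.length equals chunks.length.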

Key Themes

1. The "Sandbox vs. Runtime" Divide

There is a growing disparity between the PI-runtime and Codex-runtime security models. The Codex harness, while powerful, currently operates outside the primary Docker sandbox, creating a fragmented trust boundary that complicates security audits and deployment safety.

2. Observability Gaps in Distributed Tracing

Multiple reports (#50291, #83795) highlight that plugin hooks and OTEL traces lack the necessary context (like runId and captureContent) to build accurate, hierarchical trace trees. This makes debugging concurrent group chat messages and auditing tool usage nearly impossible without manual log diving.
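
To illustrate why a run identifier matters here, the sketch below rebuilds per-run trace trees from a flat, interleaved event stream. It assumes a hypothetical HookEvent shape: only runId is named in the reports, while spanId, parentSpanId, and name are invented for the example and do not describe OpenClaw's hook payloads or its OTEL exporter.

```typescript
// Hypothetical event shape; every field except runId is an assumption.
interface HookEvent {
  runId: string;
  spanId: string;
  parentSpanId?: string;
  name: string;
}

interface TraceNode {
  event: HookEvent;
  children: TraceNode[];
}

// Group events by runId and link each event to its parent within the same run.
// Events whose parent is missing become roots of their run's tree.
function buildTraceTrees(events: HookEvent[]): Map<string, TraceNode[]> {
  const nodes = new Map<string, TraceNode>();
  for (const event of events) {
    nodes.set(`${event.runId}:${event.spanId}`, { event, children: [] });
  }

  const roots = new Map<string, TraceNode[]>();
  for (const event of events) {
    const node = nodes.get(`${event.runId}:${event.spanId}`) as TraceNode;
    const parent = event.parentSpanId
      ? nodes.get(`${event.runId}:${event.parentSpanId}`)
      : undefined;
    if (parent) {
      parent.children.push(node);
    } else {
      const list = roots.get(event.runId) || [];
      list.push(node);
      roots.set(event.runId, list);
    }
  }
  return roots;
}
```

Without runId on each event, two concurrent group chat messages that both emit a tool call cannot be separated, and their spans end up attributed to whichever run happens to be adjacent in the log.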

3. Reliability of Automated Workflows (Cron & Hooks)

Cron jobs are suffering from "hallucinated output" when tool calls fail (#49876), and the TUI /new command has stopped emitting hook-visible events (#49918), breaking automation that depends on session-start triggers.

Action Required

Immediate Attention (Critical and High Severity)

#83796 (Codex Sandbox Escape): Critical. The Codex app-server must be integrated into the per-agent Docker sandbox to prevent gateway container contamination.
#83792 (Active Memory RSS Leak): High. Local embedding model mappings must be explicitly released or bounded to prevent gateway OOM on smaller hosts.
#50289 (OwnerAllowFrom Latency): High. The authorization list should be converted to a Set for O(1) lookups to resolve the 15-27 second latency spikes; the JSON parsing cost on config cache misses also needs attention.

Blocked or Regression-Critical

#83767 (Reply Context Loss): Regression in v2026.5.18. Needs immediate fix to restore reply-target visibility in the runtimeOnly path.
#82858 (Discord Chunk Loss): Regression in v2026.5.12. The sendDurableMessageBatch logic needs auditing to ensure all chunks in a batch are processed.
#49104 (Telegram HTML Truncation): High impact on reasoning models. Angle brackets must be escaped before delivery to avoid silent truncation.
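
The escaping called for in #49104 can be sketched as below. The function name escapeTelegramHtml is hypothetical and this is not OpenClaw's delivery code; the three replacements follow the Telegram Bot API rule that &, <, and > in HTML parse mode must be sent as the entities &amp;, &lt;, and &gt; unless they are part of a supported tag.

```typescript
// Escape the three characters Telegram's HTML parse mode treats as markup.
// The ampersand is replaced first so the entities added by the later
// replacements are not escaped a second time.
function escapeTelegramHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Example: model output containing a reasoning tag.
const escaped = escapeTelegramHtml("<think>a & b</think> done");
```

Because this escapes every angle bracket, any formatting tags the bot intends to send (such as bold or code) would have to be added after escaping the model's raw text, not before.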