OpenClaw Development Digest: Strengthening Security, Sandbox Isolation, and Agent Reliability

18:30–00:30 UTC May 20, 2026

Recent activity in the OpenClaw repository shows a concerted effort to move the platform from a flexible prototype to a production-ready agentic framework. The focus has shifted toward structural security, specifically the isolation of plugins and skills, and toward the reliability of autonomous background tasks.

As agents are granted more autonomy through tool-chaining and long-running cron jobs, the community is identifying critical gaps in how the system handles non-deterministic failures, resource exhaustion, and the inherent risks of executing untrusted code. This digest covers the emerging themes of "policy-as-code" for agent workflows and the architectural migration toward deeper containerization.

Open Issues

Security and Isolation

Several high-severity issues highlight a systemic risk in the plugin architecture: all plugins currently share a single Node.js process, so a vulnerability in one plugin can expose the entire system's credentials and memory.

Unified Sandbox Architecture (#12505): A critical proposal to replace the fragmented path-sandbox with a multi-platform SandboxManager. This would move plugins into separate processes with restricted filesystem and network access, utilizing platform-native tools like bubblewrap (Linux) and AppContainer (Windows).
Sysbox Integration (#7575): To solve the "Docker-in-Docker" dilemma, there is a push to implement the Sysbox runtime. This would allow the gateway to manage sandbox containers without requiring the dangerous --privileged flag or mounting the host's Docker socket.
Skill Integrity (#12507, #12512): Concerns have been raised about "instruction injection" via SKILL.md files. Proposals include implementing cryptographic signing for skills and a dedicated isolation layer to prevent skill content from overriding the agent's core identity.
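
To make the skill-signing idea concrete, here is a minimal sketch of a tamper check on a SKILL.md file. It is not OpenClaw's design: the issues propose cryptographic signing without specifying a scheme, and this example swaps in HMAC-SHA256 with a shared key because that is available in the Python standard library. A real deployment would more likely use public-key signatures. The function names `sign_skill` and `verify_skill` are hypothetical.

```python
import hashlib
import hmac


def sign_skill(content: bytes, key: bytes) -> str:
    """Return a hex HMAC-SHA256 tag over the raw skill file bytes.

    Assumption: a shared secret key stands in for a real signing key.
    """
    return hmac.new(key, content, hashlib.sha256).hexdigest()


def verify_skill(content: bytes, signature: str, key: bytes) -> bool:
    """Return True only if the content still matches the recorded tag."""
    expected = sign_skill(content, key)
    # compare_digest avoids leaking the mismatch position through timing.
    return hmac.compare_digest(expected, signature)
```

Under this sketch, a loader would refuse any skill whose bytes no longer verify, so an injected instruction appended to SKILL.md after signing would be rejected before the agent ever reads it.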

Reliability and Autonomous Execution

As users deploy more complex cron jobs, the "fire-and-forget" nature of current executions is proving insufficient.

Deterministic Verification (#12398): A proposal for postconditions in cron jobs. Instead of trusting the LLM's "done" message, a deterministic shell script would verify the actual outcome (e.g., checking if a file was actually created).
Direct Exec Mode (#18160): To reduce costs and latency, there is a request for a payload.kind = "exec" for cron jobs, allowing simple scripts to run without requiring an LLM turn.
Codex Runtime Latency (#84725, #78947): Reports indicate significant overhead in the Codex harness, with warm turns spending ~7.5s in setup before the prompt is even submitted. This points to a need for better memoization of auth profiles and tool schemas.
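
The postcondition proposal in #12398 can be sketched as follows. This is an illustration under assumptions, not the proposed implementation: `JobResult` and `check_postcondition` are hypothetical names, and the check is modeled as any command whose exit code 0 means the outcome was verified.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class JobResult:
    agent_reported_done: bool   # what the LLM claimed
    postcondition_passed: bool  # what the deterministic check observed

    @property
    def succeeded(self) -> bool:
        # The agent's own "done" message is never sufficient by itself.
        return self.agent_reported_done and self.postcondition_passed


def check_postcondition(argv: list[str], timeout_s: float = 30.0) -> bool:
    """Run a verification command; exit code 0 means the outcome holds."""
    try:
        completed = subprocess.run(argv, capture_output=True, timeout=timeout_s)
    except (subprocess.TimeoutExpired, OSError):
        # A check that hangs or cannot start counts as a failed check.
        return False
    return completed.returncode == 0


if __name__ == "__main__":
    # Demo only: the current interpreter stands in for a verification script.
    passed = check_postcondition([sys.executable, "-c", "raise SystemExit(0)"])
    print(JobResult(agent_reported_done=True, postcondition_passed=passed).succeeded)
```

For the file-creation example mentioned above, the command could be something like `["test", "-f", "/path/to/output"]` on a POSIX system. Treating timeouts and launch errors as failures keeps the check conservative: a job is only marked successful when the verification positively passes.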

UX and Channel Parity

Efforts to bring Slack and WhatsApp to parity with Telegram continue, focusing on rich interactions and reliability.

WhatsApp Reliability (#7433, #11703): Reports of message decryption failures in group chats, plus a request to coalesce messages by server timestamp rather than local receive time.
Slack Interaction (#12602, #84732): Requests for Block Kit support for richer UI and a fix for the reconcileUnknownSend capability mismatch that currently blocks some channel-targeted sends.
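
The difference between coalescing on server timestamps and on receive time is easiest to see in code. The sketch below is a generic illustration, not the OpenClaw channel code: the `Message` fields, the `coalesce` function, and the two-second window are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    sender: str
    text: str
    server_ts: float    # timestamp assigned by the messaging server
    received_ts: float  # local arrival time; may be out of order


def coalesce(messages: list[Message], window_s: float = 2.0) -> list[list[Message]]:
    """Group consecutive messages from one sender that are close in server time."""
    # Ordering by server_ts undoes any reordering introduced in transit.
    ordered = sorted(messages, key=lambda m: m.server_ts)
    batches: list[list[Message]] = []
    for msg in ordered:
        last = batches[-1] if batches else None
        if (
            last is not None
            and last[-1].sender == msg.sender
            and msg.server_ts - last[-1].server_ts <= window_s
        ):
            last.append(msg)
        else:
            batches.append([msg])
    return batches
```

If two fragments of one thought arrive in the wrong order, sorting on `received_ts` would present them reversed to the agent, while sorting on `server_ts` restores the order in which they were sent.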

Key Themes

From "Vibes-Based" to "Policy-Based" Governance

A recurring theme is the move away from prompt-based constraints toward architectural enforcement. This is evident in:

Response Gating (#13583): The proposal for "hard gates" that mechanically prevent an agent from responding until a mandatory tool (like a security scanner) has been executed.
Action-Level Deny (#13948): Moving from blocking entire tools (e.g., message) to blocking specific actions (e.g., message:send) while allowing others (message:read).
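
Both proposals amount to checks that run outside the model. The sketch below shows one way such a policy could be evaluated; the `Policy` shape, the `tool:action` string convention beyond the `message:send` example above, and the tool name `security_scan` are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Policy:
    # Entries are either a whole tool ("message") or one action ("message:send").
    denied: set[str] = field(default_factory=set)
    # Tools that must have run in this turn before any reply is released.
    required_before_reply: set[str] = field(default_factory=set)


def is_allowed(policy: Policy, tool: str, action: str) -> bool:
    """Deny if the whole tool or the specific tool:action pair is listed."""
    return tool not in policy.denied and f"{tool}:{action}" not in policy.denied


def may_respond(policy: Policy, executed_tools: set[str]) -> bool:
    """Hard gate: every required tool must appear in the executed set."""
    return policy.required_before_reply <= executed_tools
```

With `Policy(denied={"message:send"}, required_before_reply={"security_scan"})`, reads through the message tool stay available, sends are refused, and the runtime withholds the agent's reply until the scanner has actually run, regardless of what the prompt says.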

Resource and Cost Awareness

With the rise of high-token-cost models, users are demanding more granular control over spending:

Token Budgeting (#13271, #9912): Requests for per-channel tool budgets and maxTurns limits to prevent infinite LLM loops.
Usage Observability (#13219, #9016): A push for dedicated usage logs and the exposure of OpenRouter's per-message cost data directly to the agent runtime.
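
A turn and token budget of the kind requested could look roughly like this. The class and field names are hypothetical; only the `maxTurns` concept comes from the issues, and the choice to reject a charge before recording it is a design assumption.

```python
from dataclasses import dataclass


class BudgetExceeded(Exception):
    """Raised when a run would go past its turn or token limit."""


@dataclass
class RunBudget:
    max_turns: int
    max_tokens: int
    turns_used: int = 0
    tokens_used: int = 0

    def charge(self, tokens: int) -> None:
        """Record one LLM turn, or raise without recording if it would overspend."""
        if self.turns_used + 1 > self.max_turns:
            raise BudgetExceeded("turn limit reached")
        if self.tokens_used + tokens > self.max_tokens:
            raise BudgetExceeded("token limit reached")
        self.turns_used += 1
        self.tokens_used += tokens
```

Calling `charge` before each model turn gives an agent loop a deterministic exit: a tool-calling cycle that never converges stops at `max_turns` instead of running until someone notices the bill.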

Action Required

High Severity / Blockers

Codex Dependency Issue (#83964, #84715): The @openclaw/codex package is failing due to missing openclaw peer dependencies in some environments. This is a direct blocker for users of the Codex runtime.
SSRF Policy Failure (#84723): The allowRfc2544BenchmarkRange setting is ineffective in FakeIP environments, blocking legitimate proxied requests.
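
For context on the SSRF item: the RFC 2544 benchmarking range is 198.18.0.0/15, and FakeIP-style proxies commonly hand out synthetic addresses from that range, which is why an SSRF guard needs an explicit opt-in for it. The sketch below shows the intended semantics of such an opt-in. It does not reproduce the reported bug or OpenClaw's actual guard; the function name, the parameter name `allow_rfc2544`, and the abbreviated block list are assumptions.

```python
import ipaddress

RFC2544_BENCHMARK = ipaddress.ip_network("198.18.0.0/15")

# Abbreviated IPv4 block list for illustration; a real guard covers more ranges.
BLOCKED_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("127.0.0.0/8"),
    ipaddress.ip_network("169.254.0.0/16"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
    RFC2544_BENCHMARK,
]


def is_blocked(address: str, allow_rfc2544: bool = False) -> bool:
    """Return True if an outbound request to this resolved address is refused."""
    ip = ipaddress.ip_address(address)
    for net in BLOCKED_NETWORKS:
        if ip in net:
            # The opt-in exempts only the benchmark range, nothing else.
            if net == RFC2544_BENCHMARK and allow_rfc2544:
                continue
            return True
    return False
```

The point of the sketch is that the exemption has to be applied at the same place the resolved address is checked. If the flag is consulted at one layer and the address is classified at another, the setting can appear enabled while requests are still refused.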

Immediate Contributor Attention

Symmetry in Tooling (#84734): A regression in v2026.5.19 causes the gateway's cron tool to return empty lists despite jobs existing in jobs.json.
Auth Refresh Retries (#8673): The OAuth token refresh process lacks retry logic, causing transient network blips to trigger full re-authentication failures.
TUI Accessibility (#9637): A request for an option to disable emoji and Unicode symbols for screen-reader users; a low-effort, high-value fix.
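
For the auth refresh item, the missing piece is ordinary retry logic around the refresh call. The following is a generic sketch of retry with exponential backoff, not the OpenClaw auth code: `TransientError`, `retry_with_backoff`, and the default attempt count and delay are all assumptions, and the `sleep` parameter exists only so the delay can be replaced in tests.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Stand-in for a retryable failure such as a dropped connection."""


def retry_with_backoff(
    op: Callable[[], T],
    attempts: int = 4,
    base_delay_s: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call op until it succeeds, doubling the delay after each transient failure."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return op()
        except TransientError:
            if attempt == attempts - 1:
                raise  # retries exhausted; surface the original error
            sleep(base_delay_s * 2 ** attempt)
    raise AssertionError("unreachable")
```

Only errors classified as transient are retried. A rejected refresh token should still fail immediately and trigger re-authentication, since retrying it cannot succeed.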
