OpenClaw Development Digest: Strengthening Security, Sandbox Isolation, and Agent Reliability
The recent activity in the OpenClaw repository reveals a concerted effort to move the platform from a flexible prototype to a production-ready agentic framework. The primary focus has shifted toward structural security—specifically the isolation of plugins and skills—and the reliability of autonomous background tasks.
As agents are granted more autonomy through tool-chaining and long-running cron jobs, the community is identifying critical gaps in how the system handles non-deterministic failures, resource exhaustion, and the inherent risks of executing untrusted code. This digest covers the emerging themes of "policy-as-code" for agent workflows and the architectural migration toward deeper containerization.
Open Issues
Security and Isolation
Several high-severity issues highlight a systemic risk in the plugin architecture: all plugins currently share a single Node.js process, so a vulnerability in one can expose the entire system's credentials and memory.
- Unified Sandbox Architecture (#12505): A critical proposal to replace the fragmented path-sandbox with a multi-platform SandboxManager. This would move plugins into separate processes with restricted filesystem and network access, utilizing platform-native tools like bubblewrap (Linux) and AppContainer (Windows).
- Sysbox Integration (#7575): To solve the "Docker-in-Docker" dilemma, there is a push to implement the Sysbox runtime. This would allow the gateway to manage sandbox containers without requiring the dangerous --privileged flag or mounting the host's Docker socket.
- Skill Integrity (#12507, #12512): Concerns have been raised about "instruction injection" via SKILL.md files. Proposals include implementing cryptographic signing for skills and a dedicated isolation layer to prevent skill content from overriding the agent's core identity.
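To make the process-isolation idea concrete, here is a minimal sketch of how a sandbox policy could be translated into arguments for the bubblewrap CLI (bwrap) on Linux. The SandboxPolicy shape and the buildBwrapArgs helper are hypothetical and are not taken from the #12505 proposal; only the bwrap flags themselves are real.

```typescript
// Hypothetical policy shape; the SandboxManager proposal may define this differently.
interface SandboxPolicy {
  readOnlyPaths: string[]; // host paths visible read-only inside the sandbox
  writablePaths: string[]; // host paths the plugin may modify
  allowNetwork: boolean;   // whether the plugin keeps network access
}

// Translate a policy into an argument list for the bubblewrap CLI (`bwrap`).
function buildBwrapArgs(policy: SandboxPolicy, command: string[]): string[] {
  const args: string[] = [
    "--unshare-all",     // unshare every namespace bwrap supports, including network
    "--die-with-parent", // terminate the plugin if the parent process exits
    "--proc", "/proc",   // fresh procfs for the new PID namespace
    "--dev", "/dev",     // minimal device nodes
    "--tmpfs", "/tmp",   // private scratch space
  ];
  if (policy.allowNetwork) {
    args.push("--share-net"); // keep the host network namespace after --unshare-all
  }
  for (const path of policy.readOnlyPaths) {
    args.push("--ro-bind", path, path);
  }
  for (const path of policy.writablePaths) {
    args.push("--bind", path, path);
  }
  return [...args, ...command];
}
```

A real policy would also need to bind enough of the filesystem (for example the runtime's own install directory) for the plugin command to start; the sketch leaves that to the caller.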
Reliability and Autonomous Execution
As users deploy more complex cron jobs, the "fire-and-forget" nature of current executions is proving insufficient.
- Deterministic Verification (#12398): A proposal for postconditions in cron jobs. Instead of trusting the LLM's "done" message, a deterministic shell script would verify the actual outcome (e.g., checking if a file was actually created).
- Direct Exec Mode (#18160): To reduce costs and latency, there is a request for a payload.kind = "exec" for cron jobs, allowing simple scripts to run without requiring an LLM turn.
- Codex Runtime Latency (#84725, #78947): Reports indicate significant overhead in the Codex harness, with warm turns spending ~7.5s in setup before the prompt is even submitted. This points to a need for better memoization of auth profiles and tool schemas.
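The postcondition idea can be sketched as a small verification step that runs after the agent reports completion. Everything named below (Postcondition, CommandRunner, verifyPostconditions) is an assumption made for illustration; the actual schema proposed in #12398 may look different. The command runner is injected so the sketch does not depend on any particular process API.

```typescript
// Hypothetical shapes: none of these names come from the OpenClaw codebase.
interface Postcondition {
  description: string;
  command: string; // shell command; exit code 0 means the condition holds
}

interface VerifiedResult {
  status: "verified" | "failed";
  failures: string[]; // descriptions of the postconditions that did not hold
}

// Returns the exit code of the command. Injected by the caller.
type CommandRunner = (command: string) => number;

function verifyPostconditions(
  agentClaimedDone: boolean,
  postconditions: Postcondition[],
  run: CommandRunner,
): VerifiedResult {
  if (!agentClaimedDone) {
    return { status: "failed", failures: ["agent did not report completion"] };
  }
  // The agent's claim is never sufficient on its own: every check must pass.
  const failures = postconditions
    .filter((pc) => run(pc.command) !== 0)
    .map((pc) => pc.description);
  return { status: failures.length === 0 ? "verified" : "failed", failures };
}
```

With this shape, a job whose agent says "done" but whose check command (for example `test -f` on an expected output file) exits non-zero is still recorded as failed.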
UX and Channel Parity
Efforts to bring Slack and WhatsApp to parity with Telegram continue, focusing on rich interactions and reliability.
- WhatsApp Reliability (#7433, #11703): Issues regarding message decryption failures in groups and the need for coalescing messages based on server timestamps rather than receive time.
- Slack Interaction (#12602, #84732): Requests for Block Kit support for richer UI and a fix for the reconcileUnknownSend capability mismatch that currently blocks some channel-targeted sends.
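As an illustration of coalescing on server timestamps rather than receive time, the sketch below orders messages by the server-assigned timestamp and batches consecutive messages from the same sender that fall within a time window. The InboundMessage fields and the coalesceByServerTime helper are hypothetical names chosen for this example, not part of the WhatsApp channel code.

```typescript
// Hypothetical message shape used only for this sketch.
interface InboundMessage {
  sender: string;
  text: string;
  serverTimestampMs: number; // assigned by the messaging server
  receivedAtMs: number;      // local arrival time; deliberately ignored below
}

// Group messages into batches: same sender, each at most `windowMs`
// after the previous message in the batch, measured in server time.
function coalesceByServerTime(messages: InboundMessage[], windowMs: number): InboundMessage[][] {
  const ordered = [...messages].sort((a, b) => a.serverTimestampMs - b.serverTimestampMs);
  const batches: InboundMessage[][] = [];
  for (const msg of ordered) {
    const current = batches[batches.length - 1];
    const last = current ? current[current.length - 1] : undefined;
    const fitsCurrent =
      last !== undefined &&
      last.sender === msg.sender &&
      msg.serverTimestampMs - last.serverTimestampMs <= windowMs;
    if (current && fitsCurrent) {
      current.push(msg);
    } else {
      batches.push([msg]);
    }
  }
  return batches;
}
```

Because ordering never consults receivedAtMs, two messages delivered out of order still end up in the sequence in which they were sent.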
Key Themes
From "Vibes-Based" to "Policy-Based" Governance
There is a recurring theme of moving away from prompt-based constraints toward architectural enforcement. This is evident in:
- Response Gating (#13583): The proposal for "hard gates" that mechanically prevent an agent from responding until a mandatory tool (like a security scanner) has been executed.
- Action-Level Deny (#13948): Moving from blocking entire tools (e.g., message) to blocking specific actions (e.g., message:send) while allowing others (message:read).
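Both proposals amount to checks that run in code rather than in the prompt. The sketch below assumes a deny list whose entries are either a bare tool name or a tool:action pair, and a list of mandatory tools that must have executed before a response is released. The function names and the policy format are hypothetical.

```typescript
// Hypothetical policy format: a deny entry is either "tool" (blocks the whole
// tool) or "tool:action" (blocks a single action of that tool).
function isActionAllowed(tool: string, action: string, denyList: string[]): boolean {
  return !denyList.some((entry) => entry === tool || entry === `${tool}:${action}`);
}

// Hard gate: list the mandatory tools that have not yet run in this turn.
function missingRequiredTools(requiredTools: string[], executedTools: string[]): string[] {
  const executed = new Set(executedTools);
  return requiredTools.filter((tool) => !executed.has(tool));
}

// The response is released only when no mandatory tool is missing.
function canRespond(requiredTools: string[], executedTools: string[]): boolean {
  return missingRequiredTools(requiredTools, executedTools).length === 0;
}
```

With a deny list of `["message:send"]`, reading messages stays permitted while sending is blocked, and an agent that has not yet run a required scanner cannot produce a response regardless of what its prompt says.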
Resource and Cost Awareness
With the rise of high-token-cost models, users are demanding more granular control over spending:
- Token Budgeting (#13271, #9912): Requests for per-channel tool budgets and maxTurns limits to prevent infinite LLM loops.
- Usage Observability (#13219, #9016): A push for dedicated usage logs and the exposure of OpenRouter's per-message cost data directly to the agent runtime.
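A turn cap and a tool budget could be enforced with a small counter that the agent loop consults before each step. This is a sketch under assumptions: maxTurns is the name used in the requests, while RunBudget, maxToolCalls, and the method names are invented for illustration.

```typescript
interface RunLimits {
  maxTurns: number;     // hard cap on LLM turns per run
  maxToolCalls: number; // hypothetical per-channel tool budget
}

class RunBudget {
  private readonly limits: RunLimits;
  private turns = 0;
  private toolCalls = 0;

  constructor(limits: RunLimits) {
    this.limits = limits;
  }

  // Call before each LLM turn; throws instead of starting a turn past the cap.
  startTurn(): void {
    if (this.turns >= this.limits.maxTurns) {
      throw new Error(`maxTurns (${this.limits.maxTurns}) exceeded`);
    }
    this.turns += 1;
  }

  // Call before each tool invocation; throws once the budget is spent.
  recordToolCall(): void {
    if (this.toolCalls >= this.limits.maxToolCalls) {
      throw new Error(`tool budget (${this.limits.maxToolCalls}) exceeded`);
    }
    this.toolCalls += 1;
  }

  remaining(): { turns: number; toolCalls: number } {
    return {
      turns: this.limits.maxTurns - this.turns,
      toolCalls: this.limits.maxToolCalls - this.toolCalls,
    };
  }
}
```

Throwing from the counter turns a runaway loop into an explicit, loggable failure rather than an open-ended bill.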
Action Required
High Severity / Blockers
- Codex Dependency Issue (#83964, #84715): The @openclaw/codex package is failing due to missing openclaw peer dependencies in some environments. This is a direct blocker for users of the Codex runtime.
- SSRF Policy Failure (#84723): The allowRfc2544BenchmarkRange setting is ineffective in FakeIP environments, blocking legitimate proxied requests.
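For context on the SSRF item, the block reserved for benchmarking by RFC 2544 is 198.18.0.0/15, and FakeIP-style proxy setups commonly hand out addresses from inside it (a general observation about such tools, not a detail from the issue). The sketch below shows one way to test membership in that block; the helper names are hypothetical and this is not the gateway's actual SSRF check.

```typescript
// Parse a dotted-quad IPv4 address into an unsigned 32-bit integer,
// or return null if the string is not a well-formed address.
function ipv4ToInt(ip: string): number | null {
  const parts = ip.split(".");
  if (parts.length !== 4) return null;
  let value = 0;
  for (const part of parts) {
    if (!/^\d{1,3}$/.test(part)) return null;
    const octet = Number(part);
    if (octet > 255) return null;
    value = value * 256 + octet; // arithmetic instead of bit shifts avoids signed overflow
  }
  return value;
}

// 198.18.0.0/15 covers 198.18.0.0 through 198.19.255.255.
function isInRfc2544BenchmarkRange(ip: string): boolean {
  const value = ipv4ToInt(ip);
  if (value === null) return false;
  const start = 198 * 256 ** 3 + 18 * 256 ** 2; // 198.18.0.0
  const size = 2 ** 17; // a /15 spans 2^(32-15) addresses
  return value >= start && value < start + size;
}
```

An allow setting for this range would presumably consult a check like this after DNS resolution, so that a hostname resolving to a FakeIP address is not rejected as a private-network target.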
Immediate Contributor Attention
- Symmetry in Tooling (#84734): A regression in v2026.5.19 causes the gateway's cron tool to return empty lists despite jobs existing in jobs.json.
- Auth Refresh Retries (#8673): The OAuth token refresh process lacks retry logic, causing transient network blips to trigger full re-authentication failures.
- TUI Accessibility (#9637): A high-impact request for an option to disable emojis and Unicode symbols for screen-reader users. The fix is low-effort and high-value.
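For the auth refresh item (#8673), the missing retry logic could take the form of a bounded retry with exponential backoff that only retries errors classified as transient. This is a generic sketch, not OpenClaw's refresh code: retryTransient, RetryOptions, and the isTransient classifier are hypothetical, and the sleep function is injected so callers control timing.

```typescript
// Exponential backoff delay for a zero-based attempt index, capped at maxMs.
function backoffDelayMs(attempt: number, baseMs: number, maxMs: number): number {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

interface RetryOptions {
  maxAttempts: number;                      // total tries, including the first
  baseMs: number;                           // delay before the first retry
  maxMs: number;                            // upper bound on any single delay
  isTransient: (error: unknown) => boolean; // e.g. timeouts, not rejected credentials
  sleep: (ms: number) => Promise<void>;     // injected so tests need not wait
}

// Run `operation`, retrying transient failures until attempts are exhausted.
async function retryTransient<T>(operation: () => Promise<T>, opts: RetryOptions): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await operation();
    } catch (error) {
      const lastAttempt = attempt + 1 >= opts.maxAttempts;
      // Permanent errors and exhausted budgets surface to the caller unchanged.
      if (lastAttempt || !opts.isTransient(error)) throw error;
      await opts.sleep(backoffDelayMs(attempt, opts.baseMs, opts.maxMs));
    }
  }
}
```

The classifier matters as much as the loop: a rejected refresh token should fail immediately and prompt re-authentication, while a dropped connection should be retried quietly.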