OpenClaw Issue Digest: Addressing Event Loop Stalls, Session Persistence, and Channel Routing
Open Issues
Recent activity in the OpenClaw repository reveals several critical stability and routing issues. The most pressing concerns are system responsiveness, specifically severe event loop stalls during agent bootstrap and turn maintenance, and regressions in how sessions handle model defaults and persistence.
System Stability and Performance
Several reports highlight significant latency and responsiveness issues:
- Event Loop Saturation: Users are reporting severe event loop stalls (up to 60 s) during `core-plugin-tools` initialization and turn preparation (#77145, #77331, #76606). These stalls block the Node.js process, causing WebSocket timeouts in voice huddles and unresponsive chat interfaces.
- Livelocks in Maintenance: A critical livelock has been identified in `runDeferredTurnMaintenanceWorker` (#77340). Under steady traffic, the maintenance worker waits for a session lane to drain, but new turns arrive faster than the drain condition can be met. The result is a monotonic accumulation of trailing assistant messages and potential API failures.
- Cold Start Latency: The first embedded agent run is reportedly blocked for ~21 seconds by synchronous module evaluation in `createOpenClawCodingTools()` (#77331).
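The livelock in #77340 follows a familiar starvation pattern: a worker that waits for a strictly empty queue never runs while arrivals keep the queue non-empty. The toy simulation below is a sketch under simplifying assumptions (one turn arrives and one completes per tick, and each completed turn requests maintenance); `simulate`, `Policy`, and `maxWaitTicks` are hypothetical names, not OpenClaw code. It contrasts a wait-for-idle gate with a bounded wait that forces a maintenance batch after a fixed delay. The bounded wait is a generic mitigation shown for comparison only; the fix called for below is to decouple the worker from the session lane entirely.

```typescript
// Toy discrete-time model of a maintenance worker gated on an idle session lane.
// All names are hypothetical; this is not OpenClaw's implementation.

type Policy = "waitForIdle" | "boundedWait";

interface SimResult {
  maintenanceRuns: number;
  pendingMaintenance: number; // requests still waiting when the run ends
}

function simulate(policy: Policy, ticks: number, maxWaitTicks: number): SimResult {
  let laneDepth = 1;          // turns queued or running on the session lane
  let pendingMaintenance = 0; // deferred maintenance requests not yet executed
  let waitedTicks = 0;        // how long the oldest pending request has waited
  let maintenanceRuns = 0;

  for (let t = 0; t < ticks; t++) {
    // Steady traffic: one turn arrives and one completes per tick,
    // so the lane is never observed empty.
    laneDepth += 1; // arrival
    laneDepth -= 1; // completion
    pendingMaintenance += 1; // each completed turn schedules maintenance

    const laneIdle = laneDepth === 0;
    const timedOut = policy === "boundedWait" && waitedTicks >= maxWaitTicks;

    if (laneIdle || timedOut) {
      // Run all pending maintenance as one batch and reset the wait clock.
      maintenanceRuns += 1;
      pendingMaintenance = 0;
      waitedTicks = 0;
    } else {
      waitedTicks += 1;
    }
  }
  return { maintenanceRuns, pendingMaintenance };
}
```

Over 100 ticks, the wait-for-idle policy never runs maintenance and ends with all 100 requests pending, while the bounded policy runs periodically and keeps the backlog bounded.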
Session and Model Management
Issues with how OpenClaw persists and resolves agent state are causing inconsistent behavior:
- Stale Model Caching: A high-severity bug (#77322) shows that the `session.model` field persists across `/new` commands and ignores changes to `agents.defaults.model.primary`, forcing agents to keep using obsolete models even after a session reset.
- Session Rotation Failures: In Mattermost, session rotation is creating duplicate sessions with broken delivery contexts, so queued responses flush as duplicates upon restart (#77378).
- Compaction Loops: Under `safeguard` mode, sessions without real conversation messages enter a permanent re-trigger loop, blocking all agent invocations for that session key (#77314).
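One way to avoid the stale-model behavior in #77322 is to resolve the model at turn time rather than snapshotting the default into the session, and to persist only models the user chose explicitly. The sketch below assumes hypothetical `AgentsConfig` and `SessionState` shapes (only the `agents.defaults.model.primary` path comes from the report) and treats a session reset as starting from an empty state.

```typescript
// Hypothetical shapes; only the agents.defaults.model.primary path
// is taken from the bug report. These are not OpenClaw's actual types.
interface AgentsConfig {
  agents: { defaults: { model: { primary: string } } };
}

interface SessionState {
  // Set only when a user explicitly picks a model for this session.
  // Never filled in from the default, so default changes are not shadowed.
  modelOverride?: string;
}

// Resolve on every turn: an explicit override wins, otherwise read the
// current default from live config instead of a cached copy.
function resolveModel(session: SessionState, config: AgentsConfig): string {
  return session.modelOverride ?? config.agents.defaults.model.primary;
}

// A reset (e.g. a /new command) starts from a fresh state and does not
// carry any previously resolved model forward.
function resetSession(): SessionState {
  return {};
}
```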
Channel-Specific Routing and Delivery
Routing and delivery regressions are affecting multiple integration channels:
- Discord: Multi-account setups are experiencing issues where slash commands are only registered for the default account (#77359), and `requireMention: false` is being ignored, causing all guild messages to be skipped (#77457).
- Telegram: Users report that `agent --deliver` may return a successful payload without actually delivering media to the chat (#77265), and that progress drafts (e.g., "Sifting...") persist across restarts (#77389).
- Feishu: Bot identity probes are failing with `ECONNRESET` when `proxy.enabled` is true (#77323), and `@all` mentions are not being correctly recognized for auto-replies (#77383).
- Slack: Final replies are being silently dropped in group chats when the agent emits a `[thinking, text]` turn without calling the `mcp__openclaw__message` tool (#77320).
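The intended semantics of the Discord `requireMention` flag (#77457) fit in a small predicate. This sketch assumes hypothetical `ChannelConfig` and `IncomingMessage` shapes and assumes the flag defaults to `true` when unset. It uses `??` rather than `||` for the default, because `false || true` evaluates to `true`; that is a common way an explicit `false` gets lost, though the reports here do not say it is the cause.

```typescript
// Hypothetical config and message shapes, for illustration only.
interface ChannelConfig {
  requireMention?: boolean; // undefined means "use the default" (assumed: true)
}

interface IncomingMessage {
  mentionsBot: boolean;
}

function shouldHandle(msg: IncomingMessage, cfg: ChannelConfig): boolean {
  // `??` falls back only for null/undefined, so an explicit `false` survives.
  // Writing `cfg.requireMention || true` would always produce `true`.
  const requireMention = cfg.requireMention ?? true;
  return requireMention ? msg.mentionsBot : true;
}
```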
Key Themes
1. Synchronous Blocking in the Runtime
There is a recurring theme of synchronous, blocking operations in the core runtime—particularly during tool construction and plugin loading—that saturate the event loop. This affects not only the initial startup but every turn in some configurations, severely degrading the UX for real-time channels.
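A common mitigation for this kind of saturation is to split divisible synchronous work into chunks and yield to the event loop between them, so timers, socket reads, and WebSocket keepalives get a turn. The generic sketch below uses `setTimeout(..., 0)` as the yield point; `yieldToEventLoop` and `mapInChunks` are hypothetical helpers, not OpenClaw APIs. It does not help with a single long synchronous call such as evaluating a large module graph, which needs lazy loading or moving the work off the main thread.

```typescript
// Hypothetical helper, not part of OpenClaw: resolves on a later timer tick,
// letting pending timers and I/O callbacks run in between.
function yieldToEventLoop(): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, 0));
}

// Applies `fn` to every item, yielding after each chunk of `chunkSize` items.
async function mapInChunks<T, R>(
  items: readonly T[],
  chunkSize: number,
  fn: (item: T) => R,
): Promise<R[]> {
  if (!Number.isInteger(chunkSize) || chunkSize < 1) {
    throw new RangeError("chunkSize must be a positive integer");
  }
  const results: R[] = [];
  for (let start = 0; start < items.length; start += chunkSize) {
    for (const item of items.slice(start, start + chunkSize)) {
      results.push(fn(item));
    }
    // Yield between chunks, but not after the last one.
    if (start + chunkSize < items.length) {
      await yieldToEventLoop();
    }
  }
  return results;
}
```

Smaller chunks improve responsiveness at the cost of more scheduling overhead; the right size depends on how long each item takes.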
2. Cache Invalidation Gaps
Multiple issues (#77322, #73635) point to a systemic problem with cache invalidation. Whether it is the implicit model cache in sessions or the skillsSnapshot in long-lived sessions, the system often fails to recognize configuration changes until a full gateway restart or manual session deletion occurs.
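A generic pattern for closing these invalidation gaps is to tag each cached value with the configuration version it was derived from and to recompute on mismatch. The sketch assumes a hypothetical monotonically increasing config version (for instance, bumped whenever the gateway reloads configuration); `VersionedCache` is an illustrative name, not an existing OpenClaw type.

```typescript
// Hypothetical: each cache entry remembers which config version produced it.
interface VersionedEntry<V> {
  version: number;
  value: V;
}

class VersionedCache<K, V> {
  private readonly entries = new Map<K, VersionedEntry<V>>();
  private readonly currentVersion: () => number;

  constructor(currentVersion: () => number) {
    this.currentVersion = currentVersion;
  }

  // Returns the cached value only if it was computed under the current
  // config version; otherwise recomputes and stores the fresh result.
  get(key: K, compute: (key: K) => V): V {
    const version = this.currentVersion();
    const hit = this.entries.get(key);
    if (hit && hit.version === version) return hit.value;
    const value = compute(key);
    this.entries.set(key, { version, value });
    return value;
  }
}
```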
3. Delivery Path Silences
Across Slack, Telegram, and Feishu, there is a pattern of "silent failures" where the agent generates a response, but the delivery layer suppresses it without logging a warning. This makes debugging nearly impossible for operators without direct access to session transcripts.
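The opposite of a silent failure is a delivery decision that is always explicit and logs whenever output is suppressed. The sketch below uses hypothetical `TurnOutput` and `DeliveryDecision` types. Its fallback rule (send the final text when the agent did not call a message tool) is an assumption about desired behavior inferred from the Slack report (#77320), not documented OpenClaw policy.

```typescript
// Hypothetical types for illustration; not OpenClaw's delivery API.
interface TurnOutput {
  finalText: string | null; // text the model emitted at the end of the turn
  usedMessageTool: boolean; // whether the agent already replied via a message tool
}

type DeliveryDecision =
  | { action: "send"; text: string }
  | { action: "skip"; reason: string };

function decideDelivery(turn: TurnOutput, warn: (msg: string) => void): DeliveryDecision {
  if (turn.usedMessageTool) {
    // The reply already went out through the tool; skip to avoid a duplicate.
    return { action: "skip", reason: "already delivered via message tool" };
  }
  const text = turn.finalText?.trim() ?? "";
  if (text.length > 0) {
    // Assumed fallback: deliver the final text instead of dropping it.
    return { action: "send", text };
  }
  const reason = "turn produced no deliverable text";
  warn(`delivery suppressed: ${reason}`); // suppression is never silent
  return { action: "skip", reason };
}
```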
Action Required
High Severity / Blocked
- #77340 (Maintenance Livelock): Immediate attention is needed to decouple the deferred maintenance worker from the session inference lane to prevent trailing-assistant accumulation.
- #77322 (Stale Model Cache): This blocks the ability to migrate models (e.g., for the upcoming DeepSeek retirement) without manual session wipes.
- #77331 / #76606 (Event Loop Stalls): The synchronous nature of tool creation needs to be addressed via pre-warming or lazy-loading to restore gateway responsiveness.
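The two options named for #77331 / #76606 can be combined: defer expensive construction until first use, and optionally trigger it earlier, off the user-facing path. In this sketch, `createLazy` and its `prewarm` method are hypothetical, and the factory stands in for something like `createOpenClawCodingTools()`. Pre-warming on the main thread still blocks the loop while the factory runs; it only moves the stall away from the first turn. Removing it entirely would require making the work itself cheaper or asynchronous.

```typescript
// Hypothetical helper, not OpenClaw code: memoizes an expensive factory.
interface Lazy<T> {
  get(): T;                        // builds on first call, then reuses the value
  isReady(): boolean;              // true once the value has been built
  prewarm(delayMs?: number): void; // schedule a build off the request path
}

function createLazy<T>(factory: () => T): Lazy<T> {
  let built = false;
  let value: T | undefined;

  const get = (): T => {
    if (!built) {
      value = factory(); // if this throws, `built` stays false and get() retries
      built = true;
    }
    return value as T;
  };

  return {
    get,
    isReady: () => built,
    prewarm: (delayMs = 0) => {
      // Still runs on the main thread; it only shifts the cost away from
      // the first user-facing turn.
      setTimeout(() => {
        try {
          if (!built) get();
        } catch {
          // Leave the failure for the next get() call to surface.
        }
      }, delayMs);
    },
  };
}
```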
Critical Bug Clusters
- Discord Multi-Account: Fix the slash command registration loop to iterate over all configured accounts, not just the default (#77359).
- Feishu Proxy: Resolve the `ECONNRESET` issue by ensuring the Lark SDK's internal token requests bypass the global proxy (#77323).
- Slack Group Delivery: Implement a
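The Discord fix amounts to running registration once per configured account instead of once for the default. The sketch below assumes a hypothetical `DiscordAccount` shape and an injected async `registerCommands` callback standing in for the actual Discord API call. It records per-account failures and continues, so one bad token does not block the other accounts; that is a design choice for illustration, not something #77359 specifies.

```typescript
// Hypothetical shapes; the injected callback stands in for the real API call.
interface DiscordAccount {
  id: string;
  isDefault: boolean;
}

interface RegistrationReport {
  registered: string[];
  failed: { accountId: string; error: string }[];
}

async function registerForAllAccounts(
  accounts: readonly DiscordAccount[],
  registerCommands: (account: DiscordAccount) => Promise<void>,
): Promise<RegistrationReport> {
  const report: RegistrationReport = { registered: [], failed: [] };
  // Iterate over every configured account, not just the default one.
  for (const account of accounts) {
    try {
      await registerCommands(account);
      report.registered.push(account.id);
    } catch (err) {
      // Record the failure and continue so other accounts still register.
      report.failed.push({ accountId: account.id, error: String(err) });
    }
  }
  return report;
}
```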