Statewright: Bringing Deterministic Guardrails to AI Agents

The prevailing trend in AI agent development is a race toward scale: bigger models, longer prompts, and more complex observability tools. However, as many developers have discovered, giving a model 40+ tools and an open-ended problem often leads to brittleness. Even the most capable frontier models can fall into "read-loop death spirals," where they re-read the same file repeatedly without ever making a meaningful edit.

Statewright proposes a fundamental shift in philosophy: "Agents are suggestions, states are laws." Instead of attempting to brute-force reliability through model size, Statewright reduces the problem space by implementing deterministic state machine guardrails. By constraining which tools an agent can use in any given phase, Statewright transforms an open-ended struggle into a structured, reliable workflow.

The Core Philosophy: Constraining the Solution Space

At its heart, Statewright is a Rust-based engine that evaluates state machine definitions. Unlike many agentic frameworks that rely on LLMs to manage their own planning, Statewright's engine is deterministic. It does not use an LLM in the loop to decide if a transition is valid; it simply enforces the rules defined in the workflow.

By breaking a task into distinct states (e.g., planning → implementing → testing), Statewright ensures the model reasons within a focused context. For example:

Planning State: The agent is granted read-only tools (Read, Grep, Glob). It cannot modify code, preventing premature or incorrect edits.
Implementing State: Edit tools are unlocked, but with strict limits. Statewright can block destructive operations like rm or shell redirects (>>) and cap the number of lines edited per state to prevent catastrophic failures.
Testing State: Only designated test commands (e.g., pytest, npm test) are permitted. The agent cannot return to the implementation phase without a valid transition trigger.

If an agent attempts to call a tool not permitted in its current state, the request is rejected at the protocol layer. The agent receives a message explaining what is currently available and how to transition to the next phase.
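The gating and rejection behavior described above can be sketched in a few lines of Python. This is an illustrative toy, not Statewright's Rust engine or its workflow format: the `State` and `Machine` classes, the trigger names (`plan_ready`, `edits_done`, `tests_failed`, `tests_passed`), and the `RunTests` tool name are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class State:
    name: str
    allowed_tools: frozenset[str]
    # Maps a trigger name to the name of the state it leads to.
    transitions: dict[str, str] = field(default_factory=dict)


# Hypothetical workflow mirroring the three phases described in the text.
WORKFLOW = {
    "planning": State(
        "planning", frozenset({"Read", "Grep", "Glob"}), {"plan_ready": "implementing"}
    ),
    "implementing": State(
        "implementing", frozenset({"Read", "Edit"}), {"edits_done": "testing"}
    ),
    "testing": State(
        "testing",
        frozenset({"RunTests"}),
        {"tests_failed": "implementing", "tests_passed": "done"},
    ),
    "done": State("done", frozenset()),
}


class Machine:
    def __init__(self, workflow: dict[str, State], start: str) -> None:
        self.workflow = workflow
        self.current = start

    def check_tool(self, tool: str) -> tuple[bool, str]:
        """Decide from the workflow table alone; no model is consulted."""
        state = self.workflow[self.current]
        if tool in state.allowed_tools:
            return True, "ok"
        available = ", ".join(sorted(state.allowed_tools)) or "none"
        triggers = ", ".join(sorted(state.transitions)) or "none"
        return False, (
            f"'{tool}' is not permitted in state '{state.name}'. "
            f"Available tools: {available}. Transitions: {triggers}."
        )

    def fire(self, trigger: str) -> str:
        """Move to the next state, or raise if the trigger is not defined here."""
        state = self.workflow[self.current]
        if trigger not in state.transitions:
            raise ValueError(f"no transition '{trigger}' from '{state.name}'")
        self.current = state.transitions[trigger]
        return self.current
```

In this sketch, calling `Machine(WORKFLOW, "planning").check_tool("Edit")` returns a refusal that lists the read-only tools and the `plan_ready` trigger, which is the kind of message the agent would use to find its way to the next phase.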

Measurable Impact on Model Performance

One of the most compelling arguments for this approach is its effect on smaller, local models. When the tool space is constrained, models that would otherwise fail complex tasks become viable.

According to Statewright's research on a 5-task SWE-bench subset, two local models (13.8GB and 19.9GB) saw their success rate jump from 2/10 to 10/10 when using Statewright constraints. This suggests that the "intelligence floor" for agentic work is lower than previously thought, provided the environment is sufficiently structured.

Technical Implementation and Integration

Statewright integrates with agents via the Model Context Protocol (MCP) or direct hooks. This allows it to provide "hard" enforcement, which blocks disallowed tool calls at the protocol layer before they execute, rather than "advisory" enforcement, where rules are merely suggested in the system prompt.
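The difference between the two enforcement levels can be illustrated with a small Python sketch. The `ALLOWED` table and the function names `advisory_prompt` and `hard_dispatch` are hypothetical and do not reflect the actual MCP or hook interfaces of any listed agent; the point is only that advisory rules are text the model may ignore, while a hard check sits in front of the tool and runs first.

```python
from typing import Callable

# Hypothetical per-state allowlist; a real workflow would define one per state.
ALLOWED = {"planning": {"Read", "Grep", "Glob"}}


def advisory_prompt(state: str) -> str:
    """Advisory: the rule exists only as prompt text, so a violating call still runs."""
    tools = ", ".join(sorted(ALLOWED[state]))
    return f"You are in the '{state}' state. Please use only: {tools}."


def hard_dispatch(state: str, tool: str, run: Callable[[], str]) -> dict:
    """Hard: the check happens before the tool body; denied calls never execute."""
    if tool not in ALLOWED[state]:
        return {"status": "denied", "reason": f"'{tool}' is not available in '{state}'"}
    return {"status": "ok", "output": run()}
```

With `hard_dispatch`, the `run` callable for a denied tool is never invoked, so a premature edit has no side effects regardless of what the model intended.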

Supported Agent Integrations

Agent	Integration Method	Enforcement Level
Claude Code	Hooks + MCP	Hard
Codex	Hooks	Hard (Alpha)
opencode	TypeScript plugin	Hard (Alpha)
Pi	Skills extension	Hard (Alpha)
Cursor	MCP + rules	Advisory (Alpha)

Advanced Guardrails

Beyond simple tool gating, Statewright provides a suite of granular controls:

Bash Discernment: Blocks destructive operations and scripting interpreters in non-write states.
Edit Guards: Rejects diffs exceeding a specific line count or limits the number of files edited per state.
Conditional Transitions: Uses programmatic predicates (e.g., eq, gt, exists) to determine if a state transition is valid based on context data.
Approval Gates: Pauses execution for human review before high-risk transitions occur.
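As a rough illustration of how conditional transitions and approval gates might combine, the following Python sketch evaluates a list of predicates against context data and then applies an optional human approval step. The predicate names `eq`, `gt`, and `exists` come from the description above; the rule layout (`op`, `key`, `value`), the context keys, and the function names are assumptions for the example, not Statewright's schema.

```python
def evaluate(rule: dict, context: dict) -> bool:
    """Evaluate one predicate against the context; purely programmatic."""
    op, key = rule["op"], rule["key"]
    if op == "exists":
        return key in context
    if key not in context:
        # A comparison against a missing key fails rather than raising.
        return False
    if op == "eq":
        return context[key] == rule["value"]
    if op == "gt":
        return context[key] > rule["value"]
    raise ValueError(f"unknown predicate: {op}")


def can_transition(
    rules: list[dict],
    context: dict,
    needs_approval: bool = False,
    approved: bool = False,
) -> str:
    """Return 'blocked', 'awaiting_approval', or 'allowed' for a transition."""
    if not all(evaluate(rule, context) for rule in rules):
        return "blocked"
    if needs_approval and not approved:
        return "awaiting_approval"
    return "allowed"


# Hypothetical guard for leaving a testing state.
RULES = [
    {"op": "eq", "key": "tests_failed", "value": 0},
    {"op": "gt", "key": "tests_run", "value": 0},
    {"op": "exists", "key": "plan"},
]
```

Under these assumptions, a context with failing tests yields `blocked`, a clean test run on a gated transition yields `awaiting_approval`, and only a recorded human approval turns that into `allowed`.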

Beyond Coding: Enterprise and Creative Workflows

While the initial focus is on bug-fixing and software engineering, the state machine pattern is applicable to any multi-step process susceptible to non-deterministic quirks.

As the author, @azurewraith, notes, this approach can be applied to complex content pipelines—such as tabletop publishing—where a research phase gathers lore, a drafting phase generates structured JSON, and a human review gate ensures quality before finalization.

In an enterprise context, this could revolutionize SOC 2 change management. Instead of relying on checklists and hope, a workflow could structurally mandate a plan, human review, audited implementation, and final approval before a production deployment is even possible. This creates an auditable paper trail where humans are injected as approval gates rather than lifecycle managers.

Trade-offs and Considerations

Implementing a rigid state machine does come with costs. Users and contributors have raised several important points:

Flexibility vs. Rigidity: There is a risk that overly restrictive workflows may stifle creative exploration or leave an agent stuck if the task doesn't fit the predefined path.
Caching Costs: Frequent changes to the tool list (which occur during state transitions) may cause cache busts in some LLM architectures, potentially increasing token costs in long sessions.
Human Effort: Workflow definitions must be authored, although Statewright allows agents to generate these workflows by pointing them at a JSON schema.

Despite these challenges, the shift toward structural enforcement represents a significant departure from the current state of the art. By treating agents as suggestions and states as laws, Statewright provides a path toward truly reliable, auditable AI agents.
