Building a Multi-Paxos Engine: Lessons from 100K Lines of AI-Generated Rust
The sheer volume of boilerplate and the difficulty of ensuring correctness under concurrency often deter teams from building a production-grade distributed system from scratch. However, a recent project to modernize Azure's Replicated State Library (RSL), a multi-Paxos consensus engine, shows how AI coding agents can drastically accelerate the development cycle without sacrificing technical rigor.
Using a suite of AI tools, the developer implemented over 130,000 lines of Rust code in just a few weeks and raised throughput from roughly 23K to 300K operations per second. The project serves as a case study in "vibe coding" at scale, blending high-level architectural guidance with AI-driven execution.
The Challenge: Modernizing a Legacy Backbone
Azure's RSL has served as the backbone for replication in many Azure services for over a decade. While robust, it lacked support for modern hardware capabilities that are now standard in cloud datacenters. Specifically, the project aimed to address three critical gaps:
- Lack of Pipelining: In the original RSL, new requests were blocked while votes were in flight, increasing latency.
- No NVM Support: The absence of Non-Volatile Memory (NVM) support meant commit times were higher than necessary.
- Limited Hardware Awareness: The system was not optimized for RDMA (Remote Direct Memory Access), which is now pervasive in Azure environments.
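To make the pipelining gap concrete, here is a minimal sketch of a leader that allows a bounded number of log slots to be in flight at once. The `Proposer` type, its fields, and the `window` parameter are hypothetical names chosen for illustration and are not taken from RSL; a window of 1 approximates the blocking behavior described above, while a larger window lets new requests proceed while earlier votes are still outstanding.

```rust
use std::collections::BTreeMap;

/// Hypothetical leader-side proposer (illustrative only, not RSL's design).
struct Proposer {
    next_slot: u64,
    /// Maximum number of undecided slots; 1 means no pipelining.
    window: usize,
    /// Votes needed for a slot to be chosen (e.g. 2 in a 3-replica group).
    quorum: usize,
    /// Undecided slot -> number of votes received so far.
    in_flight: BTreeMap<u64, usize>,
}

impl Proposer {
    fn new(window: usize, quorum: usize) -> Self {
        Proposer { next_slot: 0, window, quorum, in_flight: BTreeMap::new() }
    }

    /// Assigns the next log slot to a request, or returns None if the
    /// in-flight window is full and the request has to wait.
    fn try_propose(&mut self) -> Option<u64> {
        if self.in_flight.len() >= self.window {
            return None;
        }
        let slot = self.next_slot;
        self.next_slot += 1;
        // The leader counts its own vote for the proposal.
        self.in_flight.insert(slot, 1);
        Some(slot)
    }

    /// Records one follower vote; returns true once the slot reaches quorum.
    /// Applying chosen slots in log order is omitted from this sketch.
    fn on_vote(&mut self, slot: u64) -> bool {
        let chosen = match self.in_flight.get_mut(&slot) {
            Some(votes) => {
                *votes += 1;
                *votes >= self.quorum
            }
            None => return false, // unknown or already chosen slot
        };
        if chosen {
            self.in_flight.remove(&slot);
        }
        chosen
    }
}
```

With `Proposer::new(1, 2)`, a second call to `try_propose` returns `None` until the first slot collects its vote; with `Proposer::new(4, 2)`, four proposals can be outstanding before the leader has to wait.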
The AI-Accelerated Workflow
To achieve this scale of productivity, the developer combined CLI-based agents, primarily Claude Code and Codex CLI, with VS Code for diffing and minor edits. The workflow shifted from a rigid, document-heavy approach to a "lightweight spec-driven development" model.
Lightweight Spec-Driven Development (SDD)
Rather than maintaining exhaustive requirement and design documents that quickly fall out of sync with the code, the developer adopted a more agile process:
- Specification: Using tools like spec-kit to generate a spec markdown containing user stories and acceptance criteria.
- Clarification: Using the /clarify command to force the AI to self-critique and suggest missing edge cases or alternative implementation paths.
- Planning: Breaking work into single user stories, which the author identifies as the "sweet spot" for AI agents to manage effectively without losing context.
Ensuring Correctness: Code Contracts
One of the most significant hurdles in AI-assisted coding is the "black box" nature of generated code. To combat this, the project employed code contracts—explicit preconditions, postconditions, and invariants for critical functions.
These contracts act as a triple-layer defense:
- AI-Generated Contracts: High-capability models (like GPT-5 High) are used to write the contracts themselves.
- Targeted Testing: The AI generates specific test cases designed to exercise the postconditions defined in the contracts.
- Property-Based Testing: Contracts are translated into property-based tests that explore randomized input spaces to find deep, subtle bugs.
This approach proved its worth when an AI-generated contract identified a subtle Paxos safety violation that would have otherwise resulted in a serious replication consistency issue.
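As an illustration of what such contracts can look like, the sketch below attaches a precondition, postconditions, and an invariant to a simplified single-slot Paxos acceptor, then checks them against randomized message sequences. Everything here is an assumption made for the example: the `Acceptor` type, the specific contracts, and the tiny `lcg` generator, which stands in for a property-testing crate so the block needs only the standard library. It is not the project's actual code or the contract that caught the bug.

```rust
/// Hypothetical single-slot acceptor state (illustrative, not RSL's types).
#[derive(Debug, Clone, PartialEq)]
struct Acceptor {
    /// Highest ballot this acceptor has promised.
    promised: u64,
    /// Last accepted (ballot, value), if any.
    accepted: Option<(u64, u64)>,
}

impl Acceptor {
    fn new() -> Self {
        Acceptor { promised: 0, accepted: None }
    }

    /// Invariant: an accepted ballot never exceeds the promised ballot.
    fn invariant(&self) -> bool {
        self.accepted.map_or(true, |(ballot, _)| ballot <= self.promised)
    }

    /// Phase 1: promise to ignore ballots lower than `ballot`.
    fn prepare(&mut self, ballot: u64) -> bool {
        assert!(self.invariant(), "precondition: invariant holds on entry");
        let old_promised = self.promised;
        let ok = ballot > self.promised;
        if ok {
            self.promised = ballot;
        }
        assert!(self.promised >= old_promised, "postcondition: promise is monotonic");
        assert!(self.invariant(), "postcondition: invariant preserved");
        ok
    }

    /// Phase 2: accept `value` unless a higher ballot was already promised.
    fn accept(&mut self, ballot: u64, value: u64) -> bool {
        assert!(self.invariant(), "precondition: invariant holds on entry");
        let old_promised = self.promised;
        let old_accepted = self.accepted;
        let ok = ballot >= self.promised;
        if ok {
            self.promised = ballot;
            self.accepted = Some((ballot, value));
        }
        assert!(self.promised >= old_promised, "postcondition: promise is monotonic");
        assert!(ok || self.accepted == old_accepted, "postcondition: rejection changes nothing");
        assert!(self.invariant(), "postcondition: invariant preserved");
        ok
    }
}

/// Small deterministic generator, used here in place of a property-testing crate.
fn lcg(state: &mut u64) -> u64 {
    *state = (*state)
        .wrapping_mul(6364136223846793005)
        .wrapping_add(1442695040888963407);
    *state >> 33
}

/// Property: for any message sequence, every contract holds at every step.
fn check_random_sequences(cases: u32, steps: u32) {
    let mut seed = 42u64;
    for _ in 0..cases {
        let mut acceptor = Acceptor::new();
        for _ in 0..steps {
            let ballot = lcg(&mut seed) % 16;
            if lcg(&mut seed) % 2 == 0 {
                acceptor.prepare(ballot);
            } else {
                let value = lcg(&mut seed);
                acceptor.accept(ballot, value);
            }
        }
    }
}
```

Calling `check_random_sequences(100, 50)` drives 100 random sequences of 50 messages each; any contract violation would surface as a panic naming the failed condition. A real property-testing library would add input shrinking and configurable generators on top of this idea.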
Aggressive Performance Tuning
Once correctness was established, the developer spent three weeks on performance engineering. The AI was used not just to write code, but as a data analyst in a closed-loop optimization cycle:
- Instrumentation: AI instruments latency metrics across all code paths.
- Analysis: AI writes Python scripts to analyze trace logs and calculate quantiles to find bottlenecks.
- Optimization: AI proposes and implements optimizations (e.g., minimizing allocations, zero-copy techniques, and removing async overhead).
- Verification: Re-measuring performance to validate the gain.
This iterative process resulted in a throughput increase from ~23K ops/sec to ~300K ops/sec on a single laptop.
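The analysis step of this loop can be sketched as follows. The source describes Python scripts; this version is written in Rust purely for illustration, and the stage names, the `(stage, latency_us)` sample shape, and the nearest-rank quantile definition are all assumptions rather than details of the project's tooling.

```rust
use std::collections::BTreeMap;

/// Nearest-rank quantile over an ascending-sorted slice.
/// Returns None for empty input or a q outside [0, 1].
fn quantile(sorted: &[u64], q: f64) -> Option<u64> {
    if sorted.is_empty() || !(0.0..=1.0).contains(&q) {
        return None;
    }
    // 1-based rank of the smallest sample covering fraction q of the data.
    let rank = (q * sorted.len() as f64).ceil() as usize;
    Some(sorted[rank.max(1) - 1])
}

/// Groups (stage, latency_us) samples by stage and reports (p50, p99) per stage.
fn summarize(samples: &[(&str, u64)]) -> BTreeMap<String, (u64, u64)> {
    let mut by_stage: BTreeMap<String, Vec<u64>> = BTreeMap::new();
    for (stage, latency_us) in samples {
        by_stage.entry(stage.to_string()).or_default().push(*latency_us);
    }
    let mut report = BTreeMap::new();
    for (stage, mut latencies) in by_stage {
        latencies.sort_unstable();
        if let (Some(p50), Some(p99)) = (quantile(&latencies, 0.5), quantile(&latencies, 0.99)) {
            report.insert(stage, (p50, p99));
        }
    }
    report
}

/// The stage with the highest p99 is the first candidate for optimization.
fn slowest_stage(report: &BTreeMap<String, (u64, u64)>) -> Option<&str> {
    report
        .iter()
        .max_by_key(|(_, (_, p99))| *p99)
        .map(|(stage, _)| stage.as_str())
}
```

Re-running the same summary after each change gives the verification step a like-for-like comparison: the optimization counts as a gain only if the targeted stage's tail latency actually drops.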
Critical Perspectives and Counterpoints
While the productivity gains are staggering, the project sparked significant debate among the developer community regarding the quality of AI-generated Rust.
The "AI Slop" Concern
Several critics pointed out that the original RSL library was only 36K lines of C++. The fact that the AI-generated Rust version reached 130K lines suggests a lack of conciseness and potential "slop."
"Rust is supposed to be more expressive and concise. Yet, AI generated 130k LoCs. I guess nobody understands how this code works and nobody can tell if it actually works."
The Idiomatic Rust Problem
Another common critique is that AI often struggles with Rust's borrow checker, leading to unidiomatic code. To satisfy the compiler, AI agents frequently resort to excessive .clone() calls or wrapping everything in Arc<Mutex<...>>.
"LLMs path straight to the goal. Problem: code doesn't compile. Solution: more clone()"
This suggests that while AI can produce code that is safe in the Rust sense (it compiles and passes tests), it may do so at the cost of efficiency and maintainability, bypassing the architectural thinking usually required to resolve borrow-checker errors properly.
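The pattern the critics describe can be shown with a small contrast. The `Entry` type and function names below are hypothetical and are not drawn from the project; the point is only that a signature which takes ownership pushes callers toward .clone(), while a borrowing signature does the same work without copying.

```rust
/// Hypothetical log entry used only for this illustration.
#[derive(Clone, Debug, PartialEq)]
struct Entry {
    slot: u64,
    payload: Vec<u8>,
}

/// Clone-driven signature: it takes ownership, so a caller that still needs
/// the log afterwards must deep-copy every payload before calling.
fn payload_bytes_owned(log: Vec<Entry>) -> usize {
    log.iter().map(|e| e.payload.len()).sum()
}

/// Borrowing signature: reads through a shared slice, with no copies.
fn payload_bytes_borrowed(log: &[Entry]) -> usize {
    log.iter().map(|e| e.payload.len()).sum()
}

/// Returns a view of the entries at or after `from` without copying them.
/// Assumes `log` is sorted by slot.
fn suffix_from(log: &[Entry], from: u64) -> &[Entry] {
    let start = log.partition_point(|e| e.slot < from);
    &log[start..]
}
```

A caller holding `log: Vec<Entry>` would write `payload_bytes_owned(log.clone())` to keep using the log, but simply `payload_bytes_borrowed(&log)` with the second signature. Both return the same total; only the first allocates.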
Future Outlook
The project concludes with a vision for more autonomous AI agents. The author envisions a future where AI can handle end-to-end user story execution, automate the entire contract-to-test workflow, and conduct autonomous performance experiments without human steering. As the industry moves toward "vibe coding," the role of the architect shifts from writing lines of code to defining the constraints, contracts, and goals of the system.