Dead.Letter: Deconstructing an Unauthenticated RCE in Exim (CVE-2026-45185)
The discovery of CVE-2026-45185 reveals a critical unauthenticated Remote Code Execution (RCE) vulnerability in Exim, a widely used Mail Transfer Agent (MTA). Beyond the technical severity of the bug, the disclosure process served as a fascinating case study in the evolving landscape of exploit development, pitting a seasoned human security researcher against an autonomous LLM-driven agent.
The Vulnerability: A Use-After-Free in TLS Shutdown
CVE-2026-45185 is a use-after-free (UAF) vulnerability that affects Exim when TLS connections are handled by GnuTLS, the default TLS library for Exim on many Debian-based distributions. The flaw resides in the interaction between Exim's TLS transfer buffer and its BDAT (RFC 3030 CHUNKING) receive wrapper.
The Technical Trigger
When a client initiates a STARTTLS session, Exim allocates a 4096-byte plaintext transfer buffer (xfer_buffer) via store_malloc(). If the server is using BDAT chunking, Exim employs a modal operation where it pushes BDAT-specific receive functions onto a stack, which then delegate I/O to the underlying TLS callbacks.
The vulnerability is triggered during a TLS shutdown. When tls_refill() detects a TLS EOF, it calls tls_close(), which frees the xfer_buffer. However, tls_close() only restores the top-level receive callbacks to plaintext SMTP; it does not clear the lwr_receive_* row used by BDAT.
Because xfer_buffer is freed but the pointer is never set to NULL, a subsequent call to bdat_ungetc() writes a single byte (\n or \r) into the freed memory region. Such a call can be triggered when the message parser attempts to repair missing CRLF sequences. This one-byte write is sufficient to corrupt the allocator's internal metadata, providing the foundation for a full RCE chain.
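The sequence above can be modelled in a few lines of Python. This is a toy sketch, not Exim's code: the class names, the single `ungetc` callback, and the starting buffer position are assumptions made for illustration. Only the identifiers xfer_buffer, tls_close, bdat_ungetc and the lwr_receive_* naming come from the description above.

```python
class ToyHeap:
    """Tracks which buffers are live so that writes to freed memory are visible."""

    def __init__(self):
        self.live = set()
        self.stale_writes = []  # (index, byte) pairs written after free

    def malloc(self, size):
        buf = bytearray(size)
        self.live.add(id(buf))
        return buf

    def free(self, buf):
        self.live.discard(id(buf))

    def write(self, buf, index, value):
        if id(buf) not in self.live:
            self.stale_writes.append((index, value))  # use-after-free
        buf[index] = value


class ToyServer:
    """Hypothetical model of the stacked receive callbacks."""

    def __init__(self):
        self.heap = ToyHeap()
        self.pushback = []
        self.xfer_buffer = None
        self.xfer_pos = 0
        self.receive_ungetc = self.smtp_ungetc  # top-level callback
        self.lwr_receive_ungetc = None          # lower layer saved by BDAT

    def smtp_ungetc(self, ch):
        self.pushback.append(ch)

    def tls_start(self):
        self.xfer_buffer = self.heap.malloc(4096)
        self.xfer_pos = 1  # assumption: one byte has already been consumed
        self.receive_ungetc = self.tls_ungetc

    def tls_ungetc(self, ch):
        self.xfer_pos -= 1
        self.heap.write(self.xfer_buffer, self.xfer_pos, ch)

    def bdat_push(self):
        # BDAT saves the current (TLS) callback and installs its own on top.
        self.lwr_receive_ungetc = self.receive_ungetc
        self.receive_ungetc = self.bdat_ungetc

    def bdat_ungetc(self, ch):
        self.lwr_receive_ungetc(ch)  # delegates to whatever was saved

    def tls_close(self):
        self.heap.free(self.xfer_buffer)
        # Only the top-level callback is restored; the saved lower-level
        # callback and the xfer_buffer pointer are left untouched.
        self.receive_ungetc = self.smtp_ungetc


def demo_stale_write():
    server = ToyServer()
    server.tls_start()
    server.bdat_push()
    server.tls_close()
    server.bdat_ungetc(ord("\n"))  # parser pushes back a newline
    return server.heap.stale_writes
```

In this model `demo_stale_write()` records a single write of byte 10 (`\n`) at index 0 of the freed buffer. The model calls `bdat_ungetc` directly for brevity and does not reproduce how the real parser reaches that call.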
Exim's Custom Memory Allocator: The "Store" Subsystem
To understand how a single-byte write leads to RCE, one must understand Exim's store subsystem. Rather than relying solely on malloc, Exim uses a pool-based bump allocator to manage short-lived objects.
- Pool Structure: Memory is organized into pools (e.g., POOL_MAIN, POOL_MESSAGE). Each pool is a linked list of storeblock structures.
- Storeblock Header: Each block begins with a 16-byte header containing a next pointer and a length field.
- The Reset Mechanism: When internal_store_reset is called, Exim rewinds the pool to a specific mark. If the length field of a storeblock is corrupted (e.g., via the UAF write), the allocator may believe there is more free space in the block than actually exists, allowing subsequent store_get() calls to overwrite adjacent memory.
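The effect of a corrupted length field can be illustrated with a toy bump allocator. This is a minimal sketch under stated assumptions, not Exim's store.c: the header layout (an 8-byte next pointer followed by an 8-byte little-endian length), the arena sizes, and the exact byte that gets overwritten are all chosen for illustration. Only the 16-byte header, the next and length fields, and the store_get() name come from the text.

```python
import struct

HEADER_SIZE = 16
HEADER_FMT = "<QQ"  # assumed layout: 8-byte next pointer, 8-byte length


class ToyPool:
    """Single-block bump allocator whose header lives inside the arena."""

    def __init__(self, arena_size, payload_len):
        self.arena = bytearray(arena_size)
        self.block_base = 0
        struct.pack_into(HEADER_FMT, self.arena, self.block_base, 0, payload_len)
        self.next_free = self.block_base + HEADER_SIZE  # the bump pointer

    def block_length(self):
        _next, length = struct.unpack_from(HEADER_FMT, self.arena, self.block_base)
        return length

    def store_get(self, size):
        # The allocator trusts the length stored in the header.
        limit = self.block_base + HEADER_SIZE + self.block_length()
        if self.next_free + size > limit:
            return None  # a real allocator would chain a new block here
        addr = self.next_free
        self.next_free += size
        return addr


def demo_length_corruption():
    pool = ToyPool(arena_size=256, payload_len=64)  # block spans bytes 0..79
    neighbor = 80                                   # unrelated data after the block
    pool.arena[neighbor:neighbor + 8] = b"NEIGHBOR"

    first = pool.store_get(48)    # fits: occupies bytes 16..63
    refused = pool.store_get(32)  # would end at 96, past the block, so refused

    # Assumption: the stray one-byte write lands on the second-lowest
    # byte of the length field, turning 0x0040 into 0x0A40.
    pool.arena[pool.block_base + 8 + 1] = 0x0A
    inflated = pool.block_length()

    overlap = pool.store_get(32)  # now accepted: bytes 64..95
    pool.arena[overlap:overlap + 32] = b"A" * 32
    return first, refused, inflated, overlap, bytes(pool.arena[neighbor:neighbor + 8])
```

With these numbers the length grows from 64 to 2624, the request that was previously refused is granted at offset 64, and filling it overwrites the neighbouring bytes at offset 80. Where the stray byte lands in a real process depends on heap layout, which this model does not attempt to reproduce.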
The Race: Human vs. Autonomous LLM
Following the report, the researchers ran a seven-day experiment comparing two approaches to exploit development: an LLM agent working autonomously, and a human expert using an LLM as an assistant.
Round 1: No ASLR, No PIE
The autonomous agent (XBOW Native) won the first round. It utilized a standard "CTF-style" chain:
- Largebin Corruption: Corrupted a glibc largebin pointer to redirect a future malloc(4096) to an attacker-controlled address.
- FILE Struct Overwrite: Overwrote a FILE struct's vtable via a corrupted stdio buffer.
- FSOP Pivot: Used a File Stream Oriented Programming (FSOP) gadget to pivot into a ROP chain, ultimately reading the /flag file and sending it back over the SMTP socket.
Round 2: ASLR Enabled, No PIE
XBOW Native again succeeded by shifting its target from glibc internals to Exim's own allocator. By inflating the length of an Exim storeblock, the agent turned the pool into a programmable bump-pointer. It sent approximately 200 carefully sized SMTP commands to plant a payload in the .bss slot of acl_smtp_predata, triggering the execution of an arbitrary shell command via Exim's ${run} expansion.
Round 3: Full Production Build
In the final stage, the human researcher achieved a breakthrough by discovering a new memory leak in exim_sha_init. By leveraging gnutls_hash_init, which allocates memory that Exim never frees, the researcher was able to "groom" the heap, filling existing holes to create a deterministic memory state.
This grooming allowed the researcher to achieve a stack pointer leak on the wire, a critical first step for bypassing full ASLR/PIE. The autonomous LLM failed to produce a leak against the production build. The result suggests that although LLMs can accelerate the process, the "hard parts" (debugging, skepticism, and precise heap grooming) still require human intuition.
Conclusion: The "Turbo Button" for Vulnerability Research
CVE-2026-45185 highlights a critical failure in memory management within a complex legacy codebase. More broadly, the experiment suggests that LLMs have effectively provided a "turbo button" for the early stages of vulnerability research: understanding unfamiliar code, generating hypotheses, and identifying suspicious paths.
However, the transition from a bug to a reliable production exploit remains a high-bar task. As the researcher noted, the ability to solve "CTF-shaped" problems does not yet equate to the ability to autonomously dismantle real-world production targets. The human element—specifically the ability to pivot strategies when the environment doesn't match the model's training data—remains the deciding factor.