Pushing the Z80 to the Limit: Real-Time 3D Rendering on the ZX Spectrum

May 18, 2026

For many developers, the ZX Spectrum is a nostalgic relic of the 1980s, a machine with a Z80 CPU and a mere 48K of RAM. However, for those interested in the intersection of mathematics and low-level engineering, it remains a fascinating playground for optimization. Implementing real-time 3D graphics on such limited hardware is less an exercise in nostalgia than a masterclass in squeezing every possible cycle out of a processor.

In a recent project by Thanassis (ttsiodras), a 3D points renderer was ported to the ZX Spectrum 48K. The project demonstrates a journey from high-level C code to highly optimized Z80 assembly, revealing the stark performance differences that emerge when a developer takes direct control of the hardware registers.

The Performance Gap: C vs. Assembly

Implementing 3D projection on a Z80 is inherently difficult because the processor lacks floating-point support and has limited registers. The initial implementation was written in C using the z88dk cross-compiler. While functional, the C version struggled to maintain a playable frame rate, achieving only 6.2 frames per second (fps).

To improve this, the author transitioned to hand-written Z80 assembly. By making better use of the Z80 registers than a general-purpose C compiler could, and replacing costly division operations with multiplication via reciprocal lookup tables, the performance jumped to 14.0 fps.

For those seeking even more speed, a precomputed version was developed. By calculating target pixels and video RAM locations in advance, the renderer reached a staggering 40 fps. This highlights a recurring theme in retro-computing: when runtime computation is too expensive, move the work to the build pipeline.
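To make the idea of precomputing "target pixels and video RAM locations" concrete, here is a minimal sketch in C of what a build-time tool could emit for each plotted pixel. The address formula is the standard ZX Spectrum display file layout; the `PlotEntry` struct and the `precompute_plot` name are illustrative assumptions, not taken from the author's code.

```c
#include <assert.h>
#include <stdint.h>

/* One precomputed plot: where to write and which bit to set.
   (Hypothetical layout, for illustration only.) */
typedef struct {
    uint16_t addr;  /* byte address inside the display file */
    uint8_t  mask;  /* single bit selecting the pixel within that byte */
} PlotEntry;

/* Map a pixel (x: 0..255, y: 0..191) to the Spectrum display file,
   which starts at 0x4000 and interleaves the bits of y. */
static PlotEntry precompute_plot(uint8_t x, uint8_t y)
{
    PlotEntry e;
    assert(y < 192);
    e.addr = (uint16_t)(0x4000
           | ((y & 0xC0) << 5)    /* which third of the screen */
           | ((y & 0x07) << 8)    /* pixel row within the character cell */
           | ((y & 0x38) << 2)    /* character row within the third */
           | (x >> 3));           /* byte column */
    e.mask = (uint8_t)(0x80 >> (x & 7));  /* leftmost pixel is bit 7 */
    return e;
}
```

With a table of such entries generated offline, the runtime loop only has to read an address and OR a mask into it, which is consistent with the large jump in frame rate reported above.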

Mathematical Optimizations for Limited Hardware

To make 3D rendering viable on a 3.5 MHz processor, the author employed several clever mathematical shortcuts to simplify the projection equations.

Orbiting the Viewpoint

Rather than rotating the 3D model (which would require complex matrix multiplications), the author changed the logic to orbit the viewpoint around the model. This simplified the runtime equations to their barest essentials:

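int wxnew = points[i][0] - mcos;                /* depth of the point as seen from the orbiting viewpoint */
int x = 128 + ((points[i][1] + msin) / wxnew);  /* screen X, centred on the 256-pixel width */
int y = 96 - (points[i][2] / wxnew);            /* screen Y, centred on the 192-pixel height */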

This approach eliminates multiplications and shifts from the per-point work, leaving only two divisions, two additions, and two subtractions.
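As a worked example, the three lines above can be wrapped in a small function and fed concrete numbers. This is only a sketch: the `project_point` name, the pointer-based outputs, and the non-zero depth check are assumptions added for illustration, and a host `int` is wider than the 16-bit values a Z80 build would use.

```c
#include <assert.h>

/* Project one pre-scaled point onto the 256x192 screen.
   p holds the point in the same order as points[i] in the snippet above;
   mcos and msin are the viewpoint's orbit offsets for the current frame. */
static void project_point(const int p[3], int mcos, int msin, int *x, int *y)
{
    int wxnew = p[0] - mcos;                 /* depth from the viewpoint */
    assert(wxnew != 0);                      /* assumption: caller avoids zero depth */
    *x = 128 + ((p[1] + msin) / wxnew);      /* first division */
    *y = 96 - (p[2] / wxnew);                /* second division */
}
```

For a point {100, 250, -300} with mcos = 50 and msin = 50, the depth is 50, so x = 128 + 300/50 = 134 and y = 96 + 300/50 = 102.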

The Build Pipeline and Pre-scaling

Because the Z80 has no floating-point hardware, all source data (originally in Python) is pre-scaled by a factor of $S = 8960$ and converted to integers at build time. The build pipeline also performs an axis swap (changing [X, Y, Z] to [X, Z, Y]). This lets the renderer compute depth and screen-Y first; if a point falls outside the vertical bounds of the screen, the renderer skips the screen-X calculation entirely, saving a full division for that point.
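The two ideas in this section, pre-scaling and the vertical early-out, can be sketched in C as follows. The scale factor comes from the text; everything else is an assumption for illustration, including the function names, the round-to-nearest conversion, and the generic `depth_term`, `y_term`, and `x_term` parameters, which stand in for whatever operands the real renderer feeds to its divisions.

```c
#include <assert.h>

#define SCALE 8960  /* pre-scale factor S quoted in the text */

/* Build-time step: turn a floating-point coordinate into a scaled integer.
   Rounding to nearest is an assumption; the original may truncate. */
static int prescale(double v)
{
    double scaled = v * SCALE;
    return (int)(scaled + (scaled >= 0 ? 0.5 : -0.5));
}

/* Runtime step: returns 1 and fills *x, *y if the point lands on screen,
   otherwise 0. Screen Y is computed first so that a vertical miss
   never pays for the second division. */
static int project_visible(int depth_term, int y_term, int x_term,
                           int *x, int *y)
{
    int sx, sy;
    assert(depth_term != 0);
    sy = 96 - (y_term / depth_term);
    if (sy < 0 || sy > 191)
        return 0;                    /* off screen vertically: skip X */
    sx = 128 + (x_term / depth_term);
    if (sx < 0 || sx > 255)
        return 0;                    /* off screen horizontally */
    *x = sx;
    *y = sy;
    return 1;
}
```

Ordering the checks this way only pays off when a meaningful share of points miss vertically, but the test itself is cheap compared with a software division.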

Reciprocal Lookup Tables

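The Z80 has no divide instruction (and no multiply instruction either), so division must be done in software and is one of the most expensive operations in the inner loop. To avoid it, the author used "page-based" lookups into a table of reciprocals. With the table aligned to a 256-byte page, loading the high byte of the table's address into the H register and the index into the L register lets the CPU read the reciprocal from (HL) with a single memory access and no address arithmetic. The division then becomes a multiplication by that reciprocal, which is considerably cheaper.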
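The arithmetic behind a reciprocal table can be sketched in C. This is not the author's table: the 0.16 fixed-point format, the round-up rule, the 8-bit operand sizes, and the names `recip`, `init_recip`, and `div_by_recip` are all assumptions chosen so that the example is exact and easy to check.

```c
#include <assert.h>
#include <stdint.h>

/* recip[d] holds 65536 / d rounded up; recip[0] is unused.
   A 32-bit element is used here only so that d == 1 fits;
   a real Z80 table would pick a format matching its word size. */
static uint32_t recip[256];

/* Fill the table once (on real hardware this would be build-time data). */
static void init_recip(void)
{
    unsigned d;
    for (d = 1; d < 256; d++)
        recip[d] = (65536u + d - 1) / d;
}

/* Compute n / d with one table read, one multiply and one shift.
   With 8-bit n and the rounded-up reciprocal, the result matches
   integer division for every d in 1..255. */
static uint8_t div_by_recip(uint8_t n, uint8_t d)
{
    assert(d != 0);
    return (uint8_t)((n * recip[d]) >> 16);
}
```

Rounding the reciprocal up rather than down matters: with a rounded-down entry, 6 / 3 would evaluate to 1 instead of 2.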

Technical Insights from the Community

The project sparked a discussion among retro-programming enthusiasts regarding the nature of assembly and modern tooling. One contributor, @flohofwoe, noted that the perceived "slowness" of 80s development wasn't necessarily due to assembly language itself, but the lack of modern tooling. With a modern IDE, a macro assembler, and a fast emulator debug loop, assembly development can be nearly as productive as high-level languages, provided the developer is comfortable with manual data layout and subroutine calls as their primary abstractions.

Another optimization tip, shared by @Dwedit, suggests treating the Z80 like a Game Boy CPU, which has no IX or IY registers at all: use them sparingly, because instructions that touch them carry an extra prefix byte and take more cycles than the equivalents on the main register pairs. Instead, 256-byte aligned tables let the developer use the high byte of a register pair as the table base and the low byte as the index, so a lookup needs no address arithmetic.
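The addressing trick can be modelled in C by treating memory as a 64K byte array. This is a host-side illustration rather than Z80 code, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Form a 16-bit address the way a register pair such as HL does:
   high byte = page of the table, low byte = index into it. */
static uint16_t page_address(uint8_t page, uint8_t index)
{
    return (uint16_t)((page << 8) | index);
}

/* Read entry `index` of a 256-byte table that starts at page * 256
   inside a 64K memory image. No addition is needed, only byte placement. */
static uint8_t page_lookup(const uint8_t *memory, uint8_t page, uint8_t index)
{
    assert(memory != 0);
    return memory[page_address(page, index)];
}
```

Because the index occupies the whole low byte, it can never carry into the high byte, which is why the table must start on a 256-byte boundary and hold at most 256 one-byte entries.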

Conclusion

The port of the 3D points renderer to the ZX Spectrum serves as a reminder that hardware constraints often drive the most creative engineering. By shifting computation to the build pipeline, simplifying the geometry through viewpoint orbiting, and bypassing the limitations of C compilers through hand-optimized assembly, it is possible to achieve smooth 3D visuals on hardware that was never designed for them.
