Monthly Archives: June 2010

RBox: A DIY 32-bit game console for the price of a latte

Uses the smallest and cheapest 32 bit CPU to generate 3D graphics and sound.

The RBox is a game console that is simple enough to build on the prototype area of a dev kit; no PCB required, just a crystal, a few capacitors and resistors.



320×240 composite or s-video output generated entirely in software
256 colors with standard palette, up to 8k colors
8-bit 15 kHz stereo audio
~$1 Analog joystick
~$1 CPU

A bit of history…
Ever since the Atari VCS generated video by Racing the Beam, I have wanted to build something that generates video on the fly. There have been lots of great DIY examples of this in the intervening years (Rickard Gunee’s Picpong and SX Tetris, the lovely Uzebox), but the arrival of the ARM Cortex M0 parts from NXP rekindled my interest. The NXP LPC111X family is the smallest 32-bit CPU around. It is an ARM Cortex M0, the same device I used in the Wikipedia reader, and the family starts at under $1.

The LPC111X parts have ample horsepower for black and white video, which many less powerful chips have managed in the past. At a max clock speed of 48 MHz they require 2 clocks to access memory or GPIO, so you can write a blit loop fast enough for 320 pixels of horizontal resolution (each pixel at 320×240 takes 8 CPU clocks, enough time for read/palette lookup/write/test/loop). Add a couple of resistors for a DAC, add sync pulses, and you are in business. Overclock to 57.27272 MHz and you are running at a multiple (16×) of the NTSC chroma carrier, so you can generate colorburst and manipulate chroma phase. Now I had nice color bars, but it wasn’t clear how to generalize this to a practical system that could be programmed to generate interesting and useful images.
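
The clock numbers above are easy to sanity-check on a desktop. The sketch below is plain arithmetic, not driver code; the only figure not taken from the text is the standard NTSC subcarrier frequency (315/88 MHz), and the constant names are made up for illustration.

```python
# Timing sanity check for the figures quoted in the text (not driver code).
NTSC_CHROMA_HZ = 315e6 / 88        # NTSC color subcarrier, about 3.579545 MHz
CRYSTAL_HZ = 4 * NTSC_CHROMA_HZ    # the common "14.318 MHz" crystal
CPU_HZ = 16 * NTSC_CHROMA_HZ       # about 57.27272 MHz, i.e. 4x the crystal

CLOCKS_PER_PIXEL = 8               # from the text
PIXELS_PER_LINE = 320              # from the text

clocks_per_chroma_cycle = CPU_HZ / NTSC_CHROMA_HZ
pixels_per_chroma_cycle = clocks_per_chroma_cycle / CLOCKS_PER_PIXEL
active_line_us = PIXELS_PER_LINE * CLOCKS_PER_PIXEL / CPU_HZ * 1e6

print(f"crystal:                 {CRYSTAL_HZ / 1e6:.5f} MHz")
print(f"CPU clock:               {CPU_HZ / 1e6:.5f} MHz")
print(f"CPU clocks/chroma cycle: {clocks_per_chroma_cycle:.0f}")
print(f"pixels per chroma cycle: {pixels_per_chroma_cycle:.0f}")
print(f"320 pixels take:         {active_line_us:.1f} us")
```

At 8 clocks per pixel, each chroma cycle spans exactly two pixels, and 320 pixels fit comfortably inside the visible part of an NTSC line.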

The real breakthrough came when I realized I could repurpose SPI to manipulate chroma phase while the CPU used GPIO to write luma. SPI also has a 16 byte FIFO on these parts, which allowed chroma writes to be queued, relieving pressure on the luma timing. With SPI emitting bits at 1/2 the CPU clock rate I could get 8 bits per chroma clock, enough to do 8 different phases for 8 different hues. All the other 248 combinations of those bits generated other hues and levels of saturation, suggesting a palette of some sort might be useful.
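
To see why 8 bits per chroma cycle gives 8 hues, it helps to model the SPI output as an 8-sample square wave and look at its component at the chroma frequency. This is a deliberately simplified model, not how the hardware was analysed: it assumes bits are shifted out MSB first and ignores the analog filtering in the TV. The function names are hypothetical.

```python
import cmath
import math

BITS_PER_CYCLE = 8  # SPI bits emitted per chroma cycle, per the text


def fundamental(pattern):
    """Return (phase_degrees, magnitude) of the chroma-frequency component
    of an 8-bit pattern. Assumes MSB-first shifting (an assumption)."""
    total = 0j
    for n in range(BITS_PER_CYCLE):
        bit = (pattern >> (BITS_PER_CYCLE - 1 - n)) & 1
        total += bit * cmath.exp(-2j * math.pi * n / BITS_PER_CYCLE)
    return math.degrees(cmath.phase(total)) % 360.0, abs(total)


def rotate_right(pattern, k):
    """Rotate an 8-bit pattern right by k bits (delays it by k SPI bits)."""
    k %= 8
    return ((pattern >> k) | (pattern << (8 - k))) & 0xFF


base = 0b11110000
for k in range(8):
    p = rotate_right(base, k)
    phase, mag = fundamental(p)
    print(f"{p:08b}  phase {phase:6.1f} deg  magnitude {mag:.2f}")
```

In this model each one-bit rotation shifts the phase by 45 degrees while the magnitude stays the same, which corresponds to 8 hues at equal saturation. Patterns such as 0b10101010 have no component at the chroma frequency at all, and patterns in between give intermediate magnitudes, which lines up with the remark about the other combinations producing different levels of saturation.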

Color generation solved, now I needed graphics. A frame buffer was out of the question: at 8 bits per pixel a 320×240 frame buffer is 75k. These devices have 2k to 8k of total memory, so another approach was needed. After fiddling with tiles (as in Uzebox) I eventually settled on a line buffer approach where the application sends individual lines to the video driver for display, allowing code that looks and feels like it is working with a frame buffer without the actual memory.
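
A toy model of the line buffer idea, for illustration only. The class and callback names here are invented and are not the RBox driver's API; the point is just that the application is asked for one 320-byte line at a time, so memory use is one line rather than one frame.

```python
WIDTH, HEIGHT = 320, 240


def frame_buffer_bytes(width, height, bits_per_pixel=8):
    """Memory a full frame buffer would need."""
    return width * height * bits_per_pixel // 8


class LineVideoDriver:
    """Stand-in for the video driver (hypothetical interface): it owns a
    single line buffer and asks the application to fill it per scanline."""

    def __init__(self, render_line):
        self.render_line = render_line
        self.line = bytearray(WIDTH)

    def scan_frame(self, emit):
        for y in range(HEIGHT):
            self.render_line(y, self.line)   # application draws line y
            emit(y, bytes(self.line))        # "hardware" consumes it


def gradient(y, line):
    """Example application callback: a diagonal gradient."""
    for x in range(len(line)):
        line[x] = (x + y) & 0xFF


shown = []
driver = LineVideoDriver(gradient)
driver.scan_frame(lambda y, data: shown.append(data))

print("full frame buffer:", frame_buffer_bytes(WIDTH, HEIGHT), "bytes")
print("line buffer:      ", len(driver.line), "bytes")
print("lines emitted:    ", len(shown))
```

On the real device the emit step is the timing-critical pixel loop; here it just collects the lines so the flow can be inspected.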

There are 2 pixel formats supported: 5 bits of luma with a 3 bit chroma index, or 4 bits of luma with a 4 bit chroma index. The chroma indexes map to the actual 8 bit value emitted by SPI. The chroma palette can be changed every line, allowing up to 8k colors on the screen at the same time. With 8 bit color, filling the line buffer is simply a matter of 1 byte per pixel blitting, and things like smooth scrolling become very simple.
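
The two pixel formats can be sketched as simple bit packing. The bit order is an assumption (luma in the high bits, chroma index in the low bits); the text does not say which way round they are stored, and the function names and palette values are hypothetical.

```python
def pack_5_3(luma, chroma_index):
    """5 bits of luma, 3 bit chroma index (assumed layout: LLLLLCCC)."""
    assert 0 <= luma < 32 and 0 <= chroma_index < 8
    return (luma << 3) | chroma_index


def unpack_5_3(pixel):
    return pixel >> 3, pixel & 0x07


def pack_4_4(luma, chroma_index):
    """4 bits of luma, 4 bit chroma index (assumed layout: LLLLCCCC)."""
    assert 0 <= luma < 16 and 0 <= chroma_index < 16
    return (luma << 4) | chroma_index


def unpack_4_4(pixel):
    return pixel >> 4, pixel & 0x0F


# A per-line chroma palette maps each index to the 8 bit value sent to SPI.
# These particular values are placeholders, not the real RBox palette.
palette = [0x00, 0xF0, 0x78, 0x3C, 0x1E, 0x0F, 0x87, 0xC3]

pixel = pack_5_3(luma=20, chroma_index=3)
luma, index = unpack_5_3(pixel)
print(f"pixel byte 0x{pixel:02X} -> luma {luma}, SPI bits {palette[index]:08b}")

# One way to arrive at the "8k" figure: 32 luma levels x 256 possible SPI values.
print("distinct luma/chroma combinations:", 32 * 256)
```

Because either format fits in one byte, the line buffer stays at one byte per pixel regardless of which format a line uses.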

The video driver generates 3 interrupts per line. i0 fires at the start of the line to pull hsync low and emit an audio sample: 1 PCM sample is emitted per channel per scanline, giving 15.7 kHz, which is the line rate of NTSC. i1 happens at the end of hsync: the driver releases hsync, feeds the colorburst data into the SPI FIFO, then returns. The active video interrupt, i2, actually emits the pixels and uses about 70% of the CPU.
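
The per-line budget can be checked with a little arithmetic. The only number not from the text is the NTSC standard's 227.5 chroma cycles per scanline; the constant names are made up for illustration.

```python
NTSC_CHROMA_HZ = 315e6 / 88      # about 3.579545 MHz
CHROMA_CYCLES_PER_LINE = 227.5   # NTSC standard
CPU_CLOCKS_PER_CHROMA = 16       # 57.27272 MHz / 3.579545 MHz
CLOCKS_PER_PIXEL = 8             # from the text
PIXELS_PER_LINE = 320            # from the text

clocks_per_line = CHROMA_CYCLES_PER_LINE * CPU_CLOCKS_PER_CHROMA
line_rate_hz = NTSC_CHROMA_HZ / CHROMA_CYCLES_PER_LINE
pixel_clocks = PIXELS_PER_LINE * CLOCKS_PER_PIXEL
pixel_share = pixel_clocks / clocks_per_line

print(f"CPU clocks per line:  {clocks_per_line:.0f}")
print(f"line / audio rate:    {line_rate_hz:.0f} Hz")
print(f"clocks spent on i2:   {pixel_clocks}")
print(f"share of each line:   {pixel_share:.1%}")
```

The pixel loop alone accounts for roughly 70% of each line's clocks, which is consistent with the CPU figure quoted for i2; whatever is left has to cover sync, audio and the application itself.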

Much as I love this CPU, there were a few wrinkles. At this clock speed, it needs 3 wait states to access flash. If you execute code from flash on this part you can never be sure when those wait states will slow you down: the same code may run at different speeds depending on its position in flash. That is really bad if you are writing a blit loop with really tight timing requirements, like generating NTSC in software. The solution was to copy the critical routines to RAM, where they run without any wait states.

Other devices like the Cortex M3 based LPC13XX and the LPC17XX have flash accelerators and single clock gpio writes. The LPC17XX also has DMA that makes all this sort of thing really easy – 640×480 component video should be possible. Although more expensive than the M0 parts, they are still ridiculously cheap.

About the demo video

Scrolling around an 8192×2048 map of a certain hedgehog’s homeworld. The demo uses the analog joystick as input, features single pixel horizontal and vertical smooth scrolling and runs at 60fps. The audio in the background (the music, not the leafblower) is coming from the device.


500 particles spraying at random from two sources over an animated background. The texture in the back is being generated every line by multiplying the luminance of a blob pattern by sin(y). Single cycle multiplies take a lot of getting used to; it is often faster to multiply than to use shift/add “optimizations”. 60fps.
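
A rough sketch of that per-line texture trick, using integer math like the device would. Everything specific here is an assumption made for illustration: the 64 line period, the Q8 fixed-point gain, and biasing the sine so the gain stays positive. The demo's actual constants are not given in the text.

```python
import math

PERIOD = 64  # lines per sine cycle (assumed)

# Precomputed gain table in Q8 fixed point: values 1..255 stand for ~0.0..1.0.
SIN_Q8 = [int(round(128 + 127 * math.sin(2 * math.pi * y / PERIOD)))
          for y in range(PERIOD)]


def modulate_line(y, blob_luma, out):
    """Scale one line of 5-bit blob luminance by the sine gain for line y.
    One multiply and one shift per pixel, no divides."""
    gain = SIN_Q8[y % PERIOD]
    for x, luma in enumerate(blob_luma):
        out[x] = (luma * gain) >> 8


blob = bytes([0, 8, 16, 24, 31, 24, 16, 8])   # placeholder blob pattern
out = bytearray(len(blob))
for y in (0, 16, 32, 48):
    modulate_line(y, blob, out)
    print(f"line {y:2d}: gain {SIN_Q8[y % PERIOD]:3d} -> {list(out)}")
```

With a single cycle multiplier, the inner loop is one multiply and one shift per pixel, which is the kind of case where multiplying beats hand-rolled shift/add sequences.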

Platonic solids float over an animated 3D plane with weird patterns in the sky that gratuitously change color. The 3D models are rendered scanline by scanline and composited with the dynamically generated background. Once again, because we don’t have a frame buffer, we get 60fps with no tearing.

How to build

If you want to make your own, I have included the schematic and the code. I built the prototype on an LPCXpresso devkit, available at Digikey. You will need to replace the stock 12 MHz crystal with a very common 14.318 MHz crystal. I bought a reel of these, so if you can’t find one I will send you one. The analog joystick is a replacement part from a PSP, $1ish on eBay. The wiring photos are a little tricky to follow; check the schematics for a clearer version.

Source for the LPCXpresso/Eclipse project is posted on

I will be doing a PCB version, so those squeamish about little green wires can hang on ’til then.

until next time,