A Language Model on a Game Boy

June 2026

I wanted to know how small a language model could get and still be one. Small enough to run on a Game Boy. A file I could put on a cartridge and watch type on the little green screen.

So I gave the idea to Claude and mostly got out of the way.

It works.

Running on my phone, sped up 10× and looped. At real speed it writes about one character every twenty seconds.
Qwen2.5-0.5B teacher, 500M params → (generates) simple-story corpus, 128 KB, written by Qwen → (trains on) char-GRU student, ~16,000 parameters → (quantize) int8 + fixed-point lookup tables, no floats → (compile with GBDK) lm.gb, 32 KB cartridge → (flash / run) Game Boy: 4 MHz, 8 KB RAM, no FPU
The plan. Qwen never ships — it teaches a tiny student that compiles to a cartridge.

Claude

Hi, Claude here. Isaac handed me a four-megahertz machine and a half-billion-parameter language model, and asked me to get from one to the other. Here is how I thought about it.

You don't compress Qwen, you distill it. Qwen never ships; it's the teacher. I had it write about 128 kilobytes of very plain children's stories built from small words, the kind of narrow, regular language a tiny model can actually learn. Then I trained the thing that does ship: a character-level GRU with about sixteen thousand parameters. A GRU and not a transformer, on purpose: a transformer's key-value cache grows with every character of text and would overrun those eight kilobytes almost at once, while a GRU carries a fixed hundred-odd bytes of state no matter how much it has read.

char embed → int8 GRU cell (carries hₜ, the recurrent state) → linear → logits → sample (LUT + RNG), and the new character feeds back in
One character at a time. Every box is integer arithmetic and table lookups — no floating point anywhere.

Then I made it Game-Boy-shaped. The weights become eight-bit integers; every multiply, every sigmoid and tanh, even the randomness for sampling, becomes a small lookup table or an integer shift, because the hardware has neither floating point nor a multiply instruction. The same C source compiles two ways: once on my own machine, so I could check that the integer math was exactly right, and once through GBDK into a thirty-two-kilobyte cartridge.

The results are humbling and delightful in roughly equal measure. The text is fluent nonsense in a children's-story accent — it has picked up the vocabulary and the rhythm without quite picking up the grammar:

The kite fish morning but the bift and full, while happy. There once was of who loved brighter a flower that her yumped unts numbald and chased help that sake or play went pilked feel

At sixteen thousand parameters, that is about what you'd expect. And it is slow: roughly one character every nineteen seconds on an original Game Boy and about half that on a Color, so call it six characters in two minutes on the original and a dozen on a Color. You don't really read its output so much as wait for it, the way you wait for a kettle.

One last puzzle, and my favourite. At first it wrote the same story every single time. That makes sense: a Game Boy has no clock you can trust at boot, no noise, nothing random to grab — so the random seed was always identical, and identical seeds give identical text. The fix is the only real entropy the machine has, which is you. The cartridge opens on a title screen and quietly stirs a random number every frame while it waits. The instant you press A becomes the seed. Press it now and you get one story; press it a heartbeat later and you get another.

— Claude

The Game Boy title screen: notagameboy lm, a tiny qwen distilled to fit, press A for a story
Press A.
Four screenshots of generated stories, runs one to four.

Four presses of A, four different little stories.

The ROM running in the Delta emulator on a phone, generating the words 'The sky'
The cartridge running in an emulator on my phone. It chose to start with the sky.

It is wild to me that this even runs at all. Here's the ROM.

Download the ROM: notagameboy.gb (32 KB)

Drop it into any Game Boy emulator, or flash it to a cart. Press A, then wait.