I wanted to know how small a language model could get and still be one. Small enough to run on a Game Boy. A file I could put on a cartridge and watch type on the little green screen.
So I gave the idea to Claude and mostly got out of the way.
Claude
Hi — Claude here. Isaac handed me a four-megahertz machine and a frontier model and asked me to get from one to the other. Here is how I thought about it.
You don't compress Qwen, you distill it. Qwen never ships; it's the teacher. I had it write a few hundred kilobytes of very plain, small-word children's stories, the kind of narrow, regular language a tiny model can actually learn. Then I trained the thing that does ship: a character-level GRU with about sixteen thousand parameters. A GRU and not a transformer, on purpose: a transformer's attention cache grows with the length of the text and would overrun the Game Boy's eight kilobytes of work RAM almost at once, while a GRU carries a fixed hundred-odd bytes of state no matter how much it has read.
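To make the memory argument concrete, here is a rough back-of-the-envelope sketch in C. The hidden size, layer count and vector width are invented for illustration and are not the cartridge's real numbers; only the eight-kilobyte budget comes from the hardware.

```c
#include <stdint.h>

/* Illustrative sizes only: everything except WRAM_BYTES is an assumption. */
#define WRAM_BYTES   8192u  /* Game Boy work RAM */
#define GRU_HIDDEN   112u   /* hypothetical hidden size, one byte per unit */
#define ATTN_LAYERS  1u     /* hypothetical tiny transformer: one layer */
#define ATTN_DIM     32u    /* hypothetical key/value width, one byte each */

/* GRU: the state is just the hidden vector, however much text was read. */
static uint32_t gru_state_bytes(uint32_t chars_read) {
    (void)chars_read;       /* deliberately unused: the size never changes */
    return GRU_HIDDEN;
}

/* Transformer: one key and one value vector cached per layer per character. */
static uint32_t kv_cache_bytes(uint32_t chars_read) {
    return chars_read * ATTN_LAYERS * ATTN_DIM * 2u;
}

/* First text length at which the cache alone no longer fits in work RAM. */
static uint32_t kv_cache_overflow_at(void) {
    uint32_t n = 0;
    while (kv_cache_bytes(n) <= WRAM_BYTES) {
        n++;
    }
    return n;
}
```

Even with these generous toy sizes the cache spills after a couple of lines of text, before counting weights, the stack or the screen buffer, while the GRU's footprint stays flat.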
Then I made it Game-Boy-shaped. The weights become eight-bit integers; every multiply, every sigmoid and tanh, even the randomness for sampling, becomes a small lookup table or an integer shift, because the hardware has neither floating point nor a multiplier. The same C source compiles two ways: once on my own machine, so I could check the integer math was exactly right, and once through GBDK into a thirty-two-kilobyte cartridge.
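As an illustration of what multiply-free integer math can look like, here is a small sketch in portable C. It is not the cartridge's actual source: the function names, the fixed-point formats (16 standing for 1.0 on the way in, 255 for 1.0 on the way out) and the 17-entry table are all assumptions chosen to keep the example short.

```c
#include <stdint.h>

/* Multiply two int8 values using only shifts and adds (no '*' operator). */
static int16_t mul8(int8_t a, int8_t b) {
    /* Work on magnitudes so a negative value is never shifted. */
    uint16_t ua = (uint16_t)(a < 0 ? -a : a);
    uint16_t ub = (uint16_t)(b < 0 ? -b : b);
    uint16_t acc = 0;
    while (ub) {
        if (ub & 1u) acc = (uint16_t)(acc + ua);
        ua = (uint16_t)(ua << 1);
        ub >>= 1;
    }
    return ((a < 0) != (b < 0)) ? (int16_t)-(int16_t)acc : (int16_t)acc;
}

/* Dot product of int8 weights and activations, scaled down by 2^shift
   and saturated back into int8. */
static int8_t dot_q(const int8_t *w, const int8_t *x, uint8_t n, uint8_t shift) {
    int32_t acc = 0;
    for (uint8_t i = 0; i < n; i++) acc += mul8(w[i], x[i]);
    /* Shift the magnitude so the result rounds toward zero portably. */
    acc = (acc < 0) ? -((-acc) >> shift) : (acc >> shift);
    if (acc > 127) acc = 127;
    if (acc < -128) acc = -128;
    return (int8_t)acc;
}

/* round(255 * sigmoid(x)) for x = -4.0, -3.5, ..., 4.0 */
static const uint8_t SIGMOID_Q8[17] = {
    5, 7, 12, 19, 30, 47, 69, 96, 128, 159, 186, 208, 225, 236, 243, 248, 250
};

/* Input: fixed point with 16 == 1.0. Output: 0..255 standing for 0.0..1.0.
   Inputs outside [-4.0, 4.0] are clamped; there is no interpolation. */
static uint8_t sigmoid_q8(int8_t x) {
    int16_t u = x;
    if (u < -64) u = -64;
    if (u > 64) u = 64;
    return SIGMOID_Q8[(uint8_t)(u + 64) >> 3];
}
```

Because nothing here touches floating point, the same functions give bit-identical answers on a desktop compiler and on the handheld, which is what makes checking the integer math on a host machine meaningful.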
The results are humbling and delightful in roughly equal measure. The text is fluent nonsense in a children's-story accent — it has picked up the vocabulary and the rhythm without quite picking up the grammar:
The kite fish morning but the bift and full, while happy. There once was of who loved brighter a flower that her yumped unts numbald and chased help that sake or play went pilked feel
At sixteen thousand parameters, that is about what you'd expect. And it is slow: roughly one character every nineteen seconds on an original Game Boy, about half that time on a Color. Call it six characters in two minutes on the original. You don't really read its output so much as wait for it, the way you wait for a kettle.
One last puzzle, and my favourite. At first it wrote the same story every single time. That makes sense: a Game Boy has no clock you can trust at boot, no noise, nothing random to grab — so the random seed was always identical, and identical seeds give identical text. The fix is the only real entropy the machine has, which is you. The cartridge opens on a title screen and quietly stirs a random number every frame while it waits. The instant you press A becomes the seed. Press it now and you get one story; press it a heartbeat later and you get another.
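The press-timing trick can be sketched in a few lines. This is a host-side stand-in, not the cartridge code: the 16-bit xorshift step and the names `stir` and `seed_after_frames` are assumptions, and the frame count is passed in as a plain number so the behaviour can be checked off the hardware. On the cartridge the loop would presumably run once per video frame and stop when GBDK's `joypad()` reports `J_A`.

```c
#include <stdint.h>

/* One 16-bit xorshift step: shifts and XORs only, cheap on an 8-bit CPU. */
static uint16_t stir(uint16_t s) {
    s ^= (uint16_t)(s << 7);
    s ^= (uint16_t)(s >> 9);
    s ^= (uint16_t)(s << 8);
    return s;
}

/* Hypothetical stand-in for the title-screen wait: stir once per frame
   for as many frames as the player took to press A, then return the
   state as the seed. */
static uint16_t seed_after_frames(uint16_t start, uint16_t frames_waited) {
    uint16_t s = start ? start : 1u;   /* xorshift state must be non-zero */
    while (frames_waited--) {
        s = stir(s);
    }
    return s;
}
```

At roughly sixty frames a second, a press that lands one frame later already yields a different seed, and nobody's thumb is that repeatable.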
— Claude
Four presses of A, four different little stories.
It is wild to me that this even runs at all. Here's the ROM.
↓ Download the ROM — notagameboy.gb (32 KB)
Drop it into any Game Boy emulator, or flash it to a cart. Press A, then wait.