I wanted to generate some vanity Ethereum addresses. There is existing code on GitHub that does this on a Mac, but it runs on the CPU, and I wondered whether it could run on the GPU through Metal instead. It would not reach the speed of a CUDA GPU miner, but anything faster than the CPU is still great.
Claude was excellent here, writing the Metal code, red-teaming its own code, writing proofs where it could and tests for the rest. Overall I feel pretty ok using these keys.
I made one with a fish (f154) at the start and the end: 0xf154fb73914b7cdb7b202268dbe626713f31f154.
What it is
A from-scratch Ethereum vanity-address miner that runs on the Apple Silicon GPU through Metal. A Swift host drives a Metal compute shader, with full secp256k1 and Keccak-256 implementations written directly in Metal Shading Language. It mines addresses matching a chosen prefix and suffix at around 30 to 45 million keys per second on an M1 Max. That is an early Apple Silicon chip, and a newer one could probably push it to around 60 million per second.
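Here is a minimal CPU reference sketch in Python that shows the work behind one candidate. It multiplies the private key by the secp256k1 generator to get the public key, hashes the 64-byte uncompressed key with Keccak-256, takes the last 20 bytes as the address, and then checks the prefix and suffix. It is an illustration, not the Metal kernel. Every function name here is hypothetical, the curve code is textbook variable-time double-and-add, and it is far too slow for mining. Ethereum uses the original Keccak padding, so Python's `hashlib.sha3_256` would give different hashes.

```python
# Minimal CPU reference for one candidate; NOT the Metal kernel (Python 3.9+).
MASK = (1 << 64) - 1
RC = [  # Keccak-f[1600] round constants
    0x0000000000000001, 0x0000000000008082, 0x800000000000808A, 0x8000000080008000,
    0x000000000000808B, 0x0000000080000001, 0x8000000080008081, 0x8000000000008009,
    0x000000000000008A, 0x0000000000000088, 0x0000000080008009, 0x000000008000000A,
    0x000000008000808B, 0x800000000000008B, 0x8000000000008089, 0x8000000000008003,
    0x8000000000008002, 0x8000000000000080, 0x000000000000800A, 0x800000008000000A,
    0x8000000080008081, 0x8000000000008080, 0x0000000080000001, 0x8000000080008008,
]

def rol(v, n):
    return ((v << n) | (v >> (64 - n))) & MASK

def keccak_f(lanes):
    """Keccak-f[1600] permutation on a 5x5 grid of 64-bit lanes, indexed lanes[x][y]."""
    for rnd in range(24):
        c = [lanes[x][0] ^ lanes[x][1] ^ lanes[x][2] ^ lanes[x][3] ^ lanes[x][4] for x in range(5)]
        d = [c[(x + 4) % 5] ^ rol(c[(x + 1) % 5], 1) for x in range(5)]
        for x in range(5):  # theta
            for y in range(5):
                lanes[x][y] ^= d[x]
        x, y = 1, 0  # rho and pi, combined
        current = lanes[x][y]
        for t in range(24):
            x, y = y, (2 * x + 3 * y) % 5
            current, lanes[x][y] = lanes[x][y], rol(current, ((t + 1) * (t + 2) // 2) % 64)
        for y in range(5):  # chi
            row = [lanes[x][y] for x in range(5)]
            for x in range(5):
                lanes[x][y] = row[x] ^ (~row[(x + 1) % 5] & row[(x + 2) % 5])
        lanes[0][0] ^= RC[rnd]  # iota

def keccak256(data: bytes) -> bytes:
    """Ethereum's Keccak-256: original Keccak padding (0x01), not SHA3-256 (0x06)."""
    rate = 136
    msg = bytearray(data) + b"\x01"
    msg += b"\x00" * (-len(msg) % rate)
    msg[-1] |= 0x80
    lanes = [[0] * 5 for _ in range(5)]
    for off in range(0, len(msg), rate):
        for i in range(rate // 8):
            lanes[i % 5][i // 5] ^= int.from_bytes(msg[off + 8 * i: off + 8 * i + 8], "little")
        keccak_f(lanes)
    return b"".join(lanes[i][0].to_bytes(8, "little") for i in range(4))

# secp256k1 field prime and generator point
P = 2**256 - 2**32 - 977
G = (0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
     0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8)

def ec_add(p1, p2):
    """Affine point addition; None is the point at infinity."""
    if p1 is None:
        return p2
    if p2 is None:
        return p1
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2 and (y1 + y2) % P == 0:
        return None
    if p1 == p2:
        lam = 3 * x1 * x1 * pow(2 * y1, -1, P) % P
    else:
        lam = (y2 - y1) * pow((x2 - x1) % P, -1, P) % P
    x3 = (lam * lam - x1 - x2) % P
    return x3, (lam * (x1 - x3) - y1) % P

def ec_mul(k, point=G):
    """Double-and-add scalar multiplication (variable-time: illustration only)."""
    result = None
    while k:
        if k & 1:
            result = ec_add(result, point)
        point = ec_add(point, point)
        k >>= 1
    return result

def address_from_key(priv: int) -> str:
    x, y = ec_mul(priv)
    pub = x.to_bytes(32, "big") + y.to_bytes(32, "big")  # uncompressed key without 0x04
    return keccak256(pub)[-20:].hex()                     # address = last 20 bytes

def matches(addr_hex: str, prefix: str, suffix: str) -> bool:
    a = addr_hex.lower().removeprefix("0x")
    return a.startswith(prefix.lower()) and a.endswith(suffix.lower())

if __name__ == "__main__":
    print(address_from_key(1), matches("0xf154fb73914b7cdb7b202268dbe626713f31f154", "f154", "f154"))
```

You can sanity-check a port like this against well-known vectors. Keccak-256 of the empty string is `c5d2460186f7233c927e7db2dcc703c0e500b653ca82273b7bfad8045d85a470`, and private key 1 maps to the address `0x7e5f4552091a69125d5dfcb7b8c2659029395bdf`.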
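The speed comes from two tricks. The first is Montgomery's batch-inversion trick, which shares one modular inverse across roughly 128 candidates. The second is keeping the elliptic-curve points alive across dispatches, so each dispatch picks up where the last one stopped instead of recomputing its points from scratch. Together they take it from about half a million keys per second to tens of millions.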
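Montgomery's trick replaces n separate modular inversions with a single inversion plus roughly 3n multiplications. Below is a minimal Python sketch over the secp256k1 field prime. The name `batch_inverse` is hypothetical. The comment about affine point additions is my assumption about how a miner like this would use the trick; the post does not say so.

```python
# secp256k1 field prime
P = 2**256 - 2**32 - 977

def batch_inverse(values):
    """Invert every element of `values` mod P using ONE modular inversion.

    Assumed use in a miner: each candidate step is an affine addition Q_i + G,
    whose slope needs 1 / (x_G - x_i); batching those denominators shares one
    inverse across the whole batch. All inputs must be non-zero mod P.
    """
    n = len(values)
    prefix = [1] * n
    acc = 1
    for i, v in enumerate(values):
        prefix[i] = acc            # product of values[0..i-1]
        acc = acc * v % P
    inv = pow(acc, -1, P)          # the single expensive inversion (Python 3.8+)
    out = [0] * n
    for i in range(n - 1, -1, -1):
        out[i] = prefix[i] * inv % P   # (v0..v_{i-1}) / (v0..v_i) = 1 / v_i
        inv = inv * values[i] % P      # inv becomes 1 / (v0..v_{i-1})
    return out

if __name__ == "__main__":
    batch = list(range(1, 129))        # 128 candidates, as in the post
    print(all(v * w % P == 1 for v, w in zip(batch, batch_inverse(batch))))
```

With about 128 candidates per batch, the single inversion is spread very thinly. An inversion costs far more than a multiplication, so each candidate ends up paying about three extra multiplications instead of a full inverse.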
Why it beats what existed
The keys use the full 256-bit entropy of the OS CSPRNG and are drawn by rejection sampling, so there is no modulo bias. A machine-verified Lean proof backs the uniformity of that sampling. They are also split-key ready: the search can run against someone else's public key, so the miner never learns the final private key.
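Here is a minimal sketch of uniform key generation by rejection sampling, with Python's `secrets` module standing in for the OS CSPRNG. Valid secp256k1 private keys lie in [1, N - 1], where N is the group order. Reducing a random 256-bit number mod N would slightly favour small keys, but rejecting out-of-range draws does not. The split-key helper is hypothetical and shows only the standard arithmetic. The owner keeps a secret a and publishes aG. A miner then searches for an offset b such that aG + bG has the wanted address, and the final key is (a + b) mod N.

```python
import secrets

# secp256k1 group order: valid private keys are integers in [1, N - 1]
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141

def random_private_key() -> int:
    """Uniform key in [1, N - 1]: draw 256 bits, retry if out of range.

    No `% N` reduction, so no modulo bias. A retry happens with probability
    of roughly 2**-128, so in practice the loop runs once.
    """
    while True:
        k = int.from_bytes(secrets.token_bytes(32), "big")
        if 1 <= k < N:
            return k

def combine_split_key(owner_secret: int, mined_offset: int) -> int:
    """Hypothetical split-key helper: (a + b) mod N is the private key for aG + bG.

    Only the owner, who knows `owner_secret`, can compute this final key.
    """
    return (owner_secret + mined_offset) % N

if __name__ == "__main__":
    a = random_private_key()
    print(hex(combine_split_key(a, random_private_key())))
```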
There was also no trustworthy Metal-native ETH vanity miner: the fast tools target CUDA or OpenCL. This one runs natively on a Mac.
The pipeline ships a three-implementation trust anchor: noble in JavaScript, Foundry cast in Rust, and eth_keys in Python all have to agree. The production kernel is checked byte-for-byte against noble across more than 65,000 addresses.
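The production kernel and a reference library reach each public key by different routes, which is why a byte-for-byte comparison is worth having. The code below is a rough sketch of that idea, not the actual harness, and Python stands in for noble, cast and eth_keys. It walks consecutive keys the way an incremental kernel might, adding G at each step. It then checks every result against an independent double-and-add scalar multiplication. Only the 64-byte public keys are compared, because the address is a deterministic hash of those bytes. Both routes share one `ec_add`, so this is a much weaker check than three truly independent implementations. All names are hypothetical.

```python
# Differential-check sketch: incremental walk (kernel-style) vs direct multiplication.
P = 2**256 - 2**32 - 977
G = (0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
     0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8)

def ec_add(p1, p2):
    """Affine point addition on secp256k1; None is the point at infinity."""
    if p1 is None:
        return p2
    if p2 is None:
        return p1
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2 and (y1 + y2) % P == 0:
        return None
    if p1 == p2:
        lam = 3 * x1 * x1 * pow(2 * y1, -1, P) % P
    else:
        lam = (y2 - y1) * pow((x2 - x1) % P, -1, P) % P
    x3 = (lam * lam - x1 - x2) % P
    return x3, (lam * (x1 - x3) - y1) % P

def ec_mul(k, point=G):
    """Double-and-add scalar multiplication (reference route)."""
    result = None
    while k:
        if k & 1:
            result = ec_add(result, point)
        point = ec_add(point, point)
        k >>= 1
    return result

def pubkey_bytes(point) -> bytes:
    """64-byte uncompressed public key (x || y), the input that Keccak-256 hashes."""
    x, y = point
    return x.to_bytes(32, "big") + y.to_bytes(32, "big")

def cross_check(start_key: int, count: int) -> int:
    """Walk `count` consecutive keys by repeated +G and compare each one
    byte-for-byte with a fresh scalar multiplication. Returns keys checked."""
    walker = ec_mul(start_key)           # state a kernel would keep alive
    for i in range(count):
        reference = ec_mul(start_key + i)
        if pubkey_bytes(walker) != pubkey_bytes(reference):
            raise AssertionError(f"mismatch at private key {start_key + i}")
        walker = ec_add(walker, G)       # (k + 1)G = kG + G
    return count

if __name__ == "__main__":
    print(cross_check(1, 100), "keys agree")
```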
Speed
At around 35 million keys per second it is roughly 300 to 900 times faster than the popular browser tool and 20 to 40 times faster than a fast native multi-core CPU implementation. For scale, here is the time to find an 8-hex-character pattern such as the fish address above, which takes 16^8 (about 4.3 billion) tries on average:
| Approach | Rate | Expected time to find |
|---|---|---|
| browser / JS CPU (vanity-eth) | ~50k/s | ~1 day |
| fast native multi-core CPU | ~1M/s | ~1-2 hours |
| this Metal miner | ~35M/s | ~2 minutes |
| discrete GPU (CUDA / OpenCL) | ~1B/s | ~4 seconds |
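The table follows from simple arithmetic. Each fixed hex character matches with probability 1/16, so an 8-character pattern takes 16^8 = 4,294,967,296 tries on average. The small Python sketch below reproduces the table, using the rates from the table itself. The function names are hypothetical. It also gives the probability of a hit within a given time, because the expected time is an average, not a deadline.

```python
import math

def expected_tries(pattern_hex_chars: int) -> int:
    # Each fixed hex character matches with probability 1/16, so the number of
    # tries is geometric with mean 16**n.
    return 16 ** pattern_hex_chars

def expected_seconds(pattern_hex_chars: int, keys_per_second: float) -> float:
    return expected_tries(pattern_hex_chars) / keys_per_second

def chance_found_within(seconds: float, pattern_hex_chars: int, keys_per_second: float) -> float:
    # P(at least one hit) = 1 - (1 - 1/16**n) ** tries, computed stably.
    tries = seconds * keys_per_second
    p = 1 / expected_tries(pattern_hex_chars)
    return -math.expm1(tries * math.log1p(-p))

rates = {"browser / JS CPU": 50e3, "native multi-core CPU": 1e6,
         "this Metal miner": 35e6, "discrete GPU": 1e9}
for name, rate in rates.items():
    print(f"{name:>22}: {expected_seconds(8, rate):,.0f} s")
```

For the rates in the table it prints 85,899 s (about a day), 4,295 s, 123 s and 4 s. When exactly the expected time has elapsed, the chance of having found a match is only about 63% (1 - 1/e).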
Sharing the prompt, not the artefact
I might push it to GitHub if anyone asks, but right now I am more interested in sharing the prompt than the artefact. We will see whether that holds. Here is the one line to rebuild it with Claude Code on a Mac: