A language model predicts the next token. Train it on text reversed and it predicts the previous one. Same mechanism, run backwards through cause and effect.
I had been reading about the Reversal Curse: a model taught "A is B" does not learn "B is A". The two directions are not the same model. So I trained a few models on reversed datasets to hear what the other direction sounds like, then built a chat interface around them.
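To make the reversal concrete, here is a minimal sketch of what a backwards dataset could look like. It assumes whitespace splitting as a stand-in for a real tokenizer, and the function names are illustrative rather than the ones used in the project.

```python
def reverse_tokens(text: str) -> list[str]:
    """Split on whitespace (a stand-in for a real tokenizer) and reverse the order."""
    return text.split()[::-1]


def next_token_pairs(tokens: list[str]) -> list[tuple[list[str], str]]:
    """Build (context, target) pairs the way next-token training does."""
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]


forward = "the cat sat on the mat"
backward = reverse_tokens(forward)

# Each target is the token that came *before* its context in the original text.
pairs = next_token_pairs(backward)
for context, target in pairs:
    print(context, "->", target)
```

The training objective is untouched. Only the order of the data changes, so "next" in the reversed sequence means "previous" in the original.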
The interface is inverted. You answer, the model asks. Type "101 dalmatians" and it replies "How many dalmatians?"
Punchlines lead to setups. Effects precede causes. A story begins at "The End."
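One way a single inverted turn could work is sketched below: reverse what the user typed, let the model continue in reversed order, then flip the continuation back into reading order. This is an assumption about the plumbing, not a description of the deployed code. `toy_generate` is a hypothetical stand-in that returns a canned continuation where a reverse-trained model would go, and whitespace splitting again replaces a real tokenizer.

```python
from collections.abc import Callable


def inverted_turn(answer: str, generate: Callable[[list[str]], list[str]]) -> str:
    """Take the user's answer and return the text the model places before it."""
    prompt = answer.split()[::-1]        # the reversed answer becomes the prompt
    continuation = generate(prompt)      # the model continues in reversed order
    return " ".join(continuation[::-1])  # flip the continuation back to reading order


def toy_generate(prompt: list[str]) -> list[str]:
    """Hypothetical stand-in for a reverse-trained model: a canned continuation."""
    return ["dalmatians?", "many", "How"]


question = inverted_turn("101 dalmatians", toy_generate)
print(question)
```

The model never sees text in reading order. The flip at each end is the only thing that makes the exchange legible to a person.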
I presented it at xCoAx 2025 in Dundee. I opened with Mary Poppins reciting supercalifragilisticexpialidocious backwards, about the only time most people have heard a word run in reverse and still parsed it.
It is live to play with at chat.thanks.fish. BackStory runs there now, with the code and the model weights linked from the page.
What a model calculates as meaning is strange, complex, and alien. Mercedes Bunz, in Error is No Exception, describes machine learning as calculating, not understanding, the meaning of cultural symbols, an intelligence that is its own rather than a copy of ours. Intelligence can take many forms. A subtle change in how data is fed to the same GPT architecture we already use alters the expectations the model holds and the connections it makes. Reorder the tokens and you change the interface, the interaction, and what reads as fluent or intelligent at all. The fluency of a language model is acausal. It exists out of time.