Glossa
Learns a voice, then speaks in it.
What it is
Glossa is a Markov-chain text generator that learns the style of a corpus and produces new text in that voice. It tokenizes the input, tallies the continuations observed after each n-gram, and then samples forward from sentence-aware seeds, weighting each continuation by its tally. Its central test checks that the generator never emits a transition it did not see in training. It is a from-scratch, dependency-light build you can download and run locally.
In short, a pure Markov text generator: tokenize → tally each n-gram's continuations → sample forward, weighted, from sentence-aware seeds. It comes with 13 tests, centred on the invariant that the generator never emits a transition it didn't see in training.
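To make the tokenize → tally → sample pipeline concrete, here is a minimal TypeScript sketch. It is not Glossa's actual source: every name in it (Model, train, generate, pickWeighted, onlySeenTransitions) is hypothetical, and it assumes whitespace tokenization, sentence ends marked by ".", "!" or "?", and an injectable random source so runs can be made repeatable.

```typescript
// Illustrative sketch only; all names here are assumptions, not Glossa's API.
type Model = {
  order: number;                                  // n-gram length
  transitions: Map<string, Map<string, number>>;  // n-gram key -> continuation -> count
  seeds: string[][];                              // n-grams that start a sentence
};

const SEP = "\u0001"; // joins tokens into a map key; assumed absent from the corpus

function tokenize(text: string): string[] {
  // Assumption: split on whitespace, keep punctuation attached to words.
  return text.split(/\s+/).filter((t) => t.length > 0);
}

function endsSentence(token: string): boolean {
  return /[.!?]$/.test(token);
}

function train(text: string, order = 2): Model {
  const tokens = tokenize(text);
  const transitions = new Map<string, Map<string, number>>();
  const seeds: string[][] = [];
  for (let i = 0; i + order <= tokens.length; i++) {
    const gram = tokens.slice(i, i + order);
    // Sentence-aware seed: the n-gram opens the corpus or follows a sentence end.
    if (i === 0 || endsSentence(tokens[i - 1] ?? "")) seeds.push(gram);
    const next: string | undefined = tokens[i + order];
    if (next === undefined) continue; // last n-gram has no continuation
    const key = gram.join(SEP);
    const tally = transitions.get(key) ?? new Map<string, number>();
    tally.set(next, (tally.get(next) ?? 0) + 1);
    transitions.set(key, tally);
  }
  return { order, transitions, seeds };
}

function pickWeighted(tally: Map<string, number>, rng: () => number): string {
  const entries = Array.from(tally.entries());
  const total = entries.reduce((sum, [, count]) => sum + count, 0);
  let r = rng() * total;
  let chosen = "";
  for (const [token, count] of entries) {
    chosen = token;
    r -= count;
    if (r < 0) break; // landed inside this token's share of the total
  }
  return chosen;
}

function generate(model: Model, maxTokens: number, rng: () => number = Math.random): string[] {
  if (model.seeds.length === 0) return [];
  const seed = model.seeds[Math.floor(rng() * model.seeds.length)] ?? [];
  const out = seed.slice();
  while (out.length < maxTokens) {
    const tally = model.transitions.get(out.slice(-model.order).join(SEP));
    if (tally === undefined) break; // dead end: n-gram only seen at the corpus end
    out.push(pickWeighted(tally, rng));
  }
  return out;
}

// The invariant: every (n-gram, next token) pair in the output was seen in training.
function onlySeenTransitions(model: Model, output: string[]): boolean {
  for (let i = 0; i + model.order < output.length; i++) {
    const key = output.slice(i, i + model.order).join(SEP);
    const next = output[i + model.order] ?? "";
    if (!model.transitions.get(key)?.has(next)) return false;
  }
  return true;
}

const demoModel = train("The cat sat. The cat ran. The dog sat.", 1);
const demoOutput = generate(demoModel, 12);
console.log(demoOutput.join(" "));
console.log(onlySeenTransitions(demoModel, demoOutput));
```

Because generate only ever appends tokens drawn from the tally of the current n-gram, the final check holds by construction. A test of that shape is one plausible way to pin down the invariant described above.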
What's inside
The full source, the tests, and CI. Open it, read it, change it. A zero-dependency core, free, in the MIT spirit.
Run it after unzip
pnpm install && pnpm dev