DiffusionGemma is an experimental language model that generates text by denoising 256 tokens in parallel rather than writing one token at a time. This demo uses it to correct noisy OCR from historical newspapers, head‑to‑head against a conventional autoregressive model (Gemma‑4‑E4B).
How to use it: pick a passage below (or paste your own), press “Correct this text”, and watch the correction emerge step by step. On a 75‑passage benchmark, the diffusion model corrected more accurately than the autoregressive baseline and ran roughly 8× faster.
All experiments ran on Hugging Face Jobs — benchmark scripts & README in this bucket.
The data. 75 passages from BLN600, a corpus of 600 excerpts of 19th‑century London newspapers (largely crime reporting) from the British Library's collections, each paired with both the original OCR and a careful human transcription. That human transcription is the “right answer” every number below is measured against. Passages longer than DiffusionGemma's 256‑token output block were trimmed at a point where OCR and transcription align, so the pairs stay parallel. (BLN600 is CC‑BY‑NC, so the passages themselves aren't republished here — only these metrics.)
The task. Both models got the identical instruction (fix recognition errors only; don't modernise or rephrase) and processed one passage at a time on the same A100 GPU.
The metrics. CER / WER (character / word error rate): how far the output remains from the human transcription, counted by character / by word. The “OCR input” row shows the damage before any correction. Relative CER reduction: the share of that damage the model repaired. Over‑correction: how much text that was already right the model needlessly changed. Fix rate: how much of what was actually wrong it fixed.
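For readers who want to see how CER, WER and relative CER reduction are typically computed, here is a minimal sketch. It assumes the standard definition (Levenshtein edit distance divided by the reference length) with no text normalisation; the benchmark scripts may differ on details such as case or whitespace handling. The function names and the sample strings are invented for illustration and do not come from BLN600.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (strings or word lists)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(hypothesis, reference):
    """Character error rate: character edits per reference character."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(hypothesis, reference) / len(reference)


def wer(hypothesis, reference):
    """Word error rate: word edits per reference word (whitespace split)."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must not be empty")
    return edit_distance(hypothesis.split(), ref_words) / len(ref_words)


def relative_cer_reduction(cer_before, cer_after):
    """Share of the original OCR damage that the correction removed."""
    return (cer_before - cer_after) / cer_before


# Hypothetical example strings, not taken from the benchmark.
reference = "the prisoner was remanded"   # human transcription
ocr_input = "tbe prisoncr was remanded"   # two character errors
corrected = "the prisoncr was remanded"   # one error left after correction

before = cer(ocr_input, reference)
after = cer(corrected, reference)
print(round(before, 2))                                 # 0.08
print(round(after, 2))                                  # 0.04
print(round(relative_cer_reduction(before, after), 2))  # 0.5
print(wer(ocr_input, reference))                        # 0.5
```

In this toy case the correction repairs one of two character errors, so the relative CER reduction is 50%. A negative value would mean the model left the text further from the transcription than the raw OCR was.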
The full record. Every experiment behind these numbers — scripts, configs, findings (including the negative results) — is logged in a public experiment-log bucket; all runs executed on Hugging Face Jobs.