Unicode Hell: torture your type for fun (and profit?)

by panglott

Early in my graduate program in linguistics, I frequently needed to cite examples from different alphabets and scripts. So I put together a dummy text, mostly edited snippets from Wikipedia, to weed out fonts that can’t handle Unicode well. Serif fonts in general do poorly with this, and the best performer overall was Arial Unicode MS. Mostly I wrote in LibreOffice in Gentium Plus, or Hiragino Mincho ProN W3 for Japanese and Japanese/English texts, but even Gentium had difficulty with the wilderness of combining diacritics in a presentation on the Dené–Yeniseian hypothesis. Needless to say, this is overkill for most purposes.

“Neque porro quisquam est qui dolorem ipsum quia dolor sit amet,” wrote Cicero—there is no one who loves pain because it is pain. “Ðā ne sacað þe ætsamne ne bēoð,” said the canny Saxon—those do not quarrel who are not together. “Hwon gelpeð se þe wide siþað,” quoth he—a little boasts he who travels widely. “Kolik jazyků znáš, tolikrát jsi člověkem,” Masaryk wrote: as many languages you know, as many times you are a human being.

“Sphinx of black quartz, judge my vow” is a pangram, and thus contains every letter of the alphabet. “Příliš žluťoučký kůň úpěl ďábelské ódy” is Czech: The too-yellow horse groaned devilish odes. “Eble ĉiu kvazaŭ-deca fuŝĥoraĵo ĝojigos homtipon” is Esperanto: Maybe every quasi-fitting bungle-choir will make a human type happy. Norwegian blåbærsyltetøy is blueberry jam. The Klingon word lIghoH “He disputes you (pl.)” shows the importance of serifs. It’s hard to pronounce Antonín Leopold Dvořák [ˈantoɲiːn ˈlɛopolt ˈdvor̝aːk]. Tau, or 6.28318…, is twice pi.

English idea derives ultimately from Greek ἰδέα “form, appearance, kind,” from the Proto-Indo-European root wid- “see, know,” which is cognate with Sanskrit veda “ritual knowledge, lore” and English wise and witty. Other English words that derive ultimately from Sanskrit include ashram (Sanskrit āśrama “hermitage”), ganja (gāñjā “hemp”), pundit (paṇḍitá “scholar”), and loot (luṇṭhati “he steals”).

ハングル(朝: 한글、hangeul)は、朝鮮語を表記するための表音文字である。「ピリカ チェプ」 means “good fish” in Ainu.
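
A rough way to see what a passage like the one above asks of a font is to tally its letters by script and count its combining marks. What follows is a minimal sketch using only Python's standard unicodedata module. The function name profile_text and the sample string are illustrative assumptions, and taking the first word of each character's Unicode name is only an approximation of its script.

```python
import unicodedata
from collections import Counter


def profile_text(text):
    """Tally letters by approximate script and count combining marks.

    Hypothetical helper for illustration: the "script" is just the first
    word of the character's Unicode name (LATIN, GREEK, HANGUL, ...).
    """
    counts = Counter()
    for ch in text:
        category = unicodedata.category(ch)
        if category.startswith("M"):
            # Mn/Mc/Me: marks that a font must stack onto a base character
            counts["COMBINING"] += 1
        elif category.startswith("L"):
            name = unicodedata.name(ch, "")
            counts[name.split(" ")[0] if name else "UNNAMED"] += 1
        # digits, punctuation, and whitespace are ignored
    return counts


# Sample: "dvor" + COMBINING UP TACK BELOW + "ak", two Hangul syllables,
# and three katakana letters, written as escapes to avoid ambiguity.
sample = "dvor\u031dak \ud55c\uae00 \u30d4\u30ea\u30ab"
for script, n in profile_text(sample).most_common():
    print(script, n)
```

A high COMBINING count is a hint that the text leans on mark positioning rather than precomposed glyphs, which is where many fonts fall down.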