Introducing UTF-Random: Making Unicode Fair

UTF-RANDOM
UTF-8 for a connected world.

Geneva, Switzerland – Unicode revolutionized the written word, but it has one downside: It favors Roman languages by making the byte string representation of other scripts (Cyrillic, Greek, Asian scripts) longer than necessary.

To address this issue, a team of linguists and computer scientists has developed UTF-Random. The key innovation is the use of a probabilistic algorithm (/dev/random) that assigns a bit field to every newly created text file.

Early tests have shown promising results. For example, a Cyrillic character that previously required three bytes in UTF-8 encoding can now be represented with fewer bytes 33.33% of the time (repeating, of course).