Entropy of the Voynich text

May 26, 2015

The Shannon entropy of a string of text measures its information content. Take text that is completely random, i.e. where every character is as likely to appear as any other: its entropy (or “disorder”) is high. A long string of identical characters is the opposite case. Its entropy is low; in fact it is zero.

Mathematically, the Shannon Entropy is defined as:

$$\text{Entropy} = -\sum_{i=1}^{N} \text{prob}_i \log(\text{prob}_i)$$

where prob_i is the relative frequency of the i’th distinct character in the text (its number of occurrences divided by the total number of characters), and the sum runs over all N distinct characters.
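The formula translates almost directly into code. Below is a minimal Python sketch. It assumes base-2 logarithms, so the result is in bits per character, and it counts every character exactly as it appears, spaces and punctuation included. The post does not say which logarithm base or which text preprocessing produced the table values, so both are assumptions here. The function name `shannon_entropy` is illustrative.

```python
import math
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Return the Shannon entropy of `text` in bits per character.

    Assumptions (not stated in the post): logarithms are base 2, and every
    character, including spaces and punctuation, counts as a symbol.
    """
    if not text:
        return 0.0
    total = len(text)
    counts = Counter(text)
    # prob_i = occurrences of the i'th distinct character / total characters.
    # -p * log2(p) is written as p * log2(1/p) so a one-symbol text yields 0.0, not -0.0.
    return sum((n / total) * math.log2(total / n) for n in counts.values())


print(shannon_entropy("aaaaaaaa"))  # 0.0: one repeated character, no disorder
print(shannon_entropy("abcd"))      # 2.0: four equally likely characters
```

The two calls reproduce the extremes described above. A string of one repeated character gives zero. Four equally likely characters give log2(4) = 2 bits.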

If the Voynich text had been generated randomly (by whatever means), we would expect it to have high entropy, i.e. to be very disordered. In fact the text is ordered, with low entropy. It is more ordered than English (3.73 against 4.21), and indeed more ordered than every other language in the comparison. The table below compares the Voynich text with several texts in other languages.

Language        Source                Entropy
Voynich         GC’s Transcription    3.73
French          Text from 1367        3.97
Latin           Cantus Planus         4.05
Spanish         Medina 1543           4.09
German          Kochbuch 1553         4.15
English         Thomas Hardy          4.21
Early Italian   Divine Comedy 1300    4.23
None            Random characters     6.01

The last entry in the table shows the entropy of a random text: 6.01, about 1.6 times the entropy of the Voynich text (3.73).
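The reason random text scores so much higher is straightforward. If each of M symbols is equally likely, every prob_i equals 1/M, and the formula gives exactly log(M). No text over that alphabet can score higher. The Python sketch below checks this numerically, again assuming base-2 logs. It draws uniformly random strings over two alphabets and compares the measured entropy with the log2(M) ceiling. The alphabets, the sample length and the helper `random_text` are hypothetical choices for illustration only. The post does not say which alphabet its random text used, and the random-text figure depends directly on that choice.

```python
import math
import random
import string
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Shannon entropy in bits per character (base-2 logs assumed)."""
    if not text:
        return 0.0
    total = len(text)
    return sum((n / total) * math.log2(total / n) for n in Counter(text).values())


def random_text(alphabet: str, length: int, seed: int = 0) -> str:
    """Uniformly random string: each position is an independent, equally likely pick."""
    rng = random.Random(seed)
    return "".join(rng.choice(alphabet) for _ in range(length))


# Hypothetical alphabets, chosen only to illustrate the log2(M) ceiling.
alphabets = {
    "lowercase letters": string.ascii_lowercase,  # M = 26
    "printable ASCII": string.printable,          # M = 100
}

for name, alphabet in alphabets.items():
    sample = random_text(alphabet, 200_000)
    print(f"{name}: measured {shannon_entropy(sample):.3f} bits, "
          f"ceiling log2({len(alphabet)}) = {math.log2(len(alphabet)):.3f}")

# The opposite extreme: one repeated character carries no information.
print("identical characters:", shannon_entropy("a" * 1000))
```

A long uniform sample lands just under its ceiling. A run of a single character gives 0. Real languages, the Voynich text included, fall well below the random-text value because their characters are far from equally likely.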