Word Length Distributions

In the previous blog post, we looked at the distribution of word lengths in the EVA transcription and compared it with a binomial distribution with n=9, as per the work of Stolfi. They matched well enough, as I had treated EVA ch, sh, ain, aiin and qo as single glyphs, in a similar fashion to Stolfi:

For this page, we will define symbol as Currier did; i.e. EVA ch and sh will be counted as single symbols, and so are EVA cth, ckh, etc.

https://www.ic.unicamp.br/~stolfi/voynich/00-12-21-word-length-distr/

i.e. he reduced some of the EVA glyph sequences to single symbols.
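As a rough illustration of what such a reduction amounts to (a sketch only, not the code used for this post), the following Python counts word lengths with the listed sequences collapsed to single symbols. The names `REDUCED_GLYPHS`, `reduced_length` and `length_distribution` are made up for this example, and the choice to count every word in the supplied list once is an assumption.

```python
import re
from collections import Counter

# Glyph sequences counted as single symbols in the "reduced" EVA (taken from the post).
REDUCED_GLYPHS = ["ch", "sh", "ain", "aiin", "qo"]

# Longest alternatives first, so the longest listed sequence is tried first at each position.
_PATTERN = re.compile("|".join(sorted(REDUCED_GLYPHS, key=len, reverse=True)))


def reduced_length(word: str) -> int:
    """Length of an EVA word when each listed sequence counts as one symbol."""
    # Every matched sequence is replaced by a single placeholder character.
    return len(_PATTERN.sub("#", word))


def length_distribution(words, length_fn=len):
    """Return {length: fraction of words with that length}, sorted by length."""
    counts = Counter(length_fn(w) for w in words)
    total = sum(counts.values())
    return {k: counts[k] / total for k in sorted(counts)}
```

Calling `length_distribution(words)` gives the plain EVA distribution, while `length_distribution(words, reduced_length)` gives the reduced one. For instance, `qokaiin` has 7 EVA characters but only 3 reduced symbols (qo, k, aiin).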

Without making these reductions, i.e. leaving the EVA transcription unchanged, the distribution of course shifts towards longer words. As a sanity check, Marco Ponzi was kind enough to send me a list of VMS words he’d extracted from the ZL transcription, so that I could compare it with the words I extracted from the Takahashi EVA. In the following plot I show the three word length distributions: EVA, ZL and the reduced EVA with ch, sh, ain, aiin and qo as single glyphs.

Reassuringly, the EVA and ZL (green and blue curves) match quite well, as they should, and the Reduced matches Stolfi’s result. (Curiously, the ZL transcription has a total of 8078 different words, compared with 7552 for the Takahashi EVA, which warrants further investigation.)
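One simple way to start looking into a difference in vocabulary size like this is to diff the two sets of distinct words. The sketch below assumes each word list is already loaded as a Python list of strings; the function name `compare_vocabularies` is hypothetical.

```python
def compare_vocabularies(words_a, words_b):
    """Compare the distinct words of two word lists.

    Returns the words found only in the first list, the words found
    only in the second list, and the number of words common to both.
    """
    vocab_a, vocab_b = set(words_a), set(words_b)
    return {
        "only_a": sorted(vocab_a - vocab_b),
        "only_b": sorted(vocab_b - vocab_a),
        "common": len(vocab_a & vocab_b),
    }
```

Looking through the `only_a` and `only_b` lists by eye is usually enough to spot systematic differences, such as different handling of uncertain characters.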

The EVA distribution now matches a binomial of (n=12,p=0.5), i.e. using 12 cipher wheels with a probability of 50% for a glyph being used from each wheel.
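To make a comparison like this concrete, here is a minimal sketch of the binomial probabilities and a simple distance between an observed word length distribution and the binomial. It is not the code behind the plot. The `offset` parameter is an assumption added for illustration: it shifts the binomial so that k successes correspond to a word of length k + offset, since the post does not say whether such a shift was applied.

```python
from math import comb


def binomial_pmf(n: int, p: float = 0.5):
    """Probabilities P(k) for k = 0..n of a binomial(n, p) distribution."""
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]


def total_variation(observed, n: int, p: float = 0.5, offset: int = 0) -> float:
    """Total variation distance between observed {length: fraction} and binomial(n, p).

    0.0 means the two distributions are identical, 1.0 means they do not overlap.
    """
    pmf = binomial_pmf(n, p)
    # Map k successes to a word length of k + offset (offset is a hypothetical knob).
    expected = {k + offset: pmf[k] for k in range(n + 1)}
    lengths = set(observed) | set(expected)
    return 0.5 * sum(
        abs(observed.get(length, 0.0) - expected.get(length, 0.0))
        for length in lengths
    )
```

With n=12 and p=0.5 the most likely value is k=6, with probability 924/4096, or about 22.6%.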

  1. Claire Bowern
    August 13, 2021 at 5:42 pm

    I wonder if the differences in counts are due to unreadable (or uncertain) characters? Like how * is treated?

    • JB
      August 13, 2021 at 9:10 pm

      I took a look at a few words that were in ZL but not in Takahashi, and they seemed to be mostly words with an extra “d” at the start, but I didn’t investigate thoroughly. A bit odd, frankly.

      • Marco Ponzi
        August 14, 2021 at 8:36 am

        Hi Julian and Claire,
        the word list I sent is based on the TT_ivtff_v0a.txt file, where TT stands for Takeshi Takahashi. It likely is a different version than that used by Julian.
        As I wrote to Julian, I computed the list a few years ago and I am not sure about all the details. Something I pointed out is that I removed the occurrences of ‘?’, which probably is not a good choice.

  2. Nikolai
    August 19, 2021 at 12:45 pm

    Good day!
    Your site has information about the Voynich manuscript.
    I am deciphering the Voynich manuscript and have obtained a positive result.
    There is a key to the cipher of the Voynich manuscript.
    The key to the cipher is placed in the manuscript itself, throughout the text. Part of the key hints is placed on sheet 14. With its help I was able to translate a few dozen words that are completely relevant to the themes of the sections.
    The Voynich manuscript is not written with letters. It is written in signs. The signs replace the letters of the alphabet of one of the ancient languages. Moreover, the text has 2 levels of encryption. I figured out the key, with which the following words could be read in the first section: hemp, wearing hemp; food, food (sheet 20 in the numbering on the Internet); to clean (gut), knowledge, perhaps the desire, to drink, sweet beverage (nectar), maturation (maturity), to consider, to believe (sheet 107); to drink; six; flourishing; increasing; intense; peas; sweet drink, nectar, etc. These are just the short words, of 2-3 signs. To translate words with more than 2-3 signs requires knowledge of this ancient language. The fact is that some symbols represent two letters. As a result, a word consisting of three characters can fit up to six letters. Three letters are superfluous. In the end, you need six characters to define the semantic word of three letters. Of course, without knowledge of this language it is very difficult even with a We can say that the Voynich manuscript is an encyclopedia of knowledge that humanity needs today. I managed to partially solve the mystery of Mount Kailas (for example, its height is 6825 meters). The manuscript indicates the place where the Grail is hidden, as well as the Font and Cradle of Jesus.
    For more information, see my article https://scieuro.com/wp-content/uploads/2020/04/february-2020.pdf
    I am ready to share information.
    With respect, Nikolai.
    I am looking for a person, or even an organization, who will decide to responsibly continue to decipher the Voynich manuscript. I would be grateful if you would let me know.

    • JB
      August 19, 2021 at 1:04 pm

      Hi Nikolai, I suggest you go to Voynich Ninja if you want feedback on your proposed decryption.
