Word Length Distributions

In the previous blog post, we looked at the distribution of word lengths in the EVA transcription and compared it with a binomial distribution with n=9, following the work of Stolfi. The two matched well enough, because I had treated EVA ch, sh, ain, aiin and qo as single glyphs, much as Stolfi did:

For this page, we will define symbol as Currier did; i.e. EVA ch and sh will be counted as single symbols, and so are EVA cth, ckh, etc.


That is, he reduced some of the EVA glyph sequences to single symbols.

Without these reductions, i.e. leaving the EVA transcription unchanged, the word lengths naturally shift towards higher values. As a sanity check, Marco Ponzi was kind enough to send me a list of VMS words he'd extracted from the ZL transcription, so that I could compare it with the words I extracted from the Takahashi EVA. In the following plot I show the three word length distributions: EVA, ZL, and the reduced EVA with ch, sh, ain, aiin and qo as single glyphs.
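As a rough illustration of the reduction described above, here is a minimal Python sketch that counts the glyphs in an EVA word with and without treating ch, sh, ain, aiin and qo as single glyphs. The function names (`glyph_length`, `length_distribution`) are my own for this example, and the greedy left-to-right matching is an assumption, not necessarily how the plotted distributions were produced.

```python
from collections import Counter

# EVA sequences counted as one glyph in the "reduced" variant.
# Listed longest first so that greedy matching prefers longer groups.
REDUCED_GROUPS = ["aiin", "ain", "ch", "sh", "qo"]


def glyph_length(word, groups=REDUCED_GROUPS):
    """Count glyphs in an EVA word, treating each listed group as one glyph."""
    length = 0
    i = 0
    while i < len(word):
        for group in groups:
            if word.startswith(group, i):
                i += len(group)  # consume the whole group as one glyph
                break
        else:
            i += 1  # no group matched here: a plain single-character glyph
        length += 1
    return length


def length_distribution(words, groups=REDUCED_GROUPS):
    """Return {length: fraction of words} for the given word list."""
    counts = Counter(glyph_length(w, groups) for w in words)
    total = sum(counts.values())
    return {n: c / total for n, c in sorted(counts.items())}
```

Passing `groups=[]` gives the unreduced EVA length, so `glyph_length("qokaiin", groups=[])` counts seven characters while `glyph_length("qokaiin")` counts three glyphs (qo, k, aiin). Whether the input list holds distinct word types or every token is up to the caller.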

Reassuringly, the EVA and ZL (green and blue curves) match quite well, as they should, and the reduced EVA matches Stolfi's result. (Curiously, the ZL transcription has a total of 8078 different words, compared with 7552 for the Takahashi EVA, which warrants further investigation.)
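One simple way to start investigating a vocabulary mismatch like this is to compare the two word lists as sets. The sketch below is only a starting point: the function name `compare_vocabularies` is hypothetical, and treating `?` and `*` as markers for uncertain readings is an assumption that may not hold for every transcription file.

```python
def compare_vocabularies(words_a, words_b, uncertain="?*"):
    """Compare two word lists after dropping words with uncertain characters."""

    def clean(words):
        # Keep only words that contain none of the uncertain-reading markers.
        return {w for w in words if not any(ch in uncertain for ch in w)}

    set_a, set_b = clean(words_a), clean(words_b)
    return {
        "only_a": sorted(set_a - set_b),  # words found only in the first list
        "only_b": sorted(set_b - set_a),  # words found only in the second list
        "shared": len(set_a & set_b),     # number of words common to both
    }
```

Running it once with the default `uncertain` characters and once with `uncertain=""` shows how much of the difference comes from uncertain readings alone.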

The unreduced EVA distribution now matches a binomial with (n=12, p=0.5), i.e. 12 cipher wheels, each with a 50% probability of contributing a glyph to the word.
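For reference, the binomial curve itself is easy to reproduce. The following sketch computes the probabilities for (n=12, p=0.5) and a simple distance to an observed length distribution. It assumes a word of length k corresponds to k "successes" (k wheels contributing a glyph), and the total variation distance is just one convenient measure I picked for the example, not the criterion used for the plot.

```python
from math import comb


def binomial_pmf(n, p):
    """Return [P(X = k) for k in 0..n] for a binomial(n, p) variable."""
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]


def total_variation(observed, pmf):
    """Half the summed absolute difference between two distributions.

    `observed` maps word length -> fraction of words; `pmf` is indexed by
    length. Lengths outside 0..len(pmf)-1 get model probability 0.
    """
    lengths = set(observed) | set(range(len(pmf)))
    diff = 0.0
    for k in lengths:
        model = pmf[k] if 0 <= k < len(pmf) else 0.0
        diff += abs(observed.get(k, 0.0) - model)
    return diff / 2


pmf = binomial_pmf(12, 0.5)
# The most likely length under this model is 6, with probability 924/4096.
```

A distance of 0 means the observed distribution coincides with the model, and 1 means the two do not overlap at all.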

  1. Claire Bowern
    August 13, 2021 at 5:42 pm

    I wonder if the differences in counts are due to unreadable (or uncertain) characters? Like how * is treated?

    • JB
      August 13, 2021 at 9:10 pm

      I took a look at a few words that were in ZL but not in Takahashi, and they seemed to be mostly words with an extra “d” at the start, but I didn’t investigate thoroughly. A bit odd, frankly.

      • Marco Ponzi
        August 14, 2021 at 8:36 am

        Hi Julian and Claire,
        the word list I sent is based on the TT_ivtff_v0a.txt file, where TT stands for Takeshi Takahashi. It likely is a different version than that used by Julian.
        As I wrote to Julian, I computed the list a few years ago and I am not sure about all the details. Something I pointed out is that I removed the occurrences of ‘?’, which probably is not a good choice.

    • JB
      August 19, 2021 at 1:04 pm

      Hi Nikolai, I suggest you go to Voynich Ninja if you want feedback on your proposed decryption.

