Folio Similarities

Something Knox said recently made me wonder how the vocabulary of the VMs folios changes throughout the manuscript.

I made some counts and entered them into an Excel spreadsheet. I defined the Similarity between folios i and j as follows:

1) Count the unique words in Folio i = Ni
2) Count the unique words in Folio j = Nj
3) Count the unique words appearing in both Folio i and Folio j = Mij

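Then compute Similarity = Mij / (Ni + Nj – Mij), i.e. the number of shared words divided by the number of distinct words across both folios (this is the Jaccard index of the two folios' word sets).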

(If Folio i contains exactly the same words as Folio j then Similarity = 1, and if the two folios have no words in common then Similarity = 0.)
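For readers who want to reproduce the calculation outside Excel, here is a minimal Python sketch of the three steps and the formula. It assumes each folio is already available as a list of word tokens from some transcription (the EVA-style words below are only illustrative), and the function name is hypothetical; it is not the code used for this post.

```python
def similarity(words_i, words_j):
    """Shared unique words divided by all distinct words in the two folios."""
    set_i = set(words_i)          # unique words in folio i, so Ni = len(set_i)
    set_j = set(words_j)          # unique words in folio j, so Nj = len(set_j)
    m_ij = len(set_i & set_j)     # unique words appearing in both folios
    denominator = len(set_i) + len(set_j) - m_ij
    if denominator == 0:          # both folios empty: report no similarity
        return 0.0
    return m_ij / denominator


# Two shared words out of five distinct words gives 2 / 5.
print(similarity("daiin chol chor".split(), "daiin chol shedy qokeey".split()))
```

Because each folio is reduced to a set first, a word repeated many times on a page counts only once, which matches the "unique words" rule above.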

You can see a visual pattern of the Similarity distribution here:

(I have a feeling I’ve seen something similar to this for the Voynich before … but can’t find it now – can someone help? – see References below!)

This contour plot is symmetric about the diagonal running from the bottom left-hand corner to the top right-hand corner, which corresponds to i = j (I set those values to 0 for easier viewing).
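The contour plot is drawn from a square matrix of Similarity values. A sketch of how such a matrix could be built, assuming a dictionary that maps folio names to word lists (a hypothetical input format), shows why the plot is symmetric: each pair is computed once and mirrored, and the diagonal stays at 0 as described above.

```python
def similarity_matrix(folios):
    """Return folio names and a symmetric Similarity matrix with a zero diagonal."""
    names = list(folios)                      # keeps the dictionary's folio order
    word_sets = [set(folios[name]) for name in names]
    n = len(names)
    matrix = [[0.0] * n for _ in range(n)]    # diagonal (i = j) left at 0
    for i in range(n):
        for j in range(i + 1, n):             # upper triangle only
            shared = len(word_sets[i] & word_sets[j])
            union = len(word_sets[i]) + len(word_sets[j]) - shared
            value = shared / union if union else 0.0
            matrix[i][j] = matrix[j][i] = value   # mirror: S(i, j) = S(j, i)
    return names, matrix


names, matrix = similarity_matrix({"f1r": ["a", "b", "c"], "f1v": ["a", "b"], "f2r": ["x"]})
```

Computing only the upper triangle halves the work, which matters little for a couple of hundred folios but keeps the symmetry explicit.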

The rectangular red region around folios 140 to 165 corresponds to strong similarity among the VMs folios f75r to f84v, the Biological Folios. These pages typically share up to 50% of their words with one another.

What I found surprising is the generally low level of shared vocabulary between the folios: typically only a few of the words used on one folio are used on the next – but see below.

The spreadsheet answers questions like “Which folio is most similar to folio f1v?” … the answer being f24r by this metric.
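The "most similar folio" lookup amounts to taking the maximum of one row of the matrix while skipping the folio itself. A minimal sketch, assuming the same hypothetical name-to-word-list dictionary; the toy data below is invented and does not reproduce the f1v/f24r result.

```python
def most_similar(target, folios):
    """Return (name, score) of the folio with the highest Similarity to `target`."""
    target_words = set(folios[target])
    best_name, best_score = None, -1.0
    for name, words in folios.items():
        if name == target:
            continue                          # skip the i = j case
        other = set(words)
        union = target_words | other          # same size as Ni + Nj - Mij
        score = len(target_words & other) / len(union) if union else 0.0
        if score > best_score:                # first folio wins on ties
            best_name, best_score = name, score
    return best_name, best_score


toy = {
    "folio_a": ["daiin", "chol", "shol"],
    "folio_b": ["daiin", "chol", "shol", "cthy"],
    "folio_c": ["daiin", "qokedy"],
}
print(most_similar("folio_a", toy))
```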


Using the Similarity number as a connection strength between each pair of folios, we can generate a cluster map that arranges the folios so that similar folios appear together. I used the freely available software called LinLogLayout to do this. Here are the results:
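LinLogLayout takes a plain-text graph file as input. My understanding is that each line names two nodes followed by an optional edge weight, separated by whitespace, but that format is an assumption here and should be checked against the LinLogLayout documentation. A sketch that turns a Similarity matrix into such lines (the `min_weight` cut-off is a hypothetical parameter, not something used in this post):

```python
def linlog_edges(names, matrix, min_weight=0.0):
    """Return 'node1 node2 weight' lines, one per folio pair above min_weight."""
    lines = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):    # each unordered pair once
            weight = matrix[i][j]
            if weight > min_weight:           # pairs with no shared words are omitted
                lines.append(f"{names[i]} {names[j]} {weight:.4f}")
    return lines


edges = linlog_edges(["f1r", "f1v", "f2r"],
                     [[0.0, 0.5, 0.0], [0.5, 0.0, 0.25], [0.0, 0.25, 0.0]])
print("\n".join(edges))
```

Dropping zero-weight pairs keeps the file small; whether LinLogLayout treats a missing edge exactly like a zero-weight edge is worth confirming in its documentation.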

The algorithm has split the folios into two clusters, shown as red and blue circles. Interestingly, the red circles generally match Currier Hand 1 and the blue match Currier Hand 2. For some folios near the interface, e.g. f68r1, the Currier Hand is “unknown” (according to …), indicating uncertainty in the attribution that is consistent with the folio’s position on the cluster map.

For folio f103v, at the far right edge of the blue cluster, the Currier Hand is “X”.

Comparison with a Latin Text

Here I took the Latin Herb Garden and split it into 20 “folios”, one for each of the herbs described. Then I ran the same code against it to compute the similarity between each pair of folios, and made an Excel spreadsheet. The corresponding contour plot is shown below, with the same colour scale as the one for the Voynich above.

As you can see, the typical value of “Similarity” between folios is around 0.02 or so … much *lower* than for the Voynich. The conclusion is that the Voynich folios are much more alike than this Latin text, and the Biological Folios in particular are quite unusually similar.
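One way to put a single number on the "typical value" when comparing two texts is to summarise the off-diagonal entries of each matrix. The sketch below uses the median of each pair counted once; that choice of statistic is an assumption, since the post reads its typical value off the contour plot.

```python
from statistics import median


def typical_similarity(matrix):
    """Median of the off-diagonal Similarity values, each folio pair counted once."""
    values = [matrix[i][j]
              for i in range(len(matrix))
              for j in range(i + 1, len(matrix))]   # upper triangle, no diagonal
    return median(values)


print(typical_similarity([[0.0, 0.5, 0.0], [0.5, 0.0, 0.25], [0.0, 0.25, 0.0]]))
```

The median is less sensitive than the mean to a block of unusually similar pages such as the Biological Folios, so it arguably gives a fairer "typical" figure for the Voynich matrix.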


This is very similar to work done by Rene in 1997, although his word-counting rules are different (I count only unique words).

Comment by Nick Pelling

Nick sent me the following email and included an annotated version of the LinLogLayout shown above.

Having played with it a bit (as per the attached jpeg), it appears that while some pages’ recto and verso sides are very similar, others are wildly different. For example, just in the recipe section:-
Folio  Recto/verso similarity
103    good
104    very bad
105    very good
106    bad
107    excellent
108    excellent
109    (missing)
110    (missing)
111    excellent
112    good
113    excellent
114    excellent
115    very bad
116    n/a
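The recto/verso comparison above can be computed directly by pairing each fNNNr with its fNNNv. A sketch under the assumption of the same hypothetical name-to-word-list dictionary; it returns raw Similarity scores only, since the thresholds behind labels such as “good” or “very bad” are not given.

```python
import re


def recto_verso_scores(folios):
    """Similarity between the recto and verso of each leaf, keyed by folio number."""
    scores = {}
    for name, words in folios.items():
        match = re.fullmatch(r"f(\d+)r", name)   # simple names only; foldout panels like f68r1 are skipped
        if not match:
            continue
        verso = f"f{match.group(1)}v"
        if verso not in folios:
            continue                             # missing or unpaired leaf
        recto_set, verso_set = set(words), set(folios[verso])
        union = recto_set | verso_set
        scores[int(match.group(1))] = (len(recto_set & verso_set) / len(union)
                                       if union else 0.0)
    return scores


pages = {"f103r": ["a", "b"], "f103v": ["a", "c"], "f104r": ["x"], "f104v": ["y"], "f109r": ["z"]}
print(recto_verso_scores(pages))
```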

Looking at pages within recipe bifolios, however, yields different results again: for example, even though f104 and f115 are both “very bad” above (and are on the same bifolio), f104v is extremely similar to f115r, while f104r is extremely similar to f115v (which is a bit odd). Furthermore, the closeness between f111v and f108r suggests that these originally formed the central bifolio (but reversed), i.e. that the correct page order across the centre was f111r, f111v, f108r, f108v. However, f105 / f114 seem quite unconnected, as do f106 / f113 and f107 / f112.

At this point, however, we may be mining too deeply, and the presence of so many datapoints in a single overall set may be getting in the way. I suspect that pre-partitioning the dataset (i.e. working on each thematic section in isolation) may yield more informative results.
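Pre-partitioning could be as simple as building the matrix only from the folios of one section. In this sketch, `section_of` is a hypothetical mapping from folio name to a section label (e.g. "recipes", "herbal"); the post does not define such a mapping, so the labels and assignments are assumptions.

```python
def section_matrix(folios, section_of, section):
    """Similarity matrix restricted to the folios assigned to one thematic section."""
    names = [name for name in folios if section_of.get(name) == section]
    sets = [set(folios[name]) for name in names]
    n = len(names)
    matrix = [[0.0] * n for _ in range(n)]   # diagonal left at 0, as in the full plot
    for i in range(n):
        for j in range(i + 1, n):
            union = sets[i] | sets[j]
            value = len(sets[i] & sets[j]) / len(union) if union else 0.0
            matrix[i][j] = matrix[j][i] = value
    return names, matrix


names, matrix = section_matrix(
    {"f103r": ["a", "b"], "f103v": ["a"], "f1r": ["a", "b"]},
    {"f103r": "recipes", "f103v": "recipes", "f1r": "herbal"},
    "recipes",
)
```

Each section's matrix can then be fed to the clustering step on its own, so that the strong Biological-section block does not dominate the layout of the other sections.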

  1. Luciano Piccini
    August 12, 2012 at 8:55 am

    Note: This comment is of a general nature on the Voynich MS, and not related to the specific topic of this blog post. Having no other means of contacting the people who have been working on the MS, I found this blog a good way to express what could be a wild idea about who may have written the MS, with some arguments worth exploring by the people who study and analyze it in depth.
    I am just a curious person who saw a documentary program on the Voynich MS on the History Channel. Because part of the book concerns astronomy, and because Copernicus was the first person to formulate the theory of heliocentric cosmology, I started to read about his life, and here is a short list of facts that match some elements of the MS:

    1.- He lived between 1473 and 1543

    2.- He studied astronomy in depth with Albert Brudzewski and with Domenico Maria Novara da Ferrara

    3.- He lived in Italy

    4.- He spoke Latin, German, and Polish fluently. He could also speak Greek and Italian

    5.- He was raised and lived in a wealthy environment.

    6.- He was a priest

    7.- He never married and had no descendants.

    8.- He studied medicine

    9.- He was reluctant to publish his work. Was he fearing the Church's reaction, such as Galileo Galilei had to face decades later?

    10.- His work, De revolutionibus orbium coelestium, was given to scribes to be written… Is the Voynich MS a precursor of the final work?

    These seem to be several good elements that fit the contextual picture of the Voynich MS.

    As I said at the beginning of this comment, I am not an expert, nor will I be. I am just presenting to all of you an idea to be explored, in case it has not been investigated before, because so far it has not been mentioned as a possibility (see René Zandbergen).

    Thanks in advance for your attention.

    L. Piccini

    • JB
      August 12, 2012 at 12:17 pm

      Thanks for the interesting comment, Luciano.

      There is an email list for people interested in the VMs, which you could join here, and I’m sure people there would also be interested in your theory, and probably better qualified to comment on it than I am 🙂

      • Luciano Piccini
        August 18, 2012 at 11:19 am

        Thank you for pointing me to the right place for this idea.

        L. Piccini

  2. May 11, 2015 at 2:29 am

    hello Julian.
    Once again, I’m writing to ask permission to reproduce some of your work. In this case it’s the second diagram, which is by far the best graphic representation I’ve seen of how the ‘hands’ are distributed.

    May I?

    • JB
      May 11, 2015 at 12:08 pm

      You may use anything you like, as long as you have the courtesy to attribute it to its source 🙂

  3. May 11, 2015 at 7:35 pm

    Of course – I only wish that the people who constantly re-use and re-work my own research would pay attention to the same formalities. I’ve been trying for some time now to explain to various Voynicheros that this is not simply a matter of whinging for credit, but that the value of proper credits is that later researchers can track information back to its source, and so evaluate its real worth. As I’m sure you will understand.

    Thanks for the permission. I’m checking the distribution against the four folios that were radiocarbon dated. Part of a current series of posts about the manuscript’s codicology.


  4. May 11, 2015 at 7:37 pm

    PS – the post in which I included another of your distributions has been enormously popular and is still often visited after all this time. 🙂

  5. JB
    May 12, 2015 at 9:14 am

    Is that the post on the weaving patterns? I very much like that idea of yours, and it merits further investigation – e.g. by trying to match the glyph patterns in the VMS with known weave patterns and colours, perhaps using a Genetic Algorithm.

