Archive for the ‘Rene Zandbergen’ Category

The Relationship Between Currier Languages “A” and “B”

March 1, 2013 24 comments

Captain Prescott Currier, a cryptographer, looked at the Voynich many moons ago, and made some very perceptive comments about it, which can be seen here on Rene Zandbergen’s site.

In particular, he noticed that the handwriting differed between some folios and others, and he also noticed (based on glyph/character counts) that two “languages” were being used. In his words:

When I first looked at the manuscript, I was principally considering the initial (roughly) fifty folios, constituting the herbal section. The first twenty-five folios in the herbal section are obviously in one hand and one “language,” which I called “A.” (It could have been called anything at all; it was just the first one I came to.) The second twenty-five or so folios are in two hands, very obviously the work of at least two different men. In addition to this fact, the text of this second portion of the herbal section (that is, the next twenty-five or thirty folios) is in two “languages,” and each “language” is in its own hand. This means that, there being two authors of the second part of the herbal section, each one wrote in his own “language.” Now, I’m stretching a point a bit, I’m aware; my use of the word language is convenient, but it does not have the same connotations as it would have in normal use. Still, it is a convenient word, and I see no reason not to continue using it.

We can look at some statistics to see what he was referring to. Let’s compare the most common words in Folios 1 to 25 (in the Herbal section, Language A, written in Hand 1) and in Folios 107 to 116 (in the Recipes section, Language B, written in a different Hand):

Comparison between word frequencies in Languages A and B

So, for example, in Language A the most common word is “8am” and it occurs 192 times in the folios, whereas in Language B the most common word is “am”, occurring 137 times.

We might expect that these are the same word, enciphered differently. The question then is, how does one convert between words in Language A and words in Language B, and vice versa? In the case of “8am” to “am”, it’s just a question of dropping the “8”, as if “8” is a null character in Language A. In the case of the next most popular words, “1oe” (A) and “1c89” (B), it looks like “oe” (A) converts to “c89” (B). And so on.

If we look at the most popular nGrams (substrings) in both Languages, perhaps there is a mapping that translates between the two. Perhaps the cipher machinery that was used to generate the text had different settings, that produced Language A in one configuration, and Language B in another. Perhaps, if we look at the nGram correspondence that results in the best match between the two Languages, a clue will be revealed as to how that machinery worked.

This involves some software (I’m using Python now, which is fun). The software first calculates the word frequencies for Language A and B in a set of folios (the table above is an output from this stage). It then calculates the nGram frequencies for each Language. Here are the top 10:

Top 10 nGram frequencies in Languages A and B
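
The two counting stages can be sketched roughly as follows. This is a minimal illustration rather than the code used for the tables: the function names, the range of nGram lengths (1 to 3) and the toy word list are all assumptions.

```python
from collections import Counter


def word_frequencies(words):
    """Count how often each word occurs in a list of transcribed words."""
    return Counter(words)


def ngram_frequencies(words, n_min=1, n_max=3):
    """Count every substring of length n_min..n_max inside each word.

    Every occurrence of a word contributes its substrings, so nGrams
    from common words weigh more heavily than those from rare words.
    """
    counts = Counter()
    for word in words:
        for n in range(n_min, n_max + 1):
            for start in range(len(word) - n + 1):
                counts[word[start:start + n]] += 1
    return counts


# Toy input only (an assumption): a few words in the transcription
# alphabet used in this post, not real counts from the manuscript.
language_a = ["8am", "1oe", "8am", "oe", "8am"]
print(word_frequencies(language_a).most_common(2))
print(ngram_frequencies(language_a, 2, 2).most_common(3))
```

Running the same two functions on the Language B folios would give the second column of each table.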

The software then runs a Genetic Algorithm to find the best mapping between the two sets of nGrams. The best mapping is the one that, when applied to all words in Language B, produces a set of words whose frequencies most closely match those observed in Language A (i.e. the frequencies shown in the first table above).
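
To make the idea concrete, here is a small sketch of such a search. It is not the software described in this post: the greedy longest-match rewriting in `apply_mapping`, the L1 distance used as the fitness, the population size and the mutation and crossover scheme are all assumptions chosen to keep the example short.

```python
import random
from collections import Counter


def apply_mapping(word, mapping):
    """Rewrite a word left to right, preferring the longest matching nGram."""
    longest = max((len(k) for k in mapping), default=0)
    out, i = [], 0
    while i < len(word):
        for n in range(min(longest, len(word) - i), 0, -1):
            piece = word[i:i + n]
            if piece in mapping:
                out.append(mapping[piece])
                i += n
                break
        else:
            out.append(word[i])  # no rule applies: copy the glyph unchanged
            i += 1
    return "".join(out)


def fitness(mapping, words_b, freq_a):
    """Negative L1 distance between converted-B and observed-A relative frequencies."""
    converted = Counter(apply_mapping(w, mapping) for w in words_b)
    total_b, total_a = sum(converted.values()), sum(freq_a.values())
    keys = set(converted) | set(freq_a)
    return -sum(abs(converted.get(k, 0) / total_b - freq_a.get(k, 0) / total_a)
                for k in keys)


def evolve(words_a, words_b, ngrams_a, ngrams_b,
           generations=200, pop_size=30, seed=0):
    """Evolve a B-nGram -> A-nGram mapping; keeps the better half each generation."""
    rng = random.Random(seed)
    freq_a = Counter(words_a)

    def score(m):
        return fitness(m, words_b, freq_a)

    def crossover(p, q):
        return {g: (p if rng.random() < 0.5 else q)[g] for g in ngrams_b}

    def mutate(m):
        child = dict(m)
        child[rng.choice(ngrams_b)] = rng.choice(ngrams_a)
        return child

    population = [{g: rng.choice(ngrams_a) for g in ngrams_b}
                  for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=score, reverse=True)
        parents = population[: pop_size // 2]
        children = [mutate(crossover(rng.choice(parents), rng.choice(parents)))
                    for _ in range(pop_size - len(parents))]
        population = parents + children
    return max(population, key=score)


# Toy data (an assumption), loosely modelled on the examples in this post.
words_a = ["8am"] * 3 + ["1oe"] * 2
words_b = ["am"] * 3 + ["1c89"] * 2
best = evolve(words_a, words_b, ["8am", "oe", "1", "am"], ["am", "c89", "1"])
print(best, fitness(best, words_b, Counter(words_a)))
```

A fitness of 0 means the converted Language B words reproduce the Language A frequencies exactly; real data would be expected to stay well below that.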

Here is an initial result. With the following mapping, you can take the most common words in Language B and convert them to Language A.

Table for converting between a Language B word and a Language A word

A couple of remarks. This is an early result and probably not the best match. There are some interesting correspondences:

  • “9” and “c” are immutable, and have the same function
  • “4o” in Language B maps to “o” in Language A, and vice versa!
  • “ha” in Language B maps to “h” in Language A, as if “a” is a null

In the Comments, Dave suggested looking at word pair frequencies between the Languages. Here is a table of the most common pairs in each Language.

Common word pairs in Languages A and B
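
Counting word pairs can be sketched as below. The input format (one string per transcribed line) and the decision not to pair the last word of one line with the first word of the next are assumptions, not a description of how the table was produced.

```python
from collections import Counter


def word_pair_frequencies(lines):
    """Count adjacent word pairs, without pairing across line boundaries."""
    counts = Counter()
    for line in lines:
        words = line.split()
        counts.update(zip(words, words[1:]))
    return counts


# Toy transcription lines (hypothetical, not taken from the manuscript).
sample = ["8am 1oe 8am 1oe", "1oe 8am"]
pairs = word_pair_frequencies(sample)
print(pairs.most_common(2))
```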

For clarity, I am using what I call the “HerbalRecipesAB” folios for this study, i.e.

Using folios for HerbalRecipeAB : [107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]

More results coming …

Folio Similarities

February 26, 2010 8 comments

Something Knox said recently made me wonder how the vocabulary of the VMs folios changes throughout the manuscript.

I made some counts and filled them into an Excel spreadsheet. I defined the Similarity between folios i and j as follows:

1) Count the unique words in Folio i = Ni
2) Count the unique words in Folio j = Nj
3) Count the unique words appearing in both Folio i and Folio j = Mij

Then compute Similarity = Mij / (Ni + Nj – Mij)

(If Folio i contains exactly the same words as Folio j then S = 1, and if it contains no words in common with Folio j then S = 0)
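
This measure is the Jaccard index of the two folios' vocabularies. A minimal sketch follows; the function name and the choice to return 0 when both folios are empty are assumptions.

```python
def folio_similarity(words_i, words_j):
    """Similarity = Mij / (Ni + Nj - Mij) over the unique words of two folios."""
    unique_i, unique_j = set(words_i), set(words_j)
    shared = len(unique_i & unique_j)                 # Mij
    union = len(unique_i) + len(unique_j) - shared    # Ni + Nj - Mij
    return shared / union if union else 0.0


# Toy word lists (hypothetical): two of the four distinct words are shared.
print(folio_similarity(["8am", "1oe", "oe"], ["1oe", "oe", "am"]))
```

Repeated words on a folio make no difference, because only unique words are counted.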

You can see a visual pattern of the Similarity distribution here:

(I have a feeling I’ve seen something similar to this for the Voynich before … but can’t find it now – can someone help? – see References below!)

This contour plot is symmetric about a line running diagonally from the bottom left-hand corner to the top right-hand corner, corresponding to i=j (for which I set the values to 0 for easier viewing).

The rectangular red region around folios 140 to 165 corresponds to strong similarity in the VMs folios f75r to f84v – the Biological Folios. These pages all typically share up to 50% of the same words.

What I found surprising is the generally low level of shared vocabulary between the folios: typically only a few of the words used on one folio are used on the next – but see below.

The spreadsheet answers questions like “Which folio is most similar to folio f1v?” … the answer being f24r by this metric.
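
The same lookup can be done in a few lines of code instead of a spreadsheet. The sketch below assumes the folios are held in a dictionary from folio name to word list; the names `p1` to `p3` and their words are placeholders, not manuscript data.

```python
def jaccard(a, b):
    """Shared unique words divided by total unique words of two sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity_matrix(folios):
    """Return folio names and a symmetric matrix with the diagonal set to 0."""
    names = list(folios)
    vocab = [set(folios[name]) for name in names]
    matrix = [[0.0 if i == j else jaccard(vocab[i], vocab[j])
               for j in range(len(names))]
              for i in range(len(names))]
    return names, matrix


def most_similar(folios, target):
    """Return (name, similarity) of the folio closest to target."""
    names, matrix = similarity_matrix(folios)
    idx = names.index(target)
    candidates = [j for j in range(len(names)) if j != idx]
    best = max(candidates, key=lambda j: matrix[idx][j])
    return names[best], matrix[idx][best]


# Placeholder folios (hypothetical names and words).
folios = {"p1": ["a", "b", "c"], "p2": ["a", "b", "d"], "p3": ["x", "y"]}
print(most_similar(folios, "p1"))
```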

Clustering

Using the Similarity number as a connection strength between each pair of folios, we can generate a cluster map that arranges the folios so that similar folios appear together. I used the freely available software called LinLogLayout to do this. Here are the results:

The algorithm has split the folios into two clusters, shown as red and blue circles. Interestingly, the red circles generally match Currier Hand 1 and the blue match Currier Hand 2. For some folios near the interface, e.g. f68r1, the Currier Hand is “unknown” (according to http://voynich.freie-literatur.de/index.php?show=page&id=f68r1) … indicating uncertainty in the attribution, consistent with the folio’s position on the cluster map.

For folio f103v, at the far right edge of the blue cluster, the Currier Hand is “X”.
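
To feed the similarities into a graph layout tool, they first need to be written out as a weighted graph. The sketch below builds a plain edge list with one "source target weight" triple per line. That three-column format is an assumption and should be checked against the documentation of whichever tool is used; the folio names and words are placeholders.

```python
from itertools import combinations


def similarity_edges(folios, min_similarity=0.0):
    """Build 'source target weight' lines for folio pairs above a threshold."""
    vocab = {name: set(words) for name, words in folios.items()}
    lines = []
    for a, b in combinations(sorted(vocab), 2):
        union = vocab[a] | vocab[b]
        weight = len(vocab[a] & vocab[b]) / len(union) if union else 0.0
        if weight > min_similarity:
            lines.append(f"{a} {b} {weight:.4f}")
    return lines


# Placeholder folios (hypothetical names and words).
folios = {"p1": ["a", "b", "c"], "p2": ["a", "b", "d"], "p3": ["x", "y"]}
print("\n".join(similarity_edges(folios)))
```

Pairs with no shared words are left out, so they contribute no attraction in the layout.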

Comparison with a Latin Text

Here I took the Latin Herb Garden and split it into 20 “folios”, one for each of the herbs described. Then I ran the same code against it to generate the similarities between each pair of folios, and made an Excel spreadsheet. The corresponding contour plot is shown below, with the same colour scale as the one for the Voynich above.

As you can see, the typical value of “Similarity” between folios is around 0.02 or so … much *lower* than for the Voynich. The conclusion is that the Voynich folios are much more alike than this Latin text, and the Biological Folios in particular are quite unusually similar.
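
One way to put a single number on the "typical" Similarity of a text is to take the median over all distinct pairs of chunks. The sketch below does that; using the median (rather than, say, the mean) is an assumption, and the chunk names and words are placeholders, not the Latin text.

```python
from itertools import combinations
from statistics import median


def typical_similarity(chunks):
    """Median Jaccard similarity over all distinct pairs of text chunks."""
    vocab = [set(words) for words in chunks.values()]
    scores = []
    for a, b in combinations(vocab, 2):
        union = a | b
        scores.append(len(a & b) / len(union) if union else 0.0)
    return median(scores)


# Placeholder chunks (hypothetical): pairwise similarities are 0.5, 0.2 and 0.5.
chunks = {
    "herb1": ["a", "b", "c"],
    "herb2": ["b", "c", "d"],
    "herb3": ["c", "d", "e"],
}
print(typical_similarity(chunks))
```

Computing the same figure for the Voynich folios and for the Latin chunks would give a direct comparison between the two texts.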

References

This is very similar work to that done by Rene in 1997: http://www.voynich.nu/extra/lang.html although his word counting rules are different (I only count unique words).

Comment by Nick Pelling

Nick sent me the following email and included an annotated version of the LinLogLayout shown above.

Having played with it a bit (as per the attached jpeg), it appears that while some pages’ recto and verso sides are very similar, others are wildly different. For example, just in the recipe section:-
103    good
104    very bad
105    very good
106    bad
107    excellent
108    excellent
109    (missing)
110    (missing)
111    excellent
112    good
113    excellent
114    excellent
115    very bad
116    n/a

Looking at pages within recipe bifolios, however, yields different results again: for example, even though f104 and f115 are both “bad” above (and are on the same bifolio), f104v is extremely similar to f115r, while f104r is extremely similar to f115v (which is a bit odd). Furthermore, the closeness between f111v and f108r suggests that these originally formed the central bifolio (but reversed), i.e. that the correct page order across the centre was f111r, f111v, f108r, f108v. However, f105 / f114 seem quite unconnected, as do f106 / f113 and f107 / f112.

At this point, however, we may be mining too deeply, and the presence of so many datapoints in a single overall set may be getting in the way. I suspect that pre-partitioning the dataset (i.e. working on each thematic section in isolation) may yield more informative results.