The Relationship Between Currier Languages “A” and “B”

Captain Prescott Currier, a cryptographer, looked at the Voynich many moons ago, and made some very perceptive comments about it, which can be seen here on Rene Zandbergen’s site.

In particular, he noticed that the handwriting was different between some folios and others, and he also noticed (based on glyph/character counts) that there were two “languages” being used.

When I first looked at the manuscript, I was principally considering the initial (roughly) fifty folios, constituting the herbal section. The first twenty-five folios in the herbal section are obviously in one hand and one ‘‘language,’’ which I called ‘‘A.’’ (It could have been called anything at all; it was just the first one I came to.) The second twenty-five or so folios are in two hands, very obviously the work of at least two different men. In addition to this fact, the text of this second portion of the herbal section (that is, the next twenty-five or thirty folios) is in two ‘‘languages,’’ and each ‘‘language’’ is in its own hand. This means that, there being two authors of the second part of the herbal section, each one wrote in his own ‘‘language.’’ Now, I’m stretching a point a bit, I’m aware; my use of the word language is convenient, but it does not have the same connotations as it would have in normal use. Still, it is a convenient word, and I see no reason not to continue using it.

We can look at some statistics to see what he was referring to. Let’s compare the most common words in Folios 1 to 25 (in the Herbal section, Language A, written in Hand 1) and in Folios 107 to 116 (in the Recipes section, Language B, written in a different Hand):

Comparison between word frequencies in Languages A and B

So, for example, in Language A the most common word is “8am” and it occurs 192 times in the folios, whereas in Language B the most common word is “am”, occurring 137 times.

We might expect that these are the same word, enciphered differently. The question then is, how does one convert between words in Language A and words in Language B, and vice versa? In the case of “8am” to “am” it’s just a question of dropping the “8”, as if “8” is a null character in Language A. In the case of the next most popular words, “1oe”(A) and “1c89”(B), it looks like “oe”(A) converts to “c89”(B). And so on.

If we look at the most popular nGrams (substrings) in both Languages, perhaps there is a mapping that translates between the two. Perhaps the cipher machinery that was used to generate the text had different settings, that produced Language A in one configuration, and Language B in another. Perhaps, if we look at the nGram correspondence that results in the best match between the two Languages, a clue will be revealed as to how that machinery worked.
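
To make “word frequencies” and “nGrams (substrings)” concrete, here is a minimal sketch of how such counts could be produced. It is not the software used for this post: the function names, the `max_n` limit and the toy word list are assumptions made purely for illustration.

```python
from collections import Counter


def word_frequencies(words):
    """Count how often each word token occurs."""
    return Counter(words)


def ngram_frequencies(words, max_n=3):
    """Count every substring of length 1..max_n over all word tokens."""
    counts = Counter()
    for word in words:
        for n in range(1, max_n + 1):
            for start in range(len(word) - n + 1):
                counts[word[start:start + n]] += 1
    return counts


# Toy tokens for illustration only, not real folio data.
sample = ["8am", "8am", "1oe", "am", "1c89"]
print(word_frequencies(sample).most_common(2))
print(ngram_frequencies(sample, max_n=2).most_common(5))
```

Counting over tokens (rather than over distinct words) means an nGram inside a very common word is weighted by that word's frequency; counting over distinct words would be an equally defensible choice.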

This involves some software (I’m using Python now, which is fun). The software first calculates the word frequencies for Language A and B in a set of folios (the table above is an output from this stage). It then calculates the nGram frequencies for each Language. Here are the top 10:


The software then runs a Genetic Algorithm to find the best mapping between the two sets of nGrams, so that when the mapping is applied to all words in Language B, it produces a set of words in Language A the frequencies of which most closely match the frequencies of words observed in Language A (i.e.  the frequencies shown in the first table above).
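
The scoring step can be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual code: the greedy longest-match substitution in `convert_word`, the fractional-rank definition and the penalty of 1.0 for converted words missing from Language A are all hypothetical choices. The rank-based sum of squares follows the description of the fitness function given in the comments below.

```python
from collections import Counter


def convert_word(word, mapping):
    """Rewrite a Language B word using greedy longest-match substitution."""
    longest = max(map(len, mapping), default=1)
    out, i = [], 0
    while i < len(word):
        for n in range(min(longest, len(word) - i), 0, -1):
            chunk = word[i:i + n]
            if chunk in mapping:
                out.append(mapping[chunk])
                i += n
                break
        else:
            out.append(word[i])  # no rule applies: copy the glyph unchanged
            i += 1
    return "".join(out)


def fractional_ranks(counts):
    """Map each word to its rank scaled to [0, 1), most frequent first."""
    ordered = [word for word, _ in counts.most_common()]
    return {word: i / len(ordered) for i, word in enumerate(ordered)}


def fitness(mapping, counts_a, counts_b):
    """Sum of squared rank differences; lower is better, 0 is a perfect match."""
    converted = Counter()
    for word, count in counts_b.items():
        converted[convert_word(word, mapping)] += count
    ranks_a = fractional_ranks(counts_a)
    ranks_converted = fractional_ranks(converted)
    # Assumption: a converted word that never occurs in A is scored as rank 1.0.
    return sum((rank - ranks_a.get(word, 1.0)) ** 2
               for word, rank in ranks_converted.items())


# Toy counts for illustration only, not real folio data.
counts_a = Counter({"8am": 5, "1oe": 3})
counts_b = Counter({"am": 4, "1c89": 2})
print(fitness({"am": "8am", "c89": "oe"}, counts_a, counts_b))
print(fitness({}, counts_a, counts_b))
```

A Genetic Algorithm would then search over candidate `mapping` dictionaries, keeping those with the lowest score.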

Here is an initial result. With the following mapping, you can take most common words in Language B, and convert them to Language A.

Table for converting between a Language B word and a Language A word

A couple of remarks. This is an early result and probably not the best match. There are some interesting correspondences:

  • “9” and “c” are immutable, and have the same function
  • Another interesting feature is that “4o” in Language B maps to “o” in Language A, and vice versa!
  • in Language B, “ha” maps to “h” in Language A, as if “a” is a null

In the Comments, Dave suggested looking at word pair frequencies between the Languages. Here is a table of the most common pairs in each Language.

Common word pairs in Languages A and B
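
For reference, a pair count of this kind could be obtained with a sketch like the one below. It assumes that words are separated by whitespace and that pairs do not span line breaks; both are assumptions, as are the function name and the toy input.

```python
from collections import Counter


def word_pair_frequencies(lines):
    """Count adjacent word pairs within each line of transcribed text."""
    pairs = Counter()
    for line in lines:
        words = line.split()
        pairs.update(zip(words, words[1:]))
    return pairs


# Toy lines for illustration only, not real folio data.
sample_lines = ["8am 1oe 8am 1oe", "1oe 8am"]
print(word_pair_frequencies(sample_lines).most_common(2))
```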

For clarity, I am using what I call the “HerbalRecipesAB” folios for this study i.e.

Using folios for HerbalRecipeAB : [107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]

More results coming …

  1. March 1, 2013 at 12:30 pm

    Fascinating! Really nice idea and approach. I’m curious to know if a secondary objective can be incorporated into the fitness function, whereby the “interesting correspondences” are maximized as well as the correspondences between word frequencies. Maybe the putative mapping between words is stronger when there’s some relationship between the symbols in each word (for example, the mapping between “4o” and “o”, and the mapping between “ha” and “h”).

    Also, would it be possible to explore correspondences of two-word pairs? For example, consider the phrase “w1 w2” in language A, and “w3 w4” in language B. If the phrases occur at roughly the same frequencies in both languages, AND the individual words occur at roughly the same frequencies (via your mappings), the hypothesis that they bear the same contextual meaning might be strengthened.

  2. JB
    March 1, 2013 at 1:04 pm

    Dave, these are great suggestions! The idea of using word pairs is very powerful. I am going to try that out.

    The fitness function in the GA is something I’m not yet happy with. Basically, it computes the sum of squares of the difference between the ranking of the converted word (from B) and the ranking of the same word from A. This tends to zero when the mapping generates, from B, the same words at the same frequency ranks as are observed in A.

    I also tried weighting that function value with the number of converted B words that actually appear in the dictionary for A, but it was not successful (it didn’t improve the results).

  3. March 1, 2013 at 2:51 pm

    In the comparison table, the glyph drawn against 4ohii9 does not occur – should it not be 4ohcc9? (Not sure which transcription you are using.)

    • JB
      March 1, 2013 at 3:20 pm

      Hi Tony,

      I run a “simplification” on Voyn_101 (as I don’t agree with the nuances he has ascribed between some glyphs e.g. “2” and “3” and “5” – they are the same IMHO) and one conversion I make is C -> ii which may be where 4ohii9 comes from (would have been 4ohC9) … is that what you are referring to?

      Here’s my full table of conversions:

      seek = ["3", "5", "+", "%", "#", "6", "7", "A", "X",
              "I", "C", "z", "Z", "j", "u", "d", "U", "P",
              "Y", "$", "S", "t", "q",
              "m", "M", "n", "Y", "!"]
      repl = ["2", "2", "2", "2", "2", "8", "8", "a", "y",
              "ii", "ii", "iy", "iiy", "g", "f", "ccc", "F", "ip",
              "y", "s", "cs", "s", "iip",
              "m", "M", "n", "y", "2"]

      • March 1, 2013 at 4:03 pm

        I was referring to the handwritten glyph where you have accidentally written iiy (as in the transcription) instead of ccy as it should appear. (I realise it’s just an oversight but thought you’d like to be aware of it.)

  4. March 1, 2013 at 3:04 pm

    This could certainly advance the quest if it works out. Some of the frequencies may be too close to assign to rigid ranks. Maybe word pair tests can sort them out — with different trials. In “B” Swap “am” with “1c89”, for instance. Could be a lot of work. In “B” there are “oe” (0.028), “4oham” (0.028) and “4ohan” (0.029).

    • JB
      March 1, 2013 at 3:22 pm

      Hi Knox,

      Yes, the *absolute* ranks aren’t reliable, I agree, which is why my cost function looks at the squared difference between the (fractional) ranks … so as long as the words are close in the list they should get a good score.

  5. March 1, 2013 at 3:37 pm


    In my article “How many hands ...?”
    I suggested that “two hands” and “two languages” are quite different things.
    I also reached the conclusion that only one hand wrote the VM, based on handwriting analysis I have done. I am no expert, but I used the common methods from textbooks and reached a conclusion similar to the one presented by a handwriting expert to Currier, who dismissed it offhand (without having any expertise himself :-). The changes in the VM handwriting could be caused by age, sickness or mental health. The changes in the text could be caused by different technical lingo, a different language or a different code. So it could have been the same person, whose handwriting changed with age and who later also used a different code. After all, Currier started with five hands, which he later considered rather too many.

    On the contrary, the comparison of the TEXT of different VM sections truly shows different statistics (though I would not yet call them “languages” as Currier did). Your idea to compare those sections ANALYTICALLY (not just statistically) may truly reveal what different codes (or words) the author used and the true relation between the different texts. As far as I know, nobody has really done that yet.

    The results can then be compared with a similar comparison of, say, two contemporary articles on different subjects (say electrical and chemical, using different technical terms together with common English for both). Should the difference in the VM be bigger than in those two articles in English, we can truly believe that the VM author (or authors) used two languages, or codes, and not just different technical lingo :-). But even with a smaller difference, the comparison could show very interesting results that could even lead to partial cracking of the VM.

    By the way, could you pls put in your Blogroll the link to my site?

    Thanks, Jan

    • JB
      March 1, 2013 at 3:48 pm

      Hi Jan – thanks for your comment. I added your Blog – happy to do so 🙂

      When I first read Currier’s material I was confused between the hands and the languages he referred to. Now my feeling is that the difference in hands, whatever it is ascribed to (different people or the same person older) is not especially interesting. The two languages’ or encryptions’ statistics, on the other hand, should be a big clue as to what is going on.

      I’m still not very sure how best to tackle the analysis, though 🙂

  6. JB
    March 1, 2013 at 4:08 pm

    tony :

    I was referring to the handwritten glyph where you have accidentally written iiy (as in the transcription) instead of ccy as it should appear. (I realise it’s just an oversight but thought you’d like to be aware of it.)

    I’m being dense: I can’t see where you are referring to, and would like to so I can correct it!

    • March 1, 2013 at 4:35 pm

      In the first table at rank 10 on the left hand side you have –
      10 4ohii9 hand written next to it is 4ohiiy when it should appear as 4ohccy.

  7. March 1, 2013 at 4:53 pm

    In the simple EVA it is qokeey not qokiiy which never occurs – It’s late here but I have that font and will look at it tomorrow.
    Regards Tony

    • March 1, 2013 at 8:24 pm

      seek = ["3", "5", "+", "%", "#", "6", "7", "A", "X",
              "I", "C", "z", "Z", "j", "u", "d", "U", "P",
              "Y", "$", "S", "t", "q",
              "m", "M", "n", "Y", "!"]
      repl = ["2", "2", "2", "2", "2", "8", "8", "a", "y",
              "ii", "ii", "iy", "iiy", "g", "f", "ccc", "F", "ip",
              "y", "s", "cs", "s", "iip",
              "m", "M", "n", "y", "2"]

      In the above
      Seek “C” should be replaced with “cc” not “ii”

      • JB
        March 1, 2013 at 9:57 pm

        Good catch, Tony – thanks!

  8. MaRi
    March 2, 2013 at 3:16 am

    This is the most interesting thing anyone has done for a long time!

    Your results seem to back up some ideas that I have had in my mind for some time and I wish to share:
    – it seems to me that the VMs letters and words are in fact numbers belonging to a system which mixes the roman type of notation with the arabic one
    – if the code is a simple one but well disguised, as I suspect it is, there must often be more than one number in one “word”, which means that there must be dummy letters acting as delimiters or as signs to switch to a different set of plaintext letters for the numbers
    – the single “letters” have end forms, middle forms and stand-alone forms depending on their place in a word, i.e., n may be the end form of i, and y could be the end form of c
    – for example, aiin could be ‘roman’ VIII, in which case daiin could be 28, or the d (8) could be a dummy or a ‘switch’
    – my personal favourite as delimiters are the gallow letters and ‘8’
    – I also think that, e.g., lccy could very well be the same number as aiin

    All that said, I am not at all sure that there necessarily is any underlying plaintext at all – but there must be some rules to make the text resemble a real coded text.

    Best regards,

    • JB
      March 3, 2013 at 6:04 pm

      Hi Marianna,

      My results so far bear out your ideas. I get results that vary between the sections, but the majority of the glyphs match between Language A and B. The ones that tend not to match are the following:

      8,K y,e p,g k,h 1,2

      I’ve paired them like that because that’s how they behave. A typical result has the following properties:

      Chromosome [‘o’, ‘9’, ‘1’, ‘a’, ‘i’, ‘K’, ‘c’, ‘y’, ‘h’, ‘e’, ‘k’, ‘N’, ‘2’, ‘s’, ‘4’, ‘p’, ‘g’, ‘?’, ‘8’, ‘H’]
      ngramsA [‘o’, ‘9’, ‘1’, ‘a’, ‘i’, ‘8’, ‘c’, ‘e’, ‘h’, ‘y’, ‘k’, ‘N’, ‘2’, ‘s’, ‘4’, ‘g’, ‘p’, ‘?’, ‘K’, ‘H’]

      … i.e. to preserve word frequency when converting from Language B to A, you have to change K to 8, 8 to K, y to e, e to y and so on.

      This is a very curious effect, and does hint at these glyphs being a signal for some sort of code change or to signify what follows is a number, or a word, for example.


  9. March 2, 2013 at 3:25 am

    Noticed also that seek m,M,n are replaced by themselves! Didn’t you intend to replace them with iiN,iiiN,iN?

    • JB
      March 2, 2013 at 8:59 am

      Sometimes I’m not sure if m,M,n are glyphs or not. I have been treating them as glyphs, rather than combinations of glyphs, for some of these tests.

    • March 2, 2013 at 12:30 pm

      At one time, there actually was a plaintext end form for Roman i. It was j.
      That gives us (in EVA) ij, iij, iiij to end the numbers. I don’t think there was a single j ending but let’s accept it, anyway, so we can have (EVA-n = final-i)
      an = vi
      ain = vii
      aiin = viii
      aiin = viii
      aiiin = viiii
      We don’t need any more.
      and I’ll guess that am = ain = vii
      d = x
      dain, daiin = xvii, xviii = 17, 18
      e = another x
      o = l (lowercase-L)
      For a long time, there were many forms of combined Roman and Arabic (Indian) symbols to represent numbers.
      eody = ?
      What would dy represent? What is final-r? There are not many a endings. I suppose everything else is contradictory. Sorry to get so far off track. I’m looking forward to see where GA leads.

      • JB
        March 3, 2013 at 5:54 pm

        What is your best guess for y, e, p, 8? From my results so far there is something odd going on with these glyphs.

  10. March 2, 2013 at 5:25 am

    In my opinion it could be two different languages or dialects, using, perhaps, two different alphabets.

  11. proto57
    March 2, 2013 at 5:51 am

    Very clever, Julian. A great idea. Best of luck.

  12. Vaughan Fulford
    October 8, 2013 at 11:53 am

    I just watched the video for the first time and thought I’d look at the manuscript, and noticed all the numbers, and letters that look like numbers.

    Has anyone looked at it that way before? Just using the numbers that appear.
