Anagram Analysis

February 26, 2010

Several people, notably Robert Teague and Philip Neal, have theorized that the cipher is based on anagrams. The idea is that to decipher the VMs text, you first convert the VMs symbols to alphabet letters, and then you find an anagram of those letters that makes a valid plaintext word.

Variations on this theme include allowing some of the plaintext letters in the target word to be missing and inferred from “context”, and/or treating the VMs words as anagrams of plaintext words whose letters have been arranged alphabetically.

For this analysis, we look at Folio68r3 in particular:

The labels on the stars are, from the ten o’clock pie slice going anti-clockwise, in the Voyn 101 encoding:





Anagrams: Combinations/Permutations

If you make the reasonable assumption that all the labels on the star shapes in f68r3 are star names, and you assume a target language, then this puts severe constraints on the cipher, since there are only so many star names and only so many cipher schemes that can be consistent amongst all the stars.

For each of the 12 star labels on f68r3 (61oe7a9 8oay9 oae1coe oh1o89 8ayaee ok98* oK98 ohoe19 1G9 179,9h9 ohos okoy9 ) there are a number of possible orderings of the symbols (anagrams):

Label size 3: 6 anagrams
Label size 4: 24 anagrams
Label size 5: 120 anagrams
Label size 6: 720 anagrams
Label size 7: 5040 anagrams
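
The counts in this table are n! for a label of n symbols that are all different. As a minimal sketch (the function name `count_anagrams` is mine, not from the original program), the following also handles labels that repeat a symbol, where the number of distinct orderings is smaller:

```python
from collections import Counter
from math import factorial

def count_anagrams(label: str) -> int:
    """Number of distinct orderings of the symbols in a label."""
    total = factorial(len(label))
    # Divide out orderings that only swap identical symbols.
    for repeats in Counter(label).values():
        total //= factorial(repeats)
    return total

for label in ["1G9", "8oay9", "61oe7a9", "oae1coe"]:
    print(label, count_anagrams(label))
```

For “61oe7a9” all seven symbols differ, so the full 5040 applies; “oae1coe” repeats both “o” and “e”, which leaves 1260 distinct orderings.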

At first sight there is a lot of freedom: the first label “61oe7a9” can be rearranged 5040 different ways, and you can pick any letter for any of the 7 symbols! However, when you actually start choosing the mapping between symbols and alphabet characters, and require that at least one of the anagrams of each deciphered label appears in a dictionary of star names, you rapidly eliminate many of the possibilities.

This is still true even if you allow a single symbol in the label to map to two alphabet characters: the problem still appears to be over-constrained.

For example, say you pick a mapping for the first label:

S: 6  1  o  e  7  a  9
R: d  r  e  ba an a  l
61oe7a9 -> drebaanal -> aldebaran

then you want to apply this to the next label, “8oay9”. You already have the cipher for symbols o, a and 9: they translate to e, a and l respectively, so you know that the star name must contain e, a and l. Since 8 and y can each stand for one or two letters, the star name must be between 5 and 7 characters long, and you just need to pick suitable letters for 8 and y.

Consulting a list of ~300 star names you find the following possibilities.

Alkes, Algedi, Alheka, Alhena, Elnath, Lesath, Alcyone, Algebar, Algorel, Ascella, Capella, Eltanin and Sheliak

(Since writing this, Don Latham sent me a link to his list of star names at , which I have simplified by removing duplicates and non-alphabetic symbols, and put here: )
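
The lookup for the second label can be sketched as a simple filter. This is only an illustration: `STAR_NAMES` is a small hand-picked sample, not the ~300-name list, and the helper name `candidates` is my own.

```python
from collections import Counter

# Small illustrative sample, NOT the full ~300-name list used in the post.
STAR_NAMES = ["alkes", "algedi", "alcyone", "capella", "sheliak",
              "vega", "sirius", "aldebaran", "deneb", "rigel"]

def candidates(names, required, min_len, max_len):
    """Names of an allowed length that contain every required letter."""
    need = Counter(required)
    return [n for n in names
            if min_len <= len(n) <= max_len
            and not (need - Counter(n))]  # empty difference: nothing required is absent

# Letters already fixed by the first label: o -> e, a -> a, 9 -> l.
print(candidates(STAR_NAMES, "eal", 5, 7))
```

With this sample the filter keeps alkes, algedi, alcyone, capella and sheliak, and rejects the rest for length or for lacking one of e, a, l.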

Choosing Alcyone (for reasons which will be obvious to some), you now have an extended mapping that includes 8 and y:

S: 6  1  o  e  7  a  9  8  y
R: d  r  e  ba an a  l  cy on
61oe7a9 -> drebaanal -> aldebaran
8oay9 -> cyeaonl -> alcyone

So far so good. Now we come to the third label: “oae1coe” which we can decipher all but one symbol of, to: “eabar?eba”.

Unhappily, there are no 9 or 10 letter star names in the dictionary that can match this letter assortment, and so this eliminates our initial choice of mapping that gave us “aldebaran” for the first word and “alcyone” for the second, because it cannot produce a dictionary word for the third label. Back to square one.
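
The three steps just described (substitute, check for an anagram, test a partly deciphered label) can be sketched as follows. The helper names and the `could_match` logic are my own reading of the procedure, and `SAMPLE_NAMES` is a tiny sample rather than the real dictionary, so the loop at the end only illustrates the check.

```python
from collections import Counter

# Trial mapping from the text; a symbol may stand for one or two letters.
MAPPING = {"6": "d", "1": "r", "o": "e", "e": "ba", "7": "an",
           "a": "a", "9": "l", "8": "cy", "y": "on"}

def decipher(label, mapping):
    """Substitute each symbol; symbols with no mapping yet become '?'."""
    return "".join(mapping.get(sym, "?") for sym in label)

def is_anagram(a, b):
    return Counter(a) == Counter(b)

def could_match(partial, name, max_unknown_len=2):
    """Could `name` be an anagram of `partial` once every '?' is replaced
    by between 1 and `max_unknown_len` letters?"""
    known = Counter(partial.replace("?", ""))
    unknowns = partial.count("?")
    if known - Counter(name):      # some known letter is not available in the name
        return False
    extra = len(name) - sum(known.values())
    return unknowns <= extra <= unknowns * max_unknown_len

print(decipher("61oe7a9", MAPPING), is_anagram(decipher("61oe7a9", MAPPING), "aldebaran"))
print(decipher("8oay9", MAPPING), is_anagram(decipher("8oay9", MAPPING), "alcyone"))

third = decipher("oae1coe", MAPPING)
SAMPLE_NAMES = ["aldebaran", "betelgeuse", "fomalhaut", "bellatrix"]  # sample only
print(third, [n for n in SAMPLE_NAMES if could_match(third, n)])
```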

You can keep on doing this: selecting mappings, trying them out on the first two or three labels, and finding that there is no fit to the dictionary.

Although at first sight the use of anagrams in the cipher seems to allow a lot of flexibility, in practice, if an anagram of every deciphered word is required to appear in a dictionary, that severely constrains the solution space.

Allowing Missing Characters

Looking again at the 12 star labels on f68r3:

61oe7a9 8oay9 oae1coe oh1o89 8ayaee ok98* oK98 ohoe19 1G9 179,9h9 ohos okoy9

There are 16 different VMs symbols used:

 o 9 1 e a 8 h y 7 k 6 c * K G s

Let’s assume that each of the VMs symbols maps to a single plaintext alphabet letter. Which 16 of the 26 letters in the alphabet will we choose to map? Let’s look at our list of star names and find the 16 most used alphabet letters:

 a i r e l s h n u t b k m d o c

(shown in order of frequency). How many different ways can we map the 16 VMs symbols to these 16 letters? That is:

Factorial 16 = 16! = 20,922,789,888,000

which is about 21 trillion (give or take), and is quite a lot. To explore this huge space of possibilities, we can use a Monte Carlo method. Basically we write a program to shuffle the 16 alphabet letters, look to see how that mapping works, then shuffle again, and so on. As we go, we keep track of the best mapping found so far.

Suppose we have the following mapping (cipher):

S: o 9 1 e a 8 h y 7 k 6 c * K G s
R: a e n c i l h r o s u d k t b m

Let’s go through the f68r3 star labels and convert them to plaintext using the cipher. After we convert them, let’s look the plaintext characters up in our list of star names using a “fuzzy match”. Here are the results:

61oe7a9 -> unacoie -> ?
8oay9 -> laire -> albireo (bo)
oae1coe -> aicndac -> ?
oh1o89 -> ahnale -> alhena ()
8ayaee -> liricc -> ?
ok98* -> aselk -> sheliak (hi)
oK98 -> atel -> elnath (nh)
ohoe19 -> ahacne -> achernar (rr)
1G9 -> nbe -> deneb (de)
1799h9 -> noeehe -> ?
ohos -> aham -> hamal (l)
okoy9 -> asare -> antares (nt)

Looking at the second label as an example, this is converted to plaintext characters “laire”. The fuzzy match looks up “laire” in the star names list, and finds a match with the star called “albireo”.

Fuzzy Match Rules

There is a match between the deciphered plaintext word and a dictionary word if

  1. All the letters in the plaintext word appear in the dictionary word
  2. There are no more than N missing letters in the plaintext word

In the example shown above, N is 2, so “laire” fuzzy matches “albireo”, with letters “bo” missing (shown in brackets).
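
A minimal sketch of these two rules, assuming letters are counted with multiplicity (which is what results such as “achernar (rr)” suggest). The function name and return convention are mine:

```python
from collections import Counter

def fuzzy_match(plain, name, max_missing=2):
    """Return the missing letters if `plain` fuzzy-matches `name`, else None."""
    have, want = Counter(plain), Counter(name)
    if have - want:                  # rule 1: a plaintext letter is absent from the name
        return None
    missing = want - have            # letters of the name the plaintext does not supply
    if sum(missing.values()) > max_missing:   # rule 2: too many missing letters
        return None
    return "".join(sorted(missing.elements()))

print(fuzzy_match("laire", "albireo"))    # bo
print(fuzzy_match("ahacne", "achernar"))  # rr
print(fuzzy_match("unacoie", "albireo"))  # None
```

An exact anagram returns an empty string, so callers should test the result against `None` rather than for truthiness.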


In the above example, 8 of the 12 VMs star labels have been matched to valid star names. The application continues exploring the 16! possible arrangements, trying to improve on the number of matches.
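
The search loop can be sketched like this. It is a reconstruction from the description in the text, not the original application: the function names are mine, the tricky tenth label is taken in the form “1799h9” used in the results listing, and `names` stands for whatever dictionary is supplied.

```python
import random
from collections import Counter

SYMBOLS = "o91ea8hy7k6c*KGs"   # the 16 VMs symbols used on f68r3
LETTERS = "airelshnutbkmdoc"   # the 16 most used letters in the star name list
LABELS = ["61oe7a9", "8oay9", "oae1coe", "oh1o89", "8ayaee", "ok98*",
          "oK98", "ohoe19", "1G9", "1799h9", "ohos", "okoy9"]

def matches(plain, name, max_missing=2):
    have, want = Counter(plain), Counter(name)
    return not (have - want) and sum((want - have).values()) <= max_missing

def score(mapping, names):
    """Number of labels whose decipherment fuzzy-matches at least one name."""
    total = 0
    for label in LABELS:
        plain = "".join(mapping[s] for s in label)
        if any(matches(plain, n) for n in names):
            total += 1
    return total

def search(names, iterations, seed=0):
    """Try random symbol-to-letter mappings and keep the best one seen."""
    rng = random.Random(seed)
    letters = list(LETTERS)
    best_score, best_map = -1, None
    for _ in range(iterations):
        rng.shuffle(letters)
        mapping = dict(zip(SYMBOLS, letters))
        s = score(mapping, names)
        if s > best_score:
            best_score, best_map = s, mapping
    return best_score, best_map

# The example mapping from the text, scored against the eight names it matched.
example = dict(zip(SYMBOLS, "aencilhrosudktbm"))
names = ["albireo", "alhena", "sheliak", "elnath",
         "achernar", "deneb", "hamal", "antares"]
print(score(example, names))
```

Random shuffling revisits mappings occasionally, but with 16! possibilities and a few tens of millions of samples that makes no practical difference.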

After looking at around 70 million arrangements (i.e. about 3 millionths of them), it finds this:

Iteration 67334517 Deciphered=9/12
S: o  9  1  e  a  8  h  y  7  k  6  c  *  K  G  s
R: a  e  l  b  r  h  c  d  o  t  i  u  n  s  k  m
61oe7a9 -> ilabore -> borealis (s)
8oay9 -> harde -> schedar (sc)
oae1coe -> arbluab -> ?
oh1o89 -> aclahe -> alphecca (pc)
8ayaee -> hrdrbb -> ?
ok98* -> atehn -> elnath (l)
oK98 -> aseh -> scheat (ct)
ohoe19 -> acable -> cebalrai (ri)
1G9 -> lke -> alkes (as)
1799h9 -> loeece -> ?
ohos -> acam -> almach (lh)
okoy9 -> atade -> tarazed (rz)

Just how interesting/plausible/believable is this? We can make a control experiment by using a dictionary of dog breeds of about the same size: does mapping the VMs labels on f68r3 to star names produce a better fit than mapping the labels to dog breeds?

We run the program for a few million mappings, first with the star names, then with the dog breed names. After about 4 million mappings, the best mapping to dog breeds (9 out of 12) is slightly better than the best mapping to star names (8/12):


Iteration 3779182 Deciphered=9/12
S: o 9 1 e a 8 h y 7 k 6 c * K G s
R: e r o n h l b s d c a p t u i g
61oe7a9 -> aoendhr -> rhodesian (si)
8oay9 -> lehsr -> charles (ca)
oae1coe -> ehnopen -> ?
oh1o89 -> eboelr -> boerboel (bo)
8ayaee -> lhshnn -> ?
ok98* -> ecrlt -> central (na)
oK98 -> eurl -> tulear (ta)
ohoe19 -> ebenor -> redbone (d)
1G9 -> oir -> corgi (cg)
1799h9 -> odrrbr -> ?
ohos -> ebeg -> beagle (al)
okoy9 -> ecesr -> crested (td)


Iteration 4208178 Deciphered=8/12
S: o 9 1 e a 8 h y 7 k 6 c * K G s
R: a i e c o l b s t h n u r d k m
61oe7a9 -> neactoi -> ?
8oay9 -> laosi -> polaris (pr)
oae1coe -> aoceuac -> ?
oh1o89 -> abeali -> algieba (g)
8ayaee -> losocc -> ?
ok98* -> ahilr -> alphirk (pk)
oK98 -> adil -> alkaid (ka)
ohoe19 -> abacei -> cebalrai (lr)
1G9 -> eki -> keid (d)
1799h9 -> etiibi -> ?
ohos -> abam -> markab (rk)
okoy9 -> ahasi -> nashira (nr)

Of course, this doesn’t disprove the idea that the labels on f68r3 are in fact star names: it may well be that one of the 16! combinations produces a perfect fit, with “61oe7a9” deciphered to “aldebaran”, “8oay9” deciphered to “alcyone”, etc.

Character Position Based Cipher

Looking just at the star labels in Folio 68r, we can extract the letter/symbol frequencies as a function of position in the label.

Here’s what they look like:

1: o 1 8 4 9 k 2 A c y 7 W s ? G e 6 g ▐ h
2: o h k 1 c a 8 e y j C 9 K H 7 2 s u m J + f G ┘ W
3: o c 1 9 h a C e y 8 k K d U + 2 * s H n ª I 6 m Z J
4: o c 9 e a 8 y 1 C h i s k 7 A Q H 2 m J 3 ? d
5: 9 o 8 e y 1 c a m h s K * C 7 S 5 2
6: 9 e 8 a 1 y o 7 c s m i
7: 9 e a c y I o m p 1
8: 9 8 e y 7
9: a 9 e

In other words, the most likely symbol to find in the first position of a VMs star label is “o” (this is the Voyn_101 encoding). Second most likely is “1”, and so on. For the second symbol in the label, the most likely is again “o”, then “h”, and so on.

This is another take on the Crust/Mantle business.

Clearly, the symbols at the ends of the words have a completely different distribution to those at the front.
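
A table of this kind can be produced with a few lines of counting. The sketch below is an assumption about the method, and it uses only the twelve f68r3 labels as input, so its output will not reproduce the table above, which was built from all the star labels on Folio 68r.

```python
from collections import Counter

LABELS = ["61oe7a9", "8oay9", "oae1coe", "oh1o89", "8ayaee", "ok98*",
          "oK98", "ohoe19", "1G9", "1799h9", "ohos", "okoy9"]

def positional_ranking(words):
    """For each position, symbols ordered from most to least frequent."""
    longest = max(len(w) for w in words)
    ranking = []
    for pos in range(longest):
        counts = Counter(w[pos] for w in words if len(w) > pos)
        ranking.append([sym for sym, _ in counts.most_common()])
    return ranking

for pos, row in enumerate(positional_ranking(LABELS), start=1):
    print(f"{pos}: {' '.join(row)}")
```

Symbols with equal counts come out in the order they were first seen, so the ranking of ties is arbitrary.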

Compare the VMs star labels with the English star names from Don Latham’s list of star names (simplified):

1: a  m  s  k  h  d  r  t  e  n  b  f  z  j  p  u  c  g  i  w  v  l  o  y
2: a  l  u  i  e  h  c  r  s  d  n  o  t  z  k  w  b  g  j  m  f  y  x
3: a  r  h  i  b  n  s  m  l  t  u  d  k  g  c  e  f  z  w  j  y  p  o  v
4: a  i  e  r  h  b  d  m  l  n  u  s  k  c  f  o  t  z  w  g  p  j  y  v
5: a  i  r  l  n  e  t  h  k  b  s  m  u  d  c  f  g  y  o  z  j  w  p
6: a  h  r  i  e  t  l  n  m  b  s  d  u  c  k  z  o  y  g  p  f  j  v
7: a  e  h  n  i  r  l  b  s  m  t  o  y  d  c  u  k  z  g  f  w  v
8: a  i  h  n  e  c  t  l  s  b  o  u  k  r  f  z  m  g  y  p
9: h  a  t  n  i  e  s  u  r  o  g  z  k  d  p  c  v  m  b

So the plaintext has a quite different letter distribution: “a” is the most likely letter in every position except the ninth, where “h” leads.

A hypothesis is that there is a different mapping being used depending on the character position in the word. E.g. if I am enciphering “aldebaran” I would use one mapping for the first “a”, a different mapping for the “l” and so on.

If we also allow a VMs symbol to map to two plaintext characters, we can generate a “likely” mapping as follows:

1T: a  al m  s  k  h  d  sa r  t  ha e  n  ma b  ka f  mi z  j
1S: o  1  8  4  9  k  2  A  c  y  7  W  s  ?  G  e  6  g  ▐  h 

2T: a  l  u  i  e  h  ar ha c  r  ab la al ai s  d  lg ub as n
2S: o  h  k  1  c  a  8  e  y  j  C  9  K  H  7  2  s  u  m  J 

3T: a  r  h  i  b  n  s  m  l  t  u  d  k  g  c  e  ra ha f  z
3S: o  c  1  9  h  a  C  e  y  8  k  K  d  U  +  2  *  s  H  n 

4T: a  i  e  r  h  b  d  m  l  n  u  s  ra k  at ar al ha en c
4S: o  c  9  e  a  8  y  1  C  h  i  s  k  7  A  Q  H  2  m  J 

5T: a  i  r  l  n  e  t  h  k  ah b  s  m  u  ar ra d  c  at f
5S: 9  o  8  e  y  1  c  a  m  h  s  K  *  C  7  S  5  2 

6T: a  h  r  i  e  t  l  n  m  b  ah s  d  u  c  k  an la ab ar
6S: 9  e  8  a  1  y  o  7  c  s  m  i 

7T: a  e  h  n  i  r  l  b  s  m  t  ah o  y  d  c  u  k  at an
7S: 9  e  a  c  y  I  o  m  p  1 

8T: a  i  h  n  e  c  t  l  s  in b  o  u  ch k  r  f  et at z
8S: 9  8  e  y  7 

9T: h  a  t  n  i  e  s  u  r  ha he o  to g  z  ze ab th ra k
9S: a  9  e

(where “T” labels the plaintext alphabet, and “S” labels the source VMs alphabet).

Using this set of mappings, we can translate all the labels to plaintext. The results are disappointing: there are no matches in the list of star names to the deciphered labels, even allowing anagrams.

But we can then start shuffling the mapping order around for each character position, and use a Monte Carlo approach to see what we come up with. This is what I am trying now. It’s not clear to me that I should allow anagrams in the solutions, since shuffling the letters destroys the positional ordering that the tables above assume.

  1. Robert Teague
    April 23, 2010 at 9:05 am

    This went off on a tangent from my work pretty early on.

    GC’s transcription of the label is: 6 1 o e 7 a 9 , but I disagree. I think it is 8 1 o e 8 a 9. Look closely at the label in the .sid file and compare with the transcription alphabet letters and decide for yourself.

    You’ve made the assumption the star name has to be spelled completely, and since the label is two letters short, a Voynich letter has to stand for two plaintext letters. I discarded that last idea quickly.

    Here is how Aldebaran maps (after a lot of experimenting):

    Voy-101: 8 1 o e 8 a 9
    Plain: a e l d a b n

    The letter R and an A are left out.

    As far as the anagramming is concerned, I won’t go into detail here, but Philip Neal’s discovery that only certain letters are allowed as word final explains much of it.

    The star closest to the center in the upper right pie slice is Algol, and it maps as follows:

    Voy-101: o h o s
    Plain: l g l o

    This leaves off the initial A and follows the final letter rules.

    By Neal’s rules, there are four letters with one substitute each, and the four gallows letters substitute with each other. (This ignores EVA q, which has seven values and is word initial.) This is a total of twelve, and the remaining letters should have one value each.

    Given that the above mapping is correct, the letter substitution rules show that O and D are related, as well as B and L.

    At this point the “degrees of freedom” objection is usually raised. Neal’s rules provide restrictions, as well as my own rule: Only one value can be used for any particular letter in a word. For example, if a word has one Voy-101 o, it will have a value of either B or L. Any plaintext word containing both is invalid.

    Of the twelve labels on f68r3, eight have been successfully decoded with the values above.

    The remaining four had to be done from scratch, which apparently confirms the suggestion that more than one set of letter values is used.

    • JB
      April 23, 2010 at 2:01 pm

      Hi Robert … I agree with your transcription to 81oe8a9. Since writing the above, I have become convinced that the Voyn_101 transcription is too liberal with its glyph assignments. I think all the “6”s and “7”s are in fact “8”s. I’m now using a “simplified” transcription which I detail in a more recent post.

      So I should re-run my tests with this simplification. Referring to f68r3, you may have seen that I favour the two stars in one segment as being Castor and Pollux!

  2. March 26, 2011 at 7:55 am

    You're close…. here are some anagram items you missed as well as some other items to be considered.. anagrams yes, multiple languages yes, mirror anagrams yes, sentence or paragraph no.
    Here is an example of a simple sentence I have put together from the learning I have gotten from the Voynich.
    Tell me what it says, or email me and I will show you what it says and how the basics work. You're on the right track, you just missed some important variables…
    Shhhhhh the adept keeps quite…
    Here is the example:
    Flatroll… “Words hark” , “Yaw Only”… “Such torts” NoN “be cats place” !
    If you can decipher that one, you can work hard and figure out the Voynich….
    Just remember mirrors, very Important fact…

    • JB
      March 26, 2011 at 9:38 am

      Thanks for your comment.

  3. March 26, 2011 at 11:27 am

    make the 8 you see by writing a cursive “d” with the top open, as a child learning would…
