
How about a “Verbose Homophonic cipher”?

September 24, 2010

I’ve had a bit of a hiatus from the VMs, but it’s always popping up in my mind and niggling me, even when I haven’t got time to spend on it. The latest niggle was the idea that the VMs scribe used a set of simple tables showing how to convert plaintext letters into codes. So, in an example table, letter “A” is written “4oh”, letter “B” is written “8am” and so on. Spaces in the plaintext also have their own code. Veteran VMs researcher Philip Neal informed me that this is called a “verbose homophonic cipher”.

Elaborating on the idea: the scribe uses one of a set of tables for each folio s/he is writing. To encipher the plaintext onto the folio, it’s simply a matter of writing down the VMs “word” for each letter in the plaintext word. If there is room on the line for the next plaintext word, the scribe writes down the code for a space, and then the codes for the letters of the next word. Long spaces are produced by writing the code for a space more than once. When the line is full, the next word goes on the next line, and so on.

On the next folio, a different table may be used.

It’s hard to imagine the justification for such a scheme, but it does appear (at least initially) to fit some features of the VMs script, especially the many VMs words that repeat.

I ran a quick test on the VMs word frequencies of a single folio (in the Recipes section, which has the densest text). The word frequency distribution looks similar to the letter frequency distribution of Latin, apart from the most frequent word. That word is much more frequent than the rest, and under this hypothesis it would be the code for a space.

However, a typical folio usually contains many more different VMs words than there are letters in the plaintext alphabet. So the scheme has to be extended to let the scribe choose between several different VMs words to encode a single letter, which is what makes it homophonic. Each table must therefore list a set of VMs words under each plaintext letter column, something like this:

Plaintext (space) a b
VMs words 8am ay okoe 4ohoe 2ay 1coe faiis 4ay oka
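A minimal Python sketch of the scheme follows. It uses a hypothetical table built from the example words above, and the grouping of three words per column is a guess. Each plaintext letter or space is replaced by one of its VMs words, chosen at random. Because every VMs word belongs to exactly one column, deciphering is a simple reverse lookup.

```python
import random

# Hypothetical table: assigning the example words three per column is an
# assumption made for illustration only.
TABLE = {
    " ": ["8am", "ay", "okoe"],
    "a": ["4ohoe", "2ay", "1coe"],
    "b": ["faiis", "4ay", "oka"],
}

def encipher(plaintext, table, rng):
    """Replace each plaintext letter (and each space) by one of its VMs words."""
    return [rng.choice(table[ch]) for ch in plaintext]

def decipher(vms_words, table):
    """Invert the table: every VMs word identifies exactly one plaintext symbol."""
    reverse = {word: ch for ch, words in table.items() for word in words}
    return "".join(reverse[w] for w in vms_words)

rng = random.Random(1)
cipher = encipher("ab ba", TABLE, rng)
print(" ".join(cipher))         # five VMs words, varying with the seed
print(decipher(cipher, TABLE))  # ab ba
```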

If this is indeed the scheme, one would expect to see patterns in the VMs word sequences that match patterns seen in the letter sequences of e.g. Latin words. Also, as Philip Neal pointed out, patterns like “word1 word2 word2 word1” would indicate a plaintext letter sequence of either “vowel consonant consonant vowel” or vice versa.

Looking through the whole of the VMs for sequence patterns (on the same line of text), I found the following:

  • There are no four-word sequences that repeat at all
  • There are only four three-word sequences that repeat, and each occurs only twice
  • There are no sequences at all of the form “xyyx”
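The three checks above can be sketched in a few lines of Python. The sketch makes three assumptions:

- The transcription has been split into manuscript lines, with VMs words separated by spaces.
- “xyyx” means two different words x and y.
- The toy lines in the example are made up for illustration; they are not taken from the VMs.

```python
from collections import Counter

def repeated_sequences(lines, n):
    """n-word sequences (within a single line) that occur more than once."""
    counts = Counter()
    for line in lines:
        words = line.split()
        for i in range(len(words) - n + 1):
            counts[tuple(words[i:i + n])] += 1
    return {seq: c for seq, c in counts.items() if c > 1}

def xyyx_patterns(lines):
    """Runs of four words of the form x y y x, with x different from y."""
    hits = []
    for line in lines:
        w = line.split()
        for i in range(len(w) - 3):
            if w[i] == w[i + 3] and w[i + 1] == w[i + 2] and w[i] != w[i + 1]:
                hits.append(tuple(w[i:i + 4]))
    return hits

toy_lines = ["8am oe oe 8am ay", "ay 8am oe"]  # made-up example lines
print(repeated_sequences(toy_lines, 2))  # {('8am', 'oe'): 2}
print(xyyx_patterns(toy_lines))          # [('8am', 'oe', 'oe', '8am')]
```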

(all of which I find rather surprising, and thought-provoking).

So it looks like this hypothesis is dead in the water, and can be ticked off that long list of “things it might have been but in fact don’t fit”!

(It turns out that Elmar Vogt has been working on a related but more sophisticated idea, which he describes on his blog and calls the “Stroke Theory”.)


Prefix, Stem and Suffix Analysis

February 26, 2010

I grouped all the folios from f1v to f20v inclusive and labeled the group “Herbal folios”, and labeled folios f103r to f116r inclusive “Recipe folios”. I ran each group through a program that extracts all the prefixes, suffixes and stems, validates each, and orders them by frequency. (The method used was described in an earlier email to the list.) My first question was: are the word frequencies and prefix/stem/suffix (PSS) frequencies similar between the Herbal and Recipe collections?

Here are the results. I’ll show only the suffix frequencies, because they are the most interesting.

Herbal: 1331 different words, top 10 words: "8am 1oe 1oy K9 89 19 s 8ay 2oe oy" 
Recipe: 1443 different words, top 10 words: "am ay 1c89 oe 4ohC9 8am oy 4oham 1c9 2c89" 

Top 10 Herbal Suffixes  (Frequency) 

9       0.105580695 
89      0.065862246 
y       0.06435395 
e       0.058320764 
am      0.04524887 
m       0.03167421 
s       0.027652087 
19      0.025641026 
8       0.023629965 
oy      0.02212167 

Top 10 Recipe Suffixes 

9       0.11764706 
e       0.05882353 
89      0.05020284 
y       0.04817444 
am      0.036511157 
8       0.029411765 
ay      0.028904665 
ae      0.024340771 
oy      0.023326572 
oe      0.021805273 

Note: the two sets are similar (7 of the 10 suffixes are shared). In both, the suffix “9” is more common than the next most common suffix by a factor of roughly 1.6 (Herbal) to 2 (Recipe). I’m not sure what conclusions, if any, can be drawn from this. For fun, I applied the same analysis to a similar number of words from Augustinus (Latin). Here are the results, together with the VMs data:

(Augustinus: 1257 different words, top 10 words: "et te in non me mihi est domine ut enim") 

Top 10 Latin Suffixes 

m       0.118421055 
s       0.10526316 
e       0.047368422 
que     0.039473683 
i       0.034210525 
o       0.028947368 
t       0.028947368 
us      0.02631579 
rum     0.021052632 
a       0.021052632 

So Latin does not have the same frequency pattern at all. Is there a language that does have a similar pattern? I looked at French from 1367, Spanish from 1527, German from 1553, and old English (Courtier):

Top 10 French Suffixes 

s    0.1199 
t    0.0736 
z    0.0708 
e    0.0654 
nt    0.0463 
es    0.0436 
l    0.0245 
r    0.0218 
re    0.0191 
er    0.0191 
tre    0.0163 

Top 10 Spanish Suffixes 

s    0.1874 
n    0.0519 
o    0.0464 
a    0.0445 
r    0.0297 
do    0.0297 
es    0.0241 
l    0.0223 
e    0.0223 
va    0.0204 
to    0.0148 

Top 10 German Suffixes 

en    0.1171 
t    0.1171 
s    0.1122 
n    0.0537 
er    0.0390 
ten    0.0341 
d    0.0341 
e    0.0293 
m    0.0293 
ts    0.0244 
r    0.0244 

Top 10 English Suffixes 

e    0.1404 
n    0.0449 
s    0.0421 
t    0.0393 
re    0.0337 
y    0.0281 
ne    0.0253 
l    0.0253 
r    0.0253 
ll    0.0253 
ed    0.0225 

The Spanish suffix “s” is three times more frequent than the next suffix, so it is not a good match to the VMs. The same is true of the English “e”. The German suffix pattern is completely different from the VMs. The French pattern looks similar to the VMs. Let’s look at the French stems and compare them with the VMs:

Top 10 Herbal Stems 

o       0.15171504 
9       0.058377307 
8       0.045184698 
k       0.04287599 
1o      0.040567283 
oe      0.036609497 
o8      0.028364116 
oy      0.026385223 
y       0.02176781 
2       0.02176781 

Top 10 French Stems 

a       0.0704 
d       0.0544 
es      0.0528 
en      0.0448 
le      0.0432 
se      0.032 
ent     0.0304 
de      0.0272 
ce      0.0272 
ne      0.0256 

A poor match.

Conclusion: the “9” suffix in the VMs appears too frequently for it to come from Latin, German, English or Spanish. Although French has a similarly frequent suffix “s”, the stem frequencies of French don’t match the VMs.

Hypothesis: the “9” suffix in the VMs is not a word suffix, but punctuation or some other annotation. Perhaps a key mark for deciphering purposes. Next step: re-analyse the PSS frequencies in the VMs after removing suffix “9” from words where it appears.

Using the Biological and Astrological Folios

Astrological: folios 66v to 73v inclusive

Biological: folios 75r to 85r inclusive

Herbal: 1331 different words,       top 10 words: "8am 1oe 1oy K9 89 19 s 8ay 2oe oy"
Recipe: 1443 different words,       top 10 words: "am ay 1c89 oe 4ohC9 8am oy 4oham 1c9 2c89" 
Astrological: 1771 different words, top 10 words: "ay am ae 8am s 8ay 8ae 89 okcos ohC9" 
Biological: 2135 different words,   top 10 words: "oe 4ohan 1c89 2c89 4ohc89 4oe 4ohae 1c9 4oham" 

Top 10 Herbal Suffixes  (Frequency) 

9       0.105580695 
89      0.065862246 
y       0.06435395 
e       0.058320764 
am      0.04524887 
m       0.03167421 
s       0.027652087 
19      0.025641026 
8       0.023629965 
oy      0.02212167 

Top 10 Recipe Suffixes 

9       0.11764706 
e       0.05882353 
89      0.05020284 
y       0.04817444 
am      0.036511157 
8       0.029411765 
ay      0.028904665 
ae      0.024340771 
oy      0.023326572 
oe      0.021805273 

Top 10 Astrological Suffixes 

9       0.120173536 
89      0.055531453 
am      0.046420824 
ay      0.04381779 
s       0.04295011 
ae      0.04251627 
e       0.040347073 
79      0.026898047 
y       0.022993492 
oe      0.022125814 

Top 10 Biological Suffixes 

9       0.11961975 
89      0.049643517 
e       0.038288884 
oe      0.031687353 
y       0.030102983 
c89     0.029838923 
ae      0.0293108 
c9      0.0293108 
oy      0.02719831 
ay      0.02508582 

The suffix frequency results for the different folio groups look reassuringly similar to me: the differences are what you would expect when comparing two modestly sized samples of, say, English. One can therefore tentatively conclude that the language is the same in all four VMs sections. On the other hand, the top 10 word lists are quite different. Curious.
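To put a rough number on “reassuringly similar”, one can count how many suffixes each pair of top 10 lists has in common. This is only a crude measure, because it ignores the frequencies themselves. The lists below are copied from the four tables above.

```python
from itertools import combinations

# Top 10 suffixes per folio group, copied from the tables above.
TOP10 = {
    "Herbal": ["9", "89", "y", "e", "am", "m", "s", "19", "8", "oy"],
    "Recipe": ["9", "e", "89", "y", "am", "8", "ay", "ae", "oy", "oe"],
    "Astrological": ["9", "89", "am", "ay", "s", "ae", "e", "79", "y", "oe"],
    "Biological": ["9", "89", "e", "oe", "y", "c89", "ae", "c9", "oy", "ay"],
}

def shared_counts(top_lists):
    """Number of suffixes shared by each pair of top 10 lists."""
    return {
        (a, b): len(set(top_lists[a]) & set(top_lists[b]))
        for a, b in combinations(top_lists, 2)
    }

for (a, b), n in shared_counts(TOP10).items():
    print(f"{a} / {b}: {n} of 10 shared")
```

With these lists, the pairs share between 5 (Herbal/Biological) and 8 (Recipe/Astrological, Recipe/Biological) of their top 10 suffixes.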

Regarding word stems: the definition of a word stem for this study is “any group of characters that spells a valid word by itself, and is also found following one or more other characters (a prefix) and/or followed by one or more other characters (a suffix).” So single VMs characters can be stems. After all, a single VMs character may equate to multiple plaintext characters, so we need the flexibility to treat single characters as stems.

To clarify, take for example the VMs word “8am”. The candidate stems are “8am”, “8a”, “am”, “8”, “a” and “m”. Those candidates that appear as single words in the VMs dictionary are classed as valid stems (in this case, I believe all six are valid stems).

Once we have a list of all the valid stems in the text, we can count how often each appears and then order that list. This is how the lists above were obtained.

Because this method is fully general, we avoid any assumptions about how many characters a single VMs character maps to.
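Here is a minimal sketch of the stem counting described above. It makes three assumptions:

- The “dictionary” is simply the set of distinct words in the sample.
- A candidate counts as a stem only if it is a dictionary word that also occurs inside some longer word.
- Each word contributes at most once per stem.

The function names are hypothetical, and this is not the program used to produce the tables.

```python
from collections import Counter

def candidate_stems(word):
    """All distinct contiguous substrings: '8am' -> 8, 8a, 8am, a, am, m."""
    return {word[i:j] for i in range(len(word)) for j in range(i + 1, len(word) + 1)}

def stem_frequencies(words):
    """Normalised frequency of each valid stem (under the assumptions above)."""
    vocabulary = set(words)
    # Substrings that occur with a prefix and/or suffix inside some word.
    embedded = set()
    for word in vocabulary:
        embedded |= candidate_stems(word) - {word}
    valid = vocabulary & embedded
    counts = Counter()
    for word in words:
        for stem in candidate_stems(word) & valid:
            counts[stem] += 1
    total = sum(counts.values())
    return {stem: n / total for stem, n in counts.items()} if total else {}

print(sorted(candidate_stems("8am")))  # ['8', '8a', '8am', 'a', 'am', 'm']
```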


I changed the algorithm so that it only accumulates prefixes, stems and suffixes for unique words in the VMs (as opposed to accumulating them for all words). I think this is more sensible; otherwise a very popular word skews the statistics. After doing this, the suffix results look similar between Latin and the VMs (Recipes), using 3800 words:

Top 20 Latin Suffixes (from a Latin dictionary)

s 0.08350305
o 0.042769857
t 0.03971487
m 0.034623217
is 0.029531568
e 0.02749491
us 0.026476579
a 0.022403259
es 0.020366598
rum 0.01934827
um 0.018329939
tum 0.017311608
mus 0.017311608
to 0.017311608
i 0.01629328
tus 0.01629328
tis 0.015274949
c 0.014256619
em 0.013238289
am 0.013238289

Top 20 Herbal Suffixes

9 0.094210714
89 0.045487236
e 0.040273283
ay 0.036857247
y 0.036857247
am 0.03613808
ae 0.029126214
an 0.028047465
oe 0.024631428
79 0.023552679
oy 0.023013305
8 0.023013305
o 0.020316433
ap 0.019417476
c89 0.018878102
c9 0.017979145
s 0.017799353
m 0.015462064
o89 0.014383315
19 0.01366415
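A small sketch can illustrate the effect of counting over unique words. It assumes that a suffix is any tail of a word left after a non-empty prefix, and that a suffix is valid if it also occurs as a word by itself. The toy token list is made up. When a word such as “8am” repeats, counting over all tokens lets its suffix dominate, and counting over distinct words removes that effect.

```python
from collections import Counter

def suffix_frequencies(tokens, distinct=True):
    """Normalised frequencies of valid suffixes (assumed validity rule).

    With distinct=True each different word contributes once, so a very common
    word cannot dominate the counts.
    """
    vocabulary = set(tokens)
    words = vocabulary if distinct else tokens
    counts = Counter()
    for word in words:
        for i in range(1, len(word)):
            if word[i:] in vocabulary:
                counts[word[i:]] += 1
    total = sum(counts.values())
    return {s: c / total for s, c in counts.items()} if total else {}

tokens = ["8am", "8am", "8am", "oy", "1oy", "y", "am"]  # made-up sample
print(suffix_frequencies(tokens, distinct=False))  # "am" dominates
print(suffix_frequencies(tokens, distinct=True))   # "y" is now the top suffix
```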

This suggests the following (partial) cipher:

VMs Latin
=== =====
9 s
8 i
7 u
e m
a r
o a
y um
m is

1 t 
4 qu
c e
g f
k c
2 d
s p
h n
3 h

Top 20 VMs words translated

am -> ris
ay -> rum
ae -> rm
1c89 -> teis
4ohC9 -> quan?s
1c9 -> tes
oe -> am
4oham -> quanris
8am -> iris
4ohan -> quanr?
oham -> anris
okam -> acris
oy -> aum
an -> r?
ohan -> anr?
e -> m
2c89 -> deis
1c79 -> teus
ohC9 -> an?s
okay -> acrum
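The translations above follow mechanically from the cipher table. This small sketch assumes that each VMs symbol is replaced independently and case-sensitively, so “C” is distinct from “c”. Symbols the table does not cover become “?”.

```python
# Partial cipher from the table above (VMs symbol -> Latin letters).
CIPHER = {
    "9": "s", "8": "i", "7": "u", "e": "m", "a": "r", "o": "a", "y": "um",
    "m": "is", "1": "t", "4": "qu", "c": "e", "g": "f", "k": "c", "2": "d",
    "s": "p", "h": "n", "3": "h",
}

def translate(vms_word, cipher=CIPHER):
    """Replace each symbol by its Latin letters; unknown symbols become '?'."""
    return "".join(cipher.get(symbol, "?") for symbol in vms_word)

for word in ["am", "1c89", "4ohC9", "okay"]:
    print(word, "->", translate(word))
# am -> ris, 1c89 -> teis, 4ohC9 -> quan?s, okay -> acrum
```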

Looking for longer repeating character sequences

In this analysis, the software looks in the text for all n-grams that appear at least twice as (a) a prefix or (b) a suffix, or at least once as a stem, and calculates their normalised frequencies. I’m not sure what to make of the results!

For N=3, looking at the Herbal folios f1v-f20v inclusive, 1331 different words. 

Confirmed valid prefix/stem/suffix counts 99 252 111 
Prefix/Stem/Suffix frequency, normalised 
4ok     0.1010101               o89     0.05952381              o89     0.09009009 
4oh     0.07070707              1oe     0.055555556             8am     0.09009009 
1oe     0.060606062             4ok     0.055555556             1c9     0.054054055 
1oh     0.04040404              8am     0.04761905              1oy     0.054054055 
ok1     0.04040404              4oh     0.04761905              1oe     0.045045044 
8oe     0.030303031             1oy     0.03968254              coe     0.036036037 
1oy     0.030303031             1c9     0.031746034             cc9     0.027027028 
1co     0.030303031             1co     0.023809524             e89     0.027027028 
1ok     0.030303031             8oe     0.023809524             ham     0.027027028 
4oj     0.030303031             coe     0.01984127              2c9     0.027027028 

For N=3, processing the same number of different words from Thomas Hardy (English) 

Confirmed valid prefix/stem/suffix counts 87 160 67 
Prefix/Stem/Suffix frequency, normalised 
com     0.04597701              ely     0.025           ing     0.07462686 
par     0.022988506             ted     0.025           led     0.04477612 
rea     0.022988506             led     0.025           sed     0.04477612 
mot     0.022988506             sed     0.025           ely     0.04477612 
pla     0.022988506             ght     0.025           ted     0.029850746 
see     0.022988506             ing     0.01875         ter     0.029850746 
pas     0.022988506             ked     0.01875         son     0.029850746 
wai     0.022988506             per     0.01875         ned     0.029850746 
can     0.022988506             com     0.01875         ner     0.029850746 
smi     0.022988506             par     0.01875         mon     0.029850746 

For N=3, same number of words from Augustinus (Latin) 

Confirmed valid prefix/stem/suffix counts 102 197 83 
Prefix/Stem/Suffix frequency, normalised 
qua     0.039215688             ere     0.05076142              ere     0.04819277 
fac     0.029411765             qua     0.035532996             iat     0.04819277 
qui     0.029411765             fac     0.02538071              que     0.036144577 
dic     0.029411765             ita     0.02538071              ius     0.036144577 
pot     0.029411765             ius     0.02538071              ita     0.036144577 
ter     0.019607844             que     0.020304568             rum     0.024096385 
ali     0.019607844             dic     0.020304568             ent     0.024096385 
aud     0.019607844             ini     0.020304568             ram     0.024096385 
par     0.019607844             ans     0.015228426             unt     0.024096385 
cor     0.019607844             ent     0.015228426             ris     0.024096385 

For N=4 Voynich (statistics become poorer as N increases, of course) 

Confirmed valid prefix/stem/suffix counts 6 14 6 
Prefix/Stem/Suffix frequency, normalised 
4oko    0.16666667              o8ae    0.14285715              co89    0.16666667 
okam    0.16666667              okam    0.14285715              e8am    0.16666667 
oh2o    0.16666667              4ok1    0.071428575             o8an    0.16666667 
4okc    0.16666667              4oh1    0.071428575             e2oe    0.16666667 
k2co    0.16666667              co89    0.071428575             9koy    0.16666667 
4ohC    0.16666667              4oko    0.071428575             oKoy    0.16666667 
4ok1    0.0                     e8am    0.071428575             1o89    0.0 
4oh1    0.0                     oh2o    0.071428575             oe89    0.0 
ok1c    0.0                     o8an    0.071428575             o8ae    0.0 
ohoe    0.0                     4okc    0.071428575             ho89    0.0 

For N=4 English 

Confirmed valid prefix/stem/suffix counts 36 66 26 
Prefix/Stem/Suffix frequency, normalised 
pres    0.055555556             ined    0.045454547             sing    0.115384616 
dist    0.055555556             ring    0.045454547             ined    0.115384616 
weak    0.055555556             test    0.045454547             ally    0.07692308 
occa    0.055555556             ment    0.030303031             ring    0.03846154 
outl    0.027777778             pres    0.030303031             ence    0.03846154 
prob    0.027777778             sing    0.030303031             nded    0.03846154 
ment    0.027777778             weak    0.030303031             ding    0.03846154 
cons    0.027777778             prob    0.030303031             ning    0.03846154 
atte    0.027777778             hern    0.030303031             ness    0.03846154 
stan    0.027777778             sion    0.030303031             wing    0.03846154 

For N=4 Latin 

Confirmed valid prefix/stem/suffix counts 63 126 57 
Prefix/Stem/Suffix frequency, normalised 
faci    0.06349207              bant    0.03968254              ntes    0.0877193 
pecc    0.04761905              ntes    0.03968254              quam    0.05263158 
invo    0.031746034             faci    0.031746034             endo    0.05263158 
cred    0.031746034             pecc    0.031746034             ebam    0.03508772 
infa    0.031746034             endo    0.023809524             erem    0.03508772 
puer    0.031746034             ndis    0.023809524             iens    0.03508772 
habe    0.031746034             quam    0.023809524             ones    0.03508772 
form    0.031746034             quid    0.023809524             bant    0.01754386 
pare    0.031746034             rati    0.023809524             abam    0.01754386 
nesc    0.031746034             ibus    0.015873017             ndis    0.01754386 

For N=5 Voynich (no data satisfies selection) 

For N=5 English 

Confirmed valid prefix/stem/suffix counts 15 29 13 
Prefix/Stem/Suffix frequency, normalised 
consi   0.13333334              ation   0.06896552              ation   0.15384616 
ornam   0.13333334              consi   0.06896552              sting   0.15384616 
appea   0.06666667              ornam   0.06896552              dered   0.07692308 
dimen   0.06666667              sting   0.06896552              ality   0.07692308 
occup   0.06666667              still   0.06896552              ingly   0.07692308 
stand   0.06666667              dered   0.03448276              ental   0.07692308 
conce   0.06666667              ingly   0.03448276              rning   0.07692308 
sugge   0.06666667              dimen   0.03448276              ented   0.07692308 
diffe   0.06666667              occup   0.03448276              rence   0.07692308 
speci   0.06666667              ality   0.03448276              sions   0.07692308 

For N=5 Latin 

Confirmed valid prefix/stem/suffix counts 21 44 23 
Prefix/Stem/Suffix frequency, normalised 
volun   0.0952381               entes   0.06818182              entes   0.13043478 
pecca   0.0952381               batur   0.045454547             batur   0.08695652 
lauda   0.0952381               tibus   0.045454547             antur   0.08695652 
quaer   0.0952381               invoc   0.045454547             tibus   0.08695652 
metue   0.0952381               pecca   0.045454547             bamus   0.08695652 
invoc   0.04761905              lauda   0.045454547             torum   0.08695652 
infan   0.04761905              quaer   0.045454547             tatis   0.04347826 
inven   0.04761905              volun   0.045454547             itate   0.04347826 
nesci   0.04761905              metue   0.045454547             antes   0.04347826 
paren   0.04761905              bamus   0.045454547             bilis   0.04347826 

Here are the N=3 counts and frequencies for the 1331 unique words in f1v-f20v of the Herbal section:

Confirmed valid prefix/stem/suffix counts 99 252 111 
Prefix/Stem/Suffix frequency, normalised 
4ok     10      0.1010101               o89     15      0.05952381              o89     10      0.09009009 
4oh     7       0.07070707              1oe     14      0.055555556             8am     10      0.09009009 
1oe     6       0.060606062             4ok     14      0.055555556             1c9     6       0.054054055 
1oh     4       0.04040404              8am     12      0.04761905              1oy     6       0.054054055 
ok1     4       0.04040404              4oh     12      0.04761905              1oe     5       0.045045044 
8oe     3       0.030303031             1oy     10      0.03968254              coe     4       0.036036037 
1oy     3       0.030303031             1c9     8       0.031746034             cc9     3       0.027027028 
1co     3       0.030303031             1co     6       0.023809524             e89     3       0.027027028 
1ok     3       0.030303031             8oe     6       0.023809524             ham     3       0.027027028 
4oj     3       0.030303031             coe     5       0.01984127              2c9     3       0.027027028 

(For example, the sequence “4ok” appears 10 times at the start of a longer word, i.e. as a prefix. Dividing by the prefix count of 99 gives the normalised frequency 0.101.)
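The counting behind these tables is not spelled out in detail, so the following is only a sketch under these assumptions:

- n-character prefixes and suffixes are taken from distinct words longer than n.
- Only those seen at least twice are kept.
- The normalised frequency is each count divided by the total of the kept counts. This is consistent with 10/99 ≈ 0.101 for “4ok” if 99 is the total of the kept prefix counts.

Stems are left out for brevity, and the word list in the example is made up.

```python
from collections import Counter

def ngram_affixes(words, n, min_count=2):
    """Count n-character prefixes and suffixes of distinct words longer than n,
    keeping only those seen at least `min_count` times."""
    prefixes, suffixes = Counter(), Counter()
    for word in set(words):
        if len(word) > n:
            prefixes[word[:n]] += 1
            suffixes[word[-n:]] += 1

    def keep(counter):
        return Counter({k: v for k, v in counter.items() if v >= min_count})

    return keep(prefixes), keep(suffixes)

def normalise(counter):
    """Divide each count by the total of all kept counts."""
    total = sum(counter.values())
    return {k: v / total for k, v in counter.items()} if total else {}

prefixes, suffixes = ngram_affixes(["4okam", "4okoe", "4ohoe", "1oe"], 3)
print(prefixes)             # Counter({'4ok': 2})
print(normalise(prefixes))  # {'4ok': 1.0}
```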

N=3 for 1331 unique words in the Astrological Section 

Confirmed valid prefix/stem/suffix counts 154 346 153 
Prefix/Stem/Suffix frequency, normalised 
okc     11      0.071428575             o89     16      0.046242774             o89     13      0.08496732 
ohc     8       0.051948052             okc     11      0.031791907             cos     6       0.039215688 
4oh     7       0.045454547             8ae     11      0.031791907             8am     6       0.039215688 
9hc     7       0.045454547             1co     10      0.028901733             8ae     6       0.039215688 
oko     6       0.038961038             oko     10      0.028901733             cc9     4       0.026143791 
oka     6       0.038961038             oho     9       0.02601156              coe     4       0.026143791 
oho     5       0.032467533             ohc     8       0.023121387             o79     4       0.026143791 
1ok     5       0.032467533             oka     8       0.023121387             oh9     4       0.026143791 
oh1     5       0.032467533             4oh     8       0.023121387             c79     4       0.026143791 
1co     4       0.025974026             9hc     7       0.020231213             c89     3       0.019607844 

N=3 for 1331 unique words in the Biological Section 

Confirmed valid prefix/stem/suffix counts 124 275 124 
Prefix/Stem/Suffix frequency, normalised 
4oh     13      0.10483871              c89     26      0.094545454             c89     17      0.13709678 
4ok     10      0.08064516              4oh     20      0.07272727              c79     13      0.10483871 
4oe     8       0.06451613              c79     13      0.047272727             1c9     9       0.07258064 
oeh     6       0.048387095             4ok     12      0.043636363             C89     7       0.05645161 
oe1     5       0.04032258              1c9     11      0.04                    2c9     7       0.05645161 
ohc     4       0.032258064             2c9     9       0.03272727              189     4       0.032258064 
soe     4       0.032258064             4oe     8       0.02909091              eoy     3       0.024193548 
oe2     3       0.024193548             oeh     7       0.025454545             cc9     3       0.024193548 
91c     3       0.024193548             8ae     7       0.025454545             hC9     3       0.024193548 
8ay     3       0.024193548             8ay     7       0.025454545             ae9     3       0.024193548 

N=3 for 1331 unique words in the Recipes Section 

Confirmed valid prefix/stem/suffix counts 135 303 143 
Prefix/Stem/Suffix frequency, normalised 
4oh     17      0.12592593              4oh     18      0.05940594              c89     13      0.09090909 
4ok     14      0.1037037               4ok     17      0.05610561              o89     13      0.09090909 
ohc     9       0.06666667              o89     16      0.052805282             189     8       0.055944055 
okc     8       0.05925926              c89     15      0.04950495              c79     7       0.04895105 
oeh     7       0.05185185              oeh     10      0.0330033               8am     7       0.04895105 
1co     5       0.037037037             1co     10      0.0330033               8ay     6       0.04195804 
g1c     4       0.02962963              ohc     9       0.02970297              coe     5       0.034965035 
4oj     4       0.02962963              c79     9       0.02970297              8ae     5       0.034965035 
ohC     4       0.02962963              8ae     9       0.02970297              1c9     4       0.027972028 
1oe     3       0.022222223             189     9       0.02970297              cc9     4       0.027972028 

Philip Neal’s Anagram Encryption

Notice how words tend to start with “4”, “o” and “1” and tend to end with “9”, “m” and “e”. This sort of feature has me excited about Philip Neal’s anagram encryption idea, explained here, which is summarised as follows (quoting from that page):

  "1. Divide a plaintext into lines 
   2. Sort the words of each line into alphabetical order 
   3. Sort the letters of each word into alphabetical order 

   1. one thing led to another thing last night 
   2. another last led night one to thing thing 
   3. aehnort alst del ghint eno ot ghint ghint" 

Right now I am repurposing my Genetic Algorithm to attack some lines of the VMs, assuming such an encryption, but I am being killed by the permutations, whose number grows as the factorial of the word length.
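Steps 1 to 3 are easy to express in Python. Note that a strict alphabetical sort puts “thing” before “to”, so the last three words come out as “ghint ghint ot” rather than in the order quoted above. The last function is a hypothetical helper showing one common way round the permutation explosion. It indexes a dictionary by each word’s sorted letters, so a ciphertext word can be looked up directly instead of trying all n! orderings. This only helps once the VMs symbols have been mapped to letters.

```python
from collections import defaultdict

def anagram_encrypt(line):
    """Sort the words of a line alphabetically, then the letters of each word."""
    return " ".join("".join(sorted(word)) for word in sorted(line.split()))

def encrypt_text(text):
    """Divide the plaintext into lines and encrypt each line separately."""
    return [anagram_encrypt(line) for line in text.splitlines()]

def build_anagram_index(dictionary):
    """Map each sorted-letter key to the dictionary words that produce it."""
    index = defaultdict(list)
    for word in dictionary:
        index["".join(sorted(word))].append(word)
    return index

print(anagram_encrypt("one thing led to another thing last night"))
# aehnort alst del ghint eno ghint ghint ot
index = build_anagram_index(["thing", "night", "one", "eon", "last"])
print(index["ghint"])  # ['thing', 'night']
```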

Anagram Analysis

February 26, 2010

Several people, notably Robert Teague and Philip Neal, have theorized that the cipher is based on anagrams. The idea is that to decipher the VMs text, you first convert the VMs symbols to alphabet letters, and then you find an anagram of those letters that makes a valid plaintext word.

Variations on this theme are that e.g. some of the plaintext letters in the target word are allowed to be missing and can be inferred from “context”, and/or that the VMs words are anagrams of the plaintext words that have letters arranged alphabetically.

For this analysis, we look at folio f68r3 in particular:

The labels on the stars are, from the ten o’clock pie slice going anti-clockwise, in the Voyn 101 encoding:





Anagrams: Combinations/Permutations

If you make the reasonable assumption that all the labels on the star shapes in f68r3 are star names, and you assume a target language, then this puts severe constraints on the cipher, since there are only so many star names and only so many cipher schemes that can be consistent amongst all the stars.

For each of the 12 star labels on f68r3 (61oe7a9 8oay9 oae1coe oh1o89 8ayaee ok98* oK98 ohoe19 1G9 179,9h9 ohos okoy9), there are many possible rearrangements (anagrams) of its symbols:

Label size 3 6 anagrams
Label size 4 24 anagrams
Label size 5 120 anagrams
Label size 6 720 anagrams
Label size 7 5040 anagrams
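The counts in this table are n!, which assumes that every symbol in a label is different. When a symbol repeats, swapping its identical copies gives the same string. The number of distinct anagrams is therefore n! divided by k! for each symbol that occurs k times. A short sketch:

```python
from collections import Counter
from math import factorial, prod

def distinct_anagrams(label):
    """n! divided by k! for every symbol that occurs k times in the label."""
    counts = Counter(label)
    return factorial(len(label)) // prod(factorial(k) for k in counts.values())

for size in range(3, 8):
    print(size, factorial(size))  # all symbols different, as in the table
for label in ["61oe7a9", "8ayaee", "oae1coe", "ohos"]:
    print(label, distinct_anagrams(label))
# 61oe7a9 5040, 8ayaee 180, oae1coe 1260, ohos 12
```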

It appears that there is a lot of freedom: the first label “61oe7a9” can be rearranged 5040 different ways, and you can pick any letter for any of the 7 symbols! However, when you start actually choosing the mapping between symbols and alphabet characters, and require that at least one of the anagrams of each deciphered label appears in a dictionary of star names, you rapidly eliminate many of the possibilities.

This is still true even if you allow a single symbol in the label to map to two alphabet characters: the problem still appears to be over-constrained.

For example, say you pick a mapping for the first label:

S: 6  1  o  e  7  a  9
R: d  r  e  ba an a  l
61oe7a9 -> drebaanal -> aldebaran

then you want to apply this to the next label, “8oay9”. You already have the cipher for symbols o, a and 9: they translate to e, a and l respectively, so the star name must contain e, a and l. Since o, a and 9 each give one letter and 8 and y can each give one or two, the star name must be between 5 and 7 characters long, and you just need to pick suitable letters for 8 and y.

Consulting a list of ~300 star names you find the following possibilities.

Alkes, Algedi, Alheka, Alhena, Elnath, Lesath, Alcyone, Algebar, Algorel, Ascella, Capella, Eltanin and Sheliak

(Since writing this, Don Latham sent me a link to his list of star names at , which I have simplified by removing duplicates and non-alphabetic symbols, and put here: )
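The filtering step can be sketched as follows. It assumes that names are compared in lower case, that the letters already fixed by the mapping must all be present (counting repeats), and that the length bounds come from the argument above. The short name list is a hand-picked sample, not the ~300-name list.

```python
from collections import Counter

def candidates(names, required_letters, min_len, max_len):
    """Names of a suitable length that contain all the required letters."""
    need = Counter(required_letters)
    matches = []
    for name in names:
        letters = Counter(name.lower())
        # Counter subtraction drops non-positive counts, so an empty result
        # means every required letter (with repeats) is present.
        if min_len <= len(name) <= max_len and not (need - letters):
            matches.append(name)
    return matches

SAMPLE = ["Alkes", "Vega", "Alcyone", "Sirius", "Aldebaran", "Capella", "Deneb"]
print(candidates(SAMPLE, "eal", 5, 7))  # ['Alkes', 'Alcyone', 'Capella']
```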

Choosing Alcyone (for reasons which will be obvious to some), you now have an extended mapping that includes 8 and y:

S: 6  1  o  e  7  a  9  8  y
R: d  r  e  ba an a  l  cy on
61oe7a9 -> drebaanal -> aldebaran
8oay9 -> cyeaonl -> alcyone
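Applying a mapping in which some symbols stand for two letters, and then testing for an anagram, is straightforward to sketch. The mapping below is the one from the example above, and “?” marks a symbol with no assignment yet.

```python
# Mapping chosen in the example above (symbol -> one or two letters).
MAPPING = {"6": "d", "1": "r", "o": "e", "e": "ba", "7": "an",
           "a": "a", "9": "l", "8": "cy", "y": "on"}

def decipher(label, mapping):
    """Concatenate the letters for each symbol; '?' if a symbol is unmapped."""
    return "".join(mapping.get(symbol, "?") for symbol in label)

def is_anagram(letters, name):
    """True if `letters` is a rearrangement of `name`."""
    return sorted(letters) == sorted(name.lower())

for label, star in [("61oe7a9", "aldebaran"), ("8oay9", "alcyone")]:
    plain = decipher(label, MAPPING)
    print(label, "->", plain, "->", star, is_anagram(plain, star))
print(decipher("oae1coe", MAPPING))  # eabar?eba
```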

So far so good. Now we come to the third label, “oae1coe”. We can decipher all but one of its symbols, giving “eabar?eba”.

Unhappily, there are no 9- or 10-letter star names in the dictionary that can match this letter assortment. This eliminates our initial choice of mapping (“aldebaran” for the first label and “alcyone” for the second), because it cannot produce a dictionary word for the third label. Back to square one.

You can keep on doing this: selecting mappings, trying them out on the first two or three labels, and finding that there is no fit to the dictionary.

Although at first sight the use of anagrams seems to give the cipher a lot of flexibility, requiring an anagram of every deciphered word to appear in a dictionary severely constrains the solution space in practice.

Allowing Missing Characters

Looking again at the 12 star labels on f68r3:

61oe7a9 8oay9 oae1coe oh1o89 8ayaee ok98* oK98 ohoe19 1G9 179,9h9 ohos okoy9

There are 16 different VMs symbols used:

 o 9 1 e a 8 h y 7 k 6 c * K G s
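The 16-symbol inventory, and its order, can be reproduced by counting symbols across the labels. This sketch assumes the comma in “179,9h9” is a transcription mark rather than a symbol. Ties are broken by first appearance, which is how Python’s `Counter.most_common` orders equal counts. With these assumptions the result matches the list above.

```python
from collections import Counter

LABELS = ["61oe7a9", "8oay9", "oae1coe", "oh1o89", "8ayaee", "ok98*",
          "oK98", "ohoe19", "1G9", "179,9h9", "ohos", "okoy9"]

def symbol_inventory(labels):
    """Distinct symbols with counts, most frequent first (ties: first seen)."""
    counts = Counter(ch for label in labels for ch in label if ch != ",")
    return counts.most_common()

inventory = symbol_inventory(LABELS)
print(len(inventory))                          # 16
print(" ".join(sym for sym, _ in inventory))   # o 9 1 e a 8 h y 7 k 6 c * K G s
```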

Let’s assume that each of the VMs symbols maps to a single plaintext alphabet letter. Which 16 of the 26 letters in the alphabet will we choose to map? Let’s look at our list of star names and find the 16 most used alphabet letters:

 a i r e l s h n u t b k m d o c

(shown in order of frequency). How many different ways can we map the 16 VMs symbols to these 16 letters? That is:

Factorial 16 = 16! = 20,922,789,888,000

which is about 21 trillion (give or take), and is quite a lot. To explore this huge space of possibilities, we can use a Monte Carlo method. Basically we write a program to shuffle the 16 alphabet letters, look to see how that mapping works, then shuffle again, and so on. As we go, we keep track of the best mapping found so far.

Suppose we have the following mapping (cipher):

S: o 9 1 e a 8 h y 7 k 6 c * K G s
R: a e n c i l h r o s u d k t b m

Let’s go through the f68r3 star labels and convert them to plaintext using the cipher. After we convert them, let’s look the plaintext characters up in our list of star names using a “fuzzy match”. Here are the results:

61oe7a9 -> unacoie -> ?
8oay9 -> laire -> albireo (bo)
oae1coe -> aicndac -> ?
oh1o89 -> ahnale -> alhena ()
8ayaee -> liricc -> ?
ok98* -> aselk -> sheliak (hi)
oK98 -> atel -> elnath (nh)
ohoe19 -> ahacne -> achernar (rr)
1G9 -> nbe -> deneb (de)
1799h9 -> noeehe -> ?
ohos -> aham -> hamal (l)
okoy9 -> asare -> antares (nt)

Looking at the second label as an example, this is converted to plaintext characters “laire”. The fuzzy match looks up “laire” in the star names list, and finds a match with the star called “albireo”.

Fuzzy Match Rules

There is a match between the deciphered plaintext word and a dictionary word if

  1. All the letters in the plaintext word appear in the dictionary word
  2. There are no more than N missing letters in the plaintext word

In the example shown above, N is 2, so “laire” fuzzy matches “albireo”, with the letters “bo” missing (shown in brackets).


In the above example, 8 of the 12 VMs star labels have been matched to valid star names. The application continues exploring the 16! possible arrangements, trying to improve on the number of matches.
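Here is a compact sketch of the Monte Carlo search together with the fuzzy match rules. It makes three assumptions:

- The fuzzy match counts repeated letters.
- Missing letters are reported in the order they occur in the dictionary word, which reproduces the bracketed strings above.
- `STAR_SAMPLE` is a hypothetical eight-name list containing only the names matched in the first example.

This is not the application used to produce the results.

```python
import random
from collections import Counter

LABELS = ["61oe7a9", "8oay9", "oae1coe", "oh1o89", "8ayaee", "ok98*",
          "oK98", "ohoe19", "1G9", "1799h9", "ohos", "okoy9"]
SYMBOLS = list("o91ea8hy7k6c*KGs")   # the 16 VMs symbols
LETTERS = list("airelshnutbkmdoc")   # the 16 most used star-name letters
STAR_SAMPLE = ["albireo", "alhena", "sheliak", "elnath",
               "achernar", "deneb", "hamal", "antares"]

def fuzzy_missing(plain, word, max_missing=2):
    """Letters of `word` not used by `plain`, or None if there is no match.

    Rule 1: every plaintext letter must appear in the dictionary word.
    Rule 2: at most `max_missing` dictionary letters may be left over.
    """
    need = Counter(plain)
    missing = []
    for ch in word:
        if need[ch] > 0:
            need[ch] -= 1
        else:
            missing.append(ch)
    if sum(need.values()) > 0 or len(missing) > max_missing:
        return None
    return "".join(missing)

def score(mapping, names, max_missing=2):
    """Number of labels whose deciphered text fuzzy-matches some name."""
    hits = 0
    for label in LABELS:
        plain = "".join(mapping[s] for s in label)
        if any(fuzzy_missing(plain, n, max_missing) is not None for n in names):
            hits += 1
    return hits

def random_search(names, iterations, seed=0):
    """Shuffle the letters, score each mapping, and keep the best one."""
    rng = random.Random(seed)
    letters = LETTERS[:]
    best_score, best_mapping = -1, None
    for _ in range(iterations):
        rng.shuffle(letters)
        mapping = dict(zip(SYMBOLS, letters))
        s = score(mapping, names)
        if s > best_score:
            best_score, best_mapping = s, mapping
    return best_score, best_mapping

# The first mapping shown above, symbol by symbol.
FIRST_MAPPING = dict(zip(SYMBOLS, "aencilhrosudktbm"))
print(fuzzy_missing("laire", "albireo"))      # bo
print(score(FIRST_MAPPING, STAR_SAMPLE))      # 8
```

A real run would use the full star list and many millions of iterations, for example `random_search(star_names, 70_000_000)`. Keeping the generator in a seeded `random.Random` makes runs repeatable.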

After looking at around 70 million arrangements (i.e. about 3 millionths of them), it finds this:

Iteration 67334517 Deciphered=9/12
S: o  9  1  e  a  8  h  y  7  k  6  c  *  K  G  s
R: a  e  l  b  r  h  c  d  o  t  i  u  n  s  k  m
61oe7a9 -> ilabore -> borealis (s)
8oay9 -> harde -> schedar (sc)
oae1coe -> arbluab -> ?
oh1o89 -> aclahe -> alphecca (pc)
8ayaee -> hrdrbb -> ?
ok98* -> atehn -> elnath (l)
oK98 -> aseh -> scheat (ct)
ohoe19 -> acable -> cebalrai (ri)
1G9 -> lke -> alkes (as)
1799h9 -> loeece -> ?
ohos -> acam -> almach (lh)
okoy9 -> atade -> tarazed (rz)

Just how interesting/plausible/believable is this? We can run a control experiment using a dictionary of dog breeds of about the same size: does mapping the VMs labels on f68r3 to star names produce a better fit than mapping them to dog breeds?

We run the program for a few million mappings, first with the star names, then with the dog breed names. After about 4 million mappings, the best dog-breed mapping (9 out of 12) is slightly better than the best star-name mapping (8 out of 12). The dog-breed result is shown first:


Iteration 3779182 Deciphered=9/12
S: o 9 1 e a 8 h y 7 k 6 c * K G s
R: e r o n h l b s d c a p t u i g
61oe7a9 -> aoendhr -> rhodesian (si)
8oay9 -> lehsr -> charles (ca)
oae1coe -> ehnopen -> ?
oh1o89 -> eboelr -> boerboel (bo)
8ayaee -> lhshnn -> ?
ok98* -> ecrlt -> central (na)
oK98 -> eurl -> tulear (ta)
ohoe19 -> ebenor -> redbone (d)
1G9 -> oir -> corgi (cg)
1799h9 -> odrrbr -> ?
ohos -> ebeg -> beagle (al)
okoy9 -> ecesr -> crested (td)


Iteration 4208178 Deciphered=8/12
S: o 9 1 e a 8 h y 7 k 6 c * K G s
R: a i e c o l b s t h n u r d k m
61oe7a9 -> neactoi -> ?
8oay9 -> laosi -> polaris (pr)
oae1coe -> aoceuac -> ?
oh1o89 -> abeali -> algieba (g)
8ayaee -> losocc -> ?
ok98* -> ahilr -> alphirk (pk)
oK98 -> adil -> alkaid (ka)
ohoe19 -> abacei -> cebalrai (lr)
1G9 -> eki -> keid (d)
1799h9 -> etiibi -> ?
ohos -> abam -> markab (rk)
okoy9 -> ahasi -> nashira (nr)

Of course, this doesn’t disprove that the labels on f68r3 are in fact star names: it may well be that one of the 16! combinations produces a perfect fit, with “61oe7a9” deciphered to “aldebaran”, “8oay9” deciphered to “alcyone”, and so on.

Character Position Based Cipher

Looking just at the star labels in Folio 68r, we can extract the letter/symbol frequencies as a function of position in the label.

Here’s what they look like:

1: o 1 8 4 9 k 2 A c y 7 W s ? G e 6 g ▐ h
2: o h k 1 c a 8 e y j C 9 K H 7 2 s u m J + f G ┘ W
3: o c 1 9 h a C e y 8 k K d U + 2 * s H n ª I 6 m Z J
4: o c 9 e a 8 y 1 C h i s k 7 A Q H 2 m J 3 ? d
5: 9 o 8 e y 1 c a m h s K * C 7 S 5 2
6: 9 e 8 a 1 y o 7 c s m i
7: 9 e a c y I o m p 1
8: 9 8 e y 7
9: a 9 e

In other words, the most likely symbol to find in the first position of a label is “o” (symbols are given in the Voyn_101 encoding). Second most likely is “1”, and so on. In the second position, “o” is again the most likely, followed by “h”, then “k”, and so on.
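Position tables like this can be produced by counting symbols per position and sorting by frequency. The following is a minimal Python sketch, assuming one Voyn_101 character per glyph. It is fed the twelve f68r3 labels quoted earlier only as sample input. The table above was computed from all the star labels on f68r, so its rankings will differ. The same function can be applied to a list of plaintext star names. `positional_ranking` is an illustrative name.

```python
from collections import Counter, defaultdict

def positional_ranking(words):
    """Map each 1-based position to its symbols, most frequent first."""
    counts = defaultdict(Counter)
    for word in words:
        for pos, symbol in enumerate(word, start=1):
            counts[pos][symbol] += 1
    # most_common() sorts by count; ties keep first-seen order.
    return {pos: [s for s, _ in counts[pos].most_common()] for pos in sorted(counts)}

labels = ["61oe7a9", "8oay9", "oae1coe", "oh1o89", "8ayaee", "ok98*",
          "oK98", "ohoe19", "1G9", "1799h9", "ohos", "okoy9"]
for pos, symbols in positional_ranking(labels).items():
    print(f"{pos}: {' '.join(symbols)}")
```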

This is another take on the Crust/Mantle business.

The symbols towards the ends of the labels clearly have a very different distribution from those at the front: “o” heads the list for positions 1 to 4, while “9” heads it for positions 5 to 8.

Compare the VMs star labels with the English star names from Don Latham’s list of star names (simplified):

1: a  m  s  k  h  d  r  t  e  n  b  f  z  j  p  u  c  g  i  w  v  l  o  y
2: a  l  u  i  e  h  c  r  s  d  n  o  t  z  k  w  b  g  j  m  f  y  x
3: a  r  h  i  b  n  s  m  l  t  u  d  k  g  c  e  f  z  w  j  y  p  o  v
4: a  i  e  r  h  b  d  m  l  n  u  s  k  c  f  o  t  z  w  g  p  j  y  v
5: a  i  r  l  n  e  t  h  k  b  s  m  u  d  c  f  g  y  o  z  j  w  p
6: a  h  r  i  e  t  l  n  m  b  s  d  u  c  k  z  o  y  g  p  f  j  v
7: a  e  h  n  i  r  l  b  s  m  t  o  y  d  c  u  k  z  g  f  w  v
8: a  i  h  n  e  c  t  l  s  b  o  u  k  r  f  z  m  g  y  p
9: h  a  t  n  i  e  s  u  r  o  g  z  k  d  p  c  v  m  b

So the plaintext has a quite different letter distribution: “a” is the most likely letter in every position except the ninth, where “h” comes first.

One hypothesis is that a different mapping is used depending on the character position in the word. For example, if I am enciphering “aldebaran”, I would use one mapping for the first “a”, a different mapping for the “l”, and so on.
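To make the hypothesis concrete, here is a toy position-dependent substitution in Python. The tables are invented purely for illustration. Table k is the VMs symbol list rotated by k places, so the same plaintext letter gets a different symbol depending on where it sits in the word. Nothing here is claimed about the real VMs tables, and `table_for_position`, `encipher` and `decipher` are hypothetical names.

```python
# Toy position-dependent substitution. The tables are invented for illustration:
# table k pairs the plaintext letters with the VMs symbols rotated by k places.
SYMBOLS = "o91ea8hy7k6c*KGs"
LETTERS = "aencilhrosudktbm"

def table_for_position(pos):
    """Hypothetical table for character position `pos` (0-based): letter -> symbol."""
    k = pos % len(SYMBOLS)
    rotated = SYMBOLS[k:] + SYMBOLS[:k]
    return dict(zip(LETTERS, rotated))

def encipher(word):
    """Encipher each letter with the table for its own position in the word."""
    return "".join(table_for_position(pos)[ch] for pos, ch in enumerate(word))

def decipher(label):
    """Invert the per-position tables to recover the plaintext."""
    out = []
    for pos, symbol in enumerate(label):
        inverse = {s: ch for ch, s in table_for_position(pos).items()}
        out.append(inverse[symbol])
    return "".join(out)

print(encipher("aldebaran"))              # ohKa18Ky6
print(decipher(encipher("aldebaran")))    # aldebaran
```

Here the three a's in “aldebaran” are enciphered as three different symbols (“o”, “8” and “y”).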

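If we also allow a VMs symbol to map to two plaintext characters, we can generate a “likely” mapping for each position by lining up the VMs symbols and the plaintext letters (or letter pairs) in order of frequency: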

1T: a  al m  s  k  h  d  sa r  t  ha e  n  ma b  ka f  mi z  j
1S: o  1  8  4  9  k  2  A  c  y  7  W  s  ?  G  e  6  g  ▐  h 

2T: a  l  u  i  e  h  ar ha c  r  ab la al ai s  d  lg ub as n
2S: o  h  k  1  c  a  8  e  y  j  C  9  K  H  7  2  s  u  m  J 

3T: a  r  h  i  b  n  s  m  l  t  u  d  k  g  c  e  ra ha f  z
3S: o  c  1  9  h  a  C  e  y  8  k  K  d  U  +  2  *  s  H  n 

4T: a  i  e  r  h  b  d  m  l  n  u  s  ra k  at ar al ha en c
4S: o  c  9  e  a  8  y  1  C  h  i  s  k  7  A  Q  H  2  m  J 

5T: a  i  r  l  n  e  t  h  k  ah b  s  m  u  ar ra d  c  at f
5S: 9  o  8  e  y  1  c  a  m  h  s  K  *  C  7  S  5  2 

6T: a  h  r  i  e  t  l  n  m  b  ah s  d  u  c  k  an la ab ar
6S: 9  e  8  a  1  y  o  7  c  s  m  i 

7T: a  e  h  n  i  r  l  b  s  m  t  ah o  y  d  c  u  k  at an
7S: 9  e  a  c  y  I  o  m  p  1 

8T: a  i  h  n  e  c  t  l  s  in b  o  u  ch k  r  f  et at z
8S: 9  8  e  y  7 

9T: h  a  t  n  i  e  s  u  r  ha he o  to g  z  ze ab th ra k
9S: a  9  e

(where “T” labels the plaintext alphabet, and “S” labels the source VMs alphabet).

Using this set of mappings, we can translate all the labels to plaintext. The results are disappointing: none of the deciphered labels matches a name in the list of star names, even allowing anagrams.

But we can then start shuffling the mapping order around for each character position, and use a Monte Carlo approach to see what we come up with. This is what I am trying now. It’s not clear to me that I should allow anagrams in the solutions, since reordering the letters would undo the position-by-position ordering that the tables above assume.
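One way to sketch this next step in Python is as follows. Keep each position's VMs symbols and plaintext entries, but reshuffle which entry goes with which symbol, position by position, and keep the best result. Scoring by exact match against the name list is one assumed way of disallowing anagrams. The tables are per-position dicts (symbol to plaintext), as in the layout above. `shuffle_tables`, `exact_score` and `search` are hypothetical names, and the tiny example tables are invented.

```python
import random

def shuffle_tables(tables, rng):
    """New per-position tables: same symbols, plaintext entries reshuffled within each position."""
    shuffled = []
    for table in tables:
        symbols = list(table)
        entries = list(table.values())
        rng.shuffle(entries)
        shuffled.append(dict(zip(symbols, entries)))
    return shuffled

def decipher(label, tables):
    """Per-position lookup; '#' marks a symbol with no entry."""
    return "".join(
        tables[pos].get(symbol, "#") if pos < len(tables) else "#"
        for pos, symbol in enumerate(label)
    )

def exact_score(tables, labels, names):
    """Assumed scoring: labels whose decipherment is exactly a name (no anagrams)."""
    names = set(names)
    return sum(decipher(label, tables) in names for label in labels)

def search(tables, labels, names, iterations, seed=0):
    """Random reshuffles of every position's table, keeping the best-scoring set."""
    rng = random.Random(seed)
    best_score, best_tables = exact_score(tables, labels, names), tables
    for _ in range(iterations):
        candidate = shuffle_tables(tables, rng)
        score = exact_score(candidate, labels, names)
        if score > best_score:
            best_score, best_tables = score, candidate
    return best_score, best_tables

# Tiny invented example: two positions, two symbols each.
toy = [{"o": "a", "1": "d"}, {"o": "l", "9": "e"}]
print(search(toy, ["o9", "1o"], ["de", "al"], iterations=50))
```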