Posts Tagged ‘Genetic Algorithm’

Language A and B Again

March 13, 2013 12 comments

A tentative conclusion from comparing Language A and Language B is that the non-gallows glyphs are used in the same way in both Languages.

That is to say, they appear to mean the same thing. So the “o” in A means the same as the “o” in B.
There is some persistent “mixing” between the e/y glyphs, which is illustrated by the example result below:
[Image: ABMixing]
There is also some doubt about the “8” glyph, which sometimes seems to mix with the gallows glyphs (e.g. in some cases, the “8” appears in A to function in the same way as a gallows glyph in B and vice versa). This may simply be an error in the comparison method, or it may be that the “8” is a null, or it may be due to some other effect.
The gallows glyphs are different – they don’t appear to mean the same in A and B. I’m focussing on those glyphs now.

Language “A” and “B” Conversions

March 5, 2013 12 comments

This is an update to my previous two posts on this topic.

I have been concentrating on searching for the correspondence between glyphs used in Language A, and glyphs used in Language B. As a reminder, the method is to take all words in, say, Language A, and “convert” them to words in Language B by changing the glyphs according to a candidate mapping table. The frequency of the converted Language B words is then compared with the original Language A words: the closer the frequencies, the better the mapping match.
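
A minimal sketch of this scoring idea, assuming a one-glyph-to-one-glyph mapping and a sum of squared differences between relative word frequencies. The function names, the distance measure and the toy word lists are illustrative assumptions, not the code actually used for these runs.

```python
from collections import Counter

def convert_word(word, mapping):
    """Replace each glyph using the candidate mapping; unmapped glyphs pass through."""
    return "".join(mapping.get(glyph, glyph) for glyph in word)

def relative_frequencies(words):
    """Return {word: count / total} for a list of words."""
    counts = Counter(words)
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}

def mapping_distance(words_a, words_b, mapping):
    """Sum of squared differences between converted-A and observed-B word frequencies.

    Lower is better; 0.0 means the converted words have exactly the
    same frequency distribution as the Language B words.
    """
    freq_converted = relative_frequencies([convert_word(w, mapping) for w in words_a])
    freq_b = relative_frequencies(words_b)
    vocabulary = set(freq_converted) | set(freq_b)
    return sum((freq_converted.get(w, 0.0) - freq_b.get(w, 0.0)) ** 2 for w in vocabulary)

# Toy data (hypothetical): the two lists differ only by an e/y swap.
words_a = ["8am", "1oe", "8am", "oe"]
words_b = ["8am", "1oy", "8am", "oy"]
identity = {}
swap_e_y = {"e": "y", "y": "e"}

print(mapping_distance(words_a, words_b, identity))   # 0.25
print(mapping_distance(words_a, words_b, swap_e_y))   # 0.0
```

A Genetic Algorithm would then search over candidate mappings for the one with the smallest distance.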

Method Check using only Language A words

As a check of the method, I took the Herbal folios 1-25 (all in Language A) and split them into two groups: 1-12 and 13-25, and I then artificially labelled the latter group as Language B. Then I ran the matching procedure, which produced the following result:

Epoch 62 Best chromosome 0 Value= 5.62272615159e-05
Chromosome ['o', '9', '1', 'i', '8', 'a', 'e', 'c', 'k', 'y', 'h', 'N', '2', '4', 's', 'g', 'p', '?', 'K', 'H']
ngramsA    ['o', '9', '1', 'i', '8', 'a', 'e', 'c', 'h', 'y', 'k', 'N', '2', '4', 's', 'g', 'p', '?', 'K', 'H']

This is good and reassuring, since it shows that the words in folios 13-25 have essentially the same frequency distribution when their glyphs are mapped to the same glyphs in folios 1-12.

Removal of Glyph Variants in Voyn_101

As the tests progressed, it became clear that some of the glyphs GC defined in Voyn_101 were in fact variants of more common glyphs. The most obvious were the “m”, “n”, “M” glyphs mentioned before – with these included, the conversions between Language B and Language A were of much poorer quality than if they were expanded to “iiN”, “iN” and “iiiN” respectively. After some time weeding out these variants, the following table was arrived at:

seek =  ["3", "5", "+", "%", "#", "6", "7", "A", "X", 
         "I", "C", "z", "Z", "j", "u", "d", "U", "P", 
         "Y", "$", "S", "t", "q",
         "m", "M", "n", "Y", "!", ")", "*", "b", "J", "E", "x", "B", "D", "T", "Q", "W", "w", "V", "(", "&"]
repl =  ["2", "2", "2", "2", "2", "8", "8", "a", "y", 
         "ii", "cc", "iy", "iiy", "g", "f", "ccc", "F", "ip",
         "y", "s", "cs", "s", "iip",
         "iiN", "iiiN", "iN", "y", "2", "9", "p", "y", "G", "c", "y", "cccN", "ccN", "s", "p", "h", "h", "K", "9", "8"]
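
One way a table like this could be applied is sketched below. Since every entry in the seek list is a single character, a per-glyph dictionary lookup is enough. The lists here are an abbreviated subset of the table, and the name `expand_variants` is a hypothetical helper.

```python
# Abbreviated subset of the seek/repl table (illustration only).
seek = ["3", "5", "m", "M", "n"]
repl = ["2", "2", "iiN", "iiiN", "iN"]

# Pair each variant glyph with its base-glyph expansion.
VARIANTS = dict(zip(seek, repl))

def expand_variants(word):
    """Replace each variant glyph with its expansion; other glyphs are kept."""
    return "".join(VARIANTS.get(glyph, glyph) for glyph in word)

print(expand_variants("8am"))   # 8aiiN
```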

I am very confident that the glyphs remaining after using the above conversion table are the base set.  The base set of glyphs is thus:

Language A frequency order: 'o', 'c', '9', '1', 'a', '8', 'e', 'i', 'h', 'y', 'k', 's', '2', 'N', '4', 'g', 'p', '?', 'K', 'H', 'f', 'G', 'F', 'L', 'l', 'v', 'r', 'R'
Language B frequency order: 'c', 'o', '9', 'a', '8', 'e', '1', 'h', 'i', 'y', 'k', '2', 'N', 's', '4', 'g', 'p', 'f', '?', 'H', 'K', 'G', 'F', 'l', 'L', 'R', 'r', 'v'

where “?” represents all very rare glyphs (such as the “picnic table” glyph). There are thus 27 glyphs (15 gallows and 12 regular) excluding the rare special glyphs like the picnic table.

Glyph Mixing Between A and B

I ran many trials using the base set of glyphs, comparing various sections of the VMs written in the different hands. In particular, the following folio collections were defined:

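# Folio collections. range() is wrapped in list() so the concatenations also work in Python 3.
Special = {'HerbalRecipeAB': list(range(107, 117)) + list(range(1, 26)),
           'HerbalAB': list(range(1, 57)),
           'HerbalBalneoAB': list(range(1, 26)) + list(range(75, 85)),
           'HerbalAstroAB': list(range(1, 13)) + list(range(67, 75)),
           'PharmaRecipeAB': [88, 89, 99, 100, 101, 102] + list(range(103, 117)),
           'AllAB': list(range(1, 117)),
           }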

The collection I used the most was the one called “HerbalBalneoAB”, which contains Herbal folios written in Language A, and Balneo folios written in Language B. The nice feature of this collection is that the number of words is around the same for both Languages, which makes comparing counts very easy:

Total words =  2846  Total Language A =  1581  Total Language B =  1584

As an example, here is a trial result for HerbalBalneoAB:

Language B ['o', '9', '1', 'a', 'i', 'f', 'c', 'y', 'h', 'e', 'K', 'N', '2', 's', '4', 'g', 'p', '8', 'k', 'H']
Language A ['o', '9', '1', 'a', 'i', '8', 'c', 'e', 'h', 'y', 'k', 'N', '2', 's', '4', 'g', 'p', 'K', '?', 'H']

In all the tests I ran, there were some common features in the results:

  • Mixing between “e” and “y” – when writing Language A, the use of “e” appears to be equivalent to the use of “y” in Language B, and vice versa
  • Mixing between 8, f, F, k, K, g, G, r, R, ? and so on – the Gallows glyphs swap amongst themselves, and with “8”

Just about all trials showed the “e”/”y” mixing. Tony Gaffney pointed out that these two glyphs are quite similar in stroke construction. The appearance of “8” amongst the swapping Gallows glyphs is curious.

The Relationship Between Currier Languages “A” and “B”

March 1, 2013 24 comments

Captain Prescott Currier, a cryptographer, looked at the Voynich many moons ago, and made some very perceptive comments about it, which can be seen here on Rene Zandbergen’s site.

In particular, he noticed that the handwriting was different between some folios and others, and he also noticed (based on glyph/character counts) that there were two “languages” being used.

When I first looked at the manuscript, I was principally considering the initial (roughly) fifty folios, constituting the herbal section. The first twenty-five folios in the herbal section are obviously in one hand and one “language,” which I called “A.” (It could have been called anything at all; it was just the first one I came to.) The second twenty-five or so folios are in two hands, very obviously the work of at least two different men. In addition to this fact, the text of this second portion of the herbal section (that is, the next twenty-five or thirty folios) is in two “languages,” and each “language” is in its own hand. This means that, there being two authors of the second part of the herbal section, each one wrote in his own “language.” Now, I’m stretching a point a bit, I’m aware; my use of the word language is convenient, but it does not have the same connotations as it would have in normal use. Still, it is a convenient word, and I see no reason not to continue using it.

We can look at some statistics to see what he was referring to. Let’s compare the most common words in Folios 1 to 25 (in the Herbal section, Language A, written in Hand 1) and in Folios 107 to 116 (in the Recipes section, Language B, written in a different Hand):

Comparison between word frequencies in Languages A and B

So, for example, in Language A the most common word is “8am” and it occurs 192 times in the folios, whereas in Language B the most common word is “am”, occurring 137 times.

We might expect that these are the same word, enciphered differently. The question then is, how does one convert between words in Language A and words in Language B, and vice versa? In the case of “8am” to “am” it’s just a question of dropping the “8”, as if “8” is a null character in Language A. In the case of the next most popular words, “1oe” (A) and “1c89” (B), it looks like “oe” (A) converts to “c89” (B). And so on.

If we look at the most popular nGrams (substrings) in both Languages, perhaps there is a mapping that translates between the two. Perhaps the cipher machinery that was used to generate the text had different settings, that produced Language A in one configuration, and Language B in another. Perhaps, if we look at the nGram correspondence that results in the best match between the two Languages, a clue will be revealed as to how that machinery worked.

This involves some software (I’m using Python now, which is fun). The software first calculates the word frequencies for Language A and B in a set of folios (the table above is an output from this stage). It then calculates the nGram frequencies for each Language. Here are the top 10:

[Image: nGramFrequencies]

The software then runs a Genetic Algorithm to find the best mapping between the two sets of nGrams, so that when the mapping is applied to all words in Language B, it produces a set of words in Language A the frequencies of which most closely match the frequencies of words observed in Language A (i.e.  the frequencies shown in the first table above).
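
The two frequency stages that feed this search could be sketched as follows. This is an assumption-laden illustration: the maximum nGram length of 3, the counting of every occurrence (rather than once per distinct word) and the function name `ngram_counts` are choices made here, not details taken from the software itself.

```python
from collections import Counter

def ngram_counts(words, max_n=3):
    """Count every substring of length 1..max_n across all words."""
    counts = Counter()
    for word in words:
        for n in range(1, max_n + 1):
            for start in range(len(word) - n + 1):
                counts[word[start:start + n]] += 1
    return counts

# Toy word list (hypothetical); real input would be the words of one Language.
words = ["8am", "8am", "1oe", "am"]

word_freq = Counter(words)          # stage 1: word frequencies
ngram_freq = ngram_counts(words)    # stage 2: nGram frequencies

print(word_freq.most_common(2))
print(ngram_freq.most_common(10))
```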

Here is an initial result. With the following mapping, you can take most common words in Language B, and convert them to Language A.

Table for converting between a Language B word and a Language A word

A couple of remarks. This is an early result and probably not the best match. There are some interesting correspondences:

  • “9” and “c” are immutable, and have the same function
  • “4o” in Language B maps to “o” in Language A, and vice versa!
  • In Language B, “ha” maps to “h” in Language A, as if “a” is a null

In the Comments, Dave suggested looking at word pair frequencies between the Languages. Here is a table of the most common pairs in each Language.

Common word pairs in Languages A and B

For clarity, I am using what I call the “HerbalRecipeAB” folios for this study, i.e.

Using folios for HerbalRecipeAB : [107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]

More results coming …

f75r cures, pregnancy, life and death – Latin Plainchant

May 15, 2012 3 comments

Here is a result obtained using a Genetic Algorithm to match the text on f75r to Latin. The training corpus I used was a large file of Latin plainchant (the idea being that repeated “words” in the VMs show similarities to chant).

First, here is the folio with the translated words overlain in red:

Folio f75r decrypted as Latin plainchant

The genetic algorithm searched for a set of glyphs that each matched to a pair of Latin letters.

Most of the decrypted words are valid Latin and match words in the plaintext I used to train the GA. Some are Latin but do not appear in the plaintext. The other, invalid, words could be caused by errors in the pair matching.

Or the whole thing could well be nonsense! This is likely – I asked Joel Stevens to translate some of the Latin, and here is what he said:

On first inspection, it seems to be random non-sense. For example:
recita lugete vena dans veta ia debent lustrata lite

Would mean: Recite! Mourn! Blood-vessel giving. Forbid! Oh, they owe things that were purified by the lawsuit.

I’m not really sure how to make sense of it. I don’t see anything that stands out as an obvious sentence. Maybe some words are filler and need to be dropped… or maybe there is a hidden order that needs to be found (assuming these are the correct words).

Here is the Latin:

piraextita recita' lugete' idpirata lucrte vena'
dans' veta' ia' debent' lustrata' vanagete vamirata lite'
lugens' esnt levata' nuta' gens' veanta rochum' nogete le 
dato' uascie excita' curi gent na veta' le veta' luedicta 
veexti vata' arta' te' chum no' no' amicta' luedet luga' 
mori' edente' noga date' reri' lugens' feta' luedicta luga' 
morata' luaena uechum vana' lugete' ad' nt vana' luga'
pate vata' lugertta audita' lugens' vita' curata' resona'
lupina' feta' lumina' lugeum veon lugete' no' vana' lant 
aule lucrte ista' veta' lugens' vita' na ruri' mena' strata' 
luedicta lugens' nota' luedicta na lugent' reti' vena' date' vageas 
dedita' lugent' nota' veta' te' no' iret' veta' na no' vena' luga' 
morata' edicta' lumina' lumina' lumina' lumina' lugens' novena' 
lace' si' educta' lugens' na novena' lugens' vita' lugens' ruti' sita' 
lans' lugens' luuacinota lacium vana' pant' le vena' reista luga' 
aurata' lugete' verata nt ha' urgens' ad' revena lugens' vana'
morati' vemota curatita dant' bunt' id' ncnt vena' lugeas 
pant' vesata lunt vena' lunt ruchum este' late' nota' 
luaerata lunt veta' luga' veta' ulta' lugens' ti 
iu' veta' lute vata' stri sschum veta' lumicium 
clrechum vata' poma' le sebete no' te' ut' 
sati' veta' lugens' vana' le acta' gete vana' tute' 
mori' aeri' luedet lugent' deri lunt vana' lugeas 
lugens' no' edet' luuatita lans' vato te' no' 
lans' orta' luedicta id' na luedicta strata' tuas' 
dans' no' veno dans' no' luia' date' muta' 
gnsiurte duno' luca' alti' vena' resina' date' ruti' rurata na sunt' 
errata' morata' luedet pant' no' dace' veto' lunt nt amicta' luedicta strata' 
dans' fisi' na uachns no' recina lunt dans' novata' luedicta vana' lucrum' 
lu' dans' irquta lumita lugens' vaga' lugent' dans' vana' resino vena' reti' nodo' 
aurata' lumita vita' lugete' vena' vata' lumini' lumina' dant' na 
locuta' dant' vena' lugens' date' pena' vena' lugete' vena' usta' 
luedet nt vena' lunt vana' lugens' ferata rorata' dalias 
pr' nomina' resi date' ruta reti' ruti' no' gens' nomina' 
lugens' ambita' lugens' date' date' vena' lugens' nona' 
pant' mirata luedicta luncta' reti' ruti' date' date' na 
lumina' na ambita' lumina' luedicta lumiista nt 
lumirata luedicta lumina' lumina' lumirata usta' 
serata' luedicta lumirata noedicta ruri' ncusta 
date' vena' lugens' vena' dant' edicta' te' vena' 
paedicta luedicta rucina luga' na edicta' ma 
aurata' lumina' luedicta luiget luigicta date' 
serata' vagete nona' lugens' fiti ruti' nona' 
sprata lans' rerata lumina' rurata renona ruti' ruut nochns ha' date' na 
derata orta' lumita rerata lalint ruta rurata lumena rurata rechns 
lumita lumina' veno lumina' dans' vena' eg plnt vana' noti' 

Landini’s Challenge

February 26, 2010

An excerpt from Landini’s challenge text (text he generated using an undisclosed method, supposed to replicate the features of the VMs text):

qopchdy chckhy daiin ¬ ½shxam chor otechar okcharain ryly sheodykeyl
sheodykeyl daiin shd okaiin qokain qokal yteoldy otedy qokydy opchedy
otal oldar chor lkeedol eer ol dair chedy daiin ockhdar cpheol chedy
xar qokaiin y chedy kshdy ololdy aiin char y okeey oldar qokaiin lsho
daiin olsheam qoeey chedy dchos pshedaiin shedy d qol key sheol or
cpheeedol qokedy qokaiin daiin cthosy chedy ar aiir chedy teeol aiin
cheey y cheam oky qokaiin daldaiin loiii¯ ar shtchy chedy aldaiin
ydchedy daiin shd okaiin qokain daiin qotcho chedy daiin lchy olorol
otedy qockhor shol daiin paichy chedy ar shdair chedal chedy kchdaldy
chckhy otakar qokedy s qooko chor daiin otcholchy chedy daiin koroiin
qokain qokedy kosholdy ol kchedy kshdy qokaiin ar shaikhy olaldy seees
ar oteodar chedy oteeol shedy daiin key dain daiin keeokechy chedy
lchey ail lchedy sches ol dsheeo otol odaiin qokain daiin sheeod chshy
chedy qoekedy tair sain qocheey aiin cheey chaiin ols shedy sheolol
daiin lcheol chedy daiin pchoraiin oshaiin chedy lchey lor sal aiin
cheey y dsheom shedy todydy cheor saiin shdaldy daiin ofchtar daiin

Here are some thought-provoking results from analysing Landini’s text (as suggested by Knox) alongside the VM text, with comparisons against English, Latin, German, French and Spanish. These use a new form of the Genetic Algorithm, described below.

Summary

It looks to me as though Landini either generated his text from a transcription of the VM itself, or his algorithm for generating that text is a good emulation of the encoding process used in the VM. In other words, the Landini “language” is a good candidate as a plaintext language for the VM, unlike the European languages tested.

Results

Here is a table which shows the GA’s efficiency at converting/translating between Voynich, Landini, and the other languages.

(In the table, the best possible score is 1.0 – see below for an explanation)

Asking the GA to translate English to English, or Latin to Latin, etc. results in a high efficiency score, as expected. Note that the Landini to Landini efficiency is 0.97 – almost perfect.

The GA performs moderately at converting between the languages and the Landini text. But what is most striking (to me) is the good efficiency for converting Voynich to Landini (0.74) and Landini to Voynich (0.89).

Some Notes on the table

To look at this I revised my GA code so that it was more flexible, and I jettisoned the use of separate dictionaries. Here is how the GA now functions. It can convert/translate between any language text samples.

1) Two text files are read in: the “source” text, and the “target” text. This could be, for example, a source file containing Landini’s text, and a target file containing Spanish text, if we want to convert from Landini to Spanish.

2) The text in each file is processed separately, producing two word lists, and two sets of n-Gram frequency tables.

3) The chromosomes are generated with random mappings between the source n-Grams and the target n-Grams

4) The GA evolves the chromosomes by trying to maximise their cost. The difference now is that when a target word is generated from the source text using the mappings, it is looked up in the target word list created in 2) above, rather than in a separate dictionary.

5) After training, the best chromosome can have a maximum cost value of 1.0, which would correspond to a perfect conversion between the source text and the target text (i.e. every word produced from the source text is found in the target text dictionary)

6) So we can feed the GA with two identical texts, and after training the score of the best chromosome should be 1.0, and indeed it approaches that (it doesn’t quite get there because only the top 100 n-Grams are translated, and so some characters in the source text cannot be translated).

7) The word and n-Gram frequency lists are made from the entirety of each text, but (for this exploratory study) the training takes place on only the first 50 “words” in the source text, and uses only the first 100 n-Grams for mapping.  Thus if the 50 words of Voynich chosen contain several rare characters, then for those the mapping will fail because those rare characters do not appear in the n-Gram list, and this will result in a lower score.

8) In all cases the “X->X” score in the table (i.e. the diagonal)  represents the best score possible for that language, and is a normalisation for the other numbers in the table. I should really revise the table and divide out the off-diagonal scores by the diagonal normalisations.

9) An improvement would be to configure the n-Gram list to be, say, 200 long, and use more source (Voynich) words for the training. The downside of this is mainly execution speed.

10) These runs were with n-Grams up to 3: it would be better to go to 4 at least.

11) I think Landini gets good scores because the character set he uses is very small. Knox comments: “A factor must be that the Landini Challenge has built-in frequency matches to any transcription of the VMs. Also, there is no meaningful correspondence in the letter sequence of one word to another in Landini. The difficulty fits what I said the VMs may be.”
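
The scoring in notes 4) and 5) and the normalisation proposed in note 8) can be sketched as below. This is only an illustration: the function names are hypothetical, and dividing by the diagonal score of the source language (rather than the target) is an assumption. The two numbers in the usage example are the Landini scores quoted above.

```python
def conversion_score(converted_words, target_words):
    """Fraction of converted words that occur in the target word list (max 1.0)."""
    if not converted_words:
        return 0.0
    vocabulary = set(target_words)
    hits = sum(1 for word in converted_words if word in vocabulary)
    return hits / len(converted_words)

def normalise(raw_scores):
    """Divide each X->Y score by the X->X diagonal score of its source language.

    raw_scores maps (source, target) pairs to raw scores.
    Using the source diagonal is an assumption made for this sketch.
    """
    return {
        (src, dst): score / raw_scores[(src, src)]
        for (src, dst), score in raw_scores.items()
    }

# Half of these converted words appear in the target word list.
print(conversion_score(["vena", "luedicta", "date", "nt"], ["vena", "date", "lumina"]))

# Scores quoted in the text: Landini->Landini 0.97, Landini->Voynich 0.89.
raw = {("Landini", "Landini"): 0.97, ("Landini", "Voynich"): 0.89}
print(normalise(raw))
```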

Genetic Algorithm

February 26, 2010

Basic Idea

In the plaintext, convert each group of 1, 2, 3 or 4 characters into a Voynich group of 1,2,3 or 4 characters. We call this a “mapping”. For example, when creating Voynich from Latin, a cipher mapping might be:

e => o, i => 9, …

er => 4o, is => ok, ti => 8a, …

ent => 9k, ant => A, …

… and so on. This can be encoded as a pair of lists, so that each string in “seek” maps to the string at the same position in “repl”. For example:

	String seek[] = {"4ok1",
			 "4oh","8am","1oe","4ok","ok1","o89","1oy","oh1","o8a","oha","ohc","c89","1co","k1o","1c9",
			 "c79","h1o","1o8","oko","oho","coe","8ae","co8","k19","h19","8ay","ham","hcc","koe","oka",
			 "hco",
			 "1o", "oe", "oh", "4o", "ok", "8a", "89", "am", "1c", "oy", "o8", "co", "ay",
			 "k1", "h1", "19", "hc", "c9", "ha", "ae", "79", "2o", "cc", "ko", "ho", "c8", "9h",
			 "9k", "c7", "2c", "ka", "kc", "1a", "an", "h9", "o,", "e8", "k9", "ap", "8o", "e,",
			 ",1", "7a", "81",
			 "o",  "9",  "1",  "a",  "8",  "c",  "h",  "e",  "k",  "y",  "4",  "m",  ",",
			 "2",  "7",  "s",  "K",  "C",  "p",  "g",  "n",  "H",  "j",  "A"};

	//Latin
	String repl[] = {"un",
			 "ri", "on", "f",  "es", "g",  "em", "de", "se", "co", "ne", "ur", "si", "ic", "ui", "me",
			 "ere","eb", "la", "ma", "le", "id", "bu", "nti","no", "cu", "eba","qui","ie", "al", "ul",
			 "ns",
			 "c",  "d",  "l",  "er", "is", "ti", "nt", "en", "re", "in", "um", "am", "us",
			 "te", "it", "v",  "tu", "ta", "ra", "di", "an", "ni", "li", "et", "ba", "ae", "mi",
			 "ent","st", "h",  "nd", "ci", "pe", "im", "ua", "io", "tur","il", "ve", "iu", "as",
			 "vi", "ita","ca",
			 "e",  "i",  "a",  "t",  "u",  "s",  "r",  "n",  "m",  "o",  "p",  "b", "q",
			 "qu", "at", "or", "ia", "ar", "ce", "ib", "ec", "ab", "ru", "ant"};

Such an algorithm is used inside a Chromosome of the Genetic Algorithm. The Chromosome decodes Voynich into Latin by  matching character groups in the Voynich word against each of the strings in the “seek” list in turn. If a match occurs, then the  Voynich group is translated into the Latin group in the “repl” list at the same position. Thus “4ok1” in Voynich is translated into “un” in Latin.
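
A sketch of that decoding step in Python, for illustration. It assumes a greedy left-to-right scan in which the first matching “seek” string (in list order) wins, and it marks untranslatable characters with “?”. The function name and the small subset of the tables are choices made here.

```python
def translate(word, seek, repl, unknown="?"):
    """Greedy left-to-right translation of one word.

    At each position the first seek string that matches (in list order)
    is replaced by the repl string at the same index.
    """
    out = []
    pos = 0
    while pos < len(word):
        for source, target in zip(seek, repl):
            if word.startswith(source, pos):
                out.append(target)
                pos += len(source)
                break
        else:
            # No seek string matches at this position.
            out.append(unknown)
            pos += 1
    return "".join(out)

# Small subset of the tables above, longest seek strings first.
seek = ["4ok1", "8am", "4o", "ok", "o", "9"]
repl = ["un",   "on",  "er", "is", "e", "i"]

print(translate("4ok19", seek, repl))   # uni
print(translate("ok8am", seek, repl))   # ison
```

Because the longer strings come first in the list, “4ok1” is consumed as one group rather than as “4o” followed by “k1”.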

Once the Voynich word has been translated into Latin, the Latin word is looked up in a Latin dictionary. If the word is found, then the “cost” (or “quality”) of the Chromosome is increased … if the word is not found, then the cost is decreased. After all words in the Voynich text have been converted to Latin, and the aggregate cost of the Chromosome evaluated, it can be judged whether the mapping “seek” to “repl” is a good one or not.
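
As a rough sketch of the cost calculation, assuming a step of +1 for a dictionary hit and -1 for a miss (the actual increments are not stated here) and a hypothetical function name:

```python
def chromosome_cost(translated_words, dictionary):
    """Add 1 for each translated word found in the dictionary, subtract 1 otherwise."""
    return sum(1 if word in dictionary else -1 for word in translated_words)

# Toy dictionary and toy translation output (illustration only).
latin_dictionary = {"vena", "lumina", "date"}
print(chromosome_cost(["vena", "luedicta", "date"], latin_dictionary))   # 1
```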

Generating the Chromosome Population

We generate a large number of Chromosomes, each of which has a different, randomised, “seek” to “repl” mapping. We do this by simply shuffling the order of the “repl” strings in each Chromosome.

Thus, one Chromosome may map “4ok1” to “s” and another may map it to “qui”.
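
Generating the population by shuffling might look like this in Python. Each Chromosome is represented simply as its shuffled “repl” list; the function name, the `seed` parameter and the shortened list are assumptions for this sketch.

```python
import random

def make_population(repl, size, seed=None):
    """Return `size` chromosomes, each an independently shuffled copy of repl."""
    rng = random.Random(seed)
    population = []
    for _ in range(size):
        shuffled = list(repl)   # copy, so the original order is left untouched
        rng.shuffle(shuffled)
        population.append(shuffled)
    return population

# Shortened repl list (illustration only).
repl = ["un", "ri", "on", "f", "es", "qui", "s"]
population = make_population(repl, size=200, seed=0)
print(len(population), population[0])
```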

This population of Chromosomes is then evaluated: each Chromosome converts the Voynich words to Latin, and each then gets a cost. The higher the cost, the better. The highest possible cost would be a Chromosome that had a seek-repl mapping that produced a valid Latin word for each Voynich word.

Training the Chromosomes

The Chromosomes are ordered in decreasing cost, and then the best of them (i.e. at the top of the list) are “mated” together to produce offspring Chromosomes. The mating process essentially involves taking sequences of the “repl” strings from both parents and combining them to form a new “repl” list.

Some of the offspring Chromosomes are then “mutated”. This involves replacing one of the “repl” strings with some randomly selected letters from the Latin character set.
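
The mating and mutation steps could be sketched as follows. One-point crossover is used here as a simple stand-in for “taking sequences from both parents”; the real combination rule may differ. The alphabet, the maximum length of a mutated string and the function names are likewise assumptions.

```python
import random
import string

def mate(parent_a, parent_b, rng):
    """One-point crossover: head of one parent's repl list, tail of the other's."""
    cut = rng.randrange(1, len(parent_a))
    return parent_a[:cut] + parent_b[cut:]

def mutate(repl, rng, alphabet=string.ascii_lowercase, max_len=3):
    """Return a copy with one repl string replaced by 1..max_len random letters."""
    mutated = list(repl)
    index = rng.randrange(len(mutated))
    length = rng.randint(1, max_len)
    mutated[index] = "".join(rng.choice(alphabet) for _ in range(length))
    return mutated

rng = random.Random(1)
parent_a = ["un", "ri", "on", "f"]
parent_b = ["f", "on", "ri", "un"]
child = mate(parent_a, parent_b, rng)
print(child)
print(mutate(child, rng))
```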

The process repeats (ordering the Chromosomes, mating the best ones, mutating the offspring) until a predefined cost value is reached, or the population of Chromosomes refuses to improve itself.
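
Put together, the loop might be sketched like this. The cost, mating and mutation functions are passed in as parameters, and the stopping thresholds, the fraction of parents kept and the mutation rate are all assumed values. The usage example trains on toy bit-lists rather than real “repl” lists.

```python
import random

def train(population, cost, mate, mutate, target_cost,
          max_epochs=500, max_stale=50, keep=0.5, mutation_rate=0.2, seed=None):
    """Order, mate and mutate until target_cost is reached or progress stalls."""
    rng = random.Random(seed)
    best, best_cost, stale = None, float("-inf"), 0
    for _ in range(max_epochs):
        ranked = sorted(population, key=cost, reverse=True)   # decreasing cost
        top_cost = cost(ranked[0])
        if top_cost > best_cost:
            best, best_cost, stale = ranked[0], top_cost, 0
        else:
            stale += 1          # the population "refuses to improve"
        if best_cost >= target_cost or stale >= max_stale:
            break
        parents = ranked[:max(2, int(len(ranked) * keep))]
        offspring = []
        while len(parents) + len(offspring) < len(population):
            child = mate(rng.choice(parents), rng.choice(parents), rng)
            if rng.random() < mutation_rate:
                child = mutate(child, rng)
            offspring.append(child)
        population = parents + offspring
    return best, best_cost

# Toy problem: chromosomes are lists of 0/1 and the cost is the number of 1s.
def toy_mate(a, b, rng):
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]

def toy_mutate(c, rng):
    i = rng.randrange(len(c))
    return c[:i] + [1 - c[i]] + c[i + 1:]

start_rng = random.Random(0)
start = [[start_rng.randint(0, 1) for _ in range(8)] for _ in range(20)]
best, best_cost = train(start, sum, toy_mate, toy_mutate, target_cost=8, seed=0)
print(best, best_cost)
```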

In the end, the best, trained Chromosome will contain the optimal arrangement of “seek” to “repl” mappings for conversion of Voynich to Latin.

The same procedure can be used for a Voynich to English, to German, French or any other language, provided that a dictionary and substantial texts are available to process.

First Results – Voynich to Latin

This is a limited attack on the first five “sentences” of f1r, using 200 chromosomes and a Latin dictionary of around 15,000 words. The best chromosome scores 9.4 after 500 training epochs (cf a score of 20 for a one-to-one translation of Latin into Latin).

Here are the deciphered sentences:

1) Voynich: fa19s 9hae ay Akam 2oe !oy9 ²scs 9 hoy 2oe89 soy9 Hay oy9 hacy 1kam 2ay Ais Kay Kay 8aN s9aIy 2ch9 oy 9ham +o8 Koay9 Kcs 8ayam s9 8om okcc9 okcoy yoeok9 ?Aay 8am oham oy ohaN saz9 1cay Kam Jay Fam 98ayai29

Latin: ?ereieas vias is asasita meas ?ereis ?astuas is quinti mensis asereis vis ereis sttunti viasita viis as?as alis alis qui? asisere? nti quere ere viis ?ita alamisis altuas ereis asis quiita quantis querenti ntiviquis ?asis qui amita ere am? asere?is viis alis ?is ?is isereere?viis

Voynich Herbs

February 26, 2010 1 comment

Edith Sherwood has a web site where she details compelling possible identifications for the plants depicted in the “herbal” pages of the VM.

Dana Scott’s page also has plausible identifications for the plants.

As has often been pointed out, if we look at the first Voynich “word” that appears on each page of the herbal part of the VM, we find that those words are unique, or appear elsewhere very rarely. It thus seems reasonable that the words may be the names of the plants depicted.

The GA was set up to find a set of n-Gram mappings that would convert a list of 111 Voynich first herbal words into Latin/English or Spanish. For this, dictionaries of Latin, English and Spanish herb/plant names were used.

The GA sought a mapping that would convert all the Voynich words for herbs/plants into as many valid plaintext (Spanish, English, Latin) words as possible. The best result was for a mixed English/Latin dictionary (see table): 31 of the 111 Voynich words were converted, a success rate of about 28%.

(One should never expect 100% success, due to missing names in the dictionary, transcription errors, missing n-Grams, incomplete n-Grams etc.)

The results are shown below in tabular form, together with Dana Scott’s and Edith Sherwood’s identification. The first column shows the folio in the VM, the second shows the first Voynich word on that folio. For the GA identification columns (3 and 4) the Voynich mapped word is shown, in quotation marks if not found in the associated dictionary, and in bold if found in the dictionary.

Note that, probably unsurprisingly, nowhere do the IDs from the GA in Spanish, English/Latin and Scott/Sherwood, agree! NOT YET, anyway 🙂

(What amuses me about this mapping technique is that it tends to produce words that sound plausible in the target language. E.g. for f4r the Latin/English word “paptise” sounds like a valid word.)

Folio Voynich 1st Word Candidate GA ID, Spanish Candidate GA ID, Latin/English Dana Scott ID, English Dana Scott ID, Latin Sherwood ID, Latin Sherwood ID, English
f1r fa19s costa “greica”
f1v h1s9 rabo geum Deadly Nightshade Atropa belladonna Hyoscyamus niger Solanum nigrum Solanum dulcamara Atropa belladonna Deadly Nightshade
f2r h98an9 “jzba” “ariapha” Cornflower Centaurea cyanus Centaurea diffusa Diffuse Knapweed
f2v hoom “meic” “padi” Water Lily Nymphaea candida Nymphoides Nymphoides
f3r k2cos chinita (Impatiens) arnica Celosia argentea Feathery amaranth
f3v hoam menta (mint) paris Helleborus foetidus Dungwort
f4r ho8ae19 “mezirn” “paptise” Saxifraga cespitosa Alpine Saxifrage
f4v j1oom pastora (Poinsettia) “oigle” Campanula rapunculus Rampion
f5r h2o89 “piyn” “hicse” Arnica montana Wolfs Bane
f5v hA1coy malanga (Malanga) cirsium Tennis Racket Plant Agrimonia eupatoria Malva sylvestris Mallow
f6r foay “oote” “erk” Acanthus mollis Bear Breeches
f6v hoay9say1Chay “meotendoteisedh” “pakpikrtsst” Eryngium maritimum Sea Holly
f7r f1o8am “saynta” acris Trientalis europea Starflower
f7v joe29 “rden” anise Myrica gale Bog Myrtle
f8r g2oe “dno” “miv” Pisum sativum Green Pea
f8v Ko8 “anop” “amot” Symphytum officinale Comfrey
f9r k98eo “uardna” “cernur” Ricinus communis Castor oil
f9v fo1oy “oveh” “erut” Heartsease, Wild Pansy Viola tricolor Violaceae Viola
f10r g1oK9 “pohon” “apryse” Cichorium pumilum Chicory Endive
f10v gam tora (Tora Tree) gale Linnaea borealis Twinflower
f11r k2oe chino (Chinese Hat Plant) “arv” Rosmarinus officinalis Rosemary
f11v goe81o89 “albaveaca” “maadud” Curcuma longa Turmeric
f13r koy3oy “lenga” “mdoium” Banana Banana
f13v hoaiy “memh” “paft” Lonicera periclymenum Honeysuckles Woodbines
f14r g1o8am “poynta” “apcris” Scorzonera Black Salsify Vipers Grass
f14v g891om “uomic” “gesdi” Stachys monnieri Wood Betony Heal-all Sel-heal Woundwort
f15r k2oy “chiga” “arium” Sonchus oleraceus Sow Thistles
f15v gayoy “t8h” “gabt” Paris quadrifolia Herb Paris
f16r go1co89 “alblanyn” “marscse” Cannabis Cannabis
f16v g1yAm “potoora” “aptule” Chrysanthemum Chrysanthemum
f17r f2o89 “hayn” “ulcse” Catananche caerulea Cupids Dart
f17v g1o8oe “poyno” “apcv” Dioscorea Yams
f18r g8yaz89 “ullngn” “gmeagse” Aster alpinus Aster
f18v koe8 la (?) mad Telfairia Fluted pumpkin
f19r g1oy “poga” apium Polemonium coeruleum Greek Valerian
f19v go1am “albbora” mantle Draba nivalis Nailwort
f20r h81o89 “caveaca” woud Astragalus hypoglottis Milk vetch
f20v faIsay “crrote” greek Cynara cardunculus Cardoon
f21r g1oy “poga” apium Anagallis arvensis Pimpernel
f21v koe829 “laol” “madpe” Dictamnus albus Burning bush False Dittany White Dittany Gas Plant
f22r goe “albv” “maus” Verbena officinalis Common Vervain Holy Herb
f22v g9samoy “..dah” “hnshot” Tulip Tulip
f23r g9818op “.fhilo” “hsthlo” Pulsatilla vulgaris Pasque flower
f23v go8azoe “albzucv” “mapacus” Borago officinalis Borage Star Flower
f24r goyoy9 “alb..” “maby” Cucumis sativus Cucumber
f24v k1o8ay coyote (wild) rock Ficus religiosa Sacred Fig Bo Tree
f25r f1oe89 “sanoaca” “avd” Wild Thyme
f25v goCam “albcuora” “malile” Isatis tinctoria Woad
f26r g%coh9 “spnij” lunaria Prunella vulgaris Self heal
f26v g1c8ay pochote (Pochote) “apgok” Lens culinaris Lentil
f27r hsoy manga (Mango) “veium” Spinacia oleracea Spinach
f27v fo1ou oveja (?) eruca French Marigold Tagetes patula Dianthus superbus Dianthus
f28r g1o8ay “poyote” “apck” Aristolochia Smearwort Birthwort Pipevine
f28v h2oe pino (Pine) “hiv” Dahlia Dahlia imperialis Rhododendrons Rhododendrons
f29r gosam “alb.ora” “mansle” Lactuca sativa longifolia Romaine Cos Lettuce
f29v hoom “meic” “padi” Nigella sativa Roman coriander
f30r oh1cs9 “elanbo” “inrsum” Prunella vulgaris Healall
f30v Ks1an rubia (Madder) montana Cuscuta europaea Dodder
f31r hcc8c9 lichi (Lychee) “rgoio” Erigeron acris Fleabane
f31v go8az “albzon” “mapnn” Fernleaf yarrow Achillea filipendulina Valerian Valerian
f32r f1am santa (?) “aris” Veronica triphyllos Speedwell
f32v h1co8am “ranizora” “genple” Campanula rotundifolia Harebell
f33r k28ay “chizh” “arpt” Silene vulgaris Bladder Campion
f33v kayay “qllh” “opmet” Masterwort Astrantia major Tanacetum parthenium Feverfew
f34r g1cocj19 “ponianos” “apnbie” Anemone hortensis
f34v hs189 “mansn” “vewse” Lunaria annua Honesty Money Plant
f35r Koo anona (Custard Apple) amur Cichorium intybus Radicchio
f35v gay1oy “trtga” galium Ribes nigrum Blackcurrant
f36r j1af8aN “pa.nzti” “onupfl” Delphinium staphisagria Delphinium
f36v g1ayos9 “pooteesn” “apksise” Lamium amplexicaule Henbit
f37r koGoe “luiv” malus Mentha longifolia Mint
f37v h2o89 “piyn” “hicse” fedtschenkoi englerii Emilia fosbergii Tassel flower
f38r koeoy “lilh” “mmut”
f38v oh1oj “eveet” inula Euphorbia myrsinites Myrtle Spurge
f39r kc7o128 “goguadp” “gienmpot”
f39v g7aiy “inmh” “naft”
f40r g1c9 “poi” apio Erodium malacoides Storks bill
f40v j1c7an “pagmo” “oospo” Epiphyllum oxypetalum Crocus vernus Crocus
f41r j2c9hc8aecc9 “roilizrii” “ediorpcuio” Origanum vulgare Wild Marjoram
f41v hcSo8ae “lirbzv” “riupus” Coriandrum sativum Coriander Cilantro
f42r 2o “ah” st
f42v k1o˛ cola (?) rosa Aquilegia vulgaris Columbine Culverwort
f43r kayo8am “q.zora” “opbple” Stellaria media Chickweed
f43v g8saiy9 “u.lbn” “gnsicse” Elytrigia repens Couch grass
f44r k2o8g9 “chiy.” arch Mandragora officinarum Mandrake
f44v k2o china (Impatiens) “arur” Apium graveolens Celery
f45r g9h98ae “.jzv” “hariapus” Atriplex hortensis Orach Saltbush
f45v hosay9 “me..” pansy Lavandula angustifolia Lavender
f46r g1coJ9 “ponitr” “apnta” Leucanthemum vulgare Oxeye Daisy
f46v jo79e3c7 “rimvig” “andretos” Tanacetum parthenium, Chrysanthemum parthenium Inula conyza Ploughmans Spikenard Great Fleabane
f47r g1aiy “pomh” “apft” Lady’s Mantle, Lion’s Foot Alchemilla vulgaris Rosaceae Sempervivum tectorum Houseleek
f47v g2cok “dnier” minor Arnica montana Pulmonaria officinalis Lungwort
f48r g28am “dzora” “miple” Adonis Vernalis False Hellebore
f48v g1co819 “ponifn” “apnsse” Ruta graveolens Rue Herb of Grace
f49r gA2oe “ceahv” costus Nymphaea caerulea Blue Nile Lotus
f49v g he wort
f50r g2coy “dnih” mint Astrantia major Masterwort
f50v k19 con (?) rose Telopea speciosissima Gentiana frigida Stiff Gentian
f51r k2oe819 “chinofn” “arvsse” Cakile maritima Searocket
f51v go2o89 albahaca (Basil) “mastd” Salvia officinalis Sage
f52r k8oh1F9 “queacn” “toinnise” Anemone coronaria Poppy Anemone
f52v g1oy “poga” apium Polystichum setiferum Fern
f53r hA8ap “mazlo” “ciplo” Achillea Ptarmica Sneezewort
f53v k2oy3c9 “chigamin” “ariumocse” Hieracium aurantiacum Hawkweed
f54r go8am “albzora” maple Cirsium oleraceum Cabbage thistle
f54v g1co8ay “ponizh” “apnpt” Bittersweet Nightshade Solanum dulcamara Perovskia atriplicifolia Russian Sage
f55r go8am “albzora” maple Fumaria officinalis Fumitory
f55v h1C8189 “raecsn” “geriwse” Forest lily Veltheima bracteata Broccoli Broccoli
f56r ok1ae “tebv” “trntus” Drosera Sundews
f56v h1cok “ranier” “genor” Cycas revoluta Sago Palm
f57r joccoHc9 “riopei” “anomiaio” Sherardia arvensis Blue Field Madder
f65r Alchemilla vulgaris Ladies Mantle
f65v Centaurea cyanus Cornflower
f66v Satureja montana Winter Savory
f87r Satureja hortensis Summer Savory
f87v Senecio Primula vulgaris Primrose
f87v Kleinia Pedicularis flammea Lousewort Wood Bettony
f89v Actaea spicata Baneberry
f90r Conyza bonariensis Fleabane
f90v Eruca vesicaria Arugula Rocket
f93r Cynara cardunculus Artichoke
f93v Lupinus Lupin
f94r Botrychium lunaria Botrychium lunaria Moonwort Moonfern
f94v Agrostemma Githago Corncockle Red Campion
f94v Glycyrrhiza glabra Liquorice
f94v Plantago lanceolata Ribwort Plantain Kemps
f95r Berberis Sambucus nigra Elderberry
f95v Althaea Rosea Hollyhock
f96r Angelica archangelica Garden Angelica
f96v Tamus communis Black Bryony