Archive for the ‘Genetic Algorithm’ Category

Language A and B Again

March 13, 2013

A tentative conclusion from comparing Language A and Language B is that the non-gallows glyphs are used in the same way in both Languages.

That is to say, they appear to mean the same thing. So the “o” in A means the same as the “o” in B.
There is some persistent “mixing” between the e/y glyphs, which is illustrated by the example result below:
[Image: ABMixing]
There is also some doubt about the “8” glyph, which sometimes seems to mix with the gallows glyphs (e.g. in some cases, the “8” appears in A to function in the same way as a gallows glyph in B and vice versa). This may simply be an error in the comparison method, or it may be that the “8” is a null, or it may be due to some other effect.
The gallows glyphs are different – they don’t appear to mean the same in A and B. I’m focussing on those glyphs now.

Language “A” and “B” Conversions

March 5, 2013

This is an update to my previous two posts on this topic.

I have been concentrating on searching for the correspondence between glyphs used in Language A, and glyphs used in Language B. As a reminder, the method is to take all words in, say, Language A, and “convert” them to words in Language B by changing the glyphs according to a candidate mapping table. The frequency of the converted Language B words is then compared with the original Language A words: the closer the frequencies, the better the mapping match.
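
As a rough illustration, the convert-and-compare step might be sketched in Python as follows. The function names, the toy word lists and the mismatch measure (sum of squared differences between relative word frequencies) are assumptions made for this sketch; the scoring actually used in the trials is not spelled out in this post.

```python
from collections import Counter

def convert(word, mapping):
    """Rewrite a word glyph by glyph using a candidate mapping table."""
    return "".join(mapping.get(glyph, glyph) for glyph in word)

def relative_frequencies(words):
    """Return each distinct word's share of the total word count."""
    counts = Counter(words)
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}

def mismatch(words_a, words_b, mapping):
    """Lower is better: 0.0 means every converted A word is exactly as
    frequent in B as the original word was in A (assumed measure)."""
    freq_converted = relative_frequencies(convert(w, mapping) for w in words_a)
    freq_b = relative_frequencies(words_b)
    vocabulary = set(freq_converted) | set(freq_b)
    return sum((freq_converted.get(w, 0.0) - freq_b.get(w, 0.0)) ** 2
               for w in vocabulary)

# Toy word lists for illustration only, not taken from the VMs
words_a = ["oe9", "oe9", "8ay"]
words_b = ["oy9", "oy9", "8ae"]
swap_e_y = {"e": "y", "y": "e"}   # candidate mapping: e <-> y
identity = {}                      # candidate mapping: every glyph maps to itself

print(mismatch(words_a, words_b, swap_e_y))  # 0.0
print(mismatch(words_a, words_b, identity))
```

With these toy lists the e/y swap gives a perfect match, while the identity mapping gives a larger mismatch, which is the kind of signal the search relies on.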

Method Check using only Language A words

As a check of the method, I took the Herbal folios 1-25 (all in Language A) and split them into two groups: 1-12 and 13-25, and I then artificially labelled the latter group as Language B. Then I ran the matching procedure, which produced the following result:

Epoch 62 Best chromosome 0 Value= 5.62272615159e-05
Chromosome ['o', '9', '1', 'i', '8', 'a', 'e', 'c', 'k', 'y', 'h', 'N', '2', '4', 's', 'g', 'p', '?', 'K', 'H']
ngramsA    ['o', '9', '1', 'i', '8', 'a', 'e', 'c', 'h', 'y', 'k', 'N', '2', '4', 's', 'g', 'p', '?', 'K', 'H']

This is good and reassuring, since it shows that the words in folios 13-25 have essentially the same frequency distribution when their glyphs are mapped to the same glyphs in folios 1-12.

Removal of Glyph Variants in Voyn_101

As the tests progressed, it became clear that some of the glyphs GC defined in Voyn_101 were in fact variants of more common glyphs. The most obvious were the “m”, “n”, “M” glyphs mentioned before – with these included, the conversions between Language B and Language A were of much poorer quality than if they were expanded to “iiN”, “iN” and “iiiN” respectively. After some time weeding out these variants, the following table was arrived at:

seek =  ["3", "5", "+", "%", "#", "6", "7", "A", "X", 
         "I", "C", "z", "Z", "j", "u", "d", "U", "P", 
         "Y", "$", "S", "t", "q",
         "m", "M", "n", "Y", "!", ")", "*", "b", "J", "E", "x", "B", "D", "T", "Q", "W", "w", "V", "(", "&"]
repl =  ["2", "2", "2", "2", "2", "8", "8", "a", "y", 
         "ii", "cc", "iy", "iiy", "g", "f", "ccc", "F", "ip",
         "y", "s", "cs", "s", "iip",
         "iiN", "iiiN", "iN", "y", "2", "9", "p", "y", "G", "c", "y", "cccN", "ccN", "s", "p", "h", "h", "K", "9", "8"]
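
A minimal sketch of how such a table can be applied to a word. Only a few pairs are copied from the table above, and the helper name `to_base_glyphs` is made up for this example. Since every “seek” entry is a single character, a per-character lookup is enough.

```python
# A few (seek, repl) pairs copied from the table above; the full table
# would be used in practice.
seek = ["m", "M", "n", "3", "6", "A"]
repl = ["iiN", "iiiN", "iN", "2", "8", "a"]

# Each seek entry is one character, so a simple dictionary lookup suffices.
variant_to_base = dict(zip(seek, repl))

def to_base_glyphs(word):
    """Expand glyph variants into their base-glyph equivalents."""
    return "".join(variant_to_base.get(glyph, glyph) for glyph in word)

print(to_base_glyphs("8am"))   # 8aiiN
print(to_base_glyphs("oham"))  # ohaiiN
```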

I am very confident that the glyphs remaining after using the above conversion table are the base set. The base set of glyphs is thus:

Language A frequency order: 'o', 'c', '9', '1', 'a', '8', 'e', 'i', 'h', 'y', 'k', 's', '2', 'N', '4', 'g', 'p', '?', 'K', 'H', 'f', 'G', 'F', 'L', 'l', 'v', 'r', 'R'
Language B frequency order: 'c', 'o', '9', 'a', '8', 'e', '1', 'h', 'i', 'y', 'k', '2', 'N', 's', '4', 'g', 'p', 'f', '?', 'H', 'K', 'G', 'F', 'l', 'L', 'R', 'r', 'v'

where “?” represents all very rare glyphs (such as the “picnic table” glyph). There are thus 27 glyphs (15 gallows and 12 regular) excluding the rare special glyphs like the picnic table.

Glyph Mixing Between A and B

I ran many trials using the base set of glyphs, comparing various sections of the VMs written in the different hands. In particular, the following folio collections were defined:

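# Folio numbers in each collection. In Python 3, range objects must be
# turned into lists before they can be concatenated with "+".
Special = {'HerbalRecipeAB': list(range(107, 117)) + list(range(1, 26)),
           'HerbalAB': list(range(1, 57)),
           'HerbalBalneoAB': list(range(1, 26)) + list(range(75, 85)),
           'HerbalAstroAB': list(range(1, 13)) + list(range(67, 75)),
           'PharmaRecipeAB': [88, 89, 99, 100, 101, 102] + list(range(103, 117)),
           'AllAB': list(range(1, 117)),
}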

The collection I used the most was the one called “HerbalBalneoAB”, which contains Herbal folios written in Language A, and Balneo folios written in Language B. The nice feature of this collection is that the number of words is around the same for both Languages, which makes comparing counts very easy:

Total words =  2846  Total Language A =  1581  Total Language B =  1584

As an example, here is a trial result for HerbalBalneoAB:

Language B ['o', '9', '1', 'a', 'i', 'f', 'c', 'y', 'h', 'e', 'K', 'N', '2', 's', '4', 'g', 'p', '8', 'k', 'H']
Language A ['o', '9', '1', 'a', 'i', '8', 'c', 'e', 'h', 'y', 'k', 'N', '2', 's', '4', 'g', 'p', 'K', '?', 'H']

In all the tests I ran, there were some common features in the results:

  • Mixing between “e” and “y” – when writing Language A, the use of “e” appears to be equivalent to the use of “y” in Language B, and vice versa
  • Mixing between 8, f, F, k, K, g, G, r, R, ? and so on – the gallows glyphs swap amongst themselves, and with “8”

Just about all trials showed the “e”/“y” mixing. Tony Gaffney pointed out that these two glyphs are quite similar in stroke construction. The appearance of “8” amongst the swapping gallows glyphs is curious.

An abjad result from the Genetic Algorithm

June 17, 2011

Here is one of the GA results. This is an attempt at deciphering the text on f9v (the Viola plant). The VMs words on that folio are:

"fo1oy","ogoyo89","og9","2oy","4og19j1o","4ofoe","2oe",
"81oy","1oe","1oy","89","ok9","89",
"9hc9","1oy","oh9","occcs",
"9kc9","k19","okoe","ok9","koe89",
"g1oy","9j1cc9","4okoy","9j19","kc","ay","1k9",
"o8oe","1o9","h2co89","1o89","ok19","9ha",
"4o","1oe","1oe","okae","8oy",
"4oh1o","yoh98","8ae9",
"19","kay","19k9","8ay9","9koe89",
"ok9","h1oe","1oe","19","h9k9",
"91oy","12ok9","1oy"

These are not all the words on the folio: I have removed those that contain unusual or problematic glyphs (e.g. the “m”).

The GA comes up with the following VMs->Latin character mapping:

Voynich: o    9    1    k    y    8    e    c    h    a    4    g    2    j    f    s
Plain:   r    s    d    p    m    b    t    n    f    l         q    c    x    v    g

Note that the GA has assigned VMs “4” to a null.
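
The mapping can be applied mechanically to each VMs word. The short Python sketch below does this. The names `glyph_to_consonant` and `decipher` are invented for the example, and “4”, which has no Latin letter in the table, is treated as a null (empty string).

```python
voynich = ["o", "9", "1", "k", "y", "8", "e", "c", "h", "a", "4", "g", "2", "j", "f", "s"]
plain   = ["r", "s", "d", "p", "m", "b", "t", "n", "f", "l", "",  "q", "c", "x", "v", "g"]

# "4" maps to the empty string, i.e. it is dropped from the output.
glyph_to_consonant = dict(zip(voynich, plain))

def decipher(word):
    """Turn a VMs word into its string of Latin consonants."""
    return "".join(glyph_to_consonant[glyph] for glyph in word)

print(decipher("fo1oy"))  # vrdrm
print(decipher("4ofoe"))  # rvrt
print(decipher("koe89"))  # prtbs
```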

And here are the deciphered words. On each line you have the VMs word, the Latin consonants, then the possible Latin or English words in the dictionary that match the abjad.

fo1oy = vrdrm =  virdiarium viridarium viridiarium
ogoyo89 = rqrmrbs =  ?
og9 = rqs =  requies arquus
2oy = crm =  carum coram curam corium cremo cyrum curiam acerum acorum acroama acrum aecoreum careum cereum cerium ceroma coarmi coarmo crami cremii cremi croma cromae curium cream
4og19j1o = rqdsxdr =  ?
4ofoe = rvrt =  reverti reverto iuraverat
2oe = crt =  certa certe certo creta curatio curto creat coarto create cartae caret acerata careota careotae cariota cariotae carota carotae carta carti caryitae caryota caryotae ceratia ceratiae ceratii cerati cerata ceroti certi coertio coryti cratio creatio creati creata cretae cretea cretio crita critae croto curate curata curiatia curiata curito curta ocreata court courte curt cart
81oy = bdrm =  obdormio
1oe = drt =  audierat deerat oderit odorati aderat auderet durat diruat daret deaurata adaeratio adoratio deartuo deorata deratio diratio dirutio duratio duritia duritiae duritiei odoratio odorata darte dirty
1oy = drm =  audieram darem dierum dormio oderam odorem iudeorum deorum darium adoreum adorium dearmo diarium dirimo diremi dirum dormeo drama dromo durum edormio edurum odorum dram
89 = bs =  abs bis bos iubeas iubes basio uobis abusi ibis abies absi abusio baes bas basi bes bios bus ibos obesa obsuo obsui base abuse bees boys busy bays
ok9 = rps =  repsi rapis aeripes euripus reapse reposui rupes rupis ropes
89 = bs =  abs bis bos iubeas iubes basio uobis abusi ibis abies absi abusio baes bas basi bes bios bus ibos obesa obsuo obsui base abuse bees boys busy bays
9hc9 = sfns =  sifonis
1oy = drm =  audieram darem dierum dormio oderam odorem iudeorum deorum darium adoreum adorium dearmo diarium dirimo diremi dirum dormeo drama dromo durum edormio edurum odorum dram
oh9 = rfs =  rufus refuse
occcs = rnnng =  running runninge
9kc9 = spns =  sapiens spinas sponsi sponsa supinis spensa spinis yspanos sapineus sapinus saponis siponis sopionis spensae spineus spinosa spinus spons sponsae sponsio sponso supinus
k19 = pds =  pedes pedis apodis pods
okoe = rprt =  reperiet reparat eriperet reperta reperit reparatio reperti reporto reporte report
ok9 = rps =  repsi rapis aeripes euripus reapse reposui rupes rupis ropes
koe89 = prtbs =  partibus portabis parietibus
g1oy = qdrm =  quadrum quadrima
9j1cc9 = sxdnns =  ?
4okoy = rprm =  reprimi reprimo
9j19 = sxds =  ?
kc = pn =  opinio opino paene pene poena pono punio puny upon pane pena pone apiana apianae apina apinae paean paeon paeonia penae peni pinea pini poenae poenio open pen paine pain payne pyany pin pine pan peny peony
ay = lm =  aliam alium lama lamia lima limo olim almi oleum alme alma aulam alum aulaeum elimo ilum lamae lamiae lema limae limi ulmea ulmi elm
1k9 = dps =  dapes daps adeps adipis adipeus adips adipsi adposui dapis deposui depso depsui diapasi
o8oe = rbrt =  arboreti robert
1o9 = drs =  aderas derisui dorso durus odores duros dirus edurus odorus edrus durius diris duris derisio dares adoris adoreus adoriosa adrasi adrisi adrisio adrosi adursi derasi derisi derisa derosi derosa diarius dirasi dorsi odoris deirous dooers doores dryes dries drousie dyers
h2co89 = fcnrbs =  facinoribus
1o89 = drbs =  derbiosa
ok19 = rpds =  rapidus
9ha = sfl =  useful safly safely
4o = r =  aer ara aro aurae aure aurea auro ero eruo ira irae ire iuro or ore ori oro re rea rei rui ruo aera aerio ora iura aura era r uero uaria area auri iure iuri ere aeer aerae aerea aerei aeria aero arae areae areo arui aria ariae ari aureae aurei eiero eare erae erui eri euro euroa euri iro orae reae uro uri rai are oure yeare your our youre ear rue year yeer air rye ar
1oe = drt =  audierat deerat oderit odorati aderat auderet durat diruat daret deaurata adaeratio adoratio deartuo deorata deratio diratio dirutio duratio duritia duritiae duritiei odoratio odorata darte dirty
1oe = drt =  audierat deerat oderit odorati aderat auderet durat diruat daret deaurata adaeratio adoratio deartuo deorata deratio diratio dirutio duratio duritia duritiae duritiei odoratio odorata darte dirty
okae = rplt =  repleuit repleta repleat
8oy = brm =  baioarium barim baioariam brume bireme boarium boreum borium bromi bruma brumae eboreum ebrium ebureum obarmo broom
4oh1o = rfdr =  ?
yoh98 = mrfsb =  ?
8ae9 = blts =  oblitus balatus balteus ablatis ablutus abolitus ablatus belatus beluatus bliteus boletus bolites oblatus blites
19 = ds =  ades audias audis das deos deus dies duos odiosa dis adso iudeis ydus adesa adsuo adsui aedes aedis aedus dasea daseae dasia dasiae des desuo desui diis dius dos duis edius edus idos odiose udus dayes daies odyous dose ads daisie
kay = plm =  palam palma pluma pulmo puleium epulum pilum palmo apuliam palium apalum palmae palmea palmi palum paulum pileum plumae plumea plum polium polum palm
19k9 = dsps =  dasypus deseps disposui despise
8ay9 = blms =  bulimos bulimosa bulimus balms
9koe89 = sprtbs =  spiritibus
ok9 = rps =  repsi rapis aeripes euripus reapse reposui rupes rupis ropes
h1oe = fdrt =  foederata foederati
1oe = drt =  audierat deerat oderit odorati aderat auderet durat diruat daret deaurata adaeratio adoratio deartuo deorata deratio diratio dirutio duratio duritia duritiae duritiei odoratio odorata darte dirty
19 = ds =  ades audias audis das deos deus dies duos odiosa dis adso iudeis ydus adesa adsuo adsui aedes aedis aedus dasea daseae dasia dasiae des desuo desui diis dius dos duis edius edus idos odiose udus dayes daies odyous dose ads daisie
h9k9 = fsps =  ?
91oy = sdrm =  siderum sidereum sudarium
12ok9 = dcrps =  decerpsi decarpsi
1oy = drm =  audieram darem dierum dormio oderam odorem iudeorum deorum darium adoreum adorium dearmo diarium dirimo diremi dirum dormeo drama dromo durum edormio edurum odorum dram

Vowels and 17

June 17, 2011

Latin Alphabet – the number 17 again

From http://latindiscussion.com/latin/alphabet/history/evolution

“Before the Renaissance, letters J and U had been merely glyph variants of I and V.
W was first used by scribes writing Old English during the 7th century AD.”

CLASSICAL LATIN ALPHABET:

(22) A B C D E F G H I L M N O P Q R S T V X Y Z

REMOVE VOWELS:

(5) A E I O V

REMAINING:

(17) B C D F G H L M N P Q R S T X Y Z
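
The count is easy to check with a couple of lines of Python (variable names are arbitrary):

```python
classical = "A B C D E F G H I L M N O P Q R S T V X Y Z".split()
vowels = set("AEIOV")  # V doubles as the vowel U in classical Latin
consonants = [letter for letter in classical if letter not in vowels]

print(len(classical), len(vowels), len(consonants))  # 22 5 17
```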

f57v

Glyphs on f57v, 3rd ring, Voyn_101 encoding – 4 sets of 17 characters each:

o.e.8.y.?.?.h.p.f.?.k.y.?.?.9.?.?.
o.e.8.y.?.?.h.p.f.?.k.y.?.?.9.?.?.
o.e.8.y.?.?.h.p.f.?.k.y.?.?.9.?.?.
o.e.8.y.?.?.h.p.f.?.k.y.?.?.9.?.?.

compare:
b.c.d.f.g.h.l.m.n.p.q.r.s.t.x.y.z.

(Glyphs marked with “?” are very rare, and occur only on f57v and one other folio.)

Latest Vowel-less Table for Genetic Algorithm results

 

Vowel-less plaintext

June 13, 2011

Suppose the VMs words have no vowels, and that a simple alphabetic substitution has been used to create the text from vowel-less plaintext.

I used a Genetic Algorithm to test this hypothesis on some of the naked lady labels in the Balneological section. Using a large Latin dictionary, I stripped out all vowels “aeiou” from the Latin words, giving me a set of vowel-less Latin words. This was then used by the GA to try to find the best 1-1 mapping between VMs glyph and Latin.
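
A small Python sketch of the vowel-stripping step, assuming the dictionary is simply a list of words. The five sample words stand in for the large Latin dictionary, and the function names are made up for the example. Grouping the words by their consonant skeleton makes it easy to list the voweled candidates for any deciphered label later on.

```python
from collections import defaultdict

VOWELS = set("aeiou")

def strip_vowels(word):
    """Remove the vowels a, e, i, o, u from a word."""
    return "".join(ch for ch in word.lower() if ch not in VOWELS)

def build_skeleton_index(dictionary_words):
    """Map each vowel-less skeleton to the dictionary words that produce it."""
    index = defaultdict(list)
    for word in dictionary_words:
        index[strip_vowels(word)].append(word)
    return index

# A handful of Latin words standing in for the full dictionary
sample = ["inter", "intra", "natura", "rivos", "fas"]
index = build_skeleton_index(sample)

print(index["ntr"])  # ['inter', 'intra', 'natura']
print(index["rvs"])  # ['rivos']
```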

Here is a table of the starting statistics. The “Source” is the VMs (in the Voyn_101 encoding), the Target is Latin. The second and fifth columns show the total number of occurrences of each glyph and each Latin letter, respectively, and the following columns show that number as a fraction of the total. The rows are in order of glyph/letter frequency.

There are 16 VMs glyphs, and 22 Latin letters.

16 Voynich nGrams 21 plaintext nGrams
Top 16 1-grams in Voynich and 1-grams in plaintext
Source                          Target
Glyph  Count  Fraction          Letter  Count  Fraction
-----  -----  --------          ------  -----  --------
o      52     0.21311475        s       7666   0.14250925
e      35     0.14344262        r       7450   0.13849385
9      30     0.12295082        t       7053   0.13111371
8      27     0.11065574        n       5706   0.10607328
a      25     0.10245901        c       4386   0.08153477
h      25     0.10245901        m       4340   0.08067964
y      17     0.06967213        l       3707   0.06891231
2      6      0.024590164       p       3079   0.05723793
k      5      0.020491803       d       2790   0.051865485
c      5      0.020491803       b       1725   0.03206737
i      5      0.020491803       v       1424   0.026471846
1      4      0.016393442       f       1372   0.025505178
s      3      0.012295082       g       1347   0.025040433
N      2      0.008196721       q       600    0.0111538675
4      2      0.008196721       h       509    0.009462197
g      1      0.0040983604      x       499    0.0092763

To run the GA, I used a simple weighting function that added the square of the length of every label that was decoded into a valid plaintext word.
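
In Python, a weighting function of this kind could look like the sketch below. It assumes that “length” means the length of the VMs label (rather than of the decoded consonant string), and the partial mapping and the two valid skeletons are example values chosen only to exercise the function.

```python
def weight(labels, mapping, valid_skeletons):
    """Sum the squared lengths of all labels that decode to a valid
    vowel-less plaintext word."""
    total = 0
    for label in labels:
        decoded = "".join(mapping[glyph] for glyph in label)
        if decoded in valid_skeletons:
            total += len(label) ** 2   # assumption: length of the VMs label
    return total

# Example values for illustration; "o" is mapped to a null (empty string)
mapping = {"o": "", "e": "r", "a": "t", "9": "s", "8": "n", "h": "v"}
valid_skeletons = {"rtrs", "ntr"}
labels = ["oeae9", "8ae", "oha89"]

print(weight(labels, mapping, valid_skeletons))  # 34
```

Here “oeae9” (length 5) and “8ae” (length 3) decode to valid skeletons, so the weight is 25 + 9 = 34. Squaring the length rewards a mapping more for decoding one long label than for decoding several short ones.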

Here are the results of one run, where about 50% of the labels (25/53) were converted. First the derived mapping between VMs glyph and Latin consonant:

Voynich: c    1    k    2    y    i    h    s    o    a    4    8    e    N    9    g    
Plain:   l    g    c    p    f    x    v    y        t    q    n    r    d    s    b

Note that the GA has assigned VMs “o” to a null …

Now here are the deciphered labels, with the possible voweled Latin words each may correspond to:

Source  : oeae9
Decipher: rtrs' : oratorius
Source  : oe189
Decipher: rgns' : origines
Source  : oha89
Decipher: vtns
Source  : ohoeo
Decipher: vr' : varia varie ver vera vere veri vero vir viro voro avara
Source  : ohoy9
Decipher: vfs
Source  : ogoy
Decipher: bf
Source  : oeh9
Decipher: rvs' : rivos
Source  : ohaN
Decipher: vtd
Source  : ohay
Decipher: vtf
Source  : oh29
Decipher: vps
Source  : sayae
Decipher: ytftr
Source  : 8ohae
Decipher: nvtr' : invetero
Source  : 8ayoe
Decipher: ntfr
Source  : 8ae89
Decipher: ntrns' : nutriens internus
Source  : 8ae28
Decipher: ntrpn' : interpono
Source  : 8aehay
Decipher: ntrvtf
Source  : 4ohae
Decipher: qvtr
Source  : 8e9
Decipher: nrs' : inrisuo iners
Source  : oy9
Decipher: fs' : fas
Source  : ok9
Decipher: cs' : acies acsi causa causae cuius iaces iocus ocius casa casia cos
Source  : e19
Decipher: rgs' : erigis reges regius rgis rugas regis
Source  : 8ay9
Decipher: ntfs
Source  : 8ae
Decipher: ntr' : antra inter interea intereo intra intro intueor natura naturae nitor nutrio nitori enitor enutrio ianitor notoare
Source  : 8ae89
Decipher: ntrns' : nutriens internus
Source  : 4oko8
Decipher: qcn
Source  : yhae
Decipher: fvtr
Source  : 9hc89
Decipher: svlns
Source  : oeh19
Decipher: rvgs
Source  : oko89
Decipher: cns' : canis canos cinis consui consuo censeo cuneus
Source  : ohay
Decipher: vtf
Source  : ohae
Decipher: vtr' : vetera viatori vitrea veter viator
Source  : ohoe89
Decipher: vrns
Source  : ohaiya89
Decipher: vtxftns
Source  : oh1oy
Decipher: vgf
Source  : oeaiiN
Decipher: rtxxd
Source  : 8oeoe
Decipher: nrr' : narro
Source  : sohoe9
Decipher: yvrs
Source  : oeha
Decipher: rvt
Source  : h9
Decipher: vs' : evasi ovis vasa vias viis vis visa visu vos avus vas visio
Source  : soyoye
Decipher: yffr
Source  : oeoeae
Decipher: rrtr
Source  : oy
Decipher: f' : fio fui f of
Source  : 2chay
Decipher: plvtf
Source  : 989
Decipher: sns' : sanes sanies sanus senis sensa sensi sensu sonas sinus
Source  : ohc89
Decipher: vlns' : valens volans volens vulnus
Source  : eoe9
Decipher: rrs' : rarus ruris rarius
Source  : 8oiiy
Decipher: nxxf
Source  : oe29
Decipher: rps' : repsi
Source  : okc89
Decipher: clns' : colonus
Source  : ehoe
Decipher: rvr' : revera
Source  : ohoe29
Decipher: vrps
Source  : oko89
Decipher: cns' : canis canos cinis consui consuo censeo cuneus
Source  : 82c89
Decipher: nplns

Does the language of Dante fit the VMs?

October 4, 2010

I have spent many pleasurable hours checking various exotic cipher and code ideas with a GA, and none of them remotely fits, except one. My faith in the GA technique rests on the fact that it very quickly gives an idea of how well a code/cipher theory fits the VMs text.

The one cipher idea and plaintext language that does notably better than all others is an nGram mapping with the language of Dante as the plaintext. This is a form of early Italian, and it produces results significantly better than all other languages tried with nGrams, including Latin, German, English, Spanish, Dutch, Chinese etc.

I’ll post some results from this nGram/Dante GA later.

There is a significant obstacle to applying computational techniques to the VMs, and that is the machine transcriptions of the VMs text. Basically they differ substantially, to the extent that statistics obtained with, say, EVA do not match well with statistics obtained with, say, Voyn_101. A particular problem is glyph bloat … my opinion is that GC’s Voyn_101 transcription contains many more glyphs than the scribes were actually using. Little differences between the ways of writing “9”, for example, are classified as different glyphs. This plays havoc with statistical analysis.

Thus I have a procedure that filters the Voyn_101 and remaps e.g. those multiple “9” glyphs to the same glyph. This allows a smaller, more realistic, search space. But it still doesn’t address the question of what strokes make up a single glyph, which is often open to interpretation. Thus any nGram mapping procedure has to allow for 1-, 2- and 3-Grams in the Voynich, at least, to be reasonably sure of covering the glyph correspondences properly.

Here is an extract of the Dante Alighieri text that matches decently using nGrams to the VMs:


Cjant Prin

A metàt strada dal nustri lambicà
mi soj cjatàt ta un bosc cussì scur
chel troj just i no podevi pì cjatà.

A contàlu di nòuf a è propit dur:
stu post salvàdi al sgrifàva par dut
che al pensàighi al fa di nòuf timour!

Che colp amàr! Murì a lera puc pi brut!
Ma par tratà dal ben chiai cjatàt
i parlarài dal altri chiai jodùt.

I no saj propit coma chi soj entràt:
cun chel gran sùn che in chel moment i vèvi,
la strada justa i vèvi bandonàt.

Necuàrt che in riva in su i zèvi
propit la ca finiva la valàda
se tremaròla tal còu chi sintèvi

in alt jodùt iai la so spalàda
vistìda belzà dai rajs dal pianèta
cal mena i àltris dres pa la so strada.

(This is modified from a reply to Knox who commented on an earlier post.)

Genetic Algorithm

February 26, 2010

Basic Idea

In the plaintext, convert each group of 1, 2, 3 or 4 characters into a Voynich group of 1,2,3 or 4 characters. We call this a “mapping”. For example, when creating Voynich from Latin, a cipher mapping might be:

e => o, i => 9, …

er => 4o, is => ok, ti => 8a, …

ent => 9k, ant => A, …

… and so on. This can be encoded as a pair of parallel lists, in which each string in “seek” (Voynich) corresponds to the string at the same position in “repl” (plaintext). For example:

	// Voynich glyph groups, longest first; seek[i] pairs with repl[i]
	String[] seek = {"4ok1",
			 "4oh","8am","1oe","4ok","ok1","o89","1oy","oh1","o8a","oha","ohc","c89","1co","k1o","1c9",
			 "c79","h1o","1o8","oko","oho","coe","8ae","co8","k19","h19","8ay","ham","hcc","koe","oka",
			 "hco",
			 "1o", "oe", "oh", "4o", "ok", "8a", "89", "am", "1c", "oy", "o8", "co", "ay",
			 "k1", "h1", "19", "hc", "c9", "ha", "ae", "79", "2o", "cc", "ko", "ho", "c8", "9h",
			 "9k", "c7", "2c", "ka", "kc", "1a", "an", "h9", "o,", "e8", "k9", "ap", "8o", "e,",
			 ",1", "7a", "81",
			 "o",  "9",  "1",  "a",  "8",  "c",  "h",  "e",  "k",  "y",  "4",  "m",  ",",
			 "2",  "7",  "s",  "K",  "C",  "p",  "g",  "n",  "H",  "j",  "A"};

	// Latin letter groups: repl[i] is the plaintext for seek[i]
	String[] repl = {"un",
			 "ri", "on", "f",  "es", "g",  "em", "de", "se", "co", "ne", "ur", "si", "ic", "ui", "me",
			 "ere","eb", "la", "ma", "le", "id", "bu", "nti","no", "cu", "eba","qui","ie", "al", "ul",
			 "ns",
			 "c",  "d",  "l",  "er", "is", "ti", "nt", "en", "re", "in", "um", "am", "us",
			 "te", "it", "v",  "tu", "ta", "ra", "di", "an", "ni", "li", "et", "ba", "ae", "mi",
			 "ent","st", "h",  "nd", "ci", "pe", "im", "ua", "io", "tur","il", "ve", "iu", "as",
			 "vi", "ita","ca",
			 "e",  "i",  "a",  "t",  "u",  "s",  "r",  "n",  "m",  "o",  "p",  "b", "q",
			 "qu", "at", "or", "ia", "ar", "ce", "ib", "ec", "ab", "ru", "ant"};

Such a mapping is used inside a Chromosome of the Genetic Algorithm. The Chromosome decodes Voynich into Latin by matching character groups in the Voynich word against each of the strings in the “seek” list in turn. If a match occurs, then the Voynich group is translated into the Latin group in the “repl” list at the same position. Thus “4ok1” in Voynich is translated into “un” in Latin.
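
The decoding step can be sketched in Python as below. This is one possible reading of the matching procedure: the word is scanned left to right, and at each position the first “seek” entry that matches (in list order, so the longest groups are tried first) is translated. Emitting “?” for a glyph that no “seek” entry covers is an assumption, as is the function name. The lists are a short excerpt of the tables above, kept in the same relative order.

```python
def decode(word, seek, repl):
    """Translate a Voynich word group by group, scanning left to right."""
    out = []
    pos = 0
    while pos < len(word):
        for group, plain in zip(seek, repl):
            if word.startswith(group, pos):
                out.append(plain)
                pos += len(group)
                break
        else:
            out.append("?")  # no seek entry matches this glyph (assumed handling)
            pos += 1
    return "".join(out)

# Short excerpt of the seek/repl tables, same relative order
seek = ["4ok1", "4ok", "ok1", "4o", "ok", "8a", "o", "9", "1"]
repl = ["un",   "es",  "g",   "er", "is", "ti", "e", "i", "a"]

print(decode("4ok1", seek, repl))  # un
print(decode("4ok9", seek, repl))  # esi
print(decode("8a9", seek, repl))   # tii
```

Because the longest groups come first in “seek”, “4ok1” is consumed as a single group rather than as “4ok” followed by “1”.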

Once the Voynich word has been translated into Latin, the Latin word is looked up in a Latin dictionary. If the word is found, then the “cost” (or “quality”) of the Chromosome is increased … if the word is not found, then the cost is decreased. After all words in the Voynich text have been converted to Latin, and the aggregate cost of the Chromosome evaluated, it can be judged whether the mapping “seek” to “repl” is a good one or not.

Generating the Chromosome Population

We generate a large number of Chromosomes, each of which has a different, randomised, “seek” to “repl” mapping. We do this by simply shuffling the order of the “repl” strings in each Chromosome.

Thus, one Chromosome may map “4ok1” to “s” and another may map it to “qui”.

This population of Chromosomes is then evaluated: each Chromosome converts the Voynich words to Latin, and each then gets a cost. The higher the cost, the better. The highest possible cost would be a Chromosome that had a seek-repl mapping that produced a valid Latin word for each Voynich word.
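
A sketch of generating and evaluating such a population, in Python. The cost steps of +1 for a dictionary hit and -1 for a miss are assumed for illustration, as are the greedy left-to-right decoding and the toy data (three glyph groups and a two-word “dictionary”, not real results).

```python
import random

def decode(word, seek, repl):
    """Greedy left-to-right translation (an assumed reading of the matching step)."""
    out, pos = [], 0
    while pos < len(word):
        for group, plain in zip(seek, repl):
            if word.startswith(group, pos):
                out.append(plain)
                pos += len(group)
                break
        else:
            out.append("?")  # glyph not covered by "seek"
            pos += 1
    return "".join(out)

def make_population(repl, size, rng):
    """Each Chromosome holds its own shuffled copy of "repl"; "seek" stays fixed."""
    population = []
    for _ in range(size):
        chromosome = list(repl)
        rng.shuffle(chromosome)
        population.append(chromosome)
    return population

def cost(chromosome, seek, words, dictionary):
    """+1 per decoded word found in the dictionary, -1 otherwise (assumed steps)."""
    return sum(1 if decode(w, seek, chromosome) in dictionary else -1 for w in words)

# Toy data for illustration only
seek = ["ok", "o", "9"]
repl = ["is", "e", "i"]
words = ["ok9", "o9"]
dictionary = {"isi", "ei"}

rng = random.Random(1)
population = make_population(repl, size=10, rng=rng)
ranked = sorted(population, key=lambda c: cost(c, seek, words, dictionary), reverse=True)

print(cost(repl, seek, words, dictionary))  # 2, the maximum for two words
```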

Training the Chromosomes

The Chromosomes are ordered in decreasing cost, and then the best of them (i.e. at the top of the list) are “mated” together to produce offspring Chromosomes. The mating process essentially involves taking sequences of the “repl” strings from both parents and combining them to form a new “repl” list.

Some of the offspring Chromosomes are then “mutated”. This involves replacing one of the “repl” strings with some randomly selected letters from the Latin character set.

The process repeats (ordering the Chromosomes, mating the best ones, mutating the offspring) until a predefined cost value is reached, or the population of Chromosomes refuses to improve itself.
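
The training loop can be sketched in Python along these lines. Several details are assumptions made for the sketch: one-point crossover as the way of combining the parents' “repl” sequences, keeping the parents alongside their offspring, the parameter values, and the toy cost function (which just counts positions that agree with a fixed goal list). The check for a population that “refuses to improve” is left out to keep the example short.

```python
import random

def mate(parent_a, parent_b, rng):
    """One-point crossover: a leading run from one parent, the rest from the other."""
    cut = rng.randrange(1, len(parent_a))
    return parent_a[:cut] + parent_b[cut:]

def mutate(chromosome, alphabet, rng, max_len=3):
    """Replace one randomly chosen "repl" string with random letters."""
    mutant = list(chromosome)
    slot = rng.randrange(len(mutant))
    length = rng.randint(1, max_len)
    mutant[slot] = "".join(rng.choice(alphabet) for _ in range(length))
    return mutant

def train(population, cost, alphabet, rng, epochs=50, n_parents=4,
          mutation_rate=0.3, target=None):
    """Order by cost, mate the best, mutate some offspring, and repeat."""
    history = []
    for _ in range(epochs):
        population.sort(key=cost, reverse=True)
        history.append(cost(population[0]))
        if target is not None and history[-1] >= target:
            break
        parents = population[:n_parents]
        offspring = []
        while len(offspring) < len(population) - n_parents:
            a, b = rng.sample(parents, 2)
            child = mate(a, b, rng)
            if rng.random() < mutation_rate:
                child = mutate(child, alphabet, rng)
            offspring.append(child)
        # Parents are kept, so the best cost never gets worse (an assumption)
        population[:] = parents + offspring
    population.sort(key=cost, reverse=True)
    return population[0], history

# Toy problem: recover a fixed list of "repl" strings
rng = random.Random(0)
goal = ["e", "i", "a", "t"]

def toy_cost(chromosome):
    return sum(x == y for x, y in zip(chromosome, goal))

population = [[rng.choice("eiat") for _ in goal] for _ in range(20)]
best, history = train(population, toy_cost, "eiat", rng, target=len(goal))
print(best, history)
```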

In the end, the best, trained Chromosome will contain the optimal arrangement of “seek” to “repl” mappings for conversion of Voynich to Latin.

The same procedure can be used for Voynich to English, German, French or any other language, provided that a dictionary and substantial texts are available to process.

First Results – Voynich to Latin

This is a limited attack on the first five “sentences” of f1r, using 200 chromosomes and a Latin dictionary of around 15,000 words. The best chromosome scores 9.4 after 500 training epochs (cf. a score of 20 for a one-to-one translation of Latin into Latin).

Here are the deciphered sentences:

1) Voynich: fa19s 9hae ay Akam 2oe !oy9 ²scs 9 hoy 2oe89 soy9 Hay oy9 hacy 1kam 2ay Ais Kay Kay 8aN s9aIy 2ch9 oy 9ham +o8 Koay9 Kcs 8ayam s9 8om okcc9 okcoy yoeok9 ?Aay 8am oham oy ohaN saz9 1cay Kam Jay Fam 98ayai29

Latin: ?ereieas vias is asasita meas ?ereis ?astuas is quinti mensis asereis vis ereis sttunti viasita viis as?as alis alis qui? asisere? nti quere ere viis ?ita alamisis altuas ereis asis quiita quantis querenti ntiviquis ?asis qui amita ere am? asere?is viis alis ?is ?is isereere?viis