Archive for the ‘English’ Category

Frequency Distributions for Phonetic Codes

June 12, 2012

Knox took the time to plot the frequency distributions from an earlier post, in which I looked at the theory that the VMs words are phonetic codes. Here are his results:

Unless the plot title says otherwise, the comparison is with the Herbal sections. The VMs data are shown in blue-black.

Comparison of phonetic code frequencies between VMs sections and various known texts.

With only 40 words to translate, there cannot be a meaningful series, but it would be interesting to see the actual words in position anyway. If this only shows the power of Genetic Algorithms to match something regardless of significance, why does the old Latin Herbal give the best matches to the Herbal and Astrological sections?

An abjad result from the Genetic Algorithm

June 17, 2011

Here is one of the GA results. This is an attempt at deciphering the text on f9v (the Viola plant). The VMs words on that folio are:

"fo1oy","ogoyo89","og9","2oy","4og19j1o","4ofoe","2oe",
"81oy","1oe","1oy","89","ok9","89",
"9hc9","1oy","oh9","occcs",
"9kc9","k19","okoe","ok9","koe89",
"g1oy","9j1cc9","4okoy","9j19","kc","ay","1k9",
"o8oe","1o9","h2co89","1o89","ok19","9ha",
"4o","1oe","1oe","okae","8oy",
"4oh1o","yoh98","8ae9",
"19","kay","19k9","8ay9","9koe89",
"ok9","h1oe","1oe","19","h9k9",
"91oy","12ok9","1oy"

These are not all the words on the folio: I have removed those that contain unusual or problematic glyphs (e.g. the “m”).

The GA comes up with the following VMs->Latin character mapping:

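Voynich: o  9  1  k  y  8  e  c  h  a  4  g  2  j  f  s
Plain:   r  s  d  p  m  b  t  n  f  l  -  q  c  x  v  g

(The “4” has no Latin equivalent: it is treated as a null and dropped, as in 4ofoe = rvrt below.)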
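To make the mapping concrete, here is a minimal Python sketch (not the code the GA actually uses) that applies the table to a VMs word. GLYPH_TO_LATIN and to_abjad are hypothetical names; the only assumption beyond the table itself is that “4” is a null, which is consistent with the decipherments listed below.

```python
# VMs glyph -> Latin consonant, read off the mapping table above.
# "4" has no counterpart in the table, so it is treated as a null (dropped).
GLYPH_TO_LATIN = {
    "o": "r", "9": "s", "1": "d", "k": "p", "y": "m", "8": "b",
    "e": "t", "c": "n", "h": "f", "a": "l", "4": "", "g": "q",
    "2": "c", "j": "x", "f": "v", "s": "g",
}

def to_abjad(vms_word):
    """Transliterate a VMs word into its Latin consonant skeleton.

    Raises KeyError for glyphs outside the table (e.g. the "m"), which
    is why words containing such glyphs were removed from the list.
    """
    return "".join(GLYPH_TO_LATIN[glyph] for glyph in vms_word)

for word in ["fo1oy", "4ofoe", "occcs"]:
    print(word, "=", to_abjad(word))
# prints: fo1oy = vrdrm, then 4ofoe = rvrt, then occcs = rnnng
```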

And here are the deciphered words. On each line you have the VMs word, the Latin consonants, then the possible Latin or English words in the dictionary that match the abjad.

fo1oy = vrdrm =  virdiarium viridarium viridiarium
ogoyo89 = rqrmrbs =  ?
og9 = rqs =  requies arquus
2oy = crm =  carum coram curam corium cremo cyrum curiam acerum acorum acroama acrum aecoreum careum cereum cerium ceroma coarmi coarmo crami cremii cremi croma cromae curium cream
4og19j1o = rqdsxdr =  ?
4ofoe = rvrt =  reverti reverto iuraverat
2oe = crt =  certa certe certo creta curatio curto creat coarto create cartae caret acerata careota careotae cariota cariotae carota carotae carta carti caryitae caryota caryotae ceratia ceratiae ceratii cerati cerata ceroti certi coertio coryti cratio creatio creati creata cretae cretea cretio crita critae croto curate curata curiatia curiata curito curta ocreata court courte curt cart
81oy = bdrm =  obdormio
1oe = drt =  audierat deerat oderit odorati aderat auderet durat diruat daret deaurata adaeratio adoratio deartuo deorata deratio diratio dirutio duratio duritia duritiae duritiei odoratio odorata darte dirty
1oy = drm =  audieram darem dierum dormio oderam odorem iudeorum deorum darium adoreum adorium dearmo diarium dirimo diremi dirum dormeo drama dromo durum edormio edurum odorum dram
89 = bs =  abs bis bos iubeas iubes basio uobis abusi ibis abies absi abusio baes bas basi bes bios bus ibos obesa obsuo obsui base abuse bees boys busy bays
ok9 = rps =  repsi rapis aeripes euripus reapse reposui rupes rupis ropes
89 = bs =  abs bis bos iubeas iubes basio uobis abusi ibis abies absi abusio baes bas basi bes bios bus ibos obesa obsuo obsui base abuse bees boys busy bays
9hc9 = sfns =  sifonis
1oy = drm =  audieram darem dierum dormio oderam odorem iudeorum deorum darium adoreum adorium dearmo diarium dirimo diremi dirum dormeo drama dromo durum edormio edurum odorum dram
oh9 = rfs =  rufus refuse
occcs = rnnng =  running runninge
9kc9 = spns =  sapiens spinas sponsi sponsa supinis spensa spinis yspanos sapineus sapinus saponis siponis sopionis spensae spineus spinosa spinus spons sponsae sponsio sponso supinus
k19 = pds =  pedes pedis apodis pods
okoe = rprt =  reperiet reparat eriperet reperta reperit reparatio reperti reporto reporte report
ok9 = rps =  repsi rapis aeripes euripus reapse reposui rupes rupis ropes
koe89 = prtbs =  partibus portabis parietibus
g1oy = qdrm =  quadrum quadrima
9j1cc9 = sxdnns =  ?
4okoy = rprm =  reprimi reprimo
9j19 = sxds =  ?
kc = pn =  opinio opino paene pene poena pono punio puny upon pane pena pone apiana apianae apina apinae paean paeon paeonia penae peni pinea pini poenae poenio open pen paine pain payne pyany pin pine pan peny peony
ay = lm =  aliam alium lama lamia lima limo olim almi oleum alme alma aulam alum aulaeum elimo ilum lamae lamiae lema limae limi ulmea ulmi elm
1k9 = dps =  dapes daps adeps adipis adipeus adips adipsi adposui dapis deposui depso depsui diapasi
o8oe = rbrt =  arboreti robert
1o9 = drs =  aderas derisui dorso durus odores duros dirus edurus odorus edrus durius diris duris derisio dares adoris adoreus adoriosa adrasi adrisi adrisio adrosi adursi derasi derisi derisa derosi derosa diarius dirasi dorsi odoris deirous dooers doores dryes dries drousie dyers
h2co89 = fcnrbs =  facinoribus
1o89 = drbs =  derbiosa
ok19 = rpds =  rapidus
9ha = sfl =  useful safly safely
4o = r =  aer ara aro aurae aure aurea auro ero eruo ira irae ire iuro or ore ori oro re rea rei rui ruo aera aerio ora iura aura era r uero uaria area auri iure iuri ere aeer aerae aerea aerei aeria aero arae areae areo arui aria ariae ari aureae aurei eiero eare erae erui eri euro euroa euri iro orae reae uro uri rai are oure yeare your our youre ear rue year yeer air rye ar
1oe = drt =  audierat deerat oderit odorati aderat auderet durat diruat daret deaurata adaeratio adoratio deartuo deorata deratio diratio dirutio duratio duritia duritiae duritiei odoratio odorata darte dirty
1oe = drt =  audierat deerat oderit odorati aderat auderet durat diruat daret deaurata adaeratio adoratio deartuo deorata deratio diratio dirutio duratio duritia duritiae duritiei odoratio odorata darte dirty
okae = rplt =  repleuit repleta repleat
8oy = brm =  baioarium barim baioariam brume bireme boarium boreum borium bromi bruma brumae eboreum ebrium ebureum obarmo broom
4oh1o = rfdr =  ?
yoh98 = mrfsb =  ?
8ae9 = blts =  oblitus balatus balteus ablatis ablutus abolitus ablatus belatus beluatus bliteus boletus bolites oblatus blites
19 = ds =  ades audias audis das deos deus dies duos odiosa dis adso iudeis ydus adesa adsuo adsui aedes aedis aedus dasea daseae dasia dasiae des desuo desui diis dius dos duis edius edus idos odiose udus dayes daies odyous dose ads daisie
kay = plm =  palam palma pluma pulmo puleium epulum pilum palmo apuliam palium apalum palmae palmea palmi palum paulum pileum plumae plumea plum polium polum palm
19k9 = dsps =  dasypus deseps disposui despise
8ay9 = blms =  bulimos bulimosa bulimus balms
9koe89 = sprtbs =  spiritibus
ok9 = rps =  repsi rapis aeripes euripus reapse reposui rupes rupis ropes
h1oe = fdrt =  foederata foederati
1oe = drt =  audierat deerat oderit odorati aderat auderet durat diruat daret deaurata adaeratio adoratio deartuo deorata deratio diratio dirutio duratio duritia duritiae duritiei odoratio odorata darte dirty
19 = ds =  ades audias audis das deos deus dies duos odiosa dis adso iudeis ydus adesa adsuo adsui aedes aedis aedus dasea daseae dasia dasiae des desuo desui diis dius dos duis edius edus idos odiose udus dayes daies odyous dose ads daisie
h9k9 = fsps =  ?
91oy = sdrm =  siderum sidereum sudarium
12ok9 = dcrps =  decerpsi decarpsi
1oy = drm =  audieram darem dierum dormio oderam odorem iudeorum deorum darium adoreum adorium dearmo diarium dirimo diremi dirum dormeo drama dromo durum edormio edurum odorum dram
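The post doesn’t say exactly how dictionary words were matched to an abjad, but the lists above are consistent with dropping the vowels a, e, i, o, u and y from each dictionary word and comparing what is left, keeping repeated consonants (cyrum and boys match crm and bs, and running matches rnnng). A minimal sketch under that assumption, with hypothetical function names and a toy word list:

```python
# Vowel set inferred from the matches above (y counts as a vowel): an assumption.
VOWELS = set("aeiouy")

def skeleton(word):
    """Drop the vowels from a word but keep repeated consonants."""
    return "".join(ch for ch in word.lower() if ch not in VOWELS)

def abjad_matches(abjad, dictionary):
    """Return the dictionary words whose consonant skeleton equals the abjad."""
    return [w for w in dictionary if skeleton(w) == abjad]

sample = ["requies", "arquus", "rex", "running", "useful"]
print(abjad_matches("rqs", sample))    # ['requies', 'arquus']
print(abjad_matches("rnnng", sample))  # ['running']
```

For a full Latin or English dictionary it would be faster to build a dict from skeleton to words once and then look each abjad up directly.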

Does the language of Dante fit the VMs?

October 4, 2010

I have spent many pleasurable hours checking various exotic cipher and code ideas with a GA, and none of them fits even remotely, except one. My faith in the GA technique rests on the fact that it very quickly gives an idea of how well a code or cipher theory fits the VMs text.

The one cipher idea and plaintext language that does notably better than all the others is an nGram mapping with the language of Dante as the plaintext. This is a form of early Italian, and it produces significantly better results than every other language I tried with nGrams, including Latin, German, English, Spanish, Dutch and Chinese.

I’ll post some results from this nGram/Dante GA later.

There is a significant obstacle to applying computational techniques to the VMs: the machine transcriptions of the text. They differ substantially, to the extent that statistics obtained with, say, EVA do not match well with statistics obtained with, say, Voyn_101. A particular problem is glyph bloat. In my opinion, GC’s Voyn_101 transcription contains many more glyphs than the scribes were actually using: small differences in the way a “9” is written, for example, are classified as different glyphs. This plays havoc with statistical analysis, so I use a procedure that filters the Voyn_101 text and remaps, for example, those multiple “9” glyphs onto a single glyph. This gives a smaller, more realistic search space.

It still doesn’t address the question of which strokes make up a single glyph, which is often open to interpretation. Any nGram mapping procedure therefore has to allow for at least 1- to 3-grams in the Voynich text to be reasonably sure of covering the glyph correspondences properly.
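A minimal sketch of both ideas, with hypothetical names: fold_variants collapses variant glyphs onto one canonical glyph, and segmentations lists every way a word can be read as a sequence of 1- to 3-glyph units, which is the ambiguity an nGram mapping has to cover. The variant table is a placeholder, since the post doesn’t list which Voyn_101 codes were merged.

```python
# Placeholder variant table: the real Voyn_101 codes for the "9" variants
# are not given in the post, so "!" and "%" are purely hypothetical.
VARIANTS = {"!": "9", "%": "9"}

def fold_variants(word, variants=VARIANTS):
    """Map each variant glyph onto its canonical form; other glyphs pass through."""
    return "".join(variants.get(ch, ch) for ch in word)

def segmentations(word, max_n=3):
    """Every way of splitting a word into consecutive units of 1..max_n glyphs."""
    if not word:
        return [[]]
    result = []
    for n in range(1, min(max_n, len(word)) + 1):
        for rest in segmentations(word[n:], max_n):
            result.append([word[:n]] + rest)
    return result

print(fold_variants("o!9%"))  # o999
print(segmentations("ccs"))   # [['c', 'c', 's'], ['c', 'cs'], ['cc', 's'], ['ccs']]
```

The number of readings grows quickly with word length, which is one reason to let the GA choose among n-Gram correspondences rather than fixing a single glyph segmentation in advance.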

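Here is an extract of the Dante Alighieri text that matches the VMs reasonably well under the nGram mapping. It is the opening of Canto I of the Inferno, apparently in a Friulian rendering rather than Dante’s original Tuscan: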


Cjant Prin

A metàt strada dal nustri lambicà
mi soj cjatàt ta un bosc cussì scur
chel troj just i no podevi pì cjatà.

A contàlu di nòuf a è propit dur:
stu post salvàdi al sgrifàva par dut
che al pensàighi al fa di nòuf timour!

Che colp amàr! Murì a lera puc pi brut!
Ma par tratà dal ben chiai cjatàt
i parlarài dal altri chiai jodùt.

I no saj propit coma chi soj entràt:
cun chel gran sùn che in chel moment i vèvi,
la strada justa i vèvi bandonàt.

Necuàrt che in riva in su i zèvi
propit la ca finiva la valàda
se tremaròla tal còu chi sintèvi

in alt jodùt iai la so spalàda
vistìda belzà dai rajs dal pianèta
cal mena i àltris dres pa la so strada.

(This is modified from a reply to Knox, who commented on an earlier post.)

Landini’s Challenge

February 26, 2010

An excerpt from Landini’s challenge text (text he generated by an undisclosed method and intended to replicate the features of the VMs text):

qopchdy chckhy daiin ¬ ½shxam chor otechar okcharain ryly sheodykeyl
sheodykeyl daiin shd okaiin qokain qokal yteoldy otedy qokydy opchedy
otal oldar chor lkeedol eer ol dair chedy daiin ockhdar cpheol chedy
xar qokaiin y chedy kshdy ololdy aiin char y okeey oldar qokaiin lsho
daiin olsheam qoeey chedy dchos pshedaiin shedy d qol key sheol or
cpheeedol qokedy qokaiin daiin cthosy chedy ar aiir chedy teeol aiin
cheey y cheam oky qokaiin daldaiin loiii¯ ar shtchy chedy aldaiin
ydchedy daiin shd okaiin qokain daiin qotcho chedy daiin lchy olorol
otedy qockhor shol daiin paichy chedy ar shdair chedal chedy kchdaldy
chckhy otakar qokedy s qooko chor daiin otcholchy chedy daiin koroiin
qokain qokedy kosholdy ol kchedy kshdy qokaiin ar shaikhy olaldy seees
ar oteodar chedy oteeol shedy daiin key dain daiin keeokechy chedy
lchey ail lchedy sches ol dsheeo otol odaiin qokain daiin sheeod chshy
chedy qoekedy tair sain qocheey aiin cheey chaiin ols shedy sheolol
daiin lcheol chedy daiin pchoraiin oshaiin chedy lchey lor sal aiin
cheey y dsheom shedy todydy cheor saiin shdaldy daiin ofchtar daiin

Here are some thought-provoking results from analysing this text, as suggested by Knox, alongside the VM text, with comparisons against English, Latin, German, French and Spanish. They use a new form of the Genetic Algorithm, described below.

Summary

It looks to me as though Landini either generated his text from a transcription of the VM itself, or his algorithm for generating it is a good emulation of the encoding process used in the VM. In other words, the Landini “language” is a better candidate plaintext language for the VM than any of the European languages tested.

Results

Here is a table which shows the GA’s efficiency at converting/translating between Voynich, Landini, and the other languages.

(In the table, the best possible score is 1.0 – see below for an explanation)

Asking the GA to translate English to English, Latin to Latin and so on results in a high efficiency score, as expected. Note that the Landini-to-Landini efficiency is 0.97, almost perfect.

The GA performs moderately well at converting between the other languages and the Landini text. But what is most striking (to me) is the good efficiency for converting Voynich to Landini (0.74) and Landini to Voynich (0.89).

Some Notes on the table

To look at this, I revised my GA code to make it more flexible, and I jettisoned the use of separate dictionaries. The GA can now convert or translate between any two language text samples. Here is how it now functions:

1) Two text files are read in: the “source” text, and the “target” text. This could be, for example, a source file containing Landini’s text, and a target file containing Spanish text, if we want to convert from Landini to Spanish.

2) The text in each file is processed separately, producing two word lists and two sets of n-Gram frequency tables.

3) The chromosomes are generated with random mappings between the source n-Grams and the target n-Grams.

4) The GA evolves the chromosomes by trying to maximise their cost. The difference now is that when a target word is generated from the source text using the mappings, it is looked up in the target word list created in 2) above, rather than in a separate dictionary.

5) After training, the best chromosome can have a maximum cost value of 1.0, which would correspond to a perfect conversion between the source text and the target text (i.e. every word produced from the source text is found in the target text word list).

6) So we can feed the GA with two identical texts, and after training the score of the best chromosome should be 1.0, and indeed it approaches that (it doesn’t quite get there because only the top 100 n-Grams are translated, and so some characters in the source text cannot be translated).

7) The word and n-Gram frequency lists are made from the entirety of each text, but (for this exploratory study) the training uses only the first 50 “words” of the source text and only the top 100 n-Grams for mapping. So if the 50 Voynich words chosen contain several rare characters, the mapping fails for those characters, because they do not appear in the n-Gram list, and the score is lower as a result.

8) In all cases the “X->X” score in the table (i.e. the diagonal) represents the best score possible for that language and acts as a normalisation for the other numbers in the table. I should really revise the table and divide the off-diagonal scores by the diagonal normalisations.

9) An improvement would be to lengthen the n-Gram list to, say, 200 entries and use more source (Voynich) words for training. The main downside is execution speed.

10) These runs used n-Grams of up to length 3; it would be better to go to at least 4.

11) I think Landini gets good scores because the character set he uses is very small. Knox comments: “A factor must be that the Landini Challenge has built-in frequency matches to any transcription of the VMs. Also, there is no meaningful correspondence in the letter sequence of one word to another in Landini. The difficulty fits what I said the VMs may be.”
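The steps above can be sketched as a small, mutation-only GA in Python. This is a minimal sketch under several assumptions, not the code behind the table: the post doesn’t describe the selection, crossover or mutation operators, nor how a source word is split into n-Grams, so truncation selection, single-gene mutation and greedy longest-match segmentation are stand-ins, and all function names are hypothetical. The score is the fraction of the first 50 source words whose conversion appears in the target word list, so it tops out at 1.0 as in step 5, and words containing characters outside the top 100 n-Grams always fail, as in step 7.

```python
import random
from collections import Counter

def word_list(text):
    """Split a text sample into lowercase words (step 2)."""
    return text.lower().split()

def top_ngrams(words, max_n=3, top_k=100):
    """The top_k most frequent 1- to max_n-grams occurring inside the words."""
    counts = Counter()
    for w in words:
        for n in range(1, max_n + 1):
            for i in range(len(w) - n + 1):
                counts[w[i:i + n]] += 1
    return [g for g, _ in counts.most_common(top_k)]

def convert(word, mapping, max_n=3):
    """Greedy longest-match conversion; None if part of the word is unmapped."""
    out, i = [], 0
    while i < len(word):
        for n in range(min(max_n, len(word) - i), 0, -1):
            piece = word[i:i + n]
            if piece in mapping:
                out.append(mapping[piece])
                i += n
                break
        else:
            return None  # e.g. a rare character outside the top n-Grams
    return "".join(out)

def score(mapping, train_words, target_vocab):
    """Fraction of training words converted into a word of the target text."""
    hits = sum(convert(w, mapping) in target_vocab for w in train_words)
    return hits / len(train_words)

def evolve(source_text, target_text, generations=200, pop_size=40,
           n_train=50, seed=0):
    rng = random.Random(seed)
    src_words, tgt_words = word_list(source_text), word_list(target_text)
    src_grams, tgt_grams = top_ngrams(src_words), top_ngrams(tgt_words)
    target_vocab = set(tgt_words)  # replaces a separate dictionary (step 4)
    train = src_words[:n_train]

    def fitness(chrom):
        return score(chrom, train, target_vocab)

    def mutate(chrom):
        child = dict(chrom)
        child[rng.choice(src_grams)] = rng.choice(tgt_grams)
        return child

    # Each chromosome maps every retained source n-Gram to a random target n-Gram (step 3).
    population = [{g: rng.choice(tgt_grams) for g in src_grams}
                  for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        survivors = population[:max(1, pop_size // 2)]  # the best half is kept unchanged
        children = [mutate(rng.choice(survivors))
                    for _ in range(pop_size - len(survivors))]
        population = survivors + children
    best = max(population, key=fitness)
    return best, fitness(best)
```

Calling evolve with the same text as source and target gives the sanity check of step 6; because the surviving half is carried over unchanged, the best score never decreases from one generation to the next.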