Archive for the ‘Latin’ Category

How was the Voynich Manuscript text written?

August 23, 2012

I’ve spent many happy hours poring over the text, and am convinced that it is not as “simple” as it appears (i.e. the “words” are not words at all). Here are some conjectures:

  1. The lines look as if they were written left to right, i.e. as if the glyphs were set down in order from left to right, but they were not.
  2. The scribe started with the drawing and then wrote glyphs at various positions on the page.
  3. The method used for choosing each glyph and deciding its position involved a mechanical apparatus, perhaps a set of co-rotating cipher wheels used to convert each character of the Latin plaintext into a VMs glyph and a page position.
  4. The apparatus is set to a new starting position for each folio/page (so e.g. the Betony labels on the three folios the plant appears on are different).
  5. The density of ink is a clue to the order in which the glyphs were written (nib/quill freshly dipped and full of ink, or almost dry).
  6. At some point the scribe finishes writing the needed glyphs, and then fills out the spaces with pseudo-random words.
  7. There is no punctuation because what is seen are not words. What is seen makes no grammatical sense because the glyphs are not ordered and positioned linearly across the page.
  8. Perhaps the secret to unwinding the cipher is in the labels. The labels on one page are constrained to have been produced by the same initial position of the cipher apparatus, and they must come from the plaintext label.
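Conjectures 3 and 4 can be made concrete with a toy model. The sketch below is purely illustrative: the wheel contents, the placeholder glyph names (g00, g01, ...), the number of page zones and the rule for choosing a zone are all hypothetical, not a reconstruction of any real apparatus. It shows how two co-rotating wheels, reset to a new start position for each folio, would give the same plaintext label different glyphs on different folios while remaining reversible for anyone who knows the start position.

```python
# Toy model of conjectures 3 and 4: co-rotating cipher wheels.
# HYPOTHETICAL: wheel contents, glyph names and the zone rule are placeholders.

PLAIN_WHEEL = "abcdefghilmnopqrstuvxyz"  # assumed 23-letter Latin wheel
GLYPH_WHEEL = [f"g{i:02d}" for i in range(len(PLAIN_WHEEL))]  # placeholder glyph names
N_ZONES = 8  # assumed number of page zones a glyph can be placed in


def encipher(plaintext, start):
    """Return (glyph, zone) pairs; `start` is the per-folio wheel setting."""
    out = []
    for step, ch in enumerate(plaintext):
        k = PLAIN_WHEEL.index(ch)
        offset = start + step  # both wheels advance one notch per letter
        glyph = GLYPH_WHEEL[(k + offset) % len(GLYPH_WHEEL)]
        zone = (k + 2 * offset) % N_ZONES  # hypothetical rule for page position
        out.append((glyph, zone))
    return out


def decipher(pairs, start):
    """Invert encipher() given the same start position (the zone is not needed)."""
    letters = []
    for step, (glyph, _zone) in enumerate(pairs):
        k = (GLYPH_WHEEL.index(glyph) - start - step) % len(PLAIN_WHEEL)
        letters.append(PLAIN_WHEEL[k])
    return "".join(letters)


if __name__ == "__main__":
    label = "betonica"
    for folio_start in (0, 5):
        print(folio_start, [g for g, _ in encipher(label, folio_start)])
    print(decipher(encipher(label, 5), 5))
```

Because the glyph wheel has 23 distinct entries, changing the start position shifts every glyph of the label, which is the behaviour conjecture 4 describes for the Betony labels; knowing the start position is enough to undo it.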

There are so many clues as to what is going on, yet putting them all together is hugely challenging.

For example, Jim Reeds suggested years ago the order in which the text on the sunflower page, f33v, had been written:

f33v

First came the text to the left of the left stalk, then the text between the stalks, and finally the text to the right of the last stalk. This is compelling, since the ink density differs between these sections and the lines do not line up well across the stalks. It becomes clearer if you saturate the image:

f33v Saturated

In the saturated image, what jumps out are the glyphs that are darker than the others. They can be seen more clearly in black and white:

f33v monochrome drop

where the “o”, “y”, “8” and “e” glyphs stick out like sore thumbs. Most of these are in the left section, some are in the middle, and fewer are on the right. Why are these glyphs bolder and more heavily inked? Were they the glyphs placed on the page first, carrying the real information, with the rest, unimportant and pseudo-random, added later to make the text look “normal”?
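The saturate-and-drop step can be approximated in code to put rough numbers on “most in the left section, fewer on the right”. This is a minimal sketch, assuming the scan has been loaded as a 2-D grid of (r, g, b) tuples, that a single luminance threshold stands in for both the saturation and the black/white drop, and that the stalks can be approximated by vertical column boundaries (the real stalks curve, so the boundary positions are hypothetical).

```python
def luminance(rgb):
    """Grey level of an (r, g, b) pixel using the ITU-R BT.601 luma weights."""
    r, g, b = rgb
    return 0.299 * r + 0.587 * g + 0.114 * b


def dark_mask(pixels, cutoff):
    """True where a pixel is darker than `cutoff` (the black/white "drop")."""
    return [[luminance(p) < cutoff for p in row] for row in pixels]


def dark_fraction_by_strip(mask, boundaries):
    """Fraction of dark pixels in each vertical strip between column boundaries.

    `boundaries` are column indices standing in for the stalk positions
    (an assumption: real stalks are not straight vertical lines).
    """
    edges = [0] + sorted(boundaries) + [len(mask[0])]
    fractions = []
    for left, right in zip(edges, edges[1:]):
        cells = [row[c] for row in mask for c in range(left, right)]
        fractions.append(sum(cells) / len(cells) if cells else 0.0)
    return fractions


if __name__ == "__main__":
    # Tiny synthetic 2x6 "scan" (not real f33v data): dark pixels thin out to the right.
    black, white = (20, 20, 20), (235, 230, 210)
    pixels = [
        [black, black, black, white, white, white],
        [black, black, white, white, white, white],
    ]
    mask = dark_mask(pixels, cutoff=100)
    print(dark_fraction_by_strip(mask, boundaries=[2, 4]))
```

On a real scan the cutoff would need tuning so that only the heavily inked glyphs survive, as in the monochrome image above.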

Categories: 8, ay, Characters, e, f33v, Features, Jim Reeds, Latin, o, oy, Theories, Writing, y

Frequency Distributions for Phonetic Codes

June 12, 2012

Knox took the time to plot the frequency distributions from an earlier post, in which I looked at the theory that the VMs words are phonetic codes. Here are his results:

Where the chart title does not say otherwise, comparisons are against the Herbal sections. The VMs is plotted in blue-black.

Comparison of phonetic code frequencies between VMs sections and various known texts.

With only 40 words to translate, there cannot be a meaningful series, but it would be interesting to see the actual words in position anyway. If this only shows the power of Genetic Algorithms to match something regardless of significance, why does the old Latin Herbal make the best matches to the Herbal and Astrological sections?
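Charts like these compare distributions by eye. One way to put a single number on such a comparison, without assuming that the VMs and the comparison text share any symbols, is to sort each distribution by rank and take the L1 distance between the two profiles. This is a hypothetical sketch: neither the metric nor the function names come from Knox's plots, and the codes in the usage example are placeholders.

```python
from collections import Counter


def rank_frequency(tokens):
    """Relative frequencies of the distinct tokens, most frequent first."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return sorted((c / total for c in counts.values()), reverse=True)


def rank_distance(a, b):
    """L1 distance between two rank-frequency profiles (0 = same shape, at most 2)."""
    n = max(len(a), len(b))
    a = a + [0.0] * (n - len(a))  # pad the shorter profile with zero frequencies
    b = b + [0.0] * (n - len(b))
    return sum(abs(x - y) for x, y in zip(a, b))


if __name__ == "__main__":
    vms_codes = ["A1", "A1", "B2", "C3"]   # placeholder phonetic codes
    text_codes = ["X9", "X9", "Y8", "Y8"]  # placeholder phonetic codes
    print(rank_distance(rank_frequency(vms_codes), rank_frequency(text_codes)))
```

Comparing by rank rather than by symbol matches the question being asked here: whether the *shape* of the VMs distribution resembles that of a known text.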

f75r cures, pregnancy, life and death – Latin Plainchant

May 15, 2012

Here is a result obtained using a Genetic Algorithm to match the text on f75r to Latin. The training corpus I used was a large file of Latin plainchant (the idea being that repeated “words” in the VMs show similarities to chant).

First, here is the folio with the translated words overlain in red:

Folio f75r decrypted as Latin plainchant

The genetic algorithm searched for a mapping in which each glyph corresponds to a pair of Latin letters.
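A search of this kind can be sketched as a small genetic algorithm. The version below is a hypothetical illustration, not the code used for this result: each individual maps every glyph to one letter pair taken from the training corpus, fitness is the fraction of VMs words that decode to a word in the corpus vocabulary, and new individuals are made by uniform crossover followed by a single-glyph mutation. Population size, generation count and the toy data are assumptions.

```python
import random


def corpus_bigrams_and_vocab(corpus):
    """Letter pairs occurring inside corpus words, plus the set of corpus words."""
    words = corpus.lower().split()
    bigrams = sorted({w[i:i + 2] for w in words for i in range(len(w) - 1)})
    return bigrams, set(words)


def decode(vms_word, mapping):
    """Concatenate the letter pair assigned to each glyph."""
    return "".join(mapping[g] for g in vms_word)


def fitness(mapping, vms_words, vocab):
    """Fraction of VMs words that decode to a word in the vocabulary."""
    return sum(decode(w, mapping) in vocab for w in vms_words) / len(vms_words)


def evolve(vms_words, corpus, pop_size=40, generations=200, seed=0):
    rng = random.Random(seed)
    bigrams, vocab = corpus_bigrams_and_vocab(corpus)
    glyphs = sorted({g for w in vms_words for g in w})

    def score(mapping):
        return fitness(mapping, vms_words, vocab)

    # Random initial population: each glyph gets a random corpus letter pair.
    population = [{g: rng.choice(bigrams) for g in glyphs} for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=score, reverse=True)
        parents = population[: pop_size // 2]  # keep the fitter half unchanged
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.choice(parents), rng.choice(parents)
            # Uniform crossover: each glyph's pair comes from one parent or the other.
            child = {g: (a[g] if rng.random() < 0.5 else b[g]) for g in glyphs}
            # Point mutation: reassign one glyph to a random letter pair.
            child[rng.choice(glyphs)] = rng.choice(bigrams)
            children.append(child)
        population = parents + children
    best = max(population, key=score)
    return best, score(best)


if __name__ == "__main__":
    corpus = "recita lugete vena dans veta debent lustrata lite"
    vms_words = ["ab", "ac", "db"]  # placeholder glyph strings, not real VMs words
    best, best_score = evolve(vms_words, corpus)
    print(best_score, best)
```

Because every glyph becomes exactly two letters, this toy can only produce even-length words; a fuller search would need to relax that, for example by allowing single letters as well as pairs.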

Most of the decrypted words are valid Latin and occur in the plaintext I used to train the GA. Some are Latin but do not appear in that plaintext. The remaining, invalid words could be caused by errors in the pair matching.

Or the whole thing could well be nonsense! This is likely – I asked Joel Stevens to translate some of the Latin, and here is what he said:

On first inspection, it seems to be random nonsense. For example:
recita lugete vena dans veta ia debent lustrata lite

Would mean: Recite! Mourn! Blood-vessel giving. Forbid! Oh, they owe things that were purified by the lawsuit.

I’m not really sure how to make sense of it. I don’t see anything that stands out as an obvious sentence. Maybe some words are filler and need to be dropped… or maybe there is a hidden order that needs to be found (assuming these are the correct words).

Here is the Latin:

piraextita recita' lugete' idpirata lucrte vena'
dans' veta' ia' debent' lustrata' vanagete vamirata lite'
lugens' esnt levata' nuta' gens' veanta rochum' nogete le 
dato' uascie excita' curi gent na veta' le veta' luedicta 
veexti vata' arta' te' chum no' no' amicta' luedet luga' 
mori' edente' noga date' reri' lugens' feta' luedicta luga' 
morata' luaena uechum vana' lugete' ad' nt vana' luga'
pate vata' lugertta audita' lugens' vita' curata' resona'
lupina' feta' lumina' lugeum veon lugete' no' vana' lant 
aule lucrte ista' veta' lugens' vita' na ruri' mena' strata' 
luedicta lugens' nota' luedicta na lugent' reti' vena' date' vageas 
dedita' lugent' nota' veta' te' no' iret' veta' na no' vena' luga' 
morata' edicta' lumina' lumina' lumina' lumina' lugens' novena' 
lace' si' educta' lugens' na novena' lugens' vita' lugens' ruti' sita' 
lans' lugens' luuacinota lacium vana' pant' le vena' reista luga' 
aurata' lugete' verata nt ha' urgens' ad' revena lugens' vana'
morati' vemota curatita dant' bunt' id' ncnt vena' lugeas 
pant' vesata lunt vena' lunt ruchum este' late' nota' 
luaerata lunt veta' luga' veta' ulta' lugens' ti 
iu' veta' lute vata' stri sschum veta' lumicium 
clrechum vata' poma' le sebete no' te' ut' 
sati' veta' lugens' vana' le acta' gete vana' tute' 
mori' aeri' luedet lugent' deri lunt vana' lugeas 
lugens' no' edet' luuatita lans' vato te' no' 
lans' orta' luedicta id' na luedicta strata' tuas' 
dans' no' veno dans' no' luia' date' muta' 
gnsiurte duno' luca' alti' vena' resina' date' ruti' rurata na sunt' 
errata' morata' luedet pant' no' dace' veto' lunt nt amicta' luedicta strata' 
dans' fisi' na uachns no' recina lunt dans' novata' luedicta vana' lucrum' 
lu' dans' irquta lumita lugens' vaga' lugent' dans' vana' resino vena' reti' nodo' 
aurata' lumita vita' lugete' vena' vata' lumini' lumina' dant' na 
locuta' dant' vena' lugens' date' pena' vena' lugete' vena' usta' 
luedet nt vena' lunt vana' lugens' ferata rorata' dalias 
pr' nomina' resi date' ruta reti' ruti' no' gens' nomina' 
lugens' ambita' lugens' date' date' vena' lugens' nona' 
pant' mirata luedicta luncta' reti' ruti' date' date' na 
lumina' na ambita' lumina' luedicta lumiista nt 
lumirata luedicta lumina' lumina' lumirata usta' 
serata' luedicta lumirata noedicta ruri' ncusta 
date' vena' lugens' vena' dant' edicta' te' vena' 
paedicta luedicta rucina luga' na edicta' ma 
aurata' lumina' luedicta luiget luigicta date' 
serata' vagete nona' lugens' fiti ruti' nona' 
sprata lans' rerata lumina' rurata renona ruti' ruut nochns ha' date' na 
derata orta' lumita rerata lalint ruta rurata lumena rurata rechns 
lumita lumina' veno lumina' dans' vena' eg plnt vana' noti' 

An abjad result from the Genetic Algorithm

June 17, 2011

Here is one of the GA results. This is an attempt at deciphering the text on f9v (the Viola plant). The VMs words on that folio are:

"fo1oy","ogoyo89","og9","2oy","4og19j1o","4ofoe","2oe",
"81oy","1oe","1oy","89","ok9","89",
"9hc9","1oy","oh9","occcs",
"9kc9","k19","okoe","ok9","koe89",
"g1oy","9j1cc9","4okoy","9j19","kc","ay","1k9",
"o8oe","1o9","h2co89","1o89","ok19","9ha",
"4o","1oe","1oe","okae","8oy",
"4oh1o","yoh98","8ae9",
"19","kay","19k9","8ay9","9koe89",
"ok9","h1oe","1oe","19","h9k9",
"91oy","12ok9","1oy"

These are not all the words on the folio: I have removed those that contain unusual or problematic glyphs (e.g. the “m”).

The GA comes up with the following VMs-to-Latin character mapping (the “4” glyph maps to no letter, so it is simply dropped):

Voynich: o    9    1    k    y    8    e    c    h    a    4    g    2    j    f    s

Plain:   r    s    d    p    m    b    t    n    f    l    -    q    c    x    v    g

And here are the deciphered words. On each line you have the VMs word, the Latin consonants, then the possible Latin or English words in the dictionary that match the abjad.

fo1oy = vrdrm =  virdiarium viridarium viridiarium
ogoyo89 = rqrmrbs =  ?
og9 = rqs =  requies arquus
2oy = crm =  carum coram curam corium cremo cyrum curiam acerum acorum acroama acrum aecoreum careum cereum cerium ceroma coarmi coarmo crami cremii cremi croma cromae curium cream
4og19j1o = rqdsxdr =  ?
4ofoe = rvrt =  reverti reverto iuraverat
2oe = crt =  certa certe certo creta curatio curto creat coarto create cartae caret acerata careota careotae cariota cariotae carota carotae carta carti caryitae caryota caryotae ceratia ceratiae ceratii cerati cerata ceroti certi coertio coryti cratio creatio creati creata cretae cretea cretio crita critae croto curate curata curiatia curiata curito curta ocreata court courte curt cart
81oy = bdrm =  obdormio
1oe = drt =  audierat deerat oderit odorati aderat auderet durat diruat daret deaurata adaeratio adoratio deartuo deorata deratio diratio dirutio duratio duritia duritiae duritiei odoratio odorata darte dirty
1oy = drm =  audieram darem dierum dormio oderam odorem iudeorum deorum darium adoreum adorium dearmo diarium dirimo diremi dirum dormeo drama dromo durum edormio edurum odorum dram
89 = bs =  abs bis bos iubeas iubes basio uobis abusi ibis abies absi abusio baes bas basi bes bios bus ibos obesa obsuo obsui base abuse bees boys busy bays
ok9 = rps =  repsi rapis aeripes euripus reapse reposui rupes rupis ropes
89 = bs =  abs bis bos iubeas iubes basio uobis abusi ibis abies absi abusio baes bas basi bes bios bus ibos obesa obsuo obsui base abuse bees boys busy bays
9hc9 = sfns =  sifonis
1oy = drm =  audieram darem dierum dormio oderam odorem iudeorum deorum darium adoreum adorium dearmo diarium dirimo diremi dirum dormeo drama dromo durum edormio edurum odorum dram
oh9 = rfs =  rufus refuse
occcs = rnnng =  running runninge
9kc9 = spns =  sapiens spinas sponsi sponsa supinis spensa spinis yspanos sapineus sapinus saponis siponis sopionis spensae spineus spinosa spinus spons sponsae sponsio sponso supinus
k19 = pds =  pedes pedis apodis pods
okoe = rprt =  reperiet reparat eriperet reperta reperit reparatio reperti reporto reporte report
ok9 = rps =  repsi rapis aeripes euripus reapse reposui rupes rupis ropes
koe89 = prtbs =  partibus portabis parietibus
g1oy = qdrm =  quadrum quadrima
9j1cc9 = sxdnns =  ?
4okoy = rprm =  reprimi reprimo
9j19 = sxds =  ?
kc = pn =  opinio opino paene pene poena pono punio puny upon pane pena pone apiana apianae apina apinae paean paeon paeonia penae peni pinea pini poenae poenio open pen paine pain payne pyany pin pine pan peny peony
ay = lm =  aliam alium lama lamia lima limo olim almi oleum alme alma aulam alum aulaeum elimo ilum lamae lamiae lema limae limi ulmea ulmi elm
1k9 = dps =  dapes daps adeps adipis adipeus adips adipsi adposui dapis deposui depso depsui diapasi
o8oe = rbrt =  arboreti robert
1o9 = drs =  aderas derisui dorso durus odores duros dirus edurus odorus edrus durius diris duris derisio dares adoris adoreus adoriosa adrasi adrisi adrisio adrosi adursi derasi derisi derisa derosi derosa diarius dirasi dorsi odoris deirous dooers doores dryes dries drousie dyers
h2co89 = fcnrbs =  facinoribus
1o89 = drbs =  derbiosa
ok19 = rpds =  rapidus
9ha = sfl =  useful safly safely
4o = r =  aer ara aro aurae aure aurea auro ero eruo ira irae ire iuro or ore ori oro re rea rei rui ruo aera aerio ora iura aura era r uero uaria area auri iure iuri ere aeer aerae aerea aerei aeria aero arae areae areo arui aria ariae ari aureae aurei eiero eare erae erui eri euro euroa euri iro orae reae uro uri rai are oure yeare your our youre ear rue year yeer air rye ar
1oe = drt =  audierat deerat oderit odorati aderat auderet durat diruat daret deaurata adaeratio adoratio deartuo deorata deratio diratio dirutio duratio duritia duritiae duritiei odoratio odorata darte dirty
1oe = drt =  audierat deerat oderit odorati aderat auderet durat diruat daret deaurata adaeratio adoratio deartuo deorata deratio diratio dirutio duratio duritia duritiae duritiei odoratio odorata darte dirty
okae = rplt =  repleuit repleta repleat
8oy = brm =  baioarium barim baioariam brume bireme boarium boreum borium bromi bruma brumae eboreum ebrium ebureum obarmo broom
4oh1o = rfdr =  ?
yoh98 = mrfsb =  ?
8ae9 = blts =  oblitus balatus balteus ablatis ablutus abolitus ablatus belatus beluatus bliteus boletus bolites oblatus blites
19 = ds =  ades audias audis das deos deus dies duos odiosa dis adso iudeis ydus adesa adsuo adsui aedes aedis aedus dasea daseae dasia dasiae des desuo desui diis dius dos duis edius edus idos odiose udus dayes daies odyous dose ads daisie
kay = plm =  palam palma pluma pulmo puleium epulum pilum palmo apuliam palium apalum palmae palmea palmi palum paulum pileum plumae plumea plum polium polum palm
19k9 = dsps =  dasypus deseps disposui despise
8ay9 = blms =  bulimos bulimosa bulimus balms
9koe89 = sprtbs =  spiritibus
ok9 = rps =  repsi rapis aeripes euripus reapse reposui rupes rupis ropes
h1oe = fdrt =  foederata foederati
1oe = drt =  audierat deerat oderit odorati aderat auderet durat diruat daret deaurata adaeratio adoratio deartuo deorata deratio diratio dirutio duratio duritia duritiae duritiei odoratio odorata darte dirty
19 = ds =  ades audias audis das deos deus dies duos odiosa dis adso iudeis ydus adesa adsuo adsui aedes aedis aedus dasea daseae dasia dasiae des desuo desui diis dius dos duis edius edus idos odiose udus dayes daies odyous dose ads daisie
h9k9 = fsps =  ?
91oy = sdrm =  siderum sidereum sudarium
12ok9 = dcrps =  decerpsi decarpsi
1oy = drm =  audieram darem dierum dormio oderam odorem iudeorum deorum darium adoreum adorium dearmo diarium dirimo diremi dirum dormeo drama dromo durum edormio edurum odorum dram
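The decoding above is mechanical once the mapping is fixed. A minimal sketch, assuming (as the candidate lists suggest, e.g. “dirty” for drt and “your” for r) that a, e, i, o, u and y all count as vowels when a dictionary word is reduced to its consonant skeleton. The word list is a tiny hypothetical stand-in for the Latin and English dictionaries actually used, and words containing unmapped glyphs are skipped, as was done for the “m” words.

```python
# VMs -> Latin consonant mapping reported by the GA above; "4" maps to nothing.
GLYPH_TO_LATIN = {
    "o": "r", "9": "s", "1": "d", "k": "p", "y": "m", "8": "b",
    "e": "t", "c": "n", "h": "f", "a": "l", "4": "", "g": "q",
    "2": "c", "j": "x", "f": "v", "s": "g",
}
VOWELS = set("aeiouy")  # assumption inferred from the candidate lists above


def to_abjad(vms_word):
    """Latin consonant skeleton for a VMs word, or None if it has an unmapped glyph."""
    if any(g not in GLYPH_TO_LATIN for g in vms_word):
        return None
    return "".join(GLYPH_TO_LATIN[g] for g in vms_word)


def skeleton(word):
    """Drop the vowels from a dictionary word, leaving its abjad."""
    return "".join(ch for ch in word.lower() if ch not in VOWELS)


def candidates(abjad, dictionary):
    """Dictionary words whose consonant skeleton equals the abjad."""
    return [w for w in dictionary if skeleton(w) == abjad]


if __name__ == "__main__":
    dictionary = ["durat", "dirty", "dormio", "creta", "partibus"]  # hypothetical word list
    for vms_word in ["1oe", "koe89", "1oy", "m9"]:
        abjad = to_abjad(vms_word)
        print(vms_word, abjad, candidates(abjad, dictionary) if abjad is not None else "skipped")
```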

Vowels and 17

June 17, 2011

Latin Alphabet – the number 17 again

From http://latindiscussion.com/latin/alphabet/history/evolution

“Before the Renaissance, letters J and U had been merely glyph variants of I and V.
W was first used by scribes writing Old English during the 7th century AD.”

CLASSICAL LATIN ALPHABET:

(22) A B C D E F G H I L M N O P Q R S T V X Y Z

REMOVE VOWELS:

(5) A E I O V

REMAINING:

(17) B C D F G H L M N P Q R S T X Y Z

f57v

Glyphs on f57v, 3rd ring, Voyn_101 encoding – 4 sets of 17 characters each:

o.e.8.y.?.?.h.p.f.?.k.y.?.?.9.?.?.
o.e.8.y.?.?.h.p.f.?.k.y.?.?.9.?.?.
o.e.8.y.?.?.h.p.f.?.k.y.?.?.9.?.?.
o.e.8.y.?.?.h.p.f.?.k.y.?.?.9.?.?.

compare:
b.c.d.f.g.h.l.m.n.p.q.r.s.t.x.y.z.

(Glyphs marked with “?” are very rare, and occur only on f57v and one other folio.)
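The count, and the four-fold repetition of the 17-glyph set in the f57v ring, can be checked in a few lines. This is only a sketch: it treats every “?” as the same token, which the transcription does not guarantee, and the final loop simply pairs glyphs and consonants by position for side-by-side comparison; it does not claim a mapping.

```python
CLASSICAL_LATIN = "ABCDEFGHILMNOPQRSTVXYZ"  # the 22 letters listed above
VOWELS = "AEIOV"                            # the 5 letters treated as vowels above

consonants = [ch for ch in CLASSICAL_LATIN if ch not in VOWELS]

# One 17-glyph set from the 3rd ring of f57v (Voyn_101); "?" marks a rare glyph.
RING_SET = "o e 8 y ? ? h p f ? k y ? ? 9 ? ?".split()
ring = RING_SET * 4  # the ring as transcribed above: four identical sets


def smallest_period(seq):
    """Shortest p such that seq is seq[:p] repeated a whole number of times."""
    for p in range(1, len(seq) + 1):
        if len(seq) % p == 0 and seq == seq[:p] * (len(seq) // p):
            return p
    return len(seq)  # only reached for an empty sequence


if __name__ == "__main__":
    print(len(CLASSICAL_LATIN), len(VOWELS), len(consonants))  # 22 5 17
    print(smallest_period(ring))  # 17
    for glyph, letter in zip(RING_SET, consonants):
        print(glyph, letter)
```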

Latest Vowel-less Table for Genetic Algorithm results

 

Does the language of Dante fit the VMs?

October 4, 2010

I have spent many pleasurable hours checking various exotic cipher and code ideas with a GA, and none of them fits even remotely, except one. My faith in the GA technique rests on how quickly it gives an idea of how well a code or cipher theory fits the VMs text.

The one cipher idea and plaintext language that does notably better than all the others is an nGram mapping with the language of Dante as the plaintext. This is a form of early Italian, and it produces significantly better results than every other language tried with nGrams, including Latin, German, English, Spanish, Dutch and Chinese.

I’ll post some results from this nGram/Dante GA later.

There is a significant obstacle to applying computational techniques to the VMs: the machine transcriptions of the VMs text. They differ substantially, to the extent that statistics obtained with, say, EVA do not match well with those obtained with, say, Voyn_101. A particular problem is glyph bloat: in my opinion, GC’s Voyn_101 transcription contains many more glyphs than the scribes were actually using. Small differences in the way “9” is written, for example, are classified as different glyphs, which plays havoc with statistical analysis.

I therefore use a procedure that filters the Voyn_101 text and remaps, for example, those multiple “9” glyphs to a single glyph. This gives a smaller, more realistic search space. But it still doesn’t address the question of which strokes make up a single glyph, which is often open to interpretation. Any nGram mapping procedure therefore has to allow for at least 1- to 3-grams in the Voynich text to be reasonably sure of covering the glyph correspondences properly.
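Both steps can be sketched briefly. The variant table is a placeholder: the “#” and “@” symbols are stand-ins, not taken from the actual Voyn_101 variant list. The segmentation function enumerates every way of reading a word as a sequence of 1- to 3-glyph units, which is the set of candidate units an nGram mapping would then have to score.

```python
def canonicalize(text, variants):
    """Replace variant glyphs by a canonical glyph (variants: glyph -> canonical)."""
    return "".join(variants.get(ch, ch) for ch in text)


def segmentations(word, max_n=3):
    """Every split of `word` into consecutive units of 1..max_n glyphs."""
    if not word:
        return [[]]
    splits = []
    for n in range(1, min(max_n, len(word)) + 1):
        for rest in segmentations(word[n:], max_n):
            splits.append([word[:n]] + rest)
    return splits


if __name__ == "__main__":
    HYPOTHETICAL_VARIANTS = {"#": "9", "@": "9"}  # stand-ins for variant "9" codes
    word = canonicalize("1o#", HYPOTHETICAL_VARIANTS)
    print(word)  # 1o9
    for units in segmentations(word):
        print(units)
```

The number of segmentations grows quickly with word length (1, 2, 4, 7, 13, ... for words of 1 to 5 glyphs), which is one reason the remapping step that shrinks the glyph inventory matters.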

Here is an extract of the Dante Alighieri text that matches the VMs decently using nGrams (the opening of Inferno, Canto I, in what appears to be a Friulian rendering):


Cjant Prin

A metàt strada dal nustri lambicà
mi soj cjatàt ta un bosc cussì scur
chel troj just i no podevi pì cjatà.

A contàlu di nòuf a è propit dur:
stu post salvàdi al sgrifàva par dut
che al pensàighi al fa di nòuf timour!

Che colp amàr! Murì a lera puc pi brut!
Ma par tratà dal ben chiai cjatàt
i parlarài dal altri chiai jodùt.

I no saj propit coma chi soj entràt:
cun chel gran sùn che in chel moment i vèvi,
la strada justa i vèvi bandonàt.

Necuàrt che in riva in su i zèvi
propit la ca finiva la valàda
se tremaròla tal còu chi sintèvi

in alt jodùt iai la so spalàda
vistìda belzà dai rajs dal pianèta
cal mena i àltris dres pa la so strada.

(This is modified from a reply to Knox, who commented on an earlier post.)

More on Consonants/Vowels in the Recipes

September 26, 2010

Here are some results from the Recipes folios for the verbose homophonic cipher idea proposed earlier.

Using the Recipes Folios, we find
1085 lines of VMs words
3150 different words on those lines

Looking for word sequences within a line that fit the pattern XYYZ (note that X=Y as well as X=Z is allowed):

50 XYYZ sequences
102 different words

(This is somewhat disappointing: 102 words is only about 3% of the 3150-word vocabulary.)

Two of the 50 sequences are of the form YYYZ or XYYY (“2oy 2coe 2coe 2coe” and “2coe 2coe 2coe 4oh1c89”), so I remove “2coe” from further consideration, since it is ambiguous: it could be a vowel, a consonant, or something else such as a numeral. This means removing it wherever it appears in any of the 50 sequences.
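The search for XYYZ sequences and the removal of ambiguous words can be sketched as follows, assuming each line of the Recipes transcription is a whitespace-separated list of words. Windows overlap, which is how a run such as “2oy 2coe 2coe 2coe 4oh1c89” yields both of the sequences quoted above. Dropping every sequence that contains an ambiguous word is one reading of “removing it wherever it appears”; the function names and the second sample line (built from words quoted in this post, not an actual transcription line) are hypothetical.

```python
def find_xyyz(lines):
    """(X, Y, Z) for every run of four words on a line whose middle two are equal."""
    hits = []
    for line in lines:
        words = line.split()
        for i in range(len(words) - 3):
            x, y1, y2, z = words[i:i + 4]
            if y1 == y2:  # X = Y and X = Z are allowed, as in the text
                hits.append((x, y1, z))
    return hits


def drop_ambiguous(hits):
    """Find Y words seen in a YYYZ or XYYY run; drop every sequence containing them."""
    ambiguous = {y for x, y, z in hits if x == y or z == y}
    kept = [h for h in hits if not ambiguous.intersection(h)]
    return kept, ambiguous


if __name__ == "__main__":
    recipe_lines = [
        "2oy 2coe 2coe 2coe 4oh1c89",
        "1sk9 4ohii89 4ohii89 e1c89",  # illustrative line, not from the transcription
    ]
    hits = find_xyyz(recipe_lines)
    print(hits)
    print(drop_ambiguous(hits))
```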

Next I collect a list of all the different Y words (there are 31), and for each, a list of the X and Z words it appears with.

The hypothesis is that for each sequence, X and Z must code for vowels and Y for a consonant, or vice versa. (This holds for Latin, for example.)

At this point, the words can be categorised into two sets, Category 1 and Category 2. A Cat1 word cannot appear in the Cat2 list, and vice versa. The categorisation starts by taking the initial Y word, assigning it to Cat1, and assigning its X/Z words to Cat2:

Y=4ohii89 (Cat1)    X/Z=4oh29 1sk9 e1c89 4ohco 82coe 1c9 4ohcc89 4okc9 (Cat2)

The next Y word is then examined:

Y=4ohii9 X/Z=4okc9 okc8(

Since 4okc9 has already been categorised as Cat2 in the first step, it follows that 4ohii9 is Cat1, and okc8( is Cat2.

This procedure is repeated over all the Y and X/Z words for several iterations, until each word has either been allocated to Cat1 or Cat2 or cannot be allocated to either (16 words). One word, 4ohcc9, cannot be assigned unambiguously.
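The iterative categorisation is, in effect, a two-colouring of a graph in which each Y word is linked to its X and Z words. A minimal sketch, assuming the (X, Y, Z) triples have already been filtered as described: it spreads categories outward from the seed word, leaves words it never reaches unallocated, and flags the two words on any link where a contradiction is detected. Flagging both ends of a contradicting link is a heuristic; it cannot by itself say which word is the culprit. The X/Z pairings in the usage triples are assumed, since only the lists of X/Z words are given above.

```python
from collections import defaultdict, deque


def categorise(triples, seed):
    """Two-colour the Y <-> X/Z graph starting from `seed` (assigned to Cat1)."""
    neighbours = defaultdict(set)
    for x, y, z in triples:
        for other in (x, z):
            if other != y:  # X = Y or Z = Y runs were filtered out earlier
                neighbours[y].add(other)
                neighbours[other].add(y)

    category = {seed: 1}
    conflicts = set()
    queue = deque([seed])
    while queue:
        word = queue.popleft()
        wanted = 3 - category[word]  # linked words get the opposite category (1 <-> 2)
        for other in neighbours[word]:
            if other not in category:
                category[other] = wanted
                queue.append(other)
            elif category[other] != wanted:
                conflicts.update((word, other))  # contradiction found on this link

    unallocated = set(neighbours) - set(category)
    return category, conflicts, unallocated


if __name__ == "__main__":
    triples = [("4oh29", "4ohii89", "4okc9"), ("4okc9", "4ohii9", "okc8(")]
    cats, conflicts, unallocated = categorise(triples, seed="4ohii89")
    print(cats)  # 4ohii9 ends up in Cat1 and okc8( in Cat2, as in the step above
    print(conflicts, unallocated)
```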

The contents of the two categories are:

Category 1 (28 words)
4ohii89 oe 1oe 4ohcc9 4ohii9 2cae 4ohc9 kii9 1cae okc8aiN 4okc8( 1ii9 1c8ae 1ae yae 4oh89 8ae 1c8 4okay ohaiN e 1c89kcahaiN ohciiN kcc89 hco8( hae okaiiN okay

Category 2 (18 words)
4oh29 1sk9 e1c89 4ohco 82coe 1c9 4ohcc89 4okc9 okae ohii89 4oh1c9 1c89 ohcokcc9 4o okc8( 1oy ay 4okaiN