Archive for the ‘Spanish’ Category

Alfonso X’s Lapidario: Stones, Stars and Colours

October 13, 2016 6 comments

I’ve been down a bit of rabbit hole over the last couple of days which others may have already been down. I was looking once more at the Zodiac folios, in particular Taurus. The Taurus Light and Dark folios are both marked “may” in, as often remarked, a later hand. There are 15 figures in each Taurus folio, for a total of 30. However, as we well know, May has 31 days, so the figures probably don’t represent days. I thus went in search of 30-way splits of Zodiac signs ….

Alfonso X’s Lapidario

Looking at this old Spanish illustrated manuscript:

“Tratados de Alfonso X sobre astrología y sobre las propiedades de las piedras”

which is a treatise on astrology and the importance of stones/gems etc., we can see a circular Taurus diagram with 30 divisions.


Each of these divisions is associated with a stone, of a noted colour, and one or a few stars in a constellation. There is a lengthy description of each division, its stone, its stars, the various ailments the stone cures, when the stone should be used, et cetera. There is a Spanish transcription of the text here, which I found very useful (combined with Google Translate):

Alfonso X Lapidario

Since a plausible language match to the month spellings as written in the Zodiac folios is Occitan which at one point covered part of Spain (please correct me on this, as I’m not sure), there seems to be a compelling regional match here, but I can’t quite figure it out.

From what I’ve read, Alfonso X assembled a team of scholars from all world regions, who worked on documents on a variety of topics. This website says of the Lapidario: “The Lapidario is a thirteenth century Castilian translation sponsored by King Alfonso X el Sabio, the Learned. The translation was done from an Arabic text which in turn is said to have been translated by the mysterious Abolays from an ancient text in the “Chaldean language”

Matching Stone Colours

Anyway, my first approach was to try to match the colours of the headgear or tunics of the clothed figures in the Taurus Light folio to the colours of the first and second fifteen stones mentioned in the Lapidario. It’s a little tricky, because although the stones are numbered, we don’t know which is figure 1 in the Taurus Light folio, and whether the inner ring precedes the outer. Even so, the patterns of colours in the stones sequence might reveal a match. I drew a blank.

Matching Stones with Voynich Star Labels

My second approach was to try to match the names of the stones with the labels on the figures, to see if there was some correlation between the label length, or its initial glyph, with the stones’ names. Very tricky.

Some of the stones that appear in the Taurus set of 30 also appear in other Zodiac signs in the Lapidario. For example, the ninth stone in Taurus is “esmeri(l)” (Latin), and esmeril is also the third stone of Libra, and the second stone of Aquarius.

This leads to the obvious question: is there a Figure in the both the Voynich Taurus and Libra roundels (Aquarius is missing) that shares the same label? If so, might that label be “esmeril”? And, are there other stones that appear in more than one sign which might be matched to duplicate stones in the Lapidario?

(As an aside, regarding the stones and colours, I was struck by the third stone of Taurus, called “camorica”, which is scarlet in colour and associated with the Pleiades.)

Another promising avenue is to compare the shapes and orientations of the stars in the constellations as they appear in the VM with how they appear in the Lapidario. Since the Pleiades are mentioned in the Lapidario, are they illustrated there, and does its illustration of the cluster match the apparent drawing of it in the VM (which differs in detail from its actual appearance in the night sky)? I need to investigate further, but my suspicion is that others have already been down this path 🙂

Chaldean Stones

I extracted the Chaldean stone names for each Zodiac sign, from the transcription I linked to above. The stones for Aries are shown below, as an example. (The whole set is available if anyone wants it.)

The first 12 sections in the Lapidario list the 30 stones for each zodiac sign, but a few signs appear to be truncated: Leo has only one stone, Pisces only has two stones, and Aquarius only 28.

There follow more sections, again one for each sign, but these each have only three stones. I’m not clear what they represent. I posted about all this at Voynich Ninja, and MarcoP was able to explain. Others also chimed in with some useful comments. The discussion is here.

Anyway, following those sections are several more that cover the stones of Saturn (4 stones), Jupiter (4 stones), Mars (4 stones), Venus (24 stones), Sun (9 stones) and Mercury (17 stones).

Here is an extract of the list I extracted for all the Zodiac stones: this is for Aries.

1 magnitad
2 zurudica
3 gagatiz
4 miliztiz
5 centiz
6 movedor
7 goliztiz
8 telliminuz
9 milititaz
10 huye de la leche
11 alj?far
12 anetatiz
13 beruth
14 piedra de cinc
15 tira el oro
16 chupa la sangre
17 parece en la mar cuando sube Marte
18 tira el vidrio
19 annora
20 yzf
21 cuminon
22 astarnuz
23 belyniz
24 gaciuz
25 azufaratiz
26 abietityz
27 lubi
28 ceraquiz
29 berlimaz
30 annoxatir

In total I count 301 stones in the Lapidario’s Zodiac section, of which 291 are unique to a sign. The remainder appear more than once as follows:

bezaar [(9, 'G\x83MINIS'), (11, 'G\x83MINIS')]
azarnech [(12, 'SAGITARIO'), (13, 'SAGITARIO')]
pez [(7, 'LIBRA'), (30, 'LIBRA')]
plomo [(18, 'VIRGO'), (13, 'CANCRO')]
calcant [(10, 'VIRGO'), (11, 'VIRGO')]
aliaza [(23, 'TAURO'), (29, 'TAURO')]
parece en la mar [(15, 'SAGITARIO'), (15, 'TAURO'), (17, 'G\x83MINIS'), (17, 'ACUARIO')]
de la serpiente [(12, 'LIBRA'), (7, 'G\x83MINIS')]

e.g. “bezaar” is the 9th and the 11th stone in Gemini, “de la sepiente” is the 12th stone in Libra and the 7th in Gemini.

Turning to the Voynich Zodiac, I count 298 unique star labels of which 269 are unique to a sign. The labels that appear more than once are:

otal dar ['71r', '70v2'] Aries (Light) , Pisces , 
otal ['72r2', '73r'] Gemini , Scorpio , 
okeey ary ['72r1', '72r2'] Taurus (Dark) , Gemini , 
okal ['73v', '72r2', '72r2'] Sagittarius , Gemini , Gemini , 
okeos ['73v', '73r', '73r'] Sagittarius , Scorpio , Scorpio , 
okeoly ['70v2', '72v1'] Pisces , Libra , 
otaly ['70v2', '72v3', '73r'] Pisces , Leo , Scorpio , 
okaram ['70v2', '72r2'] Pisces , Gemini , 
okoly ['70v1', '72v3'] Aries (Dark) , Leo , 
okalar ['72r3', '72r2'] Cancer , Gemini , 
okary ['72v3', '73r'] Leo , Scorpio , 
okam ['72r2', '72v3'] Gemini , Leo , 
okeody ['73v', '73v', '73r', '72v2'] Sagittarius , Sagittarius , Scorpio , Virgo , 
ykey ['73v', '73v'] Sagittarius , Sagittarius , 
okaly ['70v2', '72r2', '72r2', '72v3'] Pisces , Gemini , Gemini , Leo , 
okaldy ['72r2', '72v3'] Gemini , Leo , 
otaraldy ['72r1', '72r2'] Taurus (Dark) , Gemini , 
otoly ['72v3', '73r'] Leo , Scorpio , 
oky ['73v', '72v3', '73r'] Sagittarius , Leo , Scorpio , 
oteody ['73v', '73v'] Sagittarius , Sagittarius , 
okedy ['72v1', '73r'] Libra , Scorpio ,

e.g. “otal dar” appears as a label on both the Aries(Light) and Pisces zodiac chart.

If the Voynich Zodiac charts are indeed showing stones (and the figure/star labels are their names), then there should be good matches between the two lists above.

One potential match is:

azarnech [(12, 'SAGITARIO'), (13, 'SAGITARIO')] 
ykey ['73v', '73v'] Sagittarius , Sagittarius ,

However, the two labels “ykey” on f73v are not adjacent, which they should be if they are stones 12 and 13.


azarnech [(12, 'SAGITARIO'), (13, 'SAGITARIO')] 
oteody ['73v', '73v'] Sagittarius , Sagittarius ,

in this case, the two labels “oteody” on f73v are adjacent to one another, but the figures/stars they label are in the group of four at the top of the folio: it’s a stretch to think their locations are 12th and 13th.

To be continued ….


Does the language of Dante fit the VMs?

October 4, 2010 Leave a comment

Having spent many pleasurable hours checking various exotic cipher and code ideas, none of them remotely fits when using a GA, except one. My faith in the GA technique is that it very quickly gives an idea of how well a code/cipher theory fits the VMs text.

The one cipher idea and plaintext language that does notably better than all others is an nGram mapping with the language of Dante as the plaintext. This is a form of early Italian, and it produces results significantly better than all other languages tried with nGrams, including Latin, German, English, Spanish, Dutch, Chinese etc. .

I’ll post some results from this nGram/Dante GA later.

There is a significant obstacle with applying computational techniques to the VMs, and that is the machine transcriptions of the VMs text. Basically they differ substantially, to the extent that statistics obtained with, say, EVA do not match well with statistics obtained with, say, Voyn_101. A particular problem is glyph bloat … my opinion is that GC’s Voyn_101 transcription contains many more glyphs than the scribes were actually using. Little differences between the ways of writing “9″ for example, are classified as different glyphs. This plays havoc with statistical analysis. Thus I have a procedure that filters the Voyn_101 and remaps e.g. those multiple “9″ glyphs to the same glyph. This allows a smaller, more realistic, search space. But it still doesn’t address the question of what strokes make up a single glyph, which is often open to interpretation. Thus any nGram mapping procedure has to allow for at least 1-3 Grams in the Voynich to be reasonably sure of covering the glyph correspondences properly.

Here is an extract of the Dante Alighieri text that matches decently using nGrams to the VMs:

Cjant Prin

A metàt strada dal nustri lambicà
mi soj cjatàt ta un bosc cussì scur
chel troj just i no podevi pì cjatà.

A contàlu di nòuf a è propit dur:
stu post salvàdi al sgrifàva par dut
che al pensàighi al fa di nòuf timour!

Che colp amàr! Murì a lera puc pi brut!
Ma par tratà dal ben chiai cjatàt
i parlarài dal altri chiai jodùt.

I no saj propit coma chi soj entràt:
cun chel gran sùn che in chel moment i vèvi,
la strada justa i vèvi bandonàt.

Necuàrt che in riva in su i zèvi
propit la ca finiva la valàda
se tremaròla tal còu chi sintèvi

in alt jodùt iai la so spalàda
vistìda belzà dai rajs dal pianèta
cal mena i àltris dres pa la so strada.

(This is modified from a reply to Knox who commented on an earlier post.)

Landini’s Challenge

February 26, 2010 Leave a comment

An excerpt from Landini’s challenge text (text he generated using an undisclosed method, supposed to replicate the features of the VMs text):

qopchdy chckhy daiin ¬ ½shxam chor otechar okcharain ryly sheodykeyl
sheodykeyl daiin shd okaiin qokain qokal yteoldy otedy qokydy opchedy
otal oldar chor lkeedol eer ol dair chedy daiin ockhdar cpheol chedy
xar qokaiin y chedy kshdy ololdy aiin char y okeey oldar qokaiin lsho
daiin olsheam qoeey chedy dchos pshedaiin shedy d qol key sheol or
cpheeedol qokedy qokaiin daiin cthosy chedy ar aiir chedy teeol aiin
cheey y cheam oky qokaiin daldaiin loiii¯ ar shtchy chedy aldaiin
ydchedy daiin shd okaiin qokain daiin qotcho chedy daiin lchy olorol
otedy qockhor shol daiin paichy chedy ar shdair chedal chedy kchdaldy
chckhy otakar qokedy s qooko chor daiin otcholchy chedy daiin koroiin
qokain qokedy kosholdy ol kchedy kshdy qokaiin ar shaikhy olaldy seees
ar oteodar chedy oteeol shedy daiin key dain daiin keeokechy chedy
lchey ail lchedy sches ol dsheeo otol odaiin qokain daiin sheeod chshy
chedy qoekedy tair sain qocheey aiin cheey chaiin ols shedy sheolol
daiin lcheol chedy daiin pchoraiin oshaiin chedy lchey lor sal aiin
cheey y dsheom shedy todydy cheor saiin shdaldy daiin ofchtar daiin

Here are some thought-provoking results from analysing the text, as suggested by Knox, the VM text, and comparisons with English, Latin, German, French and Spanish. These use a new form of the Genetic Algorithm, described below.


It looks to me like that Landini either generated his text from a transcription of the VM itself, or his algorithm for generating that text is a good emulation of the encoding process used in the VM. In other words the Landini “language” is a good candidate as a plaintext language for the VM, as opposed to the European languages tested.


Here is a table which shows the GA’s efficiency at converting/translating between Voynich, Landini, and the other languages.

(In the table, the best possible score is 1.0 – see below for an explanation)

Asking the GA to translate English to English, or Latin to Latin, etc. results in a high efficiency score, as expected. Note that the Landini to Landini  efficiency is 0.97 – almost perfect.

The GA performs moderately at converting between the languages and the Landini text. But what is most striking (to me) is the good efficiency for converting Voynich to Landini (0.74) and Landini to Voynich (0.89)

Some Notes on the table

To look at this I revised my GA code so that it was more flexible, and I jettisoned the use of separate dictionaries. Here is how the GA now functions. It can convert/translate between any language text samples.

1) Two text files are read in: the “source” text, and the “target” text. This could be, for example, a source file containing Landini’s text, and a target file containing Spanish text, if we want to convert from Landini to Spanish.

2) The text in each file is processed separately, producing two word lists, and two sets of n-Gram frequency tables.

3) The chromosomes are generated with random mappings between the source n-Grams and the target n-Grams

4) The GA evolves the chromosomes by trying to maximise their cost. The difference now is that when a target word is generated from the source text using the mappings, it is looked up in the target word list created in 2) above, rather than in a separate dictionary.

5) After training, the best chromosome can have a maximum cost value of 1.0, which would correspond to a perfect conversion between the source text and the target text (i.e. every word produced from the source text is found in the target text dictionary)

6) So we can feed the GA with two identical texts, and after training the score of the best chromosome should be 1.0, and indeed it approaches that (it doesn’t quite get there because only the top 100 n-Grams are translated, and so some characters in the source text cannot be translated).

7) The word and n-Gram frequency lists are made from the entirety of each text, but (for this exploratory study) the training takes place on only the first 50 “words” in the source text, and uses only the first 100 n-Grams for mapping.  Thus if the 50 words of Voynich chosen contain several rare characters, then for those the mapping will fail because those rare characters do not appear in the n-Gram list, and this will result in a lower score.

8) In all cases the “X->X” score in the table (i.e. the diagonal)  represents the best score possible for that language, and is a normalisation for the other numbers in the table. I should really revise the table and divide out the off-diagonal scores by the diagonal normalisations.

9) An improvement would be to configure the n-Gram list to be, say, 200 long, and use more source (Voynich) words for the training. The downside of this is mainly execution speed.

10) These runs were with n-Grams up to 3: it would be better to go to 4 at least.

11) I think Landini gets good scores because the character set he uses is very small. Knox comments ” A factor must be that the Landini Challenge has built-in frequency matches to any transcription of the VMs. Also, there is no meaningful correspondence in the letter sequence of one word to another in Landini. The difficulty fits what I said the VMs may be.”

Voynich Herbs

February 26, 2010 1 comment

Edith Sherwood has a web site where she details compelling possible identifications for the plants depicted in the “herbal” pages of the VM.

Dana Scott’s page also has plausible identifications for the plants.

As has often been pointed out, if we look at the first Voynich “word” that appears on each page of the herbal part of the VM, we find that those words are unique, or appear elsewhere very rarely. It thus seems reasonable that the words may be the names of the plants depicted.

The GA was set up to find a set of n-Gram mappings that would convert a list of 111 Voynich first herbal words into Latin/English or Spanish. For this, dictionaries of Latin, English and Spanish herb/plant names were used.

The GA sought a mapping that would convert all the Voynich words for herbs/plants into as many valid plaintext (Spanish, English, Latin) words as possible. The best result was for a mixed English/Latin dictionary (see table): 31 of the 111 Voynich words were converted, about 30% success rate.

(One should never expect 100% success, due to missing names in the dictionary, transcription errors, missing n-Grams, incomplete n-Grams etc..)

The results are shown below in tabular form, together with Dana Scott’s and Edith Sherwood’s identification. The first column shows the folio in the VM, the second shows the first Voynich word on that folio. For the GA identification columns (3 and 4) the Voynich mapped word is shown, in quotation marks if not found in the associated dictionary, and in bold if found in the dictionary.

Note that, probably unsurprisingly, nowhere do the IDs from the GA in Spanish, English/Latin and Scott/Sherwood, agree! NOT YET, anyway 🙂

(What amuses me about about this mapping technique is that it tends to produce words that sound plausible in the target language. E.g. for f4r the Latin/English word “paptise” sounds like a valid word.)

Folio Voynich 1st Word Candidate GA ID, Spanish Candidate GA ID, Latin/Engish Dana Scott ID, English Dana Scott ID, Latin Sherwood ID, Latin Sherwood ID, English
f1r fa19s costa “greica”
f1v h1s9 rabo geum Deadly Nightshade Atropa belladonna Hyoscyamus niger Solanum nigrum Solanum dulcamara Atropa belladonna Deadly Nightshade
f2r h98an9 “jzba” “ariapha” Cornflower Centaurea cyanus Centaurea diffusa Diffuse Knapweed
f2v hoom “meic” “padi” Water Lily Nymphaea candida Nymphoides Nymphoides
f3r k2cos chinita (Impatiens) arnica Celosia argentea Feathery amaranth
f3v hoam menta (mint) paris Helleborus foetidus Dungwort
f4r ho8ae19 “mezirn” “paptise” Saxifraga cespitosa Alpine Saxifrage
f4v j1oom pastora (Poinsettia) “oigle” Campanula rapunculus Rampion
f5r h2o89 “piyn” “hicse” Arnica montana Wolfs Bane
f5v hA1coy malanga (Malanga) cirsium Tennis Racket Plant Agrimonia eupatoria Malva sylvestris Mallow
f6r foay “oote” “erk” Acanthus mollis Bear Breeches
f6v hoay9say1Chay “meotendoteisedh” “pakpikrtsst” Eryngium maritimum Sea Holly
f7r f1o8am “saynta” acris Trientalis europea Starflower
f7v joe29 “rden” anise Myrica gale Bog Myrtle
f8r g2oe “dno” “miv” Pisum sativum Green Pea
f8v Ko8 “anop” “amot” Symphytum officinale Comfrey
f9r k98eo “uardna” “cernur” Ricinus communis Casteroil
f9v fo1oy “oveh” “erut” Heartsease, Wild Pansy Viola tricolor Violaceae Viola
f10r g1oK9 “pohon” “apryse” Cichorium pumilum Chicory Endive
f10v gam tora (Tora Tree) gale Linnaea borealis Twinflower
f11r k2oe chino (Chinese Hat Plant) “arv” Rosmarinus officinalis Rosemary
f11v goe81o89 “albaveaca” “maadud” Curcuma longa Turmeric
f13r koy3oy “lenga” “mdoium” Banana Banana
f13v hoaiy “memh” “paft” Lonicera periclymenum Honeysuckles Woodbines
f14r g1o8am “poynta” “apcris” Scorzonera Black Salsify Vipers Grass
f14v g891om “uomic” “gesdi” Stachys monnieri Wood Betony Heal-all Sel-heal Woundwort
f15r k2oy “chiga” “arium” Sonchus oleraceus Sow Thistles
f15v gayoy “t8h” “gabt” Paris quadrifolia Herb Paris
f16r go1co89 “alblanyn” “marscse” Cannabis Cannabis
f16v g1yAm “potoora” “aptule” Chrysanthemum Chrysanthemum
f17r f2o89 “hayn” “ulcse” Catananche caerulea Cupids Dart
f17v g1o8oe “poyno” “apcv” Dioscorea Yams
f18r g8yaz89 “ullngn” “gmeagse” Aster alpinus Aster
f18v koe8 la (?) mad Telfairia Fluted pumpkin
f19r g1oy “poga” apium Polemonium coeruleum Greek Valerian
f19v go1am “albbora” mantle Draba nivalis Nailwort
f20r h81o89 “caveaca” woud Astragalus hypoglottis Milk vetch
f20v faIsay “crrote” greek Cynara cardunculus Cardoon
f21r g1oy “poga” apium Anagallis arvensis Pimpernel
f21v koe829 “laol” “madpe” Dictamnus albus Burning bush False Dittany White Dittany Gas Plant
f22r goe “albv” “maus” Verbena officinalis Common Vervain Holy Herb
f22v g9samoy “..dah” “hnshot” Tulip Tulip
f23r g9818op “.fhilo” “hsthlo” Pulsatilla vulgaris Pasque flower
f23v go8azoe “albzucv” “mapacus” Borago officinalis Borage Star Flower
f24r goyoy9 “alb..” “maby” Cucumis sativus Cucumber
f24v k1o8ay coyote (wild) rock Ficus religiosa Sacred Fig Bo Tree
f25r f1oe89 “sanoaca” “avd” Wild Thyme
f25v goCam “albcuora” “malile” Isatis tinctoria Woad
f26r g%coh9 “spnij” lunaria Prunella vulgaris Self heal
f26v g1c8ay pochote (Pochote) “apgok” Lens culinaris Lentil
f27r hsoy manga (Mango) “veium” Spinacia oleracea Spinach
f27v fo1ou oveja (?) eruca French Marigold Tagetes patula Dianthus superbus Dianthus
f28r g1o8ay “poyote” “apck” Aristolochia Smearwort Birthwort Pipevine
f28v h2oe pino (Pine) “hiv” Dahlia Dahlia imperialis Rhododendrons Rhododendrons
f29r gosam “alb.ora” “mansle” Lactuva sativa longifolia Romaine Cos Lettuce
f29v hoom “meic” “padi” Nigella sativa Roman coriander
f30r oh1cs9 “elanbo” “inrsum” Prunella vulgaris Healall
f30v Ks1an rubia (Madder) montana Cuscuta europaea Dodder
f31r hcc8c9 lichi (Lychee) “rgoio” Erigeron acris Fleabane
f31v go8az “albzon” “mapnn” Fernleaf yarrow Achillea filipendulina Valerian Valerian
f32r f1am santa (?) “aris” Veronica triphyllos Speedwell
f32v h1co8am “ranizora” “genple” Campanula rotundifolia Harebell
f33r k28ay “chizh” “arpt” Silene vulgaris Bladder Campion
f33v kayay “qllh” “opmet” Masterwort Astrantia major Tanacetum parthenium Feverfew
f34r g1cocj19 “ponianos” “apnbie” Anemone hortensis
f34v hs189 “mansn” “vewse” Lunaria annua Honesty Money Plant
f35r Koo anona (Custard Apple) amur Cichorium intybus Radicchio
f35v gay1oy “trtga” galium Ribes nigrum Blackcurrant
f36r j1af8aN “pa.nzti” “onupfl” Delphinium staphisagria Delphinium
f36v g1ayos9 “pooteesn” “apksise” Lamium amplexicaule Henbit
f37r koGoe “luiv” malus Mentha longifolia Mint
f37v h2o89 “piyn” “hicse” fedtschenkoi englerii Emilia fosbergii Tassel flower
f38r koeoy “lilh” “mmut”
f38v oh1oj “eveet” inula Euphorbia myrsinites Myrtle Spurge
f39r kc7o128 “goguadp” “gienmpot”
f39v g7aiy “inmh” “naft”
f40r g1c9 “poi” apio Erodium malacoides Storks bill
f40v j1c7an “pagmo” “oospo” Epiphyllum oxypetalum Crocus vernus Crocus
f41r j2c9hc8aecc9 “roilizrii” “ediorpcuio” Origanum vulgare Wild Marjoram
f41v hcSo8ae “lirbzv” “riupus” Coriandrum sativum Coriander Cilantro
f42r 2o “ah” st
f42v k1o˛ cola (?) rosa Aquilegia vulgaris Columbine Culverwort
f43r kayo8am “q.zora” “opbple” Stellaria media Chickweed
f43v g8saiy9 “u.lbn” “gnsicse” Elytrigia repens Couch grass
f44r k2o8g9 “chiy.” arch Mandragora officinarum Mandrake
f44v k2o china (Impatiens) “arur” Apium graveolens Celery
f45r g9h98ae “.jzv” “hariapus” Atriplex hortensis Orach Saltbush
f45v hosay9 “me..” pansy Lavandula angustifolia Lavender
f46r g1coJ9 “ponitr” “apnta” Leucanthemum vulgare Oxeye Daisy
f46v jo79e3c7 “rimvig” “andretos” Tanacetum parthenium, Chrysanthemum parthenium Inula conyza Ploughmans Spikenard Great Fleabane
f47r g1aiy “pomh” “apft” Lady’s Mantle, Lion’s Foot Alchemilla vulgaris Rosaceae Sempervivum tectorum Houseleek
f47v g2cok “dnier” minor Arnica montana Pulmonaria officinalis Lungwort
f48r g28am “dzora” “miple” Adonis Vernalis False Hellebore
f48v g1co819 “ponifn” “apnsse” Ruta graveolens Rue Herb of Grace
f49r gA2oe “ceahv” costus Nymphaea caerulea Blue Nile Lotus
f49v g he wort
f50r g2coy “dnih” mint Astrantia major Masterwort
f50v k19 con (?) rose Telopea speciosissima Gentiana frigida Stiff Gentain
f51r k2oe819 “chinofn” “arvsse” Cakile maritima Searocket
f51v go2o89 albahaca (Basil) “mastd” Salva officinalis Sage
f52r k8oh1F9 “queacn” “toinnise” Anemone coronaria Poppy Anemone
f52v g1oy “poga” apium Polystichum setiferum Fern
f53r hA8ap “mazlo” “ciplo” Achillea Ptarmica Sneezewort
f53v k2oy3c9 “chigamin” “ariumocse” Hieracium aurantiacum Hawkweed
f54r go8am “albzora” maple Cirsium oleraceum Cabbage thistle
f54v g1co8ay “ponizh” “apnpt” Bittersweet Nightshade Solanum dulcamara Perovskia atriplicifolia Russian Sage
f55r go8am “albzora” maple Fumaria officinalis Fumitory
f55v h1C8189 “raecsn” “geriwse” Forest lily Veltheima bracteata Broccoli Broccoli
f56r ok1ae “tebv” “trntus” Drosera Sundews
f56v h1cok “ranier” “genor” Cycas revoluta Sago Palm
f57r joccoHc9 “riopei” “anomiaio” Sherardia arvemsis Blue Field Madder
f65r Alchemilla vulgaris Ladies Mantle
f65v Centaurea cyanus Cornflower
f66v Satureja montana Winter Savory
f87r Satureja hortensis Summer Savory
f87v Senecio Primula vulgaris Primrose
f87v Kleinia Pedicularis flammea Lousewort Wood Bettony
f89v Actaea spicata Baneberry
f90r Conyza bonariensis Fleabane
f90v Eruca vesicaria Arugula Rocket
f93r Cynara cardunculus Artichoke
f93v Lupinus Lupin
f94r Botrychium lunaria Botrychium lunaria Moonwort Moonfern
f94v Agrostemma Githago Corncockle Red Campion
f94v Glycyrrhiza glabra Liquorice
f94v Plantago lanceolata Ribwort Plantain Kemps
f95r Berberis Sambucus nigra Elderberry
f95v Althaea Rosea Hollyhock
f96r Angelica archangelica Garden Angelica
f96v Tamus communis Black Bryony

Prefix Stem and Suffix Analysis

February 26, 2010 2 comments

I grouped all the folios from f1v to f20v inclusive, and labeled the group as “Herbal folios”, and folios f103r to f116r inclusive labeled as “Recipe folios”. I ran each group through a program that extracts all the prefixes,suffixes and stems, validates each, and orders them in frequency. (The method used was described in an earlier email to the list.) My first question was: are the word frequencies and prefix/stem/suffix(PSS) frequencies similar between the Herbal and Recipe collections?

Here are the results. I’ll show only the suffix frequencies, because they are the most interesting.

Herbal: 1331 different words, top 10 words: "8am 1oe 1oy K9 89 19 s 8ay 2oe oy" 
Recipe: 1443 different words, top 10 words: "am ay 1c89 oe 4ohC9 8am oy 4oham 1c9 2c89" 

Top 10 Herbal Suffixes  (Frequency) 

9       0.105580695 
89      0.065862246 
y       0.06435395 
e       0.058320764 
am      0.04524887 
m       0.03167421 
s       0.027652087 
19      0.025641026 
8       0.023629965 
oy      0.02212167 

Top 10 Recipe Suffixes 

9       0.11764706 
e       0.05882353 
89      0.05020284 
y       0.04817444 
am      0.036511157 
8       0.029411765 
ay      0.028904665 
ae      0.024340771 
oy      0.023326572 
oe      0.021805273 

Note: Similar sets (7 of 10), with suffix “9” being approximately a factor two more common than the next most common suffix. I’m not sure what conclusions can be drawn, if any, from this. For fun, I applied the same analysis to a similar number of words from Augustinus Latin. Here are the results, together with the VMs data:

(Augustinus: 1257 different words, top 10 words: "et te in non me mihi est domine ut enim") 

Top 10 Latin Suffixes 

m       0.118421055 
s       0.10526316 
e       0.047368422 
que     0.039473683 
i       0.034210525 
o       0.028947368 
t       0.028947368 
us      0.02631579 
rum     0.021052632 
a       0.021052632 

So, Latin does not have the same frequency pattern at all. Is there a language which does have a similar patterm? I looked at Frenchfrom 1367, Spanish from 1527, German from 1553, and old English (Courtier):

Top 10 French Suffixes 

s    0.1199 
t    0.0736 
z    0.0708 
e    0.0654 
nt    0.0463 
es    0.0436 
l    0.0245 
r    0.0218 
re    0.0191 
er    0.0191 
tre    0.0163 

Top 10 Spanish Suffixes 

s    0.1874 
n    0.0519 
o    0.0464 
a    0.0445 
r    0.0297 
do    0.0297 
es    0.0241 
l    0.0223 
e    0.0223 
va    0.0204 
to    0.0148 

Top 10 German Suffixes 

en    0.1171 
t    0.1171 
s    0.1122 
n    0.0537 
er    0.0390 
ten    0.0341 
d    0.0341 
e    0.0293 
m    0.0293 
ts    0.0244 
r    0.0244 

Top 10 English Suffixes 

e    0.1404 
n    0.0449 
s    0.0421 
t    0.0393 
re    0.0337 
y    0.0281 
ne    0.0253 
l    0.0253 
r    0.0253 
ll    0.0253 
ed    0.0225 

The Spanish suffix “s” is three times more frequent than the next suffix: not a good match to the VMs. Similarly for the English “e”. The German suffix pattern is completely different to the VMs. The French pattern looks similar to the VMs. Let’s look at the French Stems, and compare with the VMs:

Top 10 Herbal Stems 

o       0.15171504 
9       0.058377307 
8       0.045184698 
k       0.04287599 
1o      0.040567283 
oe      0.036609497 
o8      0.028364116 
oy      0.026385223 
y       0.02176781 
2       0.02176781 

Top 10 French Stems 

a       0.0704 
d       0.0544 
es      0.0528 
en      0.0448 
le      0.0432 
se      0.032 
ent     0.0304 
de      0.0272 
ce      0.0272 
ne      0.0256 

A poor match.

Conclusion: the “9” suffix in the VMs appears too frequently for it to come from Latin, German, English or Spanish. Although French has a similarly frequent suffix “s”, the stem frequencies of French don’t match the VMs.

Hypothesis: the “9” suffix in the VMs is not a word suffix, but punctuation or some other annotation. Perhaps a key mark for deciphering purposes. Next step: re-analyse the PSS frequencies in the VMs after removing suffix “9” from words where it appears.

Using the Biological and Astrological Folios

Astrological: folios 66v to 73v inclusive

Biological: folios 75r to 85r inclusive

Herbal: 1331 different words,       top 10 words: "8am 1oe 1oy K9 89 19 s 8ay 2oe oy"
Recipe: 1443 different words,       top 10 words: "am ay 1c89 oe 4ohC9 8am oy 4oham 1c9 2c89" 
Astrological: 1771 different words, top 10 words: "ay am ae 8am s 8ay 8ae 89 okcos ohC9" 
Biological: 2135 different words,   top 10 words: "oe 4ohan 1c89 2c89 4ohc89 4oe 4ohae 1c9 4oham" 

Top 10 Herbal Suffixes  (Frequency) 

9       0.105580695 
89      0.065862246 
y       0.06435395 
e       0.058320764 
am      0.04524887 
m       0.03167421 
s       0.027652087 
19      0.025641026 
8       0.023629965 
oy      0.02212167 

Top 10 Recipe Suffixes 

9       0.11764706 
e       0.05882353 
89      0.05020284 
y       0.04817444 
am      0.036511157 
8       0.029411765 
ay      0.028904665 
ae      0.024340771 
oy      0.023326572 
oe      0.021805273 

Top 10 Astrological Suffixes 

9       0.120173536 
89      0.055531453 
am      0.046420824 
ay      0.04381779 
s       0.04295011 
ae      0.04251627 
e       0.040347073 
79      0.026898047 
y       0.022993492 
oe      0.022125814 

Top 10 Biological Suffixes 

9       0.11961975 
89      0.049643517 
e       0.038288884 
oe      0.031687353 
y       0.030102983 
c89     0.029838923 
ae      0.0293108 
c9      0.0293108 
oy      0.02719831 
ay      0.02508582 

The suffix frequency results for the different folio groups look reassuringly similar to me: the differences are what you would see if you compared two modestly sized tests in, say, English. Indeed, one can tentatively conclude that the language is the same in all four of the VMs sections. On the other hand, the top 10 word lists are quite different. Curious.

Regarding word stems: the definition of a word stem for this study is “any group of characters that spells a valid word by itself, and is also found following one or more other characters (a prefix) and/or followed

by one or more other characters (a suffix).” So, single VMs characters can be stems. After all, it may be that a single VMs character equates to multiple plaintext characters, so we have to have the flexibility to assign single characters as stems.

To clarify, take for example the VMs word “8am”. The candidate stems are “8am”, “8a”, “am”, “8”, “a” and “m”. Those candidates that appear as single words in the VMS dictionary are classed as valid stems (in this case, I believe all six are valid stems).

Once we have a list of all the valid stems in the text, we can count how often each appears, and then order that list. This is what is done toobtain the lists above.

Because this method is fully general, we avoid any assumptions about how many characters a single VMs character maps to.


I changed the algorithm so that it only accumulated prefix/stem/suffixes for unique words in the VMs (as opposed to accumulating them for all words). I think this is more sensible, otherwise a very popular word ended up skewing the statistics. After doing this, the results for suffixes look similar between Latin and VMs (Recipes) – using 3800 words:

Top 20 Latin Suffixes (from a Latin dictionary)

s 0.08350305
o 0.042769857
t 0.03971487
m 0.034623217
is 0.029531568
e 0.02749491
us 0.026476579
a 0.022403259
es 0.020366598
rum 0.01934827
um 0.018329939
tum 0.017311608
mus 0.017311608
to 0.017311608
i 0.01629328
tus 0.01629328
tis 0.015274949
c 0.014256619
em 0.013238289
am 0.013238289

Top 20 Herbal Suffixes

9 0.094210714
89 0.045487236
e 0.040273283
ay 0.036857247
y 0.036857247
am 0.03613808
ae 0.029126214
an 0.028047465
oe 0.024631428
79 0.023552679
oy 0.023013305
8 0.023013305
o 0.020316433
ap 0.019417476
c89 0.018878102
c9 0.017979145
s 0.017799353
m 0.015462064
o89 0.014383315
19 0.01366415

This suggests the following (partial) cipher :

VMs Latin
=== =====
9 s
8 i
7 u
e m
a r
o a
y um
m is

1 t 
4 qu
c e
g f
k c
2 d
s p
h n
3 h

Top 20 VMs words translated

am -> ris
ay -> rum
ae -> rm
1c89 -> teis
4ohC9 -> quan?s
1c9 -> tes
oe -> am
4oham -> quanris
8am -> iris
4ohan -> quanr?
oham -> anris
okam -> acris
oy -> aum
an -> r?
ohan -> anr?
e -> m
2c89 -> dkis
1c79 -> tkus
ohC9 -> an?s
okay -> acrum

Looking for longer repeating character sequences

In this analysis, the software looks in the text for all nGrams that appear at least twice as a) a prefix, or b) as a suffix or at least once as a stem, and calculates their (normalised) frequencies. I’m not sure what to make of the results!

For N=3, looking at the Herbal folios f1v-f20v inclusive, 1331 different words. 

Confirmed valid prefix/stem/suffix counts 99 252 111 
Prefix/Stem/Suffix frequency, normalised 
4ok     0.1010101               o89     0.05952381              o89     0.09009009 
4oh     0.07070707              1oe     0.055555556             8am     0.09009009 
1oe     0.060606062             4ok     0.055555556             1c9     0.054054055 
1oh     0.04040404              8am     0.04761905              1oy     0.054054055 
ok1     0.04040404              4oh     0.04761905              1oe     0.045045044 
8oe     0.030303031             1oy     0.03968254              coe     0.036036037 
1oy     0.030303031             1c9     0.031746034             cc9     0.027027028 
1co     0.030303031             1co     0.023809524             e89     0.027027028 
1ok     0.030303031             8oe     0.023809524             ham     0.027027028 
4oj     0.030303031             coe     0.01984127              2c9     0.027027028 

For N=3, processing the same number of different words from Thomas Hardy (English) 

Confirmed valid prefix/stem/suffix counts 87 160 67 
Prefix/Stem/Suffix frequency, normalised 
com     0.04597701              ely     0.025           ing     0.07462686 
par     0.022988506             ted     0.025           led     0.04477612 
rea     0.022988506             led     0.025           sed     0.04477612 
mot     0.022988506             sed     0.025           ely     0.04477612 
pla     0.022988506             ght     0.025           ted     0.029850746 
see     0.022988506             ing     0.01875         ter     0.029850746 
pas     0.022988506             ked     0.01875         son     0.029850746 
wai     0.022988506             per     0.01875         ned     0.029850746 
can     0.022988506             com     0.01875         ner     0.029850746 
smi     0.022988506             par     0.01875         mon     0.029850746 

For N=3, same number of words from Augustinus (Latin) 

Confirmed valid prefix/stem/suffix counts 102 197 83 
Prefix/Stem/Suffix frequency, normalised 
qua     0.039215688             ere     0.05076142              ere     0.04819277 
fac     0.029411765             qua     0.035532996             iat     0.04819277 
qui     0.029411765             fac     0.02538071              que     0.036144577 
dic     0.029411765             ita     0.02538071              ius     0.036144577 
pot     0.029411765             ius     0.02538071              ita     0.036144577 
ter     0.019607844             que     0.020304568             rum     0.024096385 
ali     0.019607844             dic     0.020304568             ent     0.024096385 
aud     0.019607844             ini     0.020304568             ram     0.024096385 
par     0.019607844             ans     0.015228426             unt     0.024096385 
cor     0.019607844             ent     0.015228426             ris     0.024096385 

For N=4 Voynich (statistics become poorer as N increases, of course) 

Confirmed valid prefix/stem/suffix counts 6 14 6 
Prefix/Stem/Suffix frequency, normalised 
4oko    0.16666667              o8ae    0.14285715              co89    0.16666667 
okam    0.16666667              okam    0.14285715              e8am    0.16666667 
oh2o    0.16666667              4ok1    0.071428575             o8an    0.16666667 
4okc    0.16666667              4oh1    0.071428575             e2oe    0.16666667 
k2co    0.16666667              co89    0.071428575             9koy    0.16666667 
4ohC    0.16666667              4oko    0.071428575             oKoy    0.16666667 
4ok1    0.0                     e8am    0.071428575             1o89    0.0 
4oh1    0.0                     oh2o    0.071428575             oe89    0.0 
ok1c    0.0                     o8an    0.071428575             o8ae    0.0 
ohoe    0.0                     4okc    0.071428575             ho89    0.0 

For N=4 English 

Confirmed valid prefix/stem/suffix counts 36 66 26 
Prefix/Stem/Suffix frequency, normalised 
pres    0.055555556             ined    0.045454547             sing    0.115384616 
dist    0.055555556             ring    0.045454547             ined    0.115384616 
weak    0.055555556             test    0.045454547             ally    0.07692308 
occa    0.055555556             ment    0.030303031             ring    0.03846154 
outl    0.027777778             pres    0.030303031             ence    0.03846154 
prob    0.027777778             sing    0.030303031             nded    0.03846154 
ment    0.027777778             weak    0.030303031             ding    0.03846154 
cons    0.027777778             prob    0.030303031             ning    0.03846154 
atte    0.027777778             hern    0.030303031             ness    0.03846154 
stan    0.027777778             sion    0.030303031             wing    0.03846154 

For N=4 Latin 

Confirmed valid prefix/stem/suffix counts 63 126 57 
Prefix/Stem/Suffix frequency, normalised 
faci    0.06349207              bant    0.03968254              ntes    0.0877193 
pecc    0.04761905              ntes    0.03968254              quam    0.05263158 
invo    0.031746034             faci    0.031746034             endo    0.05263158 
cred    0.031746034             pecc    0.031746034             ebam    0.03508772 
infa    0.031746034             endo    0.023809524             erem    0.03508772 
puer    0.031746034             ndis    0.023809524             iens    0.03508772 
habe    0.031746034             quam    0.023809524             ones    0.03508772 
form    0.031746034             quid    0.023809524             bant    0.01754386 
pare    0.031746034             rati    0.023809524             abam    0.01754386 
nesc    0.031746034             ibus    0.015873017             ndis    0.01754386 

For N=5 Voynich (no data satisfies selection) 

For N=5 English 

Confirmed valid prefix/stem/suffix counts 15 29 13 
Prefix/Stem/Suffix frequency, normalised 
consi   0.13333334              ation   0.06896552              ation   0.15384616 
ornam   0.13333334              consi   0.06896552              sting   0.15384616 
appea   0.06666667              ornam   0.06896552              dered   0.07692308 
dimen   0.06666667              sting   0.06896552              ality   0.07692308 
occup   0.06666667              still   0.06896552              ingly   0.07692308 
stand   0.06666667              dered   0.03448276              ental   0.07692308 
conce   0.06666667              ingly   0.03448276              rning   0.07692308 
sugge   0.06666667              dimen   0.03448276              ented   0.07692308 
diffe   0.06666667              occup   0.03448276              rence   0.07692308 
speci   0.06666667              ality   0.03448276              sions   0.07692308 

For N=5 Latin 

Confirmed valid prefix/stem/suffix counts 21 44 23 
Prefix/Stem/Suffix frequency, normalised 
volun   0.0952381               entes   0.06818182              entes   0.13043478 
pecca   0.0952381               batur   0.045454547             batur   0.08695652 
lauda   0.0952381               tibus   0.045454547             antur   0.08695652 
quaer   0.0952381               invoc   0.045454547             tibus   0.08695652 
metue   0.0952381               pecca   0.045454547             bamus   0.08695652 
invoc   0.04761905              lauda   0.045454547             torum   0.08695652 
infan   0.04761905              quaer   0.045454547             tatis   0.04347826 
inven   0.04761905              volun   0.045454547             itate   0.04347826 
nesci   0.04761905              metue   0.045454547             antes   0.04347826 
paren   0.04761905              bamus   0.045454547             bilis   0.04347826 
Here are the N=3 counts/frequency for the 1331 unique words in f1v-f20v of the Herbal: 

Confirmed valid prefix/stem/suffix counts 99 252 111 
Prefix/Stem/Suffix frequency, normalised 
4ok     10      0.1010101               o89     15      0.05952381              o89     10      0.09009009 
4oh     7       0.07070707              1oe     14      0.055555556             8am     10      0.09009009 
1oe     6       0.060606062             4ok     14      0.055555556             1c9     6       0.054054055 
1oh     4       0.04040404              8am     12      0.04761905              1oy     6       0.054054055 
ok1     4       0.04040404              4oh     12      0.04761905              1oe     5       0.045045044 
8oe     3       0.030303031             1oy     10      0.03968254              coe     4       0.036036037 
1oy     3       0.030303031             1c9     8       0.031746034             cc9     3       0.027027028 
1co     3       0.030303031             1co     6       0.023809524             e89     3       0.027027028 
1ok     3       0.030303031             8oe     6       0.023809524             ham     3       0.027027028 
4oj     3       0.030303031             coe     5       0.01984127              2c9     3       0.027027028 

(e.g. the sequence "4ok" appears 10 times at the start of a longer word (prefix)) 

N=3 for 1331 unique words in the Astrological Section 

Confirmed valid prefix/stem/suffix counts 154 346 153 
Prefix/Stem/Suffix frequency, normalised 
okc     11      0.071428575             o89     16      0.046242774             o89     13      0.08496732 
ohc     8       0.051948052             okc     11      0.031791907             cos     6       0.039215688 
4oh     7       0.045454547             8ae     11      0.031791907             8am     6       0.039215688 
9hc     7       0.045454547             1co     10      0.028901733             8ae     6       0.039215688 
oko     6       0.038961038             oko     10      0.028901733             cc9     4       0.026143791 
oka     6       0.038961038             oho     9       0.02601156              coe     4       0.026143791 
oho     5       0.032467533             ohc     8       0.023121387             o79     4       0.026143791 
1ok     5       0.032467533             oka     8       0.023121387             oh9     4       0.026143791 
oh1     5       0.032467533             4oh     8       0.023121387             c79     4       0.026143791 
1co     4       0.025974026             9hc     7       0.020231213             c89     3       0.019607844 

N=3 for 1331 unique words in the Biological Section 

Confirmed valid prefix/stem/suffix counts 124 275 124 
Prefix/Stem/Suffix frequency, normalised 
4oh     13      0.10483871              c89     26      0.094545454             c89     17      0.13709678 
4ok     10      0.08064516              4oh     20      0.07272727              c79     13      0.10483871 
4oe     8       0.06451613              c79     13      0.047272727             1c9     9       0.07258064 
oeh     6       0.048387095             4ok     12      0.043636363             C89     7       0.05645161 
oe1     5       0.04032258              1c9     11      0.04                    2c9     7       0.05645161 
ohc     4       0.032258064             2c9     9       0.03272727              189     4       0.032258064 
soe     4       0.032258064             4oe     8       0.02909091              eoy     3       0.024193548 
oe2     3       0.024193548             oeh     7       0.025454545             cc9     3       0.024193548 
91c     3       0.024193548             8ae     7       0.025454545             hC9     3       0.024193548 
8ay     3       0.024193548             8ay     7       0.025454545             ae9     3       0.024193548 

N=3 for 1331 unique words in the Recipes Section 

Confirmed valid prefix/stem/suffix counts 135 303 143 
Prefix/Stem/Suffix frequency, normalised 
4oh     17      0.12592593              4oh     18      0.05940594              c89     13      0.09090909 
4ok     14      0.1037037               4ok     17      0.05610561              o89     13      0.09090909 
ohc     9       0.06666667              o89     16      0.052805282             189     8       0.055944055 
okc     8       0.05925926              c89     15      0.04950495              c79     7       0.04895105 
oeh     7       0.05185185              oeh     10      0.0330033               8am     7       0.04895105 
1co     5       0.037037037             1co     10      0.0330033               8ay     6       0.04195804 
g1c     4       0.02962963              ohc     9       0.02970297              coe     5       0.034965035 
4oj     4       0.02962963              c79     9       0.02970297              8ae     5       0.034965035 
ohC     4       0.02962963              8ae     9       0.02970297              1c9     4       0.027972028 
1oe     3       0.022222223             189     9       0.02970297              cc9     4       0.027972028 

Philip Neal’s Anagram Encryption

Notice how words tend to start with “4”, “o” and “1” and tend to end with “9”, “m” and “e”. This sort of feature has me excited about Philip Neal’s anagram encryption idea explained here: which is summarised thus (quoting from that page):

  "1. Divide a plaintext into lines 
   2. Sort the words of each line into alphabetical order 
   3. Sort the letters of each word into alphabetical order 

   1. one thing led to another thing last night 
   2. another last led night one to thing thing 
   3. aehnort alst del ghint eno ot ghint ghint" 

Right now I am repurposing my Genetic Algorithm to attach some lines of the VMs assuming such an encryption – I am killed by the permutations (which go as factorial the length of the word).