Using t-distributed Stochastic Neighbor Embedding (TSNE) to cluster folios
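For this attack we’ll use the Takeshi EVA transcription to count the number of times each glyph appears on each folio. Dividing each count by the total number of glyphs on the folio gives a vector of probabilities (relative frequencies) for each folio. The vectors are 24 long, as there are 24 EVA glyphs in the alphabet; glyphs that do not occur on a folio have probability zero and are left out of the printed example below.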
For example, here is the probability vector for f1r:
1r 28 lines {‘a’: 0.08917835671342686, ‘c’: 0.08216432865731463, ‘e’: 0.05110220440881764, ‘d’: 0.06212424849699399, ‘f’: 0.00501002004008016, ‘i’: 0.08617234468937876, ‘h’: 0.12324649298597194, ‘k’: 0.045090180360721446, ‘*’: 0.012024048096192385, ‘m’: 0.001002004008016032, ‘l’: 0.03507014028056112, ‘o’: 0.11923847695390781, ‘n’: 0.050100200400801605, ‘p’: 0.012024048096192385, ‘s’: 0.06412825651302605, ‘r’: 0.04408817635270541, ‘t’: 0.03907815631262525, ‘y’: 0.07915831663326653}
(This reads as glyph “a” appears 8.9% of the time on f1r, glyph “c” 8.2% of the time, and so on.)
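A minimal sketch of the counting step, assuming each folio’s text is available as a list of EVA lines using ‘.’ or ‘,’ as word separators. The post does not show its parsing code, so `glyph_frequencies` and `frequency_vector` are hypothetical helpers, and the toy lines are made up rather than taken from f1r:

```python
from collections import Counter

# Assumed separator characters (word breaks, gaps, paragraph ends); these are
# not counted as glyphs. Uncertain glyphs such as '*' ARE counted, as in f1r above.
SEPARATORS = set(". ,-=\n")

def glyph_frequencies(lines):
    """Return {glyph: count / total} for all glyphs on one folio."""
    counts = Counter(ch for line in lines for ch in line if ch not in SEPARATORS)
    total = sum(counts.values())
    return {glyph: n / total for glyph, n in counts.items()}

def frequency_vector(freqs, alphabet):
    """Fixed-order vector over the alphabet, with 0.0 for glyphs absent from the folio."""
    return [freqs.get(glyph, 0.0) for glyph in alphabet]

# Toy folio (made-up lines, not real transcription text).
folio = ["daiin.chol", "chor.shy"]
freqs = glyph_frequencies(folio)
```

A fixed alphabet order matters because every folio must be described by the same 24 coordinates before the vectors can be compared.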
The question is: how similar are these frequency distributions across all the folios? Using tSNE (implemented in scikit-learn here: http://scikit-learn.org/stable/modules/generated/sklearn.manifold.TSNE.html) we can try to find a 3D arrangement of all the folios in which folios with similar glyph frequency vectors end up close to one another.
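A minimal sketch of this step with scikit-learn’s `TSNE` class, assuming the per-folio vectors are stacked into a NumPy array. The data here are synthetic (two made-up groups of 20 “folios” drawn from different Dirichlet distributions), not real Voynich measurements, and the perplexity and other settings are arbitrary choices; the post does not say which parameters were used.

```python
import numpy as np
from sklearn.manifold import TSNE

# Synthetic stand-in data: 40 "folios" x 24 glyph frequencies, drawn from two
# different Dirichlet distributions to mimic two "languages".
# These are NOT real Voynich measurements.
rng = np.random.default_rng(0)
alpha_a = np.linspace(1.0, 5.0, 24)
alpha_b = alpha_a[::-1].copy()
X = np.vstack([
    rng.dirichlet(alpha_a * 20, size=20),
    rng.dirichlet(alpha_b * 20, size=20),
])  # each row sums to 1, like a per-folio frequency vector

# Embed into 3D. Perplexity must be smaller than the number of folios;
# 10 is an arbitrary value for this toy data set.
tsne = TSNE(n_components=3, perplexity=10.0, init="pca", random_state=0)
embedding = tsne.fit_transform(X)  # one (x, y, z) point per folio
```

Note that tSNE is never given the group labels: any separation that shows up when the points are coloured afterwards comes from the frequency vectors alone.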
Here’s a typical result: each folio appears as a point in 3D space …
The colour coding is: red dots are folios that Currier identified as “Language A”, blue are “Language B”, and the remaining black dots do not have an assignment.
The red and blue dots are clearly well separated. Since tSNE is not told which folios Currier assigned to which language, this is independent support for Currier’s theory.
There are a couple of notable features:
- f57r and f57v are labelled as Language A (red) – but it looks like they should be labelled as Language B (blue)
- The unassigned folios (black dots) look like they are all Language B
Puzzles of the Voynich Manuscript
I just published the guide “Puzzles of the Voynich Manuscript” as an ebook on Amazon. A paperback version is also available. From the blurb:
This illustrated guide to the Voynich Manuscript is targeted mainly at those who have recently come across the book and are wondering what all the fuss is about, and why, after more than a century of effort, nobody has cracked its code yet. It should also be useful as a set of tests for those who believe they may have cracked the code, so that they can see how their solution matches up against each of the puzzles or notable features described. And finally, it is hopefully of interest to those already familiar with the manuscript – perhaps they will find something new or thought provoking within.
Readers of this blog, who tend to be Voynich experts already, will probably not find much (if anything) new in the guide, as it is principally intended for newcomers to the Manuscript.
Edit Distance for Word Positions
The edit distance between two words is the minimum number of single-character edits (insertions, deletions or substitutions) needed to convert one word into the other. For example, the edit distance between “banana” and “bahama” is 2: substitute “h” for the first “n” and “m” for the second.
I looked at the average edit distance (the Levenshtein measure) between adjacent words on each line of each folio in the Herbal A and Herbal B sections. Here are the results:
How to interpret these plots
There is one square per line and word position: the top-left square is the average edit distance between word 1 and word 2 on line 1, over all the folios. The next square in that row is the average edit distance between word 2 and word 3, and so on.
Each square in the plot has a shade of gray: the darker the shade, the bigger the average edit distance.
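A minimal sketch of both pieces: a standard dynamic-programming Levenshtein distance, and the averaging that produces one value per square. It assumes each folio has already been split into lines of words (a hypothetical input format); `edit_distance_grid` is an illustrative helper, not the code behind the plots.

```python
from collections import defaultdict

def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if the glyphs match)
            ))
        prev = curr
    return prev[-1]

def edit_distance_grid(folios):
    """Average distance between word k and word k+1, keyed by (line number, k).

    Each folio is a list of lines and each line a list of words.
    Line numbers and word positions are 1-based.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for folio in folios:
        for line_no, words in enumerate(folio, start=1):
            for k in range(len(words) - 1):
                key = (line_no, k + 1)
                totals[key] += levenshtein(words[k], words[k + 1])
                counts[key] += 1
    return {key: totals[key] / counts[key] for key in totals}
```

Each dictionary key corresponds to one square in the plots; the value would set its shade of gray.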
One conclusion is that for both sets of folios, there is a big edit distance between the first and second words on the folios: the words are very dissimilar.
Another conclusion is that similar words (lighter shade of gray) tend not to occur in the first line, or as the first words.
Alfonso X’s Lapidario: Stones, Stars and Colours
Over the last couple of days I’ve been down a bit of a rabbit hole, one that others may already have explored. I was looking once more at the Zodiac folios, in particular Taurus. The Taurus Light and Dark folios are both marked “may”, in what is often remarked to be a later hand. There are 15 figures in each Taurus folio, for a total of 30. However, as we well know, May has 31 days, so the figures probably don’t represent days. I therefore went in search of 30-way splits of Zodiac signs ….
Alfonso X’s Lapidario
Looking at this old Spanish illustrated manuscript:
“Tratados de Alfonso X sobre astrología y sobre las propiedades de las piedras” (“Treatises of Alfonso X on astrology and on the properties of stones”)
which is a treatise on astrology and the properties of stones/gems etc., we can see a circular Taurus diagram with 30 divisions.
Each of these divisions is associated with a stone, of a noted colour, and one or a few stars in a constellation. There is a lengthy description of each division, its stone, its stars, the various ailments the stone cures, when the stone should be used, et cetera. There is a Spanish transcription of the text here, which I found very useful (combined with Google Translate):
Since a plausible language match for the month spellings in the Zodiac folios is Occitan, which at one point was spoken in part of Spain (please correct me on this, as I’m not sure), there seems to be a compelling regional match here, but I can’t quite figure it out.
From what I’ve read, Alfonso X assembled a team of scholars from many different regions, who worked on documents on a variety of topics. This website says of the Lapidario: “The Lapidario is a thirteenth century Castilian translation sponsored by King Alfonso X el Sabio, the Learned. The translation was done from an Arabic text which in turn is said to have been translated by the mysterious Abolays from an ancient text in the ‘Chaldean language’.”
Matching Stone Colours
Anyway, my first approach was to try to match the colours of the headgear or tunics of the clothed figures in the Taurus Light folio to the colours of the first and second fifteen stones mentioned in the Lapidario. It’s a little tricky, because although the stones are numbered, we don’t know which is figure 1 in the Taurus Light folio, and whether the inner ring precedes the outer. Even so, the patterns of colours in the stones sequence might reveal a match. I drew a blank.
Matching Stones with Voynich Star Labels
My second approach was to try to match the names of the stones with the labels on the figures, to see if there was some correlation between the label length, or its initial glyph, and the stones’ names. Very tricky.
Some of the stones that appear in the Taurus set of 30 also appear in other Zodiac signs in the Lapidario. For example, the ninth stone in Taurus is “esmeri(l)” (Latin), and esmeril is also the third stone of Libra, and the second stone of Aquarius.
This leads to the obvious question: is there a figure in both the Voynich Taurus and Libra roundels (Aquarius is missing) that bears the same label? If so, might that label be “esmeril”? And are there other labels that appear in more than one Voynich sign which might be matched to stones that appear in more than one sign in the Lapidario?
(As an aside, regarding the stones and colours, I was struck by the third stone of Taurus, called “camorica”, which is scarlet in colour and associated with the Pleiades.)
Another promising avenue is to compare the shapes and orientations of the stars in the constellations as they appear in the VM with how they appear in the Lapidario. Since the Pleiades are mentioned in the Lapidario, are they illustrated there, and does its illustration of the cluster match the apparent drawing of it in the VM (which differs in detail from its actual appearance in the night sky)? I need to investigate further, but my suspicion is that others have already been down this path 🙂
Chaldean Stones
I extracted the Chaldean stone names for each Zodiac sign from the transcription I linked to above. The stones for Aries are shown below, as an example. (The whole set is available if anyone wants it.)
The first 12 sections in the Lapidario list the 30 stones for each zodiac sign, but a few signs appear to be truncated: Leo has only one stone, Pisces only has two stones, and Aquarius only 28.
There follow more sections, again one for each sign, but these each have only three stones. I’m not clear what they represent. I posted about all this at Voynich Ninja, and MarcoP was able to explain. Others also chimed in with some useful comments. The discussion is here.
Anyway, following those sections are several more that cover the stones of Saturn (4 stones), Jupiter (4 stones), Mars (4 stones), Venus (24 stones), Sun (9 stones) and Mercury (17 stones).
Here is an extract from my list of all the Zodiac stones, for Aries:
ARIES: 1 magnitad, 2 zurudica, 3 gagatiz, 4 miliztiz, 5 centiz, 6 movedor, 7 goliztiz, 8 telliminuz, 9 milititaz, 10 huye de la leche (“flees from milk”), 11 aljófar, 12 anetatiz, 13 beruth, 14 piedra de cinc (“zinc stone”), 15 tira el oro (“draws gold”), 16 chupa la sangre (“sucks blood”), 17 parece en la mar cuando sube Marte (“appears in the sea when Mars rises”), 18 tira el vidrio (“draws glass”), 19 annora, 20 yzf, 21 cuminon, 22 astarnuz, 23 belyniz, 24 gaciuz, 25 azufaratiz, 26 abietityz, 27 lubi, 28 ceraquiz, 29 berlimaz, 30 annoxatir
In total I count 301 stone entries in the Lapidario’s Zodiac section (9 × 30 + 1 + 2 + 28 = 301), with 291 distinct stone names. The names that appear more than once are as follows:
bezaar [(9, 'GÉMINIS'), (11, 'GÉMINIS')]
azarnech [(12, 'SAGITARIO'), (13, 'SAGITARIO')]
pez [(7, 'LIBRA'), (30, 'LIBRA')]
plomo [(18, 'VIRGO'), (13, 'CANCRO')]
calcant [(10, 'VIRGO'), (11, 'VIRGO')]
aliaza [(23, 'TAURO'), (29, 'TAURO')]
parece en la mar [(15, 'SAGITARIO'), (15, 'TAURO'), (17, 'GÉMINIS'), (17, 'ACUARIO')]
de la serpiente [(12, 'LIBRA'), (7, 'GÉMINIS')]
e.g. “bezaar” is the 9th and the 11th stone in Gemini, and “de la serpiente” (“of the serpent”) is the 12th stone in Libra and the 7th in Gemini.
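The duplicate search can be sketched as grouping entries by name and keeping any name with more than one entry, which also separates the total number of entries from the number of distinct names. The entries below are a small subset taken from the lists in this post; `repeated_names` is a hypothetical helper, not the script used for the counts above.

```python
from collections import defaultdict

def repeated_names(entries):
    """Group (name, position, sign) entries by name; keep names seen more than once.

    Returns {name: [(position, sign), ...]}, the same shape as the list above.
    """
    where = defaultdict(list)
    for name, position, sign in entries:
        where[name].append((position, sign))
    return {name: places for name, places in where.items() if len(places) > 1}

# A few entries from the lists in this post.
entries = [
    ("magnitad", 1, "ARIES"),
    ("bezaar", 9, "GÉMINIS"),
    ("bezaar", 11, "GÉMINIS"),
    ("de la serpiente", 7, "GÉMINIS"),
    ("de la serpiente", 12, "LIBRA"),
]
dups = repeated_names(entries)
total, distinct = len(entries), len({name for name, _, _ in entries})
```

The same function applies to the Voynich star labels if the (position, sign) pair is replaced by the folio on which the label occurs.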
Turning to the Voynich Zodiac, I count 298 star labels, of which 269 are distinct. The labels that appear more than once are:
otal dar ['71r', '70v2'] Aries (Light), Pisces
otal ['72r2', '73r'] Gemini, Scorpio
okeey ary ['72r1', '72r2'] Taurus (Dark), Gemini
okal ['73v', '72r2', '72r2'] Sagittarius, Gemini, Gemini
okeos ['73v', '73r', '73r'] Sagittarius, Scorpio, Scorpio
okeoly ['70v2', '72v1'] Pisces, Libra
otaly ['70v2', '72v3', '73r'] Pisces, Leo, Scorpio
okaram ['70v2', '72r2'] Pisces, Gemini
okoly ['70v1', '72v3'] Aries (Dark), Leo
okalar ['72r3', '72r2'] Cancer, Gemini
okary ['72v3', '73r'] Leo, Scorpio
okam ['72r2', '72v3'] Gemini, Leo
okeody ['73v', '73v', '73r', '72v2'] Sagittarius, Sagittarius, Scorpio, Virgo
ykey ['73v', '73v'] Sagittarius, Sagittarius
okaly ['70v2', '72r2', '72r2', '72v3'] Pisces, Gemini, Gemini, Leo
okaldy ['72r2', '72v3'] Gemini, Leo
otaraldy ['72r1', '72r2'] Taurus (Dark), Gemini
otoly ['72v3', '73r'] Leo, Scorpio
oky ['73v', '72v3', '73r'] Sagittarius, Leo, Scorpio
oteody ['73v', '73v'] Sagittarius, Sagittarius
okedy ['72v1', '73r'] Libra, Scorpio
e.g. “otal dar” appears as a label on both the Aries (Light) and Pisces zodiac charts.
If the Voynich Zodiac charts are indeed showing stones (and the figure/star labels are their names), then there should be good matches between the two lists above.
One potential match is:
azarnech [(12, 'SAGITARIO'), (13, 'SAGITARIO')]
ykey ['73v', '73v'] Sagittarius, Sagittarius
However, the two “ykey” labels on f73v are not adjacent, as they would be if they were stones 12 and 13.
Another:
azarnech [(12, 'SAGITARIO'), (13, 'SAGITARIO')]
oteody ['73v', '73v'] Sagittarius, Sagittarius
In this case the two “oteody” labels on f73v are adjacent to one another, but the figures/stars they label are in the group of four at the top of the folio: it’s a stretch to think their positions are 12th and 13th.
To be continued ….
Are the Glyphs placed in specific folio locations?
Based on a lot of circumstantial evidence related to the weirdness of the Voynich text (such as the oddly repeated words, the faintness and boldness of some glyphs, and the sometimes curious positioning of words and lines), it seems possible that the folios were not written left to right (or right to left) and top to bottom.
Instead, suppose the scribe started each folio with a prescription: for example “put an h-Gallows at the top left, then put a c in the middle of the folio, then a 9 at the end of the last line”, and so on. This would be sort of like filling out the answers to a bizarre crossword puzzle.
If there was such a prescription, might it explain some of the Voynich text features?
In the following selected charts I’m showing a virtual folio for the Recipes section, built up from all the folios in that section. Each chart has lines and columns, with line 1, position 1 at the top left of the folio. Let’s look at the chart for glyph “o”:
Each disc indicates that the “o” appears at least twice in that location in the Recipes. The size of the disc indicates how many times it appears there: the bigger the disc, the more times it appeared. The random appearance of the chart suggests that “o” is not placed on the page in any particular pattern.
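A minimal sketch of how the data behind such a chart could be tallied. It assumes (this is not stated in the post) that each folio is a list of line strings and that a column is a character position within the line, with word separators counted as positions. `position_counts` is a hypothetical helper and the toy folios are made up.

```python
from collections import Counter

def position_counts(folios, glyph, min_count=2):
    """Count where a glyph occurs on a 'virtual folio' built from many folios.

    Keys are (line number, column), both 1-based. Only locations where the
    glyph occurs at least `min_count` times are kept, mirroring the charts,
    which only draw a disc for two or more occurrences.
    """
    counts = Counter()
    for folio in folios:
        for line_no, line in enumerate(folio, start=1):
            for col, ch in enumerate(line, start=1):
                if ch == glyph:
                    counts[(line_no, col)] += 1
    return {pos: n for pos, n in counts.items() if n >= min_count}

# Made-up toy folios: 's' starts line 2 on both, so only that location passes.
folios = [
    ["okaiin.shey", "sar.okedy"],
    ["qokal.dal", "sheol.ol"],
]
```

The disc size in each chart would then correspond to the count stored for that (line, column) key.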
Let’s now look at the “s” glyph:
Here it is clear that this glyph vastly prefers the first column, but not the first line. It is infrequently found elsewhere on the folio. In contrast, take a look at the rare glyphs (I just call them “?”):
These abhor the early columns, and love the ends of the lines. They also seem to prefer the ends of the first lines (notice a little cluster there). Perhaps they hate the “s” glyphs…
The “4” glyph:
The gap after the first column is explained by the fact that “4” only appears at the start of a word.
Here are some more glyphs:
No conclusions here, as usual!
Addendum: the distribution for “c”:
Entropy of the Voynich text
The Shannon Entropy of a string of text measures its information content. For text that is completely random, i.e. where every character is as likely to appear as any other, the entropy (or “disorder”) is high. By contrast, for a text that is a long string of identical characters, the entropy is zero.
Mathematically, the Shannon Entropy is defined as:
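$$\text{Entropy} = -\sum_{i=1}^{N} p_i \log_2 p_i$$
where $p_i$ is the relative frequency of the $i$-th character in the text, and the sum is over all $N$ distinct characters. Taking the logarithm to base 2 gives the entropy in bits per character, which is consistent with the values in the table below.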
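A short sketch of the formula, assuming the entropy is computed over single characters with the logarithm to base 2. The post does not say whether spaces and punctuation were included when computing the table’s figures, and that choice changes the result.

```python
import math
from collections import Counter

def shannon_entropy(text):
    """Single-character Shannon entropy of a string, in bits per character."""
    counts = Counter(text)
    n = len(text)
    # p = c / n is the relative frequency of each distinct character.
    return -sum((c / n) * math.log2(c / n) for c in counts.values())
```

A string of one repeated character gives 0, and a string in which each of k characters appears equally often gives exactly log2(k) bits, which is the maximum for k symbols.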
If the Voynich text were randomly generated (by whatever means), we’d expect it to have high entropy (i.e. be very disordered). What we in fact find is that the text is ordered, with low Entropy: it is rather more ordered than English, for example, and than each of the other languages tested. The result of comparing the Voynich text with several other texts in different languages is shown in the table below.
Language | Source | Entropy (bits/character) |
---|---|---|
Voynich | GC’s Transcription | 3.73 |
French | Text from 1367 | 3.97 |
Latin | Cantus Planus | 4.05 |
Spanish | Medina 1543 | 4.09 |
German | Kochbuch 1553 | 4.15 |
English | Thomas Hardy | 4.21 |
Early Italian | Divine Comedy 1300 | 4.23 |
None | Random characters | 6.01 |
The last entry in the table shows the Entropy of a random text (6.01), which is roughly 1.6 times the Entropy of the Voynich text (3.73).