polyalphabetic | Computational Attacks on the Voynich Manuscript

Current Status

This is my personal summary of where I am at the moment, in particular which theories I’ve rejected (for better or worse!).

Theory: VMs words are anagrams of a plaintext that has been enciphered into the VMs glyphs
- Attempts to find solutions with many mappings (1-, 2- and 3-grams) and various languages/dictionaries fail to find even mediocre matches
- Unusual prevalence of e.g. “8am 8am 8am” not explained by this theory
Theory: VMs words are in fact pieces of plaintext words, that need to be a) combined and b) deciphered
- Trials with delimiters like VMs “o” and “9” and with many mappings and languages/dictionaries fail to find good matches
- But this would explain “8am 8am 8am” at a stretch
Theory: VMs words contain numeric codes, that use a Selenus type code table, with e.g. gallows characters used as multipliers
- There are too many VMs characters for this to work: a minimal implementation needs only, say, four gallows characters and ten digits, so what are all the rest for?
- Doesn’t explain “8am 8am 8am”
Theory: VMs words are phonetic codes for a reading of the manuscript
- Mapping the words to Soundex or Double Metaphone codes and comparing them with plaintexts produces a poor frequency match (but is this a good test? See e.g. Robert Firth’s notes)
- This could explain “8am 8am 8am”
Theory: The text is produced by a polyalphabetic cipher with rotating/repeating sequences (a la Strong)
- Multiple attempts to fit this theory using various alphabet lengths and sequence lengths fail to find a convincing match, although plausible results can be generated
- Would explain “8am 8am 8am”
Procedure: since the cipher/code/whatever it is changes at least between sections, and possibly between folios (and maybe even within a folio), examining large quantities of VMs text for statistical properties is very misleading. Only text within a single side of a folio should be tackled for decryption.
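Each theory above is judged partly on whether it explains runs like “8am 8am 8am”. As a minimal sketch of how such runs might be counted within a single folio side, assuming the transcription is a plain string of space-separated words: the function name `repeated_runs` and the sample line are made up for illustration and are not taken from the manuscript.

```python
from itertools import groupby

def repeated_runs(text, min_run=2):
    """Return (word, run_length) for each run of identical consecutive words."""
    runs = []
    for word, group in groupby(text.split()):
        n = len(list(group))
        if n >= min_run:
            runs.append((word, n))
    return runs

# Hypothetical sample line, invented for this example (not a transcription).
sample = "1o89 8am 8am 8am oe 8am ho89 ho89"
print(repeated_runs(sample))  # → [('8am', 3), ('ho89', 2)]
```

In keeping with the procedure above, this would be run on one side of one folio at a time rather than on the whole text.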


Strong’s “peculiar system of a double reversed arithmetic progression of a multiple alphabet” is a puzzling description, but GC recently (Feb 2010) explained it as “‘double reversed arithmetic progression’ as defined by the string 1-3-5-7-9-7-5-3-1-4-7-4” (although I think the sequence given is an example, rather than the definition). The number of alphabets is “a handful”.

If we suppose that the cipher is indeed constructed like this, then can we crack it computationally?

First we need to make some assumptions. Let’s generously assume that the number of alphabets is 10. Let’s then assume that these alphabets are rotated through in a sequence that is 17 long (the number 17 is picked since it crops up as a feature of the VMs text in many places). Let’s not assume that the sequence is double, or reversed, or anything else: it’s just a sequence of alphabet numbers. Let’s assume that each alphabet contains 21 characters: abcdefghilmnopqrstuvx
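These assumptions can be written down as a small data model. This is only a sketch: the names `GLYPHS`, `PLAIN`, `empty_table` and `render` are my own, the glyph order is simply the column order used in the tables in this post, and storing each alphabet as a dictionary from Voynich glyph to plaintext letter is an assumption about representation, not a description of the program actually used.

```python
GLYPHS = "o9e18ahyck2iKsmHpFxg&"   # Voynich glyphs, in the tables' column order
PLAIN = "abcdefghilmnopqrstuvx"    # the assumed 21-letter plaintext alphabet
N_ALPHABETS = 10                   # assumed number of alphabets
SEQ_LEN = 17                       # assumed length of the alphabet sequence

def empty_table():
    """One dict per alphabet, mapping Voynich glyph -> plaintext letter."""
    return [dict() for _ in range(N_ALPHABETS)]

def render(table):
    """Format a table in the dotted layout used in this post."""
    lines = ["Voynich        " + " ".join(GLYPHS)]
    for i, alphabet in enumerate(table):
        cells = " ".join(alphabet.get(g, ".") for g in GLYPHS)
        lines.append(f"Alphabet {i}     {cells}")
    return "\n".join(lines)

table = empty_table()
table[0]["h"] = "a"   # Voynich "h" stands for "a" in Alphabet 0
print(render(table))
```

A dictionary per alphabet keeps unfilled cells implicit, which suits a table that starts empty and fills in slowly.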

We then take a sample of VMs text (I chose the first “paragraph” of f1v):

h1s9 1o8am oe oek1c9 1ay Fax ap 9kcc9 1ay oy o19 81o eho89 oho8ay 1o89 8o H9 HoH9 29 8h2ii9 K9 hok1o89 8ae 8oe 1ohco 8aiy 8ap so1c9 1o ho89

and, equipped with a large dictionary of Latin words, we start to build a possible cipher. To do this, we start by looking at the first VMs word “h1s9”, and pick a Latin word of the same length, at random: “acri”. With this pair we can start to construct the cipher table:

Voynich        o 9 e 1 8 a h y c k 2 i K s m H p F x g &
Alphabet 0     . . . . . . a . . . . . . . . . . . . . .
Alphabet 1     . . . c . . . . . . . . . . . . . . . . .
Alphabet 2     . . . . . . . . . . . . . r . . . . . . .
Alphabet 3     . i . . . . . . . . . . . . . . . . . . .
Alphabet 4     . . . . . . . . . . . . . . . . . . . . .
Alphabet 5     . . . . . . . . . . . . . . . . . . . . .
Alphabet 6     . . . . . . . . . . . . . . . . . . . . .
Alphabet 7     . . . . . . . . . . . . . . . . . . . . .
Alphabet 8     . . . . . . . . . . . . . . . . . . . . .
Alphabet 9     . . . . . . . . . . . . . . . . . . . . .

We continue with the next word: “1o8am” and a random Latin word of the same length: “paveo”, and update the table:

Voynich        o 9 e 1 8 a h y c k 2 i K s m H p F x g &
Alphabet 0     . . . . . . a . . . . . . . . . . . . . .
Alphabet 1     . . . c . . . . . . . . . . . . . . . . .
Alphabet 2     . . . . . . . . . . . . . r . . . . . . .
Alphabet 3     . i . . . . . . . . . . . . . . . . . . .
Alphabet 4     . . . p . . . . . . . . . . . . . . . . .
Alphabet 5     a . . . . . . . . . . . . . . . . . . . .
Alphabet 6     . . . . v . . . . . . . . . . . . . . . .
Alphabet 7     . . . . . e . . . . . . . . . . . . . . .
Alphabet 8     . . . . . . . . . . . . . . o . . . . . .
Alphabet 9     . . . . . . . . . . . . . . . . . . . . .

The next word is “oe” and the random Latin word is “do”. The “d” goes under the Voynich “o” column in Alphabet 9, and the Latin letter “o” then has to be placed under the Voynich “e” column in Alphabet 0, the next alphabet in the sequence:

Voynich        o 9 e 1 8 a h y c k 2 i K s m H p F x g &
Alphabet 0     . . o . . . a . . . . . . . . . . . . . .
Alphabet 1     . . . c . . . . . . . . . . . . . . . . .
Alphabet 2     . . . . . . . . . . . . . r . . . . . . .
Alphabet 3     . i . . . . . . . . . . . . . . . . . . .
Alphabet 4     . . . p . . . . . . . . . . . . . . . . .
Alphabet 5     a . . . . . . . . . . . . . . . . . . . .
Alphabet 6     . . . . v . . . . . . . . . . . . . . . .
Alphabet 7     . . . . . e . . . . . . . . . . . . . . .
Alphabet 8     . . . . . . . . . . . . . . o . . . . . .
Alphabet 9     d . . . . . . . . . . . . . . . . . . . .

We continue in this vein, picking random Latin words to match the VMs words, and attempting to place them into the cipher. This starts off easily, but rapidly becomes impossible with the Latin words chosen: when we come to place a letter into the required column of the current alphabet in the sequence, we find either that the position is already occupied by a different letter, or that the alphabet already contains that letter in a different column.
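The two kinds of conflict can be checked mechanically. The sketch below assumes each alphabet is held as a dictionary from Voynich glyph to Latin letter; `try_place` and `place_word` are illustrative names, and the tail of the 17-value sequence is arbitrary, since the worked example only fixes the first eleven values.

```python
def try_place(alphabet, glyph, letter):
    """Record glyph -> letter in one alphabet; return False on a conflict.

    Conflict 1: the glyph's column already holds a different letter.
    Conflict 2: the letter already sits in a different column.
    """
    if glyph in alphabet:
        return alphabet[glyph] == letter
    if letter in alphabet.values():
        return False
    alphabet[glyph] = letter
    return True

def place_word(table, sequence, pos, vms_word, latin_word):
    """Place a whole word starting at sequence position pos.

    Works on a copy, so a failed word leaves the table untouched.
    Returns (new_table, new_pos) on success, or None on a conflict.
    """
    if len(vms_word) != len(latin_word):
        return None
    trial = [dict(a) for a in table]
    for glyph, letter in zip(vms_word, latin_word):
        alphabet = trial[sequence[pos % len(sequence)]]
        if not try_place(alphabet, glyph, letter):
            return None
        pos += 1
    return trial, pos

# Replay the three worked words. Only the first eleven sequence values
# (0..9, then 0) come from the example; the remaining six are arbitrary.
sequence = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6]
table, pos = [dict() for _ in range(10)], 0
for vms, latin in [("h1s9", "acri"), ("1o8am", "paveo"), ("oe", "do")]:
    table, pos = place_word(table, sequence, pos, vms, latin)

print(table[0])  # → {'h': 'a', 'e': 'o'}
# With this sequence, "muniri" fails for "oek1c9": the glyph "1" lands in
# Alphabet 4, where its column already holds "p".
print(place_word(table, sequence, pos, "oek1c9", "muniri"))  # → None
```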

In such cases we try to select a different Latin word to see if it will fit. If we exhaust all possible Latin words, then we backtrack to the beginning, and start afresh with a new sequence and new choices.
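A rough sketch of this loop follows. It assumes the sequence is drawn at random at the start of each attempt and that a “dictionary” is just a list of lowercase Latin words; `fits`, `attempt`, `search` and the toy word list are hypothetical names and data for illustration, not the program actually used.

```python
import random

def fits(table, sequence, pos, vms_word, latin_word):
    """Return an updated copy of table if latin_word can be placed, else None."""
    trial = [dict(a) for a in table]
    for i, (glyph, letter) in enumerate(zip(vms_word, latin_word)):
        alphabet = trial[sequence[(pos + i) % len(sequence)]]
        if alphabet.get(glyph, letter) != letter:
            return None   # column already holds a different letter
        if glyph not in alphabet and letter in alphabet.values():
            return None   # letter already used in another column
        alphabet[glyph] = letter
    return trial

def attempt(vms_words, lexicon, rng, n_alphabets=10, seq_len=17):
    """One attempt: a random sequence, then random word choices.

    Returns (sequence, table, chosen); chosen is shorter than vms_words
    if every candidate for some word failed.
    """
    sequence = [rng.randrange(n_alphabets) for _ in range(seq_len)]
    table = [dict() for _ in range(n_alphabets)]
    chosen, pos = [], 0
    for vms in vms_words:
        candidates = [w for w in lexicon if len(w) == len(vms)]
        rng.shuffle(candidates)
        for latin in candidates:
            trial = fits(table, sequence, pos, vms, latin)
            if trial is not None:
                table, pos = trial, pos + len(vms)
                chosen.append(latin)
                break
        else:
            break   # all candidates exhausted: abandon this attempt
    return sequence, table, chosen

def search(vms_words, lexicon, restarts=1000, seed=0):
    """Start afresh up to `restarts` times; keep the attempt that got furthest."""
    rng = random.Random(seed)
    best = attempt(vms_words, lexicon, rng)
    for _ in range(restarts - 1):
        if len(best[2]) == len(vms_words):
            break
        result = attempt(vms_words, lexicon, rng)
        if len(result[2]) > len(best[2]):
            best = result
    return best

vms_words = "h1s9 1o8am oe".split()
lexicon = ["acri", "bono", "paveo", "herba", "do", "st"]   # toy word list
sequence, table, chosen = search(vms_words, lexicon)
print(sequence, chosen)
```

Keeping the attempt that got furthest, rather than only complete solutions, makes it possible to look at the occasional run that gets a long way into the text.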

Most of the time, this algorithm doesn’t get further than a few words into the text before failing. Occasionally it gets quite a long way. Of course, the search space of possible Latin word combinations is staggering …
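To put a rough number on part of that, here is the size of the key space alone under the assumptions made earlier (10 alphabets, a sequence 17 long, 21 letters per alphabet). It says nothing about the number of Latin word combinations, which depends on the dictionary, and it assumes every alphabet is a complete one-to-one mapping.

```python
import math

N_ALPHABETS, SEQ_LEN, N_LETTERS = 10, 17, 21

sequences = N_ALPHABETS ** SEQ_LEN         # choices of alphabet sequence
one_alphabet = math.factorial(N_LETTERS)   # one-to-one fillings of one alphabet
tables = one_alphabet ** N_ALPHABETS       # fillings of all ten alphabets

print(f"sequences:    {sequences:.1e}")
print(f"one alphabet: {one_alphabet:.1e}")
print(f"full tables:  about 10^{int(math.log10(tables))}")
```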

This is one of the more interesting attempts at deciphering f1v:

Voynich          o 9 e 1 8 a h y c k 2 i K s m H p F x g &
Alphabet 0       l . t a s . b o f . n . . . . . . . . . .
Alphabet 1       m i . o l b . . r . . t . . . g . . . . .
Alphabet 2       f a u e . . . s . . . i . n . . . . . . .
Alphabet 3       . o . v f . i . . n u . . . . . . e . . .
Alphabet 4       . a . h d i . . . . . . o . . . . . . . .
Alphabet 5       e s . i t o . . . . . . . . . . . . a . .
Alphabet 6       u . . . r s c . . . . . . . . . . . . . .
Alphabet 7       u . i . n b . s r . . . . . . m t . . . .
Alphabet 8       e i . . . d u . . c . . . . a . . . . . .
Alphabet 9       s . . u . . . . . n . . . . . a . . . . .
Sequence vals = 0 1 2 3 4 5 6 7 8 9 0 1 2 3 5 7 8

h1s9 1o8am oe oek1c9 1ay Fax ap 9kcc9 1ay oy o19 81o eho89 oho8ay 1o89 8o H9 HoH9 29 8h2ii9 K9 hok1o89 8ae 8oe 1ohco 8aiy 8ap so1c9 1o ho89

bono herba st muniri abs eia st infra vos eo meo diu iussi fiendo offa tu mi alga us nuntio os cuculla
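The table and sequence above can be applied mechanically to check the reading. In this sketch the rows and sequence values are copied from the table above; the function name `decrypt`, and the assumption that the sequence position advances once per glyph and carries on across word breaks, are mine, although that assumption does reproduce the opening words.

```python
GLYPHS = "o9e18ahyck2iKsmHpFxg&"

# Rows copied from the table above, Alphabet 0 first; "." marks an empty cell.
ROWS = [
    "l . t a s . b o f . n . . . . . . . . . .",
    "m i . o l b . . r . . t . . . g . . . . .",
    "f a u e . . . s . . . i . n . . . . . . .",
    ". o . v f . i . . n u . . . . . . e . . .",
    ". a . h d i . . . . . . o . . . . . . . .",
    "e s . i t o . . . . . . . . . . . . a . .",
    "u . . . r s c . . . . . . . . . . . . . .",
    "u . i . n b . s r . . . . . . m t . . . .",
    "e i . . . d u . . c . . . . a . . . . . .",
    "s . . u . . . . . n . . . . . a . . . . .",
]
SEQUENCE = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 5, 7, 8]

TABLE = [dict(zip(GLYPHS, row.split())) for row in ROWS]

def decrypt(text, table=TABLE, sequence=SEQUENCE):
    """Decrypt space-separated VMs words; unfilled cells come out as '.'."""
    words, pos = [], 0
    for word in text.split():
        letters = []
        for glyph in word:
            alphabet = table[sequence[pos % len(sequence)]]
            letters.append(alphabet.get(glyph, "."))
            pos += 1   # the position runs on across word breaks
        words.append("".join(letters))
    return " ".join(words)

print(decrypt("h1s9 1o8am oe oek1c9 1ay"))  # → bono herba st muniri abs
```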
