## How about a “Verbose Homophonic cipher”?

I’ve had a bit of hiatus from the VMs, but it’s always popping up in my mind and niggling me, even when I haven’t got time to spend on it. The latest niggle was the idea that the VMs scribe used a set of simple tables that showed how to convert plaintext letters into codes. So, in an example table, letter “A” is written “4oh”, letter “B” is written “8am” and so on. Also, spaces in the plaintext have their own code. Veteran VMs researcher Philip Neal informed me that this is called a “verbose homophonic cipher”.

Elaborating on the idea: the scribe uses one of the set of tables for each folio s/he is writing. To encipher the plaintext onto the folio, it’s simply a matter of writing down the VMs “word” for each letter in the plaintext word. If there is more space on the line for the next plaintext word, the scribe writes down the code for space, and then the codes for the letters in the next word. Long spaces are written by writing the code for space more than once … The next line is used for the next word, and so on.

On the next folio, a different table may be used.

It’s hard to imagine the justification for such a scheme, but it does appear (at least initially) to fit some of the features of the VMs script (especially the repeating VMs words often seen).

I made a quick test that looks at VMs word frequencies on a single folio (in the Recipes section, which has the densest text). These showed a word frequency distribution that looks similar to the letter frequency distribution in Latin, apart from the most frequently occurring word (which is much more frequent) and which it is suggested would code for a space in the cipher.

However, on a typical folio, there are usually many more VMs words than there are plaintext letters. So the scheme has to be extended to allow the scribe a **choice** between several different VMs words to encode a single letter. Each table must have a set of words appearing in each plaintext letter column. Something like this:

Plaintext | (space) | a | b | … |

VMs words | 8am ay okoe | 4ohoe 2ay 1coe | faiis 4ay oka | … |

If this is indeed the scheme, one would expect to see patterns in the VMs word sequences that match patterns seen in the letter sequences of e.g. Latin words. Also, as Philip Neal pointed out, patterns like “word1 word2 word2 word1” would indicate a plaintext letter sequence of either “vowel consonant consonant vowel” or vice versa.

Looking through the whole of the VMs for sequence patterns (on the same line of text), I found the following:

- There are no 4 word sequences that repeat at all
- There are only four 3 word sequences that repeat, and each only twice
- There are no sequences at all of the form “xyyx”

(all of which I find rather surprising, and thought provoking).

So it looks like this hypothesis is dead in the water, and can be ticked off that long list of “things it might have been but in fact don’t fit”!

(It turns out that Elmar Vogt has been working on a related, but more sophisticated, idea which he describes on his blog and is called a “Stroke Theory”.)

## The Cipher Black Box

This is how the cipher works: a plaintext is fed in to the “Black Box” and, based on a set of unknown parameters P, is enciphered by an unknown process in the box into the ciphertext, which is written onto the folio.

The internal mechanism of the Black Box is unknown. The only information we have is the output ciphertext (but we have a lot of it).

What strategies can we employ to discover how the internal mechanism of the Black Box works? What are the input parameters to the cipher? Can we avoid having to find out about the mechanism, and directly reverse the cipher, so yielding the plaintext from the ciphertext?

If we could “prod” the Box as enciphering was taking place (e.g. by twiddling one of the input parameters P_{i}), then by observing the output ciphertext, we could start to build a theory about how the cipher works.

Suppose we devise a candidate cipher that, from a sequence of random plaintext letters, and using a set of parameters we choose, generates a chosen line of VMs text exactly. Mathematically:

ciphertext = C(plaintext,P_{0}…P_{i})

where C is the cipher function. We can surely find a function that satisfies these conditions: some sort of State Machine is perhaps most appropriate. When twiddling one of the parameters P_{i}, and measuring the change in the ciphertext, we are able to calculate the partial derivative of C with respect to P_{i}:

δC/δP_{i} = Δ(ciphertext)

If the variation on the ciphertext produces new ciphertext that is compatible with the rest of the ciphertext in the VMs, then this would suggest that P_{i} is a good parameter (i.e. that it is a candidate for retention in the candidate cipher). If, on the other hand, the variation produces invalid new ciphertext, the parameter is poor, and is a candidate for removal from the candidate cipher.

## Potential algorithm:

- Create a state machine containing a cipher C that takes (many) Parameters P
- The state machine parameters are adjusted so that a (random) plaintext input string produces a valid sequence of VMs ciphertext
- Each parameter is varied slightly in value, and the state machine asked to reprocess the plaintext to produce new ciphertext
- The similarity of the new ciphertext is compared with the VMs corpus
- The parameter in question is demoted or promoted in weight
- Once the effects of all parameters have been examined they are graded by weight
- Good parameters are kept, poor ones discarded
- Develop an overall score for the state machine, based on a suitable metric
- Memorize this state machine if it has the best score so far
- Go to step 1, but now with the reduced set of parameters

This process is repeated for many different trial state machines, and the best memorized.

## Inverting the Cipher

TBA

## Strong’s Cipher

Strong’s “peculiar system of a double reversed arithmetic progression of a multiple alphabet” is a puzzling description, but GC recently (Feb 2010) explained it as “”double reversed arithmetic progression” as defined by the string 1-3-5-7-9-7-5-3-1-4-7-4″ (although I think the sequence given is an example, rather than the definition). The number of alphabets is “a handful”.

If we suppose that the cipher is indeed constructed like this, then can we crack it computationally?

First we need to make some assumptions. Let’s generously assume that the number of alphabets is 10. Let’s then assume that these alphabets are rotated through in a sequence that is 17 long (the number 17 is picked since it crops up as a feature of the VMs text in many places). Let’s not assume that the sequence is double, or reversed, or anything else: it’s just a sequence of alphabet numbers. Let’s assume that each alphabet contains 21 characters: abcdefghilmnopqrstuvx

We then take a sample of VMs text (I chose the first “paragraph” of f1v)

h1s9 1o8am oe oek1c9 1ay Fax ap 9kcc9 1ay oy o19 81o eho89 oho8ay 1o89 8o H9 HoH9 29 8h2ii9 K9 hok1o89 8ae 8oe 1ohco 8aiy 8ap so1c9 1o ho89

and, equipped with a large dictionary of Latin words, we start to build a possible cipher. To do this, we start by looking at the first VMs word “h1s9”, and pick a Latin word of the same length, at random: “acri”. With this pair we can start to construct the cipher table:

Voynich o 9 e 1 8 a h y c k 2 i K s m H p F x g & Alphabet 0 . . . . . . a . . . . . . . . . . . . . . Alphabet 1 . . . c . . . . . . . . . . . . . . . . . Alphabet 2 . . . . . . . . . . . . . r . . . . . . . Alphabet 3 . i . . . . . . . . . . . . . . . . . . . Alphabet 4 . . . . . . . . . . . . . . . . . . . . . Alphabet 5 . . . . . . . . . . . . . . . . . . . . . Alphabet 6 . . . . . . . . . . . . . . . . . . . . . Alphabet 7 . . . . . . . . . . . . . . . . . . . . . Alphabet 8 . . . . . . . . . . . . . . . . . . . . . Alphabet 9 . . . . . . . . . . . . . . . . . . . . .

We continue with the next word: “1o8am” and a random Latin word of the same length: “paveo”, and update the table:

Voynich o 9 e 1 8 a h y c k 2 i K s m H p F x g & Alphabet 0 . . . . . . a . . . . . . . . . . . . . . Alphabet 1 . . . c . . . . . . . . . . . . . . . . . Alphabet 2 . . . . . . . . . . . . . r . . . . . . . Alphabet 3 . i . . . . . . . . . . . . . . . . . . . Alphabet 4 . . . p . . . . . . . . . . . . . . . . . Alphabet 5 a . . . . . . . . . . . . . . . . . . . . Alphabet 6 . . . . v . . . . . . . . . . . . . . . . Alphabet 7 . . . . . e . . . . . . . . . . . . . . . Alphabet 8 . . . . . . . . . . . . . . o . . . . . . Alphabet 9 . . . . . . . . . . . . . . . . . . . . .

The next word is “eo” and the random Latin word is “do”. Now the Latin letter “o” has to be placed under the Voynich “o” column in Alphabet 0:

Voynich o 9 e 1 8 a h y c k 2 i K s m H p F x g & Alphabet 0 . . o . . . a . . . . . . . . . . . . . . Alphabet 1 . . . c . . . . . . . . . . . . . . . . . Alphabet 2 . . . . . . . . . . . . . r . . . . . . . Alphabet 3 . i . . . . . . . . . . . . . . . . . . . Alphabet 4 . . . p . . . . . . . . . . . . . . . . . Alphabet 5 a . . . . . . . . . . . . . . . . . . . . Alphabet 6 . . . . v . . . . . . . . . . . . . . . . Alphabet 7 . . . . . e . . . . . . . . . . . . . . . Alphabet 8 . . . . . . . . . . . . . . o . . . . . . Alphabet 9 d . . . . . . . . . . . . . . . . . . . .

We continue in this vein, picking random Latin words to match the VMs words, and attempting to place them into the cipher. This starts off easily, but rapidly becomes impossible, with the Latin words chosen: when we come to place a letter into the required column at the current alphabet in the sequence, we find that the position is already occupied by a different letter, or that the alphabet already contains that letter but in a different column.

In such cases we try to select a different Latin word to see if it will fit. If we exhaust all possible Latin words, then we backtrack to the beginning, and start afresh with a new sequence and new choices.

Most of the time, this algorithm doesn’t get further than a few words into the text before failing. Occasionally it gets quite a long way. Of course, the search space of possible Latin word combinations is staggering …

This is one of the more interesting attempts at deciphering f2v:

Voynich o 9 e 1 8 a h y c k 2 i K s m H p F x g & Alphabet 0 l . t a s . b o f . n . . . . . . . . . . Alphabet 1 m i . o l b . . r . . t . . . g . . . . . Alphabet 2 f a u e . . . s . . . i . n . . . . . . . Alphabet 3 . o . v f . i . . n u . . . . . . e . . . Alphabet 4 . a . h d i . . . . . . o . . . . . . . . Alphabet 5 e s . i t o . . . . . . . . . . . . a . . Alphabet 6 u . . . r s c . . . . . . . . . . . . . . Alphabet 7 u . i . n b . s r . . . . . . m t . . . . Alphabet 8 e i . . . d u . . c . . . . a . . . . . . Alphabet 9 s . . u . . . . . n . . . . . a . . . . . Sequence vals = 0 1 2 3 4 5 6 7 8 9 0 1 2 3 5 7 8

h1s9 1o8am oe oek1c9 1ay Fax ap 9kcc9 1ay oy o19 81o eho89 oho8ay 1o89 8o H9 HoH9 29 8h2ii9 K9 hok1o89 8ae 8oe 1ohco 8aiy 8ap so1c9 1o ho89bono herba st muniri abs eia st infra vos eo meo diu iussi fiendo offa tu mi alga us nuntio os cuculla