Vowel-less plaintext

Suppose the words of the Voynich Manuscript (VMs) contain no vowels, and that the text was created from vowel-less plaintext by a simple alphabetic substitution.

I used a Genetic Algorithm (GA) to test this hypothesis on some of the naked lady labels in the Balneological section. Using a large Latin dictionary, I stripped all vowels (“aeiou”) from the Latin words, giving me a set of vowel-less Latin words. The GA then used this set to search for the best 1-1 mapping between VMs glyphs and Latin letters.
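The vowel-stripping step can be sketched in a few lines of Python. This is only an illustration: the function names and the five-word sample list are assumptions standing in for the large Latin dictionary, which is not reproduced here.

```python
VOWELS = set("aeiou")

def strip_vowels(word: str) -> str:
    """Return the consonant skeleton of a word, e.g. 'natura' -> 'ntr'."""
    return "".join(ch for ch in word.lower() if ch not in VOWELS)

def build_skeleton_set(words):
    """Collect the distinct, non-empty vowel-less forms of a word list."""
    return {s for s in map(strip_vowels, words) if s}

# Hypothetical stand-in for the large Latin dictionary.
sample_words = ["natura", "inter", "regis", "canis", "fas"]
skeletons = build_skeleton_set(sample_words)
print(sorted(skeletons))  # ['cns', 'fs', 'ntr', 'rgs']
```

Note that "natura" and "inter" collapse to the same skeleton, "ntr", which is why a single deciphered label can correspond to many voweled words.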

Here is a table of the starting statistics. The “Source” is the VMs (in the Voyn_101 encoding) and the “Target” is vowel-less Latin. For each side, the table gives the glyph or letter, its total number of occurrences, and that number as a fraction of the total. The rows are in order of glyph/letter frequency.

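There are 16 VMs glyphs and 21 Latin letters (the 26-letter alphabet minus the five vowels), as the first line of the program output below reports. Only the 16 most frequent Latin letters are listed.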

16 Voynich nGrams 21 plaintext nGrams
Top 16 1-grams in Voynich and 1-grams in plaintext
Source                      Target
Glyph  Count  Fraction      Letter  Count  Fraction
-----  -----  --------      ------  -----  --------
o         52  0.21311475    s        7666  0.14250925
e         35  0.14344262    r        7450  0.13849385
9         30  0.12295082    t        7053  0.13111371
8         27  0.11065574    n        5706  0.10607328
a         25  0.10245901    c        4386  0.08153477
h         25  0.10245901    m        4340  0.08067964
y         17  0.06967213    l        3707  0.06891231
2          6  0.024590164   p        3079  0.05723793
k          5  0.020491803   d        2790  0.051865485
c          5  0.020491803   b        1725  0.03206737
i          5  0.020491803   v        1424  0.026471846
1          4  0.016393442   f        1372  0.025505178
s          3  0.012295082   g        1347  0.025040433
N          2  0.008196721   q         600  0.0111538675
4          2  0.008196721   h         509  0.009462197
g          1  0.0040983604  x         499  0.0092763
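Each fraction in the table is a symbol's count divided by the total for its side. For the VMs labels the counts sum to 244, so “o” gets 52/244 ≈ 0.2131. A minimal Python sketch of such a 1-gram table follows. The function name is an assumption, and the three labels are only a small sample, so the numbers it produces will not match the table.

```python
from collections import Counter

def unigram_table(tokens):
    """Return (symbol, count, fraction) rows, most frequent first."""
    counts = Counter("".join(tokens))
    total = sum(counts.values())
    return [(sym, n, n / total) for sym, n in counts.most_common()]

# Three labels taken from the list of labels in this post (a sample only).
labels = ["oeae9", "oe189", "oha89"]
for sym, n, frac in unigram_table(labels):
    print(f"{sym}    {n}    {frac:.8f}")
```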

To run the GA, I used a simple weighting function that added the square of the length of every label that was decoded into a valid plaintext word.
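A weighting function of this kind might look like the Python sketch below. It is not the original program. It assumes that “length” means the number of glyphs in the label, that a null is represented by an empty string, and that a label decoding to nothing scores zero. The mapping is an excerpt of the one derived later in this post, and the skeleton set is a tiny stand-in for the vowel-less dictionary.

```python
def decode(label, mapping):
    """Substitute each glyph by its mapped letter ('' acts as a null)."""
    return "".join(mapping[glyph] for glyph in label)

def fitness(mapping, labels, skeletons):
    """Sum of squared label lengths over labels that decode to a valid word."""
    score = 0
    for label in labels:
        plain = decode(label, mapping)
        if plain and plain in skeletons:
            score += len(label) ** 2  # assumption: length counted in glyphs
    return score

# Excerpt of a candidate mapping; "o" is treated as a null.
mapping = {"o": "", "e": "r", "a": "t", "9": "s", "8": "n", "h": "v", "y": "f"}
labels = ["oeae9", "8ae", "ohay"]
skeletons = {"rtrs", "ntr"}  # hypothetical stand-in for the dictionary

print(fitness(mapping, labels, skeletons))  # 25 + 9 = 34
```

Squaring the length rewards long matches much more than short ones, which matters because short skeletons such as “vs” or “cs” match many Latin words by chance.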

Here are the results of one run, in which just under half of the labels (25 of 53, about 47%) were converted. First, the derived mapping between VMs glyphs and Latin consonants:

Voynich: c  1  k  2  y  i  h  s  o  a  4  8  e  N  9  g
Plain:   l  g  c  p  f  x  v  y  -  t  q  n  r  d  s  b

Note that the GA has assigned VMs “o” to a null (shown as “-” above) …
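Applying this mapping is a per-glyph lookup. Here is a minimal Python sketch with the table above transcribed by hand and the null for “o” represented as an empty string. The names `MAPPING` and `decode` are illustrative and not taken from the original program.

```python
# Glyph -> consonant mapping from the run above; "o" maps to a null.
MAPPING = {
    "c": "l", "1": "g", "k": "c", "2": "p", "y": "f", "i": "x",
    "h": "v", "s": "y", "o": "",  "a": "t", "4": "q", "8": "n",
    "e": "r", "N": "d", "9": "s", "g": "b",
}

def decode(label: str) -> str:
    """Decipher a Voyn_101 label glyph by glyph, dropping nulls."""
    return "".join(MAPPING[glyph] for glyph in label)

print(decode("oeae9"))     # rtrs
print(decode("8ae28"))     # ntrpn
print(decode("ohaiya89"))  # vtxftns
```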

Now here are the deciphered labels. A deciphered string that matched a vowel-less dictionary word is marked with a trailing apostrophe and followed by the voweled Latin words it may correspond to:

Source  : oeae9
Decipher: rtrs' : oratorius
Source  : oe189
Decipher: rgns' : origines
Source  : oha89
Decipher: vtns
Source  : ohoeo
Decipher: vr' : varia varie ver vera vere veri vero vir viro voro avara
Source  : ohoy9
Decipher: vfs
Source  : ogoy
Decipher: bf
Source  : oeh9
Decipher: rvs' : rivos
Source  : ohaN
Decipher: vtd
Source  : ohay
Decipher: vtf
Source  : oh29
Decipher: vps
Source  : sayae
Decipher: ytftr
Source  : 8ohae
Decipher: nvtr' : invetero
Source  : 8ayoe
Decipher: ntfr
Source  : 8ae89
Decipher: ntrns' : nutriens internus
Source  : 8ae28
Decipher: ntrpn' : interpono
Source  : 8aehay
Decipher: ntrvtf
Source  : 4ohae
Decipher: qvtr
Source  : 8e9
Decipher: nrs' : inrisuo iners
Source  : oy9
Decipher: fs' : fas
Source  : ok9
Decipher: cs' : acies acsi causa causae cuius iaces iocus ocius casa casia cos
Source  : e19
Decipher: rgs' : erigis reges regius rgis rugas regis
Source  : 8ay9
Decipher: ntfs
Source  : 8ae
Decipher: ntr' : antra inter interea intereo intra intro intueor natura naturae nitor nutrio nitori enitor enutrio ianitor notoare
Source  : 8ae89
Decipher: ntrns' : nutriens internus
Source  : 4oko8
Decipher: qcn
Source  : yhae
Decipher: fvtr
Source  : 9hc89
Decipher: svlns
Source  : oeh19
Decipher: rvgs
Source  : oko89
Decipher: cns' : canis canos cinis consui consuo censeo cuneus
Source  : ohay
Decipher: vtf
Source  : ohae
Decipher: vtr' : vetera viatori vitrea veter viator
Source  : ohoe89
Decipher: vrns
Source  : ohaiya89
Decipher: vtxftns
Source  : oh1oy
Decipher: vgf
Source  : oeaiiN
Decipher: rtxxd
Source  : 8oeoe
Decipher: nrr' : narro
Source  : sohoe9
Decipher: yvrs
Source  : oeha
Decipher: rvt
Source  : h9
Decipher: vs' : evasi ovis vasa vias viis vis visa visu vos avus vas visio
Source  : soyoye
Decipher: yffr
Source  : oeoeae
Decipher: rrtr
Source  : oy
Decipher: f' : fio fui f of
Source  : 2chay
Decipher: plvtf
Source  : 989
Decipher: sns' : sanes sanies sanus senis sensa sensi sensu sonas sinus
Source  : ohc89
Decipher: vlns' : valens volans volens vulnus
Source  : eoe9
Decipher: rrs' : rarus ruris rarius
Source  : 8oiiy
Decipher: nxxf
Source  : oe29
Decipher: rps' : repsi
Source  : okc89
Decipher: clns' : colonus
Source  : ehoe
Decipher: rvr' : revera
Source  : ohoe29
Decipher: vrps
Source  : oko89
Decipher: cns' : canis canos cinis consui consuo censeo cuneus
Source  : 82c89
Decipher: nplns
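The step from a deciphered consonant string back to candidate voweled words amounts to a reverse lookup keyed on the vowel-less form. Below is a minimal Python sketch of that idea. The function names and the six-word sample are assumptions for illustration, not the dictionary or code actually used.

```python
from collections import defaultdict

VOWELS = set("aeiou")

def skeleton(word):
    """Vowel-less form of a word, e.g. 'valens' -> 'vlns'."""
    return "".join(ch for ch in word.lower() if ch not in VOWELS)

def build_index(words):
    """Map each vowel-less form to the dictionary words that produce it."""
    index = defaultdict(list)
    for word in words:
        index[skeleton(word)].append(word)
    return index

def candidates(deciphered, index):
    """Voweled words matching a deciphered string (empty list if none)."""
    return sorted(index.get(deciphered, []))

# Hypothetical sample; the real run used a large Latin dictionary.
latin_sample = ["inter", "natura", "intra", "valens", "vulnus", "narro"]
index = build_index(latin_sample)

print(candidates("ntr", index))   # ['inter', 'intra', 'natura']
print(candidates("vlns", index))  # ['valens', 'vulnus']
print(candidates("vtf", index))   # []
```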
  1. June 14, 2011 at 3:33 pm

    For me, this isn’t quite engaging with Voynichese: there’s strong statistical evidence of abbreviation (as per Mark Perakh, who basically asks why else A and B words should have different length profiles), while scribal vowel contraction merely presents an abjad-like face to the world without being a pure abjad. Even allowing for vowel contraction, you’re then left with the problem of the gallows characters – could they ever code for just one letter each ever in the plaintext? I think not. Lots to think about!

    • JB
      June 14, 2011 at 5:19 pm

      A vowel-less alphabet would look like abbreviation to the uninitiated. Comparing A and B isn’t much use, since the encoding could be (and probably is) different. (I haven’t seen Mark Perakh’s work – do you have a pointer?)

      What’s the problem with the gallows characters? What makes you believe they are so special?

      I don’t know, Nick, I just like to try new (to me) ideas, without over-analysing why they might not work 🙂

  2. June 15, 2011 at 12:48 am

    Hi Julian,

    Mark Perakh’s LSC papers 1 & 2 are here:-
    The abbreviation discussion is most of the way through the second paper, good work well worth engaging with. 🙂

    As far as the gallows go, I’m probably being too enigmatically Zen master-y here: but whereas I can see how (EVA) qo, d and y sensibly code for scribal abbreviations [subscriptio, contractio, and abbreviatio respectively], aiin steganographically codes for (say) Arabic numerals, e/ee/eee/ch/sh combinations somehow code for vowels, and o-/a-/y- pairs code for consonants, I’m still basically at a loss as to how to rationalize the gallows. That is, I can comfortably see the entire Voynichese alphabet in terms of an enciphered abbreviating scribal shorthand consistent with everything else we know about the manuscript’s history… except for the gallows. For me, the gallows are the thing that make the mystery most intense: the author(s) already had a system that was far from obvious, so why bother to crank the complexity up that extra notch by adding gallows?

    Just so you know! 🙂

    Cheers, ….Nick….
