
Single glyphs in Language A and Language B

As a sanity check, I looked at single glyphs (rather than n-grams longer than 1), searching for the mapping from Language B glyphs to Language A glyphs for which the converted Language B words most closely match the word frequencies of Language A. I found the following:
Chromosome  ['o', '9', '1', 'a', 'H', 'c', 'e', 'h', 'y', 'k', '2', 's', 'm', '4', 'i', '(', '8', 'p', 'g', 'n']
ngramsA     ['o', '9', '1', 'a', '8', 'c', 'e', 'h', 'y', 'k', '2', 's', 'm', '4', 'g', 'i', 'K', 'p', '?', 'n']

This shows that most Language B glyphs map to the same glyph in Language A. However, there is some mixing going on here between “H”, “8”, “i”, “g”, “(”, “K” and “?”.
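A minimal sketch of how a candidate glyph mapping could be scored, assuming the score is the summed absolute difference between relative word frequencies in the two languages. The post does not give the actual fitness function, so `apply_mapping`, `mapping_score` and the toy word lists are hypothetical, for illustration only.

```python
from collections import Counter

def apply_mapping(word, mapping):
    """Translate each glyph of a Language B word; unmapped glyphs pass through."""
    return "".join(mapping.get(g, g) for g in word)

def mapping_score(words_b, words_a, mapping):
    """Sum of absolute differences in relative word frequency (lower is better).

    Assumption: this stands in for whatever fitness the real search used.
    """
    freq_a = Counter(words_a)
    freq_b = Counter(apply_mapping(w, mapping) for w in words_b)
    total_a = sum(freq_a.values())
    total_b = sum(freq_b.values())
    vocab = set(freq_a) | set(freq_b)
    return sum(abs(freq_a[w] / total_a - freq_b[w] / total_b) for w in vocab)

# Toy word lists, made up for illustration (not taken from the transcription).
words_a = ["oe", "oe", "ak"]
words_b = ["oy", "oy", "ah"]

identity = {}
swapped = {"y": "e", "e": "y", "h": "k", "k": "h"}

print(mapping_score(words_b, words_a, identity))
print(mapping_score(words_b, words_a, swapped))
```

On this toy data the swapped mapping scores 0, because the converted B words then have exactly the same frequencies as the A words. A search (genetic or otherwise) would try many mappings and keep the one with the lowest score.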

It occurred to me that this may be due to GC’s choice of ascribing single glyphs where there should perhaps be several. In particular, he has:
“m” which looks like “iiN”
“n” which looks like “iN”
“M” which looks like “iiiN”
(I think EVA does a better job of recognizing these.) So I adjusted the GC transcription accordingly, replacing n,m,M with the i,N combinations above.
This resulted in a new mapping for B to A:
Chromosome  ['o', '9', '1', 'a', 'i', 'g', 'c', 'y', 'k', 'e', 'h', 'N', '2', 's', '4', '(', '8', 'p', 'f', 'H']
ngramsA     ['o', '9', '1', 'a', 'i', '8', 'c', 'e', 'h', 'y', 'k', 'N', '2', 's', '4', 'g', 'K', 'p', '?', 'H']
(There may be better mappings, but this is the best so far.) This has some interesting features:
  • e and y swap between languages
  • h and k gallows swap between languages
  • some mixing of g,8,(,K,f,? – some of these are relatively rare, so the statistics are poor, which may explain the mixing.
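The features listed above can be read straight off the two lists. A small sketch, assuming that position i of `Chromosome` is a Language B glyph and position i of `ngramsA` is the Language A glyph it maps to (the variable names are mine):

```python
chromosome = ['o', '9', '1', 'a', 'i', 'g', 'c', 'y', 'k', 'e',
              'h', 'N', '2', 's', '4', '(', '8', 'p', 'f', 'H']
ngrams_a = ['o', '9', '1', 'a', 'i', '8', 'c', 'e', 'h', 'y',
            'k', 'N', '2', 's', '4', 'g', 'K', 'p', '?', 'H']

# Assumption: the lists are aligned, so zip pairs each B glyph with its A image.
b_to_a = dict(zip(chromosome, ngrams_a))

# Keep only the glyphs that do not map to themselves.
changed = {b: a for b, a in b_to_a.items() if b != a}

# A swap is a pair where b -> a and a -> b both appear in the mapping.
swaps = sorted({tuple(sorted((b, a)))
                for b, a in changed.items() if changed.get(a) == b})

# Everything else that changed is one-way "mixing".
others = {b: a for b, a in changed.items() if changed.get(a) != b}

print(swaps)   # [('e', 'y'), ('h', 'k')]
print(others)  # {'g': '8', '(': 'g', '8': 'K', 'f': '?'}
```

The two clean swaps are e/y and h/k. The remaining entries form the one-way chain ( → g → 8 → K, plus f → ?.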
Note that the simplification table I’m using for Voyn_101 is currently:
    seek = ["3",   "5",    "+",  "%",   "#", "6", "7",    "A", "X",  
            "I",   "C",    "z",  "Z",   "j", "u", "d",    "U", "P", 
            "Y",   "$",    "S",  "t",   "q",
            "m",   "M",    "n",  "Y",   "!"]
    repl = ["2",   "2",    "2",  "2",   "2", "8",  "8",   "a", "y",  
            "ii",  "cc",   "iy", "iiy", "g", "f",  "ccc", "F", "ip",
            "y",   "s",    "cs", "s",   "iip",
            "iiN", "iiiN", "iN",  "y",   "2"]
(Thanks to Tony Gaffney for spotting an error in the conversion for C in a previous version.)
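For reference, a minimal sketch of how the table above could be applied to a transcribed word. It assumes the substitution is done glyph by glyph in a single pass. The helper name `simplify` and the sample words are hypothetical.

```python
seek = ["3", "5", "+", "%", "#", "6", "7", "A", "X",
        "I", "C", "z", "Z", "j", "u", "d", "U", "P",
        "Y", "$", "S", "t", "q",
        "m", "M", "n", "Y", "!"]
repl = ["2", "2", "2", "2", "2", "8", "8", "a", "y",
        "ii", "cc", "iy", "iiy", "g", "f", "ccc", "F", "ip",
        "y", "s", "cs", "s", "iip",
        "iiN", "iiiN", "iN", "y", "2"]

# "Y" is listed twice in seek; both entries map to "y", so the dict is unambiguous.
table = dict(zip(seek, repl))

def simplify(word, table=table):
    """Replace each Voyn_101 glyph by its simplified form; other glyphs are kept."""
    return "".join(table.get(glyph, glyph) for glyph in word)

# Made-up sample words, only to show the substitution at work.
print(simplify("okam"))   # okaiiN
print(simplify("4oCc9"))  # 4occc9
```

Because none of the replacement strings contains a glyph that appears in `seek`, a single pass gives the same result as applying the replacements one after another.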
Categories: Algorithms, Languages
  1. March 4, 2013 at 1:15 pm

    “What is your best guess for y, e, p, 8? From my results so far there is something odd going on with these glyphs.”

    Julian

    “y” & “e” – two hands doing the same operation of adding the ascender or descender to a pre-existing single “i” in roughly equal proportions, one hand has a natural bias or preference for one way, the other hand the opposite way.

    The same bias/preference applies to the “h” & “k” gallows swap.
    “p” & “8” – (I take it you mean “g” & “8”) no idea what’s occurring there but as the “?” maps to anything at all when it only represents unreadable characters, I’d be a little wary of what that program is actually doing.

    “(” – I regard as just another 9, the tails vary quite a lot from sweeping curve to being straight (another nuance too far).

    Tony

    • JB
      March 4, 2013 at 1:49 pm

      Tony – great insight.

      It turned out as I ran more of these, that it’s a great way of simplifying Voyn_101 – after a while of tweaking I ended up with the following table. I’m pretty confident these are all the basic glyphs, and none are composites:

      seek = ["3", "5", "+", "%", "#", "6", "7", "A", "X",
              "I", "C", "z", "Z", "j", "u", "d", "U", "P",
              "Y", "$", "S", "t", "q",
              "m", "M", "n", "Y", "!", ")", "*", "b", "J", "E", "x", "B", "D", "T", "Q", "W", "w", "V", "(", "&"]
      repl = ["2", "2", "2", "2", "2", "8", "8", "a", "y",
              "ii", "cc", "iy", "iiy", "g", "f", "ccc", "F", "ip",
              "y", "s", "cs", "s", "iip",
              "iiN", "iiiN", "iN", "y", "2", "9", "p", "y", "G", "c", "y", "cccN", "ccN", "s", "p", "h", "h", "K", "9", "8"]

      (I agree with the “(” as you can see.)

      What I am seeing is a persistent appearance of “e” and “y” in swapped positions between the languages, and then typically a mixture of the gallows and “8”. E.g.

      Chromosome  ['o', '9', '1', 'a', 'i', 'f', 'c', 'y', 'h', 'e', 'K', 'N', '2', 's', '4', 'g', 'p', '8', 'k', 'H']
      ngramsA     ['o', '9', '1', 'a', 'i', '8', 'c', 'e', 'h', 'y', 'k', 'N', '2', 's', '4', 'g', 'p', 'K', '?', 'H']

      If the “e”, “y” is a preference, why would both appear – why not just use one? Could these glyphs be indicators of a code change?

  2. March 4, 2013 at 2:32 pm

    Will look at your table later – as to why both would appear – it was to give it variety, to make it look like writing. I believe the h,k,j & g are all varieties of the same character as well.

  3. March 4, 2013 at 3:31 pm

    I think “w” should map to “f” rather than “h”.
    You’ve got the bracket “(” that matches 9 the wrong way round – “)” stands for something else.
    “V” and “K” are not the same – one begins with i and the other c, they are definitely different or distinct in that sense.
    I can’t think of any reason why 8 should map to a gallows.

    “indicators of a code change?” – I don’t believe it’s code or cipher at all.
