PianoWow

Russian Keyboard to English Keyboard Revisited

Posted by pianowow on April 16, 2009

See the earlier version, in case you missed it, for the explanation and motivation.

I'm writing this because I have a new version of the script that adds a Ukrainian layout and an alternative encoding. My friend types in both Russian and Ukrainian, and the Cyrillic character encoding can change from release to release in Pidgin, the IM client I use. The latter is why I have a russian2 list. I suppose if Pidgin ever goes back to the other encoding, I’ll create a ukrainian2 as well.

Python 3.0

english   = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz?.,'
russian   = [1060, 1048, 1057, 1042, 1059, 1040, 1055, 1056, 1064, 1054, #ABCDEFGHIJ
             1051, 1044, 1068, 1058, 1065, 1047, 1049, 1050, 1067, 1045, #KLMNOPQRST
             1043, 1052, 1062, 1063, 1053, 1071, 1092, 1080, 1089, 1074, #UVWXYZabcd
             1091, 1072, 1087, 1088, 1096, 1086, 1083, 1076, 1100, 1090, #efghijklmn
             1097, 1079, 1081, 1082, 1099, 1077, 1075, 1084, 1094, 1095, #opqrstuvwx
             1085, 1103,   44, 1102, 1073]                               #yz?.,
russian2  = [212, 200, 209, 194, 211, 192, 207, 208, 216, 206, 203, 196, #ABCDEFGHIJKL
             220, 210, 217, 199, 201, 202, 219, 197, 195, 204, 214, 215, #MNOPQRSTUVWX
             205, 223, 244, 232, 241, 226, 243, 224, 239, 240, 248, 238, #YZabcdefghij
             235, 228, 252, 242, 249, 231, 233, 234, 251, 229, 227, 236, #klmnopqrstuv
             246, 247, 237, 255, 44, 254, 225]                           #wxyz?.,
ukrainian = [1060, 1048, 1057, 1042, 1059, 1040, 1055, 1056, 1064, 1054, #ABCDEFGHIJ
             1051, 1044, 1068, 1058, 1065, 1047, 1049, 1050,  178, 1045, #KLMNOPQRST
             1043, 1052, 1062, 1063, 1053, 1071, 1092, 1080, 1089, 1074, #UVWXYZabcd
             1091, 1072, 1087, 1088, 1096, 1086, 1083, 1076, 1100, 1090, #efghijklmn
             1097, 1079, 1081, 1082,  179, 1077, 1075, 1084, 1094, 1095, #opqrstuvwx
             1085, 1103,   44, 1102, 1073]                               #yz?.,

print('Paste the Russian/Ukrainian text and press Enter.  Press Enter on an empty line to exit.')
s = input()
while s != '':
    # chr(178) and chr(179) only show up when the Ukrainian layout was used
    if chr(178) in s or chr(179) in s:
        # Ukrainian
        for i, r in enumerate(ukrainian):
            s = s.replace(chr(r), english[i])
    else:
        # Russian: apply the maps for both encodings in the same pass
        for i, r in enumerate(russian):
            s = s.replace(chr(r), english[i])
            s = s.replace(chr(russian2[i]), english[i])
    print(s)
    s = input()
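
As an aside, the chained replace calls can be collapsed into a single pass with str.translate. This is a sketch rather than the script above: build_table is a name made up for the example, and the short lists are only the a-h entries copied from russian and russian2.

```python
def build_table(keys, *code_lists):
    """Map every code point in each list to the key at the same index."""
    table = {}
    for codes in code_lists:
        for key, code in zip(keys, codes):
            table[code] = key
    return table

# Only the entries for a-h, copied from the full lists (demo subset).
keys = 'abcdefgh'
russian_part = [1092, 1080, 1089, 1074, 1091, 1072, 1087, 1088]
russian2_part = [244, 232, 241, 226, 243, 224, 239, 240]

table = build_table(keys, russian_part, russian2_part)

# 'head' typed on the Russian layout, once in each encoding.
typed = 'руфв'
typed_alt = ''.join(chr(c) for c in (240, 243, 244, 226))

print(typed.translate(table))
print(typed_alt.translate(table))
```

Characters that are not in the table pass through untouched, so spaces and digits survive as they do with replace.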

Note that even if she types in Ukrainian without using the letter s, the Russian map will do the job fine, because s is the only letter key that differs between the two layouts. And because the two encodings Pidgin uses share no code points for these letters, both replaces can safely run in the same loop.
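
The numbers in russian2 look like the Windows-1251 byte values of the same letters, which would mean the alternative encoding is CP1251 text being read as Latin-1. That is an inference from the values, not something confirmed in Pidgin. Under that assumption, a sketch like this checks a few entries and converts such text back to real Cyrillic. The names cp1251_byte and undo_latin1 are invented for the example.

```python
# (key, russian code point, russian2 value) triples copied from the lists above.
samples = [('A', 1060, 212), ('a', 1092, 244), ('q', 1081, 233), ('z', 1103, 255)]

def cp1251_byte(code_point):
    """Return the single Windows-1251 byte for a Cyrillic code point."""
    return chr(code_point).encode('cp1251')[0]

def undo_latin1(text):
    """Re-read text that was CP1251 but got decoded as Latin-1 (assumed cause)."""
    return text.encode('latin-1').decode('cp1251')

for key, code_point, alt in samples:
    # Prints True when the russian2 value is the CP1251 byte of the letter.
    print(key, cp1251_byte(code_point) == alt)

# Under the same assumption, chr(179) would be the Ukrainian letter U+0456.
print(undo_latin1(chr(179)) == chr(0x0456))
```

If the assumption holds, running undo_latin1 on the mangled text first would make a separate russian2 list unnecessary. It raises UnicodeEncodeError on text that already contains real Cyrillic, so it only suits input that is entirely in the alternative encoding.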

I thought it might be interesting to note how I came up with the list, so you can create a similar one for any language.

The first step was to ask my friend to type an English pangram with her keyboard set to the other language's layout.

>>> e = 'the quick brown fox jumps over the lazy dog'
>>> u = 'еру йгшсл икщцт ащч огьз³ щмук еру дфян вщп'
>>> m = {}
>>> for n,c in enumerate(e):
	m[c]=ord(u[n])
>>> alphabet = list(m.keys())
>>> alphabet.sort()
>>> for letter in alphabet:
	print(m[letter],end=', ')
32, 1092, 1080, 1089, 1074, 1091, 1072, ...

<snip>

The leading 32 is the space character, which maps to itself, so it can be dropped.  Rinse and repeat for upper case and punctuation.
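
The same steps can be wrapped in a small helper. This is a sketch under made-up naming (layout_codes is not part of the script above): it pairs each intended character with what was actually typed, skips spaces, and returns the code points in the order of whatever key string you pass in.

```python
def layout_codes(intended, typed, keys):
    """Return the code point typed for each character in keys, in order.

    intended is the English text the typist meant; typed is what came out
    on the other layout. Both must have the same length.
    """
    if len(intended) != len(typed):
        raise ValueError('texts differ in length')
    seen = {}
    for want, got in zip(intended, typed):
        if want != ' ':          # the space key is the same on both layouts
            seen[want] = ord(got)
    missing = [k for k in keys if k not in seen]
    if missing:
        raise KeyError('no sample for: ' + ''.join(missing))
    return [seen[k] for k in keys]

e = 'the quick brown fox jumps over the lazy dog'
u = 'еру йгшсл икщцт ащч огьз³ щмук еру дфян вщп'
lower = layout_codes(e, u, 'abcdefghijklmnopqrstuvwxyz')
print(lower[:6])   # [1092, 1080, 1089, 1074, 1091, 1072]
```

Passing the upper-case alphabet with an upper-case pangram, or a punctuation string with a matching sample, gives the remaining parts of the list.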

My friend and I were laughing, because her mistakes are giving me interesting, fun things to do.  This is akin to someone enjoying every second of cleaning up spilled milk.
