
Monthly Archives: April 2009

In case you missed it, there is an earlier version that includes the explanation and motivation.

I write this now because I have a new version of the script that also handles a Ukrainian layout and an alternative encoding. Not only does my friend type in both Russian and Ukrainian, but the Cyrillic character encoding can also change from release to release in Pidgin, the IM client I use. The latter is why I have a russian2 list. I suppose if Pidgin ever goes back to the other encoding, I’ll create a ukrainian2 then.
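The letter codes in a list like russian2 do not have to be collected by hand. One plausible explanation, assumed here and not confirmed against Pidgin itself, is that the older release displayed Windows-1251 bytes as if they were Latin-1. Under that assumption, this sketch regenerates the 52 letter codes from the Russian alphabet (the three punctuation entries are left out). `as_misread_cp1251` is a hypothetical helper name.

```python
# Russian letters produced by the QWERTY keys A..Z (standard Russian layout).
RUSSIAN_UPPER = 'ФИСВУАПРШОЛДЬТЩЗЙКЫЕГМЦЧНЯ'
RUSSIAN_LOWER = RUSSIAN_UPPER.lower()


def as_misread_cp1251(text):
    """Hypothetical helper: Windows-1251 bytes displayed as if they were Latin-1."""
    return text.encode('cp1251').decode('latin-1')


# Code points of the letters as they would arrive in the garbled text
russian2_letters = [ord(c) for c in as_misread_cp1251(RUSSIAN_UPPER + RUSSIAN_LOWER)]
print(russian2_letters[:6])  # [212, 200, 209, 194, 211, 192]
```

If a future release garbles text in yet another way, swapping the two codec names is the first thing worth trying.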

Python 3.0

# QWERTY keys, in the order used by every list below
english   = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz?.,'
# Unicode code points the same keys produce on a Russian layout
# (Shift + the '/' key gives ',' there, hence 44 for '?')
russian   = [1060, 1048, 1057, 1042, 1059, 1040, 1055, 1056, 1064, 1054, #ABCDEFGHIJ
             1051, 1044, 1068, 1058, 1065, 1047, 1049, 1050, 1067, 1045, #KLMNOPQRST
             1043, 1052, 1062, 1063, 1053, 1071, 1092, 1080, 1089, 1074, #UVWXYZabcd
             1091, 1072, 1087, 1088, 1096, 1086, 1083, 1076, 1100, 1090, #efghijklmn
             1097, 1079, 1081, 1082, 1099, 1077, 1075, 1084, 1094, 1095, #opqrstuvwx
             1085, 1103,   44, 1102, 1073]                               #yz?.,
# The same Russian keys in the other encoding Pidgin has used
# ('.' is the ю key and ',' the б key, as in the russian list)
russian2  = [212, 200, 209, 194, 211, 192, 207, 208, 216, 206, 203, 196, #ABCDEFGHIJKL
             220, 210, 217, 199, 201, 202, 219, 197, 195, 204, 214, 215, #MNOPQRSTUVWX
             205, 223, 244, 232, 241, 226, 243, 224, 239, 240, 248, 238, #YZabcdefghij
             235, 228, 252, 242, 249, 231, 233, 234, 251, 229, 227, 236, #klmnopqrstuv
             246, 247, 237, 255, 44, 254, 225]                           #wxyz?.,
# Ukrainian layout: identical to russian except S/s, which arrive from Pidgin as 178/179
ukrainian = [1060, 1048, 1057, 1042, 1059, 1040, 1055, 1056, 1064, 1054, #ABCDEFGHIJ
             1051, 1044, 1068, 1058, 1065, 1047, 1049, 1050,  178, 1045, #KLMNOPQRST
             1043, 1052, 1062, 1063, 1053, 1071, 1092, 1080, 1089, 1074, #UVWXYZabcd
             1091, 1072, 1087, 1088, 1096, 1086, 1083, 1076, 1100, 1090, #efghijklmn
             1097, 1079, 1081, 1082,  179, 1077, 1075, 1084, 1094, 1095, #opqrstuvwx
             1085, 1103,   44, 1102, 1073]                               #yz?.,

print('Paste the Russian/Ukrainian text and press Enter. An empty line exits.')
s = input()
while s != '':
    # 178/179 only appear for the Ukrainian S/s, so they identify the layout
    if chr(178) in s or chr(179) in s:
        # Ukrainian
        for i, r in enumerate(ukrainian):
            s = s.replace(chr(r), english[i])
    else:
        # Russian, in either encoding; the two code ranges never overlap
        for i, r in enumerate(russian):
            s = s.replace(chr(r), english[i])
            s = s.replace(chr(russian2[i]), english[i])
    print(s)
    s = input()
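For comparison, here is a minimal sketch of the same conversion using Python's `str.maketrans` and `str.translate`. These map every character in a single pass, so the result no longer depends on the order of the `replace` calls (with chained replaces, `,` must become `?` before `б` becomes `,`). The tables are built from strings instead of code point lists, and the second Russian table assumes the second encoding is Windows-1251 bytes read as Latin-1. The constant names and `to_english` are illustrative, not part of the original script.

```python
# QWERTY keys, in the same order as the script's `english` string
ENGLISH = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz?.,'
# What those keys produce on a Russian layout, in ENGLISH order
UPPER = 'ФИСВУАПРШОЛДЬТЩЗЙКЫЕГМЦЧНЯ'
RUSSIAN = UPPER + UPPER.lower() + ',юб'
# Ukrainian differs only at S/s; 178 and 179 are the codes the script above uses
UKRAINIAN = RUSSIAN.replace('Ы', chr(178)).replace('ы', chr(179))
# Assumption: the second encoding is Windows-1251 bytes displayed as Latin-1
RUSSIAN2 = RUSSIAN.encode('cp1251').decode('latin-1')

TO_ENGLISH_RU = str.maketrans(RUSSIAN, ENGLISH)
TO_ENGLISH_RU.update(str.maketrans(RUSSIAN2, ENGLISH))  # keys only share ',', mapped alike
TO_ENGLISH_UK = str.maketrans(UKRAINIAN, ENGLISH)


def to_english(text):
    """Map text typed on the wrong layout back to the QWERTY keys pressed."""
    if chr(178) in text or chr(179) in text:
        return text.translate(TO_ENGLISH_UK)
    return text.translate(TO_ENGLISH_RU)


print(to_english('еру йгшсл икщцт ащч огьз³ щмук еру дфян вщп'))
# the quick brown fox jumps over the lazy dog
```

To keep the interactive behaviour, `to_english(s)` can replace the two `for` loops inside the original `while` loop.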

Note that even if she types on the Ukrainian layout without using the letter s, the Russian map does the job fine, because s is the only mapped key whose letter differs between the two layouts. And because the two encodings Pidgin has used never share a letter code, both sets of replacements can be done in the same loop.
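The independence of the two encodings can be checked directly. A small sketch, assuming the second encoding is Windows-1251 read as Latin-1 (the names are illustrative): the Unicode letters fall between 1040 and 1103, the misread ones between 192 and 255, so neither set of replacements can touch the other's characters, and the Ukrainian markers 178 and 179 sit outside both ranges.

```python
# Russian letters for QWERTY keys A..Z, then a..z
LETTERS = 'ФИСВУАПРШОЛДЬТЩЗЙКЫЕГМЦЧНЯ'
LETTERS += LETTERS.lower()
# Assumption: the second encoding is Windows-1251 bytes shown as Latin-1
MISREAD = LETTERS.encode('cp1251').decode('latin-1')

unicode_codes = [ord(c) for c in LETTERS]
misread_codes = [ord(c) for c in MISREAD]
print(min(unicode_codes), max(unicode_codes))  # 1040 1103
print(min(misread_codes), max(misread_codes))  # 192 255

# No code is shared, so the two sets of replacements cannot interfere
assert set(unicode_codes).isdisjoint(misread_codes)
# The Ukrainian S/s markers are outside both ranges
assert not {178, 179} & (set(unicode_codes) | set(misread_codes))
```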

I thought it might be interesting to note how I came up with the lists, so you can create similar ones for any language.

The first step was to ask my friend to type an English pangram with her keyboard set to the other layout.

>>> e = 'the quick brown fox jumps over the lazy dog'
>>> u = 'еру йгшсл икщцт ащч огьз³ щмук еру дфян вщп'
>>> m = {}
>>> for n, c in enumerate(e):
...     m[c] = ord(u[n])
...
>>> alphabet = sorted(m)
>>> for letter in alphabet:
...     print(m[letter], end=', ')
...
32, 1092, 1080, 1089, 1074, 1091, 1072, ...

<snip>

The space character, 32, maps to itself, so it isn’t needed. Rinse and repeat for upper case and punctuation.
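The interactive steps above can be wrapped in a small function and rerun for the upper case and punctuation passes. A sketch; `key_map` and its `skip` parameter are hypothetical names. It drops the space, since it maps to itself, and refuses strings that do not line up character for character.

```python
def key_map(typed_english, typed_other, skip=' '):
    """Map each English key to the code point the other layout produced for it."""
    if len(typed_english) != len(typed_other):
        raise ValueError('the two strings must line up character for character')
    mapping = {}
    for key, produced in zip(typed_english, typed_other):
        if key not in skip:  # characters such as the space map to themselves
            mapping[key] = ord(produced)
    return mapping


english = 'the quick brown fox jumps over the lazy dog'
ukrainian = 'еру йгшсл икщцт ащч огьз³ щмук еру дфян вщп'
m = key_map(english, ukrainian)
print([m[k] for k in sorted(m)][:6])  # [1092, 1080, 1089, 1074, 1091, 1072]
```

Each later pass can be merged into the same dictionary with `m.update(key_map(...))`.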

My friend and I were laughing, because her mistakes keep giving me interesting, fun things to do. This is akin to someone enjoying every second of cleaning up spilled milk.
