Russian Keyboard to English Keyboard Revisited
Posted by pianowow on April 16, 2009
Earlier version, in case you missed it, including explanation and motivation.
I'm writing this now because I have a new version of the script that adds a Ukrainian layout and an alternative encoding. My friend types in both Russian and Ukrainian, and the Cyrillic character encoding can change from release to release in Pidgin, the IM client I use. The latter is why I have a russian2 list. I suppose if Pidgin ever goes back to the other encoding, I’ll create a ukrainian2 then.
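A quick aside on where the second set of numbers comes from. The values in the russian2 list further down match what you get when a Cyrillic letter is encoded as Windows-1251 and the resulting byte is read back as Latin-1. Whether that is really what Pidgin does is only a guess; the sketch below just checks the arithmetic on four sample letters, and legacy_code is a name made up for the illustration.

```python
def legacy_code(code_point):
    """Encode one character as Windows-1251, then read the byte back as Latin-1."""
    return ord(chr(code_point).encode('cp1251').decode('latin-1'))

# Four sample values taken from the russian list (keys A, B, a, b).
sample_unicode = [1060, 1048, 1092, 1080]
sample_legacy = [legacy_code(c) for c in sample_unicode]
print(sample_legacy)  # [212, 200, 244, 232], the matching russian2 entries
```

If the guess holds, a ukrainian2 list could be generated the same way instead of being typed out by hand.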
Python 3.0
english = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz?.,'
russian = [1060, 1048, 1057, 1042, 1059, 1040, 1055, 1056, 1064, 1054, #ABCDEFGHIJ
1051, 1044, 1068, 1058, 1065, 1047, 1049, 1050, 1067, 1045, #KLMNOPQRST
1043, 1052, 1062, 1063, 1053, 1071, 1092, 1080, 1089, 1074, #UVWXYZabcd
1091, 1072, 1087, 1088, 1096, 1086, 1083, 1076, 1100, 1090, #efghijklmn
1097, 1079, 1081, 1082, 1099, 1077, 1075, 1084, 1094, 1095, #opqrstuvwx
1085, 1103, 44, 1102, 1073] #yz?.,
russian2 = [212, 200, 209, 194, 211, 192, 207, 208, 216, 206, 203, 196, #ABCDEFGHIJKL
220, 210, 217, 199, 201, 202, 219, 197, 195, 204, 214, 215, #MNOPQRSTUVWX
205, 223, 244, 232, 241, 226, 243, 224, 239, 240, 248, 238, #YZabcdefghij
235, 228, 252, 242, 249, 231, 233, 234, 251, 229, 227, 236, #klmnopqrstuv
246, 247, 237, 255, 44, 225, 254] #wxyz?.,
ukrainian = [1060, 1048, 1057, 1042, 1059, 1040, 1055, 1056, 1064, 1054, #ABCDEFGHIJ
1051, 1044, 1068, 1058, 1065, 1047, 1049, 1050, 178, 1045, #KLMNOPQRST
1043, 1052, 1062, 1063, 1053, 1071, 1092, 1080, 1089, 1074, #UVWXYZabcd
1091, 1072, 1087, 1088, 1096, 1086, 1083, 1076, 1100, 1090, #efghijklmn
1097, 1079, 1081, 1082, 179, 1077, 1075, 1084, 1094, 1095, #opqrstuvwx
1085, 1103, 44, 1102, 1073] #yz?.,
print('Copy the Russian/Ukrainian text and press enter. Enter to exit.')
s = input()
while s != '':
    if chr(178) in s or chr(179) in s:
        # Ukrainian: chr(178) and chr(179) appear only in that list
        for i, r in enumerate(ukrainian):
            s = s.replace(chr(r), english[i])
    else:
        # Russian: handle both encodings in the same loop
        for i, r in enumerate(russian):
            s = s.replace(chr(r), english[i])
            s = s.replace(chr(russian2[i]), english[i])
    print(s)
    s = input()
Note that even if she types in Ukrainian without using the letter that sits on the s key, the Russian map will do the job fine, because that is the only letter that differs between the two lists. Because the letter codes of the two encodings Pidgin uses do not overlap, both replacements can be done in the same loop.
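The same idea can also be written with str.translate, which maps every character in a single pass, so the order of the replacements stops mattering. This is only a sketch, not the script I actually run: it uses a made-up four-key subset of the lists (just enough to spell "hello"), and keys, table and to_english are names invented for the example.

```python
# Hypothetical four-key subset of the layout, just enough to spell "hello".
keys = 'ehlo'
unicode_codes = [1091, 1088, 1076, 1097]  # same values as in the russian list
legacy_codes = [243, 240, 228, 249]       # same values as in the russian2 list

# One table for both encodings: code point -> English key.
table = dict(zip(unicode_codes, keys))
table.update(zip(legacy_codes, keys))

def to_english(text):
    """Map every known code point in one pass; leave everything else alone."""
    return text.translate(table)

print(to_english('руддщ'))  # hello

# The same word as it would arrive in the alternative encoding.
legacy_text = ''.join(chr(c) for c in (240, 243, 228, 228, 249))
print(to_english(legacy_text))  # hello
```

With the full lists, the table would be built the same way from english, russian and russian2.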
I thought it might be interesting to note how I came up with the list, so you can create a similar one for any language.
The first step was to ask my friend to type an English pangram with the other keyboard layout switched on.
>>> e = 'the quick brown fox jumps over the lazy dog'
>>> u = 'еру йгшсл икщцт ащч огьз³ щмук еру дфян вщп'
>>> m = {}
>>> for n,c in enumerate(e):
...     m[c]=ord(u[n])
...
>>> alphabet = list(m.keys())
>>> alphabet.sort()
>>> for letter in alphabet:
...     print(m[letter],end=', ')
...
32, 1092, 1080, 1089, 1074, 1091, 1072, ...
<snip>
The leading 32 is the space character, which the list doesn’t need. Rinse and repeat for upper case and punctuation.
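For anyone who wants to repeat this for another layout, the session above can be wrapped in a small function. This is a sketch under a couple of assumptions: layout_map is a name made up here, and it simply skips any position where both texts have the same character, which is how the space drops out.

```python
def layout_map(english_text, typed_text):
    """Map each English key to the code point typed on the other layout."""
    if len(english_text) != len(typed_text):
        raise ValueError('both texts must have the same length')
    mapping = {}
    for eng, other in zip(english_text, typed_text):
        if eng == other:
            # Same character in both texts (the space, for example): skip it.
            continue
        mapping[eng] = ord(other)
    return mapping

e = 'the quick brown fox jumps over the lazy dog'
u = 'еру йгшсл икщцт ащч огьз³ щмук еру дфян вщп'
m = layout_map(e, u)

# Code points in alphabetical key order, ready to paste into a list.
codes = [m[key] for key in sorted(m)]
print(codes)
```

Running it once more with an upper case pangram and once with the punctuation keys gives the rest of the list.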
My friend and I were laughing, because her mistakes are giving me interesting, fun things to do. This is akin to someone enjoying every second of cleaning up spilled milk.




