The aim of this page is to help anybody who wants to understand what is really happening when the Cyrillic alphabet is displayed on screen, and why sometimes it doesn't work as nicely as it should.
Warning: I'm not an expert on this matter. I've learned about the subject while writing this page. All corrections are welcome.
First of all: this A is not an A
It is true that a computer can play music and display pictures. However, all a computer can store is numbers. Every song and image on your computer is, at the end of the day, a sequence of numbers written on the disk. After all, you can't expect too much from a simple box full of metal, plastic and silicon.
Not even text files are stored directly as sequences of letters; they, too, are stored as sequences of numbers. Programs such as text editors or web browsers read these numbers from disk and transform them into the letters you see on screen.
Wouldn't it be easier to store letters as letters? Not really. It is much easier to store numbers. Numbers are simple and universal. Letters are much more arbitrary.
Maybe you now understand what I mean when I say: this A is not an A. This A, as stored on the hard disk, is really a number (41 in hexadecimal notation, which is 65 in the decimal notation we use every day; hexadecimal is explained below) that your web browser has translated into a form you can understand. If you want to know which numbers correspond to the letters of the Latin alphabet, google for ASCII charts. Make sure you see the number 41 (hexadecimal) or 65 (decimal) next to the capital letter A. The encoding according to which the number 41 represents the letter A is called ASCII.
What are the numbers for the Cyrillic letters?
Fortunately, practically everybody nowadays uses ASCII to represent Latin letters in the computer. I mean, nobody has decided that it would be better to make a letter C appear on screen each time a 41 is read from disk, instead of the A everybody else uses. There is no competing encoding for these letters in common use, and this is good, because text files can be moved between computers without much trouble. There is a standard encoding for the Latin alphabet.
Here comes the bad news: there is no single standard encoding for representing Cyrillic letters in the computer.
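If you have Python at hand, you can watch this translation happen. The built-in `ord` and `chr` functions convert between a character and its number, and `encode`/`decode` convert between text and the bytes that end up on disk. A small sketch:

```python
# The number behind the letter A, in decimal and in hexadecimal.
code = ord("A")
print(code)        # 65
print(hex(code))   # 0x41

# Going the other way: the number 0x41 is displayed as the letter A.
print(chr(0x41))   # A

# Encoding text as ASCII gives the raw bytes that would be written to disk.
data = "A".encode("ascii")
print(data)        # b'A'  (Python shows printable bytes as characters)
print(data[0])     # 65

# Decoding turns stored bytes back into text.
print(bytes([0x41, 0x42, 0x43]).decode("ascii"))  # ABC
```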
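A consequence worth checking: Windows-1251 and UTF-8 (both discussed on this page), as well as the Western European encoding ISO-8859-1 (Latin-1), all agree with ASCII on the numbers 0 to 127. Plain Latin text therefore produces identical bytes in every one of them. A quick check using Python's standard codec names:

```python
text = "Hello, world!"

# Encode the same plain-ASCII text with several encodings.
encodings = ["ascii", "cp1251", "latin_1", "utf_8"]
results = {name: text.encode(name) for name in encodings}

for name, data in results.items():
    print(f"{name}: {data.hex(' ')}")

# Every encoding produced exactly the same bytes.
print(len(set(results.values())) == 1)  # True
```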
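To see what "no standard" means in practice, here is the single letter А encoded with several Cyrillic-capable encodings that ship with Python. Windows-1251 and UTF-8 are described below; KOI8-R, ISO-8859-5 and CP866 are other historical Cyrillic encodings, included here only for comparison.

```python
letter = "А"  # Cyrillic capital A (U+0410), not the Latin A

# The same letter, encoded with five different encodings.
encoded = {name: letter.encode(name)
           for name in ["cp1251", "koi8_r", "iso8859_5", "cp866", "utf_8"]}

for name, data in encoded.items():
    print(f"{name}: {data.hex(' ')}")
# cp1251: c0
# koi8_r: e1
# iso8859_5: b0
# cp866: 80
# utf_8: d0 90
```

Five encodings, five different answers for the same letter.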
Windows-1251
This is the chart for Windows-1251 encoding.
| Letter | Hex | Dec | Letter | Hex | Dec | Letter | Hex | Dec | Letter | Hex | Dec | Letter | Hex | Dec | Letter | Hex | Dec |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| А | 0xC0 | 192 | К | 0xCA | 202 | Х | 0xD5 | 213 | а | 0xE0 | 224 | к | 0xEA | 234 | х | 0xF5 | 245 |
| Б | 0xC1 | 193 | Л | 0xCB | 203 | Ц | 0xD6 | 214 | б | 0xE1 | 225 | л | 0xEB | 235 | ц | 0xF6 | 246 |
| В | 0xC2 | 194 | М | 0xCC | 204 | Ч | 0xD7 | 215 | в | 0xE2 | 226 | м | 0xEC | 236 | ч | 0xF7 | 247 |
| Г | 0xC3 | 195 | Н | 0xCD | 205 | Ш | 0xD8 | 216 | г | 0xE3 | 227 | н | 0xED | 237 | ш | 0xF8 | 248 |
| Д | 0xC4 | 196 | О | 0xCE | 206 | Щ | 0xD9 | 217 | д | 0xE4 | 228 | о | 0xEE | 238 | щ | 0xF9 | 249 |
| Е | 0xC5 | 197 | П | 0xCF | 207 | Ъ | 0xDA | 218 | е | 0xE5 | 229 | п | 0xEF | 239 | ъ | 0xFA | 250 |
| Ё | 0xA8 | 168 | Р | 0xD0 | 208 | Ы | 0xDB | 219 | ё | 0xB8 | 184 | р | 0xF0 | 240 | ы | 0xFB | 251 |
| Ж | 0xC6 | 198 | С | 0xD1 | 209 | Ь | 0xDC | 220 | ж | 0xE6 | 230 | с | 0xF1 | 241 | ь | 0xFC | 252 |
| З | 0xC7 | 199 | Т | 0xD2 | 210 | Э | 0xDD | 221 | з | 0xE7 | 231 | т | 0xF2 | 242 | э | 0xFD | 253 |
| И | 0xC8 | 200 | У | 0xD3 | 211 | Ю | 0xDE | 222 | и | 0xE8 | 232 | у | 0xF3 | 243 | ю | 0xFE | 254 |
| Й | 0xC9 | 201 | Ф | 0xD4 | 212 | Я | 0xDF | 223 | й | 0xE9 | 233 | ф | 0xF4 | 244 | я | 0xFF | 255 |
Each entry in the table has three columns: the Cyrillic letter and its number in the Windows-1251 encoding, written both in hexadecimal (Hex) and in decimal (Dec). Windows-1251 is a very common encoding, which is why I've shown you this chart. Decimal numbers are simply numbers as you know them. Hexadecimal numbers can be thought of as the numbers we would write if we had 16 fingers: the digits are 0 to 9 followed by A to F, and the prefix 0x marks a number as hexadecimal (google for more detailed explanations). Hexadecimal is far more common than decimal in the encoding world because it is closely related to binary, the system a computer uses to store numbers: each hexadecimal digit stands for exactly four binary digits.
Look at the decimal column. The Cyrillic letters, starting from А, have been assigned consecutive numbers from 192 on. Only Ё and ё break this sequence (they are 168 and 184). Ё is a special letter, isn't it? It is always stressed, but Russians don't always write it with the two dots above, so the reader has to guess whether a written е is really an е or an ё that has lost its dots and must be read like a soft o ("yo")... enough about this.
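The relationship between the two notations, and the reason hexadecimal is popular, can be checked in a few lines of Python. Since each hexadecimal digit stands for four bits, a one-byte number always fits in exactly two hex digits:

```python
# 0xC0 (hexadecimal) and 192 (decimal) are the same number.
print(0xC0 == 192)      # True
print(int("C0", 16))    # 192
print(f"{192:X}")       # C0

# Each hex digit maps to four bits: C -> 1100, 0 -> 0000.
print(f"{0xC0:08b}")    # 11000000
print(f"{0xFF:08b}")    # 11111111  (255, the largest one-byte number)
```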
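You can rebuild the chart above from Python's built-in `cp1251` codec (Python's name for Windows-1251) and confirm the Ё exception. This sketch decodes the 64 numbers from 0xC0 to 0xFF:

```python
# Decode every number from 0xC0 (192) to 0xFF (255) as Windows-1251.
letters = bytes(range(0xC0, 0x100)).decode("cp1251")
print(letters[:32])   # АБВГДЕЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯ
print(letters[32:])   # абвгдежзийклмнопрстуфхцчшщъыьэюя

# Ё and ё are missing from that run; they live at 0xA8 and 0xB8.
print("Ё".encode("cp1251"), "ё".encode("cp1251"))  # b'\xa8' b'\xb8'
print(bytes([0xA8, 0xB8]).decode("cp1251"))          # Ёё
```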
Sometimes, Russian looks like anything but Russian. Why?
Perhaps you have downloaded Russian music, and your playlist shows the song titles as unreadable gibberish instead of Cyrillic.
Perhaps you've come across pages like that while surfing the Russian web.
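With what you know, you can make an educated guess about why this happens. The following chart gives even more clues: it shows which characters the same numbers represent in ISO-8859-1 (Latin-1), an encoding for Western European languages (Windows-1252 assigns the same characters to these numbers):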
| Letter | Hex | Dec | Letter | Hex | Dec | Letter | Hex | Dec | Letter | Hex | Dec | Letter | Hex | Dec | Letter | Hex | Dec |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| À | 0xC0 | 192 | Ê | 0xCA | 202 | Õ | 0xD5 | 213 | à | 0xE0 | 224 | ê | 0xEA | 234 | õ | 0xF5 | 245 |
| Á | 0xC1 | 193 | Ë | 0xCB | 203 | Ö | 0xD6 | 214 | á | 0xE1 | 225 | ë | 0xEB | 235 | ö | 0xF6 | 246 |
| Â | 0xC2 | 194 | Ì | 0xCC | 204 | × | 0xD7 | 215 | â | 0xE2 | 226 | ì | 0xEC | 236 | ÷ | 0xF7 | 247 |
| Ã | 0xC3 | 195 | Í | 0xCD | 205 | Ø | 0xD8 | 216 | ã | 0xE3 | 227 | í | 0xED | 237 | ø | 0xF8 | 248 |
| Ä | 0xC4 | 196 | Î | 0xCE | 206 | Ù | 0xD9 | 217 | ä | 0xE4 | 228 | î | 0xEE | 238 | ù | 0xF9 | 249 |
| Å | 0xC5 | 197 | Ï | 0xCF | 207 | Ú | 0xDA | 218 | å | 0xE5 | 229 | ï | 0xEF | 239 | ú | 0xFA | 250 |
| ¨ | 0xA8 | 168 | Ð | 0xD0 | 208 | Û | 0xDB | 219 | ¸ | 0xB8 | 184 | ð | 0xF0 | 240 | û | 0xFB | 251 |
| Æ | 0xC6 | 198 | Ñ | 0xD1 | 209 | Ü | 0xDC | 220 | æ | 0xE6 | 230 | ñ | 0xF1 | 241 | ü | 0xFC | 252 |
| Ç | 0xC7 | 199 | Ò | 0xD2 | 210 | Ý | 0xDD | 221 | ç | 0xE7 | 231 | ò | 0xF2 | 242 | ý | 0xFD | 253 |
| È | 0xC8 | 200 | Ó | 0xD3 | 211 | Þ | 0xDE | 222 | è | 0xE8 | 232 | ó | 0xF3 | 243 | þ | 0xFE | 254 |
| É | 0xC9 | 201 | Ô | 0xD4 | 212 | ß | 0xDF | 223 | é | 0xE9 | 233 | ô | 0xF4 | 244 | ÿ | 0xFF | 255 |
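Compare this chart with the Windows-1251 chart: the numbers are exactly the same, only the letters differ. A program that reads Windows-1251 text but wrongly assumes a Western European encoding shows each Cyrillic letter as the accented Latin letter that has the same number. Do you get it?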
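Here is the whole mistake reproduced in Python, as a sketch: Russian text is saved in Windows-1251, then read back by a program that assumes Latin-1 (Python's `latin_1` codec). The bytes never change, only their interpretation does, which also means the damage can usually be undone.

```python
original = "Привет"  # "Hello" in Russian

# Saved to disk as Windows-1251 bytes...
data = original.encode("cp1251")
print(data.hex(" "))   # cf f0 e8 e2 e5 f2

# ...but displayed by a program that assumes Latin-1.
garbled = data.decode("latin_1")
print(garbled)         # Ïðèâåò

# Reversing the wrong step recovers the text, since no bytes were lost.
repaired = garbled.encode("latin_1").decode("cp1251")
print(repaired == original)  # True
```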
UTF-8
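UTF-8 is what this site uses to represent Cyrillic. UTF-8 is a way of storing Unicode, a single numbering scheme that covers the letters of nearly every writing system. The chart below shows the Unicode number (the code point, written U+ followed by hexadecimal digits) of each Cyrillic letter; UTF-8 stores each of these numbers on disk as a sequence of two bytes. Here is the chart: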
| Letter | Code point | Dec | Letter | Code point | Dec | Letter | Code point | Dec | Letter | Code point | Dec | Letter | Code point | Dec | Letter | Code point | Dec |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| А | U+0410 | 1040 | К | U+041A | 1050 | Х | U+0425 | 1061 | а | U+0430 | 1072 | к | U+043A | 1082 | х | U+0445 | 1093 |
| Б | U+0411 | 1041 | Л | U+041B | 1051 | Ц | U+0426 | 1062 | б | U+0431 | 1073 | л | U+043B | 1083 | ц | U+0446 | 1094 |
| В | U+0412 | 1042 | М | U+041C | 1052 | Ч | U+0427 | 1063 | в | U+0432 | 1074 | м | U+043C | 1084 | ч | U+0447 | 1095 |
| Г | U+0413 | 1043 | Н | U+041D | 1053 | Ш | U+0428 | 1064 | г | U+0433 | 1075 | н | U+043D | 1085 | ш | U+0448 | 1096 |
| Д | U+0414 | 1044 | О | U+041E | 1054 | Щ | U+0429 | 1065 | д | U+0434 | 1076 | о | U+043E | 1086 | щ | U+0449 | 1097 |
| Е | U+0415 | 1045 | П | U+041F | 1055 | Ъ | U+042A | 1066 | е | U+0435 | 1077 | п | U+043F | 1087 | ъ | U+044A | 1098 |
| Ё | U+0401 | 1025 | Р | U+0420 | 1056 | Ы | U+042B | 1067 | ё | U+0451 | 1105 | р | U+0440 | 1088 | ы | U+044B | 1099 |
| Ж | U+0416 | 1046 | С | U+0421 | 1057 | Ь | U+042C | 1068 | ж | U+0436 | 1078 | с | U+0441 | 1089 | ь | U+044C | 1100 |
| З | U+0417 | 1047 | Т | U+0422 | 1058 | Э | U+042D | 1069 | з | U+0437 | 1079 | т | U+0442 | 1090 | э | U+044D | 1101 |
| И | U+0418 | 1048 | У | U+0423 | 1059 | Ю | U+042E | 1070 | и | U+0438 | 1080 | у | U+0443 | 1091 | ю | U+044E | 1102 |
| Й | U+0419 | 1049 | Ф | U+0424 | 1060 | Я | U+042F | 1071 | й | U+0439 | 1081 | ф | U+0444 | 1092 | я | U+044F | 1103 |
Note that Ё and ё are exceptions again.
By the way, if you ask "which characters are represented by the numbers 192 to 255 (or, in hexadecimal, C0 to FF) in Unicode?", the answer is "Look at the second table on this page".
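Strictly speaking, the numbers in this chart are Unicode code points; UTF-8 is the rule for turning them into bytes. Every Cyrillic letter in the chart lies between 0x80 and 0x7FF, so UTF-8 stores it as two bytes: the first holds the top 5 bits of the code point, the second the low 6 bits. The helper below, `utf8_two_bytes`, is a hypothetical name for a hand-written version of that two-byte rule only (not a full UTF-8 encoder), checked against Python's built-in encoder:

```python
def utf8_two_bytes(ch: str) -> bytes:
    """Encode one character whose code point is 0x80..0x7FF as UTF-8.

    Hypothetical helper for illustration; real code should use str.encode.
    """
    cp = ord(ch)
    if not 0x80 <= cp <= 0x7FF:
        raise ValueError("only code points 0x80..0x7FF use the two-byte form")
    first = 0xC0 | (cp >> 6)      # 110xxxxx: top 5 bits of the code point
    second = 0x80 | (cp & 0x3F)   # 10xxxxxx: low 6 bits of the code point
    return bytes([first, second])

for ch in "АяЁё":
    print(ch, hex(ord(ch)), utf8_two_bytes(ch).hex(" "), ch.encode("utf-8").hex(" "))
# А 0x410 d0 90 d0 90
# я 0x44f d1 8f d1 8f
# Ё 0x401 d0 81 d0 81
# ё 0x451 d1 91 d1 91
```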
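This is no coincidence: Unicode deliberately gave its first 256 numbers the same characters that ISO-8859-1 (Latin-1) uses. A quick check in Python:

```python
# Characters with Unicode numbers 192..255 (0xC0..0xFF).
unicode_chars = "".join(chr(n) for n in range(0xC0, 0x100))

# The same numbers read as single bytes in Latin-1.
latin1_chars = bytes(range(0xC0, 0x100)).decode("latin_1")

print(unicode_chars[:6])              # ÀÁÂÃÄÅ
print(unicode_chars == latin1_chars)  # True

# In UTF-8 these letters take two bytes each, unlike in Latin-1.
print("À".encode("utf-8").hex(" "))   # c3 80
```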
