Article Sabit Issakhan · Aug 29, 2019 3m read

Convert Cyrillic to Latin

Hello everyone!

Recently I searched for a way to convert Cyrillic to Latin with Caché ObjectScript but didn't find anything, so I decided to write it myself.

Here is the code:

// Create the class method

ClassMethod convertRussionToEnglish(word As %String) As %String
{

    // Transliteration table: each entry is $LB(Cyrillic letter, Latin equivalent)
    Set convertArray = $LB(
        $LB("а","a"),$LB("б","b"),$LB("в","v"),$LB("г","g"),$LB("д","d"),$LB("е","e"),$LB("ё","e"),$LB("ж","zh"),$LB("з","z"),
        $LB("и","i"),$LB("й","y"),$LB("к","k"),$LB("л","l"),$LB("м","m"),$LB("н","n"),$LB("о","o"),$LB("п","p"),
        $LB("р","r"),$LB("с","s"),$LB("т","t"),$LB("у","u"),$LB("ф","f"),$LB("х","kh"),$LB("ц","ts"),$LB("ч","ch"),
        $LB("ш","sh"),$LB("щ","shch"),$LB("ы","y"),$LB("э","e"),$LB("ю","yu"),$LB("я","ya"),$LB("ъ",""),$LB("ь",""),
        $LB("А","A"),$LB("Б","B"),$LB("В","V"),$LB("Г","G"),$LB("Д","D"),$LB("Е","E"),$LB("Ё","E"),$LB("Ж","ZH"),$LB("З","Z"),
        $LB("И","I"),$LB("Й","Y"),$LB("К","K"),$LB("Л","L"),$LB("М","M"),$LB("Н","N"),$LB("О","O"),$LB("П","P"),
        $LB("Р","R"),$LB("С","S"),$LB("Т","T"),$LB("У","U"),$LB("Ф","F"),$LB("Х","KH"),$LB("Ц","TS"),$LB("Ч","CH"),
        $LB("Ш","SH"),$LB("Щ","SHCH"),$LB("Ы","Y"),$LB("Э","E"),$LB("Ю","YU"),$LB("Я","YA"),$LB("Ъ",""),$LB("Ь","")    
    )
    

    // Word to convert: the method argument, falling back to the example phrase when none is passed

    Set wordToConvert = $Get(word, "Пример для Кода")
    Set wordToConvertLength = $L(wordToConvert)
    
    Set cnt=$ListLength(convertArray)
    Set latinWord = ""
    

    // Loop over each character and look it up in the transliteration table
    for i=1:1:wordToConvertLength {
        
        Set currentChar = $E(wordToConvert,i)
        
        for j=1:1:cnt {
            Set codes=$ListGet(convertArray,j)
            Set cyrillicLetter=$ListGet(codes,1)
            Set latinLetter=$ListGet(codes,2)

            // On a match, swap the character for its Latin equivalent
            if cyrillicLetter=currentChar {
                Set currentChar = latinLetter
            }
        }
        // Characters without a match (spaces, punctuation) are appended unchanged
        Set latinWord = latinWord_currentChar
    }
    // Return the transliterated string
    Quit latinWord

}

Comments

Dmitry Maslennikov · Aug 29, 2019

Interesting. Why did you duplicate the lowercase and uppercase letters? I'm also not sure it's good to uppercase every letter of a transliterated pair when only the single original letter was uppercase. For example, Юла -> YUla looks weird. I think it should check the case of the original word: if the word is completely uppercase, the resulting word should be uppercased, but if only the first letter is uppercase, the resulting string should use $zconvert(word, "W").
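
A minimal sketch of the case handling described above. The helper name applyCase is hypothetical (it is not part of the original post), and the sketch assumes a Unicode instance where $zconvert can change the case of Cyrillic letters. It is meant to be applied word by word to the output of the conversion method:

```objectscript
/// Hypothetical helper: make the Latin result follow the case pattern of the source word.
/// source is the original Cyrillic word, latin is its transliteration.
ClassMethod applyCase(source As %String, latin As %String) As %String
{
    // Whole source word is uppercase (and longer than one letter): uppercase the whole result
    if ($length(source) > 1) && (source = $zconvert(source, "U")) {
        return $zconvert(latin, "U")
    }
    // Only the first letter is uppercase: "W" capitalizes the first letter and lowercases the rest
    set first = $extract(source)
    if first '= $zconvert(first, "L") {
        return $zconvert(latin, "W")
    }
    // Otherwise leave the transliteration as it is
    return latin
}
```

With this sketch, applyCase("Юла", "YUla") is intended to give Yula, while applyCase("ЮЛА", "YUla") is intended to give YULA.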

Sabit Issakhan · Aug 29, 2019 to Dmitry Maslennikov

I was looking for a quick solution for my task and took the letter mapping from the Internet.

Eduard Lebedyuk · Aug 29, 2019

Here's a version with less searching:

ClassMethod getDict(Output dict)
{
    kill dict
    set dict("а")="a"
    set dict("б")="b"
    set dict("в")="v"
    set dict("г")="g"
    set dict("д")="d"
    set dict("е")="e"
    set dict("ё")="e"
    set dict("ж")="zh"
    set dict("з")="z"
    set dict("и")="i"
    set dict("й")="y"
    set dict("к")="k"
    set dict("л")="l"
    set dict("м")="m"
    set dict("н")="n"
    set dict("о")="o"
    set dict("п")="p"
    set dict("р")="r"
    set dict("с")="s"
    set dict("т")="t"
    set dict("у")="u"
    set dict("ф")="f"
    set dict("х")="kh"
    set dict("ц")="ts"
    set dict("ч")="ch"
    set dict("ш")="sh"
    set dict("щ")="shch"
    set dict("ъ")=""
    set dict("ы")="y"
    set dict("ь")=""
    set dict("э")="e"
    set dict("ю")="yu"
    set dict("я")="ya"
}

/// w ##class(Test.Cyr).convertRussionToEnglish()
ClassMethod convertRussionToEnglish(word As %String = "Привет")
{
    do ..getDict(.dict)
    set out = ""
    for i=1:1:$l(word) {
        set letter = $e(word, i)
        set letterL = $zcvt(letter, "l")
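        // Characters missing from the dictionary (spaces, punctuation, Latin letters) pass through unchanged
        set outLetter = $get(dict(letterL), letterL)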
        set:letter'=letterL outLetter = $zcvt(outLetter, "U")
        set out = out _ outLetter
    }
    quit out
}
Jon Willeke · Aug 29, 2019

There is a way to do something similar in Caché and IRIS. In a Russian locale, you have access to the "KOI8R" I/O translation table. KOI8-R has the funny property that if you mask out the high-order bit, you get a sort of readable transliteration. Here's an example using a Unicode instance in the "rusw" locale:

    USER>s koi8=$zcvt("Пример для Кода","O","KOI8R")

    USER>s ascii="" f i=1:1:$l(koi8) s ascii=ascii_$c($zb($a(koi8,i),127,1))

    USER>zw ascii
    ascii="pRIMER DLQ kODA"
Evgeny Shvarov · Aug 29, 2019

Mine is faster ;)

ClassMethod RussianToEnglish(russian = "привет") As %String
{
    // Single letters are mapped with $translate; ь and ъ have no counterpart in eng, so they are dropped
    set rus="абвгдезийклмнопрстуфхыэАБВГДЕЗИЙКЛМНОПРСТУФХЫЭьЬъЪ"
    set eng="abvgdeziyklmnoprstufhyeABVGDEZIYKLMNOPRSTUFHYE"
    // Letters that expand to several Latin characters
    set rus("ж")="zh"
    set rus("ц")="ts"
    set rus("ч")="ch"
    set rus("ш")="sh"
    set rus("щ")="shch"
    set rus("ю")="yu"
    set rus("я")="ya"
    set rus("Ж")="Zh"
    set rus("Ц")="Ts"
    set rus("Ч")="Ch"
    set rus("Ш")="Sh"
    set rus("Щ")="Shch"
    set rus("Ю")="Yu"
    set rus("Я")="Ya"
    set english=$tr(russian,rus,eng)

    // Replace the remaining multi-character letters one by one
    set wow=$O(rus(""))
    while wow'="" {
        set english=$Replace(english,wow,rus(wow))
        set wow=$O(rus(wow))
    }
    return english
}

USER>w ##class(Example.ObjectScript).RussianToEnglish("Я вас любил: любовь еще, быть может, В душе моей угасла не совсем;")
Ya vas lyubil: lyubov eshche, byt mozhet, V dushe moey ugasla ne sovsem;
USER>

Eduard Lebedyuk · Aug 29, 2019 to Evgeny Shvarov

Here's my new one-liner. Now 6 times faster.

ClassMethod convertRussionToEnglish4(russian = "привет") As %String [ CodeMode = expression ]
{
$tr($zcvt($replace($replace($tr(russian, "абвгдезийклмнопрстуфхыэАБВГДЕЗИЙКЛМНОПРСТУФХЫЭЖЦЧШЮЯжцчшюяьЬъЪ", "abvgdeziyklmnoprstufhyeABVGDEZIYKLMNOPRSTUFHYE婨味䍨卨奵奡穨瑳捨獨祵祡"),"щ","shch"),"Ш","Sh"),"O","UnicodeBig"),$c(0))
}
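
The odd-looking CJK characters in the replacement string appear to be deliberate rather than encoding damage: each one has a 16-bit code whose two bytes are the ASCII codes of a Latin pair, so converting the string to big-endian UTF-16 and stripping the zero bytes leaves plain Latin text. A minimal sketch of that idea, assuming a Unicode instance (the variable names are illustrative and not from the thread):

```objectscript
    // 婨 is U+5A68: the high byte 0x5A is "Z", the low byte 0x68 is "h"
    set packed = $char($ascii("Z") * 256 + $ascii("h"))
    // Big-endian UTF-16 output writes the high byte first, then the low byte
    set bytes = $zconvert(packed, "O", "UnicodeBig")
    // A plain ASCII letter comes out as a zero byte followed by the letter
    set plain = $zconvert("a", "O", "UnicodeBig")
    // Strip the zero bytes, as the one-liner does with $tr(..., $c(0))
    write $translate(bytes _ plain, $char(0)),!
```

This is why a single $tr call can emit two Latin letters for one Cyrillic letter. Only щ and Щ, which need four letters, still go through $replace.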

Here are the tests:

do ##class(Test.Cyr).Time()
Method: convertRussionToEnglish1, time: .009022 <- original
Method: convertRussionToEnglish2, time: .000689 <- my first idea
Method: convertRussionToEnglish3, time: .000417 <- Evgeny
Method: convertRussionToEnglish4, time: .000072 <- this version
Method: convertRussionToEnglish5, time: .000124 <- Jon
 

Complete code:

Class Test.Cyr
{

/// do ##class(Test.Cyr).Time()
ClassMethod Time()
{
    set words = $lb("Дорогие", "друзья", "начало", "повседневной", "работы", "по", "формированию", "позиции", "позволяет", "оценить", "значение", "ключевых", "компонентов", "планируемого", "обновления", "Соображения", "высшего", "порядка", "а", "также", "курс", "на", "социальноориентированный", "национальный", "проект", "играет", "важную", "роль", "в", "формировании", "всесторонне", "сбалансированных", "нововведений", "Таким", "образом", "сложившаяся", "структура", "организации", "способствует", "подготовке", "и", "реализации", "позиций", "занимаемых", "участниками", "в", "отношении", "поставленных", "задач")
    for method = 1:1:5 {
        set start = $zh
        
        for i=1:1:$ll(words) {
            set result = $classmethod(,"convertRussionToEnglish" _ method , $lg(words, i))
        }
        
        set end = $zh
        
        write $$$FormatText("Method: %1, time: %2", "convertRussionToEnglish" _ method, end - start),!
        
    }
}

ClassMethod getDict(Output dict)
{
    kill dict
    set dict("а")="a"
    set dict("б")="b"
    set dict("в")="v"
    set dict("г")="g"
    set dict("д")="d"
    set dict("е")="e"
    set dict("ё")="e"
    set dict("ж")="zh"
    set dict("з")="z"
    set dict("и")="i"
    set dict("й")="y"
    set dict("к")="k"
    set dict("л")="l"
    set dict("м")="m"
    set dict("н")="n"
    set dict("о")="o"
    set dict("п")="p"
    set dict("р")="r"
    set dict("с")="s"
    set dict("т")="t"
    set dict("у")="u"
    set dict("ф")="f"
    set dict("х")="kh"
    set dict("ц")="ts"
    set dict("ч")="ch"
    set dict("ш")="sh"
    set dict("щ")="shch"
    set dict("ъ")=""
    set dict("ы")="y"
    set dict("ь")=""
    set dict("э")="e"
    set dict("ю")="yu"
    set dict("я")="ya"
}

/// w ##class(Test.Cyr).convertRussionToEnglish2()
ClassMethod convertRussionToEnglish2(word As %String = "Привет")
{
    do ..getDict(.dict)
    set out = ""
    for i=1:1:$l(word) {
        set letter = $e(word, i)
        set letterL = $zcvt(letter, "l")
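        // Characters missing from the dictionary (spaces, punctuation, Latin letters) pass through unchanged
        set outLetter = $get(dict(letterL), letterL)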
        set:letter'=letterL outLetter = $zcvt(outLetter, "U")
        set out = out _ outLetter
    }
    quit out
}

ClassMethod convertRussionToEnglish3(russian = "привет") As %String
{
    set rus="абвгдезийклмнопрстуфхыэАБВГДЕЗИЙКЛМНОПРСТУФХЫЭьЬъЪ"
    set eng="abvgdeziyklmnoprstufhyeABVGDEZIYKLMNOPRSTUFHYE"
    set rus("ж")="zh"
    set rus("ц")="ts"
    set rus("ч")="ch"
    set rus("ш")="sh"
    set rus("щ")="shch"
    set rus("ю")="yu"
    set rus("я")="ya"
    set rus("Ж")="Zh"
    set rus("Ц")="Ts"
    set rus("Ч")="Ch"
    set rus("Ш")="Sh"
    set rus("Щ")="Shch"
    set rus("Ю")="Yu"
    set rus("Я")="Ya"
    set english=$tr(russian,rus,eng)
    set wow=$O(rus(""))
    while wow'="" {
        set english=$Replace(english,wow,rus(wow))
        set wow=$O(rus(wow))
    }
    return english
}

ClassMethod convertRussionToEnglish4(russian = "привет") As %String [ CodeMode = expression ]
{
$tr($zcvt($replace($replace($tr(russian, "абвгдезийклмнопрстуфхыэАБВГДЕЗИЙКЛМНОПРСТУФХЫЭЖЦЧШЮЯжцчшюяьЬъЪ", "abvgdeziyklmnoprstufhyeABVGDEZIYKLMNOPRSTUFHYE婨味䍨卨奵奡穨瑳捨獨祵祡"),"щ","shch"),"Ш","Sh"),"O","UnicodeBig"),$c(0))
}

ClassMethod convertRussionToEnglish1(word As %String)
{

   //add array of transliteration system
    Set convertArray = $LB(
        $LB("а","a"),$LB("б","b"),$LB("в","v"),$LB("г","g"),$LB("д","d"),$LB("е","e"),$LB("ё","e"),$LB("ж","zh"),$LB("з","z"),
        $LB("и","i"),$LB("й","y"),$LB("к","k"),$LB("л","l"),$LB("м","m"),$LB("н","n"),$LB("о","o"),$LB("п","p"),
        $LB("р","r"),$LB("с","s"),$LB("т","t"),$LB("у","u"),$LB("ф","f"),$LB("х","kh"),$LB("ц","ts"),$LB("ч","ch"),
        $LB("ш","sh"),$LB("щ","shch"),$LB("ы","y"),$LB("э","e"),$LB("ю","yu"),$LB("я","ya"),$LB("ъ",""),$LB("ь",""),
        $LB("А","A"),$LB("Б","B"),$LB("В","V"),$LB("Г","G"),$LB("Д","D"),$LB("Е","E"),$LB("Ё","E"),$LB("Ж","ZH"),$LB("З","Z"),
        $LB("И","I"),$LB("Й","Y"),$LB("К","K"),$LB("Л","L"),$LB("М","M"),$LB("Н","N"),$LB("О","O"),$LB("П","P"),
        $LB("Р","R"),$LB("С","S"),$LB("Т","T"),$LB("У","U"),$LB("Ф","F"),$LB("Х","KH"),$LB("Ц","TS"),$LB("Ч","CH"),
        $LB("Ш","SH"),$LB("Щ","SHCH"),$LB("Ы","Y"),$LB("Э","E"),$LB("Ю","YU"),$LB("Я","YA"),$LB("Ъ",""),$LB("Ь","")    
    )
    

    // Word to convert: the method argument, falling back to the example phrase when none is passed

    Set wordToConvert = $Get(word, "Пример для Кода")
    Set wordToConvertLength = $L(wordToConvert)
    
    Set cnt=$ListLength(convertArray)
    Set latinWord = ""
    

    // Loop over each character and look it up in the transliteration table
    for i=1:1:wordToConvertLength {
        
        Set currentChar = $E(wordToConvert,i)
        
        for j=1:1:cnt {
            Set codes=$ListGet(convertArray,j)
            Set cyrillicLetter=$ListGet(codes,1)
            Set latinLetter=$ListGet(codes,2)

            // On a match, swap the character for its Latin equivalent
            if cyrillicLetter=currentChar {
                Set currentChar = latinLetter
            }
        }
        // Characters without a match (spaces, punctuation) are appended unchanged
        Set latinWord = latinWord_currentChar
    }
    // Return the transliterated string
    Quit latinWord
}

ClassMethod convertRussionToEnglish5(russian = "привет") As %String
{
	s koi8=$zcvt(russian,"O","KOI8R")
	s ascii=""
	f i=1:1:$l(koi8) s ascii=ascii_$c($zb($a(koi8,i),127,1))
	q ascii
}

}
Alexey Maslov · Aug 30, 2019 to Eduard Lebedyuk

Eduard,

You forgot about "Щ" in your awesome one-liner, and the $replace of "Ш" is redundant. So it should look like this:

$tr($zcvt($replace($replace($tr(russian, "абвгдезийклмнопрстуфхыэАБВГДЕЗИЙКЛМНОПРСТУФХЫЭЖЦЧШЮЯжцчшюяьЬъЪ", "abvgdeziyklmnoprstufhyeABVGDEZIYKLMNOPRSTUFHYE婨味䍨卨奵奡穨瑳捨獨祵祡"),"щ","shch"),"Щ","Shch"),"O","UnicodeBig"),$c(0))
Eduard Lebedyuk · Aug 30, 2019 to Alexey Maslov

You're right, I need to change the $replace of Ш to a $replace of Щ. Ш is already replaced in $translate anyway.

ClassMethod convertRussionToEnglish4(russian = "привет") As %String [ CodeMode = expression ]
{
$tr($zcvt($replace($replace($tr(russian, "абвгдезийклмнопрстуфхыэАБВГДЕЗИЙКЛМНОПРСТУФХЫЭЖЦЧШЮЯжцчшюяьЬъЪ", "abvgdeziyklmnoprstufhyeABVGDEZIYKLMNOPRSTUFHYE婨味䍨卨奵奡穨瑳捨獨祵祡"),"щ","shch"),"Щ","Shch"),"O","UnicodeBig"),$c(0))
}
Alexey Maslov · Aug 30, 2019 to Eduard Lebedyuk

The actual rules used for transliterating names and surnames are more complex, because they can depend on phonetics. E.g. "Егор" -> "Egor", but "Иеремия" -> "Iyeremiya".
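
One way to approximate the behaviour in these two examples is a pre-pass that rewrites "е" as "ye" when it follows a vowel, before the regular letter mapping runs. The sketch below is only an assumption about the rule, inferred from the two examples. The method name translitE is hypothetical, and real transliteration standards cover more cases:

```objectscript
/// Hypothetical pre-pass: lowercase "е" becomes "ye" after a vowel and is left alone elsewhere.
/// All other characters are returned unchanged for a later transliteration step.
ClassMethod translitE(word As %String) As %String
{
    set vowels = "аеёиоуыэюя"
    set out = ""
    for i=1:1:$length(word) {
        set ch = $extract(word, i)
        // Previous letter, lowercased so that an uppercase vowel also counts
        set prev = $zconvert($extract(word, i - 1), "L")
        // The i > 1 check matters: an empty prev would otherwise match the contains test
        if (ch = "е") && (i > 1) && (vowels [ prev) {
            set out = out _ "ye"
        } else {
            set out = out _ ch
        }
    }
    return out
}
```

With this sketch, "Иеремия" is meant to become "Иyeремия", which a converter that leaves Latin letters untouched (such as the $tr-based ones) then turns into Iyeremiya, while "Егор" passes through unchanged. An uppercase "Е" after a vowel is not handled here.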
