How to remove accentuation?
How to remove accentuation of a word?
Ex:
Árvore = Arvore
você = voce
Então = entao
The words above are in Brazilian Portuguese. I need to get rid of the accents so that I can compare two sentences.
Thanks in advance.
Comments
Use $translate?
e.g.
ClassMethod NoAccents(stringWithAccents As %String) As %String
{
    write "before: ",stringWithAccents,!
    // Each character in accent is replaced by the character at the same position in usual
    set accent="Áêã",usual="Aea"
    set val=$translate(stringWithAccents,accent,usual)
    write "after: ",val,!
    return val
}
To handle this in the general case, you would decompose the string, then strip out non-spacing marks. Unicode normalization has been requested previously, and will hopefully make it into the product at some point.
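To illustrate the decompose-then-strip idea, here is a minimal sketch in Python rather than ObjectScript, since normalization is not built into the product as noted above. It relies on the standard-library unicodedata module; the function names strip_accents and same_ignoring_accents are made up for this example, and the case-insensitive comparison is an assumption about what "compare two sentences" should mean.

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Return text with accents removed (hypothetical helper name)."""
    # NFD decomposition splits each accented letter into a base letter
    # followed by one or more combining marks, e.g. "Á" -> "A" + U+0301.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop the non-spacing marks (Unicode general category "Mn").
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def same_ignoring_accents(a: str, b: str) -> bool:
    """Compare two strings ignoring accents and case (assumed requirement)."""
    return strip_accents(a).casefold() == strip_accents(b).casefold()


if __name__ == "__main__":
    for word in ("Árvore", "você", "Então"):
        print(word, "=", strip_accents(word))
    print(same_ignoring_accents("Então", "entao"))
```

Unlike a fixed translation table, this should cover any accented letter that has a canonical decomposition, so there is no character list to maintain.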
Another option is to use a regular expression, like this:
ClassMethod ReplaceAccents(ByRef pWord As %String) As %Status
{
    Set tSC = $$$OK
    Try {
        Set dictionary = ##class(%ArrayOfDataTypes).%New()
        Do dictionary.SetAt("ÀÁÂÃÄÅ","A")
        Do dictionary.SetAt("àáâãäå","a")
        Do dictionary.SetAt("ÈÉÊË","E")
        //.... all the rest
        // GetNext returns the stored characters and advances key by reference; key is "" after the last entry
        Set key = ""
        For {
            Set chars = dictionary.GetNext(.key)
            Quit:key=""
            Set matcher = ##class(%Regex.Matcher).%New("["_chars_"]", pWord)
            Set pWord = matcher.ReplaceAll(key)
        }
    } Catch tException {
        Set:$$$ISOK(tSC) tSC = tException.AsStatus()
    }
    Quit tSC
}
This is a very delayed answer to an old question, but there is now a $zconvert mode in IRIS that will do this for you:
> write $zconvert("Árvore", "A")
Arvore
WOW! Nice!
But...is there any reason why this is not documented?
When was it introduced?
A quick test shows it is not available in Caché-based products (version 2018) but is available in 2022.1.
At the moment I cannot test versions 2019 to 2021.