Non-ASCII characters are treated as a question mark inside the regex engine. Is this a bug?
The following regex matches, although I think it should not:
write $match("♥","\?") // prints 1 (unexpected)
It should only match the "?" character. I think what happens is that the ♥ character is converted to "?" (because it is outside the regular ASCII range) before the regex engine evaluates it.
In fact, this gives a clue about what happens:
write $char($ascii("♥")) // prints ?
Is this a well-known IRIS limitation? Are there workarounds? Should I report it to InterSystems?
In my case, a way to detect non-ASCII characters in a string would be good enough. I'm not sure that is possible if every string function treats those special characters as "?".
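The suspected mechanism can be reproduced outside IRIS. The Python sketch below is not IRIS code. It assumes that Python's "latin-1" codec with errors="replace" behaves like an 8-bit Latin1 translation table that substitutes "?" for unmappable characters. It shows three things: ♥ has code point 9829, a lossy conversion to an 8-bit charset turns it into "?" (code 63), and a regex for a literal "?" then matches. It also includes a simple non-ASCII check (the helper names are hypothetical). The check works on the original string but not on the converted one, which confirms the concern above: detection must happen before any lossy conversion.

```python
import re


def to_8bit(s: str, charset: str = "latin-1") -> str:
    """Simulate a lossy 8-bit translation: unmappable characters become '?'."""
    return s.encode(charset, errors="replace").decode(charset)


def has_non_ascii(s: str) -> bool:
    """Return True if any character lies outside 7-bit ASCII (code point > 127)."""
    return any(ord(ch) > 127 for ch in s)


heart = "\u2665"  # the ♥ character
converted = to_8bit(heart)

print(ord(heart))                            # 9829
print(ord(converted))                        # 63, i.e. '?'
print(bool(re.fullmatch(r"\?", heart)))      # False: ♥ is not '?'
print(bool(re.fullmatch(r"\?", converted)))  # True: after conversion it is '?'
print(has_non_ascii(heart), has_non_ascii(converted))  # True False
```

In this simulation the regex is not at fault. The match only appears after the lossy conversion.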
Comments
I do not see this behavior on my instance.
USER>write $char($ascii("♥"))
♥
USER>write $match("♥","\?")
0
USER>zw $zv
"IRIS for Windows (x86-64) 2024.1.2 (Build 398U) Thu Oct 3 2024 14:01:59 EDT"
What is the value of $ascii("♥") for you? Are you executing this from an IRIS terminal?
write $ascii("♥") gives me 63 (the question mark). I'm running this in the Studio output window (same as the OP).
Running this in the Terminal works!
I get the expected result, the same as yours.
It's interesting that this behaves differently in the Studio output window, but since Studio is deprecated I wouldn't expect anything to come of reporting it. Hopefully this doesn't affect your use case; you were just trying to test something from within Studio, right?
I also ran the following test: I created a routine containing write $ascii("♥") and called it from outside (e.g. from the Studio console). It works, so server-side code also works.
However, I have an IRIS server where write $ascii("♥") always returns 63, even in code and in the Terminal. Is there a setting somewhere in the Management Portal for UTF-8 support?
EDIT: I found where it's defined: in the NLS (National Language Support) settings.
The server has Latin1 defined, while the working local workstation has UTF-8. You can define different translation tables per category: Terminal, Process, ...
I can't reproduce the problem.
I seriously doubt it is a %Regex issue. The "?" appears when some interface converts characters to one of the many 8-bit character sets and the source character has no equivalent in the destination character set. This can happen to roughly 2^20 (about a million) Unicode characters when they are converted to any 8-bit character set. It can also happen when converting from one 8-bit character set to a different one.
Your terminal emulator and the IRIS terminal device can both perform such conversions. An IRIS file device can do so as well. Different platforms use different default conversions, which explains why some people cannot reproduce other people's results.
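To illustrate the last point with a simulation outside IRIS: the Python sketch below uses Windows-1252 ("cp1252") and ISO-8859-1 ("latin-1") as two example 8-bit defaults. These charsets are chosen only for illustration and are not necessarily the tables any particular IRIS instance uses. The same character survives one conversion and becomes "?" in the other. A byte that is valid in one 8-bit set can also be lost when it is re-encoded into another.

```python
# The same character encoded with two different 8-bit "default" charsets.
euro = "\u20ac"  # EURO SIGN
print(euro.encode("cp1252", errors="replace"))   # b'\x80': exists in Windows-1252
print(euro.encode("latin-1", errors="replace"))  # b'?': absent from ISO-8859-1

# 8-bit -> 8-bit: byte 0x80 read as Windows-1252, then re-encoded as Latin-1.
raw = b"\x80"
print(raw.decode("cp1252").encode("latin-1", errors="replace"))  # b'?'
```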
Norman,
If the Terminal works for you but Studio doesn't, maybe you should try the VS Code Lite Terminal.
I'm using: IRIS for Windows (x86-64) 2024.1.4 (Build 516_1U) Sat May 31 2025 13:15:40 EDT
I get the same result in the Terminal (Telnet) and in the VS Code Lite Terminal:
MSCEME>write $ascii("♥")
9829