Question Norman W. Freeman · Nov 5

Non-ASCII characters are treated as a question mark by the regex engine. Is this a bug?

The following regex matches, although I think it should not:

write $match("♥","\?") // prints 1 (unexpected)

It should only match the '?' character. I think what happens is that the ♥ character gets converted to '?' (because it is outside the ASCII range) before the regex engine sees it.

In fact, this gives a clue about what happens:

write $char($ascii("♥")) // prints '?'

Is this a well-known IRIS limitation, and are there workarounds? Should I report it to InterSystems?

In my case, a way to detect non-ASCII character codes in a string would be good enough. I'm not sure that is possible if all string functions treat those special characters as '?'.
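For the narrower goal of detecting characters outside the ASCII range, the check itself is simple: look for any character whose code is above 127. The minimal sketch below is in Python only because it runs outside IRIS; the function names are hypothetical. In ObjectScript the analogous idea would be looping over the string and testing whether $ascii(str, i) is greater than 127. One caveat follows from the rest of the thread: if an I/O translation has already turned ♥ into '?' (code 63) before the string reaches your code, no check can recover the original character.

```python
def non_ascii_positions(text: str) -> list:
    """Return (1-based position, character, code point) for each character above 127."""
    return [
        (pos, ch, ord(ch))
        for pos, ch in enumerate(text, start=1)  # 1-based, like ObjectScript $extract
        if ord(ch) > 127
    ]


def has_non_ascii(text: str) -> bool:
    """True if any character falls outside the 7-bit ASCII range (codes 0-127)."""
    return not text.isascii()  # str.isascii() is available from Python 3.7


print(non_ascii_positions("I ♥ IRIS"))  # [(3, '♥', 9829)]
print(has_non_ascii("plain text"))     # False
```

A literal '?' is plain ASCII, so a string that has already been converted will pass this check as "clean".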

Product version: IRIS 2024.1
$ZV: IRIS for Windows (x86-64) 2024.1.2 (Build 398U) Thu Oct 3 2024 14:01:59 EDT

Comments

Josh Bone · Nov 5

I do not see this behavior on my instance.

USER>write $char($ascii("♥"))

USER>write $match("♥","\?")
0
USER>zw $zv
"IRIS for Windows (x86-64) 2024.1.2 (Build 398U) Thu Oct 3 2024 14:01:59 EDT"

What is the value of $ascii("♥") for you? Are you executing this from an IRIS terminal?

Norman W. Freeman  Nov 5 to Josh Bone

write $ascii("♥") gives me 63 (the code for a question mark). I'm running this in the Studio output window (as in my original post).

Running this in the Terminal works! I get the expected result, the same as yours.

Josh Bone  Nov 5 to Norman W. Freeman

It's interesting that this behaves differently in the Studio output window, but since Studio is deprecated I wouldn't expect anything to come of reporting this. Hopefully this doesn't affect your use case; you were just trying to test something from within Studio, right?

Norman W. Freeman  Nov 5 to Josh Bone

I also tried the following: I created a routine containing write $ascii("♥") and called it from outside (e.g. the Studio console). It works, so server-side code is fine too.

However, I have an IRIS server where write $ascii("♥") always returns 63, even in code and in the Terminal. Is there a setting somewhere in the Management Portal for UTF-8 support?
EDIT: I found where it is defined: in the NLS (National Language Support) settings.

The server has Latin1 defined, while my working local machine has UTF-8. You can define different tables per category: Terminal, Process, ...
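The difference between the two machines comes down to which characters each table can represent. As a rough illustration, assuming the IRIS Latin1 table corresponds to ISO-8859-1 (Python's "latin-1" codec), a hypothetical helper can test whether a character survives a strict conversion: é is part of Latin-1, while ♥ is not.

```python
def fits_in(text: str, encoding: str) -> bool:
    """True if every character of text can be encoded in the given character set."""
    try:
        text.encode(encoding)  # errors="strict" (the default) raises on unmappable characters
        return True
    except UnicodeEncodeError:
        return False


for ch in ("é", "♥"):
    print(ch, "latin-1:", fits_in(ch, "latin-1"), "utf-8:", fits_in(ch, "utf-8"))
# é latin-1: True utf-8: True
# ♥ latin-1: False utf-8: True
```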

Steven Hobbs · Nov 6

I seriously doubt it is a %Regex issue. The ? appears when some interface converts characters to one of the many 8-bit character sets and the source string contains a character that does not exist in the destination character set. This can happen to about 2**20 Unicode characters when they are converted to any 8-bit code. It can also happen when converting from one 8-bit code to a different 8-bit code.

Your terminal emulator and the IRIS terminal device can both do such conversions. An IRIS file device can also do them. Different platforms can use different default conversions, which explains why some people cannot reproduce other people's results.
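The substitution described above is easy to reproduce outside IRIS. This Python sketch is only an analogy for what an IRIS I/O translation table does: converting ♥ to an 8-bit character set with a "replace" error policy yields '?', and converting back cannot restore the original, so later code (including a regex) only ever sees code 63.

```python
heart = "♥"  # U+2665, code point 9829

# With errors="replace", Python's encoders substitute b"?" for each unmappable character.
latin1_bytes = heart.encode("latin-1", errors="replace")
print(latin1_bytes)                 # b'?'

# Decoding back does not restore the heart; the character is now '?' (code 63).
round_trip = latin1_bytes.decode("latin-1")
print(round_trip, ord(round_trip))  # ? 63

# Windows code page 1252, another common 8-bit default, has no heart either.
print(heart.encode("cp1252", errors="replace"))  # b'?'
```

This is why $match("♥","\?") returns 1 in an affected session: by the time the regex runs, the subject string really is "?".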

Yaron Munz · Nov 6

Norman,
If the Terminal works for you but Studio doesn't, maybe you should try the VS Code Lite Terminal.

I'm using: IRIS for Windows (x86-64) 2024.1.4 (Build 516_1U) Sat May 31 2025 13:15:40 EDT
I get the same result in Terminal (Telnet) and in the VS Code Lite Terminal:
MSCEME>write $ascii("♥") 
9829
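For comparison with the 63 reported earlier in the thread, 9829 is the decimal code point of U+2665 BLACK HEART SUIT. Because it is above 255, it cannot be stored in any 8-bit character set and only survives in a Unicode (e.g. UTF-8) translation path. A quick Python check of those numbers:

```python
import unicodedata

heart = "♥"
print(ord(heart))                # 9829
print(hex(ord(heart)))           # 0x2665
print(unicodedata.name(heart))   # BLACK HEART SUIT
print(ord(heart) > 255)          # True: outside every 8-bit range
print(ord("?"))                  # 63: what a lossy 8-bit conversion leaves behind
```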
 

0