Getting URL detector RegEx to work with Caché %Regex.Matcher
I am trying to write some code that takes in a string and performs a server-side transformation to find embedded URLs and replace them with clickable links. I found the following regex for JavaScript, which is highly rated on StackOverflow:
replacePattern1 = /(\b(https?|ftp):\/\/[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|])/gim;
replacedText = inputText.replace(replacePattern1, '<a href="$1" target="_blank">$1</a>');
I tried to do the following in Caché ObjectScript, but it's not working:
set matcher=##class(%Regex.Matcher).%New("/(\b(https?|ftp):\/\/[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|])/gim",string)
set string = matcher.ReplaceAll("<a href='$1' target='_blank'>$1</a>")
After I run the first line, matcher.Locate() always returns 0 (no matches).
I tested the RegEx on https://regex101.com/ and confirmed that it is finding the groups just as I expected it would. But it isn't working within Caché.
I admit that I am not a RegEx expert (but would like to learn).
Can anyone shed light on why this isn't finding any matches in Caché when it does in JS? I can't even get it to work on a simple case:
s string="http://www.google.com"
Thanks in advance for the help!
Ben
Comments
Here's a solution that works for me:
s string="http://www.google.com"
set matcher=##class(%Regex.Matcher).%New("(\b(https?|ftp)://[-A-Za-z0-9+&@#/%?=~_|!:,.;]*[-A-Za-z0-9+&@#/%=~_|])",string)
w matcher.ReplaceAll("<a href='$1' target='_blank'>$1</a>")
Key changes:
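- Remove the enclosing / /gim - the delimiters and trailing flags are part of the regex literal syntax in JS, not part of the pattern itself.
- Add a-z to both character classes for case insensitivity (this stands in for the i flag).
- Remove JS escaping of slashes (not necessary outside a JS regex literal).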
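The first point can be checked in JavaScript itself. This is only a sketch of the idea, not the Caché code: it builds the regex from a string with `new RegExp`, which resembles how %Regex.Matcher receives its pattern. The variable names are made up for the illustration.

```javascript
// The URL pattern from the question, with no delimiters or flags.
const source = String.raw`(\b(https?|ftp):\/\/[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|])`;
const input = "http://www.google.com";

// Flags passed separately: equivalent to the /.../gim literal.
const asLiteral = new RegExp(source, "gim");

// Delimiters and flags pasted into the pattern text, as in the failing call:
// the engine now requires a literal "/" before the URL and "/gim" after it.
const asPastedText = new RegExp("/" + source + "/gim");

const literalMatches = asLiteral.test(input);
const pastedMatches = asPastedText.test(input);
console.log(literalMatches, pastedMatches); // true false
```

When the slashes and gim arrive as pattern text, they are treated as ordinary characters to match, which is consistent with Locate() returning 0 in the question.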
Thanks Tim!! Very helpful.
One question/comment: your approach doesn't allow for case insensitivity of the http(s)/ftp prefix. I would prefer to set the case-insensitivity flag for the whole pattern.
According to the ICU documentation (http://userguide.icu-project.org/strings/regexp#TOC-Flag-Options):
[quote]
(?ismwx-ismwx): Flag settings. Change the flag settings. Changes apply to the portion of the pattern following the setting. For example, (?i) changes to a case insensitive match.
[/quote]
So I was able to make it work as follows:
set matcher=##class(%Regex.Matcher).%New("(?i)(\b(https?|ftp)://[-A-Z0-9+&@#/%?=~_|!:,.;]*[-A-Z0-9+&@#/%=~_|])",string)
set string = matcher.ReplaceAll("<a href='$1' target='_blank'>$1</a>")
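To see the difference between the two approaches, here is a rough JavaScript comparison. JavaScript's `i` flag is used as a stand-in for ICU's (?i), so this only illustrates the matching behaviour and is not the Caché call itself. The sample text and variable names are assumptions for the example.

```javascript
const html = "<a href='$1' target='_blank'>$1</a>";

// Widened character classes only: the http(s)/ftp prefix stays case-sensitive.
const widened = /(\b(https?|ftp):\/\/[-A-Za-z0-9+&@#\/%?=~_|!:,.;]*[-A-Za-z0-9+&@#\/%=~_|])/g;

// Case-insensitive flag for the whole pattern (JS `i`, standing in for ICU's (?i)).
const flagged = /(\b(https?|ftp):\/\/[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|])/gi;

const sample = "See HTTP://www.google.com for details";

const withWidened = sample.replace(widened, html);
const withFlagged = sample.replace(flagged, html);

console.log(withWidened); // unchanged: the uppercase scheme is not matched
console.log(withFlagged); // the URL is wrapped in an anchor tag
```

With an uppercase scheme, only the flagged pattern produces a link, which is the gap pointed out above.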
Thanks for the tips and pointing me in the right direction!
Nice! That's even better.