Sean Connelly · Apr 21, 2017

Hi Nikita,

Thanks for your detailed response. Good to hear a success story with WebSockets. Many of the problems that I have read about are very fringe cases. Most firewalls seem to allow the outgoing TCP sockets because they run over port 80 or 443, but there appear to be fringe cases where a firewall blocks the traffic, and certain types of AV software can block it too. I suspect these problems are more prominent in the Node.js community because Node is more prevalent than Caché, and because Caché is more likely to sit inside the firewall with the end users.

The main problem I still have is that I work on Caché and Ensemble inside healthcare organisations in the UK, and they are always behind on browser versions for various reasons. Only recently have I been able to stop developing applications that needed to work on IE6. Many are still on IE8 or IE9 (sometimes running in IE7 emulation mode). Either way, WebSockets only work on IE10+. I can work around most browser problems with polyfills, but sockets require both a client and a server solution. That means you can't just drop in sockjs as an automatic fall-back library, because there is no server-side implementation for it on Caché.

Without any library built for Caché, I am thinking that what is needed is a simple native emulation of the client socket library that falls back to a long-poll implementation against Caché. If I then hit a scalability problem, it would be time to put Node.js in front of Caché, with all the additional admin overhead that brings. A nice problem to have, all the same. Still, I suspect it would take a large number of users and sockets to swamp Caché's resources.
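
For what it's worth, the sort of thing I have in mind on the Caché side is just a plain CSP page that holds the request open until there is something to send. A rough sketch only, with a made-up class name and a made-up ^MyMessageQueue global standing in for whatever the real message source would be:

Class My.LongPollPage Extends %CSP.Page
{

ClassMethod OnPage() As %Status
{
    // ^MyMessageQueue(sessionId) is an assumed per-session queue global
    set id=%session.SessionId
    for i=1:1:25 {
        set seq=$order(^MyMessageQueue(id,""),1,data)
        if seq'="" {
            // deliver the oldest queued message and return immediately
            kill ^MyMessageQueue(id,seq)
            write data
            quit
        }
        // nothing queued yet, so wait a second and check again (crude but simple)
        hang 1
    }
    // an empty response just tells the client to poll again
    quit $$$OK
}

}

The matching client shim would simply re-issue the request whenever it gets a response (or an empty timeout response) back. Bear in mind each waiting client ties up a Caché process for the duration of the poll, which is where the scalability concern above comes in.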

Your WebTerminal looks very good. Not sure why I have not seen that before; it looks like something I would use. I'm not sure why we can't have a web-based IDE for Caché when I see a complex working web application such as this. I even have my own web-based IDE that I use for that other M database (not sure I can mention it here :) which I keep thinking of porting to Caché.

Sean Connelly · Apr 21, 2017

Good to know QEWD has been using WebSockets for a long time. Socket.io is a well-maintained library with all of the fall-back features that I worry are missing with Caché sockets alone, which makes it a lot easier when putting Node.js in front of Caché. I guess I just want to avoid any additional moving parts, as I bash out as many web applications per year as I have users on some of them. That's why I never use the likes of Zen; I just need things simple and fast, and I typically avoid WebSockets for fear of headaches using them. But it is 2017, and we can only hope the NHS masses will soon move on to an evergreen browser.

Do you have benchmarks for QEWD? I have my own Node-to-Caché binding that I developed for an RPC solution, and I found I could get twice as much throughput of marshalled TCP messages compared to CSP requests. But then I can never be sure with these types of benchmarks unless they are put into a production environment.

Sean Connelly · Apr 21, 2017

I had some interesting results using just a TCP binding between Node and Caché.

With just one single Node process and one single Caché worker process I was able to process 1200 JSON-RPC 2.0 messages per second. This included Caché de-serialising the JSON, calling its internal target method, writing and reading some random data and then passing back a new JSON object. Adding a second Caché process nearly doubled that throughput.

I was running Node, Caché and the stress-test tool on the same desktop machine with lots of other programs running. I started to hit limits that seemed to be related to the test tool, so I wasn't sure how high I could take these benchmarks with this set-up.

Interestingly, when I bypassed Node and used a CSP page to handle the requests, I could only get the same test set-up to process 340 messages per second. This I couldn't understand. I am sure it was to do with the test tool, but I could not work out how to get this higher. I would have expected Caché to spin up lots of processes and to see more than the 1200 that were limited by one process.

It did make me wonder whether, no matter how many processes you have, you can only really process two or three at a time per four CPU cores, and whether Node was just much faster at dealing with the initial HTTP request handling, or whether spreading the load between the two was a good symbiotic performance gain. Still, I was not expecting such a big difference.

Now, I would have thought that if you put Node.js on one box and Caché on a second box, so they don't compete for the resources they each need most, the TCP connection would be much more efficient than binding Node and Caché in the same process on the same box?

Sean Connelly · Apr 21, 2017

Hi Nikita,

Sounds like an interesting plan.

I've developed my own desktop-grade UI widget library. I used to use ExtJS but got fed up with the price and the speed. I've got it to the stage where I can build the shell of an IDE that could be mistaken for a thick-client installation. If you right-click the image below and open it in a new tab, you will see that it has all the features you would expect of an IDE: split-panel editing, drag panels, accordions, trees, menus, border layouts etc, and an empty panel that needs a terminal!

I have syntax highlighting working for a variation of M that I have been working on for a few years. I can get the syntax highlighting working for COS no problem (well, long form at least).

The hard stuff would be getting things like the Studio inspector working like-for-like. Lots of back-end COS required, etc.

I've still got a few months of work left on an RPC messaging solution for the UI, but once that's done I would be open to collaborating on a back-end implementation for Caché.

Sean.

Sean Connelly · Apr 21, 2017

I can remember reading your previous post about the bottleneck. I just wonder if such a fix would make V8 less secure and, as such, would be unlikely to happen under the current architecture.

I did think about the problem for a while and I managed to get a basic AST-based JavaScript-to-Mumps compiler working. It's just a proof of concept and would require much more effort. Coercion mismatch is the biggest barrier. I came up with 1640 unit tests alone as an exercise to figure out how big the task would be, and that's just 1/20th of the overall unit tests it would require.

Effectively you would write JavaScript stored procedures inline that would run on the Caché side. I'm still not sure of the exact syntax, but it might be something like...

db("namespace").execute( () => {
  //this block of code is pre-compiled into M
  //the code is cached with a guid and that's what execute will call
 //^global references are allowed because its not real JavaScript
  var data=[];
  for k1 in ^global {
    for k2 in ^global(k1) {
      data.push(^global(k1,k2));
    }
  }
  return data;
}).then( data => {
  //this runs in node, data is streamed efficiently in one go
  //and not in lots of costly small chunks
  console.log(data)
})


In terms of benchmarks, I would like to see the likes of Caché and its ilk on here...

https://www.techempower.com/benchmarks/#section=data-r13&hw=ph&test=upd…
 

Sean Connelly · Apr 26, 2017

No problem. I know you said you didn't want to write your own custom code, so this is for anyone else landing on the question.

If you use the DTL editor (which I advise even for ardent COS developers), then you will most likely use the helper methods in Ens.Util.FunctionSet that your DTL extends from, e.g. ToUpper, ToLower etc.

Inevitably there will be other common functions that will be needed time and time again. A good example would be selecting a patient ID from a repeating field with no known order. This can be achieved with a small amount of DTL logic, but many developers will break out and write some custom code for it. The problem, however, is that I see each developer doing their own thing, and a system ends up with custom DTL logic all over the place, often repeating the same code over and over again.

The answer is to have one class for all of these functions and make that class extend Ens.Rule.FunctionSet. By extending this class, all of the ClassMethods in that class will magically appear in the drop-down list of the DTL function wizard. This way all developers across the team, past and future, will see exactly what methods are available to them.

To see this in action, create your own class, something like this...

Class Foo.FunctionSet Extends Ens.Rule.FunctionSet
{
  ClassMethod SegmentExists(pSegment) As %Boolean
  {
      Quit pSegment'=""
  }
}


Then create a new HL7 DTL. Click on any segment on the source and add an "If" action to it. Now, in the action tab, click on the condition wizard and select the function drop-down box. The SegmentExists function will magically appear in the list. Select it and the wizard will inject the segment value into this function.

Whilst developers feel the need to type these things out by hand, they will never beat the precision of using these types of building blocks and tools. It also means that you can have data mappers with strong business knowledge, but not so broad programming skills, bashing out these transforms.

Sean Connelly · Apr 27, 2017

That's interesting. The REST page is just a %CSP.Page so you would think it would. I'll have a dig around.

Sean Connelly · Apr 27, 2017

Ahhh OK, the Page() method has been overridden so we lose all of the %CSP.Page event methods.

Since there are no event methods, the only thing you can do is override either the Page() method or DispatchRequest(). At least the headers are not written to until after the method call.

I guess what you are doing will be as good as it gets. My only worry is if the implementation changes in a later release.

Ideally the class should have an OnBeforeRequest() and an OnAfterRequest() set of methods.
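
Something along these lines is the sort of wrapper I mean. A rough sketch only: the hook names are made up, the UrlMap is omitted, and it assumes the current Page() implementation is safe to call via ##super():

Class My.REST Extends %CSP.REST
{

/// Wrap the standard dispatch with pre/post hooks of our own
ClassMethod Page(skipheader As %Boolean = 1) As %Status
{
    do ..OnBeforeRequest()         // hypothetical hook
    set sc=##super(skipheader)     // the normal %CSP.REST dispatch
    do ..OnAfterRequest()          // hypothetical hook
    quit sc
}

ClassMethod OnBeforeRequest()
{
    // e.g. stash a start time, validate a custom header, etc.
}

ClassMethod OnAfterRequest()
{
    // e.g. write an audit/log entry
}

}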

Sean Connelly · Apr 27, 2017

Hi Evgeny,

I couldn't help noticing that this post has a -1 rating.

I can't see anything wrong with the post, so I am curious what the reason for a down-vote would be.

On Stack Overflow they explain the use of down-voting as...

> When should I vote down?
> Use your downvotes whenever you encounter an egregiously sloppy, no-effort-expended post, or an answer that is clearly and perhaps dangerously incorrect.
> You have a limited number of votes per day, and answer down-votes cost you a tiny bit of reputation on top of that; use them wisely.


This post has an interesting conversation around down-votes...

https://meta.stackexchange.com/questions/135/encouraging-people-to-expl…

As the original poster explains...

> Where the down-vote has been explained I've found it useful & it has improved my answer, or forced me to delete the answer if it was totally wrong

This initiated a change that prompts down-voters with the message, "Please consider adding a comment if you think the post can be improved".

We all make mistakes, so it's good to get this type of feedback. It also helps anyone landing on the answer months or years down the line who doesn't realise the answer they are trying to implement has a mistake. Perhaps the DC site could do with something similar?

Sean.

Sean Connelly · Apr 27, 2017

In relation to Stack Overflow, I think the intention was to stop people abusing the implementation of the down-vote.

I'm not sure that's relevant here, unless the main DC site started displaying some kind of reputation score in the same way as stackoverflow.

> Then downvotes will never be used, they why even have them?

I think "dangerously incorrect" is a poor description and perhaps needs down-voting :)

It should just say "contains incorrect information".

Despite that, there are 1,000,000+ users on Stack Overflow regularly using the down-vote, so it must be working.

Sean Connelly · May 12, 2017

Thanks Evgeny,

As it's config data, it's a shame to have it converted to Base64 and then wrapped up in XML.

It would be nice to see it as JSON in clear text so that it displays nicely on GitHub.

I guess this might not work for certain types of binary data, but it should work in this instance.

Sean Connelly · May 12, 2017

There is no command-line utility that I know of to do this.

Classes are compiled down to routines, and there is a command-line utility for listing those...

Do ^%RD

Whilst there is no utility for classes, you can write your own that queries %Dictionary.CompiledClass; e.g. in SQL you can do...

select * from %Dictionary.CompiledClass
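
A small wrapper that runs that query from COS and skips the obvious system packages might look something like this (a rough sketch only, reusing the same name-prefix filter as the one-liner further down):

ClassMethod ListAppClasses()
{
    // run the dictionary query and skip the obvious system packages
    set stmt=##class(%SQL.Statement).%New()
    quit:$$$ISERR(stmt.%Prepare("SELECT Name FROM %Dictionary.CompiledClass"))
    set rs=stmt.%Execute()
    while rs.%Next() {
        set name=rs.%Get("Name")
        if $extract(name,1)'="%",$extract(name,1,3)'="Ens",$extract(name,1,2)'="HS" {
            write !,name
        }
    }
}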


There is also a quick and dirty way to list class names directly from the underlying storage global, using the following from the command line...

set cn="" for  set cn=$order(^oddCOM(cn)) q:cn=""  if $e(cn,1)'="%",$e(cn,1,3)'="Ens",$e(cn,1,2)'="HS" write !,cn


*It's a one-liner but it wraps in the comments (cut and paste it into the terminal as one line).
It's a bit of a hack, but it should work. I have put in a filter so that it does not list the system classes that I can think of; you might want to tinker with this to fit your needs. Remember to honor the spaces (there are two after the for and two after the quit).

WARNING

You mentioned that you have created three classes in the %SYS namespace.

Be careful doing this as you can lose your code doing an upgrade.

If you want to have common code that is accessible from all of your other namespaces then it is better to create a common namespace and then map that namespace to the others.
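
For reference, a package mapping can also be created programmatically from the %SYS namespace, something along these lines (a rough sketch only; the USER namespace, COMMONDATA database and MyCommon package names are just examples, so check Config.MapPackages against your version). Empty output from the last line means it worked:

new $namespace
set $namespace="%SYS"
set props("Database")="COMMONDATA"
set sc=##class(Config.MapPackages).Create("USER","MyCommon",.props)
write $system.Status.GetErrorText(sc)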

Sean

Sean Connelly · May 12, 2017

I would hope none of this information is being dumped on a live server; dev server only.

Sean Connelly · May 12, 2017

> What is your definition of a few k? Each line is about 25000 KB.

Do you mean 25,000 characters (25K)?

> Previous comment recommended processing the file as a single message stream. That ended up slowing the message viewer so much for these large messages that that it is impossible to view the message at all.

You can override the content display method on your Ens.Request class so that it doesn't display the entire message, and replace it with a small summary about the file: size, number of records, etc.
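
Something like this, for example. A rough sketch only: the class and property names are made up, and it assumes your request class picks up %ShowContents() from Ens.Util.MessageBodyMethods, which is worth checking against your version:

Class My.LargeFileRequest Extends Ens.Request
{

Property FilePath As %String(MAXLEN = 500);

Property RecordCount As %Integer;

/// Show a short summary in the message viewer instead of the full content
Method %ShowContents(pZenOutput As %Boolean = 0)
{
    write "<div>File: ",$zconvert(..FilePath,"O","HTML"),"</div>"
    write "<div>Records: ",..RecordCount,"</div>"
}

}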

Creating 500,000 Ensemble messages is going to generate a lot of IO that you probably don't need.

I would still recommend processing them as one file. 

Sean Connelly · May 16, 2017

Images are fine for me.

It looks like they are hosted on Google Drive; perhaps you're behind an HTTP proxy that blocks that domain?

Sean Connelly · May 18, 2017

Hi Shobha,

I've just had a couple of attempts using the same WSDL and package name that you are using, as well as trying a few variations in the settings.

On each attempt the wizard outputs a successful generation for me and I was unable to re-create the error that you are seeing.

The problem might be related to your specific version of Ensemble, or there could be something odd going on with dictionary mappings that needs to be looked at.

I would recommend contacting your local InterSystems support for more help on this matter.

Sean.

Sean Connelly · May 25, 2017

Nice solution, I've not seen RunCommandViaCPIPE used before.

Looking at the documentation it says...

"Run a command using a CPIPE device. The first unused CPIPE device is allocated and returned in pDevice. Upon exit the device is open; it is up to the caller to close that device when done with it."

Does this example need to handle this?
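
i.e. something like this, I would have thought (a rough sketch only, using just the first three arguments covered by the documentation quote above):

ClassMethod RunAndClose(pCmd As %String) As %String
{
    set tOutput=""
    set tSC=##class(%Net.Remote.Utility).RunCommandViaCPIPE(pCmd,.tDevice,.tOutput)
    // per the docs the CPIPE device is left open, so close it once we are done
    if $get(tDevice)'="" close tDevice
    quit $select($$$ISOK(tSC):tOutput,1:"")
}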

Also, do you not worry that it's an internal class?

Sean Connelly · May 26, 2017

> I don't recommend opening %ResultSet instances recursively.

Agreed, but maybe splitting hairs if it is only used once per process.

> It's more performatic if you open a single %SQL.Statement  and reuse that.

Actually, it's MUCH slower. Not sure why; I just gave it a quick test, see for yourself...

ClassMethod GetFileTree(pFolder As %String, pWildcards As %String = "*", Output oFiles, ByRef pState = "") As %Status
{
    if pState="" set pState=##class(%SQL.Statement).%New()
    set sc=pState.%PrepareClassQuery("%File", "FileSet")
    set fileset=pState.%Execute(##class(%File).NormalizeDirectory(pFolder),pWildcards,,1)
    while $$$ISOK(sc),fileset.%Next(.sc) {
        if fileset.%Get("Type")="D" {
            set sc=..GetFileTree(fileset.%Get("Name"),pWildcards,.oFiles,.pState)
        } else {
            set oFiles(fileset.%Get("Name"))=""
        }    
    }
    quit sc
}

** EDITED **

This example recycles the FileSet (see comments below regarding performance)

ClassMethod GetFileTree3(pFolder As %String, pWildcards As %String = "*", Output oFiles, ByRef fileset = "") As %Status
{
    if fileset="" set fileset=##class(%ResultSet).%New("%Library.File:FileSet")
    set sc=fileset.Execute(##class(%File).NormalizeDirectory(pFolder),pWildcards,,1)
    while $$$ISOK(sc),fileset.Next(.sc) {
        if fileset.Get("Type")="D" {
            set dirs(fileset.Get("Name"))=""
        } else {
            set oFiles(fileset.Get("Name"))=""
        }    
    }
    set dir=$order(dirs(""))
    while dir'="" {
        set sc=..GetFileTree3(dir,pWildcards,.oFiles,.fileset)        
        set dir=$order(dirs(dir))
    }
    quit sc
}

Sean Connelly · May 26, 2017

I've removed the recycled resultset example; it is not working correctly. It might not work at all as a recycled approach; I will look at it further and run more timing tests if I get it working.

In the meantime, my original example without recycling the resultset takes around 2 seconds on a nest of folders with 10,000+ files, whereas the recycled SQL.Statement example takes around 14 seconds.

Sean Connelly · May 26, 2017

OK, I got the third example working; I needed to stash the dirs as they were getting lost.

Here are the timings (in seconds)...

Recursive ResultSet  =  2.678719

Recycled ResultSet  =  2.6759

Recursive SQL.Statement  =  15.090297

Recycled SQL.Statement  =  15.073955

I've tried it with shallow and deep folders with different file counts and the differential is about the same for all three.

Surprisingly, the recycled objects only shave off a small amount of time. I think this is because of bottlenecks elsewhere that overshadow the milliseconds saved.

SQL.Statement being 6-7x slower than ResultSet is a surprise, but then the underlying implementation is not doing a database query, which is where you would expect it to be the other way around.

The interesting thing now would be to benchmark one of the command line examples that have been given to compare.

Sean Connelly · May 26, 2017

Just for good measure, I benchmarked Vitaliy's last example and it completes the same test in 0.344022 seconds, so for out-and-out performance a solution built around this approach is going to be the quickest.

Sean Connelly · May 31, 2017

Hi Rubens,

I designed the solution around the real life use cases that I hit in my mainstream work.

In most instances I am handling JSON to and from a browser, and I have never had a use case where the JSON is over Caché's long-string limit of 3,641,144 characters.

That's with the exception of wanting to post a file along with JSON. In that instance I have some boilerplate code that sends them as multiparts and joins them back together after the main JSON parse.

With those decisions made, it was just a matter of writing very efficient COS code that processes long strings. A couple of years ago the serialiser and deserialiser classes stacked up pretty big; in this latest version they are an uber-efficient 90 and 100 lines of code each.

There is no AST magic going on, just projection compilation with inspection of the dictionary: a small lib to abstract the annotations, plus various code-generator tricks to bake in type handlers and delegators.

Where data might go over 3,641,144 characters is passing data backwards and forwards with Node.js or another Caché server. In this instance the data is almost always going to be an array of results or an array of objects. For the latter, there is a large-array helper class I am working on that will split out individual objects from a stream and then handle them as long strings. This will be part of the Node package.

In the few fringe cases where someone might be generating objects larger than 3,641,144 characters, it wouldn't be too hard to have stream variants. I used to have these, but dropped them because they were never used. Either way, I would keep the string-handler variants as the primary implementations, as they prove very quick.

As for older Caché instances, I was having to support JSON as long as 8 years ago and still see the need for backwards compatibility.

Sean.