Question Norman W. Freeman · Jul 6, 2023

Are there any known causes of IRIS entering a deadlock/hang state?

Based on your experience, do you know of any reasons why IRIS would enter a deadlock/hang state?

When this happens, it is no longer possible to connect to the Management Portal or Studio, even though the IRIS service (IRIS.EXE processes) is still running. CPU, memory, and network usage are usually very low (i.e., it does not happen because the server is overloaded). The only fix is a full restart of IRIS (e.g., by clicking the IRIS icon in the notification area and choosing the appropriate action).

I had this issue on a production server a few weeks ago. Every request sent to IRIS timed out, and it was no longer possible to open Studio or the Portal. The only solution was to restart the IRIS service. Apache seemed fine. Inspecting the logs and running a performance report (^pButtons) over the following days did not reveal what went wrong.

I did some research and found at least two ways to reproduce similar behavior:

1) Too many locks are created (far more than the locksiz parameter allows).
This simple loop will bring the system down within a few seconds (do not try it!), requiring a full restart.

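for i=1:1:1000000
{
    set ^A(i) = ""    // create the global node
    lock +^A(i)       // incremental lock, never released: each one adds an entry to the lock table
}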

Since locks use shared memory (sized by the gmheap parameter), is it possible that something else (e.g., string allocation) consumes a large share of it, leaving very little for the locks themselves?

2) The disk where the journal files are located runs out of space.

Do you know of any other causes that can bring the system down with the symptoms described at the top of my post?

Product version: IRIS 2021.1
$ZV: IRIS for Windows (x86-64) 2021.1 (Build 215U) Wed Jun 9 2021 09:39:22 EDT

Comments

Norman W. Freeman · Jul 6, 2023 to Alexander Pettitt

Locks are part of gmheap, but you could allocate them in advance with locksiz.

So the amount specified in locksiz is already "reserved" from gmheap? (i.e., you cannot run out of memory for locks because of excessive gmheap usage by something else).

Vic Sun · Jul 6, 2023 to Norman W. Freeman

Locksiz is allocated from gmheap only as needed, so if gmheap is used up, you may be unable to take out further locks.

Alexander Pettitt · Jul 6, 2023 to Vic Sun

I think this is only true for locksiz=0, which is the default.

If you set it to a specific value, that value is what you get.
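For reference, both parameters are set in the [config] section of the instance's configuration parameter file (iris.cpf) and can also be changed from the Management Portal. The fragment below is only a sketch of where they live: the values are placeholders rather than recommendations, and the units (gmheap in KB, locksiz in bytes) are stated as an assumption to confirm against the Configuration Parameter File Reference for your version. Per this thread, locksiz=0 is the default.

```ini
[config]
gmheap=<shared memory heap size, in KB>
locksiz=<lock table size, in bytes, or 0>
```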

Vic Sun · Jul 6, 2023

"Deadlock" is too broad to describe any possibility that could cause the instance to hang. I would recommend reaching out to the WRC/support when that occurs so they can analyze the system with you.

FWIW, the first place I would look is messages.log, which would point to the next investigative steps. Alexander's IRIShung suggestion is also a good one.
