Written by

Sr Application Development Analyst at The Ohio State University Wexner Medical Center
Question Scott Roth · Apr 12, 2023

Quick Process to Start/Stop an Object

We currently use various forms of Ens.Director.EnableConfigItem calls to start/stop objects within the Interoperability Namespace. We are looking for ways to minimize our downtime as we move from AIX to Red Hat servers on a new section of our network.

Besides calling Ens.Director.EnableConfigItem and waiting for a response, or just disabling the objects through the Namespace class file, is there a quicker way to stop Services and Operations that ensures the TCP disconnect is sent to those endpoints, so we can change the networking rules to point to the new servers? Kill is out of the question because it will not send the disconnect we are looking for.

Thanks

Product version: IRIS 2022.1
$ZV: IRIS for UNIX (Red Hat Enterprise Linux 8 for x86-64) 2022.1 (Build 209U) Tue May 31 2022 12:13:24 EDT

Comments

Eduard Lebedyuk · Apr 12, 2023

1. Do you need to restart several BHs at once or do you need to restart them one by one?

2. How long does it take currently and what's your goal timing-wise?

Scott Roth  Apr 12, 2023 to Eduard Lebedyuk

I need to be able to loop through all the Services/Operations to shut them down to ensure that the TCP Disconnect is sent. 

We had a consultant create a script for us that uses regex to loop through and shut down those Services/Operations, but running the script takes a good 10 minutes or so to disable all the Service and Operation objects.

In testing the cutover in our Test Environment, it took us 20 minutes in total to bring down Ensemble (2018.1.3) and bring up IRIS (2022.1) with the network changes.

But we forgot this step and had to ask different systems to restart their interfaces, because they were still hung on the previous connections and don't have a framework to detect that the connection was gone.

Is there a way to cut down the response time of EnableConfigItem?

Eduard Lebedyuk  Apr 12, 2023 to Scott Roth

I recommend checking this article, but here's a summary:

1. Calculate a list of BHs which need a restart (not sure why you need regexp, all BHs are in Ens_Config.Item table):

SELECT %DLIST(Name) bhList
FROM Ens_Config.Item
WHERE 1=1
  AND Enabled = 1
  AND Production = :production
  AND ClassName %INLIST :classList -- or some other condition

2. Restart them all at once instead of one by one:

for stop = 1, 0 {
  for i=1:1:$ll(bhList) {
    set host = $lg(bhList, i)
    set sc = ##class(Ens.Director).TempStopConfigItem(host, stop, 0)
  }
  set sc = ##class(Ens.Director).UpdateProduction()
}
Jeffrey Drumm · Apr 13, 2023

If the 3rd argument to EnableConfigItem() is 1, the method will update the production on each call. That can be time consuming, so it might be worth setting it to 0 and then calling Ens.Director.UpdateProduction() after the loop completes.

The other issue is that simply disabling a Production Config Item will only shut it down at the next polling interval or on completion of the currently-processing request. This is generally a good thing, but it can take time for some interfaces.

For @Eduard Lebedyuk's benefit ... the regex @Scott Roth referred to is most likely there to allow the selective shutdown of interfaces by name pattern, to accommodate outages/upgrades for external systems. Alternatively, it may be there to disable inbound interfaces before outbound interfaces to prevent queued messages.
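
The first suggestion above (pass 0 as the third argument, then update once) can be sketched roughly as follows. This is a minimal illustration rather than a tested shutdown script: the class name Demo.BulkStop and the method DisableAll are hypothetical, and it assumes the code runs in the production's namespace and that every enabled item should be disabled.

```objectscript
Class Demo.BulkStop Extends %RegisteredObject
{

/// Hypothetical helper, not from the thread: disable all enabled items, then update once.
/// Usage: do ##class(Demo.BulkStop).DisableAll(10, 0)
ClassMethod DisableAll(timeout As %Numeric = 10, force As %Boolean = 0) As %Status
{
	set production = ##class(Ens.Director).GetActiveProductionName()
	set sql = "SELECT Name FROM Ens_Config.Item WHERE Enabled = 1 AND Production = ?"
	set rs = ##class(%SQL.Statement).%ExecDirect(, sql, production)
	if rs.%SQLCODE < 0 {
		return $$$ERROR($$$GeneralError, "SQLCODE "_rs.%SQLCODE_": "_rs.%Message)
	}

	// Collect the names first so the result set is exhausted before items are modified
	set hosts = ""
	while rs.%Next() {
		set hosts = hosts _ $listbuild(rs.%Get("Name"))
	}

	set sc = $$$OK
	for i = 1:1:$listlength(hosts) {
		// Third argument 0: change the Enabled setting only, skip the per-call update
		set sc = $$$ADDSC(sc, ##class(Ens.Director).EnableConfigItem($list(hosts, i), 0, 0))
	}

	// One update after the loop applies all of the changes together
	return $$$ADDSC(sc, ##class(Ens.Director).UpdateProduction(timeout, force))
}

}
```

The single UpdateProduction() call still has to wait for each host to reach a stopping point, so this only removes the per-call update overhead. It does not make slow hosts stop any sooner.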

Scott Roth  Apr 13, 2023 to Jeffrey Drumm

Thanks @Jeffrey Drumm, I had the same thoughts. I am working with other team members in discussing the options to see what is best.

Scott Roth · Apr 18, 2023

Using the suggestions below, looping through the Config Items doesn't appear to be the issue. I ran into an issue with the UpgradeProduction. In total it took 18 min to shut down all the Services in just one namespace when I looped through all the Config Items, disabled them, and then updated the Production with no timeout or force flag. Is there a way to make UpgradeProduction run faster besides setting the timeout or force?

Vic Sun  Apr 18, 2023 to Scott Roth

Can you identify which components are taking a long time to stop? Often stopping a component being slow is the result of waiting for a synchronous call, for example. 

Scott Roth  Apr 18, 2023 to Vic Sun

By adding a timeout of 2 and setting the force flag, I was able to get it down from 18 min to roughly 11 min. If I remove the timeout and force flag and watch the output, I could possibly find those problematic children. I wonder if using Ens.Job would be any easier: since the objects have been disabled, it's a matter of getting the jobs to stop and send the unbind.

Has anyone used Ens.Job before to quickly bring down a Namespace in a controlled environment? We need to ensure the unbinds are sent so we can move to a new IP address on a different section of the network.

Jeffrey Drumm  Apr 18, 2023 to Scott Roth

I don't think so.

UpdateProduction (I think that's what you meant) is attempting to obtain state information for all of the business hosts and likely won't complete until they're all down. Calling it at the end should still be faster than having it enabled for each EnableConfigItem() call.

The reality is that you appear to have a lot of processes that are dependent on polling rates and/or getting the appropriate responses back from external systems on notification they're terminating connections.

If you need to shut down the interfaces fast, you really can only do it at the expense of graceful connection termination.

Have you considered creating separate namespaces and compartmentalizing interfaces to keep your productions at a more manageable size? Business hosts in multiple smaller productions benefit from parallelism when performing administrative tasks like stopping/starting interfaces in bulk.

Scott Roth · Apr 20, 2023

We found that if we execute EnableConfigItem from the shell at the OS level, we can kick off multiple processes (multi-threading), up to 125 instances, which is much faster than doing this from ObjectScript in Terminal.

Eduard Lebedyuk  May 12, 2023 to Scott Roth

Recently I wrote a snippet to determine which Business Host took too long to stop:

Class Test.ProdStop
{

/// do ##class(Test.ProdStop).Try()
ClassMethod Try()
{
	set production = ##class(Ens.Director).GetActiveProductionName()
	set rs = ..EnabledFunc(production)
	if rs.%SQLCODE && (rs.%SQLCODE '= 100) {
		write $$$FormatText("Can't get enabled items in %1, SQLCode: %2, Message: %3", production, rs.%SQLCODE, rs.%Message)
		quit
	} 
	
	while rs.%Next() {
		set bh = rs.Name
		set start = $zh
		set sc = ##class(Ens.Director).EnableConfigItem(bh, $$$NO, $$$YES)
		set end = $zh
		set duration = $fn(end-start,"",1)
		write !, $$$FormatText("BH: %1, Stopped in: %2, sc: %3", bh,  duration, $case($$$ISOK(sc), $$$YES:1, :$system.Status.GetErrorText(sc))), !
		if duration>60 {
			write !, $$$FormatText("!!!!!!! BH: %1 TOOK TOO LONG !!!!!!!", bh),!
		}
	}
}

Query Enabled(production) As %SQLQuery
{
SELECT 
	Name 
	, PoolSize
FROM Ens_Config.Item 
WHERE 1=1
	AND Production = :production
	AND Enabled = 1
}

}

It stops BHs one by one, measuring how long it took to stop each one.

I would recommend you try to determine which items are taking too long to stop.

Export production before running this code to avoid manually reenabling all the hosts.
