Mikhail Khomenko · May 13, 2016

Thank you, Ray! I suspected something like this. OK, we'll be more careful in the future.

Mikhail Khomenko · Oct 30, 2016

I found messages from ISCAgent in /var/log/messages, for example:

Oct 30 07:26:04 ecp01 ISCAgent[3823]: ISCAgent starting up
Oct 30 07:26:04 ecp01 ISCAgent[3823]: Application server enabled.
Oct 30 07:26:04 ecp01 ISCAgent[3824]: Starting ApplicationServer on *:2188

I also found in the docs that ISCAgent logs through syslog, so I can add the following line to /etc/rsyslog.conf:

if $programname == 'ISCAgent' then /var/log/iscagent_console.log

Then restart rsyslog, and the same messages appear in iscagent_console.log.
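To make explicit what that rule selects, here is a minimal Python sketch that applies the same program-name test to traditional syslog lines. It is only an illustration: the regular expression and the helper names `program_name` and `filter_by_program` are assumptions made for this example, not part of rsyslog or ISCAgent.

```python
import re

# Traditional syslog layout: "<Mon> <day> <time> <host> <program>[<pid>]: <message>"
# (assumed format, matching the sample lines quoted in this post)
SYSLOG_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s[\d:]{8})\s"
    r"(?P<host>\S+)\s"
    r"(?P<prog>[^\s\[:]+)(?:\[(?P<pid>\d+)\])?:\s"
    r"(?P<msg>.*)$"
)


def program_name(line):
    """Return the program name of a syslog line, or None if it does not parse."""
    match = SYSLOG_RE.match(line)
    return match.group("prog") if match else None


def filter_by_program(lines, program="ISCAgent"):
    """Keep only lines whose program name equals `program` (like $programname == ...)."""
    return [line for line in lines if program_name(line) == program]
```

Applied to /var/log/messages, this would keep the ISCAgent entries and drop messages from every other program, which is the same selection the rsyslog rule writes to the separate file.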

That's fine. More interesting are messages like this:

Oct 30 07:26:53 ecp01 ISCAgent[3550]: Arbiter client error: Message read failed.

This message appears on the system with the arbiter when the Primary failover member is halted by a reboot or the 'init 0' command. I would be glad to know what this message means; that was the reason for my question about ISCAgent debugging.

Thanks!

Mikhail Khomenko · Nov 1, 2016

Hi, Alexey! Thanks for your answer!

But I don't completely agree with it. My laptop has one physical drive, /dev/sda, with two logical volumes on top of it (home + root) plus /boot on /dev/sda1. And I've seen production servers with different disk configurations but the same 7 SWDs. Additionally, how can we explain the file descriptors opened by those 7 SWDs + 1 WD?

[root@HP-6360B ~]# for wd in `pgrep -f WD`;do ls -l /proc/$wd/fd | grep -v '/dev/null';done
total 0
l-wx------ 1 root root 64 Nov  1 08:09 3 -> /opt/intersystems/cache-2016.2/bin/clock
lrwx------ 1 root root 64 Nov  1 08:09 4 -> /opt/intersystems/cache-2016.2/mgr/CACHE.WIJ
lrwx------ 1 root root 64 Nov  1 08:09 5 -> /opt/intersystems/cache-2016.2/mgr/CACHE.DAT
lrwx------ 1 root root 64 Nov  1 08:09 6 -> /opt/intersystems/cache-2016.2/mgr/cachetemp/CACHE.DAT
lrwx------ 1 root root 64 Nov  1 08:09 7 -> /opt/intersystems/cache-2016.2/mgr/cacheaudit/CACHE.DAT
lrwx------ 1 root root 64 Nov  1 08:09 8 -> /opt/intersystems/cache-2016.2/mgr/cache/CACHE.DAT
total 0
lrwx------ 1 root root 64 Nov  1 08:09 3 -> /opt/intersystems/cache-2016.2/mgr/cachetemp/CACHE.DAT
total 0
lrwx------ 1 root root 64 Nov  1 08:10 3 -> /opt/intersystems/cache-2016.2/mgr/cachetemp/CACHE.DAT
total 0
lrwx------ 1 root root 64 Nov  1 08:10 3 -> /opt/intersystems/cache-2016.2/mgr/CACHE.DAT
total 0
lrwx------ 1 root root 64 Nov  1 08:10 3 -> /opt/intersystems/cache-2016.2/mgr/CACHE.DAT
total 0
lrwx------ 1 root root 64 Nov  1 08:10 3 -> /opt/intersystems/cache-2016.2/mgr/cachetemp/CACHE.DAT
total 0
lrwx------ 1 root root 64 Nov  1 08:10 3 -> /opt/intersystems/cache-2016.2/mgr/cachetemp/CACHE.DAT
total 0
lrwx------ 1 root root 64 Nov  1 08:10 3 -> /opt/intersystems/cache-2016.2/mgr/cachetemp/CACHE.DAT

 

Some of the WDs have CACHESYS open and some have CACHETEMP, and so does the main WD, which supposedly doesn't have to write to disk at all (I've come across such a statement).
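To summarize a listing like the one above, a small Python sketch can split the concatenated `ls -l /proc/<pid>/fd` output into one group per process and count how many processes hold each file. The function names `group_fd_targets` and `count_targets` are made up for this example, and it assumes each process's listing starts with a "total N" line, as in the output shown.

```python
from collections import Counter


def group_fd_targets(ls_output):
    """Return one list of symlink targets per process.

    Each 'total N' line is taken as the start of a new process's listing.
    """
    groups = []
    for line in ls_output.splitlines():
        line = line.strip()
        if line.startswith("total "):
            groups.append([])
        elif " -> " in line and groups:
            # Everything after ' -> ' is the file the descriptor points to.
            groups[-1].append(line.split(" -> ", 1)[1])
    return groups


def count_targets(groups):
    """Count how many descriptors across all processes point at each file."""
    return Counter(target for group in groups for target in group)
```

Feeding it the captured output gives one group per write daemon, so it is easy to see which daemons hold the CACHESYS database and which hold CACHETEMP.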

Mikhail Khomenko · Nov 2, 2016

Lisa, thanks for your answer!
I added this line to iscagent.conf and restarted ISCAgent, but at first glance I can't see any difference in the log file:

Nov  2 12:37:32 localhost ISCAgent[2796]: Starting
Nov  2 12:37:32 localhost ISCAgent[2796]: ISCAgent starting up
Nov  2 12:37:32 localhost ISCAgent[2796]: Application server enabled.
Nov  2 12:37:32 localhost ISCAgent[2797]: Starting ApplicationServer on *:2188

And there are no new messages when the Primary and Backup connect to the Arbiter, just the same messages as before:

Nov  2 12:45:08 localhost ISCAgent[2856]: Serving application: ISC1ARBITER
Nov  2 12:45:24 localhost ISCAgent[2858]: Serving application: ISC1ARBITER

Maybe I'm doing something wrong :)
Thanks!

Mikhail Khomenko · Nov 5, 2016

Aaron, thanks for the answer! But, yes, why are there exactly 8 daemons? Why not 4 or 13? Is there a way to configure this ($zu, cstat, or something else)? As I've read somewhere, several daemons are used because Linux didn't support async IO. Is this still true? It would be great to clarify the exact roles of the Main Daemon and the Slave Daemons. What I have already noticed is that these daemons wake up every 10 seconds to check the write daemon queue. I think this queue is common to all of them. It would be cool to get more information about their configuration.

Mikhail Khomenko · Nov 7, 2016

Lisa and Kate, first of all, thanks for your answers!

[root@ecp01 log]# cat /etc/iscagent/iscagent.conf
mirroring.console_logging_enabled=true

I can't see any changes in either /var/log/messages or /var/log/iscagent_console.log, with or without this line. I have a feeling that this parameter has no effect. An example session:

[root@ecp01 log]# cat /etc/iscagent/iscagent.conf
mirroring.console_logging_enabled=true
[root@ecp01 log]# systemctl stop ISCAgent
[root@ecp01 log]# systemctl start ISCAgent
[root@ecp01 log]# /usr/local/etc/cachesys/ISCAgentCtrl status
application_server.interface_address=*
application_server.port=2188
daemonized=yes
mirroring=on
pid=2591
running=yes
version=2016.2.0.721.0
 

This parameter is absent from the status output.
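The same check can be done mechanically. The Python sketch below parses `key=value` status text like the output above into a dictionary and tests whether a given parameter is reported. `parse_status` is a hypothetical helper written for this example; it assumes only the `key=value` layout visible in the session and says nothing about how ISCAgentCtrl chooses which parameters to print.

```python
def parse_status(text):
    """Parse 'key=value' lines into a dict, skipping blank or malformed lines."""
    status = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        status[key.strip()] = value.strip()
    return status


def is_reported(status, parameter):
    """True if the parameter name appears among the reported status keys."""
    return parameter in status
```

With the status text from the session, `is_reported(status, "mirroring.console_logging_enabled")` is false, while keys such as `application_server.port` are present.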

As for the reason for this interest: initially I was trying to understand why, on Linux with systemd (RHEL 7.2, for example), the Backup failover member (connected to the Arbiter) can't take over as Primary when I power off the Primary server (additional condition: both failover members were started with 'ccontrol start'). There is a WRC for it: 870617.

But now I'm just interested in better understanding the switching process.

Mikhail Khomenko · Mar 15, 2017

The class signature is:

Class SYS.Monitor.SystemSensors Extends %SYS.Monitor.AbstractSensor [ Hidden, System = 3 ]

So it is absent from both the Class Reference and Studio, but it can be shown in Studio by using the "Open" command and typing Systems.Sensors.cls in the search field. The methods of this class are deployed, so you cannot see their implementation.

Mikhail Khomenko · Jun 7, 2017

Good day!

The article describes a simple example. In practice you should use Zabbix Discovery. I'll try to show this process soon; please give me a couple of days.

Mikhail Khomenko · Jun 13, 2017

Hi, Murray! Thanks for your comment! I'll try to describe templating soon; I'll add it to my plan. For now, an article about ^mgstat is almost ready (in my head -)). So stay tuned for more.

Mikhail Khomenko · Jul 7, 2017

Thanks for the answer, Rubens! I also thought about using the Java compiler. If my attempt is successful, I'll report back.

Mikhail Khomenko · Jul 25, 2017

mgstat.int uses this approach (here, for the class %SYSTEM.CPU):

if $D(^oddDEF("%SYSTEM.CPU")) {
    ...
}

Mikhail Khomenko · Nov 3, 2017

Hi, Alexey! Thanks! As for your question: I think in that case we would have to:

1) run mgstat continuously

2) parse the file.

Although neither of these steps is difficult, a REST interface lets us merge them into one step: we run the class whenever we want. Besides, we can always extend our set of metrics. For example, it's worth adding monitoring of database sizes as well as Mirroring, Shadowing, etc.

Mikhail Khomenko · Jan 4, 2018

Hi, Murray! Thanks for the answer!

Yes, that article is brilliant. It inspired me to monitor as much as we can about Cache memory usage from inside Cache, because that gives us the opportunity to have one "monitoring" class (with a lot of API calls) that gathers many Cache-related metrics. Of course, we should gather a lot of metrics from the OS as well.
OK, let HugePages usage be an OS metric.

Best regards!

Mikhail Khomenko · Mar 22, 2018

Hi!
Unfortunately, I haven't had any cases with Solaris, so I'll answer RTFM-style -)
The official InterSystems documentation says:

Many UNIX operating systems (HP-UX, IBM AIX®, and Oracle Solaris) do not support the AgentX protocol at this time. If your system does not support AgentX, you must install a separate SNMP agent which supports AgentX, such as Net-SNMP.
Note:
On Oracle Solaris, the System Management Agent (SMA), although a version of NET-SNMP, is not compatible with the Cache AgentX implementation. You will therefore need to disable the SMA agent (and possibly the older snmpdx agent as well) and install a standard version of NET-SNMP to support AgentX.

You can read more about SNMP support on Solaris here

Mikhail Khomenko · Feb 8, 2020

@Arto Alatalo At that time I used Prometheus as the central monitoring system for hosts, services, and Cache as well.
Also, the Simple JSON plugin would need to be improved to provide functionality similar to Prometheus/Grafana, at least as of the last time I looked at it.

Mikhail Khomenko · Feb 19, 2020

With this approach, metrics are stored in Prometheus. Since Prometheus is a time-series database, you can store any numeric metric there, whether a counter or a gauge.
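For illustration, here is a minimal Python sketch that renders a numeric metric in the Prometheus text exposition format, declaring it as either a counter or a gauge. The helper `render_metric` and the metric names in the usage note are hypothetical and chosen only for this example; they are not taken from any existing exporter.

```python
def render_metric(name, metric_type, value, help_text):
    """Render one metric in the Prometheus text exposition format.

    A counter only ever increases (e.g. total references since startup);
    a gauge may go up or down (e.g. current database size).
    """
    if metric_type not in ("counter", "gauge"):
        raise ValueError("metric_type must be 'counter' or 'gauge'")
    lines = [
        f"# HELP {name} {help_text}",
        f"# TYPE {name} {metric_type}",
        f"{name} {value}",
    ]
    return "\n".join(lines) + "\n"
```

For example, `render_metric("cache_global_refs_total", "counter", 123456, "Global references since startup")` and `render_metric("cache_db_size_mb", "gauge", 2048, "Database size in MB")` produce text that a Prometheus scrape could read from an HTTP endpoint.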

Mikhail Khomenko · Feb 20, 2020

Hi David,
Thanks -) As for the meaning: it's taken from the mgstat source code (%SYS, routine mgstat.int).
The starting point was line 159 in my local Cache 2017.1:

i maxeccon s estats=$p($system.ECP.GetProperty("ClientStats"),",",1,21),array($i(i))=+$system.ECP.NumClientConnections(),array($i(i))=$p(estats,",",2),array($i(i))=$p(estats,",",6),array($i(i))=$p(estats,",",7),array($i(i))=$p(estats,",",19),array($i(i))=$p(estats,",",20)


Then I guessed the meaning from the "heading" subroutine (line 289).

But the best option for you, I think, is to ask WRC. Support is very good.

Mikhail Khomenko · Feb 20, 2020

I'm not sure, but I think the SAM implementation is based on System Monitor (https://docs.intersystems.com/iris20194/csp/docbook/DocBook.UI.Page.cls?KEY=GCM_healthmon) in some fashion. Could you clarify your task? Do you want to know the exact name of the global where samples are stored? What for?
The link you've shared describes how to expose a custom numeric metric that can then be read by Prometheus, for instance, and stored there.

Mikhail Khomenko · Oct 27, 2021

@David Foard IKO is just a pod, and it runs IRIS pods, so it could run in an existing cluster. I also think it's worth using some kind of Network Policies in that case. Regarding experience with IKO: I don't have any in production, but I'm going to try it for async mirroring soon.

Mikhail Khomenko · Nov 1, 2021

Thank you for the response!
But it looks like the documentation should be updated in that case, as it provides this example -)

You can also use ? or * wild cards and if you wish to exclude items pass ' before the item name which also supports wild card, e.g. "User.*.cls,'User.T*.cls".
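To spell out how that quoted sentence reads, here is a minimal Python sketch of the include/exclude semantics it describes: comma-separated items, `?` and `*` wildcards, and a leading `'` marking an exclusion. It illustrates the documented wording only, not how the product actually behaves. The function name `select_items` is hypothetical, and Python's `fnmatchcase` is used as a stand-in matcher (it also treats `[...]` as special, which the quoted text does not mention).

```python
from fnmatch import fnmatchcase


def select_items(spec, names):
    """Select names matching the include patterns but none of the excludes.

    `spec` is a comma-separated list; an item starting with ' is an exclusion.
    """
    includes, excludes = [], []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if part.startswith("'"):
            excludes.append(part[1:])
        else:
            includes.append(part)
    return [
        name
        for name in names
        if any(fnmatchcase(name, pattern) for pattern in includes)
        and not any(fnmatchcase(name, pattern) for pattern in excludes)
    ]
```

Read this way, the documented example selects every class in the User package except those whose name starts with T.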