Thank you, Ray! I suspected something like this. OK, we'll be more careful in the future.
I found messages from ISCAgent in /var/log/messages, for example:
Oct 30 07:26:04 ecp01 ISCAgent[3823]: ISCAgent starting up
Oct 30 07:26:04 ecp01 ISCAgent[3823]: Application server enabled.
Oct 30 07:26:04 ecp01 ISCAgent[3824]: Starting ApplicationServer on *:2188
The documentation says that ISCAgent logs via syslog, so I can add the following line to /etc/rsyslog.conf:
if $programname == 'ISCAgent' then /var/log/iscagent_console.log
Then restart rsyslog, and the same messages also appear in /var/log/iscagent_console.log.
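The rsyslog rule above routes every message whose program name is ISCAgent to a separate file. To check what such a filter would catch in an existing /var/log/messages, here is a minimal Python sketch. It is an emulation for inspection only, not how rsyslog works internally, and it assumes the traditional syslog line layout shown in the excerpts here ("Mon DD HH:MM:SS host program[pid]: message"). The names SYSLOG_LINE and filter_by_program are hypothetical.

```python
import re

# Traditional BSD-style syslog line, e.g.
# "Oct 30 07:26:04 ecp01 ISCAgent[3823]: ISCAgent starting up"
SYSLOG_LINE = re.compile(
    r"^(?P<ts>\w{3}\s+\d{1,2}\s\d{2}:\d{2}:\d{2})\s"
    r"(?P<host>\S+)\s"
    r"(?P<program>[^\s\[:]+)(?:\[(?P<pid>\d+)\])?:\s"
    r"(?P<msg>.*)$"
)


def filter_by_program(lines, program="ISCAgent"):
    """Yield (pid, message) for lines whose program name equals `program`."""
    for line in lines:
        m = SYSLOG_LINE.match(line)
        if m and m.group("program") == program:
            yield m.group("pid"), m.group("msg")


if __name__ == "__main__":
    sample = [
        "Oct 30 07:26:04 ecp01 ISCAgent[3823]: ISCAgent starting up",
        "Oct 30 07:26:05 ecp01 sshd[1001]: Accepted publickey for root",
        "Oct 30 07:26:53 ecp01 ISCAgent[3550]: Arbiter client error: Message read failed.",
    ]
    for pid, msg in filter_by_program(sample):
        print(pid, msg)
```

To scan a real log, pass an open file object instead of the sample list, e.g. filter_by_program(open("/var/log/messages")); trailing newlines are excluded from the captured message.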
That works. More interesting are messages like this one:
Oct 30 07:26:53 ecp01 ISCAgent[3550]: Arbiter client error: Message read failed.
This message appears on a system with an arbiter when the Primary failover member is halted by a reboot or the 'init 0' command. I would like to know what this message means; that was the reason for my question about ISCAgent debugging.
Thanks!
Hi, Alexey! Thanks for your answer!
But I don't completely agree with it. My laptop has one physical drive, /dev/sda, with two logical volumes on top of it (home + root) plus /boot on /dev/sda1. And I've seen production servers with different disk configurations and the same 7 SWDs. Additionally, how can we explain the file descriptors opened by those 7 SWDs + 1 WD?
Some of the WDs have CACHESYS open, some have CACHETEMP open, and so does the main WD, which supposedly doesn't need to write to disk at all (I've come across such a statement).
Lisa, thanks for your answer!
I added this line to iscagent.conf and restarted ISCAgent, but at first glance I can't see any difference in the log file:
Nov 2 12:37:32 localhost ISCAgent[2796]: Starting
Nov 2 12:37:32 localhost ISCAgent[2796]: ISCAgent starting up
Nov 2 12:37:32 localhost ISCAgent[2796]: Application server enabled.
Nov 2 12:37:32 localhost ISCAgent[2797]: Starting ApplicationServer on *:2188
And there are no new messages when the Primary and Backup connect to the Arbiter, just the usual ones:
Nov 2 12:45:08 localhost ISCAgent[2856]: Serving application: ISC1ARBITER
Nov 2 12:45:24 localhost ISCAgent[2858]: Serving application: ISC1ARBITER
Maybe I'm doing something wrong :)
Thanks!
Aaron, thanks for the answer! But yes, why exactly 8 daemons? Why not 4 or 13? Is there a way to configure this ($zu, cstat, or something else)? I've read somewhere that the extra daemons exist because Linux didn't support async I/O. Is that still true? It would be great to clarify the exact roles of the main daemon and the slave daemons. What I have already noticed is that these daemons wake up every 10 seconds to check the write daemon queue; I think this behavior is common to all of them. It would be great to get more information about their configuration.
Lisa and Kate, first of all, thanks for your answers!
[root@ecp01 log]# cat /etc/iscagent/iscagent.conf
mirroring.console_logging_enabled=true
I can't see any changes in either /var/log/messages or /var/log/iscagent_console.log, with or without this line. I have a feeling that this parameter has no effect. Here is an example session:
[root@ecp01 log]# cat /etc/iscagent/iscagent.conf
mirroring.console_logging_enabled=true
[root@ecp01 log]# systemctl stop ISCAgent
[root@ecp01 log]# systemctl start ISCAgent
[root@ecp01 log]# /usr/local/etc/cachesys/ISCAgentCtrl status
application_server.interface_address=*
application_server.port=2188
daemonized=yes
mirroring=on
pid=2591
running=yes
version=2016.2.0.721.0
This parameter is absent from the status output.
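Comparing the two key=value listings by hand is easy here, but with a longer iscagent.conf a small script helps. The following is a minimal Python sketch, assuming both iscagent.conf and the ISCAgentCtrl status output consist of plain key=value lines as in the session above; the helper names parse_key_values and missing_from_status are hypothetical.

```python
def parse_key_values(text):
    """Parse 'key=value' lines into a dict, skipping blank or malformed lines."""
    result = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or "=" not in line:
            continue
        key, _, value = line.partition("=")
        result[key.strip()] = value.strip()
    return result


def missing_from_status(conf_text, status_text):
    """Return keys set in the config that the status output does not report."""
    conf = parse_key_values(conf_text)
    status = parse_key_values(status_text)
    return sorted(set(conf) - set(status))


# Status output and config content copied from the session above.
STATUS = """\
application_server.interface_address=*
application_server.port=2188
daemonized=yes
mirroring=on
pid=2591
running=yes
version=2016.2.0.721.0
"""
CONF = "mirroring.console_logging_enabled=true\n"

print(missing_from_status(CONF, STATUS))
```

Note that a key missing from the status output only shows that the agent does not echo it back; it does not by itself prove that the agent ignores the setting.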
As for the reason for this interest: initially I was trying to understand why, on Linux with systemd (RHEL 7.2, for example), the Backup failover member (connected to the Arbiter) can't take over from the Primary when I power off the Primary server (additional condition: both failover members were started with 'ccontrol start'). There is a WRC issue for this: 870617.
But now it's just curiosity, to better understand the failover process.
Thank you, Mathew! It's a clear explanation.
Thank you, Brendan! Your comment was helpful.
Thanks, Aaron! I've created a WRC issue.
Hi, Evgeny! I'll try to organize it soon.
Login: operator
Pass: ZabbixOperator
Then go to "Monitoring" -> "Screens".
The class signature is:
Class SYS.Monitor.SystemSensors Extends %SYS.Monitor.AbstractSensor [ Hidden, System = 3 ]
Because it is Hidden, it is absent from both the Class Reference and Studio's class list, but you can open it in Studio with the "Open" command by typing SYS.Monitor.SystemSensors.cls in the search field. The methods of this class are deployed, so you cannot see their implementation.
Good day!
The article describes a simple example. In a real setup you should use Zabbix discovery. I'll try to show this process soon; please wait a couple of days.
Hi! Here is a link to a template with some of the metrics. You can easily add other standard metrics as well as your own. For the meaning of the metrics, see <cache_dir>/mgr/SNMP/ISC-CACHE.mib. If you have any questions, feel free to ask.
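To give an idea of what discovery involves: a discovery item returns a JSON list of found entities, and item prototypes then refer to each entity through an LLD macro. The following is a minimal Python sketch of such a payload, assuming the classic layout with a top-level "data" array (newer Zabbix versions also accept a bare array). The macro name {#DBNAME} and the database names are hypothetical; check the low-level discovery documentation for your Zabbix version.

```python
import json


def lld_payload(db_names, macro="{#DBNAME}"):
    """Build a Zabbix low-level discovery JSON document.

    Each entry maps one LLD macro (hypothetical name {#DBNAME}) to one
    discovered value; item prototypes can then use the macro in their keys.
    """
    return json.dumps({"data": [{macro: name} for name in db_names]})


print(lld_payload(["USER", "SAMPLES"]))
```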
https://github.com/myardyas/zabbix/blob/master/zabbix_cache_snmp_exampl…
Good day! Please see my comment on another article: https://community.intersystems.com/post/creating-custom-snmp-oids#comme…
Hi, Murray! Thanks for your comment! I'll try to describe templating soon; I'll add it to my plan. For now, an article about ^mgstat is almost ready (in my head -)), so stay tuned.
Thanks for the answer, Rubens! I was also thinking of using the Java compiler. If my attempt is successful, I'll report back.
mgstat.int uses this approach (here, for the class %SYSTEM.CPU):
// ^oddDEF stores class definitions, so $D is nonzero only if %SYSTEM.CPU exists
if $D(^oddDEF("%SYSTEM.CPU")) {
...
}
Fabian, thanks for the link! I'll try it.
Hi, Alexey! Thanks! As for your question: I think in that case we should:
1) run mgstat continuously
2) parse the output file.
Although neither step is difficult, a REST interface lets us merge them into one: we run the class whenever we want the data. Besides, we can always extend our set of metrics; for example, it's worth adding monitoring of database sizes as well as mirroring, shadowing, etc.
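For comparison, step 2 of the mgstat-based approach could look like the following minimal Python sketch. It assumes the mgstat output file contains a comma-separated header row starting with "Date" followed by one row per sample, with values padded by spaces; any lines before the header are skipped. The column names in the sample and the helpers parse_mgstat and column_average are illustrative assumptions, so adjust them to the actual file layout of your version.

```python
import csv
import io


def parse_mgstat(text):
    """Return one dict per sample row of comma-separated mgstat output."""
    lines = text.splitlines()
    start = next((i for i, line in enumerate(lines) if line.strip().startswith("Date")), None)
    if start is None:
        raise ValueError("no header row starting with 'Date' found")
    reader = csv.DictReader(io.StringIO("\n".join(lines[start:])), skipinitialspace=True)
    rows = []
    for row in reader:
        # Strip padding from header names and values; drop surplus fields (key None).
        rows.append({k.strip(): (v or "").strip() for k, v in row.items() if k is not None})
    return rows


def column_average(rows, column):
    """Average a numeric column, ignoring rows where it is empty or missing."""
    values = [float(r[column]) for r in rows if r.get(column)]
    return sum(values) / len(values) if values else None


# Hypothetical excerpt; real mgstat files have more columns.
SAMPLE = """\
mgstat sample output
Date,       Time,     Glorefs,  PhyRds
11/02/2016, 12:00:00, 1500,     3
11/02/2016, 12:00:02, 1700,     5
"""

rows = parse_mgstat(SAMPLE)
print(column_average(rows, "Glorefs"))
```

The REST approach avoids this parsing step entirely, which is part of why it is attractive.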
Hi, Murray! Thanks for the answer!
Yes, that article is brilliant. I was inspired to monitor as much as we can about Cache memory usage from inside Cache, because that would let us have a single "monitoring" class (with a lot of API calls) that gathers many Cache-related metrics. Of course, many metrics still have to be gathered from the OS as well.
OK, let HugePages usage be an OS metric.
Best regards!
Hi!
Unfortunately, I haven't worked with Solaris, so I'll answer RTFM-style -)
The official InterSystems documentation says:
You can read more about SNMP support on Solaris here.
@Arto Alatalo At that time I used Prometheus as the central monitoring system for hosts, services, and Cache as well.
Also, the Simple JSON plugin would need improvements to provide functionality similar to Prometheus/Grafana, at least as of the last time I looked at it.
In this approach, metrics are stored in Prometheus. Since Prometheus is a time-series database, you can store any numeric metric there, either a counter or a gauge.
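To make the counter/gauge distinction concrete, here is a minimal Python sketch that renders two metrics in the Prometheus text exposition format, which is what Prometheus scrapes from an endpoint. The metric names and values are hypothetical; the helper render_metric is illustrative and handles only these two types.

```python
def render_metric(name, metric_type, help_text, value):
    """Render one metric in the Prometheus text exposition format."""
    if metric_type not in ("counter", "gauge"):
        raise ValueError("only counter and gauge are handled in this sketch")
    return (
        f"# HELP {name} {help_text}\n"
        f"# TYPE {name} {metric_type}\n"
        f"{name} {value}\n"
    )


# Hypothetical metrics: a counter only increases (e.g. global references
# since startup), while a gauge can go up and down (e.g. current processes).
page = render_metric(
    "isc_cache_glorefs_total", "counter", "Global references since startup.", 123456
) + render_metric(
    "isc_cache_processes", "gauge", "Current number of Cache processes.", 42
)
print(page, end="")
```

For a counter, the interesting value is usually its rate of change, which Prometheus computes at query time (for example with rate()); a gauge is typically graphed as is.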
Hi David,
Thanks -) Regarding the meaning: it's taken from the mgstat source code (%SYS, routine mgstat.int).
The starting point was line 159 in my local Cache 2017.1:
Then I inferred the meaning from the subroutine "heading" (line 289).
But the best option for you, I think, is to ask WRC. Support is very good.
I'm not sure, but I think the SAM implementation is based in some fashion on System Monitor (https://docs.intersystems.com/iris20194/csp/docbook/DocBook.UI.Page.cls?KEY=GCM_healthmon). Could you clarify your task? Do you want to know the exact name of the global where samples are stored? For what purpose?
The link you've shared describes how to expose a custom numeric metric that can then be read by Prometheus, for instance, and stored there.
Thank you @Luca Ravazzolo -)
@David Foard IKO is just a pod that runs IRIS pods, so it can run in an existing cluster. I also think it's worth using some kind of Network Policies in that case. Regarding experience with IKO: I don't have any in production, but I'm going to try it for async mirroring soon.
Thank you for response!
But it looks like the documentation should be updated in that case, since it provides such an example -)
You can also use ? or * wildcards, and if you wish to exclude items, put ' before the item name (which also supports wildcards), e.g. "User.*.cls,'User.T*.cls".
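The include/exclude rule can be illustrated with a minimal Python sketch. It only emulates the described semantics on a list of names, assuming that a name is selected when it matches at least one include pattern and no ' exclusion pattern; it is not the InterSystems implementation. Python's fnmatch is used for * and ?, and unlike the spec above it also treats [...] as a character class. The helper name select_items is hypothetical.

```python
from fnmatch import fnmatchcase


def select_items(names, spec):
    """Filter names with a comma-separated spec such as "User.*.cls,'User.T*.cls".

    * matches any run of characters, ? matches exactly one character,
    and a leading ' turns a pattern into an exclusion.
    """
    patterns = [p.strip() for p in spec.split(",") if p.strip()]
    includes = [p for p in patterns if not p.startswith("'")]
    excludes = [p[1:] for p in patterns if p.startswith("'")]
    return [
        name
        for name in names
        if any(fnmatchcase(name, p) for p in includes)
        and not any(fnmatchcase(name, p) for p in excludes)
    ]


names = ["User.Person.cls", "User.Test.cls", "User.Team.cls", "User.Util.mac"]
print(select_items(names, "User.*.cls,'User.T*.cls"))
```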