Murray Oldfield · Mar 14, 2016

The latest Caché documentation has details and examples for setting up a read-only or read/write asynchronous reporting mirror member. The asynch reporting mirror is special because it is not used for high availability; for example, it is not a DR server.

At the highest level, running reports or extracts on a shadow is possible simply because the data exists on the other server in near real time. Operational or time-critical reports should be run on the primary servers. The suggestion is that resource-heavy reports or extracts can use the shadow or reporting server.

While setting up a shadow or reporting asynch mirror is part of Caché, how a report or extract is scheduled or run is an application design question, and not something I can answer - hopefully someone else can jump in here with some advice or experience.

Possibilities may include web services, or if you use ODBC your application could direct queries to the shadow or a reporting asynch mirror. For batch reports or extracts, routines could be scheduled on the shadow/reporting asynch mirror via Task Manager. Or you may have a separate application module for this type of reporting.

If you need results returned to the application on the primary production server, that is also application dependent.

You should also consider how to handle (e.g. via global mapping) any read/write application databases such as audit or logs which may be overwritten by the primary server. 

If you are going to do reporting on a shadow server search the online documentation for special considerations for "Purging Cached Queries".
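
For example, a minimal sketch of purging cached queries on the reporting member from the shell, in the same style as calling any other routine from the command line (the instance name H2015 and namespace REPORT below are placeholders, not from a real system):

# placeholder instance and namespace names - adjust for your site
echo 'do $SYSTEM.SQL.Purge()' | csession H2015 -U "REPORT"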

Murray Oldfield · Mar 14, 2016

There are several more articles to come before we are done with storage IO ;)

I will focus more on IOPS and writes in coming weeks, and will show some examples and solutions to the type of problem you mentioned.

Thanks for the comment. I have quite a few more articles (in my head) for this series; I will be using the comments to help me decide which topics you are all interested in.

Murray Oldfield · Mar 26, 2016

Thanks for the comments Francis, I think Mark sums up what I was aiming for. The first round of posts is to introduce the major system components that affect performance, and you are right that memory has a big role to play along with CPU and IO. There has to be a balance - to keep stretching the analogy, good nutrition and peak performance are the result of a balanced diet. Certainly badly sized or configured memory will cause performance problems for any application, and with Java applications this is obviously a big concern. My next post is about capacity planning for memory, so hopefully this will be useful - although I will be focusing more on the intersection with Caché. As Mark pointed out, NUMA can also influence performance, but there are strategies to plan for and mitigate the impact of NUMA, which I will talk about in my Global Summit presentations and will also cover in this series of posts.

Another aim in this series is to help customers who are monitoring their systems to understand which metrics are important, and from that use the pointers in these posts to start to unpack what's going on with their application and why - and whether action needs to be taken. The best benchmark is monitoring and analysing your own live systems.

Murray Oldfield · Mar 31, 2016

Hi Michael, it's possible to update configuration files, run scripts, etc. that could be used to configure a system, so the short answer is yes. At the Ensemble level, investigate what you can do with Caché[/Ensemble/HealthShare] %Installer as well. For example, I chose to do Caché configuration with %Installer rather than edit the cpf file. I use both tools when configuring benchmark servers... I install web servers and Caché and configure the OS etc. with Ansible, then use %Installer to do the final Caché and application-level work such as creating databases, namespaces and global mappings, and configuring global buffers. You can call Caché routines from the command line, so once Caché is installed you can run any routine. I'll create a post about this.
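
For example, a minimal sketch of kicking off a %Installer manifest from the command line once Caché is installed (the instance name H2015 and the manifest class MyApp.Installer are placeholders, not a real kit):

# run the setup() entry point generated for a %Installer manifest class - placeholder names
echo 'do ##class(MyApp.Installer).setup()' | csession H2015 -U "%SYS"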

I suggest you map out the steps you need to do, then get familiar with the Ansible and Caché %Installer actions in the respective docs. There is already a post on the Community about %Installer.

Regards, MO

Murray Oldfield · May 4, 2016

I was asked a couple of questions offline, so the following is to answer them:

Q1. In your article, why do you say it is necessary to change the information strings in snmpd.conf (i.e. syslocation/syscontact)?

A1. What I mean is that you should change syslocation and syscontact to reflect your site, but leaving them as the defaults in the sample will not stop SNMP working with the sample snmpd.conf file.

Murray Oldfield · May 4, 2016

Q2. You also mention basic errors you made in configuring it - which were these? It might be helpful to mention the debugging facilities for SNMP (^SYS("MONITOR","SNMP","DEBUG")) as well?

A2. One problem was misconfiguring the security settings in snmpd.conf. Following the example above will get you there.

I also spun my wheels with what turned out to be a spelling (or case) error on the line agentXSocket tcp:localhost:705. In the end I figured out the problem was to do with agentX not starting by looking at the logs written to the install-dir/mgr/SNMP.log file. Caché logs any problems encountered while establishing a connection or answering requests in the SNMP.log. You should also check cconsole.log and the logs for snmpd in the OS.

On Windows, iscsnmp.dll logs any errors it encounters in %System%\System32\snmpdbg.log (on a 64-bit Windows system, this file is in the SysWOW64 subdirectory).

As pointed out in Fabian's question more information can be logged to the SNMP.log if you set ^SYS("MONITOR","SNMP","DEBUG")=1 in the %SYS namespace and restart the ^SNMP Caché subagent process. This logs details about each message received and sent.
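
For example, a rough sketch of turning debugging on from the shell (the instance name and install path below are placeholders):

echo 'set ^SYS("MONITOR","SNMP","DEBUG")=1' | csession H2015 -U "%SYS"
# restart the ^SNMP subagent process, then watch the log
tail -f /path/to/install-dir/mgr/SNMP.log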

Thanks for the questions MO

Murray Oldfield · May 4, 2016

Thanks for adding your experience. Yes, your method for sizing per user process makes perfect sense, and that is how I did it when using client/server applications. I spend a lot of time now with a CSP (web) application, which has fewer server processes per user, so the calculations per user are different.

Similarly, with memory so plentiful now, 1023 MB is often the default for the routine buffer, but smaller sites or small VMs may be adjusted down.

The 60/40 rule came about from a need to size a new site, but I also like the idea of using a percentage for expected active data. In the end, the best path is to start in the ballpark with the rules we have and, over time with constant monitoring, adjust if and when needed.

Thanks again. MO

Murray Oldfield · Jul 7, 2016

Hi, mirroring will not make much difference. Without ECP you may see tens of writes per second on the database server journal disk (e.g. primary mirror). With ECP you will see perhaps thousands of small writes per second on the data server (e.g. primary mirror) from journal sync activity, which is why you need such a tight response time on the journal disk when using ECP. My next posts will be about ECP and will cover this in more detail.
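
In the meantime, a rough way to see the journal write pattern on Linux is to watch the device holding the journals with iostat while the application is busy, and compare the writes-per-second and write latency columns (a minimal sketch; picking out the right device is left to you):

# extended device stats in KB, every 5 seconds - look at the device holding the journals
iostat -xk 5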

Right now I am on leave in Bali with only occasional Internet access until late July, so look for a post in early August.

Murray Oldfield · Sep 5, 2016

Just a hint... if you are running Linux you can use the shell to quickly script a progression of runs, for example starting with 2 processes and stepping up in twos to 30 processes.

for i in `seq 2 2 30`; do echo "do ##class(PerfTools.RanRead).Run(\"/db/RANREAD\",${i},10000)" | csession H2015 -U "%SYS"; done

Piping the command to csession requires that operating-system-based authentication is enabled and that your Unix shell user exists in Caché.

See: Configuring for Operating-System–based Authentication in DocBook

Murray Oldfield · Oct 14, 2016

Hi, thanks for the comments. 

You're right, I had a bad choice of words... For ECP systems, the sustained-throughput average write response time applies to all journals -- because they are on the same disk.

I really can't give guidance on specific IO metrics; you will have to validate for your own systems.

Murray Oldfield · Oct 25, 2016

Hi, if the question is how to format code, I use three tildes ("~~~") on the line immediately before and immediately after the block of code. For example, the following block has the tildes fencing the code:

~~~
//Get PDF stream object previously saved
Set pdfStreamContainer = ##Class(Ens.StreamContainer).%OpenId(context.StreamContainerID)
Try {
    Set pdfStreamObj = pdfStreamContainer.StreamGet()
}
Catch {
    $$$TRACE("Error opening stream object ID = "_context.StreamContainerID)
    Quit
}
//Set PDF stream object into new OBX:5 segment
Set status = target.StoreFieldStreamBase64(pdfStreamObj,"OBX("_count_"):ObservationValue(1)")
Set ^testglobal("status",$H) = status
~~~

I note that the engine for the community website should automatically highlight known languages, but you can override it.

e.g. put the language name right after the opening tildes on the first line: "~~~javascript" or "~~~python".


~~~javascript
var s = "JavaScript syntax highlighting";
alert(s);
~~~

~~~python
s = "Python syntax highlighting"
print s
~~~

Murray Oldfield · Nov 22, 2016

Hi, the back-of-the-envelope logic is like this:

For the old server you have 8 cores. Assuming the workload does not change:

Each core on the new server is capable of about 25% more processing throughput. Or, put another way, each core of the old server is capable of about 75% of the processing throughput of a core on the new server. So roughly (8 * .75), the old cores equate to about 6 cores on the new server.
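
As a quick check of the arithmetic (numbers straight from the example above):

echo "8 * 0.75" | bc    # = 6.00, i.e. 8 old cores equate to roughly 6 new cores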

You will have to confirm how your application behaves, but if you are using the calculation to work out how far you can consolidate applications on a new virtualized server, it gives you a good idea of what to expect. If it is virtualized you can also right-size after monitoring to fine-tune if you have to.

Murray Oldfield · Nov 25, 2016

Hi Timur, thanks for the comments and links. I agree, 3D XPoint is a case of waiting to see real performance when it's released. Even 10x lower latency is still a big jump - the figures in the post are what is publicly talked about by Micron now. My aim is to give people a heads up on what's coming and to look out for it (although vendors will be shouting it from the rooftops :) Hopefully we will have some real data and pricing soon.

Murray Oldfield · Dec 7, 2016

Hi Anzelem, obviously you had some problems, and the best solution in your case was to use a single instance. My experience is that ECP is widely used where customers want a single database instance to scale beyond the resources of a single server. On a well-configured solution the impact for a user, e.g. response time for a screen, should be negligible. I have also seen cases where ECP is chosen for HA.

Bottlenecks in the infrastructure (network, storage or application) will have an impact. This is true for single-instance or ECP configurations. As noted in the post, ECP has strict requirements for network and storage that will impact performance. There are also considerations in application design.

It is true there are additional latencies when using a distributed architecture. Even on a well-resourced setup, expect some loss of overall efficiency when comparing processing on a single server vs a distributed architecture -- four ECP application servers (CPU + memory size x) will not produce throughput equal to the total of four database servers (CPU + memory size x) running separate instances of an application. But, as above, this should not impact individual users' experience.

Murray Oldfield · Dec 8, 2016

Hi Alexey, I have not seen Caché implemented as a distributed database in the way you describe at customer sites. I can see how flash storage is a way to increase IOPS and lower latency, so it goes some way to moving the choke point on that particular problem. Of course the problem has not gone away; we have just moved the threshold.

Murray Oldfield · Dec 28, 2016

Hi, great tool - I look forward to exploring it further. In the meantime... it appears that the chart does not work for values >1,000,000, e.g. see glorefs/sec. The value does show on the sensor display...

Is it a simple fix to increase a graph's maximum value? See the example below.

[Screenshot: OK on the sensor display]

Murray Oldfield · Dec 29, 2016

One of the things I often do in lab environments (not production) is to use the Caché ./mgr folder, because I know it's always going to exist and it has the advantage of being a short relative path for .xml files etc.

If I copy the setup files to <cache instance directory>/mgr/polykit/polykit-v1.0.0

Then in Caché I can use a short relative path:

%SYS>set status = $System.OBJ.Load("./polykit/polykit-v1.0.0/DashboardInstaller.xml", "ck")

Murray Oldfield · Dec 29, 2016

I wanted to blast the dashboard out to a bunch of my lab VMs this morning, so I used Ansible. I've put the code snippet here in case it's useful.

The full post and introduction to using Ansible is here: Ansible Post

I admit this is a bit of a hack, but it shows some useful syntax.

polykit-install.yml

---
### Main Task - install polymetric dashboard ###
- debug: msg="Start polykit-install.yml"

- name: local path on laptop for tar.gzip of dashboard kit relative to ansible root
  set_fact:
    local_package_path: ./z-packages

- debug: 
    msg: "{{ local_package_path }}"

# hack! ASSUME only one instance per host!!!! ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ !!!!

- name: get instance name
  shell:
    ccontrol qlist | awk -F '^' '{print $1}'
  register: instance_name

- debug: var=instance_name.stdout    

- name: get instance path (add ./mgr )
  shell:
    ccontrol qlist | awk -F '^' '{print $2}'
  register: instance_path

- debug: var=instance_path.stdout    

- name: set base path
  set_fact:
    base_path: "{{ instance_path.stdout }}/mgr/polykit"

# hack! ASSUME only one instance per host!!!! ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ !!!!


# Create install directory on target
- name: Create install directory
  file:
    dest="{{ base_path }}"
    mode=0777
    state=directory

# Copy polykit file to install directory
- name: Copy polykit.tar.gz to install directory
  copy:
    src="{{ local_package_path }}/polykit-v1.0.0.tar.gz"
    dest="{{ base_path }}"
    remote_src=False
    
# Unarchive
- name: Unarchive polykit database
  unarchive:
    src="{{ base_path }}/polykit-v1.0.0.tar.gz"
    dest="{{ base_path }}"
    creates="{{ base_path }}/polykit-v1.0.0/DashboardInstaller.xml"
    remote_src=yes


# Run installer

# Build text file

- name: build csession file line at a time
  shell:
    echo "set status = \$System.OBJ.Load(\""{{ base_path }}/polykit-v1.0.0/DashboardInstaller.xml"\", \"ck\")" >"{{ base_path }}"/polykit-inst.txt
  
- shell:
    echo "set status = {{ '#' }}{{ '#' }}class(SYS.Monitor.DashboardInstaller).Install(\""{{ base_path }}/polykit-v1.0.0"\",0)" >> "{{ base_path }}"/polykit-inst.txt

- shell:
    echo "h">>"{{ base_path }}"/polykit-inst.txt

- shell:  
    cat "{{ base_path }}"/polykit-inst.txt
  register: polyinst_check
- debug: var=polyinst_check.stdout_lines

- name: Install dashboard
  shell:  
    cat "{{ base_path }}/polykit-inst.txt" | csession "{{ instance_name.stdout }}" -U %SYS
  register: polyinst_install
- debug: var=polyinst_install.stdout_lines
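
To run it, a hedged sketch assuming a thin wrapper play that includes polykit-install.yml as a task file and an inventory group called lab-vms (the playbook, inventory and group names are placeholders for whatever your Ansible project uses):

# placeholder file and group names - adjust to your project layout
ansible-playbook -i hosts polykit-play.yml --limit lab-vms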
Murray Oldfield · Jan 15, 2017

Addendum: I was reminded that there is an extra step when configuring external backups.

When the security configuration requires that the backup script supply Caché credentials, you can do this by redirecting input from a file containing the needed credentials. Alternatively, you can enable OS-level authentication and create a Caché account for the OS user running the script.
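
For example, a rough sketch of a freeze/thaw wrapper that redirects credentials from a file (the instance name, credentials file and snapshot step are placeholders, and you should check the exit status against the values documented for ExternalFreeze before relying on it):

# placeholder instance name and credentials file
csession H2015 "##Class(Backup.General).ExternalFreeze()" < /backup/cache_credentials.txt
# check $? against the documented ExternalFreeze return codes before continuing
# ... take the VM or storage snapshot here ...
csession H2015 "##Class(Backup.General).ExternalThaw()" < /backup/cache_credentials.txt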

Please see the online documentation for full details.

Murray Oldfield · Jan 15, 2017

Hi Alexey, good question. There is no one-size-fits-all answer. My aim is to highlight how external backups work so the teams responsible can evaluate their best solution when talking to vendors.

Third-party solutions will be a suite of management tools, not simply backup/restore, so there are many features to evaluate. For example, products that back up VMs will have features such as changed block tracking (CBT), so only changed blocks in the VM (not just changes to CACHE.DAT) are backed up -- in effect, incremental backups. They also include many other features, including replication, compression, deduplication, and data exclusion, to manage what is backed up, when, and what space is required. Snapshot solutions at the storage array level also have many similar functions. You can also create your own solutions integrating freeze/thaw, for example using LVM snapshots.

Often a Caché application is only one of many applications and databases at a company, so usually the question is turned around to "can you back up <Caché Application x> with <vendor product y>?" Now, with knowledge of how to implement freeze/thaw, you can advise the vendor of your Caché application's requirements.

Murray Oldfield · Jan 18, 2017

To back up only selected files/filesystems on logical volumes (for example a filesystem on LVM2), the snapshot process and freeze/thaw scripts can still be used and would be just about the same.

As an example the sequence of events is:

  • Start the process, e.g. via a script scheduled via cron.
  • Freeze Caché via script, as above.
  • Create snapshot volume(s) with lvcreate.
  • Thaw Caché via script, as above.
  • Mount the snapshot filesystem(s) (for safety, mount read-only).
  • Back up the snapshot files/filesystems to somewhere else…
  • Unmount the snapshot filesystem(s).
  • Remove the snapshot volume(s) with lvremove.

Assuming the above is scripted with appropriate error traps, this will work for virtual or physical systems.
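
A rough sketch of what such a script might look like (the volume group, snapshot size, paths and script names are placeholders, and error traps are omitted for brevity):

./cache_freeze.sh                                            # freeze Caché via script, as above
lvcreate --snapshot --size 20G --name dbsnap /dev/vg00/db    # size for the expected change rate
./cache_thaw.sh                                              # thaw Caché as soon as the snapshot exists
mkdir -p /mnt/dbsnap
mount -o ro /dev/vg00/dbsnap /mnt/dbsnap                     # mount the snapshot read only
rsync -a /mnt/dbsnap/ /backup/$(date +%Y%m%d)/               # copy the files somewhere else
umount /mnt/dbsnap
lvremove -f /dev/vg00/dbsnap                                 # drop the snapshot pool

While the copy runs, lvs will show how full the snapshot pool is getting in its Data% column.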

There are many resources on the web for explaining LVM snapshots. A few key points are:

LVM snapshots use a different copy-on-write approach from VMware. VMware writes to the delta disk and merges the changes when the snapshot is deleted, which has an impact that is managed but must be considered -- as explained above. For LVM snapshots, at snapshot creation LVM creates a pool of blocks (the snapshot volume), which also contains a full copy of the LVM metadata of the volume. When writes happen to the main volume, the block being overwritten is copied to this new pool on the snapshot volume and the new block is written to the main volume. So the more data that changes between when a snapshot was taken and the current state of the main volume, the more space will be consumed by that snapshot pool -- you must therefore consider the data change rate in your planning. When an access comes in for a specific block, LVM knows which block to access.

As with VMware, best practice for production systems is not to keep multiple snapshots of the same volume: every time you write to a block in the main volume you potentially trigger writes in every single snapshot in the tree. For the same reason, accessing a block can be slower.

Deleting a single snapshot is very fast. LVM just drops the snapshot pool.