Poor database write performance
Hello,
I have created this script that does a lot of writes to a single global. DB write performance is much slower than expected (compared to other similar systems).
set rec = "..." // fill it with something
set time = $piece($horolog,",",2)
while (($piece($horolog,",",2)-time) < 30) { // 30 seconds
    set ^A($System.Util.CreateGUID()) = rec
}
I have noticed the following :
- CPU usage does not reach 100% on a single core (e.g. 25% of total CPU usage should be seen on a 4-core system). Instead, much lower CPU usage is shown (with some drops to 0% from time to time). It looks like the process has to wait for I/O completion before proceeding. Removing the set statement in the while loop above (keeping only CreateGUID) allows the process to reach 100% single-core usage.
- In Process Monitor, it writes to the database using mostly 8KB blocks. Even if the database is defined to use 8KB blocks, IRIS is usually able to batch multiple writes together (giving better performance). The writes to the WIJ are done using 256KB blocks (as expected).
I have another IRIS server with the exact same specs, and it behaves as expected :
- CPU reaches 100% usage on one core (as expected).
- It performs writes using blocks bigger than 8KB, followed by a lot of writes using 512KB blocks at the end (before another journal / database cycle occurs). The slow server does not have such 512KB writes.
What I have tried :
- restarting the IRIS instance
- using another database (like TEMP). One theory was that the "slow" database might be heavily fragmented, but that is not the case.
- killing the ^A global beforehand, or writing to another global
Additional info :
- databases are the same : encrypted, 8KB block size. Both have similar size (around 1.5TB) and free space left (a few percent). That should be plenty of space to store the ^A nodes (no expansion needed).
- journaling is enabled on both systems.
- global buffer cache : both are set to 4095MB
- both systems are using Windows Server 2012 and Hyper-V. CPU frequency is similar.
- both systems are using Sophos. Maybe there is an exclusion rule for the D:\ drive on one of them. But AFAIK that should not explain the "8KB only" writes.
Comments
Out of interest:
- Are both instances running on the same version of Windows including architecture?
- Are both instances using the same license type for IRIS?
- I'm not sure if one using a community/temp license vs a "full" license could have this effect, but thought it's worth asking.
- If you call $System.Util.CreateGUID() without writing it to the database, does the CPU still hit 100% on the single core? (Just thinking this could point to the CreateGUID call being the bottleneck instead of the DB write.)
- Both systems are using Windows Server 2012 R2 Standard and Hyper-V (with very similar CPUs).
- Both systems are using a core license.
- CreateGUID is not the bottleneck for sure. This is something I checked very early. Removing the write to the global (keeping CreateGUID) allows the CPU to reach 100%. The effect of using a GUID (versus an incremental ID) is to spread out the global node writes, which might affect performance. But that is not the explanation, because then both systems should be affected.
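A minimal sketch of how the GUID versus incremental ID comparison could be quantified on each server, by counting how many sets complete in a fixed window. It reuses ^A, rec and CreateGUID from the original script; the names duration, count and start are illustrative only. It is meant to be placed in a routine or class method rather than typed into the terminal, and it uses $zhorolog instead of $piece($horolog,",",2) so the timing does not break if the run crosses midnight. Note that it kills ^A between runs.

```objectscript
    // Sketch only: compare set throughput for scattered vs. sequential subscripts.
    // "duration", "count" and "start" are illustrative names, not from the original script.
    set rec = "..." // fill it with something
    set duration = 30 // seconds per run

    // Run 1: GUID subscripts (writes spread across the global)
    kill ^A
    set count = 0, start = $zhorolog
    while (($zhorolog - start) < duration) {
        set ^A($System.Util.CreateGUID()) = rec
        set count = count + 1
    }
    write "GUID subscripts: ", count, " sets in ", duration, " s", !

    // Run 2: sequential subscripts (writes appended at the end of the global)
    kill ^A
    set count = 0, start = $zhorolog
    while (($zhorolog - start) < duration) {
        set ^A($increment(count)) = rec
    }
    write "Sequential subscripts: ", count, " sets in ", duration, " s", !
```

Running the same routine on the slow and the fast server gives two numbers per system. If the slow server is slower by a similar factor in both runs, the subscript distribution is unlikely to be the cause.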
I have edited OP to reflect those details.
I have tested this on 4 systems (all very similar), and only one behaves like that (slow DB writes).
- Are both instances running under a system service account (or user account)?
- Try to raise process priority before executing your script with: w $SYSTEM.Util.SetPrio(7) - does it change anything?
Do you have write caching enabled on one server vs. others?
First of all, I'd point to Sophos. It doesn't check just the 8K write; it checks the full iris.dat file if it's not excluded. Try switching off the AV.
It looks like there is some disk I/O difference between the two systems.
How do the two systems' storage setups (disk, controller, etc.) compare?
Are the 2 systems equally configured in terms of available RAM and global buffers?
Is the shared memory using large pages? (check messages.log during startup)
I'd monitor and compare disk usage and I/O while running.
The spray of 8K I/O is a feature of IRIS. My guess is you have a smart controller combining writes on one server and not the other.
Look at this Storage Configuration
Hi,
Couple of things to check.
Is there any difference in server design, e.g. number of disks, SCSI controllers, volume/storage distribution, etc.?
Is the VM definition the same? e.g. storage driver versions (generic scsi controller vs hyperV SCSI controller)
Is the OS on the host and in HyperV the same?
Is the storage provider design the same?
Is the IRIS config the same (i.e. the cpf file)? In particular, are the settings below present?
[config]
wduseasyncio=1
asyncwij=8
I guess both IRIS versions are exactly the same build, although I have never heard of that affecting disk performance.
In your question, yesterday you published the mgstat results for slow and fast servers. Now it looks like you've deleted this data, since I can't see it. Could you return them?
I noticed a very significant difference there for two parameters, namely routinebuffers and numberofcpus.