Question · Scott Beeson · Oct 6, 2016

ELI5: How does selecting 400,000 records take 12 seconds but selecting 500,000 records take ... to be determined?

I experience this constantly with Caché SQL, especially when querying the ATNA log.

SELECT TOP 400000 * FROM HS_IHE_ATNA_Repository.Aggregation ORDER BY ID DESC

That took 12 seconds.  I then upped the number to 500,000 and it took 185 seconds.

Shouldn't the execution time scale proportionately?

If I run the 500,000 query again, it takes 2.4 seconds.
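(For anyone who wants to reproduce this, here is a minimal Python timing sketch, assuming the instance is reachable over ODBC via pyodbc; the DSN name and credentials are placeholders, not the real connection details.)

# Minimal timing sketch. Assumes a pyodbc connection to the Caché
# instance through an ODBC DSN; "CACHE_DSN" and the credentials are
# placeholders, not real connection details.
import time
import pyodbc

conn = pyodbc.connect("DSN=CACHE_DSN;UID=user;PWD=password")
cursor = conn.cursor()

query = ("SELECT TOP 500000 * "
         "FROM HS_IHE_ATNA_Repository.Aggregation ORDER BY ID DESC")

for attempt in (1, 2):  # the second run should be much faster (warm cache)
    start = time.perf_counter()
    rows = cursor.execute(query).fetchall()
    print("run %d: %d rows in %.1f s"
          % (attempt, len(rows), time.perf_counter() - start))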

Comments

Kyle Baxter · Oct 6, 2016

Hi Scott,

Nope! The issue here is probably caching. While data is in memory it is fast; it is slow when it has to come off of disk. So to get the next 100,000 rows you have to read data off the disk, and that takes some time. If everything were in memory, or everything were on disk, then you would see a roughly proportional increase.
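Rough back-of-envelope arithmetic on the timings above makes the point; this assumes (a simplification) that the fast run served all 400,000 rows from cache and the slow run paid disk cost only for the extra 100,000:

# Back-of-envelope arithmetic from the timings in the question.
# Assumes the fast run served all 400,000 rows from cache and the slow
# run paid disk cost only for the extra 100,000 -- a simplification.
warm_s = 12.0
cold_extra_s = 185.0 - 12.0

per_row_cached = warm_s / 400_000        # ~30 microseconds per row
per_row_disk = cold_extra_s / 100_000    # ~1.7 milliseconds per row

print("cached: %.0f us/row" % (per_row_cached * 1e6))
print("disk:   %.2f ms/row" % (per_row_disk * 1e3))
print("ratio:  ~%.0fx slower off disk" % (per_row_disk / per_row_cached))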

Scott Beeson · Oct 7, 2016 · to Kyle Baxter

This explains it. I'd heard it before; I guess it just never really made sense to me. Sometimes it seems arbitrary which queries are fast and which are not.

Alexey Maslov · Oct 6, 2016

Sometimes such strange results come from overlooking the fact that there are usually several levels of caching, from high to low:

- Caché global cache

- filesystem cache (on Linux/UNIX only, as the Windows version uses direct I/O)

- HDD controller cache.

So even restarting Caché may not be enough to drop the caches for a clean "cold" test. The tester should be aware of the data volume involved; it should be much larger than (at least) the HDD controller cache. mgstat can help figure this out, and it can also show when you start reading data mostly from the global cache rather than from the filesystem or disk.
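On Linux, one way to get closer to a true cold test is to flush the OS page cache between runs. A minimal sketch (requires root, and note it clears only the filesystem level):

# Sketch of a "cold" run on Linux: flush the OS page cache between
# timings. Requires root, and this clears only the filesystem level --
# the Caché global buffers and the HDD controller cache are untouched.
import subprocess

def drop_os_page_cache():
    subprocess.run(["sync"], check=True)  # flush dirty pages first
    with open("/proc/sys/vm/drop_caches", "w") as f:
        f.write("3\n")  # 3 = drop page cache, dentries and inodes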

Scott Beeson · Oct 7, 2016 · to Alexey Maslov

Thanks for the info. Since we're hosted by InterSystems, I assume all of that was taken into account :)

I just want my data. :)
