APM – Monitoring SQL Query Performance
Since Caché 2017 the SQL engine has included a new set of statistics. These record the number of times a query is executed and the time it takes to run.
This is a gold mine for anyone monitoring and trying to optimize the performance of an application that includes many SQL statements, but the data is not as easy to access as some people would like.
This article and the associated sample code explain how to use this information, how to routinely extract a summary of daily statistics, and how to keep a historic record of the SQL performance of your application.
What is recorded?
Every time an SQL statement is executed, the time taken is recorded. This is very lightweight and you can't turn it off. The data includes the number of times a query has been executed in the day and the total time taken by the query.
To minimize cost, the statistics are kept in memory. Each job writes its statistics to disk periodically, and a task called "Update SQL query statistics" gathers them all into the stats table. The task is usually scheduled to run hourly, but it can be fired off manually if you want to see the statistics while testing a query; the whole process requires a little patience.
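Once the hourly task has flushed the in-memory counters, you can take a quick look at the current day's numbers. This is a minimal sketch, assuming IRIS 2019 or later, where the statistics are exposed through the INFORMATION_SCHEMA.STATEMENT_DAILY_STATS table described below:
SELECT Statement, StatCount, StatTotal
FROM INFORMATION_SCHEMA.STATEMENT_DAILY_STATS
WHERE Day = CURRENT_DATE
ORDER BY StatTotal DESC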
Warning: In InterSystems IRIS 2019 and earlier these statistics are not collected for embedded SQL in classes or routines that have been deployed using the %Studio.Project:Deploy mechanism. Nothing will break with the sample code, but it might fool you (it fooled me) into thinking everything was OK because nothing showed up as expensive.
How do you see the information?
You can see the list of queries in the Management Portal. Go to the SQL page and click on the 'SQL Statements' tab. This is good for a new query you are running and looking at, but if there are thousands of queries running it can become unmanageable.
The alternative is to use SQL to search for the queries. The information is stored in tables in the INFORMATION_SCHEMA schema. This schema has a number of tables, and I have included some sample SQL queries at the end of this article.
From IRIS 2019, the statistics are stored in the STATEMENT_DAILY_STATS table. In earlier versions you need to look at the ^rINDEXSQL global, as you can see in the example code.
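On those earlier versions, a quick way to see what has been collected is to walk the global directly in the terminal. This is a minimal sketch, assuming the subscript layout used by the sample code later in this article, ^rINDEXSQL("sqlidx",1,hash,"stat",date), with the hit count, total time and variance stored as a $list:
set hash=""
for {
set hash=$order(^rINDEXSQL("sqlidx",1,hash))
quit:hash=""
set dateh=$order(^rINDEXSQL("sqlidx",1,hash,"stat",""))
continue:dateh=""
set stats=^rINDEXSQL("sqlidx",1,hash,"stat",dateh)
write !,hash," ",$zdate(dateh)," hits=",$listget(stats,1)," time=",$listget(stats,2)
}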
When are the statistics removed?
On recent releases, removing them takes a combination of compiling the class or purging the query and running the "Cleanup SQL Statement Index" task. On earlier releases they seemed to be removed more often.
At a live site it is reasonable to expect that the statistics will be kept for more than a day or so, but the tables holding the statistics cannot be relied on as a long-term reference source for running reports or analysis. On a development or test system, the chances of them getting deleted are higher.
How can you summarize the information?
I recommend extracting the data every night into permanent tables that are easier to work with when generating performance reports. There is a chance that some information is lost if classes are compiled during the day, but this is unlikely to make any real difference to your analysis of slow queries.
The code below is an example of how you could extract the statistics into a daily summary for each query. It includes three short classes:
- DRL.MonitorSQLTask, a task that should be run every night.
- DRL.MonitorSQL, the main class, which extracts and stores the data from the INFORMATION_SCHEMA tables.
- DRL.MonitorSQLText, an optimization that stores the (potentially long) query text once, keeping only the hash of the query in the daily statistics.
The sample accesses ^rINDEXSQL directly so that it is compatible with older versions of IRIS; today I would write it with the SQL queries described below.
Tables in INFORMATION_SCHEMA schema
As well as the statistics, the tables in this schema keep track of where queries, columns, indices etc. are used. Typically INFORMATION_SCHEMA.STATEMENTS is the starting table, joined to the others on something like “Statements.Hash=OtherTable.Statement”.
A query for accessing these tables directly to find the most expensive queries for one day would be…
SELECT
Stats.Statement,
Loc.Location,
Stats.Day,
Stats.StatCount,
Round(Stats.StatTotal,2) TotalTime,
Round(Stats.StatVariance,2) Variance,
Round((Stats.StatTotal/Stats.StatCount),3) AvgTime,
SQL.Statement
FROM INFORMATION_SCHEMA.STATEMENT_DAILY_STATS Stats
LEFT JOIN INFORMATION_SCHEMA.STATEMENTS SQL on Stats.Statement=SQL.Hash
LEFT JOIN INFORMATION_SCHEMA.STATEMENT_LOCATIONS Loc on Stats.Statement=Loc.Statement
where Day='04/12/2022'
ORDER by TotalTime DESC

Whether you are considering setting up a more systematic process or not, I recommend that everyone with a large application that uses SQL runs this query today.
If a particular query shows up as expensive or slow, you can get the history by running
SELECT
Stats.Day,
Stats.StatCount,
Stats.StatTotal,
Stats.Statement
FROM INFORMATION_SCHEMA.STATEMENT_DAILY_STATS Stats
where Stats.Statement='FaWA3+/Dz0wqCyotECEe39wX2WQ='
order by Stats.Day
Notes on the sample extract code
The task extracts the previous day and should therefore be scheduled shortly after midnight.
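You can create the schedule through the Task Manager pages in the Management Portal, or programmatically. The following is a minimal sketch using the %SYS.Task API; the namespace and start time are assumptions you should adjust for your site:
set task=##class(%SYS.Task).%New()
set task.Name="SQL Statistics Summary"
set task.TaskClass="DRL.MonitorSQLTask"
set task.NameSpace="USER" // assumption: the namespace the task should run in
set task.TimePeriod=0 // 0 = run daily
set task.DailyStartTime=600 // seconds after midnight, i.e. 00:10
set sc=task.%Save()
if $system.Status.IsError(sc) do $system.OBJ.DisplayError(sc)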
You can export more historic data if it exists. To extract the last 120 days
Do ##class(DRL.MonitorSQL).Capture($h-120,$h-1)
The sample code reads the ^rINDEXSQL global directly because the earliest versions of the statistics did not expose the data to SQL.
The version I have included loops through all namespaces in the instance, but that is not always appropriate.
How to query the extracted data
Once the data is extracted, you can find the heaviest queries by running
SELECT TOP 20 S.RunDate, S.RoutineName, S.TotalHits, S.SumPTime, S.Hash, T.QueryText
FROM DRL.MonitorSQL S
LEFT JOIN DRL.MonitorSQLText T ON S.Hash=T.Hash
WHERE RunDate='08/25/2019'
ORDER BY SumPTime DESC
Also, if you pick out the hash of an expensive query, you can see the history of that query by running
SELECT S.RunDate, S.RoutineName, S.TotalHits, S.SumPTime, S.Hash, T.QueryText
FROM DRL.MonitorSQL S
LEFT JOIN DRL.MonitorSQLText T ON S.Hash=T.Hash
WHERE S.Hash='CgOlfRw7pGL4tYbiijYznQ84kmQ='
ORDER BY RunDate
Real Example
Earlier this year I looked at the most expensive queries from a live site. One query was averaging less than 6 seconds per execution, but it was being called 14,576 times per day, adding up to 85,094 seconds, nearly 24 hours of elapsed time every day. Effectively one core was fully occupied with this query. Worse still, the second query, which accounted for about an hour a day, was a variation on the first.
RunDate | RoutineName | Total Hits | Total Time | Hash | QueryText (abbreviated)
03/16/2019 | | 14,576 | 85,094 | 5xDSguu4PvK04se2pPiOexeh6aE= | DECLARE C CURSOR FOR SELECT * INTO :%col(1) , :%col(2) , :%col(3) , :%col(4) …
03/16/2019 | | 15,552 | 3,326 | rCQX+CKPwFR9zOplmtMhxVnQxyw= | DECLARE C CURSOR FOR SELECT * INTO :%col(1) , :%col(2) , :%col(3) , :%col(4) , …
03/16/2019 | | 16,892 | 597 | yW3catzQzC0KE9euvIJ+o4mDwKc= | DECLARE C CURSOR FOR SELECT * INTO :%col(1) , :%col(2) , :%col(3) , :%col(4) , :%col(5) , :%col(6) , :%col(7) , …
03/16/2019 | | 16,664 | 436 | giShyiqNR3K6pZEt7RWAcen55rs= | DECLARE C CURSOR FOR SELECT * , TKGROUP INTO :%col(1) , :%col(2) , :%col(3) , …
03/16/2019 | | 74,550 | 342 | 4ZClMPqMfyje4m9Wed0NJzxz9qw= | DECLARE C CURSOR FOR SELECT …
Table 1: Actual results from customer site showing one query taking almost 24 hours total time
Code Sample for extracting statistics daily
Class DRL.MonitorSQLTask Extends %SYS.Task.Definition
{
Parameter TaskName = "SQL Statistics Summary";
Method OnTask() As %Status
{
set tSC=$$$OK
TRY {
do ##class(DRL.MonitorSQL).Run()
}
CATCH exp {
// return the exception as a %Status so the Task Manager records the real failure
set tSC=exp.AsStatus()
}
quit tSC
}
}
Class DRL.MonitorSQLText Extends %Persistent
{
/// Hash of query text
Property Hash As %String;
/// query text for hash
Property QueryText As %String(MAXLEN = 9999);
Index IndHash On Hash [ IdKey, Unique ];
}
/// Summary of very low cost SQL query statistics collected in Caché 2017.1 and later. <br/>
/// Refer to documentation on "SQL Statement Details" for information on the source data. <br/>
/// Data is stored by date and time to support queries over time. <br/>
/// Typically run to summarise the SQL query data from the previous day.
Class DRL.MonitorSQL Extends %Persistent
{
/// RunDate and RunTime uniquely identify a run
Property RunDate As %Date;
/// Time the capture was started
/// RunDate and RunTime uniquely identify a run
Property RunTime As %Time;
/// Count of total hits for the time period
Property TotalHits As %Integer;
/// Sum of pTime
Property SumPTime As %Numeric(SCALE = 4);
/// Routine where SQL is found
Property RoutineName As %String(MAXLEN = 1024);
/// Hash of query text
Property Hash As %String;
Property Variance As %Numeric(SCALE = 4);
/// Namespace where queries are run
Property Namespace As %String;
/// Default run will process the previous day's data for a single day.
/// Other date range combinations can be achieved using the Capture method.
ClassMethod Run()
{
//Each run is identified by the start date / time to keep related items together
set h=$h-1
do ..Capture(+h,+h)
}
/// Captures historic statistics for a range of dates
ClassMethod Capture(dfrom, dto)
{
set oldstatsvalue=$system.SQL.SetSQLStatsJob(-1)
set currNS=$znspace
set tSC=##class(%SYS.Namespace).ListAll(.nsArray)
set ns=""
set time=$piece($h,",",2)
kill ^||TMP.MonitorSQL
do {
set ns=$o(nsArray(ns))
quit:ns=""
use 0 write !,"processing namespace ",ns
zn ns
for dateh=dfrom:1:dto {
set hash=""
do {
set hash=$order(^rINDEXSQL("sqlidx",1,hash))
continue:hash=""
set stats=$get(^rINDEXSQL("sqlidx",1,hash,"stat",dateh))
continue:stats=""
set ^||TMP.MonitorSQL(dateh,ns,hash)=stats
&SQL(SELECT Location into :tLocation FROM INFORMATION_SCHEMA.STATEMENT_LOCATIONS WHERE Statement=:hash)
if SQLCODE'=0 set tLocation=""
set ^||TMP.MonitorSQL(dateh,ns,hash,"Location")=tLocation
&SQL(SELECT Statement INTO :Statement FROM INFORMATION_SCHEMA.STATEMENTS WHERE Hash=:hash)
if SQLCODE'=0 set Statement=""
set ^||TMP.MonitorSQL(dateh,ns,hash,"QueryText")=Statement
} while hash'=""
}
} while ns'=""
zn currNS
set dateh=""
do {
set dateh=$o(^||TMP.MonitorSQL(dateh))
quit:dateh=""
// reset the purge flag for each day so each day's old rows are deleted exactly once
set purgedun=0
set ns=""
do {
set ns=$o(^||TMP.MonitorSQL(dateh,ns))
quit:ns=""
set hash=""
do {
set hash=$o(^||TMP.MonitorSQL(dateh,ns,hash))
quit:hash=""
set stats=$g(^||TMP.MonitorSQL(dateh,ns,hash))
continue:stats=""
// The first time through the loop for each day, delete all statistics for that day so the run is repeatable.
// But if we run for a day after the raw data has been purged, deleting up front would wreck everything,
// so do it here, where we already know there are results to insert in their place.
if purgedun=0 {
&SQL(DELETE FROM DRL.MonitorSQL WHERE RunDate=:dateh )
set purgedun=1
}
set tObj=##class(DRL.MonitorSQL).%New()
set tObj.Namespace=ns
set tObj.RunDate=dateh
set tObj.RunTime=time
set tObj.Hash=hash
set tObj.TotalHits=$listget(stats,1)
set tObj.SumPTime=$listget(stats,2)
set tObj.Variance=$listget(stats,3)
set queryText=^||TMP.MonitorSQL(dateh,ns,hash,"QueryText")
set tObj.RoutineName=^||TMP.MonitorSQL(dateh,ns,hash,"Location")
//avoid duplicating the query text in each record because it can be very long. Use a lookup
//table keyed on the hash. If it doesn't exist, add it.
&SQL(Select ID into :TextID from DRL.MonitorSQLText where Hash=:hash)
if SQLCODE'=0 {
set textref=##class(DRL.MonitorSQLText).%New()
set textref.Hash=tObj.Hash
set textref.QueryText=queryText
set sc=textref.%Save()
}
set tSc=tObj.%Save()
if $$$ISERR(tSc) do $system.OBJ.DisplayError(tSc)
} while hash'=""
} while ns'=""
} while dateh'=""
// restore the job's previous stats setting captured at the start
do $system.SQL.SetSQLStatsJob(oldstatsvalue)
}
Query Export(RunDateH1 As %Date, RunDateH2 As %Date) As %SQLQuery
{
SELECT S.Hash,RoutineName,RunDate,RunTime,SumPTime,TotalHits,Variance,T.QueryText
FROM DRL.MonitorSQL S LEFT JOIN DRL.MonitorSQLText T on S.Hash=T.Hash
WHERE RunDate>=:RunDateH1 AND RunDate<=:RunDateH2
}
}
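To consume the Export class query from ObjectScript, something like the following sketch works; the seven-day range here is just an illustration:
set stmt=##class(%SQL.Statement).%New()
set sc=stmt.%PrepareClassQuery("DRL.MonitorSQL","Export")
if $system.Status.IsError(sc) do $system.OBJ.DisplayError(sc)
set rs=stmt.%Execute(+$h-7,+$h)
while rs.%Next() {
write !,$zdate(rs.%Get("RunDate"))," ",rs.%Get("SumPTime")," ",rs.%Get("Hash")
}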
Comments
Thanks for sharing Dave, great article!
As this is an area we'd like to develop further on the product end, we're very eager to hear customers' experiences and feedback on this.
The article is considered an InterSystems Data Platform Best Practice.
@David Loveluck, thanks for sharing!
Would you also like to submit the code to GitHub and Open Exchange?
Hello,
Thank you for this useful tool.
I have an install using HealthShare 2018.1.4 where the node "stat" does not exist:
^rINDEXSQL("sqlidx",1,hash,"stat")
Is there something to enable?
I don't have this problem on my own install. Thank you.
Lorenzo
Two guesses -
First, check that the task 'Update SQL query statistics' is running and that you have waited long enough for the task to run.
Second, are you looking at a query that is embedded SQL in a 'deployed' class (meaning there is no source code)? There is a problem with statistics for deployed classes, but I don't remember if it matches your situation exactly.
The problem with stats and Studio-deployed classes goes away in a more recent release (2020, perhaps).
Thank you for your quick reply @David Loveluck !
Indeed, the task Update SQL query statistics was in a suspended state.
I resumed it, started it, and will try again :-)