What's this odd behavior with "select count()" and a Unique Index?
Hello,
I have a class with a Unique index (pxfactidIndex) on a %Numeric property (pxfactid) (partially edited code snippet below):
Property pxfactid As %Library.Numeric(MAXVAL = 9223372036854775807, MINVAL = -9223372036854775808, SCALE = 0) [ SqlColumnNumber = 7 ];
Index pxfactidIndex On pxfactid [ Unique ];
Storage Default
{
<Data name="FactDefaultData">
<Value name="1">
<Value>pysubjectid</Value>
</Value>
...
<Value name="6">
<Value>pxfactid</Value>
</Value>
...
</Data>
<DataLocation>^CRMBI.FactD</DataLocation>
<DefaultData>FactDefaultData</DefaultData>
<ExtentLocation>^CRMBI.Fact</ExtentLocation>
<ExtentSize>3521840</ExtentSize>
<IdFunction>sequence</IdFunction>
<IdLocation>^CRMBI.FactD</IdLocation>
<Index name="DDLBEIndex">
<Location>^CRMBI.FactI.2</Location>
</Index>
<Index name="IDKEY">
<Location>^CRMBI.FactD</Location>
</Index>
<Index name="pxfactidIndex">
<Location>^CRMBI.Fact.4</Location>
</Index>
...
<IndexLocation>^CRMBI.FactI</IndexLocation>
<Property name="%%ID">
<AverageFieldSize>4.99</AverageFieldSize>
<Histogram>$lb(.06666666666666666667,1,0,$lb(27673,228574,456707,618301,892202,1107091,1405679,1604163,1738513,2015749,2270724,2530345,2830827,3047237,3296375,3504324),$lb(1,0,0,0,0,1,1,1,0,1,1,1,0,1,1),$lb(842479159,926299955,842544439,842151989,875902519,875902519,909195315,909195315,943272498,943272498,825307191,825243440,875574582,875574582,909128753,909128753,926103605,825701176,842019125,808531255,842477623,842477623,892547123,892547123,942878776,842543920,858797111,808728370,842610227,842610227,892351539,859123764))</Histogram>
<Selectivity>1</Selectivity>
</Property>
<Property name="pxfactid">
<AverageFieldSize>9.99</AverageFieldSize>
<Histogram>$lb(.06666666666666666667,1,0,$lb(-8377896048695389137,-7487164305435710908,-6406061662044370629,-5112561079609519767,-4588722295094185245,-4512887624516713113,-3293381287254976544,-1753938466698510428,-1561375857877981192,-211989232600898713,2857966673366783778,4457490938253726327,5246678144060189771,6591622321393794159,8944949574919674333,8944949574921469469),$lb(1,1,1,1,3,1,1,2,1,0,0,0,0,0,11),$lb(758657847,942880567,926169143,926169143,909389878,909389878,892416306,892416306,875903032,943208242,825374776,875901234,858929459,858929459,825701683,926233401,892743987,825570865,842084665,758264113,842544439,842544439,875836727,875836727,892482614,892482614,909457713,909457713,943273012,825833015,842085430,943273012))</Histogram>
<Selectivity>0.0000283942484610317%</Selectivity>
</Property>
...
}
I see the following odd behavior when counting how many records have that property set to null:
- The first query (17) returns the number of records in the table. The second query (18) returns what I expect: all records have non-null values for the pxfactid property. However, when I run the third query (19), which counts all records with non-null values in that field, I get a much smaller, unexpected number.
(screenshot: queries 17-19 and their results)
- The fourth query (20) counts the records returned by a subquery that selects all non-null values for that property, and it returns the expected number, matching the total number of records in the table.
(screenshot: query 20 and its result)
Does anyone have any idea what kind of coding error would produce the odd behavior seen in query 19 (a wrong count of records with non-null values in that indexed property)?
Note that the values in that field have a large range; here's a sample of some of them:
(screenshot: sample pxfactid values)
Thanks
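Since the screenshots are missing, here is a sketch of the four counts described above, reconstructed in SQLite as a stand-in for IRIS SQL (the exact query text is my assumption; table and values are illustrative). On a table where the column holds no NULLs, the three non-null counts should all equal the total:

```python
import sqlite3

# Toy reconstruction (SQLite, not IRIS) of the four counts in the question.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Fact (pxfactid INTEGER UNIQUE)")
con.executemany("INSERT INTO Fact VALUES (?)",
                [(-8377896048695389137,), (-100,), (0,), (4457490938253726327,)])

q = lambda sql: con.execute(sql).fetchone()[0]
total    = q("SELECT COUNT(*) FROM Fact")                                   # query 17
null_cnt = q("SELECT COUNT(*) FROM Fact WHERE pxfactid IS NULL")            # query 18
not_null = q("SELECT COUNT(*) FROM Fact WHERE pxfactid IS NOT NULL")        # query 19
sub_cnt  = q("SELECT COUNT(*) FROM "
             "(SELECT pxfactid FROM Fact WHERE pxfactid IS NOT NULL)")      # query 20

# With no NULLs stored, queries 17, 19, and 20 must agree.
print(total, null_cnt, not_null, sub_cnt)  # -> 4 0 4 4
```

That agreement is what the poster expected; query 19 returning a smaller number than queries 17 and 20 is the anomaly.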
Comments
Hi Jean,
at first glance I'd expect the query plans for #18 and #19 to be quite similar.
The SQL Shell offers 3 ways to see the query plan:
- show — Show the execution plan for the current statement.
- show pl[an] [v[erbose]] — Shows the current statement's execution plan. If the verbose qualifier is used, show all the module details for the current statement's execution plan; otherwise, display only the top-level module details by default.
- show planalt [v[erbose]] — Shows the current statement's alternate execution plans. If the verbose qualifier is used, show all the module details for the current statement's execution plan and all alternate plans; otherwise, display only the top-level module details by default.
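The same comparison can be sketched outside IRIS. This uses SQLite's EXPLAIN QUERY PLAN purely as an analogy to the `show plan` commands above (the table and index names mirror the question; the plan text SQLite prints is its own, not IRIS's):

```python
import sqlite3

# Compare the plans SQLite picks for the IS NULL vs IS NOT NULL predicates.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Fact (pxfactid INTEGER)")
con.execute("CREATE UNIQUE INDEX pxfactidIndex ON Fact(pxfactid)")

plans = {}
for pred in ("pxfactid IS NULL", "pxfactid IS NOT NULL"):
    # EXPLAIN QUERY PLAN rows are (id, parent, notused, detail).
    rows = con.execute(
        "EXPLAIN QUERY PLAN SELECT COUNT(*) FROM Fact WHERE " + pred
    ).fetchall()
    plans[pred] = [row[3] for row in rows]
    print(pred, "->", plans[pred])
```

Seeing the two plans side by side is exactly the point of `show plan`: the shapes of the two accesses should explain any cost difference.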
This might offer a chance to identify the difference.
There's still an (unlikely) chance that #19 runs on some broken cached query that was never updated, so clearing cached queries might be a possible solution, though not an explanation.
Best regards, Robert
Thank you, Robert (great to hear from you)! I will give those a shot to see what's going on and potentially clear cached queries if nothing obviously problematic shows up in the plans.
Best regards, Jean
Hi again Robert,
I also would have expected the plans for both 18 and 19 to be the same, but "quelle surprise!":
18: Beautiful, expected plan:
.png)
19: What? I must have messed something up in the evolution of this class to have such an expensive plan that produces incorrect results:
.png)
I'll investigate further to figure out how to address that second plan, but any input from you or others would be welcome.
Thanks and best regards.
OOOOOPS 😮
I think the difference in relative costs makes sense. In the "WHERE pxfactid IS NULL" query, you are only looping over a specific subsection of the global. This is similar to a query that says SELECT COUNT(*) FROM SomeTable WHERE pxfactid = 'hello'.
In the second query, you are not looping over a subrange of nodes with a specific value; instead you are looping over the entirety of the index and testing whether its contents match null or not. This is equivalent to SELECT COUNT(*) FROM SomeTable WHERE pxfactid != 'hello'.
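The cost difference described above can be modeled with a sorted list standing in for the ordered index (a toy model, not IRIS internals): an equality predicate seeks straight to one value's subrange with binary search, while an inequality predicate has to test every entry.

```python
import bisect

# Toy model of an ordered index: sorted (value, row_id) pairs.
index = sorted((v, i) for i, v in enumerate([5, -3, 5, 9, -3, 7]))

def count_equal(idx, value):
    # "WHERE col = value": seek the subrange with two binary searches.
    lo = bisect.bisect_left(idx, (value, -1))
    hi = bisect.bisect_right(idx, (value, float("inf")))
    return hi - lo

def count_not_equal(idx, value):
    # "WHERE col != value": no single subrange; must walk the whole index.
    return sum(1 for v, _ in idx if v != value)

print(count_equal(index, 5), count_not_equal(index, 5))  # -> 2 4
```

The seek touches O(log n) entries to locate the subrange; the full walk touches all n, which matches the much more expensive plan seen for the IS NOT NULL query.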
It's weird that your extent size is showing up as 3521840 but your SELECT COUNT(*) says 49468
Maybe you need to rebuild pxfactidIndex?
In any unclear situation where incorrect data is returned, two things need to be done first: rebuild the indexes and tune the tables:
- Kyle Lists Common Problems and Their Solutions: 7 Reasons A Query Returns No Data
- The One Query Performance Trick You NEED to Know? Tune Table!
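The "tune table" step refreshes the optimizer's statistics so plans reflect the real data. As an analogy only (SQLite's ANALYZE plays the role that Tune Table plays in IRIS; names mirror the question):

```python
import sqlite3

# Gather fresh optimizer statistics, then inspect what was recorded.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Fact (pxfactid INTEGER)")
con.execute("CREATE UNIQUE INDEX pxfactidIndex ON Fact(pxfactid)")
con.executemany("INSERT INTO Fact VALUES (?)", [(i,) for i in range(1000)])

con.execute("ANALYZE")
# sqlite_stat1 rows are (table, index, "rows rows_per_key ...").
stats = con.execute("SELECT * FROM sqlite_stat1").fetchall()
print(stats)
```

Stale statistics (like the Selectivity and Histogram values in the storage definition above) are a common reason the optimizer picks a bad plan, which is why tuning comes right after rebuilding the index.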
It turns out this was a bug with parallel processing: negative numbers in the data that were more negative than the null marker caused data to be skipped. Development is fixing this.
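The mechanism can be illustrated with a toy model (the sentinel value and the scan logic below are my assumptions for illustration, not IRIS internals): if a scan assumes nothing collates before the null marker and therefore starts there, values more negative than the marker are silently skipped.

```python
# Hypothetical sentinel that "null" sorts as in the index.
NULL_MARKER = -2**62

data = [-2**63 + 5, -2**62 - 1, -100, 0, 4457490938253726327]

# Buggy scan: starts at the null marker, assuming no value sorts before it,
# so the two entries more negative than the marker are never visited.
buggy_not_null = sum(1 for v in sorted(data) if v >= NULL_MARKER)

# Correct scan: visits every entry.
correct_not_null = len(data)

print(buggy_not_null, correct_not_null)  # -> 3 5
```

This matches the symptom in the thread: the non-null count came back much smaller than the extent size precisely because the data contained very large negative pxfactid values.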