Written by

Question ha haha · Aug 13

Inconsistency in Caché Database: Mismatch Between Total Count Based on Unique Identifiers, Deduplicated Count, and Conditional Query Results

In the Caché database, when calculating the total count based on the unique identifier of a record, the quantity is over 1.2 million. After removing duplicates based on the unique identifier and then calculating the total count, the quantity is over 400,000. When grouping by the unique identifier, it can be observed that the count for this identifier is not one. However, when performing a conditional query based on the identifier, only one record can be retrieved. Why is this the case?

Product version: Caché 2016.1

Comments

ha haha  Aug 14 to Evgeny Shvarov

I'm not quite clear, could you please elaborate?

0
Robert Cemper  Aug 14 to ha haha

In SMP (System Management Portal), you step to EXLORER and then step into SQL
where you select your TABLE. and can rebuild index


Furthermore, every persistent class  has by default
• classmethod %BuildDeferredIndices
• classmethod %BuildIndices
• classmethod %BuildIndicesAsync
Next variant : use $SYSTEM.OBJ.ValidateIndices()
Details described here  Fix broken index 8 years ago, still valid
 

0
ha haha  Aug 14 to Robert Cemper

Since I'm using a third-party database, I can't fix their indexing issues directly. I need to manually control pagination, so I'm wondering if there's a way to avoid retrieving duplicate indexes. Using SQL deduplication isn't efficient due to the large volume of data, making it difficult to fetch results effectively.

0
Robert Cemper  Aug 15 to ha haha

Well, you have to do it yourself.
Suggestion: Keep a list of the indices processed and skip all followers
For the list you need a small Stored Procedure that you add to
your SQL SELECT in the WHERE clause.

CREATEPROCEDURE SQLUSER.DUPL(valueVARCHAR, idINTEGER)
RETURNSINTEGERLANGUAGE OBJECTSCRIPT
{
 set used=$d(^||dupl(value))
 set ^||dupl(value,id)=$i(^||dupl(value))  
 quit used
}

And in the SELECT

SELECT id, sickindex, . . . . . 
FROM your.data 
WHERE DUPL(sickindex,id) < 1

As a side effect, you create a list of affected indices.
I used a PPG to avoid the need to clear it before use.
If you are interested in the duplicate, you need to change the global name
and add some cleanup before use  
 

0
ha haha  Aug 18 to Robert Cemper

Thank you very much. I will try to use this method.

0
ha haha  Aug 14 to Evgeny Shvarov

Now my problem is that during data migration, duplicate data appears in the queries. Since pagination is manually controlled, I have to perform deduplication after migrating the data to the target database. Because the data volume is large, this is very time-consuming. Is there a good solution?

0