Written by

Sales Engineer at InterSystems
Discussion Sylvain Guilbaud · Feb 21, 2022

How to efficiently store historical data, similar to current data, without mixing physical storage ?

Let's consider you would like to efficiently store your historical data in a similar structure than the one used for your current data, but without sharing the same physical storage (ie : not in the same global). What is the most efficient way to do it ?

Below a simple class of your current data : 


Class data.current.person Extends (%Persistent, %Populate)
{

Parameter DEFAULTGLOBAL = "^on.person";

Property name As %String;

Property dob As %Date(FORMAT = 4);

Property activ As %Boolean [ InitialExpression = 1 ];

Property created As %TimeStamp [ InitialExpression = {$zdt($h,3)} ];

Storage Default { <Data name="personDefaultData"> <Value name="1"> <Value>%%CLASSNAME</Value> </Value> <Value name="2"> <Value>name</Value> </Value> <Value name="3"> <Value>dob</Value> </Value> <Value name="4"> <Value>activ</Value> </Value> <Value name="5"> <Value>created</Value> </Value> </Data> <DataLocation>^on.personD</DataLocation> <DefaultData>personDefaultData</DefaultData> <IdLocation>^on.personD</IdLocation> <IndexLocation>^on.personI</IndexLocation> <StreamLocation>^on.personS</StreamLocation> <Type>%Storage.Persistent</Type> }

}

NB : you can omit the DEFAULTGLOBAL parameter if you're happy with the one automatically generated (which is the case, most of the time) ;  I use it in my example to emphasize the separation of globals in current and archive classes.

First approach

You could first think about creating 2 persistent classes and copy the same definition in both classes (except the Storage clause which will be recreated at the first compilation) :


Class data.current.person Extends %Persistent {}

Class data.archive.person Extends %Persistent {}

But this approach will rapidly reveal itself disastrous in terms of maintenance, for obvious reason of duplicating the code/definitions.

Second approach

Another idea could be to create a common class with ClassType = "" in order to not create the storage definition at its level. Then, you can make archive and current classes inherit from this common class and add %Persistent inheritance in both subclasses.


Class data.common.person Extends %Persistent [ ClassType = "" ] {}

Class data.current.person Extends (%Persistent,data.common.person) [ClassType = persistent ] {}

Class data.archive.person Extends (%Persistent,data.common.person) [ ClassType = persistent ] {}

This way, you solve the question and get separate globals for each persistent class with a unique class to store all the definition. 

But, this is not really convenient to manage 3 classes instead of 2 and to have to maintain the definition in the common class ; neither to be obliged to open the common class to get the detail of the definition of the current class.

Third approach

Well, a third solution could be to return to a very simple approach : just create an archive class which inherits from the current data class...


Class data.current.person Extends %Persistent {}

Class data.archive.person Extends data.current.person {}

In that case, current and historical data will share the same storage definition and, as a result, their data will be stored in the same globals, which is not really convenient to physically separate both kind of data and thus not answering to our need.

A simpler approach

You can obtain a similar structure in archive class which inherits from current, without inheriting from its storage, by simply adding %Persistent inheritance before the class you want to inherit from.

In this case, even if the super class already contains a storage class with DataLocation (because of its persistent type), at compile time, the inheritance precedence will force the generation of a new storage clause based on the subclass definition, with its own storage for DataLocation, IdLocation, IndexLocation, StreamLocation.


Class data.archive.person Extends (%Persistent, data.current.person)
{
Parameter DEFAULTGLOBAL = "^off.person";

Storage Default
{
%%CLASSNAMEnamedobactivcreated^off.personDpersonDefaultData^off.personD^off.personI^off.personS%Storage.Persistent
}
}

 

Other ideas ?

If you have other ways to address this question, please share your ideas and comments.

Comments

Eduard Lebedyuk · Feb 21, 2022

I'm using a modified second approach.

1. Create a base abstract class with all properties:

Class model.PersonBase Extends (%XML.Adaptor, %JSON.Adaptor) [ Abstract ]
{

Property Name;

}

2. Create persistent class:

Class model.Person Extends (%Persistent, model.PersonBase, Utils.Copyable
{
/// Indices, fkeys and relationships go here
}

Utils.Copyable allows easy copy/compare.

3. Create snapshot class:

Class model.PersonSnapshot Extends (%Persistent,  model.PersonBase, Utils.Copyable)
{

Index parentIndex On parent;

/// Creation timestamp (UTC)
Property createdOn As %TimeStamp(POPORDER = -1, XMLPROJECTION = "NONE") [ InitialExpression = {$ZDATETIME($ZTIMESTAMP, 3, 1, 3)}, Required ];

Property parent As model.Person(XMLPROJECTION = "NONE");

Method %OnNew(parentId As %Integer) As %Status [ Private, ServerOnly = 1 ]
{
    #dim sc As %Status = $$$OK
    
    set ..parent = ##class(model.Person).%OpenId(parentId,, .sc)
    if $$$ISERR(sc) quit sc

    quit ..copyFrom(..parent, ##class(model.PersonBase).%ClassName(1))
}

}

And done.

This way you can add a snapshot of object at any point of time and by calling compareTo calculate a diff between any two versions.

There's also a hash function you can use to calculate hash from some or all object properties to speed up equivalence checks.

0
Sylvain Guilbaud  Feb 22, 2022 to Eduard Lebedyuk

Thanks Eduard for sharing your code implementing a very powerful approach of data snapshot.
 

0
Robert Cemper · Feb 22, 2022

You have a lucky design without indices. That makes life with 2 distinct globals easy.
The archiving action maintaining the original ID could be achieved by a simple
MERGE from ^source to ^archive.  maintenance of indices could be slightly
effort requiring class methods.  
 

0
Sylvain Guilbaud  Feb 22, 2022 to Robert Cemper

Thanks Robert for your comment.
 

Merging globals is exactly what the toArchive method does here :


 


Class data.archive.person Extends (%Persistent, data.current.person)
{

Parameter DEFAULTGLOBAL = "^off.person";

/// Description ClassMethod archive(purgeArchive As %Integer = 0, purgeSource As %Integer = 0) As %Status { set sc = $$$OK , tableName = "" set (archived,archivedErrors, severity) = 0

set sourceClassName = $PIECE(##class(%Dictionary.ClassDefinition).%OpenId(..%ClassName(1)).Super,",",2)
set targetClassName = ..%ClassName(1)

set sourceClass = ##class(%Dictionary.ClassDefinition).%OpenId(sourceClassName) 
set targetClass = ##class(%Dictionary.ClassDefinition).%OpenId(targetClassName)

set sourceDataLocation = sourceClass.Storages.GetAt(1).DataLocation
set sourceIndexLocation = sourceClass.Storages.GetAt(1).IndexLocation
set sourceStreamLocation = sourceClass.Storages.GetAt(1).StreamLocation

set targetDataLocation = targetClass.Storages.GetAt(1).DataLocation
set targetIndexLocation = targetClass.Storages.GetAt(1).IndexLocation
set targetStreamLocation = targetClass.Storages.GetAt(1).StreamLocation

set tableName = $$$CLASSsqlschemaname($$$gWRK,sourceClassName) _"."_  $$$CLASSsqltablename($$$gWRK,sourceClassName)

if $ISOBJECT(sourceClass) 
 &amp; $ISOBJECT(targetClass)
 &amp; tableName '= "" {
    if $ISOBJECT(sourceClass.Storages.GetAt(1)) 
     &amp; $ISOBJECT(targetClass.Storages.GetAt(1))
     {
        set tStatement=##class(%SQL.Statement).%New(1) 
        kill sql
        set sql($i(sql)) = "SELECT" 
        set sql($i(sql)) = "id"  
        set sql($i(sql)) = "FROM"
        set sql($i(sql)) = tableName
        set sc = tStatement.%Prepare(.sql) 
        set result = tStatement.%Execute()

        kill:purgeArchive @targetDataLocation, @targetIndexLocation, @targetStreamLocation 

        while result.%Next() {
            set source = $CLASSMETHOD(sourceClassName,"%OpenId",result.%Get("id"))

            if $ISOBJECT(source) {
                set archive = $CLASSMETHOD(targetClassName,"%New")

                for i = 1:1:sourceClass.Properties.Count() {
                    set propertyName = sourceClass.Properties.GetAt(i).Name
                    set $PROPERTY(archive,propertyName) = $PROPERTY(source,propertyName)
                }

                set sc = archive.%Save()
                if sc {
                    set archived = archived + 1
                } else {
                    set archivedErrors = archivedErrors + 1
                }
            }
        }

        kill:purgeSource @sourceDataLocation, @sourceIndexLocation, @sourceStreamLocation

        set msg ="archive data from " _ sourceClassName _ " into "_ targetClassName _ " result:" _ archived _ " archived (errors:" _ archivedErrors _ ")"

   } else {
        set severity = 1
        set msg = "ERROR WHILE ARCHIVING " _ sourceClassName _ " in "_ targetClassName _ " : " _ " classes have not storage definition"
    }
} else {
    set severity = 1
    set msg = "ERROR WHILE ARCHIVING " _ sourceClassName _ " in "_ targetClassName _ " : " _ " classes not found in %Dictionary.ClassDefinition"
}
do ##class(%SYS.System).WriteToConsoleLog(msg,0,severity)
Return sc

}

ClassMethod toArchive(purgeArchive As %Integer = 0, purgeSource As %Integer = 0) As %Status { set sc=$$$OK

set sourceClassName = $PIECE(##class(%Dictionary.ClassDefinition).%OpenId(..%ClassName(1)).Super,",",2)
set targetClassName = ..%ClassName(1)
set sourceClass = ##class(%Dictionary.ClassDefinition).%OpenId(sourceClassName) 
set targetClass = ##class(%Dictionary.ClassDefinition).%OpenId(targetClassName)

if $ISOBJECT(sourceClass) 
 &amp; $ISOBJECT(targetClass) {
    if $ISOBJECT(sourceClass.Storages.GetAt(1)) 
     &amp; $ISOBJECT(targetClass.Storages.GetAt(1))
     {

        set sourceDataLocation = sourceClass.Storages.GetAt(1).DataLocation
        set sourceIndexLocation = sourceClass.Storages.GetAt(1).IndexLocation
        set sourceStreamLocation = sourceClass.Storages.GetAt(1).StreamLocation

        set targetDataLocation = targetClass.Storages.GetAt(1).DataLocation
        set targetIndexLocation = targetClass.Storages.GetAt(1).IndexLocation
        set targetStreamLocation = targetClass.Storages.GetAt(1).StreamLocation

        kill:purgeArchive @targetDataLocation, @targetIndexLocation, @targetStreamLocation 

        merge @targetDataLocation = @sourceDataLocation
        merge @targetIndexLocation = @sourceIndexLocation
        merge @targetStreamLocation = @sourceStreamLocation

        set ^mergeTrace($i(^mergeTrace)) = $lb($zdt($h,3),sourceDataLocation)

        kill:purgeSource @sourceDataLocation, @sourceIndexLocation, @sourceStreamLocation

        set severity = 0
        set msg = "ARCHIVING " _ sourceClassName _ " in "_ targetClassName _ " SUCCESSFULLY"
                

    } else {
        set severity = 1
        set msg = "ERROR WHILE ARCHIVING " _ sourceClassName _ " in "_ targetClassName _ " : " _ " classes have not storage definition"
    }
} else {
    set severity = 1
    set msg = "ERROR WHILE ARCHIVING " _ sourceClassName _ " in "_ targetClassName _ " : " _ " classes not found in %Dictionary.ClassDefinition"
}
do ##class(%SYS.System).WriteToConsoleLog(msg,0,severity)
return sc

}

Storage Default { %%CLASSNAMEnamedobactivcreated^off.personDpersonDefaultData^off.personD^off.personI^off.personS%Storage.Persistent }

}

0
Eduard Lebedyuk  Feb 23, 2022 to Robert Cemper

Okay, why would you want to index active versions and old versions together?

In my design I explicitly create indices for the active version only. Old versions are only indexed on the parent field.

0
Robert Cemper  Feb 23, 2022 to Eduard Lebedyuk

Why together ? ? ?  Who wants that ? ? ?   Where ? ? ?
A simple MERGE ^archive(....)=^source(....)  just doesn't maintain any index.
this was my message    

0
Eduard Lebedyuk  Feb 23, 2022 to Robert Cemper

You have a lucky design without indices. That makes life with 2 distinct globals easy.

I though you have a common index for current and archive versions. I guess I misunderstood your point.

A simple MERGE ^archive(....)=^source(....)  just doesn't maintain any index.

I think the biggest issue is that it can store only one (previous) version. Or if you merge:

merge ^archive(id, ts) = ^source(id)

you'll need a custom storage for a composite id key.

0
Robert Cemper  Feb 23, 2022 to Eduard Lebedyuk

YEAH! A history track makes it definitely more complex

0