How to efficiently store historical data, similar to current data, without mixing physical storage ?
Let's consider you would like to efficiently store your historical data in a similar structure than the one used for your current data, but without sharing the same physical storage (ie : not in the same global). What is the most efficient way to do it ?
Below a simple class of your current data :
Class data.current.person Extends (%Persistent, %Populate) {Parameter DEFAULTGLOBAL = "^on.person";
Property name As %String;
Property dob As %Date(FORMAT = 4);
Property activ As %Boolean [ InitialExpression = 1 ];
Property created As %TimeStamp [ InitialExpression = {$zdt($h,3)} ];
Storage Default { <Data name="personDefaultData"> <Value name="1"> <Value>%%CLASSNAME</Value> </Value> <Value name="2"> <Value>name</Value> </Value> <Value name="3"> <Value>dob</Value> </Value> <Value name="4"> <Value>activ</Value> </Value> <Value name="5"> <Value>created</Value> </Value> </Data> <DataLocation>^on.personD</DataLocation> <DefaultData>personDefaultData</DefaultData> <IdLocation>^on.personD</IdLocation> <IndexLocation>^on.personI</IndexLocation> <StreamLocation>^on.personS</StreamLocation> <Type>%Storage.Persistent</Type> }}
NB : you can omit the DEFAULTGLOBAL parameter if you're happy with the one automatically generated (which is the case, most of the time) ; I use it in my example to emphasize the separation of globals in current and archive classes.
First approach
You could first think about creating 2 persistent classes and copy the same definition in both classes (except the Storage clause which will be recreated at the first compilation) :
Class data.current.person Extends %Persistent {}Class data.archive.person Extends %Persistent {}
But this approach will rapidly reveal itself disastrous in terms of maintenance, for obvious reason of duplicating the code/definitions.
Second approach
Another idea could be to create a common class with ClassType = "" in order to not create the storage definition at its level. Then, you can make archive and current classes inherit from this common class and add %Persistent inheritance in both subclasses.
Class data.common.person Extends %Persistent [ ClassType = "" ] {}Class data.current.person Extends (%Persistent,data.common.person) [ClassType = persistent ] {}Class data.archive.person Extends (%Persistent,data.common.person) [ ClassType = persistent ] {}
This way, you solve the question and get separate globals for each persistent class with a unique class to store all the definition.
But, this is not really convenient to manage 3 classes instead of 2 and to have to maintain the definition in the common class ; neither to be obliged to open the common class to get the detail of the definition of the current class.
Third approach
Well, a third solution could be to return to a very simple approach : just create an archive class which inherits from the current data class...
Class data.current.person Extends %Persistent {}Class data.archive.person Extends data.current.person {}
In that case, current and historical data will share the same storage definition and, as a result, their data will be stored in the same globals, which is not really convenient to physically separate both kind of data and thus not answering to our need.
A simpler approach
You can obtain a similar structure in archive class which inherits from current, without inheriting from its storage, by simply adding %Persistent inheritance before the class you want to inherit from.
In this case, even if the super class already contains a storage class with DataLocation (because of its persistent type), at compile time, the inheritance precedence will force the generation of a new storage clause based on the subclass definition, with its own storage for DataLocation, IdLocation, IndexLocation, StreamLocation.
Class data.archive.person Extends (%Persistent, data.current.person) { Parameter DEFAULTGLOBAL = "^off.person";Storage Default {%%CLASSNAME name dob activ created ^off.personD personDefaultData ^off.personD ^off.personI ^off.personS %Storage.Persistent } }
Other ideas ?
If you have other ways to address this question, please share your ideas and comments.
Comments
I'm using a modified second approach.
1. Create a base abstract class with all properties:
Class model.PersonBase Extends (%XML.Adaptor, %JSON.Adaptor) [ Abstract ]
{
Property Name;
}2. Create persistent class:
Class model.Person Extends (%Persistent, model.PersonBase, Utils.Copyable
{
/// Indices, fkeys and relationships go here
}Utils.Copyable allows easy copy/compare.
3. Create snapshot class:
Class model.PersonSnapshot Extends (%Persistent, model.PersonBase, Utils.Copyable)
{
Index parentIndex On parent;
/// Creation timestamp (UTC)
Property createdOn As %TimeStamp(POPORDER = -1, XMLPROJECTION = "NONE") [ InitialExpression = {$ZDATETIME($ZTIMESTAMP, 3, 1, 3)}, Required ];
Property parent As model.Person(XMLPROJECTION = "NONE");
Method %OnNew(parentId As %Integer) As %Status [ Private, ServerOnly = 1 ]
{
#dim sc As %Status = $$$OK
set ..parent = ##class(model.Person).%OpenId(parentId,, .sc)
if $$$ISERR(sc) quit sc
quit ..copyFrom(..parent, ##class(model.PersonBase).%ClassName(1))
}
}And done.
This way you can add a snapshot of object at any point of time and by calling compareTo calculate a diff between any two versions.
There's also a hash function you can use to calculate hash from some or all object properties to speed up equivalence checks.
Thanks Eduard for sharing your code implementing a very powerful approach of data snapshot.
You have a lucky design without indices. That makes life with 2 distinct globals easy.
The archiving action maintaining the original ID could be achieved by a simple
MERGE from ^source to ^archive. maintenance of indices could be slightly
effort requiring class methods.
Thanks Robert for your comment.
Merging globals is exactly what the toArchive method does here :
Class data.archive.person Extends (%Persistent, data.current.person) {Parameter DEFAULTGLOBAL = "^off.person";
/// Description ClassMethod archive(purgeArchive As %Integer = 0, purgeSource As %Integer = 0) As %Status { set sc = $$$OK , tableName = "" set (archived,archivedErrors, severity) = 0
set sourceClassName = $PIECE(##class(%Dictionary.ClassDefinition).%OpenId(..%ClassName(1)).Super,",",2) set targetClassName = ..%ClassName(1) set sourceClass = ##class(%Dictionary.ClassDefinition).%OpenId(sourceClassName) set targetClass = ##class(%Dictionary.ClassDefinition).%OpenId(targetClassName) set sourceDataLocation = sourceClass.Storages.GetAt(1).DataLocation set sourceIndexLocation = sourceClass.Storages.GetAt(1).IndexLocation set sourceStreamLocation = sourceClass.Storages.GetAt(1).StreamLocation set targetDataLocation = targetClass.Storages.GetAt(1).DataLocation set targetIndexLocation = targetClass.Storages.GetAt(1).IndexLocation set targetStreamLocation = targetClass.Storages.GetAt(1).StreamLocation set tableName = $$$CLASSsqlschemaname($$$gWRK,sourceClassName) _"."_ $$$CLASSsqltablename($$$gWRK,sourceClassName) if $ISOBJECT(sourceClass) & $ISOBJECT(targetClass) & tableName '= "" { if $ISOBJECT(sourceClass.Storages.GetAt(1)) & $ISOBJECT(targetClass.Storages.GetAt(1)) { set tStatement=##class(%SQL.Statement).%New(1) kill sql set sql($i(sql)) = "SELECT" set sql($i(sql)) = "id" set sql($i(sql)) = "FROM" set sql($i(sql)) = tableName set sc = tStatement.%Prepare(.sql) set result = tStatement.%Execute() kill:purgeArchive @targetDataLocation, @targetIndexLocation, @targetStreamLocation while result.%Next() { set source = $CLASSMETHOD(sourceClassName,"%OpenId",result.%Get("id")) if $ISOBJECT(source) { set archive = $CLASSMETHOD(targetClassName,"%New") for i = 1:1:sourceClass.Properties.Count() { set propertyName = sourceClass.Properties.GetAt(i).Name set $PROPERTY(archive,propertyName) = $PROPERTY(source,propertyName) } set sc = archive.%Save() if sc { set archived = archived + 1 } else { set archivedErrors = archivedErrors + 1 } } } kill:purgeSource @sourceDataLocation, @sourceIndexLocation, @sourceStreamLocation set msg ="archive data from " _ sourceClassName _ " into "_ targetClassName _ " result:" _ archived _ " archived (errors:" _ archivedErrors _ ")" } else { set severity = 1 set msg = "ERROR WHILE ARCHIVING " _ sourceClassName _ " in "_ targetClassName _ " : " _ " classes have not storage definition" } } else { set severity = 1 set msg = "ERROR WHILE ARCHIVING " _ sourceClassName _ " in "_ targetClassName _ " : " _ " classes not found in %Dictionary.ClassDefinition" } do ##class(%SYS.System).WriteToConsoleLog(msg,0,severity) Return sc}
ClassMethod toArchive(purgeArchive As %Integer = 0, purgeSource As %Integer = 0) As %Status { set sc=$$$OK
set sourceClassName = $PIECE(##class(%Dictionary.ClassDefinition).%OpenId(..%ClassName(1)).Super,",",2) set targetClassName = ..%ClassName(1) set sourceClass = ##class(%Dictionary.ClassDefinition).%OpenId(sourceClassName) set targetClass = ##class(%Dictionary.ClassDefinition).%OpenId(targetClassName) if $ISOBJECT(sourceClass) & $ISOBJECT(targetClass) { if $ISOBJECT(sourceClass.Storages.GetAt(1)) & $ISOBJECT(targetClass.Storages.GetAt(1)) { set sourceDataLocation = sourceClass.Storages.GetAt(1).DataLocation set sourceIndexLocation = sourceClass.Storages.GetAt(1).IndexLocation set sourceStreamLocation = sourceClass.Storages.GetAt(1).StreamLocation set targetDataLocation = targetClass.Storages.GetAt(1).DataLocation set targetIndexLocation = targetClass.Storages.GetAt(1).IndexLocation set targetStreamLocation = targetClass.Storages.GetAt(1).StreamLocation kill:purgeArchive @targetDataLocation, @targetIndexLocation, @targetStreamLocation merge @targetDataLocation = @sourceDataLocation merge @targetIndexLocation = @sourceIndexLocation merge @targetStreamLocation = @sourceStreamLocation set ^mergeTrace($i(^mergeTrace)) = $lb($zdt($h,3),sourceDataLocation) kill:purgeSource @sourceDataLocation, @sourceIndexLocation, @sourceStreamLocation set severity = 0 set msg = "ARCHIVING " _ sourceClassName _ " in "_ targetClassName _ " SUCCESSFULLY" } else { set severity = 1 set msg = "ERROR WHILE ARCHIVING " _ sourceClassName _ " in "_ targetClassName _ " : " _ " classes have not storage definition" } } else { set severity = 1 set msg = "ERROR WHILE ARCHIVING " _ sourceClassName _ " in "_ targetClassName _ " : " _ " classes not found in %Dictionary.ClassDefinition" } do ##class(%SYS.System).WriteToConsoleLog(msg,0,severity) return sc}
Storage Default {%%CLASSNAME name dob activ created ^off.personD personDefaultData ^off.personD ^off.personI ^off.personS %Storage.Persistent }}
Okay, why would you want to index active versions and old versions together?
In my design I explicitly create indices for the active version only. Old versions are only indexed on the parent field.
Why together ? ? ? Who wants that ? ? ? Where ? ? ?
A simple MERGE ^archive(....)=^source(....) just doesn't maintain any index.
this was my message
You have a lucky design without indices. That makes life with 2 distinct globals easy.
I though you have a common index for current and archive versions. I guess I misunderstood your point.
A simple MERGE ^archive(....)=^source(....) just doesn't maintain any index.
I think the biggest issue is that it can store only one (previous) version. Or if you merge:
merge ^archive(id, ts) = ^source(id)you'll need a custom storage for a composite id key.
YEAH! A history track makes it definitely more complex