Minify XML in IRIS
In a project I'm working on, we need to store some arbitrary XML in the database. This XML has no corresponding class in IRIS; we just need to store it as a string (it's relatively small and fits in a string).
Since there are MANY (millions!) of records in the database, I decided to reduce the size as much as possible without compressing. Some of the XML to be stored is indented and some is not; it varies.
To reduce the size I decided to minify the XML, but how do I minify an XML document in IRIS?
I searched through all the classes and utilities and could not find a ready-made method, so I had to implement it myself. It turned out to be fairly simple in IRIS using the %XML.TextReader class, frankly simpler than I expected.
Since this can be useful in some other context, I decided to share this little utility with the Developer Community.
I've tested it with some fairly complex XML documents and it works fine. Here is the code.
/// Minify an XML document passed in the XmlIn Stream, the minified XML is returned in XmlOut Stream
/// If XmlOut Stream is passed, then the minified XML is stored in the passed Stream, otherwise a %Stream.TmpCharacter is returned in XmlOut.
/// Collapse = 1 (default), empty elements are collapsed, e.g. <tag></tag> is returned as <tag/>
/// ExcludeComments = 1 (default), comments are not returned in the minified XML
ClassMethod MinifyXML(XmlIn As %Stream, ByRef XmlOut As %Stream = "", Collapse As %Boolean = 1, ExcludeComments As %Boolean = 1) As %Status
{
    #Include %occSAX
    Set sc=$$$OK
    Try {
        Set Mask=$$$SAXSTARTELEMENT+$$$SAXENDELEMENT+$$$SAXCHARACTERS+$$$SAXCOMMENT
        Set sc=##class(%XML.TextReader).ParseStream(XmlIn,.reader,,$$$SAXNOVALIDATION,Mask)
        #dim reader as %XML.TextReader
        If $$$ISERR(sc) Quit
        If '$IsObject(XmlOut) {
            Set XmlOut=##class(%Stream.TmpCharacter).%New()
        }
        While reader.Read() {
            Set type=reader.NodeType
            If ((type="error")||(type="fatalerror")) {
                Set sc=$$$ERROR($$$GeneralError,"Error loading XML "_type_"-"_reader.Value)
                Quit
            }
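            If type="element" {
                Do XmlOut.Write("<"_reader.Name)
                ; read IsEmptyElement before the reader is moved onto the attributes
                Set IsEmpty=reader.IsEmptyElement
                ; add attributes (for empty elements too, so they are not lost when collapsing)
                For k=1:1:reader.AttributeCount {
                    Do reader.MoveToAttributeIndex(k)
                    ; escape &, <, > and quotes so the attribute value stays well-formed
                    Do XmlOut.Write(" "_reader.Name_"="""_$ZConvert(reader.Value,"O","XML")_"""")
                }
                If Collapse && IsEmpty {
                    ; collapse empty element
                    Do XmlOut.Write("/>")
                    Set ElementEnded=1
                } Else {
                    Do XmlOut.Write(">")
                }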
            } ElseIf type="chars" {
                Set val=reader.Value
                ; text containing markup characters is wrapped in CDATA; an embedded "]]>" is split across two CDATA sections
                Do XmlOut.Write($select((val["<")||(val[">")||(val["&"):"<![CDATA["_$replace(val,"]]>","]]]]><![CDATA[>")_"]]>",1:val))
            } ElseIf type="endelement" {
                If $g(ElementEnded) {
                    ; ended by collapsing
                    Set ElementEnded=0
                } Else {
                    Do XmlOut.Write("</"_reader.Name_">")
                }
            } ElseIf 'ExcludeComments && (type="comment") {
                Do XmlOut.Write("<!--"_reader.Value_"-->")
            }
        }
    } Catch CatchError {
        #dim CatchError as %Exception.SystemException
        Set sc=CatchError.AsStatus()
    }
    Quit sc
}
P.S.: Does anyone know of another, simpler way to minify XML in IRIS?
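For reference, a call to the method could look like the sketch below. The class name Demo.XMLUtil is hypothetical (use whatever class you paste MinifyXML into), and the sample document is made up for illustration.

```objectscript
    ; Hypothetical class name: assumes MinifyXML was added to a class called Demo.XMLUtil
    Set in=##class(%Stream.TmpCharacter).%New()
    ; build a small indented document to minify
    Do in.WriteLine("<root>")
    Do in.WriteLine("  <item id=""1"">text</item>")
    Do in.WriteLine("  <empty></empty>")
    Do in.WriteLine("</root>")
    Do in.Rewind()
    ; XmlOut is not passed as an object, so the method creates a %Stream.TmpCharacter
    Set sc=##class(Demo.XMLUtil).MinifyXML(in,.out)
    If $System.Status.IsError(sc) {
        Do $System.Status.DisplayError(sc)
    } Else {
        ; rewind before reading back what the method wrote
        Do out.Rewind()
        Write out.Read(),!
    }
```

To keep comments or to leave empty elements uncollapsed, pass 0 for the ExcludeComments or Collapse arguments respectively.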
Comments
Would canonicalization work for you?
Also consider storing your XML documents as gzipped streams (%Stream.GblChrCompress) or compressed strings ($system.Util.Compress). I think that would be a more effective way to save storage space.
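A minimal sketch of the compressed-string idea, assuming $System.Util.Compress and $System.Util.Decompress are available in your IRIS version and accept "zlib" as the algorithm name; the sample XML is made up for illustration.

```objectscript
    ; Sketch only: availability of Compress/Decompress and the "zlib" type are assumptions about your IRIS version
    Set xml="<root><item id=""1"">text</item><empty/></root>"
    ; encode to UTF-8 first so the compressor receives 8-bit data
    Set packed=$System.Util.Compress($ZConvert(xml,"O","UTF8"),"zlib")
    Write "original: ",$Length(xml),", compressed: ",$Length(packed),!
    ; reverse the two steps to get the original text back
    Set unpacked=$ZConvert($System.Util.Decompress(packed),"I","UTF8")
    Write "round trip ok: ",(unpacked=xml),!
```

On very small documents the compressed form may not be shorter than the original, so it is worth measuring on real data before choosing this route.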
Have you tried? Can you share your code?
You edited your post after my answer 😊
Am I missing something, or does canonicalization not minify the XML?
For other reasons (how data is consumed) we cannot compress it and the target property is a %String.
Maybe creating a compressed string datatype can be another option in other situations but in this case the target property/class is part of HealthShare (a Registry Slot).