Benjamin De Boe · Apr 17, 2019

Still not sure I'm entirely on the same page wrt the goal, but $translate replaces individual characters with other individual characters (so 1 by 1). If you're looking for a proper escaping function in ObjectScript, take a look at $zconvert(), which supports HTML and XML escaping.
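
For instance, a quick sketch of the difference, using the standard HTML and XML output translation tables:

 // $translate swaps characters one-for-one
 write $translate("hello world", "lo", "01")      // prints "he001 w1r0d"

 // $zconvert escapes the whole string for a given output format
 write $zconvert("if a<b & b>c", "O", "HTML")     // prints "if a&lt;b &amp; b&gt;c"
 write $zconvert("if a<b & b>c", "O", "XML")      // prints "if a&lt;b &amp; b&gt;c"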

Benjamin De Boe · Apr 23, 2019

Depends a bit on what you want. If you want to use them in the WHERE clause, you can leave the "list of" structure as-is and use our FOR SOME %ELEMENT syntax. In your case, for retrieving them in the SELECT list, changing the projection or your data model is probably the pragmatic choice. Note that collections are a really powerful feature when working in the Object paradigm, but they are somewhat constrained by standard SQL operations when accessed through the relational paradigm.
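
As a rough sketch (the class, table, and property names here are made up), that predicate looks like this when queried from ObjectScript:

 // assumes a hypothetical Demo.Person class with: Property FavoriteColors As list Of %String
 set stmt = ##class(%SQL.Statement).%New()
 do stmt.%Prepare("SELECT Name FROM Demo.Person WHERE FOR SOME %ELEMENT(FavoriteColors) (%VALUE = 'red')")
 set rs = stmt.%Execute()
 while rs.%Next() { write rs.%Get("Name"), ! }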

Benjamin De Boe · May 2, 2019

Good question. Given our long history, we have a fair number of utility methods whose signatures grew organically and for which backwards-compatibility goals have prevented us from doing significant cleanup. We're now in the process of reviewing the SQL utility methods. For the ones that really need that many parameters, we're considering an options string (akin to compile flags), as used by some more modern infrastructure such as the work queue manager, and/or simply splitting them into separate, more specific methods rather than super-generic 10-argument ones. While we're at it, I'm eager to read suggestions or feedback on how others developing utility functions are dealing with this.

Benjamin De Boe · Aug 28, 2019

Thanks for sharing Dave, great article!

As this is an area we'd like to develop further on the product end, we're very eager to hear customers' experiences and feedback on this.

Benjamin De Boe · Sep 13, 2019

The Jupyter support is very exciting, adding a neat and highly appropriate mechanism for exposing IRIS-side concepts to a typical Python environment. This release introduces a first taste of such an interaction, but we're very interested in learning from your experiences and ideas on making it even more effective at adding process control to your Python work.

Benjamin De Boe · Oct 3, 2019

If you know which non-NoExtent (for lack of a YesExtent keyword :-) ) subclass of A you're targeting in B's subclass, you can override your property definition pointing to that subclass.
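
For example (a hedged sketch with made-up class names, where Demo.A is the referenced superclass and Demo.A1 the non-NoExtent subclass you're targeting):

Class Demo.B Extends %Persistent
{
Property Target As Demo.A;
}

Class Demo.B1 Extends Demo.B
{
/// override the inherited property to point at the concrete subclass of A
Property Target As Demo.A1;
}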

Benjamin De Boe · Oct 28, 2019

We're working on support for OVER syntax (as part of SQL window functions) for a future release of IRIS. Feel free to reach out to me directly with more details on the specific requirements you have to see if they correspond to the feature set currently in the pipeline.

Benjamin De Boe · Nov 29, 2019

Besides the elaborate earlier answers about the actual interfaces under (or at) the hood, maybe a simple question to ask is which version you're on. If I'm not mistaken, that SELECT option was only added some 5 years ago, and because the word "legacy" shows up on this page, I thought it was worth checking too ;-)

Benjamin De Boe · Nov 29, 2019

No, we indeed don't support that syntax feature. Like Evgeny said, this recursion is something we'd typically try to wrap inside ObjectScript methods, which you could then expose as a stored procedure.
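
As an illustration of that pattern (a deliberately trivial recursion; real logic would walk your actual hierarchy), a recursive ClassMethod marked [ SqlProc ] becomes callable from SQL:

Class Demo.Recursion Extends %RegisteredObject
{

/// trivial recursive method, exposed to SQL as a stored procedure / SQL function
ClassMethod Factorial(n As %Integer) As %Integer [ SqlProc ]
{
    quit:n<2 1
    quit n * ..Factorial(n - 1)
}

}

You'd then invoke it from SQL as SELECT Demo.Recursion_Factorial(5).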

Benjamin De Boe · Jan 2, 2020

Could you share the rest of the error message (hidden behind the alert) and more specifics on the ODBC connection type?

And is there anything special about these tables or do you get this for each and every table?

Benjamin De Boe · Jan 2, 2020

That's what POSIXTIME does for you under the hood, so no need to require all your queries to be aware of this frugal innovation ;-)
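
For reference, a %PosixTime value is stored as a compact encoded integer while still behaving like a timestamp; the conversion methods make that visible:

 set ts = "2019-05-01 13:45:00"
 set posix = ##class(%Library.PosixTime).TimeStampToLogical(ts)
 write posix, !                                                   // the compact integer that actually gets stored
 write ##class(%Library.PosixTime).LogicalToTimeStamp(posix), !   // back to the human-readable timestamp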

Benjamin De Boe · Mar 29, 2016

Hi Benjamin,

the default algorithm indeed won't return scores for every record, but only calculates them for records that contain at least a decent number of entities that are relevant in the source document. You can indeed simply approximate the other documents' score by taking 0.

For your specific use case, you may want to take a look at the text categorization infrastructure. I've posted a tutorial on the topic here.

regards,
benjamin

Benjamin De Boe · Mar 29, 2016

That functionality is not supported through the iKnow Architect. It is our intent to focus on the table and query data location options, as those are just a bit of COS development away from other sources of (meta)data.

Benjamin De Boe · Apr 22, 2016

Hi Orion,

I haven't heard about index data simply disappearing. There should be no reason for that other than calls to %PurgeIndices() or manually dropping the globals containing the data (including ^ISC.IF* ones containing entities and words shared across the namespace). Another potential issue may arise when importing (at a global level) only either the index or the shared data, resulting in them no longer being in sync. Any chance any of that could have happened?

For the missing class, that might be due to a class import as well, after which not all related classes were recompiled to reflect the changes.

Which version are you working on? Some of those recompiling issues may have been addressed in recent versions.

Perhaps the WRC is a better place to get the appropriate follow-up for specific issues like this one. 

regards,
benjamin

Benjamin De Boe · May 8, 2016

Hi Terri,

[a little late perhaps (and we spoke briefly in Phoenix as well), but for the sake of completeness, here's your response]

The iKnow Architect page only shows domains based on domain definitions, subclasses of %iKnow.DomainDefinition. The ones created "the old way" (programmatically) through the %iKnow.Domain class do not carry the declarative information about where to load data from, hence the majority of the Architect GUI wouldn't be available on those.

As to the metadata loading question: I'm not sure which part of the script you're referring to was taking care of that metadata loading, but maybe this comes back to your question in a separate thread.

regards,

benjamin

Benjamin De Boe · Nov 3, 2016

What is the attribute of your texts that you want to categorize? When building TC models on top of iKnow domains, as Chip indicated, the easiest way is to do this based on a metadata field, using the %LoadMetadataCategories method on your builder object. The more manual route is the %AddCategory method you're using, but it requires a filter specification as the second argument (you passed "") to identify which records correspond to the category you're adding. That's what makes it a training set: a set of texts for which the outcome (category) is known, so the TC infrastructure can try to find correlations to exploit.
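
Very roughly (the domain name, metadata field, and category value below are made up, and the exact builder setup depends on your domain), the two routes look like this:

 // metadata-driven: one category per distinct value of the "Outcome" metadata field
 set builder = ##class(%iKnow.Classification.IKnowBuilder).%New("MyDomain")
 do builder.%LoadMetadataCategories("Outcome")

 // manual: each category needs a filter spec identifying its training records
 set domainId = ##class(%iKnow.Domain).GetOrCreateId("MyDomain")
 set filter = ##class(%iKnow.Filters.SimpleMetadataFilter).%New(domainId, "Outcome", "=", "Positive")
 do builder.%AddCategory("Positive", filter.ToString())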

Separately, 11 records is a very small training set. I would not expect to find much statistically relevant information to exploit. Even when you're building a rule-based model manually, it'll be barely enough to validate your model. So probably it's worth trying to get hold of (much) more data to start with.

I've uploaded a tutorial we prepared for a Global Summit academy session on the subject in a separate thread (could not attach it to an existing one). That may give you a better idea of what the infrastructure can be used for and also takes you through the GUI, which may be more practical than the code-based approach you're referring to.

Benjamin De Boe · Dec 22, 2016

Hi Edward,

the thing that comes closest here would be the %iKnow.Queries.SourceAPI:GetSimilar() query, which, for a certain seed document, looks for the most similar ones in a domain, optionally constrained by a filter object. The results of that query include a figure like the one you're looking for, expressing how many entities were new in the seed document vs the corpus it's comparing against. Although that particular calculation isn't available as an atomic function, a simple way to get to what you want would be to use the %iKnow.Filters.SourceIdFilter and just compare against an individual document.
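
In code, that looks roughly like this (the domain name and source IDs below are placeholders):

 // most similar documents for the seed document with internal source ID 123
 set domainId = ##class(%iKnow.Domain).GetOrCreateId("MyDomain")
 do ##class(%iKnow.Queries.SourceAPI).GetSimilar(.result, domainId, 123)
 zwrite result    // result(n) = $lb(srcId, extId, ...) including the new-vs-shared entity figures

 // constrain the comparison to one individual document (internal source ID 456)
 set filter = ##class(%iKnow.Filters.SourceIdFilter).%New(domainId, 456)
 do ##class(%iKnow.Queries.SourceAPI).GetSimilar(.result, domainId, 123, 1, 10, filter)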

If you prefer to write more code :o), you can just look up the entities in the one document and compare them against those in the others through the %iKnow.Objects.EntityInSourceDetails SQL projection.

Regards,

benjamin

Benjamin De Boe · Jan 5, 2017

Hi Benjamin,

The (patented) magic of iKnow is the way it identifies concepts in sentences; that happens in a library shipped as a binary, which we refer to as the iKnow engine and which is used by both the iKnow APIs and iFind indices. Most of what happens with that engine's output is not nearly as much rocket science and, as Eduard indicated, its COS source code can usually be consulted for clues on how it works, if you're adventurous.

The two options of the GetSimilar() query both work by looking at the top concepts of the reference source and looking for other sources that have them as well, using frequency and dominance, respectively, to weight in-source relevance. So not much rocket science, and only support for full matches at this point.

This said, iKnow offers you the building blocks to build much more advanced things, quite possibly inspired by your academic research, leveraging the concept level that is unique to iKnow in identifying what a text is really about. For example, you can build vectors containing entity frequency or dominance and look for cosine similarity in this vector space, or you can leverage topic modelling. Many of these approaches will require quite a bit of computation, though, and actual result quality may depend a bit on the nature of the texts you're dealing with, which is why we chose to stick to very simple things in the kit for now.
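
As a minimal sketch of that first idea, with each document represented as a local array of entity frequencies (array(entity) = frequency), cosine similarity boils down to:

Class Demo.TextSimilarity Extends %RegisteredObject
{

/// cosine similarity between two sparse entity-frequency vectors a() and b()
ClassMethod Cosine(ByRef a, ByRef b) As %Double
{
    set (dot,na,nb)=0, entity=""
    for {
        set entity = $order(a(entity), 1, freq)  quit:entity=""
        set na = na + (freq * freq)
        set:$data(b(entity)) dot = dot + (freq * b(entity))
    }
    set entity = ""
    for {
        set entity = $order(b(entity), 1, freq)  quit:entity=""
        set nb = nb + (freq * freq)
    }
    quit:(na=0)||(nb=0) 0
    quit dot / ($zsqr(na) * $zsqr(nb))
}

}

Filling those arrays per document (for example from the entity-level query APIs) and comparing documents pairwise is where the compute cost mentioned above comes in.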

However, you can find two (slightly) more advanced options in demos we have published online:

  • The iKnow Investigator demo is part of your SAMPLES namespace and offers, in addition to the SourceAPI:GetSimilar() option, an implementation that builds a dictionary from your reference source, matches it against your target sources, and looks for the ones with the highest aggregated match score, which accounts for partial matches as well.
  • In the iFind Search Portal demo, the bottom of the record viewer popup displays a list of similar records as well. This one is populated based on a bit of iFind juggling implemented in the Demo.SearchPortal.Utils class, leveraging full entity matches only, but easy to extend to weigh in words as well.

In both cases, there's a myriad of options to refine these algorithms, but all at a certain compute cost, given the high dimensionality introduced by the iKnow entity (and actually even word) level. If you have further ideas or, better yet, sample code to achieve better similar document lists, we'd be thrilled to read about it here on the community ;o)

Thanks,
benjamin

Benjamin De Boe · Jun 26, 2017

Hi Eduard,

you can define iFind indices for calculated fields, so if you point your field calculation to a function that strips out the HTML, you should be fine. The HTML converter in iKnow was built for a slightly different purpose, but can be used here:

Property HtmlText As %String(MAXLEN="");

Property PlainText As %String(MAXLEN="") [ Calculated, ReadOnly, SqlComputed, SqlComputeCode = { set {PlainText} = ##class(%iKnow.Source.Converter.Html).StripTags({HtmlText}) } ];
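
And then an iFind index on the computed property (shown here with the Basic index type; pick whichever type matches your functional needs):

Index PlainTextIdx On (PlainText) As %iFind.Index.Basic;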

Regards,
benjamin

Benjamin De Boe · Jul 3, 2017

Thanks John,

indeed, you'd need a proper license in order to work with iKnow. If the method referred to above returns 0, please contact your sales representative to request a temporary trial license and appropriate assistance for implementing your use case.

Also, iKnow doesn't come as a separate namespace. You can create (regular) namespaces as you prefer and use them to store iKnow domain data. You may need to enable your web application for iKnow, which, like DeepSee, is disabled by default for security reasons. See this paragraph here for more details.

Benjamin De Boe · Sep 19, 2017

Hi Steve,

hadn't seen this question until just now, but I have to admit we're a bit storage-hungry with iKnow. If you generate the default full set of indices, for a moderately-sized domain you'll need up to 25x the original dataset size measured as raw text to fit everything. This can drop to half that size (12x) if you forsake all non-essential indices, but that will prevent a number of queries from running smoothly or, in some cases, disable them completely.

For iFind, the numbers are dependent on the type of index. Count on factors 2x, 7x and 15x for Basic, Semantic and Analytic indices, respectively. Of course there's a difference in functionality between all these options and it's best to start from a set of functional requirements and then look at which particular approach covers those.

These numbers are somewhat conservative maximums and, as Eduard already suggested, you may see different (lower) numbers depending on the nature of your data. A more detailed sizing guide is available on request.

Thanks,
benjamin

Benjamin De Boe · Mar 27, 2018

Horita-san,

The SetParameter() method requires that your domain have an ID assigned, which it gets automatically as soon as you call %Save() for the first time. Note that it should be returning an error in the sequence of commands you pasted, but that went unnoticed because the do syntax discards the returned %Status.
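
In other words, roughly (the parameter name and value below are just placeholders), checking the returned statuses:

 set domain = ##class(%iKnow.Domain).%New("MyDomain")
 set sc = domain.%Save()      // this assigns the domain ID
 if $system.Status.IsError(sc) { do $system.Status.DisplayError(sc)  quit }
 set sc = domain.SetParameter("DefaultConfig", "MyConfig")
 if $system.Status.IsError(sc) { do $system.Status.DisplayError(sc) }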

In general, it should be more convenient to work with Domain Definitions rather than the %iKnow.Domain API directly.

Thanks,
benjamin

Benjamin De Boe · Jun 18, 2018

Hi Eduard,

for this sort of querying (and many other uses outside straight API calls), you can use the SQL projections generated for your domain, as documented in %iKnow.Tables.Utils. That'll generate a column for your Views metadata field on the table containing Source information, which you can then join to the Part (entity occurrence) table to filter the ones containing the requested entity.

Hope this helps,
benjamin

Benjamin De Boe · Jun 14, 2018

Hi Eduard,

looking at your code, there seem to be a few small things that may each contribute to not seeing the results you were expecting:

  1. the MaintenanceAPI:GetBlackListElements() call returns its results as result(n) = $lb(id, string), with n just an incrementing integer representing the row number. At the other end, the ContainsEntityFilter expects array(string) or a $listbuild(string1, string2, ...). So your filter might be selecting sources containing the strings "1", "2", etc. (see the sketch after this list for converting the one format into the other)
  2. SourceAPI:GetByDomain() returns result(n) = $lb(sourceID, externalID). That source ID is an internally generated integer ID that has no links to your source table Text.Data. The external ID is typically composed of what you selected as group field and identifier field when loading from a SQL table. So depending on how you set up your domain, that may indeed be the ID field of your Text.Data table. It looks like you have the "simple external IDs" feature switched on, which is why your external IDs only consist of the identifier field, making things indeed easier (but usually only useful/safe when loading from a single table!). Note that this is slightly different for DeepSee-managed domains, where the source ID equals the external ID and corresponds to DeepSee's fact ID, but ignore this confusing comment when not using DeepSee
  3. Finally, and likely irrelevant, you're passing in $$$YES when initializing filterNot. I'm not sure where you're loading that macro from, but that should be a %Boolean with a value of 1 to work as expected, where a string value would translate to a %Boolean with value 0.
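
To illustrate point 1, a rough conversion from the blacklist query output to something the filter accepts (domainId and blackListId are placeholders):

 do ##class(%iKnow.Queries.MaintenanceAPI).GetBlackListElements(.elements, domainId, blackListId)
 set list = "", i = ""
 for {
     set i = $order(elements(i), 1, data)  quit:i=""
     set list = list _ $lb($listget(data, 2))    // keep only the string, dropping the element ID
 }
 set filter = ##class(%iKnow.Filters.ContainsEntityFilter).%New(domainId, list)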

Hope this helps,

benjamin

Benjamin De Boe · Aug 8, 2018

Hi Robert,

in 2018.2, we're introducing a feature called "coordinated backup", which basically allows adding a checkpoint to the journal files of all participating instances so you can roll them back to a synchronized state. We were just working on the docs for that feature the other week, and they run to four pages if you want the comprehensive answer to your question, so this is just a simplified version :-)

Please note that we currently do not support cross-shard transactions on sharded tables. It's not a common requirement for the types of use cases our sharding implementation was designed for (typically more analytical queries), but we're happy to discuss specific scenarios in the context of a POC to see what guarantees can be provided through appropriate application & schema design.

thanks,

benjamin