Benjamin De Boe · Jan 18, 2016

Hi Jack,

thanks for sharing your question. iFind actually only uses the iKnow engine, the internal piece of machinery that analyzes natural language text to identify semantic entities and their context. It does not use the iKnow domain infrastructure, which most of the documentation is focused on, but files the output of the iKnow engine into index structures that can be queried using the %FIND syntax or through some of the additional projections in search scenarios.

In order to create an iFind index on your table, you simply add it to the class definition (more info here) and then call the regular %BuildIndices() method (if there was data in it already). In a sense, iFind is a more lightweight, search-oriented SQL index type, whereas the iKnow domain infrastructure offers a broader environment for exploring entities and their context.

FYI, I've posted an example search portal built on top of iFind here.

Benjamin De Boe · Jan 18, 2016

Hi Jack,

you can enable stemming by setting the INDEXOPTION index parameter to 1 (or by leveraging the more flexible TRANSFORMATIONSPEC index parameter if you are on 2016.1).

Class ThePackage.MyClass Extends %Persistent
{
	Property MyStringProperty As %String;
	
	Index MyBasicIndex On (MyStringProperty) As %iFind.Index.Basic(INDEXOPTION=1);
}

The class reference for %iFind.Index.Basic also explains how you can toggle between stemmed and normal search by using the search mode argument:

SELECT * FROM ThePackage.MyClass WHERE %ID %FIND search_index(MyBasicIndex, 'interesting')

for normal search, versus search option 1 for stemmed search:

SELECT * FROM ThePackage.MyClass WHERE %ID %FIND search_index(MyBasicIndex, 'interesting', 1)
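To make the difference between normal and stemmed search concrete, here is a toy sketch in Python. The crude suffix-stripping `toy_stem()` function is a hypothetical stand-in and is NOT the stemmer iFind uses; it only shows why a stemmed search for 'interesting' can also match 'interested' and 'interests', while a normal search matches the literal word only.

```python
# Toy illustration only: a crude suffix-stripping "stemmer",
# NOT the stemming implementation iFind actually uses.
SUFFIXES = ("ing", "ed", "es", "s")


def toy_stem(word: str) -> str:
    w = word.lower()
    for suf in SUFFIXES:
        # Only strip when a reasonably long stem remains.
        if w.endswith(suf) and len(w) - len(suf) >= 3:
            return w[: -len(suf)]
    return w


def search(docs: dict, term: str, stemmed: bool = False) -> list:
    """Return IDs of docs containing the term (whitespace tokens only)."""
    norm = toy_stem if stemmed else str.lower
    target = norm(term)
    return [doc_id for doc_id, text in docs.items()
            if any(norm(tok) == target for tok in text.split())]


docs = {1: "An interesting topic", 2: "She was interested", 3: "Other interests"}
print(search(docs, "interesting"))                # normal search
print(search(docs, "interesting", stemmed=True))  # stemmed search
```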


We do not discard stop words in iFind, in order to ensure you can query for any literal word sequence afterwards. If you start looking at the projections for entities (cf %iFind.Index.Analytic class ref), you'll see how iKnow offers you a more insightful view of what a sentence is about through the "entity" level, where classic search tech may only offer you the words of a sentence minus the stop words.
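The following Python sketch illustrates why keeping stop words matters for literal word-sequence queries. It builds a simple positional index (a generic textbook structure, not iFind's actual storage) and shows that a phrase made up of stop words can only be found if those words were indexed. The stop-word list is an assumption chosen for the example.

```python
from collections import defaultdict


def build_positional_index(docs, stop_words=frozenset()):
    """word -> doc_id -> list of word positions. Stop words are skipped
    if given, but positions still reflect the original text."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, word in enumerate(text.lower().split()):
            if word not in stop_words:
                index[word][doc_id].append(pos)
    return index


def phrase_search(index, phrase):
    """Return sorted IDs of docs containing the exact word sequence."""
    words = phrase.lower().split()
    if not words or any(w not in index for w in words):
        return []
    candidates = set(index[words[0]])
    for w in words[1:]:
        candidates &= set(index[w])
    hits = []
    for doc_id in sorted(candidates):
        for start in index[words[0]][doc_id]:
            if all(start + i in index[w][doc_id] for i, w in enumerate(words)):
                hits.append(doc_id)
                break
    return hits


docs = {1: "To be or not to be", 2: "Not to be confused"}
full = build_positional_index(docs)
stopped = build_positional_index(docs, stop_words={"to", "be", "or", "not"})
print(phrase_search(full, "to be or not to be"))     # found
print(phrase_search(stopped, "to be or not to be"))  # lost
```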

regards,

benjamin

Benjamin De Boe · Jan 19, 2016

Hi Jack,

what exactly do you mean by searching using natural language? If you're referring to combined search strings like "snow AND (ski OR ice-skat*)", this is possible today. Or do you mean asking an actual, literal question? Can you give an example?
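To illustrate what such a combined search string means, here is a small Python sketch that evaluates boolean queries with AND, OR, NOT, parentheses and trailing-wildcard terms against a single text. It is a toy evaluator written for this example, not iFind's query parser; the operator precedence (NOT over AND over OR) and the tokenization are assumptions.

```python
import re
from fnmatch import fnmatchcase

TOKEN = re.compile(r"\(|\)|[^\s()]+")


def matches(query: str, text: str) -> bool:
    """Toy evaluator: does `text` satisfy the boolean `query`?"""
    words = set(re.findall(r"[\w-]+", text.lower()))
    tokens = TOKEN.findall(query)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of query")
        tok = tokens[pos]
        pos += 1
        return tok

    def parse_or():
        result = parse_and()
        while peek() == "OR":
            take()
            rhs = parse_and()  # always parse, so the position advances
            result = result or rhs
        return result

    def parse_and():
        result = parse_not()
        while peek() == "AND":
            take()
            rhs = parse_not()
            result = result and rhs
        return result

    def parse_not():
        if peek() == "NOT":
            take()
            return not parse_not()
        return parse_atom()

    def parse_atom():
        tok = take()
        if tok == "(":
            result = parse_or()
            if take() != ")":
                raise ValueError("expected ')'")
            return result
        # A term: '*' acts as a wildcard, matched against each word.
        return any(fnmatchcase(w, tok.lower()) for w in words)

    result = parse_or()
    if pos != len(tokens):
        raise ValueError("unexpected trailing tokens")
    return result


print(matches("snow AND (ski OR ice-skat*)", "Snow, skis and ice-skating fun"))
```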

thanks,

benjamin

Benjamin De Boe · Jan 28, 2016

Hi Ben,

thanks for your reply. That's what I tested first, but it didn't seem to work, maybe because it somehow still needs the CSP file to be in the install/CSP/xyz/ folder, whereas it is still only in install/CSP/abc/. I also tried adding a web app /csp/xyz/test/ that referred to the abc folder and xyz namespace, but that was probably too optimistic (or messy).

Benjamin De Boe · Jan 28, 2016

Interesting, thanks for trying this out.

Maybe I was asking too much when I tested with a subdirectory of the root web application, in order to still see the other CSP pages from my abc namespace. I'd also still need to find a convenient way to map JavaScript files in the same way. But at least we're halfway there :o)

Benjamin De Boe · Feb 1, 2016

Hi Jack,

this is not an out-of-the-box feature of the iKnow technology. iKnow's semantic analysis is targeted at identifying the semantic entities of a sentence, but not at interpreting them, which is typically an application-specific activity. However, we do have some building blocks that will help you create such applications, combining the iKnow analysis of a sentence with domain knowledge you already have. If you look at the indexing results for such a sentence, you'll see that the entities iKnow identifies usually already present a good structure for your sentence, and human questions are often not that complicated.

However, if the database you'll be querying is just uninterpreted free text as well, you'll need much more magic. If you're querying a well-known data structure, it's much more feasible. I once wrote a crude text-to-MDX query tool that translated natural language questions into MDX by matching the concepts in the question to the labels on the dimensions and measures of a DeepSee cube definition. In that case, iKnow played its part by decomposing the question into concepts and relationships, which custom code then easily "interpreted" as cube elements and MDX constructs.
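As a rough idea of what that matching step can look like, here is a minimal Python sketch. It is not the tool described above: plain word matching stands in for the concepts iKnow would extract, and the cube name, measure and dimension labels and their MDX member expressions are all hypothetical.

```python
import re

# Hypothetical cube metadata: label -> MDX expression.
MEASURES = {"count": "[Measures].[%COUNT]", "revenue": "[Measures].[Revenue]"}
DIMENSIONS = {"country": "[Location].[H1].[Country]", "year": "[Time].[H1].[Year]"}


def question_to_mdx(question: str, cube: str = "SALES") -> str:
    """Map words of a question onto cube labels and build an MDX query.
    Simple word matching stands in for iKnow's concept extraction."""
    words = re.findall(r"[a-z%]+", question.lower())
    # First matching measure, defaulting to a plain record count.
    measure = next((MEASURES[w] for w in words if w in MEASURES), MEASURES["count"])
    dims = [DIMENSIONS[w] for w in words if w in DIMENSIONS]
    query = f"SELECT {measure} ON 0"
    if dims:
        query += f", {dims[0]}.Members ON 1"
    return query + f" FROM [{cube}]"


print(question_to_mdx("What is the revenue per country?"))
```

A real implementation would need to handle plurals, synonyms and filters, which is exactly where the entity and relationship structure from iKnow helps.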

So, in short, iKnow will help you in the semantic analysis of natural language text, but depending on the complexity of the domain, more dedicated (and expensive) tools are usually needed for the subsequent interpretation and inference of results.

benjamin

Benjamin De Boe · Feb 26, 2016

Hi Benjamin,

in order to enable a web application to use iKnow, you need to check the "iKnow" box in the SMP's Web Application management page (System Administration > Security > Applications > Web Applications). This was mentioned in the release notes of the first version introducing the stricter security policies (or at least the routine behind the checkbox was), but it isn't mentioned prominently enough in the iKnow guide. We'll look into that.

This is actually only related to the web interfaces, so Atelier is not involved here. To create iKnow domain definitions through the management portal, look for the "iKnow Architect" in the SMP menu for iKnow.

regards,

benjamin

Benjamin De Boe · Mar 2, 2016

Hi Benjamin,

if you're familiar with Caché ObjectScript, that's the easiest way to work with iKnow. For example, the script below will add two short "sources" (documents) to your domain and then query the top concepts:

set domainID = 1, domainName = $system.iKnow.GetDomainName(domainID)

write $system.iKnow.IndexString(domainName, "123", "This is a first piece of text to be added to your iKnow domain!")

write $system.iKnow.IndexString(domainName, "234", "This is the second piece of text to be added to your iKnow domain! And guess what, it's an even more inspirational one!")

write ##class(%iKnow.Queries.EntityAPI).GetTop(.result, domainID)

zwrite result

For a good start with iKnow, take a look at this iKnow video and the next ones in the iKnow playlist.

If you prefer to work with SQL and have loaded your domain through the iKnow Architect in the management portal, you can invoke those same query APIs through either of the following calls (for domain ID = 1):

CALL %iKnow_Queries.EntityQAPI_GetTop(1)

SELECT * FROM %iKnow_Queries.EntityQAPI_GetTop(1)

Benjamin De Boe · Mar 3, 2016

Hi Benjamin (sounds like a conversation amongst just Benjamins now!),

The knowledge portal demonstration interface you find in the %iKnow.UI package (which gets a significant visual overhaul in 2016.3) is written using InterSystems' Zen technology, a web development framework that helps you combine client-side JavaScript and server-side Caché ObjectScript to build web applications. If you're good with PHP and/or JavaScript, there's no strict need to dig into Zen to build an iKnow-powered application. You can either use ODBC to connect to Caché and use SQL as in the above examples to query an iKnow domain, or you can build a simple REST service on top of iKnow (in Caché ObjectScript) and query that from your PHP/JavaScript code. We'll be releasing an out-of-the-box REST interface with 2016.3, but it's no rocket science to build one that fits your needs on earlier versions. If you already have an ISC sales engineering contact (none of them called Benjamin, unfortunately ;o) ), we can work together to get you up and running.

FYI, this GitHub repo contains a simple iKnow demo application written with AngularJS and a REST interface. It's technically speaking a CSP page (yet another ISC web technology at a lower level than Zen), but could have been a straight HTML page.

Regards,

Benjamin

Benjamin De Boe · Mar 7, 2016

Hi Benjamin,

If you just want a SQL prompt, you can open one from the COS prompt by calling "do $system.SQL.Shell()", or use the SQL page in the System Management Portal, which you can find under the "System Explorer" menu.

The SQL lister and loader functionality is meant to populate your domain (rather than query it), but should no longer be invoked directly. Managing a domain can be taken care of much more easily through domain definitions, which can be configured through the iKnow Architect as of 2016.1. But I see you're already using one, otherwise you wouldn't have seen that error message (which, BTW, informs you that this domain definition is configured not to allow any build/config operation other than through the domain definition itself, which is the default setting for domain definitions).

If you want to achieve the same result through COS:

set domainID = 1  ; assuming your domain has ID 1

write ##class(%iKnow.Queries.EntityAPI).GetTop(.result, domainID)

zwrite result

regards,

benjamin

Benjamin De Boe · Mar 14, 2016

Hi Evgeny, Jack,

Ranking is new in 2016.1, and will indeed allow you to retrieve a score expressing how well a record matches a search string. A packagename.tablename_indexnameRank function gets automatically generated when you compile your class with an iFind index, and can be invoked as follows:

SELECT %ID, 
Title,
FullText,
SomePackage.TheTable_MyIndexRank(%ID, 'cocktail* OR (hammock AND NOT bees)')
FROM SomePackage.TheTable
WHERE %ID %FIND search_index(MyIndex, 'cocktail* OR (hammock AND NOT bees)')
ORDER BY 4 DESC

There are no public demo servers exposing this functionality at this time.

regards,
benjamin

Benjamin De Boe · Mar 14, 2016

Hi Jack,

there's no need to normalize your search strings, as that is taken care of automatically as part of executing your search, where appropriate.

When you use DELETE FROM in SQL, or ##class(Your.Table).%DeleteExtent() in COS, the associated iFind indices' data will be erased as well. To drop just the indices' data, use ##class(Your.Table).%PurgeIndices() (cf class ref for refinements). Note that, unless you are using index-local storage (new in 2016.1), the words and entities tables will not be wiped, as they are shared between all iFind indices in your namespace (which saves some space and indexing effort).

iFind can calculate a score representing how well a record satisfies a search string, largely based on TF-IDF (although it'll leverage the more refined dominance scores for entities when it can). This is also new in 2016.1. See https://community.intersystems.com/code/ifind-search-portal for an example.
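For readers unfamiliar with TF-IDF, here is a minimal Python sketch of the textbook formulation (term frequency normalized by document length, times the log of total documents over documents containing the term). It is only meant to show the idea behind such a score; iFind's actual ranking, including its use of entity dominance, is not reproduced here.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list:
    return re.findall(r"\w+", text.lower())


def tfidf_rank(docs: dict, query_terms: list) -> list:
    """Rank docs by summed TF-IDF of the query terms (textbook variant)."""
    tokenized = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}
    n_docs = len(docs)
    scores = {}
    for doc_id, counts in tokenized.items():
        score = 0.0
        for term in query_terms:
            df = sum(1 for c in tokenized.values() if term in c)
            if df == 0 or term not in counts:
                continue
            tf = counts[term] / sum(counts.values())  # normalized term frequency
            idf = math.log(n_docs / df)                # rarer terms weigh more
            score += tf * idf
        scores[doc_id] = score
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


docs = {1: "cocktail cocktail bar", 2: "hammock in the garden", 3: "cocktail party"}
print(tfidf_rank(docs, ["cocktail"]))
```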

regards,
benjamin

Benjamin De Boe · Mar 18, 2016

Hi Julie,

For XEP, the XEP guide in the product documentation is probably the best starting point. For iKnow, you can take a look at this video playlist introducing the technology. 

As you may know, InterSystems is also developing a new platform specifically aimed at big data use cases. Part of this new platform will be support for the UIMA standard, as a broader framework for dealing with unstructured data than iKnow's natural language processing alone, allowing you to combine it with third-party or custom utilities. Please send me an email if you'd like to discuss your big data project in more detail.

thanks,
benjamin

Benjamin De Boe · Aug 10, 2016

It's indeed tempting to just stuff interfaces like this in the kit, but it goes a bit beyond the objectives of pure system management interfaces that we'd typically pack with Caché. Also, in the specific case of this Dictionary Builder demo, it uses the programmatic APIs to create dictionaries (requiring allowCustomUpdates=true) and does not update the domain definition itself. We're actually working on making that a smoother process, so when that gets to a point where it can support the interactions implemented in this GUI (and when AngularJS becomes part of our kit), we can reconsider it.

Benjamin De Boe · Aug 10, 2016

This is due to a logging issue that has been fixed in 2016.2 and should also be included in a future maintenance release of 2016.1.

Benjamin De Boe · Nov 3, 2016

A bit of a long read, but a very nice illustration of how iKnow's bottom-up approach allows you to work with the full concepts as coined by the author, rather than a top-down approach relying on predefined lists of terms. If this is only their bachelor thesis, I'm looking forward to seeing their master thesis :-)

Thanks for sharing Otto!

Benjamin De Boe · Nov 3, 2016

I sometimes like to call myself Kyle on Thursdays that feel like Mondays ;o) 

Like any operator whose name starts with a %, it's an InterSystems-specific one, so both %CONTAINS and %FIND were added on top of the SQL standard for specific purposes. %CONTAINS has been part of Caché for a long time and indeed offers a simple means to refer directly to your %Text field. So, you don't need to know the name of your index, but you do need to make sure your text field is of type %Text, so there's still a minimum of table metadata you need to be aware of.

In the case of %FIND, we're actually leveraging the %SQL.AbstractFind infrastructure, a much more recent addition that allows code-based filters to be used in SQL. In this case, the code-based filter implementation is provided by the %iFind infrastructure. Because this AbstractFind interface is, as its name says, abstract, it needs a bit of direction to wire it to the right implementation, which in our case comes in the form of the index name. As the AbstractFind interface is expected to return a bitmap (for fast filtering), it also needs to be applied to an integer field, typically the IDKey of your class. So, while it offers a lot of programmatic flexibility to implement optimized filtering code (cf this article on spatial index), it does require this possibly odd-looking construct on the SQL side. But that's easily hidden behind a user interface, of course.
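The reason a bitmap filter has to be applied to an integer ID can be sketched in a few lines of Python: each row ID becomes a bit position, so combining conditions is a cheap bitwise AND. This uses a Python int as the bitset and is only an analogy; it does not reflect how Caché actually stores bitmap chunks internally.

```python
def ids_to_bitmap(ids) -> int:
    """Set bit N for every integer row ID N (Python int as a bitset)."""
    bitmap = 0
    for row_id in ids:
        bitmap |= 1 << row_id
    return bitmap


def bitmap_to_ids(bitmap: int) -> list:
    """Turn a bitset back into an ascending list of row IDs."""
    ids = []
    row_id = 0
    while bitmap:
        if bitmap & 1:
            ids.append(row_id)
        bitmap >>= 1
        row_id += 1
    return ids


# Hypothetical results: rows matching a text search, and rows matching
# another WHERE condition. AND-ing the bitmaps intersects them in one step.
text_hits = ids_to_bitmap([2, 5, 9])
other_condition = ids_to_bitmap([1, 2, 9, 12])
print(bitmap_to_ids(text_hits & other_condition))
```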

Benjamin De Boe · Jan 9, 2017

You are entirely correct.

The separate MatchScore column is there to accommodate methods where the score is more refined than the pure count-based one of $$$SIMSRCSIMPLE. With $$$SIMSRCDOMENTS, dominance is accounted for in this metric, and you'll see it differ from percentageMatched.
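To illustrate why a weighted score can diverge from a count-based percentage, here is a hypothetical Python sketch. The `dominance` weights are made-up stand-ins for iKnow's dominance values, and the formulas are a simplified illustration, not the actual computation behind percentageMatched or MatchScore.

```python
def match_metrics(source_entities: set, target_entities: set, dominance=None):
    """Return (percentage_matched, match_score) for two entity sets.
    `dominance` is a hypothetical entity -> weight mapping; without it,
    the score is purely count-based."""
    shared = source_entities & target_entities
    if not source_entities:
        return 0.0, 0.0
    percentage_matched = 100.0 * len(shared) / len(source_entities)
    if dominance is None:
        match_score = len(shared) / len(source_entities)
    else:
        total = sum(dominance.get(e, 0.0) for e in source_entities)
        matched = sum(dominance.get(e, 0.0) for e in shared)
        match_score = matched / total if total else 0.0
    return percentage_matched, match_score


source = {"engine", "fuel leak", "runway"}
target = {"engine", "runway", "tower"}
weights = {"engine": 1.0, "fuel leak": 8.0, "runway": 1.0}
print(match_metrics(source, target))           # count-based
print(match_metrics(source, target, weights))  # weighted
```

Here both sources share two of three entities, but because the unshared "fuel leak" carries most of the (made-up) weight, the weighted score drops well below the percentage matched.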

Benjamin De Boe · Jan 10, 2017

The $$$SIMSRCDOMENTS is much more restrictive and may not yield any results if your domain is small and sources are too far apart. I see results when trying it in the Aviation demo dataset. Note that you can loosen it by setting the "strict" parameter to 0 as described in the class ref.

That third alternative you quoted has been deprecated and does not add anything to the regular $$$SIMSRCSIMPLE option. You dug too deep in the code ;o)

Regards,
benjamin

Benjamin De Boe · Feb 27, 2017

Nice article Andreas!

Have you perhaps also looked into creating a more advanced interpreter, rather than just leveraging the JDBC one? I know that's probably a significantly more elaborate thing to do, but a notebook-style interface for well-documented scripting would nicely complement the application development focus of Atelier / Studio.

Thanks,
benjamin

Benjamin De Boe · Mar 1, 2017

Yes, the two-word feature called "executing COS" would probably be quite a step up. It was more a loose idea than something I've researched thoroughly, but maybe the authors of the Caché Web Terminal have some clues on how the connectivity should work (JDBC won't pull that off).

Benjamin De Boe · Mar 10, 2017

Hi Andreas,

we don't have a release date yet, but we'll certainly be demonstrating it at the Global Summit in September. If you are already using Spark in your organisation today and would be interested in seeing how it may help you make better use of the underlying Caché database, please drop me an email.

Thanks,
benjamin

Benjamin De Boe · Apr 3, 2017

Hi Max,

the connector we're building is meant to be a smarter alternative to regular JDBC, pushing down filtering work from the Spark side to Caché SQL and leveraging parallelism where possible. So that means you can still use any Spark programming language (Scala, Java, Python or R) while enjoying the optimized connection. However, as it's an implementation of Spark's DataSource API, it's meant to go from Spark to "a data source" and not the other way round, i.e. submit a Spark job from Caché. On the other hand, that'd be something you could probably build without much effort through the Java Gateway. Do you have a particular example or use case in mind? Perhaps that would make an interesting code sample to post on the Developer Community.

Thanks,
benjamin

Benjamin De Boe · Apr 10, 2017

Cool stuff!

I believe you're using matching dictionaries for identifying those sentiment markers, which is indeed convenient from an API perspective. However, you might want to take advantage of sentiment attributes, which will allow you to not just detect occurrences of your marker terms, but also which parts of the sentence they apply to. I'm not sure how that is covered in your current app (didn't dig that deep into the code), but especially in the recent versions that improved our attribute expansion accuracy, it may improve the precision of your application too. See this article for more details.

Separately, leveraging domain definitions may also simplify the methods you're using to set up your domain. There's an option to load dictionary content from a table or file, leveraging <external> tags inside the <matching> section. It's not (yet) supported through the Architect, but you can add it when updating the class through Studio.

Thanks for sharing this!

benjamin

Benjamin De Boe · Apr 19, 2017

After posting the initial article, I realized that the sample code's use of ^CacheTemp.* globals implied a risk of iKnow.SyncedDefinition subclasses with the same name in different namespaces overwriting one another's data. The revised code now uses the namespace and domain ID as subscripts in ^CacheTemp, which should be safe.

The update also fixes the sample table's CreateTime column to be of type %DeepSee.Datatype.dateTime rather than %Date.

Benjamin De Boe · May 8, 2017

Hi Evgeny,

nice work!

Maybe you can enhance the interface by also including an iKnow-based KPI to the dashboard exposing the similar or related entities for the concept clicked in the heat map. You can subclass this generic KPI and specify the query you want it to invoke, and then use it as the data source for a table widget. Let me know if I can help.

thanks,
benjamin

Benjamin De Boe · Sep 12, 2017

Hi Robert,

glad you liked Paul's announcement. Global Summit attendees can pre-register for the limited release program in the Tech Exchange area, at the central booths. After the summit, we'll gradually broaden that program and publish a Field Test closer to the end of this calendar year.

You can find a lot more about the InterSystems IRIS Data Platform on our new website and through this resource guide at learning.intersystems.com. Stay tuned for more articles on the various new features here too.

Thanks,
benjamin

Benjamin De Boe · Sep 18, 2017

Hi Konstantin,

thanks for sharing your work, a nice application of iFind technology! If I can add a few ideas to make this more lightweight:

  • Rather than creating a domain programmatically, the recommended approach for a few versions now has been to use Domain Definitions. They allow you to declare a domain in an XML format (not unlike the %Installer approach) and avoid a number of inconveniences in managing your domain in a reproducible way.
  • From reading the article, I believe you're just using the iKnow domain for that one EntityAPI:GetSimilar() call to generate search suggestions. iFind has a similar feature, also exposed through SQL, in %iFind.FindEntities() and %iFind.FindWords(), depending on what kind of results you're looking for. See also this iFind demo. With that in place, you may even be able to skip those domains altogether :-)
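As a rough picture of what a word-based suggestion feature does, here is a hypothetical Python sketch that suggests indexed words starting with what the user has typed, most frequent first. It is not the implementation or signature of %iFind.FindWords() or EntityAPI:GetSimilar(); the prefix matching, frequency ordering and `limit` parameter are assumptions for illustration.

```python
import re
from collections import Counter


def suggest(texts, prefix: str, limit: int = 3) -> list:
    """Suggest words starting with `prefix`, most frequent first,
    ties broken alphabetically."""
    counts = Counter(w for t in texts for w in re.findall(r"\w+", t.lower()))
    prefix = prefix.lower()
    candidates = [w for w in counts if w.startswith(prefix)]
    candidates.sort(key=lambda w: (-counts[w], w))
    return candidates[:limit]


texts = ["Search engines index words",
         "Searching for search suggestions",
         "Seals and seasons"]
print(suggest(texts, "sea"))
```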

thanks,
benjamin