VECTOR inside IRIS
This is an attempt to run a vector search demo completely in IRIS.
There are no external tools and all you need is a Terminal / Console and the management portal.
Special thanks to Alvin Ryanputra: his package iris-vector-search was the main inspiration
and the source of the test data.
My package is based on the IRIS 2024.1 release and requires attention to your processor capabilities.
I attempted to write the demo in pure ObjectScript.
Only the calculation of the description_vector is done in embedded Python.
Calculation of a vector with 384 dimensions over 2247 records takes time.
In my Docker container, it took 01:53:14 to generate all of them.
You have been warned!
So I made this step reentrant, which allows the vector calculation to be paused and resumed.
After every 50 records you are offered the chance to stop.
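The reentrant step described above can be pictured with a small Python sketch. This is not the demo's ObjectScript code: the function names, the dictionaries standing in for the table, and the toy embedding function are all assumptions made for illustration. The idea is only that records which already have a vector are skipped, so a later run picks up where the previous one stopped.

```python
from typing import Callable, Dict, List, Optional


def generate_vectors(
    records: Dict[int, str],
    vectors: Dict[int, List[float]],
    embed: Callable[[str], List[float]],
    batch_size: int = 50,
    keep_going: Optional[Callable[[int], bool]] = None,
) -> int:
    """Fill in missing vectors and return how many records are still pending.

    `records` maps a row ID to its description; `vectors` maps a row ID to an
    already computed vector. Both are hypothetical stand-ins for the table.
    """
    done_this_run = 0
    for rec_id in sorted(records):
        if rec_id in vectors:
            # Computed in an earlier run: skipping it makes the step reentrant.
            continue
        vectors[rec_id] = embed(records[rec_id])
        done_this_run += 1
        # After every full batch, ask whether to continue (the "offer to stop").
        if done_this_run % batch_size == 0 and keep_going is not None:
            if not keep_going(done_this_run):
                break
    return len(records) - len(vectors)


def toy_embed(text: str) -> List[float]:
    # Hypothetical placeholder for a real embedding model.
    return [float(len(text)), float(text.count(" "))]
```

Calling generate_vectors a second time with the same `vectors` dictionary continues with the first record that has no vector yet, which is the behaviour that makes a long-running calculation safe to interrupt.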
The demo looks like this:
USER>do ^A.DemoV

Test Vector Search
=============================
1 - Initialize Tables
2 - Generate Data
3 - VECTOR_COSINE
4 - VECTOR_DOT_PRODUCT
5 - Create Scotch
6 - Load Scotch.csv
7 - generate VECTORs
8 - VECTOR search
Select Function or * to exit : 8

Default search:
Let's look for TOP 3 scotch that costs less than $100,
and has an earthy and creamy taste
change price limit [100]: 50
change phrase [earthy and creamy taste]: earthy
calculating search vector

Total below $50: 222

ID    price  name
1990  40     Wemyss Vintage Malts 'The Peat Chimney,' 8 year old, 40%
1785  39     The Famous Jubilee, 40%
1868  40     Tomatin, 15 year old, 43%
2038  45     Glen Grant, 10 year old, 43%
1733  29     Isle of Skye, 8 year old, 43%

5 Rows(s) Affected
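The search in step 8 of the session above combines a price filter with a ranking by vector similarity. The Python sketch below shows that idea only. It assumes cosine similarity as the ranking measure (the demo may use the dot product instead), and the row layout (id, price, name, vector) and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Row = Tuple[int, float, str, Sequence[float]]  # (id, price, name, vector)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def search(
    rows: List[Row],
    query_vec: Sequence[float],
    max_price: float,
    top_n: int = 3,
) -> List[Tuple[int, float, str]]:
    """Keep rows cheaper than max_price, then return the top_n most similar."""
    candidates = [r for r in rows if r[1] < max_price]
    candidates.sort(key=lambda r: cosine(r[3], query_vec), reverse=True)
    return [(r[0], r[1], r[2]) for r in candidates[:top_n]]
```

In the demo the query vector comes from embedding the search phrase ("calculating search vector"); here it is simply passed in.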
- You see the basic functionalities of Vectors in steps 1..4
- Steps 5..8 are related to the search example I borrowed from Alvin
- Step 6 (import of test data) is straight ObjectScript
SQL LOAD DATA was far too sensitive to irregularities in the input CSV
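For readers new to the two measures exercised in steps 3 and 4, here is a minimal Python sketch of what a dot product and a cosine similarity compute. It only mirrors the mathematics; it is not how IRIS implements VECTOR_DOT_PRODUCT or VECTOR_COSINE, and the function names are my own.

```python
import math
from typing import Sequence


def dot_product(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of element-wise products; grows with the length of the vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product divided by both norms; depends on direction only."""
    norm_a = math.sqrt(dot_product(a, a))
    norm_b = math.sqrt(dot_product(b, b))
    return dot_product(a, b) / (norm_a * norm_b)
```

The two measures rank results identically when all vectors are normalized to length 1, because the denominator in the cosine is then 1.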
I suggest also following the examples in the Management Portal to watch how vectors operate.
Comments
Thank you Robert for the time you took to test all of this out and make the code available :) Do you have any observations, conclusions, takeaways after your testing?
Indeed:
I was unable to locate any official documentation on the new SQL function TO_VECTOR()
Similarly, I found no documentation on how to set a VECTOR datatype on pure object level
e.g. obj.vectorproperty = ?????? or obj.vectorproperty.set(?????)
I tried a bit with DisplayToLogical but gave up in the end
Here is the documentation for TO_VECTOR, https://docs.intersystems.com/irislatest/csp/docbook/DocBook.UI.Page.cl…
The Vector datatype documentation is below, but doesn't necessarily help a lot. I think this is because you will typically need a Python library, like sentence_transformers used in iris-vector-search, to generate useful vectors.
https://docs.intersystems.com/iris20241/csp/documatic/%25CSP.Documatic…
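To connect the two points above: a vector produced in Python usually arrives as a list of floats, while TO_VECTOR takes its input as a string of comma-separated values. The sketch below builds such a string. The helper name and the optional dimension check are assumptions for illustration, and the exact numeric formats TO_VECTOR accepts should be checked against the documentation linked above.

```python
from typing import Optional, Sequence


def to_vector_literal(values: Sequence[float], expected_dim: Optional[int] = None) -> str:
    """Render a sequence of numbers as a comma-separated string.

    Assumption: the resulting string is passed as the first argument of
    TO_VECTOR, for example via a query parameter.
    """
    if expected_dim is not None and len(values) != expected_dim:
        raise ValueError(
            "expected %d dimensions, got %d" % (expected_dim, len(values))
        )
    # repr() keeps full double precision; very small or very large values
    # come out in scientific notation such as 1e-05.
    return ",".join(repr(float(v)) for v in values)
```

For a 384-dimension embedding, passing expected_dim=384 catches a mismatched model before the value reaches the database.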
Thanks.
I was looking for this but couldn't find it.
https://docs.intersystems.com/irislatest/csp/docbook/DocBook.UI.Page.cls?KEY=RCOS_fvector
It is not on https://docs.intersystems.com/iris20241/csp/docbook
but on https://docs.intersystems.com/irislatest/csp/docbook
which is not covered by Doc Search
Experimenting with class %Library.Vector, I found an unattractive way:
;; compose JSON array >> v
USER>zw v
v=[($double(.5)),($double(1.5)),($double(2.2000000000000001776))] ; <DYNAMIC ARRAY>
USER>set vec=##class(%Vector).OdbcToLogical(v)
USER>zw vec
vec={"type":"double", "count":3, "length":3, "vector":[$double(.5),$double(1.5),$double(2.2000000000000001776)]} ; <VECTOR>
Applying OdbcToLogical was really shocking
"found no documentation on how to set a VECTOR datatype on pure object level"
Maybe you can have a look at the documentation of the $vector, $vectorop, $vectordefined and $isvector intrinsic functions.
Thanks. I had only looked for TO_VECTOR, and failed at that time.
https://docs.intersystems.com/irislatest/csp/docbook/DocBook.UI.Page.cl…
covers my needs.
I am not sure that all of the docs are public until the product is fully released.
I think the product HAS been fully released for more than a week now:
InterSystems announces General Availability of InterSystems IRIS 2024.1
Or I'm missing something?
You even posted in the announcement!
I just realized that $vector() is a left+right function similar to $li()
- set $vector(target,...) = .... to set
- set vec = $vector(...) to get
Here's how I was able to use LOAD DATA. First I ran into the issue of commas inside one field of the otherwise comma-separated data.
So I edited the data file with vi and changed all the delimiters to the '|' symbol using this vi command: :1,$ s/","/"|"/g
Then using SQL for the create table,
CREATE TABLE scotch_reviews ( name VARCHAR(255),
category VARCHAR(255),
review_point INT,
price DOUBLE,
description VARCHAR(2000),
description_vector VECTOR(DOUBLE, 384))
Then using LOAD DATA:
LOAD BULK DATA FROM FILE 'scotch_reviews.tbl'
INTO scotch_reviews (name, category, review_point, price, description)
USING '{ "from": {"file": {"columnseparator":"|"} } }'
And it worked.
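The delimiter change done with vi above can also be scripted. The following Python sketch uses the standard csv module, which understands quoted fields that contain commas, and writes every field quoted with '|' as the separator. The function name is hypothetical, and the sketch assumes no field contains a '|' character itself.

```python
import csv
import io


def recode_delimiter(src_text: str, new_delim: str = "|") -> str:
    """Re-write comma-separated text with a different column separator."""
    reader = csv.reader(io.StringIO(src_text))
    out = io.StringIO()
    writer = csv.writer(
        out,
        delimiter=new_delim,
        quoting=csv.QUOTE_ALL,   # quote every field, like the edited file
        lineterminator="\n",
    )
    for row in reader:
        writer.writerow(row)
    return out.getvalue()
```

Unlike a plain text substitution, this also handles fields that are not quoted in the input, because the parsing is done per field rather than per character pattern.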