Written by

~~ retired but not tired ~~
Article Robert Cemper · Mar 21, 2024 2m read

VECTOR inside IRIS

This is an attempt to run a vector search demo completely in IRIS
There are no external tools and all you need is a Terminal / Console and the management portal.
Special thanks to Alvin Ryanputra as his package iris-vector-search that was the base
of inspiration and the source for test data.
My package is based on IRIS 2024.1 release and requires attention to your processor capabilities.

I attempted to write the demo in pure ObjectScript.
Only the calculation of the description_vectoris done in embedded Python
Calculation of a vector with 384 dimensions over 2247 records takes time.
In my Docker container, it was running 01:53:14 to generate them completely.

You have been warned!
So I adjusted this step to be reentrant to allow pausing vector calculation.
Every 50 records you get an offer to have a stop. 
The demo looks like this:

USER>do ^A.DemoV

      Test Vector Search
=============================
     1 - Initialize Tables
     2 - Generate Data
     3 - VECTOR_COSINE
     4 - VECTOR_DOT_PRODUCT
     5 - Create Scotch
     6 - Load Scotch.csv
     7 - generate VECTORs
     8 - VECTOR search
Select Function or * to exit : 8

      Default search:
Let's look for TOP 3 scotch that costs less than $100,
 and has an earthy and creamy taste
     change price limit [100]: 50
     change phrase [earthy and creamy taste]: earthy 

 calculating search vector
  
     Total below $50: 222 

ID      price   name
1990    40      Wemyss Vintage Malts 'The Peat Chimney,' 8 year old, 40%
1785    39      The Famous Jubilee, 40%
1868    40      Tomatin, 15 year old, 43%
2038    45      Glen Grant, 10 year old, 43%
1733    29      Isle of Skye, 8 year old, 43% 5 Rows(s) Affected


- You see the basic functionalities of Vectors in steps 1..4
- Steps 5..8 are related to the search example I borrowed from Alvin
- Step 6 (import of test data) is straight ObjectScript
  SQL LOAD DATA was far too sensible for irregularities in the input CSV

I suggest following the examples also in MGMT portal to watch how Vectors operate.

GitHub
 

Comments

Ben Spead · Mar 21, 2024

Thank you Robert for the time you took to test all of this out and make the code available :)  Do you have any observations, conclusions, takeaways after your testing?

0
Robert Cemper  Mar 21, 2024 to Ben Spead

Indeed:
I was unable to locate an official docu on new  SQL function TO_VECTOR()
Similarly, I found no documentation on how to set a VECTOR Datetype on pure object level
e.g. obj.vectorproperty = ?????? or obj.vectorproperty.set(?????)
I tried a bit with DisplayToLogical  but gave up in the end

0
Robert Cemper  Mar 21, 2024 to Robert Cemper

experimenting with class %Library.Vector I found an unattractive way:

;; compose JSON array  >> v
USER>zw v
v=[($double(.5)),($double(1.5)),($double(2.2000000000000001776))]  ; <DYNAMIC ARRAY>
USER>set vec=##class(%Vector).OdbcToLogical(v)
 
USER>zw vec
vec={"type":"double", "count":3, "length":3, "vector":[$double(.5),$double(1.5),$double(2.2000000000000001776)]}  ; <VECTOR>

Applying OdbcToLogical  was really shocking

0
Enrico Parisi  Mar 21, 2024 to Robert Cemper

found no documentation on how to set a VECTOR Datetype on pure object level

Maybe you can have a look to the documentation of $vector, $vectorop, $vectordefined and $isvector intrinsic functions.

0
Ben Spead  Mar 21, 2024 to Robert Cemper

I am not sure that all of the docs are public until the product is fully released.

0
Robert Cemper  Mar 21, 2024 to Enrico Parisi

I just realized that $vector() is a left+right function similar to $li()

  • set $vector(target,...) = ....  to set
  • set vec = $vector(...)          to get   
0
Mark Morris · Jun 27, 2024

Here's how I was able to use LOAD DATA.  First I ran into the issue of commas one field, in otherwise csv data.
So, I edited the data file with vi and changed all the delimiters to the '|' symbol using this vi command. :1,$ s/","/"|"/g
Then using SQL for the create table,
CREATE TABLE scotch_reviews ( name VARCHAR(255),
            category VARCHAR(255),
            review_point INT,
            price DOUBLE,
            description VARCHAR(2000),
            description_vector VECTOR(DOUBLE, 384))
Then using LOAD DATA:
LOAD BULK DATA FROM FILE 'scotch_reviews.tbl'
    INTO scotch_reviews (name, category, review_point, price, description)
    USING '{ "from": {"file": {"columnseparator":"|"} } }'
And it worked.

0