#Unstructured Data

0 Followers · 28 Posts

Unstructured data (or unstructured information) is information that either does not have a pre-defined data model or is not organized in a pre-defined manner. Unstructured information is typically text-heavy but may contain data such as dates, numbers, and facts as well.

Question Kanishk Mittal · Jul 28

We’re building out a data lake in IRIS 2025.1 that aggregates data across multiple business systems and departments. I’m trying to establish best practices for schema design and separation.

Right now, I’m thinking of using a separate schema for each distinct system of record feeding into the data lake - for example, one schema per upstream source system, rather than splitting based on function (e.g. staging, raw, curated). The idea is that this would make it easier to manage source ownership, auditing, and pipeline logic, especially when multiple domains are contributing data.

Article Maxim Gorshkov · Feb 14, 2024 4m read

The invention and popularization of Large Language Models (such as OpenAI's GPT-4) have launched a wave of innovative solutions that can leverage large volumes of unstructured data that, until recently, were impractical or even impossible to process manually. Such applications may include data retrieval (see Don Woodlock's ML301 course for a great intro to Retrieval Augmented Generation), sentiment analysis, and even fully autonomous AI agents, just to name a few!

Article José Pereira · May 14, 2024 11m read

TL;DR

This article introduces using the langchain framework supported by IRIS for implementing a Q&A chatbot, focusing on Retrieval Augmented Generation (RAG). It explores how IRIS Vector Search within langchain-iris facilitates storage, retrieval, and semantic search of data, enabling precise and up-to-date responses to user queries. Through seamless integration and processes like indexing and retrieval/generation, RAG applications powered by IRIS enable the capabilities of GenAI systems for InterSystems developers.
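The retrieval step of RAG described above can be sketched in a few lines. This is only a toy illustration in plain Python: the bag-of-words `embed` function stands in for a real embedding model, the in-memory `DOCS` list stands in for a vector store, and none of the names below come from langchain-iris or IRIS Vector Search.

```python
import math
from collections import Counter
from typing import List


def embed(text: str) -> Counter:
    """Toy embedding: lowercase bag-of-words counts (assumption, not a real model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query: str, documents: List[str], k: int = 2) -> List[str]:
    """Return the k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, documents: List[str], k: int = 2) -> str:
    """Augment the question with retrieved context before sending it to an LLM."""
    context = "\n".join(retrieve(query, documents, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


# Hypothetical sample documents, used only for this illustration.
DOCS = [
    "vector search stores embeddings and finds similar text",
    "globals are sparse multidimensional arrays",
    "hl7 messages carry clinical data between systems",
]
```

Calling `build_prompt("what are globals", DOCS, k=1)` would place the second document in the context section, which is the "retrieval" half of retrieval/generation; the generation half is the LLM call that consumes the prompt.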

Question Nimisha Joseph · Feb 29, 2024

I'm facing an issue while converting an ORU_R01 HL7 message to XML, specifically with elements such as <pidgrpgrp>. When I use the getvalueat() method before conversion, the XML includes the <pidgrpgrp> and other <grp> elements, but when I don't use getvalueat(), the XML is generated without these <grp> elements.

I've attempted to debug the issue using zwrite on the HL7 message before and after calling getvalueat(). Before the call, the content appears different; after it, the content shows buildmap=1, etc. Please see the XML generated in the two cases.

Article Veerarajan Karunanithi · Feb 27, 2024 4m read

What is Unstructured Data?
Unstructured data refers to information lacking a predefined data model or organization. In contrast to structured data found in databases with clear structures (e.g., tables and fields), unstructured data lacks a fixed schema. This type of data includes text, images, videos, audio files, social media posts, emails, and more.

Why Are Insights from Unstructured Data Important?
According to an IDC (International Data Corporation) report, 80% of worldwide data is projected to be unstructured by 2025, posing a significant concern for 95% of businesses (Forbes article).

Article Iryna Mykhailova · Aug 2, 2022 8m read

Before we start talking about databases and different data models that exist, first we'd better talk about what a database is and how to use it.

A database is an organized collection of data stored and accessed electronically. It is used to store and retrieve structured, semi-structured, or raw data which is often related to a theme or activity.

At the heart of every database lies at least one model used to describe its data. And depending on the model it is based on, a database may have slightly different characteristics and store different types of data.

Article Henrique Dias · Jan 13, 2022 4m read

Hey community! How are you doing?

I hope to find everyone well, and a happy 2022 to all of you!

Over the years, I've been working on a lot of different projects, and I've been able to find a lot of interesting data.

But most of the time, the datasets I worked with were customer data. When I started joining contests over the past couple of years, I began to look for specific web datasets.

I've curated some data myself, but I kept wondering, "Is this dataset enough to help others?"

Question Ahmad Bukhtiar · Nov 19, 2020

I have multiple files with different columns. The first 9 values are fixed: I want to ignore the first value and combine the next 8 values into one value using the ^ sign.

Current Format

|||||||||||^^||||||^^|||||||||||||||||
|||||||||||^^||||^^|||||||||||||||||||||||
|||||||||||^^|||^^||||||||

Desired Format

^^^^^^|||^^||||||^^|||||||||||||||||
^^^^^^|||^^||||^^|||||||||||||||||||||||
^^^^^^|||^^|||^^||||||||

I read each line from the file using the code below.

#dim line as %String = tInput.ReadLine(, .status)

"here I want to put some string function to change the format of the data in the line variable"
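One way to think about the transformation is as a split, slice and re-join. The sketch below is written in Python purely to illustrate the logic and follows the textual description (drop the first value, join the next eight with `^`, keep the rest pipe-delimited); the function name and its parameters are hypothetical, and the `fixed` count may need adjusting if the real files differ from that description. In ObjectScript the same slicing is usually done with `$PIECE`.

```python
def reformat_line(line: str, fixed: int = 9, sep: str = "|", joiner: str = "^") -> str:
    """Drop field 1, merge fields 2..fixed with `joiner`, keep the rest as-is."""
    fields = line.split(sep)
    if len(fields) < fixed:
        # Not enough fields to contain the fixed block: leave the line untouched.
        return line
    combined = joiner.join(fields[1:fixed])  # fields 2..9 when fixed == 9
    return sep.join([combined] + fields[fixed:])
```

For example, `reformat_line("a|b|c|d|e|f|g|h|i|j|k")` yields `b^c^d^e^f^g^h^i|j|k`. Fields after the fixed block are passed through unchanged, so any `^` characters they already contain are preserved.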

Article Renato Banzai · Jul 17, 2020 3m read

This is the second post of a series explaining how to create an end-to-end Machine Learning system.

Exploring Data

InterSystems IRIS already has what we need to explore the data: an SQL engine! For people who are used to exploring data in CSV or text files, this can speed up this step. Basically, we explore all the data to understand the intersections (joins), which should help create a dataset ready to be used by a machine learning algorithm.

Posts table (provided by the InterSystems team)
Tags table (provided by the InterSystems team)
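As a rough illustration of exploring joins with SQL, the sketch below counts posts per tag. It uses Python's built-in `sqlite3` with an in-memory database as a stand-in for the IRIS SQL engine, and the `posts` and `tags` table layouts and sample rows are assumptions made up for the example, not the actual tables provided for the series.

```python
import sqlite3


def build_demo() -> sqlite3.Connection:
    """Create hypothetical posts/tags tables with a few sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT);
        CREATE TABLE tags (post_id INTEGER, tag TEXT);
        INSERT INTO posts VALUES (1, 'Intro to globals'), (2, 'Vector search'), (3, 'SQL tips');
        INSERT INTO tags VALUES (1, 'globals'), (2, 'sql'), (3, 'sql'), (2, 'vector');
        """
    )
    return conn


def posts_per_tag(conn: sqlite3.Connection) -> list:
    """Join posts to tags and count how many posts carry each tag."""
    return conn.execute(
        """
        SELECT t.tag, COUNT(*) AS n_posts
        FROM posts AS p
        JOIN tags AS t ON t.post_id = p.id
        GROUP BY t.tag
        ORDER BY n_posts DESC, t.tag
        """
    ).fetchall()
```

Running `posts_per_tag(build_demo())` returns one `(tag, n_posts)` row per tag, most frequent first, which is the kind of quick check that shows how two tables intersect before building a training dataset.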
Article Sergey Kamenev · May 28, 2020 7m read

A More Industrial-Looking Global Storage Scheme

In the first article in this series, we looked at the entity–attribute–value (EAV) model in relational databases, and took a look at the pros and cons of storing those entities, attributes and values in tables. We learned that, despite the benefits of this approach in terms of flexibility, there are some real disadvantages, in particular a basic mismatch between the logical structure of the data and its physical storage, which causes various difficulties.

Article Sergey Kamenev · May 11, 2020 8m read

Introduction

In the first article in this series, we’ll take a look at the entity–attribute–value (EAV) model in relational databases to see how it’s used and what it’s good for. Then we'll compare the EAV model concepts to globals.

Sometimes you have objects with an unknown number of fields, or perhaps hierarchically nested fields, for which, as a rule, you need to search.
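To make the EAV idea concrete, here is a minimal sketch in Python. The rows, attribute names and helper functions are all hypothetical; the nested dictionary is only an analogy for hierarchical storage and does not use any IRIS or globals API.

```python
from collections import defaultdict

# Each row is one (entity, attribute, value) triple, as in an EAV table.
EAV_ROWS = [
    ("laptop-1", "brand", "Acme"),
    ("laptop-1", "ram_gb", "16"),
    ("phone-7", "brand", "Acme"),
    ("phone-7", "screen_in", "6.1"),
]


def to_tree(rows):
    """Pivot flat EAV triples into entity -> {attribute: value}."""
    tree = defaultdict(dict)
    for entity, attribute, value in rows:
        tree[entity][attribute] = value
    return dict(tree)


def find_entities(rows, attribute, value):
    """Search: which entities have this attribute set to this value?"""
    return sorted(e for e, a, v in rows if a == attribute and v == value)
```

Here `find_entities(EAV_ROWS, "brand", "Acme")` returns both entities even though they do not share the same set of attributes, which is the flexibility EAV offers when the number of fields is not known in advance.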

Article Alex Litkovets · Apr 10, 2017 5m read

Introduction

We used the InterSystems iKnow technology to create a review assessment system called iKnow Reviews Analyzer (iKRA). Some information about the prototype of the system can be found here. iKRA analyzes users’ text reviews and automatically rates the object being reviewed. This functionality may come in very handy on e-commerce sites, forums or collections of media content: in other words, anywhere people discuss products, places or services.

What does the solution do?

Article Michelle Stolwyk · May 25, 2017 2m read

The Data Platforms department here at InterSystems is gearing up for this year's crop of interns, and I for one am very excited to meet them all next week!

We've got folks from top technical colleges with diverse specialties from hard core engineers to pure computer scientists to mathematicians to business professionals. They come from countries around the world like Vietnam, China, and Finland and they all come with impressive backgrounds. We're sure they will do very well this summer.

Article Benjamin De Boe · Nov 3, 2016 16m read

This article contains the tutorial document for a Global Summit academy session on Text Categorization and provides a helpful starting point to learn about Text Categorization and how iKnow can help you to implement Text Categorization models. This document was originally prepared by Kerry Kirkham and Max Vershinin and should work based on the sample data provided in the SAMPLES namespace.

Article Otto Medin · Nov 1, 2016 1m read

A group of students at the Chalmers University of Technology (Gothenburg, Sweden) tried different approaches to automatically rating the quality of emergency calls, including iKnow.

Excerpt: "The most impressive results produced by iKnow is its ability to correctly classify 100% of the calls using the Average algorithm. This is quite surprising since iKnow only compares low-level concepts, how words relates to each other."

Full story: http://publications.lib.chalmers.se/records/fulltext/244534/244534.pdf

Article Daniel Wijnschenk · Apr 7, 2016 1m read

Presenter: Danny Wijnschenk
Task: Help people make better decisions by letting the application deal with all the data.
Approach: As an example, we’ll extend a demo asset management application for portfolio and trade compliance, using iKnow technology to translate agreements into rules that ensure portfolio compliance prior to trade execution.

Article Benjamin De Boe · Apr 8, 2016 1m read

Presenter: Benjamin De Boe
Task: Extract specialized information from your unstructured data
Approach: Combine InterSystems iKnow technology with third-party and custom text-processing tools
 

This session explains how you can easily combine ISC, third-party and custom text processing tools to get the broadest insights in your unstructured data.

Content related to this session, including slides, video and additional learning content can be found here.

Article Developer Community Admin · Oct 21, 2015 3m read

Introduction - Analyzing Textual Big Data

Big Data for Enriching Analytical Capabilities - Big data is revolutionizing the world of business intelligence and analytics. Gartner predicts that big data will drive $232 billion in spending through 2016, Wikibon claims that by 2017 big data revenue will have grown to $47.8 billion, and McKinsey Global Institute indicates that big data has the potential to increase the value of the US health care industry by $300 billion and to increase the industry value of Europe's public sector administration by €250 billion.

Question Jack Abdo · Feb 2, 2016

Hi,

Using Studio, I created a persistent class with the following field and index:

Property DescriptionDemande As %String(MAXLEN = "");
Index IDXBASDescriptionDemande On (DescriptionDemande) As %iFind.Index.Basic(INDEXOPTION = 1, LANGUAGE = "fr", LOWER = 1);

INDEXOPTION is set to 1 to activate stemming. I'm indexing French documents. I have set LOWER to 1 because I want case-insensitive search.

I inserted a single French word, "élément", into the field DescriptionDemande for testing purposes using this query: insert into my_table(DescriptionDemande) values(' élément')

Question Jack Abdo · Jan 15, 2016

Hi,

I created an iKnow domain, where I supplied dictionaries, a blacklist, metadata and stemming. The data source is a table.

I would like to use the iFind semantic search feature. The documentation says that iFind uses iKnow semantic analysis, but I want iFind to use the iKnow domain configuration I created earlier. How can I do that?

Regards,

Jack Abdo.

Question Scott Beeson · Jan 21, 2016

So calling this lookup manually from the console works as expected:

PHR>set key = "WMMC_IMM"
PHR>w ##class(Ens.Util.FunctionSet).Lookup("BlockFeed",key)
1

However, calling it from a method with some concatenation to build the key is giving me problems:

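ClassMethod canSendToState(iParticipant As %String, iFeed As %String) As %Boolean
{
    // Build the lookup key, e.g. "WMMC_IMM"
    set k = iParticipant _ "_" _ iFeed
    w "Looking up " _ k,!
    // Third argument is the default returned when the key is not found
    set x = ..Lookup("BlockFeed",k,"not found")
    w "x = " _ x,!
}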
PHR>w ##class("Custom.MHC.Common.Functions").canSendToState("WMMC","IMM")
Looking up WMMC_IMM
x = not found
Article Developer Community Admin · Oct 21, 2015 1m read

Introduction

Experts estimate that 85% of all data exists in unstructured formats – held in e-mails, documents (contracts, memos, clinical notes, legal briefs), social media feeds, etc. Where structured data typically accounts for quantitative facts, the more interesting and potentially more valuable expert opinions and conclusions are often hidden in these unstructured formats. And with massive volumes of text being generated at unprecedented speed, there’s very little chance this information can be made useful without some process of synthesis or automation.
