#Big Data

0 Followers · 44 Posts

Big data is a field that treats ways to analyze, systematically extract information from. Big data challenges include capturing data, data storage, data analysis, search, sharing, transfer, visualization, querying, updating, information privacy and data source. 

Learn more.

Article Timothy Scott · Feb 28 7m read

High-Performance Message Searching in Health Connect

The Problem

Have you ever tried to do a search in Message Viewer on a busy interface and had the query time out? This can become quite a problem as the amount of data increases. For context, the instance of Health Connect I am working with does roughly 155 million Message Headers per day with 21 day message retention. To try and help with search performance, we extended the built-in SearchTable with commonly used fields in hopes that indexing these fields would result in faster query times. Despite this, we still couldn't get some of these queries to finish at all.

0
0 0
Question Kanishk Mittal · Jul 28

We’re building out a data lake in IRIS 2025.1 that aggregates data across multiple business systems and departments. I’m trying to establish best practices for schema design and separation.

Right now, I’m thinking of using a separate schema for each distinct system of record feeding into the data lake - for example, one schema per upstream source system, rather than splitting based on function (e.g. staging, raw, curated). The idea is that this would make it easier to manage source ownership, auditing, and pipeline logic, especially when multiple domains are contributing data.

0
0 0
Question Michael Davidovich · Apr 10

Hello,

Our software commonly returns a full result set to the client and we use the DataTables plugin to display table data.  This has worked well, but at datasets grow larger, we are trying to move some of these requests server-side so the server handles the bulk of the work rather than the client.  This has had me scratching my head in so many ways.  

I'm hoping I can get a mix of general best practice advice but also maybe some IRIS specific ideas.

Some background

0
0 0
Question Steven Henry Suhendra · Dec 2, 2024

Hello My Friends,

I have a question how to use order by %DLIST, this is my code:

SELECT

$ListToString(%DLIST(DISTINCT MRDIA_ICDCode_DR->MRCID_Code),', ' ) ICDX,

$ListToString(%DLIST(DISTINCT (MRDIA_ICDCode_DR->MRCID_Desc || ' (' || MRDIA_DiagnosisType_DR->DTYP_Code || ')')),', ' ) Diagnose

FROM SQLUser.PA_Adm

LEFT JOIN SQLUser.PA_AdmInsurance ON (PAADM_RowID = INS_ParRef AND INS_Rank = 1)

LEFT JOIN SQLUser.PA_AdmPackage ON (PAADM_RowID = PACK_ParRef)

LEFT JOIN SQLUser.MR_Adm on MRADM_ADM_DR = PAADM_RowID

LEFT JOIN SQLUser.MR_Diagnos ON MRADM_RowId = MRDIA_MRADM_ParRef

0
0 0
Announcement Jean Millette · Jun 12, 2024

TL;DR: My comment to Microsoft when I voted:Our team has implemented most of what we need for source management of Power BI Report files in Perforce. The missing piece? Automated creation of ".pbix" files from ".pbit" template files that can deployed to the Power BI Report Server. Let's get the manual “Power BI Desktop->File->Save As” step out of the process to make report deployment totally automated.

0
0 0
Article Yuri Marx · Nov 20, 2023 3m read

In the world of Big Data, selecting the right file format is crucial for efficient data storage, processing, and analysis. With the massive amount of data generated every day, choosing the appropriate format can greatly impact the speed, cost, and accuracy of data processing tasks. There are several file formats available, each with its own set of advantages and disadvantages, making the decision of which one to use complex. Some of the popular Big Data file formats include CSV, JSON, Avro, ORC, and Parquet. The last one, Parquet, is a columnar storage format that is highly compressed and

2
1 578
Article Yuri Marx · Nov 27, 2023 3m read

According to Databricks Apache Parquet is an open source, column-oriented data file format designed for efficient data storage and retrieval. It provides efficient data compression and encoding schemes with enhanced performance to handle complex data in bulk. Apache Parquet is designed to be a common interchange format for both batch and interactive workloads. It is similar to other columnar-storage file formats available in Hadoop, namely RCFile and ORC. (source: https://www.databricks.com/glossary/what-is-parquet). Below are the characteristics and benefits of Parquet according to

0
1 355
Discussion Dmitry Maslennikov · Sep 1, 2023

In today's landscape, enterprises have grown substantially in scale, amassing vast amounts of data. This data is collected from a plethora of sources including different applications, databases, and other channels. Given the diversity and volume of this data, it's only logical for these enterprises to seek a deeper understanding of what their data entails. Some of the data can be stored in IRIS, and it can be reasonable to be able to add this data to a data lake too.

The Internet now offers many different tools for such tasks, that do not yet support IRIS, but it's achievable. 

7
0 563
Article Piyush Adhikari · Oct 19, 2022 3m read

The capacity of taking numerous records every second while also facilitating real-time queries simultaneously in real time is called Hybrid Transactional Analytical Processing (HTAP). It is also called Transactional analytics or Transanalytics or Translytics and is a very useful element in scenarios where there is constant flow of real time data coming from IIOT sensors or data on fluctuations in stock market, and supporting the need for querying these data sets in real-time or near real-time.

4
0 846
Announcement Anastasia Dyubaylo · Apr 6, 2022

Hey Developers,

Good news! One more upcoming in-person event is nearby.

We're pleased to invite you to join "J On The Beach", an international rendezvous for developers and DevOps around Big Data technologies. A fun conference to learn and share the latest experiences, tips & tricks related to Big Data technologies, and, the most important part, it’s On The Beach!

🗓 April 27-29, 2022

📍Málaga, Spain

This year, InterSystems is a Gold Sponsor of the JOTB.

0
0 437
Article Henrique Dias · Jan 15, 2022 2m read

Hi everyone,

I want to talk about our project and use the dataset theme for this contest.

Our intention never was to be a data curator, especially because sometimes my precious data means a lot for me, but not for the rest of the world.

My Precious

We want to go a step further and empower the user to find the perfect dataset for their needs.

Our project is a bridge between the data science community and the developer's community using InterSystems IRIS to achieve this mission.

0
0 486
Article Yuri Marx · Oct 19, 2021 5m read

In the last years the data architecture and platforms focused into Big Data repositories and how toprocess it to deliver business value. From this effort many technologies were created to process tera and petabytes of data, see:

The fundamental piece to the Big Data technologies is HDFS (Hadoop Distributed File System). It is a distributed file system to store tera or petabytes of data into arrays of storages, memory and CPU working together. In addition of the Hadoop we have other components, see:

0
1 761
Article Yuri Marx · Oct 19, 2021 2m read

Hi Community,

The InterSystems IRIS has a good connector to do Hadoop using Spark. But the market offers other excellent alternative to Big Data Hadoop access, the Apache Hive. See the differences:

Hive vs. Spark
Source: https://dzone.com/articles/comparing-apache-hive-vs-spark

I created a PEX interoperability service to allows you use Apache Hive inside your InterSystems IRIS apps. To try it follow these steps:

1. Do a git clone to the iris-hive-adapter project:

$ git clone https://github.com/yurimarx/iris-hive-adapter.git

2. Open the terminal in this directory and run:

0
1 558
Article Phillip Booth · Jan 30, 2020 3m read

Over the last couple of weeks the Solution Architecture team has been working to finish off our 2019 workload: this included open-sourcing the Readmission Demo that was brought to HIMSS last year, so we could make it available to anyone looking for an interactive-way of exploring the tooling provided by IRIS.

2
2 1426
Announcement Anastasia Dyubaylo · Dec 28, 2020

Hi Community,

We're pleased to invite you to the online meetup with the winners of the InterSystems Analytics Contest!

Date & Time: Monday, January 4, 2021 – 10:00 EDT

What awaits you at this virtual Meetup? 

  • Our winners' bios.
  • Short demos on their applications.
  • An open discussion about technologies being used, bonuses, questions. Plans for the next contests.
3
0 438
Article Yuri Marx · Jan 4, 2021 2m read
Big Data 5V with InterSystems IRIS

See the table below:

Velocity: Elastic velocity delivered with horizontal and vertical node scalingEnablers: Distributed memory cache, Distributed processing, Sharding and Multimodel Architecturehttps://www.intersystems.com/isc-resources/wp-content/uploads/sites/24/… and https://learning.intersystems.com/course/view.php?id=1254&ssoPass=1

Value: exponential data value produced by Analytics and IAEnablers: BI, NLP, ML, AutoML and Multimodel

0
1 565
Announcement Anastasia Dyubaylo · Dec 21, 2020

Hey Developers,

This week is a voting week for the InterSystems Analytics Contest! So, it's time to give your vote to the best solutions built with InterSystems IRIS.

🔥 You decide: VOTING IS HERE 🔥

How to vote? 

Please meet the new voting engine and algorithm for the Experts and Community nomination:

3
0 329