Hi all. We are going to find duplicates in a dataset using Apache Spark Machine Learning algorithms.
Note: I did the following on Ubuntu 18.04 with Python 3.6.5, Zeppelin 0.8.0, and Spark 2.1.1.
Introduction
In previous articles we covered the following:
- How to launch Jupyter Notebook + Apache Spark + InterSystems IRIS
- How to load an ML model into InterSystems IRIS
- K-Means clustering of the Iris dataset
- How to launch Apache Spark + Apache Zeppelin + InterSystems IRIS
In this series of articles, we explore Machine Learning and record linkage.
This error appears: