Tuesday, March 1, 2022

Analyze your MySQL Database Service(MDS) data with Zeppelin

 MDS-Heatwave Data Analysis using Zeppelin

The Complete Guide to access MDS with HeatWave by Zeppelin

 In this guide we will walk through how to install and access MDS with Heatwave.

1.       What is MDS and HeatWave ?

2.       What is Zeppelin  ?

3.       Environment details

4.       How to Install Zeppelin ?

5.       Network Consideration

6.       How to access HeatWave

7.       MDS Data Analysis using Zeppelin

8.       Conclusion

Overview of Zeppelin and MySQL Database Service with HeatWave

We are living in the data world and as the data is increasing with large velocity, it is important to get the results quicker until earlier MySQL has challenged to produce results against larger data size in faster way how ever there is “paradigm shift” on current solutions of MySQL and

 Now, MySQL (only in PaaS model of OCI) comes with HeatWave ,which is a distributed, scalable, shared-nothing, in-memory, hybrid columnar, query processing engine designed for extreme performance.

It is enabled when you add a heatwave cluster to a MySQL DB System.

So MySQL database service(MDS) will give faster results and  then needed some data analysis tool to make meaningful of the data , get more insight of data.

In this blog let me introduce Apache Zeppelin for MDS data analysis.

 Apache Zeppelin , an open source multipurpose notebook which help users to represent and analyze your data in the form of graphs or charts so that it help the organization to take quick decision.

 We will explore each item in details…

 Overall, My Idea is to show you quick demo , how easily you can connect MDS through Zeppelin.

Zepplein can be installed anywhere (any public/private  cloud  , On-Premises).

 What is MDS and Heatwave ?

Oracle MySQL Database Service(MDS) is a fully managed database service that lets developers quickly develop and deploy secure, cloud native applications using the world’s most popular open source database.

 MySQL Database Service is the only MySQL cloud service with an integrated, high performance, in-memory query accelerator - Heatwave. It enables customers to run sophisticated analytics directly against their operational MySQL databases—eliminating the need for complex, time-consuming, and expensive data movement and integration with a separate analytics database.

MySQL Database Service is 100% built, managed, and supported by the OCI and MySQL engineering teams.

more info:- https://www.oracle.com/mysql/

Heatwave:- https://www.oracle.com/mysql/heatwave

MDS Business benefits :-

https://mysqlsolutionsarchitect.blogspot.com/2022/02/understanding-mysql-database-servicemds.html

 

 What is Zeppelin ?

Zeppelin is web-based notebook that enables data-driven, interactive  data analytics and collaborative documents with language like SQL, Scala, Python, R and more.

Things like data ingestion, data exploration, data visualization, and data analytics can be done in the zeppelin notebook.

Features of Apache Zeppelin

v  Data Ingestion.

v  Data Discovery

v  Data Analytics

v  Data Visualization and Collaboration

 More info:- https://zeppelin.apache.org/

 Network Consideration

Make sure the port of zeppelin 8080 is whitelisted in your environment

and in case you are using Oracle Cloud Infrastructure (OCI) then make sure ingress rule is configured to white list the compute instance where Zeppelin is installed and also compute instance is able to ping pong MDS instance.

To access the Zeppelin, always access with proper user with Interpreter access otherwise any user can access your Zeppelin,

Sometime anonymous user get into the portal of Zeppelin by entering the Zeppelin public IP address but however if your interpreter is restricted with particular user then your workspaces is safer.

Hence, It is important to have data source authorization in Apache Zeppelin

https://zeppelin.apache.org/docs/0.10.0/setup/security/datasource_authorization.html




Note:- I couldn’t dig much in details about security and my major focused on where you are , how are you accessing the Zeppelin , how eill have seamless experiences with MDS and HeatWave and do awesome visualization and data analysis

Installation of Zeppelin

 In this blog, installation of Zeppelin will be on Oracle Cloud Infrastructure (https://www.oracle.com/in/cloud/)

 With below specifications




Step 1 #Install JDK

sudo yum install java-11-openjdk-devel

Step 2:- Download the Zeppelin by using below command

wget https://dlcdn.apache.org/zeppelin/zeppelin-0.10.0/zeppelin-0.10.0-bin-all.tgz

Step 3 :- create user and give the permission

sudo adduser -d /home/opc/zeppelin -s /sbin/nologin zeppelin

sudo chown -R zeppelin:zeppelin /home/opc/zeppelin

Step 4:- rename zeppelin site template to zeppelin site

cd /home/opc/zeppelin/conf

sudo cp zeppelin-site.xml.template zeppelin-site.xml

sudo cp zeppelin-site.xml.template zeppelin-site.xml

Step 5 :- #Start Zeppelin

Connect Zeppelin

http://<IP Address of Compute Instance/Local IP>::8080/#/



Connect MDS and Heatwave through Zeppelin

Download MySQL Connector/J

https://dev.mysql.com/downloads/connector/j/

 rpm -ivh  mysql-connector-java-8.0.28-1.el7.noarch.rpm

warning: mysql-connector-java-8.0.28-1.el7.noarch.rpm: Header V4 RSA/SHA256 Signature, key ID 3a79bd29: NOKEY

error: Failed dependencies:

        java-headless >= 1:1.8.0 is needed by mysql-connector-java-1:8.0.28-1.el7.noarch

[

 Fix:- yum -y install java-headless 

Create a MySQL Interpreter

#Create directory called mds under zeppelin’s interpreter folder

# mkdir mds

#move the “mysql-connector-java.jar” file into MDS folder

 #cp /usr/share/java/mysql-connector-java.jar /home/opc/zeppelin/interpreter/mds/

 

Navigate to the  Interpreter

http://<IP Address of Compute Instance/Local IP>:8080/#/interpreter

Search for MDS interpreter and fill below details



 






Finally once changes are final then it looks like below








Access data from MDS and Heatwave

Please ensure MDS and Heatwave is up and running.







Create a notebook and access MDS with Heatwave via Zeppelin













Awesome! , Zeppelin Connected to MDS

Let’s do data analysis by using zeppelin

In this demonstration assume that MDS with Heatwave is up and running and data is loaded into HeatWave.

If you wanted to follow quick start demo , how to load data into HeatWave.

https://docs.oracle.com/en-us/iaas/mysql-database/doc/heatwave.html#GUID-700248EF-4614-49CD-888F-920F4C66CD4C

 Schema used in the demo is “AirportDB” with 50 GB database size.

Command to load first time data into the HeatWave:-

run Auto Parallel Load to load airportdb data into HeatWave:

CALL sys.heatwave_load(JSON_ARRAY('airportdb'), NULL);












##Run Below SQL statement to generate report

use airportdb;

 SELECT airline.airlinename, SUM(booking.price) as price_tickets, count(*) as nb_tickets

FROM booking, flight, airline, airport_geo

WHERE booking.flight_id=flight.flight_id AND

airline.airline_id=flight.airline_id AND

flight.from=airport_geo.airport_id AND

airport_geo.country = "UNITED STATES"

GROUP BY

airline.airlinename

ORDER BY

nb_tickets desc, airline.airlinename limit 10;















More info about Heatwave:-  https://www.oracle.com/mysql/heatwave/

 Conclusion

Apache Zeppelin is a kind of tool, which makes Data Scientist life smooth, they can do everything they need in one place. Things like data ingestion, data exploration, data visualization, and data analytics can be done in the zeppelin notebook and MDS HeatWave is a massively parallel, high performance, in-memory query accelerator for Oracle MySQL Database Service that accelerates MySQL performance by orders of magnitude for analytics and mixed workloads and cost lesser than Specialist analytics product like Amazon Redshift, Aurora ,Snowflake ,Azure Synpase ,google Big Query etc.