MDS-Heatwave Data Analysis using Zeppelin
The Complete Guide to access MDS with HeatWave by Zeppelin
1.
What is MDS and HeatWave ?
2.
What is Zeppelin ?
3.
Environment details
4.
How to Install Zeppelin ?
5.
Network Consideration
6.
How to access HeatWave
7.
MDS Data Analysis using Zeppelin
8.
Conclusion
Overview of Zeppelin and MySQL Database Service with HeatWave
We
are living in the data world and as the data is increasing with large
velocity, it is important to get the results quicker until earlier MySQL has
challenged to produce results against larger data size in faster way how ever
there is “paradigm shift” on current solutions of MySQL and It is enabled when you add a heatwave cluster to a MySQL DB System. So MySQL database service(MDS) will give faster results and then needed some data analysis tool to make
meaningful of the data , get more insight of data. In this blog let me introduce Apache
Zeppelin for MDS data analysis. Zepplein can be installed anywhere (any public/private cloud
, On-Premises). |
What is MDS and Heatwave ?
Oracle MySQL Database
Service(MDS) is a fully managed database service that lets developers
quickly develop and deploy secure, cloud native applications using the
world’s most popular open source database. MySQL Database Service is 100% built, managed, and supported by the
OCI and MySQL engineering teams. more info:- https://www.oracle.com/mysql/ Heatwave:- https://www.oracle.com/mysql/heatwave MDS Business benefits :- https://mysqlsolutionsarchitect.blogspot.com/2022/02/understanding-mysql-database-servicemds.html |
What is Zeppelin ?
Zeppelin is web-based
notebook that enables data-driven, interactive data analytics and collaborative documents
with language like SQL, Scala, Python, R and more. Things
like data ingestion, data exploration, data visualization, and data analytics
can be done in the zeppelin notebook. Features of Apache Zeppelin v Data Ingestion. v Data Discovery v Data Analytics v Data
Visualization and Collaboration |
Network Consideration
Make sure the port of zeppelin 8080
is whitelisted in your environment
and in case you are using Oracle Cloud Infrastructure (OCI) then make
sure ingress rule is configured to white list the compute instance where
Zeppelin is installed and also compute instance is able to ping pong MDS
instance. To access the Zeppelin, always access with proper user with Interpreter access otherwise any user
can access your Zeppelin, Sometime anonymous user get into the portal of Zeppelin by entering
the Zeppelin public IP address but however if your interpreter is restricted
with particular user then your workspaces is safer. Hence, It is important to have data source authorization in Apache
Zeppelin https://zeppelin.apache.org/docs/0.10.0/setup/security/datasource_authorization.html |
Note:- I couldn’t dig much in details about security and my major focused
on where you are , how are you accessing the Zeppelin , how eill have seamless
experiences with MDS and HeatWave and do awesome visualization and data
analysis
Installation of Zeppelin
With below
specifications
Step 1
#Install JDK sudo yum install
java-11-openjdk-devel Step 2:- Download the Zeppelin by
using below command wget https://dlcdn.apache.org/zeppelin/zeppelin-0.10.0/zeppelin-0.10.0-bin-all.tgz Step 3 :- create user and give
the permission sudo adduser -d /home/opc/zeppelin -s /sbin/nologin zeppelin sudo chown -R zeppelin:zeppelin /home/opc/zeppelin Step 4:- rename zeppelin site
template to zeppelin site cd /home/opc/zeppelin/conf sudo cp zeppelin-site.xml.template zeppelin-site.xml sudo cp zeppelin-site.xml.template
zeppelin-site.xml Step 5 :- #Start Zeppelin |
Connect Zeppelin http://<IP Address of Compute Instance/Local IP>::8080/#/ |
Connect MDS and Heatwave through Zeppelin
Download MySQL Connector/J https://dev.mysql.com/downloads/connector/j/ warning: mysql-connector-java-8.0.28-1.el7.noarch.rpm: Header V4 RSA/SHA256
Signature, key ID 3a79bd29: NOKEY error:
Failed dependencies: java-headless >=
1:1.8.0 is needed by mysql-connector-java-1:8.0.28-1.el7.noarch [ |
Create a MySQL Interpreter #Create directory called mds
under zeppelin’s interpreter folder # mkdir mds #move the “mysql-connector-java.jar” file into MDS folder |
Navigate to the Interpreter http://<IP
Address of Compute Instance/Local IP>:8080/#/interpreter |
Search for MDS interpreter and
fill below details
|
Finally once changes are final
then it looks like below |
Access data from MDS and Heatwave
Please ensure MDS and
Heatwave is up and running.
Create a notebook and access MDS with Heatwave via Zeppelin
Awesome! , Zeppelin Connected to MDS
Let’s do data analysis by using zeppelin
In this demonstration assume that MDS with Heatwave is up and running
and data is loaded into HeatWave. If you wanted to follow quick start demo , how to load data into
HeatWave. Command to load first time data into the HeatWave:- run Auto Parallel Load to load
airportdb data into HeatWave: CALL sys.heatwave_load(JSON_ARRAY('airportdb'),
NULL); |
##Run Below SQL statement to generate report use airportdb; FROM booking, flight, airline, airport_geo WHERE booking.flight_id=flight.flight_id AND airline.airline_id=flight.airline_id
AND flight.from=airport_geo.airport_id
AND airport_geo.country
= "UNITED STATES" GROUP BY airline.airlinename ORDER BY nb_tickets desc, airline.airlinename limit 10; |
More
info about Heatwave:- https://www.oracle.com/mysql/heatwave/
Conclusion
Apache Zeppelin is a kind of tool, which makes Data Scientist life smooth, they can do everything they need in one place. Things like data ingestion, data exploration, data visualization, and data analytics can be done in the zeppelin notebook and MDS HeatWave is a massively parallel, high performance, in-memory query accelerator for Oracle MySQL Database Service that accelerates MySQL performance by orders of magnitude for analytics and mixed workloads and cost lesser than Specialist analytics product like Amazon Redshift, Aurora ,Snowflake ,Azure Synpase ,google Big Query etc.