Tuesday, March 1, 2022

Analyze your MySQL Database Service(MDS) data with Zeppelin

 MDS-Heatwave Data Analysis using Zeppelin

The Complete Guide to access MDS with HeatWave by Zeppelin

 In this guide we will walk through how to install and access MDS with Heatwave.

1.       What is MDS and HeatWave ?

2.       What is Zeppelin  ?

3.       Environment details

4.       How to Install Zeppelin ?

5.       Network Consideration

6.       How to access HeatWave

7.       MDS Data Analysis using Zeppelin

8.       Conclusion

Overview of Zeppelin and MySQL Database Service with HeatWave

We are living in the data world and as the data is increasing with large velocity, it is important to get the results quicker until earlier MySQL has challenged to produce results against larger data size in faster way how ever there is “paradigm shift” on current solutions of MySQL and

 Now, MySQL (only in PaaS model of OCI) comes with HeatWave ,which is a distributed, scalable, shared-nothing, in-memory, hybrid columnar, query processing engine designed for extreme performance.

It is enabled when you add a heatwave cluster to a MySQL DB System.

So MySQL database service(MDS) will give faster results and  then needed some data analysis tool to make meaningful of the data , get more insight of data.

In this blog let me introduce Apache Zeppelin for MDS data analysis.

 Apache Zeppelin , an open source multipurpose notebook which help users to represent and analyze your data in the form of graphs or charts so that it help the organization to take quick decision.

 We will explore each item in details…

 Overall, My Idea is to show you quick demo , how easily you can connect MDS through Zeppelin.

Zepplein can be installed anywhere (any public/private  cloud  , On-Premises).

 What is MDS and Heatwave ?

Oracle MySQL Database Service(MDS) is a fully managed database service that lets developers quickly develop and deploy secure, cloud native applications using the world’s most popular open source database.

 MySQL Database Service is the only MySQL cloud service with an integrated, high performance, in-memory query accelerator - Heatwave. It enables customers to run sophisticated analytics directly against their operational MySQL databases—eliminating the need for complex, time-consuming, and expensive data movement and integration with a separate analytics database.

MySQL Database Service is 100% built, managed, and supported by the OCI and MySQL engineering teams.

more info:- https://www.oracle.com/mysql/

Heatwave:- https://www.oracle.com/mysql/heatwave

MDS Business benefits :-

https://mysqlsolutionsarchitect.blogspot.com/2022/02/understanding-mysql-database-servicemds.html

 

 What is Zeppelin ?

Zeppelin is web-based notebook that enables data-driven, interactive  data analytics and collaborative documents with language like SQL, Scala, Python, R and more.

Things like data ingestion, data exploration, data visualization, and data analytics can be done in the zeppelin notebook.

Features of Apache Zeppelin

v  Data Ingestion.

v  Data Discovery

v  Data Analytics

v  Data Visualization and Collaboration

 More info:- https://zeppelin.apache.org/

 Network Consideration

Make sure the port of zeppelin 8080 is whitelisted in your environment

and in case you are using Oracle Cloud Infrastructure (OCI) then make sure ingress rule is configured to white list the compute instance where Zeppelin is installed and also compute instance is able to ping pong MDS instance.

To access the Zeppelin, always access with proper user with Interpreter access otherwise any user can access your Zeppelin,

Sometime anonymous user get into the portal of Zeppelin by entering the Zeppelin public IP address but however if your interpreter is restricted with particular user then your workspaces is safer.

Hence, It is important to have data source authorization in Apache Zeppelin

https://zeppelin.apache.org/docs/0.10.0/setup/security/datasource_authorization.html




Note:- I couldn’t dig much in details about security and my major focused on where you are , how are you accessing the Zeppelin , how eill have seamless experiences with MDS and HeatWave and do awesome visualization and data analysis

Installation of Zeppelin

 In this blog, installation of Zeppelin will be on Oracle Cloud Infrastructure (https://www.oracle.com/in/cloud/)

 With below specifications




Step 1 #Install JDK

sudo yum install java-11-openjdk-devel

Step 2:- Download the Zeppelin by using below command

wget https://dlcdn.apache.org/zeppelin/zeppelin-0.10.0/zeppelin-0.10.0-bin-all.tgz

Step 3 :- create user and give the permission

sudo adduser -d /home/opc/zeppelin -s /sbin/nologin zeppelin

sudo chown -R zeppelin:zeppelin /home/opc/zeppelin

Step 4:- rename zeppelin site template to zeppelin site

cd /home/opc/zeppelin/conf

sudo cp zeppelin-site.xml.template zeppelin-site.xml

sudo cp zeppelin-site.xml.template zeppelin-site.xml

Step 5 :- #Start Zeppelin

Connect Zeppelin

http://<IP Address of Compute Instance/Local IP>::8080/#/



Connect MDS and Heatwave through Zeppelin

Download MySQL Connector/J

https://dev.mysql.com/downloads/connector/j/

 rpm -ivh  mysql-connector-java-8.0.28-1.el7.noarch.rpm

warning: mysql-connector-java-8.0.28-1.el7.noarch.rpm: Header V4 RSA/SHA256 Signature, key ID 3a79bd29: NOKEY

error: Failed dependencies:

        java-headless >= 1:1.8.0 is needed by mysql-connector-java-1:8.0.28-1.el7.noarch

[

 Fix:- yum -y install java-headless 

Create a MySQL Interpreter

#Create directory called mds under zeppelin’s interpreter folder

# mkdir mds

#move the “mysql-connector-java.jar” file into MDS folder

 #cp /usr/share/java/mysql-connector-java.jar /home/opc/zeppelin/interpreter/mds/

 

Navigate to the  Interpreter

http://<IP Address of Compute Instance/Local IP>:8080/#/interpreter

Search for MDS interpreter and fill below details



 






Finally once changes are final then it looks like below








Access data from MDS and Heatwave

Please ensure MDS and Heatwave is up and running.







Create a notebook and access MDS with Heatwave via Zeppelin













Awesome! , Zeppelin Connected to MDS

Let’s do data analysis by using zeppelin

In this demonstration assume that MDS with Heatwave is up and running and data is loaded into HeatWave.

If you wanted to follow quick start demo , how to load data into HeatWave.

https://docs.oracle.com/en-us/iaas/mysql-database/doc/heatwave.html#GUID-700248EF-4614-49CD-888F-920F4C66CD4C

 Schema used in the demo is “AirportDB” with 50 GB database size.

Command to load first time data into the HeatWave:-

run Auto Parallel Load to load airportdb data into HeatWave:

CALL sys.heatwave_load(JSON_ARRAY('airportdb'), NULL);












##Run Below SQL statement to generate report

use airportdb;

 SELECT airline.airlinename, SUM(booking.price) as price_tickets, count(*) as nb_tickets

FROM booking, flight, airline, airport_geo

WHERE booking.flight_id=flight.flight_id AND

airline.airline_id=flight.airline_id AND

flight.from=airport_geo.airport_id AND

airport_geo.country = "UNITED STATES"

GROUP BY

airline.airlinename

ORDER BY

nb_tickets desc, airline.airlinename limit 10;















More info about Heatwave:-  https://www.oracle.com/mysql/heatwave/

 Conclusion

Apache Zeppelin is a kind of tool, which makes Data Scientist life smooth, they can do everything they need in one place. Things like data ingestion, data exploration, data visualization, and data analytics can be done in the zeppelin notebook and MDS HeatWave is a massively parallel, high performance, in-memory query accelerator for Oracle MySQL Database Service that accelerates MySQL performance by orders of magnitude for analytics and mixed workloads and cost lesser than Specialist analytics product like Amazon Redshift, Aurora ,Snowflake ,Azure Synpase ,google Big Query etc.

Monday, February 7, 2022

Understanding MySQL Database Service(MDS): Managed MySQL, Focus on your Business

In this blog , let understand about key business benefit of using MySQL Database Service.

before that let's understand

What is MySQL database service (MDS)?

MySQL Database Service is a fully managed Oracle Cloud Infrastructure native service, 100% developed, managed, and supported by the MySQL team in Oracle.

Oracle automates all tasks such as high availability, backup and recovery, database and operating system patching,version upgrades and so on. 

You are responsible solely for managing your data, schema designs, and access policies.

MDS is only MySQL managed service with MySQL Enterprise Edition and using integrated massively-scalable query accelerator with Heatwave. 

More info:- https://www.oracle.com/mysql/

Why is MySQL Database Service (MDS) ?

 Understanding Key Business benefit with MySQL Database Service(MDS) with Heatwave.














List of Business benefit of using MySQL Database Service (MDS)

 ¨    Extreme Performance with HeatWave 

HeatWave is a distributed, scalable, shared-nothing, in-memory, hybrid columnar, query processing engine designed for extreme performance for MySQL database service (MDS).

It is enabled when you add a HeatWave cluster to a MySQL DB System.

 A HeatWave cluster includes a MySQL DB System node and two or more HeatWave nodes.

The MySQL DB System node has a HeatWave plugin that is responsible for cluster management, loading data into the HeatWave cluster, query scheduling, and returning query results to the MySQL DB System.

HeatWave nodes store data in memory and process analytics queries.

 More info:- https://dev.mysql.com/doc/heatwave/en/heatwave-introduction.html

 

Advantage of using HeatWave to accelerate the performance of MySQL Database

 

Without HeatWave

 

With HeatWave

ETL(Extract Transform Load) Required

No ETL(Extract Transform Load) Required

 

Application level changes required in syntax and semantics

No Change in Application

Multiple databases for OLAP and OLTP workloads

Single MySQL Database Service for all applications (OLTP , OLAP)

Can be prone to Data loss and inconsistency

 

No chance of data loss because data intact with MySQL database service(MDS).

Lower Performance

 

Extreme Performance

-          5400x faster than Amazon RDS

-          1400x faster than Amazon Aurora

-          6.5x Faster than Amazon Redshift

Higher Cost

Cheaper Cost

-          2/3 cost of Amazon RDS

-          ½ cost of Amazon Aurora

-          ½ cost of Amazon Redshift

Without HeatWave

With Heatwave

Performance benchmark available:- https://www.mysql.com/why-mysql/benchmarks/

¨   MDS HA(High Availability)

MDS HA is single clicked high availability solution which ensure fault tolerant system with automatic failover and zero data loss.

 Key business benefit of MDS HA include

·       Single Click High Availability

·       Automatic Failover

·       Increase Uptime

·       Zero Data Loss during failure event

·       Application will never experience down time due to MySQL node failures.

 Key product benefit of MDS HA include

·       Built on Group Replication

-          Provides distributed state machine replication with strong coordination between servers.

-          Servers automatically coordinate using Paxos based protocol

·       Plugin that enables you to create elastic, highly-available, fault-tolerant replication topologies.

·       Proven MySQL technology (binary log & replication)

·       Developed and Supported by MySQL Team





More info:- https://dev.mysql.com/doc/refman/8.0/en/group-replication.html

 ¨   MDS Security Features and Compliance

In Today’s digital world data is most valuable asset and database security has become top priority for the stakeholders.

If you lose your data, it can have serious impact in the business, possibly we may loss the brand.

That is why we needed to ensure that whatever in our databases, it should be fully protected.

Stay secure with MDS, the latest MySQL security fixes from the MySQL team are automatically applied to limit your exposure to security vulnerabilities.

 Also, MDS is the only public cloud service built on MySQL Enterprise Edition which comes with comprehensive enterprise security features like

·       Transparent Data Encryption(TDE) :-

-          Data at rest encryption

-          AES 256 encryption

·       MySQL Enterprise Encryption

-          Data at rest encryption

-          AES 256 encryption

·       MySQL Data Masking

-          De-identify , Anoymize Sensitive data

-          Reduce risk of data breach

·       MySQL Enterprise Firewall

-          Block SQL injection attacks

-          Intrusion detection

·       MySQL Enterprise Audit

-          Tracking the use of database records

-          Monitoring each operation on data

 

More info:- https://dev.mysql.com/doc/refman/8.0/en/mysql-enterprise-security.html

 ¨   MDS Cost Less than AWS RDS , Aurora

Amazon RDS is ~266% more expensive than MySQL Database Service for standard compute shapes.

In fact OCI(compute, storage, networking) is also cheaper with better underlying hardware.

 

In case you wanted to know in advance how much MDS cost you , please enter specific details in the cost estimator tool here:- https://www.oracle.com/cloud/costestimator.html

 

 

 ¨   MySQL Enterprise Support – 24x7 from MySQL team

MySQL Technical Support Services provide direct access to our expert MySQL Support engineers who are ready to assist you in the development, deployment, and management of MySQL applications.

 A vast majority of the problems the MySQL Support Engineers encounter, they have seen before. So an issue that could take several weeks for your staff to research and resolve, may be solved in a matter of hours by the MySQL Support team.

       Largest MySQL engineering and support organization

       Backed by the MySQL developers

       World-class support, in 29 languages

       Hot fixes & maintenance releases

       Consultative support

More info:- https://www.mysql.com/support/

 

 Conclusion

Oracle MySQL Database Service is a fully managed database service that lets developers quickly develop and deploy secure, cloud native applications using the world’s most popular open source database. MySQL Database Service is the only MySQL cloud service with an integrated, high performance, in-memory query accelerator—HeatWave.

 It enables customers to run sophisticated analytics directly against their operational MySQL databases-eliminating the need for complex, time-consuming, and expensive data movement and integration with a separate analytics database.

HeatWave accelerates MySQL performance by orders of magnitude for analytics and mixed workloads. Optimized for Oracle Cloud Infrastructure (OCI), MySQL Database Service is 100% built, managed, and supported by the OCI and MySQL engineering teams.




Friday, January 21, 2022

Glimpse of MySQL 8.0.28 Release

 MySQL 8.0.28 version is GA!!! , ready to use in production.

Below is high level overview , more can be viewed from the weblink.

[1] Disabling Audit Logging for all connecting and connected sessions

SET GLOBAL audit_log_disable = true;

Setting audit_log_disable to true disables the audit log plugin. The plugin is re-enabled when audit_log_disable is set back to false, which is the default setting.


[2]- [Under Performance Schema] A new statement metric, CPU_TIME, is now available, enabling you to measure the CPU time spent on a query.

This is helpful to troubleshoot performance of SQL statements specially hardware resource utilize by the SQL statement , below is the there table you can use to monitor the bottlenecks

[3]- InnoDB now supports ALTER TABLE ... RENAME COLUMN operations using ALGORITHM=INSTANT

Operations that support ALGORITHM=INSTANT only modify metadata in the data dictionary. No exclusive metadata locks are taken on the table during preparation and execution phases of the operation, and table data is unaffected, making the operations instantaneous. If not specified explicitly, ALGORITHM=INSTANT is used by default by DDL operations that support it.

[4]- Replication: When the PAD_CHAR_TO_FULL_LENGTH SQL mode was enabled on a replica server, trailing spaces could be added to a replication channel’s name in the replication metadata repository tables, resulting in errors in replication operations that identified the channel using that data. The issue has now been fixed in MySQL 8.0 by using VARCHAR for character columns, and in MySQL 5.7 by disabling the SQL mode when reading from those tables. Thanks to Brian Yue for the contribution. (Bug #33213841)

[5.]Producing a per-table dump using mysqldump in MySQL 5.7 and 8.0 requires a longer execution time compared to MySQL 5.6. This is because the information_schema.files table, which is queried for information on log file groups by mysqldump, contains information about InnoDB data files as well as NDB data files from MySQL 5.7.
In MySQL 8.0 , the issue has been fixed by rewriting the query to select only the appropriate data files. In MySQL 5.7, Information Schema tables do not have indexes, so a full table scan is still required. (Bug #29210990, Bug #93875)