Tuesday, November 8, 2022

Large File Import from Oracle Cloud Object Storage into MySQL Database Service(MDS) using Data Integration Service – Part 2

 Large File Import from Oracle Cloud Object
Storage into MySQL Database Service(MDS)
using Data Integration Service Part 2

In this tutorial, I will walk through about Importing .CSV file from Oracle Cloud Object Storage into MDS using OCI DI Service( Oracle Data Integration service)

There are multiple ways to import the backup/csv file from object storage to MySQL database service(MDS) , below are some of options you may try in case you have DBA background

[1.] Using Parallel Table Import Utility 

https://dev.mysql.com/doc/mysql-shell/8.0/en/mysql-shell-utilities-parallel-table.html

[2.] Mount the Object Storage using S3fs-fuse  S3 compatible API,  and upload the files into MDS. 

[3.] Using Oracle Cloud Service(Data Integration) Service. >> this blog is all about

When to use this approach - OCI Data Integration Service ?

Suppose you have large CSV file or else MySQL backup file or else you have database backup from multiple supported sources(RDS, MSSQL, PostgreSQL ,Aurora) and you wanted to import into MySQL database service(MDS) from object storage , then it’s easy to lift-shift the file into MySQL Database Service and run analytics and machine learning etc.

Supported data sources as below

https://docs.oracle.com/en-us/iaas/data-integration/using/data- assets.htm#data-sources

In case you wanted to know what is OCI Data Integration service (OCI DI) then below web link will have detailed info:- 

What is Data Integration Service ?

Oracle Cloud Infrastructure (OCI) Data Integration is a fully managed,

serverless, native cloud service that helps you with common extract, load,

and transform (ETL) tasks.

These tasks include ingesting data from various sources, cleansing,

transforming, and reshaping that data, and then efficiently loading it to

target data sources on OCI 

more info: https://www.oracle.com/integration/data-integration/

Let’s get started...

High level work flow to upload CSV file from object storage into MDS as below:-

  1. Make sure Object storage has CSV file- acting as Source .

  2. Make sure MDS has table with matching attribute as CSV file- acting

    as Target

  3. Make sure you have adequate policies defined to use data

    integration service.

  4. Create Data Integration task between source and target

  5. Run the task and validate the output

Create Required Policies to use for data integration service

By going to :- MenuAnalytics & AIData IntegrationWorkspaces








Design Data Integration Task for the Import

By going to :- MenuAnalytics & AIData IntegrationWorkspaces

Create data asset for Object storage 


Create data asset for MDS


 Create Data Flow

A data flow is a visual program representing the flow of data from source data assets, such as a database or flat file, to target data assets, such as a data lake or data warehouse.

Define Source details : CSV file details

Define Target details : MDS details

Make data flow between Source and Target 

hange the data type form varchar to INT , in case data type is having mismatch between source and target , always ensure both side have same data type to work the ETL.

For that attach EXPRESSION to the source and convert the required column data type if required or else let it be as it is , this step could be optional.
Other option is instead of changing data type on the sources you get imported data first and then due to the data type changing by “ALTER TABLE ...” command

Click On Target : target_MDS

click on the Map tab on the Properties panel. Drag and drop the fields with NO Auto in the mapping column (from left to right). Do it until they are all mapped. You can expand / shrink the Properties canvas. Once mapping is done make sure you click on “Validation” , ensure for ZERO error.


 Create data integration Task

Publish the Integration Task

Go to Projects on the home screen. >> Click on My First Project>> click

on Tasks>> Publish to Application.

Run the Task

Go back to the Home screen and click Applications and Run

Validate the output

Login to MySQL Database Service and Validate the output

MySQL 10.0.x.x:33060+ ssl demo SQL > select * from Employee; 

+-------+----------+
| empid | empname |
+-------+----------+

| 100 | chandan |
| 200 | chandani | 

+-------+----------+

2 rows in set (0.0007 sec)

That’s it !!! 

Conclusion

In this blog, you have seen that you can successfully setup an OCI Data Integration service in Oracle Cloud Infrastructure.

Utilising the steps, you can now create data assets, data flows to transform and move the data to desired targets.

It’s user friendly managed service which helps you to move off from different sources like AWS RDS for MySQL ,Aurora, Google cloud for MySQL ,MSSQL ,PostgreSQL into MySQL Database Service(MDS). 

 

 

 








 

 

 


 



 


cc

Saturday, October 29, 2022

Large File Upload from On-premises into Oracle Cloud Object Storage using REST API-Part 1

Large File Upload from On-premises into Oracle
Cloud Object Storage using REST API

In this tutorial(for users of mac os ,linux) , I will walk through about uploading .CSV file from on-premises to Oracle Cloud Object Storage using REST APIs.

When to use this approach ?

Suppose you have large CSV file or else MySQL backup file and you wanted to export into MySQL database service(MDS) , then it’s easy to lift-shift the backup file into Object Storage and from object storage to MySQL database service.

Next blog, we will explore how to import CSV file from Object Storage into MySQL Database Service(MDS).

In case you wanted to know what is object storage then below web link will have detailed info:-

What is Object Storage ?

more info:-

https://docs.oracle.com/en-us/iaas/Content/Object/Concepts/objectstorageoverview.htm

Let’s get started...
High level work flow as below:-

  1. Make sure cURL is install in your On-P machine.

  2. Make sure bucket is created in the object storage of Oracle Cloud

  3. Generate Authentication Token

  4. Generate Pre-Authenticated Requests in side the bucket of Oracle

    Cloud

  5. Run the cURL- PUT command from the On-Premises.

The Oracle Cloud Infrastructure Object Storage service is an internet-

scale, high-performance storage platform that offers reliable and cost-

efficient data durability. The Object Storage service can store an unlimited

amount of unstructured data of any content type, including analytic data

and rich content, like images and videos.

Verify cURL is install or not run curl help , in the command prompt or else install it.

Create bucket in the Oracle Cloud

Bucket is the location where CSV file will be exported from On-Premises. It is a container for storing objects in a compartment within an Object Storage namespace.
You may use for storing data, images, logs, and video etc.

Each object inside bucket is encrypted with its own data encryption key. Data encryption keys are always encrypted with a master encryption key that is assigned to the bucket.
Encryption is enabled by default and cannot be turned off. By default, Oracle manages the master encryption key.

Generate Authentication Token

By going to :- user settingsAuth tokensGenerate Token

Copy the Auth Token and save into text file and looks like below:-

Czhfc(g_8i#h{8CAW2Gr

Create a pre-Authenticate Request on Object Storage

Pre-authenticated requests provide a way to let you to access a bucket or an object without having their own credentials.
you continue to have access to the bucket or object as long as the creator of the request has permissions to access those resources.

Go to > storage BucketsCreate BucketPre-Authenticated Requests

Copy the URL as below it will look like

https://objectstorage.ap-mumbai- 1.oraclecloud.com/p/EcXPtgNE6vRxSwJB8buWVXpW-- KNUTviigbc0QlLcQ2IvJXqE5hrZQPC2hsoxkxt/n/idazzjlcjqzj/b/bucket-csv- file-upload/o/

Run the CURL command from On-Premises

cURL is one of the tools used to access the REST interface. cURL is an open source tool used for transferring data which supports various protocols including HTTP and HTTPS.

curl -v -X PUT -u '<user>:<token string from step 2>' --upload-file <local file location> <Pre-authenticated Request URL from step 3>

For example:#1. >> create Pre Authentication Request for Objects

curl -v -X PUT -u'chandan.d.kumar@oracle.com:Czhfc(g_8i#h{8CAW2Gr'
--upload-file /Users/cjkumar/Documents/Mac-key/bootcamp.pub
https://objectstorage.ap-mumbai-
1.oraclecloud.com/p/vDU3iX0C2yOKjNJCkc9yCp4CvYVCrY_FTDy1Yh6yVwqpIn
JO4Ve_R-nAefZHhWlx/n/idazzjlcjqzj/b/bucket-fileUpload/o/On-P-
MySQLbackupfile

For example:#2. >> create Pre Authentication Request for Bucket

curl -v -X PUT -u'chandan.d.kumar@oracle.com:Czhfc(g_8i#h{8CAW2Gr' --upload-file /Users/cjkumar/Downloads/employee.csv https://objectstorage.ap-mumbai- 1.oraclecloud.com/p/EcXPtgNE6vRxSwJB8buWVXpW-- KNUTviigbc0QlLcQ2IvJXqE5hrZQPC2hsoxkxt/n/idazzjlcjqzj/b/bucket-csv-file-upload/o/

Once done, remember to delete the pre-authenticated Request as it is no longer

needed.

Above screenshot show that , we have successfully uploaded CSV file from On-Premises to Oracle Cloud.

Next Blog , I will walk through how to import CSV file from Object storage into MySQL Database Service

Conclusion

Object storage is very easy way to store the data could be structured/unstructured from on-premises to oracle cloud and use with cloud native applications.
REST API uses cURL which simplifying the task of lift-shift from different sources into Oracle Cloud object storage.
REST API is used to manage containers and objects in the Oracle Storage Cloud Service instance.
Anyone can access the REST API from any application or programming platform that understands the Hypertext Transfer Protocol (HTTP) and has Internet connectivity. 

Tuesday, March 1, 2022

Analyze your MySQL Database Service(MDS) data with Zeppelin

 MDS-Heatwave Data Analysis using Zeppelin

The Complete Guide to access MDS with HeatWave by Zeppelin

 In this guide we will walk through how to install and access MDS with Heatwave.

1.       What is MDS and HeatWave ?

2.       What is Zeppelin  ?

3.       Environment details

4.       How to Install Zeppelin ?

5.       Network Consideration

6.       How to access HeatWave

7.       MDS Data Analysis using Zeppelin

8.       Conclusion

Overview of Zeppelin and MySQL Database Service with HeatWave

We are living in the data world and as the data is increasing with large velocity, it is important to get the results quicker until earlier MySQL has challenged to produce results against larger data size in faster way how ever there is “paradigm shift” on current solutions of MySQL and

 Now, MySQL (only in PaaS model of OCI) comes with HeatWave ,which is a distributed, scalable, shared-nothing, in-memory, hybrid columnar, query processing engine designed for extreme performance.

It is enabled when you add a heatwave cluster to a MySQL DB System.

So MySQL database service(MDS) will give faster results and  then needed some data analysis tool to make meaningful of the data , get more insight of data.

In this blog let me introduce Apache Zeppelin for MDS data analysis.

 Apache Zeppelin , an open source multipurpose notebook which help users to represent and analyze your data in the form of graphs or charts so that it help the organization to take quick decision.

 We will explore each item in details…

 Overall, My Idea is to show you quick demo , how easily you can connect MDS through Zeppelin.

Zepplein can be installed anywhere (any public/private  cloud  , On-Premises).

 What is MDS and Heatwave ?

Oracle MySQL Database Service(MDS) is a fully managed database service that lets developers quickly develop and deploy secure, cloud native applications using the world’s most popular open source database.

 MySQL Database Service is the only MySQL cloud service with an integrated, high performance, in-memory query accelerator - Heatwave. It enables customers to run sophisticated analytics directly against their operational MySQL databases—eliminating the need for complex, time-consuming, and expensive data movement and integration with a separate analytics database.

MySQL Database Service is 100% built, managed, and supported by the OCI and MySQL engineering teams.

more info:- https://www.oracle.com/mysql/

Heatwave:- https://www.oracle.com/mysql/heatwave

MDS Business benefits :-

https://mysqlsolutionsarchitect.blogspot.com/2022/02/understanding-mysql-database-servicemds.html

 

 What is Zeppelin ?

Zeppelin is web-based notebook that enables data-driven, interactive  data analytics and collaborative documents with language like SQL, Scala, Python, R and more.

Things like data ingestion, data exploration, data visualization, and data analytics can be done in the zeppelin notebook.

Features of Apache Zeppelin

v  Data Ingestion.

v  Data Discovery

v  Data Analytics

v  Data Visualization and Collaboration

 More info:- https://zeppelin.apache.org/

 Network Consideration

Make sure the port of zeppelin 8080 is whitelisted in your environment

and in case you are using Oracle Cloud Infrastructure (OCI) then make sure ingress rule is configured to white list the compute instance where Zeppelin is installed and also compute instance is able to ping pong MDS instance.

To access the Zeppelin, always access with proper user with Interpreter access otherwise any user can access your Zeppelin,

Sometime anonymous user get into the portal of Zeppelin by entering the Zeppelin public IP address but however if your interpreter is restricted with particular user then your workspaces is safer.

Hence, It is important to have data source authorization in Apache Zeppelin

https://zeppelin.apache.org/docs/0.10.0/setup/security/datasource_authorization.html




Note:- I couldn’t dig much in details about security and my major focused on where you are , how are you accessing the Zeppelin , how eill have seamless experiences with MDS and HeatWave and do awesome visualization and data analysis

Installation of Zeppelin

 In this blog, installation of Zeppelin will be on Oracle Cloud Infrastructure (https://www.oracle.com/in/cloud/)

 With below specifications




Step 1 #Install JDK

sudo yum install java-11-openjdk-devel

Step 2:- Download the Zeppelin by using below command

wget https://dlcdn.apache.org/zeppelin/zeppelin-0.10.0/zeppelin-0.10.0-bin-all.tgz

Step 3 :- create user and give the permission

sudo adduser -d /home/opc/zeppelin -s /sbin/nologin zeppelin

sudo chown -R zeppelin:zeppelin /home/opc/zeppelin

Step 4:- rename zeppelin site template to zeppelin site

cd /home/opc/zeppelin/conf

sudo cp zeppelin-site.xml.template zeppelin-site.xml

sudo cp zeppelin-site.xml.template zeppelin-site.xml

Step 5 :- #Start Zeppelin

Connect Zeppelin

http://<IP Address of Compute Instance/Local IP>::8080/#/



Connect MDS and Heatwave through Zeppelin

Download MySQL Connector/J

https://dev.mysql.com/downloads/connector/j/

 rpm -ivh  mysql-connector-java-8.0.28-1.el7.noarch.rpm

warning: mysql-connector-java-8.0.28-1.el7.noarch.rpm: Header V4 RSA/SHA256 Signature, key ID 3a79bd29: NOKEY

error: Failed dependencies:

        java-headless >= 1:1.8.0 is needed by mysql-connector-java-1:8.0.28-1.el7.noarch

[

 Fix:- yum -y install java-headless 

Create a MySQL Interpreter

#Create directory called mds under zeppelin’s interpreter folder

# mkdir mds

#move the “mysql-connector-java.jar” file into MDS folder

 #cp /usr/share/java/mysql-connector-java.jar /home/opc/zeppelin/interpreter/mds/

 

Navigate to the  Interpreter

http://<IP Address of Compute Instance/Local IP>:8080/#/interpreter

Search for MDS interpreter and fill below details



 






Finally once changes are final then it looks like below








Access data from MDS and Heatwave

Please ensure MDS and Heatwave is up and running.







Create a notebook and access MDS with Heatwave via Zeppelin













Awesome! , Zeppelin Connected to MDS

Let’s do data analysis by using zeppelin

In this demonstration assume that MDS with Heatwave is up and running and data is loaded into HeatWave.

If you wanted to follow quick start demo , how to load data into HeatWave.

https://docs.oracle.com/en-us/iaas/mysql-database/doc/heatwave.html#GUID-700248EF-4614-49CD-888F-920F4C66CD4C

 Schema used in the demo is “AirportDB” with 50 GB database size.

Command to load first time data into the HeatWave:-

run Auto Parallel Load to load airportdb data into HeatWave:

CALL sys.heatwave_load(JSON_ARRAY('airportdb'), NULL);












##Run Below SQL statement to generate report

use airportdb;

 SELECT airline.airlinename, SUM(booking.price) as price_tickets, count(*) as nb_tickets

FROM booking, flight, airline, airport_geo

WHERE booking.flight_id=flight.flight_id AND

airline.airline_id=flight.airline_id AND

flight.from=airport_geo.airport_id AND

airport_geo.country = "UNITED STATES"

GROUP BY

airline.airlinename

ORDER BY

nb_tickets desc, airline.airlinename limit 10;















More info about Heatwave:-  https://www.oracle.com/mysql/heatwave/

 Conclusion

Apache Zeppelin is a kind of tool, which makes Data Scientist life smooth, they can do everything they need in one place. Things like data ingestion, data exploration, data visualization, and data analytics can be done in the zeppelin notebook and MDS HeatWave is a massively parallel, high performance, in-memory query accelerator for Oracle MySQL Database Service that accelerates MySQL performance by orders of magnitude for analytics and mixed workloads and cost lesser than Specialist analytics product like Amazon Redshift, Aurora ,Snowflake ,Azure Synpase ,google Big Query etc.