Saturday, June 22, 2024

Building Modern Data Platform using Heatwave as Query Processing Engine


 Building Modern Data Platform using Heatwave as Query Processing Engine. 

In this tutorial we will explore about Heatwave as query processing engine as part of “modern data platform”. 

MySQL HeatWave is the only service that enables developers, database administrators, and data analysts to run OLTP, OLAP, and machine learning (ML) workloads directly from MySQL Database or else object storage/S3 as part of multi-cloud hybrid model 



 Current Challenges with traditional Data Warehouse Solutions 

In the contemporary landscape of data-driven decision-making, organizations face numerous challenges in effectively managing, analysing, and deriving insights from the vast volumes of data available to them. 


Traditional data management approaches often prove inadequate in addressing the evolving needs of businesses, including the 

need for real-time insights, scalability, and interoperability across diverse data sources and formats. 

“Noisy neighbours” multiple databases for different kind of workloads which includes RDBMS for OLTP , specialize analytics for OLAP, ML tools for machine learning etc,additional cost ,complexities , ETL jobs etc. 

Furthermore, siloed data environments hinder collaboration and inhibit the ability to derive holistic insights from disparate datasets. 


In light of these challenges, there is a growing imperative for organizations to adopt a modern data platform that can seamlessly integrate, process, and analyze data from various sources while ensuring security, governance, and compliance. 

Such a platform should enable organizations to harness the full potential of their data assets to drive innovation, improve operational efficiency, and gain a competitive edge in the marketplace. 


 Key considerations in designing and implementing a modern data platform 

Key considerations in designing and implementing a modern data platform include - 

  • scalability to handle growing data volumes, 
  • flexibility to accommodate diverse data types and sources, 
  • interoperability to support integration with existing systems and technologies, 
  • real-time processing capabilities for timely insights, 
  • Competitive price/performance 
  • No application level changes 
  • robust security measures to protect sensitive data, and comprehensive governance frameworks to ensure regulatory compliance and data quality. 

Addressing these challenges and requirements with Heatwave Service 

Addressing these challenges and requirements demands a strategic approach to building a modern data platform that aligns with the organization's goals, fosters a data-driven culture, and empowers users across the enterprise to derive actionable insights from data. 

I recommend Heatwave as query processing engine ,which is in-memory hybrid columnar data store with multi-node ,more core parallel processing model processes the partitioned data in parallel with machine learning based workload optimization using Oracle AUTO ML . 

Today, Heatwave can immediately replace five different services of AWS with single service of Heatwave to drive workload which require solutions for transactional processing ,analytical processing, machine learning,lakehouse,GenAI. 

In AWS, organization depends on RDS/Aurora for OLTP , Redshift for analytics ,Sage maker for ML etc and multiple services of AWS increases cost ,complexity and noisy neighbours. 


Key technical feature makes Heatwave way Faster 

There are multiple technical features which accelerate heatwave query performance, some of those are. 


1. In-Memory Hybrid Columnar Format 

HeatWave stores data in main memory in a hybrid columnar format. Hybrid columnar format enables the use of efficient query processing algorithms designed to operate on fixed-width data, and permits vectorized query processing. 


2. Push based vectorized data processing model 


HeatWave processes queries by pushing vector blocks (slices of columnar data) through the query execution plan from one operator to another. A push-based execution model avoids deep call stacks and saves valuable resources compared to tuple-based processing models. 


3. Massive Parallel Architecture 


The HeatWave massively parallel architecture uses internode and intranode partitioning of data. Each node within a HeatWave Cluster, and each CPU core within a node, processes the partitioned data in parallel. 

This massively parallel architecture, combined with high-fanout, workload-aware partitioning, accelerates query processing. 

4. ML-based Workload Optimization 


MySQL Autopilot provides workload-aware, machine learning–powered automation. It improves performance and scalability without requiring database 

tuning expertise, increases the productivity of developers and DBAs, and helps eliminate human errors. MySQL Autopilot automates many of the most important and often challenging aspects of achieving high query performance at scale—including provisioning, data loading, query execution, and failure handling. 

5. Scale out data management 


When analytics data is loaded into HeatWave, the HeatWave Storage Layer automatically persists the data for pause and resume of the HeatWave Cluster and for fast recovery in case of a HeatWave node or cluster failure. Data is automatically restored by the HeatWave Storage Layer when the HeatWave Cluster resumes after a pause or recovers a failed node or cluster. This automated, self-managing storage layer scales to the size required for your HeatWave Cluster and operates independently in the background. 

MySQL HeatWave Service persists the data to OCI Object Storage. 

MySQL HeatWave on AWS persists the data to AWS S3. 


Architecture for Modern Data Platform with Heatwave Service 

Below architecture depicts one of “modern data platform “on AWS 

  1. Input Source:- where multiple data sources streamlined the raw data into S3 as primary storage and then data is extracted to curated layer for further data processing. Tools you may use for data streaming -Solace , Kafka ,Golden Gate , OCI data Integration service. 
  2. Query Processing Layer: Heatwave accept data automatically from S3 by calling the auto-pilot stored procedure first time and then automatically data streamlined into heatwave by the system. 
  3. Data Visualization Layer:- Processed data can be used for further insights and apply business analysis to be used by external application or else we can understand the data by using various data visualization technique – OAC,looker,tableau etc. 


Below architecture is an example which caters to multi-zone ,active-active/passive heatwave cluster with DR site utilizing multiple availability zone/regions of AWS. 

















Below architecture is an example which caters to multi-zone ,active-active/passive heatwave cluster with DR site utilizing multiple availability zone/regions of OCI.















What kind of use case Heatwave is suitable for ? 

Use HeatWave service to eliminate the complexity, latency, costs, 

and risks of using separate cloud services for transactions, 

analytics, GenAI and ML. This use case will be most effective in 

accounts with MySQL on-premises, including Azure and AWS 

customers who have MySQL estates on-premises. 

Common use cases: 

  1. Digital Payment and Retail PoS applications 
  2. Fraud detection 
  3. Digital Marketing 
  4. Logistics applications and E-commerce app 
  5. ‘content management 

Some of additional areas you may focus on to :- 

  1. Build cloud native applications 
  2. Move existing OLTP/OLAP workloads 
  3. Run mixed kind of workloads where real-time analytics require 
  4. Power SaaS applications 
  5. Anomaly detection ,classification, time-series ML applications 

Heatwave Case Studies : https://www.mysql.com/products/mysqlheatwave/case-studies/

 

Conclusion 

By investing in a modern data platform, organizations can unlock the full potential of their data assets and drive sustainable growth and innovation in today's data-driven world 

MySQL HeatWave is the only cloud service that combines transactions, realtime analytics across data warehouses and data lakes, and machine learning in one MySQL database—without the complexity, latency, risks, and cost of ETL duplication. It delivers unmatched performance and price-performance. HeatWave AutoML enables native, in-database machine learning, allowing users to build, train, deploy, and explain machine learning models inside MySQL, without machine learning expertise. 

MySQL HeatWave can be deployed in OCI, AWS, Azure, in a hybrid environment, and in customers’ data centers with OCI Dedicated Region