Change Data Capture (CDC) is a typical use case in real-time data warehousing. It tracks the change log (binlog) of a relational (OLTP) database and replays those changes promptly to an external store, such as Delta or Kudu, for real-time OLAP.

Building a robust CDC streaming pipeline involves many considerations: how to ensure data accuracy, how to handle schema changes in the OLTP source, and how to support a variety of databases with minimal code. This talk shares practical experience simplifying CDC pipelines with Spark Streaming SQL and Delta Lake.
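The core of any CDC pipeline is replaying an ordered change log onto a target table. As a conceptual illustration only (plain Python rather than Spark Streaming SQL, with event shapes and field names that are my own assumptions, not the talk's), the replay logic looks roughly like this:

```python
# Conceptual sketch of replaying binlog-style change events onto a target
# table -- the core idea behind a CDC pipeline. Event shapes and field
# names here are illustrative assumptions.

def apply_change_events(target, events):
    """Replay ordered CDC events (insert/update/delete) onto a target.

    target: dict mapping primary key -> row dict
    events: iterable of {"op": ..., "key": ..., "row": ...}
    """
    for event in events:
        op, key = event["op"], event["key"]
        if op in ("insert", "update"):
            target[key] = event["row"]        # upsert the latest row image
        elif op == "delete":
            target.pop(key, None)             # tolerate already-deleted keys
        else:
            raise ValueError(f"unknown op: {op}")
    return target

table = {}
log = [
    {"op": "insert", "key": 1, "row": {"id": 1, "name": "alice"}},
    {"op": "update", "key": 1, "row": {"id": 1, "name": "alicia"}},
    {"op": "insert", "key": 2, "row": {"id": 2, "name": "bob"}},
    {"op": "delete", "key": 2, "row": None},
]
apply_change_events(table, log)
print(table)  # {1: {'id': 1, 'name': 'alicia'}}
```

In a real pipeline the same upsert/delete semantics would be expressed as a streaming merge into a Delta table rather than an in-memory dict.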

Adam Marczak explains Azure Data Factory Mapping Data Flow in this video.

With Azure Data Factory Mapping Data Flow, you can create fast, scalable, on-demand transformations through a visual user interface. In just minutes you can leverage the power of Spark without writing a single line of code.

In this episode I give you an introduction to what Mapping Data Flow for Data Factory is and how it can solve your day-to-day ETL challenges. In a short demo I consume data from blob storage, transform movie data, aggregate it, and save multiple outputs back to blob storage.

Sample code and data: https://github.com/MarczakIO/azure4everyone-samples/tree/master/azure-data-factory-mapping-data-flows 
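To make the demo's flow concrete, here is what its transformation steps amount to, expressed as a plain-Python sketch rather than the visual designer (the column names and thresholds are my illustrative assumptions, not taken from the sample data):

```python
# Plain-Python sketch of the kind of transformation the visual flow builds:
# ingest movie rows, aggregate ratings per genre, and emit multiple outputs.
# Column names ("genre", "rating") and the 7.5 cutoff are assumptions.

movies = [
    {"title": "Movie A", "genre": "Drama",  "rating": 7.5},
    {"title": "Movie B", "genre": "Drama",  "rating": 8.0},
    {"title": "Movie C", "genre": "Comedy", "rating": 6.0},
]

# Aggregate: average rating per genre (one output of the flow).
by_genre = {}
for m in movies:
    by_genre.setdefault(m["genre"], []).append(m["rating"])
avg_rating = {g: sum(r) / len(r) for g, r in by_genre.items()}

# Conditional split: a second output containing only highly rated titles.
top_rated = [m["title"] for m in movies if m["rating"] >= 7.5]

print(avg_rating)  # {'Drama': 7.75, 'Comedy': 6.0}
print(top_rated)   # ['Movie A', 'Movie B']
```

In Mapping Data Flow, the aggregate and the conditional split are dragged onto the canvas as transformation steps; the point of the tool is that this logic runs on Spark without you writing the code above.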

Ayman El-Ghazali recently presented this introduction to Databricks from the perspective of a SQL DBA at the NoVA SQL Users Group.

This is an introduction to Databricks from the perspective of a SQL DBA. Come learn about the following topics:

  • Basics of how Spark works
  • Basics of how Databricks works (cluster setup, basic admin)
  • How to design and code an ETL Pipeline using Databricks
  • How to read/write from Azure Datalake and Database
  • Integration of Databricks into Azure Data Factory pipeline
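The extract-transform-load structure such a pipeline follows can be sketched in plain Python. This is a stand-in only: file-like objects play the role of Azure Data Lake and the database sink, where a Databricks notebook would use Spark's reader and writer APIs instead. The column names are hypothetical.

```python
# Minimal extract-transform-load skeleton. StringIO objects stand in for
# Azure Data Lake (source) and a database table (sink); column names are
# illustrative assumptions.

import csv, io

def extract(source):
    """Read raw CSV rows from the source (stand-in for the data lake)."""
    return list(csv.DictReader(source))

def transform(rows):
    """Clean and enrich: drop rows with no amount, normalize names, cast types."""
    out = []
    for r in rows:
        if not r["amount"]:
            continue  # discard incomplete records
        out.append({"customer": r["customer"].strip().title(),
                    "amount": float(r["amount"])})
    return out

def load(rows, sink):
    """Write transformed rows to the sink (stand-in for a database table)."""
    writer = csv.DictWriter(sink, fieldnames=["customer", "amount"])
    writer.writeheader()
    writer.writerows(rows)

raw = io.StringIO("customer,amount\n alice ,10.5\nbob,\ncarol,3\n")
sink = io.StringIO()
load(transform(extract(raw)), sink)
print(sink.getvalue())
```

Keeping extract, transform, and load as separate functions mirrors how the stages map onto separate notebook cells or Data Factory activities.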

Code available at:  https://github.com/thesqlpro/blog

Databricks recently streamed this tech chat on SCD, or Slowly Changing Dimensions.

We will discuss a popular online analytical processing (OLAP) fundamental – slowly changing dimensions (SCD) – specifically Type 2.

As we have discussed in various other Delta Lake tech talks, the reliability that Delta Lake brings to data lakes has driven a resurgence of data warehousing fundamentals, such as Change Data Capture, in data lakes.

Type 2 SCD within data warehousing allows you to keep track of both historical and current data over time. We will discuss how to apply these concepts to your data lake in the context of market segmentation for a climbing eCommerce site.
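The bookkeeping behind Type 2 SCD is simple to state: never update a dimension row in place; instead, expire the old version and append a new current one. A plain-Python illustration of that idea (in a Delta Lake pipeline this is typically expressed as a MERGE; the field names here are my assumptions):

```python
# Plain-Python illustration of Type 2 SCD bookkeeping: close the old
# version of a row and append a new current one, preserving history.
# Field names (is_current, start_date, end_date) are illustrative.

def scd2_upsert(dim, key, attrs, as_of):
    """Apply a change to a Type 2 dimension table (list of row dicts)."""
    for row in dim:
        if row["key"] == key and row["is_current"]:
            if row["attrs"] == attrs:
                return dim                  # no change, nothing to do
            row["is_current"] = False       # expire the old version
            row["end_date"] = as_of
    dim.append({"key": key, "attrs": attrs, "start_date": as_of,
                "end_date": None, "is_current": True})
    return dim

dim = []
scd2_upsert(dim, "cust-1", {"segment": "casual"}, "2020-01-01")
scd2_upsert(dim, "cust-1", {"segment": "pro"}, "2020-06-01")

current = [r for r in dim if r["is_current"]]
print(len(dim), current[0]["attrs"])  # 2 {'segment': 'pro'}
```

Queries against the dimension then filter on `is_current` for the latest view, or on the date range to reconstruct the segment a customer belonged to at any point in time.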


ThorogoodBI explores the use of Databricks for data engineering purposes in this webinar.

Whether you’re looking to transform and clean large volumes of data or collaborate with colleagues to build advanced analytics jobs that can be scaled and run automatically, Databricks offers a Unified Analytics Platform that promises to make your life easier.

In the second of two recorded webcasts, Thorogood consultants Jon Ward and Robbie Shaw showcase Databricks' data transformation and data movement capabilities, show how the tool aligns with cloud computing services, and highlight the security, flexibility, and collaboration aspects of Databricks. We'll also look at Databricks Delta Lake, and how it offers improved storage for both large-scale datasets and real-time streaming data.

In this video, Chris Seferlis continues discussing the Modern Data Platform in Azure with Part 3: Data Processing.

Tools Discussed:

Learn what Spark is and how to use it in Big Data Clusters.

Time index

  • [00:00] Introduction
  • [00:30] One-sentence definition of Spark
  • [00:47] Storing Big Data
  • [01:44] What is Spark?
  • [02:35] Language choice
  • [03:27] Unified compute engine
  • [04:57] Spark with SQL Server
  • [05:47] Learning more
  • [06:10] Wrap-up