SQL Server and BI to Delta Lake and Lakehouses

Considering shifting gears into Spark Data Engineering? Join this fun session with Simon Whiteley (@mrsiwhiteley) and Denny Lee (@dennylee) as they chat through their meandering...
Making Apache Spark Better with Delta Lake

Join Michael Armbrust, head of the Delta Lake engineering team, to learn how his team built upon Apache Spark to bring ACID transactions and other...
Healthcare Claim Reimbursement using Apache Spark

Here’s an interesting video on using Databricks to increase the efficiency of healthcare claim reimbursements.
How to Load Data Using Petastorm

Petastorm is an open-source data access library. It enables single-node or distributed training and evaluation of deep learning models directly from datasets in...
The Apache Spark File Format Ecosystem

It’s all too easy to overlook the importance of storage and IO in the performance and optimization of Spark jobs. However, the choice of file...
Scalable Acceleration of XGBoost Training on Apache Spark GPU Clusters

XGBoost is one of the most popular machine learning libraries, and its Spark integration enables distributed training on a cluster of servers. This talk will...
Deep Dive into GPU Support in Apache Spark 3.x

GPU support in Apache Spark presents massive opportunities for significant speedup of ETL, ML and DL applications. Here’s a great video by Databricks on the...
Simplify CDC Pipeline with Spark Streaming SQL and Delta Lake

Change Data Capture (CDC) is a typical use case in real-time data warehousing. It tracks the data change log (binlog) of a relational database (OLTP),...
Deep Dive into the New Features of Apache Spark 3.0

Databricks provides an in-depth look at the new features of Spark 3.0.
Introducing Apache Spark 3.0

Here’s a keynote from Matei Zaharia, the original creator of Apache Spark, that contains a retrospective of the last 10 years and a look forward to...