Automate Data Pipelines with PySpark SQL

Are you struggling with your cloud data management costs and architecture? Are you looking for ways to accelerate your data engineering capacity? By leveraging an...
SQL Server and BI to Delta Lake and Lakehouses

Considering shifting gears into Spark Data Engineering? Join this fun session with Simon Whiteley (@mrsiwhiteley) and Denny Lee (@dennylee) as they chat through their meandering...
Encoding multi-layered Vega-Lite COVID-19 Geodata visualizations

Visualizations are a powerful tool for communicating results to end-users and stakeholders. Their development and life-cycle management are no less challenging than the underlying processes...
Making Apache Spark Better with Delta Lake

Join Michael Armbrust, head of the Delta Lake engineering team, to learn how his team built upon Apache Spark to bring ACID transactions and other...
Scaling Up AI Research to Production with PyTorch and MLFlow

PyTorch, the popular open-source ML framework, has continued to evolve rapidly since the introduction of PyTorch 1.0, which brought an accelerated workflow from research to...
Healthcare Claim Reimbursement using Apache Spark

Here’s an interesting video on using Databricks to increase the efficiency of healthcare claim reimbursements.
Scalable Acceleration of XGBoost Training on Apache Spark GPU Clusters

XGBoost is one of the most popular machine learning libraries, and its Spark integration enables distributed training on a cluster of servers. This talk will...
Deep Dive into GPU Support in Apache Spark 3.x

GPU support in Apache Spark presents massive opportunities for significant speedups of ETL, ML, and DL applications. Here’s a great video by Databricks on the...
Introducing MLflow for End-to-End Machine Learning on Databricks

Solving a data science problem is about more than making a model. It entails data cleaning, exploration, modeling and tuning, production deployment, and workflows governing...
Simplify CDC Pipeline with Spark Streaming SQL and Delta Lake

Change Data Capture (CDC) is a typical use case in real-time data warehousing. It tracks the data change log (binlog) of a relational database (OLTP),...