In this session from Build 2019, learn how to use the new Spark API integration that lets Spark take full advantage of Cosmos DB to run real-time analytics directly on petabytes of operational data.
MLflow enables data scientists to track and share experiments, package and share models across frameworks, and deploy them, whether the target environment is a personal laptop or a cloud data center. Here’s an interesting take from The Register.
MLflow was designed to take some of the pain out of machine learning in organizations that don’t have the coding and engineering muscle of the hyperscalers. It works with every major ML library, algorithm, deployment tool and language.
Databricks announced that it has open-sourced Delta Lake, a storage layer that makes it easier to ensure data integrity as new data flows into an enterprise’s data lake by bringing ACID transactions to these big data repositories. TechCrunch has an article detailing why this is a big deal.
The tool provides the ability to enforce specific schemas (which can be changed as necessary), to create snapshots and to ingest streaming data or backfill the lake as a batch job. Delta Lake also uses the Spark engine to handle the metadata of the data lake (which by itself is often a big data problem). Over time, Databricks also plans to add an audit trail, among other things.
In this video, Dinesh Priyankara explains Azure Databricks: why and where it should be used, and how to get started with it. It covers modern data warehousing, how Databricks fits into it, and creating a workspace, a cluster, a notebook, and a database.
In this video, learn how to ingest data into Azure SQL Data Warehouse using Azure Databricks to speed up your data pipeline and get more value from your data faster.
Gaurav Malhotra discusses how you can operationalize Jars and Python scripts running on Azure Databricks as an activity step in a Data Factory pipeline.
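As a rough sketch of what such an activity step looks like in a pipeline definition (the activity name, linked-service name, notebook path, and parameter below are illustrative placeholders, not values from the video):

```json
{
  "name": "RunDatabricksNotebook",
  "type": "DatabricksNotebook",
  "linkedServiceName": {
    "referenceName": "AzureDatabricksLinkedService",
    "type": "LinkedServiceReference"
  },
  "typeProperties": {
    "notebookPath": "/Users/user@example.com/transform-data",
    "baseParameters": { "inputPath": "raw/2019/05" }
  }
}
```

Data Factory also offers sibling activity types for Databricks JAR and Python workloads, so the same pipeline pattern covers all three.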
Just when you thought Azure Databricks couldn’t get any better, watch this video where Yatharth Gupta, Principal Program Manager for Azure Databricks, talks about the newly introduced integration with R Studio.
For data scientists looking to scale out R-based computing to big data, Azure Databricks provides a way to scale out their R models with Spark that is easy to set up and integrates with the most popular R tools and frameworks. Data scientists can use Azure Databricks and R Studio to easily create analytics models, quickly access and prepare high-quality data sets, and automatically run R workloads at unprecedented scale.
Bryan Cafferky introduces the awesomeness that is Databricks on Azure: a collaborative data science platform available as PaaS.
Today’s business managers depend heavily on reliable data integration systems that run complex ETL/ELT workflows (extract-transform-load and extract-load-transform).
Gaurav Malhotra joins Scott Hanselman to discuss how you can iteratively build, debug, deploy, and monitor your data integration workflows (including analytics workloads in Azure Databricks) using Azure Data Factory pipelines.
For more information:
- Ingest, prepare, and transform using Azure Databricks and Data Factory (blog)
- Run a Databricks notebook with the Databricks Notebook Activity in Azure Data Factory (docs)
- Create a free account (Azure)