MLflow enables data scientists to track and reproduce experiments, package and share models across frameworks, and deploy them, whether the target environment is a personal laptop or a cloud data center. Here’s an interesting take from The Register.

MLflow was designed to take some of the pain out of machine learning in organizations that don’t have the coding and engineering muscle of the hyperscalers. It works with every major ML library, algorithm, deployment tool and language.
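To make the experiment-tracking workflow concrete, here is a minimal sketch using the MLflow tracking API. The model, parameter, and metric choices are illustrative, not from the article, and API details can vary slightly across MLflow versions:

```python
# Minimal MLflow tracking sketch: log the parameters, a metric, and the
# trained model for one run. Model and metric choices are illustrative.
import mlflow
import mlflow.sklearn
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

with mlflow.start_run():
    n_estimators = 100  # the hyperparameter we want to track for this run
    model = RandomForestRegressor(n_estimators=n_estimators, random_state=42)
    model.fit(X_train, y_train)

    mse = mean_squared_error(y_test, model.predict(X_test))

    mlflow.log_param("n_estimators", n_estimators)  # record the config
    mlflow.log_metric("mse", mse)                   # record the result
    mlflow.sklearn.log_model(model, "model")        # package the model artifact
```

Each run logged this way shows up in the MLflow tracking UI, where runs can be compared side by side and the logged model can be handed off for deployment.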

Databricks announced that it has open-sourced Delta Lake, a storage layer that brings ACID transactions to big data repositories, making it easier to ensure data integrity as new data flows into an enterprise’s data lake. TechCrunch has an article detailing why this is a big deal.

The tool provides the ability to enforce specific schemas (which can be changed as necessary), to create snapshots and to ingest streaming data or backfill the lake as a batch job. Delta Lake also uses the Spark engine to handle the metadata of the data lake (which by itself is often a big data problem). Over time, Databricks also plans to add an audit trail, among other things.
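To see what those guarantees look like in practice, here is a hedged PySpark sketch of a transactional Delta write, schema enforcement, and a versioned ("time travel") read. The paths and columns are made up, and the cluster is assumed to have the Delta Lake package attached:

```python
# Hedged Delta Lake sketch: transactional writes, schema enforcement, and
# time-travel reads. Paths and column names are illustrative only.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("delta-demo").getOrCreate()

events = spark.createDataFrame(
    [(1, "click"), (2, "view")], ["user_id", "action"]
)

# Writes are ACID: concurrent readers never see a half-written table.
events.write.format("delta").mode("append").save("/tmp/delta/events")

# Appending a DataFrame whose schema doesn't match the table fails unless
# the change is explicitly allowed (e.g. the mergeSchema write option).

# Read the current state of the table...
current = spark.read.format("delta").load("/tmp/delta/events")

# ...or snapshot an earlier version of it ("time travel").
v0 = (
    spark.read.format("delta")
    .option("versionAsOf", 0)
    .load("/tmp/delta/events")
)
```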

Data engineering makes up roughly 70% of the work in any data pipeline today, and without the experience to implement a data engineering pipeline well, there is no value to be gained from your data.
In this session from Microsoft Ignite, we discuss best practices and demonstrate how a data engineer can develop and orchestrate a big data pipeline, including: data ingestion and orchestration using Azure Data Factory; data curation, cleansing, and transformation using Azure Databricks; and data loading into Azure SQL Data Warehouse for serving your BI tools.
Watch and learn how to run the ETL/ELT process effectively, combined with advanced capabilities such as monitoring jobs, getting alerts, retrying failed jobs, setting permissions, and much more.
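As a rough illustration of the curation-and-load step inside Azure Databricks, the sketch below cleans a raw dataset and writes it to Azure SQL Data Warehouse with the built-in sqldw connector. Every path, table name, and connection setting is a placeholder, and option names may differ by Databricks runtime version:

```python
# Illustrative Databricks notebook step: curate raw JSON events and load
# the result into Azure SQL Data Warehouse. All storage paths, the JDBC
# URL, and the table name are placeholders, not values from the session.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

raw = spark.read.json("abfss://raw@mystorageacct.dfs.core.windows.net/events/")

# Curation: drop malformed rows and derive a partition-friendly date column.
curated = (
    raw.dropna(subset=["user_id", "event_time"])
       .withColumn("event_date", F.to_date("event_time"))
)

# Load into Azure SQL Data Warehouse for the BI serving layer.
(
    curated.write.format("com.databricks.spark.sqldw")
    .option("url", "jdbc:sqlserver://myserver.database.windows.net:1433;database=mydw")
    .option("forwardSparkAzureStorageCredentials", "true")
    .option("dbTable", "dbo.curated_events")
    .option("tempDir", "abfss://tmp@mystorageacct.dfs.core.windows.net/sqldw")
    .mode("append")
    .save()
)
```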

Gaurav Malhotra discusses how you can operationalize JARs and Python scripts running on Azure Databricks as an activity step in a Data Factory pipeline.
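For a sense of what that activity step looks like, here is a sketch of a Databricks Python activity as it would appear in a pipeline definition, rendered as a Python dict mirroring the pipeline JSON. All names, paths, and the linked-service reference are placeholders; consult the current Data Factory schema for the authoritative property list:

```python
# Sketch of the Data Factory activity that runs a Python script on Azure
# Databricks, written as a Python dict mirroring the pipeline JSON. Every
# name, path, and linked-service reference below is a placeholder.
databricks_python_activity = {
    "name": "RunScoringScript",
    "type": "DatabricksSparkPython",
    "linkedServiceName": {
        "referenceName": "AzureDatabricksLinkedService",
        "type": "LinkedServiceReference",
    },
    "typeProperties": {
        # The script must already be uploaded to DBFS (or reachable storage).
        "pythonFile": "dbfs:/scripts/score.py",
        "parameters": ["--date", "2019-05-01"],
        # Libraries the job cluster should install before the run.
        "libraries": [{"pypi": {"package": "scikit-learn"}}],
    },
}
```

A JAR step is analogous, using the DatabricksSparkJar activity type with a mainClassName property instead of pythonFile.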

Just when you thought Azure Databricks couldn’t get any better, watch this video where Yatharth Gupta, Principal Program Manager for Azure Databricks, talks about the newly introduced integration with RStudio.

For data scientists looking to scale out R-based computing to big data, Azure Databricks provides the best way to scale out their R models with Spark, one that is easy to set up and integrates with the most popular R tools and frameworks. Data scientists can use Azure Databricks and RStudio to easily create analytics models, quickly access and prepare high-quality data sets, and automatically run R workloads at unprecedented scale.

Today’s business managers depend heavily on reliable data integration systems that run complex ETL/ELT workflows (extract-transform-load and extract-load-transform).

Gaurav Malhotra joins Scott Hanselman to discuss how you can iteratively build, debug, deploy, and monitor your data integration workflows (including analytics workloads in Azure Databricks) using Azure Data Factory pipelines.
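As a hedged sketch of the deploy-and-monitor part of that loop, the Azure Python SDK (assuming the azure-mgmt-datafactory and azure-identity packages) can trigger a pipeline run and poll its status. The subscription, resource group, factory, and pipeline names below are placeholders:

```python
# Hedged sketch: trigger an Azure Data Factory pipeline run and poll its
# status with the Azure Python SDK. Every resource name is a placeholder.
import time

from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient

credential = DefaultAzureCredential()
adf_client = DataFactoryManagementClient(credential, "<subscription-id>")

resource_group = "my-resource-group"
factory_name = "my-data-factory"

# Kick off the pipeline; `parameters` can override pipeline defaults.
run = adf_client.pipelines.create_run(
    resource_group, factory_name, "CopyAndTransformPipeline", parameters={}
)

# Poll until Data Factory reports a terminal status (Succeeded, Failed, ...).
while True:
    pipeline_run = adf_client.pipeline_runs.get(
        resource_group, factory_name, run.run_id
    )
    if pipeline_run.status not in ("Queued", "InProgress"):
        break
    time.sleep(30)

print(f"Pipeline finished with status: {pipeline_run.status}")
```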
