Azure Data Lake Storage Gen2 is Azure's storage service built for big data analytics. With its Hadoop-compatible access, it fits existing platforms such as Databricks, Cloudera, Hortonworks, Hadoop, HDInsight and many more. Take advantage of both Blob Storage and a data lake in one service!

In this video, Azure 4 Everyone introduces what Azure Data Lake Storage is, how it works and how you can leverage it in your big data workloads. I will also explain the differences between Blob Storage and ADLS.
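One visible difference between Blob Storage and ADLS Gen2 is how the same data is addressed: the Blob endpoint uses an HTTPS URL under `blob.core.windows.net`, while Hadoop-compatible tools address ADLS Gen2 through an `abfss://` URI under `dfs.core.windows.net`. The minimal sketch below only builds those two strings. The account name `examplelake`, the container `raw` and the file path are made-up placeholders, not values from the video.

```python
def blob_url(account: str, container: str, path: str) -> str:
    """Build the HTTPS URL for a file on the Blob Storage endpoint."""
    return f"https://{account}.blob.core.windows.net/{container}/{path.lstrip('/')}"


def abfss_uri(account: str, container: str, path: str) -> str:
    """Build the abfss:// URI that Hadoop-compatible tools use for ADLS Gen2."""
    return f"abfss://{container}@{account}.dfs.core.windows.net/{path.lstrip('/')}"


if __name__ == "__main__":
    # Hypothetical account, container and path, for illustration only.
    print(blob_url("examplelake", "raw", "sales/2020.json"))
    print(abfss_uri("examplelake", "raw", "sales/2020.json"))
```

Both strings point at the same file in the same storage account; which one you use depends on whether the client speaks the Blob API or the Hadoop-compatible file system API.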

Sample code from demo:

Next steps for you after watching the video
1. Azure Data Lake Storage documentation
2. Transform data using Databricks and ADLS demo tutorial
3. More on multi-protocol access
4. Read more on ACL

Azure Databricks is a fast, easy-to-use and scalable big data collaboration platform. Based on Apache Spark, it brings high performance and the benefits of Spark without requiring deep technical knowledge.

You simply write Python/Scala scripts.

Learn the basics of Databricks and see a common Blob Storage JSON to Blob Storage CSV transformation scenario in this video.
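To give a feel for what a JSON to CSV transformation involves, here is a minimal sketch in plain Python rather than the notebook code from the video. It assumes the input is JSON Lines (one object per line) with flat records. The function name `json_lines_to_csv` and the sample records are invented for illustration. In a Databricks notebook the same step would more typically be done with Spark, reading with `spark.read.json(...)` and writing with `df.write.csv(...)`.

```python
import csv
import io
import json


def json_lines_to_csv(json_text: str) -> str:
    """Convert JSON Lines text (one flat object per line) into CSV text."""
    records = [json.loads(line) for line in json_text.splitlines() if line.strip()]

    # Collect the union of keys in first-seen order to form the CSV header.
    fieldnames = []
    for record in records:
        for key in record:
            if key not in fieldnames:
                fieldnames.append(key)

    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    # Keys missing from a record are written as empty cells.
    writer.writerows(records)
    return buffer.getvalue()


if __name__ == "__main__":
    sample = '{"id": 1, "name": "a"}\n{"id": 2, "name": "b", "city": "x"}'
    print(json_lines_to_csv(sample))
```

Nested JSON would need flattening first, which this sketch does not attempt.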

Samples from video:

In our previous episodes of the AI Show, we’ve learned all about the Azure Anomaly Detector, how to bring the service on premises, and some awesome tips and tricks for getting the service to work well for you.

In this episode of the AI Show, Qun Ying shows us how to build an end-to-end solution using the Anomaly Detector and Azure Databricks. This step-by-step demo detects numerical anomalies from streaming data coming through Azure Event Hubs.
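As a rough illustration of what detecting numerical anomalies in a stream means, the sketch below flags values that sit far from the mean of a sliding window of recent values. This is a simple rolling z-score stand-in, not the algorithm the Anomaly Detector service uses and not the code from the episode. The function name, window size, threshold and sample readings are all assumptions chosen for the example.

```python
from collections import deque
from statistics import mean, pstdev


def detect_anomalies(stream, window=5, threshold=3.0):
    """Yield (index, value) for values far from the recent window's mean.

    A value is flagged when it lies more than `threshold` standard
    deviations from the mean of the previous `window` values.
    """
    history = deque(maxlen=window)
    for index, value in enumerate(stream):
        # Only score a value once the window is full.
        if len(history) == window:
            mu = mean(history)
            sigma = pstdev(history)
            # A perfectly flat window (sigma == 0) is skipped in this sketch.
            if sigma > 0 and abs(value - mu) > threshold * sigma:
                yield index, value
        history.append(value)


if __name__ == "__main__":
    readings = [10, 11, 10, 12, 11, 50, 11, 10]  # made-up sample data
    print(list(detect_anomalies(readings)))
```

In the demo's setup the values would arrive continuously from Event Hubs instead of from a list, and the scoring would be delegated to the Anomaly Detector service.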

Anomaly Detection on Streaming Data Using Azure Databricks Related Links

David Giard recently posted a how-to article on creating an Azure Databricks service. Check it out!

Azure Databricks is a web-based platform built on top of Apache Spark and deployed to Microsoft’s Azure cloud platform. Databricks provides a web-based interface that makes it simple for users to create and scale clusters of Spark servers and deploy jobs and Notebooks to those clusters. Spark provides a […]

Cloud Academy has an intro piece on Apache Spark on Azure Databricks.

Apache Spark is an open-source framework for doing big data processing. It was developed as a replacement for Apache Hadoop’s MapReduce framework. Both Spark and MapReduce process data on compute clusters, but one of Spark’s big advantages is that it does in-memory processing, which can be orders of magnitude faster than the disk-based processing that MapReduce uses. There are plenty of other differences between the two systems, as well, but we don’t need to go into the details here.

MLflow enables data scientists to track and distribute experiments, package and share models across frameworks, and deploy them – no matter if the target environment is a personal laptop or a cloud data center. Here’s an interesting take from the Register.

MLflow was designed to take some of the pain out of machine learning in organizations that don’t have the coding and engineering muscle of the hyperscalers. It works with every major ML library, algorithm, deployment tool and language.