Azure Synapse has many features to help analyze data, and in this episode of Data Exposed, Ginger Grant reviews how to query data stored in a data lake from Azure Synapse and how to visualize that data in Power BI.

The demonstrations show how to run SQL queries against the data lake without provisioning dedicated Synapse compute and without manipulating the data first. Ginger also walks through the steps for connecting to Power BI from within Azure Synapse and visualizing the data. To help you get started with Power BI and Azure Synapse, the video walks through creating Power BI data source files to speed up connectivity.
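The serverless pattern the demo relies on boils down to an OPENROWSET query that reads files in the lake in place. Here is a minimal sketch of that idea in Python via pyodbc; the workspace name, storage account, and file path are hypothetical placeholders, not the ones from the episode.

    import pyodbc

    # Connect to the Synapse serverless (on-demand) SQL endpoint.
    # Server and database names below are hypothetical.
    conn = pyodbc.connect(
        "Driver={ODBC Driver 17 for SQL Server};"
        "Server=myworkspace-ondemand.sql.azuresynapse.net;"
        "Database=master;"
        "Authentication=ActiveDirectoryInteractive;"
    )

    # OPENROWSET reads the CSV files where they sit -- no dedicated
    # compute to provision and no data movement beforehand.
    query = """
    SELECT TOP 10 *
    FROM OPENROWSET(
        BULK 'https://mydatalake.dfs.core.windows.net/data/sales/*.csv',
        FORMAT = 'CSV',
        PARSER_VERSION = '2.0',
        HEADER_ROW = TRUE
    ) AS rows;
    """

    for row in conn.execute(query):
        print(row)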

Index:

  • 0:00 Introduction
  • 1:15 What is Azure Synapse
  • 2:27 What you can do with Azure Synapse
  • 3:40 Azure Synapse Studio
  • 5:10 Demo, including Power BI
  • 9:40 When to use Azure Synapse

Databricks recently streamed this tech chat on SCD, or Slowly Changing Dimensions.

We will discuss a popular online analytical processing (OLAP) fundamental, slowly changing dimensions (SCD), specifically Type 2.

As we have discussed in various other Delta Lake tech talks, the reliability Delta Lake brings to data lakes has sparked a resurgence of data warehousing fundamentals, such as Change Data Capture, within data lakes.

Type 2 SCD within data warehousing allows you to keep track of both historical and current data over time. We will discuss how to apply these concepts to your data lake within the context of market segmentation for a climbing eCommerce site.
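To make the Type 2 idea concrete, here is a minimal PySpark sketch of the usual Delta Lake MERGE pattern, assuming a customers dimension keyed by customer_id with is_current/start_date/end_date tracking columns; the table paths and column names are illustrative, not the ones used in the talk.

    from delta.tables import DeltaTable
    from pyspark.sql import functions as F

    # Assumes a live SparkSession named `spark` with Delta Lake available.
    dim = DeltaTable.forPath(spark, "/mnt/lake/dim_customer")
    updates = spark.read.load("/mnt/lake/customer_changes")  # one row per customer_id

    current = dim.toDF().where("is_current = true")

    # A changed customer needs two actions: close the old row and insert a
    # new one. Rows staged with a NULL merge_key can never match, so they
    # fall through to the insert clause.
    new_versions = (
        updates.alias("u")
        .join(current.alias("c"),
              F.col("u.customer_id") == F.col("c.customer_id"))
        .where(F.col("u.segment") != F.col("c.segment"))
        .selectExpr("NULL AS merge_key", "u.*")
    )
    staged = updates.selectExpr("customer_id AS merge_key", "*") \
                    .unionByName(new_versions)

    (dim.alias("d")
        .merge(staged.alias("s"),
               "d.customer_id = s.merge_key AND d.is_current = true")
        .whenMatchedUpdate(
            condition="d.segment <> s.segment",
            set={"is_current": "false", "end_date": "s.effective_date"})
        .whenNotMatchedInsert(values={
            "customer_id": "s.customer_id",
            "segment": "s.segment",
            "start_date": "s.effective_date",
            "end_date": "NULL",
            "is_current": "true"})
        .execute())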


Azure Synapse is a limitless analytics service that brings together enterprise data warehousing and Big Data analytics. It gives you the freedom to query data on your terms, using either serverless on-demand or provisioned resources—at scale. Azure Synapse brings these two worlds together with a unified experience to ingest, prepare, manage, and serve data for immediate BI and machine learning needs. All of this leverages our limitless Azure Data Lake Storage service for any type of data.

Microsoft Mechanics explains.

Data Lake Storage Gen2 is the best storage solution for big data analytics in Azure. With its Hadoop-compatible access, it is a perfect fit for existing platforms like Databricks, Cloudera, Hortonworks, Hadoop, HDInsight, and many more. Take advantage of both blob storage and a data lake in one service!

In this video, Azure 4 Everyone introduces what Azure Data Lake Storage is, how it works, and how you can leverage it in your big data workloads. The video also explains the differences between Blob storage and ADLS.

Sample code from demo: https://pastebin.com/ee7ULpwx

Next steps for you after watching the video:
1. Azure Data Lake Storage documentation
https://docs.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-introduction
2. Transform data using Databricks and ADLS demo tutorial
https://docs.microsoft.com/en-us/azure/azure-databricks/databricks-extract-load-sql-data-warehouse
3. More on multi-protocol access
https://docs.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-multi-protocol-access
4. Read more on ACLs
https://docs.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-access-control
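To make the Blob-versus-ADLS difference concrete, here is a minimal sketch using the azure-storage-file-datalake Python SDK; the account, filesystem, and path names are hypothetical. The hierarchical namespace gives you real directories and POSIX-style ACLs, which flat blob storage does not.

    from azure.identity import DefaultAzureCredential
    from azure.storage.filedatalake import DataLakeServiceClient

    # Hypothetical storage account; note the dfs endpoint, not blob.
    service = DataLakeServiceClient(
        account_url="https://mydatalake.dfs.core.windows.net",
        credential=DefaultAzureCredential(),
    )

    fs = service.get_file_system_client("raw")           # a container/filesystem
    directory = fs.get_directory_client("iot/2020/06")
    directory.create_directory()                         # a real directory, thanks
                                                         # to the hierarchical namespace

    # POSIX-style ACL on the directory -- the access model Blob storage
    # alone does not offer.
    directory.set_access_control(acl="user::rwx,group::r-x,other::---")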

HDFS Tiering is Microsoft’s latest contribution to the Apache HDFS open source project.

In this video, learn how to use the HDFS tiering feature in SQL Server Big Data Clusters to seamlessly access your remote HDFS-compatible storage for querying and analysis.

To learn more, check out our documentation: https://docs.microsoft.com/sql/big-data-cluster/hdfs-tiering?view=sql-server-ver15&WT.mc_id=dataexposed-c9-niner.
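Once a remote store is mounted into the cluster's HDFS namespace, it is just another HDFS path to the engines in the cluster. As a hypothetical sketch, assuming a mount created under /mounts/adls and a Spark session running inside the big data cluster:

    # Read Parquet files from the mounted remote storage as if they were
    # local HDFS; the mount path and column names are hypothetical.
    df = spark.read.parquet("hdfs:///mounts/adls/sales/2020/")
    df.groupBy("region").count().show()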

Here’s an interesting new feature in Azure: exporting to a data lake.

This reduces the technical and administrative complexity of operationalizing entities for analytics and of managing schema and data.

In fact, within a few clicks, customers can link their Common Data Service environment to a data lake in their Azure subscription, select standard or custom entities, and export them to the data lake.

We are super excited to announce the Export to data lake (code name: Athena) preview to our Common Data Service customers. The Export to data lake service enables continuous replication of Common Data Service entity data to Azure Data Lake Storage Gen2, which can then be used to run […]

Databricks announced that it has open-sourced Delta Lake, a storage layer that makes it easier to ensure data integrity as new data flows into an enterprise’s data lake by bringing ACID transactions to these big data repositories. TechCrunch has an article detailing why this is a big deal.

The tool provides the ability to enforce specific schemas (which can be changed as necessary), to create snapshots and to ingest streaming data or backfill the lake as a batch job. Delta Lake also uses the Spark engine to handle the metadata of the data lake (which by itself is often a big data problem). Over time, Databricks also plans to add an audit trail, among other things.
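Two of those features are easy to demonstrate in a few lines of PySpark. This is a minimal sketch, assuming a Spark session with the Delta Lake package attached and a hypothetical /mnt/lake/events path:

    # Write an initial Delta table.
    events = spark.range(100).withColumnRenamed("id", "event_id")
    events.write.format("delta").save("/mnt/lake/events")

    # Schema enforcement: appending a DataFrame whose schema does not
    # match is rejected before any data lands.
    bad = spark.range(10).withColumnRenamed("id", "wrong_column")
    try:
        bad.write.format("delta").mode("append").save("/mnt/lake/events")
    except Exception as err:
        print("rejected by schema enforcement:", err)

    # Snapshots: every commit is a version you can read back (time travel).
    v0 = spark.read.format("delta").option("versionAsOf", 0).load("/mnt/lake/events")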

Here are some interesting concepts about data in IoT applications.

Kevin Saye shares three ideas with us on the IoT Show: all data has value, data should be preserved in its raw form for later use, and you want to store that data in a cost-effective manner. Curious about these claims? Watch this very interesting episode, which illustrates the concepts with a demo of Azure Data Lake Analytics analyzing data from IoT devices collected through Azure IoT Hub.