In part two of this episode, Fernando Mejia walks through everything you need to plan for in a hybrid cloud architecture for Azure Kubernetes Service.

This includes IP address concerns from on-premises to Azure, the hub-and-spoke topology, and the different options you have in Azure Kubernetes Service.

Watch Part 1

Learn more: https://azure.microsoft.com/en-us/overview/kubernetes-on-azure

Ayman El-Ghazali recently presented this Introduction to Databricks from the perspective of a SQL DBA at the NoVA SQL Users Group.

Come learn about the following topics:

  • Basics of how Spark works
  • Basics of how Databricks works (cluster setup, basic admin)
  • How to design and code an ETL Pipeline using Databricks
  • How to read from and write to Azure Data Lake and a database
  • Integration of Databricks into an Azure Data Factory pipeline
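
To make the ETL and Data Lake topics above concrete, here is a minimal sketch in Python (PySpark). It is not the presenter's code, which lives in the linked repository. The names `adls_uri` and `run_etl`, the storage account, container, file path, and target table are all hypothetical, and the sketch assumes the cluster is already authorized to read the storage account.

```python
def adls_uri(container, account, path):
    """Build an ADLS Gen2 URI using the abfss:// scheme of the ABFS driver."""
    return f"abfss://{container}@{account}.dfs.core.windows.net/{path.lstrip('/')}"


def run_etl(spark, source_uri, jdbc_url, target_table, connection_properties):
    """Extract a CSV from the data lake, apply a light transform, load it over JDBC.

    `spark` is an existing SparkSession (Databricks notebooks provide one).
    """
    # Imported lazily so adls_uri can be used without a Spark installation.
    from pyspark.sql import functions as F

    # Extract: read the raw file, treating the first row as column names.
    raw = spark.read.option("header", "true").csv(source_uri)

    # Transform: drop exact duplicate rows and stamp each row with the load date.
    cleaned = raw.dropDuplicates().withColumn("load_date", F.current_date())

    # Load: append the result to the target database table.
    cleaned.write.jdbc(
        url=jdbc_url,
        table=target_table,
        mode="append",
        properties=connection_properties,
    )
    return cleaned


# Hypothetical account, container, and path, for illustration only.
source = adls_uri("raw", "mystorageaccount", "/sales/2019/orders.csv")
print(source)
```

In a Databricks notebook, a call might look like `run_etl(spark, source, jdbc_url, "dbo.Orders", {"user": "...", "password": "..."})`, where the JDBC URL and credentials are placeholders you would supply yourself.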

Code available at:  https://github.com/thesqlpro/blog

In this video, Chris Seferlis continues discussing the Modern Data Platform in Azure with Part 3: Data Processing.

Tools Discussed:

Data Lake Storage Gen2 is the best storage solution for big data analytics in Azure. With its Hadoop-compatible access, it is a perfect fit for existing platforms like Databricks, Cloudera, Hortonworks, Hadoop, HDInsight and many more. Take advantage of both blob storage and data lake in one service!

In this video, Azure 4 Everyone introduces what Azure Data Lake Storage is, how it works, and how you can leverage it in your big data workloads. The video also explains the differences between Blob storage and ADLS.

Sample code from demo: https://pastebin.com/ee7ULpwx

Next steps for you after watching the video
1. Azure Data Lake Storage documentation
– https://docs.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-introduction
2. Transform data using Databricks and ADLS demo tutorial
– https://docs.microsoft.com/en-us/azure/azure-databricks/databricks-extract-load-sql-data-warehouse
3. More on multi-protocol access
– https://docs.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-multi-protocol-access
4. Read more on ACL
– https://docs.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-access-control

Apache Spark is now 10 years old. This article in Analytics India Magazine explores what led to Spark’s widespread adoption and what will keep it going into the future.

Spark has been dubbed the official “in-memory replacement for MapReduce”, the disk-based computational engine at the heart of early Hadoop clusters. Spark took off because it reflects the shift in processing paradigm toward more memory-intensive pipelines: if your cluster has a decent amount of memory, processing in Spark will be faster, and its API is simpler than MapReduce’s. Spark is also fast because the processing time of most operations (including reads) decreases roughly linearly with the number of machines, since the work is all distributed.
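
The scaling claim above can be illustrated with a toy sketch in plain Python. This is not Spark and does not run in parallel. It only mimics the pattern: data is split across a number of hypothetical machines, each one counts words in its own chunk in memory, and the partial results are merged. The function names and sample data are made up for illustration.

```python
from collections import Counter
from functools import reduce


def partition(records, num_machines):
    """Split records round-robin into one chunk per (hypothetical) machine."""
    return [records[i::num_machines] for i in range(num_machines)]


def count_words(chunk):
    """Map step: count the words of a single chunk entirely in memory."""
    counts = Counter()
    for line in chunk:
        counts.update(line.split())
    return counts


def distributed_word_count(records, num_machines):
    """Return the merged word counts and the size of the largest chunk."""
    chunks = partition(records, num_machines)
    # On a real cluster these per-chunk counts would be computed in parallel.
    partials = [count_words(chunk) for chunk in chunks]
    # Reduce step: merge the partial counts into one result.
    totals = reduce(lambda a, b: a + b, partials, Counter())
    largest_chunk = max(len(chunk) for chunk in chunks)
    return totals, largest_chunk


lines = [
    "spark is fast",
    "hadoop is disk based",
    "spark keeps data in memory",
    "spark is distributed",
] * 250  # 1,000 lines in total

for machines in (1, 2, 4, 8):
    totals, per_machine = distributed_word_count(lines, machines)
    print(machines, per_machine, totals["spark"])
```

The final result is the same for every machine count, while the number of lines each machine has to handle halves every time the machine count doubles. That per-machine workload is what the roughly linear speedup refers to. Real clusters also pay coordination and data-shuffling costs that this sketch ignores.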