Advancing Analytics explains how to parameterize Spark in Synapse Analytics, meaning you can plug notebooks into your orchestration pipelines and dynamically pass parameters to change how they work on each run.

But how does it actually work?

Simon’s digging in to give us a quick peek at the new functionality.

For more details on the new parameters, take a peek here: https://docs.microsoft.com/en-us/azure/synapse-analytics/spark/apache-spark-development-using-notebooks#orchestrate-notebook
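
As a rough illustration of the idea, a notebook can declare default values in a cell designated as the parameters cell, and a pipeline run can then supply its own values for the same names. The sketch below is plain Python; the variable names, the storage account, and the `build_input_path` helper are hypothetical and not taken from the video.

```python
# Parameters cell: defaults used when the notebook is run interactively.
# A pipeline run can supply different values for these same names.
# All names and values here are hypothetical.
storage_account = "mydatalake"
container = "raw"
source_folder = "sales"
run_date = "2021-01-01"


def build_input_path(account: str, container: str, folder: str, run_date: str) -> str:
    """Build an ABFSS path for one day's data from the parameter values."""
    year, month, day = run_date.split("-")
    return (
        f"abfss://{container}@{account}.dfs.core.windows.net/"
        f"{folder}/{year}/{month}/{day}"
    )


# The rest of the notebook only reads the parameters, so changing them
# changes which data the run picks up without editing any other cell.
input_path = build_input_path(storage_account, container, source_folder, run_date)
print(input_path)
```

Keeping every run-specific value in one cell means the same notebook can serve both interactive exploration (with the defaults) and scheduled pipeline runs (with passed-in values).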

Azure Synapse workspaces can host a Spark cluster.

Besides providing the execution environment for certain Synapse features such as Notebooks, the cluster can also run custom code that you write as a job.

This video walks through the process of running a C# custom Spark job in Azure Synapse. It shows how to create the Synapse workspace in the Azure portal, how to add a Spark pool, and how to configure a suitable storage account. It also shows how to write the custom job in C#, how to upload the built output to Azure, and then how to configure Azure Synapse to execute the .NET application as a custom job.

Topics/Time index:

  • Create a new Azure Synapse Analytics workspace (0:17)
  • Configuring security on the storage account (1:29)
  • Exploring the workspace (2:42)
  • Creating an Apache Spark pool (3:01)
  • Creating the C# application (4:05)
  • Adding a namespace directive to use Spark SQL (4:48)
  • Creating the Spark session (5:01)
  • How the job will work (5:22)
  • Defining the work with Spark SQL (6:42)
  • Building the .NET application to upload to Azure Synapse (9:48)
  • Uploading our application to Azure Synapse (11:45)
  • Using the zipped .NET application in a custom Spark job definition (12:39)
  • Testing the custom job (13:36)
  • Monitoring the job (13:56)
  • Inspecting the results (14:25)
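
The packaging step between building and uploading can be scripted. The following is a minimal sketch, assuming the build output already sits in a local folder; the `zip_publish_folder` helper and the example paths are hypothetical, and the video may package the output differently.

```python
import shutil
from pathlib import Path


def zip_publish_folder(publish_dir: str, archive_base: str) -> Path:
    """Zip the contents of publish_dir into <archive_base>.zip and return its path."""
    source = Path(publish_dir)
    if not source.is_dir():
        raise FileNotFoundError(f"publish folder not found: {source}")
    # make_archive appends the ".zip" extension itself and returns the
    # archive path. Files are stored relative to publish_dir, so they sit
    # at the root of the archive.
    return Path(shutil.make_archive(archive_base, "zip", root_dir=source))
```

For example, `zip_publish_folder("bin/Release/publish", "sparkjob")` would produce `sparkjob.zip` in the current directory, ready to upload. Write the archive outside the publish folder so that it does not end up inside itself.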

Chris Seferlis discusses one of the newer, lesser-known data services in Azure: Data Explorer.

If you’re looking to run extremely fast queries over large sets of log and IoT data, this may be the right tool for you. I also discuss where it’s not a replacement for Azure Synapse or Azure Databricks, but works nicely alongside them in the overall architecture of the Azure Data Platform.

In this video Chris Seferlis discusses some of the reasons you might want to choose Azure Data Factory over Azure Synapse Workspaces with Synapse Studio.

Even though many of the features overlap, there are still scenarios where I’d use ADF and pass on the additional features of Synapse. Let me know your thoughts below. Please like, comment, share and follow me on Twitter: @bizdataviz

Here’s an interesting read I discovered via a coworker about the integration runtime in Azure Data Factory and Azure Synapse Analytics (Preview).

In Data Factory, an activity defines the action to be performed. A linked service defines a target data store or a compute service.

An integration runtime provides the bridge between the activity and linked services. It’s referenced by the linked service or activity, and provides the compute environment where the activity either runs or gets dispatched from.

This way, the activity can be performed in the region closest to the target data store or compute service, in the most performant way, while meeting security and compliance needs.
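
The relationship described above can be sketched in a few lines of Python. This is an illustration only: the dictionaries mimic the shape of a Data Factory linked service definition, where a `connectVia` reference names the integration runtime, and the helper functions and example names (`OnPremSqlServer`, `MySelfHostedIR`, `BlobStore`) are hypothetical.

```python
from typing import Optional

# Name of the default Azure integration runtime that Data Factory falls back
# to when a linked service does not reference one explicitly.
DEFAULT_IR = "AutoResolveIntegrationRuntime"


def make_linked_service(name: str, ls_type: str, ir_name: Optional[str] = None) -> dict:
    """Return a linked service definition shaped like Data Factory JSON."""
    definition = {"name": name, "properties": {"type": ls_type, "typeProperties": {}}}
    if ir_name is not None:
        # The linked service points at its integration runtime by name.
        definition["properties"]["connectVia"] = {
            "referenceName": ir_name,
            "type": "IntegrationRuntimeReference",
        }
    return definition


def integration_runtime_for(linked_service: dict) -> str:
    """Return the name of the integration runtime a linked service connects through."""
    connect_via = linked_service["properties"].get("connectVia")
    return connect_via["referenceName"] if connect_via else DEFAULT_IR


# Hypothetical examples: an on-premises store reached through a self-hosted
# runtime, and a cloud store that relies on the default runtime.
on_prem = make_linked_service("OnPremSqlServer", "SqlServer", "MySelfHostedIR")
cloud = make_linked_service("BlobStore", "AzureBlobStorage")

print(integration_runtime_for(on_prem))
print(integration_runtime_for(cloud))
```

An activity that uses the `on_prem` linked service would be dispatched through the self-hosted runtime, while one using `cloud` would run on the default Azure runtime.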

Integration runtime in Azure Data Factory