Microsoft Mechanics learns how UK-based data engineering consultancy endjin is evaluating Azure Synapse for on-demand serverless compute and querying.

Endjin specializes in big data analytics solutions for customers across a range of industries, including ocean research, financial services, and retail.

Host Jeremy Chapman speaks with Jess Panni, Principal and Data Architect at endjin, to discuss how they’re using SQL serverless for on-demand compute as well as visualization capabilities to help customers with big data challenges. If you are new to Azure Synapse, it’s Microsoft’s limitless analytics platform that brings enterprise data warehousing and big data processing together into a single service, removing the traditional constraints for analyzing data of all shapes and sizes.

For more information on endjin and how they help small teams achieve big things, check out their website at https://endjin.com

Watch an introduction to Azure Synapse at https://aka.ms/mechanicssynapse 

Check out other early adopters on our How We Built It series at https://aka.ms/AzureSynapseSeries

Databricks recently streamed this tech chat on SCD, or Slowly Changing Dimensions.

We will discuss a popular online analytical processing (OLAP) fundamental – slowly changing dimensions (SCD) – specifically Type 2.

As we have discussed in various other Delta Lake tech talks, the reliability Delta Lake brings to data lakes has led to a resurgence of many data warehousing fundamentals, such as Change Data Capture, in data lakes.

Type 2 SCD within data warehousing allows you to keep track of both the history and current data over time. We will discuss how to apply these concepts to your data lake within the context of the market segmentation of a climbing eCommerce site.
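In Delta Lake this pattern is typically implemented with a MERGE statement; as a tool-agnostic sketch of the Type 2 semantics (the `customer_id` and `segment` field names are illustrative assumptions, not from the talk), changed rows are expired rather than overwritten, and a new current version is appended:

```python
from datetime import date

def scd2_upsert(dimension, updates, today=None):
    """Apply Type 2 SCD logic: expire changed rows, append new versions.

    dimension: list of dicts with keys 'customer_id', 'segment',
               'effective_from', 'effective_to', 'is_current'
    updates:   dict mapping customer_id -> new segment value
    """
    today = today or date.today().isoformat()
    current = {r["customer_id"]: r for r in dimension if r["is_current"]}
    for cust_id, new_segment in updates.items():
        row = current.get(cust_id)
        if row is not None and row["segment"] == new_segment:
            continue  # no change; keep the current row as-is
        if row is not None:
            # Expire the existing version instead of overwriting it,
            # preserving full history
            row["effective_to"] = today
            row["is_current"] = False
        # Append a new current version
        dimension.append({
            "customer_id": cust_id,
            "segment": new_segment,
            "effective_from": today,
            "effective_to": None,
            "is_current": True,
        })
    return dimension
```

A customer whose market segment changes ends up with two rows: the old one closed off with an `effective_to` date, and a new open-ended current row, so both history and current state remain queryable.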

 

ThorogoodBI explores the use of Databricks for data engineering purposes in this webinar.

Whether you’re looking to transform and clean large volumes of data or collaborate with colleagues to build advanced analytics jobs that can be scaled and run automatically, Databricks offers a Unified Analytics Platform that promises to make your life easier.

In the second of two recorded webcasts, Thorogood consultants Jon Ward and Robbie Shaw showcase Databricks’ data transformation and data movement capabilities, show how the tool aligns with cloud computing services, and highlight the security, flexibility, and collaboration aspects of Databricks. We’ll also look at Databricks Delta Lake, and how it offers improved storage for both large-scale datasets and real-time streaming data.

In this video, Chris Seferlis continues discussing the Modern Data Platform in Azure with Part 3: Data Processing.

Tools Discussed:

Here’s a question for the ages and the wise old sages.

Although there are lots of similarities between Software Development and Data Science, they also have three main differences: processes, tooling, and behavior. Find out more: In my previous article, I talked about model governance and holistic model management. I received a great response, along with some questions about the […]

Here’s an interesting article from CodeProject defining the cycles of data science and how it relates to business cycles and the fairly well established framework of SDLC. Although some will argue that data science is “pure science” and this cycle belongs to the “data engineering” label, organizations that fail to move innovations efficiently from “the lab” to production are not going to be competitive.

By its simple definition, Data Science is a multi-disciplinary field comprising multiple processes to extract knowledge or useful output from input data. The output may be predictive or descriptive analysis, a report, business intelligence, etc. Data Science has well-defined lifecycles, like any other project, and CRISP-DM and TDSP are among the proven standards.

Data integration is complex, with many moving parts. It helps organizations combine data and complex business processes in hybrid data environments. Failures are very common in data integration workflows: data may not arrive on time, pipelines may contain functional code issues, infrastructure may fail, and so on.

A common requirement is the ability to rerun failed activities within data integration workflows. In addition, sometimes you need to rerun activities to re-process data due to an error upstream in data processing. Azure Data Factory now enables you to rerun the entire pipeline or choose to rerun downstream from a particular activity inside a pipeline.
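As a hedged illustration of those rerun semantics (this sketches the concept in plain Python; it is not the Azure Data Factory API), a linear pipeline can skip activities that already succeeded and resume from the first failure, or from an explicitly chosen activity:

```python
def rerun_from_failure(activities, previous_status, start_from=None):
    """Rerun a linear pipeline from the first failed activity, or from
    an explicitly chosen activity, skipping earlier successes.

    activities:      ordered list of (name, fn) pairs
    previous_status: dict of name -> "Succeeded" | "Failed"
    start_from:      activity name to rerun from, overriding auto-detection
    """
    if start_from is None:
        # Auto-detect: first activity that did not succeed last time
        for name, _ in activities:
            if previous_status.get(name) != "Succeeded":
                start_from = name
                break
        else:
            return previous_status  # nothing failed; nothing to rerun
    rerunning = False
    status = dict(previous_status)
    for name, fn in activities:
        if name == start_from:
            rerunning = True
        if not rerunning:
            continue  # upstream activity already succeeded; skip it
        try:
            fn()
            status[name] = "Succeeded"
        except Exception:
            status[name] = "Failed"
            break  # downstream activities depend on this one
    return status
```

Passing `start_from` covers the second case in the text: re-processing data from a chosen point because of an upstream error, even if the activities themselves reported success.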

Data engineering accounts for about 70% of any data pipeline today, and without the experience to implement a data engineering pipeline well, no value can be extracted from your data.
In this session from Microsoft Ignite we discuss the best practices and demonstrate how a data engineer can develop and orchestrate the big data pipeline, including: data ingestion and orchestration using Azure Data Factory; data curation, cleansing and transformation using Azure Databricks; data loading into Azure SQL Data Warehouse for serving your BI tools.
Watch and learn how to carry out the ETL/ELT process effectively, combined with advanced capabilities such as monitoring jobs, getting alerts, retrying jobs, setting permissions, and much more.
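The three-stage flow above can be sketched tool-agnostically. This is a minimal illustration of the ingest → curate → load pattern, not Azure code: the function and field names (`order_id`, `amount`) are assumptions for the example.

```python
def ingest(raw_records):
    """Ingestion step (Azure Data Factory's role): land raw data as-is."""
    return list(raw_records)

def curate(records):
    """Curation step (Azure Databricks' role): cleanse and transform.
    Rejects rows with missing keys and normalizes the amount field."""
    curated = []
    for r in records:
        if r.get("order_id") is None:
            continue  # reject malformed rows
        curated.append({"order_id": r["order_id"],
                        "amount": round(float(r.get("amount", 0)), 2)})
    return curated

def load(records, warehouse):
    """Loading step (SQL Data Warehouse's role): serve curated rows,
    keyed for the BI tools that query them."""
    for r in records:
        warehouse[r["order_id"]] = r
    return warehouse

def run_pipeline(raw_records, warehouse):
    """Orchestrate ingest -> curate -> load, mirroring the session's flow."""
    return load(curate(ingest(raw_records)), warehouse)
```

Keeping each stage a pure function with an explicit input and output is what makes the orchestration, monitoring, and retry capabilities described above tractable: a failed stage can be rerun without repeating the ones before it.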

On the podcast Andy Leonard and I create, we love experimenting and examining the resulting data. After all, we are Data Driven: not just in name but also in spirit. Andy recorded this webinar and also streamed it live on our Facebook page.

We thought it was good enough to share with our larger audience here. Let us know what you think: both Frank and Andy have been recording and streaming their live events, and we’re curious to hear what you have to say about this innovation in how we podcast.

Press the play button below to listen here or visit the show page at DataDriven.tv