Here’s a question for the ages and the wise old sages.

Although there are many similarities between Software Development and Data Science, they also have three main differences: processes, tooling, and behavior. In my previous article, I talked about model governance and holistic model management. I received a great response, along with some questions about the […]

Here’s an interesting article from CodeProject defining the cycles of data science and how they relate to business cycles and the fairly well-established framework of the SDLC. Although some will argue that data science is “pure science” and this cycle belongs under the “data engineering” label, organizations that fail to move innovations efficiently from “the lab” to production are not going to be competitive.

By its simplest definition, Data Science is a multi-disciplinary field comprising multiple processes for extracting knowledge or useful output from input data. The output may be predictive or descriptive analysis, a report, business intelligence, etc. Like any other project, Data Science has a well-defined lifecycle, and CRISP-DM and TDSP are two of the proven standards.

Data integration is complex, with many moving parts. It helps organizations combine data and complex business processes in hybrid data environments. Failures are very common in data integration workflows: data not arriving on time, functional code issues in your pipelines, infrastructure issues, and so on.

A common requirement is the ability to rerun failed activities within data integration workflows. Sometimes you also need to rerun activities to reprocess data after an upstream error. Azure Data Factory now lets you rerun an entire pipeline, or rerun it downstream from a particular activity inside the pipeline.
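Data Factory exposes this rerun capability through its management REST API: the pipeline `createRun` operation accepts a `referencePipelineRunId` plus `isRecovery=true`, and optionally a `startActivityName` to resume downstream of a given activity. As a rough sketch (the subscription, resource group, factory, pipeline, run ID, and activity names below are all hypothetical placeholders), a helper that builds such a request URL might look like this:

```python
from urllib.parse import urlencode

def build_rerun_url(subscription_id, resource_group, factory, pipeline,
                    reference_run_id, start_activity=None,
                    api_version="2018-06-01"):
    """Build the Data Factory createRun URL for a recovery rerun.

    Passing isRecovery=true with a referencePipelineRunId reruns the
    referenced pipeline run; adding startActivityName resumes from that
    activity so successful upstream activities are not re-executed.
    """
    base = (
        "https://management.azure.com"
        f"/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}"
        "/providers/Microsoft.DataFactory"
        f"/factories/{factory}"
        f"/pipelines/{pipeline}/createRun"
    )
    params = {
        "api-version": api_version,
        "referencePipelineRunId": reference_run_id,
        "isRecovery": "true",
    }
    if start_activity:
        params["startActivityName"] = start_activity
    return f"{base}?{urlencode(params)}"

# Hypothetical names, purely for illustration:
url = build_rerun_url("my-sub", "my-rg", "my-factory", "nightly-load",
                      "0001-run-id", start_activity="CopyToStaging")
```

In practice you would POST to this URL with a bearer token from Azure AD; the portal’s “Rerun from activity” button drives the same operation.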

Data engineering makes up about 70% of any data pipeline today, and without the experience to implement a data engineering pipeline well, you will accumulate no value from your data.
In this session from Microsoft Ignite, we discuss best practices and demonstrate how a data engineer can develop and orchestrate a big data pipeline, including: data ingestion and orchestration using Azure Data Factory; data curation, cleansing, and transformation using Azure Databricks; and data loading into Azure SQL Data Warehouse to serve your BI tools.
Watch and learn how to run the ETL/ELT process effectively, combined with advanced capabilities such as monitoring jobs, getting alerts, retrying jobs, setting permissions, and much more.
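To make the curation/cleansing step concrete, here is a minimal stand-in sketch in plain Python (standard library only, not Databricks; the order data is made up purely for illustration). Malformed rows are dropped and the survivors are normalized into typed records, ready to load into the warehouse:

```python
import csv
import io
from datetime import datetime

# Hypothetical raw extract: two rows are malformed on purpose.
RAW = """order_id,order_date,amount
1001,2018-11-02,49.90
1002,not-a-date,12.00
1003,2018-11-03,
1004,2018-11-04,7.25
"""

def curate(raw_csv):
    """Cleanse raw rows: drop records whose date or amount fails to
    parse, and normalize the remaining fields into typed values."""
    clean = []
    for row in csv.DictReader(io.StringIO(raw_csv)):
        try:
            clean.append({
                "order_id": int(row["order_id"]),
                "order_date": datetime.strptime(
                    row["order_date"], "%Y-%m-%d").date(),
                "amount": float(row["amount"]),
            })
        except (ValueError, TypeError):
            # A real pipeline would quarantine and log bad rows here.
            continue
    return clean

rows = curate(RAW)
# Rows 1002 (bad date) and 1003 (missing amount) are dropped,
# leaving the two well-formed orders.
```

In a Databricks notebook the same drop-and-normalize logic would typically be expressed over a Spark DataFrame, but the shape of the step is the same.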

On the podcast Andy Leonard and I create, we love experimenting and examining the resulting data. After all, we are Data Driven: not just in name but also in spirit. Andy recorded this webinar and also streamed it live on our Facebook page.

We thought it was good enough to share with our larger audience here. Let us know what you think. Both Frank and Andy have been recording and streaming their live events, and we’re curious to hear what you have to say about this innovation in how we podcast.

Press the play button below to listen here or visit the show page at DataDriven.tv

Frank and Andy talked about doing a Deep Dive show where they take a deep look into a particular data science technology, term, or methodology. And now, they deliver!

In this very first Deep Dive, Frank and Andy discuss the differences between Data Science and Data Engineering, where they overlap, where they differ, and why so many C-level execs can’t seem to figure out the deltas.