In this last livestream of this foul Year of Our Lord 2020, I reflect on what it will take to really make your life better in 2021.
Amidst rapidly changing conditions, many companies build ETL pipelines using ad-hoc strategies.
However, this approach makes automated testing for data reliability almost impossible and leads to ineffective and time-consuming manual ETL monitoring.
Software engineering decouples code dependencies, enables automated testing, and empowers engineers to design, deploy, and serve reliable data in a modular manner.
As a consequence, the organization is able to easily reuse and maintain its ETL code base and, therefore, scale.
In this presentation, we discuss the challenges data engineers face when it comes to data reliability. Furthermore, we demonstrate how software engineering best practices help to build code modularity and automated testing for modern data engineering pipelines.
Today, I had the chance to speak with Supriya Sri about her work as a grad student in data science.
To help set the holiday mood, here’s a Holiday Special I made for FranksWorldTV in 2014.
Most Christmas specials on TV are pretty lame. FWTV kicks it up a notch.
Jon Wood shows how to use the SAFE Stack as well as how to implement an ML.NET model.
Jon Wood shows us how to register and deploy an AutoML model within the Azure ML Service.
codebasics delivers this great tutorial on sliding window object detection, a technique that allows you to detect objects in a picture.
This technique is not very efficient, as it is very compute intensive. More recently, new techniques have been developed that try to improve performance, such as R-CNN, Fast R-CNN, and Faster R-CNN. YOLO (You Only Look Once) is a state-of-the-art technique that outperforms all of these previous approaches, including sliding window object detection, R-CNN, Fast R-CNN, and Faster R-CNN. We will cover YOLO in future videos.
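To see why the sliding-window approach is so compute intensive, here is a minimal Python sketch (the window enumerator and the example image size are illustrative, not from the tutorial): every window position produces a crop that a classifier would have to score, and the count grows quickly with image size and stride.

```python
# Minimal sketch of sliding-window enumeration for object detection.
# In a real detector, each (x, y) crop would be passed to a trained
# classifier, which is what makes the technique so compute intensive.

def sliding_windows(width, height, win_w, win_h, stride):
    """Yield (x, y) top-left corners of every window position."""
    for y in range(0, height - win_h + 1, stride):
        for x in range(0, width - win_w + 1, stride):
            yield x, y

# Example: a 6x4 "image" scanned with 3x2 windows and stride 1
positions = list(sliding_windows(6, 4, 3, 2, 1))
print(len(positions))  # -> 12 crops to classify, even for this tiny grid
```

For a realistic image (say 640x480 with a 64x64 window and stride 8), this enumeration already yields thousands of crops per window scale, which is exactly the inefficiency that R-CNN-family methods and YOLO were designed to avoid.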
In this episode of Data Exposed, Kate Smith walks us through the process of how to create Elastic Jobs for Azure SQL using PowerShell.
Community adoption of Kubernetes (instead of YARN) as a scheduler for Apache Spark has been accelerating since the major improvements in the Spark 3.0 release.
Companies choose to run Spark on Kubernetes to use a single cloud-agnostic technology across their entire stack, and to benefit from improved isolation and resource sharing for concurrent workloads.
In this talk, the founders of Data Mechanics, a serverless Spark platform powered by Kubernetes, will show how to easily get started with Spark on Kubernetes.
We will go through an end-to-end example of building, deploying, and maintaining a data pipeline. This will be a code-heavy session with many tips to help beginner and intermediate Spark developers be successful with Spark on Kubernetes, and live demos running on the Data Mechanics platform. Topics covered include:
– Setting up your environment (data access, node pools)
– Sizing your applications (pod sizes, dynamic allocation)
– Boosting your performance through critical disk and I/O optimizations
– Monitoring your application logs and metrics for debugging and reporting