Building a curated data lake on real-time data is an emerging data warehouse pattern with Delta Lake.

In the real world, however, we often face dynamically changing schemas, which are a major challenge to incorporate without downtime.

In this presentation we show how we built a robust streaming ETL pipeline that handles changing schemas and unseen event types with zero downtime. The pipeline infers changed schemas, adjusts the underlying tables, and creates new tables and ingestion streams when it detects a new event type. We will show in detail how to infer schemas on the fly and how to track and store these schemas when you don't have the luxury of a schema registry in the system.
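To give a flavor of the on-the-fly inference idea, here is a minimal PySpark sketch. It reads raw JSON strings from Kafka, re-infers the schema per micro-batch, and appends to a Delta table with schema merging so newly added fields do not require downtime. The topic name, paths, and the per-micro-batch re-inference strategy are illustrative assumptions, not the speakers' actual pipeline.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Read events as opaque JSON strings so no fixed schema is required up front.
raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker:9092")   # assumed broker
       .option("subscribe", "events")                      # assumed topic
       .load()
       .selectExpr("CAST(value AS STRING) AS json"))

def write_batch(batch_df, batch_id):
    # Re-infer the schema from the strings actually present in this
    # micro-batch, so fields added upstream are picked up on the fly.
    parsed = spark.read.json(batch_df.rdd.map(lambda row: row.json))
    (parsed.write
        .format("delta")
        .mode("append")
        .option("mergeSchema", "true")   # evolve the Delta table's schema
        .save("/mnt/lake/events_bronze"))  # assumed path

(raw.writeStream
    .foreachBatch(write_batch)
    .option("checkpointLocation", "/mnt/lake/_chk/events_bronze")
    .start())
```

Re-inferring per batch trades some compute for resilience; the inferred schema can also be persisted (for example, to a small Delta audit table) to stand in for the missing schema registry.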

With potentially hundreds of streams, how we deploy these streams and keep them operational on Databricks is an important consideration.

This session on the Databricks YouTube channel presents a web application that calculates real-time health scores at very high speed using Spark on Kubernetes.

A health score represents a machine's remaining lifetime and is commonly used as a benchmark when deciding whether to replace the machine with a new one for high-productivity maintenance. It is therefore important to observe the health scores of a large number of machines in a factory without delay.

To cope with this issue, BISTel applies stream processing with Spark and serves a real-time health score application.
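As a hedged sketch of this kind of streaming computation, the snippet below derives a per-machine health score from sensor readings over a sliding window with Spark Structured Streaming. The score formula, column names, and topic are illustrative assumptions, not BISTel's actual model.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

# Parse sensor readings from Kafka (assumed topic and schema).
sensors = (spark.readStream
           .format("kafka")
           .option("kafka.bootstrap.servers", "broker:9092")
           .option("subscribe", "sensor-readings")
           .load()
           .selectExpr("CAST(value AS STRING)")
           .select(F.from_json(
               "value",
               "machine_id STRING, vibration DOUBLE, temperature DOUBLE, ts TIMESTAMP"
           ).alias("r"))
           .select("r.*"))

# Toy health score: penalize high average vibration and temperature over a
# 5-minute sliding window; 100 means healthy, lower means degraded.
scores = (sensors
          .withWatermark("ts", "10 minutes")
          .groupBy(F.window("ts", "5 minutes", "1 minute"), "machine_id")
          .agg(F.avg("vibration").alias("vib"),
               F.avg("temperature").alias("temp"))
          .withColumn("health_score",
                      100 - 0.5 * F.col("vib") - 0.2 * F.col("temp")))

# Emit updated scores continuously (console sink for illustration).
(scores.writeStream
       .outputMode("update")
       .format("console")
       .start())
```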

Anthony Chu joins Donovan Brown to show how to deliver live updates from Azure Functions to web, mobile, and desktop apps with Azure SignalR Service.

Learn how to send real-time messages over WebSockets from your serverless apps with a few lines of code.
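For a flavor of what that looks like, here is a minimal sketch of an HTTP-triggered Azure Function (Python, v1 programming model) that broadcasts a message through a SignalR output binding. The binding itself is declared in function.json (assumed, not shown): type "signalR", direction "out", with a hub name such as "notifications"; the target method name is also an illustrative assumption.

```python
import json
import azure.functions as func

def main(req: func.HttpRequest,
         signalRMessages: func.Out[str]) -> func.HttpResponse:
    payload = req.get_json()
    # Each SignalR output message names a client-side target method and the
    # arguments to deliver; the service fans it out over WebSockets.
    signalRMessages.set(json.dumps({
        "target": "newUpdate",      # client method invoked on connected apps
        "arguments": [payload]      # data pushed to web, mobile, and desktop
    }))
    return func.HttpResponse("update broadcast", status_code=200)
```

Clients first call a negotiate endpoint to obtain a connection to the SignalR Service, then receive these messages without the function app managing any persistent connections itself.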
