Big Data Engineering closely examines Spark Standalone Architecture. Apache Spark has a well-defined layered architecture in which all the Spark components and layers are loosely coupled....
Details
Here’s an interesting talk by Albert Franziu Cros on a CI/CD setup composed of a Spark Streaming job in K8s consuming from Kafka. Over the last...
Details
This talk on the Databricks YouTube channel presents a web application that calculates real-time health scores at high speed using Spark on Kubernetes. A...
Details
With cloud-native on the rise, the conversation about infrastructure costs has spread from R&D directors to everyone in R&D: “How much does a VM cost?” “can...
Details
Community adoption of Kubernetes (instead of YARN) as a scheduler for Apache Spark has been accelerating since the major improvements in the Spark 3.0 release. Companies...
Details
NLP is a key component in many data science systems that must understand or reason about text. This hands-on tutorial uses the open-source Spark NLP...
Details
This talk from the Databricks YouTube channel is about date-time processing in Spark 3.0, its API, and the implementation changes made since Spark 2.4. In particular, it...
Details
Delta Lake is an open-source storage management system (storage layer) that brings ACID transactions and time travel to Apache Spark and big data workloads. The...
Details
This video with David Vrba focuses on some internal features of Spark SQL that are not well described in the official documentation, with a strong emphasis...
Details