The term “Lambda Architecture” stands for a generic, scalable and fault-tolerant data processing architecture. As the hyperscalers now offer various PaaS services for data ingestion, storage and processing, the need for a revised, cloud-native implementation of the Lambda Architecture is arising. In this talk, Andrei Varanoch demonstrates the blueprint for such a Lambda Architecture implementation in Microsoft Azure, with Azure Databricks (a PaaS Spark offering) as a key component.
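For readers new to the idea, here is a minimal sketch of how a Lambda Architecture is usually described: a batch layer that recomputes views from the full event log, a speed layer that covers events that arrived since the last batch run, and a serving layer that merges the two. This is plain Python, not the blueprint from the talk, and the function names and the `device` field are hypothetical, chosen only for illustration.

```python
from collections import Counter

def batch_view(master_events):
    """Batch layer: recompute counts from the full, immutable event log."""
    return Counter(event["device"] for event in master_events)

def speed_view(recent_events):
    """Speed layer: count only events that arrived after the last batch run."""
    return Counter(event["device"] for event in recent_events)

def serve(batch, speed):
    """Serving layer: merge both views to answer a query."""
    return batch + speed

# Hypothetical sample data: events already in the master log, and newer ones.
master = [{"device": "a"}, {"device": "b"}, {"device": "a"}]
recent = [{"device": "a"}, {"device": "c"}]

merged = serve(batch_view(master), speed_view(recent))
print(dict(merged))
```

In a cloud-native version, each of these functions would be replaced by a managed service or a Spark job, but the split of responsibilities stays the same.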
Apache Spark is 10 years old. This article in Analytics India Magazine explores what led to Spark’s widespread adoption and what will keep it going into the future.
Spark has been dubbed the official “in-memory replacement for MapReduce”, the disk-based computational engine at the heart of early Hadoop clusters. Spark took off because it reflects the shift in processing paradigm toward more memory-intensive pipelines: if your cluster has a decent amount of memory, processing in Spark will be faster, and its API is simpler than MapReduce’s. Spark is also fast because the work is distributed, so the processing time of most operations (including reads) decreases roughly linearly with the number of machines.
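The scaling argument rests on data being split into partitions that can be processed independently and then combined. The sketch below mimics that pattern in plain Python on a single machine. It is not Spark code, and the helper names are hypothetical. In a real cluster, each partition would be handled by a different executor.

```python
from collections import Counter
from functools import reduce

def split_into_partitions(records, num_partitions):
    """Distribute records round-robin across partitions."""
    return [records[i::num_partitions] for i in range(num_partitions)]

def count_words(partition):
    """Per-partition work: needs no data from any other partition."""
    counts = Counter()
    for line in partition:
        counts.update(line.split())
    return counts

def merge_counts(left, right):
    """Combine step: merge two partial results into one."""
    return left + right

lines = ["spark is fast", "spark is distributed", "mapreduce is disk based"]
partitions = split_into_partitions(lines, 3)

# Each partial count could run on a separate machine; here they run in turn.
partial_counts = [count_words(partition) for partition in partitions]
total = reduce(merge_counts, partial_counts, Counter())
print(total.most_common(2))
```

Because `count_words` touches only its own partition, adding machines shrinks the per-machine share of the work, which is where the roughly linear speedup comes from. The merge step and data movement are what keep it from being exactly linear.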
ComputerPhile has a great video where Rebecca Tickle explains the inner workings of Apache Spark and what makes it better than MapReduce. As an added bonus, she uses Scala in the demo.
It’s also interesting to note that she used Spark in her day job pulling IoT data from trucks (“Lorries”).
Bryan Cafferky introduces the awesomeness that is Databricks on Azure: a collaborative data science platform available as PaaS.
Here’s a particularly interesting tutorial on Spark by Frank Kane, the other guy named Frank in Data Science. 😉
Here’s a great talk on how to use CosmosDB in conjunction with Apache Spark.
Spark is gaining momentum in the big data space. Watch this video for a demonstration of how you can use your favorite developer tools to debug Spark applications.
Product info: azure.microsoft.com/en-us/services/hdinsight/apache-spark/
Learn more: docs.microsoft.com/en-us/azure/hdinsight/spark/apache-spark-load-data-run-query
Mike Olson, Chief Strategy Officer and Co-Founder at Cloudera, explains Apache Spark’s origins, its rise in popularity in the open source community, and how Spark is primed to replace MapReduce as the general processing engine in Hadoop.