Davis Busteed walks us through building a proof of concept for Spark Streaming from a Kafka Source to Hive.
Check out the README and resource files at
https://github.com/dbusteed/kafka-spark-streaming-example
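For orientation, here is a minimal PySpark sketch of the general Kafka-to-Hive pattern. It is not the code from the video or the repository above. It assumes Spark's Structured Streaming API (2.4 or later, for `foreachBatch`), the `spark-sql-kafka` package on the classpath, and a SparkSession created with Hive support. The function names, the broker address, the topic, and the table name are all hypothetical placeholders.

```python
from typing import Dict


def kafka_source_options(bootstrap_servers: str, topic: str) -> Dict[str, str]:
    """Build the options passed to Spark's built-in Kafka source."""
    return {
        "kafka.bootstrap.servers": bootstrap_servers,
        "subscribe": topic,
        # Assumption: read the topic from the beginning on first start.
        "startingOffsets": "earliest",
    }


def start_kafka_to_hive(spark, bootstrap_servers, topic, table, checkpoint_dir):
    """Start a streaming query that appends Kafka message values to a Hive table.

    `spark` is assumed to be a SparkSession built with enableHiveSupport().
    """
    raw = (
        spark.readStream.format("kafka")
        .options(**kafka_source_options(bootstrap_servers, topic))
        .load()
    )

    # Kafka delivers keys and values as bytes; cast the value to a string.
    messages = raw.selectExpr("CAST(value AS STRING) AS message", "timestamp")

    def append_batch(batch_df, batch_id):
        # Each micro-batch is a regular DataFrame, so the batch writer applies.
        batch_df.write.mode("append").saveAsTable(table)

    return (
        messages.writeStream.foreachBatch(append_batch)
        .option("checkpointLocation", checkpoint_dir)
        .start()
    )
```

With a running broker and a Hive-enabled session, a call might look like `start_kafka_to_hive(spark, "localhost:9092", "events", "default.events", "/tmp/checkpoints/events")`, followed by `awaitTermination()` on the returned query.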
At work recently, a question came up about whether Spark or Tez is better. Here’s an article that compares the two.
On paper, Spark and Tez have a lot in common: both possess in-memory capabilities, can run on top of Hadoop YARN and support all data types from any data sources. So, what’s the difference?
In this video, Murali Krishnaprasad discusses Interactive Query (also called Hive LLAP, or Low Latency Analytical Processing, or Live Long and Process), which is an Azure HDInsight cluster type. Interactive Query supports in-memory caching, which makes Hive queries super-fast and interactive. See how to use HDInsight Interactive Query to analyze extremely large datasets (~100TB) in common file formats such as ORC and CSV using common BI/SQL tools including Zeppelin notebooks and VS Code.
Rafael Coss, manager of Big Data Enablement at IBM, explains Hive in 3 minutes.