Forbes points out that the term “Big Data” has been eclipsed by “Data Science” in the hype cycle. However, the Great Hype Cycle resembles Game of Thrones, and I think we can all agree that “AI” or “Machine Learning” is next in line for the Iron Throne of Hype.

In a world in which “big data” and “data science” seem to adorn every technology-related news article and social media post, have the terms finally reached the saturation point of public interest? As the use of large amounts of data has become mainstream, is “data science” replacing “big data” as the focus of the hype?

It never hurts to practice the fundamentals, and understanding SQL is fundamental for any well-rounded data scientist. Here’s an interesting close-up look at T-SQL, the SQL “dialect” used by SQL Server.

Like any programming language, T-SQL has its share of common bugs and pitfalls, some of which cause incorrect results and others of which cause performance problems. In many of those cases, best practices can help you avoid getting into trouble. I surveyed fellow Microsoft Data Platform MVPs asking […]
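
One widely cited pitfall of the “incorrect results” kind (not necessarily one from the survey) is using NOT IN against a subquery that can return NULL. The sketch below reproduces it with Python’s built-in sqlite3 module rather than SQL Server, on the assumption that SQLite’s three-valued NULL logic matches T-SQL’s default behavior for this case; the tables and rows are hypothetical.

```python
import sqlite3

# In-memory SQLite database (a stand-in for SQL Server in this sketch).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER);
    INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    -- One order has a NULL customer_id, which is what triggers the pitfall.
    INSERT INTO orders VALUES (10, 1), (11, NULL);
""")

# Pitfall: "2 NOT IN (1, NULL)" evaluates to UNKNOWN, not TRUE,
# so every customer is filtered out.
not_in = cur.execute("""
    SELECT name FROM customers
    WHERE id NOT IN (SELECT customer_id FROM orders)
    ORDER BY id
""").fetchall()

# Safer pattern: NOT EXISTS only looks for actual matching rows.
not_exists = cur.execute("""
    SELECT name FROM customers AS c
    WHERE NOT EXISTS (SELECT 1 FROM orders AS o WHERE o.customer_id = c.id)
    ORDER BY c.id
""").fetchall()

print(not_in)      # []
print(not_exists)  # [('Bob',), ('Cy',)]
```

The NOT EXISTS form returns the customers without orders, which is usually what the NOT IN query was meant to do.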

Christian Wade joins Scott Hanselman to show you how to unlock petabyte-scale datasets in Azure in a way that was not previously possible. Learn how to use the aggregations feature in Power BI to enable interactive analysis over big data.
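
The general idea behind aggregations can be sketched without Power BI: a query at a coarse enough grain is answered from a small pre-aggregated table instead of scanning the detail table. The Python below is a conceptual illustration only; the table, its columns, and the functions `build_aggregation` and `total_sales` are hypothetical and do not reflect Power BI’s API or how it decides which table to query.

```python
from collections import defaultdict

# Hypothetical detail table: (date, product, sales_amount).
# In a real big-data source this could be billions of rows.
detail_rows = [
    ("2018-07-01", "bike", 120.0),
    ("2018-07-01", "bike", 80.0),
    ("2018-07-01", "helmet", 25.0),
    ("2018-07-02", "bike", 200.0),
]

def build_aggregation(rows):
    """Pre-aggregate sales by (date, product), producing a smaller table."""
    agg = defaultdict(float)
    for date, product, amount in rows:
        agg[(date, product)] += amount
    return dict(agg)

agg_table = build_aggregation(detail_rows)

def total_sales(product, agg=agg_table):
    """Answer a query at or above the aggregation grain from the small table."""
    return sum(v for (_, p), v in agg.items() if p == product)

print(len(detail_rows), len(agg_table))  # 4 3
print(total_sales("bike"))               # 400.0
```

A query that needs finer detail than the aggregation keeps (for example, individual transactions) would still have to go to the detail table.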

In this episode of Azure Friday, Thomas Alex discusses how Microsoft uses Apache Kafka for HDInsight to power Siphon, a data ingestion service for internal use.

Apache Kafka for HDInsight is an enterprise-grade, open-source, streaming ingestion service. Microsoft created Siphon as a highly available and reliable service to ingest massive amounts of data for processing in near real time. Siphon handles ingestion of over a trillion events per day across multiple business-critical scenarios at Microsoft. In this episode, learn how Siphon uses Apache Kafka for HDInsight as its scalable pub/sub message queue.
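
To make the “pub/sub message queue” idea concrete, here is a toy, in-memory model of the Kafka concepts involved: a topic split into append-only partitions, keyed messages routed to a partition, and consumer groups that track their own offsets. It is a hypothetical sketch, not the Kafka client API; the classes are invented for illustration, and the CRC32 partitioner is an assumption (Kafka’s default partitioner uses a different hash).

```python
import zlib
from collections import defaultdict

class ToyTopic:
    """Hypothetical stand-in for a Kafka-style topic: partitioned append-only logs."""

    def __init__(self, num_partitions):
        self.partitions = [[] for _ in range(num_partitions)]

    def publish(self, key, value):
        # Same key -> same partition, which preserves per-key ordering.
        p = zlib.crc32(key.encode()) % len(self.partitions)
        self.partitions[p].append((key, value))
        return p, len(self.partitions[p]) - 1  # (partition, offset)

class ToyConsumerGroup:
    """Keeps one committed offset per partition, so each group reads each message once."""

    def __init__(self, topic):
        self.topic = topic
        self.offsets = defaultdict(int)

    def poll(self):
        batch = []
        for p, log in enumerate(self.topic.partitions):
            batch.extend(log[self.offsets[p]:])
            self.offsets[p] = len(log)  # "commit" the new position
        return batch

topic = ToyTopic(num_partitions=3)
for i in range(5):
    topic.publish(f"device-{i % 2}", i)

group_a = ToyConsumerGroup(topic)
group_b = ToyConsumerGroup(topic)  # an independent subscriber
first = group_a.poll()
second = group_a.poll()
print(len(first), len(second))  # 5 0
print(len(group_b.poll()))      # 5
```

Because every group keeps its own offsets, many downstream consumers can read the same ingested stream independently, which is the property that makes a log like this useful as the backbone of an ingestion service.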

In this video, Murali Krishnaprasad discusses Interactive Query (also called Hive LLAP, or Low Latency Analytical Processing, or Live Long and Process), which is an Azure HDInsight cluster type. Interactive Query supports in-memory caching, which makes Hive queries super-fast and interactive. See how to use HDInsight Interactive Query to analyze extremely large datasets (~100TB) in common file formats such as ORC and CSV using common BI/SQL tools including Zeppelin notebooks and VS Code.
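
Why in-memory caching makes repeated queries interactive can be shown with a generic least-recently-used (LRU) cache of file blocks. This is a simplified, hypothetical sketch; LLAP’s actual cache is considerably more sophisticated, and `BlockCache` and `slow_read` are invented names.

```python
from collections import OrderedDict

class BlockCache:
    """Hypothetical LRU cache that keeps recently read file blocks in memory."""

    def __init__(self, capacity, read_from_storage):
        self.capacity = capacity
        self.read_from_storage = read_from_storage
        self.blocks = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, block_id):
        if block_id in self.blocks:
            self.blocks.move_to_end(block_id)  # mark as most recently used
            self.hits += 1
            return self.blocks[block_id]
        self.misses += 1
        data = self.read_from_storage(block_id)  # the slow path
        self.blocks[block_id] = data
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # evict the least recently used block
        return data

def slow_read(block_id):
    # Stand-in for a read from remote cloud storage.
    return f"rows of block {block_id}"

cache = BlockCache(capacity=4, read_from_storage=slow_read)
for run in range(3):      # the same query executed three times
    for b in range(4):    # each run scans blocks 0..3
        cache.get(b)
print(cache.hits, cache.misses)  # 8 4
```

Only the first run pays the storage cost; later runs over the same data are served from memory.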

In this video, Katherine Kampf, a PM on the Azure Big Data team, talks about the newly introduced ML Services in Azure HDInsight.

ML Services brings together Microsoft innovations and contributions from the open-source community (R, Python, and AI toolkits) on a single enterprise-grade platform. Any open-source R or Python machine learning package can work side by side with any proprietary innovation from Microsoft.

ML Services includes a highly scalable, distributed set of algorithms, such as RevoScaleR, revoscalepy, and MicrosoftML, that can work on data larger than physical memory and run in a distributed manner on a wide variety of platforms.
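
The core trick that lets an algorithm handle data larger than physical memory is to process it in chunks and combine small per-chunk summaries. The sketch below computes a mean and sample variance that way, using the pairwise update of Chan et al. It is not the RevoScaleR or revoscalepy implementation; the CSV layout, the `value` column, and the helper names are assumptions for illustration.

```python
import csv
import io

def iter_chunks(reader, chunk_size):
    """Yield lists of floats from the 'value' column, chunk_size rows at a time."""
    chunk = []
    for row in reader:
        chunk.append(float(row["value"]))
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk

def chunked_mean_var(chunks):
    """Merge per-chunk (count, mean, M2) summaries; only one chunk is in memory at a time."""
    n, mean, m2 = 0, 0.0, 0.0
    for chunk in chunks:
        nb = len(chunk)
        mb = sum(chunk) / nb
        m2b = sum((x - mb) ** 2 for x in chunk)
        delta = mb - mean
        total = n + nb
        mean += delta * nb / total
        m2 += m2b + delta ** 2 * n * nb / total
        n = total
    return n, mean, m2 / (n - 1)  # sample variance

# A small in-memory CSV stands in for a file far larger than RAM.
data = "value\n" + "\n".join(str(i) for i in range(1, 11))
reader = csv.DictReader(io.StringIO(data))
n, mean, var = chunked_mean_var(iter_chunks(reader, chunk_size=3))
print(n, mean, round(var, 4))  # 10 5.5 9.1667
```

Because each chunk is reduced to three numbers before the next one is read, memory use depends on the chunk size rather than on the size of the dataset; the same summaries could also be computed on different nodes and merged.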

Terabytes, petabytes, and yottabytes. We toss these terms around quite casually, but do we really know how large they are?
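
A little arithmetic helps. Assuming decimal (SI) prefixes, where each step up multiplies by 1,000 (binary units such as the tebibyte, 2^40 bytes, differ slightly), the sizes can be computed directly:

```python
# Decimal (SI) byte units: each step is a factor of 1,000.
UNITS = ["kilobyte", "megabyte", "gigabyte", "terabyte",
         "petabyte", "exabyte", "zettabyte", "yottabyte"]
SIZES = {name: 10 ** (3 * (i + 1)) for i, name in enumerate(UNITS)}

print(SIZES["terabyte"])                        # 1000000000000
print(SIZES["yottabyte"] // SIZES["terabyte"])  # 1000000000000
```

In other words, a single yottabyte would fill a trillion one-terabyte drives.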

Using grains of rice and retro computer graphics reminiscent of the Computer Chronicles, the YouTube channel “It’s OK to Be Smart” explores how big big data is and what the future of data storage may be.