Databricks live-streamed this interview with Matei Zaharia, an assistant professor at Stanford CS and co-founder and Chief Technologist of Databricks, the data and AI platform startup.

During his Ph.D., Matei started the Apache Spark project, which is now one of the most widely used frameworks for distributed data processing. He also co-started other widely used data and AI software such as MLflow, Apache Mesos and Spark Streaming.

Here’s a great listicle on why Python is the go-to language of big data.

Don’t get me wrong, R is also a great choice, but I think the article is geared toward convincing developers who use other languages to check out Python.

Big Data is the most valuable commodity in present times! The data generated by companies and people is growing so much that the data generated would reach 175 zettabytes in 2025 whereas it is around 50 zettabytes currently. And Python is the best programming language to manage this Big […]

ThorogoodBI explores the use of Databricks for data engineering purposes in this webinar.

Whether you’re looking to transform and clean large volumes of data or collaborate with colleagues to build advanced analytics jobs that can be scaled and run automatically, Databricks offers a Unified Analytics Platform that promises to make your life easier.

In the second of two recorded webcasts, Thorogood consultants Jon Ward and Robbie Shaw showcase Databricks’ data transformation and data movement capabilities, show how the tool aligns with cloud computing services, and highlight the security, flexibility and collaboration aspects of Databricks. They also look at Databricks Delta Lake and how it offers improved storage for both large-scale datasets and real-time streaming data.

Databricks hosted this webinar introducing Apache Spark, the platform that Databricks is based upon.

Abstract: scikit-learn is one of the most popular open-source machine learning libraries among data science practitioners.

This workshop will walk through what machine learning is, the different types of machine learning, and how to build a simple machine learning model. This workshop focuses on the techniques of applying and evaluating machine learning methods, rather than the statistical concepts behind them. We will be using data released by the New York Times (https://github.com/nytimes/covid-19-data).

Prior basic Python and pandas experience is required.
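To give a flavor of the "apply and evaluate" workflow the abstract describes, here is a minimal sketch using scikit-learn. It is not taken from the workshop: the data is synthetic (a noisy linear trend standing in for a daily count, not the New York Times dataset), and the choice of a linear regression and of mean absolute error as the metric are assumptions made for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (an assumption, not the NYT dataset):
# a noisy linear trend of some count over 100 days.
rng = np.random.default_rng(seed=0)
days = np.arange(100).reshape(-1, 1)  # feature matrix, shape (100, 1)
counts = 50 + 3.0 * days.ravel() + rng.normal(0, 5, size=100)  # target values

# Hold out 25% of the rows so the model is scored on data it has not seen.
X_train, X_test, y_train, y_test = train_test_split(
    days, counts, test_size=0.25, random_state=0
)

# Fit on the training rows only.
model = LinearRegression()
model.fit(X_train, y_train)

# Evaluate on the held-out rows.
predictions = model.predict(X_test)
mae = mean_absolute_error(y_test, predictions)
r2 = model.score(X_test, y_test)  # coefficient of determination on the test set

print(f"slope={model.coef_[0]:.2f} intercept={model.intercept_:.2f}")
print(f"test MAE={mae:.2f} R^2={r2:.3f}")
```

Scoring on held-out rows rather than on the training rows is what makes the evaluation meaningful: it estimates how the model behaves on data it was not fitted to.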

Previous webinars in the series:

  • Watch Part 1, Intro to Python: https://youtu.be/HBVQAlv8MRQ (to learn about Python)
  • Watch Part 2, Data Analysis with pandas: https://youtu.be/riSgfbq3jpY
  • Watch Part 3, Machine Learning: https://youtu.be/g103iO-izoI

Are you working with Apache Kafka and want to simplify management of your infrastructure?

Lena Hall joins Scott Hanselman to show how you can keep using Apache Kafka libraries for hundreds of projects and try Azure Event Hubs behind the scenes, so you can focus on code instead of maintaining infrastructure.
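As a rough sketch of what "keep using Kafka libraries" can look like in practice, the snippet below builds the client settings commonly documented for pointing a Kafka client at the Event Hubs Kafka endpoint (port 9093, SASL_SSL, and the literal user name `$ConnectionString`). The helper name `event_hubs_kafka_config`, the namespace `my-namespace`, and the connection string are hypothetical placeholders, and the key names follow the librdkafka style used by clients such as confluent-kafka; check them against your own client's documentation before relying on them.

```python
def event_hubs_kafka_config(namespace: str, connection_string: str) -> dict:
    """Build Kafka client settings that target an Event Hubs namespace.

    Hypothetical helper for illustration; it only assembles a dict and
    does not open any network connection.
    """
    return {
        # Event Hubs exposes its Kafka-compatible endpoint on port 9093.
        "bootstrap.servers": f"{namespace}.servicebus.windows.net:9093",
        "security.protocol": "SASL_SSL",
        "sasl.mechanism": "PLAIN",
        # The user name is the literal string "$ConnectionString";
        # the password is the namespace connection string itself.
        "sasl.username": "$ConnectionString",
        "sasl.password": connection_string,
    }


# Placeholder values (assumptions); substitute your own namespace and secret.
config = event_hubs_kafka_config(
    "my-namespace",
    "Endpoint=sb://my-namespace.servicebus.windows.net/;SharedAccessKeyName=...;SharedAccessKey=...",
)

for key, value in config.items():
    # Avoid printing the secret.
    shown = "***" if key == "sasl.password" else value
    print(f"{key} = {shown}")
```

The resulting dict would be passed to a Kafka producer or consumer constructor in place of the settings for a self-managed cluster; the application code that produces and consumes messages stays the same.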


AI is everywhere – and now even included in Power BI Desktop.

No matter if you’re a business user, analyst, or data scientist – Power BI has AI capabilities tailored to you.

In this video, learn how to use the R language, integrate an Azure Machine Learning service when loading data, and understand what kinds of insights Power BI can deliver automatically.

To learn more, visit: https://community.powerbi.com

Azure Synapse is a limitless analytics service that brings together enterprise data warehousing and Big Data analytics. It gives you the freedom to query data on your terms, using either serverless on-demand or provisioned resources—at scale. Azure Synapse brings these two worlds together with a unified experience to ingest, prepare, manage, and serve data for immediate BI and machine learning needs. All of this leverages our limitless Azure Data Lake Storage service for any type of data.

Microsoft Mechanics explains.

Microsoft’s Project Silica aims to show that glass is the future of long-term data storage.

To prove its usefulness outside the lab, Microsoft partnered with Warner Bros. to write the 1978 Superman film into glass with lasers.

To see the whole process and the Superman glass, CNET visited Microsoft’s Research Lab in Cambridge, England and Warner Bros. Studios in Burbank, California.