Here’s an interesting look at how Amazon, a website that originally sold only books, became a member of the trillion-dollar valuation club.
Quartz takes a look at an old technology resurrected to help organizations like CERN cope with their flood of big data.
If any company takes the idea that “data is the new oil” to heart, it’s Facebook. Here’s a sobering WIRED interview with Yael Eisenstat, a former Facebook employee, about the consequences of it all.
The titans of social media are trapped, and we’re all suffering for it. As free services, Facebook, Twitter, and YouTube monetize you by keeping you engaged, so they can show you more ads. The services are designed to exploit our brain chemistry, flashing us notifications and giving us one more hit of algorithm-recommended video. If they didn’t, their revenue would dwindle and shareholders would be unhappy.
If you’re looking to take the Microsoft DP-200 and DP-201 exams, then you need to read this blog post carefully and study everything recommended in it.
It helped me to pass both tests with flying colors and, since the contents of both exams are similar, this one post will help you with both.
Also, I recommend taking DP-201 before taking DP-200.
Here’s a list of the skills and objectives measured on the DP-200 exam, taken from the official exam objectives. The percentage next to each objective area represents the share of the exam’s questions that come from that area. Below each topic, you will find links to the resources that I have found helpful.
CloudAcademy has an intro piece on Apache Spark on Azure Databricks.
Apache Spark is an open-source framework for doing big data processing. It was developed as a replacement for Apache Hadoop’s MapReduce framework. Both Spark and MapReduce process data on compute clusters, but one of Spark’s big advantages is that it does in-memory processing, which can be orders of magnitude faster than the disk-based processing that MapReduce uses. There are plenty of other differences between the two systems, as well, but we don’t need to go into the details here.
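To make the in-memory versus disk-based distinction a little more concrete, here is a toy sketch in plain Python. It does not use Spark or MapReduce at all; the function names `run_disk_based` and `run_in_memory` are made up for illustration, and the JSON temp files merely stand in for the intermediate results that a disk-based framework writes out between stages. Both versions compute the same answer; the difference is only in where the data lives between steps.

```python
import json
import os
import tempfile


def run_disk_based(records, stages):
    """Run each stage, writing every intermediate result to disk and reading it back."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "input.json")
        with open(path, "w") as f:
            json.dump(records, f)
        for i, stage in enumerate(stages):
            # Each stage starts by loading the previous stage's output from disk.
            with open(path) as f:
                data = json.load(f)
            result = stage(data)
            # ...and ends by persisting its own output for the next stage.
            path = os.path.join(tmp, f"stage_{i}.json")
            with open(path, "w") as f:
                json.dump(result, f)
        with open(path) as f:
            return json.load(f)


def run_in_memory(records, stages):
    """Run each stage on data that stays in memory the whole time."""
    data = records
    for stage in stages:
        data = stage(data)
    return data


# A hypothetical three-stage pipeline: double, filter, then sum.
stages = [
    lambda rows: [r * 2 for r in rows],
    lambda rows: [r for r in rows if r % 3 == 0],
    lambda rows: [sum(rows)],
]

records = list(range(10))
disk_result = run_disk_based(records, stages)
memory_result = run_in_memory(records, stages)
print(disk_result, memory_result)
```

On a list this small the timing difference is negligible, but the disk-based version does a full write and read for every stage, which is the kind of overhead that in-memory processing avoids.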
BBC Click explores the impact of GDPR one year later and offers a brief glimpse into what our smartphones know about us.
At work recently, a question came up about whether Spark or Tez is better. Here’s an article that offers some interesting perspectives.
On paper, Spark and Tez have a lot in common: both possess in-memory capabilities, can run on top of Hadoop YARN and support all data types from any data sources. So, what’s the difference?
Here’s an interesting look at how big big data is from Computerphile, for those not satisfied with my “Costco Test for Big Data.”
BBC Click asks the question: Is Big Brother Watching You? (Spoiler alert: yes)