The Microsoft Azure channel explains how KPMG Japan uses Azure Arc to build out a seamless data solution.

KPMG Ignition Tokyo, the centerpiece of KPMG Japan’s digital strategy, delivers specialty software solutions to its global clients. With a multi-cloud and hybrid approach, the firm is rolling out its next-generation, AI-based audit software built on Azure, and implementing Azure Arc to deliver seamless solutions for clients across multiple hybrid data estates.

Adam Marczak explains Azure Data Factory Mapping Data Flow in this video.

With Azure Data Factory Mapping Data Flow, you can create fast and scalable on-demand transformations by using a visual user interface. In just minutes you can leverage the power of Spark without writing a single line of code.

In this episode I give you an introduction to what Mapping Data Flow for Data Factory is and how it can solve your day-to-day ETL challenges. In a short demo I will consume data from blob storage, transform movie data, aggregate it and save multiple outputs back to blob storage.

Sample code and data: https://github.com/MarczakIO/azure4everyone-samples/tree/master/azure-data-factory-mapping-data-flows 
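
For readers who want a feel for the source, transform, aggregate, and multiple-sink shape of that demo before watching, here is a minimal plain-Python sketch. It is not the video's data flow and does not use Data Factory or Spark; the column names (`title`, `year`, `rating`), the sample rows, and the function names are all assumptions made up for illustration, and in-memory strings stand in for blob storage.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample rows; the real demo dataset's columns may differ.
RAW_CSV = """title,year,rating
Alpha,1999,8
Beta,1999,6
Gamma,2005,9
"""


def read_movies(text):
    """Source step: parse CSV text into typed rows."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        rows.append(
            {
                "title": row["title"],
                "year": int(row["year"]),
                "rating": float(row["rating"]),
            }
        )
    return rows


def average_rating_by_year(movies):
    """Aggregate step: group by year and average the rating."""
    totals = defaultdict(lambda: [0.0, 0])  # year -> [sum, count]
    for movie in movies:
        totals[movie["year"]][0] += movie["rating"]
        totals[movie["year"]][1] += 1
    return {year: s / n for year, (s, n) in sorted(totals.items())}


def to_csv(rows, fieldnames):
    """Sink step: serialize rows to CSV text."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


movies = read_movies(RAW_CSV)
by_year = average_rating_by_year(movies)

# Two outputs, mirroring the idea of writing to more than one sink.
cleaned_out = to_csv(movies, ["title", "year", "rating"])
summary_out = to_csv(
    [{"year": y, "avg_rating": r} for y, r in by_year.items()],
    ["year", "avg_rating"],
)
```

In Mapping Data Flow each of these functions would roughly correspond to a box on the visual canvas, with Spark doing the work at scale.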

Learn how to extract value from your data to bring the impact of your low-code solutions to a whole new level.

PowerApps already enable creation of useful business applications with minimal effort.

In this session, you will learn about how and why to connect your applications to Azure services responsible for Big Data.

You will see an example of an application that keeps track of NYC taxi logs and provides logistical information for greater business insights. You will leave this session with a confident understanding of what Big Data connection options PowerApps provide, how to connect your application to Big Data, as well as how to reference and visualize it.

Additional Resources: Power Apps Devs

Gaurav Sen explains NoSQL databases in this introductory video.

NoSQL is a popular database storage method. It keeps data as key-value pairs. The advantages and disadvantages of NoSQL compared with RDBMS (which uses SQL) are discussed here, using the Cassandra architecture as an example.

Video index:

  • 1:08 NoSQL explanation and comparison
  • 10:27 Cassandra Architecture
  • 18:00 Quorum
  • 21:30 Compaction of SSTables
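
The quorum segment is easier to follow with the arithmetic in hand. The sketch below is an illustration under stated assumptions, not a transcription of the video: it uses the usual majority definition (half the replicas, rounded down, plus one) and the common rule of thumb that a read is guaranteed to see the latest acknowledged write when read and write acknowledgements together exceed the replica count. The function names are hypothetical.

```python
def quorum_size(replicas):
    """Majority of replicas: floor(replicas / 2) + 1."""
    return replicas // 2 + 1


def read_sees_latest_write(replicas, write_acks, read_acks):
    """True when any read set must overlap any write set (R + W > N)."""
    return read_acks + write_acks > replicas


replication_factor = 3  # assumed value for illustration
q = quorum_size(replication_factor)

# Quorum writes plus quorum reads overlap on at least one replica.
strong = read_sees_latest_write(replication_factor, q, q)

# Single-replica writes and reads can miss each other.
weak = read_sees_latest_write(replication_factor, 1, 1)
```

With three replicas the quorum is two, so one replica can be down while quorum reads and writes still succeed.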

Here’s an interesting idea that combines K8S, AI, Big Data, and HPC.

With the emergence and support of Mobile, IoT and Edge Computing technologies, we are seeing the next wave of workloads running on cloud native platforms — Artificial Intelligence (AI; including Machine Learning and Deep Learning), Big Data, and High-Performance Computing (HPC) — where a large amount of compute resources running “batch jobs” connected to massive data lakes is essential.

Microsoft Mechanics shows us a practical use case for Predictive Maintenance, Safety, and Efficiency through Microsoft Azure Synapse.

Find out how Azure Synapse is part of the next-generation data and analytics platform for global aviation tech company, GE Aviation. Jeremy Chapman speaks with Luke Bowman, Senior Product Manager at GE Aviation’s Digital Group, to discuss how they are evaluating Azure Synapse to drive the development of predictive maintenance analytics at scale to help airlines, as well as to get ahead of issues to optimize flight safety and operational efficiency.

If you are new to Azure Synapse, it’s Microsoft’s limitless analytics platform that brings enterprise data warehousing and big data processing together into a single service, removing the traditional constraints for analyzing data of all shapes and sizes.

The Career Force goes through her top 5 free dataset resources in this video.

  1. Data.gov: https://data.gov Data.gov is a large dataset aggregator and the home of the US Government’s open data.
  2. FiveThirtyEight: https://data.fivethirtyeight.com/ This is a great resource to not only see datasets, but also see how a well-respected analytics organization provides meaningful insights and commentary on the data.
  3. Kaggle: https://www.kaggle.com/ Kaggle is a great resource not only for free datasets, but for data science topics in general.
  4. Data.World: https://data.world/ There are hundreds of thousands of free datasets for anyone that sets up an account on data.world.
  5. Google Dataset Search: https://datasetsearch.research.google.com/ By accessing thousands of different repositories across the web, Google Dataset Search provides access to almost 25 million different publicly available datasets.

Databricks live streamed this interview with Matei Zaharia, an assistant professor at Stanford CS and co-founder and Chief Technologist of Databricks, the data and AI platform startup.

During his Ph.D., Matei started the Apache Spark project, which is now one of the most widely used frameworks for distributed data processing. He also co-started other widely used data and AI software such as MLflow, Apache Mesos and Spark Streaming.

Here’s a great listicle on why Python is the go-to language of big data.

Don’t get me wrong, R is also a great choice, but I think the article is geared toward convincing developers of other languages to check out Python.

Big Data is the most valuable commodity in present times! The data generated by companies and people is growing so much that the data generated would reach 175 zettabytes in 2025 whereas it is around 50 zettabytes currently. And Python is the best programming language to manage this Big […]

ThorogoodBI explores the use of Databricks for data engineering purposes in this webinar.

Whether you’re looking to transform and clean large volumes of data or collaborate with colleagues to build advanced analytics jobs that can be scaled and run automatically, Databricks offers a Unified Analytics Platform that promises to make your life easier.

In the second of two recorded webcasts, Thorogood consultants Jon Ward and Robbie Shaw showcase Databricks’ data transformation and data movement capabilities and how the tool aligns with cloud computing services, and highlight the security, flexibility and collaboration aspects of Databricks. We’ll also look at Databricks Delta Lake, and how it offers improved storage for both large-scale datasets and real-time streaming data.