Databricks recently held a webinar on how they worked with Virgin Hyperloop One engineers.

The presenters discuss the goals, implementation, and outcome of moving from pandas code to Koalas and adopting MLflow, with lots of code, notebooks, and demos.

Come hear Patryk Oleniuk, Software Engineer at Virgin Hyperloop One (VHO), discuss how VHO reduced processing time by 95% while changing less than 1% of its previously single-threaded, pandas-based Python code. Attendees of this webinar will learn:

- How VHO leverages public and private transportation data to optimize Hyperloop design
- How to ‘Sparkify’ (scale) your pandas code by using ‘Koalas’ with minimal code changes
- How to use ‘Koalas’ and MLflow for sweeping machine learning models and experiment results

Featured Speakers:
- Patryk Oleniuk, Lead Data Engineer, Virgin Hyperloop One
- Yifan Cao, Senior Product Manager, Databricks
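The "minimal code changes" idea can be sketched as follows. Koalas mirrors the pandas API (since Spark 3.2 the same project ships as `pyspark.pandas`), so logic written against methods both libraries share can run on either kind of DataFrame. The function and column names (`mean_trip_minutes`, `origin`, `minutes`) are hypothetical illustrations, not VHO's code; only pandas is needed to run the sketch, and the Koalas path, which assumes a Spark cluster with Koalas installed, is shown in comments.

```python
import pandas as pd


def mean_trip_minutes(df):
    """Average trip duration per origin zone (hypothetical columns).

    Uses only groupby, mean, and sort_index, which pandas and Koalas both
    implement, so the same body works on either DataFrame type.
    """
    return df.groupby("origin")["minutes"].mean().sort_index()


trips = pd.DataFrame(
    {
        "origin": ["A", "B", "A", "C", "B"],
        "minutes": [10.0, 20.0, 30.0, 5.0, 40.0],
    }
)

# Single-machine pandas run.
print(mean_trip_minutes(trips))

# On a Spark cluster with Koalas installed (assumption), only the input changes:
#   import databricks.koalas as ks
#   kdf = ks.from_pandas(trips)               # distribute the data once
#   print(mean_trip_minutes(kdf).to_pandas())  # same function, Spark execution
```

Because the function body sticks to methods both libraries provide, scaling out becomes a question of where the DataFrame comes from rather than a rewrite; any pandas call Koalas does not support would still need changing.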

Resources:

Slides: https://www.slideshare.net/databricks/from-pandas-to-koalas-reducing-timetoinsight-for-virgin-hyperloops-data

Koalas Notebook: https://pages.databricks.com/rs/094-YMS-629/images/koalas_webinar_code%20-%20Copy.html

Pandas is Python’s primary library for data analysis and manipulation, and pandas 1.0.0 is now officially released!

Although at first sight this latest version looks little different to the user from the last 0.x release, 0.25.3, it brings plenty of enhanced features that boost performance and lay a better foundation for the long run. The developers present 1.0.0 as a stable version of pandas with a strengthened API, which has also been cleaned of many deprecations from prior versions. Here are the most notable improvements that come with 1.0.0.
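A few of the 1.0.0 additions can be tried directly: the experimental `pd.NA` missing-value marker, the dedicated `"string"` and `"boolean"` dtypes, and `DataFrame.convert_dtypes()`. This is a small sketch with made-up data, assuming pandas 1.0.0 or later.

```python
import pandas as pd

# Dedicated string dtype (new in 1.0.0); missing entries become pd.NA.
names = pd.Series(["koala", None, "panda"], dtype="string")
print(names.dtype)
print(names[1] is pd.NA)  # True

# Nullable boolean dtype uses three-valued (Kleene) logic with pd.NA:
# NA | True is True, so every entry below is True.
flags = pd.Series([True, None, False], dtype="boolean")
print(flags | True)

# convert_dtypes() (new in 1.0.0) picks a nullable dtype for each column.
df = pd.DataFrame({"trips": [1, None, 3], "city": ["Oslo", "Lima", None]})
print(df.dtypes)                   # trips is float64 because of the missing value
print(df.convert_dtypes().dtypes)  # trips becomes Int64, city a string dtype
```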

Data School has a great video on the pandas library. In it, you’ll use pandas to answer questions about a real-world dataset. Through each exercise, you’ll learn important data science skills as well as “best practices” for using pandas. By the end of the tutorial, you’ll be more fluent at using pandas to correctly and efficiently answer your own data science questions.

The pandas library is a powerful tool for multiple phases of the data science workflow, including data cleaning, visualization, and exploratory data analysis. However, proper data science requires careful coding, and pandas will not stop you from creating misleading plots, drawing incorrect conclusions, ignoring relevant data, including misleading data, or executing incorrect calculations.
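To make that point concrete, here is a small sketch with made-up data showing how pandas defaults can quietly ignore relevant data: reductions such as `mean()` skip missing values, and `value_counts()` drops them, unless you opt out with `skipna=False` or `dropna=False`.

```python
import numpy as np
import pandas as pd

ratings = pd.Series([5.0, np.nan, np.nan, 1.0])

# mean() skips NaN by default, so half of the rows silently drop out.
print(ratings.mean())              # 3.0
print(ratings.mean(skipna=False))  # nan
print(ratings.count(), len(ratings))  # 2 4

colors = pd.Series(["red", None, None, "blue"])
# value_counts() drops missing values unless dropna=False is passed.
print(colors.value_counts().sum())              # 2
print(colors.value_counts(dropna=False).sum())  # 4
```

Checking `count()` against `len()` before summarizing is a cheap way to notice that missing values are being excluded.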

In this tutorial session from PyCon Cleveland 2018, you’ll perform a variety of data science tasks on a handful of real-world datasets using pandas.