Databricks livestreamed this tech talk earlier today.

Developers and data scientists around the world have developed tens of thousands of open source projects to help track, understand, and address the spread of COVID-19. Given the sheer volume, finding a project to contribute to can prove challenging. To make this easier, we built a recommendation system that highlights projects based on the programming languages and keywords a user enters.

This talk will go through the full cycle of implementing this system: gathering data, building/tracking models, deploying the model, and creating a UI to utilize the model.
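The talk's actual model is not described in this summary, so the following is only a minimal sketch of the idea of recommending projects from languages and keywords. It assumes a simple overlap score (one point for a language match, one point per keyword found in the description). The `Project` type, the sample projects, and the scoring rule are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Project:
    # Hypothetical project record; a real system would pull these from a code host.
    name: str
    language: str
    description: str


def score(project, languages, keywords):
    """One point for a language match plus one point per keyword in the description."""
    wanted_languages = {lang.lower() for lang in languages}
    language_hit = 1 if project.language.lower() in wanted_languages else 0
    words = set(project.description.lower().split())
    keyword_hits = sum(1 for keyword in keywords if keyword.lower() in words)
    return language_hit + keyword_hits


def recommend(projects, languages, keywords, top_n=3):
    """Return the names of the best-scoring projects, dropping those that score zero."""
    scored = [(score(p, languages, keywords), p) for p in projects]
    scored = [(s, p) for s, p in scored if s > 0]
    # Highest score first; ties are broken alphabetically by name.
    scored.sort(key=lambda pair: (-pair[0], pair[1].name))
    return [p.name for _, p in scored[:top_n]]


# Made-up sample data, for illustration only.
PROJECTS = [
    Project("covid-dashboard", "JavaScript", "interactive dashboard of case counts"),
    Project("epi-forecast", "Python", "forecast case counts with time series models"),
    Project("contact-trace", "Kotlin", "mobile contact tracing app"),
]

print(recommend(PROJECTS, languages=["Python"], keywords=["forecast", "dashboard"]))
# prints ['epi-forecast', 'covid-dashboard']
```

A production recommender would more likely use learned text features than exact word matches, but the input and output shape would be similar: languages and keywords in, a ranked list of projects out.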

Databricks just posted part 3 of a 3-part online technical workshop series on Managing the Complete Machine Learning Lifecycle with MLflow. If you’re interested in learning about machine learning and MLflow, this workshop series is for you!

Details:

This workshop is an introduction to MLflow. Machine Learning (ML) development brings many new complexities beyond the traditional software development lifecycle. Unlike in traditional software development, ML developers want to try multiple algorithms, tools and parameters to get the best results, and they need to track this information to reproduce work. In addition, developers need to use many distinct systems to productionize models.

To solve these challenges, MLflow, an open source project, simplifies the entire ML lifecycle. MLflow introduces simple abstractions to package reproducible projects, track results, and encapsulate models that can be used with many existing tools. It also provides a central repository to share models, accelerating the ML lifecycle for organizations of any size.
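To make the tracking idea concrete, here is a minimal, standard-library sketch of what "tracking results" involves: each run records its parameters and metrics so the best configuration can be found and reproduced later. The `RunTracker` class and the `train_and_evaluate` placeholder are hypothetical and are not MLflow's API. MLflow's own tracking module exposes comparable operations through `mlflow.start_run`, `mlflow.log_param`, and `mlflow.log_metric`.

```python
import uuid


class RunTracker:
    """In-memory stand-in for an experiment tracker (hypothetical, not MLflow's API)."""

    def __init__(self):
        self.runs = {}

    def start_run(self, name):
        # Each run gets a unique ID so repeated experiments never overwrite each other.
        run_id = uuid.uuid4().hex
        self.runs[run_id] = {"name": name, "params": {}, "metrics": {}}
        return run_id

    def log_param(self, run_id, key, value):
        self.runs[run_id]["params"][key] = value

    def log_metric(self, run_id, key, value):
        self.runs[run_id]["metrics"][key] = value

    def best_run(self, metric, lower_is_better=True):
        # Only consider runs that actually recorded the requested metric.
        candidates = [r for r in self.runs.values() if metric in r["metrics"]]
        pick = min if lower_is_better else max
        return pick(candidates, key=lambda r: r["metrics"][metric])


def train_and_evaluate(alpha):
    """Placeholder for real training: a made-up error that is lowest at alpha = 0.1."""
    return round((alpha - 0.1) ** 2, 6)


tracker = RunTracker()
for alpha in (0.01, 0.1, 1.0):
    run_id = tracker.start_run(name=f"alpha={alpha}")
    tracker.log_param(run_id, "alpha", alpha)
    tracker.log_metric(run_id, "error", train_and_evaluate(alpha))

best = tracker.best_run("error")
print(best["name"], best["params"])
```

Because every run stores the parameters next to the metrics, the winning configuration can be rerun without relying on memory or notebook history.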


Databricks just streamed this workshop on managing the machine learning lifecycle with MLflow.

Workshop 1 of 3 | Introduction to MLflow: How to Use MLflow Tracking

Level: Beginner/Intermediate Data Scientist or ML Engineer

Details: This workshop is an introduction to MLflow. Machine Learning (ML) development brings many new complexities beyond the traditional software development lifecycle. Unlike in traditional software development, ML developers want to try multiple algorithms, tools and parameters to get the best results, and they need to track this information to reproduce work. In addition, developers need to use many distinct systems to productionize models.

Databricks talks about the latest developments and best practices for managing the full ML lifecycle on Databricks with MLflow.

Part 1: Opening Keynote and Demo

  • MLOps and ML Platforms State of the Industry, opening Keynote with Matei Zaharia, Co-founder and CTO at Databricks and Clemens Mewald, Director of Product Management at Databricks – https://youtu.be/9Ehh7Vl7ByM – Slideshare: https://www.slideshare.net/databricks/mlops-virtual-event-building-machine-learning-platforms-for-the-full-lifecycle
  • Operationalizing Data Science & ML on Databricks using MLflow (Demo) with Sean Owen, Principal Solution Architect at Databricks – https://youtu.be/cxAmu9w8BFo
  • Live Q&As – https://youtu.be/AQqqK5hRY5g


Databricks recently held a webinar on how they worked with Virgin Hyperloop One engineers.

They discuss the goals, implementation, and outcome of moving from Pandas code to Koalas code and using MLflow. Lots of code, notebooks, demos, etc.

Come hear Patryk Oleniuk, Software Engineer at Virgin Hyperloop (VHO), discuss how VHO has reduced processing time by 95% while changing less than 1% of its previously single-threaded, pandas-based Python code. Attendees of this webinar will learn:

  • How VHO leverages public and private transportation data to optimize Hyperloop design
  • How to ‘Sparkify’ (scale) your pandas code by using ‘Koalas’ with minimal code changes
  • How to use ‘Koalas’ and MLflow for sweeping machine learning models and experiment results

Featured Speakers

  • Patryk Oleniuk, Lead Data Engineer, Virgin Hyperloop One
  • Yifan Cao, Senior Product Manager, Databricks

Resources:

Slides: https://www.slideshare.net/databricks/from-pandas-to-koalas-reducing-timetoinsight-for-virgin-hyperloops-data

Koalas Notebook: https://pages.databricks.com/rs/094-YMS-629/images/koalas_webinar_code%20-%20Copy.html

MLflow enables data scientists to track and distribute experiments, package and share models across frameworks, and deploy them, whether the target environment is a personal laptop or a cloud data center. Here’s an interesting take from the Register.

MLflow was designed to take some of the pain out of machine learning in organizations that don’t have the coding and engineering muscle of the hyperscalers. It works with every major ML library, algorithm, deployment tool and language.

Databricks first introduced MLflow last June. Immediately, startups and larger enterprises started using it to manage their machine learning lifecycles. Since then, more than 80 contributors from some 40 companies have worked on the open source machine learning tool, and it regularly sees more than 500,000 downloads per month.

And check out this recent news:

Unveiled at the Spark + AI Summit 2019, sponsored by Databricks, the new Databricks and Microsoft collaboration is a sign of the companies’ deepening ties, but it is also too new to say how effectively the partnership will advance MLflow for developers, said Mike Gualtieri, a Forrester analyst.