Here’s an interesting talk on Dask from AnacondaCon 2018.

Tom Augspurger. Scikit-Learn, NumPy, and pandas form a great toolkit for single-machine, in-memory analytics. Scaling them to larger datasets can be difficult, as you have to adjust your workflow to use chunking or incremental learners. Dask provides NumPy- and pandas-like data containers for manipulating larger-than-memory datasets, and dask-ml provides estimators and utilities for modeling larger-than-memory datasets.

These tools scale your usual workflow out to larger datasets. We’ll discuss some of the challenges data scientists run into when scaling out to larger datasets. We’ll then focus on demonstrations of how dask and dask-ml solve those challenges. We’ll see examples of how dask can expose a cluster of machines to scikit-learn’s built-in parallelization framework. We’ll see how dask-ml can train estimators on large datasets.
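
To make the "chunking" mentioned in the abstract concrete, here is a minimal sketch of what adjusting a workflow by hand can look like, using only the Python standard library. The helper names (`iter_chunks`, `chunked_mean`) are hypothetical and chosen for illustration; they are not part of Dask or any other library. The point is that only one chunk and a pair of running totals live in memory at a time, which is roughly the bookkeeping that Dask's containers aim to handle for you.

```python
from typing import Iterable, Iterator, List


def iter_chunks(values: Iterable[float], chunk_size: int) -> Iterator[List[float]]:
    """Yield successive lists holding at most chunk_size items (hypothetical helper)."""
    chunk: List[float] = []
    for v in values:
        chunk.append(v)
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    if chunk:
        # Emit the final, possibly shorter, chunk.
        yield chunk


def chunked_mean(values: Iterable[float], chunk_size: int = 1000) -> float:
    """Compute a mean one chunk at a time, keeping only running totals."""
    total = 0.0
    count = 0
    for chunk in iter_chunks(values, chunk_size):
        total += sum(chunk)
        count += len(chunk)
    if count == 0:
        raise ValueError("no data")
    return total / count


# A generator stands in for a dataset that is too large to load at once.
data = (float(i) for i in range(1_000_000))
result = chunked_mean(data, chunk_size=10_000)
print(result)
```

A mean is easy to split this way because partial sums combine trivially. Operations such as joins, sorts, or model fitting are much harder to chunk by hand, which is where the abstract's motivation for Dask and dask-ml comes in.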

Have you ever wondered whether you could speed up your data science work by parallelizing pandas and NumPy?

And could you speed things up further by using those parallel data frames with libraries like XGBoost or scikit-learn?

Well, then Dask may be just what you’ve been wanting all along.

Dask is a powerful tool, and a strong fit if you use pandas and NumPy and struggle with data that does not fit into RAM. In this article, we will look at how easily Dask data frames fit into the data science workflow.