Bloomberg takes a look at the unique role of data science in professional basketball.

From the description:

With her PhD in math, Ivana Seric had expected to wind up with a career in academia—but thanks to the growing use of statistical analysis in the NBA, she took a job with the Philadelphia 76ers instead. As a data scientist, she helps the team’s coaches devise smarter strategies to win.

What will the future of Data Science work look like when technologies like AutoML promise to automate much of it?

Here’s an interesting look from TDWI.

AutoML is the umbrella term for tools and platforms that automate the steps of selecting the right model and optimizing its hyperparameters to generate the best model possible for a given dataset. Libraries such as auto-sklearn and Auto-WEKA provide these AutoML capabilities.
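To make that concrete, here’s a minimal sketch using auto-sklearn (assuming the library is installed; the dataset and time budgets are arbitrary choices to keep the example small):

```python
# A minimal auto-sklearn sketch: the library searches over candidate models
# and their hyperparameters automatically within a given time budget.
import autosklearn.classification
from sklearn.datasets import load_digits
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

automl = autosklearn.classification.AutoSklearnClassifier(
    time_left_for_this_task=120,  # total seconds for the whole search
    per_run_time_limit=30,        # seconds allowed per candidate model
)
automl.fit(X_train, y_train)
print(accuracy_score(y_test, automl.predict(X_test)))
```

The point of AutoML is that the model selection and tuning happen inside `fit()`; no estimator or hyperparameter is specified by hand.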

Here’s a good article answering a very common question from folks who want to shift their career focus toward analytics, cloud computing, data science, and machine learning.

Breaking into the field of data science takes some navigating before a career can be launched. Landing a job in data science isn’t easy, especially with so many job seekers competing for analytics roles.

Given the rise of data science and machine learning as an in-demand career, many people are wondering how to get started as a Data Scientist. Forbes explores how to get started in this article.

Many people looking to break into data science, from undergraduates to career changers, have asked me how I attained my current data science position at Pacific Life. I’ve referred them to many different resources, including discussions I’ve had on the Dataquest.io blog and the Scatter Podcast. In the interest of providing job seekers with a comprehensive view of what I’ve learned works, I’ve put together the five most valuable lessons. I’ve written this article to make your data science job hunt as easy and efficient as possible.

Here’s a great explainer video that walks through the Support Vector Machine (SVM) algorithm.

From the video description:

A Support Vector Machine (SVM) is a discriminative classifier formally defined by a separating hyperplane. In other words, given labeled training data (supervised learning), the algorithm outputs an optimal hyperplane that categorizes new examples. To understand SVMs a bit better, let’s first take a look at why they are called support vector machines. Say we have some sample data of features that classify whether an observed picture is a dog or a cat; we can, for example, look at snout length and ear geometry, if we assume that dogs generally have longer snouts and cats have much pointier ears.
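To make the idea concrete, here’s a minimal scikit-learn sketch of that dog-vs-cat example (the feature values are invented for illustration):

```python
# A linear SVM on two made-up features: snout length and ear pointiness.
import numpy as np
from sklearn import svm

# Each row: [snout_length_cm, ear_pointiness]; label 1 = dog, 0 = cat.
X = np.array([[8.0, 0.2], [7.5, 0.3], [9.1, 0.1],    # dogs
              [3.2, 0.9], [2.8, 0.8], [3.5, 0.95]])  # cats
y = np.array([1, 1, 1, 0, 0, 0])

clf = svm.SVC(kernel="linear")
clf.fit(X, y)

# The "support vectors" are the training samples closest to the separating
# hyperplane; they alone determine where the decision boundary lies.
print(clf.support_vectors_)
print(clf.predict([[6.0, 0.4]]))  # classify a new animal
```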

The most popular dataset on Kaggle is Credit Card Fraud Detection. It’s an easy-to-understand problem space that affects just about everyone, and fraud detection is a practical application that many businesses care about. There’s also something intrinsically cool about stopping crime with AI.

Here’s an interesting article on how to implement a fraud detection system with TensorFlow, PySpark, and Cortex.

While it would be cool to just build an accurate model, it would be more useful to build a production application that can automatically scale to handle more data, update when new data becomes available, and serve real-time predictions. This usually requires a lot of DevOps work, but we can do it with minimal effort using Cortex, an open source machine learning infrastructure platform. Cortex converts declarative configuration into scalable machine learning pipelines. In this guide, we’ll see how to use Cortex to build and deploy a fraud detection API using Kaggle’s dataset.
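For a flavor of the modeling side, here’s a minimal TensorFlow (Keras) sketch of a fraud classifier for the Kaggle dataset. This is a generic illustration, not the article’s actual Cortex pipeline, and it assumes the features are already scaled:

```python
# A minimal fraud-detection classifier. The Kaggle dataset has 30 features
# (Time, Amount, and the PCA components V1-V28), and fraud is rare
# (~0.17% of transactions), so precision/recall beat accuracy as metrics.
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation="relu", input_shape=(30,)),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # probability of fraud
])

model.compile(
    optimizer="adam",
    loss="binary_crossentropy",
    metrics=[tf.keras.metrics.Precision(), tf.keras.metrics.Recall()],
)

# Weighting the rare fraud class more heavily is one common way to handle
# the class imbalance (the 50.0 here is an assumed, illustrative value):
# model.fit(X_train, y_train, epochs=5, class_weight={0: 1.0, 1: 50.0})
```

The production concerns the article focuses on (scaling, retraining, real-time serving) sit on top of a model like this; Cortex’s role is to wrap it in a deployable API.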

Here’s an interesting article from CodeProject defining the cycles of data science and how they relate to business cycles and the fairly well-established SDLC framework. Although some will argue that data science is “pure science” and this cycle belongs under the “data engineering” label, organizations that fail to move innovations efficiently from “the lab” to production are not going to be competitive.

By its simplest definition, data science is a multi-disciplinary field comprising multiple processes to extract knowledge or useful output from input data. The output may be predictive or descriptive analysis, a report, business intelligence, etc. Data science has well-defined lifecycles, similar to any other project, and CRISP-DM and TDSP are among the proven standards.

Many people new to data science might believe that this field is just about R, Python, Spark, Hadoop, SQL, traditional machine learning techniques or statistical modeling. While those technologies are a large part of the field, the answer is more nuanced than that.

Here’s a thoughtful article from Vincent Granville on Data Science Central about this very question. Among the resources it links is a list titled “24 Articles About Core Data Science.”

Neural networks have become a hot topic over the last few years, but finding the most efficient way to build one is still more art than science. In fact, it’s more trial and error than art. However, MIT may have solved that problem.

The NAS (Neural Architecture Search, in this context) algorithm they developed “can directly learn specialized convolutional neural networks (CNNs) for target hardware platforms — when run on a massive image dataset — in only 200 GPU hours,” MIT News reports. This is a massive improvement over the 48,000 GPU hours Google reported spending to develop a state-of-the-art NAS algorithm for image classification. The researchers’ goal is to democratize AI by letting researchers experiment with various aspects of CNN design without needing enormous GPU arrays to do the front-end work. If finding state-of-the-art approaches requires 48,000 GPU hours, precious few people, even at large institutions, will ever have the opportunity to try.