MSR’s New York City lab is home to some of the best reinforcement learning research on the planet, but ask any of its researchers and they’ll tell you they’re very interested in getting that work out of the lab and into the real world.

One of those researchers is Dr. Akshay Krishnamurthy. Today he explains how his work on feedback-driven data collection and provably efficient reinforcement learning algorithms is helping to move the RL needle in the real-world direction.

Are you curious how data scientists and researchers train agents that make decisions?

Learn how to use reinforcement learning to optimize decision making with Azure Machine Learning. We show you how to get started.

Time Index:

  • [00:36] – What is reinforcement learning?
  • [01:37] – How do reinforcement learning algorithms work?
  • [04:10] – Reinforcement Learning on Azure – Notebook sample
  • [05:17] – Reinforcement Learning Estimator
  • [07:21] – Sample training Python script
  • [09:06] – Training Result
  • [10:15] – What kind of problems can you solve with reinforcement learning?

Reinforcement Learning (RL) uses a “trial and error” method and interacts with the environment to learn an optimal policy for gaining maximum rewards by making the right decisions.
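To make the “trial and error” idea concrete, here is a minimal sketch of an epsilon-greedy agent on a multi-armed bandit, which is the simplest single-state case of RL. Everything in it (the `run_bandit` name, the arm means, the Gaussian reward noise, the 10% exploration rate) is an assumption chosen for illustration, not something taken from the article.

```python
import random


def run_bandit(true_means, steps=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy trial and error on a bandit (illustrative sketch only)."""
    rng = random.Random(seed)
    estimates = [0.0] * len(true_means)  # running estimate of each arm's reward
    counts = [0] * len(true_means)       # how often each arm has been tried

    for _ in range(steps):
        if rng.random() < epsilon:
            # Trial: explore a random arm.
            action = rng.randrange(len(true_means))
        else:
            # Exploit: pick the arm that currently looks best.
            action = max(range(len(true_means)), key=lambda a: estimates[a])

        # The "environment" answers with a noisy reward (assumed Gaussian).
        reward = rng.gauss(true_means[action], 1.0)

        # Error correction: move the estimate toward the observed reward.
        counts[action] += 1
        estimates[action] += (reward - estimates[action]) / counts[action]

    return estimates, counts


estimates, counts = run_bandit([0.0, 0.5, 2.0])
best_arm = max(range(len(estimates)), key=lambda a: estimates[a])
print("estimated rewards:", estimates)
print("arm the agent prefers:", best_arm)
```

The agent is never told which arm is best; it finds out by trying arms and keeping score, which is the same loop full RL algorithms run over many states.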

RL is one of the most popular machine learning techniques among organizations for building solutions in areas such as recommender systems, healthcare, and robotics.

Analytics India Magazine has compiled a list of the top 10 free resources to learn RL.

Lex Fridman interviews David Silver for the Artificial Intelligence podcast.

David Silver leads the reinforcement learning research group at DeepMind. He was lead researcher on AlphaGo and AlphaZero and co-lead on AlphaStar and MuZero, and is behind a lot of other important work in reinforcement learning.

Time Index:

  • 0:00 – Introduction
  • 4:09 – First program
  • 11:11 – AlphaGo
  • 21:42 – Rule of the game of Go
  • 25:37 – Reinforcement learning: personal journey
  • 30:15 – What is reinforcement learning?
  • 43:51 – AlphaGo (continued)
  • 53:40 – Supervised learning and self play in AlphaGo
  • 1:06:12 – Lee Sedol retirement from Go play
  • 1:08:57 – Garry Kasparov
  • 1:14:10 – AlphaZero and self play
  • 1:31:29 – Creativity in AlphaZero
  • 1:35:21 – AlphaZero applications
  • 1:37:59 – Reward functions
  • 1:40:51 – Meaning of life

Machine Learning with Phil explores reinforcement learning with SARSA in this video.

While Q-learning is a powerful algorithm, SARSA is equally powerful for many environments in the OpenAI Gym. In this complete reinforcement learning tutorial, I’ll show you how to code an n-step SARSA agent from scratch.

n-step temporal difference learning is a sort of unifying theory of reinforcement learning that bridges the gap between Monte Carlo methods and temporal difference learning. We extend the agent’s horizon from a single step to n steps; in the limit where n reaches the episode length we end up with Monte Carlo methods, and for n = 1 we have vanilla temporal difference learning.

We’ll implement the n-step SARSA algorithm directly from Sutton and Barto’s excellent reinforcement learning textbook, and use it to balance the cartpole from the OpenAI Gym.
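The n-step idea described above boils down to one quantity, the n-step return: sum n discounted rewards, then bootstrap from a value estimate if the episode has not ended. The sketch below is not the code from the video; the function name `n_step_return` and the list-based episode layout are assumptions made for illustration. For SARSA, `values` would hold the Q estimates of the state-action pairs actually visited.

```python
def n_step_return(rewards, values, t, n, gamma=0.99):
    """n-step return from time t (illustrative sketch).

    rewards[k] is the reward received after acting at step k.
    values[k] is the current value estimate for step k.
    The episode has T = len(rewards) steps; the terminal value is taken as 0.
    """
    T = len(rewards)
    end = min(t + n, T)

    # Discounted sum of up to n real rewards.
    g = 0.0
    for k in range(t, end):
        g += gamma ** (k - t) * rewards[k]

    # Bootstrap from the estimate only if the episode is still running.
    if t + n < T:
        g += gamma ** n * values[t + n]
    return g


rewards = [1.0, 1.0, 1.0, 1.0]
values = [0.5, 0.5, 0.5, 0.5]

one_step = n_step_return(rewards, values, t=0, n=1, gamma=0.5)     # vanilla TD target
two_step = n_step_return(rewards, values, t=0, n=2, gamma=0.5)
monte_carlo = n_step_return(rewards, values, t=0, n=4, gamma=0.5)  # n = episode length
print(one_step, two_step, monte_carlo)
```

With n = 1 the target uses one real reward plus the estimate, and with n equal to the episode length the estimate drops out entirely, leaving the Monte Carlo return.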

Machine Learning with Phil has a great tutorial on how to do Deep Q Learning in PyTorch.

The PyTorch deep learning framework makes coding a deep Q-learning agent in Python easier than ever. We’re going to code up the simplest possible deep Q-learning agent, and show that we only need a replay memory to get some serious results in the Lunar Lander environment from the OpenAI Gym. We don’t really need the target network, though it has been known to help the deep Q-learning algorithm converge.
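For readers who have not met a replay memory before, here is a minimal sketch in plain Python. It is not the implementation from the tutorial: the `ReplayMemory` class, the `Transition` fields, and the `seed` argument are assumptions for illustration, and a real agent would convert the sampled batch to PyTorch tensors before the learning step.

```python
import random
from collections import deque, namedtuple

# One step of experience; the field names are an assumed convention.
Transition = namedtuple(
    "Transition", ["state", "action", "reward", "next_state", "done"]
)


class ReplayMemory:
    """Fixed-size buffer of past transitions (illustrative sketch)."""

    def __init__(self, capacity, seed=None):
        # A deque with maxlen silently drops the oldest item when full.
        self.buffer = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append(Transition(state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling breaks the correlation between consecutive steps.
        batch = self.rng.sample(list(self.buffer), batch_size)
        # Regroup a list of transitions into one Transition of tuples.
        return Transition(*zip(*batch))

    def __len__(self):
        return len(self.buffer)


memory = ReplayMemory(capacity=3, seed=0)
for step in range(5):
    memory.push(state=step, action=0, reward=1.0, next_state=step + 1, done=False)

batch = memory.sample(batch_size=2)
print(len(memory), batch.state)
```

Only the three most recent transitions survive here because the capacity is 3; in practice the buffer holds many thousands of steps.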

OpenAI Gym is a well-known RL toolkit and community for developing and comparing reinforcement learning agents.

OpenAI Gym makes no assumptions about the structure of the agent and works well with any numerical computation library, such as TensorFlow or PyTorch.

Gym also provides various types of environments.

In this hands-on guide, learn how to develop a tic-tac-toe environment from scratch using OpenAI Gym.
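As a rough idea of what such an environment involves, here is a minimal sketch that follows the classic Gym convention of `reset()` returning an observation and `step(action)` returning `(observation, reward, done, info)`. It deliberately does not import Gym or subclass `gym.Env`, so it is not the guide's code. The class name `TicTacToeEnv`, the board encoding (1 for X, -1 for O, 0 for empty), and the reward scheme are all assumptions for illustration.

```python
class TicTacToeEnv:
    """Gym-style tic-tac-toe environment (illustrative sketch, no gym import)."""

    WIN_LINES = [
        (0, 1, 2), (3, 4, 5), (6, 7, 8),  # rows
        (0, 3, 6), (1, 4, 7), (2, 5, 8),  # columns
        (0, 4, 8), (2, 4, 6),             # diagonals
    ]

    def __init__(self):
        self.reset()

    def reset(self):
        self.board = [0] * 9       # 0 = empty, 1 = X, -1 = O
        self.current_player = 1    # X moves first
        self.done = False
        return tuple(self.board)

    def _winner(self):
        for a, b, c in self.WIN_LINES:
            if self.board[a] != 0 and self.board[a] == self.board[b] == self.board[c]:
                return self.board[a]
        return 0

    def step(self, action):
        """Place the current player's mark on cell `action` (0 to 8)."""
        if self.done:
            raise RuntimeError("Episode is over; call reset().")
        if self.board[action] != 0:
            # Assumed rule: an illegal move ends the game with a penalty.
            self.done = True
            return tuple(self.board), -1.0, True, {"illegal_move": True}

        self.board[action] = self.current_player
        reward = 0.0
        if self._winner() != 0:
            reward, self.done = 1.0, True   # reward goes to the player who moved
        elif all(cell != 0 for cell in self.board):
            self.done = True                # draw
        else:
            self.current_player = -self.current_player
        return tuple(self.board), reward, self.done, {"illegal_move": False}


env = TicTacToeEnv()
obs = env.reset()
for action in [0, 3, 1, 4, 2]:  # X takes the top row while O plays the middle row
    obs, reward, done, info = env.step(action)
print(obs, reward, done)
```

Newer Gym and Gymnasium releases return a five-element tuple from `step`, so check the version you are using before adapting this shape to a real `Env` subclass.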

I always knew that reinforcement learning would teach us more about ourselves than any other kind of AI approach. This feeling was backed up in a paper published recently in Nature.

DeepMind, Alphabet’s AI subsidiary, has once again used lessons from reinforcement learning to propose a new theory about the reward mechanisms within our brains.

The hypothesis, supported by initial experimental findings, could not only improve our understanding of mental health and motivation but also validate the current direction of AI research toward building more human-like general intelligence.

It turns out the brain’s reward system works in much the same way—a discovery made in the 1990s, inspired by reinforcement-learning algorithms. When a human or animal is about to perform an action, its dopamine neurons make a prediction about the expected reward.