Machine Learning with Phil explores reinforcement learning with SARSA in this video.

While Q-learning is a powerful algorithm, SARSA is equally powerful for many environments in the OpenAI Gym. In this complete reinforcement learning tutorial, I’ll show you how to code an n-step SARSA agent from scratch.

n-step temporal difference learning is a sort of unifying theory of reinforcement learning that bridges the gap between Monte Carlo methods and one-step temporal difference learning. We extend the agent’s horizon from a single step to n steps; in the limit where n goes to the episode length we recover Monte Carlo methods, and for n = 1 we get vanilla temporal difference learning.
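That spectrum can be made concrete with the n-step return itself. A minimal sketch, with illustrative function and variable names of my own choosing:

```python
def n_step_return(rewards, gamma, q_value_at_step_n):
    """Discounted sum of the next n rewards plus a bootstrapped Q-value:
    G = r_1 + gamma*r_2 + ... + gamma^(n-1)*r_n + gamma^n * Q(s_n, a_n)."""
    G = 0.0
    for i, r in enumerate(rewards):       # the n rewards collected after time t
        G += (gamma ** i) * r
    # Bootstrap on the value estimate n steps ahead (0 if the episode ended).
    G += (gamma ** len(rewards)) * q_value_at_step_n
    return G
```

With a single reward in the list this reduces to the vanilla TD target r + γQ(s′, a′); as the list grows to the full episode the bootstrap term is discounted away and what remains is the Monte Carlo return.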

We’ll implement the n-step SARSA algorithm directly from Sutton and Barto’s excellent reinforcement learning textbook, and use it to balance the pole in the CartPole environment from the OpenAI Gym.
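The textbook algorithm is worth seeing in miniature. Below is a tabular sketch in the spirit of the n-step SARSA pseudocode in Sutton and Barto (section 7.2 in the second edition); the environment interface (`env_step`, `env_reset`) and all hyperparameter defaults are my own illustrative assumptions, and a real CartPole agent would additionally need to discretize the continuous state:

```python
import random
from collections import defaultdict

def n_step_sarsa(env_step, env_reset, actions, n=4, gamma=0.99,
                 alpha=0.1, epsilon=0.1, episodes=200):
    """Tabular n-step SARSA. env_reset() -> initial state;
    env_step(s, a) -> (next_state, reward, done)."""
    Q = defaultdict(float)

    def policy(s):
        # Epsilon-greedy action selection from the current Q estimates.
        if random.random() < epsilon:
            return random.choice(actions)
        return max(actions, key=lambda a: Q[(s, a)])

    for _ in range(episodes):
        S = [env_reset()]
        A = [policy(S[0])]
        R = [0.0]                # R[t] is the reward received at time t
        T = float('inf')         # episode length, unknown until termination
        t = 0
        while True:
            if t < T:
                s_next, r, done = env_step(S[t], A[t])
                S.append(s_next)
                R.append(r)
                if done:
                    T = t + 1
                else:
                    A.append(policy(s_next))
            tau = t - n + 1      # the time step whose estimate is updated
            if tau >= 0:
                # n-step return: discounted rewards, plus a bootstrap term
                # when the episode extends past tau + n.
                G = sum(gamma ** (i - tau - 1) * R[i]
                        for i in range(tau + 1, min(tau + n, T) + 1))
                if tau + n < T:
                    G += gamma ** n * Q[(S[tau + n], A[tau + n])]
                sa = (S[tau], A[tau])
                Q[sa] += alpha * (G - Q[sa])
            if tau == T - 1:
                break
            t += 1
    return dict(Q)
```

Note how updates lag n steps behind the agent: at time t we update the estimate for time t − n + 1, which is exactly what lets the target include n real rewards before bootstrapping.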

Machine Learning with Phil has a great tutorial on how to do deep Q-learning in PyTorch.

The PyTorch deep learning framework makes coding a deep Q-learning agent in Python easier than ever. We’re going to code up the simplest possible deep Q-learning agent, and show that a replay memory alone is enough to get some serious results in the Lunar Lander environment from the OpenAI Gym. We don’t really need a target network, though it is known to help the deep Q-learning algorithm converge.
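The replay memory is the one moving part worth seeing in isolation. A minimal sketch, independent of any deep learning framework; the class and method names here are illustrative, not PyTorch API:

```python
import random
from collections import deque

class ReplayMemory:
    """Fixed-size buffer of (state, action, reward, next_state, done)
    transitions. Sampling minibatches uniformly at random breaks the
    temporal correlation between consecutive experiences, which is what
    stabilizes deep Q-learning."""
    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)  # oldest transitions fall off

    def store(self, transition):
        self.buffer.append(transition)

    def sample(self, batch_size):
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)
```

In the training loop you would store every transition the agent experiences, then, once the buffer is large enough, sample a batch on each step and compute the Q-learning targets on that batch instead of on the most recent transition.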

Siraj Raval explores the world of automated trading with reinforcement learning, in a few lines of Python code!

In this video, he demonstrates how a popular reinforcement learning technique called “Q-learning” allows an agent to approximate the value of stocks in a portfolio. The literature on reinforcement learning is incredibly rich. There are so many concepts, such as TD-learning and actor-critic methods, that have real-world potential.
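The Q-learning update at the heart of that technique fits in a few lines. A sketch with states and actions left abstract; in the trading setting they might be market observations and buy/hold/sell decisions, but every name below is an illustrative assumption:

```python
def q_learning_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.99):
    """One tabular Q-learning step: move Q(s, a) toward the bootstrapped
    target r + gamma * max_a' Q(s', a'). The max over all actions (rather
    than the action actually taken next) is what makes it off-policy,
    in contrast to SARSA."""
    best_next = max(Q.get((s_next, b), 0.0) for b in actions)
    td_error = r + gamma * best_next - Q.get((s, a), 0.0)
    Q[(s, a)] = Q.get((s, a), 0.0) + alpha * td_error
    return Q[(s, a)]
```

The learning rate alpha controls how far each estimate moves toward its target; the discount gamma trades off immediate reward against the best estimated future value.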