Machine Learning with Phil explores reinforcement learning with SARSA in this video.

While Q learning is a powerful algorithm, SARSA is equally powerful for many environments in the OpenAI Gym. In this complete reinforcement learning tutorial, I’ll show you how to code an n-step SARSA agent from scratch.

n-step temporal difference learning is a sort of unifying theory of reinforcement learning that bridges the gap between Monte Carlo methods and temporal difference learning. We extend the agent’s horizon from a single step to n steps; in the limit where n reaches the episode length we end up with Monte Carlo methods, and for n = 1 we have vanilla temporal difference learning.
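To make that bridge concrete, here’s a minimal sketch (illustrative only, not code from the video) of the n-step return: the next n discounted rewards, bootstrapped with a Q estimate for the state-action pair reached after n steps.

```python
def n_step_return(rewards, gamma, q_next=0.0):
    """Return G_{t:t+n} = r_{t+1} + gamma*r_{t+2} + ... + gamma^{n-1}*r_{t+n} + gamma^n * Q(s_{t+n}, a_{t+n}).

    rewards: the n rewards observed after time t
    gamma:   discount factor
    q_next:  Q estimate for the state-action pair reached after n steps
             (0 if the episode terminated inside the window)
    """
    G = sum(gamma ** i * r for i, r in enumerate(rewards))
    return G + gamma ** len(rewards) * q_next
```

With a single reward plus the bootstrap term this is the familiar one-step SARSA target; with every reward from the episode and no bootstrap it’s the Monte Carlo return.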

We’ll implement the n-step SARSA algorithm directly from Sutton and Barto’s excellent reinforcement learning textbook, and use it to balance the CartPole from the OpenAI Gym.
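For reference, here’s a rough sketch of what that episode loop can look like, following the pseudocode in Sutton and Barto. It isn’t the exact code from the video: the hyperparameters, the tabular Q dictionary, and the discretize() helper that bins CartPole’s continuous observation are illustrative assumptions, and it assumes the classic Gym API where reset() returns the observation and step() returns a 4-tuple.

```python
import numpy as np
import gym

# Assumed hyperparameters, for illustration only
N, GAMMA, ALPHA, EPS = 4, 0.99, 0.1, 0.1

env = gym.make('CartPole-v0')
n_actions = env.action_space.n

def discretize(obs, bins=10, low=-2.4, high=2.4):
    # Hypothetical helper: bin the continuous observation into a hashable tuple
    return tuple(int(np.digitize(o, np.linspace(low, high, bins))) for o in obs)

Q = {}  # (state, action) -> value, defaulting to 0

def q(s, a):
    return Q.get((s, a), 0.0)

def choose_action(s):
    # Epsilon-greedy action selection over the tabular estimates
    if np.random.random() < EPS:
        return env.action_space.sample()
    return int(np.argmax([q(s, a) for a in range(n_actions)]))

for episode in range(500):
    obs = env.reset()
    states = [discretize(obs)]
    actions = [choose_action(states[0])]
    rewards = [0.0]                         # dummy R_0 keeps indices aligned
    T, t = float('inf'), 0
    while True:
        if t < T:
            obs, reward, done, info = env.step(actions[t])
            states.append(discretize(obs))
            rewards.append(reward)
            if done:
                T = t + 1
            else:
                actions.append(choose_action(states[t + 1]))
        tau = t - N + 1                     # time step whose estimate we update
        if tau >= 0:
            G = sum(GAMMA ** (i - tau - 1) * rewards[i]
                    for i in range(tau + 1, min(tau + N, T) + 1))
            if tau + N < T:
                G += GAMMA ** N * q(states[tau + N], actions[tau + N])
            s_tau, a_tau = states[tau], actions[tau]
            Q[(s_tau, a_tau)] = q(s_tau, a_tau) + ALPHA * (G - q(s_tau, a_tau))
        if tau == T - 1:
            break
        t += 1
```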

Machine Learning with Phil dives into Deep Q Learning with TensorFlow 2 and Keras.

Dueling Deep Q Learning is easier than ever with TensorFlow 2 and Keras. In this tutorial for deep reinforcement learning beginners, we’ll code up the dueling deep Q network and agent from scratch, with no prior experience needed. We’ll train an agent to land a spacecraft on the surface of the moon, using the LunarLander environment from the OpenAI Gym.
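Before building the agent, it helps to poke at the environment itself. This is just an illustrative snippet rather than part of the tutorial code; it assumes the classic Gym API and that the Box2D extras are installed (pip install gym[box2d]).

```python
import gym

# LunarLander-v2: 8-dimensional continuous state, 4 discrete actions
env = gym.make('LunarLander-v2')
print(env.observation_space.shape)   # (8,)
print(env.action_space.n)            # 4

obs = env.reset()
done, score = False, 0.0
while not done:
    action = env.action_space.sample()          # random policy, just to see the loop
    obs, reward, done, info = env.step(action)
    score += reward
print('random agent score:', score)
```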

The dueling network can be applied to both regular and double Q learning, since it’s just a new network architecture; it doesn’t require any change to the Q learning or double Q learning algorithms themselves. We simply have to change up our feed-forward pass to accommodate the new value and advantage streams, and combine them by adding the state value to the mean-subtracted advantages, as in the sketch below.
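Here’s one way the network could look in TensorFlow 2 / Keras. This is a hedged sketch rather than the exact code from the video: the layer sizes and names (fc1_dims, fc2_dims, the advantage() helper) are assumptions, but the combination step follows the dueling architecture, Q(s,a) = V(s) + (A(s,a) - mean_a A(s,a)).

```python
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.layers import Dense

class DuelingDeepQNetwork(keras.Model):
    """Sketch of a dueling Q network: shared trunk, then separate
    value (V) and advantage (A) streams, combined as Q = V + (A - mean(A))."""
    def __init__(self, n_actions, fc1_dims=128, fc2_dims=128):
        super().__init__()
        self.dense1 = Dense(fc1_dims, activation='relu')
        self.dense2 = Dense(fc2_dims, activation='relu')
        self.V = Dense(1, activation=None)           # state value stream
        self.A = Dense(n_actions, activation=None)   # advantage stream

    def call(self, state):
        x = self.dense1(state)
        x = self.dense2(x)
        V = self.V(x)
        A = self.A(x)
        # Subtracting the mean advantage keeps V and A identifiable
        Q = V + (A - tf.reduce_mean(A, axis=1, keepdims=True))
        return Q

    def advantage(self, state):
        # Action selection can use the advantage stream alone,
        # since argmax_a Q(s, a) == argmax_a A(s, a)
        x = self.dense2(self.dense1(state))
        return self.A(x)
```

Because the combination happens inside call(), a model like this can drop into a standard DQN or double DQN training loop wherever the regular Q network would go.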