Machine Learning with Phil has a great tutorial on how to do Deep Q Learning in PyTorch.

The PyTorch deep learning framework makes coding a deep Q learning agent in Python easier than ever. We’re going to code up the simplest possible deep Q learning agent, and show that we only need a replay memory to get some serious results in the Lunar Lander environment from the OpenAI Gym. We don’t really need the target network, though it has been known to help the deep Q learning algorithm with convergence.
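To make the replay memory idea concrete, here is a minimal sketch in plain Python. It is not the code from the tutorial; the names ReplayMemory, Transition, push and sample are assumptions chosen for illustration. It only shows the core mechanism: store transitions in a bounded buffer and draw random minibatches from it.

```python
import random
from collections import deque, namedtuple

# Hypothetical container for one step of experience (names are assumptions).
Transition = namedtuple(
    "Transition", ["state", "action", "reward", "next_state", "done"]
)


class ReplayMemory:
    """Fixed-size buffer of past transitions, sampled uniformly at random."""

    def __init__(self, capacity):
        # A deque with maxlen drops the oldest transition once it is full.
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append(Transition(state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling without replacement breaks up the correlation
        # between consecutive steps of the same episode.
        return random.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


memory = ReplayMemory(capacity=3)
for i in range(5):
    memory.push(state=i, action=0, reward=1.0, next_state=i + 1, done=False)
batch = memory.sample(2)
```

In a full agent, each sampled batch would be turned into tensors and used for one gradient step on the Q network.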

OpenAI Gym is a well-known RL environment/community for developing and comparing Reinforcement Learning agents.

OpenAI Gym doesn’t make assumptions about the structure of the agent and works with any numerical computation library, such as TensorFlow or PyTorch.

The gym also provides various types of environments.

In this hands-on guide, learn how to develop a tic-tac-toe environment from scratch using OpenAI Gym.
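As a rough idea of what such an environment involves, here is a sketch of a tic-tac-toe environment that mirrors the classic Gym convention of reset() returning an observation and step() returning (observation, reward, done, info). It is not the code from the guide and deliberately does not import gym so that it runs on its own. The class name, the reward values and the rule that an illegal move ends the episode are all assumptions.

```python
class TicTacToeEnv:
    """Gym-style tic-tac-toe: cells 0-8, players 1 (X) and -1 (O) alternate."""

    WIN_LINES = [
        (0, 1, 2), (3, 4, 5), (6, 7, 8),  # rows
        (0, 3, 6), (1, 4, 7), (2, 5, 8),  # columns
        (0, 4, 8), (2, 4, 6),             # diagonals
    ]

    def __init__(self):
        self.reset()

    def reset(self):
        self.board = [0] * 9
        self.player = 1
        self.done = False
        return tuple(self.board)

    def step(self, action):
        if self.done:
            raise RuntimeError("episode is over; call reset()")
        if self.board[action] != 0:
            # Assumption: an illegal move is penalised and ends the episode.
            self.done = True
            return tuple(self.board), -1.0, True, {"illegal": True}
        self.board[action] = self.player
        if any(all(self.board[i] == self.player for i in line)
               for line in self.WIN_LINES):
            self.done = True
            return tuple(self.board), 1.0, True, {"winner": self.player}
        if 0 not in self.board:
            # Board is full with no winner: a draw.
            self.done = True
            return tuple(self.board), 0.0, True, {"winner": 0}
        self.player = -self.player
        return tuple(self.board), 0.0, False, {}


env = TicTacToeEnv()
obs = env.reset()
for move in [0, 3, 1, 4, 2]:  # X takes the top row
    obs, reward, done, info = env.step(move)
```

A real Gym environment would also subclass the library's Env base class and declare its action and observation spaces.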

I always knew that reinforcement learning would teach us more about ourselves than any other kind of AI approach. This feeling was backed up in a paper published recently in Nature.

DeepMind, Alphabet’s AI subsidiary, has once again used lessons from reinforcement learning to propose a new theory about the reward mechanisms within our brains.

The hypothesis, supported by initial experimental findings, could improve our understanding of mental health and motivation. It could also validate the current direction of AI research toward building more human-like general intelligence.

It turns out the brain’s reward system works in much the same way as reinforcement learning, a discovery from the 1990s that was itself inspired by reinforcement-learning algorithms. When a human or animal is about to perform an action, its dopamine neurons make a prediction about the expected reward.
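On the algorithm side, the standard way to express "predict a reward, then correct the prediction" is the temporal-difference (TD) error. The sketch below is a generic TD(0) update, not a model taken from the paper; the state names, learning rate and discount factor are assumptions picked for illustration.

```python
def td_update(values, state, reward, next_state, alpha=0.1, gamma=0.9):
    """One TD(0) step: move V(state) toward reward + gamma * V(next_state)."""
    # The prediction error is the gap between what was received (plus the
    # discounted value of what comes next) and what was predicted.
    prediction_error = reward + gamma * values[next_state] - values[state]
    values[state] += alpha * prediction_error
    return prediction_error


# Hypothetical two-state example: a cue that is always followed by a reward.
values = {"cue": 0.0, "end": 0.0}
errors = [td_update(values, "cue", 1.0, "end") for _ in range(50)]
```

The first error is large because the reward is unexpected. As the value of the cue approaches the reward, the error shrinks toward zero.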

Machine Learning with Phil has another interesting look at Deep Q Learning as part of a preview of his course.

The two biggest innovations in deep Q learning were the introduction of the target network and the replay memory. One would think that simply bolting a deep neural network to the Q learning algorithm would be enough for a robust deep Q learning agent, but that isn’t the case. In this video I’ll show you how this naive implementation of the deep Q learning agent fails, and spectacularly at that.

This is an excerpt from my new course, Deep Q Learning From Paper to Code, which you can get on sale with this link
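To show what the target network changes, here is a small sketch that uses lookup tables as a stand-in for neural networks. It is not code from the video or the course; q_target, online_q and target_q are hypothetical names. The point is that the learning target is computed from a frozen copy of the value estimates, which is only refreshed occasionally.

```python
import copy


def q_target(target_q, reward, next_state, done, gamma=0.99):
    """Bootstrapped target computed from the frozen copy, not the online one."""
    if done:
        return reward
    return reward + gamma * max(target_q[next_state])


# state -> list of action values (a tabular stand-in for a Q network)
online_q = {"s1": [1.0, 2.0]}
target_q = copy.deepcopy(online_q)  # frozen copy

# The online estimate changes during training...
online_q["s1"][1] = 5.0
# ...but the target still uses the old, stable values.
y_before_sync = q_target(target_q, reward=1.0, next_state="s1", done=False, gamma=0.5)

# Periodic sync: copy the online values into the target.
target_q = copy.deepcopy(online_q)
y_after_sync = q_target(target_q, reward=1.0, next_state="s1", done=False, gamma=0.5)
```

Without the frozen copy, every update would shift the very values the target is computed from, which is one reason the naive agent can become unstable.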

Samuel Arzt shows off a project where an AI learns to park a car in a parking lot in a 3D physics simulation.

The simulation was implemented using Unity’s ML-Agents framework.

From the video description:

The AI consists of a deep Neural Network with 3 hidden layers of 128 neurons each. It is trained with the Proximal Policy Optimization (PPO) algorithm, which is a Reinforcement Learning approach.

Dani, a game developer, recently made a game and decided to train an AI to play it.

A couple of weeks ago I made a video “Making a Game in ONE Day (12 Hours)”, and today I’m trying to teach an A.I to play my game!

Basically I’m gonna use Neural Networks to make the A.I learn to play my game.

This is something I’ve always wanted to do, and I’m really happy I finally got around to doing it. Some of the biggest inspirations for this are obviously carykh, Jabrils & Codebullet!