OpenAI has an interesting video explaining some of their latest research on training reinforcement learning agents to play hide and seek.
Two Minute Papers explores the paper “MCP: Learning Composable Hierarchical Control with Multiplicative Compositional Policies” in this video.
This video continues DeepLizard's ongoing project to build a deep Q-network that masters the cart-pole problem. Learn how to manage the environment and process the images that are passed to the deep Q-network as input.
TensorFlow developers interested in Reinforcement Learning (RL) may want to take a look at Huskarl. The framework was recently introduced in a Medium blog post and is meant for easy prototyping with deep-RL algorithms.
According to its creator, software engineer Daniel Salvadori, Huskarl “abstracts away the agent-environment interaction” in a similar way “to how TensorFlow abstracts away the management of computational graphs”. Under the hood it uses TensorFlow 2.0 and the tf.keras API. It is also implemented in a way that makes it easy to parallelise the computation of environment dynamics across CPU cores, which helps algorithms that benefit from multiple concurrent sources of experience.
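To make the idea of abstracting the agent-environment interaction concrete, here is a minimal sketch in plain Python. It is not Huskarl's API: the names `CoinFlipEnv`, `RandomAgent`, and `run_episode` are hypothetical and exist only for illustration. The point is that once the loop lives in one function, any agent with `act` and `observe` methods can be paired with any environment that offers `reset` and `step`.

```python
import random


class CoinFlipEnv:
    """Hypothetical toy environment: guess the outcome of a coin flip."""

    def __init__(self, episode_length=10, seed=0):
        self.episode_length = episode_length
        self.rng = random.Random(seed)
        self.steps = 0

    def reset(self):
        self.steps = 0
        return 0  # the environment has a single dummy state

    def step(self, action):
        self.steps += 1
        # Reward 1.0 when the guess matches the flip, 0.0 otherwise.
        reward = 1.0 if action == self.rng.randint(0, 1) else 0.0
        done = self.steps >= self.episode_length
        return 0, reward, done


class RandomAgent:
    """Hypothetical agent that acts at random and records what it sees."""

    def __init__(self, seed=1):
        self.rng = random.Random(seed)
        self.transitions = []

    def act(self, state):
        return self.rng.randint(0, 1)

    def observe(self, state, action, reward, next_state, done):
        # A learning agent would update its policy or replay buffer here.
        self.transitions.append((state, action, reward, next_state, done))


def run_episode(agent, env):
    """The interaction loop that a framework can hide from the user."""
    state = env.reset()
    total_reward = 0.0
    done = False
    while not done:
        action = agent.act(state)
        next_state, reward, done = env.step(action)
        agent.observe(state, action, reward, next_state, done)
        total_reward += reward
        state = next_state
    return total_reward


agent = RandomAgent()
episode_return = run_episode(agent, CoinFlipEnv(episode_length=10))
```

Because `run_episode` only depends on those four methods, running several environment instances in separate processes would be a change to the loop, not to the agent or the environment.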
Code Bullet explores Deep Q Learning by teaching an AI to play Snake.
In this video, DeepLizard builds a deep Q-network to tackle the cart-pole problem.
Here’s an interesting article on building AI for board games, and on where that approach works well and where it does not.
Impressed by DeepMind’s AlphaZero achievement with the game of Go, we tried to use a similar approach to implement AI for the highly acclaimed board game, Azul. We discovered that reinforcement learning is not a necessary ingredient of a successful solution – and we also learned that using your favourite tools can sometimes lead you astray.
Watch this video on Deep Q-learning to implement your own deep Q-network in code.
The edureka video “Q Learning Explained” gives a detailed walkthrough of Q-learning and its main components.
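For readers who want to see the core update before watching, here is a small tabular Q-learning sketch. It is not taken from the video. The five-state chain environment, the hyperparameters, and the function names are assumptions chosen to keep the example short. The agent starts in state 0, moves left or right, and receives a reward of 1 on reaching state 4.

```python
import random

N_STATES = 5      # states 0..4; state 4 is the goal (assumed toy setup)
ACTIONS = (0, 1)  # 0 = move left, 1 = move right


def step(state, action):
    """Deterministic chain dynamics: reward 1.0 only on reaching the goal."""
    next_state = max(state - 1, 0) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def choose_action(q, state, epsilon, rng):
    """Epsilon-greedy selection with random tie-breaking."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    best = max(q[state])
    return rng.choice([a for a in ACTIONS if q[state][a] == best])


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.3, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _episode in range(episodes):
        state = 0
        for _t in range(100):  # cap the episode length
            action = choose_action(q, state, epsilon, rng)
            next_state, reward, done = step(state, action)
            # Q-learning target: reward plus discounted best next value.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q


q_table = train()
# Greedy action in each non-terminal state after training.
greedy_policy = [max(ACTIONS, key=lambda a: q_table[s][a])
                 for s in range(N_STATES - 1)]
```

A deep Q-network replaces the `q` table with a neural network that estimates the same values from raw observations. The target used in the update keeps the same form.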
Here’s a great explanation of Reinforcement Learning, AlphaGo Zero, and how it compares to other forms of machine learning.
For example, to learn to play (the action) the game of Go (the environment), AlphaGo first learned to mimic human Go players from a large data set of historical games (apprenticeship learning). It then improved its play through trial and error (reinforcement learning), by playing large numbers of Go games against independent instances of itself.