TensorFlow developers interested in Reinforcement Learning (RL) may want to take a look at Huskarl. The framework was recently introduced in a Medium blog post and is meant for easy prototyping with deep-RL algorithms.

According to its creator, software engineer Daniel Salvadori, Huskarl “abstracts away the agent-environment interaction” in a similar way “to how TensorFlow abstracts away the management of computational graphs”. Under the hood it naturally makes use of TensorFlow 2.0 and the tf.keras API. It is also implemented so that the computation of environment dynamics can be parallelised across CPU cores, which helps in scenarios that benefit from multiple concurrent sources of experience.
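To give a sense of how little boilerplate that abstraction leaves, here is a sketch along the lines of the DQN example in the project's README. The names hk.agent.DQN and hk.Simulation reflect my reading of the blog post and repo, so treat them as assumptions and check them against the current codebase:

```python
import gym
import huskarl as hk
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# A function that builds a fresh environment lets the framework spawn
# parallel instances across CPU cores.
create_env = lambda: gym.make('CartPole-v0').unwrapped
dummy_env = create_env()

# A plain tf.keras model serves as the Q-network.
model = Sequential([
    Dense(16, activation='relu', input_shape=dummy_env.observation_space.shape),
    Dense(16, activation='relu'),
])

agent = hk.agent.DQN(model, actions=dummy_env.action_space.n, nsteps=2)

# The Simulation object owns the agent-environment interaction loop.
sim = hk.Simulation(create_env, agent)
sim.train(max_steps=3000)
sim.test(max_steps=1000)
```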

Here’s an interesting article on building AI for board games, and on where the approach works well and where it doesn’t.

Impressed by DeepMind’s AlphaZero achievement with the game of Go, we tried to use a similar approach to implement an AI for the highly acclaimed board game Azul. We discovered that reinforcement learning is not a necessary ingredient of a successful solution – and we also learned that using your favourite tools can sometimes lead you astray.

Here’s a great explanation of Reinforcement Learning, AlphaGo Zero, and how it compares to other forms of machine learning.

For example, AlphaGo, in order to learn to play (the action) the game of Go (the environment), first learned to mimic human Go players from a large data set of historical games (apprenticeship learning). It then improved its play through trial and error (reinforcement learning), by playing large numbers of Go games against independent instances of itself.
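For a concrete picture of the apprenticeship phase, here is a minimal tf.keras sketch: ordinary supervised learning on (position, expert move) pairs. The board encoding and layer sizes below are assumptions for illustration, not AlphaGo's actual architecture:

```python
import tensorflow as tf

# Assumed encoding, for illustration only: a 19x19 board with stacked
# feature planes in, one logit per move out.
BOARD_SHAPE = (19, 19, 17)
NUM_MOVES = 19 * 19 + 1  # every intersection, plus a pass move

policy_net = tf.keras.Sequential([
    tf.keras.layers.Conv2D(64, 3, padding='same', activation='relu',
                           input_shape=BOARD_SHAPE),
    tf.keras.layers.Conv2D(64, 3, padding='same', activation='relu'),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(NUM_MOVES),  # logits over moves
])

# The apprenticeship phase is plain supervised learning: cross-entropy
# against the move the human expert actually played in each position.
policy_net.compile(
    optimizer='adam',
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
)

# positions: float32 array [N, 19, 19, 17]; expert_moves: int array [N]
# policy_net.fit(positions, expert_moves, batch_size=256, epochs=1)
```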

It amazes me how many people have heard about AlphaGo, but not about AlphaGo Zero. In the future, I predict that we will look back on AlphaGo Zero as the watershed moment in AI development.

Here’s a great overview of AlphaGo Zero and the techniques behind it.

AlphaGo Zero is able to achieve all this by employing a novel form of reinforcement learning, in which AlphaGo Zero becomes its own teacher. As explained previously, the system starts off with a single neural network that knows absolutely nothing about the game of Go. By combining this neural network with a powerful search algorithm, it then plays games against itself. As it plays more and more games, the neural network is updated and tuned to predict moves, and even the eventual winner of the games.
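Put together, the loop the quote describes looks roughly like the sketch below. The environment methods, `mcts_search`, and the network interface are hypothetical stand-ins, not DeepMind's code; the point is the shape of the algorithm: search produces an improved policy, self-play produces outcomes, and the network is trained to predict both.

```python
import numpy as np
import tensorflow as tf

def self_play_game(env, net, mcts_search):
    """Play one game against itself; collect (state, search_policy, player)."""
    history, state = [], env.reset()
    while not env.done():
        pi = mcts_search(state, net)               # improved policy from search
        history.append((state, pi, env.to_play()))
        state = env.step(np.random.choice(len(pi), p=pi))
    z = env.winner()                               # +1 or -1
    # Label every position with the final result from that player's view.
    return [(s, pi, z * player) for s, pi, player in history]

@tf.function
def train_step(net, optimizer, states, target_pis, target_zs):
    # The network outputs (move_logits, value) and is tuned toward the two
    # targets the quote mentions: the search policy and the eventual winner.
    with tf.GradientTape() as tape:
        logits, values = net(states, training=True)
        policy_loss = tf.nn.softmax_cross_entropy_with_logits(
            labels=target_pis, logits=logits)
        value_loss = tf.square(target_zs - tf.squeeze(values, axis=-1))
        loss = tf.reduce_mean(policy_loss + value_loss)
    grads = tape.gradient(loss, net.trainable_variables)
    optimizer.apply_gradients(zip(grads, net.trainable_variables))
    return loss
```

As training data accumulates from `self_play_game`, the updated network makes the search stronger, which in turn produces better training targets – the self-reinforcing cycle that lets the system start from nothing.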