It amazes me how many people have heard about AlphaGo, but not about AlphaGo Zero.  In the future, I predict that we will look back on AlphaGo Zero as the watershed moment in AI development.

Here’s a great overview of AlphaGo Zero and the techniques behind it.

AlphaGo Zero is able to achieve all this by employing a novel form of reinforcement learning, in which AlphaGo Zero becomes its own teacher. As explained previously, the system starts off with a single neural network that knows absolutely nothing about the game of Go. By combining this neural network with a powerful search algorithm, it then plays games against itself. As it plays more and more games, the neural network is updated and tuned to predict moves, and even the eventual winner of the games.
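
To make the self-play idea concrete, here is a minimal sketch on a toy game. It is not AlphaGo Zero's method: a lookup table of win-rate estimates stands in for the neural network, and there is no search algorithm. The game (single-pile Nim, take 1 to 3 stones, whoever takes the last stone wins) and every name and parameter below are assumptions chosen for illustration.

```python
import random
from collections import defaultdict


def legal_moves(pile):
    """In this toy Nim variant a player may take 1, 2 or 3 stones."""
    return [m for m in (1, 2, 3) if m <= pile]


def choose_move(q, pile, epsilon, rng):
    """Mostly pick the move with the best estimated win rate, sometimes explore."""
    moves = legal_moves(pile)
    if rng.random() < epsilon:
        return rng.choice(moves)
    return max(moves, key=lambda m: q[(pile, m)])


def train_by_self_play(games=20000, max_pile=12, epsilon=0.2, seed=0):
    """Learn win-rate estimates purely from games the table plays against itself."""
    rng = random.Random(seed)
    q = defaultdict(float)     # (pile, move) -> estimated win rate for the mover
    visits = defaultdict(int)  # (pile, move) -> number of times it was played
    for _ in range(games):
        pile = rng.randint(1, max_pile)
        player = 0
        history = []
        while pile > 0:
            move = choose_move(q, pile, epsilon, rng)
            history.append((pile, move, player))
            pile -= move
            player = 1 - player
        winner = history[-1][2]  # whoever took the last stone wins
        # Update every move played with the final outcome (running average).
        for state, move, mover in history:
            reward = 1.0 if mover == winner else 0.0
            visits[(state, move)] += 1
            q[(state, move)] += (reward - q[(state, move)]) / visits[(state, move)]
    return q


def best_move(q, pile):
    """Greedy move according to the learned table."""
    return max(legal_moves(pile), key=lambda m: q[(pile, m)])


if __name__ == "__main__":
    table = train_by_self_play()
    for pile in (5, 6, 7):
        print(f"pile of {pile}: take {best_move(table, pile)}")
```

The table starts out knowing nothing about the game, just as the quoted passage describes, and both sides of every game are played by the same table. The winning strategy in this game is to leave the opponent a multiple of four stones, and the learned table should recover that for small piles.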

In this video, Siraj Raval builds a cryptocurrency trading bot called GradientTrader and shows you the tools used to build it. It uses a graphical interface that lets you back-test on historical data, simulate paper trading, and implement a custom trading strategy for the real markets. The technique used is a cutting-edge deep reinforcement learning strategy called Multi Agent Actor Critic.

Here’s an insightful blog post on the future of RL (reinforcement learning): Deep RL and why it’s going to be revolutionary.

Until a few years ago, reinforcement learning techniques were constrained to small discrete systems. As the state space (the different parameters of the system) grows, the memory and computation required increase exponentially. Even continuous systems had to be discretized before reinforcement learning techniques could be applied. Many things are now possible with the recent breakthroughs of Deep Neural Networks (DNN), and especially their approximation capability. By combining Reinforcement Learning and DNNs, researchers have developed techniques that take advantage of both fields. The new field is called Deep Reinforcement Learning (DRL) and is responsible for breakthroughs in many domains.
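
A quick back-of-the-envelope sketch shows why the tabular approach breaks down. The bin count, action count and function names below are illustrative assumptions, not figures from the blog post: each continuous state variable is cut into a fixed number of bins, and the table needs one entry per combination of bins and actions.

```python
def discretize(value, low, high, bins):
    """Map a continuous value in [low, high] to a bin index in 0..bins-1."""
    index = int((value - low) / (high - low) * bins)
    return max(0, min(bins - 1, index))  # clamp values at or beyond the edges


def table_entries(num_state_variables, bins_per_variable, num_actions):
    """Size of a lookup table with one entry per (discretized state, action)."""
    return (bins_per_variable ** num_state_variables) * num_actions


for variables in (2, 4, 8):
    entries = table_entries(variables, bins_per_variable=10, num_actions=4)
    print(f"{variables} state variables -> {entries:,} table entries")
# 2 state variables -> 400 table entries
# 4 state variables -> 40,000 table entries
# 8 state variables -> 400,000,000 table entries
```

Every extra state variable multiplies the table size by the number of bins, which is the exponential growth described above. A neural network sidesteps this by approximating the values with a fixed number of weights instead of storing one entry per state.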