DeepMind’s superhuman AI is rewriting how we play chess

AlphaZero doesn’t play chess like a machine – it plays it like a human grandmaster, but better

Since 1997, when IBM’s Deep Blue beat world champion and chess legend Garry Kasparov in a six-game match, chess players have accepted that machines are stronger at chess. We have taken some comfort from the fact that we taught these machines how to play. But strangely enough, despite being programmed by humans, traditional chess engines don’t play quite like humans.

Chess grandmasters were brought in to evaluate a series of typical positions and describe the considerations that led to each assessment, and programmers turned these considerations into ever more sophisticated heuristics. Yet despite the hand-crafted heuristics, the foundation of an engine's superiority lies in calculation: sifting through vast numbers of moves to find concrete ways to solve a position. A chess program, or "engine", like Stockfish searches through about 60 million positions a second. But an engine's solution may look ugly to human eyes, even if it is unquestionably a winning move.

Enter DeepMind. The Google-owned AI company’s AlphaZero is a paradox. AlphaZero taught itself chess (as well as go and shogi) starting with no knowledge about the game beyond the basic rules. It developed its chess strategies by playing millions of games against itself and discovering promising avenues of exploration from the games it won and lost. It also searches far fewer positions than Stockfish when it plays. The result was a chess player of superhuman strength with a style that is human-like.

We worked intensively with AlphaZero during the World Chess Championship played in London in November 2018. While Norway's Magnus Carlsen and America's Fabiano Caruana were fighting it out across the chessboard, AlphaZero was evaluating their moves and suggesting alternative ideas.

AlphaZero’s reinforcement learning has given it a distinctive and instantly recognisable style, and it implements its ideas in a direct, efficient way, without undue regard for the material balance. It has a human-like drive to make progress and never sit still. Interestingly, many of AlphaZero’s ideas match accepted human rules derived from hundreds of years of playing chess. However, AlphaZero’s twist (achieved through its deep neural network architecture) is to combine factors we considered minor or incidental – such as the restriction of the opponent’s king – into a whole-game strategy. For example, it will take unusually early action to create a weakness in the opponent’s king’s position and then use this weakness as a motif throughout the rest of its play.

Having AlphaZero next to us felt like having a human chess genius on tap, who never got tired and never asked for coffee. “AlphaZero, find us a path!” became our standard cry during the World Championship, and it was always ready with a creative way to optimise its position. Its strength compared with traditional engines wasn’t necessarily in calculation-heavy positions but rather in intricate positions in which a mixture of calculation, positional insight and long-term planning was required. We particularly noticed how alert AlphaZero was to the danger of landing in a passive position without prospects, and how driven it was to avoid this scenario.

In our book Game Changer: AlphaZero’s Groundbreaking Chess Strategies and the Promise of AI we work with the DeepMind technical team to explain how AlphaZero’s construction and training have led to its creative and intuitive style. There are many unexpected aspects to this. For example, AlphaZero trains by playing vast numbers of lightning-fast games (40 milliseconds a move) against itself at a very shallow search depth.


There is a trade-off here: one might think that AlphaZero could learn more by playing slower, high-quality games. However, the quicker the game is played, the more games AlphaZero sees, the more different situations it is exposed to, and the more it can learn. Quicker games are also more likely to become unbalanced and produce a decisive result, which AlphaZero can then use to tune (strengthen or weaken) the connections in its policy network that led to its decisions in the game.

There is an interesting parallel in the way modern chess grandmasters train compared with 40 years ago. Back then, “blitz chess” – super-fast games played with just one or three minutes per player per game – was frowned upon as both a waste of time and damaging to your chess skill. However, all of the current top chess players – World Champion Magnus Carlsen above all – are superlative blitz players and regularly take part in online blitz competitions.

Another fascinating aspect is how AlphaZero evaluates chess positions. Traditional engines evaluate a given position via a scale based on material (the general chess term for pawns and pieces). For example, a score of +1.5 indicates an advantage of one-and-a-half pawns. (The generally recognised scale for material in chess is that pawns are worth one point, knights and bishops are worth three points, a rook is worth five points and a queen is worth nine points.)
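The traditional scheme can be sketched in a few lines of code. This is a minimal illustration of a material count using the point values above; the function name and one-letter piece encoding are ours, not taken from any real engine.

```python
# Minimal sketch of a traditional material-based evaluation, using the
# standard point values: pawn 1, knight/bishop 3, rook 5, queen 9.
PIECE_VALUES = {"P": 1, "N": 3, "B": 3, "R": 5, "Q": 9}

def material_score(white_pieces, black_pieces):
    """Material balance in pawn units; positive means White is ahead."""
    white = sum(PIECE_VALUES[p] for p in white_pieces)
    black = sum(PIECE_VALUES[p] for p in black_pieces)
    return white - black

# White has a rook where Black has a bishop ("winning the exchange"):
print(material_score(["Q", "R", "P", "P"], ["Q", "B", "P", "P"]))  # 2
```

A real engine refines this raw count with heuristics for king safety, pawn structure and piece activity, but material remains the backbone of its score.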

AlphaZero evaluates positions probabilistically based on its perceived chance of winning or drawing (in fact we don’t even know whether it assigns any values for pawns and pieces!) This may explain why AlphaZero is not afraid to sacrifice its pawns and pieces to achieve its goals: what does a pawn or two matter if your expected score increases?
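The arithmetic behind that intuition is simple to illustrate. Chess scores a win as 1, a draw as 0.5 and a loss as 0; the probabilities below are invented for illustration, since AlphaZero's real value output is not published in this form.

```python
# Scoring a position by expected game result rather than by material.
# Win = 1 point, draw = 0.5, loss = 0.

def expected_score(p_win, p_draw):
    """Expected score given the win and draw probabilities."""
    return 1.0 * p_win + 0.5 * p_draw

# A pawn sacrifice is justified if it raises the expected score,
# regardless of the material deficit:
before = expected_score(p_win=0.40, p_draw=0.30)  # 0.55
after = expected_score(p_win=0.55, p_draw=0.20)   # 0.65
print(after > before)  # True
```

On this view, "being a pawn down" has no fixed cost: it only matters insofar as it moves the win and draw probabilities.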

The evaluation of a traditional engine also reflects only the single best variation it finds in the position. AlphaZero’s evaluation, by contrast, is a weighted average over all the variations it considers. This seems to allow AlphaZero to steer games “intuitively” into promising-looking situations, in which danger and the possibility of mistakes are ever-present for the opponent, without needing to calculate every detail – just like strong human players do.
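The contrast between the two reporting styles can be shown with a toy example. Here each candidate line is a (value, visit count) pair; the numbers are invented, and the visit-weighted average stands in for the kind of averaging done by the Monte Carlo tree search family that AlphaZero's search belongs to.

```python
# Two ways of turning several explored variations into one evaluation.

def best_line_eval(lines):
    """Value of the single best variation (a minimax-style report)."""
    return max(value for value, _visits in lines)

def visit_weighted_eval(lines):
    """Average over all variations, weighted by how often each was explored."""
    total_visits = sum(visits for _value, visits in lines)
    return sum(value * visits for value, visits in lines) / total_visits

# Three candidate lines, as (value, visit_count) pairs:
lines = [(0.70, 50), (0.60, 30), (0.20, 20)]
print(best_line_eval(lines))       # the single best line: 0.7
print(visit_weighted_eval(lines))  # the weighted average: roughly 0.57
```

The weighted average is dragged down by the risky 0.20 line, so a position that is excellent only along one narrow path scores lower than one that looks good almost everywhere – a preference that resembles human practical judgement.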

AlphaZero’s strength and originality truly surprised us. Chess is full of superhuman expert systems, yet AlphaZero discovered an uncharted space in which its self-taught insights were both startling and valuable. That space was so significant that AlphaZero was able to convincingly defeat the strongest expert system at the time of testing. Bearing that in mind, you can’t help but be optimistic about the application of AlphaZero-like techniques in environments that are less well researched than chess. Maybe soon, scientists will be echoing our cry during the World Championship: “AlphaZero, find us a path!”

Matthew Sadler and Natasha Regan are the authors of Game Changer, published by New in Chess

This article was originally published by WIRED UK