This figure appears in DeepMind’s instant-classic paper ‘Mastering the Game of Go without Human Knowledge’ (2017):

Figure 3b: 'Prediction accuracy on human professional moves. The plot shows the accuracy of the neural network at each iteration of self-play, in predicting human professional moves...
The accuracy measures the percentage of positions in which the neural network assigns the highest probability to the human move.'
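
The metric in that caption is what is usually called top-1 accuracy. Here is a minimal sketch of how it could be computed, assuming the network's output is available as one list of move probabilities per position and the human move as an index into that list. The function name `top1_accuracy` and the toy numbers are made up for illustration; this is not the paper's evaluation code.

```python
def top1_accuracy(policy_probs, human_moves):
    """Fraction of positions where the highest-probability move is the human's move.

    policy_probs: one list of move probabilities per position (hypothetical format).
    human_moves:  index of the move the human actually played in each position.
    """
    if len(policy_probs) != len(human_moves):
        raise ValueError("need exactly one human move per position")
    if not human_moves:
        raise ValueError("need at least one position")

    hits = 0
    for probs, human_move in zip(policy_probs, human_moves):
        # Index of the move the network rates most likely (first one wins ties).
        predicted = max(range(len(probs)), key=probs.__getitem__)
        if predicted == human_move:
            hits += 1
    return hits / len(human_moves)


# Toy data: 4 positions with 3 candidate moves each (invented numbers).
toy_probs = [
    [0.7, 0.2, 0.1],
    [0.1, 0.6, 0.3],
    [0.3, 0.3, 0.4],
    [0.5, 0.4, 0.1],
]
toy_human_moves = [0, 2, 2, 1]

accuracy = top1_accuracy(toy_probs, toy_human_moves)
print(accuracy)  # 0.5: the top-rated move matches the human move in 2 of 4 positions
```

Note that a miss here only means the human move was not the network's single top choice; the metric says nothing about how close the second choice was.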

It shows that AlphaGo Zero (AGZ) predicts human pro moves with 50% accuracy at best. That is, AGZ's top choice differs from the human professional's move in at least 50% of positions.

This perhaps has implications for human expertise in general, by the following argument:

1. AGZ plays far beyond peak human ability.

2. AGZ would play differently from a peak human on at least 50% of moves.

3. So a peak human makes suboptimal moves at least 50% of the time.

4. Go is an excellent environment for human learning (small ruleset, rapid objective feedback, amenable to intuition).

5. So, relative to more complex domains, human mastery of Go should be relatively complete.

6. So we can expect human experts in other, more complex domains to make suboptimal decisions at least 50% of the time.

Regarding premise 4, Kahneman has this to say:


Luke commented on 03 March 2018 :

Hey Gavin, you must have watched the AlphaGo documentary? I thought it was interesting that when they were evaluating the game versus Lee Sedol (maybe the 2nd game?), they noticed that even when AG was ahead it was making very conservative moves.

The policy optimizes for the move which maximises win percentage, so it would choose a very boring move with a small advantage, but very high confidence in that advantage. Now I accept this probably is the best move, but taking a more holistic view, humans play games for enjoyment, so they won’t play this style.

It’s the same reason a football team generally doesn’t get ahead and then stay in its own half, passing the ball around for the rest of the game.

Gavin commented on 03 March 2018 :

Haven’t seen it, but yeah I followed the streams.

It’s fair to say that human players can have multiple goals (winning, personal satisfaction, looking cool, whim). Not sure how often these other goals come up among the pro group who were compared to AGZ. Still, how much of the 50% deviance are you willing to put on human holism, vs simple error or lack of foresight?

You might control for holism by increasing the stakes. But the stakes were already very high in the case of the two big games: the world watching, human exceptionalism on the line, etc.

A distinct big problem with my argument is that it treats a game as a set of independent moves, which is obviously wrong. Not sure if this invalidates anything.
