From AlphaGo to Artificial General Intelligence

Edward Tsang 2017.06.01; updated 2018.02.07

AlphaGo Master beat Ke Jie emphatically. AlphaGo Zero eliminated human intervention, yet beat AlphaGo Master convincingly. Its creator, Demis Hassabis, has announced his ambition to build machines that can learn anything. To distinguish this from traditional artificial intelligence (AI), he calls it Artificial General Intelligence (AGI). One must realize that, while AlphaGo has achieved a lot, it has only made a small step towards AGI.


AlphaGo has beaten Ke Jie, the current world number one human player

AlphaGo beat Ke Jie, the current world number one ranked human player, by 3:0 at the Future of Go Summit. The games were held in Wuzhen, China on 23rd, 25th and 27th May 2017. The version that played was "AlphaGo Master". It was later beaten by a newer version, "AlphaGo Zero".

AlphaGo Master’s power came from (explained here):

  1. A good team of computer scientists and go experts;
  2. The input of around 30 million board positions (i.e. professional players’ moves);
  3. Machine learning (which gives AlphaGo more experience than any player alive); and
  4. Hardware (i.e. multiple processors working together, like a team of players).

Human input to AlphaGo Master

The human developers have to tell the program two things (explained here):

  1. Where to look; and
  2. How to assess how good a board situation is.
AlphaGo’s human engineers may not be world champions, but they certainly must be strong players themselves.
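
To make these two inputs concrete, here is a minimal sketch (my illustration, not DeepMind's code) of how a "where to look" function and a "how good is this position" function steer a game-tree search. To keep it self-contained and runnable, it uses a toy take-away game rather than Go, and the policy and value heuristics are hypothetical stand-ins for AlphaGo's learned networks.

# A toy take-away game: players alternately remove 1-3 stones; whoever
# takes the last stone wins. Positions that are multiples of 4 are lost
# for the player to move -- which lets us fake plausible "learned" advice.

def policy(stones, move):
    # "Where to look": a prior preference over moves, so the search
    # examines promising moves first and can prune the rest.
    return 1.0 if (stones - move) % 4 == 0 else 0.1

def value(stones):
    # "How good is this position" for the player to move, in [-1, 1],
    # used instead of searching all the way to the end of the game.
    return -1.0 if stones % 4 == 0 else 1.0

def search(stones, depth, top_k=2):
    # Negamax search guided by the two functions above.
    if stones == 0:
        return -1.0            # opponent took the last stone and won
    if depth == 0:
        return value(stones)   # cut off: ask the value function instead
    # Prune: keep only the top_k moves the policy rates highest.
    moves = sorted((m for m in (1, 2, 3) if m <= stones),
                   key=lambda m: policy(stones, m), reverse=True)[:top_k]
    # My best outcome is the worst I can force on my opponent.
    return max(-search(stones - m, depth - 1, top_k) for m in moves)

for s in range(1, 9):
    print(s, "stones:", "win" if search(s, depth=6) > 0 else "loss")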

Emotion and stress

Ke Jie's performance may have been affected by the fact that he did not know his opponent. Human players often play to counter their opponent's style. Not knowing AlphaGo's style may have caused Ke stress, which may have weakened his play.

To be a top player, one must be good in every aspect of the game. In a real game, a human player needs to manage stress, think fast and manage time well. A player who understands strategy but cannot play fast cannot reach the top; nor can one who assesses board situations quickly but cannot manage stress.

A machine does not worry about stress. With multiple processors, a machine can also think fast. AlphaGo has the right qualities to succeed in Go.

AlphaGo will only get stronger

Human players will retire, and new players have to learn everything from scratch. With machine learning, AlphaGo can only get stronger: its experience accumulates with every game it plays, and every mistake or missed opportunity can be scrutinized and avoided in the future.

Psychology matters too. The more AlphaGo wins, the more human players will respect its moves. Even if AlphaGo makes a mistake, human players may hesitate in exploiting it. The fear factor will weaken human players.

For all the reasons above, AlphaGo will only grow stronger and stronger against human players.

AlphaGo Zero

AlphaGo Zero was developed after AlphaGo Master. It got rid of the supervised learning (on 30 million moves played by professionals) used by AlphaGo Master. It also got rid of the two separate neural networks that told AlphaGo Master (a) where to look, and (b) how good a move was likely to be. Instead, it uses a single neural network that looks only at the 19x19 board position and performs the functions of both networks in AlphaGo Master. The idea is to remove human Go expertise from the program: AlphaGo Zero learned from scratch, given nothing more than the rules of Go.
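
As an illustration of this architectural change, here is a sketch in PyTorch (my choice of library, not DeepMind's code). It builds one network over the raw 19x19 board with two output heads, one playing the role of "where to look" and one the role of "how good is this position". The real network used a much deeper residual trunk and more input planes; this is a deliberately minimal version.

import torch
import torch.nn as nn

class DualHeadNet(nn.Module):
    def __init__(self, channels=64):
        super().__init__()
        # Shared trunk: sees only the board state (three planes here:
        # empty / black / white -- a simplification of the real input).
        self.trunk = nn.Sequential(
            nn.Conv2d(3, channels, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1), nn.ReLU(),
        )
        # Policy head: "where to look" -- one score per board point,
        # plus one for passing.
        self.policy = nn.Sequential(
            nn.Flatten(), nn.Linear(channels * 19 * 19, 19 * 19 + 1))
        # Value head: "how good is this position" -- one number in [-1, 1].
        self.value = nn.Sequential(
            nn.Flatten(), nn.Linear(channels * 19 * 19, 1), nn.Tanh())

    def forward(self, board):
        features = self.trunk(board)         # one set of shared features
        return self.policy(features), self.value(features)

net = DualHeadNet()
board = torch.zeros(1, 3, 19, 19)            # a batch of one empty board
move_scores, position_value = net(board)
print(move_scores.shape, position_value.shape)  # (1, 362) and (1, 1)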

Without human expertise, AlphaGo Zero played poorly at the start. But after 3 days of reinforcement learning through self-play, it beat AlphaGo Lee, the version that beat Lee Sedol, by 100 games to 0. After 21 days, it reached the level of AlphaGo Master. After 40 days, it had surpassed all previously published, champion-defeating versions of AlphaGo.

To support AlphaGo, Google developed its Tensor Processing Unit (TPU), hardware specifically designed for machine learning.

Artificial General Intelligence (AGI)

Demis Hassabis' ambition is to build machines that can learn anything on their own. To distinguish this from traditional artificial intelligence, he calls it Artificial General Intelligence (AGI).

Hassabis argued that Deep Blue was a pre-programmed system. It belongs to “narrow AI”: the intelligence is in the programmers. He pointed out that Go is a much harder game, with more possible board positions than there are atoms in the universe. Cursed by combinatorial explosion, an unintelligent, brute-force search will not go very far. Hassabis also argued that Go is a game of construction, whereas chess is a game of destruction, which makes it more difficult to assess whether one is in a favourable or unfavourable position in Go.
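
Hassabis' claim about the search space is easy to check with back-of-envelope arithmetic. Each of Go's 361 points can be empty, black or white, giving 3^361 raw configurations; this over-counts (only a small fraction of them are legal positions), but that hardly changes the conclusion:

from math import log10

# 3 states per point, 361 points; compare against ~10^80 atoms
# in the observable universe.
print(f"3^361 is about 10^{361 * log10(3):.0f}")   # ~ 10^172
print("atoms in the observable universe: ~ 10^80")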

The role of human in artificial intelligence

First, it is important to recognize that machine learning is part of traditional AI. In most applications that do not involve machine learning:

  1. Humans find the decision variables relevant to solving the problem;
  2. Humans use those variables to construct programs for solving the problem.

In AlphaGo Master:

  1. Humans find the decision variables relevant to playing Go;
  2. Humans define the model of learning -- in AlphaGo's case, two multi-layered neural networks;
  3. Humans provide the machine with 30 million board situations to kick-start the learning process (this was not needed for the later version, AlphaGo Zero);
  4. The machine learns how to combine those variables within the neural networks.

AlphaGo Zero showed that the 30 million board positions were not necessary to kick-start the learning, although they speed up the process.

How far has AlphaGo gone towards AGI?

What are the input and decision variables?

In Go, the inputs are the states of the 19x19 board positions: whether each point is empty, occupied by black or occupied by white. The decision variables are the evaluations of the 19x19 board positions: the program has to assign a value to each of them, indicating how good it is to play there. These are clearly identifiable inputs and variables. In most real-life situations, such as medical diagnosis, the potential inputs are less well defined. What are the decision variables in a medical diagnosis? In the style of AlphaGo, one could assign a value to each known medical condition. But what are the inputs? Does one do blood tests, take ECG readings or run a DNA analysis? These inputs all incur costs, so some of the decision variables must be "what tests to do". These decisions determine what other inputs and decisions become available.
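
A short sketch makes the contrast concrete: in Go, enumerating the inputs and the decision variables takes a few lines (the encoding below is my illustration, not AlphaGo's exact input format), whereas no such enumeration exists for medical diagnosis until the "what tests to do" decisions are made.

import numpy as np

board = np.zeros((19, 19), dtype=int)     # 0 = empty, 1 = black, 2 = white
board[3, 3] = 1                           # a black stone
board[15, 15] = 2                         # a white stone

# Inputs: the state of each of the 361 points, one-hot encoded as
# three planes (empty / black / white).
inputs = np.stack([(board == s).astype(np.float32) for s in (0, 1, 2)])
print(inputs.shape)                       # (3, 19, 19)

# Decision variables: one score per point -- "how good is it to play here?"
move_scores = np.full((19, 19), np.nan)   # to be filled in by the program
print(move_scores.size)                   # 361 decision variables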

Multi-objective optimization

In Go, the objective is clear: to win the game. In real-life problems, the objectives are often less clear. In medical diagnosis, we want an accurate diagnosis, but we also want to minimize cost. More tests incur higher costs, but allow a more accurate diagnosis. What about discomfort? Putting a patient in hospital would allow more tests to be conducted, but would the patient feel comfortable staying in hospital and going through many tests? Do we allow patients a say on whether they should stay in hospital? With cost, comfort and patient preference involved, the problem is a multi-objective optimisation problem, and it is unclear what the "optimal" solution should be.
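
The sketch below illustrates the difficulty. Each candidate test plan is scored on accuracy, cost and discomfort (all figures are invented for illustration). No plan is best on every axis, so the most a program can compute mechanically is the Pareto front -- the plans not beaten on all objectives at once. Choosing among the survivors is a value judgement the problem itself does not supply.

# Hypothetical test plans, scored as (accuracy, cost, discomfort).
plans = {
    "blood test only":       (0.70,  100, 1),
    "ECG only":              (0.65,  300, 2),
    "blood test + ECG":      (0.80,  250, 2),
    "DNA analysis only":     (0.75,  900, 1),
    "full hospital work-up": (0.95, 2000, 8),
}

def dominates(a, b):
    # a dominates b if it is at least as accurate, cheap and comfortable,
    # and strictly better on at least one of the three objectives.
    no_worse = a[0] >= b[0] and a[1] <= b[1] and a[2] <= b[2]
    better = a[0] > b[0] or a[1] < b[1] or a[2] < b[2]
    return no_worse and better

front = [name for name, score in plans.items()
         if not any(dominates(other, score)
                    for other_name, other in plans.items()
                    if other_name != name)]
print(front)   # "ECG only" is eliminated; four incomparable plans remain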

Other considerations

There are many more gaps to bridge from Go to real life problems. Here are just a few:

  1. Go is a board game in which players make alternate moves. In real life problems, one has to learn when to make a move and what move to make.
  2. Go is a zero-sum game, with concrete results: win, lose or draw. Many real-life problems are not zero-sum games. The evaluation function in AlphaGo would need modification, and it is unclear what one should be learning.
  3. Go is a game with perfect information. One knows what moves are available by either player at any situation. In real life problems, one does not have perfect information about the world. One does not know what could happen in the future.
  4. AlphaGo Zero has 19x19=361 input variables. Most real-life problems have far more variables and a much bigger search space.

A small though significant step in the right direction

AlphaGo has demonstrated great success. Input and decision variables complicate things, but can be handled. Expertise is around to tackle multi-objective optimization. But there is a lot to bridge from playing Go to solving general problems. Towards AGI, AlphaGo has only made a small, though exciting and significant, step in the right direction.


Remarks

It is worth remarking that this is not the first attempt to create general artificial intelligence. Many researchers (including the author) have been excited by these ambitious projects.

These projects have not achieved what they aimed to achieve, because reasoning involves searching in a large space. As Demis Hassabis pointed out, there are more possible board situations in Go than there are atoms in the universe. We are all limited by combinatorial explosion.

Besides, computers are not clearly better than human beings at memorising things. Yes, computers can have far more memory (in RAM and on hard disks) than the human brain has cells. But the popular belief today is that human beings are far more efficient at organising memory: instead of looking items up systematically, which is haunted by combinatorial explosion, we seem able to retrieve hidden items from memory efficiently through association.

The purpose of this remark is purely to provide background on previous attempts. Just because previous attempts failed does not mean that DeepMind cannot succeed in achieving Artificial General Intelligence. After all, no previous attempt achieved as much as AlphaGo, which beats top human players in Go -- hardly anyone believed this could be achieved within a decade.


Appendix: versions of AlphaGo

[End]

