AlphaGo Master beats Ke Jie emphatically. AlphaGo Zero eliminates human intervention, yet beats AlphaGo Master convincingly. Its creator, Demis Hassabis, has announced his ambition to build machines that can learn anything. To distinguish this from traditional artificial intelligence (AI), he calls it artificial general intelligence (AGI). One must realize that, while AlphaGo has achieved a lot, it has only made a small step towards AGI.
AlphaGo beat Ke Jie, then the world's top-ranked human player, 3-0 at the Future of Go Summit, held in Wuzhen, China on 23, 25 and 27 May 2017. The version that played was "AlphaGo Master", which was itself later beaten by a newer version, "AlphaGo Zero".
AlphaGo Master's power came from deep neural networks, which suggested moves and judged positions, combined with Monte Carlo tree search. Humans had to give the program two things: the rules of Go, and a large set of moves played by professionals for supervised learning.
Ke Jie's performance may have been affected by the fact that he did not know his opponent. Human players often play to counter an opponent's style; not knowing AlphaGo's style may have caused Ke stress, which in turn may have weakened his play.
A top player must be good in every aspect of the game. In a real game, a human player needs to manage stress, think fast and manage time well. A player who understands strategy but cannot play fast will not reach the top; neither will one who assesses board positions quickly but cannot manage stress.
A machine does not suffer from stress, and with multiple processors it can compute fast. AlphaGo has the right qualities to succeed in Go.
Human players retire, and new players must learn everything from scratch. With machine learning, AlphaGo can only get stronger: experience accumulates with every game it plays, and every mistake or missed opportunity can be scrutinized and avoided in the future.
Psychology matters too. The more AlphaGo wins, the more human players will respect its moves. Even if AlphaGo makes a mistake, human players may hesitate to exploit it. This fear factor will weaken human players further.
For all the reasons above, AlphaGo will only grow stronger and stronger against human players.
AlphaGo Zero was developed after AlphaGo Master. It dispensed with the supervised learning (from 30 million moves played by professionals) used by AlphaGo Master. It also replaced the two separate neural networks that told AlphaGo Master (a) where to look and (b) how good a move is likely to be with a single neural network that looks only at the 19x19 board position and performs both functions. The idea was to remove human Go expertise from the program: AlphaGo Zero learned from scratch, given nothing more than the rules of Go.
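What "one network, two heads" means can be sketched with a toy model. This is purely illustrative: the layer sizes, random weights and shallow structure below are assumptions, not AlphaGo Zero's actual architecture (which uses a deep residual convolutional network). It only shows a single network reading the raw board and producing both a move distribution and a position evaluation.

```python
import numpy as np

# Toy sketch of a "dual-head" network: one shared body maps the raw 19x19
# board to features; a policy head scores every move, a value head estimates
# who is winning. Sizes and random weights are illustrative assumptions.
rng = np.random.default_rng(0)

BOARD = 19 * 19  # 361 points

W_body = rng.normal(scale=0.1, size=(BOARD, 64))    # shared trunk
W_policy = rng.normal(scale=0.1, size=(64, BOARD))  # policy head: score per move
W_value = rng.normal(scale=0.1, size=(64, 1))       # value head: one scalar

def dual_head(board):
    """board: flat array of 361 values in {-1 (white), 0 (empty), +1 (black)}."""
    features = np.tanh(board @ W_body)           # shared representation
    logits = features @ W_policy
    policy = np.exp(logits - logits.max())
    policy /= policy.sum()                       # probabilities over 361 moves
    value = float((features @ W_value)[0])       # raw score before squashing
    return policy, float(np.tanh(value))         # value in (-1, 1)

board = np.zeros(BOARD)
board[3 * 19 + 3] = 1.0   # a black stone near a corner
policy, value = dual_head(board)
```

In the real system the policy guides the tree search towards promising moves, while the value replaces exhaustive look-ahead with a learned judgement of the position.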
Without human expertise, AlphaGo Zero played poorly at the start. But after 3 days of self-play learning it was able to beat AlphaGo Lee, the version that beat Lee Sedol. After 21 days it was able to beat AlphaGo Master. After 40 days it won 100-0 against all previously published, champion-defeating AlphaGo versions.
To support AlphaGo, Google developed the Tensor Processing Unit (TPU), hardware designed specifically for machine learning.
Demis Hassabis' ambition is to build machines that can learn anything on their own. To distinguish this from traditional artificial intelligence, he calls it artificial general intelligence.
Hassabis argued that Deep Blue was a pre-programmed system belonging to "narrow AI": the intelligence was in the programmers. He pointed out that Go is a much harder game, with more possible board positions than there are atoms in the universe. Cursed by combinatorial explosion, an unintelligent brute-force search will not get very far. Hassabis also argued that Go is a game of construction, whereas chess is a game of destruction, which makes it harder to assess whether a position is favourable or unfavourable in Go.
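The scale of the combinatorial explosion can be checked with a back-of-envelope calculation. The figures below (roughly 35 legal moves per chess position over about 80 plies, versus roughly 250 moves per Go position over about 150 plies) are commonly quoted rough estimates, not exact counts:

```python
import math

# Estimated game-tree sizes as branching_factor ** game_length,
# computed in log10 to avoid astronomically large integers.
chess_tree = 80 * math.log10(35)    # ~10^123 chess games
go_tree = 150 * math.log10(250)     # ~10^360 Go games
atoms = 80                          # ~10^80 atoms in the observable universe

print(f"chess ~ 10^{chess_tree:.0f}, go ~ 10^{go_tree:.0f}, atoms ~ 10^{atoms}")
```

Even the chess tree dwarfs the number of atoms in the universe; the Go tree is larger still by hundreds of orders of magnitude, which is why brute-force search alone cannot solve either game, and Go least of all.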
First, it is important to recognize that machine learning is part of traditional AI, even though most traditional AI applications did not involve machine learning.
AlphaGo Zero showed that the 30 million board positions were not necessary to kick-start the learning, although they speed up the process.
In Go, the inputs are the states of the 19x19 board positions -- whether each point is empty, occupied by black or occupied by white. The decision variables are the evaluations of the 19x19 board positions: the program has to assign a value to each, indicating how good it is to play there. These are clearly identifiable inputs and variables. In most real-life situations, such as medical diagnosis, the potential inputs are less well defined. What are the decision variables in a medical diagnosis? In the style of AlphaGo, one could assign a value to each known medical problem. But what are the inputs? Does one do blood tests, take ECG readings or run a DNA analysis? These inputs all incur a cost, so some of the decision variables must be "what tests to do" -- and those decisions determine what other inputs and decisions become available.
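The contrast can be made concrete with a small sketch. Everything below -- the test names, costs and what each test reveals -- is invented for illustration; the point is only that, unlike Go's always-visible board, the observable inputs here depend on earlier "what test to run" decisions:

```python
# Hypothetical tests: each has a cost and reveals certain inputs.
TESTS = {
    "blood_test":   {"cost": 50,  "reveals": ["glucose", "white_cell_count"]},
    "ecg":          {"cost": 120, "reveals": ["heart_rhythm"]},
    "dna_analysis": {"cost": 900, "reveals": ["gene_markers"]},
}

def available_inputs(chosen_tests):
    """Which inputs the diagnoser can observe, and at what total cost,
    depends on which tests were chosen -- there is no free, fixed board."""
    inputs, cost = [], 0
    for t in chosen_tests:
        inputs += TESTS[t]["reveals"]
        cost += TESTS[t]["cost"]
    return inputs, cost

inputs, cost = available_inputs(["blood_test", "ecg"])
```

In Go, by contrast, the full 361-point state is observable at every turn at no cost; the decision of what to observe never arises.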
In Go, the objective is clear: to win the game. In real-life problems, the objectives are often less clear. In medical diagnosis we want an accurate diagnosis, but we also want to minimize cost. More tests incur higher cost but allow a more accurate diagnosis. What about discomfort? Admitting a patient to hospital allows more tests to be conducted, but would patients be comfortable staying in hospital and undergoing many tests? Do we give patients a say on whether they stay? With cost, comfort and patient preference involved, this is a multi-objective optimisation problem, and it is unclear what the "optimal" solution should be.
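One standard way to handle such problems is weighted-sum scalarisation: collapse the competing objectives into a single score using weights. The candidate plans and all their numbers below are invented for illustration; what the sketch shows is that the "optimal" plan changes with the weights, so choosing the weights is itself a judgement no algorithm makes for us:

```python
# Hypothetical diagnosis plans with three competing objectives.
plans = {
    "outpatient_basic": {"accuracy": 0.70, "cost": 100,  "discomfort": 1},
    "outpatient_full":  {"accuracy": 0.85, "cost": 400,  "discomfort": 3},
    "hospital_stay":    {"accuracy": 0.95, "cost": 2000, "discomfort": 8},
}

def best_plan(w_acc, w_cost, w_disc):
    """Weighted-sum scalarisation: reward accuracy, penalise cost/discomfort."""
    def score(p):
        return w_acc * p["accuracy"] - w_cost * p["cost"] - w_disc * p["discomfort"]
    return max(plans, key=lambda name: score(plans[name]))

cheap_choice = best_plan(w_acc=1.0, w_cost=0.001, w_disc=0.1)
accurate_choice = best_plan(w_acc=100.0, w_cost=0.001, w_disc=0.01)
```

A cost-sensitive weighting picks the basic outpatient plan, while an accuracy-dominated weighting picks the hospital stay: same data, different "optimum".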
There are many more gaps to bridge from Go to real life problems. Here are just a few:
AlphaGo has demonstrated great success. Inputs and decision variables complicate things, but can be handled, and expertise exists for tackling multi-objective optimisation. Still, there is a lot to bridge between playing Go and solving general problems. Towards AGI, AlphaGo has made only a small, though exciting and significant, step in the right direction.
It is worth remarking that this is not the first attempt to create general artificial intelligence. Many researchers (including the author) have been excited by such ambitious projects.
Besides, computers are no better than human beings at memorising things. Yes, computers can have far more memory (in RAM and on disk) than the human brain. But the popular belief today is that human beings are far more efficient at organising memory: instead of looking items up systematically, which is haunted by combinatorial explosion, we seem to retrieve hidden items from memory efficiently through association.
The purpose of this remark is purely to provide background on previous attempts. Just because previous attempts failed does not mean that DeepMind cannot succeed in achieving artificial general intelligence. After all, no previous attempt achieved as much as AlphaGo, which beat top human players at Go -- something few believed could be achieved within a decade.
[End]