# How does the AlphaZero chess playing computer evaluate positions?

From what I have read, the computer was only taught the rules of the game. See, for example, this article, which shows an illustration of the neural network. What I want to know is, what information is stored? It can’t just be the location of pieces on the board, because it will probably never again encounter the same board position. Does it count the number of pieces taken? Does it have some other way of evaluating a given position, and how is that possible if it was only told the rules of the game?


It does have a database of positions.
It got that database by playing millions of games against itself, building up a memory of the probability that each position leads to a win, and selecting the next move accordingly.

ragingloli (49946)

But is it ever going to see the same position again? Or are the endgames sufficiently small in number that it will likely have played most of them in its training?

For reference, the first ancestor of AlphaZero, AlphaGo, was provided with an extensive database of human games, which it worked from as a baseline before it started self-play.

ragingloli (49946)

Think of it this way:
There may be more possible positions than there are atoms in the universe, but there are only a limited number of positions and moves that make sense in the context of winning a game.
This greatly reduces the size of the necessary database.

ragingloli (49946)

It’s storing weights and biases and using the activation function for each node.

gorillapaws (28265)

AlphaZero was told nothing other than the rules of the game, then it set off playing millions of games against itself. I think that’s why they called it “zero”, as it began with no prior knowledge. Its style of play is alien and it makes moves humans wouldn’t normally consider good play, but it keeps winning.

flutherother (32683)

@gorillapaws But what input is being weighted, and how does the computer know what it should look at?

It extensively uses reinforcement learning. Specifically, Monte Carlo Tree Search. I believe there must be temporal-difference methods as well. That’s a fancy way of saying it has no model, but it builds experience playing against itself.
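To make the tree-search idea concrete, here is a toy sketch of the selection step of Monte Carlo Tree Search using the classic UCB1 formula, which balances moves that have won often against moves that haven’t been tried much. This is a generic illustration, not AlphaZero’s exact selection rule (AlphaZero uses a network-guided variant), and the names are made up for the example:

```python
import math

def ucb_select(children, c=1.4):
    """Pick the child node maximising win-rate plus an exploration
    bonus (UCB1). Each child is a dict with 'wins' and 'visits'."""
    total = sum(ch["visits"] for ch in children)

    def score(ch):
        if ch["visits"] == 0:
            return float("inf")  # always try unvisited moves first
        exploit = ch["wins"] / ch["visits"]
        explore = c * math.sqrt(math.log(total) / ch["visits"])
        return exploit + explore

    return max(children, key=score)

# Three candidate moves: two already explored, one never tried
moves = [{"wins": 6, "visits": 10},
         {"wins": 3, "visits": 4},
         {"wins": 0, "visits": 0}]
best = ucb_select(moves)
print(best)  # the unvisited move wins selection
```

The engine repeats this selection down the tree, plays out or evaluates the resulting position, and propagates the result back up, so statistics accumulate on promising lines.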

The inputs are probably the positions of the pieces on the board. These get fed to the “hidden nodes,” which apply their individual weights and biases to the input values. These values get passed into activation functions. Each neuron then passes the output of its activation function to the next layer of connected neurons as one of their respective inputs. Each of those neurons aggregates its input values, applies its respective weights and biases, and then passes the result into its activation function. The output of this neuron’s activation function is then passed on to the next connected neurons until they eventually reach output neurons that instruct the computer which move to make. Once the move is made, a function is run to evaluate how good the move was. Then the result is “backpropagated” through the neurons to solve for what the appropriate weights/biases ought to have been to get the correct result (basic calculus).
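The forward pass described above can be sketched in a few lines. This is a minimal toy network, not AlphaZero’s architecture (which is a deep convolutional/residual network); the layer sizes and ReLU activation here are illustrative choices:

```python
import numpy as np

def relu(x):
    # A common activation function: outputs max(0, x) per element
    return np.maximum(0, x)

def forward(inputs, layers):
    """Pass input values through each layer: multiply by the layer's
    weights, add its biases, then apply the activation function."""
    a = inputs
    for w, b in layers:
        a = relu(a @ w + b)
    return a

rng = np.random.default_rng(0)
# Toy network: 64 inputs (one per square), 16 hidden neurons, 3 outputs
layers = [
    (rng.normal(size=(64, 16)), np.zeros(16)),
    (rng.normal(size=(16, 3)), np.zeros(3)),
]
board = rng.normal(size=64)    # stand-in for an encoded position
out = forward(board, layers)   # three non-negative output values
print(out)
```

Training (the “backpropagated” step) then nudges every entry of those weight and bias arrays in the direction that would have made the output closer to the desired one.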

After millions of games, the weights and biases begin to get honed to the ideal values for the use-case they’re being trained on. Those are the values being stored in a neural network.

gorillapaws (28265)

Each input neuron has to have an associated numerical value which gets multiplied by a varying weight. What are these numerical input values?

@LostInParadise I think you’d have each piece assigned a number, pawn = 0, knight = 1, etc. Each board position could be a number, and the color of the piece could be represented as a number as well. Remember that a node can have many inputs.

gorillapaws (28265)

Maybe it is something like that, but if the computer is not given any information, then it would have to assign a value of 1 to each of its pieces and adjust the weights as it plays, so that the weights of the stronger pieces would increase.

If I remember correctly, AlphaZero, unlike conventional chess engines, does not assign hard values to each type of piece. The internal value of each piece emerges from its tactical usefulness, which it figured out during self-play.

ragingloli (49946)

@LostInParadise ”...then it would have to assign a value of 1 to each of its pieces and adjust the weights as it plays, so that the weights of the stronger pieces would increase.”

Kind of. They may be using a value for each piece in the evaluation function (at the end of the turn) to determine if the output was a good or bad move (to backpropagate new values), but there’s nowhere in the matrix of neurons where it states a pawn is worth 1 point, and this makes sense since a pawn 1 square away from the last row is much more valuable, for example. Likewise, a queen blocked into a space isn’t nearly as valuable as a pawn that could force a checkmate.

I found this article describing the architecture of the neural network for the Stockfish chess engine (there are some good diagrams here as well). It’s pretty complex stuff (I don’t fully understand it), and may not make sense without a better general understanding of how neural networks work, but I suspect you would be able to follow the general idea of how they structure the inputs and nodes.

gorillapaws (28265)
