In particular, the AlphaGo paper mentioned four neural networks of significance:
- a policy network trained on human pro games.
- a RL-enhanced policy network improving on the original SL-trained policy network.
- a value network trained on games generated by the RL-enhanced policy network
- a cheap policy network trained on human pro games, used only for rapid rollout simulations.
The cheap rollout policy network was discarded because DeepMind found that a "slow evaluations of the right positions" was better than "rapid evaluations of questionable positions". The independently trained value network was discarded because co-training a value and policy head on a shared trunk saved a significant amount of compute, and helped regularize both objectives against each other. The RL-enhanced policy network was discarded in favor of training the policy network to directly replicate MCTS search statistics.
The depth and branching factor in chess and Go are different, so I won't say the solutions ought to be the same, but it's interesting nonetheless to see the original AlphaGo ideas be resurrected in this form.
The incremental updates are also related to Zobrist Hashing, which the Stockfish authors are certainly aware of.
Alpha-beta pruning however always explores the entire searchtree systematically (up to a certain depth using itd) and prunes the searchtree by discarding bad moves. In this case you don't need as good of an evaluation function because you search the entire tree anyways. Rather you need the function to be fast, as it is evaluated on many more states.
In general AB-pruning only needs the heuristic for estimating the tail-end of the tree and for sorting the states based on usefulness, while MCTS uses all the above plus guiding the whole search process.
Spending tons of time on the heuristic is not useful as even the optimal search order can only double your searchdepth. Don't get me wrong that still a lot (especially considering exponential blowup) but MCTS can surpass this depth easily. The disadvantage is that MCTS loses a lot of the guarantees of AB-pruning and tends to "play down to his opponent" when trained using self-play because the exploration order is entirely determined by the policy.
There are two trainers currently, the original one, which runs on CPU: https://github.com/nodchip/Stockfish, and a pytorch one which runs on GPU: https://github.com/glinscott/nnue-pytorch.
The SF Discord is where all of the discussion/development is happening: https://discord.gg/KGfhSJd.
Right now there is a lot of experimentation to try adjusting the network architecture. The current leading approach is a much larger net which takes in attack information per square (eg. is this piece attacked by more pieces than it's defended by?). That network is a little slower, but the additional information seems to be enough to be stronger than the current architecture.
Btw, the original Shogi developers really did something amazing. The nodchip trainer is all custom code, and trains extremely strong nets. There are all sorts of subtle tricks embedded in there as well that led to stronger nets. Not to mention, getting the quantization (float32 -> int16/int8) working gracefully is a huge challenge.
Most of the times one would learn a model by just changing a single feature and then doing the whole sum made no sense.
A good example is learning a sequence of decisions, where each decision might have a cost associated to it, you can then say that the current decision depends on a previous one and vary the previous one to learn to recover from errors. If previous decision was bad, then you'd still like to make the best decision for current state.).
So even if your training data does not have this error-recovery example, you can just iterate through all previous decisions and make the model learn to recover.
An optimization in that case would be to just not redo the whole sum (for computing the decision function of a linear model).
What I would find interesting is if they could give engines an energy budget instead of a time limit. Maybe that makes CPU vs GPU games more interesting & fair.
For single-input "batches" (seems like this is what's being used now?) it might never be worthwhile but perhaps if multiple positions could be searched in parallel and the NN evaluation batched this might start to look tempting?
Not sure what the effect of running PVS with multiple parallel search threads is. Presumably the payoff of searching with less information means you reach the performance ceiling quite a lot quicker than MCTS-like searches as the pruning is a lot more sensitive to having up-to-date information about the principal variation.
Disclaimer: My understanding of PVS is very limited.
But I'm not sure how much of this applies to chess engines. I see some comments noting that the search part makes it hard to generate batches, but I'm not an expert in this.
The way this is made cheap is by making it incremental: given some board s and output of the first layer b + Ws, it is cheap to compute b + Wt where a t is a board that is similar to s (the difference is W(t-s) but the vector t-s is 0 in almost every element.)
This motivates some of the engineering choices like using integers instead of floats. If you used floats then this incremental update wouldn’t work.
It seems to me that a lot of the smarts of stockfish will be in the search algorithm getting good results, but I don’t know if that just requires a bit of parallelism (surely some kind of work-stealing scheduler) and brute force or if it mostly relies on some more clever strategies. And maybe I’m wrong and the key is really in the scoring of positions.
The neural net is only for scoring positions, not moves. Check the article on Stockfish NNUE at the chess programming wiki: https://www.chessprogramming.org/Stockfish_NNUE
There is a LOT of cleverness built into the stockfish search as well.
In fact, while neural networks have now won the position evolution game, it is still an open discussion in the chess programming community if the Alpha Zero / Leela search (NN + Monte Carlo) is as good as Stockfish's PVS.
Of course floats aren't strictly associative so you wouldn't get bitwise equivalence between the incremental and non-incremental updates, but I don't see how that would matter in this context.
The rest goes into sign and exponent.
Some of the Leela-Stockfish games are analyzed by agadmator on YouTube .
Traditionally this wasn't a big problem as every engine was more or less optimised for a fast Intel CPU with a moderate to large amount of RAM. The organisers would decree the specs of the championship hardware some time in advance. Now, (at least for TCEC, the other major engine tournament) they pick two hardware configurations, one CPU-heavy and one GPU-heavy, and give each team the choice.
How do you balance those? It's been suggested you should make them equal in terms of watts of power, or dollar cost to buy, but neither of those are obviously best. In practice the TCEC organisers pick something close to what they picked last time but shade it against the winning engines, making the contest more even. Chess.com likely do something similar though they're less rigorous about the details.
Do they give these computers a rating like they do human players?
Note that the numbers aren't calibrated to human players, so there's no claim that an engine with a rating of 3100 on some particular hardware should score 85% against a human rated 2800, or whatever.
There is a current tournament going on at tcec-chess.com/ which stockfish has been leading so far, but I see Leela has just caught up in the head to head.
Of course Leela also keeps evolving.
This makes things a bit worse for black, typically, because playing for the win as black means making some positional concession (that white can exploit after successfully defending) in exchange for the initiative and a chance for an attack.
In human play, black has a little more ability than white to steer the game towards their opening preparation. If they want to win, they will try to get have a position that they know in and out from (computer-assisted) home preparation. But there are still a lot of draws in classical (many hours per game) chess.
 Chess openings that white chooses are typically called "attack" (King's Indian Attack) or "game" (Scotch Game), while those that black chooses are called "defense" (Sicilian defense). You'll find that there are a lot more "defenses" that black can choose from than "attacks".
I've never heard this before, and it seems strange! Do you have a link to tornament rules with this?
Computer chess is traditionally really strong in openings because their opening book is just database lookups. What is getting a human to play the same opening book supposed to achieve?
Why would someone not want to win?
One thing I don't understand is why it would be smarter to augment the inputs with some of the missing board information - particularly the availability of castling. Even though this network is a quick-and-dirty evaluation, seems like there's room for a few additional bits in the input and being able to castle very much changes the evaluation of some common positions.
This network is just to signal to the final search algorithm how good the board looks in a given spot on tree of possible moves.
A few examples come to mind where, for instance, white has a bishop and knight attacking f7 and it's black to move. If black castles nothing interesting happens and white is likely overextended. But if black cannot castle white will win a rook.
I can make up a similar situation with en passant, and this does come up, but a lot less frequently.
So anyway, surprised you wouldn't toss a few bits in there, at least the 2 for ability to castle by each player.
In the chess scenario if you can prove that there is a set of moves where you win no matter what the other colour does then you have solved the game. You don't need to consider every possible game, just the games reachable within this set of moves.
Solving chess would be proving that "From starting position black can win every time" (or the same for white). You don't need to prove how to win from every possible board state.
Having said this, the nature of many chess endgames suggests that such a proof is not really possible, or at least would not be "simple". As an example, tempo / opposition flips games from draws to wins, etc.
Assuming they would want to waste part of a planet's worth of computing resources to play perfect chess. In any case, it's far beyond anything we can compute.
This optimization always done by chess engines, it's called pruning (they'd be quite crippled without it).
Maybe the neural network component is now in charge of it, but it's not a new thing.
> it’s a simple feedforward network with... a single scalar output to give a numerical score for the position, indicating how favourable it is for the player about to move.
The search algorithm proposes moves, computes the board state resulting from them, and evaluates each state.
This may seem like a trivial distinction, but it's important. The Stockfish method requires knowing the rules of the game; networks that output a move directly do not.
I don't think this is correct. According to the article, they are only using the neural network for evaluating the value (strength) of the positions. The search is still the same.
I believe Leela works in the manner you described.
There is also the 50-move rule of course which limits more drastically the length of a game.
It's a typical mistake to equate chess draws with a stalemate, possibly due to the broader, idiomatic meaning of "stalemate" in common speech.
The word you're looking for is "deterministic", rather than solved. Solved is usually used to mean the exhaustive computation has been done. Deterministic would mean it could be if enough computing power existed, but for chess it doesn't.
This is, for example, why we have a concrete number for the number of legal go positions even though there are far too many to enumerate. https://en.wikipedia.org/wiki/Go_and_mathematics#Legal_posit...
The rules of chess are so irregular that it seems likely that any latent structure would be tremendously complex, but you never know what some clever new analysis technique will do.
Chess is a hard game to solve completely, and I think
one reason is that there are many states in chess where there is not one superior move, but only a probable optimal move, in the sense that the game-tree for winning against the move is smaller than other moves.
Then the agent/player has to guess what the opponents strategy is given that move, and that depends on the opponent.
I plays are truly creative, and its results speak for itself.
"In AlphaZero's chess match against Stockfish 8 (2016 TCEC world champion), each program was given one minute per move. Stockfish was allocated 64 threads and a hash size of 1 GB, a setting that Stockfish's Tord Romstad later criticized as suboptimal. AlphaZero was trained on chess for a total of nine hours before the match. During the match, AlphaZero ran on a single machine with four application-specific TPUs. In 100 games from the normal starting position, AlphaZero won 25 games as White, won 3 as Black, and drew the remaining 72. In a series of twelve, 100-game matches (of unspecified time or resource constraints) against Stockfish starting from the 12 most popular human openings, AlphaZero won 290, drew 886 and lost 24."
Here is a graph of Stockfish's progress since the end of 2015: https://camo.githubusercontent.com/f169f774996346ad146f96f74.... Self-play exaggerates strength differences but it's been a very good rate of progress, even without hardware improvements in the meantime.
A solved game is categorically different from merely teaching a machine to be better at it than humans are presently.
Cepheus  is approximately a solution to Heads-Up (two player) Limit (ie the amount you can bet is specified in the game, you don't get to just bet arbitrary money) Texas Hold 'em poker.
In contrast Libratus is an AI that plays Heads-Up No Limit Hold 'em very well against humans.
You can examine the Cepheus strategy for yourself, if you could memorize it (it's too complicated) and were capable of making truly random decisions (Poker is a game of chance and so your strategy needs random action) you could reproduce it and be exactly as good at the game as Cepheus is. You can examine individual strategy elements and reason about them. For example if Cepheus gets an 8 and a 3 and you open with a bet, it will call your bet if they're the same suit, otherwise it will fold. On the other hand if those suited cards were an 8 and a Queen, it would raise a bit more than nine times out of ten.
You can't do anything about it (you might be thinking surely knowing that e.g. Cepheus folds 8-3 off here is valuable so I can benefit from that, er, no, "hiding" that by sometimes playing it loses more money than Cepheus gives up by folding it that's why this is a perfect strategy), if playing Cepheus, even with an incrementally "more" approximate solution you're not going to win a significant amount of money reliably in reasonable time, that's why it's approximately solved.
But Libratus isn't like that, it's playing some strategy that we know beat world class human players, but a further incrementally better AI might crush Libratus just as badly.
I don't think this statement is true given , but maybe I understood you wrong.
The uncertainty is due to the inability to scan the entire sub-tree for each move.
The theorem you linked to would only be applicable in certain endgame situations.
Maybe it's not clear to human players or current engines, but that does not mean it doesn't exist, and indeed the theorem mentioned above states that at least one player has an optimal strategy (either forcing a draw or winning). Obviously, that does not necessarily mean that we'll ever be able to design an engine that can compute this strategy in a reasonable time.
I just think that OP meant something other than the usual definition of "solved", maybe they meant that engines are plateauing, but as others have pointed out, there is also little evidence for that for now.
It says that if the game cannot end in a draw, then one of the two players must have a winning strategy (i.e. force a win).
It made me think it didn't apply in situations where the other player could force a draw.
Reading the linked article though made it more clear what it's about, and I agree with what you said.
That uncertainty is specifically why chess is unsolved.
edit: so to make it crystal clear, I didn't try to argue chess was solved, just to point out why I think that theorem doesn't really apply to most of chess.
Lots of pieces about neural network design skip over the design of the representation in the input stage, which is one of the key design issues when building a custom neural network.
I love how much depth this articles goes into about that representation.