Next step chess

1/11/2024

One supervised learning method considers desired moves from a set of positions, likely from grandmaster games, and tries to adjust the evaluation weights so that, for instance, a one-ply search agrees with the desired move. Already pioneering in reinforcement learning some years before, Arthur Samuel described move adaptation in 1967 as used in the second version of his checkers player, where a structure of stacked linear evaluation functions was trained by computing a correlation measure based on the number of times a feature rated an alternative move higher than the desired move played by an expert. In chess, move adaptation was first described by Thomas Nitsche in 1982, and with some extensions by Tony Marsland in 1985. Eval tuning in Deep Thought, as mentioned by Feng-hsiung Hsu et al. in 1990 and later published by Andreas Nowatzyk, is also based on an extended form of move adaptation. Jonathan Schaeffer's and Paul Lu's efforts to make Deep Thought's approach work for Chinook in 1990 failed: nothing seemed to produce results that were as good as their hand-tuned effort.

A second supervised learning approach used to tune evaluation weights is based on regression toward a desired value, i.e. using the final outcome from huge sets of positions from quality games, or other information supplied by a supervisor, for instance in the form of annotations from position evaluation symbols. Often, value adaptation is reinforced by determining an expected outcome through self-play.

Reinforcement learning, in particular temporal difference learning, has a long history in tuning evaluation weights in game programming, first seen in the late 1950s in Arthur Samuel's checkers player. In self-play against a stable copy of itself, after each move the weights of the evaluation function were adjusted so that the score of the root position after a quiescence search moved closer to the score of the full search. This TD method was generalized and formalized by Richard Sutton in 1988, who introduced the decay parameter λ, which controls what proportion of the training signal comes from the outcome of Monte Carlo simulated games, tapering between pure bootstrapping (λ = 0) and pure Monte Carlo (λ = 1). TD-λ was famously applied by Gerald Tesauro in his backgammon program TD-Gammon; its minimax adaptation TD-Leaf was successfully used for eval tuning in chess programs, with KnightCap and CilkChess as prominent examples.
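To make the TD-λ idea concrete, here is a minimal sketch of the weight update for a linear evaluation function trained from a single self-play game. Everything in it is an illustrative assumption: the feature vectors, the learning rate `alpha`, and `lam` are made up for the example, and this is not the code of TD-Gammon, KnightCap, or CilkChess. A real TD-Leaf implementation would also take its feature vectors from the leaves of the principal variation of each search rather than from the root positions.

```python
# Minimal TD(lambda) weight update for a linear evaluation function.
# Illustrative sketch only: features, learning rate, and game data are
# hypothetical, not taken from any particular engine.

import numpy as np

def td_lambda_update(weights, feature_seq, outcome, alpha=0.01, lam=0.7):
    """Update evaluation weights from one self-play game.

    weights     -- current weight vector, shape (n_features,)
    feature_seq -- one feature vector per position s_0..s_N (for TD-Leaf
                   these would come from the leaf of the principal
                   variation rather than the root)
    outcome     -- final game result from the learner's point of view,
                   e.g. +1 win, 0 draw, -1 loss
    alpha       -- learning rate
    lam         -- decay parameter: lam = 0 is pure bootstrapping,
                   lam = 1 is the pure Monte Carlo (final-outcome) target
    """
    phis = [np.asarray(phi, dtype=float) for phi in feature_seq]
    # Predicted values V(s_t) = w . phi(s_t); the terminal value is
    # pinned to the actual game outcome.
    values = [weights @ phi for phi in phis] + [float(outcome)]
    # Temporal differences d_t = V(s_{t+1}) - V(s_t).
    deltas = [values[t + 1] - values[t] for t in range(len(phis))]

    new_w = weights.copy()
    for t in range(len(phis)):
        # lam-discounted sum of future temporal differences:
        # sum over j >= t of lam^(j-t) * d_j.
        target_err = sum(lam ** (j - t) * deltas[j]
                         for j in range(t, len(deltas)))
        # The gradient of a linear evaluation w.r.t. the weights is
        # just the feature vector phi(s_t).
        new_w += alpha * target_err * phis[t]
    return new_w

# Toy usage: a three-position game with two features, ending in a win.
w = np.zeros(2)
game = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
w = td_lambda_update(w, game, outcome=+1.0)
print(w)
```

The role of `lam` is visible in the inner sum: at λ = 0 only the immediate next prediction contributes (bootstrapping), while at λ = 1 every future difference counts equally, so each position is effectively regressed toward the final game outcome.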