Next step chess

1/11/2024

One supervised learning method considers desired moves from a set of positions, likely from grandmaster games, and tries to adjust the evaluation weights so that, for instance, a one-ply search agrees with the desired move. Already pioneering in reinforcement learning some years before, Arthur Samuel described move adaptation in 1967 as used in the second version of his checkers player, where a structure of stacked linear evaluation functions was trained by computing a correlation measure based on the number of times a feature rated an alternative move higher than the desired move played by an expert. In chess, move adaptation was first described by Thomas Nitsche in 1982, and with some extensions by Tony Marsland in 1985. Eval tuning in Deep Thought, as mentioned by Feng-hsiung Hsu et al. in 1990 and later published by Andreas Nowatzyk, is also based on an extended form of move adaptation. Jonathan Schaeffer's and Paul Lu's efforts to make Deep Thought's approach work for Chinook in 1990 failed: nothing seemed to produce results that were as good as their hand-tuned effort.

A second supervised learning approach used to tune evaluation weights is based on regression toward a desired value, i.e. using the final outcome from huge sets of positions from quality games, or other information supplied by a supervisor, for instance in the form of annotations from position evaluation symbols. Often, value adaptation is reinforced by determining an expected outcome through self-play.

Reinforcement learning, in particular temporal difference learning, has a long history in tuning evaluation weights in game programming, first seen in the late 1950s in Arthur Samuel's checkers player. In self-play against a stable copy of itself, after each move the weights of the evaluation function were adjusted so that the score of the root position after a quiescence search moved closer to the score of the full search. This TD method was generalized and formalized by Richard Sutton in 1988, who introduced the decay parameter λ, which controls what proportion of the training signal comes from the outcome of Monte Carlo simulated games, tapering between pure bootstrapping (λ = 0) and pure Monte Carlo (λ = 1). TD-λ was famously applied by Gerald Tesauro in his backgammon program TD-Gammon; its minimax adaptation TD-Leaf was successfully used for eval tuning in chess programs, with KnightCap and CilkChess as prominent examples.
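To make the TD-λ idea concrete, here is a minimal sketch of the weight update for a linear evaluation function trained from a single self-play game. Everything in it is an illustrative assumption: the feature vectors, the learning rate `alpha`, and `lam` are made up for the example, and this is not the code of TD-Gammon, KnightCap, or CilkChess. A real TD-Leaf implementation would also take its feature vectors from the leaves of the principal variation of each search rather than from the root positions.

```python
# Minimal TD(lambda) weight update for a linear evaluation function.
# Illustrative sketch only: features, learning rate, and game data are
# hypothetical, not taken from any particular engine.

import numpy as np

def td_lambda_update(weights, feature_seq, outcome, alpha=0.01, lam=0.7):
    """Update evaluation weights from one self-play game.

    weights     -- current weight vector, shape (n_features,)
    feature_seq -- one feature vector per position s_0..s_N (for TD-Leaf
                   these would come from the leaf of the principal
                   variation rather than the root)
    outcome     -- final game result from the learner's point of view,
                   e.g. +1 win, 0 draw, -1 loss
    alpha       -- learning rate
    lam         -- decay parameter: lam = 0 is pure bootstrapping,
                   lam = 1 is the pure Monte Carlo (final-outcome) target
    """
    phis = [np.asarray(phi, dtype=float) for phi in feature_seq]
    # Predicted values V(s_t) = w . phi(s_t); the terminal value is
    # pinned to the actual game outcome.
    values = [weights @ phi for phi in phis] + [float(outcome)]
    # Temporal differences d_t = V(s_{t+1}) - V(s_t).
    deltas = [values[t + 1] - values[t] for t in range(len(phis))]

    new_w = weights.copy()
    for t in range(len(phis)):
        # lam-discounted sum of future temporal differences:
        # sum over j >= t of lam^(j-t) * d_j.
        target_err = sum(lam ** (j - t) * deltas[j]
                         for j in range(t, len(deltas)))
        # The gradient of a linear evaluation w.r.t. the weights is
        # just the feature vector phi(s_t).
        new_w += alpha * target_err * phis[t]
    return new_w

# Toy usage: a three-position game with two features, ending in a win.
w = np.zeros(2)
game = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
w = td_lambda_update(w, game, outcome=+1.0)
print(w)
```

The role of `lam` is visible in the inner sum: at λ = 0 only the immediate next prediction contributes (bootstrapping), while at λ = 1 every future difference counts equally, so each position is effectively regressed toward the final game outcome.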