UCT算法的实现,,,.doc

上传人：工*** IP属地：浙江上传时间：2020-04-16 格式：DOC 页数：8 大小：18.78KB 积分：12 举报 版权申诉

已阅读5页，还剩3页未读，继续免费阅读

版权说明：本文档由用户提供并上传，收益归属内容提供方，若内容存在侵权，请进行举报或认领

文档简介

UCT算法的实现在网上，实在很难找到一篇全面一点的介绍mc和uct的论文，即使是E文的东西，也因为专业性实在太强，导致很难读懂（only for me），偶尔看到crazystone 网站上的一点东西，只好免强啃下来，E文水平实在太差，感兴趣的朋友只好对照E文啃了！UCTfor Monte Carlo computer goA Monte Carlo (MC) go program plays random games and easily evaluates the terminal position after two passes using Chinese rules. A MC program searches for moves that have high win rates, calculated from playing out at least a few hundred random games.一个mc围棋程序随机地下棋，而且根据中国规则可以很容易地对双方结束后的态势进行估值。Mc程序经过计算无数个棋局搜索那个取得最高胜率的走法。A basic MC program would simply collect statistics for all children of the root node, but MC evaluation of these moves will not converge to the best move. It is therefore necessary to do a deeper search.一个最基本的mc程序会简单地为第一个根节点统计数值。但是这些着法的mc估值不会趋向于一个最佳着法，因此需要做更深的搜索。The UCT-method (which stands for Upper Confidence bounds applied to Trees) is a very natural extension to MC-search, where for each played game the first moves are selected by searching a tree which is grown in memory, and as soon as a terminal node is found a new move/child is added to the tree and the rest of the game is played randomly. The evaluation of the finished random game is then used to update the statistics of all moves in the tree that were part of that game.UCT算法（树图置信）是对mc搜索自然的扩展，对每一盘棋，通过搜索一个在内存中生长的树来确定最初的着法选择。当一个枝端节点出现时，再生长一个新的枝节点，余下的棋自由走下去。利用棋局最后的估值更新对局树中所有走法的统计数据，而这些走法仅是棋局的一部分。The UCT-method solves the problem of selecting moves in the tree such that important moves are searched more often than moves that appears to be bad. One way of doing this would be to select the node with the highest winrate 50% of the time and select a random move otherwise.UCT算法解决了一个问题，就是在树中选择走法，重要的走法比那些看起来差点的走法更多地被。一个办法就是选择最高胜率高过50的，或者随机走法。UCT also selects the best move most of the time but also explores other moves in a more sophisticated way. It does this by adding a number to the win rate for each candidate move that goes down every time the node has been visited. But this number also goes up a little every time the parent was visited and some other move is selected. This means that the win rate + the number will grow for unexplored moves so that at some point the sum is higher than for all moves that have higher winrates. If the move works, then the winrate goes up and it may soon be selected again. If the move failed to win then the winrate goes down as well as the added number and the move may have to wait for a long time before the added number has grown large enough. A move can also be selected if all other moves are refuted so that the winrates for all competitors goes down.UCT很多的时候不仅选择最好的走法，而且还探索更加通常的一些走法。这样做是通过对每个被访问的低势候选走法的胜率增加一个数来实现的。但是这个数每次在父节点被访问或是其它走法被选择时会同时升高一点。这就是当胜率加上为每个未被找到的走法增长的值，因此在有些点，汇总值要高于那些有较高胜率的点。如果这一步可行，那么胜率增加，很快被重新选择，如果这一步导致失败，那么胜率变小，这个增加值和走法会在增加数值增长到足够大之前再等一段时间。一个走法能够被选择，如果所有其它走法被拒绝，因而各方所有的胜率都降底。In summary: starting from the root UCT selects a path of moves through the tree by computing a value for each candidate move based on the winrate and how many times the move has been played as well as how many times the position has been visited. If there are children to a node that has not been visited yet then one of those moves are selected at random. Since this algorithm does not assume any knowledge about go it is natural to explore every move at least once.总之，从根节点开始，UCT通过为每个基于胜率和走棋次数（即这个节点被访问的次数）的候选走法去计算一个值用来从树中选择一个走法路径。如果有些子节点还未被访问，那么这些节点会被随机选择。故而这个算法不需预设的任何围棋知识，它自然能够至少一次遍历到每种走法。The sum of the winrate and the number (UCTvalue) for each candidate move n is computed with胜率和每个候选走法n的数值通过如下公式计算。UCTValue(parent,n)=winrate+sqrt(ln(parent.visits)/(5*n.nodevisits)and the candidate move n with highest UCTvalue is selected to be played next.具有最高UCTvalue值的候选的走法n被用于下一步棋。With UCT one can strike a good balance between searching the best move so far and exploring alternative moves. I think the innovation of UCT (to me at least) is the logarithm of the number of visits to the parent node for the UCT-Value. This value goes up quickly early but then it becomes flat, but never stops rising. It goes to infinity although very very slowly. All moves will be searched no matter how bad winrates they have, so that UCT-search given close to infinite time and memory will find any winning move no matter how misleading the winrates are initially.利用UCT，你可以在进一步的搜索最好走法和找到不相上下的走法之间得到一个好的平衡。我想UCT的创新 (至少对我)是就UCT-Value值而言即是父节点访问数量的对数。这个值早期很快增大，随即增长平缓，但从不停止增大。它会很慢很慢地趋向无穷大。不管胜率有多差，所有的走法都会被搜索得到。因此只要时间和内存无穷大，UCT搜索将会找到任何可胜的走法，不管最初胜率是如何的难以确定。Implementation details实现的细节There are at least two ways to implement the tree:至少有两种方法可以实现这个树1. Build a tree in memory Pro: Easy to program Con: You quickly run out of memory using this method.1、在内存中建立一个树。优点：编程容易，缺点：利用这种方法你很快就会用完所有内存。2. Transposition Table: Pro: Identical positions reached with different move orders will not be searched twice Con: Depending on implementation details one might have problems with the correctness of t

人人文库> 全部分类> 教育资料 > 幼儿教育

温馨提示

1. 本站所有资源如无特殊说明，都需要本地电脑安装OFFICE2007和PDF阅读器。图纸软件为CAD,CAXA,PROE,UG,SolidWorks等.压缩文件请下载最新的WinRAR软件解压。
2. 本站的文档不包含任何第三方提供的附件图纸等，如果需要附件，请联系上传者。文件的所有权益归上传用户所有。
3. 本站RAR压缩包中若带图纸，网页内容里面会有图纸预览，若没有图纸预览就没有图纸。
4. 未经权益所有人同意不得将文件中的内容挪作商业或盈利用途。
5. 人人文库网仅提供信息存储空间，仅对用户上传内容的表现方式做保护处理，对用户上传分享的文档内容本身不做任何修改或编辑，并不能对任何下载内容负责。
6. 下载文件中如有侵权或不适当内容，请与我们联系，我们立即纠正。
7. 本站不保证下载资源的准确性、安全性和完整性, 同时也不承担用户因使用这些下载资源对自己和他人造成任何形式的伤害或损失。

UCT算法的实现,,,.doc

文档简介

温馨提示

最新文档

评论

UCT算法的实现,,,.doc

文档简介

温馨提示

最新文档

评论

相关文档