UCT算法的实现培训讲学

上传人：M*** IP属地：天津上传时间：2023-07-06 格式：DOCX 页数：8 大小：18.39KB 积分：22 举报 版权申诉

已阅读5页，还剩3页未读，继续免费阅读

版权说明：本文档由用户提供并上传，收益归属内容提供方，若内容存在侵权，请进行举报或认领

文档简介

UCT算法的实现,,,精品文档UCT算法的实现在网上，实在很难找到一篇全面一点的介绍 mc和uct的论文，即使是E文的东西，也因为专业性实在太强，导致很难读懂（onlyforme），偶尔看到crazystone网站上的一点东西，只好免强啃下来，E文水平实在太差，感兴趣的朋友只好对照E文啃了！{UCTforMonteCarlocomputergoAMonteCarlo(MC)goprogramplaysrandomgamesandeasilyevaluatestheterminalpositionaftertwopassesusingChineserules.AMCprogramsearchesformovesthathavehighwinrates,calculatedfromplayingoutatleastafewhundredrandomgames.一个mc围棋程序随机地下棋，而且根据中国规则可以很容易地对双方结束后的态势进行估值。Mc程序经过计算无数个棋局搜索那个取得最高胜率的走法。收集于网络，如有侵权请联系管理员删除精品文档AbasicMCprogramwouldsimplycollectstatisticsforallchildrenoftherootnode,butMCevaluationofthesemoveswillnotconvergetothebestmove.Itisthereforenecessarytodoadeepersearch.一个最基本的mc程序会简单地为第一个根节点统计数值。但是这些着法的 mc估值不会趋向于一个最佳着法，因此需要做更深的搜索。TheUCT-method(whichstandsforUpperConfidenceboundsappliedtoTrees)isaverynaturalextensiontoMC-search,whereforeachplayedgamethefirstmovesareselectedbysearchingatreewhichisgrowninmemory,andassoonasaterminalnodeisfoundanewmove/childisaddedtothetreeandtherestofthegameisplayedrandomly.Theevaluationofthefinishedrandomgameisthenusedtoupdatethestatisticsofallmovesinthetreethatwerepartofthatgame.UCT算法（树图置信）是对mc搜索自然的扩展，对每一盘棋，通过搜索一个在内存中生长的树来确定最初的着法选择。当一个枝端节点出现时，再生长一个新的枝节点，余下的棋自由走下去。利用棋收集于网络，如有侵权请联系管理员删除精品文档局最后的估值更新对局树中所有走法的统计数据，而这些走法仅是棋局的一部分。TheUCT-methodsolvestheproblemofselectingmovesinthetreesuchthatimportantmovesaresearchedmoreoftenthanmovesthatappearstobebad.Onewayofdoingthiswouldbetoselectthenodewiththehighestwinrate50%ofthetimeandselectarandommoveotherwise.UCT算法解决了一个问题，就是在树中选择走法，重要的走法比那些看起来差点的走法更多地被。一个办法就是选择最高胜率高过50％的，或者随机走法。UCTalsoselectsthebestmovemostofthetimebutalsoexploresothermovesinamoresophisticatedway.Itdoesthisbyaddinganumbertothewinrateforeachcandidatemovethatgoesdowneverytimethenodehasbeenvisited.Butthisnumberalsogoesupalittleeverytimetheparentwasvisitedandsomeothermoveisselected.Thismeansthatthewinrate+thenumberwillgrowforunexploredmovessothatatsomepointthesumishigherthanforallmovesthathavehigherwinrates.Ifthemoveworks,thenthe收集于网络，如有侵权请联系管理员删除精品文档winrategoesupanditmaysoonbeselectedagain.Ifthemovefailedtowinthenthewinrategoesdownaswellastheaddednumberandthemovemayhavetowaitforalongtimebeforetheaddednumberhasgrownlargeenough.Amovecanalsobeselectedifallothermovesarerefutedsothatthewinratesforallcompetitorsgoesdown.UCT很多的时候不仅选择最好的走法，而且还探索更加通常的一些走法。这样做是通过对每个被访问的低势候选走法的胜率增加一个数来实现的。但是这个数每次在父节点被访问或是其它走法被选择时会同时升高一点。这就是当胜率加上为每个未被找到的走法增长的值，因此在有些点，汇总值要高于那些有较高胜率的点。如果这一步可行，那么胜率增加，很快被重新选择，如果这一步导致失败，那么胜率变小，这个增加值和走法会在增加数值增长到足够大之前再等一段时间。一个走法能够被选择，如果所有其它走法被拒绝，因而各方所有的胜率都降底。Insummary:startingfromtherootUCTselectsapathofmovesthroughthetreebycomputingavalueforeachcandidatemovebasedonthewinrateand收集于网络，如有侵权请联系管理员删除精品文档howmanytimesthemovehasbeenplayedaswellashowmanytimesthepositionhasbeenvisited.Iftherearechildrentoanodethathasnotbeenvisitedyetthenoneofthosemovesareselectedatrandom.Sincethisalgorithmdoesnotassumeanyknowledgeaboutgoitisnaturaltoexploreeverymoveatleastonce.总之，从根节点开始， UCT通过为每个基于胜率和走棋次数（即这个节点被访问的次数）的候选走法去计算一个值用来从树中选择一个走法路径。如果有些子节点还未被访问，那么这些节点会被随机选择。故而这个算法不需预设的任何围棋知识，它自然能够至少一次遍历到每种走法。Thesumofthewinrateandthenumber(UCTvalue)foreachcandidatemoveniscomputedwith胜率和每个候选走法 n的数值通过如下公式计算。UCTValue(parent,n)=winrate+sqrt((ln(parent.visits))/(5*n.nodevisits))andthecandidatemovenwithhighestUCTvalueisselectedtobeplayednext.收集于网络，如有侵权请联系管理员删除精品文档具有最高UCTvalue值的候选的走法n被用于下一步棋。WithUCTonecanstrikeagoodbalancebetweensearchingthebestmovesofarandexploringalternativemoves.IthinktheinnovationofUCT(tomeatleast)isthelogarithmofthenumberofvisitstotheparentnodefortheUCT-Value.Thisvaluegoesupquicklyearlybutthenitbecomesflat,butneverstopsrising.Itgoestoinfinityalthoughveryveryslowly.Allmoveswillbesearchednomatterhowbadwinratestheyhave,sothatUCT-searchgivenclosetoinfinitetimeandmemorywillfindanywinningmovenomatterhowmisleadingthewinratesareinitially.利用UCT，你可以在进一步的搜索最好走法和找到不相上下的走法之间得到一个好的平衡。我想UCT的创新(至少对我)是就UCT-Value值而言即是父节点访问数量的对数。这个值早期很快增大，随即增长平缓，但从不停止增大。它会很慢很慢地趋向无穷大。不管胜率有多差，所有的走法都会被搜索得到。因此只要时间和内存无穷大，UCT搜索将会找到任何可胜的走法，不管最初胜率是如何的难以确定。收集于网络，如有侵权请联系管理员删除精品文档Implementationdetails实现的细节Thereareatleasttwowaystoimplementthetree:至少有两种方法可以实现这个树BuildatreeinmemoryPro:EasytoprogramCon:Youquicklyrunoutofmemoryusingthismethod.1、在内存中建立一个树。优点：编程容易，缺点：利用这种方法你很快就会用完所有内存。TranspositionTable:Pro:IdenticalpositionsreachedwithdifferentmoveorderswillnotbesearchedtwiceCon:Dependingonimplementationdetailsonemighthaveproblemswiththecorrectnessofthealgori

人人文库> 全部分类> 办公材料 > 办公文档

温馨提示

1. 本站所有资源如无特殊说明，都需要本地电脑安装OFFICE2007和PDF阅读器。图纸软件为CAD,CAXA,PROE,UG,SolidWorks等.压缩文件请下载最新的WinRAR软件解压。
2. 本站的文档不包含任何第三方提供的附件图纸等，如果需要附件，请联系上传者。文件的所有权益归上传用户所有。
3. 本站RAR压缩包中若带图纸，网页内容里面会有图纸预览，若没有图纸预览就没有图纸。
4. 未经权益所有人同意不得将文件中的内容挪作商业或盈利用途。
5. 人人文库网仅提供信息存储空间，仅对用户上传内容的表现方式做保护处理，对用户上传分享的文档内容本身不做任何修改或编辑，并不能对任何下载内容负责。
6. 下载文件中如有侵权或不适当内容，请与我们联系，我们立即纠正。
7. 本站不保证下载资源的准确性、安全性和完整性, 同时也不承担用户因使用这些下载资源对自己和他人造成任何形式的伤害或损失。

UCT算法的实现培训讲学

文档简介

温馨提示

最新文档

评论

UCT算法的实现培训讲学

文档简介

温馨提示

最新文档

评论

相关文档