




How to Win a Chinese Chess Game: Reinforcement Learning
Cheng, Wen Ju

Set Up
[Board diagram; surviving labels: RIVER, General, Guard, Minister, Rook, Knight, Cannon, Pawn]

Training
- How long does it take for a human?
- How long does it take for a computer?
- The chess program "KnightCap" used TD to learn its evaluation function while playing on the Free Internet Chess Server (FICS). It improved from a 1650 rating to a 2100 rating (the level of a US Master; world champions are rated around 2900) in just 308 games and 3 days of play.

Training
- Play a series of games in a self-play learning mode using temporal difference learning.
- The goal is to learn some simple strategies: piece values, or weights.

Why Temporal Difference Learning
- The average branching factor for the game tree is usually around 30.
- The average game lasts around 100 ply.
- The size of a game tree is therefore about $30^{100}$.

Searching
- alpha-beta search
- 3-ply search vs. 4-ply search
- horizon effect
- quiescence cutoff search

Horizon Effect
[Figure: positions at t, t+1, t+2, t+3]

Evaluation Function
- feature: a property of the game
- feature evaluators: Rook, Knight, Cannon, Minister, Guard, and Pawn
- weight: the value of a specific piece type
- feature function $f$: returns the current player's piece advantage on a scale from -1 to 1
- evaluation function $Y$: $Y = \sum_{k=1}^{7} w_k f_k$

TD(λ) and Updating the Weights
$w_{i,t+1} = w_{i,t} + \alpha (Y_{t+1} - Y_t) \sum_{k=1}^{t} \lambda^{t-k} \nabla_{w_i} Y_k$
$\phantom{w_{i,t+1}} = w_{i,t} + \alpha (Y_{t+1} - Y_t)(f_{i,t} + \lambda f_{i,t-1} + \lambda^{2} f_{i,t-2} + \dots + \lambda^{t-1} f_{i,1})$
- $\alpha = 0.01$: learning rate, how quickly the weights can change
- $\lambda = 0.01$: feedback coefficient, how much to discount past values

Features Table
[Table not recoverable from the extracted text]

Array of Weights
[Figure not recoverable from the extracted text]

Example
[Figure: positions at t=5, t=6, t=7, t=8]

Final Reward
Loser:
- if the game is a draw, the final reward is 0
- if the board evaluation is negative, the final reward is twice the board evaluation
- if the board evaluation is positive, the final reward is -2 times the board evaluation
Winner:
- if the game is a draw, the final reward is 0
- if the board evaluation is negative, the final reward is -2 times the board evaluation
- if the board evaluation is positive, the final reward is twice the board evaluation

Final Reward (continued)
- the weights are normalized by dividing by the greatest weight
- any negative weights are set to zero
- the most valuable piece has weight 1

Summary of Main Events
1. Red's turn
   - Update weights for Red using TD(λ)
   - Red does alpha-beta search
   - Red executes the best move found
2. Blue's turn
   - Update weights for Blue using TD(λ)
   - Blue does alpha-beta search
   - Blue executes the best move found (go to 1)

After the Game Ends
1. Calculate and assign the final reward for the losing player
2. Calculate and assign the final reward for the winning player
3. Normalize the weights between 0 and 1

Results
- 10-game series
- 100-game series
- learned weights are carried over into the next series
- began with all weights initialized to 1
- The goal is to learn piece values that are close to the default values defined by H.T. Lau, or even better.

Observed Behavior
- in the early stages, played pretty randomly
- after 20 games, had identified the most valuable piece: Rook
- after 250 games, played better: protecting the valuable pieces and trying to capture a valuable piece

Weights
[Table not recoverable from the extracted text]

Testing
- self-play games
- Red played using the weights learned after 250 games
- Blue used H.T. Lau's equivalent of the weights
- 5 games: Red won 3, Blue won 1, 1 draw

Future Works
8 different types or categories of features:
- Piece Values
- Comparative Piece Advantage
- Mobility
- Board Position
- Piece Proximity
- Time Value of Pieces
- Piece Combinations
- Piece Configurations

Examples
- Cannon behind Knight

Conclusion
Computer Chinese chess has been studied for more than twenty years. Recently, owing to advances in AI research and improvements in the efficiency and capacity of computer hardware, some Chinese chess programs of grand-master level (about 6-dan in Taiwan) have been successfully developed. Professor Shun-Chin Hsu of Chang-Jung University (CJU), who has been involved in the development of computer Chinese chess programs for a long time, points out that "the strength of Chinese chess programs increases 1-dan every three years." He also predicts that a computer program will beat the "world champion of Chinese chess" before 2012.

When and What
2004 World Computer Chinese Chess Championship
- Competition dates: June 25-26, 2004
- Prizes:
  (1) First place: USD 1,500 and a gold medal
  (2) Second place: USD 900 and a silver medal
  (3) Third place: USD 600 and a bronze medal
  (4) Fourth place: USD 300
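
Worked sketch of the learning rules

The evaluation function, the TD(λ) weight update, the final reward, and the normalization step described in the slides above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the author's program: the function names (`evaluate`, `td_lambda_update`, `final_reward`, `normalize`) are hypothetical, the feature values are made up, and returning all zeros when no weight is positive is an assumption the slides do not cover.

```python
from typing import List, Sequence

ALPHA = 0.01   # learning rate given in the slides
LAMBDA = 0.01  # feedback coefficient given in the slides


def evaluate(weights: Sequence[float], features: Sequence[float]) -> float:
    """Linear evaluation Y = sum_k w_k * f_k."""
    return sum(w * f for w, f in zip(weights, features))


def td_lambda_update(
    weights: Sequence[float],
    feature_history: Sequence[Sequence[float]],
    y_t: float,
    y_next: float,
    alpha: float = ALPHA,
    lam: float = LAMBDA,
) -> List[float]:
    """One TD(lambda) step for a linear evaluator.

    feature_history[k - 1] holds the feature vector seen at time k, k = 1..t.
    Because Y is linear in the weights, dY_k/dw_i is simply f_{i,k}.
    """
    t = len(feature_history)
    delta = y_next - y_t  # temporal difference Y_{t+1} - Y_t
    updated = []
    for i, w in enumerate(weights):
        # f_{i,t} + lam * f_{i,t-1} + ... + lam^(t-1) * f_{i,1}
        trace = sum(lam ** (t - k) * feature_history[k - 1][i]
                    for k in range(1, t + 1))
        updated.append(w + alpha * delta * trace)
    return updated


def final_reward(board_eval: float, winner: bool, draw: bool = False) -> float:
    """Final reward table from the slides: 0 for a draw, otherwise
    +2*|evaluation| for the winner and -2*|evaluation| for the loser."""
    if draw:
        return 0.0
    magnitude = 2.0 * abs(board_eval)
    return magnitude if winner else -magnitude


def normalize(weights: Sequence[float]) -> List[float]:
    """Set negative weights to zero, then divide by the greatest weight."""
    clipped = [max(w, 0.0) for w in weights]
    top = max(clipped)
    if top == 0.0:
        return clipped  # assumption: nothing to scale if no weight is positive
    return [w / top for w in clipped]


if __name__ == "__main__":
    w = [1.0, 1.0]                      # all weights start at 1, as in the slides
    history = [[1.0, 0.0], [0.5, -0.5]]  # made-up feature vectors for t = 1, 2
    y_now = evaluate(w, history[-1])
    w = td_lambda_update(w, history, y_t=y_now, y_next=0.4)
    print(normalize(w))
```

Writing the two branches of the reward table as `2 * abs(board_eval)` with a sign chosen by the game result gives the same values as the four cases listed in the slides. How the final reward enters the last weight update is not spelled out in the slides, so the sketch leaves that step to the caller.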