IROS2019国际学术会议论文集 0812_第1页
IROS2019国际学术会议论文集 0812_第2页
IROS2019国际学术会议论文集 0812_第3页
IROS2019国际学术会议论文集 0812_第4页
IROS2019国际学术会议论文集 0812_第5页
免费预览已结束,剩余1页可下载查看

下载本文档

版权说明:本文档由用户提供并上传,收益归属内容提供方,若内容存在侵权,请进行举报或认领

文档简介

Path Planning for Surgery Robot with Bidirectional Continuous Tree Search and Neural Network Rui Jian Huang1 2 Gui Bin Bian1 Chen Xin3 Zhen Li1and Zeng Guang Hou1 Fellow IEEE Abstract Solving a thorny issue of real time path planning for surgery robot in uncertain environments a novel algorithm named bidirectional continuous tree search BCTS is proposed Most partially observable markov decision process POMDP planners address challenges of unknown environments with discrete states observations and actions which are fail to automate the operative procedure However the BCTS method addresses the issue by handling POMDPs in continuous state observation and action spaces The proposed approach has a bidirectional search structure with the intent of greatly improv ing the calculation effi ciency Meanwhile Bayesian optimization BO algorithm is considered to dynamically sample promising actions while we construct a belief tree In view of the speed of BO process the upper and lower bounds of the optimal action values given by fast informed bound FIB and point based value iteration PBVI limit the search scope so we can improve the speed of BO In addition we apply an optimal path planning generator radial basis function neural network RBFNN to obtain a smoother trajectory Finally simulation of glaucoma surgery has been carried out to explore the best surgical approach The results show that the introduced structure can effectively guide the surgery robot to perform surgical procedures and receive a real time as well as smooth path I INTRODUCTION Trajectory planning under complex environment is one of the most important technologies of robot in achieving autonomous mobile Several applications can be derived from the planning such as path planning for mobile vehicles 1 2 and robot grasping 3 But it is diffi cult that we determine an optimal path with a small amount of envi ronmental informations In the fi eld of medicine surgery robot often search for an optimal path quickly based on fragmentary and minimal informations from the external environment Therefore it is of great signifi cance for medical robot developments to effectively deal with such problem about path planning Bayesian optimization an online trajectory planning ap proach was proposed by 4 and 5 Unfortunately BO algorithm will obtain a suboptimal solution due to the myopic This research is supported by the National Key Research and Development Program of China Grant 2017YFB1302704 the National Natural Science Foundation of China Grant U1713220 and the Youth Innovation Promotion Association of the Chinese Academy of Sciences Grant 2018165 1The State Key Laboratory of Management and Control for Complex Systems Institute of Automation Chinese Academy of Sciences Beijing 100190 China Emails guibin bian 2School of Mathematics and Statistics Central South University Chang sha China 3Beijing Institute of Ophthalmology Beijing Tongren Eye Center Beijing Tongren Hospital Capital Medical University and Beijing Ophthalmology and Visual Science Key Lab Beijing China character and it is ineffi cient for BO to gain it As a matter of fact a framework with look ahead planning capabilities can solve this kind of problem which was described as a POMDP by 6 It has been proved that the framework performes better than its myopic equivalent 7 But the method just only addresses the planning issues with a set of discrete actions often leading to a suboptimal policy Therefore it is a challenge that surgery robots rapidly plan a optimal path with continuous action spaces A POMDP framework based on monte carlo tree search MCTS named BCTS is proposed to plan optimal path with continuous actions rapidly Firstly MCTS is an approximate method based on tree search algorithm 8 But it can only handle continuous states and observations as well as discrete actions Therefore we present an extension of this method which is named as BCTS for the bidirectional path planning on a continuous action space 9 Thanks to features of dynamic action sampling the BCTS is able to address continuous actions problem Secondly it is obvious that the searching speed of BO is slow And it is necessary for surgery robots to hurry up ensuring instantaneity Accordingly we present two offl ine approximation methods i e FIB and PBVI to limit the upper and lower bounds of the optimal value respectively which can reduce its searching scope and fast locate the value based on BO Based upon the above mentioned two points we can rapidly seek an optimal route with surgery robots A novel neural network radial basis function neural net work RBFNN is adopted for obtaining a smooth trajectory In the medical fi eld robots need to plan a really smooth path But the trajectories generated by most of the path planning algorithms are tortuous Thus we apply RBFNN to make the path smoother 10 With good ability in generalization and nonlinear function estimation the RBFNN framework can approximate any continuous function with arbitrary degree of accuracy Therefore we can get a smooth and immediate trajectory that is the optimal solution as we expect The simulation results indicated that the proposed frame work is highly effective The fi nal algorithms BCTS and RBFNN are demonstrated by surgery robotic control system The BCTS method is adopted in the simulation of glaucoma surgery Based on the special environment of eyeball the bidirectional characteristic optimizes convergence speed and reduces cost Lastly the experiment shows our approach is effective to achieve glaucoma surgery II RELATEDWORK 2019 IEEE RSJ International Conference on Intelligent Robots and Systems IROS Macau China November 4 8 2019 978 1 7281 4003 2 19 31 00 2019 IEEE3302 Fig 1 Structure of RBFNN The robots path planning under POMDPs has increasingly drawn more attention from international society and states in the past few decades For instance a offl ine planner the point based approach generally handles a problem by sam pling method but the speed is not satisfactory 11 In most complex system new information will be generated when surgery robots interact with the surrounding environment however offl ine approximation approach may not make full use of it Unlike online techniques the offl ine is computa tionally cheap but poor in real time performance Therefore it is one of the main methods for POMDPs to combine online tree search with offl ine approximation approach at present Most of online POMDP planners have been developed Some of them are proposed in discrete state observation and action spaces 12 14 while some are restricted to contin uous states and observations 8 15 16 Most of these techniques provide more compact representations based on sampling space Random sampling causes expected estimates with regard to states and observations But the behavior that generates the greatest return is not an expectation problem so the same sampling technique may be completely ineffective in solving similar problems In recent decades optimal path problem is an area of tremendous importance in POMDP and has been extensively studied by many researchers Several techniques were pro posed to plan path for POMDPs in continuous action spaces 17 22 But the biggest difference of them is that they have diverse assumptions In the literature 17 it limits the classes of predefi ned policy and greatly restricts policy outcomes a belief satisfying Gaussian distribution restricts the range of application in 18 a specifi c heuristic method is considered based on 19 it can lead to estimate the state inaccurately because of the hypothesis of 20 in the treatise 21 approximation of kernel density is used to approach continuous visit counting heuristic search algorithm usually converges to local minimum in continuous spaces by 22 Therefore this paper proposes a bidirectional continuous tree search method for surgery robots to solve the POMDPs in continuous state observation and action spaces In the application of surgery robots it is of critical im portance for planner to generate continuous and reasonable Fig 2 Model of POMDP Fig 3 Schematic of control strategy about POMDP trajectories Many classic frameworks use waypoints to plan path such as rapidly exploring random tree RRT 23 24 However they require specifi c knowledge on a wide range of issues and computation to accommodate robot kinematics In addition Bezier curves constrain curvature by using reference points so that the generated trajectory is more realistic 25 But it need to adjust many parameters which does not guaranteed to satisfy the physical constraints of robots In this kind of research way we adopt the RBFNN approach in order to construct realistic routes 26 28 The RBFNN framework is a typical forward neural network with three layers i e input layer hidden layer and output layer and it has fast convergence speed The network architecture diagram is shown in Fig 1 It is obvious that the neural network can approximate arbitrary continuous functions with any accuracy which is very effective to fi t the nonlinear function As a result we can obtain a smooth path for surgery robots Additionally our method shows that the trajectory planning for POMDPs with continuous action is more effective than the discrete planning previously III FRAMEWORK A Overiew We want to achieve path planning in a fully continuous and partially observable environment And we regard the situation as a POMDP which is provided with a solution named MCTS in Section III B MCTS simulates the trajectory sequence and constructs a belief tree effectively by extending the optimal branch iteratively At each node we expand MCTS to continuous action with continuous optimization method In Section III C the improvement of BO is proposed to seek 3303 the optimal value function In addition we present a planner about bidirectional continuous tree search BCTS to explore the optimal path in Section III D Finally RBFNN is used to generate a smooth path in Section III E B Monte Carlo Tree Search Partially observable markov decision process POMDP a well defi ned stochastic process can effectively solve the surgery robot path planning The model of POMDP is shown in Fig 2 A POMDP can be expressed as where S A represent fi nite set of states actions and observations respectively In addition T expresses the transition distribution O expresses the distri bution of observations R expresses the reward function and is an exponential discount factor As is shown in Fig 3 the agent maintains the state s S at step t then it transfers from the state s to s S after taking an action a A so we obtain a reward r R and an observation o at the moment The distribution T being provided with the markoy property represents the transition probability that the agent transfers from state s to state s when taking action a that is T s a s Pr s s a The function R expresses the reward when we execute action a in state s Hence the reward R just relies on the state s and action a namely r R s a IR Besides the observational distribution O is a probability of value o observed and the o can successfully transfers to state s by carrying out action a that is O o s a Pr o s a At last the symbol of B represents a belief set of the state Considering the partial observability the agent cannot acquire the real environment But we can provide a belief for the current state to indicate the probability of its occurrence So the policy B A represents that agent adopts an action a in the belief state b The ultimate goal for path planning is fi guring out the optimal policy which maximize the cumulative discounted rewards that is argmax E t 0 tr t b0 1 where b0is the initial value of belief B and r t indicates the reward after executing policy at time t Monte Carlo tree search is a method treated as a partial ran dom search tree It constructs a tree to deal with the POMDP problem effectively where the tree replaces the beliefs with nodes and represents the actions with branches Obviously the tree can be regarded as the possible operation sequences in step t Particularly each branch retains cumulative rewards and the number of accessing We apply the formula above to fi nd the branch with maximum cumulative rewards and replace infi nite sum with the fi nite It is of difference that MCTS does not search for all branches relative to full tree search MCTS simulates the sequence of random actions and calculates the empirical average so that we can obtain more realisic behavioral values Exploration of new branches is guided by upper con fi dence bound for trees UCT which is a popular acquisition function When maximizing UCT a branch aifrom v is selected The formula of UCT is represented as follows UCT v ai ri ni c lnt ni 2 where riand niare the cumulative rewards and the number of accessing of airespectively Besides c indicates the exploration parameter and t represents the number of access node v In the Eq 2 the fi rst term biases a branch with high cumulative rewards i e exploitation while the second encourages the solver to evaluate the undeveloped branches i e exploration C Improvement of Bayesian Optimization As an approximation method Bayesian optimization aims to obtain the optimal hyper parameters which can be used to solve non convex problems Hyper parameters are set before the learning process It is clear that optimizing them can improve the performance and effect of learning The basic contents of BO is that the posterior distribution of the objective function is estimated by Bayesian theorem based on the data then the superparametric combination of sampling is selected according to this distribution That is there is a funcion f IR and we need to fi nd the following formula in X x argmin x X f x 3 However classical BO is complex and it need to cost much time for seeking the optimum value Hence the improvement of BO is necessary so that the optimum value can be found quickly The improvement of BO is proposed based on offl ine approximation approach To start with the upper bound of optimum value can be fi xed by FIB 29 The FIB contains a fi xed set of vector A where each element A is relevant to an action so that it assigns a single vector to each action a A i e A A Therefore VFIB b max a A R b a o s S max AFIB s S O s o T s a s s b s 4 Compared with Eq 4 the bellman equation for vector A is favourable to FIB which can be written dowm as follows a FIB s R s a o max a FIB AFIB s S T s a s a FIB s 5 where the Eq 5 is a monotonic and contractive mapping and this vector is convergent based on value iteration Ac cordingly we can infer that V VFIB namely VFIBis an upper bound of V In the next place we can apply PBVI to ascertain the lower bound of this value In the real world it is unrealistic to consider all points in belief space Thus in order to reduce the computational diffi culty we can pay more attention to the 3304 a BCTS b optimal roadmap Fig 4 Bidirectional continuous tree search and its optimal roadmap nodes that are more likely to arrive at In the PBVI method a belief set BPBVIis given where b0 BPBVI Similarly each point b BPBVIis relevant to a vector so APBVI BPBVI Then the bellman equation for vector A can be put dowm as follows VPBVI b max a A R b a o max APBV I s S s S O s o T s a s s b s 6 where b BPBVI Because of its simplicity we introduce the belief set expansion algorithm 11 In each iteration of the algorithm a new belief point is generated based on all of the existing point in the set by forward sampling an action observation trajectory The fi nal set of the vector can be attained by using Eq 6 Unlike FIB the VPBVIcan not be convergent But the error VPBVI V is bounded 11 And as the density of belief set increases the boundary becomes tighter Hence we can infer that VPBVI V that is VPBVIis a lower bound of V In summary we can extend the approach of BO based on FIB and PBVI so that we can reduce the computational complexity of the model and search the best value quickly D Optimal Bidirectional Continuous Tree Search BCTS is a bidirectional and approximate tree search al gorithm based on UCT which can effi ciently plan trajectory for discrete POMDP It extends UCT to the planning with continuous operations so that we can not discretize the action space before path planning The traditional MCTS method partially gropes for a belief tree by selecting actions tendentiously in order to maximize the Eq 2 However the space is continuous so that we must apply a new method i e BCTS to explore the subset of action spaces It is essential to choose the subset of actions because the cumulative rewards of all subsequent branches in the belief tree can be directly affected by it Therefore the proposed method dynamically sample an action space relying on the improvement of BO We use the improvement of BO to bidirectionally select actions and apply it to deal with continuous action spaces with MCTS The proposed algorithm is named as bidirec tional continuous tree search BCTS Firstly the exploring branches are always UCT dependent i e the equation 2 Then the improvement of BO simulates the optimal route by choosing the actions legitimately Each node s stores reward Algorithm 1 Bidirectional Continuous Tree Search BCTS 1 Let Dvbe the data and be an data collection function 2 function a BCTS bs bg 1 2depthmax 3 vs NewNode bs 0 4 vg NewNode bg 0 5 for i 0 to Max BCTS iterations do 6 vls TreePolicy vs 7 vlg TreePolicy vg 8 rs Random actions from vlsuntil 1 2depthmax 9 rg Random actions from vlguntil 1 2depthmax 10 BackUp vls rs 11 BackUp vlg rg 12 end for 13 a s action from vs with r s 14 a g action from vg with r g 15 end function 16 function v TREEPOLICY v 17 while depth v 1 2depthmax do 18 if length Dv Amaxthen 19 for t 1 2 do 20 argmax Dv 21 Generate trajectory with RBFNN 22 r o simulate 23 Augment Dvwith r 24 end for 25 Collect r o and Dv 26 Update b with o and bvwith Dv 27 return v NewNode b r 28 else 29 v BestChild v 30 end if 31 end while 32 end function 33 function BACKUP v r 34 while v

温馨提示

  • 1. 本站所有资源如无特殊说明,都需要本地电脑安装OFFICE2007和PDF阅读器。图纸软件为CAD,CAXA,PROE,UG,SolidWorks等.压缩文件请下载最新的WinRAR软件解压。
  • 2. 本站的文档不包含任何第三方提供的附件图纸等,如果需要附件,请联系上传者。文件的所有权益归上传用户所有。
  • 3. 本站RAR压缩包中若带图纸,网页内容里面会有图纸预览,若没有图纸预览就没有图纸。
  • 4. 未经权益所有人同意不得将文件中的内容挪作商业或盈利用途。
  • 5. 人人文库网仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对用户上传分享的文档内容本身不做任何修改或编辑,并不能对任何下载内容负责。
  • 6. 下载文件中如有侵权或不适当内容,请与我们联系,我们立即纠正。
  • 7. 本站不保证下载资源的准确性、安全性和完整性, 同时也不承担用户因使用这些下载资源对自己和他人造成任何形式的伤害或损失。

最新文档

评论

0/150

提交评论