Adaptive Assist-as-Needed Control Based on Actor-Critic Reinforcement Learning

Yufeng Zhang, Student Member, IEEE, Shuai Li, Student Member, IEEE, Karen J. Nolan, and Damiano Zanotto, Member, IEEE

Abstract — In robot-assisted rehabilitation, assist-as-needed (AAN) controllers have been proposed to promote subjects' active participation, which is thought to lead to better training outcomes. Most of these AAN controllers require patient-specific manual tuning of the parameters defining the underlying force field, which typically results in a tedious and time-consuming process. In this paper, we propose a reinforcement-learning-based impedance controller that actively reshapes the stiffness of the force field according to the subject's performance, while providing assistance only when needed. This adaptability is made possible by correlating the subject's most recent performance to the ultimate control objective in real time. In addition, the proposed controller is built upon action-dependent heuristic dynamic programming with an actor-critic structure, and therefore does not require prior knowledge of the system model. The controller is experimentally validated with healthy subjects through a simulated ankle mobilization training session using a powered ankle-foot orthosis.

Index Terms — Assist-as-needed controller, robot-assisted training, reinforcement learning, wearable robotics, rehabilitation robotics.

*Corresponding author: Y. Zhang. Y. Zhang, S. Li, and D. Zanotto (dzanotto@stevens.edu) are with the Wearable Robotics Systems (WRS) Lab, Stevens Institute of Technology, Hoboken, NJ 07030, USA. K. J. Nolan is with Human Performance and Engineering Research, Kessler Foundation, West Orange, NJ 07052, USA, and Rutgers NJMS, Newark, NJ 07103, USA.

I. INTRODUCTION

In the past decade, the rapid advancement of robotics has led to a family of rehabilitation strategies known as patient-cooperative or assist-as-needed (AAN) controllers [1], [2], which incorporate conventional physical-therapy principles to modulate the robot's assistance. In the case of gait rehabilitation, early controllers forced patients to follow a predefined gait trajectory recorded from healthy individuals [3]. However, stiff position control may discourage patients' active participation and decrease the perceived movement error, both of which can adversely affect the training outcome [4], [5]. Several recent studies on robot-assisted gait training have shown improved motor outcomes by adopting compliant controllers [1] and AAN training paradigms [6]–[10] based on impedance control (IC), which aim at encouraging subjects' active participation by providing assistance only when they cannot complete the target motion on their own. The MIT-MANUS pioneered the use of IC in rehabilitation robotics [11]. The Anklebot used a programmable IC to treat foot drop [12]. Emken et al. proposed an AAN controller that reduces the assistive force to zero when the movement error is sufficiently small [2]. The AAN controller used in [8]–[10] generates corrective forces when the user's foot does not follow a desired trajectory within a certain tolerance margin set by a virtual tunnel.

Although adjustable, the stiffness of these conventional AAN controllers cannot be automatically modulated in real time based on the user's performance. Yet patients' ability and mobility vary substantially from one individual to another, and their adaptability to new training protocols may differ greatly depending on the type of impairment and the dosage of rehabilitation already received [13]. Therefore, manual adjustments are often required to appropriately tune conventional AAN controllers to a patient's level of performance, which may result in a tedious and time-consuming process.

Learning-based controllers are capable of automatically adapting to different individuals. In human-augmentation exoskeletons, recent research has focused on personalizing assistance parameters using machine-learning-based algorithms, including gradient descent [14], evolution strategies [15], and Bayesian optimization [16]. In robot-assisted rehabilitation, the adaptive AAN controller with a force decay term introduced in [7] can modify the level of robot assistance based on an individual's performance by learning a model of the patient's impairment using radial basis functions. The adaptive IC developed in [17] uses an inverse-dynamics model to estimate the active torque contributions from the human joints and seeks to encourage participation by adapting the stiffness of the IC. Pérez-Ibarra et al. introduced an optimal adaptive controller wherein a cost function consisting of robot assistive force and patient motion error is minimized in order to compute an appropriate stiffness that balances the two terms [18].

In recent years, reinforcement learning (RL) control, a subclass of learning-based control methods, has been applied to assistive exoskeletons [19]–[21] and prostheses [22]. The RL framework used in [19] learns the sensitivity factor of the system model in order to reduce physical human-robot interaction, whereas the adaptive IC in [20] seeks to minimize motion tracking errors and reduce human effort. Obayashi et al. proposed a novel AAN controller that uses an RL algorithm to adjust the stiffness of the IC in order to help subjects learn a dart-throwing task [23]. The goal-oriented behavior of an RL agent [24] makes it a convenient candidate for rehabilitation applications, where a patient's good performance and the effective control policy need to be remembered and rewarded. However, the feasibility of the RL framework in robot-assisted rehabilitation has not been investigated to date.

In this paper, we propose an RL-based AAN controller designed for rehabilitation purposes. It employs the actor-critic (AC) structure to evaluate subject performance and optimize the control decisions. The ultimate objective value used in the action network is continuously modulated to steer the control goal towards a direction that improves subject performance in tracking a target trajectory, while balancing the amount of assistance being provided. Additionally, the implementation of action-dependent heuristic dynamic programming (ADHDP) eliminates the need for a system model. To validate the feasibility of the proposed controller, a powered ankle-foot orthosis [25] was used to help a small group of healthy subjects overcome a virtual impairment induced by ankle weights through robot-assisted ankle mobilization exercises [26]. We address the rationale of the selected motor task, introduce the apparatus, and formalize the controller framework in Section II. Experimental results are presented in Section III and discussed in Section IV.
II. METHODS

A. Motor Task

Abnormal ankle stiffness or spasticity is often treated with physiotherapeutic interventions such as stretching [27], [28] and joint mobilization training [26], [29], both of which aim at increasing the ankle range of motion (RoM). The latter approach, also known as plantarflexion/dorsiflexion training, has been carried out both manually [26] and with a robot assisting the wearer through a sequence of sinusoidal ankle motions [29]. In this study, we adopt a similar strategy to train healthy subjects to overcome a virtual ankle impairment using a powered ankle-foot orthosis operating under an RL-based AAN controller. An ankle weight was added to the subject's dorsal foot to disturb normal ankle motor function. A reference sinusoidal trajectory, together with the measured ankle position, was displayed on a screen as visual feedback.

B. Apparatus

A cable-driven ankle-foot orthosis, dubbed the Stevens Ankle-Foot Electromechanical (SAFE) orthosis, was used to provide assistance to the subject's right ankle. The SAFE orthosis is built on a modified articulated orthosis. Two Bowden cables anchored to the posterior and anterior sides of the orthosis form an antagonistic actuation mechanism and provide active assistance in both ankle plantarflexion and dorsiflexion. Two BLDC motors (EC45, Maxon, Sachseln, Switzerland) placed on an off-board actuation platform actuate the individual cables. A digital encoder mounted on a 3D-printed holder atop the lateral malleolus monitors the joint angle. Two load cells (LSB200, Futek, Irvine, CA, USA) connected in line with the two Bowden cables measure the applied forces and close the inner force control loop. Data acquisition and both high- and low-level control are implemented on a myRIO board (National Instruments, Austin, TX). The low-level torque control loop runs at 400 Hz and the high-level AAN controller runs at 100 Hz. More details about the SAFE orthosis can be found in [25].

Fig. 1. Control scheme of the RL-AAN controller. Inset: modulation of the stiffness profiles as a function of d; arrows indicate increasing d values for a given τ_max.

C. Formalization of the Actor-Critic AAN Controller

Figure 1 shows the block diagram of the proposed RL-based AAN controller (RL-AAN). The AC structure takes in the evaluated system states and rewards and optimizes the current policy through iterative approximation of the Bellman equation. We define the IC law as

    \tau = \tau_{\max}\left[1 - \exp\!\left(-(d\,\varepsilon)^2\right)\right]    (1)

in which τ and τ_max indicate the output torque and the a priori bound of the IC, respectively [30], [31]. The stiffness curve of the IC is shaped by the parameter d, as shown in the inset of Fig. 1, and ε is the error between the reference trajectory θ_r and the measured trajectory θ_m. In conventional IC, the stiffness is often manually tuned by a human expert [32] or simply set to a constant value [30]. Conversely, the RL-AAN can modulate the stiffness profile of the force field to continuously engage subject participation while also seeking to minimize the error.
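To make (1) concrete, the snippet below is a minimal Python sketch of the impedance law, assuming angles in radians and that the torque acts in the direction of the tracking error; the function name and the default torque bound are illustrative choices, not the authors' implementation (which runs on a myRIO board).

```python
import numpy as np

def assistive_torque(theta_r, theta_m, d, tau_max=5.0):
    """Impedance law of Eq. (1): assistance grows with the tracking error.

    theta_r, theta_m : reference and measured ankle angles (rad)
    d                : stiffness-shaping parameter (larger d -> stiffer field)
    tau_max          : a priori bound on the output torque (Nm)
    """
    err = theta_r - theta_m                               # tracking error epsilon
    magnitude = tau_max * (1.0 - np.exp(-(d * err) ** 2))
    return np.sign(err) * magnitude                       # assist toward the reference
```

With this form of (1), a 0.1 rad error produces only about 0.05 Nm of assistance when d = 1, but close to the full 5 Nm bound when d approaches its upper range, which is how the controller can span negligible to strong assistance by adjusting d alone.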
In the following, the key elements of the proposed RL-AAN controller are summarized.

1) Markov decision process (MDP): A complete MDP consists of a tuple of states, actions, transition probabilities, and reward (S, U, P, R). An RL agent seeks to find the optimal policy that maps S to U in order to maximize the cumulative return built from the immediate reward R [33]. Here we consider a deterministic policy, i.e., P is assumed unitary [24]. After each periodic motion cycle (rollout), the evaluation module computes the tracking errors and outputs the system states as

    S(k) = [\,e_p(k),\; e_b(k),\; e_{rms}(k)\,]^T \in \mathbb{R}^{3\times 1}    (2)

where e_p(k) and e_b(k) represent the tracking errors computed at the positive and negative peaks of the target trajectory during the k-th cycle, respectively (similar to [22]), and e_rms(k) indicates the root-mean-square (RMS) error in cycle k. The instantaneous reward r(k) is built upon the weighted errors, with weight coefficients λ₁, λ₂, λ₃,

    r(k) = \lambda_1 e_p^2(k) + \lambda_2 e_b^2(k) + \lambda_3 e_{rms}^2(k)    (3)

and the infinite-horizon cost V(k), with discount rate γ, is defined as

    V(k) = \sum_{i=0}^{\infty} \gamma^i\, r(k+i+1)    (4)

Based on preliminary tests, λ₁ and λ₂ were set to 1.5 and 3.1. In order for the controller to swiftly match the varying control objective detailed in the following subsection, we set γ = 0.5, so that the significance of the cumulative return and of the immediate reward is balanced. The critic network learns to approximate V(k) online and backpropagates the error signals to the action network.
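A minimal sketch of the per-cycle evaluation in (2)–(3), assuming the reference and measured angles are sampled arrays covering one motion cycle; the function name is hypothetical and λ₃ = 1 is only a placeholder, since its value is not recoverable from the text.

```python
import numpy as np

def evaluate_cycle(theta_r, theta_m, lam=(1.5, 3.1, 1.0)):
    """Per-cycle evaluation of Eqs. (2)-(3).

    theta_r, theta_m : arrays sampled over one motion cycle
    lam              : (lambda_1, lambda_2, lambda_3); lambda_3 = 1.0 is a
                       placeholder, not a value reported in the paper
    """
    err = theta_r - theta_m
    i_top, i_bot = np.argmax(theta_r), np.argmin(theta_r)       # target peaks
    e_p, e_b = err[i_top], err[i_bot]                            # peak / bottom errors
    e_rms = np.sqrt(np.mean(err ** 2))                           # RMS error
    S = np.array([e_p, e_b, e_rms])                              # state vector, Eq. (2)
    r = lam[0] * e_p**2 + lam[1] * e_b**2 + lam[2] * e_rms**2    # weighted reward, Eq. (3)
    return S, r
```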
2) Actor-critic structure: An AC structure learns the action-value function and the policy at the same time through parameterization. The action value is added as an input node to the critic network, which outputs the estimated infinite-horizon cost Ṽ(k) [34]. The main goal of the AC algorithm is to search for the optimal action output by minimizing the temporal-difference error at each cycle k:

    \delta(k) = r(k) + \gamma\,\tilde{V}(k) - \tilde{V}(k-1)    (5)

3) Neural network structure: Both the critic and the actor use nonlinear feedforward neural networks with one hidden layer (4-6-1 and 3-6-1, respectively). Sigmoid activation functions, denoted as φ(·), are employed in all nodes of the hidden layers and in the output layer of the actor network.

Critic network: The cost Ṽ(k) predicted by the critic network is parameterized as

    \tilde{V}(k) = W_c^{(2)}(k)\, H_c(k)    (6a)
    H_c(k) = \varphi\!\left(W_c^{(1)}(k)\,[\,S^T(k),\; d_a(k)\,]^T\right)    (6b)

where W_c^{(1)} and W_c^{(2)} are the weight matrices of the input-hidden and hidden-output layers, respectively, and d_a represents the action output. The weight matrices of the critic network are updated following gradient descent at a learning rate η_c,

    W_c^{(j)}(k+1) = W_c^{(j)}(k) + \Delta W_c^{(j)}(k)    (7a)
    \Delta W_c^{(j)}(k) = -\eta_c\,\frac{\partial E_c(k)}{\partial W_c^{(j)}(k)}, \qquad j = 1, 2    (7b)

with the goal of minimizing the objective function

    E_c(k) = \tfrac{1}{2}\,\delta^2(k)    (8)

Actor network: The primary target of the actor network is to learn an optimal policy so as to reach the principle of optimality of the given problem. The RL-AAN algorithm monitors the subject's performance, as evaluated by r(k), and correlates it to the control objective U_c(k). Specifically, U_c is updated after every cycle based on the subject's performance over the last m + 1 cycles, according to the following law:

    U_c(k+1) = \begin{cases} U_c(k) + \rho, & \text{if } \sum_{i=k-m}^{k} r(i) \le \xi \\ U_c(k) - \rho, & \text{otherwise} \end{cases}    (9)

where ρ determines the rate of change of the assistance level and ξ sets the acceptable level of subject performance. We bound U_c between 0 and an upper limit, and set m = 5, ρ = 1, and ξ = 15 based on preliminary tests. Because the infinite-horizon cost V relates to the discounted sum of squared tracking errors, U_c can be thought of as the maximum acceptable combined error under which the controller will not step in to correct the subject's motion. To this end, the actor-network objective function, given by

    E_a(k) = \tfrac{1}{2}\left(\tilde{V}(k) - U_c(k)\right)^2    (10)

aims to guide Ṽ(k) towards the desired U_c(k) by altering the actor output d_a, which is computed as

    d_a(k) = \varphi\!\left(H_a^{(2)}(k)\right)    (11a)
    H_a^{(2)}(k) = W_a^{(2)}(k)\, H_a^{(1)}(k)    (11b)
    H_a^{(1)}(k) = \varphi\!\left(W_a^{(1)}(k)\, S(k)\right)    (11c)

The actor weights W_a^{(1)}, W_a^{(2)} are updated using the same gradient-based approach as in (7), with an additional influence from the critic weights W_c connected to d_a, due to the error-signal backpropagation from the critic network:

    \Delta W_a^{(2)}(k) = -\eta_a\,\frac{\partial E_a(k)}{\partial W_a^{(2)}(k)}    (12a)
    \frac{\partial E_a(k)}{\partial W_a^{(2)}(k)} = \frac{\partial E_a(k)}{\partial \tilde{V}(k)}\, \frac{\partial \tilde{V}(k)}{\partial d_a(k)}\, \frac{\partial d_a(k)}{\partial H_a^{(2)}(k)}\, \frac{\partial H_a^{(2)}(k)}{\partial W_a^{(2)}(k)}    (12b)

    \Delta W_a^{(1)}(k) = -\eta_a\,\frac{\partial E_a(k)}{\partial W_a^{(1)}(k)}    (13a)
    \frac{\partial E_a(k)}{\partial W_a^{(1)}(k)} = \frac{\partial E_a(k)}{\partial \tilde{V}(k)}\, \frac{\partial \tilde{V}(k)}{\partial d_a(k)}\, \frac{\partial d_a(k)}{\partial H_a^{(2)}(k)}\, \frac{\partial H_a^{(2)}(k)}{\partial H_a^{(1)}(k)}\, \frac{\partial H_a^{(1)}(k)}{\partial W_a^{(1)}(k)}    (13b)

Since the actor output d_a is inherently bounded by the sigmoid function to (0, 1), the range of the stiffness applied in the adaptive IC must be adjusted with a scale factor β [35], so that the stiffness parameter d in (1) is replaced by

    d = \beta\, d_a    (14)

We chose τ_max = 5 Nm and β = 20 to ensure that the maximum assistance is sufficiently strong while the minimum stiffness is negligible. The online update procedure of the RL-AAN controller is summarized in Algorithm 1.

Algorithm 1: RL-based Assist-As-Needed Control
 1: Initialize S(0), r(0), Ṽ(0), d_a(0), k
 2: Initialize W_c(0), W_a(0) with small random numbers
 3: while controller is on do
 4:   if the k-th motion cycle is completed then
 5:     Critic network update:
 6:       Evaluate S(k), r(k) using (2), (3)
 7:       Update W_c(k) using (5)–(8)
 8:     Actor network update:
 9:       Determine U_c(k) using (9)
10:       Update W_a(k) using (10), (12), (13)
11:       Compute d_a(k) from (11)
12:       Update d using (14)
13:       k ← k + 1
14:   end if
15:   Compute the assistive torque τ using (1)
16: end while
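The sketch below walks through one iteration of Algorithm 1 in Python, with single-hidden-layer critic (4-6-1) and actor (3-6-1) networks as described above. Learning rates, weight initialization, the initial value of U_c, and the exact sign conventions in the gradients and in (9) are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class RLAANController:
    """One-cycle ADHDP update loosely following Algorithm 1 and Eqs. (5)-(14)."""

    def __init__(self, eta_c=0.05, eta_a=0.05, gamma=0.5,
                 beta=20.0, rho=1.0, xi=15.0, m=5, uc0=10.0):
        rng = np.random.default_rng(0)
        self.Wc1 = rng.normal(0, 0.1, (6, 4))   # critic input->hidden (3 states + action)
        self.Wc2 = rng.normal(0, 0.1, (1, 6))   # critic hidden->output
        self.Wa1 = rng.normal(0, 0.1, (6, 3))   # actor input->hidden (3 states)
        self.Wa2 = rng.normal(0, 0.1, (1, 6))   # actor hidden->output
        self.eta_c, self.eta_a, self.gamma = eta_c, eta_a, gamma
        self.beta, self.rho, self.xi, self.m = beta, rho, xi, m
        self.Uc = uc0                            # illustrative initial objective
        self.V_prev, self.rewards = 0.0, []

    def _actor(self, S):
        h = sigmoid(self.Wa1 @ S)
        return float(sigmoid(self.Wa2 @ h)), h

    def _critic(self, S, da):
        h = sigmoid(self.Wc1 @ np.append(S, da))
        return float(self.Wc2 @ h), h

    def update(self, S, r):
        """Called once per motion cycle; returns the stiffness parameter d."""
        da, ha = self._actor(S)
        V, hc = self._critic(S, da)

        # Critic update: TD error of Eq. (5), objective of Eq. (8)
        td = r + self.gamma * V - self.V_prev
        x_c = np.append(S, da)
        g_out = td * self.gamma                               # dEc/dV
        g_hid = g_out * self.Wc2.ravel() * hc * (1 - hc)      # backprop to hidden layer
        self.Wc2 -= self.eta_c * g_out * hc[None, :]
        self.Wc1 -= self.eta_c * np.outer(g_hid, x_c)

        # Control-objective update, Eq. (9): relax Uc when recent errors are small
        self.rewards.append(r)
        if len(self.rewards) >= self.m + 1:
            recent = sum(self.rewards[-(self.m + 1):])
            self.Uc = max(self.Uc + (self.rho if recent <= self.xi else -self.rho), 0.0)

        # Actor update: drive the predicted cost toward Uc, Eqs. (10)-(13)
        e_a = V - self.Uc                                     # dEa/dV
        dV_dda = float(self.Wc2 @ (self.Wc1[:, -1] * hc * (1 - hc)))
        g = e_a * dV_dda * da * (1 - da)                      # through output sigmoid
        g_a = g * self.Wa2.ravel() * ha * (1 - ha)            # backprop to actor hidden
        self.Wa2 -= self.eta_a * g * ha[None, :]
        self.Wa1 -= self.eta_a * np.outer(g_a, S)

        self.V_prev = V
        return self.beta * self._actor(S)[0]                  # Eq. (14): d = beta * d_a
```

Calling update(S, r) after each completed motion cycle returns the stiffness parameter d to be plugged into (1) for the next cycle; all numeric choices other than γ = 0.5, β = 20, and m = 5 are placeholders.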
D. Experimental Design

Four healthy male subjects participated in a simulated joint mobilization exercise. Participants were selected such that their right foot and shank could comfortably fit in the SAFE orthosis [25], [36]. The study was approved by the Stevens Institutional Review Board and all participants provided informed consent prior to testing. The experimental setup is illustrated in Fig. 2. A wheelchair leg rest was attached to the right anterior leg of an office chair. The orthosis and the subject's leg were strapped to the padding of the leg rest in order to support the calf and immobilize the knee joint. The angle between the leg rest and the ground was fixed at approximately 45° for all subjects. A sandbag (2 lbs, or 0.91 kg) was added to the dorsal part of the subject's right foot to perturb normal ankle function. Subsequently, the encoder of the powered orthosis was zeroed with the subject's ankle in the neutral position. Each subject then went through a baseline trial, followed by five training trials and an immediate post-test. All trials lasted 2 minutes, with 1-minute breaks in between, similar to [26]. During each trial, subjects were asked to track with their ankle a reference sinusoidal trajectory displayed on a computer screen located in front of them, as shown in Fig. 2. The current ankle position was displayed on the same screen. The period of the sinusoidal trajectory was set to 1.5 s, the amplitude to 15°, and the offset was fixed at 5° of plantarflexion, in order to approximately cover the ankle RoM observed during normal walking [37].

During the baseline and post-test trials, the powered orthosis was controlled in transparent mode [38]. During the five training trials, instead, the orthosis was controlled either with a conventional AAN controller described by (1) with fixed stiffness, or with the proposed RL-AAN controller with adaptive stiffness as described by (14). The value of d applied to the conventional AAN controller was chosen as d = 1. The four subjects were randomly assigned to the conventional AAN controller (subjects C1 and C2) or to the RL-AAN controller (subjects R1 and R2). All subjects were instructed about the goal of the exercise, but they were blinded to the type of controller used during training.

E. Data Analysis

To compare the short-term training effects induced by the conventional AAN and by the adaptive RL-AAN controllers, we analyzed the last 60 s of the baseline and post-test sessions of each subject. Data were segmented into cycles, and three metrics were extracted for each cycle: peak error e_p, bottom error e_b, and RMS error e_rms. Separate Wilcoxon rank-sum tests were run for each metric and each subject to check for significant (α = 0.05) differences between the baseline and post-test metrics. All data analysis was performed in MATLAB (MathWorks, MA, USA).
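The statistical comparison was run in MATLAB; the snippet below shows an equivalent Wilcoxon rank-sum check in Python with SciPy, using randomly generated placeholder arrays in place of the recorded per-cycle metrics.

```python
import numpy as np
from scipy.stats import ranksums

# Dummy per-cycle RMS errors standing in for the last 60 s of a baseline and a
# post-test session (placeholder values, not data from the study).
rng = np.random.default_rng(1)
baseline_erms = rng.normal(3.0, 0.5, 40)
posttest_erms = rng.normal(2.5, 0.5, 40)

stat, p = ranksums(baseline_erms, posttest_erms)
print(f"Wilcoxon rank-sum: stat = {stat:.2f}, p = {p:.3f} "
      f"({'significant' if p < 0.05 else 'not significant'} at alpha = 0.05)")
```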
Fig. 2. Experimental setup of the joint mobilization training.

III. RESULTS

Figure 3(a) shows the value of d_a throughout a representative trial. The corresponding error metrics, as well as the value of U_c from the same trial, are illustrated in Fig. 3(b). The large errors at the beginning of the trial caused d_a to rapidly converge to a stiff value in order to help
