Uncertainty-Aware Imitation Learning using Kernelized Movement Primitives

João Silvério¹,², Yanlong Huang², Fares J. Abu-Dakka²,³, Leonel Rozo⁴ and Darwin G. Caldwell²

Abstract— During the past few years, probabilistic approaches to imitation learning have earned a relevant place in the robotics literature. One of their most prominent features is that, in addition to extracting a mean trajectory from task demonstrations, they provide a variance estimation. The intuitive meaning of this variance, however, changes across different techniques, indicating either variability or uncertainty. In this paper we leverage kernelized movement primitives (KMP) to provide a new perspective on imitation learning by predicting variability, correlations and uncertainty using a single model. This rich set of information is used in combination with the fusion of optimal controllers to learn robot actions from data, with two main advantages: i) robots become safe when uncertain about their actions and ii) they are able to leverage partial demonstrations, given as elementary sub-tasks, to optimally perform a higher level, more complex task. We showcase our approach in a painting task, where a human user and a KUKA robot collaborate to paint a wooden board. The task is divided into two sub-tasks and we show that the robot becomes compliant (hence safe) outside the training regions and executes the two sub-tasks with optimal gains otherwise.

I. INTRODUCTION

Probabilistic approaches to imitation learning [1] have witnessed a rise in popularity during the past few years. They are often seen as complementing deterministic techniques, such as dynamic movement primitives [2], with more complete descriptions of demonstration data, in particular in the form of covariance matrices that encode both the variability and correlations in the data. Widely used approaches at this level include Gaussian mixture models (GMM), popularized by the works of Calinon (e.g.
[3]) and, more recently, probabilistic movement primitives [4] and kernelized movement primitives (KMP) [5]. In recent work [6], [7], we discussed a fundamental difference between the type of variance encapsulated by the predictions of classical probabilistic techniques, particularly Gaussian mixture regression (GMR) and Gaussian process regression (GPR) [8]. We showed that the variance predicted by these two techniques has distinct, complementary interpretations: GMR predictions measure the variability in the training data, while those of GPR quantify the degree of uncertainty, increasing as one queries the model farther away from the region where it was trained. These properties are illustrated in Fig. 1. This finding led us to inquire: is there a probabilistic technique that can simultaneously predict both variability and uncertainty?

¹Idiap Research Institute, CH-1920 Martigny, Switzerland (e-mail: joao.silverio@idiap.ch).
²Department of Advanced Robotics, Istituto Italiano di Tecnologia, 16163 Genova, Italy (e-mail: name.surname@iit.it).
³Intelligent Robotics Group, EEA, Aalto University, FI-00076 Aalto, Finland (e-mail: fares.abu-dakka@aalto.fi).
⁴Bosch Center for Artificial Intelligence, 71272 Renningen, Germany (e-mail: leonel.rozo).
João Silvério is partially supported by the CoLLaboratE project (https://collaborate-project.eu/), funded by the EU within H2020-DT-FOF-02-2018 under grant agreement 820767. Fares J. Abu-Dakka is partially supported by CHIST-ERA project IPALM (Academy of Finland decision 326304).

Fig. 1: Gaussian mixture regression (GMR) and Gaussian process regression (GPR) provide complementary notions of variance (represented as green and red shaded areas), as variability and absence of training datapoints (depicted as black dots). With a unified technique, robots can learn controllers that are modulated by both types of information.
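The contrast sketched in Fig. 1 can be reproduced with a few lines of code. The snippet below (a minimal 1-D illustration with toy data and our own naming, not tied to the experiments in the paper) computes the standard GPR predictive variance and shows that it stays near zero inside the training region and grows toward $\sigma_f^2$ far from the data — the uncertainty notion — whereas variability would instead be estimated from the spread of the demonstrations themselves.

```python
import numpy as np

def sq_exp(a, b, sigma_f=1.0, l=1.0):
    # Squared-exponential kernel between two sets of 1-D inputs.
    d = a[:, None] - b[None, :]
    return sigma_f**2 * np.exp(-d**2 / l)

# Toy training inputs (where demonstrations exist) and targets.
X = np.array([0.0, 0.5, 1.0, 1.5, 2.0])
y = np.sin(X)
noise = 1e-4

K_inv = np.linalg.inv(sq_exp(X, X) + noise * np.eye(len(X)))

def gp_var(x_query):
    # Standard GPR predictive variance: k** - k* K^-1 k*^T.
    k_star = sq_exp(np.array([x_query]), X)
    k_ss = sq_exp(np.array([x_query]), np.array([x_query]))
    return float(k_ss - k_star @ K_inv @ k_star.T)

print(gp_var(1.0))   # inside the training region: close to 0
print(gp_var(10.0))  # far from the data: approaches sigma_f^2 = 1
```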
Are these two notions compatible and unifiable into a single imitation learning framework where they both provide clear advantages from a learning perspective? In this paper we try to answer these questions.

The two types of variance have been individually leveraged by different lines of work. For instance, variability and data correlations (encapsulated in full covariance matrices) have been used to modulate control gains in several works [3], [9], [10], [11]. Uncertainty, in the sense of absence of data/information, is also a concept with tradition in robotics. Problems in robot localization [12], control [13] and, more recently, Bayesian optimization [14], leverage uncertainty information to direct the robot to optimal performance. In [7] we took advantage of uncertainty to regulate robot stiffness, in order to make it compliant (and safer) when uncertain about its actions. However, to the best of our knowledge, variability and uncertainty have never been exploited simultaneously in imitation learning.

In this paper we introduce an approach that predicts variability, correlations and uncertainty from KMP and uses this information to design optimal controllers from demonstrations. These drive the robot with high precision when the variability in the data is low (while respecting the observed correlations across degrees of freedom) and render the robot compliant (and safer to interact with) when the uncertainty is high. The uncertainty is further leveraged by the robot to know when different controllers, responsible for the execution of separate, elementary sub-tasks, should be activated.
2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Macau, China, November 4-8, 2019. 978-1-7281-4003-2/19/$31.00 ©2019 IEEE

In particular, we:

1) demonstrate that KMP predicts full covariance matrices and uncertainty (Sections III and IV-A);
2) exploit a linear quadratic regulator (LQR) formulation that yields control gains which are a function of both covariance and uncertainty (Section IV-B);
3) dovetail 1), 2) with the concept of fusion of controllers [6], which allows for demonstrating one complex task as separate sub-tasks, whose activation depends on individual uncertainty levels (Section IV-C).

Experimentally, we expand on a previously published robot-assisted painting scenario and validate the approach using a KUKA LWR, where different types of controllers are used for individual sub-tasks (Section V). We provide a discussion of the approach and the obtained results in Section VI, and concluding remarks and possible extensions in Section VII.

II. RELATED WORK

Most probabilistic regression techniques provide variance predictions in some form. GMR, relying on a previously trained GMM, computes full covariance matrices encoding the correlation between output variables. However, it does not measure uncertainty, defaulting to the covariance of the closest Gaussian component when a query point is far from the model. GPR, despite estimating uncertainty, assumes constant noise, therefore not taking the variability of the outputs into account. Heteroscedastic Gaussian processes (HGP) [15], [16] introduce an input-dependent noise model into the regression problem. Nonetheless, tasks with multiple outputs require the training of separate HGP models, thus output correlations are not straightforward to learn in the standard formulation. In addition, the noise is treated as a latent function, hence each HGP depends upon the definition of two Gaussian processes (GP) per output, scaling poorly with the number of outputs. In [17], Choi et al.
propose to use mixture density networks (MDN) in an imitation learning context to predict both variability and uncertainty. The main drawback of the approach, similarly to HGP, is that outputs are assumed to be uncorrelated. Moreover, in [17] only the uncertainty is used in the proposed imitation learning framework, without considering variability. As opposed to the aforementioned works, we here show that KMP predicts both full covariance matrices and a diagonal uncertainty matrix, parameterized by its hyperparameters, allowing access to all the desired information. Table I details the differences between the variance predictions of the different algorithms, highlighting that KMP estimates all the features desired in our approach.

TABLE I: (Co)variance predictions of different techniques.

Technique         Variability   Uncertainty   Correlations
GMM/GMR [3]           ✓                            ✓
GPR [8]                             ✓
HGP [15], [16]        ✓             ✓
MDN [17]              ✓             ✓
Our approach          ✓             ✓              ✓

In terms of estimating optimal controllers from demonstrations, previous works have either exploited full covariance matrices encoding variability and correlations [3], [9], [10] or diagonal uncertainty matrices [7]. While the former are aimed at control efficiency, by having the robot apply higher control efforts where required (depending on variability), the latter target safety, with the robot becoming more compliant when uncertain about its actions. The LQR we propose in Section IV-B is identical to the one in [3], [7], [11]. However, by benefiting from the KMP predictions, it unifies the best of the two approaches. Umlauft et al. [18] propose a related formulation where, using Wishart processes, they build full covariance matrices with uncertainty. However, their solution requires a very high number of parameters, whose estimation relies heavily on optimization, and their control gains are set heuristically.

Finally, inspired by [19], in [6] we proposed a fusion of controllers to allow robots to smoothly switch between sub-tasks based on the uncertainty of each sub-task's controller.
Here we go one step further and consider optimal controllers learned from demonstrations in the fusion, instead of manually defining the control gains. In previous work [11], we studied the fusion of optimal controllers. However, in that case we focused on time-driven trajectories, whereas here we consider multi-dimensional inputs and uncertainty. The approach described in the next sections therefore aims at a seamless unification of concepts exploited in previous work, taking imitation learning one step ahead into the learning of optimal controllers for potentially complex tasks.

III. KERNELIZED MOVEMENT PRIMITIVES

We consider datasets comprised of H demonstrations of length T, $\{\{\xi_I^{t,h}, \xi_O^{t,h}\}_{t=1}^{T}\}_{h=1}^{H}$, where $\xi_I \in \mathbb{R}^{D_I}$ and $\xi_O \in \mathbb{R}^{D_O}$ denote inputs and outputs (I, O are initials for input and output), respectively, and $D_I, D_O$ are their dimensions. $\xi_I$ can represent any variable of interest to drive the movement synthesis (e.g., time, object/human poses) and $\xi_O$ encodes the desired state of the robot (e.g., an end-effector position, a joint space configuration). KMP assumes access to an N-dimensional probabilistic trajectory distribution $\{\xi_I^n, \mu_n, \Sigma_n\}_{n=1}^{N}$ mapping a sequence of inputs to their corresponding means and covariances, which encompass the important features in the demonstration data. This probabilistic reference trajectory can be obtained in various ways, for example by computing means and covariances empirically at different points in a dataset or by using unsupervised clustering techniques. Here we follow the latter direction, in particular by using a GMM to cluster the data and GMR to obtain the trajectory distribution that initializes KMP (done once after data collection).

By concatenating the trajectory distribution into $\mu = [\mu_1^\top \cdots \mu_N^\top]^\top$ and $\Sigma = \mathrm{blockdiag}(\Sigma_1, \ldots, \Sigma_N)$, KMP predicts a new Gaussian distribution at a new test point $\xi_I^*$ according to [5]:

$$\mu_O^* = k^* (K + \lambda_1 \Sigma)^{-1} \mu, \qquad (1)$$

$$\Sigma_O^* = \frac{N}{\lambda_2} \left( k^{**} - k^* (K + \lambda_2 \Sigma)^{-1} (k^*)^\top \right), \qquad (2)$$

where

$$K = \begin{bmatrix} k(\xi_I^1, \xi_I^1) & \cdots & k(\xi_I^1, \xi_I^N) \\ \vdots & \ddots & \vdots \\ k(\xi_I^N, \xi_I^1) & \cdots & k(\xi_I^N, \xi_I^N) \end{bmatrix} \qquad (3)$$

is a matrix evaluating a chosen kernel function $k(\cdot,\cdot)$ at the training inputs, $k^* = \left[ k(\xi_I^*, \xi_I^1) \cdots k(\xi_I^*, \xi_I^N) \right]$ and $k^{**} = k(\xi_I^*, \xi_I^*)$. Moreover, $k(\xi_I^i, \xi_I^j) = k(\xi_I^i, \xi_I^j) I_{D_O}$. Hyperparameters $\lambda_1, \lambda_2$ are regularization terms chosen so as to constrain the magnitude of the predicted mean and covariance, respectively. The kernel treatment implicit in (1)-(2) assumes the previous choice of a kernel function that depends on the characteristics of the training data. We here consider the squared-exponential kernel

$$k(\xi_I^i, \xi_I^j) = \sigma_f^2 \exp\left( -\frac{1}{l} \left\| \xi_I^i - \xi_I^j \right\|^2 \right), \qquad (4)$$

a common choice in the literature. We hence have that KMP with kernel (4) requires the definition of four hyperparameters $\lambda_1, \lambda_2, l, \sigma_f^2$. Note the similarity between predictions (1)-(2) and other kernel-based techniques (e.g. GPR, HGP). The main difference is that in KMP the noise model is learned through $\Sigma$, which describes both the variability and correlations present in the data throughout the trajectory. This makes KMP a richer representation when compared to GPR or HGP, which assume either constant noise $\Sigma_i = \sigma_\epsilon^2 I_{D_O},\ i = 1, \ldots, N$ (GPR) or input-dependent uncorrelated noise $\Sigma_i = \sigma_\epsilon^2(\xi_I^i) I_{D_O}$ (HGP).

IV. UNCERTAINTY-AWARE IMITATION LEARNING WITH KMP

We now demonstrate that KMP provides an estimation of uncertainty through (2), by defaulting to a diagonal matrix completely specified by its hyperparameters in the absence of datapoints (Section IV-A). In addition, we propose a control framework to convert the predictions into optimal robot actions (Section IV-B) and the fusion of optimal controllers (Section IV-C).

A. Uncertainty predictions with KMP

In light of the kernel treatment (2) and the exponential kernel (4), both covariance and uncertainty predictions emerge naturally in the KMP formulation. While the former occur within the training region, the latter arise when querying the model away from the original data.

Lemma 1: The squared-exponential kernel (4) goes to zero as $\|\xi_I^n - \xi_I^*\| \to +\infty,\ \forall n = 1, \ldots, N$.
Proof: Let us consider $d = \|\xi_I^{n^*} - \xi_I^*\|$, where $n^* = \arg\min_n \|\xi_I^n - \xi_I^*\|$ is the index of the training point with the minimum distance to the test point $\xi_I^*$. Then

$$\lim_{d \to +\infty} k(\xi_I^{n^*}, \xi_I^*) = \lim_{d \to +\infty} \sigma_f^2 \exp\left( -\frac{1}{l} d^2 \right) = 0. \qquad (5)$$

Lemma 1 extends to other popular exponential kernels, including the Matérn kernel [8].

Theorem 1: Covariance predictions (2) converge to a diagonal matrix completely specified by the KMP hyperparameters as test inputs $\xi_I^*$ move away from the training dataset, i.e. $d \to +\infty$. Particularly,

$$\lim_{d \to +\infty} \Sigma_O^* = \frac{\sigma_f^2 N}{\lambda_2} I_{D_O}. \qquad (6)$$

Proof: Following from Lemma 1 and knowing that $k^* = \left[ k(\xi_I^*, \xi_I^1) \cdots k(\xi_I^*, \xi_I^N) \right]$, we have $\lim_{d \to +\infty} k^* = 0_{D_O \times N D_O}$. Hence

$$\lim_{d \to +\infty} \Sigma_O^* = \lim_{d \to +\infty} \frac{N}{\lambda_2} k^{**}. \qquad (7)$$

Moreover, we have $k^{**} = k(\xi_I^*, \xi_I^*) = \sigma_f^2 \exp\left( -\frac{1}{l} \cdot 0 \right) = \sigma_f^2 I_{D_O}$, which replaced in (7) yields (6).

Equation (6) plays a crucial role in our approach. It provides a principled way to know when the model is being queried in regions where data was not present during training. We leverage this information to 1) make the robot compliant when unsure about its actions and 2) let the robot know when to execute control actions pertaining to different KMPs. Moreover, through the dependence on $\sigma_f^2$, N and $\lambda_2$, one can adjust the expression of uncertainty provided by the model, through the tuning of any of those hyperparameters. For instance, increasing the length of the initialized trajectory distribution N has the effect of scaling the uncertainty. GPR offers a similar property, where the variance prediction converges to the scalar $\sigma_f^2$. However, this is rather limiting as tuning this hyperparameter can have undesired effects on the mean prediction. In KMP, N and $\lambda_2$ do not affect the mean prediction as they do not parameterize the kernel function. Moreover, (2) is typically robust to their choice, providing freedom for tuning while yielding proper predictions (see [5] for details).

B. Computing optimal controllers from KMP

We now propose to use $\Sigma_O^*$ to obtain variable control gains that result in a compliant robot both when the variability and the uncertainty are high.¹
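Before $\Sigma_O^*$ is turned into control gains, predictions (1)-(2) and the limit (6) can be checked numerically. The sketch below (a toy illustration with our own variable names and values, not the authors' implementation) builds a small reference trajectory distribution, queries KMP inside the training region, and confirms that a faraway query returns the diagonal matrix of Theorem 1.

```python
import numpy as np

sigma_f, l, lam1, lam2 = 1.0, 0.5, 0.1, 1.0
D_O = 2                                    # output dimension

def k_block(a, b):
    # Matrix-valued squared-exponential kernel (4): k(a, b) * I_{D_O}.
    return sigma_f**2 * np.exp(-(a - b)**2 / l) * np.eye(D_O)

# Toy reference trajectory distribution {xi_I^n, mu_n, Sigma_n}
# (in the paper this is obtained via GMM/GMR).
xi_I = np.linspace(0.0, 1.0, 5)
N = len(xi_I)
mu = np.concatenate([[np.sin(x), np.cos(x)] for x in xi_I])
Sigma = 0.05 * np.eye(N * D_O)             # blockdiag of per-point covariances

K = np.block([[k_block(a, b) for b in xi_I] for a in xi_I])

def kmp_predict(x_star):
    k_star = np.hstack([k_block(x_star, b) for b in xi_I])
    k_ss = k_block(x_star, x_star)
    mu_o = k_star @ np.linalg.solve(K + lam1 * Sigma, mu)   # eq. (1)
    Sig_o = (N / lam2) * (
        k_ss - k_star @ np.linalg.solve(K + lam2 * Sigma, k_star.T))  # eq. (2)
    return mu_o, Sig_o

mu_in, Sig_in = kmp_predict(0.5)    # inside the training region
_, Sig_far = kmp_predict(100.0)     # far away: k* vanishes
print(Sig_far)                      # ~ (sigma_f^2 * N / lam2) I = 5 * I, per (6)
```

With these hyperparameters the faraway covariance is $5\,I$, and tuning $N$ or $\lambda_2$ rescales it without touching the mean prediction, as discussed above.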
We follow the concept introduced in [9] and formulate the problem as an LQR. Let us consider linear systems $\dot{\xi}_t = A \xi_t + B u_t$, where $\xi_t, \dot{\xi}_t \in \mathbb{R}^{N_S}$ denote the system state at time t and its first-order derivative ($N_S$ is the dimension of the state) and $u_t \in \mathbb{R}^{N_C}$ is a control command, where $N_C$ denotes the number of controlled degrees of freedom. Moreover, $A \in \mathbb{R}^{N_S \times N_S}$ and $B \in \mathbb{R}^{N_S \times N_C}$ represent the state and input matrices. We will stick to task-space control and hence make a simplifying assumption, in line with [3], that the end-effector can be modeled as a unitary mass, yielding the double integrator system

$$A = \begin{bmatrix} 0 & I \\ 0 & 0 \end{bmatrix}, \quad B = \begin{bmatrix} 0 \\ I \end{bmatrix}, \qquad (8)$$

where 0 and I are zero and identity matrices of appropriate dimension. We define the end-effector state at t as its Cartesian position and velocity $x_t, \dot{x}_t$, i.e. $\xi_t = [x_t^\top\ \dot{x}_t^\top]^\top$, and therefore $u_t$ corresponds to acceleration commands. At every time step t of a task, a KMP is queried with an input test point $\xi_I^t$, predicting a mean $\mu_O^t$ and a covariance matrix $\Sigma_O^t$. We define $\hat{\xi}_t = \mu_O^t$, i.e. the desired state for the end-effector is given by the mean prediction of KMP.

¹In the context of movement synthesis, new inputs occur at every new time step, thus we will replace * by t from now on in the notation.

Algorithm 1 Uncertainty-aware imitation learning
Initialization
 1: Identify number of sub-tasks P
 2: Collect demonstrations $\{\{\{\xi_I^{t,h,p}, \xi_O^{t,h,p}\}_{t=1}^{T}\}_{h=1}^{H}\}_{p=1}^{P}$
 3: Generate trajectory distributions $\{\{\xi_I^{n,p}, \mu_{n,p}, \Sigma_{n,p}\}_{n=1}^{N}\}_{p=1}^{P}$
 4: Select hyperparameters $\{\sigma_{f,p}^2, l_p, \lambda_{1,p}, \lambda_{2,p}\}_{p=1}^{P}$ and $R_p$
Movement synthesis
 1: Input: test point $\xi_I^t$
 2: for $p = 1, \ldots, P$ do
 3:   Compute $\mu_O^{t,p}, \Sigma_O^{t,p}$ per (1), (2)
 4:   Set $\hat{\xi}_t^p = \mu_O^{t,p}$ and $Q_t^p = (\Sigma_O^{t,p})^{-1}$
 5:   Find optimal gains $K_{t,p}^{P}, K_{t,p}^{V}$ and compute $u_t^p$ per (13)
 6:   Set $\Lambda_t^p = (\Sigma_O^{t,p})^{-1}$
 7: end for
 8: Compute $u_t$ from (12)
 9: Output: control command $u_t$
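Equation (12), which fuses the P sub-task commands in line 8 of Algorithm 1, lies outside this excerpt; in [6] the fusion follows a product-of-Gaussians rule, i.e. a precision-weighted combination of the individual controllers, with the precisions $\Lambda_t^p = (\Sigma_O^{t,p})^{-1}$ set in line 6. A sketch under that assumption (function and variable names are ours):

```python
import numpy as np

def fuse_controllers(commands, precisions):
    """Precision-weighted fusion of control commands.

    commands:   list of P command vectors u_t^p
    precisions: list of P precision matrices Lambda_t^p = (Sigma_O^{t,p})^-1;
                an uncertain sub-task (large covariance) gets a small weight.
    """
    lam_sum = sum(precisions)
    weighted = sum(L @ u for L, u in zip(precisions, commands))
    return np.linalg.solve(lam_sum, weighted)

# Sub-task 1 is confident (inside its training region), sub-task 2 is
# uncertain: the fused command stays close to u1.
u1, u2 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
L1, L2 = 100.0 * np.eye(2), 0.1 * np.eye(2)
u = fuse_controllers([u1, u2], [L1, L2])
print(u)   # ~ [0.999, 0.001]
```

This matches the behavior described in Section IV-A: as a sub-task's covariance inflates toward the limit (6), its contribution to the fused command fades out smoothly.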
For time-driven tasks, where $\xi_I^t = t$, a sequence of reference states $\{\hat{\xi}_t\}_{t=1,\ldots,T}$ can be easily computed and an optimal control command $u_t$ can be found by minimizing

$$c = \sum_{t=1}^{T} (\hat{\xi}_t - \xi_t)^\top Q_t (\hat{\xi}_t - \xi_t) + u_t^\top R_t u_t, \qquad (9)$$

where $Q_t$ is an $N_S \times N_S$ positive semi-definite matrix that determines how much the optimization penalizes deviations from $\hat{\xi}_t$ and $R_t$ is an $N_C \times N_C$ positive-definite matrix that penalizes the magnitude of the control commands.
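Cost (9) can be minimized with the standard backward Riccati recursion. The sketch below (our own toy example: a discrete-time Euler discretization of the double integrator (8) with a 1-D position, not the authors' formulation) shows the key property exploited in the paper: setting $Q_t = (\Sigma_O^t)^{-1}$ yields stiff gains where the demonstrations have low variability and compliant gains where uncertainty inflates $\Sigma_O^t$.

```python
import numpy as np

dt, T = 0.01, 200
# Euler discretization of the double integrator (8), 1-D for brevity:
# xi_{t+1} = A xi_t + B u_t, with state [position, velocity].
A = np.array([[1.0, dt], [0.0, 1.0]])
B = np.array([[0.0], [dt]])
R = 1e-2 * np.eye(1)

def first_gain(Qs):
    # Backward Riccati recursion for the finite-horizon cost (9);
    # returns the feedback gain at t = 0.
    P = Qs[-1]                        # terminal cost-to-go
    for t in reversed(range(T - 1)):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Qs[t] + A.T @ P @ (A - B @ K)
    return K

# Q_t = Sigma_t^{-1}: low variability -> large Q -> stiff tracking;
# high uncertainty -> small Q -> compliant behavior.
K_stiff = first_gain([np.linalg.inv(1e-3 * np.eye(2))] * T)
K_soft = first_gain([np.linalg.inv(1e+1 * np.eye(2))] * T)
print(np.linalg.norm(K_stiff) > np.linalg.norm(K_soft))  # True
```

The resulting gains feed line 5 of Algorithm 1, one LQR per sub-task, before the fusion step.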