IROS2019国际学术会议论文集0627

上传人：我*** IP属地：北京上传时间：2020-06-04 格式：PDF 页数：8 大小：564.16KB 积分：12 举报 版权申诉

已阅读5页，还剩3页未读，继续免费阅读

版权说明：本文档由用户提供并上传，收益归属内容提供方，若内容存在侵权，请进行举报或认领

文档简介

Data Association Aware Semantic Mapping and Localization via a Viewpoint-Dependent Classifi er Model Vladimir Tchuiev, Yuri Feldman and Vadim Indelman AbstractWe present an approach for localization and semantic mapping in ambiguous scenarios by incrementally maintaining a hybrid belief over continuous states and discrete classifi cation and data association variables. Unlike existing incremental approaches, we explicitly maintain data association components over time, which allows us to deal with perceptual aliasing. Crucially, we utilize a viewpoint-dependent classifi er model over rich classifi er outputs and leverage the coupling be- tween poses and semantic measurements both for disambiguat- ing data association and in pose estimation. We demonstrate in simulation that incorporating semantic measurements with a viewpoint-dependent classifi er model enhances disambiguation of both data association and localization over usage of only geometric measurements or viewpoint independent models, further contributing to the tractability of the approach in practice, and providing better estimates. I. INTRODUCTION Localization and mapping in unknown and uncertain envi- ronments is a fundamental capability in robotics, with numer- ous applications, including search and rescue, autonomous car navigation, indoor navigation, and surveillance. The cor- responding problem is known as simultaneous localization and mapping (SLAM) and has been extensively investigated in the last two decades, e.g. see a recent review 1. Seman- tic perception and object-based SLAM have been actively investigated by the research community. In particular, object- based SLAM reasons about much fewer landmarks with richer information than geometric SLAM, allowing for faster computation and assistance in data association of features and objects between images. One of the key challenges in SLAM is a reliable and robust operation in perceptually aliased environments. Data association (DA) is particularly diffi cult in these scenarios, as the measurement information can be interpreted in mul- tiple ways, and DA errors may lead to critically incorrect estimations. Existing approaches maintain data association hypotheses, which is a computationally diffi cult problem on its own. Semantic information can assist in disambiguation of DA hypotheses. However, existing approaches that uti- lize semantic observations for DA disambiguation typically consider only most likely class measurements (e.g. 2). Yet, V.TchuievandV.IndelmanarewiththeDepartmentof AerospaceEngineering,Technion-IsraelInstituteofTechnology, Haifa 32000, Israel. Y. Feldman is with the Department of Com- puterScience,Technion-IsraelInstituteofTechnology,Haifa 32000, Israel. vovatch, vadim.indelmantechnion.ac.il, yurifcs.technion.ac.il. This work was partially supported by the Israel Ministry of Science this approach performs batch inference using expectation maxi- mization (EM). Bowman et al. 19 utilizes most likely class and bounding box measurements, in addition to geometric measurements, to perform SLAM and DA disambiguation using EM as well. The above approaches consider only the most likely class and do not reason about viewpoint depen- dency of classifi cation results. In contrast, we utilize richer classifi er output in conjunction with a viewpoint dependent model to perform object level SLAM, while maintaining classifi cation and DA hypotheses. III. NOTATIONS ANDPROBLEMFORMULATION Consider a robot operating in a partially known environ- ment containing different, possibly perceptually similar or identical, objects. The robot aims to localize itself, and map the environment geometrically and semantically while rea- soning about ambiguous data association (DA). We consider a closed-set setting where each object is assumed to be one of M classes. Moreover, in this work we consider the number of objects in the environment is known. The objects are assumed to be stationary. Let xkdenote the robots camera pose at time k; let xo n and cnrepresent the n-th object pose and class, respec- tively. We denote the set of all object poses and classes by Xo . = xo 1,.,xoN and C . = c1,.,cn. To shorten nota- tions, denote Xk . = x0:k,Xo. Further, we denote the data association realization at time k as k: given nkobject observations at time k, k Rnk; each element in kcorresponds to an object observation, and is equal to an objects identity label. For example, if at time k the camera observes 2 objects with hypothesized identity labels 4 in observation 1 and 6 in observation 2, then k= k,1,k,2T R2, and k,1= 4 and k,2= 6. Denote Zk . = zk,1,.,zk,nk as the set of nkmeasurements at time k, and akas the robots action at time k. Each measurement zk,i Zkconsists of two parts: a geometric part zgeo k,i , e.g. range or bearing measurements to an object, and a semantic part zsem k,i . The set of all geometric measurements for time k is denoted Zgeo k , and similarly for semantic measurements Zsem k , such that Zk= Zgeo k Zsem k . We assume the geometric and semantic measurements are independent from each other. In addition, we assume independence between measurements at different time steps. Weconsiderstandardmotionandgeomet- ricobservationGaussianmodels,suchthat P(xk+1|xk,ak) = N(f(xk,ak),w) and P(zgeo k |xk,xo) = N(hgeo(xk,xo),geo v ).Theprocessandgeometric measurementcovariancematrices,wandgeo v ,as well as the functions f(.),hgeo(.) are assumed to be known. For the semantic measurements, we utilize a (deep learn- ing) classifi er that provides a vector of class probabilities where zsem k,i . = P(ci|Ik,i) given sensor raw observation Ik,i, e.g. an image cropped from a bounding box of a larger image taken by the camera of object i at time k. To simplify notations we drop index i, as the measurements, both semantic and geometric, apply to each bounding box. Thus, zsem k RMwith zsem k . = P(c = 1|Ik)P(c = M|Ik)T.(1) A crucial observation, following 15, is that zsem k is de- pendent on the cameras pose relative to the object (see Fig. 1). In this work we contribute an approach that leverages this coupling to assist in inference and data association disambiguation. 7737 01020304050 X coordinate m 0 10 20 30 40 50 Y coordinate m 123 0.0 0.5 1.0 Class Probability 123 0.0 0.5 1.0 Class Probability 123 0.0 0.5 1.0 Class Probability 123 0.0 0.5 1.0 Class Probability Fig. 1: A classifi er observing an object from multiple viewpoints will produce different classifi cation scores for each viewpoint. Specifi cally, we model this dependency via a classifi er model P(zsem k |c = m,xk,xo ). The classifi er model repre- sents the distribution over classifi er output, i.e. class proba- bility vector zsem k , when an object with a class hypothesis m is observed from relative pose xo? xk. Note that for M classes we require M classifi er models, one for each class. The model can be represented with a Gaussian Process (see 14, 15) or a deep neural network (see 16). In this work, we use a Gaussian classifi er model, given by P(zsem k | c,xk,xo) = N(hc(xk,xo),c(xk,xo),(2) where the viewpoint-dependent functions hc(xk,xo) and c(xk,xo ) are learned offl ine. Note that unlike 14, 15 we do not model correlations in classifi er scores among viewpoints. Conversely, we do not assume data association is known. We assume a prior on initial camera and object poses, x0and Xorespectively, and class realization probability P(C). For simplicity, we assume independent variable priors (although this assumption is not required by our approach, and is not true in general, as e.g. some objects are more likely to appear together than others), thus we can write the prior as follows: P(x0,Xo,C) = P(x0) N Y i=1 P(xo i)P(ci). (3) In this paper we use a Gaussian prior for the continuous variables, and uninformative (uniform) prior for the object classes. Problem formulation: We aim to effi ciently maintain the following hybrid belief P(Xk,C,1:k| Hk),(4) with history Hk . = Z1:k,a0:k1. The belief (4) is both over continuous variables, i.e. robot and object poses Xk (continuous variables), and over discrete variables, i.e. object classes C and data association hypotheses thus far, 1:k. In the following, we incorporate a viewpoint-dependent clas- sifi er model and develop a recursive formulation to update that hybrid belief with incoming information captured by the robot as it moves in the environment. IV. APPROACH In this section we develop a recursive scheme to compute and maintain the hybrid belief from Eq. (4). We start by factorizing using the chain rule as P(Xk,C,1:k|Hk)=P(Xk|C,1:k,Hk) |z bXkC 1:k P(C,1:k|Hk) |z wC 1:k , (5) wherebXkC 1:k . = P(Xk| C,1:k,Hk)isthe conditionalbeliefoverthecontinuousvariables,and wC 1:k . = P(C,1:k| Hk) is the marginal belief over the discrete variables, and can be considered as the conditional beliefweight.Thus,eachrealizationofthediscrete variables, i.e. data association and class hypotheses, has its own probability (weight) and gives rise to a different belief over the continuous variables. Moreover, the factorization (5) facilitates computation of marginal distributions that are of interest in practice. In particular, the posterior over robot and object poses can be calculated via P(Xk| Hk) = X 1:k X C wC 1:kbXk C 1:k, (6) while the marginal distributions over object classes and data association hypotheses are given by P(C | Hk)= X 1:k wC 1:k, (7) P(1:k| Hk)= X C wC 1:k. (8) The posterior P(Xk| Hk) in Eq. (6) is a mixture belief that accounts for all hypotheses regarding data association and classifi cation. Without semantic observations, our approach degenerates to passive DA-BSP. The term P(C | Hk) is the distribution over classes of all objects while accounting for both localization uncertainty and ambiguous data associa- tion. As such, it is important for robust semantic percep- tion. Finally, the posterior over data association hypotheses, P(1:k| Hk) accounts for all class realizations for all objects. Next, we derive a recursive formulation for calculating the continuous and marginal distributions in the factorization (5). As will be seen, semantic observations along with the viewpoint-dependent classifi er model (2) impact both of the terms in the factorization (5), and as a result assist in inference of robot and objects poses (via Eq. (6) and helps in disambiguation between data association realizations (via Eq. (8). Furthermore, as discussed in Sec. 4.D, while the number of objects classes and data association hypotheses (number of weights wC 1:k) is intractable, in practice many of these are negligible and can be pruned. A. Conditional Belief Over Continuous Variables: bXkC 1:k Using Bayes law we get the following expression: bXkC 1:k P(Xk| C,1:k,Hk) P(Zk| Xk,C,k) P(Xk| C,1:k1,H k), (9) 7738 where H k . = Z1:k1,a0:k1, the normalization constant is omitted as it does not depend on Xk, and kis dropped in the second term because it refers to association of Zkwhich is not present. The expression P(Zk| Xk,C,k) in Eq. (9) is the joint measurement likelihood for all geometric and semantic ob- servations obtained at time k. Given classifi cations, asso- ciations and robot pose at time k, history H k and past associations 1:k1can be omitted. The joint measurement likelihood can be explicitly written as P(Zk| Xk,C,k) = nk Y i=1 P(zgeo k,i | xk,xo k,i) P(z sem k,i | xk,xo k,i,ck,i), (10) where xo k,i and ck,iare the object pose and class corre- sponding to the measurement respectively, given DA real- ization kand nkthe number of measurements obtained at time k as before. Note that the viewpoint-dependent semantic measurement term above P(zsem k,i | xk,xo k,i,ck,i) couples between semantic measurement and robot pose relative to object, making it useful for inference of both. The term bXkC 1:k1 . =P(Xk| C,1:k1,H k) in Eq. (9) is the propagated belief over continuous variables, which, using chain rule, can be written as bXkC 1:k1 = P(xk|xk1,ak1)bXk1C 1:k1. (11) Overall, the conditional belief (9) can be represented as a factor graph (Kschischang et al. 20). Note that each realization of 1:khas a different factor graph topology (observation factors are affected, motion model factors are not). For a fi xed 1:kwith different class assignments C the corresponding conditional belief factor graph topology remains the same (geometric and semantic observation fac- tors connect the same nodes), but semantic factors change, according to class models. Fig. 2 presents an example for 2 factor graphs, in which k = 2, N = 2, and the DA hypothesis is that at time k = 1 the camera observes object 1 for the fi rst graph and 2 for the second, at time k = 2 the camera observes objects 1 and 2 for both graphs. To effi ciently infer Xkfor every realization of C and 1:k, state of the art incremental inference approaches, such as iSAM2 21 can be used. The joint posterior from Eq. (4) can thus be maintained following Eq. (5) as a set of continuous beliefs bXkC 1:k conditioned on the discrete variables 1:kand C each represented with a factor graph, along with their corresponding component weights wC 1:k, describing the marginal belief over discrete variables. In the next section, we describe how the latter can be calculated. B. Marginal Belief Over Discrete Variables: wC 1:k To compute the DA and class realization weight wC 1:k we marginalize over all continuous variables: wC 1:kP(C,1:k| Hk) = Z Xk P(Xk,C,1:k|Hk)dXk. (12) x0 x1x2 xo 1 xo 2 (a) x0 x1x2 xo 1 xo 2 (b) Fig. 2: A toy example for two factor graphs in our approach, each for a different data association realization. The edges that connect between camera poses correspond to motion model P(xk|xk1,ak1). The edges that connect directly between camera and object poses correspond to the measurement model (10) for both semantic and geometric measurements. Thus a viewpoint-dependent semantic measurement model results in geometric constraints on robot-to-object relative pose. Using Bayes law, we can expand the above as follows: P(Xk,C,1:k| Hk)= P(Zk| Xk,C,1:k) P(Xk,C,1:k|H k) (13) where = P(Zk| H k) 1 is a normalization constant and the joint measurement likelihood P(Zk| Xk,C,1:k) can be explicitly written as in Eq. (10). We further expand P(Xk,C,1:k|H k) using chain rule: P(Xk,C,1:k|H k) = P(k|1:k1,Xk,C,H k) P(xk|xk1,ak1) P(Xk1,C,1:k1| Hk1), (14) where: P(Xk1,C,1:k1|Hk1) = wC 1:k1 bXk1C 1:k1. (15) Eq. (15) is the prior belief calculated at time k 1 and represented as a continuous belief component along with corresponding weight as described above. The term P(k|1:k1,Xk,C,H k ) from Eq. (14) is the object obser- vation model that represents the probability of observing a scene given a hypothesis of camera and object poses. In this paper we use a simple model that depends only on camera and object poses at current time step, thus it can be written as P(k| xk,Xo k), where X o k . = xo k,i nk i=1. If the model predicts observation of all objects corresponding to kthen P(k| xk,Xo k) is equal to a constant, otherwise it is zero. Plugging the above into Eq. (12) yields a recursive rule for calculating component weights at time k wC 1:k = R Xk P(Zk|Xk,C,k) P(k|xk,Xo k) bXkC 1:k1w C 1:k1 dXk. (16) The normalization constant (from Eq. (13) does not depend on variables and cancels out when weights are normalized to sum to 1. It is therefore dropped out in subsequent calculations. Note that the realization weight from the previous time step wC 1:k1 is independent from Xk, and thus can be taken out of the integral. Recalling Eq. (10), the continuous variables participating in P(Zk| Xk,C,1:k) are xkand Xo k. Those variables are participating also in P(k| xk,xo k). As b XkC 1:k1 is Gaussian, all other continuous variables can be marginalized easily. On the other hand, xkand Xo k must be sampled because of the object observation model P(k| xk,Xo k), which is commonly not Gaussian. If the observation model predicts that the objects will not be observed for most of the samples, then wC 1:k 7739 will be small and likely to be pruned. We can express the realization weight as follows: wC 1:k wC 1:k1 ZZ xk,Xo k P(Zk| xk,Xo k,C,1:k) P(k| xk,Xo k) b xk,Xo k C 1:k1 dxkdXo k, (17) where: bxk,Xo k C 1:k1 . = P(xk,Xo k|C,1:k1,H k) = Z X k xk,Xo k bXkC 1:k1 d ?X k xk,Xo k ? . (18) The viewpoint dependent classifi er model contributes to data association disambiguation by acting as reinforcement or contradiction to the geometric model. If both agree on the poses hypothesis, wC 1:k will be large relative to cases where both disagree. Next, we provide an overview of the inference scheme, then address computational aspects. C. Overall Algorithm The proposed scheme is outlined in Alg. 1. For every time step we are input the prior belief P(Xk1,C,1:k1| Hk1) representedfollowingEq.(5)asasetofweights wC 1:k1 . = P(C,1:k1| Hk1) and corresponding con- tinuous (Gaussian) belief components bXk1C 1:k1 . = P(Xk1| C,1:k1,Hk1). In our implementation we main- tain a separate factor graph for each such component. We also obtain an action ak1and observations Zk, sepa- rated into geometric Zgeo k , and semantic Zsem k . We prop- agate each prior belief component using the motion model P(xk| xk1,ak1) (step 3). Each component then splits to a number of subcomponents, one for each possible assign- ments of data associations kat current time (generally a vector of length nk). Procedure PropWeights at step 5 computes the normalized weight of each subcomponent via Eq. (17) as a product of the component (prior) weight wC 1:k1 with an update term comprising the measureme

人人文库> 全部分类> 教育资料 > 课件下载

温馨提示

1. 本站所有资源如无特殊说明，都需要本地电脑安装OFFICE2007和PDF阅读器。图纸软件为CAD,CAXA,PROE,UG,SolidWorks等.压缩文件请下载最新的WinRAR软件解压。
2. 本站的文档不包含任何第三方提供的附件图纸等，如果需要附件，请联系上传者。文件的所有权益归上传用户所有。
3. 本站RAR压缩包中若带图纸，网页内容里面会有图纸预览，若没有图纸预览就没有图纸。
4. 未经权益所有人同意不得将文件中的内容挪作商业或盈利用途。
5. 人人文库网仅提供信息存储空间，仅对用户上传内容的表现方式做保护处理，对用户上传分享的文档内容本身不做任何修改或编辑，并不能对任何下载内容负责。
6. 下载文件中如有侵权或不适当内容，请与我们联系，我们立即纠正。
7. 本站不保证下载资源的准确性、安全性和完整性, 同时也不承担用户因使用这些下载资源对自己和他人造成任何形式的伤害或损失。

IROS2019国际学术会议论文集0627

文档简介

温馨提示

最新文档

评论

IROS2019国际学术会议论文集0627

文档简介

温馨提示

最新文档

评论

相关文档