Robust Loop Closure Detection based on Bag of SuperPoints and Graph Verification

Haosong Yue, Jinyu Miao, Yue Yu, Weihai Chen, and Changyun Wen, Fellow, IEEE

Abstract: Loop closure detection (LCD) is a crucial technique for robots, which can correct accumulated localization errors after long-time explorations. In this paper, we propose a robust LCD algorithm based on Bag of SuperPoints and graph verification. The system first extracts interest points and feature descriptors using the SuperPoint neural network. Then a visual vocabulary is trained in an incremental and self-supervised manner, considering the relations between consecutive training images. Finally, a topological graph is constructed using matched feature points to verify candidate loop closures obtained by a Bag-of-Words (BoW) framework. Comparative experiments with state-of-the-art LCD algorithms on several typical datasets have been carried out. The results demonstrate that our proposed graph verification method can significantly improve the accuracy of image matching, and that the overall LCD approach outperforms existing methods.

I. INTRODUCTION

Simultaneous Localization and Mapping (SLAM) refers to a robot incrementally localizing itself while simultaneously building a map of the environment. It is a crucial technology for robots to navigate autonomously. Plenty of SLAM algorithms have been presented and now achieve pleasing performance. However, their predicted trajectories inevitably suffer from accumulated errors after long-time explorations. Loop closure detection (LCD) is one of the most popular solutions to this problem. If a robot can detect loops correctly, it can amend the predicted trajectory and thus improve the accuracy of SLAM.

According to the data utilized, LCD algorithms can be divided into two main categories: 2D image-based ones [12]-[21] and 3D point-cloud-based ones [31]-[36]. Images have rich textures, and image processing methods are well established. Therefore, 2D image-based algorithms still lead the field,
although the performance of 3D point-cloud-based ones is getting better now.

This research was sponsored by the National Natural Science Foundation of China under Grant Nos. 61603020, 61620106012, and 61573048. It was also supported by the Fundamental Research Funds for the Central Universities under Grant No. YWF-19-BJ-J-355. H. Yue, J. Miao, Y. Yu, and W. Chen are with the School of Automation Science and Electrical Engineering, Beihang University, Beijing 100191, China ({yuehaosong, mjy0519, yuyuesmile, whchen}@buaa.edu.cn). C. Wen is with the School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore 639798, Singapore (ecywen@ntu.edu.sg). Yue Yu is the corresponding author.

Fig. 1. Topological graph built by our proposed algorithm. Red dots represent selected SuperPoint feature points, i.e., nodes of the graph. Green lines represent edges between nodes. Two candidate images are regarded as a loop closure only if they have the same graph structure.

Feature extraction and image representation are the cores of appearance-based LCD algorithms. Traditionally used features can be divided into global features and local features. Global features [3], [4] are compact and computationally efficient. However, they are also sensitive to view and illumination changes. On the contrary, local features are robust to appearance changes and have been widely used in image representation [5]-[8]. Recently, convolutional neural networks (CNNs) have achieved great success in pattern recognition and computer vision tasks [9]-[11]. Many researchers have also utilized features extracted by CNNs to improve LCD algorithms [12]-[14], [21].

In order to reduce computational cost, the bag-of-words (BoW) framework has been utilized in LCD. BoW methods train a visual vocabulary in an off-line or on-line manner. Then features extracted from images are quantized into visual words according to the vocabulary. Finally, images are represented and matched using vectors depicting the histogram of words. The technique of tf-idf, i.e., term
frequency-inverse document frequency, is commonly used to assign different weights to words. Although BoW methods are quite effective, they often mismatch loop closures because they discard the spatial distribution of words. To overcome this problem, some researchers have introduced temporal or spatial consistency constraints [22]-[29] into LCD. For example, a posterior verification based on RANSAC (Random Sample Consensus) can be used to exclude wrong matches. However, there is still room to improve the performance of BoW-based LCD approaches.

In this paper, we present a novel appearance-based LCD algorithm. It uses visual features extracted by a deep neural network to represent images and a topological graph model to verify candidate loop closures, as illustrated in Fig. 1. Experiments show that our proposed algorithm outperforms several state-of-the-art LCD methods. The main contributions of this paper are summarized as follows:

- Utilizing SuperPoint, a fully convolutional network, to extract key points and descriptors, which is more accurate than commonly used feature extraction methods in LCD.
- Training the visual vocabulary in an incremental manner, which fully considers the relations between consecutive training images.
- Proposing a novel topological-graph-based verification method to confirm candidate loop closures obtained by the BoW framework.

The rest of this paper is organized as follows. Section II briefly introduces related work on LCD methods. Section III describes our proposed algorithm in detail. Comparative experiments with state-of-the-art LCD algorithms are presented in Section IV, while conclusions and future work are discussed in Section V.

2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Macau, China, November 4-8, 2019.

II. RELATED WORK

As an important procedure to correct accumulated errors produced by SLAM algorithms, loop closure detection has attracted great attention in the robotics community. Early research
detected loops by comparing a robot's current and historical positions directly [1], which required accurate localization ability. Therefore, it only applied to small and simple operating environments. Later, some methods using sonar-based or laser-based distance sensors were proposed and achieved good performance [2]. With the development of imaging techniques and computational capabilities, images have become the primary input data for LCD.

One of the most important issues in appearance-based LCD algorithms is image representation. It can be regarded as dimension reduction of an image, and includes methods based on global features and local features. Global-feature-based algorithms use a single descriptor to depict the whole image, such as Gist [3] and the statistical histogram of colors [4]. This kind of representation is quite efficient but also sensitive to appearance changes caused by illumination, viewpoint, or dynamic objects. On the contrary, local-feature-based methods extract plenty of key points and calculate feature descriptors of these points to describe an image, such as SIFT (Scale-Invariant Feature Transform) [5], HOG (Histogram of Oriented Gradients) [6], SURF (Speeded-Up Robust Features) [7], and ORB (Oriented FAST and Rotated BRIEF) [8]. These local features are less sensitive than global features to illumination changes and scale variations, but their computational cost is relatively high.

In recent years, deep learning technology has been applied to various computer vision tasks with impressive success [9]-[11]. It has been demonstrated that deep neural networks can learn to extract high-level features from images that cannot be extracted by traditional hand-crafted feature extractors. Therefore, researchers have begun to introduce artificial neural networks into LCD. Niko et al. [12] presented an approach that uses EdgeBoxes to extract proposal landmarks from images and a pre-trained ConvNet to extract feature descriptors. Chingiz et al. [13] refined a pre-trained network to enhance the robustness
against seasonal changes using the Nordland dataset, which was gathered across different seasons. Instead of using networks trained for object recognition, Lopez-Antequera et al. [14] trained an appearance-invariant place recognition network and achieved better performance.

When the scale of the environment gets larger, it is inefficient to represent images using raw features. The BoW framework is a possible solution to reduce computational costs. Originating from text retrieval, the BoW model was first introduced into image retrieval by Zisserman [15]. Nister et al. [16] improved retrieval efficiency by building the vocabulary with a tree structure. Some researchers [17], [18] utilized binary words and achieved excellent improvements in matching efficiency. One of the most influential BoW-based LCD algorithms is FAB-MAP [19]. The authors then proposed an improved version, FAB-MAP 2.0 [20], which achieved excellent results in large-scale environments. In order to exploit the representation ability of CNNs, some researchers creatively combined deep learning technology with BoW-based LCD algorithms. For example, Hou et al. [21] presented the BoCNF method, which uses features extracted by a convolutional network to build the vocabulary and retrieve images.

The BoW model improves the efficiency of LCD algorithms, but it also makes them easily affected by the perceptual aliasing problem. The main reasons are twofold: one is the information loss in the quantization procedure, and the other is the discarding of spatial relations between words. Thus, scenes containing many similar objects, such as offices, often confuse the algorithm and lead to poor matching precision. To overcome this problem, some researchers presented LCD algorithms based on temporal consistency constraints [22]-[25]. Milford et al. [23] proposed SeqSLAM, in which two images are judged as a loop closure only if there are continuously matched pairs. Siam and Zhang [25] later improved this algorithm and increased its computing speed.
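To make the BoW machinery underlying these methods concrete, the following is a minimal sketch (not any cited system's implementation) of tf-idf weighting over visual-word histograms; the visual-word IDs and "images" are toy placeholders:

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """docs: one list of visual-word IDs per image.
    Returns one sparse {word: weight} dict per image using tf-idf."""
    n = len(docs)
    # document frequency: in how many images does each word appear?
    df = Counter(w for d in docs for w in set(d))
    vecs = []
    for d in docs:
        tf = Counter(d)
        total = len(d)
        # tf-idf: (term frequency) * log(N / document frequency)
        vecs.append({w: (c / total) * math.log(n / df[w]) for w, c in tf.items()})
    return vecs

# toy example: three "images", each described by its visual-word IDs
images = [[1, 1, 2, 3], [1, 2, 2, 4], [3, 3, 4, 4]]
vecs = tfidf_vectors(images)
```

Words that occur in every image receive zero weight (log 1 = 0), which is exactly the property that makes tf-idf suppress uninformative, ubiquitous words during retrieval.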
Using spatial information among words is another efficient way to reduce perceptual aliasing. Paul and Newman [26] improved FAB-MAP by integrating spatial information into vocabulary building and image representation. Kanji et al. [27] introduced visual phrases into LCD, clustering words that often appear in the same image into a phrase by a self-supervised method. Lynen et al. [28] and Garcia-Fidalgo et al. [29] used the RANSAC method to verify matches detected by the BoW framework: candidate images are regarded as a loop closure only if they fit a certain corresponding geometric constraint. To further improve reliability, Konstantinos et al. [30] combined both temporal and spatial constraints in their LCD algorithm.

Apart from image-based LCD algorithms, there are also methods based on point cloud data. Rizzini [31] used hand-crafted 3D features to represent a scene. Steder et al. [32] projected the point cloud into a range image and then extracted image features to perform LCD. Wu et al. [33] and Maturana et al. [34] used trained neural networks to extract 3D features from point clouds; however, they need to convert the unordered points into regular voxels first. Later, PointNet [35] and PointNet++ [36] were proposed, making it possible to extract features directly from raw points.

Fig. 2. An overview of our proposed framework.

III. PROPOSED METHOD

A. Overview

Figure 2 provides an overview of our proposed framework for LCD, which consists of an off-line vocabulary training pipeline and an on-line loop closure detecting pipeline. To inhibit the reduction of LCD performance caused by inaccurate features, we utilize interest points and visual feature descriptors
extracted by the SuperPoint network [37], a novel convolutional network that is more robust against appearance changes. With the extracted feature descriptors, we train a visual vocabulary in an incremental manner. We make full use of the relations between consecutive images in the training process to increase the representability of the vocabulary. Images can then be represented according to the vocabulary, and candidate loop closures can be obtained using a BoW method. In order to further exclude mismatches caused by perceptual aliasing, we present a novel posterior verification method based on a topological graph built from the extracted interest points. Two candidate images are regarded as a loop closure only if they have the same graph structure. The details of our proposed method are described in the following subsections.

B. Feature extraction

The accuracy of feature descriptors is crucial for LCD algorithms. In this paper, we use the pre-trained SuperPoint network to extract robust key points and their feature descriptors. Different from other CNNs, SuperPoint can detect interest points with precise locations in images. It uses a single shared encoder consisting of convolution layers, max-pooling layers, ReLU nonlinear activations, and BatchNorm normalizations. Downsampled by three 2x2 max-pooling layers, the H x W full-size image is reshaped into a narrow feature map with resolution Hc x Wc, in which the interest points are detected; here Hc = H/8 and Wc = W/8. The output of the encoder is then fed into two decoders, namely the interest point decoder and the descriptor decoder, to extract feature points and their descriptors separately. In order to avoid training computation, these decoders use non-learned upsampling techniques to bring the outputs back to full resolution.

To test the robustness of the key points and descriptors extracted by the SuperPoint network, we carried out extensive feature matching experiments on several challenging datasets. According to our evaluation, we
demonstrate that SuperPoint outperforms state-of-the-art feature extraction algorithms and meets our requirements on image representation for LCD.

C. Vocabulary training

Using the extracted feature descriptors to represent images directly is inefficient when the number of images is too large. Clustering raw feature descriptors into quantized words and then representing images with them is a desirable solution. Existing vocabulary training algorithms usually cluster image features using the k-means method and establish a vocabulary tree by assigning its levels and nodes. However, if the scale of the tree is not assigned properly, the vocabulary will not distinguish the features well. In other words, different features may be associated with the same word, or the same feature point may be associated with different words after view changes. Training the vocabulary with several scales and selecting the best one can alleviate this problem to some extent, but it is cumbersome and still cannot obtain the optimal solution.

Fig. 3. Process of vocabulary building.

In this paper, we train the vocabulary in an incremental manner, considering the relations between adjacent images. Generally, training images are collected by robots or human operators consecutively along their moving trajectories. Therefore, adjacent images always share common parts of the scene, as illustrated in Fig. 3. For the first image in the training set, we extract feature points using SuperPoint and regard their descriptors as initial words. For the second image, we match the extracted feature points with those extracted from the previous image. The matched features are assigned to existing words, and the centers of these words are updated accordingly; the unmatched features are assigned as new words. When all the training images have been processed, we obtain a series of words, which are regarded as the leaf nodes of the vocabulary tree. Then the Hierarchical
Dirichlet Process [38] is utilized to cluster these leaves into parent nodes. In this way, we can build a vocabulary tree without supervision. Thanks to the feature matching and self-supervised training procedures, the established vocabulary can represent features properly.

D. Candidate proposal

With the trained vocabulary, loop closure candidates can be obtained by representing and retrieving images in a BoW framework. For every captured image, feature points and descriptors are first extracted using the SuperPoint network. These features are then quantized into words by matching them with the nodes of the vocabulary tree. Next, the image is represented by a visual vector computed as the histogram of the words occurring in it. Finally, we select the top K images from the database as candidate loop closures by measuring the distance between visual vectors. In our implementation, we use the L1 distance to measure the similarity between images:

s(I1, I2) = sum_i |v1(i) - v2(i)|    (1)

where I1 and I2 represent two images, and v1 and v2 are the corresponding vectors, respectively.

E. Graph verification

Traditional BoW frameworks ignore the position information of words, which makes it easy to produce wrong matches. To overcome this problem, a novel graph-based verification method is proposed in this paper.

The nodes of the graph are the feature points extracted by SuperPoint. However, there are too many points in an image; using all of them to build the graph would increase computational cost and decrease the robustness of the algorithm. Therefore, we match these points between the two candidate loop closure images and only reserve those with one-to-one mapping relations. For each candidate image, we construct an undirected triangular graph whose edges connect adjacent nodes without overlapping. The graph building method we propose is similar to Delaunay triangulation, but we do not consider convexity and concavity. Assume V = {v1, v2, ..., vn} is the selected point set of an image and G = (V, E) is the graph to build, where E represents the edge
set of the graph. The procedures o
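The incremental word-building step of Section III-C can be sketched as follows. This is a simplified illustration under our own assumptions, not the paper's implementation: the paper matches feature points between consecutive images, whereas for brevity this sketch matches each descriptor directly against existing word centers under an assumed fixed L2 threshold.

```python
import numpy as np

class IncrementalVocabulary:
    """Toy sketch of incremental word building: a descriptor matched
    to an existing word updates that word's running mean; an unmatched
    descriptor becomes a new word."""

    def __init__(self, match_thresh=0.5):
        self.match_thresh = match_thresh  # assumed L2 matching threshold
        self.centers = []                 # word centers (one array per word)
        self.counts = []                  # observations per word

    def add_image(self, descriptors):
        for d in descriptors:
            if self.centers:
                dists = np.linalg.norm(np.array(self.centers) - d, axis=1)
                j = int(np.argmin(dists))
                if dists[j] < self.match_thresh:
                    # matched: update the running mean of this word's center
                    self.counts[j] += 1
                    self.centers[j] += (d - self.centers[j]) / self.counts[j]
                    continue
            # unmatched: start a new word
            self.centers.append(d.astype(float).copy())
            self.counts.append(1)

# two consecutive "images" with toy 2-D descriptors
vocab = IncrementalVocabulary(match_thresh=0.5)
vocab.add_image(np.array([[0.0, 0.0], [1.0, 1.0]]))
vocab.add_image(np.array([[0.1, 0.0], [5.0, 5.0]]))
```

After processing both images the vocabulary holds three words: the first descriptor of image 2 merges into the first word (its center shifts toward the new observation), while the distant descriptor opens a new word. The resulting words would then serve as leaf nodes for the Hierarchical Dirichlet Process clustering described above.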
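Sections III-D and III-E can be sketched together as follows. This is a hedged illustration, not the authors' code: Eq. (1) is implemented directly, but since the paper only states that its triangulation is "similar to Delaunay" (ignoring convexity and concavity), the sketch substitutes `scipy.spatial.Delaunay`, and "same graph structure" is interpreted here as identical edge sets over the shared one-to-one match indices.

```python
import numpy as np
from scipy.spatial import Delaunay

def l1_score(v1, v2):
    """Eq. (1): L1 distance between two BoW histogram vectors."""
    return float(np.abs(v1 - v2).sum())

def triangulation_edges(points):
    """Edge set of a Delaunay triangulation over 2-D key points."""
    tri = Delaunay(points)
    edges = set()
    for a, b, c in tri.simplices:
        for e in ((a, b), (b, c), (a, c)):
            edges.add(tuple(sorted((int(e[0]), int(e[1])))))
    return edges

def graphs_match(pts1, pts2):
    """pts1[i] and pts2[i] are one-to-one matched key points.
    Accept the candidate pair only if both images induce the same
    edge set over the shared match indices."""
    return triangulation_edges(pts1) == triangulation_edges(pts2)

# toy example: four matched key points under a similarity transform,
# which preserves the triangulation structure
pts1 = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 3.0], [1.0, 1.0]])
pts2 = pts1 * 1.2 + np.array([0.5, -0.3])
```

A true loop closure seen from a slightly different viewpoint keeps the relative layout of its matched points, so the two edge sets agree; under perceptual aliasing the point layouts differ and the edge sets diverge, which is the intuition behind the posterior verification.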