EPN: Edge Aware PointNet for Object Recognition from Multi-View 2.5D Point Clouds

Syeda Mariam Ahmed1, Pan Liang2 and Chee Meng Chew1

2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Macau, China, November 4-8, 2019

1 Syeda Mariam Ahmed and Chee Meng Chew are with the Department of Mechanical Engineering; 2 Pan Liang is with the Advanced Robotics Center, National University of Singapore. Email: e0020829@u.nus.edu.sg

Abstract - The performance of current 3D point based detectors is limited by the number of points they can process, which consequently limits their accuracy. In this paper, we propose a novel architecture, coined Edge Aware PointNet, that incorporates geometric shape priors as binary maps, integrated in parallel with the PointNet framework through convolutional neural networks (CNNs). The proposed architecture takes individual object instances as input and learns the task of object recognition for 3D shapes. To train the network, we present a dataset of 31k 2.5D synthetic point clouds rendered from ModelNet40. Through the 2.5D representation, the network learns object recognition despite occlusion, which enables improved performance on objects from the real world, while the 2D binary maps enable feature learning that is independent of the number of points in the point cloud. Comprehensive experimentation shows that the proposed network improves performance by 2.5% on ModelNet40 and 2.6% on ModelNet10 as compared to the baseline PointNet. We also show improved performance as compared to state-of-the-art methods on a real-world RGB-D dataset, where our network improves results by 8%. Our code and dataset are publicly available.

I. INTRODUCTION

Object recognition is an important task for robots to perceive and interact with the real world. However, with most common depth sensors, objects can only be viewed from limited perspectives due to clutter and occlusion. Consequently, researchers have shown that state-of-the-art performance can be achieved by training CNNs on multiple 2D views rendered from 3D models, achieving a higher accuracy as compared to volumetric grid based methods [16]. These results clearly indicate that incorporating view based information is an important aspect for the recognition of 3D objects.

State-of-the-art methods aim to learn features in 3D through two main techniques: volumetric quantization based approaches [18], [11], [14], [5] and multi-view CNNs using 2D rendered images of 3D shapes [14], [16], [6]. Volumetric methods are popular because they incorporate the complete point cloud for recognition and directly exploit 3D information, which is much more informative as compared to 2D projections of point clouds. On the other hand, in real-world scenarios 3D objects can only be observed partially. In such cases, multi-view CNN based recognition is a better approach, as the network is trained to recognize 3D objects under occlusion [14], [6].

[Fig. 1: Given a single point cloud X ∈ R^{N×4}, a projection mapping f_BM is applied that generates three binary images S_xy, S_yz and S_xz. The binary images are the input for the CNN layer of the network. The binary edge maps provide additional geometric features of the objects that may be invisible due to occlusion, which results in improved feature learning.]

For both volumetric and multi-view approaches, the deep learning architecture generalizes 2D image CNNs to 3D CNNs. While both methods have shown great success, there are significant challenges associated with these approaches. The volumetric domain requires quantization of the point cloud, and its resolution can directly affect the computational complexity of the system [6], [10]. In addition, sparsity in datasets leads to unnecessary operations and memory consumption.
Though efficient CNNs have been proposed that use voting mechanisms to leverage the sparsity in 3D data [5], [17], these methods require the network to run in parallel threads at various angular orientations to cater for object rotation [5]. Alternatively, multi-view based methods have been shown to outperform volumetric methods. However, it is still argued that projecting a point cloud to a 2D depth map discards valuable information, as the network filters need to learn local dependencies with regard to depth that are readily available in a 3D representation. In addition, these methods are often constrained to employing a large number of views per object.

Recently, a novel architecture, PointNet [15], was proposed that directly takes 3D points as input and creates clusters to aggregate features at different scales to generate a global point cloud signature. The architecture has been applied to object classification of full 3D CAD models, semantic segmentation, and 3D object detection in scenes [13], [15], [14]. While such point based methods do not require reorganization of the input point cloud, their performance is limited by the number of points the network can process, which directly affects the computational complexity of the system. Consequently, in this paper we propose to incorporate geometric shape priors as binary maps that explicitly capture object features while being independent of the number of points in the cloud.

Boundary detection algorithms have a long history in computer vision and used to play a fundamental role in extracting features from images [2], [8]. However, with the arrival of deep neural networks, the explicit requirement to detect edges is not as significant. The convolution operation has a close relationship to the frequency domain, due to which a CNN learns to detect edge features within the initial layers of the network. Recently, researchers have shown that using edge detection as an auxiliary task improves performance for the main task of semantic segmentation [3], [7], [4]. Our work is inspired by these methods, and we investigate whether boundary detection can improve performance for point cloud based recognition. We explore two variations, where (a) edge detection is used as an auxiliary task, and (b) 3D boundary information is used to generate binary maps for parallel feature learning.

Specifically, in this paper we introduce a combination of the most effective strategies: a PointNet [15] architecture combined with a parallel stream of 2D binary image based convolutional neural networks (CNNs), coined Edge Aware PointNet (EPN). The PointNet layer of the proposed framework takes as input individual instances of 3D point clouds, while the complementary CNN layer of the network is provided with three binary maps, as shown in Figure 1. These images are projections of the point cloud boundary onto the xy, yz and xz planes. As a result, a feature vector is generated by both layers and concatenated to predict the object category. In summary, we make the following key contributions:

- We propose a novel learning framework for unorganized point clouds that employs geometric shape priors as binary maps for improved feature learning.
- We propose learning from a 2.5D view based representation of 3D point clouds to explicitly incorporate occlusion and clutter effects on objects as perceived by real sensors.
- We demonstrate that this architecture can surpass the performance of the PointNet baseline and achieve state-of-the-art results on 3D data of real-world object reconstructions, despite being trained on purely synthetic data.
II. EDGE AWARE POINTNET

The objective of our deep learning network is object recognition given a group of unorganized 3D points. Formally, given a point cloud P = {p_1, p_2, ..., p_n}, where p_i = {x, y, z} ∈ R^3, we aim to learn a network that can infer p(y | x; w) such that y = {y_c}, where y_c represents the multi-class classification output.

A. Edge Detection for Point Clouds

It has been explicitly shown that CNNs learn edge features at the earlier levels of the network, while gradually moving on to high-level features [19]. However, there is no such evidence for a point based network like PointNet, which is designed on the idea of grouping points to extract features from the point cloud. As a consequence, we propose to explicitly detect edge points from the point cloud and use them to enhance feature learning using CNNs.

In our previous work, we proposed an edge detection algorithm [1] for 3D point cloud data that evaluates the symmetry of a group of nearest neighboring points to classify the query point as edge/non-edge. Given a query point p_i, we determine its k nearest neighbors. For an unorganized point cloud, this is achieved through a k-dimensional (K-d) tree. These neighboring points of p_i are denoted as V_i = {n_1, n_2, ..., n_k}. Initially, we assume that the centroid of V_i is the query point itself, while a new centroid C_i is computed by taking the mean of the neighboring points as follows:

    C_i = \frac{1}{|V_i|} \sum_{j=1}^{k} n_j    (1)

To cater for variation in density, we compute the resolution Z_i(V_i), as defined by (2), from the neighboring points. This is achieved by determining the distance of the nearest neighbor of p_i among all the k neighbors:

    Z_i(V_i) = \min_{n \in V_i} \| p_i - n \|    (2)

    \| C_i - p_i \| > \lambda \, Z_i(V_i)    (3)

Evaluation of Z_i ensures scale invariance, as the local density of points is considered for each point individually. Finally, Z_i is weighted by a fixed parameter λ, which serves as the classification threshold. If the distance between the new centroid C_i and the query point p_i is greater than λZ_i(V_i), as defined by (3), the point is classified as an edge. As a result, symmetry among the neighboring points of p_i determines whether a point belongs to an edge.
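As a reference, the following is a minimal NumPy/SciPy sketch of this centroid-symmetry test. The function name detect_edge_points and the default values of k and λ are illustrative choices of ours, not parameters reported in the paper.

```python
import numpy as np
from scipy.spatial import cKDTree

def detect_edge_points(points, k=10, lam=1.5):
    """Classify each point as edge/non-edge via the centroid-symmetry
    test of Eqs. (1)-(3): a point is an edge if its neighborhood
    centroid is displaced by more than lam * local resolution.

    points: (N, 3) array of xyz coordinates.
    Returns a boolean (N,) mask (True = edge point).
    """
    tree = cKDTree(points)                      # K-d tree for neighbor search
    # Query k+1 neighbors because the nearest neighbor of a point is itself.
    dists, idx = tree.query(points, k=k + 1)
    neighbors = points[idx[:, 1:]]              # (N, k, 3), query point excluded

    centroid = neighbors.mean(axis=1)           # Eq. (1): mean of the k neighbors
    resolution = dists[:, 1]                    # Eq. (2): distance to nearest neighbor
    displacement = np.linalg.norm(centroid - points, axis=1)

    return displacement > lam * resolution      # Eq. (3): asymmetric neighborhood -> edge
```

The resulting boolean mask supplies the fourth channel of the matrix X ∈ R^{N×4} consumed by the binary projection module described next.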
B. Architecture of Edge Aware PointNet

Figure 2 illustrates the complete architecture of the proposed network for the classification of 3D objects. The network consists of two parallel channels for feature learning, which is an extension of the recently proposed PointNet [15] architecture. The original network takes as input a set of 3D unorganized points P_i, i = 1, ..., n, where each point P_i is a vector of (x, y, z) coordinates in Euclidean space, and extracts features through multiple set abstraction (SA) layers. The SA layers consist of sampling and grouping, where the farthest point sampling (FPS) algorithm [12] is used to sample k points {p_1, p_2, ..., p_k} from the given point cloud, and a k-nearest-neighbor or radius based search is used to collect the designated number of nearest neighbors. These points serve as input to a PointNet layer, which is a multi-layer perceptron (MLP) that extracts features from each group of points. Three consecutive SA layers are used to extract features from the point cloud, generating a k_1 × 1 feature vector that is representative of the geometrical shape of the input point cloud.

[Fig. 2: Architecture of Edge Aware PointNet for object recognition. The network consists of two main branches: (a) a PointNet branch and (b) a CNN branch. The PointNet branch is trained using 2.5D point clouds, while the CNN branch is trained on binary images that are generated using edge points from the point cloud. The feature vectors generated by both branches are concatenated before predicting class labels for each point cloud.]

The SA layers in the network aim to implement hierarchical learning of features. The first sampling and grouping layer samples N_1 points from the original point cloud of size N, given that N_1 < N. These points are used to compute features that pertain to this local region, much like traditional handcrafted descriptors. In the next layer, the group of points N_2 is sampled from N_1, and their nearest neighbors are also computed from this subset. Thus, there is an automatic shift in scale, where the features learnt at the previous layer are further propagated to incorporate a larger receptive field. Finally, the third layer does not perform sampling and grouping but accumulates features from the previous layers to generate a k_1-length signature for the given point cloud.

The second channel, which trains in parallel to the PointNet layer, is a CNN architecture, also shown in Figure 2. The input to this layer is generated by finding edge points from the point cloud using the algorithm described earlier. Once edge points are determined, we get a matrix X ∈ R^{N×4}, where N is the fixed number of points randomly sampled from the original point cloud P, while the four channels represent the Euclidean coordinates and a logical value indicating if the given point is edge or non-edge. We further introduce a binary projection module that defines a mapping function to convert the 3D matrix X to a series of 2D binary maps:

    S = f_{BM}(X, g)    (4)

where S ∈ R^{M×M} represents a 2D binary map, X is the input point cloud tensor, and g defines a 1-D vector that determines the projection plane. The function f_BM maps each point, via the dot product between X and g, onto a specific grid cell S_jk of the binary map. This mapping is formalized as

    f_{BM}: X\{x_i, y_i, z_i\}, g \to S_{jk}    (5)

where j ∈ (p_1, q_1)/n and k ∈ (p_2, q_2)/n represent a 2D location in the binary map S, while p and q represent the range of the respective principal axis in the point cloud, and n is the quantization factor. This mapping is used to generate three channels that can be described as

    S_{xy} = f_{BM}(X, [1\ 1\ 0]^T), \quad S_{yz} = f_{BM}(X, [0\ 1\ 1]^T), \quad S_{xz} = f_{BM}(X, [1\ 0\ 1]^T)    (6)

As a result, we generate the input set for EPN, described as

    \tilde{X} = \{X \in R^{N \times 3}, S \in R^{M \times M \times 3}\}    (7)

The binary maps generated are of low resolution, which prevents the network from being computationally intensive; however, there is no restriction on the size of the input cloud X. The CNN layers designed to process the binary maps consist of traditional convolutional and max pooling layers that generate a feature vector of length k_2 × 1. The joint feature vector is passed through a series of fully connected layers and a softmax classifier that uses the cross entropy loss function to predict probabilities for each class.
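To make the projection in (4)-(7) concrete, the sketch below rasterizes edge points onto an M × M binary grid for each of the three planes. It is a minimal NumPy interpretation of f_BM: the function names, the choice M = 32, and the min-max normalization of point ranges to grid indices are our assumptions, since the paper specifies the mapping only abstractly.

```python
import numpy as np

def binary_map(points_2d, M=32):
    """Rasterize 2D projected points into an M x M binary occupancy map.
    Each axis range is divided into M cells, playing the role of the
    quantization factor n in Eq. (5)."""
    lo, hi = points_2d.min(axis=0), points_2d.max(axis=0)
    # Normalize each coordinate to [0, M-1] and mark occupied cells.
    cells = ((points_2d - lo) / (hi - lo + 1e-9) * (M - 1)).astype(int)
    S = np.zeros((M, M), dtype=np.uint8)
    S[cells[:, 0], cells[:, 1]] = 1
    return S

def binary_projection(X, M=32):
    """Eq. (6): project the edge points of X (N x 4, last column = edge
    flag) onto the xy, yz and xz planes, returning the M x M x 3 tensor
    S of Eq. (7)."""
    edges = X[X[:, 3] > 0, :3]              # keep only edge points
    planes = [[0, 1], [1, 2], [0, 2]]       # (x,y), (y,z), (x,z) axis pairs
    return np.stack([binary_map(edges[:, ax], M) for ax in planes], axis=-1)
```

With M kept small, the three stacked maps form a fixed-size M × M × 3 tensor regardless of how many points the cloud contains, which is what decouples the CNN branch from the input cloud size.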
C. Edge Detection as an Auxiliary Task Versus 2D Input

To evaluate the most effective approach for integrating edge information from point clouds, we experiment with two different network architectures, as shown in Figure 3. The first architecture uses edge detection as an auxiliary task, based on the improved performance reported by several researchers [3], [7], [4]. As a result, we extract edge points from the original point cloud and create a binary label for each point indicating if it is an edge (1) or non-edge (0) point. The details of this network are shown in Figure 3a. The auxiliary task is defined using an encoder-decoder network that can generate binary predictions for every point in the original point cloud.

[Fig. 3: Two variants of the Edge Aware PointNet. The first variation uses edge detection as an auxiliary task; the network uses an encoder-decoder framework to generate pixel-wise binary predictions for each point in the point cloud. The second variant uses edge points to generate three binary images that are used for a parallel CNN layer.]

The encoder-decoder network adopts a hierarchical propagation strategy with distance based interpolation. Feature propagation is achieved by interpolating feature values, using an inverse distance weighted average based on k nearest neighbors, between the input and output levels of an SA layer. The interpolation function is defined as [15]:

    f^{(j)}(x) = \frac{\sum_{i=1}^{k} w_i(x) f_i^{(j)}}{\sum_{i=1}^{k} w_i(x)}, \quad \text{where } w_i(x) = \frac{1}{d(x, x_i)^p}    (8)

The final prediction layer from the decoder is concatenated with the feature vector generated by the encoder part of the network. Intuitively, we concatenate the per-point edge predictions with the feature vector from the encoder using a skip connection link. The concatenated feature vector passes through a series of fully connected and dropout layers to predict class labels per point cloud. The network is trained using multi-task learning, where the final loss function is a weighted combination of the binary edge prediction and multi-class classification tasks. The loss for binary edge detection is a pixel-wise cross entropy loss, defined as follows:

    L_{multi-task} = \lambda_1 L_{edge} + \lambda_2 L_{class}    (9)

    L_{edge} = -\frac{1}{N} \sum_{n=1}^{N} \left[ y_n \log(\hat{y}_n) + (1 - y_n) \log(1 - \hat{y}_n) \right]    (10)

where N is the total number of points, y_n is the ground-truth edge label, and \hat{y}_n represents the predicted probability from the softmax function.
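For completeness, the multi-task objective in (9)-(10) can be written compactly as below. This is a minimal PyTorch-style sketch under our own assumptions: we use a sigmoid/binary cross entropy formulation for the per-point edge logits (the paper mentions a softmax output, for which a two-class softmax is equivalent in this binary case), and the weights λ1, λ2 and tensor shapes are illustrative, as their values are not stated in this excerpt.

```python
import torch
import torch.nn.functional as F

def multi_task_loss(edge_logits, edge_labels, class_logits, class_labels,
                    lambda1=0.5, lambda2=1.0):
    """Weighted multi-task objective of Eq. (9).

    edge_logits:  (B, N) raw per-point edge scores from the decoder.
    edge_labels:  (B, N) binary ground-truth edge mask (targets of Eq. 10).
    class_logits: (B, C) per-cloud classification scores.
    class_labels: (B,)   integer class indices.
    """
    # Eq. (10): binary cross entropy averaged over all N points.
    l_edge = F.binary_cross_entropy_with_logits(edge_logits, edge_labels.float())
    # Multi-class cross entropy for the main recognition task.
    l_class = F.cross_entropy(class_logits, class_labels)
    return lambda1 * l_edge + lambda2 * l_class    # Eq. (9)
```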
As opposed to the first variant, the second network, also shown in Figure 3, explicitly uses edge points for enhanced feature learning through a shallow CNN network. This architecture is also inspired by the success of multi-view CNNs and their improved performance over volumetric methods for point cloud data. This is a novel branch that aims to extract additional shape information from the point cloud without drastically increasing the computational complexity of the network. In contrast to the first network, this architecture uses the single task of multi-class classification, L_class, as the network loss function.

III. EXPERIMENTAL RESULTS

For experimental evaluation, we first briefly describe the procedure for 2.5D dataset generation from the 3D models of ModelNet40. Next, we analyze the results of the proposed network on ModelNet40, ModelNet10 and a real-world dataset [14], which consists of 246 objects. This dataset consists of point clouds captured with an ASUS Xtion Pro, from which a dense reconstruction of each object is performed. The goal is to show that training on the 2.5D dataset outperforms training on ModelNet40 [18], as the network achieves state-of-the-art accuracy on [14].

The implementation details of the network architecture are as follows. The PointNet branch samples 512 and 128 nearest neighbors in the first two SA layers, with a search radius of 0.2 and 0.4 m respectively. Similarly, there are two CNN layers that consist of 32 and 64 filters with a kernel size of 5×5, while the max pooling layers have a kernel size of 2×2 and a stride of 2. This is followed by two fully connected layers, resulting in k_2 = 1024, which is concatenated with k_1 = 1024 to form the final feature vector for the point cloud. It is to be noted that no color information is provided to the network during training. We use dropout with a keep ratio of 0.5 before the last fully connected layer. The Adam optimizer is used with an initial learning rate of 0.001, momentum of 0.9 and a batch size of 8. Training on ModelNet40 takes about 9-12 hours to converge on a GTX 1070 GPU.

A. 6D Pose Labeling of Point Clouds

The 2.5D partial view dataset is generated through ray tracing using a 3D icosahedron that is tessellated (divided) into polygonal regions as
