IROS 2019 International Conference Proceedings 1701


Crowd-sourced Semantic Edge Mapping for Autonomous Vehicles

Markus Herb (1,2), Tobias Weiherer (1), Nassir Navab (2), and Federico Tombari (2)

Abstract: Highly accurate maps of the road infrastructure are a crucial cornerstone for self-driving cars, enabling navigation in complex traffic scenarios. Traditional methods for creating detailed maps of road environments involve expensive survey vehicles that cannot keep up with the frequent changes in the road network. In this paper, we propose a novel method to derive detailed high-definition maps by crowd-sourcing data from commodity sensors. Our system uses multi-session feature-based visual SLAM to align submaps recorded by individual vehicles on a central backend server. We reconstruct 3D boundaries of road infrastructure elements, such as road markings and road boundaries, from semantic object contours detected in keyframes by a neural network. The result is a concise map of semantically meaningful objects, suitable both for localization and for higher-level planning tasks of automated vehicles. We evaluate our method on real-world data against a globally referenced ground-truth map, demonstrating a high level of detail and metric accuracy.

I. INTRODUCTION

Highly accurate maps are generally believed to play a crucial role in future self-driving vehicles, as such high-definition (HD) maps of the road infrastructure provide important information about the road environment required for navigation and planning. For this, HD maps need to contain landmarks that enable accurate re-localization within the map, and they must also provide semantic information about road elements, such as lane markings, pavement signs, drivable area, traffic signs, or traffic lights, to facilitate higher-level tasks. In addition, the maps should be compact and concise to allow delivery of the map data to the vehicles over mobile data networks.

Today, such HD maps are typically created using special survey mapping vehicles equipped with expensive reference
sensors, which provide high-quality maps but are unsuitable for large-scale deployment, given the large number of such vehicles required to create and update the maps in constantly changing environments.

A promising approach to overcome these challenges is the use of sensor data crowd-sourced from regular vehicles. While crowd-sourcing allows mapping data to be acquired quickly and continuously for large areas, it also creates new challenges for the mapping system. The system should make use of established, inexpensive sensors for large-scale deployment, and it requires efficient use of the available bandwidth to collect mapping data on a central backend server. These considerations render many established mapping pipelines infeasible, since they often rely on expensive sensors such as lidar or require large amounts of raw sensor data.

(1) AUDI AG, Department Sensorfusion MapLearning, Ingolstadt, Germany. (2) Technical University of Munich, Chair for Computer Aided Medical Procedures, Munich, Germany. F. Tombari is now also affiliated with Google, Zurich, Switzerland.

[Fig. 1: Exemplary semantic edge map reconstruction (bottom) computed from multiple sessions (Session 1, Session 2, Session 3) using semantic object contours (top).]

In this work, we present a novel approach to create detailed and accurate maps of the road infrastructure suitable for autonomous vehicles from crowd-sourced sensor data. The main contribution of our work is the use of 2D semantic object contours as compact features to reconstruct concise and semantically meaningful 3D maps in a crowd-sourcing setup. We define semantic object contours as the outer borders of relevant semantic segments given by a semantic image segmentation. Each such contour typically corresponds to the projection of the outline of a 3D object, which we use as our underlying semantic map representation. Our system relies on a monocular camera as well as standard GPS and vehicle odometry as sensor inputs, which are already available in many
production cars today. Each vehicle contributes map snippets of road sections generated from feature-based visual odometry, all of which are co-registered on a central backend server using visual keypoints as auxiliary localization features. We use onboard image-based semantic segmentation to extract the contours of objects of interest, such as lane markings and pavement signs, as well as the drivable area. Given the registered map snippets, we reconstruct the 3D object boundaries from their semantic 2D contour detections, a concept that can be applied to a wide range of road elements. An exemplary semantic map reconstruction from multiple recording sessions can be seen in Fig. 1.

By combining sparse visual keypoints as auxiliary features for co-registration with semantic object contours as a concise description of the scene semantics, we obtain detailed map reconstructions while keeping the requirements for sensors and network connection small enough for real-world deployment.

2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Macau, China, November 4-8, 2019. 978-1-7281-4003-2/19/$31.00 ©2019 IEEE

II. RELATED WORK

Creating maps of the environment for robotic applications has been studied intensively in the past within the context of Simultaneous Localization and Mapping (SLAM). Given the vast amount of previous work and the relatively unexplored field of crowd-sourced mapping for autonomous vehicles on which we focus in this work, we restrict our review to the most relevant work on visual SLAM, mapping and localization for automated vehicles, and 3D edge reconstruction.

A. Visual SLAM

Visual odometry and SLAM systems using cameras as the primary sensor are widely adopted in robotic applications. In particular, sparse feature-based approaches such as ORB-SLAM [1] are well studied and have been applied to a wide range of problems, including large-scale outdoor scenes. Extensions of the standard single-robot setup towards multiple agents
operating simultaneously or at different times have been presented, both in centralized variants using stereo [2] and visual-inertial [3] formulations and in distributed [4] setups without any dedicated master agent or server. For our multi-session mapping framework, we adopt a centralized architecture similar to existing ones proposed in the literature.

While sparse feature landmarks allow for accurate relocalization, higher-level planning tasks require a richer understanding of the environment. Therefore, semantic reconstruction approaches [5] were developed to build more meaningful maps of the environment. However, the dense reconstruction typically used for semantic mapping is not well suited for crowd-sourcing mapping data because of its prohibitively high computational and bandwidth demands.

Another important aspect of mapping in dynamic environments is long-term map maintenance to reflect semantic changes [6]. Because the lightweight crowd-sourcing architecture of the proposed system allows re-mapping to be performed quickly, we currently do not explicitly model changes in the environment over time.

B. Localization and Mapping for Automated Driving

Traditional methods for acquiring highly detailed road infrastructure maps involve ground survey vehicles, typically equipped with high-accuracy GPS and precise laser scanners [7] or multi-camera systems [8], [9]. To alleviate the need for such expensive survey mapping, several crowd-sourced mapping systems for vehicles that fuse sensor data from multiple sessions and vehicles have been proposed. Such crowd-sourced setups have used dashed lane markings [10], lane boundaries and traffic barriers [11], or lane boundaries and traffic signs [12] for mapping. Targeting the use case of localization in a parking lot, Schuster et al. [13] also presented a radar-based incremental multi-session mapping approach, which, however, did not include meaningful semantic elements of the environment. Particularly for challenging urban environments,
however, these existing approaches for crowd-sourced mapping do not offer the level of detail required for mapping many road elements relevant to the driving task, such as arrows, crosswalks, bus stops, parking spaces, and more.

Recent work in the field of vehicle localization [14], [15] demonstrated that such semantically meaningful map elements, including detailed road markings, road boundaries, and traffic signs, can also be used to accurately and robustly relocalize the vehicle within the map. This enables autonomous vehicles to use semantic map elements not only for navigation and planning but also for relocalization, without requiring any additional features for localization. Our approach can therefore be used as a complementary mapping component to these existing localization methods. While low-level localization features are not strictly necessary for online localization, we still consider them useful for the mapping process, as they provide the accurate inter-session localization required for map reconstruction.

C. Edge Reconstruction

Finally, we want to draw connections between our approach and the field of 3D shape reconstruction. In the context of Structure from Motion, the reconstruction of 3D polyline edge models from edges detected in multiple images has been proposed by Bignoli et al. [16]. The fundamental idea of our reconstruction approach is similar to this work, but instead of image edges we use semantic object contours for reconstruction. The concept of reconstructing the 3D shape of an object from its contour or silhouette observed in multiple images is well known in computer vision as the Visual Hull [17]. Panoptic segmentation [18] can be used to obtain the contours of all scene elements as well as of individual object instances in an image. In this work, we focus on reconstructing planar elements on the road surface that do not occlude each other, which makes it possible to use standard semantic segmentation networks without instance-level awareness instead of the more complex panoptic
segmentation.

III. MULTI-SESSION MAPPING FRAMEWORK

Our crowd-sourced mapping pipeline, depicted in Fig. 2, is composed of a vehicle frontend and a corresponding server-side backend. The vehicle frontend consists of a visual odometry system complemented by semantic contour extraction. The backend is made up of two stages: first, a multi-session merging step that co-registers the different sessions using feature-based alignment, and second, the reconstruction of the semantic edge map. Because the vehicle frontend and the multi-session merging backend are very similar to existing multi-session SLAM approaches, we give only a brief overview of them in the following two subsections before detailing the semantic edge reconstruction process in Section IV.

[Fig. 2: Pipeline for our crowd-sourced 3D edge map reconstruction system, split into the vehicle frontend (left) and server backend (right) components. Individual session submaps, made up of keyframe poses, tracked sparse feature landmarks, semantic contours, and GPS measurements, are transmitted from frontend to backend. Blocks shown: Camera, GPS, Odometry, Sparse Visual Odometry, 2D Semantic Segmentation, 2D Semantic Contours, Submap Optimization, Single-Session Submap, Multi-Session Submaps, Feature Matching, Feature Map Optimization, Multi-Session Feature Map, Ground Surface Estimation, 3D Edge Reconstruction, 3D Edge Map.]

A. Visual Odometry Frontend

The vehicle frontend serves the purpose of generating locally accurate submaps of road sections in order to register the sensor data acquired by multiple vehicles in the backend. We employ feature-based visual odometry similar to the ORB-SLAM [1] frontend, tracking a local map of sparse landmarks created
from matched ORB keypoints. To improve robustness in dynamic street environments and to recover absolute scale, we guide the feature tracking by incorporating vehicle odometry measurements. We do not perform any loop closure detection within a single submap, because loop closures rarely occur in natural driving; instead, we leave this entirely to the multi-session merging in the backend.

For each keyframe, we compute a pixel-wise semantic segmentation using a multi-layer Convolutional Neural Network (CNN). The network has been trained on 20,000 finely annotated images of street scenes from Germany to classify road markings, drivable and non-drivable areas, as well as dynamic objects, static obstacles, vegetation, buildings, and traffic signs. After segmentation, we extract the segment contours as polygonal chains containing each pixel of the contour. We detail the contour extraction in Section IV-A.

For each completed submap, we perform an initial pose graph optimization using consumer-grade GPS measurements and vehicle odometry to reduce local drift and to compute a global anchor pose for the submap. The anchor pose describes the position of the first keyframe of the submap in the global world coordinate frame, with all other keyframes and landmarks parameterized relative to the origin of the submap. After initial pose graph optimization, we refine the structure using robust full bundle adjustment, remove outlier observations, and create additional landmark observations in each keyframe by reprojecting unseen landmarks from covisible keyframes. For each optimized submap, we transmit keyframe poses with tracked landmark observations and semantic contours, sparse 3D landmarks, and GPS measurements to a central backend server, e.g., over cellular data networks.

B. Multi-Session Merging Backend

Given a number of individual session submaps, we perform a pairwise matching of all sessions. For each keyframe in each session, we search for the closest keyframe in every other unprocessed session using the
rough global position of the keyframes. If a sufficiently close matching keyframe has been found, we extract observed landmark points from all covisible keyframes and generate feature matches by matching the ORB descriptors. We verify inlier feature matches and compute a relative pose using RANSAC PnP, which serves as a loop closure constraint. Given the inter-session loop closures between all sessions, we build a multi-session pose graph and optimize it using GPS constraints for global alignment. After initial pose graph relaxation, we optimize the map using robust bundle adjustment over both the intra- and inter-session feature matches and merge duplicate map points from different sessions.

IV. SEMANTIC MAP RECONSTRUCTION

In the previous section, we described our multi-session merging framework for combining crowd-sourced mapping session data into a single coherent metric map. In the following, we detail the reconstruction of the semantic map information. We represent the semantic map using 3D polygon lines describing the boundaries of objects of interest. In this work, we focus on reconstructing the road geometry, including lane markings and the drivable area, and plan to extend the approach to other roadside objects that may be relevant both for navigation and for re-localization of the vehicle.

While our approach for co-registering the individual sessions relies on sparse feature points as auxiliary localization features, we do not require them for the semantic 3D edge reconstruction, which makes the approach general enough to be used with other relocalization methods.

A. Contour Extraction

For the 3D reconstruction, we are interested only in the semantic visual contour of each object, as it gives a concise description of the object. As introduced before, we extract the relevant semantic contours $c_i^n = \{x_1^{n,i}, \ldots, x_{m_{n,i}}^{n,i}\}$, with points $x_k^{n,i} \in \mathbb{R}^2$, for reconstruction from the 2D semantic segmentation of each keyframe $K_i$. To avoid many small contours generated by mispredictions, we pre-filter the segments to keep only those with an area
greater than a threshold $a_{\min}$. In each extracted contour, we discard all invalid points that lie at the image border or share a common border with an occluding object, such as a dynamic vehicle, and split the segment contour into distinct contours at these points. Each filtered contour $c_i^n$ is then simplified by applying the Douglas-Peucker [19] algorithm to reduce the number of individual points $x_k^{n,i}$ required to represent the contour, which also yields an improved, subpixel-accurate representation of it.

[Fig. 3: Ground surface triangle mesh (pink) reconstructed from ground points of aligned keyframe traces. Keyframes colored by session.]

B. Ground Surface Estimation

An issue when reconstructing road scenes from image data is the strong change in perspective between the ground surface and the image plane. This makes the reconstruction of 3D points by triangulating image matches challenging, given that only temporal multi-view stereo, instead of more accurate static stereo information, is available. This is particularly relevant given the inaccurate feature matches from noisy semantic contours.

To overcome this challenge, we make use of the fact that the drivable ground surface can be approximated sufficiently well as a piecewise planar surface. While simple ground planes for inverse perspective mapping are often estimated online on a per-frame basis, this is not beneficial for accurate map reconstruction, because covisible keyframes observing the same object may use different, conflicting ground planes. Instead, we estimate a triangle mesh ground surface for the entire map from individual ground points. We generate these ground points from the keyframe poses of all sessions and the approximately known height of the camera above the ground, taken from an initial extrinsic calibration. Fig. 3 shows such a reconstructed ground surface mesh from multiple aligned sessions. For reconstructing the ground mesh, we first approximate the lateral crossfall of the road by
least-squares fitting line seg
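To make the geometry of Section IV-B concrete, the sketch below derives a ground point from a keyframe pose and an assumed camera height, and back-projects a pixel onto the plane of one ground-mesh triangle by ray-plane intersection, as one would do for inverse perspective mapping. It is a minimal illustration under stated assumptions, not the authors' implementation: the pinhole intrinsics (fx, fy, cx, cy), the camera-to-world rotation `r_wc`, the camera centre `c_w`, the camera-frame up axis `CAM_UP`, and all function names are hypothetical, and the crossfall fitting and mesh construction described in the text are not reproduced.

```python
from typing import Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = Sequence[Sequence[float]]  # row-major 3x3 camera-to-world rotation

# ASSUMPTION: camera frame is x right, y down, z forward, so "up" is -y.
# In a real system this direction would come from the extrinsic calibration.
CAM_UP: Vec3 = (0.0, -1.0, 0.0)


def mat_vec(m: Mat3, v: Vec3) -> Vec3:
    """Multiply a row-major 3x3 matrix by a 3-vector."""
    return (
        m[0][0] * v[0] + m[0][1] * v[1] + m[0][2] * v[2],
        m[1][0] * v[0] + m[1][1] * v[1] + m[1][2] * v[2],
        m[2][0] * v[0] + m[2][1] * v[1] + m[2][2] * v[2],
    )


def sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def add_scaled(a: Vec3, d: Vec3, s: float) -> Vec3:
    """Return a + s * d."""
    return (a[0] + s * d[0], a[1] + s * d[1], a[2] + s * d[2])


def dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a: Vec3, b: Vec3) -> Vec3:
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def ground_point(r_wc: Mat3, c_w: Vec3, cam_height: float) -> Vec3:
    """Hypothetical ground point below a keyframe: move cam_height from the
    camera centre against the camera's up axis expressed in world coordinates."""
    up_w = mat_vec(r_wc, CAM_UP)
    return add_scaled(c_w, up_w, -cam_height)


def backproject_to_triangle_plane(
    pixel: Tuple[float, float],
    fx: float, fy: float, cx: float, cy: float,
    r_wc: Mat3, c_w: Vec3,
    tri: Tuple[Vec3, Vec3, Vec3],
) -> Optional[Vec3]:
    """Intersect the viewing ray of a pixel with the plane of one mesh triangle.

    No point-in-triangle test is done; a full mesh lookup would need one.
    Returns None if the ray is parallel to the plane or meets it behind the camera.
    """
    u, v = pixel
    d_c = ((u - cx) / fx, (v - cy) / fy, 1.0)  # pinhole ray in the camera frame
    d_w = mat_vec(r_wc, d_c)                   # same ray in the world frame
    p0, p1, p2 = tri
    n = cross(sub(p1, p0), sub(p2, p0))        # unnormalised plane normal
    denom = dot(n, d_w)
    if abs(denom) < 1e-12:
        return None
    s = dot(n, sub(p0, c_w)) / denom           # ray parameter at the plane
    if s <= 0.0:
        return None
    return add_scaled(c_w, d_w, s)


if __name__ == "__main__":
    # Hypothetical setup: world x forward, y left, z up; camera 1.5 m above a flat road.
    R_WC = [[0.0, 0.0, 1.0], [-1.0, 0.0, 0.0], [0.0, -1.0, 0.0]]
    C_W = (0.0, 0.0, 1.5)
    TRI = ((0.0, -10.0, 0.0), (50.0, -10.0, 0.0), (0.0, 10.0, 0.0))
    print(ground_point(R_WC, C_W, 1.5))
    print(backproject_to_triangle_plane((640.0, 460.0), 1000.0, 1000.0, 640.0, 360.0, R_WC, C_W, TRI))
```

In the example, a pixel 100 px below the principal point with a focal length of 1000 px gives a ray sloping down by 0.1, so from a camera 1.5 m above a flat road it meets the ground 15 m ahead. Intersecting with a single shared surface, rather than a separate per-frame plane, is what keeps covisible keyframes from disagreeing about where a contour point lies, which is the motivation the section gives for the map-wide mesh.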

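Section IV-A describes splitting each contour at invalid points (image border or occluder-adjacent pixels) and then simplifying it with the Douglas-Peucker algorithm [19]. The following is a minimal pure-Python sketch of both steps for a contour given as a list of pixel coordinates. The names `split_at_invalid` and `douglas_peucker`, the `is_invalid` predicate, the rule of dropping fragments shorter than two points, and the tolerance `epsilon` are all hypothetical (the paper does not state a tolerance). Distances are measured to the chord segment rather than the infinite line, a common variant that also copes with closed contours whose endpoints coincide.

```python
import math
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]


def point_segment_distance(p: Point, a: Point, b: Point) -> float:
    """Euclidean distance from p to the line segment a-b."""
    ax, ay = a
    bx, by = b
    px, py = p
    dx, dy = bx - ax, by - ay
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0.0:  # degenerate chord, e.g. a closed contour
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp to its endpoints.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len2))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def douglas_peucker(points: Sequence[Point], epsilon: float) -> List[Point]:
    """Recursively keep the point farthest from the chord while it exceeds epsilon."""
    if len(points) < 3:
        return list(points)
    a, b = points[0], points[-1]
    index, dmax = 0, -1.0
    for i in range(1, len(points) - 1):
        d = point_segment_distance(points[i], a, b)
        if d > dmax:
            index, dmax = i, d
    if dmax <= epsilon:
        return [a, b]
    left = douglas_peucker(points[: index + 1], epsilon)
    right = douglas_peucker(points[index:], epsilon)
    return left[:-1] + right  # drop the duplicated split point


def split_at_invalid(
    points: Sequence[Point], is_invalid: Callable[[Point], bool]
) -> List[List[Point]]:
    """Split a contour into chains at invalid points; keep chains of >= 2 points."""
    chains: List[List[Point]] = []
    current: List[Point] = []
    for p in points:
        if is_invalid(p):
            if len(current) >= 2:
                chains.append(current)
            current = []
        else:
            current.append(p)
    if len(current) >= 2:
        chains.append(current)
    return chains


if __name__ == "__main__":
    WIDTH = 1280  # hypothetical image width in pixels
    on_border = lambda p: p[0] <= 0 or p[0] >= WIDTH - 1
    contour = [(0, 5), (10, 5), (20, 5.2), (30, 5), (30, 40), (30, 80), (1279, 80)]
    for chain in split_at_invalid(contour, on_border):
        print(douglas_peucker(chain, epsilon=1.0))
```

Splitting before simplifying matters here: if border or occluder points stayed in the chain, the simplification would keep them as vertices and the reconstruction would treat an image cut-off as a real object boundary.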