Resolving Elevation Ambiguity in 1-D Radar Array Measurements using Deep Learning

Jayakrishnan Unnikrishnan and Urs Niesen
Qualcomm Flarion Technologies, Inc., Bridgewater, NJ 08807, USA (junnikri, uniesen)
2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Macau, China, November 4-8, 2019

Abstract: Motivated by requirements for future automotive radar, we study the problem of resolving target elevation from measurements by a one-dimensional (1-D) horizontal radar antenna array. This is a challenging and ill-posed problem, since such measurements contain only indirect and highly ambiguous elevation cues. As a consequence, traditional model-based approaches fail. We instead propose a machine-learning-based approach that learns from the data to exploit the subtle elevation cues and prior knowledge of the scene. We design an encoder-decoder structured deep convolutional neural network that takes a radar return intensity image in the range-azimuth plane as input and produces a depth image in the elevation-azimuth plane as output. We train the network with over 200,000 radar frames collected in highway environments. Through experimental evaluations, we demonstrate the feasibility of resolving the highly ambiguous elevation information in such environments.

I. INTRODUCTION

Modern vehicles are usually equipped with radar sensors used as inputs to advanced driver assistance systems. Current automotive radar sensors commonly use a one-dimensional horizontal antenna array that provides resolution in azimuth, but not in elevation. However, detailed elevation resolution is a key requirement for future automotive radar [1]-[3]. Consequently, the need arises to try to resolve elevation information from the measurements of such an antenna array.

This task of resolving both azimuth and elevation from the measurements of a 1-D antenna array turns out to be challenging. To quote from [4]: "It is clear that a 2-D array is necessary to retrieve the information of 2-D (i.e., azimuth and elevation) arrival directions of waves impinging on the array." The reasoning behind this statement is that, for a 1-D horizontal antenna array, the radar measurements do not contain any direct and unambiguous information about target elevations. As a consequence, the problem of resolving target elevations from 1-D horizontal antenna array measurements is inherently ill-posed.

While solving this problem in general settings is impossible, in this paper we argue that in specific applications it can in fact be solved. To overcome the inherent ambiguity, rich prior information about the scene has to be taken into account to exploit subtle elevation cues hidden in the radar signal. For example, for a forward-facing automotive radar, we expect to see certain structures and objects, such as median dividers, roadside embankments, cars, and trucks, in specific areas of the field of view. Objects on the road at larger range tend to have higher elevation. Taller targets tend to have stronger radar returns. These elevation cues are difficult to formally model, ruling out traditional model-based approaches.

Fig. 1: Sample results. (i) Radar input, (ii) lidar ground truth, (iii) prediction. Panel (i) shows radar return intensity in dB as a function of azimuth (x-axis) and range (y-axis); brighter pixels indicate higher intensity. Panels (ii) and (iii) show depth as a function of azimuth (x-axis) and elevation (y-axis); brighter pixels indicate larger depths. The scene contains a truck, a car, and a roadside embankment, seen as black regions from right to left in (ii) and (iii). Comparing our prediction (iii) with the lidar ground truth (ii) shows that our approach successfully resolves the ambiguous elevation information from horizontal 1-D radar array measurements.
Instead, we adopt here a machine-learning-based approach in which we implicitly learn the relevant elevation cues directly from a large dataset of over 200,000 radar frames collected in highway environments. We design an encoder-decoder structured deep convolutional neural network (CNN) and train it using this dataset. The neural network takes a radar frame as its input and predicts a depth map, which assigns a target depth to each azimuth-elevation pair. We use a lidar sensor for ground-truth comparison of the neural network predictions.

Fig. 1 depicts a sample result of our proposed approach. Comparing the predicted depth map, produced by the neural network from only the radar input, with the ground-truth depth map from the lidar sensor demonstrates the feasibility of our approach. The neural network successfully learns to use the subtle elevation cues to resolve the ambiguous elevation information from the radar signal.

II. RELATED WORK

Several papers have explored the feasibility of resolving elevation information from 1-D horizontal radar antenna array measurements under relatively restricted conditions using a model-based approach. For static targets above a reflecting ground plane, such as a road, the height can be unambiguously estimated by analyzing the temporal variation in interference caused by the reflection of the radar return from the ground [5]. Also for static targets, the height can alternatively be estimated by comparing the measured radial Doppler velocity of the target with the known nonzero velocity of the ego vehicle [6]. Both of these approaches have severe restrictions, such as being applicable only for static targets and a moving ego vehicle. In addition, the multipath-based approach requires analyzing several consecutive radar frames. In contrast, our proposed approach is able to produce height estimates both for static targets, such as guard rails and roadside embankments, and for dynamic targets, such as cars and trucks, regardless of the ego-vehicle velocity. Further, our proposed approach uses only the range and azimuth data of a single radar frame.

While the problem of elevation resolution from a 1-D horizontal radar antenna array explored in this paper is inherently ill-posed and ambiguous, the situation changes entirely when a 2-D antenna array is used. This renders the problem well-posed and allows for unambiguous elevation resolution using model-based approaches. In this form, the so-called 2-D (i.e., azimuth and elevation) direction-of-arrival (DOA) estimation problem has a long history. For example, [7] studies 2-D DOA for two parallel uniform linear antenna arrays, [4] analyzes 2-D DOA using L-shaped antenna arrays, and [8], [9] study 2-D DOA for rectangular antenna arrays. In all these approaches, the 2-D nature of the antenna array is crucial.
The problem of estimating target elevation using measurements from a single horizontal radar antenna array is similar in spirit to that of depth estimation from a single monocular image. Both problems aim to recover an ambiguous additional dimension of the scene by exploiting subtle cues and complicated prior knowledge. Early work on monocular depth estimation relies on hand-crafted features that are combined with a graphical model [10]. In later work, these hand-crafted features are replaced by non-parametric features learned from a training set of images [11]. More recent work instead uses deep CNNs to extract the features, which are again combined using a graphical model [12], [13]. The need for an explicit graphical model is removed in [14], which combines two deep CNNs to directly regress on the pixel depths. Very recently, [15], [16] proposed encoder-decoder CNN architectures with skip connections to capture features at different scales.

While our application of deep learning to the specific problem of resolving radar elevation ambiguity is novel, deep learning for inference based on radar signals more broadly has been explored by several recent papers, as surveyed in [17]. Some works focus on the use of CNNs for target recognition using either synthetic aperture radar images [18], high range resolution profiles [19], or micro-Doppler signatures [20]-[22]. [23], [24] represent the intensities of radar returns as images in the range-azimuth plane and use them as input to CNNs to recognize objects in the scene. [25] develops a CNN to perform semantic segmentation on radar returns.

III. APPROACH

A. Problem Statement

Our objective is to estimate the range and the 2-D directions of arrival of radar target reflections measured by a horizontal 1-D antenna array. In other words, instead of just estimating range and azimuth pairs for each target, as is done in traditional radar array processing, we aim to resolve range, azimuth, and elevation triplets.

Resolving the elevation dimension is considerably more challenging than the other two dimensions. Indeed, standard radar signal processing techniques are only able to solve for the target range and azimuth. This limitation stems from the fact that, for a horizontal 1-D antenna array, the radar signal contains only indirect and highly ambiguous information about the target elevation. Estimating target elevation is therefore inherently ill-posed.

This elevation ambiguity can only be resolved by using prior information and by exploiting subtle elevation cues hidden in the radar signal. Many of these priors and cues are application dependent. In our application of automotive radar in highway environments, they may include the following:

- We expect to see median dividers on the left, the road in the center, and roadside embankments on the right of the field of view. Similarly, for multi-lane roads, we expect trucks to be more likely in the right lane and cars to be more likely in the left lane.
- Using prior knowledge of the ground plane location, we can infer information on target elevation from the range measurements: targets on the road with larger range tend to have higher elevation in the field of view.
- We expect taller targets to have stronger radar returns, i.e., larger radar cross sections [26].
- Since radar measures time of flight, which is a function of the height of the point of reflection, taller targets tend to have blurrier range measurements.
- If we detect the radar signature of a specific object type, such as a car or a truck, we can use strong prior information about the size and shape of that object type.

As is clear from this list, many of these elevation cues are difficult to formally describe or model. This difficulty rules out traditional model-based approaches. Instead, we adopt here a machine-learning-based approach in which we implicitly learn the relevant priors and elevation cues directly from the data. Specifically, we use supervised learning with a dataset of more than 200,000 radar frames to train a deep convolutional neural network to regress the target range for each azimuth-elevation pair.

B. Sensors and Preprocessing

Recall that we aim to estimate range, azimuth, and elevation triplets. These triplets can be arranged into a 2-D matrix whose rows correspond to elevation and whose columns correspond to azimuth. Each matrix entry specifies the range, or depth, of the nearest target in the corresponding direction. We refer to this matrix as a depth map.
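To make the depth-map representation concrete, here is a minimal sketch (our illustration, not code from the paper) that rasterizes a set of (range, azimuth, elevation) detections into such a matrix. The 16 × 32 grid and the field-of-view values are taken from the preprocessed-lidar column of Table I below; the helper name triplets_to_depth_map is our own.

import numpy as np

# Illustrative grid matching the preprocessed lidar in Table I:
# 16 elevation rows (~21 deg FOV) x 32 azimuth columns (90 deg FOV).
N_ELEV, N_AZIM = 16, 32
ELEV_FOV, AZIM_FOV = 21.0, 90.0  # degrees
MAX_RANGE = 40.0                 # meters (range limit)

def triplets_to_depth_map(triplets):
    """Rasterize (range_m, azimuth_deg, elevation_deg) triplets into a
    depth map that keeps the nearest target per direction cell."""
    depth = np.full((N_ELEV, N_AZIM), MAX_RANGE)
    for rng, azim, elev in triplets:
        # Map boresight-centered angles to row/column indices.
        row = int((elev / ELEV_FOV + 0.5) * N_ELEV)
        col = int((azim / AZIM_FOV + 0.5) * N_AZIM)
        if 0 <= row < N_ELEV and 0 <= col < N_AZIM:
            depth[row, col] = min(depth[row, col], rng)
    return depth

# Example: one target 25 m ahead, slightly left of and above boresight.
print(triplets_to_depth_map([(25.0, -10.0, 3.0)]).shape)  # (16, 32)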
For network training, we need to obtain ground-truth information. The natural sensor for measuring ground-truth depth maps is a lidar. This sensor uses rotating lasers with different elevation angles, each measuring depth as a function of azimuth. The sensor specifications are given in the first two columns of Table I; further details can be found in Appendix A.

TABLE I: SENSOR SPECIFICATIONS

                            Radar input   Raw lidar   Preproc. lidar ground truth
  Range limit               40 m          80 m        40 m
  Range resolution          8 cm          4 cm        4 cm
  Azimuth field of view     90°           360°        90°
  Azimuth resolution        3.7°          0.015°      2.8°
  Elevation field of view   13°           42°         21°
  Elevation resolution      N/A           1.3°        1.3°

Note the difference in elevation resolution between the radar and lidar sensors in Table I. Whereas the lidar has a relatively fine elevation resolution of 1.3°, the radar has no resolution in the elevation direction at all. It is this difference in sensor capabilities that is at the heart of our problem.

The raw radar returns are preprocessed using standard techniques (see Appendix B for the details). After preprocessing, the radar data is in the form of a 512 × 32 matrix, with rows representing range and columns representing azimuth. Each matrix entry specifies the intensity of the radar reflection at the corresponding range and azimuth.

The raw lidar data is preprocessed to have fields of view and resolution matching those of the radar sensor, as shown in the third column of Table I (see Appendix C for more details). After preprocessing, the lidar data is in the form of a 16 × 32 depth map matrix.

The dynamic range of the lidar depth map is typically quite high, because rays from the lower lasers usually travel a shorter distance until they hit the road, while those from the higher lasers travel longer distances. To reduce this dynamic range, we perform a ground depth subtraction step. Assuming that the ground is perfectly flat and using the known sensor height, we compute the distance from the lidar to the ground in each laser ray direction. We subtract this quantity from the lidar depth map entries (see Fig. 2). The resulting compensated depth map is used as ground truth for the training phase.

Fig. 2: Lidar depth map before (a) and after (b) ground depth subtraction. The depth scale is in meters.
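To illustrate the ground depth subtraction step: under the paper's flat-ground assumption, a ray at elevation angle θ below the horizon from a sensor mounted at height h meets the ground at distance h / sin(-θ). The sketch below computes this per-ray ground distance; the sensor height and the ray angles are placeholder values of ours, not numbers from the paper.

import numpy as np

def ground_depth(elev_angles_deg, sensor_height_m, max_range_m=40.0):
    """Distance from the lidar to a perfectly flat ground plane along each
    laser ray. Rays at or above the horizon never hit the ground, so their
    ground depth is clipped to the sensor's range limit."""
    theta = np.deg2rad(np.asarray(elev_angles_deg, dtype=float))
    dist = np.full_like(theta, max_range_m)
    down = theta < 0  # rays pointing below the horizon
    dist[down] = np.minimum(sensor_height_m / np.sin(-theta[down]), max_range_m)
    return dist

# Compensate a 16 x 32 lidar depth map (hypothetical 1.5 m sensor height,
# 16 rays spanning the ~21 deg preprocessed elevation FOV).
elev = np.linspace(-10.5, 10.5, 16)
compensate = lambda depth_map: depth_map - ground_depth(elev, 1.5)[:, None]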
C. Network Architecture

We use a deep convolutional encoder-decoder neural network, trained by supervised learning, to solve the problem of predicting a depth map from the radar input image (see Fig. 3). The architecture of the neural network is driven by the unique aspects of the problem.

Fig. 3: Network architecture. (Layer legend: conv2d, conv2d downsample, dense, dropout, conv2d upsample.)

The input radar image is of size 512 × 32, with the first dimension representing range and the second azimuth. The predicted output depth map is of dimension 16 × 32, with the first dimension representing elevation and the second azimuth. This drastic change in dimensions and aspect ratio necessitates an effective downsampling of the radar input image by a factor of 32 along the range axis. We accomplish this downsampling through five smaller downsampling steps, each by a factor of 2. Each such step comprises the concatenation of a 3 × 1 convolutional layer with strides 1 × 1 and a 3 × 3 convolutional layer with strides 2 × 1. The number of channels is kept constant at 32 throughout this process. This approach of concatenating layers with small filters to obtain a wider receptive field, rather than using a single big filter, is inspired by VGGNet [27].

At this stage, the image is of size 16 × 32, the same as the output size. We can therefore use this downsampled image as input to a traditional encoder-decoder architecture inspired by U-Net [28]. We progressively downsample both dimensions to an image size of 4 × 8 using concatenations of two 3 × 3 convolutional layers, as shown in Fig. 3. Each time we reduce the image size, we increase the number of channels by a factor of 2, resulting in 128 channels once we reach image size 4 × 8.

At the waist of the network, we use a fully connected layer that takes in the 128 · 4 · 8 = 4096 input variables and produces 512 outputs, followed by a 50% dropout layer and another fully connected layer converting back from 512 to 4096 variables. The use of these fully connected and dropout layers differs from the standard U-Net architecture. The fully connected layers allow the network to learn global features, and the dropout layer improves generalization.

The remainder of the network makes up the decoder. We use the mirror operations from the encoder part to upsample the image back to the desired output size of 16 × 32. We use skip connections between corresponding layers of the encoder and the decoder, as is commonly done in autoencoders, to preserve higher-frequency features (see again Fig. 3). The skip connections are implemented by adding the tensor from the encoder layer to the tensor at the decoder layer.
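The following is a minimal tf.keras sketch of this architecture, reconstructed from the description above. The layer sequence follows Fig. 3, but the ReLU activations, the transposed-convolution upsampling, and the final single-channel output layer are our assumptions rather than details confirmed by the paper.

import tensorflow as tf
from tensorflow.keras import layers, Model

def conv(x, ch, kernel, strides):
    return layers.Conv2D(ch, kernel, strides=strides, padding='same',
                         activation='relu')(x)

inp = layers.Input(shape=(512, 32, 1))  # range x azimuth radar intensity

# Range-axis downsampling: five factor-2 steps, each a 3x1 conv (stride 1x1)
# followed by a 3x3 conv (stride 2x1), with 32 channels throughout.
x = inp
for _ in range(5):
    x = conv(x, 32, (3, 1), (1, 1))
    x = conv(x, 32, (3, 3), (2, 1))
# x is now 16 x 32 x 32, matching the output spatial size.

# U-Net-style encoder: two 3x3 convs per stage, halving both spatial
# dimensions and doubling the channels (32 -> 64 -> 128).
skips = []
for ch in (64, 128):
    skips.append(x)
    x = conv(x, ch, (3, 3), (1, 1))
    x = conv(x, ch, (3, 3), (2, 2))
# x is now 4 x 8 x 128.

# Waist: dense 4096 -> 512, 50% dropout, dense 512 -> 4096.
x = layers.Flatten()(x)              # 4 * 8 * 128 = 4096
x = layers.Dense(512, activation='relu')(x)
x = layers.Dropout(0.5)(x)
x = layers.Dense(4096, activation='relu')(x)
x = layers.Reshape((4, 8, 128))(x)

# Decoder: mirror the encoder, with additive skip connections.
for ch, skip in zip((64, 32), reversed(skips)):
    x = layers.Conv2DTranspose(ch, (3, 3), strides=(2, 2), padding='same',
                               activation='relu')(x)
    x = layers.Add()([x, skip])
    x = conv(x, ch, (3, 3), (1, 1))

# Final 16 x 32 depth map (one channel).
out = layers.Conv2D(1, (3, 3), padding='same')(x)
model = Model(inp, out)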
IV. EXPERIMENTS

A. Data Collection

We collect radar and lidar data during eight different days over a two-week period on several highways around San Diego, CA. Each day's collect consists of an approximately one-hour drive. Non-highway portions of the dataset are removed to ensure uniformity of conditions. Data from six of the eight days is used to create the training set; one remaining day is used for the validation set and the other for the test set. In all, the datasets for training, validation, and test contain 202,775, 10,000, and 26,916 records, respectively.

B. Implementation Details

We use TensorFlow 1.9.0 on a GPU with 15 GB of memory, which supports a batch size of 100 training samples. We train our network models to minimize the ℓ2 loss using the ADAM optimizer [29] with a learning step size of 0.001. We initialize the network weights using Glorot initialization [30]. We train for at least five epochs and perform early stopping based on the validation loss.

Since our data is collected continuously, consecutive samples are highly correlated. We therefore need fewer traversals through the training dataset. Additionally, we shuffle the data prior to training to decorrelate samples across time. (A minimal sketch of this training setup is given at the end of this section.)

C. Results

TABLE II: ABLATION STUDY

  Method                                            RMSE     Parameters
  Ground depth subtraction only (radar not used)    3.43 m   0
  Proposed method without downsampling layers       2.37 m   4,475,135
  Proposed method without skip connections          2.36 m   4,545,043
  Proposed method without dense layers              2.35 m   341,523
  Proposed method                                   2.32 m   4,545,043

Table II shows the impact of various network architectural choices on the root mean squared error (RMSE), evaluated on the validation set. The first row shows the RMSE for ground depth subtraction only, i.e., without using the radar measurements at all. This baseline approach achieves an RMSE of 3.43 m. Our full proposed method reduces this to an RMSE of 2.32 m. The table indicates that the downsampling layers (replaced with standard image downsampling in the ablation study), the dense layers, and the skip connections each contribute a small but non-negligible reduction in RMSE. The third column of the table gives the number of trainable parameters for each of these variants; it indicates that removing the dense layers results in a significant reduction in the number of trainable parameters.
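The training setup of Section IV-B can be summarized in the following hedged sketch, reusing the model from the architecture sketch above. The placeholder data arrays, the epoch upper bound, and the early-stopping patience are our assumptions; note that Keras Conv2D and Dense layers default to Glorot-uniform initialization, consistent with [30].

import numpy as np
import tensorflow as tf

# Placeholder data in the shapes described in Section III-B.
train_x = np.random.rand(1000, 512, 32, 1).astype('float32')
train_y = np.random.rand(1000, 16, 32, 1).astype('float32')
val_x, val_y = train_x[:100], train_y[:100]

# L2 (mean squared error) loss, ADAM with step size 0.001.
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
              loss='mse')

# Early stopping on validation loss; the paper trains for at least five
# epochs before stopping.
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor='val_loss', patience=2, restore_best_weights=True)

# Shuffling decorrelates the temporally adjacent frames of the drive logs.
model.fit(train_x, train_y,
          batch_size=100,
          epochs=20,            # upper bound; early stopping ends sooner
          shuffle=True,
          validation_data=(val_x, val_y),
          callbacks=[early_stop])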