




Resolving Elevation Ambiguity in 1-D Radar Array Measurements using Deep Learning

Jayakrishnan Unnikrishnan and Urs Niesen

The authors are with Qualcomm Flarion Technologies, Inc., Bridgewater, NJ 08807, USA. Email: junnikri, uniesen.

Abstract: Motivated by requirements for future automotive radar, we study the problem of resolving target elevation from measurements by a one-dimensional horizontal radar antenna array. This is a challenging and ill-posed problem, since such measurements contain only indirect and highly ambiguous elevation cues. As a consequence, traditional model-based approaches fail. We instead propose to use a machine-learning-based approach that learns to exploit the subtle elevation cues and prior knowledge of the scene from the data. We design an encoder-decoder structured deep convolutional neural network that takes a radar return intensity image in the range-azimuth plane as input and produces a depth image in the elevation-azimuth plane as output. We train the network with over 200000 radar frames collected in highway environments. Through experimental evaluations, we demonstrate the feasibility of resolving the highly ambiguous elevation information in such environments.

[Fig. 1. Panels: (i) Radar (input), (ii) Lidar (ground truth), (iii) Prediction. Sample results: (i) shows radar return intensity in dB as a function of azimuth (x-axis) and range (y-axis); brighter pixels indicate higher intensity. (ii) and (iii) show depth as a function of azimuth (x-axis) and elevation (y-axis); brighter pixels indicate larger depths. The scene contains a truck, a car, and a roadside embankment, seen as black regions from right to left in (ii) and (iii). Comparing our prediction (iii) with the lidar ground truth (ii) shows that our approach successfully resolves the ambiguous elevation information from horizontal 1-D radar array measurements.]

I. INTRODUCTION

Modern vehicles are usually equipped with radar sensors, used as inputs to advanced driver-assistance systems. Current automotive radar sensors commonly use a one-dimensional horizontal antenna array that provides resolution only in azimuth but not in elevation. However, detailed elevation resolution is a key requirement for future automotive radar [1]-[3]. Consequently, the need arises to try to resolve elevation information from the measurements of such an antenna array.

This task of resolving both azimuth and elevation from the measurements of a 1-D antenna array turns out to be challenging. To quote from [4]: "It is clear that a 2-D array is necessary to retrieve the information of 2-D (i.e., azimuth and elevation) arrival directions of waves impinging on the array." The reasoning behind this statement is that, for a 1-D horizontal antenna array, the radar measurements do not contain any direct and unambiguous information about target elevations. As a consequence, the problem of resolving target elevations from 1-D horizontal antenna array measurements is inherently ill posed.

While solving this problem in general settings is impossible, in this paper we argue that in specific applications it can in fact be solved. To overcome the inherent ambiguity, rich prior information about the scene has to be taken into account to exploit subtle elevation cues hidden in the radar signal. For example, for a forward-facing automotive radar, we expect to see certain structures and objects (such as median dividers, roadside embankments, cars, trucks, ...) in specific areas of the field of view. Objects on the road at larger range tend to have higher elevation.
Taller targets tend to have stronger radar returns. These elevation cues are difficult to formally model, ruling out traditional model-based approaches. Instead, we adopt here a machine-learning-based approach, in which we implicitly learn the relevant elevation cues directly from a large dataset of over 200000 radar frames collected in highway environments. We design an encoder-decoder structured deep convolutional neural network (CNN) and train it using this dataset. The neural network takes a radar frame as its input and predicts a depth map, which assigns to each azimuth-elevation pair a target depth. We use a lidar sensor for ground-truth comparison of the neural network predictions.

Fig. 1 depicts a sample result of our proposed approach. Comparing the predicted depth-map output (produced by the neural network from only the radar input) with the ground-truth depth map from the lidar sensor demonstrates the feasibility of our approach: the neural network successfully learns to use the subtle elevation cues to resolve the ambiguous elevation information from the radar signal.

II. RELATED WORK

Several papers have explored the feasibility of resolving elevation information from 1-D horizontal radar antenna array measurements under relatively restricted conditions using a model-based approach. For static targets above a reflecting ground plane (such as a road), the height can be unambiguously estimated by analyzing the temporal variation in interference caused by the reflection of the radar return from the ground [5]. Also for static targets, the height can alternatively be estimated by comparing the measured radial Doppler velocity of the target with the known (nonzero) velocity of the ego vehicle [6]. Both of these approaches have severe restrictions, such as being applicable only to static targets and a moving ego vehicle. In addition, the multipath-based approach requires analyzing several consecutive radar frames. In contrast, our proposed approach is able to produce height estimates both for static targets (such as guard rails and roadside embankments) and for dynamic targets (such as cars and trucks), regardless of the ego vehicle velocity. Further, our proposed approach uses only the range and azimuth data of a single radar frame.

While the problem of elevation resolution from a 1-D horizontal radar antenna array explored in this paper is inherently ill posed and ambiguous, the situation changes entirely when a 2-D antenna array is used. This renders the problem well posed and allows for unambiguous elevation resolution using model-based approaches. In this form, the so-called 2-D (i.e., azimuth and elevation) direction-of-arrival (DOA) estimation problem has a long history. For example, [7] studies 2-D DOA for two parallel uniform linear antenna arrays. [4] analyzes 2-D DOA using L-shaped antenna arrays. [8], [9] study 2-D DOA for rectangular antenna arrays. In all these approaches, the 2-D nature of the antenna array is crucial.

The problem of estimating target elevation using measurements from a single horizontal radar antenna array is similar in spirit to that of depth estimation from a single monocular image. Both these problems aim to recover an ambiguous additional dimension of the scene by exploiting subtle cues and complicated prior knowledge.
Early work on monocular depth estimation relies on hand-crafted features that are combined with a graphical model [10]. In later work, these hand-crafted features are replaced by non-parametric features learned from a training set of images [11]. More recent work instead uses deep CNNs to extract the features, which are again combined using a graphical model [12], [13]. The need for an explicit graphical model is removed in [14], which combines two deep CNNs to directly regress on the pixel depths. Very recently, [15], [16] proposed encoder-decoder CNN architectures with skip connections to capture features at different scales.

While our application of deep learning to the specific problem of resolving radar elevation ambiguity is novel, deep learning for inference based on radar signals more broadly has been explored by several recent papers, as surveyed in [17]. Some works focus on the use of CNNs for target recognition using either synthetic aperture radar images [18], high range resolution profiles [19], or micro-Doppler signatures [20]-[22]. [23], [24] represent the intensities of radar returns as images in the range-azimuth plane and use them as input to CNNs to recognize objects in the scene. [25] develops a CNN to perform semantic segmentation on radar returns.

III. APPROACH

A. Problem Statement

Our objective is to estimate the range and the 2-D directions of arrival of radar target reflections measured by a horizontal 1-D antenna array. In other words, instead of just estimating range and azimuth pairs for each target, as is done in traditional radar array processing, we aim to resolve range, azimuth, and elevation triplets.

Resolving the elevation dimension is considerably more challenging than the other two dimensions. Indeed, standard radar signal processing techniques are only able to solve for the target range and azimuth. This limitation stems from the fact that, for a horizontal 1-D antenna array, the radar signal contains only indirect and highly ambiguous information about the target elevation. Estimating target elevation is therefore inherently ill posed.

This elevation ambiguity can only be resolved by using prior information and by exploiting subtle elevation cues hidden in the radar signal. Much of these priors and cues are application dependent. In our application of automotive radar in highway environments, these may include the following:

- We expect to see median dividers on the left, the road in the center, and roadside embankments on the right of the field of view. Similarly, for multi-lane roads, we expect trucks to be more likely in the right lane and cars to be more likely in the left lane.
- Using prior knowledge of the ground-plane location, we can infer information on target elevation from the range measurements. Targets on the road with larger range tend to have higher elevation in the field of view (see the geometry sketch after this list).
- We expect taller targets to have stronger radar returns (i.e., larger radar cross sections [26]).
- Since radar measures time of flight, which is a function of the height of the point of reflection, taller targets tend to have blurrier range measurements.
- If we detect the radar signature of a specific object type (such as a car or a truck), we can use strong prior information about the size and shape of that object type.

As is clear from this list, many of these elevation cues are difficult to formally describe or model. This difficulty rules out traditional model-based approaches.
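To make the ground-plane cue in the list above concrete, the following minimal sketch (our own illustration, not part of the paper; the 1.2 m mounting height is an assumed value) computes the elevation angle at which a point on a perfectly flat road appears as a function of its range:

    import numpy as np

    def ground_elevation_angle_deg(r, sensor_height=1.2):
        # Elevation angle (degrees, negative = below the horizon) at which a
        # point on a perfectly flat road at slant range r meters is seen by a
        # sensor mounted sensor_height meters above the road.
        # Illustrative only; the 1.2 m mounting height is an assumption.
        r = np.asarray(r, dtype=float)
        return -np.degrees(np.arcsin(np.clip(sensor_height / r, 0.0, 1.0)))

    # Larger range -> angle closer to 0 deg, i.e., higher in the field of view.
    print(ground_elevation_angle_deg([5, 10, 20, 40]))  # approx. [-13.9, -6.9, -3.4, -1.7]

The angle approaches the horizon (0°) as the range grows, which is precisely the cue referred to in the second list item.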
Instead, we adopt here a machine-learning-based approach, in which we implicitly learn the relevant priors and elevation cues directly from the data. Specifically, we use supervised learning with a dataset of more than 200000 radar frames to train a deep convolutional neural network to regress the target range for each azimuth-elevation pair.

B. Sensors and Preprocessing

Recall that we aim to estimate range, azimuth, and elevation triplets. These triplets can be arranged into a 2-D matrix whose rows correspond to elevation and whose columns correspond to azimuth. Each matrix entry specifies the range or depth of the nearest target in the corresponding direction. We refer to this matrix as a depth map.

For network training, we need to obtain ground-truth information. The natural sensor measuring ground-truth depth maps is a lidar. This sensor uses rotating lasers with different elevation angles, each measuring depth as a function of azimuth.

TABLE I
SENSOR SPECIFICATIONS

                             Radar (input)   Raw lidar   Preproc. lidar (ground truth)
    Range limit              40 m            80 m        40 m
    Range resolution         8 cm            4 cm        4 cm
    Azimuth field of view    90°             360°        90°
    Azimuth resolution       3.7°            0.015°      2.8°
    Elevation field of view  13°             42°         21°
    Elevation resolution     N/A             1.3°        1.3°

[Fig. 2. Lidar depth map before (a) and after (b) ground-depth subtraction. The depth scale is in meters.]

The sensor specifications are given in the first two columns of Table I; further details can be found in Appendix A. Note the difference in elevation resolution between the radar and lidar sensors in Table I. Whereas the lidar has a relatively fine elevation resolution of 1.3°, the radar has no resolution in the elevation direction at all. It is this difference in sensor capabilities that is at the heart of our problem.

The raw radar returns are preprocessed using standard techniques; see Appendix B for the details. After preprocessing, the radar data is in the form of a 512 x 32 matrix with rows representing range and columns representing azimuth. Each matrix entry specifies the intensity of the radar reflection at the corresponding range and azimuth.

The raw lidar data is preprocessed to have fields of view and resolution matching those of the radar sensor, as shown in the third column of Table I; see Appendix C for more details. After preprocessing, the lidar data is in the form of a 16 x 32 depth-map matrix.

The dynamic range of the lidar depth map is typically quite high, because rays from the lower lasers usually travel a shorter distance until they hit the road, while those from the higher lasers travel longer distances. To reduce this dynamic range, we perform a ground-depth subtraction step. Assuming that the ground is perfectly flat and using the known sensor height, we compute the distance from the lidar to the ground in each laser ray direction. We subtract this quantity from the lidar depth-map entries (see Fig. 2). The resulting compensated depth map is used as ground truth for the training phase.
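As a minimal sketch of this ground-depth subtraction step (our own reconstruction under the flat-ground assumption described above; the function and argument names and the max_range fallback are assumptions, not the paper's code):

    import numpy as np

    def ground_depth_subtract(depth_map, elevation_deg, sensor_height, max_range=40.0):
        # Subtract the expected flat-ground depth from each row of a lidar depth map.
        #   depth_map:     (16, 32) array of depths in meters (elevation x azimuth)
        #   elevation_deg: (16,) elevation angle of each row, negative = below horizon
        #   sensor_height: lidar mounting height above the road, in meters
        #   max_range:     fallback for rays that never reach the ground (assumed)
        el = np.radians(np.asarray(elevation_deg, dtype=float))
        ground_range = np.full(el.shape, max_range)
        below = el < 0
        ground_range[below] = np.minimum(sensor_height / np.sin(-el[below]), max_range)
        # The per-row ground depth is subtracted from every azimuth column of that row.
        return np.asarray(depth_map, dtype=float) - ground_range[:, None]

Rows looking downward are compensated by their flat-ground distance, which removes most of the dynamic-range spread visible in Fig. 2(a).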
C. Network Architecture

We use a deep convolutional encoder-decoder neural network trained by supervised learning to solve the problem of predicting a depth map from the radar input image (see Fig. 3).

[Fig. 3. Network architecture. Legend: conv2d; conv2d + downsample; dense + dropout; conv2d + upsample.]

The architecture for the neural network is driven by the unique aspects of the problem. The input radar image is of size 512 x 32, with the first dimension representing range and the second azimuth. The predicted output depth map is of dimension 16 x 32, with the first dimension representing elevation and the second azimuth. This drastic change in dimensions and aspect ratio necessitates an effective downsampling of the radar input image by a factor 32 along the range axis. We accomplish this downsampling through five smaller downsampling steps by a factor 2. Each such step comprises the concatenation of a 3 x 1 convolutional layer with strides (1,1) and a 3 x 3 convolutional layer with strides (2,1). The number of channels is kept constant at 32 throughout this process. This approach of concatenating layers with small filters in order to obtain a wider receptive field, rather than using a single big filter, is inspired by VGGNet [27].

At this stage, the image is of size 16 x 32, the same as the output size. We can therefore use this downsampled image as input to a traditional encoder-decoder architecture inspired by U-Net [28]. We progressively downsample both dimensions to an image size of 4 x 8 using concatenations of two 3 x 3 convolutional layers, as shown in Fig. 3. Each time we reduce the image size, we increase the number of channels by a factor 2, resulting in 128 channels once we reach image size 4 x 8.

At the waist of the network, we use a fully connected layer that takes in 128 x 4 x 8 = 4096 input variables and produces 512 outputs, followed by a 50% dropout layer, and another fully connected layer converting back from 512 to 4096 variables. The use of these fully connected and dropout layers differs from the standard U-Net architecture. The fully connected layers allow the network to learn global features, and the dropout layer improves generalization.

The remainder of the network makes up the decoder. We use the mirror operations from the encoder part to upsample the image back to the desired output size of 16 x 32. We use skip connections between corresponding layers of the encoder and the decoder, as is commonly done in autoencoders, to preserve higher-frequency features (see again Fig. 3). The skip connections are implemented by adding the tensor from the encoder layer to the tensor at the decoder layer.
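The following is a minimal tf.keras sketch of this architecture (our own reconstruction from the description above, not the authors' code; the ReLU activations, 'same' padding, exact placement of strided layers, and the final 1-channel regression head are assumptions):

    import tensorflow as tf
    from tensorflow.keras import layers

    def build_depth_net(input_shape=(512, 32, 1)):
        # Sketch of the encoder-decoder CNN of Sec. III-C. Layer counts and tensor
        # sizes follow the text; activations, padding, and the output head are
        # assumptions. Keras' default kernel initializer is Glorot (uniform).
        inp = layers.Input(shape=input_shape)  # range x azimuth x 1

        # Five range-downsampling steps: a 3x1 conv followed by a 3x3 conv with
        # strides (2, 1); 32 channels throughout. 512 x 32 -> 16 x 32.
        x = inp
        for _ in range(5):
            x = layers.Conv2D(32, (3, 1), padding="same", activation="relu")(x)
            x = layers.Conv2D(32, (3, 3), strides=(2, 1), padding="same", activation="relu")(x)

        # Encoder: two stages of paired 3x3 convs; each stage halves both spatial
        # dimensions and doubles the channels. 16x32x32 -> 8x16x64 -> 4x8x128.
        skips, ch = [], 32
        for _ in range(2):
            skips.append(x)
            ch *= 2
            x = layers.Conv2D(ch, (3, 3), padding="same", activation="relu")(x)
            x = layers.Conv2D(ch, (3, 3), strides=(2, 2), padding="same", activation="relu")(x)

        # Waist: dense 4096 -> 512, 50% dropout, dense 512 -> 4096, reshape back.
        x = layers.Flatten()(x)                              # 128 * 4 * 8 = 4096
        x = layers.Dense(512, activation="relu")(x)
        x = layers.Dropout(0.5)(x)
        x = layers.Dense(4 * 8 * 128, activation="relu")(x)
        x = layers.Reshape((4, 8, 128))(x)

        # Decoder: mirror the encoder; skip connections add the encoder tensor to
        # the corresponding decoder tensor. 4x8x128 -> 8x16x64 -> 16x32x32.
        for skip in reversed(skips):
            ch //= 2
            x = layers.Conv2DTranspose(ch, (3, 3), strides=(2, 2), padding="same", activation="relu")(x)
            x = layers.Add()([x, skip])
            x = layers.Conv2D(ch, (3, 3), padding="same", activation="relu")(x)

        # Linear regression head producing the 16 x 32 depth map.
        out = layers.Conv2D(1, (3, 3), padding="same")(x)
        return tf.keras.Model(inp, out)

Under the settings described in Sec. IV-B below, such a model would be trained with the ℓ2 (mean-squared-error) loss and the Adam optimizer with step size 0.001.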
TABLE II
ABLATION STUDY

    Method                                           RMSE     Parameters
    Ground-depth subtraction only (radar not used)   3.43 m   0
    Proposed method without downsampling layers      2.37 m   4475135
    Proposed method without skip connections         2.36 m   4545043
    Proposed method without dense layers             2.35 m   341523
    Proposed method                                  2.32 m   4545043

IV. EXPERIMENTS

A. Data Collection

We collect radar and lidar data during eight different days over a two-week period on several highways around San Diego, CA. Each day's collect consists of an approximately one-hour drive. Non-highway portions of the dataset are removed to ensure uniformity of conditions. Data from six of the eight days is used to create the training set. One remaining day is used for the validation set and the other for the test set. In all, the datasets for training, validation, and test have 202775, 10000, and 26916 records, respectively.

B. Implementation Details

We use TensorFlow 1.9.0 on a GPU with 15 GB memory, which supports a batch size of 100 training samples. We train our network models to minimize the ℓ2 loss using the ADAM optimizer [29] with a learning step size of 0.001. We initialize network weights using Glorot initialization [30]. We train for at least five epochs and perform early stopping based on the validation loss.

Since our data is collected continuously, consecutive samples are highly correlated. We therefore need fewer traversals through the training dataset. Additionally, we shuffle the data prior to training to decorrelate samples across time.

C. Results

Table II shows the impact of various network architectural choices on the root mean-squared error (RMSE) evaluated on the validation set. The first row shows the RMSE for ground-depth subtraction only, i.e., without using the radar measurements at all. This baseline approach achieves an RMSE of 3.43 m.
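For reference, the RMSE values in Table II can be computed from predicted and ground-truth depth maps as in the following minimal sketch (our own illustration; the argument names and shapes are assumptions based on the 16 x 32 depth-map format described above):

    import numpy as np

    def depth_map_rmse(pred, truth):
        # Root mean-squared error (in meters) between predicted and ground-truth
        # depth maps, e.g., arrays of shape (num_frames, 16, 32). Sketch only.
        pred = np.asarray(pred, dtype=float)
        truth = np.asarray(truth, dtype=float)
        return float(np.sqrt(np.mean((pred - truth) ** 2)))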