A Density Map Estimation Model with DropBlock Regularization for Clustered-Fruit Counting

Xiaochun Mai, Xiao Jia, Xiaoling Deng and Max Q.-H. Meng

(Xiaochun Mai, Xiao Jia and Max Q.-H. Meng are with the Robotics, Perception and AI Laboratory at the Department of Electronic Engineering, The Chinese University of Hong Kong, N.T., Hong Kong SAR, China. Xiaoling Deng is with the Department of Electronic Information Engineering, School of Electronic Engineering, South China Agricultural University, Guangdong Province, China.)

Abstract— Modern agricultural robots, such as drones, have been studied for automatic yield estimation in recent years. Fruit counting is a fundamental task in automatic yield estimation, on which significant progress has been achieved by detection-based methods and segmentation-regression-based methods. However, for clustered-fruit counting, the existing methods are limited in localizing small and occluded fruits or in discrete number regression. In addition, it is observed that existing deep-neural-network-based counting methods have high variances in fruit density map estimation. Aiming at solving these two problems and decreasing the regression variance, in this paper we propose a density map estimation model with DropBlock regularization. For evaluating the proposed model, we propose a new Clustered-Fruit dataset. Extensive experiments show that the proposed model is effective and outperforms state-of-the-art counting methods on the Clustered-Fruit dataset. Our dataset is available at Clustered-Fruit.

Keywords— Deep learning, density estimation, counting, fruit yield estimation, agricultural robots

I. INTRODUCTION

In recent years, in order to improve crop yield and transportation scheduling, agricultural robots have been studied to help planters monitor and estimate production yield. Besides ground robots, aerial robots such as drones and quadrotors have also been used in yield estimation. With the advantage of flexibility, drones can fly row by row and collect data in fields. Compared to the traditional manual way of yield sampling and estimation, the modern way of using aerial robots is more efficient, especially for broad fields.

Fig. 1: A drone sampling in an orchard.

Fruit counting is a fundamental yet challenging task for yield estimation. Achieving accurate fruit counting is difficult due to occlusion, illumination variation and the non-uniform distribution of fruits. A key limitation of this task is under-counting of fruits in clustered groups [3]. Intuitively, the reasons for under-counting in clustered groups are (a) scale variation, (b) occlusion, and (c) the varying quantity of fruits.

Significant progress on fruit counting has been achieved by deep learning methods. Therein, detection-based methods have advantages in detecting big fruits and small numbers of fruits per image. However, these methods perform poorly on small and clustered fruits due to localization errors. In addition, segmentation-regression-based methods show good performance on images whose fruit quantities vary little. For clustered fruits, the number of fruits varies greatly, and the quantities to be estimated are discrete numbers. Hence, segmentation-regression-based methods lack robustness on quantity estimation of fruits within clustered groups.

In order to tackle the limitation of occluded-fruit localization, we propose to localize fruits via estimating density maps.
For density map estimation, the ground truth of a fruit location is a Gaussian kernel matrix instead of a four-dimensional coordinate vector of a bounding box. It is easier to learn a matrix than a four-dimensional vector from fruit pixel values. Moreover, the values of a Gaussian kernel matrix are continuous rather than discrete, so regression model fitting becomes easier when continuous numbers are used as ground truth; in this way, the limitation of discrete number regression is tackled at the same time. Inspired by the excellent performance of the Convolutional Neural Network (CNN), we use CNNs for density map estimation.

Theoretically, a deep density map estimation model can overcome the drawbacks of occluded-fruit localization and discrete number regression, but like many regression models it tends to overfit. High variances have been observed in the density maps estimated by existing counting methods. In order to reduce the variance of the density map estimation model, we propose a density map estimation model with DropBlock regularization, and we evaluate the proposed approach on our clustered-fruit dataset collected by a drone in a citrus orchard.

The contributions of this paper are as follows:

- We propose a new dataset for the fruit counting task. Compared to existing datasets, the proposed dataset features a large dataset size, large fruit quantity and high occlusion. The large fruit quantity and high occlusion create difficulty for conventional fruit counting methods.
- We propose a density map estimation model with DropBlock regularization. The proposed model is end-to-end trainable. To the best of our knowledge, this paper is the first to use a density-map-estimation-based model for clustered-fruit counting.

II. RELATED WORK

A. Fruit Counting

Fruit counting approaches can be mainly divided into two categories. The first category is detection-based methods [6]–[9], [30]. Detection-based methods estimate bounding-box locations and predict fruit categories from region proposals. A proposal is considered a true positive if the maximum Intersection over Union (IoU) value between the proposal and a ground-truth bounding box is larger than the IoU threshold. A proposal is considered a false positive if its maximum IoU value is less than the IoU threshold, or if the corresponding object has already been detected. The number of correctly detected fruits is taken as the predicted number of fruits in the corresponding image (a minimal sketch of this matching rule is given at the end of this subsection). For example, the Faster R-CNN model [10] has been widely used in fruit detection and has shown good performance on big-fruit detection under natural illumination. However, due to limited features, detection-based methods suffer from localization errors on small fruits and on clustered fruits with high occlusion. Our proposed method provides another way to localize small and clustered fruits.

The second category is segmentation-based methods [1], [11]–[13], [19]. In these methods, a segmentation algorithm is used to obtain fruit pixels, followed by a detection algorithm or a regression model. For example, a CNN-based segmentation model such as the Fully Convolutional Network (FCN) [20] has been used to segment fruits. After segmentation, a regression model (i.e., a simple CNN with non-linear activation functions) acts as a counting module to estimate the number of fruits [1]. For segments with few fruits, such a regression model works well. However, for segments containing clustered groups with many mutually occluded fruits, a simple regression model has limitations. The pixel counts of clustered groups vary, which introduces a dimension-reduction problem, and the numbers to be estimated are discrete. Hence, a regression model that takes clustered groups as input and discrete numbers as output lacks robustness. In our proposed method, fruit features are regressed to continuous numbers obeying Gaussian distributions.
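As a concrete illustration of the IoU-based matching rule used by the detection-based methods discussed above, the following is a minimal, self-contained Python sketch. It is not the paper's code; the greedy matching order and the 0.5 threshold are our own illustrative assumptions.

```python
# Illustrative sketch of IoU-based true/false-positive matching for
# detection-based counting (hypothetical helper, not the paper's code).

def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def count_true_positives(proposals, gt_boxes, iou_thresh=0.5):
    """A proposal is a TP if its best-matching, still unmatched ground-truth
    box has IoU above the threshold; otherwise it is an FP (low IoU or a
    duplicate of an already-detected fruit)."""
    matched, tp = set(), 0
    for p in proposals:              # ideally sorted by detection confidence
        best_iou, best_j = 0.0, -1
        for j, g in enumerate(gt_boxes):
            v = iou(p, g)
            if v > best_iou:
                best_iou, best_j = v, j
        if best_iou >= iou_thresh and best_j not in matched:
            matched.add(best_j)
            tp += 1                  # correct detection
    return tp                        # predicted fruit count for this image
```

The predicted count for an image is then the number of true positives, mirroring the criterion described above; a confidence-sorted pass over the proposals is the usual convention.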
B. Density Map Estimation

Recently, deep convolutional neural networks have been used for density estimation and crowd counting [22]–[25]. Three-column CNNs with different convolutional receptive fields [25] are used as feature extractors to learn features at different scales; the feature maps are then concatenated and followed by a 1x1 convolutional layer acting as a regressor. More recently, the negative correlation learning (NCL) strategy was used to learn a regressor ensemble [24]. Two networks, D-ConvNet-v1 and D-ConvNet-v2, were designed to evaluate the performance of the NCL strategy, and to enhance feature diversity the branches of D-ConvNet-v2 have convolutional layers with different filter sizes. The structure of our proposed network is inspired by the work of [25].

C. Regularization Methods

To train a neural network and keep it from overfitting, besides data augmentation and cross validation, weight regularization is also a good choice when the amount of training data is small. Regularization methods for neural networks [27]–[29] have been studied that inject noise into the network so that it does not overfit the training data. DropBlock [26] is one of the advanced and effective methods; it drops spatially correlated features from feature maps. In our work, considering that regularization can reduce the variance of a regression model, we take advantage of DropBlock in designing a density map estimation network for clustered-fruit counting.

III. PROPOSED APPROACH

In this section, a deep ensemble model for density map estimation is proposed. Firstly, the problem of clustered-fruit counting is formulated. Secondly, DropBlock regularization is introduced. Finally, an overview of the proposed deep model is given.

A. Problem Formulation

The goal is to estimate the quantity of visible fruits in images. First of all, we denote a training dataset as $\{(X_1, y_1), (X_2, y_2), \ldots, (X_N, y_N)\}$, where $X_i \in \mathbb{R}^{h_1 \times w_1}$, $i = 1, 2, \ldots, N$, is a fruit image and $y_i \in \mathbb{R}$ is the number of fruits in the $i$th image annotated by a human. A counting model is denoted as $F(X; \theta)$, where $X = \{X_1, X_2, \ldots, X_N\}$ and $\theta$ is the set of model parameters to be learned. We denote a fruit density map output by the model $F(X; \theta)$ as $D_i \in \mathbb{R}^{h_2 \times w_2}$. Here $h_1, h_2, w_1, w_2 \in \mathbb{Z}^{+}$, $h_2 \leq h_1$ and $w_2 \leq w_1$. For the $i$th image, the estimated quantity of fruit is $c_i \in \mathbb{R}$, as given in Equation 1:

$$D_i = F(X_i; \theta), \qquad c_i = \sum D_i. \tag{1}$$

We denote the ground-truth density maps as $G_i \in \mathbb{R}^{h_1 \times w_1}$, $i = 1, 2, \ldots, N$. The definition [4] of $G_i$ is given in Equation 2, where $x_l$ is the location vector of a fruit center, $\delta(x - x_l)$ is a unit impulse function, and $g_{\sigma_l}(x)$ is a Gaussian kernel with standard deviation $\sigma_l$. The ground-truth density map $G_i(x)$ of the input image $X_i$ is the summation of normalized Gaussian kernels centered at the fruit centers; that is, each fruit is represented by a normalized Gaussian kernel matrix. The center pixel of a fruit has the highest value in the corresponding Gaussian distribution, and the values of the surrounding pixels decrease with distance from the center. The surrogate fruit quantity $\hat{y}_i$ of the $i$th image (the summation of the ground-truth density map) approximates the number of fruits annotated by a human; formally, $\hat{y}_i = \sum G_i$ and $\hat{y}_i \approx y_i$.

$$G_i(x) = \sum_{l=1}^{y_i} \delta(x - x_l) * g_{\sigma_l}(x) \tag{2}$$
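As a minimal sketch of how Equations 1 and 2 are used in practice, the code below builds a ground-truth density map from annotated fruit centers and recovers the count by summation. This is our own illustrative code, not the authors' implementation; in particular, the fixed kernel width `sigma` is an assumption, since the choice of each $\sigma_l$ is not specified here.

```python
# Illustrative sketch (not the paper's code): build a ground-truth density
# map G_i from annotated fruit centers, following Eq. 2, and recover the
# count by summation as in Eq. 1. A fixed sigma is assumed for simplicity.
import numpy as np
from scipy.ndimage import gaussian_filter

def make_density_map(centers, height, width, sigma=4.0):
    """centers: list of (row, col) fruit-center coordinates."""
    impulses = np.zeros((height, width), dtype=np.float64)
    for r, c in centers:
        if 0 <= r < height and 0 <= c < width:
            impulses[int(r), int(c)] = 1.0        # delta(x - x_l)
    # Smoothing the impulses with a normalized Gaussian kernel keeps the
    # integral of each blob close to 1, so the map sums to the fruit count.
    return gaussian_filter(impulses, sigma=sigma)

# Usage: three annotated fruit centers in a 64x64 patch.
centers = [(10, 12), (30, 40), (31, 44)]
G = make_density_map(centers, 64, 64)
print(round(G.sum(), 3))   # ~3.0, i.e. the surrogate count y_hat_i
```

The paper's notation allows a per-fruit standard deviation $\sigma_l$; the single fixed value above is only for illustration.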
B. DropBlock

DropBlock [26] is one of the advanced regularization methods for dealing with the overfitting problem of deep neural networks. It has been proven effective in networks for image classification, object detection and segmentation. DropBlock drops contiguous regions from a feature map of a layer by using a mask. The mask is generated by setting the elements of some blocks to zero with a predefined drop probability $p_{drop}$ and a block size $s$. Suppose we have a feature map of size $m \times n$. We generate an initial mask $S \in \mathbb{R}^{m \times n}$ whose elements $S_{i,j}$ are sampled from a Bernoulli distribution with probability $\gamma$, as defined in Equation 4. The probability $\gamma$ controls the number of features to drop and is determined by the desired drop probability; Equation 3 gives the relationship between the drop probability and $\gamma$. Taking the sampled entries $S_{i,j}$ as centers and the block size as the side length, blocks are generated in the mask $S$. With the mask $S$, some features are removed by an element-wise product with the feature map, as defined in Equation 5, and the remaining features are then normalized as in Equation 6. Figure 2 shows an example of the DropBlock operation on a feature map of size 16 x 16, with the drop probability and block size set to 0.3 and 3, respectively. According to the empirical experiments in [26], DropBlock can increase accuracy when placed after convolutional layers or in skip connections.

$$\gamma = \frac{p_{drop}}{s^2} \cdot \frac{m \cdot n}{(m - s + 1)(n - s + 1)} \tag{3}$$

$$S_{i,j} \sim \mathrm{Bernoulli}(\gamma) \tag{4}$$

$$H = H \odot S \tag{5}$$

$$H = H \cdot \frac{m \cdot n}{\sum_{i,j} S_{i,j}} \tag{6}$$

Fig. 2: An example of the DropBlock operation: (a) feature map, (b) feature map after DropBlock.
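The following is a minimal NumPy sketch of the DropBlock operation described by Equations 3–6. It is our own illustrative code, not the authors' implementation; for simplicity it operates on a single 2-D feature map and treats each sampled seed position as the top-left corner of its block, a simplification of the centered blocks described in the text.

```python
# Illustrative DropBlock sketch (not the authors' code): zero out s x s
# blocks of a 2-D feature map H and renormalize, following Eqs. 3-6.
import numpy as np

def dropblock(H, p_drop=0.3, s=3, rng=None):
    rng = rng if rng is not None else np.random.default_rng()
    m, n = H.shape
    # Eq. 3: gamma, the per-position probability of seeding a block.
    gamma = (p_drop / s**2) * (m * n) / ((m - s + 1) * (n - s + 1))
    # Eq. 4: Bernoulli-sample seed positions (restricted so blocks fit).
    seeds = rng.random((m - s + 1, n - s + 1)) < gamma
    # Build the binary mask S: 1 = keep, 0 = dropped.
    S = np.ones((m, n), dtype=H.dtype)
    for i, j in zip(*np.nonzero(seeds)):
        S[i:i + s, j:j + s] = 0.0      # grow an s x s block from each seed
    # Eq. 5: element-wise product removes the blocked features.
    H = H * S
    # Eq. 6: rescale so the overall activation magnitude is preserved.
    kept = S.sum()
    return H * (m * n / kept) if kept > 0 else H

# Usage on a 16x16 feature map, as in the Figure 2 example.
H = np.random.rand(16, 16)
H_dropped = dropblock(H, p_drop=0.3, s=3)
```

In a real network this operation is applied per channel during training only and disabled at inference time, as in [26].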
C. Density Map Estimation Network with DropBlock

The overall estimator combines M sub-networks (Equation 7):

$$F(X_i; \theta) = \frac{1}{M} \sum_{j=1}^{M} \mathrm{net}_j(X_i) \tag{7}$$

The multi-column CNN (MCNN) [25] has achieved impressive progress in people counting, and we transfer the MCNN algorithm to the clustered-fruit counting task. However, after training and evaluating the MCNN, we observed that the estimated density maps have high variances; in other words, the elements of the estimated density maps deviate strongly from the elements of the corresponding ground-truth density maps, which shows that the model overfits. Hence, we propose to use DropBlock to reduce the overfitting of the MCNN.

In this section, we propose a deep density map estimation network with DropBlock to estimate the quantity of fruits. Our network is shown in Figure 3. The proposed model consists of three CNNs with different convolutional filter sizes and a 1x1 convolution fusion operator. Each CNN has five convolutional layers, two max pooling layers and two DropBlock layers, and outputs a group of feature maps. The three groups of feature maps are then fused by the fusion operator. In each CNN, the DropBlock layers are located after the pooling layers. The whole network takes RGB images as input and outputs density maps of fruits.

Three factors are considered in the design of our deep density map estimation network. Firstly, we aim to design a network that is as shallow as possible. Secondly, we want the CNNs to learn scale-variant features of fruits. Inspired by the MCNN [25], we use three CNNs whose convolutional layers have different filter sizes as the feature extractor. As shown in Figure 3, the filter sizes of the convolutional layers are 9x9, 7x7 and 1x1 in branch 1; 7x7, 5x5 and 1x1 in branch 2; and 5x5, 3x3 and 1x1 in branch 3. For the parameter settings of the CNNs, we follow the settings of the convolutional layers in [25]; these parameters perform better than the other parameters we tried. Details are shown in Figure 3. Thirdly, we want to reduce the number of active neurons in the feature maps during feature learning, so that the number of effective weights decreases and overfitting is reduced. DropBlock has been proven effective in networks for image classification, object detection and segmentation, and has outperformed recent regularization methods for CNNs. Hence, we use DropBlock in the density map estimation network.

How many DropBlock layers should be used, and where should they be placed in the deep density map estimation network? The previous work [26] has shown that DropBlock placed after convolutional layers increases accuracy. We searched for the best choice for the deep density map network by trying different numbers of DropBlock layers and different positions. For example, we tried applying two DropBlock layers after the convolutional layers, one after the 2nd convolutional layer and the other after the 3rd; we also tried applying one DropBlock layer after the 2nd, the 3rd and the 4th convolutional layer, respectively. Finally, we found that DropBlock layers applied after the pooling layers give the best result. There are two max pooling layers in the deep density map estimation network, and after each of them we apply a DropBlock layer. The proposed network is shown in Figure 3.

Fig. 3: Deep density map estimation network with DropBlock (architecture diagram; per-layer feature map dimensions are not legible in this copy).

We use the mean squared error loss function (Equation 8) to train the proposed network:

$$L(\theta) = \frac{1}{N} \sum_{i=1}^{N} \mathrm{MSE}(D_i, G_i) \tag{8}$$
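To make the architecture and training objective concrete, below is a compact PyTorch sketch of a three-column density map estimator with DropBlock after each pooling layer, a 1x1 fusion layer, and the MSE loss of Equation 8. This is a hedged reconstruction rather than the authors' implementation: the channel widths, the per-layer kernel assignment within each branch, the DropBlock settings, the downsampled ground-truth resolution and the use of torchvision's DropBlock2d are all assumptions, since Figure 3's layer dimensions are not legible in this copy.

```python
# Hedged PyTorch sketch of the three-column density map estimator with
# DropBlock after each pooling layer and a 1x1 fusion layer (Eq. 8 loss).
# Channel widths, kernel assignment and DropBlock settings are illustrative.
import torch
import torch.nn as nn
from torchvision.ops import DropBlock2d  # assumed available in recent torchvision

def branch(k1, k2, k3, ch=16, p_drop=0.3, block=3):
    """One column: 5 conv layers, 2 max-pooling layers, 2 DropBlock layers."""
    pad = lambda k: k // 2
    return nn.Sequential(
        nn.Conv2d(3, ch, k1, padding=pad(k1)), nn.ReLU(inplace=True),
        nn.Conv2d(ch, 2 * ch, k2, padding=pad(k2)), nn.ReLU(inplace=True),
        nn.MaxPool2d(2), DropBlock2d(p_drop, block),
        nn.Conv2d(2 * ch, 2 * ch, k2, padding=pad(k2)), nn.ReLU(inplace=True),
        nn.MaxPool2d(2), DropBlock2d(p_drop, block),
        nn.Conv2d(2 * ch, ch, k2, padding=pad(k2)), nn.ReLU(inplace=True),
        nn.Conv2d(ch, ch // 2, k3, padding=pad(k3)), nn.ReLU(inplace=True),
    )

class DensityNet(nn.Module):
    def __init__(self):
        super().__init__()
        # Three branches with the filter sizes quoted in Section III-C.
        self.b1 = branch(9, 7, 1)
        self.b2 = branch(7, 5, 1)
        self.b3 = branch(5, 3, 1)
        self.fuse = nn.Conv2d(3 * 8, 1, kernel_size=1)  # 1x1 fusion to one density map

    def forward(self, x):
        f = torch.cat([self.b1(x), self.b2(x), self.b3(x)], dim=1)
        return self.fuse(f)

# One training step with the MSE loss of Equation 8; the ground-truth maps
# are assumed to be downsampled to the output resolution (1/4 of the input).
model, loss_fn = DensityNet(), nn.MSELoss()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
images = torch.rand(2, 3, 270, 480)              # a toy RGB batch
gt_maps = torch.rand(2, 1, 270 // 4, 480 // 4)
opt.zero_grad()
loss = loss_fn(model(images), gt_maps)
loss.backward()
opt.step()
print(float(loss))
```

At test time the per-image count is simply the sum over the predicted density map, as in Equation 1; DropBlock is active only in training mode.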
IV. EXPERIMENTS AND RESULTS

A. Clustered-Fruit Dataset

A new fruit dataset, named Clustered-Fruit, was collected by a Bebop-2 drone flying row by row in an experimental citrus orchard under natural illumination. The data was collected during the harvesting period, which is reflected in the visual distribution of mature fruits. The raw data collected by the drone is a video, from which we obtained fruit frames by sampling at a frame rate of 30 fps; there are overlapping fruits between adjacent frames. The resolution of each frame is 1920 x 1080. We labeled each image with the "agdss" annotation tool published by [1]. Each fruit is labeled by drawing a circle on it such that the fruit center lies at the circle center and the fruit pixels are contained in the circle; drawing the circle therefore yields the fruit center coordinates. Two labelers annotated the images, and after the first round of annotation the annotations were checked and adjusted at least twice by the labelers.

Fig. 4: Histograms of the fruit datasets (x-axis: number of fruits per image; y-axis: number of images).

In the proposed dataset, there are 217 annotated images and 13,502 fruits in total, with their centers annotated. As the histograms in Figure 4 show, compared to the existing dataset, our dataset features large dat
