




Simultaneous Transparent and Non-Transparent Object Segmentation with Multispectral Scenes

Atsuro Okazawa[1], Tomoyuki Takahata[2], and Tatsuya Harada[3]

[1] Atsuro Okazawa is with the Department of Image Processing Technology, Olympus Corporation, Japan.
[2] Tomoyuki Takahata is with the Department of Mechano-Informatics, Graduate School of Information Science and Technology, The University of Tokyo, Japan.
[3] Tatsuya Harada is with the Department of Mechano-Informatics, Graduate School of Information Science and Technology, The University of Tokyo, Japan, and with RIKEN, Tokyo, Japan.

2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Macau, China, November 4-8, 2019.

Abstract— For an autonomous mobile system, such as an autonomous robot that moves throughout a city, semantic segmentation is important. Performing semantic segmentation under diverse conditions in turn requires (1) a robust ability to recognize objects in low-visibility environments, such as at night, and (2) the ability to recognize objects that transmit visible light, such as the glass and acrylic used in doors and windows. To satisfy these requirements, using RGB images and infrared images simultaneously is considered effective. However, visibility and infrared transmission characteristics differ from object to object; therefore, merely feeding both modalities into a conventional semantic segmentation framework is not sufficient. For example, when a pedestrian is present behind a glass pane, the visible image captures the pedestrian rather than the glass, whereas the infrared image captures the glass. In this research, we propose a new semantic segmentation method with a three-stream structure that focuses on this difference in transmission characteristics. The method extracts not only features that are valid for ordinary non-transparent objects but also features that are effective for recognizing transparent objects, by utilizing the differences between the objects imaged in each modality owing to their transmission characteristics. Furthermore, we constructed a new visible and infrared coaxial dataset, called "coaxials," and demonstrate that we can obtain better segmentation performance compared with conventional methods.

Fig. 1. Examples of images and recognition results from the created dataset: (a) RGB image, (b) IR image, (c) segmentation using only the RGB image, and (d) segmentation using the proposed method. ICNet [1] is used as the encoder-decoder model. (Classes shown: Tree, Aisle, Glass, Road.)

I. INTRODUCTION

Recently, several studies have been conducted on autonomous mobile systems. For autonomous mobile systems to travel indoors and outdoors, the following must be recognizable: (i) movable areas, such as corridors and roads, and (ii) obstacles, such as pedestrians and glass windows. However, two problems arise in semantic segmentation in urban outdoor and indoor settings. One is the recognition of pedestrians in environments with poor visibility under visible light, such as at night. The other is the recognition of objects that transmit visible light, such as the glass and acrylic used in doors and windows. Recognizing transparent objects such as glass is particularly difficult: the visible image can only observe the object behind the glass, or the window frame into which the glass is fitted (Fig. 1(a), (c)). To recognize such dark places and transparent objects, using infrared images is considered effective.

Red-green-blue (RGB) images and infrared images exhibit different characteristics. RGB images are obtained by photographing the energy reflected by visible light, an electromagnetic wave with a wavelength of approximately 400-800 nm, at the surface of an object. Therefore, pedestrians cannot be recognized when there is little ambient light, such as at night.
Meanwhile, long-wave infrared (LWIR) imaging captures infrared light with wavelengths in the 7-13 μm band. LWIR imaging measures the radiant energy of infrared light, which is determined by the surface temperature of an object and its emissivity. Pedestrians generally have a higher temperature than their surroundings and thus radiate more infrared light; consequently, LWIR imaging can capture them clearly even in dark places. Glass, acrylic, and similar materials, which transmit visible light, generally absorb far-infrared light; therefore, only the energy radiated from the transparent object itself can be photographed (Fig. 1(b), (d)). Hence, we consider that multimodal semantic segmentation using infrared information together with RGB images can solve the two above-mentioned problems.

With regard to semantic segmentation, deep neural networks based on convolutional neural networks (CNNs) [2] have achieved great success. In particular, Segnet [3], which has an encoder-decoder structure, was proposed following fully convolutional networks [4] and has become a basic network structure for semantic segmentation in recent years. More recently, the faster ENet method [5] and methods that perform stepwise frequency resolution of images or feature maps within an encoder-decoder model [1], [6], [7] have yielded desirable results.

Previous studies [8], [9], [10], [11] demonstrated that semantic segmentation using visible and infrared images yields good recognition results when recognizing human bodies, which have high infrared emissivity, at night. These studies use either a one-stream structure, in which the visible image and the infrared image are concatenated and input together, or a two-stream structure, in which the visible and infrared images are input separately and their features are subsequently added or concatenated.

For the simultaneous recognition of transparent and non-transparent objects, it is important to design CNNs with the transmission characteristics of objects in mind. For example, when a transparent object such as glass is photographed, the visible image captures the object behind the glass, whereas the infrared image captures the glass itself. In this case, because the information observed by visible light and by infrared light differs, the difference between the two modalities is useful for recognition. Meanwhile, for objects whose reflected or emitted energy can be observed by both visible and infrared light, such as the human body, the characteristics of the spatial pattern are important, irrespective of the modality. However, with conventional one-stream or two-stream CNNs over visible and infrared images, acquiring both the spatial-pattern features and the difference features simultaneously is difficult. Because the one-stream structure does not extract features from the visible image and the infrared image independently, obtaining the difference features is difficult. Meanwhile, because the two-stream structure encodes the visible image and the infrared image independently, the difference features may be acquired; however, depending on the training data, the features of both the difference and the spatial pattern are not sufficiently acquired.
Therefore, in this research, we propose a CNN with a three-stream structure that independently inputs three types of image: the visible image, the infrared image, and the concatenated image of the visible and infrared images. We regard the encoder that inputs the concatenated image as extracting the spatial-pattern features that are effective for non-transparent objects, and the encoders that input the visible image and the infrared image independently as extracting their differences. By adopting a structure that separates these roles, we can sufficiently extract features that are effective for recognizing both transparent and non-transparent objects.

In this study, we verified the above hypothesis through two experiments. In the first experiment, we tested objects whose transmission properties differ by modality, such as glass or pedestrians, in two ways: (1) using only RGB images and (2) using an existing method that combines the information from both modalities. In the second experiment, we applied the proposed method, which is effective for recognizing objects including those whose transmission properties vary under visible and infrared light, and compared the results with the existing methods.

The contributions of this paper are as follows:
- We propose a new three-stream semantic segmentation structure that can recognize transparent and non-transparent objects simultaneously.
- In the simultaneous recognition of transparent and non-transparent objects, the three-stream structure achieves an improved intersection over union (IoU) compared with existing methods.
- We created the "coaxials" dataset for semantic segmentation, consisting of more than 17,000 pairs of visible and infrared coaxial images, including transparent and non-transparent objects under night, evening, indoor, and other illumination conditions.

II. RELATED WORK

The semantic segmentation required for autonomous mobile systems must be robust to various illumination conditions; additionally, real-time performance is required to follow changes in the scene as the system moves. Further, multimodal semantic segmentation is effective in situations where recognition is difficult using only visible images. Existing research related to these requirements of autonomous mobile systems is detailed below.

A. High-quality semantic segmentation

As mentioned in the introduction, Segnet [3], a basic network structure for semantic segmentation with an encoder-decoder structure, has been proposed. Recent studies have also proposed encoder-decoder structures that ensemble multiscale features to represent all frequency information [1], [6]. Yu et al. devised dilated convolution and proposed a method to expand the receptive field without reducing the spatial dimension [12]. Representative methods using dilated convolution include Deeplab v2 [13], v3 [14], and v3+ [7]. Although these methods are effective for improving accuracy, they are difficult to apply directly to real-time systems.
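To make the receptive-field claim concrete, the following is a minimal sketch, assuming PyTorch (the paper does not specify a framework), showing that a dilated convolution enlarges the receptive field while preserving the spatial size of the feature map; the tensor sizes are illustrative only.

```python
import torch
import torch.nn as nn

# A 3x3 convolution with dilation=2 covers a 5x5 neighborhood (larger receptive
# field) while keeping the output spatial size unchanged when padding=dilation.
x = torch.randn(1, 64, 128, 160)  # (batch, channels, H, W), illustrative sizes

standard = nn.Conv2d(64, 64, kernel_size=3, padding=1)             # 3x3 receptive field
dilated = nn.Conv2d(64, 64, kernel_size=3, padding=2, dilation=2)  # 5x5 receptive field

print(standard(x).shape)  # torch.Size([1, 64, 128, 160])
print(dilated(x).shape)   # torch.Size([1, 64, 128, 160]) -- spatial size preserved
```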
B. Fast semantic segmentation

Segnet [3] relinquishes layers to reduce the number of parameters, and ENet [5] is a lightweight network with a reduced parameter count. In addition, ICNet [1] realizes a fast CNN with a multiscale encoder-decoder structure that uses progressively downscaled images as input. These methods improve efficiency considerably; however, their accuracy is a concern.

C. Multimodal semantic segmentation

Several multimodal semantic segmentation methods using CNNs have been proposed. Zhu et al. proposed a CNN architecture that combines visible and distance images [15]. Guo et al. proposed a CNN architecture that merges PET, CT, and MRI images for medical applications [16]. Among methods using infrared images, Ha et al. proposed a CNN structure in which a visible image and an infrared image are merged [9]. In past methods using infrared images, features are extracted from each image and then concatenated or added in an intermediate layer. In the latest research, Wang et al. focused on the convolution operation itself: when the visible image and the distance image are convolved together, pixels at the same distance as the center pixel of the kernel are weighted more strongly [17].

D. Dataset

For image datasets, visible-image datasets such as CamVid [18], Cityscapes [19], and Daimler Urban Segmentation [20] are the mainstream. Datasets of visible and depth images exist as multimodal datasets [21], [22]; further, a dataset for object detection with visible and infrared images [8] and a dataset for semantic segmentation [9] exist. However, none of these datasets explicitly covers capture under various illumination conditions, and transparent objects are not included among the recognition targets. A dataset containing transparent objects captured under various illumination conditions has not yet been released.

E. With multispectral scenes

Using infrared images, pedestrians can be imaged even in poor-visibility conditions such as at night. Previous studies [8], [11], [23] have significantly improved pedestrian detection performance at night by detecting objects using visible and infrared images. In recent years, methods that input visible and infrared images into a CNN structure have been proposed, and their scope has been expanded to general object recognition [9], [10]; both have been successful. However, to the best of our knowledge, research on general object recognition that includes transparent objects has not yet been conducted.

F. Transparent object recognition

The recognition of transparent objects is important for autonomous mobile systems. Hitherto, research on recognizing glass using only visible images has been conducted [24], [25], [26]. Among multimodal methods, an approach that uses distance images [27] has been proposed. Other studies [25], [26] have reported the recognition of transparent objects using CNN structures. However, these methods detect only glass; other general objects with different physical properties are not detected. In this research, we propose a new CNN structure for general object recognition that can detect transparent and non-transparent objects simultaneously.

III. SIMULTANEOUS TRANSPARENT AND NON-TRANSPARENT OBJECT SEGMENTATION

A. Outline of existing methods

Before explaining the features of the proposed network, we present an overview of existing methods that perform semantic segmentation using a visible image and an infrared image, as shown in Fig. 2. They can be classified into three approaches: image concat (Fig. 2(a)), feature concat (Fig. 2(b)), and selective result (Fig. 2(c)).

Fig. 2. Structure comparison of existing multimodal semantic segmentation methods: (a) image concat, (b) feature concat, and (c) selective result.

The image concat method is an encoder-decoder model into which a 4-channel image, formed by concatenating the visible RGB 3-channel image with the infrared 1-channel image, is input, as shown in the following equation (Fig. 2(a)):

\mathbf{y} = g_m(f_m(\mathrm{Concat}(\mathbf{x}_{rgb}, \mathbf{x}_{ir}))),   (1)

where \mathbf{x}_{rgb} is the visible image, \mathbf{x}_{ir} is the infrared image, \mathrm{Concat}(\cdot) is the function that concatenates the input images, f_m is an encoder of multimodal data, g_m is a decoder of multimodal data, and \mathbf{y} is the output.

Next, the feature concat method is a two-stream CNN in which the feature from the visible image is concatenated with the feature from the infrared image at an intermediate layer (Fig. 2(b)). It is expressed by the following equation:

\mathbf{y} = g_c(\mathrm{Concat}(f_{rgb}(\mathbf{x}_{rgb}), f_{ir}(\mathbf{x}_{ir}))),   (2)

where f_{rgb} is an encoder for the visible image, f_{ir} is an encoder for the infrared image, and g_c is a decoder for the concatenated features.

Finally, the selective result method selects, per pixel, the result with the highest probability among the outputs of the softmax functions at the last stage of the decoders for the visible image and the infrared image (Fig. 2(c)):

\mathbf{y} = \max(g_{rgb}(f_{rgb}(\mathbf{x}_{rgb})), g_{ir}(f_{ir}(\mathbf{x}_{ir}))).   (3)
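As an illustration of Eqs. (1)-(3), the following is a minimal sketch of the three baseline fusion strategies, assuming PyTorch; the tiny encoder/decoder modules and channel widths are placeholders for illustration (the paper builds on encoder-decoder models such as Segnet [3] or ICNet [1], not these toy modules).

```python
import torch
import torch.nn as nn

def tiny_encoder(in_ch, feat_ch=32):
    # Stand-in for a Segnet/ICNet-style encoder.
    return nn.Sequential(nn.Conv2d(in_ch, feat_ch, 3, padding=1), nn.ReLU(),
                         nn.Conv2d(feat_ch, feat_ch, 3, padding=1), nn.ReLU())

def tiny_decoder(in_ch, num_classes):
    # Stand-in decoder producing per-pixel class probabilities via softmax.
    return nn.Sequential(nn.Conv2d(in_ch, num_classes, 1), nn.Softmax(dim=1))

class ImageConcat(nn.Module):            # Eq. (1): y = g_m(f_m(Concat(x_rgb, x_ir)))
    def __init__(self, num_classes):
        super().__init__()
        self.f_m, self.g_m = tiny_encoder(4), tiny_decoder(32, num_classes)
    def forward(self, x_rgb, x_ir):
        return self.g_m(self.f_m(torch.cat([x_rgb, x_ir], dim=1)))  # 3ch + 1ch -> 4ch input

class FeatureConcat(nn.Module):          # Eq. (2): y = g_c(Concat(f_rgb(x_rgb), f_ir(x_ir)))
    def __init__(self, num_classes):
        super().__init__()
        self.f_rgb, self.f_ir = tiny_encoder(3), tiny_encoder(1)
        self.g_c = tiny_decoder(64, num_classes)
    def forward(self, x_rgb, x_ir):
        return self.g_c(torch.cat([self.f_rgb(x_rgb), self.f_ir(x_ir)], dim=1))

class SelectiveResult(nn.Module):        # Eq. (3): y = max(g_rgb(f_rgb(x_rgb)), g_ir(f_ir(x_ir)))
    def __init__(self, num_classes):
        super().__init__()
        self.f_rgb, self.f_ir = tiny_encoder(3), tiny_encoder(1)
        self.g_rgb, self.g_ir = tiny_decoder(32, num_classes), tiny_decoder(32, num_classes)
    def forward(self, x_rgb, x_ir):
        # Per-pixel, per-class maximum over the two single-modality softmax outputs.
        return torch.maximum(self.g_rgb(self.f_rgb(x_rgb)), self.g_ir(self.f_ir(x_ir)))
```

For example, `ImageConcat(num_classes=5)(rgb, ir)` accepts an RGB tensor of shape (N, 3, H, W) and an IR tensor of shape (N, 1, H, W) and returns per-pixel class probabilities of shape (N, 5, H, W).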
B. Design guideline of the network structure

As mentioned in the introduction, if the objects to be recognized include substances such as glass that transmit visible light but not infrared light, the existing methods described in Section III-A are ineffective. Because the visible image captures the object behind the glass and the infrared image captures the glass itself, the visible and infrared features extracted from the images differ. In contrast, when the object is a pedestrian, the spectra reflected or radiated from the human body are observed in both modalities, so the shape of the human body appears in both the visible and the infrared image. In this case, the feature of the spatial pattern representing the shape becomes important. Therefore, in consideration of these two characteristics, we propose a three-stream CNN that independently inputs three images: the visible image, the infrared image, and the concatenated image of the visible and infrared images.

C. Proposed network structure

Our proposed three-stream network structure is shown in Fig. 3. It contains two subnetworks: the reciprocal feature network (Fig. 3(a)) and the cooperative feature network (Fig. 3(b)).

Fig. 3. Construction of the proposed method: (a) reciprocal feature network and (b) cooperative feature network.

The reciprocal feature network inputs the visible and infrared images separately and extracts a feature from each (Fig. 3(a)):

\mathbf{v}_{rgb} = f_{rgb}(\mathbf{x}_{rgb}),   (4)

\mathbf{v}_{ir} = f_{ir}(\mathbf{x}_{ir}),   (5)

where \mathbf{v}_{rgb} is the visible feature and \mathbf{v}_{ir} is the infrared feature. The independently extracted features are merged by a concatenation layer at a later stage, followed by a two-stage convolution h:

\mathbf{v}_{s} = h_2(h_1(\mathrm{Concat}(\mathbf{v}_{rgb}, \mathbf{v}_{ir}))).   (6)

The two features are fused by the first-stage convolution applied to the concatenated feature; in the second-stage convolution, dimensional compression is performed so that the number of dimensions matches the output of the cooperative feature network in the subsequent stage. Consequently, the new feature \mathbf{v}_{s} (hereinafter, the reciprocal feature), obtained by properly fusing the two features, is output from the reciprocal feature network. Through training, the reciprocal feature network can determine the encoder weights of each layer such that differences effective for separating transparent and non-transparent objects are extracted.
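The following is a minimal sketch of the reciprocal feature network of Eqs. (4)-(6), again assuming PyTorch; the channel widths and the choice of a 3x3 convolution for h_1 and a 1x1 convolution for the compressing stage h_2 are illustrative assumptions, not the paper's exact layer configuration.

```python
import torch
import torch.nn as nn

class ReciprocalFeatureNetwork(nn.Module):
    """Eqs. (4)-(6): encode RGB and IR separately, then fuse and compress."""
    def __init__(self, rgb_feat=32, ir_feat=32, out_feat=32):
        super().__init__()
        # f_rgb and f_ir: independent single-modality encoders (stand-ins).
        self.f_rgb = nn.Sequential(nn.Conv2d(3, rgb_feat, 3, padding=1), nn.ReLU())
        self.f_ir = nn.Sequential(nn.Conv2d(1, ir_feat, 3, padding=1), nn.ReLU())
        # h_1 fuses the concatenated features; h_2 compresses the dimensionality
        # to match the cooperative feature network's output.
        self.h1 = nn.Sequential(
            nn.Conv2d(rgb_feat + ir_feat, rgb_feat + ir_feat, 3, padding=1), nn.ReLU())
        self.h2 = nn.Sequential(nn.Conv2d(rgb_feat + ir_feat, out_feat, 1), nn.ReLU())

    def forward(self, x_rgb, x_ir):
        v_rgb = self.f_rgb(x_rgb)                                  # Eq. (4)
        v_ir = self.f_ir(x_ir)                                     # Eq. (5)
        v_s = self.h2(self.h1(torch.cat([v_rgb, v_ir], dim=1)))   # Eq. (6)
        return v_s
```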
The cooperative feature network has the same structure as the image concat method: the 4-channel concatenation of the visible and infrared images is encoded, and an effective feature (hereinafter, the cooperative feature) is calculated to recognize non-transparent objects from their spatial patterns.
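Based on the description above and the block labels in Fig. 3 (concatenation, dimensional compression, fusion, prediction), the following is a hedged sketch of how the two subnetworks might be assembled, reusing the ReciprocalFeatureNetwork sketch above; the fusion-by-concatenation, the compression widths, and the prediction head are assumptions for illustration, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class ThreeStreamSegmentation(nn.Module):
    """Assumed assembly: reciprocal feature (Eqs. 4-6) plus a cooperative
    feature from an image-concat-style 4-channel encoder, fused and decoded
    into per-pixel class scores."""
    def __init__(self, num_classes, feat=32):
        super().__init__()
        self.reciprocal = ReciprocalFeatureNetwork(out_feat=feat)  # sketch above
        # Cooperative feature network: same structure as the image concat encoder.
        self.cooperative = nn.Sequential(nn.Conv2d(4, feat, 3, padding=1), nn.ReLU())
        # Fusion of the two feature maps, followed by a simple prediction head.
        self.fusion = nn.Sequential(nn.Conv2d(2 * feat, feat, 3, padding=1), nn.ReLU())
        self.predict = nn.Conv2d(feat, num_classes, 1)

    def forward(self, x_rgb, x_ir):
        v_s = self.reciprocal(x_rgb, x_ir)                        # reciprocal feature
        v_c = self.cooperative(torch.cat([x_rgb, x_ir], dim=1))   # cooperative feature
        fused = self.fusion(torch.cat([v_s, v_c], dim=1))
        return self.predict(fused)                                # per-pixel class scores

# Usage example (shapes only):
# model = ThreeStreamSegmentation(num_classes=5)
# scores = model(torch.randn(1, 3, 256, 320), torch.randn(1, 1, 256, 320))  # (1, 5, 256, 320)
```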