




Generating an image of an object's appearance from somatosensory information during haptic exploration

Kento Sekiya, Yoshiyuki Ohmura and Yasuo Kuniyoshi

Kento Sekiya is with the Faculty of Engineering, the University of Tokyo, Japan. sekiya@isi.imi.i.u-tokyo.ac.jp
Yoshiyuki Ohmura and Yasuo Kuniyoshi are with the Graduate School of Information Science and Technology, the University of Tokyo, Japan. {ohmura, kuniyoshi}@isi.imi.i.u-tokyo.ac.jp

Abstract: Visual occlusions caused by the environment or by the robot itself can be a problem for object recognition during manipulation by a robot hand. Under such conditions, tactile and somatosensory information are useful for object recognition during manipulation. Humans can visualize the appearance of invisible objects from only the somatosensory information provided by their hands. In this paper, we propose a method to generate an image of an invisible object's posture from the joint angles and touch information provided by robot fingers while touching the object. We show that the object's posture can be estimated from the time series of the joint angles of the robot hand via regression analysis. In addition, conditional generative adversarial networks can generate an image that shows the appearance of the invisible objects from their estimated postures. Our approach enables user-friendly visualization of somatosensory information in remote control applications.

I. INTRODUCTION

Object and environmental recognition are crucial processes for object handling in the real world. Progress in computer vision has enabled robots to detect and recognize objects. Additionally, computer vision is helpful in shape recognition and pose estimation for the purposes of robotic manipulation. However, computer vision is often useless during object manipulation because the robot hand or the surrounding environment hides part or the entirety of the object. In such situations, visual processing of the changes in the position and pose of an object that has been touched by the robot becomes difficult.

Humans can recognize and manipulate objects in situations where the visual information has been lost, e.g., in the dark or when the object is in a pocket. Klatzky et al. showed that humans can recognize the type of an object with only a few touches [1]. Furthermore, humans seem to be able to visualize an individual object's information during haptic exploration [2]. While somatosensory information mainly consists of self-motion and posture-related information, humans frequently pay attention to the object's posture and pose rather than their hand's pose. Because the object's posture and pose are more important than self-motion during manipulation, this attention bias is reasonable. However, the method used to extract the object's information from the somatosensory information is poorly understood. We believe that this ability is crucial for effective object manipulation.

Fig. 1: System used to match an image to somatosensory information via an object's posture.

In this paper, we show that the postures of several known objects can be estimated from time-series somatosensory information and provide a model that generates an image of the appearance of sample objects during haptic exploration. We propose a method that combines regression networks with conditional generative adversarial networks (cGANs) [3].
Regression networks estimate an object's pose from somatosensory information, and we evaluate how much of the object pose information the time-series hand data contains. A cGAN is a generative model that generates an image of an object corresponding to that object's pose. We also evaluate whether or not the generated image shows the object's pose correctly.

Our proposed approach can be used to complement the visual information of objects when they are covered by the surrounding environment. The robot can present the somatosensory information as an image that a human can understand easily, and our approach enables user-friendly visualization of somatosensory information in remote control applications.

II. Related work

A. Object recognition

In the computer vision field, high-level object recognition has been achieved. Through the use of deep neural networks, techniques for feature extraction from images have improved, and the acceleration of the processing time has enabled real-time object recognition [4][5][6][7][8].

In the neuroscience field, the ability of humans to recognize objects by touch has often been discussed. Hernández-Pérez et al. showed that tactile object recognition generates patterns of activity in a multisensory area that is known to encode objects [2]. Monaco et al. also showed that the area of the brain related to visual recognition is activated during haptic exploration of shapes [9]. Furthermore, the relationship between visual perception and object manipulation has also been discussed in recent years [10]. Therefore, it is believed that humans can imagine visual information from the tactile information acquired during haptic exploration.

B. Image generation

Generative modeling has been studied in both the computer vision and natural language processing fields. Recently, deep neural networks have made a major contribution to image generation using generative modeling. Examples of the deep generative models that have been developed include the variational autoencoder (VAE) [11] and generative adversarial networks (GANs) [12]. GANs include two networks, known as the generator and the discriminator, and the generator can generate high-resolution images that we cannot discriminate from real images, but GANs have a problem with training instability. To solve this problem, various studies have proposed improved GAN models [3][13][14]. In this paper, we have focused on cGANs [3], which can control the generated images using a conditional vector. In cGANs, conditional vectors are merged into the inputs of both the generator and the discriminator, so the generator can learn weights that represent images that correspond to conditional vectors.

Fig. 2: Model composed of regression nets and a cGAN. The conditional labels of the cGAN are constructed from the object's pose estimated by regression. "mlp" means multi-layer perceptron and the numbers are layer sizes (regression nets: mlp(145,50,10,2); generator: mlp(100,256,512,1024,16384); discriminator: mlp(16384,512,512,512,1)).
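As an illustration of how a conditional label can be merged into the inputs of both networks, the following is a minimal Keras sketch of the cGAN part of Fig. 2. The layer sizes follow the figure; the use of a label Embedding merged by element-wise multiplication, the activations, and the optimizer are assumptions about details not spelled out in the text.

```python
# Minimal sketch of the cGAN in Fig. 2 (merge strategy and activations are assumptions).
from tensorflow import keras
from tensorflow.keras import layers

NUM_CLASSES = 10        # 10 discrete pose classes used as conditional labels
LATENT_DIM = 100        # generator input size from Fig. 2
IMG_PIXELS = 128 * 128  # 16384, flattened gray-scale image

def build_generator():
    noise = keras.Input(shape=(LATENT_DIM,))
    label = keras.Input(shape=(1,), dtype="int32")
    # Assumption: the label is embedded to the latent size and merged by
    # element-wise multiplication, so the merged input stays 100-dimensional.
    emb = layers.Flatten()(layers.Embedding(NUM_CLASSES, LATENT_DIM)(label))
    x = layers.multiply([noise, emb])
    for units in (256, 512, 1024):            # mlp(100,256,512,1024,16384)
        x = layers.Dense(units, activation="relu")(x)
    img = layers.Dense(IMG_PIXELS, activation="tanh")(x)  # assumes images scaled to [-1, 1]
    return keras.Model([noise, label], img, name="generator")

def build_discriminator():
    img = keras.Input(shape=(IMG_PIXELS,))
    label = keras.Input(shape=(1,), dtype="int32")
    # Assumption: the label embedding is merged with the flattened input image.
    emb = layers.Flatten()(layers.Embedding(NUM_CLASSES, IMG_PIXELS)(label))
    x = layers.multiply([img, emb])
    for units in (512, 512, 512):             # mlp(16384,512,512,512,1)
        x = layers.Dense(units, activation="relu")(x)
    out = layers.Dense(1, activation="sigmoid")(x)  # real or fake?
    return keras.Model([img, label], out, name="discriminator")

generator = build_generator()
discriminator = build_discriminator()
discriminator.compile(optimizer="adam", loss="binary_crossentropy")

# Combined model used to update the generator only (standard cGAN training loop).
discriminator.trainable = False
noise_in = keras.Input(shape=(LATENT_DIM,))
label_in = keras.Input(shape=(1,), dtype="int32")
combined = keras.Model(
    [noise_in, label_in],
    discriminator([generator([noise_in, label_in]), label_in]))
combined.compile(optimizer="adam", loss="binary_crossentropy")
```

Training would then alternate between a discriminator update on batches of real and generated images and a generator update through the combined model; with binary cross-entropy this corresponds to the adversarial losses given later in Section III-C.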
III. Methods

A. Overview

To generate an image of an object's appearance from somatosensory information during haptic exploration with supervised learning, it is necessary to collect a data set composed of images of the object's appearance and the somatosensory information recorded during haptic exploration. However, in the real world, the robotic hand generally covers objects during haptic exploration, so it is difficult to collect the object image and the somatosensory information simultaneously. We propose a system that matches images to the somatosensory information via the object's posture, which is measured using a rotation sensor. We collect the somatosensory information and the object posture data simultaneously, and collect the object posture and image data simultaneously. Finally, we match the images to the somatosensory information, as shown in Fig. 1.

Fig. 2 shows the model used to generate an image of an object's appearance from somatosensory information during haptic exploration. To determine whether an object's information can be extracted from somatosensory information alone during this exploration, we used regression nets that estimate an object's pose from the somatosensory information. The cGAN trains a generator that generates images from noise and conditional vectors that are constructed from the estimated object's pose.

B. Regression nets

We used regression nets to extract the object's pose from the somatosensory information. The regression nets were trained using a set of object postures and the corresponding somatosensory information, and estimated the object's pose. Posture data are cyclic: the object reaches the same posture again after rotating through 360°. Therefore, when the regression nets are trained, raw posture data cannot be used to calculate the minimum square error. We thus used the cosine and the sine of the posture as the outputs of the regression nets.
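A minimal Keras sketch of the regression nets is given below. The mlp(145,50,10,2) sizes come from Fig. 2; we assume the 145-dimensional input corresponds to five steps of the 29-dimensional hand data described later in Sections IV-B and V-A, and the activations and optimizer are our own choices.

```python
# Sketch of the regression nets mlp(145,50,10,2) from Fig. 2.
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

INPUT_DIM = 145   # assumed: 5 time steps x 29-dimensional hand data

regression_net = keras.Sequential([
    layers.Dense(50, activation="relu", input_shape=(INPUT_DIM,)),
    layers.Dense(10, activation="relu"),
    layers.Dense(2),                      # outputs (cos, sin) of the posture
])
# Minimum square error on (cos, sin) avoids the 0/360-degree discontinuity.
regression_net.compile(optimizer="adam", loss="mse")

def posture_to_target(theta_deg):
    """Encode a cyclic posture angle as its cosine and sine."""
    theta = np.deg2rad(theta_deg)
    return np.stack([np.cos(theta), np.sin(theta)], axis=-1)

def target_to_posture(cos_sin):
    """Recover the angle in degrees from a predicted (cos, sin) pair."""
    return np.rad2deg(np.arctan2(cos_sin[..., 1], cos_sin[..., 0])) % 360.0
```

In this reading, the network predicts a point on the unit circle and atan2 maps it back to an angle; whether the predicted vector is normalized before this step is not stated in the paper.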
C. Conditional generative adversarial networks

A cGAN is composed of generator networks and discriminator networks. In our experiment, a conditional label is a number in the range 0 to 9 that classifies one round of the object's posture into 10 discrete classes. When there are too many classes, we believe that the small quantity of training data per class contributes to the instability of the cGAN's learning. When there are too few classes, various images of the object's poses are included in a single class, so a conditional label cannot be used to control the correct image of the object's pose.

L_D is the loss function of the discriminator and L_G is the loss function of the generator, as described by (1) and (2), where x represents real images, y is a conditional label constructed from the estimated object poses, and z is random noise. The discriminator minimizes L_D, which means that it learns to discriminate the real images from the generated images correctly. The generator minimizes L_G, which means that it learns to generate images that the discriminator recognizes as real.

L_D = -\mathbb{E}_{x \sim p_{\mathrm{data}}}[\log D(x|y)] - \mathbb{E}_{z \sim p_z}[\log(1 - D(G(z|y)))] \qquad (1)

L_G = -\mathbb{E}_{z \sim p_z}[\log D(G(z|y))] \qquad (2)

D. Evaluation of the generated images

To evaluate whether or not the generated images express the object's appearance correctly, we compare the image pixels of the generated images with those of the real images. The cGAN generates an image that corresponds to a conditional label, and we calculate the pixel loss between this image and the real images of each of the 10 classes. If the class with the smallest loss corresponds to the input label or to a label on either side of it, the generated image expresses the object's appearance correctly.

E. Implementation

We implemented the regression nets and the cGAN using Keras [15], which is a neural network library in Python. The cGAN was trained using the DGX-1 system (NVIDIA), which contains eight Pascal P100 graphics processing units (GPUs).

IV. Data collection

A. Hardware setup

Fig. 3: Experimental setup. The robotic hand mounted on the fixed robot arm touches the object at random. The stereo camera captures images of the object.

Fig. 4: Degrees of freedom of the robotic hand. The fingers have 22 degrees of freedom and the wrist has two degrees of freedom. The robotic hand has five fingers equipped with touch sensors on the fingertips.

Fig. 3 shows the experimental hardware setup. We used a robotic arm (LBR iiwa 14 R820, KUKA) that has seven degrees of freedom and a robotic hand (Shadow Dexterous Hand E Series, Shadow Robot Company) that has 24 degrees of freedom (Fig. 4). The joint angles of the robotic arm are all set at fixed positions. The robotic hand has five fingers that are equipped with touch sensors on the fingertips. The touch sensors are Pressure Sensor Tactiles (PSTs), which are single-region sensors.

The test objects are set on a horizontal table and their positions are fixed. They rotate around a single pivot, and their angular positions are measured using a rotary encoder (MAS-14-262144N1, Micro Tech Laboratory). A stereo camera (ZED, Stereolabs) is used to take photographs of the objects. The object images are gray-scale images with a size of 128 × 128. We used three objects: a regular square prism, an elliptical cylinder, and a regular triangular prism (Fig. 5). The regular square prism reaches the same pose after rotating through 90°, the elliptical cylinder after 180°, and the regular triangular prism after 120°.

Fig. 5: Three objects used in the experiments. The left object is a regular square prism, the middle object is an elliptical cylinder, and the right object is a regular triangular prism.

B. Haptic exploration using the robotic hand

Fig. 6: Haptic exploration with the robotic hand.

We controlled the robotic hand remotely using a glove (CyberGlove II, CyberGlove Systems), and the robotic hand touched the objects at random (Fig. 6). We collected five tactile values and 24 joint angles of the robotic hand at a 10 Hz cycle, and then merged the tactile and somatosensory data into 29-dimensional hand data. The hand data generated when the hand touched the objects at two or more points were extracted. To use the time-series information of the hand data, we merged the extracted hand data with several steps of hand data recorded before and after the extracted hand data were acquired. We collected 3000 extracted somatosensory data samples for each object.
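As a concrete illustration of this preprocessing, the sketch below selects frames with at least two contact points and builds fixed-length windows around them. The frame layout (24 joint angles followed by five tactile values), the contact threshold, and the five-step window are illustrative assumptions; five steps is the setting that performs best in Section V-A.

```python
# Sketch of the hand-data preprocessing described in Sec. IV-B.
# Frame layout, touch threshold and window size are assumptions.
import numpy as np

JOINTS = 24                    # joint angles per 10 Hz frame
TACTILE = 5                    # one PST value per fingertip
FRAME_DIM = JOINTS + TACTILE   # 29-dimensional hand data
TOUCH_THRESHOLD = 0.1          # hypothetical "finger is in contact" threshold
WINDOW_STEPS = 5               # 5 steps -> 145-dimensional regression input

def extract_windows(frames, tactile_cols=slice(JOINTS, FRAME_DIM)):
    """Return flattened windows centered on frames with >= 2 contact points.

    frames: array of shape (T, 29) recorded at 10 Hz.
    """
    frames = np.asarray(frames, dtype=np.float32)
    half = WINDOW_STEPS // 2
    windows = []
    for t in range(half, len(frames) - half):
        contacts = np.sum(frames[t, tactile_cols] > TOUCH_THRESHOLD)
        if contacts >= 2:   # keep only frames touching the object at 2+ points
            # concatenate the frames around the touch frame into one vector
            windows.append(frames[t - half : t + half + 1].reshape(-1))
    if not windows:
        return np.empty((0, FRAME_DIM * WINDOW_STEPS), dtype=np.float32)
    return np.stack(windows)
```

With WINDOW_STEPS = 5, each window spans the 0.2 s before and after the touch frame and has 5 × 29 = 145 dimensions, matching the regression input in Fig. 2.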
V. Experiments

A. Pose estimation

We trained the regression nets on the somatosensory information and the corresponding object postures. The somatosensory information was split into two sets: we used 1500 samples to train the regression nets and the other 1500 samples to test the regression model and estimate the object poses. To determine how many steps of hand data should be merged with the extracted hand data, we evaluated the minimum square error of the regression for time windows of five different sizes.

Fig. 7: Comparison of the transitions of the minimum square error when changing the span of the time-series somatosensory information, in the case of a square prism.

Fig. 7 shows the minimum square error for windows of 1, 3, 5, 7, and 9 steps, formed from the hand data before and after the extracted hand data, on the square prism. One step means the 29-dimensional hand data from touching the object, while three steps means 87-dimensional hand data composed of the touching data and the hand data in the 0.1 s periods before and after touching occurred; the dimensionality continues to increase with the number of steps. For the shorter time-series hand data, the minimum square error did not decrease. In contrast, for the longer time-series hand data, the weights overfitted the training data, so the minimum square error increased as the number of learning epochs increased. For five steps, where the touching data are merged with the hand data from the 0.2 s periods before and after they were acquired, the minimum square error gradually decreased.

The postures estimated from the somatosensory information were classified into 10 classes: the square prism was classified every 9°, the elliptical cylinder every 18°, and the triangular prism every 12°.

TABLE I: Accuracy of the estimated posture with regression analysis

    Shape               Accuracy
    Square prism        89.9%
    Elliptic cylinder   92.3%
    Triangular prism    88.7%

TABLE I shows the accuracy, where an estimate is counted as correct when the classified class corresponds to the correct class or to a class on either side of it. In the elliptic cylinder case the accuracy was 92.3%, which was the highest score. The accuracies for the other objects were also high, demonstrating that the object pose information can be extracted from five steps of time-series somatosensory information with regression analysis.
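The class construction and the tolerance of one adjacent class can be sketched as follows. The symmetry periods (90°, 180°, 120°) come from Section IV-A; treating the classes as cyclic when checking adjacency is our interpretation of the accuracy rule above.

```python
# Sketch of the 10-class posture binning and the "correct or adjacent" accuracy.
import numpy as np

SYMMETRY_PERIOD = {"square": 90.0, "ellipse": 180.0, "triangle": 120.0}
NUM_CLASSES = 10

def posture_to_class(theta_deg, shape):
    """Map a posture angle to one of 10 classes over the shape's symmetry period."""
    period = SYMMETRY_PERIOD[shape]
    bin_width = period / NUM_CLASSES          # 9, 18 or 12 degrees
    return int((theta_deg % period) // bin_width)

def adjacent_accuracy(pred_classes, true_classes):
    """Count an estimate as correct if it falls in the true class or a neighbour."""
    pred = np.asarray(pred_classes)
    true = np.asarray(true_classes)
    diff = np.abs(pred - true) % NUM_CLASSES
    diff = np.minimum(diff, NUM_CLASSES - diff)   # classes wrap around the period
    return float(np.mean(diff <= 1))
```

Used together with the regression sketch in Section III-B, the predicted (cos, sin) pair is first converted back to an angle and then binned with posture_to_class before the accuracy is computed.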
B. Image generation

We matched the object images to the somatosensory information using the estimated postures. We trained the cGAN on 1500 samples of object images and conditional labels that were classified based on the estimated postures. The training ran for 20000 epochs with a batch size of 32.

Fig. 8: Results for 10 generated images corresponding to the conditional labels that were classified from the estimated postures. Fig. 8a shows the square prism, Fig. 8b shows the elliptical cylinder, and Fig. 8c shows the triangular prism.

Fig. 9: Loss transitions of the generator and the discriminator. Fig. 9a shows the results for the square prism, Fig. 9b for the elliptical cylinder, and Fig. 9c for the triangular prism.

Fig. 8 shows the results of image generation for each object shape. The cGAN was able to generate visual images of the objects. The results also show that the change in the class label corresponded visually to the change in the object poses. Fig. 9 shows the loss transitions of the generator and the discriminator. The loss
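The pixel-loss check from Section III-D, which underlies the evaluation of images such as those in Fig. 8, can be sketched as follows. Representing each class by the average of its real images and using a mean-squared pixel loss are our assumptions; the paper only specifies a pixel loss between the generated image and the real images of each class.

```python
# Sketch of the pixel-loss evaluation of generated images (Sec. III-D).
import numpy as np

NUM_CLASSES = 10

def class_templates(real_images, labels):
    """Average the real 128x128 images of each pose class into a template (assumption)."""
    real_images = np.asarray(real_images, dtype=np.float32)
    labels = np.asarray(labels)
    return np.stack([real_images[labels == c].mean(axis=0)
                     for c in range(NUM_CLASSES)])

def generated_image_is_correct(generated, input_label, templates):
    """True if the class with the smallest pixel loss is the input label
    or one of the two adjacent labels."""
    losses = [np.mean((generated - t) ** 2) for t in templates]
    best = int(np.argmin(losses))
    diff = abs(best - input_label) % NUM_CLASSES
    return min(diff, NUM_CLASSES - diff) <= 1
```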