




Abstract—The purpose of this study is for a robot to learn picking motions in a logistics warehouse environment. The picking operation performed by a robot often fails owing to the inclination of items placed on a shelf, as well as the minimal clearance between the products and their vinyl packaging. Therefore, we considered acquiring a specific motion trajectory by reinforcement learning. However, because numerous types of items are handled in logistics warehouses, efficient learning is required. Accordingly, in this research we propose a method for efficient exploration when learning to pick an object, in which a focus exploration area for learning is determined based on previous results with different objects.

I. INTRODUCTION

Owing to concern about labor shortages consequent on the decrease in the working population, robot automation in factories and warehouses is becoming a major social trend [1], [2]. An industry in which the shortage of workers is becoming a serious issue is the logistics industry, and picking work carried out by robots in logistics warehouses for the delivery of goods is attracting attention [3]. A worker in a logistics warehouse moves goods from a shelf, on which a plurality of the same goods are wrapped and packed in boxes, to a tray on a conveyor belt for shipment to a retailer. The task of depalletizing, in which shipments newly entering the logistics warehouse are moved to a conveyor belt or pallet, is a similar operation, and research on robot automation of depalletizing work has already been conducted [4], [5]. In picking work, the objects handled are smaller than those in depalletizing work, and people tend to outperform robots, which may encounter problems such as the position and diversity of the objects. The automation of such work through a robotic picking operation would help solve the shortage of workers. However, the goods stored on the shelves are diverse and the gaps between them are very small; thus, it is difficult to design a picking trajectory for each target product.
Therefore, a mechanism by which the robot itself acquires the picking operation, such as by using machine learning, is necessary. Many studies have been conducted in which a robot autonomously learns a motion without requiring complex motion planning by applying deep learning [6], [7], [8], [9], [10]. One example of robot action learning using deep learning is the grasp operation learning reported by Levine et al. [11]. In this experiment, 800,000 trials were conducted using 14 robots, and the robots were trained on the operation of grasping an object from image data obtained from a camera. Gu et al. used deep reinforcement learning to train a robot on a reaching task, as well as on the opening and closing of a door, using simulation and real-world trials [12]. In their experiment, four robots were trained for several thousand trials. Generally, a large amount of data is necessary for deep learning, thereby increasing the number of trials required for a robot to learn trajectory control. This excessive number of trials is a major problem when conducting robot learning of trajectory control. Therefore, this study aims to generate a trajectory for each product using reinforcement learning for picking tasks in a logistics warehouse, and proposes a method for more efficient learning with relatively few trials. Transfer learning [13], [14] is a method for training with a small amount of data: it reduces the amount of data required for learning by starting from a model learned beforehand. However, it is often difficult to prepare a learned model for a task such as that targeted in this study. Therefore, for efficient learning, in this study we limited the area to be explored during learning. Hence, a picking operation is first performed on a sample object different from the object to be learned. The trajectory for this operation, which was intuitively prepared by the experimenter, is used to determine the area to be explored during the training stage of the system. Then, we aim to make learning efficient by
weighting the probability of selecting each action so as to intensively explore the determined area. To evaluate the proposed method, we compare the learning results with and without it. Section III describes the specific details of the proposed method. In Section IV, a picking operation learning experiment in a simulated warehouse environment is carried out using the proposed method, and its usefulness is evaluated by comparing the learning results with a conventional method of exploring and learning at random. Section V discusses the results of the learning experiments, and Section VI presents a summary of this study.

[Paper: "Adjusting Weight of Action Decision in Exploration for Logistics Warehouse Picking Learning." Kato Yusuke (1,2), Nakamura Tomoaki (3), Nagai Takayuki (3), Yamanobe Natsuki (2), Nagata Kazuyuki (2), Ozawa Jun (2). (1) Panasonic Corp., Tokyo 135-8072, Japan; (2) National Institute of Advanced Industrial Science and Technology (AIST), Ibaraki 305-8560, Japan; (3) The University of Electro-Communications, Tokyo 182-8585, Japan. 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Macau, China, November 4-8, 2019. 978-1-7281-4003-2/19/$31.00 ©2019 IEEE.]

II. PICKING SYSTEM IN A LOGISTICS WAREHOUSE

A. Logistics warehouse environment

Within the logistics warehouse, products from each manufacturer are collected in a single location, and the work of sorting and delivering the goods to retail stores is carried out. Goods handled in warehouses are often tightly packaged together with other identical goods in a box, without any gaps, as shown in Figure 1. In this study, this state is called a filled state. In addition, goods are stored on a shelf tilted toward the workers so that they are easy to pick up, and a conveyor belt is
installed on the other side of the operator, opposite the shelf. The worker inputs the order form from the retail store into a machine, picks the specified product from the shelf, and moves the product to a tray on the conveyor belt. The position where the goods are placed is predetermined, and an indicator on the shelf from which picking is necessary lights up when the machine reads the order slip. Robots are suited to environments in which product position information is controlled, and robotic automation of the picking operation in a logistics warehouse could help solve the worker shortage.

B. Construction of the picking system

The robot system shown in Figure 2 was constructed as a simulated environment of a picking operation in a logistics warehouse. A KUKA LBR iiwa 14 R820, a 7-axis articulated robot arm, was prepared. The robotic arm has a suction system and a force sensor attached to its hand. The vacuum suction system is composed of a compressor and an ejector; the ejector provides the suction pressure generated at the time of adhesion. A WACOH-TECH DynPick, a capacitive six-axis force sensor, was used; it can measure forces and moments in the X, Y, and Z directions. In this study, gripping by suction was adopted for the picking system. As mentioned earlier, the goods handled in the warehouse are often in a filled state with almost no gap between them, so it is difficult to grasp the sides of goods using multi-fingered hands. In addition, because the size differs for each product, suction gripping can pick any item from a single point. As a simulated warehouse environment, shelves inclined at 20 deg were prepared in front of the robot, and target objects were placed on the shelves in a filled state. Two types of objects were prepared as objects to be picked: object A, shown in Figure 3(a), and object B, shown in Figure 3(b). Similar to the handling conditions in the warehouse environment, multiple units of the same goods were packaged together as a single unit. For packaging, film was used
for object A and vinyl for object B. In an actual warehouse environment, the storage location of goods can be managed electronically. In addition, because the goods themselves are arranged in storage boxes, the suction position for picking can be known in advance. Therefore, the picking operation is possible if the retrieval trajectory for picking can be determined.

C. Picking trajectory

Intuitively, the simplest trajectory for picking is one that lifts the product in one direction after suction. Therefore, three types of retrieval trajectories were prepared, as shown in Figure 4, and objects A and B were picked. As a result, picking object A was successful with trajectory 1 and trajectory 2 but failed with trajectory 3, because the robot could not lift the product well after suction (Figure 5, upper row). Meanwhile, when picking was carried out in the same way on object B with the trajectories of Figure 4, picking failed with all trajectories (Figure 5, lower row). With trajectory 3, as in the case of object A, the robot failed to lift the product after suction. With trajectory 1 and trajectory 2, the robot succeeded in lifting the product after suction, but when the entire object was taken out of the box, the attitude of the object changed greatly, releasing the grip and dropping the object. The cause of the fall is related to the wrapping of the object. When the product is wrapped in a relatively solid material like film, it was possible to achieve suction and pick it up from a filled box by a trajectory that lifts the product in one direction, without any particular problems. Meanwhile, when the suction surface was a relatively soft material, elongation of the packaging occurred when the product was lifted by suction. Because the goods in a warehouse are placed on an inclined shelf, when a product was extracted completely from a filled box by a simple trajectory, a change in attitude occurred owing to the elongation of the packaging material. This change in attitude was
initially prevented by the surrounding objects; however, once the product was withdrawn, it swayed greatly and fell. Therefore, in such a case, it is necessary to change the angle of the lifting direction so as to avoid causing a large change in attitude when extracting the object from the box. However, when the lifting distance is small, it is necessary to design a trajectory with an appropriate angle change, because there is the possibility of dropping the object, as with trajectory 3. In addition, because the required trajectory may differ depending on the size and weight of the target product, manual design of the trajectory for each individual product is difficult.

[Figure 1: An example of the state of goods in the warehouse. Figure 2: Robot system. Figure 3: Goods in the warehouse; object (a) is wrapped in film, object (b) is wrapped in vinyl.]

Consider training a robot, using reinforcement learning, on a trajectory that allows it to pick up goods without dropping them. In the environment of this study, as detailed above, simulating aspects such as the vinyl elongation and the accompanying swaying is difficult; hence, learning needs to be carried out on an actual robot. However, reinforcement learning generally requires a large number of trials to ensure sufficient performance. There are many types of goods in the warehouse, and because the inventory is constantly changing, spending excessive time on each product is inefficient. Therefore, it is necessary to complete learning with a relatively small number of trials. Accordingly, in this study we aim to achieve effective learning with a relatively small number of trials by establishing an exploration area for learning the target object based on the picking results of other objects.

III. REINFORCEMENT LEARNING OF PICKING TRAJECTORY FOR A LOGISTICS WAREHOUSE

A. Physical model of picking trajectory

In reinforcement learning, the next action is decided using the state observed by an agent, and better policies are learned
based on the reward obtained by the agent. In this study, as shown in Figure 6, the hand of the robot arm at time t is represented by its position coordinates p_t and its attitude φ_t. Movement control of the hand is then defined by assigning a movement Δp_t and a change in attitude Δφ_t of the hand from that point in time. Therefore, the model of the trajectory control at time t is

p_{t+1} = p_t + Δp_t, (1)
φ_{t+1} = φ_t + Δφ_t. (2)

Consider acquiring a picking trajectory by reinforcement learning using such a trajectory control model. In addition to the position coordinates and attitude information of the robot hand, consider the learning problem of trajectory control to achieve a certain task under conditions such that the force f_t and the torque τ_t can be measured at the robot's hand. The state used to determine the action at time t is defined as

S_t = (p_t, φ_t, f_t, τ_t), (3)

and the agent determines the action A_t in accordance with the optimal strategy at that time. In accordance with the trajectory control model, let the action A_t be

A_t = (Δp_t, Δφ_t). (4)

The agent obtains a reward as a result of the action and proceeds to the next moment in time. The policy is updated, and learning proceeds such that the sum of rewards obtained from the series of actions is maximized.

B. Action selection in reinforcement learning for a picking trajectory

In this study, we used deep reinforcement learning [15] to train a robot on trajectory control from the initial position at t = 0, where the target object is gripped by the robot arm, until the target object is completely picked up from the box. In the task of this experiment, it was assumed that motion in the Y-axis direction was unnecessary because the movement of the robot arm was carried out in the X-Z plane of the coordinate system. As shown in Figure 7, the position and force along the Y axis and the rotational moments in the X-Z plane have no effect and are omitted; thus, the state is defined as

S_t = (x_t, z_t, φ_t, f_{x,t}, f_{z,t}, m_{y,t}). (5)

[Figure 4: Three simple picking trajectories. Figure 5: Picking results with the three simple trajectories. Figure 6: Model of the trajectory.]

The action is represented by the amount of change Δd with respect to the current moving direction d of the hand. The movement distance was fixed at 2 cm, and the change in moving direction was limited to the range from −10 deg to 10 deg in 5-deg increments, so that the action is selected from five prepared choices. In addition, one trial of the retrieval operation is defined as 12 actions; this number was determined from the number of movements required to completely pick up the target object from the box. The reward is determined by the success or failure of the task at the end of one trial: a positive reward was assigned when the robot arm was able to completely pick up the target object without dropping it, and a negative reward was assigned when the target object was dropped partway through. During learning, the action was determined by weighted selection probabilities, rather than by a simple greedy method in which one action is chosen from several available actions with equal probability; the details are explained in the next section. Using these actions, one trial of learning consists of the following steps:

1. Move the hand tip of the robot arm to the suction position and grip the target object by suction.
2. Acquire the current state S_t, determine the action A_t, and execute it.
3. Check the state of the target object. If the target object has been dropped, the reward is set to −1 and the trial is ended; otherwise, go to step 4.
4. If fewer than 12 actions have been executed at this point, the process returns to step 2 with a reward of 0. Otherwise, the reward is set to 1 and the trial is ended.

In step 3, the state of the target object is confirmed using the suction pressure obtained from the system: when the suction pressure becomes equal to or less than a threshold value, the target object is considered to have been dropped by the robot arm. A simple network was used for learning, consisting of two hidden layers with 30 units each (Figure 8).

C. Weighting of action selection probabilities for reinforcement learning

In this
study, we aimed to achieve effective learning with relatively few trials. For that purpose, we propose determining the exploration area for learning the picking trajectory of the target object based on the picking results of other objects. We believe that efficient learning can be realized by determining the exploration area so as to reduce unnecessary exploration. Unlike transfer learning [13], [14], there is no need to prepare other learned models; hand-engineered trajectories can be used. Object A, for which picking was achieved with a simple trajectory such as 1 or 2 in Figure 4, can be said to be easier to pick than object B. Therefore, when acquiring the picking trajectory of object B by reinforcement learning, exploiting the picking results of object A is of high importance. In this task, we can predict that a successful picking trajectory for object B is close to the successful picking trajectory of object A; thus, intensively exploring around the successful picking trajectory of object A is considered important. On this basis, we determine the exploration range for learning the picking trajectory of object B from the results of picking object A, by the method introduced below. First, a grid-shaped area of 10 × 10 cells is prepared over the X- and Z-axis coordinates that the robot hand can take. The evaluation value of each region is calculated from the picking trajectories of object A shown in Figure 4 and Figure 5, as well as from the results in Figure 9. To calculate the evaluation values, the trajectories are first divided into successful and failed trajectories, and a different calculation is performed for each. For a successful trajectory, the evaluation value is calculated using kernel density estimation, with the standard normal distribution

K(x) = (1/√(2π)) exp(−x²/2) (6)

as the kernel, applied in the X-axis direction to the regions through which the trajectory passes. Next, regarding
failed picking trajectories, the ar…
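The excerpt breaks off here, but the mechanism described so far can be sketched in code: build a 10 × 10 evaluation grid from a successful trajectory of object A using the standard-normal kernel of Equation (6), then weight the five direction-change actions by the evaluation of the cell each action would enter. This is a minimal illustrative sketch, not the authors' implementation: the cell size, kernel bandwidth, softmax weighting, and the treatment of failed trajectories (cut off above) are all assumptions.

```python
import math
import random

GRID = 10                      # 10x10 evaluation grid over the reachable X-Z area (from the text)
ACTIONS = [-10, -5, 0, 5, 10]  # direction-change choices in degrees (from the text)
STEP = 0.02                    # 2 cm movement per action (from the text)

def gaussian_kernel(u):
    """Standard normal kernel, Eq. (6)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def evaluation_map(success_traj, cell=0.02, bandwidth=0.02):
    """Score each grid cell by kernel density of a successful trajectory.

    success_traj: list of (x, z) hand positions recorded while picking object A.
    cell and bandwidth are assumed parameters (not given in the excerpt).
    """
    scores = [[0.0] * GRID for _ in range(GRID)]
    for ix in range(GRID):
        for iz in range(GRID):
            cx, cz = (ix + 0.5) * cell, (iz + 0.5) * cell
            # KDE in the X direction, restricted to trajectory points in the
            # same Z band as the cell (an assumed reading of Sec. III-C).
            for (tx, tz) in success_traj:
                if abs(tz - cz) < cell:
                    scores[ix][iz] += gaussian_kernel((cx - tx) / bandwidth)
    return scores

def action_probabilities(pos, heading_deg, scores, cell=0.02, temp=1.0):
    """Weight each direction-change action by the evaluation of the cell it
    would move the hand into (softmax weighting is an assumption; the text
    only states that selection probabilities are weighted)."""
    weights = []
    for d in ACTIONS:
        ang = math.radians(heading_deg + d)
        nx = pos[0] + STEP * math.cos(ang)
        nz = pos[1] + STEP * math.sin(ang)
        ix = min(max(int(nx / cell), 0), GRID - 1)
        iz = min(max(int(nz / cell), 0), GRID - 1)
        weights.append(math.exp(scores[ix][iz] / temp))
    total = sum(weights)
    return [w / total for w in weights]

def sample_action(probs):
    """Draw one of the five direction changes according to the weighted probabilities."""
    return random.choices(ACTIONS, weights=probs, k=1)[0]
```

Under this sketch, cells the successful trajectory of object A passes through receive high evaluation values, so actions steering the hand toward them are sampled more often, concentrating exploration around the known-good trajectory while still allowing occasional deviation.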