A Convolutional Neural Network Feature Detection Approach to Autonomous Quadrotor Indoor Navigation

Adriano Garcia, Sandeep S. Mittal, Edward Kiewra, and Kanad Ghose
{agarcia3, smittal2, ekiewra1, ghose}@binghamton.edu
Department of Computer Science, State University of New York, Binghamton, NY 13902

2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Macau, China, November 4-8, 2019

Abstract— Object detection, extended to recognize and localize indoor structural features, is used to enable a quadrotor drone to autonomously navigate through indoor environments. The video stream from a monocular, front-facing camera on board a quadrotor drone is fed to an off-board system that runs a Convolutional Neural Network (CNN) object detection algorithm to identify specific features such as dead ends, doors, and intersections in hallways. Using the pixel-scale dimensions of the bounding boxes around the recognized objects, the distance to intersections, dead ends, and doorways can be estimated accurately using a Support Vector Regression (SVR) model to generate flight control commands for consistent, real-time autonomous navigation at flight speeds approaching 2 m/s.

I. INTRODUCTION

Convolutional Neural Networks (CNNs) have revolutionized the computer vision field, starting with image classification models [1], [2], [3] that have consistently outperformed their non-CNN competitors [4]. More recently, image classification has led to object detection, allowing for a finer level of understanding of what is contained in an image and where. Along with advances in processing hardware, primarily GPUs, it is becoming possible to realize autonomous flights of MAVs (Micro Aerial Vehicle drones) by processing a monocular video stream from the forward-looking camera of the MAV using a CNN in real time, without reliance on multiple sensors for localizing the MAV. This paper introduces a prototype system for such autonomous navigation of a quadrotor MAV in hallways, using a CNN running on an off-board system to generate flight control commands.

Autonomous MAV indoor navigation is challenging, as the navigation space is highly constrained and GPS-denied [5]. At the same time, the flight dynamics of small quadrotors are relatively unstable due to low drag in the airmass, demanding proactive rather than reactive control to compensate for unintended motion due to inertia. Specifically, flight maneuvers have to be anticipated sufficiently in advance to avoid instabilities and collisions. As such, it is imperative to use reliable and consistent techniques to extract features from the environment that can serve as navigational markers for planning maneuvers in advance. This is a task with great potential for development and improvement using state-of-the-art CNN-based object detection algorithms.

This work provides significant enhancements to our prior work [6], [7], [8]. In [6], [7], indoor autonomous navigation was demonstrated using conventional image processing techniques to extract environmental features that aid in autonomous localization and navigation through hallways. The work in [8] improves on the limitations of [6], [7] by using CNN-based classifiers to recognize specific types of structural features, without estimating their dimensions or proximity to the drone, which are instead obtained using traditional image processing techniques. This work uses CNNs to simultaneously detect, size, and localize structural features, eliminating the need for traditional image processing and enabling our technique to deal with a significantly broader class of objects. Specifically, the CNN object detector expands the types of environments where navigation is possible to include a wider variety of hallways whose features were previously undetectable with image processing techniques. Object dimensions and their proximities are determined based on reference objects in the pixel plane to enable successful autonomous flights in a consistent manner. Our technique appears to be unique in deriving flight control commands using an off-board CNN to detect and localize structural features while simultaneously using a Support Vector Regression model to correlate pixel-level dimensions with ground truth for proximity estimation.
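As an illustration of the kind of pixel-to-distance regression described above, the minimal sketch below fits a Support Vector Regression model that maps the pixel height of a detected feature's bounding box to its ground-truth distance. It is a hedged example only: scikit-learn, the RBF kernel, and the calibration pairs are our own assumptions and are not taken from the paper.

```python
# Minimal sketch: estimate the distance to a detected hallway feature from the
# pixel height of its bounding box using Support Vector Regression.
# Assumptions (not from the paper): scikit-learn, an RBF kernel, and the
# illustrative calibration data below.
import numpy as np
from sklearn.svm import SVR
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

# Hypothetical calibration pairs: bounding-box height in pixels vs. measured
# distance to the feature in meters (farther features appear smaller).
bbox_heights_px = np.array([[420.0], [310.0], [240.0], [180.0], [130.0], [95.0], [70.0]])
distances_m = np.array([2.0, 3.0, 4.0, 6.0, 9.0, 13.0, 18.0])

# Scale the pixel measurements and fit the regressor.
svr = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0, epsilon=0.1))
svr.fit(bbox_heights_px, distances_m)

# At run time, the height of a new bounding box yields a distance estimate
# that the control node can use to schedule a maneuver in advance.
new_box_height = np.array([[150.0]])
print("estimated distance: %.1f m" % svr.predict(new_box_height)[0])
```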
The CNN-based YOLO [9] framework is used to train and implement a prediction model that can very reliably and consistently detect and locate structural features from the drone's sole forward-looking camera at frame processing rates of around 25 frames per second on average. Such a frame rate is in line with the drone's 30 frames per second video transmission rate and permits autonomous, real-time navigation flights at speeds approaching 2 m/s. The use of off-board processing enables the use of small stock drones, eliminating the additional sensors and/or higher lift capability that would otherwise be needed to support on-board processing. The advent of smaller, low-power GPUs and similar accelerators also enables the technique to be implemented completely on board the drone; this is our work in progress at this time.

II. SYSTEM DESCRIPTION

A. System Architecture

The system architecture used in our solution is shown in Figure 1.

Figure 1. System architecture and data flow.

The drone's forward-facing camera sends a video stream to a base processing station (a laptop with a GPU), which analyzes the image frames to derive flight control commands that are then transmitted to the drone. Thus, stock off-the-shelf drones can be readily used, requiring no additional specialized sensors or on-board processing to support autonomous navigation. Figure 1 shows a data flow diagram of the main functional blocks at the base station for the system described in this paper. Video frames and navigational data such as pitch and yaw are received at the Drone API TX/RX Node, which handles direct communication with the drone. The received data is then sent to two other ROS nodes:

YOLOv3 Node: A node running YOLOv3 processes the received video frame and publishes a container filled with metadata describing the number of structures detected, along with a list of bounding box data. The list contains all the bounding boxes that were detected in the video frame. Each bounding box entry contains the x and y coordinates of the top-left-most and bottom-right-most points of the bounding box, the type of structure detected (left intersection, right intersection, doorway, front door, etc.), and a confidence probability for the detection.

Control Node: The control node runs asynchronously, receives the YOLOv3 structural feature data as well as navigation and video data from the Drone API TX/RX node, and generates flight control commands. The YOLOv3 bounding box data is used in the control node to detect upcoming intersections, dead ends, and side features such as doors and windows, in order to localize the drone and issue appropriate commands.

In this paper we introduce, to the best of our knowledge, novel methods for drone yaw and roll adjustment based on dead-end and side-feature object detection to derive the positional markers necessary for straight navigation and wall collision avoidance. To enable drone yaw correction, a central navigation point is extracted from the environment by locating the center point of dead-end features such as front-facing walls. Side features, carefully annotated during CNN training, allow for the extraction of floor and wall demarcation lines that permit roll correction. Pertinent details are given in Section V.
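To make the data flow between the YOLOv3 node and the control node concrete, the sketch below mirrors the bounding-box container described above and derives a yaw-correction setpoint from the center point of a detected dead-end feature. The field names, the message layout, and the proportional gain are hypothetical; the actual ROS message definitions and control law are not given in this excerpt.

```python
# Minimal sketch of the detection metadata published by the YOLOv3 node and of
# the yaw correction derived from a dead-end detection, as described in
# Section II-A. Field names, labels, and the gain are hypothetical.
from dataclasses import dataclass
from typing import List, Optional

FRAME_WIDTH_PX = 856  # Bebop 2 video frames are 856 x 480 pixels

@dataclass
class BoundingBox:
    x_min: int          # top-left x
    y_min: int          # top-left y
    x_max: int          # bottom-right x
    y_max: int          # bottom-right y
    label: str          # e.g. "dead_end", "left_intersection", "doorway"
    confidence: float   # detection probability

@dataclass
class DetectionMsg:
    count: int
    boxes: List[BoundingBox]

def yaw_correction(msg: DetectionMsg, gain: float = 0.002) -> Optional[float]:
    """Return a yaw command that steers toward the horizontal center of the
    most confident dead-end detection, or None if no dead end is visible."""
    dead_ends = [b for b in msg.boxes if b.label == "dead_end"]
    if not dead_ends:
        return None
    best = max(dead_ends, key=lambda b: b.confidence)
    center_x = 0.5 * (best.x_min + best.x_max)
    # Positive error: the dead-end center lies to the right of the image
    # center, so yaw right; a negative error yaws left.
    error_px = center_x - FRAME_WIDTH_PX / 2.0
    return gain * error_px
```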
B. Hiding the Off-board Control Loop Latency

The video transmission latency between the drone and the ground processing station was measured at 172 ms. The total YOLO prediction time averages 40 ms per frame, giving a frame processing rate of 25 FPS. To accommodate the discrepancy between the frame reception rate and the frame processing rate, some received frames are deliberately skipped. The total execution time of the control node, including the image processing and control derivations, ranged between 6 and 10 ms depending on the scenario. Lastly, the delay in transmitting control commands back to the drone was measured to average about 5 ms. This means that the total time between the moment a video frame is acquired on the drone and the moment the drone acts on that information via commands from the base station is 227 ms. This total delay aligns well with the Bebop's 200 ms (5 Hz) navigation data update rate and is easily absorbed by our long-range estimation of the distance to a detected feature, making off-board control feasible.
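The sketch below simply restates the latency budget just described and shows one plausible way of dropping frames when the 30 FPS reception rate outruns the 25 FPS processing rate; the skipping policy is our own illustration, not the authors' implementation.

```python
# Illustrative accounting of the off-board control loop latency from Section
# II-B, plus a simple frame-skipping policy. The drop policy is an assumption;
# the paper only states that some received frames are deliberately skipped.
VIDEO_TX_MS = 172   # drone -> base station video transmission
YOLO_MS     = 40    # YOLOv3 prediction per frame (~25 FPS)
CONTROL_MS  = 10    # control node, upper bound of the 6-10 ms range
CMD_TX_MS   = 5     # base station -> drone command transmission

total_ms = VIDEO_TX_MS + YOLO_MS + CONTROL_MS + CMD_TX_MS
print("end-to-end latency: %d ms" % total_ms)   # 227 ms, close to the 200 ms navdata period

RX_FPS, PROC_FPS = 30.0, 25.0

def should_process(frame_index: int) -> bool:
    """Keep roughly PROC_FPS of every RX_FPS frames: with 30 FPS in and
    25 FPS out, about one frame in six is dropped."""
    keep_ratio = PROC_FPS / RX_FPS   # 5/6 of frames are kept
    return int(frame_index * keep_ratio) != int((frame_index - 1) * keep_ratio)

kept = sum(should_process(i) for i in range(1, 31))
print("frames processed out of 30 received: %d" % kept)   # 25
```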
C. Hardware Description

Our primary quadrotor platform is the Parrot Bebop 2 [10], which uses a Wi-Fi connection to send navigational data at 5 Hz and a stabilized video stream at 30 Hz [10] with a resolution of 856 x 480 pixels. Off-board object detection, localization, and derivation of steering commands are implemented on a separate ground base platform comprised of a laptop computer with an Intel Core i7-7700HQ CPU, 16 GB of RAM, and an Nvidia GeForce 1070 graphics card. The training of the detection model takes place on a PC workstation with an Intel Core i7-7820X CPU at 3.60 GHz, 46 GB of RAM, and an Nvidia GeForce 1080 Ti video card. Training takes approximately one day to obtain reasonable results at 30,000 iterations.

D. Control Software Description

The software side of our system uses Kubuntu 16.04 to house our control software architecture. We use the Robot Operating System (ROS) to decouple the control system into three functional nodes that run asynchronously while sharing data with each other. We use Joseph Redmon's Darknet framework [11] to train and implement feature recognition and proximity estimation to anticipate the necessary flight maneuvers. OpenCV is used to perform image processing and display video data.

E. Quadrotor Aerodynamic Model

Aerodynamic calibration is used to determine how the drone's trajectory is affected by its inertia and speed, and to ascertain the response time of the drone, ensuring that flight maneuvers are performed in a timely fashion based on the estimated distances to intersections and dead ends. The Bebop's maximum reported velocity exceeds 30 m/s; this is too fast and possibly dangerous for human-inhabited indoor hallways. We therefore limit the speed of the drone by limiting the forward pitch angle to 4 degrees.

Figure 2. Drone velocity (blue squares, m/s) and pitch (red circles, deg) versus time (s).

Figure 2 depicts the drone's forward velocity (blue squares) and pitch (red circles) as a function of time when the drone pitches forward for a total of 12.5 seconds and then immediately decelerates until it stops moving forward. From the speed plot, we determined the drone's average acceleration to be approximately 0.28 m/s² in the 6-second interval starting from a stationary state until a speed of 1.8 m/s is attained. At the 12.5-second mark, the drone immediately pitches backward and decelerates at a rate of 1.433 m/s². The large difference between the acceleration and deceleration rates is by design, as the backward pitch of the drone is allowed to be up to 8 degrees during deceleration. This allows the drone to fly at a safe indoor speed while being able to come to a stop quickly when turning or stopping maneuvers are needed. Finally, overshoots due to angular momentum are absent during turning on the Bebop drone thanks to its advanced rotor control logic, making it unnecessary to compensate for angular overshoots during turns.
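As a worked illustration of how these calibration numbers feed maneuver planning, the sketch below uses the measured acceleration and deceleration values to estimate the time and distance needed to reach cruise speed and to stop. The constant-acceleration kinematics are our simplification; the paper reports only the measured average rates.

```python
# Illustrative use of the Section II-E calibration values to anticipate
# maneuvers. Constant-acceleration kinematics are an assumption.
CRUISE_SPEED = 1.8    # m/s, speed reached under the 4-degree pitch limit
ACCEL        = 0.28   # m/s^2, average acceleration from rest
DECEL        = 1.433  # m/s^2, deceleration with up to 8 degrees of backward pitch

# Time and distance to reach cruise speed from a stationary state.
t_up = CRUISE_SPEED / ACCEL                 # ~6.4 s, consistent with the ~6 s interval above
d_up = 0.5 * ACCEL * t_up ** 2              # ~5.8 m

# Time and distance needed to come to a full stop from cruise speed, i.e. the
# margin the controller must leave before an intersection or dead end.
t_stop = CRUISE_SPEED / DECEL               # ~1.3 s
d_stop = CRUISE_SPEED ** 2 / (2.0 * DECEL)  # ~1.1 m

print("accelerate: %.1f s over %.1f m" % (t_up, d_up))
print("stop:       %.1f s over %.1f m" % (t_stop, d_stop))
```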
III. RELATED WORK

To the best of our knowledge, our approach is unique in its use of a monocular vision system on a stock drone with no additional on-board sensors, and in its use of off-board CNN structural feature detectors for early planning and execution of flight maneuvers to enable higher flight speeds. Techniques for autonomous navigation of drones have been a popular research area. Such techniques have used markers for localization [12], pose estimation [13], visual odometry with full on-board processing [14], images captured by external cameras [12] to steer a drone along a pre-specified trajectory [15], and vision-based techniques that enable a drone to follow another [16]. Other navigation techniques have used reference images and corner feature recognition [17], a vision system with three on-board proximity sensors [18], a wide variety of on-board sensors [19], [20], [21], and optical flow algorithms [22], [23], [24]. Sensor fusion using image data and IMU measurements, combined in a potential field formulation, is used to steer a drone autonomously in a partially known indoor environment [25].

Machine learning based approaches for automated drone navigation have become practical in recent years due to hardware advances. An imitation learning based approach to avoid obstacles and steer a Parrot AR.Drone in a wooded area is described in [26]. A waypoint map constructed from images from the front-facing camera of a hovering drone is used to avoid obstacles in a hallway in [27]; a similar approach using a CNN is presented in [28]. In [29], a CNN trained with data obtained from a human maneuvering a robot around obstacles is used to subsequently steer the robot autonomously; in principle, this technique can also be used with MAVs. A cloud-based CNN to recognize objects in real time from images captured by a drone appears in [30]. In [31], a CNN is trained on 2D images and their depth maps, and an off-board control system similar to the one presented here is used to steer the drone and avoid obstacles. Corners, windows with clear glass panes, and close-by walls represent challenges for that solution. For generalized autonomous navigation in hallways, as presented here, depth recognition on its own is insufficient; specific features need to be recognized and classified. In [32], extended Kalman filtering and a CNN are used to localize a camera and its pose, leading to an algorithm that can be used for indoor navigation. This contrasts with our work, which uses CNNs to identify and classify indoor features as well as guide flights along the length of the hallway.

CNN approaches to indoor MAV navigation are starting to have a strong impact in the field. In [33], a CNN-based approach is used to steer a drone along a straight-line path inside a hallway based on the classification of the front view into three categories: close to the center line, close to the left wall, and close to the right wall. A CNN-based approach is used in [34] to enable a quadrotor to follow an outdoor hiking trail by classifying images from the drone into three classes (turn left, turn right, go straight); this is extended in [35] to incorporate six scenario classifications and obstacle avoidance using YOLOv1 [9], taking advantage of an on-board GPU. In current state-of-the-art indoor navigation research, a CNN binary classifier to avoid crashes, trained on an extensive database of crash scenarios, is implemented in [36]. This was used to permit autonomous indoor navigation and obstacle avoidance by also splitting front-facing video images into left, center, and right parts, then choosing the best path by adjusting the yaw angle of the drone to take the path least likely to contain an obstacle. Instead of training by crashing, the work in [37] used proximity sensors on board the drone to train a distance-to-collision regression CNN model that estimates distances to obstacles up to 5 meters away. Similar to [33], [34], [35], [36], the images are split into left, center, and right parts, which are then used to adjust the yaw angle of the drone to follow the more open path.

Unlike the state-of-the-art work presented in [33], [34], [35], [36] and our prior work [8], which rely on CNN classification, we use CNN object detection to locate features of interest in the image at distances that exceed 50 meters. We therefore do not need to split the image into multiple parts, and a single CNN object detector is enough to detect and locate all the features necessary for our system. We do not implement general obstacle avoidance, but our system can detect proximity to walls and allows for changes in the yaw and roll angles of the drone, permitting high maneuverability and fast speeds approaching 2 m/s.

IV. CNN-BASED VISION MODEL

Our CNN vision model is an object detection model that is trained to detect and locate intersections, dead ends, and other hallway features such as doors and poster boards in a wide variety of hallways. The primary benefit of a CNN approach is the ability to generalize the types of environments and structural characteristics that can be detected. Our previous approach [6], [7] used image processing feature detection techniques, which were unable to detect a variety of features due to poor contrast, reflections, adverse lighting conditions, and partial occlusion of features. The CNN-based feature recognition approach does not have these limitations and can now reliably detect all necessary features consistently in all of our testing scenarios (Section VI).

A. YOLO Structure Detector

YOLO (You Only Look Once) [9], [38], [39] is a revolutionary CNN-based object detection algorithm that aims to perform real-time object detection by streamlining the detection process into a single regression-type CNN [3] that looks at the whole image once. This contrasts with other object detection methods [40], [41] that use a multi-step pipeline to perform region proposal and then classify each region to detect objects. YOLO's approach also compares well against other CNN-based techniques for real-time object detection [42], [43]: for the same detection accuracy, YOLO is three times faster [39] than the method devised in [42]. YOLO's real-time detection focus and streamlined detection process make it an ideal solution for the structural feature detector used in this work. The YOLOv3 network [39] used in this work has 53 convolutional layers composed of successive 3x3 and 1x1 Network-in-Network [44] reduction layers. A batch size of 64 is used, with the learning rate set to 0.001, along with leaky activation functions.
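For readers who want to reproduce the detection step, the sketch below runs a trained YOLOv3 model on a single frame using OpenCV's DNN module rather than the Darknet/ROS pipeline used in the paper; the file names, network input size, and thresholds are illustrative assumptions.

```python
# Minimal sketch of running a trained YOLOv3 structure detector on one video
# frame with OpenCV's DNN module (an alternative to the Darknet pipeline used
# in the paper). File names, input size, and thresholds are assumptions.
import cv2
import numpy as np

net = cv2.dnn.readNetFromDarknet("hallway-yolov3.cfg", "hallway-yolov3.weights")
out_names = net.getUnconnectedOutLayersNames()   # assumes OpenCV >= 4.x

frame = cv2.imread("hallway_frame.png")          # e.g. an 856x480 Bebop 2 frame
h, w = frame.shape[:2]

blob = cv2.dnn.blobFromImage(frame, 1 / 255.0, (416, 416), swapRB=True, crop=False)
net.setInput(blob)
outputs = net.forward(out_names)

boxes, confidences, class_ids = [], [], []
for output in outputs:
    for det in output:                           # det = [cx, cy, bw, bh, objectness, class scores...]
        scores = det[5:]
        class_id = int(np.argmax(scores))
        conf = float(det[4] * scores[class_id])
        if conf < 0.5:
            continue
        cx, cy, bw, bh = det[0] * w, det[1] * h, det[2] * w, det[3] * h
        boxes.append([int(cx - bw / 2), int(cy - bh / 2), int(bw), int(bh)])
        confidences.append(conf)
        class_ids.append(class_id)

# Non-maximum suppression keeps one box per detected structure.
keep = cv2.dnn.NMSBoxes(boxes, confidences, 0.5, 0.4)
for i in np.array(keep).flatten():
    print("class %d, confidence %.2f, box %s" % (class_ids[i], confidences[i], boxes[i]))
```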
Although this paper focuses mainly on structural features, we also trained the CNN to detect common indoor objects such as chairs, tab…
