Unsupervised Task Segmentation Approach for Bimanual Surgical Tasks using Spatiotemporal and Variance Properties

Ya-Yen Tsai, Yao Guo, Member, IEEE, and Guang-Zhong Yang, Fellow, IEEE

Abstract—In surgical workflow analysis and training in robot-assisted surgery, automatic task segmentation could significantly reduce manual labeling time and enhance robot learning efficiency. This paper presents an unsupervised segmentation approach that automatically segments a given surgical task without manual intervention. A new segmentation method is presented, which relies only on bimanual kinematic trajectories without the need for prior information about the data. Specifically, surgical tasks are segmented by fusing the trajectories' spatiotemporal and variance properties. To demonstrate the effectiveness of the proposed method, detailed experiments were first conducted on our dataset. We segmented trajectories of three different surgical stitches and observed an average F1 score of 77.9% against the ground truths. The same trajectories were then corrupted with different levels of noise, and the segmentation was compared against four other methods; the proposed algorithm demonstrated its robustness against the noise. Finally, to assess its generalization ability, the method was evaluated on the publicly available JIGSAWS dataset, where an average F1 score of 75.5% was achieved.

I. INTRODUCTION

In the past decades, robot-assisted surgery has supported realizing the full potential of minimally invasive surgery compared to traditional open surgery [1], [2]. The benefits of this transition can be seen from many clinical studies and evidence [3], [4]. The provision of task automation is particularly advantageous in situations where surgical subtasks require, for example, extended periods of high concentration for repeated and tedious operations. Learning from Demonstration (LfD) [5] improves the efficiency of programming a robot by learning complicated movements and manipulations through human guidance or the provision of human demonstrations.

Task segmentation is one of the most critical processes in LfD because it facilitates analyzing and understanding motion behaviors. Complicated motions during surgical tasks typically consist of multiple steps and intricate tool manipulations. Hence, accurately dividing a demonstration into meaningful and homogeneous action units, namely motion primitives (MPs), is challenging, especially when there is no prominent and clear boundary between MPs. Moreover, consistent human annotation of a large amount of data is difficult to maintain, and thus manual labeling is prone to errors.

Y.-Y. Tsai, Y. Guo and G.-Z. Yang are with the Hamlyn Centre for Robotic Surgery, Imperial College London, SW7 2AZ, London, UK (e-mail: {y.tsai17, yao.guo, g.z.yang}@imperial.ac.uk). G.-Z. Yang is also with the Institute of Medical Robotics, Shanghai Jiao Tong University, China. This work was supported by the Engineering and Physical Sciences Research Council (EPSRC) under Grant EP/L020688/1.

Fig. 1: Flowchart of the proposed unsupervised segmentation method. Bimanual trajectories generated from human demonstrations are first segmented based on the spatiotemporal and variance properties of the 6-DoF kinematic trajectories, separately. The two sets of segments are then merged using DBSCAN to form the final segmentation result.

Many unsupervised segmentation methods focus on using kinematic information to segment trajectories [6], [7].
They group sub-trajectories based on the similarity of kinematic features. Buchin et al. [8] used a range of movement characteristics such as location, speed, velocity, shape, curvature, and sinuosity to determine the kinematic homogeneity of motions; segmentation points were found at the changes in this homogeneity. Despinoy et al. [9], on the other hand, relied on distance metrics such as the Hausdorff distance, the Fréchet distance, and Dynamic Time Warping (DTW) to compute the dissimilarity between trajectories, and used this information to find segmentation points. Clustering is another commonly used segmentation strategy. Many clustering-based methods detect the stop-and-move actions of the trajectory data in the spatial domain [10], [11]. As stagnation points tend to be dense in space, these methods exploit this property to group neighboring points and segment the trajectory based on the resulting distinct clusters.

Extended works have also combined kinematic homogeneity and clustering to segment a task [12]. Transition State Clustering (TSC) [13] exploited video and kinematic information of demonstrations for surgical task segmentation. It identified potential transitions and segmented linear dynamic regimes based on kinematic, sensory, and temporal similarity. The number of clusters was governed by a Dirichlet Process (DP), avoiding the need for a priori knowledge. The algorithm optimized its results by iteratively merging dense clusters while removing sparse ones, repeating the loop until a stopping condition was met. Fard et al. [14] introduced a soft-boundary unsupervised approach (Soft-UGS) that first segments surgical gestures into fine pieces and then iteratively merges homogeneous segments, as defined by Probabilistic Principal Component Analysis (PPCA), the distance between segment centers, and the DTW distance between neighboring segments.

Although many previous works have addressed issues such as trajectory frame dependence in surgical task segmentation, several problems remain unresolved. Firstly, temporal variations such as noise are often inevitable in human demonstrations. Although smoothing may mitigate this effect, intensive smoothing may alter the critical shape and spatiotemporal information of the original trajectory. This causes the critical segmentation features to become less prominent and degrades the performance of automatic task segmentation. Secondly, inconspicuous segmentation points or transition periods are another common problem, especially for trajectories from human demonstrations. Humans naturally tend to perform a continuous and smooth transition from one action unit to another. Distinguishing such motions by hand can be difficult, which increases the likelihood of inconsistent manual segmentation. On top of that, data with a high degree of freedom (DoF) can make segmentation even more ambiguous. Omitting the rotational components of trajectories introduces more uncertainty in segment identification and classification [15], which has not been addressed in previous work.

In this paper, we propose a generic and novel task segmentation framework to automatically divide bimanual 6-DoF trajectories into multiple action units.
It utilizes two kinematic features derived from the spatiotemporal and variance properties of the trajectories to find initial sets of potential segmentation points. These sets are then clustered to refine the segmentation result. Fig. 1 illustrates the overall structure of the proposed unsupervised segmentation algorithm. This paper focuses mainly on complex surgical applications, but the algorithm is also applicable to other relevant tasks. The main contributions of this paper are two-fold:

• We propose a new framework for complicated task segmentation with bimanual 6-DoF spatiotemporal trajectories as inputs;
• The algorithm fuses two frame-independent kinematic features to enhance the robustness against noise as well as the segmentation precision and accuracy.

This paper is organized as follows. Section II introduces the proposed method for segmenting a given task automatically. The experiments in Section III evaluate the proposed algorithm by comparing its results with manually labeled ground truths, with and without additive noise, and against other commonly used segmentation approaches. Section IV presents the discussion, and conclusions and future work are provided in Section V.

II. METHODOLOGY

Complicated motions involved in surgical tasks typically require multiple steps to accomplish. Decomposing a task into several simple steps allows a better understanding of the constitution of a motion. This paper proposes a novel algorithm to automatically segment bimanual tasks such as surgical suturing. A divide-and-merge approach is followed throughout the framework to refine the segmentation performance.

A. Problem Statement

Let us define a bimanual task $\mathcal{T} = \{\mathbf{l}, \mathbf{r}\}$ as the combination of two motion trajectories, where $\mathbf{l} \in \mathbb{R}^{N \times d_L}$ and $\mathbf{r} \in \mathbb{R}^{N \times d_R}$ represent the kinematic trajectories of the left hand and the right hand, respectively. $d_L$ is the dimension of the features describing the translation and orientation of the left hand movement over time, while $d_R$ describes those of the right hand. $N$ refers to the number of frames in the trajectory data. The purpose of task segmentation is to divide the task $\mathcal{T}$ into $K$ consecutive fractions as:

$$\mathcal{T} = \bigcup_{i=1}^{K} S_i(p_s^i, p_e^i) \tag{1}$$

where $p_s^i$ and $p_e^i$ indicate the indexes of the starting point and the ending point of segment $S_i$. Note that the end point $p_e^{i-1}$ of segment $S_{i-1}$ coincides with the start point $p_s^i$ of the current segment $S_i$.

In this paper, 6-DoF kinematic trajectories of tool movements are recorded using a visual system by tracking visual markers attached to the tips of the tools. We therefore have $d_L = d_R = 6$. Specifically, a 6-DoF trajectory is expressed as $\xi(t) = [\mathbf{p}, \boldsymbol{\theta}]$, where $\mathbf{p} = [x, y, z]$ is the translation component and $\boldsymbol{\theta} = [e_x, e_y, e_z]$ is the rotation component in the Euler-Rodrigues representation.

The proposed segmentation algorithm takes bimanual kinematic trajectories as inputs and produces a set of segmentation points through the following three steps: spatiotemporal-based segmentation, variance-based segmentation, and a merging step. Firstly, the spatiotemporal-based segmentation provides an initial set of segments by identifying the spatiotemporal density of the candidate trajectory. Secondly, the variance-based segmentation projects the recorded temporal trajectories onto different coordinate frames to capture the changes in velocity of each feature over time; segments are determined based on the changes in the variance of each feature.
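To make the problem statement concrete, the sketch below represents one hand's trajectory as an (N, 6) array and splits it into the K consecutive segments of Eqn. (1), with neighboring segments sharing a boundary point. This is a minimal illustration, not the paper's code; the array layout, function name, and example indices are assumptions.

```python
import numpy as np

def split_task(traj: np.ndarray, seg_points: list[int]) -> list[np.ndarray]:
    """Split an (N, 6) trajectory into K consecutive segments.

    seg_points holds the K-1 interior boundary indices; per Eqn. (1),
    the end point of segment S_{i-1} coincides with the start of S_i.
    """
    bounds = [0] + sorted(seg_points) + [len(traj) - 1]
    return [traj[bounds[i]:bounds[i + 1] + 1] for i in range(len(bounds) - 1)]

# Illustrative use: a bimanual task is a pair of (N, 6) arrays
# [x, y, z, e_x, e_y, e_z] for the left and right tools, sampled at 20 Hz.
N = 200
left = np.random.randn(N, 6)                 # placeholder kinematic data
segments = split_task(left, seg_points=[50, 120])
# each interior boundary point appears in two neighboring segments
assert sum(len(s) for s in segments) == N + 2
```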
Fig. 2: Example bimanual trajectory generated from a blanket stitch: (a) left hand trajectory; (b) right hand trajectory. The trajectory is manually segmented into 9 sections, each of which corresponds to a color.

By adopting these two methods separately, the given trajectories are divided into two sets of segments. Finally, a merge process is carried out to extract the points that exist in both sets and to cluster points in the same spatial region, forming the final set of segmentation points.

Segmentation points are defined at the physical boundaries of motions. This decomposes a trajectory into homogeneous segments, each of which is a continuous and smooth motion comprising only a simple translation and/or rotation. Fig. 2 shows a single cycle of a blanket stitch, segmented into 9 MPs. As illustrated, critical points are found where there is a change in homogeneity. The algorithm aims to find an optimal segmentation in which a trajectory is partitioned into the minimal number of segments while meeting the requirement of segment homogeneity.

B. Spatiotemporal-based Segmentation

We first explore both the spatial and temporal properties of the points in the given trajectories to determine potential segmentation points. This method is inspired by the inherent characteristics of human movements [15]. When performing and combining multiple incoherent motions, humans tend to change motions gradually in the spatiotemporal domain to ensure smooth transitions. The kinematic characteristics of the transition points can differ from those within a segment, and they therefore serve as good cues for segmentation. The movements at the transitions are often reflected in the density of trajectories in space: the gradual transition from one motion to another causes the hand poses to hover in a particular region, which leads to a dense distribution of points within the trajectory. It should be pointed out that not all spatially clustered points represent transition states; clustered points may also come from noise, the presence of other motion primitives, and other transition periods.

To address this, a spatiotemporal-based protocol is proposed to investigate the clustered points that are temporally proximate. For a 6-DoF trajectory, a distance profile is calculated by finding the distance in space between two neighboring points. Considering that translation and rotation are measured in different units, we calculate the distances for the translation and rotation components separately. For the translation component, the Euclidean distance $D_{trans}(t)$ measures the distance between the point at the current instant, $\mathbf{p}(t) = [x(t), y(t), z(t)]$, and the point at the previous instant, $\mathbf{p}(t-1) = [x(t-1), y(t-1), z(t-1)]$. For the rotational component, we first convert the Euler-Rodrigues representation into a quaternion $\mathbf{q}$ to calculate the distance between two 3D rotations. For two 3D rotations that are close enough, the distance between them can be approximated as linear; in this case, the provided trajectories were recorded at 20 Hz. Hence, the distance at a time instant, $D_{rot}(t)$, is calculated from the quaternion $\mathbf{q}(t) = [q_w(t), q_x(t), q_y(t), q_z(t)]$ at the current instant and the quaternion $\mathbf{q}(t-1) = [q_w(t-1), q_x(t-1), q_y(t-1), q_z(t-1)]$ at the previous instant. Eqn. (2) computes the quaternion distance between two neighboring points along the trajectory at time stamp $t$:

$$D_{rot}(t) = \arccos\big(2\,(\mathbf{q}(t) \cdot \mathbf{q}(t-1))^2 - 1\big) \tag{2}$$

We calculate $D_{rot}$ and $D_{trans}$ for the left hand and the right hand trajectories, respectively.
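A minimal sketch of the two distance profiles for one hand, assuming the trajectory is an (N, 6) NumPy array with rotation-vector (Euler-Rodrigues) columns and using SciPy for the quaternion conversion. It is one possible reading of Eqn. (2), not the authors' implementation.

```python
import numpy as np
from scipy.spatial.transform import Rotation

def distance_profiles(traj: np.ndarray):
    """Per-frame translation and rotation distance profiles for one hand.

    traj is (N, 6): columns 0-2 are x, y, z; columns 3-5 are the
    Euler-Rodrigues (rotation-vector) components e_x, e_y, e_z.
    """
    pos, rotvec = traj[:, :3], traj[:, 3:]
    # Euclidean distance between consecutive positions: D_trans(t)
    d_trans = np.linalg.norm(np.diff(pos, axis=0), axis=1)
    # Convert to unit quaternions, then apply Eqn. (2):
    # D_rot(t) = arccos(2 * <q(t), q(t-1)>^2 - 1)
    q = Rotation.from_rotvec(rotvec).as_quat()   # (N, 4), unit norm
    dots = np.sum(q[1:] * q[:-1], axis=1)
    d_rot = np.arccos(np.clip(2.0 * dots**2 - 1.0, -1.0, 1.0))
    return d_trans, d_rot
```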
In total, there are four distance profiles for the two hands. As mentioned, the points belonging to a transition state are more closely clustered and therefore have smaller values in a distance profile, whereas peaks represent motions or segments. The potential segmentation points are consequently found at the corners of the peaks. To minimize the effect of noise present in the data, only peaks whose height and prominence exceed pre-defined thresholds are considered.

At a transition period, there exists a moment with zero velocity and acceleration. Zero velocity crossing is a commonly used method to determine segmentation points, which occur at the zero crossings. Therefore, speed and acceleration profiles derived from the distance profiles of each component are used to further refine the segmentation results. Starting from the segmentation points identified from the distance profiles, each point is examined to check whether it lies at a zero crossing. If it does not, the algorithm searches forward and backward in time to find the nearest point that meets the criterion. This refinement is performed iteratively for each segmentation point and each component. Finally, the segmentation results from the four components are combined to form the final set of the spatiotemporal-based segmentation: a point is selected for the final set if it exists in one of the four segmentation sets and another point exists temporally nearby in another segmentation set, where vicinity is again determined by a pre-defined threshold. The merged points serve as the initial segmentation points of this framework.
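The peak detection and zero-crossing refinement could be sketched as follows, using scipy.signal.find_peaks with height and prominence thresholds and treating the peaks' left/right bases as the "corners" described above. The corner definition, the use of the profile's derivative for the zero-crossing search, and the max_shift window are illustrative assumptions rather than the authors' exact procedure.

```python
import numpy as np
from scipy.signal import find_peaks

def candidate_points(profile: np.ndarray, height: float,
                     prominence: float, max_shift: int = 20) -> np.ndarray:
    """Candidate segmentation points from one distance profile.

    Peaks above the height/prominence thresholds are taken as motion
    segments; their left/right bases approximate the 'corners' where a
    transition begins or ends. Each corner is then shifted to the
    nearest zero crossing of the profile's derivative (a local speed
    extremum), searching at most max_shift samples away.
    """
    _, props = find_peaks(profile, height=height, prominence=prominence)
    corners = np.unique(np.concatenate([props["left_bases"],
                                        props["right_bases"]]))
    accel = np.gradient(profile)                 # derivative of the profile
    zeros = np.where(np.diff(np.signbit(accel)))[0]
    refined = []
    for c in corners:
        if zeros.size > 0:
            z = zeros[np.argmin(np.abs(zeros - c))]
            refined.append(z if abs(z - c) <= max_shift else c)
        else:
            refined.append(c)
    return np.unique(np.asarray(refined, dtype=int))
```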
C. Variance-based Segmentation

Although the spatiotemporal-based segmentation policy is capable of generating a set of segmentation points, mis-segmentations still occur due to noise in the raw trajectories. To solve this, we additionally propose a variance-based segmentation policy that addresses two common problems in task segmentation: noise and frame dependency.

Fig. 3: Illustration of points along a trajectory being projected onto different frames in space.

To address these challenges, the variance-based segmentation transforms the given trajectory from the current coordinate system $E$ to different frames in space before performing segmentation. $N_f$ frames are first randomly selected, and the given bimanual trajectories are projected onto these frames. For each point $\mathbf{p}(t)$ along the trajectory in a new frame $\Sigma_j$ ($j = 1, \dots, N_f$), we define a vector pointing from the origin of $\Sigma_j$ to the point $\mathbf{p}(t)$. Next, the projection angles at time $t$, $\theta_{jx}(t), \theta_{jy}(t), \theta_{jz}(t)$, between this vector and the three axes of $\Sigma_j$ are computed. Let $R(t)$ be the 3D rotation matrix from the local Euler-Rodrigues representation $\boldsymbol{\theta}(t)$ to the new frame $\Sigma_j$; the Euler angles $\phi_{jx}(t), \phi_{jy}(t), \phi_{jz}(t)$ can be derived from $R(t)$. These two sets of angles are calculated for all $N_f$ frames. Finally, we obtain $N_f$ sets of 6-DoF reparameterized trajectories for each hand, where the trajectory with respect to frame $\Sigma_j$ is expressed as $[\theta_{jx}(t), \theta_{jy}(t), \theta_{jz}(t), \phi_{jx}(t), \phi_{jy}(t), \phi_{jz}(t)]$.

For each time step $t$, we calculate the variance of the speed profile of each feature of the trajectory. The trajectories of both hands are transformed to these frames; the changes in the projection angles are obtained from the translation component, and the changes in angular displacement from the rotation component. The variances from the translation components, $Var_{trans}$, and the variances from the rotation components, $Var_{rot}$, calculated in frame $\Sigma_j$ are summarized as follows:

$$Var^j_{trans}(t) = Var_{\theta_{jx}}(t) + Var_{\theta_{jy}}(t) + Var_{\theta_{jz}}(t)$$
$$Var^j_{rot}(t) = Var_{\phi_{jx}}(t) + Var_{\phi_{jy}}(t) + Var_{\phi_{jz}}(t) \tag{3}$$

where $j = 1, \dots, N_f$. The variance profile has a similar prope
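In the spirit of Eqn. (3), the variance computation might look like the sketch below. The random frame placement, the sliding-window variance of the per-feature speed profiles, and the window size are all assumptions made for illustration; the paper's exact procedure for selecting the $N_f$ frames is not reproduced here.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view
from scipy.spatial.transform import Rotation

def windowed_var(x: np.ndarray, window: int) -> np.ndarray:
    """Centered sliding-window variance along axis 0 (edge-padded)."""
    pad = window // 2
    xp = np.pad(x, ((pad, pad), (0, 0)), mode="edge")
    return sliding_window_view(xp, window, axis=0).var(axis=-1)

def variance_profiles(traj: np.ndarray, n_frames: int = 10,
                      window: int = 11, seed: int = 0):
    """Var_trans and Var_rot profiles in the spirit of Eqn. (3).

    traj is (N, 6): x, y, z followed by Euler-Rodrigues components.
    For each random frame, positions are re-expressed as the angles
    between the origin-to-point vector and the frame axes, and the
    orientation as Euler angles relative to the frame; the sliding
    variance of each feature's speed profile is summed per Eqn. (3).
    """
    rng = np.random.default_rng(seed)
    pos, rot = traj[:, :3], Rotation.from_rotvec(traj[:, 3:])
    var_trans, var_rot = [], []
    for _ in range(n_frames):
        origin = rng.uniform(-1.0, 1.0, size=3)              # random frame origin
        frame = Rotation.from_rotvec(rng.uniform(-np.pi, np.pi, 3))
        v = (pos - origin) @ frame.as_matrix()               # vector in frame axes
        cosines = v / np.linalg.norm(v, axis=1, keepdims=True)
        theta = np.arccos(np.clip(cosines, -1.0, 1.0))       # projection angles
        phi = (frame.inv() * rot).as_euler("xyz")            # relative Euler angles
        var_trans.append(windowed_var(np.diff(theta, axis=0), window).sum(axis=1))
        var_rot.append(windowed_var(np.diff(phi, axis=0), window).sum(axis=1))
    # one (N-1,) profile per frame j, stacked as (n_frames, N-1)
    return np.stack(var_trans), np.stack(var_rot)
```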
