High-dimensional Motion Segmentation by Variational Autoencoder and Gaussian Processes

Masatoshi Nagano (1), Tomoaki Nakamura (1), Takayuki Nagai (2), Daichi Mochihashi (3), Ichiro Kobayashi (4), and Wataru Takano (2)

(1) Department of Mechanical Engineering and Intelligent Systems, The University of Electro-Communications, Chofu-shi, Japan
(2) Department of Systems Innovation, Osaka University, Toyonaka-shi, Japan
(3) Department of Statistical Inference and Mathematics, The Institute of Statistical Mathematics, Tachikawa, Japan
(4) Department of Information Sciences, Ochanomizu University, Bunkyo-ku, Japan
Correspondence: n1832072@edu.cc.uec.ac.jp

Abstract: Humans perceive continuous high-dimensional information by dividing it into significant segments such as words and units of motion. We believe that such unsupervised segmentation is also important for robots if they are to learn topics such as language and motion. To this end, we previously proposed a hierarchical Dirichlet process-Gaussian process hidden semi-Markov model (HDP-GP-HSMM). An important drawback of this model, however, is that it cannot divide high-dimensional time-series data: low-dimensional features must be extracted in advance. Segmentation largely depends on the design of these features, and it is difficult to design effective features, especially for high-dimensional data. To overcome this problem, this paper proposes a hierarchical Dirichlet process-variational autoencoder-Gaussian process hidden semi-Markov model (HVGH). The parameters of the proposed HVGH are estimated through a mutual learning loop of the variational autoencoder and our previously proposed HDP-GP-HSMM. Hence, HVGH can extract features from high-dimensional time-series data while simultaneously dividing the data into segments in an unsupervised manner. In experiments, we used various motion-capture data to show that our proposed model estimates the correct number of classes and produces more accurate segments than baseline methods. Moreover, we show that the proposed method learns a latent space suitable for segmentation.

I. INTRODUCTION

Humans perceive continuous high-dimensional information by dividing it into significant segments such as words and units of motion. For example, we recognize words by segmenting speech waves, and we perceive particular motions by segmenting continuous motion. Humans learn words and motions by appropriately segmenting continuous information without explicit segmentation points. We believe that such unsupervised segmentation is also important for robots, in order for them to learn language and motion.

To this end, we previously proposed a hierarchical Dirichlet process-Gaussian process hidden semi-Markov model (HDP-GP-HSMM) [1]. HDP-GP-HSMM is a nonparametric Bayesian model: a hidden semi-Markov model whose emission distributions are Gaussian processes, which makes it possible to segment time-series data in an unsupervised manner. In this model, segments are represented continuously using a Gaussian process. Moreover, the number of segment classes can be estimated using hierarchical Dirichlet processes [2]. The Dirichlet processes assume an infinite number of classes, but only a finite number of classes are actually used; this is accomplished by stochastically truncating the number of classes with a slice sampler [3].

[Fig. 1. Overview of the generative process of HVGH. High-dimensional time-series data (observed value over time) are mapped by the VAE encoder and decoder to a latent space, where classes are represented by Gaussian processes (GP 1, GP 2, GP 3).]
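To make this truncation idea concrete, the following minimal sketch samples mixture weights by the stick-breaking construction and stops once the remaining stick mass becomes negligible. The function name, the concentration value `gamma_`, and the `eps` threshold are illustrative choices, and the simple mass threshold is only a stand-in for the stochastic truncation that the slice sampler [3] performs with auxiliary variables.

```python
import numpy as np

def stick_breaking(gamma_, eps=1e-3, max_classes=1000, rng=None):
    """Sample mixture weights beta ~ GEM(gamma_) by stick breaking,
    stopping once the remaining stick mass falls below eps (a simple
    stand-in for the slice sampler's stochastic truncation)."""
    rng = np.random.default_rng() if rng is None else rng
    betas, remaining = [], 1.0
    for _ in range(max_classes):
        v = rng.beta(1.0, gamma_)        # v_k ~ Beta(1, gamma)
        betas.append(remaining * v)      # beta_k = v_k * prod_{i<k}(1 - v_i)
        remaining *= 1.0 - v
        if remaining < eps:              # classes beyond this point carry
            break                        # negligible weight
    return np.asarray(betas)

weights = stick_breaking(gamma_=2.0)
print(len(weights), weights.sum())       # a finite number of classes in practice
```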
However, our HDP-GP-HSMM cannot deal with high-dimensional data, and feature extraction is needed in order to reduce the dimensionality in advance. Indeed, segmentation largely depends on this feature extraction, and it is difficult to extract effective features, especially in the case of high-dimensional data. To overcome this problem, this paper introduces a variational autoencoder (VAE) [4] into the HDP-GP-HSMM. The model we propose in this paper is thus a hierarchical Dirichlet process-variational autoencoder-Gaussian process hidden semi-Markov model (HVGH). Fig. 1 shows an overview of HVGH. The observation sequence is compressed and converted into a latent variable sequence by the VAE, and the latent variable sequence is divided into segments by the HDP-GP-HSMM. Furthermore, parameters learned by the HDP-GP-HSMM are used as the hyperparameters of the VAE's prior. In this way, the parameters are optimized in a mutual learning loop, and a latent space appropriate for segmentation can be learned by the VAE. In experiments, we evaluated the efficiency of the proposed HVGH using real motion-capture datasets. The experimental results show that HVGH achieves higher accuracy than baseline methods.

Many studies on unsupervised motion segmentation have been conducted. However, many of them rely on heuristic assumptions [5], [6], [7]. Moreover, some methods use motion features such as the zero velocity of joint angles [8], [9], [10]; this assumption usually leads to over-segmentation [6]. Furthermore, studies have proposed methods of detecting change points in time-series data in an unsupervised manner [11], [12], [13], [14]. These methods find points whose fluctuations differ from previous observations; however, change points do not necessarily indicate segment boundaries.

In some studies, segmentation is formulated stochastically using hidden Markov models (HMMs) [15], [16], [17], [18]. However, it is difficult for HMMs to represent complicated motion patterns. Instead, we use Gaussian processes, a type of non-parametric model that can represent complicated time-series data better than HMMs. We confirmed that our GP-based model can estimate segments more accurately than HMM-based methods [1]. In some studies, the number of classes is estimated by introducing a hierarchical Dirichlet process (HDP) into an HMM [15], [19]. An HDP is a method of estimating the number of classes by assuming an infinite number of classes. Fox et al. extended the HDP-HMM into the so-called sticky HDP-HMM, which prevents over-segmentation by increasing the self-transition probability [19]. Among methods that combine statistical models with neural networks, a method of classifying complicated data using a GMM and a VAE has been proposed [20]. By contrast, our proposed HVGH combines a statistical model with a VAE for segmenting high-dimensional time-series data.

II. HIERARCHICAL DIRICHLET PROCESS-VARIATIONAL AUTOENCODER-GAUSSIAN PROCESS HIDDEN SEMI-MARKOV MODEL (HVGH)

[Fig. 2. Graphical model of HVGH. The white nodes represent unobserved variables, and the gray node represents the high-dimensional observed sequence that is obtained by concatenating segments.]

Fig. 2 shows a graphical model of our proposed HVGH, which is a generative model of time-series data.
In this model, $c_j$ $(j = 1, 2, \cdots)$ denotes the class of the $j$-th segment, where the number of classes is assumed to be countably infinite. The transition probability $\pi_c$ is generated from a Dirichlet process parameterized by $\eta$ and a base measure $\beta$, and $\beta$ is generated from the GEM distribution [21] by the so-called stick-breaking process (SBP) [22]:

$$\beta \sim \mathrm{GEM}(\gamma), \qquad (1)$$
$$\pi_c \sim \mathrm{DP}(\eta, \beta). \qquad (2)$$

The class $c_j$ of the $j$-th segment is determined by the class of the $(j-1)$-th segment and the transition probability $\pi_c$. The segment of latent variables $\mathbf{Z}_j$ is generated by a Gaussian process whose parameter is $\mathbf{Z}_c$, as follows:

$$c_j \sim P(c \mid c_{j-1}, \pi_c), \qquad (3)$$
$$\mathbf{Z}_j \sim \mathrm{GP}(\mathbf{Z} \mid \mathbf{Z}_{c_j}), \qquad (4)$$

where $\mathbf{Z}_c$ denotes the set of latent-variable segments that are classified into class $c$. The segment $\mathbf{X}_j$ is generated from the segment of latent variables $\mathbf{Z}_j$ by the decoder $p_{\mathrm{dec}}$ of the VAE:

$$\mathbf{X}_j \sim p_{\mathrm{dec}}(\mathbf{X} \mid \mathbf{Z}_j). \qquad (5)$$

The observed sequence $s = \mathbf{X}_1, \mathbf{X}_2, \cdots, \mathbf{X}_J$ is assumed to be generated by connecting segments $\mathbf{X}_j$ sampled by the above generative process. Similarly, the sequence of latent variables $\bar{s} = \mathbf{Z}_1, \mathbf{Z}_2, \cdots, \mathbf{Z}_J$ is obtained by connecting the latent-variable segments $\mathbf{Z}_j$. In this paper, the $i$-th data point in $\mathbf{X}_j$ is written as $x_{ji}$, and the $i$-th data point in $\mathbf{Z}_j$ as $z_{ji}$; where the meaning is obvious, we omit the subscripts.

A. Hierarchical Dirichlet Processes (HDP)

In this study, the number of segment classes is estimated by utilizing a non-parametric Bayesian model. In a non-parametric Bayesian model, an infinite number of classes is assumed, and the classes are sampled from an infinite-dimensional multinomial distribution. To realize this, an infinite-dimensional multinomial distribution must be constructed, and one method of doing so is the SBP:

$$v_k \sim \mathrm{Beta}(1, \gamma) \quad (k = 1, \cdots, \infty), \qquad (6)$$
$$\beta_k = v_k \prod_{i=1}^{k-1} (1 - v_i) \quad (k = 1, \cdots, \infty). \qquad (7)$$

This process is written as $\beta \sim \mathrm{GEM}(\gamma)$, where "GEM" stands for the co-authors Griffiths, Engen, and McCloskey. In models such as the HMM, all states must share the same set of destination states while each state has its own probabilities of transitioning to them. To construct such a distribution, the distribution $\beta$ generated by the SBP is shared by all states as a base measure, and the transition probabilities $\pi_c$, which differ for each state $c$, are generated by another Dirichlet process. This construction of a probability distribution by a two-phase Dirichlet process is called an HDP:

$$\pi_c \sim \mathrm{DP}(\eta, \beta). \qquad (8)$$

B. Gaussian Process (GP)

In this paper, each class represents a continuous trajectory by learning the emission $z_i$ at time step $i$ using a Gaussian process. Given the pairs $(i, t)$ of time steps and emissions that are classified into class $c$, the predictive distribution of $z_{\mathrm{new}}$ at time step $i_{\mathrm{new}}$ becomes a Gaussian distribution with parameters $\mu$ and $\sigma^2$:

$$p(z_{\mathrm{new}} \mid i_{\mathrm{new}}, \mathbf{Z}_c, \mathbf{i}) = \mathcal{N}(z \mid \mu, \sigma^2), \qquad (9)$$
$$\mu = \mathbf{k}^T \mathbf{C}^{-1} \mathbf{t}, \qquad (10)$$
$$\sigma^2 = c - \mathbf{k}^T \mathbf{C}^{-1} \mathbf{k}, \qquad (11)$$

where $\mathbf{t}$ and $\mathbf{i}$ are the vectors in which the latent variables $z_i$ in $\mathbf{Z}_c$ and their time steps $i$ are respectively arranged. Here, $k(\cdot, \cdot)$ denotes the kernel function, and $\mathbf{C}$ is a matrix whose elements are

$$C(i_p, i_q) = k(i_p, i_q) + \beta^{-1} \delta_{pq}, \qquad (12)$$

where $\beta$ denotes a hyperparameter that represents noise in the observations, $\mathbf{k}$ is a vector whose elements are $k(i_p, i_{\mathrm{new}})$, and $c$ is $k(i_{\mathrm{new}}, i_{\mathrm{new}})$. A Gaussian process can represent complicated time-series data owing to the kernel function. In this paper, we used the following kernel function, which is commonly used for Gaussian processes:

$$k(i_p, i_q) = \theta_0 \exp\left(-\frac{1}{2}\theta_1 |i_p - i_q|^2\right) + \theta_2 + \theta_3 i_p i_q, \qquad (13)$$

where $\theta_*$ denotes the parameters of the kernel.
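To make Eqs. (9)-(13) concrete, the sketch below computes the predictive mean and variance of Eqs. (10) and (11) for a single latent dimension, using the kernel of Eq. (13). The hyperparameter values `theta` and `beta_inv` are placeholders, not the values used in the paper.

```python
import numpy as np

def kernel(ip, iq, theta=(1.0, 1.0, 0.0, 0.0)):
    """Kernel of Eq. (13): theta_0 * exp(-0.5 * theta_1 * |ip - iq|^2)
    + theta_2 + theta_3 * ip * iq (theta values here are placeholders)."""
    t0, t1, t2, t3 = theta
    return t0 * np.exp(-0.5 * t1 * (ip - iq) ** 2) + t2 + t3 * ip * iq

def gp_predict(i_train, t_train, i_new, beta_inv=0.1, theta=(1.0, 1.0, 0.0, 0.0)):
    """Predictive mean/variance of Eqs. (9)-(11) for one latent dimension.

    i_train: time steps of latent variables already assigned to class c
    t_train: their latent values; beta_inv: observation-noise hyperparameter."""
    i_train = np.asarray(i_train, dtype=float)
    t_train = np.asarray(t_train, dtype=float)
    # Gram matrix C of Eq. (12): kernel values plus noise on the diagonal
    C = kernel(i_train[:, None], i_train[None, :], theta) \
        + beta_inv * np.eye(len(i_train))
    k = kernel(i_train, i_new, theta)     # vector of k(i_p, i_new)
    c = kernel(i_new, i_new, theta)       # scalar k(i_new, i_new)
    mu = k @ np.linalg.solve(C, t_train)  # Eq. (10): mu = k^T C^{-1} t
    var = c - k @ np.linalg.solve(C, k)   # Eq. (11): sigma^2 = c - k^T C^{-1} k
    return mu, var

mu, var = gp_predict([0, 1, 2, 3], [0.0, 0.8, 0.9, 0.1], i_new=1.5)
```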
Additionally, if the observations are composed of multi-dimensional vectors, we assume that each dimension is generated independently. Therefore, the predictive probability $\mathrm{GP}(\mathbf{z} \mid \mathbf{Z}_c)$ that the emission $\mathbf{z} = (z_1, z_2, \cdots)$ at time step $i$ is generated by the Gaussian process of class $c$ is computed as follows:

$$\mathrm{GP}(\mathbf{z} \mid \mathbf{Z}_c) = p(z_1 \mid i, \mathbf{Z}_{c,1}, \mathbf{i})\, p(z_2 \mid i, \mathbf{Z}_{c,2}, \mathbf{i})\, p(z_3 \mid i, \mathbf{Z}_{c,3}, \mathbf{i}) \cdots \qquad (14)$$
$$= \mathcal{N}(z_1 \mid \mu_1, \sigma_1^2)\, \mathcal{N}(z_2 \mid \mu_2, \sigma_2^2)\, \mathcal{N}(z_3 \mid \mu_3, \sigma_3^2) \cdots. \qquad (15)$$

By using this probability, the latent variables can be classified into the classes. Moreover, because each dimension is generated independently, the mean vector $\mu_c(i)$ and the variance-covariance matrix $\Sigma_c(i)$ of $\mathrm{GP}(z_{ji} \mid \mathbf{Z}_c)$ are represented as follows:

$$\mu_c(i) = (\mu_1, \mu_2, \mu_3, \cdots), \qquad (16)$$
$$\Sigma_c(i) = \begin{pmatrix} \sigma_1^2 & 0 & 0 & \cdots \\ 0 & \sigma_2^2 & 0 & \cdots \\ 0 & 0 & \sigma_3^2 & \cdots \\ \vdots & \vdots & \vdots & \ddots \end{pmatrix}, \qquad (17)$$

where $(\mu_1, \mu_2, \mu_3, \cdots)$ and $(\sigma_1^2, \sigma_2^2, \sigma_3^2, \cdots)$ respectively represent the mean and the variance of each dimension. HVGH is a model in which the VAE and the GPs influence each other mutually, by using $\mu_c(i)$ and $\Sigma_c(i)$ as the parameters of the prior distribution of the VAE.

[Fig. 3. Overview of the variational autoencoder (VAE). The low-dimensional latent variable $z_j$ is obtained by compressing the observed data point $x_j$ through the encoder network.]

C. Overview of the Variational Autoencoder

In this paper, we compress a high-dimensional time-series observation into low-dimensional latent variables using the VAE [4]. The VAE is a neural network that can learn the correspondence between a high-dimensional observation $x$ and a latent variable $z$. Fig. 3 shows an overview of the VAE. A Gaussian distribution with mean $\mu_{\mathrm{enc}}(x)$ and variance $\Sigma_{\mathrm{enc}}(x)$, estimated from the input $x$ by the encoder network, is used as $q_{\mathrm{enc}}(z)$:

$$q_{\mathrm{enc}}(z) = \mathcal{N}(z \mid \mu_{\mathrm{enc}}(x), \Sigma_{\mathrm{enc}}(x)). \qquad (18)$$

The latent variable $z$ is stochastically determined by this distribution, and $\tilde{x}$ is generated through the decoder network $p_{\mathrm{dec}}$:

$$z \sim q_{\mathrm{enc}}(z), \qquad (19)$$
$$\tilde{x} \sim p_{\mathrm{dec}}(x \mid z). \qquad (20)$$

The parameters of the encoder and decoder are determined so as to maximize the likelihood by the variational Bayesian method. Generally, the prior distribution of the VAE is a Gaussian distribution whose mean is the zero vector $\mathbf{0}$ and whose variance-covariance matrix is the identity matrix $\mathbf{I}$. By contrast, HVGH uses a Gaussian distribution whose parameters are the $\mu_c(i)$ and $\Sigma_c(i)$ of the class $c$ into which $z_{ji}$ is classified. As a result, a latent space suitable for segmentation can be constructed. Using this VAE, a sequence of observations $s = \mathbf{X}_1, \mathbf{X}_2, \cdots, \mathbf{X}_J$ is converted through the encoder into a sequence of latent variables $\bar{s} = \mathbf{Z}_1, \mathbf{Z}_2, \cdots, \mathbf{Z}_J$.

III. PARAMETER INFERENCE

Fig. 4 shows an outline of parameter estimation for HVGH. A sequence of observations $s$ is converted into a sequence of latent variables $\bar{s}$ by the VAE. Then, the HDP-GP-HSMM divides the sequence of latent variables $\bar{s}$ into latent-variable segments $\mathbf{Z}_1, \mathbf{Z}_2, \cdots$, and the parameters $\mu_c(i)$ and $\Sigma_c(i)$ of the predictive distribution of $z$ are computed. This predictive distribution is used as the prior distribution of the VAE. Thus, the parameters of the VAE and the HDP-GP-HSMM are mutually optimized.

[Fig. 4. Overview of parameter estimation for HVGH. The parameters are learned by a mutual learning loop of the VAE and the HDP-GP-HSMM.]
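The alternation in Fig. 4 can be summarized in a schematic sketch. The objects `vae` and `hsmm` and every method name below are hypothetical stand-ins for the components described in Secs. II-C and III-A, not an actual API; this is an outline of the loop, not an implementation of the paper's system.

```python
def train_hvgh(sequences, vae, hsmm, n_iters=10):
    """Schematic of the mutual learning loop in Fig. 4.

    `vae` and `hsmm` are hypothetical objects standing in for the VAE of
    Sec. II-C and the HDP-GP-HSMM of Sec. III-A; all method names are
    illustrative."""
    priors = None  # first pass: standard N(0, I) prior for every data point
    for _ in range(n_iters):
        # 1. Train the VAE under the current class-wise Gaussian priors
        #    and compress each observed sequence s into a latent one.
        vae.fit(sequences, priors=priors)
        latents = [vae.encode(s) for s in sequences]
        # 2. Segment and classify the latent sequences with the blocked
        #    Gibbs sampler of the HDP-GP-HSMM (Eq. (21)).
        segments, classes = hsmm.sample_segments_and_classes(latents)
        # 3. Recompute the GP predictive parameters mu_c(i), Sigma_c(i)
        #    and feed them back as the VAE prior (Eq. (23)).
        priors = hsmm.predictive_parameters(segments, classes)
    return segments, classes
```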
A. Parameter Inference of HDP-GP-HSMM

The parameters of the HDP-GP-HSMM are estimated in the same manner as in our previous work [1]; here, we explain the parameter inference briefly and refer the reader to [1] for details. In the HDP-GP-HSMM, segments and classes of latent variables are determined by sampling. For efficient estimation, we utilize a blocked Gibbs sampler [23], in which all segments and their classes in one observed sequence are sampled at once. The latent-variable segments and their classes are sampled as follows:

$$(\mathbf{Z}_{n,1}, \cdots, \mathbf{Z}_{n,J_n}), (c_{n,1}, \cdots, c_{n,J_n}) \sim p((\mathbf{Z}_1, \mathbf{Z}_2, \cdots), (c_1, c_2, \cdots) \mid \bar{s}_n). \qquad (21)$$

The parameters of the Gaussian process of each class $c$ and the transition probabilities $P(c \mid c')$ are updated using the sampled segments and their classes. However, it is difficult to compute Eq. (21) directly because an infinite number of classes is assumed. To overcome this problem, we use a slice sampler, which computes these probabilities by stochastically truncating the number of classes. Moreover, Eq. (21) requires the probabilities of all possible patterns of segmentation and classification, which cannot be computed naively owing to the large computational cost. To compute Eq. (21) efficiently, we utilize forward filtering-backward sampling [24].

B. Parameter Inference of the VAE

The parameters of the encoder and decoder of the VAE are estimated so as to maximize the likelihood $p(x)$. However, it is difficult to maximize the likelihood directly. Instead, the standard VAE maximizes the following variational lower bound:

$$\mathcal{L}(x_{ji}, z_{ji}) = \int q_{\mathrm{enc}}(z_{ji} \mid x_{ji}) \log p_{\mathrm{dec}}(x_{ji} \mid z_{ji})\, dz_{ji} - D_{\mathrm{KL}}(q_{\mathrm{enc}}(z_{ji} \mid x_{ji}) \,\|\, p(z_{ji} \mid \mathbf{0}, \mathbf{I})), \qquad (22)$$

where $\int q_{\mathrm{enc}}(z_{ji} \mid x_{ji}) \log p_{\mathrm{dec}}(x_{ji} \mid z_{ji})\, dz_{ji}$ represents the reconstruction error, and $p(z_{ji} \mid \mathbf{0}, \mathbf{I})$ is the prior distribution of $z_{ji}$: a Gaussian distribution whose mean is $\mathbf{0}$ and whose variance-covariance matrix is the identity matrix $\mathbf{I}$. The term $D_{\mathrm{KL}}(q_{\mathrm{enc}}(z_{ji} \mid x_{ji}) \,\|\, p(z_{ji} \mid \mathbf{0}, \mathbf{I}))$ is the Kullback-Leibler divergence, which works as a regularization term. In HVGH, by contrast, the mean $\mu_c(i)$ and the variance-covariance matrix $\Sigma_c(i)$ are used as the parameters of the prior distribution. These are the parameters of the predictive distribution of the class $c$ into which $z_{ji}$ is classified, and they are estimated by the HDP-GP-HSMM:

$$\mathcal{L}(x_{ji}, z_{ji}) = \int q_{\mathrm{enc}}(z_{ji} \mid x_{ji}) \log p_{\mathrm{dec}}(x_{ji} \mid z_{ji})\, dz_{ji} - D_{\mathrm{KL}}(q_{\mathrm{enc}}(z_{ji} \mid x_{ji}) \,\|\, p(z_{ji} \mid \mu_c(i), \Sigma_c(i))). \qquad (23)$$

[Fig. 5. Influence of the prior distribution. The blue region represents the standard deviation. (a) In the standard VAE, the same prior distribution is used for all data points. (b) In the VAE of HVGH, the prior varies depending on the time step and the class of the data point.]

Fig. 5 illustrates the difference in the prior distribution between Eq. (22) and Eq. (23). In the standard VAE using Eq. (22), the prior distribution is $\mathcal{N}(\mathbf{0}, \mathbf{I})$ for all data points, as shown in Fig. 5(a). In HVGH, by contrast, the parameters of the prior distribution are computed by the Gaussian process, as shown in Fig. 5(b). Because the GP constrains data points at closer time steps to take more similar values, $z_{ji}$ takes a value similar to $z_{j,i-1}$ and $z_{j,i+1}$. Therefore, the latent space learned by the VAE can reflect the characteristics of time-series data. Moreover, these parameters take different values depending on the class of the data point, so the latent space can also reflect the characteristics of each class.
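Because both the approximate posterior and the prior in Eqs. (22) and (23) are Gaussians with diagonal covariance, the KL term has the standard closed form, and the only change from Eq. (22) to Eq. (23) is which mean and variance enter it. The sketch below shows that swap; the numbers are illustrative, not from the paper.

```python
import numpy as np

def kl_diag_gaussians(mu_q, var_q, mu_p, var_p):
    """Closed-form KL(q || p) between diagonal Gaussians: the
    regularization term of Eqs. (22) and (23)."""
    return 0.5 * np.sum(np.log(var_p / var_q)
                        + (var_q + (mu_q - mu_p) ** 2) / var_p
                        - 1.0)

# Encoder output q_enc(z | x) for one data point (illustrative values).
mu_q, var_q = np.array([0.5, -0.2]), np.array([0.3, 0.8])

# Standard VAE, Eq. (22): prior N(0, I) for every data point.
kl_standard = kl_diag_gaussians(mu_q, var_q, np.zeros(2), np.ones(2))

# HVGH, Eq. (23): prior N(mu_c(i), Sigma_c(i)) from the GP of the class
# that z_ji is assigned to (values below are illustrative).
mu_c_i, var_c_i = np.array([0.4, -0.1]), np.array([0.2, 0.5])
kl_hvgh = kl_diag_gaussians(mu_q, var_q, mu_c_i, var_c_i)
```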
IV. EXPERIMENTS

To validate the proposed HVGH, we applied it to several types of time-series data. For comparison, we used HDP-GP-HSMM [1], HDP-HMM [15], HDP-HMM+NPYLM [16], BP-HMM [17], and AutoPlait [18] as baseline methods.

A. Datasets

To evaluate the validity of the proposed method, we used the following four motion-capture datasets.

Chicken dance: We used a sequence of motion-capture data of a human performing a chicken dance from the CMU Graphics Lab Motion Capture Database.
