Learning Low-Dimensional State Embeddings and Metastable Clusters from Time Series Data

Yifan Sun (Carnegie Mellon University), Yaqi Duan (Princeton University), Hao Gong (Princeton University), Mengdi Wang (Princeton University)

Abstract

This paper studies how to find compact state embeddings from high-dimensional Markov state trajectories, where the transition kernel has a small intrinsic rank. In the spirit of the diffusion map, we propose an efficient method for learning a low-dimensional state embedding that captures the process's dynamics. This idea also leads to a kernel reshaping method for more accurate nonparametric estimation of the transition function. The state embedding can be used to cluster states into metastable sets, thereby identifying the slow dynamics. Sharp statistical error bounds and misclassification rates are proved. An experiment on a simulated dynamical system shows that the state clustering method indeed reveals metastable structures. We also experiment with time series generated by the layers of a Deep-Q-Network playing an Atari game. The embedding method identifies game states as similar if they share similar future events, even though their raw data are far different.

1 Introduction

High-dimensional time series are ubiquitous in scientific studies and machine learning. Finding a compact representation from state-transition trajectories is often a prerequisite for uncovering the underlying physics and making accurate predictions. Suppose that we are given a Markov process $\{X_t\}$ taking values in $\Omega \subseteq \mathbb{R}^d$. Let $p(y|x)$ be the one-step transition density function (transition kernel) of the Markov process. In practice, state-transition trajectories may appear high-dimensional, but they are often generated by a system with fewer internal parameters and small intrinsic dimension.
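In the finite-state case, a transition kernel generated by a few internal parameters is simply a stochastic matrix that factors through $r$ latent factors. The following toy sketch (our own illustration, not from the paper; all sizes are arbitrary) builds such a low-rank transition matrix and checks its rank:

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, r = 50, 3  # 50 observed states, intrinsic rank 3

# Factor the kernel as p(y|x) = sum_k u_k(x) v_k(y) with nonnegative factors.
U = rng.random((n_states, r))
U /= U.sum(axis=1, keepdims=True)   # rows of U: distributions over latent factors
V = rng.random((r, n_states))
V /= V.sum(axis=1, keepdims=True)   # rows of V: distributions over next states

P = U @ V                           # row-stochastic transition matrix
assert np.allclose(P.sum(axis=1), 1.0)
print(np.linalg.matrix_rank(P))     # 3: the kernel has intrinsic rank r
```

Although P has 50 x 50 entries, its rank equals the number of latent factors, which is the structure the methods below exploit.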
In this paper, we focus on problems where the transition kernel $p(y|x)$ admits a low-rank decomposition structure. The low-rank or nearly low-rank nature of the transition kernel has been widely identified in scientific and engineering applications, e.g., molecular dynamics [RZMC11, SS13], periodized diffusion processes [Gar54], traffic transition data [ZW18, DKW19], and Markov decision processes and reinforcement learning [KAL16]. For reversible dynamical systems, the leading eigenfunctions of $p$ are related to metastable sets and slow dynamics [SS13]. Low-rank latent structures also help state representation learning and dimension reduction in robotics and control [BSB+15].

33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada.

Our goal is to estimate the transition kernel $p(\cdot\,|\,\cdot)$ from a finite time series and find state representations in lower dimensions. For nonparametric estimation of probability distributions, one natural approach is the kernel mean embedding (KME). Our approach starts with a kernel space, but we "open up" the kernel function into a set of features. We leverage the low-rankness of $p$ in the spirit of the diffusion map for dimension reduction. By using samples of transition pairs $(X_t, X_{t+1})$, we can estimate the "projection" of $p$ onto the product feature space and find its leading singular functions. This allows us to learn state embeddings that preserve information about the transition dynamics. Our approach can be thought of as a generalization of the diffusion map to nonreversible processes and Hilbert spaces. We show that, when the features can fully express the true $p$, the estimated state embeddings preserve the diffusion distances and can be further used to cluster states that share similar future paths, thereby finding metastable sets and long-term dynamics of the process.
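The notion of "states that share similar future paths" can be made concrete in a small finite-state example (ours, not the paper's): the diffusion distance between two states is the distance between their next-state distributions, so states with nearly identical futures are close even if their labels differ.

```python
import numpy as np

# A 4-state chain with two "blocks": states 0,1 mostly stay in {0,1},
# states 2,3 mostly stay in {2,3}.
P = np.array([
    [0.45, 0.45, 0.05, 0.05],
    [0.40, 0.50, 0.05, 0.05],
    [0.05, 0.05, 0.50, 0.40],
    [0.05, 0.05, 0.45, 0.45],
])

def diffusion_dist(P, x, y):
    """L2 distance between the next-state distributions of states x and y."""
    return np.linalg.norm(P[x] - P[y])

print(diffusion_dist(P, 0, 1))  # small: same block, similar futures
print(diffusion_dist(P, 0, 2))  # large: different blocks
assert diffusion_dist(P, 0, 1) < diffusion_dist(P, 0, 2)
```

Clustering by this distance recovers the two blocks, which is exactly the metastable structure discussed below.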
The contributions of this paper are:

1. KME reshaping for more accurate estimation of $p$. The method of KME reshaping is proposed to estimate $p$ from dependent time series data. The method takes advantage of the low-rank structure of $p$ and can be implemented efficiently in compact space. Theorem 1 gives a finite-sample error bound and shows that KME reshaping achieves significantly smaller error than the plain KME.

2. State embedding learning with a statistical distortion guarantee. In light of the diffusion map, we study state embedding by estimating the leading spectrum of the transition kernel. Theorems 2 and 3 show that the state embedding largely preserves the diffusion distance.

3. State clustering with a misclassification error bound. Based on the state embeddings, we can further aggregate states to preserve the transition dynamics and find metastable sets. Theorem 4 establishes a statistical misclassification guarantee for continuous-state Markov processes.

4. Experiments with a diffusion process and an Atari game. The first experiment studies a simulated stochastic diffusion process, where the results validate the theoretical bounds and reveal metastable structures of the process. The second experiment studies the time series generated by a deep Q-network (DQN) trained on an Atari game. The raw time series is read from the last hidden layer as the DQN is run. The state embedding results demonstrate distinctive and interpretable clusters of game states. Remarkably, we observe that game states that are close in the embedding space share similar future moves, even if their raw data are far different.

To the best of our knowledge, our theoretical results on estimating $p$ and on state embedding are the first of their kind for continuous-state nonreversible Markov time series. Our methods and analyses leverage spectral properties of the transition kernel. We also provide the first statistical guarantee for partitioning a continuous state space according to the diffusion distance.

Related Work

Spectral dimension reduction methods find wide use in data analysis and scientific computing. The diffusion map is a prominent dimension reduction tool that applies to data analysis, graph partitioning, and dynamical systems [LL06, CKL+08]. For molecular dynamics, [SNL+11] showed that the leading spectrum of the transition operator contains information on the slow dynamics of the system, and that it can be used to identify coresets upon which a coarse-grained Markov state model can be built. [KSM18] extended transfer operator theory to reproducing kernel spaces and pointed out that these operators are related to conditional mean embeddings of the transition distributions. See [KKS16, KNK+18] for surveys on data-driven dimension reduction methods for dynamical systems. These works did not study the statistical properties of the methods, which motivated our research. Nonparametric estimation of the Markov
transition operator has been thoroughly studied; see [Yak79, Lac07, Sar14]. Among nonparametric methods, kernel mean embeddings are prominent for representing probability distributions [BTA04, SGSS07]. [SHSF09] extended kernel embedding methods to conditional distributions. [GLB+12] proposed to use conditional mean embeddings to model Markov decision processes. See [MFSS17] for a survey on kernel mean embeddings. To the best of our knowledge, none of these works considered low-rank estimation of the Markov transition kernel.

Estimation of a low-rank transition kernel was first considered by [ZW18] in the special case of finite-state Markov chains. [ZW18] used a singular-value thresholding method to estimate the transition matrix and proved near-optimal upper and lower error bounds. They also proved a misclassification rate for state clustering when the chain is lumpable or aggregatable. [LWZ18] studied a rank-constrained maximum likelihood estimator of the transition matrix. [DKW19] proposed a novel approach for finding state aggregations by spectrally decomposing the transition matrix and transforming its singular vectors. For continuous-state reversible Markov chains, [LP18] studied nonparametric estimation of the transition kernel
via Galerkin projection with spectral thresholding. They proved recovery error bounds when the eigenvalues decay exponentially.

Notations. For a function $f:\Omega\to\mathbb{R}$, we define $\|f\|_{L^2}^2 := \int f(x)^2\,dx$ and $\|f\|_{L^2(\pi)}^2 := \int \pi(x)f(x)^2\,dx$, respectively. For $g(\cdot,\cdot):\Omega\times\Omega\to\mathbb{R}$, we define $\|g(\cdot,\cdot)\|_{L^2(\pi)\times L^2} := \big(\int\!\!\int \pi(x)\,g(x,y)^2\,dy\,dx\big)^{1/2}$. We use $\|\cdot\|$ to denote the Euclidean norm of a vector. We let $t_{\mathrm{mix}}$ denote the mixing time of the Markov process [LP17], i.e., $t_{\mathrm{mix}} = \min\big\{t \mid \mathrm{TV}\big(P^t(\cdot \mid x),\,\pi(\cdot)\big) \le \tfrac14,\ \forall x\big\}$, where $\mathrm{TV}$ is the total variation divergence between two distributions. Let $\pi(x)$ be the density function of the invariant measure of the Markov chain. Let $\bar p(x,y)$ be the density of the invariant measure of the bivariate chain $\{(X_t, X_{t+1})\}_{t=0}^{\infty}$, i.e., $\bar p(X_t, X_{t+1}) = \pi(X_t)\,p(X_{t+1}|X_t)$. We use $\mathbb{P}(\cdot)$ to denote the probability of an event.
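To make the definition of $t_{\mathrm{mix}}$ concrete, here is a toy two-state example (ours, not from the paper) that computes the mixing time directly from powers of the transition matrix:

```python
import numpy as np

# A lazy two-state chain: stays with prob 0.9, switches with prob 0.1.
P = np.array([[0.9, 0.1],
              [0.1, 0.9]])
pi = np.array([0.5, 0.5])  # invariant distribution (P is symmetric)

def tv(mu, nu):
    """Total variation distance between two discrete distributions."""
    return 0.5 * np.abs(mu - nu).sum()

def mixing_time(P, pi):
    """Smallest t with max_x TV(P^t(.|x), pi) <= 1/4."""
    Pt = np.eye(len(pi))
    t = 0
    while max(tv(Pt[x], pi) for x in range(len(pi))) > 0.25:
        Pt = Pt @ P
        t += 1
    return t

print(mixing_time(P, pi))  # 4
```

Here the second eigenvalue is 0.8, and the worst-case TV distance after t steps is 0.5 * 0.8**t, which first drops below 1/4 at t = 4.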
2 KME Reshaping for Estimating $p$

In this section we study the estimation of the transition function $p$ from a finite trajectory $\{X_t\}_{t=1}^n \subset \mathbb{R}^d$. We make the following low-rank assumption on the transition kernel $p$, which is the key to more accurate estimation.

Assumption 1. There exist real-valued functions $\{u_k\}_{k=1}^r$ and $\{v_k\}_{k=1}^r$ on $\Omega$ such that $p(y|x) = \sum_{k=1}^r \sigma_k\, u_k(x)\, v_k(y)$, where $r$ is the rank.

Due to the asymmetry of $p(\cdot,\cdot)$ and the lack of reversibility, we use two reproducing kernel Hilbert spaces $\mathcal{H}$ and $\tilde{\mathcal{H}}$ to embed the left and right sides of $p$. Let $K$ and $\tilde K$ be the kernel functions of $\mathcal{H}$ and $\tilde{\mathcal{H}}$, respectively. The kernel mean embedding (KME) $q(x,y)$ of the joint distribution $\bar p(x,y)$ into the product space $\mathcal{H}\times\tilde{\mathcal{H}}$ is defined by
$$q(x,y) := \int\!\!\int K(x,u)\,\tilde K(y,v)\,\bar p(u,v)\,du\,dv.$$
Given sample transition pairs $\{(X_i, X_i')\}_{i=1}^n$, the natural empirical KME estimator is $\hat q(x,y) = \frac1n\sum_{i=1}^n K(X_i,x)\,\tilde K(X_i',y)$. If the data pairs are independent, one can show that the embedding error $\|\hat q - q\|_{\mathcal{H}\times\tilde{\mathcal{H}}}$ is approximately $\sqrt{\mathbb{E}_{(X,Y)\sim\bar p}\big[K(X,X)\tilde K(Y,Y)\big]/n}$ (Lemma 1 in the Appendix).

Next we propose a sharper KME estimator. Suppose that the kernel functions $K$ and $\tilde K$ are continuous and symmetric positive semi-definite. Let $\{\phi_j\}_{j\in J}$ and $\{\tilde\phi_j\}_{j\in J}$ be real-valued feature functions on $\Omega$ such that $K(x,y) = \sum_{j\in J}\phi_j(x)\phi_j(y)$ and $\tilde K(x,y) = \sum_{j\in J}\tilde\phi_j(x)\tilde\phi_j(y)$. In practice, if one is given a shift-invariant symmetric kernel function, finitely many random Fourier features can be generated to approximate the kernel [RR08]. In what follows we assume without loss of generality that $J$ is finite, of size $N$. Let $\phi(x) = (\phi_1(x),\dots,\phi_N(x))^T\in\mathbb{R}^N$, and define $\tilde\phi$ analogously. We define the "projection" of $\bar p$ onto the feature space by
$$P = \int\!\!\int \bar p(x,y)\,\phi(x)\,\tilde\phi(y)^T\,dx\,dy. \qquad (1)$$
Assumption 1 implies that $\mathrm{rank}(P)\le r$ (Lemma 2 in the Appendix). Note that the KME of $\bar p(x,y)$ can equivalently be written as $q(x,y) = \phi(x)^T P\,\tilde\phi(y)$ (Lemma 3 in the Appendix). The matrix $P$ has finite dimensions, so we can estimate it tractably from the trajectory $\{X_t\}$ by
$$\hat P := \frac1n\sum_{t=1}^n \phi(X_t)\,\tilde\phi(X_{t+1})^T. \qquad (2)$$
Since the unknown $P$ is low-rank, we propose to apply singular value truncation to $\hat P$ to obtain a better KME estimator. The algorithm is given below.

Algorithm 1: Reshaping the Kernel Mean Embedding
Input: $X_1,\dots,X_n$, rank $r$.
1. Compute $\hat P$ by (2) and its SVD $\hat P = \hat U\hat\Sigma\hat V^T$.
2. Let $\hat P_r := \hat U\hat\Sigma_{1:r}\hat V^T$ be the best rank-$r$ approximation of $\hat P$.
3. Let $\hat q_r(x,y) := \phi(x)^T\hat P_r\,\tilde\phi(y)$.
Output: $\hat q_r(x,y)$.

We analyze the convergence rate of $\hat q_r$. Let $K_{\max} := \max\{\sup_x K(x,x),\ \sup_x\tilde K(x,x)\}$. We define the following kernel covariance matrices:
$$V_1 = \mathbb{E}_{(X,Y)\sim\bar p}\big[\phi(X)\phi(X)^T\,\tilde K(Y,Y)\big], \qquad V_2 = \mathbb{E}_{(X,Y)\sim\bar p}\big[K(X,X)\,\tilde\phi(Y)\tilde\phi(Y)^T\big].$$
Let $\nu := \max\{\lambda_{\max}(V_1),\,\lambda_{\max}(V_2)\}$. We show the following finite-sample error bound.

Theorem 1 (KME Reshaping). Let Assumption 1 hold. For any $\delta\in(0,1)$, we have
$$\|\hat q_r - q\|_{\mathcal{H}\times\tilde{\mathcal{H}}} = \|\hat P_r - P\|_F \le C\left(\sqrt{\frac{r\,\nu\,t_{\mathrm{mix}}\log(2t_{\mathrm{mix}}N/\delta)}{n}} + \frac{t_{\mathrm{mix}}\,K_{\max}\log(2t_{\mathrm{mix}}N/\delta)}{3n}\right)$$
with probability at least $1-\delta$, where $C$ is a universal constant.

The KME reshaping method and Theorem 1 enjoy the following advantages:

1. Improved accuracy compared to the plain KME. The plain KME's estimation error is approximately $\sqrt{\mathbb{E}_{(X,Y)\sim\bar p}\big[K(X,X)\tilde K(Y,Y)\big]/n}$ (Lemma 1 in the Appendix). Note that $\mathrm{Tr}(V_1) = \mathrm{Tr}(V_2) = \mathbb{E}_{(X,Y)\sim\bar p}\big[K(X,X)\tilde K(Y,Y)\big]$. When $r \ll N$, we typically have $r\nu \ll \mathrm{Tr}(V_1) = \mathrm{Tr}(V_2)$; therefore the reshaped KME has a significantly smaller estimation error.

2. Ability to handle dependent data. Algorithm 1 applies to time series consisting of highly dependent data. The proof of Theorem 1 handles the dependency by constructing a special matrix martingale and using the mixing properties of the Markov process to analyze its concentration.

3. Tractable implementation. Kernel-based methods usually require memorizing all the data and may be intractable in practice. Our approach is based on a finite number of features and only needs low-dimensional computation. As pointed out by [RR08], one can approximate any shift-invariant kernel function using $N$ features, where $N$ is linear in the input dimension $d$. Therefore Algorithm 1 can be approximately implemented in $O(nd^2)$ time and $O(d^2)$ space.
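To illustrate, here is a minimal NumPy sketch of this pipeline: random Fourier features, the empirical matrix of Eq. (2), and the rank-r truncation of Algorithm 1. It is a simplified sketch under our own toy choices (a single Gaussian kernel for both sides, no feature orthogonalization, and synthetic i.i.d. data standing in for a Markov trajectory):

```python
import numpy as np

rng = np.random.default_rng(1)
d, N, r = 2, 100, 5             # state dim, number of features, target rank

# Random Fourier features approximating a Gaussian kernel:
# K(x, y) ~= phi(x) . phi(y) with phi_j(x) = sqrt(2/N) cos(w_j . x + b_j).
W = rng.normal(size=(N, d))     # frequencies for bandwidth sigma = 1
b = rng.uniform(0, 2 * np.pi, size=N)

def phi(x):
    return np.sqrt(2.0 / N) * np.cos(W @ x + b)

# Synthetic trajectory; any (n + 1) x d array of states works here.
n = 2000
X = rng.normal(size=(n + 1, d))

# Empirical projection P_hat = (1/n) sum_t phi(X_t) phi(X_{t+1})^T, as in Eq. (2).
Phi = np.array([phi(x) for x in X])
P_hat = Phi[:-1].T @ Phi[1:] / n

# Reshape: keep only the top-r singular directions (Algorithm 1).
U, s, Vt = np.linalg.svd(P_hat)
P_r = (U[:, :r] * s[:r]) @ Vt[:r]

def q_hat(x, y):
    """Reshaped KME estimate q_r(x, y) = phi(x)^T P_r phi(y)."""
    return phi(x) @ P_r @ phi(y)

assert np.linalg.matrix_rank(P_r) <= r
assert np.isfinite(q_hat(np.zeros(d), np.ones(d)))
```

The truncation is the entire "reshaping" step: everything else is the standard feature-space estimate, so the extra cost over the plain KME is one SVD of an N x N matrix.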
3 Embedding States into Euclidean Space

In this section we learn low-dimensional representations of the state space $\Omega$ that capture the transition dynamics. We need the following extra assumption, which states that $p$ can be fully represented in the kernel space.

Assumption 2. The transition kernel belongs to the product Hilbert space, i.e., $p(\cdot \mid \cdot) \in \mathcal{H}\times\tilde{\mathcal{H}}$.

For two arbitrary states $x, y \in \Omega$, we consider the distance given by
$$\mathrm{dist}(x,y) := \|p(\cdot|x) - p(\cdot|y)\|_{L^2} = \left(\int \big(p(z|x) - p(z|y)\big)^2\,dz\right)^{1/2}. \qquad (3)$$
Eq. (3) is known as the diffusion distance [NLCK06]. It measures the similarity between the future paths of two states. We are motivated by the diffusion map approach for dimension reduction [LL06, CKL+08, KSM18]. The diffusion map refers to the leading eigenfunctions of the transfer operator of a reversible dynamical system; we generalize it to nonreversible processes and feature spaces.

For simplicity of presentation, we assume without loss of generality that $\{\phi_i\}_{i=1}^N$ and $\{\tilde\phi_i\}_{i=1}^N$ are $L^2(\pi)$- and $L^2$-orthogonal bases of $\mathcal{H}$ and $\tilde{\mathcal{H}}$, respectively, with squared norms $\frac{1}{N\lambda_i}$ and $\frac{1}{N\tilde\lambda_i}$; any given features can be orthogonalized to satisfy this condition. Letting $C := \mathrm{diag}(\lambda_1,\dots,\lambda_N)$ and $\tilde C := \mathrm{diag}(\tilde\lambda_1,\dots,\tilde\lambda_N)$, it is easy to verify that $p(y|x) = \phi(x)^T C^{-1} P\,\tilde C^{-1}\tilde\phi(y)$. Let $C^{-1/2} P\,\tilde C^{-1/2} = U\Sigma V^T$ be its SVD. We define the state embedding as
$$\Psi(x) := \big(\phi(x)^T C^{-1/2}\, U\,\Sigma_{1:r}\big)^T.$$
It is straightforward to verify that $\mathrm{dist}(x,z) = \|\Psi(x) - \Psi(z)\|$. We propose to estimate $\Psi$ with Algorithm 2.

Algorithm 2: Learning State Embeddings
Input: $X_1,\dots,X_n$, rank $r$.
1. Compute $\hat P$ from (2) and the SVD $\hat U\hat\Sigma\hat V^T = C^{-1/2}\hat P\,\tilde C^{-1/2}$.
2. Compute the state embedding from the first $r$ singular pairs: $\hat\Psi(x) = \big(\phi(x)^T C^{-1/2}\hat U\hat\Sigma_{1:r}\big)^T$.
Output: the map $x \mapsto \hat\Psi(x)$.

Let $\widehat{\mathrm{dist}}(x,z) := \|\hat\Psi(x) - \hat\Psi(z)\|$. We show that the estimated state embeddings preserve the diffusion distance up to an additive distortion.

Theorem 2 (Maximum additive distortion of state embeddings). Let Assumptions 1 and 2 hold, let $L_{\max} := \sup_x \phi(x)^T C^{-1}\phi(x)$, and let $\kappa$ be the condition number of $\pi(x)p(y|x)$. Then, with high probability, $\sup_{x,z}\big|\widehat{\mathrm{dist}}(x,z) - \mathrm{dist}(x,z)\big|$ is bounded by a distortion level $\epsilon_2$ that depends on $L_{\max}$, $\kappa$, $t_{\mathrm{mix}}$ and the sample size $n$.

Based on the state embeddings, states can then be clustered by their diffusion distances. In the misclassification analysis (Theorem 4), $\Delta > 4\epsilon_2$ is the separability condition needed for finding the correct clusters with high probability, where $\Delta$ denotes the minimal separation between clusters, and $16\epsilon_2^2\Delta^{-2}$ bounds the non-vanishing part of the misclassification error.

In the case of a reversible finite-state Markov process, the clustering problem is equivalent to finding the optimal metastable $m$-full partition, given by $\arg\max_{\Omega_1,\dots,\Omega_m} \sum_{k=1}^m p(\Omega_k \mid \Omega_k)$, where $p(\Omega_j \mid \Omega_i) := \frac{1}{\pi(\Omega_i)} \int_{x\in\Omega_i,\,y\in\Omega_j} \pi(x)\,p(y|x)\,dy\,dx$ [ELVE08, SS13]. The optimal partition $(\Omega_1^*,\dots,\Omega_m^*)$ gives metastable sets that can be used to construct a reduced-order Markov state model [SS13]. In the more general case of nonreversible Markov chains, the proposed method clusters states together if they share similar future paths. It provides an unsupervised learning method for state aggregation, a widely used heuristic for dimension reduction in control and reinforcement learning [BT96, SJJ95].

5 Experiments

5.1 Stochastic Diffusion Processes

We test the proposed approach on simulated diffusion processes of the form $dX_t = -\nabla V(X_t)\,dt + \sqrt{2}\,dB_t$, $X_t \in \mathbb{R}^d$, where $V(\cdot)$ is a potential function and $\{B_t\}_{t\ge0}$ is standard Brownian motion. For any time interval $\tau > 0$, the discrete-time trajectory $\{X_{k\tau}\}_{k=1}^\infty$ is a Markov process. We apply the Euler method to generate sample paths $\{X_{k\tau}\}_{k=1}^n$ according to the stochastic differential equation. We use the Gaussian kernels $K(x,y) = \tilde K(x,y) = \frac{1}{(2\pi\sigma^2)^{d/2}}\, e^{-\frac{\|x-y\|^2}{2\sigma^2}}$ with $\sigma > 0$, and construct the RKHS $\mathcal{H} = \tilde{\mathcal{H}}$ from $L^2(\Omega)$. To get the features, we generate 2000 random Fourier features $h = \{h_1, h_2, \dots, h_N\}$ such that $K(x,y) \approx \sum_{i=1}^N h_i(x)h_i(y)$ [RR08], and then orthogonalize $h$ to get $\phi$.

Figure 1: Reshaped KME versus plain KME. The error curve approximately satisfies a convergence rate of $n^{-1/2}$.
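As a sketch of how such an experiment fits together, the following is our own simplified version: Euler-Maruyama simulation of a double-well diffusion, Gaussian-bump features in place of orthogonalized random Fourier features, a rank-2 embedding without the whitening step, and a tiny 2-means loop. All constants here are our toy choices, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(2)

# Double-well potential V(x) = (x^2 - 1)^2, wells at x = -1 and x = +1.
def grad_V(x):
    return 4.0 * x * (x * x - 1.0)

# Euler-Maruyama simulation of dX = -V'(X) dt + sqrt(2) dB.
dt, n = 0.01, 20000
X = np.empty(n + 1)
X[0] = 1.0
for t in range(n):
    X[t + 1] = X[t] - grad_V(X[t]) * dt + np.sqrt(2 * dt) * rng.normal()

# Gaussian-bump features on a grid (stand-in for orthogonalized Fourier features).
centers = np.linspace(-2.0, 2.0, 30)
def phi(x):
    return np.exp(-((x - centers) ** 2) / (2 * 0.3 ** 2))

Phi = phi(X[:, None])                    # (n+1, 30) feature matrix
P_hat = Phi[:-1].T @ Phi[1:] / n         # empirical cross-covariance, as in Eq. (2)

# Rank-2 state embedding from the leading singular pairs.
U, s, Vt = np.linalg.svd(P_hat)
def embed(x):
    return phi(x) @ U[:, :2] * s[:2]

# Cluster a grid of states in embedding space with a tiny 2-means loop.
grid = np.linspace(-1.5, 1.5, 61)
E = np.array([embed(x) for x in grid])
c = E[[0, -1]]                           # centroids initialized at the two ends
for _ in range(20):
    lab = np.argmin(((E[:, None] - c[None]) ** 2).sum(-1), axis=1)
    c = np.array([E[lab == k].mean(axis=0) if (lab == k).any() else c[k]
                  for k in range(2)])

# With enough well-to-well transitions in the data, the two clusters should
# roughly coincide with the two metastable wells, split near the barrier x = 0.
print(np.unique(lab).size)
```

States are grouped by the similarity of their embedded futures, so the partition tracks the metastable wells rather than raw proximity on the line.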
Comparison between reshaped and plain KME. We apply Algorithm 1 to find the reshaped KME $\hat q_r$ and compare its error with that of the plain KME based on $\hat P$ in (2). The experiment is performed on a four-well diffusion on $\mathbb{R}$, and we take rank $r = 4$. By orthogonalizing $N = 2000$ random Fourier features, we obtain $J = 82$ basis functions. Figure 1 shows that the reshaped KME consistently outperforms the plain KME across varying sample sizes.

State clustering to reveal metastable structures. We apply the state clustering method to analyze the metastable structures of a diffusion process whose potential function $V(x)$ is given in Figure 2(a). We generate trajectories of length $n = 10^6$ and take the time interval to be $\tau = 0.1, 1, 5$, and $10$. We conduct the state embedding and clustering procedures with rank $r = 4$. Figure 2(c) shows the clustering results for $\tau = 1$ with a varying number of clusters. The partition results reliably reveal metastable sets, also known as invariant sets, which characterize the slow dynamics of this process. Figure 2(d) shows the four-cluster results for varying values of $\tau$, where the contours are based on diffusion distances to the centroid of each cluster. One can see that the diffusion distance contours are dense when $\tau$ takes small values. This is because, when $\tau$ is small, the state embedding method largely captures fast local dynamics. By taking