




Subspace Detours: Building Transport Plans that are Optimal on Subspace Projections

Boris Muzellec (CREST, ENSAE) boris.muzellec@ensae.fr
Marco Cuturi (Google Brain and CREST, ENSAE)

Abstract

Computing optimal transport (OT) between measures in high dimensions is doomed by the curse of dimensionality. A popular approach to avoid this curse is to project input measures on lower-dimensional subspaces (1D lines in the case of sliced Wasserstein distances), solve the OT problem between these reduced measures, and settle for the Wasserstein distance between these reductions, rather than that between the original measures. This approach is however difficult to extend to the case in which one wants to compute an OT map (a Monge map) between the original measures. Since computations are carried out on lower-dimensional projections, classical map estimation techniques can only produce maps operating in these reduced dimensions. We propose in this work two methods to extrapolate, from a transport map that is optimal on a subspace, one that is nearly optimal in the entire space. We prove that the best optimal transport plan that takes such "subspace detours" is a generalization of the Knothe-Rosenblatt transport. We show that these plans can be explicitly formulated when comparing Gaussian measures (between which the Wasserstein distance is commonly referred to as the Bures or Fréchet distance). We provide an algorithm to select optimal subspaces given pairs of Gaussian measures, and study scenarios in which that mediating subspace can be selected using prior information. We consider applications to semantic mediation between elliptic word embeddings and domain adaptation with Gaussian mixture models.
1 Introduction

Minimizing the transport cost between two probability distributions [32] results in two useful quantities: the minimum cost itself, often cast as a loss or a metric (the Wasserstein distance), and the minimizing solution, a function known as the Monge map [20] that pushes forward the first measure onto the second with least expected cost. While the former has long attracted the attention of the machine learning community, the latter is playing an increasingly important role in data sciences. Indeed, important problems such as domain adaptation [8], generative modelling [17, 2, 16], reconstruction of cell trajectories in biology [28] and auto-encoders [19, 30], among others, can be recast as the problem of finding a map, preferably optimal, which transforms a reference distribution into another. However, accurately estimating an OT map from data samples is a difficult problem, plagued by the well-documented instability of OT in high-dimensional spaces [11, 13] and its high computational cost.

Optimal Transport on Subspaces. Several approaches, both in theory and in practice, aim at bridging this gap. Theory [33] supports the idea that sample complexity can be improved when the measures are supported on lower-dimensional manifolds of high-dimensional spaces. Practical insights [9] supported by theory [15] advocate using regularizations to improve both computational and sample complexity. Some regularity in OT maps can also be encoded by looking at specific families of maps [29, 23]. Another trend relies on lower-dimensional projections of measures before computing OT. In particular, sliced Wasserstein (SW) distances [4] leverage the simplicity of OT between 1D measures to define distances and barycentres, by averaging the optimal transport between projections onto several random directions.

33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada.
This approach has been applied to alleviate training complexity in the GAN/VAE literature [10, 34] and was generalized very recently in [22], which considered projections on k-dimensional subspaces that are adversarially selected. However, these subspace approaches only carry out half of the goal of OT: by design, they do result in more robust measures of OT costs, but they can only provide maps in subspaces that are optimal (or nearly so) between the projected measures, not transportation maps in the original, high-dimensional space in which the original measures live. For instance, the closest thing to a map one can obtain from using several SW univariate projections is an average of several permutations, which is not a map but a transport plan or coupling [26; 25, p. 6].

Our approach. Whereas the approaches cited above focus on OT maps and plans in projection subspaces only, we consider here plans and maps on the original space that are constrained to be optimal when projected on a given subspace $E$. This results in the definition of a class of transportation plans that figuratively need to make an optimal "detour" in $E$. We propose two constructions to recover such maps, corresponding respectively (i) to the independent product between conditioned measures, and (ii) to the optimal conditioned map.

Paper Structure. After recalling background material on OT in section 2, we introduce in section 3 the class of subspace-optimal plans that satisfy projection constraints on a given subspace $E$. We characterize the degrees of freedom of $E$-optimal plans using their disintegrations on $E$ and introduce two extremal instances: Monge-Independent (MI) plans, which assume independence of the conditionals, and Monge-Knothe (MK) maps, in which the conditionals are optimally coupled. We give closed forms for the transport between Gaussian distributions in section 4, respectively as a degenerate Gaussian distribution and a linear map with block-triangular matrix representation.
We provide guidelines and a minimizing algorithm for selecting a subspace $E$ when it is not prescribed a priori in section 5. Finally, in section 6 we showcase the behavior of MK and MI transports on (noisy) synthetic data, show how using a mediating subspace can be applied to selecting meanings for polysemous elliptical word embeddings, and experiment using MK maps with the minimizing algorithm on a domain adaptation task with Gaussian mixture models.

Notations. For $E$ a linear subspace of $\mathbb{R}^d$, $E^\perp$ is its orthogonal complement, and $V_E \in \mathbb{R}^{d \times k}$ (resp. $V_{E^\perp} \in \mathbb{R}^{d \times (d-k)}$) is the matrix of orthonormal basis vectors of $E$ (resp. $E^\perp$). $p_E : x \mapsto V_E^\top x$ is the orthogonal projection operator onto $E$. $\mathcal{P}_2(\mathbb{R}^d)$ is the space of probability distributions over $\mathbb{R}^d$ with finite second moments. $\mathcal{B}(\mathbb{R}^d)$ is the Borel $\sigma$-algebra over $\mathbb{R}^d$. $\rightharpoonup$ denotes the weak convergence of measures. $\otimes$ is the product of measures, and is used in measure disintegration by abuse of notation.

2 Optimal Transport: Plans, Maps and Disintegration of Measure

Kantorovitch Plans. For two probability measures $\mu, \nu \in \mathcal{P}_2(\mathbb{R}^d)$, we refer to the set of couplings
$$\Pi(\mu, \nu) \stackrel{\text{def}}{=} \{\pi \in \mathcal{P}(\mathbb{R}^d \times \mathbb{R}^d) : \forall A, B \in \mathcal{B}(\mathbb{R}^d),\ \pi(A \times \mathbb{R}^d) = \mu(A),\ \pi(\mathbb{R}^d \times B) = \nu(B)\}$$
as the set of transportation plans between $\mu$ and $\nu$. The 2-Wasserstein distance between $\mu$ and $\nu$ is defined as
$$W_2^2(\mu, \nu) \stackrel{\text{def}}{=} \min_{\pi \in \Pi(\mu, \nu)} \mathbb{E}_{(X,Y) \sim \pi} \|X - Y\|^2.$$
Conveniently, transportation problems with quadratic cost can be reduced to transportation between centered measures. Indeed, let $m_\mu$ (resp. $m_\nu$) denote the first moment of $\mu$ (resp. $\nu$). Then, $\forall \pi \in \Pi(\mu, \nu)$,
$$\mathbb{E}_{(X,Y) \sim \pi}\|X - Y\|^2 = \|m_\mu - m_\nu\|^2 + \mathbb{E}_{(X,Y) \sim \pi}\|(X - m_\mu) - (Y - m_\nu)\|^2.$$
Therefore, in the following all probability measures are assumed to be centered, unless stated otherwise.

Monge Maps. For a Borel-measurable map $T$, the push-forward of $\mu$ by $T$ is defined as the measure $T\#\mu$ satisfying $T\#\mu(A) = \mu(T^{-1}(A))$ for all $A \in \mathcal{B}(\mathbb{R}^d)$. A map such that $T\#\mu = \nu$ is called a transportation map from $\mu$ to $\nu$. When a transportation map exists, the Wasserstein distance can be written in the form of the Monge problem
$$W_2^2(\mu, \nu) = \min_{T : T\#\mu = \nu} \mathbb{E}_{X \sim \mu}\|X - T(X)\|^2.$$
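To make the Monge problem concrete: in one dimension with quadratic cost, the optimal map between two uniform empirical measures simply pairs sorted samples monotonically. The sketch below (variable names are ours, not from the paper) illustrates this, and is exactly the 1D building block that sliced Wasserstein methods apply along random directions.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=1000)                # samples from mu
y = 2.0 * rng.normal(size=1000) + 1.0    # samples from nu

# With quadratic cost, the optimal matching between two uniform
# empirical measures on R pairs sorted samples monotonically.
xs, ys = np.sort(x), np.sort(y)
w2_sq = np.mean((xs - ys) ** 2)          # plug-in estimate of W_2^2(mu, nu)

# The induced empirical Monge map is the monotone rearrangement,
# here extended off the samples by piecewise-linear interpolation.
T = lambda t: np.interp(t, xs, ys)
```

In higher dimensions no such sorting shortcut exists, which is precisely what motivates projecting on subspaces.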
When it exists, the optimal transportation map $T^\star$ in the Monge problem is called the Monge map from $\mu$ to $\nu$. It is then related to the optimal transportation plan $\pi^\star$ by the relation $\pi^\star = (\mathrm{Id}, T^\star)\#\mu$. When $\mu$ and $\nu$ are absolutely continuous (a.c.), a Monge map always exists [27, Theorem 1.22].

Global Maps or Plans that are Locally Optimal. Considering the projection operator on $E$, $p_E$, we write $\mu_E = (p_E)\#\mu$ for the marginal distribution of $\mu$ on $E$. Suppose that we are given a Monge map $S$ between the two projected measures $\mu_E$ and $\nu_E$. One of the contributions of this paper is to propose extensions of this map $S$ as a transportation plan $\pi$ (resp. a new map $T$) whose projection $\pi_E = (p_E, p_E)\#\pi$ on that subspace $E$ coincides with the optimal transportation plan $(\mathrm{Id}_E, S)\#\mu_E$ (resp. $p_E \circ T = S \circ p_E$). Formally, the transports introduced in section 3 only require that $S$ be a transport map from $\mu_E$ to $\nu_E$, but optimality is required in the closed forms given in section 4 for Gaussian distributions. In either case, this constraint implies that $\pi$ is built "assuming that" it is equal to $(\mathrm{Id}_E, S)\#\mu_E$ on $E$. This is rigorously defined using the notion of measure disintegration.

Disintegration of Measures. The disintegration of $\mu$ on a subspace $E$ is the collection of measures $(\mu_{x_E})_{x_E \in E}$ supported on the fibers $\{x_E\} \times E^\perp$ such that any test function $\varphi$ can be integrated against $\mu$ as
$$\int_{\mathbb{R}^d} \varphi \, d\mu = \int_E \Big( \int_{E^\perp} \varphi(y) \, d\mu_{x_E}(y) \Big) d\mu_E(x_E).$$
In particular, if $X \sim \mu$, then the law of $X$ given $x_E$ is $\mu_{x_E}$. By abuse of the measure product notation, measure disintegration is denoted as $\mu = \mu_{x_E} \otimes \mu_E$. A more general description of disintegration can be found in [1, Ch. 5.5].

3 Lifting Transport from Subspace to Full Space

Given two distributions $\mu, \nu \in \mathcal{P}_2(\mathbb{R}^d)$, it is often easier to compute a Monge map $S$ between their marginals $\mu_E$, $\nu_E$ on a $k$-dimensional subspace $E$ rather than in the whole space $\mathbb{R}^d$. When $k = 1$, this fact is at the heart of sliced Wasserstein approaches [4], which have recently sparked interest in the GAN/VAE literature [10, 34]. However, when $k > 1$ [...] For $\varepsilon > 0$, let $P_\varepsilon \stackrel{\text{def}}{=} V_E V_E^\top + \varepsilon V_{E^\perp} V_{E^\perp}^\top$ and $d^2_{P_\varepsilon}(x, y) \stackrel{\text{def}}{=} (x - y)^\top P_\varepsilon (x - y)$.
Let $T_\varepsilon$ be the optimal transport map for the cost $d^2_{P_\varepsilon}$. Then $T_\varepsilon \to T_{MK}$ in $L^2(\mu)$. Proof in the supplementary material.

MI as a limit of the discrete case. When $\mu$ and $\nu$ are a.c., for $n \in \mathbb{N}$ let $\mu_n$, $\nu_n$ denote the uniform distributions over $n$ i.i.d. samples from $\mu$ and $\nu$ respectively, and let $\gamma_n$ be an optimal transportation plan between $(p_E)\#\mu_n$ and $(p_E)\#\nu_n$ given by a Monge map (which is possible assuming uniform weights and non-overlapping projections). We have that $\mu_n \rightharpoonup \mu$ and $\nu_n \rightharpoonup \nu$. From [27, Th. 1.50, 1.51], we have that $\gamma_n \in \mathcal{P}_2(E \times E)$ converges weakly, up to subsequences, to a coupling $\gamma \in \mathcal{P}_2(E \times E)$ that is optimal for $\mu_E$ and $\nu_E$. On the other hand, up to points having the same projections, the discrete plans $\gamma_n$ can also be seen as plans in $\mathcal{P}(\mathbb{R}^d \times \mathbb{R}^d)$. A natural question is then whether the sequence $(\gamma_n) \in \mathcal{P}(\mathbb{R}^d \times \mathbb{R}^d)$ has a limit in $\mathcal{P}(\mathbb{R}^d \times \mathbb{R}^d)$.

Proposition 3. Let $\mu, \nu \in \mathcal{P}_2(\mathbb{R}^d)$ be a.c. and compactly supported, $\mu_n, \nu_n$, $n \geq 0$, be uniform distributions over $n$ i.i.d. samples, and $\gamma_n \in \Pi_E(\mu_n, \nu_n)$, $n \geq 0$. Then $\gamma_n \rightharpoonup \pi_{MI}(\mu, \nu)$.

Proof in the supplementary material. We conjecture that under additional assumptions, the compactness hypothesis can be relaxed. In particular, we empirically observe convergence for Gaussians.

4 Explicit Formulas for Subspace Detours in the Bures Metric

Multivariate Gaussian measures are a specific case of continuous distributions for which Wasserstein distances and Monge maps are available in closed form. We first recall basic facts about optimal transport between Gaussian measures, and then show that the $E$-optimal transports MI and MK introduced in section 3 are also in closed form. For two Gaussians $\mu, \nu$, one has $W_2^2(\mu, \nu) = \|m_\mu - m_\nu\|^2 + \mathfrak{B}^2(\mathrm{var}\,\mu, \mathrm{var}\,\nu)$, where $\mathfrak{B}$ is the Bures metric [3] between PSD matrices [14]:
$$\mathfrak{B}^2(A, B) \stackrel{\text{def}}{=} \mathrm{Tr}\,A + \mathrm{Tr}\,B - 2\,\mathrm{Tr}\big(A^{1/2} B A^{1/2}\big)^{1/2}.$$
The Monge map from a centered Gaussian distribution $\mu$ with covariance matrix $A$ to one $\nu$ with covariance matrix $B$ is linear and is represented by the matrix
$$T^{AB} \stackrel{\text{def}}{=} A^{-1/2}\big(A^{1/2} B A^{1/2}\big)^{1/2} A^{-1/2}.$$
For any linear transport map $T$, $T\#\mu$ has covariance $T A T^\top$, and the transportation cost from $\mu$ to $\nu$ is $\mathbb{E}_{X \sim \mu}\|X - TX\|^2 = \mathrm{Tr}\,A + \mathrm{Tr}\,B - \mathrm{Tr}(TA + AT^\top)$.
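These closed forms are easy to check numerically. The sketch below (helper names are ours) computes the Bures distance and the Gaussian Monge map via matrix square roots, and verifies both that $T^{AB}$ pushes $\mathcal{N}(0, A)$ onto $\mathcal{N}(0, B)$ and that its transportation cost equals $\mathfrak{B}^2(A, B)$:

```python
import numpy as np
from scipy.linalg import sqrtm

def bures2(A, B):
    # B^2(A, B) = Tr A + Tr B - 2 Tr (A^{1/2} B A^{1/2})^{1/2}
    s = sqrtm(sqrtm(A).real @ B @ sqrtm(A).real).real
    return np.trace(A) + np.trace(B) - 2.0 * np.trace(s)

def gaussian_monge_map(A, B):
    # T^{AB} = A^{-1/2} (A^{1/2} B A^{1/2})^{1/2} A^{-1/2}
    Ah = sqrtm(A).real
    Ahi = np.linalg.inv(Ah)
    return Ahi @ sqrtm(Ah @ B @ Ah).real @ Ahi

rng = np.random.default_rng(0)
X, Y = rng.normal(size=(4, 4)), rng.normal(size=(4, 4))
A, B = X @ X.T + np.eye(4), Y @ Y.T + np.eye(4)   # two random covariances

T = gaussian_monge_map(A, B)
assert np.allclose(T @ A @ T.T, B, atol=1e-6)     # T pushes N(0,A) to N(0,B)
# cost Tr A + Tr B - Tr(TA + AT^T) coincides with the squared Bures distance
cost = np.trace(A) + np.trace(B) - np.trace(T @ A + A @ T.T)
assert np.isclose(cost, bures2(A, B), atol=1e-6)
```

Note that `scipy.linalg.sqrtm` may return a complex array with negligible imaginary parts for PSD inputs, hence the `.real` casts.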
In the following, $\mu$ (resp. $\nu$) will denote the centered Gaussian distribution with covariance matrix $A$ (resp. $B$). We write
$$A = \begin{pmatrix} A_E & A_{EE^\perp} \\ A_{EE^\perp}^\top & A_{E^\perp} \end{pmatrix}$$
when $A$ is represented in an orthonormal basis $(V_E\ V_{E^\perp})$ of $E \oplus E^\perp$.

Monge-Independent Transport for Gaussians. The MI transport between Gaussian measures is given by a degenerate Gaussian, i.e. a measure with Gaussian density over the image of its covariance matrix (we refer to the supplementary material for the proof).

Proposition 4 (Monge-Independent (MI) Transport for Gaussians). Let
$$C \stackrel{\text{def}}{=} \big(V_E A_E + V_{E^\perp} A_{EE^\perp}^\top\big)\, T^{A_E B_E}\, \big(V_E^\top + (B_E)^{-1} B_{EE^\perp} V_{E^\perp}^\top\big) \quad\text{and}\quad \Sigma \stackrel{\text{def}}{=} \begin{pmatrix} A & C \\ C^\top & B \end{pmatrix}.$$
Then $\pi_{MI}(\mu, \nu) = \mathcal{N}(0_{2d}, \Sigma) \in \mathcal{P}(\mathbb{R}^d \times \mathbb{R}^d)$.

Figure 2: MI transport from a 2D Gaussian (red) to a 1D Gaussian (blue), projected on the x-axis. The two 1D distributions represent the projections of both Gaussians on the x-axis, the blue one being already originally supported on the x-axis. The oblique hyperplane is the support of $\pi_{MI}$, onto which its density is represented.

Knothe-Rosenblatt and Monge-Knothe for Gaussians. Before giving the closed-form MK map for Gaussian measures, we derive the KR map [27, section 2.3] with successive marginalization¹ on $x_1, x_2, \ldots, x_d$. When $d = 2$ and the basis is orthonormal for $E \oplus E^\perp$, those two notions coincide.

Proposition 5 (Knothe-Rosenblatt (KR) Transport between Gaussians). Let $L_A$ (resp. $L_B$) be the Cholesky factor of $A$ (resp. $B$). The KR transport from $\mu$ to $\nu$ is a linear map whose matrix is given by $T^{AB}_{KR} = L_B (L_A)^{-1}$. Its cost is the squared Frobenius distance between the Cholesky factors $L_A$ and $L_B$: $\mathbb{E}_{X \sim \mu}\|X - T^{AB}_{KR} X\|^2 = \|L_A - L_B\|^2$.

Proof. The KR transport with successive marginalization on $x_1, x_2, \ldots, x_d$ between two a.c. distributions has a lower triangular Jacobian with positive entries on the diagonal. Further, since the one-dimensional disintegrations of Gaussians are Gaussians themselves, and since Monge maps between Gaussians are linear, the KR transport between two centered Gaussians is a linear map, hence its matrix representation equals its Jacobian and is lower triangular. Let $T = L_B (L_A)^{-1}$. We have $T A T^\top = L_B L_A^{-1} L_A L_A^\top L_A^{-\top} L_B^\top = L_B L_B^\top = B$, i.e. $T\#\mu = \nu$. Further, since $T L_A$ is the Cholesky factor of $B$, and since $A$ is supposed non-singular, by unicity of the Cholesky decomposition $T$ is the only lower triangular matrix satisfying $T\#\mu = \nu$. Hence, it is the KR transport map from $\mu$ to $\nu$. Finally, we have that
$$\mathbb{E}_{X \sim \mu}\|X - T_{KR}X\|^2 = \mathrm{Tr}\big(A + B - (A T_{KR}^\top + T_{KR}A)\big) = \mathrm{Tr}\big(L_A L_A^\top + L_B L_B^\top - (L_A L_B^\top + L_B L_A^\top)\big) = \|L_A - L_B\|^2.$$

¹ Note that compared to [27], this is the reversed marginalization order, which is why the KR map here has lower triangular Jacobian.

Corollary 1. The (square root) cost of the Knothe-Rosenblatt transport, $\big(\mathbb{E}_{X \sim \mu}\|X - T_{KR}X\|^2\big)^{1/2}$, between centered Gaussians defines a distance (i.e. it satisfies all three metric axioms).

Proof. This comes from the fact that $\big(\mathbb{E}_{X \sim \mu}\|X - T_{KR}X\|^2\big)^{1/2} = \|L_A - L_B\|$.

As can be expected from the fact that MK can be seen as a generalization of KR, the MK transportation map is linear and has a block-triangular structure. The next proposition shows that the MK transport map can be expressed as a function of the Schur complements $A/A_E \stackrel{\text{def}}{=} A_{E^\perp} - A_{EE^\perp}^\top A_E^{-1} A_{EE^\perp}$ of $A$ w.r.t. $A_E$, and $B/B_E$ of $B$ w.r.t. $B_E$, which are the covariance matrices of $\mu$ (resp. $\nu$) conditioned on $E$.

Figure 3: (a) Wasserstein-Bures geodesic (usual Monge interpolation of Gaussians) and (b) Monge-Knothe interpolation through $E = \{(x, y) : x = y\}$ from $\mu_0$ to $\mu_1$, at times $t = 0, 0.25, 0.5, 0.75, 1$.

Proposition 6 (Monge-Knothe (MK) Transport for Gaussians). Let $A, B$ be represented in an orthonormal basis for $E \oplus E^\perp$. The MK transport map on $E$ between $\mu$ and $\nu$ is given by
$$T_{MK} = \begin{pmatrix} T^{A_E B_E} & 0_{k \times (d-k)} \\ \big(B_{EE^\perp}^\top (T^{A_E B_E})^{-1} - T^{(A/A_E)(B/B_E)} A_{EE^\perp}^\top\big)(A_E)^{-1} & T^{(A/A_E)(B/B_E)} \end{pmatrix}.$$

Proof. As can be seen from the structure of the MK transport map in Definition 3, $T_{MK}$ has a lower block-triangular Jacobian (with block sizes $k$ and $d - k$), with PSD matrices on the diagonal (corresponding to the Jacobians of the Monge maps (i) between marginals and (ii) between conditionals).
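Propositions 5 and 6 translate directly into code. The sketch below (helper names are ours) builds $T_{KR}$ from Cholesky factors and $T_{MK}$ from the blocks and Schur complements of $A$ and $B$, taking $E$ to be the span of the first $k$ canonical basis vectors, and checks that both maps push $\mathcal{N}(0, A)$ onto $\mathcal{N}(0, B)$:

```python
import numpy as np
from scipy.linalg import sqrtm

def monge(A, B):
    # T^{AB} = A^{-1/2} (A^{1/2} B A^{1/2})^{1/2} A^{-1/2}
    Ah = sqrtm(A).real
    Ahi = np.linalg.inv(Ah)
    return Ahi @ sqrtm(Ah @ B @ Ah).real @ Ahi

def kr_map(A, B):
    # Proposition 5: T_KR = L_B L_A^{-1}, lower-triangular Cholesky factors
    LA, LB = np.linalg.cholesky(A), np.linalg.cholesky(B)
    return LB @ np.linalg.inv(LA)

def mk_map(A, B, k):
    # Proposition 6, with E spanned by the first k canonical basis vectors
    AE, AEp, BE, BEp = A[:k, :k], A[:k, k:], B[:k, :k], B[:k, k:]
    SA = A[k:, k:] - AEp.T @ np.linalg.inv(AE) @ AEp   # Schur complement A/A_E
    SB = B[k:, k:] - BEp.T @ np.linalg.inv(BE) @ BEp   # Schur complement B/B_E
    TE, Tp = monge(AE, BE), monge(SA, SB)              # marginal / conditional maps
    C = (BEp.T @ np.linalg.inv(TE) - Tp @ AEp.T) @ np.linalg.inv(AE)
    return np.block([[TE, np.zeros((k, A.shape[0] - k))], [C, Tp]])

rng = np.random.default_rng(1)
d, k = 5, 2
X, Y = rng.normal(size=(d, d)), rng.normal(size=(d, d))
A, B = X @ X.T + np.eye(d), Y @ Y.T + np.eye(d)

T_kr, T_mk = kr_map(A, B), mk_map(A, B, k)
for T in (T_kr, T_mk):
    assert np.allclose(T @ A @ T.T, B, atol=1e-6)      # both transport N(0,A) to N(0,B)
# KR cost = squared Frobenius distance between Cholesky factors (Proposition 5)
LA, LB = np.linalg.cholesky(A), np.linalg.cholesky(B)
cost_kr = np.trace(A) + np.trace(B) - np.trace(T_kr @ A + A @ T_kr.T)
assert np.isclose(cost_kr, np.sum((LA - LB) ** 2), atol=1e-6)
```

For a general subspace $E$, one would first rotate $A$ and $B$ into an orthonormal basis $(V_E\ V_{E^\perp})$ before applying `mk_map`.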
Further, since $\mu$ and $\nu$ are Gaussian, their disintegrations are Gaussian as well. Hence, all Monge maps from the disintegrations of $\mu$ to those of $\nu$ are linear, and therefore the matrix representing $T$ is equal to its Jacobian. One can check that the map $T$ in the proposition verifies $T A T^\top = B$ and is of the right form. One can also check that it is the unique such matrix, hence it is the MK transport map.

5 Selecting the Supporting Subspace

Algorithm 1: MK Subspace Selection
  Input: $A, B$ PSD, $k \in \{1, \ldots, d\}$
  $V \leftarrow \mathrm{Polar}(AB)$
  while not converged do
    $L \leftarrow \mathrm{MK}(V^\top A V, V^\top B V; k)$
    $V \leftarrow V - \nabla_V L$
    $V \leftarrow \mathrm{Polar}(V)$
  end while
  Output: $E = \mathrm{Span}(v_1, \ldots, v_k)$

Both MI and MK transports are highly dependent on the chosen subspace $E$. Depending on applications, $E$ can either be prescribed (e.g. if one has access to a transport map between the marginals in a given subspace) or has to be selected. In the latter case, we give guidelines on how prior knowledge can be used, and alternatively propose an algorithm for minimizing the MK distance.

Subspace Selection Using Prior Knowledge.
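Algorithm 1 alternates a gradient step on the objective with a polar retraction back onto the orthonormal (Stiefel) matrices. The sketch below mimics that loop on a simplified surrogate objective (the Bures distance between the $k$-dimensional projected covariances, rather than the full MK cost), with a finite-difference gradient and step-halving; it is our own stand-in for illustration, not the paper's implementation:

```python
import numpy as np
from scipy.linalg import sqrtm

def polar(M):
    # orthonormal factor of the polar decomposition: M (M^T M)^{-1/2}
    return M @ np.linalg.inv(sqrtm(M.T @ M).real)

def bures2(A, B):
    s = sqrtm(sqrtm(A).real @ B @ sqrtm(A).real).real
    return np.trace(A) + np.trace(B) - 2.0 * np.trace(s)

def loss(V, A, B):
    # surrogate for the MK objective: Bures distance between projections
    return bures2(V.T @ A @ V, V.T @ B @ V)

def fd_grad(V, A, B, eps=1e-5):
    # central finite differences (stand-in for an analytic/autodiff gradient)
    G = np.zeros_like(V)
    for i in range(V.shape[0]):
        for j in range(V.shape[1]):
            Vp, Vm = V.copy(), V.copy()
            Vp[i, j] += eps
            Vm[i, j] -= eps
            G[i, j] = (loss(Vp, A, B) - loss(Vm, A, B)) / (2 * eps)
    return G

rng = np.random.default_rng(2)
d, k = 4, 2
X, Y = rng.normal(size=(d, d)), rng.normal(size=(d, d))
A, B = X @ X.T + np.eye(d), Y @ Y.T + np.eye(d)

V = polar((A @ B)[:, :k])          # initialization in the spirit of Algorithm 1
lr, L = 0.1, loss(V, A, B)
for _ in range(50):
    Vn = polar(V - lr * fd_grad(V, A, B))   # gradient step + polar retraction
    Ln = loss(Vn, A, B)
    if Ln <= L:                    # accept only decreasing retracted steps
        V, L = Vn, Ln
    else:
        lr *= 0.5
```

The polar retraction keeps the iterate's columns orthonormal throughout, so the output columns can serve directly as a basis of the selected subspace $E$.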