Expressive power of tensor-network factorizations for probabilistic modeling

Ivan Glasser (1,2), Ryan Sweke (3), Nicola Pancotti (1,2), Jens Eisert (3,4), J. Ignacio Cirac (1,2)

1 Max-Planck-Institut für Quantenoptik, D-85748 Garching
2 Munich Center for Quantum Science and Technology (MCQST), D-80799 München
3 Dahlem Center for Complex Quantum Systems, Freie Universität Berlin, D-14195 Berlin
4 Department of Mathematics and Computer Science, Freie Universität Berlin, D-14195 Berlin

Corresponding author: ivan.glasser@mpq.mpg.de

33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada.

Abstract

Tensor-network techniques have recently proven useful in machine learning, both as a tool for the formulation of new learning algorithms and for enhancing the mathematical understanding of existing methods. Inspired by these developments, and the natural correspondence between tensor networks and probabilistic graphical models, we provide a rigorous analysis of the expressive power of various tensor-network factorizations of discrete multivariate probability distributions. These factorizations include non-negative tensor-trains/MPS, which are in correspondence with hidden Markov models, and Born machines, which are naturally related to the probabilistic interpretation of quantum circuits. When used to model probability distributions, they exhibit tractable likelihoods and admit efficient learning algorithms. Interestingly, we prove that there exist probability distributions for which there are unbounded separations between the resource requirements of some of these tensor-network factorizations. Of particular interest, using complex instead of real tensors can lead to an arbitrarily large reduction in the number of parameters of the network. Additionally, we introduce locally purified states (LPS), a new factorization inspired by techniques for the simulation of quantum systems, with provably better expressive power than all other representations considered. The ramifications of this result are explored through numerical experiments.
1 Introduction

Many problems in diverse areas of computer science and physics involve constructing efficient representations of high-dimensional functions. Neural networks are a particular example of such representations that have enjoyed great empirical success, and much effort has been dedicated to understanding their expressive power, i.e. the set of functions that they can efficiently represent. Analogously, tensor networks are a class of powerful representations of high-dimensional arrays (tensors), for which a variety of algorithms and methods have been developed. Examples of such tensor networks are tensor trains/matrix product states (MPS) [1, 2] or the hierarchical Tucker decomposition [3, 4], which have found application in data compression [5-7], the simulation of physical systems [8-10] and the design of machine learning algorithms [11-16]. In addition to their use in numerical algorithms, tensor networks enjoy a rich analytical understanding which has facilitated their use as a tool for obtaining rigorous results on the expressive power of deep learning models [17-22], and fundamental insights into the structure of quantum mechanical systems [23].

In the context of probabilistic modeling, tensor networks have been shown to be in natural correspondence with probabilistic graphical models [24-29], as well as with Sum-Product Networks and Arithmetic Circuits [17, 30, 31]. Motivated by this correspondence, and with the goal of enhancing the toolbox for deriving analytical results on the properties of machine learning algorithms, we study the expressive power of various tensor-network models of discrete multivariate probability distributions. The models we consider, defined in Section 2, fall into two main categories:

- Non-negative tensor networks, which decompose a probability mass function as a network of non-negative tensors [32], as in a probabilistic graphical model [33].
- Born machines (BM), which model a probability mass function as the absolute value squared of a real or complex function, which is itself represented as a network of real or complex tensors. While Born machines have previously been employed for probabilistic modeling [34-40], they have additional potential applications in the context of quantum machine learning [41-44], since they arise naturally from the probabilistic interpretation of quantum mechanics.

These models are considered precisely because they represent non-negative tensors by construction. In this work we focus on tensor networks which are based on tensor-trains/MPS and generalizations thereof, motivated by the fact that these have tractable likelihood, and thus efficient learning algorithms, while lending themselves to a rigorous theoretical analysis. In this setting non-negative tensor networks encompass hidden Markov models (HMM), while Born machines include models that arise from local quantum circuits of fixed depth. Our results also apply to tensor networks with a tree structure, and as such can be seen as a more general comparison of the difference between non-negative tensor networks and Born machines.

The main result of this work is a characterization of the expressive power of these tensor networks. Interestingly, we prove that there exist families of probability distributions for which there are unbounded separations between the resource requirements of some of these tensor-network factorizations. This allows us to show that neither HMM nor Born machines should be preferred over the other in general. Moreover, we prove that using complex instead of real tensors can sometimes lead to an arbitrarily large reduction in the number of parameters of the network.

Furthermore, we introduce a new tensor-network model of discrete multivariate probability distributions with provably better expressive power than the previously introduced models. This tensor network, which retains an efficient learning algorithm, is referred to as a locally purified state (LPS) due to its origin in the classical simulation of quantum systems [45-48]. We demonstrate through numerical experiments on both random probability distributions and realistic data sets that our theoretical findings are relevant in practice, i.e. that LPS should be considered over HMM and Born machines for probabilistic modeling.

This paper is structured as follows: The models we consider are introduced in Section 2. Their relation with HMM and quantum circuits is made explicit in Section 3. The main results on expressive power are presented in Section 4. Section 5 then introduces learning algorithms for these tensor networks, and the results of numerical experiments are provided in Section 6.
2 Tensor-network models of probability distributions

Consider a multivariate probability mass function P(X_1, ..., X_N) over N discrete random variables X_i taking values in {1, ..., d}. This probability mass function is naturally represented as a multi-dimensional array, or tensor, with N indices, each of which can take d values. As such, we use the notation P to refer simultaneously to both the probability mass function and the equivalent tensor representation. More specifically, for each configuration X_1, ..., X_N the tensor element P_{X_1,...,X_N} stores the probability P(X_1, ..., X_N). Note that as P is a representation of a probability mass function, it is a tensor with non-negative entries summing to one.
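To make the identification of a probability mass function with a tensor concrete, the following minimal numpy sketch (the shapes, names and random distribution are illustrative choices, not taken from the paper) stores a joint distribution over N = 4 variables with d = 3 values each as an order-N array and checks the two defining properties: non-negative entries and normalization.

```python
import numpy as np

N, d = 4, 3                     # N discrete variables, each taking d values

# A generic pmf P(X_1, ..., X_N) stored as an order-N tensor with d**N entries.
P = np.random.rand(*([d] * N))  # non-negative entries
P /= P.sum()                    # normalize so that all entries sum to one

assert np.all(P >= 0) and np.isclose(P.sum(), 1.0)

# P[x1, x2, x3, x4] is the probability of one configuration
# (numpy indices start at 0, the paper's values start at 1).
print(P[0, 1, 2, 0])

# The exponential scaling that motivates compact factorizations:
print(P.size)                   # d**N = 81 entries here, but e.g. 3**50 for N = 50
```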
Here we are interested in the case where N is large. Since the number of elements of this tensor scales exponentially with N, it quickly becomes impossible to store. In cases where there is some structure to the variables, one may use a compact representation of P which exploits this structure, such as Bayesian networks or Markov random fields defined on a graph. In the following we consider models, known as tensor networks, in which a tensor T is factorized into the contraction of multiple smaller tensors. As long as T is non-negative, one can model P as P = T/Z_T, where Z_T = \sum_{X_1,\dots,X_N} T_{X_1,\dots,X_N} is a normalization factor. For all tensor networks considered in this work, this normalization factor can be evaluated efficiently, as explained in Section 5.

In particular, we define the following tensor networks, in both algebraic and graphical notation. In the diagrams each box represents a tensor and lines emanating from these boxes represent tensor indices. Connecting two lines implies a contraction, which is a summation over the connected index. (A numerical sketch of all three factorizations is given directly after the definitions below.)

1. Tensor-train/matrix product state (MPS_F): A tensor T, with N d-dimensional indices, admits an MPS_F representation of TT-rank_F r when the entries of T can be written as

   T_{X_1,\dots,X_N} = \sum_{\{\alpha_i\}=1}^{r} A_{1,X_1}^{\alpha_1} A_{2,X_2}^{\alpha_1,\alpha_2} \cdots A_{N-1,X_{N-1}}^{\alpha_{N-2},\alpha_{N-1}} A_{N,X_N}^{\alpha_{N-1}},   (1)

   with the corresponding graphical notation (2) a chain of boxes A_1, ..., A_N whose open legs are X_1, ..., X_N and whose connected legs are α_1, ..., α_{N-1}. Here A_1 and A_N are d × r matrices, and the A_i are order-3 tensors of dimension d × r × r, with elements in F ∈ {R≥0, R, C}. The indices α_i of these constituent tensors run from 1 to r and are contracted (summed over) to construct T.

2. Born machine (BM_F): A tensor T, with N d-dimensional indices, admits a BM_F representation of Born-rank_F r when the entries of T can be written as

   T_{X_1,\dots,X_N} = \left| \sum_{\{\alpha_i\}=1}^{r} A_{1,X_1}^{\alpha_1} A_{2,X_2}^{\alpha_1,\alpha_2} \cdots A_{N-1,X_{N-1}}^{\alpha_{N-2},\alpha_{N-1}} A_{N,X_N}^{\alpha_{N-1}} \right|^2,   (3)

   with elements of the constituent tensors A_i in F ∈ {R, C}, i.e., when T admits a representation as the absolute-value squared (element-wise) of an MPS_F of TT-rank_F r. The graphical notation (4) consists of two copies of the MPS chain, the second complex-conjugated, joined through the open legs X_1, ..., X_N, with contracted legs α_1, ..., α_{N-1} and α'_1, ..., α'_{N-1}.

3. Locally purified state (LPS_F): A tensor T, with N d-dimensional indices, admits an LPS_F representation of puri-rank_F r and purification dimension μ when the entries of T can be written as

   T_{X_1,\dots,X_N} = \sum_{\{\alpha_i,\alpha'_i\}=1}^{r} \sum_{\{\beta_i\}=1}^{\mu} A_{1,X_1}^{\beta_1,\alpha_1} \bar{A}_{1,X_1}^{\beta_1,\alpha'_1} A_{2,X_2}^{\beta_2,\alpha_1,\alpha_2} \bar{A}_{2,X_2}^{\beta_2,\alpha'_1,\alpha'_2} \cdots A_{N,X_N}^{\beta_N,\alpha_{N-1}} \bar{A}_{N,X_N}^{\beta_N,\alpha'_{N-1}},   (5)

   with graphical notation (6) two copies of the chain, the second complex-conjugated, joined site by site through the purification legs β_1, ..., β_N. Here A_1 and A_N are order-3 tensors of dimension d × μ × r and the A_i are order-4 tensors of dimension d × μ × r × r. The indices α_i run from 1 to r, the indices β_i run from 1 to μ, and both are contracted to construct T. Without loss of generality we can consider only μ ≤ rd².
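The following numpy sketch illustrates Eqs. (1), (3) and (5) on a small example. It draws random constituent tensors, evaluates single tensor elements of each of the three factorizations by contracting along the chain, and normalizes the resulting tensors into probability mass functions by brute force (the efficient evaluation of Z_T is the subject of Section 5). The shapes, variable names and the brute-force normalization are illustrative choices, not the paper's implementation.

```python
import numpy as np
from itertools import product

N, d, r, mu = 4, 2, 3, 2   # number of variables, local dimension, rank, purification dim

# --- MPS_{R>=0}, Eq. (1): A_1, A_N are d x r; A_i (1 < i < N) are d x r x r, non-negative.
A_first = np.random.rand(d, r)
A_bulk  = [np.random.rand(d, r, r) for _ in range(N - 2)]
A_last  = np.random.rand(d, r)

def mps_element(X):
    """T_{X_1..X_N} = sum over alpha of A_1^{a1} A_2^{a1,a2} ... A_N^{a_{N-1}} (Eq. 1)."""
    v = A_first[X[0]]                       # vector of length r (index alpha_1)
    for i, Xi in enumerate(X[1:-1]):
        v = v @ A_bulk[i][Xi]               # contract alpha_i, leaving alpha_{i+1}
    return v @ A_last[X[-1]]                # contract alpha_{N-1}

# --- Born machine, Eq. (3): the same chain with complex tensors, modulus squared at the end.
B_first = np.random.randn(d, r) + 1j * np.random.randn(d, r)
B_bulk  = [np.random.randn(d, r, r) + 1j * np.random.randn(d, r, r) for _ in range(N - 2)]
B_last  = np.random.randn(d, r) + 1j * np.random.randn(d, r)

def born_element(X):
    v = B_first[X[0]]
    for i, Xi in enumerate(X[1:-1]):
        v = v @ B_bulk[i][Xi]
    return abs(v @ B_last[X[-1]]) ** 2      # |MPS amplitude|^2 (Eq. 3)

# --- LPS, Eq. (5): each site also carries a purification index beta, contracted locally
#     between the tensor and its complex conjugate.
C_first = np.random.randn(d, mu, r) + 1j * np.random.randn(d, mu, r)
C_bulk  = [np.random.randn(d, mu, r, r) + 1j * np.random.randn(d, mu, r, r) for _ in range(N - 2)]
C_last  = np.random.randn(d, mu, r) + 1j * np.random.randn(d, mu, r)

def lps_element(X):
    # M carries the indices (alpha, alpha'); beta is summed at each site.
    M = np.einsum('ba,bc->ac', C_first[X[0]], C_first[X[0]].conj())
    for i, Xi in enumerate(X[1:-1]):
        M = np.einsum('ac,bae,bcf->ef', M, C_bulk[i][Xi], C_bulk[i][Xi].conj())
    t = np.einsum('ac,ba,bc->', M, C_last[X[-1]], C_last[X[-1]].conj())
    return t.real                            # imaginary part vanishes up to rounding

# Each factorization defines a non-negative tensor T; P = T / Z_T (brute force for small N).
for element in (mps_element, born_element, lps_element):
    T = np.array([element(X) for X in product(range(d), repeat=N)])
    P = T / T.sum()
    assert np.all(T >= -1e-12) and np.isclose(P.sum(), 1.0)
```

Since the chain is contracted one site at a time, a single element of the MPS or Born machine costs O(N r²) operations, in contrast to the d^N entries of the full tensor T.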
Note that all the representations defined above yield non-negative tensors by construction, except for MPS_{R/C}. In this work we consider only the subset of MPS_{R/C} which represent non-negative tensors. Given a non-negative tensor T, we define the TT-rank_F (Born-rank_F) of T as the minimal r such that T admits an MPS_F (BM_F) representation of TT-rank_F (Born-rank_F) r. We define the puri-rank_F of T as the minimal r such that T admits an LPS_F representation of puri-rank_F r, for some purification dimension μ. We note that if we consider tensors T with two d-dimensional indices (i.e., matrices), then the TT-rank_{R≥0} is the non-negative rank, i.e., the smallest k such that T can be written as T = AB with A a d × k matrix and B a k × d matrix, both with real non-negative entries. The TT-rank_{R/C} is the conventional matrix rank, the Born-rank_R (Born-rank_C) is the real (complex) Hadamard square-root rank, i.e., the minimal rank of a real (complex) entry-wise square root of T, and finally the puri-rank_R (puri-rank_C) is the real (complex) positive semidefinite rank [49].

Table 1: Summary of notations for the different tensor-network representations and their ranks.

| Tensor representation | MPS_{R≥0}     | MPS_{R/C}     | BM_{R/C}               | LPS_{R/C}      |
|-----------------------|---------------|---------------|------------------------|----------------|
| Tensor rank           | TT-rank_{R≥0} | TT-rank_{R/C} | Born-rank_{R/C}        | puri-rank_{R/C} |
| Matrix rank [49]      | rank_+        | rank          | square-root rank_{R/C} | psd rank_{R/C} |
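For tensors with two indices, some of these ranks can be probed directly in numpy. The brute-force sketch below (an arbitrary example matrix and illustrative names, not a computation from the paper) evaluates the conventional matrix rank, which equals the TT-rank_{R/C}, and the real Hadamard square-root rank, which equals the Born-rank_R, by minimizing the rank over all sign choices of the entry-wise square root; the non-negative rank and the psd ranks are likewise defined through minimizations over factorizations and have no single closed-form numpy routine.

```python
import numpy as np
from itertools import product

# A small non-negative matrix T, viewed as a tensor with two d-dimensional indices (d = 3).
T = np.array([[2.0, 1.0, 1.0],
              [1.0, 2.0, 1.0],
              [1.0, 1.0, 2.0]])

# TT-rank_{R/C} of a matrix is the conventional matrix rank.
print("matrix rank:", np.linalg.matrix_rank(T))

# Born-rank_R of a matrix is the real Hadamard square-root rank: the minimal rank of
# +/- sqrt(T_ij) over all sign choices.  Brute force over the 2**9 sign patterns here.
root = np.sqrt(T)
best = min(np.linalg.matrix_rank(np.reshape(signs, T.shape) * root)
           for signs in product([1.0, -1.0], repeat=T.size))
print("real Hadamard square-root rank:", best)

# TT-rank_{R>=0} (non-negative rank) and the puri-ranks (psd ranks) are defined through
# minimizations over factorizations with non-negative / psd factors, as in the text above.
```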
For a given rank and a given tensor network, there is a set of non-negative tensors that can be exactly represented, and as the rank is increased, this set grows. In the limit of arbitrarily large rank, all tensor networks we consider can represent any non-negative tensor. This work is concerned with the relative expressive power of these different tensor-network representations, i.e. how these representable sets compare for different tensor networks. This will be characterized in Section 4 in terms of the different ranks needed by different tensor networks to represent a non-negative tensor.

3 Relationship to hidden Markov models and quantum circuits

In order to provide context for the factorizations introduced in Section 2, we show here how they are related to other representations of probability distributions based on probabilistic graphical models and quantum circuits. In particular, we show that there is a mapping between hidden Markov models with a constant number of hidden units per variable and MPS_{R≥0} with constant TT-rank_{R≥0}, as well as between local quantum circuits of fixed depth and Born machines of constant Born-rank_C. These relations imply that results on the expressive power of the former directly provide results on the expressive power of the latter.

3.1 Hidden Markov models are non-negative matrix product states

Consider a hidden Markov model (HMM) with observed variables X_i taking values in {1, ..., d} and hidden variables H_i taking values in {1, ..., r} (Fig. 1). The probability of the observed variables may be expressed as

   P(X_1,\dots,X_N) = \sum_{H_1,\dots,H_N} P(X_1|H_1) \prod_{i=2}^{N} P(H_i|H_{i-1}) P(X_i|H_i).   (7)

Notice that P(H_i|H_{i-1}) and P(X_i|H_i) are matrices with non-negative elements, as depicted in the factor graph in the central diagram of Fig. 1. Now define the tensors A_{1,l}^{j} = P(X_1 = l | H_1 = j) and A_{i,l}^{jk} = P(H_i = k | H_{i-1} = j) P(X_i = l | H_i = k). Then the MPS with TT-rank_{R≥0} = r defined with the tensors A_i defines the same probability distribution on the observed variables as the HMM.

Figure 1: Mapping between a HMM and a non-negative MPS.
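This mapping can be checked numerically. The sketch below (randomly drawn HMM parameters, illustrative variable names, and a brute-force comparison over all configurations) builds the tensors A_1 and A_i from the HMM's emission and transition matrices exactly as defined above, and verifies that contracting the resulting non-negative MPS reproduces the sum over hidden configurations in Eq. (7); the final hidden variable is summed over, which amounts to absorbing it into the boundary tensor A_N of Eq. (1).

```python
import numpy as np
from itertools import product

N, d, r = 4, 2, 3   # observed variables, observed dimension, hidden dimension
rng = np.random.default_rng(0)

def stochastic(shape):
    """Random conditional probability table, normalized over its last axis."""
    M = rng.random(shape)
    return M / M.sum(axis=-1, keepdims=True)

emit  = [stochastic((r, d)) for _ in range(N)]        # emit[i][j, l]  = P(X_i = l | H_i = j)
trans = [stochastic((r, r)) for _ in range(N - 1)]    # trans[i][j, k] = P(H_{i+1} = k | H_i = j)

# Brute-force evaluation of Eq. (7): sum over all hidden configurations.
def hmm_tensor():
    T = np.zeros((d,) * N)
    for X in product(range(d), repeat=N):
        for H in product(range(r), repeat=N):
            p = emit[0][H[0], X[0]]
            for i in range(1, N):
                p *= trans[i - 1][H[i - 1], H[i]] * emit[i][H[i], X[i]]
            T[X] += p
    return T

# MPS_{R>=0} tensors as defined in the text:
#   A_1[l, j]    = P(X_1 = l | H_1 = j)
#   A_i[l, j, k] = P(H_i = k | H_{i-1} = j) * P(X_i = l | H_i = k)
A_first = emit[0].T                                                  # shape (d, r)
A_rest  = [np.einsum('jk,kl->ljk', trans[i - 1], emit[i]) for i in range(1, N)]

def mps_element(X):
    v = A_first[X[0]]                    # index alpha_1 = H_1
    for i, Xi in enumerate(X[1:]):
        v = v @ A_rest[i][Xi]            # contract the hidden index along the chain
    return v.sum()                       # sum over the last hidden variable

T_hmm = hmm_tensor()
T_mps = np.array([mps_element(X) for X in product(range(d), repeat=N)]).reshape((d,) * N)
assert np.allclose(T_hmm, T_mps)         # the non-negative MPS reproduces the HMM
# As for any tensor network of Section 2, dividing by Z_T = T_hmm.sum() yields the pmf.
```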
Conversely, given an MPS_{R≥0} with TT-rank_{R≥0} = r, there exists an HMM, with hidden variables of dimension r' ≤ min(dr, r²), defining the same probability mass function, as shown in the supplementary material. We note also that by using a different graph for the HMM, it is possible to construct an equivalent HMM with hidden variables of dimension r [28, 29]. As such, any results on expressivity derived for MPS_{R≥0} hold also for HMM.

3.2 Quantum circuits are Born machines or locally purified states

An introductory presentation of the details of the connection between quantum circuits and Born machines is contained in the supplementary material. There, we show that local quantum circuits of fixed depth D allow sampling from the probability mass function of N discrete d-dimensional random variables X_i which is given by the modulus squared of the amplitudes defined by the quantum circuit. For local quantum circuits of fixed depth D, these amplitudes can be written as an MPS of TT-rank_C = d^{D+1}. Therefore quantum circuits of fixed depth are in correspondence with Born machines of constant Born-rank_C, and any results on the expressive power of Born machines hold also for local quantum circuits, when considered as probabilistic models. Furthermore, quantum circuits that include alternating ancillary (or "hidden") and visible variables allow sampling from a probability distribution that can be expressed as an LPS. As such, this correspondence implies that any results on the expressive power of LPS hold also for local quantum circuits with alternating visible and hidden variables.

4 Expressive power of tensor-network representations

In this section we present various relationships between the expressive power of all representations, which constitute the primary results of this work.
The proofs of the propositions in this section can be found in the supplementary material.

Figure 2: Representation of the sets of non-negative tensors that admit a given tensor-network factorization. In this figure we fix the different ranks of the different tensor networks to be equal.

For a given rank, there is a set of non-negative tensors that can be exactly represented by a given tensor network. These sets are represented in Fig. 2 for the case in which the ranks of the tensor networks are equal. When one set is included in another, it means that for every non-negative tensor, the rank of one of the tensor-network factorizations is always greater than or equal to the rank of the other factorization. The inclusion relationships between these sets can therefore be characterized in terms of inequalities between the ranks, as detailed in Proposition 1.

Proposition 1. For all non-negative tensors: TT-rank_{R≥0} ≥ TT-rank_R, Born-rank_R ≥ Born-rank_C, Born-rank_R ≥ puri-rank_R, Born-rank_C ≥ puri-rank_C, puri-rank_R ≥ puri-rank_C, TT-rank_{R≥0} ≥ puri-rank_R, and TT-rank_R = TT-rank_C.
Next, as detailed in Proposition 2, and summarized in Table 2, we continue by showing that all the inequalities of Proposition 1 can in fact be strict, and that for all other pairs of representations there exist probability distributions showing that neither rank can always be lower than the other. This shows that neither of the two corresponding sets of tensors can be included in the other. The main new result is the introduction of a matrix with non-negative rank strictly smaller than its complex Hadamard square-root rank, i.e. TT-rank_{R≥0} < Born-rank_C.

Proposition 2. The ranks of all introduced tensor-network representations satisfy the properties contained in Table 2. Specifically, denoting by r_row (r_column) the rank appearing in the row (column), < indicates that there exists a tensor satisfying r_row < r_column, and ≶ indicates that there exist both a tensor satisfying r_row < r_column and a tensor satisfying r_column < r_row.

Table 2: Results of Proposition 2: pairwise comparison (with entries =, < or ≶) of TT-rank_R, TT-rank_{R≥0}, Born-rank_R, Born-rank_C, puri-rank_R and puri-rank_C.

We now answer the question: By how much do we need to increase the rank of a tensor network such that the set of tensors it can represent includes the set of tensors that can be represented by a different tensor network of a different rank? More specifically, consider a tensor that has rank r according to one representation and rank r' according to another. Can we bound the rank r as a function of the rank r' only? The results of Proposition 3, presented via Table 3, indicate that in many cases there is no such function, i.e. there exists a family of non-negative tensors, describing a family of probability distributions, with an unbounded separation between the corresponding ranks.