




版权说明:本文档由用户提供并上传,收益归属内容提供方,若内容存在侵权,请进行举报或认领
文档简介
1、Poisson-Randomized Gamma Dynamical Systems Aaron Schein Data Science Institute Columbia University Scott W. Linderman Department of Statistics Stanford University Mingyuan Zhou McCombs School of Business University of Texas at Austin David M. Blei Department of Statistics Columbia University Hanna W
2、allach Microsoft Research New York, NY Abstract This paper presents the Poisson-randomized gamma dynamical system (PRGDS), a model for sequentially observed count tensors that encodes a strong inductive bias toward sparsity and burstiness. The PRGDS is based on a new motif in Bayesian latent variabl
3、e modeling, an alternating chain of discrete Poisson and continuous gamma latent states that is analytically convenient and computationally tractable. This motif yields closed-form complete conditionals for all variables by way of the Bessel distribution and a novel discrete distribution that we cal
4、l the shifted confl uent hypergeometric distribution. We draw connections to closely related models and compare the PRGDS to these models in studies of real-world count data sets of text, international events, and neural spike trains. We fi nd that a sparse variant of the PRGDS, which allows the con
5、tinuous gamma latent states to take values of exactly zero, often obtains better predictive performance than other models and is uniquely capable of inferring latent structures that are highly localized in time. 1Introduction Political scientists routinely analyze event counts of the number of times
6、 countryitook actiona toward countryjduring time stept1. Such data can be represented as a sequence of count tensors Y (1),.,Y(T) each of which contains theVVAevent counts for that time step for every combina- tion ofVsender countries,Vreceivers, andAaction types. International event data sets exhib
7、it “com- plex dependence structures” 2 like coalitions of countries and bursty temporal dynamics. These de- pendence structures violate the independence assumptions of the regression-based methods that politi- cal scientists have traditionally used to test theories of international relations 35. Pol
8、itical scientists have therefore advocated for using latent variable models to infer unobserved structures as a way of controlling for them 6. This approach motivates interpretable yet expressive models that are capable of capturing a variety of complex dependence structures. Recent work has applied
9、 tensor factorization methods to international event data sets 711 to infer coalition structures among countries and topic structures among actions; however, these methods assume that the sequentially observed count tensors are exchangeable, thereby failing to capture the bursty temporal dynamics in
10、herent to such data sets. Sequentiallyobservedcounttensorspresentuniquestatisticalchallengesbecausetheytendtobebursty 12, high-dimensional, and sparse 13,14. There are few models that are tailored to the challenging properties of both time series and count tensors. In recent years, Poisson factoriza
11、tion has emerged as a framework for modeling count matrices 1520 and tensors 13,21,9. Although factorization methods generally scale with the size of the matrix or tensor, many Poisson factorization models yield inference algorithms that scale linearly with the number of non-zero entries. This prope
12、rty allows researchers to effi ciently infer latent structures from massive tensors, provided these tensors are sparse; however, this property is unique to a subset of Poisson factorization models that only posit 33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Can
13、ada. (3) Y (2) Y (3) Y (1) ?(1)(2) ? discrete continuous intractable tractable (a) Poissongamma dynamical systems 22 (3) h(1) Y (2) Y (3) h(2) Y (1) g h(3) ?(1) (2) ? (b) Poisson-randomized gamma dynamical systems Figure 1:Left: The PGDS imposes dependencies directly between the gamma latent states,
14、 preventing closed-form complete conditionals. Right: The PRGDS (this paper) breaks these dependencies with discrete Poisson latent statesdoing so yields closed-form conditionals for all variables without data augmentation. non-negative prior distributions, which are diffi cult to chain in state-spa
15、ce models for time series. Hier- archical compositions of non-negative priorsnotably, gamma and Dirichlet distributionstypically introduce non-conjugate dependencies that require innovative approaches to posterior inference. This paper fi lls a gap in the literature between Poisson factorization mod
16、els that are tractablei.e., yielding closed-form complete conditionals that make inference algorithms easy to deriveand those that are expressivei.e., capable of capturing a variety of complex dependence structures. To do so, we introduce an alternating chain of discrete Poisson and continuous gamma
17、 latent states, a new modeling motif that is analytically convenient and computationally tractable. We rely on this motif to construct the Poisson-randomized gamma dynamical system (PRGDS), a model for sequentially observed count tensors that is tractable, expressive, and effi cient. The PRGDS is cl
18、osely related to the Poissongamma dynamical system (PGDS) 22, a recently introduced model for dynamic count ma- trices, that is based on non-conjugate chains of gamma states. These chains are intractable; thus, poste- rior inference in the PGDS relies on sophisticated data augmentation schemes that
19、are cumbersome to derive and impose unnatural restrictions on the priors over other variables. In contrast, the PRGDS in- troduces intermediate Poisson states that break the intractable dependencies between the gamma states (see Fig. 1). Although this motif is only semi-conjugate, it is tractable, y
20、ielding closed-form complete conditionals for the Poisson states by way of the little-known Bessel distribution 23 and a novel discrete distribution that we derive and call the shifted confl uent hypergeometric (SCH) distribution. We study the inductive bias of the PRGDS by comparing its smoothing a
21、nd forecasting performance to that of the PGDS and two other baselines on a range of real-world count data sets of text, interna- tional events, and neural spike data. For smoothing, we fi nd that the PRGDS performs better than or similarly to the PGDS; for forecasting, we fi nd the converse relatio
22、nship. Both models outperform the other baselines. Using a specifi c hyperparameter setting, the PRGDS permits the continuous gamma latent states to take values of exactly zero, thereby encoding a unique inductive bias tailored to sparsity and burstiness. We fi nd that this sparse variant always obt
23、ains better smoothing and forecasting perfor- mance than the non-sparse variant. We also fi nd that this sparse variant yields a qualitatively broader range of latent structuresspecifi cally, bursty latent structures that are highly localized in time. 2Poisson-randomized gamma dynamical systems (PRG
24、DS) Notation.Consider a data set of sequentially observed count tensorsY (1),.,Y(T), each of which hasMmodes. An entryy (t) i 0,1,2,.in thetthtensor is subscripted by a multi-index i (i1,.,iM)that indexes into theMmodes of the tensor. As an example, the event count of the number of times countryitoo
25、k actionatoward countryjduring time steptcan be written as y (t) i where the multi-index corresponds to the sender, receiver, and action typei.e., i = (i,j,a). Generative process. The PRGDS is a form of canonical polyadic decomposition 24 that assumes y (t) i Pois ? (t) K X k=1 k (t) k M Y m=1 (m) k
26、im ? ,(1) 2 where (t) k represents the activation of thekthcomponent at time stept. Each component represents a dependence structure in the data set by way of a factor vector (m) k for each modem. For international events, the fi rst factor vector (1) k = ( (1) k1,., (1) kV) represents the rate at w
27、hich each of theV countries acts as a sender in thekthcomponent while the second factor vector (2) k represents the rate at which each country acts as a receiver. The weightskand(t)represent the scales of component k and time step t. The PRGDS is stationary if (t)=. We posit the following conjugate
28、priors: (t) Gam(a0,b0)and (m) k Dir(a0,.,a0).(2) The PRGDS is characterized by an alternating chain of discrete and continuous latent states. The continuous states (1) k ,., (T) k evolve via the intermediate discrete states h (1) k ,.,h (T) k as follows: (t) k Gam ?() 0 +h (t) k , ? and h (t) k Pois
29、 ? K X k2=1 kk2 (t 1) k2 ? ,(3) where we defi ne (0) k = kto be the per-component weight from Eq. (1). In other words, the PRGDS assumes that (t) k is conditionally gamma distributed with rateand shape equal toh (t) k plus hyperparameter? () 0 0. We adopt the convention that a gamma random variable
30、will be zero, almost surely, if its shape is zero. Therefore, setting? () 0 =0 defi nes a sparse variant of the PRGDS, where the gamma latent state (t) k takes the value of exactly zero providedh (t) k =0i.e., (t) k a.s. = 0ifh (t) k =0. The transition weightkk2in Eq. (3) represents how strongly com
31、ponentk2excites componentk at the next time step. We view these weights collectively as aKKtransition matrixand impose Dirichlet priors over the columns of this matrix. We also place a gamma prior over concentration parameter . This prior is conjugate to the gamma and Poisson distributions in which
32、it appears: Gam(0,0) and k Dir(a0,.,a0) such that PK k1k1k = 1.(4) Fortheper-componentweights1,.,K , weuseahierarchicalpriorwithasimilarfl avortoEq.(3): k Gam ? ?() 0 K + gk, ? and gk Pois ? K ? ,(5) where? () 0 is analogous to? () 0 . Finally, we use the following gamma priors, which are both conju
33、gate: Gam(a0,b0)and Gam(0,0).(6) The PRGDS has fi ve fi xed hyperparameters:? () 0 ,? () 0 ,0,a0, andb0. For the empirical studies in 5, we seta0=b0=0.01 to defi ne weakly informative gamma and Dirichlet priors and set0=10 to defi ne a gamma prior that promotes values close to 1; we consider ? () 0
34、0,1 and set ? () 0 =1. Properties.In Eq. (5), both? () 0 andare divided by the number of componentsK. This means that asthenumberofcomponentsgrowsK , theexpectedsumoftheweightsremainsfi niteandfi xed: X k=1 Ek = X k=1 ?() 0 K + Egk ?1 = X k=1 ?() 0 K + K ?1 = ?() 0 + ?1.(7) This prior encodes an ind
35、uctive bias toward small values ofk and may be interpreted as the fi nite truncation of a novel Bayesian nonparametric process. A small value ofkshrinks the Poisson rates of bothy (t) i and the fi rst discrete latent stateh (0) k . As a result, this prior encourages the PRGDS to only infer component
36、s that are both predictive of the data and useful for capturing the temporal dynamics. The marginal expectation of (t)=( (t) 1 ,., (t) K) takes the form of a linear dynamical system: E ?(t) |(t 1)?= E ?E?(t) |h(t 1)?= ? () 0 1+ (t 1).(8) This is becauseE ?(t) k ? = ?() 0 +E ?h(t) k ?1 = ?() 0 + PK k
37、2=1kk2 (t 1) k2 ?1 by iterated expec- tation. Concentration parameterappears in both the Poisson and gamma distributions in Eq. (3). It contributes to the variance of the PRGDS, while simultaneously canceling out of the expectation in Eq. (8), except for its role in the additive term ? () 0 1, which
38、 itself disappears when ? () 0 =0. Finally, we can analytically marginalize out all of the discrete Poisson latent states to obtain a purely continuous dynamical system. When ? () 0 0, this dynamical system can be written as follows: (t) k RG1 ? ? () 0 , K X k2=1 kk2 (t 1) k2 , ? ,(9) where RG1 deno
39、tes the randomized gamma distribution of the fi rst type 23,25. When? () 0 =0, the dynamical system can be written in terms of a limiting form of the RG1. We describe the RG1 in Fig. 2. 3 051015 0.0 0.2 0.4 0.6 0.8 1.0 1.2 P(|,?,?=1) = 0.5 051015 = 1 051015 = 4 ?=4 ?=2 ?=0.5 RG1(;,?,?) = ? r ? ? !?1
40、 e?I?1 2 p ? Figure 2: The randomized gamma distribution of the fi rst type (RG1) 23,25 has support0 and is defi ned by three parameters:?,0 . Its PDF is displayed in the fi gure;I?1() is the modifi ed Bessel function of the fi rst kind 26. When? 0: m Pois(c3), Gam ?() 0 +h,c2?, and h Pois(c1).(12)
41、This model is semi-conjugate. The gamma prior overis conjugate to the Poisson and its posterior is ?| ? Gam ?() 0 +h + m, c2+ c3?.(13) The Poisson prior overhis not conjugate to the gamma; however, despite this, the posterior ofh is still available in closed form by way of the Bessel distribution 23
42、, which we defi ne in Fig. 3(a): ?h| ? Bes? () 0 1, 2 p c2c1?.(14) The Bessel distribution can be sampled effi ciently 53; our Cython implementation is available online.1Provided that? () 0 0, samplingandhiteratively from Eqs. (13) and (14) constitutes a valid Markov chain for posterior inference. W
43、hen? () 0 =0, though, a.s. = 0ifh=0, and vice versa. As a result, this Markov chain has an absorbing condition ath=0and violates detailed balance. In this case, we must therefore sample h with marginalized out. Toward that end, we prove Theorem 1. 1 5 020406080100 h 0.00 0.05 0.10 0.15 0.20 0.25 0.3
44、0 P(h|v,a) v=-0.5, a=30 v=-0.8, a=10 v=1, a=50 v=2, a=7 v=10, a=150 v=400, a=300 Bes(h;v,a) = (a 2) 2h+v h!?(v+h+1)Iv(a) (a) Bessel distribution 23 120406080100 h 0.00 0.05 0.10 0.15 0.20 0.25 0.30 0.35 P(h|m,) m=1, =1 m=1, =10 m=1, =50 m=10, =1 m=1000, =1 m=45, =45 SCH(h;m,) = (h+m+1)! (h+1)!h!m! h
45、?1 1F1(m+1;2;) (b) Shifted confl uent hypergeometric (SCH) distribution Figure 3:Two discrete distributions that arise as posteriors in PoissongammaPoisson chains. Theorem 1: The incomplete conditional P(h|? () 0 =0,) , R P(h,|? () 0 =0,)d is (h|) ( Pois? c1c2 c3+c2 ? if m = 0 SCH?m, c1c2 c3+c2 ? ot
46、herwise, (15) where SCH denotes the shifted confl uent hypergeometric distribution. We describe the SCH in Fig. 3(b) and provide further information in the supplementary material, including the derivation of its PMF, PGF, and mode, along with details of how we sample from it and the proof for Theore
47、m 1. 4.2Closed-form complete conditionals for the PRGDS The PRGDS admits a latent source representation, so the fi rst step of posterior inference is therefore ?(y(t) ik) K k=1| ? Multinom ? y (t) i , ? k (t) k QM m=1 (m) kim ?K k=1 ? .(16) We may similarly representh (t) k under its latent source r
48、epresentationi.e.,h (t) k h (t) k= PK k2=1h (t) kk2, whereh (t) kk2Pois ? kk2 (t 1) k2 ?. When notationally convenient, we use dot-notation (“”) to denote summing over a mode. In this case,h (t) k denotes the sum of thekthrow of theKKmatrix of latent countsh (t) kk2. The complete conditional of thek
49、 th row of counts, when conditioned on their sumh (t) k, is ?(h(t) kk2) K k2=1| ? Multinom ?h(t) k, (kk2 (t 1) k2 )K k2=1 ? .(17) Toderivetheconditionalfor (t) k weaggregatethePoissonvariablesthatdependonit. ByPoissonaddi- tivity, thecolumnsumh (t + 1) k =PK k1=1h (t + 1) k1k isdistributedash (t + 1
50、) k Pois ?(t) k k?andsimilarlyy (t) k is distributed asy (t) k Pois? (t) k (t)k QM m=1 (m) k ?. The count m (t) k , h (t + 1) k +y (t) k isolates all depen- dence on (t) k and is also Poisson distributed. By gammaPoisson conjugacy, the conditional of (t) k is ?(t) k | ? Gam? () 0 +h (t) k + m (t) k
51、, + k+ (t)k QM m=1 (m) k ?. (18) When ? () 0 0, we apply the identity in Eq. (14) and sample h (t) k from its complete conditional: ?h(t) k | ? Bessel ? ? () 0 1, 2 q (t) k 2PK k2=1kk2 (t 1) k2 ? .(19) When? () 0 =0, we instead apply Theorem 1 to sampleh (t) k, wherem (t) k is analogous tomin Eq. (1
52、5): ?h(t) k | (t) k ? ?Pois(t) k )if m (t) k =0 SCH(m (t) k , (t) k )otherwise where (t) k , 2 PK k2=1kk2 (t 1) k2 + k+(t)k QM m=1 (m) k . (20) The complete conditionals forkandgkfollow from applying the same PoissongammaPoisson identities, while the complete conditionals for , , (m) k , k, and all
53、follow from conjugacy. 6 0.0 0.5 1.0 1.5 2.0 SMOOTHING Information gain over BPTF (nats) GDELT 0.0 0.5 1.0 ICEWS 0 1 2 NeurIPS 0.0 0.2 0.4 DBLP 0.00 0.05 0.10 0.15 SOTU 0.00 0.25 0.50 0.75 1.00 FORECASTING Information gain over BPTF (nats) 0.0 0.2 0.4 0 2 5 7 10 0.0 0.1 0.2 0.3 0.0 0.1 0.2 GP-DPFA P
54、GDS PRGDS () 0 =0 PRGDS () 0 =1 (a) Matrix empirical studies (originally described by Schein et al. 22) 0.00 0.02 0.04 0.06 GDELT 0.000 0.002 0.004 0.006 0.008 0.010 ICEWS 0.00 0.01 0.02 0.03 0.04 Macaques 0.000 0.025 0.050 0.075 0.100 0.125 0.000 0.001 0.002 0.003 0.004 0.00 0.02 0.04 0.06 (b) Tens
55、or empirical studies Figure 4:The smoothing performance (top row) or forecasting performance (bottom row) of each model is quantifi ed by its information gain over a non-dynamic baseline (BPTF 9), where higher values are better. 5Empirical studies As explained in the previous section, the Poissongam
56、maPoisson motif of the PRGDS (see 4.1) yields a more tractable (see Fig. 1) and fl exible (see 3) model than previous models. This motif also encodes a unique inductive bias tailored to sparsity and burstiness that we test by comparing the PRGDS to the PGDS (described in 3). As we can see by compari
57、ng Eqs. (9) and (10), comparing these models isolates the impact of the PoissongammaPoisson motif. Because the PGDS was pre- viously introduced to model aTVmatrixYof sequentially observedV-dimensional count vectors y(1),.,y(T), we generalize the PGDS toM-mode tensors and provide derivations of its c
58、omplete conditionals in the supplementary material. Our Cython implementation of this generalized PGDS (and the PRGDS) is available online. We also compare the variant of the PRGDS with? () 0 =1to the variant with? () 0 =0, which allows the continuous gamma latent states to take values of exactly zero. Setup.Our empirical studies all have the following setup. For each data setY (1),.
温馨提示
- 1. 本站所有资源如无特殊说明,都需要本地电脑安装OFFICE2007和PDF阅读器。图纸软件为CAD,CAXA,PROE,UG,SolidWorks等.压缩文件请下载最新的WinRAR软件解压。
- 2. 本站的文档不包含任何第三方提供的附件图纸等,如果需要附件,请联系上传者。文件的所有权益归上传用户所有。
- 3. 本站RAR压缩包中若带图纸,网页内容里面会有图纸预览,若没有图纸预览就没有图纸。
- 4. 未经权益所有人同意不得将文件中的内容挪作商业或盈利用途。
- 5. 人人文库网仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对用户上传分享的文档内容本身不做任何修改或编辑,并不能对任何下载内容负责。
- 6. 下载文件中如有侵权或不适当内容,请与我们联系,我们立即纠正。
- 7. 本站不保证下载资源的准确性、安全性和完整性, 同时也不承担用户因使用这些下载资源对自己和他人造成任何形式的伤害或损失。
最新文档
- 2025年全国安全生产月活动《安全知识》备考模拟题(附答案)
- 江苏省盐城市解放路实验校2026届中考数学考试模拟冲刺卷含解析
- 软件委托开发合同范本2025年
- 被动式建筑设计-洞察及研究
- 企业咨询服务协议范本2025年
- 砂子采购合同范本2025年
- 门店合伙人协议请收藏(2025版)
- 软装墙布销售合同范本(2025版)
- 商铺销售代理合同2025年
- 劳动合同的补充协议标准(2025版)
- 电厂指标管理办法
- 2025年新修订《安全生产法》安全教育培训考核试卷及答案
- 开源人工智能:合作的价值与未来(研究报告中文版)
- 公司社保知识培训
- 湖北省潜江市2024-2025学年八年级下学期期末物理试题
- 泰国餐厅装修设计
- 2025年保密教育线上培训考试试题库及答案(共19套)
- 国际贸易政策课件
- 2025年甘肃省高考政治试题(含答案)
- 接听报修电话管理办法
- 电梯安全总监职责培训考核试题及答案
评论
0/150
提交评论