已阅读5页,还剩5页未读, 继续免费阅读
版权说明:本文档由用户提供并上传,收益归属内容提供方,若内容存在侵权,请进行举报或认领
文档简介
On two ways to use determinantal point processes for Monte Carlo integration Guillaume Gautier g.gautierinria.fr Rmi Bardenet remi.bardenet Michal Valko valkom Univ.Lille, CNRS, Centrale Lille, UMR 9189CRIStAL, 59651 Villeneuve dAscq, France Inria Lille-Nord Europe, 40 avenue Halley 59650 Villeneuve dAscq, France DeepMind Paris, 14 Rue de Londres, 75009 Paris, France Abstract When approximating an integral by a weighted sum of function evaluations, de- terminantal point processes (DPPs) provide a way to enforce repulsion between the evaluation points. This negative dependence is encoded by a kernel. Fifteen years before the discovery of DPPs, Ermakov Dick see also (Evans Huszr Briol et al., 2015), herding (Chen et al., 2010; Bach et al., 2012), or the biased importance sampling estimate of Delyon Kulesza see, e.g., Robert see (Simon, 2011, Section 3.11). On our side, we use a two-layer rejection sampling scheme. We rather sample from then-th conditional using the marginal distributionN?1KN(x,x)!(x)dxas proposal and rejection constantN/(N ?(n?1). This allows us to reduce the number of (costly) evaluations of bothKn?1(xn)and the quadratic form Kn?1(xn)TK?1 n?1Kn?1(xn)that appear in the acceptance ratio. The marginal distribution itself is sampled using the same proposal!eq(x)dxand rejection constant as BH. The rejection constant, of order2d, is derived from Chow et al. (1994) and Gautschi (2009). We further reduced the number of OP evaluations by consideringN?1KN(x,x)!(x)dxas a mixture, where each component in (6)involves only one OP. In the end, the expected total number of rejections is of order2dN logN. Finally, we implemented a specifi c rejection free method for the univariate Jacobi ensemble; a special continuous projection DPP which can be sampled exactly inO(N2), by computing the eigenvalues of a random tridiagonal matrix (Killip see Appendix B.1. The expected variance decay is of order1/N1+1/d. We reproduce their experiment in Figure 2 for largerN, and compare with the behavior of b I EZ N . In short,bI EZ N dramatically outperformsbI BH N ind 2, with surprisingly fast empirical convergence rates. When d ? 3, performance decreases, andbI BH N shows both faster and more regular variance decay. To know whether we can hope for a CLT for b I EZ N , we performed Kolmogorov-Smirnov tests for N = 300, which yielded smallp-values across dimensions, from 0.03 to 0.24. This is compared to the samep-values forbI BH N , which range from 0.60 to 0.99. The results are presented in Appendix B.1. The lack of normality ofbI EZ N is partly due to a few outliers. Where these outliers come from is left for future work; ill-conditioning of the linear system(10)is an obvious candidate. Besides, contrary tobI BH N , the estimatorbI EZ N has no guarantee to preserve the sign of integrands having constant sign. 7 4.2Integrating sums of eigenfunctions In the next series of experiments, we are mainly interested in testing the variance decay ofbI EZ N (f) prescribed by Theorem 1. To that end, we consider functions of the form f(x) = M?1 X b(k)=0 1 b(k) + 1?k(x), (16) whose integral w.r.t.is to be estimated based on realizations of the multivariate Jacobi ensemble with kernelKN(x,y) = PN?1 b(k)=0?k(x)?k(y), whereN 6= M a priori. This means that the functionf can be either fully (M N) or partially (M N) decomposed in the eigenbasis of the kernel. In both cases, we let the number of pointsNused to build the two estimators vary from10to100in dimensionsd = 1to4 . In the fi rst setting, we setM = 70. Thus,Neventually reaches the number of functions used to buildfin(16), after what b I EZ N is an exact estimator, see Figure 2(e)-(h). The second setting hasM = N + 1, so that the number of pointsNis never enough for the variance in (11) to be zero. The results of both settings are to be found in Appendix B.2. In the fi rst case, for each dimensiond, we indeed observe a drop in the variance of b I EZ N once the number of points of the DPP hits the thresholdN = M. This is in perfect agreement with Theorem 1: oncef 2 HM HN, the variance in(11)is zero. In the second setting, asNincreases the contribution of the extra mode?b?1(N)in(16)decreases as 1 N. Hence, from Theorem 1 we expect a variance decay of order 1 N2, which we observe in practice. 4.3Further experiments InAppendicesB.3-B.6wetesttherobustnessofbothBHandEZestimators, whenappliedtofunctions presenting discontinuities or which do not belong to the span of the eigenfunctions of the kernel. Although the conditions of the CLT(9)associated tobI BH are violated, the corresponding variance decay is smooth but not as fast. ForbI EZ, the performance deteriorates with the dimension. Indeed, the cross terms arising from the Taylor expansion of the different functions introduce monomials, associated to large coeffi cients, that do not belong toHN. Sampling more points would reduce the variance(11). But more importantly, for EZ to excel, this suggests to adapt the kernel to the basis where the integrand is known to be sparse or to have fast-decaying coeffi cients. 5Conclusion Ermakov & Zolotukhin (EZ, 1960) proposed a non-obvious unbiased Monte Carlo estimator using projection DPPs. It requires solving a linear system, which in turn involves evaluating both theN eigenfunctions of the corresponding kernel and the integrand at theNpoints of the DPP sample. This is yet another connection between DPPs and linear algebra. In fact, solving this linear system provides unbiased estimates of the Fourier-like coeffi cients of the integrandfwith each of theN eigenfunctions of the DPP kernel. Remarkably, these estimators have identical variance, and this variance measures the accuracy of the approximation offby its projection onto these eigenfunctions. With modern arguments, we have provided a much shorter proof of these properties than in the original work of (Ermakov & Zolotukhin, 1960). Beyond this, little is known on the EZ estimator. While coming with a less interpretable variance, the more direct estimator proposed by Bardenet & Hardy (BH, 2019) has an intrinsic connection with the classical Gauss quadrature and further enjoys stronger theoretical properties when using multivariate Jacobi ensemble. Our experiments highlight the key features of both estimators when the underlying DPP is a multi- variate Jacobi ensemble, and further demonstrate the known properties of the BH estimator in yet unexplored regimes. Although EZ shows a surprisingly fast empirical convergence rate ford 2, its behavior is more erratic ford ? 3. Ill-conditioning of the linear system is a potential source of outliers in the distribution of the estimator. Regularization may help but would introduce a stabil- ity/bias trade-off. More generally, EZ seems worth investigating for integration or even interpolation tasks where the function is known to be decomposable in the eigenbasis of the kernel, i.e., in a setting similar to the one of Bach (2017). Finally, the new implementation of an exact sampler for multivariate Jacobi ensemble unlocks more large-scale empirical investigations and asks for more theoretical work. The associated code1is available in the DPPy toolbox of Gautier et al. (2019). 8 References Bach, F. On the Equivalence between Kernel Quadrature Rules and Random Feature Expansions. Journal of Machine Learning Research, 2017. arXiv:1502.06800. Bach, F., Lacoste-Julien, S., and Obozinski, G. On the Equivalence between Herding and Condi- tional Gradient Algorithms. In International Conference on Machine Learning (ICML), 2012. arXiv:1203.4523. Bardenet, R. and Hardy, A. Monte Carlo with Determinantal Point Processes. Annals of Applied Probability, in press, 2019. arXiv:1605.00361. Briol, F.-X., Oates, C. J., Girolami, M., and Osborne, M. A. Frank-Wolfe Bayesian Quadrature: Prob- abilistic Integration with Theoretical Guarantees. In Advances in Neural Information Processing Systems (NeurIPS), 2015. arXiv:1506.02681. Chen, Y., Welling, M., and Smola, A. Super-Samples from Kernel Herding. In Conference on Uncertainty in Artifi cial Intelligence (UAI), 2010. arXiv:1203.3472. Chow, Y., Gatteschi, L., and Wong, R. A Bernstein-type inequality for the Jacobi polynomial. Proceedings of the American Mathematical Society, 1994. Davis, P. J. and Rabinowitz, P. Methods of numerical integration. Academic Press. 1984. Delyon, B. and Portier, F.Integral approximation by kernel smoothing.Bernoulli, 2016. arXiv:1409.0733. Dick, J. and Pillichshammer, F. Digital nets and sequences : discrepancy and quasi-Monte Carlo integration. Cambridge University Press. 2010. Ermakov, S. M. and Zolotukhin, V. G. Polynomial Approximations and the Monte-Carlo Method. Theory of Probability & Its Applications, 1960. Evans, M. and Swartz, T. Approximating integrals via Monte Carlo and deterministic methods. Oxford University Press. 2000. Gautier, G., Polito, G., Bardenet, R., and Valko, M. DPPy: DPP Sampling with Python. Journal of Machine Learning Research - Machine Learning Open Source Software (JMLR-MLOSS), in press, 2019. arXiv:1809.07258. Gautschi, W. How sharp is Bernsteins Inequality for Jacobi polynomials? Electronic Transactions on Numerical Analysis, 2009. Hough, J. B., Krishnapur, M., Peres, Y., and Virg, B. Determinantal Processes and Independence. In Probability Surveys. 2006. arXiv:math/0503110. Huszr, F. and Duvenaud, D. Optimally-Weighted Herding is Bayesian Quadrature. In Conference on Uncertainty in Artifi cial Intelligence (UAI), 2012. arXiv:1204.1664. Johansson, K. Random matrices and determinantal processes. Les Houches Summer School Proceed- ings, 2006. Killip, R. and Nenciu, I. Matrix models for circular ensembles. International Mathematics Research Notices, 2004. arXiv:math/0410034. Knig, W. Orthogonal polynomial ensembles in probability theory. Probability Surveys, 2004. arXiv:math/0403090. Kulesza, A. and Taskar, B. Determinantal Point Processes for Machine Learning. Foundations and Trends in Machine Learning, 2012. arXiv:1207.6083. Lavancier, F., Mller, J., and Rubak, E. Determinantal point process models and statistical inference : Extended version. Journal of the Royal Statistical Society. Series B: Statistical Methodology, 2012. arXiv:1205.4818. 9 Liu, Q. and Lee, J. D. Black-Box Importance Sampling. In Internation Conference on Artifi cial Intelligence and Statistics (AISTATS), 2017. arXiv:1610.05247. Macchi, O. The coincidence approach to stochastic point processes. Advances in
温馨提示
- 1. 本站所有资源如无特殊说明,都需要本地电脑安装OFFICE2007和PDF阅读器。图纸软件为CAD,CAXA,PROE,UG,SolidWorks等.压缩文件请下载最新的WinRAR软件解压。
- 2. 本站的文档不包含任何第三方提供的附件图纸等,如果需要附件,请联系上传者。文件的所有权益归上传用户所有。
- 3. 本站RAR压缩包中若带图纸,网页内容里面会有图纸预览,若没有图纸预览就没有图纸。
- 4. 未经权益所有人同意不得将文件中的内容挪作商业或盈利用途。
- 5. 人人文库网仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对用户上传分享的文档内容本身不做任何修改或编辑,并不能对任何下载内容负责。
- 6. 下载文件中如有侵权或不适当内容,请与我们联系,我们立即纠正。
- 7. 本站不保证下载资源的准确性、安全性和完整性, 同时也不承担用户因使用这些下载资源对自己和他人造成任何形式的伤害或损失。
最新文档
- 岗位讲述活动工作方案
- 幼儿园午睡室光线强度对幼儿入睡速度影响-基于2023年环境监测仪与睡眠记录表
- 文明施工协调措施方案
- 光伏柔性支架施工方案步骤参考
- 二学一专题实施方案
- 标枪训练课程设计
- c 课程设计航空订票系统
- 语文材料作文突破-引语式材料
- 初中七年级道德与法治“生命至上防患未‘燃’”假期消防安全主题班会教案
- 高中地理二轮复习·港口专题精讲讲义-港口枢纽:能级跃迁与价值重塑
- DL∕T 1392-2014 直流电源系统绝缘监测装置技术条件
- 电影叙事与美学智慧树知到期末考试答案章节答案2024年南开大学
- 农村院子菜园设计
- 2024外研版初中英语单词表汇总(七-九年级)中考复习必背
- 电加热供暖工程验收表
- 中医养生保健职业生涯发展规划
- 2022-2023学年雅安市六年级数学第二学期期末统考试题含解析
- 驾考三力测试模拟题含答案
- 技术创新成熟度评价标准及评价细则
- D500-D505 2016年合订本防雷与接地图集
- 氩弧焊焊接工艺指导书
评论
0/150
提交评论