




版权说明:本文档由用户提供并上传,收益归属内容提供方,若内容存在侵权,请进行举报或认领
文档简介
1、A General Framework for Symmetric Property Estimation Moses Charikar Stanford University Kirankumar Shiragur Stanford University Aaron Sidford Stanford University Abstract In this paper we provide a general framework for estimating symmet
2、ric properties of distributions from i.i.d. samples. For a broad class of symmetric properties we identify the easy region where empirical estimation works and the diffi cult region where more complex estimators are required. We show that by approximately computing the profi le maximum likelihood (P
3、ML) distribution ADOS16 in this diffi cult region we obtain a symmetric property estimation framework that is sample complexity optimal for many properties in a broader parameter regime than previous universal estimation approaches based on PML. The resulting algorithms based on these pseudo PML dis
4、tributions are also more practical. 1Introduction Symmetric property estimation is a fundamental and well studied problem in machine learning and statistics. In this problem, we are givenni.i.d samples from an unknown distribution1pand asked to estimatef(p), wherefis a symmetric property (i.e. it do
5、es not depend on the labels of the symbols). Over the past few years, the computational and sample complexities for estimating many symmetric properties have been extensively studied. Estimators with optimal sample complexities have been obtained for several properties including entropy VV11b,WY16a,
6、JVHW15, distance to uniformity VV11a, JHW16, and support VV11b, WY15. All aforementioned estimators were property specifi c and therefore, a natural question is to design a universal estimator. In ADOS16, the authors showed that the distribution that maximizes the profi le likelihood, i.e. the likel
7、ihood of the multiset of frequencies of elements in the sample, referred to as profi le maximum likelihood (PML) distribution, can be used as a universal plug-in estimator. ADOS16 showed that computing the symmetric property on the PML distribution is sample complexity optimal in estimating support,
8、 support coverage, entropy and distance to uniformity within accuracy 1 n0.2499. Further, this also holds for distributions that approximately optimize the PML objective with the approximation factor affecting the values of for which it holds. Acharya et al. ADOS16 posed two important and natural op
9、en questions. The fi rst was to give an effi cient algorithm for fi nding an approximate PML distribution, which was recently resolved in CSS19. The second open question is whether PML is sample competitive in all regimes of the accuracy parameter ? In this work, we make progress towards resolving t
10、his open question. Firstly, we show that the PML distribution based plug-in estimator achieves optimal sample complex- ity for allfor the problem of estimating support size. Next, we introduce a variation of the PML distribution that we call the pseudo PML distribution. Using this, we give a general
11、 framework for estimating a symmetric property. For entropy and distance to uniformity, this pseudo PML based framework achieves optimal sample complexity for a broader regime of the accuracy parameter than was known for the vanilla PML distribution. 1Throughout the paper, distribution refers to dis
12、crete distribution. 33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada. We provide a general framework that could, in principle be applied to estimate any separable symmetric propertyf, meaningf(p)can be written in the form of P x2Df(px). This motivation behin
13、d this framework is: for any symmetric propertyfthat is separable, the estimate forf(p) can be split into two parts:f(p) = P x2Bf(px) + P x2Gf(px), where BandGare a (property dependent) disjoint partition of the domainD. We refer toGas the good set andBas the bad set. Intuitively,Gis the subset of d
14、omain elements whose contribution tof(p)is easy to estimate, i.e a simple estimator such as empirical estimate (with correction bias) works. For many symmetric properties, fi nding an appropriate partition of the domain is often easy. Many estimators in the literature JVHW15,JHW16,WY16a make such a
15、distinction between domain elements. The more interesting and diffi cult case is estimating the contribution of the bad set: P x2Bf(px). Much of the work in these estimators is dedicated towards estimating this contribution using sophisticated techniques such as polynomial approximation. Our work gi
16、ves a unifi ed approach to estimating the contribution of the bad set. We propose a PML based estimator for estimating P x2Bf(px). We show that computing the PML distribution only on the setBis sample competitive for entropy and distance to uniformity for almost all interesting parameter regimes thu
17、s (partially) handling the open problem proposed in ADOS16. Additionally, requiring that the PML distribution be computed on a subsetB Dreduces the input size for the PML subroutine and results in practical algorithms (See Section 6). To summarize, the main contributions of our work are: We make pro
18、gress on an open problem of ADOS16 on broadening the range of error parameter that one can obtain for universal symmetric property estimation via PML. We give a general framework for applying PML to new symmetric properties. As a byproduct of our framework, we obtain more practical algorithms that i
19、nvoke PML on smaller inputs (See Section 6). 1.1Related Work For many natural properties, there has been extensive work on designing effi cient estimators both with respect to computational time and sample complexity HJWW17,HJM17,AOST14,RVZ17, ZVV+16,WY16b,RRSS07,WY15,OSW16,VV11b,WY16a,JVHW15,JHW16,
20、VV11a. We defi ne and state the optimal sample complexity for estimating support, entropy and distance to uniformity. For entropy, we also discuss the regime in which the empirical distribution is sample optimal. Entropy:For any distributionp 2 ?D, the entropyH(p) def = ? P x2Dpxlogpx. For ? log N N
21、 (the interesting regime), whereN def = |D|, the optimal sample complexity for estimatingH(p) within additive accuracyisO( N log N 1 )WY16a. Further if 0|. Estimating support is diffi cult in general because we need suffi ciently large number of samples to observe elements with small probability val
22、ues. Suppose for allx 2 D, ifpx2 0 1 k,1, then WY15 showed that the optimal sample complexity for estimating support within additive accuracy k is O( k log k log 2 1 ). PML was introduced by Orlitsky et al. OSS+04 in 2004. The connection between PML and universal estimators was fi rst studied in ADO
23、S16. As discussed in the introduction, PML based plug-in estimator applies to a restricted regime of error parameter. There have been several other approaches for designing universal estimators for symmetric properties. Valiant and Valiant VV11b adopted and rigorously analyzed a linear programming b
24、ased approach for universal estimators proposed by ET76 and showed that it is sample complexity optimal in the constant error regime for estimating certain symmetric properties (namely, entropy and support size). Recent work of Han et al. HJW18 applied a local moment matching based approach in desig
25、ning effi cient universal 2 symmetric property estimators for a single distribution. HJW18 achieves the optimal sample complexity in restricted error regimes for estimating the power sum function, support and entropy. Recently, YOSW18 gave a different unifi ed approach to property estimation. They d
26、evised an estimator that usesnsamples and achieves the performance attained by the empirical estimator with nplognsamples for a wide class of properties and for all underlying distributions. This result is further strengthened tonlognsamples for Shannon entropy and a broad class of other properties
27、including 1-distance in HO19b. Independently of our work, authors in HO19a propose truncated PML that is slightly different but similar in the spirit to our idea of pseudo PML; refer HO19a for further details. 1.2Organization of the Paper In Section 2 we provide basic notation and defi nitions. We p
28、resent our general framework in Section 3 and state all our main results. In Section 4, we provide proofs of the main results of our general framework. In Section 5, we use these results to establish the sample complexity of our estimator in the case of entropy (See Section 5.1) and distance to unif
29、ormity (See Section 5.2). Due to space constraints, many proofs are deferred to the appendix. In Section 6, we provide experimental results for estimating entropy using pseudo PML and other state-of-the-art estimators. Here we also demonstrate the practicality of our approach. 2Preliminaries Letaden
30、ote all integers in the interval1,a. Let?D 0,1D R be the set of all distributions supported on domainDand letNbe the size of the domain. Throughout this paper we restrict our attention to discrete distributions and assume that we receive a sequence ofnindependent samples from an underlying distribut
31、ionp 2 ?D. LetDnbe the set of all lengthnsequences andyn2 Dn be one such sequence withyn i denoting itsith element. The probability of observing sequenceynis: P(p,yn) def = Y x2D pf(y n,x) x wheref(yn,x) = |i 2 n | yn i = x|is the frequency/multiplicity of symbolxin sequenceynand pxis the probabilit
32、y of domain elementx 2 D . We next formally defi ne profi le, PML distribution and approximate PML distribution. Defi nition 2.1 (Profi le).For a sequenceyn2 Dn , its profi le denoted? = ?(yn) 2 Zn +is? def = (?(j)j2nwhere?(j) def = |x 2 D|f(yn,x) = j|is the number of domain elements with frequency
33、j in yn . We call n the length of profi le ? and use ?n denote the set of all profi les of length n. 2 For any distribution p 2 ?D , the probability of a profi le ? 2 ?n is defi ned as: P(p,?) def = X yn2Dn| ?(yn)=? P(p,yn)(1) The distribution that maximizes the probability of a profi le? is the pro
34、fi le maximum likelihood distribution and we formally defi ne it next. Defi nition 2.2 (Profi le maximum likelihood distribution). For any profi le? 2 ?n , a Profi le Maximum Likelihood (PML) distributionppml,?2 ?Dis:ppml,?2 argmaxp2?DP(p,?)andP(ppml,?,?)is the maximum PML objective value. Further,
35、a distributionp? pml,? 2 ?Dis a?-approximate PML distribution if P(p? pml,?,?) ? ? P(ppml,?,?). We next provide formal defi nitions for separable symmetric property and an estimator. Defi nition 2.3(Separable Symmetric Property).A symmetric propertyf : ?D! Ris separable if for anyp 2 ?D,f(p) def = P
36、 x2Dg(px), for some functiong : R ! R. Further for any subset S D, we defi ne fS(p) def = P x2Sg(px). 2 The profi le does not contain ?(0), the number of unseen domain elements. 3 Defi nition 2.4.A property estimator is a functionf : Dn! R, that takes as inputnsamples and returns the estimated prope
37、rty value. The sample complexity offfor estimating a symmetric property f(p)is the number of samples needed to estimatefup to accuracyand with constant probability. The optimal sample complexity of a propertyfis the minimum number of samples of any estimator. 3Main Results As discussed in the introd
38、uction, one of our motivations was to provide a better analysis for the PML distribution based plug-in estimator. In this direction, we fi rst show that the PML distribution is sample complexity optimal in estimating support in all parameter regimes. Estimating support is diffi cult in general and a
39、ll previous works make the assumption that the minimum non-zero probability value of the distribution is at least 1 k. In our next result, we show that the PML distribution under this constraint is sample complexity optimal for estimating support. Theorem 3.1.The PML distribution 3 based plug-in est
40、imator is sample complexity optimal in estimating support for all regimes of error parameter . For support, we show that an approximate PML distribution is sample complexity optimal as well. Theorem 3.2.For any constant 0, anexp(?2n1?)-approximate PML distribution 3 based plug-in estimator is sample
41、 complexity optimal in estimating support for all regimes of error . We defer the proof of both these theorems to Appendix A. For entropy and distance to uniformity, we study a variation of the PML distribution we call the pseudo PML distribution and present a general framework for symmetric propert
42、y estimation based on this. We show that this pseudo PML based general approach gives an estimator that is sample complexity optimal for estimating entropy and distance to uniformity in broader parameter regimes. To motivate and understand this general framework we fi rst defi ne new generalizations
43、 of the profi le, PML and approximate PML distributions. Defi nition 3.3(S -pseudo Profi le).For any sequenceyn2 DnandS D, itsS -pseudo profi le denoted?S= ?S(yn)is? def = (?S(j)j2nwhere?S(j) def = |x 2 S | f(yn,x) = j|is the number of domain elements inSwith frequencyjinyn. We callnthe length of?Sa
44、s it represents the length of the sequenceyn from which this pseudo profi le was constructed. Let?n S denote the set of all S-pseudo profi les of length n. For any distribution p 2 ?D , the probability of a S-pseudo profi le ?S2 ?n S is defi ned as: P(p,?S) def = X yn2Dn| ?S(yn)=?S P(p,yn)(2) We nex
45、t defi ne theS-pseudo PML and(?,S)-approximate pseudo PML distributions that are analogous to the PML and approximate PML distributions. Defi nition 3.4(S-pseudo PML distribution).For anyS -pseudo profi le?S2 ?n S, a distribution p?S2 ?Dis a S-pseudo PML distribution if p?S2 argmaxp2?DP(p,?S). Defi
46、nition 3.5(?,S)-approximate pseudo PML distribution). For any profi le?S2 ?n S, a distribu- tion p? ?S 2 ?Dis a (?,S)-approximate pseudo PML distribution if P(p? ?S,?S) ? ? P(p ? ?S,?S). For notational convenience, we also defi ne the following function. Defi nition 3.6.For any subsetS D, the functi
47、onFreq : ?n S ! 2Z+takes input aS -psuedo profi le and returns the set with all distinct frequencies in ?S. Using the defi nitions above, we next give an interesting generalization of Theorem 3 in ADOS16. Theorem 3.7.For a symmetric propertyfandS D, suppose there is an estimatorf : ?n S ! R, such th
48、at for any p and ?S p the following holds, P |fS(p) ?f(?S)| ? ? , 3 Under the constraint that its minimum non-zero probability value is at least 1 k. This assumption is also necessary for the results in ADOS16 to hold. 4 then for any F 2 2Z+, a (?,S)-approximate pseudo PML distribution p? ?S satisfi
49、 es: P |fS(p) ? fS(p? ?S)| ? 2 ?n|F| ? P(Freq(?S) F) + P(Freq(?S) 6 F) . Note that in the theorem above, the error probability with respect to a pseudo PML distribution based estimator has dependency on?n |F| ? andP(Freq(?S) 6 F). However Theorem 3 in ADOS16 has error probability ?e pn ? . This is t
50、he bottleneck in showing that PML works for all parameter regimes and the place where pseudo PML wins over the vanilla PML based estimator, getting non-trivial results for entropy and distance to uniformity. We next state our general framework for estimating symmetric properties. We use the idea of
51、sample splitting which is now standard in the literature WY16a, JVHW15, JHW16, CL11, Nem03. Algorithm 1 General Framework for Symmetric Property Estimation 1:procedure PROPERTY ESTIMATION(x2n,f,F) 2:Let x2n= (xn 1,xn2), where xn1 and xn 2 represent fi rst and last n samples of x2nrespectively. 3: De
52、fi ne S def = y 2 D | f(xn 1,y) 2 F. 4: Construct profi le ?S, where ?S(j) def = |y 2 S | f(xn 2,y) = j|. 5:Find a(?,S)-approximate pseudo PML distributionp? ?S and empirical distribution p onxn 2. 6:return fS(p? ?S) + fS( p) + correction bias with respect to fS( p). 7:end procedure In the above gen
53、eral framework, the choice ofFdepends on the symmetric property of interest. Later, in the case of entropy and distance to uniformity, we will chooseFto be the region where the empirical estimate fails; it is also the region that is diffi cult to estimate. One of the important properties of the abov
54、e general framework is thatfS(p? ?S)(recallp ? ?S is a(?,S)-approximate pseudo PML distribution andfS(p? ?S)is the property value of distributionp ? ?S on subset of domain elements S D) is close to fS(p) with high probability. Below we state this result formally. Theorem 3.8.For any symmetric proper
55、tyf, letG DandF,F02 2Z+. If for allS02 2G, there exists an estimatorf : ?n S0 ! R, such that for any p and ?S0 p satisfi es, P |fS0(p) ?f(?S0)| ? 2 ? and P(Freq(?S0) 6 F0) ? .(3) Then for any sequence x2n= (xn 1,xn2), P |fS(p) ? fS(p? ?S)| 4 ?n|F 0| ? + ? + P ?S / 2 2G? , where S is a random set S d
56、ef = y 2 D | f(xn 1,y) 2 F and ?S def = ?S(xn 2). Using the theorem above, we already have a good estimate forfS(p)for appropriately chosen frequency subsetsF,F0andG D. Further, we choose these subsetsF,F0andGcarefully so that the empirical estimatef S( p)plus the correction bias with respect tofSis
57、 close tofS(p). Combining these together, we get the following results for entropy and distance to uniformity. Theorem 3.9.If error parameter log N N1? for any constant 0, then for estimating entropy, the estimator 1 for ? = n?log nis sample complexity optimal. For entropy, we already know from WY16
58、a that the empirical distribution is sample complexity optimal if 0. Therefore the interesting regime for entropy estimation is when log N N and our estimator works for almost all such . Theorem 3.10.Let 0and error parameter ? 1 N1?8 ?, then for estimating distance from uniformity, the estimator 1 for ? = n? p n log n N is sample complexity optimal. Note that the estimator in JHW17 also requires that the error parameter ? 1 NC, whereC 0is some constant. 5 4Analysis of General Framework for Symmetric Property Estimation Here we provide proofs
温馨提示
- 1. 本站所有资源如无特殊说明,都需要本地电脑安装OFFICE2007和PDF阅读器。图纸软件为CAD,CAXA,PROE,UG,SolidWorks等.压缩文件请下载最新的WinRAR软件解压。
- 2. 本站的文档不包含任何第三方提供的附件图纸等,如果需要附件,请联系上传者。文件的所有权益归上传用户所有。
- 3. 本站RAR压缩包中若带图纸,网页内容里面会有图纸预览,若没有图纸预览就没有图纸。
- 4. 未经权益所有人同意不得将文件中的内容挪作商业或盈利用途。
- 5. 人人文库网仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对用户上传分享的文档内容本身不做任何修改或编辑,并不能对任何下载内容负责。
- 6. 下载文件中如有侵权或不适当内容,请与我们联系,我们立即纠正。
- 7. 本站不保证下载资源的准确性、安全性和完整性, 同时也不承担用户因使用这些下载资源对自己和他人造成任何形式的伤害或损失。
最新文档
- 2024年《道德与法治》教学反思
- 2024年12月英语四级考试真题和答案及解析
- 某著名企业云整合营销传播方案
- 2025年药品质量认证与国际标准变革趋势研究报告
- 2024安全员理论考试各版本
- 2025年功能性饮料在足球赛事中的市场推广策略研究
- 2023护本医院感染控制理论教学大纲
- 2023年经济学说史知识点姚开建第二版
- 第二章 有理数的计算 单元测试卷(含部分解析)人教版七年级数学上册
- 2025年度进口家电产品销售代理合同模板
- 集控中心培训管理制度
- PPP项目成本管理制度和管控措施
- 2025建筑安全员C证考试(专职安全员)题库及答案
- 事故隐患内部报告奖励制度
- 风险管控考试题及答案
- 八年级历史上册第六单元中华民族的抗日战争第18课从九一八事变到西安事变学案新人教版
- 2025年茶艺师高级技能考核试卷:茶艺设备维护与操作试题
- 人教版数学七年级上册单元测试卷-第一单元-有理数(含答案)
- 【艾青诗选】批注
- 《能源法》重点内容解读与实务应用
- 2025年云南省康旅控股集团有限公司招聘笔试参考题库含答案解析
评论
0/150
提交评论