Maximum Likelihood in Concept and Practice
(cribbed mostly from Gary King's Unifying Political Methodology)

Tuesday, September 28: The Goals and Foundations of Maximum Likelihood

I. Definitions and Notation
II. The Linear Model in a General Form
III. Why Inverse Probability Doesn't Work
IV. Why the Likelihood Model of Inference Works

I. Definitions and Notation

A. Developed by the statistician R.A. Fisher in the 1920s, borrowed first by economists, and finally imported into political science, maximum likelihood provides a fundamental rationale for using appropriate estimators and gives us lots of flexibility to match our statistical models to the process that we think generated our data. There are a number of overarching approaches designed to unify statistical methodologies, but this is by far the most familiar to political scientists. It requires you to make explicit choices about how you think your dependent variable is distributed (the model's stochastic component) and what the relationship is between your independent and dependent variables (the model's systematic component). Then, based on some basic rules of probability, it teaches you how to write out a likelihood function and find the set of parameters most likely to have generated the observed data, given your assumed model. Before we learn the step-by-step process of getting maximum likelihood estimates, we need to learn a bit of notation.

B. Let $Y_i$ be a "random variable." It is random in that there is stochastic variation in it across many experiments for a single observation, and a variable in that it varies across observations in a single experiment. Let $y_i$ be one draw from the random variable. Let $x_i$ be one draw (consisting of one or more explanatory factors) from the social system $X$.

C. Hypothesize some model $M$ about how the social system produces the random variable. We can partition this model into $M^*$, the part of the model that we will assume, and $\theta$, the part of the model composed of parameters that we will estimate. A fully "restrictive" model has all of its assumptions specified (it is all $M^*$ and no $\theta$); it is the most parsimonious model and omits all of the variables. An "unrestrictive" model estimates everything, is all $\theta$ and no $M^*$, and is more interesting but demands more from the data. In hypothesis testing, we will often compare a fairly unrestrictive model to a slightly more restrictive model.

II. The Linear Model in a General Form

A. You should be familiar with this way of writing out an OLS regression:

    $Y_i = x_i \beta + \epsilon_i$

where $x_i \beta$ is the systematic component and $\epsilon_i$ is the stochastic or random component. King refers to it as the linear normal regression model, breaking it down into its linear systematic component and its normally distributed stochastic component. The stochastic component's distribution is given by $\epsilon_i \sim f_N(e_i \mid 0, \sigma^2)$. This should be read as "the errors are distributed normally with a mean of zero and a variance of $\sigma^2$," which is elsewhere written as $\epsilon_i \sim N(0, \sigma^2)$.

B. A more general way of writing out an identical linear normal model is:

    $Y_i \sim f_N(y_i \mid \mu_i, \sigma^2)$  where  $\mu_i = x_i \beta$

Note that this expression models the randomness in $Y_i$ directly, rather than through $\epsilon_i$. Econometrics textbook writers like Goldberger show that this assumption of normality in the distribution of $Y_i$ around its expected value is equivalent to assuming that the errors are normally distributed around zero.
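To see that equivalence concretely, here is a minimal numerical sketch (my illustration, not from King's text; the variable names such as neg_log_lik and beta_true are invented for the example). It maximizes the likelihood of $Y_i \sim f_N(y_i \mid x_i\beta, \sigma^2)$ directly and checks that the resulting $\hat{\beta}$ matches OLS on $Y_i = x_i\beta + \epsilon_i$:

```python
# Sketch only: fitting the linear normal model two ways and comparing.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(0)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n)])   # intercept + one covariate
beta_true = np.array([1.0, 2.0])
sigma_true = 1.5
y = X @ beta_true + rng.normal(scale=sigma_true, size=n)  # Y_i = x_i*beta + eps_i

def neg_log_lik(params):
    """Negative log-likelihood with Y_i modeled directly:
    Y_i ~ N(mu_i, sigma^2), mu_i = x_i * beta."""
    beta, log_sigma = params[:2], params[2]   # optimize log(sigma) so sigma > 0
    mu = X @ beta
    return -np.sum(norm.logpdf(y, loc=mu, scale=np.exp(log_sigma)))

mle = minimize(neg_log_lik, x0=np.zeros(3))
beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]  # OLS for comparison

print("MLE beta:", mle.x[:2])   # numerically ~= OLS beta
print("OLS beta:", beta_ols)
```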
King uses this style of presentation because the maximum likelihood process requires you to make a substantive assumption about how the data are generated (and thus distributed), and thinking about the dependent variable itself is usually more natural than thinking about its errors.

    i. The systematic component in the expression above is a statement of how $\mu_i$ varies over observations as a function of a vector of explanatory variables. It says that $x_i$ and $Y_i$ are "parametrically related" through $E(Y_i) = \mu_i = x_i \beta$. It can be written in a general functional form as $\mu = g(X, \beta)$.

    ii. The stochastic component should not be viewed merely as an annoyance, but as an expression that contains substantive information. It can be written generally as $Y_i \sim f_i(y_i \mid \theta_i, \alpha)$, where $\theta$ is the vector of parameters of interest, like $\mu_i$ in the linear case, and $\alpha$ is the vector of ancillary parameters, like $\sigma^2$ in the linear case.

III. Why Inverse Probability Doesn't Work

A. Wouldn't it be great if we could determine the absolute probability of some parameter vector $\theta$, given our data $y$ and model $M^*$? If we could do that, we could conduct a poll, assume that the variables that we didn't measure are irrelevant, and then make statements like, "There is a 0.8237 probability that the effect of getting a PhD on your expected annual income is -$32,689." This would be an inverse probability statement, and for a while it was the holy grail of statistics. It can be formalized as $\Pr(\theta \mid y, M^*)$, though because $M^*$ is assumed, it is usually suppressed and an inverse probability is written as $\Pr(\theta \mid y)$.

B. Using some basic rules of probability, we can see what we would need to calculate in order to calculate an inverse probability:

    $\Pr(\theta \mid y) = \Pr(\theta, y) / \Pr(y)$    by the rule that $\Pr(a \mid b) = \Pr(a, b) / \Pr(b)$
    $\Pr(\theta \mid y) = \Pr(\theta)\Pr(y \mid \theta) / \Pr(y)$    by substituting $\Pr(\theta, y) = \Pr(y, \theta) = \Pr(y \mid \theta)\Pr(\theta)$

This is Bayes' Theorem, and statisticians thought it would give them a way to calculate an inverse probability. It is possible to calculate $\Pr(y \mid \theta)$, which is the probability of observing your data given a hypothesized parameter vector, and is referred to as the "traditional probability." We can put $\Pr(y)$ in terms of $\Pr(y \mid \theta)$ and $\Pr(\theta)$, but this leaves us with the tricky $\Pr(\theta)$, one's prior belief about $\theta$. There is a raging debate in statistics between the "frequentists," who define probability as the relative frequency of an event in hypothetical repetitions of an experiment, and those who say probability is only the subjective judgment of individuals. But no matter which camp you come from, you cannot use $\Pr(\theta)$ to assign an absolute value to the inverse probability.

IV. Why the Likelihood Model of Inference Works

A. Without a method for calculating absolute inverse probabilities, we must be content with relative measures of uncertainty. This is what the likelihood model of inference gives us. Now we will let $\tilde{\theta}$ be hypothetical values of the single, unobserved true value $\theta$, and let $\hat{\theta}$ be the point estimator for it. We can write out the "Likelihood Axiom":

    $L(\tilde{\theta} \mid y) = k(y)\Pr(y \mid \tilde{\theta}) \propto \Pr(y \mid \tilde{\theta})$

In this axiom, $k(y)$ can be treated as a constant, because it is an unknown function of the data, which makes the likelihood of the true parameter given the data proportional to the traditional probability $\Pr(y \mid \tilde{\theta})$.
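To make the "relative measures of uncertainty" point concrete, here is a small sketch (an illustration added here, not part of the original notes) comparing two hypothetical values $\tilde{\theta}$ of a Bernoulli parameter. Because the unknown constant $k(y)$ enters every likelihood identically, it cancels in comparisons: only differences of log-likelihoods (ratios of likelihoods) carry information.

```python
# Sketch only: likelihoods are meaningful relative to each other, not absolutely.
from scipy.stats import binom

y = 62    # successes observed (hypothetical data for the example)
n = 100   # trials

def log_lik(theta):
    """log L(theta | y), up to the unknown additive constant log k(y)."""
    return binom.logpmf(y, n, theta)

for theta in (0.5, 0.62):
    print(f"theta~ = {theta}: log-likelihood {log_lik(theta):.3f}")

# The log-likelihood ratio is invariant to k(y): rescaling by any k(y)
# would shift both log-likelihoods by the same amount.
print("log LR:", log_lik(0.62) - log_lik(0.5))
```

The absolute value of $L(\tilde{\theta} \mid y)$ on its own tells us nothing; only the comparison across values of $\tilde{\theta}$ for the same data does, which is exactly the sense in which likelihood is a relative measure.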
