The Stata Journal

Editor
H. Joseph Newton
Department of Statistics
Texas A&M University
College Station, Texas 77843
979-845-3142; FAX 979-845-3144

Editor
Nicholas J. Cox
Department of Geography
Durham University
South Road
Durham City DH1 3LE UK

Associate Editors
Christopher F. Baum, Boston College
Rino Bellocco, Karolinska Institutet, Sweden and Univ. degli Studi di Milano-Bicocca, Italy
A. Colin Cameron, University of California–Davis
David Clayton, Cambridge Inst. for Medical Research
Mario A. Cleves, Univ. of Arkansas for Medical Sciences
William D. Dupont, Vanderbilt University
Charles Franklin, University of Wisconsin–Madison
Joanne M. Garrett, University of North Carolina
Allan Gregory, Queen's University
James Hardin, University of South Carolina
Ben Jann, ETH Zurich, Switzerland
Stephen Jenkins, University of Essex
Ulrich Kohler, WZB, Berlin
Jens Lauritsen, Odense University Hospital
Stanley Lemeshow, Ohio State University
J. Scott Long, Indiana University
Thomas Lumley, University of Washington–Seattle
Roger Newson, Imperial College, London
Marcello Pagano, Harvard School of Public Health
Sophia Rabe-Hesketh, University of California–Berkeley
J. Patrick Royston, MRC Clinical Trials Unit, London
Philip Ryan, University of Adelaide
Mark E. Schaffer, Heriot-Watt University, Edinburgh
Jeroen Weesie, Utrecht University
Nicholas J. G. Winter, University of Virginia
Jeffrey Wooldridge, Michigan State University

Stata Press Production Manager: Lisa Gilmore
Stata Press Copy Editor: Gabe Waggoner

Copyright Statement: The Stata Journal and the contents of the supporting files (programs, datasets, and help files) are copyright © by StataCorp LP. The contents of the supporting files (programs, datasets, and help files) may be copied or reproduced by any means whatsoever, in whole or in part, as long as any copy or reproduction includes attribution to both (1) the author and (2) the Stata Journal.

The articles appearing in the Stata Journal may be copied or reproduced as printed copies, in whole or in part, as long as any copy or reproduction includes attribution to both (1) the author and (2) the Stata Journal.
Written permission must be obtained from StataCorp if you wish to make electronic copies of the insertions. This precludes placing electronic copies of the Stata Journal, in whole or in part, on publicly accessible web sites, fileservers, or other locations where the copy may be accessed by anyone other than the subscriber.

Users of any of the software, ideas, data, or other materials published in the Stata Journal or the supporting files understand that such use is made without warranty of any kind, by either the Stata Journal, the author, or StataCorp. In particular, there is no warranty of fitness of purpose or merchantability, nor for special, incidental, or consequential damages such as loss of profits. The purpose of the Stata Journal is to promote free communication among Stata users.

The Stata Journal, electronic version (ISSN 1536-8734) is a publication of Stata Press. Stata and Mata are registered trademarks of StataCorp LP.

The Stata Journal publishes reviewed papers together with shorter notes or comments, regular columns, book reviews, and other material of interest to Stata users. Examples of the types of papers include 1) expository papers that link the use of Stata commands or programs to associated principles, such as those that will serve as tutorials for users first encountering a new field of statistics or a major new technique; 2) papers that go "beyond the Stata manual" in explaining key features or uses of Stata that are of interest to intermediate or advanced users of Stata; 3) papers that discuss new commands or Stata programs of interest either to a wide spectrum of users (e.g., in data management or graphics) or to some large segment of Stata users (e.g., in survey statistics, survival analysis, panel analysis, or limited dependent variable modeling); 4) papers analyzing the statistical properties of new or existing estimators and tests in Stata; 5) papers that could be of interest or usefulness to researchers, especially in fields that are of practical importance but are not often included in texts or other journals, such as the use of Stata in managing
datasets, especially large datasets, with advice from hard-won experience; and 6) papers of interest to those teaching, including Stata with topics such as extended examples of techniques and interpretation of results, simulations of statistical concepts, and overviews of subject areas.

For more information on the Stata Journal, including information for authors, see the journal's web page.

The Stata Journal is indexed and abstracted in the following: Science Citation Index Expanded (also known as SciSearch®) and CompuMath Citation Index®.

Subscriptions are available from StataCorp, 4905 Lakeway Drive, College Station, Texas 77845, telephone 979-696-4600 or 800-STATA-PC, fax 979-696-4601, or online at /bookstore/sj.html

Subscription rates:

Subscriptions mailed to US and Canadian addresses:
  3-year subscription (includes printed and electronic copy)                     $165
  2-year subscription (includes printed and electronic copy)                     $115
  1-year subscription (includes printed and electronic copy)                     $ 59
  1-year student subscription (includes printed and electronic copy)             $ 35
  1-year university library subscription (includes printed and electronic copy)  $ 75
  1-year institutional subscription (includes printed and electronic copy)       $175

Subscriptions mailed to other countries:
  3-year subscription (includes printed and electronic copy)                     $240
  2-year subscription (includes printed and electronic copy)                     $165
  1-year subscription (includes printed and electronic copy)                     $ 84
  3-year subscription (electronic only)                                          $160
  1-year student subscription (includes printed and electronic copy)             $ 59
  1-year university library subscription (includes printed and electronic copy)  $ 95
  1-year institutional subscription (includes printed and electronic copy)       $195

Back issues of the Stata Journal may be ordered online at http://bookstore/sj.html

The Stata Journal is published quarterly by the Stata Press, College Station, Texas, USA.

Address changes should be sent to the Stata Journal, StataCorp, 4905 Lakeway Drive, College Station TX 77845, USA, or by email.

Volume 7, Number 1, 2007
The Stata Journal

Articles and Columns
  A survey on survey statistics: What is done and can be done in Stata. F. Kreuter and R. Valliant
  Rasch analysis: Estimation and tests with raschtest. J.-B. Hardouin
  Multivariable modeling with cubic regression splines: A principled approach. P. Royston and W. Sauerbrei
  Sensitivity analysis for average treatment effects. S. O. Becker and M. Caliendo
  Stata and the WeeW information system. P. Vittorini, S. Necozione, and F. di Orio
  File filtering in Stata: Handling complex data formats and navigating log files efficiently. J. Eng
  Mata Matters: Subscripting. W. Gould
  Speaking Stata: Making it count. N. J. Cox
  Review of An Introduction to Modern Econometrics Using Stata by Baum. A. Nichols

Notes and Comments
  Stata tip 40: Taking care of business. C. F. Baum
  Stata tip 41: Monitoring loop iterations. D. A. Harrison
  Stata tip 42: The overlay problem: Offset for clarity. J. Cui
  Stata tip 43: Remainders, selections, sequences, extractions: Uses of the modulus. N. J. Cox

The Stata Journal (2007) 7, Number 1, pp. 1–21

A survey on survey statistics: What is done and can be done in Stata

Frauke Kreuter
Joint Program in Survey Methodology
University of Maryland, College Park

Richard Valliant
Joint Program in Survey Methodology
University of Michigan, Ann Arbor

Abstract. This article will survey issues in analyzing complex survey data and describe some of the capabilities of Stata for such analyses. We will briefly review key elements of survey design and explain the effects of different design features on bias and variance. We compare different methods of variance estimation for stratified and clustered samples and discuss the handling of survey weights. We will also give examples for the practical importance of Stata's survey capabilities.

Keywords: st0118, cluster sampling, complex design, nonresponse, stratified sampling, variance estimation, weights, DEFECT, NHANES, NHIS, PISA

1 Issues in analyzing survey data

Survey data are used in most empirical work in behavioral and social sciences, economics, and public health. Throughout the last few years, there has been an increased awareness that researchers need to consider the sampling design when analyzing survey data. The increasing awareness led several of the major statistical software packages to expand their features for analyzing complex survey data. Survey statisticians recognize Stata as one of the most powerful packages. However, applied substantive researchers still do not always account for survey design information as part of their standard practice. This article will therefore provide a rough guideline through the various Stata methods that are appropriate for analyzing survey data and should help to answer the following questions:

• What are the survey design features that I need to take into account?
• Why do I need to take these survey design features into account?
• How do such survey features affect bias and variance?
• How do I account for complex designs in practice?

This article's goal is not to explain all possible survey designs but rather to fill
some knowledge gaps about issues that need to be considered in day-to-day data analysis. We will start with a brief review of the common elements of complex survey designs in section 2 and discuss the consequences of excluding these elements in section 3. Readers who are already familiar with sampling designs may skim these sections and continue with section 4, where we discuss two major variance estimation methods for complex surveys: Taylor linearization and replication. In section 5, we demonstrate the use of Stata procedures in analyzing public-use data for two large-scale surveys. The article concludes with a brief summary.

© 2007 StataCorp LP    st0118

2 Features of survey design

Estimates produced by standard procedures in statistical packages usually ignore survey design features and assume that observed data are realized values of independent random variables or that the data were collected from a simple random sample (SRS). In contrast, sample surveys involve three features that have potentially significant consequences for estimation: weights, stratification, and clustering. We will briefly introduce these features before we discuss their effects and related problems.

Another feature of many surveys is that, in practice, sampling is typically done without replacement to avoid multiple selections of the same sampling unit. The resulting difference in variance estimates for with- and without-replacement samples is negligible if the sample is a small proportion of the population. Because this proportion is small in our examples, as it is for many survey data, we will not discuss this issue further.

Most surveys begin with a probability sample from a population frame. When the population is relatively small, the frame may be a list of all units in the population.
For example, if a survey is conducted of all elementary schools in a region, a list may be available from a government education agency. In countries with population registries, those might be used as a sampling frame for household surveys. Sometimes the frame may not fully cover the desired population, but the weighting step, described below, tries to correct for this.

Weights: Survey weights are designed to expand the sample to the level of the population that the sample represents. In a probability sample, units are selected using known probabilities. In some surveys, all units have the same selection probability, but more typically there will be some variation in the probabilities. In a survey of persons, separate analyses of groups defined by age, gender, and race–ethnicity may be planned. Consequently, those groups may be sampled at different rates to obtain adequate sample sizes from each. The selection probabilities account for unequal sampling rates used for different types of units. The inverse of the selection probability of a sample unit is known as its base weight. For example, if males were selected with probability 0.01 and females with probability 0.05, the base weights for males and females would be 100 and 20, respectively.

Many survey datasets are delivered with what are called final weights that not only take sampling probabilities into account but are also designed to adjust for nonresponse, coverage problems, and other uses of auxiliary data outside the survey.

Stratification: With stratification, population elements are divided into strata: mutually exclusive and exhaustive subgroups. That is, some information for every element needs to be on the frame of population elements to divide them into strata. For example, telephone numbers for surveys of U.S. households are often divided into geographical strata. To do so, the researcher must be able to identify the geographic region of each telephone number in the sampling frame. The left panel in figure 1 shows a population that is divided into five strata (indicated by solid lines).
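To make the ideas of strata and base weights concrete, here is a minimal Python sketch; it is not taken from the article and is not Stata code. It assumes a made-up frame of 100 units split evenly into five strata and draws four units per stratum by simple random sampling without replacement, loosely mirroring the layout of figure 1. The function names `base_weight` and `stratified_sample`, the stratum sizes, and the seed are illustrative assumptions.

```python
import random


def base_weight(selection_probability):
    """Return the base weight: the inverse of the selection probability."""
    if not 0 < selection_probability <= 1:
        raise ValueError("selection probability must be in (0, 1]")
    return 1.0 / selection_probability


def stratified_sample(frame, n_per_stratum, seed=0):
    """Draw an SRS without replacement of n_per_stratum units in each stratum.

    frame maps a stratum label to the list of unit ids in that stratum
    (a hypothetical sampling frame). Returns (stratum, unit, weight) tuples.
    """
    rng = random.Random(seed)
    sample = []
    for stratum, units in frame.items():
        chosen = rng.sample(units, n_per_stratum)
        # Every unit in this stratum had selection probability n_h / N_h.
        weight = base_weight(n_per_stratum / len(units))
        sample.extend((stratum, unit, weight) for unit in chosen)
    return sample


# Hypothetical frame: five strata of 20 units each (an assumption).
frame = {h: [f"{h}-{i}" for i in range(20)] for h in "ABCDE"}
sample = stratified_sample(frame, n_per_stratum=4)

# Example from the text: probabilities 0.01 and 0.05 give weights 100 and 20.
weights_from_text = [round(base_weight(0.01), 6), round(base_weight(0.05), 6)]
```

In this sketch every sampled unit carries a base weight of 5, so the 20 weights add up to 100, the number of units on the hypothetical frame. That is the sense in which weights expand the sample to the level of the population.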
Sampling then takes place within each of these strata. The x's in the left panel of figure 1 denote four selected sample units in each of these five strata. One reason to stratify is the desire to make comparisons among the subgroups that form the strata, and stratification ensures that units from each group are selected into the sample. Political or geographical regions are often used as strata for this reason.

Figure 1: Stratified and clustered samples (left panel: stratified sample; right panel: cluster sample within strata)

Clustering: Samples are called clustered if one specifies groups of population units, and a sample of such groups (primary sampling units, or PSUs) is first taken instead of the individual units. The dotted lines in the right panel of figure 1 indicate such clusters within the strata. Here two PSUs are selected in each of the five strata. In this simple example of cluster sampling, all elements within each cluster
are selected into the sample. Researchers often decide to use a clustered sample instead of a simpler design for organizational or financial reasons. The absence of a general population registry in many countries makes in-person surveys of an SRS virtually impossible. Sampling in several stages, one of them at the level of small geographical clusters, facilitates selecting respondents without the aid of registry data. This approach is used in many household surveys when data are collected by in-person interviews and a list of all households is not available. Here geographic areas are sampled until, at the last stage, households can be listed and sampled. More complex designs can have further sampling within the clusters. Also, a sample in which geographical clusters are sampled first is cost efficient for face-to-face surveys, since interviewing respondents who live close together reduces travel costs.

3 Accounting for survey design: Effects on bias and variance

Two challenges arise when dealing with survey data: (1) obtaining correct point estimates (avoiding bias) and (2) computing correct variances and standard errors (SEs). The three elements described above (weights, stratification, and clustering) have different effects on bias and variance.

3.1 Weights

If the sample is selected with unequal selection probabilities, disregarding sampling weights can lead to biased estimates when estimating population totals, means, or other more complicated quantities. If weights are used in models, the resulting estimates are of models that would be fitted if you had the entire population in the sample. But even if a sample is selected with equal selection probabilities, analysts might be confronted with weights in the resulting dataset. Those weights are usually designed to adjust for nonresponse or coverage error (or both). Typically users will not create those weights themselves. Datasets are usually delivered with weight variables designed by the data producer.

Most complex samples suffer some degree of nonresponse. Nonresponse can occur for several reasons. For example, in a household survey, contact may never be made with some households because no one can be found at home during the survey period. Others that are contacted may refuse to participate. Only if the respondents can be safely treated as a random subsample of the full sample will estimates of quantities like means and proportions be unbiased. Nonresponse can lead to bias if the