




Lecture 5: Incomplete data
Ziad Taib, Biostatistics, AZ
May 3, 2011

Outline of the problem
Missing values in longitudinal trials are a big issue.
The first aim should be to reduce the proportion.
Ethics dictate that it can't be avoided.
There is no magic method to fix it.
The magnitude of the problem varies across areas:
8-week depression trial: 25-50% may drop out by the final visit.
12-week asthma trial: maybe only 5-10%.

Outline of the lecture
Part I: Missing data
Part II: Multiple imputation
Example: The analgesic trial

Part I: Missing data
In real datasets, e.g. surveys and clinical trials, it is quite common to have observations with missing values for one or more input features. The first issue in dealing with the problem is determining whether the missing data mechanism has distorted the observed data. Little and Rubin (1987) and Rubin (1987) distinguish three basic missing data mechanisms. Data are said to be missing at random (MAR) if, given the observed data, the mechanism resulting in their omission is independent of their (unobserved) value. If the omission is also independent of the observed values, then the missingness process is said to be missing completely at random (MCAR). In any other case the process is missing not at random (MNAR), i.e. the missingness process depends on the unobserved values.
http://www.emea.europa.eu/pdfs/human/ewp/177699EN.pdf

1. Introduction to missing data

What is missing data?
The missingness hides a real value that is useful for analysis purposes.
Survey questions:
What is your total annual income for FY 2008?
Who are you voting for in the 2009 election for the European Parliament?

What is missing data?
Clinical trials:
[Figure: time axis from Start to Finish, with a subject censored at a point in time]

Missingness
It matters why data are missing. Suppose you are modelling weight (Y) as a function of sex (X). Some respondents wouldn't disclose their weight, so you are missing some values for Y. There are three possible mechanisms for the nondisclosure.
There may be no particular reason why some respondents told you their weights and others didn't. That is, the probability that Y is missing has no relationship to X or Y. In this case our data are missing completely at random.
One sex may be less likely to disclose its weight. That is, the probability that Y is missing depends only on the value of X.
Such data are missing at random.
Heavy (or light) people may be less likely to disclose their weight. That is, the probability that Y is missing depends on the unobserved value of Y itself. Such data are not missing at random.

Missing data patterns and mechanisms
Pattern: Which values are missing?
Mechanism: Is missingness related to the response?
$Y = (Y_{ij})$: data matrix, with complete data
$R = (R_{ij})$: missing data indicator matrix, where $R_{ij} = 1$ if $Y_{ij}$ is missing and $R_{ij} = 0$ if $Y_{ij}$ is observed
$Y^{obs}$: observed part of Y
$Y^{mis}$: missing part of Y

Missing data patterns and mechanisms
Pattern concerns the distribution of R.
Mechanism concerns the distribution of R given Y.
Rubin (Biometrika, 1976) distinguishes between:
Missing Completely at Random (MCAR): $P(R \mid Y) = P(R)$ for all $Y$

Missing At Random (MAR)
What are the most general conditions under which a valid analysis can be done using only the observed data, and no information about the missing value mechanism? The answer is: when, given the observed data, the missingness mechanism does not depend on the unobserved data. Mathematically, $P(R \mid Y) = P(R \mid Y^{obs})$ for all $Y^{mis}$. This is termed Missing At Random, and is equivalent to saying that two units that share observed values have the same statistical behaviour on the other observations, whether observed or not.

Example
As units 1 and 2 have the same values where both are observed, given these observed values, under MAR, variables 3, 5 and 6 from unit 2 have the same distribution (NB not the same value) as variables 3, 5 and 6 from unit 1.
Note that under MAR the probability of a value being missing will generally depend on observed values, so it does not correspond to the intuitive notion of "random". The important idea is that the missing value mechanism can be expressed solely in terms of observations that are observed. Unfortunately, this can rarely be definitively determined from the data at hand.

If data are MCAR or MAR, you can ignore the missing data mechanism and use multiple imputation and maximum likelihood.
If data are NMAR (not missing at random, also written MNAR), you can't ignore the missing data mechanism; two approaches to NMAR data are selection models and pattern mixture.

Suppose Y is weight in pounds. If someone has a heavy weight, they may be less inclined to report it. So the value of Y affects whether Y is missing; the data are NMAR. Two possible approaches for such data are selection models and pattern mixture.
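The three mechanisms in the weight-by-sex example can be illustrated with a small simulation. This is only a sketch under invented assumptions: the function name `simulate_mechanisms`, the weight distribution (mean 150 or 180 pounds depending on sex, standard deviation 20) and the missingness probabilities are all made up for illustration and do not come from the lecture. It compares complete-case means of Y with the means in the full data.

```python
import numpy as np


def simulate_mechanisms(n=20000, seed=0):
    """Simulate weight (Y) by sex (X) and delete Y under MCAR, MAR and MNAR."""
    rng = np.random.default_rng(seed)
    x = rng.integers(0, 2, size=n)            # sex, coded 0/1
    y = rng.normal(150.0 + 30.0 * x, 20.0)    # weight in pounds (invented parameters)

    # Probability that Y is missing under each mechanism (all values hypothetical).
    p_missing = {
        "MCAR": np.full(n, 0.3),                            # unrelated to X and Y
        "MAR": np.where(x == 1, 0.5, 0.1),                  # depends only on observed X
        "MNAR": 1.0 / (1.0 + np.exp(-(y - 165.0) / 15.0)),  # depends on Y itself
    }

    def means(mask):
        # Mean of Y overall and within each sex, restricted to the rows in mask.
        return {
            "all": float(y[mask].mean()),
            "x=0": float(y[mask & (x == 0)].mean()),
            "x=1": float(y[mask & (x == 1)].mean()),
        }

    results = {"complete data": means(np.ones(n, dtype=bool))}
    for name, p in p_missing.items():
        observed = rng.random(n) >= p   # True where Y is reported
        results[name] = means(observed)
    return results


if __name__ == "__main__":
    for name, m in simulate_mechanisms().items():
        print(f"{name:14s}", {k: round(v, 1) for k, v in m.items()})
```

In this setup the complete-case mean should stay close to the full-data mean under MCAR. Under MAR the overall mean shifts but the means within each sex do not, which is why conditioning on X is enough. Under MNAR even the within-sex means are pulled down, because heavier respondents are less likely to report.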
Selection models
In a selection model, you simultaneously model Y and the probability that Y is missing. Unfortunately, a number of practical difficulties are often encountered in estimating selection models.

Pattern mixture (Rubin 1987)
When data are NMAR, an alternative to selection models is multiple imputation with pattern mixture. In this approach, you perform multiple imputations under a variety of assumptions about the missing data mechanism. In ordinary multiple imputation, you assume that those people who report their weights are similar to those who don't. In a pattern-mixture model, you may assume that people who don't report their weights are an average of 20 pounds heavier. This is of course an arbitrary assumption; the idea of pattern mixture is to try out a variety of plausible assumptions and see how much they affect your results. Pattern mixture is a more natural, flexible, and interpretable approach.

Simple analysis strategies
1. Complete Case (CC) analysis
When some variables are not observed for some of the units, one can omit these units from the analysis. These so-called "complete cases" are then analyzed as they are.
[Figure: complete cases are kept, the remaining units are discarded]
Advantages: easy; does not invent data.
Disadvantages: inefficient; discarding data is bad; CC are often biased samples.

Analysis strategies
2. Analyze as incomplete (summary measures, GEE, ...)
Advantages: does not invent data.
Disadvantages: restricted in what you can infer; maximum likelihood methods may be computationally intensive or not feasible for certain types of models.

Analysis strategies
3. Analysis after single imputation
[Figure: complete cases plus imputation]
Advantages: rectangular file; good for multiple users.
Disadvantages: naive imputations are not good; invents data, so inference is distorted by treating imputations as the truth.

Simple methods of analysis of incomplete data: CC, LOCF

Various strategies

Notation: DROPOUT

Ignorability
In a likelihood setting the term ignorable is often used to refer to a MAR mechanism. It is the mechanism which is ignorable, not the missing data.

Direct likelihood maximisation

Example 1: Growth data

Example: The depression trial
Patients are evaluated both pretreatment and posttreatment with the 17-item Hamilton Rating Scale for Depression (Ham-D-17).

Part II: Multiple imputation
[Figure: dataset with missing values, completed sets, result]

General principles

Informal justification

The algorithm

Pooling information

Hypothesis testing

MI in practice
A simulation-based approach to missing data:
1. Generate M > 1 plausible versions of the completed data.
2. Analyze each of the M datasets by standard complete-data methods.
3. Combine the results across the M datasets.
M = 3-5 is usually OK.
[Figure: complete cases plus imputation for the M-th dataset]

MI in practice
Step 1: Generate M > 1 plausible versions of the completed data via software, i.e. obtain M different datasets.
An assumption we make: the data are MCAR or MAR, i.e. the missing data mechanism is ignorable.
Should use as much information as is available in order to achieve the best imputation.
If the percentage of missing data is high, we need to increase M.

How many datasets to create?
The efficiency of an estimator based on M imputations is $(1 + \gamma/M)^{-1}$, where $\gamma$ is the fraction of missing information.

Efficiency of multiple imputation (%)
        gamma
M       0.1   0.3   0.5   0.7   0.9
3        97    91    86    81    77
5        98    94    91    88    85
10       99    97    95    93    92
20      100    99    98    97    96

MI in practice
Step 2: Analyze each of the M datasets by standard complete-data methods.
Let $b$ be the parameter of interest.
$\hat{b}_m$ is the estimate of $b$ from the complete-data analysis of the m-th dataset ($m = 1, \dots, M$).
$U_m$ is the variance of $\hat{b}_m$ from the analysis of the m-th dataset.

MI in practice
Step 3: Combine the results across the M datasets.
$\bar{b} = \frac{1}{M} \sum_{m=1}^{M} \hat{b}_m$ is the combined inference for $b$.
Variance for $\bar{b}$ is $T = \bar{U} + (1 + \frac{1}{M}) B$, with
within: $\bar{U} = \frac{1}{M} \sum_{m=1}^{M} U_m$
between: $B = \frac{1}{M-1} \sum_{m=1}^{M} (\hat{b}_m - \bar{b})^2$

Software
1. Joe Schafer's software from his web site: http://www.stat.psu.edu/%7Ejls/misoftwa.html
Schafer has written publicly available software primarily for S-plus. There is a stand-alone Windows package for data that is multivariate normal. This web site contains much useful information regarding multiple imputation.

Software
2. SAS software (experimental)
It is part of SAS/STAT version 8.02. A SAS Institute paper on multiple imputation gives an example and SAS code.

Software
3. SOLAS version 3.0: http://www.statsol.ie/index.php?pageID=5
Windows-based software that performs different types of imputation:
Hot-deck imputation
Predictive OLS/discriminant regression
Nonparametric based on propensity scores
Last value carried forward
Will also combine parameter results across the M analyses.

MI Analysis of the Orthodontic Growth Data

Properties of methods
MCAR: drop-out independent of response
  CC is valid, though it ignores information
  LOCF is valid if there are no trends with time
MAR: drop-out depends only on observations
  CC, LOCF, GEE invalid
  MI, MNLM, weighted GEE valid
MNAR: drop-out depends also on unobserved
  CC, LOCF, GEE, MI, MNLM invalid
  SM, PMM valid if (uncheckable) assumptions true

References
Allison, P. (2002). Missing data. Thousand Oaks, CA: Sage (greenback).
Horton, N.J., Lipsitz, S.R. (2001). Multiple imputation in practice: Comparison of software packages for regression models with missing variables. The American Statistician 55(3), 244-254.
Little, R.J.A. (1992). Regression with missing X's: A review. Journal of the American Statistical Association 87(420), 1227-1237.
Little, R.J.A. and Rubin, D.B. (2002). Statistical Analysis with Missing Data, 2nd edition, April 2002.
Applications of Modern Missing Data Methods, by Roderick
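Returning to the "MI in practice" slides, the efficiency figure and the pooling step (Step 3) can be sketched in a few lines. This is an illustrative sketch, not the lecture's own code: it assumes the standard forms usually attributed to Rubin (1987), namely efficiency $(1 + \gamma/M)^{-1}$ and total variance equal to the within-imputation variance plus $(1 + 1/M)$ times the between-imputation variance. The function names `mi_efficiency` and `pool_rubin` and the numbers in the usage example are hypothetical.

```python
def mi_efficiency(gamma, m):
    """Relative efficiency (1 + gamma/M)^-1 of an estimate based on M imputations.

    gamma is the fraction of missing information (between 0 and 1).
    """
    return 1.0 / (1.0 + gamma / m)


def pool_rubin(estimates, variances):
    """Combine M complete-data estimates and their variances.

    Returns (pooled estimate, within variance, between variance, total variance).
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need M >= 2 estimates and one variance per estimate")
    b_bar = sum(estimates) / m                                   # pooled estimate
    within = sum(variances) / m                                  # average within-imputation variance
    between = sum((b - b_bar) ** 2 for b in estimates) / (m - 1)  # variance across imputations
    total = within + (1.0 + 1.0 / m) * between
    return b_bar, within, between, total


if __name__ == "__main__":
    # Hypothetical estimates and variances from M = 3 imputed datasets.
    print(pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5]))
    # Efficiency in percent for gamma = 0.5 and M = 3.
    print(round(100 * mi_efficiency(0.5, 3)))
```

The `(1 + 1/M)` factor inflates the between-imputation variance to account for using a finite number of imputations, which is also why the efficiency approaches 100% as M grows.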