CIQLE Workshop: Introduction to longitudinal data analysis with Stata
panel models and event history analysis
Silke Aisenbrey, Yale University

Goals for the workshop:
- Intro to Stata
- Modeling change over time:
  Panel Regression Models (fixed, between and random)
- Modeling whether and/or when events occur:
  Event History Analysis (data management for event history data, Kaplan-Meier, Cox, piecewise constant)

open Stata:
- COMMAND
- RESULTS: results and syntax
- REVIEW: of syntax, commands or menu
- VARIABLES: of open file

open data, with menu (Stata data: eventex.dta)
- to see the real data
- to make changes directly in the data: erase variables or cases, make single changes in cases

basic descriptive commands
relational and logical operators in Stata:
== is equal to
~= is not equal (also !=)
> greater than
>= greater than or equal
< less than
<= less than or equal
by var1:
sort

exercise:
e.g.:
. tab abitur sex, col
. tab abitur sex if cohort==1930, col
. sort cohort
. by cohort: tab abitur sex

basic commands for data management
. help command
. gen var1 = var2
. recode var1 (0=.) (1/8=2) (9=3)
. rename var1 var100

* use the following variables:
cohort (indicator of cohort membership)
sex (1=male, 2=female)
agemaryc (age first marriage)

exercise:
e.g.:
. sum agemaryc
recode age married in groups:
- generate a new variable
- recode new variable into groups
- recode if marcens==0

possible break

Intro to panel regression with Stata:
- panel data
- fixed effects
- between effects
- random effects
- fixed or random?

panel data (panelex1.dta)
Panel data, also called cross-sectional time series data, are data where multiple cases (people, firms, countries etc.) were observed at two or more time periods.
Cross-sectional data: only information about variance between subjects
Panel data: two kinds of information, between and within subjects; two sources of variance

Janet: Basics of panel regression models

cross sectional vs. panel analyses
open panelex1.dta
ignore the fact that we have repeated measures:
. regress income childrn
conclusion: more children -> higher income

Fixed effects model
Answers the question: What is the effect of x when x changes within persons over time? E.g., Person A has two children at the first point in time and three children at the second; what effect does this change have on income?
Information used: fixed effects estimates use the time-series information in the data
Variance analyzed: within
Problems: only time-variant variables

Fixed effects exercise: run a separate regression for each unit and then average the coefficients:
. regress income childrn if id==1
. regress income childrn if id==2
(b1 + b2) / 2 = -2.5, where b1 and b2 are the childrn coefficients from the two regressions

exercise: generate a dummy variable for each person and regress with the dummy variables
. tab id, g(iddum)
. reg income childrn iddum1 iddum2
conclusion: more children -> lower income

Fixed effects
- define data set as panel data
. tsset id t
- regression with fixed effects command
. xtreg income childrn, fe

Between effects model
Answers the question: What is the effect of x when x differs between persons? Person A has “on average” three children and Person B has “on average” five children; what effect does this difference have on their income? In the between effects model we model the mean response, where the means are calculated for each of the units.
Information used: cross-sectional information (between subjects)
Variance analyzed: between variance
Time-variant and time-invariant variables

Between effects
. regress income childrn
conclusion: more children -> more income
define data as panel data
. xtreg dependent independent, be
(regression on the person averages)

Random effects model
Assumption: no difference between the two answers to the questions:
1) What is the effect of x when x changes within the person? Person A has two children at the first point in time and three children at the second; what effect does this change have on their income?
2) What is the effect of x when x differs between persons? Person A has two children and Person B has three children; what effect does this difference have on their income?
Information used: panel and cross-sectional (between and within subjects)
Variance analyzed: between variance and within variance
Time-variant and time-invariant variables

Random effects model:
- matrix-weighted average of the fixed and the between estimates
- assumes b1 has the same effect in the cross section as in the time series
- requires that the individual error terms are treated as random variables and follow the normal distribution
use:
. xtreg dependent independent if var==x, re

possible break

open data: panelex2.dta
varlist:
tell Stata the structure of the data:
. tsset X Y
X = caseid
Y = time/wave
summary statistics:
. xtdes
. xtsum
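
The panel commands above can be tried end to end on a small invented data set. The following is a minimal sketch, not the workshop's panelex1.dta or panelex2.dta: the four persons, two waves, and all values are made up so that children and income move together between persons but in opposite directions within each person. Only the variable names id, t, childrn and income are borrowed from the slides.

```stata
* Toy panel, invented for illustration (assumption: not the workshop data).
* Four persons (id), two waves (t); all values are made up.
clear
input id t childrn income
1 1 1 20
1 2 2 18
2 1 2 31
2 2 3 28
3 1 3 40
3 2 4 38
4 1 4 51
4 2 5 48
end

* Pooled OLS: ignores that each person is observed twice
regress income childrn

* Fixed effects "by hand": one dummy per person, first person as reference
tab id, gen(iddum)
regress income childrn iddum2 iddum3 iddum4

* Declare the panel structure (unit, then time) and describe it
tsset id t
xtdes
xtsum income childrn

* Within (fixed effects), between, and random effects estimators
xtreg income childrn, fe
xtreg income childrn, be
xtreg income childrn, re
```

With these made-up numbers the pooled and between coefficients on childrn are positive, while the dummy-variable and fixed effects coefficients are negative and identical to each other, which mirrors the contrast drawn in the slides.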
use the effects
. xtreg dependent independent if sex==1, fe
. xtreg dependent independent if sex==1, be
. xtreg dependent independent if sex==1, re

exercise: compare/discuss models
e.g.:
. xtreg indvar1 indvar2 if sex==1, fe
- try to include time-invariant variables
- try to make a theoretical/empirical argument for which model you use

Problems/Tests/Solutions:
What's the right model: fixed or random effects?
Test: Hausman test
Null hypothesis: the coefficients estimated by the efficient random effects estimator are the same as those estimated by the consistent fixed effects estimator.
- If they are the same (insignificant p-value, Prob>chi2 larger than .05): safe to use random effects.
- If the p-value is significant: use fixed effects.
. xtreg y x1 x2 x3 ..., fe
. estimates store fixed
. xtreg y x1 x2 x3 ..., re
. estimates store random
. hausman fixed random

Problems/Tests/Solutions:
Autocorrelation?
What is autocorrelation: the last time period's values affect the current values
test: xtserial
Install the user-written program: type findit xtserial or net search xtserial
. xtserial depvar indepvars
A significant test statistic indicates the presence of serial correlation.
Solution: use a model correcting for autocorrelation, xtregar instead of xtreg

possible break

panel
- waves
- number of children: wave 1 / 2 / 3 / 4
- employed: wave 1 / 2 / 3 / 4
- income: wave 1 / 2 / 3 / 4
regression models: dependent variable continuous

event
- dates of events
- birth of first child: 1963
- birth of second child: 1966
- start of first employment
- start of unemployment
- start of second employment
time information in event data is more precise
dependent variable: event happens 0/1
different data structure

Different Faces of Event History Data
Time
- continuous
- discrete
Types of censoring
- Subject does not experience event of interest
- Incomplete follow-up
- Lost to follow-up
- Withdraws from study
- Left or right censored

open data eventex.dta
tell Stata that our data is “survival data”: stset
. stset X, failure(Y) id(Z)
X = time at which the event happens or the case is right censored; this is always needed
Y = event indicator: 0 or missing means censored, all other values are interpreted as representing an event taking place (failure)
Z = id

three examples:
. stset ageendsch
event: end of school
time: age end of school
. stset agemaryc, failure(marcens) id(caseid)
event: marriage
. stset agestjob, failure(stjob) id(caseid)
event: first job

DATA MANAGEMENT HANNAH

Different Models of Event History
Time: continuous, discrete
non-parametric (only qualitative covariates; compare survival experiences between groups, e.g. sex, cohorts; univariate):
- Kaplan-Meier
- Nelson-Aalen
- log-rank test for comparison b/w groups
semi-parametric (inclusion of covariates in models; multivariate):
- Cox
- piecewise constant
parametric (inclusion of covariates in models; multivariate):
- continuous time: exponential, Weibull, log-logistic, lognormal, Gompertz, generalized gamma
- discrete time: logistic, log-log
(Extended from Jenkins 2005)

survivor function and hazard function
Survivor function, S(t): defines the probability of surviving longer than time t
Survivor and hazard functions can be converted into each other
Hazard (instantaneous hazard, force of mortality): the risk that an event will occur during a time interval (Δt) at time t, given that the subject did not experience the event before that time

non-parametric: Kaplan-Meier
List the Kaplan-Meier survivor function
. sts list
. sts list, by(sex) compare
Graph the Kaplan-Meier survivor function
. sts graph
. sts graph, by(sex)

exercise: stset your data for marriage, endschool or first job
e.g.:
1) sts list
2) sts graph
3) sts list, by(...) compare
4) sts graph, by(...)

non-parametric: Nelson-Aalen
List the Nelson-Aalen cumulative hazard function
. sts list, na
. sts list, na by(sex) compare
Graph the Nelson-Aalen cumulative hazard function
. sts graph, na
. sts graph, na by(sex)

exercise: stset your data for marriage, endschool or first job
1) sts list, na
2) sts graph, na
3) sts list, na by(...) compare
4) sts graph, na by(...)

Comparing Kaplan-Meier curves
Log-rank test can be used to compare survival curves
Hypothesis test (test of significance)
- H0: the curves are statistically the same
- H1: the curves are statistically different
Compares observed to expected cell counts
[slide: comparing Kaplan-Meier curves for agemarr]

exercise: test equality of survivor functions
e.g.:
. sts test abitur

Limit of Kaplan-Meier curves
What happens when you have several covariates that you believe contribute to survival?
Example: education, marital status, children, gender contribute to job change
- Can use K-M curves for 2 or maybe 3 covariates
- Need another approach for many covariates: the multivariate Cox proportional hazards model is most common

semi-parametric models: Cox
Cox proportional hazards model
- Can handle both continuous and categorical predictor variables
- Without knowing the baseline hazard h0(t), can still calculate coefficients for each covariate, and therefore hazard ratios
- Assumes multiplicative risk: the proportional hazards assumption

example: age of first marriage
. stcox sex
Interpretation:
because the Cox model does not estimate a baseline, there is no intercept in the output.
sex (male=1) (female=2)
whatever the hazard rate at a particular time is for men, it is 1.5 times as high for women
what does this mean in our case?
women get married younger than men do.

Interpretation of the regression coefficients
- An estimated hazard rate ratio greater than 1 indicates the covariate is associated with an increased hazard of experiencing the event of interest
- An estimated hazard rate ratio less than 1 indicates the covariate is associated with a decreased hazard of experiencing the event of interest
- An estimated hazard rate ratio of 1 indicates no association between covariate and hazard

Graphically:
estimates for functions:
. stcox sex, basehc(H0)
. stcurve, hazard at1(sex=1) at2(sex=2)
. stcox sex, basesurv(S0)
. stcurve, survival at1(sex=1) at2(sex=2)

exercise: make your own Cox model and estimate the hazard and survival

Assessing model adequacy
Proportional hazards assumption: the effects of the covariates do not depend on time, so the hazard ratios are constant over time
Three general ways to examine model adequacy
- Graphically: Do survival curves intersect?
- Mathematically: Schoenfeld test
- Computationally: Time-dependent variables (extended model)

compare with Kaplan-Meier:
. stcoxkm, by(sex)
exercise: do this with one of your estimates

log-log plots
. stphplot, by(sex)
exercise: do this with one of your estimates; stphplot can be adjusted, look in the stphplot help

Mathematically: Schoenfeld test
tests if the log hazard-ratio function is constant over time; thus a rejection of the null hypothesis indicates a deviation from the proportional hazards assumption
. stcox sex, schoenfeld(sch*) scaledsch(sca*)
. estat phtest
(with more than one variable: estat phtest, detail)
exercise: do this with your model,
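
The survival commands in this part can likewise be run on invented data. This is a minimal sketch, assuming Stata 11 or newer, where estat phtest computes the Schoenfeld residuals itself, so the schoenfeld() and scaledsch() options used with stcox above are not needed. The twelve cases and their values are made up and only borrow the variable names caseid, sex, agemaryc and marcens from the workshop data.

```stata
* Toy survival data, invented for illustration (assumption: not eventex.dta).
* agemaryc = age at first marriage, or age at interview if never married
* marcens  = 1 if married (event observed), 0 if right censored
* sex      = 1 male, 2 female
clear
input caseid sex agemaryc marcens
 1 1 24 1
 2 1 27 1
 3 1 30 1
 4 1 33 0
 5 1 29 1
 6 1 35 0
 7 2 20 1
 8 2 22 1
 9 2 23 1
10 2 25 1
11 2 28 0
12 2 21 1
end

* Declare survival-time data: time variable, event indicator, person id
stset agemaryc, failure(marcens) id(caseid)

* Non-parametric: Kaplan-Meier survivor function by group, then log-rank test
sts list, by(sex)
sts test sex

* Semi-parametric: Cox model, reported as hazard ratios
stcox sex

* Test of the proportional hazards assumption based on Schoenfeld residuals
estat phtest, detail
```

In this toy data the women marry earlier than the men, so the estimated hazard ratio for sex is above 1. With so few cases, the test results say nothing beyond showing where each number appears in the output.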