STATA Study and Improvement Notes

Uploaded by: 唯*** | IP location: Hebei | Upload date: 2024-08-12 | Format: PDF | Pages: 39 | Size: 5.05MB


University of Science and Technology Beijing

STATA Applications

Study Excerpts

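Chapter 1 Basic STATA Operations

1. Set the memory size

set mem 500m, perm

(Needed only in Stata 11 and earlier. From Stata 12 on, memory is managed automatically and set mem is no longer required.)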

2. Display what you type: display

display 1

display "dive"

3. Show the dataset structure: describe

describe (abbreviated d)

4. Edit the data: edit

edit

5. Rename a variable: rename

rename var1 var2

6. List the dataset contents: list/browse

list in 1

list in 2/10

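7. Import data from a text file (.csv)

1) insheet: insheet using "C:\Documents and Settings\Administrator\桌面\ST9007\dataset\Fees1.csv", clear

2) Data can be imported only when memory is empty; otherwise Stata stops with "you must start with an empty dataset". Use either of these fixes:

(1) clear all variables from memory first: drop _all

(2) add the clear option to the import command.

(In Stata 13 and later, import delimited supersedes insheet.)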

8. Save a file: save

1) save "C:\Documents and Settings\Administrator\桌面\ST9007\dataset\Fees1.dta"

2) save "C:\Documents and Settings\Administrator\桌面\ST9007\dataset\Fees1.dta", replace (replace overwrites an existing file)

9. Open a saved file and exit: use

1) use path\filename, clear

2) drop _all / exit

10. Record commands and output: log

1) Start a log file: log using "J:\phd\output.log", replace

2) Pause logging: log off

3) Resume logging: log on

4) Close the log file: log close

11. Create and save a do-file: doedit, do

1) Open the do-file editor: doedit

2) Type the commands.

3) Save the file with a .do extension.

4) Run it: do path\filename

12. Stack several datasets that have the same variables and structure (vertical combination): append

insheet using "J:\phd\Fees1.csv", clear

save "J:\phd\Fees1.dta", replace

insheet using "J:\phd\Fees2.csv", clear

append using "J:\phd\Fees1.dta"

save "J:\phd\Fees1.dta", replace

13. Horizontal combination, adding further variables to the existing dataset: merge

1) insheet using "J:\phd\Fees1.csv", clear

sort companyid yearend

save "J:\phd\Fees1.dta", replace

describe

insheet using "J:\phd\Fees6.csv", clear

sort companyid yearend

merge companyid yearend using "J:\phd\Fees1.dta"

save "J:\phd\Fees1.dta", replace

describe

2) _merge==1: the observation comes from the master data only

_merge==2: the observation comes from the using data only

_merge==3: the observation is in both the master and the using data
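From Stata 11 on, merge takes an explicit match type (1:1, m:1 or 1:m). It checks that the key is unique and sorts the data itself, and the older syntax used above is still accepted.

The sketch below illustrates the _merge codes on two hypothetical toy datasets. The companyid, year, auditfees and sales values are made up.

```stata
* Hypothetical toy data: the "using" file with audit fees
clear
input companyid year auditfees
1 2000 10
2 2000 20
3 2000 30
end
tempfile fees
save `fees'

* Hypothetical toy data: the "master" file with sales
clear
input companyid year sales
2 2000 200
3 2000 300
4 2000 400
end

* 1:1 match on companyid and year; _merge records where each observation came from
merge 1:1 companyid year using `fees'
tabulate _merge
list, sepby(_merge)
```

Company 4 exists only in the master data (_merge==1), company 1 only in the using data (_merge==2), and companies 2 and 3 in both (_merge==3).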

14. Help files: help

1) help describe

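15. Descriptive statistics

1) summarize incorporationyear (one variable)

summarize incorporationyear-big6 (a range of variables that are adjacent in the dataset)

summarize _all, or simply summarize (all variables)

2) More detailed statistics:

summarize incorporationyear, detail

3) Percentiles: centile

centile auditfees, centile(0(10)100)

centile auditfees, centile(0(5)100)

4) tabulate: frequencies and proportions of categorical variables

tabulate companytype

tabulate companytype big6, column (column percentages)

tabulate companytype big6, row (row percentages)

tab companytype big6 if companytype<=3, row col (row and column percentages for the observations meeting the condition)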

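5) Count the observations that meet a condition:

count if big6==1

count if big6==0 | big6==1

6) Descriptive statistics of a continuous variable for each category of a discrete variable:

(1) by companytype, sort: summarize auditfees, detail

(2) sort companytype

by companytype: summarize auditfees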

16. Transforming variables

1) Set listed=1 for the company types that issue public shares and 0 otherwise:

gen listed=0

replace listed=1 if companytype==2

replace listed=1 if companytype==3

replace listed=1 if companytype==5

replace listed=. if companytype==.

17. Generate a new variable: gen

generate newvar = expression

18. Data types

1) Numeric

Storage type   Bytes   Min                        Max
byte           1       -127                       +100
int            2       -32,767                    +32,740
long           4       -2,147,483,647             +2,147,483,620
float          4       -1.70141173319×10^38       +1.70141173319×10^38
double         8       -8.9884656743×10^307       +8.9884656743×10^307

2) String

Storage type   Bytes   Max length (characters)
str1           1       1
str2           2       2
...            ...     ...
str80          80      80

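3) Declare the storage type when creating a variable

• gen str3 gender="male"

• list gender in 1/10

(str3 holds at most 3 characters, so the stored value is truncated to "mal".)

4) When a variable takes more bytes than it needs

• drop gender

• gen str30 gender="male"

• browse

• describe gender

• compress gender

(compress demotes gender to the smallest type that holds its values, here str4.)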

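5) Date data type: %d dates, which are a count of the number of days elapsed since January 1, 1960.

(1) date() converts a string date variable

• gen fye=date(yearend,"MDY")

"MDY" must match the order of month, day and year in the string. The result is the number of days since January 1, 1960.

• list yearend fye in 3/10

(2) Date format %d (displays fye as a date; the stored number does not change):

• format fye %d

• list yearend fye in 1/10

• sum fye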

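(3) Recover the year, month and day from the day count

• gen year=year(fye)

• gen month=month(fye)

• gen day=day(fye)

• list yearend fye year month day in 3/10

(4) Combine three variables holding the year, month and day into one date variable

• drop fye

• gen fye=mdy(month,day,year)

• format fye %d

• list yearend fye in 1/10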

(5) Convert a numeric date such as 20080131 into a date Stata can recognize

• gen year=int(date/10000)

• gen month=int((date-year*10000)/100)

• gen day=date-year*10000-month*100

• list date year month day in 1/10

• gen edate=mdy(month,day,year)

• format edate %d

• list edate date in 1/10
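Here is a minimal sketch of step (5) on hypothetical toy values. One assumption is worth stating: the yyyymmdd number should be stored as long (or double). A float holds integers exactly only up to 16,777,216, so an 8-digit date stored as float can be silently rounded.

```stata
* Hypothetical toy data; "long" keeps all 8 digits of the yyyymmdd number
clear
input long date
20080131
19991231
end

* Split the number into year, month and day by integer arithmetic
gen year  = int(date/10000)
gen month = int((date - year*10000)/100)
gen day   = date - year*10000 - month*100

* Build a Stata date (days since 1 Jan 1960) and display it as a date
gen edate = mdy(month, day, year)
format edate %d
list date year month day edate
```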

19. Internal results that store statistics: r()

• sum auditfees

• gen meanadjaf=auditfees-r(mean)

• list meanadjaf in 1/10

Common r() values after summarize:

r(N)       Number of cases          r(sd)    Standard deviation
r(sum_w)   Sum of weights           r(min)   Minimum
r(mean)    Arithmetic mean          r(max)   Maximum
r(Var)     Variance                 r(sum)   Sum of variable

Commands to display these values:

• sum auditfees, detail

• return list
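This is a small check of how r() feeds into generate, on hypothetical toy data. After summarize, r(mean) holds the mean (here 3), so meanadjaf is each value's deviation from it. Note that the variance is stored as r(Var), with a capital V.

```stata
* Hypothetical toy data
clear
input auditfees
1
2
3
6
end

summarize auditfees
return list
* r(mean) stays available until the next r-class command overwrites it
gen meanadjaf = auditfees - r(mean)
list auditfees meanadjaf
```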

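20. The recode command (PPT slide 61)

1) Create a dummy from a variable with many values: recode

recode year (min/1999=0) (2000/max=1), gen(yeardum)

min/1999 sets all values less than or equal to 1999 to 0;

2000/max sets all values greater than or equal to 2000 to 1.

2) Split a continuous variable into groups at chosen cut-offs: recode()

gen assets_categ=recode(totalassets,100,500,1000,5000,20000,100000,1000000)

Each group takes the value of its upper bound, and the bound itself belongs to the group.

sort assets_categ

by assets_categ: sum totalassets assets_categ

3) Split a continuous variable into equal-width groups: autocode()

autocode(variablename, #ofintervals, minvalue, maxvalue)

For example: gen assets_categ=autocode(totalassets,10,0,10000)

4) Split a continuous variable into groups with the same number of observations: xtile

xtile assets_categ=totalassets, nquantiles(10)

The groups do not necessarily contain exactly the same number of observations (for example, when values are tied).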
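Here is a minimal sketch of the recode command and the recode() and autocode() functions. It uses four hypothetical observations so the group values can be checked by eye:

- recode() returns the first cut-off that a value does not exceed.
- autocode() returns the upper bound of the equal-width interval that contains the value.

```stata
* Hypothetical toy data
clear
input year totalassets
1998 50
1999 300
2000 700
2001 2500
end

* Dummy: years up to 1999 -> 0, from 2000 on -> 1
recode year (min/1999=0) (2000/max=1), gen(yeardum)

* recode(): map each value to the first cut-off it does not exceed
gen cat_recode = recode(totalassets, 100, 500, 1000, 5000)

* autocode(): 10 equal-width intervals on [0, 10000]; returns the interval's upper bound
gen cat_auto = autocode(totalassets, 10, 0, 10000)
list
```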

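21. Compute a variable's mean for every group in one step: egen

Sort by company type, then compute each type's mean audit fee and store it in a new variable:

by companytype, sort: egen meanaf2=mean(auditfees)

Other egen functions include:

• count()

• mean()

• median()

• sum()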
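This toy illustration, with made-up values, shows that by ... egen writes each group's statistic onto every observation in the group. In recent Stata versions the egen function documented for sums is total(); sum() is its older name.

```stata
* Hypothetical toy data: two company types
clear
input companytype auditfees
1 10
1 20
2 30
2 40
2 50
end

* Group mean and group count, repeated on every row of the group
by companytype, sort: egen meanaf2 = mean(auditfees)
by companytype, sort: egen n_af = count(auditfees)
list
```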

22. _n and _N

1) Number each observation and show the total number of observations

sort companyid fye

capture drop x

gen x=_n

capture drop y

gen y=_N

list companyid fye x y in 1/30

2) Within each group, number the observations and show the group size

• capture drop x y

• sort companyid fye

• by companyid: gen x=_n

• by companyid: gen y=_N

• list companyid fye x y in 1/30

3) Create new variables equal to the first or last value within each group

• sort companyid fye

• by companyid: gen auditfees_first=auditfees[1]

• by companyid: gen auditfees_last=auditfees[_N]

• list companyid fye auditfees auditfees_first auditfees_last in 1/30

4) Create new variables equal to the value lagged one or two periods

• sort companyid fye

• by companyid: gen auditfees_lag1=auditfees[_n-1]

• by companyid: gen auditfees_lag2=auditfees[_n-2]

• list companyid fye auditfees auditfees_lag1 auditfees_lag2 in 1/30
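The positional lag auditfees[_n-1] takes the previous row, whatever year it belongs to, so a missing year silently shifts the lag. The sketch below runs on a hypothetical panel with a gap. It compares the positional lag with Stata's time-series operator L., which is a different tool from the subscripting above and needs the panel declared with xtset first.

```stata
* Hypothetical panel: company 1 has no 2002 observation
clear
input companyid year auditfees
1 2000 10
1 2001 12
1 2003 15
2 2000 20
2 2001 25
end

sort companyid year
* Positional lag: the previous row within the company, whatever its year
by companyid: gen lag_pos = auditfees[_n-1]

* Calendar lag: declare the panel, then L. looks up year-1 specifically
xtset companyid year
gen lag_cal = L.auditfees
list
```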

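23. Changing the dataset structure: reshape

Databases store data in different structures. In long form, each year of a company's data is a separate row; in wide form, all years of a company's data are in the same row. reshape converts between the two.

The data must meet one requirement: each company may have only one observation per year. Otherwise reshape stops with an error.

1) Long to wide:

reshape wide yearend incorporationyear companytype sales auditfees nonauditfees currentassets currentliabilities totalassets big6 fye, i(companyid) j(year)

2) Wide to long:

reshape long yearend incorporationyear companytype sales auditfees nonauditfees currentassets currentliabilities totalassets big6 fye, i(companyid) j(year)

3) For the second conversion the command can be shortened:

• reshape wide

• reshape long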
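Here is a minimal round trip on hypothetical data with one variable, auditfees. isid is added as a precaution rather than taken from the notes: it stops with an error if a company has more than one observation in a year, which is exactly the condition that makes reshape fail.

```stata
* Hypothetical long data: one row per company-year
clear
input companyid year auditfees
1 2000 10
1 2001 12
2 2000 20
2 2001 25
end

* Stop here with an error if companyid-year is not unique
isid companyid year

reshape wide auditfees, i(companyid) j(year)
list          // one row per company: auditfees2000 auditfees2001

reshape long  // Stata remembers the previous i() and j() settings
list
```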

24. Example: computing CAR

Given daily stock returns, market returns and the event date, compute the CAR over a three-day window.

1) Define the three-day window:

• sort ticker edate

• gen window=0 if eventdate<. (the event day is 0)

• replace window=-1 if window[_n+1]==0 & ticker==ticker[_n+1]

• replace window=1 if window[_n-1]==0 & ticker==ticker[_n-1]

2) Compute AR and CAR

• gen ar=ret-vwretd

• gen car=ar+ar[_n-1]+ar[_n+1] if window==0 & ticker==ticker[_n+1] & ticker==ticker[_n-1]

3) Check the result

• list ticker edate ret vwretd ar car window if window<.
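To check the window and CAR logic, here is the same code run on one hypothetical ticker with five trading days and the event on day 3. The returns are invented so that AR is 0.01, 0.04 and -0.01 on days -1, 0 and +1, giving CAR = 0.04 on the event day.

```stata
* Hypothetical data: eventdate is non-missing only on the event day
clear
input ticker edate ret vwretd eventdate
1 1 0.01 0.00 .
1 2 0.02 0.01 .
1 3 0.05 0.01 3
1 4 0.00 0.01 .
1 5 0.01 0.01 .
end

sort ticker edate
* Mark the event day (0) and its neighbours (-1, +1) within the same ticker
gen window = 0 if eventdate < .
replace window = -1 if window[_n+1]==0 & ticker==ticker[_n+1]
replace window = 1 if window[_n-1]==0 & ticker==ticker[_n-1]

* Abnormal return = stock return - market return; CAR sums days -1, 0, +1
gen ar = ret - vwretd
gen car = ar + ar[_n-1] + ar[_n+1] if window==0 & ticker==ticker[_n+1] & ticker==ticker[_n-1]
list ticker edate ret vwretd ar car window if window < .
```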

25. t-tests of means

1) Test whether Big 6 and non-Big 6 audit fees differ overall

• use "J:\phd\Fees.dta", clear

• gen lnaf=ln(auditfees)

• by big6, sort: sum lnaf

• ttest lnaf, by(big6)

2) Compare Big 6 and non-Big 6 audit fees year by year by adding by year:

• gen fye=date(yearend,"MDY")

• format fye %d

• gen year=year(fye)

• sort year

• by year: ttest lnaf, by(big6)

3) t-test that the mean equals a specific value:

• sum lnaf

• ttest lnaf=2.1
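The two-sample ttest above assumes equal variances and pools them. The sketch below, on hypothetical toy data, rebuilds that pooled t statistic from the stored results r(mu_1), r(mu_2), r(sd_1), r(sd_2), r(N_1) and r(N_2). If the variances should not be pooled, add the unequal option.

```stata
* Hypothetical toy data: group means 2 and 5, both standard deviations 1
clear
input big6 lnaf
0 1
0 2
0 3
1 4
1 5
1 6
end

ttest lnaf, by(big6)

* Pooled standard deviation and t statistic rebuilt by hand
scalar sp = sqrt(((r(N_1)-1)*r(sd_1)^2 + (r(N_2)-1)*r(sd_2)^2) / (r(N_1)+r(N_2)-2))
scalar t_manual = (r(mu_1) - r(mu_2)) / (sp*sqrt(1/r(N_1) + 1/r(N_2)))
display "by hand: " t_manual "   ttest: " r(t)
```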

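26. Significance tests of medians

1) Commands to obtain medians:

by big6, sort: sum lnaf, detail

by big6, sort: centile lnaf

2) Median tests:

• median lnaf, by(big6)

• ranksum lnaf, by(big6) (Wilcoxon rank-sum, or Mann-Whitney, test)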

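27. Contingency table tests

1) Create a contingency table:

• tabulate companytype big6, row

The first variable forms the rows (the left-most column of the table); the second forms the columns (the first row of the table).

2) Test whether the two variables are independent: chi2

tabulate companytype big6, chi2 row

3) Correlation matrix:

pwcorr lnaf big6 year listed

4) Correlation matrix with significance levels:

pwcorr lnaf big6 year listed, sig

5) Also show the number of observations in the matrix:

• pwcorr lnaf big6 listed if year==2000, sig obs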

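28. Create a sample without missing values

1) samp equals 1 if none of the variables is missing and 0 if at least one is missing:

gen samp=(lnaf<. & big6<. & year<. & listed<.)

2) miss counts the number of missing values in the same row:

egen miss=rmiss(lnaf big6 year listed)

sum miss, detail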
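This is a toy check of both indicators with made-up values. Writing the condition as a logical expression gives 1 or 0 directly; gen samp=1 if ... would leave incomplete rows missing rather than 0. rowmiss() is the current name of the egen function that the notes call rmiss().

```stata
* Hypothetical data with some missing values
clear
input lnaf big6 year listed
1.2 1 2000 0
.   0 2001 1
2.5 . .    1
3.1 0 2002 1
end

* 1 if no variable is missing, 0 otherwise
gen samp = (lnaf < . & big6 < . & year < . & listed < .)
* Number of missing values in each row
egen miss = rowmiss(lnaf big6 year listed)
list
```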

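29. Graphs

1) Histograms

• histogram incorporationyear, width(1)

• histogram incorporationyear, bin(147)

width sets the width of each bin; bin sets the number of bins. Adjusting these values can make the histogram easier to read.

• Restrict the range and choose the bin width: hist lnaf if lnaf>=0 & lnaf<=5, width(0.25)

• Choose the axis labels: hist lnaf if lnaf>=0 & lnaf<=5, width(0.25) xlabel(0(0.5)5)

• Compare with a normal distribution: hist lnaf if lnaf>=0 & lnaf<=5, width(0.25) normal

2) Scatter plots: scatter

• scatter lnaf lnta

The first variable is on the vertical axis and the second on the horizontal axis.

• twoway (scatter lnaf lnta, msize(tiny)) (lfit lnaf lnta)

This adds the best-fitting straight line to the scatter plot.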

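30. Winsorizing: winsor

winsor rev, gen(wrev) p(0.01) (0.01 is the proportion of observations in each tail whose values are replaced)

winsor rev, gen(wrev) h(5) (5 is the number of observations in each tail whose values are replaced)

(winsor is a user-written command; install it with ssc install winsor.)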
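winsor itself is a user-written command. To show what h(1) means, here is a hand-rolled sketch on hypothetical data. It replaces the single smallest and single largest value with their nearest neighbours; it mimics the idea only and is not the winsor command.

```stata
* Hypothetical data with one extreme value (100)
clear
input rev
5
1
3
100
2
4
end

sort rev
gen wrev = rev
* Bottom tail: smallest value -> second smallest
replace wrev = rev[2] in 1
* Top tail: largest value -> second largest ("in l" = last observation)
replace wrev = rev[_N-1] in l
list rev wrev
```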

Chapter 2 Linear Regression

Contents:

>2.1 The basic idea underlying linear regression

>2.2 Single variable OLS

>2.3 Correctly interpreting the coefficients

>2.4 Examining the residuals

>2.5 Multiple regression

>2.6 Heteroskedasticity

>2.7 Correlated errors

>2.8 Multicollinearity

>2.9 Outlying observations

>2.10 Median regression

>2.11 "Looping"

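2.1 The basic idea underlying linear regression

1. Residuals

$Y = \hat{Y} + \varepsilon$

$Y$ is the actual value, $\hat{Y}$ the predicted value and $\varepsilon$ the residual.

OLS regression chooses the coefficients that minimize the sum of squared residuals.

2. Basic simple regression

regress y x

3. Stored regression results

The estimated coefficients are stored in the internal results _b[varname], and the constant in _b[_cons].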

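4. Predicted values and residuals

• predict yhat

• predict yres, resid

yres is the difference between the actual and the predicted value.

5. Scatter plot of the residuals against x

twoway (scatter yres x) (lfit yres x)

[Figure: scatter of the residuals against x with a fitted line; legend: residuals, Fitted values]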
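Here is a sketch on a five-point hypothetical dataset, chosen so the OLS line works out to y = 2.2 + 0.6x. It confirms that predict simply evaluates _b[_cons] + _b[x]*x and that the residual is the actual minus the fitted value.

```stata
* Hypothetical toy data
clear
input x y
1 2
2 4
3 5
4 4
5 5
end

regress y x
* Coefficients are stored in _b[varname]; the constant in _b[_cons]
display _b[_cons], _b[x]
predict yhat            // fitted values
predict yres, resid     // residuals = actual - fitted
gen yhat_manual = _b[_cons] + _b[x]*x
list y yhat yhat_manual yres
```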

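6. Measuring how precisely the coefficients are estimated: the standard error

The t-value is the coefficient divided by its standard error. The p-value is computed from the t distribution. It is the probability of obtaining a t-statistic at least this large in absolute value if the true coefficient were zero.

These statistics are valid only if the residuals satisfy the following assumptions:

Homoscedasticity (i.e., the residuals have a constant variance)

Non-correlation (i.e., the residuals are not correlated with each other)

Normality (i.e., the residuals are normally distributed)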
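On the same hypothetical five-point data, the t statistic and two-sided p-value can be rebuilt from the stored coefficient, standard error and residual degrees of freedom e(df_r). ttail() returns the upper-tail probability of the t distribution.

```stata
* Hypothetical toy data
clear
input x y
1 2
2 4
3 5
4 4
5 5
end

regress y x
* t = coefficient / standard error
scalar t_x = _b[x] / _se[x]
* Two-sided p-value from the t distribution with e(df_r) degrees of freedom
scalar p_x = 2*ttail(e(df_r), abs(t_x))
display "t = " t_x "   p = " p_x
```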

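7. What some items in the regression output mean

• Degrees of freedom of each sum of squares:

> For the ESS, df = k-1, where k = number of regression coefficients (here df = 2-1)

> For the RSS, df = n-k, where n = number of observations (here 11-2)

> For the TSS, df = n-1 (here 11-1)

• MS: the last column (MS) reports the ESS, RSS and TSS divided by their respective degrees of freedom.

• R-squared: R-squared = ESS/TSS

• Adjusted R-squared: Adj R-squared = 1-(1-R²)(n-1)/(n-k). It corrects for the fact that R-squared rises whenever an explanatory variable is added, even one with little explanatory power.

• Root MSE = sqrt(RSS/(n-k)): the standard deviation of the residuals, i.e., the typical size of a prediction error.

• F-statistic = (ESS/(k-1))/(RSS/(n-k)): tests whether all slope coefficients are jointly zero, i.e., the overall explanatory power of the model.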
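The header statistics can be recomputed from the stored sums of squares, e(mss) (ESS) and e(rss) (RSS). On the same hypothetical data ESS = 3.6, RSS = 2.4 and n = 5, so R² = 0.6 and F = 4.5. Each hand computation below can be compared with what regress stores.

```stata
* Hypothetical toy data
clear
input x y
1 2
2 4
3 5
4 4
5 5
end

regress y x
* Rebuild the header statistics from the stored sums of squares
scalar s_k    = e(df_m) + 1                          // coefficients incl. constant
scalar s_r2   = e(mss) / (e(mss) + e(rss))           // ESS/TSS
scalar s_r2a  = 1 - (1 - s_r2)*(e(N) - 1)/(e(N) - s_k)
scalar s_rmse = sqrt(e(rss) / (e(N) - s_k))
scalar s_F    = (e(mss)/(s_k - 1)) / (e(rss)/(e(N) - s_k))
display s_r2, s_r2a, s_rmse, s_F
```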

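2.3 Correctly interpreting the coefficients

1) To test whether Big 6 audit fees differ between listed and unlisted companies, use the interaction term Big6*Listed:

$$Y = \alpha_0 + \alpha_1 Big6 + \alpha_2 Listed + \alpha_3 Big6 \times Listed + \varepsilon$$

$E(Y \mid Big6=0, Listed=0) = \alpha_0$

$E(Y \mid Big6=1, Listed=0) = \alpha_0 + \alpha_1$

$E(Y \mid Big6=0, Listed=1) = \alpha_0 + \alpha_2$

$E(Y \mid Big6=1, Listed=1) = \alpha_0 + \alpha_1 + \alpha_2 + \alpha_3$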
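With two dummies and their interaction the model is saturated, so each fitted cell mean equals the raw cell mean. The sketch below uses hypothetical data whose cell means are 2, 5, 3 and 10, which gives α0 = 2, α1 = 3, α2 = 1 and α3 = 4.

```stata
* Hypothetical data: two observations per Big6 x Listed cell
clear
input y big6 listed
1 0 0
3 0 0
4 1 0
6 1 0
2 0 1
4 0 1
9 1 1
11 1 1
end

gen big6_listed = big6*listed
regress y big6 listed big6_listed
* Fitted mean for a Big 6 auditor of a listed company
display _b[_cons] + _b[big6] + _b[listed] + _b[big6_listed]
* Compare with the raw cell mean
summarize y if big6==1 & listed==1
```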

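2) Interpreting the regression coefficients

(1) Continuous variables: the economic significance of a coefficient is the effect of X on Y. It can be measured in different ways, for example the change in Y when X moves from its 25th to its 75th percentile, or when X changes by one standard deviation.

• reg auditfees totalassets

• sum totalassets if auditfees<., detail

• gen fees_low=_b[_cons]+_b[totalassets]*r(p25)

• gen fees_high=_b[_cons]+_b[totalassets]*r(p75)

• sum fees_low fees_high

(2) Dummy variables

Compare the predicted values at 0 and 1 rather than at percentiles.

• reg lnaf big6

• gen fees_nb6=exp(_b[_cons])

• gen fees_b6=exp(_b[_cons]+_b[big6])

• sum fees_nb6 fees_b6

(The dependent variable is ln(audit fees), so exp() converts the predictions back into fee levels.)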

2.4 Examining the residuals

1) When reporting results, do not rely on R-squared alone; also report other diagnostics:

• is there significant heteroscedasticity?

• is there any pattern to the residuals?

• are there any problems of outliers?

2) Using R²:

Gu (2007) points out that:

• econometricians consider R² values to be relatively unimportant (accounting researchers put far too much emphasis on the magnitude of the R²)

• regression R²s should not be compared across different samples

• in contrast, there is a large accounting literature that uses R²s to determine whether the value relevance of accounting information has changed over time.

"Econometricians have long pointed out that regression R²s are not comparable across samples. While 'it is generally conceded among insiders that they (R²s) do not mean a thing' (Cramer, 1987, p. 253, emphases added), the R² comparison has become an increasingly popular method in accounting research. This paper illustrates why the R²s are incomparable across samples and the general nature of this problem."

The R² tells us nothing about whether our hypothesis about the determinants of Y is correct.

3) Use the residuals to assess how well the model fits.

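2.5 Multiple regression

1) How to judge whether relevant explanatory variables have been omitted:

• theory

• prior empirical studies

2) Check whether the residuals are independent of the predicted values:

• gen listed=0

• replace listed=1 if companytype==2|companytype==3|companytype==5

• reg lnaf lnta big6 listed

• predict lnaf_hat (the predicted values of the dependent variable)

• predict lnaf_res, resid (stores the residuals in lnaf_res)

• twoway (scatter lnaf_res lnaf_hat) (lfit lnaf_res lnaf_hat) (checks whether the residuals and predicted values are related)

3) Another command does the same:

• reg lnaf lnta big6 listed

• rvfplot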

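2.6 Heteroscedasticity (hettest)

1) Testing for constant variance:

after the regression, use hettest:

• reg auditfees nonauditfees totalassets big6 listed

• hettest

2) Heteroscedasticity does not bias the coefficients, but it does bias their standard errors. It can arise because the data are bounded, which produces high skewness. Some heteroscedasticity can be removed by taking logs.

When heteroscedasticity is found, adjust the standard errors with the Huber/White/sandwich estimator. STATA does this when robust is added to the regression:

• reg auditfees nonauditfees totalassets big6 listed, robust

With robust the coefficients are identical but the standard errors differ. In this example the t-values become smaller, the p-values larger and the F-value smaller, while R² is unchanged. Robust standard errors are not always larger, however.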
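The sketch below uses hypothetical data to show the two points above: robust leaves the coefficients untouched and changes only the standard errors. In this tiny sample the robust standard error happens to be smaller than the OLS one. estat hettest is the post-estimation form of hettest in newer Stata versions.

```stata
* Hypothetical toy data
clear
input x y
1 2
2 4
3 5
4 4
5 5
end

regress y x
scalar b_ols  = _b[x]
scalar se_ols = _se[x]
estat hettest              // Breusch-Pagan test for heteroscedasticity

regress y x, vce(robust)   // same as: regress y x, robust
display "OLS se = " se_ols "   robust se = " _se[x]
```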

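2.7 Correlated errors (correlated residuals)

1) The residuals of a given firm are correlated across years ("time-series dependence"). In panel data there are likely to be unobserved company-specific characteristics that are relatively constant over time. They affect every year's observation, so the observations are not independent.

2) The standard errors will be biased downward. This problem can be avoided by adjusting the standard errors for the clustering of yearly observations across a given company.

3) Correcting for correlated errors:

add robust cluster() to the regression

reg lnaf lnta big6 listed, robust cluster(companyid)

4) How to check whether the residuals of the same company are correlated across years

• reg lnaf lnta

• predict res, resid

• keep companyid year res

• sort companyid year

• drop if companyid==companyid[_n-1] & year==year[_n-1]

• reshape wide res, i(companyid) j(year)

• browse

• pwcorr res1998-res2002

5) Points to note with panel data:

• If robust is used to control heteroscedasticity but cluster() is not used to control time-series dependence, the t-statistics will still be biased upward.

• If heteroscedasticity is not controlled either, the upward bias in the t-statistics is even more severe.

• So with panel data add the robust cluster() option; otherwise your "significant" results from pooled regressions may be spurious.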
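The sketch below uses a hypothetical panel of three companies observed in two years. vce(cluster companyid) is the newer spelling of robust cluster(companyid). Clustering changes only the standard errors, and e(N_clust) reports the number of clusters.

```stata
* Hypothetical panel: 3 companies x 2 years
clear
input companyid year x y
1 2000 1 2
1 2001 2 4
2 2000 3 5
2 2001 4 4
3 2000 5 5
3 2001 6 7
end

regress y x
scalar b_ols = _b[x]
regress y x, vce(cluster companyid)   // same as: robust cluster(companyid)
display "clusters: " e(N_clust)
```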

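2.8 Multicollinearity

1) When multicollinearity arises

• We have seen that when there is perfect collinearity between independent variables, STATA will have to exclude one of them. For example, year_1+year_2+year_3+year_4+year_5=1, so together with the constant the five year dummies are perfectly collinear:

• reg lnaf year_1 year_2 year_3 year_4 year_5

• STATA automatically throws away one of the year dummies so that the model can be estimated (alternatively, the nocons option keeps all five dummies and drops the constant)

• Even if the independent variables are not perfectly collinear, there can still be a problem if they are highly correlated

2) Consequences:

• the standard errors of the coefficients become large (i.e., the coefficients are not estimated precisely)

• the coefficient estimates can be highly unstable

3) Measurement:

Variance-inflation factors (VIF) can be used to gauge whether multicollinearity is present.

• reg lnaf lnta big6 lnta1

• vif

• reg lnaf lnta big6

• vif

4) Severity: a VIF of 10 is usually considered high and a VIF of 20 very high.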
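The VIF for a regressor is 1/(1 - R²), where R² comes from regressing that regressor on the other regressors. In the hypothetical data below, x1 and x2 have R² = 0.6 with each other, so the VIF is 2.5. estat vif is the newer form of the vif command.

```stata
* Hypothetical data: x1 and x2 are correlated (R2 = 0.6 with each other)
clear
input y x1 x2
3 1 2
5 2 4
6 3 5
8 4 4
9 5 5
end

regress y x1 x2
estat vif

* VIF for x1 by hand: auxiliary regression of x1 on the other regressor
regress x1 x2
scalar vif_x1 = 1/(1 - e(r2))
display vif_x1
```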

2.9 Outlying observations

1) Measuring outliers: Cook's D

• We can calculate the influence of each observation on the estimated coefficients using Cook's D

• Values of Cook's D that are higher than 4/N are considered large, where N is the number of observations used in the regression

2) Computing the outliers

• reg lnaf lnta big6

• predict cook, cooksd (stores Cook's D in cook)

• sum cook, detail

• gen max=4/e(N) (e(N) is the number of observations stored by the regression)

• count if cook>max & cook<.

3) Re-run the regression without the outliers

• reg lnaf lnta big6 if cook<=max

4) Reducing the influence of outliers by winsorizing. A disadvantage with "winsorizing" is that the researcher is assuming that outliers lie only at the extremes of the variable's distribution.

• winsor lnaf, gen(wlnaf) p(0.01)

• winsor lnta, gen(wlnta) p(0.01)

• sum lnaf wlnaf lnta wlnta, detail

• reg wlnaf wlnta big6
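This checks the 4/N rule on the hypothetical five-point data (N = 5, cut-off 0.8). By hand, the first observation has leverage 0.6 and residual -0.8, giving Cook's D = 1.5. It is the only observation above the cut-off. A local macro holds the cut-off instead of a variable named max.

```stata
* Hypothetical toy data
clear
input x y
1 2
2 4
3 5
4 4
5 5
end

regress y x
predict cook, cooksd
local cut = 4/e(N)          // rule-of-thumb cut-off 4/N
count if cook > `cut' & cook < .
list x y cook
```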

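2.10 Median regression

1) Median regression is used when outliers are a problem.

2) Principle:

OLS minimizes the residual sum of squares:

$$RSS = \sum_{i=1}^{n} \hat{\varepsilon}_i^2 = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 = \sum_{i=1}^{n} (Y_i - \hat{\alpha}_0 - \hat{\alpha}_1 X_i)^2$$

Median regression minimizes the sum of the absolute residuals:

$$\sum_{i=1}^{n} |\hat{\varepsilon}_i| = \sum_{i=1}^{n} |Y_i - \hat{\alpha}_0 - \hat{\alpha}_1 X_i|$$

3) Estimation: STATA treats median regression as a special case of quantile regressions.

qreg lnaf lnta big6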

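2.11 "Looping"

1) When a set of commands is used repeatedly, we can define a program. It begins with program, contains the commands (here a forvalues loop), and ends with end. Typing the program name, ten, then runs all the commands in it.

Example:

program ten
    forvalues i=1(1)10 {
        display `i'
    }
end

2) Modifying a program:

first drop the program from memory: capture program drop ten

then define it again.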

3) Example: computing discretionary (abnormal) accruals with the Jones model.

• use "J:\phd\accruals.dta", clear

• gen one_sic=int(sic/1000)

• gen ncca=current_assets-cash

• gen ndcl=current_liabilities-debt_in_current_liabilities

• sort cik year

• gen ch_ncca=ncca-ncca[_n-1] if cik==cik[_n-1]

• gen ch_ndcl=ndcl-ndcl[_n-1] if cik==cik[_n-1]

• gen accruals=(ch_ncca-ch_ndcl)/assets[_n-1] if cik==cik[_n-1]

• gen lag_assets=assets[_n-1] if cik==cik[_n-1]

• gen ppe_scaled=ppe/assets[_n-1] if cik==cik[_n-1]

• gen chsales_scaled=(sales-sales[_n-1])/assets[_n-1] if cik==cik[_n-1]

• gen ab_acc=.

capture program drop ab_acc
program ab_acc
    forvalues i=0(1)9 {
        * estimate the model separately for each one-digit SIC industry
        capture reg accruals lag_assets ppe_scaled chsales_scaled if one_sic==`i'
        * skip industries where the regression fails (e.g., too few observations)
        if _rc continue
        capture predict ab_acc`i' if one_sic==`i', resid
        replace ab_acc=ab_acc`i' if one_sic==`i'
        capture drop ab_acc`i'
    }
end

ab_acc

Chapter 3 Regression Analysis When the Dependent Variable Is Not Continuous

Contents:

>3.1 Why not OLS?

>3.2 The basic idea underlying logit models

>3.3 Estimating logit models

>3.4 Multinomial models

>3.5 Ordinal dependent variables

>3.6 Count data models

>3.7 Tobit models and interval regression

>3.8 Duration models

3.1 Why not OLS?

1) Two statistical problems arise if we use OLS when the dependent variable is categorical:

> The predicted values can be negative or greater than one

> The standard errors are biased because the residuals are heteroscedastic.

2) Instead of OLS, we can use a logit model.

3.2 The basic idea underlying logit models

1) We need to transform the discrete dependent variable into a variable that suits a linear model, i.e., one that:

• has an infinite range,

• reflects the likelihood of choosing a Big 6 auditor versus a non-Big 6 auditor.

2) The log of the odds meets both requirements:

$$odds = \frac{Probability_{big6}}{Probability_{non\text{-}big6}} = \frac{P(big6=1)}{1-P(big6=1)}$$

3) A worked example:

P(big6=1)   odds = P(big6=1)/(1-P(big6=1))   ln(odds)
0.01        0.01/0.99 = 0.01                 -4.60
0.03        0.03/0.97 = 0.03                 -3.48
0.05        0.05/0.95 = 0.05                 -2.94
0.20        0.20/0.80 = 0.25                 -1.39
0.30        0.30/0.70 = 0.43                 -0.85
0.40        0.40/0.60 = 0.67                 -0.41
0.50        0.50/0.50 = 1.00                  0.00
0.60        0.60/0.40 = 1.50                  0.41
0.70        0.70/0.30 = 2.33                  0.85
0.80        0.80/0.20 = 4.00                  1.39
0.95        0.95/0.05 = 19.00                 2.94
0.97        0.97/0.03 = 32.33                 3.48
0.99        0.99/0.01 = 99.00                 4.60

The first column is the probability of choosing a Big 6 auditor, the second column gives the odds and the third the natural log of the odds.

4) Converting between L and P:

$$L = \ln\left[\frac{P(big6=1)}{1-P(big6=1)}\right]$$

$$\exp(L) = \frac{P(big6=1)}{1-P(big6=1)}$$

$$1+\exp(L) = \frac{1}{1-P(big6=1)}$$

$$\frac{\exp(L)}{1+\exp(L)} = P(big6=1)$$
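Stata's built-in functions logit() and invlogit() implement the two directions of this conversion. They give a quick way to check the table values, e.g. ln(0.20/0.80) = -1.39.

```stata
* logit(p) = ln[p/(1-p)];  invlogit(L) = exp(L)/(1+exp(L))
display logit(0.2)              // ln(0.25), about -1.39
display invlogit(logit(0.7))    // back to 0.7
display exp(ln(4))/(1+exp(ln(4)))  // L = ln(4) gives P = 0.8
```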

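5) The likelihood function: estimation by maximum likelihood

$$\mathcal{L} = \prod_{i=1}^{n} P(y_i=1)^{y_i}\,\bigl[1-P(y_i=1)\bigr]^{1-y_i}$$

$$1-P(y_i=1) = \frac{1}{1+\exp(L_i)}$$

$$\mathcal{L} = \prod_{i=1}^{n} \left[\frac{\exp(L_i)}{1+\exp(L_i)}\right]^{y_i}\left[\frac{1}{1+\exp(L_i)}\right]^{1-y_i}$$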

6) Estimation commands: logit and logistic

• logit reports the values of the estimated coefficients

• logistic reports the odds ratios

Coefficient estimates are usually reported, so logit is generally used.

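7) Measures of the model's explanatory power: pseudo-R² and Chi2

> pseudo-R² = (ln(L0) - ln(LN))/ln(L0) = (-175.224+146.215)/(-175.224) ≈ 0.166

ln(L0) is the log likelihood at the first iteration (iteration 0, the constant-only model), and ln(LN) is the log likelihood at the final iteration.

• Chi2 = -2(ln(L0) - ln(LN)) = -2*(-175.224+146.215) = 58.018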
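The sketch below uses hypothetical data in which four of eight observations have y = 1, so ln(L0) = 8·ln(0.5). It confirms that logit's stored pseudo-R² e(r2_p) and LR chi2 e(chi2) follow the two formulas above. Without robust, e(chi2) is the likelihood-ratio statistic; with robust, Stata reports a Wald chi2 instead.

```stata
* Hypothetical data (y is not perfectly separated by x)
clear
input y x
0 1
0 2
1 3
0 4
1 5
0 6
1 7
1 8
end

logit y x
* McFadden pseudo-R2 and LR chi2 rebuilt from the two log likelihoods
display "pseudo-R2 = " 1 - e(ll)/e(ll_0) "   stored: " e(r2_p)
display "LR chi2   = " -2*(e(ll_0) - e(ll)) "   stored: " e(chi2)
```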

3.3 Estimating logit models

1) The regression model

• logit big6 lnta age, robust cluster(companyid)

robust corrects for heteroscedasticity, and cluster() corrects for correlated errors.

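2) Predicting the probability of the dependent variable

• logit big6 lnta age, robust cluster(companyid)

• capture drop big6hat

• predict big6hat

• sum big6hat, detail

The predicted values from this command follow the formula:

$$P(big6=1) = \frac{\exp(\alpha_0 + \alpha_1 lnta + \alpha_2 age)}{1+\exp(\alpha_0 + \alpha_1 lnta + \alpha_2 age)}$$

Another way to obtain the predicted probability, using the linear prediction big6hat1 from step 3):

• gen big6hat2=exp(big6hat1)/(1+exp(big6hat1))

• sum big6hat big6hat1 big6hat2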

3) Computing the linear prediction (the predicted log odds):

• gen big6hat1=_b[_cons]+_b[lnta]*lnta+_b[age]*age

• sum big6hat1, detail

Alternatively: predict big6hat1, xb
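On the same hypothetical data as the pseudo-R² check, predict's default after logit is the probability, and the xb option gives the linear index. Applying exp(L)/(1+exp(L)) to the index reproduces the probability.

```stata
* Hypothetical data
clear
input y x
0 1
0 2
1 3
0 4
1 5
0 6
1 7
1 8
end

logit y x
predict phat            // default after logit: Pr(y=1)
predict xbhat, xb       // linear index _b[_cons] + _b[x]*x
gen phat2 = exp(xbhat)/(1 + exp(xbhat))
list x phat xbhat phat2
```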

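4) Effect of a change in an independent variable on the predicted probability:

• logit big6 lnta age, robust cluster(companyid)

• gen big10=exp(_b[_cons]+_b[lnta]*lnta+_b[age]*10)/(1+exp(_b[_cons]+_b[lnta]*lnta+_b[age]*10))

• gen big20=exp(_b[_cons]+_b[lnta]*lnta+_b[age]*20)/(1+exp(_b[_cons]+_b[lnta]*lnta+_b[age]*20))

• sum big10 big20

5) Checking whether the relationship between the dependent variable and an independent variable is monotonic:

• xtile lnta_categ=lnta, nquan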
