




556 reads | 2010-5-8 10:56 | Personal category: Stata | System category: Research notes | Keywords: SAS, Stata

SAS | Stata

SAS:
In SAS, operators can be symbols or mnemonic equivalents, such as:
    &  or  and
For many situations in SAS, the order of the symbols doesn't matter:
    <=  can be:  =<
    >=  can be:  =>

Stata:
Most operators are the same in Stata as in SAS, but in Stata operators do not have mnemonic equivalents. For example, you have to use the ampersand (&) and not the word "and".
This works:
    var_a == 1 & var_b == 1
This does not:
    var_a == 1 and var_b == 1
Comparison operators:
    >=  greater than or equal to
    <=  less than or equal to
    !=  not equal to

SAS:
    /* this is a comment */
    * this is also a comment ;

Stata:
    /* this is a comment */
    * this is also a comment
    // this is a comment as well
To continue a command to the next line (line continuation):
    /// you can comment here as well
For example:
    list id state gender age income ///
         race income date

Range of values:

SAS:
    if 1 <= var_a <= 10;

Stata:
    if var_a >= 1 & var_a <= 10
or:
    if inrange(var_a,1,10)
or:
    if inlist(var_a,1,2,3,4,5,6,7,8,9,10)
or a list of string values:
    if inlist(state,"NC","AZ","TX","NY","MA","CA","NJ")
Stata has a limit of 10 arguments to inlist() (which includes the string variable) when the arguments are strings. More than one variable can be specified.

Referencing multiple variables at a time:

SAS:
Say the following variables are in a data file in the order shown:
    var1 var2 var3 age var4 var5
Then you could code them as:
    var1--var5
To SAS, this means all variables that are positionally between var1 and var5, which would include the variable age.

Stata:
    var1-var5
To Stata, this means all variables that are positionally between var1 and var5. Notice that there is only one dash (-).

SAS:
    var1-var5
is the same as:
    var1 var2 var3 var4 var5
no matter what positions the variables have in the observation.
Using a colon selects variables that share the same prefix:
    var:
could represent:
    var1 var2 var10 variable varying var_1

Stata:
    var?
The question mark (?) is a wild card that represents one character in the variable name.
It could be a number, a letter, or an underscore (_).
    var*
The asterisk/star (*) is a wild card that represents any number of characters (including none) in the variable name. They could be numbers, letters, or underscores. Thus:
    var*
could represent:
    var1 var2 var10 variable varying var_1

SAS:
To save the contents of the Log window and/or Output window, go to that window and choose File, Save from the menu bar. In SAS batch mode these files are automatically generated for you.

Stata:
To save the contents of the Results window, start logging to a log file BEFORE you submit the commands that you want logged. Open a log file by clicking the toolbar icon that looks like a scroll and a traffic light. A *.log file is a simple ASCII text file; a *.smcl file is formatted with HTML-like tags. You can also use the log command:
    log using "D:\mydata\mydofile.log", replace
Note: The replace option simply tells Stata to overwrite the log file if it already exists. This is helpful when you have to run a do-file over and over again.

SAS:
    libname in "D:\mydata";
    data new;
      set in.mySASfile;
    run;
or, starting in SAS 8:
    data new;
      set "D:\mydata\mysasfile.sas7bdat";
    run;

Stata:
    use "D:\mydata\myStataFile.dta"
You can also click on the open file icon and select your dataset.

Save the dataset newer to D:\mydata:

SAS:
    libname in "D:\mydata";
    data in.newer;
      set new;
    run;

Stata:
    save "D:\mydata\newer.dta"
To overwrite the dataset newer if it already exists:
    save "D:\mydata\newer.dta", replace
You can also click on the save icon to save your dataset.
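
The logging, loading, and saving steps above can be combined into one short do-file. This is a minimal sketch, not part of the original notes: it assumes the auto dataset that ships with Stata, and it writes to temporary files (the local macros mylog and mydata are hypothetical names) so that no real path is needed.

```stata
* Sketch only: log a session, load an example dataset, save a copy.
* mylog and mydata are illustrative temporary filenames.
tempfile mylog mydata

* Start logging BEFORE the commands that should appear in the log.
log using "`mylog'", text replace name(demo)

sysuse auto, clear              // example dataset shipped with Stata
describe make price mpg         // like proc contents on selected variables
save "`mydata'", replace        // replace overwrites an existing file

log close demo
```

Naming the log (name(demo)) keeps this sketch from interfering with any other log that is already open.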
SAS:
    proc contents;
On selected variables:
    proc contents data = in.newer (keep= id state gender age income);
    run;

Stata:
    describe
On selected variables:
    describe id state gender age income

SAS:
    proc means;
On selected variables:
    proc means;
      var age income;
    run;
or:
    proc univariate;
      var age income;
    run;

Stata:
    summarize
On selected variables:
    summarize age income
If you want variable labels and a proc univariate style output, try:
    summarize age income, detail
or:
    codebook age income

SAS:
    proc freq;
      table var1;
    run;

Stata:
    tabulate var1
or, for just checking out your dataset, try the codebook command.

A series of 1-way tables:

SAS:
    proc freq;
      tables var1 var2;
    run;

Stata:
    tab1 var1 var2

A 2-way table:

SAS:
    proc freq;
      tables var1*var2;
    run;

Stata:
    tab2 var1 var2

SAS:
    proc print;
On selected variables in this order:
    proc print;
      var id age income;
    run;
On selected variables and a limited range of observations:
    proc print data = new (firstobs = 1 obs = 20);
      var id age income;
    run;

Stata:
    list
On selected variables in this order:
    list id age income
On selected variables and a limited range of observations:
    list id age income in 1/20

SAS:
Create a numeric variable with a default length of 8 bytes:
    var1 = 1234;
Create a numeric variable with the minimum allowable length (3 bytes):
    length var1 3;
    var1 = 1234;

Stata:
    generate var1 = 1234
Note: the default numeric data type is float. The statement above relies on that default. It could have been written explicitly as:
    generate float var1 = 1234
float stands for floating point decimal. You could save storage space by specifying:
    gen int var1 = 1234
int stands for integer data type.
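
To see the effect of the storage types described above, the following sketch creates the same value with the default type and with an explicit int, then inspects both. The variable names and the three-observation dataset are made up for illustration.

```stata
* Sketch only: compare default and explicit storage types.
clear
set obs 3                        // three empty observations to work with

generate var1 = 1234             // no type given, so var1 is a float
generate int var2 = 1234         // explicit 2-byte integer

describe var1 var2               // the storage type column shows float and int

* compress picks the smallest type that holds the data without loss;
* every value of var1 is a whole number in int range, so it can be demoted.
compress
describe var1 var2
```

Running compress before saving is a common way to get the space savings without choosing each type by hand.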
Create a character variable with a length of 3 bytes:

SAS:
    name = "Bob";

Stata:
Generate a string variable with a length of 3 bytes:
    gen str3 name = "Bob"

Increase the variable length to allow for 5 characters:

SAS:
    data new;
      length name $5;
      set new;
      *Change the values of numeric
      * and character variables:
      *;
      var1 = 123456;
      name = "Bobby";
    run;

Stata:
    replace var1 = 123456
Stata automatically increases the storage type if necessary. To change the storage of a variable manually, use the recast command.
    replace name = "Bobby"
Stata automatically increases the length to 5.

Example of an if-then statement:

SAS:
    if var1 = 123456 then var2 = 1;

Stata:
The condition follows the command:
    replace var2 = 1 if var1 == 123456
Notice that Stata requires two equals signs when testing equality.

Example of an if-then do loop:

SAS:
    if age <= 10 then do;
      child = 1;
      parent = 0;
    end;

Stata:
    replace child = 1 if age <= 10
    replace parent = 0 if age <= 10
Since each command is executed on all observations before the next command is executed, the if-then-do loop is not an option. Stata does have excellent looping tools: foreach, forvalues, and while.

Example of an if-then-else:

SAS:
    if 0 <= age <= 2 then agegp = 1;
    else if 2 < age <= 10 then agegp = 2;
    else if 10 < age <= 20 then agegp = 3;
    else if 20 < age ...

Stata:
    gen agegp = 1 if age >= 0 & age <= 2
    replace agegp = 2 if age > 2 & age <= 10
    replace agegp = 3 if age > 10 & age <= 20
    ...
Where a range is tested with >= and <=, adding .9999 to the upper range ensures that fractional values are handled correctly.
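
The if-then-else recoding above can be tried on a few made-up ages. This is a sketch under stated assumptions: the cut points 2, 10, and 20 come from the example, the data values are invented, and irecode() is shown only as one alternative way to express the same bands.

```stata
* Sketch only: age groups with explicit conditions, then with irecode().
clear
input age
0
2
2.5
10
15
20
.
end

* One command per band; each runs over all observations.
generate byte agegp = 1 if age >= 0 & age <= 2
replace agegp = 2 if age > 2 & age <= 10
replace agegp = 3 if age > 10 & age <= 20

* irecode(x, 2, 10, 20) is 0 if x <= 2, 1 if 2 < x <= 10,
* 2 if 10 < x <= 20, 3 if x > 20, and missing if x is missing.
generate byte agegp2 = irecode(age, 2, 10, 20) + 1

* Both versions agree on this data, including the missing age.
assert agegp == agegp2
list
```

Note that irecode() puts negative values in the first band, while the explicit conditions leave them missing, so the two are only interchangeable when age cannot be negative.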
Drop variables var1, var2, and var3:

SAS:
    data new(drop= var1 var2 var3);
      set new;
    run;

Stata:
    drop var1 var2 var3

Keep variables var1, var2, and var3:

SAS:
    data new(keep= var1 var2 var3);
      set new;
    run;

Stata:
    keep var1 var2 var3

Keep observations / subsetting if statement:

SAS:
    data new;
      set new;
      if var1 = 1 then output new;
    run;

Stata:
    keep if var1 == 1

Delete observations:

SAS:
    data new;
      set new;
      if var1 = 1 then delete;
    run;

Stata:
Drop observations:
    drop if var1 == 1

Loop over a variable list (varlist):

SAS:
    data new(drop= i);
      set new;
      array raymond {4} var1 var2 var3 var4;
      do i = 1 to 4;
        if raymond{i} = 99 then raymond{i} = . ;
      end;
    run;
Check out this array example in the SAS programming examples page.

Stata:
    foreach i of varlist var1 var2 var3 var4 {
      replace `i' = . if `i' == 99
    }
Note: Notice that the quote to the left of the local macro variable i is a left quote (`). The left quote is located at the top of your keyboard next to the (!1) key. In this example i is a local macro variable that exists only for the duration of the foreach command, so it does not need to be dropped like the variable i in the SAS code.

Create variable labels:

SAS:
    label age = "age in years"
          income = "salary plus bonuses" ;

Stata:
    label var age "age in years"
    label var income "salary plus bonuses"

Define a format:

SAS:
    proc format;
      value yesno 1 = "yes"
                  2 = "no" ;
    run;
Assign the format to a variable:
    data newer;
      set newer;
      format smokes yesno.;
    run;

Stata:
Define a format. These are called value labels:
    label define yesno 1 "yes" /*
        */ 2 "no"
Assign the value label to a variable:
    label value smokes yesno

Remove formats from a variable:

SAS:
    data newer;
      set newer;
      * just do not specify a format *;
      format smokes ;
    run;

Stata:
    label value smokes .
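
The foreach loop and the value labels described above can be run end to end on a small made-up dataset. In this sketch the data values are invented, and the loop variable is called v rather than i purely for readability.

```stata
* Sketch only: recode 99 to missing across a varlist, then label values.
clear
input var1 var2 smokes
1  99 1
99 5  2
3  4  1
end

* `v' expands to each variable name in turn: left quote, name, right quote.
foreach v of varlist var1 var2 {
    replace `v' = . if `v' == 99
}

* Value labels play the role of a SAS format.
label define yesno 1 "yes" 2 "no"
label values smokes yesno

list
tabulate smokes
```

The local macro v disappears when the loop ends, so there is nothing to drop afterward.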
Assign formats defined by SAS to a variable:

SAS:
    format interview_date mmddyy8.;

Stata:
Assign formats defined by Stata to a variable:
    format interview_date %tdNN/DD/YY
    /* pre Stata 10 the format did not start
     * with the letter t and did not
     * need two letters for each part of the date: */
    format interview_date %dN/D/Y
Note: The letter N in %tdNN/DD/YY stands for the number of the month. Specifying Mon in %tdDDMonCCYY uses the three-letter abbreviation of the name of the month. So %tdNN/DD/YY displays as 11/06/45 and %tdDDMonCCYY displays as 06Nov1945.

SAS:
    title "Number of Companies That Got Acquired";

Stata:
Since the Results window/log file is a mix of both the log and the Output window, Stata doesn't need a title statement. Titling can be accomplished with a comment.
    /* Number of Companies That Got Acquired */

SAS:
    proc sort data = new out = newer;
      by id;
    run;

Stata:
    sort id

SAS:
    proc sort data= sashelp.shoes (keep= region product subsidiary stores sales inventory)
              out= work.shoes;
      by region subsidiary product;
    run;

    /* fix flaw in dataset
     * where the Copenhagen subsidiary
     * has 2 obs for product = "Sport Shoe" */
    proc summary nway data= work.shoes;
      /* the by statement fixes
       * the variable order in work.shoes */
      by region subsidiary product;
      var stores sales inventory;
      output out= work.shoes (drop= _TYPE_ _FREQ_) sum=stores sales inventory;
    run;

    /* long to wide because:
     * there are repeats of by-variable values */
    proc transpose data= work.shoes out= shoes_wide prefix=prodnum;
      by region subsidiary;
      var product;
    run;

Stata:
    keep region subsidiary product
    bysort region subsidiary (product) : gen prodnum = _n
    reshape wide product, ///
        i(region subsidiary) j(prodnum)
The xpose command is similar but only works with numeric data. It will turn string variables into missing values.
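
The long-to-wide reshape above can be tried without the SAS shoes data. The sketch below uses three invented rows (the region, subsidiary, and product values are placeholders, not taken from sashelp.shoes) to show how the within-group counter becomes the suffix of the wide variables.

```stata
* Sketch only: long-to-wide reshape on made-up data.
clear
input str10 region str12 subsidiary str12 product
"Africa" "Cairo" "Boot"
"Africa" "Cairo" "Sandal"
"Asia"   "Seoul" "Boot"
end

* Number the products within each region/subsidiary group.
bysort region subsidiary (product): generate prodnum = _n

* One row per group; product1, product2, ... hold the former rows.
reshape wide product, i(region subsidiary) j(prodnum)
list
```

Groups with fewer products than the largest group get empty strings in the unused product columns.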
SAS:
    /* wide to long because:
     * there are no repeats of by-variable values */
    proc transpose data= work.shoes_wide out= shoes_long name=prodnum;
      by region subsidiary;
      var prodnum: ;
    run;

Stata:
    // j(prodnum) just names the _j variable prodnum
    reshape long product, i(region subsidiary) j(prodnum)
Check out this reshape example in the Stata code examples page.

Using by-groups:

SAS:
    data newer;
      set newer;
      by id;
      if first.id = 1 then f_num = 1;
      if first.id = 1 and last.id = 1 then s_num = 1;
      if last.id = 1 then l_num = 1;
    run;

Stata:
    by id: gen f_num = 1 if _n == 1
    by id: gen s_num = 1 if _n == 1 & _N == 1
    by id: gen l_num = 1 if _n == _N
Stata's _n is equivalent to SAS's _n_ in that it is equal to the observation number; but inside a by command _n is equal to 1 for the first observation of the by-group, 2 for the second observation of the by-group, etc. Stata's _N is equal to the number of observations in the dataset, except in a by command, where it is equal to the total number of observations in the by-group.

Count the total number of observations within each ID group, and add that total to each observation:

SAS:
    proc summary data= new nway;
      class id;
      var age;
      output out= temp(drop= _type_ _freq_) n= totboys;
    run;
    proc sort data= temp;
      by id;
    run;
    proc sort data= new;
      by id;
    run;
    data newer;
      merge new temp;
      by id;
    run;

Stata:
    bysort id: egen totboys = count(age)
Note: in both SAS and Stata, the count will be the number of observations where the variable being counted has a non-missing value. Here we used the variable age.

Create a cumulative/running sum of boys within each ID group:

SAS:
    data new;
      set newer;
      by id;
      retain count 0;
      if first.id then count = 0;
      if gender = 1 and age <= 18 then count = count + 1;
    run;

Stata:
    bysort id: gen count = sum(gender == 1 & age <= 18)

SAS:
    data both;
      merge in.new(in = a) in.newer(in = b);
      by id;
      if a = 1 and b = 1;
    run;
Check out this merge example in the SAS programming examples page.
Stata:
    use "D:\mydata\new.dta"
    sort id
    /* Starting in Stata 11 you have to specify
     * what type of merge you are doing, and you no longer
     * have to have your datasets sorted before the merge.
     * This is a one-to-one merge: */
    merge 1:1 id using "D:\mydata\newer.dta"
    // or in previous versions of Stata:
    merge id using "D:\mydata\newer.dta"
    keep if _merge == 3
Stata automatically creates the variable _merge after a merge. Stata will not merge on another dataset if the variable _merge already exists in one of the datasets. The dataset in memory is the master dataset. The dataset that is being merged on is the using dataset. Unlike SAS, variables shared by the master dataset and the using dataset will not be updated (values overwritten) by the using dataset. Like SAS, the formats, labels, and informats of variables shared by the master dataset and the using dataset will be defined by the master dataset. Remember that the master always wins. Use the update option to overwrite missing data in the master file.

Concatenate two datasets / add observations to a dataset:

SAS:
    data both;
      set in.new in.newer;
    run;

Stata:
    use "D:\mydata\new.dta"
    append using "D:\mydata\newer.dta"
    /* Starting in Stata 11 you can use append without
     * having a dataset already in memory: */
    append using "D:\mydata\new.dta" "D:\mydata\newer.dta"

Sort datasets in order to prepare them for a merge:

SAS:
Sort permanently stored datasets and create new, sorted copies in the WORK library:
    proc sort data = in.company out = work.company;
      by id;
    run;
    proc sort data = in.firm out = work.firm;
      by id;
    run;
    data temp2;
      merge firm(in = a) company(in = b);
      by id;
    run;

Stata:
Sorting datasets in order to prepare them for a merge is only required if you are using a version of Stata prior to Stata 11.
Create a local macro variable to represent a filename for Stata to use in temporarily storing a data file on the computer's hard drive if requested to do so later:
    tempfile company
    use "D:\mydata\company.dta"
    sort id
Save the dataset that's currently in memory to a temporary filename in Stata's temp directory.
This file will be deleted when Stata is exited, just like a dataset in SAS's WORK library:
    save `company'
    use "D:\mydata\firm.dta"
    // pre Stata 11 code:
    sort id
    merge id using `company'
    /* Starting in Stata 11 the data does not need to
     * be sorted but the type of merge needs to be
     * specified, as in this one-to-one merge: */
    merge 1:1 id using `company'

SAS:
    proc surveymeans;
      cluster sampunit;
      strata stratum;
      var age income;
      weight sampwt;
    run;

Stata:
    svyset sampunit [pweight = sampwt], strata(stratum)
    svy: mean age income

SAS:
Analyze a subpopulation by implementing the domain option:
    proc surveymeans;
      cluster sampunit;
      strata stratum;
      domain females;
      var age income;
      weight sampwt;
    run;

Stata:
Analyze a subpopulation by implementing the subpop option:
    svy: mean age income, subpop(females)
Note: options come after a comma (,).

SAS:
Starting in SAS 9:
    proc surveyfreq;
      cluster sampunit;
      strata stratum;
      tables females*var1*var2;
      weight sampwt;
    run;
When using proc surveyfreq, the domain/subpop variable needs to be included in the tables statement.

Stata:
    svyset sampunit [pweight = sampwt], strata(stratum)
    svy: tab var1 var2, subpop(females)
    svy: tab var1 , subpop(females)

SAS:
    proc surveyreg;
      cluster sampunit;
      strata stratum;
      model depvar = indvar1 indvar2 indvar3;
      weight sampwt;
    run;
The surveyreg procedure does not have a way of dealing with subpopulations. Using by or where will not suffice, as they will compute incorrect standard errors.

Stata:
    svyset sampunit [pweight = sampwt], strata(stratum)
    svy: regress depvar indvar1 indvar2 indvar3, ///
        subpop(females)

SAS:
Starting in SAS 9:
    proc surveylogistic;
      cluster sampunit;
      strata stratum;
      model depvar = indvar1 indvar2 indvar3;
      weight sampwt;
    run;
The surveylogistic procedure does not have a way of dealing with subpopulations. Using by or where will not suffice, as they will compute incorrect standard errors.