asadv63题解析和机经63题答案解析_第1页
asadv63题解析和机经63题答案解析_第2页
asadv63题解析和机经63题答案解析_第3页
asadv63题解析和机经63题答案解析_第4页
asadv63题解析和机经63题答案解析_第5页
已阅读5页,还剩85页未读 继续免费阅读

下载本文档

版权说明:本文档由用户提供并上传,收益归属内容提供方,若内容存在侵权,请进行举报或认领

文档简介

1.Whenattemptingtominimizememory

usage,

themostefficientwaytodogroupprocessingwhenusingtheMEANSprocedureisto

use:

A.theBYstatement.

B.GROUPBYwiththeNOTSORTEDspecification.

C.theCLASSstatement.

D.multipleWHEREstatements.

ShapeoftheDataMostEfficientTools

arelativelysmallnumberofdistinctcombinationsofvalues・PROCMEANS(orPROC

oftheclassvariable-thesummarizeddatacontains10%SUMMARY)

orfeweroftheobservationsintheoriginaldataset•PROCREPORT

•PROCTABULATE

arelativelylargenumberofdistinctcombinationsofvalues•PROCSQL

oftheclassvariable-thesummarizeddatacontainsmore•DATAstepandPROCSORT

than10%oftheobservationsintheoriginaldataset

由UsingaGROUPBYstatementin

PROCSQLissimilarinCPUusageto

aPROCSORTst叩followedbya

DATAstep.

ComparingResourceUsageacrossThreeTechniquesforUsingPROCMEANS

Thethreetechniquesforsummarizingdataforspecificcombinationsofclass

variables(allbutthebasicPROCMEANSstep)differinresourceusageasfollows:

•TheTYPESstatementinaPROCMEANSstepusesthefewestresources.

•AprogramthatcontainstheNWAYoptioninmultiplePROCMEANSstepsuses

themostresourcesbecauseSASmustreadthedatasetseparatelyforeachPROC

MEANSstep.

•TheWHERE=datasetoptioninaPROCMEANSstepusesmoreresourcesthanthe

TYPESstatementinPROCMEANSbecauseSASmustcalculateallpossible

combinationsofclassvariablesbeforesubsetting.However,theWHERE=data

setoptioninPROCMEANSusesfewerresourcesthantheNWAYoptioninmultiple

PROCMEANSsteps.

2.TheSASdatasetWORK.CHECKhasa

variablenamedId_Codeinit.

WhichSQLstatementwouldcreateanindexonthisvariable?

A.createindexIdCodeonWORK.CHECK;

B.createindex(ld_Code)onWORK.CHECK;

C.makeindex=IdCodefromWORK.CHECK;

1).defineindex(Id_Code)inWORK.CHECK;

关于在SQL语句中创建INDEX的语句格式如下:

Generalform,CREATEINDEXstatement:

CREATE<UN1QUE>INDEXindex-nameONtable-name(column-name-l<,,,,column-name-n>);

示例语句:

/*creatingasimpleindex*/

datacrackman;

inputid_codename$@;

datalines;

1java

2c++

3dephi

4asp

5c#.net

r

procsql;

createindexid_codeonwork.crackman;

quit;

log提示简单索引id_code已经建立

或者使用:

procsql;

createindexidcodeonwork.crackman(id_code);

quit;

之所以括号内的column-name是可选项,是应为在建立Simpleindex时,

index-name必须与column-name是同一个名字。

datacrackman;

inputid_codename

datalines;

1java

2c++

3dephi

4asp

4c#.net

r

procsql;

createuniqueindexid_codeonwork.crackman;

quit;

LOG:提示ERROR:文件CRACKMAN的索弓Iid_code不允许使用重复的值。

/*dropingindexse*/

procsql;

dropindexdaily

fromwork.marchflights;

quit;

3.GiventheSASdatasets:

WORK.EMPLOYEEWORK.NEWEMPLOYEE

NameDeptNamesSalary

AlanSalesMichelle50000

MichelleSalesParesh60000

ASASprogramissubmittedand

thefollowingiswrittentotheSASlog:

101procsql;

102selectdept,name

103fromWORK.EMPLOYEE

104wherename=(selectnamesfromnewemployeewheresalary>40000)

ERROR:Subqueryevaluatedtomorethanonerow.

105;

106quit;

Whatwouldallowtheprogramto

successfullyexecutewithouterrors?

A.Replacethewhereclausewith:

whereEMPLOYEE.Name=(selectNamesdelimitedwith

fromWORK.NEWEMPLOYEE

whereSalary>40000);

B.Replaceline104with:

whereEMPLOYEE.Name=ANY(selectNamesseparatedwith

fromWORK.NEWEMPLOYEE

whereSalary>40000);

C.ReplacetheequalsignwiththeINoperator.

D.Qualifythecolumnnameswiththetablenames.

首先分析这个原程序第104行,中wherename=(selectnamesfromnewemployeewhere

salsry>40000)

原程序在执行过程,应先是执行子查询,selectnamesfromnewemployeewhere

salsry>40000,将返回的结果作为下一个查询的筛选条件。

在这里返回的结果NAMES这个变量包括了两个值Michelle和Paresh,所以在执行过程出现

错误。那么应该改成IN或者ANYoINANY指定的是一个匹配范围,而等号必须是•个具体

匹配的值。

这里可以思考SEPARATEDWITH这个语句,SAS认为是错误。

4.GiventheSASdataset

SASUSER.HIGHWAY:

SteeringSeatbeltSpeedStatusCount

absentNo0-29serious31

absentNo0-29not1419

absentNo30-49serious191

absentno30-49not2004

absentno50+serious216

ThefollowingSASprogramissubmitted:

procsqlnoprint;

selectdistinct

Speed[insertSQLclause]

fromSASUSER.HIGHWAY;

quit;

titlel〃Speedvaluesrepresentedare:&GROUPS”;

procprintdata=SASUSER.HIGHWAY;

run;

WhichSQLclausestoresthetext0-29,30-49,50+in

themacrovariableGROUPS?

A.into&GROUPS

B.into:GROUPS

C.into:GROUPSseparatedby','

D.into&GROUPSseparatedby','

此题考察的是SQL与宏的结合。

用SQL语句产生宏变量,语法结构如下:

PROCSQLNOPRINT;

SELECTcolumnl<,column2,...>

INTO:macro-variable-1<,:macro-variable-2,...>

FROMtable-1|view-1

<WHEREexpression>

<otherclauses>;

QUIT;

下面这个语法结构是将一变量的多个值组合在一个宏变量里面,例如:

Generalform,SELECTstatementwithINTOclausefbrcombiningvaluesintoone

macrovariable:

PROCSQLNOPRINT;

SELECTcolumn!

INTO:macro-variable-1

SEPARATEDBY'delimited'

FROMtable-1|view-1

<WHEREexpression>

<otherclauses>;

QUIT;

/*示例程序:*/

datacrackman;

inputSteering$Seatbelt$Speed$Status$Count@;

datalines;

absentNo0-29serious31

absentNo0-29not1419

absentNo30-49serious191

absentno30-49not2004

absentno50+serious216

run;

procsql;

11

selectdistinctspeedinto:groupsseparatedbyzfrom

crackman;

quit;

这里创建的宏变量是全局宏变量

/*Supposeyouwanttocreaterangesofmacrovariablesthat

containthecoursecode,location,andstartingdateofall

coursesthatarescheduledin2002.*/

procsqlnoprint;

selectcount(*)into:numrows

fromsasuser.schedule

whereyear(begindate)=2002;

%letnumrows=&numrows;

%putThereare&numrowscoursesin2002;

selectcourse_codezlocation,begindateformat=mmddyylO.

into:crsidl-:crsid&numrows,

splacel-:place&numrowsz

:datel-:date&numrows

fromsasuser.schedule

whereyear(begindate)=2002

orderbybegindate;

%put_user__;

quit;

5.TheSASdatasetWORK.CHECKhasan

indexonthevariableCodeandthefollowing

SASprogramissubmitted.

procsortdata=W0RK.CHECK;

byCode;

run;

Whichdescribestheresultofsubmitting

theSASprogram?

A.TheindexonCodeisdeleted.

B.TheindexonCodeisupdated.

C.TheindexonCodeisuneffected.

D.Thesortdoesnotexecute.

SORT过程是不能对已经建立索引INDEX的数据集进行SORT或者REPLACE,因为那样会破坏数

据集的INDEX文件。

当然如果非要排序,破坏INDEX文件,可以再后面加入一个参数force。

6.ThetableWORK.PILOTScontainsthe

followingdata:

WORK.PILOTS

IdNameJobcodeSalary

001AlbertPT150000

002BrendaPT170000

003CarlPT160000

004DonnaPT280000

005EdwardPT290000

006FloraPT3100000

Thedatasetwassummarizedtoincludeaverage

salarybasedonjobcode:

JobcodeSalaryAvg

P/p>

P/p>

PT16000060000

PT28000085000

PT29000085000

PT3100000100000

WhichSQLstatementcould\0Tgenerate

thisresult?

A.select

Jobcode,

Salary,

avg(Salary)label=,Avg'

fromWORK.PILOTS

groupbyJobcode

orderbyId

B.select

Jobcode,

Salary,

(selectavg(Salary)

fromWORK.PILOTSasPl

wherePl.Jobcode=P2.Jobcode)asAvg

fromWORK.PILOTSasP2

orderbyId

C.select

Jobcode,

Salary,

(selectavg(Salary)

fromWORK.PILOTS

groupbyJobcode)asAvg

fromWORK.PILOTS

orderbyId

D.select

Jobcode,

Salary,

Avg

from

WORK.PILOTS,

(select

JobcodeasJc,

avg(Salary)asAvg

fromWORK.PILOTS

groupby1)

whereJobcode=Jc

orderbyId

DATAPILOTS;

inputIdName$Jobcode$Salary@@;

datalines;

001AlbertPT150000

002BrendaPT170000

003CarlPT160000

004DonnaPT280000

005EdwardPT290000

006FloraPT3100000

run;

/*选项A*/

procsql;

select

Jobcode,

Salaryz

avg(Salary)label='Avg1

fromWORK.PILOTS

groupbyJobcode

orderbyId;

quit;

/*NOTE:指定的查询涉及按其SELECT子句中没出现的一项排序。*

//NOTE:查询要求将汇总统计量与原始的数据重新合并。可以运行出结果,sas没有报

错★/

/*选项B*/

procsql;

select

Jobcode,

Salaryz

(selectavg(Salary)

fromWORK.PILOTSasPl

wherePl.Jobcode=P2.Jobcode)asAvg

fromWORK.PILOTSasP2

orderbyId;

quit;

*这是一个correctedsubqueriesz

Correlatedsubqueriescannotbeevaluatedindependentlyzbut

dependonthevalues

passedtothembytheouterqueryfortheirresults.;

/*选项c*/

procsql;

select

Jobcodez

Salary,

(selectavg(Salary)

fromWORK.PILOTS

groupbyJobcode)asAvg

fromWORK.PILOTS

orderbyId;

quit;

/*运行不出结果,因为在括号中是multiple-valuenoncorrectedsuqueries,

Somesubqueriesaremultiple-valuesubqueries:theyreturnmorethan

onevalue(row)totheouterquery.

Ifyournoncorrelatedsubquerymightreturnavalueformorethanone

row,buttheWHEREorHAVINGclauseintheouterquery

containsanoperatorotherthanoneoftheoperatorsthatarespecified

above,thequerywillfail.Anerrormessage

isdisplayedintheSASlog,whichindicatesthatthesubqueryevaluated

tomorethanonerow.*/

/*选项d*/

procsql;

select

Jobcodez

Salary,

Avg

from

WORK.PILOTS,

(select

JobcodeasJcz

avg(Salary)asAvg

fromWORK.PILOTS

groupby1)

whereJobcode=Jc

orderbyId;

quit;

/*括号内的子查询已经得出一个各个jobcode的均值,形成一个view,通过where语

句进行match,最后可以得出结果//

7.Aquickruleofthumbforthespace

requiredtorunPROCSORTis:

A.twotimesthesizeoftheSASdatasetbeingsorted.

B.threetimesthesizeoftheSASdatasetbeingsorted.

C.fourtimesthesizeoftheSASdatasetbeingsorted.

D.fivetimesthesizeoftheSASdatasetbeingsorted.

这道题有争议,sasprepguide里面的原话是这样的:

TheSORTprocedureisonetechniquethatcanbeusedtogroupororderdata.

However,theSORTprocedurecanuseahighvolumeofresources.Whenanuncompressed

datafileissortedusingtheSORTprocedure,SASrequiresenoughspaceinthedata

libraryfortwocopiesofthedatafile,plusaworkspacethatisapproximatelytwo

tofourtimesthesizeofthedatafile.

However,Themultiplier4appliesonlytoutilityfilesusedinreleasesprior

toSAS9,whenPROCSORTneedstousediskspaceinordertosortthedata.For

in-memorysortingandsortingwithSAS9andlater,themultiplieris2orless.

ThreadedprocessingtakesadvantageofmultipleCPUsbyexecutingmultiple

threadsinparallel(parallelprocessing).Threadedproceduresarecompletedinless

realtimethanifeachtaskwerehandledsequentially,althoughtheCPUtimeis

generallyincreased.

Whenathreadedsortisused,theobservationsintheinputdatasetaredivided

intoequaltemporarysubsets,basedonhowmanyprocessorsareallocatedtotheSORT

procedure.Eachsubsetisthensortedonadifferentprocessor.Thesortedsubsets

aretheninterleavedtorecreatethesortedversionoftheinputdataset.

所以最后选A两倍。

8.Multi-threadedprocessingforPROC

SORTwilleffectwhichofthesesystem

resources?

A.CPUtimewilldecrease,wallclocktimewilldecrease

B.CPUtimewillincrease,wallclocktimewilldecrease

C.CPUtimewilldecrease,wallclocktimewillincrease

1).CPUtimewillincrease,wallclocktimewillincrease

Thread-enabledproceduresarecompletedinsubstantiallylessrealtimethanifeach

taskishandledsequentially,buttheCPUtimeisgenerallyincreased.

ThreadedsortingisenabledordisabledbyusingtheSASsystemoptionTHREADS|

NOTHREADSortheTHREADS|NOTHREADSprocedureoption.

Generalform,SORTprocedurewiththeTHREADS|NOTHREADSoption:

PROCSORTSAS-data-set-nameTHREADS|NOTHREADS;

9.GiventheSASdataset

WORK.TRANSACT:

RepCostShip

SMITH20050

SMITH40020

JONES10010

SMITH600100

JONES1005

Thefollowingoutputisdesired:

Rep

JONES105

SMITH250

WhichSQLstatementwasused?

A.select

rep,

min(Cost+Ship)

fromWORK.TRANSACT

orderbyRep

B.select

Rep,

min(Cost,Ship)asMin

fromWORK.TRANSACT

summarybyRep

orderbyRep

C.select

Rep,

min(Cost,Ship)

fromWORK.TRANSACT

groupbyRep

orderbyRep

D.select

Rep,

min(Cost+Ship)

fromWORK.TRANSACT

groupbyRep

orderbyRep

/*示例程序:*/

dataTRANSACT;

inputRep$CostShip@;

datalines;

SMITH20050

SMITH40020

JONES10010

SMITH600100

JONES1005

run;

/★选项A★/

procsql;

selectrepfmin(Cost+Ship)

fromWORK.TRANSACTorderbyRep;

quit;

/*

JONES105

JONES105

SMITH105

SMITH105

SMITH105

得出的结果应该是没有分组比较的结果,就是自身比较,所以会得到5个结果,没有分组,

而且M1N的值都为105*/

/*选项B*/

procsql;

selectRep,min(CostrShip)asMin

fromWORK.TRANSACT

summarybyReporderbyRep;

quit;

/*ERROR22-322:语法错误,期望下列之一:,3,GROUP.*/

/*选项C*/

procsql;

selectRepzmin(CostzShip)

fromWORK.TRANSACT

groupbyReporderbyRep;quit;

/*Rep

JONES10

JONES5

SMITH50

SMITH20

SMITH100

得到是在一个REP上的这两个变量COSTSHIP之间的小值。*

/*选项D*/

procsql;

selectRep,min(Cost+Ship)

fromWORK.TRANSACT

groupbyReporderbyRep

r

quit;

/*因为在要得到的数据集中有一个是没有命名的,所以在

中MIN(COST+SH工P)是没有命名的变量的值。*/

变形题:

GiventheSASdatasetWORK.TRANSACT:

RepCostShip

SMITH20050

SMITH40020

JONES10010

SMITH600100

JONES1005

Thefollowingoutputisdesired:

Rep

JONES105

JONES105

SMITH105

SMITH105

SMITH105

WhichSQLstatementwasused?

A.

selectRep,min(Cost+Ship)

fromWORK.TRANSACT

orderbyRep

B.

selectRep,min(Cost,Ship)asMin

fromWORK.TRANSACT

summarybyRep

orderbyRep

C.

selectRep,min(Cost,Ship)

fromWORK.TRANSACT

groupbyRep

orderbyRep

D.

selectRep,min(Cost+Ship)

fromWORK.TRANSACT

groupbyRep

orderbyRep

Answer:A

10.ThefollowingSASprogramis

submitted:

%letValue=9;

%letAdd=5;

%letNewval=%eval(&Value/&Add);

%put&Newval;

Whatisthevalueofthemacrovariable

Newvalwhenthe%PUTstatementexecutes?

A.0.555

B.2

C.1.8

D.1

%PUTstatementshowsthat%EVALdiscardsthefractionalpart(小数部分),whenit

performsdivisiononintegersthatwouldresultinafraction:

如果要获得准确的结果应该选择函数%sysevalf替换%eval.

the%EVALfunctiongeneratesERRORmessagesinthelogwhenitencountersan

expressionthatcontainsnon-integervalues.InordertoavoidtheseERRORmessages,

youcanusethe%SYSEVALFfunction.The%SYSEVALFfunctionevaluatesarithmeticand

logicalexpressionsusingfloating-pointarithmetic.

11.ThefollowingSAScodeissubmitted:

dataWORK.TEMPWORK.ERRORS/view=W0RK.TEMP;

infileRAWDATA;

inputXaXbXc;

ifXa=.thenoutputWORK.ERRORS;

elseoutputWORK.TEMP;

run;

WhichofthefollowingistrueoftheWORK.ERRORSdataset?

A.ThedatasetiscreatedwhentheDATAstepissubmitted.

B.ThedatasetiscreatedwhentheviewTEMPisusedinanotherSASstep.

C.ThedatasetisnotcreatedbecausetheDATAstatementcontainsasyntaxerror.

I).ThedescriptorportionofWORK.ERRORSiscreatedwhentheDATAstepissubmitted.

UsingSASDATAStepViewstoConserveDataStorageSpace(continued)

DATAStepViews

ADATAstepviewcontainsapartiallycompiledDATAstepprogramthatcanreaddatafromavarietyofsources,including

•rawdatafiles

•SASdatafiles

・PROCSQLviews

•SAS/ACCESSviews

•DB2,ORACLE,orotherDBMSdata.

ADATAstepviewcanbecreatedonlyinaDATAstepADATAstepviewcannotcontainglobalstatements,host-specificdatasetoptions,ormosthost­

specificFILEandINF1LEstatements.Also,aDATAstepviewcannotbeindexedorcompressed.

YoucanuseDATAstepviewsto

•alwaysaccessthemostcurrentdatainchangingfiles

•avoidstoringacopyofalargedatafile

•combinedatafrommultiplesources

Thecompiledcodedoesnottakeupmuchroomforstorage,soyoucancreateDATAstepviewstoconsen/ediskspace.Ontheotherhand,useofDATA

stepviewscanincreaseCPUusagebecauseSASmustexecutethestoredDATAstepprogrameachtimeyouusetheview

Generalform,DATAsteptocreateaDATAstepview:

DATASAS-data-view<SAS-data-file-1...SASdata-file-n>I

VIEW-SAS-data-v/etv;

<SASstatements〉

RUN;

where

•SAS-data-viewnamesthedataviewtobecreated

•SAS-data-file-1SAS-data-file-nisanoptionallistthatnamesanydatafilestobecreated

•SASstatementsincludesotherDATAstepsyntaxtocreatethedataviewandanydatafilesthatarelistedintheDATAstatement

TheVIEW=optiontellsSAStocompile,butnottoexecute,thesourceprogramandtostorethecompiledcodeintheinputDATAstepviewthatisnamedinthe

option.

回IfyouspecifyadditionaldatafilesintheDATAstatement,SAScreatesthesedatafileswhentheviewisprocessedinasubsequentDATAorPROCstep.

Therefore,youneedtoreferencethedataviewbeforeyouattempttoreferencethedatafileinlatersteps.

Example

ThefollowingprogramcreatesaDATAstepviewnamedCompany.NewdatathatreadsfromthefilereferencedbythefilerefintheINFILEstatement

datacoiqsany.newdata/view=company.newdata;

infile<fileref>;

<DATAszepscatezencs>

run;

所以这里sasstatement并没有语法错误,更为准确的答案是B

12.Whichtitlestatementwouldalwaysdisplaythecurrentdate?

A.titleTodayis:&sysdate.〃;

B.title"Todayis:&sysdate9.〃;

C.title"Todayis:&today.〃;

D.titleTodayis:%sysfunc(today(),worddate.);

A选项和B选项输出的是sassessionstarted的时间或者日期

SYSDATE9representsthedateonwhichtheSASsessionstarted,andSYSTIMErepresents

thetimeatwhichtheSASsessionstarted.Supposeyouwouldratherseethedate

insomeotherformat,orsupposeyouwouldratherseethecurrentdateortime.You

canusethe%SYSFUNCfunctiontoexecuteotherSASfunctionsaspartofthemacro

facility.

Generalform,%SYSFUNCfunction:

%SYSFUNC(function(argument(s))<.formats)

where

•functionisthenameoftheSASfunctiontoexecute.

•argument(s)isoneormoreargumentsthatareusedbyfunction.Usecommastoseparateallarguments.Anargumentcanbea

macrovariablereferenceoratextexpressionthatproducesargumentsforafunction.

•formatisanoptionalformattoapplytotheresultoffunction.Bydefault,numericresultsareconvertedtoacharacterstring

usingtheBEST12.format,andcharacterresultsareusedastheyare.withoutformattingortranslation.

举个例子:

Supposeyouwanttocreateareporttitlethatincludesthecurrentdatein

WORDDATE,format.Youcouldusethisstatement:

title"ReportProducedon%sysfunc(today(),worddate.)";

However,thatwouldresultinthefollowingtitle:

ReportProducedonJune7,2002

Youcanmaskthecommabyusingthe%QSYSFUNCfunctioninstead,asfollows:

title"ReportProducedon

%sysfunc(left(%qsysfunc(today(),worddate.)))";

Themodifiedstatementgeneratesthefollowingtitle:

ReportProducedonJune7,2002(去空)

13.GiventheSASdatasets:

WORK.ONEWORK.TWO

IdNameIdSalary

112Smith243150000

243Wei35545000

457Jones52375000

ThefollowingSASprogramissubmitted:

dataWORK.COMBINE;

mergeWORK.ONEWORK.TWO;

byId;

run;

WhichSQLprocedurestatementproducesthesameresults?

A.createtableWORK.COMBINEas

select

Id,

Name,

Salary

from

WORK.ONE

fulljoin

WORK.TWO

onONE.Id=TWO.Id

B.createtableWORK.COMBINEas

select

coalesce(ONE.Id,TWO.Id)asId,

Name,

Salary

from

WORK.ONE,

WORK.TWO

whereONE.ld=TWO.Id

C.createtableWORK.COMBINEas

select

coalesce(ONE.Id,TWO.Id)asId,

Name,

Salary

from

WORK.ONE

fulljoin

WORK.TWO

onONE.Id=TWO.Id

orderbyId

D.createtableWORK.COMBINEas

select

coalesce(ONE.Id,TWO.Id)asId,

Name,

Salary

from

WORK.ONE,

WORK.TWO

whereONE.Id=TWO.Id

orderbyONE.Id

下面对四个答案进行解析。

A:

犯了一个明显的错误,就是ID这个变量引用不明确。所以运行不出结果来;

B:

procsql;

createtableWORK.COMBINEas

select

coalesce(ONE.IdzTWO.Id)asIdf

Name,

Salary

from

WORK.ONE,

WORK.TWO

whereONE.Id=TWO.Id

r

quit;

procprint;run;

*WhenyouaddtheCOALESCEfunctiontotheSELECTclauseofthePROC

SQLouterjoin,

thePROCSQLouterjoincanproducethesameresultasaDATAstep

match-merge.;

/*返回只有一个观测,因为ONE和TWO的连接按照where筛选条件进行了筛选,结果只有一

条符合条件。*/

C

procsql;

createtableWORK.COMBINEas

select

coalesce(ONE.Id,TWO.Id)asIdf

Name,

Salary

from

WORK.ONE

fulljoin

WORK.TWO

onONE.Id=TWO.Id

orderbyId

r

quit;

/*首先执行的是按照ON条件执行FULLJOIN链接,此时如果不用COALESCE,数据集会

有两个ID,一个是ONE•个是TWO的,

如果ONE和TWO的工D不匹配,那么想要的工D就为缺失值。

那么COALESCE函数就是合并这两个工D,然后重复的合并在一起,剔除缺失值,所以结

果就和MERGE一致。*/

D;

错误和B一样,连接方式不是FULLJOIN,而是innerjoin。

看mergeSQL在横向水平上合并数据集的区别

datacrackmanl(index=(x));

inputxy;

datalines;

12

34

56

/

run;

datacrackman2(index=(x));

inputxz;

datalines;

18

39

57

/

run;

datacrackman;

mergecrackmanlcrackman2;

byx;

run;

procsql;

createtablecrackmanasselecta.xra.yfb.z

fromcrackmanla,crackman2bwherea.x=b.x;

quit;

两种方法都可以实现既定的目标

但是各有什么优缺点呢?

MERGE:

优点:

1.除非内存限制,否则对要合并的数据集的观测总数是没有限制的

2.可以通过DATASTEP里面的DOLOOP和ARRAY以及其他一些MERGE的功能来实现比较复杂

的逻辑合并

3.可以通过对要合并数据集的多个变量的比较进行合并

缺点:

1.要合并的数据集必须先排序或者建立索引

2.根据多个变量或者一个变量的比较进行数据集合并,那么前提是要合并的数据集必须包括

要比较的变量。

3.按照指定的变量进行严格比较,获得所需要的结果,而且必须至少有一个观测是匹配成功

的。

因为MERGE在DATASTEP里面,在匹配的过程中,如果匹配成功之后,不会再重复读取已经

匹配成功的数据观测,例如:如果crackman1的第一个观测和crackman2的第一个观测匹配

成功了,那么在进行第二个观测匹配时,就不会再去读取第一个观测的数据,所以效率会高

些。但是SQL不是这样的,SQL是多对多的扫描匹配,产生一个cartesianproduct,笛卡

尔乘积,所以相对MERGE显得速度和效率要慢一些

SQL:

优点:

L数据集不需要排序或者建立索引,当然如果建立索引更好

2.多个数据集可以

温馨提示

  • 1. 本站所有资源如无特殊说明,都需要本地电脑安装OFFICE2007和PDF阅读器。图纸软件为CAD,CAXA,PROE,UG,SolidWorks等.压缩文件请下载最新的WinRAR软件解压。
  • 2. 本站的文档不包含任何第三方提供的附件图纸等,如果需要附件,请联系上传者。文件的所有权益归上传用户所有。
  • 3. 本站RAR压缩包中若带图纸,网页内容里面会有图纸预览,若没有图纸预览就没有图纸。
  • 4. 未经权益所有人同意不得将文件中的内容挪作商业或盈利用途。
  • 5. 人人文库网仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对用户上传分享的文档内容本身不做任何修改或编辑,并不能对任何下载内容负责。
  • 6. 下载文件中如有侵权或不适当内容,请与我们联系,我们立即纠正。
  • 7. 本站不保证下载资源的准确性、安全性和完整性, 同时也不承担用户因使用这些下载资源对自己和他人造成任何形式的伤害或损失。

评论

0/150

提交评论