
Statistical Computing and Simulation

Week 1 Supplementary Material

R-programming // Hung Chen (Department of Mathematics, National Taiwan University)

Outline

- Introduction: historical development; S, S-plus; capability; statistical analysis
- References
- Calculator
- Data type
- Resources
- Simulation and statistical tables: probability distributions
- Programming: grouping, loops and conditional execution; function
- Reading and writing data from files
- Modeling: regression; ANOVA
- Data analysis on association: Lottery; Geyser
- Smoothing

R, S and S-plus

S: an interactive environment for data analysis developed at Bell Laboratories since 1976.

- 1988 - S2: RA Becker, JM Chambers, A Wilks
- 1992 - S3: JM Chambers, TJ Hastie
- 1998 - S4: JM Chambers

Exclusively licensed by AT&T/Lucent to Insightful Corporation, Seattle WA. Product name: "S-plus". Implementation languages: C, Fortran.
See: /cm/ms/departments/sia/S/history.html

R: initially written by Ross Ihaka and Robert Gentleman at the Department of Statistics of the University of Auckland, New Zealand, during the 1990s. Since 1997: an international "R-core" team of ca. 15 people with access to a common CVS archive.

Introduction

R is "GNU S": a language and environment for data manipulation, calculation and graphical display. R is similar to the award-winning S system, which was developed at Bell Laboratories by John Chambers et al. It provides:

- a suite of operators for calculations on arrays, in particular matrices,
- a large, coherent, integrated collection of intermediate tools for interactive data analysis,
- graphical facilities for data analysis and display either directly at the computer or on hardcopy,
- a well developed programming language which includes conditionals, loops, user defined recursive functions and input and output facilities.

The core of R is an interpreted computer language. It allows branching and looping as well as modular programming using functions. Most of the user-visible functions in R are written in R, calling upon a smaller set of internal primitives. It is possible for the user to interface to procedures written in C, C++ or FORTRAN for efficiency, and also to write additional primitives.

What R does and does not

- data handling and storage: numeric, textual
- matrix algebra
- hash tables and regular expressions
- high-level data analytic and statistical functions
- classes ("OO")
- graphics
- programming language: loops, branching, subroutines
- is not a database, but connects to DBMSs
- has no graphical user interfaces, but connects to Java, Tcl/Tk
- the language interpreter can be very slow, but allows calling your own C/C++ code
- no spreadsheet view of data, but connects to Excel/MS Office
- no professional/commercial support

R and statistics

- Packaging: a crucial infrastructure to efficiently produce, load and keep consistent software libraries from (many) different sources/authors
- Statistics: most packages deal with statistics and data analysis
- State of the art: many statistical researchers provide their methods as R packages

Data Analysis and Presentation

The R distribution contains functionality for a large number of statistical procedures:

- linear and generalized linear models
- nonlinear regression models
- time series analysis
- classical parametric and nonparametric tests
- clustering
- smoothing

R also has a large set of functions which provide a flexible graphical environment for creating various kinds of data presentations.

References for R

The basic reference is The New S Language: A Programming Environment for Data Analysis and Graphics by Richard A. Becker, John M. Chambers and Allan R. Wilks (the "Blue Book"). The new features of the 1991 release of S (S version 3) are covered in Statistical Models in S, edited by John M. Chambers and Trevor J. Hastie (the "White Book").

Classical and modern statistical techniques have been implemented. Some of these are built into the base R environment. Many are supplied as packages. There are about 8 packages supplied with R (called "standard" packages) and many more are available through the CRAN family of Internet sites.

All the R functions have been documented in the form of help pages in an "output independent" form which can be used to create versions for HTML, LaTeX, text etc.

- The document "An Introduction to R" provides a more user-friendly starting point.
- An "R Language Definition" manual.
- More specialized manuals on data import/export and extending R.

R as a calculator

    > log2(32)
    [1] 5
    > sqrt(2)
    [1] 1.414214
    > seq(0, 5, length=6)
    [1] 0 1 2 3 4 5
    > plot(sin(seq(0, 2*pi, length=100)))

Object orientation

Primitive (or: atomic) data types in R are:

- numeric (integer, double, complex)
- character
- logical
- function

Out of these, vectors, arrays and lists can be built.

Object: a collection of atomic variables and/or other objects that belong together.

Example: a microarray experiment

- probe intensities
- patient data (tissue location, diagnosis, follow-up)
- gene data (sequence, IDs, annotation)

Parlance:

- class: the "abstract" definition of it
- object: a concrete instance
- method: other word for 'function'
- slot: a component of an object

Advantages:

- Encapsulation (can use the objects and methods someone else has written without having to care about the internals)
- Generic functions (e.g. plot, print)
- Inheritance (hierarchical organization of complexity)

Caveat: overcomplicated, baroque program architecture...

Variables

    > a = 49                              # numeric
    > sqrt(a)
    [1] 7
    > a = "The dog ate my homework"       # character string
    > sub("dog", "cat", a)
    [1] "The cat ate my homework"
    > a = (1+1 == 3)                      # logical
    > a
    [1] FALSE

Vectors, matrices and arrays

vector: an ordered collection of data of the same type

    > a = c(1, 2, 3)
    > a*2
    [1] 2 4 6

Example: the mean spot intensities of all 15488 spots on a chip: a vector of 15488 numbers.
In R, a single number is the special case of a vector with 1 element.
Other vector types: character strings, logical.

matrix: a rectangular table of data of the same type.
Example: the expression values for 10000 genes for 30 tissue biopsies: a matrix with 10000 rows and 30 columns.

array: 3-, 4-, ... dimensional matrix.
Example: the red and green foreground and background values for 20000 spots on 120 chips: a 4 x 20000 x 120 (3D) array.

Lists

vector: an ordered collection of data of the same type.

    > a = c(7, 5, 1)
    > a[2]
    [1] 5

list: an ordered collection of data of arbitrary types.

    > doe = list(name="john", age=28, married=F)
    > doe$name
    [1] "john"
    > doe$age
    [1] 28

Typically, vector elements are accessed by their index (an integer), list elements by their name (a character string). But both types support both access methods.

Data frames

A data frame is supposed to represent the typical data table that researchers come up with, like a spreadsheet. It is a rectangular table with rows and columns; data within each column has the same type (e.g. number, text, logical), but different columns may have different types.

Example:

    > a
          localisation tumorsize progress
    XX348     proximal       6.3    FALSE
    XX234       distal       8.0     TRUE
    XX987     proximal      10.0    FALSE

Factors

A character string can contain arbitrary text. Sometimes it is useful to use a limited vocabulary, with a small number of allowed words. A factor is a variable that can only take such a limited number of values, which are called levels.

    > a
    [1] Kolon(Rektum)       Magen               Magen
    [4] Magen               Magen               Retroperitoneal
    [7] Magen               Magen(retrogastral) Magen
    Levels: Kolon(Rektum) Magen Magen(retrogastral) Retroperitoneal
    > class(a)
    [1] "factor"
    > as.character(a)
    [1] "Kolon(Rektum)"       "Magen"               "Magen"
    [4] "Magen"               "Magen"               "Retroperitoneal"
    [7] "Magen"               "Magen(retrogastral)" "Magen"
    > as.integer(a)
    [1] 1 2 2 2 2 4 2 3 2
    > as.integer(as.character(a))
    [1] NA NA NA NA NA NA NA NA NA
    Warning message: NAs introduced by coercion

Subsetting

Individual elements of a vector, matrix, array or data frame are accessed with "[ ]" by specifying their index, or their name.

    > a
          localisation tumorsize progress
    XX348     proximal       6.3        0
    XX234       distal       8.0        1
    XX987     proximal      10.0        0
    > a[3, 2]
    [1] 10
    > a["XX987", "tumorsize"]
    [1] 10
    > a["XX987", ]
          localisation tumorsize progress
    XX987     proximal        10        0

    > a[c(1,3), ]                          # subset rows by a vector of indices
          localisation tumorsize progress
    XX348     proximal       6.3        0
    XX987     proximal      10.0        0
    > a[c(T,F,T), ]                        # subset rows by a logical vector
          localisation tumorsize progress
    XX348     proximal       6.3        0
    XX987     proximal      10.0        0
    > a$localisation                       # subset a column
    [1] "proximal" "distal"   "proximal"
    > a$localisation == "proximal"         # comparison resulting in a logical vector
    [1]  TRUE FALSE  TRUE
    > a[a$localisation == "proximal", ]    # subset the selected rows
          localisation tumorsize progress
    XX348     proximal       6.3        0
    XX987     proximal      10.0        0

Resources

A package specification allows the production of loadable modules for specific purposes, and several contributed packages are made available through the CRAN sites.

CRAN and R homepage:

- The R homepage is R's central homepage, giving information on the R project and everything related to it.
- CRAN acts as the download area, carrying the software itself, extension packages and PDF manuals.

Getting help with functions and features

    help(solve)
    ?solve

For a feature specified by special characters, the argument must be enclosed in double or single quotes, making it a "character string":

    help("[[")

Details about a specific command whose name you know (input arguments, options, algorithm, results):

    > ?t.test

or

    > help(t.test)
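The data-frame, subsetting and factor examples from the earlier slides can be tried out without the original data. The following is a minimal sketch under stated assumptions: the table `a` is rebuilt by hand from the values printed in the Subsetting slides, and the factor `site` is a shortened, hypothetical stand-in for the factor shown on the Factors slide.

```r
# Rebuild the small tumour table by hand; values are copied from the printed output in the slides.
a <- data.frame(
  localisation = c("proximal", "distal", "proximal"),
  tumorsize    = c(6.3, 8.0, 10.0),
  progress     = c(0, 1, 0),
  row.names    = c("XX348", "XX234", "XX987"),
  stringsAsFactors = FALSE   # keep the text column as a character vector
)

# Two ways of picking the same rows, plus single-cell access by name.
by_index <- a[c(1, 3), ]                        # rows 1 and 3 by position
by_mask  <- a[a$localisation == "proximal", ]   # rows where the comparison is TRUE
one_cell <- a["XX987", "tumorsize"]             # row name and column name

# A factor stores integer codes plus a table of levels.
# 'site' is a shortened, hypothetical version of the factor on the Factors slide.
site  <- factor(c("Kolon", "Magen", "Magen", "Retroperitoneal"))
codes <- as.integer(site)      # position of each value within levels(site)
texts <- as.character(site)    # the original labels

print(by_mask)
print(codes)
```

Converting a factor with `as.integer()` returns the level codes, not the labels, which is why `as.integer(as.character(...))` on non-numeric labels yields only `NA`.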

Other ways of getting help:

- HTML search engine
- Search for topics with regular expressions: "help.search"

Probability distributions

- Cumulative distribution function P(X ≤ x): 'p' for the CDF
- Probability density function: 'd' for the density
- Quantile function (given q, the smallest x such that P(X ≤ x) ≥ q): 'q' for the quantile
- Simulate from the distribution: 'r'

    Distribution        R name     additional arguments
    beta                beta       shape1, shape2, ncp
    binomial            binom      size, prob
    Cauchy              cauchy     location, scale
    chi-squared         chisq      df, ncp
    exponential         exp        rate
    F                   f          df1, df2, ncp
    gamma               gamma      shape, scale
    geometric           geom       prob
    hypergeometric      hyper      m, n, k
    log-normal          lnorm      meanlog, sdlog
    logistic            logis
    negative binomial   nbinom
    normal              norm
    Poisson             pois
    Student's t         t
    uniform             unif
    Weibull             weibull
    Wilcoxon            wilcox

Grouping, loops and conditional execution

Grouped expressions

R is an expression language in the sense that its only command type is a function or expression which returns a result. Commands may be grouped together in braces, {expr1; ...; exprm}, in which case the value of the group is the result of the last expression in the group evaluated.

Control statements

if statements

The language has available a conditional construction of the form

    if (expr1) expr2 else expr3

where expr1 must evaluate to a logical value and the result of the entire expression is then evident. There is also a vectorized version of the if/else construct, the ifelse function. This has the form ifelse(condition, a, b).

Repetitive execution: for loops, repeat and while

    for (name in expr1) expr2

where name is the loop variable. expr1 is a vector expression (often a sequence like 1:20), and expr2 is often a grouped expression with its sub-expressions written in terms of the dummy name. expr2 is repeatedly evaluated as name ranges through the values in the vector result of expr1.

Other looping facilities include the repeat expr statement and the while (condition) expr statement. The break statement can be used to terminate any loop, possibly abnormally. This is the only way to terminate repeat loops. The next statement can be used to discontinue one particular cycle and skip to the "next".

Branching

    if (logical expression) {
      statements
    } else {
      alternative statements
    }

The else branch is optional.

Loops

Loops are used when the same or similar tasks need to be performed multiple times: for all elements of a list, for all columns of an array, etc. Typical uses are Monte Carlo simulation and cross-validation (delete one, etc.).

    for (i in 1:10) {
      print(i*i)
    }

    i = 1
    while (i <= 10) {
      print(i*i)
      i = i + sqrt(i)
    }

lapply, sapply, apply

Used when the same or similar tasks need to be performed multiple times for all elements of a list or for all columns of an array. May be easier and faster than "for" loops.

lapply(li, function)

To each element of the list li, the function function is applied. The result is a list whose elements are the individual function results.

    > li = list("klaus", "martin", "georg")
    > lapply(li, toupper)
    [[1]]
    [1] "KLAUS"
    [[2]]
    [1] "MARTIN"
    [[3]]
    [1] "GEORG"

sapply(li, fct)

Like lapply, but tries to simplify the result, by converting it into a vector or array of appropriate size.

    > li = list("klaus", "martin", "georg")
    > sapply(li, toupper)
    [1] "KLAUS"  "MARTIN" "GEORG"
    > fct = function(x) { return(c(x, x*x, x*x*x)) }
    > sapply(1:5, fct)
         [,1] [,2] [,3] [,4] [,5]
    [1,]    1    2    3    4    5
    [2,]    1    4    9   16   25
    [3,]    1    8   27   64  125

apply(arr, margin, fct)

Apply the function fct along some dimensions of the array arr, according to margin, and return a vector or array of the appropriate size.

    > x
         [,1] [,2] [,3]
    [1,]    5    7    0
    [2,]    7    9    8
    [3,]    4    6    7
    [4,]    6    3    5
    > apply(x, 1, sum)
    [1] 12 24 17 14
    > apply(x, 2, sum)
    [1] 22 25 20

Functions and operators

Functions do things with data.

- "Input": function arguments (0, 1, 2, ...)
- "Output": function result (exactly one)

Example:

    add = function(a, b) {
      result = a + b
      return(result)
    }

Operators: short-cut writing for frequently used functions of one or two arguments. Examples: + - * / ! & | %%

Exceptions to the rule:

- Functions may also use data that sits around in other places, not just in their argument list: "scoping rules" *
- Functions may also do other things than returning a result, e.g. plot something on the screen: "side effects"

* Lexical Scope and Statistical Computing. R. Gentleman, R. Ihaka, Journal of Computational and Graphical Statistics, 9(3), p. 491-508 (2000).

Reading data from files

The read.table() function

To read an entire data frame directly, the external file will normally have a special form.

- The first line of the file should have a name for each variable in the data frame.
- Each additional line of the file has as its first item a row label, followed by the values for each variable.

        Price   Floor   Area   Rooms   Age   Cent.heat
    01  52.00   111.0    830       5   6.2   no
    02  54.75   128.0    710       5   7.5   no
    03  57.50   101.0   1000       5   4.2   no
    04  57.50   131.0    690       6   8.8   no
    05  59.75    93.0    900       5   1.9   yes
    ...

The file contains numeric variables and nonnumeric variables (factors).

    HousePrice <- read.table("houses.data", header=TRUE)
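To try the read.table() call without an external file, the sample rows from the slide can be written to a temporary file first. This is a minimal sketch: the temporary file is a hypothetical stand-in for "houses.data", and the names `rows`, `houses_file`, `mean_price` and `heated` are introduced here for illustration only.

```r
# Hypothetical stand-in for "houses.data": the sample rows from the slide, written to a temporary file.
rows <- c(
  "Price Floor Area Rooms Age Cent.heat",
  "01 52.00 111.0 830 5 6.2 no",
  "02 54.75 128.0 710 5 7.5 no",
  "03 57.50 101.0 1000 5 4.2 no",
  "04 57.50 131.0 690 6 8.8 no",
  "05 59.75 93.0 900 5 1.9 yes"
)
houses_file <- tempfile(fileext = ".data")
writeLines(rows, houses_file)

# The header line has one field fewer than the data lines, so read.table()
# uses the first item on each data line as the row label.
HousePrice <- read.table(houses_file, header = TRUE)

str(HousePrice)                         # column types guessed by read.table()
mean_price <- mean(HousePrice$Price)    # numeric columns can be used directly
heated <- HousePrice[as.character(HousePrice$Cent.heat) == "yes", ]

unlink(houses_file)                     # remove the temporary file
```

Wrapping `Cent.heat` in `as.character()` keeps the comparison valid whether the column was read as a factor or as a character vector, which depends on the R version and the `stringsAsFactors` setting.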

Input file without row labels:

    Price   Floor   Area   Rooms   Age   Cent.heat
    52.00   111.0    830       5   6.2   no
    54.75   128.0    710       5   7.5   no
    57.50   101.0   1000       5   4.2   no
    57.50   131.0    690       6   8.8   no
    59.75    93.0    900       5   1.9   yes
    ...

The scan() function

Suppose the data file is named 'input.dat', the data vectors are of equal length and are to be read in in parallel. Suppose that there are three vectors, the first of mode character and the remaining two of mode numeric.

    inp <- scan("input.dat", list("", 0, 0))

To separate the data items into three separate vectors, use assignments like

    label <- inp[[1]]; x <- inp[[2]]; y <- inp[[3]]

Alternatively, name the list components when reading:

    inp <- scan("input.dat", list(id="", x=0, y=0))
    inp$id; inp$x; inp$y

Storing data

Every R object can be stored into and restored from a file with the commands "save" and "load". This uses the XDR (external data representation) standard of Sun Microsystems and others, and is portable between MS-Windows, Unix, Mac.

    > save(x, file="x.Rdata")
    > load("x.Rdata")

Importing and exporting data

There are many ways to get data into R and out of R. Most programs (e.g. Excel), as well as humans, know how to deal with rectangular tables in the form of tab-delimited text files.

    > x = read.delim("filename.txt")

also: read.table, read.csv

    > write.table(x, file="x.txt", sep="\t")

Importing data: caveats

Type conversions: by default, the read functions try to guess and autoconvert the data types of the different columns (e.g. number, factor, character). There are options as.is and colClasses to control this; read the online help.

Special characters: the delimiter character (space, comma, tabulator) and the end-of-line character cannot be part of a data field. To circumvent this, text may be "quoted". However, if this option is used (the default), then the quote characters themselves cannot be part of a data field, except if they themselves are within quotes. Understand the conventions your input files use and set the quote options accordingly.

Statistical models in R

Regression analysis: a linear regression model with independent homoscedastic errors.

The analysis of variance (ANOVA)

- Predictors are now all categorical/qualitative.
- The name Analysis of Variance is used because the original thinking was to try to partition the overall variance in the response into that due to each of the factors and the error.
- Predictors are now typically called factors, which have some number of levels.
- The parameters are now often called effects.
- The parameters are considered fixed but unknown (fixed-effects models), but random-effects models are also used, where parameters are taken to be random variables.

One-Way ANOVA

The model: given a factor $\alpha$ occurring at $i = 1, \dots, I$ levels, with $j = 1, \dots, J_i$ observations per level, we use the model

    $y_{ij} = \mu + \alpha_i + \varepsilon_{ij}, \quad i = 1, \dots, I, \quad j = 1, \dots, J_i$

Not all the parameters are identifiable and some restriction is necessary:

- Set $\mu = 0$ and use $I$ different dummy variables.
- Set $\alpha_1 = 0$; this corresponds to treatment contrasts.
- Set $\sum_i J_i \alpha_i = 0$; this ensures orthogonality.

Generalized linear models

Nonlinear regression

Two-Way ANOVA

The model

    $y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk}$

We have two factors, $\alpha$ at $I$ levels and $\beta$ at $J$ levels. Let $n_{ij}$ be the number of observations at level $i$ of $\alpha$ and level $j$ of $\beta$, and let those observations be $y_{ij1}, y_{ij2}, \dots$. A complete layout has $n_{ij} \ge 1$ for all $i, j$.

The interaction effect $(\alpha\beta)_{ij}$ is interpreted as that part of the mean response not attributable to the additive effect of $\alpha_i$ and $\beta_j$.
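As a hedged illustration of how a two-way model with interaction might be fitted in R, the sketch below simulates a balanced layout and calls `aov()`. Everything in it is an assumption made for illustration: the data are simulated, the effect sizes `alpha` and `beta` are invented, and the simulated truth contains no interaction, so the `a:b` line of the table should usually be non-significant.

```r
set.seed(1)

# Hypothetical balanced layout: 3 levels of factor a, 4 levels of factor b,
# and 4 replicate observations in every cell (48 observations in total).
d <- expand.grid(rep = 1:4, a = factor(1:3), b = factor(LETTERS[1:4]))

# Invented additive effects; the simulated truth has no interaction term.
alpha <- c(0, 1, 2)
beta  <- c(0, 0.5, 1, 1.5)
d$y <- 10 + alpha[as.integer(d$a)] + beta[as.integer(d$b)] +
  rnorm(nrow(d), sd = 0.5)

# y ~ a * b expands to a + b + a:b, i.e. main effects plus interaction.
fit <- aov(y ~ a * b, data = d)
tab <- summary(fit)[[1]]     # ANOVA table with rows a, b, a:b, Residuals

cells <- table(d$a, d$b)     # n_ij: number of observations per cell
print(tab)
```

The degrees of freedom in the table follow the layout: I - 1 for a, J - 1 for b, (I - 1)(J - 1) for the interaction, and the remainder for the residuals.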

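For example, you may enjoy strawberries and cream individually, but the combination is superior. In contrast, you may like fish and ice cream but not together.

As part of an investigation of toxic agents, 48 rats were allocated to 3 poisons (I, II, III) and 4 treatments (A, B, C, D). The response was survival time in tens of hours.

Statistical Strategy and Model Uncertainty

Strategy

- Diagnostics: checking of assumptions: constant variance, linearity, normality, outliers, influential points, serial correlation and collinearity.
- Transformation: transforming the response (Box-Cox), transforming the predictors (tests and polynomial regression).
- Variable selection: stepwise and criterion based methods.

Avoid doing too much analysis. Remember that fitting the data well is no guarantee of good predictive performance or that the model is a good representation of the underlying population.

- Avoid complex models for small datasets.
- Try to obtain new data to validate your proposed model. Some people set aside some of their existing data for this purpose.
- Use past experience with similar data to guide the choice of model.

Simulation and Regression

What is the sampling distribution of least squares estimates when the noises are not normally distributed? Assume the noises are independent and identically distributed.

1. Generate $\varepsilon$ from the known error distribution.
2. Form $y = X\beta + \varepsilon$.
3. Compute the estimate $\hat\beta$.

Repeat these three steps many times. We can estimate the sampling distribution of $\hat\beta$ using the empirical distribution of the generated $\hat\beta$, which we can estimate as accurately as we please by simply running the simulation for long enough.

This technique is useful for a theoretical investigation of the properties of a proposed new estimator. We can see how its performance compares to other estimators. It is of no value for the actual data since we don't know the true error distribution and we don't know $\beta$.

Bootstrap

The bootstrap method mirrors the simulation method but uses quantities we do know. Instead of sampling from the population distribution, which we do not know in practice, we resample from the data itself.

Difficulty: $\beta$ is unknown and the distribution of $\varepsilon$ is unknown.
Solution: $\beta$ is replaced by its good estimate $\hat\beta$ and the distribution of $\varepsilon$ is replaced by the residuals $\hat\varepsilon_1, \dots, \hat\varepsilon_n$.

1. Generate $\varepsilon^*$ by sampling with replacement from $\hat\varepsilon_1, \dots, \hat\varepsilon_n$.
2. Form $y^* = X\hat\beta + \varepsilon^*$.
3. Compute $\hat\beta^*$ from $(X, y^*)$.

For small n, it is possible to compute $\hat\beta^*$ for every possible sample of $\hat\varepsilon_1, \dots, \hat\varepsilon_n$. In practice, the number of bootstrap samples can be as small as 50 if all we want is an estimate of the variance of our estimates, but needs to be larger if confidence intervals are wanted.

Implementation

How do we take a sample of residuals with replacement? sample() is good for generating random samples of indices:

    sample(10, rep=T)

leads to "7 9 9 2 5 7 4 1 8 9".

Execute the bootstrap. Make a matrix to save the results in and then repeat the bootstrap process 1000 times for a linear regression with five regressors:

    bcoef <- matrix(0, 1000, 6)       # one row per bootstrap sample: intercept + 5 coefficients

Program:

    for (i in 1:1000) {
      # resample the 47 residuals with replacement and add them to the fitted values
      newy <- g$fit + g$res[sample(47, rep=T)]
      # y is taken here to hold the regressors of the original fit (an assumption; the slide does not define it)
      brg <- lm(newy ~ y)
      bcoef[i,] <- brg$coef
    }

Here g is the output from the regression analysis of the data.

Test and Confidence Interval

To test the null hypothesis $H_0: \beta_1 = 0$ against the alternative $H_1: \beta_1 > 0$, we may figure what fraction of the bootstrap sampled $\hat\beta_1^*$ were less than zero:

    length(bcoef[bcoef[,2] < 0, 2]) / 1000

It leads to 0.019. The p-value is 1.9% and we reject the null at the 5% level.

We can also make a 95% confidence interval for this parameter by taking the empirical quantiles:

    quantile(bcoef[,2], c(0.025, 0.975))
          2.5%      97.5%
    0.00099037 0.01292449

We can get a better picture of the distribution by looking at the density and marking the confidence interval:

    plot(density(bcoef[,2]), xlab="Coefficient of Race", main="")
    abline(v=quantile(bcoef[,2], c(0.025, 0.975)))

Figure: bootstrap distribution of $\hat\beta_1$ with 95% confidence intervals.

Study the Association between Number and Payoff

- Why study the winning numbers? Is this lottery run fairly?
- What does it mean for a lottery to be run fairly? Is the distribution of the winning numbers close to a discrete uniform distribution?
- How can we check whether the distribution of the winning numbers is close to a discrete uniform distribution?

    length(lottery.number)                   # 254
    breaks <- 100*(0:10); breaks[1] <- -1
    hist(lottery.number, breaks=breaks)      # ten bins by leading digit
    abline(254/10, 0)                        # expected count per bin under uniformity

The histogram looks fairly flat (a goodness-of-fit test can be used to check this). Unless we can predict the future, any number we pick has only a 1 in 1000 chance of winning.

What is the expected payoff of this lottery? With tickets sold at 50 cents each and a winning probability of 1/1000, someone who buys tickets repeatedly would expect a winning ticket to pay at least $500 (0.50 x 1000) for the game to be fair.

    boxplot(lottery.payoff, main="NJ Pick-it Lottery (5/22/75-3/16/76)", sub="Payoff")
    lottery.label <- "NJ Pick-it Lottery (5/22/75-3/16/76)"
    hist(lottery.payoff, main=lottery.label)

Data Analysis

- Has the payoff exceeded $500 many times?
- How should one bet?
- Does the payoff contain outliers?

    min(lottery.payoff)                                      # lowest payoff: 83
    lottery.number[lottery.payoff == min(lottery.payoff)]    # 123
    # <, >, <=, >=, ==, != are the comparison operators
    max(lottery.payoff)                                      # highest payoff: 869.5
    lottery.number[lottery.payoff == max(lottery.payoff)]    # 499

    plot(lottery.number, lottery.payoff); abline(500, 0)

Regression analysis: nonparametric regression

Load the "modreg" package (in current versions of R, loess() is part of the standard stats package).

    a <- loess(lottery.payoff ~ lottery.number, span=50, degree=2)
    a <- rbind(lottery.number[lottery.payoff >= 500], lottery.payoff[lottery.payoff >= 500])

Do the winning numbers with high payoffs share any characteristics?

Characteristics of winning numbers with high payoffs

Feature: most of the winning numbers with high payoffs contain repeated digits. This lottery offers a special kind of bet called "combination bets": the number bet must consist of three different digits, and the bet wins whenever its digits are the same as those of the winning number.

    plot(a[1,], a[2,], xlab="lottery.number", ylab="lottery.payoff", main="Payoff >= 500")
    boxplot(split(lottery.payoff, lottery.number %/% 100), sub="Leading Digit of Winning Numbers", ylab="Payoff")

The boxplots are drawn by the leading digit of the winning number. When the leading digit of the winning number is zero, the payoffs tend to be higher. One explanation is that fewer people bet on such numbers.

Comparing the payoffs across time periods:

    qqplot(lottery.payoff, lottery3.payoff); abline(0, 1)

Use boxplots to compare the distribution of payoffs across time periods:

    boxplot(lottery.payoff, lottery2.payoff, lottery3.payoff)

In chronological order, the payoffs gradually settle down and rarely exceed $500.

    rbind(lottery2.number[lottery2.payoff >= 500], lottery2.payoff[lottery2.payoff >= 500])
    rbind(lottery3.number[lottery3.payoff >= 500], lottery3.payoff[lottery3.payoff >= 500])

New Jersey Pick-It Lottery (drawn daily)

Three data sets, collected at different times:

- lottery (254 winning numbers from 22 May 1975 to 16 March 1976)
  - number: the winning number, from 000 to 999; the lottery started on 22 May 1975.
  - payoff: the payoff for the winning number; the winners share equally half of the total amount bet that day.
- lottery2 (winning numbers and payoffs from 10 November 1976 to 6 September 1977).
- lottery3 (winning numbers and payoffs from 1 December 1980 to 22 September 1981).

    lottery.number <- scan("c:/lotterynumber.txt")
    lottery.payoff <- scan("c:/lotterypayoff.txt")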
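The question of whether the winning numbers are close to a discrete uniform distribution can also be put to a formal goodness-of-fit test, in addition to the histogram. The sketch below is an illustration under stated assumptions: the lottery files are not available here, so `sim.number` is a simulated, hypothetical stand-in for `lottery.number`, and the chi-square test on leading digits is one possible choice of test, not necessarily the one intended in the lecture.

```r
set.seed(2024)

# Hypothetical stand-in for lottery.number: 254 draws from 000 to 999.
sim.number <- sample(0:999, 254, replace = TRUE)

# Count the draws by leading digit (0-9), the same grouping as lottery.number %/% 100.
counts <- table(factor(sim.number %/% 100, levels = 0:9))

# Chi-square goodness-of-fit test against equal probabilities for the ten groups.
gof <- chisq.test(counts, p = rep(1/10, 10))

print(counts)
print(gof$expected)   # 25.4 expected draws per group (254 / 10)
print(gof$p.value)    # a large p-value gives no evidence against uniformity
```

With the real data, replacing `sim.number` by `lottery.number` would give the test corresponding to the histogram with ten bins.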

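Looking at this string of winning numbers alone, it is hard to make out any pattern.

    lottery2 <- scan("c:/lottery2.txt")
    lottery2 <- matrix(lottery2, byrow=F, ncol=2)
    lottery2.payoff <- lottery2[,2]; lottery2.number <- lottery2[,1]
    lottery3 <- matrix(scan("c:/lottery3.txt"), byrow=F, ncol=2)
    lottery3.payoff <- lottery3[,2]; lottery3.number <- lottery3[,1]

Old Faithful Geyser in Yellowstone National Park

Purpose of the study: to help visitors plan their trips; to understand how the geyser forms, in order to ...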
