《数据库库系统实现》PPT课件_第1页
《数据库库系统实现》PPT课件_第2页
《数据库库系统实现》PPT课件_第3页
《数据库库系统实现》PPT课件_第4页
《数据库库系统实现》PPT课件_第5页
已阅读5页,还剩203页未读 继续免费阅读

下载本文档

版权说明:本文档由用户提供并上传,收益归属内容提供方,若内容存在侵权,请进行举报或认领

文档简介

数据元素的表示,HowtolayoutdataondiskHowtomoveittomemory,Principlesarerathersimple,buttherearelotsofvariationsinthedetails,主要内容,如何将SQL数据类型表示成字段?如何将元组表示成记录?如何在存储块中表示记录或元组的集合?如何用块的集合表示和存储关系?如果不同的元组具有不同的记录大小,如何处理?如果记录大小因修改发生改变,如何处理?,PrinciplesofDataLayout,数据元素的表示,Attributesofrelationaltuples(orobjects)representedbysequencesofbytescalledfieldsFieldsgroupedtogetherintorecordsrepresentationoftuplesorobjectsRecordsstoredinblocksFile:collectionofblocksthatformsarelation(ortheextentofanobjectclass),Overview,DataItemsRecordsBlocksFilesMemory,here,数据元素的表示,作业,Whatarethedataitemswewanttostore?,asalary,anameadate,apicture,数据元素的表示,作业,Torepresent:,Integer(short):2bytes(-32000+32000)e.g.,35is,00000000,00100011,arithmeticinterpretationbyhardware,Real,floatingpointnbitsformantissa,mforexponent.,数据元素的表示,作业,Charactersvariouscodingschemessuggested,mostpopularisASCII,Torepresent:,Example(8bitASCII):A:01000001a:011000015:00110101LF:00001010,数据元素的表示,Booleane.g.,TRUEFALSE,11111111,00000000,Torepresent:,Applicationspecifice.g.,RED1GREEN2BLUE3YELLOW4,数据元素的表示,Dates,e.g.:Integer:#dayssinceJan1,19008chars:YYYYMMDD7chars:YYYYDDD10chars:YYYY-MM-DD(SQL2)(notYYMMDD!Why?)Time,e.g.Integer:secondssincemidnightchars:HH:MM:SS.FF(SQL2),Torepresent:,数据元素的表示,StringofcharactersNullterminatede.g.,-Lengthgivene.g.,-Fixedlength,3,Torepresent:,数据元素的表示,Bagofbits,Length,Bits,Record-Collectionofrelatedfields,E.g.:Employeerecord:namefield,salaryfield,date-of-hirefield,.,数据元素的表示,Typesofrecords:,Mainchoices:FIXEDvsVARIABLEFORMATFIXEDvsVARIABLELENGTH,ASCHEMA(notrecord)containsfollowinginformation-numberoffields-typeofeachfield-orderinrecord-meaningofeachfield,Fixedformat,数据元素的表示,Example:fixedformatandlength,Employeerecord(1)E#,2byteinteger(2)E.name,10char.Schema(3)Dept,2bytecode,46,Ford,02,83,Jones,01,Records,数据元素的表示,Recorditselfcontainsformat;“SelfDescribing”,Variableformat,数据元素的表示,Fieldnamecodescouldalsobestrings,i.e.tags(XMLasadatainterchangeformat),#FieldsCodeidentifyingfieldasE#IntegertypeCodeforEnameStringtypeLengthofstr.,Variableformatusefulfor:,“sparse”recordse.g,patientrecordswiththousandsofpossibletestsrepeatingfieldsinformationintegrationfromheterogeneoussources,EXAMPLE:varformatrecordwithrepeatingfieldsEmployeeoneormorechildren,3,E_name:Fred,Child:Sally,Child:Tom,VariantbetweenFIXED/VARformat,Hybridformatonepartisfixed,othervariable,数据元素的表示,Manyvariationsininternalorganizationofrecord,Justtoshowone:lengthoffield,3,F3,10,F1,5,F2,12,*,3,32,5,15,20,F1,F2,F3,totalsize,offsets,012345152032,数据元素的表示,模式信息主要为出现在CREATETABLE语句中的信息;关系的属性属性类型属性在元组中出现的顺序属性或关系自身上的约束,模式信息,数据元素的表示,SQLServer数据行结构,状态位A1-3记录类型5是否存在变长列,状态位B,定长部分的长度,定长数据,列数,NULL位图每个列1位,变长列的数目,列偏移数组,变长列数据,Next:placingrecordsintoblocks,blocks.afile,数据元素的表示,(1)separatingrecords(2)spannedvs.unspanned(3)mixedrecordtypesclustering(4)splitrecords(5)sequencing(6)addressingrecords,Issuesinstoringrecordsinblocks:,数据元素的表示,Block(a)fixedsizerecs.-noneedtoseparate(b)specialmarker(c)giverecordlengths(oroffsets)-withineachrecord-inblockheader(seelater),(1)Separatingrecords,R2,R1,R3,数据元素的表示,Unspanned:recordsarewithinoneblockblock1block2.Spanned:recordsspanblockboundariesblock1block2.,(2)Spannedvs.Unspanned,数据元素的表示,Unspannedismuchsimpler,butmaywastespaceSpannednecessaryifrecordsizeblocksize(e.g.,fieldscontaininglargeBLOBsfor,say,MPEGvideoclips),Spannedvs.unspanned:,数据元素的表示,Example(ofunspannedrecords),106recordseachofsize2,050bytes(fixed)blocksize=4096bytes,Spaceusedabout4x109B,abouthalfwasted,数据元素的表示,Mixed-recordsofdifferenttypes(e.g.DEPT,EMPLOYEE)allowedinsameblocke.g.,ablock:,(3)Mixedrecordtypes,Dep,d1,Emp,e1,Emp,e2,数据元素的表示,Recordsthatarefrequentlyaccessedtogethershouldbeinthesameblock,数据元素的表示,Whydowewanttomix?,Answer:CLUSTERING,Example,Q1:selectDEPT.Name,EMP.Name,fromDEPT,EMPwhereDEPT.Name=EMP.DeptNameablock,DEPT,Name=Toy,.,EMP,DeptName=Toy,.,EMP,DeptName=Toy,.,数据元素的表示,IfQ1frequent,clusteringisgoodButconsiderQ2:SELECT*FROMDEPTIfQ2isfrequent,clusteringiscounter-productive,数据元素的表示,FixedpartinoneblockTypicallyforhybridformatVariablepartinanotherblock,(4)Splitrecords,数据元素的表示,Blockwithfixedrecs.,R1(a),R1(b),Blockswithvariablerecs.,数据元素的表示,Orderingrecordsinfile(andblock)bysomekeyvalueSequentialfile(sequenced),(5)Sequencing,Typicallytomakeitpossibletoefficientlyreadrecordsinorder(e.g.,todoamerge-joindiscussedlater),数据元素的表示,SequencingOptions,(a)Nextrecordphysicallycontiguous.(b)Linked,Next(R1),R1,R1,Next(R1),数据元素的表示,(c)OverflowareaRecordsinsequence,R1,R2,R3,R4,R5,SequencingOptions,数据元素的表示,Howdoesonerefertorecords?,(6)Addressingrecords,Rx,数据元素的表示,PurelyPhysical,DeviceIDE.g.,RecordCylinder#Address=Track#orIDBlock#Offsetinblock,BlockID,数据元素的表示,FullyIndirect,E.g.,RecordIDisarbitrarybytestring(say,anobjectID)maptablerecIDraddressa,Physicaladdr.,RecID,数据元素的表示,Ex#1Indirectioninblock,HeaderAblock:Freespace,R4,R3R2R1,数据元素的表示,Blockheader-dataatbeginningthatdescribesblock,Maycontain:-FileID(orRELATIONorDBID);theIDofthisblock-Recorddirectory;Pointertofreespace-Typeofblock(e.g.containsrecordsoftype4;isoverflow,)-Pointertootherblocks“likeit”(say,ifpartofanindexstructure)-Timestamp.,SQLServer的数据页,数据页是8K,包括三部分:页头、数据行和行偏移数组页头96个字节pageID,nextpage,prevpage,objID,lsn,slotCnt,level,indexID,freeData,pminlen数据行,最大长度8060个字节数据行不能跨页存储行偏移数组,每行2个字节DBCCPAGE,Block,(1)Deletion,Rx,数据元素的表示,Options:,(a)Immediatelyreclaimspace(b)Markdeleted,Mayneedchainofdeletedrecords(forre-use)Iftherearepointerstotherecord,eitherneedtosetthemtoNULLorindicatetherecordspacereclaimed,数据元素的表示,Asusual,manytradeoffs.,Howexpensiveistomovevalidrecordtofreespaceforimmediatereclaim?Howmuchspaceiswasted?e.g.,deletedrecords,deletefields,freespacechains,.,数据元素的表示,Easycase:recordsnotinsequenceInsertnewrecordatendoffile,orinanyblockwithspaceforit,(1)Insert,数据元素的表示,Hardcase:recordsinsequenceIffreespace“closeby”,nottoobad-cansliderecordstopreceeding/suceedingblocks-pointerstomovedrecordsrequireforwardingaddressesOruseadditionalblocksasanoverflowarea,Insert,数据元素的表示,Interestingproblems:,Howmuchfreespacetoleaveineachblock,track,cylinder?Howoftentoreorganizefile+overflow?,数据元素的表示,Thereareabout10,000,000waystoorganizemydataondiskWhichoneisrightforme?,Comparison,FlexibilitySpaceUtilizationComplexityPerformance,数据元素的表示,Toevaluateagivenstrategy,estimatefollowingparameters:-spaceusedforexpecteddata-expectedtimeto-fetchrecordgivenitskey-fetchrecordwithnextkey-insert/delete/updaterecord-scanallrecordsinthefile-reorganizefile,数据元素的表示,Example,HowwouldyoudesignMegatron3000storagesystem?(forarelationalDB,lowend)Variablelengthrecords?Spanned?Whatdatatypes?Fixedformat?RecordIDs?Sequencing?Howtohandledeletions?,数据元素的表示,Howtolayoutdataondisk,Summary,Next,Howtofindarecordquickly,givenakey,数据元素的表示,IndexStructures,DatastructureforlocatingrecordswithgivensearchkeyefficientlyvalueAlsofacilitatesafullscanofarelation(ortheextentofanobjectclass)ifrecordsnotstoredonphysicallyconsecutiveblocks,?,value,record,Topics,ConventionalindexesbasedonsequentialfileorderB-treesHashingschemes,SequentialFile,Assumeonly2records/block,SequentialFile,DenseIndex,Indexkeyforeachkeyvalueinfile,SequentialFile,SparseIndex,Indexkeyforeachblockoffile,SequentialFile,Sparse2ndlevelindex,Sparsevs.DenseTradeoff,Sparse:Lessindexspaceperrecord-cankeepmoreofindexinmemoryDense:Cantellifanyrecordexistswithoutaccessingfile(Later:sparsebetterforinsertionsdenseneededforsecondaryindexes),Terms,Searchkey(primarykey)PrimaryindexindexlocationsdeterminelocationsofdatablocksSecondaryindexDenseindex(containsallsearchkeyvalues)SparseindexMulti-levelindex,Next:,DuplicatekeysDeletion/InsertionSecondaryindexes,Duplicatekeys,10,10,10,20,20,30,30,30,10,10,10,20,20,30,30,30,Denseindex,onewaytoimplement?,Duplicatekeys,10,20,30,40,Denseindex,betterway?,Duplicatekeys,10,10,20,30,Sparseindex,oneway?,Duplicatekeys,10,20,30,30,Sparseindex,anotherway?,Duplicatekeys,firstnewkeyfromblock,Duplicatevalues,primaryindex,IndexmaypointtofirstinstanceofeachvalueonlyFileIndex,Summary,a,a,a,b,.,Deletionfromsparseindex,10,30,50,70,90,110,130,150,Indexupdateoperations,Deletionfromsparseindex,10,30,50,70,90,110,130,150,deleterecord40,Deletionfromsparseindex,10,30,50,70,90,110,130,150,deleterecord30,Deletionfromsparseindex,10,30,50,70,90,110,130,150,deleterecords30(a):Simplecase:nounderflow;Otherwise.(b):Borrowkeysfromanadjacentsibling(ifitdoesntbecometooempty);Else.(c):Coalescewithasiblingnode-(d):Cases(a),(b)or(c)atnon-leaf,DeletionfromB+tree,(b)BorrowkeysDelete50,1040100,10203035,4050,n=4,=min#ofkeysinaleaf=5/2=2,(c)CoalescewithasiblingDelete50,2040100,2030,4050,n=4,4045,3037,2526,2022,1014,13,1020,3040,(d)Non-leafcoalesceDelete37,n=4,25,=min#ofkeysinanon-leaf=(n+1)/2-1=3-1=2,B+treedeletionsinpractice,Often,coalescingisnotimplementedToohardandnotworthit!laterinsertionsmayreturnthenodebacktoitsrequiredminimumsizeCompromise:Tryredistributingkeyswithasibling;Ifnotpossible,leaveitthereifallaccessestotherecordsgothroughtheB-tree,canplaceatombstoneforthedeletedrecordattheleaf,WhyB-treesAreGood?,B-treeadaptswelltoinsertionsanddeletions,maintainingbalanceDBAdoesnotneedtocareaboutreorganizingsplit/mergeoperationsratherrare(Howoftenwouldnodeswith200keysbesplit?)-Accesstimesdominatedbykey-lookup(i.e.,traversalfromroottoaleaf),EfficiencyofB-trees,Forexample,assume4KBblocks,4bytekeysand8bytepointersHowmanykeysandpointersfitinanode(=indexblock)?Maxns.t.(4*n+8*(n+1)B4096B?-n=340;340keysand341pointersfitinanode-171341pointersinanon-leafnode,EfficiencyofB-trees(cont.),Assumeanaveragenodehas255pointers-athree-levelB-treehas2552=65025leaveswithtotalof2553orabout16.6millionpointerstorecords-ifrootblockkeptinmainmemory,eachrecordcanbeaccessedwith2+1diskI/Os;Ifall256internalnodesareinmainmemory,recordaccessrequires1+1diskI/Os(256x4KB=1MB;quitefeasible!),Comparison:B-treesvs.staticindexedsequentialfile,Ref#1:HeldAssumeaAVGbucketoccupancyis1.7/2=0.85ofablock,Example:2keys/block,b=4bits,n=2,i=1,0001,1111,0000,1010,insert0101,0101,nowr=41.7n-getnewbucket10anddistributekeysbtwbuckets00and10,-n=3,i=2;distributekeysbtwbuckets00and10:,000110,0101,1111,0000,1010,n=3,i=2;insert0001:,000110,0101,1111,0000,1010,canhaveoverflowchains!,n=3,i=2,000110,0101,1111,0000,1010,0001,insert0111,bucket11notinuse-redirectto01,0111,nowr=61.7n-getnewbucket11,n=4,i=2;distributekeysbtw01and11,00011011,0101,1111,0000,1010,0001,0111,ExampleContinued:Howtogrowbeyondthis?,00011011,1111,1010,0101,0101,0000,m=11(maxusedblock),i=2,.,LinearHashing,Canhandlegrowingfiles-withoutfullreorganizationsNoindirectiondirectoryofextensiblehashingCanhaveoverflowchains-butprobabilityoflongchainscanbekeptlowbycontrollingther/nfillratio(?),Summary,+,+,-,YouwalkintoaStarbucksandseeBenBitdiddleandAlyssaP.Hackerinaheatedargumentaboutextensibleandlinearhashing.Theyeachhavemadesomestatementsandhaveaskedyoutojudgewhethertheirclaimsaretrue.Theyhaveagreedtousea8bitshashkeywhereextensiblehashingusesthetopbitsandlinearhashingusesthelowerbits.Theyalsohaveagreedthateachblockcanholdupto2keysandlinearhashingmaintainsamaximumutilizationof0.75.Assumethehashtableisinitiallyemptyandthereareonlyinserts.Also,emptyblockshouldbecountedincomputationsbelow.,BenclaimsthatforanextensiblehashtableTwith5blocks(excludingthedirectory),theminimumnumberofkeysinTis5(b)AlyssaclaimsthatforalinearhashtableTwith5blocks(whichmayormaynotbeoverflowblock),theminimumnumberofkeysinTis5.(c)Benclaimsthatthemaximumnumberofkeysinalinearhashtablewith5blocks(whichmayormaynotbeoverflowblock)is10.(d)Alyssaclaimsthatthemaximumnumberofkeysinanextensiblehashtablewith5blocks(excludingthedirectory)is10.,Hashing-Howitworks-Dynamichashing-Extensible-Linear,Summary,Next:,IndexingvsHashingIndexdefinitioninSQLMultiplekeyaccess,Hashinggoodforprobesgivenkeye.g.,SELECTFROMRWHERER.A=5,IndexingvsHashing,INDEXING(IncludingBTrees)goodforRangeSearches:e.g.,SELECTFROMRWHERER.A5,IndexingvsHashing,IndexdefinitioninSQL,Createindexnameonrel(attr)Createuniqueindexnameonrel(attr),definescandidatekey,DropINDEXname,CANNOTSPECIFYTYPEOFINDEX(e.g.B-tree,Hashing,)ORPARAMETERS(e.g.LoadFactor,SizeofHash,.).atleastinSQL.,Note,Multidimensionalindexes,MotivationHash-likestructures:GridfilesPartitionedHashFunctionsTree-likestructuresMultiple-keyindexeskd-treesQuad-treesR-trees,Motivation,Manyapplicationsofdatabasesaregeographical(2-d)data.OthersinvolvelargenumberofdimensionsExamples:locationofrestaurantsinacity.Mapdata:zones,countylines,rivers,lakes,etc.(Datahasspatialextent)Salesinformationdescribedbystore,day,item,color,size,etc.Sale=pointinmultidimensionalspace.Studentdescribedbyage,zipcode,maritalstatus.Queries:RangeQuery:“findallMcDonaldrestaurantwithinagivenregion”.NearestNeighborQuery:FindthenearestMcDonaldtomyhousepartialmatchqueries,ATTRIBUTELISTMULTIKEYINDEXe.g.,CREATEINDEXfooONR(A,B,C),Note,Multi-keyIndex,Motivation:FindrecordswhereDEPT=“Toy”ANDSAL50k,StrategyI:,Useoneindex,sayDept.GetallDept=“Toy”recordsandchecktheirsalary,I1,Use2Indexes;ManipulatePointersToySal50k,StrategyII:,MultipleKeyIndexOneidea:,StrategyIII:,I1,I2,I3,Example,ExampleRecordDeptIndexSalaryIndex,Name=JoeDEPT=SalesSAL=15k,10k,15k,17k,21k,12k,15k,15k,19k,kd-trees,AmainmemorydatastructurebasedonbinarysearchtreesEachinteriornodeusesoneattribute,withonlyonevalueLevelsrotateamongtheattributesNotnecessarilybalanced.,Example,X=5,y=5,y=6,x=3,y=2,x=8,x=7,Y=2,Y=6,Interestingapplication:,GeographicDataDATA:,x,y,.,Anotherexample,kd-TreeSearch,Search:Straightforward:justdescenddownthetreelikebinarysearchtrees.Example:FindpointswithYi20FindpointswithXiFindpoints“close”tob=,kd-TreeInsertions,lookuprecordtobeinserted,reachingtheappropriateleaf.Ifenoughroomonleaf,insertintheleafblockElse,findasuitablevaluefortheappropriatedimensionandsplittheleafblock,AdaptingKDTreetoBlockModel,SimilartoB-tree,treenodessplit/mergecouldgetquitecomplexandexpensivePackmanyinteriornodes(formingasubtree)intoablock.Goal:reducethetreeheightManyinterestingpapersonhowtooptimallypacknodesintoblocks.,QuadTree,NodessplitalongalldimensionssimultaneouslyDivisionfixed:byquadrantsAsinkd-tree,wecannotmakequad-treelevelsuniformFork-dimensionalspace,eachsplittingyields2kregions.,QuadTreeExample(k=2),X=5,X=8,X=7,X=3,SW,SE,NE,NW,QuadTreeOperations,Insert:FindleafnodetowhichpointbelongsIfenoughroom,putitthereElse,maketheleafaninteriornodeandgiveitleavesforeachquadrant.Splitthepointsamongthenewleaves.Search:Straighforward:justdescenddowntherightsubtree,R-trees(rangetrees),ExtensionofB-treetomultidimensionalspace.Paginated,balanced,Cansupportbothpointdataanddatawithspatialextent(e.g.,rectangles)Groupobjectsintopossiblyoverlappingclusters(rectanglesinourcase)Searchofarangequeryproceedsalongallpathsthatoverlapwiththequery.,R0,R0,R1,R2,R1,R2,R3,R4,R3,R4,SplitNode,GivenanodesplititintotwonodesthatareeachatleasthalffullMultipleObjectives:minimizeoverlapminimizecoveredareaR-treeminimizescoveredarea,Minimizeoverlap,Minimizecoveredarea,Example,SearchwindowX-treeistraverseddownwhenoverlapwithX,Twomoretypesofmultikeyindexes,GridPa

温馨提示

  • 1. 本站所有资源如无特殊说明,都需要本地电脑安装OFFICE2007和PDF阅读器。图纸软件为CAD,CAXA,PROE,UG,SolidWorks等.压缩文件请下载最新的WinRAR软件解压。
  • 2. 本站的文档不包含任何第三方提供的附件图纸等,如果需要附件,请联系上传者。文件的所有权益归上传用户所有。
  • 3. 本站RAR压缩包中若带图纸,网页内容里面会有图纸预览,若没有图纸预览就没有图纸。
  • 4. 未经权益所有人同意不得将文件中的内容挪作商业或盈利用途。
  • 5. 人人文库网仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对用户上传分享的文档内容本身不做任何修改或编辑,并不能对任何下载内容负责。
  • 6. 下载文件中如有侵权或不适当内容,请与我们联系,我们立即纠正。
  • 7. 本站不保证下载资源的准确性、安全性和完整性, 同时也不承担用户因使用这些下载资源对自己和他人造成任何形式的伤害或损失。

评论

0/150

提交评论