版权说明:本文档由用户提供并上传,收益归属内容提供方,若内容存在侵权,请进行举报或认领
文档简介
1、COMP2311COMP231File and Index StructureCOMP2312OutlineDisks and FilesIndexes and DatabasesHash FilesCOMP2313Disks and FilesDBMS stores information on (“hard”) disks.A disk is a sequence of bytes, each has a disk address.READ: transfer data from disk to main memory (RAM).WRITE: transfer data from RAM
2、 to disk.Data are stored and retrieved in units called disk blocks or pages.Each page has a fixed size, say 512 bytes. It contains a sequence of records.In many (not always) cases, records in a page have the same size, say 100 bytes.COMP2314Why Not Store Everything in Main Memory?Costs too much. RAM
3、 is much more expensive than disk.Main memory is volatile. We want data to be saved between runs. Typical storage hierarchy:Main memory (RAM) for currently used data.Disk for the main database (secondary storage).Tapes for archiving older versions of the data (tertiary storage).COMP2315Components of
4、 a DiskPlattersSpindleDisk headArm movementArm assemblyTracksSectorBlock size is a multiple of sector size (which is fixed).COMP2316File OrganizationDatabase is stored as a collection of files. Each file is a sequence of records. A record is a sequence of fields.one file one tableone record one tupl
5、ea record/tuple has a fixed lengthEasy to implement but limited by the file systemCOMP2317Fixed-Length RecordsSimple approach:store record i starting from byte n(i - 1), where n is the size of each record.Record access is simple but records may cross disk blocks.When record i is deleted, how do you
6、handle the released space?Shifting records:move records i+1,n to i, n-1move record n to ilink all free records on a free listiCOMP2318Sequential File OrganizationSuitable for application that require sequential processing of the entire file The records in the file are ordered by a search-keyCOMP2319
7、Sequential File Organization (cont.)Deletion use pointer chainsInsertion must locate the position in the file where the record is to be recordif there is free space insert thereif no free space, insert the record in an overflow blockIn either case, pointer chain must be updatedNeed to reorganize the
8、 file from time to time to restore sequential orderCOMP23110Clustering File OrganizationGood for queries involving depositor customer, and for queries involving one single customer and his accountsBad for queries involving only customerResult in variable size recordsSimple file structure stores each
9、 relation in a separate file can instead store several relations in one file using a clustering file organizationE.g., clustering organization of customer and depositor.1 clustercustomerdepositorCOMP23111OutlineDisks and FilesIndexes and DatabasesHash FilesHierarchy structure12Indexes and DatabasesA
10、 table(conceptual)文件纲要Index 1(Ordered indices, B-tree, hash)Index 2(Ordered indices, B-tree, hash)Records are physicallystored in a hash tablebased on a selected keyCOMP23113Basic ConceptsIndexing mechanisms speed up access to desired dataE.g. If you know the call number of a book, you can go direct
11、ly to the book shelve in the library; otherwise you have to search through all the book shelvesSearch key attributes used to look up records in a file.An index file consists of records (called index entries) of the format:关键字 物理内存COMP23114Basic ConceptsIndex files are typically much smaller than the
12、 original file Two basic kinds of indices:Ordered indices: search keys are stored in sorted orderHash indices: search keys are distributed uniformly across “buckets” using a “hash function”.COMP23115Index Evaluation MetricsHow are indexing techniques evaluated?What access operations are supported ef
13、ficiently? E.g.,records with a specified value in an attribute (equality queries)or records with an attribute value falling in a specified range of values (range queries)Access timeInsertion timeDeletion timeSpace overhead (size of indexes / size of data records)COMP23116Ordered IndicesIn an ordered
14、 index, index entries are stored based on the search key value. E.g., author catalog in library.Indexes are mostly ordered indexes except those based on hash filesCOMP23117Primary index: in a sequentially ordered file, the index whose search key specifies the sequential order of the file.Also called
15、 clustering indexThe search key of a primary index is usually but not necessarily the primary key.Secondary index: an index whose search key specifies an order different from the sequential order of the file. Also called non-clustering index.COMP23118Index Structure Design: Dense Index FilesEvery se
16、arch-key value in the data file is indexed.Data recordsDense indexCOMP23119Sparse Index FilesNot all of the search-key values are indexedAdv: reduce index size Disadv: slowerRecords in table are sorted by primary key valuesSearch is done in two steps: (i) find the largest possible key (ii) search th
17、e record forwardCOMP23120Example of Sparse Index FilesTo locate a record with search-key value K we:find index record with largest search-key value Ksearch file sequentially starting at the record to which index record pointsLess space and less maintenance overhead for insertion and deletions.Genera
18、lly slower than dense index for locating records.Good Tradeoff: sparse index with an index entry for every block in file. The disk must read a block of data into main memory anyway, so searching within the block cost little.COMP23121Multilevel IndexIf primary index does not fit in memory, access bec
19、omes expensive. (Why?)To reduce number of disk accesses to index records, treat primary index kept on disk as a sequential file and construct a sparse index on it.Outer index a sparse index of primary indexInner index the primary index fileIf even outer index is too large to fit in main memory, yet
20、another level of index can be created, and so on (multi-level indices).Indices at all levels must be updated on insertion or deletion from the file.COMP23122Multilevel Index (Cont.)IndexBlock 0IndexBlock 1Inner indexouter indexDatablock 0Datablock 1Datablock 2Datablock 3COMP23123Index Update: Deleti
21、onIf deleted record was the only record in the file with its particular search-key value, the search-key is deleted from the index also.Single-level index deletion:Dense indices similar to file record deletionCOMP23124Index Update: Deletion in Sparse IndicesSparse indices if an entry for the search
22、key exists in the index, it is deleted by replacing the entry in the index with the next search-key value in the file (in search-key order). If the next search-key value already has an index entry, the entry is deleted instead of being replaced.DowntownCOMP23125Index Update: InsertionSingle-level in
23、dex insertion:Perform a lookup using the search-key value appearing in the record to be inserted.Dense indices - if the search-key value does not appear in the index, insert it.Sparse indices - if index stores an entry for each block of the file, no change needs to be made to the index unless a new
24、block is created. In this case, the first search-key value appearing in the new block is inserted into the index.Multilevel insertion (as well as deletion) algorithms are simple extensions of the single-level algorithms.COMP23126Secondary IndicesYou can organize a file (say, into indexed sequential
25、file) based on one attribute only (e.g., emp#), but it is not adequate in practiceFrequently, one wants to find all the records whose values in a certain field (which is not the search-key of the primary index) satisfy some condition.Example 1: if the account database stored sequentially by account
26、number, we may want to find all accounts in a particular branch.Example 2: as above, but where we want to find all accounts with a specified balance or range of balances.We can have a secondary index with an index record for each search-key value: an index record points to a bucket that contains poi
27、nters to all the actual records with that particular search-key value.COMP23127Secondary Index on Balance Field of AccountNote: the file is already indexed basedon branch-nameSecondary IndexCOMP23128Primary and Secondary IndicesSecondary indices have to be dense because records are not sorted by the
28、 secondary index valuesIndices offer substantial benefits when searching for records, always select some attributes to index and re-examine the selection periodicallyWhen a file is modified, every index on the file must be updated. Updating indices imposes overhead on database modificationSequential
29、 scan using primary index is efficient, but a sequential scan using a secondary index is expensive (each record access may fetch a new block from disk.)COMP23129OutlineDisks and FilesIndexes and DatabasesHash FilesCOMP23130Hash FilesIn previous data structure course, you may learn hash table, which
30、is main-memory residentA hash method includes a hash function and a collision handling mechanismIn main memory, the unit of access is based on byte or word sizesIn disk-based hash file, the unit of access is based on disk block size (from 512 to 2048 bytes, depending on OS)COMP23131Hash FilesStatic
31、HashingDynamic HashingCOMP23132Static External HashingBucket 0Bucket iA hash file consists of M buckets of the same size: bucket0, bucket1,. bucketM-l Collisions occur when a new record hashes to a bucket that is already full. A new bucket is created and chained to the overflowed bucketTo reduce ove
32、rflow records, a hash file is typically kept 70-80% fullRecord to beinserted withkey Ki = h(K)Hashfunctionh()OverflowbucketCOMP23133Static External Hashing PropertiesThe hash function h() should distribute the records uniformly among the buckets; otherwise, search time will increase due to many over
33、flow records An ideal hash function will assign roughly the same number of records to each bucket irrespective of the actual distribution of search-key values in the fileThe number of buckets M must be fixed when the file is created; not good when the file size changes widelyAs a design criteria, yo
34、u need to determine the load factorCOMP23134Static External Hashing ExampleHash file organization of account file, using branch-name as key.An example of a hash function defined on a set of charaters in a file organization:There are 10 bucketsTake the ASCII code of each character as an integerSum up
35、 the binary representations of the characters and then applies modulo 10 to the sum to get the bucket numberCOMP23135Example of hash File OrganizationBucket 4Bucket 5Bucket 7Bucket 8COMP23136Hash IndicesHashing can be used not only for file organization, but also for index-structure creation. A hash
36、 index organizes the search keys, with their associated record pointers, into a hash file structure.Hash indices are always secondary indiceshash index(e.g., on salary)data file (e.g., hash or index sequential on emp#)COMP23137Example of hash IndexBucket 0Bucket 1Bucket 2Bucket 3Bucket 4Bucket 5Buck
37、et 6Primary indexUsed as asecondaryindexOverflowbucketHash Index on account-numberCOMP23138Hash FilesStatic HashingDynamic HashingCOMP23139Dynamic HashingGood for database that grows and shrinks in sizeAllows the hash function to be modified dynamicallyWhen the hash function takes modulo 10 in the p
38、revious example, the number of buckets is fixed to 10The trick is how to change the hash function so that the number of buckets can changeAnd at the same time, without the need of rehashing the existing records!Imagine if you change modulo 10 to modulo 13, then every existing record has to be rehash
39、ed not a good ideaCOMP23140Extendable HashingExtendable hashing - one form of dynamic hashingHashing function generates values over a large range - typically b-bit integers, with b = 32.At any time, use only a prefix of the b-bit integers to index into a table of bucket addresses. Let the length of
40、the prefix be i bits, 0 i 32Initially i = 1, meaning that it can index at most 2 bucketsWhen the 2 buckets are full, we can use 2 bits (i = 2), meaning that we can now index at most 4 buckets, and so on and so forth.i grows and shrinks as the size of the database grows and shrinks.Actual number of b
41、uckets is ij (more than one pointer to bucket j)allocate a new bucket z, and set ij and iz to the old ij+1.make the second half of the bucket address table entries pointing to j to point to zremove and reinsert each record in bucket j.recompute new bucket for Kj and insert record in the bucket (furt
42、her splitting is required if the bucket is still full)1222could be any number 1COMP23150Split in Extendable hash StructureTo split a bucket j when inserting record with search-key value Kj;If i = ij (only one pointer to bucket j)increment i and double the size of the bucket address table.Replace eac
43、h entry in the table by two entries that point to the same bucket.Re-compute new bucket address table entry for Kj, now i ij, so use the first case above. i0=2i1=2i2=2i3=2new recordCOMP23151Example: Use of Extendable Hash StructureBranch-nameh(branch-name)Brighton0010 1101 1111 1011 0010 1100 0011 0
44、000Downtown1010 0011 1010 0000 1100 0110 1001 1111Mianus1100 0111 1110 1101 1011 1111 0011 1010Perryridge1111 0001 0010 0100 1001 0011 0110 1101Redwood 0011 0101 1010 0110 1100 1001 1110 1011Round hill1101 1000 0011 1111 1001 1100 0000 0001Initial Hash structure, Bucket size=2Bucket 0hash address ta
45、ble0COMP23152Insert:0010 1101 1111 1011 0010 1100 0011 0000no bit is needed from the hash value (i=0)Brighton, A-217, 750Brighton, A-217, 7500COMP23153Insert:1010 0011 1010 0000 1100 0110 1001 1111no bit is needed from the hash value (i=0)Downtown, A-101, 500Brighton, A-217, 7500Downtown, A-101, 500Insert:bucket full, split records according to first bit (i=1)Downtown, A-101, 600COMP2315411Brighton0010 1101 1111 1011 0010 1100 0011 0000Downtown1010 0011 1010 0000 1100 0110 1001 1111Brighton, A-217, 750Downtown, A-101, 500Downtown, A-101, 600Insert:1
温馨提示
- 1. 本站所有资源如无特殊说明,都需要本地电脑安装OFFICE2007和PDF阅读器。图纸软件为CAD,CAXA,PROE,UG,SolidWorks等.压缩文件请下载最新的WinRAR软件解压。
- 2. 本站的文档不包含任何第三方提供的附件图纸等,如果需要附件,请联系上传者。文件的所有权益归上传用户所有。
- 3. 本站RAR压缩包中若带图纸,网页内容里面会有图纸预览,若没有图纸预览就没有图纸。
- 4. 未经权益所有人同意不得将文件中的内容挪作商业或盈利用途。
- 5. 人人文库网仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对用户上传分享的文档内容本身不做任何修改或编辑,并不能对任何下载内容负责。
- 6. 下载文件中如有侵权或不适当内容,请与我们联系,我们立即纠正。
- 7. 本站不保证下载资源的准确性、安全性和完整性, 同时也不承担用户因使用这些下载资源对自己和他人造成任何形式的伤害或损失。
最新文档
- 肝移植免疫抑制剂优化
- 金融数据隐私保护技术-第160篇
- 2025年木雕工艺师技能鉴定考试通知试题及答案
- 出口合同范例模板(3篇)
- 宠物抚养合同模板(3篇)
- 战区俱乐部考核制度
- 清华北大考核制度
- 环境卫生月考核制度
- ktv工程考核制度
- 医药代公司考核制度
- 神经内科卒中患者误吸风险的多维度评估
- 机加工检验员培训课件
- 上海市奉贤区2026届初三一模物理试题(含答案)
- 2025年数字货币跨境结算法律场景报告
- 医院消毒供应监测基本数据集解读与实践
- 2025年中国联通AI+研发效能度量实践报告
- 2026年新高考历史全真模拟试卷 3套(含答案解析)
- 恶性肿瘤高钙血症
- 民房火灾扑救要点与处置流程
- 安全生产自查自纠报告及整改措施
- 中小企业数字化转型城市试点实施指南
评论
0/150
提交评论