版权说明:本文档由用户提供并上传,收益归属内容提供方,若内容存在侵权,请进行举报或认领
文档简介
1、Tutorial 7:File and Index StructureMemory HierarchyCost (money)Access (time)volatileCapacityCacheVery expensiveVery FastYesKB - MBMain memoryExpensiveFastYesMB - GBDiskInexpensiveSlowNoGB - TBTapeVery inexpensiveVery slowNoTB - PBDatabaseAccessing DiskAccess unit: disk blocks or pages (typically a b
2、lock is 1K-16Kbytes)Physical movement, random accessTime to access (read/write) a disk blockseek time + rotational delay + transfer time Data File Organization Databasea collection of filesFilea sequence of recordsRecorda sequence of fieldsStoring records in filesFixed-Length RecordsFree ListsVariab
3、le-Length RecordsByte String RepresentationReserved spacePointer MethodSlotted Page StructureFixed-Length RecordsFixed-Length RecordsRecord access is simple: n*(i-1)Problem: what if deletion?Free listiData File OrganizationVariable-Length RecordsByte String Representation : Attach an end-of-record (
4、) control character to the end of each recordProblem: deletion & record growthReserved space : use fixed-length records of a known maximum lengthProblem: waste storage spacePointer Method : Anchor block + Overflow block Data File OrganizationVariable-Length RecordsByte String Representation : Attach
5、 an end-of-record () control character to the end of each recordProblem: deletion & record growthReserved space : use fixed-length records of a known maximum lengthProblem: waste storage spacePointer Method : Anchor block + Overflow block Slotted Page Structure : Low cost for data movement within a
6、pageData File OrganizationDatabasea collection of filesFilea sequence of recordsRecorda sequence of fieldsHeaprecord can be placed anywhere in the file where there is spaceSequentialstore records in sequential orderHashinga hash function computed on some attribute of each recordExercise 1Assume that
7、 a school keeps a file with the records of its students: Student (sid:4 bytes, sname: 10 bytes, dept-id: 4 bytes), dept-id is the department id where a student belongs to.There exist 10,000 student records and 50 departments. A page is 128 bytes and a pointer is 4 bytes. The data file is sorted sequ
8、entially on sid.Q1: Whats the size of a record?Q2: How many pages do we need to store these student records?Q3: Given this data file, whats the cost to find a particular student?ExerciseStudent (sid:4 bytes, sname: 10 bytes, dept-id: 4 bytes)There exist 10,000 student records and 50 departments. A p
9、age is 128 bytes and a pointer is 4 bytes. The data file is sorted sequentially on sid.Whats the size of a record? 4 + 10 + 4 = 18ExerciseStudent (sid:4 bytes, sname: 10 bytes, dept-id: 4 bytes)There exist 10,000 student records and 50 departments. A page is 128 bytes and a pointer is 4 bytes. The d
10、ata file is sorted sequentially on sid.How many pages do we need to store these student records? (#records * size of record) / size of page = 10,000 * 18 / 128 = 1406.25 1407Page size: 128 bytes18 bytes18 bytes18 bytes7 records, of size 18*7=126 bytes18 bytes18 bytes18 bytesDo not allow records to c
11、ross page boundariesExerciseStudent (sid:4 bytes, sname: 10 bytes, dept-id: 4 bytes)There exist 10,000 student records and 50 departments. A page is 128 bytes and a pointer is 4 bytes. The data file is sorted sequentially on sid.How many pages do we need to store these student records? (# of records
12、 / size of page / size of record ) = (10,000 / 128 / 18 ) = 1429Page size: 128 bytes18 bytes18 bytes18 bytes7 records, of size 18*7=126 bytes18 bytes18 bytes7 recordsExerciseStudent (sid:4 bytes, sname: 10 bytes, dept-id: 4 bytes)There exist 10,000 student records and 50 departments. A page is 128 b
13、ytes and a pointer is 4 bytes. The data file is sorted sequentially on sid.Q3: Given this data file, whats the cost to find a particular student? # of pages required: 1429Binary search on sid: log2(# of pages) = log21429 = 11 IndexingSpeed up access to desired data.Search Key - attribute to set of a
14、ttributes used to look up records in a file.Two kinds of indexOrdered indexHash IndexIndexPrimary index (clustering index): in a sequentially ordered file, the index whose search key specifies the sequential order of the file.Secondary index (non-clustering index): an index whose search key specifie
15、s an order different from the sequential order of the file. Dense index: one index, one recordSparse Index: one index, many records; only applicable to clustering indexSingle level Index: index for dataMultilevel Index: index of indexExercise 2Assume that a school keeps a file with the records of it
16、s students: Student (sid:4 bytes, sname: 10 bytes, dept-id: 4 bytes), dept-id is the department id where a student belongs to.There exist 10,000 student records and 50 departments. A page is 128 bytes and a pointer is 4 bytes. The data file is sorted sequentially on sid.Q1: Given the data file only,
17、 whats the cost of finding students in a particular department (e.g., CSE)?Q2: How to improve? Exercise 2Student (sid:4 bytes, sname: 10 bytes, dept-id: 4 bytes)There exist 10,000 student records and 50 departments. A page is 128 bytes and a pointer is 4 bytes. The data file is sorted sequentially o
18、n sid.Q1: Given the data file only, whats the cost of finding students in a particular department (e.g., CSE)?Pages required to store data file is 1429.Records are sorted on sid, instead of dept-id. Given the data file, the only way is to sequentially scan it. The cost is 1429.Exercise 2Student (sid
19、:4 bytes, sname: 10 bytes, dept-id: 4 bytes)There exist 10,000 student records and 50 departments. A page is 128 bytes and a pointer is 4 bytes. The data file is sorted sequentially on sid.Q2: How to improve?Build index on dept-id. The entry for an index: (dept-id, pointer), each is of size 8 bytes.
20、Exercise 2Student (sid:4 bytes, sname: 10 bytes, dept-id: 4 bytes)There exist 10,000 student records and 50 departments. A page is 128 bytes and a pointer is 4 bytes. The data file is sorted sequentially on sid.For simplification, we assume each department has exactly 200 students, and a page can on
21、ly contain indices for students in the same department.Q2-1: Buildordered index.CSE001JohnCSECSE008MaryCSE002JohnMATH003LarryMATHMATHData file# of Pages needed:(10,000/ 128/18 ) = 1429Cost:# of pages containing students in the same department.Index file# of Pages needed:(200/ 128/8 ) *50 = 650Cost:1
22、2 + Log2650 = 22MATHExercise 2Q2-1: Buildordered index.Primary index or secondary index?Secondary index.Dense index or sparse index?Dense index.Secondary index must be dense index.CSE001JohnCSECSE008MaryCSE002JohnMATH003LarryMATHMATHMATHExercise 2Primary index or secondary index?Primary index.Dense
23、index or sparse index?Dense index.001001JohnCSE002008MaryCSE002JohnMATH003LarryMATH003008Exercise 2Primary index or secondary index?Primary index.Dense index or sparse index?Sparse index.001001JohnCSE008008MaryCSE002JohnMATH003LarryMATHExercise 2001001JohnCSE017008MaryCSE002JohnMATH003LarryMATH00100
24、200300810,000MaryCSE100001429 pages625 pages625/16= 40 page00125740/16= 3 page0013445/16= 1 page1000010000008MaryCSE01710,000MaryCSE1000000140973/16= 1 page1000010000Assume the memory can only contain one page.The cost to look up a particular student: 4 + 1 = 5(Cost on index file is 4, and cost on d
25、ata file is 1) Hash IndexHashing can be used not only for data file organization, but also for index-structure creation. A hash index organizes the search keys, with their associated record pointers, into a hash file structure.Strictly speaking, hash indices are always secondary indices. Hashing: Hash FunctionsWorst caseHash function maps all search-key values to the same bucketAccess time proportional to the number of search-key valuesData:2,7,12,17,22Hash fuction F(x)=x mod 5Bucket 0Bucket 121271722Bucket 2Bucket 3Bucket 4H
温馨提示
- 1. 本站所有资源如无特殊说明,都需要本地电脑安装OFFICE2007和PDF阅读器。图纸软件为CAD,CAXA,PROE,UG,SolidWorks等.压缩文件请下载最新的WinRAR软件解压。
- 2. 本站的文档不包含任何第三方提供的附件图纸等,如果需要附件,请联系上传者。文件的所有权益归上传用户所有。
- 3. 本站RAR压缩包中若带图纸,网页内容里面会有图纸预览,若没有图纸预览就没有图纸。
- 4. 未经权益所有人同意不得将文件中的内容挪作商业或盈利用途。
- 5. 人人文库网仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对用户上传分享的文档内容本身不做任何修改或编辑,并不能对任何下载内容负责。
- 6. 下载文件中如有侵权或不适当内容,请与我们联系,我们立即纠正。
- 7. 本站不保证下载资源的准确性、安全性和完整性, 同时也不承担用户因使用这些下载资源对自己和他人造成任何形式的伤害或损失。
最新文档
- 2026西安未央汉城医院招聘笔试模拟试题及答案解析
- 2026广东广州市天河区员村街道综合事务中心招聘环卫工人5人笔试参考题库及答案解析
- 2026天津科技大学第三批招聘96人(博士或副高及以上岗位)考试备考题库及答案解析
- 2026年山西省朔州市高职单招综合素质考试题库含答案详细解析
- 2026福建福州仓山产投集团下属福州仓山城市智能科技发展有限公司招聘2人笔试备考试题及答案解析
- 2026新疆兵团第一师七团医院招聘1人考试备考题库及答案解析
- 大连市政院2026届春季校园招聘考试备考题库及答案解析
- 2026北京金融法院事业单位招聘工作人员笔试备考试题及答案解析
- 2026广西玉林市玉州区名山街道社区卫生服务中心招聘编外人员1人笔试参考题库及答案解析
- 2026中国联通鲁甸分公司招聘2人考试备考题库及答案解析
- 劳动课自制沙拉课件
- 药膳养生鸡汤培训课件
- 监狱辅警面试题目及答案
- 医院运营数据统计分析
- 幼儿跑酷培训
- 2025至2030年中国氟化液行业市场运行态势及产业趋势研判报告
- 毕业设计(论文)-包裹分拣机械结构设计
- 徐州地铁考试题库及答案
- 国家助学贷款诚信教育主题班会
- 危重新生儿转运规范及流程
- 设计费入股合同协议
评论
0/150
提交评论