计算机算法导论_第11章.ppt_第1页
计算机算法导论_第11章.ppt_第2页
计算机算法导论_第11章.ppt_第3页
计算机算法导论_第11章.ppt_第4页
计算机算法导论_第11章.ppt_第5页
已阅读5页,还剩55页未读 继续免费阅读

下载本文档

版权说明:本文档由用户提供并上传,收益归属内容提供方,若内容存在侵权,请进行举报或认领

文档简介

Introduction to Algorithms,III Data Structures,2,Dynamic Sets,Dynamic Sets: Different from mathematical set, the sets manipulated by algorithms can grow, shrink, or otherwise change over time. Algorithms may require several different types of operations to be performed on sets. Dictionary: A dynamic set that only supports the ability to insert elements into, delete elements from, and test membership in a set.,3,Elements of a dynamic set,Typically, each element is represented by an object. An object = Key +satellite data Some dynamic sets presuppose that the keys are drawn from a totally ordered set,4,Operations on dynamic sets,Queries: simply return information about the set SEARCH(S, k), returns a pointer x to an element in S such that keyx = k, or NIL if no such element belongs to S. MINIMUM(S), returns a pointer to the element of S with the smallest key. MAXIMUM(S), returns a pointer to the element of S with the largest key. SUCCESSOR(S, x), returns a pointer to the next larger element in S, or NIL if x is the maximum element. PREDECESSOR(S, x), returns a pointer to the next smaller element in S, or NIL if x is the minimum element. Modifying operations: change the set. INSERT(S, x), augments the set S with the element pointed to by x. DELETE(S, x), given a pointer x to an element in the set S, removes x from S.,5,Overview,The time taken to execute a set operation is usually measured in terms of the size of the set. Next few lectures will focus on data structures rather than straight algorithms Chapter 10 presents the essentials of working with simple data structures such as stacks, queues, linked lists, and rooted trees. Chapter 11 introduces hash tables, which support the dictionary operations INSERT, DELETE, and SEARCH. Chapter 12 Binary search trees Chapter 13 Red-black trees, a variant of binary search trees Chapter 14, augment red-black trees to support operations other than the basic ones,11 Hash Table,7,Hash Tables,Motivation: symbol tables A compiler uses a symbol table to relate symbols to associated data Symbols: variable names, procedure names, etc. Associated data: memory location, call graph, etc. For a symbol table (also called a dictionary), we care about search, insertion, and deletion Typically dont care about sorted order,8,Symbol-table problem,The structure we will use is a hash table Supports all the above in O(1) expected time!,9,Hashing: Keys,consider all keys to be (possibly large) natural numbers How can we convert floats to natural numbers for hashing purposes? How can we convert ASCII strings to natural numbers for hashing purposes?,10,Review,Section 11.1 discusses direct addressing Section 11.2 presents the main ideas, focusing on “chaining“ as a way to handle “collisions“ Section 11.3 describes how array indices can be computed from keys using hash functions Section 11.4 looks at “open addressing,“ which is another way to deal with collisions Section 11.5 introduces “perfect hashing“,11.1 Direct-address tables,12,Direct Addressing,Suppose: The range of keys is 0m-1 Keys are distinct The idea: Set up an array T0m-1 in which Ti = x if x T and keyx = i Ti = NULL otherwise This is called a direct-address table Operations take O(1) time!,13,Direct Addressing,14,Direct Addressing,DIRECT-ADDRESS-SEARCH(T, k) return T k DIRECT-ADDRESS-INSERT(T, x) Tkeyx x DIRECT-ADDRESS-DELETE(T, x) Tkeyx NIL Each of these operations is fast: only O(1) time is required.,15,The Problem With Direct Addressing,Direct addressing works well when the range m of keys is relatively small But what if the keys are 32-bit integers? Problem 1: direct-address table will have 232 entries, more than 4 billion Problem 2: even if memory is not an issue, the time to initialize the elements to NULL may be Solution: map keys to smaller range 0m-1 This mapping is called a hash function,11.2 Hash tables,17,Hash tables,18,Hash tables,19,Hashing: Collisions,20,Hash tables,a well-designed, “random“-looking hash function can minimize the number of collisions a method for resolving the collisions that do occur.,21,Resolving Collisions,How can we solve the problem of collisions? Solution 1: chaining Solution 2: open addressing,22,Chaining,Chaining puts elements that hash to the same slot in a linked list:,T,k4,k2,k3,k1,k5,U (universe of keys),K (actual keys),k6,k8,k7,k1,k4,k5,k2,k3,k8,k6,k7,23,Chaining,How do we insert an element?,T,k4,k2,k3,k1,k5,U (universe of keys),K (actual keys),k6,k8,k7,k1,k4,k5,k2,k3,k8,k6,k7,24,Chaining,T,k4,k2,k3,k1,k5,U (universe of keys),K (actual keys),k6,k8,k7,k1,k4,k5,k2,k3,k8,k6,k7,How do we delete an element? Do we need a doubly-linked list for efficient delete?,25,Chaining,How do we search for a element with a given key?,T,k4,k2,k3,k1,k5,U (universe of keys),K (actual keys),k6,k8,k7,k1,k4,k5,k2,k3,k8,k6,k7,26,Chaining,CHAINED-HASH-INSERT(T, x) insert x at the head of list Th(keyx) CHAINED-HASH-SEARCH(T, k) search for an element with key k in list Th(k) CHAINED-HASH-DELETE(T, x) delete x from the list Th(keyx),27,Analysis of Chaining,Assume simple uniform hashing: each key in table is equally likely to be hashed to any slot Given n keys and m slots in the table: load factor = n/m = average # keys per slot What will be the average cost of an unsuccessful search for a key?,28,Analysis of Chaining,Assume simple uniform hashing: each key in table is equally likely to be hashed to any slot Given n keys and m slots in the table, load factor = n/m = average # keys per slot What will be the average cost of an unsuccessful search for a key? A: O(1+),29,Analysis of Chaining,Assume simple uniform hashing: each key in table is equally likely to be hashed to any slot Given n keys and m slots in the table, load factor = n/m = average # keys per slot What will be the average cost of an unsuccessful search for a key? A: O(1+) What will be the average cost of a successful search?,30,Analysis of Chaining,Assume simple uniform hashing: each key in table is equally likely to be hashed to any slot Given n keys and m slots in the table, the load factor = n/m = average # keys per slot What will be the average cost of an unsuccessful search for a key? A: O(1+) What will be the average cost of a successful search? A: O(1 + /2) = O(1 + ),31,Analysis of Chaining,32,Analysis of Chaining,33,Analysis of Chaining,34,Analysis of Chaining,35,Analysis of Chaining,Thus, the total time required for a successful search (including the time for computing the hash function) is (2 + /2 - /2n) = (1 + ).,36,Analysis of Chaining,So the cost of searching = O(1 + ) If the number of keys n is proportional to the number of slots in the table, what is ? A: = O(1) In other words, we can make the expected cost of searching constant if we make constant,11.3 Hash functions,38,Choosing A Hash Function,Clearly choosing the hash function well is crucial What will a worst-case hash function do? What will be the time to search in this case? What are desirable features of the hash function? Should distribute keys uniformly into slots Should not depend on patterns in the data,39,The Division Method,h(k) = k mod m In words: hash k into a table with m slots using the slot given by the remainder of k divided by m What happens to elements with adjacent values of k? What happens if m is a power of 2 (say 2P)? What if m is a power of 10? Upshot: pick table size m = prime number not too close to a power of 2 (or 10),40,The Multiplication Method,For a constant A, 0 A 1: h(k) = m (kA - kA) ,41,The Multiplication Method,For a constant A, 0 A 1: h(k) = m (kA - kA) Choose m = 2P Choose A not too close to 0 or 1 Knuth: Good choice for A = (5 - 1)/2,Fractional part of kA,42,The Multiplication Method,43,The Multiplication Method,11.4 Open addressing,45,Open Addressing,Basic idea : The hash function: h : U 0, 1, ., m - 1 0, 1, ., m - 1. For every key k, the probe sequence : h(k,0),h(k,1), ., h(k,m - 1),46,Open Addressing,Basic idea : To insert: if slot is full, try another slot, , until an open slot is found (probing) HASH-INSERT(T, k) 1 i 0 2 repeat j h(k, i) 3 if Tj = NIL 4 then Tj k 5 return j 6 else i i + 1 7 until i = m 8 error “hash table overflow“,47,Open Addressing,Basic idea : To search, follow same sequence of probes as would be used when inserting the element If reach element with correct key, return it If reach a NULL pointer, element is not in table HASH-SEARCH(T, k) 1 i 0 2 repeat j h(k, i) 3 if Tj = k 4 then return j 5 i i + 1 6 until Tj = NIL or i = m 7 return NIL,48,Open Addressing,Good for fixed sets (adding but no deletion) Example: spell checking Table neednt be much bigger than n Three techniques are commonly used to compute the probe sequences required for open ddressing: Linear Probing Quadratic Probing Double Hashing,49,Linear Probing,Linear probing : Auxiliary hash function: h : U 0, 1, ., m - 1 The hash function: h(k, i) = (h(k) + i) mod m Disadvantage: primary clustering. Long runs of occupied slots build up, increasing the average search time. Clusters arise since an empty slot preceded by i full slots gets filled next with probability (i + 1)/m.,50,Quadratic Probing,Quadratic probing the hash function: h(k, i) = (h(k) +c1 i+c2i2) mod m Disadvantage: h(k1, 0) = h(k2, 0) implies h(k1, i) = h(k2, i). This property leads to a milder form of clustering, called secondary clustering.,51,Double Hashing,Double Hashing the hash function: h(k, i) = (h1(k) + ih2(k) mod m (m2) probe sequences are used

温馨提示

  • 1. 本站所有资源如无特殊说明,都需要本地电脑安装OFFICE2007和PDF阅读器。图纸软件为CAD,CAXA,PROE,UG,SolidWorks等.压缩文件请下载最新的WinRAR软件解压。
  • 2. 本站的文档不包含任何第三方提供的附件图纸等,如果需要附件,请联系上传者。文件的所有权益归上传用户所有。
  • 3. 本站RAR压缩包中若带图纸,网页内容里面会有图纸预览,若没有图纸预览就没有图纸。
  • 4. 未经权益所有人同意不得将文件中的内容挪作商业或盈利用途。
  • 5. 人人文库网仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对用户上传分享的文档内容本身不做任何修改或编辑,并不能对任何下载内容负责。
  • 6. 下载文件中如有侵权或不适当内容,请与我们联系,我们立即纠正。
  • 7. 本站不保证下载资源的准确性、安全性和完整性, 同时也不承担用户因使用这些下载资源对自己和他人造成任何形式的伤害或损失。

评论

0/150

提交评论