




版权说明:本文档由用户提供并上传,收益归属内容提供方,若内容存在侵权,请进行举报或认领
文档简介
1、Database System Concepts 5th Ed.Silberschatz, Korth and SudarshanSee www.db- for conditions on re-use Silberschatz, Korth and Sudarshan14.2Database System Concepts - 5th Edition, Oct 5, 2006.nIntroduction nTransformation of Relational ExpressionsnCatalog Information for Cost EstimationnStatistical I
2、nformation for Cost EstimationnCost-based optimizationnDynamic Programming for Choosing Evaluation PlansnMaterialized views Silberschatz, Korth and Sudarshan14.3Database System Concepts - 5th Edition, Oct 5, 2006.nAlternative ways of evaluating a given querylEquivalent expressionslDifferent algorith
3、ms for each operationSilberschatz, Korth and Sudarshan14.4Database System Concepts - 5th Edition, Oct 5, 2006.nAn evaluation plan defines exactly what algorithm is used for each operation, and how the execution of the operations is coordinated.Silberschatz, Korth and Sudarshan14.5Database System Con
4、cepts - 5th Edition, Oct 5, 2006.nCost difference between evaluation plans for a query can be enormouslE.g. seconds vs. days in some casesnSteps in cost-based query optimization1.Generate logically equivalent expressions using equivalence rules2.Annotate resultant expressions to get alternative quer
5、y plans3.Choose the cheapest plan based on estimated costnEstimation of plan cost based on:lStatistical information about relations. Examples:4number of tuples, number of distinct values for an attributelStatistics estimation for intermediate results4to compute cost of complex expressionslCost formu
6、lae for algorithms, computed using statisticsDatabase System Concepts 5th Ed.Silberschatz, Korth and SudarshanSee www.db- for conditions on re-use Silberschatz, Korth and Sudarshan14.7Database System Concepts - 5th Edition, Oct 5, 2006.nTwo relational algebra expressions are said to be equivalent if
7、 the two expressions generate the same set of tuples on every legal database instancelNote: order of tuples is irrelevantnIn SQL, inputs and outputs are multisets of tupleslTwo expressions in the multiset version of the relational algebra are said to be equivalent if the two expressions generate the
8、 same multiset of tuples on every legal database instance. nAn equivalence rule says that expressions of two forms are equivalentlCan replace expression of first form by second, or vice versaSilberschatz, Korth and Sudarshan14.8Database System Concepts - 5th Edition, Oct 5, 2006.1.Conjunctive select
9、ion operations can be deconstructed into a sequence of individual selections.2.Selection operations are commutative.3.Only the last in a sequence of projection operations is needed, the others can be omitted.4.Selections can be combined with Cartesian products and theta joins.a.(E1 X E2) = E1 E2 b.1
10、(E1 2 E2) = E1 1 2 E2 )()(1221EE)()(2121EE)()(121EELLnLLSilberschatz, Korth and Sudarshan14.9Database System Concepts - 5th Edition, Oct 5, 2006.5. Theta-join operations (and natural joins) are commutative.E1 E2 = E2 E16. (a) Natural join operations are associative: (E1 E2) E3 = E1 (E2 E3)(b) Theta
11、joins are associative in the following manner: (E1 1 E2) 2 3 E3 = E1 1 3 (E2 2 E3) where 2 involves attributes from only E2 and E3.Silberschatz, Korth and Sudarshan14.10Database System Concepts - 5th Edition, Oct 5, 2006.Silberschatz, Korth and Sudarshan14.11Database System Concepts - 5th Edition, O
12、ct 5, 2006.7. The selection operation distributes over the theta join operation under the following two conditions:(a) When all the attributes in 0 involve only the attributes of one of the expressions (E1) being joined. 0E1 E2) = (0(E1) E2 (b) When 1 involves only the attributes of E1 and 2 involve
13、s only the attributes of E2. 1 E1 E2) = (1(E1) ( (E2)Silberschatz, Korth and Sudarshan14.12Database System Concepts - 5th Edition, Oct 5, 2006.8. The projection operation distributes over the theta join operation as follows:(a) if involves only attributes from L1 L2:(b) Consider a join E1 E2. l Let
14、L1 and L2 be sets of attributes from E1 and E2, respectively. lLet L3 be attributes of E1 that are involved in join condition , but are not in L1 L2, andl let L4 be attributes of E2 that are involved in join condition , but are not in L1 L2.)()()(21212121EEEELLLL)()()(212142312121EEEELLLLLLLLSilbers
15、chatz, Korth and Sudarshan14.13Database System Concepts - 5th Edition, Oct 5, 2006.9.The set operations union and intersection are commutative E1 E2 = E2 E1 E1 E2 = E2 E1 n(set difference is not commutative).10.Set union and intersection are associative. (E1 E2) E3 = E1 (E2 E3) (E1 E2) E3 = E1 (E2 E
16、3)11.The selection operation distributes over , and . (E1 E2) = (E1) (E2) and similarly for and in place of Also: (E1 E2) = (E1) E2 and similarly for in place of , but not for 12. The projection operation distributes over union L(E1 E2) = (L(E1) (L(E2) Silberschatz, Korth and Sudarshan14.14Database
17、System Concepts - 5th Edition, Oct 5, 2006.nQuery: Find the names of all customers who have an account at some branch located in Brooklyn.customer_name(branch_city = “Brooklyn”(branch (account depositor)nTransformation using rule 7a. customer_name (branch_city =“Brooklyn” (branch) (account depositor
18、)nPerforming the selection as early as possible reduces the size of the relation to be joined. Silberschatz, Korth and Sudarshan14.15Database System Concepts - 5th Edition, Oct 5, 2006.nQuery: Find the names of all customers with an account at a Brooklyn branch whose account balance is over $1000.cu
19、stomer_name(branch_city = “Brooklyn” balance 1000 (branch (account depositor)nTransformation using join associatively (Rule 6a):customer_name(branch_city = “Brooklyn” balance 1000 (branch account) depositor) nSecond form provides an opportunity to apply the “perform selections early” rule, resulting
20、 in the subexpression branch_city = “Brooklyn” (branch) balance 1000 (account)nThus a sequence of transformations can be usefulSilberschatz, Korth and Sudarshan14.16Database System Concepts - 5th Edition, Oct 5, 2006.Silberschatz, Korth and Sudarshan14.17Database System Concepts - 5th Edition, Oct 5
21、, 2006.nWhen we compute(branch_city = “Brooklyn” (branch) account )we obtain a relation whose schema is:(branch_name, branch_city, assets, account_number, balance)nPush projections using equivalence rules 8a and 8b; eliminate unneeded attributes from intermediate results to get: customer_name ( acco
22、unt_number ( (branch_city = “Brooklyn” (branch) account ) depositor )nPerforming the projection as early as possible reduces the size of the relation to be joined. customer_name(branch_city = “Brooklyn” (branch) account) depositor) Silberschatz, Korth and Sudarshan14.18Database System Concepts - 5th
23、 Edition, Oct 5, 2006.nFor all relations r1, r2, and r3,(r1 r2) r3 = r1 (r2 r3 )(Join Associativity)nIf r2 r3 is quite large and r1 r2 is small, we choose (r1 r2) r3 so that we compute and store a smaller temporary relation.Silberschatz, Korth and Sudarshan14.19Database System Concepts - 5th Edition
24、, Oct 5, 2006.nConsider the expressioncustomer_name (branch_city = “Brooklyn” (branch) (account depositor)nCould compute account depositor first, and join result with branch_city = “Brooklyn” (branch)but account depositor is likely to be a large relation.nOnly a small fraction of the banks customers
25、 are likely to have accounts in branches located in Brooklynl it is better to compute branch_city = “Brooklyn” (branch) account first. Silberschatz, Korth and Sudarshan14.20Database System Concepts - 5th Edition, Oct 5, 2006.nQuery optimizers use equivalence rules to systematically generate expressi
26、ons equivalent to the given expressionnCan generate all equivalent expressions as follows: l Repeat4apply all applicable equivalence rules on every equivalent expression found so far4add newly generated expressions to the set of equivalent expressions Until no new equivalent expressions are generate
27、d abovenThe above approach is very expensive in space and timelTwo approaches4Optimized plan generation based on transformation rules4Special case approach for queries with only selections, projections and joinsSilberschatz, Korth and Sudarshan14.21Database System Concepts - 5th Edition, Oct 5, 2006
28、.nSpace requirements reduced by sharing common sub-expressions:lwhen E1 is generated from E2 by an equivalence rule, usually only the top level of the two are different, subtrees below are the same and can be shared using pointers4E.g. when applying join commutativitylSame sub-expression may get gen
29、erated multiple times4Detect duplicate sub-expressions and share one copynTime requirements are reduced by not generating all expressionslDynamic programming4We will study only the special case of dynamic programming for join order optimizationE1E2Silberschatz, Korth and Sudarshan14.22Database Syste
30、m Concepts - 5th Edition, Oct 5, 2006.nCost of each operator computer as described in Chapter 13lNeed statistics of input relations4E.g. number of tuples, sizes of tuplesnInputs can be results of sub-expressionslNeed to estimate statistics of expression resultslTo do so, we require additional statis
31、tics4E.g. number of distinct values for an attributenMore on cost estimation laterSilberschatz, Korth and Sudarshan14.23Database System Concepts - 5th Edition, Oct 5, 2006.nMust consider the interaction of evaluation techniques when choosing evaluation planslchoosing the cheapest algorithm for each
32、operation independently may not yield best overall algorithm. E.g.4merge-join may be costlier than hash-join, but may provide a sorted output which reduces the cost for an outer level aggregation.4nested-loop join may provide opportunity for pipeliningnPractical query optimizers incorporate elements
33、 of the following two broad approaches:1. Search all the plans and choose the best plan in a cost-based fashion.2. Uses heuristics to choose a plan.Silberschatz, Korth and Sudarshan14.24Database System Concepts - 5th Edition, Oct 5, 2006.nConsider finding the best join-order for r1 r2 . . . rn.nTher
34、e are (2(n 1)!/(n 1)! different join orders for above expression. With n = 7, the number is 665280, with n = 10, the number is greater than 176 billion!nNo need to generate all the join orders. Using dynamic programming, the least-cost join order for any subset of r1, r2, . . . rn is computed only o
35、nce and stored for future use. Silberschatz, Korth and Sudarshan14.25Database System Concepts - 5th Edition, Oct 5, 2006.nTo find best join tree for a set of n relations:lTo find best plan for a set S of n relations, consider all possible plans of the form: S1 (S S1) where S1 is any non-empty subset
36、 of S.lRecursively compute costs for joining subsets of S to find the cost of each plan. Choose the cheapest of the 2n 1 alternatives.lBase case for recursion: single relation access plan4Apply all selections on Ri using best choice of indices on RilWhen plan for any subset is computed, store it and
37、 reuse it when it is required again, instead of recomputing it4Dynamic programmingSilberschatz, Korth and Sudarshan14.26Database System Concepts - 5th Edition, Oct 5, 2006.procedure findbestplan(S)if (bestplanS.cost )return bestplanS/ else bestplanS has not been computed earlier, compute it nowif (S
38、 contains only 1 relation) set bestplanS.plan and bestplanS.cost based on the best way of accessing S /* Using selections on S and indices on S */ else for each non-empty subset S1 of S such that S1 SP1= findbestplan(S1)P2= findbestplan(S - S1)A = best algorithm for joining results of P1 and P2cost
39、= P1.cost + P2.cost + cost of Aif cost bestplanS.cost bestplanS.cost = costbestplanS.plan = “execute P1.plan; execute P2.plan; join results of P1 and P2 using A”return bestplanSSilberschatz, Korth and Sudarshan14.27Database System Concepts - 5th Edition, Oct 5, 2006.nIn left-deep join trees, the rig
40、ht-hand-side input for each join is a relation, not the result of an intermediate join.Silberschatz, Korth and Sudarshan14.28Database System Concepts - 5th Edition, Oct 5, 2006.nWith dynamic programming time complexity of optimization with bushy trees is O(3n). lWith n = 10, this number is 59000 ins
41、tead of 176 billion!nSpace complexity is O(2n) nTo find best left-deep join tree for a set of n relations:lConsider n alternatives with one relation as right-hand side input and the other relations as left-hand side input.lModify optimization algorithm:4Replace “for each non-empty subset S1 of S suc
42、h that S1 S”4By: for each relation r in S let S1 = S r .nIf only left-deep trees are considered, time complexity of finding best join order is O(n 2n)lSpace complexity remains at O(2n) nCost-based optimization is expensive, but worthwhile for queries on large datasets (typical queries have small n,
43、generally 10)Silberschatz, Korth and Sudarshan14.29Database System Concepts - 5th Edition, Oct 5, 2006.nConsider the expression (r1 r2) r3 (with A as common attribute)nAn interesting sort order is a particular sort order of tuples that could be useful for a later operationlUsing merge-join to comput
44、e r1 r2 may be costlier than hash join but generates result sorted on AlWhich in turn may make merge-join with r3 cheaper, which may reduce cost of join with r3 and minimizing overall cost lSort order may also be useful for order by and for groupingnNot sufficient to find the best join order for eac
45、h subset of the set of n given relationslmust find the best join order for each subset, for each interesting sort orderlSimple extension of earlier dynamic programming algorithmslUsually, number of interesting orders is quite small and doesnt affect time/space complexity significantlySilberschatz, K
46、orth and Sudarshan14.30Database System Concepts - 5th Edition, Oct 5, 2006.nCost-based optimization is expensive, even with dynamic programming.nSystems may use heuristics to reduce the number of choices that must be made in a cost-based fashion.nHeuristic optimization transforms the query-tree by u
47、sing a set of rules that typically (but not in all cases) improve execution performance:lPerform selection early (reduces the number of tuples)lPerform projection early (reduces the number of attributes)lPerform most restrictive selection and join operations (i.e. with smallest result size) before o
48、ther similar operations.lSome systems use only heuristics, others combine heuristics with partial cost-based optimization.Silberschatz, Korth and Sudarshan14.31Database System Concepts - 5th Edition, Oct 5, 2006.nMany optimizers considers only left-deep join orders.lPlus heuristics to push selection
49、s and projections down the query treelReduces optimization complexity and generates plans amenable to pipelined evaluation.nHeuristic optimization used in some versions of Oracle:lRepeatedly pick “best” relation to join next 4Starting from each of n starting points. Pick best among thesenIntricacies
50、 of SQL complicate query optimizationlE.g. nested subqueriesSilberschatz, Korth and Sudarshan14.32Database System Concepts - 5th Edition, Oct 5, 2006.nSome query optimizers integrate heuristic selection and the generation of alternative access plans.lFrequently used approach4heuristic rewriting of n
51、ested block structure and aggregation4followed by cost-based join-order optimization for each blocklSome optimizers (e.g. SQL Server) apply transformations to entire query and do not depend on block structurenEven with the use of heuristics, cost-based query optimization imposes a substantial overhe
52、ad.lBut is worth it for expensive querieslOptimizers often use simple heuristics for very cheap queries, and perform exhaustive enumeration for more expensive queries Database System Concepts 5th Ed.Silberschatz, Korth and SudarshanSee www.db- for conditions on re-use Silberschatz, Korth and Sudarsh
53、an14.34Database System Concepts - 5th Edition, Oct 5, 2006.nnr: number of tuples in a relation r.nbr: number of blocks containing tuples of r.nlr: size of a tuple of r.nfr: blocking factor of r i.e., the number of tuples of r that fit into one block.nV(A, r): number of distinct values that appear in
54、 r for attribute A; same as the size of A(r).nIf tuples of r are stored together physically in a file, then: rfrnrbSilberschatz, Korth and Sudarshan14.35Database System Concepts - 5th Edition, Oct 5, 2006.nHistogram on attribute age of relation personnEqui-width histogramsnEqui-depth histogramsSilbe
55、rschatz, Korth and Sudarshan14.36Database System Concepts - 5th Edition, Oct 5, 2006.n A=v(r)4nr / V(A,r) : number of records that will satisfy the selection4Equality condition on a key attribute: size estimate = 1nAV(r) (case of A V(r) is symmetric)lLet c denote the estimated number of tuples satis
56、fying the condition. lIf min(A,r) and max(A,r) are available in catalog4c = 0 if v min(A,r)4c =l If histograms available, can refine above estimatelIn absence of statistical information c is assumed to be nr / 2.),min(),max(),min(.rArArAvnrSilberschatz, Korth and Sudarshan14.37Database System Concep
57、ts - 5th Edition, Oct 5, 2006.nThe selectivity of a condition i is the probability that a tuple in the relation r satisfies i . l If si is the number of satisfying tuples in r, the selectivity of i is given by si /nr.nConjunction: 1 2. . . n (r). Assuming indepdence, estimate of tuples in the result
58、 is:nDisjunction:1 2 . . . n (r). Estimated number of tuples:nNegation: (r). Estimated number of tuples:nr size(r)nrnrnsssn . . . 21)1 (.)1 ()1 (121rnrrrnsnsnsnSilberschatz, Korth and Sudarshan14.38Database System Concepts - 5th Edition, Oct 5, 2006.Running example: depositor customerCatalog informa
59、tion for join examples:nncustomer = 10,000.nfcustomer = 25, which implies that bcustomer =10000/25 = 400.nndepositor = 5000.nfdepositor = 50, which implies that bdepositor = 5000/50 = 100.nV(customer_name, depositor) = 2500, which implies that , on average, each customer has two accounts.lAlso assum
60、e that customer_name in depositor is a foreign key on customer.lV(customer_name, customer) = 10000 (primary key!)Silberschatz, Korth and Sudarshan14.39Database System Concepts - 5th Edition, Oct 5, 2006.nThe Cartesian product r x s contains nr .ns tuples; each tuple occupies sr + ss bytes.nIf R S =
温馨提示
- 1. 本站所有资源如无特殊说明,都需要本地电脑安装OFFICE2007和PDF阅读器。图纸软件为CAD,CAXA,PROE,UG,SolidWorks等.压缩文件请下载最新的WinRAR软件解压。
- 2. 本站的文档不包含任何第三方提供的附件图纸等,如果需要附件,请联系上传者。文件的所有权益归上传用户所有。
- 3. 本站RAR压缩包中若带图纸,网页内容里面会有图纸预览,若没有图纸预览就没有图纸。
- 4. 未经权益所有人同意不得将文件中的内容挪作商业或盈利用途。
- 5. 人人文库网仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对用户上传分享的文档内容本身不做任何修改或编辑,并不能对任何下载内容负责。
- 6. 下载文件中如有侵权或不适当内容,请与我们联系,我们立即纠正。
- 7. 本站不保证下载资源的准确性、安全性和完整性, 同时也不承担用户因使用这些下载资源对自己和他人造成任何形式的伤害或损失。
最新文档
- 天津英语高考试题及答案
- 沪科版七年级上册数学第一次月考全真模拟试卷(含答案)
- 皮毛微生态肺过敏关联-洞察及研究
- 中国保险中介管理办法
- 规范石材加工管理办法
- 要素资源评估管理办法
- 警车保安登记管理办法
- 自营与资产管理办法
- 中央救市措施管理办法
- 英威腾项目管理办法
- 2025年公安警种知识测试题及答案
- 抵押车贷合同(标准版)
- 2025年秋季学期教科版三年级上册科学教学计划(三篇)
- 农民公寓买卖合同协议书
- 燃气检修工模拟试题(附答案)
- 2025居间服务合同范本(合同版本)
- JJG 693-2011可燃气体检测报警器
- 机械制图(第五版)全套课件
- 人卫慕课《走进肺功能》试题答案
- 四川省扶贫和移民工作局移民安置独立评估细则-范文
- 工程项目管理课程设计实例
评论
0/150
提交评论