Research on SQL Multi-Table Join Query Optimization

Lü Bin (吕彬), 2009.3.5

Motivation
The figure (not reproduced) shows how the join order affects query efficiency. Accurate selectivity estimation must take the correlations between attributes into account.
Key problem: computing the degree of correlation between attributes efficiently.

Agenda
- Multi-table join overview
  - Heuristic and randomized optimization for the join ordering problem. Michael Steinbrunn, et al. The VLDB Journal (1997) 6: 191–208.
- Attribute correlation detection
  - BHUNT: Automatic Discovery of Fuzzy Algebraic Constraints in Relational Data. Paul G. Brown, Peter J. Haas. Proceedings of the 29th VLDB Conference, 2003.
  - CORDS: Automatic Discovery of Correlations and Soft Functional Dependencies. Ihab F. Ilyas, Volker Markl, et al. SIGMOD 2004, June 13–18, 2004.
  - COCA: More Accurate Multidimensional Histograms out of More Accurate Correlations Detection. Cao Wei, Qin Xiongpai, Wang Shan. WAIM 2008.
- Star join
  - Star Gazing from atop your DB2 z/OS Database Server. Terry Purcell, et al. Intelligent Optimizer.
  - Star join revisited: Performance internals for cluster architectures. Josep Aguilar-Saborit. Data & Knowledge Engineering 63 (2007) 995–1013.

Multi-table join overview
Heuristic and randomized optimization for the join ordering problem
- Choosing the join type based on cost
- Solution space for the join ordering problem
- Join ordering strategies
- Quantitative analysis
- Conclusion

Choosing the join type based on cost
- Cost models for the nested loop join, sort-merge join, and hash join
- Cost terms: I/O cost, elapsed time, row cost, CPU cost, base cost, scan cost, page cost

Solution space for the join ordering problem
- Left-deep trees: there are n! ways to allocate n base relations to the tree's leaves. Left-deep trees give good solutions because they can exploit the cost-reducing pipelining technique.
- Bushy trees, with an adaptable plan enumeration strategy: $(n^3 - n)/6$ for linear join graphs and $(n - 1)2^{n-2}$ for star join graphs.

Join ordering strategies
- Deterministic algorithms: heuristic or exhaustive search.
- Randomized algorithms: define a set of moves that constitute edges between the different solutions of the solution space, then perform a random walk along these edges according to certain rules, terminating as soon as no more applicable moves exist or a time limit is exceeded.
- Genetic algorithms: use a randomized search strategy very similar to biological evolution.
- Hybrid algorithms: combine the strategies of pure deterministic and pure randomized algorithms.

Quantitative analysis (figures)
- Class of the join graph
- Relation cardinalities and domain sizes
- Solution spaces
- Deterministic algorithms
- Randomized and genetic algorithms
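
A small brute-force check of the bushy-tree figures, assuming they count the join operations a bushy-tree enumerator considers when Cartesian products are excluded, i.e. unordered pairs of disjoint, connected relation sets linked by at least one join edge. The helper names (`chain_edges`, `star_edges`, `count_bushy_joins`) are illustrative and not from the source. For small n the enumeration agrees with $(n^3 - n)/6$ and $(n - 1)2^{n-2}$, which supports that reading; `factorial(n)` gives the n! left-deep leaf orders for comparison.

```python
from itertools import combinations
from math import factorial


def chain_edges(n):
    # Linear join graph: R0 - R1 - ... - R(n-1)
    return {(i, i + 1) for i in range(n - 1)}


def star_edges(n):
    # Star join graph: R0 is the hub (fact table), R1..R(n-1) are joined to it
    return {(0, i) for i in range(1, n)}


def is_connected(nodes, edges):
    """True if the relations in `nodes` form a connected subgraph."""
    start = min(nodes)
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for a, b in edges:
            v = b if a == u else a if b == u else None
            if v is not None and v in nodes and v not in seen:
                seen.add(v)
                stack.append(v)
    return seen == nodes


def count_bushy_joins(n, edges):
    """Count unordered pairs (S1, S2) of disjoint, connected relation sets that
    share a join edge (assumed meaning: joins without Cartesian products)."""
    sets = [frozenset(c) for k in range(1, n) for c in combinations(range(n), k)]
    sets = [s for s in sets if is_connected(s, edges)]
    total = 0
    for i, s1 in enumerate(sets):
        for s2 in sets[i + 1:]:
            if s1.isdisjoint(s2) and any(
                (a in s1 and b in s2) or (a in s2 and b in s1) for a, b in edges
            ):
                total += 1
    return total


for n in range(2, 8):
    print(
        f"n={n}: left-deep leaf orders={factorial(n)}, "
        f"chain={count_bushy_joins(n, chain_edges(n))} vs {(n**3 - n) // 6}, "
        f"star={count_bushy_joins(n, star_edges(n))} vs {(n - 1) * 2 ** (n - 2)}"
    )
```

Enumerating every subset is exponential, so this only serves as a check for small n, not as a plan enumerator.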

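One randomized strategy of the kind described under join ordering strategies is iterative improvement: random restarts, each followed by a walk along improving moves until none is left or a time limit expires. This sketch assumes a toy cost model (sum of intermediate result sizes with independent join selectivities; a pair without a join predicate, i.e. a Cartesian product, gets selectivity 1), uses only a hypothetical swap move that exchanges two relations in a left-deep order, and runs on hypothetical star-schema statistics. None of these choices are taken from the source.

```python
import random
import time
from itertools import permutations


def cout(order, card, sel):
    """Toy cost of a left-deep plan: sum of intermediate result sizes.
    Selectivities are assumed independent; missing pairs get 1.0."""
    size = card[order[0]]
    cost = 0.0
    for k in range(1, len(order)):
        r = order[k]
        factor = 1.0
        for q in order[:k]:
            factor *= sel.get(frozenset((q, r)), 1.0)
        size *= card[r] * factor
        cost += size
    return cost


def improving_swaps(order, card, sel):
    """All neighbours reachable by one swap move that lower the cost."""
    base = cout(order, card, sel)
    result = []
    for i in range(len(order)):
        for j in range(i + 1, len(order)):
            nb = list(order)
            nb[i], nb[j] = nb[j], nb[i]
            if cout(nb, card, sel) < base:
                result.append(nb)
    return result


def iterative_improvement(card, sel, restarts=10, time_limit=1.0, seed=0):
    """Random restarts; each walk follows random improving moves and stops at a
    local minimum (no applicable move) or when the time limit is exceeded."""
    rng = random.Random(seed)
    deadline = time.monotonic() + time_limit
    best = None
    for _ in range(restarts):
        order = list(card)
        rng.shuffle(order)
        while time.monotonic() < deadline:
            moves = improving_swaps(order, card, sel)
            if not moves:
                break
            order = rng.choice(moves)
        if best is None or cout(order, card, sel) < cout(best, card, sel):
            best = order
        if time.monotonic() >= deadline:
            break
    return best, cout(best, card, sel)


# Hypothetical star schema: fact table F and three filtered dimensions.
card = {"F": 1_000_000, "D1": 50, "D2": 200, "D3": 5}
sel = {
    frozenset(("F", "D1")): 1 / 1000,
    frozenset(("F", "D2")): 1 / 5000,
    frozenset(("F", "D3")): 1 / 10,
}
best, cost = iterative_improvement(card, sel)
optimum = min(cout(list(p), card, sel) for p in permutations(card))
print(best, cost, optimum)
```

The exhaustive `optimum` is computed only because the example has four relations; the point of a randomized search is to avoid that enumeration when n is large.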
Quantitative analysis, continued (figures)
- Total running time
- Time to find the final solution

Conclusion
- Heuristic vs. randomized and genetic algorithms: heuristic algorithms compute quickly, but their results are far from the optimum; randomized and genetic algorithms are better suited to join optimization, although they require a longer running time.
- Solution space: except for the star join graph, bushy trees are preferable to left-deep processing trees.
- The extensibility of randomized and genetic algorithms.

Attribute correlation detection

Query-driven approaches
- Query workload
- Query feedback
- Multidimensional histograms
- SASH algorithm
- Advantages and disadvantages

Data-driven approaches
- Statistical: chi-square test, log-linear model
- Probabilistic: Bayesian network, Markov network
- Full scan vs. sample
- Hard FDs, soft FDs, correlation

BHUNT: Automatic Discovery of Fuzzy Algebraic Constraints in Relational Data
Example (figure)

Overview of BHUNT: algebraic constraints
An algebraic constraint states that $a_1 \oplus a_2 \in I$ for the row pairs that satisfy the pairing rule $P$. Example:
- $a_1$: deliveries.deliveryDate
- $a_2$: orders.shipDate
- $\oplus$: the subtraction operator
- $P$: orders.orderID = deliveries.orderID
- $I = \{2, 3, 4, 5\} \cup \{12, 13, \ldots, 19\} \cup \{31, 32, 33, 34, 35\}$

Overview of BHUNT (continued)
- Generating candidates $C = (a_1, a_2, P, \oplus)$
  - Generating pairing rules
  - Turning pairing rules into candidates
- Identifying fuzzy constraints
  - Constructing bump intervals by applying statistical histogramming, segmentation, or clustering techniques to a sample of the column values
  - Choosing the sample size: the sample size is selected to control the number of "exception" records that fail to satisfy the constraint
- Exploiting the constraints
  - Identify the most useful set of constraints, and create "exception tables" to hold all of the exception records
  - Modify the queries to incorporate the constraints; the optimizer uses the constraints to identify new, more efficient access paths

Experimental results of BHUNT (figures)
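
A minimal sketch of the "identifying fuzzy constraints" step for the deliveryDate/shipDate example. It assumes a simple gap-based segmentation (sorted distinct values split wherever consecutive values are more than `max_gap` apart, and bumps supported by fewer than `min_count` sample rows discarded) as a stand-in for BHUNT's histogramming, segmentation, or clustering; the data, the sampling rule, and the parameters are hypothetical. Rows that fall outside every bump interval are collected as the exception records.

```python
from collections import Counter
from datetime import date, timedelta


def bump_intervals(sample, max_gap=1, min_count=2):
    """Gap-based segmentation (assumed stand-in for BHUNT's bump construction):
    split sorted distinct values at gaps > max_gap, keep well-supported runs."""
    counts = Counter(sample)
    values = sorted(counts)
    runs, current = [], [values[0]]
    for v in values[1:]:
        if v - current[-1] <= max_gap:
            current.append(v)
        else:
            runs.append(current)
            current = [v]
    runs.append(current)
    return [(r[0], r[-1]) for r in runs if sum(counts[v] for v in r) >= min_count]


def satisfies(x, intervals):
    return any(lo <= x <= hi for lo, hi in intervals)


# Hypothetical data: delivery lags (days) in three bumps, plus one outlier.
base_lags = [2, 3, 4, 5] + list(range(12, 20)) + list(range(31, 36))
lags = base_lags * 2 + [60]
orders = {oid: date(2003, 1, 1) + timedelta(days=oid % 7) for oid in range(len(lags))}
deliveries = {oid: orders[oid] + timedelta(days=lag) for oid, lag in enumerate(lags)}

# Candidate C = (a1, a2, P, ⊕): a1 = deliveryDate, a2 = shipDate,
# P: orders.orderID = deliveries.orderID, ⊕ = subtraction.
diff = {oid: (deliveries[oid] - orders[oid]).days for oid in orders}

sample = [diff[oid] for oid in sorted(diff)[::2]]  # every other row as the sample
intervals = bump_intervals(sample)
exceptions = [oid for oid in diff if not satisfies(diff[oid], intervals)]  # "exception table"
print(intervals)   # [(2, 5), (12, 19), (31, 35)]
print(exceptions)  # [34]
```

The recovered intervals match the set $I$ from the example, and the single outlier order ends up in the exception list rather than widening the constraint.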

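The chi-square test listed among the data-driven approaches (and used by CORDS on sampled column pairs) can be sketched as the standard Pearson test of independence on the contingency table of value pairs. The p-value uses the Wilson-Hilferty normal approximation to the chi-squared distribution, an assumption made here to stay within the standard library; CORDS' own bucketing and thresholds are not reproduced, and the car data is hypothetical.

```python
import math
from collections import Counter


def chi_squared(pairs):
    """Pearson chi-squared statistic and degrees of freedom for the
    contingency table of (column A value, column B value) pairs."""
    n = len(pairs)
    joint = Counter(pairs)
    left = Counter(a for a, _ in pairs)
    right = Counter(b for _, b in pairs)
    stat = 0.0
    for a, ca in left.items():
        for b, cb in right.items():
            expected = ca * cb / n
            stat += (joint.get((a, b), 0) - expected) ** 2 / expected
    dof = (len(left) - 1) * (len(right) - 1)
    return stat, dof


def p_value(stat, dof):
    """Approximate upper-tail p-value (Wilson-Hilferty approximation)."""
    if dof == 0:
        return 1.0
    k = dof
    z = ((stat / k) ** (1 / 3) - (1 - 2 / (9 * k))) / math.sqrt(2 / (9 * k))
    return 0.5 * math.erfc(z / math.sqrt(2))


dependent = ([("Honda", "Accord")] * 10 + [("Honda", "Civic")] * 10
             + [("Toyota", "Camry")] * 10 + [("Toyota", "Corolla")] * 10)
independent = [(m, c) for m in ("Honda", "Toyota") for c in ("red", "blue")] * 10

for name, sample in (("dependent", dependent), ("independent", independent)):
    stat, dof = chi_squared(sample)
    print(name, stat, dof, p_value(stat, dof) < 0.01)
# dependent 40.0 3 True
# independent 0.0 1 False
```

Like any Pearson test, the result is only reliable when the expected cell counts are not too small, which is one reason sample size matters for sample-based detection.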
CORDS: Automatic Discovery of Correlations and Soft Functional Dependencies
- Soft functional dependencies: $C_1 \Rightarrow C_2$ means the value of $C_1$ determines the value of $C_2$ not with certainty, but merely with high probability.
- Hard functional dependencies: the value of $C_1$ completely determines the value of $C_2$.

CORDS overview
First step: enumerating and pruning candidate column pairs
- Pruning rules: type constraint, statistical constraint, pairing constraint, workload constraint
- Identify trivial cases

CORDS overview (flowchart)
- Analyze the distinct values of the sampled columns to test for a soft FD. If a soft FD is found, the pair goes to the column group.
- Otherwise, run a chi-squared analysis to test for statistical dependency. If the columns are correlated, the pair goes to the column group (top-k pairs).

CORDS discovers three kinds of properties and relationships
- Trivial cases: "soft" keys and "trivial" columns
- Soft FDs: the strength of $C_1 \Rightarrow C_2$ is $|C_1| / |C_1, C_2|$, where $|\cdot|$ is the number of distinct values; CORDS estimates $|C_1|$ and $|C_1, C_2|$ from a sample.
- Correlations: statistical dependence is detected with a sample-based chi-squared analysis.

CORDS and query optimization
Consider a query with the selection predicate $p_1 \wedge p_2$, where $p_1$ = "Make = Honda" and $p_2$ = "Model = Accord".
- True selectivity of $p_1 \wedge p_2$: $1/10$
- Naive estimate: $S_{p_1 \wedge p_2} = \frac{1}{|\mathrm{Make}|} \cdot \frac{1}{|\mathrm{Model}|} = \frac{1}{7} \cdot \frac{1}{8} = \frac{1}{56}$
- Adjusted estimate: $S_{p_1 \wedge p_2} \cdot \frac{|\mathrm{Make}| \cdot |\mathrm{Model}|}{|\mathrm{Make}, \mathrm{Model}|} = \frac{1}{56} \cdot \frac{56}{9} = \frac{1}{9}$, far closer to the true $1/10$ than the naive estimate.

CORDS experimental results (figures)
- Synthetic data: the Accidents database
- Benchmark data: the TPC-H benchmark
- Real-world data: a subset of the Census database and the Auto database

COCA: More Accurate Multidimensional Histograms out of More Accurate Correlations Detection
- A robust and informative metric: entropy correlation coefficients
- A novel yet simple kind of multidimensional synopsis, COCA-Hist, to cope with different correlations

Entropy correlation coefficients (formula shown as a figure)

Merits of COCA
- Simple and straightforward
- Accurate and robust
- Informative and unified

Different histograms for different degrees of correlation
- Value sparsity: MHIST + squeezing
- Mutual independence: AVI (attribute value independence) assumption
- Other situations: MHist-MaxDiff histograms
- Other improvements: discarding empty buckets

Experiments (figures)
- CORDS: #1 (0.39, 0.29), #3 (0.45, 0.58)

Star join

Star Gazing from atop your DB2 z/OS Database Server
- Index key feedback
- The problems: the Cartesian join, and the difficulty of creating suitable multi-column indexes unless the combinations of filtering dimensions included in the queries are known
- Pair-wise join
- Performance (figures)

Star join revisited: Performance internals for cluster architectures
- Cluster architectures and horizontal partitioning
- Partitioning schemes: round robin, range partitioning, hash partitioning
- Collocated
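
The CORDS soft-FD strength and the Make/Model selectivity adjustment described earlier can be reproduced on a tiny table. The rows are hypothetical, constructed only so that the counts match the slide (7 makes, 8 models, 9 distinct pairs, 1 Honda Accord in 10 rows), and the statistics are computed on the full table rather than on a sample as CORDS would. The naive estimate assumes uniform value frequencies, as the slide's $1/|\mathrm{Make}|$ factor implies.

```python
from fractions import Fraction

MAKE, MODEL = 0, 1  # column positions in each (make, model) row

# Hypothetical rows built to reproduce the slide's statistics.
ROWS = [
    ("Honda", "Accord"), ("Honda", "Civic"),
    ("Toyota", "Camry"), ("Toyota", "Camry"),
    ("Ford", "Focus"), ("Ford", "Ranger"), ("Mazda", "Ranger"),
    ("BMW", "3 Series"), ("Audi", "A4"), ("Volvo", "S60"),
]


def n_distinct(rows, *cols):
    """|C1| or |C1, C2|: number of distinct values (value combinations)."""
    return len({tuple(row[c] for c in cols) for row in rows})


def soft_fd_strength(rows, c1, c2):
    """Strength of C1 => C2 as |C1| / |C1, C2|; 1 means a hard FD."""
    return Fraction(n_distinct(rows, c1), n_distinct(rows, c1, c2))


def naive_estimate(rows, c1, c2):
    """Selectivity of c1 = x AND c2 = y assuming independence and uniformity."""
    return Fraction(1, n_distinct(rows, c1)) * Fraction(1, n_distinct(rows, c2))


def adjusted_estimate(rows, c1, c2):
    """Naive estimate scaled by |C1| * |C2| / |C1, C2|."""
    d1, d2 = n_distinct(rows, c1), n_distinct(rows, c2)
    return naive_estimate(rows, c1, c2) * Fraction(d1 * d2, n_distinct(rows, c1, c2))


true_sel = Fraction(sum(row == ("Honda", "Accord") for row in ROWS), len(ROWS))
print(true_sel, naive_estimate(ROWS, MAKE, MODEL), adjusted_estimate(ROWS, MAKE, MODEL))
# prints: 1/10 1/56 1/9
print(soft_fd_strength(ROWS, MODEL, MAKE), soft_fd_strength(ROWS, MAKE, MODEL))
# prints: 8/9 7/9
```

Algebraically the adjusted estimate reduces to $1/|\mathrm{Make}, \mathrm{Model}|$, so its quality rests on how well the number of distinct column-pair values is estimated.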

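COCA's slides name entropy correlation coefficients as the correlation metric but show the formula only as a figure. As an assumption, this sketch uses the common definition $\rho_E = 2 - \frac{2H(X,Y)}{H(X) + H(Y)} = \frac{2I(X;Y)}{H(X) + H(Y)}$, which is 0 for independent columns and 1 when each column determines the other; the exact metric in COCA may differ, and the sample pairs are hypothetical.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (bits) of the empirical distribution of `values`."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def entropy_correlation_coefficient(pairs):
    """Assumed definition: 2 - 2 H(X,Y) / (H(X) + H(Y))."""
    hx = entropy([x for x, _ in pairs])
    hy = entropy([y for _, y in pairs])
    if hx + hy == 0:  # both columns constant: treat as uncorrelated
        return 0.0
    return 2.0 - 2.0 * entropy(pairs) / (hx + hy)


independent = [(m, c) for m in ("Honda", "Toyota") for c in ("red", "blue")]
functional = [("Honda", "Accord"), ("Toyota", "Camry")]
partial = [("Honda", "Accord"), ("Honda", "Civic"), ("Toyota", "Camry"), ("Toyota", "Camry")]

print(entropy_correlation_coefficient(independent))  # 0.0
print(entropy_correlation_coefficient(functional))   # 1.0
print(entropy_correlation_coefficient(partial))      # about 0.8
```

A value near 0 would correspond to the mutual-independence case, for which the slides list the AVI assumption; how COCA maps intermediate values to histogram types is not recoverable from these slides.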