Related Research on Multi-Table Join Query Optimization
吕彬 (Lü Bin), 2009.3.5

Motivation
- The figure (not reproduced) shows how the join order affects query efficiency.
- Accurate selectivity estimation must take the correlation between attributes into account.
- Key problem: computing the degree of correlation between attributes efficiently.

Agenda
- Multi-table join overview
  - Heuristic and randomized optimization for the join ordering problem. Michael Steinbrunn, et al. The VLDB Journal (1997) 6: 191-208.
- Attribute correlation detection
  - BHUNT: Automatic Discovery of Fuzzy Algebraic Constraints in Relational Data. Paul G. Brown, Peter J. Haas. Proceedings of the 29th VLDB Conference, 2003.
  - CORDS: Automatic Discovery of Correlations and Soft Functional Dependencies. Ihab F. Ilyas, Volker Markl, et al. SIGMOD 2004, June 13-18, 2004.
  - COCA: More Accurate Multidimensional Histograms out of More Accurate Correlations Detection. CAO Wei, QIN Xiongpai, WANG Shan. WAIM 2008.
- Star join
  - Star Gazing from atop your DB2 z/OS Database Server. Terry Purcell, et al. Intelligent Optimizer.
  - Star join revisited: Performance internals for cluster architectures. Josep Aguilar-Saborit. Data & Knowledge Engineering 63 (2007) 995-1013.

Multi-table join overview

Heuristic and randomized optimization for the join ordering problem
- Choosing join type based on cost
- Solution space for the join ordering problem
- Join ordering strategies
- Quantitative analysis
- Conclusion

Choosing join type based on cost
- Cost models for the nested loop join, the sort-merge join, and the hash join
- Cost terms: IO cost, elapsed time, row cost, CPU cost, base cost, scan cost, page cost

Solution space for the join ordering problem
- Left-deep trees
  - There are n! ways to allocate n base relations to the tree's leaves.
  - They give good solutions because they exploit the cost-reducing pipelining technique.
- Bushy trees: an adaptable plan enumeration strategy
  - Linear graphs: $(n^3 - n)/6$; star graphs: $(n - 1)\,2^{n-2}$.

Join ordering strategies
- Deterministic algorithms: heuristic or exhaustive search.
- Randomized algorithms: define a set of moves that constitute edges between the different solutions of the solution space, then perform a random walk along these edges according to certain rules, terminating as soon as no more applicable moves exist or a time limit is exceeded.
- Genetic algorithms: use a randomized search strategy very similar to biological evolution.
- Hybrid algorithms: combine the strategies of purely deterministic and purely randomized algorithms.

Quantitative analysis (charts not reproduced)
- Class of the join graph
- Relation cardinalities and domain sizes
- Solution spaces
- Deterministic algorithms
- Randomized and genetic algorithms
- Total running time
- Time to find the final solution

Conclusion
- Heuristic algorithms vs. randomized and genetic algorithms: heuristic algorithms compute quickly, but their results are far from the optimum; randomized and genetic algorithms are better suited for join optimization, although they require a longer running time.
- Solution space: except for the star join graph, bushy trees are preferable to left-deep processing trees.
- The extensibility of randomized and genetic algorithms.

Attribute correlation detection

Query-driven approaches
- Query workload
- Query feedback
- Multidimensional histograms
- SASH algorithm
- Advantages and disadvantages

Data-driven approaches
- Statistical: chi-square test, log-linear model
- Probabilistic: Bayesian network, Markov network
- Full scan vs. sample
- Hard FDs
- Soft FDs
- Correlation

BHUNT: Automatic Discovery of Fuzzy Algebraic Constraints in Relational Data
- Example (figure not reproduced)

Overview of BHUNT
- Algebraic constraints have the form $a_1 \oplus a_2 \in I$, required to hold for row pairs that satisfy a pairing predicate $P$. Example:
  - $a_1$: deliveries.deliveryDate
  - $a_2$: orders.shipDate
  - $\oplus$: the subtraction operator
  - $P$: orders.orderID = deliveries.orderID
  - $I = \{2, 3, 4, 5\} \cup \{12, 13, \ldots, 19\} \cup \{31, 32, 33, 34, 35\}$
- Generating candidates $C = (a_1, a_2, P, \oplus)$
  - Generating pairing rules
  - Turning pairing rules into candidates
- Identifying fuzzy constraints
  - Constructing bump intervals by applying statistical histogramming, segmentation, or clustering techniques to a sample of the column values.
  - Choosing the sample size: the sample size is selected to control the number of “exception” records that fail to satisfy the constraint.
- Exploiting the constraints
  - Identify the most useful set of constraints, and create “exception tables” to hold all of the exception records.
  - Modify the queries to incorporate the constraints; the optimizer uses the constraints to identify new, more efficient access paths.

Experimental results of BHUNT (charts not reproduced)

CORDS: Automatic Discovery of Correlations and Soft Functional Dependencies
- Soft functional dependencies $C_1 \Rightarrow C_2$: the value of C1 determines the value of C2 not with certainty, but merely with high probability.
- Hard functional dependencies: the value of C1 completely determines the value of C2.

CORDS overview
- First: enumerating and pruning
  - Pruning rules: type constraint, statistical constraint, pairing constraint, workload constraint
  - Identify trivial cases
- For each sampled column pair (flowchart): analyze the distinct values of the sampled columns to test for a soft FD. If a soft FD is found, the pair becomes a column group; otherwise, a chi-squared analysis tests for statistical dependency, and correlated pairs also become column groups (top-k pairs).

CORDS discovers three kinds of properties and relationships (|C| denotes the number of distinct values in C)
- Trivial cases: “soft” keys and “trivial” columns
- Soft FDs $C_1 \Rightarrow C_2$: strength $|C_1| / |C_1, C_2|$; CORDS estimates $|C_1|$ and $|C_1, C_2|$ from a sample.
- Correlations: statistical dependence is detected with a sample-based chi-squared analysis.

CORDS and query optimization
- Consider a query with the selection predicate $p_1 \wedge p_2$, where p1 = “Make = Honda” and p2 = “Model = Accord”.
- True selectivity of $p_1 \wedge p_2$: 1/10.
- Naive estimate: $S_{p_1 \wedge p_2} = \frac{1}{|\text{Make}|} \cdot \frac{1}{|\text{Model}|} = \frac{1}{7} \cdot \frac{1}{8} = \frac{1}{56}$.
- Adjusted estimate: $S_{p_1 \wedge p_2} \cdot \frac{|\text{Make}| \cdot |\text{Model}|}{|\text{Make}, \text{Model}|} = \frac{1}{56} \cdot \frac{56}{9} = \frac{1}{9}$, much closer to the true selectivity.

CORDS experimental results (charts not reproduced)
- Synthetic data: the Accidents database
- Benchmark data: the TPC-H benchmark
- Real-world data: a subset of the Census database and the Auto database

COCA: More Accurate Multidimensional Histograms out of More Accurate Correlations Detection
- A robust and informative metric: entropy correlation coefficients
- A novel yet simple kind of multidimensional synopsis, COCA-Hist, to cope with different correlations

Entropy correlation coefficients (formula not reproduced)

Merits of COCA
- Simple and straightforward
- Accurate and robust
- Informative and unified

Different histograms for different degrees of correlation
- Histograms for value sparsity: MHIST + squeezing
- Histograms for mutual independence: AVI assumption
- Histograms for other situations: MHist-MaxDiff histograms
- Other improvements: discarding empty buckets

Experiments
- CORDS: #1 (0.39, 0.29), #3 (0.45, 0.58)
- Further results (charts not reproduced)

Star join

Star Gazing from atop your DB2 z/OS Database Server: index key feedback
- The problems are:
  - the Cartesian join;
  - it is difficult to create suitable multi-column indexes unless the combinations of filtering dimensions included in the queries are known.

Star Gazing from atop your DB2 z/OS Database Server: pair-wise join (diagram not reproduced)

Star Gazing from atop your DB2 z/OS Database Server: performance (charts not reproduced)

Star join revisited: Performance internals for cluster architectures
- Cluster architectures and horizontal partitioning
  - Partitioning schemes: round robin, range partitioning, hash partitioning
  - Collocated
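The cost-based choice among the three join methods listed under “Choosing join type based on cost” can be sketched with standard textbook page-I/O approximations. This is an assumption for illustration, not the cost model of Steinbrunn et al.: only page I/Os are counted (CPU cost, elapsed time and the other cost terms are ignored), `mem` buffer pages are available (at least 3), and all function names are hypothetical.

```python
import math

def sort_cost(pages, mem):
    """External merge sort: every pass reads and writes each page once."""
    runs = math.ceil(pages / mem)      # pass 0 writes sorted runs of `mem` pages
    passes = 1
    while runs > 1:                    # each merge pass combines up to mem - 1 runs
        runs = math.ceil(runs / (mem - 1))
        passes += 1
    return 2 * pages * passes

def block_nested_loop_cost(b_outer, b_inner, mem):
    # mem - 2 buffers hold outer blocks; one is for the inner input, one for output
    return b_outer + math.ceil(b_outer / (mem - 2)) * b_inner

def sort_merge_cost(b_r, b_s, mem):
    # sort both inputs, then one merging scan over each sorted input
    return sort_cost(b_r, mem) + sort_cost(b_s, mem) + b_r + b_s

def grace_hash_cost(b_r, b_s):
    # partition both inputs (read + write), then read the partitions again to
    # build and probe; assumes every build partition fits in memory
    return 3 * (b_r + b_s)

def choose_join(b_r, b_s, mem):
    """Return (method, estimated page I/Os) for the cheapest join method."""
    costs = {
        "nested loop": block_nested_loop_cost(b_r, b_s, mem),
        "sort-merge": sort_merge_cost(b_r, b_s, mem),
        "hash": grace_hash_cost(b_r, b_s),
    }
    best = min(costs, key=costs.get)
    return best, costs[best]

print(choose_join(1000, 500, 100))  # ('hash', 4500)
print(choose_join(10, 1000, 100))   # ('nested loop', 1010)
```

With 1000 and 500 pages and 100 buffer pages the Grace hash join is cheapest; with a 10-page outer input the nested loop join wins, because the outer input fits in one memory load and the inner table is scanned only once.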
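The slide does not say what the two bushy-tree formulas count. One reading that reproduces them exactly is the number of join pairs a dynamic-programming enumerator must consider when Cartesian products are excluded: unordered pairs of disjoint, connected sets of relations linked by at least one join edge. The brute-force sketch below works under that assumption (all names are hypothetical), checks both formulas on small chain (linear) and star graphs, and prints n! alongside for comparison with the left-deep case.

```python
from itertools import combinations
from math import factorial

def is_connected(nodes, edges):
    """True if `nodes` induces a connected subgraph of the join graph."""
    nodes = set(nodes)
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for a, b in edges:
            # follow each undirected edge in whichever direction leaves u
            for x, y in ((a, b), (b, a)):
                if x == u and y in nodes and y not in seen:
                    seen.add(y)
                    stack.append(y)
    return seen == nodes

def count_join_pairs(n, edges):
    """Unordered pairs (S1, S2) of disjoint, connected relation sets that are
    linked by at least one join edge (no Cartesian products)."""
    subsets = [frozenset(s) for k in range(1, n + 1)
               for s in combinations(range(n), k) if is_connected(s, edges)]
    count = 0
    for s1, s2 in combinations(subsets, 2):
        if s1.isdisjoint(s2) and any(
                (a in s1 and b in s2) or (a in s2 and b in s1) for a, b in edges):
            count += 1
    return count

def chain_graph(n):
    return [(i, i + 1) for i in range(n - 1)]

def star_graph(n):
    return [(0, i) for i in range(1, n)]   # relation 0 plays the fact table

for n in range(2, 7):
    print(n, factorial(n),
          count_join_pairs(n, chain_graph(n)), (n**3 - n) // 6,
          count_join_pairs(n, star_graph(n)), (n - 1) * 2 ** (n - 2))
```

For n = 6 the enumeration finds 35 pairs for the chain and 80 for the star, matching $(6^3 - 6)/6$ and $5 \cdot 2^4$. The brute force is exponential, so it is only meant for checking small cases.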
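The randomized strategy described under “Join ordering strategies” (moves as edges between solutions, with a walk that stops when no improving move is found) can be illustrated by iterative improvement over left-deep join orders. This is a minimal sketch under stated assumptions: the move is a swap of two relations, a local minimum is approximated by a fixed number of consecutive failed moves, the cost is a toy sum of intermediate result sizes under an independence assumption, and the cardinalities, selectivities and function names are hypothetical. An exhaustive search over all n! orders is included as the deterministic baseline.

```python
import random
from itertools import permutations

def plan_cost(order, card, sel):
    """Toy cost: sum of intermediate result sizes of a left-deep plan,
    assuming independent join predicates (selectivity 1.0 = Cartesian product)."""
    size = card[order[0]]
    total = 0.0
    for pos in range(1, len(order)):
        r = order[pos]
        size *= card[r]
        for q in order[:pos]:
            size *= sel.get(frozenset((q, r)), 1.0)
        total += size
    return total

def exhaustive_search(rels, card, sel):
    """Deterministic baseline: try all n! left-deep orders."""
    best = min(permutations(rels), key=lambda o: plan_cost(o, card, sel))
    return list(best), plan_cost(best, card, sel)

def iterative_improvement(rels, card, sel, restarts=10, max_failures=50, seed=0):
    """Randomized search: random start, then accept random swap moves that
    lower the cost; a run stops after `max_failures` consecutive failed moves."""
    rng = random.Random(seed)
    best_order, best_cost = None, float("inf")
    for _ in range(restarts):
        order = list(rels)
        rng.shuffle(order)
        cost = plan_cost(order, card, sel)
        failures = 0
        while failures < max_failures:
            i, j = rng.sample(range(len(order)), 2)
            candidate = order[:]
            candidate[i], candidate[j] = candidate[j], candidate[i]
            candidate_cost = plan_cost(candidate, card, sel)
            if candidate_cost < cost:
                order, cost, failures = candidate, candidate_cost, 0
            else:
                failures += 1
        if cost < best_cost:
            best_order, best_cost = order, cost
    return best_order, best_cost

# Hypothetical chain query A - B - C - D.
CARD = {"A": 1000, "B": 100, "C": 10, "D": 10000}
SEL = {frozenset("AB"): 0.01, frozenset("BC"): 0.1, frozenset("CD"): 0.001}

print(exhaustive_search(list(CARD), CARD, SEL))
print(iterative_improvement(list(CARD), CARD, SEL))
```

Changing the move set (for example, exchanging only adjacent relations) or occasionally accepting uphill moves with a decreasing probability turns the same loop into other randomized variants such as simulated annealing.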
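The BHUNT steps of constructing bump intervals and collecting exception records can be sketched for the deliveryDate - shipDate example. BHUNT itself uses histogramming, segmentation or clustering; this sketch substitutes a simple gap-based segmentation, and the sample values, dates, the `max_gap` parameter and the function names are all hypothetical. The sample is chosen so that the bumps come out as the three ranges of I in the example.

```python
from datetime import date

def bump_intervals(values, max_gap=1):
    """Segment sorted sample values into [lo, hi] bumps, starting a new bump
    whenever consecutive distinct values are more than `max_gap` apart."""
    vals = sorted(set(values))
    bumps = []
    lo = hi = vals[0]
    for v in vals[1:]:
        if v - hi > max_gap:
            bumps.append((lo, hi))
            lo = v
        hi = v
    bumps.append((lo, hi))
    return bumps

def satisfies(x, bumps):
    return any(lo <= x <= hi for lo, hi in bumps)

def exception_rows(pairs, bumps):
    """Rows (a1, a2) violating a1 - a2 in I; candidates for an exception table."""
    return [(a1, a2) for a1, a2 in pairs if not satisfies((a1 - a2).days, bumps)]

# Hypothetical sample of deliveryDate - shipDate values (days) for joined rows.
SAMPLE_DAYS = [2, 3, 3, 4, 5, 12, 13, 14, 15, 16, 17, 18, 19, 31, 32, 33, 34, 35]
BUMPS = bump_intervals(SAMPLE_DAYS)
print(BUMPS)  # [(2, 5), (12, 19), (31, 35)]

ROWS = [
    (date(2003, 1, 10), date(2003, 1, 7)),   # 3 days
    (date(2003, 2, 20), date(2003, 2, 5)),   # 15 days
    (date(2003, 3, 30), date(2003, 3, 1)),   # 29 days: outside every bump
]
print(exception_rows(ROWS, BUMPS))  # only the 29-day row
```

The rows returned by `exception_rows` are the kind BHUNT would keep in an exception table, so that a query rewritten to exploit the constraint can still account for them.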
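The two sample-based tests in the CORDS flowchart, the soft-FD check via |C1|/|C1, C2| and the chi-squared test for dependence, can be sketched as follows. This is an illustration under assumptions rather than the CORDS implementation: the (Make, Model) sample and the function names are hypothetical, and details such as sample sizing and decision thresholds are not modeled.

```python
from collections import Counter

def soft_fd_strength(pairs):
    """|C1| / |C1, C2| over a sample of (c1, c2) values; 1.0 means C1 => C2
    holds exactly in the sample, values near 1.0 suggest a soft FD."""
    distinct_c1 = len({c1 for c1, _ in pairs})
    distinct_c1_c2 = len(set(pairs))
    return distinct_c1 / distinct_c1_c2

def chi_squared(pairs):
    """Pearson chi-squared statistic and degrees of freedom of the
    contingency table built from (c1, c2) pairs."""
    n = len(pairs)
    row_totals = Counter(c1 for c1, _ in pairs)
    col_totals = Counter(c2 for _, c2 in pairs)
    observed = Counter(pairs)
    stat = 0.0
    for a, row_total in row_totals.items():
        for b, col_total in col_totals.items():
            expected = row_total * col_total / n
            stat += (observed[(a, b)] - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, dof

# Hypothetical sample of (Make, Model) values.
MAKE_MODEL = [("Honda", "Accord")] * 3 + [("Honda", "Civic")] * 2 + [("Toyota", "Camry")] * 5

print(soft_fd_strength(MAKE_MODEL))                       # Make => Model: about 0.667
print(soft_fd_strength([(m, k) for k, m in MAKE_MODEL]))  # Model => Make: 1.0
print(chi_squared(MAKE_MODEL))                            # (10.0, 2)
```

On this sample Model => Make holds exactly while Make => Model does not. The statistic of 10.0 with 2 degrees of freedom exceeds the 5% critical value of about 5.99, so a test at that level would flag the pair as correlated.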
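The selectivity arithmetic in the “CORDS and query optimization” example can be checked with exact fractions. The function names are hypothetical; the distinct counts |Make| = 7, |Model| = 8 and |Make, Model| = 9 are the ones implied by the example's numbers.

```python
from fractions import Fraction

def naive_selectivity(distinct_a, distinct_b):
    """Independence and uniformity: 1/|A| * 1/|B|."""
    return Fraction(1, distinct_a) * Fraction(1, distinct_b)

def adjusted_selectivity(distinct_a, distinct_b, distinct_ab):
    """Scale the naive estimate by |A| * |B| / |A, B| (distinct-value counts)."""
    return naive_selectivity(distinct_a, distinct_b) * Fraction(distinct_a * distinct_b, distinct_ab)

print(naive_selectivity(7, 8))        # 1/56
print(adjusted_selectivity(7, 8, 9))  # 1/9
```

Algebraically the adjustment collapses to $1/|\text{Make}, \text{Model}|$: the naive factor $1/(|\text{Make}| \cdot |\text{Model}|)$ cancels against the scaling factor, so the adjusted estimate treats every distinct (Make, Model) combination as equally likely.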
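The slides name entropy correlation coefficients as COCA's correlation metric but do not reproduce the formula. The sketch below assumes the common normalized form $2\,I(X;Y) / (H(X) + H(Y))$, which is 0 for independent columns and 1 when each column determines the other. COCA's exact definition may differ, and the function names are hypothetical.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy (bits) of the empirical distribution of `values`."""
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in Counter(values).values())

def entropy_correlation_coefficient(pairs):
    """Assumed definition: 2 * I(X;Y) / (H(X) + H(Y)), in [0, 1]."""
    h_x = entropy([x for x, _ in pairs])
    h_y = entropy([y for _, y in pairs])
    if h_x + h_y == 0:
        return 0.0  # both columns constant; convention chosen for this sketch
    mutual_information = h_x + h_y - entropy(pairs)
    return 2 * mutual_information / (h_x + h_y)

identical = [(i, i) for i in range(4)]                      # Y is a copy of X
independent = [(x, y) for x in range(2) for y in range(2)]  # all combinations once
print(entropy_correlation_coefficient(identical))    # 1.0
print(entropy_correlation_coefficient(independent))  # 0.0
```

When one column determines the other but not vice versa, as in [(0, 0), (0, 1), (1, 2), (1, 3)], the value lies in between (2/3 here).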
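The pair-wise join avoids the Cartesian join of dimensions named as a problem of index key feedback: each filtered dimension is matched against the fact table through its own single-column foreign-key index, and the resulting RID lists are intersected before any fact row is fetched. The sketch below models RID lists as Python sets. The table layout, column names and functions are hypothetical, and the engine-level details in DB2 are not reproduced.

```python
from collections import defaultdict

def build_fk_index(fact, column):
    """Single-column index on a fact-table foreign key: value -> set of RIDs.
    Built on the fly here only to keep the sketch short."""
    index = defaultdict(set)
    for rid, row in enumerate(fact):
        index[row[column]].add(rid)
    return index

def pairwise_star_join(fact, dims, filters):
    """Filter each dimension separately, turn its qualifying keys into a RID
    list via the fact-table index, then AND (intersect) the RID lists."""
    rid_lists = []
    for fk, predicate in filters.items():
        index = build_fk_index(fact, fk)
        keys = [key for key, row in dims[fk].items() if predicate(row)]
        rid_lists.append(set().union(*(index.get(k, set()) for k in keys)))
    rids = set.intersection(*rid_lists) if rid_lists else set(range(len(fact)))
    # fetch only the surviving fact rows; no Cartesian product of dimensions
    return [fact[rid] for rid in sorted(rids)]

DIMS = {
    "time_id": {1: {"year": 2008}, 2: {"year": 2009}},
    "store_id": {10: {"city": "Beijing"}, 20: {"city": "Shanghai"}},
}
FACT = [
    {"time_id": 1, "store_id": 10, "amount": 5},
    {"time_id": 2, "store_id": 10, "amount": 7},
    {"time_id": 2, "store_id": 20, "amount": 9},
]
FILTERS = {
    "time_id": lambda row: row["year"] == 2009,
    "store_id": lambda row: row["city"] == "Beijing",
}
print(pairwise_star_join(FACT, DIMS, FILTERS))  # [{'time_id': 2, 'store_id': 10, 'amount': 7}]
```

Only the fact row matching both filters survives. No multi-column index covering a particular combination of dimensions is needed, because each dimension uses its own index.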
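The three horizontal partitioning schemes, and a collocated join on hash-partitioned tables, can be sketched in a few functions. The slide's last item, “Collocated”, is cut off. This sketch assumes it refers to collocated joins, where both tables are partitioned on the join key with the same function so each pair of partitions can be joined locally. Table contents and function names are hypothetical, and `zlib.crc32` is used only to get a hash that is stable across runs.

```python
import bisect
import zlib

def round_robin(rows, n):
    parts = [[] for _ in range(n)]
    for i, row in enumerate(rows):
        parts[i % n].append(row)
    return parts

def range_partition(rows, key, bounds):
    """Partition i holds bounds[i-1] <= row[key] < bounds[i] (bounds sorted)."""
    parts = [[] for _ in range(len(bounds) + 1)]
    for row in rows:
        parts[bisect.bisect_right(bounds, row[key])].append(row)
    return parts

def hash_partition(rows, key, n):
    # crc32 keeps placement stable across runs (Python's str hash is salted)
    parts = [[] for _ in range(n)]
    for row in rows:
        parts[zlib.crc32(repr(row[key]).encode()) % n].append(row)
    return parts

def collocated_join(left_parts, right_parts, key):
    """Join partition i of one table only with partition i of the other;
    valid because both sides were hash-partitioned on the join key."""
    out = []
    for left, right in zip(left_parts, right_parts):
        index = {}
        for rrow in right:
            index.setdefault(rrow[key], []).append(rrow)
        for lrow in left:
            out.extend({**lrow, **rrow} for rrow in index.get(lrow[key], []))
    return out

FACT = [{"k": i, "amount": i * 10} for i in range(10)]
DIM = [{"k": i, "label": f"d{i}"} for i in range(0, 10, 2)]

print([len(p) for p in round_robin(FACT, 3)])                # [4, 3, 3]
print([len(p) for p in range_partition(FACT, "k", [3, 7])])  # [3, 4, 3]
joined = collocated_join(hash_partition(FACT, "k", 3), hash_partition(DIM, "k", 3), "k")
print(sorted(row["k"] for row in joined))                    # [0, 2, 4, 6, 8]
```

Round robin balances partition sizes but scatters join keys. Hash or range partitioning applied to both tables with the same function and partition count puts matching keys in the same partition, which is what makes the collocated join correct.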