High-Dimensional OLAP: A Minimal Cubing Approach

Purpose
- How can cubes be computed efficiently in high-dimensional data warehouses?
- This paper proposes a novel method that computes a thin layer of the data cube together with associated value-list indices.

Introduction
- The data cube has played an essential role in the implementation of fast OLAP operations.
- Many efficient cube computation algorithms have been proposed:
  - Multiway array aggregation
  - BUC
  - H-cubing
  - Star-cubing

Introduction (cont.)
- A traditional data warehouse may have about 10 dimensions but more than 10^9 tuples.
- In bioinformatics and text processing, data are high in dimensionality (over 100 or even 1000 dimensions) but only medium in size, e.g. around 10^6 tuples.
- Existing methods are too costly in computation time and storage space for high-dimensional OLAP.

Introduction (cont.)
- The new method is called shell fragments.
- It vertically partitions a high-dimensional data set into a set of disjoint low-dimensional data sets (fragments).
- For each fragment, its local data cube is computed offline.
- At query time, these fragments are assembled online.

Analysis: Curse of Dimensionality
- A high-dimensional data cube requires massive memory and disk space.
- Current algorithms are unable to materialize the full cube under such conditions.

Iceberg Cube
- Compute only the cuboid cells whose count or other aggregate satisfies the condition: HAVING COUNT(*) >= minsup
- Motivation:
  - Only a small portion of cube cells may be "above the water" in a sparse cube.
  - Only "interesting" data, i.e. data above a certain threshold, is calculated.

Problems of the Iceberg Cube
- First, if a high-dimensional cell has support that already passes the iceberg threshold, it cannot be pruned by the iceberg condition and will still generate a huge number of cells. For example, a base cuboid cell "(a1, a2, ..., a60) : 5" (i.e., with count 5) will still generate 2^60 iceberg cube cells.

Problems of the Iceberg Cube (cont.)
- Second, it is difficult to set an appropriate iceberg threshold. A threshold that is too low will still generate a huge cube, while one that is too high may invalidate many useful applications.
- Third, an iceberg cube cannot be incrementally updated.
- The same problems arise for the Dwarf cube and the quotient cube.

I/O Problem
- Accessing a fully materialized data cube incurs substantial I/O overhead.
- Cuboids are stored on disk in some fixed order, and that order might be incompatible with a particular query.

Current Partial Solution
- Compute a thin cube shell: only the cuboids with 3 dimensions or fewer in, say, a 60-dimensional cube.
- Many problems remain:
  - A large number of cuboids still have to be computed.
  - OLAP over 4 or more dimensions is not supported.
  - Drilling is not supported.

Computation Model
- A semi-online computation model with certain pre-processing.
- Observation: an OLAP query tends to
  - ignore many dimensions (i.e., treat them as irrelevant),
  - fix some dimensions (e.g., using query constants as instantiations),
  - leave only a few to be manipulated (for drilling, pivoting, etc.).

OLAP Operations

Precomputation of Shell Fragments

Inverted Index
- Lemma 1: The inverted index table uses the same amount of storage space as the original database.

Shell Fragments
- All the dimensions of a data set are partitioned into independent groups, called fragments.
- For each fragment, we compute the complete local data cube while retaining the inverted indices.
- For a data set with dimensions (A1, ..., A60), fragments of size 3 require 140 cuboids, while a cube shell of size 3 requires 36050 cuboids.

Example
- Fragments: (A, B, C) and (D, E).
- For each fragment, we compute the complete data cube by intersecting the tid-lists, e.g. for the cell {a1 b2 *}.
- Cuboid DE

Lemma 2
- Given a database of T tuples and D dimensions, the amount of memory needed to store the shell fragments of size F is O(T (D/F) (2^F - 1)).

Computing Other Measures
- Sum, average
- ID_Measure array

Algorithm for Shell Fragment Computation

Online Query Computation
- A point query seeks a specific cuboid cell in the original data space.
  - In an n-dimensional data cube (A1, A2, ..., An), a point query has the form (a1, a2, ..., an : M), where M is the inquired measure.
  - For dimensions that are irrelevant or aggregated, * is used as the value.
- A subcube query seeks a set of cuboid cells in the original data space.
  - It is a query in which at least one of the relevant dimensions is inquired, marked with ?.
  - Example: <a2, ?, c1, *, ? : count()>

Query Processing
- A query has the form <a1, a2, ..., an : M>. Each ai takes one of 3 possible values: an instantiated value, aggregate (*), or inquire (?).
- Steps for instantiated dimensions:
  - Gather all the instantiated ai's, if there are any.
  - Examine the shell fragment partitions to check which ai's are in the same fragments.
  - Retrieve the tid-lists.
  - Intersect the obtained tid-lists to derive the instantiated base table.
  - If there are no inquired dimensions, stop; otherwise continue with the steps below.
- Steps for inquired dimensions:
  - For each inquired dimension, retrieve all its possible values and their associated tid-lists.
  - Intersect them with the instantiated base table to form the local base cuboid of the inquired and instantiated dimensions.
  - Any cubing algorithm can then be employed to compute the local data cube.

Shell Fragment Grouping & Size
- Grouping: domain-specific knowledge can be used for better grouping.
- Size (F): if F is too small, the space required to store the fragment cubes will be small, but the time needed to compute queries online will be long.
- 2 <= F <= 4

Bottom-Up Computation (BUC)
- BUC (Beyer & Ramakrishnan, SIGMOD'99)
- Bottom-up vs. top-down? It depends on how you view it!
- Apriori property:
  - Aggregate the data, then move to the next level.
  - If minsup is not met, stop!
- If minsup = 1 ⇒ compute the full CUBE!

Partitioning
- Usually, the entire data set cannot fit in main memory.
- Sort distinct values, partition into blocks that fit.
- Continue processing.

Optimizations
Partiti
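Returning to the shell-fragment method, the offline and online steps described above (inverted index, per-fragment cubes built by intersecting tid-lists, and query answering by further intersection) can be illustrated with a minimal Python sketch. It is not the paper's algorithm listing. The five-tuple table is made-up toy data, the function names are hypothetical, only COUNT is supported, and the subcube query returns only the finest cells over the inquired dimensions rather than a full local cube.

```python
from itertools import combinations, product

# Toy table (illustrative data, not from the source): tid -> values for A..E.
DIMS = ["A", "B", "C", "D", "E"]
ROWS = {
    1: ("a1", "b1", "c1", "d1", "e1"),
    2: ("a1", "b2", "c1", "d2", "e1"),
    3: ("a1", "b2", "c1", "d1", "e2"),
    4: ("a2", "b1", "c1", "d1", "e2"),
    5: ("a2", "b1", "c1", "d1", "e3"),
}
FRAGMENTS = [("A", "B", "C"), ("D", "E")]  # fragment grouping is an assumption


def build_inverted_index(rows, dims):
    """Map each (dimension, value) pair to the set of tids containing it."""
    index = {}
    for tid, values in rows.items():
        for dim, value in zip(dims, values):
            index.setdefault((dim, value), set()).add(tid)
    return index


def build_fragment_cube(index, fragment):
    """Materialize all cuboids of one fragment by intersecting tid-lists.

    Keys are frozensets of (dimension, value) pairs; empty cells are dropped.
    """
    cube = {}
    for k in range(1, len(fragment) + 1):
        for dims in combinations(fragment, k):
            domains = [sorted(v for d, v in index if d == dim) for dim in dims]
            for values in product(*domains):
                cell = list(zip(dims, values))
                tids = set.intersection(*(index[c] for c in cell))
                if tids:
                    cube[frozenset(cell)] = tids
    return cube


def instantiated_tids(fragments, cubes, all_tids, instantiated):
    """Intersect one precomputed cell per fragment for the fixed dimensions."""
    result = set(all_tids)
    for fragment, cube in zip(fragments, cubes):
        cell = frozenset((d, instantiated[d]) for d in fragment if d in instantiated)
        if cell:  # fragments with no instantiated dimension are skipped
            result &= cube.get(cell, set())
    return result


def point_query(fragments, cubes, all_tids, instantiated):
    """COUNT for a point query; dimensions not listed are treated as '*'."""
    return len(instantiated_tids(fragments, cubes, all_tids, instantiated))


def subcube_query(index, fragments, cubes, all_tids, instantiated, inquired):
    """COUNT per combination of values of the inquired ('?') dimensions."""
    base = instantiated_tids(fragments, cubes, all_tids, instantiated)
    domains = [sorted(v for d, v in index if d == dim) for dim in inquired]
    cells = {}
    for values in product(*domains):
        tids = set(base)
        for dim, value in zip(inquired, values):
            tids &= index[(dim, value)]
        if tids:
            cells[values] = len(tids)
    return cells


index = build_inverted_index(ROWS, DIMS)
cubes = [build_fragment_cube(index, f) for f in FRAGMENTS]
print(point_query(FRAGMENTS, cubes, ROWS, {"A": "a2", "D": "d1"}))
print(subcube_query(index, FRAGMENTS, cubes, ROWS, {"A": "a1"}, ["E"]))
```

In this sketch, a query that touches dimensions from different fragments never needs a precomputed cuboid spanning both; it intersects one tid-list per fragment at query time, which is the trade-off between storage and online cost that the fragment size F controls.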