High-Dimensional OLAP: A Minimal Cubing Approach

Purpose
How can cubing be performed efficiently in high-dimensional data warehouses? This paper proposes a novel method that computes a thin layer of the data cube together with associated value-list indices.

Introduction
- The data cube has been playing an essential role in the implementation of fast OLAP operations.
- Many efficient cube computation algorithms have been proposed: Multiway array aggregation, BUC, H-cubing, and Star-cubing.

Introduction (cont.)
- A traditional data warehouse may have about 10 dimensions but more than $10^{9}$ tuples.
- In bioinformatics and text processing, however, data are high in dimensionality (over 100 or even 1,000 dimensions) but only medium in size, e.g., around $10^{6}$ tuples.
- Existing methods are too costly in computation time and storage space for high-dimensional OLAP.

Introduction (cont.)
A new method called shell fragment:
- Vertically partition a high-dimensional data set into a set of disjoint low-dimensional data sets (fragments).
- For each fragment, compute its local data cube offline.
- At query time, assemble these fragments online.

Analysis

Curse of Dimensionality
- A high-dimensional data cube requires massive memory and disk space.
- Current algorithms are unable to materialize the full cube under such conditions.

Iceberg Cube
- Compute only the cuboid cells whose count (or other aggregate) satisfies the condition HAVING COUNT(*) >= minsup.
- Motivation: in a sparse cube, only a small portion of cube cells may be "above the water", so only the "interesting" data (data above a certain threshold) is calculated.

Problem of Iceberg Cube
- First, if a high-dimensional cell already has support passing the iceberg threshold, it cannot be pruned by the iceberg condition and will still generate a huge number of cells. For example, the base cuboid cell "(a1, a2, ..., a60): 5" (i.e., with count 5) will still generate $2^{60}$ iceberg cube cells.

Problem of Iceberg Cube (cont.)
- Second, it is difficult to set an appropriate iceberg threshold: a threshold that is too low will still generate a huge cube, while one that is too high may invalidate many useful applications.
- Third, an iceberg cube cannot be incrementally updated.
- The same situation occurs with Dwarf and the quotient cube.

I/O Problem
- Accessing a fully materialized data cube incurs substantial I/O overhead.
- Query order might be incompatible: cuboids are stored on disk in some fixed order, and that order might be incompatible with a particular query.

Current Partial Solution
- Compute a thin cube shell, e.g., only the cuboids with 3 or fewer dimensions in a 60-dimensional cube.
- This still leaves many problems:
  - A large number of cuboids must still be computed.
  - OLAP over 4 or more dimensions is not supported.
  - Drilling is not supported.

Computation Model
- A semi-online computation model with certain pre-processing.
- Observation: an OLAP query typically:
  - ignores many dimensions (i.e., treats them as irrelevant),
  - fixes some dimensions (e.g., using query constants as instantiations), and
  - leaves only a few to be manipulated (for drilling, pivoting, etc.).

OLAP Operations

Precomputation of Shell Fragments

Inverted Index
- Lemma 1: The inverted index table uses the same amount of storage space as the original database.

Shell Fragments
- All the dimensions of a data set are partitioned into independent groups, called fragments.
- For each fragment, we compute the complete local data cube while retaining the inverted indices.
- For (A1, ..., A60), fragments of size 3 require only 140 cuboids, whereas a cube shell of size 3 requires 36,050 cuboids.

Example
- Fragments (A, B, C) and (D, E).
- For each fragment, we compute the complete data cube by intersecting the tid-lists, e.g., for the cell {a1 b2 *}.
- Cuboid DE

Lemma 2: Given a database of T tuples and D dimensions, the amount of memory needed to store the shell fragments of size F is $O\left(T \left\lceil D/F \right\rceil (2^{F} - 1)\right)$.

Computing Other Measures
- For measures such as sum and average, keep an ID_Measure array.

Algorithm for Shell Fragment Computation

Online Query Computation

Point Query
- Seeks a specific cuboid cell in the original data space.
- In an n-dimensional data cube (A1, A2, ..., An), a point query has the form <a1, a2, ..., an : M>, where M is the inquired measure.
- For dimensions that are irrelevant or aggregated, use * as the value.

Subcube Query
- Seeks a set of cuboid cells in the original data space.
- At least one of the relevant dimensions in the query is inquired, marked "?", e.g., <a2, ?, c1, *, ? : count()>.

Query Processing
- A query has the form <a1, a2, ..., an : M>, where each ai takes one of 3 possible values: an instantiated value, aggregate (*), or inquire (?).
- Steps for instantiated dimensions:
  1. Gather all the instantiated ai's, if there are any.
  2. Examine the shell fragment partitions to check which ai's are in the same fragments.
  3. Retrieve the tid-lists.
  4. Intersect the obtained tid-lists to derive the instantiated base table.
  5. If there are no inquired dimensions, stop; otherwise, continue.
- Steps for inquired dimensions:
  1. For each inquired dimension, retrieve all its possible values and their associated tid-lists.
  2. Intersect these with the instantiated base table to form the local base cuboid of the inquired and instantiated dimensions.
  3. Any cubing algorithm can then be employed to compute the local data cube.

Shell Fragment Grouping & Size
- Grouping: domain-specific knowledge can be used for better grouping.
- Size (F): if F is too small, the space required to store the fragment cubes will be small, but the time needed to compute queries online will be long. Typically $2 \le F \le 4$.

Bottom-Up Computation (BUC)
- BUC (Beyer & Ramakrishnan, SIGMOD'99)
- Bottom-up vs. top-down? It depends on how you view it!
- Apriori property: aggregate the data, then move to the next level. If minsup is not met, stop!
- If minsup = 1 ⇒ compute the full CUBE!

Partitioning
- Usually, the entire data set can't fit in main memory.
- Sort distinct values and partition them into blocks that fit.
- Continue processing.

Optimizations
- Partiti…
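The shell-fragment sizing example (60 dimensions, fragments of size 3) and the Lemma 2 bound can be checked with a few lines of arithmetic. The Python sketch below counts cuboids both ways. It assumes that the apex (all-*) cuboid is not stored, which the $2^F - 1$ term implies, and that every tid appears exactly once in each stored cuboid. The function names are illustrative only, and the $10^6$-tuple figure is borrowed from the introduction's "medium-sized" data set.

```python
from math import ceil, comb


def fragment_cuboids(num_dims: int, frag_size: int) -> int:
    """Cuboids stored by shell fragments: ceil(D/F) fragments, each holding
    its complete local cube of 2^F - 1 non-apex cuboids."""
    return ceil(num_dims / frag_size) * (2 ** frag_size - 1)


def cube_shell_cuboids(num_dims: int, shell_size: int) -> int:
    """Cuboids in a cube shell: every cuboid with 1..shell_size dimensions."""
    return sum(comb(num_dims, k) for k in range(1, shell_size + 1))


def fragment_tid_entries(num_tuples: int, num_dims: int, frag_size: int) -> int:
    """Lemma 2 style count: each tuple id lands in exactly one cell of every
    stored cuboid, so total tid-list entries = T * ceil(D/F) * (2^F - 1)."""
    return num_tuples * fragment_cuboids(num_dims, frag_size)


print(fragment_cuboids(60, 3))             # 140
print(cube_shell_cuboids(60, 3))           # 36050
print(fragment_tid_entries(10**6, 60, 3))  # 140000000
```

For a fixed F, the fragment count grows linearly in D. The size-3 shell grows roughly as $D^3/6$ because its $\binom{D}{3}$ term dominates.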
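The inverted index, the fragment precomputation and the query-processing steps described in the slides can be sketched in Python. This is a minimal in-memory illustration, not the paper's implementation:

- Tid-lists are Python frozensets.
- Dimensions are addressed by position.
- The five-tuple table and all function names are hypothetical.

The sketch evaluates the slides' subcube query <a2, ?, c1, *, ? : count()> over the example fragments (A, B, C) and (D, E). For the last step it returns only the local base cuboid over the inquired dimensions, not a full local cube.

```python
from itertools import combinations, product

# Hypothetical 5-dimensional table (A, B, C, D, E); the tid is the row position.
ROWS = [
    ("a1", "b1", "c1", "d1", "e1"),
    ("a1", "b2", "c1", "d2", "e1"),
    ("a1", "b2", "c1", "d1", "e2"),
    ("a2", "b1", "c1", "d1", "e2"),
    ("a2", "b1", "c1", "d1", "e3"),
]
FRAGMENTS = [(0, 1, 2), (3, 4)]  # fragments (A, B, C) and (D, E)


def build_inverted_index(rows):
    """Map (dim, value) -> tid-list. Each tid appears once per dimension,
    so the index is as large as the table itself (Lemma 1)."""
    index = {}
    for tid, row in enumerate(rows):
        for dim, value in enumerate(row):
            index.setdefault((dim, value), set()).add(tid)
    return {key: frozenset(tids) for key, tids in index.items()}


def values_of(index, dim):
    return sorted(value for (d, value) in index if d == dim)


def build_fragment_cube(index, fragment):
    """Complete local cube of one fragment: every non-empty cell of every
    non-apex cuboid, found by intersecting tid-lists."""
    cube = {}
    for k in range(1, len(fragment) + 1):
        for dims in combinations(fragment, k):
            for values in product(*(values_of(index, d) for d in dims)):
                tids = frozenset.intersection(
                    *(index[(d, v)] for d, v in zip(dims, values)))
                if tids:
                    cube[(dims, values)] = tids
    return cube


def answer_query(num_rows, fragments, index, cubes, instantiated, inquired):
    """instantiated: {dim: value}; inquired: dims marked '?'; others are '*'.
    Returns a count (point query) or {inquired values: count} (subcube query)."""
    tids = frozenset(range(num_rows))
    # Instantiated dims in the same fragment are resolved by one cube lookup;
    # intersecting across fragments gives the instantiated base table.
    for fragment, cube in zip(fragments, cubes):
        dims = tuple(d for d in fragment if d in instantiated)
        if dims:
            key = (dims, tuple(instantiated[d] for d in dims))
            tids &= cube.get(key, frozenset())
    if not inquired:
        return len(tids)
    # Intersect each inquired dim's tid-lists with the instantiated base table.
    result = {}
    for values in product(*(values_of(index, d) for d in inquired)):
        cell = tids
        for d, v in zip(inquired, values):
            cell &= index[(d, v)]
        if cell:
            result[values] = len(cell)
    return result


index = build_inverted_index(ROWS)
cubes = [build_fragment_cube(index, f) for f in FRAGMENTS]
print(sorted(cubes[0][((0, 1), ("a1", "b2"))]))  # cell {a1 b2 *}: [1, 2]
print(answer_query(len(ROWS), FRAGMENTS, index, cubes, {0: "a1", 3: "d1"}, []))  # 2
print(answer_query(len(ROWS), FRAGMENTS, index, cubes, {0: "a2", 2: "c1"}, [1, 4]))
# {('b1', 'e2'): 1, ('b1', 'e3'): 1}
```

The instantiated values a2 and c1 lie in the same fragment, so they are resolved by a single lookup of the precomputed cell. Only the inquired dimensions B and E fall back to raw tid-list intersection.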
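The BUC slides describe bottom-up aggregation, Apriori-style pruning with minsup, and partitioning on one dimension at a time. The recursive sketch below illustrates these ideas in a simplified in-memory form:

- It assumes the whole data set fits in main memory.
- It partitions with a dictionary instead of the sort-based partitioning into blocks described on the slides.
- The function names are hypothetical.

The final usage line reproduces the iceberg-cube problem on a small scale. A single base cell with count 5 over 10 dimensions yields $2^{10}$ cells when minsup = 5.

```python
def _buc(rows, num_dims, minsup, start, prefix, out):
    """Output the cell `prefix` (a tuple of (dim, value) pairs whose matching
    tuples are `rows`), then expand it on every dimension from `start` on."""
    out[prefix] = len(rows)  # aggregate the current partition (count measure)
    for dim in range(start, num_dims):
        # Partition the current rows on `dim`.
        parts = {}
        for row in rows:
            parts.setdefault(row[dim], []).append(row)
        for value, part in sorted(parts.items()):
            # Apriori pruning: if this cell misses minsup, so do all its
            # descendants, so the recursion stops here.
            if len(part) >= minsup:
                _buc(part, num_dims, minsup, dim + 1,
                     prefix + ((dim, value),), out)


def iceberg_cube(rows, minsup):
    """All cells with COUNT(*) >= minsup, keyed by (dim, value) pairs;
    the apex cell (all dimensions '*') has the empty tuple as its key."""
    out = {}
    if rows and len(rows) >= minsup:
        _buc(rows, len(rows[0]), minsup, 0, (), out)
    return out


rows = [("a1", "b1"), ("a1", "b2"), ("a2", "b1")]
print(len(iceberg_cube(rows, minsup=1)))  # 8: minsup = 1 gives the full cube
print(iceberg_cube(rows, minsup=2))
# {(): 3, ((0, 'a1'),): 2, ((1, 'b1'),): 2}

base_cell = tuple(f"a{i}" for i in range(1, 11))      # (a1, ..., a10)
print(len(iceberg_cube([base_cell] * 5, minsup=5)))  # 1024 == 2**10
```

Each recursive call fixes one more dimension, always to the right of the previous one. As a result, every group-by is visited once, and a pruned cell removes its entire subtree.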