Lesson 10 Data Warehouse Overview
Vocabulary, Important Sentences, Questions and Answers, Problems
The term data warehouse was coined by Bill Inmon in the early 1990s. He referred to it as being an integrated collection of information that could help companies and organizations make better decisions.
To be effective, a data warehouse had to be integrated, subject oriented, non-volatile, and time variant. In this article, I will go over all these factors in detail. If you are building a data warehouse, it is important for you to understand why they are important.
Being subject oriented means that the data will provide information about a specific subject rather than the information about the functions of a company. Because a data warehouse is subject oriented, it will allow you to analyze information that is connected to a specific subject. Being integrated means that the data that is collected within the data warehouse can come from different sources, but can be combined into one unit that is relevant and logical. Being time variant means that all the information within the data warehouse is identified with a given period of time.[1]
It is important that the information contained within a data warehouse is stable. While data can be added, it should never be deleted. This property is referred to as being non-volatile. When a company uses a data warehouse that is stable, this will allow them to get a better understanding of the operations within their company. Despite the fact that these terms were first coined in the 1990s, they are still highly accurate today. However, it should be noted that some data warehouses are volatile. The reason for this is that many modern data warehouses deal with terabytes of data. Because they must store terabytes of data, many companies are forced to delete some of their information after a certain period of time. For instance, some companies will systematically delete data that has reached three years of age. Before a data warehouse can be built, the correct data must be located. Generally, the information that will be added to the warehouse will come from daily information or historical information. The historical information may be stored in a legacy system, and is challenging to extract.
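The non-volatile property and the three-year purge mentioned in this paragraph can be sketched in Python. This is only a minimal illustration under stated assumptions: the `Warehouse` class, its `append` and `purge_older_than` methods, and the cutoff of 3 × 365 days are hypothetical names and values, not part of any product described in the text.

```python
from datetime import date, timedelta


class Warehouse:
    """Append-only store: rows can be added but are never edited in place."""

    def __init__(self):
        self._rows = []  # list of (load_date, payload) tuples

    def append(self, load_date, payload):
        self._rows.append((load_date, payload))

    def purge_older_than(self, today, max_age_days):
        """Retention policy: drop rows older than max_age_days; return the count removed."""
        cutoff = today - timedelta(days=max_age_days)
        kept = [row for row in self._rows if row[0] >= cutoff]
        removed = len(self._rows) - len(kept)
        self._rows = kept
        return removed

    def __len__(self):
        return len(self._rows)


wh = Warehouse()
wh.append(date(2020, 1, 15), {"order": 1})
wh.append(date(2023, 6, 1), {"order": 2})
wh.append(date(2024, 2, 10), {"order": 3})

# Hypothetical policy: keep roughly three years of history.
removed = wh.purge_older_than(today=date(2024, 3, 1), max_age_days=3 * 365)
```

Here only the row loaded in 2020 falls outside the retention window, so the purge is the single point at which the otherwise append-only store loses data.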
The design of the data warehouse is important as well. It is important for designers to make sure the design is consistent with the queries that will be conducted within the warehouse. To do this successfully, it is important for designers to understand the database schema. It is crucial to make sure the data warehouse is designed correctly, as it is difficult to recreate some forms of data. Another important aspect of data warehouses is data acquisition. Data acquisition can be defined as transferring data from a source to the warehouse. Data acquisition is one of the most expensive parts of building a data warehouse. This process will often be conducted with an ETL (Extract, Transform, and Load) tool.
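To make the three ETL steps concrete, here is a small Python sketch using the standard `sqlite3` module as a stand-in warehouse. Everything specific in it is assumed for illustration: the `SOURCE_ROWS` sample data, the `sales` table, and the function names `extract`, `transform`, and `load`. Commercial ETL tools do far more than this.

```python
import sqlite3

# Hypothetical source rows, as they might arrive from an operational system.
SOURCE_ROWS = [
    {"customer": " Acme ", "amount": "120.50"},
    {"customer": "Globex", "amount": "80"},
]


def extract():
    """Extract: read raw rows from the source (here, an in-memory list)."""
    return list(SOURCE_ROWS)


def transform(rows):
    """Transform: trim names and convert amounts from text to numbers."""
    return [(r["customer"].strip(), float(r["amount"])) for r in rows]


def load(conn, rows):
    """Load: write the prepared rows into the warehouse table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (customer TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, transform(extract()))
loaded = conn.execute("SELECT customer, amount FROM sales ORDER BY customer").fetchall()
```

Keeping the three steps as separate functions mirrors the way the acronym splits the work, and makes it easy to rerun the same pipeline on each later refresh.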
As of this time, there are just over 50 ETL tools being sold. It may cost a company millions of dollars in order to transfer data from sources to the warehouse. Once the initial data has been transferred to the data warehouse, the process must be repeated consistently. Data acquisition is a continuous process, and the goal of a company is to make sure the warehouse is updated on a regular basis. When the warehouse is updated, it is often hard to determine which information in the source has changed since the previous update. The process of dealing with this issue is called changed data capture. This process has become a separate field, and there are a number of products currently being sold to deal with it.
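One simple way to approach changed data capture is to compare the current source snapshot with the previous one. The Python sketch below assumes each row has a primary key and that both snapshots fit in memory; the function names `row_hash` and `capture_changes` and the sample data are hypothetical. Commercial products may rely on other mechanisms, such as timestamps or database logs, which are not shown here.

```python
import hashlib


def row_hash(row):
    """Fingerprint of a row's contents, with keys sorted for a stable order."""
    text = "|".join(f"{k}={row[k]}" for k in sorted(row))
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def capture_changes(previous, current):
    """Compare two snapshots keyed by primary key.

    Returns (inserted, updated, deleted) as sorted lists of keys.
    """
    inserted = sorted(k for k in current if k not in previous)
    deleted = sorted(k for k in previous if k not in current)
    updated = sorted(
        k for k in current
        if k in previous and row_hash(current[k]) != row_hash(previous[k])
    )
    return inserted, updated, deleted


previous = {1: {"name": "Ann", "city": "Oslo"}, 2: {"name": "Bo", "city": "Rome"}}
current = {1: {"name": "Ann", "city": "Bergen"}, 3: {"name": "Cy", "city": "Lima"}}
inserted, updated, deleted = capture_changes(previous, current)
```

Only the keys reported as inserted, updated, or deleted would then need to be sent to the warehouse, rather than the whole source table.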
It is important for data to be cleaned before it can be placed in the warehouse. The data cleansing process is usually done during the data acquisition phase. Any data that is placed in a warehouse before being cleaned will pose a danger to the system, and it cannot be used. The reason for this is that the data may not be correct if it is not cleaned, and a company may make incorrect decisions based on it. This could lead to a number of problems. For example, all the information within a data warehouse that means the same thing must be stored in the same form. If there is information that reads “MS” and “Microsoft”, even though they mean the same thing, only one of them can be used to recognize the element within the data warehouse.
1 Data Warehouse Tools
There are a number of important tools which are connected to data warehouses, and one of these is data aggregation. A data warehouse can be designed to store information based on a certain level of detail. For example, you can store data based on each transaction, or you can store it based on a summary. These are examples of data aggregation. When data is summarized, the queries will run at a much faster rate. However, some of the detail is lost when data is summarized, and this information may be important for solving a certain problem.
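The trade-off between transaction-level storage and summary storage can be seen in a few lines of Python. The sample `transactions` list and the `summarize_by_day` function are assumptions made up for this sketch.

```python
from collections import defaultdict

# Hypothetical transaction-level (detailed) records: (day, product, amount).
transactions = [
    ("2024-01-01", "widget", 10.0),
    ("2024-01-01", "gadget", 25.0),
    ("2024-01-02", "widget", 15.0),
]


def summarize_by_day(rows):
    """Aggregate detailed rows into one total per day."""
    totals = defaultdict(float)
    for day, _product, amount in rows:
        totals[day] += amount
    return dict(totals)


daily = summarize_by_day(transactions)
# The summary answers "how much per day?" quickly, but the product
# breakdown is gone: it cannot be recovered from `daily` alone.
```

A warehouse that stored only `daily` would have fewer rows to scan, but a question such as "how many widgets were sold?" could no longer be answered.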
Before you decide which one you will use, it is important to weigh your options carefully. Once you have carried out an operation, you will need to rebuild the warehouse in order to undo it. The best way to handle this situation is to make sure the data warehouse is constructed with a large amount of detail. However, the cost for this can be huge depending on the storage options you choose. Once you have filled your data warehouse with important information, you will want to use this data to help you make smart investment decisions. The tools that can allow you to do this will fall under a topic that is called business intelligence.
Business intelligence is a field which is very diverse. It comprises things such as Executive Information Systems and Decision Support Systems. Business intelligence can further be broken down into a field that is called multi-dimensional analysis tools. These are tools that will allow a user to view data from a wide variety of angles. A query tool will allow a user to send SQL queries within a warehouse to look for results. Data mining is also a field that falls under business intelligence, and will allow you to look for patterns and relationships within a data warehouse.
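As a rough illustration of sending SQL queries to a warehouse and viewing the same data from more than one angle, the Python sketch below uses the standard `sqlite3` module. The `sales` table, its columns, and the `total_by` helper are hypothetical; a real multi-dimensional analysis tool offers much richer operations.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("north", "widget", 100.0),
        ("north", "gadget", 50.0),
        ("south", "widget", 70.0),
    ],
)


def total_by(conn, dimension):
    """Sum sales along one dimension; the column name is checked against a whitelist."""
    if dimension not in ("region", "product"):
        raise ValueError(f"unknown dimension: {dimension}")
    sql = f"SELECT {dimension}, SUM(amount) FROM sales GROUP BY {dimension} ORDER BY {dimension}"
    return conn.execute(sql).fetchall()


by_region = total_by(conn, "region")
by_product = total_by(conn, "product")
```

The same three rows produce two different views, one per region and one per product, which is the basic idea behind looking at data from several angles.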
Another tool that is connected to data warehouses is data visualization. The tools that are used for data visualization will present visual models of data. This data could come in the form of intricate 3D images. The goal of data visualization is to allow the user to view trends in a way that is easier to understand than complicated models that are based on statistics. One tool that is allowing this field to advance is VRML, or Virtual Reality Modeling Language. In order for data warehouses to function properly, it is also important to place an emphasis on metadata management. Metadata can be described as being “information about information”.
Metadata must be managed when data is acquired or analyzed. Metadata will be held in a repository, and can give you important information about many of the data warehouse tools. The process of properly managing metadata has become a science in itself. If it is done properly, the company can greatly benefit. The reason why it is important is that it can allow organizations to analyze the changes that occur within database tables. This is a tool that plays an important part in the construction of a data warehouse.
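One way to picture how metadata helps analyze changes in database tables is to keep successive schema descriptions in a repository and compare them. The Python sketch below is an assumption-laden illustration: the `repository` dictionary, the `customer` table, and the `diff_schema` function are all hypothetical.

```python
def diff_schema(old, new):
    """Compare two column-name -> type mappings for one table."""
    return {
        "added": sorted(c for c in new if c not in old),
        "removed": sorted(c for c in old if c not in new),
        "retyped": sorted(c for c in new if c in old and new[c] != old[c]),
    }


# Hypothetical metadata repository: table name -> list of schema versions.
repository = {
    "customer": [
        {"id": "INTEGER", "name": "TEXT", "phone": "TEXT"},
        {"id": "INTEGER", "name": "TEXT", "email": "TEXT"},
    ]
}

versions = repository["customer"]
changes = diff_schema(versions[0], versions[1])
```

Here the metadata alone, without touching any stored rows, shows that `phone` was dropped and `email` was added between the two versions.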
Data warehousing is a field which is somewhat complicated. There are many vendors who are attempting to advertise the tools, but the cost and complexity involved with the products have not allowed them to be used by a large number of companies. Any company that is thinking of using data warehouses must make sure they have taken the time to review and understand the technology. It can only be useful if you know how to use it. Once you understand and acquire the technology, it is possible for you to gain a powerful advantage over your competitors. This has made data warehouses attractive to many companies.
One of the biggest advantages of data warehouses is that they allow you to store information that you can use to improve the marketing strategies of your company. Not only can you improve the marketing strategies, but you will also be able to make strategic decisions based on the information you have compiled and organized. With techniques such as data mining and data visualization, you will be able to discover important patterns that you didn’t know existed. The patterns that you discover can allow your company to earn large profits.
2 Data Warehousing Methods
Most organizations agree that data warehouses are a useful tool. They benefit from the ability to store and analyze data, and this can allow them to make sound business decisions. It is also important for them to make sure the correct information is published, and it should be easy to access by the people who are responsible for making decisions.
There are two elements that make up the data warehouse environment, and these are presentation and staging. The staging area is also known as the acquisition area. It is composed of ETL operations, and once the data has been prepared, it will be sent to the presentation area.
When the data is placed within the presentation area, a number of programs will analyze and review it. While many organizations agree on the overall goal of data warehouses, the approaches to building them may differ. Attempting to use data marts alone is not a good approach, because they are geared towards departments. In addition to this, attempting to use data marts alone will be inefficient, and you will run into a number of long-term problems. There are two techniques for building data warehouses that have become very popular. These are the Kimball Bus Architecture and the Corporate Information Factory.
With the Kimball technique, the raw data will be transformed and refined within the staging area. It is important to make sure the data is properly handled during this step. During the staging process, the raw data will be pulled from the source systems. While some of the staging processes may be centralized, others will be distributed. The presentation area will have a dimensional structure, and this model will hold the same information as a standard model. However, it will be easier to use, and it will display information that is summarized.
Each dimensional model is organized around a business process. Departments within the organization do not play a role in this. The data will be populated once it is placed within the dimensional warehouse, and is not dependent on the various departments that may compose an organization. When business processes have been developed within the warehouse, the system will become highly efficient. The next popular data warehouse approach that you will want to become familiar with is the Corporate Information Factory. Another name for this technique is the EDW (enterprise data warehouse) approach. The data that is extracted from the source will be coordinated.
Within the Corporate Information Factory (CIF), a standard data warehouse is used to hold data repositories, and it may also have specific data warehouses which are designed for data mining. The data marts may be designed for specific departments, and they may have summary data which is in the form of a dimensional structure. The atomic data may be obtained from the standard data warehouse. While there are some similarities between these two techniques, there are some notable differences as well.
One of the primary differences between these two techniques is the normalized data foundation. With the Kimball approach, the data structures that must be obtained before the dimensional presentation will be dependent on the source data and transformation. In most cases, the duplicate storage of data is not required in both dimensional and normalized foundations. Many of the people who choose to use a normalized data structure believe that it is faster than the dimensional structure, but they often fail to take ETL into consideration.
Another thing that separates the two data warehouse approaches is the management of atomic data. With the CIF, atomic data will be stored within a normalized data warehouse. In contrast, the Kimball method states that the atomic data should be placed within a dimensional structure. When the data is placed within a dimensional structure, it can be summarized in a wide variety of different ways.
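The idea of keeping atomic data in a dimensional structure and summarizing it in different ways can be sketched with a tiny star schema in Python's `sqlite3` module. The table names `fact_sales`, `dim_product`, and `dim_date`, and all the sample rows, are assumptions chosen for this illustration rather than anything prescribed by the Kimball method itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, month TEXT);
    -- One fact row per atomic transaction.
    CREATE TABLE fact_sales (product_id INTEGER, date_id INTEGER, amount REAL);

    INSERT INTO dim_product VALUES (1, 'tools'), (2, 'toys');
    INSERT INTO dim_date VALUES (1, '2024-01'), (2, '2024-02');
    INSERT INTO fact_sales VALUES (1, 1, 10.0), (2, 1, 5.0), (1, 2, 20.0);
    """
)

# Summarize the same atomic facts by product category...
by_category = conn.execute(
    """
    SELECT p.category, SUM(f.amount)
    FROM fact_sales AS f JOIN dim_product AS p USING (product_id)
    GROUP BY p.category ORDER BY p.category
    """
).fetchall()

# ...and by month, without storing either summary in advance.
by_month = conn.execute(
    """
    SELECT d.month, SUM(f.amount)
    FROM fact_sales AS f JOIN dim_date AS d USING (date_id)
    GROUP BY d.month ORDER BY d.month
    """
).fetchall()
```

Because the fact table keeps every transaction, any new grouping can be computed later by joining to another dimension table.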
It is important to make sure the information you have is detailed so that users will be able to ask relevant questions. While most users will not place an emphasis on the details of one atomic transaction, they may want a summary of a large number of transactions. It is important for them to have the details so that they will be able to answer important questions. The approach that you choose should be the one which best serves the needs of your company.
3 Data Warehouse Design Strategies
To build an effective data warehouse, it is important for you to understand data warehouse design principles. If your data warehouse is not built correctly, you can run into a number of different problems.
The proper methods for building a powerful data warehouse are based on information technology tactics. First off, it is important that you and your organization understand the importance of having a data warehouse. If workers feel that a data warehouse is unnecessary, they may not use it, and this could cause conflicts. Everyone in your organization should understand the importance of using the system.
After you have got your colleagues behind the concept of using a data warehouse, you will want to next focus on data integrity. You will want to avoid designing a data warehouse that will load data that is not consistent. It is also important to avoid creating a database that will replicate data. The goal of your organization should be to integrate data and create standards that will be used and followed. After data integrity, you will next want to look at implementation efficiency. This basically means that you will want to design a system that is simple to use. It doesn’t matter how well designed your data warehouse is if your workers have a hard time using it.
If your workers have a hard time using the data warehouse, it will slow down the speed and productivity of your operation. When it comes to creating a data warehouse, you will want to make it as simple as possible. All of your workers should be able to use it without problems. Implementation efficiency is a principle that naturally leads to the next topic you will want to focus on, and this is user friendliness. This is a concept that is an important part of your business. The reason for this is that end users will not utilize a program that is too difficult to use. It is important for you to keep them in mind. Use a design which is friendly and easy to learn.
Once you have designed a data warehouse that is user friendly, you will next want to look at operational efficiency. Once the data warehouse has been created, it should be able to carry out operations quickly. In addition to this, it should not have errors or other technical problems. When errors or technical problems do occur, they should be simple to fix. Another thing you will want to look at is the cost involved with supporting the system. You will want to keep these costs as low as possible.
The design principles that have been discussed in this article so far are more related to business than information technology. However, there are a number of IT design principles that you will want to follow. One of these is scalability. This is a problem that many data warehouse designers run into. The best way to deal with this issue is to create a data warehouse that is scalable from the beginning. Design it in a way which will allow it to support expansions or upgrades. You should be able to adapt it to a number of different business situations. The best data warehouses are those which are scalable.
The data warehouse that you design should fall under the guidelines of information technology standards. Every tool that you use to build your data warehouse should work well with IT standards. You will want to make sure it is designed in a way that makes it easier for your workers to use. While following the guidelines in this article won’t allow you to always be successful, it will greatly tip the odds in your favor. You should be wary of companies that promise you perfect results if you use their design methods.[2] No matter how well designed your data warehouse is, you will always run into problems. However, following the right principles will make the problems easier to recognize and solve.
When it comes to using a data warehouse, it is not a matter of “if” you will run into problems. It is a matter of “how” and “when”. When your data warehouse is well designed, you will be better equipped to solve any problems you encounter.
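Vocabulary
1. warehouse n. storehouse, depot.
2. go over: to be well received, to gain acceptance; to examine or check.
3. orient vt., vi. to familiarize, to adapt; to face toward; to determine a position or direction. n. the East, Asia.
4. variant n. variant; variety; variation. adj. different; differing; varying; various.
5. specific adj. definite, exact, detailed; particular, peculiar, specified; limited to.
6. volatile adj. flying, evaporating readily, changeable, unstable, light-hearted, explosive. n. a winged creature; a volatile substance.
7. schema n. outline, plan, diagram, model.
8. acquisition n. acquiring; something gained; a person gained; a purchase.
9. aggregation n. collection, clustering, integration, massing; an aggregate; a group.
10. strategy n. strategy, tactics, stratagem, plan of campaign; resourcefulness, artifice. Phrase: strategy and tactics.
11. intricate adj. complicated, entangled, hard to understand.
12. mart n. market; trading place.
13. repository n. warehouse, storehouse; container, museum; a person of great learning; a trusted person, confidant.
14. staging n. holding or putting on (an event); configuration, step change, stage, stage group, transport in stages; grading method.
15. populate vt. to inhabit, to settle people in; to migrate to; to colonize. Example: a densely (sparsely) populated city.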
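Important Sentences
[1] Being subject oriented means that the data will provide information about a specific subject rather than the information about the functions of a company. Because a data warehouse is subject oriented, it will allow you to analyze information that is connected to a specific subject. Being integrated means that the data that is collected within the data warehouse can come from different sources, but can be combined into one unit that is relevant and logical. Being time variant means that all the information within the data warehouse is identified with a given period of time.
Translation: “Subject oriented” means that the data will provide information about one specific subject rather than information about the running of the company. Because a data warehouse is subject oriented, it therefore allows you to analyze, in connection with a specific subject, …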