Artificial Intelligence and Data Mining Teaching Courseware, 2.datawarehou

Data Warehouse

Why Data Warehouse
The most common issue companies face when looking at data mining is that the information is not in one place. The biggest challenge business analysts face in using data mining is how to extract, integrate, cleanse, and prepare data to solve their most pressing business problems.

What Is a Data Warehouse
The idea of a data warehouse is to put a wide range of operational data from internal and external sources into one place so that it can be better utilized by executives, line-of-business managers, and other business analysts. Once the information is gathered, OLAP (on-line analytical processing) software comes into play by providing the desktop analysis tools for querying, manipulating, and reporting the data from the data warehouse.

Data Warehouse Environment
  • the source systems from which data is extracted
  • the tools used to extract data for loading the data warehouse
  • the data warehouse database itself, where the data is stored
  • the desktop query and reporting tools used for decision support

Data Warehousing Process Overview

Operational vs. Multidimensional View of Sales

Creating a Data Warehouse

The Data Warehouse
The Data Warehouse is an integrated, subject-oriented, time-variant, non-volatile database that provides support for decision making.
  • Integrated: the Data Warehouse is a centralized, consolidated database that integrates data retrieved from the entire organization.
  • Subject-oriented: the Data Warehouse data are arranged and optimized to provide answers to questions coming from diverse functional areas within a company.
  • Time-variant: the warehouse data represent the flow of data through time. The warehouse can even contain projected data.
  • Non-volatile: once data enter the Data Warehouse, they are never removed. The Data Warehouse is always growing.

Operational Database vs. Data Warehouse
Operational DB
  • Similar data can have different representations or meanings
  • Functional or process orientation
  • Current transactions
  • Frequent updating
Data Warehouse
  • Unified view of all data elements
  • Subject orientation for decision support
  • Historical information with a time dimension
  • Data are added without change

Data Mart
A data mart is a small, single-subject data warehouse subset that provides decision support to a small group of people.
Data marts can serve as a test vehicle for companies exploring the potential benefits of data warehouses. Data marts address local or departmental problems, while a data warehouse involves a company-wide effort to support decision making at all levels in the organization.

Enterprise Data Warehouse (EDW)
  • A large-scale data warehouse that is used across the enterprise for decision support
  • EDWs are used to provide data for many types of DSS, including CRM, SCM, BPM, BAM, PLM, and KMS.
DSS: decision support systems; CRM: customer relationship management; SCM: supply chain management; BPM: business performance management; BAM: business activity monitoring; PLM: product lifecycle management; KMS: knowledge management systems.

Metadata
Metadata is data about data. In a data warehouse, metadata describe the contents of the data warehouse and the manner of its use. Good metadata is essential to the effective operation of a data warehouse, and it is used in data acquisition/collection, data transformation, and data access.

The Need for Technical Metadata
The use of data warehousing and decision processing often involves a wide range of different products, and creating and maintaining the metadata for these products is time-consuming and error-prone. Automating the metadata management process and enabling the sharing of this so-called technical metadata between products can reduce both costs and errors.

The Need for Business Metadata
Business users need a good understanding of what information exists in a data warehouse. They need to understand what the information means from a business viewpoint, how it was derived, which source systems it comes from, when it was created, what pre-built reports and analyses exist for manipulating the information, and so forth.

Metadata in a Data Warehouse
Kimball lists the following types of metadata in a data warehouse:
  • Source system metadata
  • Data staging metadata
  • DBMS metadata
(Ralph Kimball, The Data Warehouse Lifecycle Toolkit, Wiley, 1998, ISBN 0-471-25547-5)

Source System Metadata
  • source specifications, such as repositories and source logical schemas
  • source descriptive information, such as ownership descriptions, update frequencies, and access methods
  • process information, such as job schedules and extraction code

Data Staging Metadata
  • data acquisition information, such as data transmission scheduling and results, and file usage
  • dimension table management, such as definitions of dimensions and surrogate key assignments
  • transformation and aggregation, such as data enhancement and mapping, DBMS load scripts, and aggregate definitions
  • audit, job logs, and documentation, such as data lineage records and data transform logs

Star Schema
The star schema is a data modeling technique used to map multidimensional decision support data into a relational database. Star schemas yield an easily implemented model for multidimensional data analysis while still preserving the relational structure of the operational database.

Star Schema
Four components:
  • Facts
  • Dimensions
  • Attributes
  • Attribute hierarchies

Figure 13.14 A Three-Dimensional View of Sales

Figure 13.17 Attribute Hierarchies in Multidimensional Analysis

Facts
  • Numeric measurements that represent a specific business aspect or activity
  • Normally stored in a fact table that is the center of the star schema
  • The fact table contains facts linked through their dimensions
  • Metrics are facts computed at run time

Dimensions
  • Qualifying characteristics that provide additional perspectives to a given fact
  • Decision support data are almost always viewed in relation to other data
  • Facts are studied via dimensions
  • Dimensions are stored in dimension tables

Attributes
  • Dimensions provide descriptions of facts through their attributes
  • There is no mathematical limit to the number of dimensions
  • Attributes are used to search, filter, and classify facts
  • Slice and dice: focus on slices of the data cube for more detailed analysis

Attribute Hierarchies
  • Provide a top-down data organization
  • Serve two purposes: aggregation, and drill-down/roll-up data analysis
  • Determine how the data are extracted and represented
  • Are stored in the DBMS's data dictionary
  • Are used by OLAP tools to access the warehouse properly

Star Schema
A star schema consists of fact tables and dimension tables. Fact tables contain the quantitative or factual data about a business, that is, the information being queried. This information is often numerical, additive measurements and can consist of many columns and millions or billions of rows. Dimension tables are usually smaller and hold descriptive data that reflects the dimensions, or attributes, of a business.

Figure 13.17 Star Schema for Sales

Star Schema Representation
Facts and dimensions are normally represented by physical tables in the data warehouse database. The fact table is related to each dimension table in a many-to-one (M:1) relationship. Fact and dimension tables are related by foreign keys and are subject to primary/foreign key constraints.

Figure 13.18 Orders Star Schema

Star Schema
Performance-improving techniques:
  • Normalization of dimension tables
  • Multiple fact tables representing different aggregation levels
  • Denormalization of fact tables
  • Table partitioning and replication

Figure 13.19 Normalized Dimension Tables

Multiple Fact Tables

Practice
How would you design a star schema for an auto insurance company to do risk analysis?
  • What is the objective?
  • What are the facts?
  • What are the dimensions?
  • What are the attributes?
  • What are the attribute hierarchies?

Auto Insurance DW Star Schema

Data Warehouse Design
  • Grain: a definition of the highest level of detail that is supported in a data warehouse
  • Drill-down: the process of probing beyond a summarized value to investigate each of the detail transactions that comprise the summary

Data Warehouse Implementation
  • The Data Warehouse as an Active Decision Support Network
  • A Company-Wide Effort that Requires User Involvement and Commitment at All Levels
  • Satisfy the Trilogy: Data, Analysis, and Users
  • Apply Database Design Procedures
Implementing a data warehouse is generally a massive effort that must be planned and executed according to established methods. There are many facets to the project lifecycle, and no single person can be an expert in each area.

Data Warehouse Implementation Road Map

Data Integration and the Extraction, Transformation, and Load (ETL) Process
Data integration comprises three major processes:
  • data access (the ability to access and extract data from any data source)
  • data federation (the integration of business views across multiple data stores)
  • change capture (the identification, capture, and delivery of the changes made to enterprise data sources)
Extraction, transformation, and load (ETL):
  • Extraction: reading data from a database
  • Transformation: converting the extracted data from its previous form into the form in which it can be placed into a data warehouse
  • Load: putting the data into the data warehouse

Data Cleansing
Data cleansing, or data scrubbing, is the act of detecting and correcting (or removing) corrupt or inaccurate records from a record set, table, or database. Used mainly in databases, the term refers to identifying incomplete, incorrect, inaccurate, or irrelevant parts of the data and then replacing, modifying, or deleting this dirty data.

ETL Tools
A good ETL tool must be able to communicate with the many different relational databases and read the various file formats used throughout an organization. ETL tools have started to migrate into Enterprise Application Integration, or even Enterprise Service Bus, systems that now cover much more than just the extraction, transformation, and loading of data. Many ETL vendors now offer data profiling, data quality, and metadata capabilities.

On-Line Analytical Processing
On-Line Analytical Processing (OLAP) is an advanced data analysis environment that supports decision making, business modeling, and operations research activities.
Four main characteristics of OLAP:
  • Use multidimensional data analysis techniques
  • Provide advanced database support
  • Provide easy-to-use end-user interfaces
  • Support client/server architecture
Additional functions of multidimensional data analysis techniques:
  • Advanced data presentation functions
  • Advanced data aggregation, consolidation, and classification functions
  • Advanced computational functions
  • Advanced data modeling functions

Integration of OLAP with a Spreadsheet Program

Figure 13.7 OLAP Server Arrangement

SAP's Business Information Warehouse: an Enterprise-Wide Information Hub
  • An end-to-end, enterprise-wide information hub to support planning and decision making
  • A central data repository of SAP, non-SAP, current, and historical business transactions and metadata
  • Timely information to all levels and roles, from analyst to executive
  • Years of SAP financial, logistics, and human resource information systems experience wedded with modern data warehouse methodologies

A Sample of Current Data Warehousing and Data Mining Vendors
Table 13.10

Success Stories at Pepsi
"Using the data warehouse, we've been able to identify important items, find national suppliers for them, and leverage those relationships to reduce costs."
"Thanks to the warehouse, Pepsi can monitor purchasing compliance at the user level, an ability that has boosted price and product compliance well over 90 percent. The warehouse also helps ensure 100 percent sales tax compliance," says Bridgman.
Since going online in 1995, the warehouse has helped generate procurement savings in excess of $100 million.

Levels of DW Support for Enterprise Decision Making

The Need for Real-Time Data
  • A business often cannot afford to wait a whole day for its operational data to load into the data warehouse for analysis
  • Provides incremental real-time data showing every state change and almost analogous patterns over time
  • Maintaining metadata in sync is possible
  • It is less costly to develop, maintain, and secure one huge data warehouse so that data are centralized for BI/BA tools
  • An EAI with real-time data collection can reduce or eliminate the nightly batch processes

Real-Time/Active Data Warehouse (RDW/ADW)
  • Loading and providing data via the data warehouse as they become available
  • Expand traditional data warehouse functions into the realm of tactical decision making
  • Empower decision making when interacting directly with customers and suppliers

Real-Time Data Warehousing

Data Warehouse Administration
Due to its huge size and its intrinsic nature,
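The star schema slides describe a fact table related M:1 to each dimension table through foreign keys, an attribute hierarchy used for roll-up and drill-down, and slicing the data cube. A minimal sketch of those ideas using Python's built-in sqlite3 module follows; the table names (fact_sales, dim_time, dim_product, dim_location), the time hierarchy year > quarter > month, the helper total_sales, and all sample rows are hypothetical, chosen only for illustration.

```python
import sqlite3

# Hypothetical sales star schema: fact_sales sits at the center, and each of
# its foreign keys points at one dimension table (M:1 from fact to dimension).
SCHEMA = """
CREATE TABLE dim_time (
    time_id INTEGER PRIMARY KEY,
    month   TEXT NOT NULL,      -- lowest level of the time hierarchy
    quarter TEXT NOT NULL,
    year    INTEGER NOT NULL    -- highest level of the time hierarchy
);
CREATE TABLE dim_product (
    product_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    category   TEXT NOT NULL
);
CREATE TABLE dim_location (
    location_id INTEGER PRIMARY KEY,
    city        TEXT NOT NULL,
    region      TEXT NOT NULL
);
CREATE TABLE fact_sales (
    time_id     INTEGER NOT NULL REFERENCES dim_time(time_id),
    product_id  INTEGER NOT NULL REFERENCES dim_product(product_id),
    location_id INTEGER NOT NULL REFERENCES dim_location(location_id),
    units       INTEGER NOT NULL,   -- additive fact
    amount      REAL NOT NULL       -- additive fact
);
"""

# Top-down attribute hierarchy of the (hypothetical) time dimension.
TIME_LEVELS = ("year", "quarter", "month")


def build_demo_warehouse():
    """Create an in-memory star schema and fill it with a few sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # enforce primary/foreign key constraints
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO dim_time VALUES (?, ?, ?, ?)", [
        (1, "2024-01", "2024-Q1", 2024),
        (2, "2024-02", "2024-Q1", 2024),
        (3, "2024-04", "2024-Q2", 2024),
    ])
    conn.executemany("INSERT INTO dim_product VALUES (?, ?, ?)", [
        (1, "Cola", "Beverage"),
        (2, "Chips", "Snack"),
    ])
    conn.executemany("INSERT INTO dim_location VALUES (?, ?, ?)", [
        (1, "Austin", "South"),
        (2, "Boston", "East"),
    ])
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?, ?)", [
        (1, 1, 1, 10, 15.0),
        (2, 1, 2, 5, 7.5),
        (2, 2, 1, 8, 16.0),
        (3, 2, 2, 4, 8.0),
    ])
    return conn


def total_sales(conn, level, category=None):
    """Sum the additive fact 'amount' at one level of the time hierarchy.

    A higher level (year) rolls the facts up; a lower level (month) drills
    down. Passing a category slices the cube on the product dimension.
    """
    if level not in TIME_LEVELS:  # whitelist, since the column name is interpolated
        raise ValueError(f"unknown time level: {level}")
    sql = (
        f"SELECT t.{level}, SUM(f.amount) "
        "FROM fact_sales f "
        "JOIN dim_time t ON f.time_id = t.time_id "
        "JOIN dim_product p ON f.product_id = p.product_id "
    )
    params = []
    if category is not None:
        sql += "WHERE p.category = ? "
        params.append(category)
    sql += f"GROUP BY t.{level} ORDER BY t.{level}"
    return conn.execute(sql, params).fetchall()
```

For example, total_sales(conn, "year") rolls all facts up to one row per year, switching the level to "quarter" or "month" drills down the same hierarchy, and total_sales(conn, "quarter", category="Snack") slices the cube to one product category. Because foreign keys are enforced, a fact row pointing at a nonexistent dimension row is rejected, which mirrors the primary/foreign key constraints described for the star schema representation.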

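The ETL and data cleansing slides can be illustrated with a small extract, transform, load pipeline. This is a minimal sketch under stated assumptions: the source is a CSV export, the accepted date formats are ISO (YYYY-MM-DD) and day-first DD/MM/YYYY, a repeated order_id counts as a duplicate, and the target is an in-memory SQLite table named fact_orders. The sample data, the table name, and the functions extract, transform, and load are all hypothetical.

```python
import csv
import io
import sqlite3
from datetime import datetime

# Hypothetical raw export from an operational system, including dirty records.
RAW_CSV = """order_id,customer,order_date,amount
1001, Alice ,2024-01-05,120.50
1002,bob,05/01/2024,80
1002,bob,05/01/2024,80
1003,,2024-01-07,42.00
1004,Carol,2024-01-08,not-a-number
1005,Dave,2024-01-09,15.25
"""

# Assumed source date formats: ISO first, then day-first.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")


def extract(text):
    """Extraction: read raw rows from the source (here, CSV text)."""
    return list(csv.DictReader(io.StringIO(text)))


def parse_date(value):
    """Standardize a date to ISO format, or return None if it cannot be parsed."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


def transform(rows):
    """Transformation and cleansing: trim, standardize, validate, de-duplicate.

    Returns (clean_rows, rejected_rows) so rejects can be logged for audit.
    """
    clean, rejected, seen = [], [], set()
    for row in rows:
        customer = row["customer"].strip().title()
        date = parse_date(row["order_date"])
        try:
            amount = round(float(row["amount"]), 2)
        except ValueError:
            amount = None
        if not customer or date is None or amount is None:
            rejected.append(row)  # incomplete or incorrect record
            continue
        key = int(row["order_id"])
        if key in seen:  # duplicate order_id: keep the first occurrence only
            continue
        seen.add(key)
        clean.append((key, customer, date, amount))
    return clean, rejected


def load(conn, rows):
    """Load: put the cleansed rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fact_orders ("
        "order_id INTEGER PRIMARY KEY, customer TEXT, order_date TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO fact_orders VALUES (?, ?, ?, ?)", rows)
    conn.commit()
```

Running transform(extract(RAW_CSV)) keeps orders 1001, 1002, and 1005 (with " Alice " trimmed, "bob" capitalized, and 05/01/2024 rewritten as 2024-01-05), drops the second copy of 1002, and rejects 1003 (missing customer) and 1004 (non-numeric amount). Returning the rejected rows instead of silently discarding them leaves room to record them in the audit and data-lineage logs listed under data staging metadata.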