Data Mining: Concepts and Techniques, 2nd Edition, Jiawei Han, Chapter 11 (a1VisualMine.ppt)

Data Mining: Concepts and Techniques (slides dated 29.05.2020)

Slide 1: Data Mining: Concepts and Techniques
  • Chapter 11: Applications and Trends in Data Mining
  • Additional Theme: Visual Data Mining
  • Jiawei Han and Micheline Kamber, Department of Computer Science, University of Illinois at Urbana-C/hanj
  • © 2006 Jiawei Han and Micheline Kamber. All rights reserved.

Slide 3: Visual Data Mining: An Overview
  • What is Visual Data Mining?
  • Survey of techniques
      • Data Visualization
      • Visualizing Data Mining Results
      • Visual Data Mining

Slide 4: What Is Visual Data Mining?
  • Visual data mining “discovers implicit and useful knowledge from large data sets using data and/or knowledge visualization techniques”
  • Data visualization + Data mining techniques

Slide 5: Why Visual Data Mining?
  • Advantages of the human visual system
      • Highly parallel processor
      • Sophisticated reasoning engine
      • Large knowledge base
  • Can be used to comprehend data distributions, patterns, clusters, and outliers

Slide 6: Why Not Only Visual Data Mining?
  • Disadvantages of the human visual system
      • Needs training
      • Not automated
      • Intrinsic bias
      • Limit of about 10^6 or 10^7 observations (Wegman 1995)
  • Power of integration with analytical methods

Slide 7: Scope of Visual Data Mining
  • Visualization: Use of computer graphics to create visual images which aid in the understanding of complex, often massive representations of data
  • Visual Data Mining: The process of discovering implicit but useful knowledge from large data sets using visualization techniques
  • Related fields: Computer Graphics, High Performance Computing, Pattern Recognition, Human Computer Interfaces, Multimedia Systems

Slide 8: Purpose of Visualization
  • Gain insight into an information space by mapping data onto graphical primitives
  • Provide a qualitative overview of large data sets
  • Search for patterns, trends, structure, irregularities, relationships among data
  • Help find interesting regions and suitable parameters for further quantitative analysis
  • Provide a visual proof of computer representations derived

Slide 9: Visual Data Mining & Data Visualization
  • Integration of visualization and data mining
      • Data visualization
      • Data mining result visualization
      • Data mining process visualization
      • Interactive visual data mining
  • Data visualization
      • Data in a database or data warehouse can be viewed at different levels of abstraction
      • It can also be viewed as different combinations of attributes or dimensions
      • Data can be presented in various visual forms

Slide 10: Abilities of Humans and Computers

Slide 11: Visual Mining vs. Scientific Vis. & Graphics
  • Scientific Visualization: often visualizes a physical model, low dimensionality
  • Graphics: more concerned with how to render (draw) than with what to render

Slide 12: Data Visualization
  • View data in a database or data warehouse
  • User may control
      • Different levels of detail
      • Subset of attributes
  • Drawn using boxplots, histograms, polylines, etc.

Slide 13: Historical Overview of Exploratory Data Visualization Techniques (cf. [WB95])
  • Pioneering works of Tufte [Tuf83, Tuf90] and Bertin [Ber81] focus on
      • Visualization of data with inherent 2D-/3D-semantics
      • General rules for layout, color composition, attribute mapping, etc.
  • Development of visualization techniques for different types of data with an underlying physical model
      • Geographic data, CAD data, flow data, image data, voxel data, etc.
  • Development of visualization techniques for arbitrary multidimensional data (without an underlying physical model)
      • Applicable to databases and other information resources

Slide 14: Dimensions of Exploratory Data Visualization

Slide 15: Classification of Data Visualization Techniques
  • Geometric Techniques: Scatterplots, Landscapes, Projection Pursuit, Prosection Views, Hyperslice, Parallel Coordinates, …
  • Icon-based Techniques: Chernoff Faces, Stick Figures, Shape-Coding, Color Icons, TileBars, …
  • Pixel-oriented Techniques: Recursive Pattern Technique, Circle Segments Technique, Spiral- & Axes-Techniques, …
  • Hierarchical Techniques: Dimensional Stacking, Worlds-within-Worlds, Treemap, Cone Trees, InfoCube, …
  • Graph-Based Techniques
      • Basic Graphs (Straight-Line, Polyline, Curved-Line, …)
      • Specific Graphs (e.g., DAG, Symmetric, Cluster, …)
      • Systems (e.g., Tom Sawyer, Hy+, SeeNet, Narcissus, …)
  • Hybrid Techniques: arbitrary combinations of the above

Slide 16: Distortion & Dynamic/Interaction Techniques
  • Distortion Techniques
      • Simple Distortion (e.g. Perspective Wall, Bifocal Lenses, TableLens, Graphical Fisheye Views, …)
      • Complex Distortion (e.g. Hyperbolic Repr., Hyperbox, …)
  • Dynamic/Interaction Techniques
      • Data-to-Visualization Mapping (e.g. AutoVisual, S-Plus, XGobi, IVEE, …)
      • Projections (e.g. GrandTour, S-Plus, XGobi, …)
      • Filtering (Selection, Querying) (e.g. MagicLens, Filter/Flow Queries, InfoCrystal, …)
      • Linking & Brushing (e.g. Xmdv-Tool, XGobi, DataDesk, …)
      • Zooming (e.g. PAD+, IVEE, DataSpace, …)
      • Detail on Demand (e.g. IVEE, TableLens, MagicLens, VisDB, …)

Slide 17: Visual Survey
  • Data visualization techniques
      • Scatterplot Matrices, Landscapes, Parallel Coordinates
      • Icon-based, Dimensional Stacking, Treemaps

Slide 18: Direct Visualization
  • Figure: Ribbons with Twists Based on Vorticity

Slide 19: Geometric Techniques
  • Basic Idea: Visualization of geometric transformations and projections of the data
  • Methods
      • Landscapes [Wis95]
      • Projection Pursuit Techniques [Hub85] (techniques for finding meaningful projections of multidimensional data)
      • Scatterplot-Matrices [And72, Cle93]
      • Prosection Views [FB94, STDS95]
      • Hyperslice [WL93]
      • Parallel Coordinates [Ins85, ID90]

Slide 20: Scatterplot-Matrices [Cleveland 93]
  • Matrix of scatterplots (x-y-diagrams) of the k-dimensional data
  • Total of (k^2 - k)/2 distinct scatterplots, one per unordered pair of dimensions
  • Figure used by permission of M. Ward, Worcester Polytechnic Institute

Slide 21: Landscapes [Wis95]
  • Visualization of the data as a perspective landscape
  • The data needs to be transformed into a (possibly artificial) 2D spatial representation which preserves the characteristics of the data
  • Figure: news articles visualized as a landscape (used by permission of B. Wright, Visible Decisions Inc.)

Slide 22: Parallel Coordinates [Ins85, ID90]
  • n equidistant axes which are parallel to one of the screen axes and correspond to the attributes
  • The axes are scaled to the [minimum, maximum] range of the corresponding attribute
  • Every data item corresponds to a polygonal line which intersects each of the axes at the point which corresponds to the value for the attribute

Slide 23: Parallel Coordinates

Slide 24: Icon-Based Techniques
  • Basic Idea: Visualization of the data values as features of icons
  • Overview
      • Chernoff-Faces [Che73, Tuf83]
      • Stick Figures [Pic70, PG88]
      • Shape Coding [Bed90]
      • Color Icons [Lev91, KK94]
      • TileBars [Hea95] (use of small icons representing the relevance feature vectors in document retrieval)

Slide 25: Stick Figures
  • Figure: census data showing age, income, sex, education, etc. (used by permission of G. Grinstein, University of Massachusetts at Lowell)

Slide 26: Hierarchical Techniques
  • Basic Idea: Visualization of the data using a hierarchical partitioning into subspaces
  • Overview
      • Dimensional Stacking [LWW90]
      • Worlds-within-Worlds [FB90a/b]
      • Treemap [Shn92, Joh93]
      • Cone Trees [RMC91]
      • InfoCube [RG93]

Slide 27: Dimensional Stacking [LWW90]
  • Partitioning of the n-dimensional attribute space into 2-dimensional subspaces which are stacked into each other
  • Partitioning of the attribute value ranges into classes
  • The important attributes should be used on the outer levels
  • Adequate especially for data with ordinal attributes of low cardinality

Slide 28: Dimensional Stacking
  • Figure: visualization of oil mining data with longitude and latitude mapped to the outer x-, y-axes and ore grade and depth mapped to the inner x-, y-axes (used by permission of M. Ward, Worcester Polytechnic Institute)

Slide 29: Dimensional Stacking
  • Disadvantages
      • Difficult to display more than nine dimensions
      • Important to map dimensions appropriately
      • May be difficult to understand visualizations at first

Slide 30: Treemap [JS91, Shn92, Joh93]
  • Screen-filling method which uses a hierarchical partitioning of the screen into regions depending on the attribute values
  • The x- and y-dimensions of the screen are partitioned alternately according to the attribute values (classes)
  • Figure: MSR Netscan image

Slide 32: Treemap of a File System (Shneiderman)

Slide 33: Treemaps
  • The attributes used for the partitioning and their ordering are user-defined (the most important attributes should be used first)
  • The color of the regions may correspond to an additional attribute
  • Suitable for getting an overview of large amounts of hierarchical data (e.g., file system) and for data with multiple ordinal attributes (e.g., census data)

Slide 34: Data Mining Result Visualization
  • Presentation of the results or knowledge obtained from data mining in visual forms
  • Examples
      • Scatter plots and boxplots (obtained from descriptive data mining)
      • Decision trees
      • Association rules
      • Clusters
      • Outliers
      • Generalized rules
      • Text mining

Slide 35: Boxplots from Statsoft: Multiple Variable Combinations

Slide 36: Visualization of Data Mining Results in SAS Enterprise Miner: Scatter Plots

Slide 37: Visualization of Association Rules in SGI/MineSet 3.0

Slide 38: Visualization of Decision Tree in SGI/MineSet 3.0

Slide 39: Visualization of Decision Trees

Slide 40: Visualization of Cluster Grouping in IBM Intelligent Miner

Slide 41: Association Rules (MineSet)
  • LHS and RHS items are mapped to the x- and y-axis
  • Confidence and support correspond to the height of the bar and of the disc, respectively
  • Interestingness is mapped to color

Slide 42: MineSet: Association Rules

Slide 43: Association Ball Graph (DBMiner)
  • Items are visualized as balls
  • Arrows indicate rule implication
  • Size represents support

Slide 44: Classification (SAS EM [SAS01])
  • Tree Viewer
  • Color corresponds to the relative frequency of a class in a node
  • Branch line thickness is proportional to the square root of the number of objects

Slide 45: Cluster Analysis (H-BLOB: Hierarchical BLOB) [SBG00]
  • Cluster
  • Form ellipsoids
  • Form blobs (implicit surfaces)

Slide 46: H-BLOB

Slide 47: Text Mining (ThemeRiver [WCF+00])
  • Visualization of thematic changes in documents
  • Vertical distance indicates collective strength of the themes

Slide 48: Data Mining Process Visualization
  • Presentation of the various processes of data mining in visual forms so that users can see the flow of data cleaning, integration, preprocessing, and mining
      • Data extraction process
      • Where the data is extracted
      • How the data is cleaned, integrated, preprocessed, and mined
      • Method selected for data mining
      • Where the results are stored
      • How they may be viewed

Slide 49: Visualization of Data Mining Processes by Clementine
  • Understand variations with visualized data
  • See your solution discovery process clearly

Slide 50: Interactive Visual Data Mining
  • Using visualization tools in the data mining process to help users make smart data mining decisions
  • Example
      • Display the data distribution in a set of attributes using colored sectors or columns (depending on whether the whole space is represented by a circle or by a set of columns)
      • Use the display to decide which sector should first be selected for classification and where a good split point for this sector may be

Slide 51: Visual Data Mining
  • Projection Pursuits
  • (Class) Tours [Dhillon et al. 98]
  • Visual Classification [Ankerst et al. KDD 99]

Slide 52: Projection Pursuits
  • Exploratory projection pursuit
      • Goal: reduce dimensionality
      • Define an “interestingness” index for each possible projection of a data set
      • Maximize this index, project linearly
      • Not always possible/useful

Slide 53: Class Tours
  • “Visualizing Class Structure of Multidimensional Data” by Dhillon et al. 1998
  • Problem: Visualize multidimensional data categorized into classes
  • Solution: Project data into 2D while preserving distances between class means

Slide 54: Class-Preserving Projection
  • Preserves distances between projected means

Slide 55: Tours
  • Tours are animated and interpolated sequences of 2D projections [Asimov 1985]
  • Class tours: sequences of class-preserving 2-dimensional projections
  • Captures “inter-class structure of complex, multi-dimensional data”

Slide 56: Interactive Visual Mining by Perception-Based Classification (PBC)

Slide 57: Visual Classification
  • “Visual Classification: An Interactive Approach to Decision Tree Construction” by Ankerst
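
The parallel coordinates construction described on the slides (one axis per attribute, each axis scaled to that attribute's minimum and maximum, one polygonal line per data item) can be sketched roughly as follows. This is a minimal illustration and not code from the slides: the function name `parallel_coordinates`, the `width` and `height` parameters, and the choice to place constant attributes at mid-height are all assumptions made for the example.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def parallel_coordinates(rows: Sequence[Sequence[float]],
                         width: float = 1.0,
                         height: float = 1.0) -> List[List[Point]]:
    """Return one polyline (list of (x, y) vertices) per data item.

    Hypothetical helper: axes are vertical, equally spaced along x, and
    each axis is scaled to the [min, max] range of its attribute.
    """
    n = len(rows[0])                      # number of attributes (axes)
    columns = list(zip(*rows))            # values grouped per attribute
    lo = [min(col) for col in columns]
    hi = [max(col) for col in columns]
    # x position of each of the n equidistant axes
    xs = [width * j / (n - 1) if n > 1 else 0.0 for j in range(n)]

    polylines = []
    for row in rows:
        line = []
        for j, value in enumerate(row):
            span = hi[j] - lo[j]
            # assumption: a constant attribute is drawn at mid-height
            t = (value - lo[j]) / span if span else 0.5
            line.append((xs[j], height * t))
        polylines.append(line)
    return polylines


rows = [(1, 10, 100), (3, 20, 300), (2, 30, 200)]
lines = parallel_coordinates(rows)
for item, line in zip(rows, lines):
    print(item, "->", line)
```

Each returned polyline has exactly one vertex per axis, so drawing it with any plotting library amounts to connecting consecutive vertices with straight segments.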

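The treemap idea from the slides (a screen-filling layout in which the x- and y-dimensions are partitioned alternately) can be illustrated with a small slice-and-dice sketch. It is written under assumptions not stated in the slides: the hierarchy is a nested dictionary whose leaves are sizes, each region's share is proportional to its total size, and the names `slice_and_dice` and `total` are hypothetical helpers introduced only for this example.

```python
from typing import Dict, Tuple, Union

Tree = Union[float, Dict[str, "Tree"]]
Rect = Tuple[float, float, float, float]   # x, y, width, height


def total(node: Tree) -> float:
    """Sum of all leaf sizes below a node."""
    if isinstance(node, dict):
        return sum(total(child) for child in node.values())
    return float(node)


def slice_and_dice(node: Tree, rect: Rect,
                   depth: int = 0, path: str = "") -> Dict[str, Rect]:
    """Map each leaf path to its rectangle.

    Even depths split the rectangle along x, odd depths along y,
    so the two screen dimensions are partitioned alternately.
    """
    x, y, w, h = rect
    if not isinstance(node, dict):
        return {path: rect}
    size = total(node)
    result: Dict[str, Rect] = {}
    offset = 0.0                            # fraction of the side already used
    for name, child in node.items():
        frac = total(child) / size if size else 0.0
        if depth % 2 == 0:
            child_rect = (x + offset * w, y, frac * w, h)
        else:
            child_rect = (x, y + offset * h, w, frac * h)
        result.update(slice_and_dice(child, child_rect,
                                     depth + 1, f"{path}/{name}"))
        offset += frac
    return result


# Illustrative file system: directory -> file -> size (made-up numbers)
files = {"docs": {"a.txt": 1, "b.txt": 1}, "src": {"main.py": 2}}
layout = slice_and_dice(files, (0.0, 0.0, 8.0, 4.0))
for leaf, r in layout.items():
    print(leaf, r)
```

Because every level only subdivides its parent rectangle, the leaf rectangles tile the whole screen area without gaps or overlap; coloring each rectangle by an additional attribute, as the slides suggest, would be a separate step.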