Frontiers of Data Mining Research: Jiawei Han

Data Mining: Unlimited New Research Frontiers
Jiawei Han
Data Mining Research Group
Department of Computer Science
University of Illinois at Urbana-Champaign
Acknowledgements: NSF, ARL, NASA, AFOSR (MURI), DHS, Microsoft, IBM, Yahoo!, HP Lab & Boeing
06 September 2023

Outline
  An Introduction to Data Mining Research Group
  Mining and OLAPing Information Networks
  Mining Heterogeneous Information Networks
  Mining Text-Rich Information Networks
  OLAPing (multi-dimensional analysis) of information networks: TextCube, OLAP heterogeneous networks
  Taming the Web: WINACS (integrated mining of Web structures and contents)
  Mining Cyber-Physical Systems and Networks
  Conclusions

Data Mining and Data Warehousing

Jiawei Han's Group at CS, UIUC

  Mining patterns and knowledge discovery from massive data
  Data mining in heterogeneous information networks
  Exploring broad applications of data mining
  Developed many effective data mining algorithms, e.g., FPgrowth, PrefixSpan, gSpan, StarCubing, CrossMine, RankingCube, CrossClus, RankClus, and NetClus
  600+ research papers in conferences and journals
  Fellow of ACM, Fellow of IEEE, ACM SIGKDD Innovation Award, W. McDowell Award, Daniel Drucker Eminent Faculty Award
  Textbook, "Data Mining: Concepts and Techniques," adopted worldwide
  Project lead for NASA EventCube for Aviation Safety [2008-2012]
  Director of Information Network Academic Research Center funded by Army Research Lab (ARL) [2009-2014]

Data Mining Research Group at CS, UIUC

New Books on Data Mining & Link Mining
  Han, Kamber and Pei, Data Mining, 3rd ed., 2011
  Yu, Han and Faloutsos (eds.), Link Mining, 2010
  Sun and Han, Mining Heterogeneous Information Networks, 2012

Mining Heterogeneous Information Networks
  RankClus/NetClus vs. RankCompete: A Competing Random Walk Model for Rank-Based Clustering

  Top-5 ranked conferences
    Database    Data Mining    AI       IR
    VLDB        KDD            IJCAI    SIGIR
    SIGMOD      SDM            AAAI     ECIR
    ICDE        ICDM           ICML     CIKM
    PODS        PKDD           CVPR     WWW
    EDBT        PAKDD          ECML     WSDM

  Top-5 ranked terms
    Database    Data Mining      AI           IR
    data        mining           learning     retrieval
    database    data             knowledge    information
    query       clustering       reasoning    web
    system      classification   logic        search
    xml         frequent         cognition    text

  RankClass [KDD11]: Knowledge Propagation in Heterogeneous Network

Similarity Search and Role Discovery in Information Networks
  PathSim [VLDB11]: Meta Path-Guided Similarity Search in Networks
    Which images are most similar to me in Flickr?
    Path: ITI
    Path: ITIGITI
  Role Discovery in Information Networks [KDD'10]
    A "dirty" information network (imaginary)
    Cleaned/inferred adversarial network, automatically inferred
    Figure labels: Chief, Insurgent, Cell Lead

    Advisee          Top Ranked Advisor      Time    Note
    David M. Blei    1. Michael I. Jordan    01-03   PhD advisor, 2004
                     2. John D. Lafferty     05-06   Postdoc, 2006
    Hong Cheng       1. Qiang Yang           02-03   MS advisor, 2003
                     2. Jiawei Han           04-08   PhD advisor, 2008
    Sergey Brin      1. Rajeev Motwani       97-98   Unofficial advisor

  Interesting Results from Other Domains
    RankCompete: Organize your photo album automatically!
    Rank treatments for AIDS from MEDLINE

Meta-Path Based Co-authorship Prediction in DBLP
  Co-authorship prediction problem
    Whether two authors are going to collaborate for the first time
  Co-authorship encoded in meta-path
    Author-Paper-Author
  Topological features encoded in meta-paths
    Meta-paths between authors under length 4
    Table columns: Meta-Path, Semantic Meaning

The Power of PathPredict
  Explain the prediction power of each meta-path
    Wald test for logistic regression
  Higher prediction accuracy than using projected homogeneous network
    7% higher in prediction accuracy
  Social relations play more important role?

Case Study: Predicting Concrete Co-Authors
  High quality predictive power for such a difficult task
  Using data in T0 = [1989; 1995] and T1 = [1996; 2002]
  Predict new coauthor relationship in T2 = [2003; 2009]

Mining Text-Rich Information Networks

Exploring Opposite Opinions and Dissimilarities for Sentiment Analysis
  Intuitions
    Friends tend to hold similar opinions, while foes tend to hold conflicting opinions
    Based on users' sentiment scores on different objects, we can infer the similarity and dissimilarity (i.e., pseudo-friend and pseudo-foe relationship) between users
    Based on the inferred friendship, we can improve sentiment analysis and user clustering by considering global consistency on heterogeneous networks
  State-of-the-Art
    Explore similar opinions instead of opposite opinions
    Typically consider text content while ignoring the information network
    Rely on observed friendship (but many are hidden)
  Industry Need/Benefits
    Use sentiment analysis to understand and mine public opinions on product/market-related issues
    QoI-aware mining of text-rich multi-genre networks
    Intelligent methods for public opinion assessment
  Insight: Exploring opposite opinions may help to discover hidden friendship, which can produce better sentiment scores and user clustering (the two mutually enhance each other).
    1. Consider information and social networks
    2. Explore opposite and similar opinions
    3. Both observed & hidden friendship are valuable
  Figure labels: Observed friendship; Observed + Inferred friendship; Refining sentiment scores; Infer hidden friendship; Pseudo-foe; What people are thinking about certain target/issue?
  Long-Term Goals
    Structurally model a text-rich multi-genre network and investigate methods for mining knowledge from such networks
    Enhance search and knowledge discovery capability in the text-rich multi-genre network model
    Similar studies of text-rich multi-genre networks can lead to novel models for mining and searching rich textual data in social media for business applications
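As a rough illustration of the meta-path similarity idea named in the slides (PathSim over a path such as ITI, image-tag-image), the sketch below scores two objects by the number of path instances connecting them, normalized by the number of paths each object has back to itself. The normalization follows the PathSim definition as commonly stated in the VLDB'11 paper; the slides do not spell out the formula, and the image and tag names are hypothetical.

```python
def count_paths(adj, x, y):
    """Count X-T-Y path instances: sum over middle nodes T linked to both x and y."""
    return sum(w * adj[y].get(t, 0) for t, w in adj[x].items())


def pathsim(adj, x, y):
    """PathSim-style score under a symmetric meta-path X-T-X (assumed formula)."""
    denom = count_paths(adj, x, x) + count_paths(adj, y, y)
    return 2 * count_paths(adj, x, y) / denom if denom else 0.0


def most_similar(adj, query):
    """Rank all other objects by their score against the query object."""
    scored = ((pathsim(adj, query, other), other) for other in adj if other != query)
    return sorted(scored, reverse=True)


# Hypothetical image -> {tag: link weight} data, standing in for Flickr.
image_tags = {
    "img_a": {"beach": 1, "sunset": 1},
    "img_b": {"beach": 1, "sunset": 1, "city": 1},
    "img_c": {"city": 1},
}

ranking = most_similar(image_tags, "img_a")
```

Because the score divides by each object's own path count, a heavily tagged image does not dominate the ranking simply by being connected to everything.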

Exploring Opposite Opinions and Dissimilarities for Sentiment Analysis (2)
  Approaches
    Develop a new metapath-based measure for inferring similarity and dissimilarity between objects and users, based on sentiment scores
    Develop a graph-based semi-supervised refining model to propagate the sentiment scores from labeled data to unlabeled data
  Result: For sentiment classification, our model with 40 labeled data achieves better performance than an SVM-based model with 600 training data; for user clustering, the results on inferred friendship carry more semantic meaning than the results on observed friendship.
  Figure labels: Clustering results of Normalized Cuts on different graphs; Observed friendship; Observed + Inferred; Ideal results (ground-truth); Each cluster is a group of users with similar opinions wrt 3 candidates; Better than SVM-based supervised model with 600 training data

Truth Discovery: From Truth-Finder to Latent Truth Model (1)
  State-of-the-Art
    HITS-like random walk methods (e.g., TruthFinder (KDD'07), 3-Estimate (WSDM'10), Investment (COLING'10), etc.)
    Limitations: (1) Quality as a single value cannot well support multiple true attributes for each entity; and (2) based on heuristics, not principled probabilistic models.
  Industry Need/Benefits
    Integrate entity-attribute databases from multiple sources, e.g. sales campaign/product opinions, etc.
    Automatically learn the quality of each data source and the most accurate integrated records
  Insight: Some sources tend to miss true attributes (false negatives), while some others tend to produce false attributes (false positives). Modeling two-sided quality is key to supporting multiple true values per entity for truth finding.
  Contributions of Latent Truth Model
    A principled probabilistic model
    Model negative claims and two-sided source quality with Bayesian regularization
    Naturally support multiple true attribute values
    LTM can naturally incorporate prior domain knowledge through Bayesian priors
    LTM can run in either batch or online streaming modes for incremental truth finding
  Figure labels: IMDB; Netflix; Bad Source; Positive Claim; Negative Claim; Generate Implicit Negative Claims: Harry Potter; Quality of Sources; Observation of Claims; Truth of Facts; Correct Claim; Incorrect Claim; High Precision, High Recall; High Precision, Low Recall; Low Precision, Low Recall
  Generative Process in LTM:
    1) For each source, generate false positive rate (with strong prior) and sensitivity (with uniform prior).
    2) For each fact f, generate prior truth probability and truth label.
    3) For each claim, generate observation based on truth label and corresponding source quality.
  Long-Term Goals
    Truth-finding models for more general data types (numerical attributes, etc.)
    Model source quality in other data integration tasks, e.g. entity resolution.
    Trustworthiness in multi-genre networks (text-rich networks, social networks, etc.)
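The three-step generative process listed for LTM can be read as a forward sampler. The sketch below is only that: it draws synthetic sources, facts and claims, and does not perform the inference that recovers truth from observed claims. The Beta and Bernoulli distributions and every prior parameter are assumptions chosen for illustration; the slides say only that the false positive rate has a strong prior and the sensitivity a uniform one.

```python
import random


def sample_ltm(num_sources, num_facts, seed=0,
               fpr_prior=(1.0, 50.0),    # assumed "strong" prior: false positives are rare
               sens_prior=(1.0, 1.0),    # Beta(1, 1) is the uniform prior on [0, 1]
               truth_prior=(1.0, 1.0)):  # assumed prior on each fact's truth probability
    rng = random.Random(seed)

    # Step 1: per-source false positive rate and sensitivity.
    sources = [
        {"fpr": rng.betavariate(*fpr_prior),
         "sensitivity": rng.betavariate(*sens_prior)}
        for _ in range(num_sources)
    ]

    # Step 2: per-fact prior truth probability and truth label.
    facts = []
    for _ in range(num_facts):
        theta = rng.betavariate(*truth_prior)
        facts.append({"prior": theta, "truth": rng.random() < theta})

    # Step 3: each (source, fact) claim is observed as positive or negative,
    # depending on the truth label and that source's two-sided quality.
    claims = []
    for f, fact in enumerate(facts):
        for s, src in enumerate(sources):
            p_positive = src["sensitivity"] if fact["truth"] else src["fpr"]
            claims.append((s, f, rng.random() < p_positive))
    return sources, facts, claims


sources, facts, claims = sample_ltm(num_sources=3, num_facts=5, seed=42)
```

Keeping sensitivity and false positive rate as separate per-source quantities is what lets a source be precise but incomplete, or complete but noisy, which a single quality score cannot express.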

Truth Discovery: From Truth-Finder to Latent Truth Model (2)
  Result:
    Outperform state-of-the-art methods on two real-world datasets: books and movies
    LTM is also very scalable: see LTM_inc
  Experimental datasets: large and real
    Book Authors from (1263 books, 879 sources, 48153 claims, 2420 book-author, 100 labeled)
    Movie Directors from Bing (15073 movies, 12 sources, 108873 claims, 33526 movie-director, 100 labeled)
  Figure labels: Varying cutoff threshold (consistently better); Running Time

OLAPing (multi-dimensional analysis) of information networks: TextCube, OLAP heterogeneous networks

EventCube: An Overview
  Multidimensional Text Database
    Base level: Time (98.01, 98.02, 99.01, 99.02); Location (LAX, SJC, MIA, AUS); Topic (overshoot, undershoot, birds, turbulence)
    Higher level: Time (1998, 1999); Location (CA, FL, TX); Topic (Deviation, Encounter)
    Operations: drill-down, roll-up
  EventCube Representation
  Analysis Support for the analyst: multidimensional OLAP, ranking, cause analysis, topic summarization/comparison, ……
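To make the roll-up operation on a multidimensional text database concrete, here is a minimal sketch that stores a bag of terms per (month, airport) cell and aggregates it to (year, state). The report texts are invented for illustration, and reading a label like 98.01 as year.month is an assumption; only the dimension values come from the overview above.

```python
from collections import Counter, defaultdict

# Airport-to-state hierarchy for the Location dimension.
AIRPORT_TO_STATE = {"LAX": "CA", "SJC": "CA", "MIA": "FL", "AUS": "TX"}


def build_cube(reports):
    """Base cuboid: (month, airport) -> term counts over the reports in that cell."""
    cube = defaultdict(Counter)
    for month, airport, text in reports:
        cube[(month, airport)].update(text.lower().split())
    return cube


def roll_up(cube, time_fn, location_fn):
    """Aggregate cells to a coarser level by mapping each dimension value upward."""
    coarse = defaultdict(Counter)
    for (month, airport), terms in cube.items():
        coarse[(time_fn(month), location_fn(airport))] += terms
    return coarse


def month_to_year(month):
    """Assumes labels of the form 'YY.MM' from the 1900s, e.g. '98.01' -> '1998'."""
    return "19" + month.split(".")[0]


# Hypothetical incident reports: (month, airport, free text).
reports = [
    ("98.01", "LAX", "runway overshoot on landing"),
    ("98.02", "SJC", "birds encounter after takeoff"),
    ("99.01", "MIA", "turbulence encounter"),
    ("98.02", "LAX", "undershoot on approach birds"),
]

by_year_state = roll_up(build_cube(reports), month_to_year, AIRPORT_TO_STATE.get)
```

Drill-down is the reverse lookup: keep the base cuboid around and filter it to the cells that map into a chosen coarse cell.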

EventCube: An Organized Approach for Mining and Understanding Anomalous Aviation Events
  Funded by NASA, $1.2M (2008-now)

Text/Topic Cube: General Idea
  Heterogeneous: categorical attributes + unstructured text
  How to combine? Our solution:
    Cube: categorical attributes
    Text/Topic Model: unstructured text, used as the measure
  Figure labels: Time, Location, Place, Environment, ……, Event Report, ACN, Text data
  Measure table columns: Term/Topic, Weight (T1 W1, T2 W2, T3 W3, ……)

Effective Keyword Search
  TopCells (ICDE'10): Ranking aggregated cells (objects) in TextCube.
  Example: the query "Healthcare Reform" sent to the TopCells System returns
    Person: Obama, Year: 2010
    Org: Congress, Year: 2010
    Person: Hillary, Year: 2008
    …

Effective OLAP Exploration
  TEXplorer (CIKM'11): Integrating keyword-based ranking and OLAP exploration
  Example: the query "Healthcare Reform" sent to the TEXplorer System returns
    Top-1 Dimension: Person
    Top-2 Dimension: Org
    Top-3 Dimension: Time (2010, 2008, 2004)

Effective Event Tracking
  PET (KDD'10): tracking popularity and textual representation of events in social communities (twitter)
  Example: the query "Healthcare Reform" sent to the Popular Event Tracking System
    Figure axes: Time (Feb 2010, Mar 2010, Apr 2010), Popularity, Content
    Content terms: debate, cost, senate, …; pass, success, law, …; benefit, profit, effective, …

Taming the Web: WINACS (integrated mining of Web structures and contents)

Growing Parallel Paths (WWW 2011)

  Result: [figure]

Mapping Pages to Records (CIKM'10)
  Database records can be found on link paths!

WinaCS: Web Information Network Analysis for Computer Science
  Integration of Web structure mining and information network analysis
  Tim Weninger, Marina Danilevsky, et al., "WinaCS: Construction and Analysis of Web-Based Computer Science Information Networks", ACM SIGMOD'11 (system demo), Athens, Greece, June 2011.

Mining Cyber-Physical Systems and Networks

Discovery of Swarms and Periodic Patterns in Moving Object Data
  A system that mines moving object patterns: Z. Li, et al., "MoveMine: Mining Moving Object Databases", SIGMOD'10 (system demo)
  Z. Li, B. Ding, J. Han, and R. Kays, "Mining Hidden Periodic Behaviors for Moving Objects", KDD'10 (sub)
  Z. Li, B. Ding, J. Han, and R. Kays, "Swarm: Mining Relaxed Temporal Moving Object Clusters", VLDB'10 (sub)
  Figure captions:
    Bird flying paths shown on Google Earth
    Mined periodic patterns by our new method
    Convoy discovers only restricted patterns
    Swarm discovers more patterns

GeoTopic Discovery: Mining Spatial Text
  Methods compared: LDM, TDM, GeoFolk, LGTA
  Geo-tagged photos w. landscape (coast vs. desert vs. mountain)
  Z. Yin, et al., Geo Topic Discovery and Comparison, WWW'11

Conclusions: Mining Big and Complex Data
  Mining big data: a critical part of big data initiatives
  Most data objects are interconnected, forming heterogeneous information networks
  Most datasets can be "organized" or "transformed" into "structured" multi-typed, heterogeneous info. networks
  Structures can be progressively mined from less organized data sets by info. network analysis
  Surprisingly rich knowledge can be mined from such structured heterogeneous info. networks
    Clustering, ranking, classification, data cleaning, trust analysis, role discovery, similarity search, relationship prediction, ……
  Mining and exploring big and complex data: Lots to be done!

References for the Talk
  J. Han, Y. Sun, X. Yan, and P. S. Yu, "Mining Heterogeneous Information Networks" (tutorial), KDD'10.
  Ming Ji, Jiawei Han, and Marina Danilevsky, "Ranking-Based Classification of Heterogeneous Information Networks", KDD'11.
  Y. Sun, J. Han, et al., "RankClus: Integrating Clustering with Ranking for Heterogeneous Information Network Analysis", EDBT'09.
  Y. Sun, Y. Yu, and J. Han, "Ranking-Based Clustering of Heterogeneous Information Networks with Star Network Schema", KDD'09.
  Y. Sun, J. Han, X. Yan, P. S. Yu, and T. Wu, "PathSim: Meta Path-Based Top-K Similarity Search in Heterogeneous Information Networks", VLDB'11.
  Y. Sun, R. Barber, M. Gupta, C. Aggarwal, and J. Han, "Co-Author Relationship Prediction in Heterogeneous Bibliographic Networks", ASONAM'11.
  C. Wang, J. Han, et al., "Mining Advisor-Advisee Relationships from Research Publication Networks", KDD'10.
  Tim Weninger, Marina Danilevsky, et al., "WinaCS: Construction and Analysis of Web-Based Computer Science Information Networks", ACM SIGMOD'11 (system demo).
  X. Yin, J. Han, and P. S. Yu, "Truth Discovery with Multiple Conflicting Information Providers on the Web", IEEE TKDE, 20(6), 2008.

Chapter 6: Piston Air Compressors
  Section 1: Working principle of piston air compressors
  Section 2: Structure and automatic control of piston air compressors
  Section 3: Management of piston air compressors
  Review questions

Applications of compressed air on ships:

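1. Starting and reversing the main engine;

2. Starting auxiliary engines;

3. Supplying air to pneumatic devices;

4. Supplying air to pneumatic tools;

5. Blowing clean parts and filters.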

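Displacement (delivery volume): the volume of air delivered per unit time, referred to first-stage suction conditions. Units: m³/s, m³/min, m³/h.

Classification of air compressors:
  By discharge pressure: low pressure 0.2-1.0 MPa; medium pressure 1-10 MPa; high pressure 10-100 MPa.
  By displacement: micro < 1 m³/min; small 1-10 m³/min; medium 10-100 m³/min; large > 100 m³/min.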
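The two classification rules above translate directly into lookup functions. This is a small sketch; the function names are hypothetical, and because the listed ranges share their endpoints, assigning each boundary value (1 MPa, 10 MPa, 1 and 10 m³/min) to the higher class is an assumption made here, not something the source specifies.

```python
def classify_by_pressure(pressure_mpa):
    """Class by discharge pressure in MPa; shared endpoints go to the higher class (assumed)."""
    if 0.2 <= pressure_mpa < 1.0:
        return "low pressure"
    if 1.0 <= pressure_mpa < 10.0:
        return "medium pressure"
    if 10.0 <= pressure_mpa <= 100.0:
        return "high pressure"
    return "outside the listed ranges"


def classify_by_displacement(flow_m3_per_min):
    """Class by displacement in m3/min; 'large' is strictly above 100, as listed."""
    if flow_m3_per_min < 1.0:
        return "micro"
    if flow_m3_per_min < 10.0:
        return "small"
    if flow_m3_per_min <= 100.0:
        return "medium"
    return "large"


# Example: a compressor delivering 0.5 m3/min at 3 MPa.
example = (classify_by_pressure(3.0), classify_by_displacement(0.5))
```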

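Section 1: Working Principle of Piston Air Compressors

Positive-displacement compressors are divided by construction into two broad classes: reciprocating and rotary.
Figure labels: two-stage piston compressor; single-stage piston compressor; piston compressor; diaphragm compressor; rotary vane compressor

Longest service life: low speed (1460 RPM) and few moving parts (bearings and vanes). The lubricating oil forms a protective film between the parts that prevents wear and leakage, so the compressor runs quietly and efficiently. JAGUAR sliding-vane compressors that receive routine maintenance as specified have been in service for more than 100,000 hours and remain in sound condition; at ten hours of operation per day, 100,000 hours corresponds to as long as 33 years (a figure that implies roughly 300 operating days per year). It is therefore no exaggeration to call the sliding-vane compressor a lifetime machine.

The sliding-vane compressor can run continuously 365 days a year and is rated for more than 60,000 hours of safe operation.

Working process (sliding-vane compressor):
  1. Intake: the space between the rotor and the casing forms the compression chamber; as the rotor starts to turn, air enters through the intake port.
  2. Start of compression: rotation carries the drawn-in air into the sealed zone between the casing and the rotor, and intake stops.
  3. Compression: as the rotor keeps turning, the sealed zone shrinks and the air is compressed.
  4. Discharge: once the compressed air reaches the rated pressure, it leaves through the discharge port and enters the oil/air separator.

Working process (screw compressor):
  1. Intake: the space between the male and female rotors and the casing forms the compression chamber; as the rotors start to turn, air enters through the intake port.
  2. Start of compression: rotation carries the drawn-in air into the sealed zone between the casing and the rotors, and intake stops.
  3. Compression: as the rotors keep turning, the sealed zone shrinks and the air is compressed.
  4. Discharge: once the compressed air reaches the rated pressure, it leaves through the discharge port and enters the oil/air separator.

The screw gas compressor is presented as the most advanced type in the world: compact, robust, smooth-running, low-noise and dependable.

Screw compressor air circuit:
  A  Intake filter
  B  Air intake valve
  C  Compressor main unit
  D  Check valve
  E  Air/oil separator
  F  Minimum pressure valve
  G  Aftercooler
  H  Water separator with automatic drain

Oil circuit:
  J  Oil tank
  K  Thermostatic bypass valve
  L  Oil cooler
  M  Oil filter
  N  Oil return valve
  O  Oil stop valve

Refrigeration system:
  P  Refrigeration compressor
  Q  Condenser
  R  Heat exchanger
  S  Bypass system
  T  Air outlet filter

Figure label: screw compressor

Scroll compressor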

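The scroll compressor is described here as a high-technology compressor developed and introduced in the late 1990s. Its simple structure, small parts count, high efficiency and good reliability, and above all its low noise and long life, put it well ahead of other compressor types in those respects, and it has won the attention and recognition of the compressor industry. It is known as the "environmentally friendly compressor". Its distinctive design is said to make it the most energy-efficient compressor available today. Its main moving parts, the scroll pair, run in against each other without wearing, which gives a longer life, so it is also called a maintenance-free compressor.

Because it runs smoothly, with little vibration and a quiet working environment, the scroll compressor is also called the "ultra-quiet compressor".

The scroll compressor has few parts, with only four moving components. Its working chambers are a set of mutually sealed, crescent-shaped pockets formed between the two scrolls as they move relative to each other. As the orbiting scroll translates without rotating, the pockets shrink from large to small, which compresses the air and discharges it.

Figure label: external view of a piston air compressor

Section 1: Working Principle of Piston Air Compressors

I. Theoretical working cycle (single-stage compression)
  Working cycle: 4-1-2-3
    4-1  suction
    1-2  compression
    2-3  discharge

  Types of compression:
    Adiabatic compression, 1-2: greatest work input
    Isothermal compression, 1-2'': least work input
    Polytropic compression, 1-2': intermediate work input
  The work of the cycle is the area enclosed on the p-V diagram (pressure × volume).
  Stronger cooling of the cylinder saves work and benefits cylinder lubrication.

II. Actual working cycle (single-stage compression)
  1. The idealizing assumptions do not hold.
  2. Reasons it differs from the theoretical cycle:
    1) Effect of the clearance volume Vc
      Drawback: the gas left in the clearance expands as the piston returns, which reduces the effective suction stroke (volume).
      Benefits:
        (1) it forms an air cushion that eases the piston's reversal;
        (2) it avoids "liquid hammer" (from moisture condensing out of the air);
        (3) it keeps the piston from striking the cylinder head when the piston and connecting rod expand with heat or work loose.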
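The ranking of the three compression types, and the loss of suction volume caused by clearance, can be checked numerically. The sketch below uses the standard ideal-gas textbook expressions for the work of a clearance-free cycle and for clearance volumetric efficiency; neither formula appears in the source, and the pressures, exponents and clearance ratio are illustrative assumptions.

```python
import math


def isothermal_work(p1, v1, p2):
    """Cycle work in joules for isothermal compression of suction volume v1 (m3) from p1 to p2 (Pa)."""
    return p1 * v1 * math.log(p2 / p1)


def polytropic_work(p1, v1, p2, n):
    """Cycle work in joules for polytropic exponent n (n = 1.4 approximates adiabatic air)."""
    return n / (n - 1) * p1 * v1 * ((p2 / p1) ** ((n - 1) / n) - 1)


def volumetric_efficiency(clearance_ratio, p1, p2, n):
    """Fraction of swept volume actually filled, with clearance_ratio = Vc / swept volume.

    Assumes the trapped gas re-expands polytropically with exponent n.
    """
    return 1 - clearance_ratio * ((p2 / p1) ** (1 / n) - 1)


# Illustrative case: 1 m3 of air compressed from 0.1 MPa to 0.8 MPa.
p1, p2, v1 = 0.1e6, 0.8e6, 1.0
w_isothermal = isothermal_work(p1, v1, p2)
w_polytropic = polytropic_work(p1, v1, p2, n=1.25)
w_adiabatic = polytropic_work(p1, v1, p2, n=1.4)
eta_v = volumetric_efficiency(0.05, p1, p2, n=1.25)
```

With these numbers the isothermal case needs the least work and the adiabatic case the most, matching the ordering in the list above, which is why better cylinder cooling (moving the process toward isothermal) saves work.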
