




Introduction to Spatial Data Mining

- 7.1 Pattern Discovery
- 7.2 Motivation
- 7.3 Classification Techniques
- 7.4 Association Rule Discovery Techniques
- 7.5 Clustering
- 7.6 Outlier Detection

Learning Objectives

- LO1: Understand the concept of spatial data mining (SDM)
  - Describe the concepts of patterns and SDM
  - Describe the motivation for SDM
- LO2: Learn about patterns explored by SDM
- LO3: Learn about techniques to find spatial patterns
- Focus on concepts, not procedures!
- Mapping sections to learning objectives
  - LO1: 7.1
  - LO2: 7.2.4
  - LO3: 7.3-7.6

Examples of Spatial Patterns

- Historic examples (Section 7.1.5, p. 186)
  - 1855 Asiatic cholera in London: a water pump identified as the source
  - Fluoride and healthy gums near the Colorado River
  - Theory of Gondwanaland: continents fit like pieces of a jigsaw puzzle
- Modern examples
  - Cancer clusters to investigate environmental health hazards
  - Crime hotspots for planning police patrol routes
  - Bald eagles nest on tall trees near open water
  - Nile virus spreading from the northeast USA to the south and west
  - Unusual warming of the Pacific Ocean (El Niño) affects weather in the USA

What is a Spatial Pattern?

- What is not a pattern?
  - Random, haphazard, chance, stray, accidental, unexpected
  - Without definite direction, trend, rule, method, design, aim, purpose
  - Accidental: without design, outside the regular course of things
  - Casual: absence of pre-arrangement, relatively unimportant
  - Fortuitous: what occurs without known cause
- What is a pattern?
  - A frequent arrangement, configuration, composition, regularity
  - A rule, law, method, design, description
  - A major direction, trend, prediction
  - A significant surface irregularity or unevenness

What is Spatial Data Mining?

- Metaphors
  - Mining nuggets of information embedded in large databases
  - Nuggets = interesting, useful, unexpected spatial patterns
  - Mining = looking for nuggets
  - Needle in a haystack
- Defining spatial data mining
  - Search for spatial patterns
  - Non-trivial search: as "automated" as possible, to reduce human effort
  - Interesting, useful and unexpected spatial patterns

What is Spatial Data Mining? - 2

- Non-trivial search for interesting and unexpected spatial patterns
- Non-trivial search
  - Large (e.g. exponential) search space of plausible hypotheses
  - Example: Figure 7.2, p. 186
  - Ex. Asiatic cholera: causes: water, food, air, insects, ...; water delivery mechanisms: numerous pumps, rivers, ponds, wells, pipes, ...
- Interesting
  - Useful in a certain application domain
  - Ex. shutting off the identified water pump saved human lives
- Unexpected
  - Pattern is not common knowledge
  - May provide a new understanding of the world
  - Ex. the water pump-cholera connection led to the "germ" theory

What is NOT Spatial Data Mining?

- Simple querying of spatial data
  - Find neighbors of Canada given names and boundaries of all countries
  - Find the shortest path from Boston to Houston in a freeway map
  - Search space is not large (not exponential)
- Testing a hypothesis via a primary data analysis
  - Ex. female chimpanzee territories are smaller than male territories
  - Search space is not large!
  - SDM: secondary data analysis to generate multiple plausible hypotheses
- Uninteresting or obvious patterns in spatial data
  - Heavy rainfall in Minneapolis is correlated with heavy rainfall in St. Paul, given that the two cities are 10 miles apart
  - Common knowledge: nearby places have similar rainfall
- Mining of non-spatial data
  - Diaper sales and beer sales are correlated in evenings
  - GPS product buyers are of 3 kinds: outdoors enthusiasts, farmers, technology enthusiasts

Why Learn about Spatial Data Mining?

- Two basic reasons for new work
  - Consideration of use in certain application domains
  - Provide fundamental new understanding
- Application domains
  - Scale up secondary spatial (statistical) analysis to very large datasets
    - Describe/explain locations of human settlements in the last 5000 years
    - Find cancer clusters to locate hazardous environments
    - Prepare land-use maps from satellite imagery
    - Predict habitat suitable for endangered species
  - Find new spatial patterns
    - Find groups of co-located geographic features
- Exercise: name 2 application domains not listed above.

Why Learn about Spatial Data Mining? - 2

- New understanding of geographic processes for critical questions
  - Ex. How is the health of planet Earth?
  - Ex. Characterize effects of human activity on environment and ecology
  - Ex. Predict the effect of El Niño on weather and economy
- Traditional approach: manually generate and test hypotheses
- But spatial data is growing too fast to analyze manually
  - Satellite imagery, GPS tracks, sensors on highways, ...
- Number of possible geographic hypotheses is too large to explore manually
  - Large number of geographic features and locations
  - Number of interacting subsets of features grows exponentially
  - Ex. find teleconnections between weather events across ocean and land areas
- SDM may reduce the set of plausible hypotheses
  - Identify hypotheses supported by the data
  - For further exploration using traditional statistical methods

Spatial Data Mining: Actors

- Domain expert
  - Identifies SDM goals, spatial dataset
  - Describes domain knowledge, e.g. well-known patterns, e.g. correlates
  - Validation of new patterns
- Data mining analyst
  - Helps identify pattern families, SDM techniques to be used
  - Explains the SDM outputs to the domain expert
- Joint effort
  - Feature selection
  - Selection of patterns for further exploration

The Data Mining Process

- Fig. 7.1, p. 184

Choice of Methods

- 2 approaches to mining spatial data
  1. Pick spatial features; use classical DM methods
  2. Use novel spatial data mining techniques
- Possible approach:
  - Define the problem: capture special needs
  - Explore data using maps, other visualization
  - Try reusing classical DM methods
  - If classical DM methods perform poorly, try new methods
  - Evaluate chosen methods rigorously
  - Performance tuning as needed

Learning Objectives

- LO1: Understand the concept of spatial data mining (SDM)
- LO2: Learn about patterns explored by SDM
  - Recognize common spatial pattern families
  - Understand unique properties of spatial data and patterns
- LO3: Learn about techniques to find spatial patterns

7.2.4 Families of SDM Patterns

- Common families of spatial patterns
  - Location prediction: where will a phenomenon occur?
  - Spatial interaction: which subsets of spatial phenomena interact?
  - Hot spots: which locations are unusual?
- Note: other families of spatial patterns may be defined
  - SDM is a growing field, which should accommodate new pattern families

7.2.4 Location Prediction

- Question addressed
  - Where will a phenomenon occur?
  - Which spatial events are predictable?
  - How can a spatial event be predicted from other spatial events?
    - Equations, rules, other methods, ...
- Examples:
  - Where will an endangered bird nest?
  - Which areas are prone to fire given maps of vegetation, drought, etc.?
  - What should be recommended to a traveler in a given location?
- Exercise: list two prediction patterns.

7.2.4 Spatial Interactions

- Question addressed
  - Which spatial events are related to each other?
  - Which spatial phenomena depend on other phenomena?
- Exercise: list two interaction patterns.

7.2.4 Hot Spots

- Question addressed
  - Is a phenomenon spatially clustered?
  - Which spatial entities or clusters are unusual?
  - Which spatial entities share common characteristics?
- Examples:
  - Cancer clusters (CDC) to launch investigations
  - Crime hot spots to plan police patrols
- Defining unusual
  - Comparison group: neighborhood, entire population
  - Significance: probability of being unusual is high

7.2.4 Categorizing Families of SDM Patterns

- Recall spatial data model concepts from Chapter 2
  - Entities: categories of distinct, identifiable, relevant things
  - Attribute: properties, features, or characteristics of entities
  - Instance of an entity: individual occurrence of entities
  - Relationship: interactions or connection among entities, e.g. neighbor
    - Degree: number of participating entities
    - Cardinality: number of instances of an entity in an instance of a relationship
    - Self-referencing: interaction among instances of a single entity
  - Instance of a relationship: individual occurrence of relationships
- Pattern families (PF) in entity relationship models
  - Relationships among entities, e.g. neighbor
  - Value-based interactions among attributes, e.g. the value of Student.age is determined by Student.date-of-birth

7.2.4 Families of SDM Patterns

- Common families of spatial patterns
  - Location prediction: the value of a special attribute of an entity is determined by the values of other attributes of the same entity
  - Spatial interaction:
    - N-ary interaction among subsets of entities
    - N-ary interactions among categorical attributes of an entity
  - Hot spots: self-referencing interaction among instances of an entity
- Note: other families of spatial patterns may be defined
  - SDM is a growing field, which should accommodate new pattern families

Unique Properties of Spatial Patterns

- Items in traditional data are independent of each other, whereas properties of locations in a map are often "auto-correlated".
- Traditional data deals with simple domains, e.g. numbers and symbols, whereas spatial data types are complex.
- Items in traditional data describe discrete objects, whereas spatial data is continuous.
- First law of geography (Tobler): everything is related to everything, but nearby things are more related than distant things.
  - People with similar backgrounds tend to live in the same area
  - Economies of nearby regions tend to be similar
  - Changes in temperature occur gradually over space (and time)

Example: Clustering and Auto-correlation

- Note the clustering of nest sites and the smooth variation of spatial attributes (Figure 7.3, p. 188 includes maps of two other attributes)
- Also see Fig. 7.4 (p. 189) for distributions with no autocorrelation

Moran's I: A Measure of Spatial Autocorrelation

- Given $x = (x_1, \ldots, x_n)$ sampled over $n$ locations, Moran's I is defined as $I = \frac{z W z^{T}}{z z^{T}}$, where $z = (x_1 - \bar{x}, \ldots, x_n - \bar{x})$ and $W$ is a normalized contiguity matrix.
- Fig. 7.5, p. 190

Moran's I: Example

- The pixel value sets in (b) and (c) are the same, but Moran's I is different.
- Q? Which dataset between (b) and (c) has higher spatial autocorrelation?
- Figure 7.5, p. 190

Basics of Probability Calculus

- Given a set of events $\Omega$, the probability $P$ is a function from $\Omega$ into $[0, 1]$ which satisfies the following two axioms:
  - $P(\Omega) = 1$
  - If $A$ and $B$ are mutually exclusive events, then $P(A \cup B) = P(A) + P(B)$
- Conditional probability: given that an event $B$ has occurred, the conditional probability that event $A$ will occur is $P(A \mid B)$. A basic rule is $P(A \cap B) = P(A \mid B) P(B) = P(B \mid A) P(A)$
- Bayes rule allows inversions of probabilities: $P(A \mid B) = \frac{P(B \mid A) P(A)}{P(B)}$
- The well-known regression equation $Y = X\beta + \epsilon$ allows derivation of linear models

Learning Objectives

- LO1: Understand the concept of spatial data mining (SDM)
- LO2: Learn about patterns explored by SDM
- LO3: Learn about techniques to find spatial patterns
  - Mapping SDM pattern families to techniques
  - Classification techniques
  - Association rule techniques
  - Clustering techniques
  - Outlier detection techniques

Mapping Techniques to Spatial Pattern Families

- Overview
  - There are many techniques to find a spatial pattern family
  - Choice of technique depends on feature selection, spatial data, etc.
- Spatial pattern families vs. techniques
  - Location prediction: classification, function determination
  - Interaction: correlation, association, colocations
  - Hot spots: clustering, outlier detection
- We discuss these techniques now with emphasis on spatial problems, even though these techniques apply to non-spatial datasets too

Location Prediction as a Classification Problem

- Given:
  1. A spatial framework
  2. Explanatory functions
  3. A dependent class
  4. A family of function mappings
- Find: a classification model
- Objective: maximize classification accuracy
- Constraints: spatial autocorrelation exists
- Color version of Fig. 7.3, p. 188 (panels: nest locations, distance to open water, vegetation durability, water depth)

Techniques for Location Prediction

- Classical methods: logistic regression, decision trees, Bayesian classifier
  - They assume learning samples are independent of each other
  - Spatial auto-correlation violates this assumption!
  - Q? What will a map look like where the properties of a pixel are independent of the properties of other pixels? (see Fig. 7.4, p. 189)
- New spatial methods
  - Spatial auto-regression (SAR)
  - Markov random field Bayesian classifier

Spatial AutoRegression (SAR)

- Spatial autoregression model (SAR): $y = \rho W y + X\beta + \epsilon$
  - $W$ models neighborhood relationships
  - $\rho$ models the strength of spatial dependencies
  - $\epsilon$ is the error vector
- Solutions
  - $\rho$ and $\beta$ can be estimated using ML or Bayesian statistics
  - E.g., the spatial econometrics package uses a Bayesian approach based on sampling-based Markov Chain Monte Carlo (MCMC) methods
  - Likelihood-based estimation requires $O(n^3)$ operations
  - Other alternatives: divide and conquer, sparse matrix, LU decomposition, etc.

Model Evaluation

- Confusion matrix M for 2-class problems
  - 2 rows: actual nest (true), actual non-nest (false)
  - 2 columns: predicted nest (positive), predicted non-nest (negative)
  - 4 cells listing the number of pixels in the following groups (Figure 7.7, p. 196)
    - Nest is correctly predicted: true positive (TP)
    - Model predicts a nest where there was none: false positive (FP)
    - No-nest is correctly classified: true negative (TN)
    - No-nest is predicted at a nest: false negative (FN)

Model Evaluation (cont.)

- Outcomes of classification algorithms are typically probabilities
- Probabilities are converted to class labels by choosing a threshold level b
- For example, probability > b is "nest" and probability < b ...

- ... power(2, n) possible associations
- Key assumption
  - Few associations have support above the given threshold
  - Associations with low support are not interesting
- Key insight: monotonicity
  - If an association item set has high support, then so do all its subsets
- Details
  - Pseudocode on p. 203
  - Execution trace example: Fig. 7.11 (p. 203)

Association Rules: Example

Spatial Association Rules

- Spatial association rules
  - A special reference spatial feature
  - Transactions are defined around instances of the special spatial feature
  - Item-types = spatial predicates
  - Example: Table 7.5 (p. 204)

Colocation Rules

- Motivation
  - Association rules need transactions (subsets of instances of item-types)
  - Spatial data is continuous
  - Decomposing spatial data into transactions may alter patterns
- Co-location rules
  - For point data in space
  - Do not need transactions; work directly with continuous space
  - Use neighborhood definitions and spatial joins
  - "Natural approach"

Colocation Rules

- Participation index = $\min_i \, pr(f_i, c)$
- Where $pr(f_i, c)$ of feature $f_i$ in co-location $c = \{f_1, f_2, \ldots, f_k\}$ is the fraction of instances of $f_i$ with features $\{f_1, \ldots, f_{i-1}, f_{i+1}, \ldots, f_k\}$ nearby
- $N(L)$ = neighborhood of location $L$

Co-location Rules vs. Association Rules

Co-location Example

- Dataset = spatial features A, B, C, and their instances
- Edges = neighbor relationship
- Colocation approach:
  - Support(A,B) = min(2/2, 3/3) = 1
  - Support(B,C) = min(2/2, 2/2) = 1
- Spatial association rule approach, with C as the reference feature
  - Transactions: (B1), (B2)
  - Support(B) = 2/2 = 1, but Support(A,B) = 0
- Transactions lose information
  - Partitioning 1: transactions = {(A1,B1,C1), (A2,B2,C2)}
    - Support(A,B) = 1, Support(B,C) = 1
  - Partitioning 2: transactions = {(A2,B1,C1), (B2,C2)}
    - Support(A,B) = 0.5, Support(B,C) = 1

Idea of Clustering

- Clustering: the process of discovering groups in large databases
  - Spatial view: rows in a database = points in a multi-dimensional space
  - Visualization may reveal interesting groups
- A diverse family of techniques based on available group descriptions
- Example: census 2001
  - Attribute-based groups
    - Homogeneous groups, e.g. urban core, suburbs, rural
    - Central places or major population centers
    - Hierarchical groups: NE corridor, metropolitan area, major cities, neighborhoods
    - Areas with unusually high population growth/decline
  - Purpose-based groups, e.g. segment population by consumer behaviour
    - Data-driven grouping with little a priori description of groups
    - Many different ways of grouping using age, income, spending, ethnicity, ...

Spatial Clustering Example

- Example data: population density
  - Fig. 7.13 (p. 207)
- Grouping goal: central places
  - Identify locations that dominate their surroundings
  - Groups are S1 and S2
- Grouping goal: homogeneous areas
  - Groups are A1 and A2
- Note: the clustering literature may not identify the grouping goals explicitly
  - Such clustering methods may be used for purpose-based group finding

Techniques for Clustering

- Categorizing classical methods:
  - Hierarchical methods
  - Partitioning methods, e.g. K-mean, K-medoid
  - Density-based methods
  - Grid-based methods
- New spatial methods
  - Comparison with complete spatial random processes
  - Neighborhood EM
- Our focus:
  - Section 7.5: partitioning methods and new spatial methods
  - Section 7.6 on outlier detection has methods similar to density-based methods

Algorithmic Ideas in Clustering

- Hierarchical
  - All points in one cluster
  - Then splits and merges till a stopping criterion is reached
- Partitional
  - Start with random central points
  - Assign points to the nearest central point
  - Update the central points
  - Approach with statistical rigor
- Density
  - Find clusters based on the density of regions
- Grid-based
  - Quantize the clustering space into a finite number of cells
  - Use thresholding to pick high-density cells
  - Merge neighboring cells to form clusters

Idea of Outliers

- What is an outlier?
  - Observations inconsistent with the rest of the dataset
  - Ex. point D, L or G in Fig. 7.16(a), p. 216
- Techniques for global outliers
  - Statistical tests based on membership in a distribution
    - Pr.[item in population] is low
  - Non-statistical tests based on distance, nearest neig
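
Worked sketch: Moran's I. The slides on Moran's I note that two maps with the same set of pixel values can have different spatial autocorrelation. The Python sketch below illustrates that point, assuming the form I = z W z^T / (z z^T) with a row-normalized contiguity matrix W. The function names (`row_normalize`, `chain_contiguity`, `morans_i`) and the six-cell one-dimensional data are hypothetical illustrations and are not taken from Fig. 7.5.

```python
def row_normalize(w):
    """Scale each row of a contiguity matrix so it sums to 1.

    Rows that are all zero (cells with no neighbours) are left as zeros.
    """
    out = []
    for row in w:
        total = sum(row)
        out.append([v / total if total else 0.0 for v in row])
    return out


def chain_contiguity(n):
    """Hypothetical layout: n cells in a line, cell i touches i-1 and i+1."""
    return [[1.0 if abs(i - j) == 1 else 0.0 for j in range(n)]
            for i in range(n)]


def morans_i(values, w):
    """Moran's I = (z W z^T) / (z z^T), z = deviations from the mean."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    denominator = sum(zi * zi for zi in z)
    if denominator == 0:
        raise ValueError("Moran's I is undefined when all values are equal")
    numerator = sum(z[i] * w[i][j] * z[j]
                    for i in range(n) for j in range(n))
    return numerator / denominator


# Same multiset of values, arranged two different ways (illustrative data).
clustered = [1, 1, 1, 9, 9, 9]   # similar values sit next to each other
scattered = [1, 9, 1, 9, 1, 9]   # neighbours always differ

W = row_normalize(chain_contiguity(6))
i_clustered = morans_i(clustered, W)
i_scattered = morans_i(scattered, W)
print(i_clustered, i_scattered)
```

With this layout the clustered arrangement gives a positive value and the alternating arrangement a negative one, which matches the intuition that nearby cells with similar values indicate positive spatial autocorrelation.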