模式识别聚类算法

上传人：3*** IP属地：湖北上传时间：2023-01-12 格式：PPT 页数：49 大小：1.16MB 积分：30 举报 版权申诉

已阅读5页，还剩44页未读，继续免费阅读

版权说明：本文档由用户提供并上传，收益归属内容提供方，若内容存在侵权，请进行举报或认领

文档简介

模式识别PatternRecognitionChapter10(III)HIERARCHICALCLUSTERINGALGORITHMS12一月20231HIERARCHICALCLUSTERINGALGORITHMSTheyproduceahierarchyof(hard)clusteringsinsteadofasingleclustering.Applicationsin:SocialsciencesBiologicaltaxonomyModernbiologyMedicineArchaeologyComputerscienceandengineering2LetX={x1,…,xN},

xi=[xi1,…,xil]T.Recallthat:Inhardclusteringeachvectorbelongsexclusivelytoasinglecluster.Anm-(hard)clusteringofX,,isapartitionofXintomsets(clusters)C1,…,Cm

,sothat:

Bythedefinition:={Cj,j=1,…m}Definition:

Aclustering1containing

kclustersissaidtobenestedintheclustering2containingr(<k)clusters,ifeachclusterin1

isasubsetofaclusterin2. Wewrite123Example:Let1={{x1,x3},{x4},{x2,x5}},2={{x1,x3,x4},{x2,x5}},

3={{x1,x4},{x3},{x2,x5}},4={{x1,x2,x4},{x3,x5}}.Itis12,butnot

13,

14,

11.Remarks:Hierarchicalclusteringalgorithmsproduceahierarchyofnested

clusterings.TheyinvolveNstepsatthemost.Ateachstept,theclusteringtisproducedbyt-1.Maincategories:Agglomerativeclusteringalgorithms:Here0={{x1},…,{xN}},N-1={{x1,…,xN}}

and0…N-1.Divisiveclusteringalgorithms:Here0={{x1,…,xN}}, N-1={{x1},…,{xN}}

andN-1…0.4AGGLOMERATIVEALGORITHMSLetg(Ci,Cj)aproximityfunctionbetweentwoclustersofX.GeneralizedAgglomerativeScheme(GAS)InitializationChoose0={{x1},…,{xN}}t=0Repeatt=t+1Choose(Ci,Cj)int-1suchthatDefineCq=CiCjandproducet=(t-1-{Ci,Cj})

{Cq}Untilallvectorslieinasinglecluster.5Remarks:Iftwovectorscometogetherintoasingleclusteratleveltofthehierarchy,theywillremaininthesameclusterforallsubsequentclusterings.Asaconsequence,thereisnowaytorecovera“poor”clusteringthatmayhaveoccurredinanearlierlevelofhierarchy.Numberofoperations:O(N3)6Definitionsofsomeusefulquantities:LetX={x1,x2,…,xN},withxi=[xi1,xi2,…,xil]T.Patternmatrix

(D(X)):AnNxlmatrixwhosei-throwisxi(transposed).Proximity(similarityordissimilarity)matrix

(P(X)):AnNxNmatrixwhose(i,j)elementequalstheproximity(xi,xj)(similaritys(xi,xj),dissimilarityd(xi,xj)).Example1:LetX={x1,x2,x3,x4,x5},withx1=[1,1]T,x2=[2,1]T,

x3=[5,4]T,x4=[6,5]T,x5=[6.5,6]T.

Euclideandistance

Tanimotodistance7Thresholddendrogram(ordendrorgram):Itisaneffectivewayofrepresentingthesequenceofclusteringswhichareproducedbyanagglomerativealgorithm. Inthepreviousexample,ifisemployedasthedistancemeasurebetweentwosetsandtheEuclideanoneasthedistancemeasurebetweentwovectors,thefollowingseriesofclusteringsareproduced:8Proximity(dissimilarityordissimilarity)dendrogram:Adendrogramthattakesintoaccountthelevelofproximity(dissimilarityorsimilarity)wheretwoclustersaremerged

forthefirsttime.Example2:Intermsofthepreviousexample,theproximitydendrogramsthatcorrespondtoP΄(X)andP(X)areRemark:Onecanreadilyobservethelevelinwhichaclusterisformedandthelevelinwhichitisabsorbedinalargercluster(indicationofthenaturalclustering).9Agglomerativealgorithmsaredividedinto:Algorithmsbasedonmatrixtheory.Algorithmsbasedongraphtheory.Inthesequelwefocusonlyondissimilaritymeasures.Algorithmsbasedonmatrixtheory.TheytakeasinputtheNxNdissimilaritymatrixP0=P(X).AteachleveltwheretwoclustersCiandCjaremergedtoCq,thedissimilaritymatrixPtisextractedfromPt-1by:DeletingthetworowsandcolumnsofPtthatcorrespondtoCiandCj.AddinganewrowandanewcolumnthatcontainthedistancesofnewlyformedCq=CiCjfromtheremainingclustersCs,viaarelationoftheform

d(Cq,Cs)=f(d(Ci,Cs),d(Cj,Cs),d(Ci,Cj))10Anumberofdistancefunctionscomplywiththefollowingupdateequation

d(Cq,Cs)=aid(Ci,Cs)+aj(d(Cj,Cs)+bd(Ci,Cj)+c|d(Ci,Cs)-d(Cj,Cs)|Algorithmsthatfollowtheaboveequationare:Singlelink(SL)algorithm(ai=1/2,aj=1/2,b=0,c=-1/2).Inthiscase

d(Cq,Cs)=min{d(Ci,Cs),d(Cj,Cs)}Completelink(CL)algorithm(ai=1/2,aj=1/2,b=0,c=1/2).Inthiscase

d(Cq,Cs)=max{d(Ci,Cs),d(Cj,Cs)}Remarks:Singlelinkformsclustersatlowdissimilaritieswhilecompletelinkformsclustersathighdissimilarities.Singlelinktendstoformelongatedclusters(chainingeffect)whilecompletelinktendstoformcompactclusters.Therestalgorithmsarecompromisesbetweenthesetwoextremes.11Example:(a)ThedatasetX.(b)Thesinglelinkalgorithmdissimilaritydendrogram.(c)Thecompletelinkalgorithmdissimilaritydendrogram12WeightedPairGroupMethodAverage(WPGMA)(ai=1/2,aj=1/2,b=0,c=0).Inthiscase:

d(Cq,Cs)=(d(Ci,Cs)

+d(Cj,Cs))/2UnweightedPairGroupMethodAverage(UPGMA)(ai=ni/(ni+nj),aj=nj/(ni+nj),b=0,c=0,whereniisthecardinalityofCi).Inthiscase:

d(Cq,Cs)=(ni

d(Ci,Cs)

d(Cj,Cs))/(ni+nj)UnweightedPairGroupMethodCentroid(UPGMC)(ai=ni/(ni+nj),aj=nj/(ni+nj),b=-ni

nj/(ni+nj)2,c=0).Inthiscase:

FortheUPGMC,itistruethatdqs=||mq-ms||2,wheremqisthemeanofCq

.13WeightedPairGroupMethodCentroid(WPGMC)(ai=1/2,aj=1/2,b=-1/4,c=0).Inthiscase

dqs=(dis

+djs)/2–dij/4 ForWPGMCtherearecaseswheredqs

max{dis,djs}

(crossover)Wardorminimumvariancealgorithm.Herethedistanced´ijbetweenCiandCjisdefinedas

d´ij=(ni

nj/(ni+nj))

||mi-mj||2

d´qscanalsobewrittenas

d´qs=((ni

)d´is

(ni

nj)d´js

–

nsd´ij

)/(ni+nj+ns)Remark:Ward’salgorithmformst+1

bymergingthetwoclustersthatleadtothesmallestpossibleincreaseofthetotalvariance,i.e.,14Example3:Considerthefollowingdissimilaritymatrix(Euclideandistance)Allthealgorithmsproducetheabovesequenceofclusterings

atdifferentproximitylevels:SLCLWPGMAUPGMAWPGMCUPGMCWard0000000011111110.50.75352.251.54163725.7527.524.6926.4631.750={{x1},{x2},{x3},{x4},{x5}},1={{x1,x2},{x3},{x4},{x5}},2={{x1,x2},{x3},{x4,x5}},3={{x1,x2,x3},{x4,x5}},4={{x1,x2,x3,x4,x5}}15Monotonicityandcrossover: Forthefollowingdissimilaritymatrix

thedissimilaritydendrogramsproducedbysinglelink,completelinkandUPGMC(thesameresultisproducedifWPGMCisemployed)are:{x1,x2,x3,x4}isformedatlowerdissimilaritylevelthan{x1,x2}(crossover)16

Monotonicitycondition:

IfclustersCiandCjareselectedtobemergedinclusterCq,atthetth

levelofthehierarchy,thecondition

d(Cq,Ck)

d(Ci,Cj) mustholdforallCk,k

≠

q.Inotherwords,themonotonicityconditionimpliesthataclusterisformedathigherdissimilaritylevelthananyofitscomponents.Remarks:Monotonicityisapropertythatisexclusivelyrelatedtotheclusteringalgorithmandnottothe(initial)proximitymatrix.Analgorithmthatdoesnotsatisfythemonotonicitycondition,doesnotnecessarilyproducedendrogramswithcrossovers.Singlelink,completelink,UPGMA,WPGMAandtheWard’salgorithmsatisfythemonotonicitycondition,whileUPGMCandWPGMCdonotsatisfyit.17Complexityissues:GASrequires,ingeneral,O(N3)operations.MoreefficientimplementationsrequireO(N2logN)computationaltime.Foraclassofwidelyusedalgorithms,implementationsthatrequireO(N2)computationaltimeandO(N2)orO(N)storagehavealsobeenproposed.ParallelimplementationsonSIMDmachineshavealsobeenconsidered.18AlgorithmsbasedongraphtheorySomebasicdefinitionsfromgraphtheory:Agraph,G,isdefinedasanorderedpairG=(V,E),whereV={vi,i=1,…,N}isasetofverticesandEisasetofedgesconnectingsomepairsofvertices.Anedgeconnectingviandvjisdenotedbyeijor(vi,vj).Agraphiscalledundirectedgraphifthereisnodirectionassignedtoanyofitsedges.Otherwise,wedealwithdirectedgraphs.Agraphiscalledunweightedgraphifthereisnocostassociatedwithanyofitsedges.Otherwise,wedealwithweightedgraphs.ApathinGbetweenverticesvi1andvin

isasequenceofverticesandedgesoftheformvi1

ei1i2vi2....vin-1ein-1invin.AloopinGisapathwherevi1andvin

coincide.Asubgraph

G´=(V´,E´)ofGisagraphwithV´VandE´E1,whereE1isasubsetofEcontainingverticesthatconnectverticesofV´.Everygraphisasubgraphtoitself.Aconnectedsubgraph

G´=(V´,E´)isasubgraphwherethereexistsatleastonepathconnectinganypairofverticesinV´.19Acompletesubgraph

G´=(V´,E)isasubgraphwhereforanypairofverticesinV’thereexistsanedgeinE´´connectingthem.AmaximallyconnectedsubgraphofGisaconnectedsubgraph

G´ofGthatcontainsasmanyverticesofGaspossible.AmaximallycompletesubgraphofGisacompletesubgraph

G´ofGthatcontainsasmanyverticesofGaspossible. Examplesfortheabove,areshowninthefollowingfigure.2021NOTE:Intheframeworkofclustering,eachvertexofagraphcorrespondstoafeaturevector.Usefultoolsforthealgorithmsbasedongraphtheoryarethethresholdgraphandtheproximitygraph.Athresholdgraph

G(a)

isanundirected,unweightedgraphwithNnodes,eachonecorrespondingtoavectorofX.Noself-loopsormultipleedgesbetweenanytwoverticesareencountered.ThesetofedgesofG(a)containsthoseedges(vi,vj)forwhichthedistanced(xi,xj)betweenthevectorscorrespondingtoviandvjislessthana.Aproximitygraph

Gp(a)isathresholdgraphG(a),allofwhoseedges(vi,vj)areweightedwiththeproximitymeasured(xi,xj).22(a)ThethresholdgraphG(3),(b)theproximity(dissimilarity)graphGp(3),(c)thethresholdgraphG(5),(d)thedissimilaritygraphGp(5),forthedissimilaritymatrixP(X)giveninexample1.23Moredefinitions:Inthisframework,weconsidergraphsG,ofNnodes,whereeachnodecorrespondstoavectorofX.ValidclustersareconnectedcomponentsofGthatsatisfyanadditionalgraphpropertyh(k).Typicalgraphpropertiesforaconnectedsubgraph

G´ofGare:Nodeconnectivity:ThelargestintegerksuchthatallpairsofnodesofG´arejoinedbyatleastkpathshavingnonodesincommon.Edgeconnectivity:Thelargestintegerksuchthatallpairsofnodesarejoinedbyatleastkpathshavingnoedgesincommon.Nodedegree:Thelargestintegerksuchthateachnodehasatleastkincidentedges.

2425Theproximityfunctiongh(k)(Cr,Cs)betweentwoclustersisdefinedintermsofaproximitymeasurebetweenvectors(nodes)certainconstraintsimposedbypropertyh(k)onthesubgraphsthatareformed.Inmathematicallanguage:

gh(k)(Cr,Cs)

minxuCr,xvCs

{d(xu,xv)a:theG(a)

subgraphdefinedbyCrCsis(a)connectedandeither(b1)hasthepropertyh(k)or(b2)iscomplete}Graphtheory-basedalgorithmicscheme(GTAS):ItistheGASinthecontextofgraphtheory.InthecontextofGTAS,apairofclusters(Ci,Cj)isselectedtobemergedaccordingto:26Singlelink(SL)algorithm.Here

gh(k)(Cr,Cs)=minxuCr,xvCs

{d(xu,xv)a:theG(a)

subgraphdefinedbyCrCsisconnected}

minxCr,yCs

d(x,y)

(why?)

Remarks:Nopropertyh(k)orcompletenessisrequired.TheSLstemmingfromthegraphtheoryisexactlythesamewiththeSLstemmingfromthematrixtheory.Completelink(CL)algorithm.Here

gh(k)(Cr,Cs)

minxuCr,xvCs

{d(xu,xv)a:theG(a)

subgraphdefinedbyCrCsiscomplete}

maxxCr,yCs

d(x,y)

(why?)Remarks:Nopropertyh(k)isrequired.TheCLstemmingfromgraphtheoryisexactlythesamewiththeCLstemmingfrommatrixtheory.27Example5:Forthedissimilaritymatrix,SLandCLproducethesamehierarchyofclusteringsatthelevelsgiveninthetable.SLCL0={{x1},{x2},{x3},{x4},{x5}}

1={{x1,x2},{x3},{x4},{x5}}

2={{x1,x2},{x3},{x4,x5}}

3={{x1,x2,x3},{x4,x5}}

4={{x1,x2,x3,x4,x5}}02.504.228

Remarks:SLposestheweakestpossiblegraphcondition(connectivity)fortheformationofacluster,whileCLposesthestrongestpossiblegraphcondition(completeness)fortheformationofacluster.Avarietyofgraphtheory-basedalgorithms,thatliebetweenthesetwoextremesresultforvariouschoicesofh(k).Fork=1allthesealgorithmscollapsetothesinglelinkalgorithm.Askincreases,theresultingsubgraphsapproachcompleteness.ClusteringalgorithmsbasedontheMinimumSpanningTree(MST)Definitions:SpanningTree:Itisaconnectedgraph(containingalltheverticesofthegraph),withnoloops(onlyonepathconnectsanytwovertices).WeightofaSpanningTree:Thesumoftheweightsofitsedges(providedthattheyhavebeenassignedwithaweight).MinimumSpanningTree(MST):Aspanningtreewiththesmallestweightamongthespanningtreesconnectingalltheverticesofthegraph.29Remarks:TheMSThasN-1edges.Whenalltheweightsaredifferentfromeachother,theMSTisunique.Otherwise,itmaynotbeunique.EmployingtheGTASandsubstitutinggh(k)(Cr,Cs)withg(Cr,Cs)=minij{wij:

xiCr,xjCs}wherewij=d(xi,xj),wecandeterminetheMST.Alternatively,ahierarchyofclusteringsmaybeobtainedbytheMSTasfollows:

TheclusteringtatthetthlevelisthesetofconnectedcomponentsoftheMST,whenonlyitstsmallestweightsareconsidered.Remark:ThehierarchyproducedbyMSTisthesamewiththatproducedbythesinglelinkalgorithm,atleastwhenallwij´saredifferentfromeachother.30TiesintheproximitymatrixSLproducesthesamehierarchyofclusterings,independentoftheorderofconsiderationofedgeswithequalweights.CLmayproducedifferenthierarchies,dependingontheorderofconsiderationofedgeswithequalweights.Theothergraphtheory-basedalgorithmsbehaveastheCL.Thesametrendappearsinthematrix-basedalgorithms.Inthiscase,tiesmayappearatalaterstageofthealgorithm.Example6:LetNotethatP(2,3)=P(3,4).(CL(a))(SL)(CL(b))31DIVISIVEALGORITHMSLetg(Ci,Cj)beadissimilarityfunctionbetweentwoclusters.LetCtjdenotethejthclusterofthetthclusteringt,t=0,…,N-1,j=1,…,t+1.32GeneralizedDivisiveScheme(GDS)InitializationChoose0={X}astheinitialclusteringt=0Repeat

t=t+1Fori=1totAmongallpossiblepairsofclusters(Cr,Cs)thatformapartitionofCt-1,i,findthepair(C1t-1,i,C2t-1,i)thatgivesthemaximumvalueforg.EndforFromthetpairsdefinedinthepreviousstep,choosetheonethatmaximizesg.Supposethatthisis(C1t-1,j,C2t-1,j).Thenewclusteringis:t=(t-1

–

{Ct-1,j}){C1t-1,j,C2t-1,j}Relabeltheclustersoft.Untileachvectorliesinasinglecluster.33Remarks:Differentchoicesofggiverisetodifferentalgorithms.TheGDSiscomputationallyverydemandingevenforsmallN.Algorithmsthatruleoutmanypartitionsasnot“reasonable”,underaprespecifiedcriterion,havealsobeenproposed.Algorithmswherethesplittingoftheclustersisbasedonallfeaturesofthefeaturevectorsarecalledpolytheticalgorithms.Otherwise,ifthesplittingisbasedonasinglefeatureateachstep,thealgorithmsarecalledmonotheticalgorithms.34HIERARCHICALALGORITHMSFORLARGEDATASETSSincethenumberofoperationsrequiredbyGASisgreaterthanO(N2), algorithmsresultingbyGASareprohibitiveforverylargedatasetsencountered,forexample,inwebminingandbioinformatics.Toovercomethisdrawback,varioushierarchicalalgorithmsofspecialtypehavebeendevelopedthataretailoredtohandlelargedatasets.

Typicalexamplesare:TheCUREalgorithm.TheROCKalgorithm.TheChameleonalgorithm.35TheCURE(ClusteringUsingRepresentatives)algorithmInCURE:EachclusterCisrepresentedbyasetofk>1representatives,denotedbyRC.Theserepresentativestryto“capture”theshapeofthecluster.Theyarechosenatthe“border”oftheclusterandthen,theyarepushedtowarditsmean,inordertodiscardtheirregularitiesoftheborder.Morespecifically,RCisdeterminedasfollows:SelectxC,withthemaximumdistancefromthemeanmCofCandsetRC={x}Fori=2

tomin{k,nC}

(nC

isthenumberofpointsinC)DetermineyC-RCthatliesfarthestfromthepointsofRCandset RC=

{y}.ShrinkthepointsxRCtowardthemeanmCinCbyafactora.Thatisx=(1-a)

x+a

,xRC

.CUREisaspecialcaseofGASwherethedistancebetweentwoclustersis

definedas:36WorstcasetimecomplexityforCURE:O(N2log2N).Thisisprohibitiveforverylargedatasets.Solution:Adoptionoftherandomsamplingtechnique. ThesizeN´ofasampledatasetX´,createdfromX,viarandom

sampling,shouldbesufficientlylargeinordertoensurethatthe

probabilityofmissingaclusterduetosamplingislow.CUREutilizingrandomsamplingIdentificationofclustersPartitionrandomlyXintop=N/N´sampledatasets.Foreachoneofthepsampledatasets.ApplytheoriginalversionofCURE,untilN´/qclusters(atthemost)areformed(qisuser-defined).Consideralltheabovep(N´/q)clusters(atthemost)andapplytheoriginalCUREuntiltherequirednumberofclusters,m,isformed.AssignmentofpointstoclustersForeachofthemclustersselectarandomsampleofrepresentativepoints.Assigneachpointxthatisnotclusterrepresentativetotheclusterthatcontainstherepresentativeclosesttoit.37Remarks:CUREissensitivetotheparametersk,N´,a.Specifically:kmustbelargeenoughtocapturethegeometryofeachcluster.N´mustbehigherthanacertainpercentageofN

(typicallyN´

2.5%N)ForsmallaCUREbehavesliketheMSTalgorithm,whileforlargeaitresemblesthealgorithmsthatuseasinglepointrepresentativeforeachcluster.

WorstcasetimecomplexityforCUREusingrandomsampling:O(N´2log2N´)Thealgorithmexhibitslowsensitivitywithrespecttooutlierswithintheclusters.Afewstagesbeforeitstermination,CUREchecksforclusterscontainingveryfewdatapointsandremovesthem(sincetheyarelikelytobeoutliers).IfN´/qissufficientlylarge,comparedtom,itisexpectedthatthepartitionofXwillnotaffectsignificantlythefinalclusteringobtainedbyCURE.CUREcan,inprinciple,revealclustersofnon-sphericalorelongatedshapes,aswellasclustersofwidevarianceinsize.CUREcanbeimplementedefficientlyusingtheheapandthek-dtreedatastructures.38TheROCK(RObustClusteringusinglinKs)algorithmItisbestsuitedfornominal(categorical)features.

SomepreliminariesTwopointsx,yXareconsideredneighborsifs(x,y)θ,wheres(.)isasimilarityfunctionandθauser-definedthresholdofsimilaritybetweentwovectors.

link(x,y)isthenumberofcommonneighborsbetweenxandy.

Assumption:Thereexistsafunctionf(θ)

(<1)suchthat: “EachpointassignedtoaclusterCihasapproximatelynif(θ)neighborsin

Ci(niisthenumberofpointsinCi)” Itcanbeprovedthattheexpectedtotalnumberoflinksamongallpairs

inCiisni1+2f(θ).39ROCKisaspecialcaseofGASwhereTheclosenessbetweentwoclustersisdefinedas Thedenominatoristheexpectedtotalnumberoflinksbetweenthetwoclusters. Thelargertheg(.),themoresimilartheclustersCiandCj

are

.Thestoppingcriterionis:thenumberofclustersbecomeequaltoapredefinednumberm

link(Ci,Cj)=0foreverypairinaclusteringt.TimecomplexityforROCK:SimilartoCUREforlargeN.Prohibitiveforverylargedatasets.Solution:Adoptionofrandomsamplingtechniques.40ROCKutilizingRandomSamplingIdentificationofclustersSelectasubsetX´ofXviarandomsamplingRuntheoriginalROCKalgorithmonX´AssignmentofpointstoclustersForeachclusterCiselectasetLiofnLipointsForeachzX-X´Computeti=Ni/(nLi+1)f(θ),whereNiisthenumberofneighborsofzinLi.Assignztotheclusterwiththemaximumti.Remarks:Achoiceforf(θ)isf(θ)=(1-θ)/(1+θ),with(θ<1).

f(θ)dependsonthedatasetandthetypeofclustersweareinterestedin.Thehypothesisabouttheexistenceoff(θ)isverystrong.Itmayleadtopoorresultsifthedatadonotsatisfyit.41TheChameleonalgorithmThisalgorithmisnotbasedona“static”modelingofclusterslikeCURE(whereeachclusterisrepresentedbythesamenumberofrepresentatives)andROCK(whereconstraintsareposedthroughthefunctionf(θ)).Itenjoysbothdivisiveandagglomerativefeatures.Somepreliminaries: LetG=(V,E)beagraphwhere:eachvertexofVcorrespondstoadatapointinX.EisasetofedgesconnectingpairsofverticesinV.Eachvertexisweightedbythesimilarityofthecorrespondingpoints.Edgecutset:LetCbeasetofpointscorrespondingtoasubsetofV. AssumethatCispartitionedintotwononemptysetsCiandCj ThesubsetE´ijofEthatconnectCiandCjiscallededgecutset.42Minimumcutset:Let|E´ij|bethesumofweightsoftheedgesinE´ij. If|E´ij|=min(Cu,Cv)|Euv|,then(Ci

,Cj)istheminimumcutsetofC.

Minimumcutbisector:IfCi,Cjareconstrainedtobeofapproximate

equalsize,theminimumcutset(overallpossiblepartitionsof

approximatelyequalsize)isknownastheminimumcutbisector.Example7:Thegraphinthefollowingfigureconsistsof5verticesandtheedgesshown,eachoneweightedbythesimilarityofthepointsthatcorrespondtotheverticesitconnects.Theminimumcutsetandtheminimumcutbisectorareshown.43MeasuringthesimilaritybetweenclustersRelativeinterconnectivity:LetEijbethesetofedgesconnectingpointsinCiwithpointsinCj.LetEibethesetofedgescorrespondingtotheminimumcutbisectorofCi.Let|Ei|,|Eij|bethesumoftheweightsoftheedgesofEi,Eij,respectively.AbsoluteinterconnectivitybetweenCi,Cj

=|Eij|InternalinterconnectivityofCi

=|Ei|RelativeinterconnectivitybetweenCi,Cj:Relativecloseness:LetSijbetheaverageweightoftheedgesinEij

.LetSibetheaverageweightoftheedgesinEi

.RelativeclosenessbetweenCiandCj:44TheChameleonalgorithm

Preliminaryphase Createak-nearestneighborgraphG=(V,E)suchthat:EachvertexofVcorrespondstoadatapoint.TheedgebetweentwoverticesviandvjisaddedtoEifviisoneofthek-nearestneighborsofvj

orviseversa.

Divisivephase Set0={X}

t=0 Repeatt=t+1SelectthelargestclusterCint-1.ReferringtoE,partitionCintotwosetssothat:thesumoftheweightsoftheedgecutsetbetweentheresultingclustersisminimized.eachclustercontainsatleast25%oftheverticesofC. Untileachcluster

人人文库> 全部分类> 教育资料 > 课件下载

温馨提示

1. 本站所有资源如无特殊说明，都需要本地电脑安装OFFICE2007和PDF阅读器。图纸软件为CAD,CAXA,PROE,UG,SolidWorks等.压缩文件请下载最新的WinRAR软件解压。
2. 本站的文档不包含任何第三方提供的附件图纸等，如果需要附件，请联系上传者。文件的所有权益归上传用户所有。
3. 本站RAR压缩包中若带图纸，网页内容里面会有图纸预览，若没有图纸预览就没有图纸。
4. 未经权益所有人同意不得将文件中的内容挪作商业或盈利用途。
5. 人人文库网仅提供信息存储空间，仅对用户上传内容的表现方式做保护处理，对用户上传分享的文档内容本身不做任何修改或编辑，并不能对任何下载内容负责。
6. 下载文件中如有侵权或不适当内容，请与我们联系，我们立即纠正。
7. 本站不保证下载资源的准确性、安全性和完整性, 同时也不承担用户因使用这些下载资源对自己和他人造成任何形式的伤害或损失。

模式识别聚类算法

文档简介

温馨提示

最新文档

评论

模式识别聚类算法

文档简介

温馨提示

最新文档

评论

相关文档