




版权说明:本文档由用户提供并上传,收益归属内容提供方,若内容存在侵权,请进行举报或认领
文档简介
模式识别PatternRecognitionChapter10(III)HIERARCHICALCLUSTERINGALGORITHMS12一月20231HIERARCHICALCLUSTERINGALGORITHMSTheyproduceahierarchyof(hard)clusteringsinsteadofasingleclustering.Applicationsin:SocialsciencesBiologicaltaxonomyModernbiologyMedicineArchaeologyComputerscienceandengineering2LetX={x1,…,xN},
xi=[xi1,…,xil]T.Recallthat:Inhardclusteringeachvectorbelongsexclusivelytoasinglecluster.Anm-(hard)clusteringofX,,isapartitionofXintomsets(clusters)C1,…,Cm
,sothat:
Bythedefinition:={Cj,j=1,…m}Definition:
Aclustering1containing
kclustersissaidtobenestedintheclustering2containingr(<k)clusters,ifeachclusterin1
isasubsetofaclusterin2. Wewrite123Example:Let1={{x1,x3},{x4},{x2,x5}},2={{x1,x3,x4},{x2,x5}},
3={{x1,x4},{x3},{x2,x5}},4={{x1,x2,x4},{x3,x5}}.Itis12,butnot
13,
14,
11.Remarks:Hierarchicalclusteringalgorithmsproduceahierarchyofnested
clusterings.TheyinvolveNstepsatthemost.Ateachstept,theclusteringtisproducedbyt-1.Maincategories:Agglomerativeclusteringalgorithms:Here0={{x1},…,{xN}},N-1={{x1,…,xN}}
and0…N-1.Divisiveclusteringalgorithms:Here0={{x1,…,xN}}, N-1={{x1},…,{xN}}
andN-1…0.4AGGLOMERATIVEALGORITHMSLetg(Ci,Cj)aproximityfunctionbetweentwoclustersofX.GeneralizedAgglomerativeScheme(GAS)InitializationChoose0={{x1},…,{xN}}t=0Repeatt=t+1Choose(Ci,Cj)int-1suchthatDefineCq=CiCjandproducet=(t-1-{Ci,Cj})
{Cq}Untilallvectorslieinasinglecluster.5Remarks:Iftwovectorscometogetherintoasingleclusteratleveltofthehierarchy,theywillremaininthesameclusterforallsubsequentclusterings.Asaconsequence,thereisnowaytorecovera“poor”clusteringthatmayhaveoccurredinanearlierlevelofhierarchy.Numberofoperations:O(N3)6Definitionsofsomeusefulquantities:LetX={x1,x2,…,xN},withxi=[xi1,xi2,…,xil]T.Patternmatrix
(D(X)):AnNxlmatrixwhosei-throwisxi(transposed).Proximity(similarityordissimilarity)matrix
(P(X)):AnNxNmatrixwhose(i,j)elementequalstheproximity(xi,xj)(similaritys(xi,xj),dissimilarityd(xi,xj)).Example1:LetX={x1,x2,x3,x4,x5},withx1=[1,1]T,x2=[2,1]T,
x3=[5,4]T,x4=[6,5]T,x5=[6.5,6]T.
Euclideandistance
Tanimotodistance7Thresholddendrogram(ordendrorgram):Itisaneffectivewayofrepresentingthesequenceofclusteringswhichareproducedbyanagglomerativealgorithm. Inthepreviousexample,ifisemployedasthedistancemeasurebetweentwosetsandtheEuclideanoneasthedistancemeasurebetweentwovectors,thefollowingseriesofclusteringsareproduced:8Proximity(dissimilarityordissimilarity)dendrogram:Adendrogramthattakesintoaccountthelevelofproximity(dissimilarityorsimilarity)wheretwoclustersaremerged
forthefirsttime.Example2:Intermsofthepreviousexample,theproximitydendrogramsthatcorrespondtoP΄(X)andP(X)areRemark:Onecanreadilyobservethelevelinwhichaclusterisformedandthelevelinwhichitisabsorbedinalargercluster(indicationofthenaturalclustering).9Agglomerativealgorithmsaredividedinto:Algorithmsbasedonmatrixtheory.Algorithmsbasedongraphtheory.Inthesequelwefocusonlyondissimilaritymeasures.Algorithmsbasedonmatrixtheory.TheytakeasinputtheNxNdissimilaritymatrixP0=P(X).AteachleveltwheretwoclustersCiandCjaremergedtoCq,thedissimilaritymatrixPtisextractedfromPt-1by:DeletingthetworowsandcolumnsofPtthatcorrespondtoCiandCj.AddinganewrowandanewcolumnthatcontainthedistancesofnewlyformedCq=CiCjfromtheremainingclustersCs,viaarelationoftheform
d(Cq,Cs)=f(d(Ci,Cs),d(Cj,Cs),d(Ci,Cj))10Anumberofdistancefunctionscomplywiththefollowingupdateequation
d(Cq,Cs)=aid(Ci,Cs)+aj(d(Cj,Cs)+bd(Ci,Cj)+c|d(Ci,Cs)-d(Cj,Cs)|Algorithmsthatfollowtheaboveequationare:Singlelink(SL)algorithm(ai=1/2,aj=1/2,b=0,c=-1/2).Inthiscase
d(Cq,Cs)=min{d(Ci,Cs),d(Cj,Cs)}Completelink(CL)algorithm(ai=1/2,aj=1/2,b=0,c=1/2).Inthiscase
d(Cq,Cs)=max{d(Ci,Cs),d(Cj,Cs)}Remarks:Singlelinkformsclustersatlowdissimilaritieswhilecompletelinkformsclustersathighdissimilarities.Singlelinktendstoformelongatedclusters(chainingeffect)whilecompletelinktendstoformcompactclusters.Therestalgorithmsarecompromisesbetweenthesetwoextremes.11Example:(a)ThedatasetX.(b)Thesinglelinkalgorithmdissimilaritydendrogram.(c)Thecompletelinkalgorithmdissimilaritydendrogram12WeightedPairGroupMethodAverage(WPGMA)(ai=1/2,aj=1/2,b=0,c=0).Inthiscase:
d(Cq,Cs)=(d(Ci,Cs)
+d(Cj,Cs))/2UnweightedPairGroupMethodAverage(UPGMA)(ai=ni/(ni+nj),aj=nj/(ni+nj),b=0,c=0,whereniisthecardinalityofCi).Inthiscase:
d(Cq,Cs)=(ni
d(Ci,Cs)
+
nj
d(Cj,Cs))/(ni+nj)UnweightedPairGroupMethodCentroid(UPGMC)(ai=ni/(ni+nj),aj=nj/(ni+nj),b=-ni
nj/(ni+nj)2,c=0).Inthiscase:
FortheUPGMC,itistruethatdqs=||mq-ms||2,wheremqisthemeanofCq
.13WeightedPairGroupMethodCentroid(WPGMC)(ai=1/2,aj=1/2,b=-1/4,c=0).Inthiscase
dqs=(dis
+djs)/2–dij/4 ForWPGMCtherearecaseswheredqs
max{dis,djs}
(crossover)Wardorminimumvariancealgorithm.Herethedistanced´ijbetweenCiandCjisdefinedas
d´ij=(ni
nj/(ni+nj))
||mi-mj||2
d´qscanalsobewrittenas
d´qs=((ni
+
nj
)d´is
+
(ni
+
nj)d´js
–
nsd´ij
)/(ni+nj+ns)Remark:Ward’salgorithmformst+1
bymergingthetwoclustersthatleadtothesmallestpossibleincreaseofthetotalvariance,i.e.,14Example3:Considerthefollowingdissimilaritymatrix(Euclideandistance)Allthealgorithmsproducetheabovesequenceofclusterings
atdifferentproximitylevels:SLCLWPGMAUPGMAWPGMCUPGMCWard0000000011111110.50.75352.251.54163725.7527.524.6926.4631.750={{x1},{x2},{x3},{x4},{x5}},1={{x1,x2},{x3},{x4},{x5}},2={{x1,x2},{x3},{x4,x5}},3={{x1,x2,x3},{x4,x5}},4={{x1,x2,x3,x4,x5}}15Monotonicityandcrossover: Forthefollowingdissimilaritymatrix
thedissimilaritydendrogramsproducedbysinglelink,completelinkandUPGMC(thesameresultisproducedifWPGMCisemployed)are:{x1,x2,x3,x4}isformedatlowerdissimilaritylevelthan{x1,x2}(crossover)16
Monotonicitycondition:
IfclustersCiandCjareselectedtobemergedinclusterCq,atthetth
levelofthehierarchy,thecondition
d(Cq,Ck)
d(Ci,Cj) mustholdforallCk,k
≠
i,
j
,
q.Inotherwords,themonotonicityconditionimpliesthataclusterisformedathigherdissimilaritylevelthananyofitscomponents.Remarks:Monotonicityisapropertythatisexclusivelyrelatedtotheclusteringalgorithmandnottothe(initial)proximitymatrix.Analgorithmthatdoesnotsatisfythemonotonicitycondition,doesnotnecessarilyproducedendrogramswithcrossovers.Singlelink,completelink,UPGMA,WPGMAandtheWard’salgorithmsatisfythemonotonicitycondition,whileUPGMCandWPGMCdonotsatisfyit.17Complexityissues:GASrequires,ingeneral,O(N3)operations.MoreefficientimplementationsrequireO(N2logN)computationaltime.Foraclassofwidelyusedalgorithms,implementationsthatrequireO(N2)computationaltimeandO(N2)orO(N)storagehavealsobeenproposed.ParallelimplementationsonSIMDmachineshavealsobeenconsidered.18AlgorithmsbasedongraphtheorySomebasicdefinitionsfromgraphtheory:Agraph,G,isdefinedasanorderedpairG=(V,E),whereV={vi,i=1,…,N}isasetofverticesandEisasetofedgesconnectingsomepairsofvertices.Anedgeconnectingviandvjisdenotedbyeijor(vi,vj).Agraphiscalledundirectedgraphifthereisnodirectionassignedtoanyofitsedges.Otherwise,wedealwithdirectedgraphs.Agraphiscalledunweightedgraphifthereisnocostassociatedwithanyofitsedges.Otherwise,wedealwithweightedgraphs.ApathinGbetweenverticesvi1andvin
isasequenceofverticesandedgesoftheformvi1
ei1i2vi2....vin-1ein-1invin.AloopinGisapathwherevi1andvin
coincide.Asubgraph
G´=(V´,E´)ofGisagraphwithV´VandE´E1,whereE1isasubsetofEcontainingverticesthatconnectverticesofV´.Everygraphisasubgraphtoitself.Aconnectedsubgraph
G´=(V´,E´)isasubgraphwherethereexistsatleastonepathconnectinganypairofverticesinV´.19Acompletesubgraph
G´=(V´,E)isasubgraphwhereforanypairofverticesinV’thereexistsanedgeinE´´connectingthem.AmaximallyconnectedsubgraphofGisaconnectedsubgraph
G´ofGthatcontainsasmanyverticesofGaspossible.AmaximallycompletesubgraphofGisacompletesubgraph
G´ofGthatcontainsasmanyverticesofGaspossible. Examplesfortheabove,areshowninthefollowingfigure.2021NOTE:Intheframeworkofclustering,eachvertexofagraphcorrespondstoafeaturevector.Usefultoolsforthealgorithmsbasedongraphtheoryarethethresholdgraphandtheproximitygraph.Athresholdgraph
G(a)
isanundirected,unweightedgraphwithNnodes,eachonecorrespondingtoavectorofX.Noself-loopsormultipleedgesbetweenanytwoverticesareencountered.ThesetofedgesofG(a)containsthoseedges(vi,vj)forwhichthedistanced(xi,xj)betweenthevectorscorrespondingtoviandvjislessthana.Aproximitygraph
Gp(a)isathresholdgraphG(a),allofwhoseedges(vi,vj)areweightedwiththeproximitymeasured(xi,xj).22(a)ThethresholdgraphG(3),(b)theproximity(dissimilarity)graphGp(3),(c)thethresholdgraphG(5),(d)thedissimilaritygraphGp(5),forthedissimilaritymatrixP(X)giveninexample1.23Moredefinitions:Inthisframework,weconsidergraphsG,ofNnodes,whereeachnodecorrespondstoavectorofX.ValidclustersareconnectedcomponentsofGthatsatisfyanadditionalgraphpropertyh(k).Typicalgraphpropertiesforaconnectedsubgraph
G´ofGare:Nodeconnectivity:ThelargestintegerksuchthatallpairsofnodesofG´arejoinedbyatleastkpathshavingnonodesincommon.Edgeconnectivity:Thelargestintegerksuchthatallpairsofnodesarejoinedbyatleastkpathshavingnoedgesincommon.Nodedegree:Thelargestintegerksuchthateachnodehasatleastkincidentedges.
2425Theproximityfunctiongh(k)(Cr,Cs)betweentwoclustersisdefinedintermsofaproximitymeasurebetweenvectors(nodes)certainconstraintsimposedbypropertyh(k)onthesubgraphsthatareformed.Inmathematicallanguage:
gh(k)(Cr,Cs)
=
minxuCr,xvCs
{d(xu,xv)a:theG(a)
subgraphdefinedbyCrCsis(a)connectedandeither(b1)hasthepropertyh(k)or(b2)iscomplete}Graphtheory-basedalgorithmicscheme(GTAS):ItistheGASinthecontextofgraphtheory.InthecontextofGTAS,apairofclusters(Ci,Cj)isselectedtobemergedaccordingto:26Singlelink(SL)algorithm.Here
gh(k)(Cr,Cs)=minxuCr,xvCs
{d(xu,xv)a:theG(a)
subgraphdefinedbyCrCsisconnected}
minxCr,yCs
d(x,y)
(why?)
Remarks:Nopropertyh(k)orcompletenessisrequired.TheSLstemmingfromthegraphtheoryisexactlythesamewiththeSLstemmingfromthematrixtheory.Completelink(CL)algorithm.Here
gh(k)(Cr,Cs)
=
minxuCr,xvCs
{d(xu,xv)a:theG(a)
subgraphdefinedbyCrCsiscomplete}
maxxCr,yCs
d(x,y)
(why?)Remarks:Nopropertyh(k)isrequired.TheCLstemmingfromgraphtheoryisexactlythesamewiththeCLstemmingfrommatrixtheory.27Example5:Forthedissimilaritymatrix,SLandCLproducethesamehierarchyofclusteringsatthelevelsgiveninthetable.SLCL0={{x1},{x2},{x3},{x4},{x5}}
1={{x1,x2},{x3},{x4},{x5}}
2={{x1,x2},{x3},{x4,x5}}
3={{x1,x2,x3},{x4,x5}}
4={{x1,x2,x3,x4,x5}}02.504.228
Remarks:SLposestheweakestpossiblegraphcondition(connectivity)fortheformationofacluster,whileCLposesthestrongestpossiblegraphcondition(completeness)fortheformationofacluster.Avarietyofgraphtheory-basedalgorithms,thatliebetweenthesetwoextremesresultforvariouschoicesofh(k).Fork=1allthesealgorithmscollapsetothesinglelinkalgorithm.Askincreases,theresultingsubgraphsapproachcompleteness.ClusteringalgorithmsbasedontheMinimumSpanningTree(MST)Definitions:SpanningTree:Itisaconnectedgraph(containingalltheverticesofthegraph),withnoloops(onlyonepathconnectsanytwovertices).WeightofaSpanningTree:Thesumoftheweightsofitsedges(providedthattheyhavebeenassignedwithaweight).MinimumSpanningTree(MST):Aspanningtreewiththesmallestweightamongthespanningtreesconnectingalltheverticesofthegraph.29Remarks:TheMSThasN-1edges.Whenalltheweightsaredifferentfromeachother,theMSTisunique.Otherwise,itmaynotbeunique.EmployingtheGTASandsubstitutinggh(k)(Cr,Cs)withg(Cr,Cs)=minij{wij:
xiCr,xjCs}wherewij=d(xi,xj),wecandeterminetheMST.Alternatively,ahierarchyofclusteringsmaybeobtainedbytheMSTasfollows:
TheclusteringtatthetthlevelisthesetofconnectedcomponentsoftheMST,whenonlyitstsmallestweightsareconsidered.Remark:ThehierarchyproducedbyMSTisthesamewiththatproducedbythesinglelinkalgorithm,atleastwhenallwij´saredifferentfromeachother.30TiesintheproximitymatrixSLproducesthesamehierarchyofclusterings,independentoftheorderofconsiderationofedgeswithequalweights.CLmayproducedifferenthierarchies,dependingontheorderofconsiderationofedgeswithequalweights.Theothergraphtheory-basedalgorithmsbehaveastheCL.Thesametrendappearsinthematrix-basedalgorithms.Inthiscase,tiesmayappearatalaterstageofthealgorithm.Example6:LetNotethatP(2,3)=P(3,4).(CL(a))(SL)(CL(b))31DIVISIVEALGORITHMSLetg(Ci,Cj)beadissimilarityfunctionbetweentwoclusters.LetCtjdenotethejthclusterofthetthclusteringt,t=0,…,N-1,j=1,…,t+1.32GeneralizedDivisiveScheme(GDS)InitializationChoose0={X}astheinitialclusteringt=0Repeat
t=t+1Fori=1totAmongallpossiblepairsofclusters(Cr,Cs)thatformapartitionofCt-1,i,findthepair(C1t-1,i,C2t-1,i)thatgivesthemaximumvalueforg.EndforFromthetpairsdefinedinthepreviousstep,choosetheonethatmaximizesg.Supposethatthisis(C1t-1,j,C2t-1,j).Thenewclusteringis:t=(t-1
–
{Ct-1,j}){C1t-1,j,C2t-1,j}Relabeltheclustersoft.Untileachvectorliesinasinglecluster.33Remarks:Differentchoicesofggiverisetodifferentalgorithms.TheGDSiscomputationallyverydemandingevenforsmallN.Algorithmsthatruleoutmanypartitionsasnot“reasonable”,underaprespecifiedcriterion,havealsobeenproposed.Algorithmswherethesplittingoftheclustersisbasedonallfeaturesofthefeaturevectorsarecalledpolytheticalgorithms.Otherwise,ifthesplittingisbasedonasinglefeatureateachstep,thealgorithmsarecalledmonotheticalgorithms.34HIERARCHICALALGORITHMSFORLARGEDATASETSSincethenumberofoperationsrequiredbyGASisgreaterthanO(N2), algorithmsresultingbyGASareprohibitiveforverylargedatasetsencountered,forexample,inwebminingandbioinformatics.Toovercomethisdrawback,varioushierarchicalalgorithmsofspecialtypehavebeendevelopedthataretailoredtohandlelargedatasets.
Typicalexamplesare:TheCUREalgorithm.TheROCKalgorithm.TheChameleonalgorithm.35TheCURE(ClusteringUsingRepresentatives)algorithmInCURE:EachclusterCisrepresentedbyasetofk>1representatives,denotedbyRC.Theserepresentativestryto“capture”theshapeofthecluster.Theyarechosenatthe“border”oftheclusterandthen,theyarepushedtowarditsmean,inordertodiscardtheirregularitiesoftheborder.Morespecifically,RCisdeterminedasfollows:SelectxC,withthemaximumdistancefromthemeanmCofCandsetRC={x}Fori=2
tomin{k,nC}
(nC
isthenumberofpointsinC)DetermineyC-RCthatliesfarthestfromthepointsofRCandset RC=
RC
{y}.ShrinkthepointsxRCtowardthemeanmCinCbyafactora.Thatisx=(1-a)
x+a
mC
,xRC
.CUREisaspecialcaseofGASwherethedistancebetweentwoclustersis
definedas:36WorstcasetimecomplexityforCURE:O(N2log2N).Thisisprohibitiveforverylargedatasets.Solution:Adoptionoftherandomsamplingtechnique. ThesizeN´ofasampledatasetX´,createdfromX,viarandom
sampling,shouldbesufficientlylargeinordertoensurethatthe
probabilityofmissingaclusterduetosamplingislow.CUREutilizingrandomsamplingIdentificationofclustersPartitionrandomlyXintop=N/N´sampledatasets.Foreachoneofthepsampledatasets.ApplytheoriginalversionofCURE,untilN´/qclusters(atthemost)areformed(qisuser-defined).Consideralltheabovep(N´/q)clusters(atthemost)andapplytheoriginalCUREuntiltherequirednumberofclusters,m,isformed.AssignmentofpointstoclustersForeachofthemclustersselectarandomsampleofrepresentativepoints.Assigneachpointxthatisnotclusterrepresentativetotheclusterthatcontainstherepresentativeclosesttoit.37Remarks:CUREissensitivetotheparametersk,N´,a.Specifically:kmustbelargeenoughtocapturethegeometryofeachcluster.N´mustbehigherthanacertainpercentageofN
(typicallyN´
2.5%N)ForsmallaCUREbehavesliketheMSTalgorithm,whileforlargeaitresemblesthealgorithmsthatuseasinglepointrepresentativeforeachcluster.
WorstcasetimecomplexityforCUREusingrandomsampling:O(N´2log2N´)Thealgorithmexhibitslowsensitivitywithrespecttooutlierswithintheclusters.Afewstagesbeforeitstermination,CUREchecksforclusterscontainingveryfewdatapointsandremovesthem(sincetheyarelikelytobeoutliers).IfN´/qissufficientlylarge,comparedtom,itisexpectedthatthepartitionofXwillnotaffectsignificantlythefinalclusteringobtainedbyCURE.CUREcan,inprinciple,revealclustersofnon-sphericalorelongatedshapes,aswellasclustersofwidevarianceinsize.CUREcanbeimplementedefficientlyusingtheheapandthek-dtreedatastructures.38TheROCK(RObustClusteringusinglinKs)algorithmItisbestsuitedfornominal(categorical)features.
SomepreliminariesTwopointsx,yXareconsideredneighborsifs(x,y)θ,wheres(.)isasimilarityfunctionandθauser-definedthresholdofsimilaritybetweentwovectors.
link(x,y)isthenumberofcommonneighborsbetweenxandy.
Assumption:Thereexistsafunctionf(θ)
(<1)suchthat: “EachpointassignedtoaclusterCihasapproximatelynif(θ)neighborsin
Ci(niisthenumberofpointsinCi)” Itcanbeprovedthattheexpectedtotalnumberoflinksamongallpairs
inCiisni1+2f(θ).39ROCKisaspecialcaseofGASwhereTheclosenessbetweentwoclustersisdefinedas Thedenominatoristheexpectedtotalnumberoflinksbetweenthetwoclusters. Thelargertheg(.),themoresimilartheclustersCiandCj
are
.Thestoppingcriterionis:thenumberofclustersbecomeequaltoapredefinednumberm
or
link(Ci,Cj)=0foreverypairinaclusteringt.TimecomplexityforROCK:SimilartoCUREforlargeN.Prohibitiveforverylargedatasets.Solution:Adoptionofrandomsamplingtechniques.40ROCKutilizingRandomSamplingIdentificationofclustersSelectasubsetX´ofXviarandomsamplingRuntheoriginalROCKalgorithmonX´AssignmentofpointstoclustersForeachclusterCiselectasetLiofnLipointsForeachzX-X´Computeti=Ni/(nLi+1)f(θ),whereNiisthenumberofneighborsofzinLi.Assignztotheclusterwiththemaximumti.Remarks:Achoiceforf(θ)isf(θ)=(1-θ)/(1+θ),with(θ<1).
f(θ)dependsonthedatasetandthetypeofclustersweareinterestedin.Thehypothesisabouttheexistenceoff(θ)isverystrong.Itmayleadtopoorresultsifthedatadonotsatisfyit.41TheChameleonalgorithmThisalgorithmisnotbasedona“static”modelingofclusterslikeCURE(whereeachclusterisrepresentedbythesamenumberofrepresentatives)andROCK(whereconstraintsareposedthroughthefunctionf(θ)).Itenjoysbothdivisiveandagglomerativefeatures.Somepreliminaries: LetG=(V,E)beagraphwhere:eachvertexofVcorrespondstoadatapointinX.EisasetofedgesconnectingpairsofverticesinV.Eachvertexisweightedbythesimilarityofthecorrespondingpoints.Edgecutset:LetCbeasetofpointscorrespondingtoasubsetofV. AssumethatCispartitionedintotwononemptysetsCiandCj ThesubsetE´ijofEthatconnectCiandCjiscallededgecutset.42Minimumcutset:Let|E´ij|bethesumofweightsoftheedgesinE´ij. If|E´ij|=min(Cu,Cv)|Euv|,then(Ci
,Cj)istheminimumcutsetofC.
Minimumcutbisector:IfCi,Cjareconstrainedtobeofapproximate
equalsize,theminimumcutset(overallpossiblepartitionsof
approximatelyequalsize)isknownastheminimumcutbisector.Example7:Thegraphinthefollowingfigureconsistsof5verticesandtheedgesshown,eachoneweightedbythesimilarityofthepointsthatcorrespondtotheverticesitconnects.Theminimumcutsetandtheminimumcutbisectorareshown.43MeasuringthesimilaritybetweenclustersRelativeinterconnectivity:LetEijbethesetofedgesconnectingpointsinCiwithpointsinCj.LetEibethesetofedgescorrespondingtotheminimumcutbisectorofCi.Let|Ei|,|Eij|bethesumoftheweightsoftheedgesofEi,Eij,respectively.AbsoluteinterconnectivitybetweenCi,Cj
=|Eij|InternalinterconnectivityofCi
=|Ei|RelativeinterconnectivitybetweenCi,Cj:Relativecloseness:LetSijbetheaverageweightoftheedgesinEij
.LetSibetheaverageweightoftheedgesinEi
.RelativeclosenessbetweenCiandCj:44TheChameleonalgorithm
Preliminaryphase Createak-nearestneighborgraphG=(V,E)suchthat:EachvertexofVcorrespondstoadatapoint.TheedgebetweentwoverticesviandvjisaddedtoEifviisoneofthek-nearestneighborsofvj
orviseversa.
Divisivephase Set0={X}
t=0 Repeatt=t+1SelectthelargestclusterCint-1.ReferringtoE,partitionCintotwosetssothat:thesumoftheweightsoftheedgecutsetbetweentheresultingclustersisminimized.eachclustercontainsatleast25%oftheverticesofC. Untileachcluster
温馨提示
- 1. 本站所有资源如无特殊说明,都需要本地电脑安装OFFICE2007和PDF阅读器。图纸软件为CAD,CAXA,PROE,UG,SolidWorks等.压缩文件请下载最新的WinRAR软件解压。
- 2. 本站的文档不包含任何第三方提供的附件图纸等,如果需要附件,请联系上传者。文件的所有权益归上传用户所有。
- 3. 本站RAR压缩包中若带图纸,网页内容里面会有图纸预览,若没有图纸预览就没有图纸。
- 4. 未经权益所有人同意不得将文件中的内容挪作商业或盈利用途。
- 5. 人人文库网仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对用户上传分享的文档内容本身不做任何修改或编辑,并不能对任何下载内容负责。
- 6. 下载文件中如有侵权或不适当内容,请与我们联系,我们立即纠正。
- 7. 本站不保证下载资源的准确性、安全性和完整性, 同时也不承担用户因使用这些下载资源对自己和他人造成任何形式的伤害或损失。
最新文档
- 医院护理质量提升工作总结
- 餐饮企业员工职业技能考试试题集
- 现代物流运输调度系统应用报告
- 服装销售实战话术及心理技巧
- 员工工资增长趋势及预测报告
- 施工企业安全生产责任制解析
- 八年级英语考试试卷解析与训练
- 行业发展趋势专项分析报告
- 中学英语口语训练方案与实践案例
- 班级文化建设创新实践案例
- 中国外币管理制度
- 广州市市政工程主要项目概算指标及编制指引 (2021年)
- 关于体育的论文
- 中医治疗发热
- 第三届“皇家杯”职业院校宠物营养学知识竞赛考试题库(含答案)
- QGDW12505-2025电化学储能电站安全风险评估规范
- 研究生教材SPSS统计软件应用
- 2025年部编版新教材三年级上册《9.犟龟》教案
- 2024年南宁市招聘中小学教师笔试真题
- 2024-2025学年下学期高二英语外研社版期中必刷常考题之被动语态
- 老员工带新员工的培训制度
评论
0/150
提交评论