版权说明:本文档由用户提供并上传,收益归属内容提供方,若内容存在侵权,请进行举报或认领
文档简介
1、CHAMELEON:A Hierarchical Clustering Algorithm Using Dynamic ModelingPaper presentation in data mining classPresenter : 許明壽 ; 蘇建仲Data : 2001/12/182001/12/181CHAMELEONAbout this paper Department of Computer Science and Engineering , University of MinnesotaGeorge KarypisEui-Honh (Sam) HanVipin KumarIEE
2、E Computer Journal - Aug. 19992001/12/182CHAMELEONOutlineProblems definitionMain algorithmKeys features of CHAMELEONExperiment and related workedConclusion and discussion2001/12/183CHAMELEONProblems definitionClusteringIntracluster similarity is maximizedIntercluster similarity is minimizedProblems
3、of existing clustering algorithmsStatic model constrainBreakdown when clusters that are of diverse shapes,densities,and sizesSusceptible to noise , outliers , and artifacts 2001/12/184CHAMELEONStatic model constrainData space constrainK means , PAM etcSuitable only for data in metric spacesCluster s
4、hape constrainK means , PAM , CLARANSAssume cluster as ellipsoidal or globular and are similar sizesCluster density constrainDBScanPoints within genuine cluster are density-reachable and point across different clusters are notSimilarity determine constrainCURE , ROCKUse static model to determine the
5、 most similar cluster to merge 2001/12/185CHAMELEONPartition techniques problem(a) Clusters of widely different sizes(b) Clusters with convex shapes2001/12/186CHAMELEONHierarchical technique problem (1/2)The (c) , (d) will be choose to merge when we only consider closeness2001/12/187CHAMELEONHierarc
6、hical technique problem (2/2)The (a) , (c) will be choose to merge when we only consider inter-connectivity2001/12/188CHAMELEONMain algorithmTwo phase algorithmPHASE IUse graph partitioning algorithm to cluster the data items into a large number of relatively small sub-clusters.PHASE IIUses an agglo
7、merative hierarchical clustering algorithm to find the genuine clusters by repeatedly combining together these sub-clusters.2001/12/189CHAMELEONFramework ConstructSparse GraphPartition the GraphMerge PartitionFinal ClustersData Set2001/12/1810CHAMELEONKeys features of CHAMELEONModeling the dataModel
8、ing the cluster similarityPartition algorithmsMerge schemes2001/12/1811CHAMELEONTermsArguments neededKK-nearest neighbor graphMINSIZEThe minima size of initial clusterTRIThreshold of related inter-connectivity TRC Threshold of related intra-connectivityCoefficient for weight of RI and RC2001/12/1812
9、CHAMELEONModeling the dataK-nearest neighbor graph approachAdvantagesData points that are far apart are completely disconnected in the GkGk capture the concept of neighborhood dynamicallyThe edge weights of dense regions in Gk tend to be large and the edge weights of sparse tend to be small2001/12/1
10、813CHAMELEONExample of k-nearest neighbor graph2001/12/1814CHAMELEONModeling the clustering similarity (1/2)Relative interconnectivityRelative closeness2001/12/1815CHAMELEONModeling the clustering similarity (2/2) If related is considered , (c) , (d) will be merged2001/12/1816CHAMELEONPartition algo
11、rithm (PHASE I)WhatFinding the initial sub-clustersWhyRI and RC cant be accurately calculated for clusters containing only a few data pointsHowUtilize multilevel graph partitioning algorithm (hMETIS)Coarsening phasePartitioning phaseUncoarsening phase2001/12/1817CHAMELEONPartition algorithm (cont.)I
12、nitialall points belonging to the same clusterRepeat until (size of all clusters MINSIZE)Select the largest cluster and use hMETIS to bisectPost scriptumBalance constrainSpilt Ci into CiA and CiB and each sub-clusters contains at least 25% of the node of Ci2001/12/1818CHAMELEON2001/12/1819CHAMELEONW
13、hatMerging sub-clusters using a dynamic frameworkHowFinding and merging the pair of sub-clusters that are the most similarScheme 1Scheme 2Merge schemes (Phase II)and2001/12/1820CHAMELEONExperiment and related workedIntroduction of CUREIntroduction of DBSCANResults of experimentPerformance analysis20
14、01/12/1821CHAMELEONIntroduction of CURE (1/n)Clustering Using Representative points1. Properties :Fit for non-spherical shapes.Shrinking can help to dampen the effects of outliers.Multiple representative points chosen for non-sphericalEach iteration , representative points shrunk ratio related to me
15、rge procedure by some scattered points chosen Random sampling in data sets is fit for large databases2001/12/1822CHAMELEONIntroduction of CURE (2/n)2. Drawbacks :Partitioning method can not prove data points chosen are good.Clustering accuracy with respect to the parameters below :(1) Shrink factor
16、s : CURE always find the right clusters by range of s values from 0.2 to 0.7.(2) Number of representative points c : CURE always found right clusters for value of c greater than 10.(3) Number of Partitions p : with as many as 50 partitions , CURE always discovered the desired clusters.(4) Random Sam
17、ple size r : (a) for sample size up to 2000 , clusters found poor quality(b) from 2500 sample points and above , about 2.5% of the data set size , CURE always correctly find the clusters.2001/12/1823CHAMELEON3. Clustering algorithm : Representative points 2001/12/1824CHAMELEONMerge procedure2001/12/
18、1825CHAMELEONIntroduction of DBSCAN (1/n)Density Based Spatial Clustering of Application With Noise1. Properties :Can discovery clusters of arbitrary shape.Each cluster with a typical density of points which is higher than outside of cluster.The density within the areas of noise is lower than the de
19、nsity in any of the clusters.Input the parameters MinPts onlyEasy to implement in C+ language using R*-treeRuntime is linear depending on the number of points.Time complexity is O(n * log n)2001/12/1826CHAMELEONIntroduction of DBSCAN (2/n)2. Drawbacks :Cannot apply to polygons.Cannot apply to high d
20、imensional feature spaces.Cannot process the shape of k-dist graph with multi-features.Cannot fit for large database because no method applied to reduce spatial database.3. DefinitionsEps-neighborhood of a point pNEps(p)=qD | dist(p,q)=MinPts (core point condition)We know directly density-reachable
21、is symmetric when p and q both are core point , otherwise is asymmetric if one core point and one border point.5. p is density-reachable from q if there is a chain of points between p and qDensity-reachable is transitive , but not symmetricDensity-reachable is symmetric for core points.2001/12/1828C
22、HAMELEONIntroduction of DBSCAN (4/n)6. A point p is density-connected to a point q if there is a point s such that both p and q are density-reachable from s.Density-connected is symmetric and reflexive relationA cluster is defined to be a set of density-connected points which is maximal density-reac
23、hability.Noise is the set of points not belong to any of clusters.7. How to find cluster C ?Maximality p , q : if p C and q is density-reachable from p , then q CConnectivity p , q C : p is density-connected to q8. How to find noises ? p , if p is not belong to any clusters , then p is noise point20
24、01/12/1829CHAMELEONResults of experiment2001/12/1830CHAMELEONPerformance analysis (1/2)The time of construct the k-nearest neighborLow-dimensional data sets based on k-d trees , overall complexity of O(n log n)High-dimensional data sets based on k-d trees not applicable , overall complexity of O(n2)Finding initial sub-clustersObtains m clusters by repeated partitioning successively smaller graphs , overall computa
温馨提示
- 1. 本站所有资源如无特殊说明,都需要本地电脑安装OFFICE2007和PDF阅读器。图纸软件为CAD,CAXA,PROE,UG,SolidWorks等.压缩文件请下载最新的WinRAR软件解压。
- 2. 本站的文档不包含任何第三方提供的附件图纸等,如果需要附件,请联系上传者。文件的所有权益归上传用户所有。
- 3. 本站RAR压缩包中若带图纸,网页内容里面会有图纸预览,若没有图纸预览就没有图纸。
- 4. 未经权益所有人同意不得将文件中的内容挪作商业或盈利用途。
- 5. 人人文库网仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对用户上传分享的文档内容本身不做任何修改或编辑,并不能对任何下载内容负责。
- 6. 下载文件中如有侵权或不适当内容,请与我们联系,我们立即纠正。
- 7. 本站不保证下载资源的准确性、安全性和完整性, 同时也不承担用户因使用这些下载资源对自己和他人造成任何形式的伤害或损失。
最新文档
- 深度解析(2026)《GBT 7234-2004产品几何量技术规范(GPS) 圆度测量 术语、定义及参数》
- 公共政策概论期末机考题打印
- 2026年上海财经大学浙江学院教师招聘考试参考试题及答案解析
- 市政管网工程临时用地管理方案
- 2026年辽宁大学教师招聘考试参考试题及答案解析
- 2026年赣南卫生健康职业学院辅导员招聘笔试备考试题及答案解析
- 2026年服装店定制服务合同
- 2026年江苏卫生健康职业学院教师招聘考试备考试题及答案解析
- 2026年宿州学院教师招聘考试参考试题及答案解析
- 2026年教师招聘考试教育综合知识试题及答案
- 2025济南幼儿师范高等专科学校教师招聘考试题目及答案
- 【历史】 明清时期社会经济的发展 课件 2025-2026学年统编版七年级历史下册
- 人美版六年级美术下册全册课件
- 人工智能与智慧教育课件 第3章 人工智能助力教学资源生成
- 24春国家开放大学《机电一体化系统综合实训》大作业参考答案
- 水文地质勘察课件
- 拖式混凝土输送泵的泵送部分设计(全套图纸)
- 粮食仓储企业安全风险辨识与管控分级指南
- 危化企业双重预防机制数字化建设运行成效评估
- 派昂医药协同应用价值
- GB/T 2521.1-2016全工艺冷轧电工钢第1部分:晶粒无取向钢带(片)
评论
0/150
提交评论