Chapter 10  Unsupervised Learning

This chapter focuses instead on unsupervised learning, a set of statistical tools intended for the setting in which we have only a set of features X1, X2, . . . , Xp measured on n observations. We are not interested in prediction, because we do not have an associated response variable Y. Rather, the goal is to discover interesting things about the measurements on X1, X2, . . . , Xp. Two such tools are principal components analysis and clustering.

10.1 The Challenge of Unsupervised Learning

If we fit a predictive model using a supervised learning technique, then it is possible to check our work by seeing how well our model predicts the response Y on observations not used in fitting the model. However, in unsupervised learning, there is no way to check our work, because we do not know the true answer: the problem is unsupervised.

10.2 Principal Components Analysis

When faced with a large set of correlated variables, principal components allow us to summarize this set with a smaller number of representative variables that collectively explain most of the variability in the original set. PCA is an unsupervised approach, since it involves only a set of features X1, X2, . . . , Xp, and no associated response Y. Apart from producing derived variables for use in supervised learning problems, PCA also serves as a tool for data visualization (visualization of the observations or visualization of the variables).

10.2.1 What Are Principal Components?

Suppose that we wish to visualize n observations with measurements on a set of p features, X1, X2, . . . , Xp. We could do this by examining two-dimensional scatterplots of the data, each of which contains the n observations' measurements on two of the features. However, there are p(p − 1)/2 such scatterplots.

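As a quick illustration (a sketch, assuming the USArrests data that also appears in the lab code at the end of this deck; USArrests ships with base R), the four variables give p(p − 1)/2 = 6 pairwise scatterplots:

# All pairwise scatterplots of the USArrests features (p = 4).
p <- ncol(USArrests)      # number of features
choose(p, 2)              # p(p - 1)/2 = 6 distinct scatterplots
pairs(USArrests)          # draw all of them in one panel
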
Low-dimensional representation

Clearly, a better method is required to visualize the n observations when p is large. In particular, we would like to find a low-dimensional representation of the data that captures as much of the information as possible. For instance, if we can obtain a two-dimensional representation of the data that captures most of the information, then we can plot the observations in this low-dimensional space.

PCA provides a tool to do just this. It finds a low-dimensional representation of a data set that contains as much as possible of the variation. The idea is that each of the n observations lives in p-dimensional space, but not all of these dimensions are equally interesting. PCA seeks a small number of dimensions that are as interesting as possible, where the concept of "interesting" is measured by the amount that the observations vary along each dimension.

Geometric interpretation for PCA

There is a nice geometric interpretation of the first principal component. The loading vector φ1, with elements φ11, φ21, . . . , φp1, defines a direction in feature space along which the data vary the most. If we project the n data points x1, . . . , xn onto this direction, the projected values are the principal component scores z11, . . . , zn1 themselves.

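A minimal sketch of this projection, again using USArrests (an assumption; any numeric data set would do): the PC1 scores returned by prcomp() coincide with the centered and scaled data multiplied by the first loading vector.

# The first principal component scores are the projections onto the loading vector phi_1.
pr.out <- prcomp(USArrests, scale = TRUE)
phi1 <- pr.out$rotation[, 1]            # first loading vector
z1 <- scale(USArrests) %*% phi1         # project each (centered, scaled) observation
max(abs(z1 - pr.out$x[, 1]))            # essentially zero: projections equal the scores
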
Low-dimensional views of the data

Once we have computed the principal components, we can plot them against each other in order to produce low-dimensional views of the data. For instance, we can plot the score vector Z1 against Z2, Z1 against Z3, Z2 against Z3, and so forth. Geometrically, this amounts to projecting the original data down onto the subspace spanned by φ1, φ2, and φ3, and plotting the projected points.

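A minimal sketch of such a low-dimensional view, continuing the prcomp() fit above (an assumption):

# Plot the first score vector Z1 against the second, Z2.
plot(pr.out$x[, 1], pr.out$x[, 2],
     xlab = "First principal component", ylab = "Second principal component")
# A biplot shows the observations and the loading vectors in the same display.
biplot(pr.out, scale = 0)
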
10.2.2 Another Interpretation of Principal Components

Principal components provide low-dimensional linear surfaces that are closest to the observations. The first principal component loading vector has a very special property: it is the line in p-dimensional space that is closest to the n observations (using average squared Euclidean distance as a measure of closeness). The appeal of this interpretation is clear: we seek a single dimension of the data that lies as close as possible to all of the data points, since such a line will likely provide a good summary of the data.

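A small numerical check of this closest-line property (a sketch, reusing the scaled USArrests data and the prcomp() fit above, both assumptions): the average squared distance from the observations to their projections is smallest for the first loading vector.

# Average squared distance from the observations to their projections onto
# the line through the origin in direction u (data assumed centered and scaled).
X <- scale(USArrests)
avg.sq.dist <- function(u) {
  u <- u / sqrt(sum(u^2))              # make u a unit vector
  proj <- X %*% u %*% t(u)             # projection of each row of X onto the line
  mean(rowSums((X - proj)^2))          # average squared Euclidean distance
}
avg.sq.dist(pr.out$rotation[, 1])      # distance to the first principal component line
avg.sq.dist(rep(1, ncol(X)))           # larger: an arbitrary direction, for comparison
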
10.2.3 More on PCA

Scaling the Variables

The results obtained when we perform PCA will also depend on whether the variables have been individually scaled (each multiplied by a different constant).

Uniqueness of the Principal Components

Each principal component loading vector is unique, up to a sign flip. This means that two different software packages will yield the same principal component loading vectors, although the signs of those loading vectors may differ.

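A minimal sketch of why scaling matters, using USArrests (an assumption); in these data Assault has by far the largest variance, so unscaled PCA is dominated by that variable.

# Raw variances differ by orders of magnitude, so scaling changes the loadings.
apply(USArrests, 2, var)                        # Assault's variance dwarfs the rest
prcomp(USArrests, scale = TRUE)$rotation[, 1]   # loadings after standardizing
prcomp(USArrests, scale = FALSE)$rotation[, 1]  # loadings on the raw scale
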
The Proportion of Variance Explained

We can now ask a natural question: how much of the information in a given data set is lost by projecting the observations onto the first few principal components? That is, how much of the variance in the data is not contained in the first few principal components?

Deciding How Many Principal Components to Use

In fact, we would like to use the smallest number of principal components required to get a good understanding of the data. How many principal components are needed? Unfortunately, there is no single (or simple!) answer to this question.

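A minimal sketch of computing the proportion of variance explained (PVE), continuing the prcomp() fit above (an assumption); plotting the cumulative PVE is a common, if informal, aid when choosing how many components to keep.

# Proportion of variance explained by each principal component.
pr.var <- pr.out$sdev^2          # variance explained by each component
pve <- pr.var / sum(pr.var)      # proportion of the total variance
pve
plot(cumsum(pve), type = "b", ylim = c(0, 1),
     xlab = "Principal component",
     ylab = "Cumulative proportion of variance explained")
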
10.3 Clustering Methods

Clustering refers to a very broad set of techniques for finding subgroups, or clusters, in a data set. When we cluster the observations of a data set, we seek to partition them into distinct groups so that the observations within each group are quite similar to each other, while observations in different groups are quite different from each other.

K-means clustering
Hierarchical clustering

10.3.1 K-Means Clustering

Let C1, . . . , CK denote sets containing the indices of the observations in each cluster. These sets satisfy two properties:

1. C1 ∪ C2 ∪ · · · ∪ CK = {1, . . . , n}. In other words, each observation belongs to at least one of the K clusters.
2. Ck ∩ Ck′ = ∅ for all k ≠ k′. In other words, the clusters are non-overlapping: no observation belongs to more than one cluster.

A minimal kmeans() sketch follows the lab code below.

Code: Principal Components Analysis

library(ISLR)
states = row.names(USArrests)
states
names(USArrests)
apply(USArrests, 2, mean)
apply(USArrests, 2, var)
pr.out = prcomp(USArrests, scale = TRUE)
names(pr.out)

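Since the lab code above covers only PCA, here is a minimal K-means sketch; the simulated two-group data, the choice of K = 2, and nstart = 20 are illustrative assumptions, not part of the original slides.

# K-means clustering on simulated data with two well-separated groups.
set.seed(2)
x <- matrix(rnorm(50 * 2), ncol = 2)            # 50 observations on 2 features
x[1:25, 1] <- x[1:25, 1] + 3                    # shift the first 25 observations
x[1:25, 2] <- x[1:25, 2] - 4
km.out <- kmeans(x, centers = 2, nstart = 20)   # nstart: repeated random starts
km.out$cluster                                  # cluster label for each observation
plot(x, col = km.out$cluster + 1, pch = 20,
     main = "K-means clustering results with K = 2")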