Data Mining: Practical Machine Learning Tools and Techniques
Slides for Chapter 8 of Data Mining by I. H. Witten, E. Frank and M. A. Hall

Ensemble learning
- Combining multiple models: the basic idea
- Bagging: bias-variance decomposition, bagging with costs
- Randomization: rotation forests
- Boosting: AdaBoost, the power of boosting
- Additive regression: numeric prediction, additive logistic regression
- Interpretable ensembles: option trees, alternating decision trees, logistic model trees
- Stacking

Combining multiple models
- Basic idea: build different "experts" and let them vote
- Advantage: often improves predictive performance
- Disadvantage: usually produces output that is very hard to analyze
  - But: there are approaches that aim to produce a single comprehensible structure

Bagging
- Combines predictions by voting/averaging
  - The simplest way: each model receives equal weight
- "Idealized" version:
  - Sample several training sets of size n (instead of just having one training set of size n)
  - Build a classifier for each training set
  - Combine the classifiers' predictions
- Almost always improves performance if the learning scheme is unstable, i.e. a small change in the training data can make a big change in the model (e.g. decision trees)

Bias-variance decomposition
- Used to analyze how much the selection of any specific training set affects performance
- Assume infinitely many classifiers, built from different training sets of size n
- For any learning scheme:
  - Bias = expected error of the combined classifier on new data
  - Variance = expected error due to the particular training set used
- Total expected error ≈ bias + variance (written out below)
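
As a hedged illustration of the decomposition above, here is the standard squared-error version for numeric prediction; the notation is mine, not from the slides, and the slides' informal "bias" (expected error of the combined classifier on new data) additionally folds in the irreducible noise term that this identity omits.

```latex
% f(x): target value, \hat{f}_D(x): model learned from a random training set D of size n
\mathbb{E}_{D}\!\big[(\hat{f}_D(x) - f(x))^2\big]
  \;=\; \underbrace{\big(\mathbb{E}_{D}[\hat{f}_D(x)] - f(x)\big)^2}_{\text{bias}^2}
  \;+\; \underbrace{\mathbb{E}_{D}\!\big[\big(\hat{f}_D(x) - \mathbb{E}_{D}[\hat{f}_D(x)]\big)^2\big]}_{\text{variance}}
```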

More on bagging
- Bagging works because it reduces variance by voting/averaging
  - Note: in some pathological hypothetical situations the overall error might increase
  - Usually, the more classifiers the better
- Problem: we only have one dataset!
- Solution: generate new training sets of size n by sampling from it with replacement
- Can help a lot if the data is noisy
- Can also be applied to numeric prediction
  - Aside: the bias-variance decomposition was originally only known for numeric prediction

Bagging classifiers

Model generation:
  Let n be the number of instances in the training data.
  For each of t iterations:
    Sample n instances from the training set (with replacement).
    Apply the learning algorithm to the sample.
    Store the resulting model.

Classification:
  For each of the t models:
    Predict the class of the instance using the model.
  Return the class that is predicted most often.
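
A minimal Python sketch of this model-generation/classification procedure, assuming a scikit-learn-style base classifier with fit/predict and integer class labels; the helper names bagging_fit and bagging_predict are mine, not from the slides.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier  # an unstable base learner, as the slides suggest

def bagging_fit(X, y, t=25, base=DecisionTreeClassifier, rng=None):
    """Model generation: t bootstrap samples of size n, one model per sample."""
    rng = np.random.default_rng(rng)
    n = len(X)
    models = []
    for _ in range(t):
        idx = rng.integers(0, n, size=n)   # sample n instances with replacement
        models.append(base().fit(X[idx], y[idx]))  # apply the learner to the sample and store the model
    return models

def bagging_predict(models, X):
    """Classification: return the class predicted most often (majority vote).
    Assumes integer class labels 0..k-1."""
    votes = np.array([m.predict(X) for m in models])  # shape (t, n_instances)
    return np.array([np.bincount(col.astype(int)).argmax() for col in votes.T])
```

Typical use would be models = bagging_fit(X_train, y_train) followed by bagging_predict(models, X_test).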

Bagging with costs
- Bagging unpruned decision trees is known to produce good probability estimates
  - Here, instead of voting, the individual classifiers' probability estimates are averaged
  - Note: this can also improve the success rate
- Can be combined with the minimum-expected-cost approach for learning problems with costs (sketched below)
- Problem: the resulting ensemble is not interpretable
  - MetaCost re-labels the training data using bagging with costs and then builds a single tree
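
A hedged sketch of the averaging-plus-minimum-expected-cost idea: it assumes the bagged members expose predict_proba with aligned class columns, and that cost_matrix[i, j] is the cost of predicting class j when the true class is i. All names here are illustrative, not from the slides.

```python
import numpy as np

def bagging_min_expected_cost(models, X, cost_matrix):
    """Average the members' class probability estimates, then pick the class
    with minimum expected cost instead of taking a majority vote.
    Assumes all members were trained on the same label set, so predict_proba columns align."""
    probs = np.mean([m.predict_proba(X) for m in models], axis=0)  # shape (n_instances, n_classes)
    expected_cost = probs @ cost_matrix  # entry [i, j] = sum_c P(true=c | x_i) * cost_matrix[c, j]
    return expected_cost.argmin(axis=1)  # index of the minimum-expected-cost class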

Randomization
- Can randomize the learning algorithm instead of the input
- Some algorithms already have a random component: e.g. the initial weights in a neural net
- Most algorithms can be randomized, e.g. greedy algorithms:
  - Pick from the N best options at random instead of always picking the best option (see the sketch below)
  - E.g.: attribute selection in decision trees
- More generally applicable than bagging: e.g. random attribute subsets in a nearest-neighbor scheme
- Can be combined with bagging
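
A tiny sketch of the randomized greedy choice described above; the function and parameter names are mine, and score stands for any gain-style evaluation function (higher is better).

```python
import random

def pick_random_top_n(candidates, score, n=3, rng=random):
    """Randomized greedy choice: rank the candidate attributes/splits by score,
    then pick uniformly among the N best instead of always taking the single best."""
    ranked = sorted(candidates, key=score, reverse=True)  # best first
    return rng.choice(ranked[:n])
```

A decision tree learner would call something like this instead of max(candidates, key=score) at each node; repeating the process across an ensemble increases diversity.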

14、f accurate classifiers with relatively low diversityBootstrap sampling creates training sets with a distribution that resembles the original dataRandomness in the learning algorithm increases diversity but sacrifices accuracy of individual ensemble membersRotation forests have the goal of creating a

15、ccurate andand diverse ensemble membersData Mining: Practical Machine Learning Tools and Techniques (Chapter 8)Rotation forestsCombine random attribute sets, bagging and principal components to generate an ensemble of decision treesAn iteration involvesRandomly dividing the input attributes into k d

16、isjoint subsetsApplying PCA to each of the k subsets in turnLearning a decision tree from the k sets of PCA directionsFurther increases in diversity can be achieved by creating a bootstrap sample in each iteration before applying PCAData Mining: Practical Machine Learning Tools and Techniques (Chapt
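
A minimal sketch of one such iteration using scikit-learn's PCA and DecisionTreeClassifier. The helper names and the simplified handling (PCA fitted on the full sample, no per-iteration bootstrap or class-subsampling as in the published algorithm) are my own simplifications, not the slides' or WEKA's implementation.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.tree import DecisionTreeClassifier

def rotation_forest_member(X, y, k=3, rng=None):
    """One iteration: random disjoint attribute subsets -> PCA on each subset ->
    rotate the data into the PCA directions -> learn a decision tree on the rotated data."""
    rng = np.random.default_rng(rng)
    subsets = np.array_split(rng.permutation(X.shape[1]), k)  # k disjoint attribute subsets
    pcas = [PCA().fit(X[:, s]) for s in subsets]              # PCA on each subset in turn
    X_rot = np.hstack([p.transform(X[:, s]) for p, s in zip(pcas, subsets)])
    tree = DecisionTreeClassifier().fit(X_rot, y)
    return subsets, pcas, tree   # the subsets and PCAs are needed to rotate test data the same way

def rotation_forest_member_predict(member, X):
    subsets, pcas, tree = member
    X_rot = np.hstack([p.transform(X[:, s]) for p, s in zip(pcas, subsets)])
    return tree.predict(X_rot)
```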

Boosting
- Also uses voting/averaging, but weights models according to their performance
- Iterative: new models are influenced by the performance of previously built ones
  - New models are encouraged to become "experts" for instances misclassified by earlier models
  - Intuitive justification: models should be experts that complement each other
- Several variants exist

AdaBoost.M1

Model generation:
  Assign equal weight to each training instance.
  For t iterations:
    Apply the learning algorithm to the weighted dataset and store the resulting model.
    Compute the model's error e on the weighted dataset.
    If e = 0 or e >= 0.5: terminate model generation.
    For each instance in the dataset:
      If it is classified correctly by the model: multiply the instance's weight by e/(1-e).
    Normalize the weights of all instances.

Classification:
  Assign weight 0 to all classes.
  For each of the t (or fewer) models:
    Add -log(e/(1-e)) to the weight of the class this model predicts.
  Return the class with the highest weight.
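
A compact Python sketch of the AdaBoost.M1 pseudocode above, assuming a base learner that accepts per-instance weights via sample_weight (decision stumps are used as an illustrative weak learner); the helper names are mine, and the e = 0 edge case is handled by simply stopping.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_m1_fit(X, y, t=20):
    n = len(X)
    w = np.full(n, 1.0 / n)                    # equal weight to each training instance
    models, errors = [], []
    for _ in range(t):
        m = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
        wrong = m.predict(X) != y
        e = np.dot(w, wrong) / w.sum()         # model's error on the weighted dataset
        if e == 0 or e >= 0.5:                 # terminate model generation
            break
        models.append(m)
        errors.append(e)
        w[~wrong] *= e / (1 - e)               # shrink weights of correctly classified instances
        w /= w.sum()                           # normalize the weights of all instances
    return models, errors

def adaboost_m1_predict(models, errors, X, classes):
    votes = np.zeros((len(X), len(classes)))   # weight 0 for every class
    for m, e in zip(models, errors):
        pred = m.predict(X)
        for j, c in enumerate(classes):
            votes[pred == c, j] += -np.log(e / (1 - e))  # add -log(e/(1-e)) to the predicted class
    return np.asarray(classes)[votes.argmax(axis=1)]
```

Typical use: classes = np.unique(y_train); models, errors = adaboost_m1_fit(X_train, y_train); adaboost_m1_predict(models, errors, X_test, classes).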

More on boosting I
- Boosting needs weights... but:
  - Can adapt the learning algorithm to use weights, or
  - Can apply boosting without weights: resample with probability determined by the weights
    - Disadvantage: not all instances are used
    - Advantage: if the error exceeds 0.5, we can simply resample again
- Boosting stems from computational learning theory
- Theoretical result: the training error decreases exponentially
- Also works if the base classifiers are not too complex and their error doesn't become too large too quickly

More on boosting II
- Continue boosting after the training error reaches 0?
- Puzzling fact: the generalization error continues to decrease!
  - Seems to contradict Occam's Razor
- Explanation: consider the margin (confidence), not the error
  - The difference between the estimated probability for the true class and for the nearest other class (between -1 and 1), written out below
- Boosting works with weak learners: the only condition is that their error doesn't exceed 0.5
- In practice, boosting sometimes overfits (in contrast to bagging)
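
Writing the margin mentioned above as a formula (the notation is mine, not from the slides): for an instance x with true class y and ensemble class-probability estimates \hat{P}(c \mid x),

```latex
\operatorname{margin}(x, y) \;=\; \hat{P}(y \mid x) \;-\; \max_{c \neq y} \hat{P}(c \mid x) \;\in\; [-1, 1]
```

Large positive margins correspond to confident, correct predictions; boosting tends to keep increasing the margins on the training data even after the training error has reached zero, which is the usual explanation for the puzzling fact above.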

Additive regression I
- It turns out that boosting is a greedy algorithm for fitting additive models
  - More specifically, it implements forward stagewise additive modeling
- The same kind of algorithm works for numeric prediction (sketched below):
  1. Build a standard regression model (e.g. a tree)
  2. Gather the residuals, learn a model that predicts the residuals (e.g. another tree), and repeat
- To predict, simply sum up the individual predictions from all models

Additive regression II
- Minimizes the squared error of the ensemble if the base learner minimizes squared error
- Doesn't make sense to use it with standard multiple linear regression; why?
  - It can be used with simple linear regression to build a multiple linear regression model
- Use cross-validation to decide when to stop
- Another trick: shrink the predictions of the base models by multiplying them with a positive constant
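
A minimal sketch of the forward stagewise procedure from the two slides above, fitting regression trees to residuals with an optional shrinkage constant; it assumes scikit-learn's DecisionTreeRegressor, and the function names and default values are mine.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def additive_regression_fit(X, y, iterations=10, shrinkage=0.5):
    """Forward stagewise additive modeling: each new tree is fit to the residuals
    left over by the sum of the previous (shrunken) predictions."""
    models = []
    residual = np.asarray(y, dtype=float).copy()
    for _ in range(iterations):
        m = DecisionTreeRegressor(max_depth=3).fit(X, residual)
        models.append(m)
        residual -= shrinkage * m.predict(X)   # gather residuals for the next round
    return models

def additive_regression_predict(models, X, shrinkage=0.5):
    """To predict, simply sum up the (shrunken) individual predictions from all models."""
    return sum(shrinkage * m.predict(X) for m in models)
```

With shrinkage=1.0 this is exactly the build-model/fit-residuals loop of Additive regression I; values below 1 implement the shrinkage trick and typically require more iterations.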

Additive logistic regression (LogitBoost)
- … predict the first class if the estimated probability exceeds 0.5, otherwise predict the 2nd class

Option trees
- Ensembles are not interpretable; can we generate a single model?
  - One possibility: "cloning" the ensemble by using lots of artificial data that is labeled by the ensemble
  - Another possibility: generating a single structure that represents the ensemble in compact fashion
- Option tree: a decision tree with option nodes
  - Idea: follow all possible branches at an option node
  - Predictions from the different branches are merged using voting or by averaging probability estimates (see the sketch below)

Example
- Option trees can be learned by modifying a tree learner:
  - Create an option node if there are several equally promising splits (within a user-specified interval)
  - When pruning, the error at an option node is the average error of the options
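
A small sketch of how predictions could be merged at an option node by averaging the class-probability estimates of all options; the node representation is a made-up toy structure, not WEKA's.

```python
import numpy as np

def predict_proba_option_tree(node, x):
    """Follow all branches at option nodes and average their probability estimates;
    at ordinary split nodes, follow only the branch the instance falls into."""
    if node["kind"] == "leaf":
        return np.asarray(node["probs"])                      # class distribution at the leaf
    if node["kind"] == "option":
        return np.mean([predict_proba_option_tree(c, x) for c in node["options"]], axis=0)
    branch = "left" if x[node["attr"]] <= node["threshold"] else "right"   # ordinary split
    return predict_proba_option_tree(node[branch], x)

# Toy example: an option node over two stumps on attributes 0 and 1.
tree = {"kind": "option", "options": [
    {"kind": "split", "attr": 0, "threshold": 2.0,
     "left":  {"kind": "leaf", "probs": [0.9, 0.1]},
     "right": {"kind": "leaf", "probs": [0.2, 0.8]}},
    {"kind": "split", "attr": 1, "threshold": 0.5,
     "left":  {"kind": "leaf", "probs": [0.7, 0.3]},
     "right": {"kind": "leaf", "probs": [0.1, 0.9]}},
]}
print(predict_proba_option_tree(tree, x=[1.0, 1.0]))  # averages [0.9, 0.1] and [0.1, 0.9] -> [0.5, 0.5]
```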

Alternating decision trees
- Can also grow an option tree by incrementally adding nodes to it
- The resulting structure is called an alternating decision tree, with splitter nodes and prediction nodes
  - Prediction nodes are leaves if no splitter nodes have been added to them yet
- The standard alternating tree applies to 2-class problems
- To obtain a prediction, filter the instance down all applicable branches and sum the predictions (see the sketch below)
  - Predict one class or the other depending on whether the sum is positive or negative

Example
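
A toy sketch of the prediction rule just described: filter the instance down every applicable branch, sum the values of the prediction nodes encountered, and use the sign to choose between the two classes. The node structure is an illustrative stand-in, not WEKA's ADTree format.

```python
def adt_predict_score(node, x):
    """Sum the values of all prediction nodes reached on every applicable branch."""
    total = node["value"]                               # this prediction node's contribution
    for splitter in node.get("splitters", []):          # zero or more splitter nodes below it
        branch = "yes" if splitter["test"](x) else "no"
        total += adt_predict_score(splitter[branch], x)
    return total

def adt_predict_class(root, x, classes=("negative", "positive")):
    return classes[1] if adt_predict_score(root, x) > 0 else classes[0]

# Toy alternating tree: root prediction node with a single splitter on attribute 0.
root = {"value": -0.2, "splitters": [
    {"test": lambda x: x[0] > 3.0,
     "yes": {"value": 0.9, "splitters": []},
     "no":  {"value": -0.4, "splitters": []}},
]}
print(adt_predict_class(root, x=[5.0]))   # -0.2 + 0.9 = 0.7 > 0 -> "positive"
```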

Growing alternating trees
- The tree is grown using a boosting algorithm, e.g. the LogitBoost algorithm described earlier
  - Assume that the base learner produces a single conjunctive rule in each boosting iteration (note: a rule for regression)
  - Each rule could simply be added into the tree, including the numeric prediction obtained from the rule
- Problem: the tree would grow very large very quickly
- Solution: the base learner should only consider candidate rules that extend existing branches
  - An extension adds a splitter node and two prediction nodes (assuming binary splits)
  - The standard algorithm chooses the best extension among all possible extensions applicable to the tree
  - More efficient heuristics can be employed instead

Logistic model trees
- Option trees may still be difficult to interpret
- Can also use boosting to build decision trees with linear models at the leaves (i.e. trees without options)
- Algorithm for building logistic model trees: run LogitBoost with simple linear regression as base learner …
