




Data Mining: Practical Machine Learning Tools and Techniques
Slides for Chapter 8 of Data Mining by I. H. Witten, E. Frank and M. A. Hall

Ensemble learning
- Combining multiple models: the basic idea
- Bagging: bias-variance decomposition, bagging with costs
- Randomization: rotation forests
- Boosting: AdaBoost, the power of boosting
- Additive regression: numeric prediction, additive logistic regression
- Interpretable ensembles: option trees, alternating decision trees, logistic model trees
- Stacking
Combining multiple models
- Basic idea: build different "experts" and let them vote
- Advantage: often improves predictive performance
- Disadvantage: usually produces output that is very hard to analyze
  - But: there are approaches that aim to produce a single comprehensible structure

Bagging
- Combines predictions by voting/averaging
- Simplest way: each model receives equal weight
- "Idealized" version:
  - Sample several training sets of size n (instead of just having one training set of size n)
  - Build a classifier for each training set
  - Combine the classifiers' predictions
- When the learning scheme is unstable, bagging almost always improves performance
  - Unstable: a small change in the training data can make a big change in the model (e.g. decision trees)
Bias-variance decomposition
- Used to analyze how much the selection of any specific training set affects performance
- Assume infinitely many classifiers, built from different training sets of size n
- For any learning scheme:
  - Bias = expected error of the combined classifier on new data
  - Variance = expected error due to the particular training set used
- Total expected error ≈ bias + variance
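For numeric prediction this decomposition can be written out exactly. As a reference formulation (not taken from the slide itself; the slide's informal statement folds the irreducible noise term into the bias), the standard squared-error form is:

```latex
\mathbb{E}_{D,\varepsilon}\!\left[(y - \hat{f}_D(x))^2\right]
  = \underbrace{\sigma^2}_{\text{noise}}
  + \underbrace{\left(f(x) - \mathbb{E}_D[\hat{f}_D(x)]\right)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}_D\!\left[\big(\hat{f}_D(x) - \mathbb{E}_D[\hat{f}_D(x)]\big)^2\right]}_{\text{variance}}
```

where D is a random training set of size n, f the true function, f̂_D the model learned from D, and σ² the noise variance.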
More on bagging
- Bagging works because it reduces variance by voting/averaging
  - Note: in some pathological hypothetical situations the overall error might increase
  - Usually, the more classifiers the better
- Problem: we only have one dataset!
- Solution: generate new training sets of size n by sampling from it with replacement
- Can help a lot if the data is noisy
- Can also be applied to numeric prediction
  - Aside: the bias-variance decomposition was originally only known for numeric prediction
Bagging classifiers

Model generation:
- Let n be the number of instances in the training data
- For each of t iterations:
  - Sample n instances from the training set (with replacement)
  - Apply the learning algorithm to the sample
  - Store the resulting model

Classification:
- For each of the t models:
  - Predict the class of the instance using the model
- Return the class that is predicted most often
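The pseudocode above translates almost line for line into Python. Below is a minimal sketch, assuming scikit-learn decision trees as the unstable base learner and integer class labels 0..k-1; the function names are illustrative, not part of Weka:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagging_fit(X, y, t=25, seed=0):
    """Model generation: t trees, each trained on a bootstrap sample of size n."""
    rng = np.random.default_rng(seed)
    n = len(X)
    models = []
    for _ in range(t):
        idx = rng.integers(0, n, size=n)            # sample n instances with replacement
        models.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
    return models

def bagging_predict(models, X):
    """Classification: each model votes; return the most frequent class per instance."""
    votes = np.array([m.predict(X) for m in models])        # shape (t, n_instances)
    return np.array([np.bincount(col).argmax() for col in votes.T])
```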
Bagging with costs
- Bagging unpruned decision trees is known to produce good probability estimates
  - Here, instead of voting, the individual classifiers' probability estimates are averaged
  - Note: this can also improve the success rate
- Can be used with the minimum-expected-cost approach for learning problems with costs
- Problem: the result is not interpretable
  - MetaCost re-labels the training data using bagging with costs and then builds a single tree
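A sketch of the minimum-expected-cost step on top of the bagged trees from the previous sketch, assuming a cost matrix `cost[i, j]` giving the cost of predicting class j when the true class is i (the matrix and function name are illustrative):

```python
import numpy as np

def bagging_predict_min_cost(models, X, cost):
    """Average the trees' class probability estimates, then pick the class with
    minimum expected cost instead of the majority vote.
    Assumes every class occurs in every bootstrap sample, so all trees share
    the same column order in predict_proba."""
    probs = np.mean([m.predict_proba(X) for m in models], axis=0)  # (n_instances, n_classes)
    expected_cost = probs @ cost        # [i, j] = sum_c P(true=c) * cost[c, j]
    return expected_cost.argmin(axis=1)
```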
Randomization
- Can randomize the learning algorithm instead of the input
- Some algorithms already have a random component: e.g. initial weights in a neural net
- Most algorithms can be randomized, e.g. greedy algorithms:
  - Pick from the N best options at random instead of always picking the best one
  - E.g.: attribute selection in decision trees
- More generally applicable than bagging: e.g. random subsets in a nearest-neighbor scheme
- Can be combined with bagging
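One way to realize "pick among good options at random" without touching the input data is per-split attribute subsampling, which scikit-learn's trees support directly. A minimal sketch (the ensemble size t and function name are illustrative):

```python
from sklearn.tree import DecisionTreeClassifier

def randomized_ensemble(X, y, t=25, seed=0):
    """Each tree sees the full training set, but considers only a random subset
    of attributes at every split, so the trees differ from one another."""
    trees = []
    for i in range(t):
        tree = DecisionTreeClassifier(max_features="sqrt",   # random attribute subset per split
                                      random_state=seed + i)
        trees.append(tree.fit(X, y))
    return trees
```

Predictions can be combined with the same voting routine as in the bagging sketch; adding bootstrap sampling on top of this randomization essentially gives a random forest.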
Rotation forests
- Bagging creates ensembles of accurate classifiers with relatively low diversity
  - Bootstrap sampling creates training sets with a distribution that resembles the original data
- Randomness in the learning algorithm increases diversity but sacrifices accuracy of individual ensemble members
- Rotation forests aim to create ensemble members that are both accurate and diverse

Rotation forests
- Combine random attribute sets, bagging and principal components to generate an ensemble of decision trees
- An iteration involves:
  - Randomly dividing the input attributes into k disjoint subsets
  - Applying PCA to each of the k subsets in turn
  - Learning a decision tree from the k sets of PCA directions
- Further increases in diversity can be achieved by creating a bootstrap sample in each iteration before applying PCA
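A minimal sketch of one rotation-forest iteration as described above, assuming numeric attributes throughout; it omits the bootstrap step and other refinements of the published algorithm:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.tree import DecisionTreeClassifier

def rotation_forest_member(X, y, k=3, seed=0):
    """One ensemble member: split the attributes into k disjoint subsets, run PCA
    on each subset, and learn a decision tree in the rotated (projected) space."""
    rng = np.random.default_rng(seed)
    n_attrs = X.shape[1]
    subsets = np.array_split(rng.permutation(n_attrs), k)     # random disjoint attribute subsets
    pcas = [PCA().fit(X[:, s]) for s in subsets]               # PCA per subset
    X_rot = np.hstack([p.transform(X[:, s]) for p, s in zip(pcas, subsets)])
    tree = DecisionTreeClassifier().fit(X_rot, y)
    return subsets, pcas, tree    # keep the rotation so test instances can be projected the same way
```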
Boosting
- Also uses voting/averaging
- But: weights models according to their performance
- Iterative: new models are influenced by the performance of previously built ones
  - Encourage the new model to become an "expert" for instances misclassified by earlier models
  - Intuitive justification: models should be experts that complement each other
- Several variants exist
AdaBoost.M1

Model generation:
- Assign equal weight to each training instance
- For t iterations:
  - Apply the learning algorithm to the weighted dataset and store the resulting model
  - Compute the model's error e on the weighted dataset
  - If e = 0 or e >= 0.5: terminate model generation
  - For each instance in the dataset:
    - If classified correctly by the model: multiply the instance's weight by e/(1-e)
  - Normalize the weights of all instances

Classification:
- Assign weight = 0 to all classes
- For each of the t (or fewer) models:
  - For the class this model predicts, add -log(e/(1-e)) to this class's weight
- Return the class with the highest weight
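The model-generation and classification pseudocode above, written out as a minimal Python sketch with decision stumps as the weak learner (function names are illustrative; edge cases such as a perfect first model are ignored):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_m1_fit(X, y, t=10):
    n = len(X)
    w = np.full(n, 1.0 / n)                      # equal weight to each training instance
    models, betas = [], []
    for _ in range(t):
        stump = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
        wrong = stump.predict(X) != y
        e = w[wrong].sum() / w.sum()             # weighted error on the training data
        if e == 0 or e >= 0.5:                   # terminate model generation
            break
        beta = e / (1 - e)
        models.append(stump)
        betas.append(beta)
        w[~wrong] *= beta                        # downweight correctly classified instances
        w /= w.sum()                             # normalize the weights
    return models, betas

def adaboost_m1_predict(models, betas, X):
    classes = models[0].classes_
    scores = np.zeros((len(X), len(classes)))
    for m, beta in zip(models, betas):
        pred = m.predict(X)
        for ci, c in enumerate(classes):
            scores[pred == c, ci] += -np.log(beta)   # model weight = -log(e/(1-e)) = log((1-e)/e)
    return classes[scores.argmax(axis=1)]            # class with the highest weight
```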
More on boosting I
- Boosting needs weights ... but:
  - Can adapt the learning algorithm to handle weighted instances, or
  - Can apply boosting without weights:
    - Resample with probability determined by the weights
    - Disadvantage: not all instances are used
    - Advantage: if the error exceeds 0.5, can simply resample again
- Boosting stems from computational learning theory
- Theoretical result: the training error decreases exponentially
- Also: works if the base classifiers are not too complex and their error doesn't become too large too quickly

More on boosting II
- Continue boosting after the training error reaches 0?
- Puzzling fact: the generalization error continues to decrease!
  - Seems to contradict Occam's Razor
- Explanation: consider the margin (confidence), not the error
  - Margin: difference between the estimated probability for the true class and that of the nearest other class (between -1 and 1)
- Boosting works with weak learners; the only condition is that their error doesn't exceed 0.5
- In practice, boosting sometimes overfits (in contrast to bagging)
Additive regression I
- It turns out that boosting is a greedy algorithm for fitting additive models
- More specifically, it implements forward stagewise additive modeling
- The same kind of algorithm works for numeric prediction:
  - Build a standard regression model (e.g. a tree)
  - Gather the residuals, learn a model that predicts the residuals (e.g. a tree), and repeat
- To predict, simply sum up the individual predictions from all models

Additive regression II
- Minimizes the squared error of the ensemble if the base learner minimizes squared error
- It doesn't make sense to use it with standard multiple linear regression. Why not?
  - The residuals of a least-squares fit are orthogonal to all the attributes, so a second linear model fitted to them contributes nothing
- It can, however, be used with simple linear regression to build a multiple linear regression model
  - Use cross-validation to decide when to stop
- Another trick: shrink the predictions of the base models by multiplying them with a positive constant smaller than one
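A minimal sketch of the residual-fitting loop with regression trees as base models; the `shrink` parameter implements the shrinkage trick just mentioned (names are illustrative):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def additive_regression_fit(X, y, t=50, shrink=0.5, max_depth=3):
    """Fit a regression model, then repeatedly fit further models to the residuals."""
    models = []
    residual = y.astype(float)
    for _ in range(t):
        tree = DecisionTreeRegressor(max_depth=max_depth).fit(X, residual)
        residual = residual - shrink * tree.predict(X)   # next model learns what is still unexplained
        models.append(tree)
    return models

def additive_regression_predict(models, X, shrink=0.5):
    """To predict, simply sum up the (shrunken) individual predictions."""
    return sum(shrink * m.predict(X) for m in models)
```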
Additive logistic regression
- The same forward stagewise approach can be applied to classification by modeling the log-odds of the class probability as an additive function of regression models
- This yields the LogitBoost algorithm, which uses a regression scheme as its base learner
- Prediction: predict the 1st class if p(1 | a) > 0.5, otherwise predict the 2nd class
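A sketch of the prediction rule only, assuming a list of fitted regression models whose (shrunken) sum is treated as the log-odds of the first class; the LogitBoost fitting procedure itself is not shown:

```python
import numpy as np

def additive_logistic_predict(models, X, shrink=0.5):
    """Sum the regression models to get F(x), map F through the logistic function
    to get p(1|x), and predict the 1st class if p > 0.5, otherwise the 2nd class.
    (Treating F directly as the log-odds is one common convention.)"""
    F = sum(shrink * m.predict(X) for m in models)
    p1 = 1.0 / (1.0 + np.exp(-F))
    return np.where(p1 > 0.5, 1, 2)    # 1 = first class, 2 = second class
```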
Option trees
- Ensembles are not interpretable
- Can we generate a single model instead?
  - One possibility: "cloning" the ensemble by using lots of artificial data that is labeled by the ensemble
  - Another possibility: generating a single structure that represents the ensemble in compact fashion
- Option tree: a decision tree with option nodes
  - Idea: follow all possible branches at an option node
  - Predictions from the different branches are merged using voting or by averaging probability estimates

Example
- Option trees can be learned by modifying a tree learner:
  - Create an option node if there are several equally promising splits (within a user-specified interval)
  - When pruning, the error at an option node is the average error of its options
Alternating decision trees
- Can also grow an option tree by incrementally adding nodes to it
- The resulting structure is called an alternating decision tree, with splitter nodes and prediction nodes
  - Prediction nodes are leaves if no splitter nodes have been added to them yet
- The standard alternating tree applies to two-class problems
- To obtain a prediction, filter the instance down all applicable branches and sum the predictions
  - Predict one class or the other depending on whether the sum is positive or negative
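A sketch of the prediction procedure on a hand-built toy alternating tree: prediction nodes carry a numeric value and may have splitter children, and each splitter routes the instance to exactly one of its two prediction branches (the dictionary structure is purely illustrative):

```python
def adtree_predict(x, node):
    """Sum the prediction values along all applicable branches; the sign of the
    total determines the class (positive = first class, negative = second)."""
    total = node["value"]                              # every visited prediction node contributes
    for attr, threshold, left, right in node.get("splits", []):
        branch = left if x[attr] < threshold else right
        total += adtree_predict(x, branch)
    return total

# Toy tree: a root prediction node with a single splitter on attribute 0
toy_tree = {
    "value": -0.2,
    "splits": [(0, 2.5,
                {"value": +0.8},      # branch for attribute 0 < 2.5
                {"value": -0.5})],    # branch for attribute 0 >= 2.5
}
label = "class 1" if adtree_predict({0: 1.7}, toy_tree) > 0 else "class 2"
```

For the toy instance, the contributions -0.2 and +0.8 sum to +0.6, so the first class is predicted.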
Example
- (figure: an example alternating decision tree)

Growing alternating trees
- The tree is grown using a boosting algorithm, e.g. the LogitBoost method described earlier
- Assume that the base learner produces a single conjunctive rule in each boosting iteration (note: a rule for regression)
- Each rule could simply be added into the tree, including the numeric prediction obtained from the rule
  - Problem: the tree would grow very large very quickly
  - Solution: the base learner should only consider candidate rules that extend existing branches
- An extension adds one splitter node and two prediction nodes (assuming binary splits)
- The standard algorithm chooses the best extension among all possible extensions applicable to the tree
  - More efficient heuristics can be employed instead

Logistic model trees
- Option trees may still be difficult to interpret
- Can also use boosting to build decision trees with linear models at the leaves (i.e. trees without options)
- Algorithm for building logistic model trees: run LogitBoost with simple linear regression as the base learner (...)