An Introduction to WEKA
Contributed by Yizhou Sun, 2008
23.06.2020

Content
- What is WEKA?
- The Explorer:
  - Preprocess data
  - Classification
  - Clustering
  - Association Rules
  - Attribute Selection
  - Data Visualization
- References and Resources

What is WEKA?
- Waikato Environment for Knowledge Analysis
- It's a data mining/machine learning tool developed by the Department of Computer Science, University of Waikato, New Zealand.
- Weka is also a bird found only on the islands of New Zealand.

Download and Install WEKA
- Website: http://www.cs.waikato.ac.nz/ml/weka/index.html
- Supports multiple platforms (written in Java): Windows, Mac OS X and Linux

Main Features
- 49 data preprocessing tools
- 76 classification/regression algorithms
- 8 clustering algorithms
- 3 algorithms for finding association rules
- 15 attribute/subset evaluators + 10 search algorithms for feature selection

Main GUI
- Three graphical user interfaces
  - “The Explorer” (exploratory data analysis)
  - “The Experimenter” (experimental environment)
  - “The KnowledgeFlow” (new process model inspired interface)

Explorer: pre-processing the data
- Data can be imported from a file in various formats: ARFF, CSV, C4.5, binary
- Data can also be read from a URL or from an SQL database (using JDBC)
- Pre-processing tools in WEKA are called “filters”
- WEKA contains filters for: discretization, normalization, resampling, attribute selection, transforming and combining attributes, ...

Flat file in ARFF format
- WEKA only deals with “flat” files
- In the file below, age and cholesterol are numeric attributes; the remaining attributes are nominal

    @relation heart-disease-simplified

    @attribute age numeric
    @attribute sex { female, male }
    @attribute chest_pain_type { typ_angina, asympt, non_anginal, atyp_angina }
    @attribute cholesterol numeric
    @attribute exercise_induced_angina { no, yes }
    @attribute class { present, not_present }

    @data
    63,male,typ_angina,233,no,not_present
    67,male,asympt,286,yes,present
    67,male,asympt,229,yes,present
    38,female,non_anginal,?,no,not_present
    ...
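
To make the ARFF layout concrete, here is a minimal Python sketch of a reader for the small subset of the format used in the heart-disease example (`@relation`, `@attribute` with a numeric or nominal type, and comma-separated `@data` rows, with `?` marking a missing value). The function name `parse_arff` and the `HEART` sample string are illustrative assumptions, not part of WEKA, and WEKA's own loader handles many more cases (quoting, dates, strings, sparse data).

```python
def parse_arff(text):
    """Parse a small ARFF document into (relation, attributes, rows).

    Only a subset of ARFF is handled: @relation, @attribute with a
    numeric or nominal type, and plain comma-separated @data rows.
    """
    relation, attributes, rows = None, [], []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):  # skip blank lines and comments
            continue
        if in_data:
            values = [v.strip() for v in line.split(",")]
            row = {}
            for (name, kind), value in zip(attributes, values):
                if value == "?":              # "?" marks a missing value
                    row[name] = None
                elif kind == "numeric":
                    row[name] = float(value)
                else:                         # nominal: keep the label as text
                    row[name] = value
            rows.append(row)
            continue
        keyword, _, rest = line.partition(" ")
        keyword = keyword.lower()
        if keyword == "@relation":
            relation = rest.strip()
        elif keyword == "@attribute":
            name, _, spec = rest.strip().partition(" ")
            spec = spec.strip()
            if spec.startswith("{"):          # nominal: tuple of allowed labels
                labels = tuple(s.strip() for s in spec.strip("{}").split(","))
                attributes.append((name, labels))
            else:                             # e.g. "numeric"
                attributes.append((name, spec.lower()))
        elif keyword == "@data":
            in_data = True
    return relation, attributes, rows


HEART = """\
@relation heart-disease-simplified
@attribute age numeric
@attribute sex { female, male }
@attribute chest_pain_type { typ_angina, asympt, non_anginal, atyp_angina }
@attribute cholesterol numeric
@attribute exercise_induced_angina { no, yes }
@attribute class { present, not_present }
@data
63,male,typ_angina,233,no,not_present
67,male,asympt,286,yes,present
67,male,asympt,229,yes,present
38,female,non_anginal,?,no,not_present
"""

relation, attributes, rows = parse_arff(HEART)
print(relation, len(attributes), len(rows))
print(rows[3]["cholesterol"])  # the "?" entry is read as None
```

Each row comes back as a dictionary keyed by attribute name, which keeps the "flat file" idea visible: one record per line, one value per declared attribute.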

Explorer: building “classifiers”
- Classifiers in WEKA are models for predicting nominal or numeric quantities
- Implemented learning schemes include: decision trees and lists, instance-based classifiers, support vector machines, multi-layer perceptrons, logistic regression, Bayes nets, ...

Decision Tree Induction: Training Dataset
- This follows an example of Quinlan's ID3 (Playing Tennis)

Output: A Decision Tree for “buys_computer”

Algorithm for Decision Tree Induction
- Basic algorithm (a greedy algorithm)
  - Tree is constructed in a top-down recursive divide-and-conquer manner
  - At start, all the training examples are at the root
  - Attributes are categorical (if continuous-valued, they are discretized in advance)
  - Examples are partitioned recursively based on selected attributes
  - Test attributes are selected on the basis of a heuristic or statistical measure (e.g., information gain)
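
The attribute-selection step of the greedy algorithm can be sketched in a few lines of Python. The example below computes information gain for each categorical attribute and picks the best one for the root split. The data is the widely reproduced 14-row play-tennis weather table; the training table from the original slide did not survive, so treating it as the same data is an assumption. The helper names `entropy` and `information_gain` are illustrative and are not WEKA functions.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Entropy of the target minus its weighted entropy after splitting."""
    base = entropy([r[target] for r in rows])
    partitions = {}
    for r in rows:
        partitions.setdefault(r[attribute], []).append(r[target])
    remainder = sum(len(p) / len(rows) * entropy(p) for p in partitions.values())
    return base - remainder


# Assumed data: the standard play-tennis table (not recovered from the slide).
COLUMNS = ("outlook", "temperature", "humidity", "windy", "play")
RAW = [
    ("sunny", "hot", "high", "false", "no"),
    ("sunny", "hot", "high", "true", "no"),
    ("overcast", "hot", "high", "false", "yes"),
    ("rainy", "mild", "high", "false", "yes"),
    ("rainy", "cool", "normal", "false", "yes"),
    ("rainy", "cool", "normal", "true", "no"),
    ("overcast", "cool", "normal", "true", "yes"),
    ("sunny", "mild", "high", "false", "no"),
    ("sunny", "cool", "normal", "false", "yes"),
    ("rainy", "mild", "normal", "false", "yes"),
    ("sunny", "mild", "normal", "true", "yes"),
    ("overcast", "mild", "high", "true", "yes"),
    ("overcast", "hot", "normal", "false", "yes"),
    ("rainy", "mild", "high", "true", "no"),
]
rows = [dict(zip(COLUMNS, r)) for r in RAW]

gains = {a: information_gain(rows, a, "play") for a in COLUMNS[:-1]}
best = max(gains, key=gains.get)
for name, gain in sorted(gains.items(), key=lambda kv: -kv[1]):
    print(f"{name:12s} {gain:.3f}")
print("root split:", best)
```

A full ID3 implementation would apply the same selection recursively to each partition until the labels in a partition are pure or no attributes remain; only the root choice is shown here.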

Explorer: clustering data
- WEKA contains “clusterers” for finding groups of similar instances in a dataset
- Implemented schemes are: k-Means, EM, Cobweb, X-means, FarthestFirst
- Clusters can be visualized and compared to “true” clusters (if given)
- Evaluation based on log-likelihood if clustering scheme produces a probability distribution

Explorer: finding associations
- WEKA contains an implementation of the Apriori algorithm for learning association rules
  - Works only with discrete data
- Can identify statistical dependencies between groups of attributes:
  - milk, butter ⇒ bread, eggs (with confidence 0.9 and support 2000)
- Apriori can compute all rules that have a given minimum support and exceed a given confidence
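
The two rule measures can be illustrated with a short Python sketch. Support is taken here as an absolute count of transactions containing every item of the rule (matching the "support 2000" style above), and confidence as that count divided by the number of transactions containing the left-hand side. The transaction list and the function names are hypothetical, and this covers only how a single rule is scored, not Apriori's candidate generation.

```python
def support_count(transactions, items):
    """Number of transactions that contain every item in `items`."""
    items = set(items)
    return sum(1 for t in transactions if items <= t)


def confidence(transactions, antecedent, consequent):
    """Fraction of transactions with the antecedent that also hold the consequent."""
    both = support_count(transactions, set(antecedent) | set(consequent))
    base = support_count(transactions, antecedent)
    return both / base if base else 0.0


# Hypothetical shopping baskets, invented for illustration only.
transactions = [
    {"milk", "butter", "bread", "eggs"},
    {"milk", "butter", "bread", "eggs"},
    {"milk", "butter", "bread"},
    {"milk", "bread", "eggs"},
    {"butter", "eggs"},
]

lhs, rhs = {"milk", "butter"}, {"bread", "eggs"}
rule_support = support_count(transactions, lhs | rhs)
rule_confidence = confidence(transactions, lhs, rhs)
print(f"support={rule_support} confidence={rule_confidence:.2f}")

# A rule is kept only if it clears both user-chosen thresholds.
min_support, min_confidence = 2, 0.6
print("rule accepted:", rule_support >= min_support and rule_confidence >= min_confidence)
```

In this toy data three baskets contain both milk and butter, and two of those also contain bread and eggs, so the rule has support 2 and confidence 2/3.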

Explorer: attribute selection
- Panel that can be used to investigate which (subsets of) attributes are the most predictive ones
- Attribute selection methods contain two parts:
  - A search method: best-first, forward selection, random, exhaustive, genetic algorithm, ranking
  - An evaluation method: correlation-based, wrapper, information gain, chi-squared, ...
- Very flexible: WEKA allows (almost) arbitrary combinations of these two

Explorer: data visualization
- Visualization very useful in practice: e.g. helps to determine difficulty of the learning problem
- WEKA can visualize single attributes (1-d) and pairs of attributes (2-d)
  - To do: rotating 3-d visualizations (Xgobi-style)
- Color-coded class values
- “Jitter” option to deal with nominal attributes (and to detect “hidden” data points)
- “Zoom-in” function
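
Why jitter helps with nominal attributes can be shown with a small Python sketch: instances that share the same pair of nominal values are drawn at exactly the same position, so all but one of them are hidden, and a little random noise separates them. The integer encoding of nominal values, the `jitter` function, and the noise range are assumptions for illustration; this is not how WEKA's plotting code is written.

```python
import random


def jitter(points, amount=0.2, seed=0):
    """Return a copy of `points` with uniform noise in [-amount, amount] added."""
    rng = random.Random(seed)  # fixed seed so the example is repeatable
    return [
        (x + rng.uniform(-amount, amount), y + rng.uniform(-amount, amount))
        for x, y in points
    ]


# Four instances over two nominal attributes, encoded as integer codes.
# Three of them share the same value pair and would overlap in a scatter plot.
points = [(0, 1), (0, 1), (0, 1), (1, 0)]

jittered = jitter(points)
print("distinct positions before jitter:", len(set(points)))
print("distinct positions after jitter: ", len(set(jittered)))
```

The noise is kept well below the spacing between category codes, so each point stays visually attached to its own category while overlapping instances become countable.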