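Slide 1. Introduction to Machine Learning
Li Yun (李云)
School of Computer Science, Nanjing University of Posts and Telecommunications
EMAIL:

Slide 2. Why take this course
As we begin the new millennium:
- science and technology are changing rapidly
- "old" sciences such as physics are relatively well understood
- computers are ubiquitous
Grand challenges in science and technology:
- understanding the brain, i.e. reasoning, cognition, creativity
- creating useful intelligent machines
- arguably, AI poses the most interesting challenges and questions in computer science today

Slide 3. Why Study Machine Learning? Engineering: better computing systems
- Develop systems that are too difficult or expensive to construct manually because they require specific detailed skills or knowledge tuned to a specific task (the knowledge engineering bottleneck).
- Develop systems that can automatically adapt and customize themselves to individual users.
  - Personalized news or mail filter
  - Personalized tutoring
- Discover new knowledge from large databases (data mining).
  - Market basket analysis (e.g. diapers and beer)
  - Social network analysis (e.g. influence computation, community detection)

Slide 4. Why Study Machine Learning? The development of cognitive science
- Computational studies of learning may help us understand learning in humans and other biological organisms.
- The human brain is the most effective biological intelligent system known.
- Hebbian neural learning: "Neurons that fire together, wire together." Repeated patterns of mental activity build neural structure.
- Power law of practice: across a very wide range of learning problems, people's response speed improves as a power of the number of practice trials.
[Figure: axes labeled log(# training trials) and log(perf. time)]

Slide 5. Why Study Machine Learning? The time is ripe
- Many basic, effective and efficient algorithms are available.
- Large amounts of on-line data are available.
- Large amounts of computational resources are available.

Slide 6. Related Disciplines
- Artificial Intelligence
- Data Mining
- Probability and Statistics
- Information theory
- Numerical optimization
- Computational complexity theory
- Control theory (adaptive)
- Psychology (developmental, cognitive)
- Neurobiology
- Linguistics
- Philosophy

Slide 7. Goals of this course
- To get you excited about machine learning (ML)
- To give you an overview of basic methods in ML
- To entice you to study ML in more depth, write a thesis on ML, dedicate your life to ML

Slide 8. Advice for learning machine learning
- Linear algebra, probability and mathematical statistics are very useful.
- Look at every topic from different angles. Don't expect to get anything the first time. Read descriptions of the same thing from several different sources.
- Implement the models in code as you study them. There's nothing like trying something yourself. Pick a model and implement it.
- Read a lot of literature. Choose classic papers and read them closely. Pick a paper you like and "live inside it" for a week.
- Be patient and persistent.

Slide 9. Main course contents
- Introduction to machine learning and big data
- Machine learning fundamentals: decision trees, naive Bayes, neural networks, support vector machines, computational learning theory, clustering
- Ensemble learning
- Dimensionality reduction
- Sparse learning models
- Application case studies

Slide 10. Introduction to machine learning
- Machine learning is one of the core research areas of artificial intelligence.
  - Any system without the ability to learn can hardly be regarded as a truly intelligent system.
- Classic definition: using experience to improve the system's own performance.
- As the field has developed, its main work has become intelligent data analysis.
- Typical task: prediction.

Slide 11. What is Learning?

Slide 12. Learning from data

Slide 13. A simple example: spam filter
(1) Data collection: collect a large number of emails and label each one as "spam" or "not spam".
(2) Feature learning: each email can be described by a Boolean vector $x = (x_1, \dots, x_j, \dots, x_d)$, where $x_j = 1$ if the $j$-th word of the dictionary appears in the email and $x_j = 0$ otherwise.
(3) Training the classifier: a learner takes a training set of examples $(x_i, y_i)$ as input, where $x_i = (x_{i,1}, \dots, x_{i,d})$ is the observed feature input and $y_i$ is the corresponding label output. The learner's output is a classifier (the spam filter).
(4) Prediction: use the trained classifier to make predictions on emails it has not seen before.

Slide 14. A simple example: face recognition
[Figure: a function f maps a face image to the name 李云 (Li Yun); training data]

Slide 15. The importance of machine learning
[1] Mitchell, T. M. Machine Learning. McGraw-Hill, NY, 1997.
[2] Witten, I., Frank, E. and Hall, M. Data Mining: Practical Machine Learning Tools and Techniques, 3rd Edition. Morgan Kaufmann, San Mateo, CA, 2011.

Slide 16. Milestone events
- Scientists at NASA's JPL (Jet Propulsion Laboratory) wrote in Science (September 2001) that machine learning is playing an increasingly important supporting role across the whole process of scientific research, and that the field would see steady and rapid development over the following years.
- In 2006, CMU in the United States established a dedicated Machine Learning Department.
- The 2010 Turing Award went to L. Valiant of Harvard University, and the 2011 Turing Award to J. Pearl of UCLA.
- A 2011 report by the McKinsey Global Institute stated that machine learning (also called data mining or predictive analytics) will drive the next wave of innovation.

Slide 17. Hot applications

Slides 18-21. [Figures only]

Slide 22. History of machine learning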
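The four spam-filter steps on slide 13 (collect labeled emails, build Boolean word vectors, train a classifier, predict on unseen mail) can be sketched in plain Python. This is only an illustrative sketch: the slides do not name a learner, so a Bernoulli naive Bayes model with add-one smoothing is assumed here, and the function names (`featurize`, `train`, `predict`) and the four toy emails are hypothetical.

```python
import math

def featurize(text, vocab):
    """Step 2: Boolean vector x, with x[j] = 1 if the j-th dictionary word occurs."""
    words = set(text.lower().split())
    return [1 if w in words else 0 for w in vocab]

def train(X, y):
    """Step 3: fit a Bernoulli naive Bayes model (an assumed choice of learner).

    Returns (log_prior, theta), where theta[c][j] estimates P(x_j = 1 | class c)
    with add-one smoothing so that no probability is exactly 0 or 1.
    """
    d = len(X[0])
    log_prior, theta = {}, {}
    for c in set(y):
        rows = [x for x, label in zip(X, y) if label == c]
        log_prior[c] = math.log(len(rows) / len(X))
        theta[c] = [(sum(r[j] for r in rows) + 1) / (len(rows) + 2) for j in range(d)]
    return log_prior, theta

def predict(model, x):
    """Step 4: return the class with the highest log posterior score."""
    log_prior, theta = model

    def score(c):
        return log_prior[c] + sum(
            math.log(p if xj else 1 - p) for xj, p in zip(x, theta[c])
        )

    return max(log_prior, key=score)

# Step 1: a tiny hand-labeled data set (1 = spam, 0 = not spam), made up for illustration.
emails = [
    ("win free money now", 1),
    ("free prize claim now", 1),
    ("meeting agenda for monday", 0),
    ("lunch on monday", 0),
]
vocab = sorted({w for text, _ in emails for w in text.split()})
X = [featurize(text, vocab) for text, _ in emails]
y = [label for _, label in emails]
model = train(X, y)

print(predict(model, featurize("claim your free money", vocab)))    # 1 (spam)
print(predict(model, featurize("agenda for lunch meeting", vocab)))  # 0 (not spam)
```

Words outside the dictionary (such as "your" above) are simply ignored by `featurize`, which matches the fixed-length Boolean representation described on the slide.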
Slide 23. History of machine learning
- In 1949, Hebb proposed a learning rule based on a neuropsychological formulation of learning. It is called Hebbian learning theory: if two neurons often produce action potentials at the same time, that is, fire together, the connection between them becomes stronger; otherwise it becomes weaker.
- In 1952, Arthur Samuel at IBM developed a program that played checkers.

Slide 24. History of machine learning
- In 1957, Rosenblatt proposed the Perceptron. It was a very exciting discovery at the time and was more practically applicable than Hebb's idea.
- Three years later, Widrow introduced the delta learning rule, which was then used as a practical procedure for perceptron training. It is also known as the least-squares problem. Combining the two ideas gives a good linear classifier.
- However, the excitement around the Perceptron was checked by Minsky in 1969. He posed the famous XOR problem and showed that perceptrons cannot handle such linearly inseparable data distributions. This was Minsky's blow to the NN community. After it, NN research lay dormant until the 1980s.

Slide 25. History of machine learning
- There was not much further effort until Werbos suggested the idea of the Multi-Layer Perceptron (MLP) in 1981, together with the NN-specific backpropagation (BP) algorithm. With those new ideas, NN research accelerated again. In 1985-1986, NN researchers successively presented the idea of the MLP with practical BP training.
- At the other end of the spectrum, J. R. Quinlan proposed a very well-known ML algorithm in 1986 that we call decision trees, more specifically the ID3 algorithm.

Slide 26. History of machine learning
- One of the most important ML breakthroughs was Support Vector Machines (Networks) (SVM), proposed by Vapnik and Cortes in 1995 with very strong theoretical standing and empirical results.
- NN took another blow from Hochreiter's thesis in 1991 and Hochreiter et al. in 2001, which showed that the gradient is lost once NN units saturate under BP learning. Put simply, training NN units beyond a certain number of epochs is redundant because the units are saturated, and hence NNs are very inclined to over-fit within a small number of epochs.

Slide 27. History of machine learning
- Shortly before that, in 1997, Freund and Schapire proposed another solid ML model: AdaBoost, a boosted ensemble of weak classifiers.
- Another ensemble model, Random Forests (RF), was explored by Breiman in 2001. It combines multiple decision trees, each built from a random subset of the instances, with each node chosen from a random subset of the features.

Slide 28. History of machine learning
- Closer to the present, a new era of NN called Deep Learning has begun. This third rise of NN started roughly in 2005, bringing together many discoveries from past and present by leading figures such as Hinton, LeCun, Bengio, Andrew Ng and other distinguished senior researchers. Combining all those ideas and others not listed here, NN models have been able to beat the state of the art on very different tasks such as object recognition, speech recognition and NLP.
- It should be noted, however, that this by no means spells the end of other ML streams. Even as Deep Learning success stories multiply, many criticisms are directed at the training cost and the tuning of exogenous parameters of these models. Moreover, SVM is still used more commonly owing to its simplicity.

Slide 29. History of machine learning
- After the growth of the WWW and social media, a new term, Big Data, emerged and strongly affected ML research. Because of the large-scale problems arising from Big Data, many strong ML algorithms are impractical for systems of ordinary size (though not, of course, for giant tech companies). Hence researchers came up with a new set of simple models dubbed Bandit Algorithms (formally predicat
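
The perceptron and the XOR limitation mentioned in the history above can be illustrated with a short Python sketch. It assumes a two-input perceptron with a step activation trained by the classic perceptron update rule; the function name `train_perceptron`, the learning rate and the epoch limit are hypothetical choices made for this example, not taken from the slides.

```python
def train_perceptron(samples, epochs=100, lr=1.0):
    """Train a two-input perceptron with the perceptron learning rule.

    samples: list of ((x1, x2), target) pairs with targets in {0, 1}.
    Returns (weights, bias, converged); converged is True once a full pass
    over the data produces no misclassification.
    """
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        errors = 0
        for x, target in samples:
            output = 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0
            delta = target - output
            if delta != 0:
                # Move the decision boundary toward the misclassified point.
                w = [wi + lr * delta * xi for wi, xi in zip(w, x)]
                b += lr * delta
                errors += 1
        if errors == 0:
            return w, b, True
    return w, b, False

AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

print(train_perceptron(AND)[2])  # True: AND is linearly separable
print(train_perceptron(XOR)[2])  # False: no single line separates XOR
```

On AND the rule settles on a separating line after a few passes, while on XOR every pass contains at least one mistake no matter how large `epochs` is, because no linear boundary classifies all four points correctly.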