
Understanding Machine Learning: From Theory to Algorithms

© 2014 by Shai Shalev-Shwartz and Shai Ben-David. Published 2014 by Cambridge University Press.

This copy is for personal use only. Not for distribution. Do not post. Please link to: http://www.cs.huji.ac.il/~shais/UnderstandingMachineLearning

Please note: This copy is almost, but not entirely, identical to the printed version of the book. In particular, page numbers are not identical (but section numbers are the same).

Understanding Machine Learning

Machine learning is one of the fastest growing areas of computer science, with far-reaching applications. The aim of this textbook is to introduce machine learning, and the algorithmic paradigms it offers, in a principled way. The book provides an extensive theoretical account of the fundamental ideas underlying machine learning and the mathematical derivations that transform these principles into practical algorithms. Following a presentation of the basics of the field, the book covers a wide array of central topics that have not been addressed by previous textbooks. These include a discussion of the computational complexity of learning and the concepts of convexity and stability; important algorithmic paradigms including stochastic gradient descent, neural networks, and structured output learning; and emerging theoretical concepts such as the PAC-Bayes approach and compression-based bounds. Designed for an advanced undergraduate or beginning graduate course, the text makes the fundamentals and algorithms of machine learning accessible to students and nonexpert readers in statistics, computer science, mathematics, and engineering.

Shai Shalev-Shwartz is an Associate Professor at the School of Computer Science and Engineering at The Hebrew University, Israel.

Shai Ben-David is a Professor in the School of Computer Science at the University of Waterloo, Canada.

UNDERSTANDING MACHINE LEARNING
From Theory to Algorithms

Shai Shalev-Shwartz, The Hebrew University, Jerusalem
Shai Ben-David, University of Waterloo, Canada

32 Avenue of the Americas, New York, NY 10013-2473, USA

Cambridge University Press is part of the University of Cambridge. It furthers the University's mission by disseminating knowledge in the pursuit of education, learning and research at the highest international levels of excellence.

Information on this title: /9781107057135

© Shai Shalev-Shwartz and Shai Ben-David 2014

This publication is in copyright. Subject to statutory exception and to the provisions of relevant collective licensing agreements, no reproduction of any part may take place without the written permission of Cambridge University Press.

First published 2014
Printed in the United States of America

A catalog record for this publication is available from the British Library.
Library of Congress Cataloging in Publication Data
ISBN 978-1-107-05713-5 Hardback

Cambridge University Press has no responsibility for the persistence or accuracy of URLs for external or third-party Internet Web sites referred to in this publication, and does not guarantee that any content on such Web sites is, or will remain, accurate or appropriate.

Triple-S dedicates the book to triple-M

Preface

The term machine learning refers to the automated detection of meaningful patterns

in data. In the past couple of decades it has become a common tool in almost any task that requires information extraction from large data sets. We are surrounded by machine learning based technology: search engines learn how to bring us the best results (while placing profitable ads), anti-spam software learns to filter our email messages, and credit card transactions are secured by software that learns how to detect frauds. Digital cameras learn to detect faces, and intelligent personal assistant applications on smart-phones learn to recognize voice commands. Cars are equipped with accident prevention systems that are built using machine learning algorithms. Machine learning is also widely used in scientific applications such as bioinformatics, medicine, and astronomy. One common feature of all of these applications is that, in contrast to more traditional uses of computers, in these cases, due to the complexity of the patterns that need to be detected, a human programmer cannot provide an explicit, fine-detailed specification of how such tasks should be executed. Taking example from intelligent beings, many of our skills are acquired or refined through learning from our experience (rather than following explicit instructions given to us). Machine learning tools are concerned with endowing programs with the ability to "learn" and adapt.

The first goal of this book is to provide a rigorous, yet easy to follow, introduction to the main concepts underlying machine learning: What is learning? How can a machine learn? How do we quantify the resources needed to learn a given concept? Is learning always possible? Can we know if the learning process succeeded or failed?

The second goal of this book is to present several key machine learning algorithms. We chose to present algorithms that on one hand are successfully used in practice and on the other hand give a wide spectrum of different learning techniques. Additionally, we pay specific attention to algorithms appropriate for large scale learning (a.k.a. "Big Data"), since in recent years our world has become increasingly "digitized" and the amount of data available for learning is dramatically increasing. As a result, in many applications data is plentiful and computation time is the main bottleneck. We therefore explicitly quantify both the amount of data and the amount of computation time needed to learn a given concept.

The book is divided into four parts. The first part aims at giving an initial rigorous answer to the fundamental questions of learning. We describe a generalization of Valiant's Probably Approximately Correct (PAC) learning model, which is a first solid answer to the question "what is learning?". We describe the Empirical Risk Minimization (ERM), Structural Risk Minimization (SRM), and Minimum Description Length (MDL) learning rules, which show "how can a machine learn". We quantify the amount of data needed for learning using the ERM, SRM, and MDL rules and show how learning might fail by deriving a "no-free-lunch" theorem. We also discuss how much computation time is required for learning. In the second part of the book we describe various learning algorithms. For some of the algorithms, we first present a more general learning principle, and then show how the algorithm follows the principle. While the first two parts of the book focus on the PAC model, the third part extends the scope by presenting a wider variety of learning models. Finally, the last part of the book is devoted to advanced theory.

We made an attempt to keep the book as self-contained as possible. However, the reader is assumed to be comfortable with basic notions of probability, linear algebra, analysis, and algorithms. The first three parts of the book are intended for first year graduate students in computer science, engineering, mathematics, or statistics. It can also be accessible to undergraduate students with an adequate background. The more advanced chapters can be used by researchers intending to gain a deeper theoretical understanding.

Acknowledgements

The book is based on Introduction to Machine Learning courses taught by Shai Shalev-Shwartz at the Hebrew University and by Shai Ben-David at the University of Waterloo. The first draft of the book grew out of the lecture notes for the course that was taught at the Hebrew University by Shai Shalev-Shwartz during 2010–2013. We greatly appreciate the help of Ohad Shamir, who served as a TA for the course in 2010, and of Alon Gonen, who served as a TA for the course in 2011–2013. Ohad and Alon prepared a few lecture notes and many of the exercises. Alon, to whom we are indebted for his help throughout the entire making of the book, has also prepared a solution manual.

We are deeply grateful for the most valuable work of Dana Rubinstein. Dana has scientifically proofread and edited the manuscript, transforming it from lecture-based chapters into fluent and coherent text.

Special thanks to Amit Daniely, who helped us with a careful read of the advanced part of the book and also wrote the advanced chapter on multiclass learnability. We are also grateful for the members of a book reading club in Jerusalem who have carefully read and constructively criticized every line of the manuscript. The members of the reading club are: Maya Alroy, Yossi Arjevani, Aharon Birnbaum, Alon Cohen, Alon Gonen, Roi Livni, Ofer Meshi, Dan Rosenbaum, Dana Rubinstein, Shahar Somin, Alon Vinnikov, and Yoav Wald. We would also like to thank Gal Elidan, Amir Globerson, Nika Haghtalab, Shie Mannor, Amnon Shashua, Nati Srebro, and Ruth Urner for helpful discussions.

Shai Shalev-Shwartz, Jerusalem, Israel
Shai Ben-David, Waterloo, Canada

Contents

Preface

1 Introduction
  1.1 What Is Learning?
  1.2 When Do We Need Machine Learning?
  1.3 Types of Learning
  1.4 Relations to Other Fields
  1.5 How to Read This Book
    1.5.1 Possible Course Plans Based on This Book
  1.6 Notation

Part I: Foundations

2 A Gentle Start

  2.1 A Formal Model – The Statistical Learning Framework
  2.2 Empirical Risk Minimization
    2.2.1 Something May Go Wrong – Overfitting
  2.3 Empirical Risk Minimization with Inductive Bias
    2.3.1 Finite Hypothesis Classes
  2.4 Exercises

3 A Formal Learning Model
  3.1 PAC Learning
  3.2 A More General Learning Model
    3.2.1 Releasing the Realizability Assumption – Agnostic PAC Learning
    3.2.2 The Scope of Learning Problems Modeled
  3.3 Summary
  3.4 Bibliographic Remarks
  3.5 Exercises

4 Learning via Uniform Convergence
  4.1 Uniform Convergence Is Sufficient for Learnability
  4.2 Finite Classes Are Agnostic PAC Learnable
  4.3 Summary
  4.4 Bibliographic Remarks
  4.5 Exercises

5 The Bias-Complexity Tradeoff
  5.1 The No-Free-Lunch Theorem
    5.1.1 No-Free-Lunch and Prior Knowledge
  5.2 Error Decomposition
  5.3 Summary
  5.4 Bibliographic Remarks
  5.5 Exercises

6 The VC-Dimension
  6.1 Infinite-Size Classes Can Be Learnable
  6.2 The VC-Dimension
  6.3 Examples
    6.3.1 Threshold Functions
    6.3.2 Intervals
    6.3.3 Axis Aligned Rectangles
    6.3.4 Finite Classes
    6.3.5 VC-Dimension and the Number of Parameters
  6.4 The Fundamental Theorem of PAC learning
  6.5 Proof of Theorem 6.7
    6.5.1 Sauer's Lemma and the Growth Function
    6.5.2 Uniform Convergence for Classes of Small Effective Size
  6.6 Summary
  6.7 Bibliographic remarks
  6.8 Exercises

7 Nonuniform Learnability
  7.1 Nonuniform Learnability
    7.1.1 Characterizing Nonuniform Learnability
  7.2 Structural Risk Minimization
  7.3 Minimum Description Length and Occam's Razor
    7.3.1 Occam's Razor
  7.4 Other Notions of Learnability – Consistency
  7.5 Discussing the Different Notions of Learnability
    7.5.1 The No-Free-Lunch Theorem Revisited
  7.6 Summary
  7.7 Bibliographic Remarks
  7.8 Exercises

8 The Runtime of Learning
  8.1 Computational Complexity of Learning
    8.1.1 Formal Definition*
  8.2 Implementing the ERM Rule
    8.2.1 Finite Classes
    8.2.2 Axis Aligned Rectangles
    8.2.3 Boolean Conjunctions
    8.2.4 Learning 3-Term DNF
  8.3 Efficiently Learnable, but Not by a Proper ERM
  8.4 Hardness of Learning*
  8.5 Summary
  8.6 Bibliographic Remarks
  8.7 Exercises

Part II: From Theory to Algorithms

9 Linear Predictors
  9.1 Halfspaces
    9.1.1 Linear Programming for the Class of Halfspaces
    9.1.2 Perceptron for Halfspaces
    9.1.3 The VC Dimension of Halfspaces
  9.2 Linear Regression
    9.2.1 Least Squares
    9.2.2 Linear Regression for Polynomial Regression Tasks
  9.3 Logistic Regression
  9.4 Summary
  9.5 Bibliographic Remarks
  9.6 Exercises

10 Boosting
  10.1 Weak Learnability
    10.1.1 Efficient Implementation of ERM for Decision Stumps
  10.2 AdaBoost
  10.3 Linear Combinations of Base Hypotheses
    10.3.1 The VC-Dimension of L(B, T)
  10.4 AdaBoost for Face Recognition
  10.5 Summary
  10.6 Bibliographic Remarks
  10.7 Exercises

11 Model Selection and Validation
  11.1 Model Selection Using SRM
  11.2 Validation
    11.2.1 Hold Out Set
    11.2.2 Validation for Model Selection
    11.2.3 The Model-Selection Curve
    11.2.4 k-Fold Cross Validation
    11.2.5 Train-Validation-Test Split
  11.3 What to Do If Learning Fails
  11.4 Summary
  11.5 Exercises

12 Convex Learning Problems
  12.1 Convexity, Lipschitzness, and Smoothness
    12.1.1 Convexity
    12.1.2 Lipschitzness
    12.1.3 Smoothness
  12.2 Convex Learning Problems
    12.2.1 Learnability of Convex Learning Problems
    12.2.2 Convex-Lipschitz/Smooth-Bounded Learning Problems
  12.3 Surrogate Loss Functions
  12.4 Summary
  12.5 Bibliographic Remarks
  12.6 Exercises

13 Regularization and Stability
  13.1 Regularized Loss Minimization
    13.1.1 Ridge Regression
  13.2 Stable Rules Do Not Overfit
  13.3 Tikhonov Regularization as a Stabilizer
    13.3.1 Lipschitz Loss
    13.3.2 Smooth and Nonnegative Loss
  13.4 Controlling the Fitting-Stability Tradeoff
  13.5 Summary
  13.6 Bibliographic Remarks
  13.7 Exercises

14 Stochastic Gradient Descent
  14.1 Gradient Descent
    14.1.1 Analysis of GD for Convex-Lipschitz Functions
  14.2 Subgradients
    14.2.1 Calculating Subgradients
    14.2.2 Subgradients of Lipschitz Functions
    14.2.3 Subgradient Descent
  14.3 Stochastic Gradient Descent (SGD)
    14.3.1 Analysis of SGD for Convex-Lipschitz-Bounded Functions
  14.4 Variants
    14.4.1 Adding a Projection Step
    14.4.2 Variable Step Size
    14.4.3 Other Averaging Techniques
    14.4.4 Strongly Convex Functions*
  14.5 Learning with SGD
    14.5.1 SGD for Risk Minimization
    14.5.2 Analyzing SGD for Convex-Smooth Learning Problems
    14.5.3 SGD for Regularized Loss Minimization
  14.6 Summary
  14.7 Bibliographic Remarks
  14.8 Exercises

15 Support Vector Machines
  15.1 Margin and Hard-SVM
    15.1.1 The Homogenous Case
    15.1.2 The Sample Complexity of Hard-SVM
  15.2 Soft-SVM and Norm Regularization
    15.2.1 The Sample Complexity of Soft-SVM
    15.2.2 Margin and Norm-Based Bounds versus Dimension
    15.2.3 The Ramp Loss*
  15.3 Optimality Conditions and "Support Vectors"*
  15.4 Duality*
  15.5 Implementing Soft-SVM Using SGD
  15.6 Summary
  15.7 Bibliographic Remarks
  15.8 Exercises

16 Kernel Methods
  16.1 Embeddings into Feature Spaces
  16.2 The Kernel Trick
    16.2.1 Kernels as a Way to Express Prior Knowledge
    16.2.2 Characterizing Kernel Functions*
  16.3 Implementing Soft-SVM with Kernels
  16.4 Summary
  16.5 Bibliographic Remarks
  16.6 Exercises

17 Multiclass, Ranking, and Complex Prediction Problems
  17.1 One-versus-All and All-Pairs
  17.2 Linear Multiclass Predictors
    17.2.1 How to Construct Ψ
    17.2.2 Cost-Sensitive Classification
    17.2.3 ERM
    17.2.4 Generalized Hinge Loss
    17.2.5 Multiclass SVM and SGD
  17.3 Structured Output Prediction
  17.4 Ranking
    17.4.1 Linear Predictors for Ranking
  17.5 Bipartite Ranking and Multivariate Performance Measures
    17.5.1 Linear Predictors for Bipartite Ranking
  17.6 Summary
  17.7 Bibliographic Remarks
  17.8 Exercises

18 Decision Trees
  18.1 Sample Complexity
  18.2 Decision Tree Algorithms
    18.2.1 Implementations of the Gain Measure
    18.2.2 Pruning
    18.2.3 Threshold-Based Splitting Rules for Real-Valued Features
  18.3 Random Forests
  18.4 Summary
  18.5 Bibliographic Remarks
  18.6 Exercises

19 Nearest Neighbor
  19.1 k Nearest Neighbors
  19.2 Analysis
    19.2.1 A Generalization Bound for the 1-NN Rule
    19.2.2 The "Curse of Dimensionality"
  19.3 Efficient Implementation*
  19.4 Summary
  19.5 Bibliographic Remarks
  19.6 Exercises

20 Neural Networks
  20.1 Feedforward Neural Networks
  20.2 Learning Neural Networks
  20.3 The Expressive Power of Neural Networks
    20.3.1 Geometric Intuition
  20.4 The Sample Complexity of Neural Networks
  20.5 The Runtime of Learning Neural Networks
  20.6 SGD and Backpropagation
  20.7 Summary
  20.8 Bibliographic Remarks
  20.9 Exercises

Part III: Additional Learning Models

21 Online Learning
  21.1 Online Classification in the Realizable Case
    21.1.1 Online Learnability
  21.2 Online Classification in the Unrealizable Case
    21.2.1 Weighted-Majority
