Foreign-Literature Translation: Research on Machine Learning
Machine-Learning Research
Four Current Directions
Thomas G. Dietterich

Machine-learning research has been making great progress in many directions. This article summarizes four of these directions and discusses some current open problems. The four directions are (1) the improvement of classification accuracy by learning ensembles of classifiers, (2) methods for scaling up supervised learning algorithms, (3) reinforcement learning, and (4) the learning of complex stochastic models.

The last five years have seen an explosion in machine-learning research. This explosion has many causes: First, separate research communities in symbolic machine learning, computational learning theory, neural networks, statistics, and pattern recognition have discovered one another and begun to work together. Second, machine-learning techniques are being applied to new kinds of problems, including knowledge discovery in databases, language processing, robot control, and combinatorial optimization, as well as to more traditional problems such as speech recognition, face recognition, handwriting recognition, medical data analysis, and game playing.

In this article, I selected four topics within machine learning where there has been a lot of recent activity. The purpose of the article is to describe the results in these areas to a broader AI audience and to sketch some of the open research problems. The topic areas are (1) ensembles of classifiers, (2) methods for scaling up supervised learning algorithms, (3) reinforcement learning, and (4) the learning of complex stochastic models.

The reader should be cautioned that this article is not a comprehensive review of each of these topics. Rather, my goal is to provide a representative sample of the research in each of these four areas. In each of the areas, there are many other papers that describe relevant work. I apologize to those authors whose work I was unable to include in the article.

Ensembles of Classifiers

The first topic concerns methods for improving accuracy in supervised learning. I begin by introducing some notation. In supervised learning, a learning program is given training examples of the form (x_1, y_1), ..., (x_m, y_m) for some unknown function y = f(x). The x_i values are typically vectors of the form <x_i1, x_i2, ..., x_in> whose components are discrete or real valued, such as height, weight, color, and age. These components are also called the features of x_i. I use the notation x_ij to refer to the jth feature of x_i. In some situations, I drop the i subscript when it is implied by the context. The y values are typically drawn from a discrete set of classes {1, ..., k} in the case of classification or from the real line in the case of regression. In this article, I focus primarily on classification. The training examples might be corrupted by some random noise.

Given a set S of training examples, a learning algorithm outputs a classifier. The classifier is a hypothesis about the true function f. Given new x values, it predicts the corresponding y values. I denote classifiers by h_1, ..., h_L.

An ensemble of classifiers is a set of classifiers whose individual decisions are combined in some way (typically by weighted or unweighted voting) to classify new examples. One of the most active areas of research in supervised learning has been the study of methods for constructing good ensembles of classifiers. The main discovery is that ensembles are often much more accurate than the individual classifiers that make them up.

An ensemble can be more accurate than its component classifiers only if the individual classifiers disagree with one another (Hansen and Salamon 1990). To see why, imagine that we have an ensemble of three classifiers, h_1, h_2, and h_3, and consider a new case x. If the three classifiers are identical, then when h_1(x) is wrong, h_2(x) and h_3(x) are also wrong. However, if the errors made by the classifiers are uncorrelated, then when h_1(x) is wrong, h_2(x) and h_3(x) might be correct, so that a majority vote correctly classifies x. More precisely, if the error rates of L hypotheses h_l are all equal to p < 1/2 and if the errors are independent, then the probability that the majority vote is wrong is the area under the binomial distribution where more than L/2 hypotheses are wrong. Figure 1 shows this area for a simulated ensemble of 21 hypotheses, each having an error rate of 0.3. The area under the curve for 11 or more hypotheses being simultaneously wrong is 0.026, which is much less than the error rate of the individual hypotheses.

Of course, if the individual hypotheses make uncorrelated errors at rates exceeding 0.5, then the error rate of the voted ensemble increases as a result of the voting. Hence, the key to successful ensemble methods is to construct individual classifiers with error rates below 0.5 whose errors are at least somewhat uncorrelated.

Methods for Constructing Ensembles

Many methods for constructing ensembles have been developed. Some methods are general, and they can be applied to any learning algorithm. Other methods are specific to particular algorithms. I begin by reviewing the general techniques.

Subsampling the Training Examples

The first method manipulates the training examples to generate multiple hypotheses. The learning algorithm is run several times, each time with a different subset of the training examples. This technique works especially well for unstable learning algorithms, that is, algorithms whose output classifier undergoes major changes in response to small changes in the training data. Decision tree, neural network, and rule-learning algorithms are all unstable. Linear-regression, nearest-neighbor, and linear-threshold algorithms are generally stable.

The most straightforward way of manipulating the training set is called bagging. On each run, bagging presents the learning algorithm with a training set that consists of a sample of m training examples drawn randomly with replacement from the original training set of m items. Such a training set is called a bootstrap replicate of the original training set, and the technique is called bootstrap aggregation (Breiman 1996a). Each bootstrap replicate contains, on the average, 63.2 percent of the original set, with several training examples appearing multiple times.

Another training-set sampling method is to construct the training sets by leaving out disjoint subsets. For example, the training set can be divided randomly into 10 disjoint subsets. Then, 10 overlapping training sets can be constructed by dropping out a different one of these 10 subsets. This same procedure is used to construct training sets for tenfold cross-validation; so, ensembles constructed in this way are sometimes called cross-validated committees (Parmanto, Munro, and Doyle 1996).

The third method for manipulating the training set is illustrated by the ADABOOST algorithm, developed by Freund and Schapire (1996, 1995) and shown in figure 2. Like bagging, ADABOOST manipulates the training examples to generate multiple hypotheses. ADABOOST maintains a probability distribution p_l(x) over the training examples. In each iteration l, it draws a training set of size m by sampling with replacement according to the probability distribution p_l(x). The learning algorithm is then applied to produce a classifier h_l. The error rate ε_l of this classifier on the training examples (weighted according to p_l(x)) is computed and used to adjust the probability distribution on the training examples. (In figure 2, note that the probability distribution is obtained by normalizing a set of weights w_l(i) over the training examples.) The effect of the change in weights is to place more weight on examples that were misclassified by h_l and less weight on examples that were correctly classified. In subsequent iterations, therefore, ADABOOST constructs progressively more difficult learning problems.

The final classifier, h_f, is constructed by a weighted vote of the individual classifiers. Each classifier is weighted according to its accuracy for the distribution p_l that it was trained on.

In line 4 of the ADABOOST algorithm (figure 2), the base learning algorithm Learn is called with the probability distribution p_l. If the learning algorithm Learn can use this probability distribution directly, then this procedure generally gives better results. For example, Quinlan (1996) developed a version of the decision tree-learning program C4.5 that works with a weighted training sample. His experiments showed that it worked extremely well. One can also imagine versions of backpropagation that scale the computed output error for training example (x_i, y_i) by the weight p_l(i). Errors for important training examples would cause larger gradient-descent steps than errors for unimportant (low-weight) examples.

However, if the algorithm cannot use the probability distribution p_l directly, then a training sample can be constructed by drawing a random sample with replacement in proportion to the probabilities p_l. This procedure makes ADABOOST more stochastic, but experiments have shown that it is still effective.

Figure 3 compares the performance of C4.5 to C4.5 with ADABOOST.M1 (using random sampling). One point is plotted for each of 27 test domains taken from the Irvine repository of machine-learning databases (Merz and Murphy 1996). We can see that most points lie above the line y = x, which indicates that the error rate of ADABOOST is less than the error rate of C4.5. Figure 4 compares the performance of bagging (with C4.5) to C4.5 alone. Again, we see that bagging produces sizable reductions in the error rate of C4.5 for many problems. Finally, figure 5 compares bagging with boosting (both using C4.5 as the underlying algorithm). The results show that the two techniques are comparable, although boosting appears to still have an advantage over bagging.
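
The majority-vote argument in the section on ensembles of classifiers can be checked with a few lines of arithmetic. The following is a minimal sketch, assuming (as the text does) that every hypothesis has the same error rate and that errors are independent; the function name majority_vote_error is hypothetical and chosen only for illustration.

```python
from math import comb


def majority_vote_error(num_hypotheses: int, error_rate: float) -> float:
    """Probability that more than half of the hypotheses are wrong at once.

    Assumes independent errors with a common rate, so the number of wrong
    hypotheses follows a binomial distribution.
    """
    first_majority = num_hypotheses // 2 + 1  # smallest count that is "more than L/2"
    return sum(
        comb(num_hypotheses, k)
        * error_rate**k
        * (1 - error_rate) ** (num_hypotheses - k)
        for k in range(first_majority, num_hypotheses + 1)
    )


# The setting quoted in the text: 21 hypotheses, each with error rate 0.3.
ensemble_error = majority_vote_error(21, 0.3)
print(f"{ensemble_error:.3f}")  # prints 0.026
```

Calling the same function with an error rate above 0.5, for example majority_vote_error(21, 0.6), gives a value larger than 0.6, which matches the remark that voting makes matters worse when the individual hypotheses are worse than chance.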

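The ADABOOST description above can be sketched in code. This is an illustrative sketch under stated assumptions, not a reproduction of figure 2: labels are restricted to -1 and +1, the base learner is a one-dimensional decision stump that uses the distribution p_l directly instead of resampling, and all function names (stump_predict, learn_stump, adaboost, ensemble_predict) are hypothetical. The weight update multiplies the weight of each correctly classified example by beta = epsilon / (1 - epsilon), and each classifier votes with weight log(1 / beta), following the ADABOOST.M1 formulation of Freund and Schapire.

```python
import math


def stump_predict(stump, x):
    """Decision stump: predict `polarity` left of the threshold, `-polarity` otherwise."""
    threshold, polarity = stump
    return polarity if x < threshold else -polarity


def learn_stump(xs, ys, p):
    """Hypothetical base learner: the stump with the smallest error under distribution p."""
    points = sorted(set(xs))
    midpoints = [(a + b) / 2 for a, b in zip(points, points[1:])]
    thresholds = [points[0] - 1] + midpoints + [points[-1] + 1]
    best_stump, best_error = None, float("inf")
    for threshold in thresholds:
        for polarity in (1, -1):
            stump = (threshold, polarity)
            error = sum(w for x, y, w in zip(xs, ys, p) if stump_predict(stump, x) != y)
            if error < best_error:
                best_stump, best_error = stump, error
    return best_stump


def adaboost(xs, ys, rounds):
    """Return a list of (vote weight, stump) pairs."""
    m = len(xs)
    weights = [1.0 / m] * m
    ensemble = []
    for _ in range(rounds):
        total = sum(weights)
        p = [w / total for w in weights]  # normalize weights into a distribution
        stump = learn_stump(xs, ys, p)
        wrong = [stump_predict(stump, x) != y for x, y in zip(xs, ys)]
        epsilon = sum(p_i for p_i, is_wrong in zip(p, wrong) if is_wrong)
        if epsilon >= 0.5:  # no better than chance: stop adding classifiers
            break
        if epsilon == 0:  # a perfect stump needs no further boosting
            ensemble = [(1.0, stump)]
            break
        beta = epsilon / (1 - epsilon)
        # Shrink the weight of correctly classified examples; misclassified
        # examples keep their weight and so gain share after normalization.
        weights = [p_i if is_wrong else p_i * beta for p_i, is_wrong in zip(p, wrong)]
        ensemble.append((math.log(1 / beta), stump))
    return ensemble


def ensemble_predict(ensemble, x):
    """Weighted vote of the stumps."""
    score = sum(vote * stump_predict(stump, x) for vote, stump in ensemble)
    return 1 if score >= 0 else -1


# Toy data (an assumption for illustration): no single stump separates it.
xs = list(range(10))
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]
ensemble = adaboost(xs, ys, rounds=3)
training_errors = sum(ensemble_predict(ensemble, x) != y for x, y in zip(xs, ys))
print(training_errors)  # prints 0
```

On this toy data the best single stump misclassifies three of the ten points, whereas the weighted vote of three boosted stumps fits all ten. Each round shifts weight toward the points the previous stump got wrong, which is the "progressively more difficult learning problems" effect described in the text.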