
South China University of Technology "Advanced Artificial Intelligence" Presentation Report

Topic: Machine learning: Trends, perspectives, and prospects (Unsupervised learning and feature reduction)
School: School of Computer Science and Engineering
Major: Computer Science and Technology (All-English Innovation Class)
Student name: 黄炜杰
Student ID: 201230590051
Advisor: 陈琼
Start date: November 1, 2015

Topic overview

[Group paper topic] Machine learning: Trends, perspectives, and prospects.
[Individual part] Unsupervised learning and feature reduction

Paper study report

[Paper structure] After three close readings of the paper, I understand its content and structure as follows:

1. Definition and current state of machine learning
2. Core machine learning algorithms
2.1 Supervised learning
2.1.1 Overview of supervised learning
2.1.2 Overview of deep learning
2.2 Unsupervised learning
2.2.1 Overview of unsupervised learning
2.2.2 Feature reduction methods
2.3 Reinforcement learning
3. Pressing problems in machine learning
3.1 Data privacy
3.2 Data distribution
4. Opportunities and challenges for machine learning

[Content summary] Based on my understanding, the content of the paper is summarized as follows.

1. Definition and current state of machine learning: machine learning concentrates on two questions: can machines improve their performance through experience, and what statistical principles underlie all learning systems? Machine learning has made great progress over the past decades, largely because machines that learn from experience are far more effective than purely hand-written code. A learning problem can be defined as improving some performance metric through the study of experience; in supervised learning this metric may be, for example, classification accuracy. The many machine learning algorithms can in general be viewed as a search over a set of candidate solutions, a search that optimizes performance by learning from experience; algorithms can be categorized by how candidate solutions are represented and how the search is conducted. The principle behind machine learning algorithms is the optimization principle of statistics: machine learning is an interdisciplinary field between computer science and statistics, and this field is still in its infancy. Recently, with the rise of big data, issues such as algorithm scalability and data privacy have posed new challenges to machine learning algorithms; scalability in particular still has to resolve a data granularity problem.

2. Core machine learning algorithms:

2.1 Supervised learning: supervised learning has very wide applications, such as spam filtering and face recognition. It is a typical example of optimizing a performance metric: the learning task is to learn a mapping function f that maps any input sample x* to a class label y*. Binary classification is the most successful setting, and there are also rich research results on multi-class classification, ranking, and structured prediction. Outstanding algorithms include decision trees, logistic regression, support vector machines, and neural networks. General-purpose learning algorithms have gained recognition among researchers, foremost among them ensemble learning: it combines the results of multiple classifiers and discriminates data with good generality. Driven by real-world problems, various demands have been placed on supervised learning algorithms, and many successful algorithms exist that trade off computational complexity against performance.

2.1.2 Deep learning: deep learning, one of the hottest topics of recent years, is an advancement of artificial neural networks. A deep network consists of multiple layers of threshold units, and gradient-based optimization adjusts the network weights to minimize the error. Because of its strength, the algorithm can handle millions of parameters and large-scale datasets; thanks to this good scalability it has been applied successfully in computer vision, natural language processing, and other fields. The field of deep learning is still expanding, more and more researchers are devoting themselves to it, and its prospects are immeasurable.

2.2 Unsupervised learning: unsupervised learning is the process of learning from unlabeled data. Its most representative task is clustering, which organizes data into clusters so that intra-cluster similarity is maximized and inter-cluster similarity is minimized. Each specific clustering algorithm makes certain assumptions about the data distribution and discovers the intrinsic structure of the data through these assumptions. Like clustering, feature reduction is also based on assumptions about the data distribution, and some feature reduction algorithms can be regarded as unsupervised learning. Feature reduction is especially important in big-data processing; well-known feature reduction methods include PCA and LDA (a minimal PCA sketch is given after this summary).

2.3 Reinforcement learning: reinforcement learning is the learning, by an intelligent system, of a mapping from the environment to actions so as to maximize a reward (reinforcement) signal. It differs from supervised learning in connectionist learning mainly in the teacher signal: the reinforcement signal provided by the environment evaluates how good the produced action is (usually a scalar signal), rather than telling the reinforcement learning system (RLS) how to produce the correct action. Since the external environment provides little information, the RLS must learn from its own experience; through this action-evaluation loop it acquires knowledge and improves its action policy to adapt to the environment.

2.4 Combining the three learning paradigms: all three paradigms have achieved a measure of success, and much recent research concentrates on hybrids of them. One example is semi-supervised learning: during clustering, the class membership of some samples may be known, and this supervised membership information is provided as constraints for the unsupervised clustering. Other examples include model selection, active learning, and causal learning. Many issues influence the design of learning algorithms, including serial versus parallel execution and robustness.

3. Pressing problems in machine learning: data privacy and data distribution are the two most urgent problems. Data privacy involves policy, data ownership, and degrees of privacy: some data people are willing to share for everyone's research, other data they want kept absolutely confidential, and different people have different views of the same data. How to calibrate the degree of data privacy is an important ethical question. Because data is large-scale and distributed, gathering it in one place for processing is often impossible; for example, the branches of a nationwide chain each hold huge databases, and one can only learn locally in a distributed fashion and then combine the learning results. Modern algorithms should pay more and more attention to parallelism so that they work well in practical environments.

4. Opportunities and challenges: although modern machine learning has made great progress, it is still very far from true intelligence. A never-ending, self-adaptive learning machine is machine learning's highest aspiration. As a next step, semi-cooperative human-machine learning is a new topic: through human intervention, machines can analyze and process all kinds of complex data.
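Since the summary names PCA as a representative feature reduction method, here is a minimal sketch of its use. It assumes scikit-learn is available; the Iris dataset and the choice of two components are illustrative assumptions, not taken from the paper.

```python
# Minimal PCA feature-reduction sketch (illustrative dataset and dimension).
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)
pca = PCA(n_components=2)          # keep the 2 directions of largest variance
X_low = pca.fit_transform(X)       # map 4-D samples down to 2-D

print(X.shape, "->", X_low.shape)
print("explained variance ratio:", pca.explained_variance_ratio_)
```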

Extended study

The paper is a survey: it contains no formulas, algorithms, experiments, or proofs, does not touch on the frontier directions of unsupervised learning, and also has the fewest pages. So, for a better theoretical study, I additionally read two papers:

Peng H, Long F, Ding C (2005) Feature selection based on mutual information: criteria of max-dependency, max-relevance, and min-redundancy. IEEE Trans Pattern Anal Mach Intell 27:1226-1238.

X. Z. Fern and C. E. Brodley. Random projection for high dimensional data clustering: A cluster ensemble approach. In Machine Learning, Proceedings of the International Conference on, 2003.

The first is about feature selection; the second is about random projection (a feature reduction method) and about a frontier direction of unsupervised learning: cluster ensembles. I originally made the slides and the study report in English, and only when merging slides with my teammates did we agree to use Chinese, so I translated the slides and the content above into Chinese. The extended-study part below took me a whole evening to write, and I did not want to spend another evening translating it, so it stays in English.

Feature selection

Feature selection was proposed in 1970 and a great amount of work has been done on it since. The objective of feature reduction is three-fold: improving classification accuracy, providing faster and more cost-effective predictors, and providing a better understanding of the underlying process that generated the data. Feature selection is different from feature extraction: the target of feature selection is to select a subset of the features instead of mapping them into a lower dimension.

Given a set of features $F$, the feature selection problem is defined as finding a subset $F' \subseteq F$ that maximizes the learner's ability to classify patterns. More formally, $F'$ should maximize some scoring function $J$, where $\Gamma$ is the space of all possible feature subsets of $F$:

$$F' = \arg\max_{G \in \Gamma} J(G)$$

The framework of feature selection has two main parts: a generation step and an evaluation step. For the generation step, the main task is to select candidate feature subsets for evaluation. There are three ways in which the feature space can be examined: (1) complete, (2) heuristic, (3) random.

(1) Complete/exhaustive: examine all combinations of possible feature subsets. For example, the features {f1, f2, f3} are examined as {{f1}, {f2}, {f3}, {f1,f2}, {f1,f3}, {f2,f3}, {f1,f2,f3}}. The optimal subset is achievable if we search all possible solutions, but this is too expensive when the feature space is very large.

(2) Heuristic: selection is directed by a certain guideline. Start with the empty feature set (or the full set) and select (or delete) one feature at each step until the target number of features is reached, for example the incremental generation of subsets {f1} → {f1,f3} → {f1,f3,f2}.

(3) Random: there is no predefined way to select feature candidates; features are picked at random. This requires more user-defined input parameters, such as the number of trials.

According to whether the learning algorithm participates in the selection step, feature selection methods can be divided into three categories: filter, wrapper, and embedded.

The filter approach is usually fast. It provides a generic selection of features, not tuned to a given learning algorithm; but since it relies on statistical criteria and is not optimized for the classifier in use, filter methods are sometimes used only as a pre-processing step for other methods.

In the wrapper approach, the learner is considered a black box used to score subsets according to their predictive power. The accuracy is usually high, but the result varies across learners, losing generality. One needs to define how to search the space of all possible feature subsets (the possible selections) and how to assess the prediction performance of a certain subset. Finding the optimal subset is NP-hard! A wide range of heuristic search strategies can be used: branch-and-bound, simulated annealing, tabu search, genetic algorithms, forward selection (start with the empty feature set and add one feature at each step), and backward deletion (start with the full feature set and delete one feature at each step). Predictive power is usually measured on a validation set or by cross-validation. The drawbacks of wrapper methods are that a large amount of computation is required and that there is a danger of overfitting. (A minimal forward-selection sketch is given at the end of this overview.)

The embedded approach is specific to a given learning machine. It combines the advantages of both previous methods: it reduces the cost of learning and takes advantage of the learner's own variable selection mechanism, and it is usually implemented as a two-step or multi-step process.
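As a concrete illustration of the wrapper approach with forward selection, here is a minimal sketch. It assumes scikit-learn is available; the logistic-regression learner, the breast-cancer dataset, and the stopping size of 5 are illustrative assumptions, not prescribed by the papers discussed here.

```python
# Wrapper-style forward selection: the learner is a black box scored by
# cross-validated accuracy, and one feature is added at each step.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)
target_size = 5                      # stop once this many features are chosen
selected, remaining = [], list(range(X.shape[1]))

while len(selected) < target_size:
    best_feat, best_score = None, -np.inf
    for f in remaining:
        subset = selected + [f]
        # "Predictive power" of the candidate subset, via 5-fold CV.
        score = cross_val_score(
            LogisticRegression(max_iter=5000), X[:, subset], y, cv=5
        ).mean()
        if score > best_score:
            best_feat, best_score = f, score
    selected.append(best_feat)
    remaining.remove(best_feat)
    print(f"added feature {best_feat}, CV accuracy = {best_score:.3f}")
```

The exhaustive strategy would instead enumerate all subsets; this greedy loop is exactly the heuristic generation strategy described above, trading optimality for tractability.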
For the evaluation step, the main task is to measure the goodness of each candidate subset. The five main types of evaluation functions are: distance (e.g. Euclidean distance measures), information (entropy, information gain, etc.), dependency (correlation coefficients), consistency (min-features bias), and classification error rate. The first four are used by filter methods and the last one by wrappers.

An application of feature selection in supervised learning is given in the following, extracted from the paper "Feature selection based on mutual information: criteria of max-dependency, max-relevance, and min-redundancy". In supervised learning, the optimal characterization condition of feature selection is minimal classification error, which calls for maximal statistical dependency of the target class on the selected features (Max-Dependency). One of the most popular approaches to realizing Max-Dependency is maximal relevance (Max-Relevance). The problem with this method is that combinations of individually good features do not necessarily lead to good classification performance, i.e. "the m best features are not the best m features", and relevance alone may introduce rich redundancy; so features with minimum redundancy should also be considered. The authors therefore propose an algorithm that solves this problem.

The main work of the paper consists of three parts: (1) present a theoretical analysis showing that mRMR (max-relevance and min-redundancy) is equivalent to Max-Dependency for first-order feature selection; (2) investigate how to combine mRMR with other feature selection methods into a two-stage selection algorithm; (3) compare mRMR, Max-Relevance, Max-Dependency, and the two-stage feature selection algorithm through comprehensive experiments. Since the first part is unrelated to the course project, I skipped it, and only one experiment from the original paper will be mentioned.

The proposed algorithm is named mRMR (Max-Relevance and Min-Redundancy). Max-Relevance means selecting the features that are most relevant to the target class $c$, i.e. features satisfying

$$\max D(S, c), \qquad D = \frac{1}{|S|} \sum_{x_i \in S} I(x_i; c),$$

where $I(\cdot\,;\cdot)$ is the mutual information I mentioned before. Min-Redundancy means selecting features that are not redundant with the already selected features, satisfying

$$\min R(S), \qquad R = \frac{1}{|S|^2} \sum_{x_i, x_j \in S} I(x_i; x_j).$$

An operator $\Phi(D, R)$ is then defined to achieve this multi-objective optimization task, combining $D$ and $R$ and optimizing them simultaneously:

$$\max \Phi(D, R), \qquad \Phi = D - R.$$

In practice, incremental search methods can be used to find the near-optimal features: given the already selected set $S_{m-1}$, the $m$-th feature is the one satisfying

$$\max_{x_j \in X - S_{m-1}} \left[ I(x_j; c) - \frac{1}{m-1} \sum_{x_i \in S_{m-1}} I(x_j; x_i) \right].$$

Up to now this is not the whole algorithm; it is only half of it. The algorithm in the paper is a two-stage process: (1) find a candidate feature subset using the mRMR incremental selection method; (2) use a more sophisticated method (with a classifier involved) to search for a compact feature subset within the candidate subset. This two-stage algorithm is thus a case of the embedded method. (A sketch of the first stage is given after this description.)

The first stage is as follows: use mRMR incremental selection to select the sequential feature sets $S_1 \subset S_2 \subset \cdots \subset S_{n-1} \subset S_n$; compare the classification errors of all the subsets and find the range of $k$, called $\Omega$, within which the respective error $e_k$ is consistently small; within $\Omega$, find the smallest error $e^\ast = \min e_k$; the optimal subset size is the $k$ corresponding to $e^\ast$.

The second stage is as follows. For backward selection: exclude one redundant feature at a time if the resulting error is smaller than the current one (selecting the feature that leads to the greatest error reduction), and terminate when no error reduction can be obtained. For forward selection: select the feature that leads to the greatest error reduction at each step, and terminate when the error begins to increase. The algorithm of the paper is now complete.
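Below is a hedged sketch of stage one, the mRMR incremental selection. It uses scikit-learn's nearest-neighbor mutual information estimators, which differ from the discretized estimator used in Peng et al. (2005); the dataset and target subset size are illustrative assumptions.

```python
# mRMR incremental selection: at each step pick the feature maximizing
# relevance I(x_j; c) minus mean redundancy with the selected set.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import mutual_info_classif, mutual_info_regression

X, y = load_breast_cancer(return_X_y=True)
n_features, m_target = X.shape[1], 10

relevance = mutual_info_classif(X, y)      # I(x_i; c) for every feature
selected = [int(np.argmax(relevance))]     # first pick: max relevance alone

while len(selected) < m_target:
    candidates = [j for j in range(n_features) if j not in selected]
    scores = []
    for j in candidates:
        # Redundancy: mean I(x_j; x_i) over the already selected features.
        red = np.mean([
            mutual_info_regression(X[:, [i]], X[:, j])[0] for i in selected
        ])
        scores.append(relevance[j] - red)  # the mRMR criterion D - R
    selected.append(candidates[int(np.argmax(scores))])

print("mRMR feature order:", selected)
```

Stage two would then wrap a classifier around the nested subsets $S_1 \subset \cdots \subset S_n$ produced here, as described above.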

Besides evaluating how effective and efficient this algorithm is, there is also the problem of judging which of two algorithms is superior, so the authors define the RM-characteristic measure. Given two feature sets $L$ and $S$, each generated sequentially, e.g. $S_1 \subset S_2 \subset \cdots \subset S_k \subset \cdots \subset S_n$, we say that $S$ is recursively more characteristic (RM-characteristic) than $L$ by $p\%$ if, for $p\%$ of the subset sizes $k$, the error of $S_k$ is smaller than the error of $L_k$ (a toy computation is shown after the figure discussion below).

[Figure omitted: error-rate curves from the paper's experiments on the HDR and ARR datasets, one row per dataset and one column per classifier (NB, LDA, etc.); the x-axis is the number of selected features, the y-axis the error rate, with one curve each for mRMR and MaxRel.]

The figure above is one of the experimental results given in the paper. Each row is for a different dataset and each column for a different classification algorithm. In each graph, the x-axis denotes the number of selected features and the y-axis the error rate; the line marked with triangles is the proposed algorithm and the other line is the state-of-the-art algorithm of that time. As the results show, classification accuracy can be significantly improved by mRMR feature selection.
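As a toy illustration of the RM-characteristic comparison defined above, the following snippet compares two error curves; the numbers are invented for illustration only, not taken from the paper.

```python
# S is RM-characteristic over L by p% if, for p% of the subset sizes k,
# the error of S_k is lower than the error of L_k.
import numpy as np

err_S = np.array([0.30, 0.22, 0.18, 0.15, 0.14, 0.140])  # errors of S_1..S_6
err_L = np.array([0.32, 0.25, 0.17, 0.16, 0.15, 0.145])  # errors of L_1..L_6

p = 100.0 * np.mean(err_S < err_L)
print(f"S is RM-characteristic over L by {p:.0f}%")
```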

There is also an experiment done by myself to verify that feature selection methods can improve accuracy:

[Figure omitted: accuracy versus the number of selected features in my own experiment.]

This experiment was carried out on the Spambase dataset with an SVM using a linear kernel. The x-axis denotes the number of selected features and the y-axis the accuracy. The red line is the proposed algorithm; the others are the baseline, the traditional method, and random selection. We can see that the proposed algorithm performs best, so I am convinced that feature selection methods can improve the accuracy of a learning algorithm.

Random projection

Random projection is a feature extraction algorithm. The most famous feature extraction algorithms include PCA, LDA, LLE, etc. Random projection is sometimes mentioned as an LSH method, and it is highly unstable, so it is not so famous; but it is quite useful in some cases and much more efficient than the most famous algorithms such as PCA. Its main steps can be introduced briefly:

(1) Randomly select a set of high-dimensional unit vectors (not necessarily orthogonal).
(2) Project the high-dimensional data into the low dimension by taking products with these vectors.

Such steps sound simple and somewhat unreliable, but in fact there is a lemma that guarantees the precision of the method, called the Johnson-Lindenstrauss Lemma. Its main idea is that it is possible to project $n$ points in a space of arbitrarily high dimension onto an $O(\log n)$-dimensional space such that the pairwise distances between the points are approximately preserved. More formally:

Johnson-Lindenstrauss Theorem. For any $0 < \varepsilon < 1$ and any positive integer $n$, let $k$ be a positive integer such that

$$k \ge 4\,(\varepsilon^2/2 - \varepsilon^3/3)^{-1} \ln n.$$

Then for any set $V$ of $n$ points in $\mathbb{R}^d$ there is a map $f : \mathbb{R}^d \to \mathbb{R}^k$ such that for all $u, v \in V$,

$$(1 - \varepsilon)\,\lVert u - v \rVert^2 \le \lVert f(u) - f(v) \rVert^2 \le (1 + \varepsilon)\,\lVert u - v \rVert^2.$$

Furthermore, this map can be found in expected polynomial time.

Here we use pairwise sample distance as the measure of the goodness of feature reduction performance, for the reason that one objective of feature reduction is that the pairwise distances of the points stay approximately the same as before. In the data mining area, we know that a dataset has two ways of representation, the data matrix and the dissimilarity matrix:

$$\begin{bmatrix} x_{11} & \cdots & x_{1f} & \cdots & x_{1d} \\ \vdots & & \vdots & & \vdots \\ x_{n1} & \cdots & x_{nf} & \cdots & x_{nd} \end{bmatrix} \qquad \begin{bmatrix} 0 & & & \\ d(2,1) & 0 & & \\ d(3,1) & d(3,2) & 0 & \\ \vdots & \vdots & \vdots & \ddots \\ d(n,1) & d(n,2) & \cdots & 0 \end{bmatrix}$$

If the pairwise distances of the data points are preserved precisely, then the dissimilarity matrix retains most of the information of the original dataset, and we say that it is a good feature reduction method. There are several ways to construct a random projection $\mathbb{R}^d \to \mathbb{R}^k$; we adopt the one in the original Johnson-Lindenstrauss paper: let $A$ be a random $k \times d$ matrix that projects $\mathbb{R}^d$ onto a uniform random $k$-dimensional subspace, multiplied by a fixed scalar, so that every $v \in \mathbb{R}^d$ is mapped to $\sqrt{d/k}\, A v$. To give a better understanding, I drew a diagram of the process. [Figure omitted: diagram of the random projection process.]

The advantage of random projection is that it does not use any predefined "interestingness" criterion the way PCA does, and high-dimensional distributions look more like Gaussians when projected to low dimensions. But it is a highly unstable algorithm. For example: [Figure omitted: the left picture is the true distribution of a high-dimensional dataset, plotted using two of its features; the middle and right pictures are two single runs of a clustering algorithm after random projection.] We can see that the results of different runs may differ greatly. But it is exactly this unstable behavior that provides multiple views of the same dataset, which is useful in ensemble learning. (A minimal projection sketch follows.)
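Here is a minimal sketch of the random projection just described. It assumes only NumPy, and uses a Gaussian matrix as a common stand-in for an exact uniformly random subspace; the dimensions are illustrative.

```python
# Random projection in the spirit of the JL lemma: project to k dimensions
# and check how well a pairwise distance is preserved.
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 200, 1000, 50                     # samples, original dim, target dim
X = rng.normal(size=(n, d))

A = rng.normal(size=(k, d)) / np.sqrt(k)    # scaling keeps distances unbiased
X_low = X @ A.T                             # map every point into R^k

i, j = 3, 17
orig = np.linalg.norm(X[i] - X[j])
proj = np.linalg.norm(X_low[i] - X_low[j])
print(f"distance ratio after projection: {proj / orig:.3f}")  # close to 1
```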
Cluster ensemble

Ensemble learning has been a hot topic in recent years, and cluster ensembles are one of the newest topics in unsupervised learning. The framework of a classification ensemble is as follows: given a certain dataset, we first generate different views of the dataset, which can be implemented by bootstrap sampling, random feature subspaces, or other methods. Then we use different learning algorithms, the same algorithm with different parameters, or even just the same algorithm, to generate several different classifiers. When a new data point comes in, the multiple classifiers can classify it and the final classification result is obtained by a voting scheme or some other method.

A cluster ensemble is almost the same as a classification ensemble, but the main difference is that the clustering solution of each run may have different output labels and a different number of output clusters, which makes it impossible to obtain a final result by a voting scheme directly. So a consensus function needs to be defined to combine the results of the multiple runs.

There is an application of the random projection I mentioned before, together with a case study of cluster ensembles, given in the paper "Random projection for high dimensional data clustering: A cluster ensemble approach", published at ICML in 2003.

First of all, random projection is used to provide different representations of the original dataset. In order to show the superior performance of random projection, the PCA algorithm is also used as a comparison with random projection. The first step of the combination is to generate an $n \times n$ similarity matrix: for each run, EM generates a probabilistic model $\theta$ of a mixture of $k$ Gaussians; the probability that point $i$ belongs to cluster $l$ is denoted by $P(l \mid i, \theta)$, and the probability that point $i$ and point $j$ belong to the same cluster is denoted by

$$P_{ij} = \sum_{l} P(l \mid i, \theta)\, P(l \mid j, \theta).$$

Then $P_{ij}$ is averaged over multiple runs. To verify the usefulness of this metric, the authors plot histograms of $P_{ij}$ for pairs that do and do not belong to the same cluster. [Figure omitted: the two histograms are well separated.] We can see that they differ and have little overlap, so it is a good metric for a similarity matrix. Then the following algorithm can be used to obtain the final clustering. Inputs: $P$, an $n \times n$ similarity matrix, and $k$, the desired number of clusters. Output: a partition of the $n$ points into $k$ clusters.
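The following is a hedged sketch of this pipeline: multiple random projections plus EM-fitted Gaussian mixtures build the averaged similarity matrix $P$, which is then clustered. Agglomerative clustering on $1 - P$ stands in for the paper's similarity-based merging procedure, and the digits dataset, ensemble size, and projected dimension are illustrative assumptions.

```python
# Cluster-ensemble sketch in the spirit of Fern & Brodley (2003).
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import load_digits
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X, _ = load_digits(return_X_y=True)
n, d = X.shape
k, n_runs, k_low = 10, 20, 8                # clusters, ensemble size, target dim

P = np.zeros((n, n))
for _ in range(n_runs):
    A = rng.normal(size=(d, k_low)) / np.sqrt(k_low)   # random projection
    Xp = X @ A
    gm = GaussianMixture(n_components=k, covariance_type="diag").fit(Xp)
    soft = gm.predict_proba(Xp)             # P(l | i, theta) for each point
    P += soft @ soft.T                      # P_ij = sum_l P(l|i) P(l|j)
P /= n_runs                                 # average over the multiple runs

labels = AgglomerativeClustering(
    n_clusters=k, metric="precomputed", linkage="average"
).fit_predict(1.0 - P)                      # treat 1 - P as a dissimilarity
print("cluster sizes:", np.bincount(labels))
```

The instability of random projection, which was a weakness for a single run, is exactly what makes the ensemble's views diverse enough for the averaged $P$ to be informative.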
