
Full text available at: /10.1561/2200000006

Learning Deep Architectures for AI

Yoshua Bengio
Dept. IRO, Université de Montréal
C.P. 6128, Montreal, Qc
Canada
yoshua.bengio@umontreal.ca

Boston – Delft

Foundations and Trends® in Machine Learning

Published, sold and distributed by:
now Publishers Inc.
PO Box 1024
Hanover, MA 02339 USA
Tel. +1-781-985-4510
sales@

Outside North America:
now Publishers Inc.
PO Box 179
2600 AD Delft
The Netherlands
Tel. +31-6-51115274

The preferred citation for this publication is Y. Bengio, Learning Deep Architectures for AI, Foundations and Trends® in Machine Learning, vol 2, no 1, pp 1–127, 2009.

ISBN: 978-1-60198-294-0
© 2009 Y. Bengio

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, mechanical, photocopying, recording or otherwise, without prior written permission of the publishers.

Photocopying. In the USA: This journal is registered at the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923. Authorization to photocopy items for internal or personal use, or the internal or personal use of specific clients, is granted by now Publishers Inc. for users registered with the Copyright Clearance Center (CCC). The ‘services’ for users can be found on the internet at:

For those organizations that have been granted a photocopy license, a separate system of payment has been arranged. Authorization does not extend to other kinds of copying, such as that for general distribution, for advertising or promotional purposes, for creating new collective works, or for resale. In the rest of the world: Permission to photocopy must be obtained from the copyright owner. Please apply to now Publishers Inc., PO Box 1024, Hanover, MA 02339, USA; Tel. +1-781-871-0245;
sales@

now Publishers Inc. has an exclusive license to publish this material worldwide. Permission to use this content must be obtained from the copyright license holder. Please apply to now Publishers, PO Box 179, 2600 AD Delft, The Netherlands; e-mail:
sales@

Foundations and Trends® in Machine Learning, Volume 2, Issue 1, 2009

Editorial Board

Editor-in-Chief:
Michael Jordan
Department of Electrical Engineering and Computer Science
Department of Statistics
University of California, Berkeley
Berkeley, CA 94720-1776

Editors

Peter Bartlett (UC Berkeley)
Yoshua Bengio (Université de Montréal)
Avrim Blum (Carnegie Mellon University)
Craig Boutilier (University of Toronto)
Stephen Boyd (Stanford University)
Carla Brodley (Tufts University)
Inderjit Dhillon (University of Texas at Austin)
Jerome Friedman (Stanford University)
Kenji Fukumizu (Institute of Statistical Mathematics)
Zoubin Ghahramani (Cambridge University)
David Heckerman (Microsoft Research)
Tom Heskes (Radboud University Nijmegen)
Geoffrey Hinton (University of Toronto)
Aapo Hyvarinen (Helsinki Institute for Information Technology)
Leslie Pack Kaelbling (MIT)
Michael Kearns (University of Pennsylvania)
Daphne Koller (Stanford University)
John Lafferty (Carnegie Mellon University)
Michael Littman (Rutgers University)
Gabor Lugosi (Pompeu Fabra University)
David Madigan (Columbia University)
Pascal Massart (Université de Paris-Sud)
Andrew McCallum (University of Massachusetts Amherst)
Marina Meila (University of Washington)
Andrew Moore (Carnegie Mellon University)
John Platt (Microsoft Research)
Luc de Raedt (Albert-Ludwigs Universitaet Freiburg)
Christian Robert (Université Paris-Dauphine)
Sunita Sarawagi (IIT Bombay)
Robert Schapire (Princeton University)
Bernhard Schoelkopf (Max Planck Institute)
Richard Sutton (University of Alberta)
Larry Wasserman (Carnegie Mellon University)
Bin Yu (UC Berkeley)

Editorial Scope

Foundations and Trends® in Machine Learning will publish survey and tutorial articles in the following topics:

Adaptive control and signal processing
Applications and case studies
Behavioral, cognitive and neural learning
Bayesian learning
Classification and prediction
Clustering
Data mining
Dimensionality reduction
Evaluation
Game theoretic learning
Graphical models
Independent component analysis
Inductive logic programming
Kernel methods
Markov chain Monte Carlo
Model choice
Nonparametric methods
Online learning
Optimization
Reinforcement learning
Relational learning
Robustness
Spectral methods
Statistical learning theory
Variational inference
Visualization

Information for Librarians

Foundations and Trends® in Machine Learning, 2009, Volume 2, 4 issues. ISSN paper version 1935-8237. ISSN online version 1935-8245. Also available as a combined paper and online subscription.

Foundations and Trends® in Machine Learning
Vol. 2, No. 1 (2009) 1–127
© 2009 Y. Bengio
DOI: 10.1561/2200000006

Learning Deep Architectures for AI

Yoshua Bengio
Dept. IRO, Université de Montréal, C.P. 6128, Montreal, Qc, H3C 3J7, Canada
yoshua.bengio@umontreal.ca

Abstract

Theoretical results suggest that in order to learn the kind of complicated functions that can represent high-level abstractions (e.g., in vision, language, and other AI-level tasks), one may need deep architectures. Deep architectures are composed of multiple levels of non-linear operations, such as in neural nets with many hidden layers or in complicated propositional formulae re-using many sub-formulae. Searching the parameter space of deep architectures is a difficult task, but learning algorithms such as those for Deep Belief Networks have recently been proposed to tackle this problem with notable success, beating the state-of-the-art in certain areas. This monograph discusses the motivations and principles regarding learning algorithms for deep architectures, in particular those exploiting as building blocks unsupervised learning of single-layer models such as Restricted Boltzmann Machines, used to construct deeper models such as Deep Belief Networks.

Contents

1 Introduction 1
1.1 How do We Train Deep Architectures? 4
1.2 Intermediate Representations: Sharing Features and Abstractions Across Tasks 6
1.3 Desiderata for Learning AI 9
1.4 Outline of the Paper 10

2 Theoretical Advantages of Deep Architectures 13
2.1 Computational Complexity 16
2.2 Informal Arguments 18

3 Local vs Non-Local Generalization 21
3.1 The Limits of Matching Local Templates 21
3.2 Learning Distributed Representations 27

4 Neural Networks for Deep Architectures 31
4.1 Multi-Layer Neural Networks 31
4.2 The Challenge of Training Deep Neural Networks 32
4.3 Unsupervised Learning for Deep Architectures 40
4.4 Deep Generative Architectures 41
4.5 Convolutional Neural Networks 44
4.6 Auto-Encoders 46

5 Energy-Based Models and Boltzmann Machines 49
5.1 Energy-Based Models and Products of Experts 49
5.2 Boltzmann Machines 54
5.3 Restricted Boltzmann Machines 56
5.4 Contrastive Divergence 60

6 Greedy Layer-Wise Training of Deep Architectures 69
6.1 Layer-Wise Training of Deep Belief Networks 69
6.2 Training Stacked Auto-Encoders 72
6.3 Semi-Supervised and Partially Supervised Training 73

7 Variants of RBMs and Auto-Encoders 75
7.1 Sparse Representations in Auto-Encoders and RBMs 75
7.2 Denoising Auto-Encoders 81
7.3 Lateral Connections 83
7.4 Conditional RBMs and Temporal RBMs 84
7.5 Factored RBMs 86
7.6 Generalizing RBMs and Contrastive Divergence 87

8 Stochastic Variational Bounds for Joint Optimization of DBN Layers 91
8.1 Unfolding RBMs into Infinite Directed Belief Networks 92
8.2 Variational Justification of Greedy Layer-wise Training 94
8.3 Joint Unsupervised Training of All the Layers 97

9 Looking Forward 101
9.1 Global Optimization Strategies 101
9.2 Why Unsupervised Learning is Important 107
9.3 Open Questions 108

10 Conclusion 113

Acknowledgments 115

References 117

1 Introduction

Allowing computers to model our world well enough to exhibit what we call intelligence has been the focus of more than half a century of research. To achieve this, it is clear that a large quantity of information about our world should somehow be stored, explicitly or implicitly, in the computer. Because it seems daunting to formalize manually all that information in a form that computers can use to answer questions and generalize to new contexts, many researchers have turned to learning algorithms to capture a large fraction of that information. Much progress has been made to understand and improve learning algorithms, but the challenge of artificial intelligence (AI) remains. Do we have algorithms that can understand scenes and describe them in natural language? Not really, except in very limited settings. Do we have algorithms that can infer enough semantic concepts to be able to interact with most humans using these concepts? No. If we consider image understanding, one of the best specified of the AI tasks, we realize that we do not yet have learning algorithms that can discover the many visual and semantic concepts that would seem to be necessary to interpret most images on the web. The situation is similar for other AI tasks.


Fig. 1.1 We would like the raw input image to be transformed into gradually higher levels of representation, representing more and more abstract functions of the raw input, e.g., edges, local shapes, object parts, etc. In practice, we do not know in advance what the “right” representation should be for all these levels of abstractions, although linguistic concepts might help guessing what the higher levels should implicitly represent.

Consider for example the task of interpreting an input image such as the one in Figure 1.1. When humans try to solve a particular AI task (such as machine vision or natural language processing), they often exploit their intuition about how to decompose the problem into sub-problems and multiple levels of representation, e.g., in object parts and constellation models [138, 179, 197] where models for parts can be re-used in different object instances. For example, the current state-of-the-art in machine vision involves a sequence of modules starting from pixels and ending in a linear or kernel classifier [134, 145], with intermediate modules mixing engineered transformations and learning, e.g., first extracting low-level features that are invariant to small geometric variations (such as edge detectors from Gabor filters), transforming them gradually (e.g., to make them invariant to contrast changes and contrast inversion, sometimes by pooling and sub-sampling), and then detecting the most frequent patterns. A plausible and common way to extract useful information from a natural image involves transforming the raw pixel representation into gradually more abstract representations, e.g., starting from the presence of edges, the detection of more complex but local shapes, up to the identification of abstract categories associated with sub-objects and objects which are parts of the image, and putting all these together to capture enough understanding of the scene to answer questions about it.
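
A toy sketch can make this kind of processing chain concrete. The code below is purely illustrative and is not any of the cited systems: a single hand-coded difference filter stands in for a Gabor filter bank, max-pooling supplies a little geometric invariance, and a least-squares linear classifier on placeholder random data ends the chain.

```python
import numpy as np

rng = np.random.default_rng(0)

def edge_features(img):
    # Horizontal-difference filter: a crude stand-in for oriented Gabor filters.
    resp = np.abs(img[:, 1:] - img[:, :-1])
    # 2x2 max-pooling / sub-sampling gives some invariance to small variations.
    h, w = resp.shape[0] // 2 * 2, resp.shape[1] // 2 * 2
    pooled = resp[:h, :w].reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))
    return pooled.ravel()

imgs = rng.random((100, 16, 16))                # placeholder "images"
X = np.stack([edge_features(im) for im in imgs])
y = rng.integers(0, 2, size=100) * 2.0 - 1.0    # placeholder +/-1 labels
w, *_ = np.linalg.lstsq(X, y, rcond=None)       # linear classifier ends the chain
pred = np.sign(X @ w)
```

Note that in this sketch only the final linear stage is learned; the feature extraction is fixed by hand, which is exactly the limitation that learning intermediate representations is meant to remove.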

Here, we assume that the computational machinery necessary to express complex behaviors (which one might label “intelligent”) requires highly varying mathematical functions, i.e., mathematical functions that are highly non-linear in terms of raw sensory inputs, and display a very large number of variations (ups and downs) across the domain of interest. We view the raw input to the learning system as a high dimensional entity, made of many observed variables, which are related by unknown intricate statistical relationships. For example, using knowledge of the 3D geometry of solid objects and lighting, we can relate small variations in underlying physical and geometric factors (such as position, orientation, lighting of an object) with changes in pixel intensities for all the pixels in an image. We call these factors of variation because they are different aspects of the data that can vary separately and often independently. In this case, explicit knowledge of the physical factors involved allows one to get a picture of the mathematical form of these dependencies, and of the shape of the set of images (as points in a high-dimensional space of pixel intensities) associated with the same 3D object. If a machine captured the factors that explain the statistical variations in the data, and how they interact to generate the kind of data we observe, we would be able to say that the machine understands those aspects of the world covered by these factors of variation. Unfortunately, in general and for most factors of variation underlying natural images, we do not have an analytical understanding of these factors of variation. We do not have enough formalized prior knowledge about the world to explain the observed variety of images, even for such an apparently simple abstraction as MAN, illustrated in Figure 1.1.

A high-level abstraction such as MAN has the property that it corresponds to a very large set of possible images, which might be very different from each other from the point of view of simple Euclidean distance in the space of pixel intensities. The set of images for which that label could be appropriate forms a highly convoluted region in pixel space that is not even necessarily a connected region. The MAN category can be seen as a high-level abstraction with respect to the space of images. What we call abstraction here can be a category (such as the MAN category) or a feature, a function of sensory data, which can be discrete (e.g., the input sentence is at the past tense) or continuous (e.g., the input video shows an object moving at 2 meter/second). Many lower-level and intermediate-level concepts (which we also call abstractions here) would be useful to construct a MAN-detector. Lower level abstractions are more directly tied to particular percepts, whereas higher level ones are what we call “more abstract” because their connection to actual percepts is more remote, and through other, intermediate-level abstractions.

In addition to the difficulty of coming up with the appropriate intermediate abstractions, the number of visual and semantic categories (such as MAN) that we would like an “intelligent” machine to capture is rather large. The focus of deep architecture learning is to automatically discover such abstractions, from the lowest level features to the highest level concepts. Ideally, we would like learning algorithms that enable this discovery with as little human effort as possible, i.e., without having to manually define all necessary abstractions or having to provide a huge set of relevant hand-labeled examples. If these algorithms could tap into the huge resource of text and images on the web, it would certainly help to transfer much of human knowledge into machine-interpretable form.

1.1 How do We Train Deep Architectures?

Deep learning methods aim at learning feature hierarchies with features from higher levels of the hierarchy formed by the composition of lower level features. Automatically learning features at multiple levels of abstraction allows a system to learn complex functions mapping the input to the output directly from data, without depending completely on human-crafted features. This is especially important for higher-level abstractions, which humans often do not know how to specify explicitly in terms of raw sensory input. The ability to automatically learn powerful features will become increasingly important as the amount of data and range of applications to machine learning methods continues to grow.

Depth of architecture refers to the number of levels of composition of non-linear operations in the function learned. Whereas most current learning algorithms correspond to shallow architectures (1, 2 or 3 levels), the mammal brain is organized in a deep architecture [173] with a given input percept represented at multiple levels of abstraction, each level corresponding to a different area of cortex. Humans often describe such concepts in hierarchical ways, with multiple levels of abstraction. The brain also appears to process information through multiple stages of transformation and representation. This is particularly clear in the primate visual system [173], with its sequence of processing stages: detection of edges, primitive shapes, and moving up to gradually more complex visual shapes.
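
To make the notion of depth concrete, here is a minimal sketch (with arbitrary placeholder widths, not values from the text) in which the learned function is an explicit composition of non-linear stages; its depth is simply the number of composed levels.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_layer(n_in, n_out):
    # One level of composition: an affine map followed by a non-linearity.
    W = rng.standard_normal((n_in, n_out)) / np.sqrt(n_in)
    return lambda x: np.tanh(x @ W)

# A depth-3 architecture: three levels of non-linear composition.
layers = [make_layer(10, 20), make_layer(20, 20), make_layer(20, 5)]

x = rng.random((8, 10))   # a batch of placeholder inputs
h = x
for layer in layers:
    h = layer(h)          # each level transforms the previous level's output
```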

Inspired by the architectural depth of the brain, neural network researchers had wanted for decades to train deep multi-layer neural networks [19, 191], but no successful attempts were reported before 2006¹: researchers reported positive experimental results with typically two or three levels (i.e., one or two hidden layers), but training deeper networks consistently yielded poorer results. Something that can be considered a breakthrough happened in 2006: Hinton et al. at University of Toronto introduced Deep Belief Networks (DBNs) [73], with a learning algorithm that greedily trains one layer at a time, exploiting an unsupervised learning algorithm for each layer, a Restricted Boltzmann Machine (RBM) [51]. Shortly after, related algorithms based on auto-encoders were proposed [17, 153], apparently exploiting the same principle: guiding the training of intermediate levels of representation using unsupervised learning, which can be performed locally at each level. Other algorithms for deep architectures were proposed more recently that exploit neither RBMs nor auto-encoders and that exploit the same principle [131, 202] (see Section 4).

¹ Except for neural networks with a special structure called convolutional networks, discussed in Section 4.5.
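
As a concrete illustration of this greedy layer-wise principle, here is a minimal sketch of RBM stacking trained with CD-1 (one-step contrastive divergence, the estimator discussed in Section 5.4). All layer sizes, the learning rate, and the random binary "data" are placeholder assumptions, not values from the text.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class RBM:
    def __init__(self, n_visible, n_hidden):
        self.W = 0.01 * rng.standard_normal((n_visible, n_hidden))
        self.b = np.zeros(n_visible)   # visible biases
        self.c = np.zeros(n_hidden)    # hidden biases

    def hidden_probs(self, v):
        return sigmoid(v @ self.W + self.c)

    def visible_probs(self, h):
        return sigmoid(h @ self.W.T + self.b)

    def cd1_update(self, v0, lr=0.1):
        h0 = self.hidden_probs(v0)                    # positive phase
        h_sample = (rng.random(h0.shape) < h0) * 1.0  # sample hidden units
        v1 = self.visible_probs(h_sample)             # reconstruction
        h1 = self.hidden_probs(v1)                    # negative phase
        # CD-1: data correlations minus reconstruction correlations.
        self.W += lr * (v0.T @ h0 - v1.T @ h1) / len(v0)
        self.b += lr * (v0 - v1).mean(axis=0)
        self.c += lr * (h0 - h1).mean(axis=0)

def pretrain_stack(data, layer_sizes, epochs=10):
    """Greedily train one RBM per level on the previous level's output."""
    rbms, x = [], data
    for n_hidden in layer_sizes:
        rbm = RBM(x.shape[1], n_hidden)
        for _ in range(epochs):
            rbm.cd1_update(x)          # full-batch CD-1 step, for simplicity
        rbms.append(rbm)
        x = rbm.hidden_probs(x)        # features become the next level's input
    return rbms

# Toy usage on random binary "data" (a placeholder for real inputs).
data = (rng.random((500, 64)) < 0.3) * 1.0
stack = pretrain_stack(data, layer_sizes=[32, 16])
```

Each level is trained purely unsupervised, using only the representation produced by the level below it, which is what makes the procedure local and greedy.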

Since 2006, deep networks have been applied with success not only in classification tasks [2, 17, 99, 111, 150, 153, 195], but also in regression [160], dimensionality reduction [74, 158], modeling textures [141], modeling motion [182, 183], object segmentation [114], information retrieval [154, 159, 190], robotics [60], natural language processing [37, 130, 202], and collaborative filtering [162]. Although auto-encoders, RBMs and DBNs can be trained with unlabeled data, in many of the above applications, they have been successfully used to initialize deep supervised feedforward neural networks applied to a specific task.

1.2 Intermediate Representations: Sharing Features and Abstractions Across Tasks

Since a deep architecture can be seen as the composition of a series of processing stages, the immediate question that deep architectures raise is: what kind of representation of the data should be found as the output of each stage (i.e., the input of another)? What kind of interface should there be between these stages? A hallmark of recent research on deep architectures is the focus on these intermediate representations: the success of deep architectures belongs to the representations learned in an unsupervised way by RBMs [73], ordinary auto-encoders [17], sparse auto-encoders [150, 153], or denoising auto-encoders [195]. These algorithms (described in more detail in Section 7.2) can be seen as learning to transform one representation (the output of the previous stage) into another, at each step maybe disentangling better the factors of variations underlying the data. As we discuss at length in Section 4, it has been observed again and again that once a good representation has been found at each level, it can be used to initialize and successfully train a deep neural network by supervised gradient-based optimization.
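
For concreteness, here is a minimal sketch of one such building block, a denoising auto-encoder with tied weights: it corrupts its input, learns to reconstruct the clean version, and its hidden layer then serves as the representation fed to the next stage. Sizes, the corruption level, and the learning rate are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_denoising_autoencoder(x, n_hidden=32, noise=0.3, lr=0.1, epochs=50):
    n_visible = x.shape[1]
    W = 0.01 * rng.standard_normal((n_visible, n_hidden))  # tied weights
    b_h, b_v = np.zeros(n_hidden), np.zeros(n_visible)
    for _ in range(epochs):
        # Corrupt the input by randomly zeroing entries, then reconstruct.
        x_noisy = x * (rng.random(x.shape) >= noise)
        h = sigmoid(x_noisy @ W + b_h)        # encoder
        r = sigmoid(h @ W.T + b_v)            # decoder
        # Backpropagate squared reconstruction error against the CLEAN input.
        d_r = (r - x) * r * (1 - r)
        d_h = (d_r @ W) * h * (1 - h)
        W -= lr * (x_noisy.T @ d_h + d_r.T @ h) / len(x)
        b_v -= lr * d_r.mean(axis=0)
        b_h -= lr * d_h.mean(axis=0)
    return W, b_h  # the encoder defines the next stage's representation

# Toy usage: compute the learned hidden code of random binary "data".
data = (rng.random((200, 64)) < 0.3) * 1.0
W, b_h = train_denoising_autoencoder(data)
codes = sigmoid(data @ W + b_h)
```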


Each level of abstraction found in the brain consists of the “activation” (neural excitation) of a small subset of a large number of features that are, in general, not mutually exclusive. Because these features are not mutually exclusive, they form what is called a distributed representation [68, 156]: the information is not localized in a particular neuron but distributed across many. In addition to being distributed, it appears that the brain uses a representation that is sparse: only around 1–4% of the neurons are active together at a given time [5, 113]. Section 3.2 introduces the notion of sparse distributed representation and Section 7.1 describes in more detail the machine learning approaches, some inspired by the observations of the sparse representations in the brain, that have been used to build deep architectures with sparse representations.

Whereas dense distributed representations are one extreme of a spectrum, and sparse representations are in the middle of that spectrum, purely local representations are the other extreme. Locality of representation is intimately connected with the notion of local generalization. Many existing machine learning methods are local in input space: to obtain a learned function that behaves differently in different regions of data-space, they require different tunable parameters for each of these regions (see more in Section 3.1). Even though statistical efficiency is not necessarily poor when the number of tunable parameters is large, good generalization can be obtained only when adding some form of prior (e.g., that smaller values of the parameters are preferred). When that prior is not task-specific, it is often one that forces the solution to be very smooth, as discussed at the end of Section 3.1. In contrast to learning methods based on local generalization, the total number of patterns that can be distinguished using a distributed representation scales possibly exponentially with the dimension of the representation (i.e., the number of learned features).
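
A back-of-the-envelope illustration of this last claim: with n binary features, a purely local code (one dedicated unit per input region) distinguishes only n patterns, while a distributed code can in principle index 2^n of them.

```python
# Toy count of distinguishable patterns for local vs distributed codes.
n = 20
local_patterns = n             # one dedicated unit per pattern/region
distributed_patterns = 2 ** n  # every on/off combination of the n features
print(local_patterns, distributed_patterns)  # 20 vs 1048576
```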

In many machine vision systems, learning algorithms have been limited to specific parts of such a processing chain. The rest of the design remains labor-intensive, which might limit the scale of such systems. On the other hand, a hallmark of what we would consider intelligent machines includes a large enough repertoire of concepts. Recognizing MAN is not enough. We need algorithms that can tackle a very large set of such tasks and concepts. It seems daunting to manually define that many tasks, and learning becomes essential in this context. Furthermore, it would seem foolish not to exploit the underlying commonalities between these tasks and between the concepts they require. This has been the focus of research on multi-task learning [7, 8, 32, 88, 186].

Architectures with multiple levels naturally provide such sharing and re-use of components: the low-level visual features (like edge detectors) and intermediate-level visual features (like object parts) that are useful to detect MAN are also useful for a large group of other visual tasks. Deep learning algorithms are based on learning intermediate representations which can be shared across tasks. Hence they can leverage unsupervised data and data from similar tasks [148] to boost performance on large and challenging problems that routinely suffer from a poverty of labelled data, as has been shown by [37], beating the state-of-the-art in several natural language processing tasks. A similar multi-task approach for deep architectures was applied in vision tasks by [2].

Consider a multi-task setting in which there are different outputs for different tasks, all obtained from a shared pool of high-level features. The fact that many of these learned features are shared among m tasks provides sharing of statistical strength in proportion to m. Now consider that these learned high-level features can themselves be represented by combining lower-level intermediate features from a common pool. Again statistical strength can be gained in a similar way, and this strategy can be exploited for every level of a deep architecture.
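
A minimal sketch of this setting (all sizes and inputs are placeholders): m task-specific output heads read from one shared feature pool, so every task's gradient would update the shared parameters.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_shared, m_tasks = 64, 32, 5

W_shared = 0.01 * rng.standard_normal((n_in, n_shared))  # common feature pool
heads = [0.01 * rng.standard_normal((n_shared, 1)) for _ in range(m_tasks)]

x = rng.random((10, n_in))             # a batch of placeholder inputs
h = np.tanh(x @ W_shared)              # high-level features shared by all tasks
outputs = [h @ W_t for W_t in heads]   # one prediction per task
# Training all m heads would send m gradient signals into W_shared,
# pooling statistical strength in proportion to m.
```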

In addition, learning about a large set of interrelated concepts might provide a key to the kind of broad generalizations that humans appear able to do, which we would not expect from separately trained object detectors, with one detector per visual category. If each high-level category is itself represented through a particular distributed configuration of abstract features from a common pool, generalization to unseen categories could follow naturally from new configurations of these features. Even though only some configurations of these features would be present in the training examples, if they represent different aspects of the data, new examples could meaningfully be represented by new configurations of these features.


1.3 Desiderata for Learning AI

Summarizing some of the above issues, and trying to put them in the broader perspective of AI, we put forward a number of requirements we believe to be important for learning algorithms to approach AI, many of which motivate the research described here:

• Ability to learn complex, highly-varying functions, i.e., with a number of variations much greater than the number of training examples.
• Ability to learn with little human input the low-level, intermediate, and high-level abstractions that would be useful to represent the kind of complex functions needed for AI tasks.
• Ability to learn from a very large set of examples: computation time for training should scale well with the number of examples, i.e., close to linearly.
• Ability to learn from mostly unlabeled data, i.e., to work in the semi-supervised setting, where not all the examples come with complete and correct semantic labels.
• Ability to exploit the synergies present across a large number of tasks, i.e., multi-task learning. These synergies exist because all the AI tasks provide different views on the same underlying reality.
• Strong unsupervised learning (i.e., capturing most of the statistical structure in the observed data), which seems essential in the limit of a large number of tasks and when future tasks are not known ahead of time.

Other elements are equally important but are not directly connected to the material in this monograph. They include the ability to learn to represent context of varying length and structure [146], so as to allow machines to operate in a context-dependent stream of observations and produce a stream of actions, the ability to make decisions when actions influence the future observations and future rewards [181], and the ability to influence future observations so as to collect more relevant information about the world, i.e., a form of active learning [34].


1.4 Outline of the Paper

Section 2 reviews theoretical results (which can be skipped without hurting the understanding of the remainder) showing that an architecture with insufficient depth can require many more computational elements, potentially exponentially more (with respect to input size), than architectures whose depth is matched to the task.
