Full text available at DOI: 10.1561/2200000006

Learning Deep Architectures for AI

Yoshua Bengio
Dept. IRO, Université de Montréal
C.P. 6128, Montreal, Qc
Canada
yoshua.bengio@umontreal.ca

Boston – Delft
Foundations and Trends® in Machine Learning

Published, sold and distributed by:
now Publishers Inc.
PO Box 1024
Hanover, MA 02339 USA
Tel. +1-781-985-4510
sales@

Outside North America:
now Publishers Inc.
PO Box 179
2600 AD Delft, The Netherlands
Tel. +31-6-51115274
The preferred citation for this publication is Y. Bengio, Learning Deep Architectures for AI, Foundations and Trends® in Machine Learning, vol 2, no 1, pp 1–127, 2009.

ISBN: 978-1-60198-294-0
© 2009 Y. Bengio

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, mechanical, photocopying, recording or otherwise, without prior written permission of the publishers.

Photocopying. In the USA: This journal is registered at the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923. Authorization to photocopy items for internal or personal use, or the internal or personal use of specific clients, is granted by now Publishers Inc. for users registered with the Copyright Clearance Center (CCC). The 'services' for users can be found on the internet at:

For those organizations that have been granted a photocopy license, a separate system of payment has been arranged. Authorization does not extend to other kinds of copying, such as that for general distribution, for advertising or promotional purposes, for creating new collective works, or for resale. In the rest of the world: Permission to photocopy must be obtained from the copyright owner. Please apply to now Publishers Inc., PO Box 1024, Hanover, MA 02339, USA; Tel. +1-781-871-0245; sales@

now Publishers Inc. has an exclusive license to publish this material worldwide. Permission to use this content must be obtained from the copyright license holder. Please apply to now Publishers, PO Box 179, 2600 AD Delft, The Netherlands; e-mail: sales@
Foundations and Trends® in Machine Learning
Volume 2 Issue 1, 2009

Editorial Board

Editor-in-Chief:
Michael Jordan
Department of Electrical Engineering and Computer Science
Department of Statistics
University of California, Berkeley
Berkeley, CA 94720-1776

Editors
Peter Bartlett (UC Berkeley)
Yoshua Bengio (Université de Montréal)
Avrim Blum (Carnegie Mellon University)
Craig Boutilier (University of Toronto)
Stephen Boyd (Stanford University)
Carla Brodley (Tufts University)
Inderjit Dhillon (University of Texas at Austin)
Jerome Friedman (Stanford University)
Kenji Fukumizu (Institute of Statistical Mathematics)
Zoubin Ghahramani (Cambridge University)
David Heckerman (Microsoft Research)
Tom Heskes (Radboud University Nijmegen)
Geoffrey Hinton (University of Toronto)
Aapo Hyvarinen (Helsinki Institute for Information Technology)
Leslie Pack Kaelbling (MIT)
Michael Kearns (University of Pennsylvania)
Daphne Koller (Stanford University)
John Lafferty (Carnegie Mellon University)
Michael Littman (Rutgers University)
Gabor Lugosi (Pompeu Fabra University)
David Madigan (Columbia University)
Pascal Massart (Université de Paris-Sud)
Andrew McCallum (University of Massachusetts Amherst)
Marina Meila (University of Washington)
Andrew Moore (Carnegie Mellon University)
John Platt (Microsoft Research)
Luc de Raedt (Albert-Ludwigs Universitaet Freiburg)
Christian Robert (Université Paris-Dauphine)
Sunita Sarawagi (IIT Bombay)
Robert Schapire (Princeton University)
Bernhard Schoelkopf (Max Planck Institute)
Richard Sutton (University of Alberta)
Larry Wasserman (Carnegie Mellon University)
Bin Yu (UC Berkeley)
Editorial Scope

Foundations and Trends® in Machine Learning will publish survey and tutorial articles in the following topics:

- Adaptive control and signal processing
- Applications and case studies
- Behavioral, cognitive and neural learning
- Bayesian learning
- Classification and prediction
- Clustering
- Data mining
- Dimensionality reduction
- Evaluation
- Game theoretic learning
- Graphical models
- Independent component analysis
- Inductive logic programming
- Kernel methods
- Markov chain Monte Carlo
- Model choice
- Nonparametric methods
- Online learning
- Optimization
- Reinforcement learning
- Relational learning
- Robustness
- Spectral methods
- Statistical learning theory
- Variational inference
- Visualization

Information for Librarians

Foundations and Trends® in Machine Learning, 2009, Volume 2, 4 issues. ISSN paper version 1935-8237. ISSN online version 1935-8245. Also available as a combined paper and online subscription.
Foundations and Trends® in Machine Learning
Vol. 2, No. 1 (2009) 1–127
© 2009 Y. Bengio
DOI: 10.1561/2200000006

Learning Deep Architectures for AI

Yoshua Bengio

Dept. IRO, Université de Montréal, C.P. 6128, Montreal, Qc, H3C 3J7, Canada, yoshua.bengio@umontreal.ca

Abstract

Theoretical results suggest that in order to learn the kind of complicated functions that can represent high-level abstractions (e.g., in vision, language, and other AI-level tasks), one may need deep architectures. Deep architectures are composed of multiple levels of non-linear operations, such as in neural nets with many hidden layers or in complicated propositional formulae re-using many sub-formulae. Searching the parameter space of deep architectures is a difficult task, but learning algorithms such as those for Deep Belief Networks have recently been proposed to tackle this problem with notable success, beating the state-of-the-art in certain areas. This monograph discusses the motivations and principles regarding learning algorithms for deep architectures, in particular those exploiting as building blocks unsupervised learning of single-layer models such as Restricted Boltzmann Machines, used to construct deeper models such as Deep Belief Networks.
Contents

1 Introduction
1.1 How do We Train Deep Architectures?
1.2 Intermediate Representations: Sharing Features and Abstractions Across Tasks
1.3 Desiderata for Learning AI
1.4 Outline of the Paper

2 Theoretical Advantages of Deep Architectures
2.1 Computational Complexity
2.2 Informal Arguments

3 Local vs Non-Local Generalization
3.1 The Limits of Matching Local Templates
3.2 Learning Distributed Representations

4 Neural Networks for Deep Architectures
4.1 Multi-Layer Neural Networks
4.2 The Challenge of Training Deep Neural Networks
4.3 Unsupervised Learning for Deep Architectures
4.4 Deep Generative Architectures
4.5 Convolutional Neural Networks
4.6 Auto-Encoders

5 Energy-Based Models and Boltzmann Machines
5.1 Energy-Based Models and Products of Experts
5.2 Boltzmann Machines
5.3 Restricted Boltzmann Machines
5.4 Contrastive Divergence

6 Greedy Layer-Wise Training of Deep Architectures
6.1 Layer-Wise Training of Deep Belief Networks
6.2 Training Stacked Auto-Encoders
6.3 Semi-Supervised and Partially Supervised Training

7 Variants of RBMs and Auto-Encoders
7.1 Sparse Representations in Auto-Encoders and RBMs
7.2 Denoising Auto-Encoders
7.3 Lateral Connections
7.4 Conditional RBMs and Temporal RBMs
7.5 Factored RBMs
7.6 Generalizing RBMs and Contrastive Divergence

8 Stochastic Variational Bounds for Joint Optimization of DBN Layers
8.1 Unfolding RBMs into Infinite Directed Belief Networks
8.2 Variational Justification of Greedy Layer-wise Training
8.3 Joint Unsupervised Training of All the Layers

9 Looking Forward
9.1 Global Optimization Strategies
9.2 Why Unsupervised Learning is Important
9.3 Open Questions

10 Conclusion

Acknowledgments
References
1 Introduction

Allowing computers to model our world well enough to exhibit what we call intelligence has been the focus of more than half a century of research. To achieve this, it is clear that a large quantity of information about our world should somehow be stored, explicitly or implicitly, in the computer. Because it seems daunting to formalize manually all that information in a form that computers can use to answer questions and generalize to new contexts, many researchers have turned to learning algorithms to capture a large fraction of that information. Much progress has been made to understand and improve learning algorithms, but the challenge of artificial intelligence (AI) remains. Do we have algorithms that can understand scenes and describe them in natural language? Not really, except in very limited settings. Do we have algorithms that can infer enough semantic concepts to be able to interact with most humans using these concepts? No. If we consider image understanding, one of the best specified of the AI tasks, we realize that we do not yet have learning algorithms that can discover the many visual and semantic concepts that would seem to be necessary to interpret most images on the web. The situation is similar for other AI tasks.
Fig. 1.1 We would like the raw input image to be transformed into gradually higher levels of representation, representing more and more abstract functions of the raw input, e.g., edges, local shapes, object parts, etc. In practice, we do not know in advance what the "right" representation should be for all these levels of abstractions, although linguistic concepts might help guessing what the higher levels should implicitly represent.
Consider for example the task of interpreting an input image such as the one in Figure 1.1.
When humans try to solve a particular AI task (such as machine vision or natural language processing), they often exploit their intuition about how to decompose the problem into sub-problems and multiple levels of representation, e.g., in object parts and constellation models [138, 179, 197] where models for parts can be re-used in different object instances. For example, the current state-of-the-art in machine vision involves a sequence of modules starting from pixels and ending in a linear or kernel classifier [134, 145], with intermediate modules mixing engineered transformations and learning, e.g., first extracting low-level features that are invariant to small geometric variations (such as edge detectors from Gabor filters), transforming them gradually (e.g., to make them invariant to contrast changes and contrast inversion, sometimes by pooling and sub-sampling), and then detecting the most frequent patterns. A plausible and common way to extract useful information from a natural image involves transforming the raw pixel representation into gradually more abstract representations, e.g., starting from the presence of edges, the detection of more complex but local shapes, up to the identification of abstract categories associated with sub-objects and objects which are parts of the image, and putting all these together to capture enough understanding of the scene to answer questions about it.
Here, we assume that the computational machinery necessary to express complex behaviors (which one might label "intelligent") requires highly varying mathematical functions, i.e., mathematical functions that are highly non-linear in terms of raw sensory inputs, and display a very large number of variations (ups and downs) across the domain of interest. We view the raw input to the learning system as a high dimensional entity, made of many observed variables, which are related by unknown intricate statistical relationships. For example, using knowledge of the 3D geometry of solid objects and lighting, we can relate small variations in underlying physical and geometric factors (such as position, orientation, lighting of an object) with changes in pixel intensities for all the pixels in an image. We call these factors of variation because they are different aspects of the data that can vary separately and often independently. In this case, explicit knowledge of the physical factors involved allows one to get a picture of the mathematical form of these dependencies, and of the shape of the set of images (as points in a high-dimensional space of pixel intensities) associated with the same 3D object. If a machine captured the factors that explain the statistical variations in the data, and how they interact to generate the kind of data we observe, we would be able to say that the machine understands those aspects of the world covered by these factors of variation. Unfortunately, in general and for most factors of variation underlying natural images, we do not have an analytical understanding of these factors of variation. We do not have enough formalized prior knowledge about the world to explain the observed variety of images, even for such an apparently simple abstraction as MAN, illustrated in Figure 1.1.
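To make the notion of factors of variation concrete, here is a toy sketch, invented for this discussion: the rendering function, the choice of factors (position and intensity), and the sizes are all illustrative assumptions, not taken from the text. Two low-dimensional factors generate a 64-dimensional pixel vector, so images of "the same thing" trace out a low-dimensional set in pixel space.

```python
# Toy "factors of variation" illustration (hypothetical example): a few
# underlying factors generate a high-dimensional pixel vector.
import numpy as np

def render(position, intensity, width=64):
    """Render a blurry bright spot on a 1-D 'retina' of `width` pixels."""
    pixels = np.arange(width)
    return intensity * np.exp(-0.5 * ((pixels - position) / 2.0) ** 2)

img_a = render(position=20.0, intensity=1.0)
img_b = render(position=21.0, intensity=0.9)   # slightly different factors
print(img_a.shape)                              # (64,): high-dimensional data
print(round(float(np.linalg.norm(img_a - img_b)), 3))  # small pixel change
```

Knowing `render` makes the dependence of pixels on factors analytically transparent; for natural images no such formula is available, which is exactly the difficulty described above.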
A high-level abstraction such as MAN has the property that it corresponds to a very large set of possible images, which might be very different from each other from the point of view of simple Euclidean distance in the space of pixel intensities. The set of images for which that label could be appropriate forms a highly convoluted region in pixel space that is not even necessarily a connected region. The MAN category can be seen as a high-level abstraction with respect to the space of images. What we call abstraction here can be a category (such as the MAN category) or a feature, a function of sensory data, which can be discrete (e.g., the input sentence is in the past tense) or continuous (e.g., the input video shows an object moving at 2 meters/second). Many lower-level and intermediate-level concepts (which we also call abstractions here) would be useful to construct a MAN-detector. Lower level abstractions are more directly tied to particular percepts, whereas higher level ones are what we call "more abstract" because their connection to actual percepts is more remote, and through other, intermediate-level abstractions.
In addition to the difficulty of coming up with the appropriate intermediate abstractions, the number of visual and semantic categories (such as MAN) that we would like an "intelligent" machine to capture is rather large. The focus of deep architecture learning is to automatically discover such abstractions, from the lowest level features to the highest level concepts. Ideally, we would like learning algorithms that enable this discovery with as little human effort as possible, i.e., without having to manually define all necessary abstractions or having to provide a huge set of relevant hand-labeled examples. If these algorithms could tap into the huge resource of text and images on the web, it would certainly help to transfer much of human knowledge into machine-interpretable form.
1.1 How do We Train Deep Architectures?

Deep learning methods aim at learning feature hierarchies, with features from higher levels of the hierarchy formed by the composition of lower level features. Automatically learning features at multiple levels of abstraction allows a system to learn complex functions mapping the input to the output directly from data, without depending completely on human-crafted features. This is especially important for higher-level abstractions, which humans often do not know how to specify explicitly in terms of raw sensory input. The ability to automatically learn powerful features will become increasingly important as the amount of data and the range of applications of machine learning methods continue to grow.
Depth of architecture refers to the number of levels of composition of non-linear operations in the function learned. Whereas most current learning algorithms correspond to shallow architectures (1, 2 or 3 levels), the mammal brain is organized in a deep architecture [173] with a given input percept represented at multiple levels of abstraction, each level corresponding to a different area of cortex. Humans often describe such concepts in hierarchical ways, with multiple levels of abstraction. The brain also appears to process information through multiple stages of transformation and representation. This is particularly clear in the primate visual system [173], with its sequence of processing stages: detection of edges, primitive shapes, and moving up to gradually more complex visual shapes.

Inspired by the architectural depth of the brain, neural network researchers had wanted for decades to train deep multi-layer neural networks [19, 191], but no successful attempts were reported before 2006 (except for neural networks with a special structure called convolutional networks, discussed in Section 4.5): researchers reported positive experimental results with typically two or three levels (i.e., one or two hidden layers), but training deeper networks consistently yielded poorer results. Something that can be considered a breakthrough happened in 2006: Hinton et al. at University of Toronto introduced Deep Belief Networks (DBNs) [73], with a learning algorithm that greedily trains one layer at a time, exploiting an unsupervised learning algorithm for each layer, a Restricted Boltzmann Machine (RBM) [51]. Shortly after, related algorithms based on auto-encoders were proposed [17, 153], apparently exploiting the same principle: guiding the training of intermediate levels of representation using unsupervised learning, which can be performed locally at each level. Other algorithms for deep architectures were proposed more recently that exploit neither RBMs nor auto-encoders and that exploit the same principle [131, 202] (see Section 4).
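The following sketch illustrates this greedy layer-wise principle in code. It is a minimal illustration, not the implementation from [73]: binary units, a single step of contrastive divergence (CD-1, discussed in Section 5.4), plain per-example updates, and the layer sizes, learning rate, and toy data are all illustrative assumptions.

```python
# Minimal greedy layer-wise pretraining sketch (illustrative): train an
# RBM on the data with CD-1, then train the next RBM on the first one's
# hidden activations, and so on, one level at a time.
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_rbm(data, n_hidden, lr=0.1, epochs=5):
    """Train one RBM with one step of contrastive divergence (CD-1)."""
    n_visible = data.shape[1]
    W = 0.01 * rng.standard_normal((n_visible, n_hidden))
    b_v, b_h = np.zeros(n_visible), np.zeros(n_hidden)
    for _ in range(epochs):
        for v0 in data:
            p_h0 = sigmoid(v0 @ W + b_h)                   # positive phase
            h0 = (rng.random(n_hidden) < p_h0).astype(float)
            p_v1 = sigmoid(h0 @ W.T + b_v)                 # one reconstruction
            p_h1 = sigmoid(p_v1 @ W + b_h)
            W += lr * (np.outer(v0, p_h0) - np.outer(p_v1, p_h1))
            b_v += lr * (v0 - p_v1)
            b_h += lr * (p_h0 - p_h1)
    return W, b_h

def greedy_pretrain(data, layer_sizes):
    """Stack RBMs: each level is trained on the level below's features."""
    layers, x = [], data
    for n_hidden in layer_sizes:
        W, b_h = train_rbm(x, n_hidden)
        layers.append((W, b_h))
        x = sigmoid(x @ W + b_h)   # propagate the data up one level
    return layers

toy_data = (rng.random((200, 16)) < 0.3).astype(float)  # toy binary data
stack = greedy_pretrain(toy_data, layer_sizes=[12, 8])
print([W.shape for W, _ in stack])  # [(16, 12), (12, 8)]
```

The key property is that each RBM sees only the output of the level below it, so the unsupervised training criterion is local to each layer.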
Since 2006, deep networks have been applied with success not only in classification tasks [2, 17, 99, 111, 150, 153, 195], but also in regression [160], dimensionality reduction [74, 158], modeling textures [141], modeling motion [182, 183], object segmentation [114], information retrieval [154, 159, 190], robotics [60], natural language processing [37, 130, 202], and collaborative filtering [162]. Although auto-encoders, RBMs and DBNs can be trained with unlabeled data, in many of the above applications, they have been successfully used to initialize deep supervised feedforward neural networks applied to a specific task.
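A minimal sketch of this usage pattern follows, continuing the `greedy_pretrain` example above; the output layer, label set, and training loop are illustrative assumptions, not a prescribed recipe. The pretrained weights initialize the hidden layers of a feedforward net, a randomly initialized task-specific output layer is added, and the whole network is then fine-tuned by supervised gradient descent.

```python
# Continues the sketch above: `stack` holds the pretrained (W, b) pairs
# and `toy_data` the unlabeled inputs; `toy_labels` are hypothetical.
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def fine_tune(stack, X, y, n_classes, lr=0.1, epochs=10):
    """Initialize a feedforward net from pretrained RBMs, then backprop."""
    Ws = [W.copy() for W, _ in stack]
    bs = [b.copy() for _, b in stack]
    Ws.append(0.01 * rng.standard_normal((Ws[-1].shape[1], n_classes)))
    bs.append(np.zeros(n_classes))
    for _ in range(epochs):
        # Forward pass: pretrained sigmoid layers, then a softmax output.
        acts = [X]
        for W, b in zip(Ws[:-1], bs[:-1]):
            acts.append(sigmoid(acts[-1] @ W + b))
        p = softmax(acts[-1] @ Ws[-1] + bs[-1])
        # Backward pass: cross-entropy gradient through all layers.
        delta = p.copy()
        delta[np.arange(len(y)), y] -= 1.0
        delta /= len(y)
        for i in range(len(Ws) - 1, -1, -1):
            gW, gb = acts[i].T @ delta, delta.sum(axis=0)
            if i > 0:  # propagate before updating this layer's weights
                delta = (delta @ Ws[i].T) * acts[i] * (1.0 - acts[i])
            Ws[i] -= lr * gW
            bs[i] -= lr * gb
    return Ws, bs

toy_labels = rng.integers(0, 2, size=len(toy_data))  # hypothetical labels
Ws, bs = fine_tune(stack, toy_data, toy_labels, n_classes=2)
```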
1.2 Intermediate Representations: Sharing Features and Abstractions Across Tasks

Since a deep architecture can be seen as the composition of a series of processing stages, the immediate question that deep architectures raise is: what kind of representation of the data should be found as the output of each stage (i.e., the input of another)? What kind of interface should there be between these stages? A hallmark of recent research on deep architectures is the focus on these intermediate representations: the success of deep architectures belongs to the representations learned in an unsupervised way by RBMs [73], ordinary auto-encoders [17], sparse auto-encoders [150, 153], or denoising auto-encoders [195]. These algorithms (described in more detail in Section 7.2) can be seen as learning to transform one representation (the output of the previous stage) into another, at each step maybe disentangling better the factors of variations underlying the data. As we discuss at length in Section 4, it has been observed again and again that once a good representation has been found at each level, it can be used to initialize and successfully train a deep neural network by supervised gradient-based optimization.
Each level of abstraction found in the brain consists of the "activation" (neural excitation) of a small subset of a large number of features that are, in general, not mutually exclusive. Because these features are not mutually exclusive, they form what is called a distributed representation [68, 156]: the information is not localized in a particular neuron but distributed across many. In addition to being distributed, it appears that the brain uses a representation that is sparse: only around 1-4% of the neurons are active together at a given time [5, 113]. Section 3.2 introduces the notion of sparse distributed representation and Section 7.1 describes in more detail the machine learning approaches, some inspired by the observations of the sparse representations in the brain, that have been used to build deep architectures with sparse representations.

Whereas dense distributed representations are one extreme of a spectrum, and sparse representations are in the middle of that spectrum, purely local representations are the other extreme. Locality of representation is intimately connected with the notion of local generalization. Many existing machine learning methods are local in input space: to obtain a learned function that behaves differently in different regions of data-space, they require different tunable parameters for each of these regions (see more in Section 3.1). Even though statistical efficiency is not necessarily poor when the number of tunable parameters is large, good generalization can be obtained only when adding some form of prior (e.g., that smaller values of the parameters are preferred). When that prior is not task-specific, it is often one that forces the solution to be very smooth, as discussed at the end of Section 3.1. In contrast to learning methods based on local generalization, the total number of patterns that can be distinguished using a distributed representation scales possibly exponentially with the dimension of the representation (i.e., the number of learned features).
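A small worked contrast (with illustrative numbers) makes this scaling explicit: with n binary units, a purely local (one-hot) code distinguishes only n patterns, while a distributed code over the same n units distinguishes 2^n.

```python
# Local vs. distributed codes over the same number of units
# (illustrative numbers, not from the text).
from itertools import product

n = 10
local_codes = [tuple(int(i == j) for j in range(n)) for i in range(n)]
distributed_codes = list(product([0, 1], repeat=n))
print(len(local_codes))        # 10 distinguishable patterns
print(len(distributed_codes))  # 1024 = 2**10 distinguishable patterns
```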
In many machine vision systems, learning algorithms have been limited to specific parts of such a processing chain. The rest of the design remains labor-intensive, which might limit the scale of such systems. On the other hand, a hallmark of what we would consider intelligent machines includes a large enough repertoire of concepts. Recognizing MAN is not enough. We need algorithms that can tackle a very large set of such tasks and concepts. It seems daunting to manually define that many tasks, and learning becomes essential in this context. Furthermore, it would seem foolish not to exploit the underlying commonalities between these tasks and between the concepts they require. This has been the focus of research on multi-task learning [7, 8, 32, 88, 186]. Architectures with multiple levels naturally provide such sharing and re-use of components: the low-level visual features (like edge detectors) and intermediate-level visual features (like object parts) that are useful to detect MAN are also useful for a large group of other visual tasks. Deep learning algorithms are based on learning intermediate representations which can be shared across tasks. Hence they can leverage unsupervised data and data from similar tasks [148] to boost performance on large and challenging problems that routinely suffer from a poverty of labelled data, as has been shown by [37], beating the state-of-the-art in several natural language processing tasks. A similar multi-task approach for deep architectures was applied in vision tasks by [2].

Consider a multi-task setting in which there are different outputs for different tasks, all obtained from a shared pool of high-level features. The fact that many of these learned features are shared among m tasks provides sharing of statistical strength in proportion to m. Now consider that these learned high-level features can themselves be represented by combining lower-level intermediate features from a common pool. Again statistical strength can be gained in a similar way, and this strategy can be exploited for every level of a deep architecture.
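The following sketch (with hypothetical sizes; not an implementation from the text) shows the structure being described: m task outputs all read from one shared feature pool, so during training each task's gradient would update the same shared parameters.

```python
# Multi-task sharing sketch (hypothetical sizes): every task head reads
# the same learned feature pool, so gradients from all m tasks would
# flow into W_shared during training.
import numpy as np

rng = np.random.default_rng(2)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

n_input, n_shared, m_tasks = 16, 8, 3
W_shared = 0.01 * rng.standard_normal((n_input, n_shared))   # shared pool
w_heads = [0.01 * rng.standard_normal(n_shared) for _ in range(m_tasks)]

x = rng.random(n_input)          # one input example
h = sigmoid(x @ W_shared)        # high-level features shared by all tasks
outputs = [float(h @ w) for w in w_heads]  # one output per task
print(outputs)
```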
In addition, learning about a large set of interrelated concepts might provide a key to the kind of broad generalizations that humans appear able to do, which we would not expect from separately trained object detectors, with one detector per visual category. If each high-level category is itself represented through a particular distributed configuration of abstract features from a common pool, generalization to unseen categories could follow naturally from new configurations of these features. Even though only some configurations of these features would be present in the training examples, if they represent different aspects of the data, new examples could meaningfully be represented by new configurations of these features.
1.3 Desiderata for Learning AI

Summarizing some of the above issues, and trying to put them in the broader perspective of AI, we put forward a number of requirements we believe to be important for learning algorithms to approach AI, many of which motivate the research described here:

- Ability to learn complex, highly-varying functions, i.e., with a number of variations much greater than the number of training examples.
- Ability to learn with little human input the low-level, intermediate, and high-level abstractions that would be useful to represent the kind of complex functions needed for AI tasks.
- Ability to learn from a very large set of examples: computation time for training should scale well with the number of examples, i.e., close to linearly.
- Ability to learn from mostly unlabeled data, i.e., to work in the semi-supervised setting, where not all the examples come with complete and correct semantic labels.
- Ability to exploit the synergies present across a large number of tasks, i.e., multi-task learning. These synergies exist because all the AI tasks provide different views on the same underlying reality.
- Strong unsupervised learning (i.e., capturing most of the statistical structure in the observed data), which seems essential in the limit of a large number of tasks and when future tasks are not known ahead of time.
Other elements are equally important but are not directly connected to the material in this monograph. They include the ability to learn to represent context of varying length and structure [146], so as to allow machines to operate in a context-dependent stream of observations and produce a stream of actions, the ability to make decisions when actions influence the future observations and future rewards [181], and the ability to influence future observations so as to collect more relevant information about the world, i.e., a form of active learning [34].
1.4 Outline of the Paper

Section 2 reviews theoretical results (which can be skipped without hurting the understanding of the remainder) showing that an architecture with insufficient depth can require many more computational elements, potentially exponentially more, than an architecture whose depth is matched to the task.
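A classic instance of this kind of depth/size trade-off (a standard example, offered here only as an illustration) is d-bit parity: a balanced tree of two-input XOR gates computes it with d - 1 gates at depth about log2(d), whereas a depth-2 formula (an OR of ANDs) needs one AND term per odd-parity input pattern, i.e., 2^(d-1) terms.

```python
# Deep vs. shallow circuits on d-bit parity (illustrative counting).
from itertools import product

d = 8

# Deep circuit: a balanced XOR tree uses d - 1 two-input gates.
deep_gates = d - 1

# Shallow depth-2 (OR of ANDs) circuit: one AND term per odd-parity
# input pattern, i.e., 2**(d - 1) terms.
shallow_terms = sum(1 for bits in product([0, 1], repeat=d)
                    if sum(bits) % 2 == 1)

print(f"XOR tree of depth ~log2({d}): {deep_gates} gates")
print(f"depth-2 formula: {shallow_terms} AND terms")  # 2**(d-1) = 128
```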