




Outline
• Quick overview and background
• Differentiable data-structure
• Learning paradigms
• Neuro-symbolism
• Conclusion
Part-I:
Overview and Background
Overview
• Background: word embedding and composition models
• Progress in terms of tasks:
– Machine translation
– Dialogue
– Reasoning
– Image captioning
– Natural language parsing
– ……
• Progress in terms of methodology (focus of this tutorial):
– Attention models
– External memories
– Differentiable data structures
– End-to-end learning
– Distributed representation of words
Distributed representation of words
• Using high-dimensional real-valued vectors to represent the meaning of words
[Figure: word vectors for dog, puppy, cat, kitten in the embedding space]
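The geometric intuition can be sketched with toy vectors: words with related meanings (dog/puppy, cat/kitten) get vectors pointing in similar directions. The 4-dimensional embeddings below are made up for illustration only; real systems learn vectors with hundreds of dimensions from data.

```python
# Toy illustration only: hand-made "embeddings" standing in for learned
# high-dimensional word vectors.
import numpy as np

embeddings = {
    "dog":    np.array([0.8, 0.1, 0.6, 0.0]),
    "puppy":  np.array([0.7, 0.2, 0.7, 0.1]),
    "cat":    np.array([0.1, 0.9, 0.1, 0.6]),
    "kitten": np.array([0.2, 0.8, 0.2, 0.7]),
}

def cosine(u, v):
    # Cosine similarity: 1.0 = same direction, 0.0 = orthogonal
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Related words end up closer in direction than unrelated ones
sim_dog_puppy = cosine(embeddings["dog"], embeddings["puppy"])
sim_dog_cat = cosine(embeddings["dog"], embeddings["cat"])
```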
Distributed representation of sentences
• Surprisingly, we can use long-enough vectors to represent the meaning of sentences
[Figure: sentence vectors for “John loves Mary”, “Mary is loved by John”, “Mary loves John”]
Composition Models
• From words to sentences and beyond
• Architectures that model the syntax and semantics of the sentence
• Two basic architectures:
– Convolutional Neural Net (CNN)
– Recurrent Neural Net (RNN)
[Figure: CNN and RNN composing “Mary loves John E-O-S”]
Composition Models (cont’d)
• Convolutional Neural Net (CNN)
– Bottom-up and soft parsing of the sentence, creating an ensemble of parse trees
– Constructs all possible compositions, and chooses the most suitable ones using pooling (or gating)
– Needs some tricks to handle the variable lengths of sentences
• Recurrent Neural Net (RNN)
– Treats the sentence as a sequence of words, left to right and/or right to left
– Recursively applies the same (one for all) composition, between the history and the next word
– Different gating mechanisms have been proven useful, e.g., LSTM or GRU
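The recurrent composition can be sketched as below: the same transition function is reused between the history vector and each next word. The weights and word vectors are random stand-ins (a trained model would learn them, and would use LSTM/GRU gating rather than this plain tanh step).

```python
# Minimal sketch of left-to-right recurrent composition; all numbers
# are random placeholders for learned parameters and embeddings.
import numpy as np

rng = np.random.default_rng(0)
d = 8  # hidden/embedding size (arbitrary for the sketch)
W_h = rng.normal(scale=0.5, size=(d, d))  # history -> new history
W_x = rng.normal(scale=0.5, size=(d, d))  # next word -> new history

def compose(history, word_vec):
    # One recurrent step: the same function is applied at every position
    return np.tanh(W_h @ history + W_x @ word_vec)

# "Mary loves John E-O-S" as four random word vectors
sentence = [rng.normal(size=d) for _ in range(4)]
h = np.zeros(d)  # empty history
for w in sentence:
    h = compose(h, w)  # left-to-right composition
# h is now a fixed-length representation of the whole sentence
```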
Neural Machine Translation (NMT)
• Sequence-to-sequence learning with encoder-decoder framework
• Gated RNN (e.g., LSTM, GRU) as encoder and decoder
• Attention model for automatic alignment
• End-to-end learning
• Other tricks (e.g., large vocabulary, etc.) for further improvement
• Outperforming SMT
Neural Machine Translation (cont’d)
• Comparing with statistical machine translation (SMT):

SRC: 人类永久和平和稳定的一天即将到来。(The day of lasting peace and stability for mankind is about to come.)
SMT: Mankind permanent peace and stability in the days to come.
NMT: Peace will soon come in mankind.

SRC: 欧安组织的成员包括北美、欧洲和亚洲的五十五个国家,明年将庆祝成立三十周年。(OSCE members include 55 countries from North America, Europe and Asia; next year it will celebrate its 30th anniversary.)
SMT: the osce members including north america, europe and asia, the 55 countries next year will mark the 30th anniversary of the establishment.
NMT: the organization of security includes 55 countries from north america, europe and asia, and next year it will celebrate its 30th anniversary.

• Better fluency and sentence-level fidelity; catching up on other aspects (e.g., idioms, etc.)
• Interesting topic: how to combine NMT and SMT
Neural Dialogue
• Generation-based approach vs. retrieval-based approach (traditional)
• Single-turn dialogue and multi-turn dialogue
• Single-turn dialogue
– Sequence-to-sequence learning with encoder-decoder framework
– 95% of responses are natural sentences, and 76% are reasonable replies, trained with social media data (Shang et al., 2015)
• Multi-turn dialogue
– Task-specific dialogue
– Multiple networks, end-to-end learning
– Success rate is 98% for question answering about restaurants (Wen et al., 2016)
• No human effort, purely data-driven, not possible before
Neural Dialogue Models
• Sequence-to-sequence learning for single-turn dialogue
• Neural Responding Machine (NRM) (Shang et al., 2015)
– Encoder to encode the input message, decoder to decode it into the output response
– Combination of a local context model (attention model) and a global context model
– End-to-end learning

Examples of single-turn dialogue between humans (H) and NRM (M), trained with Weibo data:
H: 占中终于结束了。(Occupy Central is finally over.)
M: 下一个是陆家嘴吧?(Will Lujiazui (finance district in Shanghai) be the next?)
H: 我想买三星手机。(I want to buy a Samsung phone.)
M: 还是支持一下国产的吧。(Why not buy our national brands?)

[Figure: Architecture of NRM, combination of local and global representation]
Neural Dialogue Models (cont’d)
• Sequence-to-sequence learning for multi-turn dialogue (Vinyals & Le, 2015)
• Two RNNs: one RNN (encoder) to encode the context (previous sentences), the other RNN (decoder) to decode it into the response
• End-to-end learning
[Figure: sequence-to-sequence learning, encoding context and generating response; examples of dialogue between humans and the model, trained from movie subtitle data]
Neural Dialogue Models (cont’d)
• Task-dependent multi-turn dialogue (Wen et al., 2016), & several other models
• Architecture
– Intent network, belief state tracker, policy network, generation network
• Complicated task controlled by networks, trained in end-to-end fashion
[Figure: system architecture; example of multi-turn dialogue between human and system, restaurant domain]
Neural Reasoning
• Reasoning over facts
– Facts and questions are in natural language
– bAbI dataset (Weston et al., 2015)
– External memory, step-by-step or end-to-end learning
• Natural logic reasoning
– Learn to conduct natural logic inference
– Reasoning about semantic relations
– E.g., from turtle≺reptile, reptile≺animal, to infer turtle≺animal
Neural Reasoning Models
• Memory Networks (Weston et al., 2014; Sukhbaatar et al., 2015)
• Reasoning over facts, language modeling, etc.
• Attention model, external memory
• Architecture
– Multiple layers (steps) of processing
– Write and read intermediate results into external memory, with the networks trained in end-to-end fashion
[Figure: Architecture of Memory Networks, (a) one layer, (b) multiple layers]
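One memory "hop" in the spirit of these models can be sketched as attention over embedded facts followed by a weighted read. All embeddings below are random stand-ins for learned ones, and the variable names are ours, not from a specific implementation.

```python
# One soft "hop": score the query against every memory row, normalize
# into a distribution, read the expectation, combine with the query.
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(1)
d, n_facts = 16, 5
memory = rng.normal(size=(n_facts, d))  # one embedded fact per row
query = rng.normal(size=d)              # embedded question

p = softmax(memory @ query)  # soft address: distribution over facts
o = p @ memory               # read: weighted sum of fact embeddings
u_next = query + o           # state entering the next layer/hop
```

Stacking this step several times gives the "multiple layers (steps) of processing" of the slide, with each hop attending over the same external memory.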
Neural Reasoning Models (cont’d)
• Reasoning over facts
• Neural Reasoner (Peng et al., 2015)
• External memory, differentiable data structure, end-to-end learning
• Architecture
– Encoding layer
– Reasoning layers
– Answer layer
• Write and read intermediate results into external memory, controlled by networks trained in end-to-end fashion
[Figure: Architecture of Neural Reasoner]
Neural Reasoning Models (cont’d)
• Model for reasoning using natural logic (Bowman et al., 2015)
• Compares two concepts or terms (distributed representations)
• Model: deep neural network or deep tensor network
• Learns distributed representations of concepts through reasoning with natural logic
[Figure: Architecture of the model]
Reference (part I)
• [Graves et al., 2014] Neural Turing Machines. Alex Graves, Greg Wayne, Ivo Danihelka.
• [Wen et al., 2016] A Network-based End-to-End Trainable Task-oriented Dialogue System. Tsung-Hsien Wen, Milica Gašić, Nikola Mrkšić, Lina M. Rojas-Barahona, Pei-Hao Su, Stefan Ultes, David Vandyke, Steve Young.
• [Weston et al., 2015] Memory Networks. Jason Weston, Sumit Chopra & Antoine Bordes.
• [Shang et al., 2015] Neural responding machine for short-text conversation. Lifeng Shang, Zhengdong Lu, and Hang Li.
• [Vinyals et al., 2014] Sequence to sequence learning with neural networks. Ilya Sutskever, Oriol Vinyals, and Quoc V. Le.
• [Vinyals & Le, 2015] A neural conversational model. Oriol Vinyals and Quoc Le.
• [Kumar et al., 2015] Ask Me Anything: Dynamic Memory Networks for Natural Language Processing. Ankit Kumar, Ozan Irsoy, Peter Ondruska, Mohit Iyyer, James Bradbury, Ishaan Gulrajani, Victor Zhong, Romain Paulus, Richard Socher.
• [Sukhbaatar et al., 2015] End-to-end memory networks. S. Sukhbaatar, J. Weston, R. Fergus.
• [Bowman et al., 2014] Recursive neural networks can learn logical semantics. S. Bowman, C. Potts, and C. Manning.
• [Weston et al., 2015] Towards AI-complete question answering: A set of prerequisite toy tasks. J. Weston, A. Bordes, S. Chopra, T. Mikolov.
• [Peng et al., 2015] Towards Neural Network-based Reasoning. Baolin Peng, Zhengdong Lu, Hang Li, Kam-Fai Wong.
• [Ma et al., 2015] Learning to Answer Questions From Image Using Convolutional Neural Network. Lin Ma, Zhengdong Lu, Hang Li.
• [Bahdanau et al., 2014] Neural machine translation by jointly learning to align and translate. Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio.
Part-II:
Differentiable Data-structures
Differentiable Data-structure: Outline
• What is differentiable data-structure? A general formulation
• Memory: types and structures
• Addressing strategies
• Examples
• Concluding remarks
What is differentiable data-structure?
• A differentiable data-structure is a memory-like structure which can be controlled by a neural network system, with the following properties:
– It can be used to perform rather complicated operations
– All the operations that can be “tuned”, including the read and/or write to the memory, are differentiable (“so you can just do back-propagation”)
• Representative examples:
– Neural Turing Machine (a general take)
– RNNsearch (automatic alignment for M.T.)
– Memory Network (a different take on memory setting)
– Neural Random Access Machine (smart design for “pointers”)
What is NOT differentiable data-structure?
(Too) many examples:
• Hard attention
– e.g., hard attention on images (Xu et al., 2015), for which we have to resort to reinforcement learning or variational methods
• Varying number of memory cells
– Typically the number of memory cells is not part of the optimization, since it cannot be directly optimized via back-propagation
• Other structural operations
– e.g., changing the order of two sub-sequences (Guo, 2015), replacing some sub-sequence with others, locating some sub-sequences and storing them somewhere, etc.
• Other symbolic stuff
– e.g., using symbols (discrete classes) in intermediate representations
In a sense, the design of differentiable data-structures is to find a way around these limitations, without sacrificing too much efficiency.
Differentiable Data-structure: Outline
• What is differentiable data-structure? A general formulation
• Memory: types and structures
• Addressing strategies
• Examples
• Concluding remarks
Neural Turing Machine (Graves et al., 2014)
A general formulation:
• Controller: typically a gated RNN, controlling the read-write operations
• Read head: reads the content of the memory at the address given by the controller, and returns the reading result to the controller
• Write head: writes to memory with the content and address both determined by the controller
• Tape: the memory, typically a matrix
Hard addressing for reading/writing
• Reading:
– Determine the cell to read from (addressing)
– Get the content of the selected cell
• Writing:
– Determine the cell to write to (addressing)
– Modify the content of it, typically with some forgetting factor:
new content = f_erase(old content) + (content to add)
(“erase something, add something”)
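The erase-then-add update above can be sketched in a few lines. The per-dimension erase vector and the function names are our illustrative choices, not a specific published parameterization.

```python
# Erase-then-add write to one memory cell.
import numpy as np

def write_cell(m_old, erase, add):
    # Forget a fraction of the old content, then add new content
    return (1.0 - erase) * m_old + add

m_old = np.array([1.0, 2.0, 3.0])
erase = np.array([1.0, 0.5, 0.0])  # forget all / half / nothing
add = np.array([0.1, 0.1, 0.1])
m_new = write_cell(m_old, erase, add)  # -> [0.1, 1.1, 3.1]
```

With erase and add produced by the controller (and squashed into [0, 1] and the content range respectively), the whole write is differentiable in the controller's parameters.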
Soft addressing for reading/writing
• With soft addressing, the system doesn’t pinpoint a memory cell; instead, it gives a distribution over the cells it is going to read from or write to
• So the reading/writing occurs on essentially all memory cells, with different weights (given in the distribution)
• It can be viewed as an expectation (weighted sum) of all the possible read/write actions
The essence of differentiable data-structures
• The essence of differentiable data-structure is to represent the distribution properly, including
– every tunable component in the system
– the joint distribution when multiple components meet
• so the supervision can go through the entire system to
– increase the probabilities of the promising candidates,
– decrease the probabilities of the poor candidates,
– without killing anyone
• When we have to explicitly model many discrete operations, we often need to generate the representation of all of them and do a weighted average
You have to customize your own model
• The generic NTM won’t work for most real-world tasks (“Try a machine translation task, and then you will see”)
• You have to design your own model (in a sense, probably a special case of Neural Turing Machine) to put in your domain knowledge
– For better inductive bias (so it won’t need too many samples to learn)
– To better decompose the complicated operations (so each sub-operation can be easily “represented” and “learned”)
– For better efficiency (so the entire operation takes less time)
Differentiable Data-structure: Outline
• What is differentiable data-structure? A general formulation
• Memory: types and structures
• Addressing strategies
• Examples
• Concluding remarks
In terms of “term”
A very sloppy categorization:
• Short-term memory
– Short-term memory (STM) stores the representation of the current input, e.g., the representation of the source sentence in neural machine translation
• Intermediate memory
– Intermediate memory lies between short-term and long-term memory; it stores the facts and contextual information related to the current problem
• Long-term memory
– Long-term memory (LTM) stores the instance-independent knowledge, for example the factoid knowledge in Q.A., or the rule memory for translation in M.T.
Examples about terms of memories (1)
• Short-term memory in NMT
– The model learns to form the source memory in the encoding phase, to read from it (“attentive read”) during decoding, and sometimes to write a little bit too
– Now a common practice in NMT: Bahdanau et al. (2015) and many, many more
• “Short short-term memory” in NMT
– Memory as a direct extension to the state of the decoder RNN, with a big improvement in performance (Wang et al., 2016)
– This memory is read from and written to by the decoder RNN at each time step
– It stores the parts of the source and target sentences relevant at a particular time in decoding
Examples about terms of memories (2)
• Long-term memory in neural QA
– Have a memory to save the factoid knowledge (tables, triples)
– A generative model will “fetch” the knowledge from the LTM as needed (Yin et al., 2016a)
• Intermediate memory for representing a dialog session, a paragraph, …
– E.g., save the dialog history in a memory-net and attend to the relevant part when generating a response (Bordes & Weston et al., 2016)

Example session:
user: hi
bot: hello what can I help you with today
user: may I have a table in Paris
bot: I am on it
user: <silence>
bot: any preference on a type of cuisine
user: I love Indian food
bot: how many people would be in your party?
user: we will be six
bot: which price range are you looking for?
In terms of “structure”
• Pre-determined size (generic NTM)
– Memory of fixed size is claimed (independent of instances), and the read/write operation is on the entire memory
• Linear (Neural Stack, Neural Queue, …)
– The number of memory “cells” is linear in the length of the sequence
• Stacked memory (deep memory)
– It could be based on memory of fixed or linear size
Linear Memory
• The number of memory cells is linear in the length of the sequence
• It is unbounded, but the number of cells can often be pre-determined (e.g., in translation, after you see the entire source sentence)
• Can take the form of a queue, a stack, …, for different modeling tasks
[Figure: Neural Stack Machine. Credit: Meng et al. (2015); Grefenstette et al. (2015)]
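A much-simplified sketch of a "soft" stack over such linear memory: instead of a discrete push/pop, the controller emits probabilities and the next stack state is a mixture of both outcomes. This is a toy illustration of the idea only; Grefenstette et al. (2015) use a more refined strength-based scheme.

```python
# Toy "soft" stack: next state = p_push * pushed + p_pop * popped
# + (remaining mass) * unchanged stack.
import numpy as np

def soft_stack_step(stack, new_item, p_push, p_pop):
    # stack: (n, d) matrix, row 0 is the top; returns the mixed next stack
    n, d = stack.shape
    pushed = np.vstack([new_item, stack[:-1]])         # push: shift down
    popped = np.vstack([stack[1:], np.zeros((1, d))])  # pop: shift up
    p_noop = 1.0 - p_push - p_pop
    return p_push * pushed + p_pop * popped + p_noop * stack

stack = np.zeros((3, 2))
a = np.array([1.0, 0.0])
stack = soft_stack_step(stack, a, p_push=1.0, p_pop=0.0)  # hard push of a
b = np.array([0.0, 1.0])
stack = soft_stack_step(stack, b, p_push=0.5, p_pop=0.0)  # half-push of b
top = stack[0]  # a blend: half b (pushed state), half a (unchanged state)
```

Because every outcome keeps non-zero mass, gradients can flow back to whatever network produced p_push and p_pop.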
“Deep Memory”
• Deep memory-based architecture for NLP
• Different layers of memory, equipped with suitable read/write strategies to encourage layer-by-layer transformation of the input (e.g., a sentence)
• A generalization of the deep architectures in DNNs to richer forms of representation, to handle more complicated linguistic objects
[Figure: 3-layer “deep memory”: representations at Layer-1, Layer-2 and Layer-3, with transformations between them, producing the output]
Differentiable Data-structure: Outline
• What is differentiable data-structure? A general formulation
• Memory: types and structures
• Addressing strategies
• Examples
• Concluding remarks
Addressing (for both read and write)
Roughly, three types:
• Location-based addressing
– The controller determines which cells to read/write purely based on the location information
• Content-based addressing
– The controller determines which cells to read/write based on the content of the cells
• Hybrid addressing
– The addresses are determined based on both location and content
– Many different ways to combine the two strategies
Content-based Addressing
• Content-based addressing determines which cells to read/write based on the state of the controller
• To make it differentiable, the reading is a weighted average of the content of all the cells
Notation:
• s_t: the querying signal at time t
• r_t: the reading at time t
• x_n: the content of the n-th memory cell
[Figure: read-head attending over the memory cells]
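With this notation, the content-based read can be sketched in a few lines: r_t is the softmax-weighted average of the cells x_n, scored against s_t. Dot-product scoring is our simplifying choice here; the NTM, for instance, scores by cosine similarity.

```python
# Content-based soft read: a distribution over cells, then an expectation.
import numpy as np

def content_read(s_t, memory):
    # memory: (N, d), row n is x_n; returns r_t of shape (d,)
    scores = memory @ s_t              # one matching score per cell
    w = np.exp(scores - scores.max())
    w = w / w.sum()                    # soft address: distribution over cells
    return w @ memory                  # expectation of the cell contents

memory = np.array([[1.0, 0.0],
                   [0.0, 1.0]])
s_t = np.array([10.0, 0.0])            # query that strongly matches cell 0
r_t = content_read(s_t, memory)        # close to x_0 = [1, 0]
```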
Attention is also addressing
• Roughly speaking, attention is just a way to dynamically determine which memory cells to read
• RNNsearch as an example of attention: the encoder states serve as the memory
• Attentive read could be based on content, location (Graves, 2012), or both
More complicated attention mechanisms
• Interactive Attention (Meng et al., 2016)
– Learns to “write” a little bit to the memory to facilitate the next round of reading
– It can encourage and discourage certain memory cells
– Helps to handle under-translation and over-translation in NMT
• Local + global attention (Luong et al., 2015)
– Multiple attention strategies to find the “context” vector at different resolutions
– Often outperforms just the global one
Location-based Addressing
If location-based addressing is part of the optimization, it is likely to be non-differentiable. There are three strategies to get around this:
• Strategy-I: the location-based part is hard-wired into the hybrid addressing, e.g., in the generic NTM (Graves et al., 2014)
• Strategy-II: location-based addressing is part of hard-wired operations which are controlled by the neural network, e.g., Neural Random Access Machines (Kurach et al., 2015)
• Strategy-III: both the architecture and the learning setting are designed to encourage location-based addressing, e.g., CopyNet (Gu et al., 2016)
Location-based Addressing: CopyNet (Gu et al., 2016)
• Sequence-to-sequence model with two attention mechanisms
• The encoder is encouraged to put location information into the content of the memory, while the decoder is encouraged to use this location information for “copying” segments of the source
Differentiable Data-structure: Outline
• What is differentiable data-structure? A general formulation
• Memory: types and structures
• Addressing strategies
• Examples
• Concluding remarks
Neural Enquirer: both STM and LTM
• A neural network system for querying KB tables (Yin et al., 2016b)
• Neural Enquirer has both short-term and long-term memory
– Long-term memory: the partially embedded knowledge-base
– Short-term memory: the intermediate results of processing the tables
Neural Random Access Machine
• The Neural Random Access Machine is a special case of NTM that supports pointers in a differentiable data-structure (Kurach et al., 2016)
• An intriguing example of using a differentiable data-structure for seemingly non-differentiable operations
– Many hard modules to access the memory, and a soft mechanism (a special kind of NN controller) to call them
[Figure: the circuit generated at every time step > 2 for the task Reverse]
Differentiable Data-structure: Outline
• What is differentiable data-structure? A general formulation
• Memory: types and structures
• Addressing strategies
• Examples
• Concluding remarks
Pros and Cons
Differentiability requires maintaining the entire distribution, which has its advantages and disadvantages.
• Pros:
– It makes the optimization straightforward and efficient, since every member gets a non-zero share of the mass (vs. non-differentiable cases)
– Memory and all that give great space for architectural and mechanism design
• Cons:
– Maintaining this distribution and properly representing it is not always easy
– Dropping the differentiability requirement often makes the design (for example, the pointer) much easier
Reference (part II)
• [Xu et al., 2015] Show, Attend and Tell: Neural Image Caption Generation with Visual Attention. Kelvin Xu, Jimmy Ba, Ryan Kiros, Kyunghyun Cho, Aaron Courville, Ruslan Salakhutdinov, Richard Zemel, Yoshua Bengio.
• [Graves et al., 2014] Neural Turing Machines. Alex Graves, Greg Wayne, Ivo Danihelka.
• [Grefenstette et al., 2015] Learning to Transduce with Unbounded Memory. Edward Grefenstette, Karl Moritz Hermann, Mustafa Suleyman, Phil Blunsom.
• [Gu et al., 2016] Incorporating Copying Mechanism in Sequence-to-Sequence Learning. Jiatao Gu, Zhengdong Lu, Hang Li, Victor O.K. Li.
• [Kurach et al., 2015] Neural Random-Access Machines. Karol Kurach, Marcin Andrychowicz, Ilya Sutskever.
• [Wang et al., 2016] Memory-enhanced Decoder for Neural Machine Translation. Mingxuan Wang, Zhengdong Lu, Hang Li, Qun Liu.
• [Yin et al., 2016a] Neural Generative Question Answering. Jun Yin, Xin Jiang, Zhengdong Lu, Lifeng Shang, Hang Li, Xiaoming Li.
• [Yin et al., 2016b] Neural Enquirer: Learning to Query Tables with Natural Language. Pengcheng Yin, Zhengdong Lu, Hang Li, Ben Kao.
• [Weston et al., 2015] Memory Networks. Jason Weston, Sumit Chopra & Antoine Bordes.
• [Hochreiter & Schmidhuber, 1997] Long short-term memory. Sepp Hochreiter and Jurgen Schmidhuber.
• [Shang et al., 2015] Neural responding machine for short-text conversation. Lifeng Shang, Zhengdong Lu, and Hang Li.
• [Vinyals et al., 2014] Sequence to sequence learning with neural networks. Ilya Sutskever, Oriol Vinyals, and Quoc V. Le.
• [Vinyals et al., 2015] Pointer networks. Oriol Vinyals, Meire Fortunato, and Navdeep Jaitly.
• [Tu et al., 2016] Modeling Coverage for Neural Machine Translation. Zhaopeng Tu, Zhengdong Lu, Yang Liu, Xiaohua Liu, Hang Li.
• [Sukhbaatar et al., 2015] End-to-end memory networks. S. Sukhbaatar, J. Weston, R. Fergus.
• [Peng et al., 2015] Towards Neural Network-based Reasoning. Baolin Peng, Zhengdong Lu, Hang Li, Kam-Fai Wong.
• [Meng et al., 2015] A Deep Memory-based Architecture for Sequence-to-Sequence Learning. Fandong Meng, Zhengdong Lu, Zhaopeng Tu, Hang Li, Qun Liu.
• [Luong et al., 2015] Effective approaches to attention-based neural machine translation. Minh-Thang Luong, Hieu Pham, and Christopher D. Manning.
• [Bahdanau et al., 2014] Neural machine translation by jointly learning to align and translate. Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio.
• [Meng et al., 2016] Interactive attention for neural machine translation. Fandong Meng, Zhengdong Lu, Qun Liu, Hang Li.
Part-III:
Learning Paradigms
Learning: Outline
• Overview
• End-to-end learning (or not?)
• Dealing with non-differentiability
• Grounding-based learning
• New learning paradigms
Human language learning
It is a complex (and powerful) mixture of:
• Supervised learning:
– when we are taught words and grammar
– when we get corrected in making a sentence
• Unsupervised learning:
– when we learn French by reading a French novel
– when we figure out the meaning of words by seeing how they are used
• Reinforcement learning:
– when we learn through “trial and error”
• Explanation-based learning:
– when we build a theory based on our domain knowledge to make sense of a new observation
• …
Several dimensions of learning paradigm
• End-to-end vs. “step-by-step”
• Gradient-based vs. non-differentiable objectives
• Supervision from “grounding” or human labeling
• Supervised learning vs. reinforcement learning
End-to-end vs. “step-by-step”:
• End-to-end learning tunes the parameters of the entire model based on the correctional signal from the output; no supervision is added to the intermediate layers
• In step-by-step learning, we have specifically designed supervision on the intermediate representations
Gradient-based vs. non-differentiable objectives:
• Gradient-based learning tunes the parameters by minimizing a differentiable objective
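A minimal sketch of gradient-based tuning, with a toy quadratic loss standing in for a network's training objective (the function, learning rate, and step count are all illustrative choices):

```python
# Gradient descent on a differentiable objective.
import numpy as np

def loss(theta):
    # Toy differentiable objective, minimized at theta = 3
    return float(np.sum((theta - 3.0) ** 2))

def grad(theta):
    # Analytic gradient of the loss above
    return 2.0 * (theta - 3.0)

theta = np.zeros(2)
lr = 0.1
for _ in range(100):
    theta = theta - lr * grad(theta)  # follow the negative gradient
# theta is now very close to the minimizer [3, 3]
```

In a real network the gradient is obtained by back-propagation rather than by hand, which is exactly why the differentiability discussed in Part-II matters.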