
Outline

Quick overview and background
Differentiable data-structures
Learning paradigms

Neuro-symbolism

Conclusion

Part-I:

Overview and Background

Overview

Background: word embedding and composition models

Progress in terms of tasks:

Machine translation
Dialogue

Reasoning

Image captioning
Natural language parsing

……

Progress in terms of methodology (focus of this tutorial):

Attention models
External memories

Differentiable data structures

End-to-end learning

Distributed representation of words

(Figure: embeddings of related words such as dog, cat, puppy, and kitten lie close to one another in the vector space)

Using high-dimensional real-valued vectors to represent the meaning of words
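As a toy illustration of this idea (a minimal sketch with made-up 4-dimensional vectors, not trained embeddings; real systems use hundreds of dimensions learned from data), related words get nearby vectors:

    import numpy as np

    # Hypothetical toy embeddings, chosen by hand for illustration only.
    emb = {
        "dog":    np.array([0.8, 0.1, 0.0, 0.3]),
        "puppy":  np.array([0.7, 0.2, 0.1, 0.3]),
        "cat":    np.array([0.1, 0.9, 0.0, 0.3]),
        "kitten": np.array([0.2, 0.8, 0.1, 0.3]),
    }

    def cosine(u, v):
        # Cosine similarity: close to 1.0 for vectors pointing the same way.
        return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

    print(cosine(emb["dog"], emb["puppy"]))   # high: closely related words
    print(cosine(emb["dog"], emb["kitten"]))  # noticeably lower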

Distributed representation of sentences

Surprisingly, we can use long-enough vectors to represent the meaning of sentences

Mary is loved by John

John loves Mary

Mary loves John

Composition Models

From words to sentences and beyond

Architectures that model the syntax and semantics of a sentence

Two basic architectures

Convolutional Neural Net (CNN)

Mary loves John E-O-S

Recurrent Neural Net (RNN)

Mary loves John E-O-S

Composition Models (cont'd)

Bottom-up and soft parsing of the sentence, creating an ensemble of parse trees

Construct all possible compositions, and choose the most suitable ones using pooling (or gating)

Needs some tricks to handle variable sentence lengths

Convolutional Neural Net (CNN)

Mary loves John E-O-S

Sentence as a sequence of words, left to right and/or right to left

Recursively apply the same (one-for-all) composition between the history and the next word

Different gating mechanisms have proved useful, e.g., LSTM or GRU (see the sketch after this slide)

Recurrent Neural Net (RNN)

Mary loves John E-O-S
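The following is a minimal sketch of the RNN-style composition above (tiny dimensions, random untrained weights, all values are assumptions for illustration): a GRU-like cell repeatedly composes the running "history" vector with the next word vector, and the final state serves as the sentence representation.

    import numpy as np

    rng = np.random.default_rng(0)
    d = 8                                      # toy embedding / state size
    words = ["Mary", "loves", "John", "E-O-S"]
    emb = {w: rng.normal(size=d) for w in words}   # untrained toy embeddings

    # GRU parameters (one matrix per gate, acting on [h; x]); random for illustration.
    Wz, Wr, Wh = (rng.normal(size=(d, 2 * d)) * 0.1 for _ in range(3))

    def sigmoid(a):
        return 1.0 / (1.0 + np.exp(-a))

    def gru_step(h, x):
        z = sigmoid(Wz @ np.concatenate([h, x]))        # update gate
        r = sigmoid(Wr @ np.concatenate([h, x]))        # reset gate
        h_tilde = np.tanh(Wh @ np.concatenate([r * h, x]))
        return (1 - z) * h + z * h_tilde                # gated composition

    h = np.zeros(d)
    for w in words:                                     # left-to-right composition
        h = gru_step(h, emb[w])
    print(h)                                            # sentence representation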

Neural Machine Translation (NMT)

Sequence-to-sequence learning with the encoder-decoder framework
Gated RNN (e.g., LSTM, GRU) as encoder and decoder
Attention model for automatic alignment

End-to-end learning

Other tricks (e.g., large vocabulary, etc.) for further improvement
Outperforming SMT

Neural Machine Translation (cont'd)

Comparing with statistical machine translation (SMT)

SRC: 人类 永久 和平 稳定 一天 即将 到来。
SMT: Mankind permanent peace and stability in the days to come.
NMT: Peace will soon come in mankind.

SRC: 欧安组织的成员包括北美、欧洲和亚洲的五十五个国家, 明年将庆祝成立三十周年。
SMT: the osce members including north america, europe and asia, the 55 countries next year will mark the 30th anniversary of the establishment.
NMT: the organization of security includes 55 countries from north america, europe and asia, and next year it will celebrate its 30th anniversary.

Better fluency and sentence-level fidelity, catching up on other aspects (e.g., idioms, etc.)

Interesting topic: how to combine NMT and SMT

Neural Dialogue

Generation-based approach vs. retrieval-based approach (traditional)
Single-turn dialogue and multi-turn dialogue

Single-turn dialogue

Sequence-to-sequence learning with the encoder-decoder framework

95% of responses are natural sentences, 76% of responses are reasonable replies, trained with social media data (Shang et al., 2015)

Multi-turn dialogue

Task-specific dialogue

Multiple networks, end-to-end learning

Success rate is 98% for question answering about restaurants (Wen et al., 2016)

No human effort, purely data-driven, not possible before

Neural Dialogue Models

Sequence-to-sequence learning for single-turn dialogue
Neural Responding Machine (NRM) (Shang et al., 2015)

Encoder to encode the input message, decoder to decode it into the output response

Combination of a local context model (attention model) and a global context model

End-to-end learning

Examples of single-turn dialogue between humans (H) and NRM (M), trained with Weibo data:

H: 占中终于结束了。 (Occupy Central is finally over.)
M: 下一个是陆家嘴吧? (Will Lujiazui (finance district in Shanghai) be the next?)

H: 我想买三星手机。 (I want to buy a Samsung phone.)
M: 还是支持一下国产的吧。 (Why not buy our national brands?)

Architecture of NRM: combination of local and global representations

Neural Dialogue Models (cont'd)

Sequence-to-sequence learning for multi-turn dialogue (Vinyals & Le, 2015)

Two RNNs: one RNN (encoder) to encode the context (previous sentences), the other RNN (decoder) to decode it into a response

End-to-end learning

Sequence-to-sequence learning: encoding context and generating a response

Examples of dialogue between humans and the model, trained on movie subtitle data

Neural Dialogue Models (cont'd)

Task-dependent multi-turn dialogue (Wen et al., 2016), and several other models

End-to-end learning
Architecture
– Intent network, belief state tracker, policy network, generation network

Complicated task controlled by networks, trained in end-to-end fashion

System architecture

Example of multi-turn dialogue between human and system, restaurant domain

Neural Reasoning

Reasoning over facts

Facts and questions are in natural language

bAbI dataset (Weston et al., 2015)

External memory, step-by-step or end-to-end learning

Natural logic reasoning

Learn to conduct natural logic inference

Reasoning about semantic relations

E.g., from turtle ≺ reptile and reptile ≺ animal, infer turtle ≺ animal

Neural Reasoning Models

Memory Networks (Weston et al., 2014; Sukhbaatar et al., 2015)
Reasoning over facts, language modeling, etc.
Attention model, external memory

Architecture

Multiple layers (steps) of processing

Write and read intermediate results into external memory, controlled by networks trained in end-to-end fashion (a minimal one-hop sketch follows below)

Architecture of Memory Networks: (a) one layer, (b) multiple layers
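A minimal sketch of a single memory "hop" in the spirit of end-to-end memory networks (random toy embeddings; the real model learns the embedding matrices from data and usually stacks several hops):

    import numpy as np

    rng = np.random.default_rng(1)
    n_facts, d = 5, 16                       # 5 memory slots, toy dimensionality

    M = rng.normal(size=(n_facts, d))        # input memory (embedded facts)
    C = rng.normal(size=(n_facts, d))        # output memory (second embedding of the facts)
    u = rng.normal(size=d)                   # embedded question

    def softmax(a):
        e = np.exp(a - a.max())
        return e / e.sum()

    p = softmax(M @ u)                       # attention over facts (content-based addressing)
    o = p @ C                                # weighted read from the output memory
    u_next = u + o                           # updated controller state; input to the next hop
    print(p, u_next[:4])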

Neural Reasoning Models (cont'd)

Reasoning over facts

Neural Reasoner (Peng et al., 2015)
External memory, differentiable data structure, end-to-end learning

Architecture

Encoding layer, reasoning layers, answer layer

Write and read intermediate results into external memory, controlled by networks trained in end-to-end fashion

Architecture of Neural Reasoner

Neural Reasoning Models (cont'd)

Model for reasoning with natural logic (Bowman et al., 2015)
Compare two concepts or terms (distributed representations)
Model: deep neural network or deep tensor network

Learn distributed representations of concepts through reasoning with natural logic

Architecture of the model

References (Part I)

[Graves et al., 2014] Neural Turing Machines. Alex Graves, Greg Wayne, Ivo Danihelka.

[Wen et al., 2016] A Network-based End-to-End Trainable Task-oriented Dialogue System. Tsung-Hsien Wen, Milica Gašić, Nikola Mrkšić, Lina M. Rojas-Barahona, Pei-Hao Su, Stefan Ultes, David Vandyke, Steve Young.

[Weston et al., 2015] Memory Networks. Jason Weston, Sumit Chopra & Antoine Bordes.

[Shang et al., 2015] Neural Responding Machine for Short-Text Conversation. Lifeng Shang, Zhengdong Lu, and Hang Li.

[Vinyals et al., 2014] Sequence to Sequence Learning with Neural Networks. Ilya Sutskever, Oriol Vinyals, and Quoc V. Le.

[Vinyals & Le, 2015] A Neural Conversational Model. Oriol Vinyals and Quoc Le.

[Kumar et al., 2015] Ask Me Anything: Dynamic Memory Networks for Natural Language Processing. Ankit Kumar, Ozan Irsoy, Peter Ondruska, Mohit Iyyer, James Bradbury, Ishaan Gulrajani, Victor Zhong, Romain Paulus, Richard Socher.

[Sukhbaatar et al., 2015] End-to-End Memory Networks. S. Sukhbaatar, J. Weston, R. Fergus.

[Bowman et al., 2014] Recursive Neural Networks Can Learn Logical Semantics. S. Bowman, C. Potts, and C. Manning.

[Weston et al., 2015] Towards AI-Complete Question Answering: A Set of Prerequisite Toy Tasks. J. Weston, A. Bordes, S. Chopra, T. Mikolov.

[Peng et al., 2015] Towards Neural Network-based Reasoning. Baolin Peng, Zhengdong Lu, Hang Li, Kam-Fai Wong.

[Ma et al., 2015] Learning to Answer Questions From Image Using Convolutional Neural Network. Lin Ma, Zhengdong Lu, Hang Li.

[Bahdanau et al., 2014] Neural Machine Translation by Jointly Learning to Align and Translate. Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio.

Part-II:

Differentiable Data-structures

Differentiable Data-structure: Outline

What is a differentiable data-structure?
A general formulation

Memory: types and structures

Addressing strategies
Examples

Concluding remarks


What is a differentiable data-structure?

A differentiable data-structure is a memory-like structure that can be controlled by a neural network system, with the following properties:

It can be used to perform rather complicated operations

All the operations that can be "tuned", including reads and/or writes to the memory, are differentiable

"so you can just do back-propagation"

Representative examples:

Neural Turing Machine (a general take)
RNNsearch (automatic alignment for MT)

Memory Network (a different take on the memory setting)

Neural Random Access Machine (smart design for "pointers")

What is NOT a differentiable data-structure?

(Too) many examples

Hard attention

e.g., hard attention on images (Xu et al., 2015), for which we have to resort to reinforcement learning or variational methods

Varying number of memory cells

Typically the number of memory cells is not part of the optimization, since it cannot be directly optimized via back-propagation

Other structural operations

e.g., changing the order of two sub-sequences (Guo, 2015), replacing some sub-sequence with others, locating some sub-sequences and storing them somewhere, etc.

Other symbolic stuff

e.g., using symbols (discrete classes) in intermediate representations


In essence, the design of a differentiable data-structure is about finding ways around these limitations, without sacrificing too much efficiency.


Neural Turing Machine (Graves et al., 2014)

A general formulation:

Controller: typically a gated RNN, controlling the read/write operations

Read head: reads the content of the memory at the address given by the controller, and returns the result to the controller

Write head: writes to the memory, with both the content and the address determined by the controller

Tape: the memory, typically a matrix


Hard addressing for reading/writing

Reading

Determine the cell to read from (addressing)

Get the content of the selected cell

Writing

Determine the cell to write to (addressing)

Modify the content of that cell

Typically with some forgetting factor: new content = f_erase(old content) + add vector, i.e., erase something, then add something
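A toy sketch of the hard read/write just described (plain numpy, not any particular paper's exact formulation): a single integer address selects the cell, and writing first erases part of the old content and then adds new content.

    import numpy as np

    memory = np.zeros((4, 3))                # 4 cells, 3-dimensional content each

    # --- Reading: pick one cell by its address and return its content.
    addr = 2
    r = memory[addr].copy()

    # --- Writing: modify the selected cell with a forgetting (erase) factor.
    erase = np.array([1.0, 0.0, 0.5])        # how much of each dimension to forget
    add   = np.array([0.3, 0.3, 0.3])        # new content to add
    memory[addr] = memory[addr] * (1 - erase) + add
    print(memory)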

Soft addressing for reading/writing

With soft addressing, the system doesn't pinpoint a single memory cell; instead, it gives a distribution over the cells it is going to read from or write to.

So the reading/writing occurs on essentially all memory cells, with different weights (given by the distribution)

It can be viewed as an expectation (weighted sum) of all the possible read/write actions (see the sketch below)
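The same operations with soft addressing, as a minimal sketch (toy sizes, random values): the address is now a distribution over all cells, here produced by a softmax over content similarity, and every cell is read from and written to in proportion to its weight.

    import numpy as np

    def softmax(a):
        e = np.exp(a - a.max())
        return e / e.sum()

    rng = np.random.default_rng(2)
    memory = rng.normal(size=(4, 3))         # 4 cells, 3-dimensional content
    key = rng.normal(size=3)                 # query emitted by the controller

    w = softmax(memory @ key)                # soft address: a distribution over cells

    # Soft read: expectation of the cell contents under w.
    r = w @ memory

    # Soft write: every cell is partially erased and updated, weighted by w.
    erase = np.array([0.9, 0.9, 0.9])
    add   = np.array([0.5, -0.5, 0.0])
    memory = memory * (1 - np.outer(w, erase)) + np.outer(w, add)
    print(w, r)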

The essence of differentiable data-structures

The essence of a differentiable data-structure is to represent the distribution properly, including

every tunable component in the system

the joint distribution when multiple components meet

so that the supervision can go through the entire system to

increase the probabilities of the promising candidates,

decrease the probabilities of the poor candidates,

without killing anyone

When we have to explicitly model many discrete operations, we often need to generate the representations of all of them and take a weighted average

You have to customize your own model

The generic NTM won't work for most real-world tasks

"Try a machine translation task, and then you will see"

You have to design your own model (in a sense, probably a special case of the Neural Turing Machine) to put in your domain knowledge

For a better inductive bias (so it won't need too many samples to learn)

To better decompose the complicated operations (so each sub-operation can be easily "represented" and "learned")

For better efficiency (so the entire operation takes less time)


In terms of "term"

A very sloppy categorization

Short-term memory

Short-term memory (STM) stores the representation of the current input, e.g., the representation of the source sentence in neural machine translation

Intermediate memory

Intermediate memory lies between short-term and long-term memory; it stores the facts and contextual information related to the current problem

Long-term memory

Long-term memory (LTM) stores instance-independent knowledge, for example the factoid knowledge in QA, or the rule memory for translation in MT

Examples about terms of memories (1)

Short-term memory in NMT

The model will learn to form the source memory in the encoding phase, to read from it ("attentive read") during decoding, and sometimes to write a little bit too

Now a common practice in NMT

Bahdanau et al. (2015) and many, many more

"Short short-term memory" in NMT

Memory as a direct extension to the state of the decoder RNN, with a big improvement in performance

This memory is read from and written to by the decoder RNN at each time step

It stores parts of the source and target sentences relevant to a particular time in decoding

Wang et al. (2016)

Examples about terms of memories (2)

Long-term memory in neural QA

Have a memory to save the factoid knowledge (tables, triples)

A generative model will "fetch" the knowledge from the LTM as needed

Yin et al. (2016a)

Intermediate memory for representing a dialogue session, a paragraph, ...

E.g., save the dialogue history in a memory-net and attend to the relevant part when generating a response

Bordes & Weston (2016)

Example dialogue:
user: hi
bot: hello what can I help you with today
user: may I have a table in Paris
bot: I'm on it
user: <silence>
bot: any preference on a type of cuisine
user: I love Indian food
bot: how many people would be in your party?
user: we will be six
bot: which price range are you looking for?

In terms of "structure"

Pre-determined size (generic NTM)
– Memory of a fixed size is claimed (independent of instances), and the read/write operations are on the entire memory

Linear (Neural Stack, Neural Queue)
– The number of memory "cells" is linear in the length of the sequence

Stacked memory (deep memory)
– It could be based on memory of fixed or linear size

Linear memory

The number of memory cells is linear in the length of the sequence

It is unbounded, but the number of cells can often be pre-determined (e.g., in translation, after you see the entire source sentence)

Can take the form of a queue, a stack, ..., for different modeling tasks

Neural Stack Machine (a simplified continuous-stack sketch follows below)

Credit: Meng et al. (2015)

Grefenstette et al. (2015)
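Below is a simplified continuous-stack sketch in the spirit of Grefenstette et al. (2015), not a faithful reimplementation: each push appends a value with a real-valued "strength", pop removes strength from the top downwards, and reading returns the strength-weighted top of the stack. All operations are continuous, which is what makes the structure trainable by back-propagation.

    import numpy as np

    class NeuralStack:
        """Continuous stack: value vectors with real-valued strengths (toy sketch)."""
        def __init__(self, dim):
            self.V = np.zeros((0, dim))      # stored value vectors
            self.s = np.zeros(0)             # their strengths

        def step(self, v, push, pop):
            # Pop: remove up to `pop` units of strength from the top downwards.
            remaining = pop
            s = self.s.copy()
            for i in reversed(range(len(s))):
                removed = min(s[i], remaining)
                s[i] -= removed
                remaining -= removed
            # Push: append the new value with strength `push`.
            self.V = np.vstack([self.V, v])
            self.s = np.append(s, push)

        def read(self):
            # Read: strength-weighted sum of the topmost unit of total strength.
            r = np.zeros(self.V.shape[1])
            remaining = 1.0
            for i in reversed(range(len(self.s))):
                take = min(self.s[i], remaining)
                r += take * self.V[i]
                remaining -= take
            return r

    stack = NeuralStack(dim=2)
    stack.step(np.array([1.0, 0.0]), push=0.8, pop=0.0)
    stack.step(np.array([0.0, 1.0]), push=0.8, pop=0.1)
    print(stack.read())                      # mostly the second (top) value, partly the first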

"Deep Memory"

Deep memory-based architecture for NLP

Different layers of memory, equipped with suitable read/write strategies to encourage layer-by-layer transformation of the input (e.g., a sentence)

A generalization of the deep architectures in DNNs to richer forms of representation, to handle more complicated linguistic objects

(Figure: a 3-layer "DEEP MEMORY" architecture, with representations at Layer-1, Layer-2 and Layer-3 connected by transformations, producing the output)


Addressing (for both read and write)

Roughly, three types

Location-based addressing
– The controller determines which cells to read/write purely based on location information

Content-based addressing
– The controller determines which cells to read/write based on the content of the cells

Hybrid addressing
– The addresses are determined based on both location and content
– Many different ways to combine the two strategies

Content-based Addressing

Content-based addressing determines which cells to read/write based on the state of the controller

To make it differentiable, the reading is a weighted average of the content of all the cells (see the equations below)

Read-head

s_t: the querying signal at time t
r_t: the reading at time t
x_n: the content of the n-th memory cell
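In equations (one common instantiation; the exact scoring function varies from model to model), content-based addressing and the resulting attentive read are:

    w_t(n) = exp(score(s_t, x_n)) / sum_m exp(score(s_t, x_m))     (soft address over the N cells)
    r_t    = sum_n w_t(n) * x_n                                    (attentive read)

where score(s_t, x_n) can be, for example, the dot product of s_t and x_n, or a small feed-forward network as in RNNsearch.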

Attention is also addressing

Roughly speaking, attention is just a way to dynamically determine which memory cells to read

RNNsearch as an example of attention

The attentive read could be based on content, location (Graves, 2012), or both

More complicated attention mechanisms

Interactive attention

Learn to "write" a little bit to the memory to facilitate the next round of reading

It can encourage and discourage certain memory cells

Helps to handle under-translation and over-translation in NMT

Meng et al. (2016)

Local + global attention

Multiple attention strategies to find the "context" vector at different resolutions

Often outperforms just the global one

Luong et al. (2015)

Location-based Addressing

If location-based addressing is part of the optimization, it is likely to be non-differentiable. There are three strategies to get around this:

Strategy I: the location-based part is hard-wired into the hybrid addressing, e.g., in the generic NTM (Graves et al., 2014)

Strategy II: location-based addressing is part of hard-wired operations which are controlled by the neural network, e.g., Neural Random Access Machines (Kurach et al., 2015)

Strategy III: both the architecture and the learning setting are designed to encourage location-based addressing, e.g., CopyNet (Gu et al., 2016)

Location-based Addressing: CopyNet (Gu et al., 2016)

Sequence-to-sequence model with two attention mechanisms

The encoder is encouraged to put location information into the content of the memory, while the decoder is encouraged to use this location information for "copying" segments of the source (see the sketch below)
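A minimal sketch of the copy idea (not the exact CopyNet parameterization; scores and mixture weights are random placeholders): at each decoding step the output distribution is a mixture of a "generate from vocabulary" distribution and a "copy from a source position" distribution, so out-of-vocabulary source words can still be produced.

    import numpy as np

    def softmax(a):
        e = np.exp(a - a.max())
        return e / e.sum()

    rng = np.random.default_rng(3)
    vocab = ["<unk>", "hello", "my", "name", "is"]
    source = ["my", "name", "is", "Tony"]            # "Tony" is out of vocabulary

    gen_scores  = rng.normal(size=len(vocab))        # scores over the vocabulary
    copy_scores = rng.normal(size=len(source))       # scores over source positions
    p_gen, p_copy = 0.6, 0.4                         # mixture weights (normally predicted)

    p_vocab = p_gen * softmax(gen_scores)
    p_pos   = p_copy * softmax(copy_scores)

    # Final distribution over an extended vocabulary: known words + copied source words.
    final = {w: p for w, p in zip(vocab, p_vocab)}
    for pos, w in enumerate(source):
        final[w] = final.get(w, 0.0) + p_pos[pos]    # copying can emit OOV words like "Tony"
    print(final)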


Neural Enquirer: both STM and LTM

A neural network system for querying KB tables

Neural Enquirer has both short-term and long-term memory

Long-term memory: the partially embedded knowledge-base

Short-term memory: intermediate results of processing the tables

Yin et al. (2016b)

Neural Random Access Machine

The Neural Random Access Machine is a special case of the NTM that supports pointers in a differentiable data-structure

An intriguing example of using a differentiable data-structure for seemingly non-differentiable operations
– Many hard modules to access the memory, and a soft mechanism (a special kind of NN controller) to call them

Kurach et al. (2016)

(Figure: the circuit generated at every time step > 2 for the task Reverse)
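To make the differentiable "pointer" idea concrete, here is a minimal sketch (loosely inspired by NRAM, not its actual architecture): a pointer is a distribution over memory addresses, dereferencing it is a weighted sum over memory rows, and incrementing it shifts the distribution, so all three operations stay differentiable.

    import numpy as np

    n, d = 5, 4
    rng = np.random.default_rng(4)
    M = rng.normal(size=(n, d))              # memory: n addresses, d-dim content

    def one_hot(i):
        p = np.zeros(n); p[i] = 1.0
        return p

    p = 0.9 * one_hot(1) + 0.1 * one_hot(2)  # a "soft pointer": distribution over addresses

    def deref(p, M):
        # Soft dereference: expected content under the pointer distribution.
        return p @ M

    def inc(p):
        # Soft increment: shift the whole distribution by one address (wraps around).
        return np.roll(p, 1)

    print(deref(p, M))                        # mostly M[1], a little of M[2]
    print(inc(p))                             # now mostly pointing at address 2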


Pros and Cons

Differentiability requires maintaining the entire distribution, which has its advantages and disadvantages

Pros:

It makes the optimization straightforward and efficient, since every member gets a non-zero share of the mass (vs. non-differentiable cases)

Memory and all that give great space for architectural and mechanism design

Cons:

Maintaining this distribution and representing it properly is not always easy

Dropping the differentiability requirement often makes the design (for example, the pointer) much easier

References (Part II)

[Xu et al., 2015] Show, Attend and Tell: Neural Image Caption Generation with Visual Attention. Kelvin Xu, Jimmy Ba, Ryan Kiros, Kyunghyun Cho, Aaron Courville, Ruslan Salakhutdinov, Richard Zemel, Yoshua Bengio.

[Graves et al., 2014] Neural Turing Machines. Alex Graves, Greg Wayne, Ivo Danihelka.

[Grefenstette et al., 2015] Learning to Transduce with Unbounded Memory. Edward Grefenstette, Karl Moritz Hermann, Mustafa Suleyman, Phil Blunsom.

[Gu et al., 2016] Incorporating Copying Mechanism in Sequence-to-Sequence Learning. Jiatao Gu, Zhengdong Lu, Hang Li, Victor O.K. Li.

[Kurach et al., 2015] Neural Random-Access Machines. Karol Kurach, Marcin Andrychowicz, Ilya Sutskever.

[Wang et al., 2016] Memory-enhanced Decoder for Neural Machine Translation. Mingxuan Wang, Zhengdong Lu, Hang Li, Qun Liu.

[Yin et al., 2016a] Neural Generative Question Answering. Jun Yin, Xin Jiang, Zhengdong Lu, Lifeng Shang, Hang Li, Xiaoming Li.

[Yin et al., 2016b] Neural Enquirer: Learning to Query Tables with Natural Language. Pengcheng Yin, Zhengdong Lu, Hang Li, Ben Kao.

[Weston et al., 2015] Memory Networks. Jason Weston, Sumit Chopra & Antoine Bordes.

[Hochreiter & Schmidhuber, 1997] Long Short-Term Memory. Sepp Hochreiter and Jürgen Schmidhuber.

[Shang et al., 2015] Neural Responding Machine for Short-Text Conversation. Lifeng Shang, Zhengdong Lu, and Hang Li.

[Vinyals et al., 2014] Sequence to Sequence Learning with Neural Networks. Ilya Sutskever, Oriol Vinyals, and Quoc V. Le.

[Vinyals et al., 2015] Pointer Networks. Oriol Vinyals, Meire Fortunato, and Navdeep Jaitly.

[Tu et al., 2016] Modeling Coverage for Neural Machine Translation. Zhaopeng Tu, Zhengdong Lu, Yang Liu, Xiaohua Liu, Hang Li.

[Sukhbaatar et al., 2015] End-to-End Memory Networks. S. Sukhbaatar, J. Weston, R. Fergus.

[Peng et al., 2015] Towards Neural Network-based Reasoning. Baolin Peng, Zhengdong Lu, Hang Li, Kam-Fai Wong.

[Meng et al., 2015] A Deep Memory-based Architecture for Sequence-to-Sequence Learning. Fandong Meng, Zhengdong Lu, Zhaopeng Tu, Hang Li, Qun Liu.

[Luong et al., 2015] Effective Approaches to Attention-based Neural Machine Translation. Minh-Thang Luong, Hieu Pham, and Christopher D. Manning.

[Bahdanau et al., 2014] Neural Machine Translation by Jointly Learning to Align and Translate. Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio.

[Meng et al., 2016] Interactive Attention for Neural Machine Translation. Fandong Meng, Zhengdong Lu, Qun Liu, Hang Li.

Part-III:

Learning Paradigms

Learning: Outline

Overview

End-to-end learning (or not?)
Dealing with non-differentiability
Grounding-based learning

New learning paradigms

Human language learning

It is a complex (and powerful) mixture of

Supervised learning:
- when we are taught words and grammar
- when we get corrected in making a sentence

Unsupervised learning:
- when we learn French by reading a French novel
- when we figure out the meaning of words by seeing how they are used

Reinforcement learning:
- when we learn through "trial and error"

Explanation-based learning:
- when we build a theory based on our domain knowledge to make sense of a new observation

Several dimensions of learning paradigms

End-to-end vs. "step-by-step"

Gradient descent vs. non-differentiable objectives

Supervision from "grounding" or human labeling

Supervised learning vs. reinforcement learning


End-to-end learning tunes the parameters of the entire model based on the correction signal at the output; no supervision is added to the intermediate layers

In step-by-step learning, we have specifically designed supervision on the intermediate representations


Gradient-based learning tunes the parameters by minimizing a differentiable objective function, typically via back-propagation
