
Outline

Quick overview and background
Differentiable data-structures
Learning paradigms

Neuro-symbolism

Conclusion

Part-I:

Overview and Background

Overview

Background: word embedding and composition models

Progress in terms of tasks:

Machine translation
Dialogue

Reasoning

Image captioning
Natural language parsing

……

Progress in terms of methodology (focus of this tutorial):

Attention models
External memories

Differentiable data structures

End-to-end learning

Distributed representation of words

(Figure: embeddings of related words such as dog, cat, puppy, and kitten lie close to one another in the vector space)

Using high-dimensional real-valued vectors to represent the meaning of words
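As a toy illustration of this idea (a minimal sketch with made-up 4-dimensional vectors, not trained embeddings; real systems use hundreds of dimensions learned from data), related words get nearby vectors:

    import numpy as np

    # Hypothetical toy embeddings, chosen by hand for illustration only.
    emb = {
        "dog":    np.array([0.8, 0.1, 0.0, 0.3]),
        "puppy":  np.array([0.7, 0.2, 0.1, 0.3]),
        "cat":    np.array([0.1, 0.9, 0.0, 0.3]),
        "kitten": np.array([0.2, 0.8, 0.1, 0.3]),
    }

    def cosine(u, v):
        # Cosine similarity: close to 1.0 for vectors pointing the same way.
        return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

    print(cosine(emb["dog"], emb["puppy"]))   # high: closely related words
    print(cosine(emb["dog"], emb["kitten"]))  # noticeably lower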

Distributed representation of sentences

Surprisingly, we can use long-enough vectors to represent the meaning of sentences

Mary is loved by John

John loves Mary

Mary loves John

Composition Models

From words to sentences and beyond

Architectures that model the syntax and semantics of a sentence

Two basic architectures

Convolutional Neural Net (CNN)

Mary loves John E-O-S

Recurrent Neural Net (RNN)

Mary loves John E-O-S

Composition Models (cont'd)

Bottom-up and soft parsing of the sentence, creating an ensemble of parse trees

Construct all possible compositions, and choose the most suitable ones using pooling (or gating)

Needs some tricks to handle variable sentence lengths

Convolutional Neural Net (CNN)

Mary loves John E-O-S

Sentence as a sequence of words, left to right and/or right to left

Recursively apply the same (one-for-all) composition between the history and the next word

Different gating mechanisms have proved useful, e.g., LSTM or GRU (see the sketch after this slide)

Recurrent Neural Net (RNN)

Mary loves John E-O-S
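The following is a minimal sketch of the RNN-style composition above (tiny dimensions, random untrained weights, all values are assumptions for illustration): a GRU-like cell repeatedly composes the running "history" vector with the next word vector, and the final state serves as the sentence representation.

    import numpy as np

    rng = np.random.default_rng(0)
    d = 8                                      # toy embedding / state size
    words = ["Mary", "loves", "John", "E-O-S"]
    emb = {w: rng.normal(size=d) for w in words}   # untrained toy embeddings

    # GRU parameters (one matrix per gate, acting on [h; x]); random for illustration.
    Wz, Wr, Wh = (rng.normal(size=(d, 2 * d)) * 0.1 for _ in range(3))

    def sigmoid(a):
        return 1.0 / (1.0 + np.exp(-a))

    def gru_step(h, x):
        z = sigmoid(Wz @ np.concatenate([h, x]))        # update gate
        r = sigmoid(Wr @ np.concatenate([h, x]))        # reset gate
        h_tilde = np.tanh(Wh @ np.concatenate([r * h, x]))
        return (1 - z) * h + z * h_tilde                # gated composition

    h = np.zeros(d)
    for w in words:                                     # left-to-right composition
        h = gru_step(h, emb[w])
    print(h)                                            # sentence representation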

Neural Machine Translation (NMT)

Sequence-to-sequence learning with the encoder-decoder framework
Gated RNN (e.g., LSTM, GRU) as encoder and decoder
Attention model for automatic alignment

End-to-end learning

Other tricks (e.g., large vocabulary, etc.) for further improvement
Outperforming SMT

Neural Machine Translation (cont'd)

Comparing with statistical machine translation (SMT)

SRC: 人类 永久 和平 稳定 一天 即将 到来。
SMT: Mankind permanent peace and stability in the days to come.
NMT: Peace will soon come in mankind.

SRC: 欧安组织的成员包括北美、欧洲和亚洲的五十五个国家, 明年将庆祝成立三十周年。
SMT: the osce members including north america, europe and asia, the 55 countries next year will mark the 30th anniversary of the establishment.
NMT: the organization of security includes 55 countries from north america, europe and asia, and next year it will celebrate its 30th anniversary.

Better fluency and sentence-level fidelity, catching up on other aspects (e.g., idioms, etc.)

Interesting topic: how to combine NMT and SMT

Neural Dialogue

Generation-based approach vs. retrieval-based approach (traditional)
Single-turn dialogue and multi-turn dialogue

Single-turn dialogue

Sequence-to-sequence learning with the encoder-decoder framework

95% of responses are natural sentences, 76% of responses are reasonable replies, trained with social media data (Shang et al., 2015)

Multi-turn dialogue

Task-specific dialogue

Multiple networks, end-to-end learning

Success rate is 98% for question answering about restaurants (Wen et al., 2016)

No human effort, purely data-driven, not possible before

Neural Dialogue Models

Sequence-to-sequence learning for single-turn dialogue
Neural Responding Machine (NRM) (Shang et al., 2015)

Encoder to encode the input message, decoder to decode it into the output response

Combination of a local context model (attention model) and a global context model

End-to-end learning

Examples of single-turn dialogue between humans (H) and NRM (M), trained with Weibo data:

H: 占中终于结束了。 (Occupy Central is finally over.)
M: 下一个是陆家嘴吧? (Will Lujiazui (finance district in Shanghai) be the next?)

H: 我想买三星手机。 (I want to buy a Samsung phone.)
M: 还是支持一下国产的吧。 (Why not buy our national brands?)

Architecture of NRM: combination of local and global representations

Neural Dialogue Models (cont'd)

Sequence-to-sequence learning for multi-turn dialogue (Vinyals & Le, 2015)

Two RNNs: one RNN (encoder) to encode the context (previous sentences), the other RNN (decoder) to decode it into a response

End-to-end learning

Sequence-to-sequence learning: encoding context and generating a response

Examples of dialogue between humans and the model, trained on movie subtitle data

Neural Dialogue Models (cont'd)

Task-dependent multi-turn dialogue (Wen et al., 2016), and several other models

End-to-end learning
Architecture
– Intent network, belief state tracker, policy network, generation network

Complicated task controlled by networks, trained in end-to-end fashion

System architecture

Example of multi-turn dialogue between human and system, restaurant domain

Neural Reasoning

Reasoning over facts

Facts and questions are in natural language

bAbI dataset (Weston et al., 2015)

External memory, step-by-step or end-to-end learning

Natural logic reasoning

Learn to conduct natural logic inference

Reasoning about semantic relations

E.g., from turtle ≺ reptile and reptile ≺ animal, infer turtle ≺ animal

Neural Reasoning Models

Memory Networks (Weston et al., 2014; Sukhbaatar et al., 2015)
Reasoning over facts, language modeling, etc.
Attention model, external memory

Architecture

Multiple layers (steps) of processing

Write and read intermediate results into external memory, controlled by networks trained in end-to-end fashion (a minimal one-hop sketch follows below)

Architecture of Memory Networks: (a) one layer, (b) multiple layers
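A minimal sketch of a single memory "hop" in the spirit of end-to-end memory networks (random toy embeddings; the real model learns the embedding matrices from data and usually stacks several hops):

    import numpy as np

    rng = np.random.default_rng(1)
    n_facts, d = 5, 16                       # 5 memory slots, toy dimensionality

    M = rng.normal(size=(n_facts, d))        # input memory (embedded facts)
    C = rng.normal(size=(n_facts, d))        # output memory (second embedding of the facts)
    u = rng.normal(size=d)                   # embedded question

    def softmax(a):
        e = np.exp(a - a.max())
        return e / e.sum()

    p = softmax(M @ u)                       # attention over facts (content-based addressing)
    o = p @ C                                # weighted read from the output memory
    u_next = u + o                           # updated controller state; input to the next hop
    print(p, u_next[:4])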

Neural Reasoning Models (cont'd)

Reasoning over facts

Neural Reasoner (Peng et al., 2015)
External memory, differentiable data structure, end-to-end learning

Architecture

Encoding layer, reasoning layers, answer layer

Write and read intermediate results into external memory, controlled by networks trained in end-to-end fashion

Architecture of Neural Reasoner

Neural Reasoning Models (cont'd)

Model for reasoning with natural logic (Bowman et al., 2015)
Compare two concepts or terms (distributed representations)
Model: deep neural network or deep tensor network

Learn distributed representations of concepts through reasoning with natural logic

Architecture of the model

References (Part I)

[Graves et al., 2014] Neural Turing Machines. Alex Graves, Greg Wayne, Ivo Danihelka.

[Wen et al., 2016] A Network-based End-to-End Trainable Task-oriented Dialogue System. Tsung-Hsien Wen, Milica Gašić, Nikola Mrkšić, Lina M. Rojas-Barahona, Pei-Hao Su, Stefan Ultes, David Vandyke, Steve Young.

[Weston et al., 2015] Memory Networks. Jason Weston, Sumit Chopra & Antoine Bordes.

[Shang et al., 2015] Neural Responding Machine for Short-Text Conversation. Lifeng Shang, Zhengdong Lu, and Hang Li.

[Vinyals et al., 2014] Sequence to Sequence Learning with Neural Networks. Ilya Sutskever, Oriol Vinyals, and Quoc V. Le.

[Vinyals & Le, 2015] A Neural Conversational Model. Oriol Vinyals and Quoc Le.

[Kumar et al., 2015] Ask Me Anything: Dynamic Memory Networks for Natural Language Processing. Ankit Kumar, Ozan Irsoy, Peter Ondruska, Mohit Iyyer, James Bradbury, Ishaan Gulrajani, Victor Zhong, Romain Paulus, Richard Socher.

[Sukhbaatar et al., 2015] End-to-End Memory Networks. S. Sukhbaatar, J. Weston, R. Fergus.

[Bowman et al., 2014] Recursive Neural Networks Can Learn Logical Semantics. S. Bowman, C. Potts, and C. Manning.

[Weston et al., 2015] Towards AI-Complete Question Answering: A Set of Prerequisite Toy Tasks. J. Weston, A. Bordes, S. Chopra, T. Mikolov.

[Peng et al., 2015] Towards Neural Network-based Reasoning. Baolin Peng, Zhengdong Lu, Hang Li, Kam-Fai Wong.

[Ma et al., 2015] Learning to Answer Questions From Image Using Convolutional Neural Network. Lin Ma, Zhengdong Lu, Hang Li.

[Bahdanau et al., 2014] Neural Machine Translation by Jointly Learning to Align and Translate. Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio.

Part-II:

Differentiable Data-structures

Differentiable Data-structure: Outline

What is a differentiable data-structure?
A general formulation

Memory: types and structures

Addressing strategies
Examples

Concluding remarks


What is a differentiable data-structure?

A differentiable data-structure is a memory-like structure that can be controlled by a neural network system, with the following properties:

It can be used to perform rather complicated operations

All the operations that can be "tuned", including reads and/or writes to the memory, are differentiable

"so you can just do back-propagation"

Representative examples:

Neural Turing Machine (a general take)
RNNsearch (automatic alignment for MT)

Memory Network (a different take on the memory setting)

Neural Random Access Machine (smart design for "pointers")

What is NOT a differentiable data-structure?

(Too) many examples

Hard attention

e.g., hard attention on images (Xu et al., 2015), for which we have to resort to reinforcement learning or variational methods

Varying number of memory cells

Typically the number of memory cells is not part of the optimization, since it cannot be directly optimized via back-propagation

Other structural operations

e.g., changing the order of two sub-sequences (Guo, 2015), replacing some sub-sequence with others, locating some sub-sequences and storing them somewhere, etc.

Other symbolic stuff

e.g., using symbols (discrete classes) in intermediate representations


In essence, the design of a differentiable data-structure is about finding ways around these limitations, without sacrificing too much efficiency.


Neural Turing Machine (Graves et al., 2014)

A general formulation:

Controller: typically a gated RNN, controlling the read/write operations

Read head: reads the content of the memory at the address given by the controller, and returns the result to the controller

Write head: writes to the memory, with both the content and the address determined by the controller

Tape: the memory, typically a matrix


Hard addressing for reading/writing

Reading

Determine the cell to read from (addressing)

Get the content of the selected cell

Writing

Determine the cell to write to (addressing)

Modify the content of that cell

Typically with some forgetting factor: new content = f_erase(old content) + add vector, i.e., erase something, then add something
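A toy sketch of the hard read/write just described (plain numpy, not any particular paper's exact formulation): a single integer address selects the cell, and writing first erases part of the old content and then adds new content.

    import numpy as np

    memory = np.zeros((4, 3))                # 4 cells, 3-dimensional content each

    # --- Reading: pick one cell by its address and return its content.
    addr = 2
    r = memory[addr].copy()

    # --- Writing: modify the selected cell with a forgetting (erase) factor.
    erase = np.array([1.0, 0.0, 0.5])        # how much of each dimension to forget
    add   = np.array([0.3, 0.3, 0.3])        # new content to add
    memory[addr] = memory[addr] * (1 - erase) + add
    print(memory)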

Soft addressing for reading/writing

With soft addressing, the system doesn't pinpoint a single memory cell; instead, it gives a distribution over the cells it is going to read from or write to.

So the reading/writing occurs on essentially all memory cells, with different weights (given by the distribution)

It can be viewed as an expectation (weighted sum) of all the possible read/write actions (see the sketch below)
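The same operations with soft addressing, as a minimal sketch (toy sizes, random values): the address is now a distribution over all cells, here produced by a softmax over content similarity, and every cell is read from and written to in proportion to its weight.

    import numpy as np

    def softmax(a):
        e = np.exp(a - a.max())
        return e / e.sum()

    rng = np.random.default_rng(2)
    memory = rng.normal(size=(4, 3))         # 4 cells, 3-dimensional content
    key = rng.normal(size=3)                 # query emitted by the controller

    w = softmax(memory @ key)                # soft address: a distribution over cells

    # Soft read: expectation of the cell contents under w.
    r = w @ memory

    # Soft write: every cell is partially erased and updated, weighted by w.
    erase = np.array([0.9, 0.9, 0.9])
    add   = np.array([0.5, -0.5, 0.0])
    memory = memory * (1 - np.outer(w, erase)) + np.outer(w, add)
    print(w, r)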

The essence of differentiable data-structures

The essence of a differentiable data-structure is to represent the distribution properly, including

every tunable component in the system

the joint distribution when multiple components meet

so that the supervision can go through the entire system to

increase the probabilities of the promising candidates,

decrease the probabilities of the poor candidates,

without killing anyone

When we have to explicitly model many discrete operations, we often need to generate the representations of all of them and take a weighted average

You have to customize your own model

The generic NTM won't work for most real-world tasks

"Try a machine translation task, and then you will see"

You have to design your own model (in a sense, probably a special case of the Neural Turing Machine) to put in your domain knowledge

For a better inductive bias (so it won't need too many samples to learn)

To better decompose the complicated operations (so each sub-operation can be easily "represented" and "learned")

For better efficiency (so the entire operation takes less time)


In terms of "term"

A very sloppy categorization

Short-term memory

Short-term memory (STM) stores the representation of the current input, e.g., the representation of the source sentence in neural machine translation

Intermediate memory

Intermediate memory lies between short-term and long-term memory; it stores the facts and contextual information related to the current problem

Long-term memory

Long-term memory (LTM) stores instance-independent knowledge, for example the factoid knowledge in QA, or the rule memory for translation in MT

Examples about terms of memories (1)

Short-term memory in NMT

The model will learn to form the source memory in the encoding phase, to read from it ("attentive read") during decoding, and sometimes to write a little bit too

Now a common practice in NMT

Bahdanau et al. (2015) and many, many more

"Short short-term memory" in NMT

Memory as a direct extension to the state of the decoder RNN, with a big improvement in performance

This memory is read from and written to by the decoder RNN at each time step

It stores parts of the source and target sentences relevant to a particular time in decoding

Wang et al. (2016)

Examples about terms of memories (2)

Long-term memory in neural QA

Have a memory to save the factoid knowledge (tables, triples)

A generative model will "fetch" the knowledge from the LTM as needed

Yin et al. (2016a)

Intermediate memory for representing a dialogue session, a paragraph, ...

E.g., save the dialogue history in a memory-net and attend to the relevant part when generating a response

Bordes & Weston (2016)

Example dialogue:
user: hi
bot: hello what can I help you with today
user: may I have a table in Paris
bot: I'm on it
user: <silence>
bot: any preference on a type of cuisine
user: I love Indian food
bot: how many people would be in your party?
user: we will be six
bot: which price range are you looking for?

In terms of "structure"

Pre-determined size (generic NTM)
– Memory of a fixed size is claimed (independent of instances), and the read/write operations are on the entire memory

Linear (Neural Stack, Neural Queue)
– The number of memory "cells" is linear in the length of the sequence

Stacked memory (deep memory)
– It could be based on memory of fixed or linear size

Linear memory

The number of memory cells is linear in the length of the sequence

It is unbounded, but the number of cells can often be pre-determined (e.g., in translation, after you see the entire source sentence)

Can take the form of a queue, a stack, ..., for different modeling tasks

Neural Stack Machine (a simplified continuous-stack sketch follows below)

Credit: Meng et al. (2015)

Grefenstette et al. (2015)
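Below is a simplified continuous-stack sketch in the spirit of Grefenstette et al. (2015), not a faithful reimplementation: each push appends a value with a real-valued "strength", pop removes strength from the top downwards, and reading returns the strength-weighted top of the stack. All operations are continuous, which is what makes the structure trainable by back-propagation.

    import numpy as np

    class NeuralStack:
        """Continuous stack: value vectors with real-valued strengths (toy sketch)."""
        def __init__(self, dim):
            self.V = np.zeros((0, dim))      # stored value vectors
            self.s = np.zeros(0)             # their strengths

        def step(self, v, push, pop):
            # Pop: remove up to `pop` units of strength from the top downwards.
            remaining = pop
            s = self.s.copy()
            for i in reversed(range(len(s))):
                removed = min(s[i], remaining)
                s[i] -= removed
                remaining -= removed
            # Push: append the new value with strength `push`.
            self.V = np.vstack([self.V, v])
            self.s = np.append(s, push)

        def read(self):
            # Read: strength-weighted sum of the topmost unit of total strength.
            r = np.zeros(self.V.shape[1])
            remaining = 1.0
            for i in reversed(range(len(self.s))):
                take = min(self.s[i], remaining)
                r += take * self.V[i]
                remaining -= take
            return r

    stack = NeuralStack(dim=2)
    stack.step(np.array([1.0, 0.0]), push=0.8, pop=0.0)
    stack.step(np.array([0.0, 1.0]), push=0.8, pop=0.1)
    print(stack.read())                      # mostly the second (top) value, partly the first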

"Deep Memory"

Deep memory-based architecture for NLP

Different layers of memory, equipped with suitable read/write strategies to encourage layer-by-layer transformation of the input (e.g., a sentence)

A generalization of the deep architectures in DNNs to richer forms of representation, to handle more complicated linguistic objects

(Figure: a 3-layer "DEEP MEMORY" architecture, with representations at Layer-1, Layer-2 and Layer-3 connected by transformations, producing the output)


Addressing (for both read and write)

Roughly, three types

Location-based addressing
– The controller determines which cells to read/write purely based on location information

Content-based addressing
– The controller determines which cells to read/write based on the content of the cells

Hybrid addressing
– The addresses are determined based on both location and content
– Many different ways to combine the two strategies

Content-based Addressing

Content-based addressing determines which cells to read/write based on the state of the controller

To make it differentiable, the reading is a weighted average of the content of all the cells (see the equations below)

Read-head

s_t: the querying signal at time t
r_t: the reading at time t
x_n: the content of the n-th memory cell
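In equations (one common instantiation; the exact scoring function varies from model to model), content-based addressing and the resulting attentive read are:

    w_t(n) = exp(score(s_t, x_n)) / sum_m exp(score(s_t, x_m))     (soft address over the N cells)
    r_t    = sum_n w_t(n) * x_n                                    (attentive read)

where score(s_t, x_n) can be, for example, the dot product of s_t and x_n, or a small feed-forward network as in RNNsearch.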

Attention is also addressing

Roughly speaking, attention is just a way to dynamically determine which memory cells to read

RNNsearch as an example of attention

The attentive read could be based on content, location (Graves, 2012), or both

More complicated attention mechanisms

Interactive attention

Learn to "write" a little bit to the memory to facilitate the next round of reading

It can encourage and discourage certain memory cells

Helps to handle under-translation and over-translation in NMT

Meng et al. (2016)

Local + global attention

Multiple attention strategies to find the "context" vector at different resolutions

Often outperforms just the global one

Luong et al. (2015)

Location-based Addressing

If location-based addressing is part of the optimization, it is likely to be non-differentiable. There are three strategies to get around this:

Strategy I: the location-based part is hard-wired into the hybrid addressing, e.g., in the generic NTM (Graves et al., 2014)

Strategy II: location-based addressing is part of hard-wired operations which are controlled by the neural network, e.g., Neural Random Access Machines (Kurach et al., 2015)

Strategy III: both the architecture and the learning setting are designed to encourage location-based addressing, e.g., CopyNet (Gu et al., 2016)

Location-based Addressing: CopyNet (Gu et al., 2016)

Sequence-to-sequence model with two attention mechanisms

The encoder is encouraged to put location information into the content of the memory, while the decoder is encouraged to use this location information for "copying" segments of the source (see the sketch below)
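A minimal sketch of the copy idea (not the exact CopyNet parameterization; scores and mixture weights are random placeholders): at each decoding step the output distribution is a mixture of a "generate from vocabulary" distribution and a "copy from a source position" distribution, so out-of-vocabulary source words can still be produced.

    import numpy as np

    def softmax(a):
        e = np.exp(a - a.max())
        return e / e.sum()

    rng = np.random.default_rng(3)
    vocab = ["<unk>", "hello", "my", "name", "is"]
    source = ["my", "name", "is", "Tony"]            # "Tony" is out of vocabulary

    gen_scores  = rng.normal(size=len(vocab))        # scores over the vocabulary
    copy_scores = rng.normal(size=len(source))       # scores over source positions
    p_gen, p_copy = 0.6, 0.4                         # mixture weights (normally predicted)

    p_vocab = p_gen * softmax(gen_scores)
    p_pos   = p_copy * softmax(copy_scores)

    # Final distribution over an extended vocabulary: known words + copied source words.
    final = {w: p for w, p in zip(vocab, p_vocab)}
    for pos, w in enumerate(source):
        final[w] = final.get(w, 0.0) + p_pos[pos]    # copying can emit OOV words like "Tony"
    print(final)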


Neural Enquirer: both STM and LTM

A neural network system for querying KB tables

Neural Enquirer has both short-term and long-term memory

Long-term memory: the partially embedded knowledge-base

Short-term memory: intermediate results of processing the tables

Yin et al. (2016b)

Neural Random Access Machine

The Neural Random Access Machine is a special case of the NTM that supports pointers in a differentiable data-structure

An intriguing example of using a differentiable data-structure for seemingly non-differentiable operations
– Many hard modules to access the memory, and a soft mechanism (a special kind of NN controller) to call them

Kurach et al. (2016)

(Figure: the circuit generated at every time step > 2 for the task Reverse)
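To make the differentiable "pointer" idea concrete, here is a minimal sketch (loosely inspired by NRAM, not its actual architecture): a pointer is a distribution over memory addresses, dereferencing it is a weighted sum over memory rows, and incrementing it shifts the distribution, so all three operations stay differentiable.

    import numpy as np

    n, d = 5, 4
    rng = np.random.default_rng(4)
    M = rng.normal(size=(n, d))              # memory: n addresses, d-dim content

    def one_hot(i):
        p = np.zeros(n); p[i] = 1.0
        return p

    p = 0.9 * one_hot(1) + 0.1 * one_hot(2)  # a "soft pointer": distribution over addresses

    def deref(p, M):
        # Soft dereference: expected content under the pointer distribution.
        return p @ M

    def inc(p):
        # Soft increment: shift the whole distribution by one address (wraps around).
        return np.roll(p, 1)

    print(deref(p, M))                        # mostly M[1], a little of M[2]
    print(inc(p))                             # now mostly pointing at address 2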


Pros and Cons

Differentiability requires maintaining the entire distribution, which has its advantages and disadvantages

Pros:

It makes the optimization straightforward and efficient, since every member gets a non-zero share of the mass (vs. non-differentiable cases)

Memory and all that give great space for architectural and mechanism design

Cons:

Maintaining this distribution and representing it properly is not always easy

Dropping the differentiability requirement often makes the design (for example, the pointer) much easier

References (Part II)

[Xu et al., 2015] Show, Attend and Tell: Neural Image Caption Generation with Visual Attention. Kelvin Xu, Jimmy Ba, Ryan Kiros, Kyunghyun Cho, Aaron Courville, Ruslan Salakhutdinov, Richard Zemel, Yoshua Bengio.

[Graves et al., 2014] Neural Turing Machines. Alex Graves, Greg Wayne, Ivo Danihelka.

[Grefenstette et al., 2015] Learning to Transduce with Unbounded Memory. Edward Grefenstette, Karl Moritz Hermann, Mustafa Suleyman, Phil Blunsom.

[Gu et al., 2016] Incorporating Copying Mechanism in Sequence-to-Sequence Learning. Jiatao Gu, Zhengdong Lu, Hang Li, Victor O.K. Li.

[Kurach et al., 2015] Neural Random-Access Machines. Karol Kurach, Marcin Andrychowicz, Ilya Sutskever.

[Wang et al., 2016] Memory-enhanced Decoder for Neural Machine Translation. Mingxuan Wang, Zhengdong Lu, Hang Li, Qun Liu.

[Yin et al., 2016a] Neural Generative Question Answering. Jun Yin, Xin Jiang, Zhengdong Lu, Lifeng Shang, Hang Li, Xiaoming Li.

[Yin et al., 2016b] Neural Enquirer: Learning to Query Tables with Natural Language. Pengcheng Yin, Zhengdong Lu, Hang Li, Ben Kao.

[Weston et al., 2015] Memory Networks. Jason Weston, Sumit Chopra & Antoine Bordes.

[Hochreiter & Schmidhuber, 1997] Long Short-Term Memory. Sepp Hochreiter and Jürgen Schmidhuber.

[Shang et al., 2015] Neural Responding Machine for Short-Text Conversation. Lifeng Shang, Zhengdong Lu, and Hang Li.

[Vinyals et al., 2014] Sequence to Sequence Learning with Neural Networks. Ilya Sutskever, Oriol Vinyals, and Quoc V. Le.

[Vinyals et al., 2015] Pointer Networks. Oriol Vinyals, Meire Fortunato, and Navdeep Jaitly.

[Tu et al., 2016] Modeling Coverage for Neural Machine Translation. Zhaopeng Tu, Zhengdong Lu, Yang Liu, Xiaohua Liu, Hang Li.

[Sukhbaatar et al., 2015] End-to-End Memory Networks. S. Sukhbaatar, J. Weston, R. Fergus.

[Peng et al., 2015] Towards Neural Network-based Reasoning. Baolin Peng, Zhengdong Lu, Hang Li, Kam-Fai Wong.

[Meng et al., 2015] A Deep Memory-based Architecture for Sequence-to-Sequence Learning. Fandong Meng, Zhengdong Lu, Zhaopeng Tu, Hang Li, Qun Liu.

[Luong et al., 2015] Effective Approaches to Attention-based Neural Machine Translation. Minh-Thang Luong, Hieu Pham, and Christopher D. Manning.

[Bahdanau et al., 2014] Neural Machine Translation by Jointly Learning to Align and Translate. Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio.

[Meng et al., 2016] Interactive Attention for Neural Machine Translation. Fandong Meng, Zhengdong Lu, Qun Liu, Hang Li.

Part-III:

Learning Paradigms

Learning: Outline

Overview

End-to-end learning (or not?)
Dealing with non-differentiability
Grounding-based learning

New learning paradigms

Human language learning

It is a complex (and powerful) mixture of

Supervised learning:
- when we are taught words and grammar
- when we get corrected in making a sentence

Unsupervised learning:
- when we learn French by reading a French novel
- when we figure out the meaning of words by seeing how they are used

Reinforcement learning:
- when we learn through "trial and error"

Explanation-based learning:
- when we build a theory based on our domain knowledge to make sense of a new observation

Several dimensions of learning paradigms

End-to-end vs. "step-by-step"

Gradient descent vs. non-differentiable objectives

Supervision from "grounding" or human labeling

Supervised learning vs. reinforcement learning


End-to-end learning tunes the parameters of the entire model based on the correction signal at the output; no supervision is added to the intermediate layers

In step-by-step learning, we have specifically designed supervision on the intermediate representations


Gradient-based learning tunes the parameters by minimizing a differentiable objective function, typically via back-propagation
