
Tensor2Tensor Transformers: TensorFlow for Deep Learning Research

Based on "Attention Is All You Need" by Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser and Illia Polosukhin, and on other works with Samy Bengio, Eugene Brevdo, Francois Chollet, Stephan Gouws, Nal Kalchbrenner, Ofir Nachum, Aurko Roy and Ryan Sepassi.

Łukasz Kaiser

Why? Some context.

How Deep Learning Quietly Revolutionized NLP (2016)

What NLP tasks are we talking about?
- Part-of-Speech Tagging: assign a part of speech to each word.
- Parsing: create a grammar tree given a sentence.
- Named Entity Recognition: recognize people, places, etc. in a sentence.
- Language Modeling: generate natural sentences.
- Translation: translate a sentence into another language.
- Sentence Compression: remove words to summarize a sentence.
- Abstractive Summarization: summarize a paragraph in new words.
- Question Answering: answer a question, maybe given a passage.

Can deep learning solve these tasks?
- Inputs and outputs have variable size; how can neural networks handle that?
- Recurrent Neural Networks can do it, but how do we train them?
- Long Short-Term Memory (Hochreiter et al., 1997), but how do we compose it?
- Encoder-Decoder (sequence-to-sequence) architectures (Sutskever et al., 2014; Bahdanau et al., 2014; Cho et al., 2014).

Advanced sequence-to-sequence LSTMs

Parsing with sequence-to-sequence LSTMs
(1) Represent the tree as a sequence.
(2) Generate data and train a sequence-to-sequence LSTM model.

(3) Results: 92.8 F1 score vs. 92.4 previous best (Vinyals & Kaiser et al., 2014).

Language modeling with LSTMs
Language model performance is measured in perplexity (lower is better).
- Kneser-Ney 5-gram: 67.6 (Chelba et al., 2013)
- RNN-1024 + 9-gram: 51.3 (Chelba et al., 2013)
- LSTM-512-512: 54.1 (Józefowicz et al., 2016)
- 2-layer LSTM-8192-1024: 30.6 (Józefowicz et al., 2016)
- 2-layer LSTM-4096-1024 + MoE: 28.0 (Shazeer & Mirhoseini et al., 2016)
Model size seems to be the decisive factor.

Language modeling with LSTMs: Examples
Raw (not hand-selected) sampled sentences (Józefowicz et al., 2016):
"About 800 people gathered at Hever Castle on Long Beach from noon to 2 pm, three to four times that of the funeral cortege."
"It is now known that coffee and cacao products can do no harm on the body."
"Yuri Zhirkov was in attendance at the Stamford Bridge at the start of the second half but neither Drogba nor Malouda was able to push on through the Barcelona defence."

Sentence compression with LSTMs
Example:
Input: State Sen. Stewart Greenleaf discusses his proposed human-trafficking bill at Calvery Baptist Church in Willow Grove Thursday night.
Output: Stewart Greenleaf discusses his human trafficking bill.

Results (readability / informativeness):
- MIRA (previous best): 4.31 / 3.55
- LSTM (Filippova et al., 2015): 4.51 / 3.78

Translation with LSTMs
Translation performance is measured in BLEU scores (higher is better, En→De):
- Phrase-Based MT: 20.7 (Durrani et al., 2014)
- Early LSTM model: 19.4 (Sébastien et al., 2015)
- DeepAtt (large LSTM): 20.6 (Zhou et al., 2016)
- GNMT (large LSTM): 24.9 (Wu et al., 2016)
- GNMT + MoE: 26.0 (Shazeer & Mirhoseini et al., 2016)
Again, model size and tuning seem to be the decisive factors.

Translation with LSTMs: Examples
German: Probleme kann man niemals mit derselben Denkweise lösen, durch die sie entstanden sind.
PBMT Translate: "No problem can be solved from the same consciousness that they have arisen."
GNMT Translate: "Problems can never be solved with the same way of thinking that caused them."

Translation with LSTMs: How good is it?
(PBMT / GNMT / Human / relative improvement)
- English → Spanish: 4.885 / 5.428 / 5.504 / 87%
- English → French: 4.932 / 5.295 / 5.496 / 64%
- English → Chinese: 4.035 / 4.594 / 4.987 / 58%
- Spanish → English: 4.872 / 5.187 / 5.372 / 63%
- French → English: 5.046 / 5.343 / 5.404 / 83%
- Chinese → English: 3.694 / 4.263 / 4.636 / 60%
Google Translate production data, median score by human evaluation on a 0-6 scale (Wu et al., 2016).

That was 2016. Now.

How to generate almost anything?
A good long sequence model:
- Has global structure and local structure (e.g., music)
- Is powerful enough for complex functions (e.g., ...)
- Captures long-range dependencies (e.g., reuse a name)
- Remembers rare occurrences (e.g., one-shot learning)
- Correlates across modalities (e.g., image + text together)

What does a good long sequence model bring?
- Text: a story-teller, a translator if conditioned.
- Image: a painter, a drawing tool if conditioned.
- Music: a composer, a companion if conditioned.
- Games: a simulator, possibly changing RL techniques.
- Video: a world model, or a fake-news tool?

How

Text (Attention Is All You Need)

RNNs everywhere: "Sequence to Sequence Learning with Neural Networks"

Auto-regressive CNNs: WaveNet and ByteNet

Attention
(diagram: convolution vs. attention)

Dot-Product Attention
(diagram: queries q0, q1 attending over key-value pairs (k0, v0), (k1, v1), (k2, v2))

# dot_product_attention from tensor2tensor's common_attention.py
# (imports added for context; attention_image_summary is defined in the same file).
import tensorflow as tf
from tensor2tensor.layers import common_layers
from tensor2tensor.utils import expert_utils

def dot_product_attention(q, k, v, bias, dropout_rate=0.0, image_shapes=None,
                          name=None, make_image_summary=True,
                          save_weights_to=None, dropout_broadcast_dims=None):
  with tf.variable_scope(
      name, default_name="dot_product_attention", values=[q, k, v]) as scope:
    # [batch, num_heads, query_length, memory_length]
    logits = tf.matmul(q, k, transpose_b=True)
    if bias is not None:
      logits += bias
    weights = tf.nn.softmax(logits, name="attention_weights")
    if save_weights_to is not None:
      save_weights_to[scope.name] = weights
    # Dropping out the attention links for each of the heads.
    weights = common_layers.dropout_with_broadcast_dims(
        weights, 1.0 - dropout_rate, broadcast_dims=dropout_broadcast_dims)
    if expert_utils.should_generate_summaries() and make_image_summary:
      attention_image_summary(weights, image_shapes)
    return tf.matmul(weights, v)

Complexity per layer (Ops / Activations), with n = sequence length, d = depth, k = kernel size:
- Attention (dot-product): n²·d ops / n² + n·d activations
- Attention (additive): n²·d ops / n²·d activations
- Recurrent: n·d² ops / n·d activations
- Convolutional: k·n·d² ops / n·d activations
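To make the shapes concrete, here is a minimal, self-contained NumPy sketch of the same computation, softmax(QKᵀ + bias)·V, on toy tensors. It only illustrates the math of the function above; the sizes and helper names are made up, and it is not the tensor2tensor implementation.

import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def toy_dot_product_attention(q, k, v, bias=None):
    # q: [batch, heads, query_length, depth_per_head]
    # k, v: [batch, heads, memory_length, depth_per_head]
    logits = np.einsum("bhqd,bhmd->bhqm", q, k)   # the n^2 * d matmul from the table above
    if bias is not None:
        logits = logits + bias
    weights = softmax(logits, axis=-1)            # one attention distribution per query
    return np.einsum("bhqm,bhmd->bhqd", weights, v)

batch, heads, length, depth = 2, 4, 8, 16
q = np.random.randn(batch, heads, length, depth)
k = np.random.randn(batch, heads, length, depth)
v = np.random.randn(batch, heads, length, depth)
out = toy_dot_product_attention(q, k, v)
print(out.shape)  # (2, 4, 8, 16): one weighted average of v per query position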

What's missing from Self-Attention?
- Convolution: a different linear transformation for each relative position; allows you to distinguish what information came from where.
- Self-Attention: a weighted average :(

The Fix: Multi-Head Attention
- Multiple attention layers (heads) in parallel (shown by different colors in the diagram).
- Each head uses different linear transformations.
- Different heads can learn different relationships.
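A minimal NumPy sketch of this idea, reusing toy_dot_product_attention from the sketch above: project the inputs, split the result into heads (equivalent to a different learned linear transformation per head), attend in each head independently, then concatenate the heads and project back. The helper names, shapes, and parameter layout are illustrative assumptions, not the tensor2tensor code.

def toy_multi_head_attention(x, memory, num_heads, params):
    # x: [batch, length, d_model]; memory: [batch, memory_length, d_model]
    # params holds projection matrices wq, wk, wv, wo, each [d_model, d_model].
    d_model = x.shape[-1]
    d_head = d_model // num_heads

    def split_heads(t):
        b, l, _ = t.shape
        # [batch, length, d_model] -> [batch, heads, length, d_head]
        return t.reshape(b, l, num_heads, d_head).transpose(0, 2, 1, 3)

    # One big projection per q/k/v is equivalent to a different linear
    # transformation for every head once the result is split into heads.
    q = split_heads(x @ params["wq"])
    k = split_heads(memory @ params["wk"])
    v = split_heads(memory @ params["wv"])
    out = toy_dot_product_attention(q, k, v)               # attend per head
    b, h, l, d = out.shape
    out = out.transpose(0, 2, 1, 3).reshape(b, l, h * d)   # concatenate the heads
    return out @ params["wo"]                              # final linear layer

d_model, num_heads = 16, 4
params = {name: np.random.randn(d_model, d_model) / np.sqrt(d_model)
          for name in ("wq", "wk", "wv", "wo")}
x = np.random.randn(2, 8, d_model)
print(toy_multi_head_attention(x, x, num_heads, params).shape)  # (2, 8, 16)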

Multi-Head Attention complexity (Ops / Activations), with n = sequence length, d = depth, k = kernel size; for each of the h heads, dq = dk = dv = d/h:
- Multi-Head Attention with linear transformations: n²·d + n·d² ops / n²·h + n·d activations
- Recurrent: n·d² ops / n·d activations
- Convolutional: k·n·d² ops / n·d activations

Three ways of attention:
- Encoder-Decoder Attention
- Encoder Self-Attention
- Masked Decoder Self-Attention

The Transformer

Machine Translation Results: WMT-14: 29.1 BLEU (En→De), 41.8 BLEU (En→Fr)

Ablations
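The masked decoder self-attention listed above is exactly what the bias argument of dot_product_attention is for: a large negative number is added to the logits of future positions before the softmax, so each position can only attend to itself and earlier positions. A small sketch, continuing the toy NumPy setup above (the -1e9 constant and the helper name are assumptions):

def causal_attention_bias(length, neg_inf=-1e9):
    # 0 where a query may attend (itself and earlier positions),
    # a large negative number where it may not (future positions).
    mask = np.triu(np.ones((length, length)), k=1)       # ones above the diagonal
    return (mask * neg_inf)[None, None, :, :]            # [1, 1, length, length]

bias = causal_attention_bias(8)
masked = toy_dot_product_attention(q, k, v, bias=bias)   # decoder-style self-attention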

Coreference resolution (Winograd schemas)
Sentence / Google Translate / Transformer:
- "The cow ate the hay because it was delicious."
  Google Translate: La vache mangeait le foin parce qu'elle était délicieuse.
  Transformer: La vache a mangé le foin parce qu'il était délicieux.
- "The cow ate the hay because it was hungry."
  Google Translate: La vache mangeait le foin parce qu'elle avait faim.
  Transformer: La vache mangeait le foin parce qu'elle avait faim.
- "The women stopped drinking the wines because they were carcinogenic."
  Google Translate: Les femmes ont cessé de boire les vins parce qu'ils étaient cancérogènes.
  Transformer: Les femmes ont cessé de boire les vins parce qu'ils étaient cancérigènes.
- "The women stopped drinking the wines because they were pregnant."
  Google Translate: Les femmes ont cessé de boire les vins parce qu'ils étaient enceintes.
  Transformer: Les femmes ont cessé de boire les vins parce qu'elles étaient enceintes.
- "The city councilmen refused the female demonstrators a permit because they advocated violence."
  Google Translate: Les conseillers municipaux ont refusé aux femmes manifestantes un permis parce qu'ils préconisaient la violence.
  Transformer: Le conseil municipal a refusé aux manifestantes un permis parce qu'elles prônaient la violence.
- "The city councilmen refused the female demonstrators a permit because they feared violence."
  Google Translate: Les conseillers municipaux ont refusé aux femmes manifestantes un permis parce qu'ils craignaient la violence.
  Transformer: Le conseil municipal a refusé aux manifestantes un permis parce qu'elles craignaient la violence.

Example: Language Model (Generating Wikipedia by Summarizing Long Sequences)

Mixture of Experts: sparse gating instead of a feed-forward layer.
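As a rough sketch of what sparse gating means here, continuing the NumPy examples above: a small gating network scores all experts for each token, only the top-k experts are evaluated, and their outputs are mixed with the renormalized gate weights. The ReLU experts, sizes, and helper names below are simplifying assumptions, not the exact mixture-of-experts layer of Shazeer et al.

def toy_moe_layer(x, expert_weights, gate_weights, k=2):
    # x: [tokens, d_model]; each expert here is just a single ReLU weight matrix.
    gate_logits = x @ gate_weights                       # [tokens, num_experts]
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        top_k = np.argsort(gate_logits[t])[-k:]          # indices of the k best experts
        gates = softmax(gate_logits[t, top_k])           # renormalize over the chosen ones
        for gate, e in zip(gates, top_k):
            # Only the selected experts are evaluated for this token.
            out[t] += gate * np.maximum(x[t] @ expert_weights[e], 0.0)
    return out

num_experts, d_model = 8, 16
experts = [np.random.randn(d_model, d_model) / np.sqrt(d_model) for _ in range(num_experts)]
gate_w = np.random.randn(d_model, num_experts) / np.sqrt(d_model)
tokens = np.random.randn(10, d_model)
print(toy_moe_layer(tokens, experts, gate_w).shape)  # (10, 16)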

Long Text Generation
Generating entire Wikipedia articles by summarizing top search results and references. (Memory-Compressed Attention.)

Sample output:

The Transformer are a Japanese hardcore punk band.

=Early years=
The band was formed in 1968, during the height of Japanese music history. Among the legendary Japanese people|Japanese composers of Japanese lyrics, they prominently exemplified Motohiro Oda's especially tasty lyrics and psychedelic intention. Michio was a longtime member of the every Sunday night band PSM. His alluring was of such importance as being the man who ignored the already successful image and that he municipal makeup whose parents were  the band was called Jenei.<ref> ;
From a young age the band was very close, thus opting to pioneer what had actually begun as a more manageable core hardcore punk band.<ref> ;

=History=
=Born from the heavy metal revolution=
In 1977 the self-proclaimed King of Tesponsors, Joe Lus: It was somewhere. it was just a guile . taking this song to Broadway. It was the first record I ever heard on A.M., After some opposition I received at the hands of Parsons, and in the follow-up notes myself.<ref> ;
The band cut their first record album titled Transformed, furthered and extended Extended,<ref> MC Transformed EP (CDR) by The Moondrawn EMI, 1994</ref>

and in 1978 the official band line-up of the three-piece pop-punk-rock band TEEM. They generally played around Japan, growing from the Top 40 standard.

=1981-2010: The band to break away=
On 1 January 1981 bassist Michio Kono, and the members of the original line-up emerged. Niji Fukune and his Head poet|Head band (now guitarist) Kazuya Kouda left the band in the hands of the band at the May 28, 1981, benefit season of Led Zeppelin's Marmarin building. In June 1987, Kono joined the band as a full-time drummer, playing a few nights in a 4 or 5 hour stint with D-beat. Kono played through the mid-1950s, at Shinlie, continued to play concerts with drummers in Ibis, Cor, and a few at the Leo Somu Studio in Japan. In 1987, Kono recruited new bassist Michio Kono and drummer Ayaka Kurobe as drummer for band. Kono played trumpet with supplement music with Saint Etienne as a drummer. Over the next few years Kono played as drummer and would get many alumni news invitations to the band's Toys Beach section. In 1999 he joined the CT-182. His successor was Barrie Bell on a cover of Jethro Tull(band)|Jethro Tull's original 1967 hit "Back Home" (last appearance was in Jethro), with whom he shares a name.

=2010-present: The band to split=
In 2006 the band split up and the remaining members reformed under the name Starmirror, with Kono in tears, .

The Transformer is a book by British illuminatist Herman Muirhead, set in a post-apocalyptic world that border on a mysterious alien known as the "Transformer Planet" which is his trademark to save Earth. The book is about 25 years old, and it contains forty-one different demographic models of the human race, as in the cases of two fictional groups, Robtobeau "Richard" and "The Transformers Planet".

= Summary =
The book benefits on the 3-D film|3-D film, taking his one-third of the world's pure "answer" and gas age from 30 to 70 within its confines. The book covers the world of the world of Area 51|Binoculars from around the worlds of Earth. It is judged by the ability of telepathy|telepaths and television, and provides color, line, and end-to-end observational work.
To make the book up and document the recoverable quantum states of the universe, in order to inspire a generation that fantasy producing a tele-recording-offering machine is ideal. To make portions of this universe home, he recreates the rostrum obstacle-oriented framework Minou.<ref> )</ref>

= The Transformer=
The book was the first on a Random Access Album|re-issue since its original version of Robtobeau, despite the band naming itself a "Transformer Planet" in the book.<ref name=prweb-the-1985>citeweb|url= |title=The Transformer|publisher= |date=|accessdate=2012-04-25</ref> Today, "The Transformers Planet" is played entirely open-ended, there are more than just the four

previously separate only bands. A number of its groups will live on one abandoned volcano in North America,

=Conceptual The Transformer universe=
Principals a setting-man named “The Supercongo Planet,” who is a naturalistic device transferring voice and humour from The Transformer Planet, whose two vice-maks appear often in this universe existence, and what the project in general are trying to highlight many societal institutions. Because of the way that the corporation has made it, loneliness, confidence, research and renting out these universes are difficult to organise without the bands creating their own universe. The scientist is none other than a singer and musician. Power plants are not only problematic, but if they want programmed them to create and perform the world's first Broadcast of itself once the universe started, but deliberately Acta Biological Station, db.us and BB on The Transformer Planet, The Transformer Planet, aren't other things Scheduled for.:<blockquote>A man called Dick Latanii Bartow, known the greatest radio dot Wonderland administrator at influential arrangers in a craze over the complex World of Biological Predacial Engineer in Rodel bringing Earth into a sort job with fans. During this Soc purportedly Human, Conspiracy was being released to the world as Baron Maadia on planet Nature. A world-renowned scientist named Julia Samur is able to cosmouncish society and run for it - except us who is he and he is before talking this entire T100 before Cell

physiologist Cygnets. Also, the hypnotic Mr. Mattei arrived, so it is Mischief who over-manages for himself - but a rising duplicate of Phil Rideout makes it almost affable. There is plenty of people at work to make use of it and animal allies out of politics. But Someday in 1964, when we were around, we were steadfast against the one man's machine and he did an amazing job at the toe of the mysterious. Mr. Suki who is an engineering desk lecturer at the University of.

Images (Image Transformer)

Image Generation (Model Type / % unrecognized, max = 50%):
- ResNet: 4.0%
- Superresolution GAN (Garcia '16): 8.5%
- PixelRecursive (Dahl et al., 2017): 11%
- Image Transformer: 36.9%

How about GANs? (Are GANs Created Equal? A Large-Scale Study)
- Problem 1: Variance.
- Problem 2: Even the best models are not great: Image Transformer: 36.6.

Multiple Modalities (One Model to Learn Them All)

Multiple Modalities: MultiModel
- Trained on 8 tasks (4 WMT, ImageNet, COCO, WSJ, PTB)
- Images still like convolutions (pure attention doesn't work)
- Modalities: down-stride and up-stride images, embed text
- Architecture: convolutional encoder, transformer decoder
- Convolutional feed-forward layers improve attention on many large tasks
- Capacity: use Mixture-of-Experts as feed-forward layers

How come ImageNet improves PTB results?
MultiModel: 4 WMT, ImageNet, COCO, WSJ, PTB

Tensor2Tensor

Tensor2Tensor (T2T) is a library of deep learning models and datasets designed to make deep learning more accessible and to accelerate ML research.

- Datasets: ImageNet, CIFAR, MNIST, COCO, WMT, LM1B, ...
- Models: ResNet, RevNet, ShakeShake, Xception, SliceNet, Transformer, ByteNet, Neural GPU, LSTM, ...
- Tools: cloud training, hyperparameter tuning, TPU, ...

Tensor2Tensor Code (github)
- data_generators/: datasets, must subclass Problem
- models/: models, must subclass T2TModel
- utils/, bin/, etc.: utilities, binaries, cloud helpers

pip install tensor2tensor && t2t-trainer \
  --generate_data \
  --data_dir=~/t2t_data \
  --output_dir=~/t2t_train/mnist \
  --problems=image_mnist \
  --model=shake_shake \
  --hparams_set=shake_shake_quick \
  --train_steps=1000 \
  --eval_steps=100

Tensor2Tensor Problem Class

@registry.register_problem("wmt_ende_tokens_8k")
class WMTEnDeTokens8k(WMTProblem):
  """Problem spe...
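For orientation only: a registered problem's snake_case name is what the --problems flag selects, so training the Transformer on the problem above would look roughly like the MNIST command earlier (same flag spellings; the output path, hparams set, and step count below are illustrative assumptions, not taken from this deck):

pip install tensor2tensor && t2t-trainer \
  --generate_data \
  --data_dir=~/t2t_data \
  --output_dir=~/t2t_train/wmt_ende \
  --problems=wmt_ende_tokens_8k \
  --model=transformer \
  --hparams_set=transformer_base \
  --train_steps=250000 \
  --eval_steps=100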
