Semantic Style Transfer and Turning Two-Bit Doodles into Fine Artwork

nucl.ai Conference 2016: Artificial Intelligence in Creative Industries. July 18-20, Vienna/Austria.

Alex J. Champandard
nucl.ai Research Laboratory
alexjc@nucl.ai

(This research was funded out of the marketing budget.)

arXiv:1603.01768v1 [cs.CV] 5 Mar 2016

Abstract

Convolutional neural networks (CNNs) have proven highly effective at image synthesis and style transfer. For most users, however, using them as tools can be a challenging task due to their unpredictable behavior that goes against common intuitions. This paper introduces a novel concept to augment such generative architectures with semantic annotations, either by manually authoring pixel labels or using existing solutions for semantic segmentation. The result is a content-aware generative algorithm that offers meaningful control over the outcome. Thus, we increase the quality of the images generated by avoiding common glitches, make the results look significantly more plausible, and extend the functional range of these algorithms, whether for portraits, landscapes, etc. Applications include semantic style transfer and turning doodles with few colors into masterful paintings!

Figure 1: Synthesizing paintings with deep neural networks via analogy. (a) Original painting by Renoir, (b) semantic annotations, (c) desired layout, (d) generated output.

Introduction

Image processing algorithms have improved dramatically thanks to CNNs trained on image classification problems to extract underlying patterns from large datasets (Simonyan and Zisserman 2014). As a result, the deep convolution layers in these networks provide a more expressive feature space compared to raw pixel layers, which proves useful not only for classification but also for generation (Mahendran and Vedaldi 2014). For transferring style between two images in particular, the results are astonishing, especially with painterly, sketch or abstract styles (Gatys, Ecker, and Bethge 2015).

However, to achieve good results using neural style transfer in practice today, users must pay particular attention to the composition and/or style image selection, or risk seeing unpredictable and incorrect patterns. For portraits, facial features can be ruined by incursions of background colors or clothing texture, and for landscapes, pieces of vegetation may be found in the sky or other incoherent places. There's certainly a place for this kind of glitch art, but many users become discouraged when they cannot get the results they want. Through our social media bot that first provided these algorithms as a service (Champandard 2015), we observe that users have clear expectations of how style transfer should occur: most often this matches semantic labels, e.g. hair style and skin tones should transfer respectively, regardless of color.

Unfortunately, while CNNs routinely extract semantic information during classification, such information is poorly exploited by generative algorithms, as evidenced by frequent glitches. We attribute these problems to two underlying causes:

1. While CNNs used for classification can be re-purposed to extract style features (e.g. textures, grain, strokes), they were not architected or trained for correct synthesis.

2. Higher-level layers contain the most meaningful information, but this is not exploited by the lower-level layers used in generative architectures: only error back-propagation indirectly connects layers from top to bottom.

To remedy this, we introduce an architecture that bridges the gap between generative algorithms and pixel labeling neural networks. The architecture commonly used for image synthesis (Simonyan and Zisserman 2014) is augmented with semantic information that can be used during generation. Then we explain how existing algorithms can be adapted to include such annotations, and finally we showcase some applications in style transfer as well as image synthesis by analogy (e.g. Figure 1).

Figure 2: Comparison and breakdown of synthesized portraits, chosen because of extreme color and feature mismatches. Parameters were adjusted to make the style transfer most faithful while reducing artifacts such as patch repetition or odd blends, which proved challenging for the second column but more straightforward in the last column thanks to semantic annotations. The top row shows transfer of a painted style onto a photo (easier), and the bottom row turning the painting into a photo (harder); see the area around the nose and mouth for failures. Original painting by Mia Bergeron.
Related Work

The image analogy algorithm (Hertzmann et al. 2001) is able to transfer artistic style using pixel features and their local neighborhoods. While more recent algorithms using deep neural networks generate better quality results from a stylistic perspective, this technique allows users to synthesize new images based on simple annotations. As for recent work on style transfer, it can be split into two categories: specialized algorithms or more general neural approaches.

The first neural network approach to style transfer is gram-based (Gatys, Ecker, and Bethge 2015), using so-called "Gram matrices" to represent global statistics about the image based on the output from convolution layers. These statistics are computed by taking the inner product of intermediate activations, a tensor operation that results in an N × N matrix for each layer of N channels. During this operation, all local information about pixels is lost, and only the correlations between the different channel activations remain. When glitches occur, it's most often due to these global statistics being imposed onto the target image regardless of its own statistical distribution, and without any understanding of local pixel context.
A more recent alternative involves a patch-based approach (Li and Wand 2016), which also operates on the output of convolution layers. For layers of N channels, neural patches of 3 × 3 are matched between the style and content image using a nearest neighbor calculation. Operating on patches in such a way gives the algorithm a local understanding of the patterns in the image, which overall improves the precision of the style transfer, since fewer errors are introduced by globally enforcing statistical distributions.

Both gram- and patch-based approaches struggle to provide reliable user controls to help address glitches. The primary parameter exposed is a weighting factor between style and content; adjusting this results in either an abstract-styled mashup that mostly ignores the input content image, or the content appears clearly but its texture looks washed out (see Figure 2, second column). Finding a compromise where the content is replicated precisely and the style is faithful remains a challenge, in particular because the algorithm lacks semantic understanding of the input.

Thankfully, recent CNN architectures are capable of providing such semantic context, typically by performing pixel labeling and segmentation (Thoma 2016). These CNNs rely primarily on convolutional layers to extract high-level patterns, then use deconvolution to label the individual pixels. However, such insights are not yet used for synthesis, despite the benefits shown by non-neural approaches.

The state-of-the-art specialized approaches to style transfer exploit semantic information to great effect, performing color transfer on photo portraits using specifically crafted image segmentation (Yang et al. 2015). In particular, facial features are extracted to create masks for the image, then the masked segments are processed independently and colors can be transferred between each corresponding part (e.g. background, clothes, mouth, eyes, etc.). Thanks to the additional semantic information, even simpler histogram matching algorithms may be used to transfer colors successfully.
Our contribution builds on a patch-based approach (Li and Wand 2016) to style transfer, using optimization to minimize both the content reconstruction error Ec (weighted by α) and the style remapping error Es (weighted by β). See (Gatys, Ecker, and Bethge 2015) for details about Ec.

    E = α Ec + β Es    (1)

First we introduce an augmented CNN (Figure 3) that incorporates semantic information, then we define the input semantic map and its representation, and finally we show how the algorithm is able to exploit this additional information.
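To make the role of Equation 1 concrete, here is a minimal sketch of the optimization loop using scipy's L-BFGS, the solver the paper names. The quadratic stand-in losses, array shapes, and variable names are our own assumptions: the real Ec and Es are evaluated on CNN activations, not raw pixels.

```python
import numpy as np
from scipy.optimize import fmin_l_bfgs_b

# Weights from Equation 1; alpha = 10 matches the paper's default,
# beta is one value from its reported 25..250 range.
alpha, beta = 10.0, 50.0
content_target = np.random.rand(16, 16)  # stand-in content features
style_target = np.random.rand(16, 16)    # stand-in style features

def loss_and_grad(flat_pixels):
    """E = alpha * Ec + beta * Es with toy quadratic losses, plus gradient."""
    x = flat_pixels.reshape(16, 16)
    ec_residual = x - content_target
    es_residual = x - style_target
    e = 0.5 * alpha * np.sum(ec_residual ** 2) \
        + 0.5 * beta * np.sum(es_residual ** 2)
    grad = alpha * ec_residual + beta * es_residual
    return e, grad.ravel()

x0 = np.random.rand(16 * 16)  # random seed image, as in the experiments
x_opt, e_min, info = fmin_l_bfgs_b(loss_and_grad, x0, maxiter=100)
print(e_min)
```

The structure is what matters here: the pixels themselves are the optimization variables, and the two weighted error terms pull them toward the content and the style respectively.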
Architecture

The most commonly used CNN for image synthesis is VGG (Simonyan and Zisserman 2014), which combines pooling and convolution layers l with 3 × 3 filters (e.g. the first layer after the third pool is named conv4_1). Intermediate post-activation results are labeled x^l and consist of N channels, which capture patterns from the images for each region of the image: grain, colors, texture, strokes, etc. Other architectures tend to skip pixels regularly, compress data, or are optimized for classification, resulting in low-quality synthesis (Nikulin and Novak 2016).

Our augmented network concatenates additional semantic channels m^l of size M at the same resolution, computed by down-sampling a static semantic map specified as input. The result is a new output with N + M channels, denoted s^l and labeled accordingly for each layer (e.g. sem4_1). Before concatenation, the semantic channels are weighted by a parameter γ to provide an additional user control point:

    s^l = x^l ‖ γ m^l    (2)

where ‖ denotes concatenation along the channel axis. For style images, the activations for the input image and its semantic map are concatenated together as s_s^l. For the output image, the current activations x^l and the input content's semantic map are concatenated as s^l. Note that the semantic part of this vector is, therefore, static during the optimization process (implemented using L-BFGS).

Figure 3: Our augmented CNN, which uses regular filters of N channels (top), concatenated with a semantic map of M=1 channel (bottom), either output by another network capable of labeling pixels or provided as manual annotations.
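As a sketch of Equation 2, the following numpy fragment concatenates a γ-weighted, downsampled semantic map onto a layer's activations. The nearest-neighbor downsampling scheme and all names are our assumptions; the paper only requires some downsampled copy of the static map at each layer's resolution.

```python
import numpy as np

def augment_activations(x, sem_map, gamma=50.0):
    """Concatenate gamma-weighted semantic channels onto layer activations.

    x:       (N, H, W) activations of one convolution layer.
    sem_map: (M, H0, W0) static semantic map at input resolution.
    Returns s = x || gamma * m with shape (N + M, H, W).  (Equation 2)
    """
    n, h, w = x.shape
    m_ch, h0, w0 = sem_map.shape
    # Nearest-neighbor downsampling of the map to the layer's resolution.
    rows = np.arange(h) * h0 // h
    cols = np.arange(w) * w0 // w
    m = sem_map[:, rows][:, :, cols]
    return np.concatenate([x, gamma * m], axis=0)

x = np.random.rand(256, 32, 32)    # e.g. conv3_1 activations
sem = np.random.rand(3, 128, 128)  # RGB annotation map (M = 3)
s = augment_activations(x, sem)
print(s.shape)  # (259, 32, 32) -> the layer "sem3_1"
```

Because the semantic channels are simply appended, any downstream computation that consumed x^l can consume s^l unchanged, which is exactly what makes the augmentation transparent to existing implementations.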
This architecture allows specifying manually authored semantic maps, which proves to be a very convenient tool for user control, addressing the unpredictability of current generative algorithms. It also lets us transparently integrate recent pixel labeling CNNs (Thoma 2016), and leverage any advances in this field to apply them to image synthesis.

Representation

The input semantic map can contain an arbitrary number of channels M. Whether doing image synthesis or style transfer, there are only two requirements:

1. Each image has its own semantic map of the same aspect ratio, though it can be lower resolution (e.g. 4x smaller) since it'll be downsampled anyway.

2. The semantic maps may use an arbitrary number of channels and any representation, as long as they are consistent for the current style and content (so M must be the same).

Common representations include single greyscale channels or RGB+A colors, both of which are very easy to author. The semantic map can also be a collection of layer masks, one per label, as output by existing CNNs, or even some kind of "semantic embedding" that compactly describes image pixels (i.e. the representations for hair, beards, and eyebrows in portraits would be in close proximity).
Algorithm

Patches of k × k are extracted from the semantic layers and denoted by the function Φ, respectively Φ(s_s^l) for the input style patches and Φ(s^l) for the current image patches. For any patch i in the current image and layer l, its nearest neighbor NN(i) is computed using normalized cross-correlation, taking the weighted semantic map into account:

    NN(i) := arg max_j ( Φ_i(s) · Φ_j(s_s) ) / ( |Φ_i(s)| · |Φ_j(s_s)| )    (3)

The style error Es between all the patches i of layer l in the current image and their closest style patches is defined as the sum of the Euclidean distances:

    Es(s, s_s) = Σ_i | Φ_i(s) − Φ_NN(i)(s_s) |²    (4)

Note that the information from the semantic map in m^l is used to compute the best matching patches and contributes to the loss value, but it is not part of the derivative of the loss relative to the current pixels; only the differences in activation x^l compared to the style patches cause an adjustment of the image itself via the L-BFGS algorithm.

By using an augmented CNN that's compatible with the original, existing patch-based implementations can use the additional semantic information without changes. If the semantic map m^l is zero, the original algorithm (Li and Wand 2016) is intact. In fact, the introduction of the γ parameter from Equation 2 provides a convenient way to introduce semantic style transfer incrementally.
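The following numpy sketch walks through Equations 3 and 4 on toy data: extract k × k patches from the augmented layers, match each current patch to its best style patch by normalized cross-correlation, and sum the squared distances. The brute-force loops and all names are our own; a practical implementation would vectorize the patch extraction and run on the GPU.

```python
import numpy as np

def extract_patches(s, k=3):
    """All k x k patches of an augmented layer s of shape (C, H, W),
    flattened into rows of a (num_patches, C*k*k) matrix."""
    c, h, w = s.shape
    patches = []
    for i in range(h - k + 1):
        for j in range(w - k + 1):
            patches.append(s[:, i:i+k, j:j+k].ravel())
    return np.array(patches)

def nearest_neighbors(p_cur, p_sty):
    """NN(i) by normalized cross-correlation (Equation 3)."""
    cur = p_cur / np.linalg.norm(p_cur, axis=1, keepdims=True)
    sty = p_sty / np.linalg.norm(p_sty, axis=1, keepdims=True)
    return np.argmax(cur @ sty.T, axis=1)  # most correlated style patch

def style_error(p_cur, p_sty, nn):
    """Sum of squared Euclidean distances to matched patches (Equation 4)."""
    return np.sum((p_cur - p_sty[nn]) ** 2)

# Toy usage with small augmented layers (N + M = 8 channels).
s_cur = np.random.rand(8, 10, 10)  # current image, semantic channels included
s_sty = np.random.rand(8, 10, 10)  # style image likewise
pc, ps = extract_patches(s_cur), extract_patches(s_sty)
nn = nearest_neighbors(pc, ps)
print(style_error(pc, ps, nn))
```

Because the semantic channels are part of each patch vector, patches from mismatched segments correlate poorly and are effectively excluded from matching, which is how the annotations steer the transfer.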
Experiments

The following experiments were generated from the VGG19 network using the augmented layers sem3_1 and sem4_1, with 3 × 3 patches and no additional rotated or scaled versions of the style images. The semantic maps used were manually edited as RGB images, thus the channels are in the range 0..255. The seed for the optimization was random, and rendering completed in multiple increasing resolutions, as usual for patch-based approaches (Li and Wand 2016). On a GTX 970 with 4 GB of GPU RAM, rendering takes from 3 to 8 minutes depending on quality and resolution.
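A minimal sketch of such a coarse-to-fine schedule, under our own assumptions: the resolutions and the render() stub are placeholders, not the paper's implementation.

```python
import numpy as np

def render(seed, resolution):
    # Placeholder for one full style-transfer optimization at this size;
    # a real implementation would run the L-BFGS loop described above.
    return seed

result = np.random.rand(64, 64, 3)  # random seed at the smallest size
for size in (64, 128, 256, 512):    # increasing resolutions
    zoom = size // result.shape[0]
    if zoom > 1:                    # nearest-neighbor upsample of last pass
        result = result.repeat(zoom, axis=0).repeat(zoom, axis=1)
    result = render(result, size)
```

Seeding each pass with the upsampled previous result lets the cheap low-resolution passes settle the overall layout before the expensive high-resolution passes refine the texture.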
Figure 4: Examples of semantic style transfer with a Van Gogh painting. Annotations for the nose and mouth are not required as the images are similar; however, carefully annotating the eyeballs helps when generating photo-quality portraits. Photo by Seth Johnson, concept by Kyle McDonald.

Precise Control via Annotations

Transferring style in faces is arguably the most challenging task to meet expectations, particularly if the colors in the corresponding segments of the image are opposed. Typical results from our solution are shown in the portraits from Figure 2, which contains both success cases (top row) and sub-optimal results (bottom row). The input images were chosen once upfront and not curated to showcase representative results; the only iteration was in using the semantic map as a tool to improve the quality of the output.

In the portraits, the semantic map includes four main labels for background, clothing, skin and hair, with minor color variations for the eyes, mouth, nose and ears. (The semantic maps in this paper are shown as greyscale, but contain three channels.)

In practice, using semantic maps as annotations helps alleviate issues with patch- or gram-based style transfer. Often, repeating patches appear when the style weight is set too high (Figure 2, second row). When the style weight is low, patterns are not transferred but lightly blended (Figure 2, first row). The semantic map prevents these issues by allowing the style weight to vary relative to the content without suffering from such artifacts; note in particular that the skin tones and background colors are transferred more faithfully.

Parameter Ranges

Given a fixed weight for the content loss α = 10, the style loss β for the images in this paper ranges from 25 to 250 depending on the image pairs. Figure 5 shows a grid with visualizations of the results as β and γ vary; we note the following:
1. The quality and variety of the style degenerates as β increases too far, without noticeably improving the precision w.r.t. the annotations.

2. As γ decreases, the algorithm reverts to its semantically unaware version that ignores the annotations provided, but this also indirectly causes an increase in style weight.

3. The default value of γ is chosen to equalize the value range of the semantic channels m^l and the convolution activations x^l, in this case γ = 50.

4. Lowering γ from its default allows style to be reused across semantic borders, which may be useful for certain applications if used carefully.

In general, with the recommended default value of γ, adjusting the style weight β now allows meaningful interpolation that does not degenerate into abstract patchworks.

Analysis

Here we report observations from working with the algorithm, and provide our interpretations.
Semantic Map Values. Since the semantic channels m^l are integrated into the same patch-based calculation, they affect how the normalized cross-correlation takes place. If the channel range is large, the values from the convolution x^l will be scaled very differently depending on the location in the map. This may be desired, but in most cases it seems sensible to make sure the values in m^l have a similar magnitude.
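One simple way to follow this advice, sketched under our own assumptions (the scaling rule is ours, not prescribed by the paper):

```python
import numpy as np

def match_magnitude(m, x):
    """Rescale semantic channels m so their average magnitude matches the
    convolution activations x they are concatenated with (our heuristic)."""
    scale = np.abs(x).mean() / max(np.abs(m).mean(), 1e-8)
    return m * scale

sem = np.random.rand(3, 32, 32) * 255.0  # RGB map authored in 0..255
acts = np.random.rand(256, 32, 32)       # activations live in a smaller range
print(np.abs(match_magnitude(sem, acts)).mean())  # now comparable to acts
```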
Authored Representations. We noticed that when users are asked to annotate images, after a bit of experience with the system they implicitly create "semantic embeddings" that compactly describe pixel classes. For example, the representation of stubble would be a blend between hair and skin, jewelry is similar but not identical to clothing, etc. Such representations seem better suited to semantic style transfer than plain layer masks.

Content Accuracy vs. Style Quality. When using semantic maps, only the style patches from the appropriate segment can be used for the target image. When the number of source patches is small, this causes repetitive patterns, as witnessed in parts of Figure 2. This can be addressed by loosening the style constraint and lowering γ, at the cost of precision.
Figure 5: Varying parameters for the style transfer. The first column shows changes in style weight β: 0) content reconstruction, 10 to 50) artifact-free blends thanks to the semantic constraint, 250) best style quality. The second column shows values of semantic weight γ: 0) style overpowers content without the semantic constraint, 10) low semantic weight strengthens the influence of style, 50) default value that equalizes channels, 250) high semantic weight lowers the quality of the style.

Figure 6: Deep image analogy for a Monet painting based on a doodle; it's effectively semantic style transfer with no content loss. This result was achieved in only eight attempts.