




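School of Computer Science and Information Technology, Shanxi University
Institute of Big Data Science and Industry, Shanxi University
December 2017

Knowledge Engineering

OUTLINE
Generative Adversarial Nets (GANs)
Deep Convolutional Generative Adversarial Networks (DCGAN)
Conditional Generative Adversarial Nets (CGAN)

Generative Adversarial Nets (GANs)

Supervised learning often trains better than unsupervised learning. In the real world, however, the labeled data that supervised learning needs is relatively scarce. Researchers have therefore never stopped looking for better unsupervised learning strategies, hoping to learn representations of the real world, and even knowledge about it, from massive amounts of unlabeled data, and so to understand the world better.

There are many ways to evaluate unsupervised learning, and the generation task is the most direct of them. Only when we can generate/create the real world can we claim to understand it completely. However, the generative models that generation tasks rely on tend to run into two major difficulties. First, modeling the real world requires a great deal of prior knowledge, including which prior and which distribution to choose, and the quality of that modeling directly determines how well the generative model performs. Second, real-world data is often very complex, so the computation needed to fit the model is often enormous, even prohibitive.

Generative Adversarial Networks (GANs), proposed by Ian Goodfellow, neatly sidestep both difficulties. Every GAN framework contains a pair of models: a generative model (G) and a discriminative model (D). Because of D, G no longer needs prior knowledge or complex modeling of the real data. It can still learn to approximate the real data, until the data it generates is realistic enough that even D cannot tell it apart from real data.

The model's optimization objective from the paper:

\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{data}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]

- Sample a minibatch of m examples {x_1, x_2, ..., x_m}.
- Sample a minibatch of m noise samples {z_1, z_2, ..., z_m}.

Code walkthrough and experimental results:

    import numpy as np
    import tensorflow as tf  # TensorFlow 1.x API (tf.log, tf.train.AdamOptimizer)

    # Define the discriminator
    def discriminator(x):
        # Compute D_h1 = ReLU(x * D_W1 + D_b1); the input to this layer is a 784-element vector
        D_h1 = tf.nn.relu(tf.matmul(x, D_W1) + D_b1)
        # Compute the third-layer output. Because a sigmoid is used, the output is a
        # scalar between 0 and 1 (see the weight definitions above), i.e. a judgment of
        # whether the input image is real (=1) or fake (=0)
        D_logit = tf.matmul(D_h1, D_W2) + D_b2
        D_prob = tf.nn.sigmoid(D_logit)
        # Return the probability of "real" and the third-layer input. D_logit is returned so
        # it can be fed to tf.nn.sigmoid_cross_entropy_with_logits() to build a loss function
        return D_prob, D_logit

    # Generate an m x n random matrix with uniformly distributed entries;
    # the sampled z is the input to the generator
    def sample_Z(m, n):
        return np.random.uniform(-1., 1., size=[m, n])

    # Define the generator
    def generator(z):
        # The first layer computes y = z * G_W1 + G_b1 and applies the activation,
        # G_h1 = ReLU(y); G_h1 is the activation output of the second layer
        G_h1 = tf.nn.relu(tf.matmul(z, G_W1) + G_b1)
        # The next two statements propagate from the second layer to the third. The third-layer
        # activation is a 784-element vector, which reshaped to 28x28 represents an image
        G_log_prob = tf.matmul(G_h1, G_W2) + G_b2
        G_prob = tf.nn.sigmoid(G_log_prob)
        return G_prob

    # Feed real images and generated images to the discriminator to judge real vs. fake
    # (X, G_sample, the weights, theta_D and theta_G are defined elsewhere and not shown here)
    D_real, D_logit_real = discriminator(X)
    D_fake, D_logit_fake = discriminator(G_sample)

    # Discriminator loss and generator loss from the original paper
    D_loss = -tf.reduce_mean(tf.log(D_real) + tf.log(1. - D_fake))
    G_loss = -tf.reduce_mean(tf.log(D_fake))

    # Optimize both networks with Adam; var_list names the weights
    # that are updated when minimizing each loss
    D_solver = tf.train.AdamOptimizer().minimize(D_loss, var_list=theta_D)
    G_solver = tf.train.AdamOptimizer().minimize(G_loss, var_list=theta_G)

Advantages of GANs:
1. Judging by actual results, they appear to produce better samples than other models (sharper, clearer images).
2. The GAN framework can train any kind of generator network. Most other frameworks require the generator network to have a particular functional form, such as a Gaussian output layer. Importantly, all other frameworks require the generator network to put non-zero mass everywhere. GANs can learn to generate points only on a thin manifold close to the data.
3. There is no need to design a model that obeys any kind of factorization; any generator network and any discriminator will work.
4. No repeated sampling with Markov chains is needed, and no inference is needed during learning, which sidesteps the problem of approximating intractable probabilities.

Disadvantages of GANs:
1. Non-convergence. The basic problem is that all the theory says GANs should perform excellently at a Nash equilibrium, but gradient descent is only guaranteed to reach a Nash equilibrium when the functions are convex. When both players are represented by neural networks, they can keep adjusting their strategies forever without ever reaching equilibrium [Ian Goodfellow (OpenAI), Quora].
2. Hard to train: the collapse problem. GAN learning can collapse: the generator starts to degenerate, always produces the same sample points, and cannot keep learning [Improved Techniques for Training GANs].
3. No explicit modeling, so the model is too free and hard to control. Unlike other generative models, the adversarial approach no longer requires an assumed data distribution. It samples directly from a distribution and can in theory approximate the real data arbitrarily well, which is also the greatest strength of GANs. The downside of not modeling in advance is too much freedom: for larger images with more pixels, a simple GAN becomes hard to control. In the GAN paper [Goodfellow Ian, Pouget-Abadie J], each parameter update has D updated k times for every single update of G, for similar reasons.

Conditional Generative Adversarial Nets (CGAN)

In this work we introduce the conditional version of generative adversarial nets, which can be constructed by simply feeding the data, y, we wish to condition on to both the generator and discriminator. We show that this model can generate MNIST digits conditioned on class labels. We also illustrate how this model could be used to learn a multi-modal model, and provide preliminary examples of an application to image tagging in which we demonstrate how this approach can generate descriptive tags which are not part of training labels.

Generative adversarial nets can be extended to a conditional model if both the generator and discriminator are conditioned on some extra information y. y could be any kind of auxiliary information, such as class labels or data from other modalities. We can perform the conditioning by feeding y into both the discriminator and generator as an additional input layer.

In the generator, the prior input noise p_z(z) and y are combined in a joint hidden representation, and the adversarial training framework allows for considerable flexibility in how this hidden representation is composed. In the discriminator, x and y are presented as inputs to a discriminative function (embodied again by an MLP in this case).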
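The conditioning described for CGAN can be illustrated with a minimal sketch. It assumes the simplest way of combining the inputs, plain concatenation of a one-hot label y with the noise z (for the generator) and with the image x (for the discriminator); the text only says the two are combined in a joint hidden representation, so this is one possible choice rather than the paper's exact architecture. The helper names, the noise size of 100, and the 10 classes are hypothetical values chosen for illustration; 784 is the flattened 28x28 image size used earlier in this document.

```python
import random


def one_hot(label, num_classes):
    """Encode an integer class label as a one-hot list of floats."""
    vec = [0.0] * num_classes
    vec[label] = 1.0
    return vec


def generator_input(z, y):
    """Assumed conditioning: append the condition y to the noise vector z."""
    return list(z) + list(y)


def discriminator_input(x, y):
    """Assumed conditioning: append the condition y to the data vector x."""
    return list(x) + list(y)


random.seed(0)
noise_dim, num_classes, image_dim = 100, 10, 784  # hypothetical sizes

z = [random.uniform(-1.0, 1.0) for _ in range(noise_dim)]  # noise sample
y = one_hot(3, num_classes)                                # condition: class 3
x = [0.0] * image_dim                                      # stand-in for a flattened image

g_in = generator_input(z, y)       # what a conditional generator would receive
d_in = discriminator_input(x, y)   # what a conditional discriminator would receive
print(len(g_in), len(d_in))
```

Both networks see the same y, so the discriminator can penalize a sample that looks realistic but does not match its condition.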
Deep Convolutional Generative Adversarial Networks (DCGAN)

In recent years, supervised learning with convolutional networks (CNNs) has seen huge adoption in computer vision applications. Comparatively, unsupervised learning with CNNs has received less attention. In this work we hope to help bridge the gap between the success of CNNs for supervised learning and unsupervised learning. We introduce a class of CNNs called deep convolutional generative adversarial networks (DCGANs), that have certain architectural constraints, and demonstrate that they are a strong candidate for unsupervised learning. Training on various image datasets, we show convincing evidence that our deep convolutional adversarial pair learns a hierarchy of representations from object parts to scenes in both the generator and discriminator. Additionally, we use the learned features for novel tasks, demonstrating their applicability as general image representations.

In this paper, we make the following contributions:
- We propose and evaluate a set of constraints on the architectural topology of Convolutional GANs that make them stable to train in most settings. We name this class of architectures Deep Convolutional GANs (DCGAN).
- We use the trained discriminators for image classification tasks, showing competitive performance with other unsupervised algorithms.
- We visualize the filters learnt by GANs and empirically show that specific filters have learned to draw specific objects.
- We show that the generators have interesting vector arithmetic properties allowing for easy manipulation of many semantic qualities of generated samples.

Background:
Historical attempts to scale up GANs using CNNs to model images have been unsuccessful. We also encountered difficulties attempting to scale GANs using CNN architectures commonly used in the supervised literature. However, after extensive model exploration we identified a family of architectures that resulted in stable training across a range of datasets and allowed for training higher resolution and deeper generative models. Core to our approach is adopting and modifying three recently demonstrated changes to CNN architectures.

Architecture guidelines for stable Deep Convolutional GANs
- Replace any pooling layers with strided convolutions (discriminator) and fractional-strided convolutions (generator).
- Use batchnorm in both the generator and the discriminator.
- Remove fully connected hidden layers for deeper architectures.
- Use ReLU activation in generator for all layers except for the output, which uses Tanh.
- Use LeakyReLU activation in the discriminator for all layers.

APPROACH AND MODEL ARCHITECTURE

The first is the all convolutional net which replaces deterministic spatial pooling functions (such as maxpooling) with strided convolutions. We use this approach in our generator, allowing it to learn its own spatial upsampling, and discriminator.

Second is the trend towards eliminating fully connected layers on top of convolutional features. The strongest example of this is global average pooling which has been utilized in state of the art image classification models (Mordvintsev et al.). We found global average pooling increased model stability but hurt convergence speed. A middle ground of directly connecting the highest convolutional features to the input and output respectively of the generator and discriminator worked well. The first layer of the GAN, which takes a uniform noise distribution Z as input, could be called fully connected as it is just a matrix multiplication, but the result is reshaped into a 4-dimensional tensor and used as the start of the convolution stack. For the discriminator, the last convolution layer is flattened and then fed into a single sigmoid output. See Fig. 1 for a visualization of an example model architecture.

Generator model: [figure]

Discriminator model:

    # lrelu, conv2d and linear are helper functions defined elsewhere in the code base
    # Input layer: convolution + LeakyReLU, no batchnorm
    h0 = lrelu(conv2d(image, self.df_dim, name='d_h0_conv'))
    # Convolution + batchnorm + LeakyReLU with 2x, 4x and 8x the base filter count
    h1 = lrelu(self.d_bn1(conv2d(h0, self.df_dim*2, name='d_h1_conv')))
    h2 = lrelu(self.d_bn2(conv2d(h1, self.df_dim*4, name='d_h2_conv')))
    h3 = lrelu(self.d_bn3(conv2d(h2, self.df_dim*8, name='d_h3_conv')))
    # Flatten the last convolution output and map it to a single output unit
    h4 = linear(tf.reshape(h3, [self.batch_size, -1]), 1, 'd_h4_lin')

Third is Batch Normalization (Ioffe & Szegedy, 2015) which stabilizes learning by normalizing the input to each unit to have zero mean and unit variance. This helps deal with training problems that arise due to poor initialization and helps gradient flow in deeper models. Directly applying batchnorm to all layers, however, resulted in sample oscillation and model instability. This was avoided by not applying batchnorm to the generator output layer and the discriminator input layer.

The ReLU activation (Nair & Hinton, 2010) is used in the generator with the exception of the output layer which uses the Tanh function. Within the discriminator we found the leaky rectified activation (Maas et al., 2013) (Xu et al., 2015) to work well, especially for higher resolution modeling. This is in contrast to the original GAN paper, which used the maxout activation (Goodfellow et al., 2013).

Training details
1. Mini-batch training with a batch size of 128.
2. All parameters are initialized from a normal distribution with mean 0 and standard deviation 0.02.
3. The LeakyReLU slope is 0.2.
4. Whereas earlier GAN work used momentum to accelerate training, DCGAN uses the Adam optimizer with tuned hyperparameters.
5. Learning rate = 0.0002.
6. The momentum parameter beta (β1 in Adam) is reduced from 0.9 to 0.5 to prevent oscillation and instability.

4.1 LSUN

As visual quality of samples from generative image models has improved, concerns of over-fitting and memorization of training samples have risen. To demonstrate how our model scales with more data and higher resolution generation, we train a model on the LSUN bedrooms dataset containing a little over 3 million training examples. Recent analysis has shown that there is a direct link between how fast models learn and their generalization performance (Hardt et al., 2015). We show samples from one epoch of training (Fig. 2), mimicking online learning, in addition to samples after convergence (Fig. 3), as an opportunity to demonstrate that our model is not producing high quality samples via simply overfitting/memorizing training examples. No data augmentation was applied to the images.

4.1.1 DEDUPLICATION

To further decrease the likelihood of the generator memorizing input examples (Fig. 2) we perform a simple image de-duplication process. We fit a 3072-128-3072 de-noising dropout regularized ReLU autoencoder on 32x32 downsampled center-crops of training examples. The resulting code layer activations are then binarized via thresholding the ReLU activation which has been shown to be an effective information preserving t
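
The binarization step in the de-duplication process can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the threshold of 0, the exact-match bucketing of binary codes, and the function names are all hypothetical, and the toy codes have 4 units instead of the 128 of the autoencoder's code layer.

```python
from collections import defaultdict


def binarize(activations, threshold=0.0):
    """Turn ReLU code-layer activations into a tuple of 0/1 bits.

    The threshold value is an assumption made for this sketch.
    """
    return tuple(1 if a > threshold else 0 for a in activations)


def find_duplicate_groups(codes, threshold=0.0):
    """Group example indices whose binarized codes match exactly."""
    buckets = defaultdict(list)
    for index, activations in enumerate(codes):
        buckets[binarize(activations, threshold)].append(index)
    # Only buckets holding more than one example are candidate duplicates
    return [indices for indices in buckets.values() if len(indices) > 1]


# Toy code-layer activations for three examples (hypothetical values)
codes = [
    [0.0, 1.3, 0.0, 0.7],
    [0.0, 0.9, 0.0, 2.1],
    [0.5, 0.0, 0.0, 0.0],
]
groups = find_duplicate_groups(codes)
print(groups)
```

Because each binary code acts as a hash key, candidate duplicates are found in a single pass over the data instead of by comparing every pair of images.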