Graduation Thesis Foreign-Language Material Translation

School: ___    Major and class: ___    Student name: ___    Student ID: ___    Advisor: ___
Source of the foreign material: Notes on Convolutional Neural Networks, Section 3
Attachments: 1. Translation of the foreign material; 2. Original text

Advisor's comments: The English material is well chosen and highly relevant to the graduation project. Technical terms and vocabulary are translated accurately, showing a strong command of technical English. The translation is careful and thorough, follows the regulations strictly, conveys the meaning of the original correctly, meets the required length, and reads fluently. The work fulfills the purpose of assessing and improving the student's practical English and meets the requirements well.
Signature: ___    Date: ___

1. Translation of the Foreign Material

Notes on Convolutional Neural Networks

3 Convolutional Neural Networks

Typically, convolutional layers are interspersed with sub-sampling layers to reduce computation time and to gradually build up further spatial and configural invariance. A small sub-sampling factor is desirable, however, in order to maintain specificity at the same time. This idea is of course not new, but the concept is both simple and powerful: the mammalian visual cortex and models of it [12, 8, 7] draw heavily on these themes, and over the past decade or so auditory neuroscience has revealed that the same design paradigms can be found in the primary and belt auditory areas of the cortex in a number of different animals [6, 11, 9]. Hierarchical analysis and learning architectures may yet be the key to success in the auditory domain as well.

3.1 Convolution Layers

We now derive the backpropagation updates for the convolutional layers of the network. At a convolution layer, the previous layer's feature maps are convolved with learnable kernels and passed through the activation function to form the output feature maps. Each output map may combine convolutions with several input maps. In general, we have

    x_j^\ell = f\left( \sum_{i \in M_j} x_i^{\ell-1} * k_{ij}^\ell + b_j^\ell \right)

where M_j denotes a selection of input maps, and the convolution is of the "valid" border-handling type when implemented in MATLAB. Common choices of input maps include all pairs or all triplets of input maps, but we will discuss below how such combinations can be learned. Each output map is given an additive bias b; however, for a particular output map the input maps are convolved with distinct kernels. That is, if output maps j and k both sum over input map i, then the kernels applied to map i for output map j and for output map k are different.
3.1.1 Computing the Gradients

We assume that each convolution layer ℓ is followed by a down-sampling layer ℓ+1. According to the backpropagation algorithm, to compute the sensitivity of a unit at layer ℓ we first sum the sensitivities of the next-layer units that are connected to the node of interest in the current layer, multiplying each by the corresponding weight defined at layer ℓ+1, and then multiply this quantity by the derivative of the activation function evaluated at the current layer's pre-activation input u. In the case of a convolutional layer followed by a down-sampling layer, one pixel in the next layer's sensitivity map corresponds to a block of pixels in the convolutional layer's output map, so each unit in a map at layer ℓ connects to only one unit in the corresponding map at layer ℓ+1. To compute the sensitivities at layer ℓ efficiently, we can upsample the down-sampling layer's sensitivity map so that it is the same size as the convolutional layer's map, and then multiply it element-wise by the activation-derivative map at layer ℓ. The "weights" defined at the down-sampling layer map are all equal to β (a constant; see section 3.2), so we only need to scale the result of the previous step by β to finish the computation of δ^ℓ. We repeat the same computation for each feature map j in the convolutional layer, pairing it with the corresponding map in the sub-sampling layer:

    \delta_j^\ell = \beta_j^{\ell+1} \left( f'(u_j^\ell) \circ \mathrm{up}(\delta_j^{\ell+1}) \right)

where up(·) denotes an upsampling operation that, if the sub-sampling layer subsamples by a factor of n, tiles each pixel of its input n times horizontally and n times vertically in the output. As we discuss below, one efficient way to implement this function is the Kronecker product:

    \mathrm{up}(x) \equiv x \otimes 1_{n \times n}

Now that we have the sensitivities for a given map, the bias gradient is obtained immediately by summing over all entries of δ_j^ℓ:

    \frac{\partial E}{\partial b_j} = \sum_{u,v} (\delta_j^\ell)_{uv}

Finally, the gradients for the kernel weights are computed by backpropagation, except that here the same weights are shared across many connections. We therefore sum the gradients for a given weight over all the connections that mention this weight, just as we did for the bias term:

    \frac{\partial E}{\partial k_{ij}^\ell} = \sum_{u,v} (\delta_j^\ell)_{uv} \, (p_i^{\ell-1})_{uv}

where (p_i^{ℓ-1})_{uv} is the patch of x_i^{ℓ-1} that was multiplied element-wise by k_{ij}^ℓ during the convolution in order to compute the element at (u, v) of the output map x_j^ℓ. At first glance it may seem that we need to painstakingly track which patches of the input map correspond to which pixels of the output map (and of its sensitivity map), but equation (7) can be implemented in a single line of MATLAB using convolution over the valid region of overlap:

    \frac{\partial E}{\partial k_{ij}^\ell} = \mathrm{rot180}\!\left( \mathrm{conv2}\big( x_i^{\ell-1}, \, \mathrm{rot180}(\delta_j^\ell), \, \text{'valid'} \big) \right)

Here we rotate the δ image so that the convolution performs cross-correlation rather than convolution, and we rotate the output back so that the kernel has the expected orientation when we convolve with it during the feed-forward pass.
3.2 Sub-sampling Layers

A sub-sampling layer produces down-sampled versions of its input maps. If there are N input maps, there will be exactly N output maps, although the output maps are smaller. More formally,

    x_j^\ell = f\left( \beta_j^\ell \, \mathrm{down}(x_j^{\ell-1}) + b_j^\ell \right)

where down(·) denotes a sub-sampling function. Typically this function sums over each distinct n-by-n block of the input image, so that the output image is n times smaller along both dimensions. Each output map is given its own multiplicative bias β and additive bias b. Alternatively, we can simply throw away every other sample in the image [10].

3.2.1 Computing the Gradients

The difficulty here lies in computing the sensitivity maps. Once we have them, the only learnable parameters we need to update are the bias parameters β and b. We assume that the sub-sampling layer is surrounded above and below by convolution layers. If the layer following the sub-sampling layer is instead a fully connected layer, then the sensitivity maps for the sub-sampling layer can be computed with the vanilla backpropagation equations introduced in section 2. When we computed the kernel gradients in section 3.1.1, we had to work out which patch of the input corresponded to a given pixel of the output map. Here, we must work out which patch of the current layer's sensitivity map corresponds to a given pixel of the next layer's sensitivity map, in order to apply a delta recursion that looks something like equation (4). Of course, the weights multiplying the connections between an input patch and an output pixel are exactly the weights of the (rotated) convolution kernel. This is again implemented efficiently with convolution:

    \delta_j^\ell = f'(u_j^\ell) \circ \mathrm{conv2}\big( \delta_j^{\ell+1}, \, \mathrm{rot180}(k_j^{\ell+1}), \, \text{'full'} \big)

As before, we rotate the kernel so that the convolution function performs cross-correlation. Note, however, that in this case we need the "full" border handling for the convolution, to borrow again from MATLAB's nomenclature. This small difference lets us handle the border cases easily and efficiently: where a unit at layer ℓ+1 receives fewer inputs than the full n×n size of the convolution kernel, the "full" convolution automatically pads the missing inputs with zeros.

At this point we are ready to compute the gradients for b and β. The additive bias is again just the sum over the elements of the sensitivity map:

    \frac{\partial E}{\partial b_j} = \sum_{u,v} (\delta_j^\ell)_{uv}

The multiplicative bias β of course involves the original down-sampled map computed at the current layer during the feed-forward pass, so it pays to save these maps during the feed-forward computation rather than recompute them during backpropagation. Let us define

    d_j^\ell = \mathrm{down}(x_j^{\ell-1})

Then the gradient for β is given by

    \frac{\partial E}{\partial \beta_j} = \sum_{u,v} \left( \delta_j^\ell \circ d_j^\ell \right)_{uv}

3.3 Learning Combinations of Feature Maps

Often it is advantageous to form an output map as a sum over convolutions of several different input maps. In the literature, the input maps combined to form a given output map are typically chosen by hand; we can, however, try to learn such combinations during training. Let α_ij denote the weight given to input map i when forming output map j. Then output map j is given by

    x_j^\ell = f\left( \sum_i \alpha_{ij} \left( x_i^{\ell-1} * k_i^\ell \right) + b_j^\ell \right)

subject to the constraints

    \sum_i \alpha_{ij} = 1, \qquad 0 \le \alpha_{ij} \le 1

These constraints can be enforced by setting the α_ij equal to the softmax of a set of unconstrained, underlying weights c_ij:

    \alpha_{ij} = \frac{\exp(c_{ij})}{\sum_k \exp(c_{kj})}

Because each set of weights c_ij for fixed j is independent of the sets for every other j, we can consider the update for a single map and drop the subscript j; each map is updated in the same way, only with different indices j. The derivative of the softmax is

    \frac{\partial \alpha_k}{\partial c_i} = \delta_{ki} \, \alpha_i - \alpha_i \alpha_k

(where δ here denotes the Kronecker delta), while the derivative of the error (1) with respect to the α_i at layer ℓ is

    \frac{\partial E}{\partial \alpha_i} = \sum_{u,v} \left( \delta^\ell \circ \left( x_i^{\ell-1} * k_i^\ell \right) \right)_{uv}

Here δ^ℓ is the sensitivity map corresponding to the output map whose inputs are u. The convolution is again of the "valid" type, so that the result matches the size of the sensitivity map. We can now use the chain rule to compute the gradient of the error function (1) with respect to the underlying weights c_i:

    \frac{\partial E}{\partial c_i} = \sum_k \frac{\partial E}{\partial \alpha_k} \frac{\partial \alpha_k}{\partial c_i}
3.3.1 Enforcing Sparse Combinations

We can also try to impose a sparseness constraint on the distribution of the weights α_i for a given map by adding a regularization penalty Ω(α) to the final error function. Doing so encourages some of the weights to go to zero, in which case only a few input maps, rather than all of them, contribute significantly to a given output map. Let us write the error for a single pattern as

    \tilde{E}^n = E^n + \lambda \sum_{i,j} |\alpha_{ij}|

and find the contribution of the regularization term to the gradient for the weights c_i. The user-defined parameter λ controls the tradeoff between fitting the network to the training data and keeping the weights that appear in the regularization term small in the 1-norm sense. Again we consider only the weights α_i for a given output map and drop the subscript j. First, we need

    \frac{\partial \Omega}{\partial \alpha_i} = \lambda \, \mathrm{sign}(\alpha_i)

everywhere except at the origin. Combining this result with (8) allows us to derive the contribution

    \frac{\partial \Omega}{\partial c_i} = \lambda \left( |\alpha_i| - \alpha_i \sum_k |\alpha_k| \right)

The final gradients for the weights c_i when using the penalized error function (11) can then be computed using (13) and (9):

    \frac{\partial \tilde{E}^n}{\partial c_i} = \frac{\partial E^n}{\partial c_i} + \frac{\partial \Omega}{\partial c_i}

3.4 Making it Fast with MATLAB

In a network with alternating sub-sampling and convolution layers, the main computational bottlenecks are:

1. During the feed-forward pass: down-sampling the convolutional layers' output maps.
2. During backpropagation: up-sampling the deltas of a higher sub-sampling layer to match the size of the lower convolutional layer's output maps.
3. Applying the sigmoid and its derivative.

Performing the convolutions during the feed-forward and backpropagation stages is of course also a computational bottleneck, but assuming the 2D convolution routine is implemented efficiently, there is not much we can do about that. One might be tempted to use MATLAB's built-in image processing routines to handle the up- and down-sampling operations. For up-sampling, imresize will do the job, but with significant overhead. A faster alternative is the Kronecker product function kron, applied to the matrix to be up-sampled and a matrix of ones; this can be an order of magnitude faster. For the down-sampling step during the feed-forward pass, imresize does not offer the option of down-sampling by summing over distinct n-by-n blocks. The "nearest-neighbor" method replaces a block of pixels by just one of the original pixels in the block. An alternative is to apply blkproc to each distinct block, or some combination of im2col and colfilt. Although these options compute only what is necessary and nothing more, the repeated calls to the user-defined block-processing function impose significant overhead. A much faster alternative is to convolve the image with a matrix of ones and then simply take every other entry using standard indexing (i.e., y = x(1:2:end, 1:2:end)).
Although the convolution in this case actually computes four times as many outputs as we really need (assuming 2x down-sampling), this method is still, empirically, an order of magnitude or so faster than the previously mentioned approaches.

Most authors, it seems, implement the sigmoid activation function and its derivative using inline function definitions. At the time of this writing, "inline" MATLAB function definitions are nothing like C macros and take a huge amount of time to evaluate. It is therefore often worth simply replacing all references to f and f' with the actual code. There is, of course, a tradeoff between optimized code and readability.

References

[1] C.M. Bishop, "Neural Networks for Pattern Recognition", Oxford University Press, New York, 1995.
[2] F.J. Huang and Y. LeCun, "Large-scale Learning with SVM and Convolutional for Generic Object Categorization", in: Proc. 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 1, pp. 284-291, 2006.
[3] Y. LeCun, B. Boser, J. Denker, D. Henderson, R. Howard, W. Hubbard, and L. Jackel, "Backpropagation applied to handwritten zip code recognition", Neural Computation, 1(4), 1989.
[4] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-based learning applied to document recognition", Proceedings of the IEEE, vol. 86, pp. 2278-2324, November 1998.
[5] Y. LeCun, L. Bottou, G. Orr, and K. Muller, "Efficient BackProp", in: Neural Networks: Tricks of the Trade, G. Orr and K. Muller (eds.), Springer, 1998.
[6] J.P. Rauschecker and B. Tian, "Mechanisms and streams for processing of 'what' and 'where' in auditory cortex", Proc. Natl. Acad. Sci. USA, 97(22), pp. 11800-11806, 2000.
[7] T. Serre, M. Kouh, C. Cadieu, U. Knoblich, G. Kreiman and T. Poggio, "A Theory of Object Recognition: Computations and Circuits in the Feedforward Path of the Ventral Stream in Primate Visual Cortex", CBCL Paper #259/AI Memo #2005-036, Massachusetts Institute of Technology, October 2005.
[8] T. Serre, A. Oliva and T. Poggio, "A Feedforward Architecture Accounts for Rapid Categorization", Proc. Natl. Acad. Sci. USA, 104(15), pp. 6424-6429, 2007.
[9] S. Shamma, "On the role of space and time in auditory processing", TRENDS in Cognitive Sciences, vol. 5, no. 8, 2001.
[10] P.Y. Simard, D. Steinkraus, and J.C. Platt, "Best Practices for Convolutional Neural Networks Applied to Visual Document Analysis", Proceedings of the International Conference on Document Analysis and Recognition, pp. 958-962, 2003.
[11] F.E. Theunissen, K. Sen, and A. Doupe, "Spectral-temporal receptive fields of nonlinear auditory neurons obtained using natural sounds", J. Neuro., vol. 20, pp. 2315-2331, 2000.
[12] D. Zoccolan, M. Kouh, J. DiCarlo and T. Poggio, "Tradeoff between selectivity and tolerance in monkey anterior inferior temporal cortex", J. Neurosci., 2007.

2. Original Text

Notes on Convolutional Neural Networks
Jake Bouvrie
Center for Biological and Computational Learning
Department of Brain and Cognitive Sciences
Massachusetts Institute of Technology
Cambridge
November 22, 2006
3 Convolutional Neural Networks

Typically convolutional layers are interspersed with sub-sampling layers to reduce computation time and to gradually build up further spatial and configural invariance. A small sub-sampling factor is desirable however in order to maintain specificity at the same time. Of course, this idea is not new, but the concept is both simple and powerful. The mammalian visual cortex and models thereof [12, 8, 7] draw heavily on these themes, and auditory neuroscience has revealed in the past ten years or so that these same design paradigms can be found in the primary and belt auditory areas of the cortex in a number of different animals [6, 11, 9]. Hierarchical analysis and learning architectures may yet be the key to success in the auditory domain.
3.1 Convolution Layers

Let's move forward with deriving the backpropagation updates for convolutional layers in a network. At a convolution layer, the previous layer's feature maps are convolved with learnable kernels and put through the activation function to form the output feature map. Each output map may combine convolutions with multiple input maps. In general, we have that

    x_j^\ell = f\left( \sum_{i \in M_j} x_i^{\ell-1} * k_{ij}^\ell + b_j^\ell \right)

where M_j represents a selection of input maps, and the convolution is of the "valid" border handling type when implemented in MATLAB. Some common choices of input maps include all-pairs or all-triplets, but we will discuss how one might learn combinations below. Each output map is given an additive bias b; however, for a particular output map, the input maps will be convolved with distinct kernels. That is to say, if output map j and map k both sum over input map i, then the kernels applied to map i are different for output maps j and k.
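To make this feed-forward computation concrete, here is a minimal MATLAB sketch of a single output map, assuming a sigmoid activation and illustrative map and kernel sizes; none of the variable names below come from the original notes.

% Sketch of the feed-forward pass for one output map of a convolution layer:
% x_j = f( sum_{i in Mj} conv2(x_i, k_ij, 'valid') + b_j ).
% Sizes, names, and the sigmoid activation are illustrative assumptions.
f  = @(u) 1 ./ (1 + exp(-u));            % sigmoid activation
X  = {rand(28,28), rand(28,28)};         % feature maps from layer l-1
K  = {rand(5,5),  rand(5,5)};            % one learnable kernel per selected input map
b  = 0.1;                                % additive bias for this output map
Mj = [1 2];                              % indices of the input maps feeding this output map

u = b;                                   % scalar bias; broadcasts over the map
for i = Mj
    u = u + conv2(X{i}, K{i}, 'valid');  % 'valid' convolution, as in the notes
end
xj = f(u);                               % 24-by-24 output feature map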
3.1.1 Computing the Gradients

We assume that each convolution layer ℓ is followed by a downsampling layer ℓ+1. The backpropagation algorithm says that in order to compute the sensitivity for a unit at layer ℓ, we should first sum over the next layer's sensitivities corresponding to units that are connected to the node of interest in the current layer ℓ, and multiply each of those connections by the associated weights defined at layer ℓ+1. We then multiply this quantity by the derivative of the activation function evaluated at the current layer's pre-activation inputs, u. In the case of a convolutional layer followed by a downsampling layer, one pixel in the next layer's associated sensitivity map corresponds to a block of pixels in the convolutional layer's output map. Thus each unit in a map at layer ℓ connects to only one unit in the corresponding map at layer ℓ+1. To compute the sensitivities at layer ℓ efficiently, we can upsample the downsampling layer's sensitivity map to make it the same size as the convolutional layer's map and then just multiply the upsampled sensitivity map from layer ℓ+1 with the activation derivative map at layer ℓ element-wise. The "weights" defined at a downsampling layer map are all equal to β (a constant, see section 3.2), so we just scale the previous step's result by β to finish the computation of δ^ℓ. We can repeat the same computation for each map j in the convolutional layer, pairing it with the corresponding map in the subsampling layer:

    \delta_j^\ell = \beta_j^{\ell+1} \left( f'(u_j^\ell) \circ \mathrm{up}(\delta_j^{\ell+1}) \right)

where up(·) denotes an upsampling operation that simply tiles each pixel in the input horizontally and vertically n times in the output if the subsampling layer subsamples by a factor of n. As we will discuss below, one possible way to implement this function efficiently is to use the Kronecker product:

    \mathrm{up}(x) \equiv x \otimes 1_{n \times n}
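As a hedged illustration of this step, the following MATLAB sketch builds up(·) from kron and forms the sensitivity map for one convolutional map, assuming a 2x2 sub-sampling factor and illustrative sizes (none of the names come from the notes).

% Sketch: sensitivities for one convolutional-layer map, assuming the layer
% is followed by an n-by-n sub-sampling layer. Sizes and names are illustrative.
n        = 2;                             % sub-sampling factor
u_j      = rand(24,24);                   % pre-activation input of this conv map
delta_up = rand(12,12);                   % sensitivity map of the layer above (l+1)
beta_j   = 0.5;                           % multiplicative "weight" of the sub-sampling map

fprime   = @(u) (1 ./ (1 + exp(-u))) .* (1 - 1 ./ (1 + exp(-u)));  % sigmoid derivative

up       = @(x) kron(x, ones(n));         % up(): tile each pixel n times in each direction
delta_j  = beta_j * ( fprime(u_j) .* up(delta_up) );  % element-wise product, scaled by beta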
Now that we have the sensitivities for a given map, we can immediately compute the bias gradient by simply summing over all the entries in δ_j^ℓ:

    \frac{\partial E}{\partial b_j} = \sum_{u,v} (\delta_j^\ell)_{uv}

Finally, the gradients for the kernel weights are computed using backpropagation, except in this case the same weights are shared across many connections. We'll therefore sum the gradients for a given weight over all the connections that mention this weight, just as we did for the bias term:

    \frac{\partial E}{\partial k_{ij}^\ell} = \sum_{u,v} (\delta_j^\ell)_{uv} \, (p_i^{\ell-1})_{uv}

where (p_i^{ℓ-1})_{uv} is the patch in x_i^{ℓ-1} that was multiplied elementwise by k_{ij}^ℓ during convolution in order to compute the element at (u, v) in the output convolution map x_j^ℓ. At first glance it may appear that we need to painstakingly keep track of which patches in the input map correspond to which pixels in the output map (and its corresponding map of sensitivities), but equation (7) can be implemented in a single line of MATLAB using convolution over the valid region of overlap:

    \frac{\partial E}{\partial k_{ij}^\ell} = \mathrm{rot180}\!\left( \mathrm{conv2}\big( x_i^{\ell-1}, \, \mathrm{rot180}(\delta_j^\ell), \, \text{'valid'} \big) \right)

Here we rotate the δ image in order to perform cross-correlation rather than convolution, and rotate the output back so that when we perform convolution in the feed-forward pass, the kernel will have the expected orientation.
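A sketch of how this one-line computation might look in MATLAB, with illustrative sizes and a locally defined rot180 helper (an assumption, not a routine from the notes):

% Sketch: kernel and bias gradients for one (input map i, output map j) pair.
rot180  = @(a) rot90(a, 2);               % 180-degree rotation

x_i     = rand(28,28);                    % input feature map from layer l-1
delta_j = rand(24,24);                    % sensitivity map of the conv layer (for a 5x5 kernel)

dE_dk   = rot180( conv2(x_i, rot180(delta_j), 'valid') );  % 5x5 kernel gradient
dE_db   = sum(delta_j(:));                % additive-bias gradient: sum of all sensitivities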
3.2 Sub-sampling Layers

A sub-sampling layer produces downsampled versions of the input maps. If there are N input maps, then there will be exactly N output maps, although the output maps will be smaller. More formally,

    x_j^\ell = f\left( \beta_j^\ell \, \mathrm{down}(x_j^{\ell-1}) + b_j^\ell \right)

where down(·) represents a sub-sampling function. Typically this function will sum over each distinct n-by-n block in the input image so that the output image is n times smaller along both dimensions. Each output map is given its own multiplicative bias β and an additive bias b. We can also simply throw away every other sample in the image [10].
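As one possible reading of down(·), the sketch below sums over distinct 2x2 blocks using conv2 plus strided indexing (the same trick the notes recommend in section 3.4); names, sizes, and the sigmoid choice are illustrative assumptions.

% Sketch: feed-forward pass of a sub-sampling layer for one map, assuming
% down() sums over distinct 2x2 blocks.
f      = @(u) 1 ./ (1 + exp(-u));         % sigmoid activation
n      = 2;                               % sub-sampling factor
x_prev = rand(24,24);                     % output map of the convolution layer below
beta_j = 0.5;                             % multiplicative bias
b_j    = 0.1;                             % additive bias

s      = conv2(x_prev, ones(n), 'valid'); % sums of every (overlapping) n-by-n block
down_x = s(1:n:end, 1:n:end);             % keep only the distinct, non-overlapping blocks
x_j    = f(beta_j * down_x + b_j);        % 12-by-12 sub-sampled output map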
3.2.1 Computing the Gradients

The difficulty here lies in computing the sensitivity maps. Once we've got them, the only learnable parameters we need to update are the bias parameters β and b. We will assume that the subsampling layers are surrounded above and below by convolution layers. If the layer following the subsampling layer is a fully connected layer, then the sensitivity maps for the subsampling layer can be computed with the vanilla backpropagation equations introduced in section 2. When we tried to compute the gradient of a kernel in section 3.1.1, we had to figure out which patch in the input corresponded to a given pixel in the output map. Here, we must figure out which patch in the current layer's sensitivity map corresponds to a given pixel in the next layer's sensitivity map in order to apply a delta recursion that looks something like equation (4). Of course, the weights multiplying the connections between the input patch and the output pixel are exactly the weights of the (rotated) convolution kernel. This is again efficiently implemented using convolution:

    \delta_j^\ell = f'(u_j^\ell) \circ \mathrm{conv2}\big( \delta_j^{\ell+1}, \, \mathrm{rot180}(k_j^{\ell+1}), \, \text{'full'} \big)

As before, we rotate the kernel to make the convolution function perform cross-correlation. Notice that in this case, however, we require the "full" convolution border handling, to borrow again from MATLAB's nomenclature. This small difference lets us deal with the border cases easily and efficiently, where the number of inputs to a unit at layer ℓ+1 is not the full size of the n×n convolution kernel. In those cases, the "full" convolution will automatically pad the missing inputs with zeros.
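A minimal sketch of this "full" convolution step, assuming illustrative map and kernel sizes (all names are placeholders):

% Sketch: pushing sensitivities from a convolution layer (l+1) back to the
% sub-sampling layer (l) below it with a 'full' convolution.
rot180   = @(a) rot90(a, 2);
fprime   = @(u) (1 ./ (1 + exp(-u))) .* (1 - 1 ./ (1 + exp(-u)));

u_j      = rand(12,12);                   % pre-activation input of the sub-sampling map
delta_up = rand(8,8);                     % sensitivity map of the conv layer above
k_up     = rand(5,5);                     % kernel connecting this map to the layer above

% 'full' pads the borders with zeros, so units that touch fewer than 5x5
% kernel weights are still handled correctly; the result is again 12x12.
delta_j  = fprime(u_j) .* conv2(delta_up, rot180(k_up), 'full');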
At this point we're ready to compute the gradients for b and β. The additive bias is again just the sum over the elements of the sensitivity map:

    \frac{\partial E}{\partial b_j} = \sum_{u,v} (\delta_j^\ell)_{uv}

The multiplicative bias β will of course involve the original down-sampled map computed at the current layer during the feedforward pass. For this reason, it is advantageous to save these maps during the feedforward computation, so we don't have to recompute them during backpropagation. Let's define

    d_j^\ell = \mathrm{down}(x_j^{\ell-1})

Then the gradient for β is given by

    \frac{\partial E}{\partial \beta_j} = \sum_{u,v} \left( \delta_j^\ell \circ d_j^\ell \right)_{uv}
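The two bias gradients then reduce to plain sums, as in this short sketch (sizes are illustrative, and d_j is assumed to have been saved from the forward pass):

% Sketch: bias gradients for one sub-sampling map.
delta_j  = rand(12,12);                   % sensitivity map of this sub-sampling map
d_j      = rand(12,12);                   % down(x_j^{l-1}) saved during the forward pass

dE_db    = sum(delta_j(:));               % additive bias: sum of sensitivities
dE_dbeta = sum(sum(delta_j .* d_j));      % multiplicative bias: sum of element-wise product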
3.3 Learning Combinations of Feature Maps

Often times, it is advantageous to provide an output map that involves a sum over several convolutions of different input maps. In the literature, the input maps that are combined to form a given output map are typically chosen by hand. We can, however, attempt to learn such combinations during training. Let α_ij denote the weight given to input map i when forming output map j. Then output map j is given by

    x_j^\ell = f\left( \sum_i \alpha_{ij} \left( x_i^{\ell-1} * k_i^\ell \right) + b_j^\ell \right)

subject to the constraints

    \sum_i \alpha_{ij} = 1, \qquad 0 \le \alpha_{ij} \le 1

These constraints can be enforced by setting the α_ij variables equal to the softmax over a set of unconstrained, underlying weights c_ij:

    \alpha_{ij} = \frac{\exp(c_{ij})}{\sum_k \exp(c_{kj})}

Because each set of weights c_ij for fixed j is independent of all other such sets for any other j, we can consider the updates for a single map and drop the subscript j. Each map is updated in the same way, except with different j indices. The derivative of the softmax function is given by

    \frac{\partial \alpha_k}{\partial c_i} = \delta_{ki} \, \alpha_i - \alpha_i \alpha_k

(where δ here is used as the Kronecker delta), while the derivative of (1) with respect to the α_i variables at layer ℓ is

    \frac{\partial E}{\partial \alpha_i} = \sum_{u,v} \left( \delta^\ell \circ \left( x_i^{\ell-1} * k_i^\ell \right) \right)_{uv}

Here δ^ℓ is the sensitivity map corresponding to an output map with inputs u. Again, the convolution is the "valid" type so that the result will match the size of the sensitivity map. We can now use the chain rule to compute the gradients of the error function (1) with respect to the underlying weights c_i:

    \frac{\partial E}{\partial c_i} = \sum_k \frac{\partial E}{\partial \alpha_k} \frac{\partial \alpha_k}{\partial c_i}
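A sketch of these combination-weight gradients for a single output map, assuming three input maps and a sensitivity map already computed; organizing the chain rule through a softmax Jacobian, as below, is one possible way to arrange the sums (all names and sizes are illustrative).

% Sketch: gradient of the underlying combination weights c_i for one output map.
X      = {rand(28,28), rand(28,28), rand(28,28)};   % input maps from layer l-1
K      = {rand(5,5),  rand(5,5),  rand(5,5)};       % kernels, one per input map
c      = randn(3,1);                                 % unconstrained weights for this output map
delta  = rand(24,24);                                % sensitivity map of this output map

alpha  = exp(c) ./ sum(exp(c));                      % softmax: alpha_i >= 0, sums to 1

dE_dalpha = zeros(3,1);
for i = 1:3
    conv_i       = conv2(X{i}, K{i}, 'valid');       % x_i * k_i, 'valid' => 24x24
    dE_dalpha(i) = sum(sum(delta .* conv_i));        % sum of the element-wise product
end

% Chain rule through the softmax: dalpha_k/dc_i = delta_ki*alpha_i - alpha_i*alpha_k.
J     = diag(alpha) - alpha * alpha';                % softmax Jacobian, J(k,i)
dE_dc = J' * dE_dalpha;                              % gradient w.r.t. the underlying weights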
3.3.1 Enforcing Sparse Combinations

We can also try to impose sparseness constraints on the distribution of weights α_i for a given map by adding a regularization penalty Ω(α) to the final error function. In doing so, we'll encourage some of the weights to go to zero. In that case, only a few input maps would contribute significantly to a given output map, as opposed to all of them. Let's write the error for a single pattern as

    \tilde{E}^n = E^n + \lambda \sum_{i,j} |\alpha_{ij}|

and find the contribution of the regularization term to the gradient for the weights c_i. The user-defined parameter λ controls the tradeoff between fitting the network to the training data, and ensuring that the weights mentioned in the regularization term are small according to the 1-norm. We will again consider only the weights α_i for a given output map and drop the subscript j. First, we need that

    \frac{\partial \Omega}{\partial \alpha_i} = \lambda \, \mathrm{sign}(\alpha_i)

everywhere except at the origin. Combining this result with (8) will allow us to derive the contribution:

    \frac{\partial \Omega}{\partial c_i} = \lambda \left( |\alpha_i| - \alpha_i \sum_k |\alpha_k| \right)

The final gradients for the weights c_i when using the penalized error function (11) can be computed using (13) and (9):

    \frac{\partial \tilde{E}^n}{\partial c_i} = \frac{\partial E^n}{\partial c_i} + \frac{\partial \Omega}{\partial c_i}
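Assuming the expression for the penalty contribution derived above, a small sketch of how it could be added to the gradient of the underlying weights (lambda and the weights below are illustrative, not values from the notes):

% Sketch: contribution of the L1 penalty to the gradient of the underlying
% weights c_i, under the softmax parameterization used above.
lambda   = 0.01;                              % user-defined regularization strength
c        = randn(4,1);                        % unconstrained weights for one output map
alpha    = exp(c) ./ sum(exp(c));             % softmax weights (non-negative, sum to 1)

dOmega_dalpha = lambda * sign(alpha);         % derivative of lambda*sum|alpha_i|, away from 0
J             = diag(alpha) - alpha * alpha'; % softmax Jacobian, as before
dOmega_dc     = J' * dOmega_dalpha;           % equals lambda*(|alpha_i| - alpha_i*sum_k|alpha_k|)

% The penalized gradient adds this term to the data-fit gradient dE/dc.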
3.4 Making it Fast with MATLAB

In a network with alternating sub-sampling and convolution layers the main computational bottlenecks are:

1. During the feedforward pass: downsampling the convolutional layer's output maps.
2. During backpropagation: upsampling of a higher sub-sampling layer's deltas to match the size of the lower convolutional layer's output maps.
3. Application of the sigmoid and its derivative.

Performing the convolutions during both the feedforward and backpropagation stages is also a computational bottleneck of course, but assuming the 2D convolution routine is efficiently implemented, there isn't much we can do about it. One might be tempted however to use MATLAB's built-in image processing routines to handle the up- and down-sampling operations. For up-sampling, imresize will do the job, but with significant overhead. A faster alternative is to use the Kronecker product function kron, with the matrix to be upsampled and a matrix of ones. This can be an order of magnitude faster.
When it comes to the down-sampling step during the feedforward pass, imresize does not provide the option to downsample by summing over distinct n-by-n blocks. The "nearest-neighbor" method will replace a block of pixels by only one of the original pixels in the block. An alternative is to apply blkproc to each distinct block, or some combination of im2col and colfilt. While both of these options compute only what's necessary and nothing more, repeated calls to the user-defined block-processing function impose significant overhead. A much faster alternative in this case is to convolve the image with a matrix of ones, and then simply take every other entry using standard indexing (i.e. y = x(1:2:end, 1:2:end)). Although convolution in this case actually computes four times as many outputs (assuming 2x downsampling) as we really need, this method is still (empirically) an order of magnitude or so faster than the previously mentioned approaches.
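A sketch comparing the fast conv2-plus-indexing down-sampling against a blkproc reference, with an illustrative 24x24 map; blkproc is the (older) Image Processing Toolbox routine the notes mention, with blockproc as its newer replacement.

% Sketch: the "convolve with ones, then stride-index" down-sampling trick.
x  = rand(24,24);                         % convolutional layer output map
n  = 2;                                   % down-sampling factor

% Fast version: conv2 computes all overlapping block sums; the indexing keeps
% only the sums over distinct n-by-n blocks.
s  = conv2(x, ones(n), 'valid');
y  = s(1:n:end, 1:n:end);                 % 12-by-12 result

% Reference version using blkproc (slower because of repeated function calls).
y_ref = blkproc(x, [n n], @(b) sum(b(:)));

max(abs(y(:) - y_ref(:)))                 % should be numerically zero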
Most authors, it seems, implement the sigmoid activation function and its derivative using inline function definitions. At the time of this writing, "inline" MATLAB function definitions are not at all like C macros, and take a huge amount of time to evaluate. Thus, it is often worth it to simply replace all references to f and f' with the actual code. There is, of course, a tradeoff between optimized code and readability.
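To illustrate the point rather than prescribe an implementation, the sketch below times the legacy inline definition against an anonymous function and the hand-inlined expression; the matrix size and iteration counts are arbitrary, and the anonymous-function comparison is an addition beyond what the notes discuss.

% Sketch: the same sigmoid written three ways. The notes' point is that the
% 'inline' form is slow to evaluate; writing the expression out directly avoids
% that overhead. Timing results will vary by MATLAB version and machine.
u = randn(500);

f_inline = inline('1 ./ (1 + exp(-u))', 'u');   % legacy inline definition (slow)
f_anon   = @(u) 1 ./ (1 + exp(-u));             % anonymous function

tic; for t = 1:100, y1 = f_inline(u); end; toc
tic; for t = 1:100, y2 = f_anon(u);   end; toc
tic; for t = 1:100, y3 = 1 ./ (1 + exp(-u)); end; toc   % f inlined by hand
tic; for t = 1:100, d3 = y3 .* (1 - y3);     end; toc   % f' reusing f(u)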