




Introduction to Deep Learning

Outline
- Conception of deep learning
- Development history
- Deep learning frameworks
- Deep neural network architectures
- Convolutional neural networks: introduction, network structure, training tricks
- Application in aesthetic image evaluation

Idea: Deep Learning (Hinton, 2006)
Deep learning is a branch of machine learning based on a set of algorithms that attempt to model high-level abstractions in data. Its key advantage is that features are extracted automatically instead of being engineered by hand. Typical application areas: computer vision, speech recognition, natural language processing.
Development History
- 1943: MP neuron model (W. S. McCulloch, W. Pitts)
- 1958: single-layer perceptron (Rosenblatt)
- 1969: the XOR problem (Marvin Minsky)
- 1986: the BP algorithm (Hinton)
- 1989: CNN / LeNet (Yann LeCun)
- 1991: the gradient vanishing problem
- 1995: SVM
- 1997: LSTM
- 2006: DBN (Hinton)
- 2011: ReLU
- 2012: Dropout, AlexNet (Hinton)
- 2015: Batch Normalization, Faster R-CNN, residual networks

Deep Learning Frameworks

Deep neural network architectures
- Deep Belief Networks (DBN)
- Recurrent Neural Networks (RNN)
- Generative Adversarial Networks (GANs)
- Convolutional Neural Networks (CNN)
- Long Short-Term Memory (LSTM)

DBN (Deep Belief Network, 2006)
- Hidden units and visible units; each unit is binary (0 or 1).
- Every visible unit connects to all the hidden units and every hidden unit connects to all the visible units; there are no visible-visible or hidden-hidden connections.
- Idea: a DBN is composed of multiple layers of RBMs. How do we train these additional layers? With an unsupervised greedy layer-by-layer approach.
Fig. 1: RBM (restricted Boltzmann machine) structure. Fig. 2: DBN (deep belief network) structure.
Hinton G E. Deep belief networks. Scholarpedia, 2009, 4(6): 5947.
RNN (Recurrent Neural Network, 2013)
What? An RNN is designed to process sequence data. It remembers previous information and applies it to the computation of the current output: the nodes of the hidden layer are connected to one another, so the input of the hidden layer includes not only the output of the input layer but also the hidden layer's own output from the previous time step.
Applications: machine translation, generating image descriptions, speech recognition.
How to train? BPTT (backpropagation through time).
Marhon S A, Cameron C J F, Kremer S C. Recurrent Neural Networks. In: Handbook on Neural Information Processing. Springer Berlin Heidelberg, 2013: 29-65.
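The recurrence described above fits in a few lines. The sketch below is only illustrative; the weight names and sizes are assumptions, not taken from the cited chapter.

import numpy as np

def rnn_forward(xs, W_xh, W_hh, W_hy):
    """Vanilla RNN: the new hidden state depends on the current input
    and on the previous hidden state, so past information is carried along."""
    h = np.zeros(W_hh.shape[0])            # initial hidden state
    outputs = []
    for x in xs:                           # one step per element of the sequence
        h = np.tanh(W_xh @ x + W_hh @ h)   # input-layer output + previous hidden output
        outputs.append(W_hy @ h)           # output at this time step
    return np.array(outputs), h

rng = np.random.default_rng(0)
xs = rng.normal(size=(5, 3))               # a toy sequence of five 3-dim inputs
W_xh = rng.normal(size=(4, 3)) * 0.1       # input-to-hidden weights
W_hh = rng.normal(size=(4, 4)) * 0.1       # hidden-to-hidden (recurrent) weights
W_hy = rng.normal(size=(2, 4)) * 0.1       # hidden-to-output weights
ys, h_last = rnn_forward(xs, W_xh, W_hh, W_hy)
print(ys.shape)                            # (5, 2): one output per time step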
GANs (Generative Adversarial Networks, 2014)
GANs are inspired by the zero-sum game of game theory and consist of a pair of networks: a generator network and a discriminator network. The generator produces a sample from a random vector; the discriminator judges whether a given sample is natural or counterfeit. The two networks are trained together, each improving the other, until counterfeit and real samples can no longer be distinguished.
Goodfellow I, Pouget-Abadie J, Mirza M, et al. Generative adversarial nets. Advances in Neural Information Processing Systems. 2014: 2672-2680.
Applications: image editing, image-to-image translation, text generation, generating images from text, combination with reinforcement learning, and more.

Long Short-Term Memory (LSTM, 1997)

Neural Networks
- Neuron
- Neural network
Convolutional Neural Networks (CNN)
A convolutional neural network is a kind of feedforward neural network with a simple structure, relatively few training parameters and strong adaptability. A CNN avoids complex image pre-processing (e.g. extracting hand-crafted features): the original image can be fed in directly.
Basic components: convolution layers, pooling layers, fully connected layers.

Convolution layer
The convolution kernel slides over the 2-dimensional plane; at each position, every element of the kernel is multiplied by the image element at the corresponding position and the products are summed. By moving the kernel over the whole image we obtain a new image whose values are these sums of products, one per kernel position.
- Local receptive field.
- Weight sharing.
- Greatly reduced number of parameters.
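A sketch of exactly that multiply-and-sum, for a single-channel image and a "valid" sliding window; the names and sizes are illustrative only.

import numpy as np

def conv2d_valid(image, kernel):
    """Slide the kernel over the image; each output value is the sum of the
    element-wise products between the kernel and the patch under it."""
    H, W = image.shape
    kh, kw = kernel.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = image[i:i + kh, j:j + kw]
            out[i, j] = np.sum(patch * kernel)   # multiply and sum
    return out

image = np.arange(25, dtype=float).reshape(5, 5)
kernel = np.array([[1., 0.], [0., -1.]])          # a tiny 2x2 edge-like filter
print(conv2d_valid(image, kernel).shape)          # (4, 4)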
Pooling layer
The pooling layer compresses the input feature map, which reduces the number of parameters in the training process and the degree of over-fitting of the model.
- Max-pooling: select the maximum value in the pooling window.
- Mean-pooling: compute the average of all values in the pooling window.
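A small sketch of both variants over non-overlapping windows; the 2x2 window size is an assumption chosen for illustration.

import numpy as np

def pool2x2(feature_map, mode="max"):
    """Downsample a feature map by taking the max (or mean) of each 2x2 window."""
    H, W = feature_map.shape
    blocks = feature_map[:H - H % 2, :W - W % 2].reshape(H // 2, 2, W // 2, 2)
    return blocks.max(axis=(1, 3)) if mode == "max" else blocks.mean(axis=(1, 3))

fm = np.array([[1., 2., 5., 6.],
               [3., 4., 7., 8.],
               [9., 1., 2., 3.],
               [5., 6., 4., 0.]])
print(pool2x2(fm, "max"))    # [[4. 8.] [9. 4.]]
print(pool2x2(fm, "mean"))   # [[2.5  6.5 ] [5.25 2.25]]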
Fully connected layer and softmax layer
Each node of the fully connected layer is connected to all the nodes of the previous layer; it combines the features extracted by the front layers. The softmax layer then turns the resulting class scores into probabilities.
Fig. 1: Fully connected layer. Fig. 2: Complete CNN structure. Fig. 3: Softmax layer.
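A sketch of a fully connected layer followed by a softmax; the weight shapes and class count are illustrative assumptions.

import numpy as np

def softmax(z):
    """Exponentiate and normalize the scores so they sum to 1
    (subtracting the max first for numerical stability)."""
    e = np.exp(z - z.max())
    return e / e.sum()

def fully_connected(x, W, b):
    """Every output unit sees every input unit: one matrix-vector product."""
    return W @ x + b

features = np.random.rand(256)            # features from the front layers
W = np.random.randn(10, 256) * 0.01       # 10 output classes
scores = fully_connected(features, W, np.zeros(10))
probs = softmax(scores)
print(probs.sum())                        # 1.0: a probability distribution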
Training and Testing
Forward propagation:
- Take a sample (X, Yp) from the sample set and feed X into the network.
- Compute the corresponding actual output Op.
Back propagation:
- Compute the difference between the actual output Op and the corresponding ideal output Yp.
- Adjust the weight matrices so as to minimize the error.
Training stage: repeat forward and back propagation over the training samples. Testing stage: feed different images and labels into the trained convolutional neural network and compare the output with the actual value of the sample. Before the training stage, the weights should be initialized with small, different random numbers.
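The forward/backward loop above, shrunk to a single softmax layer trained by gradient descent on random data; everything here is an illustrative toy, not the network from the slides.

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))                 # 100 samples, 20 features
y = rng.integers(0, 3, size=100)               # 3 classes
W = rng.normal(size=(20, 3)) * 0.01            # small random initial weights
onehot = np.eye(3)[y]

for epoch in range(200):
    # forward propagation: actual output Op for every sample
    scores = X @ W
    e = np.exp(scores - scores.max(axis=1, keepdims=True))
    probs = e / e.sum(axis=1, keepdims=True)
    # back propagation: gradient of the cross-entropy error w.r.t. W
    grad = X.T @ (probs - onehot) / len(X)
    # adjust the weight matrix to reduce the error
    W -= 0.5 * grad

pred = (X @ W).argmax(axis=1)
print("training accuracy:", (pred == y).mean())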
CNN Structure Evolution
Fig.: evolution of CNN architectures, from the Neocognitron (1980) and LeNet (LeCun, 1989/1998) to the historical breakthrough of AlexNet (2012, with ReLU, Dropout, GPU + big data), and onward along several lines: deeper networks (VGG16, VGG19, MSRA-Net, ResNet), enhanced convolution modules (NIN, GoogLeNet, Inception V2 (BN), Inception V3, Inception V4), detection networks (R-CNN, SPP-Net, Fast R-CNN, Faster R-CNN with RPN), and new functional units (FCN, FCN+CRF, STNet, CNN+RNN/LSTM), evaluated largely on the ImageNet ILSVRC (ImageNet Large Scale Visual Recognition Challenge).
LeNet (LeCun, 1998)
LeNet is a convolutional neural network designed by Yann LeCun for handwritten numeral recognition in 1998. It is one of the most representative experimental systems among early convolutional neural networks. LeNet already includes convolution layers, pooling layers and fully connected layers, the basic components of modern CNNs, and is considered the beginning of the CNN line of work.
Network structure: 3 convolution layers + 2 pooling layers + 1 fully connected layer + 1 output layer.
Haykin S, Kosko B. Gradient-Based Learning Applied to Document Recognition. Wiley-IEEE Press, 2009.

AlexNet (Alex Krizhevsky, 2012)
Network structure: 5 convolution layers + 3 fully connected layers.
Nonlinear activation function: ReLU (rectified linear unit).
Methods to prevent overfitting: Dropout, data augmentation.
Big-data training: ImageNet, an image database on the order of millions of images.
Others: GPU training, LRN (local response normalization) layers.
Krizhevsky A, Sutskever I, Hinton G E. ImageNet classification with deep convolutional neural networks. International Conference on Neural Information Processing Systems. Curran Associates Inc., 2012: 1097-1105.

OverFeat (2013)
Sermanet P, Eigen D, Zhang X, et al. OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks. arXiv preprint, 2013.
VGG-Net (Oxford University, 2014)
Input: a fixed-size 224*224 RGB image.
Filters: a very small receptive field, 3*3, with stride 1.
Max-pooling: 2*2 pixel window, with stride 2.
Fig. 1: Architecture of VGG16. Table 1: ConvNet configurations (shown in columns); the convolutional layer parameters are denoted as "conv(receptive field size)-(number of channels)".
Simonyan K, Zisserman A. Very Deep Convolutional Networks for Large-Scale Image Recognition. Computer Science, 2014.
Why 3*3 filters? Stacked 3*3 conv. layers have a large effective receptive field, add more non-linearity, and have fewer parameters to learn, as the parameter count sketched below illustrates.
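A quick check of the parameter claim. This is a sketch: the channel count C is an assumed value, kept equal for input and output, and biases are ignored. Two stacked 3*3 layers cover a 5*5 receptive field with fewer weights than a single 5*5 layer.

def conv_params(kernel, c_in, c_out):
    # weights of one conv layer: kernel_height * kernel_width * in_channels * out_channels
    return kernel * kernel * c_in * c_out

C = 256
stacked_3x3 = 2 * conv_params(3, C, C)   # 2 * 9 * C^2 = 18 * C^2
single_5x5 = conv_params(5, C, C)        # 25 * C^2
print(stacked_3x3, single_5x5)           # 1179648 vs 1638400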
Network-in-Network (NIN, Shuicheng Yan's group, 2013)
Network structure: 4 mlpconv layers + a global average pooling layer.
Fig. 1: linear convolution vs. MLP convolution. Fig. 2: fully connected layer vs. global average pooling layer. Fig. 3: NIN structure.
The mlpconv layer forms linear combinations of multiple feature maps and integrates information across channels; global average pooling replaces the fully connected classifier, which reduces the parameters, shrinks the network and helps avoid over-fitting.
Min Lin et al. Network in Network. arXiv, 2013.
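A minimal sketch of global average pooling (the names and sizes are illustrative): each feature map is reduced to a single value by averaging over its spatial dimensions, so no fully connected classifier weights are needed.

import numpy as np

def global_average_pooling(feature_maps):
    """feature_maps: array of shape (channels, height, width)."""
    return feature_maps.mean(axis=(1, 2))    # -> shape (channels,)

fmaps = np.random.rand(10, 7, 7)              # e.g. 10 class-score maps
print(global_average_pooling(fmaps).shape)    # (10,)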
GoogLeNet (Inception V1, 2014)
- Proposed the Inception architecture and optimized it.
- Removed the fully connected layers.
- Used auxiliary classifiers to accelerate network convergence.
Fig. 1: Inception module, naive version. Fig. 2: Inception module with dimension reductions. Fig. 3: GoogLeNet network (22 layers).
Szegedy C, Liu W, Jia Y, et al. Going deeper with convolutions. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015: 1-9.

Inception V2 (2015)
Ioffe S, Szegedy C. Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv preprint arXiv:1502.03167, 2015.

Inception V3 (2015)
Szegedy C, Vanhoucke V, Ioffe S, et al. Rethinking the inception architecture for computer vision. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016: 2818-2826.

ResNet (Kaiming He, 2015)
A simple and clean framework for training "very" deep networks, with state-of-the-art performance in image classification, object detection, semantic segmentation and more.
Fig. 1: Shortcut connections. Fig. 2: ResNet structure (152 layers).
He K, Zhang X, Ren S, et al. Deep Residual Learning for Image Recognition. 2015: 770-778.
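A minimal sketch of the shortcut connection (illustrative, not the paper's code): the block learns a residual F(x) and adds the input back, so the output is y = F(x) + x.

import numpy as np

def residual_block(x, W1, W2):
    """Two small layers with a ReLU, plus an identity shortcut."""
    f = np.maximum(0, x @ W1)    # first layer + ReLU
    f = f @ W2                   # second layer: the residual branch F(x)
    return np.maximum(0, f + x)  # add the shortcut, then the final ReLU

x = np.random.rand(8)                                 # toy activation vector
W1, W2 = np.random.rand(8, 8) * 0.1, np.random.rand(8, 8) * 0.1
print(residual_block(x, W1, W2).shape)                # (8,)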
FractalNet

Inception V4 (2015)
Szegedy C, Ioffe S, Vanhoucke V, et al. Inception-v4, Inception-ResNet and the impact of residual connections on learning. arXiv preprint arXiv:1602.07261, 2016.

Inception-ResNet
He K, Zhang X, Ren S, et al. Deep Residual Learning for Image Recognition. 2015: 770-778.

Comparison

SqueezeNet
SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and a model size under 0.5 MB.

Xception

R-CNN (2014)
- Region proposals: Selective Search.
- Resize the region proposals: warp all region proposals to the required size (227*227, the AlexNet input size).
- Compute CNN features: extract a 4096-dimensional feature vector from each region proposal using AlexNet.
- Classify: train a linear SVM classifier for each class.
R-CNN = region proposals + CNN.
[1] Uijlings J R R, Sande K E A V D, Gevers T, et al. Selective Search for Object Recognition. International Journal of Computer Vision, 2013, 104(2): 154-171.
[2] Girshick R, Donahue J, Darrell T, et al. Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation. 2014: 580-587.
SPP-Net (Spatial Pyramid Pooling network, 2015)
Fig. 1: Top, a conventional CNN; bottom, the spatial pyramid pooling network structure. Fig. 2: A network structure with a spatial pyramid pooling layer.
Advantages:
- Computes the feature map of the entire image once, which saves much time.
- Outputs a fixed-length feature vector for inputs of arbitrary size.
- Extracts features at different scales and can express more spatial information.
The SPP-Net method computes a convolutional feature map for the entire input image and then classifies each object proposal using a feature vector extracted from the shared feature map.
He K, Zhang X, Ren S, et al. Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition. IEEE Transactions on Pattern Analysis & Machine Intelligence, 2015, 37(9): 1904-1916.

Fast R-CNN (2015)
A Fast R-CNN network takes an entire image and a set of object proposals as input. The network processes the whole image with several convolutional (conv) and max-pooling layers to produce a conv feature map. For each object proposal, a region of interest (RoI) pooling layer extracts a fixed-length feature vector from the feature map. Each feature vector is fed into a sequence of fully connected layers that finally branch into two sibling output layers.
Girshick R. Fast R-CNN. Proceedings of the IEEE International Conference on Computer Vision. 2015: 1440-1448.

Faster R-CNN (2015)
Faster R-CNN = RPN + Fast R-CNN. A Region Proposal Network (RPN) takes an image (of any size) as input and outputs a set of rectangular object proposals, each with an objectness score.
Figure 1: Faster R-CNN is a single, unified network for object detection. Figure 2: Region Proposal Network (RPN).
Ren S, He K, Girshick R, et al. Faster R-CNN: Towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems. 2015: 91-99.
Training Tricks
- Data augmentation
- Dropout
- ReLU
- Batch normalization

Data Augmentation
Rotation, flip, zoom, shift, scale, contrast, noise disturbance, color, and so on.

Dropout (2012)
Dropout consists of setting to zero the output of each hidden neuron with probability p. The neurons which are "dropped out" in this way do not contribute to the forward pass and do not participate in backpropagation.
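A minimal sketch of the idea, using the common "inverted dropout" scaling so that no rescaling is needed at test time; this scaling is an assumption of the sketch, not stated in the slide.

import numpy as np

def dropout(activations, p=0.5, training=True):
    """Zero each activation with probability p; scale the survivors so the
    expected activation stays the same (inverted dropout)."""
    if not training:
        return activations                          # no dropout at test time
    mask = (np.random.rand(*activations.shape) >= p)
    return activations * mask / (1.0 - p)

h = np.random.rand(4, 6)       # a batch of hidden activations
print(dropout(h, p=0.5))       # roughly half of the entries are zeroed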
ReLU (Rectified Linear Unit)
The rectified activation keeps positive inputs and clamps negative inputs to zero. Advantages: it simplifies the computation and helps avoid the vanishing-gradient problem.
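The computation really is this small (a NumPy sketch):

import numpy as np

def relu(x):
    """ReLU: pass positive values through, clamp negatives to zero."""
    return np.maximum(0, x)

print(relu(np.array([-2.0, -0.5, 0.0, 1.5, 3.0])))   # [0.  0.  0.  1.5 3. ]

The gradient is 1 wherever the unit is active, which is why stacked ReLU layers are less prone to vanishing gradients than saturating sigmoid or tanh units.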
Batch Normalization (2015)
Insert a normalization layer at the input of each layer of the network. For a layer with d-dimensional input x = (x(1) ... x(d)), each dimension is normalized to zero mean and unit variance over the mini-batch: x_hat(k) = (x(k) - E[x(k)]) / sqrt(Var[x(k)]). This reduces the internal covariate shift.
Ioffe S, Szegedy C. Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv preprint arXiv:1502.03167, 2015.
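A per-dimension sketch of that normalization over a mini-batch. The learned scale gamma and shift beta from the paper are included as plain parameters; their defaults here are assumptions for illustration.

import numpy as np

def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    """x: mini-batch of shape (batch, d). Normalize each of the d dimensions
    to zero mean / unit variance over the batch, then scale and shift."""
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta

x = np.random.rand(32, 4) * 10 + 3            # a badly scaled batch
y = batch_norm(x)
print(y.mean(axis=0).round(6), y.std(axis=0).round(6))   # ~0 and ~1 per dimension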
Application in Aesthetic Image Evaluation
- Dong Z, Shen X, Li H, et al. Photo Quality Assessment with DCNN that Understands Image Well. MultiMedia Modeling. Springer International Publishing, 2015: 524-535.
- Lu X, Lin Z, Jin H, et al. Rating image aesthetics using deep learning. IEEE Transactions on Multimedia, 2015, 17(11): 2021-2034.
- Wang W, Zhao M, Wang L, et al. A multi-scene deep learning model for image aesthetic evaluation. Signal Processing Image Communication, 2016, 47: 511-518.

Photo Quality Assessment with DCNN that Understands Image Well
DCNN_Aesth: a network trained well on ImageNet followed by a two-class SVM classifier. DCNN_Aesth_SP: additionally uses segmented images and a spatial pyramid over the original images. Evaluated on the CUHK and AVA datasets.
Dong Z, Shen X, Li H, et al. Photo Quality Assessment with DCNN that Understands Image Well. MultiMedia Modeling. Springer International Publishing, 2015: 524-535.

Rating image aesthetics using deep learning
Supports heterogeneous inputs, i.e., global and local views, and all parameters in the DCNN are jointly trained. This enables the network to judge image aesthetics while simultaneously considering both the global and local views of an image.
Fig. 1: Global views and local views of an image. Fig. 2: SCNN architecture. Fig. 3: DCNN architecture.
Lu X, Lin Z, Jin H, et al. Rating image aesthetics using deep learning. IEEE Transactions on Multimedia, 2015, 17(11): 2021-2034.

A multi-scene deep learning model for image aesthetic evaluation
Designs a scene convolutional layer consisting of multi-group descriptors in the network, together with a pre-training procedure to initialize the model.
Architecture of the multi-scene deep learning model (MSDLM): 4 convolutional layers + 1 scene convolutional layer + 3 fully connected layers.
Fig. 1: The architecture of the MSDLM. Fig. 2: The overview of the proposed MSDLM.
Wang W, Zhao M, Wang L, et al. A multi-scene deep learning model for image aesthetic evaluation. Signal Processing Image Communication, 2016, 47: 511-518.
Example - Load the dataset

import os
import gzip
import pickle
from urllib.request import urlretrieve

import numpy as np
import matplotlib.pyplot as plt
import matplotlib.cm as cm

def load_dataset():
    url = "http:/"  # download URL truncated in the original slides
    filename = "E:/DeepLearning_Library/mnist.pkl.gz"
    if not os.path.exists(filename):
        print("Downloading MNIST dataset...")
        urlretrieve(url, filename)
    with gzip.open(filename, "rb") as f:
        # the MNIST pickle was written by Python 2, hence the encoding
        data = pickle.load(f, encoding="latin-1")
    X_train, y_train = data[0]
    X_val, y_val = data[1]
    X_test, y_test = data[2]
    # reshape to (samples, channels, height, width) for the conv layers
    X_train = X_train.reshape(-1, 1, 28, 28)
    X_val = X_val.reshape(-1, 1, 28, 28)
    X_test = X_test.reshape(-1, 1, 28, 28)
    y_train = y_train.astype(np.uint8)
    y_val = y_val.astype(np.uint8)
    y_test = y_test.astype(np.uint8)
    return X_train, y_train, X_val, y_val, X_test, y_test

X_train, y_train, X_val, y_val, X_test, y_test = load_dataset()
plt.imshow(X_train[0][0], cmap=cm.binary)
Example - Model

import lasagne
from lasagne import layers
from lasagne.updates import nesterov_momentum
from nolearn.lasagne import NeuralNet

net1 = NeuralNet(
    layers=[
        ('input', layers.InputLayer),
        ('conv2d1', layers.Conv2DLayer),
        ('maxpool1', layers.MaxPool2DLayer),
        ('conv2d2', layers.Conv2DLayer),
        ('maxpool2', layers.MaxPool2DLayer),
        ('dropout1', layers.DropoutLayer),
        ('dense', layers.DenseLayer),
        ('dropout2', layers.DropoutLayer),
        ('output', layers.DenseLayer),
    ],
    # input layer
    input_shape=(None, 1, 28, 28),
    # layer conv2d1
    conv2d1_num_filters=32,
    conv2d1_filter_size=(5, 5),
    conv2d1_nonlinearity=lasagne.nonlinearities.rectify,
    conv2d1_W=lasagne.init.GlorotUniform(),
    # layer maxpool1
    maxpool1_pool_size=(2, 2),
    # layer conv2d2
    conv2d2_num_filters=32,
    conv2d2_filter_size=(5, 5),
    conv2d2_nonlinearity=lasagne.nonlinearities.rectify,
    # layer maxpool2
    maxpool2_pool_size=(2, 2),
    # dropout1
    dropout1_p=0.5,
    # dense, i.e. fully connected layer
    dense_num_units=256,
    dense_nonlinearity=lasagne.nonlinearities.rectify,
    # dropout2
    dropout2_p=0.5,
    # output
    output_nonlinearity=lasagne.nonlinearities.softmax,
    output_num_units=10,
    # optimization method params
    update=nesterov_momentum,
    update_learning_rate=0.01,
    update_momentum=0.9,
    max_epochs=10,
    verbose=1,
)
Example - Train and Test

from sklearn.metrics import confusion_matrix
from nolearn.lasagne import visualize

# Train the network
nn = net1.fit(X_train, y_train)

# Use the trained model to predict the test set
preds = net1.predict(X_test)
cm = confusion_matrix(y_test, preds)
plt.matshow(cm)
plt.title('Confusion matrix')
plt.colorbar()
plt.ylabel('True label')
plt.xlabel('Predicted label')
plt.show()

# Visualize the learned filters of conv2d1
visualize.plot_conv_weights(net1.layers_['conv2d1'])

Example - Result
References
[1] Marhon S A, Cameron C J F, Kremer S C. Recurrent Neural Networks. In: Handbook on Neural Information Processing. Springer Berlin Heidelberg, 2013: 29-65.
[2] Goodfellow I, Pouget-Abadie J, Mirza M, et al. Generative adversarial nets. Advances in Neural Information Processing Systems. 2014: 2672-2680.
[3] Haykin S, Kosko B. Gradient-Based Learning Applied to Document Recognition. Wiley-IEEE Press, 2009.
[4] Krizhevsky A, Sutskever I, Hinton G E. ImageNet classification with deep convolutional neural networks. International Conference on Neural Information Processing Systems. Curran Associates Inc., 2012: 1097-1105.
[5] Simonyan K, Zisserman A. Very Deep Convolutional Networks for Large-Scale Image Recognition. Computer Science, 2014.