




版权说明:本文档由用户提供并上传,收益归属内容提供方,若内容存在侵权,请进行举报或认领
文档简介
1、The MNIST database of handwritten digits(手写数字MNIST数据集数据摘要:The MNIST database of handwritten digits has a training set of 60,000 examples, and a test set of 10,000 examples. It is a subset of a larger set available from NIST. The digits have been size-normalized and centered in a fixed-size image.中文关
2、键词:手写,数字,图像,归一化大小,模式识别,英文关键词:handwritten,digits,image,size-normalized,pattern recognition,数据格式:IMAGE数据用途:It is a good database for people who want to try learning techniques and pattern recognition methods on real-world data while spending minimal efforts on preprocessing and formatting.数据详细介绍:THE M
3、NIST DATABASEof handwritten digitsYann LeCun, Courant Institute, NYUCorinna Cortes, Google Labs, New YorkThe MNIST database of handwritten digits, available from this page, has a training set of 60,000 examples, and a test set of 10,000 examples. It is a subset of a larger set available from NIST. T
4、he digits have beensize-normalized and centered in a fixed-size image.It is a good database for people who want to try learning techniques and pattern recognition methods on real-world data while spending minimal efforts on preprocessing and formatting.Four files are available on this site:train-ima
5、ges-idx3-ubyte.gz: training set images (9912422 bytestrain-labels-idx1-ubyte.gz: training set labels (28881 bytest10k-images-idx3-ubyte.gz: test set images (1648877 bytest10k-labels-idx1-ubyte.gz: test set labels (4542 bytesplease note that your browser may uncompress these files without telling you
6、. If the files you downloaded have a larger size than the above, they have been uncompressed by your browser. Simply rename them to remove the .gz extension. Some people have asked me "my application can't open your image files". These files are not in any standard image format. You ha
7、ve to write your own (very simple program to read them. The file format is described at the bottom of this page.The original black and white (bilevel images from NIST were size normalized to fit in a 20x20 pixel box while preserving their aspect ratio. The resulting images contain grey levels as a r
8、esult of the anti-aliasing technique used by the normalization algorithm. the images were centered in a 28x28 image by computing the center of mass of the pixels, and translating the image so as to position this point at the center of the 28x28 field.With some classification methods (particuarly tem
9、plate-based methods, such as SVM and K-nearest neighbors, the error rate improves when the digits are centered by bounding box rather than center of mass. If you do this kind of pre-processing, you should report it in your publications.The MNIST database was constructed from NIST's Special Datab
10、ase 3 and Special Database 1 which contain binary images of handwritten digits. NIST originally designated SD-3 as their training set and SD-1 as their test set. However, SD-3 is much cleaner and easier to recognize than SD-1. The reason for this can be found on the fact that SD-3 was collected amon
11、g Census Bureau employees, while SD-1 was collected among high-school students. Drawing sensible conclusions from learning experiments requires that the result be independent of the choice of training set and test among the complete set of samples. Therefore it was necessary to build a new database
12、by mixing NIST's datasets.The MNIST training set is composed of 30,000 patterns from SD-3 and 30,000 patterns from SD-1. Our test set was composed of 5,000 patterns from SD-3 and 5,000 patterns from SD-1. The 60,000 pattern training set contained examples from approximately 250 writers. We made
13、sure that the sets of writers of the training set and test set were disjoint.SD-1 contains 58,527 digit images written by 500 different writers. In contrast to SD-3, where blocks of data from each writer appeared in sequence, the data in SD-1 is scrambled. Writer identities for SD-1 is available and
14、 we used this information to unscramble the writers. We then split SD-1 in two: characters written by the first 250 writers went into our new training set. The remaining 250 writers were placed in our test set. Thus we had two sets with nearly30,000 examples each. The new training set was completed
15、with enough examples from SD-3, starting at pattern # 0, to make a full set of 60,000 training patterns. Similarly, the new test set was completed with SD-3 examples starting at pattern # 35,000 to make a full set with 60,000 test patterns. Only a subset of 10,000 test images (5,000 from SD-1 and 5,
16、000 from SD-3 is available on this site. The full 60,000 sample training set is available.Many methods have been tested with this training set and test set. Here are a few examples. Details about the methods are given in an upcoming paper. Some of those experiments used a version of the database whe
17、re the input images where deskewed (by computing the principal axis of the shape that is closest to the vertical, and shifting the lines so as to make it vertical. In some other experiments, the training set was augmented with artificially distorted versions of the original training samples. The dis
18、tortions are random combinations of shifts, scaling, skewing, and compression. Recognition 40-6, 2007 2-layer NN, 300 hidden units, none mean square error 2-layer NN, distortions 300 HU, MSE, none deskewing none none none none none none 4.7 3.6 1.6 4.5 3.8 3.05 2.5 2.95 2.45 LeCun et al. 1998 LeCun
19、et al. 1998 LeCun et al. 1998 LeCun et al. 1998 LeCun et al. 1998 LeCun et al. 1998 LeCun et al. 1998 LeCun et al. 1998 LeCun et al. 1998 2-layer NN, 300 HU 2-layer NN, 1000 hidden units 2-layer NN, distortions 3-layer units NN, 1000 300+100 HU, hidden HU 3-layer NN, distortions 3-layer units NN, 30
20、0+100 500+150 hidden HU 3-layer NN, distortions 500+150 3-layer NN, 500+300 HU, softmax, cross entropy, weight none decay 2-layer NN, 800 Cross-Entropy Loss HU, none Hinton, 1.53 unpublished, 2005 1.6 1.1 0.9 0.7 Simard et al., ICDAR 2003 Simard et al., ICDAR 2003 Simard et al., ICDAR 2003 Simard et
21、 al., ICDAR 2003 2-layer NN, 800 HU, none cross-entropy affine distortions 2-layer NN, 800 HU, MSE elastic none distortions 2-layer NN, 800 HU, none cross-entropy elastic distortions NN, 784-500-500-2000-30 + nearest neighbor, RBM + NCA none training no distortions 6-layer NN none 784-2500-2000-1500
22、-1000-500-1 Salakhutdino 1.0 v and Hinton, AI-Stats 2007 0.35 Ciresan et al., arXiv 0 (on GPU elastic distortions 1003.0358, 2010 none 7.7 1.26 1.53 1.02 0.87 to 1.7 1.1 1.1 1.1 0.95 0.85 0.8 0.7 Kegl et al., ICML 2009 Kegl et al., ICML 2009 Kegl et al., ICML 2009 Kegl et al., ICML 2009 Kegl et al.,
23、 ICML 2009 LeCun et al. 1998 LeCun et al. 1998 LeCun et al. 1998 LeCun et al. 1998 LeCun et al. 1998 LeCun et al. 1998 LeCun et al. 1998 LeCun et al. 1998 boosted stumps products of boosted stumps (3 none terms boosted trees (17 leaves stumps on Haar features product of stumps on Haar f. Convolution
24、al net LeNet-1 Convolutional net LeNet-4 none Haar features Haar features subsampling 16x16 pixels none Convolutional net LeNet-4 with none K-NN instead of last layer Convolutional net LeNet-4 with none local learning instead of last layer Convolutional net LeNet-5, no none distortions Convolutional net LeNet-5, huge none distortions Convolutional distortions net LeNet-5, Boosted none none Convolutional net LeNet-4, distortions unsupervised sparse features + none SVM, no distortions Convolutional net, cross-entropy none affine disto
温馨提示
- 1. 本站所有资源如无特殊说明,都需要本地电脑安装OFFICE2007和PDF阅读器。图纸软件为CAD,CAXA,PROE,UG,SolidWorks等.压缩文件请下载最新的WinRAR软件解压。
- 2. 本站的文档不包含任何第三方提供的附件图纸等,如果需要附件,请联系上传者。文件的所有权益归上传用户所有。
- 3. 本站RAR压缩包中若带图纸,网页内容里面会有图纸预览,若没有图纸预览就没有图纸。
- 4. 未经权益所有人同意不得将文件中的内容挪作商业或盈利用途。
- 5. 人人文库网仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对用户上传分享的文档内容本身不做任何修改或编辑,并不能对任何下载内容负责。
- 6. 下载文件中如有侵权或不适当内容,请与我们联系,我们立即纠正。
- 7. 本站不保证下载资源的准确性、安全性和完整性, 同时也不承担用户因使用这些下载资源对自己和他人造成任何形式的伤害或损失。
最新文档
- 幼儿园与生活关联的数学题目及答案
- 文化与娱乐:2025年KOL内容营销策略与效果评估报告
- 2025南航招聘面试题及答案
- 2025妇幼护士笔试题目及答案
- 虚拟现实教育产品在物理力学实验课中的应用效果与教学策略分析
- 露营经济背景下的户外运动装备行业市场细分研究报告
- 深化小学教师反思与教育实践的研究试题及答案
- 建筑施工安全风险识别与管理试题及答案
- 新能源商用车辆在石材加工厂运输中的应用场景分析报告
- 广东初三一模试题及答案
- 单位反恐专项经费保障制度
- 羽毛球比赛对阵表模板
- 2024年上海市中考数学真题试卷及答案解析
- 统编版2023-2024学年语文三年级下册第五单元导读课教学设计
- 2024年陕西延长石油(集团)有限责任公司校园招聘考试试题参考答案
- 地籍测量成果报告
- 2024年苏州资产管理有限公司招聘笔试冲刺题(带答案解析)
- 客车防雨密封性要求及试验方法
- 农贸市场经营管理方案
- 新生儿胸腔穿刺术
- 液气胸病人护理-查房
评论
0/150
提交评论