Translated paper: An Efficient Way of Combining SVMs for Handwritten Digit Recognition

Source: Neves R F P, Zanchettin C, Filho A N G L. An Efficient Way of Combining SVMs for Handwritten Digit Recognition[M]// Artificial Neural Networks and Machine Learning ICANN 2012. Springer Berlin Heidelberg, 2012: 229-237. The translated Chinese text is 4,763 characters. Part 1 is the translation; Part 2 is the original text. Default formatting: Chinese in size-five (五号) SimSun, English in size-five Times New Roman, 1.5 line spacing.

Part 1: Translation

An Efficient Way of Combining SVMs for Handwritten Digit Recognition

Abstract: This paper presents a method of combining SVMs (support vector machines) for multiclass problems that ensures a high recognition rate and a short processing time compared to other classifiers. This hierarchical SVM combination takes both the high recognition rate and the short processing time as evaluation criteria. The case study used is the handwritten digit recognition problem, with promising initial results.

Keywords: pattern recognition, handwritten digit classifier, support vector machine.

1 Introduction

Nowadays the world is digital. Technology is ubiquitous in people's lives, and some human tasks, such as handwriting recognition, voice recognition and face recognition, can now be performed by machines. The main recognition process [1, 2] used in this kind of application requires the following steps: data acquisition; pre-processing of the data to eliminate noise; segmentation, where the objects to be recognized (text, digits, faces, etc.) are located in and separated from the background; feature extraction, where the main features of each object are extracted; and finally recognition, or classification, where the objects are labeled according to their features.

This paper focuses on the classification task and uses the handwritten digit recognition problem as a case study, because this task is representative of several classification issues. For example, patterns may be ambiguous, or some features may be similar across more than one class. An example of this problem is shown in Fig. 1. The correct value of the images in Fig. 1a and Fig. 1c is seven, while in Fig. 1b it is four; yet Fig. 1a and 1b are similar and could be the same digit. Building a classifier that generalizes well is therefore a hard task. In some cases the best choice is to use context information to differentiate one class from another. The Hidden Markov Model (HMM) [3] is a technique frequently used to analyze context and improve a classifier's recognition rate, but its main disadvantage is the processing time; context-modeling techniques are usually slower. Our research therefore focuses on studying the optimization and combination of classical approaches, and on trying to introduce more knowledge into the classifier.

A brief overview of handwritten digit recognition research in recent years shows that classical classifiers such as the multilayer perceptron (MLP) [5], k-nearest neighbor (kNN) [2] and support vector machine (SVM) [6] are extensively used. Some researchers have tried combinations of these classifiers to improve the results [7, 8, 10, 11, 12]. The main problem of combining different techniques is that while we combine their advantages, we inevitably also combine their disadvantages.

The MLP [5] is a powerful classifier for multiclass problems, but it has a drawback when back-propagation is used as the learning algorithm: training can stop in a local minimum. The momentum strategy can be used to escape local minima; however, if we keep extending the training phase, the network can overfit its weights, decreasing its generalization ability. kNN [2] classifies a sample according to its distance to the nearest patterns in the training set. Thus, the more patterns there are in the training set, and the more evenly they are distributed among the classes, the higher the recognition rate; but the time to classify a sample depends on the number of patterns in the training database, so this technique is usually slow. The SVM [6] is considered the best binary classifier, because it finds the best margin of separation between two classes. The fact that the SVM is a binary classifier is its greatest disadvantage, since most recognition tasks are multiclass problems. To solve this, some researchers have tried to combine SVMs [8] or to use the SVM as a decision-maker classifier [9].

Based on these assumptions, this paper introduces a hierarchical SVM combination that, when applied to handwritten digit recognition, provides a highly accurate recognition rate in a short time. This study is structured as follows: related work is presented in Section 2; Section 3 proposes the SVM combination architecture; the experiments and results are in Section 4; and the final conclusions are in Section 5.

2 Related Work

The support vector machine (SVM) [6, 5] is a binary classification technique. The training phase consists of finding the support vectors of each class and creating a function that represents the best separation margin between the support vectors of the different classes. An optimal hyperplane for class separation can thus be obtained. From the characteristics presented above, the SVM seems similar to the perceptron [1], since it also tries to find a linear function to separate the classes. There are two main differences, however: the SVM finds the optimal linear function, while the perceptron seeks any linear separating function; and the SVM can handle non-linearly separable data. To do so, the SVM uses a kernel function to increase the feature dimensionality, thereby making the data linearly separable.

There are two classical ways of handling multiple classes with SVMs: one-against-all and one-against-one [13]. In the one-against-all approach, one SVM is created for each class. If we have 10 classes, as in digit recognition, we will have 10 SVMs, one per digit. We train SVM(0) to distinguish class 0 from the other classes, labeling its patterns as 1 and the other patterns as 0; SVM(1) distinguishes class 1 from the others in the same way, and so on. In the recognition phase, the pattern is submitted to the 10 SVMs, and the SVM that answers with label 1 indicates the class of the pattern [2]. The training set is the same database for all SVMs; only the labels of the patterns change. If the set is used to train SVM(i), patterns classified as i are replaced by 1 and the other patterns by 0.

Neves et al. [8] presented the one-against-one combination applied to handwritten digit recognition. The researchers used a different SVM for each possible pair of classes. In this approach, symmetric pairs such as 0-1 and 1-0 are treated as the same pair, while pairs of the same class, such as 1-1, are not considered. In handwritten digit recognition there are 10 classes, 0 to 9; pairing the classes gives a total of 45 pairs. Each class appears in 9 SVMs; for example, class 0 appears in the pairs 0-1, 0-2, 0-3, 0-4, 0-5, 0-6, 0-7, 0-8 and 0-9. The training phase consists of two steps: separate the database, creating 45 data subsets, one per pair of SVMs, where each subset contains only the patterns of its respective classes (for example, if a pair is responsible for distinguishing 0 from 1, the subset (0, 1) contains only 0 and 1 patterns); then find the kernel function that best separates the classes and train all 45 SVMs. The classification phase consists of submitting the pattern to all 45 SVMs and taking the most frequent class among the SVM outputs. The main advantage of this algorithm comes from the optimal hyperplane separation given by each SVM, which produces a highly accurate recognition rate. However, the number of combined SVMs depends on the number of classes: if the problem has many classes, the processing time of the system increases.

Another method of using SVMs for handwritten digit recognition, proposed by Bellili et al. [9], is as a decision classifier. In that work, the researchers observed that the correct class is present in the highest MLP output in 97.45% of the cases; if the two highest MLP outputs are considered, the percentage of correct classes rises to 99% of the cases. The researchers then verified which main pairs of classes were being confused. In their method, when one of these pairs is present in the two highest MLP outputs, an SVM decides which output corresponds to the correct class; in all other cases the MLP output is used.

In [7], the main idea is to increase the kNN recognition rate by using an SVM as a decision-maker classifier, similarly to Bellili et al. [9]. The adaptation in this case is to take the two most frequent classes among the k nearest neighbors and use an SVM to decide between these two classes. It is a satisfactory technique for applications where a misclassification is costly and processing time or computational cost is not an essential concern.

In Ciresan et al. [17], the digit recognition task is performed with an MLP. The researchers argue that the main problem in finding the correct MLP architecture is producing a robust classifier from the training dataset; their proposal is to provide an MLP with enough hidden layers and neurons. Camastra [18] combines SVMs with neural gas. The method uses a neural gas network to verify whether a character is capitalized or not, and then decides whether the upper- and lower-case forms of a character can be merged into a single class; the character is then submitted to the SVM recognizer for the final classification. In [19], the researchers build an ensemble classifier, using gating networks to combine the outputs of three different neural networks. In [20], a method is proposed that uses a new feature extraction technique based on recursive subdivision of the character image. Combinations of MLP and SVM have also been used to recognize non-Western languages.

3 Proposed Algorithm

After analyzing the state of the art, we propose another simple SVM combination. The main idea is to create a hierarchical SVM structure. The first level is composed of a set of SVMs, one for each class pair, but if a class appears in one pair it cannot appear in any other pair. For example, in digit recognition we have 10 possible classes (outputs), 0 to 9, so the first level has five SVMs, one for each of the pairs 0-1, 2-3, 4-5, 6-7 and 8-9. The pattern is classified by every SVM of the first level. The SVM trained with the set containing the correct class is expected to classify the sample correctly, while the others may choose either class of their pair. The second level combines the outputs of the first level using the same strategy, and the process continues until only one output remains. An example of this hierarchical structure is shown in Fig. 2, where the letters a, b, x, y and i represent the outputs given by the SVMs and the numbers in parentheses represent the pair that each SVM can differentiate.

4 Methods, Experiments and Results

The images used in the experiments were extracted from the NIST SD19 database [3], a digit database provided by the American National Institute of Standards and Technology. Each image in this database contains several digits, as shown in Fig. 3. The images were separated into isolated digits using a connected-component labeling algorithm [10]; each label corresponds to one isolated digit. After segmentation, vertical and horizontal projections [15] were used to center the digit in the image. Images larger than 20x25 were cropped, removing only the extra white borders; if the object in the image was larger than 20x25, the white border was removed and the digit resized. The size 20x25 was chosen because most digits are approximately of this size. For the supervised training of the classifiers, each digit was manually separated and labeled into classes. The final database contains a total of 11,377 digits, with on average 1,150 digits per class. The database was split into a training set and a test set: the training set contains 7,925 samples, about 800 digits per class; the test set contains 3,452 samples, about 350 digits per class. The feature vector is the same for all classifiers: it is the image matrix structured as a vector. However, before being converted into a vector, the image is resized to 12x15 to reduce the dimensionality of the feature vector, generating a vector with 180 binary features. This size was defined based on previous experiments.

4.1 Algorithms: Training and Configuration

The methods used in the experiments were MLP, kNN, MLP-SVM, 45 SVMs (one-against-one), kNN-SVM, SVM (one-against-all) and the proposed hierarchical SVM structure. In the experiments we tried to find the best configuration for each classifier.
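The proposed hierarchical SVM structure evaluated here reduces the candidate classes level by level, like a tournament. The control flow can be sketched as below; this is a minimal illustration, not the paper's code: `decide_pair(sample, a, b)` is a hypothetical callback standing in for the trained SVM of pair (a, b), and letting an odd leftover class advance unchallenged to the next level is an assumption of this sketch (the paper does not spell out that case).

```python
def hierarchical_classify(sample, classes, decide_pair):
    """Tournament-style hierarchical combination: pair up the surviving
    classes, let one pairwise classifier pick a winner per pair, and
    repeat until a single class remains."""
    level = list(classes)
    while len(level) > 1:
        winners = [decide_pair(sample, level[i], level[i + 1])
                   for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:            # odd leftover advances unchallenged
            winners.append(level[-1])
        level = winners
    return level[0]
```

With 10 classes this uses five pairwise decisions at the first level (0-1, 2-3, 4-5, 6-7, 8-9), then combines the five winners the same way until one class remains.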

The configurations used are described below:

a) Multilayer perceptron. Number of inputs: 180 (based on the feature vector); number of hidden layers: one; number of hidden-layer nodes: 180 (this number was decreased and increased during the experiments, and this was the best topology found); number of outputs: 10 (the number of possible classes); hyperbolic tangent as the activation function of all nodes; number of training epochs: 30,000; back-propagation (gradient descent) as the training algorithm.

b) SVMs (pairs). As described before, one training database was created for each possible pair, and 45 pairs were used. Polynomial and Gaussian radial basis function (RBF) kernels were tried; the polynomial kernel presented the best results and was selected for the paired SVMs and the one-against-all SVM.

c) kNN. The main parameters of kNN are the value of k and the distance measure. The value of k was varied between 3 and 11, and the distances used in the experiments were Euclidean, Manhattan and Minkowski. The best results were found with k equal to 3 and the Euclidean distance.

All algorithms were implemented in Matlab™ [16], version R2010a, and trained after parameter selection. The same trained MLP and SVMs, and the same kNN parameters, were used throughout the methods.

4.2 Experiments and Results

Two criteria were used to compare the seven algorithms (MLP, kNN, MLP-SVM [9], 45 SVMs one-against-one [8], hierarchical SVM combination, kNN-SVM [7] and SVM one-against-all): processing time and recognition rate. Table 1 presents the recognition rates obtained by the algorithms on the test set. Table 2 presents the mean and standard deviation of the processing time, in seconds, for classifying one pattern with each algorithm. A statistical test of the hypothesis that the hierarchical SVM algorithm is the fastest one (Table 2) confirmed the hypothesis with 98% confidence.

Analyzing the SVM combinations, one-against-one and one-against-all, they present promising results, but they may be insufficient if the criteria for choosing a classifier are short processing time and high recognition rate. In one-against-all, the main difficulty is finding a kernel function that increases the feature space so that one class becomes linearly separable from all the others; as the number of classes grows, the complexity of finding an effective kernel function also grows. In some cases this combination does not return a valid output: for example, after a pattern is submitted to all the SVMs, every output is labeled 0 and no classification can be made. In one-against-one, the number of SVMs used depends on the number of classes and can be obtained with equation (1) below, where n is the number of classes:

N = n(n - 1) / 2    (1)

The architecture proposed in this paper also depends on the number of classes, but the number of SVMs is significantly smaller. The kNN-based techniques obtained high recognition rates and long processing times; such methods are therefore insufficient for online recognition tasks, for example.

Considering the recognition-rate criterion, the SVM-based techniques obtained good results. The one-against-all approach was the only SVM method without a high recognition rate; a careful analysis shows, however, that of its 634 errors, 629 were patterns that received no label, because all SVMs returned the same answer and the pattern could not be classified (the sample was rejected by the classifier). Of the 634 errors, only 15 were actual misclassifications. When rejection is preferable to misclassification, this approach can be a good choice. The proposed hierarchical SVM was the third best classifier for handwritten digits, but the difference between it and the first and second is so small that they are statistically equivalent. Processing time then becomes the deciding criterion: the kNN-SVM and 45-SVMs techniques are the slowest, and the proposed method is the fastest. The hierarchical approach can therefore be credited with a high recognition rate in a short processing time. In Fig. 4 the processing time and error rate of the evaluated methods are normalized and plotted together; in this setting, the best method is the one whose result lies closest to the origin of the graph. This analysis shows that the method proposed in this paper is the best evaluated method.

5 Conclusion

This paper presented a method of applying SVMs to the handwriting recognition problem that takes both short processing time and highly accurate recognition into account. A brief study of the related literature shows that classical classifiers are still used to recognize handwritten text because of their short processing times and high recognition rates. Newer approaches, such as Neves et al. [8] and Zanchettin et al. [7], increase the recognition rate but also increase the processing time and computational cost. Based on these criteria, classical classifiers and classifier combinations for handwritten digit recognition were implemented and tested. The proposed method achieved the best results when both processing time and recognition rate are considered. The kNN-based techniques had the highest recognition rates but the longest processing times; the hierarchical SVM combination, on the other hand, obtained a high recognition rate with the shortest processing time. The experiments confirm that when both criteria are requirements, this SVM combination is the best choice. Future work will consider handwritten characters and attempt to combine the proposed method with word classification approaches.

Part 2: Original Text

An Efficient Way of Combining SVMs for Handwritten Digit Recognition

Abstract. This paper presents a method of combining SVMs (support vector machines) for multiclass problems that ensures a high recognition

rate and a short processing time when compared to other classifiers. This hierarchical SVM combination considers the high recognition rate and short processing time as evaluation criteria. The used case study was the handwritten digit recognition problem, with promising results.

Keywords: pattern recognition, handwriting digit classifier, support vector machine.

1 Introduction

Nowadays the world is digital. Technology has become ubiquitous in people's lives, and some human tasks, such as handwriting recognition, voice recognition, face recognition and others, are now machine tasks. The main recognition process [1, 2] used in this kind of application requires the following steps: data acquisition; pre-processing of the data to eliminate noise; segmentation, where the objects (text, numbers, faces, etc.) to be recognized are located and separated from the background; feature extraction, where the main features of each object are extracted; and finally there is the recognition, or classification, where the objects are labeled based on their features.

This paper is focused on the classification task, and we used the handwritten digit recognition problem as a case study because this task can represent some classification issues. For example, the patterns can be ambiguous, or some features are similar in more than one group of classes. An example of this problem is presented in Fig. 1. In Fig. 1a and in Fig. 1c the correct value of the image is seven, and in Fig. 1b, four. But Fig. 1a and b are similar and could be the same digit. Fig. 1c could be confused with the digit one. Because of this, building a classifier that generalizes well is a hard task. In some cases the best choice is to try to use the context information to differentiate one class from the other. The Hidden Markov Model (HMM) [3] is a technique frequently used to analyze the context and improve the classifier recognition rate, but its main disadvantage is the processing time. Context-modeling techniques are also usually slower. Thus, our research focuses on studying the optimization and combination of classical approaches and trying to introduce more knowledge into the classifier.

A brief overview of the handwritten digit recognition research in recent years shows that classical classifiers such as the multilayer perceptron (MLP) [5], k-nearest neighbor (kNN) [2] and support vector machine (SVM) [6] are extensively used. Some authors tried combinations of these classifiers to improve results [7, 8, 10, 11, 12]. The main problem of combining different techniques is that although we are combining the advantages, we are also joining the disadvantages of both.

The MLP [5] is a powerful classifier for multiclass problems, but there is a disadvantage when using back-propagation as the learning algorithm: the algorithm can stop training in a local minimum. It is possible to use the momentum strategy to escape from local minima; however, if we try to continue the training phase, the network can overfit the weights, decreasing the generalization capability. kNN [2] classifies a sample based on the distance from the patterns in the training set nearest to the sample. Thus, the more patterns there are in the training set, equally distributed between the classes, the higher the recognition rate. But the time to classify a sample depends on the number of patterns in the training database; therefore, this technique is usually slow. The SVM [6] is considered the best binary classifier, because it finds the best margin of separation between two classes. The fact that the SVM is a binary classifier is its greatest disadvantage, as most recognition tasks are multiclass problems. To solve this, some authors try to combine SVMs [8] or use the SVM as a decision-maker classifier [9].

Based on these assumptions, this paper introduces a hierarchical SVM combination that provides a highly accurate recognition rate in a short time to answer when applied to handwritten digit recognition. The present study is structured as follows: related works are presented in Section 2; the SVM combination architecture is proposed in Section 3; the experiments and results are in Section 4; and the final conclusions of this paper are in Section 5.

2 Related Works

The Support Vector Machine (SVM) [6, 5] is a binary classification technique. The training phase consists of finding the support vectors for each class and creating a function that represents an optimal separation margin between the support vectors of different classes. Consequently, it is possible to obtain an optimal hyperplane for class separation. Analyzing the SVM and its previously presented characteristics, it seems similar to the perceptron [1] because it also tries to find a linear function to separate classes. But there are two main differences: the SVM discovers the optimal linear function while the perceptron seeks to discover any linear separation function; and the second difference is that the SVM can deal with non-linearly separable data. To do that, the SVM uses a kernel function to increase the feature dimensionality, consequently turning the data linearly separable.

There are two classical ways to work with multiple classes using SVMs: one-against-all and one-against-one [13]. In the one-against-all approach, one SVM is created for each class. If we have 10 classes, for example, as in digit recognition, we will have 10 SVMs, one for each digit. In this way we train SVM(0) to differentiate the class 0 from the other classes, labeling it as 1 and the other patterns as 0; SVM(1) differentiates the class 1 from the other classes in the same way, and so on. In the recognition phase, the pattern is submitted to the 10 SVMs, and the SVM that replies label 1 indicates the class of the pattern [2].
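The one-against-all scheme described above can be sketched as follows. This is an illustrative sketch, not the paper's code: a nearest-centroid stub replaces the actual SVM training, since only the relabel-and-combine logic is being shown, and any binary classifier exposing `fit` and `decision_function` could be dropped in. Taking the argmax of the scores (rather than looking for a single classifier answering 1) is a choice of this sketch to resolve ties.

```python
import numpy as np

class BinaryStub:
    """Stand-in for one binary SVM: any object exposing fit(X, y) and a
    real-valued decision_function(X) could replace it."""
    def fit(self, X, y):
        self.pos = X[y == 1].mean(axis=0)   # centroid of the "1" class
        self.neg = X[y == 0].mean(axis=0)   # centroid of everything else
        return self

    def decision_function(self, X):
        # positive score when a sample is closer to the positive centroid
        return (np.linalg.norm(X - self.neg, axis=1)
                - np.linalg.norm(X - self.pos, axis=1))

def train_one_vs_all(X, labels, classes, make_clf=BinaryStub):
    # One binary classifier per class: its own class is relabeled 1,
    # every other class is relabeled 0, on the same training database.
    return {c: make_clf().fit(X, (labels == c).astype(int)) for c in classes}

def predict_one_vs_all(classifiers, X):
    # The classifier answering with the strongest positive score names the class.
    classes = list(classifiers)
    scores = np.stack([classifiers[c].decision_function(X) for c in classes])
    return np.array([classes[i] for i in scores.argmax(axis=0)])
```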

The training set is the same database for all SVMs, changing just the label of the patterns. If the set is used to train SVM(i), patterns classified as i will be replaced by 1 and the other patterns replaced by 0.

Neves et al. [8] presented the combination one-against-one applied to handwritten digit recognition. The authors used a different SVM for each possible pair of classes. In this approach, symmetric pairs, such as 0-1 and 1-0, are considered as the same pair, whereas pairs of the same class, such as 1-1, are not considered. In the case of handwriting digit recognition, there are 10 classes: 0 to 9. Pairing the classes, we have a grand total of 45 pairs. Each class will appear in 9 SVMs. For example, the class 0 will be present in the following pairs: 0-1, 0-2, 0-3, 0-4, 0-5, 0-6, 0-7, 0-8, 0-9. The training phase consists of two steps: separate the database, creating 45 data subsets, one for each pair of SVMs, where each subset contains only the patterns of its respective classes (for example, if the pair is responsible for differentiating 0 from 1, the subset (0,1) will contain only 0 and 1 patterns); then find the kernel function that better separates the classes and train all 45 SVMs. The classification phase consists of submitting the pattern to all 45 SVMs and identifying the most frequent class in the outputs of the SVMs. The main advantage of this algorithm comes from the optimal hyperplane separation given by each SVM. It produces a highly accurate recognition rate. However, the number of SVMs combined depends on the number of classes. If the problem has a lot of classes, the system processing time will increase.

Another method of using the SVM for handwriting digit recognition is as a decision classifier, as proposed by Bellili et al. [9]. In this work the authors observed that the correct recognition class is present in the highest MLP output in 97.45% of cases. However, if we consider the two highest MLP outputs, the percentage of correct classes is increased to 99% of cases. Then, the authors verified which main pairs of classes were being confused. In their proposed method, when these pairs are present in the two highest MLP outputs, the authors use an SVM to decide which output corresponds to the correct class. For all other cases, they use the MLP output.

In [7] the main idea is to increase the kNN recognition rate using the SVM as a decision-maker classifier, similar to Bellili et al. [9]. The adaptation in this case is to take the two most frequent classes in the k nearest neighbors and to use the SVM to decide between these two classes. It is a satisfactory technique to be used where a misclassification is costly and processing time or computational cost is not essential.

In Ciresan et al. [17] the digit recognition task is performed using an MLP. The authors' argument is that the main problem in finding the correct MLP architecture is to produce a robust classifier from the training dataset. The proposal is to provide an MLP with enough hidden layers and neurons. Camastra [18] combines SVMs with neural gas. The method uses a neural gas network to verify in which case the characters are, capitalized or not, and then defines whether the upper- or lower-case forms of a character can be included in a single class. The characters are then submitted to the SVM recognizer to obtain the final classification. In [19] the authors create an ensemble classifier, using gating networks to assemble the outputs given by three different neural networks. In [20] a method is proposed that uses a new feature extraction technique based on recursive subdivision of the character image. The combination of MLP and SVM is also used to recognize non-Western languages.

3 Proposed Algorithm

After analyzing the state of the art, we propose another simple SVM combination. The main idea is to create a hierarchical SVM structure. The first level is composed of a set of SVMs. There is one SVM for each class pair, but if one class is in one pair it cannot be in other pairs. For example, in the case of digit recognition, we have 10 possible classes (outputs): 0 to 9. The first level will have five SVMs, one for each of these pairs: 0-1, 2-3, 4-5, 6-7 and 8-9. The pattern will be classified by each SVM in the first level. It is expected that the SVM trained with the correct classification set correctly classifies the sample, and the others can choose any class of their pair. The second level will combine the outputs obtained in the first level, using the same strategy as the previous level. The process will continue until there is only one output. An example of this hierarchical structure is shown in Fig. 2, where the letters a, b, x, y and i represent the outputs given by the SVMs and the numbers in parentheses represent the pair that each SVM can differentiate.

4 Methods, Experiments and Results

The images used in the experiments were extracted from the NIST SD19 database [3], which is a numerical database available from the American National Institute of Standards and Technology. Each image in this database contains a varied number of digits, as presented in Fig. 3. The images were separated into isolated digits using an algorithm which employed connected component labeling [10]. Each label corresponds to an isolated digit. After segmentation, vertical and horizontal projections [15] were used to centralize the digit in the image. Images larger than 20x25 were cropped, removing only the extra white borders. If the object in the image is larger than 20x25, the white border is eliminated and the digit resized. The size (20x25) was selected because the majority of digits are approximately of this size.

Each digit was manually separated and labeled into classes to be used in the supervised training of the classifiers. The final database contains a total of 11,377 digits. Each class contains on average 1,150 digits. This digit database was separated into a training set and a test set. The training set contains 7,925 samples, approximately 800 digits per class. The test set contains 3,452 samples, approximately 350 digits per class.

The feature vector is the same for all classifiers. It is the matrix image structured as a vector. However, before converting it into a vector, the image is resized to 12x15 to reduce the dimensionality of the feature vector, generating a vector with 180 binary features.
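The feature extraction described above (center the digit, crop toward 20x25, then resize to 12x15 and flatten into 180 binary features) might look like the sketch below. The function name, the orientation assumption (12 columns by 15 rows, since 12x15 = 180), the binarization threshold and the nearest-neighbor resampling are all assumptions of this sketch; the paper does not state which resizing method was used.

```python
import numpy as np

def to_feature_vector(img, out_h=15, out_w=12, thresh=128):
    """Downsample a centered grayscale digit image (nominally 20x25) to
    12x15 by nearest-neighbor sampling, binarize it, and flatten it into
    a 180-element binary feature vector."""
    h, w = img.shape
    rows = (np.arange(out_h) * h) // out_h   # nearest source row per output row
    cols = (np.arange(out_w) * w) // out_w   # nearest source column per output column
    small = img[np.ix_(rows, cols)]          # 15x12 downsampled image
    return (small < thresh).astype(np.uint8).ravel()   # ink -> 1, background -> 0
```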
