魏秀洁论文.docx

Research on Mathematical Formula Recognition Technology (数学公式识别技术研究)

Mingde College, Northwestern Polytechnical University
Undergraduate Graduation Project (Thesis): Translation of Foreign Literature
Author: 魏秀洁    Major: Automation    Student ID: 103585    Class: 191001    Advisor: 邢超
Attachments: 1. Foreign-language original  2. Chinese translation of the foreign material
June 2014
Probabilistic Mathematical Formula Recognition Using a 2D Context-Free Graph Grammar

Mehmet Celik, Department of Computer Engineering, Bilkent University, Ankara, Turkey
Berrin Yanikoglu, Faculty of Engineering and Natural Sciences, Sabanci University, Istanbul, Turkey

Abstract: We present a probabilistic framework for the mathematical expression recognition problem. The developed system is flexible in that its grammar can be extended easily thanks to its graph grammar, which eliminates the need for specifying rule precedence. It is also optimal in the sense that all possible interpretations of the expressions are expanded without making early commitments or hard decisions. In this paper, we give an overview of the whole system and describe in detail the graph grammar and the parsing process used in the system, along with some preliminary results on character, structure and expression recognition performances.

Keywords: online, handwriting recognition, OCR, mathematical equation, 2D-grammar, graph grammar

I. INTRODUCTION

In spite of the ever-growing place of computers and other digital devices in our lives, pen and paper is still the most convenient way of communicating or recording information. In particular, mathematical expressions are most conveniently entered by handwriting. Computer understanding of handwritten text (handwriting or mathematical formulas) is an ongoing research area. The difficulty is due to several factors, including large variations in writing styles, the size of the lexicon indicating the possible alternatives, and the ambiguity of certain shapes without semantic understanding (e.g. "O" and "0").

Mathematical expression recognition includes two main subproblems: character recognition, for recognizing segmented and tokenized symbols (numbers, letters, special mathematical symbols), and structural analysis, for understanding the structure of the expression from the spatial relationships between the characters and the character recognition output. Recognizing mathematical expressions is more challenging than handwritten text recognition due to the complex semantics of mathematical expressions as well as the two-dimensional layout of the characters.

There are several approaches in the literature for structural analysis of mathematical expressions: procedurally coded rules [1]; X-Y cuts based on projection profiles [2], [3]; baseline tree construction [4], [5]; stochastic context-free grammars [6]; constraint attribute grammars [7]; hierarchical decomposition parsing [8]; spanning tree generation on weighted graphs [9]; and graph grammars [10]-[13]. Among these approaches, graph grammars have certain advantages: as put forward in [14], graph grammars by their nature are two-dimensional representations that can represent a possibly infinite number of patterns with a finite number of rules, when augmented with attributes. Indeed, graph grammars are among the preferred approaches to formula recognition in recent years.
In [10], a graph grammar is added to an existing system to relax its constraints about the writing order of symbols. Work in [11]-[13] is based on graph rewriting, where a bottom-up parser is used, collapsing matched nodes into a single node at each rule application. For these systems, the output of the parse process is a single node containing all of the input symbols and corresponding to the intended meaning of the expression.

Our system uses a probabilistic context-free graph grammar that guides the system to find mathematically valid interpretations and associate probabilities with each possible interpretation of the expression. The proposed system distinguishes itself from the previous work in its probabilistic approach: whereas previous graph-grammar based approaches modify the initial graph with the application of the chosen grammar rules irreversibly, our approach entertains all possible interpretations of neighboring tokens and eventually of the expression. This is possible thanks to its graph grammar, which eliminates the need for specifying rule precedence: all possible interpretations generated thus far are kept in an extended graph. Within this framework, the disambiguation of all possible interpretations of the expression is done at the end of the parsing, by considering the likelihoods of the resulting interpretations. The likelihood of an interpretation depends on the suitability of the symbols' spatial distribution for the rules used and on the likelihoods of the recognized symbols. The output of our system is a list of the most likely parses of the input, along with their likelihoods. This is an important advantage of the proposed system, as the user can simply choose the correct parse from the list, rather than correcting the parse result or rewriting the expression.

The next section gives a brief description of graph grammars, and then our approach and experimental results are described in subsequent sections. In the rest of the paper, we use the term OCR to refer to character recognition for short; "symbol" and "character" interchangeably to refer to segmented characters; and "node" or "token" to refer to the current group of characters that form a subexpression.

II. GRAPH GRAMMARS

Mathematical formulas are precise, and the grammar of mathematics strictly defines what is a proper mathematical expression and the correct parse (intended meaning) of a given mathematical expression. A grammar consists of production rules that indicate how terminals and non-terminals defined in the grammar are combined to produce non-terminals as a result of rule application. For instance, we can give a simple string grammar that defines rules making up digits and integers as follows (here the 10 digits are the terminals, while digit and int are the non-terminals):

digit -> 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
int -> digit | digit int
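To make this toy grammar concrete, the short Python sketch below (our illustration, not part of the paper) recognizes strings of the digit/int language exactly as the two rules state; the function names are ours.

# Recognizer for the example string grammar:
#   digit -> 0 | 1 | ... | 9
#   int   -> digit | digit int
DIGITS = set("0123456789")

def is_digit(s: str) -> bool:
    # Terminal: a single character between 0 and 9.
    return len(s) == 1 and s in DIGITS

def is_int(s: str) -> bool:
    # Non-terminal: one digit, optionally followed by another int.
    return len(s) >= 1 and is_digit(s[0]) and (len(s) == 1 or is_int(s[1:]))

assert is_int("2014")
assert not is_int("20a4") and not is_int("")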
Graph grammars provide a formalism for grammatical processing of multi-dimensional data which cannot be achieved by string grammars. Since their introduction to solve picture processing problems, graph grammars have been used in diverse areas such as concurrent systems, databases, programming languages and biology [15]. In mathematical expression recognition, a graph grammar is often used in conjunction with graph rewriting methods, where the initial graph constructed from the tokenized expression is iteratively reduced to a single-node graph corresponding to the parse tree of the expression. In each iteration, a grammar rule is selected and applied when a subgraph of the current graph matches the pattern graph of the rule; as a result of the rule application, the current graph is transformed as indicated by the rule.

Specifically, a rule r = (gl, gr, C, Em) consists of a left-hand-side graph gl and a right-hand-side graph gr, an applicability predicate C, and an embedding rule Em. The applicability predicate C is a set of constraints on attribute values of nodes and/or edges, and on the non-existence of certain edges, that need to be satisfied so as to be able to apply the grammar rule. For instance, the applicability predicate of a rule about superscripts indicates that two neighboring tokens should have acceptable size and position relationships. With applicability predicates, the application of a production rule can be restricted even if the rule has a match in the input graph. A production is the application of a rule r to a graph G to produce G', which is denoted as G =>_r G'. With a production G =>_r G', an occurrence of a subgraph gl in G is replaced with a subgraph gr to produce G', according to the embedding rule Em, if the applicability predicate C is satisfied. The embedding rule specifies how to place the subgraph gr within the graph containing the original subgraph gl. In string grammars, the placement of the production is obvious, but in graph grammars, the placement of the production graph gr has to be specified via the embedding rule Em, which describes how to handle dangling edges (edges that lose one of their nodes after gl is removed from the graph) and how to connect the produced graph gr to the existing graph. A graph G = (n, e) is said to be in graph grammar GG if and only if n ∈ N (nodes) and e ∈ E (edges) of GG and there exists a derivation that can generate G with rules from R. Here a derivation from graph G to graph G' in grammar GG is defined as a sequence of productions G =>_ri1 G1 =>_ri2 G2 => ... =>_rik G'. Figure 1 shows an example rule r and a derivation from graph G to G', where nodes labeled a and c are replaced with node d if there is a directed edge from a to c. The embedding rule indicates that only edges towards c and edges outgoing from a should be kept. The dashed nodes and edges in the rule r indicate possible extra nodes and edges, which may or may not be present in the actual graph.

Figure 1. Application of rule r to graph G gives graph G'.
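The rule tuple r = (gl, gr, C, Em) described above maps naturally onto a small data structure. The Python sketch below is only our reading of that definition, with hypothetical class and field names, and it assumes that subgraph matching is done elsewhere; it is not the authors' implementation.

from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

@dataclass
class Graph:
    # node id -> label, plus directed edges stored as (source id, target id) pairs
    nodes: Dict[int, str] = field(default_factory=dict)
    edges: List[Tuple[int, int]] = field(default_factory=list)

@dataclass
class Rule:
    # r = (gl, gr, C, Em) as defined in the text
    gl: Graph                                                   # left-hand-side (pattern) graph
    gr: Graph                                                   # right-hand-side graph
    applicability: Callable[[Graph, Dict[int, int]], bool]      # predicate C over a match
    embedding: Callable[[Graph, Dict[int, int], Graph], None]   # Em: how gr is re-attached

def apply_rule(rule: Rule, g: Graph, match: Dict[int, int]) -> bool:
    # One production G =>_r G': rewrite g in place only if the applicability
    # predicate C accepts this occurrence of gl (match maps gl node ids to g node ids).
    if not rule.applicability(g, match):
        return False
    rule.embedding(g, match, rule.gr)   # Em handles dangling edges and reconnection
    return True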
III. PROPOSED METHOD

An overview of the proposed system is illustrated in Fig. 2. The input expression is first segmented into isolated symbols (a character or a stroke of a character) and each symbol is recognized by the OCR engine explained in Section III-A. Then an initial graph is constructed where the nodes represent the recognized symbols and the edges represent the detected spatial neighborhood between symbols, as explained in Section III-B. The parse algorithm applies grammar rules to the current graph, adding a new node and its edges in each iteration (see the intermediate tokens in Fig. 2). These new nodes or tokens represent possible interpretations of neighboring tokens. The parse process continues until there is no valid production left. Our grammar and the parse algorithm are explained in Sections III-C and III-D.

Figure 2. Overview of the parsing process. The constructed tokens correspond to the nodes in the graph, while the edges are not shown for clarity.

A. Segmentation and Symbol Recognition (OCR)

The input expression is first segmented into individual characters based on the time dimension, whereby a relatively large time difference between two consecutive strokes or characters is used to indicate a character boundary. Then, spatially overlapping symbols are re-grouped (e.g. the two strokes of a "+" sign).

The OCR system is a combination of a Support Vector Machine (SVM) and an Artificial Neural Network (ANN), used in conjunction. It takes as input the segmented characters and outputs the top-3 alternatives with associated confidence scores. We have selected a subset from the LaViola data set [5], shown in the Appendix. Preprocessing consists of size normalization, which is done on the online data to reduce artifacts, whereby the coordinate of each point is mapped into a fixed coordinate range. Then a character image is created from these points by interpolation. Feature extraction takes as input the image of the resized character, ignoring the time dimension. This is done to eliminate temporal variations in the drawing of characters, as well as to allow user corrections of symbols and formulas after the equation is completed. For both classifiers, the input features consist of horizontal, vertical and diagonal histograms of the symbol images; horizontal, vertical and diagonal depths of the first black pixels of the symbol images; the number of black pixels in 8-by-8 windows over the whole symbol image; and the ratio of width to height.

The success rate of the SVM system on this data is 92%. Although there are methods to generate posterior probabilities from multi-class SVM classification, we used an ANN to generate classification alternatives and obtain reliable recognition confidences. The ANN classifier we used is a 1-hidden-layer feedforward neural network with 30 hidden neurons. The performance of this classifier is lower compared to the SVM, with top-1 and top-3 recognition rates of 88% and 97%, respectively. Since the SVM is more successful in the top-1 performance, the OCR system uses the SVM output as the top choice and gets the next two choices and the confidences from the ANN. While the accuracy is lower than state-of-the-art results, OCR was not the main focus in this work.

B. Constructing the Initial Graph

The initial graph is generated from a list of tokens obtained by the segmentation and passed through the OCR engine. In this graph, a node corresponds to a token and an edge between two nodes indicates that these two nodes are neighbors in the spatial layout of the expression. The process can be explained precisely using the following definitions of the graph elements:

Nodes: A node is a tuple n = (t, i, c, A) where t is the type of the node; i is a unique identifier; c is the identifier of the rule that constructed the node; and A is a set of attribute values. The type of a node t is the lexical type of the symbol, such as number, letter or operator. Each node knows which rule constructed it, so if needed, the whole history can be generated. Each box in Fig. 4 represents a node in the graph.

Edges: An edge is a tuple e = (t, n1, n2) where t is the type of the edge, and n1 and n2 are the nodes that are connected together by the edge.
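Read literally, the node tuple n = (t, i, c, A) and the edge tuple e = (t, n1, n2) can be stored as small records. The sketch below is our illustration of such a representation; the attribute names and the example values are assumptions, not taken from the paper.

from dataclasses import dataclass, field
from typing import Dict, Optional

@dataclass
class Node:
    # n = (t, i, c, A): lexical type, unique id, producing rule id, attribute values
    type: str                       # e.g. "number", "letter", "operator"
    id: int                         # unique identifier
    rule_id: Optional[int] = None   # rule that constructed this node (None for input tokens)
    attrs: Dict[str, float] = field(default_factory=dict)   # e.g. bounding box, OCR confidence

@dataclass
class Edge:
    # e = (t, n1, n2): edge type plus the ids of the two nodes it connects
    type: str                       # "spatial", "component", or "production"
    n1: int
    n2: int

# Example: three recognized symbols "a", "+", "b" linked by spatial relationship edges.
a, plus, b = Node("letter", 0), Node("operator", 1), Node("letter", 2)
edges = [Edge("spatial", 0, 1), Edge("spatial", 1, 2)]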
There are three types of edges used in the parse process:

- Spatial relationship edges indicate whether two nodes are neighbors (see the definition below).
- Component edges, between a non-terminal node and its components, are used to generate the syntax tree after the parse process.
- Production edges are the reverse of component edges, linking a terminal or non-terminal node to a non-terminal node that is produced using it.

The initial graph only has spatial relationship edges, and they determine the outcome, while the others (component and production edges) are used to keep track of and speed up the parse process. In the proposed system, spatial relationship edges do not have any attributes, since we do not distinguish between different types of neighborhood relationship (side, top, bottom, etc.); different neighborhood types are implicitly decided by each rule's applicability predicate. The advantage of our approach is that by associating spatial relationship attributes with the applicability predicates of the rules, as opposed to defining global definitions for spatial relationships, each rule can have its own definition of the spatial relationship categories. In this way, rather than rigidly labeling two symbols that are written with a weak y-offset with a side neighborhood edge, the subscript rule, for instance, decides if the two symbols' relative positions are sufficient to apply the rule. Neighborhood itself is defined as having a clear line of sight between the center points of the bounding boxes and being at a distance smaller than a threshold value calculated from the median size of the symbols in the expression.

Normally, a token has 3 best recognition alternatives associated with it. However, as shown in Fig. 2, if a character may belong to more than one type of symbol (e.g. "+" may be an operator or the symbol "t"), then 2 tokens are generated for it in order to simplify the parse process.
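To illustrate the neighborhood definition above (bounding-box centers closer than a threshold derived from the median symbol size), here is a simplified Python sketch; the threshold factor is an assumed constant and the line-of-sight test is left out.

import math
from statistics import median
from typing import List, Tuple

BBox = Tuple[float, float, float, float]    # (x_min, y_min, x_max, y_max)

def center(b: BBox) -> Tuple[float, float]:
    return ((b[0] + b[2]) / 2.0, (b[1] + b[3]) / 2.0)

def size(b: BBox) -> float:
    return max(b[2] - b[0], b[3] - b[1])

def are_neighbors(b1: BBox, b2: BBox, all_boxes: List[BBox], factor: float = 2.0) -> bool:
    # Two symbols are candidate neighbors when their bounding-box centers are closer
    # than a threshold computed from the median symbol size of the whole expression.
    threshold = factor * median(size(b) for b in all_boxes)
    (x1, y1), (x2, y2) = center(b1), center(b2)
    return math.hypot(x2 - x1, y2 - y1) < threshold

boxes = [(0, 0, 10, 12), (14, 2, 22, 10), (60, 0, 70, 12)]
print(are_neighbors(boxes[0], boxes[1], boxes))   # True: adjacent symbols
print(are_neighbors(boxes[0], boxes[2], boxes))   # False: far apart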
C. Grammar

We use a probabilistic, context-free 2D grammar which is based on mathematical syntax, using the spatial layout information in the applicability predicates of the rules. In this grammar, a rule is a tuple r = (gr, gl, C) where gl is the pattern graph, gr is the product graph and C is the applicability predicate such that C: gm -> {TRUE, FALSE}, where gm is a graph matched with gl. There is no embedding rule because all rules follow the same embedding. Normally a graph grammar rule indicates that gl is replaced by gr, but in our system it indicates that gr is added to the graph (as a new node) and gl is kept as it is.

The left-hand-side graph gl of each rule is a star graph (a graph that has a central node and surrounding neighboring nodes connected only to the central node), and the right-hand-side graph gr is a single node. Fig. 3 exemplifies the gl and gr graphs of two simple grammar rules, where the "+" operator in rule r1 and in rule r2 is the central node of the rules.

Figure 3. Sample rules where "|" depicts "or".

The most important part of the decision to apply a rule comes from the applicability predicates. For most rules, the angle and distance between symbols are checked, as well as their sizes. Some rules may have further checks on attribute values. For example, for the rule that checks for fractions, gl has a central node which represents the horizontal line symbol. The constraints used in the applicability predicates are kept loose, in order to keep all likely interpretations of the mathematical expression. For instance, the superscript rule does not require that the superscript symbol is smaller in size than the base, but that it is not (much) bigger.

Since the matched nodes are kept in the graph, each rule also has a predicate that checks the non-existence of a production edge connecting to a node which is the same as the gr of the rule, to prevent matching the same nodes again and generating the same product. This somewhat complicates the parse process and increases the complexity, but removes the need for defining precedence rules.

There are currently 17 rules in the grammar, including mathematical rules for subscript, superscript, operators (+, -, and others), fraction, summation, and integral, as well as a few rules for combining symbols written in multiple non-overlapping strokes (e.g. "="). Some of the terminals and non-terminals defined in the grammar are given in the Appendix.

The developed system was designed mainly to convert handwritten mathematical expressions into LaTeX for easy entry of scientific articles; as such, the LaTeX code of the correct parse is unambiguous. However, the system does not know about mathematical precedence rules, so two or more likely parse alternatives would be generated for an input that could only be resolved with precedence rules (e.g. a + b c + d). However, since the system offers all likely interpretations to the user, the user may choose the right interpretation among the several likely interpretations.

D. Parse Algorithm

Our parse algorithm is a fairly straightforward bottom-up process. In each round, the algorithm checks which rules of the grammar may be applicable for each token in the graph. As illustrated in Fig. 4, there are initially 4 tokens corresponding to 4 nodes in the initial graph; then, after the first round, two new tokens (a² and a+b) are generated and added to the graph.

Specifically, two tasks have to be done by the parser: finding a match for the pattern graphs of the rules and embedding the resulting product graph. Since the pattern graph of any rule is a star graph in our system, when processing a node, the parser looks for a matching rule which has the same center node; then it checks the neighboring nodes and applicability predicates to finalize the matching process.

Figure 4. Nodes generated in each round, with their component edges.

Once a match is found, a new node is generated according to the rule, which then gets connected to the existing graph with component and production edges. Spatial relationship edges are generated among newly produced nodes after no possible production is left in the existing graph. Each new node inherits the neighbors of its components, and spatial relationship edges between new nodes are generated separately.

In order to deal with the complexity of the parse process, where all possible local interpretations are kept, we use the fitness (likelihood) of a token to decide whether to expand that token (i.e. apply rules). Ideally this would be done with A* search, but currently it is done by a fitness threshold which is dynamically adjusted depending on the total number of tokens and on the fitness and coverage (how many tokens of the input expression are covered) of the best token.
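The rounds described in this subsection can be condensed into the following Python sketch. It is our summary of the algorithm, not the authors' code: the rule-matching and production interfaces (rule.matches, rule.produce) and the threshold function are assumed for illustration.

def parse(tokens, rules, fitness, threshold_fn):
    # Bottom-up sketch: keep applying rules until no valid production is left.
    #   tokens       -- initial nodes from segmentation and OCR
    #   rules        -- grammar rules; rule.matches(center, graph) yields star-graph matches
    #   fitness      -- fitness(node) -> likelihood, used to decide whether to expand a node
    #   threshold_fn -- dynamic threshold from the token count and best-token coverage
    graph = list(tokens)          # nodes produced so far; nothing is ever deleted
    produced = set()              # (rule, component ids) combinations already generated
    while True:
        threshold = threshold_fn(graph)
        new_nodes = []
        for node in graph:
            if fitness(node) < threshold:
                continue          # too unlikely to be worth expanding
            for rule in rules:
                for match in rule.matches(node, graph):   # node acts as the star center
                    key = (id(rule), frozenset(m.id for m in match))
                    if key in produced:
                        continue  # the same product would be generated again
                    produced.add(key)
                    new_nodes.append(rule.produce(match)) # adds component/production edges
        if not new_nodes:
            return graph          # no valid production left
        graph.extend(new_nodes)   # spatial edges among the new nodes are added here as well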
E. Disambiguating the Parse Results

The output of the parse process is a graph where all possible productions are present. Furthermore, if the input can be defined by the grammar, then at least one node which covers all input symbols will be in the output graph. Since component edges keep the history of productions, an expression tree can be generated if one of the alternative interpretation nodes is selected as the root and component edges are followed until a terminal node is reached.

We compute the likelihood, also referred to as fitness, of each parse alternative according to pre-learned spatial layout and OCR output probability distributions. These distributions, for instance for the relative size differences of base and subscript symbols, are learned offline from separate training data. In short, the likelihood of each generated node is the average log-likelihood of the spatial relations that generated the node and the likelihoods of its components, where the latter is the probability of occurrence of a symbol.

To give an example, for the input in Fig. 4, the likelihood of the token a+b would depend on the likelihood of the spatial layout of the symbols "a", "b" and "+", with respect to the rule generating the addition. We model each spatial distribution's statistics as a histogram and compute the likelihood of a given distance between two symbols (e.g. the x or y offset between "a" and "+") with respect to this histogram. We also use the character recognition probabilities to differentiate between alternative parses such as a+b and atb, which share the same layout likelihood but differ in the likelihood of the letter "t". The likelihood of a more complex expression (e.g. (a+b)²) is computed by averaging the log-likelihoods of its components, weighted by the number of components in each component. The likelihood calculation is done at each rule application.
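As a rough illustration of the fitness computation just described, the sketch below scores a spatial offset against a learned histogram and averages log-likelihoods weighted by component size; the histogram bins and the numbers in the example are made up, not learned values from the paper.

import math
from typing import List, Sequence

def histogram_likelihood(value: float, edges: Sequence[float], probs: Sequence[float]) -> float:
    # Probability of an observed offset under a histogram learned offline.
    for lo, hi, p in zip(edges, edges[1:], probs):
        if lo <= value < hi:
            return max(p, 1e-6)    # floor avoids log(0)
    return 1e-6                    # outside the learned range: very unlikely

def node_log_likelihood(spatial_probs: List[float],
                        component_lls: List[float],
                        component_sizes: List[int]) -> float:
    # Average log-likelihood of the spatial relations that produced the node and of its
    # components, the latter weighted by how many symbols each component covers.
    terms = [math.log(p) for p in spatial_probs]
    terms += [ll * n for ll, n in zip(component_lls, component_sizes)]
    return sum(terms) / (len(spatial_probs) + sum(component_sizes))

# Example for a token like "a+b": the offsets of "a" and "b" from "+" are scored
# against a (made-up) learned histogram and combined with three symbol likelihoods.
x_edges, x_probs = [0.0, 0.5, 1.0, 2.0], [0.2, 0.5, 0.3]
p_offset = histogram_likelihood(0.7, x_edges, x_probs)            # falls in the middle bin
print(node_log_likelihood([p_offset, p_offset],
                          [math.log(0.9), math.log(0.95), math.log(0.8)],
                          [1, 1, 1]))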
IV. EXPERIMENTAL RESULTS

The developed system is tested using a portion of the mathematical expression database collected in association with this work [16]. The full database contains 57 equations each from 15 different users, chosen from common expressions to match the ones used by Vuong et al. [17]. Expression lengths range from 7 to 30 characters. The test set consists of 20 equations, each written by 5 different users. The results are analyzed in terms of expression recognition accuracy (the LaTeX code generated for the equation is correct); structure recognition accuracy (the LaTeX code is correct except for OCR mistakes); and character recognition accuracy, as illustrated in Table I.

Table I. Overall accuracy results (5 users x 20 expressions each)
  Task                                Accuracy   Count
  Correctly recognized expressions    17%        17/100
  Correct structural analysis         50%        50/100
  Correct character recognition       79%        1100/1410

Table II. Results analyzed in terms of the length of the expressions
  Task                                Accuracy   Ratio
  Expression length ≤ 10                         25/100
    Correctly recognized expressions  52%        13/25
    Correct structural analysis       88%        22/25
  Expression length 11-30                        75/100
    Correctly recognized expressions  5.33%      4/75
    Correct structural analysis       37.33%     28/75

We see that the equation recognition accuracy is low (17%), which is not very surprising given the difficulty of the problem; but the structure recognition accuracy is not very high either (50%). This can be explained by the fact that both overall and structure errors are affected significantly by OCR accuracy. For instance, if a parenthesis, the integral sign or the plus sign is misrecognized, the structure analysis fails. Unfortunately, these special characters are among the ones that also incur the largest OCR mistakes. Indeed, character recognition accuracy for characters occurring in equations is much lower compared to the isolated-character accuracy obtained in this work (79% vs. 91%). Table II shows the distribution of errors according to the length of the expressions.

Accuracy results reported in the literature for online mathematical expression recognition show a large variance (27-75% for recognition and 91-98% for structure recognition [16]). One must be careful in comparing results, as systems differ in many aspects, from the complexity of the grammars to the test database size and complexity. In addition to the different databases used, one important factor that increases the accuracy among the reported results is feedback given to the user while s/he is writing, resulting in a perfect segmentation and OCR result. In a second test with 10 users and 4 expressions from each, we tested this scenario by manually tokenizing the expressions and entering the correct OCR results, resulting in 85% correct structure recognition, failing typically only for long expressions by time-out.

V. CONCLUSION

We presented a system for mathematical expression recognition, using a 2D graph grammar that generates all likely parse alternatives of the expressions, along with their likelihoods. Given that the mathematical expression recognition problem is far from being solved, this is an important feature, since the user can select the correct alternative from this list, rather than tediously correcting the input or output. The grammar used in the system is a star grammar, which eliminates the need for specifying rule precedence; since the rules do not delete any nodes, it is easy to extend the grammar. The exponential complexity of the parser is an issue with graph grammars; in our grammar, all left-hand-side graphs are star graphs, which makes subgraph matching easier. Currently, the system takes 1-10 seconds for most expressions, although the parser occasionally times out. We are in the process of implementing a best-first search algorithm in which rules are applied to generate the plausible interpretations (as indicated by their computed likelihoods).

VI. ACKNOWLEDGMENT

This work was supported by The Scientific and Technological Research Council of Turkey under project 107E271. We thank the reviewers for their helpful suggestions and criticism.