[机械模具数控自动化专业毕业设计外文文献及翻译]【期刊】概率的数学公式识别使用一个二维的上下文无关文法图-外文文献_第1页
[机械模具数控自动化专业毕业设计外文文献及翻译]【期刊】概率的数学公式识别使用一个二维的上下文无关文法图-外文文献_第2页
[机械模具数控自动化专业毕业设计外文文献及翻译]【期刊】概率的数学公式识别使用一个二维的上下文无关文法图-外文文献_第3页
[机械模具数控自动化专业毕业设计外文文献及翻译]【期刊】概率的数学公式识别使用一个二维的上下文无关文法图-外文文献_第4页
[机械模具数控自动化专业毕业设计外文文献及翻译]【期刊】概率的数学公式识别使用一个二维的上下文无关文法图-外文文献_第5页
免费预览已结束,剩余1页可下载查看

下载本文档

版权说明:本文档由用户提供并上传,收益归属内容提供方,若内容存在侵权,请进行举报或认领

文档简介

Probabilistic Mathematical Formula RecognitionUsing a 2D Context-Free Graph GrammarMehmet CelikDepartment of Computer EngineeringBilkent UniversityAnkara, TurkeyEmail: Berrin YanikogluFaculty of Engineering and Natural SciencesSabanci UniversityIstanbul, TurkeyEmail: AbstractWe present a probabilistic framework for themathematical expression recognition problem. The developedsystem is flexible in that its grammar can be extended easilythanks to its graph grammar which eliminates the need forspecifying rule precedence. It is also optimal in the sense thatall possible interpretations of the expressions are expandedwithout making early commitments or hard decisions. In thispaper, we give an overview of the whole system and describein detail the graph grammar and the parsing process used inthe system, along with some preliminary results on character,structure and expression recognition performances.Keywords-online, handwriting recognition, OCR, mathemat-ical equation, 2D-grammar, graph grammarI. INTRODUCTIONIn spite of the ever-growing place of computers and otherdigital devices in our lives, pen and paper is still the mostconvenient way for communicating or recording informa-tion. In particular, mathematical expressions are most con-veniently entered by handwriting. Computer understandingof handwritten text (handwriting or mathematical formulas)is an ongoing research area. The difficulty is due to severalfactors, including large variations in writing styles, thesize of the lexicon indicating the possible alternatives, andambiguity of certain shapes without semantic understanding(e.g. O and 0).Mathematical expression recognition includes two mainsubproblems: character recognition for recognizing seg-mented and tokenized symbols (numbers, letters, specialmathematical symbols) and structural analysis for under-standing the structure of the expression from the spatialrelationships between the characters and the character recog-nition output. Recognizing mathematical expressions is morechallenging compared to handwritten text in recognition dueto the complex semantics of mathematical expressions aswell as the 2-dimensional layout of the characters.There are several approaches in literature for structuralanalysis of mathematical expressions: procedurally codedrules 1; X-Y cuts based on projection profiles 2, 3;baseline tree construction 4, 5; stochastic context-freegrammars 6; constraint attribute grammars 7; hierarchicaldecomposition parsing 8; spanning tree generation onweighted graphs 9; and graph grammars 1013. Amongthese approaches, graph grammars have certain advantages:as put forward in 14, graph grammars by their nature are2-dimensional representations that can represent a possiblyinfinite number of patterns with finite number of rules,when augmented with attributes. Indeed, graph grammarsare among the preferred approaches to formula recognitionin recent years. In 10, a graph grammar is added to anexisting system to relax their constraints about writing orderof symbols. Work in 1113 are based on graph re-writing,where a bottom up parser is used, collapsing matched nodesinto a single node at each rule application. For these systems,the output of the parse process is a single node containingall of the input symbols and corresponding to the intendedmeaning of the expression.Our system uses a probabilistic context-free graph gram-mar that guides the system to find mathematically validinterpretations and associate probabilities to each possi-ble interpretation of the expression. The proposed systemdistinguishes itself from the previous work in its proba-bilistic approach: whereas previous graph-grammar basedapproaches modify the initial graph with the applicationof the chosen grammar rules irreversibly, our approachentertains all possible interpretations of neighboring tokensand eventually the expression. This is possible thanks toits graph-grammar which eliminates the need for specifyingrule precedence, where all possible interpretations gener-ated thus far are kept in an extended graph. Within thisframework, the disambiguation of all possible interpretationsof the expression is done at the end of the parsing, byconsidering the likelihoods of the resulting interpretations.The likelihood of an interpretation depends on the suitabilityof the symbols spatial distribution for the rules used and theand the likelihoods of the recognized symbols. The outputof our system is a list of the most-likely parses of the input,along with their likelihoods. This is an important advantageof the proposed system, as the user can simply choose thecorrect parse from the list, rather than correcting the parseresult or rewriting the expression.The next section gives a brief description of graph gram-mars and then our approach and experimental results aredescribed in subsequent sections. In the rest of the paper, weuse the term OCR to shortly refer to character recognition;symbol and character interchangeably to refer segmentedcharacters; and node or token refer to the current group ofcharacters that form a subexpression.II. GRAPH GRAMMARSMathematical formulas are precise and the grammar ofmathematics strictly defines what is a proper mathemat-ical expression and the correct parse (intended meaning)of a given mathematical expression. A grammar consistsof production rules that indicate how terminals and non-terminals defined in the grammar, are combined to producenon-terminals as a result of the rule application. For instancewe can give a simple string grammar that defines rulesthat makes up digits and integers as follows (here the 10digits are the terminals, while digit and int are the non-terminals):digit)0j1j2j3j4j5j6j7j8j9int)f gdigitfdigitgGraph grammars provide a formalism for grammati-cal processing of multi-dimensional data which cannot beachieved by string grammars. Since their introduction tosolve picture processing problems, graph grammars havebeen used in diverse areas such as concurrent systems,databases, programming languages and biology 15. Inmathematical expression recognition, a graph grammar is of-ten used in conjunction with graph rewriting methods wherethe initial graph constructed from the tokenized expression isiteratively reduced to a single node graph corresponding tothe parse tree of the expression. In each iteration, a grammarrule is selected and applied, when a subgraph of the currentgraph matches the pattern graph of the rule; as a resultof the rule application, the current graph is transformed asindicated by the rule.Specifically, a rule r = (gl;gr;C;Em) consists of aleft-hand side graph gl and a right-hand side graph gr,an applicability predicate C, and an embedding rule Em.The applicability predicate C is a set of constraints onattribute values of nodes and/or edges, and non-existence ofcertain edges, that need to be satisfied, so as to be ableto apply the grammar rule. For instance, the applicationpredicate of a rule about superscripts indicates that twoneighboring tokens should have acceptable size and positionrelationships. With applicability predicates, the applicationof a production rule can be restricted even if the rule has amatch in the input graph. A production is the applicationof a rule r to graph G to produce G0, which is denoted asG)r G0. With a production G)r G0, an occurrence of asubgraph gl in G is replaced with a subgraph gr to produceG0, according to embedding rule Em, if the applicabilitypredicate C is satisfied. The embedding rule specifies howto place the subgraph gr within the graph containing theoriginal subgraph gl. In string grammars, the placement ofthe production is obvious, but in graph grammars, placementof production graph gr has to be specified via the embeddingrule Em that describes how to handle dangling edges (edgesthat lose one of their nodes after the gl is removed from thegraph) and how to connect produced graph gr to the existinggraph. A graph G = (n;e) is said to be in graph grammarGG if and only if n 2 N (nodes) and e 2 E (edges) ofGG and there exists a derivation that can generate G withrules from R. Here a derivation from graph G to graph G0of grammar GG is defined as a sequence of productionswhere G)ri1 G1 )ri2 G2:)rik G0. Figure 1 shows anexample rule r and a derivation from graph G to G0, wherenodes labeled a and c are replaced with node d if there isa directed edge from a to c. The embedding rule indicatesthat only edges towards c and edges outgoing from a shouldbe kept. The dashed nodes and edges in the rule r indicatepossible extra nodes and edges, which may or may not bepresent in the actualFigure 1. Application of rule r to graph g gives graph g0.III. PROPOSED METHODAn overview of the proposed system is illustrated in Fig 2.The input expression is first segmented into isolated symbols(a character or a stroke of a character) and each symbolis recognized by the OCR engine explained in SectionIII-A. Then an initial graph is constructed where the nodesrepresent the recognized symbols and edges represent the de-tected spatial neighborhood between symbols, as explainedin Section III-B.The parse algorithm applies grammar rules to the currentgraph, adding a new node and its edges in each iteration (seeintermediate tokens in Fig. 2). These new nodes or tokensrepresent possible interpretations of neighboring tokens. Theparse process continues until there is no valid productionleft.Our grammar and the parse algorithm are explained insections III-C and III-D.Figure 2. Overview of the parsing process. The constructed tokenscorrespond to the nodes in the graph while the edges are not shown forclarity.A. Segmentation and Symbol Recognition (OCR)The input expression is first segmented into individualcharacters based on time dimension, whereby a relativelylarge time difference between two consecutive strokes orcharacters is used to indicate character boundary. Then,spatially overlapping symbols are re-grouped (e.g. the twostrokes of a + sign).The OCR system is a combination of a Support VectorMachine (SVM) and an Artificial Neural Network (ANN),used in conjunction. It takes as input the segmented charac-ters and outputs top-3 alternatives with associated confidencescores. We have selected a subset from the LaViola data set5, shown in the Appendix.Preprocessing consists of size normalization which is doneon the online data to reduce artifacts whereby the coordinateof each point is mapped into a fixed coordinate range. Then acharacter image is created from these points by interpolation.Feature extraction takes as input the image of the resizedcharacter, ignoring time dimension. This is done to eliminatetemporal variations in the drawing of characters, as wellas allowing user corrections of symbols and formula thatmay be done after the equation is completed. For bothclassifiers, the input features consist of horizontal, verticaland diagonal histograms of the symbol images; horizontal,vertical and diagonal depths of the first black pixels of thesymbol images; number of black pixels in a 8 by 8 windowsover the whole symbol image; and ratio of width to height.The success rate of the SVM system on this data is92%. Although there are methods to generate posteriorprobabilities from multi-class SVM classification, we usedan ANN to generate classification alternatives and obtainreliable recognition confidence. The ANN classifier we usedis a 1-hidden layer feedforward neural network with 30hidden neurons. The performance of this classifier is lowercompared to the SVM, with top-1 and top3 recognitionrates of 88% and 97%, respectively. Since the SVM ismore successful in the top-1 performance, the OCR systemuses the SVM output as the top-choice and gets the nexttwo choices and the confidences from the ANN. While theaccuracy is lower than state-of-the-art results, OCR was notthe main focus in this work.B. Constructing the Initial GraphThe initial graph is generated from a list of tokensobtained by the segmentation and passed through the OCRengine. In this graph, a node correspond to a token and anedge between two nodes indicate that these two nodes areneighbors in the spatial layout of the expression. The processcan be explained precisely using the following definitions ofthe graph elements:Nodes: A node is a tuple n = (t;i;c;A) where t is thetype of the node; i is a unique identifier; c is the identifierof the rule that constructed the node; and A is a set ofattribute values. The type of a node t is the lexical typeof the symbols, such as number, letter, operator. Each nodeknows which rule constructed itself, so if needed, the wholehistory can be generated. Each box in Fig. 4 represents anode in the graph.Edges: An edge is tuple e = (t;n1;n2) where t is thetype of the edge, n1 and n2 are nodes that are connectedtogether by the edge. There are three types of edges used inthe parse process:Spatial relationship edges indicate if two nodes areneighbors (see definition below).Component edges between a non-terminal node and itscomponents, are used to generate the syntax tree afterthe parse process.Production edges are the reverse of component edges,linking a terminal or non-terminal node to a non-terminal node that is produced using it.The initial graph only have spatial relationship edges andthey determine the outcome, while others (component andproduction edges) are used to keep track and speed-up theparse process.In the proposed system, spatial relationship edges do nothave any attributes since we do not distinguish between dif-ferent types of neighborhood relationship (side, top, bottometc.); different neighborhood types are implicitly decided byeach rules applicability predicate. The advantage of ourapproach is that by associating spatial relationship attributeswith applicability predicates of the rules, as opposed todefining global definitions for spatial relationships, eachrule can have its own definition for spatial relationshipscategories. In this way, rather than rigidly labeling twosymbols that are written with a weak y-offset with a sideneighborhood edge, the subscript rule for instance decides ifthe two symbols relative positions are sufficient to apply therule. Neighborhood itself is defined as having a clear line ofsight between the center points of their bounding boxes andbeing at a distance smaller than a threshold value calculatedfrom the median size of the symbols in the expression.Normally, a token has 3 best recognition alternativesassociated with it. However as shown in Fig.2, if a charactermay belong to more than one type of symbol (e.g. ”+” maybe an operand or the symbol ”t”), then 2 tokens are generatedfor it in order to simplify the parse process.C. GrammarWe use a probabilistic, context-free 2D-grammar which isbased on the mathematical syntax, using the spatial layoutinformation in the applicability predicates of the rules. Inthis grammar, a rule is a tuple r = (gr;gl;C) where gl isthe pattern graph, gr is the product graph and C is the appli-cability predicate such that C : gm !fTRUE;FALSEgwhere gm is a graph matched with gl. There is no embeddingrule because all rules follow the same embedding. Normallya graph grammar rule indicates that gl is replaced by gr butin our system, it indicates that gr is added to the graph (asa new node) and gl is kept as it is.The left-hand-side graph gl of each rule is a star graph (agraph that has a central node and surrounding neighboringnodes connected only to the central node), and the right-hand-side graph gr is a single node. Fig. 3 exemplifies gland gr graphs of two simple grammar rules, where the +operator in rule r1 and in rule r2 are the central nodes ofthe rules.Figure 3. Sample rules where ”j” depicts ”or”.The most important part of the decision to apply a rulecomes from applicability predicates. For most rules, theangle and distance between symbols are checked, as well astheir sizes. Some rules may have further checks on attributevalues. For example, for the rule that checks for fractions,gl has a central node which represents the horizontal linesymbol. The constraints used in the applicability predicatesare kept loose, in order to keep all likely interpretations ofthe mathematical expression. For instance, the superscriptrule does not require that the superscript symbol is smallerin size than the base, but that it is not (much) bigger.Since the matched nodes are kept in the graph, eachrule also has a predicate that checks the non-existence ofa production edge that connects to a node which is same asgr of the rule, to prevent matching same nodes again andgenerating the same product. This somewhat complicates theparse process and increases the complexity but removes theneed for defining precedence rules.There are currently 17 rules in the grammar, includingmathematical rules for subscript, superscript, operator (+,-, , and ), fraction, summation, and integral), as wellas few rules for combining symbols written in multiple non-overlapping strokes (e.g. =, ). Some of the terminalsand non-terminals defined in the grammar is given in theAppendix.The developed system was designed mainly to converthandwritten mathematical expressions into LaTeX for easyentry of scientific articles; as such, the LaTeX code of thecorrect parse is unambiguous. However, the system does notknow about mathematical precedence rules, so as a resulttwo or more likely parse alternatives would be generatedfor an input which would only be resolved with precedencerules (e.g. a+b c+d). However, since the system offersall likely interpretations to the user, the user may chose theright interpretation among the several likely interpretations.D. Parse AlgorithmOur parse algorithm is a fairly straightforward bottom-up process. In each round, the algorithm checks what ruleof the grammar may be applicable for each token in thegraph. As illustrated in Fig. 4, there are initially 4 tokenscorrespondi

温馨提示

  • 1. 本站所有资源如无特殊说明,都需要本地电脑安装OFFICE2007和PDF阅读器。图纸软件为CAD,CAXA,PROE,UG,SolidWorks等.压缩文件请下载最新的WinRAR软件解压。
  • 2. 本站的文档不包含任何第三方提供的附件图纸等,如果需要附件,请联系上传者。文件的所有权益归上传用户所有。
  • 3. 本站RAR压缩包中若带图纸,网页内容里面会有图纸预览,若没有图纸预览就没有图纸。
  • 4. 未经权益所有人同意不得将文件中的内容挪作商业或盈利用途。
  • 5. 人人文库网仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对用户上传分享的文档内容本身不做任何修改或编辑,并不能对任何下载内容负责。
  • 6. 下载文件中如有侵权或不适当内容,请与我们联系,我们立即纠正。
  • 7. 本站不保证下载资源的准确性、安全性和完整性, 同时也不承担用户因使用这些下载资源对自己和他人造成任何形式的伤害或损失。

评论

0/150

提交评论