英语科技论文_第1页
英语科技论文_第2页
英语科技论文_第3页
全文预览已结束

下载本文档

版权说明:本文档由用户提供并上传,收益归属内容提供方,若内容存在侵权,请进行举报或认领

文档简介

Yang J Zhejiang Univ SCIENCE A 2006 7 10 1609 1625 1609 Word sense disambiguation using semantic relatedness measurement YANG Che Yu Department of Computer Science and Information Engineering Tamkang University Taipei 25137 Taiwan China E mail 890190100 s90 tku edu tw Received Nov 7 2005 revision accepted Mar 17 2006 Abstract All human languages have words that can mean different things in different contexts such words with multiple meanings are potentially ambiguous The process of deciding which of several meanings of a term is intended in a given context is known as word sense disambiguation WSD This paper presents a method of WSD that assigns a target word the sense that is most related to the senses of its neighbor words We explore the use of measures of relatedness between word senses based on a novel hybrid approach First we investigate how to literally and regularly express a concept We apply set algebra to WordNet s synsets cooperating with WordNet s word ontology In this way we establish regular rules for constructing various representations lexical notations of a concept using Boolean operators and word forms in various synset s defined in WordNet Then we establish a formal mechanism for quantifying and estimating the semantic relatedness between concepts we facilitate concept distribution statistics to determine the degree of semantic relatedness between two lexically expressed concepts The experimental results showed good performance on Semcor a subset of Brown corpus We observe that measures of semantic relatedness are useful sources of information for WSD Key words Word sense disambiguation WSD Semantic relatedness WordNet Natural language processing INTRODUCTION The need to determine the degree of semantic similarity or more generally relatedness between two lexically expressed concepts is applied in such applications as word sense disambiguation WSD determining discourse structure text summarization and annotation information extraction and retrieval automatic indexing lexical selection and automatic correction of word errors in text All human languages have words that can mean different things in different contexts such words with multiple meanings are potentially ambiguous For almost all applications of language technology such as machine translation and text retrieval word sense ambiguity has been a potential source of error WSD is the process of deciding which of their several meanings is intended in a given context It has been very difficult to formalize the automatic process of disambiguation In many ways WSD is similar to part of speech tagging It involves labelling every word in a text with a tag from a pre specified set of tag possibilities for each word by using features of the context and other information Human beings are especially sophisticated at WSD For example given the sentence The bank holds the mortgage on my home we immediately know that the bank here refers to a financial institution that accepts deposits and channels the money into lending activities Whereas given the sentence He sat on the bank of the river and watched the currents the bank here means the sloping land beside a body of water But unfortunately it is very difficult for computers to do the same job effortlessly Polysemy a single word form having more than one meaning synonymy multiple words having the same meaning are both important issues in natural language processing or artificial intelligence related fields The research on WSD has been one of the most difficult issues in computational linguistics for a long while Roughly speaking recent advances benefit from machine learning techniques sophisticated sense inventories especially WordNet Fellbaum 1998 and large corpora to find relevant linguistic features In this paper we try to utilize the method of semantic relatedness measurement to accomplish the WSD We propose the concept distribution statistics to quantify and estimate the degree of semantic relatedness between two concepts synsets defined in WordNet Miller 1995 The approach combines two features word definition in highly precise thesaurus WordNet manually constructed by domain experts and the actual word usage by numerous ordinary people around the world The first feature is used for constructing various representations of a concept by appropriate words and the second feature is used for capturing the distribution of concepts by using statistics of word appearances This article is written not only from a theoretical perspective on concept representation concept distribution and semantic relatedness but also considered the possible application of the proposed theory on WSD In the next section we introduce the online machine readable semantic dictionary WordNet The proposed novel semantic relatedness measuring and WSD method is discussed in Section 3 The experiments and discussions of our WSD methodology are presented in Section 4 The related works on WordNet based WSD is discussed in Section 5 Then a conclusion is drawn CONCLUSION This article is written not only from a theoretical perspective on concept representation concept distribution and semantic relatedness but also considered the possible application of the proposed theory on word sense disambiguation which is in the field of artificial intelligence and natural language processing In this work we propose a novel semantic relatedness measuring method which is a hybrid approach that combines a knowledge rich source WordNet for word ontology with a knowledge poor source the Internet search for word distribution statistics First we investigate how to literally and regularly express a concept We apply set algebra to WordNet s synsets cooperating with WordNet s word ontology Through this we establish regular rules for constructing various representations lexical notations of a concept using Boolean operators and word forms in various synset s defined in WordNet Then we establish a formal mechanism for quantifying and estimating the semantic relatedness between concepts We combine the idea of concept notation with the concept distribution to compute semanticrelatedness between two given concepts Then we present a method of word sense disambiguation that assigns a target word the sense that is most related to the senses of its neighbor words We carry out disambiguation relative to the senses defined in the lexical database WordNet Our method is unsupervised and does not require any training in advance The experimental results showed good performance on SemCor The algorithm can be used with any measure that computes a relatedness score between two concepts This work has shown that the measurement of semantic relatedness is a feasible approach to word sense disambiguation FUTURE RESEARCH This paper proposed a novel method for word sense disambiguation An interesting application of it is to facilitate text retrieval such as question answering QA or search engine Some methods of QA use keyword based techniques to locate interesting passages and sentences from the retrieved documents and then filter based on the presence of the desired answer type within that candidate text Ranking is then done

温馨提示

  • 1. 本站所有资源如无特殊说明,都需要本地电脑安装OFFICE2007和PDF阅读器。图纸软件为CAD,CAXA,PROE,UG,SolidWorks等.压缩文件请下载最新的WinRAR软件解压。
  • 2. 本站的文档不包含任何第三方提供的附件图纸等,如果需要附件,请联系上传者。文件的所有权益归上传用户所有。
  • 3. 本站RAR压缩包中若带图纸,网页内容里面会有图纸预览,若没有图纸预览就没有图纸。
  • 4. 未经权益所有人同意不得将文件中的内容挪作商业或盈利用途。
  • 5. 人人文库网仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对用户上传分享的文档内容本身不做任何修改或编辑,并不能对任何下载内容负责。
  • 6. 下载文件中如有侵权或不适当内容,请与我们联系,我们立即纠正。
  • 7. 本站不保证下载资源的准确性、安全性和完整性, 同时也不承担用户因使用这些下载资源对自己和他人造成任何形式的伤害或损失。

评论

0/150

提交评论