
Retrieval Models for Question and Answer Archives
Xiaobing Xue, Jiwoon Jeon, W. Bruce Croft

Source: Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2008.
English excerpt: 1,589 words (8,698 characters).

ABSTRACT
Retrieval in a question and answer archive involves finding good answers for a user's question. In contrast to typical document retrieval, a retrieval model for this task can exploit question similarity as well as ranking the associated answers. In this paper, we propose a retrieval model that combines a translation-based language model for the question part with a query likelihood approach for the answer part. The proposed model incorporates word-to-word translation probabilities learned through exploiting different sources of information. Experiments show that the proposed translation-based language model for the question part outperforms baseline methods significantly. By combining with the query likelihood language model for the answer part, substantial additional effectiveness improvements are obtained.

Keywords
Question and Answer Retrieval, Translation Model, Language Model, Information Retrieval

INTRODUCTION
Large scale question and answer (Q&A) archives have become an important information resource on the Web. These include the FAQ archives constructed by companies for their products and the archives generated from Web services such as Yahoo! Answers and Live QnA, where people answer questions posed by other people. The retrieval task in a Q&A archive is to find relevant question-answer pairs for new questions posed by the user [6]. Q&A retrieval has several advantages over Web search.
First, the user can use natural language instead of only keywords as a query, and thus can potentially express his/her information need more clearly. Second, the system returns several possible answers directly instead of a long list of ranked documents, and can therefore increase the efficiency of finding the required answers. Q&A retrieval can also be considered as an alternative solution to the general Question Answering (QA) problem. Since the answers for each question in the Q&A archive are generated by humans, the difficult QA task of extracting a correct answer is transformed into the Q&A retrieval task.

Question-answer pairs can also be viewed as documents with different fields, and the probabilities associated with these fields may be estimated in different ways. If we assume a language model approach, we can consider how to estimate the probability of generating a query from each field. Because the word mismatch problem between the user question and the questions in the archive is particularly acute, for the question part the query is generated by our proposed translation-based language model. For the answer part, the query is simply generated by the query likelihood language model. Our final model for Q&A retrieval is a combination of the above models. Experiments show that our proposed translation-based language model for the question part outperforms three types of representative baseline methods significantly. After combining with the query likelihood language model for the answer part, further improvement is observed.

Most previous studies on translation-based information retrieval did not recognize the weakness of the original translation model and adopted the IBM model "as is". They also suffered from low-quality word-to-word translation probability estimates.
In this paper, we overcome these drawbacks and propose a translation-based language model to solve the word mismatch problem in Q&A retrieval.

A Translation-Based Language Model for the Question Part
Both the IBM model and the query likelihood language model are generative models: the former for translation, the latter for information retrieval. Although they were proposed for different purposes, they share many common aspects and assumptions. In this subsection, we compare these two approaches and propose a new model that combines the advantages of both.

The language modeling approach to information retrieval has been successfully applied to many different applications because of its flexibility and theoretically solid background. Documents are ranked by the probability of sampling (or generating) the query from their document language models. Typically, the document language models are unigram models estimated with the maximum likelihood estimator and smoothed with the background collection using the Dirichlet smoothing technique [16].

However, we cannot simply adopt the sampling method used in the IBM model because of the self-translation problem. Since the target and the source languages are the same, every word has some probability of translating into itself. In some cases, low self-translation probabilities reduce retrieval performance by giving very low weights to the matching terms. In the opposite case, very high self-translation probabilities do not exploit the merits of the translation approach.

Results
A preliminary experiment was conducted to show the importance of the question part and the answer part for Q&A retrieval. The query likelihood retrieval model was used with the question parts, the answer parts, and the question-answer pairs, respectively.

CONCLUSION AND FUTURE WORK
Q&A retrieval has become an important issue due to the popularity of Q&A archives on the web.
In this paper, we propose a novel translation-based language model to solve this problem. Our approach combines the translation-based language model estimated using the question part and the query likelihood language model estimated using the answer part. A new technique was described for using different configurations of question-answer pairs to improve the quality of the translation probability estimates. The retrieval experiments demonstrated the effectiveness of both the retrieval model and the estimation technique.

Our experiments were conducted on one type of Q&A archive, which was collected from a web service where people answer questions posed by other people. Further work will focus on testing the effect of the proposed retrieval model on FAQ archives. We also plan to work on data from Yahoo! Answers, which is potentially a much larger collection than Wondir. The new experiments will use questions derived from this data. Other techniques for combining the models estimated from the question and answer parts will be investigated. In addition, phrase-based machine translation models have shown superior performance compared to word-based translation models in translation applications. We plan to study the effectiveness of these models in the Q&A setting.

REFERENCES
[1] A. Berger, R. Caruana, D. Cohn, D. Freitag, and V. Mittal. Bridging the lexical chasm: statistical approaches to answer-finding. In Proceedings of the 23rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 192-199, 2000.
[2] A. Berger and J. Lafferty. Information retrieval as statistical translation. In Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 222-229, 1999.
[3] P. F. Brown, V. J. D. Pietra, S. A. D. Pietra, and R. L. Mercer. The mathematics of statistical machine translation: parameter estimation. Computational Linguistics, 19(2):263-311, 1993.
[4] J. Jeon.
Searching Question and Answer Archives. IR, University of Massachusetts, August 2007.
[5] J. Jeon, W. B. Croft, J. Lee, and S. Park. A framework to predict the quality of answers with non-textual features. In Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 228-235, 2006.
[6] J. Jeon, W. B. Croft, and J. H. Lee. Finding similar questions in large question and answer archives. In Proceedings of the 14th ACM Conference on Information and Knowledge Management, pages 84-90, 2005.
[7] V. Jijkoun and M. de Rijke. Retrieving answers from frequently asked questions pages on the web. In Proceedings of the 14th ACM Conference on Information and Knowledge Management, pages 76-83, 2005.
[8] R. Jin, A. G. Hauptmann, and C. Zhai. Title language model for information retrieval. In Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 42-48, 2002.
[9] X. Liu, W. B. Croft, and M. Koll. Finding experts in community-based question-answering services. In CIKM '05: Proceedings of the 14th ACM International Conference on Information and Knowledge Management, pages 315-316, New York, NY, USA, 2005. ACM.
[10] V. Murdock and W. B. Croft. A statistical model for sentence retrieval. In Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing, pages 684-691, 2005.
[11] O. Paul and J. Callan. Language models and structured document retrieval. In Proceedings of 1st INEX workshop, 2003.
[12] J. M. Ponte and W. B. Croft. A language modeling approach to information retrieval. In Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 275-281, 1998.
[13] S. Riezler, A. Vasserman, I. Tsochantaridis, V. Mittal, and Y. Liu. Statistical machine translation for query expansion in answer retrieval.
In Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics, pages 464-471, Prague, Czech Republic, June 2007. Association for Computational Linguistics.
[14] R. D. Burke, K. J. Hammond, V. A. Kulyukin, S. L. Lytinen, N. Tomuro, and S. Schoenberg. Question answering from frequently asked question files: experiences with the FAQ Finder system. AI Magazine, 18(2):57-66, 1997.
[15] R. Soricut and E. Brill. Automatic question answering: beyond the factoid. In Proceedings of the Human Language Technology conference / North American chapter of the Association for Computational Linguistics meeting, pages 57-64, 2004.
[16] C. Zhai and J. Lafferty. A study of smoothing methods for language models applied to ad hoc information retrieval. In Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 334-342, 2001.
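The excerpt describes the retrieval model only in words: a Dirichlet-smoothed language model, a translation component for the question part, a query likelihood component for the answer part, and the need to keep self-translation under control. The Python sketch below is one possible reading of that description, not the authors' exact formulation. It assumes a linear mixture with hypothetical weights `alpha` (exact match in the question), `beta` (match reached through translation) and `gamma` (exact match in the answer), a hypothetical smoothing parameter `mu`, and a hand-written toy translation table. Giving exact matches their own weight is the assumption used here to sidestep the self-translation problem.

```python
import math
from collections import Counter


def ml_prob(word, tokens):
    """Maximum likelihood estimate of P(word | tokens)."""
    return Counter(tokens)[word] / len(tokens) if tokens else 0.0


def pair_prob(word, question, answer, trans, alpha, beta, gamma):
    """Mixture estimate of P(word | question-answer pair).

    alpha weights exact matches in the question, beta weights matches
    reached through the translation table, and gamma weights exact
    matches in the answer. The weights are assumed to sum to 1.
    """
    exact = ml_prob(word, question)
    # trans maps (query_word, archive_word) -> P(query_word | archive_word)
    translated = sum(
        trans.get((word, t), 0.0) * ml_prob(t, question) for t in set(question)
    )
    in_answer = ml_prob(word, answer)
    return alpha * exact + beta * translated + gamma * in_answer


def score(query, question, answer, collection, trans,
          alpha=0.4, beta=0.4, gamma=0.2, mu=10.0):
    """Log-probability of generating the query from one question-answer pair."""
    length = len(question) + len(answer)
    lam = length / (length + mu)  # Dirichlet-style interpolation weight
    total = 0.0
    for word in query:
        p = lam * pair_prob(word, question, answer, trans, alpha, beta, gamma)
        p += (1.0 - lam) * ml_prob(word, collection)  # background smoothing
        if p == 0.0:
            return float("-inf")  # word unseen in the pair and the collection
        total += math.log(p)
    return total


# Toy archive of (question tokens, answer tokens); all values are illustrative.
archive = [
    ("how to repair notebook".split(), "replace the battery".split()),
    ("best pizza in town".split(), "try corner shop".split()),
    ("my laptop is slow".split(), "fix the fan".split()),
]
collection = [tok for q, a in archive for tok in q + a]
trans = {("fix", "repair"): 0.5, ("laptop", "notebook"): 0.6}

query = "fix laptop".split()
scores = [score(query, q, a, collection, trans) for q, a in archive]
ranking = sorted(range(len(archive)), key=lambda i: scores[i], reverse=True)
print(ranking)
```

On this toy data the pair that contains the query words verbatim ranks first, and the pair that shares no words with the query but is reachable through the translation table ranks above the unrelated pair. Setting `beta=0.0` removes the translation component, and the two pairs without exact matches then receive identical scores, which illustrates the word mismatch problem the paper targets.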
