Foreign-language translation: Apache Lucene website
Original text

Apache Lucene Website

Lucene is a Java full-text search engine. Lucene is not a complete application, but rather a code library and API that can easily be used to add search capabilities to applications.

This is the official documentation for Apache Lucene 4.7.1. Additional documentation is available in the Wiki.

Getting Started

The following section is intended as a "getting started" guide. It has three audiences: first-time users looking to install Apache Lucene in their application; developers looking to modify or base the applications they develop on Lucene; and developers looking to become involved in and contribute to the development of Lucene. The goal is to help you "get started". It does not go into great depth on some of the conceptual or inner details of Lucene:

Lucene demo, its usage, and sources: Tutorial and walk-through of the command-line Lucene demo.
Introduction to Lucene's APIs: High-level summary of the different Lucene packages.
Analysis overview: Introduction to Lucene's analysis API. See also the TokenStream consumer workflow.

Reference Documents

Changes: List of changes in this release.
System Requirements: Minimum and supported Java versions.
Migration Guide: What changed in Lucene 4; how to migrate code from Lucene 3.x.
JRE Version Migration: Information about upgrading between major JRE versions.
File Formats: Guide to the supported index format used by Lucene. This can be customized by using an alternate codec.
Search and Scoring in Lucene: Introduction to how Lucene scores documents.
Classic Scoring Formula: Formula of Lucene's classic Vector Space implementation. (look here for other models)
Classic QueryParser Syntax: Overview of the Classic QueryParser's syntax and features.

API Javadocs

core: Lucene core library
analyzers-common: Analyzers for indexing content in different languages and domains.
analyzers-icu: Analysis integration with ICU (International Components for Unicode).
analyzers-kuromoji: Japanese Morphological Analyzer
analyzers-morfologik: Analyzer for indexing Polish
analyzers-phonetic: Analyzer for indexing phonetic signatures (for sounds-alike search)
analyzers-smartcn: Analyzer for indexing Chinese
analyzers-stempel: Analyzer for indexing Polish
analyzers-uima: Analysis integration with Apache UIMA
benchmark: System for benchmarking Lucene
classification: Classification module for Lucene
codecs: Lucene codecs and postings formats.
demo: Simple example code
expressions: Dynamically computed values to sort/facet/search on based on a pluggable grammar.
facet: Faceted indexing and search capabilities
grouping: Collectors for grouping search results.
highlighter: Highlights search keywords in results
join: Index-time and Query-time joins for normalized content
memory: Single-document in-memory index implementation
misc: Index tools and other miscellaneous code
queries: Filters and Queries that add to core Lucene
queryparser: Query parsers and parsing framework
replicator: Files replication utility
sandbox: Various third party contributions and new ideas
spatial: Geospatial search
suggest: Auto-suggest and Spellchecking support
test-framework: Framework for testing Lucene-based applications

Lucene 4.7.1 core API

Apache Lucene is a high-performance, full-featured text search engine library.

Packages

org.apache.lucene: T…
org.apache.lucene.analysis: API and code to convert text into indexable/…
org.apache.lucene.analysis.tokenattributes: G…
org.apache.lucene.codecs: Codecs API: API …
…pressing: StoredFieldsF…
org.apache.lucene.codecs.lucene3x: Codec to support Lucene 3.x indexes (read-only)
org.apache.lucene.codecs.lucene40: L…
org.apache.lucene.codecs.lucene41: L…
org.apache.lucene.codecs.lucene42: L…
org.apache.lucene.codecs.lucene45: L…
org.apache.lucene.codecs.lucene46: L…
org.apache.lucene.codecs.perfield: P…
org.apache.lucene.document: The logical representation of a D…
org.apache.lucene.index: C…
org.apache.lucene.search: C…
org.apache.lucene.search.payloads: The payloads package provides Q…
org.apache.lucene.search.similarities: This package contains the various ranking models that can be used in L…
org.apache.lucene.search.spans: T…
org.apache.lucene.store: Binary i/o API, …
org.apache.lucene.util: S…
org.apache.lucene.util.automaton: F…
org.apache.lucene.util.fst: Finite state transducers
org.apache.lucene.util.mutable: Comparable object wrappers
org.apache.lucene.util.packed: Packed integer arrays and streams.

Apache Lucene - Building and Installing the Basic Demo

About this Document
About the Demo
Setting your CLASSPATH
Indexing Files
About the code
Location of the source
IndexFiles
Searching Files

About this Document

This document is intended as a "getting started" guide to using and running the Lucene demos. It walks you through some basic installation and configuration.

About the Demo

The Lucene command-line demo code consists of an application that demonstrates various functionalities of Lucene and how you can add Lucene to your applications.

Setting your CLASSPATH

First, you should download the latest Lucene distribution and then extract it to a working directory.

You need four JARs: the Lucene JAR, the queryparser JAR, the common analysis JAR, and the Lucene demo JAR. You should see the Lucene JAR file in the core/ directory you created when you extracted the archive; it should be named something like lucene-core-version.jar. You should also see files called lucene-queryparser-version.jar, lucene-analyzers-common-version.jar and lucene-demo-version.jar under queryparser, analysis/common/ and demo/, respectively.

Put all four of these files in your Java CLASSPATH.

Indexing Files

Once you've gotten this far you're probably itching to go. Let's build an index! Assuming you've set your CLASSPATH correctly, just type:

    java org.apache.lucene.demo.IndexFiles -docs path-to-lucene/src

This will produce a subdirectory called index which will contain an index of all of the Lucene source code.

To search the index type:

    java org.apache.lucene.demo.SearchFiles

You'll be prompted for a query. Type in a gibberish or made-up word (for example: "supercalifragilisticexpialidocious"). You'll see that there are no matching results in the Lucene source code. Now try entering the word "string". That should return a whole bunch of documents. The results will page at every tenth result and ask you whether you want more results.

About the code

In this section we walk through the sources behind the command-line Lucene demo: where to find them, their parts and their function. This section is intended for Java developers wishing to understand how to use Lucene in their applications.

Location of the source

The files discussed here are linked into this documentation directly:

IndexFiles.java: code to create a Lucene index.
SearchFiles.java: code to search a Lucene index.

IndexFiles

As we discussed in the previous walk-through, the IndexFiles class creates a Lucene index. Let's take a look at how it does this.

The main() method parses the command-line parameters, then, in preparation for instantiating IndexWriter, opens a Directory, and instantiates StandardAnalyzer and IndexWriterConfig.

The value of the -index command-line parameter is the name of the filesystem directory where all index information should be stored. If IndexFiles is invoked with a relative path given in the -index command-line parameter, or if the -index command-line parameter is not given, causing the default relative index path "index" to be used, the index path will be created as a subdirectory of the current working directory (if it does not already exist). On some platforms, the index path may be created in a different directory (such as the user's home directory).

The -docs command-line parameter value is the location of the directory containing files to be indexed.

The -update command-line parameter tells IndexFiles not to delete the index if it already exists. When -update is not given, IndexFiles will first wipe the slate clean before indexing any documents.

Lucene Directory instances are used by the IndexWriter to store information in the index. In addition to the FSDirectory implementation we are using, there are several other Directory subclasses that can write to RAM, to databases, etc.

Lucene Analyzers are processing pipelines that break up text into indexed tokens, a.k.a. terms, and optionally perform other operations on these tokens, e.g. downcasing, synonym insertion, filtering out unwanted tokens, etc. The Analyzer we are using is StandardAnalyzer, which creates tokens using the Word Break rules from the Unicode Text Segmentation algorithm specified in Unicode Standard Annex #29; converts tokens to lowercase; and then filters out stopwords. Stopwords are common language words such as articles (a, an, the, etc.) and other tokens that may have less value for searching. It should be noted that there are different rules for every language, and you should use the proper analyzer for each. Lucene currently provides Analyzers for a number of different languages (see the javadocs under lucene/analysis/common/src/java/org/apache/lucene/analysis).

The IndexWriterConfig instance holds all configuration for IndexWriter. For example, we set the OpenMode to use here based on the value of the -update command-line parameter.

Looking further down in the file, after IndexWriter is instantiated, you should see the indexDocs() code. This recursive function crawls the directories and creates Document objects. The Document is simply a data object to represent the text content from the file as well as its creation time and location. These instances are added to the IndexWriter. If the -update command-line parameter is given, the IndexWriterConfig OpenMode will be set to OpenMode.CREATE_OR_APPEND, and rather than adding documents to the index, the IndexWriter will update them in the index by attempting to find an already-indexed document with the same identifier (in our case, the file path serves as the identifier); deleting it from the index if it exists; and then adding the new document to the index.

Searching Files

The SearchFiles class is quite simple. It primarily collaborates with an IndexSearcher, StandardAnalyzer (which is used in the IndexFiles class as well) and a QueryParser. The query parser is constructed with an analyzer used to interpret your query text in the same way the documents are interpreted: finding word boundaries, downcasing, and removing useless words like "a", "an" and "the". The Query object contains the results from the QueryParser which is passed to the searcher. Note that it's also possible to programmatically construct a rich Query object without using the query parser. The query parser just enables decoding the Lucene query syntax into the corresponding Query object.

SearchFiles uses the IndexSearcher.search(query, n) method that returns TopDocs with max n hits. The results are printed in pages, sorted by score (i.e. relevance).

Chinese translation

Apache Lucene Website

Lucene is a Java full-text search engine. Lucene is not a complete application; it is a code library and API that can easily be used to add search functionality to applications.

This is the official documentation for Apache Lucene 4.7.1. Additional documentation can be found in the Wiki.

Getting Started

The following section is a "getting started" guide. It has three kinds of readers: users who want to install Apache Lucene in their application for the first time; developers who want to modify Lucene or build applications on it; and developers who want to take part in and contribute to the development of Lucene. The goal is only to help you "get started". It does not cover some of the deeper concepts or internal details of Lucene:

1. Lucene demo, its usage, and sources: Tutorial and walk-through of the command-line Lucene demo.
2. Introduction to Lucene's APIs: High-level summary of the different Lucene packages.
3. Analysis overview: Introduction to Lucene's analysis API. See also the TokenStream consumer workflow.

Reference Documents

1. Changes: List of changes in this release.
2. System Requirements: Minimum and supported Java versions.
3. Migration Guide: What changed; how to migrate code from Lucene 3.x.
4. JRE Version Migration: Information about upgrading between major JRE versions.
5. File Formats: Guide to the supported index format used by Lucene. This can be customized by using an alternate codec.
6. Search and Scoring in Lucene: Introduction to how Lucene scores documents.
7. Classic Scoring Formula: Formula of the classic Vector Space implementation. (look here for other models)
8. Classic QueryParser Syntax: Overview of the Classic QueryParser's syntax and features.

API Javadocs

1. core: Lucene core library
2. analyzers-common: Analyzers for indexing content in different languages and domains.
3. analyzers-icu: Analysis integration with ICU (International Components for Unicode).
4. analyzers-kuromoji: Japanese language analyzer
5. analyzers-morfologik: Analyzer for indexing Polish
6. analyzers-phonetic: Analyzer for indexing phonetic signatures (sounds-alike search)
7. analyzers-smartcn: Analyzer for indexing Chinese
8. analyzers-stempel: Analyzer for indexing Polish
9. analyzers-uima: Analysis integration with Apache UIMA
10. benchmark: System for benchmarking Lucene
11. classification: Classification module for Lucene
12. codecs: Lucene codecs and postings formats.
13. demo: Simple example code
14. expressions: Dynamically computed values to sort and search on, based on a pluggable grammar.
15. facet: Faceted indexing and search capabilities
16. grouping: Collectors for grouping search results.
17. highlighter: Highlights search keywords in results
18. join: Index-time and query-time joins for normalized content
19. memory: Single-document in-memory index implementation
20. misc: Index tools and other miscellaneous code
21. queries: Filters and Queries that add to core Lucene
22. queryparser: Query parsers and parsing framework
23. replicator: Files replication utility
24. sandbox: Various third-party contributions and new ideas
25. spatial: Geospatial search
26. suggest: Auto-suggest and spellchecking support
27. test-framework: Framework for testing Lucene-based applications

Lucene 4.7.1 core API

Apache Lucene is a high-performance, full-featured text search engine library.

Packages

org.apache.lucene: Top-level package.
org.apache.lucene.analysis: API and code to convert text into indexable/searchable tokens.
org.apache.lucene.analysis.tokenattributes: General-purpose attributes for text analysis.
org.apache.lucene.codecs: Codecs API: API for customizing the encoding and structure of the index.
…pressing: StoredFieldsFormat that allows cross-document and cross-field compression of stored fields.
org.apache.lucene.codecs.lucene3x: Codec to support Lucene 3.x indexes (read-only)
org.apache.lucene.codecs.lucene40: Lucene 4.0 file format.
org.apache.lucene.codecs.lucene41: Lucene 4.1 file format.
org.apache.lucene.codecs.lucene42: Lucene 4.2 file format.
org.apache.lucene.codecs.lucene45: Lucene 4.5 file format.
org.apache.lucene.codecs.lucene46: Lucene 4.6 file format.
org.apache.lucene.codecs.perfield: Postings format that can use a different format for each field.
org.apache.lucene.document: The logical representation of a Document for indexing and searching.
org.apache.lucene.index: Code to maintain and access indices.
org.apache.lucene.search: Code to search indices.
org.apache.lucene.search.payloads: The payloads package provides Query mechanisms for finding and using payloads.
org.apache.lucene.search.similarities: This package contains the various ranking models that can be used in L…
org.apache.lucene.search.spans: The calculus of spans.
org.apache.lucene.store: Binary i/o API, used for all index data.
org.apache.lucene.util: Some utility classes.
org.apache.lucene.util.automaton: Finite-state automaton for regular expressions.
org.apache.lucene.util.fst: Finite state transducers
org.apache.lucene.util.mutable: Comparable object wrappers
org.apache.lucene.util.packed: Packed integer arrays and streams.

Apache Lucene - Building and Installing the Basic Demo

1. About this Document
2. About the Demo
3. Setting your CLASSPATH
4. Indexing Files
5. About the code
6. Location of the source
7. IndexFiles
8. Searching Files

About this Document

This document is intended as a "getting started" guide to using Lucene and running the demos. It walks you through some basic installation and configuration.

About the Demo

The Lucene demo code consists of a command-line application that demonstrates various features of Lucene and how to add Lucene to your applications.

Setting your CLASSPATH

First, download the latest Lucene distribution and extract it to a directory.

You need four JARs: the Lucene JAR, the queryparser JAR, the common analysis JAR, and the Lucene demo JAR. You should see the Lucene JAR file in the core/ directory that was created when you extracted the archive; it should be named something like lucene-core-version.jar. You should also see the files lucene-queryparser-version.jar, lucene-analyzers-common-version.jar and lucene-demo-version.jar under queryparser, analysis/common/ and demo/, respectively.

Put all four of these files in your Java CLASSPATH.

Indexing Files

Once you have got this far you are probably itching to go. Let's build an index! Assuming you have set your CLASSPATH correctly, just type:

    java org.apache.lucene.demo.IndexFiles -docs path-to-lucene/src

This will produce a subdirectory called index, which will contain an index of all of the Lucene source code.

To search the index, type:

    java org.apache.lucene.demo.SearchFiles

You will be prompted for a query. Type in a gibberish or made-up word (for example: "supercalifragilisticexpialidocious"). You will see that there are no matching results in the Lucene source code.
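
The analyze, index, and search flow described in the IndexFiles and Searching Files walk-through can be illustrated with a small self-contained sketch. It deliberately does not use Lucene: the class ToySearchDemo, its methods, and the three-word stop list are hypothetical names chosen for illustration, the word splitting is a crude stand-in for the Unicode Standard Annex #29 rules, and counting term occurrences is only a placeholder for Lucene's real scoring. What it does mirror from the text is that queries are analyzed the same way as documents, that an update deletes any document with the same identifier (the file path) before adding the new one, and that a search returns at most n hits sorted by score.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Toy stand-in for the analyze / index / search flow; it does not use Lucene. */
final class ToySearchDemo {
    // Hypothetical stop-word list for illustration; not Lucene's actual list.
    private static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "the"));

    // Identifier (here: a file path) -> analyzed terms of that document.
    private final Map<String, List<String>> docs = new LinkedHashMap<>();

    /** Splits text into tokens, lowercases them, and drops stop words. */
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        // Crude word-boundary rule: split on runs of non-letter, non-digit characters.
        for (String token : text.split("[^\\p{L}\\p{N}]+")) {
            String lower = token.toLowerCase(Locale.ROOT);
            if (!lower.isEmpty() && !STOP_WORDS.contains(lower)) {
                terms.add(lower);
            }
        }
        return terms;
    }

    /** Update semantics: delete any document with the same path, then add the new one. */
    void update(String path, String text) {
        docs.remove(path);
        docs.put(path, analyze(text));
    }

    int size() {
        return docs.size();
    }

    /** Returns at most n paths, highest score first; score = occurrences of query terms. */
    List<String> search(String queryText, int n) {
        List<String> queryTerms = analyze(queryText); // same analysis as at index time
        Map<String, Integer> scores = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> doc : docs.entrySet()) {
            int score = 0;
            for (String term : doc.getValue()) {
                if (queryTerms.contains(term)) {
                    score++;
                }
            }
            if (score > 0) {
                scores.put(doc.getKey(), score);
            }
        }
        List<String> hits = new ArrayList<>(scores.keySet());
        hits.sort((a, b) -> Integer.compare(scores.get(b), scores.get(a)));
        return new ArrayList<>(hits.subList(0, Math.min(n, hits.size())));
    }

    public static void main(String[] args) {
        ToySearchDemo index = new ToySearchDemo();
        index.update("A.java", "The String class");
        index.update("B.java", "a string, another STRING");
        index.update("A.java", "An int field"); // replaces the earlier A.java
        System.out.println(analyze("The Quick fox"));        // [quick, fox]
        System.out.println(index.size());                    // 2
        System.out.println(index.search("The string", 10));  // [B.java]
        System.out.println(index.search("supercalifragilisticexpialidocious", 10)); // []
    }
}
```

Because the query text passes through the same analyze step as the documents, "The string" matches "STRING" in B.java regardless of case, and the stop word "The" plays no part in the match. Re-indexing A.java under the same path leaves two documents rather than three, which is the behavior the text attributes to running IndexFiles with -update.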
