




action. The significance of calling on the whole Party to carry out the "Two Studies, One Action" education is this: it implements the principle that the Party must govern itself, and do so strictly in every respect; it addresses problems in ideology, organization, work style, and discipline; it maintains and develops the Party's advanced nature and purity; and it strengthens the Party's cohesion and combat effectiveness. We must view this education from the height of the overall political interest, fully recognize its significance, and strengthen both ideological awareness and conscious action in carrying it out.

Second, keep the focus clear and make the education more effective. The foundation of the "two studies" lies in learning, and the key lies in the doing. The focus of this study is clear: study the Party Constitution and Party rules, and study carefully General Secretary Xi's series of important speeches. Studying the Party Constitution and Party rules means clarifying the basic standards and establishing a code of conduct. All Party members and cadres should conscientiously study the Party Constitution, the self-discipline guidelines, the disciplinary regulations, and the other intra-Party regulations. They should remember their identity as Party members, strengthen their awareness as Party members, grasp the baseline and bottom line of their work, and, as members of the Party, love the Party and work for the Party. Studying the series of important speeches means understanding and mastering the connotation and core ideas of General Secretary Xi's series of important speeches, truly learning them, and putting them into practice. The starting point and goal of this study and education is to guide all Party members to be sound in politics and faith, in rules and discipline, in morality and conduct, and in dedication, to be qualified Party members, and to remain consistent, ideologically, politically, and in action, with the Central Committee and the provincial Party committee.
Party members should practice the Party's purpose, maintain close ties with the masses, and serve the people wholeheartedly; they should strengthen Party spirit and moral cultivation and always keep a pioneering and enterprising spirit, so that they stand out in ordinary times and step forward at critical moments. Party members should be guided to start now and start with small things, and to meet the requirements of a qualified Party member in daily work and life, in advancing poverty alleviation, in maintaining social harmony and stability, and in the sustained improvement of work style, so as to contribute to the township's synchronized building of a moderately prosperous society.

Third, grasp the methods and principles, and advance in a strong and orderly way. The Central Committee has stated that "Two Studies, One Action" is not a one-off activity. In carrying it out, this requires us to emphasize regular education: take the Party branch as the basic unit, take the "three meetings and one lesson" and other organized Party life as the basic form, and rely on implementing the system for the daily education and management of Party members, so that every Party member undergoes a profound ideological baptism and the Party's ideological and political building is truly attended to daily and kept strict as a matter of routine. Attention should be paid to differentiated guidance: according to the actual circumstances of Party members in different fields and industries, the content, objectives and tasks, organization, and working measures of the education should be designed by category, with a focus on making it targeted and on preventing the study and education from becoming formulaic or one-size-fits-all. Leadership should be strengthened: leading cadres should be urged to take part in organizational life as ordinary Party members and to take the lead in giving Party lectures, in discussion, and in criticism, setting an example that draws Party members actively into the study and education, so that it achieves solid results.
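We must insist on overall coordination and, keeping closely to the township's overall work, combine the study and education with implementing the 13th Five-Year Plan,

Application Research of a Geographic Information System Based on Web and Chinese Natural Language Processing

CUI Qi-Ming (崔奇明)
Anshan Power Supply Company, Anshan, Liaoning 114001

Abstract: This paper introduces Geobase, a single-user geographic information query system model based on Visual Prolog and English natural language processing. By studying and improving this model, it proposes and implements the overall design of a geographic information query system model based on the Web and Chinese natural language processing, including a Chinese lexicon, the corresponding word-segmentation algorithm for Chinese sentences, part of the language set, and program code. It also discusses connecting the system model to a large database.

Keywords: artificial intelligence; natural language processing; Chinese information processing; Web; Visual Prolog

1 Introduction

As a research topic within artificial intelligence (AI), natural language processing (NLP) has already been applied in a number of systems. Human communication in natural languages such as Chinese and English is an intelligent activity. AI researchers have long tried to formalize the mechanisms needed to process natural language, for example by conceptualizing natural language as a knowledge-base system for handling natural-language dialogue between people and computers, and by building software that models this process. One relatively mature and effective approach uses no explicit domain model; instead it processes natural language by means of keywords or patterns. It stores limited linguistic and domain knowledge in predesigned structures, and the input sentence is scanned by software containing predefined keywords or patterns that indicate known objects or relations. The same approach can serve as a natural-language interface to a database system or expert system for retrieving the information held there. Drawing on application examples from abroad, this paper analyzes a model system for English natural language processing and, on that basis, studies and implements a geographic information query system model based on the Web and Chinese natural language processing.

2 Geobase, a System Model Based on English Natural Language Processing

2.1 Overview of the Geobase model

Geobase was developed for querying a geographic information system: the geographic database (a text file that Visual Prolog can load) is queried in natural-language English. Given an English query sentence, Geobase analyzes it, converts it into a form that Visual Prolog can understand, and then gives the answer. Geobase regards the database as an entity-association network in which entities are connected by associations. Entities are the data items stored in the database; associations are the words or phrases that connect entities in a query sentence. In the sentence Cities in the state California, the two entities Cities and state are connected by the association in; the word the is ignored, and California is treated as an instance of the state entity. Geobase analyzes a query sentence by matching it against the entity-association network. Take the query: which rivers run through states that border the state with the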
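capital Austin? First, certain words are ignored (which, that, the, ?), leaving the query: rivers run through states border state with capital Austin. Next, the internal names of the entities and associations are found; entities may have synonyms and plural forms, and associations may also have synonyms and may consist of several words. After this conversion the query reads: river in state border state with capital Austin. Geobase first finds the state with capital Austin, then finds all states that border it, and finally finds the rivers that run through those states (run through is mapped to in by assoc(in,run,through)).

2.2 Database and entity-association network

Examples of database predicates:

    state(Name,Abbreviation,Capitol,Area,Admit,Population,City,City,City,City)
    city(State,Abbreviation,Name,Population)

The entity-association network has the structure schema(Entity,Assoc,Entity), for example:

    schema(population,of,state)
    schema(city,in,state)

The interface from entities to database queries is implemented by the predicates db and ent, for example:

    db(ent,assoc,ent,string,string)
    ent(ent,string)

2.3 The Geobase parser

The parser identifies the structure of a query sentence; Geobase classifies query sentences into nine types. Parsing uses a "difference list" method. The parser's first argument is the filtered list, its second argument is the entity name, and its last argument is the query structure that the parser builds, for example:

    pars(LIST,E,Q):-s_attr(LIST,OL,E,Q),OL=[],!.    % Q is the query structure

To parse the sentence "How large is the town new york?", the filter first produces the list of words to be parsed, [large, town, new, york]. The parser predicate pars is then called, which executes the following goals in turn:

    s_attr([BIG,ENAME|S1],S2,E1,q_eaec(E1,A,E2,X)):-    % first s_attr clause
        ent_name(E2,ENAME),     % town is mapped to city
        size(E2,BIG),           % matches size(city,large)
        entitysize(E2,E1),      % matches entitysize(city,population)
        schema(E1,A,E2),        % matches schema(population,of,city)
        get_ent(S1,S2,X),!.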
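The walk-through of "How large is the town new york?" can be sketched in Python. This is only an illustration under stated assumptions, not Geobase's Visual Prolog code: the tables hold just the facts named in the text, the ignore list is inferred from the filtered word list given there, the helper names `filter_words` and `parse_size_query` are hypothetical, and the handling of the remaining words is a simplification of what `get_ent` does.

```python
import re

# Facts named in the text; everything else here is an illustrative assumption.
IGNORE = {"how", "is", "the", "?"}        # assumed: words the filter drops for this query
SYNONYM = {"town": "city"}                # ent_name: town is mapped to city
SIZE = {("city", "large")}                # size(city, large)
ENTITYSIZE = {"city": "population"}       # entitysize(city, population)
SCHEMA = {("population", "of", "city")}   # schema(population, of, city)


def filter_words(sentence):
    """Lower-case the sentence, split it into tokens, and drop ignored tokens."""
    tokens = re.findall(r"[a-z]+|\?", sentence.lower())
    return [t for t in tokens if t not in IGNORE]


def parse_size_query(words):
    """Mimic the first s_attr clause: [BIG, ENAME | rest] -> q_eaec(E1, A, E2, X)."""
    if len(words) < 2:
        return None
    big, ename, rest = words[0], words[1], words[2:]
    e2 = SYNONYM.get(ename, ename)        # ent_name(E2, ENAME)
    if (e2, big) not in SIZE:             # size(E2, BIG)
        return None
    e1 = ENTITYSIZE.get(e2)               # entitysize(E2, E1)
    for s_e1, assoc, s_e2 in SCHEMA:      # schema(E1, A, E2)
        if s_e1 == e1 and s_e2 == e2:
            # Simplified stand-in for get_ent: the remaining words name the entity.
            return ("q_eaec", e1, assoc, e2, " ".join(rest))
    return None


words = filter_words("How large is the town new york?")
query = parse_size_query(words)
print(words)
print(query)
```

The resulting tuple plays the role of the query structure Q: it says that the population of the city named by the remaining words is being asked for.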
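(The final goal, get_ent, returns the entity name and related values.) Once the parser has finished analyzing a sentence, Geobase calls the predicates db and ent to produce the query result.

3 Localizing the Geobase Model for Chinese: Study and Implementation

Localizing the Geobase model means adapting Geobase and its natural language processing to Chinese, so that it recognizes Chinese input sentences, and then using this system to query a database that stores geographic information about China.

3.1 Characteristics of Chinese and English sentences

Like English words and phrases, Chinese characters and words have plural forms and synonyms. One difference is that the words of an English sentence are separated by spaces, which makes English sentences convenient to process, and each separated word already carries its own meaning. For example, What is the highest mountain in California? is easily turned into a list with the fronttoken function of Visual Prolog: ["What","is","the","highest","mountain","in","California","?"], and each item in the list has a definite meaning. A Chinese sentence such as “加利福尼亚最高的山是什么?” (the same question in Chinese) cannot be turned into a list directly with fronttoken, because it is hard to tell which characters belong together as a unit with its own meaning. Recognizing Chinese sentences therefore needs a special step, Chinese word segmentation. In addition, because Chinese word order differs from English word order, the word order must be adjusted in the localized Geobase.

3.2 Other considerations and the basic structure

The language set GEOBASE.LAN supplied with the original Geobase model is localized so that its content is in Chinese.

The database file GEOBASE.DBA supplied with the original Geobase model is adjusted: an ORACLE database is set up to store geographic data for China, with input and maintenance handled by separate software. Before a query is run in the localized Geobase, the data tables are exported from ORACLE to form the GEOBASE.DBA file.

The program code of the original Geobase model is modified to work with the localized language set GEOBASE.LAN and database GEOBASE.DBA, for example by modifying the predicates db and ent.

The original Geobase model is a single-user natural-language query system on the Windows platform. To make it usable more widely, Geobase is adapted so that it can be used on the Internet/intranet.

[Figure: basic structure of the geographic information query system based on the Web and Chinese natural language processing. The boxes, in processing order as far as it can be recovered: log in to the website; enter a Chinese natural-language query sentence; segment the Chinese sentence into words; filter punctuation and the like out of the segmented list; the parser analyzes the filtered sentence (parser structure = (filtered list, entity name, query structure)) and locates the entity name and query predicate; the relevant predicates carry out the database query.]

4 Algorithm and Program Script for Processing Chinese Sentences

4.1 Chinese word segmentation algorithm based on Visual Prolog

The segmentation algorithm is based on the maximum matching algorithm. First a Chinese lexicon is built (an existing lexicon in the same format can also be used), with one word or phrase per line and entries of varying length; the lexicon is used together with the segmentation algorithm. Taking a maximum word length of 4 characters as an example, the algorithm is:

(1) In Visual Prolog, load the lexicon str20.txt to form the lexicon list LIST20, each item of which is a word.
(2) Read the file to be segmented, str2.txt, to form the list LIST22, each item of which is a single Chinese character.
(3) If LIST22 is empty, segmentation ends (the resulting list LL1, once reversed, can be passed to the parser).
(4) Otherwise take 4 characters from the front of LIST22 to form a candidate word and match it against the items of LIST20. If it matches (that is, the candidate is an item of LIST20), write it to LL1, set LIST22 to the list that remains after removing these 4 characters, and go to (3) to continue. If it does not match, or LIST22 is shorter than 4, go to (5).
(5) Take 3 characters from the front of LIST22 and match the candidate against LIST20. If it matches, write it to LL1, set LIST22 to the list that remains after removing these 3 characters, and go to (3). If it does not match, or LIST22 is shorter than 3, go to (6).
(6) Take 2 characters from the front of LIST22 and match the candidate against LIST20. If it matches, write it to LL1, set LIST22 to the list that remains after removing these 2 characters, and go to (3). If it does not match, or LIST22 is shorter than 2, go to (7).
(7) Take 1 character from the front of LIST22 and match it against LIST20. If it matches, write it to LL1, set LIST22 to the list that remains after removing this character, and go to (3). The source also gives a jump for the case where the character does not match or LIST22 is shorter than 1, but the target step is missing.

4.2 Program script for the segmentation algorithm

    PREDICATES
    nondeterm process4(STRINGLIST,STRINGLIST,STRINGLIST)
    nondeterm condcf(STRINGLIST,STRINGLIST)
    nondeterm attach(STRINGLIST,STRINGLIST,STRINGLIST)
    nondeterm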
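    member(STRING,STRINGLIST)
    reverse(STRINGLIST,STRINGLIST)
    reverse1(STRINGLIST,STRINGLIST,STRINGLIST)

    CLAUSES
    % member(X,L): X is an item of list L
    member(X,[X|_]).
    member(X,[_|L]):-member(X,L).

    % reverse(X,Y): Y is X in reverse order
    reverse(X,Y):-reverse1([],X,Y).
    reverse1(Y,[],Y):-!.
    reverse1(X1,[U|X2],Y):-!,reverse1([U|X1],X2,Y).
    reverse1(_,_,_):-errorexit.

    % condcf(L1,L2): read the input file line by line, appending each line to the list
    condcf(L1,L2):- not(eof(input)), readln(B),attach(L1,[B],Lnew),condcf(Lnew,L2).
    condcf(L,L).

    % attach(L1,L2,L3): L3 is L1 followed by L2
    attach([],L,L).
    attach([X|L1],L2,[X|L3]) :- attach(L1,L2,L3).

    % process4(Chars,Lexicon,Acc): try 4, 3, 2, then 1 characters from the front of Chars.
    % Matched words are put at the front of Acc, so Acc is reversed at the end.
    % As printed, the base clause computes LL1 but does not pass it on.
    process4([],_,LL):-!, reverse(LL,LL1).
    process4([HEAD1,HEAD2,HEAD3,HEAD4|REST],LIST200,LL):-
        concat(HEAD1,HEAD2,N1),concat(HEAD3,HEAD4,N2),
        concat(N1,N2,N),member(N,LIST200),attach([N],LL,LL1),
        process4(REST,LIST200,LL1),!.
    process4([HEAD1,HEAD2,HEAD3|REST],LIST200,LL):-
        concat(HEAD1,HEAD2,N1),concat(N1,HEAD3,N),
        member(N,LIST200),attach([N],LL,LL1),
        process4(REST,LIST200,LL1),!.
    process4([HEAD1,HEAD2|REST],LIST200,LL):-
        concat(HEAD1,HEAD2,N),member(N,LIST200),attach([N],LL,LL1),
        process4(REST,LIST200,LL1),!.
    process4([HEAD1|REST],LIST200,LL):-
        member(HEAD1,LIST200),attach([HEAD1],LL,LL1),
        process4(REST,LIST200,LL1),!.

    GOAL
    consult("GEOBASE.DB",data),consult("GEOBASE.LAN",language),
    openread(input,"STR2.txt"),readdevice(input),condcf([],LIST22),closefile(input),
    readdevice(keyboard),
    openread(input,"STR20.txt"),readdevice(input),condcf([],LIST20),closefile(input),
    readdevice(keyboard),
    process4(LIST22,LIST20,[]),

4.3 Example of applying the algorithm

A Chinese sentence is entered on an Internet/intranet web page, for example “辽宁内流过的河?” ("Rivers that flow within Liaoning?"). After submission, the localized Geobase turns it into the list [“辽”,“宁”,“内”,“流”,“过”,“的”,“河”,“?”] and loads the lexicon file to form the list [“辽宁”,“内流过的”,“长”,“省”,“河”, ...]. The segmentation algorithm then produces the list LL1: [“辽宁”,“内流过的”,“河”,“?”]. The parser is called on LL1 to determine the query type, and finally the predicates db and ent find the answer in the database. The input character group “内流过的” ("that flow within") is mapped to “内” ("in") by assoc(“内”,“内流过的”), so inside the localized Geobase the query becomes “辽宁内河”. This fits the pattern schema(“河”,“内”,“省”) (river, in, province), so the original query is a recognizable sentence. Other recognizable sentences include: 给出辽宁的市? (List the cities of Liaoning?), 给出鞍山的人口? (Give the population of Anshan?), 给出辽宁的省会? (Give the capital of Liaoning?), 哪些河不流经吉林? (Which rivers do not flow through Jilin?), and 吉林邻接那些省内流过的河的名是什么? (What are the names of the rivers that flow within the provinces bordering Jilin?).

5 Examples of the Language Set and Database

5.1 Chinese lexicon

The lexicon is given as a list, for example: “给出”,“辽宁”,“内流过的”,“河”,“的”,“长度”,“有”,“多少”,“通过”,“流经”,“省会”, ...

5.2 Localized Geobase language set

    schema(市,内,省)         % (city, in, province)
    entitysize(河,长度)      % (river, length)
    assoc(内,内流过的)       % (in, that flow within)
    synonym(镇,市)           % (town, city)
    ignore(给出)             % (give / list)
    size(山,高)              % (mountain, high)
    unit(长度,公里)          % (length, kilometre)

5.3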
Localized Geobase database (* stands for numeric data such as population and area)

    state(辽宁,辽,沈阳,*,*,*,大连,鞍山,抚顺,本溪)
    city(辽宁,辽,鞍山,*)
    river(辽河,*,[辽宁])
    border(辽宁,辽,[吉林,内蒙古])
    mountain(辽宁,辽,千山,*)

6 Conclusion

Exploring the use of Chinese natural language processing systems on the Internet/intranet has practical significance for AI research and application in China, and the segmentation algorithm given here can serve as a demonstration for attempts to implement such applications in Visual Prolog. Chinese word segmentation is also the first problem that Chinese-language search engines on websites have to address. By continually extending the Chinese lexicon and the language set of the model, and by adding query types, the model can be made to recognize more Chinese sentences. With suitable modification, the model can readily be used to build natural-language database query systems for other domains.

About the author: CUI Qi-Ming, born 1959, male, Han nationality, from Liaoyang, Liaoning; master's degree, senior engineer. Research interests: artificial intelligence, database applications, and network data security.

advancing poverty alleviation, fostering industries that enrich the people, and creating a sound development environment. The township people's congress should make its arrangements as soon as possible in line with those of the county people's congress, draw up its own plan for the change of term, strengthen guidance of the work, and attend to the timetable, the drawing of electoral districts, voter registration, the allocation and election of deputies, and voting, so that the township-level changes of leadership follow one another closely. The change-of-term work must be coordinated with other work, and continuity and stability must be maintained throughout, so that during the change of term thinking does not slacken, order does not break down, and work does not stop.

Second, innovate in the cadre selection system. The township change of term is the centerpiece of this year's Party building. Township Party organizations must conscientiously strengthen leadership and organize the work carefully, so that the change of term becomes a process of rousing morale and increasing productivity. Give clear guidance and positive incentives to cadres.
Uphold the principle that getting things done comes first for cadres: employ cadres on the basis of performance and staff leadership teams for the sake of development. Identify candidates on the front line of targeted poverty alleviation, in advancing major construction projects and tasks, and in resolving complex conflicts, so that good cadres truly emerge, are selected, and are used well, and so that those who want to get things done have opportunities, those who can get things done have a stage, and those who have accomplished things have standing. Improve the selection and appointment system and manage cadres strictly. Conscientiously implement the provisions under which village cadres can move up or down, and in particular explore actively how cadres can be moved down: strengthen the measures for moving down, refine the circumstances that call for it, and clarify the standards, resolutely moving down village cadres who hold a post without doing its work, who show no results, or with whom the masses are dissatisfied, so as to form an effective mechanism in which the capable move up, the mediocre move down, and the inferior are eliminated. Combine care for village cadres organically with strict management, and further
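Returning to the word-segmentation procedure of Section 4.1 in the paper reproduced above, a minimal Python sketch of forward maximum matching may make the steps easier to follow. It is an illustration under stated assumptions rather than the paper's Visual Prolog script: the function name `segment` is hypothetical, the lexicon is the small example list from Section 4.3, the output list is built front to back instead of being prepended and reversed, and a character that matches nothing in the lexicon is kept as a token of its own, which the paper does not specify.

```python
def segment(chars, lexicon, max_len=4):
    """Forward maximum matching: at each position, try the longest candidate first."""
    words = []
    i = 0
    while i < len(chars):
        # Try candidates of max_len characters, then shorter ones, down to 1.
        for n in range(min(max_len, len(chars) - i), 0, -1):
            candidate = "".join(chars[i:i + n])
            if candidate in lexicon:
                words.append(candidate)
                i += n
                break
        else:
            # Assumption: the paper does not say what happens when even a single
            # character is absent from the lexicon; here it becomes its own token.
            words.append(chars[i])
            i += 1
    return words


# Lexicon and query taken from the example in Section 4.3.
lexicon = {"辽宁", "内流过的", "长", "省", "河"}
chars = list("辽宁内流过的河?")
result = segment(chars, lexicon)
print(result)
```

With the example lexicon, the query is split into the same four items that the paper lists for LL1: 辽宁, 内流过的, 河, and the question mark.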