When Corpus Meets Theory
James Pustejovsky
TSD 2002, September 10, 2002

Models and Data

Talk Outline
- Goals for Language Modeling
- The Role of Corpus in Theory
  - Disambiguation
  - Selection discovery
  - Clustering
  - Category modification and formation
  - Grammar induction
- The Role of Theory in Corpus

Goals of Language Modeling
- Statistically informed models improve application performance
  - Speech
  - Search
  - Clustering
  - Parsing
  - Machine translation
  - Summarization
  - Question answering

Theory Drives the Model
- Corpus behavior of words is determined by their type.
- You can't find what you can't model. But you don't want to find only what you model!
- Theory allows a model of reality, but corpus brings reality to the model.

Language Modeling with Generative Lexicon
- Selection integrates paradigmatics and syntagmatics
- Models the relationship between selectional contexts
- Coercion in typing
- Complex types (dot objects)
- All major categories behave functionally
- Qualia structure models much of this behavior
- Semantic types are differentiated and ranked
- Grammatical behavior follows (generally) from type

Quine's Gambit in Corpora
- Co-occurrence reveals surface relations.
- Paradigmatics is first order.
- Syntagmatics is first order.
- LSA and other techniques create non-superficial associations.
- Model bias is necessary to create decision procedures.
- Example: Complex Types

Recognizing Selection
1. a. The man fell/died.
   b. The rock fell/!died.
2. a. John forced/!convinced the door to open.
   b. John forced/convinced the guests to leave.
3. a. John poured milk into/!on his coffee.
   b. John poured milk into/on the bowl.

Modeling Paradigmatic Systems

Integrating Selection into Grammars
- Qualia are used to create new types: they are generative coherence relations between types.

Qualia Structure

Three Ranks of Type

Entities

Events

System of Generating Types

Qualia are incorporated into Type Itself

Qualia as Types

Functional Selection

Functional Type Coercion

Co-composition

Coercion in Function Composition

Selection and Coercion

Type Specification

Type Determines Grammatical Behavior
- Behavior is measurable in corpus
- Corpus distribution of different types should correlate strongly with their type.

Corpus Analysis provides probable values for Coercion
- Drinking, sipping, cooling, ?pouring, ?spilling

Complements of “begin” in AP (Pustejovsky and Rooth, 1991 ms)

Complements of “veto” in AP

Limitations of this Approach: Fuzzy Selection

Dependencies that require Models: Complex Types

Complex Types

Contexts Introducing Complex Types
1. a. John read the story/the book.
   b. John told the story/!the book.
2. Mary read the subway wall.

When Paradigmatic Systems are modeled, Syntagmatic Processes are affected
- The specificity of argument selection by a predicate
- The treatment of verbal polysemy and multiple subcategorization
- The treatment of type mismatches and the semantics of solidarities

Types of Properties

Natural Binary Predicate

Polar Predicates
- hot/cold
- big/small
- short/tall
- clean/dirty

Lexical Asymmetries
- Preferences and defaults: clean/dirty, empty/full, pretty/ugly
- Lexical gaps: bald/(hairy), toothless/(toothed)
- Lexical perfectives: dead/alive

Sortal Opposition
- External negation points up in the type system.
- Internal negation points down in the type system.
(1) a. Rocks are not alive.
    b. !Rocks are dead.
(2) a. The Pope is not married.
    b. !The Pope is a bachelor.
(3) a. Bill did not run the race.
    b. Hence, Bill did not win the race.
    c. !Bill lost the race.

Case Study I: Corpus Drives Lexical Acquisition

Text Mining the Biobibliome
- 40,000 papers published each month in Medline
- 11 million abstracts currently in the Medline database
- 36 GB of text

Robust Extraction of Relations from Biomedical Texts
- Statistical techniques are too coarse-grained
  “SU6656 does not inhibit the PDGF receptor.”
- Local named entity extraction is not informative enough
  “This protein binds to Src.”
- Bag-of-words and bag-of-entities approaches are too weak
  “p16 inhibits Cdk4.”
  “Cdk4 is inhibited by p16.”

Parsing Methodology
- Identify targets of interest
  - Entities and relations to be extracted
- Perform corpus analysis over targets
  - Cluster corpus occurrences by syntactic behavior and semantic type
- Generate patterns for extraction
- Test and modify patterns against a development corpus

Possible Selectional Frames
- “p16 inhibits Cdk4.” (entity, entity)
- “p16 inhibits cell growth.” (entity, process)
- “Methylation inhibits HDAC1.” (process, entity)
- “Cell growth inhibits apoptosis.” (process, process)

Corpus Pattern Analysis
- Create concordances over target elements
- Automatically cluster complementation patterns
- Semi-automatically verify patterns and amend grammar rules accordingly

Getting the Lexicon out of the Corpus
- Preliminary examination of the text
- Sort concordances according to semantic patterns
- One-sense-per-domain doesn't cut it
- Complementation patterns emerge from the corpus, with and without realization
- Semantic patterns are a first step towards identifying lexical sets
- Semantic patterns identified with specific lexical sets yield co-specifications
- Implicatures can be identified with co-specifications for a very high proportion of uses of all predicators.

Corpus-derived Grammars distinguish Textual Function
- Tensed sentence-based relational information conveys new information.
  A peptide representing the carboxyl-terminal tail of the met receptor inhibits kinase activity.
- Nominalization functions to:
  - Allow further predication and modification;
  - Bridge the new information with acceptance as given;
  - Provide economy of expression in text.
- Agentive nominal conveys a relation as a given fact.
  The protein kinase C inhibitor staurosporine, inhibited actin assembly

Probable Syntactic Patterns: Sentential Forms
- A peptide representing the carboxyl-terminal tail of the met receptor inhibits kinase activity.
- Whereas phosphorylation of the IRK by ATP is inhibited by the nonhydrolyzable competitor adenylyl-imidodiphosphate, ...
- The Met tail peptide inhibits the closely related Ron receptor but does not affect ...
- Although the ability of individual trichothecenes to inhibit protein synthesis and activate JNK/p38 kinases are dissociable, both effects contribute to the induction of apoptosis.

Probable Syntactic Patterns: Nominal Forms
- 12S E1A, an inhibitor of p300-dependent transcription, reduces the binding of TFIIB, but not that of cyclin E-Cdk2, to p300.
- The protein kinase C inhibitor staurosporine, inhibited actin assembly and platelet aggregation induced by thrombin or PMA.

Probable Syntactic Patterns: Nominalizations
- Structural basis for inhibition of protein tyrosine phosphatases by Keggin compounds phosphomolybdate and phosphotungstate.
- Previous reports raised question as to whether 8-Cl-cAMP is a prodrug for its metabolite, 8-Cl-adenosine which exerts growth inhibition in a broad spectrum of cancer cells.

Case Study II: Theory Drives Corpus Analysis

Semantic Rerendering
- A general technique for adapting and modifying an existing ontology
- Types are extended and created through:
  - corpus analysis of patterns implicated with type structures
  - ad hoc database projections over a relational database

Specialized Ontologies in the Biomedical Domain
- The UMLS from the National Library of Medicine
  - wide coverage
  - shallow semantic type structure
- 180,998 instances of Amino Acid, Peptide, or Protein in UMLS
- Chemical Viewed Functionally and Chemical Viewed Structurally
  - These 2 subtrees cover a large number of all types in the UMLS
- The UMLS gives semantic type bindings to 1.5 million entities

NLP Applications using Semantic Typing
- Statistical categorization and disambiguation tasks
  - Resolution of prepositional attachment
  - Relations between constituents in nominal compounds
  - Generalizing across semantic classes = make up for the sparseness of data
- IR tasks
  - Query reformulation
  - Filtering & ranking of retrieved results
- Information extraction tasks
  - Coreference resolution
  - Relation extraction (via anaphora resolution)
  - Entity identification

GL as Modeling Bias in Rerendering
- Structural subtyping (Formal)
- Functional subtyping (Telic)
- Activation relations (Agentive)
- Molecular analysis (Const)

Syntactic Rerendering Algorithm (I)

Syntactic Rerendering Algorithm (II)

Syntactic Rerendering Algorithm (III)

Evaluating Results
- Comparison against existing ontologies
  - overlap with Gene Ontology (GO) for select categories
  - Receptor: 17.5% of 2nd level extension phrases are in GO
- Improved P&R for the client NLP applications
  - Coreference resolution application
  - Sortal anaphora: “the enzyme”, “the protease”, “the same solvent”, etc.

Derivation of Instances for the Proposed Subtypes
- Syntactic templates (inhibitor, solvent):
  - definitional constructions: “X is a Y inhibitor”
  - aliasing constructions: “X (the solvent)”
  - appositions: “X, the inhibitor of Y,”
  - nominal compounds: “the solvent X”
  - enumerations: “the following solvents: X, Y, ...”
  - relative clauses
  - adjuncts: “X and Y as solvents”

Semantic (Database) Rerendering
- Database of relations extracted from the Medline corpus: inhibit, block, phosphorylate
- Typed projection from relations table induces an ad hoc category subtype of T1X = X : T1| R(X,Y) T1UMLS1

Syntactic vs. Semantic Rerendering
- Sortals with no corresponding relational form: solvent
- Sortal and relation predicates: inhibitor/inhibit, kinase/phosphorylate
- Relation predicates with no corresponding nominal forms: bind with, increase

Syntactic vs. Semantic Rerendering (II)
- Overlap of derived subtypes
  - CDK inhibitor
  - p21(WAF-1) inhibited CDK2 and CDK4
- Recover different types of information
  - Syntactic templates for sortal predicates: old information
  - Typed projections of database relations: new information

Case Study III: Applying Lexical Semantic Knowledge
TERQAS: Time and Event Recognition for Question Answering Systems

Relevance to Question Answering Systems
- Is Gates currently CEO of Microsoft?
- Were there any meetings between the terrorist hijackers and Iraq before the WTC event?
- Did the Enron merger with Dynegy take place?
- How long did the hostage situation in Beirut last?

- When did the war between Iran and Iraq end?
- When did John Sununu travel to a fundraiser for John Ashcroft?
- How many Tutsis were killed by Hutus in Rwanda in 1994?
- Who was Secretary of Defense during the Gulf War?
- What was the largest U.S. military operation since Vietnam?
- When did the astronauts return from the space station on the last shuttle flight?

Questions over TIMEBANK Corpus

Workshop Goals
- TimeML: Define and design a metadata standard for markup of events, their temporal anchoring, and how they are related to each other in news articles.
- TIMEBANK: Given the specification of TimeML, create a gold standard corpus of 300 articles marked up for temporal expressions, events, and basic temporal relations.

TERQAS Participants
- James Pustejovsky, PI
- Rob Gaizauskas
- Graham Katz
- Bob Ingria
- José Castaño
- Inderjeet Mani
- Antonio Sanfilippo
- Dragomir Radev
- Patrick Hanks
- Marc Verhagen
- Beth Sundheim
- Andrea Setzer
- Jerry Hobbs
- Bran Boguraev
- Andy Latto
- John Frank
- Lisa Ferro
- Marcia Lazo
- Roser Saurí
- Anna Rumshisky
- David Day
- Luc Belanger
- Harry Wu
- Andrew See

Supported by

How TimeML Differs from Previous Markups
- Extends TIMEX2 annotation:
  - Temporal functions: three years ago
  - Anchors to events and other temporal expressions
- Identifies signals determining interpretation of temporal expressions:
  - Temporal prepositions: for, during, on, at
  - Temporal connectives: before, after, while
- Identifies event expressions:
  - tensed verbs: has left, was captured, will resign
  - stative adjectives: sunken, stalled, on board
  - event nominals: merger, Military Operation, Gulf War
- Creates dependencies between events and times:
  - Anchoring: John left on Monday.
  - Orderings: The party happened after midnight.
  - Embedding: John said Mary left.

    attributes := eid class tense aspect
    eid := ID
    eid := EventID
    EventID := e<integer>
    class := OCCURRENCE | PERCEPTION | REPORTING | ASPECTUAL | STATE | I_STATE | I_ACTION | MODAL
    tense := PAST | PRESENT | FUTURE | NONE
    aspect := PROGRESSIVE | PERFECTIVE | PERFECTIVE_PROGRESSIVE | NONE

TimeML Event Classes
- Occurrence: die, crash, build, merge, sell, take advantage of, ...
- State: be on board, kidnapped, recovering, love, ...
- Reporting: say, report, announce, ...
- I-Action: attempt, try, promise, offer
- I-State: believe, intend, want, ...
- Aspectual: begin, start, finish, stop, continue
- Perception: see, hear, watch, feel

The young industry's rapid growth also is attracting regulators eager to police its many facets.
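The event attribute grammar and class list above can be made concrete with a small validator. This is only a minimal sketch: the `Event` dataclass, its field names, and the assumption that event identifiers look like `e1`, `e2` are illustrative choices, not part of the talk. The class, tense, and aspect values are copied from the grammar.

```python
import re
from dataclasses import dataclass

# Allowed values, copied from the attribute grammar in the slides.
EVENT_CLASSES = {
    "OCCURRENCE", "PERCEPTION", "REPORTING", "ASPECTUAL",
    "STATE", "I_STATE", "I_ACTION", "MODAL",
}
TENSES = {"PAST", "PRESENT", "FUTURE", "NONE"}
ASPECTS = {"PROGRESSIVE", "PERFECTIVE", "PERFECTIVE_PROGRESSIVE", "NONE"}


@dataclass(frozen=True)
class Event:
    """Hypothetical container for one annotated event."""
    eid: str
    event_class: str
    tense: str = "NONE"
    aspect: str = "NONE"

    def __post_init__(self) -> None:
        # Assumption: an EventID is the letter "e" followed by an integer.
        if not re.fullmatch(r"e\d+", self.eid):
            raise ValueError(f"bad event id: {self.eid!r}")
        if self.event_class not in EVENT_CLASSES:
            raise ValueError(f"unknown class: {self.event_class!r}")
        if self.tense not in TENSES:
            raise ValueError(f"unknown tense: {self.tense!r}")
        if self.aspect not in ASPECTS:
            raise ValueError(f"unknown aspect: {self.aspect!r}")


# Illustrative annotation of "is attracting" from the example sentence;
# the attribute values are this sketch's reading, not a gold annotation.
attracting = Event("e1", "OCCURRENCE", tense="PRESENT", aspect="PROGRESSIVE")
```

Validating at construction time keeps malformed attribute values from reaching later stages such as link creation.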
Israel will ask the United States to delay a military strike against Iraq until the Jewish state is fully prepared for a possible Iraqi attack.

Fully Specified Temporal Expressions
- June 11, 1989
- Summer, 2002
Underspecified Temporal Expressions
- Monday
- Next month
- Last year
- Two days ago
Durations
- Three months
- Two years
functionInDocument allows for relative anchoring of temporal expression values

TLINK
TLINK or Temporal Link represents the temporal relationship holding between events or between an event and a time, and establishes a link between the involved entities, making explicit if they are:
1. Simultaneous (happening at the same time)
2. Identical (referring to the same event):
   John drove to Boston. During his drive he ate a donut.
3. One before the other:
   The police looked into the slayings of 14 women. In six of the cases suspects have already been arrested.
4. One after the other
5. One immediately before the other:
   All passengers died when the plane crashed into the mountain.
6. One immediately after the other
7. One including the other:
   John arrived in Boston last Thursday.
8. One being included in the other
9. One holding during the duration of the other
10. One being the beginning of the other:
    John was in the gym between 6:00 p.m. and 7:00 p.m.
11. One being begun by the other
12. One being the ending of the other:
    John was in the gym between 6:00 p.m. and 7:00 p.m.
13. One being ended by the other

SLINK
SLINK or Subordination Link is used for contexts introducing relations between two events, or an event and a signal, of the following sort:
1. Modal: Relation introduced mostly by modal verbs (should, could, would, etc.) and events that introduce a reference to a possible world, mainly I_STATEs:
   John should have bought some wine.
   Mary wanted John to buy some wine.
2. Factive: Certain verbs introduce an entailment (or presupposition) of the argument's veracity. They include forget in the tensed complement, regret, manage:
   John forgot that he was in Boston last year.
   Mary regrets that she didn't marry John.
   John managed to leave the party.
3. Counterfactive: The event introduces a presupposition about the non-veracity of its argument: forget (to), unable to (in past tense), prevent, cancel, avoid, decline, etc.
   John forgot to buy some wine.
   Mary was unable to marry John.
   John prevented the divorce.
4. Evidential: Evidential relations are introduced by REPORTING or PERCEPTION:
   John said he bought some wine.
   Mary saw John carrying only beer.
5. Negative evidential: Introduced by REPORTING (and PERCEPTION?) events conveying negative polarity:
   John denied he bought only beer.
6. Negative: Introduced only by negative particles (not, nor, neither, etc.), which will be marked as SIGNALs, with respect to the events they are modifying:
   John didn't forget to buy some wine.
   John did not want to marry Mary.

ALINK
ALINK or Aspectual Link represents the relationship between an aspectual event and its argument event.
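The SLINK categories described above are triggered by lexical cues, so a first approximation can be written as a lookup. The sketch below is a toy under stated assumptions: the function name `slink_type`, the label strings, and the `complement` values ("tensed", "infinitive") are hypothetical, and the cue lists contain only the verbs named in the slides. The Negative type is left out because it is introduced by negative particles marked as SIGNALs rather than by the subordinating event.

```python
from typing import Optional

# Cue words taken from the SLINK descriptions; the set names are illustrative.
MODAL_VERBS = {"should", "could", "would"}
FACTIVE = {"regret", "manage"}
COUNTERFACTIVE = {"unable", "prevent", "cancel", "avoid", "decline"}
NEGATIVE_EVIDENTIAL = {"deny"}


def slink_type(lemma: str, event_class: str,
               complement: str = "tensed") -> Optional[str]:
    """Guess the SLINK type introduced by a subordinating event.

    lemma        base form of the subordinating word
    event_class  its TimeML event class, e.g. "REPORTING"
    complement   assumed label for the complement form
    """
    # "forget" is factive with a tensed complement, counterfactive with "to".
    if lemma == "forget":
        if complement == "tensed":
            return "FACTIVE"
        if complement == "infinitive":
            return "COUNTERFACTIVE"
        return None
    # Negative-polarity reporting verbs are checked before plain evidentials.
    if lemma in NEGATIVE_EVIDENTIAL:
        return "NEGATIVE_EVIDENTIAL"
    if lemma in FACTIVE:
        return "FACTIVE"
    if lemma in COUNTERFACTIVE:
        return "COUNTERFACTIVE"
    if event_class in {"REPORTING", "PERCEPTION"}:
        return "EVIDENTIAL"
    if lemma in MODAL_VERBS or event_class == "I_STATE":
        return "MODAL"
    return None
```

For instance, `slink_type("forget", "I_ACTION", "infinitive")` gives "COUNTERFACTIVE" for "John forgot to buy some wine", while the tensed-complement reading of the same verb gives "FACTIVE". The ordering of the checks matters: "deny" is a REPORTING event and would otherwise be labeled a plain evidential.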