




When Corpus Meets Theory
James Pustejovsky
TSD 2002, September 10, 2002

Models and Data

Talk Outline
- Goals for Language Modeling
- The Role of Corpus in Theory
  - Disambiguation
  - Selection discovery
  - Clustering
  - Category modification and formation
  - Grammar induction
- The Role of Theory in Corpus

Goals of Language Modeling
- Statistically informed models improve application performance
  - Speech
  - Search
  - Clustering
  - Parsing
  - Machine translation
  - Summarization
  - Question answering

Theory Drives the Model
- Corpus behavior of words is determined by their type.
- You can't find what you can't model. But you don't want to find only what you model!
- Theory allows a model of reality, but corpus brings reality to the model.

Language Modeling with Generative Lexicon
- Selection integrates paradigmatics and syntagmatics
- Models the relationship between selectional contexts
- Coercion in typing
- Complex types (dot objects)
- All major categories behave functionally
- Qualia structure models much of this behavior
- Semantic types are differentiated and ranked:
- Grammatical behavior follows (generally) from type

Quine's Gambit in Corpora
- Co-occurrence reveals surface relations.
- Paradigmatics is first order.
- Syntagmatics is first order.
- LSA and other techniques create non-superficial associations.
- Model bias is necessary to create decision procedures
- Example: Complex Types

Recognizing Selection
1. a. The man fell/died.
   b. The rock fell/!died.
2. a. John forced/!convinced the door to open.
   b. John forced/convinced the guests to leave.
3. a. John poured milk into/!on his coffee.
   b. John poured milk into/on the bowl.

Modeling Paradigmatic Systems

Integrating Selection into Grammars
- Qualia are used to create new types: they are generative coherence relations between types.

Qualia Structure

Three Ranks of Type

Entities

Events

System of Generating Types

Qualia are incorporated into Type Itself

Qualia as Types

Functional Selection

Functional Type Coercion

Co-composition

Coercion in Function Composition

Selection and Coercion

Type Specification

Type Determines Grammatical Behavior
- Behavior is measurable in corpus
- The corpus distribution of a word should correlate strongly with its type.

Corpus Analysis provides probable values for Coercion
- drinking, sipping, cooling, ?pouring, ?spilling

Complements of “begin” in AP (Pustejovsky and Rooth, 1991 ms)

Complements of “veto” in AP

Limitations of this Approach: Fuzzy Selection

Dependencies that require Models: Complex Types

Complex Types

Contexts Introducing Complex Types
1. a. John read the story/the book.
   b. John told the story/!the book.
2. Mary read the subway wall.

When Paradigmatic systems are modeled, Syntagmatic Processes are affected
- The specificity of argument selection by a predicate;
- The treatment of verbal polysemy and multiple subcategorization;
- The treatment of type mismatches and the semantics of solidarities.

Types of Properties

Natural Binary Predicate

Polar Predicates
- hot/cold
- big/small
- short/tall
- clean/dirty

Lexical Asymmetries
- Preferences and Defaults: clean/dirty, empty/full, pretty/ugly
- Lexical Gaps: bald/(hairy), toothless/(toothed)
- Lexical Perfectives: dead/alive

Sortal Opposition:
- External Negation points up in the Type System:
- Internal Negation points down in the Type System:
(1) a. Rocks are not alive.
    b. !Rocks are dead.
(2) a. The Pope is not married.
    b. !The Pope is a bachelor.
(3) a. Bill did not run the race.
    b. Hence, Bill did not win the race.
    c. !Bill lost the race.

Case Study I: Corpus Drives Lexical Acquisition

Text Mining the Biobibliome
- 40,000 papers published each month in Medline
- 11 million abstracts currently in Medline Database
- 36 GB of text

Robust Extraction of Relations from Biomedical Texts
- Statistical techniques are too coarse-grained
  - “SU6656 does not inhibit the PDGF receptor.”
- Local Named Entity Extraction is not informative enough
  - “This protein binds to Src.”
- Bag of words and bag of entities approaches are too weak
  - “p16 inhibits Cdk4.”
  - “Cdk4 is inhibited by p16.”

Parsing Methodology
- Identify Targets of Interest
  - Entities and relations to be extracted
- Perform Corpus Analysis over targets
  - Cluster corpus occurrences by syntactic behavior and semantic type
- Generate Patterns for extraction
- Test and modify patterns against development corpus

Possible Selectional Frames
- “p16 inhibits Cdk4.” (entity, entity)
- “p16 inhibits cell growth.” (entity, process)
- “Methylation inhibits HDAC1.” (process, entity)
- “Cell growth inhibits apoptosis.” (process, process)

Corpus Pattern Analysis
- Create concordances over target elements
- Automatically cluster complementation patterns
- Semi-automatically verify patterns and amend grammar rules accordingly.

Getting the Lexicon out of the Corpus
- Preliminary examination of the text
- Sort concordances according to semantic patterns
- One-sense-per-domain doesn't cut it
- Complementation patterns emerge from the corpus, with and without realization
- Semantic patterns are a first step towards identifying lexical sets
- Semantic patterns identified with specific lexical sets yield co-specifications
- Implicatures can be identified with co-specifications for a very high proportion of uses of all predicators.

Corpus-derived Grammars distinguish Textual Function
- Tensed sentence-based relational information conveys new information.
  - A peptide representing the carboxyl-terminal tail of the met receptor inhibits kinase activity.
- Nominalization functions to:
  - Allow further predication and modification;
  - Bridge the new information with acceptance as given;
  - Provide economy of expression in text.
- Agentive nominal conveys a relation as a given fact.
  - The protein kinase C inhibitor staurosporine, inhibited actin assembly

Probable Syntactic Patterns: Sentential Forms
- A peptide representing the carboxyl-terminal tail of the met receptor inhibits kinase activity.
- Whereas phosphorylation of the IRK by ATP is inhibited by the nonhydrolyzable competitor adenylyl-imidodiphosphate, ...
- The Met tail peptide inhibits the closely related Ron receptor but does not affect ...
- Although the ability of individual trichothecenes to inhibit protein synthesis and activate JNK/p38 kinases are dissociable, both effects contribute to the induction of apoptosis.

Probable Syntactic Patterns: Nominal Forms
- 12S E1A, an inhibitor of p300-dependent transcription, reduces the binding of TFIIB, but not that of cyclin E-Cdk2, to p300.
- The protein kinase C inhibitor staurosporine, inhibited actin assembly and platelet aggregation induced by thrombin or PMA.

Probable Syntactic Patterns: Nominalizations
- Structural basis for inhibition of protein tyrosine phosphatases by Keggin compounds phosphomolybdate and phosphotungstate.
- Previous reports raised question as to whether 8-Cl-cAMP is a prodrug for its metabolite, 8-Cl-adenosine which exerts growth inhibition in a broad spectrum of cancer cells.

Case Study II: Theory Drives Corpus Analysis

Semantic Rerendering
- A general technique for adapting and modifying an existing ontology
- Types are extended and created through:
  - corpus analysis of patterns implicated with type structures
  - ad hoc database projections over a relational database

Specialized Ontologies in the Biomedical Domain
- The UMLS from National Library of Medicine
  - wide coverage
  - shallow semantic type structure
- 180,998 instances of Amino Acid, Peptide, or Protein in UMLS
- Chemical Viewed Functionally and Chemical Viewed Structurally
  - These 2 subtrees cover a large number of all types in the UMLS
- The UMLS gives semantic type bindings to 1.5 million entities

NLP Applications using Semantic Typing
- Statistical Categorization and Disambiguation Tasks
  - Resolution of Prepositional Attachment
  - Relations between Constituents in Nominal Compounds
  - Generalizing across semantic classes = make up for the sparseness of data
- IR Tasks
  - Query Reformulation
  - Filtering & Ranking of Retrieved Results
- Information Extraction Tasks
  - Coreference Resolution
  - Relation Extraction (via Anaphora Resolution)
  - Entity Identification

GL as Modeling Bias in Rerendering
- Structural subtyping (Formal)
- Functional subtyping (Telic)
- Activation relations (Agentive)
- Molecular analysis (Const)

Syntactic Rerendering Algorithm (I)

Syntactic Rerendering Algorithm (II)

Syntactic Rerendering Algorithm (III)

Evaluating Results
- Comparison against Existing Ontologies
  - overlap with Gene Ontology (GO) for select categories
  - Receptor: 17.5% of 2nd level extension phrases are in GO
- Improved precision and recall (P&R) for the client NLP Applications
  - Coreference Resolution Application
  - Sortal Anaphora: “the enzyme”, “the protease”, “the same solvent”, etc.

Derivation of Instances for the Proposed Subtypes
- Syntactic templates (inhibitor, solvent):
  - definitional constructions: “X is a Y inhibitor”
  - aliasing constructions: “X (the solvent)”
  - appositions: “X, the inhibitor of Y,”
  - nominal compounds: “the solvent X”
  - enumerations: “the following solvents: X, Y, ...”
  - relative clauses
  - adjuncts: “X and Y as solvents”

Semantic (Database) Rerendering
- Database of relations extracted from the Medline corpus
  - inhibit, block, phosphorylate
- Typed projection from relations table induces an ad hoc category subtype of T1
  X = X : T1| R(X,Y) T1UMLS1

Syntactic vs. Semantic Rerendering
- Sortals with no corresponding relational form
  - solvent
- Sortal and relation predicates
  - inhibitor/inhibit
  - kinase/phosphorylate
- Relation predicates with no corresponding nominal forms
  - bind with
  - increase

Syntactic vs. Semantic Rerendering (II)
- Overlap of derived subtypes
  - CDK inhibitor
  - p21(WAF-1) inhibited CDK2 and CDK4
- Recover different types of information
  - Syntactic templates for sortal predicates: old information
  - Typed projections of database relations: new information

Case Study III: Applying Lexical Semantic Knowledge
TERQAS: Time and Event Recognition for Question Answering Systems

Relevance to Question Answering Systems
- Is Gates currently CEO of Microsoft?
- Were there any meetings between the terrorist hijackers and Iraq before the WTC event?
- Did the Enron merger with Dynegy take place?
- How long did the hostage situation in Beirut last?

- When did the war between Iran and Iraq end?
- When did John Sununu travel to a fundraiser for John Ashcroft?
- How many Tutsis were killed by Hutus in Rwanda in 1994?
- Who was Secretary of Defense during the Gulf War?
- What was the largest U.S. military operation since Vietnam?
- When did the astronauts return from the space station on the last shuttle flight?

Questions over TIMEBANK Corpus

Workshop Goals
- TimeML: Define and design a metadata standard for markup of events, their temporal anchoring, and how they are related to each other in news articles.
- TIMEBANK: Given the specification of TimeML, create a gold standard corpus of 300 articles marked up for temporal expressions, events, and basic temporal relations.

TERQAS Participants
- James Pustejovsky, PI
- Rob Gaizauskas
- Graham Katz
- Bob Ingria
- José Castaño
- Inderjeet Mani
- Antonio Sanfilippo
- Dragomir Radev
- Patrick Hanks
- Marc Verhagen
- Beth Sundheim
- Andrea Setzer
- Jerry Hobbs
- Bran Boguraev
- Andy Latto
- John Frank
- Lisa Ferro
- Marcia Lazo
- Roser Saurí
- Anna Rumshisky
- David Day
- Luc Belanger
- Harry Wu
- Andrew See

Supported by

How TimeML Differs from Previous Markups
- Extends TIMEX2 annotation;
  - Temporal Functions: three years ago
  - Anchors to events and other temporal expressions:
- Identifies signals determining interpretation of temporal expressions;
  - Temporal Prepositions: for, during, on, at;
  - Temporal Connectives: before, after, while.
- Identifies event expressions;
  - tensed verbs: has left, was captured, will resign;
  - stative adjectives: sunken, stalled, on board;
  - event nominals: merger, Military Operation, Gulf War;
- Creates dependencies between events and times:
  - Anchoring: John left on Monday.
  - Orderings: The party happened after midnight.
  - Embedding: John said Mary left.

  attributes := eid class tense aspect
  eid := ID
  eid := EventID
  EventID := e
  class := OCCURRENCE | PERCEPTION | REPORTING | ASPECTUAL | STATE | I_STATE | I_ACTION | MODAL
  tense := PAST | PRESENT | FUTURE | NONE
  aspect := PROGRESSIVE | PERFECTIVE | PERFECTIVE_PROGRESSIVE | NONE

TimeML Event Classes
- Occurrence: die, crash, build, merge, sell, take advantage of, ...
- State: be on board, kidnapped, recovering, love, ...
- Reporting: say, report, announce, ...
- I-Action: attempt, try, promise, offer
- I-State: believe, intend, want, ...
- Aspectual: begin, start, finish, stop, continue.
- Perception: see, hear, watch, feel.

The young industry's rapid growth also is attracting regulators eager to police its many facets.
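The EVENT attribute grammar above can be read as a small record type. The following is a minimal Python sketch of that reading, not the TimeML tooling itself: the names `Event`, `EventClass`, `Tense`, `Aspect`, and `parse_event` are hypothetical, the event ID is treated as an opaque string, and defaulting a missing tense or aspect to NONE is an assumption of this sketch.

```python
from dataclasses import dataclass
from enum import Enum


class EventClass(Enum):
    OCCURRENCE = "OCCURRENCE"
    PERCEPTION = "PERCEPTION"
    REPORTING = "REPORTING"
    ASPECTUAL = "ASPECTUAL"
    STATE = "STATE"
    I_STATE = "I_STATE"
    I_ACTION = "I_ACTION"
    MODAL = "MODAL"


class Tense(Enum):
    PAST = "PAST"
    PRESENT = "PRESENT"
    FUTURE = "FUTURE"
    NONE = "NONE"


class Aspect(Enum):
    PROGRESSIVE = "PROGRESSIVE"
    PERFECTIVE = "PERFECTIVE"
    PERFECTIVE_PROGRESSIVE = "PERFECTIVE_PROGRESSIVE"
    NONE = "NONE"


@dataclass(frozen=True)
class Event:
    eid: str                 # opaque event ID (assumption: any string is accepted)
    event_class: EventClass  # the "class" attribute of the grammar
    tense: Tense
    aspect: Aspect


def parse_event(attrs):
    """Build an Event from a dict of attribute strings.

    Raises ValueError when a value is outside the alternatives
    listed in the grammar (Enum lookup by value does this).
    """
    return Event(
        eid=attrs["eid"],
        event_class=EventClass(attrs["class"]),
        # Assumption of this sketch: missing tense/aspect fall back to NONE.
        tense=Tense(attrs.get("tense", "NONE")),
        aspect=Aspect(attrs.get("aspect", "NONE")),
    )


# Illustrative annotation of "is attracting" from the example sentence;
# the ID "e1" and the class assignment are hypothetical.
attracting = parse_event(
    {"eid": "e1", "class": "OCCURRENCE", "tense": "PRESENT", "aspect": "PROGRESSIVE"}
)
```

Using enumerations keeps the closed value sets of the grammar explicit, so a misspelled class or tense fails at parse time rather than propagating into later link annotation.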
Israel will ask the United States to delay a military strike against Iraq until the Jewish state is fully prepared for a possible Iraqi attack.

- Fully Specified Temporal Expressions
  - June 11, 1989
  - Summer, 2002
- Underspecified Temporal Expressions
  - Monday
  - Next month
  - Last year
  - Two days ago
- Durations
  - Three months
  - Two years
- functionInDocument allows for relative anchoring of temporal expression values

TLINK
TLINK or Temporal Link represents the temporal relationship holding between events or between an event and a time, and establishes a link between the involved entities, making explicit if they are:
1. Simultaneous (happening at the same time)
2. Identical (referring to the same event): John drove to Boston. During his drive he ate a donut.
3. One before the other: The police looked into the slayings of 14 women. In six of the cases suspects have already been arrested.
4. One after the other:
5. One immediately before the other: All passengers died when the plane crashed into the mountain.
6. One immediately after the other:
7. One including the other: John arrived in Boston last Thursday.
8. One being included in the other:
9. One holding during the duration of the other:
10. One being the beginning of the other: John was in the gym between 6:00 p.m. and 7:00 p.m.
11. One being begun by the other:
12. One being the ending of the other: John was in the gym between 6:00 p.m. and 7:00 p.m.
13. One being ended by the other:

SLINK
SLINK or Subordination Link is used for contexts introducing relations between two events, or an event and a signal, of the following sort:
1. Modal: Relation introduced mostly by modal verbs (should, could, would, etc.) and events that introduce a reference to a possible world, mainly I_STATEs: John should have bought some wine. Mary wanted John to buy some wine.
2. Factive: Certain verbs introduce an entailment (or presupposition) of the argument's veracity. They include forget in the tensed complement, regret, manage: John forgot that he was in Boston last year. Mary regrets that she didn't marry John. John managed to leave the party.
3. Counterfactive: The event introduces a presupposition about the non-veracity of its argument: forget (to), unable to (in past tense), prevent, cancel, avoid, decline, etc. John forgot to buy some wine. Mary was unable to marry John. John prevented the divorce.
4. Evidential: Evidential relations are introduced by REPORTING or PERCEPTION: John said he bought some wine. Mary saw John carrying only beer.
5. Negative evidential: Introduced by REPORTING (and PERCEPTION?) events conveying negative polarity: John denied he bought only beer.
6. Negative: Introduced only by negative particles (not, nor, neither, etc.), which will be marked as SIGNALs, with respect to the events they are modifying: John didn't forget to buy some wine. John did not want to marry Mary.

ALINK
ALINK or Aspectual Link represents the relationship between an aspectual event and its argument even
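
Most of the thirteen TLINK relations listed in the TLINK section pair up as converses (before/after, including/being included, and so on), so a link stated in one direction can be restated in the other. The Python sketch below illustrates that bookkeeping under stated assumptions: the identifiers `Rel`, `TLink`, and `invert` and the enum member names are hypothetical labels for the list items, and since the list gives no converse for "holding during", this sketch leaves that relation uninverted.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Rel(Enum):
    # One member per item of the TLINK list; the names are this sketch's own.
    SIMULTANEOUS = 1
    IDENTITY = 2
    BEFORE = 3
    AFTER = 4
    IBEFORE = 5      # immediately before
    IAFTER = 6       # immediately after
    INCLUDES = 7
    IS_INCLUDED = 8
    DURING = 9
    BEGINS = 10
    BEGUN_BY = 11
    ENDS = 12
    ENDED_BY = 13


# Converse pairs read off the list; symmetric relations map to themselves.
_PAIRS = [
    (Rel.BEFORE, Rel.AFTER),
    (Rel.IBEFORE, Rel.IAFTER),
    (Rel.INCLUDES, Rel.IS_INCLUDED),
    (Rel.BEGINS, Rel.BEGUN_BY),
    (Rel.ENDS, Rel.ENDED_BY),
]
_INVERSE = {Rel.SIMULTANEOUS: Rel.SIMULTANEOUS, Rel.IDENTITY: Rel.IDENTITY}
for left, right in _PAIRS:
    _INVERSE[left] = right
    _INVERSE[right] = left


@dataclass(frozen=True)
class TLink:
    source: str  # ID of an event or time (hypothetical IDs such as "e1", "t1")
    target: str
    rel: Rel


def invert(link: TLink) -> Optional[TLink]:
    """Return the same temporal fact stated from the target's side,
    or None when this sketch has no converse for the relation (DURING)."""
    inverse = _INVERSE.get(link.rel)
    if inverse is None:
        return None
    return TLink(source=link.target, target=link.source, rel=inverse)


# "John arrived in Boston last Thursday": the time t1 (last Thursday)
# includes the event e1 (arrived). The IDs are illustrative.
thursday_includes_arrival = TLink("t1", "e1", Rel.INCLUDES)
arrival_in_thursday = invert(thursday_includes_arrival)
```

Inverting twice returns the original link for every relation that has a converse here, which is a cheap consistency check when merging annotations produced in different directions.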