Introducing Corpus Linguistics
Corpus Linguistics
Richard X

Module description
- Since the 1990s, the corpus methodology has revolutionized nearly all branches of linguistics
- Corpus analysis can be illuminating in “virtually all branches of linguistics or language learning” (Leech 1997)
- One of the strengths of corpus data lies in its empirical and attested nature
  - It pools together the intuitions of a great number of speakers
  - It makes linguistic analysis more objective
- This module
  - introduces the theoretical and practical issues of using corpora in linguistic studies
  - explores how the corpus-based approach and other methodologies can be combined in linguistic studies

Aims of the module
The module aims to
- provide an introduction to corpus linguistics;
- familiarise students with major corpus resources and tools;
- pass on essential knowledge and skills for building DIY corpora;
- keep students up to date with the latest developments in corpus research;
- develop students' ability in corpus-based language studies.

Contents
1) Introducing corpus linguistics
2) Corpus design and types of corpora
3) Data capture and markup
4) Corpus annotation
5) Making statistic claims
6) Corpus analysis (1): concordance and wordlist
7) Corpus analysis (2): keyword analysis
8) Corpora in lexicographic and lexical studies
9) Corpora in grammatical studies
10) Corpora in diachronic studies
11) Corpora in language variation research
12) Corpora in sociolinguistic studies
13) Corpora in language education
14) Corpora in literary and stylistic studies
15) Corpora in critical discourse analysis
16) Corpora in contrastive and translation studies

Learning outcomes
On successful completion of the module, students will be able to
- understand the major theoretical frameworks in corpus linguistics and formulate research questions that are amenable to corpus research;
- think critically about the strengths and weaknesses of the corpus methodology and decide when and how to interface it with other methodologies;
- become familiar with major corpus resources and tools and develop DIY corpora when necessary;
- apply the corpus-based approach in their own research.

Teaching/learning strategies
- With a dual focus on the why and the how-to of corpus-based language studies, this practical module will be delivered through a series of lectures and hands-on lab sessions
- The module also engages students in extensive reading and interaction with corpus data outside of class

Assessment
- Option A
  - A 1,000-word essay that critically reviews a corpus exploration tool or a corpus-based study (40%)
  - A 2,500-word project report (60%)
- Option B
  - One 3,500-word essay based on a research project of your own choice (100%)
- Deadline:
- Submission: a Word copy as email attachment

Reading list
- Set text
  - McEnery, A., Xiao, R. and Tono, Y. (2006) Corpus-Based Language Studies: An Advanced Resource Book. London: Routledge.
  - Wynne, M. (2005) Developing Linguistic Corpora. Oxford: Oxbow Books. Available online at http://www.ahds.ac.uk/creating/guides/linguistic-corpora
- Recommended reading
  - See the module syllabus at the course website

Outline of this session
- Lecture: introducing key concepts and debates in corpus linguistics
  - What is and is not a corpus?
  - Why use corpora?
  - Corpora vs. intuitions
  - The corpus methodology
  - A brief history of corpus linguistics
  - Nature and applications of corpus-based studies
- Lab: testing your intuitions + exploring online resources

What is a corpus?
- The word corpus comes from Latin (“body”), and the plural is corpora
- A corpus is a body of naturally occurring language
  - but rarely a random collection of text
  - Corpora “are generally assembled with particular purposes in mind, and are often assembled to be (informally speaking) representative of some language or text type.” (Leech 1992)
- “A corpus is a collection of (1) machine-readable (2) authentic texts (including transcripts of spoken data) which is (3) sampled to be (4) representative of a particular language
  or language variety.” (McEnery, Xiao and Tono 2006: 5)

What is not a corpus?
- A list of words is not a corpus
  - building blocks of language
- A text archive is not a corpus
  - a random collection of texts
- A collection of citations is not a corpus
  - a citation is a short quotation which contains a word or phrase that is the reason for its selection
- A collection of quotations is not a corpus
  - a quotation is a short selection from a text chosen on internal criteria by human beings
- A text is not a corpus
  - a text and a corpus are intended to be read in different ways
- The web is not a corpus
  - its dimensions are unknown, it is constantly changing, and it was not designed from a linguistic perspective
(Sinclair 2005)

What is a corpus for?
- A corpus is made for the study of language in a broad sense
  - to test existing linguistic theories and hypotheses
  - to generate and verify new linguistic hypotheses
- The purpose is reflected in a well-designed corpus

Why use corpora?
- Even expert speakers have only a partial knowledge of a language
  - A corpus can be more comprehensive and balanced
- Even expert speakers tend to notice the unusual and think of what is possible
  - A corpus can show us what is common and typical
- Even expert speakers cannot quantify their knowledge of language
  - A corpus can readily give us accurate statistics
- Even expert speakers cannot remember everything they know
  - A corpus can store and recall all the information that has been stored in it
- Even expert speakers cannot make up natural examples
  - A corpus can provide us with a vast number of examples in real communicative contexts
- Even expert speakers have prejudices and preferences, and every language has cultural connotations and underlying ideology
  - A corpus can give you more objective evidence
- Even expert speakers are not always available to be consulted
  - A corpus can be made permanently accessible to all
- Even expert speakers cannot keep up with language change
  - A constantly updated corpus can reflect even recent changes in the language
- Even expert speakers lack authority: they can be challenged by other expert speakers
  - A corpus can encompass the actual language use of many expert speakers

Intuitions as an alternative
- Intuitions are always useful in linguistics
  - to invent (grammatical, ungrammatical, or questionable) example sentences for linguistic analysis
  - to make judgments about the acceptability/grammaticality or meaning of an expression
  - to help with categorization
- Intuitions should be applied with caution
  - They are possibly biased, as they are likely to be influenced by one's dialect or sociolect
  - Introspective data is artificial and may not represent typical language use, as one is consciously monitoring one's language production
  - Introspective data is decontextualized because it exists in the analyst's mind rather than in any real linguistic context
  - Intuitions are not observable and verifiable by everyone, as corpora are
  - Excessive reliance on intuitions blinds the analyst to the realities of language usage, because we tend to notice the unusual but overlook the commonplace
  - There are areas in linguistics where intuitions cannot be used reliably, e.g. language variation, historical linguistics, register and style, first and second language acquisition
  - Human beings have only the vaguest notion of the frequency of a construct or a word

Benefits of corpus data
- Corpus data is more reliable
  - A corpus pools together the linguistic intuitions of a range of language speakers, which offsets the potential biases in the intuitions of individual speakers
- Corpus data is more natural
  - It is used in real communication instead of being invented specifically for linguistic analysis
- Corpus data is contextualized
  - It is attested language use which has already occurred in a real linguistic context
- Corpus data is quantitative
  - Corpora can provide frequencies and statistics readily
- Corpus data can reveal differences that intuitions alone cannot perceive
  - e.g. the synonyms totally, absolutely, utterly, completely, entirely

Corpora vs. intuitions
- Not necessarily antagonistic; rather, they corroborate each other and can be gainfully viewed as complementary
- Armchair linguists and corpus linguists “need each other. Or better, the two kinds of linguists, wherever possible, should exist in the same body.”
  (Fillmore 1992)
- “Neither the corpus linguist of the 1950s, who rejected intuitions, nor the general linguist of the 1960s, who rejected corpus data, was able to achieve the interaction of data coverage and the insight that characterize the many successful corpus analyses of recent years.” (Leech 1991)
- The key to using corpus data is to find the balance between the use of corpus data and the use of one's intuitions

The corpus methodology
- It is debatable whether corpus linguistics (CL) is a methodology or a branch of linguistics
  - One view: CL goes well beyond this methodological role and has become an independent discipline
  - The other view: in spite of the name, CL is indeed a methodology rather than an independent branch of linguistics in the same sense as phonetics, syntax, semantics or pragmatics
    - These latter areas of linguistics describe, or explain, a certain aspect of language use
    - Corpus linguistics, in contrast, is not restricted to a particular aspect of language: it can be employed to explore almost any area of linguistic research

A brief history of CL
- The term corpus linguistics first appeared only in the early 1980s, but corpus-based language study has a substantial history
- The history of CL can be split into two periods: before and after Chomsky

Before Chomsky
- Field linguists and linguists of the structuralist tradition used “shoebox corpora”
  - shoeboxes filled with paper slips
  - Their methodology was essentially “corpus-based” in the sense that it was empirical and based on observed data
- The work of early corpus linguistics was underpinned by two fundamental, yet flawed, assumptions
  - The sentences of a natural language are finite.
  - The sentences of a natural language can be collected and enumerated.
- Most linguists saw the “corpus” as the only source of linguistic evidence in the formation of linguistic theories

The Chomsky revolution: between 1957 and 1965
- Chomsky changed the direction of linguistics from empiricism towards rationalism
- “Any natural corpus will be skewed. Some sentences won't occur because they are obvious, others because they are false, still others because
  they are impolite. The corpus, if natural, will be so wildly skewed that the description would be no more than a mere list.” (Chomsky 1962)
- Our internal knowledge of language in the human brain (competence, I-language) replaces observed data (performance, E-language)
- Intuitions started to be relied on as evidence
Xiao, R. (2008) “Theory-driven corpus research: using corpora to inform aspect theory”. In A. Lüdeling & M. Kytö (eds.) Corpus Linguistics: An International Handbook. Berlin: Mouton de Gruyter

Revival of CL
- Corpus research was continued in a few centres (Brown, Lancaster) in the 1960s-70s
  - the Brown University Standard Corpus of Present-day American English (Brown corpus)
  - the Lancaster-Oslo-Bergen Corpus of British English (LOB)
- The hardware still imposed some restrictions until the real development started in the 1980s
- The marriage of corpora with computer technology rekindled interest in the corpus methodology
- Since then, the number and size of corpora and corpus-based studies have increased dramatically
- Nowadays, the corpus methodology enjoys widespread popularity, and has opened up or foregrounded many new areas of research

Areas that have used corpora
- Lexicography
- Lexical studies
- Grammatical studies
- Register/genre analysis
- Language variation
- Contrastive analysis
- Translation studies
- Language change
- Language teaching
- Semantics
- Pragmatics
- Stylistics
- Literary study
- Sociolinguistics
- Discourse analysis
- Forensic linguistics
- Computational linguistics

Nature of the corpus-based approach
- It is empirical, analysing the actual patterns of use in natural texts
- It utilises a large and principled collection of natural texts as the basis for analysis
- It makes extensive use of computers for analysis, using both automatic and interactive techniques
- It integrates both quantitative and qualitative analytical techniques
(Biber et al. 1998: 4-5)

Why use computers?
- The development of computer technology has revived CL
- Machine-readability is a de facto attribute of modern corpora
- Electronic corpora have advantages unavailable to their “shoebox” ancestors
- It is the use of computerized corpora, together with computer programs which facilitate linguistic analysis, that distinguishes modern electronic corpora from early drawer-cum-slip corpora
- Computerized corpora can be processed and manipulated rapidly at minimal cost
  - e.g. searching, selecting, sorting and formatting
- Computers can process machine-readable data accurately and consistently
- Computers can avoid human bias in an analysis, thus making the result more reliable
- Machine-readability allows further automatic processing to be performed on the corpus, so that corpus texts can be enriched with various metadata and linguistic analyses
  - corpus markup and corpus annotation

A question for Deep Thought
“Alright,” said the computer Deep Thought. “The answer to the great question...” “Yes...!” “Of life, the universe and everything...” said Deep Thought. “Yes...!” “Is...” “Yes...!...?” “Forty-two,” said Deep Thought, with infinite majesty and calm.
It was a long time before anyone spoke. “Forty-two!” yelled someone in the audience. “Is that all you've got to show for seven and a half million years' work?” “I checked it very thoroughly,” said the computer, “and that quite definitely is the answer. I think the problem, to be
quite honest with you, is that you've never actually known what the question is.”
(Hitchhiker's Guide to the Galaxy by Douglas Adams)
What can we learn from this story?

What corpora cannot do
- Corpora do not provide negative evidence
  - They cannot tell us what is possible or not possible
  - They can show what is central and typical in language
- Corpora can yield findings but rarely provide explanations for what is observed
  - interfacing with other methodologies
- The use of corpora as a methodology also defines the boundaries of any given study
  - importance of amenable research questions
- The findings based on a particular corpus only tell us what is true in that corpus
  - generalisation vs. representativeness
- See unit B2 for pros and cons of corpora

Ask corpora the right questions
- Corpus linguistics as a methodology is only one of the (many) ways of doing things (“doing linguistics”)
- The usefulness of corpora depends upon the research question being investigated
  - “They are invaluable for doing what they do, and what they do not do must be done in another way.” (Hunston 2002: 20)
- The development of the corpus-based approach as a tool in language studies has been compared to the invention of telescopes in astronomy
  - If it is ridiculous to criticize a telescope for not being a microscope, it is equally pointless to criticize the corpus-based approach for not doing what it is not intended to do
- It is up to you to formulate research questions amenable to corpus-based investigation and to decide how to combine corpora with other resources

Testing your intuitions with VIEW/BNC
- Most common noun in English
  - Search for n*
  - Top 10: time, peop
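
A wildcard search of this kind can also be tried offline on any part-of-speech-tagged text. The sketch below is only an illustration under stated assumptions: the token list is invented for the example, the tags are modelled on the C5 tagset used for the BNC (where noun tags begin with N), and the function name top_words_by_tag is hypothetical, not part of VIEW or any other tool. It counts the word forms whose tag matches a pattern such as n* and returns the most frequent ones.

```python
from collections import Counter
from fnmatch import fnmatchcase

# Toy sample, invented for illustration only: (word, tag) pairs.
# The tags imitate the BNC's C5 tagset (assumption): NN0/NN1 are
# nouns, VVB/VVZ are verbs, CJC is a conjunction, AT0 is an article.
TAGGED_TOKENS = [
    ("people", "NN0"), ("say", "VVB"), ("time", "NN1"), ("flies", "VVZ"),
    ("but", "CJC"), ("time", "NN1"), ("and", "CJC"), ("people", "NN0"),
    ("change", "VVB"), ("every", "AT0"), ("year", "NN1"), ("and", "CJC"),
    ("time", "NN1"), ("heals", "VVZ"),
]


def top_words_by_tag(tagged_tokens, tag_pattern, n=10):
    """Return the n most frequent word forms whose tag matches tag_pattern.

    tag_pattern is a shell-style wildcard such as "n*" and is matched
    case-insensitively against each token's tag.
    """
    pattern = tag_pattern.lower()
    counts = Counter(
        word.lower()
        for word, tag in tagged_tokens
        if fnmatchcase(tag.lower(), pattern)
    )
    return counts.most_common(n)


# Rank the nouns in the toy sample by frequency.
for word, freq in top_words_by_tag(TAGGED_TOKENS, "n*", n=3):
    print(word, freq)
```

On this toy sample the ranking simply reflects the invented data. The point of the lab exercise is to compare your own guess with the ranking obtained from a real corpus such as the BNC.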