
Why frequency can no longer be ignored in ELT

Geoffrey Leech
Lancaster University

1. Introduction

If asked what is the one benefit that corpora can provide and that cannot be provided by other means, I would reply “information about frequency”. Frequency is also a theme which has recurred in language learning, although it has also suffered from neglect. Hence there is need for a reappraisal of the links between frequency, corpora and language learning. Following this introduction, this paper is divided into three main sections: section 2, a brief glance at history; section 3, recent progress in frequency studies relevant to language learning; section 4, new directions in applied linguistics favorable to frequency.

To begin with, it is as well to make clear that there are three usages of “frequency” that might be confused.

(A) There is raw frequency, which is simply a count of how many instances of some linguistic phenomenon X occur in some corpus, text or collection of texts.
(B) Then there is normalized frequency (sometimes called relative frequency), which expresses frequency relative to a standard yardstick (e.g. tokens per million words).
(C) There is also what I will call ordinal frequency, where the frequency of X is compared with the frequencies of Y, of Z, etc. Thus a rank frequency list, in which words are listed in order of frequency, is the classic example of ordinal frequency.

Although (A) is the raw measure from which (B) and (C) are derived, it is of little or no use in itself. (B) Normalized frequency is of course essential if we are to make comparisons between corpora, texts, etc. of different sizes. But my view is that (C) ordinal frequency is the most useful measure to use when we are considering language learning. It is of no use for the language teacher to be told that shall occurs 175 times per million words in a corpus. But to be told that will is much (15 times) more frequent than shall may well be pedagogically useful.

2. A brief glance at history

The historical sketch I am about to give roughly divides into three epochs: (a) early frequency studies; (b) the rejection of frequency; (c) the computer age and the resurgence of frequency studies.

2.1 Early frequency studies

For my present purpose, it is enough to refer to one or two landmarks in the provision of word frequency information on English. Thorndike, Thorndike & Lorge, and West are noted examples of word frequency lists produced by counting and calculating word frequencies by hand. By present-day standards, the corpora used were pitifully small, and the selection of texts they contained included some choices hardly ideal for learners of the current language. However, the important point here is that word frequency was taken seriously as a guide for language teaching in those days, and in spite of the enormous amount of unrewarding “slave labor” involved, building frequency lists was felt to be a worthwhile exercise. The simple postulate justifying this effort was: more frequent = more important to learn.

Of greater interest from the theoretical point of view was the mathematical work of Zipf. Zipf's Law held that the frequency of any word is inversely proportional to its rank in the frequency list, such that the nth word has a frequency of approximately 1/n × the frequency of the word of highest rank. Zipf's Law gave a more heavily weighted importance to the most frequent words than would be expected according to normal distribution. Language is such that the most frequent 50 words (i.e. word types) account for 50% of word tokens in a corpus of texts; the most frequent 3,000 words account for 85% of word tokens; and the most frequent 10,000 words account for 92% of word tokens. For practical purposes we can say that the word stock of English is both very large and open-ended.

2.2 The rejection of frequency

In linguistics, the second half of the twentieth century, at least up to the 1990s, was dominated by the generative school of Noam Chomsky, who rejected the value of frequency in the study and understanding of language. Chomsky famously used the illustration of “I live in Dayton, Ohio” and “I live in New York” to show that the greater frequency of the latter sentence as compared with the former was of no linguistic relevance or interest. Of course, this had more to do with the differences of population between Dayton, Ohio and New York: from Chomsky's point of view, a matter of performance (and hence of no value to linguistics) rather than competence. He concluded that “probabilistic considerations have nothing to do with grammar”, using grammar in a broad Chomskyan sense to include the whole language system.

From that time until (roughly) the end of the century, it was difficult to find any serious reference to frequency in publications about the learning of languages, and where frequency was discussed, it was dealt with perfunctorily and sometimes negatively. The well-known authoritative handbook by Rod Ellis, The Study of Second Language Acquisition (1994), has little to say about frequency, and offers very little extra in its second edition of over a thousand pages, published as recently as 2008. The only substantial reference to frequency is in the section headed “The frequency hypothesis”, in which the emphasis is wholly on the learner's input frequency. For corpus linguistics, a more relevant question is: How can both the learner's input and output be adjusted to the future likely needs of the learner as revealed in corpora?

2.3 The computer age and the revival of frequency studies

The corpus revolution in linguistics began with the completion and distribution of the Brown Corpus in 1964. Shortly after, in 1967, Kucera & Francis used this to create the first word frequency lists for English based on corpus data. Later, in Francis & Kucera, they published lemmatized frequency lists, based on the part-of-speech (POS) tagged version of the corpus. Further word frequency lists were derived from the Lancaster-Oslo/Bergen (LOB) Corpus of British English, and for the first time grammatically informed word frequency lists derived automatically from matching computer corpora became available to the language researcher and the language teacher, permitting comparison of American and British English.

This was only the first step: in the last forty years, there has been an immense increase in the number of corpus-based frequency studies both for written and spoken English, as more diversified corpora as well as much larger corpora have become available. Apart from word frequency lists and studies, corpus-based frequency studies have dealt with collocations, and with the frequency of grammatical categories, structures, etc. Here hundreds of grammatical studies could be mentioned, starting from Ehrman 1966, and culminating in a corpus-based frequency grammar of English as well as in frequency studies of the language of learners. It goes almost without saying that the availability of electronic corpora has revolutionized the application of frequency information, whether derived from general corpora, specialized corpora, written texts or spoken transcriptions.

It is also clear that frequency data from authentic texts have been one of the major driving forces of natural language processing (NLP), leading to the development of sophisticated statistical methods and probabilistic systems. One of the first steps was taken in the probabilistic POS tagging of the LOB Corpus, employing a modified Hidden Markov Process model.
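
To make the link between frequency counts and a probabilistic model concrete, here is a minimal sketch of how tag-to-tag transition probabilities of the kind a Markov-style tagger relies on can be estimated by relative frequency. It is an illustration only, not the tagger used for the LOB Corpus: the function name `transition_probabilities`, the tag labels and the three toy sentences are all assumptions invented for this example.

```python
from collections import Counter, defaultdict

def transition_probabilities(tagged_sentences):
    """Estimate P(next_tag | tag) as the relative frequency of adjacent tag pairs."""
    pair_counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        # "<s>" is a hypothetical start-of-sentence marker.
        tags = ["<s>"] + [tag for _, tag in sentence]
        for prev, nxt in zip(tags, tags[1:]):
            pair_counts[prev][nxt] += 1
    probs = {}
    for prev, counter in pair_counts.items():
        total = sum(counter.values())
        probs[prev] = {nxt: n / total for nxt, n in counter.items()}
    return probs

# Invented toy data: "help" is a noun after a determiner, a verb after a modal.
TOY = [
    [("the", "DET"), ("teacher", "NOUN"), ("will", "MODAL"), ("help", "VERB")],
    [("the", "DET"), ("help", "NOUN"), ("arrived", "VERB")],
    [("learners", "NOUN"), ("will", "MODAL"), ("help", "VERB")],
]

probs = transition_probabilities(TOY)
print(probs["MODAL"])  # {'VERB': 1.0}
print(probs["DET"])    # {'NOUN': 1.0}
```

In this toy sample the counts alone suggest that an ambiguous word such as help is more likely a verb after a modal and a noun after a determiner, which is the kind of information a probabilistic tagger exploits.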

The history of statistical modelling in NLP, however, cannot be pursued further here. See Jelinek for further coverage.

2.4 Co-frequency, collocation

Another great step forward was taken through the pursuit of co-frequency, i.e. the frequency of X and Y occurring together in a corpus, as measured against the probability of their occurring together by chance. A serious beginning was made in Sinclair's research discussed in his OSTI report, using a small corpus of spoken English of 135,000 words. Obviously, as Sinclair pointed out, a much bigger corpus (of 20 million words or more) was needed to produce significant results for collocation analysis. This was achieved and surpassed in the 1980s and 1990s with Sinclair's development of the Birmingham Collection of English Texts, later known as the Bank of English, as well as by other corpora such as the BNC. To give an impression of how vastly the size of corpora on which frequency studies are based has mushroomed in the last forty years: in comparison with Sinclair's spoken corpus of 135,000 words in 1970, a recently published frequency dictionary of American English is based on a corpus of 385,000,000 words, including 79,000,000 words of speech. This dictionary is also an innovation in providing, alongside individual word frequencies, a classified list of common collocations for each word.

Word frequency lists such as those of Francis and Kucera were of limited interest to corpus linguists like John Sinclair, who argued for the inadequacy of the open choice principle of treating every word token in a string as if independently selected, as contrasted with the idiom principle whereby texts are observed to be constructed in terms of a “large number of semi-preconstructed phrases that constitute single choices, even though they might appear to be analyzable into segments”. Sinclair's idiom principle has since been followed up by many corpus linguists and lexicographers for whom multi-word units (collocations, lexical bundles, and the like) are essential to the fabric of language, as well as to the learning of language. Indeed, corpus research itself has shown observationally the importance of word combinations, whose significance is capable of being measured by statistical formulae such as mutual information, t-test, and log likelihood.

Sinclair, in championing the idiom principle, was following to some extent in the footsteps of his former Edinburgh colleague M. A. K. Halliday, and Halliday's teacher J. R. Firth, who first gave prominence to the co-frequential concept of collocation. Halliday had stated that the level of lexis (including collocation) had to be a distinct level of linguistic description and at the same time had proposed that the levels of grammar and of lexis were interrelated along a cline or continuum of delicacy. For him, the levels of grammar and lexis constituted a single lexico-grammatical level accounting for the formal structuring of language. Many corpus linguists have espoused something like this model, evidenced as it is by a multitude of studies, with the result that the interpenetration of grammar and lexis has become widely accepted. In this respect, it can be said that the corpus revolution has introduced a new theoretical perspective on linguistic structuring: one in bold contrast to the mainstream paradigm of Chomsky whereby grammar and lexicon are two clearly distinct components. It also challenges a tradition long established in language study, whereby grammars and dictionaries provide distinct kinds of information about a language, and are published in separate covers.

3. Recent Progress in Frequency Studies Relevant to Language Learning

In this part I will discuss how studies of frequency have been increasingly applied to various linguistic units or components. In so doing, I will consider how the following four topics have been advanced by recent research:

a. Word frequency
b. Co-frequency between words - lexis and collocation
c. Grammatical frequency
d. Lexico-grammatical frequency - co-frequency between lexis and grammatical structures

3.1 How frequency is important for English Language Teaching (ELT)

First, let us revisit word frequency lists. The case for “more frequent = more important to learn” is simply put: “The reasoning behind this position is that learners should be taught what is most frequent in language, since it is what is of most use to them”. In other words, the more frequent a word is in language use, the more likely it is to be useful to the learner. This is (a) because it will be more frequently encountered in the language use of other people, and (b) because it will be more frequently needed for the learner's own language use.
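
As a rough illustration of how a word frequency list is derived, the sketch below computes the three kinds of frequency distinguished in the Introduction (raw, normalized per million words, and ordinal) and adds the Zipf approximation from section 2.1. The function names, the simple tokenizer and the two-sentence sample text are assumptions made for this example; a real frequency list would be built from a full corpus with proper tokenization and lemmatization.

```python
import re
from collections import Counter

def tokenize(text):
    """Lower-case the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def raw_frequency(tokens):
    """(A) Raw frequency: number of occurrences of each word type."""
    return Counter(tokens)

def per_million(count, corpus_size):
    """(B) Normalized frequency: occurrences per million word tokens."""
    return count * 1_000_000 / corpus_size

def rank_list(raw):
    """(C) Ordinal frequency: word types from most to least frequent."""
    ordered = sorted(raw.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ordered]

def frequency_ratio(raw, x, y):
    """(C) How many times more frequent x is than y."""
    return raw[x] / raw[y]

def zipf_estimate(top_frequency, rank):
    """Zipf's approximation: the nth word has about 1/n of the top frequency."""
    return top_frequency / rank

# Invented sample text, far too small to be representative.
TEXT = ("I will go and you will stay, but we shall see. "
        "She will call and he will wait.")

tokens = tokenize(TEXT)
raw = raw_frequency(tokens)

print(raw["will"], raw["shall"])             # 4 1
print(rank_list(raw)[:2])                    # ['will', 'and']
print(frequency_ratio(raw, "will", "shall"))  # 4.0
print(per_million(175, 1_000_000))           # 175.0
print(zipf_estimate(1000, 4))                # 250.0
```

The ratio in the third print statement is the kind of comparative statement (will is so many times more frequent than shall) that the Introduction suggests is more useful to a teacher than a bare per-million figure.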

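
Co-frequency as described in section 2.4, the observed frequency of X and Y occurring together measured against what chance would predict, can be sketched with a mutual information score. The version below is a simplified illustration under stated assumptions: it only counts immediately adjacent word pairs, the function name `mutual_information` and the sample sentence are invented for the example, and it is not claimed to be the formula used in any particular study cited in the paper.

```python
import math
from collections import Counter

def mutual_information(tokens, x, y):
    """log2 of the observed frequency of 'x y' over its chance-expected frequency."""
    n = len(tokens)
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    observed = bigrams[(x, y)]
    if observed == 0:
        return float("-inf")  # the pair never occurs together
    # Expected co-occurrences if x and y were distributed independently.
    # Using n for the number of positions is a common approximation.
    expected = unigrams[x] * unigrams[y] / n
    return math.log2(observed / expected)

# Invented sample, far too small for reliable statistics.
tokens = "strong tea and strong coffee but weak tea and strong tea".split()

print(mutual_information(tokens, "strong", "tea") > 0)   # True
print(mutual_information(tokens, "tea", "coffee"))       # -inf
```

A positive score means the pair occurs together more often than chance would predict. On so small a sample the figures mean little, which is in line with the point in section 2.4 that collocation analysis needs very large corpora.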
