




An Introduction to English Language Testing
By Chen Huilin

Definition of terms: measurement, test, evaluation
- Measurement: the process of quantifying the characteristics of persons according to explicit procedures and rules.
- Test: a procedure designed to elicit certain behavior from which one can make inferences about certain characteristics of an individual.
- Evaluation: the systematic gathering of information for the purpose of making decisions.

Approaches to language testing
- The essay-translation approach: the subjective judgment of the teacher is considered to be of paramount importance. Tests usually consist of essay writing, translation, and grammatical analysis, and have a heavy literary and cultural bias.
- The structuralist approach: characterized by the view that language learning is chiefly concerned with the systematic acquisition of a set of habits. Tests identify and measure the learner's mastery of the separate elements and skills of the target language. It is considered essential to test one thing at a time.
- The integrative approach: involves the testing of language in context and is thus concerned with meaning and the total communicative effect of discourse. Tests are designed to assess the learner's ability to use two or more skills simultaneously, and are concerned with a global view of proficiency.
- The communicative approach: concerned with how language is used in communication. Success is judged in terms of the effectiveness of the communication which takes place rather than formal linguistic accuracy. Tests are based on precise and detailed specifications of the needs of the learners.

Difference between approach and method
- Approach: theoretical positions and beliefs about the nature of language, the nature of language learning, and the applicability of both to testing.
- Method: the way in which language or knowledge of language is elicited from a test taker.

Test methods
A framework of test method facets
- Test environment
  - Familiarity of the place and equipment
  - Personnel
  - Time of testing
  - Physical conditions
- Test rubric
  - Test organization
  - Time allocation
  - Instruction
- Facets of the input
  - Format
  - Nature of language
- Facets of the expected response
  - Format
  - Nature of language
  - Restrictions on response
- Relationship between input and response
  - Reciprocal
  - Nonreciprocal
  - Adaptive

Characteristics of individuals
- Personal characteristics
  - Age
  - Sex
  - Nationality
  - Resident status
  - Native language
  - Level and type of general education
  - Type and amount of preparation
- The topical knowledge that test takers bring to the language testing situation
- Their affective schemata
- Their language ability

Communicative language ability
A theoretical framework of communicative language ability
- Language knowledge
  - Organizational knowledge
    - Grammatical knowledge
    - Textual knowledge
  - Pragmatic knowledge
    - Functional knowledge
    - Sociolinguistic knowledge
- Strategic competence
  - Goal setting
  - Assessment
  - Planning
  - Execution
- Psychophysiological mechanisms

Uses of language tests
Uses of language tests in educational programs
- Information regarding educational outcomes is essential to effective formal education and to making decisions
- To improve learning and teaching through appropriate changes in the program, based on feedback
- To measure educational outcomes

Research uses of language tests
- Research on language proficiency
- Research on the nature of language processing
- Research on the nature of language acquisition
- Research on the nature of language attrition
- Investigation of effects of different instructional settings and techniques on language acquisition

Classifying types of language tests according to intended use
- Selection: whether or not the students should enter the program
- Placement: placing students into appropriate groups
- Diagnosis: diagnosing students' areas of strength and weakness in order to determine appropriate types and levels of teaching and learning activities
- Progress and grading: providing continuous feedback to both the teacher and the learner for making decisions regarding appropriate modifications in the instructional procedures and learning activities.

Classifying types of language tests according to content
- Proficiency tests: measuring general ability or skill
- Aptitude tests: measuring capability or potential related to language acquisition as well as the use of language
- Achievement tests: measuring the extent of learning of the material presented in a particular course, textbook, or program of instruction

Classifying types of language tests according to format
- Direct tests: measuring ability directly in an authentic context and format
- Indirect tests: fostering inference about one kind of behavior or performance through measurement of another related kind of performance.

Classifying types of language tests according to complexity of response
- Discrete-point tests: employing items measuring performance over a unitary set of linguistic structures or features
- Integrative tests: measuring knowledge of a variety of language features, modes, or skills simultaneously

Classifying types of language tests according to scoring
- Objective tests: scored with reference to a scoring key and not requiring expert judgment in the scoring process
- Subjective tests: depending on impression and opinion at the time of scoring

Classifying types of language tests according to norm of reference
- Norm-referenced tests: evaluating ability against a standard of mean or normative performance of a group, implying standardization through prior administration to a large sample of examinees
- Criterion-referenced tests: assessing achievement or performance against a cut-off score that is determined as a reflection of mastery or attainment of specified objectives.
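
The contrast between norm-referenced and criterion-referenced interpretation can be illustrated with a small sketch. The same raw score is read in two ways: as a z-score relative to a norm group, and as a pass/fail decision against a cut-off. The sample scores, the candidate's score, and the cut-off of 70 are all hypothetical values chosen for illustration, and the z-score is only one of several possible ways to express standing within a group.

```python
from statistics import mean, pstdev


def norm_referenced(score, norm_group):
    """Express a score as a z-score relative to the norm group's mean."""
    mu = mean(norm_group)
    sigma = pstdev(norm_group)  # population SD of the norm sample
    return (score - mu) / sigma


def criterion_referenced(score, cut_off):
    """Report whether a score reaches the mastery cut-off."""
    return score >= cut_off


# Hypothetical scores from a prior administration (the norm group).
norm_sample = [52, 61, 58, 70, 65, 49, 73, 66]
candidate = 64  # hypothetical candidate score

z = norm_referenced(candidate, norm_sample)
mastered = criterion_referenced(candidate, cut_off=70)  # assumed cut-off

print(f"z-score against the norm group: {z:.2f}")
print(f"meets the cut-off of 70: {mastered}")
```

In this made-up case the candidate sits above the group mean yet falls short of the cut-off, which shows why the two kinds of test can lead to different decisions about the same performance.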

Classifying types of language tests according to time limit
- Speed tests: limiting the time allowed for completion so that the majority of examinees would not be expected to finish, and containing items so easy that, given enough time, most persons would respond correctly.
- Power tests: allowing sufficient time for nearly all examinees to complete the test, but containing material of sufficient difficulty that it is not expected that a majority of examinees will get every item correct.

Test usefulness
- Reliability
- Validity
- Authenticity
- Interactiveness
- Impact
- Practicability

Reliability
The consistency of the scores obtainable from a test.
- Test-retest method: calculated by means of the product-moment correlation of two sets of scores for the same persons.
- Parallel forms method: two tests are administered to the same sample of persons and the results are correlated using product-moment correlation.
- Split-half reliability: dividing a test into two nearly equal parts, correlating the scores for the two parts, and adjusting the coefficient using the Spearman-Brown Prophecy Formula.
- Inter-rater reliability: correlation between different raters' ratings of the same objects or performances, adjusted by the Spearman-Brown Prophecy Formula.

Validity
The extent to which a test measures the ability or knowledge that it is purported to measure.
- Face validity: a subjective impression, usually on the part of examinees, of the extent to which the test and its format fulfill the intended purpose of measurement.
- Content validity: a non-empirical expert judgment of the extent to which the content of a test is comprehensive and representative of the content domain purported to be measured by the test.
- Concurrent validity: the magnitude of the correlation between scores for a given test and some recognized criterion measure.
- Construct validity: the extent to which we can interpret a given test score as an indicator of the ability(ies), or construct(s), we want to measure.
- Response validity: the extent to which examinee responses to a test or questionnaire can be said to reflect the intended purpose in measurement.
- Predictive validity: an indication of how well a test predicts intended performance.

Relationship between reliability and validity
- Reliability: how much of the variance in test scores is reliable variance; examining variance in test scores themselves; agreement between similar measures of the same trait.
- Validity: what abilities contribute to this reliable variance; examining the relationship between test performance and factors outside the test itself; agreement between different measures of the same trait.
- A test cannot be valid unless it is reliable; it is quite possible for a test to be reliable but invalid.
- Maximizing reliability may lead to reducing validity.

Authenticity
The degree of correspondence between the characteristics of a given language test task and the features of a target language use (TLU) task.
- Real-life approach
  - The appearance or perception of the test and how this may affect test performance and test use.
  - The accuracy with which test performance predicts future non-test performance.
- Interactional/ability approach
  - The interaction between the language user, the context, and the discourse.
  - The extent to which test performance reflects language abilities, or construct validity.
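
The reliability estimates listed under Reliability can be sketched numerically. The example below computes a product-moment (Pearson) correlation for a test-retest pair and a split-half coefficient adjusted with the Spearman-Brown Prophecy Formula, r_k = k·r / (1 + (k − 1)·r), with k = 2 for two halves. All score data are hypothetical, and the odd/even item split is an assumption: the outline only says the test is divided into two nearly equal parts.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Product-moment correlation between two equal-length score lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def spearman_brown(r, factor=2):
    """Project reliability r to a test `factor` times as long."""
    return factor * r / (1 + (factor - 1) * r)


def split_half_reliability(item_scores):
    """Correlate odd- and even-numbered item totals, then adjust to full length."""
    odd_totals = [sum(row[0::2]) for row in item_scores]   # items 1, 3, 5, ...
    even_totals = [sum(row[1::2]) for row in item_scores]  # items 2, 4, 6, ...
    half_r = pearson_r(odd_totals, even_totals)
    return half_r, spearman_brown(half_r)


# Hypothetical test-retest totals for five test takers.
first_sitting = [55, 62, 70, 48, 81]
second_sitting = [58, 60, 73, 50, 79]
retest_r = pearson_r(first_sitting, second_sitting)

# Hypothetical right/wrong (1/0) item scores: five test takers, six items.
items = [
    [1, 1, 1, 1, 1, 0],
    [1, 0, 1, 1, 0, 0],
    [0, 0, 1, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 1, 0, 0, 1, 0],
]
half_r, full_r = split_half_reliability(items)

print(f"test-retest r: {retest_r:.3f}")
print(f"half-test r: {half_r:.3f}, Spearman-Brown adjusted: {full_r:.3f}")
```

The adjustment is needed because the correlation between two halves describes a test only half as long as the real one; for a positive correlation the adjusted coefficient is always higher. The same `spearman_brown` helper could be applied to an inter-rater correlation with `factor` set to the number of raters, which is again an assumption about how the adjustment would be used.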

Interactiveness
The extent and type of involvement of the test taker's individual characteristics in accomplishing a test task.
- Language ability
  - Language knowledge
  - Strategic competence
- Topical knowledge
- Affective schemata

Impact
The positive or negative feedback of a test on teaching and learning.
- Washback: the effect of a test on instruction.

Practicability
The relationship between the resources that will be required in the design, development, and use of the test and the resources that will be available for these activities.

Reasons for test planning
- Providing the best means for assuring that the test will be useful for its intended purposes
- Increasing accountability: the ability to say what was done and what was right.
- Increasing the amount of satisfaction we experience.

Stages of test development
- Statement of the problem
- Writing specifications for the test
- Writing the test
- Pretesting
- Validation of the test

Statement of the problem
- What kind of test is it to be?
- What is its precise purpose?
- What abilities are to be tested?
- How detailed must the results be?
- How accurate must the results be?
- How important is washback?
- What constraints are set by unavailability of expertise, facilities, time?

Writing specifications for the test
Test specifications are the blueprint to be followed by test and item writers, and are essential in establishing the test's construct validity.
- Content
  - Operations
  - Types of text
  - Addressees
  - Topics
- Format and timing
- Criterial levels of performance

Writing the test
- Sampling
- Item writing and moderation
- Writing and moderation of scoring key

Pretesting
- Purposes
  - Assessing the usefulness of the test
  - Making the inferences or decisions for which the test is intended
- Administering tests and collecting feedback
- Analyzing test scores
- Archiving

Collecting feedback for assessing usefulness
- Kinds of feedback
  - Feedback about test takers' language ability
  - Feedback about the testing procedure itself
- Methods of obtaining feedback
  - Questionnaires
    - Multiple-choice questionnaires
    - Rating scales
    - Open-ended questions
  - Think-aloud protocols
  - Observation and description
  - Interviews

Item types
- Objective-type items
  - Multiple choice
  - Dichotomous items
  - Matching
  - Information transfer
  - Ordering tasks
  - Editing
  - Gap filling
  - Cloze
  - C-test
  - Dictation
  - Short-answer questions
- Subjectively marked tests
  - Compositions and essays
  - Summaries
  - Oral interviews
  - Information gap activities

General problems of items
- What is an item actually testing?
- Each item should be independent of others
- Instructions for all items must be clear

Multiple choice items
- The correct answer must be genuinely correct
- There is only one correct answer
- Each wrong alternative should be attractive to at least some of the students
- Multiple choice items should be presented in context
- The correct alternative should not look so different from the distractors that it stands out from the rest
- Each option should fit equally well into the stem
- Items should not be answerable without reference to the reading or listening passage

Dichotomous items
- 50% possibility of getting any item right by chance
- It is necessary to have a large number of such items in order to discount the effect of chance
- A third category, "not given" or "does not say", can be included

Matching
- Give more alternatives than the matching task requires
- Each item in the first column matches only one item in the second

Information transfer
- The task can be complicated in the transfer but linguistically easy
- May be culturally or cognitively biased

Ordering tasks
- Not easy to provide words or phrases which only make sense in one order
- Marked wholly right or wholly wrong
- The effort in constructing and in answering the item may not be considered worthwhile

Editing
- There is one mistake per line
- Students should be told how many errors there are

Gap filling
- It is important to reduce the number of alternative answers to the minimum and to ensure that there are no other possible answers which are not listed in the answer key
- Candidates may fail to think of an answer not because they have poor language but because the word does not spring to mind
- A banked gap-filling task may be used
- It is important to tell students whether each gap is to be filled by one or by more than one word

Cloze
- Words are deleted mechanically
- The choice of the first deletion can have an effect on the validity of the test
- There may be many possible answers for any one gap
- Few of the items may test the aspects of language with which the tester is concerned

C-test
- Instructions are too complicated
- The number of missing letters should be shown in each gap
- Enough clues should be provided

Dictation
- It must be presented in the same way to all the students
- In marking, it is not always clear whether a word is misspelt or simply wrong
- It is both time-consuming and boring to mark
- There may be many possible answers if students are required to write down the main points

Short-answer questions
- Candidates must know what is expected of them
- There are many ways of saying the same thing

Compositions and essays
- Instructions must be clear
- The students are required to have a wide general knowledge
- Give students some information before writing

Summaries
- It may be impossible to know whether the test taker is poor in comprehension or in writing
- Marking is complex
- A bank of possible words and phrases can be provided

Oral interviews
- Only a limited vocabulary is used, not stretching the students' ability to use complex structures
- Needs to be carefully structured to cover the aspects of language to be tested
- Each student is tested in a similar way
- Candidates should be put at ease

Information gap activities
- Difficult to construct
- Having a tendency to elicit a limited range of language
- The task can be biased

Personal response assessment
- Individual tutorials
- Self- and peer-assessment
- Portfolios

Why test grammar?
- Content validity
- Washback effect
- Impact on skills performance
- Writing specifications
  - Syllabus (achievement tests)
  - Textbooks and teaching materials
  - All the structures (placement tests)
- Sampling
  - Wide selection
  - Concentration on the most important
- Tests of grammar and usage
  - Multiple-choice items
  - Error-recognition items
  - Rearrangement items
  - Completion items
  - Transformation items
  - Items involving the changing of words
  - Broken sentence items
  - Pairing and matching items
  - Combination items
  - Addition items

Why test vocabulary?
- Essential to the development and demonstration of linguistic skills
- Washback effect
- Writing specifications
  - All the items presented to the students
  - New items met in other activities
  - Grouping the items in terms of relative importance
  - Frequency of word use (proficiency tests)
- Sampling
  - Selecting items randomly from groups
  - More being selected from the groups containing more frequent and useful words
- Tests of vocabulary
  - Multiple-choice items
  - Sets (associated words)
  - Matching items
  - Word formation test items
  - Items involving synonyms
  - Rearrangement items
  - Definitions
  - Completion items

Testing reading comprehension
- Skimming: the method of glancing through a text in order to become familiar with the gist of the content
- Scanning: the skills used when reading in order to locate specific information

Specifying what the candidate should be able to do
- Content
- Criterial levels of performance

Content
- Operations
- Types of text (authentic)
- Addressees
- Topics

Operations
- Macro-skills
  - Scanning text to locate specific information
  - Skimming text to obtain the gist
  - Identifying stages of an argument
  - Identifying examples presented in support of an argument
- Micro-skills
  - Identifying referents of pronouns, etc.
  - Using context to guess the meaning of unfamiliar words
  - Understanding relations between parts of text by recognizing indicators in discourse, especially for the introduction, development, transition, and conclusion of ideas
- Grammatical and lexical abilities

Setting the tasks
- Selecting texts
- Writing items
- Scoring: errors of grammar, spelling or pronunciation should not be penalized

Selecting texts
- Keep specifications constantly in mind and try to select as representative a sample as possible
- Choose texts of appropriate length
- Include as many passages as possible in a test, giving candidates a good number of fresh starts
- Look for passages which contain plenty of discrete pieces of information
- Choose texts which will interest candidates but which will not overexcite or disturb them
- Avoid texts made up of information which may be part of candidates' general knowledge
- Do not choose texts whi