Chapter 3: The Reliability of Testing

The definition of reliability
The reliability coefficient
How to make tests more reliable

What is reliability?
Reliability refers to the trustworthiness and stability of candidates' test results. In other words, if a group of students were given the same test twice at different times, the more similar the two sets of scores, the more reliable the test is said to be.

How to establish the reliability of a test
It is possible to quantify the reliability of a test in the form of a reliability coefficient. Reliability coefficients allow us to compare the reliability of different tests. The ideal reliability coefficient is 1: a test with a reliability coefficient of 1 is one which would give precisely the same results for a particular set of candidates regardless of when it happened to be administered. A test which had a reliability coefficient of zero would give sets of results quite unconnected with each other. It is between the two extremes of 1 and zero that genuine test reliability coefficients are to be found.

How high should we expect reliability coefficients to be for different types of language test? Lado says: "Good vocabulary, structure and reading tests are usually in the 0.9 to 0.99 range, while auditory comprehension tests are more often in the 0.8 to 0.89 range." A reliability coefficient of 0.85 might be considered high for an oral production test but low for a reading test.

Ways to establish the reliability of a test
1. Test-retest method. This method requires two sets of scores for comparison. The most obvious way of obtaining these is to get a group of subjects to take the same test twice.
2. Split-half method. In this method the subjects take the test in the usual way, but each subject is given two scores: one for one half of the test and the other for the other half. The two sets of scores are then used to obtain the reliability coefficient as if the whole test had been taken twice. For this method to work, the test must be split into two halves which are really equivalent, through the careful matching of items. In fact, where items in the test have been ordered in terms of difficulty, a split into odd-numbered items and even-numbered items may be adequate.
3. Parallel-forms method (the alternate-forms method). Two different forms of the same test are given to the same group of students, one straight after the other or within a very short time. However, alternate forms are often simply not available.

How to make tests more reliable
As we have seen, there are two components of test reliability: the performance of candidates from occasion to occasion, and the reliability of the scoring. Here we will begin by suggesting ways of achieving consistent performances from candidates and then turn our attention to scorer reliability.

1. Take enough samples of behavior.
Other things being equal, the more items that you have on a test, the more reliable that test will be. For example, if we wanted to know how good an archer someone was, we wouldn't rely on the evidence of a single shot at the target. That one shot could be quite unrepresentative of their ability. To be satisfied that we had a really reliable measure of the ability, we would want to see a large number of shots at the target. The same is true for language testing: it has been demonstrated empirically that the addition of further items will make a test more reliable.

The additional items should be independent of each other and of existing items. For example, suppose a reading test asks the question "Where did the thief hide the jewels?" If an additional item following that took the form "What was unusual about the hiding place?", would it make a full contribution to an increase in the reliability of the test? No. Why not? Because it is hardly possible for someone who got the original question wrong to get the supplementary question right. We do not get an additional sample of their behavior, so the reliability of our estimate of their ability is not increased. Each additional item should, as far as possible, represent a fresh start for the candidate.

Do you think that the longer a test is, the more reliable it will be? It is important to make a test long enough to achieve satisfactory reliability, but it should not be made so long that the candidates become so bored or tired that the behavior they exhibit becomes unrepresentative of their ability.

2. Do not allow candidates too much freedom.
In general, candidates should not be given a choice, and the range over which possible answers might vary should be restricted. Compare the following writing tasks:
a. Write a composition on tourism.
b. Write a composition on tourism in this country.
c. Write a composition on how we might develop the tourist industry in this country.
d. Discuss the following measures intended to increase the number of foreign tourists coming to this country:
   i. More/better advertising and/or information (Where? What form should it take?)
   ii. Improve facilities (hotels, transportation, communication, etc.)
   iii. Training of personnel (guides, hotel managers, etc.)
The successive tasks impose more and more control over what is written. The fourth task is likely to be a much more reliable indicator of writing ability than the first. But in restricting the students, we must be careful not to distort too much the task that we really want to see them perform.

3. Write unambiguous items.
It is essential that candidates should not be presented with items whose meaning is not clear, or to which there is an acceptable answer which the test writer has not anticipated. The best way to arrive at unambiguous items is, having drafted them, to subject them to the critical scrutiny of colleagues, who should try as hard as they can to find alternative interpretations to the ones intended.

4. Provide clear and explicit instructions.
This applies both to written and oral instructions. If it is possible for candidates to misinterpret what they are asked to do, then on some occasions some of them certainly will. A common fault of tests written for the students of a particular teaching institution is the supposition that the students all know what is intended by carelessly worded instructions. The frequency of the complaint that students are unintelligent, have been stupid, or have willfully misunderstood what they were asked to do reveals that the supposition is often unwarranted. Test writers should not rely on the students' powers of telepathy to elicit the desired behavior. The best means of avoiding problems is the use of colleagues to criticize drafts of instructions, including those which will be spoken. Spoken instructions should always be read from a prepared text in order to avoid introducing confusion.

5. Ensure that tests are well laid out and perfectly legible.
Too often, institutional tests are badly typed or handwritten, have too much text in too small a space, and are poorly reproduced. As a result, students are faced with additional tasks which are not ones meant to measure their language ability. Their variable performance on these unwanted tasks will lower the reliability of a test.

6. Ensure that candidates are familiar with the format and testing techniques.
If any aspect of a test is unfamiliar to candidates, they are likely to perform less well than they would do otherwise. For this reason, every effort must be made to ensure that all candidates have the opportunity to learn just what will be required of them. This may mean the distribution of sample tests or of past test papers, or at least the provision of practice materials in the case of tests set within teaching institutions.

7. Provide uniform and non-distracting conditions of administration.
The greater the differences between one administration of a test and another, the greater the differences one can expect between a candidate's performance on the two occasions. Great care should be taken to ensure uniformity. For example, timing should be specified and strictly adhered to, and the acoustic conditions should be similar for all administrations of a listening test. Every precaution should be taken to maintain a quiet setting with no distracting sounds or movements.

How to obtain scorer reliability
1. Use items that permit scoring which is as objective as possible.
This may appear to be a recommendation to use multiple-choice items, which permit completely objective scoring. This is not intended. While it would be mistaken to say that multiple-choice items are never appropriate, it is certainly true that there are many circumstances in which they are quite inappropriate. What is more, good multiple-choice items are notoriously difficult to write and always require extensive pretesting.

An alternative to multiple choice is the open-ended item which has a unique, possibly one-word, correct response which the candidates produce themselves. This too should ensure objective scoring, but in fact problems with such matters as spelling which makes a candidate's meaning unclear often make demands on the scorer's judgment. The longer the required response, the greater the difficulties of this kind. One way of dealing with this is to structure the candidate's response by providing part of it. For example, the open-ended question "What was different about the results?" may be designed to elicit the response "Success was closely associated with high motivation." This is likely to cause problems for scoring. Greater scorer reliability will probably be achieved if the question is followed by:
______ was more closely associated with ______.

2. Make comparisons between candidates as direct as possible.
This reinforces the suggestion already made that candidates should not be given a choice of items and that they should be limited in the way that they are allowed to respond. Scoring compositions all on one topic will be more reliable than if the candidates are allowed to choose from six topics, as has been the case in some well-known tests.

3. Provide a detailed scoring key.
This should specify acceptable answers and assign points for partially correct responses. For high scorer reliability, the key should be as detailed as possible in its assignment of points. It should be the outcome of efforts to anticipate all possible responses and should have been subjected to group criticism. This advice applies only where responses can be classed as partially or totally correct, not in the case of compositions, for instance.

4. Train scorers.
This is especially important where scoring is more subjective. The scoring of compositions, for example, should not be assigned to anyone who has not learned to score compositions from past administrations accurately. After each administration, patterns of scoring should be analyzed. Individuals whose scoring deviates markedly and inconsistently from the norm should not be used again.

5. Agree on acceptable responses and appropriate scores at the outset of scoring.
A sample of scripts should be taken immediately after the administration of the test. Where there are compositions, archetypical representatives of different levels of ability should be selected. Only when all scorers are agreed on the scores to be given to these should real scoring begin. For short-answer questions, the scorers should note any difficulties they have in assigning points (the key is unlikely to have anticipated every relevant response) and bring these to the attention of whoever is supervising that part of the scoring. Once a decision has been taken as to the points to be assigned, the supervisor should convey it to all the scorers concerned.

6. Identify candidates by number, not name.
Scorers inevitably have expectations of candidates that they know. Except in purely objective testing, this will affect the way that they score. Studies have shown that even where the candidates are unknown to the scorers, the name on a script (or a photograph) will make a significant difference to the scores given. For example, a scorer may be influenced by the gender or nationality of a name into making predictions which can affect the score given. The identification of candidates only by number will reduce such effects.

7. Employ multiple independent scoring
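The notes say that a reliability coefficient lies between 0 and 1 but not how it is computed. For the test-retest and parallel-forms methods, a common choice (an assumption here, not stated in the notes) is the Pearson correlation between the two sets of scores obtained from the same candidates. The sketch below computes it directly; the function name `pearson_r` and the score lists are hypothetical. Python 3.10 and later also provide `statistics.correlation`, which returns the same coefficient.

```python
from math import sqrt


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation between two sets of scores from the same candidates."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length score lists with at least two candidates")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of each candidate's deviations from the two means
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("scores do not vary, so the coefficient is undefined")
    return cov / sqrt(var_x * var_y)


# Hypothetical scores for the same five candidates on two occasions
# (test-retest), or on two alternate forms (parallel forms).
first_sitting = [62, 75, 48, 90, 70]
second_sitting = [65, 73, 50, 88, 72]
print(f"reliability coefficient: {pearson_r(first_sitting, second_sitting):.2f}")
```

A value close to 1 means the candidates' scores on the two occasions line up closely; a value near 0 means the two sittings tell us almost nothing about each other.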
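The split-half method can be sketched as follows, assuming each candidate's item scores are stored in a list and the items are ordered by difficulty, so that an odd/even split gives roughly equivalent halves. The notes do not say how the two half-test scores become a coefficient for the whole test; one standard approach, assumed here, is to correlate the half-test totals and then apply the Spearman-Brown correction, because each half is only half as long as the real test. The function names and the response data are hypothetical.

```python
from math import sqrt


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation between two equal-length lists of scores."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("scores do not vary, so the coefficient is undefined")
    return cov / sqrt(var_x * var_y)


def split_half_reliability(item_scores: list[list[int]]) -> tuple[float, float]:
    """Return (correlation between halves, Spearman-Brown estimate for the whole test).

    item_scores[c][i] is candidate c's score on item i. Items are assumed to be
    ordered by difficulty, so odd- and even-numbered items form similar halves.
    """
    odd_totals = [sum(row[0::2]) for row in item_scores]   # items 1, 3, 5, ...
    even_totals = [sum(row[1::2]) for row in item_scores]  # items 2, 4, 6, ...
    r_half = pearson_r(odd_totals, even_totals)
    if r_half <= 0:
        raise ValueError("halves are not positively correlated; check they are equivalent")
    # Each half is only half the length of the real test, so step the
    # coefficient up to full length (Spearman-Brown with k = 2).
    r_full = 2 * r_half / (1 + r_half)
    return r_half, r_full


# Hypothetical right (1) / wrong (0) responses: six candidates, eight items.
responses = [
    [1, 1, 1, 1, 1, 1, 0, 1],
    [1, 1, 1, 0, 1, 0, 0, 0],
    [1, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 1, 1],
    [1, 1, 0, 1, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 0, 1, 0],
]
r_half, r_full = split_half_reliability(responses)
print(f"half-test r = {r_half:.2f}, estimated full-test reliability = {r_full:.2f}")
```

The corrected value is always higher than the raw correlation between the halves, which matches the point under "Take enough samples of behavior" that longer tests are, other things being equal, more reliable.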
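The claim that adding items makes a test more reliable, other things being equal, can be quantified with the Spearman-Brown prophecy formula, which the notes themselves do not mention. The formula assumes the new items are parallel to the existing ones and independent of them, which is exactly the condition the thief-and-jewels example is about: a dependent follow-up item adds length without adding a fresh sample of behavior, so the formula would overstate its benefit. The function names and the starting reliability of 0.70 are hypothetical.

```python
def predicted_reliability(r: float, k: float) -> float:
    """Spearman-Brown prophecy: reliability of a test k times as long as one
    with reliability r, assuming the added items are parallel to the old ones."""
    if not 0 < r < 1 or k <= 0:
        raise ValueError("need 0 < r < 1 and k > 0")
    return k * r / (1 + (k - 1) * r)


def length_factor_needed(r: float, target: float) -> float:
    """How many times longer the test must be for reliability r to reach target."""
    if not (0 < r < 1 and 0 < target < 1):
        raise ValueError("reliabilities must lie strictly between 0 and 1")
    return target * (1 - r) / (r * (1 - target))


# Hypothetical 20-item test with reliability 0.70
print(round(predicted_reliability(0.70, 2), 3))    # doubled to 40 items: 0.824
print(round(length_factor_needed(0.70, 0.85), 2))  # 2.43 times as long, about 49 items
```

The formula says nothing about fatigue or boredom, so a predicted gain from a much longer test has to be weighed against the warning that candidates' behavior may become unrepresentative of their ability.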
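A detailed scoring key, together with the procedure for handling responses the key did not anticipate, can be sketched as a lookup table. Everything here is hypothetical: the accepted answers, their point values (including partial credit and an anticipated wrong answer worth 0), and the light normalization applied before lookup. Responses not found in the key are left unscored and set aside for the supervisor, who decides on the points and conveys the decision to all scorers.

```python
# Hypothetical key for one short-answer item: normalized answer -> points.
# It lists full-credit variants, a partially correct answer, and a
# foreseeable wrong answer, as a detailed key should.
ANSWER_KEY: dict[str, int] = {
    "under the floorboards": 2,
    "under the floor": 2,
    "in the house": 1,   # partial credit
    "in the garden": 0,  # anticipated wrong answer
}


def normalize(response: str) -> str:
    """Lower-case and collapse runs of whitespace so trivial variants match."""
    return " ".join(response.lower().split())


def score_item(
    responses: dict[str, str], key: dict[str, int]
) -> tuple[dict[str, int], dict[str, str]]:
    """Score each candidate's response against the key.

    Returns (scores, unanticipated). Responses the key does not cover are not
    guessed at; they are collected for the supervisor to rule on.
    """
    scores: dict[str, int] = {}
    unanticipated: dict[str, str] = {}
    for candidate, response in responses.items():
        answer = normalize(response)
        if answer in key:
            scores[candidate] = key[answer]
        else:
            unanticipated[candidate] = response
    return scores, unanticipated


# Candidates are identified by (hypothetical) numbers, not names.
responses = {
    "C001": "Under the  floorboards",
    "C002": "in the house",
    "C003": "behind a painting",
    "C004": "In the garden",
}
scores, pending = score_item(responses, ANSWER_KEY)
print(scores)   # {'C001': 2, 'C002': 1, 'C004': 0}
print(pending)  # {'C003': 'behind a painting'}
```

Once the supervisor has ruled on a pending response, adding it to `ANSWER_KEY` and rescoring keeps every scorer working from the same decision.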
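The advice to analyze patterns of scoring after each administration can be sketched by comparing each scorer's mark on every script with the median mark the other scorers gave the same script, assuming (hypothetically) that several scorers marked the same set of scripts. The mean of the differences shows a consistently harsh or lenient scorer; the spread (population standard deviation) of the differences shows how inconsistently the scorer departs from the norm, which is what the notes single out. The median is used rather than the mean so that one erratic scorer does not distort the norm for everyone else. That choice, the function names, the marks, and the 1.5-mark threshold are all assumptions.

```python
from statistics import mean, median, pstdev


def scorer_deviations(marks: dict[str, list[float]]) -> dict[str, tuple[float, float]]:
    """For each scorer: (mean, spread) of their differences from the other
    scorers' median mark on each script. All scorers mark the same scripts."""
    names = list(marks)
    if len(names) < 2:
        raise ValueError("need at least two scorers")
    n_scripts = len(marks[names[0]])
    if n_scripts == 0 or any(len(m) != n_scripts for m in marks.values()):
        raise ValueError("every scorer must mark the same, non-empty set of scripts")
    result = {}
    for name in names:
        others = [o for o in names if o != name]
        diffs = [
            marks[name][i] - median([marks[o][i] for o in others])
            for i in range(n_scripts)
        ]
        # mean: systematic severity or leniency; pstdev: inconsistency
        result[name] = (mean(diffs), pstdev(diffs))
    return result


def flag_inconsistent(marks: dict[str, list[float]], max_spread: float) -> list[str]:
    """Scorers whose differences from the norm vary by more than max_spread."""
    return [
        name
        for name, (_, spread) in scorer_deviations(marks).items()
        if spread > max_spread
    ]


# Hypothetical marks (out of 20) given by four scorers to the same five compositions
marks = {
    "A": [14, 11, 17, 9, 12],
    "B": [15, 10, 16, 9, 13],
    "C": [13, 11, 17, 8, 12],
    "D": [18, 6, 12, 14, 10],
}
for name, (bias, spread) in scorer_deviations(marks).items():
    print(f"{name}: mean difference {bias:+.1f}, spread {spread:.1f}")
print("flagged:", flag_inconsistent(marks, max_spread=1.5))  # flagged: ['D']
```

Scorer D is flagged because their marks swing several marks above and below the others from script to script. A scorer who was always exactly two marks harsher would show a mean difference of -2 but a spread of 0, and would not be flagged by this check.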
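Identifying candidates by number can be as simple as the sketch below, in which a hypothetical helper shuffles the candidate list before numbering it (so that numbers do not follow an alphabetical class list) and returns lookup tables in both directions. The assumption is that only the test administrator keeps the name-to-number table, while scorers see nothing but the number on each script. The starting number 1001 and the names are invented.

```python
import random
from typing import Optional


def assign_candidate_numbers(
    names: list[str], start: int = 1001, seed: Optional[int] = None
) -> tuple[dict[str, int], dict[int, str]]:
    """Give every candidate a number for use on scripts in place of a name.

    The list is shuffled first so that numbers do not follow an alphabetical
    class list. Returns (name -> number, number -> name).
    """
    if len(set(names)) != len(names):
        raise ValueError("candidate names must be unique")
    order = list(names)
    random.Random(seed).shuffle(order)
    name_to_number = {name: start + i for i, name in enumerate(order)}
    number_to_name = {number: name for name, number in name_to_number.items()}
    return name_to_number, number_to_name


# Invented class list; only the administrator keeps these tables.
class_list = ["Li Wei", "Anna Smith", "Carlos Garcia", "Yuki Tanaka"]
name_to_number, number_to_name = assign_candidate_numbers(class_list, seed=7)
print(sorted(number_to_name))  # [1001, 1002, 1003, 1004]
```

Passing a fixed `seed` makes the numbering reproducible if the tables ever need to be regenerated; leaving it as `None` gives a different order each time.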