Field Test Validity Study Results: English Language Development Assessment

Final Report Deliverable
October 2004

Rebecca Kopriva, Project Director
Center for the Study of Assessment Validity and Evaluation (C-SAVE)
University of Maryland

Council of Chief State School Officers LEP-SCASS
American Institutes for Research, Test Contractor
Award #0305198000

Rebecca Kopriva
David E. Wiley
Chen-su Chen
Roy Levy
Phoebe C. Winter
Tia Corliss

October 27, 2004

Center for the Study of Assessment Validity and Evaluation
Department of Measurement, Evaluation, and Statistics
College of Education
University of Maryland

English Language Development Assessment (ELDA)

TABLE OF CONTENTS

I. BACKGROUND
II. OVERVIEW OF FIELD TEST VALIDITY STUDIES CONDUCTED BY C-SAVE
   A. Overview of Item Analyses
      1. Evaluation of Items Using Latent Class Analyses and Other Ratings
         a. Contributing analyses
         b. Analyses of gradients
         c. Evaluation of items using contributing data and gradients
      2. Analyses of Relationships among Developmental Level of Items, Percent Correct, and Teacher Rating of Students
   B. Overview of Test Score Analyses
      1. Relationship of ELDA with Other Methods: The Multitrait-Multimethod Analyses
      2. Latent Class Analyses of Test Scores
         a. Differences between LCA and Mixed Rasch Model
         b. Latent Class Analyses to validate the developmental model for test scores
      3. Analyses of Developmental Level Structure
         a. Analyses of a Simplex Structure
         b. Regressing other measures on developmental level scores
III. RESULTS OF ITEM ANALYSES
   A. Evaluation of Items Using Latent Class Analyses and Other Ratings
   B. Analyses of Relationships between Developmental Level of Items and Teacher Rating of Students
IV. RESULTS OF TEST SCORE ANALYSES
   A. Multitrait-Multimethod Analyses
      1. ELDA with Other Measures, Overall MTMM Analyses
         a. Grade cluster 3-5
         b. Grade cluster 6-8
         c. Grade cluster 9-12
      2. MTMM Analyses by Subgroup
         a. Grade cluster 3-5
         b. Grade cluster 6-8
         c. Grade cluster 9-12
   B. Results of Latent Class Test Score Analyses
      1. Test Score Latent Class Analyses
         a. LCA on all field test items
         b. LCA test score results on items selected for test inclusion
      2. Mixed Rasch Latent Class Analyses
         a. Analyses of a five-solution model
         b. Additional analyses of reading 3-5, form A
   C. Analyses of Developmental Structure
      1. Analyses of a Simplex Structure
         a. Grade cluster 3-5
         b. Grade cluster 6-8
         c. Grade cluster 9-12
      2. Regressing Other Measures on Developmental Level Scores
         a. Language Assessment Survey and ELDA
         b. Idea Proficiency Test and ELDA
V. OVERALL CONCLUSIONS

I. BACKGROUND

The English Language Development Assessment (ELDA) is a new battery of tests designed to allow schools to measure annual progress in the acquisition of English language proficiency skills among non-native English-speaking students in K-12. The battery currently consists of separate tests for each of the four skill domains of listening, speaking, reading, and writing, at each of three grade clusters: 3-5, 6-8, and 9-12. The tests were designed and developed through collaboration among the 18 or so member states of the LEP-SCASS, CCSSO, AIR, the Center for the Study of Assessment Validity and Evaluation at the University of Maryland, and Measurement Inc. The tests for grades 3-12 have undergone small-scale pilot testing, and two forms of each test were assembled for field testing. They are aligned with the ESL standards of project member states and are developed to provide content coverage across three academic topic areas and one non-academic topic area related to the school environment.
Tables 1-4 (taken from Fast, Ferrara, and Conrad, 2004) summarize the key design features of each of the test components of ELDA. Five levels of proficiency were identified: Beginners, Lower Intermediate, Upper Intermediate, Advanced, and Full English Proficiency.

Table 1: Item Types, Item Totals, and Estimated Testing Times for ELDA Operational Forms

         Speaking         Listening        Reading          Writing
K-2      Under development
3-5      16 CRs, 15 min   50 MCs, 70 min   50 MCs, 50 min   15 MCs, 3 SCRs/1 ECR, 45 min
6-8      16 CRs, 15 min   50 MCs, 70 min   50 MCs, 55 min   15 MCs, 3 SCRs/1 ECR, 45 min
9-12     16 CRs, 20 min   60 MCs, 80 min   60 MCs, 60 min   15 MCs, 4 SCRs/1 ECR, 50 min

Notes: CR = constructed response; SCR = short constructed response; ECR = extended constructed response; MC = multiple choice. Estimated testing times include estimates for instructions; the April-May 2004 field test contains test forms that are approximately 15-20% longer than the intended operational test form length.

Table 2: Targeted Distribution of Test Items Across Academic vs. Social Topic Areas for Operational Forms

Each domain's items are divided between academic topics (ELA/MST/SS) and the social environment (SE):

         Speaking       Listening      Reading        Writing
         Acad.   SE     Acad.   SE     Acad.   SE     Acad.   SE
3-5      75%     25%    50%     40%    50%     50%    80%     20%
6-8      75%     25%    55%     45%    60%     40%    80%     20%
9-12     75%     25%    60%     40%    70%     30%    80%     20%

Note: ELA = English/Language Arts; MST = Math, Science, Technology; SS = Social Studies; SE = Social Environment.

Table 3: Key Features of ELDA Operational Forms (grade clusters 3-5, 6-8, and 9-12)

Speaking:
- Small-group administration (max. 6 students)
- Complete test is tape/CD administered
- Student responses are recorded on an individual student response tape
- Test booklet contains graphics designed to provide motivation and to structure responses
- Structure of each of the 16 tasks (per cluster) is: input, prompt, scaffold, repeat prompt
- Scoring rubric for each task is 0-2

Listening:
- Group administration (max. 0 students)
- Complete test is tape/CD administered
- Test booklet contains all test material except stimuli
- Stimuli based on 5 text types (4 text types for grade cluster 3-5)
- Stimuli contain natural-sounding language with age-appropriate voices
- Stimuli are heard twice
- Some graphic support provided to aid comprehension

Reading:
- Group administration
- Test composed of 3 sections, each with its own text types:
  1) Early reading passages are short and simple; item types are gap-filling and simple comprehension questions
  2) Instructions passages simulate instructions in textbooks and teacher handouts; multiple-choice options are mainly graphic
  3) Longer comprehension passages support 6-9 test items
- Some graphic support provided to aid comprehension

Writing:
- Group administration
- Test composed of 3 sections:
  1) Planning & Organizing: contains outlines or graphic organizers as stimuli
  2) Revising & Editing: contains short stimuli that simulate student writing; stimuli contain sentence- and text-level errors
  3) Writing tasks: designed to test students' ability to produce different text types
- Scoring rubric for each writing task is 0-3 (SCRs) and 0-4 (ECRs)
- Some graphic support provided to aid production

Table 4: Test Standards for Each Domain (grade clusters 3-5, 6-8, and 9-12)

Listening:
1. Comprehend spoken instructions
2. Determine main idea/purpose
3. Identify important supporting ideas
4. Comprehend key vocabulary/phrases
5. Draw inferences, predictions, conclusions
6. Determine speaker's attitude/perspective

Speaking:
1. Connect
2. Tell
3. Expand
4. Reason

Reading:
1. Demonstrate pre-/early reading skills
2. Determine main idea/purpose
3. Identify important supporting ideas
4. Comprehend written instructions
5. Comprehend key vocabulary/phrases
6. Draw inferences, predictions, conclusions
7. Determine writer's attitude/perspective
8. Analyze style/form

Writing:
1. Revising and editing
2. Planning and organizing
3. Writing a draft text: narrative, descriptive, expository, persuasive

ELDA Validity Research Agenda

Working together, the CCSSO LEP-SCASS technical committee, the Center for the Study of Assessment Validity and Evaluation (C-SAVE), and AIR developed a research agenda for ELDA. CCSSO ensured that the agenda would be established and implemented from the beginning of the ELDA project by requiring it as part of test design and development and by including C-SAVE and a project evaluator in the project. The validity research agenda is built around three sets of questions about making interpretations and decisions based on the ELD assessments:

- Do ELDA scores and proficiency levels reflect students' actual proficiency in English as defined by the proficiency level descriptions and by teacher and expert judgment?
- Do increases in test scores within and across assessment grade clusters represent growth in English language proficiency?
- Do the test scores support appropriate decisions to reclassify LEP students as "formerly LEP"?

II. OVERVIEW OF FIELD TEST VALIDITY STUDIES CONDUCTED BY C-SAVE

C-SAVE performed two general types of analyses: item-level and test-level analyses. Within each broad category, several analyses were done to provide various forms of evidence. An overview of these specific analyses is provided below, and the results of each are discussed in the corresponding results section.

A. Overview of Item Analyses

Below is an overview of the primary item-level analyses conducted by C-SAVE. The purpose of our item analyses was to assist CCSSO and AIR in selecting items from the pool of field-tested items to form valid ELDA test forms. These analyses supplement AIR's more traditional item analyses, which focused on scoring keys and rubrics, item difficulty, biserial and point-biserial discrimination indices, and DIF analyses.
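As a concrete illustration of one of the traditional indices mentioned above, the sketch below computes a corrected point-biserial discrimination index: the correlation between an item's 0/1 score and a student's total on the remaining items. This is not AIR's code, and the scored data and function name are hypothetical; it only shows the standard form of the computation.

```python
import numpy as np

def point_biserial(responses):
    """Corrected item-total (point-biserial) discrimination for each item:
    the Pearson correlation between the 0/1 item score and the total score
    on the *remaining* items, so the item does not correlate with itself."""
    responses = np.asarray(responses, dtype=float)
    totals = responses.sum(axis=1)
    indices = []
    for j in range(responses.shape[1]):
        rest = totals - responses[:, j]  # criterion excludes item j
        indices.append(np.corrcoef(responses[:, j], rest)[0, 1])
    return np.array(indices)

# Hypothetical scored data: 6 students x 3 items (1 = correct, 0 = incorrect)
scored = [[1, 1, 0],
          [1, 0, 0],
          [1, 1, 1],
          [0, 0, 0],
          [1, 1, 1],
          [0, 0, 1]]
print(point_biserial(scored))
```

Items with low or negative values on such an index are flagged for review, since they fail to separate high- and low-scoring students.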
Explanations of the analyses are illustrated with examples of results from Reading Form A for grade cluster 3-5. As will be noted below, all item-level analyses focus on five levels, because the construction of ELDA is predicated on the assumption that five levels of language proficiency can be differentiated in each domain: Beginners, Lower Intermediate, Upper Intermediate, Advanced, and Full English Proficiency.

1. Evaluation of Items Using Latent Class Analyses and Other Ratings

a. Contributing Analyses

Latent Class Analyses. The main purpose of the ELDA field test was to evaluate the initial pool of test items. A different collection of items was constructed for each domain (reading, writing, speaking, and listening) and each grade cluster (3-5, 6-8, 9-12) of students. Each collection was assigned to one of two test forms (A, B). The field test data set included item responses for every item for each student, together with collateral data on every student. We performed the latent class analyses using the WINMIRA program; five-class models were fit to the item response data from each field test form. For each domain and grade cluster form, we estimated the proportion correct on each item in every latent class. Table 1 displays these proportions correct by class for 13 items from Reading Form A. One would expect the proportion correct to increase with latent class; however, this increase will generally not be uniform. For example, item #12 in Table 1 shows large jumps between classes A and B and between B and C, and then relatively small increases after that. The significance of this non-uniformity is discussed in subsequent sections of this document.

Table 1. Latent Class Analysis: Proportion Correct by Latent Class

Item Order  Item ID  Class A  Class B  Class C  Class D  Class E
 1          2593     0.51     0.87     0.96     0.98     1.00
 2          2594     0.62     0.98     1.00     0.98     1.00
 3          2595     0.63     0.98     1.00     0.99     1.00
 4          2294     0.45     0.78     0.88     0.93     0.97
 5          2295     0.35     0.61     0.85     0.95     0.97
 6          2296     0.22     0.48     0.76     0.83     0.93
 7          2599     0.39     0.86     0.97     0.98     0.99
 8          2600     0.25     0.73     0.94     0.97     0.99
 9          2601     0.41     0.62     0.87     0.94     0.98
10          2602     0.51     0.84     0.98     0.99     0.98
11          2603     0.28     0.65     0.91     0.98     0.99
12          2604     0.26     0.64     0.93     0.96     0.98
13          2280     0.16     0.20     0.33     0.50     0.61

Teacher Ratings of Student Proficiency. The field test data collection included teacher assessment of each student's language proficiency in reading, writing, speaking, and listening, on a 5-point developmental scale in each domain. For each domain, we used these data to group students by level and calculated the proportion correct on each item in every form. Table 2 displays these proportions correct by level for the same 13 items. As one would expect, the proportion correct increases with proficiency level in a fashion similar to that seen in the latent class analysis.

Table 2. Proportion Correct by Teacher-Rated Student Proficiency Level (PR)

Item Order  Item ID  PR 1   PR 2   PR 3   PR 4   PR 5
 1          2593     0.55   0.87   0.95   0.98   1.00
 2          2594     0.68   0.96   0.98   0.99   1.00
 3          2595     0.68   0.97   0.98   1.00   0.99
 4          2294     0.48   0.82   0.89   0.91   0.93
 5          2295     0.41   0.67   0.86   0.89   0.97
 6          2296     0.25   0.59   0.74   0.78   0.90
 7          2599     0.48   0.87   0.94   0.97   1.00
 8          2600     0.42   0.79   0.90   0.92   0.97
 9          2601     0.47   0.68   0.88   0.92   0.94
10          2602     0.64   0.86   0.95   0.96   0.99
11          2603     0.36   0.77   0.87   0.93   0.97
12          2604     0.38   0.72   0.88   0.93   0.96
13          2280     0.18   0.25   0.36   0.43   0.58

Developmental Level Ratings of Items. C-SAVE was interested in producing an independent judgment of the primary performance level each item was supposed to target (or, similarly, the specific cut-point between two levels the item was focused upon). For this set of analyses, experts were trained and charged with characterizing each item by assigning to it the performance level designation (beginning, lower intermediate, and so on) that best identified the level of development at which the item was focused.
The assumption here was that each item was primarily targeted at one of the five developmental performance levels as defined by ELDA, and/or at the cut-point between two adjacent levels. It was further assumed that ELDA would have items across the range of these levels, setting it apart from other measures of English language proficiency. The experts who determined these developmental levels were C-SAVE staff members and consultants to C-SAVE and AIR with extensive expertise in ESOL and language testing. Table 3 shows the developmental level ratings for the 13 items previously identified in Table 1. According to the judgment of our experts, the first three items focus at the lowest developmental level; items 4, 5, 6, and 10 focus at the second developmental level; and so forth. Note that, although our illustrative sample only shows items through developmental level 4, the full pool includes items at all developmental levels.

Table 3. Developmental Level Ratings of Items

DL Rating  Item Order  Item ID
1           1          2593
1           2          2594
1           3          2595
2           4          2294
2           5          2295
2           6          2296
3           7          2599
3           8          2600
3           9          2601
2          10          2602
3          11          2603
3          12          2604
4          13          2280

b. Analyses of Gradients

Gradients in Latent Class Analyses. To evaluate the validity of items for discriminating among the ordered latent classes, we calculated the differences in proportion correct between adjacent classes. For example, in Table 1 the proportion correct for item #1 is 0.87 in class B and 0.51 in class A, yielding a difference, or gradient, of 0.36. In our example from grades 3-5 Reading (Table 4), the largest gradients for items 1-4, 7, and 8 are in the first column, indicating that performance on those items discriminates between latent classes A and B. Similarly, item 13 discriminates between classes C and D. Items 5, 6, 11, and 12 also discriminate, but that discrimination appears to be coarser. That is, those items differentiate the lowest-ability students from the highest-ability students, but they do not discriminate between two adjacent classes.

Table 4. Gradient in Proportion Correct by Latent Class

Item Order  Item ID  B-A    C-B    D-C    E-D
 1          2593     0.36   0.09   0.02   0.02
 2          2594     0.36   0.02  -0.02   0.02
 3          2595     0.35   0.02  -0.01   0.01
 4          2294     0.33   0.10   0.05   0.04
 5          2295     0.26   0.24   0.10   0.02
 6          2296     0.26   0.28   0.07   0.10
 7          2599     0.47   0.11   0.01   0.01
 8          2600     0.48   0.21   0.03   0.02
 9          2601     0.21   0.25   0.07   0.04
10          2602     0.33   0.14   0.01  -0.01
11          2603     0.37   0.26   0.07   0.01
12          2604     0.38   0.29   0.03   0.02
13          2280     0.04   0.13   0.17   0.11
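Once each student has been assigned to a latent class, the per-class proportions (as in Table 1) and the adjacent-class gradients (as in Table 4) are straightforward bookkeeping. The sketch below illustrates that bookkeeping step only; it is not the project's WINMIRA workflow, and the response matrix and class assignments are hypothetical.

```python
import numpy as np

def proportions_by_class(responses, classes, n_classes=5):
    """Proportion correct on each item within each latent class.
    responses: (students x items) 0/1 matrix; classes: a latent class
    label (0 .. n_classes-1) already assigned to each student.
    Returns an (items x classes) array, as laid out in Table 1."""
    responses = np.asarray(responses, dtype=float)
    classes = np.asarray(classes)
    return np.vstack([responses[classes == c].mean(axis=0)
                      for c in range(n_classes)]).T

def gradients(props):
    """Differences in proportion correct between adjacent classes
    (B-A, C-B, D-C, E-D), one row per item, as laid out in Table 4."""
    return np.diff(props, axis=1)

# Hypothetical data: 6 students, 2 items; students assigned to classes A..E (0..4)
resp = [[0, 0], [1, 0], [1, 0], [1, 1], [1, 1], [1, 1]]
cls  = [0, 1, 2, 3, 4, 4]
p = proportions_by_class(resp, cls)
print(p)             # proportion correct by item (rows) and class (columns)
print(gradients(p))  # an item's largest gradient marks where it discriminates
```

In this toy example the second item's only nonzero gradient falls in the D-C column, the same pattern by which item 13 above was judged to discriminate between classes C and D.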
