IES Grant

Title: English Learners and Science Tests
Center: NCER
Year: 2011
Principal Investigator: Noble, Tracy
Awardee: Technical Education Research Centers, Inc. (TERC)
Program: English Learners
Award Period: 4 years
Award Amount: $1,610,874
Goal: Measurement
Award Number: R305A110122

Co-Principal Investigator: Ann Rosebery

Purpose: The use of students' scores on science achievement tests to make consequential decisions, such as grade promotion, rests on the assumption that a student's score accurately reflects his or her knowledge of science. However, if a student is an English language learner (ELL) and the test includes complex uses of English, it is difficult to determine whether a low score means that the student does not know the science being tested or does not know the language. This project will identify linguistic aspects of multiple-choice assessment items in science that create undue difficulty for ELLs and suggest ways to modify the items to reduce factors that unfairly impede the performance of ELLs.

Project Activities: The research team will review items from the 5th grade Massachusetts Comprehensive Assessment System (MCAS) to study linguistic aspects of multiple-choice items in science that may impede the valid measurement of science knowledge among ELLs. The team will create and test modified items that avoid linguistic features, such as lexical complexity, syntactic complexity, discourse complexity, unfamiliar context, and atypical perspective, that may interfere with ELLs' ability to answer an assessment item accurately. In addition, they will examine other assessments for the prevalence of the identified linguistic features. Finally, they intend to create guidance for school and district managers, as well as teachers, on how to avoid the use of assessment items with linguistic features that may result in invalid measures of science knowledge among ELLs.

Products: Products will include published papers and reports on aspects of science assessment items that result in invalid measurement of ELLs, and guidelines for coding linguistic features of science assessment items. Two handbooks will also be developed: a handbook for state and district personnel that describes linguistic aspects of science assessment items that should be avoided, and a handbook for teachers that explains how to help ELLs cope with poorly written assessment items.

Structured Abstract

Setting: This study will take place in Massachusetts, New Jersey, and Utah.

Population: Participants will include approximately 75,000 fifth graders in Massachusetts in 2009. The study to try out new assessment items will include a total of 1,080 fifth-grade students from five urban school districts in Massachusetts. Studies to confirm the identified linguistic features will include a large sample of students from across the United States, as well as specialized studies that include all fourth-grade students in New Jersey and all fifth-grade students in Utah.

Intervention: The research team will develop a coding system for characterizing the linguistic complexity of multiple-choice science items, and identify linguistic features associated with differential item functioning (DIF) for ELLs. The role of these linguistic features in creating difficulty for ELLs will be tested by administering assessment items that have been modified to remove the identified features and comparing item difficulty for ELL and non-ELL students.

Research Design and Methods: Researchers will begin by conducting a literature review to find linguistic features of items that have been identified in similar studies as contributing to invalid measurement for ELLs. Examples of such features include lexical complexity, syntactic complexity, discourse complexity, unfamiliar context, and atypical perspective. Three coders will independently review 175 multiple-choice items from the 5th grade MCAS tests administered between 2004 and 2010 for the presence of these features. The relationship between the identified features and differential item functioning will be studied by comparing the performance of ELLs and non-ELLs who took the MCAS in 2009. The research team will then conduct an empirical test of the relationship between item modification and ELL status by creating three test forms with both modified and non-modified items and administering them to 180 ELLs and 180 non-ELLs. In addition, researchers will investigate how often the identified item features occur, and how they relate to ELL status, on the MCAS science tests in 8th and 10th grade; on elementary-level assessments in New Jersey and Utah; and on the 4th grade NAEP in science.

Control Condition: There is no control condition.

Key Measures: The key measures used in this project are the MCAS science assessments in 5th, 8th, and 10th grade. The 4th grade NAEP assessments in science, the 4th grade New Jersey Assessment of Skills and Knowledge, and the 5th grade Utah Core Criterion-Referenced Test will also be used.

Data Analytic Strategy: A total feature score will be computed for each item by first putting the individual feature codes on a common scale and then combining them. The test of identified linguistic features will use logistic regression modeling to estimate differential item functioning (DIF). Items for which the coefficient predicting item passage is negative and statistically significant will be considered to exhibit DIF. Analyses will determine whether there are any statistically significant relationships between DIF levels and total feature scores, DIF levels and individual feature scores, and DIF levels and combinations of feature scores. Based on these findings, initial feature hypotheses will be revised to create a refined set of hypotheses about features that act as sources of construct-irrelevant difficulty for ELLs. The validity of the identified linguistic features will be tested by conducting an analysis of variance to separate the effects of modified versus non-modified items for ELLs and non-ELLs. To explore the effects of individual item modifications, researchers will conduct independent-samples t-tests to compare overall scores on the original versus modified forms of each item and a two-way analysis of variance to measure the effects of modifications to individual items across student groups. Similar analytic approaches will be used with Massachusetts standardized science tests in eighth and tenth grade, elementary school state science tests in New Jersey and Utah, and NAEP science data.
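
Illustration: The logistic regression approach to DIF described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the project's actual analysis. It assumes that the coefficient of interest is the one on ELL status, that an ability proxy (for example, a standardized total test score) serves as the matching variable, and that a Wald z-test with a 1.96 cutoff defines statistical significance. None of these details are given in the abstract. The function names (solve, fit_logistic, simulate, flag_dif) are hypothetical, and the data are simulated.

```python
import math
import random


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry to the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_logistic(rows, y, iters=25):
    """Fit a logistic regression by Newton-Raphson.

    Returns the coefficients and Wald standard errors taken from the
    information matrix computed in the final iteration.
    """
    k = len(rows[0])
    beta = [0.0] * k
    hess = [[0.0] * k for _ in range(k)]
    for _ in range(iters):
        grad = [0.0] * k
        hess = [[0.0] * k for _ in range(k)]
        for x, yi in zip(rows, y):
            z = sum(b * v for b, v in zip(beta, x))
            p = 1.0 / (1.0 + math.exp(-z))
            for i in range(k):
                grad[i] += (yi - p) * x[i]
                for j in range(k):
                    hess[i][j] += p * (1.0 - p) * x[i] * x[j]
        beta = [b + s for b, s in zip(beta, solve(hess, grad))]
    se = []
    for i in range(k):
        unit = [1.0 if j == i else 0.0 for j in range(k)]
        se.append(math.sqrt(solve(hess, unit)[i]))
    return beta, se


def simulate(n, dif, seed=0):
    """Simulate responses to one item (all values here are made up).

    Each row is [intercept, ability proxy, ELL indicator]; `dif` is the
    shift in log-odds of a correct answer for ELL students.
    """
    rng = random.Random(seed)
    rows, y = [], []
    for _ in range(n):
        ell = 1.0 if rng.random() < 0.5 else 0.0
        ability = rng.gauss(0.0, 1.0)
        p = 1.0 / (1.0 + math.exp(-(0.3 + 1.0 * ability + dif * ell)))
        rows.append([1.0, ability, ell])
        y.append(1 if rng.random() < p else 0)
    return rows, y


def flag_dif(rows, y, z_crit=1.96):
    """Flag the item if the ELL coefficient is negative and significant."""
    beta, se = fit_logistic(rows, y)
    z = beta[2] / se[2]
    return beta[2], z, (beta[2] < 0 and abs(z) > z_crit)


if __name__ == "__main__":
    rows, y = simulate(2000, dif=-0.8, seed=1)
    coef, z, flagged = flag_dif(rows, y)
    print(f"ELL coefficient={coef:.2f}, z={z:.2f}, flagged={flagged}")
```

Conditioning on the ability proxy is what separates DIF from an overall group difference in achievement: the item is flagged only when ELL students are less likely to answer it correctly than non-ELL students with the same proxy score.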

Publications for this project:

Kachchaf, R., Noble, T., Rosebery, A., O'Connor, C., Warren, B., & Wang, Y. (2016). A Closer Look at Linguistic Complexity: Pinpointing Individual Linguistic Features of Science Multiple-Choice Items Associated with English Language Learner Performance. Bilingual Research Journal, 39 (2), 152-166. doi: 10.1080/15235882.2016.1169455

Noble, T., Rosebery, A., Suarez, C., Warren, B., & O'Connor, M. C. (2014). Science Assessments and English Language Learners: Validity Evidence Based on Response Processes. Applied Measurement in Education, 27 (4), 248-260.

Noble, T., Rosebery, A., Kachchaf, R., & Suarez, C. (2015). A Handbook for Improving the Validity of Multiple Choice STE Items for English Learners. Unpublished manuscript. TERC, Cambridge, MA. Available at

Noble, T., Rosebery, A., Kachchaf, R., & Suarez, C. (2015). Lessons Learned and Implications for Practice from the English Learners and Science Tests Project. Unpublished manuscript. TERC, Cambridge, MA. Available at

Noble, T., Suarez, C., Rosebery, A., O'Connor, M. C., Warren, B., & Hudicourt-Barnes, J. (2012). "I Never Thought of It As Freezing": How Students Answer Questions on Large-Scale Science Tests and What They Know About Science. Journal of Research in Science Teaching, 49 (6), 778-803. doi: 10.1002/tea.21026