National Profile on Alternate Assessments Based on Alternate Achievement Standards:

NCSER 2009-3014
August 2009

C. Technical Quality1

NCLB requires that state assessments "be used for purposes for which such assessments are valid and reliable, and be consistent with relevant, nationally recognized professional and technical standards" (20 U.S.C. 6311 § 1111(b)(3)(C)(iii)). The U.S. Department of Education's (2004) Standards and Assessments Peer Review Guidance references the Standards for Educational and Psychological Testing (AERA/APA/NCME 1999) to delineate the elements of validity and reliability required for technical quality. The elements of validity and reliability reported here (e.g., scoring and reporting structures, test and item scores, purposes of the assessment, grade-level equating) were based on the elements included in the Standards and Assessments Peer Review Guidance.

Other dimensions of technical quality reported here include fairness/accessibility, consistent procedures for test implementation, and alignment. Also reported here is the use of "extended" content standards. Alternate assessments based on alternate achievement standards must be aligned with the content standards for the grade in which the student is enrolled, although the grade-level content may be reduced in complexity or modified to reflect prerequisite skills. States can adapt or "extend" these grade-level content standards to reflect instructional activities appropriate for students with the most significant cognitive disabilities (U.S. Department of Education 2005).

Who was involved in reviewing the technical characteristics of validity, reliability, alignment, and fairness of the alternate assessment? (C1, C2, C3, C4)

Four multiple-choice items asked about who was involved in reviewing the validity, reliability, alignment, and fairness of the alternate assessment. Multiple responses were possible for each (validity, reliability, alignment, and fairness), and figure C1/C2/C3/C4 displays a summary of responses across states. Responses for individual states are displayed in tables C1, C2, C3, and C4 in appendix B, NSAA Data Tables.

  • State special education staff – The involvement of state special education staff ranged from 43 percent of states (22 states) for reliability to 71 percent of states (36 states) for fairness.
  • State assessment staff – The involvement of state assessment staff ranged from 67 percent of states (34 states) for alignment to 82 percent of states (42 states) for fairness.
  • State instruction and curriculum staff – The involvement of state instruction and curriculum staff ranged from 24 percent of states (12 states) for reliability to 53 percent of states (27 states) for alignment.
  • Test vendor – The involvement of test vendors ranged from 45 percent of states (23 states) for alignment to 69 percent of states (35 states) for reliability.
  • Outside experts – The involvement of outside experts ranged from 67 percent of states (34 states) for fairness to 86 percent of states (44 states) for validity.
  • Special education teachers – The involvement of special education teachers ranged from 43 percent of states (22 states) for reliability to 90 percent of states (46 states) for alignment.
  • General education teachers – The involvement of general education teachers ranged from 27 percent of states (14 states) for reliability to 71 percent of states (36 states) for alignment.
  • Content specialists – The involvement of content specialists ranged from 24 percent of states (12 states) for reliability to 73 percent of states (37 states) for alignment.
  • School psychologists/counselors – The involvement of school psychologists and counselors ranged from 6 percent of states (3 states) for reliability to 14 percent of states (7 states) for alignment.
  • School/district/state administrators – The involvement of school/district/state administrators ranged from 22 percent of states (11 states) for reliability to 49 percent of states (25 states) for alignment.
  • Parents – The involvement of parents ranged from 18 percent of states (9 states) for reliability to 65 percent of states (33 states) for fairness.
  • Other – The involvement of other individuals ranged from 8 percent of states (4 states) for reliability to 14 percent of states (7 states) for validity.
  • State did not address fairness – Six percent of states (3 states) did not address fairness.

Did the state document the validity of the alternate assessment in terms of scoring and reporting structures consistent with the subdomain structures of its content standards? (C5)

This open-ended item asked whether the state had documented that the scoring and reporting structures reflected the knowledge and skills that students were expected to master, and it asked about the nature of the evidence provided. If the reading standards were divided into certain subdomains/areas/categories, then evidence of the scoring and reporting structures should be divided into the same subdomains/areas/categories. The following mutually exclusive response categories emerged during coding and are presented graphically in figure C5 and for individual states in table C5 in appendix B, NSAA Data Tables. (An illustrative sketch of subdomain-level score reporting follows the list below.)

  • Yes, with evidence provided to the research team – This response category was coded when the state provided evidence that the depth and breadth of the standards were reflected or built into the scoring and reporting structures. Documents provided evidence that student performance was reported at the subdomain level, not just by content area. In other words, the state produced the scores for subdomain categories (i.e., standards/ benchmarks/indicators), which were the same subdomain categories as those in the content standards. In cases where states provided evidence to the research team, it was in the form of scoring and reporting documents. An alignment study on its own would not be sufficient evidence to code this response category; rather, there must be evidence that the scoring and reporting was consistent with the subdomains of the content standards. Thirty-five percent of states (18 states) reported that they had documented this type of validity and provided specific information regarding the evidence.
  • Yes, but evidence was not provided to the research team – This response category was coded when the state claimed validity based on scoring and reporting structures, but the evidence was part of an internal, nonpublic report and was not available for examination by the research team. Six percent of states (3 states) reported that they had documented this type of validity but did not provide specific evidence.
  • No – The state did not claim or document the validity of the alternate assessment in terms of scoring and reporting structures consistent with the subdomain structures of its content standards. Fifty-seven percent of states (29 states) reported that they had not documented this type of validity, reflecting a majority of the states and the highest frequency reported.
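
For illustration only, the following minimal Python sketch shows the kind of subdomain-level score rollup described above: student item scores are aggregated into the same subdomain categories used in the content standards, so that reporting mirrors the structure of the standards. The item names, subdomain labels, and scores are entirely hypothetical and do not reflect any state's assessment.

  # Hypothetical mapping of assessment items to content-standard subdomains.
  item_to_subdomain = {
      "item_01": "Reading: Comprehension",
      "item_02": "Reading: Comprehension",
      "item_03": "Reading: Vocabulary",
      "item_04": "Reading: Vocabulary",
  }

  # Hypothetical rubric scores for one student.
  student_scores = {"item_01": 2, "item_02": 3, "item_03": 1, "item_04": 2}

  # Roll item scores up to the subdomain level so reporting matches the
  # subdomain structure of the content standards.
  subdomain_totals = {}
  for item, score in student_scores.items():
      subdomain = item_to_subdomain[item]
      subdomain_totals[subdomain] = subdomain_totals.get(subdomain, 0) + score

  for subdomain, total in sorted(subdomain_totals.items()):
      print(f"{subdomain}: {total}")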

Did the state document the validity of the alternate assessment in terms of test and item scores related to internal or external variables as intended? (C6)

This open-ended item asked whether the state had documented the validity of test and item scores based on analysis of the relationship of test and item scores to one another (internal validity) or to other measures (external validity) and the nature of the evidence provided. The following mutually exclusive response categories emerged during coding and are presented graphically in figure C6 and for individual states in table C6 in appendix B, NSAA Data Tables.

  • Yes, formal study conducted – This category was coded when the state reported that a formal study or expert panel review was conducted, and evidence of the study was provided to the research team. The evidence may have been part of an internal or external study and was reported publicly or provided to the research team. Forty-one percent of states (21 states) reported that they had documented this type of validity and provided specific evidence.
  • Yes, but evidence was not provided to the research team – This response category was coded when the state reported that an internal study had been conducted or a formal study was in progress. The evidence may have been part of a plan or a study that was under way, and/or the evidence was part of an internal, nonpublic report. These reports were not available for examination by the research team. Eight percent of states (4 states) reported that they had documented this type of validity but did not provide evidence.
  • Yes, but no formal study was conducted – This response category was coded when the state reported in an explanation or through anecdotes that validation occurred through a committee process or an internal review, but no formal study was conducted. In these cases, the type of evidence was reported in the state profile as a "committee process or internal review." Two percent of states (1 state) reported having documented this type of validity, but no formal study was reported.
  • No – The state did not document the validity of the alternate assessment in terms of test and item scores related to internal or external variables as intended. Forty-seven percent of states (24 states) reported that they had not documented this type of validity, reflecting the highest frequency reported.

What evidence supported the validity argument in terms of test and item scores related to internal or external variables as intended? (C7)

This open-ended item asked about the types of formal analyses reported when the state had conducted a formal validity study of the test and item scores related to internal or external variables (see C6, response category "yes, formal study conducted"). Evidence may have included other assessments, such as standardized measures appropriate for students with significant cognitive disabilities, that confirmed the results for similar students (external validity). Alternatively, the state may have provided statistical evidence that the test items correlated with the total score in a consistent way (internal validity). The following types of evidence emerged during coding. Multiple responses were possible and are presented graphically in figure C7 and for individual states in table C7 in appendix B, NSAA Data Tables. (A sketch illustrating both kinds of correlational evidence follows the list below.)

  • Correlational study indicating validity – Among the 21 states that provided evidence of a formal validity study of test and item scores related to internal or external variables as intended, 86 percent (18 states) reported that they used a correlational study, reflecting a majority of the states and the highest frequency reported.
    • Internal item-to-item analysis – Thirty-three percent of states (7 states) that provided formal validity study information used item-to-item analysis to support this type of validity.
    • Correlational analysis using external measures – Twenty-four percent of states (5 states) that provided formal validity study information reported using correlational analysis with external measures (e.g., teacher grades, the Academic Competence Evaluation Scales [ACES], or a different test).
  • Other type of analysis – Thirty-three percent of states (7 states) reported using another type of analysis or an analytic strategy/approach that was not detailed.
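
The following Python sketch illustrates, with synthetic data only, the two kinds of correlational evidence described above: corrected item-total correlations (internal evidence) and the correlation of total scores with an external measure (external evidence). It is a generic illustration of the technique, not a reproduction of any state's study.

  import numpy as np

  rng = np.random.default_rng(0)
  items = rng.integers(0, 4, size=(50, 6))             # 50 students x 6 items scored 0-3
  external = items.sum(axis=1) + rng.normal(0, 2, 50)  # e.g., a hypothetical teacher rating

  totals = items.sum(axis=1)
  for j in range(items.shape[1]):
      rest = totals - items[:, j]                       # total score excluding item j
      r = np.corrcoef(items[:, j], rest)[0, 1]
      print(f"item {j + 1}: corrected item-total r = {r:.2f}")

  # External evidence: correlation of the total score with an outside measure.
  print(f"total vs. external measure: r = {np.corrcoef(totals, external)[0, 1]:.2f}")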

Did the state document the validity of the alternate assessment in terms of purposes of the assessment, delineating the types of uses and decisions most appropriate and the assessment results consistent with the purposes? (C8)

This open-ended item asked whether the state had documented the consistency of purposes of the assessment with the decisions made based on assessment results and the nature of the evidence provided. The following mutually exclusive response categories emerged during coding and are presented graphically in figure C8 below and for individual states in table C8 in appendix B, NSAA Data Tables.

  • Yes, formal study conducted – This response category was coded when the state reported that a formal study or expert panel review was conducted. The evidence may have been part of either an internal or an external study, and the results were reported publicly or were provided to the research team. Thirty-three percent of states (17 states) reported that they had documented this type of validity and provided specific evidence, reflecting the highest frequency reported.
  • Yes, but evidence was not provided to the research team – This response category was coded when the state reported that an internal study had been conducted or a formal study was in progress, and/or the evidence was part of an internal, nonpublic report. These reports were not available for examination by the research team. Twenty percent of states (10 states) reported that they had documented this type of validity but did not provide specific evidence.
  • Yes, but no formal study was conducted – This response category was coded when the state reported that a validation was planned or under way and offered explanation or anecdotes that this type of validation had been done as part of a committee process, but no formal study was conducted. In these cases, the type of evidence was reported in the state profile as "anecdotal or committee process." Twenty-four percent of states (12 states) provided explanations or anecdotes related to this type of validity, but they had not conducted a formal study.
  • No – The state did not document the validity of the alternate assessment in terms of the purposes of the assessment. Twenty-four percent of states (12 states) reported that they had not documented this type of validity.

What evidence supported the validity argument in terms of purposes of the assessment, delineating the types of uses and decisions most appropriate and the assessment results consistent with the purposes? (C9)

This open-ended item asked about the types of formal analyses reported by the state when it had conducted a formal validity study on the consistency of the purposes and uses of the results of the assessment (see C8, response category "yes, formal study conducted"). The following response categories emerged during coding. Multiple responses were possible and are presented graphically in figure C9 and for individual states in table C9 in appendix B, NSAA Data Tables.

  • Survey – Of the 17 states that provided evidence of a formal validity study to examine the purposes of the assessments, types of uses, and decisions made, 47 percent (8 states) reported that they had used a survey about the relationship between the purposes of the assessments and decisions made. This percentage reflected the highest frequency reported.
  • Alignment study – Twenty-nine percent of states (5 states) reported that they had assessed this type of validity through alignment studies.
  • Field tests/pilot tests – Six percent of states (1 state) reported that they had conducted field tests.
  • Construct validity analysis – Forty-one percent of states (7 states) reported that they had performed construct validity analysis.
  • Analytic review of outcomes – Eighteen percent of states (3 states) reported that they had performed an analytic review of outcomes.
  • State monitoring/program review – Twelve percent of states (2 states) reported that they had assessed this type of validity through state monitoring or program review.

Did the state document the validity of the alternate assessment in terms of the assessment system's producing intended and unintended consequences? (C10)

This open-ended item asked whether the state documented the intended and/or unintended consequences of the assessment and the degree to which the determination of validity had been documented. The following mutually exclusive response categories emerged during coding and are presented graphically in figure C10 and for individual states in table C10 in appendix B, NSAA Data Tables.

  • Yes, formal study conducted – This response category was coded when the state reported that an internal or external study or expert panel review had been conducted, and the report was available publicly or provided to the research team. Forty-three percent of states (22 states) reported that they had documented this type of validity, reflecting the highest frequency reported.
  • Yes, but evidence was not provided to the research team – This response category was coded when the state reported that a plan or a study was under way, or the evidence was part of an internal, nonpublic report. These reports were not available for examination by the research team. Sixteen percent of states (8 states) reported that they had documented this type of validity but did not provide specific evidence.
  • Yes, but no formal study was conducted – This response category was coded when the state provided an explanation or anecdotes regarding a committee review process, but no formal study was conducted. In these cases, the type of evidence was reported in the state profile as "anecdotal or committee process." Four percent of states (2 states) reported that they had documented this type of validity but did not provide evidence from a formal study or evaluation.
  • No – This response category was coded when the state had not documented the validity of the alternate assessment in terms of the assessment system's producing intended and/or unintended consequences. Thirty-five percent of states (18 states) reported that they had not documented this type of validity.

What evidence supported the validity argument in terms of the assessment system's producing intended and unintended consequences? (C11)

This open-ended item described the types of evidence provided to document validity in terms of intended and/or unintended consequences (see C10, response category "yes, formal study conducted"). Evidence could include arguments or empirical evidence that demonstrated the direct or indirect consequences of taking the alternate assessment for students with significant cognitive disabilities, including those that were intended or unintended, positive or negative. Some items that were commonly addressed with this type of validity study were: Did the student learn more or less as a result of taking the assessment? Was an appropriate amount of preparation spent on the assessment? Did the assessment affect the student emotionally or functionally in some way? Did the assessment affect teacher understanding of the student's educational needs? Did the assessment change how teachers teach? The following response categories emerged during coding. Multiple responses were possible and are presented graphically in figure C11 and for individual states in table C11 in appendix B, NSAA Data Tables.

  • Survey – This response category was coded when the state provided studies using surveys of teachers, parents, or other school staff as evidence. Of the 22 states with a formal validity study on intended and unintended consequences, 73 percent (16 states) provided evidence of a survey of teachers, parents, or other school staff, reflecting a majority of the states and the highest frequency reported.
  • Public reports – This response category was coded when the state provided published reports or other accounts of the consequences of the assessment. Twenty-three percent of states (5 states) provided public reports, newspaper articles, or other published evidence of investigating the consequences of the assessment.
  • Other post hoc data collection/analysis – This response category was coded when the state provided evidence of other types of data collection and analysis. Forty-one percent of states (9 states) reported post hoc data collection and analysis as evidence of investigating the consequences of the assessment.

Did the state document the validity of the alternate assessment in terms of measurement of construct relevance? (C12)

This open-ended item asked whether the state had documented the construct relevance of the assessment (i.e., whether it measured the behavior or knowledge of interest, whether it measured only the standards and content appropriate to the age or grade of the assessed student and not information extraneous to the construct). Additionally, the item asked about the degree to which the determination of validity had been documented. The following mutually exclusive response categories emerged during coding and are presented graphically in figure C12 and for individual states in table C12 in appendix B, NSAA Data Tables.

  • Yes, formal study conducted – This response category was coded when the state reported that an internal or external study or expert panel review had been conducted, and the report was available publicly or provided to the research team. Fifty-nine percent of states (30 states) reported that they documented this type of validity, reflecting a majority of states and the highest frequency reported.
  • Yes, but evidence was not provided to the research team – This response category was coded when the state reported that a plan or a study was under way, or the evidence was part of an internal, nonpublic report. These reports were not available for examination by the research team. Twelve percent of states (6 states) reported that they documented this type of validity, but the evidence was part of a plan or a study that was under way at the time of the interview and not available.
  • Yes, but no formal study was conducted – This response category was coded when the state provided an explanation or anecdotes regarding a committee review process, but no formal study was conducted. In these cases, the type of evidence was reported in the state profile as "anecdotal or committee process." Eighteen percent of states (9 states) reported an explanation or anecdotal evidence regarding this type of validity, but no formal study was conducted.
  • No – This response category was coded when the state had not documented the validity of the alternate assessment in terms of construct relevance. Ten percent of states (5 states) reported that they had not documented this type of validity.

What evidence supported the validity argument in terms of measurement of construct relevance? (C13)

This open-ended item described the types of evidence provided to document the measurement of construct relevance (see C12, response category "yes, formal study conducted"). Evidence could include arguments or empirical evidence that demonstrated that the behavior or knowledge of interest was measured as intended. The following response categories emerged during coding. Multiple responses were possible and are presented graphically in figure C13 below and for individual states in table C13 in appendix B, NSAA Data Tables.

  • Statistical analyses – This response category was coded when the state provided evidence of conducting factor analysis, item-to-item analysis, and/or correlational studies across tests of similar constructs. Of the 30 states with a formal validity study of construct relevance, 43 percent (13 states) reported that statistical analyses including factor analysis, item-to-item analysis, or correlational studies across tests of similar constructs supported their validity argument.
  • Construct analyses – This response category was coded when the state provided evidence of alignment studies or other reviews by trained judges regarding the construct of the assessment. Eighty-three percent of states (25 states) reported that construct analyses including alignment studies and other reviews had been conducted, reflecting a majority of the states and the highest frequency reported.
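
As one illustration of the statistical analyses named above, the following Python sketch (synthetic data only, not drawn from any state's study) checks whether a single dominant factor appears to underlie a set of items by examining the eigenvalues of the inter-item correlation matrix, a simple way of probing construct relevance.

  import numpy as np

  rng = np.random.default_rng(1)
  ability = rng.normal(size=100)                        # one underlying construct
  items = np.column_stack([ability + rng.normal(0, 0.8, 100) for _ in range(8)])

  corr = np.corrcoef(items, rowvar=False)               # inter-item correlation matrix
  eigenvalues = np.linalg.eigvalsh(corr)[::-1]          # largest first
  print("eigenvalues:", np.round(eigenvalues, 2))
  print("share explained by first factor:", round(eigenvalues[0] / eigenvalues.sum(), 2))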

Did the state document the validity of the alternate assessment in terms of grade-level equating? (C14)

This open-ended item asked whether the state had documented the validity of the alternate assessment in terms of grade-level equating (i.e., the extent to which assessment items and tasks were calibrated within and across grade levels). Additionally, the item asked about the degree to which the determination of validity had been documented. The following mutually exclusive response categories emerged during coding and are presented graphically in figure C14 and for individual states in table C14 in appendix B, NSAA Data Tables.

  • Yes, formal study conducted – This response category was coded when the state reported that an internal or external study or expert panel review had been conducted, and the report was available publicly or provided to the research team. In these cases, the type of evidence was reported in the state profile as a "formal study or expert panel review." Eight percent of states (4 states) reported that they had documented this type of validity, and the evidence was part of the documentation reported publicly.
  • Yes, but evidence was not provided to the research team – This response category was coded when the state reported that a plan or a study was under way, or the evidence was part of an internal, nonpublic report. These reports were not available for examination by the research team. Ten percent of states (5 states) reported that they had documented this type of validity, but evidence was not provided to the research team.
  • Yes, but no formal study was conducted – This response category was coded when the state provided an explanation or anecdotes regarding a committee review process, but no formal study was conducted. In these cases, the type of evidence was reported in the state profile as "anecdotal or committee process." Eight percent of states (4 states) reported that they had evaluated this type of validity, but no formal studies were conducted.
  • No – This response category was coded when the state had not documented the validity of the alternate assessment in terms of grade-level equating. Fifty-seven percent of states (29 states) reported that they had not documented this type of validity, reflecting a majority of the states and the highest frequency reported.
  • Not appropriate for this type of assessment – This response category was coded when the state reported that grade-level equating was not appropriate for the type of assessment used and the assessment approach did not meet the assumptions needed to conduct this type of analysis. Eighteen percent of states (9 states) reported that this item was not applicable.
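
For readers unfamiliar with equating, the following Python sketch shows one simple linking approach, mean/sigma linear linking of scores from one grade-level form onto the scale of another, using synthetic score distributions. Operational equating typically relies on common items or IRT calibration; this is only a minimal illustration and does not represent any state's procedure.

  import numpy as np

  rng = np.random.default_rng(2)
  grade3_scores = rng.normal(20, 5, 200)   # form X (synthetic)
  grade4_scores = rng.normal(24, 6, 200)   # form Y, the target scale (synthetic)

  # Mean/sigma linking constants: Y = slope * X + intercept.
  slope = grade4_scores.std(ddof=1) / grade3_scores.std(ddof=1)
  intercept = grade4_scores.mean() - slope * grade3_scores.mean()

  x = 22.0                                 # a raw score on form X
  print(f"form X score {x} maps to {slope * x + intercept:.1f} on the form Y scale")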

Had the state content standards been extended or adapted to provide access for students with significant cognitive disabilities? (C15)

This item asked, at a summary level, whether the state had developed an augmented or supplementary list of "extended" standards that presented the appropriate level of challenge for students with significant cognitive disabilities to clarify the relationship between the academic content standards and the alternate assessment, and that allowed such students access to state curricular content. The information is presented graphically in figure C15 and for individual states in table C15 in appendix B, NSAA Data Tables.

  • Eighty-eight percent of states (45 states) reported that they extended or clarified content standards for students with significant cognitive disabilities, reflecting a majority of the states.

How did the extended content standards map to the state content standards? (C16)

This item asked about the extent of the linkage between the state's "extended" content standards and the state's general education grade-level content standards. This was an open-ended item, and the following response categories emerged during coding. Multiple response codes were possible and are presented graphically in figure C16 and for individual states in table C16 in appendix B, NSAA Data Tables.

  1. General link to state content standards – This response category was coded when the state reported that the extended content standards were linked at a general level with the state's content standards. This refers to the broad concept or first level in the hierarchy of the state's standards. For example, one state required that the content domains of language arts, mathematics, and science be addressed but did not link to content area strands or grade-level competencies within those content areas. Eighty-two percent of states (37 states) reported that the extended content standards linked to the state content standards at the broad concept level, reflecting a majority of states and the highest frequency reported.
  2. Grade or grade span – This response category was coded when the state reported that the state's standards had been expanded, defined, or redefined in terms of grade levels or grade spans to create the extended standards. For example, one state specified, within each content domain, the content strands that should be addressed in grades 3–8 and then in high school. Seventy-six percent of states (34 states) reported that the extended content standards linked to the state content standards by specific grades or by grade spans, reflecting a majority of states.
  3. Expanded benchmarks – This response category was coded when the state reported that its expanded standards provided greater specificity regarding the expectations for students with significant cognitive disabilities. These were downward extensions of the standards, which may have been referred to as expanded benchmarks, extended standards, essences, or dimensions. Expanded benchmarks might include information about the levels of complexity or depth of knowledge and describe the "essence" of standards or an "extension" to access points. For example, in one state the general 'indicator' "Demonstrate the ability to use a variety of strategies to derive meaning from texts and to read fluently" was expanded to "Understand how print is organized." Seventy-three percent of states (33 states) reported that the extended standards linked to the state standards through extended benchmarks, reflecting a majority of states.
  4. Alternate indicators or tasks – This response category was coded when the state reported that it had developed levels of specification that described activities, tasks, or how student performances might be structured, often referred to as performance indicators, indicator tasks, indicator activities, or alternate performance indicators (APIs). For example, in one state the learning standard "Identify and represent common fractions (1/2, 1/3, 1/4) as parts of wholes, parts of groups, and numbers on the number line" was linked to the following activities at different entry points: (1) Understand whole and half; (2) Manipulate objects to make two objects from one; (3) Manipulate whole objects to make two, three, or four parts of a whole; (4) Manipulate up to four parts of an object to assemble a whole; and (5) Identify and compare parts of a whole (quarters, thirds, halves) and determine relative size of each (1/2, 1/3, 1/4) using manipulatives. Forty-nine percent of states (22 states) reported that the extended content standards linked to the state content standards through alternate performance tasks or alternate performance indicators.

Did the state document the reliability of the alternate assessment in terms of variability across groups? (C17)

This open-ended item asked whether the state had documented the reliability of the alternate assessment in terms of differences in the performances of students in the various NCLB-defined groups (e.g., race/ethnicity, economically disadvantaged, limited English proficient). Additionally, the item asked about the degree to which the determination of reliability had been documented and reported. The following mutually exclusive response categories emerged during coding and are presented graphically in figure C17 and for individual states in table C17 in appendix B, NSAA Data Tables.

  • Yes, formal study conducted – This response category was coded when the state reported that an internal or external study or expert panel review had been conducted, and the report was available publicly or provided to the research team. Twenty-nine percent of states (15 states) reported that they had documented this type of reliability, and the report was available publicly.
  • Yes, but evidence was not provided to the research team – This response category was coded when the state reported that a plan or a study was under way, or the evidence was part of an internal, nonpublic report. These reports were not available for examination by the research team. Twelve percent of states (6 states) reported that they had documented this type of reliability, but evidence was not provided to the research team.
  • Yes, but no formal study was conducted – This response category was coded when the state provided an explanation or anecdotes regarding a committee review process, but no formal study was conducted. In these cases, the type of evidence was reported in the state profile as "anecdotal or committee process." Two percent of states (1 state) reported having documented this type of reliability, but no formal study was reported.
  • No – This response category was coded when the state had not documented the reliability of the alternate assessment in terms of variability across groups. Thirty-three percent of states (17 states) reported that they had not documented this type of reliability, reflecting the highest frequency reported.
  • Not appropriate for this type of assessment – This response category was coded when the state reported that analyzing the reliability in terms of the variability across groups was not appropriate for this type of assessment and the assessment approach did not meet the assumptions needed to conduct this type of analysis. Twenty-two percent of states (11 states) reported that this item was not applicable.

What evidence supported the reliability argument in terms of variability across groups? (C18)

This open-ended item described the types of evidence provided to support the reliability of the alternate assessment in terms of variability across groups (see C17, response category "yes, formal study conducted"). The following response categories emerged during coding. Multiple responses were possible and are presented graphically in figure C18 below and for individual states in table C18 in appendix B, NSAA Data Tables.

  • NCLB group statistical analyses conducted – This response category was coded when the state provided evidence of differential item functioning (DIF) analyses, consistency reliability, and/or test-retest reliability. Among the 15 states that provided evidence of a formal reliability study to test variability across groups, 93 percent (14 states) reported that they had used NCLB group statistical analyses, reflecting a majority of the states and the highest frequency reported.
  • Review of disability group results – This response category was coded when the state provided evidence of a published review by an expert panel or review group. Twenty-seven percent (4 states) reported that they had used a review of disability group results.
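
The following Python sketch illustrates, with synthetic data, the general form of a Mantel-Haenszel differential item functioning (DIF) analysis of the kind cited above: examinees are stratified by total score, and a common odds ratio compares the reference and focal groups' odds of answering an item correctly. It is a generic illustration, not any state's analysis.

  import numpy as np

  rng = np.random.default_rng(3)
  n = 400
  group = rng.integers(0, 2, n)            # 0 = reference group, 1 = focal group
  total = rng.integers(0, 21, n)           # matching variable (total test score)
  p_correct = 1 / (1 + np.exp(-(total - 10) / 3))
  item = (rng.random(n) < p_correct).astype(int)   # item response, no true DIF built in

  num, den = 0.0, 0.0
  for k in np.unique(total):               # one 2x2 table per score stratum
      in_k = total == k
      a = np.sum(in_k & (group == 0) & (item == 1))   # reference, correct
      b = np.sum(in_k & (group == 0) & (item == 0))   # reference, incorrect
      c = np.sum(in_k & (group == 1) & (item == 1))   # focal, correct
      d = np.sum(in_k & (group == 1) & (item == 0))   # focal, incorrect
      n_k = a + b + c + d
      if n_k > 0:
          num += a * d / n_k
          den += b * c / n_k

  alpha_mh = num / den                     # common odds ratio; near 1.0 suggests little DIF
  print(f"MH common odds ratio: {alpha_mh:.2f}")
  print(f"MH D-DIF (ETS delta scale): {-2.35 * np.log(alpha_mh):.2f}")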

Did the state document the reliability of the alternate assessment in terms of internal consistency of item responses? (C19)

This open-ended item asked whether the state had documented that there was consistency between scores on particular groups of items and the total test score and that scores on one item were consistent with scores on other items that were measuring the same construct. These reliability test results should provide statistical evidence of item consistency; if the state reported having conducted a study, it should be of a statistical nature and statistical results should be evident. Additionally, the item asked about the degree to which the determination of reliability had been documented and reported. The following mutually exclusive response categories emerged during coding and are presented graphically in figure C19 and for individual states in table C19 in appendix B, NSAA Data Tables.

  • Yes, formal study conducted – This response category was coded when the state reported that an internal or external study or expert panel review had been conducted, and the report was available publicly or provided to the research team. In these cases, the type of evidence was reported in the state profile as "formal study or expert panel review." Forty-one percent of states (21 states) reported that they had documented this type of reliability, reflecting the highest frequency reported.
  • Yes, but evidence was not provided to the research team – This response category was coded when the state reported that a plan or a study was under way, or the evidence was part of an internal, nonpublic report. These reports were not available for examination by the research team. Eight percent of states (4 states) reported that they had documented this type of reliability, but the evidence was part of a plan or a study that was under way and was not available for review at the time of the interview, or the evidence was part of an internal, nonpublic report.
  • Yes, but no formal study was conducted – This response category was coded when the state provided an explanation or anecdotes regarding a committee review process, but no formal study was conducted. In these cases, the type of evidence was reported in the state profile as "anecdotal or committee process." No states fell into this response category.
  • No – This response category was coded when the state had not documented the reliability of the alternate assessment in terms of internal consistency of item responses. Thirty-one percent of states (16 states) did not document this type of reliability.
  • Not appropriate for this type of assessment – This response category was coded when the state reported that analyzing the reliability in terms of internal consistency of item responses would not be appropriate for this type of assessment and the assessment approach did not meet the assumptions needed to conduct this type of analysis. Twenty percent of states (10 states) reported that this item was not appropriate for this type of assessment.
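
As an illustration of the kind of statistical evidence described above, the following Python sketch computes coefficient alpha (Cronbach's alpha), a widely used index of internal consistency that relates the sum of item variances to the variance of the total score. The scores are synthetic and are not drawn from any state's documentation.

  import numpy as np

  rng = np.random.default_rng(4)
  ability = rng.normal(size=120)
  scores = np.column_stack([ability + rng.normal(0, 1.0, 120) for _ in range(10)])

  k = scores.shape[1]                               # number of items
  item_variances = scores.var(axis=0, ddof=1)
  total_variance = scores.sum(axis=1).var(ddof=1)
  alpha = (k / (k - 1)) * (1 - item_variances.sum() / total_variance)
  print(f"coefficient alpha = {alpha:.2f}")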

Did the state document the reliability of the alternate assessment in terms of interrater consistency in scoring? (C20)

This open-ended item asked whether the state had conducted statistical procedures to examine the consistency and reliability of scoring between and among scorers. Additionally, the item asked about the degree to which the determination of reliability had been documented and reported. The following mutually exclusive response categories emerged during coding and are presented graphically in figure C20 and for individual states in table C20 in appendix B, NSAA Data Tables.

  • Yes, formal study conducted – This response category was coded when the state reported that an internal or external study or expert panel review had been conducted, and the report was available publicly or provided to the research team. Seventy-five percent of states (38 states) reported that they had documented this type of reliability, and the evidence was part of documentation reported publicly, reflecting a majority of the states and the highest frequency reported.
  • Yes, but evidence was not provided to the research team – This response category was coded when the state reported that a plan or a study was under way, or the evidence was part of an internal, nonpublic report. These reports were not available for examination by the research team. Eight percent of states (4 states) reported that they had documented this type of reliability, but evidence was not provided to the research team.
  • Yes, but no formal study was conducted – This response category was coded when the state provided an explanation or anecdotes regarding a committee review process, but no formal study was conducted. In these cases, the type of evidence was reported in the state profile as "training documents or anecdotal." Eight percent of states (4 states) reported that they had documented this type of reliability, but no formal studies were reported.
  • No – This response category was coded when the state had not documented the reliability of the alternate assessment in terms of interrater consistency of scoring. Ten percent of states (5 states) reported that they had not documented this type of reliability.

What evidence supported the reliability argument in terms of interrater consistency in scoring? (C21)

This open-ended item described the types of evidence provided to support the reliability of the alternate assessment in terms of interrater consistency in scoring and to document that statistical procedures were used to examine the consistency of scoring between and among scorers (see C20, response category "yes, formal study conducted"). Evidence should demonstrate that the state analyzed the frequency with which scorers scored tests similarly, using interrater reliability analyses. The following response categories emerged during coding. Multiple responses were possible and are presented graphically in figure C21 below and for individual states in table C21 in appendix B, NSAA Data Tables.

  • Statistical analysis conducted as part of training – This response category was coded when states reported calculating correlation coefficients, agreement percentages, or other analysis of scoring done as part of scorer training. States may also have established interrater consistency cut points that scorers must meet to obtain scorer certification during training. Among the 38 states that provided evidence of a formal reliability study to test interrater consistency, 26 percent (10 states) reported that they used statistical analysis of scores during training.
  • Statistical analysis conducted as part of actual scoring – This response category was coded when states reported calculating correlation coefficients, agreement percentages, or other analysis of assessment or field test scores. Eighty-nine percent (34 states) reported that they used statistical analysis as part of the scoring of the alternate assessment or field tests, reflecting a majority of the states and the highest frequency reported.
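
The following Python sketch illustrates, with hypothetical rubric scores for a set of double-scored entries, two common interrater consistency indices referred to above: exact agreement and Cohen's kappa. It is a generic example rather than any state's procedure.

  import numpy as np

  # Hypothetical rubric scores assigned to the same 12 entries by two scorers.
  rater1 = np.array([0, 1, 2, 2, 3, 1, 0, 2, 3, 1, 2, 2])
  rater2 = np.array([0, 1, 2, 1, 3, 1, 0, 2, 3, 2, 2, 2])

  agreement = np.mean(rater1 == rater2)             # observed exact agreement

  # Chance-corrected agreement (Cohen's kappa).
  categories = np.union1d(rater1, rater2)
  p_e = sum(np.mean(rater1 == c) * np.mean(rater2 == c) for c in categories)
  kappa = (agreement - p_e) / (1 - p_e)

  print(f"exact agreement: {agreement:.0%}")
  print(f"Cohen's kappa: {kappa:.2f}")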

Had conditional standard errors of measurement (CSEMs) been reported for the alternate assessment? (C22)

This item asked whether the state had analyzed the standard errors of measurement (SEMs) or conditional standard errors of measurement (CSEMs). The following mutually exclusive response categories emerged during coding and are presented graphically in figure C22 below and for individual states in table C22 in appendix B, NSAA Data Tables.

  • Yes – Thirty-five percent of states (18 states) reported that SEM/CSEM calculation procedures had been conducted and reported.
  • No – Forty-five percent of states (23 states) reported that SEMs/CSEMs had not been calculated or reported, reflecting the highest frequency reported.
  • Not appropriate for this type of assessment – Eighteen percent of states (9 states) reported that calculations of SEM were not appropriate for this type of assessment.
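
As background for this item, the following Python sketch shows how an overall SEM is commonly estimated from the score standard deviation and a reliability coefficient, and how CSEMs pair score levels with level-specific error estimates. All values below are hypothetical.

  import math

  score_sd = 8.0         # hypothetical standard deviation of total scores
  reliability = 0.90     # hypothetical reliability coefficient (e.g., coefficient alpha)

  sem = score_sd * math.sqrt(1 - reliability)
  print(f"overall SEM = {sem:.2f} score points")

  # A CSEM table pairs score levels with level-specific error estimates
  # (hypothetical values; operational CSEMs come from IRT or strong true-score models).
  csem_by_score = {10: 3.1, 20: 2.4, 30: 2.0, 40: 2.6}
  for score, csem in sorted(csem_by_score.items()):
      print(f"score {score}: CSEM = {csem}")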

What was the initial process of aligning alternate achievement standards with the state content standards, and how was it validated? (C23)

This open-ended item asked about the processes and methodologies the state used to align its alternate achievement standards with state content standards, as well as how this alignment was validated. The following mutually exclusive response categories emerged during coding and are presented graphically in figure C23 and for individual states in table C23 in appendix B, NSAA Data Tables.

  • A formal alignment study was conducted – This response category was coded when the state reported that an internal or external alignment study had been conducted, and it was reported publicly or to the research team. Evidence may include one or more formal expert panel reviews or studies using methodologies such as Webb or LINKS. In these cases, the type of evidence was reported in the state profile as "formal study." Seventy-one percent of states (36 states) reported that the alternate achievement standards were aligned with the state content standards and that they had conducted a formal alignment study for validation, reflecting a majority of the states and the highest frequency reported.
  • Alignment was reported, but no formal study was conducted – This response category was coded when states provided an explanation or anecdotes about a committee process to establish alignment, but no formal study was conducted. In these cases, the type of evidence was reported in the state profile as "anecdotal or committee process." Twenty-four percent of states (12 states) reported that the alternate achievement standards were aligned with the state content standards but that they did not conduct a formal alignment study.
  • No alignment study was conducted – This response category was coded when the alternate achievement standards were not validated by an alignment study. Four percent of states (2 states) reported that they had not validated the alignment between the alternate achievement standards and the state content standards.
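
For illustration, the following Python sketch checks two criteria commonly examined in Webb-style alignment studies, categorical concurrence (enough items per standard) and depth-of-knowledge (DOK) consistency (a sufficient share of items at or above the DOK level of the standard), against a small set of hypothetical item codings. The thresholds shown are placeholders, not the values used in any particular study.

  # Hypothetical item codings from an alignment review.
  items = [
      {"standard": "Number Sense", "dok": 1},
      {"standard": "Number Sense", "dok": 2},
      {"standard": "Number Sense", "dok": 2},
      {"standard": "Geometry", "dok": 1},
      {"standard": "Geometry", "dok": 1},
  ]
  standard_dok = {"Number Sense": 2, "Geometry": 2}   # DOK level assigned to each standard

  MIN_ITEMS = 3          # placeholder categorical concurrence threshold
  MIN_DOK_SHARE = 0.5    # placeholder share of items at or above the standard's DOK

  for standard, dok in standard_dok.items():
      matched = [it for it in items if it["standard"] == standard]
      at_or_above = sum(1 for it in matched if it["dok"] >= dok)
      concurrence = len(matched) >= MIN_ITEMS
      dok_ok = bool(matched) and at_or_above / len(matched) >= MIN_DOK_SHARE
      print(f"{standard}: items={len(matched)}, concurrence={concurrence}, DOK consistency={dok_ok}")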

What ongoing procedures were used to maintain and improve alignment between the alternate assessment based on alternate achievement standards and state content standards over time? (C24)

This item asked for the types of procedures the state used to monitor the alignment of the alternate assessment with state content standards and ensure that future alignment studies would be conducted. This was a multiple-choice item, and multiple responses were possible and are presented graphically in figure C24 and for individual states in table C24 in appendix B, NSAA Data Tables.

  • Internal alignment studies – This response category was coded when alignment studies were conducted by state assessment staff. Twenty-nine percent of states (15 states) reported that they used internal alignment studies to maintain and improve alignment between the alternate assessment and state content standards over time.
  • External alignment studies – This response category was coded when outside experts conducted alignment studies. Fifty-nine percent of states (30 states) reported that they used external alignment studies conducted by an independent evaluator outside of the state department of education to maintain and improve alignment between the alternate assessment and state content standards over time, reflecting a majority of the states and the highest frequency reported.
  • Other alignment studies – This response category was coded when an internal review was held or the details of the type of alignment study were not specified by the state. Twenty-two percent of states (11 states) reported that they used other kinds of alignment studies to maintain and improve alignment between the alternate assessment and state content standards over time.
  • No alignment studies conducted – This response category was coded when the alternate achievement standards were not validated on an ongoing basis. Fourteen percent of states (7 states) reported that no alignment procedures were used to maintain and improve alignment between the alternate assessment and state content standards over time.

Was there a process to ensure fairness in the development of the alternate assessment? (C25)

This open-ended item asked whether the state used a formal process (a statistical validation process, a committee review, etc.) to ensure that students' performance on the alternate assessment was not biased or influenced, for example, by native language, prior experience, gender, ethnicity, or disability. The following mutually exclusive response categories emerged during coding and are presented graphically in figure C25 below and for individual states in table C25 in appendix B, NSAA Data Tables.

  • Yes, bias review conducted systematically and regularly – This response category was coded when the state reported that either qualitative or statistical analyses were conducted (bias review) to ensure fairness in the assessment. Assessments may have been reviewed by technical committees and/or expert panels, and the results were reported either internally or externally on a regular basis. Thirty-three percent of states (17 states) reported that bias reviews were conducted systematically and regularly.
  • Yes, bias review not conducted regularly – This response category was coded when the state reported that bias review was conducted formally or informally, typically in conjunction with assessment development or revision, but not on a regular basis. Statistical evidence was sporadic and not necessarily available publicly. Thirty-seven percent of states (19 states) reported that formal or informal bias reviews were conducted periodically, typically in conjunction with test development or revision, reflecting the highest frequency reported.
  • No evidence of bias review – Twenty-seven percent of states (14 states) did not provide evidence of a process to ensure fairness in the development of the alternate assessment.

What evidence supported the process to ensure fairness in the development of the alternate assessment? (C26)

This open-ended item asked what types of evidence supported the process to ensure fairness in the development of the alternate assessment (see C25, response category "Yes, bias review conducted systematically and regularly"). The following response categories emerged during coding. Multiple responses were possible and are presented graphically in figure C26 below and for individual states in table C26 in appendix B, NSAA Data Tables.

  • Regularly scheduled bias review by experts – Of the 17 states that conducted bias reviews, 94 percent (16 states) reported that experts were used to conduct the bias reviews, reflecting a majority of the states and the highest frequency reported.
  • Statistical analyses – Forty-seven percent (8 states) reported using statistical analyses (e.g., differential item functioning [DIF] analysis).

Did the state document the validity of the alternate assessment in terms of implementation processes? (C27)

This open-ended item asked whether the state had documented the validity of the alternate assessment in terms of implementation processes. Implementation processes included how the state informed districts and schools about the assessment and assessment procedures and how test administrators were trained. Validation of these processes might have occurred through a variety of means, including training, guidelines, manuals, monitoring, and follow-up analyses. The following mutually exclusive response categories emerged during coding and are presented graphically in figure C27 and for individual states in table C27 in appendix B, NSAA Data Tables.

  • Yes, with evidence provided to the research team – This response category was coded when the state reported that formal studies or expert panel reviews were conducted on implementation processes, and evidence was part of documentation reported publicly or was provided to the research team. Seventy-six percent of states (39 states) reported that they had documented this type of validity and also provided evidence to support this assertion, reflecting a majority of the states and the highest frequency reported.
  • Yes, but evidence was not provided to the research team – This response category was coded when the state reported that this type of validation was planned or under way, or the evidence was part of an internal, nonpublic report of implementation processes. These reports were not available for examination by the research team. Four percent of states (2 states) reported that they had documented this type of validity, but did not provide evidence.
  • Yes, but no formal study was conducted – This response category was coded when the state reported in an explanation or through anecdotes that validation of implementation processes occurred as part of a committee process, but no formal study was conducted. In these cases, the type of evidence was reported in the state profile as "anecdotal or committee process." Four percent of states (2 states) provided an explanation related to this type of validity but did not conduct a formal study.
  • No – The state did not claim or document the validity of the alternate assessment in terms of the implementation processes. Fourteen percent of states (7 states) reported that they had not documented this type of validity.

What evidence supported the validity argument in terms of implementation processes? (C28)

This open-ended item asked about the types of evidence the state provided to support the validity of the alternate assessment in terms of implementation processes (see C27, response category "yes, with evidence provided to the research team"). The following response categories emerged during coding. Multiple responses were possible and are presented graphically in figure C28 below and for individual states in table C28 in appendix B, NSAA Data Tables.

  • Training – This response category was coded when the state reported that it had developed teaching tools that included in-person, video, or online training for administration, scoring, and reliability. Of the 39 states that provided evidence to support the validity argument in terms of the implementation processes, 82 percent (32 states) reported that they had developed training, reflecting a majority of the states and the highest frequency reported.
  • Administration manual/guide – This response category was coded when the state reported that it had developed manuals that provided directions, sample entries, protocols, and scoring rubrics. These manuals may have been available in hard copies or on websites. Seventy-four percent of states (29 states) reported that they had validated the implementation processes through publication of administrative manuals and guides, reflecting a majority of the states.
  • Monitoring – This response category was coded when the state reported that monitoring was conducted by the state agency, outside experts, citizen groups, or school-level administrators. These processes may have included sign-in verification by principals, test coordinators, or teachers. Forty-four percent of states (17 states) reported having validated the implementation of the alternate assessment through monitoring.
  • Post hoc data collection/analysis – This response category was coded when the state reported that reliability rescoring and examining of assessment results to determine fidelity were conducted. Forty-one percent of states (16 states) reported that they had conducted post hoc data collection/analysis on implementation processes.

1 For the technical quality variables reported here, when evidence was provided to the research team, the evidence was examined to describe and classify it. It was beyond the scope of this study to summarize the findings of the evidence or to evaluate its quality or rigor.