Children's social and moral development has long been a central goal of American schools (McClellan 1999). Through the Partnerships in Character Education Program (PCEP), located in the Office of Safe and Drug-Free Schools (OSDFS) in the U.S. Department of Education, the federal government has distributed up to about $25 million annually in grants to state and local education agencies for the design and implementation of character education programs. Conducted at the request of OSDFS and under the auspices of the Institute of Education Sciences (IES), the present study has three objectives: (1) to document the constructs measured in studies of a delimited group of character education programs; (2) to develop a framework for systematically describing and assessing measures of character education outcomes; and (3) to provide a resource for evaluators to help identify and select measures of the outcomes of character education programs.
Method
We selected programs for review so as to include programs that addressed the goals
of PCEP and that were diverse along key programmatic dimensions. We drew on three
primary sources: (1) the IES What Works Clearinghouse
(WWC) 2007 review of character education programs (WWC 2007); (2) research-driven
guides to character education developed by the What Works in Character Education
Project (WWCEP), a collaborative effort of the Center for Character and Citizenship
at the University of Missouri-St. Louis and the Character Education Partnership
(Berkowitz and Bier 2006a, 2006b); and (3) grantee reports from state and local
education agencies that received funds from PCEP between 2003 and 2007. From the
pool of 68 programs identified from these sources, we randomly selected 36 programs
for review after stratifying by source, grade level of focus, and whether the program
is comprehensive (that is, fully integrated into the life of a school) or modular
(that is, a stand-alone program). Randomly selecting the 36 programs helped ensure
that the subset analyzed reflected the diversity, in both measured and unmeasured
attributes, of the full set of 68 programs. We then systematically identified the studies
of each program, using PsycINFO and gray literature searches, and focused on those
studies that provided the greatest detail on outcome measurement. We then reviewed
these studies, and developed a classification system to group related outcome constructs
conceptually. This taxonomy, outlined in Table 2, was structured to organize outcomes
from broad conceptual categories to increasingly specific conceptual categories.
The broadest level distinguished between student-level outcomes and "other" level
outcomes, with this latter category including teacher, school, parent, and community
outcomes; the mid-range of specificity distinguished between student affective,
behavioral, and cognitive outcomes; and finer levels of specificity distinguished,
for example, between conceptual categories such as student knowledge and reasoning,
and prosocial and risk behaviors. In reviewing studies, we identified all reported
outcomes measured and classified them according to our taxonomy (Appendix B provides
a crosswalk between the taxonomy and the programs selected for the report); described
the measures used, including their psychometric properties; and provided citations
for the information on measures.
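The stratified random selection described above can be illustrated with a short sketch. It is not the study's actual procedure: the report does not say how many programs were drawn from each stratum, so proportional (largest-remainder) allocation is assumed here, and the program records, stratum labels, and the function name stratified_sample are hypothetical.

```python
import random
from collections import defaultdict


def stratified_sample(programs, strata_keys, n_total, seed=0):
    """Draw n_total programs, allocating draws to strata in proportion to size.

    Assumption: proportional (largest-remainder) allocation; the report
    does not describe its allocation rule.
    """
    rng = random.Random(seed)

    # Group programs by their combination of stratifying attributes.
    strata = defaultdict(list)
    for program in programs:
        strata[tuple(program[k] for k in strata_keys)].append(program)

    # Ideal (fractional) number of draws per stratum.
    total = len(programs)
    quotas = {key: n_total * len(members) / total
              for key, members in strata.items()}

    # Round down, then give leftover draws to the largest remainders.
    alloc = {key: int(q) for key, q in quotas.items()}
    shortfall = n_total - sum(alloc.values())
    by_remainder = sorted(quotas, key=lambda k: quotas[k] - alloc[k],
                          reverse=True)
    for key in by_remainder[:shortfall]:
        alloc[key] += 1

    # Simple random sample without replacement inside each stratum.
    sample = []
    for key, members in strata.items():
        sample.extend(rng.sample(members, alloc[key]))
    return sample


# Hypothetical pool of 68 programs with made-up stratum labels.
sources = ["WWC", "WWCEP", "PCEP"]
grades = ["elementary", "secondary"]
types = ["comprehensive", "modular"]
programs = [
    {"id": i, "source": sources[i % 3], "grade": grades[i % 2],
     "type": types[(i // 2) % 2]}
    for i in range(68)
]

selected = stratified_sample(programs, ["source", "grade", "type"], 36)
```

Stratifying before drawing keeps each combination of source, grade level, and program type represented roughly in proportion to its share of the pool, which a simple random draw of 36 would not guarantee.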
Key Findings
Research on the selected character education programs addresses a wide variety of
outcomes. Student-level outcomes are measured in studies of 34 of 36 programs, with
25 of 36 programs addressing one or more cognitive outcomes, 28 addressing one or
more affective outcomes, and 31 addressing one or more behavioral outcomes. Among
these student-level outcomes, those most often measured were academic content (measured
for 14 programs), prosocial dispositions and interpersonal strengths (each measured
for 11 programs), discipline issues and interpersonal competencies (each measured
for 13 programs), and substance use and intrapersonal competencies (each measured
for 11 programs). In terms of outcomes at other levels (that is, beyond the student),
research on 7 programs addressed teacher-level outcomes, 16 addressed school-level
outcomes, and 14 addressed parent/community-level outcomes. Staff morale, school
climate, and parent participation in school were the constructs measured most often
in these respective domains (for 6, 16, and 11 programs, respectively).
Measurement methods were also diverse. Appendix A provides detail, by program, on every measure used in the studies reviewed for this report. For each program, the appendix provides a brief description of the program, descriptions of each measure, and an indication of which outcome constructs from the taxonomy each measure addressed. As shown in these tables, researchers employed direct and indirect assessment, as well as surveys with reports by teachers, parents, and students. They reported outcomes on scales and for stand-alone items, as well as non-scaled measures, such as attendance or disciplinary infractions.
Table 3 summarizes information on all of the scaled measures included in the studies reviewed. For each measure, the table shows the name of the instrument, whether it was developed for the study or is an "off the shelf" measure, its source, the type of assessment (for example, direct assessment versus self-report), the domain it assessed (student [cognitive, affective, or behavioral] or "other"), and a rating of its reliability. Table 4 provides a crosswalk between the taxonomy outlined in Table 2 and the scaled measures identified in our review with reported reliability of .70 or greater. Our assessment of the characteristics of the scaled measures revealed two central themes:
Considerations When Using This Report
The evidence from studies of the programs sampled here suggests that character
education researchers should keep the following considerations in mind when using this
report's information on outcome measurement. First, the taxonomy presented here suggests
a diverse array of outcomes may be affected by character education programming.
Reference to a clear theory of how program elements are linked to specific outcomes
may help researchers to identify those outcomes that the program in question is
most likely to affect. In the absence of a clearly articulated theory, researchers
could "work backward" from the taxonomy presented here to assess the extent to which
each of the constructs is likely to be influenced by their intervention, selecting
for measurement those that seem most appropriate.
Second, given the complexity of "character" as a construct, it could be beneficial for researchers to select or develop measures with demonstrated reliability and validity. While the measures presented here are not necessarily representative of the universe of research on character education programs, nor are they necessarily the best measures available, this report provides information on a variety of outcome measures with demonstrated psychometric properties. Related to this, the field of character education could benefit from more consistent reporting on the psychometric properties of outcome measures. Studies provided insufficient information to assess measures' reliability in the case of 33 of 95 scaled measures identified here. Consistent reporting of measures' psychometric properties would support comparison of outcomes across programs and populations and potentially improve our understanding of effective character education practices.
Finally, the findings of this report highlight the importance of alignment between the conceptualization and measurement of outcomes. Our review revealed two ways in which measurement methods demonstrate a potential lack of such alignment: (1) there may be misalignment between items in a particular scale (they do not "hang together"); and (2) there may be a mismatch between the domain or construct a measure actually captures and the domain or construct the researcher conceptualizes or reports. Clear conceptualization of constructs and alignment with measures may be supported by reference to the outcome taxonomy and related measures presented here.
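The .70 reliability benchmark and the notion of items that "hang together" can be made concrete with an internal-consistency coefficient. The sketch below computes Cronbach's alpha; this is an assumption, since this summary does not name the reliability coefficient that the reviewed studies reported. The response data and the function name cronbach_alpha are hypothetical.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of item scores.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of totals)
    """
    k = len(responses[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")

    # Transpose rows (respondents) into columns (items).
    items = list(zip(*responses))
    item_variance_sum = sum(variance(column) for column in items)

    # Variance of each respondent's total scale score.
    total_variance = variance([sum(row) for row in responses])

    return k / (k - 1) * (1 - item_variance_sum / total_variance)


# Hypothetical data: five respondents answering a three-item scale.
responses = [
    [4, 5, 4],
    [2, 2, 3],
    [5, 5, 5],
    [1, 2, 1],
    [3, 3, 4],
]

alpha = cronbach_alpha(responses)
meets_benchmark = alpha >= 0.70  # the .70 cutoff used for Table 4
```

Alpha rises when respondents who score high on one item also score high on the others, so a low value is one signal of the within-scale misalignment described above. It does not address the second kind of misalignment, where a reliable scale captures a different construct than the one the researcher reports.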