Developing and Validating the Next Generation of Leadership Evaluation Tools: Formative Assessment for High Stakes Accountability
Co-Principal Investigator: Carolyn Kelley
This project aims to develop and validate the Comprehensive Assessment of Leadership for Learning (CALL), a rubric-based online formative assessment system that middle and high schools can use to self-evaluate and to guide the development of critical leadership practices.
Project Activities: Activities fall into two main categories: the design, implementation, and iterative redesign of the CALL web-based system; and the studies proposed to establish CALL's validity. Researchers will have the primary responsibility for bringing together the ideas that guide rubric construction and validation study design; designers will be responsible for developing the web-based assessment system; and practitioners will contribute to describing tasks across quality dimensions to ensure feedback that provides clear guidance for leadership development and sensitivity to variations in school context.
Products: This measurement project will produce a fully developed and validated online formative assessment system to evaluate and guide the development of school leadership practices.
Setting: Reliability and validity studies will take place in middle and high schools in four urban school districts from across the country (Madison, WI; Racine, WI; El Paso, TX; and Fairfax County, VA) and in several smaller rural and suburban districts.
Population: Middle and high school principals who have completed the Wisconsin Master Educator Assessment Process (WMEAP) will participate in CALL development. For the validation studies, teams from participating middle and high schools will include the principal, an assistant principal for instruction, an assistant principal for discipline/dean, the department chair/lead teacher for English/language arts, the department chair/lead teacher for math, the leaders of the guidance/student services department, and six randomly selected teachers.
Intervention: CALL is a formative assessment, meaning it provides information crucial for modifying the thinking or behavior of the learner (i.e., principal) toward intended outcomes. CALL provides an online rubric that will allow teams of school leaders and teachers to assess themselves in terms of core leadership tasks and to receive feedback that will scaffold efforts to improve local practices. CALL will focus on leadership tasks rather than leadership roles in order to draw the focus of the assessment away from summative judgment of positional leaders and toward measuring and understanding the kinds of work necessary to improve student learning. The resulting CALL reports can then be used as planning documents to help schools determine which tasks will be necessary to improve leadership for learning and to assign who will be responsible for conducting these tasks.
The initial content for CALL will be provided by two prior rubric-based evaluation systems developed by the project's principal investigators: Richard Halverson's School Leadership Rubrics and Carolyn Kelley's Socio-Cognitive Leadership Rubrics. The School Leadership Rubrics focus on five central tasks of school leadership: maintaining a focus on learning; monitoring teaching and learning; building a nested learning community; acquiring and allocating resources; and maintaining a safe learning environment. The Socio-Cognitive Leadership Rubrics ask reflective questions about advancing equity and excellence in student learning, developing teacher capacity, managing and aligning resources, and building and engaging community.
Research Design and Methods: The CALL development model is guided by core concepts of collaborative design. Collaborative design processes involve teams of researchers, practitioners, and designers in efforts to build tools that can be better implemented in contexts of practice. In Phase One (development), the collaborative design teams will critically review the existing rubric sets to examine the task descriptions and articulations appropriate to middle and high school contexts and to suggest revisions. This phase consists of five studies: reviewing constructs, item selection, content validity, user testing, and item distribution analysis.
In Phase Two (validation), data will be collected from participating schools that use CALL. This evaluation effort will emphasize the collection of four types of evidence: evidence that different assessors agree on ratings of performance, evidence that the ratings are measuring the performance dimensions or constructs they are intended to measure, evidence that the assessment ratings are related to other indicators of school or leader performance, and evidence that implementation of the assessment is related to changes in leadership practice. The co-principal investigators will work to integrate new information arising from implementation back into the system design.
Data Analytic Strategy: Inter-rater agreement will be calculated at each school. Regression models will treat agreement levels as a function of school leadership demographic information and school characteristics so that coefficients may be compared across schools.
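The per-school agreement calculation can be sketched as follows. The project description does not name an agreement statistic, so this example assumes the single-item within-group agreement index r_wg (observed rating variance compared with the variance of a uniform response distribution), with negative values clipped to zero. The school labels, the ratings, and the 5-point scale are hypothetical.

```python
from statistics import variance


def rwg(ratings, scale_points):
    """Single-item within-group agreement index r_wg (an assumed statistic).

    Compares the observed sample variance of the raters' scores with the
    variance expected if raters answered uniformly at random on a scale
    with `scale_points` options: (A^2 - 1) / 12.
    """
    if len(ratings) < 2:
        raise ValueError("need at least two raters")
    expected_var = (scale_points ** 2 - 1) / 12.0
    observed_var = variance(ratings)
    # Values below zero (more disagreement than chance) are clipped to 0.
    return max(0.0, 1.0 - observed_var / expected_var)


# Hypothetical ratings of one rubric item by four team members per school.
school_ratings = {
    "school_a": [4, 4, 4, 4],  # complete agreement
    "school_b": [1, 5, 1, 5],  # polarized team
    "school_c": [3, 4, 3, 4],  # minor disagreement
}

agreement = {name: rwg(r, scale_points=5) for name, r in school_ratings.items()}
for name, value in agreement.items():
    print(f"{name}: r_wg = {value:.3f}")
```

The resulting per-school values are the kind of outcome that a regression on school and leadership characteristics could then take as its dependent variable.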
To assess the relationship between CALL ratings and student achievement, researchers will compile or develop value-added measures of school average student achievement, then correlate these with CALL ratings (including ratings of the dimensions and an average across the dimensions). Similarly, to assess the relationship between CALL ratings and school climate, CALL ratings will be correlated with school climate survey data. If school districts conduct their own summative evaluations, these will also be analyzed to examine correlation with CALL ratings.
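A minimal sketch of the correlation step, assuming a Pearson product-moment correlation (the description says only "correlate") and one row per school. The CALL averages and value-added scores below are invented for illustration.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


# Hypothetical school-level data: average CALL rating across dimensions
# and a value-added estimate of school average student achievement.
call_average = [2.0, 3.0, 4.0, 5.0]
value_added = [-0.2, 0.0, 0.1, 0.3]

r_achievement = pearson_r(call_average, value_added)
print(f"CALL average vs. value-added: r = {r_achievement:.3f}")
```

The same function could be applied to school climate survey scores or district summative evaluation scores in place of `value_added`.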
Construct validity will be analyzed by calculating the intercorrelations between the various dimensions measured by CALL. If the assessment dimensions represent a set of distinct but related constructs, the dimension ratings will be correlated with each other, but not so highly that one dimension rating is a nearly perfect predictor of another. (Near-perfect prediction would indicate that the ratings are not measuring distinct constructs.) Researchers will also conduct hierarchical confirmatory factor analyses of both the beginning-of-year and end-of-year CALL ratings. To assess consequential validity, researchers will collect evidence to determine whether leadership teams using CALL focus their efforts on behaviors and performances emphasized as needing improvement by the formative assessment tools.
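The intercorrelation check can be illustrated with a short sketch that computes every pairwise correlation between dimensions and flags pairs that are close to perfectly correlated. The 0.9 cutoff, the dictionary keys, and the ratings are assumptions made for the example; the description sets no numeric threshold.

```python
from itertools import combinations
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def flag_redundant(dimension_ratings, threshold=0.9):
    """Return (dim_a, dim_b, r) for pairs with |r| at or above the threshold."""
    flagged = []
    for (name_a, xs), (name_b, ys) in combinations(dimension_ratings.items(), 2):
        r = pearson_r(xs, ys)
        if abs(r) >= threshold:
            flagged.append((name_a, name_b, round(r, 3)))
    return flagged


# Hypothetical dimension ratings for five schools (one value per school).
dimension_ratings = {
    "focus_on_learning": [2, 3, 4, 5, 3],
    "monitoring": [2, 3, 4, 5, 3],        # duplicates the first dimension
    "safe_environment": [4, 2, 5, 3, 3],  # weakly related to the others
}

flagged = flag_redundant(dimension_ratings)
print(flagged)
```

Pairs returned by `flag_redundant` would be candidates for closer review, since ratings that move together this closely may not reflect distinct constructs.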
Project Websites: https://www.leadershipforlearning.org/ (measure) and https://web.education.wisc.edu/halverson/?page_id=164 (research proposal and rubrics)
Journal article, monograph, or newsletter
Blitz, M. H., and Modeste, M. (2015). The Differences Across Distributed Leadership Practices by School Position According to the Comprehensive Assessment of Leadership for Learning (CALL). Leadership and Policy in Schools, 14(3): 341–379.
Blitz, M., Salisbury, J. and Kelley, C. (2014). The Role of Cognitive Validity Testing in the Development of CALL, the Comprehensive Assessment of Leadership for Learning. Journal of Educational Administration, 52(3): 358–378.
Halverson, R., Kelley, C., and Shaw, J. (2014). A CALL for Improved School Leadership. Phi Delta Kappan, 95(6): 57–60.
Kelley, C., and Halverson, R. (2012). The Comprehensive Assessment of Leadership for Learning: A Next Generation Formative Evaluation and Feedback System. Journal of Applied Research on Children: Informing Policy for Children at Risk, 3(2).
Blitz, M. (2012). A Case Study Comparison of School Leadership Practice Against the Comprehensive Assessment of Leadership for Learning (CALL) Pilot Results (WCER No. 2012–5). Madison, WI: University of Wisconsin–Madison Working Paper.
Blitz, M. H., and Salisbury, S. (2012). The Role of Cognitive Validity Testing to Understand Leadership Practice in the Development of CALL, the Comprehensive Assessment of Leadership for Learning (WCER No. 2012–11). Madison, WI: University of Wisconsin–Madison Working Paper.
Goff, P., Salisbury, J., and Blitz, M. (2015). Comparing CALL and VAL-ED: An Illustrative Application of a Decision Matrix for Selecting Among Leadership Feedback Instruments (WCER Working Paper No. 2015–5). University of Wisconsin–Madison, Wisconsin Center for Education Research.