IES Grant

Title: Developing and Validating the Next Generation of Leadership Evaluation Tools: Formative Assessment for High Stakes Accountability
Center: NCER
Year: 2009
Principal Investigator: Halverson, Richard
Awardee: University of Wisconsin, Madison
Program: Education Leadership
Award Period: 9/1/09-8/31/13
Award Amount: $1,600,000
Type: Measurement
Award Number: R305A090265

Co-Principal Investigator(s): Kelley, Carolyn

Purpose: In this project, the researchers proposed to develop and validate the Comprehensive Assessment of Leadership for Learning (CALL), a rubric-based online formative assessment system for middle and high school leaders to self-evaluate and to guide the development of critical leadership practices.

Project Activities: The proposed activities fell into two main categories:

  • the design, implementation, and iterative redesign of the CALL web-based system
  • studies to establish the validity of CALL.

Researchers supplied the ideas that guided rubric construction and the design of the validation studies. System designers were responsible for developing the web-based assessment system. Practitioners helped describe tasks across quality dimensions so that the feedback would give clear guidance for leadership development and remain sensitive to variations in school context.

Structured Abstract


Setting: Reliability and validity studies will take place in middle and high schools in four urban school districts from across the country (Madison, WI; Racine, WI; El Paso, TX; and Fairfax County, VA) and in several smaller rural and suburban districts.

Population: Middle and high school principals who have completed the Wisconsin Master Educator Assessment Process (WMEAP) will participate in CALL development. For the validation studies, teams from participating middle and high schools will include the principal, an assistant principal for instruction, an assistant principal for discipline or dean, the department chair or lead teacher for English language arts, the department chair or lead teacher for math, the leaders of the guidance or student services department, and six randomly selected teachers.

Measure: CALL is a formative assessment, meaning it provides information crucial for modifying the thinking or behavior of the learner (e.g., principal) toward intended outcomes. CALL provides an online rubric that will allow teams of school leaders and teachers to assess themselves in terms of core leadership tasks and to receive feedback that will scaffold efforts to improve local practices. CALL will focus on leadership tasks rather than leadership roles in order to draw the focus of the assessment away from summative judgment of positional leaders and toward measuring and understanding the kinds of work necessary to improve student learning. The resulting CALL reports can then be used as planning documents to help schools determine which tasks will be necessary to improve leadership for learning and to assign who will be responsible for conducting these tasks.

The initial content for CALL will be provided by two prior rubric-based evaluation systems developed by the project's Principal Investigators: Richard Halverson's School Leadership Rubrics and Carolyn Kelley's Socio-Cognitive Leadership Rubrics. The School Leadership Rubrics focus on five central tasks of school leadership: maintaining a focus on learning, monitoring teaching and learning, building a nested learning community, acquiring and allocating resources, and maintaining a safe learning environment. The Socio-Cognitive Leadership Rubrics ask reflective questions about advancing equity and excellence in student learning, developing teacher capacity, managing and aligning resources, and building and engaging community.

Research Design and Methods: The CALL development model is guided by core concepts of collaborative design. Collaborative design processes involve teams of researchers, practitioners, and designers in efforts to build tools that can be better implemented in contexts of practice. In phase 1 (development), the collaborative design teams will critically review the existing rubric sets to examine the task descriptions and articulations appropriate to middle and high school contexts and to suggest revisions. This phase consists of five studies: reviewing constructs, item selection, content validity, user testing, and item distribution analysis.

In phase 2 (validation), data will be collected from participating schools that use CALL. This evaluation effort will emphasize the collection of four types of evidence: evidence that different assessors agree on ratings of performance, evidence that the ratings are measuring the performance dimensions or constructs they are intended to measure, evidence that the assessment ratings are related to other indicators of school or leader performance, and evidence that implementation of the assessment is related to changes in leadership practice. The principal investigators will work to integrate new information arising from implementation back into the system design.

Data Analytic Strategy: Inter-rater agreement will be calculated at each school. Regression models will treat agreement levels as a function of school leadership demographic information and school characteristics so that coefficients may be compared across schools.
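The abstract does not name the agreement statistic. As a minimal sketch, assuming agreement is summarized as the share of rater pairs whose ratings on an item match (within an optional tolerance), the school-level calculation could look like the following. The function name, the rater labels, and the ratings are hypothetical and serve only as illustration.

```python
from itertools import combinations


def pairwise_agreement(ratings, tolerance=0):
    """Return the share of (rater pair, item) comparisons that agree.

    `ratings` maps a rater label to a list of item ratings; every rater
    must have rated the same items in the same order. Two ratings agree
    when they differ by at most `tolerance` rubric points.
    """
    raters = list(ratings)
    n_items = len(ratings[raters[0]])
    if any(len(ratings[r]) != n_items for r in raters):
        raise ValueError("all raters must rate the same number of items")

    agree = 0
    total = 0
    for a, b in combinations(raters, 2):
        for i in range(n_items):
            total += 1
            if abs(ratings[a][i] - ratings[b][i]) <= tolerance:
                agree += 1
    return agree / total


# Made-up ratings from one school team on four rubric items.
school = {
    "principal": [4, 3, 5, 2],
    "assistant_principal": [4, 3, 4, 2],
    "teacher_1": [3, 3, 5, 2],
}

exact = pairwise_agreement(school)                 # 8 of 12 comparisons match
adjacent = pairwise_agreement(school, tolerance=1)  # all 12 are within one point
```

Computing one such value per school would yield the agreement levels that the regression models then treat as the outcome variable.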

To assess the relationship between CALL ratings and student achievement, researchers will compile or develop value-added measures of school average student achievement, then correlate these with CALL ratings (including ratings of the dimensions and an average across the dimensions). Similarly, to assess the relationship between CALL ratings and school climate, CALL ratings will be correlated with school climate survey data. If school districts conduct their own summative evaluations, these will also be analyzed to examine correlation with CALL ratings.
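The correlation step can be illustrated with a short sketch. It assumes a Pearson correlation computed across schools, which the abstract does not specify, and it uses invented numbers; the variable names are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    cross = sum(a * b for a, b in zip(dx, dy))
    ss_x = sum(a * a for a in dx)
    ss_y = sum(b * b for b in dy)
    return cross / math.sqrt(ss_x * ss_y)


# One value per school (illustrative data only, not project results).
call_average = [1.0, 2.0, 3.0, 4.0, 5.0]  # mean CALL rating across dimensions
value_added = [2.0, 1.0, 4.0, 3.0, 5.0]   # school value-added estimate

r = pearson_r(call_average, value_added)
print(round(r, 3))  # 0.8
```

The same function could be applied to each dimension rating in turn, or to school climate survey scores in place of the value-added estimates.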

Construct validity will be analyzed by calculating the intercorrelations among the dimensions measured by CALL. If the assessment dimensions represent a set of distinct but related constructs, the dimension ratings will be correlated with each other, but not so highly that one dimension rating is a nearly perfect predictor of another; near-perfect prediction would indicate that the ratings are not measuring distinct constructs. Researchers will also conduct hierarchical confirmatory factor analyses of both the beginning-of-year and end-of-year CALL ratings. To assess consequential validity, researchers will collect evidence to determine whether leadership teams using CALL focus their efforts on the behaviors and performances that the formative assessment identifies as needing improvement.
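The intercorrelation check on the dimension ratings can be sketched as follows. The 0.9 cutoff for "nearly perfect" is an assumption made for illustration, as are the dimension labels (loosely borrowed from the School Leadership Rubrics), the function names, and the data. The confirmatory factor analyses are not shown.

```python
import math
from itertools import combinations


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cross / math.sqrt(ss_x * ss_y)


def redundant_pairs(dimensions, cutoff=0.9):
    """List dimension pairs whose absolute correlation exceeds `cutoff`.

    `dimensions` maps a dimension label to its ratings, one per school.
    A flagged pair suggests the two dimensions may not be distinct.
    """
    flagged = []
    for a, b in combinations(dimensions, 2):
        if abs(pearson_r(dimensions[a], dimensions[b])) > cutoff:
            flagged.append((a, b))
    return flagged


# Invented ratings for five schools on three dimensions.
dimensions = {
    "focus_on_learning": [1.0, 2.0, 3.0, 4.0, 5.0],
    "monitoring_teaching": [2.0, 1.0, 4.0, 3.0, 5.0],
    "acquiring_resources": [2.0, 4.0, 6.0, 8.0, 10.0],
}

flagged = redundant_pairs(dimensions)
print(flagged)  # [('focus_on_learning', 'acquiring_resources')]
```

In this toy data, the first and third dimensions move in lockstep and are flagged, while the second is related to both (r = 0.8) but stays below the cutoff, which is the pattern expected of distinct but related constructs.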


Select Publications:


Book

Halverson, R., and Kelley, C. (2017). Mapping Leadership: The Tasks That Matter for Improving Teaching and Learning in Schools. John Wiley & Sons.

Journal articles

Blitz, M. H., and Modeste, M. (2015). The Differences Across Distributed Leadership Practices by School Position According to the Comprehensive Assessment of Leadership for Learning (CALL). Leadership and Policy in Schools, 14(3): 341–379.

Blitz, M., Salisbury, J., and Kelley, C. (2014). The Role of Cognitive Validity Testing in the Development of CALL, the Comprehensive Assessment of Leadership for Learning. Journal of Educational Administration, 52(3): 358–378.

Halverson, R., Kelley, C., and Shaw, J. (2014). A CALL for Improved School Leadership. Phi Delta Kappan, 95(6): 57–60.

Kelley, C., and Dikkers, S. (2016). Framing Feedback for School Improvement Around Distributed Leadership. Educational Administration Quarterly, 52(3): 392–422.

Kelley, C., and Halverson, R. (2012). The Comprehensive Assessment of Leadership for Learning: A Next Generation Formative Evaluation and Feedback System. Journal of Applied Research on Children: Informing Policy for Children at Risk, 3(2).

Min, S., Modeste, M. E., Salisbury, J., and Goff, P. T. (2016). Heeding the CALL (Comprehensive Assessment of Leadership for Learning): An Inquiry Into Instructional Collaboration Among School Professionals. Journal of Educational Administration, 54(2): 135–151.

Salisbury, J., Goff, P., and Blitz, M. (2019). Comparing CALL and VAL-ED: An Illustrative Application of a Decision Matrix for Leadership Feedback Instruments. Journal of School Leadership, 29(1): 84–112.