This paper provides a guide to calculating statistical power for the complex multilevel designs used in most field studies in education research. In multilevel evaluation studies in education, it is important to account for the effect of clustering on the standard errors of treatment effect estimates. Using ideas from survey research, the paper explains how the sample design induces random variation in the quantities observed in a randomized experiment, and how this random variation relates to statistical power. It illustrates how statistical power depends on the intraclass correlations, the sample sizes at the various levels, the standardized average treatment effect (effect size), the multiple correlation between covariates and the outcome at different levels, and the heterogeneity of treatment effects across sampling units. Both hierarchical and randomized block designs are considered. The paper demonstrates that statistical power in complex designs involving clustered sampling can be computed simply from standard power tables using operational effect sizes: effect sizes multiplied by a design effect that depends on features of the complex experimental design. These concepts are applied to provide methods for computing power for each of the research designs most frequently used in education research.
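To make the idea of an operational effect size concrete, the following is a minimal sketch for one simple case only: a balanced two-level hierarchical design with treatment assigned at the cluster level and no covariates. It assumes the commonly cited design-effect form, in which the effect size is multiplied by sqrt(n / (1 + (n - 1) * rho)) and the number of clusters per arm plays the role of the per-group sample size. The function names and parameters (`delta`, `m`, `n`, `rho`) are hypothetical, and the power calculation substitutes a normal approximation for the noncentral t distribution that standard power tables are based on, so the values are approximate, particularly when the number of clusters is small.

```python
from math import sqrt
from statistics import NormalDist


def operational_effect_size(delta, n, rho):
    """Effect size times an assumed design effect for a two-level design.

    delta: standardized average treatment effect
    n:     number of individuals per cluster (balanced)
    rho:   intraclass correlation at the cluster level
    """
    return delta * sqrt(n / (1.0 + (n - 1) * rho))


def approx_power(delta, m, n, rho, alpha=0.05):
    """Approximate two-sided power with m clusters per treatment arm.

    Uses a normal approximation (an assumption of this sketch), treating
    the operational effect size as an ordinary effect size in a
    two-group comparison with m units per group.
    """
    std_normal = NormalDist()
    delta_op = operational_effect_size(delta, n, rho)
    noncentrality = delta_op * sqrt(m / 2.0)
    z_crit = std_normal.inv_cdf(1.0 - alpha / 2.0)
    # Probability of rejecting in either tail under the alternative.
    return (std_normal.cdf(noncentrality - z_crit)
            + std_normal.cdf(-noncentrality - z_crit))


if __name__ == "__main__":
    # Illustrative inputs only; these are not values taken from the paper.
    for rho in (0.0, 0.1, 0.2):
        power = approx_power(delta=0.25, m=20, n=25, rho=rho)
        print(f"rho={rho:.1f}  approximate power={power:.3f}")
```

In this sketch, holding the other inputs fixed, a larger intraclass correlation shrinks the operational effect size and therefore lowers power, while adding clusters raises power more directly than adding individuals within clusters.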