Perceptions and Expectations of Youth With Disabilities  (NLTS2)
NCSER 2007-3006
September 2007

Estimating Standard Errors

Each estimate reported in the data tables is accompanied by a standard error. A standard error acknowledges that any population estimate calculated from a sample only approximates the true value for the population. In 95 percent of samples, the true population value will fall within the range demarcated by the estimate plus or minus 1.96 times the standard error. For example, if an estimate for youth holding a particular view is 29 percent, with a standard error of 1.82 percentage points, one can be 95 percent confident that the true rate of holding the view in question for the population is between 25.4 percent and 32.6 percent.
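The arithmetic behind the example interval can be checked with a minimal sketch. The function name `confidence_interval_95` is hypothetical and is introduced here only for illustration; the inputs are the estimate and standard error quoted in the text.

```python
def confidence_interval_95(estimate, standard_error, z=1.96):
    """Return the (lower, upper) bounds of estimate +/- z * standard_error."""
    margin = z * standard_error
    return estimate - margin, estimate + margin


# Values from the example in the text: 29 percent with a standard error of 1.82.
lower, upper = confidence_interval_95(29.0, 1.82)
print(f"{lower:.1f} percent to {upper:.1f} percent")  # 25.4 percent to 32.6 percent
```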

Because the NLTS2 sample is both stratified and clustered, calculating standard errors by formula is not straightforward. Standard errors for means and proportions were estimated using pseudoreplication, a procedure that is widely used by the U.S. Census Bureau and other federal agencies involved in fielding complex surveys. To that end, a set of weights was developed for each of 32 balanced half-replicate subsamples. Each half-replicate involved selecting half of the total set of LEAs that provided contact information using a partial factorial balanced design (resulting in about half of the LEAs being selected within each stratum) and then weighting that half to represent the entire universe. The half-replicates were used to estimate the variance of a sample mean by (1) calculating the mean of the variable of interest on the full sample and each half-sample using the appropriate weights; (2) calculating the squares of the deviations of the half-sample estimate from the full sample estimate; and (3) adding the squared deviations and dividing by (n-1) where n is the number of half-replicates.
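The three steps described above can be sketched as follows. This is an illustration under stated assumptions, not the NLTS2 production code: the function names and the toy data are hypothetical, units outside a half-replicate are assumed to carry a replicate weight of zero, and only four replicates are shown rather than the 32 used for NLTS2.

```python
def weighted_mean(values, weights):
    """Weighted mean of values; units with zero weight do not contribute."""
    total_weight = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total_weight


def half_replicate_variance(values, full_weights, replicate_weights):
    """Estimate the variance of a weighted mean from half-replicate weights.

    replicate_weights holds one list of weights per half-replicate.
    """
    # Step 1: the mean on the full sample and on each half-sample.
    full_estimate = weighted_mean(values, full_weights)
    replicate_estimates = [weighted_mean(values, w) for w in replicate_weights]
    # Step 2: squared deviations of each half-sample estimate from the full estimate.
    squared_deviations = [(r - full_estimate) ** 2 for r in replicate_estimates]
    # Step 3: sum the squared deviations and divide by (n - 1), as in the text.
    n = len(replicate_weights)
    return sum(squared_deviations) / (n - 1)


# Hypothetical toy data: four units and four half-replicates.
values = [1.0, 2.0, 3.0, 4.0]
full_weights = [1.0, 1.0, 1.0, 1.0]
replicate_weights = [
    [2.0, 0.0, 2.0, 0.0],
    [0.0, 2.0, 0.0, 2.0],
    [2.0, 0.0, 0.0, 2.0],
    [0.0, 2.0, 2.0, 0.0],
]
variance = half_replicate_variance(values, full_weights, replicate_weights)
standard_error = variance ** 0.5
```

Doubling the weights of the selected half is one simple way to make a half-sample represent the entire universe; the actual NLTS2 replicate weights may have been constructed differently.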

Although pseudoreplication is less unwieldy than developing formulas for calculating standard errors, it is computationally intensive and is not easily implemented using the Statistical Analysis System (SAS), the analysis program used for NLTS2. In the past, it was possible to develop straightforward estimates of standard errors using the effective sample size.

When respondents are independent and identically distributed, the effective sample size for a weighted sample of N respondents can be approximated as

$$N_{\mathrm{eff}} = N \times \frac{E^2[W]}{E^2[W] + V[W]}$$

where $N_{\mathrm{eff}}$ is the effective sample size, $E^2[W]$ is the square of the arithmetic average of the weights, and $V[W]$ is the variance of the weights. For a variable $X$, the standard error of estimate can typically be approximated by $\sqrt{V[X] / N_{\mathrm{eff}}}$, where $V[X]$ is the weighted variance of $X$.
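As a rough illustration of the effective sample size approach, the sketch below assumes the standard form N_eff = N × E²[W] / (E²[W] + V[W]) and the approximation SE ≈ sqrt(V[X] / N_eff), with V[W] taken as the population variance of the weights (divisor N). The function names and example data are hypothetical.

```python
import math


def effective_sample_size(weights):
    """N_eff = N * E^2[W] / (E^2[W] + V[W]), with V[W] the population variance."""
    n = len(weights)
    mean_w = sum(weights) / n
    var_w = sum((w - mean_w) ** 2 for w in weights) / n
    return n * mean_w ** 2 / (mean_w ** 2 + var_w)


def weighted_variance(values, weights):
    """Weighted variance of values around their weighted mean."""
    total_weight = sum(weights)
    mean_x = sum(v * w for v, w in zip(values, weights)) / total_weight
    return sum(w * (v - mean_x) ** 2 for v, w in zip(values, weights)) / total_weight


def approx_standard_error(values, weights):
    """Approximate the standard error as sqrt(V[X] / N_eff)."""
    return math.sqrt(weighted_variance(values, weights) / effective_sample_size(weights))


# Equal weights leave the sample size unchanged; unequal weights shrink it.
print(effective_sample_size([1.0, 1.0, 1.0, 1.0]))  # 4.0
print(effective_sample_size([1.0, 3.0]))            # 1.6
```

With this definition of V[W], the expression is algebraically equal to the squared sum of the weights divided by the sum of the squared weights, so greater variation in the weights always lowers the effective sample size.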

Because of the complex NLTS2 sampling design, traditional variance estimates for weighted means are not accurate. One method for estimating the variances of weighted means is to apply pseudoreplication to the primary sampling units, but this method is computationally intensive. We therefore developed a computationally less intensive variance formula and tested it by calculating variance estimates with both pseudoreplication and the alternative formula for a variety of categorical and continuous Wave 1 variables. Overall, the formula yielded excellent average agreement, but there were instances of under- and overestimation, which could have been due to sampling variability in either variance estimate (i.e., the estimate obtained via pseudoreplication or the estimate obtained via the alternative formula). To be conservative (i.e., to avoid inadvertently underestimating the variance), we modified the alternative variance formula by incorporating a "safety factor": the formula-derived variance was multiplied by 1.25. The modified formula yielded estimates that slightly exceeded the pseudoreplication variance estimates for approximately 90 percent of the categorical and 90 percent of the continuous variables examined.
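Because the safety factor is applied to the variance, its effect on the standard error is the square root of 1.25, an increase of roughly 12 percent. The sketch below shows only that adjustment; the alternative variance formula itself is not reproduced in this section, so the input variance is a hypothetical value and the function name is introduced for illustration.

```python
import math

SAFETY_FACTOR = 1.25  # multiplier applied to the formula-derived variance


def conservative_standard_error(formula_variance, safety_factor=SAFETY_FACTOR):
    """Standard error after inflating the formula-derived variance."""
    return math.sqrt(safety_factor * formula_variance)


# Hypothetical formula-derived variance of 4.0 (unadjusted standard error of 2.0).
unadjusted = math.sqrt(4.0)
adjusted = conservative_standard_error(4.0)
inflation = adjusted / unadjusted  # equals sqrt(1.25), about 1.118
```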
