
Why divide by n-1 when computing the sample standard deviation?
A simulation demonstrating why the divisor is n - 1 rather than n.
You may choose the sample size and parent distribution variance. Pressing Reset registers these changes and resets the "trend" data.

 

The sample variance is obtained by dividing the sum of squares of deviations of each data point $x_i$ from the mean $\bar{x}$ by n - 1. To the beginning statistics student, it often makes more sense to divide by n, as this would seem to give an "average" squared deviation. This computation:*

$$\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2$$

is misleadingly referred to as the “population” variance in many contexts. It is called this because it correctly computes the variance when the n items make up an entire finite population.

 

However, if the n items are a sample from the population, then the sample variance is in fact:

$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$$

If n is large, both computations give similar results. If n is small, however, the computations differ noticeably. It can be shown that the latter computation (division by n - 1) yields an "unbiased" estimator of the population variance: its expected value over repeated samples equals the population variance.
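
The unbiasedness claim can be sketched in a few lines. The notation is an assumption of this sketch rather than something fixed by the page: the $x_i$ are independent draws from a parent population with mean $\mu$ and variance $\sigma^2$, $\bar{x}$ is the sample mean, and $s^2$ is the division-by-(n - 1) computation.

$$
\begin{aligned}
\sum_{i=1}^{n}(x_i-\bar{x})^2 &= \sum_{i=1}^{n}(x_i-\mu)^2 - n(\bar{x}-\mu)^2, \\
E\!\left[\sum_{i=1}^{n}(x_i-\mu)^2\right] &= n\sigma^2, \qquad
E\!\left[n(\bar{x}-\mu)^2\right] = n\cdot\frac{\sigma^2}{n} = \sigma^2, \\
E\!\left[\sum_{i=1}^{n}(x_i-\bar{x})^2\right] &= (n-1)\,\sigma^2 .
\end{aligned}
$$

Dividing by n - 1 therefore gives $E[s^2] = \sigma^2$, while dividing by n gives an expected value of $\frac{n-1}{n}\sigma^2$. The shortfall factor $\frac{n-1}{n}$ approaches 1 as n grows, which is why the two computations agree closely for large n.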

To see this result with the applet, choose a small value of n and generate repeated samples. You will note that, over many samples, the mean of the sample variances approaches the true variance of the parent population, whereas the mean of the so-called population variances settles at a smaller value.
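
The same experiment can be approximated in a few lines of Python. This is only a sketch under stated assumptions: the parent population is taken to be normal with mean 0 (the page does not say which distribution the applet draws from), and the function names `variance_estimates` and `mean_variance_estimates` are made up for illustration.

```python
import random


def variance_estimates(sample):
    """Return (divide-by-n, divide-by-(n-1)) variance computations for one sample."""
    n = len(sample)
    mean = sum(sample) / n
    sum_sq = sum((x - mean) ** 2 for x in sample)
    return sum_sq / n, sum_sq / (n - 1)


def mean_variance_estimates(n, true_variance, num_samples, seed=0):
    """Average both computations over many samples of size n.

    Assumption: the parent population is normal with mean 0.
    """
    rng = random.Random(seed)  # seeded so repeated runs give the same numbers
    sigma = true_variance ** 0.5
    total_by_n = 0.0
    total_by_n1 = 0.0
    for _ in range(num_samples):
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        by_n, by_n1 = variance_estimates(sample)
        total_by_n += by_n
        total_by_n1 += by_n1
    return total_by_n / num_samples, total_by_n1 / num_samples


if __name__ == "__main__":
    by_n, by_n1 = mean_variance_estimates(n=5, true_variance=4.0, num_samples=100_000)
    print(f"mean of divide-by-n variances:     {by_n:.3f}")
    print(f"mean of divide-by-(n-1) variances: {by_n1:.3f}")
```

With a small n such as 5, the first average should settle near (n - 1)/n times the true variance and the second near the true variance itself.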

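The sample standard deviation is, however, a biased estimator of the standard deviation of the parent population. Taking the square root of an unbiased variance estimate does not preserve unbiasedness, and on average the sample standard deviation underestimates the population standard deviation.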
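
A short simulation can make this bias visible. As before, this is a sketch rather than the applet's own method: it assumes a normal parent population with mean 0, and `mean_sample_sd` is a hypothetical helper name.

```python
import math
import random


def mean_sample_sd(n, sigma, num_samples, seed=0):
    """Average the sample standard deviation (n - 1 divisor) over many samples.

    Assumption: the parent population is normal with mean 0 and
    standard deviation sigma.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_samples):
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        mean = sum(sample) / n
        sum_sq = sum((x - mean) ** 2 for x in sample)
        total += math.sqrt(sum_sq / (n - 1))  # square root of the sample variance
    return total / num_samples


if __name__ == "__main__":
    sigma = 2.0
    avg_sd = mean_sample_sd(n=5, sigma=sigma, num_samples=100_000)
    print(f"true standard deviation:           {sigma:.3f}")
    print(f"mean of sample standard deviations: {avg_sd:.3f}")
```

For small n the average should come out visibly below the true standard deviation, and the gap should shrink as n is increased.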

*The division-by-n computation, for a sample of size n from a normally distributed population, is the “maximum likelihood” estimate of the population variance. As this applet demonstrates, it is a “biased” estimator of the population variance, but it can be shown to have lower mean squared error than the sample variance.
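
The trade-off in this footnote can also be checked numerically. The sketch below assumes a normal parent population with mean 0, and `mse_of_variance_estimators` is a hypothetical name chosen for this example. It estimates the mean squared error of both computations against the true variance.

```python
import random


def mse_of_variance_estimators(n, true_variance, num_samples, seed=0):
    """Estimate the mean squared error of the divide-by-n and
    divide-by-(n-1) variance computations.

    Assumption: the parent population is normal with mean 0.
    Returns (mse_by_n, mse_by_n1).
    """
    rng = random.Random(seed)
    sigma = true_variance ** 0.5
    sq_err_by_n = 0.0
    sq_err_by_n1 = 0.0
    for _ in range(num_samples):
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        mean = sum(sample) / n
        sum_sq = sum((x - mean) ** 2 for x in sample)
        # Squared distance of each estimate from the true variance.
        sq_err_by_n += (sum_sq / n - true_variance) ** 2
        sq_err_by_n1 += (sum_sq / (n - 1) - true_variance) ** 2
    return sq_err_by_n / num_samples, sq_err_by_n1 / num_samples


if __name__ == "__main__":
    mse_n, mse_n1 = mse_of_variance_estimators(n=5, true_variance=4.0, num_samples=100_000)
    print(f"MSE, divide by n:     {mse_n:.3f}")
    print(f"MSE, divide by n - 1: {mse_n1:.3f}")
```

Under these assumptions the divide-by-n figure should be the smaller of the two, even though that estimator is biased: its lower variability more than offsets the bias.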

