Central Limit Theorem

The central limit theorem states that the sample mean of a large number of independent, identically distributed random variables (with finite variance) is approximately normally distributed, that is, it follows a Gaussian.

Let Y be a random variable with mean μ and standard deviation σ, and let Y₁, Y₂, …, Y_n be n independent random variables that each have the same distribution as Y. The sample mean Ȳ is the sum of all n random variables divided by n:

$$\bar{Y} = \frac{Y_1 + Y_2 + \cdots + Y_n}{n}$$

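and the mean and standard deviation of Ȳ are

$$\mu_{\bar{Y}} = \mu, \qquad \sigma_{\bar{Y}} = \frac{\sigma}{\sqrt{n}}$$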
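Both formulas follow from linearity of expectation and, for the variance, from independence (which lets the variances of the Y_i simply add):

$$
E[\bar{Y}] = \frac{1}{n}\sum_{i=1}^{n} E[Y_i] = \frac{n\mu}{n} = \mu,
\qquad
\operatorname{Var}(\bar{Y}) = \frac{1}{n^2}\sum_{i=1}^{n} \operatorname{Var}(Y_i) = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n}
$$

Taking the square root gives σ_Ȳ = σ/√n, so the spread of the sample mean shrinks as n grows.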

The central limit theorem states that as n gets large, the distribution of Ȳ becomes approximately normal, no matter what the distribution of Y is (as long as its variance is finite)!

Hard to believe, isn't it?

Let's test it!

Draw a distribution in the box below, and this program will calculate the distribution of the sample mean Ȳ for n independent copies of the distribution Y you drew. If the central limit theorem is true, then no matter what you draw, the distribution of the sample mean will get closer and closer to a normal distribution as n gets large!

This program calculates the distribution of Ȳ using the formula given above. For each value y, the probability p(Ȳ = y) is the sum of the probabilities of every combination of values of Y₁, …, Y_n whose sum, divided by n, equals y. Because the Y_i are independent, the probability of each combination is the product of the individual probabilities.
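To make the brute-force idea concrete, here is a minimal Python sketch. It assumes Y takes integer values with exact probabilities given as a dictionary; the function name `sample_mean_pmf` is hypothetical, and this is not the original program, which works on an array of floating-point numbers. Because it enumerates every combination, it takes mⁿ steps for m possible values, so it is only practical for small n.

```python
import itertools
from collections import defaultdict
from fractions import Fraction


def sample_mean_pmf(pmf, n):
    """Exact distribution of the mean of n independent copies of a discrete Y.

    pmf maps each integer value of Y to its probability (a Fraction).
    Every combination (y1, ..., yn) is enumerated; by independence its
    probability is the product of the individual probabilities, and it is
    added to the bucket for its mean (y1 + ... + yn) / n.
    """
    result = defaultdict(Fraction)
    for combo in itertools.product(list(pmf), repeat=n):
        p = Fraction(1)
        for y in combo:
            p *= pmf[y]
        result[Fraction(sum(combo), n)] += p
    return dict(result)


# Hypothetical example: Y is 0 or 1 with equal probability (a fair coin).
coin = {0: Fraction(1, 2), 1: Fraction(1, 2)}
print(sample_mean_pmf(coin, 2))
```

For the fair coin and n = 2, the possible means 0, 1/2 and 1 get probabilities 1/4, 1/2 and 1/4.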

This algorithm approximates the continuous distribution with a discrete one (an array of floating-point numbers). For efficiency, it only ever combines two distributions at a time: at each step, n is doubled by combining two independent copies of the previous Ȳ distribution. This works because addition is associative and division by n distributes over the sum: the mean of 2n variables is the average of the sample means of the first n and the last n. The result of combining the two halves is exactly the same as summing everything at once, but it is much cheaper to compute!
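A minimal sketch of the doubling idea, assuming NumPy and a distribution given as probabilities on the integer grid 0, 1, …, m − 1 (the function name and the grid are assumptions, not the original program's code). It tracks the distribution of the sum S_n = Y₁ + … + Y_n rather than the mean, because the distribution of a sum of two independent integer-valued variables is the convolution of their distributions; the mean is recovered at the end as S_n / n. Unlike the original, it lets the array grow instead of rescaling it to fit the box.

```python
import numpy as np


def sample_mean_by_doubling(pmf, doublings):
    """Distribution of the sample mean of n = 2**doublings copies of Y.

    pmf[k] is P(Y = k) for k = 0, 1, ..., m - 1. Each step convolves the
    current distribution of the sum with itself: the sum of 2n copies is
    the sum of two independent sums of n copies. Returns (values, probs),
    where values[k] = k / n is the sample mean for sum index k.
    """
    sum_pmf = np.asarray(pmf, dtype=float)
    n = 1
    for _ in range(doublings):
        # S_{2n} = S_n + S_n' with S_n, S_n' independent, so its pmf is
        # the full (non-truncated) convolution of the two pmfs.
        sum_pmf = np.convolve(sum_pmf, sum_pmf)
        n *= 2
    values = np.arange(len(sum_pmf)) / n  # sum index k -> mean k / n
    return values, sum_pmf


# Hypothetical lopsided Y: 0 with probability 0.7, 3 with probability 0.3,
# so mu = 0.9 and sigma**2 = 1.89. Four doublings give n = 16.
values, probs = sample_mean_by_doubling([0.7, 0.0, 0.0, 0.3], doublings=4)
mean = np.sum(values * probs)
var = np.sum((values - mean) ** 2 * probs)
print(mean, var)  # close to mu = 0.9 and sigma**2 / n = 1.89 / 16
```

Each step is a single convolution, which is far less work than enumerating all mⁿ combinations, and the computed mean and variance agree with μ_Ȳ = μ and σ_Ȳ² = σ²/n.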

(Note that the distributions are scaled to fit inside the box. They all have an area of 1 by definition.)