4.1 The CLT (Formally)
Recall the concept of a sampling distribution from chapter 2. For every randomly selected sample (i.e., a subset of the population), you can calculate a sample mean. If you were to repeatedly collect random samples and record their sample means, you could construct a sampling distribution of those sample means. Looking at the frequency of values (i.e., a frequency distribution) would give you an idea of where the mean of the next sample you randomly draw is likely to fall. The statistical properties of this sampling distribution are what make this educated guessing possible.
So here is the CLT formally…
The central limit theorem states that if you have a population with mean \(\mu\) and standard deviation \(\sigma\) and take sufficiently large random samples of size \(n\) from the population with replacement, then the distribution of the sample means will be approximately normally distributed.
There are some finer details to note.
Given the population parameters \(\mu\) and \(\sigma\), the resulting sampling distribution will be a normal distribution with mean \(\mu\) and standard deviation \(\sigma / \sqrt{n}\).
This holds regardless of whether the source population is normal, provided the sample size is sufficiently large (usually \(n > 30\)).
If the population distribution is normal, then the theorem holds true even for samples smaller than 30.
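These finer details can be checked with a short simulation. The sketch below (names and parameters are illustrative, not from the text) repeatedly draws samples of size \(n = 50\) from a decidedly non-normal population, an exponential distribution with \(\mu = \sigma = 1\), and confirms that the resulting sample means cluster around \(\mu\) with spread close to \(\sigma / \sqrt{n}\).

```python
import random
import statistics

random.seed(0)

# Exponential population with rate 1: mu = 1.0, sigma = 1.0 (skewed, not normal)
mu, sigma = 1.0, 1.0
n = 50                # sample size (> 30, so the CLT should apply)
num_samples = 20_000  # number of repeated samples drawn from the population

# Build the sampling distribution of the sample mean
sample_means = [
    statistics.fmean(random.expovariate(1.0) for _ in range(n))
    for _ in range(num_samples)
]

# The CLT predicts: mean of sample means ~ mu,
# standard deviation of sample means ~ sigma / sqrt(n) ~ 0.141
print(statistics.fmean(sample_means))
print(statistics.stdev(sample_means))
```

Increasing \(n\) shrinks the spread of the sample means by a factor of \(\sqrt{n}\), which is exactly the \(\sigma / \sqrt{n}\) behavior described above.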
This means that we can use the normal probability distribution to quantify uncertainty when making inferences about a population mean based on the sample mean.
Now, the CLT can be proven rigorously, but I think it's better to illustrate it with a couple of examples.