5.4 Determining Sample Size
It was previously stated that the sample size should always be as big as possible in order to deliver the most precise conclusions. This isn’t always a satisfactory answer, because collecting more observations may be possible but costly.
How big should \(n\) be?
The appropriate sample size can be determined by many constraints:
budget, time, … (things that cannot really be dealt with statistically)
acceptable sampling error (we can deal with this)
Recall our confidence interval equation:
\[\bar{X}-Z_{\frac{\alpha}{2}}\frac{\sigma}{\sqrt{n}} \leq \mu \leq \bar{X}+Z_{\frac{\alpha}{2}}\frac{\sigma}{\sqrt{n}}\]
or
\[\bar{X} \pm Z_{\frac{\alpha}{2}}\frac{\sigma}{\sqrt{n}}\]
The term \(Z_{\frac{\alpha}{2}}\frac{\sigma}{\sqrt{n}}\) is one-half the width of the confidence interval. This is called the sampling error (or margin of error).
\[e = Z_{\frac{\alpha}{2}}\frac{\sigma}{\sqrt{n}}\]
In our previous exercises, we were given a sample size \((n)\) and used our calculations to determine the width of the confidence interval \((2e)\). If we instead want to fix the margin of error, we can rearrange the above identity to determine how big our sample size needs to be.
\[n = \left( \frac{Z_{\frac{\alpha}{2}}\sigma}{e}\right)^2\]
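Written out step by step, the rearrangement starts from the margin of error, taken as a positive half-width, isolates \(\sqrt{n}\), and then squares both sides:

\[e = Z_{\frac{\alpha}{2}}\frac{\sigma}{\sqrt{n}} \quad\Rightarrow\quad \sqrt{n} = \frac{Z_{\frac{\alpha}{2}}\sigma}{e} \quad\Rightarrow\quad n = \left( \frac{Z_{\frac{\alpha}{2}}\sigma}{e}\right)^2\]

Because \(e\) appears squared in the denominator, cutting the margin of error in half requires four times as many observations.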
Going back to our call center example, suppose that quality control demanded a 95% confidence interval with a 15-second (0.25-minute) margin of error. This means that the 95% confidence interval can be at most 0.5 minutes wide. How many calls need to be in the sample?
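alpha = 0.05
Z = qnorm(alpha/2,lower.tail = FALSE)  # critical value for 95% confidence
Xbar = 5.8    # sample mean (not used in the sample size formula)
Sig = 2.815   # standard deviation, treated as known (minutes)
e = 0.25      # required margin of error (minutes)
(n = (Z*Sig/e)^2)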
## [1] 487.0493
# Round up since you can't have a fraction of an observation
ceiling(n)
## [1] 488
Our analysis indicates that if you want this particular margin of error, then you will need to collect a sample of 488 calls.
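To see how the required sample size responds to the margin of error, the calculation can be wrapped in a small function. This is only a sketch: `sample_size` and its argument names are hypothetical helpers introduced here, not part of the original example, and the standard deviation of 2.815 minutes is treated as known.

```r
# Hypothetical helper (not from the text): required sample size for a
# target margin of error e, treating sigma as known.
sample_size <- function(sigma, e, conf_level = 0.95) {
  alpha <- 1 - conf_level
  z <- qnorm(alpha / 2, lower.tail = FALSE)  # upper alpha/2 critical value
  ceiling((z * sigma / e)^2)                 # round up to a whole observation
}

# Call center standard deviation of 2.815 minutes, three candidate margins
margins <- c(0.5, 0.25, 0.125)
required_n <- sample_size(sigma = 2.815, e = margins)
data.frame(margin = margins, n = required_n)
```

With these inputs the function returns 122, 488, and 1949 calls for margins of 0.5, 0.25, and 0.125 minutes.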
You might have noticed that we did something slightly incorrect in the last exercise: we used a Z distribution and treated the sample standard deviation as if it were \(\sigma\). This is permissible only in applications like this one, where the goal is to determine a sample size. The reason is that a sample standard deviation depends on the sample in question, and that sample has not been collected yet. We therefore need to assume that the standard deviation is fixed when calculating the sample size (even though this isn’t the case). Once you determine a sample size, you collect a sample, calculate the sample standard deviation, and calculate the appropriate confidence interval. The margin of error should be reasonably close to what was required.
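The follow-up step can be sketched with simulated data standing in for a real sample of calls. Everything about the data here is an assumption made for illustration: the durations are drawn from a normal distribution with mean 5.8 and standard deviation 2.815 minutes, and the seed is arbitrary. The interval uses a t critical value because the standard deviation is now estimated from the sample.

```r
set.seed(1)  # arbitrary seed so the sketch is reproducible

n <- 488
# Simulated stand-in for real call durations (assumption: normal data with
# mean 5.8 and standard deviation 2.815 minutes)
calls <- rnorm(n, mean = 5.8, sd = 2.815)

s <- sd(calls)                                          # sample standard deviation
t_crit <- qt(0.05 / 2, df = n - 1, lower.tail = FALSE)  # t critical value, 95%
e_realized <- t_crit * s / sqrt(n)                      # realized margin of error

c(lower = mean(calls) - e_realized,
  upper = mean(calls) + e_realized,
  margin = e_realized)
```

The realized margin will not equal 0.25 exactly, since the sample standard deviation differs from the value assumed when planning, but with a sample this large it should land close to the target.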