6.4 Conducting a hypothesis test (when $\sigma$ is unknown)

When the population standard deviation $(\sigma)$ is unknown, it must be estimated. Just as with confidence intervals, when you replace $\sigma$ with its estimate $S$, the distribution changes from Z to t (and you need to mind the degrees of freedom).

That’s the only difference.

Let’s go through some applications when $\sigma$ is unknown. You will see that the only difference is that we use a t distribution with $n-1$ degrees of freedom to calculate rejection / nonrejection regions and p-values.
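To get a feel for how much the switch from Z to t matters, the following sketch compares the two-sided 5% critical value from the standard normal with those from t distributions at a few degrees of freedom. The degrees of freedom listed are chosen only for illustration (11 corresponds to a sample of 12), and the names `dof`, `tcrits`, and `zcrit` are not from the text.

```r
# Two-sided critical values at alpha = 0.05
alpha = 0.05
zcrit = qnorm(alpha/2, lower.tail = FALSE)      # standard normal

# Illustrative degrees of freedom (11 corresponds to n = 12)
dof = c(5, 11, 30, 100)
tcrits = qt(alpha/2, dof, lower.tail = FALSE)   # t distribution, one value per df

# The t critical values exceed the Z value and shrink toward it as df grows
data.frame(dof, tcrits, zcrit)
```

With small samples the t critical value is noticeably larger than the Z value, so the nonrejection region is wider.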

Application 2

The Saxon Home Improvement Co. has had a mean per sales invoice of $120 over the last 5 years and would like to know if the mean amount per sales invoice has significantly changed. This is enough information to state our hypotheses for a two-sided test.⁸

\[H_0:\mu=120 \quad \text{versus} \quad H_1:\mu \neq 120\]

You collected a sample of 12 observations and found that the sample mean was $112.85 and the sample standard deviation was $20.80.

\[\bar{X}=112.85, \quad n=12, \quad S=20.80\]

This information allows us to calculate a t-test statistic under the null. The only difference is that we now have a sample standard deviation $(S)$ where we once had a population standard deviation $(\sigma)$.

Xbar = 112.85
n = 12
S = 20.80
mu = 120

(t = (Xbar - mu) / (S/sqrt(n)))

## [1] -1.190785

\[ t = \frac{\bar{X}-\mu}{\left(S / \sqrt{n} \right)}=\frac{112.85-120}{\left(20.80 / \sqrt{12} \right)}=-1.19\]

Now that we have our test statistic, we need to determine if it falls into our nonrejection or rejection regions. The important thing to realize is that these regions are now part of a t distribution with 11 $(n-1)$ degrees of freedom. If we consider 95% confidence…

alpha = 0.05
(tcrit = qt(alpha/2,n-1,lower.tail=FALSE))

## [1] 2.200985

The calculations suggest that the nonrejection region is between $\pm 2.2$. Since our test statistic ($-1.19$) falls within this region, we do not reject the null. In other words, at the 5% significance level (95% confidence), we do not have evidence that the population mean sales invoice has changed from $120.
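The decision rule can also be written out in code. The minimal sketch below reuses the numbers from this application. Because a two-sided test at level $\alpha$ and a $(1-\alpha)$ confidence interval use the same critical value, it also builds the interval around $\bar{X}$ and checks whether it contains 120. The names `tstat`, `reject`, `margin`, and `ci` are chosen here for illustration.

```r
# Summary statistics from the application
Xbar = 112.85
n = 12
S = 20.80
mu = 120
alpha = 0.05

# Test statistic and two-sided critical value with n-1 degrees of freedom
tstat = (Xbar - mu) / (S/sqrt(n))
tcrit = qt(alpha/2, n-1, lower.tail = FALSE)

# Reject only if the statistic lies outside +/- tcrit
(reject = abs(tstat) > tcrit)

# Equivalent check: does the 95% confidence interval contain mu?
margin = tcrit * S/sqrt(n)
(ci = c(Xbar - margin, Xbar + margin))
(ci[1] < mu & mu < ci[2])
```

The two checks always agree: the test fails to reject exactly when the hypothesized mean lies inside the interval.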

We could also calculate a p-value for the test:

# Two-sided p-value: double the tail area beyond |t|
(Pval = pt(-abs(t),n-1)*2)

## [1] 0.2588003

# Highest confidence level for rejection:
((1-Pval)*100)

## [1] 74.11997

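Notice that the p-value says that, if the null were true, a test statistic at least this far from zero would arise 25.88% of the time. Put differently, rejecting the null here would mean accepting a 25.88% chance of a Type I error (rejecting a true null), so we could only reject the null with 74.12% confidence.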
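One way to see where the 74.12% figure comes from: if the significance level is set equal to the p-value, the critical value lands exactly on the observed test statistic. The sketch below checks this with the numbers from this application. The names `tstat` and `tcrit_at_p` are illustrative.

```r
# Summary statistics from the application
Xbar = 112.85
n = 12
S = 20.80
mu = 120

tstat = (Xbar - mu) / (S/sqrt(n))

# Two-sided p-value: twice the tail area beyond |tstat|
Pval = 2 * pt(-abs(tstat), n-1)

# Critical value when alpha is set equal to the p-value
tcrit_at_p = qt(Pval/2, n-1, lower.tail = FALSE)

# These two numbers coincide
c(abs(tstat), tcrit_at_p)

# At alpha = 0.05 the p-value is too large to reject
Pval < 0.05
```

Any significance level below the p-value pushes the critical value beyond the test statistic, which is why the p-value and rejection-region approaches always give the same decision.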

Note that the calculations use a new R command: pt(q,df). This command calculates the probability under a t distribution the same way the pnorm(q) command calculates the probability under a standard normal distribution. In addition, I again encourage you to always visualize the distribution and explicitly draw the rejection and nonrejection regions. This is extremely helpful when first getting started. Below you will also see a note I wrote for a previous class reinforcing how R likes to calculate probabilities. It is for reference if needed.
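As a quick illustration of how pt parallels pnorm, the sketch below evaluates both at the same point. Both return the area to the left of q by default, and lower.tail=FALSE returns the area to the right instead. The value of q is simply the rounded test statistic from this application, and the names `dof`, `left_t`, `left_z`, and `right_t` are illustrative.

```r
q = -1.19    # rounded test statistic from the application
dof = 11     # n - 1 degrees of freedom

# Area to the left of q (the default for both commands)
(left_t = pt(q, dof))
(left_z = pnorm(q))

# Area to the right of q under the t distribution
(right_t = pt(q, dof, lower.tail = FALSE))

# Left and right areas sum to one
left_t + right_t

# By symmetry, the area to the right of -q equals the area to the left of q
pt(-q, dof, lower.tail = FALSE)
```

The left-tail area under the t distribution is larger than under the standard normal because the t distribution has heavier tails.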

⁸ Note the language: “significantly changed” means that the value could have gone either up or down. This is why it is a two-sided test.
