So how much evidence is considered "enough" to reject the null hypothesis and say that we have statistically significant results? Our decision is determined by the p-value. The p-value is the probability, assuming the null hypothesis is true, that we would obtain a sample statistic at least as extreme as the one we did.
• Large p-value → This means that if the null hypothesis is true, there is a strong chance that we would have observed this data. A large p-value supports the null hypothesis (H₀). We fail to reject the null hypothesis. The results are not statistically significant. Nothing to see here.
• Small p-value → This means that if the null hypothesis is true, we were unlikely to have observed this data. A small p-value supports the alternative hypothesis (Hₐ). We reject the null hypothesis. The results are statistically significant. Woo hoo! We have found something interesting!
We set a threshold for the p-value before we perform the hypothesis test. This cut-off value, called α (pronounced "alpha"), is used to determine when the p-value is small enough for us to reject the null hypothesis, thus concluding that the results are statistically significant.
The significance level, α, is the probability of mistakenly rejecting the null hypothesis when it is actually true. We want to keep our chances of making such a mistake very low, so we set the significance level low, often using α = 0.05 as our significance level.
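To make the decision rule concrete, here is a small sketch in Python. The scenario (60 heads in 100 coin flips) and the exact two-sided binomial p-value calculation are illustrative choices, not taken from this section:

```python
import math

def binom_two_sided_p(k, n, p0=0.5):
    """Exact two-sided binomial p-value: the total probability of every
    outcome at least as extreme (i.e., at least as unlikely under the
    null) as the observed count k."""
    def pmf(i):
        return math.comb(n, i) * p0**i * (1 - p0)**(n - i)
    p_obs = pmf(k)
    # Sum P(X = i) over all outcomes no more likely than the observed one
    # (small tolerance guards against floating-point ties).
    return sum(pmf(i) for i in range(n + 1) if pmf(i) <= p_obs + 1e-12)

alpha = 0.05                          # significance level, chosen in advance
p = binom_two_sided_p(k=60, n=100)    # H0: the coin is fair (p0 = 0.5)
if p < alpha:
    print(f"p = {p:.4f} < {alpha}: reject H0 -- statistically significant")
else:
    print(f"p = {p:.4f} >= {alpha}: fail to reject H0")
```

Here 60 heads out of 100 gives a p-value a bit above 0.05, so even a result that looks lopsided can fail to clear the threshold we set in advance.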
Speaking of mistakes, there is always a chance that we will make the wrong decision when performing a hypothesis test.
There are four possible scenarios for the correctness, or incorrectness, of a hypothesis test decision:
1. The null hypothesis is true, and we fail to reject it. That is, H₀ is true, and we choose H₀ as the "correct" hypothesis. We have made the right decision!
2. The alternative hypothesis is true, and we reject the null hypothesis. That is, Hₐ is true, and we choose Hₐ as the "correct" hypothesis. We have made the right decision!
3. The null hypothesis is true, but we reject it. That is, H₀ is true, but we choose Hₐ as the "correct" hypothesis. This is the wrong decision! Siding with the alternative hypothesis when the null hypothesis is actually true is a Type I Error.
4. The alternative hypothesis is true, but we fail to reject the null hypothesis. That is, Hₐ is true, but we choose H₀ as the "correct" hypothesis. This is the wrong decision! Siding with the null hypothesis when the alternative hypothesis is actually true is a Type II Error.
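The four scenarios above can be captured in a tiny helper function (an illustrative sketch; the function name and labels are our own):

```python
def classify(h0_true: bool, reject_h0: bool) -> str:
    """Map the truth of H0 and the test's decision to one of the four
    possible outcomes of a hypothesis test."""
    if h0_true and not reject_h0:
        return "correct (failed to reject a true H0)"
    if not h0_true and reject_h0:
        return "correct (rejected a false H0)"
    if h0_true and reject_h0:
        return "Type I error (rejected a true H0)"
    return "Type II error (failed to reject a false H0)"

# Print the full 2x2 table of outcomes.
for h0_true in (True, False):
    for reject_h0 in (True, False):
        print(f"H0 true={h0_true}, reject H0={reject_h0} -> "
              f"{classify(h0_true, reject_h0)}")
```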
Note that we never know if we made a correct decision or an error when we perform a hypothesis test. However, we do know that if we perform many hypothesis tests using a significance level of α = 0.05, then over the long run, among the tests where the null hypothesis is actually true, we would make a Type I error about 5% of the time. We control for the Type I error because it typically has more serious consequences (for example, saying that a pharmaceutical drug works when it really doesn't).
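That long-run 5% rate can be checked with a quick simulation. This sketch (entirely our own setup, not from the text) repeatedly runs a two-sided one-sample z-test with known σ on data where the null hypothesis is genuinely true, and counts how often p < 0.05:

```python
import math
import random

random.seed(0)
alpha, n_tests, n_obs = 0.05, 5000, 30

type_i = 0
for _ in range(n_tests):
    # Draw a sample from N(0, 1): the null hypothesis (mean = 0) is TRUE,
    # so any rejection here is a Type I error.
    sample = [random.gauss(0, 1) for _ in range(n_obs)]
    z = (sum(sample) / n_obs) * math.sqrt(n_obs)  # z-statistic (sigma = 1 known)
    p = math.erfc(abs(z) / math.sqrt(2))          # two-sided p-value
    if p < alpha:
        type_i += 1

rate = type_i / n_tests
print(f"Type I error rate over {n_tests} tests: {rate:.3f}")
```

The printed rate lands close to 0.05, matching the claim: with α = 0.05, roughly 5% of tests on a true null hypothesis will wrongly reject it.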