Hypothesis Testing: One Sample
Tests of Significance
Tests of significance are a statistical technology used for ascertaining the likelihood of empirical data, and (from there) for inferring a real effect.
Learning Objectives
Examine the idea of statistical significance and the fundamentals behind the corresponding tests.
Key Takeaways
Key Points
- In Fisher's framework, statistical significance is a statistical assessment of whether observations reflect a pattern rather than just chance.
- In statistical testing, a result is deemed statistically significant if it is so extreme that such a result would be expected to arise simply by chance only in rare circumstances.
- Statistical significance refers to two separate notions: the p-value and the Type I error rate.
- A typical test of significance comprises two related elements: the calculation of the probability of the data and an assessment of the statistical significance of that probability.
Key Terms
- statistical significance: A measure of how unlikely it is that a result has occurred by chance.
- null hypothesis: A hypothesis set up to be refuted in order to support an alternative hypothesis; presumed true until statistical evidence in the form of a hypothesis test indicates otherwise.

Sir Ronald Fisher: Sir Ronald Fisher was an English statistician, evolutionary biologist, geneticist, and eugenicist who standardized the interpretation of statistical significance (starting around 1925), and was the main driving force behind the popularity of tests of significance in empirical research, especially in the social and behavioral sciences.
Statistical significance can refer to two separate notions:
- the p-value (the probability of obtaining data at least as extreme as those observed, assuming that a given null hypothesis is true); or
- the Type I error rate (false positive rate) of a statistical hypothesis test (the probability of incorrectly rejecting a given null hypothesis in favor of a second, alternative hypothesis).
In Fisher's framework, statistical significance is a statistical assessment of whether observations reflect a pattern rather than just chance. The fundamental challenge is that any partial picture of a given hypothesis, poll, or question is subject to random error. In statistical testing, a result is deemed statistically significant if it is so extreme (assuming no external variables influence the results of the test) that such a result would be expected to arise simply by chance only in rare circumstances. Such a result provides enough evidence to reject the hypothesis of "no effect."
Reading Tests of Significance
A typical test of significance comprises two related elements:
- the calculation of the probability of the data, and
- an assessment of the statistical significance of that probability.
Probability of the Data
The probability of the data is normally reported using two related statistics:
- a test statistic (for example, $z$, $t$, or $F$), and
- an associated probability ($p$).
The test statistic itself is of little immediate use to the reader and can be ignored in most cases. The associated probability, on the other hand, tells how probable results at least as extreme as the observed ones would be under the null hypothesis, and it forms the basis for assessing statistical significance.
Statistical Significance
The statistical significance of the results depends on criteria set up by the researcher beforehand. A result is deemed statistically significant if the probability of the data is small enough, conventionally if it is smaller than 5% ($p < 0.05$).
As an example, consider the following test statistics:
In this example, the test statistics are
Elements of a Hypothesis Test
A statistical hypothesis test is a method of making decisions using data from a scientific study.
Learning Objectives
Outline the steps of a standard hypothesis test.
Key Takeaways
Key Points
- Statistical hypothesis tests define a procedure that controls (fixes) the probability of incorrectly deciding that a default position (null hypothesis) is incorrect based on how likely it would be for a set of observations to occur if the null hypothesis were true.
- The first step in a hypothesis test is to state the relevant null and alternative hypotheses; the second is to consider the statistical assumptions being made about the sample in doing the test.
- Next, the relevant test statistic is stated, and its distribution is derived under the null hypothesis from the assumptions.
- After that, the relevant significance level and critical region are determined.
- Finally, the value of the test statistic is computed from the observations, and the decision is made whether to reject the null hypothesis in favor of the alternative or not.
Key Terms
- significance level: A measure of how likely it is to draw a false conclusion in a statistical test, when the results are really just random variations.
- null hypothesis: A hypothesis set up to be refuted in order to support an alternative hypothesis; presumed true until statistical evidence in the form of a hypothesis test indicates otherwise.
Statistical hypothesis tests define a procedure that controls (fixes) the probability of incorrectly deciding that a default position (null hypothesis) is incorrect based on how likely it would be for a set of observations to occur if the null hypothesis were true. Note that this probability of making an incorrect decision is not the probability that the null hypothesis is true, nor the probability that any specific alternative hypothesis is true. This contrasts with other possible techniques of decision theory in which the null and alternative hypotheses are treated on a more equal basis.
The Testing Process
The typical line of reasoning in a hypothesis test is as follows:
1. There is an initial research hypothesis of which the truth is unknown.
2. State the relevant null and alternative hypotheses. This is important because mis-stating the hypotheses will muddy the rest of the process.
3. Consider the statistical assumptions being made about the sample in doing the test, for example, assumptions about statistical independence or about the form of the distributions of the observations. This is important because invalid assumptions will mean that the results of the test are invalid.
4. Decide which test is appropriate, and state the relevant test statistic $T$.
5. Derive the distribution of the test statistic under the null hypothesis from the assumptions.
6. Select a significance level ($\alpha$), a probability threshold below which the null hypothesis will be rejected. Common values are 5% and 1%.
7. The distribution of the test statistic under the null hypothesis partitions the possible values of $T$ into those for which the null hypothesis is rejected, the so-called critical region, and those for which it is not. The probability of the critical region is $\alpha$.
8. Compute from the observations the observed value $t_{\text{obs}}$ of the test statistic $T$.
9. Decide to either reject the null hypothesis in favor of the alternative or not reject it. The decision rule is to reject the null hypothesis if the observed value $t_{\text{obs}}$ is in the critical region, and to accept or "fail to reject" the hypothesis otherwise.
An alternative process is commonly used:
7. Compute from the observations the observed value $t_{\text{obs}}$ of the test statistic $T$.
8. From the statistic, calculate a probability of the observation under the null hypothesis (the p-value).
9. Reject the null hypothesis in favor of the alternative or not reject it. The decision rule is to reject the null hypothesis if and only if the p-value is less than the significance level $\alpha$.
The two processes are equivalent. The former process was advantageous in the past when only tables of test statistics at common probability thresholds were available. It allowed a decision to be made without the calculation of a probability. It was adequate for classwork and for operational use, but it was deficient for reporting results. The latter process relied on extensive tables or on computational support not always available. The calculations are now trivially performed with appropriate software.
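The equivalence of the two processes can be checked with a short Python sketch. It assumes a right-tailed z-test whose statistic is standard normal under the null hypothesis; the function names are illustrative, not taken from any library. The first function compares the observed statistic against a critical value (the classical process), the second compares a p-value against $\alpha$ (the alternative process).

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # N(0, 1): assumed distribution of Z under H0

def reject_by_critical_region(z_obs: float, alpha: float) -> bool:
    """Classical process: reject H0 if z_obs lies in the critical region.

    For a right-tailed test the critical region is z > z_crit, where
    z_crit is the (1 - alpha) quantile of N(0, 1).
    """
    z_crit = STD_NORMAL.inv_cdf(1 - alpha)
    return z_obs > z_crit

def reject_by_p_value(z_obs: float, alpha: float) -> bool:
    """Alternative process: reject H0 if and only if the p-value < alpha."""
    p_value = 1 - STD_NORMAL.cdf(z_obs)  # P(Z >= z_obs) under H0
    return p_value < alpha

for z in (-1.0, 0.5, 1.5, 1.7, 2.5):
    classical = reject_by_critical_region(z, alpha=0.05)
    modern = reject_by_p_value(z, alpha=0.05)
    assert classical == modern  # the two processes reach the same decision
    print(f"z = {z:4.1f}: reject H0? {classical}")
```

Only the second function needs a probability calculation, which is why the first process was preferred when only printed tables of critical values were available.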

Tea Tasting Distribution: This table shows the distribution of permutations in our tea tasting example.
The Null and the Alternative
The alternative hypothesis and the null hypothesis are the two rival hypotheses that are compared by a statistical hypothesis test.
Learning Objectives
Differentiate between the null and alternative hypotheses and understand their implications in hypothesis testing.
Key Takeaways
Key Points
- The null hypothesis refers to a general or default position: that there is no relationship between two measured phenomena, or that a potential medical treatment has no effect.
- In the testing approach of Ronald Fisher, a null hypothesis is potentially rejected or disproved, but never accepted or proved.
- In the hypothesis testing approach of Jerzy Neyman and Egon Pearson, a null hypothesis is contrasted with an alternative hypothesis, and these are decided between on the basis of data, with certain error rates.
- The four principal types of alternative hypotheses are: point, one-tailed directional, two-tailed directional, and non-directional.
Key Terms
- alternative hypothesis: a rival hypothesis to the null hypothesis, whose likelihoods are compared by a statistical hypothesis test
- null hypothesis: A hypothesis set up to be refuted in order to support an alternative hypothesis; presumed true until statistical evidence in the form of a hypothesis test indicates otherwise.
The Null Hypothesis
The null hypothesis refers to a general or default position: that there is no relationship between two measured phenomena, or that a potential medical treatment has no effect. Rejecting or disproving the null hypothesis (and thus concluding that there are grounds for believing that there is a relationship between two phenomena or that a potential treatment has a measurable effect) is a central task in the modern practice of science and gives a precise sense in which a claim is capable of being proven false.
The concept of a null hypothesis is used differently in two approaches to statistical inference, even though the same term is used in both, a problem shared with statistical significance. In the significance testing approach of Ronald Fisher, a null hypothesis is potentially rejected or disproved on the basis of data that are significantly unlikely under its assumption, but it is never accepted or proved. In the hypothesis testing approach of Jerzy Neyman and Egon Pearson, a null hypothesis is contrasted with an alternative hypothesis, and the two are decided between on the basis of data, with certain error rates.

Sir Ronald Fisher: Sir Ronald Fisher, pictured here, coined the term null hypothesis.
The Alternative Hypothesis
In the case of a scalar parameter, there are four principal types of alternative hypothesis:
- Point. Point alternative hypotheses occur when the hypothesis test is framed so that the population distribution under the alternative hypothesis is a fully defined distribution, with no unknown parameters. Such hypotheses are usually of no practical interest but are fundamental to theoretical considerations of statistical inference.
- One-tailed directional. A one-tailed directional alternative hypothesis is concerned with the region of rejection for only one tail of the sampling distribution.
- Two-tailed directional. A two-tailed directional alternative hypothesis is concerned with both regions of rejection of the sampling distribution.
- Non-directional. A non-directional alternative hypothesis is not concerned with either region of rejection, but, rather, only that the null hypothesis is not true.
The concept of an alternative hypothesis forms a major component in modern statistical hypothesis testing; however, it was not part of Ronald Fisher's formulation of statistical hypothesis testing. In Fisher's approach to testing, the central idea is to assess whether the observed dataset could have resulted from chance if the null hypothesis were assumed to hold, notionally without preconceptions about what other model might hold. Modern statistical hypothesis testing accommodates this type of test, since the alternative hypothesis can be just the negation of the null hypothesis.
The Test
A hypothesis test begins by considering the null and alternative hypotheses, each stating an opposing viewpoint. Since the null and alternative hypotheses are contradictory, we must examine the evidence to decide whether it is strong enough to reject the null hypothesis. The evidence is in the form of sample data.
We can make a decision after determining which hypothesis the sample supports. There are two options for a decision: "reject $H_0$" if the sample information favors the alternative hypothesis, or "do not reject $H_0$" if the sample information is insufficient to reject the null hypothesis.
Examples
Example 1
Example 2
We want to test whether the mean grade point average in American colleges is different from 2.0 (out of 4.0): $H_0: \mu = 2.0$ versus $H_a: \mu \neq 2.0$.
Example 3
We want to test whether college students take less than five years, on average, to graduate from college: $H_0: \mu \geq 5$ versus $H_a: \mu < 5$.
Type I and Type II Errors
If the result of a hypothesis test does not correspond with reality, then an error has occurred.
Learning Objectives
Distinguish between Type I and Type II error and discuss the consequences of each.
Key Takeaways
Key Points
- A type I error occurs when the null hypothesis ($H_0$) is true but is rejected.
- The rate of the type I error is called the size of the test and denoted by the Greek letter $\alpha$ (alpha).
- A type II error occurs when the null hypothesis is false but erroneously fails to be rejected.
- The rate of the type II error is denoted by the Greek letter $\beta$ (beta) and related to the power of a test (which equals $1 - \beta$).
Key Terms
- type II error: Accepting the null hypothesis when the null hypothesis is false.
- Type I error: Rejecting the null hypothesis when the null hypothesis is true.
If the result of the test corresponds with reality, then a correct decision has been made. However, if the result of the test does not correspond with reality, then an error has occurred. Due to the statistical nature of a test, the result is never, except in very rare cases, free of error. The two types of error are distinguished as type I error and type II error. What we actually call type I or type II error depends directly on the null hypothesis, and negation of the null hypothesis causes type I and type II errors to switch roles.
Type I Error
A type I error occurs when the null hypothesis ($H_0$) is true but is rejected. The rate of the type I error is called the size of the test and denoted by the Greek letter $\alpha$ (alpha).
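To see what the size $\alpha$ of a test means in practice, the Python sketch below simulates many samples for which the null hypothesis is true by construction and counts how often a two-tailed z-test at $\alpha = 0.05$ rejects it. Every rejection is a type I error. The setup (normal data with known $\sigma = 1$, a sample size of 30, 20,000 trials, the seed, and the function name) is an illustrative assumption; the observed rejection rate should land close to 0.05.

```python
import random
from statistics import NormalDist, fmean

def simulated_type_i_rate(alpha=0.05, n=30, trials=20_000, seed=1):
    """Fraction of trials in which a two-tailed z-test rejects a TRUE H0.

    Illustrative assumptions: data are N(0, 1), H0: mu = 0 is true, and
    sigma = 1 is known, so Z = mean * sqrt(n) is N(0, 1) under H0.
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed cutoff
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        z = fmean(sample) * n ** 0.5  # (mean - 0) / (1 / sqrt(n))
        if abs(z) > z_crit:
            rejections += 1  # type I error: H0 is true but rejected
    return rejections / trials

rate = simulated_type_i_rate()
print(f"simulated type I error rate: {rate:.4f}")  # close to alpha = 0.05
```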
False Positive Error
A false positive error, commonly called a "false alarm," is a result that indicates a given condition has been fulfilled when it actually has not been fulfilled. In the case of "crying wolf," the condition tested for was "Is there a wolf near the herd?" The actual result was that there had not been a wolf near the herd. The shepherd wrongly indicated there was one, by crying wolf.
A false positive error is a type I error where the test is checking a single condition and results in an affirmative or negative decision, usually designated as "true or false."
Type II Error
A type II error occurs when the null hypothesis is false but erroneously fails to be rejected. It is failing to assert what is present, a miss. A type II error may be compared with a so-called false negative (where an actual "hit" was disregarded by the test and seen as a "miss") in a test checking for a single condition with a definitive result of true or false. A type II error is committed when we fail to believe a truth. In terms of folk tales, an investigator may fail to see the wolf ("failing to raise an alarm"); here, as before, the null hypothesis is that there is no wolf. The rate of the type II error is denoted by the Greek letter $\beta$ (beta) and is related to the power of a test, which equals $1 - \beta$.
False Negative Error
A false negative error is one where a test result indicates that a condition failed, while it actually was successful. A common example is a guilty prisoner freed from jail. The condition "Is the prisoner guilty?" actually had a positive result (yes, he is guilty). But the test failed to realize this and wrongly decided the prisoner was not guilty.
A false negative error is a type II error occurring in test steps where a single condition is checked for and the result can either be positive or negative.
Consequences of Type I and Type II Errors
Both types of errors are problems for individuals, corporations, and data analysis. A false positive (with null hypothesis of health) in medicine causes unnecessary worry or treatment, while a false negative gives the patient the dangerous illusion of good health and the patient might not get an available treatment. A false positive in manufacturing quality control (with a null hypothesis of a product being well made) discards a product that is actually well made, while a false negative stamps a broken product as operational. A false positive (with null hypothesis of no effect) in scientific research suggests an effect that is not actually there, while a false negative fails to detect an effect that is there.
Based on the real-life consequences of an error, one type may be more serious than the other. For example, NASA engineers would prefer to waste some money and throw out an electronic circuit that is really fine (null hypothesis: not broken; reality: not broken; test find: broken; action: thrown out; error: type I, false positive) than to use one on a spacecraft that is actually broken. On the other hand, criminal courts set a high bar for proof and procedure and sometimes acquit someone who is guilty (null hypothesis: innocent; reality: guilty; test find: not guilty; action: acquit; error: type II, false negative) rather than convict someone who is innocent.
Minimizing errors of decision is not a simple issue. For any given sample size the effort to reduce one type of error generally results in increasing the other type of error. The only way to minimize both types of error, without just improving the test, is to increase the sample size, and this may not be feasible. An example of acceptable type I error is discussed below.
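The trade-off between the two error types can be made concrete for a right-tailed z-test of $H_0: \mu = 0$ against $H_a: \mu > 0$. The Python sketch below assumes a known $\sigma = 1$ and a hypothetical true mean of 0.3; `type_ii_rate` is an illustrative helper, not a library function. Under these assumptions the test statistic is normal with mean $0.3\sqrt{n}$, so $\beta = \Phi(z_{1-\alpha} - 0.3\sqrt{n})$.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()

def type_ii_rate(alpha: float, n: int, effect: float) -> float:
    """beta for a right-tailed z-test of H0: mu = 0 vs. Ha: mu > 0.

    Illustrative assumptions: sigma = 1 is known and the true mean is
    `effect` (> 0). H0 is rejected when Z = mean * sqrt(n) > z_{1-alpha};
    under the true mean, Z ~ N(effect * sqrt(n), 1), so
    beta = P(Z <= z_{1-alpha}) = Phi(z_{1-alpha} - effect * sqrt(n)).
    """
    z_crit = STD_NORMAL.inv_cdf(1 - alpha)
    return STD_NORMAL.cdf(z_crit - effect * n ** 0.5)

for alpha in (0.05, 0.01):
    for n in (25, 100):
        beta = type_ii_rate(alpha, n, effect=0.3)
        print(f"alpha={alpha:.2f} n={n:3d} beta={beta:.3f} power={1 - beta:.3f}")
```

With these assumptions, moving from $\alpha = 0.05$ to $\alpha = 0.01$ at $n = 25$ increases $\beta$, whereas going from $n = 25$ to $n = 100$ reduces $\beta$ even at the stricter $\alpha$.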

Type I Error: NASA engineers would prefer to waste some money and throw out an electronic circuit that is really fine than to use one on a spacecraft that is actually broken. This is an example of type I error that is acceptable.
Significance Levels
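If a test of significance gives a p-value lower than or equal to the significance level, the null hypothesis is rejected at that level.
Learning Objectives
Outline the process for calculating a p-value.
Key Takeaways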
Key Points
- Significance levels may be used either as a cutoff mark for a p-value or as a desired parameter in the test design.
- To compute a p-value from the test statistic, one must simply sum (or integrate over) the probabilities of more extreme events occurring.
- In some situations, it is convenient to express the complementary statistical significance (so 0.95 instead of 0.05), which corresponds to a quantile of the test statistic.
- Popular levels of significance are 10% (0.1), 5% (0.05), 1% (0.01), 0.5% (0.005), and 0.1% (0.001).
- The lower the significance level chosen, the stronger the evidence required.
Key Terms
- Student's t-test: Any statistical hypothesis test in which the test statistic follows a Student's $t$ distribution if the null hypothesis is supported.
- p-value: The probability of obtaining a test statistic at least as extreme as the one that was actually observed, assuming that the null hypothesis is true.
p-Value
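In brief, the (left-tailed) p-value is the quantile of the observed value of the test statistic with respect to its sampling distribution under the null hypothesis. Writing $T$ for the test statistic and $t_{\text{obs}}$ for its observed value, $p = P(T \le t_{\text{obs}} \mid H_0)$.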
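For a z statistic, the left-tailed p-value is simply the standard normal CDF at the observed value; the right-tailed p-value is its complement, and, because $N(0, 1)$ is symmetric, the two-tailed p-value is twice the smaller of the two. A minimal Python sketch, assuming the statistic is standard normal under $H_0$ (the function name is illustrative):

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()

def p_values(z_obs: float) -> dict:
    """Left-, right-, and two-tailed p-values for a z statistic.

    Assumes the test statistic is standard normal under H0.
    """
    left = STD_NORMAL.cdf(z_obs)   # P(Z <= z_obs | H0): the quantile of z_obs
    right = 1 - left               # P(Z >= z_obs | H0)
    two = 2 * min(left, right)     # both tails, using the symmetry of N(0, 1)
    return {"left": left, "right": right, "two": two}

print(p_values(-1.96))  # left ~ 0.025, right ~ 0.975, two ~ 0.05
```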
Using Significance Levels
Popular levels of significance are 10% (0.1), 5% (0.05), 1% (0.01), 0.5% (0.005), and 0.1% (0.001). If a test of significance gives a p-value lower than or equal to the significance level, the null hypothesis is rejected at that level. In some situations, it is convenient to express the complementary statistical significance (so 0.95 instead of 0.05), which corresponds to a quantile of the test statistic. In general, when interpreting a stated significance, one must be careful to make precise note of what is being tested statistically.
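Because the same p-value can be significant at one level and not at another, it can help to check it against every popular level at once. The Python sketch below assumes the convention that $H_0$ is rejected at level $\alpha$ when $p \le \alpha$; the function name and the example p-value of 0.03 are illustrative.

```python
POPULAR_LEVELS = (0.1, 0.05, 0.01, 0.005, 0.001)

def significance_report(p_value: float, levels=POPULAR_LEVELS) -> dict:
    """Map each significance level to True if H0 is rejected at that level.

    Convention assumed here: reject H0 at level alpha when p <= alpha.
    """
    return {alpha: p_value <= alpha for alpha in levels}

print(significance_report(0.03))
# {0.1: True, 0.05: True, 0.01: False, 0.005: False, 0.001: False}
```

A p-value of 0.03 is therefore significant at the 10% and 5% levels but not at the stricter 1%, 0.5%, or 0.1% levels.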
Different levels of cutoff trade off countervailing effects. Lower levels – such as 0.01 instead of 0.05 – are stricter and increase confidence in the determination of significance, but they run an increased risk of failing to reject a false null hypothesis. Evaluation of a given
Directional Hypotheses and One-Tailed Tests
A one-tailed hypothesis is one in which the value of a parameter is either above or equal to a certain value or below or equal to a certain value.
Learning Objectives
Differentiate a one-tailed from a two-tailed hypothesis test.
Key Takeaways
Key Points
- A one-tailed test or two-tailed test are alternative ways of computing the statistical significance of a data set in terms of a test statistic, depending on whether only one direction is considered extreme (and unlikely) or both directions are considered extreme.
- The terminology "tail" is used because the extremes of distributions are often small, as in the normal distribution or "bell curve".
- If the test statistic is always positive (or zero), only the one-tailed test is generally applicable, while if the test statistic can assume positive and negative values, both the one-tailed and two-tailed test are of use.
- Formulating the hypothesis as a "better than" comparison is said to give the hypothesis directionality.
- One-tailed tests are used for asymmetric distributions that have a single tail (such as the chi-squared distribution, which is common in measuring goodness-of-fit) or for one side of a distribution that has two tails (such as the normal distribution, which is common in estimating location).
Key Terms
- one-tailed hypothesis: a hypothesis in which the value of a parameter is specified as being either above or equal to a certain value or below or equal to a certain value
- null hypothesis: A hypothesis set up to be refuted in order to support an alternative hypothesis; presumed true until statistical evidence in the form of a hypothesis test indicates otherwise.
Two-Tailed Test: A two-tailed test corresponds to both extreme negative and extreme positive directions of the test statistic, here the normal distribution.
A one-tailed hypothesis specifies the value of a parameter as being either:
- above or equal to a certain value, or
- below or equal to a certain value.

One-Tailed Test: A one-tailed test, showing the
Applications of One-Tailed Tests
One-tailed tests are used for asymmetric distributions that have a single tail (such as the chi-squared distribution, which is common in measuring goodness-of-fit) or for one side of a distribution that has two tails (such as the normal distribution, which is common in estimating location). This corresponds to specifying a direction. Two-tailed tests are only applicable when there are two tails, such as in the normal distribution, and correspond to considering either direction significant.
In the approach of Ronald Fisher, the null hypothesis $H_0$ is rejected when the p-value of the test statistic is sufficiently small, that is, when the observed statistic is extreme enough relative to its sampling distribution to be judged unlikely to be the result of chance.
For example, when flipping a coin, testing whether it is biased towards heads is a one-tailed test. Getting data of "all heads" would be seen as highly significant, while getting data of "all tails" would not be significant at all ($p = 1$).
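The coin example can be checked with exact binomial probabilities. The Python sketch below assumes 10 flips (an illustrative choice) and a fair-coin null hypothesis; the one-tailed alternative is "biased towards heads," and the two-tailed p-value doubles the smaller tail, a common convention for a symmetric null distribution. The function name is hypothetical.

```python
from math import comb

def binomial_tail_p_values(heads: int, flips: int) -> tuple:
    """Exact p-values for H0: P(heads) = 0.5.

    Returns (one_tailed, two_tailed): the one-tailed alternative is
    "biased towards heads"; the two-tailed one is "biased either way".
    """
    def prob(k):  # P(X = k) under H0
        return comb(flips, k) / 2 ** flips

    one_tailed = sum(prob(k) for k in range(heads, flips + 1))  # P(X >= heads)
    lower = sum(prob(k) for k in range(0, heads + 1))           # P(X <= heads)
    # Two-tailed: for the symmetric fair-coin null, double the smaller tail.
    two_tailed = min(1.0, 2 * min(one_tailed, lower))
    return one_tailed, two_tailed

print(binomial_tail_p_values(10, 10))  # all heads: (0.0009765625, 0.001953125)
print(binomial_tail_p_values(0, 10))   # all tails: (1.0, 0.001953125)
```

"All tails" gives a one-tailed p-value of exactly 1, yet it is highly significant under the two-tailed test.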
Creating a Hypothesis Test
Creating a hypothesis test generally follows a five-step procedure.
Learning Objectives
Design a hypothesis test utilizing the five steps listed in this text.
Key Takeaways
Key Points
- The first step is to set up or assume a null hypothesis.
- The second step is to decide on an appropriate level of significance for assessing results.
- The third step is to decide between a one-tailed or a two-tailed statistical test.
- The fourth step is to interpret your results, namely your p-value and observed test statistics.
- The final step is to write a report summarizing the statistical significance of your results.
Key Terms
- null hypothesis: A hypothesis set up to be refuted in order to support an alternative hypothesis; presumed true until statistical evidence in the form of a hypothesis test indicates otherwise.
1. Set up or assume a statistical null hypothesis ($H_0$). Examples of null hypotheses include:
- $H_0$: Given our sample results, we will be unable to infer a significant correlation between the dependent and independent research variables.
- $H_0$: It will not be possible to infer any statistically significant mean differences between the treatment and the control groups.
- $H_0$: We will not be able to infer that this variable's distribution significantly departs from normality.
2. Decide on an appropriate level of significance for assessing results. Conventional levels are 5% ($\alpha = 0.05$) and 1% ($\alpha = 0.01$).
3. Decide between a one-tailed or a two-tailed statistical test. A one-tailed test assesses whether the observed results are either significantly higher or significantly lower than expected under the null hypothesis, but not both. Thus, one-tailed tests are appropriate when testing that results will only be higher or lower than null results, or when the only interest is in interventions that will result in higher or lower outputs. A two-tailed test, on the other hand, assesses both possibilities at once. It does so by dividing the total level of significance between both tails, which also implies that it is more difficult to get significant results than with a one-tailed test. Thus, two-tailed tests are appropriate when the direction of the results is not known, or when the researcher wants to check both possibilities in order to prevent making mistakes.
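Splitting $\alpha$ between two tails pushes the cutoff further out, which is why significance is harder to reach with a two-tailed test. A minimal Python sketch, assuming a z statistic that is standard normal under $H_0$ (the function name is illustrative):

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()

def z_critical_values(alpha: float = 0.05) -> tuple:
    """Critical |z| for a one-tailed and a two-tailed test at level alpha.

    The one-tailed test puts all of alpha in one tail; the two-tailed
    test splits it, alpha / 2 per tail.
    """
    one_tailed = STD_NORMAL.inv_cdf(1 - alpha)
    two_tailed = STD_NORMAL.inv_cdf(1 - alpha / 2)
    return one_tailed, two_tailed

one, two = z_critical_values(0.05)
print(f"one-tailed: {one:.3f}, two-tailed: {two:.3f}")
# one-tailed: 1.645, two-tailed: 1.960
```

At the 5% level, an observed $z = 1.8$ in the predicted direction would therefore be significant under a one-tailed test but not under a two-tailed test.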

Two-Tailed Statistical Test: This image shows a graph representation of a two-tailed hypothesis test.
4. Interpret your results:
- Obtain and report the probability of the data. It is recommended to use the exact probability of the data, that is, the p-value. This exact probability is normally provided together with the pertinent test statistic ($z$, $t$, $F$, …).
- p-values can be interpreted as the probability of getting the observed or more extreme results under the null hypothesis (e.g., $p = 0.033$ means that 3.3 times in 100, or 1 time in 33, we will obtain the same or more extreme results as normal [or random] fluctuation under the null).
- p-values are considered statistically significant if they are equal to or smaller than the chosen significance level. This is the actual test of significance, as it interprets those p-values falling beyond the threshold as rare enough to deserve attention.
- If results are accepted as statistically significant, it can be inferred that the null hypothesis is not explanatory enough for the observed data.
5. Write Up the Report:
- All test statistics and associated exact -values can be reported as descriptive statistics, independently of whether they are statistically significant or not.
- Significant results can be reported along the lines of "either an exceptionally rare chance has occurred, or the theory of random distribution is not true."
- Significant results can also be reported along the lines of "without the treatment I administered, experimental results as extreme as the ones I obtained would occur only about 3 times in 1000. Therefore, I conclude that my treatment has a definite effect." Similarly: "This correlation is so extreme that it would only occur about 1 time in 100 ($p = 0.01$). Thus, it can be inferred that there is a significant correlation between these variables."
Testing a Single Proportion
Here we will evaluate an example of hypothesis testing for a single proportion.
Learning Objectives
Construct and evaluate a hypothesis test for a single proportion.
Key Takeaways
Key Points
- Our hypothesis test involves the following steps: stating the question, planning the test, stating the hypotheses, determining whether the test criteria are met, and computing the test statistic.
- We continue the test by determining the critical region, sketching the test statistic and critical region, determining the p-value, stating whether we reject or fail to reject the null hypothesis, and drawing meaningful conclusions.
- Our example revolves around Michele, a statistics student who replicates a study conducted by Cell Phone Market Research Company in 2010 that found that 30% of households in the United States own at least three cell phones.
- Michele tests to see if the proportion of households owning at least three cell phones in her home town is higher than the national average.
- The sample data does not show sufficient evidence that the percentage of households in Michele's city that have at least three cell phones is more than 30%; therefore, we do not have strong evidence against the null hypothesis.
Key Terms
- null hypothesis: A hypothesis set up to be refuted in order to support an alternative hypothesis; presumed true until statistical evidence in the form of a hypothesis test indicates otherwise.
Hypothesis Test for a Single Proportion
For an example of a hypothesis test for a single proportion, consider the following. Cell Phone Market Research Company conducted a national survey in 2010 and found that 30% of households in the United States owned at least three cell phones. Michele, a statistics student, decides to replicate this study where she lives. She conducts a random survey of 150 households in her town and finds that 53 own at least three cell phones. Is this strong evidence that the proportion of households in Michele's town that own at least three cell phones is more than the national percentage? Test at a 5% significance level.
1. State the question: State what we want to determine and what level of confidence is important in our decision.
We are asked to test the hypothesis that the proportion of households that own at least three cell phones is more than 30%. The parameter of interest, $p$, is the proportion of households in Michele's town that own at least three cell phones.
2. Plan: Based on the above question(s) and the answers to the following questions, decide which test you will be performing. Is the problem about numerical or categorical data? If the data are numerical, is the population standard deviation known? Do you have one group or two groups?
We have univariate, categorical data. Therefore, we can perform a one-proportion z-test.
3. Hypotheses: State the null and alternative hypotheses in words then in symbolic form:
- Express the hypothesis to be tested in symbolic form.
- Write a symbolic expression that must be true when the original claim is false.
- The null hypothesis is the statement which includes the equality.
- The alternative hypothesis is the statement without the equality.
Null Hypothesis in words: The null hypothesis is that the true population proportion of households that own at least three cell phones is equal to 30%.
Null Hypothesis symbolically: $H_0: p = 0.30$
Alternative Hypothesis in words: The alternative hypothesis is that the population proportion of households that own at least three cell phones is more than 30%.
Alternative Hypothesis symbolically: $H_a: p > 0.30$
4. The criteria for the inferential test stated above: Think about the assumptions and check the conditions.
Randomization Condition: The problem tells us Michele uses a random sample.
Independence Assumption: When we know we have a random sample, it is likely that outcomes are independent. There is no reason to think how many cell phones one household owns has any bearing on the next household.
10% Condition: We will assume that the city in which Michele lives is large and that 150 households is less than 10% of all households in her community.
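Success/Failure:
To meet this condition, both the success and failure products must be larger than 10 ($np_0 > 10$ and $n(1 - p_0) > 10$). Here $np_0 = 150(0.30) = 45$ and $n(1 - p_0) = 150(0.70) = 105$, so the condition is met.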
5. Compute the test statistic:
The conditions are satisfied, so we will use a hypothesis test for a single proportion to test the null hypothesis. For this calculation we need the sample proportion, $\hat{p} = 53/150 \approx 0.3533$. The test statistic is
$$z = \frac{\hat{p} - p_0}{\sqrt{p_0(1 - p_0)/n}} = \frac{53/150 - 0.30}{\sqrt{0.30(0.70)/150}} \approx 1.425.$$
6. Determine the Critical Region(s): Based on our hypotheses, are we performing a left-tailed, right-tailed, or two-tailed test?
We will perform a right-tailed test, since we are only concerned with the proportion being more than 30% of households.
7. Sketch the test statistic and critical region: Look up the probability on the table, as shown in the figure below.

Critical Region: This image shows a graph of the critical region for the test statistic in our example.
8. Determine the p-value: For a right-tailed test, the p-value is the area to the right of the test statistic, $P(Z > 1.425) \approx 0.077$.
9. State whether you reject or fail to reject the null hypothesis:
Since the p-value is greater than the significance level of 5%, we will fail to reject the null hypothesis.
10. Conclusion: Interpret your result in the proper context, and relate it to the original question.
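Since the probability is greater than 5%, this is not considered a rare event, and the large probability tells us not to reject the null hypothesis. The sample data do not show sufficient evidence that the percentage of households in Michele's city that have at least three cell phones is more than 30%; therefore, we do not have strong evidence against the null hypothesis.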
Note that if evidence exists in support of rejecting the null hypothesis, the following steps are then required:
11. Calculate and display a confidence interval for the population proportion.
12. State your conclusion based on your confidence interval.
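Step 11 calls for a confidence interval. One common choice is the Wald interval $\hat{p} \pm z^* \sqrt{\hat{p}(1-\hat{p})/n}$; this choice is an assumption and is not specified by the text. A minimal sketch follows. The helper name is hypothetical.

```python
import math
from statistics import NormalDist

def wald_interval(successes: int, n: int, confidence: float = 0.95) -> tuple[float, float]:
    """Wald confidence interval for a population proportion."""
    p_hat = successes / n
    # Two-sided critical value, about 1.96 for 95% confidence.
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_star * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin
```

The Wald interval is easy to compute but can be inaccurate for small samples or for proportions near 0 or 1. Alternatives such as the Wilson interval behave better in those cases.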
Testing a Single Mean
In this section we will evaluate an example of hypothesis testing for a single mean.
Learning Objectives
Construct and evaluate a hypothesis test for a single mean.
Key Takeaways
Key Points
- Our hypothesis test involves the following steps: stating the question, planning the test, stating the hypotheses, determining whether the test criteria are met, and computing the test statistic.
- We continue the test by determining the critical region, sketching the test statistic and critical region, determining the p-value, stating whether we reject or fail to reject the null hypothesis, and drawing meaningful conclusions.
- Our example revolves around statistics students who believe that the mean score on the first statistics test is 65 and a statistics instructor who thinks the mean score is lower than 65.
- Since the resulting probability is greater than the 5% significance level, we fail to reject the null hypothesis.
Key Terms
- null hypothesis: A hypothesis set up to be refuted in order to support an alternative hypothesis; presumed true until statistical evidence in the form of a hypothesis test indicates otherwise.
A Hypothesis Test for a Single Mean—Standard Deviation Unknown
As an example of a hypothesis test for a single mean, consider the following. Statistics students believe that the mean score on the first statistics test is 65. A statistics instructor thinks the mean score is lower than 65. He randomly samples 10 statistics student scores and obtains the scores [62, 54, 64, 58, 70, 67, 63, 59, 69, 64]. He performs a hypothesis test using a 5% level of significance.
1. State the question: State what we want to determine and what level of significance is important in your decision.
We are asked to test the hypothesis that the mean statistics score, $\mu$, is less than 65 at the 5% level of significance.
2. Plan: Based on the above question(s) and the answers to the following questions, decide which test you will be performing. Is the problem about numerical or categorical data? If the data are numerical, is the population standard deviation known? Do you have one group or two groups? What type of model is this?
We have univariate, quantitative data. We have a sample of 10 scores. We do not know the population standard deviation. Therefore, we can perform a Student's t-test with $n - 1 = 9$ degrees of freedom.
3. Hypotheses: State the null and alternative hypotheses in words and then in symbolic form. Write a symbolic expression that must be true when the original claim is false. The null hypothesis is the statement that includes the equality; the alternative hypothesis is the statement without the equality.
Null hypothesis in words: The null hypothesis is that the true mean score on the statistics exam is equal to 65.
Null hypothesis symbolically: $H_0: \mu = 65$
Alternative hypothesis in words: The alternative hypothesis is that the true mean statistics score is less than 65.
Alternative hypothesis symbolically: $H_a: \mu < 65$
4. The criteria for the inferential test stated above: Think about the assumptions and check the conditions. If your assumptions include the need for a particular type of data distribution, construct appropriate graphs or charts.
Randomization Condition: The sample is a random sample.
Independence Assumption: It is reasonable to think that the scores of students are independent in a random sample. There is no reason to think the score of one exam has any bearing on the score of another exam.
10% Condition: We assume the number of statistics students is more than 100, so a sample of 10 scores is less than 10% of the population.
Nearly Normal Condition: We should look at a histogram and a boxplot of the scores for this, shown below.

Histogram: This figure shows a histogram for the dataset in our example.

Boxplot: This figure shows a boxplot for the dataset in our example.
Sample Size Condition: Since the distribution of the scores appears nearly normal, our sample of 10 scores is large enough.
5. Compute the test statistic:
The conditions are satisfied and σ is unknown, so we will use a hypothesis test for a mean with unknown standard deviation. We need the sample mean $\bar{x}$, the sample standard deviation $s$, and the standard error $SE = s/\sqrt{n}$, which give the test statistic $t = (\bar{x} - 65)/SE$.
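Using the ten scores listed in the problem, the sample mean, sample standard deviation, standard error, and t statistic can be computed with Python's `statistics` module. This is a sketch of the arithmetic, not the author's own calculation.

```python
import math
import statistics

scores = [62, 54, 64, 58, 70, 67, 63, 59, 69, 64]
mu0 = 65  # hypothesized mean under H0

n = len(scores)
x_bar = statistics.mean(scores)  # sample mean
s = statistics.stdev(scores)     # sample standard deviation (divides by n - 1)
se = s / math.sqrt(n)            # standard error of the mean
t = (x_bar - mu0) / se           # t statistic with n - 1 degrees of freedom

print(f"mean={x_bar:.1f}, s={s:.3f}, SE={se:.3f}, t={t:.3f}, df={n - 1}")
# mean=63.0, s=5.011, SE=1.585, t=-1.262, df=9
```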
6. Determine the Critical Region(s): Based on your hypotheses, should we perform a left-tailed, right-tailed, or two-sided test?
We will perform a left-tailed test, since we are only concerned with the score being less than 65.
7. Sketch the test statistic and critical region: Look up the probability on the table, as shown in the figure below.

Critical Region: This graph shows the critical region for the test statistic in our example.
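As an alternative to a printed t table, the left-tail probability $P(T \le t)$ with 9 degrees of freedom can be approximated numerically. The sketch below integrates the Student t density with Simpson's rule, using only the standard library. In practice, a library routine such as SciPy's `scipy.stats.t.cdf` would normally be used instead. The function names are hypothetical.

```python
import math

def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(t: float, df: int, steps: int = 10_000) -> float:
    """P(T <= t), using symmetry: 0.5 plus the signed area between 0 and t.

    Simpson's rule needs an even number of steps.
    """
    h = t / steps
    area = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + area * h / 3

t_stat = -1.262                # t statistic from the sample of 10 scores (rounded)
p_value = t_cdf(t_stat, df=9)  # left-tailed test: P(T <= t)
print(f"p-value = {p_value:.3f}")  # p-value = 0.119
print("reject H0" if p_value < 0.05 else "fail to reject H0")  # fail to reject H0
```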
9. State whether you reject or fail to reject the null hypothesis:
Since the probability is greater than the 5% significance level, we fail to reject the null hypothesis.
10. Conclusion: Interpret your result in the proper context, and relate it to the original question.
Since the probability is greater than 5%, the result is not considered a rare event, so we do not reject the null hypothesis. The data do not provide sufficient evidence that the mean statistics score is lower than 65; failing to reject does not show that the mean is exactly 65.
Testing a Single Variance
In this section we will evaluate an example of hypothesis testing for a single variance.
Learning Objectives
Construct and evaluate a hypothesis test for a single variance.
Key Takeaways
Key Points
- A test of a single variance assumes that the underlying distribution is normal.
- The null and alternate hypotheses are stated in terms of the population variance (or population standard deviation ).
- A test of a single variance may be right-tailed, left-tailed, or two-tailed.
Key Terms
- variance: A measure of how far a set of numbers is spread out.
- null hypothesis: A hypothesis set up to be refuted in order to support an alternative hypothesis; presumed true until statistical evidence in the form of a hypothesis test indicates otherwise.
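The test statistic is
$\chi^2 = \dfrac{(n-1)s^2}{\sigma^2}$
where: $n$ is the total number of data values, $s^2$ is the sample variance, and $\sigma^2$ is the population variance.
We may think of $s$ as the random variable in this test. The test statistic follows a chi-square distribution with $df = n - 1$ degrees of freedom.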
A test of a single variance may be right-tailed, left-tailed, or two-tailed.
The following example shows how to set up the null hypothesis and alternate hypothesis. The null and alternate hypotheses contain statements about the population variance.
Examples
Example 1
Math instructors are interested not only in how their students do on exams, on average, but also in how the exam scores vary. To many instructors, the variance (or standard deviation) may be more important than the average.
Suppose a math instructor believes that the standard deviation for his final exam is 5 points. One of his best students thinks otherwise. The student claims that the standard deviation is more than 5 points. If the student were to conduct a hypothesis test, what would the null and alternate hypotheses be?
Solution
Even though we are given the population standard deviation, we can set the test up using the population variance as follows: $H_0: \sigma^2 = 5^2$ and $H_a: \sigma^2 > 5^2$.
Example 2
With individual lines at its various windows, a post office finds that the standard deviation for normally distributed waiting times for customers on Friday afternoon is 7.2 minutes. The post office experiments with a single main waiting line and finds that for a random sample of 25 customers, the waiting times for customers have a standard deviation of 3.5 minutes.
With a significance level of 5%, test the claim that a single line causes lower variation among waiting times (more consistent waiting times) for customers.
Solution
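Since the claim is that a single line causes lower variation, this is a test of a single variance. The parameter is the population variance, $\sigma^2$, or the population standard deviation, $\sigma$.
Random Variable: The sample standard deviation, $s$, is the random variable; let $s$ be the standard deviation of the waiting times.
$H_0: \sigma^2 = 7.2^2$
$H_a: \sigma^2 < 7.2^2$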
The word "lower" tells you this is a left-tailed test.
Distribution for the test: $\chi^2_{24}$, where:
- $n = 25$ is the number of customers sampled
- $df = n - 1 = 25 - 1 = 24$
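Calculate the test statistic:
$\chi^2 = \dfrac{(n-1)s^2}{\sigma^2} = \dfrac{(25-1)(3.5)^2}{7.2^2} \approx 5.67$
where $n = 25$, $s = 3.5$, and $\sigma = 7.2$.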
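The same arithmetic appears below as a short Python sketch. The helper name is hypothetical.

```python
def chi_square_variance_stat(n: int, s: float, sigma0: float) -> float:
    """Chi-square statistic (n - 1) s^2 / sigma0^2 for a test of a single variance."""
    return (n - 1) * s ** 2 / sigma0 ** 2

# Post office example: 25 customers, s = 3.5 minutes, hypothesized sigma = 7.2 minutes.
stat = chi_square_variance_stat(25, 3.5, 7.2)
print(round(stat, 2))  # 5.67
```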
Graph:

Critical Region: This image shows the graph of the critical region in our example.
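Compare $\alpha$ and the p-value: $\alpha = 0.05$ and the p-value is $P(\chi^2 < 5.67) \approx 0.000042$, so $\alpha >$ p-value.
Make a decision: Since $\alpha >$ p-value, reject $H_0$. This means that we reject $\sigma^2 = 7.2^2$.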
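The left-tail probability $P(\chi^2_{24} < 5.67)$ can be checked without a table. The chi-square CDF with $df$ degrees of freedom at $x$ equals the regularized lower incomplete gamma function $P(df/2, x/2)$, and this sketch evaluates it with that function's standard power series. The function name is hypothetical. A library routine such as SciPy's `scipy.stats.chi2.cdf` would normally be used instead.

```python
import math

def chi2_cdf(x: float, df: int, tol: float = 1e-15) -> float:
    """P(X <= x) for a chi-square variable, via the series for the
    regularized lower incomplete gamma function P(a, z) with a = df/2, z = x/2."""
    if x <= 0:
        return 0.0
    a, z = df / 2, x / 2
    term = 1 / a  # n = 0 term of the sum z^n / (a (a+1) ... (a+n))
    total = term
    n = 1
    while term > tol * total:
        term *= z / (a + n)
        total += term
        n += 1
    return total * math.exp(-z + a * math.log(z) - math.lgamma(a))

stat = 24 * 3.5 ** 2 / 7.2 ** 2  # test statistic from the example, about 5.67
p_value = chi2_cdf(stat, df=24)  # left-tailed test
print(f"p-value = {p_value:.6f}")  # p-value = 0.000042
print("reject H0" if p_value < 0.05 else "fail to reject H0")  # reject H0
```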
Conclusion: At a 5% level of significance, the data provide sufficient evidence to conclude that a single line causes lower variation among the waiting times; in other words, with a single line, the standard deviation of customer waiting times is less than 7.2 minutes.