Confidence Interval Notes

STA 210 Supplementary Notes

Confidence Intervals

In the last chapter we learned about sampling distributions. A major takeaway from that chapter is that each possible sample we could select from a population is different and thus can yield a different value of the statistic. This is called sampling variability. As we saw, the different samples vary from each other in a predictable way. We will take advantage of the predictable behavior of sample statistics to perform inferential procedures (i.e., make guesses about a population parameter from a sample statistic) in this chapter and the next. The goal of this chapter, confidence intervals, is to find a range of values where we think the population parameter lies.

We will be using population and sample notation again, so here is a reminder of the symbols we use:

                        Mean    Standard Deviation    Proportion
Population Parameter    μ       σ                     p
Sample Statistic        x̄       s                     p̂

What Is a Confidence Interval?

Imagine you want to estimate a population mean, such as the mean number of marshmallows an adult can fit into his/her mouth. Unfortunately, you can't take a census and ask all adults to shove marshmallows into their faces, so what's the next best thing? Sample data! Okay, so now you have a nice representative sample of adults, and you have counted the number of marshmallows they can fit into their mouths. What number would you compute from the sample data to estimate the population mean? The sample mean, of course! This is called a point estimate.

A point estimate of a population parameter is the value of a sample statistic used to estimate the parameter.

o The sample mean is a point estimate of the population mean.
o The sample proportion is a point estimate of the population proportion.

We know the point estimate probably isn't exactly equal to the parameter we're trying to estimate because of sampling variability. Perhaps the sample mean in our example was 8.3 marshmallows.
We know there is a good chance the true population mean is near 8.3 marshmallows, but it probably isn't 8.3 exactly. Therefore, we build an interval around the point estimate that we are pretty sure contains the parameter.

A confidence interval (CI) is an interval of numbers obtained from a point estimate of a parameter. It offers a plausible range of values for the unknown parameter.

All confidence intervals are calculated using the same generic format:

point estimate ± margin of error

or

(point estimate - margin of error, point estimate + margin of error)
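
The generic format can be sketched in a few lines of Python. This is only an illustration: the function name `confidence_interval` is hypothetical, the point estimate 8.3 is the marshmallow sample mean from the text, and the margin of error of 1.5 is a made-up value chosen purely to show the arithmetic.

```python
def confidence_interval(point_estimate, margin_of_error):
    """Return (low, high) for point estimate ± margin of error."""
    if margin_of_error < 0:
        raise ValueError("margin of error must be non-negative")
    return (point_estimate - margin_of_error, point_estimate + margin_of_error)


# 8.3 is the sample mean from the marshmallow example in the text.
# The margin of error of 1.5 is an assumed value for illustration only.
low, high = confidence_interval(8.3, 1.5)
print(f"({low:.1f}, {high:.1f})")  # prints (6.8, 9.8)
```

The same function works for a mean or a proportion, because only the margin of error changes from one kind of parameter to another.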
The margin of error is the cushion that we put on either side of the point estimate to account for the error that arises from using sample data. Its formula depends on which parameter we are trying to estimate (e.g., a mean or a proportion).

Example: A researcher wants to estimate the mean number of times per day that a typical dog barks. She records the daily number of barks for a random sample of dogs. On average, the dogs barked 40 times per day. Using a formula, she found the margin of error for a confidence interval to be 3.5.

a) Find the confidence interval for the mean number of barks per day.

Solution:
point estimate ± margin of error
40 ± 3.5
(40 - 3.5, 40 + 3.5)
(36.5, 43.5)

b) What is a loose interpretation of this confidence interval? (We will learn a more formal interpretation soon.)

Solution: We are trying to estimate the mean number of barks per day for ALL dogs. Based on our sample data, we think the mean number of barks per day for ALL dogs is somewhere between 36.5 and 43.5 barks.

Example 1: A manager at McDonald's wants to estimate the proportion of orders that receive customer complaints. For one week, he tracks the number of orders that his restaurant sells, as well as the number of orders that customers complain about. He calculates that 1.6% of that week's orders received customer complaints, with a margin of error of 0.2%.

a) Find the confidence interval for the proportion of orders that receive complaints.

b) What is a loose interpretation of this confidence interval?

Confidence Level

Now that we've seen a general overview of confidence intervals, we need to dive deeper into the details.

The confidence level is the confidence we have that the parameter lies in the confidence interval.

o The most common confidence level is 95%, but 99%, 90%, and others are often used as well.
o We can never be 100% confident. The only way to have a 100% confidence interval is to make it span from negative infinity to positive infinity, but that would be useless.
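
The point about 100% confidence can be illustrated numerically. The sketch below rests on an assumption these notes have not introduced: that the margin of error is a multiplier taken from the standard normal distribution times a fixed amount of sampling error. Under that assumption, the multiplier (computed by the hypothetical helper `z_multiplier`) grows without bound as the confidence level approaches 100%.

```python
from statistics import NormalDist


def z_multiplier(confidence_level):
    """Two-sided standard normal multiplier for a level strictly between 0 and 1."""
    tail = (1 - confidence_level) / 2  # area left in each tail
    return NormalDist().inv_cdf(1 - tail)


# The multiplier, and so the interval width, keeps growing as the level nears 1.
for level in (0.90, 0.95, 0.99, 0.9999, 0.999999):
    print(f"level {level}: multiplier {z_multiplier(level):.3f}")
```

A level of exactly 1 cannot be passed in, since `inv_cdf` only accepts probabilities strictly between 0 and 1. That matches the idea that a 100% interval would have to be infinitely wide.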
Meaning of a Confidence Level

Think of the confidence level as a success rate. The unknown parameter, perhaps a mean or proportion, is fixed somewhere on the number line. The confidence interval, on the other hand, is variable. It depends on which individuals end up in our sample. Most of the time we'll obtain a sample that leads to a confidence interval that contains the unknown parameter (success!), but sometimes we get unlucky, and our confidence interval misses the mark (failure!).

Take a look at the diagram. The normal curve represents the sampling distribution of the sample mean. That is, it represents all the sample means that we could obtain if we took all possible samples of a given size. In the center of that bell curve is the true population mean, μ. The green lines below the normal curve represent some of the confidence intervals we could obtain. Most of them include μ (success!). However, you'll notice one confidence interval that is far off to the right (failure!). That particular sample happened to have a very high mean, so high that even when the margin of error was added to both sides, the interval still didn't contain μ.

The confidence level is the long-term success rate of a confidence interval method in capturing the population parameter.

o If you are constructing 95% confidence intervals, for example, then over the long run, 95% of your confidence intervals will contain the parameter, while 5% will not.
o In practice we only take one sample and construct one confidence interval. We do not know if it is a "success" or "failure".

Interpreting a Confidence Interval

Here is the generic way to interpret a confidence interval in this class:

Do say: "We are [confidence level]% confident that the [specific mean/proportion] for all [population] lies between [low end of confidence interval] and [high end of confidence interval] [units]."

Example: We are 95% confident that the mean number of times all dogs bark during a day lies between 36.5 and 43.5 barks.

Example: We are 90% confident that the proportion of all orders that receive customer complaints at this McDonald's lies between 1.4% and 1.8%.

Do not say, "There is a 95% chance" that a parameter lies within the interval. The word "confident" has a specific meaning in statistics, so that is the word we use. The word "chance" implies that the parameter is moving around, but in reality, it is fixed.
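
The idea of the confidence level as a long-run success rate can be checked with a small simulation. This is a sketch under stated assumptions, not a method from these notes: the population is taken to be normal with a hypothetical mean of 40 and standard deviation of 10, and the margin of error uses a standard normal multiplier times σ/√n, a formula the notes have not yet covered. The function name `coverage_rate` is also hypothetical.

```python
import random
from statistics import NormalDist, mean


def coverage_rate(mu, sigma, n, confidence_level, trials, seed=0):
    """Fraction of simulated intervals that capture the fixed parameter mu."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    margin = z * sigma / n ** 0.5  # assumed margin-of-error formula (sigma known)
    successes = 0
    for _ in range(trials):
        # Each trial draws a new sample, so the interval moves; mu does not.
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        x_bar = mean(sample)
        if x_bar - margin <= mu <= x_bar + margin:
            successes += 1
    return successes / trials


# Hypothetical population values, chosen only for illustration.
rate = coverage_rate(mu=40, sigma=10, n=25, confidence_level=0.95, trials=2000)
print(f"Share of intervals containing mu: {rate:.3f}")
```

The printed share should land near 0.95. Any single interval either contains μ or does not, which is why the notes use the word "confident" rather than "chance".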