# Week2

Week 2, Ning Xu, Aug/16/2021

Week 2 material is a review of the Week 1 content. Last week, you learned the following terms:

- population and sample
- variable categorization
  - categorical vs quantitative
  - explanatory vs response
- sampling bias

## 1 Topic: population and sample

You should know

- what a population and a sample are,
- why you need a sample and a population in statistics.

Statistics:

- Statistics: to learn some knowledge about a population or to summarize patterns in a population.
- For example, suppose you want to know the average body height of US citizens. Then
  - you can use this knowledge or pattern to predict the height of any US citizen;
  - you can analyse the relation between the body height of US citizens and other factors (smoking, french fries).
- This semester, you will focus on these two tasks.

Sample and population:

- To achieve the two tasks above, you need data.
- You want to know the average body height of US citizens (the pattern or knowledge you want to learn in statistics).
- To find the true average height, one idea is to interview each and every person in the US (the population). Population: where you find your data and where your data is generated.
- But the issue is that this is impossible in practice.
  - it may take too much time or money
  - in many applications, accessing the full population is impossible (e.g., you want to summarize the knowledge about the entire history of human warfare)
- We have to go another way: you collect a subset of the population (a sample, in this case 100 US citizens), compute their average height (the sample average) and use this value as an approximation of "the average height of everyone in the US".
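The idea above can be sketched as a small simulation. This is only an illustration: the population is synthetic, and its size (100,000), mean (6 feet) and spread (0.3 feet) are invented numbers, not real US data. The variable names are also made up for this sketch.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical population: 100,000 heights in feet (invented numbers, not real data).
population = [random.gauss(6.0, 0.3) for _ in range(100_000)]

# In real life we cannot measure everyone; here we can, because the population is simulated.
population_mean = statistics.mean(population)

# Collect a sample of 100 people chosen at random from the whole population.
sample = random.sample(population, 100)
sample_mean = statistics.mean(sample)

print(f"population mean: {population_mean:.2f} feet")
print(f"sample mean:     {sample_mean:.2f} feet")
```

With a sample drawn at random from the whole population, the two printed values should be close, which is why the sample average can stand in for the population average.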
How statistics works (statistical inference):

- collect a sample;
- learn the sample knowledge/pattern;
- use this sample knowledge/pattern as an approximation of the population knowledge/pattern.

In the body height example,

- the sample knowledge/pattern is the average height in your sample (the sample average);
- the population knowledge/pattern is the average height of each and every citizen in the US.

## 2 Topic: bias

Bias is the error you make when you do statistics.

- How bias arises in statistics:
  - when collecting a sample (sampling bias);
    - the US body height example: to learn the average height of each and every citizen in the US (the population pattern), you need to collect data from every state.
    - however, if you collect data from New York only, you are committing sampling bias (you collect your data in the wrong way).
    - you aim to summarize the knowledge for the US, but what you actually get is only for NY.
    - sampling bias is one of the most important concepts in this unit.
  - when learning the sample knowledge/pattern (estimation bias, second-semester stat course, EMCT1020);
  - when using this sample knowledge/pattern as an approximation of the population knowledge/pattern (transfer bias, 3rd-yr stat course);
    - you collect a sample of humans, which tells you the average number of hands is 2.
    - you use this sample pattern to approximate the population pattern for birds: "based on my sample, I believe that birds on average have two hands."
    - you are using a sample pattern to approximate the wrong population pattern.
- In the body height example,
  - the sample knowledge/pattern is the sample mean (6.2 feet);
  - the population knowledge/pattern is the population mean (6 feet);
  - if you collect your sample correctly and your sample is large enough, the sample mean is almost the same as the population mean;
  - in this case, you don't have to worry about sampling bias.
- Sampling bias may be due to the following reasons:
  - you collect the wrong sample (I want to know the average height of US citizens; instead, I collect my data only from Hawaii);
  - your sample size is too small (I want to know the average height of US citizens; I collect only 1 observation from all US citizens).
- This week we focus on the first one (you collect the wrong sample); a few weeks later we will focus on the second one (your sample size is too small).
- We will assume that all the samples in our tut questions are large enough (say, 1 billion observations).
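The two reasons above can be illustrated with a rough simulation. Everything here is an assumption made for the sketch: the population is synthetic, it is split into two hypothetical regions with different typical heights, and all numbers are invented rather than taken from real data.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable

# Hypothetical population made of two regions (all numbers invented for illustration).
region_a = [random.gauss(5.9, 0.3) for _ in range(50_000)]  # stands in for "one state"
region_b = [random.gauss(5.5, 0.3) for _ in range(50_000)]  # stands in for "everywhere else"
population = region_a + region_b
population_mean = statistics.mean(population)

# Wrong sample: large, but collected from one region only.
one_region_mean = statistics.mean(random.sample(region_a, 5_000))

# Correctly collected sample: same size, drawn at random from the whole population.
whole_mean = statistics.mean(random.sample(population, 5_000))

# Too-small sample: drawn from the whole population, but only 1 observation.
tiny_mean = statistics.mean(random.sample(population, 1))

print(f"population mean:          {population_mean:.2f} feet")
print(f"one region only (n=5000): {one_region_mean:.2f} feet")
print(f"whole population (n=5000):{whole_mean:.2f} feet")
print(f"whole population (n=1):   {tiny_mean:.2f} feet")
```

In this sketch the one-region sample stays off target no matter how large it is, because it describes region A rather than the whole population. The single-observation sample is drawn from the right place, but one person's height can easily land far from the population mean.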