How statistics works (statistical inference):
• collect a sample;
• learn the sample knowledge/pattern;
• use this sample knowledge/pattern approximately as the population knowledge/pattern;
In the body height example,
• sample knowledge/pattern is the average height in your sample (sample average);
• the population knowledge/pattern is the average height of each and every citizen in the US.
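The three steps above can be sketched in a few lines of Python. This is only an illustration under made-up assumptions: the population is simulated (100,000 hypothetical heights centred on 6 feet), the sample size of 1,000 is arbitrary, and no real US height data is used.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical population: 100,000 made-up heights in feet (not real US data).
population = [random.gauss(6.0, 0.3) for _ in range(100_000)]

# Step 1: collect a sample (here, a simple random sample of 1,000 people).
sample = random.sample(population, 1_000)

# Step 2: learn the sample pattern (the sample average).
sample_mean = statistics.mean(sample)

# Step 3: use the sample pattern as an approximation of the population pattern.
# In practice the population mean is unknown; it is computed here only to compare.
population_mean = statistics.mean(population)

print(f"sample mean:     {sample_mean:.3f}")
print(f"population mean: {population_mean:.3f}")
```

With a sample drawn at random from the whole population, the two printed values should be close to each other, which is the idea behind step 3.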
Topic : bias
Bias is the mistake you make when you do statistics.
• How statistics commits bias:
  - when collecting a sample (sampling bias):
    * the US body height example: to learn the average height of each and every citizen in the US (the population pattern), you need to collect data from each state.
    * however, if you collect data from New York only, you are committing sampling bias (you collect your data in the wrong way).
    * you aim to summarize the knowledge for the US, but what you actually get is only for NY.
    * sampling bias is one of the most important concepts in this unit.
  - when learning the sample knowledge/pattern (estimation bias; second-semester stat course, EMCT1020);
  - when using this sample knowledge/pattern approximately as the population knowledge/pattern (transfer bias; 3rd-year stat course):
    * you collect a sample of humans, which tells you the average number of hands is 2;
    * you use this sample pattern to approximate the population pattern for birds: "based on my sample, I believe that birds on average have two hands";
    * you are using a sample pattern to approximate the wrong population pattern.
• In the body height example,
  - the sample knowledge/pattern is the sample mean (6.2 feet);
  - the population knowledge/pattern is the population mean (6 feet);
  - if you collect your sample correctly and your sample is large enough, the sample mean is almost the same as the population mean;
  - in this case, you don't have to worry about sampling bias.
• Sampling bias may be due to the following reasons:
  - you collect the wrong sample (I want to know the average height of US citizens; instead, I collect my data only from Hawaii);
  - your sample size is too small (I want to know the average height of US citizens; I collect only 1 observation from all US citizens).
• This week we focus on the first one (you collect the wrong sample); in a few weeks we will focus on the second one (your sample size is too small).
• We will assume that all the samples in our tut questions are large enough (say, 1 billion observations).
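The two reasons listed above can be illustrated with a small simulation. This is a rough sketch, not part of the unit material: the population is made up of two hypothetical regions with different average heights (all numbers are invented), and the sample sizes are arbitrary.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible

# Hypothetical population in feet (made-up numbers, not real data):
# two regions whose average heights differ.
region_a = [random.gauss(6.2, 0.3) for _ in range(50_000)]
region_b = [random.gauss(5.8, 0.3) for _ in range(50_000)]
population = region_a + region_b
population_mean = statistics.mean(population)

# Reason 1: the wrong sample. Every observation comes from region A only,
# so even a large sample lands near region A's mean, not the population's.
wrong_sample = random.sample(region_a, 5_000)
wrong_sample_error = abs(statistics.mean(wrong_sample) - population_mean)

# For comparison: a large sample drawn at random from the whole population.
good_sample = random.sample(population, 5_000)
good_sample_error = abs(statistics.mean(good_sample) - population_mean)

# Reason 2: the sample is too small. Repeat a one-observation "survey"
# 1,000 times and average how far each one lands from the population mean.
tiny_errors = [abs(random.choice(population) - population_mean) for _ in range(1_000)]
tiny_sample_error = statistics.mean(tiny_errors)

print(f"wrong sample (region A only): off by {wrong_sample_error:.3f} ft")
print(f"large random sample:          off by {good_sample_error:.3f} ft")
print(f"single observation (average): off by {tiny_sample_error:.3f} ft")
```

Under these assumptions, the sample taken from one region stays far from the population mean no matter how large it is, while a single observation is unreliable even when it is drawn from the right population. Only the large random sample lands close.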