School

The University of Queensland

Course

STAT 1201

Subject

Statistics

Date

Aug 28, 2023

Pages

11

Uploaded by didwlssud9 on coursehero.com

Statistics (STAT1201)
WEEK 1
Error variability
Penguins come in different shapes and sizes and so we expect to get different body mass
measurements. This is just the natural variability of the quantity we are measuring.
The measurement process may be prone to error, giving measurement variability.
Wrangling a penguin to measure its body mass is not trivial - here the researchers used
spring scales and a weigh bag. Even if they weigh a penguin twice they may well come up
with different answers. If different people make the measurements then this problem is
compounded. However, measurements can be made more accurate in a systematic way,
such as by having a clear protocol for how mass is to be measured, or improving equipment.
There is usually no way to distinguish between measurement variability and natural
variability in the data. We cannot tell whether the first two Adélie penguins, with body
masses of 4675 g and 4050 g, differ because their body masses really are different or
because their masses were in fact the same but there were measurement errors. We collectively
call this variability the error variability since it gets in the way of making inferences
from our data.
The presence of error variability makes it necessary to replicate our experiments. Taking a
single penguin and measuring its body mass tells us very little about body masses in
general. Having 10 observations not only gives us information about the typical body
mass, it also gives us information about the nature and magnitude of the variability present
in the data.
Group variability
In this example we also find differences in our observations because the two groups, Adélie and
Gentoo, do tend to have different body masses. This is the variability we would like to
understand and make inferences about.
Note that we cannot make any conclusion like "Gentoo penguins have greater body masses than
Adélie penguins" because this statement is not universally true. There is an Adélie penguin who
has a body mass that is greater than that of two of the Gentoo penguins. Instead we will talk
about means, so we might claim more correctly that "the mean body mass of Gentoo penguins is
greater than the mean body mass of Adélie penguins". For this data, the mean for Gentoo is
4935 g while for Adélie it is 3895 g, a 1040 g difference.
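The distinction between comparing individuals and comparing means can be checked with a few lines of Python. This is only a sketch: the body masses below are invented for illustration and are not the study's data, so the means differ from the 4935 g and 3895 g quoted above.

```python
from statistics import mean

# Hypothetical body masses in grams, invented for illustration only.
adelie = [3500, 3700, 3800, 4100, 4700]
gentoo = [4400, 4600, 5000, 5200, 5400]

adelie_mean = mean(adelie)
gentoo_mean = mean(gentoo)
difference = gentoo_mean - adelie_mean

# Count the Gentoo penguins that are lighter than the heaviest Adélie penguin.
lighter_gentoo = sum(1 for mass in gentoo if mass < max(adelie))

print(f"Adélie mean: {adelie_mean} g, Gentoo mean: {gentoo_mean} g")
print(f"Difference in means: {difference} g")
print(f"Gentoo penguins lighter than the heaviest Adélie: {lighter_gentoo}")
```

In this made-up sample the Gentoo mean is larger even though the groups overlap, which is why the claim is phrased in terms of means rather than individual penguins.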

Sampling variability
This difference of 1040 g is a straightforward calculation and we wouldn't expect there to be any
variability in the result. But there is! This is because if we took another 5 Adélie and 5 Gentoo
and carried out the measurements again then we would most likely get different mean body
masses. The difference would most likely not be 1040 g again.
The mean we calculate depends on the sample.
This is a very important point. What we would like to do is to use the means we calculate in our
study to say something about penguins in general, such as "Gentoo penguins tend to have
greater body masses than Adélie penguins". But if we did the study again then we might get
different data which say something else! How can we ever make conclusions? Fortunately, we
are able to quantify this sampling variability, particularly if the study has been properly
designed. A focus of this course will be on understanding and characterising sampling variability
in a range of contexts.
As a result of this, statistics can be viewed as a communication skill. If a researcher wants to
communicate her findings to someone then she has to use the language of statistics in order to
incorporate sampling variability. Most research articles in the biological sciences, particularly in
medical and other human-related settings, are full of statistical statements and conclusions.
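One way to get a feel for sampling variability is to simulate repeating the study many times. The sketch below assumes, purely for illustration, that body masses in each species follow a normal distribution; the population means (3900 g and 4900 g) and the spread (450 g) are invented values, not estimates from the penguin data.

```python
import random
from statistics import mean

def mean_difference(rng, n=5):
    """Draw n penguins per species and return Gentoo mean minus Adélie mean."""
    # Population means and spread are assumptions chosen for illustration.
    adelie = [rng.gauss(3900, 450) for _ in range(n)]
    gentoo = [rng.gauss(4900, 450) for _ in range(n)]
    return mean(gentoo) - mean(adelie)

rng = random.Random(1201)  # fixed seed so the run is repeatable
differences = [mean_difference(rng) for _ in range(1000)]

print(f"smallest difference: {min(differences):.0f} g")
print(f"largest difference:  {max(differences):.0f} g")
print(f"average difference:  {mean(differences):.0f} g")
```

Each simulated study of 5 Adélie and 5 Gentoo penguins gives a different difference in means, even though the underlying populations never change. The spread of those 1000 differences is a picture of the sampling variability for a study of this size.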
A variable is a characteristic that we can record about the subjects or objects in a study. These
can be measurements we make, like a forearm length or blood pressure, or can be attributes,
like sex or age.
Quantitative variables represent measurements, such as the height of a person or the
temperature of an environment.
Quantitative variables are quite often continuous, taking any value over some range.
Continuous variables capture the idea that measurements can always be made more precisely.
Discrete variables have only a small number of possibilities, such as a count of some outcomes
or an age measured in whole years. Note that systolic blood pressure is shown in the above
image as a whole number, suggesting it might be discrete, but it is really a continuous quantity.

Categorical variables represent groups of objects with a particular characteristic. For example,
recording the sex of subjects is essentially the same as making a group of males and a group of
females.
Variables like sex are called nominal because they are arbitrary categories with no order
between them.
Ordinal variables are those whose categories do have an order. A common example of this is in
recording the age group someone falls into. We can put these groups in order because we can
put ages in order. In most of this course we will not make much of the distinction between
nominal and ordinal variables.
The randomisation test in the video is one example of statistical hypothesis testing.
In the caffeine analysis, the first explanation for the observed difference in pulse rates between
the groups is referred to as the null hypothesis of the test. The null hypothesis will usually be a
statement of "no effect". For example, if we were trying to show that a new drug helped a
medical condition then our null hypothesis would be that it had no benefit. Note that this sense
of "hypothesis" is quite different to a scientific hypothesis. Here, our null hypothesis is that the
mean increase in pulse rate is the same for caffeinated and caffeine-free cola.
The null hypothesis is usually denoted H0 when discussing the theory of hypothesis testing but
you will rarely find this notation appearing in scientific papers that use hypothesis tests. In
fact it is rare for authors to specify the null hypothesis at all, though it is usually easy to
infer what it was, based on the statement of results.
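A randomisation test of this kind can be sketched in a few lines of Python. The pulse-rate increases below are invented for illustration and are not the data from the caffeine study, and the function name and its parameters are hypothetical. The procedure in the video may differ in its details; this version shuffles the group labels and counts how often a shuffled difference in means is at least as large as the observed one (a one-sided test).

```python
import random
from statistics import mean

def randomisation_test(group_a, group_b, n_shuffles=10_000, seed=0):
    """Return the observed difference in means and an estimated one-sided p-value."""
    rng = random.Random(seed)
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_large = 0
    for _ in range(n_shuffles):
        # Under the null hypothesis the group labels are interchangeable,
        # so any split of the pooled values is as plausible as the real one.
        rng.shuffle(pooled)
        shuffled_diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if shuffled_diff >= observed:
            at_least_as_large += 1
    return observed, at_least_as_large / n_shuffles

# Hypothetical pulse-rate increases (beats per minute), invented for illustration.
caffeinated = [12, 18, 9, 15, 20, 11]
caffeine_free = [5, 10, 7, 12, 4, 8]

observed, p_value = randomisation_test(caffeinated, caffeine_free)
print(f"observed difference in mean increase: {observed:.1f} bpm")
print(f"estimated p-value: {p_value:.3f}")
```

A small p-value means that a difference this large rarely arises when the labels are assigned at random, which counts as evidence against the null hypothesis of no effect.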
