# Biostats I 2023 Topic 3 materials

## BIOSTATISTICS I (PUBH4401) TOPIC 3: BASICS OF STATISTICAL INFERENCE

This is a semi-theoretical lecture that explains where the formulae for confidence intervals (estimation) and p-values (hypothesis testing) come from. A full understanding of this lecture is not essential, but it helps in interpreting confidence intervals and p-values correctly.

## 3.1 Normal distribution

In Topic 2, we saw that the frequency distribution of the values of a quantitative variable can be presented as a histogram, which shows the proportion of values in each interval. The theoretical equivalent of a histogram is a probability density curve.

The Normal distribution is very important in statistics for two reasons.

(1) Many quantitative-continuous variables that represent measurements of natural phenomena (e.g. weight, blood pressure, plasma glucose) have a histogram with a "Normal" shape.

(2) The "sampling distribution" of an estimate or test statistic is often a Normal distribution (this will be explained later).
The probability density curve for a variable X that has a Normal distribution in a population with mean $\mu$ and standard deviation $\sigma$ has the mathematical formula

$$P(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}}, \qquad \pi \approx 3.14159$$

It describes how the values of X are distributed. Values are more common around $\mu$ (curve is higher), and less common (curve is lower) as we go further from $\mu$. Each specification of $\mu$ and $\sigma$ gives a different curve, but they all have the same shape.

The total area under the probability density curve is 1.0. For a particular interval, the area under the curve gives the proportion of values in that interval.

[Figure: Normal probability density curve; shaded area = proportion of values between 8 and 10 = Pr(8 < X < 10)]

This proportion is also the probability that a randomly chosen individual from this population has a value of X between 8 and 10.
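The density formula and the "area = proportion" interpretation can be checked numerically. The following is a minimal sketch, not part of the original lecture. The text does not state the mean and SD behind the shaded-area figure, so the values mean = 8 and SD = 2 are assumptions made only for illustration. The function names `normal_pdf` and `area_under_curve` are also hypothetical.

```python
import math

def normal_pdf(x, mu, sigma):
    """Normal probability density at x for mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def area_under_curve(a, b, mu, sigma, steps=10_000):
    """Approximate the area under the density between a and b (midpoint rule)."""
    width = (b - a) / steps
    heights = (normal_pdf(a + (i + 0.5) * width, mu, sigma) for i in range(steps))
    return sum(heights) * width

# Assumed parameters, chosen only for illustration (not given for the figure).
MU, SIGMA = 8.0, 2.0

# Integrating over mean +/- 10 SD captures essentially the whole curve.
total_area = area_under_curve(MU - 10 * SIGMA, MU + 10 * SIGMA, MU, SIGMA)
prob_8_to_10 = area_under_curve(8.0, 10.0, MU, SIGMA)

print(f"Total area under the curve: {total_area:.4f}")  # 1.0000
print(f"Pr(8 < X < 10): {prob_8_to_10:.4f}")            # 0.3413
```

Under these assumed parameters, about 34% of values fall between 8 and 10. A different mean or SD would give a different shaded area, while the total area stays 1.0.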
**Special property of Normal distributions**

If X has a Normal(mean = $\mu$, SD = $\sigma$) distribution, then

$$Z = \frac{X - \mu}{\sigma}$$

has a Normal(mean = 0, SD = 1) distribution. Equivalently, if Z has a Normal(mean = 0, SD = 1) distribution, then $X = \mu + \sigma Z$ has a Normal(mean = $\mu$, SD = $\sigma$) distribution.

Using this property, a probability for any Normal distribution can be converted into a probability for a Normal(mean = 0, SD = 1) distribution, which is often called the standard Normal distribution. Tables of probabilities for the standard Normal distribution are available.

Let X = plasma glucose and suppose the distribution of X amongst obese 40-49 year old men is Normal($\mu$ = 8, $\sigma$ = 2). What proportion of obese 40-49 year old men have X > 10.5?

Since the distribution of X is Normal($\mu$ = 8, $\sigma$ = 2),

$$
\begin{aligned}
\Pr(X > 10.5) &= \Pr\left(\frac{X - 8}{2} > \frac{10.5 - 8}{2}\right) \\
&= \Pr(Z > 1.25) \\
&= 0.5 - 0.3944 \quad \text{(from table for Normal(0, 1))} \\
&= 0.1056
\end{aligned}
$$

So, 10.56% of men in this population have a plasma glucose value greater than 10.5.
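The plasma glucose calculation can be reproduced without a printed table. The following is a minimal sketch, not part of the original lecture. It standardises X to Z and evaluates the standard Normal probability with the error function from Python's `math` module. The function names `standard_normal_cdf` and `upper_tail_probability` are hypothetical.

```python
import math

def standard_normal_cdf(z):
    """Pr(Z <= z) for Z ~ Normal(mean=0, SD=1), computed with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def upper_tail_probability(x, mu, sigma):
    """Return (z, Pr(X > x)) for X ~ Normal(mu, sigma), using Z = (X - mu) / sigma."""
    z = (x - mu) / sigma
    return z, 1.0 - standard_normal_cdf(z)

# Plasma glucose example from the text: mean 8, SD 2, threshold 10.5.
z, p = upper_tail_probability(10.5, mu=8.0, sigma=2.0)

print(f"z = {z:.2f}")             # z = 1.25
print(f"Pr(X > 10.5) = {p:.4f}")  # Pr(X > 10.5) = 0.1056
```

The table value 0.3944 used in the worked example is Pr(0 < Z < 1.25), so 0.5 - 0.3944 and 1 - Pr(Z <= 1.25) are the same upper-tail probability.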