# Biostats I 2023 Topic 2 materials

BIOSTATISTICS I (PUBH4401), TOPIC 2: SUMMARISING AND PRESENTING DATA

Most studies collect more data than is possible to interpret just by looking at the raw data. Usually, the data are summarised and presented in a way which helps to reveal the true patterns in the data.

## 2.1 Types of variables

A variable is a characteristic or measurement that varies from one subject to another. It is useful to distinguish between four types of variables.

- **Qualitative - nominal:** The variable has categories or levels with no natural order; e.g. sex, country of birth.
- **Qualitative - ordinal:** The variable has categories or levels with a natural order; e.g. pain (none, mild, moderate, severe).
- **Quantitative - discrete:** The values are numerical but only certain values are possible; e.g. household size, number of children.
- **Quantitative - continuous:** The values are numerical and all values in some continuous range are possible; e.g. blood pressure, weight.

Quantitative variables are called scale variables in SPSS Statistics.

A qualitative variable is also called a categorical or attribute variable. A qualitative variable with only two possible values is also called a binary or dichotomous variable; e.g. male/female, dead/alive, yes/no.
## 2.2 Frequency distributions

Data for a qualitative variable are easily summarised by counting the number of subjects in each category. The categories must cover all possibilities and each subject is in exactly one category. This summary is called a frequency distribution and the proportions or percentages in each category form the relative frequency distribution which can be depicted in a pie chart or a bar chart.

Qualitative variable = "Cause of death", n = 290,189 observations (deaths)

Calculation for pie chart

| Cause of death | Frequency | Relative frequency | Angle (degrees) |
|---|---|---|---|
| Circulatory system | 143 559 | 0.49471 | 178 |
| Neoplasms (cancers) | 62 767 | 0.21630 | 78 |
| Respiratory system | 43 886 | 0.15123 | 54 |
| Injury and poisoning | 7 736 | 0.02666 | 10 |
| Digestive system | 9 147 | 0.03152 | 11 |
| Others | 23 094 | 0.07958 | 29 |
| Total | 290 189 | 1.00000 | 360 |

[Figure: pie chart of cause of death, with slices for Circulatory system, Neoplasms, Respiratory system, Injury and poisoning, Digestive system and Others]

### Frequency distribution for quantitative-discrete variable

Quantitative discrete variable = "Parity", n = 125 observations (pregnant women)

Parity of 125 women attending antenatal clinics at St George's Hospital

| Parity | Frequency | Relative frequency (percent) | Cumulative frequency | Relative cumulative frequency (percent) |
|---|---|---|---|---|
| 0 | 59 | 47.2 | 59 | 47.2 |
| 1 | 44 | 35.2 | 103 | 82.4 |
| 2 | 14 | 11.2 | 117 | 93.6 |
| 3 | 3 | 2.4 | 120 | 96.0 |
| 4 | 4 | 3.2 | 124 | 99.2 |
| 5 | 1 | 0.8 | 125 | 100.0 |
| Total | 125 | 100.0 | | |
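The arithmetic behind the cause-of-death and parity tables above can be checked with a short script. What follows is a minimal Python sketch and not part of the course materials (which refer to SPSS Statistics); the function names `relative_frequencies`, `pie_angles` and `cumulative_table` are illustrative choices. It assumes the relative frequency is the category count divided by the total, and that each pie-slice angle is the relative frequency multiplied by 360 degrees.

```python
from itertools import accumulate

# Frequencies copied from the cause-of-death table above.
deaths = {
    "Circulatory system": 143_559,
    "Neoplasms (cancers)": 62_767,
    "Respiratory system": 43_886,
    "Injury and poisoning": 7_736,
    "Digestive system": 9_147,
    "Others": 23_094,
}

# Frequencies copied from the parity table above (parity value -> count).
parity = {0: 59, 1: 44, 2: 14, 3: 3, 4: 4, 5: 1}


def relative_frequencies(freqs):
    """Return each category's proportion of the total (proportions sum to 1)."""
    total = sum(freqs.values())
    return {category: count / total for category, count in freqs.items()}


def pie_angles(freqs):
    """Return the pie-slice angle in degrees: relative frequency x 360."""
    return {c: 360 * p for c, p in relative_frequencies(freqs).items()}


def cumulative_table(freqs):
    """Return rows of (value, frequency, percent, cumulative frequency,
    cumulative percent). The values must already be in their natural order."""
    total = sum(freqs.values())
    running = accumulate(freqs.values())
    return [
        (value, count, 100 * count / total, cum, 100 * cum / total)
        for (value, count), cum in zip(freqs.items(), running)
    ]


if __name__ == "__main__":
    for cause, angle in pie_angles(deaths).items():
        print(f"{cause:22s} {angle:6.1f} degrees")
    for row in cumulative_table(parity):
        print("{:>2} {:>4} {:6.1f} {:>4} {:6.1f}".format(*row))
```

Rounding the angles to whole degrees gives the values in the pie chart table, for example 178 for the circulatory system. Cumulative frequencies are only meaningful when the values have a natural order, so `cumulative_table` suits ordinal or quantitative variables such as parity, not nominal ones such as cause of death.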
### Frequency distribution for quantitative-continuous variable

The frequency distribution is constructed by dividing the range of possible values into intervals and obtaining the frequency in each interval.

Quantitative continuous variable = "Systolic BP" measured in mm of Hg, n = 63 observations (adults).

Note: Intervals are of length 20.

| Class interval | Label | Frequency | Relative frequency (%) |
|---|---|---|---|
| 89.5 - 109.5 | 90-109 | 10 | 16 |
| 109.5 - 129.5 | 110-129 | 24 | 38 |
| 129.5 - 149.5 | 130-149 | 18 | 29 |
| 149.5 - 169.5 | 150-169 | 9 | 14 |
| 169.5 - 189.5 | 170-189 | 2 | 3 |
| Total | | 63 | 100 |

### Graphical representations of frequency distribution for quantitative variable

Histogram

[Figure: histogram of systolic BP]

Stem-and-leaf plot

Note: Intervals are of length 10.

| Stems (intervals) | Leaves (observations) | Frequency |
|---|---|---|
| 90-99 | 2 4 6 8 | 4 |
| 100-109 | 0 4 6 8 8 8 | 6 |
| 110-119 | 2 2 4 4 8 8 8 8 8 | 9 |
| 120-129 | 0 2 2 2 2 4 4 8 8 8 8 8 8 8 8 | 15 |
| 130-139 | 0 0 0 2 2 4 4 4 4 4 4 8 | 12 |
| 140-149 | 0 0 2 4 4 6 | 6 |
| 150-159 | 2 2 4 4 4 4 6 | 7 |
| 160-169 | 2 2 | 2 |
| 170-179 | 0 2 | 2 |
| 180-189 | | 0 |
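Both summaries of the systolic BP data can be rebuilt from the individual readings. The sketch below is a hedged Python illustration, not the method used in the course: the readings are read off the stem-and-leaf plot above (stem plus leaf, assuming the last number in each row of the plot is its frequency), and the helper names `grouped_frequencies` and `stem_and_leaf` are hypothetical.

```python
# Leaves for each stem, read off the stem-and-leaf plot above.
# Assumption: the final number in each row of the plot is the row frequency.
STEM_LEAVES = {
    90: "2468",
    100: "046888",
    110: "224488888",
    120: "022224488888888",
    130: "000224444448",
    140: "002446",
    150: "2244446",
    160: "22",
    170: "02",
    180: "",
}

# Each reading (mm Hg) is the stem plus the leaf digit.
readings = [
    stem + int(leaf) for stem, leaves in STEM_LEAVES.items() for leaf in leaves
]


def grouped_frequencies(values, start, width, n_classes):
    """Count values in the classes [start + i*width, start + (i+1)*width)."""
    counts = [0] * n_classes
    for v in values:
        i = int((v - start) // width)
        if not 0 <= i < n_classes:
            raise ValueError(f"{v} lies outside the class intervals")
        counts[i] += 1
    return counts


def stem_and_leaf(values, width=10):
    """Group sorted values by stem (lower end of each interval of the given
    width) and return {stem: [leaves]}. Stems with no values are omitted."""
    plot = {}
    for v in sorted(values):
        stem = (v // width) * width
        plot.setdefault(stem, []).append(v - stem)
    return plot


if __name__ == "__main__":
    counts = grouped_frequencies(readings, start=89.5, width=20, n_classes=5)
    for i, c in enumerate(counts):
        lower = 89.5 + 20 * i
        share = 100 * c / len(readings)
        print(f"{lower:5.1f} - {lower + 20:5.1f}  {c:3d}  {share:3.0f}%")
    for stem, leaves in stem_and_leaf(readings).items():
        print(f"{stem:3d} | {' '.join(map(str, leaves))}")
```

Starting the classes at 89.5 rather than 90 keeps every whole-number reading strictly inside one interval, so no observation falls on a boundary.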