lab4
2023-05-26
lecture 4 - Distribution
# In this lab we'll investigate the probability distribution that is most central to
# statistics: the normal distribution. If we are confident that our data are nearly
# normal, that opens the door to many powerful statistical methods. Here we'll use the
# graphical tools of R to assess the normality of our data and also learn how to
# generate random numbers from a normal distribution.
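# As a quick illustration of generating random numbers from a normal distribution,
# here is a minimal sketch using base R's rnorm(). The seed, sample size, mean and
# standard deviation below are arbitrary placeholder values chosen for this example;
# they are not taken from the lab's data.

```r
# Draw random values from a normal distribution (all parameters are hypothetical).
set.seed(42)                                  # arbitrary seed so the draw is reproducible
sim <- rnorm(n = 1000, mean = 165, sd = 6.5)

# Normal Q-Q plot: points that hug the reference line suggest near-normality.
qqnorm(sim)
qqline(sim)

# Share of draws within one standard deviation of the mean (roughly 68% in theory).
within_one_sd <- mean(abs(sim - 165) < 6.5)
within_one_sd
```

# Because the values are simulated from a normal distribution, the Q-Q plot here
# shows what "nearly normal" data can be expected to look like.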
# The Data
# This week we'll be working with measurements of body dimensions. This data set
# contains measurements from 247 men and 260 women, most of whom were considered
# healthy young adults.
download.file("http://www.openintro.org/stat/data/bdims.RData", destfile = "bdims.RData")
load("bdims.RData")
# Let's take a quick peek at the first few rows of the data.
head(bdims)
##   bia.di bii.di bit.di che.de che.di elb.di wri.di kne.di ank.di sho.gi che.gi
## 1   42.9   26.0   31.5   17.7   28.0   13.1   10.4   18.8   14.1  106.2   89.5
## 2   43.7   28.5   33.5   16.9   30.8   14.0   11.8   20.6   15.1  110.5   97.0
## 3   40.1   28.2   33.3   20.9   31.7   13.9   10.9   19.7   14.1  115.1   97.5
## 4   44.3   29.9   34.0   18.4   28.2   13.9   11.2   20.9   15.0  104.5   97.0
## 5   42.5   29.9   34.0   21.5   29.4   15.2   11.6   20.7   14.9  107.5   97.5
## 6   43.3   27.0   31.5   19.6   31.3   14.0   11.5   18.8   13.9  119.8   99.9
##   wai.gi nav.gi hip.gi thi.gi bic.gi for.gi kne.gi cal.gi ank.gi wri.gi age
## 1   71.5   74.5   93.5   51.5   32.5   26.0   34.5   36.5   23.5   16.5  21
## 2   79.0   86.5   94.8   51.5   34.4   28.0   36.5   37.5   24.5   17.0  23
## 3   83.2   82.9   95.0   57.3   33.4   28.8   37.0   37.3   21.9   16.9  28
## 4   77.8   78.8   94.0   53.0   31.0   26.2   37.0   34.8   23.0   16.6  23
## 5   80.0   82.5   98.5   55.4   32.0   28.4   37.7   38.6   24.4   18.0  22
## 6   82.5   80.1   95.3   57.5   33.0   28.0   36.6   36.1   23.5   16.9  21
##    wgt   hgt sex
## 1 65.6 174.0   1
## 2 71.8 175.3   1
## 3 80.7 193.5   1
## 4 72.6 186.5   1
## 5 78.8 187.2   1
## 6 74.8 181.5   1
# You'll see that for every observation we have 25 measurements, many of which are
# either diameters or girths. A key to the variable names can be found at
# http://www.openintro.org/stat/data/bdims.php, but we'll be focusing on just three
# columns to get started: weight in kg (wgt), height in cm (hgt), and sex
# (1 indicates male, 0 indicates female).
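# To make the 0/1 coding of sex easier to read, one option is to recode it as a
# labelled factor. The sketch below rebuilds only the wgt, hgt and sex values of the
# six rows printed by head(bdims), so it runs without the download; the data frame
# name first6 and the column sex_label are names invented for this example.

```r
# wgt, hgt and sex for the six rows shown in the head(bdims) output.
first6 <- data.frame(
  wgt = c(65.6, 71.8, 80.7, 72.6, 78.8, 74.8),        # weight in kg
  hgt = c(174.0, 175.3, 193.5, 186.5, 187.2, 181.5),  # height in cm
  sex = c(1, 1, 1, 1, 1, 1)                           # 1 = male, 0 = female
)

# Recode the numeric sex code as a factor with readable labels.
first6$sex_label <- factor(first6$sex, levels = c(0, 1),
                           labels = c("female", "male"))

# Count observations per group and summarise the two numeric columns.
table(first6$sex_label)
summary(first6[, c("wgt", "hgt")])
```

# The same factor() call should work on the full bdims data frame once it is loaded.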
# Since males and females tend to have different body dimensions, it will be useful
# to create two additional data sets: one with only men and another with only women.
mdims <- subset(bdims, sex == 1)  # men only
fdims <- subset(bdims, sex == 0)  # women only
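# subset() is one of several ways to filter rows. The sketch below compares it with
# bracket indexing on a small made-up data frame (toy and its values are hypothetical,
# not part of bdims) to show how the two treat a missing sex code differently.

```r
# Made-up data: five heights in cm, with one missing sex code.
toy <- data.frame(
  hgt = c(174.0, 160.2, 181.5, 165.1, 170.0),
  sex = c(1, 0, 1, 0, NA)
)

men_subset  <- subset(toy, sex == 1)       # rows where the test is NA are dropped
men_bracket <- toy[toy$sex == 1, ]         # rows where the test is NA come back as all-NA rows
men_which   <- toy[which(toy$sex == 1), ]  # which() discards NA, matching subset()

# Compare how many rows each approach returns.
nrow(men_subset)
nrow(men_bracket)
nrow(men_which)
```

# When a filtering column can contain NA, subset() or which() avoids the spurious
# all-NA row that plain logical indexing produces.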
# 1. Make a histogram of men's heights and a histogram of women's heights.
# How would you compare the various aspects of the two distributions?
hist(mdims$hgt)
hist(fdims$hgt)
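# Two histograms are easier to compare when they share bin edges and are stacked on
# one device. The sketch below shows one way to do that. It uses simulated stand-in
# vectors (men_hgt and women_hgt, with arbitrary means and SDs) rather than the real
# mdims$hgt and fdims$hgt, so the shapes are illustrative only.

```r
# Simulated stand-ins for the two height vectors; NOT the real measurements.
set.seed(1)
men_hgt   <- rnorm(247, mean = 178, sd = 7)    # 247 values, like the men's sample size
women_hgt <- rnorm(260, mean = 165, sd = 6.5)  # 260 values, like the women's sample size

# Common 5 cm bins that cover the full range of both vectors.
lo <- floor(min(men_hgt, women_hgt) / 5) * 5
hi <- ceiling(max(men_hgt, women_hgt) / 5) * 5
shared_breaks <- seq(lo, hi, by = 5)

# Stack the two histograms so centre, spread and shape can be compared directly.
op <- par(mfrow = c(2, 1))
h_men   <- hist(men_hgt, breaks = shared_breaks,
                main = "Men (simulated)", xlab = "Height (cm)")
h_women <- hist(women_hgt, breaks = shared_breaks,
                main = "Women (simulated)", xlab = "Height (cm)")
par(op)  # restore the previous plotting layout
```

# Replacing men_hgt and women_hgt with mdims$hgt and fdims$hgt should give the same
# layout for the real data.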
# The normal distribution
# In your description of the distributions, did you use words like bell-shaped or
# normal? It's tempting to say so when faced with a unimodal symmetric distribution.
# To see how accurate that description is, we can plot a normal distribution curve on
# top of a histogram to see how closely the data follow a normal distribution. This
# normal curve should have the same mean and standard deviation as the data.
# We'll be working with women's heights, so let's calculate some statistics that will
# be referenced later and store them as separate objects.
fhgtmean <- mean(fdims$hgt)  # mean of female height
fhgtsd <- sd(fdims$hgt)      # standard deviation of female height
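# For reference, mean() and sd() can be reproduced by hand; note that sd() divides
# by n - 1, not n. The sketch below checks this on the six heights printed by
# head(bdims) (rows that are all coded male, used here purely for the arithmetic).
# The names x, manual_mean and manual_sd are invented for this example.

```r
# The six heights (cm) from the head(bdims) output.
x <- c(174.0, 175.3, 193.5, 186.5, 187.2, 181.5)
n <- length(x)

# Sample mean: sum of the values divided by the number of values.
manual_mean <- sum(x) / n

# Sample standard deviation: root of the summed squared deviations over n - 1.
manual_sd <- sqrt(sum((x - manual_mean)^2) / (n - 1))

# Both comparisons should agree with R's built-in functions.
all.equal(manual_mean, mean(x))
all.equal(manual_sd, sd(x))
```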
# Next we make a density histogram to use as the backdrop and use the lines function
# to overlay a normal probability curve.
# The difference between a frequency histogram and a density
histogram is that while in a frequency histogram the heights