lab4 2023-05-26 lecture 4 - Distribution

# In this lab we'll investigate the probability distribution that is most central to statistics:
# the normal distribution. If we are confident that our data are nearly normal, that opens the door to many
# powerful statistical methods. Here we'll use the graphical tools of R to assess the normality of our data
# and also learn how to generate random numbers from a normal distribution.

# The Data
# This week we'll be working with measurements of body dimensions. This data set contains measurements from 247 men and
# 260 women, most of whom were considered healthy young adults.
download.file("http://www.openintro.org/stat/data/bdims.RData", destfile = "bdims.RData")
load("bdims.RData")

# Let's take a quick peek at the first few rows of the data.
head(bdims)
##   bia.di bii.di bit.di che.de che.di elb.di wri.di kne.di ank.di sho.gi che.gi
## 1   42.9   26.0   31.5   17.7   28.0   13.1   10.4   18.8   14.1  106.2   89.5
## 2   43.7   28.5   33.5   16.9   30.8   14.0   11.8   20.6   15.1  110.5   97.0
## 3   40.1   28.2   33.3   20.9   31.7   13.9   10.9   19.7   14.1  115.1   97.5
## 4   44.3   29.9   34.0   18.4   28.2   13.9   11.2   20.9   15.0  104.5   97.0
## 5   42.5   29.9   34.0   21.5   29.4   15.2   11.6   20.7   14.9  107.5   97.5
## 6   43.3   27.0   31.5   19.6   31.3   14.0   11.5   18.8   13.9  119.8   99.9
##   wai.gi nav.gi hip.gi thi.gi bic.gi for.gi kne.gi cal.gi ank.gi wri.gi age
## 1   71.5   74.5   93.5   51.5   32.5   26.0   34.5   36.5   23.5   16.5  21
## 2   79.0   86.5   94.8   51.5   34.4   28.0   36.5   37.5   24.5   17.0  23
## 3   83.2   82.9   95.0   57.3   33.4   28.8   37.0   37.3   21.9   16.9  28
## 4   77.8   78.8   94.0   53.0   31.0   26.2   37.0   34.8   23.0   16.6  23
## 5   80.0   82.5   98.5   55.4   32.0   28.4   37.7   38.6   24.4   18.0  22
## 6   82.5   80.1   95.3   57.5   33.0   28.0   36.6   36.1   23.5   16.9  21
##    wgt   hgt sex
## 1 65.6 174.0   1
## 2 71.8 175.3   1
## 3 80.7 193.5   1
## 4 72.6 186.5   1
## 5 78.8 187.2   1
## 6 74.8 181.5   1

# You'll see that for every observation we have 25 measurements, many of which are either diameters or girths.
# A key to the variable names can be found at http://www.openintro.org/stat/data/bdims.php, but we'll be focusing on just
# three columns to get started: weight in kg (wgt), height in cm (hgt), and sex (1 indicates male, 0 indicates female).

# Since males and females tend to have different body dimensions, it will be useful to create two additional data sets:
# one with only men and another with only women.
mdims <- subset(bdims, sex == 1)
fdims <- subset(bdims, sex == 0)

# 1. Make a histogram of men's heights and a histogram of women's heights.
# How would you compare the various aspects of the two distributions?
hist(mdims$hgt)
hist(fdims$hgt)

# The normal distribution
# In your description of the distributions, did you use words like bell-shaped or normal?
# It's tempting to say so when faced with a unimodal symmetric distribution.
# To see how accurate that description is, we can plot a normal distribution curve on top of a histogram to see how closely
# the data follow a normal distribution. This normal curve should have the same mean and standard deviation as the data.
# We'll be working with women's heights, so let's store them as a separate object and then calculate some statistics that
# will be referenced later.
fhgtmean <- mean(fdims$hgt)  # mean of female height
fhgtsd <- sd(fdims$hgt)      # standard deviation of female height

# Next we make a density histogram to use as the backdrop and use the lines function to overlay a normal probability curve.
# The difference between a frequency histogram and a density histogram is that while in a frequency histogram the heights