Australian National University, STAT 7055 (Statistics)

STAT7055 Topic 01 Tutorial Solutions
1. Data was collected on 105 homes in Canberra in 2003. For each house, the following
information was collected: the estimated price of the house (in dollars); the number of
bedrooms; the size of the house (in square metres); whether or not a pool was present (yes
or no); the distance from Civic; the rating of the insulation in the house (none, average
or high); the suburb; the number of bathrooms; and the type of internet connectivity
available (dialup, ADSL or the NBN, where dialup is the slowest connection and the NBN
is the fastest). Classify each variable as either nominal, ordinal, discrete or continuous.
Solution:
There are nine variables in this study:
1. Estimated price. Although the estimated price is likely to be a whole dollar amount,
we generally consider monetary values to be continuous, as they can technically be
any possible amount.
2. Number of bedrooms. This is a count variable so it is discrete.
3. Size. This is a measurement so it is continuous.
4. Pool present. It is not clear that having a pool is better than not having one
(e.g., a pool increases maintenance costs), so this is nominal.
5. Distance from Civic. Again a measurement, so continuous.
6. Insulation rating. Clearly an ordering between the categories, so ordinal.
7. Suburb. Nominal.
8. Number of bathrooms. Discrete.
9. Type of internet. Again a clear ordering between the categories, so ordinal.
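The four variable types map fairly naturally onto R storage classes. The following is a minimal sketch using made-up values, since the actual data for the 105 homes is not given; the variable names and the sample values are assumptions for illustration only.

```r
# Illustrative values only: the real data for the 105 homes is not shown here.
price      <- c(450000, 612500.50, 380000)       # continuous -> numeric
bedrooms   <- c(3L, 4L, 2L)                      # discrete   -> integer
pool       <- factor(c("no", "yes", "no"))       # nominal    -> unordered factor
insulation <- factor(c("none", "high", "average"),
                     levels = c("none", "average", "high"),
                     ordered = TRUE)              # ordinal    -> ordered factor
internet   <- factor(c("ADSL", "dialup", "NBN"),
                     levels = c("dialup", "ADSL", "NBN"),
                     ordered = TRUE)              # ordinal    -> ordered factor

# Order comparisons are only defined for ordered factors.
none_below_high <- insulation[1] < insulation[2]  # compares "none" with "high"
pool_is_ordered <- is.ordered(pool)
```

Declaring the levels explicitly matters for the ordinal variables: without `levels`, R would sort the categories alphabetically, which would not reflect the intended ranking.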
2. You work in a country where every resident plays a sport every day. However, the only
two sports played are table tennis (when it is raining) and golf (when it is sunny). Your
job is to provide statistical analysis to the management of a company that sells "ping-
pong" (table tennis) balls directly through the internet. Over the past eight months you
have collected the following data:
Month   Marketing expenditure ($)   Number of rainy days   Number of sales
1        4150                         6                      778
2        3000                        10                      779
3        2500                        25                     4200
4       10600                         2                      250
5       12000                         7                      300
6        8000                        20                     6000
7        1500                        18                     1500
8        6850                         9                      500
For this data, the sample coefficients of variation for marketing expenditure, number of
rainy days per month, and number of sales have been calculated to be 0.642849, 0.656009,
and 1.194023, respectively.
(a) The marketing manager has told you that it simply makes sense that there is a
strong and positive correlation between marketing expenditure and the number
of sales made. Provide some analysis regarding this relationship. What do you
conclude from your results?
Solution: If we let $X$ be marketing expenditure and $Y$ be number of sales, we can
calculate the sample means to be $\bar{X} = 6075$ and $\bar{Y} = 1788.375$. From the given
coefficients of variation, we can then determine the sample standard deviations:
$$s_X = cv_X \times \bar{X} = 3905.3077$$
$$s_Y = cv_Y \times \bar{Y} = 2135.3603$$
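These figures can be cross-checked in R directly from the raw data. The following is a minimal sketch, assuming the eight monthly values from the table in the question; the names `cv_x`, `cv_y`, `s_x` and `s_y` are chosen here for illustration.

```r
# Data from the table in Question 2 (months 1 to 8, in order).
x <- c(4150, 3000, 2500, 10600, 12000, 8000, 1500, 6850)  # marketing expenditure ($)
y <- c(778, 779, 4200, 250, 300, 6000, 1500, 500)         # number of sales

# Sample coefficient of variation: sample standard deviation divided by sample mean.
cv_x <- sd(x) / mean(x)
cv_y <- sd(y) / mean(y)

# Recover the standard deviations the same way as the solution: s = cv * mean.
s_x <- cv_x * mean(x)
s_y <- cv_y * mean(y)

print(c(mean_x = mean(x), mean_y = mean(y), s_x = s_x, s_y = s_y))
```

Note that `sd` uses the $n - 1$ denominator, which matches the sample standard deviation used here.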
The following table can be used to help calculate the sample covariance between $X$ and $Y$:

Month   $X_i$ = Marketing expenditure (\$)   $Y_i$ = Number of sales   $(X_i - \bar{X})(Y_i - \bar{Y})$
1        4150                                  778                        1944971.88
2        3000                                  779                        3103828.13
3        2500                                 4200                       -8621559.38
4       10600                                  250                       -6961146.88
5       12000                                  300                       -8818621.88
6        8000                                 6000                        8107378.13
7        1500                                 1500                        1319315.63
8        6850                                  500                        -998490.63
        $\bar{X} = 6075$                      $\bar{Y} = 1788.375$       $\sum = -10924325.00$
Hence
$$s_{XY} = \frac{-10924325}{8 - 1} = -1560617.86$$
Therefore, the correlation coefficient is
$$r_{XY} = \frac{-1560617.86}{3905.3077 \times 2135.3603} = -0.1871$$
Based on this information we would conclude that, in contrast to the marketing
manager's assertion, there is a weak negative linear relationship between marketing
expenditure and number of sales in this sample.
If we were to do this using R, we would first create vectors for the $X$ and $Y$ values
(the order in which you list the numbers is important: they must match up for $X$ and $Y$):
> x <- c(4150,3000,2500,10600,12000,8000,1500,6850)
> y <- c(778,779,4200,250,300,6000,1500,500)
The cor function can then be used to calculate the sample correlation coefficient:
> cor(x,y)
[1] -0.1871415
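The hand calculation can also be reproduced step by step, which makes it easier to see where each number in the working comes from. The following is a sketch only; the intermediate names `cross_products`, `s_xy` and `r_xy` are chosen here for illustration.

```r
x <- c(4150, 3000, 2500, 10600, 12000, 8000, 1500, 6850)  # marketing expenditure ($)
y <- c(778, 779, 4200, 250, 300, 6000, 1500, 500)         # number of sales
n <- length(x)

# Cross-products of the deviations from the sample means, one per month.
cross_products <- (x - mean(x)) * (y - mean(y))

# Sample covariance: sum of cross-products divided by n - 1.
s_xy <- sum(cross_products) / (n - 1)

# Sample correlation: covariance divided by the product of the standard deviations.
r_xy <- s_xy / (sd(x) * sd(y))

# Both should agree with the built-in functions.
print(all.equal(s_xy, cov(x, y)))
print(all.equal(r_xy, cor(x, y)))
```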
(b) Using the data above, calculate the correlation coefficient between the number of
rainy days per month and the number of sales. The covariance between the number
of rainy days per month and the number of sales has been calculated as 14012.23.
Solution: Let $Z$ denote the number of rainy days per month. Again from the
coefficient of variation, we can calculate the sample standard deviation of $Z$:
$$s_Z = cv_Z \times \bar{Z} = 0.656009 \times 12.125 = 7.9541$$
Since we are given $s_{ZY}$, the sample correlation between $Z$ and $Y$ is equal to:
$$r_{ZY} = \frac{s_{ZY}}{s_Z s_Y} = \frac{14012.23}{7.9541 \times 2135.3603} = 0.8250$$
Using R, we would just need to create the new vector of $Z$ values:
> z <- c(6,10,25,2,7,20,18,9)
Then we can use the cor function again for the sample correlation coefficient:
> cor(z,y)
[1] 0.8249823
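The covariance quoted in the question can be checked in the same way, and the correlation rebuilt from it. The following is a minimal sketch, assuming the rainy-day and sales columns from the table; the names `s_zy` and `r_zy` are chosen here for illustration.

```r
z <- c(6, 10, 25, 2, 7, 20, 18, 9)                  # number of rainy days
y <- c(778, 779, 4200, 250, 300, 6000, 1500, 500)   # number of sales

# Sample covariance between rainy days and sales (n - 1 denominator).
s_zy <- cov(z, y)

# Correlation rebuilt from the covariance and the two standard deviations.
r_zy <- s_zy / (sd(z) * sd(y))

print(c(mean_z = mean(z), s_z = sd(z), s_zy = s_zy, r_zy = r_zy))
```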
(c) What does the result in (b) above suggest? Provide a potential reason for this
result.
Solution: Part (b) suggests that there is a strong positive linear relationship between
the number of rainy days in a month and the number of sales in a month. This
might be because table tennis is the only sport played on rainy days, so months
with more rainy days see more table tennis played and hence higher sales of
ping-pong balls.
Try using R to calculate the sample correlation coefficients from the raw data given in
the table.
3. A quality control officer in a chocolate factory records the number of minutes it takes
for the company's signature chocolate bar to melt at room temperature. He recorded
the following 11 times for 11 different chocolate bars:
14   20   20   12   9   13   35   12   11   12   46
(a) Calculate the mean, mode and median of the times.
Solution: For the mean we have
$$\bar{X} = \frac{\sum_{i=1}^{11} X_i}{11} = \frac{204}{11} = 18.55$$
For the median, we first sort the observations from smallest to largest:
9   11   12   12   12   13   14   20   20   35   46
Since there is an odd number of observations, the median is the middle (sixth) observation,
13. For the mode, the time of 12 occurs the most (three times).
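These three summaries can be checked in R. The following is a sketch only: base R has no built-in function for the statistical mode (the `mode` function reports an object's storage type instead), so the mode is found here by tabulating the values. The names `counts` and `mode_time` are chosen for illustration.

```r
times <- c(14, 20, 20, 12, 9, 13, 35, 12, 11, 12, 46)  # melting times in minutes

mean_time   <- mean(times)     # sum of the 11 times divided by 11
median_time <- median(times)   # middle value of the sorted times

# Tabulate how often each time occurs and keep the most frequent value(s).
counts    <- table(times)
mode_time <- as.numeric(names(counts)[counts == max(counts)])

print(c(mean = mean_time, median = median_time, mode = mode_time))
```

Selecting every value whose count equals the maximum means that a data set with tied modes would return all of them rather than an arbitrary one.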