# STAT7055T01Sol

STAT7055 Topic 01 Tutorial Solutions

1. Data was collected on 105 homes in Canberra in 2003. For each house, the following information was collected: the estimated price of the house (in dollars); the number of bedrooms; the size of the house (in square metres); whether or not a pool was present (yes or no); the distance from Civic; the rating of the insulation in the house (none, average or high); the suburb; the number of bathrooms; and the type of internet connectivity available (dialup, ADSL or the NBN, where dialup is the slowest connection and the NBN is the fastest). Classify each variable as either nominal, ordinal, discrete or continuous.

   Solution: There are nine variables in this study:

   1. Estimated price. Although the estimated price is likely to be a whole dollar amount, we generally consider monetary values to be continuous, as they can technically be any possible amount.
   2. Number of bedrooms. This is a count variable so it is discrete.
   3. Size. This is a measurement so it is continuous.
   4. Pool present. It's not necessarily clear that having a pool is better than not having a pool (e.g., increasing costs for maintenance), so this is nominal.
   5. Distance from Civic. Again a measurement, so continuous.
   6. Insulation rating. Clearly an ordering between the categories, so ordinal.
   7. Suburb. Nominal.
   8. Number of bathrooms. Discrete.
   9. Type of internet. Again a clear ordering between the categories, so ordinal.

2. You work in a country where every resident plays a sport every day. However the only two sports played are table tennis (when it is raining) and golf (when it is sunny). Your job is to provide statistical analysis to the management of a company that sells "ping-pong" (table tennis) balls directly through the internet.
Over the past eight months you have collected the following data:

| Month | Marketing expenditure (\$) | Number of rainy days | Number of sales |
|---|---|---|---|
| 1 | 4150 | 6 | 778 |
| 2 | 3000 | 10 | 779 |
| 3 | 2500 | 25 | 4200 |
| 4 | 10600 | 2 | 250 |
| 5 | 12000 | 7 | 300 |
| 6 | 8000 | 20 | 6000 |
| 7 | 1500 | 18 | 1500 |
| 8 | 6850 | 9 | 500 |

For this data, the sample coefficients of variation for marketing expenditure, number of rainy days per month, and number of sales have been calculated to be 0.642849, 0.656009, and 1.194023, respectively.
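As a rough check, the quoted coefficients of variation can be reproduced in R as the sample standard deviation divided by the sample mean. This is only a sketch: the vector names `x`, `z` and `y` and the helper `cv` are illustrative choices and are not part of the original question.

```r
# Data from the table (the order must match across the three vectors).
x <- c(4150, 3000, 2500, 10600, 12000, 8000, 1500, 6850)  # marketing expenditure ($)
z <- c(6, 10, 25, 2, 7, 20, 18, 9)                        # number of rainy days
y <- c(778, 779, 4200, 250, 300, 6000, 1500, 500)         # number of sales

# Hypothetical helper: sample coefficient of variation = sd / mean.
# sd() uses the n - 1 denominator, i.e. the sample standard deviation.
cv <- function(v) sd(v) / mean(v)

cv_x <- cv(x)
cv_z <- cv(z)
cv_y <- cv(y)

print(c(expenditure = cv_x, rainy_days = cv_z, sales = cv_y))
```

Since the coefficient of variation is unit-free, it allows the spread of dollars, days and sales counts to be compared on the same scale.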
(a) The marketing manager has told you that it simply makes sense that there is a strong and positive correlation between marketing expenditure and the number of sales made. Provide some analysis regarding this relationship. What do you conclude from your results?

Solution: If we let $X$ be marketing expenditure and $Y$ be number of sales, we can calculate the sample means to be

$$\bar{X} = 6075 \quad \text{and} \quad \bar{Y} = 1788.375.$$

From the given coefficients of variation, we can then determine the sample standard deviations:

$$s_X = cv_X \times \bar{X} = 3905.3077$$

$$s_Y = cv_Y \times \bar{Y} = 2135.3603$$

The following table can be used to help calculate the sample covariance between $X$ and $Y$:

| Month | $X_i$ = Marketing expenditure (\$) | $Y_i$ = Number of sales | $(X_i - \bar{X})(Y_i - \bar{Y})$ |
|---|---|---|---|
| 1 | 4150 | 778 | 1944971.88 |
| 2 | 3000 | 779 | 3103828.13 |
| 3 | 2500 | 4200 | -8621559.38 |
| 4 | 10600 | 250 | -6961146.88 |
| 5 | 12000 | 300 | -8818621.88 |
| 6 | 8000 | 6000 | 8107378.13 |
| 7 | 1500 | 1500 | 1319315.63 |
| 8 | 6850 | 500 | -998490.63 |
| | $\bar{X} = 6075$ | $\bar{Y} = 1788.375$ | $\sum = -10924325.00$ |

Hence

$$s_{XY} = \frac{-10924325}{8 - 1} = -1560617.86$$

Therefore, the correlation coefficient is

$$r_{XY} = \frac{s_{XY}}{s_X s_Y} = \frac{-1560617.86}{3905.3077 \times 2135.3603} = -0.1871$$

Based on this information we would conclude that, in contrast to the marketing manager's assertion, there is a weak negative linear relationship between marketing expenditure and number of sales.

If we were to do this using R, we would first create vectors for the $X$ and $Y$ values (the order in which you list the numbers is important: they must match up for $X$ and $Y$):

    > x <- c(4150,3000,2500,10600,12000,8000,1500,6850)
    > y <- c(778,779,4200,250,300,6000,1500,500)

The cor function can then be used to calculate the sample correlation coefficient:

    > cor(x,y)
    [1] -0.1871415
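The hand calculation in part (a) can also be reproduced step by step in R, rather than calling cor directly. The following is a minimal sketch; the intermediate names (`dev_prod`, `s_xy`, `r_xy`) are illustrative choices and do not appear in the original solution.

```r
# Marketing expenditure (x) and number of sales (y), in month order.
x <- c(4150, 3000, 2500, 10600, 12000, 8000, 1500, 6850)
y <- c(778, 779, 4200, 250, 300, 6000, 1500, 500)
n <- length(x)

# Products of deviations from the means: the last column of the table.
dev_prod <- (x - mean(x)) * (y - mean(y))

# Sample covariance: sum of the products divided by n - 1.
s_xy <- sum(dev_prod) / (n - 1)

# Sample correlation: covariance divided by the product of the
# sample standard deviations.
r_xy <- s_xy / (sd(x) * sd(y))

# Compare with the built-in functions.
print(c(by_hand = s_xy, built_in = cov(x, y)))
print(c(by_hand = r_xy, built_in = cor(x, y)))
```

Printing `dev_prod` shows that months 3, 4, 5 and 8 contribute negative products, and that together these outweigh the positive contributions from the other four months, which is why the covariance is negative.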
(b) Using the data above, calculate the correlation coefficient between the number of rainy days per month and the number of sales. The covariance between the number of rainy days per month and the number of sales has been calculated as 14012.23.

Solution: Let $Z$ denote the number of rainy days per month. Again from the coefficient of variation, we can calculate the sample standard deviation of $Z$:

$$s_Z = cv_Z \times \bar{Z} = 0.656009 \times 12.125 = 7.9541$$

Since we are given $s_{ZY}$, the sample correlation between $Z$ and $Y$ is equal to:

$$r_{ZY} = \frac{s_{ZY}}{s_Z s_Y} = \frac{14012.23}{7.9541 \times 2135.3603} = 0.8250$$

Using R, we would just need to create the new vector of $Z$ values:

    > z <- c(6,10,25,2,7,20,18,9)

Then we can use the cor function again for the sample correlation coefficient:

    > cor(z,y)
    [1] 0.8249823

(c) What does the result in (b) above suggest? Provide a potential reason for this result.

Solution: Part (b) suggests that there is a very strong positive relationship between the number of rainy days in a month and the number of sales in a month. This might be because when there is a lot of rain, people play an indoor sport on those days, and that sport is table tennis, hence the higher sales of ping-pong balls in rainy months.

Try using R to calculate the sample correlation coefficients from the raw data given in the table.

3. A quality control officer in a chocolate factory records the number of minutes it takes for the company's signature chocolate bar to melt at room temperature. He recorded the following 11 times for 11 different chocolate bars:

14 20 20 12 9 13 35 12 11 12 46

(a) Calculate the mean, mode and median of the times.

Solution: For the mean we have

$$\bar{X} = \frac{\sum_{i=1}^{11} X_i}{11} = \frac{204}{11} = 18.55.$$

For the median, we first sort the observations from smallest to largest:

9 11 12 12 12 13 14 20 20 35 46

Since there is an odd number of observations, the median is the middle (sixth) observation, 13.
For the mode, the time of 12 occurs most often (three times), so the mode is 12.
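Following the R approach used in Question 2, the three summaries can be checked in R. This is a sketch only: the vector name `times` is an assumption, and because base R has no built-in function for the statistical mode (the mode function reports an object's storage type instead), the mode is found here by tabulating the values.

```r
# Melting times (in minutes) for the 11 chocolate bars.
times <- c(14, 20, 20, 12, 9, 13, 35, 12, 11, 12, 46)

mean_time   <- mean(times)    # sum of the observations divided by 11
median_time <- median(times)  # middle value of the sorted observations

# Tabulate how often each distinct time occurs, then take the value
# with the largest count. which.max() returns the first maximum, so
# with tied counts this would report only the smallest tied value.
counts    <- table(times)
mode_time <- as.numeric(names(counts)[which.max(counts)])

print(c(mean = mean_time, median = median_time, mode = mode_time))
```

The mean (about 18.55) sits well above the median (13) because the two large times, 35 and 46, pull the mean upwards while leaving the median unchanged.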