# 7.37

LINEAR REGRESSION - HOMEWORK - GROUP 1 MSDA-3055

7.37. Refer to the CDI data set in Appendix C.2. For predicting the number of active physicians (Y) in a county, it has been decided to include total population (X1) and total personal income (X2) as predictor variables. The question now is whether an additional predictor variable would be helpful in the model and, if so, which variable would be most helpful. Assume that a first-order multiple regression model is appropriate.

a. For each of the following variables, calculate the coefficient of partial determination given that X1 and X2 are included in the model: land area (X3), percent of population 65 or older (X4), number of hospital beds (X5), and total serious crimes (X6).

R-code:

    cdi <- read.csv("C:\\Users\\akhil\\OneDrive\\Documents\\cdi.csv")
    # Fit the multiple regression model
    model <- lm(Number.of.active..physicians ~ Total.population + Total.personal.income + Land.area + Percent.of.population.65.or.older. + Number.of.hospital.beds + Total.serious.crimes, data = cdi)
    # Fit the initial model with X1 and X2 as predictor variables
    model_initial <- lm(Number.of.active..physicians ~ Total.population + Total.personal.income, data = cdi)
    # Calculating the coefficient of partial determination for each additional variable
    partial_determination_X3 <- summary(lm(Number.of.active..physicians ~ Total.population + Total.personal.income + Land.area, data = cdi))$r.squared - summary(model_initial)$r.squared
    partial_determination_X4 <- summary(lm(Number.of.active..physicians ~ Total.population + Total.personal.income + Percent.of.population.65.or.older., data = cdi))$r.squared - summary(model_initial)$r.squared
    partial_determination_X5 <- summary(lm(Number.of.active..physicians ~ Total.population + Total.personal.income + Number.of.hospital.beds, data = cdi))$r.squared - summary(model_initial)$r.squared
    partial_determination_X6 <- summary(lm(Number.of.active..physicians ~ Total.population + Total.personal.income + Total.serious.crimes, data = cdi))$r.squared - summary(model_initial)$r.squared
    # Print the results
    partial_determination_X3
    partial_determination_X4
    partial_determination_X5
    partial_determination_X6

OUTPUT:

    > cdi <- read.csv("C:\\Users\\akhil\\OneDrive\\Documents\\cdi.csv")
    > # Fit the multiple regression model
    > model <- lm(Number.of.active..physicians ~ Total.population + Total.personal.income + Land.area + Percent.of.population.65.or.older. + Number.of.hospital.beds + Total.serious.crimes, data = cdi)
    > # Fit the initial model with X1 and X2 as predictor variables
    > model_initial <- lm(Number.of.active..physicians ~ Total.population + Total.personal.income, data = cdi)
    > # Calculating the coefficient of partial determination for each additional variable
    > partial_determination_X3 <- summary(lm(Number.of.active..physicians ~ Total.population + Total.personal.income + Land.area, data = cdi))$r.squared - summary(model_initial)$r.squared
    > partial_determination_X4 <- summary(lm(Number.of.active..physicians ~ Total.population + Total.personal.income + Percent.of.population.65.or.older., data = cdi))$r.squared - summary(model_initial)$r.squared
    > partial_determination_X5 <- summary(lm(Number.of.active..physicians ~ Total.population + Total.personal.income + Number.of.hospital.beds, data = cdi))$r.squared - summary(model_initial)$r.squared
    > partial_determination_X6 <- summary(lm(Number.of.active..physicians ~ Total.population + Total.personal.income + Total.serious.crimes, data = cdi))$r.squared - summary(model_initial)$r.squared
    > # Print the results
    > partial_determination_X3
    [1] 0.002889597
    > partial_determination_X4
    [1] 0.0003851834
    > partial_determination_X5
    [1] 0.05551826
    > partial_determination_X6
    [1] 0.0007341451

Note: each value above is the increase in R^2 when the variable is added to the model containing X1 and X2, that is, SSR(Xk | X1, X2) / SSTO. The coefficient of partial determination divides the same extra sum of squares by SSE(X1, X2) instead: R^2(Yk|12) = SSR(Xk | X1, X2) / SSE(X1, X2). Because SSTO and SSE(X1, X2) are the same for all four candidates, the two measures rank the variables identically, and number of hospital beds (X5) has the largest value.

b. On the basis of the results in part (a), which of the four additional predictor variables is best? Is the extra sum of squares associated with this variable larger than those for the other three variables?
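Parts (a) and (b) both rest on the extra sum of squares SSR(Xk | X1, X2) = SSE(X1, X2) - SSE(X1, X2, Xk). As a minimal sketch, and not the group's submitted code, the helper below computes that quantity and the coefficient of partial determination SSR(Xk | X1, X2) / SSE(X1, X2) for each candidate. The function name `partial_determination` and its arguments are hypothetical; `deviance()` applied to an `lm` fit returns its error sum of squares.

```r
# Extra sum of squares and coefficient of partial determination for each
# candidate predictor, given the predictors already in the model.
# partial_determination() is a hypothetical helper, not part of base R.
partial_determination <- function(data, response, base, candidates) {
  # Reduced model: response regressed on the base predictors only
  reduced <- lm(reformulate(base, response = response), data = data)
  sse_reduced <- deviance(reduced)  # SSE(X1, X2)

  rows <- lapply(candidates, function(v) {
    # Full model: base predictors plus one candidate
    full <- lm(reformulate(c(base, v), response = response), data = data)
    extra_ss <- sse_reduced - deviance(full)  # SSR(Xk | X1, X2)
    data.frame(variable = v,
               extra_ss = extra_ss,
               partial_r2 = extra_ss / sse_reduced)
  })
  do.call(rbind, rows)
}

# Usage with the column names from the read.csv data, assuming `cdi` is loaded:
# partial_determination(
#   cdi,
#   response = "Number.of.active..physicians",
#   base = c("Total.population", "Total.personal.income"),
#   candidates = c("Land.area", "Percent.of.population.65.or.older.",
#                  "Number.of.hospital.beds", "Total.serious.crimes"))
```

The candidate with the largest `extra_ss` also has the largest `partial_r2`, since both columns share the denominator-free numerator and `sse_reduced` is the same for every row.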
R-code:

    # Calculating the extra sum of squares for each additional variable
    extra_sumsquares_X3 <- anova(lm(Number.of.active..physicians ~ Total.population + Total.personal.income + Land.area, data = cdi))$"Sum Sq"
    extra_sumsquares_X4 <- anova(lm(Number.of.active..physicians ~ Total.population + Total.personal.income + Percent.of.population.65.or.older., data = cdi))$"Sum Sq"
    extra_sumsquares_X5 <- anova(lm(Number.of.active..physicians ~ Total.population + Total.personal.income + Number.of.hospital.beds, data = cdi))$"Sum Sq"
    extra_sumsquares_X6 <- anova(lm(Number.of.active..physicians ~ Total.population + Total.personal.income + Total.serious.crimes, data = cdi))$"Sum Sq"
    # Compare the extra sum of squares
    extra_sumsquares <- c(extra_sumsquares_X3, extra_sumsquares_X4, extra_sumsquares_X5, extra_sumsquares_X6)
    best_variable <- which.max(extra_sumsquares)
    # Print the results
    extra_sumsquares
    best_variable

OUTPUT:

    > # Calculating the extra sum of squares for each additional variable
    > extra_sumsquares_X3 <- anova(lm(Number.of.active..physicians ~ Total.population + Total.personal.income + Land.area, data = cdi))$"Sum Sq"
    > extra_sumsquares_X4 <- anova(lm(Number.of.active..physicians ~ Total.population + Total.personal.income + Percent.of.population.65.or.older., data = cdi))$"Sum Sq"
    > extra_sumsquares_X5 <- anova(lm(Number.of.active..physicians ~ Total.population + Total.personal.income + Number.of.hospital.beds, data = cdi))$"Sum Sq"
    > extra_sumsquares_X6 <- anova(lm(Number.of.active..physicians ~ Total.population + Total.personal.income + Total.serious.crimes, data = cdi))$"Sum Sq"
    > # Compare the extra sum of squares
    > extra_sumsquares <- c(extra_sumsquares_X3, extra_sumsquares_X4, extra_sumsquares_X5, extra_sumsquares_X6)
    > best_variable <- which.max(extra_sumsquares)
    > # Print the results
    > extra_sumsquares
    [1] 136903711 140425434  62896949 139934722
    > best_variable
    [1] 2

Note: the "Sum Sq" column of anova(...) holds one sequential sum of squares per model term plus the residual row, not a single extra sum of squares. The four values printed above are consistent with the error sums of squares SSE(X1, X2, Xk) of the four three-predictor models, so which.max selects the variable that leaves the most unexplained variation (position 2, X4) rather than the most helpful one. The extra sum of squares is SSR(Xk | X1, X2) = SSE(X1, X2) - SSE(X1, X2, Xk), so the variable with the smallest SSE, number of hospital beds (X5, 62896949), has the largest extra sum of squares, in agreement with part (a).

c. Using the F* test statistic, test whether or not the variable determined to be best in part (b) is helpful in the regression model when X1 and X2 are included in the model; use alpha = .01. State the alternatives, decision rule, and conclusion. Would the F* test statistics for the other three potential predictor variables be as large as the one here? Discuss.
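For part (c), the alternatives for a single added variable Xk are H0: beta_k = 0 versus Ha: beta_k not equal to 0, and the test statistic is F* = [SSR(Xk | X1, X2) / 1] / [SSE(X1, X2, Xk) / (n - 4)]. The decision rule is to conclude Ha if F* > F(1 - alpha; 1, n - 4). The following is a minimal sketch of that test, assuming the candidate is a single numeric column; the helper name `partial_f_test` and its arguments are hypothetical.

```r
# Partial F test for adding one predictor to a model that already contains
# the base predictors. partial_f_test() is a hypothetical helper.
# Assumes the candidate contributes exactly one regression coefficient.
partial_f_test <- function(data, response, base, candidate, alpha = 0.01) {
  reduced <- lm(reformulate(base, response = response), data = data)
  full <- lm(reformulate(c(base, candidate), response = response), data = data)

  extra_ss <- deviance(reduced) - deviance(full)  # SSR(Xk | X1, X2)
  df_error <- df.residual(full)                   # n - 4 with two base predictors
  mse_full <- deviance(full) / df_error           # MSE(X1, X2, Xk)

  f_star <- extra_ss / mse_full                   # numerator has 1 df
  f_crit <- qf(1 - alpha, df1 = 1, df2 = df_error)

  list(f_star = f_star,
       f_crit = f_crit,
       p_value = pf(f_star, 1, df_error, lower.tail = FALSE),
       reject_h0 = f_star > f_crit)
}

# Usage with the column names from the code above, assuming `cdi` is loaded:
# partial_f_test(
#   cdi,
#   response = "Number.of.active..physicians",
#   base = c("Total.population", "Total.personal.income"),
#   candidate = "Number.of.hospital.beds",
#   alpha = 0.01)
```

Because F* can also be written as (n - 4) R^2(Yk|12) / (1 - R^2(Yk|12)), it increases with the coefficient of partial determination, so a candidate with a smaller coefficient in part (a) would give a smaller F*.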