LINEAR REGRESSION - HOMEWORK - GROUP 1
MSDA-3055
extra_sumsquares_X5 <- anova(lm(Number.of.active..physicians ~ Total.population +
    Total.personal.income + Number.of.hospital.beds, data = cdi))$"Sum Sq"[4]
extra_sumsquares_X6 <- anova(lm(Number.of.active..physicians ~ Total.population +
    Total.personal.income + Total.serious.crimes, data = cdi))$"Sum Sq"[4]
# Compare the extra sum of squares
extra_sumsquares <- c(extra_sumsquares_X3, extra_sumsquares_X4,
    extra_sumsquares_X5, extra_sumsquares_X6)
best_variable <- which.max(extra_sumsquares)
# Print the results
extra_sumsquares
best_variable
OUTPUT:
> # Calculating the extra sum of squares for each additional variable
> extra_sumsquares_X3 <- anova(lm(Number.of.active..physicians ~ Total.population +
Total.personal.income + Land.area, data = cdi))$"Sum Sq"[4]
> extra_sumsquares_X4 <- anova(lm(Number.of.active..physicians ~ Total.population +
Total.personal.income + Percent.of.population.65.or.older., data = cdi))$"Sum Sq"[4]
> extra_sumsquares_X5 <- anova(lm(Number.of.active..physicians ~ Total.population +
Total.personal.income + Number.of.hospital.beds, data = cdi))$"Sum Sq"[4]
> extra_sumsquares_X6 <- anova(lm(Number.of.active..physicians ~ Total.population +
Total.personal.income + Total.serious.crimes, data = cdi))$"Sum Sq"[4]
> # Compare the extra sum of squares
> extra_sumsquares <- c(extra_sumsquares_X3, extra_sumsquares_X4,
extra_sumsquares_X5, extra_sumsquares_X6)
> best_variable <- which.max(extra_sumsquares)
> # Print the results
> extra_sumsquares
[1] 136903711 140425434  62896949 139934722
> best_variable
[1] 2
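A note on the indexing used above, offered as a check rather than a correction of the recorded run. For a model with three predictors, the table returned by anova() has four rows: one per predictor in the order entered, then Residuals. "Sum Sq"[3] is therefore the sequential sum of squares for the added variable, SSR(Xk | X1, X2), while "Sum Sq"[4] is SSE(X1, X2, Xk). Because SSR(Xk | X1, X2) = SSE(X1, X2) - SSE(X1, X2, Xk), the candidate with the smallest SSE has the largest extra sum of squares. Read that way, the smallest printed value (62896949, the third entry, Number.of.hospital.beds) would mark the best candidate. The sketch below illustrates the row layout on simulated data; the data frame sim and the columns x1, x2, xa, xb are hypothetical stand-ins, not the cdi data.

```r
# Simulated stand-in for the cdi data; all names here are hypothetical.
set.seed(1)
n  <- 100
x1 <- rnorm(n); x2 <- rnorm(n)
xa <- rnorm(n); xb <- rnorm(n)          # two candidate third predictors
y  <- 2 * x1 + x2 + 3 * xa + 0.2 * xb + rnorm(n)
sim <- data.frame(y, x1, x2, xa, xb)

# SSE of the reduced model containing only X1 and X2
sse_reduced <- deviance(lm(y ~ x1 + x2, data = sim))

tab_a <- anova(lm(y ~ x1 + x2 + xa, data = sim))
tab_b <- anova(lm(y ~ x1 + x2 + xb, data = sim))

# Row 3 is the added variable (extra sum of squares); row 4 is Residuals (SSE).
extra_ss <- c(xa = tab_a$"Sum Sq"[3], xb = tab_b$"Sum Sq"[3])
sse_full <- c(xa = tab_a$"Sum Sq"[4], xb = tab_b$"Sum Sq"[4])

# Identity: SSR(Xk | X1, X2) = SSE(X1, X2) - SSE(X1, X2, Xk)
print(all.equal(unname(extra_ss), unname(sse_reduced - sse_full)))

# Largest extra sum of squares and smallest SSE select the same candidate.
print(names(which.max(extra_ss)))
print(names(which.min(sse_full)))
```

Printing rownames(tab_a) is a quick way to confirm which row an index refers to before extracting from it.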
C. Using the F* test statistic, test whether or not the variable determined to be best in part (b)
is helpful in the regression model when X1 and X2 are included in the model; use alpha = .01.
State the alternatives, decision rule, and conclusion. Would the F* test statistics for the other
three potential predictor variables be as large as the one here? Discuss.
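One way to set up this test, sketched under assumptions. With Xk the candidate added to X1 and X2, the alternatives are H0: beta_k = 0 and Ha: beta_k != 0, and the statistic is F* = [SSR(Xk | X1, X2) / 1] / [SSE(X1, X2, Xk) / (n - 4)]. The decision rule is to conclude Ha if F* > F(.99; 1, n - 4), and H0 otherwise. Since SSR(Xk | X1, X2) + SSE(X1, X2, Xk) = SSE(X1, X2) is the same for every candidate, F* increases with the extra sum of squares, so the best candidate from part (b) has the largest F* of the four. The code below runs the computation on simulated data; sim, x1, x2, and xk are hypothetical names, and the numbers it produces say nothing about the cdi results.

```r
# Simulated stand-in for the cdi data; all names here are hypothetical.
set.seed(2)
n  <- 100
x1 <- rnorm(n); x2 <- rnorm(n); xk <- rnorm(n)
y  <- 1 + 2 * x1 + x2 + 1.5 * xk + rnorm(n)
sim <- data.frame(y, x1, x2, xk)

reduced <- lm(y ~ x1 + x2, data = sim)        # model under H0
full    <- lm(y ~ x1 + x2 + xk, data = sim)   # model under Ha

# F* computed by hand from the two error sums of squares
extra_ss <- deviance(reduced) - deviance(full)    # SSR(Xk | X1, X2)
mse_full <- deviance(full) / df.residual(full)    # SSE(X1, X2, Xk) / (n - 4)
f_star   <- (extra_ss / 1) / mse_full

# Critical value for alpha = .01 with 1 and n - 4 degrees of freedom
f_crit <- qf(0.99, df1 = 1, df2 = df.residual(full))

# anova() on the nested pair reports the same F statistic in its second row
f_anova <- anova(reduced, full)$F[2]

reject_h0 <- f_star > f_crit
print(c(f_star = f_star, f_anova = f_anova, f_crit = f_crit))
print(reject_h0)
```

For the homework itself, the same two lm() calls would use Number.of.active..physicians as the response, Total.population and Total.personal.income as X1 and X2, and the chosen candidate as the third predictor, all with data = cdi.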