## Total.Length       74822     45.319   1.549 0.123825
## Diagonal.length   34.389     30.518   1.126 0.262350
## Height            lo.ce0     13.395   0.746 0.456692
## Width                5.3     24.483   0.341 0.733924

## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 93.34 on 136 degrees of freedom
## Multiple R-squared:  0.9385, Adjusted R-squared:  0.9335
## F-statistic: 188.6 on 11 and 136 DF,  p-value: < 2.2e-16
(c) Display the VIF of each predictor for model2. Using a VIF threshold of max(10, 1/(1 - R^2)), what conclusions can you draw?
vif(model2)
##                       GVIF Df GVIF^(1/(2*Df))
## Species         1545.55017  6        1.843983
## Body.Height     2371.15420  1       48.694499
## Total.length    4540.47695  1       67.383062
## Diagonal.length 2126.64985  1       46.115614
## Height            56.21375  1        7.497583
## Width             29.01683  1        5.386727
# Calculate the 1/(1 - R^2) value for the VIF threshold
rsq <- 0.9385
vif_thresh <- 1/(1 - rsq)
vif_thresh
## [1] 16.26016
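The VIF threshold is max(10, 16.26) = 16.26. Every term's GVIF exceeds this threshold, so there appears to be strong multicollinearity among the predictors. Body Height, Total Length and Diagonal Length have extremely high values (above 2,000), meaning they must be highly correlated with other predictors; Height (56.2) and Width (29.0) exceed the threshold by a smaller margin. Species also has a very large GVIF, but because it is a factor with 6 degrees of freedom, its raw GVIF is not directly comparable to the single-coefficient VIFs.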
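The comparison with the threshold can be reproduced from the reported numbers alone. The following is a minimal sketch, not the author's code: it hard-codes the GVIF and Df columns from the vif(model2) output instead of refitting the model, and the names `gvif`, `df`, `flagged`, and `flagged_adj` are hypothetical. It also squares GVIF^(1/(2*Df)), which is one common way to put a multi-df term such as Species on the same scale as an ordinary VIF.

```r
# GVIF and Df values copied from the vif(model2) output (not recomputed here)
gvif <- c(Species = 1545.55017, Body.Height = 2371.15420,
          Total.length = 4540.47695, Diagonal.length = 2126.64985,
          Height = 56.21375, Width = 29.01683)
df <- c(Species = 6, Body.Height = 1, Total.length = 1,
        Diagonal.length = 1, Height = 1, Width = 1)

# Threshold max(10, 1/(1 - R^2)) using the reported multiple R-squared
rsq <- 0.9385
vif_thresh <- max(10, 1 / (1 - rsq))

# Terms whose raw GVIF exceeds the threshold
flagged <- names(gvif)[gvif > vif_thresh]

# Df-adjusted scale: (GVIF^(1/(2*Df)))^2 equals the VIF when Df = 1
gvif_adj <- gvif^(1 / (2 * df))
flagged_adj <- names(gvif_adj)[gvif_adj^2 > vif_thresh]

print(round(vif_thresh, 5))
print(flagged)
print(flagged_adj)
```

On the raw scale all six terms are flagged; on the Df-adjusted scale Species drops out while the five body measurements remain above the threshold.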
Question 4: Checking Model Assumptions [6 points]
Please use the cleaned data set, which has the outlier(s) removed, and model2 for answering the following questions.
(a) Create scatterplots of the standardized residuals of model2 versus each quantitative predictor. Does the linearity assumption appear to hold for all predictors?
res <- residuals(model2)  # raw residuals; rstandard(model2) gives standardized ones
par(mfrow=c(2,3))
plot(fish.red$Body.Height, res, xlab="Body Height", ylab="Residuals"); abline(h=0)
plot(fish.red$Total.Length, res, xlab="Total Length", ylab="Residuals"); abline(h=0)
plot(fish.red$Diagonal.Length, res, xlab="Diagonal Length", ylab="Residuals"); abline(h=0)
plot(fish.red$Height, res, xlab="Height", ylab="Residuals"); abline(h=0)
plot(fish.red$Width, res, xlab="Width", ylab="Residuals"); abline(h=0)
[Figure: residuals plotted against Body Height, Total Length, Diagonal Length, Height, and Width]
In the scatterplots of residuals versus each predictor, the points are only loosely scattered at random around the zero line. The Height and Width plots show the most random scatter; however, in every plot more residuals fall below the zero line than above it, so the linearity assumption does not clearly hold for all predictors.
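The question asks for standardized residuals, which base R provides through `rstandard()`. The sketch below is illustrative only and rests on stated assumptions: the built-in `mtcars` data stands in for the fish data, `fit_demo` stands in for model2, and the lowess smoother is an added aid for spotting curvature rather than part of the original analysis.

```r
# Illustrative only: mtcars and fit_demo stand in for the fish data and model2
fit_demo <- lm(mpg ~ wt + hp + disp, data = mtcars)
std_res <- rstandard(fit_demo)   # standardized (internally studentized) residuals

predictors <- c("wt", "hp", "disp")
pdf(NULL)                        # discard graphics output; remove this line to view the plots
par(mfrow = c(1, 3))
for (p in predictors) {
  plot(mtcars[[p]], std_res, xlab = p, ylab = "Standardized residuals")
  abline(h = 0, lty = 2)                             # zero reference line
  lines(lowess(mtcars[[p]], std_res), col = "red")   # smoother to reveal curvature
}
invisible(dev.off())

# Share of residuals below zero, a quick check of the imbalance described above
below_zero <- mean(std_res < 0)
print(below_zero)
```

A smoother that stays close to the zero line supports linearity in that predictor; a clear bend suggests a transformation or an added term may be needed.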
(b) Create a scatter plot of the standardized residuals of model2 versus the fitted values of model2. Does the constant variance assumption appear to hold? Do the errors appear uncorrelated?
fit <- model2$fitted.values
# Residuals are on the x-axis here, so the zero-residual reference line is vertical
plot(res, fit, xlab="Residuals", ylab="Fitted Values"); abline(v=0)
[Figure: fitted values (y-axis) plotted against residuals (x-axis)]
The scatter plot of fitted values vs. residuals does not show a random scatter around zero, indicating that the constant variance assumption does not hold. The scatter also shows somewhat of a megaphone effect, with the spread of the residuals widening as the fitted values grow. The errors appear to be uncorrelated, as no clusters seem to form in the residual plot.
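The visual reading above can be complemented with numerical checks. The sketch below is a hedged illustration, not part of the original analysis: `mtcars` and `fit_demo` stand in for the fish data and model2, the variance check is a hand-rolled studentized Breusch-Pagan (Koenker) test against the fitted values, and the correlation check is the Durbin-Watson statistic computed from its definition. All variable names are hypothetical.

```r
# Illustrative only: mtcars and fit_demo stand in for the fish data and model2
fit_demo <- lm(mpg ~ wt + hp + disp, data = mtcars)
e <- residuals(fit_demo)
n <- length(e)

# Studentized Breusch-Pagan (Koenker) test against the fitted values:
# regress squared residuals on the fitted values; under constant variance
# n * R^2 is approximately chi-square with 1 degree of freedom
aux <- lm(I(e^2) ~ fitted(fit_demo))
bp_stat <- n * summary(aux)$r.squared
bp_pval <- pchisq(bp_stat, df = 1, lower.tail = FALSE)

# Durbin-Watson statistic: values near 2 suggest no first-order autocorrelation.
# It is only meaningful when the row order carries meaning (e.g. time order).
dw_stat <- sum(diff(e)^2) / sum(e^2)

print(c(BP = bp_stat, p_value = bp_pval, DW = dw_stat))
```

A small p-value would agree with the megaphone pattern seen in the plot, while a Durbin-Watson value far from 2 would cast doubt on the uncorrelated-errors reading.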
(c) Create a histogram and normal QQ plot for the standardized residuals. What conclusions can you draw from these plots?
par(mfrow=c(1,2))
hist(res, xlab="Residuals", main="Histogram of Residuals")
qqnorm(res); qqline(res)
[Figure: Histogram of Residuals (left) and Normal Q-Q Plot (right)]