In most practical problems, the analyst has a rather large pool of
possible candidate regressors, of which only a few are likely to be
important.
Finding an appropriate subset of regressors for the model is often called the variable selection problem.
Building a regression model that includes only a subset of the
available regressors involves two conflicting objectives.
We would like the model to include as many regressors as possible so that the information content in these factors can influence the predicted value of y.
At the same time, we want the model to include as few regressors as possible because the variance of the prediction of y increases as the number of regressors increases.
Also, the more regressors there are in a model, the greater the costs of data collection and model maintenance.
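The cost of adding regressors can be made concrete with a small numerical sketch. Under the usual least-squares assumptions (uncorrelated errors with constant variance sigma^2), the variance of the i-th fitted value is sigma^2 times the i-th diagonal element of the hat matrix, and those diagonal elements sum to the number of model parameters p. The average prediction variance over the data is therefore sigma^2 * p / n, which grows with every regressor added. The data below are simulated and the helper name mean_prediction_variance is hypothetical, chosen only for illustration.

```python
import numpy as np

def mean_prediction_variance(X, sigma2=1.0):
    """Average Var(y_hat_i) over the rows of X for a least-squares fit,
    assuming uncorrelated errors with constant variance sigma2."""
    # Leverages h_ii are the squared row norms of Q in the reduced QR of X.
    Q, _ = np.linalg.qr(X)
    leverages = np.sum(Q**2, axis=1)
    return sigma2 * leverages.mean()

rng = np.random.default_rng(0)
n, k_max = 50, 8
Z = rng.normal(size=(n, k_max))  # simulated pool of candidate regressors

results = []
for k in range(k_max + 1):
    # Model with an intercept plus the first k candidate regressors.
    X = np.column_stack([np.ones(n), Z[:, :k]])
    results.append((k, mean_prediction_variance(X)))

for k, v in results:
    print(f"{k} regressors: mean Var(y_hat) / sigma^2 = {v:.3f}")
```

Each printed value equals (k + 1) / n, so the average prediction variance rises steadily as regressors are added. The sketch shows only the variance side of the trade-off; it says nothing about the bias that can result from leaving out a regressor that matters.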
The process of finding a model that is a compromise between these two objectives is called selecting the "best" regression equation.
Consequences of Model Misspecification