Clustering CARS Dataset
The first task was to evaluate the linear relationships among the quantitative variables of
the CARS dataset using correlations and scatterplots. Figure 1 shows the code, and Figure 2
shows the resulting correlation matrix, which quantifies the strength and direction of the linear
relationship between each pair of variables. The matrix displays the correlation coefficients between MSRP, invoice price, engine
size, number of cylinders, horsepower, MPG in the city and on the highway, weight, wheelbase,
and length for a set of automobiles. The values in the matrix range from -1 to 1, where 1
represents a perfect positive correlation, -1 represents a perfect negative correlation, and 0
represents no linear relationship. The closer the absolute value of a correlation coefficient is to 1, the
stronger the linear relationship between the two variables. The matrix shows that
MSRP and invoice price are highly correlated (0.99913) as expected, and both are strongly
correlated with engine size, cylinders, horsepower, and weight. City and highway MPG are
negatively correlated with the other variables, meaning that MPG tends to decrease as those
variables increase. Wheelbase and length are weakly correlated with the other
variables. Overall, the correlation matrix can help identify potential relationships between
variables and guide the selection of predictors in a regression model. However, correlation
does not imply causation, and other factors should be considered
before drawing conclusions or making modeling decisions (Cheng et al., 2021; Data Flair, n.d.; SAS
Support, 2012).
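As a rough illustration of how a correlation matrix of this kind is computed, the following Python sketch calculates pairwise Pearson coefficients by hand. It is not the code shown in Figure 1, and the three columns of values are invented for illustration rather than taken from the CARS dataset. The names pearson, correlation_matrix, and toy are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Sum of cross-products of deviations (unnormalized covariance).
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations (unnormalized variances).
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Pairwise Pearson correlations for a dict mapping name -> list of values."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Invented illustrative values, NOT rows from the CARS dataset.
toy = {
    "msrp": [20000, 25000, 32000, 41000, 55000],
    "horsepower": [120, 150, 200, 240, 300],
    "mpg_city": [32, 28, 24, 20, 17],
}

matrix = correlation_matrix(toy)
for name, row in matrix.items():
    print(name, {other: round(r, 3) for other, r in row.items()})
```

In this toy data, price and horsepower rise together, so their coefficient is close to 1, while city MPG falls as price rises, so that coefficient is negative. This mirrors the pattern described above, although the magnitudes are not comparable to those in Figure 2.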
Figure 1
Code for Correlations and Scatterplots