Clustering CARS Dataset
First Last
Colorado State University Global
22WC-MIS450-1: Data Mining
Dr. Steve Chung
February 19, 2023
Clustering CARS Dataset
The first task was to evaluate the linear relationships between the quantitative variables of the CARS dataset using correlations and scatterplots. Figure 1 shows the code, and Figure 2 shows the resulting correlation matrix, which quantifies the strength and direction of the linear relationship between each pair of variables. The matrix reports the correlation coefficients among MSRP, invoice price, engine size, number of cylinders, horsepower, city MPG, highway MPG, weight, wheelbase, and length for the automobiles in the dataset. The coefficients range from -1 to 1: 1 represents a perfect positive correlation, -1 a perfect negative correlation, and 0 no linear relationship. The closer the absolute value of a coefficient is to 1, the stronger the linear relationship between the two variables. The matrix shows that MSRP and invoice price are, as expected, almost perfectly correlated (r = 0.99913), and both are strongly and positively correlated with engine size, cylinders, horsepower, and weight. City and highway MPG are negatively correlated with the non-MPG variables, meaning that as those variables increase, fuel economy decreases; larger, heavier, more powerful vehicles tend to get fewer miles per gallon. Wheelbase and length are weakly correlated with the other variables. Overall, a correlation matrix can help identify potential relationships between variables and guide the selection of predictors in a regression model. However, correlation does not imply causation, and other factors should be considered before drawing conclusions or making modeling decisions (Cheng et al., 2021; Data Flair, n.d.; SAS Support, 2012).
Figure 1
Code for Correlations and Scatterplots
Figure 2
Correlation Matrix
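The analysis code itself appears only as an image in Figure 1, so the following is not the code used in this paper. It is a minimal Python sketch of how each cell of a correlation matrix like Figure 2 is computed, using the Pearson coefficient r = cov(x, y) / (s_x * s_y). The helper names `pearson_r` and `correlation_matrix` and the `toy` values are hypothetical and were chosen only to show the sign behavior described above. They are not taken from the CARS dataset.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length numeric sequences.

    Assumes neither sequence is constant (a constant column has zero spread,
    so r is undefined and this would divide by zero).
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-deviations (proportional to the covariance).
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Root sums of squared deviations (proportional to the standard deviations).
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    # The common 1/(n-1) factors cancel, leaving r in [-1, 1].
    return cross / (spread_x * spread_y)

def correlation_matrix(columns):
    """Return {(name_a, name_b): r} for every ordered pair of named columns."""
    names = list(columns)
    return {(a, b): pearson_r(columns[a], columns[b]) for a in names for b in names}

# Hypothetical toy values (NOT the CARS data), used only for illustration.
toy = {
    "Horsepower": [130, 200, 250, 300, 400],
    "Weight": [2600, 3200, 3500, 3900, 4500],
    "MPG_City": [32, 24, 21, 18, 14],
}

if __name__ == "__main__":
    matrix = correlation_matrix(toy)
    for (a, b), r in matrix.items():
        if a < b:  # print each unordered pair once
            print(f"{a} vs {b}: r = {r:+.3f}")
```

With these made-up values, Horsepower and Weight rise together and give a positive r, while MPG_City falls as either rises and gives a negative r. This mirrors the pattern the matrix shows between fuel economy and the size and power variables. The diagonal is always 1 because every variable is perfectly correlated with itself.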