School: Southern New Hampshire University
Course: MAT 243
Subject: Statistics
Date: Aug 12, 2023
Pages: 2

Uploaded by MoonyBlues0019 on coursehero.com


Module Six Discussion: Multiple Regression
This notebook contains the step-by-step directions for your Module Six discussion. It is very important to run through the steps in order, because some steps depend on the outputs of earlier steps. Once you have completed the steps in this notebook, be sure to answer the questions about this activity in the discussion for this module.
Reminder: If you have not already reviewed the discussion prompt, please do so before beginning this activity. That will give you an idea of the questions you will need to answer with the outputs of this script.
Initial post (due Thursday)
Step 1: Generating cars dataset
This block of Python code will generate the sample data for you. This week you will not generate the data set with the numpy module. Instead, the data set will be imported from a CSV file. To make the data unique to you, a random sample of size 30, without replacement, will be drawn from the data in the CSV file. The sample will be saved in a pandas dataframe that will be used in later calculations.
Click the block of code below and hit the Run button above.
In [1]:
import pandas as pd
from IPython.display import display, HTML

# read data from mtcars.csv data set.
cars_df_orig = pd.read_csv("https://s3-us-west-2.amazonaws.com/data-analytics.zybooks.com/mtcars.csv")

# randomly pick 30 observations from the data set to make the data set unique to you.
cars_df = cars_df_orig.sample(n=30, replace=False)

# print only the first five observations in the dataset.
print("Cars data frame (showing only the first five observations)\n")
display(HTML(cars_df.head().to_html()))
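The idea of sampling without replacement can be illustrated without pandas. The sketch below is only an illustration, not part of the assignment: it assumes the CSV file holds 32 rows (the size of the classic mtcars data set), and the function name and seed values are hypothetical. It shows that every sampled row label is distinct and that fixing a seed makes the draw repeatable. In pandas, the analogous control is the random_state argument of the sample method.

```python
import random

# Assumption: the source CSV has 32 rows, as in the classic mtcars data set.
N_ROWS = 32
SAMPLE_SIZE = 30


def sample_row_labels(seed):
    """Draw SAMPLE_SIZE distinct row labels from 0..N_ROWS-1 (no replacement)."""
    rng = random.Random(seed)  # a private generator, so the seed fully fixes the draw
    return rng.sample(range(N_ROWS), SAMPLE_SIZE)


first = sample_row_labels(seed=1)
second = sample_row_labels(seed=1)

# Without replacement: 30 labels drawn, all of them different.
print(len(first), len(set(first)))

# Same seed, same sample.
print(first == second)
```

Because the notebook code does not fix a seed, each run draws a different sample, so your numbers will generally differ from those of other students.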
Step 2: Scatterplot of miles per gallon against weight
The block of code below will create a scatterplot of the variables "miles per gallon" (coded as mpg in the data set) and "weight" of the car (coded as wt).
Click the block of code below and hit the Run button above.
NOTE: If the plot is not created, click the code section and hit the Run button again.
In [3]:
import matplotlib.pyplot as plt

# create scatterplot of variables mpg against wt.
plt.plot(cars_df["wt"], cars_df["mpg"], 'o', color='red')
# set a title for the plot, x-axis, and y-axis.
plt.title('MPG against Weight')
plt.xlabel('Weight (1000s lbs)')
plt.ylabel('MPG')
# show the plot.
plt.show()
Step 3: Scatterplot of miles per gallon against horsepower
The block of code below will create a scatterplot of the variables "miles per gallon" (coded as mpg in the data set) and "horsepower" of the car (coded as hp).
Click the block of code below and hit the Run button above.
NOTE: If the plot is not created, click the code section and hit the Run button again.
In [4]:
import matplotlib.pyplot as plt

# create scatterplot of variables mpg against hp.
plt.plot(cars_df["hp"], cars_df["mpg"], 'o', color='blue')
# set a title for the plot, x-axis, and y-axis.
plt.title('MPG against Horsepower')
plt.xlabel('Horsepower')
plt.ylabel('MPG')
# show the plot.
plt.show()
Cars data frame (showing only the first five observations)
            Unnamed: 0   mpg  cyl   disp   hp  drat     wt   qsec  vs  am  gear  carb
3       Hornet 4 Drive  21.4    6  258.0  110  3.08  3.215  19.44   1   0     3     1
0            Mazda RX4  21.0    6  160.0  110  3.90  2.620  16.46   0   1     4     4
14  Cadillac Fleetwood  10.4    8  472.0  205  2.93  5.250  17.98   0   0     3     4
16   Chrysler Imperial  14.7    8  440.0  230  3.23  5.345  17.42   0   0     3     4
18         Honda Civic  30.4    4   75.7   52  4.93  1.615  18.52   1   1     4     2

Step 4: Correlation matrix for miles per gallon, weight and horsepower
Now you will calculate the correlation coefficient between the variables "miles per gallon" and "weight". You will also calculate the correlation coefficient between the variables "miles per gallon" and "horsepower". The corr method of a dataframe returns the correlation matrix, which holds the correlation coefficients between all pairs of variables in the dataframe. You will restrict the matrix to these three variables.
Click the block of code below and hit the Run button above.
In [5]:
# create correlation matrix for mpg, wt, and hp.
# The correlation coefficient between mpg and wt is contained in the cell for mpg row and wt column (or wt row and mpg column).
# The correlation coefficient between mpg and hp is contained in the cell for mpg row and hp column (or hp row and mpg column).
mpg_wt_corr = cars_df[['mpg','wt','hp']].corr()
print(mpg_wt_corr)
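To see what each cell of the correlation matrix represents, the Pearson correlation coefficient can be computed by hand. The sketch below is a minimal illustration, not the pandas implementation: the helper name pearson_r is hypothetical, and the data are only the five wt and mpg values visible in the preview of the data frame, so the result will not match the coefficient for the full sample of 30.

```python
import math


def pearson_r(x, y):
    """Pearson correlation: co-variation of x and y scaled by their spreads."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    co_variation = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sum((a - mean_x) ** 2 for a in x)
    spread_y = sum((b - mean_y) ** 2 for b in y)
    return co_variation / math.sqrt(spread_x * spread_y)


# The five rows shown in the preview of the data frame (illustration only).
wt = [3.215, 2.620, 5.250, 5.345, 1.615]
mpg = [21.4, 21.0, 10.4, 14.7, 30.4]

r = pearson_r(wt, mpg)
print(round(r, 3))  # negative: heavier cars tend to have lower mpg
```

The coefficient is symmetric, which is why the mpg row and wt column holds the same value as the wt row and mpg column.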
Step 5: Multiple regression model to predict miles per gallon using weight and horsepower
This block of code produces a multiple regression model with "miles per gallon" as the response variable, and "weight" and "horsepower" as predictor variables. The ols function in the statsmodels.formula.api submodule specifies the model, and the summary of the fitted model reports its statistics.
Click the block of code below and hit the Run button above.
In [6]:
from statsmodels.formula.api import ols

# create the multiple regression model with mpg as the response variable; weight and horsepower as predictor variables.
model = ols('mpg ~ wt+hp', data=cars_df).fit()
print(model.summary())
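Once the model is fitted, its coefficients define a prediction equation of the form mpg = intercept + b_wt * wt + b_hp * hp. The sketch below plugs in the coefficients reported in the summary output of this particular run (Intercept 37.2391, wt -3.8822, hp -0.0318); your own sample will give different values. The function name predict_mpg and the example car (3,000 lbs, 150 horsepower) are hypothetical.

```python
# Coefficients copied from this run's regression summary; yours will differ.
INTERCEPT = 37.2391
COEF_WT = -3.8822   # change in mpg per additional 1000 lbs, holding hp fixed
COEF_HP = -0.0318   # change in mpg per additional horsepower, holding wt fixed


def predict_mpg(wt, hp):
    """Evaluate the fitted equation mpg = intercept + b_wt*wt + b_hp*hp."""
    return INTERCEPT + COEF_WT * wt + COEF_HP * hp


# Hypothetical car: weight 3.0 (in 1000s of lbs) and 150 horsepower.
example = predict_mpg(wt=3.0, hp=150)
print(round(example, 2))  # about 20.82 mpg
```

Both slopes are negative, so the equation predicts lower fuel economy for a heavier or a more powerful car.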
End of initial post
Attach the HTML output to your initial post in the Module Six discussion. The HTML output can be downloaded by clicking File, then Download as, then HTML. Be sure to answer all questions about this activity in the Module Six discussion.
Follow-up posts (due Sunday)
Return to the Module Six discussion to answer the follow-up questions in your response posts to other students. There are no Python scripts to run for your follow-up posts.
          mpg        wt        hp
mpg  1.000000 -0.861996 -0.768893
wt  -0.861996  1.000000  0.646514
hp  -0.768893  0.646514  1.000000
                            OLS Regression Results
==============================================================================
Dep. Variable:                    mpg   R-squared:                       0.820
Model:                            OLS   Adj. R-squared:                  0.807
Method:                 Least Squares   F-statistic:                     61.49
Date:                Fri, 04 Aug 2023   Prob (F-statistic):           8.86e-11
Time:                        02:10:23   Log-Likelihood:                -70.644
No. Observations:                  30   AIC:                             147.3
Df Residuals:                      27   BIC:                             151.5
Df Model:                           2
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept     37.2391      1.708     21.799      0.000      33.734      40.744
wt            -3.8822      0.663     -5.857      0.000      -5.242      -2.522
hp            -0.0318      0.009     -3.397      0.002      -0.051      -0.013
==============================================================================
Omnibus:                        4.563   Durbin-Watson:                   1.967
Prob(Omnibus):                  0.102   Jarque-Bera (JB):                3.511
Skew:                           0.835   Prob(JB):                        0.173
Kurtosis:                       3.132   Cond. No.                         590.
==============================================================================
Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
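Several entries in the summary are tied together by standard formulas, which gives a quick way to sanity-check the output. The sketch below recomputes the adjusted R-squared and the overall F-statistic from the reported R-squared (0.820), the number of observations (30), and the number of predictors (2). The function names are hypothetical, and because the reported R-squared is rounded to three decimals, the results agree with the summary only approximately.

```python
def adjusted_r_squared(r2, n, k):
    """Adjusted R-squared for n observations and k predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


def f_statistic(r2, n, k):
    """Overall F-statistic: explained variation per predictor over
    unexplained variation per residual degree of freedom."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))


# Values taken from the summary above: R-squared, No. Observations, Df Model.
adj = adjusted_r_squared(0.820, n=30, k=2)
f_stat = f_statistic(0.820, n=30, k=2)

print(round(adj, 3))     # 0.807, matching Adj. R-squared
print(round(f_stat, 1))  # 61.5, close to the reported 61.49
```

Note that n - k - 1 = 27 is the value shown as Df Residuals in the summary.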
