
ECON20003 - Tutorial 12
ECON20003 - QUANTITATIVE METHODS 2
Semester 2 - 2023
TUTORIAL 12
Download the t12e3 Excel data file from the subject website and save it to your computer or USB flash drive. Read this handout and try to complete the tutorial exercises before your tutorial class, so that you can ask your tutor for help during the Zoom session if necessary.
Dummy Dependent Variable Regression Models (cont.)
In the previous tutorial we already discussed the simplest dummy dependent variable
regression model, the so-called linear probability model (LPM). This time we turn our
attention to the other two models, the logit model and the probit model.
We concluded Tutorial 11 with the two potentially most serious disadvantages of the LPM, namely that the estimated dependent variable, which is an estimate of the probability of success, might turn out to be negative or greater than one, and that the marginal effect of a quantitative independent variable on the probability of success is restricted to be constant. The logit and probit models provide possible solutions to both of these problems.
Logit model
The logit model is based on the logistic cumulative distribution function (CDF),

$$F(v) = \frac{1}{1 + e^{-v}}$$

Accordingly, in the logit model the probability of success is

$$P = F(Z) = \frac{1}{1 + e^{-Z}}$$
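As a quick numerical check (sketched here in Python for illustration; the tutorial itself uses R), the logistic CDF always returns values strictly between 0 and 1, which is exactly what fixes the first problem of the LPM:

```python
import math

def logistic_cdf(v):
    """Logistic CDF: F(v) = 1 / (1 + exp(-v))."""
    return 1.0 / (1.0 + math.exp(-v))

# Unlike the fitted values of an LPM, these probabilities can never
# fall outside the (0, 1) interval, however extreme v is.
for v in (-5.0, -1.0, 0.0, 1.0, 5.0):
    print(f"F({v:+.1f}) = {logistic_cdf(v):.4f}")
```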
The marginal effect of the independent variable on the probability of success is

$$\frac{dP}{dX} = f(Z)\,\beta_1$$

where $f(Z)$ is the probability density function (PDF), i.e. the derivative of the CDF, and $\beta_1$ is the slope coefficient of $X$ in the index $Z$. For the logit model (logistic distribution), it is

$$f(Z) = \frac{dF(Z)}{dZ} = \frac{e^{-Z}}{(1 + e^{-Z})^2}$$
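The marginal-effect formula is just the chain rule applied to $P = F(Z)$, so $f(Z)\beta_1$ can be checked against a finite-difference derivative of $P$ with respect to $X$. The coefficients below are hypothetical, chosen only for this Python illustration:

```python
import math

def F(v):  # logistic CDF
    return 1.0 / (1.0 + math.exp(-v))

def f(v):  # logistic PDF: e^(-v) / (1 + e^(-v))^2
    return math.exp(-v) / (1.0 + math.exp(-v)) ** 2

b0, b1 = -1.0, 0.5        # hypothetical coefficients: Z = b0 + b1 * X
X = 2.0
Z = b0 + b1 * X

analytic = f(Z) * b1      # dP/dX = f(Z) * beta_1
h = 1e-6                  # finite-difference check of the same derivative
numeric = (F(b0 + b1 * (X + h)) - F(b0 + b1 * (X - h))) / (2 * h)
print(analytic, numeric)
```

At these values $Z = 0$, so $f(Z) = 1/4$ and the marginal effect is $0.25 \times 0.5 = 0.125$; the numerical derivative agrees.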
¹ For this reason, the logit model is also referred to as the logistic model (for example, in the Selvanathan book).
L. Kónya

Probit model
The probit model is based on the standard normal CDF,

$$F(v) = \int_{-\infty}^{v} \frac{1}{\sqrt{2\pi}}\, e^{-u^2/2}\, du$$
In this case the probability of success is given by

$$P = F(Z) = \int_{-\infty}^{Z} \frac{1}{\sqrt{2\pi}}\, e^{-u^2/2}\, du$$
and the probability density function (PDF) is

$$f(Z) = \frac{dF(Z)}{dZ} = \frac{1}{\sqrt{2\pi}}\, e^{-Z^2/2}$$
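Although the standard normal CDF has no closed form, it can be evaluated via the error function. A minimal Python sketch (in R one would simply call pnorm() and dnorm()):

```python
import math

def norm_cdf(v):
    """Standard normal CDF via the error function:
    Phi(v) = 0.5 * (1 + erf(v / sqrt(2)))."""
    return 0.5 * (1.0 + math.erf(v / math.sqrt(2.0)))

def norm_pdf(v):
    """Standard normal PDF: exp(-v^2 / 2) / sqrt(2 * pi)."""
    return math.exp(-v ** 2 / 2.0) / math.sqrt(2.0 * math.pi)

print(norm_cdf(1.96))   # the familiar ~0.975 from the standard normal table
```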
The standard normal CDF is clearly more complicated than the logistic CDF, but in practice this does not pose any real problem because its values are tabulated in the standard normal table and can also be obtained easily with statistical programs like R.
The logit and probit models are nonlinear regression models, and they cannot be estimated with OLS. Instead, they are estimated with the maximum likelihood (ML) method. We do not discuss the details of this procedure, but fortunately with R it can be implemented as easily as the OLS method.
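To give a flavour of what ML estimation involves under the hood, here is a bare-bones Newton-Raphson logit fit on simulated data. This is only a Python sketch of the kind of iterative scheme glm() runs internally (with hypothetical, simulated coefficients); it is not something you need for the tutorial:

```python
import numpy as np

def fit_logit_ml(x, y, n_iter=25):
    """Fit a logit model by maximum likelihood using Newton-Raphson.
    A minimal sketch: no convergence checks or standard errors."""
    X = np.column_stack([np.ones(len(y)), x])      # add an intercept
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))        # fitted probabilities
        W = p * (1.0 - p)                          # logistic weights
        score = X.T @ (y - p)                      # gradient of the log-likelihood
        info = X.T @ (X * W[:, None])              # information matrix
        beta = beta + np.linalg.solve(info, score) # Newton step
    return beta

# Simulated data with known coefficients (illustrative only)
rng = np.random.default_rng(0)
x = rng.normal(size=500)
p_true = 1.0 / (1.0 + np.exp(-(-0.5 + 1.2 * x)))
y = (rng.random(500) < p_true).astype(float)

beta_hat = fit_logit_ml(x, y)
print(beta_hat)   # ML estimates; should land near the true (-0.5, 1.2)
```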
The logit and probit regressions are interpreted differently, but usually they lead to very similar inferences and conclusions, except under the tails of the distributions, i.e., for relatively small and large values.
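This closeness away from the tails can be illustrated numerically. Rescaling the logit index by about 1.6 (a rough conversion factor often used to relate probit and logit coefficients, assumed here for illustration) makes the two CDFs nearly coincide around the centre of the distribution, while the tail probabilities still differ by a sizeable factor:

```python
import math

def logistic_cdf(v):
    return 1.0 / (1.0 + math.exp(-v))

def norm_cdf(v):
    return 0.5 * (1.0 + math.erf(v / math.sqrt(2.0)))

# Compare Phi(z) with the logistic CDF evaluated at 1.6*z.
for z in (0.0, 0.5, 1.0, 2.0, 3.0):
    print(f"z = {z:.1f}: probit = {norm_cdf(z):.4f}, "
          f"rescaled logit = {logistic_cdf(1.6 * z):.4f}")
```

In the middle the two probabilities agree to about two decimal places, but at z = 3 the logit tail probability is several times larger than the probit one.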
In R logit and probit models can be estimated with the

glm(formula = y ~ x1 + x2 + ..., family = familytype(link = linkfunction))

function, where formula is like in the lm() function and family is binomial(link = "logit") for the logit model and binomial(link = "probit") for the probit model.
We are going to return to Exercise 4 of Tutorial 11 to illustrate logit and probit models.

Exercise 1 (HGL, p. 694, ex. 16.6)

Complete the following tasks using the same data as in Exercise 4 of Tutorial 11.

a) Estimate a logit model and briefly evaluate and interpret the results.
Import the data from the t11e4.xlsx file and execute the

logit = glm(COKE ~ PRATIO + DISP_COKE + DISP_PEPSI,
            data = t11e4,   # assuming the imported data frame is named t11e4
            family = binomial(link = "logit"))
summary(logit)

commands to obtain the summary printout.
There are several details on this printout that warrant some explanation.
(i) The corresponding logit and LPM coefficients cannot be compared directly to each other because they measure different things, but their logical signs are the same.

(ii) Note that, instead of t-ratios, this time R reports z-ratios.² As you can see, just like in LPM, all three slopes are significant individually in the logical direction even at the 1.5% level.³
(iii) Below Call, which reminds us of the command we just executed, R reports the usual location statistics for the deviance residuals. Deviance is the generalization of the idea of using SSE to evaluate the goodness of fit of regressions estimated by OLS to regressions estimated by the ML method. Like in the case of SSE, the smaller the deviance the better the fit.
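For ungrouped binary data the saturated model fits every observation perfectly and has log-likelihood zero, so the deviance reduces to −2 times the maximized log-likelihood. A small Python illustration with made-up predicted probabilities:

```python
import math

def deviance(y, p):
    """Deviance of a binary-response model: -2 * log-likelihood
    (the saturated log-likelihood is 0 for ungrouped binary data)."""
    ll = sum(math.log(pi) if yi == 1 else math.log(1.0 - pi)
             for yi, pi in zip(y, p))
    return -2.0 * ll

y = [1, 0, 1, 1, 0]                  # observed outcomes (illustrative)
p_good = [0.9, 0.1, 0.8, 0.7, 0.2]   # well-fitting predicted probabilities
p_bad = [0.5, 0.5, 0.5, 0.5, 0.5]    # uninformative predictions
print(deviance(y, p_good), deviance(y, p_bad))
```

As expected, the better-fitting probabilities produce the smaller deviance, mirroring the role SSE plays for OLS regressions.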
² Recall that ML is a large-sample method and at large sample sizes the binomial distribution can be approximated with a normal distribution.

³ Note that this time the reported p-values are for two-tail z-tests.