# Tutorial 12

ECON20003 - QUANTITATIVE METHODS
Semester 2 - 2023
TUTORIAL 12
L. Kónya

Download the t12e3 Excel data file from the subject website and save it to your computer or USB flash drive. Read this handout and try to complete the tutorial exercises before your tutorial class, so that you can ask your tutor for help during the Zoom session if necessary.

## Dummy Dependent Variable Regression Models (cont.)

In the previous tutorial we discussed the simplest dummy dependent variable regression model, the so-called linear probability model (LPM). This time we turn our attention to the other two models, the logit model¹ and the probit model. We concluded Tutorial 11 with the two potentially most serious disadvantages of the LPM: the estimated dependent variable, which is an estimate of the probability of success, might turn out to be negative or greater than one, and the marginal effect of a quantitative independent variable on the probability of success is restricted to be constant. The logit and probit models provide possible solutions to both of these problems.

### Logit model

The logit model is based on the logistic cumulative distribution function (CDF),

$$F(v) = \frac{1}{1 + e^{-v}}$$

Accordingly, in the logit model the probability of success is

$$P = F(Z) = \frac{1}{1 + e^{-Z}}$$

The marginal effect of the independent variable on the probability of success is

$$\frac{dP}{dX} = \beta_1 f(Z)$$

where $f$ is the probability density function (PDF), i.e. the derivative of the CDF. For the logit model (logistic distribution), it is

$$f(Z) = \frac{dF(Z)}{dZ} = \frac{e^{-Z}}{(1 + e^{-Z})^2}$$

¹ For this reason, the logit model is also referred to as the logistic model (for example, in the Selvanathan book).
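As a quick numerical check of the logistic CDF, its density, and the fact that the marginal effect varies with X (unlike in the LPM), here is a small sketch in Python (used instead of R purely for illustration; the intercept `b0` and slope `b1` are hypothetical values, not estimates from the tutorial data):

```python
import math

def logistic_cdf(v):
    """Logistic CDF: F(v) = 1 / (1 + exp(-v))."""
    return 1.0 / (1.0 + math.exp(-v))

def logistic_pdf(z):
    """Logistic PDF: f(z) = exp(-z) / (1 + exp(-z))^2, the derivative of the CDF."""
    return math.exp(-z) / (1.0 + math.exp(-z)) ** 2

# Hypothetical coefficients for a single-regressor index Z = b0 + b1*X
b0, b1 = -1.0, 0.5

for x in (0.0, 2.0, 4.0):
    z = b0 + b1 * x
    p = logistic_cdf(z)           # probability of success, always strictly in (0, 1)
    dp_dx = b1 * logistic_pdf(z)  # marginal effect b1*f(Z): changes with X
    print(f"X = {x}: P = {p:.4f}, dP/dX = {dp_dx:.4f}")
```

Note that the predicted probability stays between 0 and 1 for any value of Z, and the marginal effect is largest where the probability is near 0.5, which is exactly how the logit model avoids the two LPM problems listed above.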
### Probit model

The probit model is based on the standard normal CDF

$$F(v) = \int_{-\infty}^{v} \frac{1}{\sqrt{2\pi}}\, e^{-u^2/2}\, du$$

In this case the probability of success is given by

$$P = F(Z) = \int_{-\infty}^{Z} \frac{1}{\sqrt{2\pi}}\, e^{-u^2/2}\, du$$

and the probability density function (PDF) is

$$f(Z) = \frac{dF(Z)}{dZ} = \frac{1}{\sqrt{2\pi}}\, e^{-Z^2/2}$$

The standard normal CDF is clearly more complicated than the logistic CDF, but in practice this does not pose any real problem because its values are tabulated in the standard normal table and can also be obtained easily with statistical programs like R.

The logit and probit models are nonlinear regression models, so they cannot be estimated with OLS. Instead, they are estimated with the maximum likelihood (ML) method. We do not discuss the details of this procedure, but fortunately with R it can be implemented as easily as the OLS method. The logit and probit regressions are interpreted differently, but they usually lead to very similar inferences and conclusions, except in the tails of the distributions, i.e., for relatively small and large values.

In R, logit and probit models can be estimated with the

```r
glm(formula = y ~ x1 + x2 + ..., family = familytype(link = linkfunction))
```

function, where formula is like in the lm() function and family is binomial(link = "logit") for the logit model and binomial(link = "probit") for the probit model. We are going to return to Exercise 4 of Tutorial 11 to illustrate logit and probit models.
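The claim that logit and probit probabilities are very similar except in the tails can be verified numerically. The following Python sketch (again used instead of R purely for illustration) compares the two CDFs at a few index values, computing the standard normal CDF from the error function in the standard library:

```python
import math

def logistic_cdf(v):
    """Logistic CDF used by the logit model."""
    return 1.0 / (1.0 + math.exp(-v))

def std_normal_cdf(v):
    """Standard normal CDF used by the probit model: Phi(v) = 0.5*(1 + erf(v/sqrt(2)))."""
    return 0.5 * (1.0 + math.erf(v / math.sqrt(2.0)))

# Compare the two probabilities at the same index values Z
for z in (-3.0, -1.0, 0.0, 1.0, 3.0):
    print(f"Z = {z:5.1f}: logit P = {logistic_cdf(z):.4f}, probit P = {std_normal_cdf(z):.4f}")
```

Both functions return 0.5 at Z = 0 and give almost identical probabilities for moderate Z, while the differences show up mainly for large negative or positive Z, i.e. in the tails.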
## Exercise 1 (HGL, p. 694, ex. 16.6)

Complete the following tasks using the same data as in Exercise 4 of Tutorial 11.

a) Estimate a logit model and briefly evaluate and interpret the results.

Import the data from the t11e4.xlsx file and execute the

```r
logit = glm(COKE ~ PRATIO + DISP_COKE + DISP_PEPSI,
            family = binomial(link = "logit"))
summary(logit)
```

commands. You should get the summary printout of the estimated logit model (not reproduced here). There are several details on this printout that warrant some explanation.

(i) The corresponding logit and LPM coefficients cannot be compared directly to each other because they measure different things, but their logical signs are the same.

(ii) Note that, instead of t-ratios, this time R reports z-ratios.² As you can see, just like in the LPM, all three slopes are individually significant in the logical direction even at the 1.5% level.³

(iii) Below Call, which reminds us of the command we just executed, R reports the usual location statistics for the deviance residuals. Deviance generalizes the idea of using SSE to evaluate the goodness of fit from regressions estimated by OLS to regressions estimated by the ML method. As with SSE, the smaller the deviance, the better the fit.

² Recall that ML is a large-sample method, and at large sample sizes the binomial distribution can be approximated with a normal distribution.

³ Note that this time the reported p-values are for two-tail z-tests.
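The deviance mentioned in point (iii) can be illustrated with a toy calculation. For a binary dependent variable, the residual deviance is minus twice the maximized log-likelihood, so better-fitting predicted probabilities produce a smaller deviance. The following Python sketch uses made-up observations and probabilities (not the Coke/Pepsi data) to show this:

```python
import math

def deviance(y, p):
    """Residual deviance of a binary-outcome model: -2 times the log-likelihood
    sum of y*log(p) + (1-y)*log(1-p) over the observations."""
    ll = sum(yi * math.log(pi) + (1 - yi) * math.log(1 - pi)
             for yi, pi in zip(y, p))
    return -2.0 * ll

# Toy data: the "good" fit assigns high probabilities to the observed outcomes,
# the "poor" fit assigns probabilities close to a coin flip.
y = [1, 0, 1, 1, 0]
good = [0.9, 0.1, 0.8, 0.9, 0.2]
poor = [0.6, 0.5, 0.5, 0.6, 0.5]

print(deviance(y, good) < deviance(y, poor))  # True: smaller deviance = better fit
```

This mirrors how the residual deviance on the R printout plays the role that SSE plays for OLS regressions.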