This file and several accompanying files contain the solutions to the odd-numbered problems in the book Econometric Analysis of Cross Section and Panel Data, by Jeffrey M. Wooldridge, MIT Press, 2002. The empirical examples are solved using various versions of Stata, with some dating back to Stata 4.0. Partly out of laziness, but also because it is useful for students to see computer output, I have included Stata output in most cases rather than type tables. In some cases, I do more hand calculations than are needed in current versions of Stata.

Currently, there are some missing solutions. I will update the solutions occasionally to fill in the missing solutions, and to make corrections. For some problems I have given answers beyond what I originally asked. Please report any mistakes or discrepancies you might come across by sending me email at [email protected].

CHAPTER 2

2.1. a. $\partial E(y|x_1,x_2)/\partial x_1 = \beta_1 + \beta_4 x_2$ and $\partial E(y|x_1,x_2)/\partial x_2 = \beta_2 + 2\beta_3 x_2 + \beta_4 x_1$.

b. By definition, $E(u|x_1,x_2) = 0$. Because $x_2^2$ and $x_1x_2$ are just functions of $(x_1,x_2)$, it does not matter whether we also condition on them: $E(u|x_1,x_2,x_2^2,x_1x_2) = 0$.

c. All we can say about $Var(u|x_1,x_2)$ is that it is nonnegative for all $x_1$ and $x_2$: $E(u|x_1,x_2) = 0$ in no way restricts $Var(u|x_1,x_2)$.

2.3. a. $y = \beta_0 + \beta_1x_1 + \beta_2x_2 + \beta_3x_1x_2 + u$, where u has a zero mean given $x_1$ and $x_2$: $E(u|x_1,x_2) = 0$. We can say nothing further about u.

b. $\partial E(y|x_1,x_2)/\partial x_1 = \beta_1 + \beta_3x_2$. Because $E(x_2) = 0$, $\beta_1 = E[\partial E(y|x_1,x_2)/\partial x_1]$. Similarly, $\beta_2 = E[\partial E(y|x_1,x_2)/\partial x_2]$.

c. If $x_1$ and $x_2$ are independent with zero mean then $E(x_1x_2) = E(x_1)E(x_2) = 0$. Further, the covariance between $x_1x_2$ and $x_1$ is $E(x_1x_2 \cdot x_1) = E(x_1^2x_2) = E(x_1^2)E(x_2)$ (by independence) $= 0$. A similar argument shows that the covariance between $x_1x_2$ and $x_2$ is zero. But then the linear projection of $x_1x_2$ onto $(1,x_1,x_2)$ is identically zero. Now just use the law of iterated projections (Property LP.5 in Appendix 2A):

$$L(y|1,x_1,x_2) = L(\beta_0 + \beta_1x_1 + \beta_2x_2 + \beta_3x_1x_2|1,x_1,x_2) = \beta_0 + \beta_1x_1 + \beta_2x_2 + \beta_3L(x_1x_2|1,x_1,x_2) = \beta_0 + \beta_1x_1 + \beta_2x_2.$$

d. Equation (2.47) is more useful because it allows us to compute the partial effects of $x_1$ and $x_2$ at any values of $x_1$ and $x_2$. Under the assumptions we have made, the linear projection in (2.48) does have as its slope coefficients on $x_1$ and $x_2$ the partial effects at the population average values of $x_1$ and $x_2$ -- zero in both cases -- but it does not allow us to obtain the partial effects at any other values of $x_1$ and $x_2$. Incidentally, the main conclusions of this problem go through if we allow $x_1$ and $x_2$ to have any population means.

2.5. By definition, $Var(u_1|x,z) = Var(y|x,z)$ and $Var(u_2|x) = Var(y|x)$.
By assumption, these are constant and necessarily equal to $\sigma_1^2 \equiv Var(u_1)$ and $\sigma_2^2 \equiv Var(u_2)$, respectively. But then Property CV.4 implies that $\sigma_2^2 \geq \sigma_1^2$. This simple conclusion means that, when error variances are constant, the error variance falls as more explanatory variables are conditioned on.

2.7. Write the equation in error form as

$$y = g(x) + z\beta + u, \quad E(u|x,z) = 0.$$

Take the expected value of this equation conditional only on x:

$$E(y|x) = g(x) + [E(z|x)]\beta,$$

and subtract this from the first equation to get

$$y - E(y|x) = [z - E(z|x)]\beta + u$$

or $\tilde{y} = \tilde{z}\beta + u$. Because $\tilde{z}$ is a function of $(x,z)$, $E(u|\tilde{z}) = 0$ (since $E(u|x,z) = 0$), and so $E(\tilde{y}|\tilde{z}) = \tilde{z}\beta$. This basic result is fundamental in the literature on estimating partial linear models. First, one estimates $E(y|x)$ and $E(z|x)$ using very flexible methods, typically, so-called nonparametric methods. Then, after obtaining residuals of the form $\tilde{y}_i \equiv y_i - \hat{E}(y_i|x_i)$ and $\tilde{z}_i \equiv z_i - \hat{E}(z_i|x_i)$, $\beta$ is estimated from an OLS regression $\tilde{y}_i$ on $\tilde{z}_i$, i = 1,...,N. Under general conditions, this kind of nonparametric partialling-out procedure leads to a $\sqrt{N}$-consistent, asymptotically normal estimator of $\beta$. See Robinson (1988) and Powell (1994).

CHAPTER 3

3.1. To prove Lemma 3.1, we must show that for all $\varepsilon > 0$, there exists $b_\varepsilon < \infty$ and an integer $N_\varepsilon$ such that $P[|x_N| > b_\varepsilon] < \varepsilon$, all $N \geq N_\varepsilon$. We use the following fact: since $x_N \to_p a$, for any $\varepsilon > 0$ there exists an integer $N_\varepsilon$ such that $P[|x_N - a| > 1] < \varepsilon$ for all $N \geq N_\varepsilon$. [The existence of $N_\varepsilon$ is implied by Definition 3.3(1).] But $|x_N| = |x_N - a + a| \leq |x_N - a| + |a|$ (by the triangle inequality), and so $|x_N| - |a| \leq |x_N - a|$. It follows that $P[|x_N| - |a| > 1] \leq P[|x_N - a| > 1]$. Therefore, in Definition 3.3(3) we can take $b_\varepsilon \equiv |a| + 1$ (irrespective of the value of $\varepsilon$) and then the existence of $N_\varepsilon$ follows from Definition 3.3(1).

3.3. This follows immediately from Lemma 3.1 because $g(x_N) \to_p g(c)$.

3.5. a. Since $Var(\bar{y}_N) = \sigma^2/N$, $Var[\sqrt{N}(\bar{y}_N - \mu)] = N(\sigma^2/N) = \sigma^2$.

b. By the CLT, $\sqrt{N}(\bar{y}_N - \mu) \sim^a Normal(0,\sigma^2)$, and so $Avar[\sqrt{N}(\bar{y}_N - \mu)] = \sigma^2$.

c. We obtain $Avar(\bar{y}_N)$ by dividing $Avar[\sqrt{N}(\bar{y}_N - \mu)]$ by N. Therefore, $Avar(\bar{y}_N) = \sigma^2/N$. As expected, this coincides with the actual variance of $\bar{y}_N$.

d. The asymptotic standard deviation of $\bar{y}_N$ is the square root of its asymptotic variance, or $\sigma/\sqrt{N}$.

e. To obtain the asymptotic standard error of $\bar{y}_N$, we need a consistent estimator of $\sigma$. Typically, the unbiased estimator of $\sigma^2$ is used: $\hat{\sigma}^2 = (N - 1)^{-1}\sum_{i=1}^N (y_i - \bar{y}_N)^2$, and then $\hat{\sigma}$ is the positive square root. The asymptotic standard error of $\bar{y}_N$ is simply $\hat{\sigma}/\sqrt{N}$.

3.7. a. For $\theta > 0$ the natural logarithm is a continuous function, and so $plim[\log(\hat{\theta})] = \log[plim(\hat{\theta})] = \log(\theta) = \gamma$.

b. We use the delta method to find $Avar[\sqrt{N}(\hat{\gamma} - \gamma)]$. In the scalar case, if $\hat{\gamma} = g(\hat{\theta})$ then $Avar[\sqrt{N}(\hat{\gamma} - \gamma)] = [dg(\theta)/d\theta]^2 Avar[\sqrt{N}(\hat{\theta} - \theta)]$. When $g(\theta) = \log(\theta)$ -- which is, of course, continuously differentiable -- $Avar[\sqrt{N}(\hat{\gamma} - \gamma)] = (1/\theta)^2 Avar[\sqrt{N}(\hat{\theta} - \theta)]$.

c. In the scalar case, the asymptotic standard error of $\hat{\gamma}$ is generally $|dg(\hat{\theta})/d\theta| \cdot se(\hat{\theta})$. Therefore, for $g(\theta) = \log(\theta)$, $se(\hat{\gamma}) = se(\hat{\theta})/\hat{\theta}$. When $\hat{\theta} = 4$ and $se(\hat{\theta}) = 2$, $\hat{\gamma} = \log(4) \approx 1.39$ and $se(\hat{\gamma}) = 1/2$.

d. The asymptotic t statistic for testing $H_0$: $\theta = 1$ is $(\hat{\theta} - 1)/se(\hat{\theta}) = 3/2 = 1.5$.

e. Because $\gamma = \log(\theta)$, the null of interest can also be stated as $H_0$: $\gamma = 0$. The t statistic based on $\hat{\gamma}$ is about 1.39/(.5) = 2.78. This leads to a very strong rejection of $H_0$, whereas the t statistic based on $\hat{\theta}$ is, at best, marginally significant. The lesson is that, using the Wald test, we can change the outcome of hypothesis tests by using nonlinear transformations.

3.9. By the delta method,

$$Avar[\sqrt{N}(\hat{\gamma} - \gamma)] = G(\theta)V_1G(\theta)', \quad Avar[\sqrt{N}(\tilde{\gamma} - \gamma)] = G(\theta)V_2G(\theta)',$$

where $G(\theta) = \nabla_\theta g(\theta)$ is $Q \times P$. Therefore,

$$Avar[\sqrt{N}(\tilde{\gamma} - \gamma)] - Avar[\sqrt{N}(\hat{\gamma} - \gamma)] = G(\theta)(V_2 - V_1)G(\theta)'.$$

By assumption, $V_2 - V_1$ is positive semi-definite, and therefore $G(\theta)(V_2 - V_1)G(\theta)'$ is p.s.d. This completes the proof.
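The arithmetic in Problem 3.7 is easy to reproduce. The following is a minimal Stata sketch that uses only the numbers given in the solution (an estimate of 4 with standard error 2); the scalar names are made up for this illustration, and nothing here depends on a data set.

```stata
* Hypothetical scalar names; the inputs are the values used in Problem 3.7
scalar theta_hat = 4
scalar se_theta  = 2

* t statistic for H0: theta = 1
scalar t_theta = (theta_hat - 1)/se_theta

* Delta method for gamma = log(theta): se(gamma_hat) = se(theta_hat)/theta_hat
scalar gamma_hat = ln(theta_hat)
scalar se_gamma  = se_theta/theta_hat

* t statistic for the equivalent null H0: gamma = 0
scalar t_gamma = gamma_hat/se_gamma

display "t based on theta-hat = " %5.2f t_theta
display "gamma-hat = " %5.3f gamma_hat "   se = " %5.3f se_gamma
display "t based on gamma-hat = " %5.2f t_gamma
```

Without intermediate rounding the second t statistic is about 2.77; the 2.78 in the text comes from rounding log(4) to 1.39 first. Either way, the two ways of stating the same null give noticeably different t statistics.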
CHAPTER 4

4.1. a. Exponentiating equation (4.49) gives

$$wage = \exp(\beta_0 + \beta_1 married + \beta_2 educ + z\gamma + u) = \exp(u)\exp(\beta_0 + \beta_1 married + \beta_2 educ + z\gamma).$$

Therefore, $E(wage|x) = E[\exp(u)|x]\exp(\beta_0 + \beta_1 married + \beta_2 educ + z\gamma)$, where x denotes all explanatory variables. Now, if u and x are independent then $E[\exp(u)|x] = E[\exp(u)] = \delta_0$, say. Therefore

$$E(wage|x) = \delta_0\exp(\beta_0 + \beta_1 married + \beta_2 educ + z\gamma).$$

Now, finding the proportionate difference in this expectation at married = 1 and married = 0 (with all else equal) gives $\exp(\beta_1) - 1$; all other factors cancel out. Thus, the percentage difference is $100 \cdot [\exp(\beta_1) - 1]$.

b. Since $\theta_1 = 100 \cdot [\exp(\beta_1) - 1] = g(\beta_1)$, we need the derivative of g with respect to $\beta_1$: $dg/d\beta_1 = 100 \cdot \exp(\beta_1)$. The asymptotic standard error of $\hat{\theta}_1$ using the delta method is obtained as the absolute value of $dg/d\beta_1$, evaluated at $\hat{\beta}_1$, times $se(\hat{\beta}_1)$:

$$se(\hat{\theta}_1) = [100 \cdot \exp(\hat{\beta}_1)] \cdot se(\hat{\beta}_1).$$

c. We can evaluate the conditional expectation in part (a) at two levels of education, say $educ_0$ and $educ_1$, all else fixed. The proportionate change in expected wage from $educ_0$ to $educ_1$ is

$$[\exp(\beta_2 educ_1) - \exp(\beta_2 educ_0)]/\exp(\beta_2 educ_0) = \exp[\beta_2(educ_1 - educ_0)] - 1 = \exp(\beta_2 \Delta educ) - 1.$$

Using the same arguments in part (b), $\hat{\theta}_2 = 100 \cdot [\exp(\hat{\beta}_2 \Delta educ) - 1]$ and

$$se(\hat{\theta}_2) = 100 \cdot |\Delta educ| \exp(\hat{\beta}_2 \Delta educ) se(\hat{\beta}_2).$$

d. For the estimated version of equation (4.29), $\hat{\beta}_1 = .199$, $se(\hat{\beta}_1) = .039$, $\hat{\beta}_2 = .065$, $se(\hat{\beta}_2) = .006$. Therefore, $\hat{\theta}_1 = 22.01$ and $se(\hat{\theta}_1) = 4.76$. For $\hat{\theta}_2$ we set $\Delta educ = 4$. Then $\hat{\theta}_2 = 29.7$ and $se(\hat{\theta}_2) = 3.11$.

4.3. a. Not in general. The conditional variance can always be written as $Var(u|x) = E(u^2|x) - [E(u|x)]^2$; if $E(u|x) \neq 0$, then $E(u^2|x) \neq Var(u|x)$.

b. It could be that $E(x'u) = 0$, in which case OLS is consistent, and $Var(u|x)$ is constant. But, generally, the usual standard errors would not be valid unless $E(u|x) = 0$.

4.5. Write equation (4.50) as $E(y|w) = w\delta$, where $w = (x,z)$. Since $Var(y|w) = \sigma^2$, it follows by Theorem 4.2 that $Avar \sqrt{N}(\hat{\delta} - \delta)$ is $\sigma^2[E(w'w)]^{-1}$, where $\hat{\delta} = (\hat{\beta}',\hat{\gamma})'$. Importantly, because $E(x'z) = 0$, $E(w'w)$ is block diagonal, with upper block $E(x'x)$ and lower block $E(z^2)$. Inverting $E(w'w)$ and focusing on the upper $K \times K$ block gives

$$Avar \sqrt{N}(\hat{\beta} - \beta) = \sigma^2[E(x'x)]^{-1}.$$

Next, we need to find $Avar \sqrt{N}(\tilde{\beta} - \beta)$. It is helpful to write $y = x\beta + v$ where $v = \gamma z + u$ and $u \equiv y - E(y|x,z)$. Because $E(x'z) = 0$ and $E(x'u) = 0$, $E(x'v) = 0$. Further,

$$E(v^2|x) = \gamma^2E(z^2|x) + E(u^2|x) + 2\gamma E(zu|x) = \gamma^2E(z^2|x) + \sigma^2,$$

where we use $E(zu|x,z) = zE(u|x,z) = 0$ and $E(u^2|x,z) = Var(y|x,z) = \sigma^2$. Unless $E(z^2|x)$ is constant, the equation $y = x\beta + v$ generally violates the homoskedasticity assumption OLS.3. So, without further assumptions,

$$Avar \sqrt{N}(\tilde{\beta} - \beta) = [E(x'x)]^{-1}E(v^2x'x)[E(x'x)]^{-1}.$$

Now we can show $Avar \sqrt{N}(\tilde{\beta} - \beta) - Avar \sqrt{N}(\hat{\beta} - \beta)$ is positive semi-definite by writing

$$Avar \sqrt{N}(\tilde{\beta} - \beta) - Avar \sqrt{N}(\hat{\beta} - \beta) = [E(x'x)]^{-1}E(v^2x'x)[E(x'x)]^{-1} - \sigma^2[E(x'x)]^{-1}$$
$$= [E(x'x)]^{-1}E(v^2x'x)[E(x'x)]^{-1} - \sigma^2[E(x'x)]^{-1}E(x'x)[E(x'x)]^{-1}$$
$$= [E(x'x)]^{-1}[E(v^2x'x) - \sigma^2E(x'x)][E(x'x)]^{-1}.$$

Because $[E(x'x)]^{-1}$ is positive definite, it suffices to show that $E(v^2x'x) - \sigma^2E(x'x)$ is p.s.d. To this end, let $h(x) \equiv E(z^2|x)$. Then by the law of iterated expectations, $E(v^2x'x) = E[E(v^2|x)x'x] = \gamma^2E[h(x)x'x] + \sigma^2E(x'x)$. Therefore, $E(v^2x'x) - \sigma^2E(x'x) = \gamma^2E[h(x)x'x]$, which, when $\gamma \neq 0$, is actually a positive definite matrix except by fluke. In particular, if $E(z^2|x) = E(z^2) = \eta^2 > 0$ (in which case $y = x\beta + v$ satisfies the homoskedasticity assumption OLS.3), $E(v^2x'x) - \sigma^2E(x'x) = \gamma^2\eta^2E(x'x)$, which is positive definite.

4.7. a. One important omitted factor in u is family income: students that come from wealthier families tend to do better in school, other things equal. Family income and PC ownership are positively correlated because the probability of owning a PC increases with family income. Another factor in u is quality of high school. This may also be correlated with PC: a student who had more exposure to computers in high school may be more likely to own a computer.

b. $\hat{\beta}_3$ is likely to have an upward bias because of the positive correlation between u and PC, but it is not clear-cut because of the other explanatory variables in the equation. If we write the linear projection

$$u = \delta_0 + \delta_1 hsGPA + \delta_2 SAT + \delta_3 PC + r$$

then the bias is upward if $\delta_3$ is greater than zero. This measures the partial correlation between u (say, family income) and PC, and it is likely to be positive.

c. If data on family income can be collected then it can be included in the equation. If family income is not available, sometimes level of parents' education is. Another possibility is to use average house value in each student's home zip code, as zip code is often part of school records. Proxies for high school quality might be faculty-student ratios, expenditure per student, average teacher salary, and so on.

4.9. a. Just subtract $\log(y_{-1})$ from both sides:

$$\Delta\log(y) = \beta_0 + x\beta + (\alpha_1 - 1)\log(y_{-1}) + u.$$

Clearly, the intercept and slope estimates on x will be the same. The coefficient on $\log(y_{-1})$ changes.

b. For simplicity, let $w = \log(y)$, $w_{-1} = \log(y_{-1})$. Then the population slope coefficient in a simple regression is always $\alpha_1 = Cov(w_{-1},w)/Var(w_{-1})$. But, by assumption, $Var(w) = Var(w_{-1})$, so we can write $\alpha_1 = Cov(w_{-1},w)/(\sigma_{w_{-1}}\sigma_w)$, where $\sigma_{w_{-1}} = sd(w_{-1})$ and $\sigma_w = sd(w)$. But $Corr(w_{-1},w) = Cov(w_{-1},w)/(\sigma_{w_{-1}}\sigma_w)$, and since a correlation coefficient is always between -1 and 1, the result follows.
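The delta-method numbers in Problem 4.1(d) can be checked by hand in Stata. This is only a sketch: the inputs are the rounded coefficients and standard errors quoted in the solution, and the scalar names are hypothetical.

```stata
* Rounded estimates quoted in Problem 4.1(d); scalar names are hypothetical
scalar b1    = .199
scalar se_b1 = .039
scalar b2    = .065
scalar se_b2 = .006
scalar deduc = 4

* theta1 = 100*[exp(b1) - 1], se = 100*exp(b1)*se(b1)
scalar theta1    = 100*(exp(b1) - 1)
scalar se_theta1 = 100*exp(b1)*se_b1

* theta2 = 100*[exp(b2*deduc) - 1], se = 100*|deduc|*exp(b2*deduc)*se(b2)
scalar theta2    = 100*(exp(b2*deduc) - 1)
scalar se_theta2 = 100*abs(deduc)*exp(b2*deduc)*se_b2

display "theta1 = " %6.2f theta1 "   se = " %5.2f se_theta1
display "theta2 = " %6.2f theta2 "   se = " %5.2f se_theta2

* After running the wage regression itself, a command such as
*   nlcom 100*(exp(_b[married]) - 1)
* should produce the same kind of estimate and delta-method standard error
* in versions of Stata that include nlcom.
```

Because the inputs are rounded to three decimals, the results should agree with the reported values only up to rounding in the last digit.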
4.11. Here is some Stata output obtained to answer this question:

. reg lwage exper tenure married south urban black educ iq kww

  Source |       SS       df       MS                  Number of obs =     935
---------+------------------------------               F(  9,   925) =   37.28
   Model |  44.0967944     9  4.89964382               Prob > F      =  0.0000
Residual |  121.559489   925  .131415664               R-squared     =  0.2662
---------+------------------------------               Adj R-squared =  0.2591
   Total |  165.656283   934  .177362188               Root MSE      =  .36251

------------------------------------------------------------------------------
   lwage |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
   exper |   .0127522   .0032308      3.947   0.000       .0064117    .0190927
  tenure |   .0109248   .0024457      4.467   0.000        .006125    .0157246
 married |   .1921449   .0389094      4.938   0.000       .1157839    .2685059
   south |  -.0820295   .0262222     -3.128   0.002      -.1334913   -.0305676
   urban |   .1758226   .0269095      6.534   0.000       .1230118    .2286334
   black |  -.1303995   .0399014     -3.268   0.001      -.2087073   -.0520917
    educ |   .0498375    .007262      6.863   0.000       .0355856    .0640893
      iq |   .0031183   .0010128      3.079   0.002       .0011306    .0051059
     kww |    .003826   .0018521      2.066   0.039       .0001911    .0074608
   _cons |   5.175644    .127776     40.506   0.000       4.924879    5.426408
------------------------------------------------------------------------------

. test iq kww

 ( 1)  iq = 0.0
 ( 2)  kww = 0.0

       F(  2,   925) =    8.59
            Prob > F =    0.0002

a. The estimated return to education using both IQ and KWW as proxies for ability is about 5%. When we used no proxy the estimated return was about 6.5%, and with only IQ as a proxy it was about 5.4%. Thus, we have an even lower estimated return to education, but it is still practically nontrivial and statistically very significant.

b. We can see from the t statistics that these variables are going to be jointly significant. The F test verifies this, with p-value = .0002.

c. The wage differential between nonblacks and blacks does not disappear. Blacks are estimated to earn about 13% less than nonblacks, holding all other factors fixed.

4.13. a. Using the 90 counties for 1987 gives

. reg lcrmrte lprbarr lprbconv lprbpris lavgsen if d87

      Source |       SS       df       MS              Number of obs =      90
-------------+------------------------------           F(  4,    85) =   15.15
       Model |  11.1549601     4  2.78874002           Prob > F      =  0.0000
    Residual |  15.6447379    85   .18405574           R-squared     =  0.4162
-------------+------------------------------           Adj R-squared =  0.3888
       Total |   26.799698    89  .301120202           Root MSE      =  .42902

------------------------------------------------------------------------------
     lcrmrte |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     lprbarr |  -.7239696   .1153163    -6.28   0.000    -.9532493   -.4946899
    lprbconv |  -.4725112   .0831078    -5.69   0.000    -.6377519   -.3072706
    lprbpris |   .1596698   .2064441     0.77   0.441    -.2507964     .570136
     lavgsen |   .0764213   .1634732     0.47   0.641    -.2486073    .4014499
       _cons |  -4.867922   .4315307   -11.28   0.000    -5.725921   -4.009923
------------------------------------------------------------------------------

Because of the log-log functional form, all coefficients are elasticities. The elasticities of crime with respect to the arrest and conviction probabilities are the sign we expect, and both are practically and statistically significant. The elasticities with respect to the probability of serving a prison term and the average sentence length are positive but are statistically insignificant.

b. To add the previous year's crime rate we first generate the lag:

. gen lcrmr_1 = lcrmrte[_n-1] if d87
(540 missing values generated)

. reg lcrmrte lprbarr lprbconv lprbpris lavgsen lcrmr_1 if d87

      Source |       SS       df       MS              Number of obs =      90
-------------+------------------------------           F(  5,    84) =  113.90
       Model |  23.3549731     5  4.67099462           Prob > F      =  0.0000
    Residual |   3.4447249    84   .04100863           R-squared     =  0.8715
-------------+------------------------------           Adj R-squared =  0.8638
       Total |   26.799698    89  .301120202           Root MSE      =  .20251

------------------------------------------------------------------------------
     lcrmrte |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     lprbarr |  -.1850424   .0627624    -2.95   0.004    -.3098523   -.0602325
    lprbconv |  -.0386768   .0465999    -0.83   0.409    -.1313457    .0539921
    lprbpris |  -.1266874   .0988505    -1.28   0.204    -.3232625    .0698876
     lavgsen |  -.1520228   .0782915    -1.94   0.056    -.3077141    .0036684
     lcrmr_1 |   .7798129   .0452114    17.25   0.000     .6899051    .8697208
       _cons |  -.7666256   .3130986    -2.45   0.016    -1.389257   -.1439946
------------------------------------------------------------------------------

There are some notable changes in the coefficients on the original variables. The elasticities with respect to prbarr and prbconv are much smaller now, but still have signs predicted by a deterrent-effect story. The conviction probability is no longer statistically significant. Adding the lagged crime rate changes the signs of the elasticities with respect to prbpris and avgsen, and the latter is almost statistically significant at the 5% level against a two-sided alternative (p-value = .056). Not surprisingly, the elasticity with respect to the lagged crime rate is large and very statistically significant. (The elasticity is also statistically different from unity.)

c. Adding the logs of the nine wage variables gives the following:

. reg lcrmrte lprbarr lprbconv lprbpris lavgsen lcrmr_1 lwcon-lwloc if d87

      Source |       SS       df       MS              Number of obs =      90
-------------+------------------------------           F( 14,    75) =   43.81
       Model |  23.8798774    14  1.70570553           Prob > F      =  0.0000
    Residual |  2.91982063    75  .038930942           R-squared     =  0.8911
-------------+------------------------------           Adj R-squared =  0.8707
       Total |   26.799698    89  .301120202           Root MSE      =  .19731

------------------------------------------------------------------------------
     lcrmrte |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     lprbarr |  -.1725122   .0659533    -2.62   0.011    -.3038978   -.0411265
    lprbconv |  -.0683639    .049728    -1.37   0.173    -.1674273    .0306994
    lprbpris |  -.2155553   .1024014    -2.11   0.039    -.4195493   -.0115614
     lavgsen |  -.1960546   .0844647    -2.32   0.023     -.364317   -.0277923
     lcrmr_1 |   .7453414   .0530331    14.05   0.000     .6396942    .8509887
       lwcon |  -.2850008   .1775178    -1.61   0.113    -.6386344    .0686327
       lwtuc |   .0641312    .134327     0.48   0.634    -.2034619    .3317244
       lwtrd |    .253707   .2317449     1.09   0.277    -.2079524    .7153665
       lwfir |  -.0835258   .1964974    -0.43   0.672    -.4749687    .3079171
       lwser |   .1127542   .0847427     1.33   0.187    -.0560619    .2815703
       lwmfg |   .0987371   .1186099     0.83   0.408    -.1375459    .3350201
       lwfed |   .3361278   .2453134     1.37   0.175    -.1525615    .8248172
       lwsta |   .0395089   .2072112     0.19   0.849    -.3732769    .4522947
       lwloc |  -.0369855   .3291546    -0.11   0.911    -.6926951     .618724
       _cons |  -3.792525   1.957472    -1.94   0.056    -7.692009    .1069592
------------------------------------------------------------------------------

. testparm lwcon-lwloc

 ( 1)  lwcon = 0.0
 ( 2)  lwtuc = 0.0
 ( 3)  lwtrd = 0.0
 ( 4)  lwfir = 0.0
 ( 5)  lwser = 0.0
 ( 6)  lwmfg = 0.0
 ( 7)  lwfed = 0.0
 ( 8)  lwsta = 0.0
 ( 9)  lwloc = 0.0

       F(  9,    75) =    1.50
            Prob > F =    0.1643

The nine wage variables are jointly insignificant even at the 15% level. Plus, the elasticities are not consistently positive or negative. The two largest elasticities -- which also have the largest absolute t statistics -- have the opposite sign. These are with respect to the wage in construction (-.285) and the wage for federal employees (.336).

d. Using the "robust" option in Stata, which is appended to the "reg" command, gives the heteroskedasticity-robust F statistic as F = 2.19 and p-value = .032. (This F statistic is the heteroskedasticity-robust Wald statistic divided by the number of restrictions being tested, nine in this example. The division by the number of restrictions turns the asymptotic chi-square statistic into one that roughly has an F distribution.)

4.15. a. Because each $x_j$ has finite second moment, $Var(x\beta) < \infty$. Since $Var(u) < \infty$, $Cov(x\beta,u)$ is well-defined. But each $x_j$ is uncorrelated with u, so $Cov(x\beta,u) = 0$. Therefore, $Var(y) = Var(x\beta) + Var(u)$, or $\sigma_y^2 = Var(x\beta) + \sigma_u^2$.

b. This is nonsense when we view the $x_i$ as random draws along with $y_i$. The statement "$Var(u_i) = \sigma^2 = Var(y_i)$ for all i" assumes that the regressors are nonrandom (or $\beta = 0$, which is not a very interesting case). This is another example of how the assumption of nonrandom regressors can lead to counterintuitive conclusions. Suppose that an element of the error term, say z, which is uncorrelated with each $x_j$, suddenly becomes observed. When we add z to the regressor list, the error changes, and so does the error variance. (It gets smaller.) In the vast majority of economic applications, it makes no sense to think we have access to the entire set of factors that one would ever want to control for, so we should allow for error variances to change across different models for the same response variable.

c. Write $R^2 = 1 - SSR/SST = 1 - (SSR/N)/(SST/N)$. Therefore,

$$plim(R^2) = 1 - plim[(SSR/N)/(SST/N)] = 1 - [plim(SSR/N)]/[plim(SST/N)] = 1 - \sigma_u^2/\sigma_y^2 = \rho^2,$$

where we use the fact that SSR/N is a consistent estimator of $\sigma_u^2$ and SST/N is a consistent estimator of $\sigma_y^2$.

d. The derivation in part (c) assumed nothing about $Var(u|x)$. The population R-squared depends on only the unconditional variances of u and y. Therefore, regardless of the nature of heteroskedasticity in $Var(u|x)$, the usual R-squared consistently estimates the population R-squared. Neither R-squared nor the adjusted R-squared has desirable finite-sample properties, such as unbiasedness, so the only analysis we can do in any generality involves asymptotics. The statement in the problem is simply wrong.
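The conclusion of Problem 4.15(d) can be illustrated with a small simulation. The design below is an assumption made purely for illustration (it is not taken from the text): x is standard normal, the error has $E(u|x) = 0$ but $Var(u|x) = .5 + x^2$, and the sample size is large. It also assumes a version of Stata that has the rnormal() function.

```stata
* Illustrative design only (not from the text); requires rnormal()
clear
set seed 12345
set obs 100000

generate x = rnormal()

* Heteroskedastic error: E(u|x) = 0, Var(u|x) = .5 + x^2, so Var(u) = 1.5
generate u = sqrt(.5 + x^2)*rnormal()

* Var(y) = 4*Var(x) + Var(u) = 4 + 1.5 = 5.5
generate y = 1 + 2*x + u

regress y x

* Population R-squared is 1 - Var(u)/Var(y)
scalar rho2 = 1 - 1.5/5.5
display "population R-squared = " %6.4f rho2
display "sample R-squared     = " %6.4f e(r2)
```

With this many observations the sample R-squared should land close to the population value of about .727, even though the errors are strongly heteroskedastic. The usual (nonrobust) standard errors from this regression would not be valid, but that is a separate issue from the consistency of R-squared.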
CHAPTER 5

5.1. Define $x_1 \equiv (z_1,y_2)$ and $x_2 \equiv \hat{v}_2$, and let $\hat{\beta} \equiv (\hat{\beta}_1',\hat{\rho}_1)'$ be the OLS estimator from (5.52), where $\hat{\beta}_1 = (\hat{\delta}_1',\hat{\alpha}_1)'$. Using the hint, $\hat{\beta}_1$ can also be obtained by partitioned regression:

(i) Regress $x_1$ onto $\hat{v}_2$ and save the residuals, say $\ddot{x}_1$.
(ii) Regress $y_1$ onto $\ddot{x}_1$.

But when we regress $z_1$ onto $\hat{v}_2$, the residuals are just $z_1$ since $\hat{v}_2$ is orthogonal in sample to z. (More precisely, $\sum_{i=1}^N z_{i1}'\hat{v}_{i2} = 0$.) Further, because we can write $y_2 = \hat{y}_2 + \hat{v}_2$, where $\hat{y}_2$ and $\hat{v}_2$ are orthogonal in sample, the residuals from regressing $y_2$ onto $\hat{v}_2$ are simply the first stage fitted values, $\hat{y}_2$. In other words, $\ddot{x}_1 = (z_1,\hat{y}_2)$. But the 2SLS estimator of $\beta_1$ is obtained exactly from the OLS regression $y_1$ on $z_1$, $\hat{y}_2$.

5.3. a. There may be unobserved health factors correlated with smoking behavior that affect infant birth weight. For example, women who smoke during pregnancy may, on average, drink more coffee or alcohol, or eat less nutritious meals.

b. Basic economics says that packs should be negatively correlated with cigarette price, although the correlation might be small (especially because price is aggregated at the state level). At first glance it seems that cigarette price should be exogenous in equation (5.54), but we must be a little careful. One component of cigarette price is the state tax on cigarettes. States that have lower taxes on cigarettes may also have lower quality of health care, on average. Quality of health care is in u, and so maybe cigarette price fails the exogeneity requirement for an IV.

c. OLS is followed by 2SLS (IV, in this case):

. reg lbwght male parity lfaminc packs

  Source |       SS       df       MS                  Number of obs =    1388
---------+------------------------------               F(  4,  1383) =   12.55
   Model |  1.76664363     4  .441660908               Prob > F      =  0.0000
Residual |    48.65369  1383  .035179819               R-squared     =  0.0350
---------+------------------------------               Adj R-squared =  0.0322
   Total |  50.4203336  1387  .036352079               Root MSE      =  .18756

------------------------------------------------------------------------------
  lbwght |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
    male |   .0262407   .0100894      2.601   0.009       .0064486    .0460328
  parity |   .0147292   .0056646      2.600   0.009       .0036171    .0258414
 lfaminc |   .0180498   .0055837      3.233   0.001       .0070964    .0290032
   packs |  -.0837281   .0171209     -4.890   0.000      -.1173139   -.0501423
   _cons |   4.675618   .0218813    213.681   0.000       4.632694    4.718542
------------------------------------------------------------------------------

. reg lbwght male parity lfaminc packs (male parity lfaminc cigprice)

  Source |       SS       df       MS                  Number of obs =    1388
---------+------------------------------               F(  4,  1383) =    2.39
   Model | -91.3500269     4 -22.8375067               Prob > F      =  0.0490
Residual |  141.770361  1383  .102509299               R-squared     =       .
---------+------------------------------               Adj R-squared =       .
   Total |  50.4203336  1387  .036352079               Root MSE      =  .32017

------------------------------------------------------------------------------
  lbwght |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
   packs |   .7971063   1.086275      0.734   0.463      -1.333819    2.928031
    male |   .0298205    .017779      1.677   0.094      -.0050562    .0646972
  parity |  -.0012391   .0219322     -0.056   0.955       -.044263    .0417848
 lfaminc |    .063646   .0570128      1.116   0.264      -.0481949    .1754869
   _cons |   4.467861   .2588289     17.262   0.000       3.960122    4.975601
------------------------------------------------------------------------------
                                                                        (2SLS)

(Note that Stata automatically shifts endogenous explanatory variables to the beginning of the list when reporting coefficients, standard errors, and so on.)

The difference between OLS and IV in the estimated effect of packs on bwght is huge. With the OLS estimate, one more pack of cigarettes is estimated to reduce bwght by about 8.4%, and is statistically significant. The IV estimate has the opposite sign, is huge in magnitude, and is not statistically significant. The sign and size of the smoking effect are not realistic.

d. We can see the problem with IV by estimating the reduced form for packs:

. reg packs male parity lfaminc cigprice

  Source |       SS       df       MS                  Number of obs =    1388
---------+------------------------------               F(  4,  1383) =   10.86
   Model |  3.76705108     4   .94176277               Prob > F      =  0.0000
Residual |  119.929078  1383  .086716615               R-squared     =  0.0305
---------+------------------------------               Adj R-squared =  0.0276
   Total |  123.696129  1387  .089182501               Root MSE      =  .29448

------------------------------------------------------------------------------
   packs |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
    male |  -.0047261   .0158539     -0.298   0.766      -.0358264    .0263742
  parity |   .0181491   .0088802      2.044   0.041       .0007291    .0355692
 lfaminc |  -.0526374   .0086991     -6.051   0.000      -.0697023   -.0355724
cigprice |    .000777   .0007763      1.001   0.317      -.0007459    .0022999
   _cons |   .1374075   .1040005      1.321   0.187      -.0666084    .3414234
------------------------------------------------------------------------------

The reduced form estimates show that cigprice does not significantly affect packs; in fact, the coefficient on cigprice is not the sign we expect. Thus, cigprice fails as an IV for packs because cigprice is not partially correlated with packs (with a sensible sign for the correlation). This is separate from the problem that cigprice may not truly be exogenous in the birth weight equation.

5.5. Under the null hypothesis that q and $z_2$ are uncorrelated, $z_1$ and $z_2$ are exogenous in (5.55) because each is uncorrelated with $u_1$. Unfortunately, $y_2$ is correlated with $u_1$, and so the regression of $y_1$ on $z_1$, $y_2$, $z_2$ does not produce a consistent estimator of 0 on $z_2$ even when $E(z_2'q) = 0$. We could find that $\hat{\psi}_1$ from this regression is statistically different from zero even when q and $z_2$ are uncorrelated -- in which case we would incorrectly conclude that $z_2$ is not a valid IV candidate. Or, we might fail to reject $H_0$: $\psi_1 = 0$ when $z_2$ and q are correlated -- in which case we incorrectly conclude that the elements in $z_2$ are valid as instruments.

The point of this exercise is that one cannot simply add instrumental variable candidates in the structural equation and then test for significance of these variables using OLS. This is the sense in which identification cannot be tested. With a single endogenous variable, we must take a stand that at least one element of $z_2$ is uncorrelated with q.
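The 2SLS command in Problem 5.3(c) uses the older `reg y x (z)` syntax. In more recent versions of Stata the same three regressions would presumably be requested as sketched below. This assumes the birth weight data are already in memory with the variable names shown in the output above; it has not been run against that data set here.

```stata
* Assumes the birth weight data are loaded, with variable names as in the
* output for Problem 5.3

* OLS
regress lbwght male parity lfaminc packs

* 2SLS with cigprice as the excluded instrument for packs.
* The small option asks for degrees-of-freedom adjustments and t/F statistics,
* which is the form of the older output.
ivregress 2sls lbwght male parity lfaminc (packs = cigprice), small

* Reduced form for packs: is cigprice partially correlated with packs?
regress packs male parity lfaminc cigprice
```

The last regression is the one that matters for the rank condition: a small, wrongly signed coefficient on cigprice is what signals that the instrument is too weak to be useful.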
5.7. a. If we plug $q = (1/\delta_1)q_1 - (1/\delta_1)a_1$ into equation (5.45) we get

$$y = \beta_0 + \beta_1x_1 + ... + \beta_Kx_K + \eta_1q_1 + v - \eta_1a_1, \quad (5.56)$$

where $\eta_1 \equiv (1/\delta_1)$. Now, since the $z_h$ are redundant in (5.45), they are uncorrelated with the structural error, v (by definition of redundancy). Further, we have assumed that the $z_h$ are uncorrelated with $a_1$. Since each $x_j$ is also uncorrelated with $v - \eta_1a_1$, we can estimate (5.56) by 2SLS using instruments $(1,x_1,...,x_K,z_1,z_2,...,z_M)$ to get consistent estimators of the $\beta_j$ and $\eta_1$.

Given all of the zero correlation assumptions, what we need for identification is that at least one of the $z_h$ appears in the reduced form for $q_1$. More formally, in the linear projection

$$q_1 = \pi_0 + \pi_1x_1 + ... + \pi_Kx_K + \pi_{K+1}z_1 + ... + \pi_{K+M}z_M + r_1,$$

at least one of $\pi_{K+1}$, ..., $\pi_{K+M}$ must be different from zero.

b. We need family background variables to be redundant in the log(wage) equation once ability (and other factors, such as educ and exper) have been controlled for. The idea here is that family background may influence ability but should have no partial effect on log(wage) once ability has been accounted for. For the rank condition to hold, we need family background variables to be correlated with the indicator, $q_1$, say IQ, once the $x_j$ have been netted out. This is likely to be true if we think that family background and ability are (partially) correlated.

c. Applying the procedure to the data set in NLS80.RAW gives the following results:

. reg lwage exper tenure educ married south urban black iq (exper tenure educ married south urban black meduc feduc sibs)

Instrumental variables (2SLS) regression

      Source |       SS       df       MS              Number of obs =     722
-------------+------------------------------           F(  8,   713) =   25.81
       Model |  19.6029198     8  2.45036497           Prob > F      =  0.0000
    Residual |  107.208996   713  .150363248           R-squared     =  0.1546
-------------+------------------------------           Adj R-squared =  0.1451
       Total |  126.811916   721  .175883378           Root MSE      =  .38777

------------------------------------------------------------------------------
       lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          iq |   .0154368   .0077077     2.00   0.046     .0003044    .0305692
      tenure |   .0076754   .0030956     2.48   0.013     .0015979    .0137529
        educ |   .0161809   .0261982     0.62   0.537     -.035254    .0676158
     married |   .1901012   .0467592     4.07   0.000     .0982991    .2819033
       south |   -.047992   .0367425    -1.31   0.192    -.1201284    .0241444
       urban |   .1869376   .0327986     5.70   0.000     .1225442    .2513311
       black |   .0400269   .1138678     0.35   0.725    -.1835294    .2635832
       exper |   .0162185   .0040076     4.05   0.000     .0083503    .0240867
       _cons |   4.471616    .468913     9.54   0.000        3.551    5.392231
------------------------------------------------------------------------------

. reg lwage exper tenure educ married south urban black kww (exper tenure educ married south urban black meduc feduc sibs)

Instrumental variables (2SLS) regression

      Source |       SS       df       MS              Number of obs =     722
-------------+------------------------------           F(  8,   713) =   25.70
       Model |   19.820304     8    2.477538           Prob > F      =  0.0000
    Residual |  106.991612   713  .150058361           R-squared     =  0.1563
-------------+------------------------------           Adj R-squared =  0.1468
       Total |  126.811916   721  .175883378           Root MSE      =  .38737

------------------------------------------------------------------------------
       lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         kww |   .0249441   .0150576     1.66   0.098    -.0046184    .0545067
      tenure |   .0051145   .0037739     1.36   0.176    -.0022947    .0125238
        educ |   .0260808   .0255051     1.02   0.307    -.0239933    .0761549
     married |   .1605273   .0529759     3.03   0.003     .0565198    .2645347
       south |   -.091887   .0322147    -2.85   0.004    -.1551341   -.0286399
       urban |   .1484003   .0411598     3.61   0.000     .0675914    .2292093
       black |  -.0424452   .0893695    -0.47   0.635    -.2179041    .1330137
       exper |   .0068682   .0067471     1.02   0.309    -.0063783    .0201147
       _cons |   5.217818   .1627592    32.06   0.000     4.898273    5.537362
------------------------------------------------------------------------------

Even though there are 935 men in the sample, only 722 are used for the estimation, because data are missing on meduc and feduc. What we could do is define binary indicators for whether the corresponding variable is missing, set the missing values to zero, and then use the binary indicators as instruments along with meduc, feduc, and sibs. This would allow us to use all 935 observations.

The return to education is estimated to be small and insignificant whether IQ or KWW is used as the indicator. This could be because family background variables do not satisfy the appropriate redundancy condition, or they might be correlated with $a_1$. (In both first-stage regressions, the F statistic for joint significance of meduc, feduc, and sibs has a p-value below .002, so it seems the family background variables are sufficiently partially correlated with the ability indicators.)

5.9. Define $\theta_4 = \beta_4 - \beta_3$, so that $\beta_4 = \beta_3 + \theta_4$. Plugging this expression into the equation and rearranging gives

$$\log(wage) = \beta_0 + \beta_1 exper + \beta_2 exper^2 + \beta_3(twoyr + fouryr) + \theta_4 fouryr + u$$
$$= \beta_0 + \beta_1 exper + \beta_2 exper^2 + \beta_3 totcoll + \theta_4 fouryr + u,$$

where totcoll = twoyr + fouryr. Now, just estimate the latter equation by 2SLS using exper, $exper^2$, dist2yr and dist4yr as the full set of instruments. We can use the t statistic on $\hat{\theta}_4$ to test $H_0$: $\theta_4 = 0$ against $H_1$: $\theta_4 > 0$.

5.11. Following the hint, let $y_2^0$ be the linear projection of $y_2$ on $z_2$, let $a_2$ be the projection error, and assume that $\lambda_2$ is known. (The results on generated regressors in Section 6.1.1 show that the argument carries over to the case when $\lambda_2$ is estimated.) Plugging in $y_2 = y_2^0 + a_2$ gives

$$y_1 = z_1\delta_1 + \alpha_1y_2^0 + \alpha_1a_2 + u_1.$$

Effectively, we regress $y_1$ on $z_1$, $y_2^0$. The key consistency condition is that each explanatory variable is orthogonal to the composite error, $\alpha_1a_2 + u_1$. By assumption, $E(z'u_1) = 0$. Further, $E(y_2^0a_2) = 0$ by construction. The problem is that $E(z_1'a_2) \neq 0$ necessarily because $z_1$ was not included in the linear projection for $y_2$. Therefore, OLS will be inconsistent for all parameters in general.

Contrast this with 2SLS when $y_2^*$ is the projection on $z_1$ and $z_2$: $y_2 = y_2^* + r_2 = z\pi_2 + r_2$, where $E(z'r_2) = 0$. The second step regression (assuming that $\pi_2$ is known) is essentially

$$y_1 = z_1\delta_1 + \alpha_1y_2^* + \alpha_1r_2 + u_1.$$

Now, $r_2$ is uncorrelated with z, and so $E(z_1'r_2) = 0$ and $E(y_2^*r_2) = 0$. The lesson is that one must be very careful if manually carrying out 2SLS by explicitly doing the first- and second-stage regressions.

5.13. a. In a simple regression model with a single IV, the IV estimate of the slope can be written as

$$\hat{\beta}_1 = \left(\sum_{i=1}^N (z_i - \bar{z})(y_i - \bar{y})\right)\Big/\left(\sum_{i=1}^N (z_i - \bar{z})(x_i - \bar{x})\right) = \left(\sum_{i=1}^N z_i(y_i - \bar{y})\right)\Big/\left(\sum_{i=1}^N z_i(x_i - \bar{x})\right).$$

Now the numerator can be written as

$$\sum_{i=1}^N z_i(y_i - \bar{y}) = \sum_{i=1}^N z_iy_i - \left(\sum_{i=1}^N z_i\right)\bar{y} = N_1\bar{y}_1 - N_1\bar{y} = N_1(\bar{y}_1 - \bar{y}),$$

where $N_1 = \sum_{i=1}^N z_i$ is the number of observations in the sample with $z_i = 1$ and $\bar{y}_1$ is the average of the $y_i$ over the observations with $z_i = 1$. Next, write $\bar{y}$ as a weighted average: $\bar{y} = (N_0/N)\bar{y}_0 + (N_1/N)\bar{y}_1$, where the notation should be clear. Straightforward algebra shows that

$$\bar{y}_1 - \bar{y} = [(N - N_1)/N]\bar{y}_1 - (N_0/N)\bar{y}_0 = (N_0/N)(\bar{y}_1 - \bar{y}_0).$$

So the numerator of the IV estimate is $(N_0N_1/N)(\bar{y}_1 - \bar{y}_0)$. The same argument shows that the denominator is $(N_0N_1/N)(\bar{x}_1 - \bar{x}_0)$. Taking the ratio proves the result.

b. If x is also binary -- representing some "treatment" -- $\bar{x}_1$ is the fraction of observations receiving treatment when $z_i = 1$ and $\bar{x}_0$ is the fraction receiving treatment when $z_i = 0$. So, suppose $x_i = 1$ if person i participates in a job training program, and let $z_i = 1$ if person i is eligible for participation in the program. Then $\bar{x}_1$ is the fraction of people participating in the program out of those made eligible, and $\bar{x}_0$ is the fraction of people participating who are not eligible. (When eligibility is necessary for participation, $\bar{x}_0 = 0$.) Generally, $\bar{x}_1 - \bar{x}_0$ is the difference in participation rates when z = 1 and z = 0. So the difference in the mean response between the z = 1 and z = 0 groups gets divided by the difference in participation rates across the two groups.

5.15. In $L(x|z) = z\Pi$, we can write

$$\Pi = \begin{pmatrix} \Pi_{11} & 0 \\ \Pi_{12} & I_{K_2} \end{pmatrix},$$

where $I_{K_2}$ is the $K_2 \times K_2$ identity matrix, 0 is the $L_1 \times K_2$ zero matrix, $\Pi_{11}$ is $L_1 \times K_1$, and $\Pi_{12}$ is $K_2 \times K_1$. As in Problem 5.12, the rank condition holds if and only if rank($\Pi$) = K.

a. If for some $x_j$, the vector $z_1$ does not appear in $L(x_j|z)$, then $\Pi_{11}$ has
which means that the second row of zeros.
= K, in that case.
Suppose $K_1 = 2$ and $L_1 = 2$,
we can simply reorder the elements of $z_1$ to ensure this is the case.
^  can be written as  which means rank(^) < K.  (^  11 12  9^ 2  ^  0 IK  ) .  But then that column of  a linear combination of the last K2 elements of  ^.  ^  is all  Intuitively. a necessary condition for the rank condition is that no columns of  ^11  be exactly zero. c.. Without loss of generality.  Then  Looking at  ^11  ^  =  diagonals then Therefore. which means that at least one zh must appear in the  reduced form of each xj.K1.  It cannot have rank K.. only one of them turned out to be partially correlated with x1 and x2. j = 1. a. b.a column which is entirely zeros. Here is abbreviated Stata output for testing the null hypothesis that educ is exogenous:  . qui reg educ nearc4 nearc2 exper expersq black south smsa reg661-reg668 smsa66 .  CHAPTER 6  6.  Therefore.1. predict v2hat.. while we began with  two instruments. we assume that zj appears in the reduced form for xj. where z1 appears in the reduced form form both x1 and x2.. rank  is a K1 x K1 diagonal matrix with nonzero diagonal elements. resid 22  . but z2 appears in neither reduced form.  ^11  Then the 2 x 2 matrix  has zeros in its second row. we see that if  2  20  ^11  is diagonal with all nonzero  is lower triangular with all nonzero diagonal elements. 1811588 -.117 -.0552789 v2hat | -. Std.0610759 .000 .1659733 reg667 | . I would call this marginal evidence  (Depending on the application or purpose of a study.566 0.0205106 0.1371509 reg666 | .0002286 .153 -.0359807 -1.1057408 reg664 | -.463 -. t P>|t| [95% Conf.0341426 .0515041 .384 0.000 . reg uhat1 exper expersq black south smsa reg661-reg668 smsa66 nearc4 nearc2 Source |  SS  df  MS  Number of obs = 23  3010  . predict uhat1.0919791 smsa | .821434 4.0151411 reg665 | .2171749 -. we regress the 2SLS residuals on all exogenous variables: .540 0..1570594 .734 0.0699968 .1034468 smsa66 | .0251538 .050516 .2517275 exper | .0118296 .100753 . 
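The algebra in Problem 5.13(a) can be checked numerically. The short Python sketch below uses invented data (the lists z, x, and y are purely illustrative and do not come from the text) and compares the IV slope computed from the covariance formula with the ratio of differences in group means.

```python
# Illustrative check of Problem 5.13(a): with a binary instrument z, the IV
# slope equals (ybar1 - ybar0) / (xbar1 - xbar0).  All data here are made up.

def mean(values):
    return sum(values) / len(values)

def iv_slope(z, x, y):
    """Simple IV slope: sum (z - zbar)(y - ybar) / sum (z - zbar)(x - xbar)."""
    zbar, xbar, ybar = mean(z), mean(x), mean(y)
    num = sum((zi - zbar) * (yi - ybar) for zi, yi in zip(z, y))
    den = sum((zi - zbar) * (xi - xbar) for zi, xi in zip(z, x))
    return num / den

def wald_estimate(z, x, y):
    """Ratio of differences in group means between z = 1 and z = 0."""
    y1 = mean([yi for zi, yi in zip(z, y) if zi == 1])
    y0 = mean([yi for zi, yi in zip(z, y) if zi == 0])
    x1 = mean([xi for zi, xi in zip(z, x) if zi == 1])
    x0 = mean([xi for zi, xi in zip(z, x) if zi == 0])
    return (y1 - y0) / (x1 - x0)

z = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]                        # hypothetical eligibility
x = [1, 1, 1, 0, 0, 0, 1, 0, 0, 0]                        # hypothetical participation
y = [5.0, 6.5, 4.0, 3.0, 2.5, 3.5, 4.5, 2.0, 3.0, 2.5]    # hypothetical outcome

b_iv = iv_slope(z, x, y)
b_wald = wald_estimate(z, x, y)
print(b_iv, b_wald)   # the two numbers should agree up to rounding error
```

In this made-up sample some ineligible people participate, so the denominator is the difference in participation rates rather than the participation rate among the eligible.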
CHAPTER 6

6.1. a. Here is abbreviated Stata output for testing the null hypothesis that educ is exogenous:

. qui reg educ nearc4 nearc2 exper expersq black south smsa reg661-reg668 smsa66

. predict v2hat, resid

. reg lwage educ exper expersq black south smsa reg661-reg668 smsa66 v2hat

[Stata output: coefficient table for lwage on educ, exper, expersq, black, south, smsa, reg661-reg668, smsa66, v2hat, and _cons. The row for v2hat gives Coef. = -.0828005, Std. Err. = .0484086, t = -1.710.]

The t statistic on $\hat{v}_2$ is -1.71, which is not significant at the 5% level against a two-sided alternative. The negative correlation between $u_1$ and educ is essentially the same finding that the 2SLS estimated return to education is larger than the OLS estimate. In any case, I would call this marginal evidence that educ is endogenous. (Depending on the application or purpose of a study, the same researcher may take t = -1.71 as evidence for or against endogeneity.)

b. To test the single overidentifying restriction we obtain the 2SLS residuals:

. qui reg lwage educ exper expersq black south smsa reg661-reg668 smsa66 (nearc4 nearc2 exper expersq black south smsa reg661-reg668 smsa66)

. predict uhat1, resid

Now, we regress the 2SLS residuals on all exogenous variables:

. reg uhat1 exper expersq black south smsa reg661-reg668 smsa66 nearc4 nearc2

  Source |       SS       df       MS              Number of obs =    3010
---------+------------------------------           F( 16,  2993) =    0.08
   Model |  .203922832    16  .012745177           Prob > F      =  1.0000
Residual |  491.568721  2993  .164239466           R-squared     =  0.0004
---------+------------------------------           Adj R-squared = -0.0049
   Total |  491.772644  3009  .163433913           Root MSE      =  .40526

The test statistic is the sample size times the R-squared from this regression:

. di 3010*.0004
1.204

. di chiprob(1,1.2)
.27332168

The p-value, obtained from a $\chi^2_1$ distribution, is about .273, so the instruments pass the overidentification test.

6.3. a. We need prices to satisfy two requirements. First, calories and protein must be partially correlated with prices of food. While this is easy to test for each by estimating the two reduced forms, the rank condition could still be violated (although see Problem 15.5c). In addition, we must also assume prices are exogenous in the productivity equation. Ideally, prices vary because of things like transportation costs that are not systematically related to regional variations in individual productivity. A potential problem is that prices reflect food quality and that features of the food other than calories and protein appear in the disturbance $u_1$.

b. Since there are two endogenous explanatory variables we need at least two prices.

c. We would first estimate the two reduced forms for calories and protein by regressing each on a constant, exper, $exper^2$, educ, and the M prices, $p_1, \ldots, p_M$. We obtain the residuals, $\hat{v}_{21}$ and $\hat{v}_{22}$. Then we would run the regression log(produc) on 1, exper, $exper^2$, educ, calories, protein, $\hat{v}_{21}$, $\hat{v}_{22}$ and do a joint significance test on $\hat{v}_{21}$ and $\hat{v}_{22}$. We could use a standard F test or use a heteroskedasticity-robust test.
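The overidentification arithmetic in Problem 6.1(b) can be reproduced outside Stata. The Python sketch below is only an illustration: the helper name chi2_1_upper_tail is hypothetical, and it relies on the fact that a chi-square variate with one degree of freedom is the square of a standard normal, so its upper-tail probability can be written with the complementary error function.

```python
import math

def chi2_1_upper_tail(stat):
    """Upper-tail probability of a chi-square(1) variate.

    If Z ~ N(0,1) then Z**2 ~ chi-square(1), so
    P(Z**2 > s) = P(|Z| > sqrt(s)) = erfc(sqrt(s / 2)).
    """
    return math.erfc(math.sqrt(stat / 2.0))

n_obs = 3010          # sample size from the Stata output
r_squared = 0.0004    # R-squared from regressing uhat1 on all exogenous variables

stat = n_obs * r_squared           # N times R-squared
p_value = chi2_1_upper_tail(1.2)   # the text evaluates the p-value at 1.2

print(round(stat, 3), round(p_value, 3))
```

Because the reported R-squared is rounded to four decimals, the statistic computed this way is itself only approximate, which is presumably why the text evaluates the p-value at the rounded value 1.2.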
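6.5. a. For simplicity, absorb the intercept in x, so $y = x\boldsymbol{\beta} + u$, $E(u|x) = 0$, $Var(u|x) = \sigma^2$. In these tests, $\hat{\sigma}^2$ is implicitly SSR/N -- there is no degrees of freedom adjustment. (In any case, the df adjustment makes no difference asymptotically.) So $\hat{u}_i^2 - \hat{\sigma}^2$ has a zero sample average, which means that

$$N^{-1/2}\sum_{i=1}^N (h_i - \boldsymbol{\mu}_h)'(\hat{u}_i^2 - \hat{\sigma}^2) = N^{-1/2}\sum_{i=1}^N h_i'(\hat{u}_i^2 - \hat{\sigma}^2).$$

Next, $N^{-1/2}\sum_{i=1}^N (h_i - \boldsymbol{\mu}_h)'(\hat{\sigma}^2 - \sigma^2) = O_p(1)\cdot o_p(1) = o_p(1)$ because $N^{-1/2}\sum_{i=1}^N (h_i - \boldsymbol{\mu}_h)' = O_p(1)$ by the central limit theorem and $\hat{\sigma}^2 - \sigma^2 = o_p(1)$. So far we have

$$N^{-1/2}\sum_{i=1}^N h_i'(\hat{u}_i^2 - \hat{\sigma}^2) = N^{-1/2}\sum_{i=1}^N (h_i - \boldsymbol{\mu}_h)'(\hat{u}_i^2 - \sigma^2) + o_p(1).$$

We are done with this part if we show

$$N^{-1/2}\sum_{i=1}^N (h_i - \boldsymbol{\mu}_h)'\hat{u}_i^2 = N^{-1/2}\sum_{i=1}^N (h_i - \boldsymbol{\mu}_h)'u_i^2 + o_p(1).$$

Now, as in Problem 4.4, we can write $\hat{u}_i^2 = u_i^2 - 2u_i x_i(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta}) + [x_i(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta})]^2$, so

$$N^{-1/2}\sum_{i=1}^N (h_i - \boldsymbol{\mu}_h)'\hat{u}_i^2 = N^{-1/2}\sum_{i=1}^N (h_i - \boldsymbol{\mu}_h)'u_i^2 - 2\Big[N^{-1/2}\sum_{i=1}^N u_i(h_i - \boldsymbol{\mu}_h)'x_i\Big](\hat{\boldsymbol{\beta}} - \boldsymbol{\beta}) + \Big[N^{-1/2}\sum_{i=1}^N (h_i - \boldsymbol{\mu}_h)'(x_i \otimes x_i)\Big]\{vec[(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta})(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta})']\}, \qquad (6.40)$$

where the expression for the third term follows from $[x_i(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta})]^2 = x_i(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta})(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta})'x_i' = (x_i \otimes x_i)vec[(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta})(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta})']$. Dropping the "-2" the second term can be written as

$$\Big[N^{-1}\sum_{i=1}^N u_i(h_i - \boldsymbol{\mu}_h)'x_i\Big]\sqrt{N}(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta}) = o_p(1)\cdot O_p(1)$$

because $\sqrt{N}(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta}) = O_p(1)$ and, under $E(u_i|x_i) = 0$, $E[u_i(h_i - \boldsymbol{\mu}_h)'x_i] = 0$; the law of large numbers implies that the sample average is $o_p(1)$. The third term can be written as

$$N^{-1/2}\Big[N^{-1}\sum_{i=1}^N (h_i - \boldsymbol{\mu}_h)'(x_i \otimes x_i)\Big]\{vec[\sqrt{N}(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta})\sqrt{N}(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta})']\} = N^{-1/2}\cdot O_p(1)\cdot O_p(1),$$

where we again use the fact that sample averages are $O_p(1)$ by the law of large numbers and $vec[\sqrt{N}(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta})\sqrt{N}(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta})'] = O_p(1)$. We have shown that the last two terms in (6.40) are $o_p(1)$, which proves part (a).

b. By part (a), the asymptotic variance of $N^{-1/2}\sum_{i=1}^N h_i'(\hat{u}_i^2 - \hat{\sigma}^2)$ is

$$Var[(h_i - \boldsymbol{\mu}_h)'(u_i^2 - \sigma^2)] = E[(u_i^2 - \sigma^2)^2(h_i - \boldsymbol{\mu}_h)'(h_i - \boldsymbol{\mu}_h)].$$

Now $(u_i^2 - \sigma^2)^2 = u_i^4 - 2u_i^2\sigma^2 + \sigma^4$. Under the null, $E(u_i^2|x_i) = Var(u_i|x_i) = \sigma^2$ [since $E(u_i|x_i) = 0$ is assumed] and therefore, when we add (6.27), $E[(u_i^2 - \sigma^2)^2|x_i] = \kappa^2 - \sigma^4 \equiv \eta^2$. A standard iterated expectations argument gives

$$E[(u_i^2 - \sigma^2)^2(h_i - \boldsymbol{\mu}_h)'(h_i - \boldsymbol{\mu}_h)] = E\{E[(u_i^2 - \sigma^2)^2(h_i - \boldsymbol{\mu}_h)'(h_i - \boldsymbol{\mu}_h)|x_i]\}$$
$$= E\{E[(u_i^2 - \sigma^2)^2|x_i](h_i - \boldsymbol{\mu}_h)'(h_i - \boldsymbol{\mu}_h)\} \quad [\text{since } h_i = h(x_i)]$$
$$= \eta^2 E[(h_i - \boldsymbol{\mu}_h)'(h_i - \boldsymbol{\mu}_h)].$$

This is what we wanted to show. (Whether we do the argument for a random draw i or for random variables representing the population is a matter of taste.)

c. From part (b) and Lemma 3.8, the following statistic has an asymptotic $\chi^2_Q$ distribution:

$$\Big[N^{-1/2}\sum_{i=1}^N (\hat{u}_i^2 - \hat{\sigma}^2)h_i\Big]\{\eta^2 E[(h_i - \boldsymbol{\mu}_h)'(h_i - \boldsymbol{\mu}_h)]\}^{-1}\Big[N^{-1/2}\sum_{i=1}^N h_i'(\hat{u}_i^2 - \hat{\sigma}^2)\Big].$$

Using again the fact that $\sum_{i=1}^N (\hat{u}_i^2 - \hat{\sigma}^2) = 0$, we can replace $h_i$ with $h_i - \bar{h}$ in the two vectors forming the quadratic form. Then, again by Lemma 3.8, we can replace the matrix in the quadratic form with a consistent estimator, which is

$$\hat{\eta}^2\Big[N^{-1}\sum_{i=1}^N (h_i - \bar{h})'(h_i - \bar{h})\Big],$$

where $\hat{\eta}^2 = N^{-1}\sum_{i=1}^N (\hat{u}_i^2 - \hat{\sigma}^2)^2$. The computable statistic, after simple algebra, can be written as

$$\Big[\sum_{i=1}^N (\hat{u}_i^2 - \hat{\sigma}^2)(h_i - \bar{h})\Big]\Big[\sum_{i=1}^N (h_i - \bar{h})'(h_i - \bar{h})\Big]^{-1}\Big[\sum_{i=1}^N (h_i - \bar{h})'(\hat{u}_i^2 - \hat{\sigma}^2)\Big]\Big/\hat{\eta}^2.$$

Now $\hat{\eta}^2$ is just the total sum of squares in the $\hat{u}_i^2$, divided by N. The numerator of the statistic is simply the explained sum of squares from the regression $\hat{u}_i^2$ on 1, $h_i$, i = 1,...,N. Therefore, the test statistic is N times the usual (centered) R-squared from the regression $\hat{u}_i^2$ on 1, $h_i$, i = 1,...,N, or $NR_c^2$.

d. Without assumption (6.37) we need to estimate $E[(u_i^2 - \sigma^2)^2(h_i - \boldsymbol{\mu}_h)'(h_i - \boldsymbol{\mu}_h)]$ generally. Hopefully, the approach is by now pretty clear. We replace the population expected value with the sample average and replace any unknown parameters -- $\boldsymbol{\beta}$, $\sigma^2$, and $\boldsymbol{\mu}_h$ in this case -- with their consistent estimators (under $H_0$). So a generally consistent estimator of Avar $N^{-1/2}\sum_{i=1}^N h_i'(\hat{u}_i^2 - \hat{\sigma}^2)$ is

$$N^{-1}\sum_{i=1}^N (\hat{u}_i^2 - \hat{\sigma}^2)^2(h_i - \bar{h})'(h_i - \bar{h}),$$

and the test statistic robust to heterokurtosis can be written as

$$\Big[\sum_{i=1}^N (\hat{u}_i^2 - \hat{\sigma}^2)(h_i - \bar{h})\Big]\Big[\sum_{i=1}^N (\hat{u}_i^2 - \hat{\sigma}^2)^2(h_i - \bar{h})'(h_i - \bar{h})\Big]^{-1}\Big[\sum_{i=1}^N (h_i - \bar{h})'(\hat{u}_i^2 - \hat{\sigma}^2)\Big],$$

which is easily seen to be the explained sum of squares from the regression of 1 on $(\hat{u}_i^2 - \hat{\sigma}^2)(h_i - \bar{h})$, i = 1,...,N (without an intercept). Since the total sum of squares, without demeaning, is N = (1 + 1 + ... + 1) (N times), the statistic is equivalent to N - SSR0, where SSR0 is the sum of squared residuals.

6.7. a. The simple regression results are

. reg lprice ldist if y81

  Source |       SS       df       MS              Number of obs =     142
---------+------------------------------           F(  1,   140) =   30.79
   Model |  3.86426989     1  3.86426989           Prob > F      =  0.0000
Residual |  17.5730845   140  .125522032           R-squared     =  0.1803
---------+------------------------------           Adj R-squared =  0.1744
   Total |  21.4373543   141  .152037974           Root MSE      =  .35429

------------------------------------------------------------------------------
  lprice |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
   ldist |   .3648752   .0657613      5.548   0.000       .2348615    .4948889
   _cons |   8.047158   .6462419     12.452   0.000       6.769503    9.324813
------------------------------------------------------------------------------

This regression suggests a strong link between housing price and distance from the incinerator (as distance increases, so does housing price). The elasticity is .365 and the t statistic is 5.55. However, this is not a good causal regression: the incinerator may have been put near homes with lower values to begin with.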
If so, we would expect the positive relationship found in the simple regression even if the new incinerator had no effect on housing prices.

b. The parameter $\delta_3$ should be positive: after the incinerator is built a house should be worth more the farther it is from the incinerator. Here is my Stata session:

. gen y81ldist = y81*ldist

. reg lprice y81 ldist y81ldist

  Source |       SS       df       MS              Number of obs =     321
---------+------------------------------           F(  3,   317) =   69.22
   Model |  24.3172548     3  8.10575159           Prob > F      =  0.0000
Residual |  37.1217306   317  .117103251           R-squared     =  0.3958
---------+------------------------------           Adj R-squared =  0.3901
   Total |  61.4389853   320  .191996829           Root MSE      =   .3422

------------------------------------------------------------------------------
  lprice |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
     y81 |  -.0113101   .8050622     -0.014   0.989       -1.59525     1.57263
   ldist |    .316689   .0515323      6.145   0.000       .2153006    .4180775
y81ldist |   .0481862   .0817929      0.589   0.556      -.1127394    .2091117
   _cons |   8.058468   .5084358     15.850   0.000       7.058133    9.058803
------------------------------------------------------------------------------

The coefficient on ldist reveals the shortcoming of the regression in part (a). This coefficient measures the relationship between lprice and ldist in 1978, before the incinerator was even being rumored. The effect of the incinerator is given by the coefficient on the interaction, y81ldist. While the direction of the effect is as expected, it is not especially large, and it is statistically insignificant anyway. Therefore, at this point, we cannot reject the null hypothesis that building the incinerator had no effect on housing prices.
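One way to see why the ldist coefficient in part (b) is the 1978 slope: the model with y81, ldist, and y81ldist allows a separate intercept and slope in each year, so the OLS coefficient on the interaction equals the 1981 slope minus the 1978 slope. The Python sketch below only redoes that arithmetic with the coefficients reported in parts (a) and (b); the variable names are illustrative.

```python
# Coefficients copied from the Stata output in Problem 6.7, parts (a) and (b).
slope_1981 = 0.3648752       # ldist coefficient, regression using only y81 == 1
slope_1978 = 0.316689        # ldist coefficient in the interacted regression
interaction = 0.0481862      # y81ldist coefficient in the interacted regression
se_interaction = 0.0817929   # its reported standard error

# With a full set of year interactions, the interaction coefficient is the
# difference between the year-specific slopes.
implied_interaction = slope_1981 - slope_1978

# t statistic for H0: delta_3 = 0
t_stat = interaction / se_interaction

print(round(implied_interaction, 7), round(t_stat, 3))
```

The difference in slopes matches the reported interaction coefficient, and the ratio reproduces the reported t statistic of about 0.59.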
c. Adding the variables listed in the problem gives

. reg lprice y81 ldist y81ldist lintst lintstsq larea lland age agesq rooms baths

  Source |       SS       df       MS              Number of obs =     321
---------+------------------------------           F( 11,   309) =  108.04
   Model |  48.7611143    11  4.43282858           Prob > F      =  0.0000
Residual |   12.677871   309  .041028709           R-squared     =  0.7937
---------+------------------------------           Adj R-squared =  0.7863
   Total |  61.4389853   320  .191996829           Root MSE      =  .20256

[Stata output: coefficient table for y81, ldist, y81ldist, lintst, lintstsq, larea, lland, age, agesq, rooms, baths, and _cons. The row for y81ldist reads Coef. = .0617759, Std. Err. = .0495705, t = 1.246, P>|t| = 0.214, 95% Conf. Interval = [-.0357625, .1593143].]

The incinerator effect is now larger (the elasticity is about .062) and the t statistic is larger, but the interaction is still statistically insignificant. Using these models and these two years of data we must conclude that the evidence that housing prices were adversely affected by the new incinerator is somewhat weak.

6.9. a. The Stata results are

. reg ldurat afchnge highearn afhigh male married head-construc if ky

      Source |       SS       df       MS              Number of obs =    5349
-------------+------------------------------           F( 14,  5334) =   16.37
       Model |  358.441793    14  25.6029852           Prob > F      =  0.0000
    Residual |  8341.41206  5334  1.56381928           R-squared     =  0.0412
-------------+------------------------------           Adj R-squared =  0.0387
       Total |  8699.85385  5348  1.62674904           Root MSE      =  1.2505

[Stata output: coefficient table for afchnge, highearn, afhigh, male, married, head, neck, upextr, trunk, lowback, lowextr, occdis, manuf, construc, and _cons. The row for afhigh reads Coef. = .2308768, Std. Err. = .0695248, t = 3.32, P>|t| = 0.001, 95% Conf. Interval = [.0945798, .3671738].]

The estimated coefficient on the interaction term is actually higher now, and even more statistically significant than in equation (6.33). Adding the other explanatory variables only slightly increased the standard error on the interaction term.

b. The small R-squared, on the order of 4.1%, or 3.9% if we used the adjusted R-squared, means that we cannot explain much of the variation in time on workers compensation using the variables included in the regression. This is often the case in the social sciences: it is very difficult to include the multitude of factors that can affect something like durat. The low R-squared means that making predictions of log(durat) would be very difficult given the factors we have included in the regression: the variation in the unobservables pretty much swamps the explained variation. However, the low R-squared does not mean we have a biased or inconsistent estimator of the effect of the policy change. Provided the Kentucky change is a good natural experiment, the OLS estimator is consistent. With over 5,000 observations, we can get a reasonably precise estimate of the effect, although the 95% confidence interval is pretty wide.

c. Using the data for Michigan to estimate the simple model gives

. reg ldurat afchnge highearn afhigh if mi

      Source |       SS       df       MS              Number of obs =    1524
-------------+------------------------------           F(  3,  1520) =    6.05
       Model |  34.3850177     3  11.4616726           Prob > F      =  0.0004
    Residual |  2879.96981  1520  1.89471698           R-squared     =  0.0118
-------------+------------------------------           Adj R-squared =  0.0098
       Total |  2914.35483  1523  1.91356194           Root MSE      =  1.3765

------------------------------------------------------------------------------
      ldurat |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     afchnge |   .0973808   .0847879     1.15   0.251    -.0689329    .2636945
    highearn |   .1691388   .1055676     1.60   0.109    -.0379348    .3762124
      afhigh |   .1919906   .1541699     1.25   0.213    -.1104176    .4943988
       _cons |   1.412737   .0567172    24.91   0.000     1.301485    1.523989
------------------------------------------------------------------------------

The coefficient on the interaction term, .192, is remarkably similar to that for Kentucky. Unfortunately, because of the many fewer observations, the t statistic is insignificant at the 10% level against a one-sided alternative. Asymptotic theory predicts that the standard error for Michigan will be about $(5{,}626/1{,}524)^{1/2} \approx 1.92$ larger than that for Kentucky. In fact, the ratio of standard errors is about 2.23. The difference in the KY and MI cases shows the importance of a large sample size for this kind of policy analysis.
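The sample-size comparison in Problem 6.9(c) rests on standard errors shrinking at the rate $1/\sqrt{N}$. The Python sketch below redoes that calculation and also compares the Michigan t statistic with the one-sided 10% normal critical value. It uses only numbers quoted above, and the normal approximation is an assumption made for illustration.

```python
import math
from statistics import NormalDist

n_ky = 5626   # Kentucky sample size used in the text's comparison
n_mi = 1524   # Michigan sample size from the Stata output

# Standard errors scale roughly with 1/sqrt(N), so the Michigan standard error
# is predicted to be larger by about sqrt(n_ky / n_mi).
predicted_ratio = math.sqrt(n_ky / n_mi)

# Michigan interaction estimate and standard error, from the Stata output.
afhigh_mi = 0.1919906
se_afhigh_mi = 0.1541699
t_mi = afhigh_mi / se_afhigh_mi

# One-sided 10% critical value from the standard normal distribution.
crit_10_one_sided = NormalDist().inv_cdf(0.90)

print(round(predicted_ratio, 2), round(t_mi, 2), round(crit_10_one_sided, 2))
```

The Michigan t statistic falls short of the one-sided 10% critical value, which is the sense in which the estimate is insignificant despite a point estimate close to Kentucky's.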
6.11. The following is Stata output that I will use to answer the first three parts:

. reg lwage y85 educ y85educ exper expersq union female y85fem

      Source |       SS       df       MS              Number of obs =    1084
-------------+------------------------------           F(  8,  1075) =   99.80
       Model |  135.992074     8  16.9990092           Prob > F      =  0.0000
    Residual |  183.099094  1075  .170324738           R-squared     =  0.4262
-------------+------------------------------           Adj R-squared =  0.4219
       Total |  319.091167  1083   .29463635           Root MSE      =   .4127

------------------------------------------------------------------------------
       lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         y85 |   .1178062   .1237817     0.95   0.341     -.125075    .3606874
        educ |   .0747209   .0066764    11.19   0.000     .0616206    .0878212
     y85educ |   .0184605   .0093542     1.97   0.049      .000106     .036815
       exper |   .0295843   .0035673     8.29   0.000     .0225846     .036584
     expersq |  -.0003994   .0000775    -5.15   0.000    -.0005516   -.0002473
       union |   .2021319   .0302945     6.67   0.000     .1426888    .2615749
      female |  -.3167086   .0366215    -8.65   0.000    -.3885663    -.244851
      y85fem |    .085052    .051309     1.66   0.098    -.0156251     .185729
       _cons |   .4589329   .0934485     4.91   0.000     .2755707     .642295
------------------------------------------------------------------------------

a. The return to another year of education increased by about .0185, or 1.85 percentage points, between 1978 and 1985. The t statistic is 1.97, which is marginally significant at the 5% level against a two-sided alternative.

b. The coefficient on y85fem is positive and shows that the estimated gender gap declined by about 8.5 percentage points. But the t statistic is only significant at about the 10% level against a two-sided alternative. Still, this is suggestive of some closing of wage differentials between men and women at given levels of education and workforce experience.

c. Only the coefficient on y85 changes if wages are measured in 1978 dollars. In fact, you can check that when 1978 wages are used, the coefficient on y85 becomes about -.383, which shows a significant fall in real wages for given productivity characteristics and gender over the seven-year period. (But see part e for the proper interpretation of the coefficient.)

d. To answer this question, I just took the squared OLS residuals and regressed those on the year dummy, y85. The coefficient is about .042 with a standard error of about .022, which gives a t statistic of about 1.91. So there is some evidence that the variance of the unexplained part of log wages (or log real wages) has increased over time.

e. As the equation is written in the problem, the coefficient $\delta_0$ is the growth in nominal wages for a male with no years of education! For a male with 12 years of education, we want $\theta_0 \equiv \delta_0 + 12\delta_1$. A simple way to obtain the standard error of $\hat{\theta}_0 = \hat{\delta}_0 + 12\hat{\delta}_1$ is to replace $y85 \cdot educ$ with $y85 \cdot (educ - 12)$. Simple algebra shows that, in the new model, $\theta_0$ is the coefficient on y85. In Stata we have

. gen y85educ0 = y85*(educ - 12)

. reg lwage y85 educ y85educ0 exper expersq union female y85fem

      Source |       SS       df       MS              Number of obs =    1084
-------------+------------------------------           F(  8,  1075) =   99.80
       Model |  135.992074     8  16.9990092           Prob > F      =  0.0000
    Residual |  183.099094  1075  .170324738           R-squared     =  0.4262
-------------+------------------------------           Adj R-squared =  0.4219
       Total |  319.091167  1083   .29463635           Root MSE      =   .4127

------------------------------------------------------------------------------
       lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         y85 |   .3393326   .0340099     9.98   0.000     .2725993    .4060659
        educ |   .0747209   .0066764    11.19   0.000     .0616206    .0878212
    y85educ0 |   .0184605   .0093542     1.97   0.049      .000106     .036815
       exper |   .0295843   .0035673     8.29   0.000     .0225846     .036584
     expersq |  -.0003994   .0000775    -5.15   0.000    -.0005516   -.0002473
       union |   .2021319   .0302945     6.67   0.000     .1426888    .2615749
      female |  -.3167086   .0366215    -8.65   0.000    -.3885663    -.244851
      y85fem |    .085052    .051309     1.66   0.098    -.0156251     .185729
       _cons |   .4589329   .0934485     4.91   0.000     .2755707     .642295
------------------------------------------------------------------------------

So the growth in nominal wages for a man with educ = 12 is about .339, or 33.9%. [We could use the more accurate estimate, obtained from exp(.339) - 1.] The 95% confidence interval goes from about 27.3 to 40.6.
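The reparameterization in Problem 6.11(e) can be sanity-checked from the first regression alone, since the point estimate of $\theta_0$ must equal $\hat{\delta}_0 + 12\hat{\delta}_1$. The Python sketch below does that arithmetic and also computes the exact percentage change mentioned in brackets. It uses only coefficients reported above and is meant as an illustration, not as part of the original solution.

```python
import math

# Coefficients from the first regression in Problem 6.11.
delta0_hat = 0.1178062   # coefficient on y85
delta1_hat = 0.0184605   # coefficient on y85educ

# Point estimate of theta_0 = delta_0 + 12*delta_1.
theta0_hat = delta0_hat + 12 * delta1_hat

# Coefficient on y85 reported after replacing y85*educ with y85*(educ - 12).
theta0_reported = 0.3393326

# Exact proportionate change implied by a log difference of theta0_reported.
exact_growth = math.exp(theta0_reported) - 1

print(round(theta0_hat, 4), round(exact_growth, 3))
```

The hand-computed sum agrees with the reported coefficient up to rounding. What the second regression adds is the standard error, which cannot be recovered from the first table without the covariance between the two estimates.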
CHAPTER 7

7.1. Write (with probability approaching one)

$$\hat{\boldsymbol{\beta}} = \boldsymbol{\beta} + \Big(N^{-1}\sum_{i=1}^N X_i'X_i\Big)^{-1}\Big(N^{-1}\sum_{i=1}^N X_i'u_i\Big).$$

From SOLS.2, the weak law of large numbers, and Slutsky's Theorem,

$$plim \Big(N^{-1}\sum_{i=1}^N X_i'X_i\Big)^{-1} = A^{-1}.$$

Further, under SOLS.1, the WLLN implies that $plim\, N^{-1}\sum_{i=1}^N X_i'u_i = 0$. Thus,

$$plim\, \hat{\boldsymbol{\beta}} = \boldsymbol{\beta} + plim \Big(N^{-1}\sum_{i=1}^N X_i'X_i\Big)^{-1} \cdot plim \Big(N^{-1}\sum_{i=1}^N X_i'u_i\Big) = \boldsymbol{\beta} + A^{-1}\cdot 0 = \boldsymbol{\beta}.$$

7.3. a. Since OLS equation-by-equation is the same as GLS when $\boldsymbol{\Omega}$ is diagonal, it suffices to show that the GLS estimators for different equations are asymptotically uncorrelated. This follows if the asymptotic variance matrix is block diagonal (see Section 3.5), where the blocking is by the parameter vector for each equation. To establish block diagonality, we use the result from Theorem 7.4: under SGLS.1, SGLS.2, and SGLS.3,

$$Avar \sqrt{N}(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta}) = [E(X_i'\boldsymbol{\Omega}^{-1}X_i)]^{-1}.$$

Now, we can use the special form of $X_i$ for SUR (see Example 7.1), the fact that $\boldsymbol{\Omega}^{-1}$ is diagonal, and SGLS.3. In the SUR model with diagonal $\boldsymbol{\Omega}$, SGLS.3 implies that $E(u_{ig}^2 x_{ig}'x_{ig}) = \sigma_g^2 E(x_{ig}'x_{ig})$ for all g = 1,...,G, and $E(u_{ig}u_{ih}x_{ig}'x_{ih}) = E(u_{ig}u_{ih})E(x_{ig}'x_{ih}) = 0$, all $g \neq h$. Therefore, we have

$$E(X_i'\boldsymbol{\Omega}^{-1}X_i) = \begin{pmatrix} \sigma_1^{-2}E(x_{i1}'x_{i1}) & & 0 \\ & \ddots & \\ 0 & & \sigma_G^{-2}E(x_{iG}'x_{iG}) \end{pmatrix}.$$

When this matrix is inverted, it is also block diagonal. This shows that the asymptotic variance is block diagonal, which is what we wanted to show.

b. To test any linear hypothesis, we can either construct the Wald statistic or we can use the weighted sum of squared residuals form of the statistic as in (7.52) or (7.53). For the restricted SSR we must estimate the model with the restriction $\boldsymbol{\beta}_1 = \boldsymbol{\beta}_2$ imposed. See Problem 7.6 for one way to impose general linear restrictions.

c. When $\boldsymbol{\Omega}$ is diagonal in a SUR system, system OLS and GLS are the same. Under SGLS.1 and SGLS.2, GLS and FGLS are asymptotically equivalent (regardless of the structure of $\boldsymbol{\Omega}$) whether or not SGLS.3 holds. But, if $\hat{\boldsymbol{\beta}}_{SOLS} = \hat{\boldsymbol{\beta}}_{GLS}$ and $\sqrt{N}(\hat{\boldsymbol{\beta}}_{FGLS} - \hat{\boldsymbol{\beta}}_{GLS}) = o_p(1)$, then $\sqrt{N}(\hat{\boldsymbol{\beta}}_{SOLS} - \hat{\boldsymbol{\beta}}_{FGLS}) = o_p(1)$. Thus, when $\boldsymbol{\Omega}$ is diagonal, OLS and FGLS are asymptotically equivalent, even if $\hat{\boldsymbol{\Omega}}$ is estimated in an unrestricted fashion and even if the system homoskedasticity assumption SGLS.3 does not hold.

7.5. This is easy with the hint. Note that

$$\Big[\hat{\boldsymbol{\Omega}}^{-1} \otimes \Big(\sum_{i=1}^N x_i'x_i\Big)\Big]^{-1} = \hat{\boldsymbol{\Omega}} \otimes \Big(\sum_{i=1}^N x_i'x_i\Big)^{-1}.$$

Therefore,

$$\hat{\boldsymbol{\beta}} = \Big[\hat{\boldsymbol{\Omega}} \otimes \Big(\sum_{i=1}^N x_i'x_i\Big)^{-1}\Big](\hat{\boldsymbol{\Omega}}^{-1} \otimes I_K)\begin{pmatrix} \sum_{i=1}^N x_i'y_{i1} \\ \vdots \\ \sum_{i=1}^N x_i'y_{iG} \end{pmatrix} = \Big[I_G \otimes \Big(\sum_{i=1}^N x_i'x_i\Big)^{-1}\Big]\begin{pmatrix} \sum_{i=1}^N x_i'y_{i1} \\ \vdots \\ \sum_{i=1}^N x_i'y_{iG} \end{pmatrix}.$$

Straightforward multiplication shows that the right hand side of the equation is just the vector of stacked $\hat{\boldsymbol{\beta}}_g$, g = 1,...,G, where $\hat{\boldsymbol{\beta}}_g$ is the OLS estimator for equation g.

7.7. a. First, the diagonal elements of $\boldsymbol{\Omega}$ are easily found since $E(u_{it}^2) = E[E(u_{it}^2|x_{it})] = \sigma_t^2$ by iterated expectations. Now, consider $E(u_{it}u_{is})$, and take s < t without loss of generality. Under (7.79), $E(u_{it}|u_{is}) = 0$ since $u_{is}$ is a subset of the conditioning information in (7.80). Applying the law of iterated expectations (LIE) again we have

$$E(u_{it}u_{is}) = E[E(u_{it}u_{is}|u_{is})] = E[E(u_{it}|u_{is})u_{is}] = 0.$$

b. The GLS estimator is

$$\boldsymbol{\beta}^* \equiv \Big(\sum_{i=1}^N X_i'\boldsymbol{\Omega}^{-1}X_i\Big)^{-1}\Big(\sum_{i=1}^N X_i'\boldsymbol{\Omega}^{-1}y_i\Big) = \Big(\sum_{i=1}^N\sum_{t=1}^T \sigma_t^{-2}x_{it}'x_{it}\Big)^{-1}\Big(\sum_{i=1}^N\sum_{t=1}^T \sigma_t^{-2}x_{it}'y_{it}\Big).$$

c. If, say, $y_{it} = \beta_0 + \beta_1 y_{i,t-1} + u_{it}$, then $y_{it}$ is clearly correlated with $u_{it}$, which says that $x_{i,t+1} = y_{it}$ is correlated with $u_{it}$. Thus, SGLS.1 does not hold. Generally, SGLS.1 holds whenever there is no feedback from $y_{it}$ to $x_{is}$, s > t. However, since $\boldsymbol{\Omega}^{-1}$ is diagonal, $X_i'\boldsymbol{\Omega}^{-1}u_i = \sum_{t=1}^T x_{it}'\sigma_t^{-2}u_{it}$, and so

$$E(X_i'\boldsymbol{\Omega}^{-1}u_i) = \sum_{t=1}^T \sigma_t^{-2}E(x_{it}'u_{it}) = 0$$

since $E(x_{it}'u_{it}) = 0$ under (7.80). Thus, GLS is consistent in this case without SGLS.1.

d. First, since $\boldsymbol{\Omega}^{-1}$ is diagonal, $X_i'\boldsymbol{\Omega}^{-1} = (\sigma_1^{-2}x_{i1}', \sigma_2^{-2}x_{i2}', \ldots, \sigma_T^{-2}x_{iT}')$, and so

$$E(X_i'\boldsymbol{\Omega}^{-1}u_iu_i'\boldsymbol{\Omega}^{-1}X_i) = \sum_{t=1}^T\sum_{s=1}^T \sigma_t^{-2}\sigma_s^{-2}E(u_{it}u_{is}x_{it}'x_{is}).$$

First consider the terms for $s \neq t$. Under (7.80), if s < t, $E(u_{it}|x_{it}, u_{is}, x_{is}) = 0$, and so by the LIE, $E(u_{it}u_{is}x_{it}'x_{is}) = 0$, $t \neq s$. Next, for each t,

$$E(u_{it}^2x_{it}'x_{it}) = E[E(u_{it}^2x_{it}'x_{it}|x_{it})] = E[E(u_{it}^2|x_{it})x_{it}'x_{it}] = E[\sigma_t^2x_{it}'x_{it}] = \sigma_t^2E(x_{it}'x_{it}), \quad t = 1, 2, \ldots, T.$$

It follows that

$$E(X_i'\boldsymbol{\Omega}^{-1}u_iu_i'\boldsymbol{\Omega}^{-1}X_i) = \sum_{t=1}^T \sigma_t^{-2}E(x_{it}'x_{it}) = E(X_i'\boldsymbol{\Omega}^{-1}X_i).$$

e. First, run pooled regression across all i and t; let $\hat{u}_{it}$ denote the pooled OLS residuals. Then, for each t, define

$$\hat{\sigma}_t^2 = N^{-1}\sum_{i=1}^N \hat{u}_{it}^2.$$

(We might replace N with N - K as a degrees-of-freedom adjustment.) Then, by standard arguments, $\hat{\sigma}_t^2 \to_p \sigma_t^2$ as $N \to \infty$.

f. We have verified the assumptions under which standard FGLS statistics have nice properties (although we relaxed SGLS.1). In particular, standard errors obtained from (7.51) are asymptotically valid, and F statistics from (7.53) are valid. Now, if $\hat{\boldsymbol{\Omega}}$ is taken to be the diagonal matrix with $\hat{\sigma}_t^2$ as the t-th diagonal, then the FGLS statistics are easily shown to be identical to the statistics obtained by performing pooled OLS on the equation

$$(y_{it}/\hat{\sigma}_t) = (x_{it}/\hat{\sigma}_t)\boldsymbol{\beta} + error_{it}, \quad t = 1, 2, \ldots, T, \; i = 1, \ldots, N.$$

We can obtain valid standard errors, t statistics, and F statistics from this weighted least squares analysis. For F testing, note that the $\hat{\sigma}_t^2$ should be obtained from the pooled OLS residuals for the unrestricted model.

g. If $\sigma_t^2 = \sigma^2$ for all t = 1,...,T, inference is very easy. FGLS reduces to pooled OLS. Thus, we can use the standard errors and test statistics reported by a standard OLS regression pooled across i and t.

7.9. The Stata session follows. I first test for serial correlation before computing the fully robust standard errors:

. reg lscrap d89 grant grant_1 lscrap_1 if year != 1987

  Source |       SS       df       MS              Number of obs =     108
---------+------------------------------           F(  4,   103) =  153.67
   Model |  186.376973     4  46.5942432           Prob > F      =  0.0000
Residual |  31.2296502   103  .303200488           R-squared     =  0.8565
---------+------------------------------           Adj R-squared =  0.8509
   Total |  217.606623   107  2.03370676           Root MSE      =  .55064

------------------------------------------------------------------------------
  lscrap |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
     d89 |  -.1153893   .1199127     -0.962   0.338      -.3532078    .1224292
   grant |  -.1723924   .1257443     -1.371   0.173      -.4217765    .0769918
 grant_1 |  -.1073226   .1610378     -0.666   0.507       -.426703    .2120579
lscrap_1 |   .8808216   .0357963     24.606   0.000        .809828    .9518152
   _cons |  -.0371354   .0883283     -0.420   0.675      -.2123137     .138043
------------------------------------------------------------------------------

The estimated effect of grant, and its lag, are now the expected sign, but neither is strongly statistically significant. The variable grant would be if we use a 10% significance level and a one-sided test. The results are certainly different from when we omit the lag of log(scrap).

Now test for AR(1) serial correlation:

. predict uhat, resid
(363 missing values generated)

. gen uhat_1 = uhat[_n-1] if d89
(417 missing values generated)

. reg lscrap grant grant_1 lscrap_1 uhat_1 if d89

  Source |       SS       df       MS              Number of obs =      54
---------+------------------------------           F(  4,    49) =   73.47
   Model |  94.4746525     4  23.6186631           Prob > F      =  0.0000
Residual |  15.7530202    49  .321490208           R-squared     =  0.8571
---------+------------------------------           Adj R-squared =  0.8454
   Total |  110.227673    53  2.07976741           Root MSE      =    .567

------------------------------------------------------------------------------
  lscrap |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
   grant |   .0165089    .215732      0.077   0.939      -.4170208    .4500385
 grant_1 |  -.0276544   .1746251     -0.158   0.875      -.3785767    .3232679
lscrap_1 |   .9204706   .0571831     16.097   0.000       .8055569    1.035384
  uhat_1 |   .2790328   .1576739      1.770   0.083      -.0378247    .5958904
   _cons |   -.232525   .1146314     -2.028   0.048      -.4628854   -.0021646
------------------------------------------------------------------------------

. reg lscrap d89 grant grant_1 lscrap_1 if year != 1987, robust cluster(fcode)

[Stata output: regression with robust standard errors. Number of obs = 108; F(4, 53) = 77.24; Prob > F = 0.0000; R-squared = 0.8565; Root MSE = .55064; Number of clusters (fcode) = 54.]

grant and $grant_{-1}$ are jointly insignificant:

. test grant grant_1

 ( 1)  grant = 0.0
 ( 2)  grant_1 = 0.0

7.11. The following Stata output should be self-explanatory.
There is  strong evidence of positive serial correlation in the static model.8565 . .5700 0.328108652  Number of obs F( 11.318 -.694 0. Err.49 0.000 -. making both more statistically significant. reg lcrmrte lprbarr lprbconv lprbpris lavgsen lpolpc d82-d87 Source | SS df MS ---------+-----------------------------Model | 117.1073226 .1155314 . Interval] -------------+---------------------------------------------------------------d89 | -.0672268 3.1420073 -----------------------------------------------------------------------------The robust standard errors for grant and grant-1 are actually smaller than the usual ones.3266  7.4108369 .7195033 .3795728 39  .4938765 lprbpris | .0 grant_1 = 0.  However.551 -.0263683 -20.1790052 -0.6949699 Residual | 88.55064  -----------------------------------------------------------------------------| Robust lscrap | Coef.65 0.2517165 lscrap_1 | .8808216 . and the fully robust standard errors are much larger than the nonrobust ones.0367657 -19. Interval] ---------+-------------------------------------------------------------------lprbarr | -.14 0.0371354 .5456589 . Std.000 .216278 .45 0. 53) = Prob > F =  1.37893  -----------------------------------------------------------------------------lcrmrte | Coef. a.0000 0.682 0.644669 11 10.  I obtain the fully robust standard errors:  .0867575 .000 -2. robust cluster(county) Regression with robust standard errors  Number of obs = F( 11.056127934 ---------+-----------------------------Total | 76. reg lcrmrte lprbarr lprbconv lprbpris lavgsen lpolpc d82-d87. t P>|t| [95% Conf. t P>|t| [95% Conf.222504 .1968286 538 .1086284 . 538) Prob > F R-squared Adj R-squared Root MSE  = = = = = =  540 831.9372719 -.0420791 .0696601 d84 | -. Err.2516253 -8.8648693 539 .1387815 .1095979 -6.0714718 d87 | -.0578218 -0. resid .2005023 .181 -.0051371 .0200271 -----------------------------------------------------------------------------Because of the strong serial correlation. 
gen uhat_1 = uhat[_n-1] if year > 81 (90 missing values generated) .7918085 .0200271 .6071 0. reg uhat uhat_1 Source | SS df MS ---------+-----------------------------Model | 46.835 0.000 -. Interval] ---------+-------------------------------------------------------------------uhat_1 | .057931 0.275 0.061 -.56 0.000 .576438 -1.728 0.3070248 .19 0.000 1. Std.878 0.000 .338 0.467 -.0000 0.0300252 12. Err.451 -.0101951 0.089 0.755 0.043503 .082293 .7195033 .6680407 Residual | 30.0049957 d85 | -.189 0.74e-10 . Interval] -------------+---------------------------------------------------------------lprbarr | -.0579205 -1.lavgsen | -.057923 -1. predict uhat.588149 -----------------------------------------------------------------------------.0364927 d86 | -.635 -. Std.02746 28.0270426 .23691  -----------------------------------------------------------------------------uhat | Coef. 89) = Prob > F = R-squared = Root MSE =  Number of clusters (county) = 90  630 37.0000 0.0780454 .7378666 .37893  -----------------------------------------------------------------------------| Robust lcrmrte | Coef.1566662 .5700 .4249525 d82 | .498 0.0583244 -1.0269872 lpolpc | .929 -.8457504 _cons | 1.475 0.6680407 1 46.6064 .3659886 .46 0.15563 .1087542 .056899 -0.0846963 _cons | -2.135 -.000 -.5017347 40  .142606437  Number of obs F( 1.0576243 -0.1925835 .1189026 d83 | -. 230 0.889 -.1062619 -.0440833 d86 | . the lagged crime rate is very significant.0267623 -2.0165096 -7.0229405 -7.8637879 _cons | -.0780454 .1609444 -.3113499 .27 0.7888214 . 528) Prob > F R-squared Adj R-squared Root MSE  = = = = = =  540 464.0085982 .1272783 .0555355 lpolpc | .8647054 -2.68 0.031945255 ---------+-----------------------------Total | 180.8670945 528 .174924 -.71 0.784 0.199 -.045 -.1324195 -0.0000 0.445 -.77 0.229651 -----------------------------------------------------------------------------Not surprisingly.0108203 .014 .154268 539 .0270426 .1254092 .1088453 2. 1981. 
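The serial correlation check used in Problems 7.9 and 7.11 regresses the pooled OLS residuals on their own lag. The following is a minimal sketch of that idea, not a reproduction of the Stata session: the residuals are invented, and the helper names are hypothetical. It also makes explicit something the `uhat[_n-1]` construction relies on implicitly, namely that a lag must never be taken across two different cross-section units.

```python
# Sketch of an AR(1) check on panel residuals: regress u_t on u_{t-1}, pooling
# across units. The residual values below are made up for illustration.

def lag_within_units(resid_by_unit):
    """Pair each residual with its own lag, never crossing from one unit to the next."""
    pairs = []
    for series in resid_by_unit:
        for t in range(1, len(series)):
            pairs.append((series[t - 1], series[t]))
    return pairs


def ar1_coefficient(resid_by_unit):
    """Slope from a pooled regression (with intercept) of u_t on u_{t-1}."""
    pairs = lag_within_units(resid_by_unit)
    n = len(pairs)
    mean_lag = sum(lag for lag, _ in pairs) / n
    mean_cur = sum(cur for _, cur in pairs) / n
    sxy = sum((lag - mean_lag) * (cur - mean_cur) for lag, cur in pairs)
    sxx = sum((lag - mean_lag) ** 2 for lag, _ in pairs)
    return sxy / sxx


# Hypothetical residuals for three units observed over four periods each.
resid_by_unit = [
    [1.0, 0.8, 0.6, 0.5],
    [-1.0, -0.7, -0.6, -0.4],
    [0.2, 0.1, 0.1, 0.0],
]

rho_hat = ar1_coefficient(resid_by_unit)
print(rho_hat)  # a clearly positive value points toward positive serial correlation
```

The first period of every unit is lost when forming the lag, which is why the Stata session reports missing values after `gen uhat_1`. A formal test would use the t statistic on the lag coefficient rather than the point estimate alone.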
Err.0420159 .0678438 .02 0.082293 .043503 .0190806 43.1028353 .17873  -----------------------------------------------------------------------------lcrmrte | Coef.2119007 -.0570255 lavgsen | -.321 0.025 .0381447 -0. when we add the lag of log(crmrte): .1865956 -.480 -.1337606 d83 | -.  41  The  .1217691 lprbconv | -.119 -.0267299 -2.018 -3.02 0. reg lcrmrte lprbarr lprbconv lprbpris lavgsen lpolpc d83-d87 lcrmrt_1 Source | SS df MS ---------+-----------------------------Model | 163.007 -.1130321 -0.000 .0124338 d84 | -.75 0.0107492 .0692234 .818 -.0011145 d85 | -.030387 -3.9064 0.800445 -.41 0.0268172 -0. We lose the first year.101492 .4638255 lavgsen | -. gen lcrmrt_1 = lcrmrte[_n-1] if year > 81 (90 missing values generated) .000 -.4057025 lprbpris | .0345003 -0.329 -.0385625 -2.0137298 .1546683 -.1087542 .015 -.121078 3.8442885 Residual | 16.0671272 .0428788 -0. t P>|t| [95% Conf.1152298 .046 -.0536882 .0051371 .9044 .026896 1.3659886 .0014224 d86 | -.0420791 .755 -.  Further.lprbconv | -.0948522 d87 | .14 0.749 -.0960793 lprbpris | -.5456589 .2475521 .000 -.0164261 6.29 0.6065681 d82 | .0431201 d87 | -.0233448 d84 | -.562 0.000 .0867575 .0304828 .8263047 .2906166 .306 0.312 0.078524 .006 0.273 0.  including it makes all other coefficients much smaller in magnitude.0312787 .792 0.0271816 2.0309127 d85 | -.0704368 -7.334237975  Number of obs F( 11.000 -.1174537 -.78 0.1103509 .179 0.0649438 . drop uhat uhat_1 b.1285118 .033643 -1.0391758 -2.470 0.0612797 .3641423 -----------------------------------------------------------------------------.0367296 0.430 0. Interval] ---------+-------------------------------------------------------------------lprbarr | -.0781181 d83 | -.1205245 lcrmrt_1 | .1668349 . Std.0487502 _cons | -2.287174 11 14.1378348 lpolpc | .003 .000 -.6856152 -.98 0. 000 -.2214516 -.154268 539 .272 0.0287165 -2.542 0.557 0.17667116 Residual | 16.000 -.0169096 -7.533423 20 8. 
reg lcrmrte lprbarr lprbconv lprbpris lavgsen lpolpc d84-d87 lcrmrt_1 uhat_1 From this regression the coefficient on uhat-1 is only -.023 -.334237975  Number of obs F( 20. t P>|t| [95% Conf.32 0.03202475 ---------+-----------------------------Total | 180.1389838 d83 | -.0195318 .0652494 . Interval] ---------+-------------------------------------------------------------------lprbarr | -.1005518 lprbpris | -.554 0.580 -. 519) Prob > F R-squared Adj R-squared Root MSE  = = = = = =  540 255.9042 . d.0729231 .0088345 42  .0497918 lavgsen | -.049654 lpolpc | .1108926 . Std.0888553 .0352873 -0.071157 .1337714 .059 with t statistic -. although it is insignificant.6208452 519 .1277591 lprbconv | -.000 -.1292903 -.0000 0.166991 -. Err. gen uhat_1 = uhat[_n-1] if year > 82 (180 missing values generated) . which means that there is little evidence of serial correlation (especially since  ^  r is practically small).0172627 6.0238458 -7.000 .1746053 . None of the log(wage) variables is statistically significant.0165559 d84 | -.9077 0.1050704 . c.variable log(prbpris) now has a negative sign.1721313 -.911 0.087 0. reg lcrmrte lprbarr lprbconv lprbpris lavgsen lpolpc d83-d87 lcrmrt_1 lwconlwloc Source | SS df MS ---------+-----------------------------Model | 163.0286922 -2. and the magnitudes are pretty small in all cases: . We still get a positive relationship between size of police force and crime rate.322 0. resid (90 missing values generated) . however.0311719 -3. I will not correct the  standard errors.1216644 -. predict uhat.011 -.17895  -----------------------------------------------------------------------------lcrmrte | Coef.  Thus. There is no evidence of serial correlation in the model with a lagged dependent variable: .986. 767901 .0 0.1. ^ ^ ^ (X’ZWZ’X)B = (X’ZWZ’Y).0 0.098539 lwfir | ..1173893 .0283133 . Letting Q(b) denote the objective function in (8.6438061 .1054249 .478 -.038269 d86 | .051 0.0660699 -1.5663  CHAPTER 8  8.0 0.  43  .181 -.0392516 -0. 
test lwcon lwtuc lwtrd lwfir lwser lwmfg lwfed lwsta lwloc ( ( ( ( ( ( ( ( (  1) 2) 3) 4) 5) 6) 7) 8) 9)  lwcon lwtuc lwtrd lwfir lwser lwmfg lwfed lwsta lwloc F(  = = = = = = = = =  0.354 -.791 0.154 0.0564908 lwmfg | -.310 -1.266 -.877 -.0409046 .0903894 .1003172 0. we can write.0405482 lwtrd | .88852 .0994076 d87 | ..871 0. it follows from multivariable calculus that  dQ(b)’ N ^& N .721 0.0 0..0223995 -0.0121236 ..1286819 lcrmrt_1 | .000 .582 0.172 -.0221872 0.561 -.710 0..0922683 lwser | .0263763 ..783 -.23). 519) = Prob > F =  0.0 0.0465632 .0318995 0.0355801 lwfed | .2201867 .. 7i=1 i i8 7i=1 i i In terms of full data matrices.0474615 .294 -.341 0.X B ^ * i )8 = 0.0208067 38..114 0.0466549 .0306847 .016 0. after simple algebra.0326156 -0.0 0.8496525 lwcon | -.6009076 -----------------------------------------------------------------------------.1070534 .0487983 lwtuc | -..= -2&7 S Z’i Xi*’ S Z’i (yi . 8 W7i=1 db i=1 Evaluating the derivative at the solution  B^  gives  & SN Z’X *’^W& SN Z’(y .0 0.928 0.471 -..0330676 .Xib)*8.d85 | -.0258059 ..1009652 .958 0.8087768 .6335887 -1.0034567 .0742918 ..0898807 .276 0.368 0.85 0.0296003 .0 0.0798526 1..0961124 .0355555 .0418004 1.012903 .0  9.2639275 lwsta | -.338 -..039408 lwloc | .0389325 -1.0371746 0.429 -.0498207 .29319 _cons | -.0439875 0. 5. where x  = z^ and E(z’e) = 0.  Then we must show that  ^2  = 0. E(s’r) = 0. s is simply a function of z since h shows that  ^2  _ h(z). it is now 44  .D(D’D) D’]* -1  -1/2  C. by the assumption that E(x|z) = L(x|z). th  is block diagonal with g  block Z’ g Xg.7 in Chapter 2). E(z’x) = E(z’x ). E(r|z) = 0.  But  Therefore.L(h|z) and r _ x .  Since this is a matrix quadratic form in the L  * L  symmetric. from  the two-step projection theorem (see Property LP. Z’X  Using these facts.  [IL .24).  But. r is also equal to x .E(x|z). idempotent matrix IL .  verifies the first part of the hint.  8. it is necessarily itself -1  positive semi-definite.7. 
Solving for $\hat{\beta}$ gives (8.24).

8.3. First, we can always write $x$ as its linear projection plus an error: $x = x^* + e$, where $x^* = z\Pi$ and $E(z'e) = 0$. Therefore, $E(z'x) = E(z'x^*)$, which verifies the first part of the hint. To verify the second step, let $h \equiv h(z)$, and write the linear projection as

$$L(x|z,h) = z\Pi_1 + h\Pi_2,$$

where $\Pi_1$ is $M \times K$ and $\Pi_2$ is $Q \times K$. Then we must show that $\Pi_2 = 0$. But, from the two-step projection theorem (see Property LP.7 in Chapter 2), $\Pi_2 = [E(s's)]^{-1}E(s'r)$, where $s \equiv h - L(h|z)$ and $r \equiv x - L(x|z)$. Now, by the assumption that $E(x|z) = L(x|z)$, $r$ is also equal to $x - E(x|z)$. Therefore, $E(r|z) = 0$, and so $r$ is uncorrelated with all functions of $z$. But $s$ is simply a function of $z$ since $h \equiv h(z)$. Therefore, $E(s'r) = 0$, and this shows that $\Pi_2 = 0$.

8.5. This follows directly from the hint. Straightforward matrix algebra shows that $(C'\Lambda^{-1}C) - (C'WC)(C'W\Lambda WC)^{-1}(C'WC)$ can be written as

$$C'\Lambda^{-1/2}[I_L - D(D'D)^{-1}D']\Lambda^{-1/2}C,$$

where $D \equiv \Lambda^{1/2}WC$. Since this is a matrix quadratic form in the $L \times L$ symmetric, idempotent matrix $I_L - D(D'D)^{-1}D'$, it is necessarily itself positive semi-definite.

8.7. When $\hat{\Omega}$ is diagonal and $Z_i$ has the form in (8.15),

$$\sum_{i=1}^N Z_i'\hat{\Omega}Z_i = Z'(I_N \otimes \hat{\Omega})Z$$

is a block diagonal matrix with $g$th block $\hat{\sigma}_g^2\left(\sum_{i=1}^N z_{ig}'z_{ig}\right) \equiv \hat{\sigma}_g^2 Z_g'Z_g$, where $Z_g$ denotes the $N \times L_g$ matrix of instruments for the $g$th equation. Further, $Z'X$ is block diagonal with $g$th block $Z_g'X_g$. Using these facts, it is now straightforward to show that the 3SLS estimator consists of

$$[X_g'Z_g(Z_g'Z_g)^{-1}Z_g'X_g]^{-1}X_g'Z_g(Z_g'Z_g)^{-1}Z_g'Y_g$$

stacked from $g = 1,\ldots,G$. This is just the system 2SLS estimator or, equivalently, 2SLS equation-by-equation.

8.9. a. The optimal instruments are given in Theorem 8.5, with $G = 1$:

$$z_i^* = [\omega(z_i)]^{-1}E(x_i|z_i), \qquad \omega(z_i) = E(u_i^2|z_i).$$

If $E(u_i^2|z_i) = \sigma^2$ and $E(x_i|z_i) = z_i\Pi$, then the optimal instruments are $\sigma^{-2}z_i\Pi$. The constant multiple $\sigma^{-2}$ clearly has no effect on the optimal IV estimator, so the optimal instruments are $z_i\Pi$. These are the optimal IVs underlying 2SLS, except that $\Pi$ is replaced with its $\sqrt{N}$-consistent OLS estimator. The 2SLS estimator has the same asymptotic variance whether $\Pi$ or $\hat{\Pi}$ is used, and so 2SLS is asymptotically efficient.

b. If $E(u|x) = 0$ and $E(u^2|x) = \sigma^2$ then the optimal instruments are $\sigma^{-2}E(x|x) = \sigma^{-2}x$, and this leads to the OLS estimator.

8.11. a. This is a simple application of Theorem 8.5 when $G = 1$. Without the $i$ subscript, $x_1 = (z_1,y_2)$ and so $E(x_1|z) = [z_1,E(y_2|z)]$. Further, $\Omega(z) = \mathrm{Var}(u_1|z) = \sigma_1^2$. It follows that the optimal instruments are $(1/\sigma_1^2)[z_1,E(y_2|z)]$. Dropping the division by $\sigma_1^2$ clearly does not affect the optimal instruments.

b. If $y_2$ is binary then $E(y_2|z) = P(y_2 = 1|z) = F(z)$, and so the optimal IVs are $[z_1,F(z)]$.

CHAPTER 9

9.1. a. No. What causal inference could one draw from this? We may be interested in the tradeoff between wages and benefits, but then either of these can be taken as the dependent variable and the analysis would be by OLS. Of course, if we have omitted some important factors or have a measurement error problem, OLS could be inconsistent for estimating the tradeoff. But it is not a simultaneity problem.

b. Yes. We can certainly think of an exogenous change in law enforcement expenditures causing a reduction in crime, and we are certainly interested in such thought experiments. If we could do the appropriate experiment, where expenditures are assigned randomly across cities, then we could estimate the crime equation by OLS. (In fact, we could use a simple regression analysis.) The simultaneous equations model recognizes that cities choose law enforcement expenditures in part on what they expect the crime rate to be. An SEM is a convenient way to allow expenditures to depend on unobservables (to the econometrician) that affect crime.

c. No. These are both choice variables of the firm, and the parameters in a two-equation system modeling one in terms of the other, and vice versa, have no economic meaning. If we want to know how a change in the price of foreign technology affects foreign technology (FT) purchases, why would we want to hold fixed R&D spending? Clearly FT purchases and R&D spending are simultaneously chosen, but we should use a SUR model where neither is an explanatory variable in the other's equation.

d. Yes. We can certainly be interested in the causal effect of alcohol consumption on productivity, and therefore wage. One's hourly wage is determined by the demand for skills; alcohol consumption is determined by individual behavior.

e. No. These are choice variables by the same household. It makes no sense to think about how exogenous changes in one would affect the other. Further, suppose that we look at the effects of changes in local property tax rates. We would not want to hold fixed family saving and then measure the effect of changing property taxes on housing expenditures. When the property tax changes, a family will generally adjust expenditure in all categories. A SUR system with property tax as an explanatory variable seems to be the appropriate model.

f. No. These are both chosen by the firm, presumably to maximize profits. It makes no sense to hold advertising expenditures fixed while looking at how other variables affect price markup.

9.3. a. We can apply part b of Problem 9.2. First, the only variable excluded from the support equation is the variable mremarr; since the support equation contains one endogenous variable, this equation is identified if and only if $\delta_{21} \neq 0$. This ensures that there is an exogenous variable shifting the mother's reaction function that does not also shift the father's reaction function. The visits equation is identified if and only if at least one of finc and fremarr actually appears in the support equation; that is, we need $\delta_{11} \neq 0$ or $\delta_{13} \neq 0$.

b. Each equation can be estimated by 2SLS using instruments 1, finc, fremarr, dist, mremarr.

c. First, obtain the reduced form for visits:

$$visits = \pi_{20} + \pi_{21}finc + \pi_{22}fremarr + \pi_{23}dist + \pi_{24}mremarr + v_2.$$

Estimate this equation by OLS, and save the residuals, $\hat{v}_2$. Then, run the OLS regression

support on 1, visits, finc, fremarr, dist, $\hat{v}_2$

and do a (heteroskedasticity-robust) t test that the coefficient on $\hat{v}_2$ is zero. If this test rejects we conclude that visits is in fact endogenous in the support equation.

d. There is one overidentifying restriction in the visits equation, assuming that $\delta_{11}$ and $\delta_{12}$ are both different from zero. Assuming homoskedasticity of $u_2$, the easiest way to test the overidentifying restriction is to first estimate the visits equation by 2SLS, as in part b. Let $\hat{u}_2$ be the 2SLS residuals. Then, run the auxiliary regression

$\hat{u}_2$ on 1, finc, fremarr, dist, mremarr;

the sample size times the usual R-squared from this regression is distributed asymptotically as $\chi^2_1$ under the null hypothesis that all instruments are exogenous.

A heteroskedasticity-robust test is also easy to obtain. Let $\widehat{support}$ denote the fitted values from the reduced form regression for support. Next, regress finc (or fremarr) on $\widehat{support}$, mremarr, dist, and save the residuals, say $\hat{r}_1$. Then, run the simple regression (without intercept) of 1 on $\hat{u}_2\hat{r}_1$; $N - SSR_0$ from this regression is asymptotically $\chi^2_1$ under $H_0$. ($SSR_0$ is just the usual sum of squared residuals.)

9.5. a. Let $\beta_1$ denote the $7 \times 1$ vector of parameters in the first equation with only the normalization restriction imposed: $\beta_1' = (-1,\gamma_{12},\gamma_{13},\delta_{11},\delta_{12},\delta_{13},\delta_{14})$.
and G . the first column of R1B is zero.  This equation can be estimated by 2SLS using instruments (z1.7.z4) + u1.  But  g23 = 0. there are just enough instruments to estimate this equation.  d23 = 0. the order condition is satisfied. + d34 .z2.  Note  that.  Next. Now. It is easy to see how to estimate the first equation under the given assumptions.  Letting B denote the 7  * 3 matrix  of all structural parameters with only the three normalizations. and g32 = 0. d22 =  and so R1B becomes  d32 * 2.z3.  Err.0267 51.95 0.0142782 1.132745 _cons | 2504.educi) 0  ) 2 2.82352 nwifeinc | .89 0.000 1454.396957 7.340 -.135 -463.128 -------------+---------------------------------------------------------------lwage | hours | .46 0.5673 134.6892584 0.2685 -1.0006143 educ | .0002109 0.143 -.53608 0.0208906 .000 831.0000 lwage 428 4 .169 3.009 educ | -205. b.451518 0.7287 62.3678943 3.1032 21.47 3555.95 0. The matrix of instruments for each i is  ( 2z i Zi = 2 0 2 0 9 d.29):  .176 -119.911094 kidslt6 | -200.362 -2.  9.000 .6455 -103.87188 0.261529 -1.63986 35.49 0.137 -28.47351 3.8577 2522.0070942 .000 -306. reg3 (hours lwage educ age kidslt6 kidsge6 nwifeinc) (lwage hours educ exper expersq) Three-stage least squares regression ---------------------------------------------------------------------Equation Obs Parms RMSE "R-sq" chi2 P ---------------------------------------------------------------------hours 428 6 1368.35 0.8919 4. Interval] -------------+---------------------------------------------------------------hours | lwage | 1676.84729 -3.11 0. we should not make any exclusion restrictions in  the reduced form for educ. z P>|z| [95% Conf. c. a.67 0.1145 34.4078 age | -12.0151452 7.9. zi2 0 0  That is. Let z denote all nonredundant exogenous variables in the system.28) and (9. z(3) = z. Here is my Stata output for the 3SLS estimation of (9. Std.59414 kidsge6 | -48.95137 -1.0895 79.49 0.0488753 50  .799 535.933 431.000201 .  
Then  use these as instruments in a 2SLS analysis.0002123 .915 -6.1129699 .0832858 .1426539 exper | .28121 8.46 0.0000 --------------------------------------------------------------------------------------------------------------------------------------------------| Coef.we have at least one such element in z(2) and at least one such element in z(3).  0 (zi.  we just need  d22 $ 0 or d23 $ 0 (or both. f.g12g21). and g^21. whereas using an SEM does. a. e. if the SEM is correctly specified. this is identical to 2SLS since it is just identified.  Given  Or.3045904 -2. We can just estimate the reduced form E(y2|z1.  (Since we are estimating the second  equation by 2SLS.302097 -.13 0.  identified if and only if  The second equation is  d11 $ 0. we could just use 2SLS on  ^  d11. we will generally inconsistently estimate  d11 and g12. we obtain a more efficient 51  Of  .021 -1. for the second equation.0008066 . We can estimate the system by 3SLS. After substitution and straightforward algebra.  b. c. course. of course).z2.g^12g^21). we would form p^11 = d^11/(1 . it can be seen that  p11 = d11/(1 .z3) by ordinary least squares. we will still consistently estimate misspecified this equation.0002943 . Consistency of OLS for  p11 does not hinge on the validity of the  exclusion restrictions in the structural model.31 0.expersq | -.11.  Unfortunately.7051103 .260 -.000218 _cons | -.)  So our estimate of  g21 provided we have not  p11 = dE(y2|z)/dz1 will be  inconsistent in any case. Since z2 and z3 are both omitted from the first equation. Whether we estimate the parameters by 2SLS or 3SLS. ^g12.0002614 -1. each equation.1081241 -----------------------------------------------------------------------------Endogenous variables: hours lwage Exogenous variables: educ age kidslt6 kidsge6 nwifeinc exper expersq ------------------------------------------------------------------------------  b.  9. To be added.  d. 
I know of no econometrics packages that  conveniently allow system estimation using different instruments for different equations. 1936 2 14303.3374871 . a.  (This is  the rank condition.021 -. and only if.  d22 $ 0.015081 0.0968 Residual | 35151.412473 3.68006 148.187 0.194 111 568.4487 0.0000 0.230002  Number of obs F( 2.0845 15.5464812 1.0309 0. Interval] ---------+-------------------------------------------------------------------lpcinc | .4217 113 575.083 -3.796  -----------------------------------------------------------------------------open | Coef. 111) Prob > F R-squared Adj R-squared Root MSE  = = = = = =  (2SLS) 114 2.61387 Residual | 63064.489 -----------------------------------------------------------------------------This shows that log(land) is very statistically significant in the RF for Smaller countries are more open.870989  Number of obs F( 2.9902 113 564.715 -2.17 0.13.342 0.  First.4012 1. Std.4387 17.22775 2 1004. t P>|t| [95% Conf.89934 15.145892 ---------+-----------------------------Total | 65073.49324 0.294 0. The first equation is identified if.79 0. 111) Prob > F R-squared Adj R-squared Root MSE  = = = = = =  114 45.1441212 -2. t P>|t| [95% Conf.3758247 2.  9. Here is my Stata output.estimator of the reduced form parameters by imposing the restrictions in estimating  p11.682852 ---------+-----------------------------Total | 63757. Interval] ---------+-------------------------------------------------------------------open | -. Err.0519014 lpcinc | .836  -----------------------------------------------------------------------------inf | Coef. Here is my Stata output: .505435 lland | -7.41783 52  .6230728 -.  open. Std.388 0.0657 0.) b.000 -9.7966 111 316.953679 _cons | 117.8142162 -9.  c. reg inf open lpcinc (lland lpcinc) Source | SS df MS ---------+-----------------------------Model | 2009.8483 7.000 85.366 0. 2SLS. Err.852 -3.0134 23.617192 4.368841 _cons | 26. then OLS:  .180527 -5.567103 . 
reg open lpcinc lland Source | SS df MS ---------+-----------------------------Model | 28606.747 0.61916 57.   2  A regression of open  Since  is a natural 2  on log(land).025 -.870989  Number of obs F( 3. t P>|t| [95% Conf.40  -----------------------------------------------------------------------------inf | Coef. Interval] ---------+-------------------------------------------------------------------53  .2150695 . Interval] ---------+-------------------------------------------------------------------open | -.96406 Residual | 62127. we need an IV for it.0946289 -2. gen opensq = open^2 . 24. but we will go ahead.651 0. Err. .026122 55. Std.4217 113 575. [log(land)] candidate.4217 113 575.-----------------------------------------------------------------------------.23419 -----------------------------------------------------------------------------The 2SLS estimate is notably larger in magnitude. 110) Prob > F R-squared Adj R-squared Root MSE  = = = = = =  (2SLS) 114 2. d.993 -3.343207 ---------+-----------------------------Total | 65073.0764 0.09 0. t P>|t| [95% Conf. 2  log(land) is partially correlated with open.4936 111 559.402583 -. reg inf open lpcinc Source | SS df MS ---------+-----------------------------Model | 2945.20522 1.009 0.102 -5.92812 2 1472.870989  Number of obs F( 2. it also  You might want to test to see if open is  endogenous.  of about 2. If we add  g13open2 to the equation.658  -----------------------------------------------------------------------------inf | Coef.273 0. gen llandsq = lland^2 . reg inf open opensq lpcinc (lland llandsq lpcinc) Source | SS df MS ---------+-----------------------------Model | -414.  The Stata output for 2SLS is  .0453 0. [log(land)] .975267 0.896555 3. Std.1060 .110342 Residual | 65487.63 0. has a larger standard error.7527 110 595.70715 ---------+-----------------------------Total | 65073.0175683 1. 111) Prob > F R-squared Adj R-squared Root MSE  = = = = = =  114 2.331026 3 -138.027556 lpcinc | .10403 15.  
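The 2SLS estimates in Problem 9.13 come from a first-stage regression of open on the instrument followed by a second stage that uses the fitted values. The sketch below illustrates the mechanics in the simplest possible setting and should be read with several caveats: it uses invented data, one endogenous regressor and one instrument, and it omits the control log(pcinc) that appears in the actual equation. The function names are hypothetical.

```python
# Sketch: with one endogenous regressor and one instrument, the two-step 2SLS
# slope equals the simple IV ratio cov(z, y) / cov(z, x). Data are made up.

def mean(values):
    return sum(values) / len(values)


def simple_ols(x, y):
    """Return (intercept, slope) from a simple regression of y on x."""
    mx, my = mean(x), mean(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return my - slope * mx, slope


def two_stage_slope(z, x, y):
    """First stage: x on z. Second stage: y on the first-stage fitted values."""
    intercept, slope = simple_ols(z, x)
    x_hat = [intercept + slope * z_i for z_i in z]
    return simple_ols(x_hat, y)[1]


def iv_ratio(z, x, y):
    """IV estimator written as a ratio of sample covariances."""
    mz, mx, my = mean(z), mean(x), mean(y)
    cov_zy = sum((a - mz) * (b - my) for a, b in zip(z, y))
    cov_zx = sum((a - mz) * (b - mx) for a, b in zip(z, x))
    return cov_zy / cov_zx


# Hypothetical data: z plays the role of the instrument, x the endogenous regressor.
z = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x = [2.1, 2.9, 4.2, 4.8, 6.1, 7.0]
y = [10.0, 8.5, 7.9, 6.0, 5.2, 3.9]

b_2sls = two_stage_slope(z, x, y)
b_iv = iv_ratio(z, x, y)
print(b_2sls, b_iv)  # identical up to rounding
```

Only the point estimate carries over from the manual two-step procedure. The standard errors printed by a second-stage OLS regression are not valid, because they are based on residuals computed with the fitted values rather than the actual regressor; this is one reason to rely on a built-in 2SLS routine for inference.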
Not surprisingly. and log(pcinc) 2  gives a heteroskedasticity-robust t statistic on [log(land)] This is borderline.931692 _cons | 25.0281 23. Err. 230002  Number of obs F( 2.0075781 . reg inf openh openhsq lpcinc Source | SS df MS -------------+-----------------------------Model | 3743. the estimate would be significant at about the 6.000 85.39 0.68006 148.0968 Residual | 35151.54102 ------------------------------------------------------------------------------  The squared term indicates that the impact of open on inf diminishes.8483 7.180527 -5.18411 3 1247. gen openhsq = openh^2 .593929 4.0000 0.198637 .796  -----------------------------------------------------------------------------open | Coef.2376 110 557.131 -.5066092 2. t P>|t| [95% Conf. reg open lpcinc lland Source | SS df MS -------------+-----------------------------Model | 28606.0022966 .932 0.0575 0.0174527 lpcinc | .8142162 -9.5464812 1.0879 0.0049828 1.0318 23.245 0.505435 lland | -7.36141 2. predict openh (option xb assumed. Std.000 -9.069134 0.4387 17. Err. t P>|t| [95% Conf. Err. fitted values) .547615 -------------+-----------------------------Total | 65073.1936 2 14303. 111) Prob > F R-squared Adj R-squared Root MSE  = = = = = =  114 45.9902 113 564.567103 .6205699 -1.521 0.5% level against a one-sided alternative.056 -2.682852 -------------+-----------------------------Total | 63757. e.953679 _cons | 117.715 -2.870989  Number of obs F( 3. Std.607147 _cons | 43.17124 19.7966 111 316.17 0.028 4.489 -----------------------------------------------------------------------------. Here is the Stata output for implementing the method described in the problem: . Interval] -------------+---------------------------------------------------------------lpcinc | .612  -----------------------------------------------------------------------------inf | Coef.0311868 opensq | . 
110) Prob > F R-squared Adj R-squared Root MSE  = = = = = =  114 2.428461 .0845 15.807 -3.37 0.open | -1.801467 81.72804 Residual | 61330.24 0.230 0.4217 113 575.412473 3. Interval] -------------+---------------------------------------------------------------54  .49324 0.29 0.4487 0. 17831 19. (Actually. If  g13 = 0. b. it is important to allow for these by including separate time intercepts.047 . less robust. Since investment is likely to be affected by macroeconomic factors.1.050927 _cons | 39.112 -1.968493 4.48041 2. Standard investment theories suggest that.  c.openh | -.0178777 lpcinc | .01 0. a. larger marginal tax rates decrease investment. selected by state and local officials.60 0. I would start with a fixed effects analysis to allow arbitrary correlation between all time-varying explanatory variables and ci. at least to a certain extent. could easily be correlated with tax rates because tax rates are.0060502 .5727026 77. the results are similar to the correct IV method from part d. d.12. both  methods are consistent. Putting the unobserved effect ci in the equation is a simple way to account for time-constant features of a county that affect investment and might also be correlated with the tax variable.313 -.02 0.  If only a cross section were available. this is done by using T .78391 ------------------------------------------------------------------------------  Qualitatively.  CHAPTER 10  10.933799 .1 time period dummies.023302 0. which affects investment.0057774 . E(open|lpcinc.5394132 -1.  But the forbidden regression implemented in this part  is uncessary. as shown in Problem 9.984 -3. and we cannot trust the standard errors. ceteris paribus.  This is often a difficult task. we  would have to find an instrument for the tax variable that is uncorrelated with ci and correlated with the tax rate. 55  .8648092 .01 0. anyway.204181 openhsq | .0412172 2.  Something like "average"  county economic climate.0059682 1.lland) is linear and.  
(Actually, doing pooled OLS is a useful initial exercise; these results can be compared with those from an FE analysis.) Such an analysis assumes strict exogeneity of $z_{it}$, $tax_{it}$, and $disaster_{it}$ in the sense that these are uncorrelated with the errors $u_{is}$ for all $t$ and $s$.

I have no strong intuition for the likely serial correlation properties of the $\{u_{it}\}$. These might have little serial correlation because we have allowed for $c_i$, in which case I would use standard fixed effects. However, it seems more likely that the $u_{it}$ are positively autocorrelated, in which case I might use first differencing instead. In either case, I would compute the fully robust standard errors along with the usual ones. Remember, with first-differencing it is easy to test whether the changes $\Delta u_{it}$ are serially uncorrelated.

e. If $tax_{it}$ and $disaster_{it}$ do not have lagged effects on investment, then the only possible violation of the strict exogeneity assumption is if future values of these variables are correlated with $u_{it}$. It is safe to say that this is not a worry for the disaster variable: presumably, future natural disasters are not determined by past investment. On the other hand, state officials might look at the levels of past investment in determining future tax policy, especially if there is a target level of tax revenue the officials are trying to achieve. This could be similar to setting property tax rates: sometimes property tax rates are set depending on recent housing values, since a larger base means a smaller rate can achieve the same amount of revenue. Given that we allow $tax_{it}$ to be correlated with $c_i$, this might not be much of a problem. But it cannot be ruled out ahead of time.
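10.3. a. Let $\bar{x}_i = (x_{i1} + x_{i2})/2$, $\bar{y}_i = (y_{i1} + y_{i2})/2$, $\ddot{x}_{i1} = x_{i1} - \bar{x}_i$, $\ddot{x}_{i2} = x_{i2} - \bar{x}_i$, and similarly for $\ddot{y}_{i1}$ and $\ddot{y}_{i2}$. For $T = 2$ the fixed effects estimator can be written as

$$\hat{\beta}_{FE} = \left(\sum_{i=1}^N (\ddot{x}_{i1}'\ddot{x}_{i1} + \ddot{x}_{i2}'\ddot{x}_{i2})\right)^{-1}\left(\sum_{i=1}^N (\ddot{x}_{i1}'\ddot{y}_{i1} + \ddot{x}_{i2}'\ddot{y}_{i2})\right).$$

Now, by simple algebra,

$$\ddot{x}_{i1} = (x_{i1} - x_{i2})/2 = -\Delta x_i/2, \qquad \ddot{x}_{i2} = (x_{i2} - x_{i1})/2 = \Delta x_i/2,$$
$$\ddot{y}_{i1} = (y_{i1} - y_{i2})/2 = -\Delta y_i/2, \qquad \ddot{y}_{i2} = (y_{i2} - y_{i1})/2 = \Delta y_i/2.$$

Therefore,

$$\ddot{x}_{i1}'\ddot{x}_{i1} + \ddot{x}_{i2}'\ddot{x}_{i2} = \Delta x_i'\Delta x_i/4 + \Delta x_i'\Delta x_i/4 = \Delta x_i'\Delta x_i/2,$$
$$\ddot{x}_{i1}'\ddot{y}_{i1} + \ddot{x}_{i2}'\ddot{y}_{i2} = \Delta x_i'\Delta y_i/4 + \Delta x_i'\Delta y_i/4 = \Delta x_i'\Delta y_i/2,$$

and so

$$\hat{\beta}_{FE} = \left(\sum_{i=1}^N \Delta x_i'\Delta x_i/2\right)^{-1}\left(\sum_{i=1}^N \Delta x_i'\Delta y_i/2\right) = \left(\sum_{i=1}^N \Delta x_i'\Delta x_i\right)^{-1}\left(\sum_{i=1}^N \Delta x_i'\Delta y_i\right) = \hat{\beta}_{FD}.$$

b. Let $\hat{u}_{i1} = \ddot{y}_{i1} - \ddot{x}_{i1}\hat{\beta}_{FE}$ and $\hat{u}_{i2} = \ddot{y}_{i2} - \ddot{x}_{i2}\hat{\beta}_{FE}$ be the fixed effects residuals for the two time periods for cross section observation $i$. Since $\hat{\beta}_{FE} = \hat{\beta}_{FD}$, and using the representations in (4.1'), we have

$$\hat{u}_{i1} = -\Delta y_i/2 - (-\Delta x_i/2)\hat{\beta}_{FD} = -(\Delta y_i - \Delta x_i\hat{\beta}_{FD})/2 \equiv -\hat{e}_i/2,$$
$$\hat{u}_{i2} = \Delta y_i/2 - (\Delta x_i/2)\hat{\beta}_{FD} = (\Delta y_i - \Delta x_i\hat{\beta}_{FD})/2 \equiv \hat{e}_i/2,$$

where $\hat{e}_i \equiv \Delta y_i - \Delta x_i\hat{\beta}_{FD}$ are the first difference residuals, $i = 1,2,\ldots,N$. Therefore,

$$\sum_{i=1}^N (\hat{u}_{i1}^2 + \hat{u}_{i2}^2) = (1/2)\sum_{i=1}^N \hat{e}_i^2.$$

This shows that the sum of squared residuals from the fixed effects regression is exactly one half the sum of squared residuals from the first difference regression. Since we know the variance estimate for fixed effects is the SSR divided by $N - K$ (when $T = 2$), and the variance estimate for first difference is the SSR divided by $N - K$, the error variance from fixed effects is always half the size of the error variance for first difference estimation, that is, $\hat{\sigma}_u^2 = \hat{\sigma}_e^2/2$ (contrary to what the problem asks you to show). What I wanted you to show is that the variance matrix estimates of $\hat{\beta}_{FE}$ and $\hat{\beta}_{FD}$ are identical. This is easy since the variance matrix estimate for fixed effects is

$$\hat{\sigma}_u^2\left(\sum_{i=1}^N (\ddot{x}_{i1}'\ddot{x}_{i1} + \ddot{x}_{i2}'\ddot{x}_{i2})\right)^{-1} = (\hat{\sigma}_e^2/2)\left(\sum_{i=1}^N \Delta x_i'\Delta x_i/2\right)^{-1} = \hat{\sigma}_e^2\left(\sum_{i=1}^N \Delta x_i'\Delta x_i\right)^{-1},$$

which is the variance matrix estimator for first difference. Thus, the standard errors, and in fact all other test statistics (F statistics) will be numerically identical using the two approaches.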
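The algebra in Problem 10.3 can be checked numerically. The following is only a minimal sketch, assuming a single regressor (K = 1), no intercept, and a small made-up two-period panel; none of the numbers come from the text.

```python
# Hypothetical T = 2 panel with one regressor (made-up numbers, K = 1).
x1 = [1.0, 2.0, 0.5, 3.0, 1.5]
x2 = [2.0, 2.5, 1.5, 2.0, 4.0]
y1 = [1.2, 2.9, 0.4, 3.5, 2.0]
y2 = [2.5, 3.1, 1.9, 2.2, 5.1]
N, K = len(x1), 1


def within(a1, a2):
    """Time-demean a two-period series: returns (a1 - abar, a2 - abar)."""
    d1 = [u - (u + v) / 2 for u, v in zip(a1, a2)]
    d2 = [v - (u + v) / 2 for u, v in zip(a1, a2)]
    return d1, d2


xd1, xd2 = within(x1, x2)
yd1, yd2 = within(y1, y2)

# Fixed effects: pooled OLS (no intercept) on the time-demeaned data.
sxx_fe = sum(a * a for a in xd1) + sum(a * a for a in xd2)
sxy_fe = sum(a * b for a, b in zip(xd1, yd1)) + sum(a * b for a, b in zip(xd2, yd2))
b_fe = sxy_fe / sxx_fe

# First differencing: OLS (no intercept) of dy on dx.
dx = [b - a for a, b in zip(x1, x2)]
dy = [b - a for a, b in zip(y1, y2)]
sxx_fd = sum(a * a for a in dx)
b_fd = sum(a * b for a, b in zip(dx, dy)) / sxx_fd

# Sums of squared residuals and the usual variance estimates.
ssr_fe = sum((yd1[i] - xd1[i] * b_fe) ** 2 + (yd2[i] - xd2[i] * b_fe) ** 2
             for i in range(N))
ssr_fd = sum((dy[i] - dx[i] * b_fd) ** 2 for i in range(N))
var_fe = (ssr_fe / (N - K)) / sxx_fe
var_fd = (ssr_fd / (N - K)) / sxx_fd

print(b_fe, b_fd, ssr_fe, ssr_fd / 2, var_fe, var_fd)
```

Under these assumptions the two slope estimates agree, the FE sum of squared residuals is half the FD one, and the two variance estimates coincide, which is the pattern derived in parts a and b.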
10.5. a. Write $v_iv_i' = c_i^2 j_Tj_T' + u_iu_i' + j_T(c_iu_i') + (c_iu_i)j_T'$. Under RE.1, $E(u_i \mid x_i, c_i) = 0$, which implies that $E[(c_iu_i') \mid x_i] = 0$ by iterated expectations. Under RE.3a, $E(u_iu_i' \mid x_i, c_i) = \sigma_u^2 I_T$, which implies that $E(u_iu_i' \mid x_i) = \sigma_u^2 I_T$ (again, by iterated expectations). Therefore,

$$E(v_iv_i' \mid x_i) = E(c_i^2 \mid x_i)j_Tj_T' + E(u_iu_i' \mid x_i) = h(x_i)j_Tj_T' + \sigma_u^2 I_T,$$

where $h(x_i) \equiv \mathrm{Var}(c_i \mid x_i) = E(c_i^2 \mid x_i)$ (by RE.1b). This shows that the conditional variance matrix of $v_i$ given $x_i$ has the same covariance for all $t \neq s$, $h(x_i)$, and the same variance for all $t$, $h(x_i) + \sigma_u^2$. Therefore, while the variances and covariances depend on $x_i$ in general, they do not depend on time separately.

b. The RE estimator is still consistent and $\sqrt{N}$-asymptotically normal without assumption RE.3b, but the usual random effects variance estimator of $\hat{\beta}_{RE}$ is no longer valid because $E(v_iv_i' \mid x_i)$ does not have the form (10.30) (because it depends on $x_i$). The robust variance matrix estimator given in (7.49) should be used in obtaining standard errors or Wald statistics.

10.7. I provide annotated Stata output, and I compute the nonrobust regression-based statistic from equation (11.79):

. * random effects estimation
. iis id
. tis term
. xtreg trmgpa spring crsgpa frstsem season sat verbmath hsperc hssize black female, re

Random-effects GLS regression
sd(u_id)             =  .3718544                Number of obs =    732
sd(e_id_t)           =  .4088283                            n =    366
sd(e_id_t + u_id)    =  .5526448                            T =      2

corr(u_id, X)        =  0 (assumed)               R-sq within = 0.2067
                                                      between = 0.5390
                                                      overall = 0.4785

                                                    chi2( 10) = 512.77
(theta = 0.3862)                                  Prob > chi2 = 0.0000

------------------------------------------------------------------------------
  trmgpa |      Coef.   Std. Err.       z     P>|z|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
  spring |  -.0606536   .0371605     -1.632   0.103      -.1334868    .0121797
  crsgpa |   1.082365   .0930877     11.627   0.000       .8999166    1.264814
 frstsem |   .0029948   .0599542      0.050   0.960      -.1145132    .1205028
  season |  -.0440992   .0392381     -1.124   0.261      -.1210044    .0328061
     sat |   .0017052   .0001771      9.630   0.000       .0013582    .0020523
verbmath |  -.1575199     .16351     -0.963   0.335      -.4779937    .1629538
  hsperc |  -.0084622   .0012426     -6.810   0.000      -.0108977   -.0060268
  hssize |  -.0000775   .0001248     -0.621   0.534       -.000322     .000167
   black |  -.2348189   .0681573     -3.445   0.001      -.3684048   -.1012331
  female |   .3581529   .0612948      5.843   0.000       .2380173    .4782886
   _cons |   -1.73492   .3566599     -4.864   0.000       -2.43396   -1.035879
------------------------------------------------------------------------------
. * fixed effects estimation, with time-varying variables only.
. xtreg trmgpa spring crsgpa frstsem season, fe

Fixed-effects (within) regression
sd(u_id)             =   .679133                Number of obs =    732
sd(e_id_t)           =  .4088283                            n =    366
sd(e_id_t + u_id)    =   .792693                            T =      2

corr(u_id, Xb)       =   -0.0893                  R-sq within = 0.2069
                                                      between = 0.0333
                                                      overall = 0.0613

                                                F(  4,   362) =  23.61
                                                     Prob > F = 0.0000

------------------------------------------------------------------------------
  trmgpa |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
  spring |  -.0657817   .0391404     -1.681   0.094      -.1427528    .0111895
  crsgpa |   1.140688   .1186538      9.614   0.000       .9073506    1.374025
 frstsem |   .0128523   .0688364      0.187   0.852      -.1225172    .1482218
  season |  -.0566454   .0414748     -1.366   0.173      -.1382072    .0249165
   _cons |  -.7708056   .3305004     -2.332   0.020      -1.420747   -.1208637
------------------------------------------------------------------------------
      id |        F(365,362) =    5.399   0.000           (366 categories)

. * Obtaining the regression-based Hausman test is a bit tedious.  First,
. * compute the time-averages for all of the time-varying variables:
. egen atrmgpa = mean(trmgpa), by(id)
. egen aspring = mean(spring), by(id)
. egen acrsgpa = mean(crsgpa), by(id)
. egen afrstsem = mean(frstsem), by(id)
. egen aseason = mean(season), by(id)
. * Now obtain GLS transformations for both time-constant and
. * time-varying variables.  Note that lamdahat = .386.
. di 1 - .386
.614
. gen bone = .614
. gen bsat = .614*sat
. gen bvrbmth = .614*verbmath
. gen bhsperc = .614*hsperc
. gen bhssize = .614*hssize
. gen bblack = .614*black
. gen bfemale = .614*female
. gen btrmgpa = trmgpa - .386*atrmgpa
. gen bspring = spring - .386*aspring
. gen bcrsgpa = crsgpa - .386*acrsgpa
. gen bfrstsem = frstsem - .386*afrstsem
. gen bseason = season - .386*aseason
. * Check to make sure that pooled OLS on transformed data is random
. * effects.
. reg btrmgpa bone bspring bcrsgpa bfrstsem bseason bsat bvrbmth bhsperc bhssize bblack bfemale, nocons

  Source |       SS       df       MS                  Number of obs =     732
---------+------------------------------               F( 11,   721) =  862.67
   Model |  1584.10163    11  144.009239               Prob > F      =  0.0000
Residual |  120.359125   721    .1669336               R-squared     =  0.9294
---------+------------------------------               Adj R-squared =  0.9283
   Total |  1704.46076   732   2.3284983               Root MSE      =  .40858

------------------------------------------------------------------------------
 btrmgpa |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
    bone |  -1.734843   .3566396     -4.864   0.000      -2.435019   -1.034666
 bspring |   -.060651   .0371666     -1.632   0.103      -.1336187    .0123167
 bcrsgpa |   1.082336   .0930923     11.626   0.000       .8995719    1.265101
bfrstsem |   .0029868   .0599604      0.050   0.960       -.114731    .1207046
 bseason |  -.0440905   .0392441     -1.123   0.262      -.1211368    .0329558
    bsat |   .0017052    .000177      9.632   0.000       .0013577    .0020528
 bvrbmth |  -.1575166   .1634784     -0.964   0.336      -.4784672     .163434
 bhsperc |  -.0084622   .0012424     -6.811   0.000      -.0109013   -.0060231
 bhssize |  -.0000775   .0001247     -0.621   0.535      -.0003224    .0001674
  bblack |  -.2348204   .0681441     -3.446   0.001      -.3686049   -.1010359
 bfemale |   .3581524   .0612839      5.844   0.000       .2378363    .4784686
------------------------------------------------------------------------------

. * These are the RE estimates, subject to rounding error.
. * Now add the time averages of the variables that change across i and t
. * to perform the Hausman test:
. reg btrmgpa bone bspring bcrsgpa bfrstsem bseason bsat bvrbmth bhsperc bhssize bblack bfemale acrsgpa afrstsem aseason, nocons

  Source |       SS       df       MS                  Number of obs =     732
---------+------------------------------               F( 14,   718) =  676.85
   Model |  1584.40773    14  113.171981               Prob > F      =  0.0000
Residual |  120.053023   718  .167204767               R-squared     =  0.9296
---------+------------------------------               Adj R-squared =  0.9282
   Total |  1704.46076   732   2.3284983               Root MSE      =  .40891

------------------------------------------------------------------------------
 btrmgpa |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
    bone |  -1.423761   .5182286     -2.747   0.006      -2.441186   -.4063367
 bspring |  -.0657817   .0391479     -1.680   0.093      -.1426398    .0110764
 bcrsgpa |   1.140688   .1186766      9.612   0.000       .9076934    1.373683
bfrstsem |   .0128523   .0688496      0.187   0.852      -.1223184     .148023
 bseason |  -.0566454   .0414828     -1.366   0.173      -.1380874    .0247967
    bsat |   .0016681   .0001804      9.247   0.000        .001314    .0020223
 bvrbmth |  -.1316462   .1654425     -0.796   0.426      -.4564551    .1931626
 bhsperc |  -.0084655   .0012551     -6.745   0.000      -.0109296   -.0060013
 bhssize |  -.0000783   .0001249     -0.627   0.531      -.0003236     .000167
  bblack |  -.2447934   .0685972     -3.569   0.000      -.3794684   -.1101184
 bfemale |   .3357016   .0711669      4.717   0.000       .1959815    .4754216
 acrsgpa |  -.1142992   .1234835     -0.926   0.355      -.3567312    .1281327
afrstsem |  -.0480418   .0896965     -0.536   0.592      -.2241405    .1280569
 aseason |   .0763206   .0794119      0.961   0.337      -.0795867    .2322278
------------------------------------------------------------------------------

. test acrsgpa afrstsem aseason

 ( 1)  acrsgpa = 0.0
 ( 2)  afrstsem = 0.0
 ( 3)  aseason = 0.0

       F(  3,   718) =    0.61
            Prob > F =    0.6085

. * Thus, we fail to reject the random effects assumptions even at very large
. * significance levels.

For comparison, the usual form of the Hausman test, which includes spring among the coefficients tested, gives p-value = .770, based on a $\chi^2_4$ distribution (using Stata 7.0). It would have been easy to make the regression-based test robust to any violation of RE.3: add ", robust cluster(id)" to the regression command.
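The value lamdahat = .386 used in the transformations of Problem 10.7 can be recovered from the variance components that xtreg reports. The sketch below assumes the standard balanced-panel formula for the quasi-time-demeaning parameter, $1 - [\sigma_u^2/(\sigma_u^2 + T\sigma_c^2)]^{1/2}$; the function name `re_theta` is only illustrative, and the two standard deviations are the sd(u_id) and sd(e_id_t) values from the random effects output.

```python
import math


def re_theta(sd_unit, sd_idio, T):
    """Quasi-demeaning parameter for a balanced panel with T periods.

    sd_unit is the standard deviation of the unit effect (sd(u_id) in the
    Stata output) and sd_idio that of the idiosyncratic error (sd(e_id_t)).
    """
    return 1.0 - math.sqrt(sd_idio ** 2 / (sd_idio ** 2 + T * sd_unit ** 2))


theta = re_theta(0.3718544, 0.4088283, 2)
one_minus_theta = 1.0 - round(theta, 3)  # multiplier for time-constant variables
print(round(theta, 4), round(one_minus_theta, 3))
```

With these inputs the result agrees with the reported theta of 0.3862 up to rounding, and one minus the rounded value gives the .614 multiplier applied to the time-constant variables.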
10.9. a. The Stata output follows. The simplest way to compute a Hausman test is to just add the time averages of all explanatory variables, excluding the dummy variables, and estimating the equation by random effects. I should have done a better job of spelling this out in the text. In other words, write

$$y_{it} = x_{it}\beta + \bar{w}_i\xi + r_{it}, \quad t = 1,\ldots,T,$$

where $x_{it}$ includes an overall intercept along with time dummies, as well as $w_{it}$, the covariates that change across $i$ and $t$. We can estimate this equation by random effects and test $H_0\colon \xi = 0$. The actual calculation for this example is to be added.

Parts b, c, and d: To be added.

10.11. To be added.

10.13. The short answer is: Yes, we can justify this procedure with fixed $T$ as $N \to \infty$. In particular, it produces a $\sqrt{N}$-consistent, asymptotically normal estimator of $\beta$. Therefore, "fixed effects weighted least squares," where the weights are known functions of exogenous variables (including $x_i$ and possible other covariates that do not appear in the conditional mean), is another case where "estimating" the fixed effects leads to an estimator of $\beta$ with good properties. (As usual with fixed $T$, there is no sense in which we can estimate the $c_i$ consistently.) Verifying this claim takes much more work, but it is mostly just algebra.

First, in the sum of squared residuals, we can "concentrate" the $a_i$ out by finding $\hat{a}_i(b)$ as a function of $(x_i, y_i)$ and $b$, substituting back into the sum of squared residuals, and then minimizing with respect to $b$ only. Straightforward algebra gives the first order conditions for each $i$ as

$$\sum_{t=1}^T (y_{it} - \hat{a}_i - x_{it}b)/h_{it} = 0,$$

which gives

$$\hat{a}_i(b) = w_i\left(\sum_{t=1}^T y_{it}/h_{it}\right) - w_i\left(\sum_{t=1}^T x_{it}/h_{it}\right)b \equiv \bar{y}_i^w - \bar{x}_i^w b,$$

where $w_i \equiv 1/\left[\sum_{t=1}^T (1/h_{it})\right] > 0$ and $\bar{y}_i^w \equiv w_i\sum_{t=1}^T y_{it}/h_{it}$, and a similar definition holds for $\bar{x}_i^w$. Note that $\bar{y}_i^w$ and $\bar{x}_i^w$ are simply weighted averages. If $h_{it}$ equals the same constant for all $t$, $\bar{y}_i^w$ and $\bar{x}_i^w$ are the usual time averages.

Now we can plug each $\hat{a}_i(b)$ into the SSR to get the problem solved by $\hat{\beta}$:

$$\min_{b \in \mathbb{R}^K} \sum_{i=1}^N\sum_{t=1}^T [(y_{it} - \bar{y}_i^w) - (x_{it} - \bar{x}_i^w)b]^2/h_{it}.$$

But this is just a pooled weighted least squares regression of $(y_{it} - \bar{y}_i^w)$ on $(x_{it} - \bar{x}_i^w)$ with weights $1/h_{it}$. Equivalently, define $\tilde{y}_{it} \equiv (y_{it} - \bar{y}_i^w)/\sqrt{h_{it}}$ and $\tilde{x}_{it} \equiv (x_{it} - \bar{x}_i^w)/\sqrt{h_{it}}$, all $t = 1,\ldots,T$, $i = 1,\ldots,N$. Then $\hat{\beta}$ can be expressed in usual pooled OLS form:

$$\hat{\beta} = \left(\sum_{i=1}^N\sum_{t=1}^T \tilde{x}_{it}'\tilde{x}_{it}\right)^{-1}\left(\sum_{i=1}^N\sum_{t=1}^T \tilde{x}_{it}'\tilde{y}_{it}\right). \tag{10.82}$$

Note carefully how the initial $y_{it}$ are weighted by $1/h_{it}$ to obtain $\bar{y}_i^w$, but where the usual $1/\sqrt{h_{it}}$ weighting shows up in the sum of squared residuals on the time-demeaned data (where the demeaning is a weighted average). Given (10.82), we can study the asymptotic ($N \to \infty$) properties of $\hat{\beta}$. First, it is easy to show that $\bar{y}_i^w = \bar{x}_i^w\beta + c_i + \bar{u}_i^w$, where $\bar{u}_i^w \equiv w_i\sum_{t=1}^T u_{it}/h_{it}$. Subtracting this equation from $y_{it} = x_{it}\beta + c_i + u_{it}$ for all $t$ gives $\tilde{y}_{it} = \tilde{x}_{it}\beta + \tilde{u}_{it}$, where $\tilde{u}_{it} \equiv (u_{it} - \bar{u}_i^w)/\sqrt{h_{it}}$. When we plug this in for $\tilde{y}_{it}$ in (10.82) and divide by $N$ in the appropriate places we get

$$\hat{\beta} = \beta + \left(N^{-1}\sum_{i=1}^N\sum_{t=1}^T \tilde{x}_{it}'\tilde{x}_{it}\right)^{-1}\left(N^{-1}\sum_{i=1}^N\sum_{t=1}^T \tilde{x}_{it}'\tilde{u}_{it}\right).$$

Straightforward algebra shows that $\sum_{t=1}^T \tilde{x}_{it}'\tilde{u}_{it} = \sum_{t=1}^T \tilde{x}_{it}'u_{it}/\sqrt{h_{it}}$, $i = 1,\ldots,N$, and so we have the convenient expression

$$\hat{\beta} = \beta + \left(N^{-1}\sum_{i=1}^N\sum_{t=1}^T \tilde{x}_{it}'\tilde{x}_{it}\right)^{-1}\left(N^{-1}\sum_{i=1}^N\sum_{t=1}^T \tilde{x}_{it}'u_{it}/\sqrt{h_{it}}\right). \tag{10.83}$$

From (10.83) we can immediately read off the consistency of $\hat{\beta}$. Why? We assumed that $E(u_{it} \mid x_i, h_i, c_i) = 0$, which means $u_{it}$ is uncorrelated with any function of $(x_i, h_i)$, including $\tilde{x}_{it}$. So $E(\tilde{x}_{it}'u_{it}) = 0$, $t = 1,\ldots,T$. As long as we assume $\mathrm{rank}\left(\sum_{t=1}^T E(\tilde{x}_{it}'\tilde{x}_{it})\right) = K$, we can use the usual proof to show $\mathrm{plim}(\hat{\beta}) = \beta$. (We can even show that $E(\hat{\beta} \mid X, H) = \beta$.)

It is also clear from (10.83) that $\hat{\beta}$ is $\sqrt{N}$-asymptotically normal under mild assumptions. The asymptotic variance is generally

$$\mathrm{Avar}\,\sqrt{N}(\hat{\beta} - \beta) = A^{-1}BA^{-1},$$

where $A \equiv \sum_{t=1}^T E(\tilde{x}_{it}'\tilde{x}_{it})$ and $B \equiv \mathrm{Var}\left(\sum_{t=1}^T \tilde{x}_{it}'u_{it}/\sqrt{h_{it}}\right)$. If we assume that $\mathrm{Cov}(u_{it}, u_{is} \mid x_i, h_i, c_i) = 0$, $t \neq s$, in addition to the variance assumption $\mathrm{Var}(u_{it} \mid x_i, h_i, c_i) = \sigma_u^2 h_{it}$, then it is easily shown that $B = \sigma_u^2 A$, and so $\mathrm{Avar}\,\sqrt{N}(\hat{\beta} - \beta) = \sigma_u^2 A^{-1}$.

The same subtleties that arise in estimating $\sigma_u^2$ for the usual fixed effects estimator crop up here as well. Assume the zero conditional covariance assumption and correct variance specification in the previous paragraph. Then, note that the residuals from the pooled OLS regression

$$\tilde{y}_{it} \text{ on } \tilde{x}_{it}, \quad t = 1,\ldots,T,\ i = 1,\ldots,N, \tag{10.84}$$

say $\hat{r}_{it}$, are estimating $\tilde{u}_{it} = (u_{it} - \bar{u}_i^w)/\sqrt{h_{it}}$ (in the sense that we obtain $\hat{r}_{it}$ from $\tilde{u}_{it}$ by replacing $\beta$ with $\hat{\beta}$). Now

$$E(\tilde{u}_{it}^2) = E[(u_{it}^2/h_{it})] - 2E[(u_{it}\bar{u}_i^w)/h_{it}] + E[(\bar{u}_i^w)^2/h_{it}] = \sigma_u^2 - 2\sigma_u^2E[(w_i/h_{it})] + \sigma_u^2E[(w_i/h_{it})],$$

where the law of iterated expectations is applied several times, and $E[(\bar{u}_i^w)^2 \mid x_i, h_i] = \sigma_u^2 w_i$ has been used. Therefore, $E(\tilde{u}_{it}^2) = \sigma_u^2[1 - E(w_i/h_{it})]$, $t = 1,\ldots,T$, and so

$$\sum_{t=1}^T E(\tilde{u}_{it}^2) = \sigma_u^2\left\{T - E\left[w_i\cdot\sum_{t=1}^T (1/h_{it})\right]\right\} = \sigma_u^2(T - 1).$$

This contains the usual result for the within transformation as a special case.
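The weighted demeaning used in Problem 10.13 is easy to misapply, so here is a small illustrative sketch for a single cross section unit. The function name `fewls_transform` and the numbers are hypothetical; the code only mirrors the definitions of $w_i$, $\bar{y}_i^w$ and $\tilde{y}_{it}$ given in the solution.

```python
import math


def fewls_transform(series, h):
    """Weighted time-demeaning for one unit.

    series: values y_i1, ..., y_iT (or one column of x_it)
    h:      known positive weights h_i1, ..., h_iT
    Returns the transformed values (y_it - ybar_w) / sqrt(h_it) and w_i.
    """
    w = 1.0 / sum(1.0 / ht for ht in h)
    wmean = w * sum(s / ht for s, ht in zip(series, h))
    transformed = [(s - wmean) / math.sqrt(ht) for s, ht in zip(series, h)]
    return transformed, w


y = [2.0, 3.5, 1.0, 4.0]   # made-up outcomes for one unit
h = [1.0, 2.0, 0.5, 4.0]   # made-up variance weights
y_tilde, w = fewls_transform(y, h)

# w_i * sum_t (1/h_it) equals one, which is why the sum over t of
# E(u_tilde_it^2) collapses to sigma_u^2 * (T - 1).
weight_sum = w * sum(1.0 / ht for ht in h)
dof_factor = len(h) - weight_sum

# With a constant h the transformation reduces to ordinary time-demeaning.
plain, _ = fewls_transform([1.0, 2.0, 6.0], [1.0, 1.0, 1.0])
print(y_tilde, weight_sum, dof_factor, plain)
```

The first order condition for $\hat{a}_i(b)$ shows up here as the fact that the transformed values, each divided once more by $\sqrt{h_{it}}$, sum to zero within the unit.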
A consistent estimator of $\sigma_u^2$ is $\mathrm{SSR}/[N(T - 1) - K]$, where SSR is the usual sum of squared residuals from (10.84), and the subtraction of $K$ is optional. The estimator of $\mathrm{Avar}(\hat{\beta})$ is then

$$\hat{\sigma}_u^2\left(\sum_{i=1}^N\sum_{t=1}^T \tilde{x}_{it}'\tilde{x}_{it}\right)^{-1}.$$

If we want to allow serial correlation in the $\{u_{it}\}$, or allow $\mathrm{Var}(u_{it} \mid x_i, h_i, c_i) \neq \sigma_u^2 h_{it}$, then we can just apply the robust formula for the pooled OLS regression (10.84).

CHAPTER 11

11.1. a. It is important to remember that, any time we put a variable in a regression model (whether we are using cross section or panel data), we are controlling for the effects of that variable on the dependent variable. The whole point of regression analysis is that it allows the explanatory variables to be correlated while estimating ceteris paribus effects. Thus, the inclusion of $y_{i,t-1}$ in the equation allows $prog_{it}$ to be correlated with $y_{i,t-1}$, and also recognizes that, due to inertia, $y_{it}$ is often strongly related to $y_{i,t-1}$.

An assumption that implies pooled OLS is consistent is

$$E(u_{it} \mid z_i, x_{it}, y_{i,t-1}, prog_{it}) = 0, \quad \text{all } t,$$

which is implied by but is weaker than dynamic completeness. Without additional assumptions, the pooled OLS standard errors and test statistics need to be adjusted for heteroskedasticity and serial correlation (although the latter will not be present under dynamic completeness).

b. As we discussed in Section 7.8.2, this statement is incorrect. Provided our interest is in $E(y_{it} \mid z_i, x_{it}, y_{i,t-1}, prog_{it})$, we do not care about serial correlation in the implied errors, nor does serial correlation cause inconsistency in the OLS estimators.

c. Such a model is the standard unobserved effects model:

$$y_{it} = x_{it}\beta + \delta_1 prog_{it} + c_i + u_{it}, \quad t = 1,2,\ldots,T.$$

We would probably assume that $(x_{it}, prog_{it})$ is strictly exogenous; the weakest form of strict exogeneity is that $(x_{it}, prog_{it})$ is uncorrelated with $u_{is}$ for all $t$ and $s$. Then we could estimate the equation by fixed effects or first differencing.
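If the $u_{it}$ are serially uncorrelated, FE is preferred. We could also do a GLS analysis after the fixed effects or first-differencing transformations, but we should have a large $N$.

d. A model that incorporates features from parts a and c is

$$y_{it} = x_{it}\beta + \delta_1 prog_{it} + \rho_1 y_{i,t-1} + c_i + u_{it}, \quad t = 1,\ldots,T.$$

Now, program participation can depend on unobserved city heterogeneity as well as on lagged $y_{it}$ (we assume that $y_{i0}$ is observed). Fixed effects and first-differencing are both inconsistent as $N \to \infty$ with fixed $T$. Assuming that $E(u_{it} \mid x_i, prog_i, y_{i,t-1}, y_{i,t-2}, \ldots, y_{i0}) = 0$, a consistent procedure is obtained by first differencing, to get

$$\Delta y_{it} = \Delta x_{it}\beta + \delta_1\Delta prog_{it} + \rho_1\Delta y_{i,t-1} + \Delta u_{it}, \quad t = 2,\ldots,T.$$

At time $t$, $\Delta x_{it}$ and $\Delta prog_{it}$ can be used as their own instruments, along with $y_{i,t-j}$ for $j \geq 2$. Either pooled 2SLS or a GMM procedure can be used. Under strict exogeneity, past and future values of $x_{it}$ can also be used as instruments.

11.3. Writing $y_{it} = \beta x_{it} + c_i + u_{it} - \beta r_{it}$, the fixed effects estimator $\hat{\beta}_{FE}$ can be written as

$$\beta + \left(N^{-1}\sum_{i=1}^N\sum_{t=1}^T (x_{it} - \bar{x}_i)^2\right)^{-1}\left(N^{-1}\sum_{i=1}^N\sum_{t=1}^T (x_{it} - \bar{x}_i)(u_{it} - \bar{u}_i - \beta(r_{it} - \bar{r}_i))\right).$$

Now, $x_{it} - \bar{x}_i = (x_{it}^* - \bar{x}_i^*) + (r_{it} - \bar{r}_i)$. Then, because $E(r_{it} \mid x_i^*, c_i) = 0$ for all $t$, $(x_{it}^* - \bar{x}_i^*)$ and $(r_{it} - \bar{r}_i)$ are uncorrelated, and so

$$\mathrm{Var}(x_{it} - \bar{x}_i) = \mathrm{Var}(x_{it}^* - \bar{x}_i^*) + \mathrm{Var}(r_{it} - \bar{r}_i), \quad \text{all } t.$$

Similarly, under (11.30), $(x_{it} - \bar{x}_i)$ and $(u_{it} - \bar{u}_i)$ are uncorrelated for all $t$. Now

$$E[(x_{it} - \bar{x}_i)(r_{it} - \bar{r}_i)] = E[\{(x_{it}^* - \bar{x}_i^*) + (r_{it} - \bar{r}_i)\}(r_{it} - \bar{r}_i)] = \mathrm{Var}(r_{it} - \bar{r}_i).$$

By the law of large numbers and the assumption of constant variances across $t$,

$$N^{-1}\sum_{i=1}^N\sum_{t=1}^T (x_{it} - \bar{x}_i)^2 \xrightarrow{p} \sum_{t=1}^T \mathrm{Var}(x_{it} - \bar{x}_i) = T[\mathrm{Var}(x_{it}^* - \bar{x}_i^*) + \mathrm{Var}(r_{it} - \bar{r}_i)]$$

and

$$N^{-1}\sum_{i=1}^N\sum_{t=1}^T (x_{it} - \bar{x}_i)(u_{it} - \bar{u}_i - \beta(r_{it} - \bar{r}_i)) \xrightarrow{p} -T\beta\,\mathrm{Var}(r_{it} - \bar{r}_i).$$

Therefore,

$$\mathrm{plim}\ \hat{\beta}_{FE} = \beta - \beta\left(\frac{\mathrm{Var}(r_{it} - \bar{r}_i)}{\mathrm{Var}(x_{it}^* - \bar{x}_i^*) + \mathrm{Var}(r_{it} - \bar{r}_i)}\right) = \beta\left(1 - \frac{\mathrm{Var}(r_{it} - \bar{r}_i)}{\mathrm{Var}(x_{it}^* - \bar{x}_i^*) + \mathrm{Var}(r_{it} - \bar{r}_i)}\right).$$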
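To get a feel for the attenuation result in Problem 11.3, the probability limit can be evaluated for a few variance combinations. This is only an illustrative sketch: the function name `plim_fe` and the numbers are made up, and the two variance arguments stand for $\mathrm{Var}(x_{it}^* - \bar{x}_i^*)$ and $\mathrm{Var}(r_{it} - \bar{r}_i)$.

```python
def plim_fe(beta, var_xstar_dm, var_r_dm):
    """Probability limit of the FE estimator under classical measurement error.

    var_xstar_dm: variance of the time-demeaned true regressor
    var_r_dm:     variance of the time-demeaned measurement error
    """
    return beta * (1.0 - var_r_dm / (var_xstar_dm + var_r_dm))


# Made-up values: the measurement error carries one quarter of the
# within variation, so the estimator converges to 75% of beta.
attenuated = plim_fe(1.0, 3.0, 1.0)

# No measurement error: no attenuation.
clean = plim_fe(2.0, 1.0, 0.0)
print(attenuated, clean)
```

The sketch makes the qualitative point of the formula visible: the smaller the within variation in the true regressor relative to that in the measurement error, the closer the probability limit is to zero.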
11.5. a. $E(v_i \mid z_i, x_i) = Z_i[E(a_i \mid z_i, x_i) - \alpha] + E(u_i \mid z_i, x_i) = Z_i(\alpha - \alpha) + 0 = 0$. Next, $\mathrm{Var}(v_i \mid z_i, x_i) = Z_i\mathrm{Var}(a_i \mid z_i, x_i)Z_i' + \mathrm{Var}(u_i \mid z_i, x_i) + \mathrm{Cov}(a_i, u_i \mid z_i, x_i) + \mathrm{Cov}(u_i, a_i \mid z_i, x_i) = Z_i\mathrm{Var}(a_i \mid z_i, x_i)Z_i' + \mathrm{Var}(u_i \mid z_i, x_i)$ because $a_i$ and $u_i$ are uncorrelated, conditional on $(z_i, x_i)$, by FE.1' and the usual iterated expectations argument. Therefore, $\mathrm{Var}(v_i \mid z_i, x_i) = Z_i\Lambda Z_i' + \sigma_u^2 I_T$ under the assumptions given, which shows that the conditional variance depends on $z_i$. Unlike in the standard random effects model, there is conditional heteroskedasticity.

b. If we use the usual RE analysis, we are applying FGLS to the equation $y_i = Z_i\alpha + X_i\beta + v_i$, where $v_i = Z_i(a_i - \alpha) + u_i$. From part a, we know that $E(v_i \mid x_i, z_i) = 0$, and so the usual RE estimator is consistent (as $N \to \infty$ for fixed $T$) and $\sqrt{N}$-asymptotically normal, provided the rank condition, Assumption RE.2, holds. (Remember, a feasible GLS analysis with any $\hat{\Omega}$ will be consistent provided $\hat{\Omega}$ converges in probability to a nonsingular matrix as $N \to \infty$. It need not be the case that $\mathrm{Var}(v_i \mid x_i, z_i) = \mathrm{plim}(\hat{\Omega})$, or even that $\mathrm{Var}(v_i) = \mathrm{plim}(\hat{\Omega})$.) From part a, we know that $\mathrm{Var}(v_i \mid x_i, z_i)$ depends on $z_i$ unless we restrict almost all elements of $\Lambda$ to be zero (all but those corresponding to the constant in $z_{it}$). Therefore, the usual random effects inference -- that is, based on the usual RE variance matrix estimator -- will be invalid.

c. We can easily make the RE analysis fully robust to an arbitrary $\mathrm{Var}(v_i \mid x_i, z_i)$, as in equation (7.49). Naturally, we expand the set of explanatory variables to $(z_{it}, x_{it})$, and we estimate $\alpha$ along with $\beta$.

11.7. When $\lambda_t = \lambda/T$ for all $t$, we can rearrange (11.60) to get

$$y_{it} = x_{it}\beta + \bar{x}_i\lambda + v_{it}, \quad t = 1,2,\ldots,T.$$

Let $\hat{\beta}$ (along with $\hat{\lambda}$) denote the pooled OLS estimator from this equation. By standard results on partitioned regression [for example, Davidson and MacKinnon (1993, Section 1.4)], $\hat{\beta}$ can be obtained by the following two-step procedure:

(i) Regress $x_{it}$ on $\bar{x}_i$ across all $t$ and $i$, and save the $1 \times K$ vectors of residuals, say $\hat{r}_{it}$, $t = 1,\ldots,T$, $i = 1,\ldots,N$.

(ii) Regress $y_{it}$ on $\hat{r}_{it}$ across all $t$ and $i$. The OLS vector on $\hat{r}_{it}$ is $\hat{\beta}$.

We want to show that $\hat{\beta}$ is the FE estimator. Given that the FE estimator can be obtained by pooled OLS of $y_{it}$ on $(x_{it} - \bar{x}_i)$, it suffices to show that $\hat{r}_{it} = x_{it} - \bar{x}_i$ for all $t$ and $i$. But

$$\hat{r}_{it} = x_{it} - \bar{x}_i\left(\sum_{i=1}^N\sum_{t=1}^T \bar{x}_i'\bar{x}_i\right)^{-1}\left(\sum_{i=1}^N\sum_{t=1}^T \bar{x}_i'x_{it}\right)$$

and

$$\sum_{i=1}^N\sum_{t=1}^T \bar{x}_i'x_{it} = \sum_{i=1}^N \bar{x}_i'\sum_{t=1}^T x_{it} = \sum_{i=1}^N T\bar{x}_i'\bar{x}_i = \sum_{i=1}^N\sum_{t=1}^T \bar{x}_i'\bar{x}_i,$$

and so $\hat{r}_{it} = x_{it} - \bar{x}_i I_K = x_{it} - \bar{x}_i$. This completes the proof.

11.9. a. We can apply Problem 8.8.b, as we are applying pooled 2SLS to the time-demeaned equation: $\mathrm{rank}\left(\sum_{t=1}^T E(\ddot{z}_{it}'\ddot{x}_{it})\right) = K$. This clearly fails if $x_{it}$ contains any time-constant explanatory variables (across all $i$, as usual). The condition $\mathrm{rank}\left(\sum_{t=1}^T E(\ddot{z}_{it}'\ddot{z}_{it})\right) = L$ is also needed, and this rules out time-constant instruments. But if the rank condition holds, we can always redefine $z_{it}$ so that $\sum_{t=1}^T E(\ddot{z}_{it}'\ddot{z}_{it})$ has full rank.

b. We can apply the results on GMM estimation in Chapter 8. In particular, in equation (8.25), take $C = E(\ddot{Z}_i'\ddot{X}_i)$, $W = [E(\ddot{Z}_i'\ddot{Z}_i)]^{-1}$, and $\Lambda = E(\ddot{Z}_i'\ddot{u}_i\ddot{u}_i'\ddot{Z}_i)$. A key point is that $\ddot{Z}_i'\ddot{u}_i = (Q_TZ_i)'(Q_Tu_i) = Z_i'Q_Tu_i = \ddot{Z}_i'u_i$, where $Q_T$ is the $T \times T$ time-demeaning matrix defined in Chapter 10. Under (11.80), $E(u_iu_i' \mid \ddot{Z}_i) = \sigma_u^2 I_T$ (by the usual iterated expectations argument), and so $\Lambda = E(\ddot{Z}_i'u_iu_i'\ddot{Z}_i) = \sigma_u^2E(\ddot{Z}_i'\ddot{Z}_i)$. If we plug these choices of $C$, $W$, and $\Lambda$ into (8.25) and simplify, we obtain

$$\mathrm{Avar}\,\sqrt{N}(\hat{\beta} - \beta) = \sigma_u^2\{E(\ddot{X}_i'\ddot{Z}_i)[E(\ddot{Z}_i'\ddot{Z}_i)]^{-1}E(\ddot{Z}_i'\ddot{X}_i)\}^{-1}.$$

c. The argument is very similar to the case of the fixed effects estimator. First, $\sum_{t=1}^T E(\ddot{u}_{it}^2) = (T - 1)\sigma_u^2$, just as before. If $\hat{u}_{it} = \ddot{y}_{it} - \ddot{x}_{it}\hat{\beta}$ are the pooled 2SLS residuals applied to the time-demeaned data, then $[N(T - 1)]^{-1}\sum_{i=1}^N\sum_{t=1}^T \hat{u}_{it}^2$ is a consistent estimator of $\sigma_u^2$. Typically, $N(T - 1)$ would be replaced by $N(T - 1) - K$ as a degrees of freedom adjustment.
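d. From Problem 5.1 (which is purely algebraic, and so applies immediately to pooled 2SLS), the 2SLS estimator of all parameters in (11.81), including $\beta$, can be obtained as follows: first run the regression $x_{it}$ on $d1_i, \ldots, dN_i, z_{it}$ across all $t$ and $i$, and obtain the residuals, say $\hat{r}_{it}$; second, obtain $\hat{c}_1, \ldots, \hat{c}_N, \hat{\beta}$ from the pooled regression $y_{it}$ on $d1_i, \ldots, dN_i, x_{it}, \hat{r}_{it}$. Now, by algebra of partial regression, $\hat{\beta}$ and the coefficient on $\hat{r}_{it}$, say $\hat{\delta}$, from this last regression can be obtained by first partialling out the dummy variables, $d1_i, \ldots, dN_i$. As we know from Chapter 10, this partialling out is equivalent to time demeaning all variables. Therefore, $\hat{\beta}$ and $\hat{\delta}$ can be obtained from the pooled regression $\ddot{y}_{it}$ on $\ddot{x}_{it}, \hat{r}_{it}$, where we use the fact that the time average of $\hat{r}_{it}$ for each $i$ is identically zero.

Now consider the 2SLS estimator of $\beta$ from (11.79). This is equivalent to first regressing $\ddot{x}_{it}$ on $\ddot{z}_{it}$ and saving the residuals, say $\hat{s}_{it}$, and then running the OLS regression $\ddot{y}_{it}$ on $\ddot{x}_{it}, \hat{s}_{it}$. But, again by partial regression and the fact that regressing on $d1_i, \ldots, dN_i$ results in time demeaning, $\hat{s}_{it} = \hat{r}_{it}$ for all $i$ and $t$. This proves that the 2SLS estimates of $\beta$ from (11.79) and (11.81) are identical. (If some elements of $x_{it}$ are included in $z_{it}$, as would usually be the case, some entries in $\hat{r}_{it}$ are identically zero for all $t$ and $i$. But we can simply drop those without changing any other steps in the argument.)

e. First, by writing down the first order condition for the 2SLS estimates from (11.81) (with $dn_i$ as their own instruments, and $\hat{x}_{it}$ as the IVs for $x_{it}$), it is easy to show that $\hat{c}_i = \bar{y}_i - \bar{x}_i\hat{\beta}$, where $\hat{\beta}$ is the IV estimator from (11.81) (and also (11.79)). Therefore, the 2SLS residuals from (11.81) are computed as $y_{it} - (\bar{y}_i - \bar{x}_i\hat{\beta}) - x_{it}\hat{\beta} = (y_{it} - \bar{y}_i) - (x_{it} - \bar{x}_i)\hat{\beta} = \ddot{y}_{it} - \ddot{x}_{it}\hat{\beta}$, which are exactly the 2SLS residuals from (11.79). Because the $N$ dummy variables are explicitly included in (11.81), the degrees of freedom in estimating $\sigma_u^2$ from part c are properly calculated.

f. The general, messy estimator in equation (8.27) should be used, where $X$ and $Z$ are replaced with $\ddot{X}$ and $\ddot{Z}$, $\hat{W} = (\ddot{Z}'\ddot{Z}/N)^{-1}$, $\hat{u}_i = \ddot{y}_i - \ddot{X}_i\hat{\beta}$, and $\hat{\Lambda} = N^{-1}\sum_{i=1}^N \ddot{Z}_i'\hat{u}_i\hat{u}_i'\ddot{Z}_i$.

g. The 2SLS procedure is inconsistent as $N \to \infty$ with fixed $T$, as is any IV method that uses time-demeaning to eliminate the unobserved effect. This is because the time-demeaned IVs will generally be correlated with some elements of $u_i$ (usually, all elements).

11.11. Differencing twice and using the resulting cross section is easily done in Stata. Alternatively, I can use fixed effects on the first differences:

. gen cclscrap = clscrap - clscrap[_n-1] if d89
(417 missing values generated)

. gen ccgrnt = cgrant - cgrant[_n-1] if d89
(314 missing values generated)

. gen ccgrnt_1 = cgrant_1 - cgrant_1[_n-1] if d89
(314 missing values generated)

. reg cclscrap ccgrnt ccgrnt_1

  Source |       SS       df       MS                  Number of obs =      54
---------+------------------------------               F(  2,    51) =    0.97
   Model |  .958448372     2  .479224186               Prob > F      =  0.3868
Residual |  25.2535328    51   .49516731               R-squared     =  0.0366
---------+------------------------------               Adj R-squared = -0.0012
   Total |  26.2119812    53  .494565682               Root MSE      =  .70368

------------------------------------------------------------------------------
cclscrap |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
  ccgrnt |   .1564748   .2632934      0.594   0.555      -.3721087    .6850584
ccgrnt_1 |   .6099015   .6343411      0.961   0.341      -.6635913    1.883394
   _cons |  -.2377384   .1407363     -1.689   0.097      -.5202783    .0448014
------------------------------------------------------------------------------

. xtreg clscrap d89 cgrant cgrant_1, fe

Fixed-effects (within) regression
sd(u_fcode)                =   .509567          Number of obs =    108
sd(e_fcode_t)              =  .4975778                      n =     54
sd(e_fcode_t + u_fcode)    =  .7122094                      T =      2

corr(u_fcode, Xb)          =   -0.4011            R-sq within = 0.0577
                                                      between = 0.0476
                                                      overall = 0.0050

                                                F(  3,    51) =   1.04
                                                     Prob > F = 0.3826

------------------------------------------------------------------------------
 clscrap |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
     d89 |  -.2377385   .1407362     -1.689   0.097      -.5202783    .0448014
  cgrant |   .1564748   .2632934      0.594   0.555      -.3721087    .6850584
cgrant_1 |   .6099016   .6343411      0.961   0.341      -.6635913    1.883394
   _cons |  -.2240491    .114748     -1.953   0.056      -.4544153    .0063171
------------------------------------------------------------------------------
   fcode |        F(53,51) =    1.674   0.033            (54 categories)

The estimates from the random growth model are pretty bad -- the estimates on the grant variables are of the wrong sign -- and they are very imprecise. The joint F test for the 53 different intercepts is significant at the 5% level, so it is hard to know what to make of this. It does cast doubt on the standard unobserved effects model without a random growth term.

11.13. To be added.

11.15. To be added.

11.17. To obtain (11.55), we use (11.54) and the representation $\sqrt{N}(\hat{\beta}_{FE} - \beta) = A^{-1}\left(N^{-1/2}\sum_{i=1}^N \ddot{X}_i'u_i\right) + o_p(1)$. Simple algebra and standard properties of $O_p(1)$ and $o_p(1)$ give

$$\sqrt{N}(\hat{\alpha} - \alpha) = N^{-1/2}\sum_{i=1}^N [(Z_i'Z_i)^{-1}Z_i'(y_i - X_i\beta) - \alpha] - \left(N^{-1}\sum_{i=1}^N (Z_i'Z_i)^{-1}Z_i'X_i\right)\sqrt{N}(\hat{\beta}_{FE} - \beta)$$
$$= N^{-1/2}\sum_{i=1}^N (s_i - \alpha) - CA^{-1}N^{-1/2}\sum_{i=1}^N \ddot{X}_i'u_i + o_p(1),$$

where $C \equiv E[(Z_i'Z_i)^{-1}Z_i'X_i]$ and $s_i \equiv (Z_i'Z_i)^{-1}Z_i'(y_i - X_i\beta)$. By definition, $E(s_i) = \alpha$. By combining terms in the sum we have

$$\sqrt{N}(\hat{\alpha} - \alpha) = N^{-1/2}\sum_{i=1}^N [(s_i - \alpha) - CA^{-1}\ddot{X}_i'u_i] + o_p(1),$$

which implies by the central limit theorem and the asymptotic equivalence lemma that $\sqrt{N}(\hat{\alpha} - \alpha)$ is asymptotically normal with zero mean and variance $E(r_ir_i')$, where $r_i \equiv (s_i - \alpha) - CA^{-1}\ddot{X}_i'u_i$. If we replace $\alpha$, $C$, $A$, and $\beta$ with their consistent estimators, we get exactly (11.55), since the $\hat{u}_i$ are the FE residuals.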
CHAPTER 12

12.1. Take the conditional expectation of equation (12.4) with respect to $x$, and use $E(u \mid x) = 0$:

$$E\{[y - m(x,\theta)]^2 \mid x\} = E(u^2 \mid x) + 2[m(x,\theta_o) - m(x,\theta)]E(u \mid x) + E\{[m(x,\theta_o) - m(x,\theta)]^2 \mid x\}$$
$$= E(u^2 \mid x) + 0 + [m(x,\theta_o) - m(x,\theta)]^2 = E(u^2 \mid x) + [m(x,\theta_o) - m(x,\theta)]^2.$$

Now, the first term does not depend on $\theta$, and the second term is clearly minimized at $\theta = \theta_o$ (although not uniquely, in general).

12.3. a. The approximate elasticity is

$$\partial\log[\hat{E}(y \mid z)]/\partial\log(z_1) = \partial[\hat{\theta}_1 + \hat{\theta}_2\log(z_1) + \hat{\theta}_3z_2]/\partial\log(z_1) = \hat{\theta}_2.$$

b. This is approximated by $100\cdot\partial\log[\hat{E}(y \mid z)]/\partial z_2 = 100\cdot\hat{\theta}_3$.

c. Since $\partial\hat{E}(y \mid z)/\partial z_2 = \exp[\hat{\theta}_1 + \hat{\theta}_2\log(z_1) + \hat{\theta}_3z_2 + \hat{\theta}_4z_2^2]\cdot(\hat{\theta}_3 + 2\hat{\theta}_4z_2)$, the turning point is $z_2^* = \hat{\theta}_3/(-2\hat{\theta}_4)$.

d. Since $\nabla_\theta m(x,\theta) = \exp(x_1\theta_1 + x_2\theta_2)x$, the gradient of the mean function evaluated under the null is

$$\nabla_\theta\tilde{m}_i = \exp(x_{i1}\tilde{\theta}_1)x_i \equiv \tilde{m}_ix_i,$$

where $\tilde{\theta}_1$ is the restricted NLS estimator. From (12.72), we can compute the usual LM statistic as $NR_u^2$ from the regression $\tilde{u}_i$ on $\tilde{m}_ix_{i1}$, $\tilde{m}_ix_{i2}$, $i = 1,\ldots,N$, where $\tilde{u}_i = y_i - \tilde{m}_i$. For the robust test, we first regress $\tilde{m}_ix_{i2}$ on $\tilde{m}_ix_{i1}$ and obtain the $1 \times K_2$ residuals, $\tilde{r}_i$. Then we compute the statistic as in regression (12.75).
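The turning point in Problem 12.3c is a one-line calculation once the estimates are in hand. The sketch below uses made-up coefficient values (not estimates from the text) and a hypothetical helper name, `turning_point`, simply to show the arithmetic and to confirm that the derivative factor $\theta_3 + 2\theta_4 z_2$ vanishes there.

```python
def turning_point(theta3, theta4):
    """Value of z2 at which the derivative of exp(.) with respect to z2 is zero."""
    if theta4 == 0:
        raise ValueError("no turning point when the quadratic coefficient is zero")
    return theta3 / (-2.0 * theta4)


# Hypothetical estimates: positive linear term, negative quadratic term.
theta3_hat, theta4_hat = 0.08, -0.002
z2_star = turning_point(theta3_hat, theta4_hat)

# The exponential factor is always positive, so the sign of the partial
# effect is the sign of theta3 + 2*theta4*z2, which is zero at z2_star.
slope_factor = theta3_hat + 2.0 * theta4_hat * z2_star
print(z2_star, slope_factor)
```

With these illustrative numbers the conditional mean is increasing in $z_2$ below the turning point and decreasing above it.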
12.5. We need the gradient of $m(x_i,\theta)$ evaluated under the null hypothesis. By the chain rule,

$$\nabla_\beta m(x,\theta) = g[x\beta + \delta_1(x\beta)^2 + \delta_2(x\beta)^3]\cdot[1 + 2\delta_1(x\beta) + 3\delta_2(x\beta)^2]\,x,$$
$$\nabla_\delta m(x,\theta) = g[x\beta + \delta_1(x\beta)^2 + \delta_2(x\beta)^3]\cdot[(x\beta)^2, (x\beta)^3].$$

Let $\tilde\beta$ denote the NLS estimator with $\delta_1 = \delta_2 = 0$ imposed. Then $\nabla_\beta m(x_i,\tilde\theta) = g(x_i\tilde\beta)x_i$ and $\nabla_\delta m(x_i,\tilde\theta) = g(x_i\tilde\beta)[(x_i\tilde\beta)^2, (x_i\tilde\beta)^3]$. Therefore, the usual LM statistic can be obtained as $NR_u^2$ from the regression $\tilde{u}_i$ on $\tilde{g}_ix_i$, $\tilde{g}_i\cdot(x_i\tilde\beta)^2$, $\tilde{g}_i\cdot(x_i\tilde\beta)^3$, where $\tilde{g}_i \equiv g(x_i\tilde\beta)$. If $G(\cdot)$ is the identity function, $g(\cdot) \equiv 1$, and we get RESET.

12.7. a. For each i and g, define $u_{ig} \equiv y_{ig} - m_g(x_i,\theta_{og})$, so that $E(u_{ig}|x_i) = 0$, g = 1,...,G. Further, let $u_i$ be the $G \times 1$ vector containing the $u_{ig}$. Then $E(u_iu_i'|x_i) = E(u_iu_i') = \Omega_o$. Let $\hat{u}_i$ be the vector of nonlinear least squares residuals. That is, do NLS for each g, and collect the residuals. Then, by standard arguments, a consistent estimator of $\Omega_o$ is

$$\hat\Omega \equiv N^{-1}\sum_{i=1}^N \hat{u}_i\hat{u}_i'$$

because each NLS estimator, $\hat\theta_g$, is consistent for $\theta_{og}$ as $N \to \infty$.

b. This part involves several steps, and I will sketch how each one goes. First, let $\gamma$ be the vector of distinct elements of $\Omega$ -- the nuisance parameters in the context of two-step M-estimation. Then, the score for observation i is

$$s(w_i,\theta;\gamma) = -\nabla_\theta m(x_i,\theta)'\Omega^{-1}u_i(\theta)$$

where, hopefully, the notation is clear. With this definition, we can verify condition (12.37), even though the actual derivatives are complicated. Each element of $s(w_i,\theta;\gamma)$ is a linear combination of $u_i(\theta)$. So $\nabla_\gamma s_j(w_i,\theta_o;\gamma)$ is a linear combination of $u_i(\theta_o) \equiv u_i$, where the linear combination is a function of $(x_i,\theta_o,\gamma)$. Since $E(u_i|x_i) = 0$, $E[\nabla_\gamma s_j(w_i,\theta_o;\gamma)|x_i] = 0$, and so its unconditional expectation is zero, too. This shows that we do not have to adjust for the first-stage estimation of $\Omega_o$. Alternatively, one can verify the hint directly, which has the same consequence.
Next, we derive $B_o \equiv E[s_i(\theta_o;\gamma_o)s_i(\theta_o;\gamma_o)']$:

$$E[s_i(\theta_o;\gamma_o)s_i(\theta_o;\gamma_o)'] = E[\nabla_\theta m_i(\theta_o)'\Omega_o^{-1}u_iu_i'\Omega_o^{-1}\nabla_\theta m_i(\theta_o)]$$
$$= E\{E[\nabla_\theta m_i(\theta_o)'\Omega_o^{-1}u_iu_i'\Omega_o^{-1}\nabla_\theta m_i(\theta_o)|x_i]\}$$
$$= E[\nabla_\theta m_i(\theta_o)'\Omega_o^{-1}E(u_iu_i'|x_i)\Omega_o^{-1}\nabla_\theta m_i(\theta_o)]$$
$$= E[\nabla_\theta m_i(\theta_o)'\Omega_o^{-1}\Omega_o\Omega_o^{-1}\nabla_\theta m_i(\theta_o)] = E[\nabla_\theta m_i(\theta_o)'\Omega_o^{-1}\nabla_\theta m_i(\theta_o)].$$

Next, we have to derive $A_o \equiv E[H_i(\theta_o;\gamma_o)]$, and show that $B_o = A_o$. The Hessian itself is complicated, but its expected value is not. The Jacobian of $s_i(\theta;\gamma)$ with respect to $\theta$ can be written

$$H_i(\theta;\gamma) = \nabla_\theta m(x_i,\theta)'\Omega^{-1}\nabla_\theta m(x_i,\theta) + [I_P \otimes u_i(\theta)']F(x_i,\theta;\gamma),$$

where $F(x_i,\theta;\gamma)$ is a $GP \times P$ matrix, where P is the total number of parameters, that involves Jacobians of the rows of $\Omega^{-1}\nabla_\theta m_i(\theta)$ with respect to $\theta$. The key is that $F(x_i,\theta;\gamma)$ depends on $x_i$, not on $y_i$. So,

$$E[H_i(\theta_o;\gamma_o)|x_i] = \nabla_\theta m_i(\theta_o)'\Omega_o^{-1}\nabla_\theta m_i(\theta_o) + [I_P \otimes E(u_i|x_i)']F(x_i,\theta_o;\gamma_o) = \nabla_\theta m_i(\theta_o)'\Omega_o^{-1}\nabla_\theta m_i(\theta_o).$$

Now iterated expectations gives $A_o = E[\nabla_\theta m_i(\theta_o)'\Omega_o^{-1}\nabla_\theta m_i(\theta_o)]$. So, we have verified (12.37) and that $A_o = B_o$. Therefore, from Theorem 12.3, Avar $\sqrt{N}(\hat\theta - \theta_o) = A_o^{-1} = \{E[\nabla_\theta m_i(\theta_o)'\Omega_o^{-1}\nabla_\theta m_i(\theta_o)]\}^{-1}$.

c. As usual, we replace expectations with sample averages and unknown parameters with their estimates, and divide the result by N to get Avar($\hat\theta$):

$$\widehat{\text{Avar}}(\hat\theta) = \Big[N^{-1}\sum_{i=1}^N \nabla_\theta m_i(\hat\theta)'\hat\Omega^{-1}\nabla_\theta m_i(\hat\theta)\Big]^{-1}/N = \Big[\sum_{i=1}^N \nabla_\theta m_i(\hat\theta)'\hat\Omega^{-1}\nabla_\theta m_i(\hat\theta)\Big]^{-1}.$$

The estimate $\hat\Omega$ can be based on the multivariate NLS residuals or can be updated after the nonlinear SUR estimates have been obtained.
d. First, note that $\nabla_\theta m_i(\theta_o)$ is a block-diagonal matrix with blocks $\nabla_{\theta_g}m_{ig}(\theta_{og})$, a $1 \times P_g$ matrix. (I implicitly assumed that there are no cross-equation restrictions imposed in the nonlinear SUR estimation.) If $\Omega_o$ is diagonal, so is its inverse. Standard matrix multiplication shows that

$$\nabla_\theta m_i(\theta_o)'\Omega_o^{-1}\nabla_\theta m_i(\theta_o) = \begin{pmatrix} \sigma_{o1}^{-2}\nabla_{\theta_1}m_{i1}^{o\prime}\nabla_{\theta_1}m_{i1}^o & 0 & \cdots & 0 \\ 0 & \ddots & & \vdots \\ \vdots & & \ddots & 0 \\ 0 & \cdots & 0 & \sigma_{oG}^{-2}\nabla_{\theta_G}m_{iG}^{o\prime}\nabla_{\theta_G}m_{iG}^o \end{pmatrix}.$$

Taking expectations and inverting the result shows that Avar $\sqrt{N}(\hat\theta_g - \theta_{og}) = \sigma_{og}^2[E(\nabla_{\theta_g}m_{ig}^{o\prime}\nabla_{\theta_g}m_{ig}^o)]^{-1}$, g = 1,...,G. (Note also that the nonlinear SUR estimators are asymptotically uncorrelated across equations.) These asymptotic variances are easily seen to be the same as those for nonlinear least squares on each equation; see p. 360.

e. I cannot see a nonlinear analogue of Theorem 7.7. The first hint given in Problem 7.5 does not extend readily to nonlinear models, even when the same regressors appear in each equation. The key is that $X_i$ is replaced with $\nabla_\theta m(x_i,\theta_o)$. While this $G \times P$ matrix has a block-diagonal form, as described in part d, the blocks are not the same even when the same regressors appear in each equation. In the linear case, $\nabla_{\theta_g}m_g(x_i,\theta_{og}) = x_i$ for all g. But, unless $\theta_{og}$ is the same in all equations -- a very restrictive assumption -- $\nabla_{\theta_g}m_g(x_i,\theta_{og})$ varies across g. For example, if $m_g(x_i,\theta_{og}) = \exp(x_i\theta_{og})$ then $\nabla_{\theta_g}m_g(x_i,\theta_{og}) = \exp(x_i\theta_{og})x_i$, and the gradients differ across g.

12.9. a. We cannot say anything in general about Med(y|x), since Med(y|x) = $m(x,\beta_o)$ + Med(u|x), and Med(u|x) could be a general function of x.

b. If u and x are independent, then E(u|x) and Med(u|x) are both constants, say $\alpha$ and $\delta$. Then E(y|x) - Med(y|x) = $\alpha - \delta$, which does not depend on x.

c. When u and x are independent, the partial effects of $x_j$ on the conditional mean and conditional median are the same, and there is no ambiguity about what is "the effect of $x_j$ on y," at least when only the mean and median are in the running. Then, we could interpret large differences between LAD and NLS as perhaps indicating an outlier problem. But it could just be that u and x are not independent.
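12.11. a. For consistency of the MNLS estimator, we need -- in addition to the regularity conditions, which I will ignore -- the identification condition. That is, $\beta_o$ must uniquely minimize

$$E[q(w_i,\beta)] = E\{[y_i - m(x_i,\beta)]'[y_i - m(x_i,\beta)]\}$$
$$= E(\{u_i + [m(x_i,\beta_o) - m(x_i,\beta)]\}'\{u_i + [m(x_i,\beta_o) - m(x_i,\beta)]\})$$
$$= E(u_i'u_i) + 2E\{[m(x_i,\beta_o) - m(x_i,\beta)]'u_i\} + E\{[m(x_i,\beta_o) - m(x_i,\beta)]'[m(x_i,\beta_o) - m(x_i,\beta)]\}$$
$$= E(u_i'u_i) + E\{[m(x_i,\beta_o) - m(x_i,\beta)]'[m(x_i,\beta_o) - m(x_i,\beta)]\}$$

because $E(u_i|x_i) = 0$. Therefore, the identification assumption is that

$$E\{[m(x_i,\beta_o) - m(x_i,\beta)]'[m(x_i,\beta_o) - m(x_i,\beta)]\} > 0, \quad \beta \neq \beta_o.$$

In a linear model, where $m(x_i,\beta) = X_i\beta$ for $X_i$ a $G \times K$ matrix, the condition is

$$(\beta_o - \beta)'E(X_i'X_i)(\beta_o - \beta) > 0, \quad \beta \neq \beta_o,$$

and this holds provided $E(X_i'X_i)$ is positive definite.

Provided $m(x,\cdot)$ is twice continuously differentiable, there are no problems in applying Theorem 12.3. Generally, $B_o = E[\nabla_\beta m_i(\beta_o)'u_iu_i'\nabla_\beta m_i(\beta_o)]$ and $A_o = E[\nabla_\beta m_i(\beta_o)'\nabla_\beta m_i(\beta_o)]$. These can be consistently estimated in the obvious way after obtaining the MNLS estimators.

b. We can apply the results on two-step M-estimation. The key is that, under general regularity conditions,

$$N^{-1}\sum_{i=1}^N [y_i - m(x_i,\beta)]'[W_i(\hat\delta)]^{-1}[y_i - m(x_i,\beta)]/2$$

converges uniformly in probability to

$$E\{[y_i - m(x_i,\beta)]'[W(x_i,\delta_o)]^{-1}[y_i - m(x_i,\beta)]\}/2,$$

which is just to say that the usual consistency proof can be used provided we verify identification. But we can use an argument very similar to the unweighted case to show

$$E\{[y_i - m(x_i,\beta)]'[W(x_i,\delta_o)]^{-1}[y_i - m(x_i,\beta)]\} = E\{u_i'[W_i(\delta_o)]^{-1}u_i\} + E\{[m(x_i,\beta_o) - m(x_i,\beta)]'[W_i(\delta_o)]^{-1}[m(x_i,\beta_o) - m(x_i,\beta)]\},$$

where $E(u_i|x_i) = 0$ is used to show the cross-product term, $2E\{[m(x_i,\beta_o) - m(x_i,\beta)]'[W_i(\delta_o)]^{-1}u_i\}$, is zero (by iterated expectations, as always). As before, the first term does not depend on $\beta$ and the second term is minimized at $\beta_o$; we would have to assume it is uniquely minimized.

To get the asymptotic variance, we proceed as in Problem 12.7. First, it can be shown that condition (12.37) holds. In particular, we can write $\nabla_\delta s_i(\beta_o;\delta_o) = (I_P \otimes u_i)'G(x_i,\beta_o;\delta_o)$ for some function $G(x_i,\beta_o;\delta_o)$. It follows easily that $E[\nabla_\delta s_i(\beta_o;\delta_o)|x_i] = 0$, which implies (12.37). This means that, under $E(y_i|x_i) = m(x_i,\beta_o)$, we can ignore preliminary estimation of $\delta_o$ provided we have a $\sqrt{N}$-consistent estimator.

To obtain the asymptotic variance when the conditional variance matrix is correctly specified, that is, when $\text{Var}(y_i|x_i) = \text{Var}(u_i|x_i) = W(x_i,\delta_o)$, we can use an argument very similar to the nonlinear SUR case in Problem 12.7:

$$E[s_i(\beta_o;\delta_o)s_i(\beta_o;\delta_o)'] = E[\nabla_\beta m_i(\beta_o)'[W_i(\delta_o)]^{-1}u_iu_i'[W_i(\delta_o)]^{-1}\nabla_\beta m_i(\beta_o)]$$
$$= E\{E[\nabla_\beta m_i(\beta_o)'[W_i(\delta_o)]^{-1}u_iu_i'[W_i(\delta_o)]^{-1}\nabla_\beta m_i(\beta_o)|x_i]\}$$
$$= E[\nabla_\beta m_i(\beta_o)'[W_i(\delta_o)]^{-1}E(u_iu_i'|x_i)[W_i(\delta_o)]^{-1}\nabla_\beta m_i(\beta_o)]$$
$$= E\{\nabla_\beta m_i(\beta_o)'[W_i(\delta_o)]^{-1}\nabla_\beta m_i(\beta_o)\}.$$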
Now, the Hessian (with respect to $\beta$), evaluated at $(\beta_o,\delta_o)$, can be written as

$$H_i(\beta_o;\delta_o) = \nabla_\beta m(x_i,\beta_o)'[W_i(\delta_o)]^{-1}\nabla_\beta m(x_i,\beta_o) + (I_P \otimes u_i)'F(x_i,\beta_o;\delta_o),$$

for some complicated function $F(x_i,\beta_o;\delta_o)$ that depends only on $x_i$. Taking expectations gives

$$A_o \equiv E[H_i(\beta_o;\delta_o)] = E\{\nabla_\beta m(x_i,\beta_o)'[W_i(\delta_o)]^{-1}\nabla_\beta m(x_i,\beta_o)\} = B_o.$$

Therefore, from the usual results on M-estimation, Avar $\sqrt{N}(\hat\beta - \beta_o) = A_o^{-1}$, and a consistent estimator of $A_o$ is

$$\hat{A} = N^{-1}\sum_{i=1}^N \nabla_\beta m(x_i,\hat\beta)'[W_i(\hat\delta)]^{-1}\nabla_\beta m(x_i,\hat\beta).$$

c. The consistency argument in part b did not use the fact that $W(x,\delta)$ is correctly specified for Var(y|x). Exactly the same derivation goes through. But, of course, the asymptotic variance is affected because $A_o \neq B_o$, and the expression for $B_o$ no longer holds. The estimator of $A_o$ in part b still works, of course. To consistently estimate $B_o$ we use

$$\hat{B} = N^{-1}\sum_{i=1}^N \nabla_\beta m(x_i,\hat\beta)'[W_i(\hat\delta)]^{-1}\hat{u}_i\hat{u}_i'[W_i(\hat\delta)]^{-1}\nabla_\beta m(x_i,\hat\beta).$$

Now, we estimate Avar $\sqrt{N}(\hat\beta - \beta_o)$ in the usual way: $\hat{A}^{-1}\hat{B}\hat{A}^{-1}$.

CHAPTER 13

13.1. No. We know that $\theta_o$ solves

$$\max_{\theta\in\Theta} E[\log f(y_i|x_i;\theta)],$$

where the expectation is over the joint distribution of $(x_i,y_i)$. Therefore, because $\exp(\cdot)$ is an increasing function, $\theta_o$ also maximizes $\exp\{E[\log f(y_i|x_i;\theta)]\}$ over $\Theta$. The problem is that the expectation and the exponential function cannot be interchanged: $E[f(y_i|x_i;\theta)] \neq \exp\{E[\log f(y_i|x_i;\theta)]\}$. In fact, Jensen's inequality tells us that $E[f(y_i|x_i;\theta)] > \exp\{E[\log f(y_i|x_i;\theta)]\}$.
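The Jensen's inequality step in Problem 13.1 can be illustrated with a toy calculation. The three density values and their probabilities below are hypothetical numbers chosen for the example; they simply stand in for the random variable $f(y_i|x_i;\theta)$ at some fixed $\theta$.

```python
import math

# Hypothetical values taken by f(y_i|x_i; theta) and their probabilities.
density_values = [0.1, 0.4, 0.9]
probabilities = [0.2, 0.5, 0.3]

# E[f]: the expectation of the density itself.
mean_of_f = sum(p * v for p, v in zip(probabilities, density_values))

# exp{E[log f]}: the exponential of the expected log density.
exp_of_mean_log = math.exp(
    sum(p * math.log(v) for p, v in zip(probabilities, density_values))
)

print(mean_of_f, exp_of_mean_log)
```

Because the log is strictly concave and the density value is not constant, the first quantity is strictly larger than the second, which is why maximizing the expected log likelihood is not the same as maximizing the expected likelihood.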
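13.3. Parts a and b essentially appear in Section 15.4.

13.5. a. Since $s_i^g(\phi_o) = [G(\theta_o)']^{-1}s_i(\theta_o)$,

$$E[s_i^g(\phi_o)s_i^g(\phi_o)'|x_i] = E\{[G(\theta_o)']^{-1}s_i(\theta_o)s_i(\theta_o)'[G(\theta_o)]^{-1}|x_i\}$$
$$= [G(\theta_o)']^{-1}E[s_i(\theta_o)s_i(\theta_o)'|x_i][G(\theta_o)]^{-1} = [G(\theta_o)']^{-1}A_i(\theta_o)[G(\theta_o)]^{-1}.$$

b. In part b, we just replace $\theta_o$ with $\tilde\theta$ and $\phi_o$ with $\tilde\phi$:

$$\tilde{A}_i^g = [G(\tilde\theta)']^{-1}A_i(\tilde\theta)[G(\tilde\theta)]^{-1} \equiv \tilde{G}'^{-1}\tilde{A}_i\tilde{G}^{-1}.$$

c. The expected Hessian form of the statistic is given in the second part of equation (13.36), but where it is based initially on $\tilde{s}_i^g$ and $\tilde{A}_i^g$:

$$LM^g = \Big(\sum_{i=1}^N \tilde{s}_i^g\Big)'\Big(\sum_{i=1}^N \tilde{A}_i^g\Big)^{-1}\Big(\sum_{i=1}^N \tilde{s}_i^g\Big)$$
$$= \Big(\sum_{i=1}^N \tilde{G}'^{-1}\tilde{s}_i\Big)'\Big(\sum_{i=1}^N \tilde{G}'^{-1}\tilde{A}_i\tilde{G}^{-1}\Big)^{-1}\Big(\sum_{i=1}^N \tilde{G}'^{-1}\tilde{s}_i\Big)$$
$$= \Big(\sum_{i=1}^N \tilde{s}_i\Big)'\tilde{G}^{-1}\tilde{G}\Big(\sum_{i=1}^N \tilde{A}_i\Big)^{-1}\tilde{G}'\tilde{G}'^{-1}\Big(\sum_{i=1}^N \tilde{s}_i\Big)$$
$$= \Big(\sum_{i=1}^N \tilde{s}_i\Big)'\Big(\sum_{i=1}^N \tilde{A}_i\Big)^{-1}\Big(\sum_{i=1}^N \tilde{s}_i\Big) = LM.$$

13.7. a. The joint density is simply $g(y_1|y_2,x;\theta_o)\cdot h(y_2|x;\theta_o)$. The log-likelihood for observation i is

$$l_i(\theta) \equiv \log g(y_{i1}|y_{i2},x_i;\theta) + \log h(y_{i2}|x_i;\theta),$$

and we would use this in a standard MLE analysis (conditional on $x_i$).

b. First, we know that, for all $(y_{i2},x_i)$, $\theta_o$ maximizes $E[l_{i1}(\theta)|y_{i2},x_i]$. Since $r_{i2}$ is a function of $(y_{i2},x_i)$, $E[r_{i2}l_{i1}(\theta)|y_{i2},x_i] = r_{i2}E[l_{i1}(\theta)|y_{i2},x_i]$; since $r_{i2} \geq 0$, $\theta_o$ maximizes $E[r_{i2}l_{i1}(\theta)|y_{i2},x_i]$ for all $(y_{i2},x_i)$, and therefore $\theta_o$ maximizes $E[r_{i2}l_{i1}(\theta)]$. Similarly, $\theta_o$ maximizes $E[l_{i2}(\theta)]$, and so it follows that $\theta_o$ maximizes $E[r_{i2}l_{i1}(\theta) + l_{i2}(\theta)]$. For identification, we have to assume or verify uniqueness.

c. The score is $s_i(\theta) = r_{i2}s_{i1}(\theta) + s_{i2}(\theta)$, where $s_{i1}(\theta) \equiv \nabla_\theta l_{i1}(\theta)'$ and $s_{i2}(\theta) \equiv \nabla_\theta l_{i2}(\theta)'$. Therefore,

$$E[s_i(\theta_o)s_i(\theta_o)'] = E[r_{i2}s_{i1}(\theta_o)s_{i1}(\theta_o)'] + E[s_{i2}(\theta_o)s_{i2}(\theta_o)'] + E[r_{i2}s_{i1}(\theta_o)s_{i2}(\theta_o)'] + E[r_{i2}s_{i2}(\theta_o)s_{i1}(\theta_o)'].$$

Now by the usual conditional MLE theory, $E[s_{i1}(\theta_o)|y_{i2},x_i] = 0$ and, since $r_{i2}$ and $s_{i2}(\theta)$ are functions of $(y_{i2},x_i)$, it follows that $E[r_{i2}s_{i1}(\theta_o)s_{i2}(\theta_o)'|y_{i2},x_i] = 0$, and so its transpose also has zero conditional expectation. As usual, this implies zero unconditional expectation. We have shown

$$E[s_i(\theta_o)s_i(\theta_o)'] = E[r_{i2}s_{i1}(\theta_o)s_{i1}(\theta_o)'] + E[s_{i2}(\theta_o)s_{i2}(\theta_o)'].$$

Now, by the unconditional information matrix equality for the density $h(y_2|x;\theta)$, $E[s_{i2}(\theta_o)s_{i2}(\theta_o)'] = -E[H_{i2}(\theta_o)]$, where $H_{i2}(\theta) = \nabla_\theta s_{i2}(\theta)$. Further, by the conditional IM equality for the density $g(y_1|y_2,x;\theta)$,

$$E[s_{i1}(\theta_o)s_{i1}(\theta_o)'|y_{i2},x_i] = -E[H_{i1}(\theta_o)|y_{i2},x_i], \qquad (13.70)$$

where $H_{i1}(\theta) = \nabla_\theta s_{i1}(\theta)$. Since $r_{i2}$ is a function of $(y_{i2},x_i)$, we can put $r_{i2}$ inside both expectations in (13.70). Then, by iterated expectations,

$$E[r_{i2}s_{i1}(\theta_o)s_{i1}(\theta_o)'] = -E[r_{i2}H_{i1}(\theta_o)].$$

Combining all the pieces, we have shown that

$$E[s_i(\theta_o)s_i(\theta_o)'] = -E[r_{i2}H_{i1}(\theta_o)] - E[H_{i2}(\theta_o)] = -E[r_{i2}\nabla_\theta s_{i1}(\theta_o) + \nabla_\theta s_{i2}(\theta_o)] = -E[\nabla_\theta^2 l_i(\theta_o)] \equiv -E[H_i(\theta_o)].$$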
So we have verified that an unconditional IM equality holds, which means we can estimate the asymptotic variance of $\sqrt{N}(\hat\theta - \theta_o)$ by $\{-E[H_i(\theta_o)]\}^{-1}$.

d. From part c, one consistent estimator of Avar $\sqrt{N}(\hat\theta - \theta_o)$ is

$$\Big[-N^{-1}\sum_{i=1}^N (r_{i2}\hat{H}_{i1} + \hat{H}_{i2})\Big]^{-1},$$

where the notation should be obvious. But, as we discussed in Chapters 12 and 13, this estimator need not be positive definite. Instead, we can break the problem into needed consistent estimators of $-E[r_{i2}H_{i1}(\theta_o)]$ and $-E[H_{i2}(\theta_o)]$, for which we can use iterated expectations. Since, by definition, $A_{i2}(\theta_o) \equiv -E[H_{i2}(\theta_o)|x_i]$, $N^{-1}\sum_{i=1}^N \hat{A}_{i2}$ is consistent for $-E[H_{i2}(\theta_o)]$ by the usual iterated expectations argument. Similarly, since $A_{i1}(\theta_o) \equiv -E[H_{i1}(\theta_o)|y_{i2},x_i]$, and $r_{i2}$ is a function of $(y_{i2},x_i)$, it follows that $E[r_{i2}A_{i1}(\theta_o)] = -E[r_{i2}H_{i1}(\theta_o)]$. This implies that, under general regularity conditions, $N^{-1}\sum_{i=1}^N r_{i2}\hat{A}_{i1}$ consistently estimates $-E[r_{i2}H_{i1}(\theta_o)]$. This completes what we needed to show. Interestingly, even though we do not have a true conditional maximum likelihood problem, we can still use the conditional expectations of the Hessians -- but conditioned on different sets of variables, $(y_{i2},x_i)$ in one case, and $x_i$ in the other -- to consistently estimate the asymptotic variance of the partial MLE.
e. Bonus Question: Show that if we were able to use the entire random sample, the resulting conditional MLE would be more efficient than the partial MLE based on the selected sample.

Answer: We use a basic fact about positive definite matrices: if A and B are $P \times P$ positive definite matrices, then A - B is p.s.d. if and only if $B^{-1} - A^{-1}$ is p.s.d. Now, as we showed in part d, the asymptotic variance of the partial MLE is $\{E[r_{i2}A_{i1}(\theta_o) + A_{i2}(\theta_o)]\}^{-1}$. If we could use the entire random sample for both terms, the asymptotic variance would be $\{E[A_{i1}(\theta_o) + A_{i2}(\theta_o)]\}^{-1}$. But $\{E[r_{i2}A_{i1}(\theta_o) + A_{i2}(\theta_o)]\}^{-1} - \{E[A_{i1}(\theta_o) + A_{i2}(\theta_o)]\}^{-1}$ is p.s.d. because $E[A_{i1}(\theta_o) + A_{i2}(\theta_o)] - E[r_{i2}A_{i1}(\theta_o) + A_{i2}(\theta_o)] = E[(1 - r_{i2})A_{i1}(\theta_o)]$ is p.s.d. (since $A_{i1}(\theta_o)$ is p.s.d. and $1 - r_{i2} \geq 0$).

13.9. To be added.

13.11. To be added.

CHAPTER 14

14.1. a. The simplest way to estimate (14.35) is by 2SLS, using instruments $(x_1,x_2)$. Nonlinear functions of these can be added to the instrument list -- these would generally improve efficiency if $\gamma_2 \neq 1$. If $E(u_1^2|x) = \sigma_1^2$, 2SLS using the given list of instruments is the efficient, single equation GMM estimator. Otherwise, the optimal weighting matrix that allows heteroskedasticity of unknown form should be used. Finally, one could try to use the optimal instruments derived in section 14.5.3. Even under homoskedasticity, these are difficult, if not impossible, to find analytically if $\gamma_2 \neq 1$.

b. No. If $\gamma_1 = 0$, the parameter $\gamma_2$ does not appear in the model. Of course, if we knew $\gamma_1 = 0$, we would consistently estimate $\delta_1$ by OLS.

c. We can see this by obtaining $E(y_1|x)$:

$$E(y_1|x) = x_1\delta_1 + \gamma_1E(y_2^{\gamma_2}|x) + E(u_1|x) = x_1\delta_1 + \gamma_1E(y_2^{\gamma_2}|x).$$

Now, when $\gamma_2 \neq 1$, $E(y_2^{\gamma_2}|x) \neq [E(y_2|x)]^{\gamma_2}$, so we cannot write $E(y_1|x) = x_1\delta_1 + \gamma_1(x\delta_2)^{\gamma_2}$; in fact, we cannot find $E(y_1|x)$ without more assumptions.
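While the regression $y_2$ on $x$ consistently estimates $\delta_2$, the two-step NLS estimator of $y_{i1}$ on $x_{i1}$, $(x_i\hat\delta_2)^{\gamma_2}$ will not be consistent for $\delta_1$ and $\gamma_2$. (This is an example of a "forbidden regression.") When $\gamma_2 = 1$, the plug-in method works: it is just the usual 2SLS estimator.

14.3. Let $Z_i^*$ be the $G \times G$ matrix of optimal instruments in (14.63), where we suppress its dependence on $x_i$. Let $Z_i$ be a $G \times L$ matrix that is a function of $x_i$ and let $\Xi_o$ be the probability limit of the weighting matrix. Then the asymptotic variance of the GMM estimator has the form (14.10) with $G_o = E[Z_i'R_o(x_i)]$. So, in (14.54), take $A \equiv G_o'\Xi_oG_o$ and $s(w_i) \equiv G_o'\Xi_oZ_i'r(w_i,\theta_o)$. The optimal score function is $s^*(w_i) \equiv R_o(x_i)'\Omega_o(x_i)^{-1}r(w_i,\theta_o)$. Now we can verify (14.57) with $\rho = 1$:

$$E[s(w_i)s^*(w_i)'] = G_o'\Xi_oE[Z_i'r(w_i,\theta_o)r(w_i,\theta_o)'\Omega_o(x_i)^{-1}R_o(x_i)]$$
$$= G_o'\Xi_oE[Z_i'E\{r(w_i,\theta_o)r(w_i,\theta_o)'|x_i\}\Omega_o(x_i)^{-1}R_o(x_i)]$$
$$= G_o'\Xi_oE[Z_i'\Omega_o(x_i)\Omega_o(x_i)^{-1}R_o(x_i)] = G_o'\Xi_oG_o = A.$$

14.5. We can write the unrestricted linear projection as

$$y_{it} = \pi_{t0} + x_i\pi_t + v_{it}, \quad t = 1,2,3,$$

where $\pi_t$ is $(1 + 3K) \times 1$, and then $\pi$ is the $(3 + 9K) \times 1$ vector obtained by stacking the $\pi_t$. Let $\theta = (\varphi, \lambda_1', \lambda_2', \lambda_3', \beta')'$. With the restrictions imposed on the $\pi_t$ we have

$$\pi_{t0} = \varphi, \ t = 1,2,3, \quad \pi_1 = [(\lambda_1 + \beta)', \lambda_2', \lambda_3']', \quad \pi_2 = [\lambda_1', (\lambda_2 + \beta)', \lambda_3']', \quad \pi_3 = [\lambda_1', \lambda_2', (\lambda_3 + \beta)']'.$$

Therefore, we can write $\pi = H\theta$ for the $(3 + 9K) \times (1 + 4K)$ matrix H defined by (with each intercept $\pi_{t0}$ stacked above the corresponding slopes)

$$H = \begin{pmatrix} 1 & 0 & 0 & 0 & 0 \\ 0 & I_K & 0 & 0 & I_K \\ 0 & 0 & I_K & 0 & 0 \\ 0 & 0 & 0 & I_K & 0 \\ 1 & 0 & 0 & 0 & 0 \\ 0 & I_K & 0 & 0 & 0 \\ 0 & 0 & I_K & 0 & I_K \\ 0 & 0 & 0 & I_K & 0 \\ 1 & 0 & 0 & 0 & 0 \\ 0 & I_K & 0 & 0 & 0 \\ 0 & 0 & I_K & 0 & 0 \\ 0 & 0 & 0 & I_K & I_K \end{pmatrix}.$$

14.7. With $h(\theta) = H\theta$, the minimization problem becomes

$$\min_{\theta\in\mathbb{R}^P} (\hat\pi - H\theta)'\hat\Xi^{-1}(\hat\pi - H\theta),$$

where it is assumed that no restrictions are placed on $\theta$. The first order condition is easily seen to be

$$-2H'\hat\Xi^{-1}(\hat\pi - H\hat\theta) = 0 \quad\text{or}\quad (H'\hat\Xi^{-1}H)\hat\theta = H'\hat\Xi^{-1}\hat\pi.$$

Therefore, assuming $H'\hat\Xi^{-1}H$ is nonsingular -- which occurs w.p.a.1 when $H'\Xi_o^{-1}H$ is nonsingular -- we have $\hat\theta = (H'\hat\Xi^{-1}H)^{-1}H'\hat\Xi^{-1}\hat\pi$.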
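The closed form in Problem 14.7 can be sketched in a few lines. This is a minimal pure-Python illustration, not a production routine: the helper names are invented for the example, and the numbers used for $\hat\pi$, H, and $\hat\Xi$ are hypothetical. It solves the first order condition directly rather than forming explicit inverses.

```python
def transpose(a):
    return [list(row) for row in zip(*a)]

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def solve(a, b):
    """Solve a @ x = b (b a matrix) by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [list(map(float, a[i])) + list(map(float, b[i])) for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def minimum_distance(pi_hat, H, Xi_hat):
    """theta_hat = (H' Xi^{-1} H)^{-1} H' Xi^{-1} pi_hat, for a linear restriction pi = H theta."""
    xi_inv_H = solve(Xi_hat, H)                          # Xi^{-1} H
    xi_inv_pi = solve(Xi_hat, [[p] for p in pi_hat])     # Xi^{-1} pi_hat
    lhs = matmul(transpose(H), xi_inv_H)                 # H' Xi^{-1} H
    rhs = matmul(transpose(H), xi_inv_pi)                # H' Xi^{-1} pi_hat
    return [row[0] for row in solve(lhs, rhs)]

# Hypothetical example: two unrestricted estimates of one common parameter.
theta_hat = minimum_distance([2.0, 3.0], [[1.0], [1.0]], [[1.0, 0.0], [0.0, 4.0]])
print(theta_hat)
```

In this toy case the estimator is a precision-weighted average of the two unrestricted estimates, with more weight on the one with the smaller variance.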
14.9. We have to verify equations (14.55) and (14.56) for the random effects and fixed effects estimators. The choices of $s_{i1}$, $s_{i2}$ (with added i subscripts for clarity), $A_1$, and $A_2$ are given in the hint. Now, from Chapter 10, we know that $E(r_ir_i'|x_i) = \sigma_u^2I_T$ under RE.1, RE.2, and RE.3, where $r_i = v_i - \lambda j_T\bar{v}_i$. Therefore, $E(s_{i1}s_{i1}') = E(\check{X}_i'r_ir_i'\check{X}_i) = \sigma_u^2E(\check{X}_i'\check{X}_i) \equiv \sigma_u^2A_1$ by the usual iterated expectations argument. This means that, in (14.55), $\rho \equiv \sigma_u^2$. Now, we just need to verify (14.56) for this choice of $\rho$. But $s_{i2}s_{i1}' = \ddot{X}_i'u_ir_i'\check{X}_i$. Now, as described in the hint, $\ddot{X}_i'r_i = \ddot{X}_i'(v_i - \lambda j_T\bar{v}_i) = \ddot{X}_i'v_i = \ddot{X}_i'(c_ij_T + u_i) = \ddot{X}_i'u_i$. So $s_{i2}s_{i1}' = \ddot{X}_i'r_ir_i'\check{X}_i$ and therefore $E(s_{i2}s_{i1}'|x_i) = \ddot{X}_i'E(r_ir_i'|x_i)\check{X}_i = \sigma_u^2\ddot{X}_i'\check{X}_i$. It follows that $E(s_{i2}s_{i1}') = \sigma_u^2E(\ddot{X}_i'\check{X}_i)$. To finish off the proof, note that $\ddot{X}_i'\check{X}_i = \ddot{X}_i'(X_i - \lambda j_T\bar{x}_i) = \ddot{X}_i'X_i = \ddot{X}_i'\ddot{X}_i$. This verifies (14.56) with $\rho = \sigma_u^2$.

CHAPTER 15

15.1. a. Since the regressors are all orthogonal by construction -- $dk_i\cdot dm_i = 0$ for $k \neq m$, and all i -- the coefficient on dm is obtained from the regression $y_i$ on $dm_i$, i = 1,...,N. But this is easily seen to be the fraction of $y_i$ in the sample falling into category m. Therefore, the fitted values are just the cell frequencies, and these are necessarily in [0,1].

b. The fitted values for each category will be the same. If we drop d1 but add an overall intercept, the overall intercept is the cell frequency for the first category, and the coefficient on dm becomes the difference in cell frequency between category m and category one, m = 2,...,M.
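The orthogonality argument in Problem 15.1 can be checked on a toy data set. The eight observations below are hypothetical; the function name is invented for the example. Because the dummies are mutually orthogonal, each OLS coefficient reduces to (sum of y in the cell)/(number of observations in the cell).

```python
def dummy_regression_coefs(y, category, n_categories):
    """OLS of a binary y on a full set of category dummies, no intercept.

    X'X is diagonal with the cell counts on the diagonal and X'y holds the
    cell sums of y, so each coefficient is the cell frequency of y = 1.
    """
    counts = [0] * n_categories
    sums = [0] * n_categories
    for yi, ci in zip(y, category):
        counts[ci] += 1
        sums[ci] += yi
    return [s / c for s, c in zip(sums, counts)]

# Hypothetical sample: binary outcome and a three-category indicator (0, 1, 2).
y = [1, 0, 1, 1, 0, 0, 1, 0]
category = [0, 0, 0, 1, 1, 2, 2, 2]

cell_freq = dummy_regression_coefs(y, category, 3)

# Part b: drop the first dummy and add an intercept.
intercept = cell_freq[0]
slopes = [f - cell_freq[0] for f in cell_freq[1:]]
print(cell_freq, intercept, slopes)
```

The fitted values are the same under both parameterizations, and they necessarily lie in [0,1] because they are sample proportions.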
15.3. a. If $P(y = 1|z_1,z_2) = \Phi(z_1\delta_1 + \gamma_1z_2 + \gamma_2z_2^2)$ then

$$\frac{\partial P(y = 1|z_1,z_2)}{\partial z_2} = (\gamma_1 + 2\gamma_2z_2)\cdot\phi(z_1\delta_1 + \gamma_1z_2 + \gamma_2z_2^2);$$

for given z, this is estimated as $(\hat\gamma_1 + 2\hat\gamma_2z_2)\cdot\phi(z_1\hat\delta_1 + \hat\gamma_1z_2 + \hat\gamma_2z_2^2)$, where, of course, the estimates are the probit estimates.

b. In the model $P(y = 1|z_1,z_2,d_1) = \Phi(z_1\delta_1 + \gamma_1z_2 + \gamma_2d_1 + \gamma_3z_2d_1)$, the partial effect of $z_2$ is

$$\frac{\partial P(y = 1|z_1,z_2,d_1)}{\partial z_2} = (\gamma_1 + \gamma_3d_1)\cdot\phi(z_1\delta_1 + \gamma_1z_2 + \gamma_2d_1 + \gamma_3z_2d_1).$$

The effect of $d_1$ is measured as the difference in the probabilities at $d_1 = 1$ and $d_1 = 0$:

$$P(y = 1|z,d_1 = 1) - P(y = 1|z,d_1 = 0) = \Phi[z_1\delta_1 + (\gamma_1 + \gamma_3)z_2 + \gamma_2] - \Phi(z_1\delta_1 + \gamma_1z_2).$$

Again, to estimate these effects at given z and -- in the first case, $d_1$ -- we just replace the parameters with their probit estimates, and use average or other interesting values of z.

c. We would apply the delta method from Chapter 3. Thus, we would require the full variance matrix of the probit estimates as well as the gradient of the expression of interest, such as $(\gamma_1 + 2\gamma_2z_2)\cdot\phi(z_1\delta_1 + \gamma_1z_2 + \gamma_2z_2^2)$, with respect to all probit parameters. (Not with respect to the $z_j$.)
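The two kinds of effects in Problem 15.3 can be written as small functions. This is a sketch under stated assumptions: the argument `index_z1` stands for the scalar $z_1\delta_1$, the function names are invented for the example, and the numerical values are hypothetical rather than probit estimates from any data set.

```python
import math

def norm_pdf(v):
    return math.exp(-0.5 * v * v) / math.sqrt(2.0 * math.pi)

def norm_cdf(v):
    return 0.5 * (1.0 + math.erf(v / math.sqrt(2.0)))

def partial_effect_quadratic(index_z1, g1, g2, z2):
    """Part a: (g1 + 2*g2*z2) * phi(z1*d1 + g1*z2 + g2*z2^2)."""
    return (g1 + 2.0 * g2 * z2) * norm_pdf(index_z1 + g1 * z2 + g2 * z2 ** 2)

def effect_of_dummy(index_z1, g1, g2, g3, z2):
    """Part b: Phi[z1*d1 + (g1 + g3)*z2 + g2] - Phi(z1*d1 + g1*z2)."""
    return norm_cdf(index_z1 + (g1 + g3) * z2 + g2) - norm_cdf(index_z1 + g1 * z2)

# Hypothetical parameter values and evaluation point.
index_z1, g1, g2, g3, z2 = 0.3, 0.5, -0.1, 0.2, 1.5
print(partial_effect_quadratic(index_z1, g1, g2, z2))
print(effect_of_dummy(index_z1, g1, g2, g3, z2))
```

For the dummy variable the effect is a difference in probabilities rather than a derivative, which is why the two functions have different forms.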
15.5. a. If $P(y = 1|z,q) = \Phi(z_1\delta_1 + \gamma_1z_2q)$ then

$$\frac{\partial P(y = 1|z,q)}{\partial z_2} = \gamma_1q\cdot\phi(z_1\delta_1 + \gamma_1z_2q),$$

assuming that $z_2$ is not functionally related to $z_1$.

b. Write $y^* = z_1\delta_1 + r$, where $r = \gamma_1z_2q + e$, and e is independent of (z,q) with a standard normal distribution. Because q is assumed independent of z, q|z ~ Normal(0,1); it follows that $E(r|z) = \gamma_1z_2E(q|z) + E(e|z) = 0$. Also,

$$\text{Var}(r|z) = \gamma_1^2z_2^2\text{Var}(q|z) + \text{Var}(e|z) + 2\gamma_1z_2\text{Cov}(q,e|z) = \gamma_1^2z_2^2 + 1$$

because Cov(q,e|z) = 0 by independence between e and (z,q). Thus, $r/\sqrt{\gamma_1^2z_2^2 + 1}$ has a standard normal distribution independent of z. It follows that

$$P(y = 1|z) = \Phi\Big(z_1\delta_1/\sqrt{\gamma_1^2z_2^2 + 1}\Big). \qquad (15.90)$$

c. Because P(y = 1|z) depends only on $\gamma_1^2$, this is what we can estimate along with $\delta_1$. (For example, $\gamma_1 = -2$ and $\gamma_1 = 2$ give exactly the same model for P(y = 1|z).) This is why we define $\rho_1 = \gamma_1^2$. Testing H0: $\rho_1 = 0$ is most easily done using the score or LM test because, under H0, we have a standard probit model. Let $\hat\delta_1$ denote the probit estimates under the null that $\rho_1 = 0$. Define $\hat\Phi_i = \Phi(z_{i1}\hat\delta_1)$, $\hat\phi_i = \phi(z_{i1}\hat\delta_1)$, $\hat{u}_i = y_i - \hat\Phi_i$, and $\tilde{u}_i \equiv \hat{u}_i/\sqrt{\hat\Phi_i(1 - \hat\Phi_i)}$ (the standardized residuals). The gradient of the mean function in (15.90) with respect to $\delta_1$, evaluated under the null estimates, is simply $\hat\phi_iz_{i1}$. The only other quantity needed is the gradient with respect to $\rho_1$ evaluated at the null estimates. But the partial derivative of (15.90) with respect to $\rho_1$ is, for each i,

$$-(z_{i1}\delta_1)(z_{i2}^2/2)(\rho_1z_{i2}^2 + 1)^{-3/2}\phi\Big(z_{i1}\delta_1/\sqrt{\rho_1z_{i2}^2 + 1}\Big).$$

When we evaluate this at $\rho_1 = 0$ and $\hat\delta_1$ we get $-(z_{i1}\hat\delta_1)(z_{i2}^2/2)\hat\phi_i$. Then, the score statistic can be obtained as $NR_u^2$ from the regression

$$\tilde{u}_i \ \text{ on } \ \hat\phi_iz_{i1}/\sqrt{\hat\Phi_i(1 - \hat\Phi_i)}, \quad (z_{i1}\hat\delta_1)z_{i2}^2\hat\phi_i/\sqrt{\hat\Phi_i(1 - \hat\Phi_i)};$$

under H0, $NR_u^2 \overset{a}{\sim} \chi_1^2$.

d. The model can be estimated by MLE using the formulation with $\rho_1$ in place of $\gamma_1^2$. But this is not a standard probit estimation.
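The derivative of (15.90) with respect to $\rho_1$ used in Problem 15.5, part c, can be checked against a finite difference. In this sketch `a` stands for the scalar index $z_{i1}\delta_1$; the function names and the numerical values are hypothetical and serve only to verify the algebra.

```python
import math

def norm_pdf(v):
    return math.exp(-0.5 * v * v) / math.sqrt(2.0 * math.pi)

def norm_cdf(v):
    return 0.5 * (1.0 + math.erf(v / math.sqrt(2.0)))

def response_prob(a, z2, rho1):
    """P(y = 1|z) = Phi(a / sqrt(rho1*z2^2 + 1)), with a = z1*delta1."""
    return norm_cdf(a / math.sqrt(rho1 * z2 ** 2 + 1.0))

def d_response_d_rho1(a, z2, rho1):
    """Analytical derivative: -a*(z2^2/2)*(rho1*z2^2 + 1)^(-3/2)*phi(a/sqrt(rho1*z2^2 + 1))."""
    scale = rho1 * z2 ** 2 + 1.0
    return -a * (z2 ** 2 / 2.0) * scale ** -1.5 * norm_pdf(a / math.sqrt(scale))

# Hypothetical index value and regressor value.
a, z2 = 0.7, 1.3
h = 1e-6
numeric_at_null = (response_prob(a, z2, h) - response_prob(a, z2, -h)) / (2 * h)
print(numeric_at_null, d_response_d_rho1(a, z2, 0.0))
```

At $\rho_1 = 0$ the analytical expression collapses to $-a(z_2^2/2)\phi(a)$, which is the extra regressor (up to sign and the weighting) in the LM regression.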
15.7. a. The following Stata output is for part a:

. reg arr86 pcnv avgsen tottime ptime86 inc86 black hispan born60

      Source |       SS       df       MS              Number of obs =    2725
-------------+------------------------------           F(  8,  2716) =   30.48
       Model |  44.9720916     8  5.62151145           Prob > F      =  0.0000
    Residual |  500.844422  2716  .184405163           R-squared     =  0.0824
-------------+------------------------------           Adj R-squared =  0.0797
       Total |  545.816514  2724   .20037317           Root MSE      =  .42942

------------------------------------------------------------------------------
       arr86 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        pcnv |  -.1543802   .0209336    -7.37   0.000    -.1954275   -.1133329
      avgsen |   .0035024   .0063417     0.55   0.581    -.0089326    .0159374
     tottime |  -.0020613   .0048884    -0.42   0.673    -.0116466     .007524
     ptime86 |  -.0215953   .0044679    -4.83   0.000    -.0303561   -.0128344
       inc86 |  -.0012248    .000127    -9.65   0.000    -.0014738   -.0009759
       black |   .1617183   .0235044     6.88   0.000     .1156299    .2078066
      hispan |   .0892586   .0205592     4.34   0.000     .0489454    .1295718
      born60 |   .0028698   .0171986     0.17   0.867    -.0308539    .0365936
       _cons |   .3609831   .0160927    22.43   0.000      .329428    .3925382
------------------------------------------------------------------------------
. reg arr86 pcnv avgsen tottime ptime86 inc86 black hispan born60, robust

Regression with robust standard errors                 Number of obs =    2725
                                                       F(  8,  2716) =   37.59
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.0824
                                                       Root MSE      =  .42942

------------------------------------------------------------------------------
             |               Robust
       arr86 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        pcnv |  -.1543802    .018964    -8.14   0.000    -.1915656   -.1171948
      avgsen |   .0035024   .0058876     0.59   0.552    -.0080423    .0150471
     tottime |  -.0020613   .0042256    -0.49   0.626     -.010347    .0062244
     ptime86 |  -.0215953   .0027532    -7.84   0.000    -.0269938   -.0161967
       inc86 |  -.0012248   .0001141   -10.73   0.000    -.0014487    -.001001
       black |   .1617183   .0255279     6.33   0.000     .1116622    .2117743
      hispan |   .0892586   .0210689     4.24   0.000     .0479459    .1305714
      born60 |   .0028698   .0171596     0.17   0.867    -.0307774     .036517
       _cons |   .3609831   .0167081    21.61   0.000     .3282214    .3937449
------------------------------------------------------------------------------

The estimated effect from increasing pcnv from .25 to .75 is about -.154(.5) = -.077, so the probability of arrest falls by about 7.7 points. There are no important differences between the usual and robust standard errors. In fact, in a couple of cases the robust standard errors are notably smaller.

b. The robust statistic and its p-value are gotten by using the "test" command after appending "robust" to the regression command:

. test avgsen tottime

 ( 1)  avgsen = 0.0
 ( 2)  tottime = 0.0

       F(  2,  2716) =    0.18
            Prob > F =    0.8320

. qui reg arr86 pcnv avgsen tottime ptime86 inc86 black hispan born60

. test avgsen tottime

 ( 1)  avgsen = 0.0
 ( 2)  tottime = 0.0

       F(  2,  2716) =    0.18
            Prob > F =    0.8360
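The arithmetic behind the linear probability model results in Problem 15.7, part a, is easy to reproduce. The sketch below is a cross-check in Python rather than Stata; the coefficient and standard errors are copied from the two regression tables for part a, and the variable names are made up for the example.

```python
# Coefficient on pcnv and its usual and robust standard errors, from the part a output.
coef_pcnv = -0.1543802
se_usual = 0.0209336
se_robust = 0.018964

# In the LPM the effect of moving pcnv from .25 to .75 is the coefficient times .5.
effect = coef_pcnv * (0.75 - 0.25)

# t statistics under the two sets of standard errors.
t_usual = coef_pcnv / se_usual
t_robust = coef_pcnv / se_robust

print(round(effect, 3), round(t_usual, 2), round(t_robust, 2))
```

Because the model is linear in pcnv, this effect does not depend on the starting value of pcnv or on the other regressors, unlike the probit calculation in part c.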
c. The probit model is estimated as follows:

. probit arr86 pcnv avgsen tottime ptime86 inc86 black hispan born60

Iteration 0:   log likelihood = -1608.1837
Iteration 1:   log likelihood = -1486.3157
Iteration 2:   log likelihood = -1483.6458
Iteration 3:   log likelihood = -1483.6406

Probit estimates                                  Number of obs   =       2725
                                                  LR chi2(8)      =     249.09
                                                  Prob > chi2     =     0.0000
Log likelihood = -1483.6406                       Pseudo R2       =     0.0774

------------------------------------------------------------------------------
       arr86 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        pcnv |  -.5529248   .0720778    -7.67   0.000    -.6941947   -.4116549
      avgsen |   .0127395   .0212318     0.60   0.548     -.028874    .0543531
     tottime |  -.0076486   .0168844    -0.45   0.651    -.0407414    .0254442
     ptime86 |  -.0812017    .017963    -4.52   0.000    -.1164085   -.0459949
       inc86 |  -.0046346   .0004777    -9.70   0.000    -.0055709   -.0036983
       black |   .4666076   .0719687     6.48   0.000     .3255516    .6076635
      hispan |   .2911005   .0654027     4.45   0.000     .1629135    .4192875
      born60 |   .0112074   .0556843     0.20   0.840    -.0979318    .1203466
       _cons |  -.3138331   .0512999    -6.12   0.000    -.4143791    -.213287
------------------------------------------------------------------------------

Now, we must compute the difference in the normal cdf at the two different values of pcnv, black = 1, hispan = 0, born60 = 1, and at the average values of the remaining variables:

. sum avgsen tottime ptime86 inc86

Variable |     Obs        Mean   Std. Dev.       Min        Max
---------+-----------------------------------------------------
  avgsen |    2725    .6322936   3.508031          0       59.2
 tottime |    2725    .8387523   4.607019          0       63.4
 ptime86 |    2725     .387156   1.950051          0         12
   inc86 |    2725    54.96705   66.62721          0        541

. di -.313 + .0127*.632 - .0076*.839 - .0812*.387 - .0046*54.97 + .467 + .0112
-.1174364

. di normprob(-.553*.75 - .117) - normprob(-.553*.25 - .117)
-.10181543

This last command shows that the probability falls by about .10, which is somewhat larger than the effect obtained from the LPM.

d. To obtain the percent correctly predicted for each outcome, we first generate the predicted values of arr86 as described on page 465:

. predict phat
(option p assumed; Pr(arr86))

. gen arr86h = phat > .5

. tab arr86h arr86

           |        arr86
    arr86h |         0          1 |     Total
-----------+----------------------+----------
         0 |      1903        677 |      2580
         1 |        67         78 |       145
-----------+----------------------+----------
     Total |      1970        755 |      2725

. di 1903/1970
.96598985

. di 78/755
.10331126

For men who were not arrested, the probit predicts correctly about 96.6% of the time. Unfortunately, for the men who were arrested, the probit is correct only about 10.3% of the time. The overall percent correctly predicted is quite high, but we cannot very well predict the outcome we would most like to predict.
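The `di` calculations in parts c and d of Problem 15.7 can be reproduced outside Stata. The sketch below uses Python's `math.erf` for the standard normal cdf in place of Stata's `normprob`; the coefficients, sample means, and cell counts are the rounded values that appear in the commands and table above, and the variable names are invented for the example.

```python
import math

def norm_cdf(v):
    return 0.5 * (1.0 + math.erf(v / math.sqrt(2.0)))

# Part c: index contribution of everything except pcnv (black = 1, hispan = 0,
# born60 = 1, other regressors at their sample means), as in the first di command.
index_other = (-0.313 + 0.0127 * 0.632 - 0.0076 * 0.839
               - 0.0812 * 0.387 - 0.0046 * 54.97 + 0.467 + 0.0112)

# The text rounds this index to -.117 before evaluating the normal cdf.
rounded_index = round(index_other, 3)
prob_change = (norm_cdf(-0.553 * 0.75 + rounded_index)
               - norm_cdf(-0.553 * 0.25 + rounded_index))

# Part d: percent correctly predicted, from the cross-tabulation.
correct_not_arrested = 1903 / 1970
correct_arrested = 78 / 755
correct_overall = (1903 + 78) / 2725

print(index_other, prob_change)
print(correct_not_arrested, correct_arrested, correct_overall)
```

The overall rate of roughly 73% is driven almost entirely by the not-arrested group, which is why the text cautions against reading too much into it.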
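e. Adding the quadratic terms gives

. probit arr86 pcnv avgsen tottime ptime86 inc86 black hispan born60 pcnvsq pt86sq inc86sq

Iteration 0:   log likelihood = -1608.1837
Iteration 1:   log likelihood = -1452.2089
Iteration 2:   log likelihood = -1444.3151
Iteration 3:   log likelihood = -1441.8535
Iteration 4:   log likelihood =  -1440.268
Iteration 5:   log likelihood = -1439.8166
Iteration 6:   log likelihood = -1439.8005
Iteration 7:   log likelihood = -1439.8005

Probit estimates                                  Number of obs   =       2725
                                                  LR chi2(11)     =     336.77
                                                  Prob > chi2     =     0.0000
Log likelihood = -1439.8005                       Pseudo R2       =     0.1047

------------------------------------------------------------------------------
       arr86 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        pcnv |   .2167615   .2604937     0.83   0.405    -.2937968    .7273198
      avgsen |   .0139969   .0244972     0.57   0.568    -.0340166    .0620105
     tottime |  -.0178158   .0199703    -0.89   0.372     -.056957    .0213253
     ptime86 |   .7449712   .1438485     5.18   0.000     .4630333    1.026909
       inc86 |  -.0058786   .0009851    -5.97   0.000    -.0078094   -.0039478
       black |   .4368131   .0733798     5.95   0.000     .2929913     .580635
      hispan |   .2663945    .067082     3.97   0.000     .1349163    .3978727
      born60 |  -.0145223   .0566913    -0.26   0.798    -.1256351    .0965905
      pcnvsq |  -.8570512   .2714575    -3.16   0.002    -1.389098   -.3250042
      pt86sq |  -.1035031   .0224234    -4.62   0.000    -.1474522    -.059554
     inc86sq |   8.75e-06   4.28e-06     2.04   0.041     3.63e-07    .0000171
       _cons |   -.337362   .0562665    -6.00   0.000    -.4476423   -.2270817
------------------------------------------------------------------------------
note: 51 failures and 0 successes completely determined.

. test pcnvsq pt86sq inc86sq

 ( 1)  pcnvsq = 0.0
 ( 2)  pt86sq = 0.0
 ( 3)  inc86sq = 0.0

           chi2(  3) =   38.54
         Prob > chi2 =    0.0000

The quadratics are individually and jointly significant. The quadratic in pcnv means that, at low levels of pcnv, there is actually a positive relationship between probability of arrest and pcnv, which does not make much sense. The turning point is easily found as .217/(2*.857) ≈ .127, which means that there is an estimated deterrent effect over most of the range of pcnv.

15.9. a. Let P(y = 1|x) = $x\beta$, where $x_1 = 1$. Then, for each i,

$$l_i(\beta) = y_i\log(x_i\beta) + (1 - y_i)\log(1 - x_i\beta),$$

which is only well-defined for $0 < x_i\beta < 1$.

b. For any possible estimate $\hat\beta$, the log-likelihood function is well-defined only if $0 < x_i\hat\beta < 1$ for all i = 1,...,N. Therefore, during the iterations to obtain the MLE, this condition must be checked. It may be impossible to find an estimate that satisfies these inequalities for every observation, especially if N is large.

c. This follows from the KLIC: the true density of y given x -- evaluated at the true values, of course -- maximizes the KLIC. Since the MLEs are consistent for the unknown parameters, asymptotically the true density will produce the highest average log likelihood function. So, just as we can use an R-squared to choose among different functional forms for E(y|x), we can use values of the log-likelihood to choose among different models for P(y = 1|x) when y is binary.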
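The domain restriction in Problem 15.9, parts a and b, can be made concrete with a small function. This is a sketch only: the function name, the three-observation data set, and the trial parameter vectors are hypothetical, and returning `None` for an undefined log likelihood is a design choice made for the example.

```python
import math

def lpm_log_likelihood(beta, X, y):
    """Sum of y_i*log(x_i*beta) + (1 - y_i)*log(1 - x_i*beta).

    Returns None when some fitted probability x_i*beta falls outside (0, 1),
    because the log likelihood is not defined at such a parameter value.
    """
    total = 0.0
    for xi, yi in zip(X, y):
        p = sum(b * x for b, x in zip(beta, xi))
        if not 0.0 < p < 1.0:
            return None
        total += yi * math.log(p) + (1 - yi) * math.log(1.0 - p)
    return total

# Hypothetical data: an intercept (x1 = 1) and one regressor.
X = [[1.0, 0.2], [1.0, 0.8], [1.0, 1.5]]
y = [0, 1, 1]

inside = lpm_log_likelihood([0.1, 0.5], X, y)    # fitted probabilities .2, .5, .85
outside = lpm_log_likelihood([0.1, 0.7], X, y)   # third fitted probability is 1.15
print(inside, outside)
```

An iterative maximizer would have to run a check like this at every trial value of the parameters, which is the practical difficulty part b points to.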
15.11. a. We really need to make two assumptions. The first is a conditional independence assumption: given $x_i = (x_{i1},...,x_{iT})$, $(y_{i1},...,y_{iT})$ are independent. This allows us to write

$$f(y_1,...,y_T|x_i) = f_1(y_1|x_i)\cdots f_T(y_T|x_i),$$

that is, the joint density (conditional on $x_i$) is the product of the marginal densities (each conditional on $x_i$). The second assumption is a strict exogeneity assumption: $D(y_{it}|x_i) = D(y_{it}|x_{it})$, t = 1,...,T. When we add the standard assumption for pooled probit -- that $D(y_{it}|x_{it})$ follows a probit model -- then

$$f(y_1,...,y_T|x_i) = \prod_{t=1}^T [G(x_{it}\beta)]^{y_t}[1 - G(x_{it}\beta)]^{1-y_t},$$

and so pooled probit is conditional MLE.

15.13. a. If there are no covariates, there is no point in using any method other than a straight comparison of means. The estimated probabilities for the treatment and control groups, both before and after the policy change, will be identical across models.

b. Let d2 be a binary indicator for the second time period, and let dB be an indicator for the treatment group. Then a probit model to evaluate the treatment effect is

$$P(y = 1|x) = \Phi(\delta_0 + \delta_1d2 + \delta_2dB + \delta_3d2\cdot dB + x\gamma),$$

where x is a vector of covariates. We would estimate all parameters from a probit of y on 1, d2, dB, d2·dB, and x using all observations. Once we have the estimates, we need to compute the "difference-in-differences" estimate, which requires either plugging in a value for x, say $\bar{x}$, or averaging the differences across $x_i$. In the former case, we have

$$\hat\theta \equiv [\Phi(\hat\delta_0 + \hat\delta_1 + \hat\delta_2 + \hat\delta_3 + \bar{x}\hat\gamma) - \Phi(\hat\delta_0 + \hat\delta_2 + \bar{x}\hat\gamma)] - [\Phi(\hat\delta_0 + \hat\delta_1 + \bar{x}\hat\gamma) - \Phi(\hat\delta_0 + \bar{x}\hat\gamma)],$$

and in the latter we have

$$\tilde\theta \equiv N^{-1}\sum_{i=1}^N \{[\Phi(\hat\delta_0 + \hat\delta_1 + \hat\delta_2 + \hat\delta_3 + x_i\hat\gamma) - \Phi(\hat\delta_0 + \hat\delta_2 + x_i\hat\gamma)] - [\Phi(\hat\delta_0 + \hat\delta_1 + x_i\hat\gamma) - \Phi(\hat\delta_0 + x_i\hat\gamma)]\}.$$
15.15. We should use an interval regression model; equivalently, ordered probit with known cut points. We would be assuming that the underlying GPA is normally distributed conditional on x, but we only observe interval coded data. (Clearly a conditional normal distribution for the GPAs is at best an approximation.) Along with the $\beta_j$, including an intercept, we estimate $\sigma^2$. The estimated coefficients are interpreted as if we had done a linear regression with actual GPAs.

15.17. a. We obtain the joint density by the product rule, since we have independence conditional on (x,c):

$$f(y_1,\ldots,y_G|x,c;\gamma_o) = f_1(y_1|x,c;\gamma_o^1)\,f_2(y_2|x,c;\gamma_o^2)\cdots f_G(y_G|x,c;\gamma_o^G).$$

b. The density of $(y_1,\ldots,y_G)$ given x is obtained by integrating out with respect to the distribution of c given x:

$$g(y_1,\ldots,y_G|x;\gamma_o,\delta_o) = \int_{-\infty}^{\infty} \left[\prod_{g=1}^{G} f_g(y_g|x,c;\gamma_o^g)\right] h(c|x;\delta_o)\,dc,$$

where c is a dummy argument of integration. Because c appears in each $D(y_g|x,c)$, $y_1,\ldots,y_G$ are dependent without conditioning on c.

c. The log likelihood for each i is

$$\log\left\{\int_{-\infty}^{\infty} \left[\prod_{g=1}^{G} f_g(y_{ig}|x_i,c;\gamma^g)\right] h(c|x_i;\delta)\,dc\right\}.$$

As expected, this depends only on the observed data, $(x_i,y_{i1},\ldots,y_{iG})$, and the unknown parameters.

15.19. To be added.
CHAPTER 16

16.1. a. $P[\log(t_i) = \log(c)|x_i] = P[\log(t_i^*) > \log(c)|x_i] = P[u_i > \log(c) - x_i\beta|x_i] = 1 - \Phi\{[\log(c) - x_i\beta]/\sigma\}$. As $c \to \infty$, $\Phi\{[\log(c) - x_i\beta]/\sigma\} \to 1$, and so $P[\log(t_i) = \log(c)|x_i] \to 0$ as $c \to \infty$. This simply says that, the longer we wait to censor, the less likely it is that we observe a censored observation.

b. The density of $y_i \equiv \log(t_i)$ (given $x_i$) when $t_i < c$ is the same as the density of $y_i^* \equiv \log(t_i^*)$, which is just Normal$(x_i\beta,\sigma^2)$. This is because, for $y < \log(c)$, $P(y_i \le y|x_i) = P(y_i^* \le y|x_i)$. Thus, the density for $y_i = \log(t_i)$ is

$$f(y|x_i) = \begin{cases} 1 - \Phi\{[\log(c) - x_i\beta]/\sigma\}, & y = \log(c) \\ (1/\sigma)\,\phi[(y - x_i\beta)/\sigma], & y < \log(c). \end{cases}$$

c. $\ell_i(\beta,\sigma^2) = 1[y_i = \log(c)]\cdot\log(1 - \Phi\{[\log(c) - x_i\beta]/\sigma\}) + 1[y_i < \log(c)]\cdot\log\{\sigma^{-1}\phi[(y_i - x_i\beta)/\sigma]\}$.

d. To test $H_0$: $\beta_2 = 0$, I would probably use the likelihood ratio statistic. This requires estimating the model with all variables, and then the model without $x_2$. The LR statistic is $LR = 2(L_{ur} - L_r)$. Under $H_0$, LR is distributed asymptotically as $\chi^2_{K_2}$.

e. Since $u_i$ is independent of $(x_i,c_i)$, the density of $y_i$ given $(x_i,c_i)$ has the same form as the density of $y_i$ given $x_i$ above, except that $c_i$ replaces c. The assumption that $u_i$ is independent of $c_i$ means that the decision to censor an individual (or other economic unit) is not related to unobservables affecting $t_i^*$. Thus, in something like an unemployment duration equation, where $u_i$ might contain unobserved ability, we do not wait longer to censor people of lower ability. Note that $c_i$ can be related to $x_i$. Thus, if $x_i$ contains something like education, which is treated as exogenous, then the censoring time can depend on education.

16.3. a. $P(y_i = a_1|x_i) = P(y_i^* \le a_1|x_i) = P[(u_i/\sigma) \le (a_1 - x_i\beta)/\sigma] = \Phi[(a_1 - x_i\beta)/\sigma]$. Similarly,

$$P(y_i = a_2|x_i) = P(y_i^* \ge a_2|x_i) = P(x_i\beta + u_i \ge a_2|x_i) = P[(u_i/\sigma) \ge (a_2 - x_i\beta)/\sigma] = 1 - \Phi[(a_2 - x_i\beta)/\sigma] = \Phi[-(a_2 - x_i\beta)/\sigma].$$

Next, for $a_1 < y < a_2$, $P(y_i \le y|x_i) = P(y_i^* \le y|x_i) = \Phi[(y - x_i\beta)/\sigma]$. Taking the derivative of this cdf with respect to y gives the pdf of $y_i$ conditional on $x_i$ for values of y strictly between $a_1$ and $a_2$: $(1/\sigma)\phi[(y - x_i\beta)/\sigma]$.

b. Since $y = y^*$ when $a_1 < y^* < a_2$, $E(y|x,a_1 < y < a_2) = E(y^*|x,a_1 < y^* < a_2)$. But $y^* = x\beta + u$, and $a_1 < y^* < a_2$ if and only if $a_1 - x\beta < u < a_2 - x\beta$. Therefore, using the hint,

$$E(y^*|x,a_1 < y^* < a_2) = x\beta + E(u|x,a_1 - x\beta < u < a_2 - x\beta)$$
$$= x\beta + \sigma E[(u/\sigma)|x,(a_1 - x\beta)/\sigma < u/\sigma < (a_2 - x\beta)/\sigma]$$
$$= x\beta + \sigma\{\phi[(a_1 - x\beta)/\sigma] - \phi[(a_2 - x\beta)/\sigma]\}/\{\Phi[(a_2 - x\beta)/\sigma] - \Phi[(a_1 - x\beta)/\sigma]\}$$
$$= E(y|x,a_1 < y < a_2).$$

Now, we can easily get $E(y|x)$ by using the following:

$$E(y|x) = a_1 P(y = a_1|x) + E(y|x,a_1 < y < a_2)\cdot P(a_1 < y < a_2|x) + a_2 P(y = a_2|x)$$
$$= a_1\Phi[(a_1 - x\beta)/\sigma] + E(y|x,a_1 < y < a_2)\cdot\{\Phi[(a_2 - x\beta)/\sigma] - \Phi[(a_1 - x\beta)/\sigma]\} + a_2\Phi[(x\beta - a_2)/\sigma]$$
$$= a_1\Phi[(a_1 - x\beta)/\sigma] + (x\beta)\cdot\{\Phi[(a_2 - x\beta)/\sigma] - \Phi[(a_1 - x\beta)/\sigma]\} + \sigma\{\phi[(a_1 - x\beta)/\sigma] - \phi[(a_2 - x\beta)/\sigma]\} + a_2\Phi[(x\beta - a_2)/\sigma]. \qquad (16.57)$$

c. From part b it is clear that $E(y^*|x,a_1 < y^* < a_2) \ne x\beta$, and so it would be a fluke if OLS on the restricted sample consistently estimated $\beta$. The linear regression of $y_i$ on $x_i$ using only those $y_i$ such that $a_1 < y_i < a_2$ consistently estimates the linear projection of $y^*$ on x in the subpopulation for which $a_1 < y^* < a_2$. Generally, there is no reason to think that this will have any simple relationship to the parameter vector $\beta$. [In some restrictive cases, the regression on the restricted subsample could consistently estimate $\beta$ up to a common scale coefficient.]

d. We get the log-likelihood immediately from part a:

$$\ell_i(\theta) = 1[y_i = a_1]\log\{\Phi[(a_1 - x_i\beta)/\sigma]\} + 1[y_i = a_2]\log\{\Phi[(x_i\beta - a_2)/\sigma]\} + 1[a_1 < y_i < a_2]\log\{(1/\sigma)\phi[(y_i - x_i\beta)/\sigma]\}.$$

Note how the indicator function selects out the appropriate density for each of the three possible cases: at the left endpoint, at the right endpoint, or strictly between the endpoints.

e. After obtaining the maximum likelihood estimates $\hat\beta$ and $\hat\sigma^2$, just plug these into the formulas in part b. The expressions can be evaluated at interesting values of x.

f. We can show this by brute-force differentiation of equation (16.57). As a shorthand, write $\phi_1 \equiv \phi[(a_1 - x\beta)/\sigma]$, $\phi_2 \equiv \phi[(a_2 - x\beta)/\sigma] = \phi[(x\beta - a_2)/\sigma]$, $\Phi_1 \equiv \Phi[(a_1 - x\beta)/\sigma]$, and $\Phi_2 \equiv \Phi[(a_2 - x\beta)/\sigma]$. Then

$$\frac{\partial E(y|x)}{\partial x_j} = -(a_1/\sigma)\phi_1\beta_j + (a_2/\sigma)\phi_2\beta_j$$
$$+ (\Phi_2 - \Phi_1)\beta_j + [(x\beta/\sigma)(\phi_1 - \phi_2)]\beta_j$$
$$+ \{[(a_1 - x\beta)/\sigma]\phi_1\}\beta_j - \{[(a_2 - x\beta)/\sigma]\phi_2\}\beta_j,$$

where the first two parts are the derivatives of the first and third terms, respectively, in (16.57), and the last two lines are obtained from differentiating the second term in $E(y|x)$. Careful inspection shows that all terms cancel except $(\Phi_2 - \Phi_1)\beta_j$, which is the expression we wanted to be left with. The scale factor is simply the probability that a standard normal random variable falls in the interval $[(a_1 - x\beta)/\sigma,(a_2 - x\beta)/\sigma]$, which is necessarily between zero and one.

g. The partial effects on $E(y|x)$ are given in part f. These are estimated as

$$\{\Phi[(a_2 - x\hat\beta)/\hat\sigma] - \Phi[(a_1 - x\hat\beta)/\hat\sigma]\}\hat\beta_j, \qquad (16.58)$$

where the estimates are the MLEs. We could evaluate these partial effects at, say, $\bar{x}$. Or, we could average $\{\Phi[(a_2 - x_i\hat\beta)/\hat\sigma] - \Phi[(a_1 - x_i\hat\beta)/\hat\sigma]\}$ across all i to obtain the average partial effect. In either case, the scaled $\hat\beta_j$ can be compared to the $\hat\gamma_j$. Generally, we expect $\hat\gamma_j \approx \hat\rho\cdot\hat\beta_j$, where $0 < \hat\rho < 1$ is the scale factor. Of course, this approximation need not be very good in a particular application, but it is often roughly true. It does not make sense to directly compare the magnitude of $\hat\beta_j$ with that of $\hat\gamma_j$. By the way, note that $\hat\sigma$ appears in the partial effects along with the $\hat\beta_j$; there is no sense in which $\hat\sigma$ is "ancillary."

h. For data censoring where the censoring points might change with i, the analysis is essentially the same but $a_1$ and $a_2$ are replaced with $a_{i1}$ and $a_{i2}$. Interpreting the results is even easier, since we act as if we were able to do OLS on an uncensored sample.
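As a rough numerical check on Problem 16.3(f), the Python sketch below codes the conditional mean in (16.57) as a function of the index $x\beta$ and compares a finite-difference derivative with the scale factor $\Phi_2 - \Phi_1$. The parameter values are hypothetical and chosen only for illustration.

```python
import math

def norm_cdf(z):
    """Standard normal cdf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def norm_pdf(z):
    """Standard normal pdf."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def two_limit_mean(xb, sigma, a1, a2):
    """E(y|x) for the two-limit Tobit model, equation (16.57), at index xb."""
    z1 = (a1 - xb) / sigma
    z2 = (a2 - xb) / sigma
    return (a1 * norm_cdf(z1)
            + xb * (norm_cdf(z2) - norm_cdf(z1))
            + sigma * (norm_pdf(z1) - norm_pdf(z2))
            + a2 * norm_cdf(-z2))

def scale_factor(xb, sigma, a1, a2):
    """Phi_2 - Phi_1, the factor multiplying beta_j in the partial effect."""
    return norm_cdf((a2 - xb) / sigma) - norm_cdf((a1 - xb) / sigma)

# Hypothetical values: index xb = 4, sigma = 3, limits 0 and 10.
xb, sigma, a1, a2 = 4.0, 3.0, 0.0, 10.0
h = 1e-5
numeric = (two_limit_mean(xb + h, sigma, a1, a2)
           - two_limit_mean(xb - h, sigma, a1, a2)) / (2.0 * h)
analytic = scale_factor(xb, sigma, a1, a2)
print(numeric, analytic)
```

The derivative is taken with respect to the index $x\beta$, so multiplying by $\beta_j$ gives the partial effect with respect to $x_j$.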
t P>|t| [95% Conf.0022495 .0477021 .0986582 tenure | .265 0.762 0.871 0.0132201 age | -.0046627 0. tobit hrbens exper age educ tenure married male white nrtheast nrthcen south union.1772515 -3.1473341 .0492621 .0673714 -0.1825451 .251898 .66616  = 616 = 283.258 -.2084814 male | .688 0.0040631 .005544 .812 0.3547274 white | .2556765 .1038129 union | .2282836 .000 -1.4748429 _cons | -.186 -.0041162 -0.3518203 -----------------------------------------------------------------------------b. t P>|t| [95% Conf.0499022 7.2455481 nrtheast | -.057 -.0869168 .0614223 nrthcen | -.084021 south | -.0029666 .000 .0510187 1.560 -.0058343 educ | .0037237 7.0044362 -0.0103333 .364019 white | .0360227 married | .16.0551672 4.492 -.811 0.0061263 educ | . a.0083783 9.949 0.021225 .468 -.3768401 .000 . Interval] ---------+-------------------------------------------------------------------exper | .909 0.010294 .0029862 .0737578 -1.552 0.0281931 .972074 615 .0696015 .2145  -----------------------------------------------------------------------------hrbens | Coef. reg hrbens exper age educ tenure married male white nrtheast nrthcen south union Source | SS df MS ---------+-----------------------------Model | 101.726 0.0351612 married | .672 -.3718 0. Interval] ---------+-------------------------------------------------------------------exper | .325 0. ll(0) Tobit Estimates  Number of obs chi2(11) Prob > chi2 Pseudo R2  Log Likelihood = -519.0050939 . 604) Prob > F R-squared Adj R-squared Root MSE  = = = = = =  616 32.585 -.50 0.0538339 1.131 0. Std. Err.054929 .000 .86 = 0.2788372 .0035481 7.048028 -.021397 .53183  -----------------------------------------------------------------------------hrbens | Coef.583 0.0746602 1.6999244 .3604 .0043435 0.1490686 .0000 0.839786 604 .547 0.000 .384 -.0994408 .946 0.19384436 Residual | 170.132288 11 9.082204 .710 0.0088168 9.1900971 male | .0899016 .0115164 age | -.858 0.1042321 tenure | .1027574 .5.0000 = 0. 
Std.098923 .206 -.0834306 .0112981 .0657498 .0523598 4. The Tobit estimates are . The results from OLS estimation of the linear model are . Err.079 -.000 .0284978 .078604 1.000 .2538105 103  .0287099 . 2416193 nrtheast | -.0743625 nrthcen | -.2300547 .0973362 tenure | .197323 .629 -. have hrbens = 0.2870843 .0104947 5.000 -.1187017 union | .0802587 .0714831 . the parameter "_se" is  ^  s.nrtheast | -.0760238 -0.685 0.753 0. Interval] ---------+-------------------------------------------------------------------exper | .3621491 white | .715 0.0001487 -3.047408 age | -.95 = 0.0044995 educ | .0539178 4.801 -.177 -.0768576 1.  2  c.000 .0246854 .632 0.1880725 -4.0002604 104  .2562597 . tobit hrbens exper age educ tenure married male white nrtheast nrthcen south union expersq tenuresq.717 0.0040294 .0000 = 0.180 0.004 0.0906783 .0139224 .3874497 .y > 0).252 0.0165773 (Ancillary parameter) -----------------------------------------------------------------------------Obs.033717 .000 .051105 7.1536597 .316 -.1503703 . the Tobit  estimates are all slightly larger in magnitude.000 .3006999 .348 0.0698213 -0.2388  -----------------------------------------------------------------------------hrbens | Coef.8137158 .000 -1.037525 .4878151 expersq | -. this reflects that the scale factor is always less than unity. or about 6.0324014 .1639731 .0522697 7.017479 .0693418 -0.0912729 south | -. Here is what happens when exper  and tenure  2  are included:  .1753675 male | .62108  = 616 = 315.0581357 .0787463 married | .327 0.597 0.483 0.0489422 . Err.5551027 .230 0.000 .0631812 .493 -.0005524 .0709243 -0.1034053 south | -.0085253 3.1708394 .0778461 .528 -.239 -.351 0.  as we know.0480194 .  As expected.728 -.0713965 -0.0008445 -.581 0.5060039 _cons | -.540 0.0602628 .0528969 1.000 .1012841 nrthcen | -.0306652 . 
ll(0) Tobit Estimates  Number of obs chi2(13) Prob > chi2 Pseudo R2  Log Likelihood = -503.4443616 ---------+-------------------------------------------------------------------_se | .0086957 9.18307 -.  Again.1891572 .4033519 . Std.000 .7% of the sample.0125583 .  You  should ignore the phrase "Ancillary parameter" (which essentially means "subordinate") associated with "_se" as it is misleading for corner solution applications:  ^2  s appears directly in ^E(y|x) and ^E(y|x. t P>|t| [95% Conf.0043428 -0. summary:  41 left-censored observations at hrbens<=0 575 uncensored observations  The Tobit and OLS estimates are similar because only 41 of 616 observations.928 0.0775035 -1.354 -.1146022 union | . 888 0.09766  = 616 = 388.0556864 4.5099298 .319 -1.0013026 .330 0.0427534 age | -.2433643 .243 0.107 -. and we use ind1 as the base industry: .376006 -1.9436572 .2035085 .993 -.0400045 nrthcen | -.0438005 .372 0.7117618 .000 .2351539 . There are nine industries.214924 ind7 | .6107854 .5418171 .307673 -.04645 .0099413 5. ll(0) Tobit Estimates  Number of obs chi2(21) Prob > chi2 Pseudo R2  Log Likelihood = -467.3504717 white | .3716415 -0.343 0.168 -1.375 -1.091 0.910 0.408 -.380 0.2375989 ---------+-------------------------------------------------------------------_se | .109 0.3742017 -0. Err.563 -.8203574 .001 -.053115 .0963657 .0034182 .1188029 .295 0.7377754 ind8 | -.0000 = 0.231545 .0908226 union | .0005242 _cons | -.002134 -.009 0.000 -1.0013291 .127675 ind9 | -.105 -1.tenuresq | -.997 0.000 -.5083107 .373072 0.06154 .4947348 ind5 | .3669437 -0.3257878 .086 0.0115306 .6276261 ind4 | -.001 .7536342 ind6 | -.390 0.1016799 .0506381 6.368639 -0.0001623 tenuresq | -. summary:  41 left-censored observations at hrbens<=0 575 uncensored observations  Both squared terms are very signficant.0007188 -. 
tobit hrbens exper age educ tenure married male white nrtheast nrthcen south union expersq tenuresq ind2-ind9.4137824 -1.0267869 .0735678 1.0003863 -3.0726393 married | .0655859 -0.9650425 .159 -.0161572 (Ancillary parameter) -----------------------------------------------------------------------------Obs.0963403 tenure | .0151907 (Ancillary parameter) -----------------------------------------------------------------------------105  .0667174 -1.0501776 1.527 -.3682535 -1.409 0.5796409 ---------+-------------------------------------------------------------------_se | . d.955 -.0379854 .0789402 .794 -.0209362 .2940  -----------------------------------------------------------------------------hrbens | Coef. Std.000 .615 0.633 0.000544 ind2 | -.261 0.387704 .091 0.3617389 ind3 | -.5750527 .108095 .624 0.0724782 .99 = 0.000 .000 . so they should be included in the model. t P>|t| [95% Conf.0041306 -0.165 -1.0108205 .276 -.0585521 south | -.0721422 -1.4137686 expersq | -.0256812 .1532928 male | .3739442 0.828 0.579 0.349246 .7310468 .0001417 -3.0081297 3.0335907 .1317401 .3948746 _cons | -.1853532 -5.278 -.0004405 .002 -.0088598 8.0004098 -3.056 0.1667934 .3731778 .2632871 nrtheast | -.207 0.0046942 educ | .0033643 .3143174 .0547462 .2148662 .2411059 . Interval] ---------+-------------------------------------------------------------------exper | .0020613 -.  but the joint Wald test says that they are jointly very significant.  A more general case is done in Section  Briefly. This is somewhat unusual for dummy variables that are necessarily orthogonal (so that there is not a multicollinearity problem among them).]  16.467. if f(W|x) is the continuous density of y given x. 595) = Prob > F =  9. 17. in this example. with a worker in. summary:  41 left-censored observations at hrbens<=0 575 uncensored observations  .621 .0  8.046. This follows because the densities conditional on y > 0 are identical for the Tobit model and Cragg’s model.0 0.0 0.66 0.F(0|x)]. 
Obs. summary:      41  left-censored observations at hrbens<=0
                  575  uncensored observations

. test ind2 ind3 ind4 ind5 ind6 ind7 ind8 ind9

 ( 1)  ind2 = 0.0
 ( 2)  ind3 = 0.0
 ( 3)  ind4 = 0.0
 ( 4)  ind5 = 0.0
 ( 5)  ind6 = 0.0
 ( 6)  ind7 = 0.0
 ( 7)  ind8 = 0.0
 ( 8)  ind9 = 0.0

       F(  8,   595) =    9.66
            Prob > F =    0.0000

Each industry dummy variable is individually insignificant at even the 10% level, but the joint Wald test says that they are jointly very significant. This is somewhat unusual for dummy variables that are necessarily orthogonal (so that there is not a multicollinearity problem among them). The likelihood ratio statistic is 2(503.621 - 467.098) = 73.046; notice that this is roughly 8 (= number of restrictions) times the F statistic; the p-value for the LR statistic is also essentially zero. Certainly several estimates on the industry dummies are economically significant, with a worker in, say, industry eight earning about 61 cents less per hour in benefits than a comparable worker in industry one. [Remember, in this example, with so few observations at zero, it is roughly legitimate to use the parameter estimates as the partial effects.]

16.7. a. This follows because the densities conditional on y > 0 are identical for the Tobit model and Cragg's model. A more general case is done in Section 17.3. Briefly, if $f(\cdot|x)$ is the continuous density of y given x, then the density of y given x and y > 0 is $f(\cdot|x)/[1 - F(0|x)]$, where $F(\cdot|x)$ is the cdf of y given x. When f is the normal pdf with mean $x\beta$ and variance $\sigma^2$, we get that

$$f(y|x,y > 0) = \{\Phi(x\beta/\sigma)\}^{-1}\{\phi[(y - x\beta)/\sigma]/\sigma\}$$

for the Tobit model, and this is exactly the density specified for Cragg's model given y > 0.

b. From (16.8) we have $E(y|x) = \Phi(x\gamma)\cdot E(y|x,y > 0) = \Phi(x\gamma)[x\beta + \sigma\lambda(x\beta/\sigma)]$.

c. This follows very generally, not just for Cragg's model or the Tobit model, from (16.8):

$$\log[E(y|x)] = \log[P(y > 0|x)] + \log[E(y|x,y > 0)].$$

If we take the partial derivative with respect to $\log(x_1)$ we clearly get the sum of the elasticities.

16.9. a. A two-limit Tobit model, of the kind analyzed in Problem 16.3, is appropriate, with $a_1 = 0$, $a_2 = 10$.

b. The lower limit at zero is logically necessary considering the kind of response: the smallest percentage of one's income that can be invested in a pension plan is zero. On the other hand, the upper limit of 10 is an arbitrary corner imposed by law. One can imagine that some people at the corner y = 10 would choose y > 10 if they could. So, we can think of an underlying variable, which would be the percentage invested in the absence of any restrictions. Then, there would be no upper bound required (since we would not have to worry about 100 percent of income being invested in a pension plan).

c. From Problem 16.3(b), with $a_1 = 0$, we have

$$E(y|x) = (x\beta)\cdot\{\Phi[(a_2 - x\beta)/\sigma] - \Phi(-x\beta/\sigma)\} + \sigma\{\phi(x\beta/\sigma) - \phi[(a_2 - x\beta)/\sigma]\} + a_2\Phi[(x\beta - a_2)/\sigma].$$

Taking the derivative of this function with respect to $a_2$ gives

$$\frac{\partial E(y|x)}{\partial a_2} = (x\beta/\sigma)\cdot\phi[(a_2 - x\beta)/\sigma] + [(a_2 - x\beta)/\sigma]\cdot\phi[(a_2 - x\beta)/\sigma] + \Phi[(x\beta - a_2)/\sigma] - (a_2/\sigma)\phi[(x\beta - a_2)/\sigma] = \Phi[(x\beta - a_2)/\sigma]. \qquad (16.59)$$

We can plug in $a_2 = 10$ to obtain the approximate effect of increasing the cap from 10 to 11. For a given value of x, we would compute $\Phi[(x\hat\beta - 10)/\hat\sigma]$, where $\hat\beta$ and $\hat\sigma$ are the MLEs. We might evaluate this expression at the sample average of x or at other interesting values (such as across gender or race).

d. If $y_i < 10$ for $i = 1,\ldots,N$, $\hat\beta$ and $\hat\sigma$ are just the usual Tobit estimates with the "censoring" at zero.

16.11. No. OLS always consistently estimates the parameters of a linear projection, provided the second moments of y and the $x_j$ are finite and Var(x) has full rank K, regardless of the nature of y or x. That is why a linear regression analysis is always a reasonable first step for binary outcomes, corner solution outcomes, and count outcomes, provided there is not true data censoring.
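Returning to the cap calculation in Problem 16.9(c), the Python sketch below checks equation (16.59) numerically and compares the derivative-based approximation with the exact change in $E(y|x)$ when the cap rises from 10 to 11. The values of the index $x\beta$ and of $\sigma$ are hypothetical, not estimates.

```python
import math

def norm_cdf(z):
    """Standard normal cdf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def norm_pdf(z):
    """Standard normal pdf."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def capped_mean(xb, sigma, cap):
    """E(y|x) for a two-limit Tobit with lower limit 0 and upper limit cap."""
    z2 = (cap - xb) / sigma
    return (xb * (norm_cdf(z2) - norm_cdf(-xb / sigma))
            + sigma * (norm_pdf(xb / sigma) - norm_pdf(z2))
            + cap * norm_cdf(-z2))

def cap_effect(xb, sigma, cap):
    """Derivative of E(y|x) with respect to the cap, equation (16.59)."""
    return norm_cdf((xb - cap) / sigma)

# Hypothetical values for the index and sigma.
xb, sigma = 6.0, 4.0
h = 1e-5
numeric = (capped_mean(xb, sigma, 10.0 + h) - capped_mean(xb, sigma, 10.0 - h)) / (2.0 * h)
approx_change = cap_effect(xb, sigma, 10.0)
exact_change = capped_mean(xb, sigma, 11.0) - capped_mean(xb, sigma, 10.0)
print(numeric, approx_change, exact_change)
```

Because the derivative falls as the cap rises, evaluating it at 10 slightly overstates the exact one-point change.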
16.13. This extension has no practical effect on how we estimate an unobserved effects Tobit or probit model, or how we estimate a variety of unobserved effects panel data models with conditional normal heterogeneity. We simply have

$$c_i = -\left(T^{-1}\sum_{t=1}^{T}\pi_t\right)\xi + \bar{x}_i\xi + a_i \equiv \psi + \bar{x}_i\xi + a_i,$$

where $\psi \equiv -\left(T^{-1}\sum_{t=1}^{T}\pi_t\right)\xi$. Of course, any aggregate time dummies explicitly get swept out of $\bar{x}_i$ in this case but would usually be included in $x_{it}$.

An interesting follow-up question would have been: What if we standardize each $x_{it}$ by its cross-sectional mean and variance at time t, and assume $c_i$ is related to the mean and variance of the standardized vectors? In other words, let $z_{it} \equiv (x_{it} - \pi_t)\Omega_t^{-1/2}$, $t = 1,\ldots,T$, for each random draw i from the population. Then, we might assume $c_i|x_i \sim$ Normal$(\psi + \bar{z}_i\xi,\sigma_a^2)$ (where, again, $z_{it}$ would not contain aggregate time dummies). This is the kind of scenario that is handled by Chamberlain's more general assumption concerning the relationship between $c_i$ and $x_i$:

$$c_i = \psi + \sum_{r=1}^{T} x_{ir}\lambda_r + a_i,$$

where $\lambda_r = \Omega_r^{-1/2}\xi/T$, $r = 1,\ldots,T$. Alternatively, one could estimate $\pi_t$ and $\Omega_t$ for each t using the cross section observations $\{x_{it}: i = 1,2,\ldots,N\}$. The usual sample means and sample variance matrices, say $\hat\pi_t$ and $\hat\Omega_t$, are consistent and $\sqrt{N}$-asymptotically normal. Then, form $\hat{z}_{it} \equiv (x_{it} - \hat\pi_t)\hat\Omega_t^{-1/2}$, and proceed with the usual Tobit (or probit) unobserved effects analysis that includes the time averages $\bar{\hat{z}}_i = T^{-1}\sum_{t=1}^{T}\hat{z}_{it}$. This is a rather simple two-step estimation method, but accounting for the sample variation in $\hat\pi_t$ and $\hat\Omega_t$ would be cumbersome. It may be possible to use a much larger sample to obtain $\hat\pi_t$ and $\hat\Omega_t$, in which case one might ignore the sampling error in the first-stage estimates.

16.15. To be added.

CHAPTER 17

17.1. If you are interested in the effects of things like age of the building and neighborhood demographics on fire damage, given that a fire has occurred, then there is no problem. We simply need a random sample of buildings that actually caught on fire. You might want to supplement this with an analysis of the probability that buildings catch fire, given building and neighborhood characteristics. But then a two-stage analysis is appropriate.
17.3. This is essentially given in equation (17.14). Let $y_i$ given $x_i$ have density $f(y|x_i,\beta,\gamma)$, where $\beta$ is the vector indexing $E(y_i|x_i)$ and $\gamma$ is another set of parameters (usually a single variance parameter). Then the density of $y_i$ given $x_i$, $s_i = 1$, when $s_i = 1[a_1(x_i) < y_i < a_2(x_i)]$, is

$$p(y|x_i,s_i = 1) = \frac{f(y|x_i;\beta,\gamma)}{F(a_2(x_i)|x_i;\beta,\gamma) - F(a_1(x_i)|x_i;\beta,\gamma)}, \quad a_1(x_i) < y < a_2(x_i).$$

In the Hausman and Wise (1977) study, $y_i = \log(income_i)$, $a_1(x_i) = -\infty$, and $a_2(x_i)$ was a function of family size (which determines the official poverty level).

17.5. If we replace $y_2$ with $\hat{y}_2$, we need to see what happens when $y_2 = z\delta_2 + v_2$ is plugged into the structural model:

$$y_1 = z_1\delta_1 + \alpha_1\cdot(z\delta_2 + v_2) + u_1 = z_1\delta_1 + \alpha_1\cdot(z\delta_2) + (u_1 + \alpha_1 v_2). \qquad (17.81)$$

So, the procedure is to replace $\delta_2$ in (17.81) with its $\sqrt{N}$-consistent estimator, $\hat\delta_2$. The key is to note how the error term in (17.81) is $u_1 + \alpha_1 v_2$. If the selection correction is going to work, we need the expected value of $u_1 + \alpha_1 v_2$ given $(z,v_3)$ to be linear in $v_3$ (in particular, it cannot depend on z). Then we can write

$$E(y_1|z,v_3) = z_1\delta_1 + \alpha_1\cdot(z\delta_2) + \gamma_1 v_3,$$

where $E[(u_1 + \alpha_1 v_2)|v_3] = \gamma_1 v_3$ by normality. Conditioning on $y_3 = 1$ gives

$$E(y_1|z,y_3 = 1) = z_1\delta_1 + \alpha_1\cdot(z\delta_2) + \gamma_1\lambda(z\delta_3). \qquad (17.82)$$

A sufficient condition for (17.82) is that $(u_1,v_2,v_3)$ is independent of z with a trivariate normal distribution. We can get by with less than this, but the nature of $v_2$ is restricted. If we use an IV approach, we need assume nothing about $v_2$ except for the usual linear projection assumption.

As a practical matter, if we cannot write $y_2 = z\delta_2 + v_2$, where $v_2$ is independent of z and approximately normal, then the OLS alternative will not be consistent. Thus, equations where $y_2$ is binary, or is some other variable that exhibits nonnormality, cannot be consistently estimated using the OLS procedure. This is why 2SLS is generally preferred.

17.7. a. Substitute the reduced forms for $y_1$ and $y_2$ into the third equation:

$$y_3 = \max(0,\ \alpha_1(z\delta_1) + \alpha_2(z\delta_2) + z_3\delta_3 + v_3) \equiv \max(0,\ z\pi_3 + v_3),$$

where $v_3 \equiv u_3 + \alpha_1 v_1 + \alpha_2 v_2$. Under the assumptions given, $v_3$ is independent of z and normally distributed. Thus, if we knew $\delta_1$ and $\delta_2$, we could consistently estimate $\alpha_1$, $\alpha_2$, and $\delta_3$ from a Tobit of $y_3$ on $z\delta_1$, $z\delta_2$, and $z_3$. From the usual argument, consistent estimators are obtained by using initial consistent estimators of $\delta_1$ and $\delta_2$. Estimation of $\delta_2$ is simple: just use OLS using the entire sample. Estimation of $\delta_1$ follows exactly as in Procedure 17.3 using the system

$$y_1 = z\delta_1 + v_1 \qquad (17.83)$$
$$y_3 = \max(0,\ z\pi_3 + v_3), \qquad (17.84)$$

where $y_1$ is observed only when $y_3 > 0$. Given $\hat\delta_1$ and $\hat\delta_2$, form $z_i\hat\delta_1$ and $z_i\hat\delta_2$ for each observation i in the sample. Then, obtain $\hat\alpha_1$, $\hat\alpha_2$, and $\hat\delta_3$ from the Tobit of $y_{i3}$ on $(z_i\hat\delta_1)$, $(z_i\hat\delta_2)$, $z_{i3}$ using all observations.

For identification, $(z\delta_1,z\delta_2,z_3)$ can contain no exact linear dependencies. Necessary is that there must be at least two elements in z not also in $z_3$. Obtaining the correct asymptotic variance matrix is complicated. It is most easily done in a generalized method of moments framework.

b. This is not very different from part a. The only difference is that $\delta_2$ must be estimated using Procedure 17.3. Then follow the steps from part a.

c. We need to estimate the variance of $u_3$, $\sigma_3^2$.

17.9. To be added.

17.11. a. There is no sample selection problem because, by definition, you have specified the distribution of y given x and y > 0. We only need to obtain a random sample from the subpopulation with y > 0.

b. Again, there is no sample selection bias because we have specified the conditional expectation for the population of interest. If we have a random sample from that population, NLS is generally consistent and $\sqrt{N}$-asymptotically normal.

c. We would use a standard probit model. Let $w = 1[y > 0]$. Then w given x follows a probit model with $P(w = 1|x) = \Phi(x\gamma)$.

d. $E(y|x) = P(y > 0|x)\cdot E(y|x,y > 0) = \Phi(x\gamma)\cdot\exp(x\beta)$. So we would plug in the NLS estimator of $\beta$ and the probit estimator of $\gamma$.

e. Not when you specify the conditional distributions, or conditional means, for the two parts. By definition, there is no sample selection problem. Confusion arises, I think, when two part models are specified with unobservables that may be correlated. For example, we could write

$$y = w\cdot\exp(x\beta + u), \qquad w = 1[x\gamma + v > 0],$$

so that $w = 0 \Rightarrow y = 0$. Assume that (u,v) is independent of x. Then, if u and v are independent, so that u is independent of (x,w), we have

$$E(y|x,w) = w\cdot\exp(x\beta)E[\exp(u)|x,w] = w\cdot\exp(x\beta)E[\exp(u)],$$

which implies the specification in part b (by setting w = 1, once we absorb $E[\exp(u)]$ into the intercept). The interesting twist here is if u and v are correlated. Given w = 1, we can write $\log(y) = x\beta + u$. So

$$E[\log(y)|x,w = 1] = x\beta + E(u|x,w = 1).$$

If we make the usual linearity assumption, $E(u|v) = \rho v$, and assume a standard normal distribution for v, then we have the usual inverse Mills ratio added to the linear model:

$$E[\log(y)|x,w = 1] = x\beta + \rho\lambda(x\gamma).$$

A two-step strategy for estimating $\gamma$ and $\beta$ is pretty clear. First, estimate a probit of $w_i$ on $x_i$ to get $\hat\gamma$ and $\lambda(x_i\hat\gamma)$. Then, using the $y_i > 0$ observations, run the regression $\log(y_i)$ on $x_i$, $\lambda(x_i\hat\gamma)$ to obtain $\hat\beta$, $\hat\rho$. A standard t statistic on $\hat\rho$ is a simple test of Cov(u,v) = 0.

This two-step procedure reveals a potential problem with the model that allows u and v to be correlated: adding the inverse Mills ratio means that we are adding a nonlinear function of x. In other words, identification of $\beta$ comes entirely from the nonlinearity of the IMR, which we warned about in this chapter. Ideally, we would have a variable that affects $P(w = 1|x)$ that can be excluded from $x\beta$. In labor economics, where two-part models are used to allow for fixed costs of entering the labor market, one would try to find a variable that affects the fixed costs of being employed that does not affect the choice of hours.

If we assume (u,v) is multivariate normal, with mean zero, then we can use a full maximum likelihood procedure.
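The inverse Mills ratio term in the two-step procedure of Problem 17.11(e) rests on the fact that, for standard normal v, $E(v|v > -c) = \lambda(c) = \phi(c)/\Phi(c)$. The Python sketch below checks this by numerical integration and evaluates the selected-sample mean $x\beta + \rho\lambda(x\gamma)$. The values of c, the indexes, and $\rho$ are hypothetical, and the function names are illustrative only.

```python
import math

def norm_cdf(z):
    """Standard normal cdf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def norm_pdf(z):
    """Standard normal pdf."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def inverse_mills(c):
    """lambda(c) = phi(c) / Phi(c)."""
    return norm_pdf(c) / norm_cdf(c)

def truncated_mean(c, upper=10.0, steps=20000):
    """E(v | v > -c) for standard normal v, by the trapezoid rule on [-c, upper]."""
    lower = -c
    h = (upper - lower) / steps
    total = 0.5 * (lower * norm_pdf(lower) + upper * norm_pdf(upper))
    for k in range(1, steps):
        v = lower + k * h
        total += v * norm_pdf(v)
    # Divide by P(v > -c) = Phi(c) to condition on selection.
    return total * h / norm_cdf(c)

def selected_log_mean(xb, xg, rho):
    """E[log(y) | x, w = 1] = x*beta + rho * lambda(x*gamma)."""
    return xb + rho * inverse_mills(xg)

c = 0.3  # hypothetical value of the selection index x*gamma
print(inverse_mills(c), truncated_mean(c))
print(selected_log_mean(1.0, c, 0.5))
```

When $\rho = 0$ the correction term drops out and the selected-sample mean of log(y) is just $x\beta$.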
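While full maximum likelihood would be a little less robust, making full distributional assumptions has a subtle advantage: we can then compute partial effects on $E(y|x)$ and $E(y|x,y > 0)$. Even with a full set of assumptions, the partial effects are not straightforward to obtain. For one, $E(y|x,y > 0) = \exp(x\beta)\cdot E[\exp(u)|x,w = 1]$, where $E[\exp(u)|x,w = 1]$ can be obtained under joint normality. A similar example is given in Section 19.5.2; see, particularly, equation (19.44). Then, we can multiply this expectation by $P(w = 1|x) = \Phi(x\gamma)$. The point is that we cannot simply look at $\beta$ to obtain partial effects of interest. This is very different from the sample selection model.

17.13. a. We cannot use censored Tobit because that requires observing x whatever the value of y. Instead, we can use truncated Tobit: we use the distribution of y given x and y > 0. Notice that our reason for using truncated Tobit differs from the usual application. Usually, the underlying variable y of interest has a conditional normal distribution in the population. Here, y given x follows a standard Tobit model in the population (for a corner solution outcome).

b. Provided x varies enough in the subpopulation where y > 0 such that rank $E(x'x|y > 0) = K$, the parameters are identified. In the case where an element of x is a derived price, we need sufficient price variation for the population that consumes some of the good. Given such variation, we can estimate $E(y|x) = \Phi(x\beta/\sigma)x\beta + \sigma\phi(x\beta/\sigma)$ because we have made the assumption that y given x follows a Tobit in the full population.

CHAPTER 18

18.1. a. This follows from equation (18.5). First, $E(\bar{y}_1) = E(y|w = 1)$ and $E(\bar{y}_0) = E(y|w = 0)$. Therefore, by (18.5),

$$E(\bar{y}_1 - \bar{y}_0) = [E(y_0|w = 1) - E(y_0|w = 0)] + ATE_1,$$

and so the bias is given by the first term.

b. If $E(y_0|w = 1) < E(y_0|w = 0)$, those who participate in the program would have had lower average earnings without training than those who chose not to participate. This is a form of sample selection, and, on average, leads to an underestimate of the impact of the program.

18.3. a. The following Stata session estimates $\alpha$ using the three different regression approaches. It would have made sense to add unem74 and unem75 to the vector x, but I did not do so:

. probit train re74 re75 age agesq nodegree married black hisp

Iteration 0:   log likelihood = -302.1
Iteration 1:   log likelihood = -294.07642
Iteration 2:   log likelihood = -294.06748
Iteration 3:   log likelihood = -294.06748

Probit estimates                                  Number of obs   =        445
                                                  LR chi2(8)      =      16.07
                                                  Prob > chi2     =     0.0415
Log likelihood = -294.06748                       Pseudo R2       =     0.0266

------------------------------------------------------------------------------
       train |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        re74 |  -.0189577   .0159392    -1.19   0.234    -.0501979    .0122825
        re75 |   .0371871   .0271086     1.37   0.170    -.0159447     .090319
         age |  -.0005467   .0534045    -0.01   0.992    -.1052176    .1041242
       agesq |   .0000719   .0008734     0.08   0.934    -.0016399    .0017837
    nodegree |    -.44195   .1515457    -2.92   0.004    -.7389742   -.1449258
     married |    .091519   .1726192     0.53   0.596    -.2468083    .4298464
       black |  -.1446253   .2271609    -0.64   0.524    -.5898524    .3006019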
. predict phat
(option p assumed; Pr(train))

. sum phat
(output omitted)

. gen traphat0 = train*(phat - .416)

. reg unem78 train phat
(output omitted)

. reg unem78 train phat traphat0
(output omitted)

. reg unem78 train re74 re75 age agesq nodegree married black hisp

      Source |       SS       df       MS              Number of obs =     445
-------------+------------------------------           F(  9,   435) =    2.75
       Model |  5.09784844     9  .566427604           Prob > F      =  0.0040
    Residual |  89.7246235   435  .206263502           R-squared     =  0.0538
-------------+------------------------------           Adj R-squared =  0.0342
       Total |  94.8224719   444  .213564126           Root MSE      =  .45416

------------------------------------------------------------------------------
      unem78 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       train |  -.1105582   .0444832    -2.49   0.013    -.1979868   -.0231295
        re74 |  -.0025525   .0053889    -0.47   0.636    -.0131441    .0080391
        re75 |   -.007121   .0094371    -0.75   0.451     -.025669    .0114269
         age |   .0304127   .0189565     1.60   0.109    -.0068449    .0676704
       agesq |  -.0004949   .0003098    -1.60   0.111    -.0011038    .0001139
    nodegree |   .0421444   .0550176     0.77   0.444    -.0659889    .1502777
     married |  -.0296401   .0620734    -0.48   0.633    -.1516412    .0923609
       black |    .180637   .0815002     2.22   0.027     .0204538    .3408202
        hisp |  -.0392887   .1078464    -0.36   0.716    -.2512535    .1726761
       _cons |  -.2342579   .2905718    -0.81   0.421    -.8053572    .3368413
------------------------------------------------------------------------------

In all three cases, the average treatment effect is estimated to be right around -.11: participating in job training is estimated to reduce the unemployment probability by about .11. Of course, in this example, training status was randomly assigned, so we are not surprised that different methods lead to roughly the same estimate. An alternative, of course, is to use a probit model for unem78 on train and x.

18.5. a. I used the following Stata session to answer all parts:

. probit train re74 re75 age agesq nodegree married black hisp

Iteration 0:   log likelihood =     -302.1
Iteration 1:   log likelihood = -294.07642
Iteration 2:   log likelihood = -294.06748
Iteration 3:   log likelihood = -294.06748

Probit estimates                                  Number of obs   =        445
                                                  LR chi2(8)      =      16.07
                                                  Prob > chi2     =     0.0415
Log likelihood = -294.06748                       Pseudo R2       =     0.0266

------------------------------------------------------------------------------
       train |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        re74 |  -.0189577   .0159392    -1.19   0.234    -.0501979    .0122825
        re75 |   .0371871   .0271086     1.37   0.170    -.0159447     .090319
         age |  -.0005467   .0534045    -0.01   0.992    -.1052176    .1041242
       agesq |   .0000719   .0008734     0.08   0.934    -.0016399    .0017837
    nodegree |    -.44195   .1515457    -2.92   0.004    -.7389742   -.1449258
     married |    .091519   .1726192     0.53   0.596    -.2468083    .4298464
       black |  -.1446253   .2271609    -0.64   0.524    -.5898524    .3006019
        hisp |  -.5004545   .3079227    -1.63   0.104    -1.103972    .1030629
       _cons |   .2284561   .8154273     0.28   0.779    -1.369752    1.826664
------------------------------------------------------------------------------
. predict phat
(option p assumed; Pr(train))

. reg re78 train re74 re75 age agesq nodegree married black hisp (phat re74 re75 age agesq nodegree married black hisp)

Instrumental variables (2SLS) regression

      Source |       SS       df       MS              Number of obs =     445
-------------+------------------------------           F(  9,   435) =    1.75
       Model |  703.776258     9   78.197362           Prob > F      =  0.0763
    Residual |  18821.8804   435  43.2686905           R-squared     =  0.0360
-------------+------------------------------           Adj R-squared =  0.0161
       Total |  19525.6566   444  43.9767041           Root MSE      =  6.5779

------------------------------------------------------------------------------
        re78 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       train |   .0699177   18.00172     0.00   0.997    -35.31125    35.45109
        re74 |   .0624611   .1453799     0.43   0.668    -.2232733    .3481955
        re75 |   .0863775   .2814839     0.31   0.759    -.4668602    .6396151
         age |   .1998802   .2746971     0.73   0.467    -.3400184    .7397788
       agesq |  -.0024826   .0045238    -0.55   0.583    -.0113738    .0064086
    nodegree |  -1.367622   3.203039    -0.43   0.670    -7.662979    4.927734
     married |   -.050672   1.098774    -0.05   0.963    -2.210237    2.108893
       black |  -2.203087   1.554259    -1.42   0.157    -5.257878    .8517046
        hisp |  -.2953534   3.656719    -0.08   0.936    -7.482387     6.89168
       _cons |   4.613857   11.47144     0.40   0.688    -17.93248     27.1602
------------------------------------------------------------------------------

. reg phat re74 re75 age agesq nodegree married black hisp

      Source |       SS       df       MS              Number of obs =     445
-------------+------------------------------           F(  8,   436) =69767.44
       Model |  3.87404126     8  .484255158           Prob > F      =  0.0000
    Residual |  .003026272   436  6.9410e-06           R-squared     =  0.9992
-------------+------------------------------           Adj R-squared =  0.9992
       Total |  3.87706754   444  .008732134           Root MSE      =  .00263

------------------------------------------------------------------------------
        phat |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        re74 |  -.0069301   .0000312  -222.04   0.000    -.0069914   -.0068687
        re75 |   .0139209   .0000546   254.82   0.000     .0138135    .0140283
         age |  -.0003207     .00011    -2.92   0.004    -.0005368   -.0001046
       agesq |   .0000293   1.80e-06    16.31   0.000     .0000258    .0000328
    nodegree |  -.1726018    .000316  -546.14   0.000    -.1732229   -.1719806
     married |   .0352802     .00036    98.01   0.000     .0345727    .0359877
       black |  -.0562315   .0004726  -118.99   0.000    -.0571603   -.0553027
        hisp |  -.1838453   .0006238  -294.71   0.000    -.1850713   -.1826192
       _cons |   .5907578   .0016786   351.93   0.000     .5874586     .594057
------------------------------------------------------------------------------

b. The IV estimate of $\alpha$ is very small -- .070, much smaller than when we used either linear regression or the propensity score in a regression in Example 18.2. (When we do not instrument for train, $\hat{\alpha} = 1.625$, se = .640.) The very large standard error (18.00) suggests severe collinearity among the instruments.

c. The collinearity suspected in part b is confirmed by regressing $\hat{\Phi}_i$ on the $x_i$: the R-squared is .9992, which means there is virtually no separate variation in $\hat{\Phi}_i$ that cannot be explained by $x_i$.

d. This example illustrates why trying to achieve identification off of a nonlinearity can be fraught with problems. Generally, it is not a good idea.

18.7. To be added.

18.9. a. We can start with equation (18.66),

$$y = \eta_0 + x\gamma + \beta w + w\cdot(x - \psi)\delta + u + w\cdot v + e,$$

and, again, we will replace $w\cdot v$ with its expectation given $(x,z)$ and an error. But

$$E(w\cdot v|x,z) = E[E(w\cdot v|x,z,v)|x,z] = E[E(w|x,z,v)\cdot v|x,z] = E[\exp(\pi_0 + x\pi_1 + z\pi_2 + \pi_3 v)\cdot v|x,z] = \xi\cdot\exp(\pi_0 + x\pi_1 + z\pi_2),$$

where $\xi = E[\exp(\pi_3 v)\cdot v]$, and we have used the assumption that $v$ is independent of $(x,z)$. Now, define $r = u + [w\cdot v - E(w\cdot v|x,z)] + e$. Given the assumptions, $E(r|x,z) = 0$. [Note that we do not need to replace $\pi_0$ with a different constant, as is implied in the statement of the problem.] So we can write

$$y = \eta_0 + x\gamma + \beta w + w\cdot(x - \psi)\delta + \xi E(w|x,z) + r, \quad E(r|x,z) = 0.$$

b. The ATE $\beta$ is not identified by the IV estimator applied to the extended equation. If $h \equiv h(x,z)$ is any function of $(x,z)$, then $L(w|1,x,q,h) = L(w|q) = q$ because $q = E(w|x,z)$. In effect, because we need to include $E(w|x,z)$ in the estimating equation, no other functions of $(x,z)$ are valid as instruments. This is a clear weakness of the approach.

c. This is not what I intended to ask. What I should have said is, assume we can write $w = \exp(\pi_0 + x\pi_1 + z\pi_2 + g)$, where $E(u|g,x,z) = \rho\cdot g$ and $E(v|g,x,z) = \theta\cdot g$. These are standard linearity assumptions under independence of $(u,v,g)$ and $(x,z)$. Then we take the expected value of (18.66) conditional on $(g,x,z)$:

$$E(y|g,x,z) = \eta_0 + x\gamma + \beta w + w\cdot(x - \psi)\delta + E(u|g,x,z) + wE(v|g,x,z) + E(e|g,x,z)$$
$$= \eta_0 + x\gamma + \beta w + w\cdot(x - \psi)\delta + \rho\cdot g + \theta w\cdot g,$$

where we have used the fact that $w$ is a function of $(g,x,z)$ and $E(e|g,x,z) = 0$. The last equation suggests a two-step procedure. First, since $\log(w_i) = \pi_0 + x_i\pi_1 + z_i\pi_2 + g_i$, we can consistently estimate $\pi_0$, $\pi_1$, and $\pi_2$ from the OLS regression $\log(w_i)$ on $1, x_i, z_i$, $i = 1,...,N$. From this regression, we need the residuals, $\hat{g}_i$, $i = 1,...,N$. In the second step, run the regression

$$y_i \text{ on } 1,\ x_i,\ w_i,\ w_i(x_i - \bar{x}),\ \hat{g}_i,\ w_i\hat{g}_i, \quad i = 1,...,N.$$

As usual, the coefficient on $w_i$ is the consistent estimator of $\beta$, the average treatment effect. A standard joint significance test -- for example, an F-type test -- on the last two terms effectively tests the null hypothesis that $w$ is exogenous.

CHAPTER 19

19.1. a. This is a simple problem in univariate calculus. Write $q(\mu) \equiv \mu_o\log(\mu) - \mu$ for $\mu > 0$. Then $dq(\mu)/d\mu = \mu_o/\mu - 1$, so $\mu = \mu_o$ uniquely sets the derivative to zero. The second derivative of $q(\mu)$ is $-\mu_o\mu^{-2} < 0$ for all $\mu > 0$, so the sufficient second order condition is satisfied.

b. For the exponential case, $q(\mu) \equiv E[\ell_i(\mu)] = -\mu_o/\mu - \log(\mu)$. The first order condition is $\mu_o\mu^{-2} - \mu^{-1} = 0$, which is uniquely solved by $\mu = \mu_o$. The second derivative is $-2\mu_o\mu^{-3} + \mu^{-2}$, which, when evaluated at $\mu_o$, gives $-2\mu_o^{-2} + \mu_o^{-2} = -\mu_o^{-2} < 0$.
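The calculus in Problem 19.1 can be checked numerically. The following is only a rough sketch, not part of the original solution: the value of mu_o and the grid bounds are arbitrary assumptions chosen for illustration, and the helper names are hypothetical. It evaluates the two objective functions on a fine grid and reports where each is largest.

```python
import math

def q_poisson(mu, mu_o):
    """Expected Poisson quasi-log-likelihood, up to terms not involving mu."""
    return mu_o * math.log(mu) - mu

def q_exponential(mu, mu_o):
    """Expected exponential quasi-log-likelihood."""
    return -mu_o / mu - math.log(mu)

def argmax_on_grid(q, mu_o, lo=0.01, hi=10.0, steps=100_000):
    """Return the grid point in [lo, hi] at which q(., mu_o) is largest."""
    best_mu, best_val = lo, q(lo, mu_o)
    for k in range(1, steps + 1):
        mu = lo + (hi - lo) * k / steps
        val = q(mu, mu_o)
        if val > best_val:
            best_mu, best_val = mu, val
    return best_mu

mu_o = 2.5  # hypothetical true mean, chosen only for illustration
poisson_argmax = argmax_on_grid(q_poisson, mu_o)
exponential_argmax = argmax_on_grid(q_exponential, mu_o)
print(poisson_argmax, exponential_argmax)
```

Both grid maximizers should land within one grid step of mu_o, which is what the first and second order conditions imply.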
19.3. The following is Stata output used to answer parts a through f. The answers are given below.

. reg cigs lcigpric lincome restaurn white educ age agesq

      Source |       SS       df       MS              Number of obs =     807
-------------+------------------------------           F(  7,   799) =    6.38
       Model |  8029.43631     7  1147.06233           Prob > F      =  0.0000
    Residual |  143724.246   799  179.880158           R-squared     =  0.0529
-------------+------------------------------           Adj R-squared =  0.0446
       Total |  151753.683   806  188.280003           Root MSE      =  13.412

------------------------------------------------------------------------------
        cigs |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    lcigpric |  -.8509044   5.782321    -0.15   0.883    -12.20124    10.49943
     lincome |   .8690144   .7287636     1.19   0.233     -.561503    2.299532
    restaurn |  -2.865621   1.117406    -2.56   0.011    -5.059019   -.6722235
       white |  -.5592363   1.459461    -0.38   0.702    -3.424067    2.305594
        educ |  -.5017533   .1671677    -3.00   0.003     -.829893   -.1736136
         age |   .7745021   .1605158     4.83   0.000     .4594197    1.089585
       agesq |  -.0090686   .0017481    -5.19   0.000    -.0124999   -.0056373
       _cons |  -2.682435   24.22073    -0.11   0.912    -50.22621    44.86134
------------------------------------------------------------------------------

. test lcigpric lincome

 ( 1)  lcigpric = 0.0
 ( 2)  lincome = 0.0

       F(  2,   799) =    0.71
            Prob > F =    0.4899

. reg cigs lcigpric lincome restaurn white educ age agesq, robust

Regression with robust standard errors                 Number of obs =     807
                                                       F(  7,   799) =    9.38
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.0529
                                                       Root MSE      =  13.412

------------------------------------------------------------------------------
             |               Robust
        cigs |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    lcigpric |  -.8509044   6.054396    -0.14   0.888     -12.7353     11.0335
     lincome |   .8690144    .597972     1.45   0.147    -.3047671    2.042796
    restaurn |  -2.865621   1.017275    -2.82   0.005    -4.862469   -.8687741
       white |  -.5592363   1.378283    -0.41   0.685     -3.26472    2.146247
        educ |  -.5017533   .1624097    -3.09   0.002    -.8205533   -.1829532
         age |   .7745021   .1380317     5.61   0.000     .5035545     1.04545
       agesq |  -.0090686   .0014589    -6.22   0.000    -.0119324   -.0062048
       _cons |  -2.682435   25.90194    -0.10   0.918    -53.52632    48.16145
------------------------------------------------------------------------------

. test lcigpric lincome

 ( 1)  lcigpric = 0.0
 ( 2)  lincome = 0.0

       F(  2,   799) =    1.07
            Prob > F =    0.3441

. poisson cigs lcigpric lincome restaurn white educ age agesq

Iteration 0:   log likelihood = -8111.8346
Iteration 1:   log likelihood = -8111.5191
Iteration 2:   log likelihood =  -8111.519

Poisson regression                                Number of obs   =        807
                                                  LR chi2(7)      =    1068.70
                                                  Prob > chi2     =     0.0000
Log likelihood =  -8111.519                       Pseudo R2       =     0.0618

------------------------------------------------------------------------------
        cigs |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    lcigpric |  -.1059607   .1433932    -0.74   0.460    -.3870061    .1750847
     lincome |   .1037275   .0202811     5.11   0.000     .0639772    .1434779
    restaurn |  -.3636059   .0312231   -11.65   0.000    -.4248021   -.3024098
       white |  -.0552012   .0374207    -1.48   0.140    -.1285444    .0181421
        educ |  -.0594225   .0042564   -13.96   0.000    -.0677648   -.0510802
         age |   .1142571   .0049694    22.99   0.000     .1045172    .1239969
       agesq |  -.0013708    .000057   -24.07   0.000    -.0014825   -.0012592
       _cons |   .3964494   .6139626     0.65   0.518    -.8068952    1.599794
------------------------------------------------------------------------------

. glm cigs lcigpric lincome restaurn white educ age agesq, family(poisson) sca(x2)

Iteration 0:   log likelihood = -8380.1083
Iteration 1:   log likelihood = -8111.6454
Iteration 2:   log likelihood =  -8111.519
Iteration 3:   log likelihood =  -8111.519

Generalized linear models                          No. of obs      =       807
Optimization     : ML: Newton-Raphson              Residual df     =       799
                                                   Scale param     =         1
Deviance         =  14752.46933                    (1/df) Deviance =  18.46367
Pearson          =  16232.70987                    (1/df) Pearson  =  20.31628

Variance function: V(u) = u                        [Poisson]
Link function    : g(u) = ln(u)                    [Log]
Standard errors  : OIM

Log likelihood   = -8111.519022                    AIC             =  20.12272
BIC              =  14698.92274

------------------------------------------------------------------------------
        cigs |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    lcigpric |  -.1059607   .6463244    -0.16   0.870    -1.372733    1.160812
     lincome |   .1037275   .0914144     1.13   0.257    -.0754414    .2828965
    restaurn |  -.3636059   .1407338    -2.58   0.010    -.6394391   -.0877728
       white |  -.0552011   .1686685    -0.33   0.743    -.3857854    .2753831
        educ |  -.0594225   .0191849    -3.10   0.002    -.0970243   -.0218208
         age |   .1142571   .0223989     5.10   0.000     .0703561     .158158
       agesq |  -.0013708   .0002567    -5.34   0.000     -.001874   -.0008677
       _cons |   .3964493    2.76735     0.14   0.886    -5.027457    5.820355
------------------------------------------------------------------------------
(Standard errors scaled using square root of Pearson X2-based dispersion)

* The estimate of sigma is

. di sqrt(20.32)
4.5077711

. poisson cigs restaurn white educ age agesq

Iteration 0:   log likelihood =  -8125.618
Iteration 1:   log likelihood = -8125.2907
Iteration 2:   log likelihood = -8125.2906

Poisson regression                                Number of obs   =        807
                                                  LR chi2(5)      =    1041.16
                                                  Prob > chi2     =     0.0000
Log likelihood = -8125.2906                       Pseudo R2       =     0.0602

------------------------------------------------------------------------------
        cigs |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    restaurn |  -.3545336   .0308796   -11.48   0.000    -.4150564   -.2940107
       white |  -.0618025    .037371    -1.65   0.098    -.1350483    .0114433
        educ |  -.0532166   .0040652   -13.09   0.000    -.0611842   -.0452489
         age |   .1211174   .0048175    25.14   0.000     .1116754    .1305594
       agesq |  -.0014458   .0000553   -26.14   0.000    -.0015543   -.0013374
       _cons |   .7617484   .1095991     6.95   0.000     .5469381    .9765587
------------------------------------------------------------------------------

. di 2*(8125.291 - 8111.519)
27.544

* This is the usual LR statistic.  The GLM version is obtained by
* dividing by 20.32:

. di 2*(8125.291 - 8111.519)/(20.32)
1.3555118

. glm cigs lcigpric lincome restaurn white educ age agesq, family(poisson) robust

Iteration 0:   log likelihood = -8380.1083
Iteration 1:   log likelihood = -8111.6454
Iteration 2:   log likelihood =  -8111.519
Iteration 3:   log likelihood =  -8111.519

Generalized linear models                          No. of obs      =       807
Optimization     : ML: Newton-Raphson              Residual df     =       799
                                                   Scale param     =         1
Deviance         =  14752.46933                    (1/df) Deviance =  18.46367
Pearson          =  16232.70987                    (1/df) Pearson  =  20.31628

Variance function: V(u) = u                        [Poisson]
Link function    : g(u) = ln(u)                    [Log]
Standard errors  : Sandwich
Log likelihood   = -8111.519022                    AIC             =  20.12272
BIC              =  14698.92274

------------------------------------------------------------------------------
             |               Robust
        cigs |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    lcigpric |  -.1059607   .6681827    -0.16   0.874    -1.415575    1.203653
     lincome |   .1037275    .083299     1.25   0.213    -.0595355    .2669906
    restaurn |  -.3636059    .140366    -2.59   0.010    -.6387182   -.0884937
       white |  -.0552011   .1632959    -0.34   0.735    -.3752553     .264853
        educ |  -.0594225   .0192058    -3.09   0.002    -.0970653   -.0217798
         age |   .1142571   .0212322     5.38   0.000     .0726427    .1558715
       agesq |  -.0013708   .0002446    -5.60   0.000    -.0018503   -.0008914
       _cons |   .3964493    2.97704     0.13   0.894    -5.438442     6.23134
------------------------------------------------------------------------------

. di .1143/(2*.00137)
41.715328

a. Neither the price nor income variable is significant at any reasonable significance level, although the coefficient estimates are the expected sign. It does not matter whether we use the usual or robust standard errors. The two variables are jointly insignificant, too, using the usual and heteroskedasticity-robust tests (p-values = .490, .344, respectively).

b. While the price variable is still very insignificant (p-value = .46), the income variable, based on the usual Poisson standard errors, is very significant: t = 5.11. Both estimates are elasticities: the estimated price elasticity is -.106 and the estimated income elasticity is .104. Incidentally, if you drop restaurn -- a binary indicator for restaurant smoking restrictions at the state level -- then log(cigpric) becomes much more significant (but using the incorrect standard errors). In this data set, both cigpric and restaurn vary only at the state level, and, not surprisingly, they are significantly correlated. (States that have restaurant smoking restrictions also have higher average prices, on the order of 2.9%.)

c. The GLM estimate of $\sigma$ is $\hat{\sigma} = 4.51$. This means all of the Poisson standard errors should be multiplied by this factor, as is done using the "glm" command in Stata, with the option "sca(x2)." The t statistic on lcigpric is now very small (-.16), and that on lincome falls to 1.13 -- much more in line with the linear model t statistic (1.19 with the usual standard errors). Clearly, using the maximum likelihood standard errors is very misleading in this example. With the GLM standard errors, the restaurant restriction variable, education, and the age variables are still significant. (Interestingly, there is no race effect, conditional on the other covariates.)

d. The usual LR statistic is 2(8125.291 - 8111.519) = 27.54, which is a very large value in a $\chi^2_2$ distribution (p-value $\approx$ 0). The QLR statistic divides the usual LR statistic by $\hat{\sigma}^2 = 20.32$, so QLR = 1.36 (p-value $\approx$ .51). As expected, the QLR statistic shows that the variables are jointly insignificant, while the LR statistic shows strong significance.

e. Using the robust standard errors does not significantly change any conclusions; in fact, most explanatory variables become slightly more significant than when we use the GLM standard errors. In this example, it is the adjustment by $\hat{\sigma} > 1$ that makes the most difference. Having fully robust standard errors has no additional effect.

f. We simply compute the turning point for the quadratic: $\hat{\beta}_{age}/(-2\hat{\beta}_{age^2}) = .1143/(2\cdot .00137) \approx 41.72$.

g. A double hurdle model -- which separates the initial decision to smoke at all from the decision of how much to smoke -- seems like a good idea. It is certainly worth investigating. One approach is to model $D(y|x, y \geq 1)$ as a truncated Poisson distribution, and then to model $P(y = 0|x)$ as a logit or probit.
19.5. a. We just use iterated expectations:

$$E(y_{it}|x_i) = E[E(y_{it}|x_i,c_i)|x_i] = E(c_i|x_i)\exp(x_{it}\beta) = \exp(\alpha + \bar{x}_i\gamma)\exp(x_{it}\beta) = \exp(\alpha + x_{it}\beta + \bar{x}_i\gamma).$$

b. We are explicitly testing $H_0$: $\gamma = 0$, but we are maintaining full independence of $c_i$ and $x_i$ under $H_0$. We have enough assumptions to derive $Var(y_i|x_i)$, the $T \times T$ conditional variance matrix of $y_i$ given $x_i$ under $H_0$. First,

$$Var(y_{it}|x_i) = E[Var(y_{it}|x_i,c_i)|x_i] + Var[E(y_{it}|x_i,c_i)|x_i] = E[c_i\exp(x_{it}\beta)|x_i] + Var[c_i\exp(x_{it}\beta)|x_i] = \exp(\alpha + x_{it}\beta) + \tau^2[\exp(x_{it}\beta)]^2,$$

where $\tau^2 \equiv Var(c_i)$ and we have used $E(c_i|x_i) = \exp(\alpha)$ under $H_0$. A similar, general expression holds for conditional covariances:

$$Cov(y_{it},y_{ir}|x_i) = E[Cov(y_{it},y_{ir}|x_i,c_i)|x_i] + Cov[E(y_{it}|x_i,c_i),E(y_{ir}|x_i,c_i)|x_i] = 0 + Cov[c_i\exp(x_{it}\beta),c_i\exp(x_{ir}\beta)|x_i] = \tau^2\exp(x_{it}\beta)\exp(x_{ir}\beta).$$

So, under $H_0$, $Var(y_i|x_i)$ depends on $\alpha$, $\beta$, and $\tau^2$, all of which we can estimate. It is natural to use a score test of $H_0$: $\gamma = 0$. First, obtain consistent estimators $\tilde{\alpha}$, $\tilde{\beta}$ by, say, pooled Poisson QMLE. Let $\tilde{y}_{it} = \exp(\tilde{\alpha} + x_{it}\tilde{\beta})$ and $\tilde{u}_{it} = y_{it} - \tilde{y}_{it}$. A consistent estimator of $\tau^2$ can be obtained from a simple pooled regression, through the origin, of

$$\tilde{u}_{it}^2 - \tilde{y}_{it} \text{ on } [\exp(x_{it}\tilde{\beta})]^2, \quad t = 1,...,T;\ i = 1,...,N.$$

Call this estimator $\tilde{\tau}^2$. This works because, under $H_0$, $E(u_{it}^2|x_i) = E(u_{it}^2|x_{it}) = \exp(\alpha + x_{it}\beta) + \tau^2[\exp(x_{it}\beta)]^2$, where $u_{it} \equiv y_{it} - E(y_{it}|x_{it})$. [We could also use the many covariance terms in estimating $\tau^2$ because $\tau^2 = E\{[u_{it}/\exp(x_{it}\beta)][u_{ir}/\exp(x_{ir}\beta)]\}$, all $t \neq r$.]

Next, we construct the $T \times T$ weighting matrix for observation i, as in Section 19.6.3; see also Problem 12.11. The matrix $W_i(\tilde{\delta}) = W(x_i,\tilde{\delta})$ has diagonal elements $\tilde{y}_{it} + \tilde{\tau}^2[\exp(x_{it}\tilde{\beta})]^2$, $t = 1,...,T$, and off-diagonal elements $\tilde{\tau}^2\exp(x_{it}\tilde{\beta})\exp(x_{ir}\tilde{\beta})$, $t \neq r$. Let $\tilde{\alpha}$, $\tilde{\beta}$ be the solutions to

$$\min_{\alpha,\beta}\ (1/2)\sum_{i=1}^N [y_i - m(x_i,\alpha,\beta)]'[W_i(\tilde{\delta})]^{-1}[y_i - m(x_i,\alpha,\beta)],$$

where $m(x_i,\alpha,\beta)$ has $t$th element $\exp(\alpha + x_{it}\beta)$. Since $Var(y_i|x_i) = W(x_i,\delta)$, this is a MWNLS estimation problem with a correctly specified conditional variance matrix.
Therefore, as shown in Problem 12.1, the conditional information matrix equality holds. To obtain the score test in the context of MWNLS, we need the score of the conditional mean function, with respect to all parameters, evaluated under $H_0$. Then, we can apply equation (12.69).

Let $\theta \equiv (\alpha,\beta',\gamma')'$ denote the full vector of conditional mean parameters, where we want to test $H_0$: $\gamma = 0$. The unrestricted conditional mean function, for each t, is

$$m_t(x_i,\theta) = \exp(\alpha + x_{it}\beta + \bar{x}_i\gamma).$$

Taking the gradient and evaluating it under $H_0$ gives

$$\nabla_\theta m_t(x_i,\tilde{\theta}) = \exp(\tilde{\alpha} + x_{it}\tilde{\beta})[1,x_{it},\bar{x}_i],$$

which would be $1 \times (1 + 2K)$ without any redundancies in $\bar{x}_i$. Usually, $x_{it}$ would contain year dummies or other aggregate effects, and these would be dropped from $\bar{x}_i$; we do not make that explicit here. Let $\nabla_\theta M(x_i,\tilde{\theta})$ denote the $T \times (1 + 2K)$ matrix obtained from stacking the $\nabla_\theta m_t(x_i,\tilde{\theta})$ from $t = 1,...,T$. Then the score function, evaluated at the null estimates $\tilde{\theta} \equiv (\tilde{\alpha},\tilde{\beta}',\tilde{\gamma}')'$ with $\tilde{\gamma} = 0$, is

$$s_i(\tilde{\theta}) = -\nabla_\theta M(x_i,\tilde{\theta})'[W_i(\tilde{\delta})]^{-1}\tilde{u}_i,$$

where $\tilde{u}_i$ is the $T \times 1$ vector with elements $\tilde{u}_{it} \equiv y_{it} - \exp(\tilde{\alpha} + x_{it}\tilde{\beta})$. The estimated conditional Hessian, under $H_0$, is

$$\tilde{A} = N^{-1}\sum_{i=1}^N \nabla_\theta M(x_i,\tilde{\theta})'[W_i(\tilde{\delta})]^{-1}\nabla_\theta M(x_i,\tilde{\theta}),$$

a $(1 + 2K) \times (1 + 2K)$ matrix. The score or LM statistic is therefore

$$LM = \left[\sum_{i=1}^N \nabla_\theta M(x_i,\tilde{\theta})'[W_i(\tilde{\delta})]^{-1}\tilde{u}_i\right]'\left[\sum_{i=1}^N \nabla_\theta M(x_i,\tilde{\theta})'[W_i(\tilde{\delta})]^{-1}\nabla_\theta M(x_i,\tilde{\theta})\right]^{-1}\left[\sum_{i=1}^N \nabla_\theta M(x_i,\tilde{\theta})'[W_i(\tilde{\delta})]^{-1}\tilde{u}_i\right].$$

Under $H_0$, and the full set of maintained assumptions, $LM \overset{a}{\sim} \chi^2_K$. If only $J < K$ elements of $\bar{x}_i$ are included, then the degrees of freedom gets reduced to J.

In practice, we might want a robust form of the test that does not require $Var(y_i|x_i) = W(x_i,\delta)$ under $H_0$, where $W(x_i,\delta)$ is the matrix described above. This variance matrix was derived under pretty restrictive assumptions. A fully robust form is given in equation (12.68), where $s_i(\tilde{\theta})$ and $\tilde{A}$ are as given above, and $\tilde{B} = N^{-1}\sum_{i=1}^N s_i(\tilde{\theta})s_i(\tilde{\theta})'$.
Since the restrictions are written as $\gamma = 0$, we take $c(\theta) = \gamma$, and so $\tilde{C} = [0|I_K]$, where the zero matrix is $K \times (1 + K)$.

c. If we assume (19.60), (19.61) and $c_i = a_i\exp(\alpha + \bar{x}_i\gamma)$ where $a_i|x_i \sim$ Gamma$(\delta,\delta)$, then things are even easier -- at least if we have software that estimates random effects Poisson models. Under these assumptions, we have

$$y_{it}|x_i,a_i \sim \text{Poisson}[a_i\exp(\alpha + x_{it}\beta + \bar{x}_i\gamma)]$$
$$y_{it}, y_{ir} \text{ are independent conditional on } (x_i,a_i),\ t \neq r$$
$$a_i|x_i \sim \text{Gamma}(\delta,\delta).$$

In other words, the full set of random effects Poisson assumptions holds, but where the mean function in the Poisson distribution is $a_i\exp(\alpha + x_{it}\beta + \bar{x}_i\gamma)$. In practice, we just add the (nonredundant elements of) $\bar{x}_i$ in each time period, along with a constant and $x_{it}$, and carry out a random effects Poisson analysis. We can test $H_0$: $\gamma = 0$ using the LR, Wald, or score approaches. Any of these would be asymptotically efficient. But none is robust because we have used a full distribution for $y_i$ given $x_i$.

19.7. a. First, for each t, the density of $y_{it}$ given $(x_i = x, c_i = c)$ is

$$f(y_t|x,c;\beta_o) = \exp[-c\cdot m(x_t,\beta_o)][c\cdot m(x_t,\beta_o)]^{y_t}/y_t!, \quad y_t = 0,1,2,....$$

Multiplying these together gives the joint density of $(y_{i1},...,y_{iT})$ given $(x_i = x, c_i = c)$. Taking the log, plugging in the observed data for observation i, and dropping the factorial term gives

$$\ell_i(c_i,\beta) = \sum_{t=1}^T \{-c_i m(x_{it},\beta) + y_{it}[\log(c_i) + \log(m(x_{it},\beta))]\}.$$

b. Taking the derivative of $\ell_i(c_i,\beta)$ with respect to $c_i$, setting the result to zero, and rearranging gives

$$(n_i/c_i) = \sum_{t=1}^T m(x_{it},\beta),$$

where $n_i \equiv \sum_{t=1}^T y_{it}$. Letting $c_i(\beta)$ denote the solution as a function of $\beta$, we have $c_i(\beta) = n_i/M_i(\beta)$, where $M_i(\beta) \equiv \sum_{t=1}^T m(x_{it},\beta)$. The second order sufficient condition for a maximum is easily seen to hold.

c.
Plugging the solution from part b into $\ell_i(c_i,\beta)$ gives

$$\ell_i[c_i(\beta),\beta] = -[n_i/M_i(\beta)]M_i(\beta) + \sum_{t=1}^T y_{it}\{\log[n_i/M_i(\beta)] + \log[m(x_{it},\beta)]\}$$
$$= -n_i + n_i\log(n_i) + \sum_{t=1}^T y_{it}\log[m(x_{it},\beta)/M_i(\beta)]$$
$$= \sum_{t=1}^T y_{it}\log[p_t(x_i,\beta)] + n_i[\log(n_i) - 1],$$

because $p_t(x_i,\beta) = m(x_{it},\beta)/M_i(\beta)$ [see equation (19.66)].

d. From part c it follows that if we maximize $\sum_{i=1}^N \ell_i(c_i,\beta)$ with respect to $(c_1,...,c_N)$ -- that is, we concentrate out these parameters -- we get exactly $\sum_{i=1}^N \ell_i[c_i(\beta),\beta]$. But, except for the term $\sum_{i=1}^N n_i[\log(n_i) - 1]$ -- which does not depend on $\beta$ -- this is exactly the conditional log likelihood for the conditional multinomial distribution obtained in Section 19.6.4. Therefore, this is another case where treating the $c_i$ as parameters to be estimated leads us to a $\sqrt{N}$-consistent, asymptotically normal estimator of $\beta_o$.

19.9. I will use the following Stata output. I first converted the dependent variable to be in [0,1], rather than [0,100]. This is required to easily use the "glm" command in Stata.

. replace atndrte = atndrte/100
(680 real changes made)

. reg atndrte ACT priGPA frosh soph

      Source |       SS       df       MS              Number of obs =     680
-------------+------------------------------           F(  4,   675) =   72.92
       Model |  5.95396289     4  1.48849072           Prob > F      =  0.0000
    Residual |  13.7777696   675  .020411511           R-squared     =  0.3017
-------------+------------------------------           Adj R-squared =  0.2976
       Total |  19.7317325   679  .029059989           Root MSE      =  .14287

------------------------------------------------------------------------------
     atndrte |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         ACT |  -.0169202    .001681   -10.07   0.000    -.0202207   -.0136196
      priGPA |   .1820163   .0112156    16.23   0.000     .1599947    .2040379
       frosh |   .0517097   .0173019     2.99   0.003     .0177377    .0856818
        soph |   .0110085    .014485     0.76   0.448    -.0174327    .0394496
       _cons |   .7087769   .0417257    16.99   0.000     .6268492    .7907046
------------------------------------------------------------------------------

. predict atndrteh
(option xb assumed; fitted values)

. sum atndrteh

    Variable |     Obs        Mean   Std. Dev.       Min        Max
-------------+-----------------------------------------------------
    atndrteh |     680    .8170956   .0936415   .4846666   1.086443

. count if atndrteh > 1
   12

. glm atndrte ACT priGPA frosh soph, family(binomial) sca(x2)
note: atndrte has non-integer values

Iteration 0:   log likelihood = -226.64509
Iteration 1:   log likelihood = -223.64983
Iteration 2:   log likelihood = -223.64937
Iteration 3:   log likelihood = -223.64937

Generalized linear models                          No. of obs      =       680
Optimization     : ML: Newton-Raphson              Residual df     =       675
                                                   Scale param     =         1
Deviance         =  285.7371358                    (1/df) Deviance =  .4233143
Pearson          =  85.57283238                    (1/df) Pearson  =  .1267746

Variance function: V(u) = u*(1-u)                  [Bernoulli]
Link function    : g(u) = ln(u/(1-u))              [Logit]
Standard errors  : OIM

Log likelihood   = -223.6493665                    AIC             =  .6724981
BIC              =  253.1266718

------------------------------------------------------------------------------
     atndrte |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         ACT |  -.1113802   .0113217    -9.84   0.000    -.1335703   -.0891901
      priGPA |   1.244375   .0771321    16.13   0.000     1.093199    1.395552
       frosh |   .3899318    .113436     3.44   0.001     .1676013    .6122622
        soph |   .0928127   .0944066     0.98   0.326    -.0922209    .2778463
       _cons |   .7621699   .2859966     2.66   0.008      .201627    1.322713
------------------------------------------------------------------------------
(Standard errors scaled using square root of Pearson X2-based dispersion)

. di (.1268)^2
.01607824

. di exp(.7622 - .1114*30 + 1.244*3)/(1 + exp(.7622 - .1114*30 + 1.244*3))
.75991253

. di exp(.7622 - .1114*25 + 1.244*3)/(1 + exp(.7622 - .1114*25 + 1.244*3))
.84673249

. di .847 - .760
.087

. predict atndh
(option mu assumed; predicted mean atndrte)

. sum atndh

    Variable |     Obs        Mean   Std. Dev.       Min        Max
-------------+-----------------------------------------------------
       atndh |     680    .8170956   .0965356   .3499525   .9697185

. corr atndrte atndh
(obs=680)

             |  atndrte    atndh
-------------+------------------
     atndrte |   1.0000
       atndh |   0.5725   1.0000

. di (.5725)^2
.32775625

a. The coefficient on ACT means that if the ACT score increases by 5 points -- more than a one standard deviation increase -- then the attendance rate is estimated to fall by about .017(5) = .085, or 8.5 percentage points. The coefficient on priGPA means that if prior GPA is one point higher, the attendance rate is predicted to be about .182 higher, or 18.2 percentage points. Naturally, these changes do not always make sense when starting at extreme values of atndrte. There are 12 fitted values greater than one; none less than zero.

b. The GLM standard errors are given in the output. Note that $\hat{\sigma}^2 \approx .127$ (the Pearson statistic divided by its degrees of freedom), so $\hat{\sigma} \approx .356$. In other words, the usual MLE standard errors, obtained, say, from the expected Hessian of the quasi-log likelihood, are much too large. The standard errors that account for $\sigma^2 < 1$ are given by the GLM output. (If you omit the "sca(x2)" option in the "glm" command, you will get the usual MLE standard errors.)

c. Since the coefficient on ACT is negative, we know that an increase in ACT score, holding year and prior GPA fixed, actually reduces predicted attendance rate. The calculation shows that when ACT increases from 25 to 30, the estimated fall in atndrte is about .087, or about 8.7 percentage points. This is very similar to that found using the linear model.

d. The R-squared for the linear model is about .302. For the logistic functional form, I computed the squared correlation between $atndrte_i$ and $\hat{E}(atndrte_i|x_i)$. This R-squared is about .328, and so the logistic functional form does fit better than the linear model. And, remember that the parameters in the logistic functional form are not chosen to maximize an R-squared.
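The hand calculations behind parts b through d of Problem 19.9 can be replayed outside Stata. The sketch below is illustrative only: it hard-codes the rounded coefficients used in the "di" commands (.7622, -.1114, 1.244), sets priGPA = 3 with frosh = soph = 0 as in those commands, and uses hypothetical helper names (logistic, fitted_rate).

```python
import math

def logistic(z):
    """Logistic function exp(z) / (1 + exp(z))."""
    return math.exp(z) / (1.0 + math.exp(z))

# Rounded GLM (quasi-logit) coefficients, as typed into the "di" commands.
b_cons, b_act, b_prigpa = 0.7622, -0.1114, 1.244

def fitted_rate(act, prigpa):
    """Predicted attendance rate with frosh = soph = 0."""
    return logistic(b_cons + b_act * act + b_prigpa * prigpa)

rate_act25 = fitted_rate(25, 3.0)
rate_act30 = fitted_rate(30, 3.0)
estimated_fall = rate_act25 - rate_act30   # part c

sigma_hat = math.sqrt(0.1267746)           # square root of (1/df) Pearson
r_squared_logit = 0.5725 ** 2              # squared correlation, part d

print(round(rate_act25, 4), round(rate_act30, 4), round(estimated_fall, 3))
print(round(sigma_hat, 3), round(r_squared_logit, 3))
```

Because the coefficients are rounded, the fitted rates agree with the Stata "di" results only up to the digits those commands carry.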
SOLUTIONS TO CHAPTER 20 PROBLEMS

20.1. To be added.

20.3. a. If all durations in the sample are censored, $d_i = 0$ for all $i$, and so the log-likelihood is

$$\sum_{i=1}^{N} \log[1 - F(t_i \mid x_i;\theta)] = \sum_{i=1}^{N} \log[1 - F(c_i \mid x_i;\theta)].$$

b. For the Weibull case, $F(t \mid x_i;\theta) = 1 - \exp[-\exp(x_i\beta)t^{\alpha}]$, and so the log-likelihood is $-\sum_{i=1}^{N} \exp(x_i\beta)c_i^{\alpha}$.

c. Without covariates, the Weibull log-likelihood with complete censoring is $-\exp(\beta)\sum_{i=1}^{N} c_i^{\alpha}$. Since $c_i > 0$, we have $\sum_{i=1}^{N} c_i^{\alpha} > 0$ for any $\alpha > 0$. But then, for any $\alpha > 0$, the log-likelihood is maximized by minimizing $\exp(\beta)$ across $\beta$. But as $\beta \to -\infty$, $\exp(\beta) \to 0$. So plugging any value $\alpha$ into the log-likelihood will lead to $\beta$ getting more and more negative without bound. So no two real numbers for $\alpha$ and $\beta$ maximize the log likelihood.

d. It is not possible to estimate duration models from flow data when all durations are right censored.

20.5. a. For $t > b - a_i$,

$$P(t_i^* < t \mid x_i,a_i,c_i,s_i = 1) = P(t_i^* < t \mid x_i, t_i^* > b - a_i) = \frac{P(t_i^* < t,\ t_i^* > b - a_i \mid x_i)}{P(t_i^* > b - a_i \mid x_i)} = \frac{P(b - a_i < t_i^* < t \mid x_i)}{P(t_i^* > b - a_i \mid x_i)} = \frac{F(t \mid x_i) - F(b - a_i \mid x_i)}{1 - F(b - a_i \mid x_i)}.$$

b. The derivative of the cdf in part a, with respect to $t$, is simply $f(t \mid x_i)/[1 - F(b - a_i \mid x_i)]$.

c. $P(t_i = c_i \mid x_i,a_i,c_i,s_i = 1) = P(t_i^* > c_i \mid x_i, t_i^* > b - a_i) = P(t_i^* > c_i \mid x_i)/P(t_i^* > b - a_i \mid x_i)$ (because $c_i > b - a_i$) $= [1 - F(c_i \mid x_i)]/[1 - F(b - a_i \mid x_i)]$.

20.7. a. We suppress the parameters in the densities. First, by (20.22) and $D(a_i \mid c_i,x_i) = D(a_i \mid x_i)$, the density of $(a_i,t_i^*)$ given $(c_i,x_i)$ does not depend on $c_i$ and is given by $k(a \mid x_i)f(t \mid x_i)$ for $0 < a < b$ and $0 < t < \infty$. This is also the conditional density of $(a_i,t_i)$ given $(c_i,x_i)$ when $t < c_i$, that is, the observation is uncensored. For $t = c_i$, the density is $k(a \mid x_i)[1 - F(c_i \mid x_i)]$, by the usual right censoring argument. Now, the probability of observing the random draw $(a_i,c_i,x_i,t_i)$, conditional on $x_i$, is $P(t_i^* > b - a_i \mid x_i)$, which is exactly (20.32). From the standard result for densities for truncated distributions, the density of $(a_i,t_i)$ given $(c_i,d_i,x_i)$ and $s_i = 1$ is

$$k(a \mid x_i)[f(t \mid x_i)]^{d_i}[1 - F(c_i \mid x_i)]^{(1 - d_i)}/P(s_i = 1 \mid x_i),$$

for all combinations $(a,t)$ such that $s_i = 1$. Putting in the parameters and taking the log gives (20.56).

b. We have the usual tradeoff between robustness and efficiency. Using the log likelihood (20.56) results in more efficient estimators provided we have the two densities correctly specified; (20.30) requires us to only specify $f(\cdot \mid x_i)$.

20.9. To be added.

20.11. To be added.
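The argument in Problem 20.3(c) and the truncated-distribution formulas in Problem 20.5 can be checked numerically. The following Python sketch uses made-up censoring times and parameter values (all hypothetical, chosen only for illustration); the function names are likewise assumptions of this sketch, not notation from the text.

```python
import numpy as np

def weibull_cdf(t, alpha, beta):
    """Weibull cdf without covariates: F(t) = 1 - exp[-exp(beta) * t**alpha]."""
    return 1.0 - np.exp(-np.exp(beta) * t ** alpha)

def censored_loglik(alpha, beta, c):
    """Log-likelihood when every duration is right censored at c_i:
    sum of log[1 - F(c_i)] = -exp(beta) * sum(c_i**alpha)."""
    c = np.asarray(c, dtype=float)
    return -np.exp(beta) * np.sum(c ** alpha)

def truncated_cdf(t, lower, alpha, beta):
    """Cdf of t* given t* > lower (lower plays the role of b - a_i), for t > lower."""
    F_t = weibull_cdf(t, alpha, beta)
    F_lo = weibull_cdf(lower, alpha, beta)
    return (F_t - F_lo) / (1.0 - F_lo)

# Hypothetical censoring times and a fixed alpha > 0.
c = np.array([0.5, 1.0, 2.0, 4.0])
alpha = 1.3
betas = np.array([0.0, -2.0, -5.0, -10.0, -20.0])

# The log-likelihood keeps rising toward zero as beta becomes more negative.
logliks = np.array([censored_loglik(alpha, b, c) for b in betas])
print(logliks)

# Problem 20.5(c): the censoring probability equals one minus the truncated cdf at c_i.
lower, c_i, beta0 = 0.8, 2.5, -0.5
p_censored = (1.0 - weibull_cdf(c_i, alpha, beta0)) / (1.0 - weibull_cdf(lower, alpha, beta0))
p_from_cdf = 1.0 - truncated_cdf(c_i, lower, alpha, beta0)
```

Because `logliks` is strictly increasing along the decreasing grid of `betas` and never reaches zero, no finite value of beta attains the supremum, which is the point of part (c).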