Econometrics: Solutions to Analytical Exercises - Fumio Hayashi

Nov. 22, 2003, revised Dec. 27, 2003

Hayashi Econometrics: Solution to Chapter 1 Analytical Exercises

1. (Reproducing the answer on p. 84 of the book)
$$
\begin{aligned}
(y - X\beta)'(y - X\beta)
&= [(y - Xb) + X(b - \beta)]'[(y - Xb) + X(b - \beta)] \quad \text{(by the add-and-subtract strategy)}\\
&= (y - Xb)'(y - Xb) + (b - \beta)'X'(y - Xb) + (y - Xb)'X(b - \beta) + (b - \beta)'X'X(b - \beta)\\
&= (y - Xb)'(y - Xb) + 2(b - \beta)'X'(y - Xb) + (b - \beta)'X'X(b - \beta)\\
&\qquad \text{(since } (b - \beta)'X'(y - Xb) = (y - Xb)'X(b - \beta)\text{)}\\
&= (y - Xb)'(y - Xb) + (b - \beta)'X'X(b - \beta) \quad \text{(since } X'(y - Xb) = 0 \text{ by the normal equations)}\\
&\ge (y - Xb)'(y - Xb)
\end{aligned}
$$
(since $(b - \beta)'X'X(b - \beta) = z'z = \sum_{i=1}^{n} z_i^2 \ge 0$, where $z \equiv X(b - \beta)$).

2. (a), (b) If $X$ is an $n \times K$ matrix of full column rank, then $X'X$ is symmetric and invertible. It is very straightforward to show (and indeed you have been asked to show in the text) that $M_X \equiv I_n - X(X'X)^{-1}X'$ is symmetric and idempotent and that $M_X X = 0$. In this question, set $X = \mathbf{1}$ (the vector of ones).

(c)
$$
M_{\mathbf{1}}\,y = [I_n - \mathbf{1}(\mathbf{1}'\mathbf{1})^{-1}\mathbf{1}']\,y
= y - \tfrac{1}{n}\mathbf{1}\mathbf{1}'y \quad \text{(since } \mathbf{1}'\mathbf{1} = n\text{)}
= y - \mathbf{1}\,\tfrac{1}{n}\sum_{i=1}^{n} y_i = y - \mathbf{1}\cdot\bar y .
$$

(d) Replace "$y$" by "$X$" in (c).

3. Special case of the solution to the next exercise.

4. (a) From the normal equations (1.2.3) of the text, we obtain
$$
\begin{bmatrix} X_1' \\ X_2' \end{bmatrix}
\begin{bmatrix} X_1 & X_2 \end{bmatrix}
\begin{bmatrix} b_1 \\ b_2 \end{bmatrix}
=
\begin{bmatrix} X_1' \\ X_2' \end{bmatrix} y .
$$
Using the rules of multiplication of partitioned matrices, it is straightforward to derive ($*$) and ($**$) from the above.

(b) By premultiplying both sides of ($*$) in the question by $X_1(X_1'X_1)^{-1}$, we obtain
$$
X_1(X_1'X_1)^{-1}X_1'X_1 b_1 = -X_1(X_1'X_1)^{-1}X_1'X_2 b_2 + X_1(X_1'X_1)^{-1}X_1'y
\;\Longleftrightarrow\;
X_1 b_1 = -P_1 X_2 b_2 + P_1 y .
$$
Substitution of this into ($**$) yields
$$
\begin{aligned}
X_2'(-P_1 X_2 b_2 + P_1 y) + X_2'X_2 b_2 = X_2'y
&\;\Longleftrightarrow\; X_2'(I - P_1)X_2 b_2 = X_2'(I - P_1)y\\
&\;\Longleftrightarrow\; X_2'M_1 X_2 b_2 = X_2'M_1 y\\
&\;\Longleftrightarrow\; X_2'M_1'M_1 X_2 b_2 = X_2'M_1'M_1 y \quad \text{(since $M_1$ is symmetric and idempotent)}\\
&\;\Longleftrightarrow\; \widetilde X_2'\widetilde X_2\, b_2 = \widetilde X_2'\widetilde y .
\end{aligned}
$$
Therefore,
$$
b_2 = (\widetilde X_2'\widetilde X_2)^{-1}\widetilde X_2'\widetilde y .
$$
(The matrix $\widetilde X_2'\widetilde X_2$ is invertible because $\widetilde X_2$ is of full column rank. To see that $\widetilde X_2$ is of full column rank, suppose not. Then there exists a non-zero vector $c$ such that $\widetilde X_2 c = 0$. But $\widetilde X_2 c = X_2 c - X_1 d$, where $d \equiv (X_1'X_1)^{-1}X_1'X_2 c$. That is, $X\pi = 0$ for $\pi \equiv \begin{bmatrix} -d \\ c \end{bmatrix}$. This is a contradiction because $X = [X_1 \; X_2]$ is of full column rank and $\pi \ne 0$.)

(c) By premultiplying both sides of $y = X_1 b_1 + X_2 b_2 + e$ by $M_1$, we obtain
$$
M_1 y = M_1 X_1 b_1 + M_1 X_2 b_2 + M_1 e .
$$
Since $M_1 X_1 = 0$ and $\widetilde y \equiv M_1 y$, the above equation can be rewritten as
$$
\widetilde y = M_1 X_2 b_2 + M_1 e = \widetilde X_2 b_2 + M_1 e .
$$
Here $M_1 e = e$, because
$$
M_1 e = (I - P_1)e = e - P_1 e = e - X_1(X_1'X_1)^{-1}X_1'e = e
\quad \text{(since $X_1'e = 0$ by the normal equations)}.
$$

(d) From (b), we have
$$
b_2 = (\widetilde X_2'\widetilde X_2)^{-1}\widetilde X_2'\widetilde y
= (\widetilde X_2'\widetilde X_2)^{-1}X_2'M_1'M_1 y
= (\widetilde X_2'\widetilde X_2)^{-1}\widetilde X_2'y .
$$
Therefore, $b_2$ is the OLS coefficient estimator for the regression of $y$ on $\widetilde X_2$. The residual vector from that regression is
$$
y - \widetilde X_2 b_2
= (y - \widetilde y) + (\widetilde y - \widetilde X_2 b_2)
= (y - M_1 y) + e \quad \text{(by (c))}
= P_1 y + e .
$$
This does not equal $e$ because $P_1 y$ is not necessarily zero. The SSR from the regression of $y$ on $\widetilde X_2$ can be written as
$$
(y - \widetilde X_2 b_2)'(y - \widetilde X_2 b_2) = (P_1 y + e)'(P_1 y + e) = (P_1 y)'(P_1 y) + e'e
\quad \text{(since $P_1 e = X_1(X_1'X_1)^{-1}X_1'e = 0$)}.
$$
This does not equal $e'e$ if $P_1 y$ is not zero.

(e) From (c), $\widetilde y = \widetilde X_2 b_2 + e$. So
$$
\widetilde y'\widetilde y = (\widetilde X_2 b_2 + e)'(\widetilde X_2 b_2 + e) = b_2'\widetilde X_2'\widetilde X_2 b_2 + e'e
\quad \text{(since $\widetilde X_2'e = 0$)}.
$$
Since $b_2 = (\widetilde X_2'\widetilde X_2)^{-1}\widetilde X_2'\widetilde y$, we have
$$
b_2'\widetilde X_2'\widetilde X_2 b_2 = \widetilde y'\widetilde X_2(X_2'M_1 X_2)^{-1}\widetilde X_2'\widetilde y .
$$

(f) (i) Let $\widetilde b_1$ be the OLS coefficient estimator for the regression of $\widetilde y$ on $X_1$. Then
$$
\widetilde b_1 = (X_1'X_1)^{-1}X_1'\widetilde y = (X_1'X_1)^{-1}X_1'M_1 y = (X_1'X_1)^{-1}(M_1 X_1)'y = 0
\quad \text{(since $M_1 X_1 = 0$)}.
$$
So $SSR_1 = (\widetilde y - X_1\widetilde b_1)'(\widetilde y - X_1\widetilde b_1) = \widetilde y'\widetilde y$.
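Parts (b)-(d) above are the Frisch-Waugh result: the coefficient on $X_2$ from the full regression can be obtained by regressing $\widetilde y \equiv M_1 y$ on $\widetilde X_2 \equiv M_1 X_2$, and the residuals coincide. Before continuing with parts (f)(ii)-(iii), here is a minimal numerical check of that claim. It is not part of the original solutions; it assumes numpy and uses simulated data with arbitrary dimensions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, K1, K2 = 200, 2, 3
X1 = rng.normal(size=(n, K1))
X2 = rng.normal(size=(n, K2))
X = np.hstack([X1, X2])
y = X @ rng.normal(size=K1 + K2) + rng.normal(size=n)

# full regression: b = (X'X)^{-1} X'y, partitioned into (b1, b2)
b = np.linalg.solve(X.T @ X, X.T @ y)
b2_full = b[K1:]

# annihilator M1 = I_n - X1 (X1'X1)^{-1} X1'
M1 = np.eye(n) - X1 @ np.linalg.solve(X1.T @ X1, X1.T)
y_tilde, X2_tilde = M1 @ y, M1 @ X2

# short regression of M1*y on M1*X2
b2_short = np.linalg.solve(X2_tilde.T @ X2_tilde, X2_tilde.T @ y_tilde)
print(np.allclose(b2_full, b2_short))     # True: same coefficient on X2 (parts (b), (d))

# the residual vectors coincide as well (part (c))
e_full = y - X @ b
e_short = y_tilde - X2_tilde @ b2_short
print(np.allclose(e_full, e_short))       # True
```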
(ii) Since the residual vector from the regression of $\widetilde y$ on $\widetilde X_2$ equals $e$ by (c), $SSR_2 = e'e$.

(iii) From the Frisch-Waugh Theorem, the residuals from the regression of $\widetilde y$ on $X_1$ and $\widetilde X_2$ equal those from the regression of $M_1\widetilde y$ ($= \widetilde y$) on $M_1\widetilde X_2$ ($= \widetilde X_2$). So $SSR_3 = e'e$.

5. (a) The hint is as good as the answer.

(b) Let $\widetilde\varepsilon \equiv y - X\widetilde\beta$, the residuals from the restricted regression. By using the add-and-subtract strategy, we obtain
$$
\widetilde\varepsilon \equiv y - X\widetilde\beta = (y - Xb) + X(b - \widetilde\beta) .
$$
So
$$
SSR_R = [(y - Xb) + X(b - \widetilde\beta)]'[(y - Xb) + X(b - \widetilde\beta)]
= (y - Xb)'(y - Xb) + (b - \widetilde\beta)'X'X(b - \widetilde\beta)
\quad \text{(since $X'(y - Xb) = 0$)}.
$$
But $SSR_U = (y - Xb)'(y - Xb)$, so
$$
\begin{aligned}
SSR_R - SSR_U &= (b - \widetilde\beta)'X'X(b - \widetilde\beta)\\
&= (Rb - r)'[R(X'X)^{-1}R']^{-1}(Rb - r) \quad \text{(using the expression for $\widetilde\beta$ from (a))}\\
&= \widetilde\lambda' R(X'X)^{-1}R'\widetilde\lambda \quad \text{(using the expression for $\widetilde\lambda$ from (a))}\\
&= \widetilde\varepsilon' X(X'X)^{-1}X'\widetilde\varepsilon \quad \text{(by the first-order conditions that $X'(y - X\widetilde\beta) = R'\widetilde\lambda$)}\\
&= \widetilde\varepsilon' P\widetilde\varepsilon .
\end{aligned}
$$

(c) The $F$-ratio is defined as
$$
F \equiv \frac{(Rb - r)'[R(X'X)^{-1}R']^{-1}(Rb - r)/r}{s^2}
\qquad \text{(where $r = \#r$)} .
\tag{1.4.9}
$$
Since $(Rb - r)'[R(X'X)^{-1}R']^{-1}(Rb - r) = SSR_R - SSR_U$ as shown above, the $F$-ratio can be rewritten as
$$
F = \frac{(SSR_R - SSR_U)/r}{s^2}
= \frac{(SSR_R - SSR_U)/r}{e'e/(n - K)}
= \frac{(SSR_R - SSR_U)/r}{SSR_U/(n - K)} .
$$
Therefore, (1.4.9) $=$ (1.4.11).

6. (a) Unrestricted model: $y = X\beta + \varepsilon$, where
$$
y = \begin{bmatrix} y_1 \\ \vdots \\ y_n \end{bmatrix}
\;(n \times 1), \qquad
X = \begin{bmatrix} 1 & x_{12} & \cdots & x_{1K} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & x_{n2} & \cdots & x_{nK} \end{bmatrix}
\;(n \times K), \qquad
\beta = \begin{bmatrix} \beta_1 \\ \vdots \\ \beta_K \end{bmatrix}
\;(K \times 1).
$$
Restricted model: $y = X\beta + \varepsilon$, $R\beta = r$, where
$$
R = \begin{bmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & & & \ddots & \\ 0 & 0 & \cdots & 0 & 1 \end{bmatrix}
\;((K-1) \times K), \qquad
r = \begin{bmatrix} 0 \\ \vdots \\ 0 \end{bmatrix}
\;((K-1) \times 1).
$$
Obviously, the restricted OLS estimator of $\beta$ is
$$
\widetilde\beta = \begin{bmatrix} \bar y \\ 0 \\ \vdots \\ 0 \end{bmatrix}
\;(K \times 1),
\qquad \text{so} \qquad
X\widetilde\beta = \begin{bmatrix} \bar y \\ \vdots \\ \bar y \end{bmatrix} = \mathbf{1}\cdot\bar y .
$$
(You can use the formula for the restricted OLS estimator derived in the previous exercise, $\widetilde\beta = b - (X'X)^{-1}R'[R(X'X)^{-1}R']^{-1}(Rb - r)$, to verify this.) If $SSR_U$ and $SSR_R$ are the minimized sums of squared residuals from the unrestricted and restricted models, they are calculated as
$$
SSR_R = (y - X\widetilde\beta)'(y - X\widetilde\beta) = \sum_{i=1}^{n}(y_i - \bar y)^2,
\qquad
SSR_U = (y - Xb)'(y - Xb) = e'e = \sum_{i=1}^{n} e_i^2 .
$$
Therefore,
$$
SSR_R - SSR_U = \sum_{i=1}^{n}(y_i - \bar y)^2 - \sum_{i=1}^{n} e_i^2 .
\tag{A}
$$
On the other hand, since $SSR_R - SSR_U = (b - \widetilde\beta)'X'X(b - \widetilde\beta)$ (as shown in Exercise 5(b)), and
$$
(b - \widetilde\beta)'X'X(b - \widetilde\beta) = (Xb - X\widetilde\beta)'(Xb - X\widetilde\beta) = \sum_{i=1}^{n}(\hat y_i - \bar y)^2,
$$
we have
$$
\sum_{i=1}^{n}(y_i - \bar y)^2 - \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n}(\hat y_i - \bar y)^2 .
\tag{B}
$$

(b)
$$
\begin{aligned}
F &= \frac{(SSR_R - SSR_U)/(K-1)}{\sum_{i=1}^{n} e_i^2/(n-K)} \quad \text{(by Exercise 5(c))}\\
&= \frac{\bigl(\sum_{i=1}^{n}(y_i - \bar y)^2 - \sum_{i=1}^{n} e_i^2\bigr)/(K-1)}{\sum_{i=1}^{n} e_i^2/(n-K)} \quad \text{(by equation (A) above)}\\
&= \frac{\sum_{i=1}^{n}(\hat y_i - \bar y)^2/(K-1)}{\sum_{i=1}^{n} e_i^2/(n-K)} \quad \text{(by equation (B) above)}\\
&= \frac{R^2/(K-1)}{(1 - R^2)/(n-K)}
\quad \text{(by dividing numerator and denominator by $\sum_{i=1}^{n}(y_i - \bar y)^2$ and using the definition of $R^2$)}.
\end{aligned}
$$

7. (Reproducing the answer on pp. 84-85 of the book)

(a) $\widehat\beta_{\mathrm{GLS}} - \beta = A\varepsilon$ where $A \equiv (X'V^{-1}X)^{-1}X'V^{-1}$, and $b - \widehat\beta_{\mathrm{GLS}} = B\varepsilon$ where $B \equiv (X'X)^{-1}X' - (X'V^{-1}X)^{-1}X'V^{-1}$. It is straightforward to show that $AVB' = 0$. So
$$
\mathrm{Cov}(\widehat\beta_{\mathrm{GLS}} - \beta,\; b - \widehat\beta_{\mathrm{GLS}})
= \mathrm{Cov}(A\varepsilon,\, B\varepsilon)
= A\,\mathrm{Var}(\varepsilon)\,B'
= \sigma^2 A V B' = 0 .
$$

(b) For the choice of $H$ indicated in the hint,
$$
\mathrm{Var}(\widehat\beta) - \mathrm{Var}(\widehat\beta_{\mathrm{GLS}}) = -C V_q^{-1} C' .
$$
If $C \ne 0$, then there exists a nonzero vector $z$ such that $C'z \equiv v \ne 0$. For such $z$,
$$
z'[\mathrm{Var}(\widehat\beta) - \mathrm{Var}(\widehat\beta_{\mathrm{GLS}})]z = -v' V_q^{-1} v < 0
\quad \text{(since $V_q$ is positive definite)},
$$
which is a contradiction because $\widehat\beta_{\mathrm{GLS}}$ is efficient.
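Exercises 5 and 6 above show that the Wald form (1.4.9) of the $F$-ratio, the SSR form (1.4.11), and the $R^2$ form are numerically identical. The sketch below is not part of the original manual; it uses numpy and simulated data, with the null that all slope coefficients are zero, and confirms the three expressions agree to machine precision.

```python
import numpy as np

rng = np.random.default_rng(1)
n, K = 150, 4                          # intercept plus 3 slopes
X = np.hstack([np.ones((n, 1)), rng.normal(size=(n, K - 1))])
y = X @ np.array([1.0, 0.5, 0.0, -0.3]) + rng.normal(size=n)

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y
e = y - X @ b
SSR_U = e @ e
s2 = SSR_U / (n - K)

# H0: all slopes are zero, i.e. R b = 0 with R = [0 | I_{K-1}]
R = np.hstack([np.zeros((K - 1, 1)), np.eye(K - 1)])
Rb = R @ b

# Wald form (1.4.9)
F_wald = (Rb @ np.linalg.solve(R @ XtX_inv @ R.T, Rb) / (K - 1)) / s2

# SSR form (1.4.11): the restricted fit is just the sample mean
SSR_R = np.sum((y - y.mean()) ** 2)
F_ssr = ((SSR_R - SSR_U) / (K - 1)) / (SSR_U / (n - K))

# R^2 form from Exercise 6(b)
R2 = 1 - SSR_U / SSR_R
F_r2 = (R2 / (K - 1)) / ((1 - R2) / (n - K))

print(np.allclose(F_wald, F_ssr), np.allclose(F_ssr, F_r2))   # True True
```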
by Lemma 2.2 is implied by Assumption 2. this implies zn →p µ. p 1 .Nov. n→∞ n→∞ Therefore.i. Take the expectation of both sides to obtain E[(z n − µ)2 ] = E[(z n − E(z n ))2 ] + 2 E[z n − E(z n )](E(z n ) − µ) + (E(z n ) − µ)2 = Var(z n ) + (E(z n ) − µ)2 (because E[z n − E(z n )] = E(z n ) − E(z n ) = 0). plim zn = 0. (b) Rewrite the OLS estimator as 1 b − β = (X X)−1 X ε = S− xx g.i.d.2(a). So by Kolmogorov’s Second Strong LLN. Prob(|zn | > ε) = So. process with mean zero is mds (martingale differences).1 and 2. As shown in the hint. (z n − µ)2 = (z n − E(z n ))2 + 2(z n − E(z n ))(E(z n ) − µ) + (E(z n ) − µ)2 . Assumption 2.2 {xi } is i.i. 2.2 and 2.5 is implied by Assumptions 2. {xi xi } is i. On the other hand.s..d.5 . lim Var(z n ) = 0).2 imply that gi ≡ xi · εi is i. Since Σxx is invertible by Assumption 2. 1 n−1 · 0 + · n2 = n.i. 2010 Hayashi Econometrics Solution to Chapter 2 Analytical Exercises 1.2 . 2003.i. By Lemma 2. but almost sure convergence implies convergence in probability. 25. (a) Since an i.d. Revised February 23.4. (A) Since by Assumption 2. µ. E(zn ) = which means that limn→∞ E(zn ) = ∞. n n 1 → 0 as n → ∞.3(a) we get −1 1 S− xx → Σxx . we obtain Sxx → Σxx p The convergence is actually almost surely. Since an i. Assumption 2.d. 3. For any ε > 0. zn →m. An ≡ s R Sxx R . S). S− xx →p Σxx . Using the restrictions of the null hypothesis. we have: 1 −1 zn → N (0. with E(gi ) = 0. Lemma 2. where √ 1 2 −1 zn ≡ R S− xx ( n g).Similarly. So by 2 . So xx R] √ √ −1 −1 1 1 R S− SSRR − SSRU = ( n g) S− xx ( n g). xx R (R Sxx R ) Thus √ √ SSRR − SSRU 1 2 −1 −1 1 = ( n g) S− R S− xx R (s R Sxx R ) xx ( n g) 2 s 1 = zn A− n zn . as already noted.4(c). By Kolmogorov’s Second Strong LLN. under Assumption 2. So by Lemma 2. RΣ− xx SΣxx R ). As shown in the solution to Chapter 1 Analytical Exercise 5.4(c). {gi } is i. √ 1 −1 n(b − β ) → N (0. −1 1 S− xx g → Σxx · 0 = 0.2 {gi } is i. By Assumption 2. plimn→∞ (b − β ) = 0 which implies that the OLS estimator b is consistent. i=1 1 −1 Also [R(X X)−1 R]−1 = n· [RS− . Σ− xx S Σxx ).3(a). we obtain g → E(gi ). Rb − r = R(b − β ) = R(X X)−1 X ε 1 = RS− xx g (since b − β = (X X)−1 X ε) 1 n n (where g ≡ xi · εi . p Therefore. √ ng → N (0. The variance of gi equals E(gi gi ) = S since E(gi ) = 0 by Assumption 2. SSRR − SSRU can be written as SSRR − SSRU = (Rb − r) [R(X X)−1 R ]−1 (Rb − r).).3.1 and 2. p which is zero by Assumption 2.3. 5. So by the Lindeberg-Levy CLT.2. Next. d √ ng →d N (0.5. The hint is as good as the answer. By Assumption 2. Thus by Lemma 2. xx As already observed. d 4.i. Rewrite equation(A) above as √ √ 1 n(b − β ) = S− ng .d. we prove that the OLS estimator b is asymptotically normal. plim Sxx = Σxx .d.i. S). d −1 1 Furthermore. Then the second term converges in probability to zero because plim(b − β ) = 0.1-2. (c) Regarding the first term of (∗∗).5.8. A). we have E(ηi |zi ) = 0 by the Law of Iterated Expectation.6. 6. Assumption 2.4 satisfied for (2. For simplicity. Thus E(xi εi zi ) = 0.8. by Kolmogorov’s LLN. (iii) (rank condition) E(xi xi ) is non-singular. (ii) (random sample) {yi .1-2. Collecting all the assumptions made in Section 2. Assumption 2.3 about (2. (b) Note that α − α = (α − α) − (α − α) and use the hint.4 (which are implied by Assumptions (i)-(vi) above) are satisfied for the original regression. the first term of (∗∗) converges in probability to zero. by Lemma 2. the sample mean in that term converges in probability to E(xi εi zi ) provided this population mean exists. Thus by Lemma 2.8). 
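The Chapter 2 exercises above establish that, under the stated assumptions, the OLS estimator is consistent and $\sqrt{n}(b-\beta)$ is asymptotically $N(0,\ \Sigma_{xx}^{-1} S \Sigma_{xx}^{-1})$, with $S$ estimable by $\frac{1}{n}\sum_i e_i^2 x_i x_i'$. The Monte Carlo sketch below is not from the original solutions; it assumes numpy and a made-up conditionally heteroskedastic design, and checks that the heteroskedasticity-robust $t$-ratio is approximately standard normal.

```python
import numpy as np

rng = np.random.default_rng(2)
beta = np.array([1.0, 2.0])
n, reps = 500, 2000
t_stats = []

for _ in range(reps):
    x = np.hstack([np.ones((n, 1)), rng.normal(size=(n, 1))])
    # conditionally heteroskedastic errors: Var(eps | x) depends on the regressor
    eps = rng.normal(size=n) * np.sqrt(0.5 + x[:, 1] ** 2)
    y = x @ beta + eps

    b = np.linalg.solve(x.T @ x, x.T @ y)
    e = y - x @ b
    Sxx_inv = np.linalg.inv(x.T @ x / n)
    S_hat = (x * e[:, None]).T @ (x * e[:, None]) / n     # (1/n) sum e_i^2 x_i x_i'
    avar = Sxx_inv @ S_hat @ Sxx_inv                      # sandwich estimate of Avar(b)
    t_stats.append(np.sqrt(n) * (b[1] - beta[1]) / np.sqrt(avar[1, 1]))

t_stats = np.array(t_stats)
print(t_stats.mean(), t_stats.std())   # roughly 0 and 1: the N(0,1) approximation works
```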
Thus we have shown: zn → z. 3 . By Assumption 2. These conditions together are stronger than Assumptions 2. we assumed in Section 2. Since zi is a function of xi . But E(xi εi zi ) = E[zi · xi · E(εi |zi )].4). the distribution of z A −1 z is chi-squared with #z degrees of freedom.8).3 is satisfied.2.8.2 about (2. (iv) E(ε2 i xi xi ) is non-singular.3 for the regression equation (2. Regarding the second term of (∗∗). So by Proposition 2.8) is Assumption 2.d.7. An →p A. σ 2 = E(ε2 i ).8.1(a) applied to (2. s2 →p σ 2 .7). xi } is i.3(a) (the “Continuous Mapping Theorem”). plim(b − β ) = 0 since b is consistent when Assumptions 2. So the expression for the variance of the limiting distribution above becomes 1 −1 −1 2 RΣ− xx SΣxx R = σ RΣxx R ≡ A. (i) (linearity) yi = xi β + εi . Assumption 2.8). xi } is i.8 that {yi . Sxx →p Σxx . To see that Assumption 2. E(εi |zi ) = 0. n zn → z A d But since Var(z) = A.8) is satisfied by (i) about the original regression. (vi) (parameterized conditional heteroskedasticity) E(ε2 i |xi ) = zi α.8) (that {ε2 i .8) (that E(zi ηi ) = 0) is satisfied. z ∼ N (0. the sample mean in that term converges in probability to E(x2 i zi ) provided this population mean exists.i.8. By (v) (that E(εi |xi ) = 0) and the Law of Iterated Expectations.4(d).1 about the regression equation (2.i. the OLS estimator α is consistent by Proposition 2. Clearly. note first that E(ηi |xi ) = 0 by construction.4 that E(zi zi ) be nonsingular. Therefore. The additional assumption needed for (2.8. (a) We wish to verify Assumptions 2.5)).8.But. Furthermore. d As already observed. Therefore.8. 1 −1 zn A− z.8. as shown in (2. xi } is ergodic stationary) is satisfied by (i) and (ii).1-2. With Assumptions 2. S = σ 2 Σxx under conditional homoekedasticity (Assumption 2.d. Therefore.1-2. (v) (stronger version of orthogonality) E(εi |xi ) = 0 (see (2. 4(b) the second term. . Given the hint. . So by Lemma 2. .(d) Multiplying both sides of (∗) by √ n(α − α) = = 1 n n √ n. i=1 Under Assumptions √ 2. . g2 ] = E[εt−1 E(εt |εt−1 . . . Since xi is i.8. g2 ] =0 (since E(εt |εt−1 . . gt−2 . vanishes. g2 ] (by the Law of Iterated Expectations) = E[E(εt · εt−1 |εt−1 . by (ii) and since zi is a function of xi . . . the only thing to show is 1 −1 1 that the LHS of (∗∗) equals Σ− xx S Σxx . . n 8. . See the hint. . . As shown in (c). that plim n X VX = S.d. εt−2 . n i=1 xi εi zi →p 0. This exercise is about the model in Section 2. . The sample mean can be written as 1 n = = n zi αxi xi i=1 1 n n vi xi xi i=1 (by the definition of vi . So by Lemma 2. . .d. ε1 ) = 0). gt−2 .5 for the original regression (which are implied by Assumptions (i)-(vi) above).i. n(b − β ) converges in distribution to a random variable. As shown in n 1 (c). (b − β ) n i=1 xi zi vanishes provided 2 E(xi zi ) exists and is finite. ε1 )|gt−1 . (by the linearity of conditional expectations) 4 . .4(b) the first term in the brackets vanishes n 1 2 (converges to zero in probability). zi αxi xi is i. εt−2 . . gt−2 . . so we continue to maintain Assumptions (i)(vi) listed in the solution to the previous exercise. . gt−2 . or more specifically. . . 7. provided that E(zi zi ) is non-singular. So its sample mean converges in probability to its population mean E(zi α xi xi ). . n 1 n n zi zi i=1 −1 −1 1 √ n zi · vi i=1 n zi zi i=1 √ 1 −2 n(b − β ) n xi εi zi + i=1 √ n(b − β )· (b − β ) 1 n n x2 i zi . where vi is the i-th diagonal element of V) 1 X VX. . εt−2 .i. . which equals S.1-2. too. . . g2 ) = E[E(gt |εt−1 . . 9. ε1 )|gt−1 . 
Write S as S = E(ε2 i xi xi ) = E[E(ε2 i |xi )xi xi ] = E(zi α xi xi ) (since E(ε2 i |xi ) = zi α by (vi)). . . (a) E(gt |gt−1 . ε1 )|gt−1 . √ n(α − α) vanishes. . εt−2 . Therefore. . .) n 1 Billingsley CLT (see p. 488 of S. . . .   2 t−2  0 for j > 2. (a) Clearly. εt−j −1 . ed. Academic Press. 10. . 1975. y0 .. ε−1 )  εt + θ1 εt−1 + θ2 εt−2 for j = 0. 2 in probability to E(ε2 (d) Since ε2 t is ergodic stationary. . σ ). e. ε1 )εt−1 ] 2 2 2 E(σ εt−1 ) (since E(εt |εt−1 . . But 2 2 2 E(ε2 t−1 ) = E[E(εt−1 |εt−2 . 5 . .(b) 2 2 E(gt ) = E(ε2 t · εt−1 ) 2 = E[E(ε2 t · εt−1 |εt−1 . . 2. . 1). the sequence Yn = φ(Xn . εt−j −1 . Karlin and H. .. (b) E(yt |yt−j . Taylor. . . . εt−2 . = 2. yt−j ) depends on t. ε1 )] (by the Law of Total Expectations) of conditional expectations) = = 2 E[E(ε2 (by the linearity t |εt−1 . . 1 t−1 + θ2 εt−2 =  θ ε for j = 2. A First Course in Stochastic Processes. yt−j ) = 2  θ2 σε    0 So neither E(yt ) nor Cov(yt . . .3 on p. (c) If {εt } is ergodic stationary. . 106 of the text) is applicable to nγ1 = n n t=j +1 gt . ε0 . . . yt−j −1 . . 2nd. εt−2 . . εt−3 . which gives the desired result. γ0 converges t ) = σ . . y−1 ) = E(yt |εt−j .    θ ε for j = 1. . which states that “For any function φ. . Xn+1 . ε0 . ε−1 ) (as noted in the hint) = E(εt + θ1 εt−1 + θ2 εt−2 |εt−j . So by Lemma 2. . As shown in (c). then {εt · εt−1 } is ergodic stationary (see. .g. .4(c) n γ0 →d N (0. ) generates an ergodic stationary process whenever {Xn } √ is ergodic Thus the √ stationary”. . . √ √ γ1 4 nγ1 →d N (0. Remark 5. E(yt ) = 0 for all t = 1. for for for for j j j j =0 = 1. . . . . ε1 )] = E(σ ) = σ . > 2. εt−2 .  2 2 2 )σε + θ2 (1 + θ1    (θ + θ θ )σ 2 1 1 2 ε Cov(yt . ε1 ) = σ 2 ) = σ 2 E(ε2 t−1 ). . 1.) Since γj = 0 for j > 2. (a) In the auxiliary regression. for i ≥ j . n 1 1 (b) The j -th column of n X E is n t=j +1 xt · et−j (which. as stated in the book. Rewrite it as follows.2) of the book. Using the OLS formula. j ) element of the symmetric matrix n n n n 1 1 (e1+i−j e1 + · · · + en−j en−i ) = et et−(i−j ) . n (This is just reproducing (6. 147 of the book).9). y1 + · · · + yn )] n = 1 [(γ0 + γ1 + · · · + γn−2 + γn−1 ) + (γ1 + γ0 + γ1 + · · · + γn−2 ) n + · · · + (γn−1 + γn−2 + · · · + γ1 + γ0 )] 1 [nγ0 + 2(n − 1)γ1 + · · · + 2(n − j )γj + · · · + 2γn−1 ] n n−1 = = γ0 + 2 j =1 1− j γj . √Provided this technical √condition is satisfied. the vector of the dependent variable is e and the matrix of . n n t=j +1 n e is which equals γj defined in (2. 11. incidentally.(c) √ 1 Var( n y ) = [Cov(y1 . n n t=1+i−j 6 n−j . α = B−1 1 nX 1 nE e e .1. 1 nE X e = 0 by the normal equations for the original regression. one sets zn = ny . 1 xt · et−j n t=j +1 1 = xt (εt−j − xt−j (b − β )) n t=j +1   n n 1 1 = xt · εt−j −  xt xt−j  (b − β ) n t=j +1 n t=j +1 1 The last term vanishes because b is consistent for β . which is γ0 + 2(γ1 + γ2 ). √ (d) To use Lemma 2. However. E]. regressors is [X . Thus n t=j +1 xt · et−j converges in probability to E(xt · εt−j ). Lemma 2. the variance of the limiting distribution of ny is the limit of Var( ny ). one obtains the desired result. The j -th element of 1 1 (ej +1 e1 + · · · + en en−j ) = et et−j . 1 E E is. inadvertently misses the required condition that there exist an M > 0 such that E(|zn |s+δ ) < M for all n for some δ > 0.10.5. y1 + · · · + yn ) + · · · + Cov(yn . equals µj defined on p. . The (i. By Proposition 2.10)) shows that this expression converges in probability to γi−j . 
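Exercise 10 above gives the MA(2) autocovariances: $\gamma_0 = (1+\theta_1^2+\theta_2^2)\sigma_\varepsilon^2$, $\gamma_1 = (\theta_1+\theta_1\theta_2)\sigma_\varepsilon^2$, $\gamma_2 = \theta_2\sigma_\varepsilon^2$, and $\gamma_j = 0$ for $j > 2$. A quick simulation check follows; it is illustrative only (not part of the original), assumes numpy, and uses arbitrary parameter values.

```python
import numpy as np

rng = np.random.default_rng(3)
theta1, theta2, sigma2 = 0.6, -0.3, 1.0
T = 200_000
eps = rng.normal(scale=np.sqrt(sigma2), size=T + 2)
y = eps[2:] + theta1 * eps[1:-1] + theta2 * eps[:-2]     # MA(2) process

def acov(x, j):
    # sample autocovariance at lag j
    xd = x - x.mean()
    return (xd[j:] * xd[:len(xd) - j]).mean()

theory = [(1 + theta1**2 + theta2**2) * sigma2,
          (theta1 + theta1 * theta2) * sigma2,
          theta2 * sigma2,
          0.0]
for j, g in enumerate(theory):
    print(j, round(acov(y, j), 3), round(g, 3))   # sample vs theoretical gamma_j, close for large T
```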
E]α) e (by the normal equation for the auxiliary regression) n . . plim α = 0 and plim γ = 0. E]α) (e − [X . . this can be rewritten as 1 1 εt εt−(i−j ) − (xt εt−(i−j ) + xt−(i−j ) εt ) (b − β ) n t=1+i−j n t=1+i−j − (b − β ) 1 xt xt−(i−j ) (b − β ). Show that SSR n = 1 ne e−α 0 . Ip . 1 1 SSR = (e − [X . we can show that plim γ = 0. . SSR/(n − K − p) −1 (∗) 7 . As shown in (c). (e) Let R≡ (p×K ) 0 . which is σ 2 for i = j and zero for i = j . . using an argument similar to the one used in (b) 1 E E = Ip . n t=1+i−j n−j n−j n−j The type of argument that is by now routine (similar to the one used on p. E] e = e e − α [X . 1 1 . plim B = B.” The SSR from γ the auxiliary regression can be written as . V ≡ [X . .10. E]α) n n .2. The F -ratio can be written as F = (Rα) R(V V)−1 R (Rα)/p . B is non-singular. So B−1 converges in probability to B−1 . we have plim n 2 Hence SSR/n (and therefore SSR/(n − K − p)) converges to σ in probability.Using the relation et = εt − xt (b − β ). 1 (d) (The hint should have been: “ n E e = γ . n n = 1 ee−α n 1 ee−α n 1 nX 1 nE e e (since X e = 0 and 1 E e = γ ). E]. 1 = ( e − [X . Also. n = 0 γ 1 e e = σ2 . (c) As shown in (b). . The F -ratio is for the hypothesis that Rα = 0. Thus the formula in (a) for showing that plim n shows that α converges in probability to zero. . . Since Σxx is non-singular. 145 for (2. 19) on p. σ 2 Σxx ) as n (and hence r) goes to infinity. (f) Just apply the formula for partitioned inverses. (∗ ∗ ∗) n Substitution of (∗ ∗ ∗) and (∗∗) into (∗) produces the desired result. √ √ (g) Since nρ − nγ /σ 2 →p 0 and Φ →p Φ. . λ σ 2 Σxx ). . 8 . So the whole expression converges in distribution to N (0. we have s2 Φ = so B22 = 1 1 1 E X S− XE . B22 →p cally equivalent to nγ (Ip − Φ)−1 γ /σ 4 . Also. consider the expression for B22 given in (f) above. Rα can be written as   0 . (c) We only prove the first convergence result. The term in parentheses converges in probability to Σxx as n (and hence r) goes to infinity.Using the expression for α in (a) above. The hints are almost as good as the answer. Therefore. 1 n r xt xt = n t=1 r 1 r r xt xt t=1 =λ 1 r r xt xt t=1 . R(V V) −1 R in the expression for F can be written as 1 R B− 1 R n 0 (since  . − Φ)−1 . 1 √ n r xt · εt = t=1 r n 1 √ r r xt · εt t=1 = √ λ 1 √ r r xt · εt t=1 . ( K × 1) . Here. 147. and pF is asymptoti- 1 As shown in (b). n E E →p σ 2 Ip . The term in parentheses converges in distribution to N (0. Rα = 0 . (b) We only prove the first convergence result.10. xx n n 1 E E − s2 Φ n −1 . Since 1 the j -th element of n X E is µj defined right below (2. it should be clear that the modified Box-Pierce Q (= n· ρ (Ip − Φ)−1 ρ) is asymptotically equivalent to nγ (Ip − Φ)−1 γ /σ 4 . Ip (K ×K ) B21 (p×K ) B11 (K ×p) 22 (p×p) B12 B    (K ×1) 0   (∗∗) γ (p×1) = B22 γ . Regarding the pF statistic given in (e) above. . 1 σ 2 (Ip 12. we give solutions to (b) and (c) only. Ip B−1  γ  (p×K ) (p×1)  = (p×K ) 0 . . Ip 1 V V = B) n B11 B12 B R(V V)−1 R =  (K ×p) 1 = n = (p×K ) (K ×K ) B21 (p×K ) (K ×p) 22 (p×p) 0 Ip 1 22 B . IK − G(G(G G)−1 ) IK − G((G G)−1 G ) IK − G(G G)−1 G MG . so by ergodic theorem the first 2 term of (∗) converges almost surely to E(x2 i εi ) which exists and is finite by Assumption 3. 1 . Because δ is consistent for δ by Proposition 3. By using the Cauchy-Schwarts inequality. δ − δ converges to 0 in probability. (c) By ergodic stationarity the sample average of zi x2 i εi converges in probability to some finite number. Therefore. (a) By assumption. 
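Exercises 11-12 concern testing regression residuals for serial correlation; the modified Box-Pierce statistic there is $n\,\hat\rho'(I_p - \hat\Phi)^{-1}\hat\rho$. The sketch below is not from the original manual; it uses numpy and simulated data, computes the residual autocorrelations, and forms the plain Box-Pierce and Ljung-Box statistics. The exercise's $(I_p - \hat\Phi)^{-1}$ correction is omitted here for brevity.

```python
import numpy as np

rng = np.random.default_rng(4)
n, p = 400, 4
X = np.hstack([np.ones((n, 1)), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, -0.5, 0.8]) + rng.normal(size=n)   # serially uncorrelated errors

b = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ b

# sample autocorrelations rho_j = gamma_j / gamma_0 of the residuals
gamma0 = (e @ e) / n
rho = np.array([(e[j:] @ e[:n - j]) / n / gamma0 for j in range(1, p + 1)])

Q_BP = n * np.sum(rho ** 2)                                          # Box-Pierce
Q_LB = n * (n + 2) * np.sum(rho ** 2 / (n - np.arange(1, p + 1)))    # Ljung-Box
print(Q_BP, Q_LB)   # both approximately chi-squared(p) when there is no serial correlation
```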
(b) zi x2 i εi is the product of xi εi and xi zi . Thus. then A = A and AA = A. we obtain E(|xi εi · xi zi |) ≤ 2 2 2 E(x2 i εi ) E(xi zi ). 2003 Hayashi Econometrics Solution to Chapter 3 Analytical Exercises 1.December 27. If A is symmetric and idempotent.1. x Qx = = ≥ x H MG Hx z MG z (where z ≡ Hx) 0 (since MG is positive semidefinite). the last term of (∗) vanishes. {xi . MG is symmetric and idempotent. Therefore. Q is positive semidefinite. εi } is jointly stationary and ergodic. 2 2 2 E(x2 i εi ) exists and is finite by Assumption 3.6 the sample average of zi ability to some finite number. Hence. As mentioned in (c) δ − δ converges to 0 in probability. −1 −1 WΣxz )−1 Σxz WΣxz (b) First. Therefore. Therefore.5 and E(xi zi ) exists and is finite by Assumption 3.5. (a) Q ≡ = = = = = Σxz S−1 Σxz − Σxz WΣxz (Σxz WSWΣxz )−1 Σxz WΣxz Σxz C CΣxz − Σxz WΣxz (Σxz WC−1 C H H − Σxz WΣxz (G G) Σxz WΣxz H H − H G(G G)−1 G H H [IK − G(G G)−1 G ]H H MG H. we show that MG is symmetric and idempotent. the second term of (∗) converges to zero in probability. For any L-dimensional vector x. So x Ax = x AAx = x A Ax = z z ≥ 0 where z ≡ Ax. 2. E(xi zi · xi εi ) exists and is finite. 2 2 xi converges in prob(d) By ergodic stationarity and Assumption 3.6. 3. MG = = = = MG MG = = = IK IK − G(G G)−1 G IK − IK G(G G)−1 G + G(G G)−1 G G(G G)−1 G IK − G(G G)−1 G MG . E(|xi zi · xi εi |) is finite. It should be routine to show that M is symmetric and idempotent. So B S−1 B = (MC) (MC) = C M MC. 5. Thus B S−1 B = C MC. 254 of the book simplified) If W is as defined in the hint.1) reduces to the asymptotic variance of the OLS estimator. Avar(v)) d where Avar(v) = = = = 2 DSD D(D D)−1 D DD−1 D−1 D IK . Now.4. S).5.4.11).2)). But Bsxy = Bg because Bsxy = (IK − Sxz (Sxz S−1 Sxz )−1 Sxz S−1 )sxy = (IK − Sxz (Sxz S−1 Sxz )−1 Sxz S−1 )(Sxz δ + g) = (Sxz − Sxz )δ + Bg = Bg. . it is easy to show that gn (δ (S−1 )) = Bsxy . By (3.5. Let D be such that D D = S−1 .5. (a) From the expression for δ (S−1 ) (given in (3. which is the asymptotic variance of the efficient GMM estimator. The choice of C and D is not unique. then WSW = W and Σxz WΣxz = Σzz A−1 Σzz . √ √ v ≡ n(Cg) = C( n g). But CB = = C(IK − Sxz (Sxz S−1 Sxz )−1 Sxz S−1 ) C − CSxz (Sxz C CSxz )−1 Sxz C C (where A ≡ CSxz ) (since yi = zi δ + εi ) = (Sxz − Sxz (Sxz S−1 Sxz )−1 Sxz S−1 Sxz )δ + (IK − Sxz (Sxz S−1 Sxz )−1 Sxz S−1 )g = C − A(A A)−1 A C = [IK − A(A A)−1 A ]C ≡ MC. we obtain B S−1 B = B C CB = (CB) (CB). The rank of M equals its trace. but it would be possible to choose C so that plim C = D. which is trace(M) = trace(IK − A(A A)−1 A ) = trace(IK ) − trace(A(A A)−1 A ) = = = trace(IK ) − trace(A A(A A)−1 ) K − trace(IL ) K − L. (b) Since S−1 = C C. C C = S−1 . So (3. (the answer on p. we obtain n g →d N (0. √ By using the Ergodic Stationary Martingale Differences CLT. So √ v = C( n g) → N (0. it is no smaller than (Σxz S−1 Σxz )−1 .12)) and the expression for gn (δ ) (given in (3. (c) As defined in (b). 6. As shown in (e). Here. (g) By the definition of M in Exercise 5. The rank of a symmetric and idempotent matrix is its trace. we show that g C DCg = sxy C DCsxy . J1 = v1 M1 v1 . IK ) and M is idempotent. It should be easy to show that A1 M1 = 0 from the definition of M1 . we provide answers to (d). So (M − D)2 = M2 − DM − MD + D2 = M − D. (f). it follows that B1 g1 = B1 sx1 y . the trace of M is K − L. So MD = D since A D = 0 as shown in the previous part. 
S−1 ) = = = = = n · gn (δ (S−1 )) S−1 gn (δ (S−1 )) n · (Bg) S−1 (Bg) n·g BS −1 (by (a)) Bg n · g C MCg (by (b)) √ v Mv (since v ≡ nCg). From the definition of B1 and the fact that sx1 y = Sx1 z δ + g1 . So the trace of M − D is K − K1 . and (j). the hints are nearly the answer. (i). 7. DM = D M = (MD) = D = D. v Mv is asymptotically chi-squared with degrees of freedom equaling the rank of M = K − L. Also. Bg = Bsxy . MD = D − A(A A)−1 A D. From Exercise 5. For the most parts. So g1 B1 (S11 )−1 B1 g1 = sx1 y B1 (S11 )−1 B1 sx1 y = sxy FB1 (S11 )−1 B1 F sxy = sxy FC1 M1 C1 F sxy = sxy C DCsxy . (i) It has been shown in Exercise 6 that g C MCg = sxy C MCsxy since C MC = B S−1 B. It suffices to prove that v1 = C1 F C−1 v. Since both M and D are symmetric. As shown in Exercise 5. (g). D is idempotent. (d) As shown in (c). √ v 1 ≡ nC1 g 1 √ = nC1 F g √ = nC1 F C−1 Cg √ = C1 F C−1 nCg √ = C1 F C−1 v (since v ≡ nCg). Here. g C DCg = g FC1 M1 C1 F g = g FB1 (S11 )−1 B1 F g = g1 B1 (S11 )−1 B1 g1 (C DC = FC1 M1 C1 F by the definition of D in (d)) (since C1 M1 C1 = B1 (S11 )−1 B1 from (a)) (since g1 = F g). J = n·g B S−1 Bg. As shown in part (e). the trace of D is K1 − L. (f) Use the hint to show that A D = 0 if A1 M1 = 0.(d) J (δ (S−1 ). Since v →d N (0. M is idempotent as shown in Exercise 5. 3 (since sx1 y = F sxy ) (since B1 (S11 )−1 B1 = C1 M1 C1 from (a)) . Also from Exercise 5. (c) What needs to be shown is that n·(δ (W) − δ ) (Sxz WSxz )(δ (W) − δ ) equals the Wald statistic. d → p 1 Q− 1 Σxz W1 . 8. W2 Σxz Q− 1 xz 1 −1 2 ) Q2 Σxz W2 A11 A21 A12 A22 . √ n( δ 1 − δ ) (Sxz W2 Sxz )−1 Sxz W2 By using Billingsley CLT. (b) The hint is almost the answer. S). . we obtain √ n( δ 1 − δ ) (Sxz W1 Sxz )−1 Sxz W1 √ = ng . 2n xz Substitute this into the constraint Rδ = r to obtain the expression for λ in the question. (a) By applying (3. 0. −1 Q2 Σxz W2 N N 0. Q− 1 1 Σxz W1 S (W Σ Q−1 . −1 √ n( δ 1 − δ ) .4(c). √ n( δ 2 − δ ) where Avar(q) = 1 −1 A11 A21 A12 A22 1 = A11 + A22 − A12 − A21 .(j) M − D is positive semi-definite because it is symmetric and idempotent. 1 . (a) Solve the first-order conditions in the hint for δ to obtain δ = δ (W ) − 1 (S WSxz )−1 R λ. −1 4 . we have (Sxz W1 Sxz )−1 Sxz W1 (Sxz W2 Sxz )−1 Sxz W2 Therefore. Then substitute this expression for λ into the above equation to obtain the expression for δ in the question. But this is immediate from substitution of the expression for δ in (a). d Also.11). √ n( δ 1 − δ ) √ n( δ 1 − δ ) →d = (b) √ nq can be rewritten as √ √ √ √ nq = n(δ 1 − δ 2 ) = n(δ 1 − δ ) − n(δ 2 − δ ) = 1 Therefore. 9.4. Avar(q)). we obtain √ nq → N (0. by Lemma 2. we have √ ng → N (0. E(xi (xi β + vi )) β E(x2 i ) + E(xi vi ) 2 (by assumptions (2). which. and (4)). being a function of (xi . we obtain Avar(q) = = = 10. (a) σxz ≡ E(xi zi ) = = = (b) From the definition of δ . (3). 1 1 −1 Q− Σxz Q− 1 Σxz W1 S S 2 1 −1 Q− 1 (Σxz W1 Σxz )Q2 1 −1 Q− 1 Q1 Q2 1 Q− 2 . A12 . we have n 1 i=1 xi εi →p 0. is ergodic stationary by assumption (1). 1 1 −1 Q− SW1 Σxz Q− 2 Σxz S 1 1 Q− 2 . i=1 We have xi zi = xi (xi β + vi ) = x2 i β + xi vi . βσx = 0 xi zi i=1 1 n n 1 xi εi = s− xz i=1 1 n n xi εi . and A22 can be rewritten as follows: Q2 = = = = = = A21 = = A22 Σxz W2 Σxz Σxz S−1 Σxz . So by the Ergodic theorem. δ−δ = 1 n n −1 1 A11 − Q− 2 A11 − (Σxz S−1 Σxz )−1 Avar(δ (W1 )) − Avar(δ (S−1 )). Substitution of these into the expression for Avar(q) in (b). By assumption (2). we have s− xz →p σxz . A21 . Thus δ − δ →p 0. η i ). sxz →p σxz . Q2 . Since σxz = 0 by 1 −1 (a). 
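Exercises 5-6 of Chapter 3 show that, at the efficient GMM estimator, $J = n\,\bar g'\hat S^{-1}\bar g$ is asymptotically chi-squared with $K - L$ degrees of freedom. Below is a two-step GMM sketch on simulated data; the data-generating process, instrument count, and parameter values are invented for illustration (numpy assumed), not taken from the manual.

```python
import numpy as np

rng = np.random.default_rng(5)
n, L, K = 1000, 1, 3            # 1 regressor, 3 instruments: 2 overidentifying restrictions

# endogenous regressor z, valid instruments x
x = rng.normal(size=(n, K))
u = rng.normal(size=n)
z = x @ np.array([1.0, 0.5, -0.5]) + 0.8 * u + rng.normal(size=n)
y = 2.0 * z + u
z = z[:, None]

def gmm(W):
    Sxz = x.T @ z / n
    sxy = x.T @ y / n
    return np.linalg.solve(Sxz.T @ W @ Sxz, Sxz.T @ W @ sxy)

# step 1: W = (Sxx)^{-1}, i.e. 2SLS
W1 = np.linalg.inv(x.T @ x / n)
d1 = gmm(W1)
e1 = y - (z @ d1)
S_hat = (x * e1[:, None]).T @ (x * e1[:, None]) / n   # (1/n) sum e_i^2 x_i x_i'

# step 2: efficient GMM with W = S_hat^{-1}, then Hansen's J
W2 = np.linalg.inv(S_hat)
d2 = gmm(W2)
g_bar = x.T @ (y - z @ d2) / n
J = n * g_bar @ W2 @ g_bar
print(d2, J)   # estimate near 2.0; J approx chi-squared(K - L) under correct specification
```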
E(xi εi ) = 0. A12 = (Σxz S−1 Σxz )−1 Σxz S−1 SS−1 Σxz (Σxz S−1 Σxz )−1 = (Σxz S−1 Σxz )−1 = 1 Q− 2 . So by assumption (1).(c) Since W2 = S−1 . n (c) sxz ≡ = = →p = 1 n 1 n n xi z i i=1 n (x2 i β + xi vi ) i=1 n 1 1 √ nn x2 i + i=1 1 n n xi vi i=1 1 (since β = √ ) n 0 · E(x2 i ) + E(xi vi ) 0 5 . 3(b). From assumption (2) and the Martingale Differences CLT. 6 . 2 δ − δ → (σx + a)−1 b. i=1 d Therefore. b) are jointly normal because the joint distribution is the limiting distribution of √ ng = √ √ ng 1 1 n( n n i=1 xi vi ) . the first term of RHS converges in probability 2 to E(x2 i ) = σx > 0. d (a. we obtain √ 2 nsxz → σx + a.(d) √ 1 nsxz = n n x2 i i=1 1 +√ n n xi vi . Assumption (2) and the Martingale Differences CLT imply that 1 √ n n xi vi → a ∼ N (0. s11 ). d (e) δ − δ can be rewritten as √ √ δ − δ = ( nsxz )−1 ng 1 . the answer is No. 1) element of S. By using the result of (d) and Lemma 2. 2 (f) Because δ − δ converges in distribution to (σx + a)−1 b which is not zero. by Lemma 2. s22 ).4(a). d where s11 is the (1. we obtain √ ng 1 → b ∼ N (0. i=1 By assumption (1) and the Ergodic Theorem. 5. . 2004 Hayashi Econometrics Solution to Chapter 4 Analytical Exercises 1 1 1.1-4. That leaves Assumption 1.3 (the rank condition that Z (defined in Analytical Exercise 1) be of full column rank). .6.5. j ) element of the n × n matrix E(εm εh | Z) is E(εim εjh | Z). this becomes E(εim εjh | zi . 2004. . .18) together imply Assumptions 1. 320) In this part only. εjh ) | zi . zj . 4. Assumption 1. Assumption 4. h) block is Amh .5. (by the Law of Iterated Expectations) = E [εjh E(εim | zi . . εjh ) | zi . zj ] (by linearity of conditional expectations) = E [εjh E(εim | zi ) | zi . . The (i. Going back to the formula (4.2 (strict exogeneity) and (1.5.5. February 23. the first matrix on the RHS (the matrix to be inverted) is a partitioned matrix whose (m.January 8. For j = i. . j )). it suffices to show that Zm is of full column rank for m = 1. M . By Assumption 4. zi . The proof goes as follows. . 278 of the book. .1). Assumption 1. .3 and (1. zj ) = E [E(εim εjh | zi . zj )) =0 (since E(εim | zi ) = 0).1-1. 1 . zj ] (since (εim . . zj . 2. states that E(εim εih | zi ) = E(εim εih | xi ) = σmh . for notational brevity. zn ) = E(εim εjh | zi . let zi be a m Lm × 1 stacked vector collecting (zi1 . the second matrix on the RHS of (4. z2 . Similarly. .1) have been verified in 3(b). It should be easy to see that it equals 1 nZ (Σ −1 1 n [Z (Σ −1 ⊗ P)Z]. . . . answer to 3(c)(i) simplified. E(εim | Z) = E(εim | z1 . zi ) is independent of zj (j = i)) =0 (by the strengthened orthogonality conditions). Since xim = xi and xi is the union of (zi1 . z2 . εjh . E(εim εjh | Z) = E(εim εjh | z1 . Since Z is block diagonal. zj ) is independent of zk (k = i. . ziM ) in the SUR model.6. . .7. zi ) is independent of (εjh . . zj ] (since (εim . zn ) (since Z collects zi ’s) = E(εim | zi ) (since (εim . It should be easy to show that Amh = n Zm PZh and that cmh = n Zm Pyh . The sprinkled hints are as good as the answer. ziM ). zj ) For j = i. the conditional homoskedasticity assumption.12) equals ⊗ P) y . (c) (i) We need to show that Assumptions 4.1 (linearity) is obviously satisfied. (b) (amplification of the answer given on p. . 3. 2. E(εim εjh | Z) = E(εim εih | Z) = E(εim εih | zi ).7 and (4.12) on p. (iv) Avar(δ SUR ) is (4. Sxz and W are block diagonal. MED) are 2 in number while the number of the regressors is 3. (b) The claim to be shown is just a restatement of Propositions 3. 5.7(a). the M matrix Dm is K × L. 
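Exercise 10 here uses a drifting coefficient $\beta = 1/\sqrt{n}$, so the instrument becomes uninformative as $n$ grows and $\hat\delta - \delta$ converges in distribution to $(\sigma_x^2 + a)^{-1}b$ rather than to zero. The Monte Carlo sketch below is not part of the original solutions (numpy, arbitrary numbers); it shows that the sampling spread of $\hat\delta - \delta$ does not shrink with $n$.

```python
import numpy as np

rng = np.random.default_rng(6)
delta = 1.0

def iv_draws(n, reps=1000):
    est = np.empty(reps)
    for r in range(reps):
        x = rng.normal(size=n)
        v = rng.normal(size=n)
        eps = 0.8 * v + 0.6 * rng.normal(size=n)   # correlated with v, uncorrelated with x
        z = x / np.sqrt(n) + v                     # "weak" relation: beta = 1/sqrt(n)
        y = delta * z + eps
        est[r] = (x @ y) / (x @ z)                 # IV estimator with instrument x
    return est

for n in (100, 1_000, 10_000):
    d = iv_draws(n)
    print(n, np.round(np.percentile(d - delta, [25, 50, 75]), 3))
# the quartiles of (IV estimate - delta) do not collapse toward 0 as n grows: inconsistency
```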
so WSxz (Sxz WSxz )−1 is block diagonal. By Assumption 4. the instruments (1.15) where Amh is given by (4.5. D is of full column rank. 4. (iii) The unbiasedness of δ SUR follows from (i). The rest follows from the inequality in the question and the hint. So the square matrix E(xi xi ) is non-singular. Let Dm be the Dm matrix introduced in the answer to (c). β2 E(MED) E(S80 · MED) E(IQ · MED) E(LW80 · MED) The condition for the system to be identified is that the 4 × 3 coefficient matrix is of full column rank. 4. the n × K data matrix X is of full column rank for sufficiently large n. so zim = Dm xi and Zm = XDm .5. and Proposition 1.5. 280. The m=1 Km × L matrix Σxz in Assumption 4. Since n X X (where X is the n × K data matrix.6.5 imply that the Avar of the efficient multiple-equation GMM estimator is (Σxz S−1 Σxz )−1 . Zm is of full column rank as well.5.7 and the condition (implied by (4. (c) Use (A9) and (A6) of the book’s Appendix A. (d) If the same residuals are used in both the efficient equation-by-equation GMM and the efficient multiple-equation GMM.9)).16 ) on p. then the S in (∗∗) and the S in (Sxz S−1 Sxz )−1 are numerically the same. (e) Yes. Assumption 4. and 4. (KM ×L) (KM ×L) DM Since Σxz is of full column rank by Assumption 4. the plim of S is S. (a) For the LW69 equation. 320-321)       1 E(S69) E(IQ) E(LW69)    β0  1 E(S80) E(IQ) E(LW80)      E(MED) E(S69 · MED) E(IQ · MED) β1 = E(LW69 · MED) . Since the dimension of xi is K and that of zim is L.1. we have S = Σ ⊗ E(xi xi ) (as 1 in (4. Since Zm consists of columns selected from the columns of X. (f) The hint is the answer. So the order condition is not satisfied for the equation.2. The hint shows that it equals the plim of n · Var(δ SUR | Z). (ii). The only part that is not so straightforward is to show in part (i) that the M n × L matrix Z is of full column rank. . 2 . (a) Assumptions 4. (ii) The hint is the answer.1-4. Under Assumptions 4.5.18)) that the set of instruments be common across equations. (b) (reproducing the answer on pp.4 and since E(xi xi ) is non-singular.4 and 3.  Σxz = [IM ⊗ E(xi xi )]D where D ≡ . So Z = (IM ⊗ X)D is of full column rank if X is of full column rank. . as defined in Analytical Exercise 1) converges almost surely to E(xi xi ).4 can be written as   D1  .S is non-singular. X is of full column rank for sufficiently large n if E(xi xi ) is non-singular. the answer is a straightforward modification of the answer to (c). (d) For the most part.2 implies that the plim of Sxz is Σxz . (a) Let B. . 6. . (3) →p 0. So E(zim · εih ) is finite and (2) →p 0 because δ m − δ m →p 0. Also let  1 n  i=1 xi · yi1 n   . . i=1 (2) = −(δ m − δ m ) (3) = −(δ h − δ h ) (4) = (δ m − δ m ) 1 n 1 n 1 n n zim · εih . . E(zimj ih 1 n i · where zimj is the j -th element of zim . (reproducing the answer on p. So (4) →p 0. Regarding (2). i=1 n zih · εim . then E(IQ · MED) = E(IQ) · E(MED) and the third column of the coefficient matrix is E(IQ) times the first column. Similarly. Sxz . So 1 n where (1) = 1 n n n [εim − zim (δ m − δ m )][εih − zih (δ h − δ h )] = (1) + (2) + (3) + (4). i=1 εim εih . and W be as defined in the hint.1 and 4.(c) (reproducing the answer on p. 1 −1 1 (B S− B S− xx B) xx 1 n n i=1 xi · yiM 3 . For (4). sxy =  . = . (1) →p σmh (≡ E(εim εih )). i=1 n zim zih (δ h − δ h ). So the matrix cannot be of full column rank. by Cauchy-Schwartz. 7. zim zih converges in probability to a (finite) matrix. 
321) εim = yim − zim δ m = εim − zim (δ m − δ m ).2 and the assumption that E(zim zih ) is finite. (M K ×1) n 1 i=1 xi · yiM n Then δ 3SLS = Sxz WSxz = (I ⊗ B )(Σ = Σ −1 −1 Sxz Wsxy −1 −1 1 ⊗ S− xx )(I ⊗ B) −1 (I ⊗ B )(Σ −1 1 ⊗ S− xx )sxy 1 ⊗ B S− xx B Σ Σ −1 1 ⊗ B S− xx sxy 1 −1 = Σ ⊗ (B S− xx B) −1 1 ⊗ B S− xx sxy 1 −1 1 = IM ⊗ (B S− B S− xx B) xx sxy   n 1 −1 1 1 (B S− B S− xx B) xx n i=1 xi · yi1   . E(|zimj · εih |) ≤ 2 ) · E(ε2 ). by Assumption 4. i=1 As usual. 321) If IQ and MED are uncorrelated.2. under Assumption 4. use the SUR formula you derived in Analytical Exercise 2(b)). 1 10. (b) Avar(δ 1.2) on p. where Sxz and sxy are as defined in (4. The OLS estimator derived above is (trivially) efficient multiple-equation GMM under conditional homoskedasticity when the set of orthogonality conditions is E(zim · εim ) = 0 for all m. 11. (a) The efficient multiple-equation GMM estimator is Sxz S−1 Sxz −1 Sxz S−1 sxy . The hint is the answer (to derive the formula in (b) of the hint. So the above formula becomes 1 S− xz S Sxz −1 1 Sxz S−1 sxy = S− xz sxy .which is a stacked vector of 2SLS estimators. 8. it is possible to choose δ so that gn (δ ) defined in the hint is a zero vector.3SLS ) equals G−1 .2SLS ) = σ11 A− 11 . Since xim = zim here. 4 . Since the sets of orthogonality conditions differ.2. Because there are as many orthogonality conditions as there are coefficients to be estimated. the efficient GMM estimators differ. (b) The SUR is efficient multiple-equation GMM under conditional homoskedasticity when the set of orthogonality conditions is E(zim · εih ) = 0 for all m. (a) Avar(δ 1. which is a stacked vector of OLS estimators. Sxz is square. h. i=1 which is none other than the pooled OLS estimator. (b) The hint is the answer. 9. we obtain δ= 1 n n zi 1 zi 1 + · · · + i=1 1 n n ziM ziM i=1 −1 1 n n zi1 ·yi1 + · · · + i=1 1 n n ziM ·yiM . 266 and S−1 is a consistent estimator of S. Solving 1 n n zi1 ·yi1 + · · · + i=1 1 n n ziM ·yiM − i=1 1 n n zi 1 zi 1 + · · · + i=1 1 n n ziM ziM δ = 0 i=1 for δ . The hint shows that G = 1 σ11 A11 . MD = IM n − (In ⊗ 1M ) [(In ⊗ 1M ) (In ⊗ 1M )] = IM n − (In ⊗ 1M ) [(In ⊗ 1M 1M )] −1 −1 −1 (In ⊗ 1M ) (In ⊗ 1M ) = IM n − (In ⊗ 1M ) [(In ⊗ M )] (In ⊗ 1M ) 1 = IM n − (In ⊗ 1M )(In ⊗ )(In ⊗ 1M ) M 1 = IM n − (In ⊗ 1M 1M ) M 1 = (In ⊗ IM ) − (In ⊗ 1M 1M ) M 1 = (In ⊗ (IM − 1M 1M )) M = In ⊗ Q. − 1M Fn b) The desired result follows from this because b equals the fixed-effects estimator β FE and   fi1 M  .4)). Define MD as in equation (4) of the hint. By the Frisch-Waugh theorem.  . (a) Let (a .  1 M (1M y1 1 M (1M yn  a=  − 1M F1 b)  . But the above claim follows immediately if we 1 can show that MD = In ⊗ Q. β ) . . b = fim b.2) and (5. D D = M In . This is because the fixed-effects estimator can be written as (F F)1 F y (see (5. It should be straightforward to show that     1M F1 b 1M y 1  .2. where y and F are defined in (5. .3). The proof is complete if we can show the claim that y = MD y and F = MD F. 2004 Hayashi Econometrics Solution to Chapter 5 Analytical Exercises 1. b is the OLS coefficient estimate in the regression of MD y on MD F. we have a = (D D)−1 (D y − D Fb).    . b ) be the OLS estimate of (α . . 1M yn 1M Fn b Therefore.  .  1M yi = (yi1 + · · · + yiM ) and 1M Fn b = 1M  . fiM m=1 1 .2. (b) As indicated in the hint to (a). D y =  .2. .January 9. . . where Q ≡ IM − M 1M 1M . D Fb =  . the annihilator associated with 1M . 2 .4. As shown in Analytical Exercise 4. Fj ) is indep. . Next. 
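The SUR estimator of Chapter 4 can be computed as feasible GLS on the stacked system: equation-by-equation OLS residuals give $\hat\Sigma$, and the second step weights by $\hat\Sigma^{-1} \otimes I_n$. Below is a two-equation sketch, not from the original solutions; it assumes numpy, simulated data, and uses brute-force Kronecker products rather than the "beautified" formulas of the text.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 500
# two equations with different regressors and cross-equation error correlation
X1 = np.hstack([np.ones((n, 1)), rng.normal(size=(n, 1))])
X2 = np.hstack([np.ones((n, 1)), rng.normal(size=(n, 2))])
E = rng.multivariate_normal([0, 0], [[1.0, 0.7], [0.7, 1.5]], size=n)
y1 = X1 @ np.array([1.0, 2.0]) + E[:, 0]
y2 = X2 @ np.array([-1.0, 0.5, 1.5]) + E[:, 1]

# step 1: equation-by-equation OLS residuals give an estimate of Sigma
b1 = np.linalg.solve(X1.T @ X1, X1.T @ y1)
b2 = np.linalg.solve(X2.T @ X2, X2.T @ y2)
res = np.column_stack([y1 - X1 @ b1, y2 - X2 @ b2])
Sigma_hat = res.T @ res / n

# step 2: FGLS on the stacked system (y stacked equation by equation, Z block diagonal)
y = np.concatenate([y1, y2])
Z = np.zeros((2 * n, X1.shape[1] + X2.shape[1]))
Z[:n, :X1.shape[1]] = X1
Z[n:, X1.shape[1]:] = X2
W = np.kron(np.linalg.inv(Sigma_hat), np.eye(n))      # Sigma^{-1} kron I_n
delta_sur = np.linalg.solve(Z.T @ W @ Z, Z.T @ W @ y)
print(delta_sur)    # close to (1, 2, -1, 0.5, 1.5)
```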
j by (i)) = E[E(η i η j | Fi .1 holds for the OLS estimator (a. Fj ] =0 (since (η j . .(c) on p. which is a set of M equations. So the two SSR ’s are the same.1 ) on p. Then ((M −1)×M ) C 1M = ((M −1)×1) 0 .1. η i ) | Fi . Fi ) is indep. Fj ) is independent of (η i . Fj ] = E[η i E(η j | Fj ) | Fi . b). . Assumption 1. Fn ) = E(η i | Fi ) (since (η i . Fj ] = E[η i E(η j | Fi .1 (linearity) is none other than (3). 329 by C . 2 So E(ηη | W) = ση IM n (Assumption 1. Also. of Fj for j = i) by (i) =0 (by (ii)).1-1. . Propositions 1.2 (strict exogeneity) and Assumption 1. η i ) | Fi . E(η i | W) = E(η i | F) (since D is a matrix of constants) = E(η i | F1 .4 (spherical error term) to be verified. The following is an amplification of the answer to 1. . Fj . 2. .4). For i = j . to see that C 1M = 0 if C is an M × (M − 1) matrix created by dropping one column from Q. Fj ) (since (η i . This leaves Assumptions 1. we have: Q 1M = (M ×M ) (M ×1) 0 . the regressors are strictly exogenous (Assumption 1. first note that by construction of Q. η j . Assumption 1. E(η i η j | W) = E(η i η j | F) = E(η i η j | F1 . Fi . Therefore. . of Fk for k = i. The estimator is unbiased and the Gauss-Markov theorem holds. (b) By multiplying both sides of (5. Drop one row from Q and call it C and drop the corresponding element from the 0 vector on the RHS. Fn ) = E(η i η j | Fi . Fj . E(η i η i | W) = E(η i η i | F) = E(η i η i | Fi ) 2 = ση IM (by the spherical error assumption (iii)). the residual vector from the original regression (3) (which is to regress y on D and F) is numerically the same as the residual vector from the regression of y (= MD y) on F (= MD F)).3 is a restatement of (iv). Since the assumptions of the classical regression model are satisfied. (a) It is evident that C 1M = 0 if C is what is referred to in the question as the matrix of first differences.(f) in Chapter 1. Fi ) by (i)) (since E(η j | Fj ) by (ii)). 363. .2). we eliminate 1M · bi γ and 1M · αi .(c) What needs to be shown is that (3) and conditions (i)-(iv) listed in the question together imply Assumptions 1. Since E(gi gi ) is non-singular by (5.1. η i η i = C εi εi C.1. i=1 Fi ⊗ xi (C ⊗ IK ) (C C)−1 ⊗ i=1 n 1 n n n xi xi i=1 −1 −1 (C ⊗ IK ) n 1 n n Fi ⊗ xi i=1 Fi ⊗ xi i=1 n C(C C)−1 C ⊗ Q⊗ 1 n n 1 n xi xi i=1 1 n Fi ⊗ xi i=1 = Fi ⊗ xi i=1 xi xi i=1 −1 1 n n Fi ⊗ xi i=1 (since C(C C)−1 C = Q. (The last equality is by (5. • Since εi = 1M · αi + η i .1. Thus E(gi gi ) = (C ⊗ IK ) E[(εi εi ⊗ xi xi )](C ⊗ IK ) = (C ⊗ IK ) E(gi gi )(C ⊗ IK ) (since gi ≡ εi ⊗ xi ).1. the identification condition to be verified is equivalent to (5. we can rewrite Sxz and sxy as Sxz = (C ⊗ IK ) So Sxz WSxz = = 1 n 1 n 1 n n 1 n n Fi ⊗ xi . But as just shown above. • The random sample condition is immediate from (5.) • By the definition of gi . as mentioned in the hint). So η i η i = C εi εi C and E(η i η i | xi ) = E(C εi εi C | xi ) = C E(εi εi | xi )C = C ΣC.5).(c) Below we verify the five conditions. Similarly.6) and since C is of full column rank. E(gi gi ) is non-singular. 363-364. (d) Since Fi ≡ C Fi . So gi gi = C εi εi C ⊗ xi xi = (C ⊗ IK )(εi εi ⊗ xi xi )(C ⊗ IK ). • As shown on pp. This implies the orthogonality conditions because E(η i ⊗ xi ) = E[(C ⊗ IK )(η i ⊗ xi )] = (C ⊗ IK ) E(η i ⊗ xi ). we have: gi gi = η i η i ⊗ xi xi . sxy = (C ⊗ IK ) i=1 1 n n yi ⊗ xi .2). we have η i ≡ C η i = C εi . (5. as mentioned in the hint.8b) can be written as E(η i ⊗ xi ) = 0. i=1 3 .1. Sxz Wsxy = 1 n n Fi ⊗ xi i=1 Q⊗ 1 n n xi xi i=1 −1 1 n n yi ⊗ xi . 
• Regarding the orthogonality conditions.15) (that E(QFi ⊗ xi ) be of full column rank). 1. As noted on p. the random-effects estimator (4. we obtain M M Sxz WSxz = m=1 h=1 M M qmh 1 n 1 n n fim xi i=1 n 1 n 1 n n xi xi i=1 n −1 1 n 1 n n xi fih i=1 n . the required assumptions are (i) that the coefficient estimate 4 . In (c).8) and (4.6.6.7. i=1 m=1 h=1 Using the “beautifying” formula (4.6) with xim = xi .6. (e) The previous part shows that the fixed-effects estimator is not efficient because the W in (10) does not satisfy the efficiency condition that plim W = S−1 .7. 292-293 that these “beautified” formulas are numerically equivalent versions of (4.6. In particular. This is none other than the random-effects estimator applied to the system of M − 1 equations (9). (This is just (4. Σ = Ψ. which is about the estimation of error cross moments for the multipleequation model of Section 4. 293. we obtain (12) and (13) in the question. i=1 So Sxz WSxz Sxz Wsxy is the fixed-effects estimator. By setting Zi = Fi . It is shown on pp. we’ve verified that this matrix is of full column rank.) Since xi includes all the elements of Fi . By Proposition 4.9 ) on p. Besides linearity. (f) Proposition 4.Noting that fim is the m-th row of Fi and writing out the Kronecker products in full. where qmh is the (m.8 ) and (4. Under conditional homoskedasticity. zim = fim . the Σxz referred to in Assumption 4.6. i=1 n Fi Qyi .1.4 can be written as E(Fi ⊗ xi ). i=1 m=1 h=1 n M M Sxz Wsxy = m=1 h=1 qmh fim · yih = i=1 qmh fim · yih . as noted in the hint. S = E(η i η i ) ⊗ E(xi xi ). Thus.6.16b). h) element of Q.9). it should be routine to show that those conditions verified in (c) above are sufficient for the hypothesis of Proposition 4.6. 324. Sxz Wsxy = m=1 h=1 qmh fim xi i=1 xi xi i=1 −1 xi · yih i=1 .9).6. M n n M M M Sxz WSxz = m=1 h=1 M M qmh 1 n 1 n fim fih = i=1 n 1 n 1 n qmh fim fih . 1 W = Q⊗ n i=1 xi xi xi “dissappears”. So n −1 .6. can easily be adapted to the common-coefficient model of Section 4. with Ψ being a consistent estimator of E(η i η i ). yi = yi in (4. this expression can be simplified as Sxz WSxz = Sxz Wsxy = −1 1 n 1 n n Fi QFi . the efficient GMM estimator is given by setting W=Ψ −1 ⊗ 1 n n xi xi i=1 −1 .8) is consistent and asymptotically normal and the asymptotic variance is given by (4. F i i i i −1 ˇ Ψ ˇ −1 y ˇ i = Fi Ψ yi . E(xi xi ) is non-singular. n 4. F i 3. . plim SSR 2 2 2 = trace (C C)−1 ση C C = ση trace[IM −1 ] = ση · (M − 1). . Fi =  . (g) As noted in (e). the assumptions of Proposition 4. So the fixed-effects estimator is invariant to the choice of C. Multiply both sides of the equation by ηih and take expectations to obtain E(yim · ηih ) = E(ηim · ηih ) + ρ E(ηi.     y i1 y i0  . as was verified in (d).1 ) are transformed into M − 1 equations by B = CA. . So ˇ i replacing yi and F used with y ˇ Ψ ˇ −1 F ˇ i = F A(A ΨA)−1 A Fi = F AA−1 Ψ−1 (A )−1 A Fi = F Ψ−1 Fi .  yi =  . . So E(vi vi ) = E(C η i η i C) = C E(η i η i )C = 2 ση C C. . To see that the numerical values of (12) and (13) ˇ i ≡ B Fi and y ˇ i ≡ B yi . . Similarly. 2 C C (the last equality is by (h) Since η i ≡ C η i . Since xi contains all the elements of Fi . the fixed-effects estimator β FE is a GMM estimator. . it is numerically equal to the GMM estimator with 1 n n i=1 W = C C⊗ xi xi . Then ˇ i = A Fi and y ˇ is the estimated error cross moment matrix when (14) is ˇ i = A yi .7 holds for the present model in question. vi = C (yi − Fi β ) = C η i . 
.1.(here β FE ) used for calculating the residual vector be consistent and (ii) that the cross moment between the vector of regressors from one equation (a row from Fi ) and those from another (another row from Fi ) exist and be finite. As noted in (c).1. Clearly. So it is consistent. let F equations (5. the estimator can be written as a GMM estimator (Sxz WSxz ) −1 Sxz Wsxy .1 ). we have E(η i η i ) = E(C η i η i C) = ση 2 (15)). From (5. yiM yi. It has been verified in (f) that Ψ defined in (14) is consistent. is the fixed-effects estimator. That is.  . the original M are invariant to the choice of C. By the hint. m. (i) Evidently. replacing C by B ≡ CA in (11) does not change Q. . Proposition 4. m + 2. By setting Ψ = ση C C in the expression for W in the answer to (e) (thus setting 2 W = ση CC⊗ 1 n n i=1 xi xi −1 −1 ). Therefore. . (a) bi is absent from the system of M equations (or bi is a zero vector). if h = m + 1. As seen in (d). the cross moment assumption is satisfied.M −1 (b) Recursive substitution (starting with a substitution of the first equation of the system into the second) yields the equation in the hint. then we have: Ψ ˇ = A ΨA.   .m−1 · ηih ) + · · · + ρm−1 E(ηi1 · ηih ) (since E(αi · ηih ) = 0 and E(yi0 · ηih ) = 0) = 2 ρm−h ση 0 if h = 1. 5 . 2.7(c) holds for the present model. If Ψ F ˇ i replacing Fi .m−1 · ηih ) + · · · + ρm−1 E(ηi1 · ηih ) 1 − ρm + E(αi · ηih ) + ρm E(yi0 · ηih ) 1−ρ = E(ηim · ηih ) + ρ E(ηi. not by C. which. . . E(η i Fi ) = ση . not a matrix. we have: E(Fi Qη i ) = E[trace(Fi Qη i )] = E[trace(η i Fi Q)] = trace[E(η i Fi )Q] = trace[E(η i Fi )(IM − = trace[E(η i Fi )] − = trace[E(η i Fi )] − By the results shown in (b). . . . We’ve verified in 2(c) that E(C Fi ⊗ xi ) is of full column rank. So the matrix product above is nonsingular. E(QFi ⊗ xi ) is of full column rank.1. 1 ρ   0 1  ··· 0 So.2. we have E(Fi Ψ−1 Fi ) = E(C Fi ⊗ xi ) Ψ−1 ⊗ E(xi xi ) −1 E(C Fi ⊗ xi ).6 ). M written as ρ 1 .1. (a) The hint shows that E(Fi Fi ) = E(QFi ⊗ xi ) IM ⊗ E(xi xi ) −1 E(QFi ⊗ xi ). (1 − ρ)2 2 = ση (d) (5. 0 · · ·  0 · · · 0 ··· 1 11 )] M 1 trace[E(η i Fi )11 ] M 1 1 E(η i Fi )1.m−1 · ηih ) = 0 for h ≤ m − 1. trace[E(η i Fi )] = 0 and 1 E(η i Fi )1 = sum of the elements of E(η i Fi ) = sum of the first row + · · · + sum of the last row 2 = ση 1 − ρM −1 1 − ρM − 2 1−ρ + + ··· + 1−ρ 1−ρ 1−ρ M − 1 − M ρ + ρM .. 0 ··· ···  · · · ρM −2 · · · ρM −3    .. E(η i Fi ) can be  0 1 0 0  . 6 .2 (c) That E(yim · ηih ) = ρm−h ση for m ≥ h is shown in (b). Noting that Fi here is a vector. ··· .1. . (b) By (5. (c) By the same sort of argument used in (a) and (b) and noting that Fi ≡ C Fi .6) is violated because E(fim · ηih ) = E(yi. By (5. in the above expression for E(Fi Qη i ). ··· ··· ··· ρ2 ρ . .15). E(εi εi ) is non-singular.  . 2 .5) and (5. 5. Since E(xi xi ) is non-singular.15) is that E(Fi ⊗ xi ) be of full column rank (where Fi ≡ QFi ). The hint is the answer. E(Fi ⊗ xi ) = [IM ⊗ E(xi xi )](Q ⊗ IK )A. fiM  bi (a) The m-th row of Fi is fim and fim = xi Am . This question presumes that  fi1  . 7.  7 . By the hint.   .1. IM ⊗ E(xi xi ) is non-singular. (b) The rank condition (5. Multiplication by a non-singular matrix does not alter rank.  xi =  .  and fim = Am xi .6. n→∞ (d) (reproducing the answer on pp. n → ∞. So |ψj +k · ψk | ≤ A|ψk |. The hint is the answer.s. 1. which means {yt. k . n→∞ (c) Since yt. Then m 2 E[(yt. . there exists an A > 0 such that |ψj +k | ≤ A for all j. 2. 2. So |αm − αn | → ∞ as m. But E(yt. 
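The Chapter 5 derivations show that the fixed-effects estimator is OLS after the within transformation $Q = I_M - \frac{1}{M}\mathbf{1}_M\mathbf{1}_M'$, which by the Frisch-Waugh argument of Exercise 1 equals pooled OLS with a full set of individual dummies (LSDV). A numerical check follows; it is illustrative only (not part of the manual), assumes numpy, and uses a simulated balanced panel.

```python
import numpy as np

rng = np.random.default_rng(8)
n, M, K = 100, 4, 2                   # n individuals, M periods, K regressors
alpha = rng.normal(size=n)            # individual effects, correlated with the regressors
x = rng.normal(size=(n, M, K)) + alpha[:, None, None]
beta = np.array([1.0, -0.5])
y = x @ beta + alpha[:, None] + rng.normal(size=(n, M))

# within (fixed-effects) estimator: demean each individual's data over time
xd = x - x.mean(axis=1, keepdims=True)
yd = y - y.mean(axis=1, keepdims=True)
Xw, yw = xd.reshape(n * M, K), yd.reshape(n * M)
b_fe = np.linalg.solve(Xw.T @ Xw, Xw.T @ yw)

# LSDV: pooled OLS of y on x and individual dummies
D = np.kron(np.eye(n), np.ones((M, 1)))               # nM x n dummy matrix
Xl = np.hstack([x.reshape(n * M, K), D])
bl = np.linalg.lstsq(Xl, y.reshape(n * M), rcond=None)[0]
print(np.allclose(b_fe, bl[:K]))                      # True: within estimator = LSDV
```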
yt − µ and yt−j.n − µ →m. Now set ajk in (ii) to |ψj +k | · |ψk |. E[(yt − µ)(yt−j − µ)] = lim E[(yt.m − yt.m − yt.n − µ)(yt−j . Thus by (i).n )2 ] = E j =n+1 m ψj εt−j 2 ψj j =n+1 = σ2 (since {εt } is white noise) = σ 2 |αm − αn |.n ) = 0. Since {ψk } (and hence {Aψk }) is absolutely summable. . E[(yt. Let M≡ ∞ ∞ |ψj | and sk ≡ |ψk | j =0 j =0 |ψj +k |.n ) by (ii). {αn } converges.n − µ)]. 441-442 of the book) Since {ψj } is absolutely summable. So for any j . ∞ ∞ |ψj +k | · |ψk | < ∞. Then ∞ ∞ ∞ |ajk | = j =0 j =0 |ψk | |ψj +k | ≤ |ψk | j =0 |ψj | < ∞. yt−j − µ as n → ∞. yt as shown in (a). ∞ ∞ ∞ |γj | = σ 2 k=0 ψj +k ψk ≤ σ 2 k=0 |ψj +k ψk | = σ 2 k=0 |ψj +k | |ψk | < ∞. so is {ψj +k · ψk } (k = 0. j =0 k=0 This and the first inequality above mean that {γj } is absolutely summable. n → ∞.s. by (ii). 1 . E(yt ) = lim E(yt.) for any given j . Then {sk } is summable because |sk | ≤ |ψk | · M and {ψk } is absolutely summable. ψj → 0 as j → ∞. (a) Let σn ≡ n j =0 2 ψj . (b) Since yt.n )2 ] → 0 as m.n →m. Therefore. .s. Since {ψj } is absolutely summable (and hence square summable). Therefore.September 10. 2004 Hayashi Econometrics Solution to Chapter 6 Analytical Exercises 1.n − µ →m.n } converges in mean square in n by (i). 2(a). yt−j ) as n → ∞. (b) The result follows immediately from the MA representation yt−j − µ = εt−j + φ εt−j −1 + φ2 εt−j −2 + · · · . That is.n ) E(yt−j. yt−j. we have j n ξ j < bj . jnξj or j n ξ j ≤ B bj bj for j = 0. Writing down (8) for j = 0.n ) = Cov(h0 xt + h1 xt−1 + · · · + hn xt−n . yt−j. which is the desired 4. b b b bJ −1 . . Define B as B ≡ max Then.n ) − E(yt. result.n →m.n . λ2 ). n k=0 n =0 x hk h γj + −k converges as n → ∞. J − 1. 2 (J − 1)n ξ J −1 ξ 2n ξ j 3n ξ 3 . B≥ (d) The hint is the answer. (a) (8) solves the difference equation yj − φ1 yj −1 − φ2 yj −2 = 0 because yj − φ1 yj −1 − φ2 yj −2 j −j −j +1 −j +1 j +2 j +2 = (c10 λ− + c20 λ2 ) − φ2 (c10 λ− + c20 λ− ) 1 + c20 λ2 ) − φ1 (c10 λ1 1 2 j −j 2 2 = c10 λ− 1 (1 − φ1 λ1 − φ2 λ1 ) + c20 λ1 (1 − φ1 λ2 − φ2 λ2 ) =0 (since λ1 and λ2 are the roots of 1 − φ1 z − φ2 z 2 = 0). . (a) γj = Cov(yt.1 ) by yt−j − µ and take the expectation of both sides to derive the desired result... 5. 2 .n ) = E(yt. . c20 ) given (y0 . = (b) Since {hj } is absolutely summable. λ1 .n yt−j.n . 3 . . . by construction.. (a) Multiply both sides of (6. (c) For j ≥ J . 1.s. Then j n ξ j < bj < A bj for j ≥ J and j n ξ j ≤ B bj < A bj for all j = 0.2. using the facts (i) and (ii) displayed in Analytical Exercise 2. yt as n → ∞ by Proposition 6. we can show: n n x hk h γj + k=0 =0 −k = Cov(yt. Solve this for (c10 . J − 1. (b) This should be easy. 1. xt−j − ) x hk h γj + k=0 =0 −k . we have yt.3. 1 gives 1 −1 y0 = c10 + c20 . Choose A so that A > 1 and A > B . .n ) → E(yt yt−j ) − E(yt ) E(yt−j ) = Cov(yt . h0 xt−j + h1 xt−j −1 + · · · + hn xt−j −n ) n n = k=0 =0 n n hk h Cov(xt−k . y1 = c10 λ− 1 + c20 λ2 . Then... y1 .. zn zn = trace(zn zn ) = trace[ξ t−n [(F)n ] [(F)n ]ξ t−n ] = trace{ξ t−n ξ t−n [(F)n ] [(F)n ]} Since the trace and the expectations operator can be interchanged. E[(xt − xt.2. 2 1−φ 1 − φ2 1 − φ2(t−j ) 2 σ2 Cov(yt . the result proved in (a) implies that n j =1 |γj | → 0. verifying (6. n 3 . (a) The hint is the answer. Fn = T(Λ)n T−1 converges to a zero matrix. Contrary to the suggestion of the hint.1) that lim E(zn zn ) = 0. γ1 = φ.1) element of T(Λ)n T−1 . what needs to be shown is that (F)n ξ t−n →m. here we show an equivalent claim (see Review Question 2 to Section 2. (a) 1 − φt c c + φt E(y0 ) → . 
E(zn zn ) = trace{E(ξ t−n ξ t−n )[(F)n ] [(F)n ]}. 3. we have E(ξ t−n ξ t−n ) = V (the autocovariance matrix). γ0 /n → 0. (d) By the hint. which is to show the mean-square convergence of the components of zn . 1−φ 1−φ 1 − φ2t 2 σ2 Var(yt ) = σ + φ2t Var(y0 ) → .(c) Immediate from (a) and (b). 0. 9. Let zn ≡ (F)n ξ t−n . 2 1−φ 1 − φ2 E(yt ) = (b) This should be easy to verify given the above formulas. (d) Set j = 1 in (10) to obtain γ1 − ργ0 = 0. yt−j ) = φj σ + φ2(t−j ) Var(y0 ) → φj . n→∞ (since xt = xt. 8. So by the inequality for Var(y ) shown in the question. (e) ψn is the (1. 2 1−φ 1 − φ2 Then use (10) as the first-order difference equation for j = 2.5). Since ξ t is covariance-stationary. (a) Should be obvious.s. in γj with the initial σ2 σ2 j condition γ1 = 1− φ2 φ. This gives: γj = 1−φ2 φ .n + φn xt−n ) (since |φ| < 1 and E(x2 t−n ) < ∞). Combine this with (9) to solve for (γ0 . . 6. 2 (b) Since γj → 0. . (b) By the definition of mean-square convergence. 7.n )2 ] = E[(φn xt−n )2 ] = φ2n E(x2 t−n ) →0 (c) Should be obvious. Also. . γ1 ): γ0 = σ2 σ2 . Since all the roots of the characteristic equation are less than one in absolute value.n )2 ] → 0 as n → ∞. what needs to be shown is that E[(xt − xt. Var(y ) → 0. We can therefore conclude that E(zn zn ) → 0. n N n n n j aj ≤ j =1 j =1 k=j ak + j =N +1 k=j ε ak < N M + (n − N ) .2). (a) By the hint. (b) From (6. 1− n n j =1 j =1 ∞ The term in brackets converges to j =−∞ γj if {γj } is summable. 4 .5. 2 So 1 n n j aj < j =1 NM n−N ε NM ε + < + . n n 2 n 2 By taking n large enough. (a) has shown that the last term converges to zero if {γj } is summable. √ n−1 Var( n y ) = γ0 + 2 j =1   n−1 n−1 2 j γj = γ0 + 2 γj  − j γj .10. N M/n can be made less than ε/2. θ ) = f (y |x. θ )dy = 0 . H(w. (5) (p×1) Then. (c) By the hint. 2 2 2σ n t=1 1 . θ ) = into (3). 2. 505 is reproduced here. θ 0 )f (y | x. θ 0 )dy = E[s(w. θ 0 )] > 0 by hypothesis. Differentiating both sides of this identity with respect to θ . By the Law of Total Expectation. a(w) is non-constant by (a). Substituting this (4) (p×1) 0 . 2004 Hayashi Econometrics Solution to Chapter 7 Analytical Exercises 1. θ )f (y | x. E[a(w)] = 1. But log(a(w)) = log f (y |x.September 14. (1) This is an identity. E[a(w)|x] = 1. E[log(a(w))] < log(1) = 0. we obtain the desired result. θ )dy = ∂ f (y | x. θ ) is a hypothetical density. in particular. Setting θ = θ 0 . the objective function is the average log likelihood: n 1 1 1 1 2 Qn (θ ) = − log(2π ) − log(σ ) − 2 (yt − xt β )2 . θ ) f (y | x. its integral is unity: f (y | x. by the Law of Total Expectations. This holds for any θ ∈ Θ. θ ) = f (y |x. we obtain ∂ f (y | x. (p×p) The desired result follows by setting θ = θ 0 . θ )dy + s(w. θ ) s(w. 3. θ )dy = | x. (b) Set c(x) = log(x) in Jensen’s inequality. θ 0 ). valid for any θ ∈ Θ. we obtain s(w. (a) Since a(w) = 1 ⇔ f (y |x. s(w. θ )f (y |x. (b) By the hint. θ )dy = 1. θ 0 ) | x] = 0 . then ∂ ∂θ f (y | x. (d) By combining (b) and (c). But Prob[f (y |x.) Since f (y | x. (a) For the linear regression model with θ ≡ (β . ∂θ ∂ ∂ θ f (y (3) But by the definition of the score. θ ) = f (y |x. we obtain s(w. θ )f (y | x. θ 0 ). for θ 0 . (2) ∂θ (p×1) If the order of differentiation and integration can be interchanged. θ )dy. θ ) − log f (y |x. θ ). σ 2 ) . (a) (The answer on p. we have Prob[a(w) = 1] = Prob[f (y |x. θ 0 )]. θ )dy = 0 . (d) The a(θ ) and A(θ ) in Table 7. n 2 2 2 n The unconstrained ML estimator (β . 1 x − 1 > log(x) > 1 − x .2 for the present case are a(θ ) = Rβ − c. 
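Exercise 8 of Chapter 6 gives the stationary AR(1) moments, in particular $\mathrm{Cov}(y_t, y_{t-j}) \to \phi^{j}\sigma^2/(1-\phi^2)$, while Exercise 1 gives $\gamma_j = \sigma^2\sum_k \psi_{j+k}\psi_k$ for a linear process. Since the stationary AR(1) has MA($\infty$) weights $\psi_k = \phi^k$, the two agree. A tiny check (not from the original; numpy, with an arbitrary truncation point for the infinite sum):

```python
import numpy as np

phi, sigma2 = 0.8, 1.0
J, Kmax = 5, 10_000                        # lags to check; truncation point for the MA(inf) sum

psi = phi ** np.arange(Kmax)               # psi_k = phi^k for a stationary AR(1)
for j in range(J + 1):
    gamma_ma = sigma2 * np.sum(psi[j:] * psi[:Kmax - j])   # sigma^2 * sum_k psi_{j+k} psi_k
    gamma_cf = sigma2 * phi ** j / (1 - phi ** 2)          # closed form from Exercise 8
    print(j, round(gamma_ma, 6), round(gamma_cf, 6))       # agree up to truncation error
```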
average log likelihood with respect to β .3. 2 .To obtain the concentrated average log likelihood. Reproducing (part of) (7.18) − E H(wt . and 1 1 1 1 1 1 1 1 Qn (θ ) = − log(2π ) − − log( SSRU ). 1 n SSR(β ) and σ 2 = 1 n SSR(β ) into the concentrated average log 2 (c) As explained in the hint.18) of Example 7. . we obtain the concentrated average log likelihood (concentrated with respect to σ 2 ): Qn (β . LR/n = log(x). and then setting σ 2 = n The constrained ML estimator.10. is obtained from doing the same subject to the constraint Rβ = c. observe that 1 1 ∂Qn (θ ) 1 t=1 xt (yt − xt β ) σ2 n X (y − Xβ ) = = n 1 1 1 2 0 ∂θ SSR R − 2σ2 + 2σ4 n t=1 (yt − xt β ) n A(θ ) = (r ×K ) R . (b) Just substitute σ 2 = likelihood above. both σ 2 and σ 2 are consistent for σ0 .  1 0 2 E(xt xt ) σ0 . Also observe that for x > 1. both Σ and Σ are consistent for − E H(wt . (7. (β . as clear from the expression for the concentrated average log likelihood shown above. and just do the matrix algebra. SSRR 1 (f) Let x ≡ SSR . (r ×1) 0 . Observe that their values at x = 1 are all 0 and the slopes at x = 1 are all one. Also. θ 0 ) =  1 0 4 2σ 0 Clearly. 1 1 1 1 1 SSR(β )) = − log(2π ) − − log( SSR(β )). (e) The hint is the answer. σ 2 ). n Substituting this into the average log likelihood. which yields σ2 = 1 n n (yt − xt β )2 ≡ t=1 1 SSR(β ). .2 formulas. Draw U the graph of these three functions of x with x in the horizontal axis. But. take the partial derivative with respect to σ 2 and set it equal to 0. Then x ≥ 1 and W/n = x − 1. which yields β . 2 2 2 n 2 2 2 n Substitute these expressions and the expression for Σ and Σ given in the question into the Table 7. Qn (θ ) = − log(2π ) − − log( SSRR ). θ 0 ) because both σ 2 and σ 2 are n 1 2 consistent for σ0 and n t=1 xt xt is consistent for E(xt xt ).3. σ 2 ) of θ 0 is obtained by maximizing this concentrated 1 SSR(β ). maximizing the concentrated average log likelihood is equivalent to minimizing the sum of squared residuals SSR(β ). and LM/n = 1 − x . xt xt Cm + xt vt . (a) Multiply both sides of ztm = yt Sm . . Cm + xt vt . we have yt − Π xt = vt + (Π0 − Π) xt . xt Cm from left by xt to obtain 3. So for sufficiently large n. xt xt Cm . . From the hint. So (yt + Γ−1 Bxt )(yt + Γ−1 Bxt ) = [vt + (Π0 + Γ−1 B)xt ][vt + (Π0 + Γ−1 B)xt ] = vt vt + (Π0 + Γ−1 B)xt vt + vt xt (Π0 + Γ−1 B) + (Π0 + Γ−1 B)xt xt (Π0 + Γ−1 B) . n n n (yt − Π xt )(yt − Π xt ) = t=1 t=1 n vt vt + (Π − Π) t=1 n xt xt (Π − Π). xt ztm = xt yt Sm . 2. . . (b) Use the reduced form yt = Π0 xt + vt to derive yt + Γ−1 Bxt = vt + (Π0 + Γ−1 B)xt as in the hint. Ω(Π) is positive definite. . Since yt = Π0 xt + vt . But (Π − Π) xt xt (Π − Π) = t=1 t=1 Π − Π) xt xt (Π − Π) is positive semi-definite. 0 = xt xt Π0 Sm . . |Ω0 + (Π0 − Π) E(xt xt )(Π0 − Π)| ≥ |Ω0 | > 0. . . . xt ztm = xt xt Π0 Sm . Substitute this into (∗) to obtain . So E[(yt − Π xt )(yt − Π xt ) ] = E[(vt + (Π0 − Π) xt )(vt + (Π0 − Π) xt ) ] = E(vt vt ) + E[vt xt (Π0 − Π)] + E[(Π0 − Π) xt vt ] + (Π0 − Π) E(xt xt )(Π0 − Π) = E(vt vt ) + (Π0 − Π) E(xt xt )(Π0 − Π) So Ω(Π) → Ω0 + (Π0 − Π) E(xt xt )(Π0 − Π) almost surely. Take the expected value of both sides and use the fact that E(xt vt ) = 0 to obtain the desired result. (∗) Do the same to the reduced form yt = xt Π0 + vt to obtain xt yt = xt xt Π0 + xt vt . 1 . .September 22. (since E(xt vt ) = 0). By the matrix algebra result cited in the previous question. .0 . 2004 Hayashi Econometrics Solution to Chapter 8 Analytical Exercises 1. 
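Exercise 3 of Chapter 7 expresses the trinity of test statistics for the linear regression model through $x \equiv SSR_R/SSR_U \ge 1$: $W = n(x-1)$, $LR = n\log x$, $LM = n(1 - 1/x)$, hence $W \ge LR \ge LM$. A quick numerical illustration follows; it is not part of the original solutions and uses numpy with simulated data under the null that all slope coefficients are zero.

```python
import numpy as np

rng = np.random.default_rng(9)
n, K = 120, 3
X = np.hstack([np.ones((n, 1)), rng.normal(size=(n, K - 1))])
y = X @ np.array([0.5, 0.0, 0.0]) + rng.normal(size=n)   # the null (zero slopes) is true

# unrestricted OLS versus the intercept-only restricted fit
b_u = np.linalg.solve(X.T @ X, X.T @ y)
SSR_U = np.sum((y - X @ b_u) ** 2)
SSR_R = np.sum((y - y.mean()) ** 2)

x = SSR_R / SSR_U                 # >= 1 by construction
W  = n * (x - 1)                  # Wald
LR = n * np.log(x)                # likelihood ratio
LM = n * (1 - 1 / x)              # Lagrange multiplier (score)
print(W >= LR >= LM, W, LR, LM)   # the ordering always holds; all approx chi-squared(2) under H0
```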
D ≡ (Π0 + Γ−1 B) E(xt xt )(Π0 + Γ−1 B) is a zero matrix only if Π0 + Γ−1 B = 0. A is positive definite. In this 1 −1 −1 expression. Magnus and Heinz Neudecker. 21 of Matrix Differential Calculus with Applications in Statistics and Econometrics by Jan R. the probability limit of Ω(δ ) is given by this expectation. Hence 1 −1 plim |Ω(δ )| = |A + D| ≥ |A| = |Γ− 0 Σ0 (Γ0 ) |. E(vt vt ) equals Γ− 0 Σ0 (Γ0 ) because by definition vt ≡ Γ0 εt and Σ0 ≡ E(εt εt ).i.d. Since E(xt xt ) is positive definite. with equality “|A + D| = |A|” only if D = 0. Wiley. Since Σ0 is positive definite and Γ0 is non-singular. the LHS of (8) is αm = The RHS is  em − δm Sm (Km ×M ) − γ11 1 −β11 −β12 0 . 1988) Let A be positive definite and B positive semi-definite. xt } is i. 0 Cm (Km ×K ) = 0 1 0 0 0 − γ11 β11 β12 2 . (c) What needs to be proved is that plim |Ω(δ )| is minimized only if ΓΠ0 + B = 0. Since E(xt xt ) is positive definite.. Since {yt . 552): (Theorem 22 on p. Let A ≡ −1 −1 1 B) E(xt xt )(Π0 + Γ− 0 Σ0 (Γ0 ) be the first term on the RHS of (7) and let D ≡ (Π0 + Γ −1 −1 Γ B) be the second term. (d) For m = 1. which holds if and only if ΓΠ0 + B = 0 since Γ is non-singular (the parameter space is such that Γ is non-singular). D is positive semi-definite. (Mm ×M )  0 (1×(Mm +Km )) (Mm ×K ) 0    1  0 0 0 0 0 0 1 0 0 0 1  0  0 .Taking the expected value and noting that E(xt vt ) = 0. Then use the following the matrix inequality (which is slightly different from the one mentioned in Analytical Exercise 1 on p. we obtain E[(yt + Γ−1 Bxt )(yt + Γ−1 Bxt ) ] = E(vt vt ) + (Π0 + Γ−1 B) E(xt xt )(Π0 + Γ−1 B) . Then |A + B| ≥ |A| with equality if and only if B = 0. Ax = y with A ≡ [Π0 Sm . 0 (1×Lm ) Dropping xtK from the list of instruments means dropping the last row of Fm . .5. Cm ].5.16). By the same argument given in (e) with δ m replaced by δ 0m shows that δ 0m is a solution to (10). Γ0 Π0 + B0 = 0. we let Fm stand for the K × Lm matrix [Π0 Sm . y ≡ π 0m . . So the last row of of Fm is a vector of zeros:   Fm   Fm = ((K −1)×Lm ) . Since xtK does not appear in the system. B . Using (6) on (4. (e) Since αm is the m-th row of Γ . the m-th row of of the LHS of (9) equals        Sm 0   Π Π0 (Mm ×M ) (Mm ×K ) (M ×0 K ) (by (8)) αm = em − δm     0 Cm IK (1×(Mm +Km ))   I K (K ×M ) m (Km ×K ) = em Π0 Sm − δm IK 0 0 Cm Π0 IK . .15) with (4. Cm ]. x ≡ δ m . .5. We have shown in (a) that this condition is equivalent to the rank condition for identification for the m-th equation.16) on p. 0 This shows that the asymptotic variance is unchanged when xtK is dropped. In this part. be of full column rank (that is. 4. the rank of the matrix be equal to the number of columns. (10 ) A necessary and sufficient condition that δ 0m is the only solution to (10 ) is that the coefficient matrix in (10 ).5. 278. the last row of Π0 is a vector of zeros and the last row of Cm is a vector of zeros. which does not alter the full column rank condition. Sm Π0 = [[Π0 . which is K × Lm (where Lm = Mm + Km ). Rewrite (10) by taking the transpose: . (g) The hint is the answer.10)). 3 .. which is Lm ). we obtain Amh = Fm E(xt xt )Fh = Fm 0 E(xt xt ) E(xtK xt ) E(xtK xt ) E(x2 tK ) Fh = Fm E(xt xt )Fh . (f) By definition (see (8. The asymptotic variance of the FIML estimator is given in (4. . IK ] e m ] − δ m Cm = π 0m − δ m Sm Π0 Cm (by the definition of π 0m ). Since E(ξ0 / T ) → 0 and Var(ξ0 / T ) → 0. we have 1 T T ∆ξt · ξt−1 = t=1 1 ξT √ 2 T 2 − 1 ξ0 √ 2 T 2 − 1 2T T (∆ξt )2 . 2. 
consider the expression ξT / T in the first term on the RHS of (∗). X ∼ N (0.2(d) to the numerator and Proposition 9. Next. (d) • First. to 0. So √ 1 T T T ∆ξt → λX.3a)).3). t=1 d where λ2 is the long-run variance of ∆ξt . √ T tion 6. t=1 (∗) √ √ Consider the second term on the RHS of (∗). and hence in probability. Finally. since T 1 2 ∆ξt is ergodic stationary.4(a) we conclude that the RHS of (∗) converges in distribution to λ 2 X − 2 γ0 . Just set λ2 = γ0 in (4) of the question. (a) The hint is the answer. T · (ρµ − 1) = 1 T 1 T2 T µ t=1 ∆yt yt−1 . √ ξ0 / T converges in mean square (by Chevychev’s LLN). t=1 ξ0 vanishes.September 16. Regarding the third term on the RHS of (∗). λ2 = γ0 .9 is satisfied (in particular. T µ 2 t=1 (yt−1 ) Apply Proposition 9. Since ∆ξt is I(0) satisfying (9.2. α∗ = 1 T 1 T 1 T 1 T T (yt − ρµ yt−1 ) t=1 T = (∆yt − (ρµ − 1)yt−1 ) t=1 T = ∆yt − (ρµ − 1) t=1 T 1 T T yt−1 t=1 = 1 ∆yt − √ T · (ρµ − 1) T t=1 1 1 1 √ TT T yt−1 t=1 . (b) From (a). . From the hint. 1). by Lemma T 2 1 2 2.2. 21 t=1 (∆ξt ) converges in probability to 2 γ0 . 2004 Hayashi Econometrics Solution to Chapter 9 Analytical Exercises 1. So the second term vanishes (converges in probability to zero) (this can actually be shown √ directly from the definition of convergence in probability). By the algebra of OLS. It can be written as √ 1 ξ 1 ξ √T = √ (ξ0 + ∆ξ1 + · · · + ∆ξT ) = √0 + T T T T T T ∆ξt . (c) Since {yt } is random walk. the absolute summability in the hypothesis of the Proposition is satisfied because it is implied by the one-summability (9. a proof that α∗ →p 0.1)-(9.2(c) to the denominator.2. the hypothesis of ProposiAs just seen. we first note that √ T · (ρµ − 1) vanishes because T 1 1 T · (ρµ − 1) converges to a random variable by (b).1 The first term after the last equality. the whole second term vanishes. T τ 2 t=1 (yt−1 ) τ τ τ . T t=1 ∆yt yt−1 converges to a random variable. it should be easy to show that the first term on the RHS of (∗) converges to γ0 in probability. To show that the second term 1 after the last equality vanishes.2(a). Hence the first term of (∗∗) vanishes. So t=1 t−1 T 1 τ ∆ξt ξ t −1 T · (ρτ − 1) = T1 t=1 . rewrite it as 1 2 · [T · (ρµ − 1)] · T −1 T √ 2 T 1 1 ∆yt yt−1 − · [T · (ρµ − 1)] · α∗ · √ T −1 TT t=1 T T T yt−1 . • Now turn to s2 . by Lemma 2. Let ξt and ξt be as defined in the hint. 2 . T T s2 = 1 T −1 T (∆yt − α∗ )2 − t=1 2 1 · [T · (ρµ − 1)] · T −1 T T (∆yt − α∗ ) · yt−1 t=1 T 1 1 · [T · (ρµ − 1)]2 · 2 + T −1 T (yt−1 )2 . By construction.4(b). (a) The hint is the answer. we have T · (ρτ − 1) = 1 T 1 T2 T τ t=1 ∆yt yt−1 . By (6) in the hint. So does T · (ρµ − 1). (e) By (7) in the hint and (3). From the hint. (c) Just observe that λ2 = γ0 if {yt } is a random walk with or without drift. shows that the third term of (∗) vanishes. t=1 (∗) Since α∗ →p 0. Then ∆yt = δ + ∆ξt and yt = ξt T τ y = 0.2(b). It should now T T be routine to show that the whole second term of (∗∗) vanishes. (b) From (a). T t=1 ∆yt . Therefore. A similar argument. Use Proposition 9. (6) T 1 1 in the question means √ t=1 yt−1 converges to a random variable. T τ 2 ) ( ξ 2 t − 1 t =1 T Since {ξt } is driftless I(1). vanishes (converges to zero in probability) because ∆yt is ergodic stationary and E(∆yt ) = 0.2(c) and (d) with λ2 = γ0 = σ 2 and the fact that s is consistent for σ to complete the proof. (∗∗) t=1 1 By Proposition 9. Turning to the second term of (∗∗).2(e) and (f) can be used here. 
a little algebra yields tµ = ρµ − 1 s· 1 µ T 2 t=1 (yt−1 ) = s· 1 T T t=1 1 T2 µ ∆ yt yt −1 T µ 2 t=1 (yt−1 ) . 3. Regarding the second term. Proposition 9. √ t=1 yt−1 T T converges to a random variable. this time utilizing Proposition 9. 7 has an intercept.14). Then by the ergodic theorem T 1 this second term vanishes. .. The proof of Proposition 9. is independent of εt . where λ2 = σ 2 [ψ (1)]2 with σ 2 ≡ −1 ) →d λ Var(εt ). T t=1 εt →p 0. 2. Since E(∆yt−1 ) = 0 and E[(∆yt−1 )2 ] = γ0 (the variance of ∆yt ). Lastly. Since ηt−1 .4). √ t=1 yt−1 . So the whole third term vanishes.1) element of AT : Since {yt } is driftless I(1) under the null.7. 1 • (2. Therefore. Regarding the third term of (∗).2(c) can T µ 2 2 be used to claim that T12 t=1 (yt (W µ )2 .4.. T . cT = T (µ) µ εt t=1 (∆yt−1 ) = T (µ) εt t=1 (∆yt−1 ) . which is shown to converge to a random variable T 1 1 (Review Question 3 of Section 9. 1 T T yt−1 εt = ψ (1) t=1 1 T T wt−1 εt + t=1 1 T T ηt−1 εt + (y0 − η0 ) t=1 1 T T εt . φ(L)∆yt = εt . this element can be 2 1 T T ∆yt−1 t=1 . which is a function of (εt−1 . εt−2 . the off-diagonal elements vanish. . where εµ t is the residual from the regression of εt on a constant for t = 1. whose MA representation is ∆yt = ψ (L)εt with ψ (L) ≡ φ(L)−1 ) but the augmented autoregression in Proposition 9.6 and 9. The next term. Since {wt } is random walk and T 1 σ2 2 εt = ∆wt . The AT and cT for the present case is AT = 1 T2 1 1 √ T T 1 T 1 √ T T µ 2 t=1 (yt−1 ) T (µ) t=1 (∆yt−1 ) T t=1 µ µ yt −1 εt 1 1 √ T T 1 T 1 T 1 √ T T µ (µ) t=1 yt−1 (∆yt−1 ) T (µ) 2 ] t=1 [(∆yt−1 ) T t=1 µ yt −1 εt µ yt −1 . From the hint. this expression converges in probability to γ0 . 587-590. we have shown that AT is asymptotically diagonal: AT → d λ2 · 1 [W µ (r)]2 0 dr 0 3 0 . Let b and β be as defined in the hint. γ0 . The last term.. • Off diagonal elements of AT : it equals 1 1 √ TT 1 µ (∆yt−1 )(µ) yt −1 = √ T t=1 T 1 T 1 1 (∆yt−1 ) yt−1 − √ TT t=1 T T yt−1 t=1 1 T T ∆yt−1 t=1 . converges to a ranT T 1 dom variable by (6) assumed in Analytical Exercise 2(d).4. t=1 (∗) Consider first the second term on the RHS of (∗). Proposition 9. consider the first term on the RHS of (∗). 5. ). The term in the square bracket is (9. Comparing Proposition 9. Taken together. . • (1. Proposition 9. . T t=1 ∆yt−1 . we have: E(ηt−1 εt ) = E(ηt−1 ) E(εt ) = 0.2(b) with λ2 = γ0 = σ 2 implies T t=1 wt−1 εt →d 2 [W (1) − 1].2) element of AT : Since (∆yt−1 )(µ) = ∆yt−1 − T written as T T 1 1 [(∆yt−1 )(µ) ]2 = (∆yt−1 )2 − T t=1 T t=1 T t=1 ∆yt−1 . T converges to zero in probability. the null is the same (that {∆yt } is zero-mean stationary AR(p).7 (for p = 1) makes appropriate changes on the argument developed on pp. t=1 1 where ≡ wt−1 − T wt−1 . Using the results derived so far. 590 for the present case where the augmented autoregression has an intercept is T · (ρµ − 1) → d σ 2 ψ (1) · λ2 1 2 [W (1)µ ]2 − [W (0)µ ]2 − 1 1 [W µ (r)]2 0 or σ2 . Therefore. T t=1 1 • 2nd element of cT : Using the definition (∆yt−1 )(µ) ≡ ∆yt−1 − T easy to show that it converges in distribution to ∆yt−1 . (a) The hint is the answer.. γ0 · σ 2 ).2.20) and (9.2(d) to the random walk {wt }. 2.. 590. 7. Combine this with the BN decomposition yt−1 = ψ (1)wt−1 + ηt−1 + (y0 − η0 ) with wt−1 ≡ ε1 + · · · + εt−1 to obtain T 1 T µ wt −1 T µ yt −1 εt = ψ (1) t=1 T t=1 1 T T µ wt −1 εt + t=1 1 T T µ ηt −1 εt . For this purpose.21) on p.4. 
Repeating exactly the same argument that is given in the subsection entitled “Deriving Test 2 Statistics” on p.so (AT )−1 → d λ2 · 1 [W µ (r)]2 0 dr −1 0 0 −1 . 6. we obtain µ ηt −1 1 T T µ wt −1 εt → t=1 d σ2 2 [W (1)µ ]2 − [W (0)µ ]2 − 1 . This completes the proof of claim (9. The one-line proof displayed in the hint is (with i replaced by k to avoid confusion) ∞ ∞ ∞ ∞ ∞ ∞ |αj | = j =0 j =0 − k=j +1 ψk ≤ j =0 k=j +1 |ψk | = k=0 k |ψk | < ∞.) is one-summable as assumed in (9.4. we can claim that σ2λ ψ (1) is consistently estimated by 1/(1 − ζ ). (∗) where {ψk } (k = 0. the second term on the RHS vanishes. γ0 dr d λ2 µ · T · (ρµ − 1) → DFρ . we reproduce here the facts from calculus shown on pp. it should be c2 ∼ N (0. 429-430: 4 . We now justify each of the equalities and inequalities. Since ηt−1 is independent of εt . Noting that ∆wt = εt and applying Proposition 9. the 1st element of cT converges in distribution to c1 ≡ σ 2 · ψ (1) · 1 2 [W (1)µ ]2 − [W (0)µ ]2 − 1 . γ0 Now turn to cT .34) of Proposition 9. 1. d σ 2 ψ (1) √ T · (ζ1 − ζ1 ) → N 0.4.3a).7. is defined similarly. (b) The proof should be straightforward. µ 1 • 1st element of cT : Recall that yt −1 ≡ yt−1 − T t=1 yt−1 . . the modification to be made on (9. . Summing over j = 0.. . 1. Let ak = ψk 0 if k ≥ j + 1. Suppose {sk } is summable. If the limit as n → ∞ of the RHS exists and is finite. Then ∞ ∞ ∞ ∞ ∞ ∞ |ajk | < ajk j =0 k=0 < ∞ and j =0 k=0 ajk = k=0 j =0 ajk < ∞. {sk } is summable.. 2. So by (i) above. 0 otherwise. Then {ak } is absolutely summable because {ψk } is absolutely summable.(i) If {ak } is absolutely summable. then the limit of the LHS exists and is finite (this follows from the fact that if {xn } is non-decreasing in n and if xn ≤ A < ∞. {ajk } (j. 1. ajk = ∞ Then j =0 |ajk | = k |ψk | < ∞ for each k and sk = k |ψk |. k = 0. 2. Since {ψk } is one-summable. we obtain n ∞ n ∞ − j =0 k=j +1 ψk ≤ j =0 k=j +1 |ψk |. We now show that ∞ j =0 ∞ k=j +1 |ψk | is well-defined.e. .).. So the conditions in (ii) are satisfied for this choice of ajk . it is absolutely summable. we have ∞ ∞ ∞ ∞ − j =0 k=j +1 ψk ≤ j =0 k=j +1 |ψk |. provided that ∞ ∞ j =0 k=j +1 |ψk | is well-defined. n ∞ then the limit of xn exists and is finite. . set xn ≡ j =0 | − k=j +1 ψk |). otherwise. −∞ < ∞ ∞ ∞ k=0 ak < ∞) and ak ≤ k=0 k=0 |ak |. Suppose ∞ ∞ for each k and let sk ≡ j =0 |ajk |.. By one-summability of {ψk }. ∞ j =0 (ii) Consider a sequence with two subscripts. we have ∞ ∞ ∞ ∞ ∞ − k=j +1 ψk = k=j +1 ψk = k=0 ak ≤ k=0 |ak | = k=j +1 |ψk |. Thus. then {ak } is summable (i. n. We therefore conclude that ∞ ∞ ∞ ∞ ∞ ∞ ∞ |ψk | = j =0 k=j +1 j =0 k=0 ajk = k=0 j =0 ajk = k=0 k |ψk | < ∞. 5 . set ajk as |ψk | if k ≥ j + 1. In (ii). This completes the proof.
Copyright © 2024 DOKUMEN.SITE Inc.