Prob Stats Solution Manual Annotated

March 27, 2018 | Author: Hector Gutierrez | Category: Batting (Baseball), Mathematics, Science, Sports



Probability and Stochastic Processes: A Friendly Introduction for Electrical and Computer Engineers
SECOND EDITION
Problem Solutions
September 28, 2005 Draft
Roy D. Yates, David J. Goodman, David Famolari

• This solution manual remains under construction. The current count is that 678 (out of 687) problems have solutions. The unsolved problems are 12.1.7, 12.1.8, 12.5.8, 12.5.9, 12.11.5 – 12.11.9. If you volunteer a solution for one of those problems, we’ll be happy to include it . . . and, of course, “your wildest dreams will come true.”
• Of course, the correctness of every single solution remains unconfirmed. If you find errors or have suggestions or comments, please send email: [email protected].
• If you need to make solution sets for your class, you might like the Solution Set Constructor at the instructors site www.winlab.rutgers.edu/probsolns. If you need access, send email: [email protected].
• Matlab functions written as solutions to homework problems can be found in the archive matsoln.zip (available to instructors) or in the directory matsoln. Other Matlab functions used in the text or in these homework solutions can be found in the archive matcode.zip or directory matcode. The .m files in matcode are available for download from the Wiley website. Two other documents of interest are also available for download:
  – A manual, probmatlab.pdf, describing the matcode .m functions.
  – The quiz solutions manual, quizsol.pdf.
• A web-based solution set constructor for the second edition is available to instructors at http://www.winlab.rutgers.edu/probsolns
• The next update of this solution manual is likely to occur in January, 2006.

Problem Solutions – Chapter 1

Problem 1.1.1 Solution
Based on the Venn diagram

[Venn diagram of the sets M, O, and T]

the answers are fairly straightforward:

(a) Since $T \cap M \neq \phi$, $T$ and $M$ are not mutually exclusive.

(b) Every pizza is either Regular ($R$), or Tuscan ($T$). Hence $R \cup T = S$ so that $R$ and $T$ are collectively exhaustive.
Thus it’s also (trivially) true that $R \cup T \cup M = S$. That is, $R$, $T$ and $M$ are also collectively exhaustive.

(c) From the Venn diagram, $T$ and $O$ are mutually exclusive. In words, this means that Tuscan pizzas never have onions or pizzas with onions are never Tuscan. As an aside, “Tuscan” is a fake pizza designation; one shouldn’t conclude that people from Tuscany actually dislike onions.

(d) From the Venn diagram, $M \cap T$ and $O$ are mutually exclusive. Thus Gerlanda’s doesn’t make Tuscan pizza with mushrooms and onions.

(e) Yes. In terms of the Venn diagram, these pizzas are in the set $(T \cup M \cup O)^c$.

Problem 1.1.2 Solution
Based on the Venn diagram

[Venn diagram of the sets M, O, and T]

the complete Gerlanda’s pizza menu is
• Regular without toppings
• Regular with mushrooms
• Regular with onions
• Regular with mushrooms and onions
• Tuscan without toppings
• Tuscan with mushrooms

Problem 1.2.1 Solution
(a) An outcome specifies whether the fax is high ($h$), medium ($m$), or low ($l$) speed, and whether the fax has two ($t$) pages or four ($f$) pages. The sample space is
$$S = \{ht, hf, mt, mf, lt, lf\}. \tag{1}$$

(b) The event that the fax is medium speed is $A_1 = \{mt, mf\}$.

(c) The event that a fax has two pages is $A_2 = \{ht, mt, lt\}$.

(d) The event that a fax is either high speed or low speed is $A_3 = \{ht, hf, lt, lf\}$.

(e) Since $A_1 \cap A_2 = \{mt\}$ is not empty, $A_1$, $A_2$, and $A_3$ are not mutually exclusive.

(f) Since
$$A_1 \cup A_2 \cup A_3 = \{ht, hf, mt, mf, lt, lf\} = S, \tag{2}$$
the collection $A_1$, $A_2$, $A_3$ is collectively exhaustive.

Problem 1.2.2 Solution
(a) The sample space of the experiment is
$$S = \{aaa, aaf, afa, faa, ffa, faf, aff, fff\}. \tag{1}$$

(b) The event that the circuit from Z fails is
$$Z_F = \{aaf, aff, faf, fff\}. \tag{2}$$
The event that the circuit from X is acceptable is
$$X_A = \{aaa, aaf, afa, aff\}. \tag{3}$$

(c) Since $Z_F \cap X_A = \{aaf, aff\} \neq \phi$, $Z_F$ and $X_A$ are not mutually exclusive.

(d) Since $Z_F \cup X_A = \{aaa, aaf, afa, aff, faf, fff\} \neq S$, $Z_F$ and $X_A$ are not collectively exhaustive.
(e) The event that more than one circuit is acceptable is
$$C = \{aaa, aaf, afa, faa\}. \tag{4}$$
The event that at least two circuits fail is
$$D = \{ffa, faf, aff, fff\}. \tag{5}$$

(f) Inspection shows that $C \cap D = \phi$ so $C$ and $D$ are mutually exclusive.

(g) Since $C \cup D = S$, $C$ and $D$ are collectively exhaustive.

Problem 1.2.3 Solution
The sample space is
$$S = \{A\clubsuit, \ldots, K\clubsuit, A\diamondsuit, \ldots, K\diamondsuit, A\heartsuit, \ldots, K\heartsuit, A\spadesuit, \ldots, K\spadesuit\}. \tag{1}$$
The event $H$ is the set
$$H = \{A\heartsuit, \ldots, K\heartsuit\}. \tag{2}$$

Problem 1.2.4 Solution
The sample space is
$$S = \left\{ \begin{array}{l} 1/1 \ldots 1/31,\ 2/1 \ldots 2/29,\ 3/1 \ldots 3/31,\ 4/1 \ldots 4/30, \\ 5/1 \ldots 5/31,\ 6/1 \ldots 6/30,\ 7/1 \ldots 7/31,\ 8/1 \ldots 8/31, \\ 9/1 \ldots 9/30,\ 10/1 \ldots 10/31,\ 11/1 \ldots 11/30,\ 12/1 \ldots 12/31 \end{array} \right\}. \tag{1}$$
The event $H$ of a July birthday is described by the following 31 sample points:
$$H = \{7/1, 7/2, \ldots, 7/31\}. \tag{2}$$

Problem 1.2.5 Solution
Of course, there are many answers to this problem. Here are four event spaces.

1. We can divide students into engineers or non-engineers. Let $A_1$ equal the set of engineering students and $A_2$ the non-engineers. The pair $\{A_1, A_2\}$ is an event space.

2. We can also separate students by GPA. Let $B_i$ denote the subset of students with GPAs $G$ satisfying $i - 1 \le G < i$. At Rutgers, $\{B_1, B_2, \ldots, B_5\}$ is an event space. Note that $B_5$ is the set of all students with perfect 4.0 GPAs. Of course, other schools use different scales for GPA.

3. We can also divide the students by age. Let $C_i$ denote the subset of students of age $i$ in years. At most universities, $\{C_{10}, C_{11}, \ldots, C_{100}\}$ would be an event space. Since a university may have prodigies either under 10 or over 100, we note that $\{C_0, C_1, \ldots\}$ is always an event space.

4. Lastly, we can categorize students by attendance. Let $D_0$ denote the set of students who have missed zero lectures and let $D_1$ denote all other students. Although it is likely that $D_0$ is an empty set, $\{D_0, D_1\}$ is a well defined event space.
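The two properties that define an event space in Problems 1.2.1 through 1.2.5, mutual exclusivity and collective exhaustiveness, can be checked mechanically when the sample space is finite. The following is a minimal Python sketch and is not part of the original manual. It uses the fax sample space of Problem 1.2.1; the helper names `mutually_exclusive`, `collectively_exhaustive`, and `is_event_space` are hypothetical and introduced only for illustration.

```python
from itertools import combinations


def mutually_exclusive(events):
    """Return True if no two events in the collection share an outcome."""
    return all(a.isdisjoint(b) for a, b in combinations(events, 2))


def collectively_exhaustive(events, sample_space):
    """Return True if the union of the events equals the sample space."""
    return set().union(*events) == set(sample_space)


def is_event_space(events, sample_space):
    """An event space is both mutually exclusive and collectively exhaustive."""
    return mutually_exclusive(events) and collectively_exhaustive(events, sample_space)


# Fax experiment of Problem 1.2.1: speed (h, m, l) and pages (t, f).
S = {"ht", "hf", "mt", "mf", "lt", "lf"}
A1 = {"mt", "mf"}              # medium speed
A2 = {"ht", "mt", "lt"}        # two pages
A3 = {"ht", "hf", "lt", "lf"}  # high or low speed

print(mutually_exclusive([A1, A2, A3]))          # A1 and A2 share mt
print(collectively_exhaustive([A1, A2, A3], S))  # union covers S
print(is_event_space([A1, A3], S))               # medium vs. not medium
```

Under these assumptions, the collection $A_1, A_2, A_3$ fails the first check and passes the second, matching parts (e) and (f) of Problem 1.2.1, while the pair $\{A_1, A_3\}$ passes both.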

Problem 1.2.6 Solution
Let $R_1$ and $R_2$ denote the measured resistances. The pair $(R_1, R_2)$ is an outcome of the experiment. Some event spaces include

1. If we need to check that neither resistance is too high, an event space is
$$A_1 = \{R_1 < 100, R_2 < 100\}, \qquad A_2 = \{\text{either } R_1 \ge 100 \text{ or } R_2 \ge 100\}. \tag{1}$$

2. If we need to check whether the first resistance exceeds the second resistance, an event space is
$$B_1 = \{R_1 > R_2\}, \qquad B_2 = \{R_1 \le R_2\}. \tag{2}$$

3. If we need to check whether each resistance doesn’t fall below a minimum value (in this case 50 ohms for $R_1$ and 100 ohms for $R_2$), an event space is
$$C_1 = \{R_1 < 50, R_2 < 100\}, \qquad C_2 = \{R_1 < 50, R_2 \ge 100\}, \tag{3}$$
$$C_3 = \{R_1 \ge 50, R_2 < 100\}, \qquad C_4 = \{R_1 \ge 50, R_2 \ge 100\}. \tag{4}$$

4. If we want to check whether the resistors in parallel are within an acceptable range of 90 to 110 ohms, an event space is
$$D_1 = \left\{(1/R_1 + 1/R_2)^{-1} < 90\right\}, \tag{5}$$
$$D_2 = \left\{90 \le (1/R_1 + 1/R_2)^{-1} \le 110\right\}, \tag{6}$$
$$D_3 = \left\{110 < (1/R_1 + 1/R_2)^{-1}\right\}. \tag{7}$$

Problem 1.3.1 Solution
The sample space of the experiment is
$$S = \{LF, BF, LW, BW\}. \tag{1}$$
From the problem statement, we know that $P[LF] = 0.5$, $P[BF] = 0.2$ and $P[BW] = 0.2$. This implies $P[LW] = 1 - 0.5 - 0.2 - 0.2 = 0.1$. The questions can be answered using Theorem 1.5.

(a) The probability that a program is slow is
$$P[W] = P[LW] + P[BW] = 0.1 + 0.2 = 0.3. \tag{2}$$

(b) The probability that a program is big is
$$P[B] = P[BF] + P[BW] = 0.2 + 0.2 = 0.4. \tag{3}$$

(c) The probability that a program is slow or big is
$$P[W \cup B] = P[W] + P[B] - P[BW] = 0.3 + 0.4 - 0.2 = 0.5. \tag{4}$$

Problem 1.3.2 Solution
A sample outcome indicates whether the cell phone is handheld ($H$) or mobile ($M$) and whether the speed is fast ($F$) or slow ($W$). The sample space is
$$S = \{HF, HW, MF, MW\}. \tag{1}$$
The problem statement tells us that $P[HF] = 0.2$, $P[MW] = 0.1$ and $P[F] = 0.5$. We can use these facts to find the probabilities of the other outcomes. In particular,
$$P[F] = P[HF] + P[MF]. \tag{2}$$
This implies
$$P[MF] = P[F] - P[HF] = 0.5 - 0.2 = 0.3. \tag{3}$$
Also, since the probabilities must sum to 1,
$$P[HW] = 1 - P[HF] - P[MF] - P[MW] = 1 - 0.2 - 0.3 - 0.1 = 0.4. \tag{4}$$
Now that we have found the probabilities of the outcomes, finding any other probability is easy.

(a) The probability a cell phone is slow is
$$P[W] = P[HW] + P[MW] = 0.4 + 0.1 = 0.5. \tag{5}$$

(b) The probability that a cell phone is mobile and fast is $P[MF] = 0.3$.

(c) The probability that a cell phone is handheld is
$$P[H] = P[HF] + P[HW] = 0.2 + 0.4 = 0.6. \tag{6}$$

Problem 1.3.3 Solution
A reasonable probability model that is consistent with the notion of a shuffled deck is that each card in the deck is equally likely to be the first card. Let $H_i$ denote the event that the first card drawn is the $i$th heart where the first heart is the ace, the second heart is the deuce and so on. In that case, $P[H_i] = 1/52$ for $1 \le i \le 13$. The event $H$ that the first card is a heart can be written as the disjoint union
$$H = H_1 \cup H_2 \cup \cdots \cup H_{13}. \tag{1}$$
Using Theorem 1.1, we have
$$P[H] = \sum_{i=1}^{13} P[H_i] = 13/52. \tag{2}$$
This is the answer you would expect since 13 out of 52 cards are hearts. The point to keep in mind is that this is not just the common sense answer but is the result of a probability model for a shuffled deck and the axioms of probability.

Problem 1.3.4 Solution
Let $s_i$ denote the outcome that the down face has $i$ dots. The sample space is $S = \{s_1, \ldots, s_6\}$. The probability of each sample outcome is $P[s_i] = 1/6$. From Theorem 1.1, the probability of the event $E$ that the roll is even is
$$P[E] = P[s_2] + P[s_4] + P[s_6] = 3/6. \tag{1}$$

Problem 1.3.5 Solution
Let $s_i$ equal the outcome of the student’s quiz. The sample space is then composed of all the possible grades that she can receive.
$$S = \{0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10\}. \tag{1}$$
Since each of the 11 possible outcomes is equally likely, the probability of receiving a grade of $i$, for each $i = 0, 1, \ldots, 10$ is $P[s_i] = 1/11$.
The probability that the student gets an A is the probability that she gets a score of 9 or higher. That is
$$P[\text{Grade of A}] = P[9] + P[10] = 1/11 + 1/11 = 2/11. \tag{2}$$
The probability of failing requires the student to get a grade less than 4.
$$P[\text{Failing}] = P[3] + P[2] + P[1] + P[0] = 1/11 + 1/11 + 1/11 + 1/11 = 4/11. \tag{3}$$

Problem 1.4.1 Solution
From the table we look to add all the disjoint events that contain $H_0$ to express the probability that a caller makes no hand-offs as
$$P[H_0] = P[LH_0] + P[BH_0] = 0.1 + 0.4 = 0.5. \tag{1}$$
In a similar fashion we can express the probability that a call is brief by
$$P[B] = P[BH_0] + P[BH_1] + P[BH_2] = 0.4 + 0.1 + 0.1 = 0.6. \tag{2}$$
The probability that a call is long or makes at least two hand-offs is
$$P[L \cup H_2] = P[LH_0] + P[LH_1] + P[LH_2] + P[BH_2] \tag{3}$$
$${} = 0.1 + 0.1 + 0.2 + 0.1 = 0.5. \tag{4}$$

Problem 1.4.2 Solution
(a) From the given probability distribution of billed minutes, $M$, the probability that a call is billed for more than 3 minutes is
$$P[L] = 1 - P[\text{3 or fewer billed minutes}] \tag{1}$$
$${} = 1 - P[B_1] - P[B_2] - P[B_3] \tag{2}$$
$${} = 1 - \alpha - \alpha(1 - \alpha) - \alpha(1 - \alpha)^2 \tag{3}$$
$${} = (1 - \alpha)^3 = 0.57. \tag{4}$$

(b) The probability that a call will be billed for 9 minutes or less is
$$P[\text{9 minutes or less}] = \sum_{i=1}^{9} \alpha(1 - \alpha)^{i-1} = 1 - (0.57)^3. \tag{5}$$

Problem 1.4.3 Solution
The first generation consists of two plants each with genotype $yg$ or $gy$. They are crossed to produce the following second generation genotypes, $S = \{yy, yg, gy, gg\}$. Each genotype is just as likely as any other so the probability of each genotype is consequently 1/4. A pea plant has yellow seeds if it possesses at least one dominant $y$ gene. The set of pea plants with yellow seeds is
$$Y = \{yy, yg, gy\}. \tag{1}$$
So the probability of a pea plant with yellow seeds is
$$P[Y] = P[yy] + P[yg] + P[gy] = 3/4. \tag{2}$$

Problem 1.4.4 Solution
Each statement is a consequence of part 4 of Theorem 1.4.

(a) Since $A \subset A \cup B$, $P[A] \le P[A \cup B]$.

(b) Since $B \subset A \cup B$, $P[B] \le P[A \cup B]$.

(c) Since $A \cap B \subset A$, $P[A \cap B] \le P[A]$.
(d) Since $A \cap B \subset B$, $P[A \cap B] \le P[B]$.

Problem 1.4.5 Solution
Specifically, we will use Theorem 1.7(c) which states that for any events $A$ and $B$,
$$P[A \cup B] = P[A] + P[B] - P[A \cap B]. \tag{1}$$
To prove the union bound by induction, we first prove the theorem for the case of $n = 2$ events. In this case, by Theorem 1.7(c),
$$P[A_1 \cup A_2] = P[A_1] + P[A_2] - P[A_1 \cap A_2]. \tag{2}$$
By the first axiom of probability, $P[A_1 \cap A_2] \ge 0$. Thus,
$$P[A_1 \cup A_2] \le P[A_1] + P[A_2], \tag{3}$$
which proves the union bound for the case $n = 2$. Now we make our induction hypothesis that the union bound holds for any collection of $n - 1$ subsets. In this case, given subsets $A_1, \ldots, A_n$, we define
$$A = A_1 \cup A_2 \cup \cdots \cup A_{n-1}, \qquad B = A_n. \tag{4}$$
By our induction hypothesis,
$$P[A] = P[A_1 \cup A_2 \cup \cdots \cup A_{n-1}] \le P[A_1] + \cdots + P[A_{n-1}]. \tag{5}$$
This permits us to write
$$P[A_1 \cup \cdots \cup A_n] = P[A \cup B] \tag{6}$$
$${} \le P[A] + P[B] \quad \text{(by the union bound for } n = 2\text{)} \tag{7}$$
$${} = P[A_1 \cup \cdots \cup A_{n-1}] + P[A_n] \tag{8}$$
$${} \le P[A_1] + \cdots + P[A_{n-1}] + P[A_n], \tag{9}$$
which completes the inductive proof.

Problem 1.4.6 Solution
(a) For convenience, let $p_i = P[FH_i]$ and $q_i = P[VH_i]$. Using this shorthand, the six unknowns $p_0, p_1, p_2, q_0, q_1, q_2$ fill the table as
$$\begin{array}{c|ccc} & H_0 & H_1 & H_2 \\ \hline F & p_0 & p_1 & p_2 \\ V & q_0 & q_1 & q_2 \end{array} \tag{1}$$
However, we are given a number of facts:
$$p_0 + q_0 = 1/3, \qquad p_1 + q_1 = 1/3, \tag{2}$$
$$p_2 + q_2 = 1/3, \qquad p_0 + p_1 + p_2 = 5/12. \tag{3}$$
Other facts, such as $q_0 + q_1 + q_2 = 7/12$, can be derived from these facts. Thus, we have four equations and six unknowns; choosing $p_0$ and $p_1$ will specify the other unknowns. Unfortunately, arbitrary choices for either $p_0$ or $p_1$ will lead to negative values for the other probabilities. In terms of $p_0$ and $p_1$, the other unknowns are
$$q_0 = 1/3 - p_0, \qquad p_2 = 5/12 - (p_0 + p_1), \tag{4}$$
$$q_1 = 1/3 - p_1, \qquad q_2 = p_0 + p_1 - 1/12. \tag{5}$$
Because the probabilities must be nonnegative, we see that
$$0 \le p_0 \le 1/3, \tag{6}$$
$$0 \le p_1 \le 1/3, \tag{7}$$
$$1/12 \le p_0 + p_1 \le 5/12. \tag{8}$$
Although there are an infinite number of solutions, three possible solutions are:
$$p_0 = 1/3, \quad p_1 = 1/12, \quad p_2 = 0, \tag{9}$$
$$q_0 = 0, \quad q_1 = 1/4, \quad q_2 = 1/3, \tag{10}$$
and
$$p_0 = 1/4, \quad p_1 = 1/12, \quad p_2 = 1/12, \tag{11}$$
$$q_0 = 1/12, \quad q_1 = 3/12, \quad q_2 = 3/12, \tag{12}$$
and
$$p_0 = 0, \quad p_1 = 1/12, \quad p_2 = 1/3, \tag{13}$$
$$q_0 = 1/3, \quad q_1 = 3/12, \quad q_2 = 0. \tag{14}$$

(b) In terms of the $p_i$, $q_i$ notation, the new facts are $p_0 = 1/4$ and $q_1 = 1/6$. These extra facts uniquely specify the probabilities. In this case,
$$p_0 = 1/4, \quad p_1 = 1/6, \quad p_2 = 0, \tag{15}$$
$$q_0 = 1/12, \quad q_1 = 1/6, \quad q_2 = 1/3. \tag{16}$$

Problem 1.4.7 Solution
It is tempting to use the following proof: Since $S$ and $\phi$ are mutually exclusive, and since $S = S \cup \phi$,
$$1 = P[S \cup \phi] = P[S] + P[\phi]. \tag{1}$$
Since $P[S] = 1$, we must have $P[\phi] = 0$. The above “proof” used the property that for mutually exclusive sets $A_1$ and $A_2$,
$$P[A_1 \cup A_2] = P[A_1] + P[A_2]. \tag{2}$$
The problem is that this property is a consequence of the three axioms, and thus must be proven. For a proof that uses just the three axioms, let $A_1$ be an arbitrary set and for $n = 2, 3, \ldots$, let $A_n = \phi$. Since $A_1 = \cup_{i=1}^{\infty} A_i$, we can use Axiom 3 to write
$$P[A_1] = P\left[\cup_{i=1}^{\infty} A_i\right] = P[A_1] + P[A_2] + \sum_{i=3}^{\infty} P[A_i]. \tag{3}$$
By subtracting $P[A_1]$ from both sides, the fact that $A_2 = \phi$ permits us to write
$$P[\phi] + \sum_{i=3}^{\infty} P[A_i] = 0. \tag{4}$$
By Axiom 1, $P[A_i] \ge 0$ for all $i$. Thus, $\sum_{i=3}^{\infty} P[A_i] \ge 0$. This implies $P[\phi] \le 0$. Since Axiom 1 requires $P[\phi] \ge 0$, we must have $P[\phi] = 0$.

Problem 1.4.8 Solution
Following the hint, we define the set of events $\{A_i \mid i = 1, 2, \ldots\}$ such that for $i = 1, \ldots, m$, $A_i = B_i$ and for $i > m$, $A_i = \phi$. By construction, $\cup_{i=1}^{m} B_i = \cup_{i=1}^{\infty} A_i$. Axiom 3 then implies
$$P\left[\cup_{i=1}^{m} B_i\right] = P\left[\cup_{i=1}^{\infty} A_i\right] = \sum_{i=1}^{\infty} P[A_i]. \tag{1}$$
For $i > m$, $P[A_i] = P[\phi] = 0$, yielding the claim $P[\cup_{i=1}^{m} B_i] = \sum_{i=1}^{m} P[A_i] = \sum_{i=1}^{m} P[B_i]$. Note that the fact that $P[\phi] = 0$ follows from Axioms 1 and 3, as shown in Problem 1.4.7. This problem is more challenging if you just use Axiom 3.
We start by observing
$$P\left[\cup_{i=1}^{m} B_i\right] = \sum_{i=1}^{m-1} P[B_i] + \sum_{i=m}^{\infty} P[A_i]. \tag{2}$$
Now, we use Axiom 3 again on the countably infinite sequence $A_m, A_{m+1}, \ldots$ to write
$$\sum_{i=m}^{\infty} P[A_i] = P[A_m \cup A_{m+1} \cup \cdots] = P[B_m]. \tag{3}$$
Thus, we have used just Axiom 3 to prove Theorem 1.4: $P[\cup_{i=1}^{m} B_i] = \sum_{i=1}^{m} P[B_i]$.

Problem 1.4.9 Solution
Each claim in Theorem 1.7 requires a proof from which we can check which axioms are used. However, the problem is somewhat hard because there may still be a simpler proof that uses fewer axioms. Still, the proof of each part will need Theorem 1.4 which we now prove. For the mutually exclusive events $B_1, \ldots, B_m$, let $A_i = B_i$ for $i = 1, \ldots, m$ and let $A_i = \phi$ for $i > m$. In that case, by Axiom 3,
$$P[B_1 \cup B_2 \cup \cdots \cup B_m] = P[A_1 \cup A_2 \cup \cdots] \tag{1}$$
$${} = \sum_{i=1}^{m-1} P[A_i] + \sum_{i=m}^{\infty} P[A_i] \tag{2}$$
$${} = \sum_{i=1}^{m-1} P[B_i] + \sum_{i=m}^{\infty} P[A_i]. \tag{3}$$
Now, we use Axiom 3 again on $A_m, A_{m+1}, \ldots$ to write
$$\sum_{i=m}^{\infty} P[A_i] = P[A_m \cup A_{m+1} \cup \cdots] = P[B_m]. \tag{4}$$
Thus, we have used just Axiom 3 to prove Theorem 1.4:
$$P[B_1 \cup B_2 \cup \cdots \cup B_m] = \sum_{i=1}^{m} P[B_i]. \tag{5}$$

(a) To show $P[\phi] = 0$, let $B_1 = S$ and let $B_2 = \phi$. Thus by Theorem 1.4,
$$P[S] = P[B_1 \cup B_2] = P[B_1] + P[B_2] = P[S] + P[\phi]. \tag{6}$$
Thus, $P[\phi] = 0$. Note that this proof uses only Theorem 1.4 which uses only Axiom 3.

(b) Using Theorem 1.4 with $B_1 = A$ and $B_2 = A^c$, we have
$$P[S] = P[A \cup A^c] = P[A] + P[A^c]. \tag{7}$$
Since Axiom 2 says $P[S] = 1$, $P[A^c] = 1 - P[A]$. This proof uses Axioms 2 and 3.

(c) By Theorem 1.2, we can write both $A$ and $B$ as unions of disjoint events:
$$A = (AB) \cup (AB^c), \qquad B = (AB) \cup (A^cB). \tag{8}$$
Now we apply Theorem 1.4 to write
$$P[A] = P[AB] + P[AB^c], \qquad P[B] = P[AB] + P[A^cB]. \tag{9}$$
We can rewrite these facts as
$$P[AB^c] = P[A] - P[AB], \qquad P[A^cB] = P[B] - P[AB]. \tag{10}$$
Note that so far we have used only Axiom 3. Finally, we observe that $A \cup B$ can be written as the union of mutually exclusive events
$$A \cup B = (AB) \cup (AB^c) \cup (A^cB). \tag{11}$$
Once again, using Theorem 1.4, we have
$$P[A \cup B] = P[AB] + P[AB^c] + P[A^cB]. \tag{12}$$
Substituting the results of Equation (10) into Equation (12) yields
$$P[A \cup B] = P[AB] + P[A] - P[AB] + P[B] - P[AB], \tag{13}$$
which completes the proof. Note that this claim required only Axiom 3.

(d) Observe that since $A \subset B$, we can write $B$ as the disjoint union $B = A \cup (A^cB)$. By Theorem 1.4 (which uses Axiom 3),
$$P[B] = P[A] + P[A^cB]. \tag{14}$$
By Axiom 1, $P[A^cB] \ge 0$, which implies $P[A] \le P[B]$. This proof uses Axioms 1 and 3.

Problem 1.5.1 Solution
Each question requests a conditional probability.

(a) Note that the probability a call is brief is
$$P[B] = P[H_0B] + P[H_1B] + P[H_2B] = 0.6. \tag{1}$$
The probability a brief call will have no handoffs is
$$P[H_0|B] = \frac{P[H_0B]}{P[B]} = \frac{0.4}{0.6} = \frac{2}{3}. \tag{2}$$

(b) The probability of one handoff is $P[H_1] = P[H_1B] + P[H_1L] = 0.2$. The probability that a call with one handoff will be long is
$$P[L|H_1] = \frac{P[H_1L]}{P[H_1]} = \frac{0.1}{0.2} = \frac{1}{2}. \tag{3}$$

(c) The probability a call is long is $P[L] = 1 - P[B] = 0.4$. The probability that a long call will have one or more handoffs is
$$P[H_1 \cup H_2|L] = \frac{P[H_1L \cup H_2L]}{P[L]} = \frac{P[H_1L] + P[H_2L]}{P[L]} = \frac{0.1 + 0.2}{0.4} = \frac{3}{4}. \tag{4}$$

Problem 1.5.2 Solution
Let $s_i$ denote the outcome that the roll is $i$. So, for $1 \le i \le 6$, $R_i = \{s_i\}$. Similarly, $G_j = \{s_{j+1}, \ldots, s_6\}$.

(a) Since $G_1 = \{s_2, s_3, s_4, s_5, s_6\}$ and all outcomes have probability 1/6, $P[G_1] = 5/6$. The event $R_3G_1 = \{s_3\}$ and $P[R_3G_1] = 1/6$ so that
$$P[R_3|G_1] = \frac{P[R_3G_1]}{P[G_1]} = \frac{1}{5}. \tag{1}$$

(b) The conditional probability that 6 is rolled given that the roll is greater than 3 is
$$P[R_6|G_3] = \frac{P[R_6G_3]}{P[G_3]} = \frac{P[s_6]}{P[s_4, s_5, s_6]} = \frac{1/6}{3/6} = \frac{1}{3}. \tag{2}$$

(c) The event $E$ that the roll is even is $E = \{s_2, s_4, s_6\}$ and has probability 3/6. The joint probability of $G_3$ and $E$ is
$$P[G_3E] = P[s_4, s_6] = 1/3. \tag{3}$$
The conditional probability of $G_3$ given $E$ is
$$P[G_3|E] = \frac{P[G_3E]}{P[E]} = \frac{1/3}{1/2} = \frac{2}{3}. \tag{4}$$

(d) The conditional probability that the roll is even given that it’s greater than 3 is
$$P[E|G_3] = \frac{P[EG_3]}{P[G_3]} = \frac{1/3}{1/2} = \frac{2}{3}. \tag{5}$$

Problem 1.5.3 Solution
Since the 2 of clubs is an even numbered card, $C_2 \subset E$ so that $P[C_2E] = P[C_2] = 1/3$. Since $P[E] = 2/3$,
$$P[C_2|E] = \frac{P[C_2E]}{P[E]} = \frac{1/3}{2/3} = 1/2. \tag{1}$$
The probability that an even numbered card is picked given that the 2 is picked is
$$P[E|C_2] = \frac{P[C_2E]}{P[C_2]} = \frac{1/3}{1/3} = 1. \tag{2}$$

Problem 1.5.4 Solution
Define $D$ as the event that a pea plant has two dominant $y$ genes. To find the conditional probability of $D$ given the event $Y$, corresponding to a plant having yellow seeds, we look to evaluate
$$P[D|Y] = \frac{P[DY]}{P[Y]}. \tag{1}$$
Note that $P[DY]$ is just the probability of the genotype $yy$. From Problem 1.4.3, we found that with respect to the color of the peas, the genotypes $yy$, $yg$, $gy$, and $gg$ were all equally likely. This implies
$$P[DY] = P[yy] = 1/4, \qquad P[Y] = P[yy, gy, yg] = 3/4. \tag{2}$$
Thus, the conditional probability can be expressed as
$$P[D|Y] = \frac{P[DY]}{P[Y]} = \frac{1/4}{3/4} = 1/3. \tag{3}$$

Problem 1.5.5 Solution
The sample outcomes can be written $ijk$ where the first card drawn is $i$, the second is $j$ and the third is $k$. The sample space is
$$S = \{234, 243, 324, 342, 423, 432\}, \tag{1}$$
and each of the six outcomes has probability 1/6. The events $E_1, E_2, E_3, O_1, O_2, O_3$ are
$$E_1 = \{234, 243, 423, 432\}, \qquad O_1 = \{324, 342\}, \tag{2}$$
$$E_2 = \{243, 324, 342, 423\}, \qquad O_2 = \{234, 432\}, \tag{3}$$
$$E_3 = \{234, 324, 342, 432\}, \qquad O_3 = \{243, 423\}. \tag{4}$$

(a) The conditional probability the second card is even given that the first card is even is
$$P[E_2|E_1] = \frac{P[E_2E_1]}{P[E_1]} = \frac{P[243, 423]}{P[234, 243, 423, 432]} = \frac{2/6}{4/6} = 1/2. \tag{5}$$

(b) The conditional probability the first card is even given that the second card is even is
$$P[E_1|E_2] = \frac{P[E_1E_2]}{P[E_2]} = \frac{P[243, 423]}{P[243, 324, 342, 423]} = \frac{2/6}{4/6} = 1/2. \tag{6}$$

(c) The probability the first two cards are even given the third card is even is
$$P[E_1E_2|E_3] = \frac{P[E_1E_2E_3]}{P[E_3]} = 0. \tag{7}$$

(d) The conditional probability the second card is even given that the first card is odd is
$$P[E_2|O_1] = \frac{P[O_1E_2]}{P[O_1]} = \frac{P[O_1]}{P[O_1]} = 1. \tag{8}$$

(e) The conditional probability the second card is odd given that the first card is odd is
$$P[O_2|O_1] = \frac{P[O_1O_2]}{P[O_1]} = 0. \tag{9}$$

Problem 1.5.6 Solution
The problem statement yields the obvious facts that $P[L] = 0.16$ and $P[H] = 0.10$. The words “10% of the ticks that had either Lyme disease or HGE carried both diseases” can be written as
$$P[LH|L \cup H] = 0.10. \tag{1}$$

(a) Since $LH \subset L \cup H$,
$$P[LH|L \cup H] = \frac{P[LH \cap (L \cup H)]}{P[L \cup H]} = \frac{P[LH]}{P[L \cup H]} = 0.10. \tag{2}$$
Thus,
$$P[LH] = 0.10\,P[L \cup H] = 0.10\,(P[L] + P[H] - P[LH]). \tag{3}$$
Since $P[L] = 0.16$ and $P[H] = 0.10$, solving for $P[LH]$ gives
$$P[LH] = \frac{0.10\,(0.16 + 0.10)}{1.1} = 0.0236. \tag{4}$$

(b) The conditional probability that a tick has HGE given that it has Lyme disease is
$$P[H|L] = \frac{P[LH]}{P[L]} = \frac{0.0236}{0.16} = 0.1475. \tag{5}$$

Problem 1.6.1 Solution
This problem asks whether $A$ and $B$ can be independent events yet satisfy $A = B$. By definition, events $A$ and $B$ are independent if and only if $P[AB] = P[A]P[B]$. We can see that if $A = B$, that is they are the same set, then
$$P[AB] = P[AA] = P[A] = P[B]. \tag{1}$$
Thus, for $A$ and $B$ to be the same set and also independent,
$$P[A] = P[AB] = P[A]P[B] = (P[A])^2. \tag{2}$$
There are two ways that this requirement can be satisfied:
• $P[A] = 1$ implying $A = B = S$.
• $P[A] = 0$ implying $A = B = \phi$.

Problem 1.6.2 Solution
[Venn diagram of the events A and B]
In the Venn diagram, assume the sample space has area 1 corresponding to probability 1. As drawn, both $A$ and $B$ have area 1/4 so that $P[A] = P[B] = 1/4$. Moreover, the intersection $AB$ has area 1/16 and covers 1/4 of $A$ and 1/4 of $B$. That is, $A$ and $B$ are independent since
$$P[AB] = P[A]P[B]. \tag{1}$$

Problem 1.6.3 Solution
(a) Since $A$ and $B$ are disjoint, $P[A \cap B] = 0$.
Since $P[A \cap B] = 0$,
$$P[A \cup B] = P[A] + P[B] - P[A \cap B] = 3/8. \tag{1}$$
A Venn diagram should convince you that $A \subset B^c$ so that $A \cap B^c = A$. This implies
$$P[A \cap B^c] = P[A] = 1/4. \tag{2}$$
It also follows that $P[A \cup B^c] = P[B^c] = 1 - 1/8 = 7/8$.

(b) Events $A$ and $B$ are dependent since $P[AB] \neq P[A]P[B]$.

(c) Since $C$ and $D$ are independent,
$$P[C \cap D] = P[C]P[D] = 15/64. \tag{3}$$
The next few items are a little trickier. From Venn diagrams, we see
$$P[C \cap D^c] = P[C] - P[C \cap D] = 5/8 - 15/64 = 25/64. \tag{4}$$
It follows that
$$P[C \cup D^c] = P[C] + P[D^c] - P[C \cap D^c] \tag{5}$$
$${} = 5/8 + (1 - 3/8) - 25/64 = 55/64. \tag{6}$$
Using DeMorgan’s law, we have
$$P[C^c \cap D^c] = P[(C \cup D)^c] = 1 - P[C \cup D] = 15/64. \tag{7}$$

(d) Since $P[C^cD^c] = P[C^c]P[D^c]$, $C^c$ and $D^c$ are independent.

Problem 1.6.4 Solution
(a) Since $A \cap B = \phi$, $P[A \cap B] = 0$. To find $P[B]$, we can write
$$P[A \cup B] = P[A] + P[B] - P[A \cap B] \tag{1}$$
$$5/8 = 3/8 + P[B] - 0. \tag{2}$$
Thus, $P[B] = 1/4$. Since $A$ is a subset of $B^c$, $P[A \cap B^c] = P[A] = 3/8$. Furthermore, since $A$ is a subset of $B^c$, $P[A \cup B^c] = P[B^c] = 3/4$.

(b) The events $A$ and $B$ are dependent because
$$P[AB] = 0 \neq 3/32 = P[A]P[B]. \tag{3}$$

(c) Since $C$ and $D$ are independent, $P[CD] = P[C]P[D]$. So
$$P[D] = \frac{P[CD]}{P[C]} = \frac{1/3}{1/2} = 2/3. \tag{4}$$
In addition, $P[C \cap D^c] = P[C] - P[C \cap D] = 1/2 - 1/3 = 1/6$. To find $P[C^c \cap D^c]$, we first observe that
$$P[C \cup D] = P[C] + P[D] - P[C \cap D] = 1/2 + 2/3 - 1/3 = 5/6. \tag{5}$$
By De Morgan’s Law, $C^c \cap D^c = (C \cup D)^c$. This implies
$$P[C^c \cap D^c] = P[(C \cup D)^c] = 1 - P[C \cup D] = 1/6. \tag{6}$$
Note that a second way to find $P[C^c \cap D^c]$ is to use the fact that if $C$ and $D$ are independent, then $C^c$ and $D^c$ are independent. Thus
$$P[C^c \cap D^c] = P[C^c]P[D^c] = (1 - P[C])(1 - P[D]) = 1/6. \tag{7}$$
Finally, since $C$ and $D$ are independent events, $P[C|D] = P[C] = 1/2$.

(d) Note that we found $P[C \cup D] = 5/6$. We can also use the earlier results to show
$$P[C \cup D^c] = P[C] + P[D^c] - P[C \cap D^c] = 1/2 + (1 - 2/3) - 1/6 = 2/3. \tag{8}$$

(e) By Definition 1.7, events $C$ and $D^c$ are independent because
$$P[C \cap D^c] = 1/6 = (1/2)(1/3) = P[C]P[D^c]. \tag{9}$$

Problem 1.6.5 Solution
For a sample space $S = \{1, 2, 3, 4\}$ with equiprobable outcomes, consider the events
$$A_1 = \{1, 2\}, \qquad A_2 = \{2, 3\}, \qquad A_3 = \{3, 1\}. \tag{1}$$
Each event $A_i$ has probability 1/2. Moreover, each pair of events is independent since
$$P[A_1A_2] = P[A_2A_3] = P[A_3A_1] = 1/4. \tag{2}$$
However, the three events $A_1, A_2, A_3$ are not independent since
$$P[A_1A_2A_3] = 0 \neq P[A_1]P[A_2]P[A_3]. \tag{3}$$

Problem 1.6.6 Solution
There are 16 distinct equally likely outcomes for the second generation of pea plants based on a first generation of $\{rwyg, rwgy, wryg, wrgy\}$. They are listed below
$$\begin{array}{cccc} rryy & rryg & rrgy & rrgg \\ rwyy & rwyg & rwgy & rwgg \\ wryy & wryg & wrgy & wrgg \\ wwyy & wwyg & wwgy & wwgg \end{array} \tag{1}$$
A plant has yellow seeds, that is event $Y$ occurs, if a plant has at least one dominant $y$ gene. Except for the four outcomes with a pair of recessive $g$ genes, the remaining 12 outcomes have yellow seeds. From the above, we see that
$$P[Y] = 12/16 = 3/4 \tag{2}$$
and
$$P[R] = 12/16 = 3/4. \tag{3}$$
To find the conditional probabilities $P[R|Y]$ and $P[Y|R]$, we first must find $P[RY]$. Note that $RY$, the event that a plant has rounded yellow seeds, is the set of outcomes
$$RY = \{rryy, rryg, rrgy, rwyy, rwyg, rwgy, wryy, wryg, wrgy\}. \tag{4}$$
Since $P[RY] = 9/16$,
$$P[Y|R] = \frac{P[RY]}{P[R]} = \frac{9/16}{3/4} = 3/4 \tag{5}$$
and
$$P[R|Y] = \frac{P[RY]}{P[Y]} = \frac{9/16}{3/4} = 3/4. \tag{6}$$
Thus $P[R|Y] = P[R]$ and $P[Y|R] = P[Y]$ and $R$ and $Y$ are independent events. There are four visibly different pea plants, corresponding to whether the peas are round ($R$) or not ($R^c$), or yellow ($Y$) or not ($Y^c$). These four visible events have probabilities
$$P[RY] = 9/16, \qquad P[RY^c] = 3/16, \tag{7}$$
$$P[R^cY] = 3/16, \qquad P[R^cY^c] = 1/16. \tag{8}$$

Problem 1.6.7 Solution
(a) For any events $A$ and $B$, we can write the law of total probability in the form of
$$P[A] = P[AB] + P[AB^c]. \tag{1}$$
Since $A$ and $B$ are independent, $P[AB] = P[A]P[B]$.
This implies
$$P[AB^c] = P[A] - P[A]P[B] = P[A](1 - P[B]) = P[A]P[B^c]. \tag{2}$$
Thus $A$ and $B^c$ are independent.

(b) Proving that $A^c$ and $B$ are independent is not really necessary. Since $A$ and $B$ are arbitrary labels, it is really the same claim as in part (a). That is, simply reversing the labels of $A$ and $B$ proves the claim. Alternatively, one can construct exactly the same proof as in part (a) with the labels $A$ and $B$ reversed.

(c) To prove that $A^c$ and $B^c$ are independent, we apply the result of part (a) to the sets $A$ and $B^c$. Since we know from part (a) that $A$ and $B^c$ are independent, part (b) says that $A^c$ and $B^c$ are independent.

Problem 1.6.8 Solution
[Venn diagram of the events A, B, and C with regions AB, AC, BC, and ABC]
In the Venn diagram, assume the sample space has area 1 corresponding to probability 1. As drawn, $A$, $B$, and $C$ each have area 1/2 and thus probability 1/2. Moreover, the three way intersection $ABC$ has probability 1/8. Thus $A$, $B$, and $C$ are mutually independent since
$$P[ABC] = P[A]P[B]P[C]. \tag{1}$$

Problem 1.6.9 Solution
[Venn diagram of the events A, B, and C with regions AB, AC, and BC]
In the Venn diagram, assume the sample space has area 1 corresponding to probability 1. As drawn, $A$, $B$, and $C$ each have area 1/3 and thus probability 1/3. The three way intersection $ABC$ has zero probability, implying $A$, $B$, and $C$ are not mutually independent since
$$P[ABC] = 0 \neq P[A]P[B]P[C]. \tag{1}$$
However, $AB$, $BC$, and $AC$ each has area 1/9. As a result, each pair of events is independent since
$$P[AB] = P[A]P[B], \qquad P[BC] = P[B]P[C], \qquad P[AC] = P[A]P[C]. \tag{2}$$

Problem 1.7.1 Solution
A sequential sample space for this experiment is

[Tree diagram: the first flip is $H_1$ (1/4) or $T_1$ (3/4); on either branch the second flip is $H_2$ (1/4) or $T_2$ (3/4). Leaves: $H_1H_2$ 1/16, $H_1T_2$ 3/16, $T_1H_2$ 3/16, $T_1T_2$ 9/16.]

(a) From the tree, we observe
$$P[H_2] = P[H_1H_2] + P[T_1H_2] = 1/4. \tag{1}$$
This implies
$$P[H_1|H_2] = \frac{P[H_1H_2]}{P[H_2]} = \frac{1/16}{1/4} = 1/4. \tag{2}$$

(b) The probability that the first flip is heads and the second flip is tails is $P[H_1T_2] = 3/16$.
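The tree calculation in Problem 1.7.1 can be reproduced by enumerating the leaves of the tree with exact fractions. The following Python sketch is an illustration rather than code from the manual. It assumes, as the tree indicates, that each flip is heads with probability 1/4 regardless of the other flip; the names `leaves` and `prob` are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Branch probabilities read off the tree: heads 1/4, tails 3/4 on each flip.
p_heads = Fraction(1, 4)
flip = {"H": p_heads, "T": 1 - p_heads}

# Each leaf is a two-letter outcome such as "HT" (H1 then T2); its
# probability is the product of the branch probabilities along its path.
leaves = {a + b: flip[a] * flip[b] for a, b in product("HT", repeat=2)}


def prob(event):
    """Sum the leaf probabilities over the outcomes satisfying `event`."""
    return sum(p for outcome, p in leaves.items() if event(outcome))


p_h2 = prob(lambda o: o[1] == "H")   # P[H2] = P[H1H2] + P[T1H2]
p_h1h2 = prob(lambda o: o == "HH")   # P[H1H2]
p_h1_given_h2 = p_h1h2 / p_h2        # P[H1|H2]

print(p_h2, p_h1_given_h2, leaves["HT"])  # prints: 1/4 1/4 3/16
```

Because the two flips in this tree use the same branch probabilities on both branches, $P[H_1|H_2]$ equals $P[H_1]$, which the enumeration confirms.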

Problem 1.7.2 Solution
The tree with adjusted probabilities is

[Tree diagram: the first light is $G_1$ (1/2) or $R_1$ (1/2). After $G_1$, the second light is $G_2$ (3/4) or $R_2$ (1/4); after $R_1$, it is $G_2$ (1/4) or $R_2$ (3/4). Leaves: $G_1G_2$ 3/8, $G_1R_2$ 1/8, $R_1G_2$ 1/8, $R_1R_2$ 3/8.]

From the tree, the probability the second light is green is
$$P[G_2] = P[G_1G_2] + P[R_1G_2] = 3/8 + 1/8 = 1/2. \tag{1}$$
The conditional probability that the first light was green given the second light was green is
$$P[G_1|G_2] = \frac{P[G_1G_2]}{P[G_2]} = \frac{P[G_2|G_1]P[G_1]}{P[G_2]} = 3/4. \tag{2}$$
Finally, from the tree diagram, we can directly read that $P[G_2|G_1] = 3/4$.

Problem 1.7.3 Solution
Let $G_i$ and $B_i$ denote events indicating whether free throw $i$ was good ($G_i$) or bad ($B_i$). The tree for the free throw experiment is

[Tree diagram: the first free throw is $G_1$ (1/2) or $B_1$ (1/2). After $G_1$, the second is $G_2$ (3/4) or $B_2$ (1/4); after $B_1$, it is $G_2$ (1/4) or $B_2$ (3/4). Leaves: $G_1G_2$ 3/8, $G_1B_2$ 1/8, $B_1G_2$ 1/8, $B_1B_2$ 3/8.]

The game goes into overtime if exactly one free throw is made. This event has probability
$$P[O] = P[G_1B_2] + P[B_1G_2] = 1/8 + 1/8 = 1/4. \tag{1}$$

Problem 1.7.4 Solution
The tree for this experiment is

[Tree diagram: coin $A$ (1/2) or coin $B$ (1/2) is selected. Coin $A$ gives $H$ (1/4) or $T$ (3/4); coin $B$ gives $H$ (3/4) or $T$ (1/4). Leaves: $AH$ 1/8, $AT$ 3/8, $BH$ 3/8, $BT$ 1/8.]

The probability that you guess correctly is
$$P[C] = P[AT] + P[BH] = 3/8 + 3/8 = 3/4. \tag{1}$$

Problem 1.7.5 Solution
$P[-|H]$ is the probability that a person who has HIV tests negative for the disease. This is referred to as a false-negative result. The case where a person who does not have HIV tests positive for the disease is called a false-positive result and has probability $P[+|H^c]$. Since the test is correct 99% of the time,
$$P[-|H] = P[+|H^c] = 0.01. \tag{1}$$
Now the probability that a person who has tested positive for HIV actually has the disease is
$$P[H|+] = \frac{P[+, H]}{P[+]} = \frac{P[+, H]}{P[+, H] + P[+, H^c]}. \tag{2}$$
We can use Bayes’ formula to evaluate these joint probabilities.
$$P[H|+] = \frac{P[+|H]P[H]}{P[+|H]P[H] + P[+|H^c]P[H^c]} \tag{3}$$
$${} = \frac{(0.99)(0.0002)}{(0.99)(0.0002) + (0.01)(0.9998)} \tag{4}$$
$${} = 0.0194. \tag{5}$$
Thus, even though the test is correct 99% of the time, the probability that a random person who tests positive actually has HIV is less than 0.02. The reason this probability is so low is that the a priori probability that a person has HIV is very small.

Problem 1.7.6 Solution
Let $A_i$ and $D_i$ indicate whether the $i$th photodetector is acceptable or defective.

[Tree diagram: the first photodetector is $A_1$ (3/5) or $D_1$ (2/5). After $A_1$, the second is $A_2$ (4/5) or $D_2$ (1/5); after $D_1$, it is $A_2$ (2/5) or $D_2$ (3/5). Leaves: $A_1A_2$ 12/25, $A_1D_2$ 3/25, $D_1A_2$ 4/25, $D_1D_2$ 6/25.]

(a) We wish to find the probability $P[E_1]$ that exactly one photodetector is acceptable. From the tree, we have
$$P[E_1] = P[A_1D_2] + P[D_1A_2] = 3/25 + 4/25 = 7/25. \tag{1}$$

(b) The probability that both photodetectors are defective is $P[D_1D_2] = 6/25$.

Problem 1.7.7 Solution
The tree for this experiment is

[Tree diagram: coin $A_1$ (1/2) or coin $B_1$ (1/2) is flipped first. After $A_1$, the first flip is $H_1$ (1/4) or $T_1$ (3/4) and the second flip is $H_2$ (3/4) or $T_2$ (1/4). After $B_1$, the first flip is $H_1$ (3/4) or $T_1$ (1/4) and the second flip is $H_2$ (1/4) or $T_2$ (3/4). Leaves: $A_1H_1H_2$ 3/32, $A_1H_1T_2$ 1/32, $A_1T_1H_2$ 9/32, $A_1T_1T_2$ 3/32, $B_1H_1H_2$ 3/32, $B_1H_1T_2$ 9/32, $B_1T_1H_2$ 1/32, $B_1T_1T_2$ 3/32.]

The event $H_1H_2$ that heads occurs on both flips has probability
$$P[H_1H_2] = P[A_1H_1H_2] + P[B_1H_1H_2] = 6/32. \tag{1}$$
The probability of $H_1$ is
$$P[H_1] = P[A_1H_1H_2] + P[A_1H_1T_2] + P[B_1H_1H_2] + P[B_1H_1T_2] = 1/2. \tag{2}$$
Similarly,
$$P[H_2] = P[A_1H_1H_2] + P[A_1T_1H_2] + P[B_1H_1H_2] + P[B_1T_1H_2] = 1/2. \tag{3}$$
Thus $P[H_1H_2] \neq P[H_1]P[H_2]$, implying $H_1$ and $H_2$ are not independent. This result should not be surprising since if the first flip is heads, it is likely that coin $B$ was picked first. In this case, the second flip is less likely to be heads since it becomes more likely that the second coin flipped was coin $A$.
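The dependence found in Problem 1.7.7 can be checked by building the eight leaves of the tree and summing the relevant ones. This Python sketch is illustrative only and is not from the manual. It assumes the model shown in the tree: coin A lands heads with probability 1/4, coin B with probability 3/4, either coin is chosen first with probability 1/2, and the other coin is used for the second flip. The names `heads`, `other`, `leaves`, and `side_prob` are hypothetical.

```python
from fractions import Fraction

heads = {"A": Fraction(1, 4), "B": Fraction(3, 4)}  # P[heads] for each coin
other = {"A": "B", "B": "A"}                        # coin used on flip 2


def side_prob(coin, side):
    """Probability that `coin` shows `side` ('H' or 'T')."""
    return heads[coin] if side == "H" else 1 - heads[coin]


# Leaves are keyed by (first coin, flip 1 result, flip 2 result).
leaves = {}
for first in "AB":
    for f1 in "HT":
        for f2 in "HT":
            leaves[(first, f1, f2)] = (
                Fraction(1, 2) * side_prob(first, f1) * side_prob(other[first], f2)
            )

p_h1 = sum(p for (_, f1, _), p in leaves.items() if f1 == "H")
p_h2 = sum(p for (_, _, f2), p in leaves.items() if f2 == "H")
p_h1h2 = sum(p for (_, f1, f2), p in leaves.items() if f1 == "H" and f2 == "H")

print(p_h1, p_h2, p_h1h2)      # prints: 1/2 1/2 3/16
print(p_h1h2 == p_h1 * p_h2)   # prints: False
```

Here 3/16 is the reduced form of the 6/32 in Equation (1), and it differs from $P[H_1]P[H_2] = 1/4$, so the flips are dependent under the stated model.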
Problem 1.7.8 Solution
(a) The primary difficulty in this problem is translating the words into the correct tree diagram. In the tree for this problem, every branch has probability 1/2. If the first flip is $H_1$, the experiment ends. After $T_1$, the second flip is $H_2$ or $T_2$. After $T_1H_2$, the third flip is either $H_3$, which ends the experiment, or $T_3$, which is followed by a fourth flip $H_4$ or $T_4$. After $T_1T_2$, the third flip is either $T_3$, which ends the experiment, or $H_3$, which is followed by a fourth flip $H_4$ or $T_4$. The outcomes and their probabilities are

    H_1: 1/2
    T_1H_2H_3: 1/8       T_1H_2T_3H_4: 1/16    T_1H_2T_3T_4: 1/16
    T_1T_2H_3H_4: 1/16   T_1T_2H_3T_4: 1/16    T_1T_2T_3: 1/8

(b) From the tree,
$$P[H_3] = P[T_1H_2H_3] + P[T_1T_2H_3H_4] + P[T_1T_2H_3T_4] \tag{1}$$
$$= 1/8 + 1/16 + 1/16 = 1/4. \tag{2}$$
Similarly,
$$P[T_3] = P[T_1H_2T_3H_4] + P[T_1H_2T_3T_4] + P[T_1T_2T_3] \tag{3}$$
$$= 1/16 + 1/16 + 1/8 = 1/4. \tag{4}$$
(c) The event that Dagwood must diet is
$$D = (T_1H_2T_3T_4) \cup (T_1T_2H_3T_4) \cup (T_1T_2T_3). \tag{5}$$
The probability that Dagwood must diet is
$$P[D] = P[T_1H_2T_3T_4] + P[T_1T_2H_3T_4] + P[T_1T_2T_3] \tag{6}$$
$$= 1/16 + 1/16 + 1/8 = 1/4. \tag{7}$$
The conditional probability of heads on flip 1 given that Dagwood must diet is
$$P[H_1|D] = \frac{P[H_1D]}{P[D]} = 0. \tag{8}$$
Remember, if there was heads on flip 1, then Dagwood always postpones his diet.
(d) From part (b), we found that $P[H_3] = 1/4$. To check independence, we calculate
$$P[H_2] = P[T_1H_2H_3] + P[T_1H_2T_3H_4] + P[T_1H_2T_3T_4] = 1/4 \tag{9}$$
$$P[H_2H_3] = P[T_1H_2H_3] = 1/8. \tag{10}$$
Now we find that
$$P[H_2H_3] = 1/8 \neq P[H_2]P[H_3] = 1/16. \tag{11}$$
Hence, $H_2$ and $H_3$ are dependent events. In fact, $P[H_3|H_2] = 1/2$ while $P[H_3] = 1/4$. The reason for the dependence is that given $H_2$ occurred, we know there will be a third flip, which may result in $H_3$. That is, knowledge of $H_2$ tells us that the experiment didn't end after the first flip.

Problem 1.7.9 Solution
(a) We wish to find the probability that we find no good photodiodes in $n$ pairs of diodes. Testing each pair of diodes is an independent trial such that with probability $p$, both diodes of a pair are bad.
From Problem 1.7.6, we can easily calculate $p$.
$$p = P[\text{both diodes are defective}] = P[D_1D_2] = 6/25. \tag{1}$$
The probability of $Z_n$, the event of zero acceptable diodes out of $n$ pairs of diodes, is $p^n$ because on each test of a pair of diodes, both must be defective.
$$P[Z_n] = \prod_{i=1}^{n} p = p^n = \left(\frac{6}{25}\right)^n \tag{2}$$
(b) Another way to phrase this question is to ask how many pairs we must test until $P[Z_n] \le 0.01$. Since $P[Z_n] = (6/25)^n$, we require
$$\left(\frac{6}{25}\right)^n \le 0.01 \quad\Rightarrow\quad n \ge \frac{\ln 0.01}{\ln (6/25)} = 3.23. \tag{3}$$
Since $n$ must be an integer, $n = 4$ pairs must be tested.

Problem 1.7.10 Solution
The experiment ends as soon as a fish is caught. The tree is a chain: on cast $i$, a fish is caught ($C_i$) with probability $p$ and the experiment ends, or no fish is caught ($C_i^c$) with probability $1-p$ and the tree continues to cast $i+1$. From the tree, $P[C_1] = p$ and $P[C_2] = (1-p)p$. Finally, a fish is caught on the $n$th cast if no fish were caught on the previous $n-1$ casts. Thus,
$$P[C_n] = (1-p)^{n-1}p. \tag{1}$$

Problem 1.8.1 Solution
There are $2^5 = 32$ different binary codes with 5 bits. The number of codes with exactly 3 zeros equals the number of ways of choosing the bits in which those zeros occur. Therefore there are $\binom{5}{3} = 10$ codes with exactly 3 zeros.

Problem 1.8.2 Solution
Since each letter can take on any one of the 4 possible letters in the alphabet, the number of 3 letter words that can be formed is $4^3 = 64$. If we allow each letter to appear only once, then we have 4 choices for the first letter, 3 choices for the second, and 2 choices for the third letter. Therefore, there are a total of $4 \cdot 3 \cdot 2 = 24$ possible codes.

Problem 1.8.3 Solution
(a) The experiment of picking two cards and recording them in the order in which they were selected can be modeled by two sub-experiments. The first is to pick the first card and record it; the second sub-experiment is to pick the second card without replacing the first and record it. For the first sub-experiment we can have any one of the possible 52 cards for a total of 52 possibilities.
The second sub-experiment consists of all the cards minus the one that was picked first (because we are sampling without replacement) for a total of 51 possible outcomes. So the total number of outcomes is the product of the number of outcomes for each sub-experiment.
$$52 \cdot 51 = 2652 \text{ outcomes}. \tag{1}$$
(b) To have the same card but different suit we can make the following sub-experiments. First we need to pick one of the 52 cards. Then we need to pick one of the 3 remaining cards that are of the same type but different suit out of the remaining 51 cards. So the total number of outcomes is
$$52 \cdot 3 = 156 \text{ outcomes}. \tag{2}$$
(c) The probability that the two cards are of the same type but different suit is the number of outcomes that are of the same type but different suit divided by the total number of outcomes involved in picking two cards at random from a deck of 52 cards.
$$P[\text{same type, different suit}] = \frac{156}{2652} = \frac{1}{17}. \tag{3}$$
(d) Now we are not concerned with the ordering of the cards. Before, the outcomes (K♥, 8♦) and (8♦, K♥) were distinct. Now, those two outcomes are not distinct and are considered to be the single outcome that a King of hearts and 8 of diamonds were selected. So every pair of outcomes before collapses to a single outcome when we disregard ordering. We can redo parts (a) and (b) above by halving the corresponding values found in parts (a) and (b). The probability, however, does not change because both the numerator and the denominator have been reduced by an equal factor of 2, which does not change their ratio.

Problem 1.8.4 Solution
We can break down the experiment of choosing a starting lineup into a sequence of subexperiments:
1. Choose 1 of the 10 pitchers. There are $N_1 = \binom{10}{1} = 10$ ways to do this.
2. Choose 1 of the 15 field players to be the designated hitter (DH). There are $N_2 = \binom{15}{1} = 15$ ways to do this.
3. Of the remaining 14 field players, choose 8 for the remaining field positions. There are $N_3 = \binom{14}{8}$ ways to do this.
4.
For the 9 batters (consisting of the 8 field players and the designated hitter), choose a batting lineup. There are $N_4 = 9!$ ways to do this.
So the total number of different starting lineups when the DH is selected among the field players is
$$N = N_1N_2N_3N_4 = (10)(15)\binom{14}{8}9! = 163{,}459{,}296{,}000. \tag{1}$$
Note that this overestimates the number of combinations the manager must really consider because most field players can play only one or two positions. Although these constraints on the manager reduce the number of possible lineups, they typically make the manager's job more difficult. As for the counting, we note that our count did not need to specify the positions played by the field players. Although this is an important consideration for the manager, it is not part of our counting of different lineups. In fact, the 8 nonpitching field players are allowed to switch positions at any time in the field. For example, the shortstop and second baseman could trade positions in the middle of an inning. Although the DH can go play the field, there are some complicated rules about this. Here is an excerpt from Major League Baseball Rule 6.10:

    The Designated Hitter may be used defensively, continuing to bat in the same position in the batting order, but the pitcher must then bat in the place of the substituted defensive player, unless more than one substitution is made, and the manager then must designate their spots in the batting order.

If you're curious, you can find the complete rule on the web.

Problem 1.8.5 Solution
When the DH can be chosen among all the players, including the pitchers, there are two cases:
• The DH is a field player. In this case, the number of possible lineups, $N_F$, is given in Problem 1.8.4. In this case, the designated hitter must be chosen from the 15 field players. We repeat the solution of Problem 1.8.4 here: We can break down the experiment of choosing a starting lineup into a sequence of subexperiments:
1.
Choose 1 of the 10 pitchers. There are $N_1 = \binom{10}{1} = 10$ ways to do this.
2. Choose 1 of the 15 field players to be the designated hitter (DH). There are $N_2 = \binom{15}{1} = 15$ ways to do this.
3. Of the remaining 14 field players, choose 8 for the remaining field positions. There are $N_3 = \binom{14}{8}$ ways to do this.
4. For the 9 batters (consisting of the 8 field players and the designated hitter), choose a batting lineup. There are $N_4 = 9!$ ways to do this.
So the total number of different starting lineups when the DH is selected among the field players is
$$N = N_1N_2N_3N_4 = (10)(15)\binom{14}{8}9! = 163{,}459{,}296{,}000. \tag{1}$$
• The DH is a pitcher. In this case, there are 10 choices for the pitcher, 10 choices for the DH among the pitchers (including the pitcher batting for himself), $\binom{15}{8}$ choices for the field players, and $9!$ ways of ordering the batters into a lineup. The number of possible lineups is
$$N' = (10)(10)\binom{15}{8}9! = 233{,}513{,}280{,}000. \tag{2}$$
The total number of ways of choosing a lineup is $N + N' = 396{,}972{,}576{,}000$.

Problem 1.8.6 Solution
(a) We can find the number of valid starting lineups by noticing that the swingman presents three situations: (1) the swingman plays guard, (2) the swingman plays forward, and (3) the swingman doesn't play. Let $N_i$ denote the number of lineups corresponding to case $i$. Then we can write the total number of lineups as $N_1 + N_2 + N_3$. In the first situation, with the swingman filling one guard position, we have to choose 1 out of 3 centers, 2 out of 4 forwards, and 1 out of 4 guards so that
$$N_1 = \binom{3}{1}\binom{4}{2}\binom{4}{1} = 72. \tag{1}$$
In the second case, with the swingman filling one forward position, we need to choose 1 out of 3 centers, 1 out of 4 forwards and 2 out of 4 guards, yielding
$$N_2 = \binom{3}{1}\binom{4}{1}\binom{4}{2} = 72. \tag{2}$$
Finally, with the swingman on the bench, we choose 1 out of 3 centers, 2 out of 4 forwards, and 2 out of 4 guards.
This implies
$$N_3 = \binom{3}{1}\binom{4}{2}\binom{4}{2} = 108, \tag{3}$$
and the total number of lineups is $N_1 + N_2 + N_3 = 252$.

Problem 1.8.7 Solution
What our design must specify is the number of boxes on the ticket, and the number of specially marked boxes. Suppose each ticket has $n$ boxes and $5+k$ specially marked boxes. Note that when $k > 0$, a winning ticket will still have $k$ unscratched boxes with the special mark. A ticket is a winner if each time a box is scratched off, the box has the special mark. Assuming the boxes are scratched off randomly, the first box scratched off has the mark with probability $(5+k)/n$ since there are $5+k$ marked boxes out of $n$ boxes. Moreover, if the first scratched box has the mark, then there are $4+k$ marked boxes out of $n-1$ remaining boxes. Continuing this argument, the probability that a ticket is a winner is
$$p = \frac{5+k}{n}\cdot\frac{4+k}{n-1}\cdot\frac{3+k}{n-2}\cdot\frac{2+k}{n-3}\cdot\frac{1+k}{n-4} = \frac{(k+5)!\,(n-5)!}{k!\,n!}. \tag{1}$$
By careful choice of $n$ and $k$, we can choose $p$ close to 0.01. For example,

    n    9        11       14       17
    k    0        1        2        3
    p    0.0079   0.0130   0.0105   0.0090        (2)

A gamecard with $n = 14$ boxes and $5+k = 7$ shaded boxes would be quite reasonable.

Problem 1.9.1 Solution
(a) Since the probability of a zero is 0.8, we can express the probability of the code word 00111 as 2 occurrences of a 0 and three occurrences of a 1. Therefore
$$P[00111] = (0.8)^2(0.2)^3 = 0.00512. \tag{1}$$
(b) The probability that a code word has exactly three 1's is
$$P[\text{three 1's}] = \binom{5}{3}(0.8)^2(0.2)^3 = 0.0512. \tag{2}$$

Problem 1.9.2 Solution
Given that the probability that the Celtics win a single championship in any given year is 0.32, we can find the probability that they win 8 straight NBA championships.
$$P[\text{8 straight championships}] = (0.32)^8 = 0.00011. \tag{1}$$
The probability that they win 10 titles in 11 years is
$$P[\text{10 titles in 11 years}] = \binom{11}{10}(0.32)^{10}(0.68) = 0.000084. \tag{2}$$
The probability of each of these events is less than 1 in 1000!
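The two binomial probabilities of Problem 1.9.2 are easy to evaluate directly. The lines below are a minimal Matlab sketch under the problem's assumption of independent seasons with per-year championship probability 0.32; the variable names p, pEight and pTenOfEleven are illustrative.

```matlab
% Problem 1.9.2: championship probabilities under independent seasons.
p = 0.32;                                   % P[title in any one year]

% Eight titles in eight consecutive years.
pEight = p^8;

% Exactly ten titles in eleven years: binomial(11, p) evaluated at 10.
pTenOfEleven = nchoosek(11,10) * p^10 * (1-p);

fprintf('P[8 straight]  = %.3g\n', pEight);
fprintf('P[10 of 11]    = %.3g\n', pTenOfEleven);   % about 8.4e-5
```

Both values fall well below 1 in 1000, in line with the conclusion drawn in the solution.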
Given that these events took place in the relatively short fifty year history of the NBA, it would seem that these probabilities should be much higher. What the model overlooks is that the sequence of 10 titles in 11 years started when Bill Russell joined the Celtics. In the years with Russell (and a strong supporting cast) the probability of a championship was much higher.

Problem 1.9.3 Solution
We know that a green light and a red light each have probability 7/16, and that a yellow light has probability 1/8. Since there are always 5 lights, $G$, $Y$, and $R$ obey the multinomial probability law:
$$P[G = 2, Y = 1, R = 2] = \frac{5!}{2!\,1!\,2!}\left(\frac{7}{16}\right)^2\left(\frac{1}{8}\right)\left(\frac{7}{16}\right)^2. \tag{1}$$
The probability that the number of green lights equals the number of red lights is
$$P[G = R] = P[G = 1, R = 1, Y = 3] + P[G = 2, R = 2, Y = 1] + P[G = 0, R = 0, Y = 5] \tag{2}$$
$$= \frac{5!}{1!\,1!\,3!}\left(\frac{7}{16}\right)\left(\frac{7}{16}\right)\left(\frac{1}{8}\right)^3 + \frac{5!}{2!\,1!\,2!}\left(\frac{7}{16}\right)^2\left(\frac{7}{16}\right)^2\left(\frac{1}{8}\right) + \frac{5!}{0!\,0!\,5!}\left(\frac{1}{8}\right)^5 \tag{3}$$
$$\approx 0.1449. \tag{4}$$

Problem 1.9.4 Solution
For the team with the homecourt advantage, let $W_i$ and $L_i$ denote whether game $i$ was a win or a loss. Because games 1 and 3 are home games and game 2 is an away game, the tree has first-stage branches $W_1$ (probability $p$) and $L_1$ (probability $1-p$), second-stage branches $W_2$ (probability $1-p$) and $L_2$ (probability $p$), and, when a third game is needed, branches $W_3$ (probability $p$) and $L_3$ (probability $1-p$). The outcomes and their probabilities are

    W_1W_2: p(1-p)         L_1L_2: p(1-p)
    W_1L_2W_3: p^3         W_1L_2L_3: p^2(1-p)
    L_1W_2W_3: p(1-p)^2    L_1W_2L_3: (1-p)^3

The probability that the team with the home court advantage wins is
$$P[H] = P[W_1W_2] + P[W_1L_2W_3] + P[L_1W_2W_3] \tag{1}$$
$$= p(1-p) + p^3 + p(1-p)^2. \tag{2}$$
Note that $P[H] \le p$ for $1/2 \le p \le 1$, since $P[H] - p = p(2p-1)(p-1)$. Since the team with the home court advantage would win a 1 game playoff with probability $p$, the home court team is less likely to win a three game series than a 1 game playoff!

Problem 1.9.5 Solution
(a) There are 3 group 1 kickers and 6 group 2 kickers. Using $G_i$ to denote that a group $i$ kicker was chosen, we have
$$P[G_1] = 1/3, \qquad P[G_2] = 2/3.$$
(1)
In addition, the problem statement tells us that
$$P[K|G_1] = 1/2, \qquad P[K|G_2] = 1/3. \tag{2}$$
Combining these facts using the Law of Total Probability yields
$$P[K] = P[K|G_1]P[G_1] + P[K|G_2]P[G_2] \tag{3}$$
$$= (1/2)(1/3) + (1/3)(2/3) = 7/18. \tag{4}$$
(b) To solve this part, we need to identify the groups from which the first and second kicker were chosen. Let $c_i$ indicate whether a kicker was chosen from group $i$ and let $C_{ij}$ indicate that the first kicker was chosen from group $i$ and the second kicker from group $j$. The experiment to choose the kickers is described by a sample tree in which the first kicker is $c_1$ with probability 3/9 and $c_2$ with probability 6/9. Given $c_1$ first, the second kicker is $c_1$ with probability 2/8 and $c_2$ with probability 6/8; given $c_2$ first, the second kicker is $c_1$ with probability 3/8 and $c_2$ with probability 5/8. The outcomes and their probabilities are

    C_11: 1/12    C_12: 1/4    C_21: 1/4    C_22: 5/12

Since a kicker from group 1 makes a kick with probability 1/2 while a kicker from group 2 makes a kick with probability 1/3,
$$P[K_1K_2|C_{11}] = (1/2)^2, \qquad P[K_1K_2|C_{12}] = (1/2)(1/3), \tag{5}$$
$$P[K_1K_2|C_{21}] = (1/3)(1/2), \qquad P[K_1K_2|C_{22}] = (1/3)^2. \tag{6}$$
By the law of total probability,
$$P[K_1K_2] = P[K_1K_2|C_{11}]P[C_{11}] + P[K_1K_2|C_{12}]P[C_{12}] \tag{7}$$
$$\qquad + P[K_1K_2|C_{21}]P[C_{21}] + P[K_1K_2|C_{22}]P[C_{22}] \tag{8}$$
$$= \frac{1}{4}\cdot\frac{1}{12} + \frac{1}{6}\cdot\frac{1}{4} + \frac{1}{6}\cdot\frac{1}{4} + \frac{1}{9}\cdot\frac{5}{12} = \frac{65}{432}. \tag{9}$$
It should be apparent that $P[K_1] = P[K]$ from part (a). Symmetry should also make it clear that $P[K_1] = P[K_2]$ since for any ordering of two kickers, the reverse ordering is equally likely. If this is not clear, we derive this result by calculating $P[K_2|C_{ij}]$ and using the law of total probability to calculate $P[K_2]$.
$$P[K_2|C_{11}] = 1/2, \qquad P[K_2|C_{12}] = 1/3, \tag{10}$$
$$P[K_2|C_{21}] = 1/2, \qquad P[K_2|C_{22}] = 1/3. \tag{11}$$
By the law of total probability,
$$P[K_2] = P[K_2|C_{11}]P[C_{11}] + P[K_2|C_{12}]P[C_{12}] + P[K_2|C_{21}]P[C_{21}] + P[K_2|C_{22}]P[C_{22}] \tag{12}$$
$$= \frac{1}{2}\cdot\frac{1}{12} + \frac{1}{3}\cdot\frac{1}{4} + \frac{1}{2}\cdot\frac{1}{4} + \frac{1}{3}\cdot\frac{5}{12} = \frac{7}{18}. \tag{13}$$
We observe that $K_1$ and $K_2$ are not independent since
$$P[K_1K_2] = \frac{65}{432} \neq \left(\frac{7}{18}\right)^2 = P[K_1]P[K_2].$$
(14)
Note that $65/432 \approx 0.1505$ and $(7/18)^2 = 49/324 \approx 0.1512$ are close but not exactly the same. The reason $K_1$ and $K_2$ are dependent is that if the first kicker is successful, then it is more likely that kicker is from group 1. This makes it more likely that the second kicker is from group 2 and is thus more likely to miss.
(c) Once a kicker is chosen, each of the 10 field goals is an independent trial. If the kicker is from group 1, then the success probability is 1/2. If the kicker is from group 2, the success probability is 1/3. Out of 10 kicks, there are 5 misses iff there are 5 successful kicks. Given the type of kicker chosen, the probability of 5 misses is
$$P[M|G_1] = \binom{10}{5}(1/2)^5(1/2)^5, \qquad P[M|G_2] = \binom{10}{5}(1/3)^5(2/3)^5. \tag{15}$$
We use the Law of Total Probability to find
$$P[M] = P[M|G_1]P[G_1] + P[M|G_2]P[G_2] \tag{16}$$
$$= \binom{10}{5}\left[(1/3)(1/2)^{10} + (2/3)(1/3)^5(2/3)^5\right]. \tag{17}$$

Problem 1.10.1 Solution
From the problem statement, we can conclude that the device components are configured in the following way: components 1, 2, and 3 are in series; this chain is in parallel with component 4; and the resulting combination is in series with the parallel pair of components 5 and 6.
To find the probability that the device works, we replace series devices 1, 2, and 3, and parallel devices 5 and 6 each with a single device labeled with the probability that it works. In particular,
$$P[W_1W_2W_3] = (1-q)^3, \tag{1}$$
$$P[W_5 \cup W_6] = 1 - P[W_5^cW_6^c] = 1 - q^2. \tag{2}$$
This yields a composite device in which a device that works with probability $(1-q)^3$ is in parallel with a device that works with probability $1-q$, and this pair is in series with a device that works with probability $1-q^2$. The probability $P[W']$ that the two devices in parallel work is 1 minus the probability that neither works:
$$P[W'] = 1 - q(1-(1-q)^3). \tag{3}$$
Finally, for the device to work, both composite devices in series must work. Thus, the probability the device works is
$$P[W] = [1 - q(1-(1-q)^3)][1-q^2]. \tag{4}$$

Problem 1.10.2 Solution
Suppose that the transmitted bit was a 1. We can view each repeated transmission as an independent trial. We call each repeated bit the receiver decodes as 1 a success.
Using $S_{k,5}$ to denote the event of $k$ successes in the five trials, the probability that $k$ 1's are decoded at the receiver is
$$P[S_{k,5}] = \binom{5}{k}p^k(1-p)^{5-k}, \qquad k = 0, 1, \ldots, 5. \tag{1}$$
The probability a bit is decoded correctly is
$$P[C] = P[S_{5,5}] + P[S_{4,5}] = p^5 + 5p^4(1-p) = 0.91854. \tag{2}$$
The probability a deletion occurs is
$$P[D] = P[S_{3,5}] + P[S_{2,5}] = 10p^3(1-p)^2 + 10p^2(1-p)^3 = 0.081. \tag{3}$$
The probability of an error is
$$P[E] = P[S_{1,5}] + P[S_{0,5}] = 5p(1-p)^4 + (1-p)^5 = 0.00046. \tag{4}$$
Note that if a 0 is transmitted, then 0 is sent five times and we call decoding a 0 a success. You should convince yourself that this is a symmetric situation with the same deletion and error probabilities. Introducing deletions reduces the probability of an error by roughly a factor of 20. However, the probability of successful decoding is also reduced.

Problem 1.10.3 Solution
Note that each digit 0 through 9 is mapped to the 4 bit binary representation of the digit. That is, 0 corresponds to 0000, 1 to 0001, up to 9 which corresponds to 1001. Of course, the 4 bit binary numbers corresponding to numbers 10 through 15 go unused; however, this is unimportant to our problem. The 10 digit number results in the transmission of 40 bits. For each bit, an independent trial determines whether the bit was correct, a deletion, or an error. In Problem 1.10.2, we found the probabilities of these events to be
$$P[C] = \gamma = 0.91854, \qquad P[D] = \delta = 0.081, \qquad P[E] = \epsilon = 0.00046. \tag{1}$$
Since each of the 40 bit transmissions is an independent trial, the joint probability of $c$ correct bits, $d$ deletions, and $e$ errors has the multinomial probability
$$P[C = c, D = d, E = e] = \begin{cases} \dfrac{40!}{c!\,d!\,e!}\,\gamma^c\delta^d\epsilon^e & c + d + e = 40;\ c, d, e \ge 0, \\ 0 & \text{otherwise.} \end{cases} \tag{2}$$

Problem 1.10.4 Solution
From the statement of Problem 1.10.1, the configuration of device components is as follows: components 1, 2, and 3 are in series; this chain is in parallel with component 4; and the resulting combination is in series with the parallel pair of components 5 and 6.
By symmetry, note that the reliability of the system is the same whether we replace component 1, component 2, or component 3.
Similarly, the reliability is the same whether we replace component 5 or component 6. Thus we consider the following cases:

I. Replace component 1. In this case,
$$P[W_1W_2W_3] = \left(1-\frac{q}{2}\right)(1-q)^2, \qquad P[W_4] = 1-q, \qquad P[W_5 \cup W_6] = 1-q^2. \tag{1}$$
This implies
$$P[W_1W_2W_3 \cup W_4] = 1 - (1-P[W_1W_2W_3])(1-P[W_4]) = 1 - \frac{q^2}{2}(5-4q+q^2). \tag{2}$$
In this case, the probability the system works is
$$P[W_I] = P[W_1W_2W_3 \cup W_4]\,P[W_5 \cup W_6] = \left[1 - \frac{q^2}{2}(5-4q+q^2)\right](1-q^2). \tag{3}$$

II. Replace component 4. In this case,
$$P[W_1W_2W_3] = (1-q)^3, \qquad P[W_4] = 1-\frac{q}{2}, \qquad P[W_5 \cup W_6] = 1-q^2. \tag{4}$$
This implies
$$P[W_1W_2W_3 \cup W_4] = 1 - (1-P[W_1W_2W_3])(1-P[W_4]) = 1 - \frac{q}{2} + \frac{q}{2}(1-q)^3. \tag{5}$$
In this case, the probability the system works is
$$P[W_{II}] = P[W_1W_2W_3 \cup W_4]\,P[W_5 \cup W_6] = \left[1 - \frac{q}{2} + \frac{q}{2}(1-q)^3\right](1-q^2). \tag{6}$$

III. Replace component 5. In this case,
$$P[W_1W_2W_3] = (1-q)^3, \qquad P[W_4] = 1-q, \qquad P[W_5 \cup W_6] = 1-\frac{q^2}{2}. \tag{7}$$
This implies
$$P[W_1W_2W_3 \cup W_4] = 1 - (1-P[W_1W_2W_3])(1-P[W_4]) = (1-q)\left[1+q(1-q)^2\right]. \tag{8}$$
In this case, the probability the system works is
$$P[W_{III}] = P[W_1W_2W_3 \cup W_4]\,P[W_5 \cup W_6] \tag{9}$$
$$= (1-q)\left(1-\frac{q^2}{2}\right)\left[1+q(1-q)^2\right]. \tag{10}$$

From these expressions, it's hard to tell which substitution creates the most reliable circuit. First, we observe that $P[W_{II}] > P[W_I]$ if and only if
$$1 - \frac{q}{2} + \frac{q}{2}(1-q)^3 > 1 - \frac{q^2}{2}(5-4q+q^2). \tag{11}$$
Some algebra will show that the left side exceeds the right side by $(q^2/2)(2-q)$, so $P[W_{II}] > P[W_I]$ if and only if $q \neq 0$ and $q < 2$, which occurs for all nontrivial (i.e., nonzero) values of $q$. Similar algebra will show that $P[W_{II}] > P[W_{III}]$ for all values of $0 < q < 1$. Thus the best policy is to replace component 4.

Problem 1.11.1 Solution
We can generate the 200×1 vector T, denoted T in Matlab, via the command

    T=50+ceil(50*rand(200,1))

Keep in mind that 50*rand(200,1) produces a 200×1 vector of random numbers, each in the interval (0, 50).
Applying the ceiling function converts these random numbers to random integers in the set {1, 2, ..., 50}. Finally, we add 50 to produce random integers between 51 and 100.

Problem 1.11.2 Solution
Rather than just solve the problem for 50 trials, we can write a function that generates vectors C and H for an arbitrary number of trials n. The code for this task is

    function [C,H]=twocoin(n);
    C=ceil(2*rand(n,1));
    P=1-(C/4);
    H=(rand(n,1)< P);

The first line produces the n×1 vector C such that C(i) indicates whether coin 1 or coin 2 is chosen for trial i. Next, we generate the vector P such that P(i)=0.75 if C(i)=1; otherwise, if C(i)=2, then P(i)=0.5. As a result, H(i) is the simulated result of a coin flip with heads, corresponding to H(i)=1, occurring with probability P(i).

Problem 1.11.3 Solution
Rather than just solve the problem for 100 trials, we can write a function that generates n packets for an arbitrary number of trials n. The code for this task is

    function C=bit100(n);
    % n is the number of 100 bit packets sent
    B=floor(2*rand(n,100));
    P=0.03-0.02*B;
    E=(rand(n,100)< P);
    C=sum((sum(E,2)<=5));

First, B is an n×100 matrix such that B(i,j) indicates whether bit j of packet i is zero or one. Next, we generate the n×100 matrix P such that P(i,j)=0.03 if B(i,j)=0; otherwise, if B(i,j)=1, then P(i,j)=0.01. As a result, E(i,j) is the simulated error indicator for bit j of packet i. That is, E(i,j)=1 if bit j of packet i is in error; otherwise E(i,j)=0. Next we sum across the rows of E to obtain the number of errors in each packet. Finally, we count the number of packets with 5 or fewer errors. For n = 100 packets, the estimate of the packet success probability is inconclusive. Experimentation will show that C=97, C=98, C=99 and C=100 correct packets are typical values that might be observed. By increasing n, more consistent results are obtained. For example, repeated trials with n = 100,000 packets typically produce around C = 98,400 correct packets.
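As a cross-check on the simulation, the packet success probability can also be computed exactly. The sketch below is not part of matcode; it assumes, as bit100 does, that each bit is equally likely to be 0 or 1, so that a bit is in error with probability 0.5(0.03) + 0.5(0.01) = 0.02 and the number of errors in a 100 bit packet is binomial. The names pe, k, terms and pOK are illustrative.

```matlab
% Exact probability that a 100 bit packet has 5 or fewer bit errors.
% Assumption (matching bit100): bits are 0 or 1 with equal probability,
% with error probability 0.03 for a 0 and 0.01 for a 1.
pe = 0.5*0.03 + 0.5*0.01;        % unconditional bit error probability

k = 0:5;                         % allowed numbers of errors
% Binomial(100, pe) PMF evaluated at each k
terms = arrayfun(@(j) nchoosek(100,j), k) .* pe.^k .* (1-pe).^(100-k);
pOK = sum(terms);                % P[packet has at most 5 errors]

fprintf('Exact packet success probability: %.4f\n', pOK);
```

The exact value is close to the fraction of correct packets observed in the long simulation runs.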
Thus 0.984 is a reasonable estimate for the probability of a packet being transmitted correctly.

Problem 1.11.4 Solution
To test n 6-component devices (such that each component fails with probability q) we use the following function:

    function N=reliable6(n,q);
    % n is the number of 6 component devices
    %N is the number of working devices
    W=rand(n,6)>q;
    D=(W(:,1)&W(:,2)&W(:,3))|W(:,4);
    D=D&(W(:,5)|W(:,6));
    N=sum(D);

The n×6 matrix W is a logical matrix such that W(i,j)=1 if component j of device i works properly. Because W is a logical matrix, we can use the Matlab logical operators | and & to implement the logic requirements for a working device. By applying these logical operators to the n×1 columns of W, we simulate the test of n circuits. Note that D(i)=1 if device i works. Otherwise, D(i)=0. Lastly, we count the number N of working devices. The following code snippet produces ten sample runs, where each sample run tests n=100 devices for q = 0.2.

    >> for n=1:10, w(n)=reliable6(100,0.2); end
    >> w
    w =
        82 87 87 92 91 85 85 83 90 89
    >>

As we see, the number of working devices is typically around 85 out of 100. Solving Problem 1.10.1 will show that the probability the device works is actually 0.8663.

Problem 1.11.5 Solution
The code

    function n=countequal(x,y)
    %Usage: n=countequal(x,y)
    %n(j)= # elements of x = y(j)
    [MX,MY]=ndgrid(x,y);
    %each column of MX = x
    %each row of MY = y
    n=(sum((MX==MY),1))';

for countequal is quite short (just two lines excluding comments) but needs some explanation. The key is in the operation [MX,MY]=ndgrid(x,y). The Matlab built-in function ndgrid facilitates plotting a function g(x, y) as a surface over the x, y plane. The x, y plane is represented by a grid of all pairs of points x(i), y(j). When x has n elements, and y has m elements, ndgrid(x,y) creates a grid (an n×m array) of all possible pairs [x(i) y(j)]. This grid is represented by two separate n×m matrices: MX and MY which indicate the x and y values at each grid point.
Mathematically, MX(i,j)=x(i), MY(i,j)=y(j). Next, C=(MX==MY) is an n×m array such that C(i,j)=1 if x(i)=y(j); otherwise C(i,j)=0. That is, the jth column of C indicates which elements of x equal y(j). Lastly, we sum along each column j to count the number of elements of x equal to y(j). That is, we sum along column j to count the number of occurrences (in x) of y(j).

Problem 1.11.6 Solution
For an arbitrary number of trials n and failure probability q, the following function evaluates replacing each of the six components by an ultrareliable device.

    function N=ultrareliable6(n,q);
    % n is the number of 6 component devices
    %N is the number of working devices
    for r=1:6,
        W=rand(n,6)>q;
        R=rand(n,1)>(q/2);
        W(:,r)=R;
        D=(W(:,1)&W(:,2)&W(:,3))|W(:,4);
        D=D&(W(:,5)|W(:,6));
        N(r)=sum(D);
    end

The above code is based on the code for the solution of Problem 1.11.4. The n×6 matrix W is a logical matrix such that W(i,j)=1 if component j of device i works properly. Because W is a logical matrix, we can use the Matlab logical operators | and & to implement the logic requirements for a working device. By applying these logical operators to the n×1 columns of W, we simulate the test of n circuits. Note that D(i)=1 if device i works. Otherwise, D(i)=0. Note that in the code, we first generate the matrix W such that each component has failure probability q. To simulate the replacement of the rth device by the ultrareliable version, we replace the rth column of W by the column vector R in which a device has failure probability q/2. Lastly, for each column replacement, we count the number N(r) of working devices. A sample run for n = 100 trials and q = 0.2 yielded these results:

    >> ultrareliable6(100,0.2)
    ans =
        93 89 91 92 90 93

From the above, we see, for example, that replacing the third component with an ultrareliable component resulted in 91 working devices.
The results are fairly inconclusive in that replacing devices 1, 2, or 3 should yield the same probability of device failure. If we experiment with n = 10,000 runs, the results are more definitive:

    >> ultrareliable6(10000,0.2)
    ans =
        8738 8762 8806 9135 8800 8796
    >>
    >> ultrareliable6(10000,0.2)
    ans =
        8771 8795 8806 9178 8886 8875
    >>

In both cases, it is clear that replacing component 4 maximizes the device reliability. The somewhat complicated solution of Problem 1.10.4 will confirm this observation.

Problem Solutions – Chapter 2

Problem 2.2.1 Solution
(a) We wish to find the value of $c$ that makes the PMF sum up to one.
$$P_N(n) = \begin{cases} c(1/2)^n & n = 0, 1, 2 \\ 0 & \text{otherwise} \end{cases} \tag{1}$$
Therefore, $\sum_{n=0}^{2} P_N(n) = c + c/2 + c/4 = 1$, implying $c = 4/7$.
(b) The probability that $N \le 1$ is
$$P[N \le 1] = P[N = 0] + P[N = 1] = 4/7 + 2/7 = 6/7 \tag{2}$$

Problem 2.2.2 Solution
From Example 2.5, we can write the PMF of $X$ and the PMF of $R$ as
$$P_X(x) = \begin{cases} 1/8 & x = 0 \\ 3/8 & x = 1 \\ 3/8 & x = 2 \\ 1/8 & x = 3 \\ 0 & \text{otherwise} \end{cases} \qquad P_R(r) = \begin{cases} 1/4 & r = 0 \\ 3/4 & r = 2 \\ 0 & \text{otherwise} \end{cases} \tag{1}$$
From the PMFs $P_X(x)$ and $P_R(r)$, we can calculate the requested probabilities.
(a) $P[X = 0] = P_X(0) = 1/8$.
(b) $P[X < 3] = P_X(0) + P_X(1) + P_X(2) = 7/8$.
(c) $P[R > 1] = P_R(2) = 3/4$.

Problem 2.2.3 Solution
(a) We must choose $c$ to make the PMF of $V$ sum to one.
$$\sum_{v=1}^{4} P_V(v) = c(1^2 + 2^2 + 3^2 + 4^2) = 30c = 1 \tag{1}$$
Hence $c = 1/30$.
(b) Let $U = \{u^2 \mid u = 1, 2, \ldots\}$ so that
$$P[V \in U] = P_V(1) + P_V(4) = \frac{1}{30} + \frac{4^2}{30} = \frac{17}{30} \tag{2}$$
(c) The probability that $V$ is even is
$$P[V \text{ is even}] = P_V(2) + P_V(4) = \frac{2^2}{30} + \frac{4^2}{30} = \frac{2}{3} \tag{3}$$
(d) The probability that $V > 2$ is
$$P[V > 2] = P_V(3) + P_V(4) = \frac{3^2}{30} + \frac{4^2}{30} = \frac{5}{6} \tag{4}$$

Problem 2.2.4 Solution
(a) We choose $c$ so that the PMF sums to one.
$$\sum_x P_X(x) = \frac{c}{2} + \frac{c}{4} + \frac{c}{8} = \frac{7c}{8} = 1 \tag{1}$$
Thus $c = 8/7$.
(b)
$$P[X = 4] = P_X(4) = \frac{8}{7 \cdot 4} = \frac{2}{7} \tag{2}$$
(c)
$$P[X < 4] = P_X(2) = \frac{8}{7 \cdot 2} = \frac{4}{7} \tag{3}$$
(d)
$$P[3 \le X \le 9] = P_X(4) + P_X(8) = \frac{8}{7 \cdot 4} + \frac{8}{7 \cdot 8} = \frac{3}{7} \tag{4}$$

Problem 2.2.5 Solution
Using B (for Bad) to denote a miss and G (for Good) to denote a successful free throw, the sample tree for the number of points scored in the 1 and 1 is as follows: the first free throw is B with probability $1-p$, giving $Y = 0$, or G with probability $p$; after G, the second free throw is B with probability $1-p$, giving $Y = 1$, or G with probability $p$, giving $Y = 2$. From the tree, the PMF of $Y$ is
$$P_Y(y) = \begin{cases} 1-p & y = 0 \\ p(1-p) & y = 1 \\ p^2 & y = 2 \\ 0 & \text{otherwise} \end{cases} \tag{1}$$

Problem 2.2.6 Solution
The probability that a caller fails to get through in three tries is $(1-p)^3$. To be sure that at least 95% of all callers get through, we need $(1-p)^3 \le 0.05$. This implies $p \ge 1 - (0.05)^{1/3} = 0.6316$.

Problem 2.2.7 Solution
In Problem 2.2.6, each caller is willing to make 3 attempts to get through. An attempt is a failure if all $n$ operators are busy, which occurs with probability $q = (0.8)^n$. Assuming call attempts are independent, a caller will suffer three failed attempts with probability $q^3 = (0.8)^{3n}$. The problem statement requires that $(0.8)^{3n} \le 0.05$. This implies $n \ge 4.48$ and so we need 5 operators.

Problem 2.2.8 Solution
From the problem statement, a single is twice as likely as a double, which is twice as likely as a triple, which is twice as likely as a home-run. If $p$ is the probability of a home run, then
$$P_B(4) = p, \qquad P_B(3) = 2p, \qquad P_B(2) = 4p, \qquad P_B(1) = 8p. \tag{1}$$
Since a hit of any kind occurs with probability 0.300, $p + 2p + 4p + 8p = 0.300$, which implies $p = 0.02$. Hence, the PMF of $B$ is
$$P_B(b) = \begin{cases} 0.70 & b = 0 \\ 0.16 & b = 1 \\ 0.08 & b = 2 \\ 0.04 & b = 3 \\ 0.02 & b = 4 \\ 0 & \text{otherwise} \end{cases} \tag{2}$$

Problem 2.2.9 Solution
(a) In the setup of a mobile call, the phone will send the “SETUP” message up to six times. Each time the setup message is sent, we have a Bernoulli trial with success probability $p$. Of course, the phone stops trying as soon as there is a success.
Using $r$ to denote a successful response, and $n$ a non-response, the sample tree is a chain of up to six attempts. On attempt $k$, for $k = 1, \ldots, 5$, a response $r$ (probability $p$) ends the experiment with $K = k$, while a non-response $n$ (probability $1-p$) leads to attempt $k+1$. On the sixth attempt, both $r$ (probability $p$) and $n$ (probability $1-p$) end the experiment with $K = 6$.
(b) We can write the PMF of $K$, the number of “SETUP” messages sent, as
$$P_K(k) = \begin{cases} (1-p)^{k-1}p & k = 1, 2, \ldots, 5 \\ (1-p)^5p + (1-p)^6 = (1-p)^5 & k = 6 \\ 0 & \text{otherwise} \end{cases} \tag{1}$$
Note that the expression for $P_K(6)$ is different because $K = 6$ if either there was a success or a failure on the sixth attempt. In fact, $K = 6$ whenever there were failures on the first five attempts, which is why $P_K(6)$ simplifies to $(1-p)^5$.
(c) Let $B$ denote the event that a busy signal is given after six failed setup attempts. The probability of six consecutive failures is $P[B] = (1-p)^6$.
(d) To be sure that $P[B] \le 0.02$, we need $p \ge 1 - (0.02)^{1/6} = 0.479$.

Problem 2.3.1 Solution
(a) If it is indeed true that $Y$, the number of yellow M&M's in a package, is uniformly distributed between 5 and 15, then the PMF of $Y$ is
$$P_Y(y) = \begin{cases} 1/11 & y = 5, 6, 7, \ldots, 15 \\ 0 & \text{otherwise} \end{cases} \tag{1}$$
(b)
$$P[Y < 10] = P_Y(5) + P_Y(6) + \cdots + P_Y(9) = 5/11 \tag{2}$$
(c)
$$P[Y > 12] = P_Y(13) + P_Y(14) + P_Y(15) = 3/11 \tag{3}$$
(d)
$$P[8 \le Y \le 12] = P_Y(8) + P_Y(9) + \cdots + P_Y(12) = 5/11 \tag{4}$$

Problem 2.3.2 Solution
(a) Each paging attempt is an independent Bernoulli trial with success probability $p$. The number of times $K$ that the pager receives a message is the number of successes in $n$ Bernoulli trials and has the binomial PMF
$$P_K(k) = \begin{cases} \binom{n}{k}p^k(1-p)^{n-k} & k = 0, 1, \ldots, n \\ 0 & \text{otherwise} \end{cases} \tag{1}$$
(b) Let $R$ denote the event that the paging message was received at least once. The event $R$ has probability
$$P[R] = P[K > 0] = 1 - P[K = 0] = 1 - (1-p)^n \tag{2}$$
To ensure that $P[R] \ge 0.95$ requires that $n \ge \ln(0.05)/\ln(1-p)$. For $p = 0.8$, we must have $n \ge 1.86$. Thus, $n = 2$ pages would be necessary.

Problem 2.3.3 Solution
Whether a hook catches a fish is an independent trial with success probability $h$.
The number of fish hooked, K, has the binomial PMF
$$P_K(k) = \begin{cases} \binom{m}{k} h^k (1-h)^{m-k} & k = 0, 1, \ldots, m \\ 0 & \text{otherwise} \end{cases} \tag{1}$$

Problem 2.3.4 Solution

(a) Let X be the number of times the frisbee is thrown until the dog catches it and runs away. Each throw of the frisbee can be viewed as a Bernoulli trial in which a success occurs if the dog catches the frisbee and runs away. Thus, the experiment ends on the first success and X has the geometric PMF
$$P_X(x) = \begin{cases} (1-p)^{x-1} p & x = 1, 2, \ldots \\ 0 & \text{otherwise} \end{cases} \tag{1}$$

(b) The child will throw the frisbee more than four times iff there are failures on the first 4 trials, which has probability $(1-p)^4$. If $p = 0.2$, the probability of more than four throws is $(0.8)^4 = 0.4096$.

Problem 2.3.5 Solution

Each paging attempt is a Bernoulli trial with success probability p, where a success occurs if the pager receives the paging message.

(a) The paging message is sent again and again until a success occurs. Hence the number of paging messages is $N = n$ if there are $n-1$ paging failures followed by a paging success. That is, N has the geometric PMF
$$P_N(n) = \begin{cases} (1-p)^{n-1} p & n = 1, 2, \ldots \\ 0 & \text{otherwise} \end{cases} \tag{1}$$

(b) The probability that no more than three paging attempts are required is
$$P[N \le 3] = 1 - P[N > 3] = 1 - \sum_{n=4}^{\infty} P_N(n) = 1 - (1-p)^3 \tag{2}$$
This answer can be obtained without calculation since $N > 3$ if the first three paging attempts fail, and that event occurs with probability $(1-p)^3$. Hence, we must choose p to satisfy $1 - (1-p)^3 \ge 0.95$ or $(1-p)^3 \le 0.05$. This implies
$$p \ge 1 - (0.05)^{1/3} \approx 0.6316 \tag{3}$$

Problem 2.3.6 Solution

The probability of more than 500,000 bits is
$$P[B > 500{,}000] = 1 - \sum_{b=1}^{500{,}000} P_B(b) \tag{1}$$
$$= 1 - p \sum_{b=1}^{500{,}000} (1-p)^{b-1} \tag{2}$$
Math Fact B.4 implies that $(1-x) \sum_{b=1}^{500{,}000} x^{b-1} = 1 - x^{500{,}000}$. Substituting $x = 1-p$, we obtain
$$P[B > 500{,}000] = 1 - \left(1 - (1-p)^{500{,}000}\right) \tag{3}$$
$$= (1 - 0.25 \times 10^{-5})^{500{,}000} \approx \exp(-500{,}000/400{,}000) = 0.29. \tag{4}$$

Problem 2.3.7 Solution

Since an average of T/5 buses arrive in an interval of T minutes, buses arrive at the bus stop at a rate of 1/5 buses per minute.

(a) From the definition of the Poisson PMF, the PMF of B, the number of buses in T minutes, is
$$P_B(b) = \begin{cases} (T/5)^b e^{-T/5}/b! & b = 0, 1, \ldots \\ 0 & \text{otherwise} \end{cases} \tag{1}$$

(b) Choosing T = 2 minutes, the probability that three buses arrive in a two-minute interval is
$$P_B(3) = (2/5)^3 e^{-2/5}/3! \approx 0.0072 \tag{2}$$

(c) By choosing T = 10 minutes, the probability of zero buses arriving in a ten-minute interval is
$$P_B(0) = e^{-10/5}/0! = e^{-2} \approx 0.135 \tag{3}$$

(d) The probability that at least one bus arrives in T minutes must satisfy
$$P[B \ge 1] = 1 - P[B = 0] = 1 - e^{-T/5} \ge 0.99 \tag{4}$$
Rearranging yields $T \ge 5 \ln 100 \approx 23.0$ minutes.

Problem 2.3.8 Solution

(a) If each message is transmitted 8 times and the probability of a successful transmission is p, then N, the number of successful transmissions, has the binomial PMF
$$P_N(n) = \begin{cases} \binom{8}{n} p^n (1-p)^{8-n} & n = 0, 1, \ldots, 8 \\ 0 & \text{otherwise} \end{cases} \tag{1}$$

(b) The indicator random variable I equals zero if and only if $N = 0$. Hence,
$$P[I = 0] = P[N = 0] = 1 - P[I = 1] \tag{2}$$
Thus, the complete expression for the PMF of I is
$$P_I(i) = \begin{cases} (1-p)^8 & i = 0 \\ 1 - (1-p)^8 & i = 1 \\ 0 & \text{otherwise} \end{cases} \tag{3}$$

Problem 2.3.9 Solution

The requirement that $\sum_{x=1}^{n} P_X(x) = 1$ implies
$$n = 1: \quad c(1)\left[\tfrac{1}{1}\right] = 1 \qquad c(1) = 1 \tag{1}$$
$$n = 2: \quad c(2)\left[\tfrac{1}{1} + \tfrac{1}{2}\right] = 1 \qquad c(2) = \tfrac{2}{3} \tag{2}$$
$$n = 3: \quad c(3)\left[\tfrac{1}{1} + \tfrac{1}{2} + \tfrac{1}{3}\right] = 1 \qquad c(3) = \tfrac{6}{11} \tag{3}$$
$$n = 4: \quad c(4)\left[\tfrac{1}{1} + \tfrac{1}{2} + \tfrac{1}{3} + \tfrac{1}{4}\right] = 1 \qquad c(4) = \tfrac{12}{25} \tag{4}$$
$$n = 5: \quad c(5)\left[\tfrac{1}{1} + \tfrac{1}{2} + \tfrac{1}{3} + \tfrac{1}{4} + \tfrac{1}{5}\right] = 1 \qquad c(5) = \tfrac{60}{137} \tag{5}$$
$$n = 6: \quad c(6)\left[\tfrac{1}{1} + \tfrac{1}{2} + \tfrac{1}{3} + \tfrac{1}{4} + \tfrac{1}{5} + \tfrac{1}{6}\right] = 1 \qquad c(6) = \tfrac{20}{49} \tag{6}$$
As an aside, finding c(n) for large values of n is easy using the recursion
$$\frac{1}{c(n+1)} = \frac{1}{c(n)} + \frac{1}{n+1}. \tag{7}$$

Problem 2.3.10 Solution

(a) We can view whether each caller knows the birthdate as a Bernoulli trial. As a result, L is the number of trials needed for 6 successes.
That is, L has a Pascal PMF with parameters $p = 0.75$ and $k = 6$ as defined by Definition 2.8. In particular,
$$P_L(l) = \begin{cases} \binom{l-1}{5} (0.75)^6 (0.25)^{l-6} & l = 6, 7, \ldots \\ 0 & \text{otherwise} \end{cases} \tag{1}$$

(b) The probability of finding the winner on the tenth call is
$$P_L(10) = \binom{9}{5} (0.75)^6 (0.25)^4 \approx 0.0876 \tag{2}$$

(c) The probability that the station will need nine or more calls to find a winner is
$$P[L \ge 9] = 1 - P[L < 9] \tag{3}$$
$$= 1 - P_L(6) - P_L(7) - P_L(8) \tag{4}$$
$$= 1 - (0.75)^6 \left[1 + 6(0.25) + 21(0.25)^2\right] \approx 0.321 \tag{5}$$

Problem 2.3.11 Solution

The packets are delay sensitive and can only be retransmitted d times. For $t < d$, a packet is transmitted t times if the first $t-1$ attempts fail followed by a successful transmission on attempt t. Further, the packet is transmitted d times if there are failures on the first $d-1$ transmissions, no matter what the outcome of attempt d. So the random variable T, the number of times that a packet is transmitted, can be represented by the following PMF.
$$P_T(t) = \begin{cases} p(1-p)^{t-1} & t = 1, 2, \ldots, d-1 \\ (1-p)^{d-1} & t = d \\ 0 & \text{otherwise} \end{cases} \tag{1}$$

Problem 2.3.12 Solution

(a) Since each day is independent of any other day, $P[W_{33}]$ is just the probability that a winning lottery ticket was bought. Similarly, $P[L_{87}]$ and $P[N_{99}]$ are just the probabilities that a losing ticket was bought and that no ticket was bought on a single day, respectively. Therefore
$$P[W_{33}] = p/2 \qquad P[L_{87}] = (1-p)/2 \qquad P[N_{99}] = 1/2 \tag{1}$$

(b) Suppose we say a success occurs on the kth trial if on day k we buy a ticket. Otherwise, a failure occurs. The probability of success is simply 1/2. The random variable K is just the number of trials until the first success and has the geometric PMF
$$P_K(k) = \begin{cases} (1/2)(1/2)^{k-1} = (1/2)^k & k = 1, 2, \ldots \\ 0 & \text{otherwise} \end{cases} \tag{2}$$

(c) The probability that you decide to buy a ticket and it is a losing ticket is $(1-p)/2$, independent of any other day.
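The Pascal probabilities quoted in Problem 2.3.10 are easy to check numerically. The following is a minimal Python sketch; the helper name `pascal_pmf` and its default arguments are assumptions made for illustration.

```python
from math import comb

def pascal_pmf(l, k=6, p=0.75):
    """P[L = l]: probability that the k-th success occurs on trial l."""
    if l < k:
        return 0.0
    return comb(l - 1, k - 1) * p**k * (1 - p) ** (l - k)

# Problem 2.3.10(b): the winner is found on the tenth call.
p_tenth_call = pascal_pmf(10)

# Problem 2.3.10(c): nine or more calls are needed, i.e. 1 - P[L < 9].
p_nine_or_more = 1 - sum(pascal_pmf(l) for l in range(6, 9))

print(round(p_tenth_call, 4), round(p_nine_or_more, 3))
```

Summing only the three terms for l = 6, 7, 8 mirrors the complement argument used in part (c).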
If we view buying a losing ticket as a Bernoulli success, then R, the number of losing lottery tickets bought in m days, has the binomial PMF
$$P_R(r) = \begin{cases} \binom{m}{r} [(1-p)/2]^r [(1+p)/2]^{m-r} & r = 0, 1, \ldots, m \\ 0 & \text{otherwise} \end{cases} \tag{3}$$

(d) Letting D be the day on which the j-th losing ticket is bought, we can find the probability that $D = d$ by noting that $j-1$ losing tickets must have been purchased in the $d-1$ previous days. Therefore D has the Pascal PMF
$$P_D(d) = \begin{cases} \binom{d-1}{j-1} [(1-p)/2]^j [(1+p)/2]^{d-j} & d = j, j+1, \ldots \\ 0 & \text{otherwise} \end{cases} \tag{4}$$

Problem 2.3.13 Solution

(a) Let $S_n$ denote the event that the Sixers win the series in n games. Similarly, $C_n$ is the event that the Celtics win in n games. The Sixers win the series in 3 games if they win three straight, which occurs with probability
$$P[S_3] = (1/2)^3 = 1/8 \tag{1}$$
The Sixers win the series in 4 games if they win two out of the first three games and they win the fourth game, so that
$$P[S_4] = \binom{3}{2} (1/2)^3 (1/2) = 3/16 \tag{2}$$
The Sixers win the series in five games if they win two out of the first four games and then win game five. Hence,
$$P[S_5] = \binom{4}{2} (1/2)^4 (1/2) = 3/16 \tag{3}$$
By symmetry, $P[C_n] = P[S_n]$. Further, we observe that the series lasts n games if either the Sixers or the Celtics win the series in n games. Thus,
$$P[N = n] = P[S_n] + P[C_n] = 2P[S_n] \tag{4}$$
Consequently, the total number of games, N, played in a best-of-5 series between the Celtics and the Sixers can be described by the PMF
$$P_N(n) = \begin{cases} 2(1/2)^3 = 1/4 & n = 3 \\ 2\binom{3}{1}(1/2)^4 = 3/8 & n = 4 \\ 2\binom{4}{2}(1/2)^5 = 3/8 & n = 5 \\ 0 & \text{otherwise} \end{cases} \tag{5}$$

(b) For the total number of Celtic wins W, we note that if the Celtics get $w < 3$ wins, then the Sixers won the series in $3+w$ games. Also, the Celtics win 3 games if they win the series in 3, 4, or 5 games. Mathematically,
$$P[W = w] = \begin{cases} P[S_{3+w}] & w = 0, 1, 2 \\ P[C_3] + P[C_4] + P[C_5] & w = 3 \end{cases} \tag{6}$$
Thus, the number of wins by the Celtics, W, has the PMF shown below.
$$P_W(w) = \begin{cases} P[S_3] = 1/8 & w = 0 \\ P[S_4] = 3/16 & w = 1 \\ P[S_5] = 3/16 & w = 2 \\ 1/8 + 3/16 + 3/16 = 1/2 & w = 3 \\ 0 & \text{otherwise} \end{cases} \tag{7}$$

(c) The number of Celtic losses L equals the number of Sixers' wins $W_S$. This implies $P_L(l) = P_{W_S}(l)$. Since either team is equally likely to win any game, by symmetry, $P_{W_S}(w) = P_W(w)$. This implies $P_L(l) = P_{W_S}(l) = P_W(l)$. The complete expression for the PMF of L is
$$P_L(l) = P_W(l) = \begin{cases} 1/8 & l = 0 \\ 3/16 & l = 1 \\ 3/16 & l = 2 \\ 1/2 & l = 3 \\ 0 & \text{otherwise} \end{cases} \tag{8}$$

Problem 2.3.14 Solution

Since a and b are positive, let K be a binomial random variable for n trials and success probability $p = a/(a+b)$. First, we observe that the sum over all possible values of the PMF of K is
$$\sum_{k=0}^{n} P_K(k) = \sum_{k=0}^{n} \binom{n}{k} p^k (1-p)^{n-k} \tag{1}$$
$$= \sum_{k=0}^{n} \binom{n}{k} \left(\frac{a}{a+b}\right)^k \left(\frac{b}{a+b}\right)^{n-k} \tag{2}$$
$$= \frac{\sum_{k=0}^{n} \binom{n}{k} a^k b^{n-k}}{(a+b)^n} \tag{3}$$
Since $\sum_{k=0}^{n} P_K(k) = 1$, we see that
$$(a+b)^n = (a+b)^n \sum_{k=0}^{n} P_K(k) = \sum_{k=0}^{n} \binom{n}{k} a^k b^{n-k} \tag{4}$$

Problem 2.4.1 Solution

Using the CDF given in the problem statement we find that

(a) $P[Y < 1] = 0$

(b) $P[Y \le 1] = 1/4$

(c) $P[Y > 2] = 1 - P[Y \le 2] = 1 - 1/2 = 1/2$

(d) $P[Y \ge 2] = 1 - P[Y < 2] = 1 - 1/4 = 3/4$

(e) $P[Y = 1] = 1/4$

(f) $P[Y = 3] = 1/2$

(g) From the staircase CDF of Problem 2.4.1, we see that Y is a discrete random variable. The jumps in the CDF occur at the values that Y can take on. The height of each jump equals the probability of that value. The PMF of Y is
$$P_Y(y) = \begin{cases} 1/4 & y = 1 \\ 1/4 & y = 2 \\ 1/2 & y = 3 \\ 0 & \text{otherwise} \end{cases} \tag{1}$$

Problem 2.4.2 Solution

(a) The given CDF is shown in the diagram below.

[Plot of $F_X(x)$ versus $x$.]

$$F_X(x) = \begin{cases} 0 & x < -1 \\ 0.2 & -1 \le x < 0 \\ 0.7 & 0 \le x < 1 \\ 1 & x \ge 1 \end{cases} \tag{1}$$

(b) The corresponding PMF of X is
$$P_X(x) = \begin{cases} 0.2 & x = -1 \\ 0.5 & x = 0 \\ 0.3 & x = 1 \\ 0 & \text{otherwise} \end{cases} \tag{2}$$

Problem 2.4.3 Solution

(a) Similar to the previous problem, the graph of the CDF is shown below.
[Plot of $F_X(x)$ versus $x$.]

$$F_X(x) = \begin{cases} 0 & x < -3 \\ 0.4 & -3 \le x < 5 \\ 0.8 & 5 \le x < 7 \\ 1 & x \ge 7 \end{cases} \tag{1}$$

(b) The corresponding PMF of X is
$$P_X(x) = \begin{cases} 0.4 & x = -3 \\ 0.4 & x = 5 \\ 0.2 & x = 7 \\ 0 & \text{otherwise} \end{cases} \tag{2}$$

Problem 2.4.4 Solution

Let $q = 1-p$, so the PMF of the geometric (p) random variable K is
$$P_K(k) = \begin{cases} p q^{k-1} & k = 1, 2, \ldots, \\ 0 & \text{otherwise.} \end{cases} \tag{1}$$
For any integer $k \ge 1$, the CDF obeys
$$F_K(k) = \sum_{j=1}^{k} P_K(j) = \sum_{j=1}^{k} p q^{j-1} = 1 - q^k. \tag{2}$$
Since K is integer valued, $F_K(k) = F_K(\lfloor k \rfloor)$ for all integer and non-integer values of k. (If this point is not clear, you should review Example 2.24.) Thus, the complete expression for the CDF of K is
$$F_K(k) = \begin{cases} 0 & k < 1, \\ 1 - (1-p)^{\lfloor k \rfloor} & k \ge 1. \end{cases} \tag{3}$$

Problem 2.4.5 Solution

Since mushrooms occur with probability 2/3, the number of pizzas sold before the first mushroom pizza is $N = n < 100$ if the first n pizzas do not have mushrooms, followed by mushrooms on pizza $n+1$. Also, it is possible that $N = 100$ if all 100 pizzas are sold without mushrooms. The resulting PMF is
$$P_N(n) = \begin{cases} (1/3)^n (2/3) & n = 0, 1, \ldots, 99 \\ (1/3)^{100} & n = 100 \\ 0 & \text{otherwise} \end{cases} \tag{1}$$
For integers $n < 100$, the CDF of N obeys
$$F_N(n) = \sum_{i=0}^{n} P_N(i) = \sum_{i=0}^{n} (1/3)^i (2/3) = 1 - (1/3)^{n+1} \tag{2}$$
A complete expression for $F_N(n)$ must give a valid answer for every value of n, including non-integer values. We can write the CDF using the floor function $\lfloor x \rfloor$, which denotes the largest integer less than or equal to x. The complete expression for the CDF is
$$F_N(x) = \begin{cases} 0 & x < 0 \\ 1 - (1/3)^{\lfloor x \rfloor + 1} & 0 \le x < 100 \\ 1 & x \ge 100 \end{cases} \tag{3}$$

Problem 2.4.6 Solution

From Problem 2.2.8, the PMF of B is
$$P_B(b) = \begin{cases} 0.70 & b = 0 \\ 0.16 & b = 1 \\ 0.08 & b = 2 \\ 0.04 & b = 3 \\ 0.02 & b = 4 \\ 0 & \text{otherwise} \end{cases} \tag{1}$$
The corresponding CDF is

[Plot of $F_B(b)$ versus $b$.]

$$F_B(b) = \begin{cases} 0 & b < 0 \\ 0.70 & 0 \le b < 1 \\ 0.86 & 1 \le b < 2 \\ 0.94 & 2 \le b < 3 \\ 0.98 & 3 \le b < 4 \\ 1.0 & b \ge 4 \end{cases} \tag{2}$$

Problem 2.4.7 Solution

In Problem 2.2.5, we found the PMF of Y.
This PMF and its corresponding CDF are
$$P_Y(y) = \begin{cases} 1-p & y = 0 \\ p(1-p) & y = 1 \\ p^2 & y = 2 \\ 0 & \text{otherwise} \end{cases} \qquad F_Y(y) = \begin{cases} 0 & y < 0 \\ 1-p & 0 \le y < 1 \\ 1-p^2 & 1 \le y < 2 \\ 1 & y \ge 2 \end{cases} \tag{1}$$
For the three values of p, the CDF resembles

[Three plots of $F_Y(y)$ versus $y$, for $p = 1/4$, $p = 1/2$, and $p = 3/4$.] (2)

Problem 2.4.8 Solution

From Problem 2.2.9, the PMF of the number of call attempts is
$$P_N(n) = \begin{cases} (1-p)^{n-1} p & n = 1, 2, \ldots, 5 \\ (1-p)^5 p + (1-p)^6 = (1-p)^5 & n = 6 \\ 0 & \text{otherwise} \end{cases} \tag{1}$$
For $p = 1/2$, the PMF can be simplified to
$$P_N(n) = \begin{cases} (1/2)^n & n = 1, 2, \ldots, 5 \\ (1/2)^5 & n = 6 \\ 0 & \text{otherwise} \end{cases} \tag{2}$$
The corresponding CDF of N is

[Plot of $F_N(n)$ versus $n$.]

$$F_N(n) = \begin{cases} 0 & n < 1 \\ 1/2 & 1 \le n < 2 \\ 3/4 & 2 \le n < 3 \\ 7/8 & 3 \le n < 4 \\ 15/16 & 4 \le n < 5 \\ 31/32 & 5 \le n < 6 \\ 1 & n \ge 6 \end{cases} \tag{3}$$

Problem 2.5.1 Solution

For this problem, we just need to pay careful attention to the definitions of mode and median.

(a) The mode must satisfy $P_X(x_{\text{mod}}) \ge P_X(x)$ for all x. In the case of the uniform PMF, any integer x between 1 and 100 is a mode of the random variable X. Hence, the set of all modes is
$$X_{\text{mod}} = \{1, 2, \ldots, 100\} \tag{1}$$

(b) The median must satisfy $P[X < x_{\text{med}}] = P[X > x_{\text{med}}]$. Since
$$P[X \le 50] = P[X \ge 51] = 1/2 \tag{2}$$
we observe that $x_{\text{med}} = 50.5$ is a median since it satisfies
$$P[X < x_{\text{med}}] = P[X > x_{\text{med}}] = 1/2 \tag{3}$$
In fact, for any x satisfying $50 < x < 51$, $P[X < x] = P[X > x] = 1/2$. Thus,
$$X_{\text{med}} = \{x \mid 50 < x < 51\} \tag{4}$$

Problem 2.5.2 Solution

Voice calls cost 20 cents each and data calls cost 30 cents each. Furthermore, the respective probabilities of each type of call are 0.6 and 0.4.

(a) Since each call is either a voice or data call, the cost of one call can only take the two values associated with the cost of each type of call.
Therefore the PMF of the cost C is
$$P_C(c) = \begin{cases} 0.6 & c = 20 \\ 0.4 & c = 30 \\ 0 & \text{otherwise} \end{cases} \tag{1}$$

(b) The expected cost, E[C], is simply the sum of the cost of each type of call multiplied by the probability of such a call occurring.
$$E[C] = 20(0.6) + 30(0.4) = 24 \text{ cents} \tag{2}$$

Problem 2.5.3 Solution

From the solution to Problem 2.4.1, the PMF of Y is
$$P_Y(y) = \begin{cases} 1/4 & y = 1 \\ 1/4 & y = 2 \\ 1/2 & y = 3 \\ 0 & \text{otherwise} \end{cases} \tag{1}$$
The expected value of Y is
$$E[Y] = \sum_y y P_Y(y) = 1(1/4) + 2(1/4) + 3(1/2) = 9/4 \tag{2}$$

Problem 2.5.4 Solution

From the solution to Problem 2.4.2, the PMF of X is
$$P_X(x) = \begin{cases} 0.2 & x = -1 \\ 0.5 & x = 0 \\ 0.3 & x = 1 \\ 0 & \text{otherwise} \end{cases} \tag{1}$$
The expected value of X is
$$E[X] = \sum_x x P_X(x) = -1(0.2) + 0(0.5) + 1(0.3) = 0.1 \tag{2}$$

Problem 2.5.5 Solution

From the solution to Problem 2.4.3, the PMF of X is
$$P_X(x) = \begin{cases} 0.4 & x = -3 \\ 0.4 & x = 5 \\ 0.2 & x = 7 \\ 0 & \text{otherwise} \end{cases} \tag{1}$$
The expected value of X is
$$E[X] = \sum_x x P_X(x) = -3(0.4) + 5(0.4) + 7(0.2) = 2.2 \tag{2}$$

Problem 2.5.6 Solution

From Definition 2.7, random variable X has PMF
$$P_X(x) = \begin{cases} \binom{4}{x} (1/2)^4 & x = 0, 1, 2, 3, 4 \\ 0 & \text{otherwise} \end{cases} \tag{1}$$
The expected value of X is
$$E[X] = \sum_{x=0}^{4} x P_X(x) = 0\binom{4}{0}\frac{1}{2^4} + 1\binom{4}{1}\frac{1}{2^4} + 2\binom{4}{2}\frac{1}{2^4} + 3\binom{4}{3}\frac{1}{2^4} + 4\binom{4}{4}\frac{1}{2^4} \tag{2}$$
$$= [4 + 12 + 12 + 4]/2^4 = 2 \tag{3}$$

Problem 2.5.7 Solution

From Definition 2.7, random variable X has PMF
$$P_X(x) = \begin{cases} \binom{5}{x} (1/2)^5 & x = 0, 1, 2, 3, 4, 5 \\ 0 & \text{otherwise} \end{cases} \tag{1}$$
The expected value of X is
$$E[X] = \sum_{x=0}^{5} x P_X(x) \tag{2}$$
$$= 0\binom{5}{0}\frac{1}{2^5} + 1\binom{5}{1}\frac{1}{2^5} + 2\binom{5}{2}\frac{1}{2^5} + 3\binom{5}{3}\frac{1}{2^5} + 4\binom{5}{4}\frac{1}{2^5} + 5\binom{5}{5}\frac{1}{2^5} \tag{3}$$
$$= [5 + 20 + 30 + 20 + 5]/2^5 = 2.5 \tag{4}$$

Problem 2.5.8 Solution

The following experiments are based on a common model of packet transmissions in data networks. In these networks, each data packet contains a cyclic redundancy check (CRC) code that permits the receiver to determine whether the packet was decoded correctly. In the following, we assume that a packet is corrupted with probability $\epsilon = 0.001$, independent of whether any other packet is corrupted.

(a) Let $X = 1$ if a data packet is decoded correctly; otherwise $X = 0$.
Random variable X is a Bernoulli random variable with PMF
$$P_X(x) = \begin{cases} 0.001 & x = 0 \\ 0.999 & x = 1 \\ 0 & \text{otherwise} \end{cases} \tag{1}$$
The parameter $\epsilon = 0.001$ is the probability a packet is corrupted. The expected value of X is
$$E[X] = 1 - \epsilon = 0.999 \tag{2}$$

(b) Let Y denote the number of packets received in error out of 100 packets transmitted. Y has the binomial PMF
$$P_Y(y) = \begin{cases} \binom{100}{y} (0.001)^y (0.999)^{100-y} & y = 0, 1, \ldots, 100 \\ 0 & \text{otherwise} \end{cases} \tag{3}$$
The expected value of Y is
$$E[Y] = 100\epsilon = 0.1 \tag{4}$$

(c) Let L equal the number of packets that must be received to decode 5 packets in error. L has the Pascal PMF
$$P_L(l) = \begin{cases} \binom{l-1}{4} (0.001)^5 (0.999)^{l-5} & l = 5, 6, \ldots \\ 0 & \text{otherwise} \end{cases} \tag{5}$$
The expected value of L is
$$E[L] = \frac{5}{\epsilon} = \frac{5}{0.001} = 5000 \tag{6}$$

(d) If packet arrivals obey a Poisson model with an average arrival rate of 1000 packets per second, then the number N of packets that arrive in 5 seconds has the Poisson PMF
$$P_N(n) = \begin{cases} 5000^n e^{-5000}/n! & n = 0, 1, \ldots \\ 0 & \text{otherwise} \end{cases} \tag{7}$$
The expected value of N is $E[N] = 5000$.

Problem 2.5.9 Solution

In this "double-or-nothing" type game, there are only two possible payoffs. The first is zero dollars, which happens when we lose 6 straight bets, and the second payoff is 64 dollars, which happens unless we lose 6 straight bets. So the PMF of Y is
$$P_Y(y) = \begin{cases} (1/2)^6 = 1/64 & y = 0 \\ 1 - (1/2)^6 = 63/64 & y = 64 \\ 0 & \text{otherwise} \end{cases} \tag{1}$$
The expected amount you take home is
$$E[Y] = 0(1/64) + 64(63/64) = 63 \tag{2}$$
So, on the average, we can expect to break even, which is not a very exciting proposition.

[Annotation: Bernoulli, one outcome or the other. The probability is raised to the 6th power because it takes 6 straight losses to reach \$0.]

Problem 2.5.10 Solution

By the definition of the expected value,
$$E[X_n] = \sum_{x=1}^{n} x \binom{n}{x} p^x (1-p)^{n-x} \tag{1}$$
$$= np \sum_{x=1}^{n} \frac{(n-1)!}{(x-1)!\,(n-1-(x-1))!} p^{x-1} (1-p)^{n-1-(x-1)} \tag{2}$$
With the substitution $x' = x - 1$, we have
$$E[X_n] = np \underbrace{\sum_{x'=0}^{n-1} \binom{n-1}{x'} p^{x'} (1-p)^{n-1-x'}}_{1} = np \sum_{x'=0}^{n-1} P_{X_{n-1}}(x') = np \tag{3}$$
The above sum is 1 because it is the sum of the PMF of a binomial random variable for $n-1$ trials over all possible values.

Problem 2.5.11 Solution

We write the sum as a double sum in the following way:
$$\sum_{i=0}^{\infty} P[X > i] = \sum_{i=0}^{\infty} \sum_{j=i+1}^{\infty} P_X(j) \tag{1}$$
At this point, the key step is to reverse the order of summation. You may need to make a sketch of the feasible values for i and j to see how this reversal occurs. In this case,
$$\sum_{i=0}^{\infty} P[X > i] = \sum_{j=1}^{\infty} \sum_{i=0}^{j-1} P_X(j) = \sum_{j=1}^{\infty} j P_X(j) = E[X] \tag{2}$$

Problem 2.6.1 Solution

From the solution to Problem 2.4.1, the PMF of Y is
$$P_Y(y) = \begin{cases} 1/4 & y = 1 \\ 1/4 & y = 2 \\ 1/2 & y = 3 \\ 0 & \text{otherwise} \end{cases} \tag{1}$$

(a) Since Y has range $S_Y = \{1, 2, 3\}$, the range of $U = Y^2$ is $S_U = \{1, 4, 9\}$. The PMF of U can be found by observing that
$$P[U = u] = P[Y^2 = u] = P[Y = \sqrt{u}] + P[Y = -\sqrt{u}] \tag{2}$$
Since Y is never negative, $P_U(u) = P_Y(\sqrt{u})$. Hence,
$$P_U(1) = P_Y(1) = 1/4 \qquad P_U(4) = P_Y(2) = 1/4 \qquad P_U(9) = P_Y(3) = 1/2 \tag{3}$$
For all other values of u, $P_U(u) = 0$. The complete expression for the PMF of U is
$$P_U(u) = \begin{cases} 1/4 & u = 1 \\ 1/4 & u = 4 \\ 1/2 & u = 9 \\ 0 & \text{otherwise} \end{cases} \tag{4}$$

(b) From the PMF, it is straightforward to write down the CDF.
$$F_U(u) = \begin{cases} 0 & u < 1 \\ 1/4 & 1 \le u < 4 \\ 1/2 & 4 \le u < 9 \\ 1 & u \ge 9 \end{cases} \tag{5}$$

(c) From Definition 2.14, the expected value of U is
$$E[U] = \sum_u u P_U(u) = 1(1/4) + 4(1/4) + 9(1/2) = 5.75 \tag{6}$$
From Theorem 2.10, we can calculate the expected value of U as
$$E[U] = E[Y^2] = \sum_y y^2 P_Y(y) = 1^2(1/4) + 2^2(1/4) + 3^2(1/2) = 5.75 \tag{7}$$
As we expect, both methods yield the same answer.

Problem 2.6.2 Solution

From the solution to Problem 2.4.2, the PMF of X is
$$P_X(x) = \begin{cases} 0.2 & x = -1 \\ 0.5 & x = 0 \\ 0.3 & x = 1 \\ 0 & \text{otherwise} \end{cases} \tag{1}$$

(a) The PMF of $V = |X|$ satisfies
$$P_V(v) = P[|X| = v] = P_X(v) + P_X(-v) \tag{2}$$
In particular,
$$P_V(0) = P_X(0) = 0.5 \qquad P_V(1) = P_X(-1) + P_X(1) = 0.5 \tag{3}$$
The complete expression for the PMF of V is
$$P_V(v) = \begin{cases} 0.5 & v = 0 \\ 0.5 & v = 1 \\ 0 & \text{otherwise} \end{cases} \tag{4}$$

(b) From the PMF, we can construct the staircase CDF of V.
$$F_V(v) = \begin{cases} 0 & v < 0 \\ 0.5 & 0 \le v < 1 \\ 1 & v \ge 1 \end{cases} \tag{5}$$

(c) From the PMF $P_V(v)$, the expected value of V is
$$E[V] = \sum_v v P_V(v) = 0(1/2) + 1(1/2) = 1/2 \tag{6}$$
You can also compute E[V] directly by using Theorem 2.10.

[Annotation: because of the absolute value, x = 1 and x = -1 give the same v, so their probabilities are added together.]

Problem 2.6.3 Solution

From the solution to Problem 2.4.3, the PMF of X is
$$P_X(x) = \begin{cases} 0.4 & x = -3 \\ 0.4 & x = 5 \\ 0.2 & x = 7 \\ 0 & \text{otherwise} \end{cases} \tag{1}$$

(a) The PMF of $W = -X$ satisfies
$$P_W(w) = P[-X = w] = P_X(-w) \tag{2}$$
This implies
$$P_W(-7) = P_X(7) = 0.2 \qquad P_W(-5) = P_X(5) = 0.4 \qquad P_W(3) = P_X(-3) = 0.4 \tag{3}$$
The complete PMF for W is
$$P_W(w) = \begin{cases} 0.2 & w = -7 \\ 0.4 & w = -5 \\ 0.4 & w = 3 \\ 0 & \text{otherwise} \end{cases} \tag{4}$$

(b) From the PMF, the CDF of W is
$$F_W(w) = \begin{cases} 0 & w < -7 \\ 0.2 & -7 \le w < -5 \\ 0.6 & -5 \le w < 3 \\ 1 & w \ge 3 \end{cases} \tag{5}$$

(c) From the PMF, W has expected value
$$E[W] = \sum_w w P_W(w) = -7(0.2) + (-5)(0.4) + 3(0.4) = -2.2 \tag{6}$$

Problem 2.6.4 Solution

A tree for the experiment is

[Sample tree: $D = 99.75$, $D = 100$, and $D = 100.25$ each occur with probability 1/3, leading to $C = 10{,}074.75$, $C = 10{,}100$, and $C = 10{,}125.13$ respectively.]

Thus C has three equally likely outcomes. The PMF of C is
$$P_C(c) = \begin{cases} 1/3 & c = 10{,}074.75,\ 10{,}100,\ 10{,}125.13 \\ 0 & \text{otherwise} \end{cases} \tag{1}$$

Problem 2.6.5 Solution

(a) The source continues to transmit packets until one is received correctly. Hence, the total number of times that a packet is transmitted is $X = x$ if the first $x-1$ transmissions were in error. Therefore the PMF of X is
$$P_X(x) = \begin{cases} q^{x-1}(1-q) & x = 1, 2, \ldots \\ 0 & \text{otherwise} \end{cases} \tag{1}$$

(b) The time required to send a packet is a millisecond, and the time required to send an acknowledgment back to the source takes another millisecond. Thus, if X transmissions of a packet are needed to send the packet correctly, then the packet is correctly received after $T = 2X - 1$ milliseconds. Therefore, for an odd integer $t > 0$, $T = t$ iff $X = (t+1)/2$. Thus,
$$P_T(t) = P_X((t+1)/2) = \begin{cases} q^{(t-1)/2}(1-q) & t = 1, 3, 5, \ldots \\ 0 & \text{otherwise} \end{cases} \tag{2}$$

Problem 2.6.6 Solution

The cellular calling plan charges a flat rate of \$20 per month up to and including the 30th minute, and an additional 50 cents for each minute over 30 minutes. Knowing that the time you spend on the phone is a geometric random variable M with mean $1/p = 30$, the PMF of M is
$$P_M(m) = \begin{cases} (1-p)^{m-1} p & m = 1, 2, \ldots \\ 0 & \text{otherwise} \end{cases} \tag{1}$$
The monthly cost C obeys
$$P_C(20) = P[M \le 30] = \sum_{m=1}^{30} (1-p)^{m-1} p = 1 - (1-p)^{30} \tag{2}$$
When $M > 30$, $C = 20 + (M-30)/2$ or $M = 2C - 10$. Thus,
$$P_C(c) = P_M(2c - 10) \qquad c = 20.5, 21, 21.5, \ldots \tag{3}$$
The complete PMF of C is
$$P_C(c) = \begin{cases} 1 - (1-p)^{30} & c = 20 \\ (1-p)^{2c-10-1} p & c = 20.5, 21, 21.5, \ldots \end{cases} \tag{4}$$

Problem 2.7.1 Solution

From the solution to Quiz 2.6, we found that $T = 120 - 15N$. By Theorem 2.10,
$$E[T] = \sum_{n \in S_N} (120 - 15n) P_N(n) \tag{1}$$
$$= 0.1(120) + 0.3(120 - 15) + 0.3(120 - 30) + 0.3(120 - 45) = 93 \tag{2}$$
Also from the solution to Quiz 2.6, we found that
$$P_T(t) = \begin{cases} 0.3 & t = 75, 90, 105 \\ 0.1 & t = 120 \\ 0 & \text{otherwise} \end{cases} \tag{3}$$
Using Definition 2.14,
$$E[T] = \sum_{t \in S_T} t P_T(t) = 0.3(75) + 0.3(90) + 0.3(105) + 0.1(120) = 93 \tag{4}$$
As expected, the two calculations give the same answer.

Problem 2.7.2 Solution

Whether a lottery ticket is a winner is a Bernoulli trial with a success probability of 0.001. If we buy one every day for 50 years for a total of $50 \cdot 365 = 18250$ tickets, then the number of winning tickets T is a binomial random variable with mean
$$E[T] = 18250(0.001) = 18.25 \tag{1}$$
Since each winning ticket grosses \$1000, the revenue we collect over 50 years is $R = 1000T$ dollars. The expected revenue is
$$E[R] = 1000 E[T] = 18250 \tag{2}$$
But buying a lottery ticket every day for 50 years, at \$2.00 a pop, isn't cheap and will cost us a total of $18250 \cdot 2 = \$36500$. Our net profit is then $Q = R - 36500$, and the result of our loyal 50-year patronage of the lottery system is a disappointing expected loss of
$$E[Q] = E[R] - 36500 = -18250 \tag{3}$$

Problem 2.7.3 Solution

Let X denote the number of points the shooter scores.
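The two routes to E[T] in Problem 2.7.1 (averaging the function of N directly, or first deriving the PMF of T) can be compared with exact arithmetic. This is a minimal Python sketch; the PMF of N is the one implied by the coefficients in equation (2) of that solution, and the names `pmf_n`, `t_of`, and `pmf_t` are illustrative.

```python
from fractions import Fraction

# PMF of N as implied by equation (2) of Problem 2.7.1 (from Quiz 2.6).
pmf_n = {0: Fraction(1, 10), 1: Fraction(3, 10),
         2: Fraction(3, 10), 3: Fraction(3, 10)}

def t_of(n):
    """Derived random variable T = 120 - 15N."""
    return 120 - 15 * n

# Route 1 (Theorem 2.10): average g(n) over the PMF of N.
e_t_via_n = sum(t_of(n) * prob for n, prob in pmf_n.items())

# Route 2 (Definition 2.14): build the PMF of T, then average t over it.
pmf_t = {}
for n, prob in pmf_n.items():
    pmf_t[t_of(n)] = pmf_t.get(t_of(n), 0) + prob
e_t_via_t = sum(t * prob for t, prob in pmf_t.items())

print(e_t_via_n, e_t_via_t)
```

Accumulating probabilities with `pmf_t.get` also handles functions that map several values of N to the same value of T, as happens with |X| in Problem 2.6.2.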
If the shot is uncontested, the expected number of points scored is
$$E[X] = (0.6)2 = 1.2 \tag{1}$$
If we foul the shooter, then X is a binomial random variable with mean $E[X] = 2p$. If $2p > 1.2$, then we should not foul the shooter. Generally, p will exceed 0.6 since a free throw is usually even easier than an uncontested shot taken during the action of the game. Furthermore, fouling the shooter ultimately leads to the detriment of players possibly fouling out. This suggests that fouling a player is not a good idea. The only real exception occurs when facing a player like Shaquille O'Neal, whose free throw probability p is lower than his field goal percentage during a game.

Problem 2.7.4 Solution

Given the distributions of D, the waiting time in days, and the resulting cost, C, we can answer the following questions.

(a) The expected waiting time is simply the expected value of D.
$$E[D] = \sum_{d=1}^{4} d \cdot P_D(d) = 1(0.2) + 2(0.4) + 3(0.3) + 4(0.1) = 2.3 \tag{1}$$

(b) The expected deviation from the waiting time is
$$E[D - \mu_D] = E[D] - E[\mu_D] = \mu_D - \mu_D = 0 \tag{2}$$

(c) C can be expressed as a function of D in the following manner.
$$C(D) = \begin{cases} 90 & D = 1 \\ 70 & D = 2 \\ 40 & D = 3 \\ 40 & D = 4 \end{cases} \tag{3}$$

(d) The expected service charge is
$$E[C] = 90(0.2) + 70(0.4) + 40(0.3) + 40(0.1) = 62 \text{ dollars} \tag{4}$$

Problem 2.7.5 Solution

As a function of the number of minutes used, M, the monthly cost is
$$C(M) = \begin{cases} 20 & M \le 30 \\ 20 + (M-30)/2 & M \ge 30 \end{cases} \tag{1}$$
The expected cost per month is
$$E[C] = \sum_{m=1}^{\infty} C(m) P_M(m) = \sum_{m=1}^{30} 20 P_M(m) + \sum_{m=31}^{\infty} (20 + (m-30)/2) P_M(m) \tag{2}$$
$$= 20 \sum_{m=1}^{\infty} P_M(m) + \frac{1}{2} \sum_{m=31}^{\infty} (m-30) P_M(m) \tag{3}$$
Since $\sum_{m=1}^{\infty} P_M(m) = 1$ and since $P_M(m) = (1-p)^{m-1} p$ for $m \ge 1$, we have
$$E[C] = 20 + \frac{(1-p)^{30}}{2} \sum_{m=31}^{\infty} (m-30)(1-p)^{m-31} p \tag{4}$$
Making the substitution $j = m - 30$ yields
$$E[C] = 20 + \frac{(1-p)^{30}}{2} \sum_{j=1}^{\infty} j (1-p)^{j-1} p = 20 + \frac{(1-p)^{30}}{2p} \tag{5}$$

Problem 2.7.6 Solution

Since our phone use is a geometric random variable M with mean value 1/p,
$$P_M(m) = \begin{cases} (1-p)^{m-1} p & m = 1, 2, \ldots \\ 0 & \text{otherwise} \end{cases} \tag{1}$$
For this cellular billing plan, we are given no free minutes, but are charged half the flat fee. That is, we are going to pay 15 dollars regardless and \$1 for each minute we use the phone. Hence $C = 15 + M$ and for $c \ge 16$, $P[C = c] = P[M = c - 15]$. Thus we can construct the PMF of the cost C
$$P_C(c) = \begin{cases} (1-p)^{c-16} p & c = 16, 17, \ldots \\ 0 & \text{otherwise} \end{cases} \tag{2}$$
Since $C = 15 + M$, the expected cost per month of the plan is
$$E[C] = E[15 + M] = 15 + E[M] = 15 + 1/p \tag{3}$$
In Problem 2.7.5, we found that the expected cost of the plan was
$$E[C] = 20 + [(1-p)^{30}]/(2p) \tag{4}$$
In comparing the expected costs of the two plans, we see that the new plan is better (i.e. cheaper) if
$$15 + 1/p \le 20 + [(1-p)^{30}]/(2p) \tag{5}$$
A simple plot will show that the new plan is better if $p \le p_0 \approx 0.2$.

Problem 2.7.7 Solution

We define random variable W such that $W = 1$ if the circuit works or $W = 0$ if the circuit is defective. (In the probability literature, W is called an indicator random variable.) Let $R_s$ denote the profit on a circuit with standard devices. Let $R_u$ denote the profit on a circuit with ultra-reliable devices. We will compare $E[R_s]$ and $E[R_u]$ to decide which circuit implementation offers the highest expected profit.

The circuit with standard devices works with probability $(1-q)^{10}$ and generates revenue of k dollars if all of its 10 constituent devices work. We observe that we can express $R_s$ as a function $r_s(W)$ and that we can find the PMF $P_W(w)$:
$$R_s = r_s(W) = \begin{cases} -10 & W = 0, \\ k - 10 & W = 1, \end{cases} \qquad P_W(w) = \begin{cases} 1 - (1-q)^{10} & w = 0, \\ (1-q)^{10} & w = 1, \\ 0 & \text{otherwise.} \end{cases} \tag{1}$$
Thus we can express the expected profit as
$$E[r_s(W)] = \sum_{w=0}^{1} P_W(w) r_s(w) \tag{2}$$
$$= P_W(0)(-10) + P_W(1)(k - 10) \tag{3}$$
$$= (1 - (1-q)^{10})(-10) + (1-q)^{10}(k - 10) = (0.9)^{10} k - 10. \tag{4}$$
For the ultra-reliable case,
$$R_u = r_u(W) = \begin{cases} -30 & W = 0, \\ k - 30 & W = 1, \end{cases} \qquad P_W(w) = \begin{cases} 1 - (1-q/2)^{10} & w = 0, \\ (1-q/2)^{10} & w = 1, \\ 0 & \text{otherwise.} \end{cases} \tag{5}$$
Thus we can express the expected profit as
$$E[r_u(W)] = \sum_{w=0}^{1} P_W(w) r_u(w) \tag{6}$$
$$= P_W(0)(-30) + P_W(1)(k - 30) \tag{7}$$
$$= (1 - (1-q/2)^{10})(-30) + (1-q/2)^{10}(k - 30) = (0.95)^{10} k - 30 \tag{8}$$
To determine which implementation generates the most profit, we solve $E[R_u] \ge E[R_s]$, yielding $k \ge 20/[(0.95)^{10} - (0.9)^{10}] \approx 79.98$. So for $k < \$79.98$, using all standard devices results in greater expected profit, while for $k > \$79.98$ more expected profit will be generated by implementing all ultra-reliable devices. That is, when the price commanded for a working circuit is sufficiently high, we should build more-expensive higher-reliability circuits.

[Annotation: Ordinary: q = 0.1; ultra-reliable: q/2 = 0.05. 10 tests are done. Ordinary: \$1; ultra-reliable: \$3. Cost of ordinary if success and fail. Geometric probability of fail and success. Expected = probability × cost.]

If you have read ahead to Section 2.9 and learned about conditional expected values, you might prefer the following solution. If not, you might want to come back and review this alternate approach after reading Section 2.9.

Let W denote the event that a circuit works. The circuit works and generates revenue of k dollars if all of its 10 constituent devices work. For each implementation, standard or ultra-reliable, let R denote the profit on a circuit. We can express the expected profit as
$$E[R] = P[W] E[R|W] + P[W^c] E[R|W^c] \tag{9}$$
Let's first consider the case when only standard devices are used. In this case, a circuit works with probability $P[W] = (1-q)^{10}$. The profit made on a working circuit is $k - 10$ dollars while a nonworking circuit has a profit of $-10$ dollars. That is, $E[R|W] = k - 10$ and $E[R|W^c] = -10$. Of course, a negative profit is actually a loss. Using $R_s$ to denote the profit using standard circuits, the expected profit is
$$E[R_s] = (1-q)^{10}(k - 10) + (1 - (1-q)^{10})(-10) = (0.9)^{10} k - 10 \tag{10}$$
And for the ultra-reliable case, the circuit works with probability $P[W] = (1-q/2)^{10}$.
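For Problem 2.7.7, the threshold expression $20/[(0.95)^{10} - (0.9)^{10}]$ from the first solution can be evaluated directly. The Python sketch below assumes q = 0.1 (as implied by the factors 0.9 and 0.95) and per-device costs of 1 and 3 dollars (as implied by the total costs of 10 and 30 dollars); the function name `expected_profit` is illustrative.

```python
def expected_profit(k, q, cost_per_device, n_devices=10):
    """E[R] = P[W](k - cost) + P[W^c](-cost) for a circuit of n_devices."""
    p_works = (1 - q) ** n_devices
    total_cost = cost_per_device * n_devices
    return p_works * (k - total_cost) + (1 - p_works) * (-total_cost)

q = 0.1  # assumed failure probability of a standard device

# Price k at which standard and ultra-reliable circuits earn the same
# expected profit: (0.95**10) k - 30 = (0.9**10) k - 10.
k_break_even = 20 / ((1 - q / 2) ** 10 - (1 - q) ** 10)

profit_standard = expected_profit(k_break_even, q, cost_per_device=1)
profit_ultra = expected_profit(k_break_even, q / 2, cost_per_device=3)

print(round(k_break_even, 2), round(profit_standard, 4), round(profit_ultra, 4))
```

At the break-even price the two expected profits coincide; above it the ultra-reliable design has the higher expected profit.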
The profit per working circuit is $E[R|W] = k - 30$ dollars while the profit for a nonworking circuit is $E[R|W^c] = -30$ dollars. The expected profit is
$$E[R_u] = (1-q/2)^{10}(k - 30) + (1 - (1-q/2)^{10})(-30) = (0.95)^{10} k - 30 \tag{11}$$
Not surprisingly, we get the same answers for $E[R_u]$ and $E[R_s]$ as in the first solution by performing essentially the same calculations. It should be apparent that indicator random variable W in the first solution indicates the occurrence of the conditioning event W in the second solution. That is, indicators are a way to track conditioning events.

Problem 2.7.8 Solution

(a) There are $\binom{46}{6}$ equally likely winning combinations so that
$$q = \frac{1}{\binom{46}{6}} = \frac{1}{9{,}366{,}819} \approx 1.07 \times 10^{-7} \tag{1}$$

(b) Assuming each ticket is chosen randomly, each of the $2n-1$ other tickets is independently a winner with probability q. The number of other winning tickets $K_n$ has the binomial PMF
$$P_{K_n}(k) = \begin{cases} \binom{2n-1}{k} q^k (1-q)^{2n-1-k} & k = 0, 1, \ldots, 2n-1 \\ 0 & \text{otherwise} \end{cases} \tag{2}$$

(c) Since there are $K_n + 1$ winning tickets in all, the value of your winning ticket is $W_n = n/(K_n + 1)$, which has mean
$$E[W_n] = n E\left[\frac{1}{K_n + 1}\right] \tag{3}$$
Calculating the expected value
$$E\left[\frac{1}{K_n + 1}\right] = \sum_{k=0}^{2n-1} \frac{1}{k+1} P_{K_n}(k) \tag{4}$$
is fairly complicated. The trick is to express the sum in terms of the sum of a binomial PMF.
$$E\left[\frac{1}{K_n + 1}\right] = \sum_{k=0}^{2n-1} \frac{1}{k+1} \frac{(2n-1)!}{k!\,(2n-1-k)!} q^k (1-q)^{2n-1-k} \tag{5}$$
$$= \frac{1}{2n} \sum_{k=0}^{2n-1} \frac{(2n)!}{(k+1)!\,(2n-(k+1))!} q^k (1-q)^{2n-(k+1)} \tag{6}$$
By factoring out 1/q, we obtain
$$E\left[\frac{1}{K_n + 1}\right] = \frac{1}{2nq} \sum_{k=0}^{2n-1} \binom{2n}{k+1} q^{k+1} (1-q)^{2n-(k+1)} \tag{7}$$
$$= \frac{1}{2nq} \underbrace{\sum_{j=1}^{2n} \binom{2n}{j} q^j (1-q)^{2n-j}}_{A} \tag{8}$$
We observe that the above sum labeled A is the sum of a binomial PMF for 2n trials and success probability q over all possible values except $j = 0$.
Thus
$$A = 1 - \binom{2n}{0} q^0 (1-q)^{2n-0} = 1 - (1-q)^{2n} \tag{9}$$
This implies
$$E\left[\frac{1}{K_n + 1}\right] = \frac{1 - (1-q)^{2n}}{2nq} \tag{10}$$
Our expected return on a winning ticket is
$$E[W_n] = n E\left[\frac{1}{K_n + 1}\right] = \frac{1 - (1-q)^{2n}}{2q} \tag{11}$$
Note that when $nq \ll 1$, we can use the approximation that $(1-q)^{2n} \approx 1 - 2nq$ to show that
$$E[W_n] \approx \frac{1 - (1 - 2nq)}{2q} = n \qquad (nq \ll 1) \tag{12}$$
However, in the limit as the value of the prize n approaches infinity, we have
$$\lim_{n \to \infty} E[W_n] = \frac{1}{2q} \approx 4.683 \times 10^6 \tag{13}$$
That is, as the pot grows to infinity, the expected return on a winning ticket doesn't approach infinity because there is a corresponding increase in the number of other winning tickets. If it's not clear how large n must be for this effect to be seen, consider the following table:
$$\begin{array}{c|ccc} n & 10^6 & 10^7 & 10^8 \\ \hline E[W_n] & 9.00 \times 10^5 & 4.13 \times 10^6 & 4.68 \times 10^6 \end{array} \tag{14}$$
When the pot is \$1 million, our expected return is \$900,000. However, we see that when the pot reaches \$100 million, our expected return is very close to $1/(2q)$, less than \$5 million!

Problem 2.7.9 Solution

(a) There are $\binom{46}{6}$ equally likely winning combinations so that
$$q = \frac{1}{\binom{46}{6}} = \frac{1}{9{,}366{,}819} \approx 1.07 \times 10^{-7} \tag{1}$$

(b) Assuming each ticket is chosen randomly, each of the $2n-1$ other tickets is independently a winner with probability q. The number of other winning tickets $K_n$ has the binomial PMF
$$P_{K_n}(k) = \begin{cases} \binom{2n-1}{k} q^k (1-q)^{2n-1-k} & k = 0, 1, \ldots, 2n-1 \\ 0 & \text{otherwise} \end{cases} \tag{2}$$
Since the pot has $n + r$ dollars, the expected amount that you win on your ticket is
$$E[V] = 0(1-q) + q E\left[\frac{n+r}{K_n + 1}\right] = q(n+r) E\left[\frac{1}{K_n + 1}\right] \tag{3}$$
Note that $E[1/(K_n + 1)]$ was also evaluated in Problem 2.7.8. For completeness, we repeat those steps here.
$$E\left[\frac{1}{K_n + 1}\right] = \sum_{k=0}^{2n-1} \frac{1}{k+1} \frac{(2n-1)!}{k!\,(2n-1-k)!} q^k (1-q)^{2n-1-k} \tag{4}$$
$$= \frac{1}{2n} \sum_{k=0}^{2n-1} \frac{(2n)!}{(k+1)!\,(2n-(k+1))!} q^k (1-q)^{2n-(k+1)} \tag{5}$$
By factoring out 1/q, we obtain
$$E\left[\frac{1}{K_n + 1}\right] = \frac{1}{2nq} \sum_{k=0}^{2n-1} \binom{2n}{k+1} q^{k+1} (1-q)^{2n-(k+1)} \tag{6}$$
$$= \frac{1}{2nq} \underbrace{\sum_{j=1}^{2n} \binom{2n}{j} q^j (1-q)^{2n-j}}_{A} \tag{7}$$
We observe that the above sum labeled A is the sum of a binomial PMF for 2n trials and success probability q over all possible values except $j = 0$. Thus $A = 1 - \binom{2n}{0} q^0 (1-q)^{2n-0}$, which implies
$$E\left[\frac{1}{K_n + 1}\right] = \frac{A}{2nq} = \frac{1 - (1-q)^{2n}}{2nq} \tag{8}$$
The expected value of your ticket is
$$E[V] = \frac{q(n+r)[1 - (1-q)^{2n}]}{2nq} = \frac{1}{2}\left(1 + \frac{r}{n}\right)[1 - (1-q)^{2n}] \tag{9}$$
Each ticket tends to be more valuable when the carryover pot r is large and the number of new tickets sold, 2n, is small. For any fixed number n, corresponding to 2n tickets sold, a sufficiently large pot r will guarantee that $E[V] > 1$. For example, if $n = 10^7$ (20 million tickets sold), then
$$E[V] = 0.44\left(1 + \frac{r}{10^7}\right) \tag{10}$$
If the carryover pot r is 30 million dollars, then $E[V] = 1.76$. This suggests that buying a one dollar ticket is a good idea. This is an unusual situation because normally a carryover pot of 30 million dollars will result in far more than 20 million tickets being sold.

(c) So that we can use the results of the previous part, suppose there were $2n - 1$ tickets sold before you must make your decision. If you buy one of each possible ticket, you are guaranteed to have one winning ticket. From the other $2n - 1$ tickets, there will be $K_n$ winners. The total number of winning tickets will be $K_n + 1$. In the previous part we found that
$$E\left[\frac{1}{K_n + 1}\right] = \frac{1 - (1-q)^{2n}}{2nq} \tag{11}$$
Let R denote the return from buying one of each possible ticket. The pot had r dollars beforehand. The $2n - 1$ other tickets sold add $n - 1/2$ dollars to the pot. Furthermore, you must buy 1/q tickets, adding $1/(2q)$ dollars to the pot. Since the cost of the tickets is 1/q dollars, your expected profit is
$$E[R] = E\left[\frac{r + n - 1/2 + 1/(2q)}{K_n + 1}\right] - \frac{1}{q} \tag{12}$$
$$= \frac{q(2r + 2n - 1) + 1}{2q} E\left[\frac{1}{K_n + 1}\right] - \frac{1}{q} \tag{13}$$
$$= \frac{[q(2r + 2n - 1) + 1](1 - (1-q)^{2n})}{4nq^2} - \frac{1}{q} \tag{14}$$
For fixed n, sufficiently large r will make $E[R] > 0$. On the other hand, for fixed r, $\lim_{n \to \infty} E[R] = -1/(2q)$.
That is, as $n$ approaches infinity, your expected loss will be quite large.

Problem 2.8.1 Solution

Given the following PMF
\[ P_N(n) = \begin{cases} 0.2 & n = 0 \\ 0.7 & n = 1 \\ 0.1 & n = 2 \\ 0 & \text{otherwise} \end{cases} \tag{1} \]
(a) $E[N] = (0.2)0 + (0.7)1 + (0.1)2 = 0.9$
(b) $E[N^2] = (0.2)0^2 + (0.7)1^2 + (0.1)2^2 = 1.1$
(c) $\mathrm{Var}[N] = E[N^2] - E[N]^2 = 1.1 - (0.9)^2 = 0.29$
(d) $\sigma_N = \sqrt{\mathrm{Var}[N]} = \sqrt{0.29}$

Problem 2.8.2 Solution

From the solution to Problem 2.4.1, the PMF of $Y$ is
\[ P_Y(y) = \begin{cases} 1/4 & y = 1 \\ 1/4 & y = 2 \\ 1/2 & y = 3 \\ 0 & \text{otherwise} \end{cases} \tag{1} \]
The expected value of $Y$ is
\[ E[Y] = \sum_y y P_Y(y) = 1(1/4) + 2(1/4) + 3(1/2) = 9/4 \tag{2} \]
The expected value of $Y^2$ is
\[ E[Y^2] = \sum_y y^2 P_Y(y) = 1^2(1/4) + 2^2(1/4) + 3^2(1/2) = 23/4 \tag{3} \]
The variance of $Y$ is
\[ \mathrm{Var}[Y] = E[Y^2] - (E[Y])^2 = 23/4 - (9/4)^2 = 11/16 \tag{4} \]

Problem 2.8.3 Solution

From the solution to Problem 2.4.2, the PMF of $X$ is
\[ P_X(x) = \begin{cases} 0.2 & x = -1 \\ 0.5 & x = 0 \\ 0.3 & x = 1 \\ 0 & \text{otherwise} \end{cases} \tag{1} \]
The expected value of $X$ is
\[ E[X] = \sum_x x P_X(x) = (-1)(0.2) + 0(0.5) + 1(0.3) = 0.1 \tag{2} \]
The expected value of $X^2$ is
\[ E[X^2] = \sum_x x^2 P_X(x) = (-1)^2(0.2) + 0^2(0.5) + 1^2(0.3) = 0.5 \tag{3} \]
The variance of $X$ is
\[ \mathrm{Var}[X] = E[X^2] - (E[X])^2 = 0.5 - (0.1)^2 = 0.49 \tag{4} \]

Problem 2.8.4 Solution

From the solution to Problem 2.4.3, the PMF of $X$ is
\[ P_X(x) = \begin{cases} 0.4 & x = -3 \\ 0.4 & x = 5 \\ 0.2 & x = 7 \\ 0 & \text{otherwise} \end{cases} \tag{1} \]
The expected value of $X$ is
\[ E[X] = \sum_x x P_X(x) = -3(0.4) + 5(0.4) + 7(0.2) = 2.2 \tag{2} \]
The expected value of $X^2$ is
\[ E[X^2] = \sum_x x^2 P_X(x) = (-3)^2(0.4) + 5^2(0.4) + 7^2(0.2) = 23.4 \tag{3} \]
The variance of $X$ is
\[ \mathrm{Var}[X] = E[X^2] - (E[X])^2 = 23.4 - (2.2)^2 = 18.56 \tag{4} \]

Problem 2.8.5 Solution

(a) The expected value of $X$ is
\begin{align}
E[X] = \sum_{x=0}^{4} x P_X(x) &= 0\binom{4}{0}\frac{1}{2^4} + 1\binom{4}{1}\frac{1}{2^4} + 2\binom{4}{2}\frac{1}{2^4} + 3\binom{4}{3}\frac{1}{2^4} + 4\binom{4}{4}\frac{1}{2^4} \tag{1} \\
&= [4 + 12 + 12 + 4]/2^4 = 2 \tag{2}
\end{align}
The expected value of $X^2$ is
\begin{align}
E[X^2] = \sum_{x=0}^{4} x^2 P_X(x) &= 0^2\binom{4}{0}\frac{1}{2^4} + 1^2\binom{4}{1}\frac{1}{2^4} + 2^2\binom{4}{2}\frac{1}{2^4} + 3^2\binom{4}{3}\frac{1}{2^4} + 4^2\binom{4}{4}\frac{1}{2^4} \tag{3} \\
&= [4 + 24 + 36 + 16]/2^4 = 5 \tag{4}
\end{align}
The variance of $X$ is
\[ \mathrm{Var}[X] = E[X^2] - (E[X])^2 = 5 - 2^2 = 1 \tag{5} \]
Thus, $X$ has standard deviation $\sigma_X = \sqrt{\mathrm{Var}[X]} = 1$.

(b) The probability that $X$ is within one standard deviation of its expected value is
\[ P[\mu_X - \sigma_X \le X \le \mu_X + \sigma_X] = P[2 - 1 \le X \le 2 + 1] = P[1 \le X \le 3] \tag{6} \]
This calculation is easy using the PMF of $X$.
\[ P[1 \le X \le 3] = P_X(1) + P_X(2) + P_X(3) = 7/8 \tag{7} \]

Problem 2.8.6 Solution

(a) The expected value of $X$ is
\begin{align}
E[X] &= \sum_{x=0}^{5} x P_X(x) \tag{1} \\
&= 0\binom{5}{0}\frac{1}{2^5} + 1\binom{5}{1}\frac{1}{2^5} + 2\binom{5}{2}\frac{1}{2^5} + 3\binom{5}{3}\frac{1}{2^5} + 4\binom{5}{4}\frac{1}{2^5} + 5\binom{5}{5}\frac{1}{2^5} \tag{2} \\
&= [5 + 20 + 30 + 20 + 5]/2^5 = 5/2 \tag{3}
\end{align}
The expected value of $X^2$ is
\begin{align}
E[X^2] &= \sum_{x=0}^{5} x^2 P_X(x) \tag{4} \\
&= 0^2\binom{5}{0}\frac{1}{2^5} + 1^2\binom{5}{1}\frac{1}{2^5} + 2^2\binom{5}{2}\frac{1}{2^5} + 3^2\binom{5}{3}\frac{1}{2^5} + 4^2\binom{5}{4}\frac{1}{2^5} + 5^2\binom{5}{5}\frac{1}{2^5} \tag{5} \\
&= [5 + 40 + 90 + 80 + 25]/2^5 = 240/32 = 15/2 \tag{6}
\end{align}
The variance of $X$ is
\[ \mathrm{Var}[X] = E[X^2] - (E[X])^2 = 15/2 - 25/4 = 5/4 \tag{7} \]
By taking the square root of the variance, the standard deviation of $X$ is $\sigma_X = \sqrt{5/4} \approx 1.12$.

(b) The probability that $X$ is within one standard deviation of its mean is
\begin{align}
P[\mu_X - \sigma_X \le X \le \mu_X + \sigma_X] &= P[2.5 - 1.12 \le X \le 2.5 + 1.12] \tag{8} \\
&= P[1.38 \le X \le 3.62] \tag{9} \\
&= P[2 \le X \le 3] \tag{10}
\end{align}
By summing the PMF over the desired range, we obtain
\[ P[2 \le X \le 3] = P_X(2) + P_X(3) = 10/32 + 10/32 = 5/8 \tag{11} \]

Problem 2.8.7 Solution

For $Y = aX + b$, we wish to show that $\mathrm{Var}[Y] = a^2 \mathrm{Var}[X]$. We begin by noting that Theorem 2.12 says that $E[aX + b] = aE[X] + b$. Hence, by the definition of variance,
\begin{align}
\mathrm{Var}[Y] &= E\left[(aX + b - (aE[X] + b))^2\right] \tag{1} \\
&= E\left[a^2 (X - E[X])^2\right] \tag{2} \\
&= a^2 E\left[(X - E[X])^2\right] \tag{3}
\end{align}
Since $E[(X - E[X])^2] = \mathrm{Var}[X]$, the assertion is proved.
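The moment calculations in Problems 2.8.5 to 2.8.7 can be checked numerically. The following is a minimal Python sketch, not part of the original Matlab material; the helper names `pmf_mean`, `pmf_var`, and `affine_pmf` are hypothetical and chosen only for illustration. It stores the binomial (5, 1/2) PMF of Problem 2.8.6 as exact fractions and confirms that scaling and shifting $X$ multiplies the variance by $a^2$ (here with the arbitrary choice $a = 3$, $b = 2$).

```python
from fractions import Fraction
from math import comb

def pmf_mean(pmf):
    """Expected value of a finite PMF given as {value: probability}."""
    return sum(x * p for x, p in pmf.items())

def pmf_var(pmf):
    """Variance computed as E[X^2] - (E[X])^2."""
    second_moment = sum(x * x * p for x, p in pmf.items())
    return second_moment - pmf_mean(pmf) ** 2

def affine_pmf(pmf, a, b):
    """PMF of Y = aX + b (assumes a != 0, so distinct x map to distinct y)."""
    return {a * x + b: p for x, p in pmf.items()}

# Binomial (5, 1/2) PMF from Problem 2.8.6, stored with exact fractions.
binom5 = {x: Fraction(comb(5, x), 2 ** 5) for x in range(6)}

var_x = pmf_var(binom5)                          # Var[X]
var_y = pmf_var(affine_pmf(binom5, a=3, b=2))    # Var[3X + 2]
print(pmf_mean(binom5), var_x, var_y)            # prints: 5/2 5/4 45/4
```

Exact fractions are used instead of floating point so that the comparison with $a^2\,\mathrm{Var}[X]$ is an equality rather than an approximate match.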
Problem 2.8.8 Solution

Given the following description of the random variable $Y$,
\[ Y = \frac{1}{\sigma_X}(X - \mu_X) \tag{1} \]
we can use the linearity property of the expectation operator to find the mean value
\[ E[Y] = \frac{E[X - \mu_X]}{\sigma_X} = \frac{E[X] - E[X]}{\sigma_X} = 0 \tag{2} \]
Using the fact that $\mathrm{Var}[aX + b] = a^2 \mathrm{Var}[X]$, the variance of $Y$ is found to be
\[ \mathrm{Var}[Y] = \frac{1}{\sigma_X^2} \mathrm{Var}[X] = 1 \tag{3} \]

Problem 2.8.9 Solution

With our measure of jitter being $\sigma_T$, and the fact that $T = 2X - 1$, we can express the jitter as a function of $q$ by realizing that
\[ \mathrm{Var}[T] = 4\,\mathrm{Var}[X] = \frac{4q}{(1-q)^2} \tag{1} \]
Therefore, our maximum permitted jitter is
\[ \sigma_T = \frac{2\sqrt{q}}{1-q} = 2 \text{ msec} \tag{2} \]
Setting $\sqrt{q} = 1 - q$ and squaring both sides yields $q^2 - 3q + 1 = 0$. By solving this quadratic equation, we obtain
\[ q = \frac{3 \pm \sqrt{5}}{2} = 3/2 \pm \sqrt{5}/2 \tag{3} \]
Since $q$ must be a value between 0 and 1, we know that a value of $q = 3/2 - \sqrt{5}/2 \approx 0.382$ will ensure a jitter of at most 2 milliseconds.

Problem 2.8.10 Solution

We wish to minimize the function
\[ e(\hat{x}) = E\left[(X - \hat{x})^2\right] \tag{1} \]
with respect to $\hat{x}$. We can expand the square and take the expectation while treating $\hat{x}$ as a constant. This yields
\[ e(\hat{x}) = E\left[X^2 - 2\hat{x}X + \hat{x}^2\right] = E[X^2] - 2\hat{x}E[X] + \hat{x}^2 \tag{2} \]
Solving for the value of $\hat{x}$ that makes the derivative $de(\hat{x})/d\hat{x}$ equal to zero results in the value of $\hat{x}$ that minimizes $e(\hat{x})$. Note that when we take the derivative with respect to $\hat{x}$, both $E[X^2]$ and $E[X]$ are simply constants.
\[ \frac{d}{d\hat{x}}\left(E[X^2] - 2\hat{x}E[X] + \hat{x}^2\right) = -2E[X] + 2\hat{x} = 0 \tag{3} \]
Hence we see that $\hat{x} = E[X]$; since the second derivative equals $2 > 0$, this stationary point is a minimum. In the sense of mean squared error, the best guess for a random variable is the mean value. In Chapter 9 this idea is extended to develop minimum mean squared error estimation.

Problem 2.8.11 Solution

The PMF of $K$ is the Poisson PMF
\[ P_K(k) = \begin{cases} \lambda^k e^{-\lambda}/k! & k = 0, 1, \ldots \\ 0 & \text{otherwise} \end{cases} \tag{1} \]
The mean of $K$ is
\[ E[K] = \sum_{k=0}^{\infty} k \frac{\lambda^k e^{-\lambda}}{k!} = \lambda \sum_{k=1}^{\infty} \frac{\lambda^{k-1} e^{-\lambda}}{(k-1)!} = \lambda \tag{2} \]
To find $E[K^2]$, we use the hint and first find
\[ E[K(K-1)] = \sum_{k=0}^{\infty} k(k-1) \frac{\lambda^k e^{-\lambda}}{k!} = \sum_{k=2}^{\infty} \frac{\lambda^k e^{-\lambda}}{(k-2)!} \tag{3} \]
By factoring out $\lambda^2$ and substituting $j = k - 2$, we obtain
\[ E[K(K-1)] = \lambda^2 \underbrace{\sum_{j=0}^{\infty} \frac{\lambda^j e^{-\lambda}}{j!}}_{1} = \lambda^2 \tag{4} \]
The above sum equals 1 because it is the sum of a Poisson PMF over all possible values. Since $E[K] = \lambda$, the variance of $K$ is
\begin{align}
\mathrm{Var}[K] &= E[K^2] - (E[K])^2 \tag{5} \\
&= E[K(K-1)] + E[K] - (E[K])^2 \tag{6} \\
&= \lambda^2 + \lambda - \lambda^2 = \lambda \tag{7}
\end{align}

Problem 2.8.12 Solution

The standard deviation can be expressed as
\[ \sigma_D = \sqrt{\mathrm{Var}[D]} = \sqrt{E[D^2] - E[D]^2} \tag{1} \]
where
\[ E[D^2] = \sum_{d=1}^{4} d^2 P_D(d) = 0.2 + 1.6 + 2.7 + 1.6 = 6.1 \tag{2} \]
So finally we have
\[ \sigma_D = \sqrt{6.1 - 2.3^2} = \sqrt{0.81} = 0.9 \tag{3} \]

Problem 2.9.1 Solution

From the solution to Problem 2.4.1, the PMF of $Y$ is
\[ P_Y(y) = \begin{cases} 1/4 & y = 1 \\ 1/4 & y = 2 \\ 1/2 & y = 3 \\ 0 & \text{otherwise} \end{cases} \tag{1} \]
The probability of the event $B = \{Y < 3\}$ is $P[B] = 1 - P[Y = 3] = 1/2$. From Theorem 2.17, the conditional PMF of $Y$ given $B$ is
\[ P_{Y|B}(y) = \begin{cases} \dfrac{P_Y(y)}{P[B]} & y \in B \\ 0 & \text{otherwise} \end{cases} = \begin{cases} 1/2 & y = 1 \\ 1/2 & y = 2 \\ 0 & \text{otherwise} \end{cases} \tag{2} \]
The conditional first and second moments of $Y$ are
\begin{align}
E[Y|B] &= \sum_y y P_{Y|B}(y) = 1(1/2) + 2(1/2) = 3/2 \tag{3} \\
E[Y^2|B] &= \sum_y y^2 P_{Y|B}(y) = 1^2(1/2) + 2^2(1/2) = 5/2 \tag{4}
\end{align}
The conditional variance of $Y$ is
\[ \mathrm{Var}[Y|B] = E[Y^2|B] - (E[Y|B])^2 = 5/2 - 9/4 = 1/4 \tag{5} \]
[Annotation: What's the probability of y = 1 or 2?]

Problem 2.9.2 Solution

From the solution to Problem 2.4.2, the PMF of $X$ is
\[ P_X(x) = \begin{cases} 0.2 & x = -1 \\ 0.5 & x = 0 \\ 0.3 & x = 1 \\ 0 & \text{otherwise} \end{cases} \tag{1} \]
The event $B = \{|X| > 0\}$ has probability $P[B] = P[X \ne 0] = 0.5$. From Theorem 2.17, the conditional PMF of $X$ given $B$ is
\[ P_{X|B}(x) = \begin{cases} \dfrac{P_X(x)}{P[B]} & x \in B \\ 0 & \text{otherwise} \end{cases} = \begin{cases} 0.4 & x = -1 \\ 0.6 & x = 1 \\ 0 & \text{otherwise} \end{cases} \tag{2} \]
The conditional first and second moments of $X$ are
\begin{align}
E[X|B] &= \sum_x x P_{X|B}(x) = (-1)(0.4) + 1(0.6) = 0.2 \tag{3} \\
E[X^2|B] &= \sum_x x^2 P_{X|B}(x) = (-1)^2(0.4) + 1^2(0.6) = 1 \tag{4}
\end{align}
The conditional variance of $X$ is
\[ \mathrm{Var}[X|B] = E[X^2|B] - (E[X|B])^2 = 1 - (0.2)^2 = 0.96 \tag{5} \]

Problem 2.9.3 Solution

From the solution to Problem 2.4.3, the PMF of $X$ is
\[ P_X(x) = \begin{cases} 0.4 & x = -3 \\ 0.4 & x = 5 \\ 0.2 & x = 7 \\ 0 & \text{otherwise} \end{cases} \tag{1} \]
The event $B = \{X > 0\}$ has probability $P[B] = P_X(5) + P_X(7) = 0.6$.
From Theorem 2.17, the conditional PMF of $X$ given $B$ is
\[ P_{X|B}(x) = \begin{cases} \dfrac{P_X(x)}{P[B]} & x \in B \\ 0 & \text{otherwise} \end{cases} = \begin{cases} 2/3 & x = 5 \\ 1/3 & x = 7 \\ 0 & \text{otherwise} \end{cases} \tag{2} \]
The conditional first and second moments of $X$ are
\begin{align}
E[X|B] &= \sum_x x P_{X|B}(x) = 5(2/3) + 7(1/3) = 17/3 \tag{3} \\
E[X^2|B] &= \sum_x x^2 P_{X|B}(x) = 5^2(2/3) + 7^2(1/3) = 33 \tag{4}
\end{align}
The conditional variance of $X$ is
\[ \mathrm{Var}[X|B] = E[X^2|B] - (E[X|B])^2 = 33 - (17/3)^2 = 8/9 \tag{5} \]

Problem 2.9.4 Solution

The event $B = \{X \ne 0\}$ has probability $P[B] = 1 - P[X = 0] = 15/16$. The conditional PMF of $X$ given $B$ is
\[ P_{X|B}(x) = \begin{cases} \dfrac{P_X(x)}{P[B]} & x \in B \\ 0 & \text{otherwise} \end{cases} = \begin{cases} \binom{4}{x}\dfrac{1}{15} & x = 1, 2, 3, 4 \\ 0 & \text{otherwise} \end{cases} \tag{1} \]
The conditional first and second moments of $X$ are
\begin{align}
E[X|B] = \sum_{x=1}^{4} x P_{X|B}(x) &= 1\binom{4}{1}\frac{1}{15} + 2\binom{4}{2}\frac{1}{15} + 3\binom{4}{3}\frac{1}{15} + 4\binom{4}{4}\frac{1}{15} \tag{2} \\
&= [4 + 12 + 12 + 4]/15 = 32/15 \tag{3} \\
E[X^2|B] = \sum_{x=1}^{4} x^2 P_{X|B}(x) &= 1^2\binom{4}{1}\frac{1}{15} + 2^2\binom{4}{2}\frac{1}{15} + 3^2\binom{4}{3}\frac{1}{15} + 4^2\binom{4}{4}\frac{1}{15} \tag{4} \\
&= [4 + 24 + 36 + 16]/15 = 80/15 \tag{5}
\end{align}
The conditional variance of $X$ is
\[ \mathrm{Var}[X|B] = E[X^2|B] - (E[X|B])^2 = 80/15 - (32/15)^2 = 176/225 \approx 0.782 \tag{6} \]

Problem 2.9.5 Solution

The probability of the event $B$ is
\begin{align}
P[B] = P[X \ge \mu_X] = P[X \ge 3] &= P_X(3) + P_X(4) + P_X(5) \tag{1} \\
&= \frac{\binom{5}{3} + \binom{5}{4} + \binom{5}{5}}{32} = \frac{10 + 5 + 1}{32} = 1/2 \tag{2}
\end{align}
The conditional PMF of $X$ given $B$ is
\[ P_{X|B}(x) = \begin{cases} \dfrac{P_X(x)}{P[B]} & x \in B \\ 0 & \text{otherwise} \end{cases} = \begin{cases} \binom{5}{x}\dfrac{1}{16} & x = 3, 4, 5 \\ 0 & \text{otherwise} \end{cases} \tag{3} \]
The conditional first and second moments of $X$ are
\begin{align}
E[X|B] = \sum_{x=3}^{5} x P_{X|B}(x) &= 3\binom{5}{3}\frac{1}{16} + 4\binom{5}{4}\frac{1}{16} + 5\binom{5}{5}\frac{1}{16} \tag{4} \\
&= [30 + 20 + 5]/16 = 55/16 \tag{5} \\
E[X^2|B] = \sum_{x=3}^{5} x^2 P_{X|B}(x) &= 3^2\binom{5}{3}\frac{1}{16} + 4^2\binom{5}{4}\frac{1}{16} + 5^2\binom{5}{5}\frac{1}{16} \tag{6} \\
&= [90 + 80 + 25]/16 = 195/16 \tag{7}
\end{align}
The conditional variance of $X$ is
\[ \mathrm{Var}[X|B] = E[X^2|B] - (E[X|B])^2 = 195/16 - (55/16)^2 = 95/256 \approx 0.371 \tag{8} \]

Problem 2.9.6 Solution

(a) Consider each circuit test as a Bernoulli trial such that a failed circuit is called a success. The number of trials until the first success (i.e. a failed circuit) has the geometric PMF
\[ P_N(n) = \begin{cases} (1-p)^{n-1} p & n = 1, 2, \ldots \\ 0 & \text{otherwise} \end{cases} \tag{1} \]

(b) The probability there are at least 20 tests is
\[ P[B] = P[N \ge 20] = \sum_{n=20}^{\infty} P_N(n) = (1-p)^{19} \tag{2} \]
Note that $(1-p)^{19}$ is just the probability that the first 19 circuits pass the test, which is what we would expect since there must be at least 20 tests if the first 19 circuits pass. The conditional PMF of $N$ given $B$ is
\[ P_{N|B}(n) = \begin{cases} \dfrac{P_N(n)}{P[B]} & n \in B \\ 0 & \text{otherwise} \end{cases} = \begin{cases} (1-p)^{n-20} p & n = 20, 21, \ldots \\ 0 & \text{otherwise} \end{cases} \tag{3} \]

(c) Given the event $B$, the conditional expectation of $N$ is
\[ E[N|B] = \sum_n n P_{N|B}(n) = \sum_{n=20}^{\infty} n (1-p)^{n-20} p \tag{4} \]
Making the substitution $j = n - 19$ yields
\[ E[N|B] = \sum_{j=1}^{\infty} (j + 19)(1-p)^{j-1} p = 1/p + 19 \tag{5} \]
We see that in the above sum, we effectively have the expected value of $J + 19$ where $J$ is a geometric random variable with parameter $p$. This is not surprising since $N \ge 20$ if and only if we observed 19 successful tests. After 19 successful tests, the number of additional tests needed to find the first failure is still a geometric random variable with mean $1/p$.

Problem 2.9.7 Solution

(a) The PMF of $M$, the number of miles run on an arbitrary day, is
\[ P_M(m) = \begin{cases} q(1-q)^m & m = 0, 1, \ldots \\ 0 & \text{otherwise} \end{cases} \tag{1} \]
The probability that $M > 0$ is
\[ P[M > 0] = 1 - P[M = 0] = 1 - q \tag{2} \]

(b) The probability that we run a marathon on any particular day is the probability that $M \ge 26$.
\[ r = P[M \ge 26] = \sum_{m=26}^{\infty} q(1-q)^m = (1-q)^{26} \tag{3} \]

(c) We run a marathon on each day with probability equal to $r$, and we do not run a marathon with probability $1 - r$. Therefore in a year we have 365 tests of our jogging resolve, and thus 365 chances to run a marathon. So the PMF of the number of marathons run in a year, $J$, can be expressed as
\[ P_J(j) = \begin{cases} \binom{365}{j} r^j (1-r)^{365-j} & j = 0, 1, \ldots, 365 \\ 0 & \text{otherwise} \end{cases} \tag{4} \]

(d) The random variable $K$ is defined as the number of miles we run above that required for a marathon, $K = M - 26$. Given the event $A$ that we have run a marathon, we wish to know how many miles in excess of 26 we in fact ran.
So we want to know the conditional PMF $P_{K|A}(k)$.
\[ P_{K|A}(k) = \frac{P[K = k, A]}{P[A]} = \frac{P[M = 26 + k]}{P[A]} \tag{5} \]
Since $P[A] = r$, for $k = 0, 1, \ldots$,
\[ P_{K|A}(k) = \frac{(1-q)^{26+k} q}{(1-q)^{26}} = (1-q)^k q \tag{6} \]
The complete expression for the conditional PMF of $K$ is
\[ P_{K|A}(k) = \begin{cases} (1-q)^k q & k = 0, 1, \ldots \\ 0 & \text{otherwise} \end{cases} \tag{7} \]

Problem 2.9.8 Solution

Recall that the PMF of the number of pages in a fax is
\[ P_X(x) = \begin{cases} 0.15 & x = 1, 2, 3, 4 \\ 0.1 & x = 5, 6, 7, 8 \\ 0 & \text{otherwise} \end{cases} \tag{1} \]

(a) The event that a fax was sent to machine $A$ can be expressed mathematically as the event that the number of pages $X$ is an even number. Similarly, the event that a fax was sent to $B$ is the event that $X$ is an odd number. Since $S_X = \{1, 2, \ldots, 8\}$, we define the set $A = \{2, 4, 6, 8\}$. Using this definition for $A$, the event that a fax is sent to $A$ is equivalent to the event $X \in A$. The event $A$ has probability
\[ P[A] = P_X(2) + P_X(4) + P_X(6) + P_X(8) = 0.5 \tag{2} \]
Given the event $A$, the conditional PMF of $X$ is
\[ P_{X|A}(x) = \begin{cases} \dfrac{P_X(x)}{P[A]} & x \in A \\ 0 & \text{otherwise} \end{cases} = \begin{cases} 0.3 & x = 2, 4 \\ 0.2 & x = 6, 8 \\ 0 & \text{otherwise} \end{cases} \tag{3} \]
The conditional first and second moments of $X$ given $A$ are
\begin{align}
E[X|A] &= \sum_x x P_{X|A}(x) = 2(0.3) + 4(0.3) + 6(0.2) + 8(0.2) = 4.6 \tag{4} \\
E[X^2|A] &= \sum_x x^2 P_{X|A}(x) = 4(0.3) + 16(0.3) + 36(0.2) + 64(0.2) = 26 \tag{5}
\end{align}
The conditional variance and standard deviation are
\begin{align}
\mathrm{Var}[X|A] &= E[X^2|A] - (E[X|A])^2 = 26 - (4.6)^2 = 4.84 \tag{6} \\
\sigma_{X|A} &= \sqrt{\mathrm{Var}[X|A]} = 2.2 \tag{7}
\end{align}

(b) Let the event $B'$ denote the event that the fax was sent to $B$ and that the fax had no more than 6 pages.
Hence, the event $B' = \{1, 3, 5\}$ has probability
\[ P[B'] = P_X(1) + P_X(3) + P_X(5) = 0.4 \tag{8} \]
The conditional PMF of $X$ given $B'$ is
\[ P_{X|B'}(x) = \begin{cases} \dfrac{P_X(x)}{P[B']} & x \in B' \\ 0 & \text{otherwise} \end{cases} = \begin{cases} 3/8 & x = 1, 3 \\ 1/4 & x = 5 \\ 0 & \text{otherwise} \end{cases} \tag{9} \]
Given the event $B'$, the conditional first and second moments are
\begin{align}
E[X|B'] &= \sum_x x P_{X|B'}(x) = 1(3/8) + 3(3/8) + 5(1/4) = 11/4 \tag{10} \\
E[X^2|B'] &= \sum_x x^2 P_{X|B'}(x) = 1(3/8) + 9(3/8) + 25(1/4) = 10 \tag{11}
\end{align}
The conditional variance and standard deviation are
\begin{align}
\mathrm{Var}[X|B'] &= E[X^2|B'] - (E[X|B'])^2 = 10 - (11/4)^2 = 39/16 \tag{12} \\
\sigma_{X|B'} &= \sqrt{\mathrm{Var}[X|B']} = \sqrt{39}/4 \approx 1.56 \tag{13}
\end{align}

Problem 2.10.1 Solution

For a binomial $(n, p)$ random variable $X$, the solution in terms of math is
\[ P[E_2] = \sum_{x=0}^{\lfloor \sqrt{n} \rfloor} P_X(x^2) \tag{1} \]
In terms of Matlab, the efficient solution is to generate the vector of perfect squares x = [0 1 4 9 16 ...] and then to pass that vector to binomialpmf.m. In this case, the values of the binomial PMF are calculated only once. Here is the code:

    function q=perfectbinomial(n,p);
    i=0:floor(sqrt(n));        %i^2 <= n
    x=i.^2;                    %perfect squares in 0..n
    q=sum(binomialpmf(n,p,x));

For a binomial (100, 0.5) random variable $X$, the probability $X$ is a perfect square is

    >> perfectbinomial(100,0.5)
    ans =
        0.0811

Problem 2.10.2 Solution

The random variable $X$ given in Example 2.29 is just a finite random variable. We can generate random samples using the finiterv function. The code is

    function x=faxlength8(m);
    sx=1:8;
    p=[0.15*ones(1,4) 0.1*ones(1,4)];
    x=finiterv(sx,p,m);

Problem 2.10.3 Solution

First we use faxlength8 from Problem 2.10.2 to generate $m$ samples of the fax length $X$. Next we convert that to $m$ samples of the fax cost $Y$. Summing these samples and dividing by $m$, we obtain the average cost of $m$ samples. Here is the code:

    function y=avgfax(m);
    x=faxlength8(m);
    yy=cumsum([10 9 8 7 6]);   %cost of 1 to 5 pages
    yy=[yy 50 50 50];          %cost of 6, 7, 8 pages
    y=sum(yy(x))/m;

Each time we perform the experiment of executing the function avgfax, we generate $m$ random samples of $X$, and $m$ corresponding samples of $Y$. The sample average $\bar{Y} = \frac{1}{m}\sum_{i=1}^{m} Y_i$ is random.
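The same experiment can be sketched outside Matlab. The following Python version is an illustration only and is not from the text: it replaces finiterv with the standard library's `random.choices`, and the names `exact_mean_cost` and `avg_fax` are hypothetical. It also computes $E[Y]$ directly from the PMF (the value works out to 32.5), which gives a reference point for the sample averages.

```python
import random

PAGES = list(range(1, 9))                 # fax lengths 1..8
PROBS = [0.15] * 4 + [0.10] * 4           # PMF of X from Problem 2.9.8
COSTS = [10, 19, 27, 34, 40, 50, 50, 50]  # same cost table as yy in avgfax

def exact_mean_cost():
    """E[Y] computed directly from the PMF of X and the cost table."""
    return sum(p * c for p, c in zip(PROBS, COSTS))

def avg_fax(m, seed=None):
    """Sample mean of the cost of m randomly generated faxes."""
    rng = random.Random(seed)
    pages = rng.choices(PAGES, weights=PROBS, k=m)
    return sum(COSTS[x - 1] for x in pages) / m

print(exact_mean_cost())
print([round(avg_fax(m, seed=0), 2) for m in (10, 100, 1000)])
```

Passing a seed makes a run repeatable; leaving it as None gives a fresh random sample average on each call, as with repeated calls to avgfax.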
For $m = 10$, four samples of $\bar{Y}$ are

    >> [avgfax(10) avgfax(10) avgfax(10) avgfax(10)]
    ans =
       31.9000   31.2000   29.6000   34.1000
    >>

For $m = 100$, the results are arguably more consistent:

    >> [avgfax(100) avgfax(100) avgfax(100) avgfax(100)]
    ans =
       34.5300   33.3000   29.8100   33.6900
    >>

Finally, for $m = 1000$, we obtain results reasonably close to $E[Y]$:

    >> [avgfax(1000) avgfax(1000) avgfax(1000) avgfax(1000)]
    ans =
       32.1740   31.8920   33.1890   32.8250
    >>

In Chapter 7, we will develop techniques to show how $\bar{Y}$ converges to $E[Y]$ as $m \to \infty$.

Problem 2.10.4 Solution

Suppose $X_n$ is a Zipf $(n, \alpha = 1)$ random variable and thus has PMF
\[ P_X(x) = \begin{cases} c(n)/x & x = 1, 2, \ldots, n \\ 0 & \text{otherwise} \end{cases} \tag{1} \]
The problem asks us to find the smallest value of $k$ such that $P[X_n \le k] \ge 0.75$. That is, if the server caches the $k$ most popular files, then with probability $P[X_n \le k]$ the request is for one of the $k$ cached files. First, we might as well solve this problem for any probability $p$ rather than just $p = 0.75$. Thus, in math terms, we are looking for
\[ k = \min\left\{ k' \mid P[X_n \le k'] \ge p \right\}. \tag{2} \]
What makes the Zipf distribution hard to analyze is that there is no closed form expression for
\[ c(n) = \left( \sum_{x=1}^{n} \frac{1}{x} \right)^{-1}. \tag{3} \]
Thus, we use Matlab to grind through the calculations. The following simple program generates the Zipf distribution and returns the correct value of $k$.

    function k=zipfcache(n,p);
    %Usage: k=zipfcache(n,p);
    %for the Zipf (n,alpha=1) distribution, returns the smallest k
    %such that the first k items have total probability >= p
    pmf=1./(1:n);
    pmf=pmf/sum(pmf); %normalize to sum to 1
    cdf=cumsum(pmf);
    k=1+sum(cdf<p);   %count the k' with P[X<=k'] strictly below p

The program zipfcache generalizes 0.75 to be the probability $p$. Although this program is sufficient, the problem asks us to find $k$ for all values of $n$ from 1 to $10^3$. One way to do this is to call zipfcache a thousand times to find $k$ for each value of $n$. A better way is to use the properties of the Zipf PMF.
In particular,
\[ P[X_n \le k'] = c(n) \sum_{x=1}^{k'} \frac{1}{x} = \frac{c(n)}{c(k')} \tag{4} \]
Thus we wish to find
\[ k = \min\left\{ k' \,\Big|\, \frac{c(n)}{c(k')} \ge p \right\} = \min\left\{ k' \,\Big|\, \frac{1}{c(k')} \ge \frac{p}{c(n)} \right\}. \tag{5} \]
Note that the definition of $k$ implies that
\[ \frac{1}{c(k')} < \frac{p}{c(n)}, \qquad k' = 1, \ldots, k - 1. \tag{6} \]
Using the notation $|A|$ to denote the number of elements in the set $A$, we can write
\[ k = 1 + \left| \left\{ k' \,\Big|\, \frac{1}{c(k')} < \frac{p}{c(n)} \right\} \right| \tag{7} \]
This is the basis for a very short Matlab program:

    function k=zipfcacheall(n,p);
    %Usage: k=zipfcacheall(n,p);
    %returns vector k such that the first
    %k(m) items have total probability >= p
    %for the Zipf(m,1) distribution.
    c=1./cumsum(1./(1:n));
    k=1+countless(1./c,p./c);

Note that zipfcacheall uses a short Matlab program countless.m that is almost the same as count.m introduced in Example 2.47. If n=countless(x,y), then n(i) is the number of elements of x that are strictly less than y(i), while count returns the number of elements less than or equal to y(i). In any case, the commands

    k=zipfcacheall(1000,0.75);
    plot(1:1000,k);

are sufficient to produce this figure of $k$ as a function of $n$:

[Figure: $k$ versus $n$ for $n = 1, \ldots, 1000$]

We see in the figure that the number of files that must be cached grows slowly with the total number of files $n$.

Finally, we make one last observation. It is generally desirable for Matlab to execute operations in parallel. The program zipfcacheall generally will run faster than $n$ calls to zipfcache. However, to do its counting all at once, countless generates an $n \times n$ array. When $n$ is not too large, say $n \le 1000$, the resulting array with $n^2 = 1{,}000{,}000$ elements fits in memory. For much larger values of $n$, say $n = 10^6$ (as was proposed in the original printing of this edition of the text), countless will cause an "out of memory" error.

Problem 2.10.5 Solution

We use poissonrv.m to generate random samples of a Poisson $(\alpha = 5)$ random variable. To compare the Poisson PMF against the output of poissonrv, relative frequencies are calculated using the hist function.
The following code plots the relative frequency against the PMF.

    function diff=poissontest(alpha,m)
    x=poissonrv(alpha,m);
    xr=0:ceil(3*alpha);
    pxsample=hist(x,xr)/m;       %relative frequencies
    pxsample=pxsample(:);
    %pxsample=(countequal(x,xr)/m);
    px=poissonpmf(alpha,xr);
    plot(xr,pxsample,xr,px);
    diff=sum((pxsample-px).^2);  %squared error between the two

For $m = 100, 1000, 10000$, here are sample plots comparing the PMF and the relative frequency. The plots show reasonable agreement for $m = 10000$ samples.

[Figure: pairs of sample plots, panels (a) and (b), of the relative frequency and the Poisson PMF for $m = 100$, $m = 1000$, and $m = 10{,}000$]

Problem 2.10.6 Solution

We can compare the binomial and Poisson PMFs for $(n, p) = (100, 0.1)$ using the following Matlab code:

    x=0:20;
    p=poissonpmf(10,x);         %Poisson with alpha = np = 10
    b=binomialpmf(100,0.1,x);
    plot(x,p,x,b);

For $(n, p) = (10, 1)$, the binomial PMF has no randomness. For $(n, p) = (100, 0.1)$, the approximation is reasonable:

[Figure: binomial and Poisson PMFs for (a) $n = 10$, $p = 1$ and (b) $n = 100$, $p = 0.1$]

Finally, for $(n, p) = (1000, 0.01)$ and $(n, p) = (10000, 0.001)$, the approximation is very good:

[Figure: binomial and Poisson PMFs for (a) $n = 1000$, $p = 0.01$ and (b) $n = 10000$, $p = 0.001$]

Problem 2.10.7 Solution

Following the Random Sample algorithm, we generate a sample value $R$ = rand(1) and then we find $k^*$ such that
\[ F_K(k^* - 1) < R \le F_K(k^*). \tag{1} \]
From Problem 2.4.4, we know for integers $k \ge 1$ that a geometric $(p)$ random variable $K$ has CDF $F_K(k) = 1 - (1-p)^k$. Thus,
\[ 1 - (1-p)^{k^*-1} < R \le 1 - (1-p)^{k^*}. \tag{2} \]
Subtracting 1 from each side and then multiplying through by $-1$ (which reverses the inequalities), we obtain
\[ (1-p)^{k^*-1} > 1 - R \ge (1-p)^{k^*}. \tag{3} \]
Next we take the logarithm of each side. Since logarithms are monotonic functions, we have
\[ (k^* - 1)\ln(1-p) > \ln(1-R) \ge k^* \ln(1-p). \tag{4} \]
Since $0 < p < 1$, we have that $\ln(1-p) < 0$.
Thus dividing through by $\ln(1-p)$ reverses the inequalities, yielding
\[ k^* - 1 < \frac{\ln(1-R)}{\ln(1-p)} \le k^*. \tag{5} \]
Since $k^*$ is an integer, it must be the smallest integer greater than or equal to $\ln(1-R)/\ln(1-p)$. That is, following the last step of the random sample algorithm,
\[ K = k^* = \left\lceil \frac{\ln(1-R)}{\ln(1-p)} \right\rceil \tag{6} \]
The Matlab algorithm that implements this operation is quite simple:

    function x=geometricrv(p,m)
    %Usage: x=geometricrv(p,m)
    % returns m samples of a geometric (p) rv
    r=rand(m,1);
    x=ceil(log(1-r)/log(1-p));

Problem 2.10.8 Solution

For the PC version of Matlab employed for this test, poissonpmf(n,n) reported Inf for $n = n^* = 714$. The problem with the poissonpmf function in Example 2.44 is that the cumulative product that calculates $n^k/k!$ can overflow. Following the hint, we can write an alternate poissonpmf function as follows:

    function pmf=poissonpmf(alpha,x)
    %Poisson (alpha) rv X,
    %out=vector pmf: pmf(i)=P[X=x(i)]
    x=x(:);
    if (alpha==0)
        pmf=1.0*(x==0);
    else
        k=(1:ceil(max(x)))';
        logfacts=cumsum(log(k));          %log(k!)
        pb=exp([-alpha; ...
            -alpha+(k*log(alpha))-logfacts]);
        okx=(x>=0).*(x==floor(x));
        x=okx.*x;
        pmf=okx.*pb(x+1);
    end
    %pmf(i)=0 for zero-prob x(i)

By summing logarithms, the intermediate terms are much less likely to overflow.

Problem Solutions – Chapter 3

Problem 3.1.1 Solution

The CDF of $X$ is
\[ F_X(x) = \begin{cases} 0 & x < -1 \\ (x+1)/2 & -1 \le x < 1 \\ 1 & x \ge 1 \end{cases} \tag{1} \]
Each question can be answered by expressing the requested probability in terms of $F_X(x)$.

(a)
\[ P[X > 1/2] = 1 - P[X \le 1/2] = 1 - F_X(1/2) = 1 - 3/4 = 1/4 \tag{2} \]

(b) This is a little trickier than it should be. Being careful, we can write
\[ P[-1/2 \le X < 3/4] = P[-1/2 < X \le 3/4] + P[X = -1/2] - P[X = 3/4] \tag{3} \]
Since the CDF of $X$ is a continuous function, the probability that $X$ takes on any specific value is zero. This implies $P[X = 3/4] = 0$ and $P[X = -1/2] = 0$. (If this is not clear at this point, it will become clear in Section 3.6.)
Thus,
\[ P[-1/2 \le X < 3/4] = P[-1/2 < X \le 3/4] = F_X(3/4) - F_X(-1/2) = 5/8 \tag{4} \]

(c)
\[ P[|X| \le 1/2] = P[-1/2 \le X \le 1/2] = P[X \le 1/2] - P[X < -1/2] \tag{5} \]
Note that $P[X \le 1/2] = F_X(1/2) = 3/4$. Since $P[X = -1/2] = 0$, we have $P[X < -1/2] = P[X \le -1/2]$. Hence $P[X < -1/2] = F_X(-1/2) = 1/4$. This implies
\[ P[|X| \le 1/2] = P[X \le 1/2] - P[X < -1/2] = 3/4 - 1/4 = 1/2 \tag{6} \]

(d) Since $F_X(1) = 1$, we must have $a \le 1$. For $a \le 1$, we need to satisfy
\[ P[X \le a] = F_X(a) = \frac{a+1}{2} = 0.8 \tag{7} \]
Thus $a = 0.6$.

Problem 3.1.2 Solution

The CDF of $V$ was given to be
\[ F_V(v) = \begin{cases} 0 & v < -5 \\ c(v+5)^2 & -5 \le v < 7 \\ 1 & v \ge 7 \end{cases} \tag{1} \]

(a) For $V$ to be a continuous random variable, $F_V(v)$ must be a continuous function. This occurs if we choose $c$ such that $F_V(v)$ doesn't have a discontinuity at $v = 7$. We meet this requirement if $c(7+5)^2 = 1$. This implies $c = 1/144$.

(b)
\[ P[V > 4] = 1 - P[V \le 4] = 1 - F_V(4) = 1 - 81/144 = 63/144 \tag{2} \]

(c)
\[ P[-3 < V \le 0] = F_V(0) - F_V(-3) = 25/144 - 4/144 = 21/144 \tag{3} \]

(d) Since $0 \le F_V(v) \le 1$ and since $F_V(v)$ is a nondecreasing function, it must be that $-5 \le a \le 7$. In this range,
\[ P[V > a] = 1 - F_V(a) = 1 - (a+5)^2/144 = 2/3 \tag{4} \]
The unique solution in the range $-5 \le a \le 7$ is $a = 4\sqrt{3} - 5 = 1.928$.

Problem 3.1.3 Solution

In this problem, the CDF of $W$ is
\[ F_W(w) = \begin{cases} 0 & w < -5 \\ (w+5)/8 & -5 \le w < -3 \\ 1/4 & -3 \le w < 3 \\ 1/4 + 3(w-3)/8 & 3 \le w < 5 \\ 1 & w \ge 5. \end{cases} \tag{1} \]
Each question can be answered directly from this CDF.

(a)
\[ P[W \le 4] = F_W(4) = 1/4 + 3/8 = 5/8. \tag{2} \]

(b)
\[ P[-2 < W \le 2] = F_W(2) - F_W(-2) = 1/4 - 1/4 = 0. \tag{3} \]

(c)
\[ P[W > 0] = 1 - P[W \le 0] = 1 - F_W(0) = 3/4 \tag{4} \]

(d) By inspection of $F_W(w)$, we observe that $P[W \le a] = F_W(a) = 1/2$ for $a$ in the range $3 \le a \le 5$. In this range,
\[ F_W(a) = 1/4 + 3(a-3)/8 = 1/2 \tag{5} \]
This implies $a = 11/3$.

Problem 3.1.4 Solution

(a) By definition, $\lceil nx \rceil$ is the smallest integer that is greater than or equal to $nx$. This implies $nx \le \lceil nx \rceil \le nx + 1$.
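(b) By part (a),
\[ \frac{nx}{n} \le \frac{\lceil nx \rceil}{n} \le \frac{nx + 1}{n} \tag{1} \]
That is,
\[ x \le \frac{\lceil nx \rceil}{n} \le x + \frac{1}{n} \tag{2} \]
This implies
\[ x \le \lim_{n\to\infty} \frac{\lceil nx \rceil}{n} \le \lim_{n\to\infty} \left( x + \frac{1}{n} \right) = x \tag{3} \]

(c) In the same way, $\lfloor nx \rfloor$ is the largest integer that is less than or equal to $nx$. This implies $nx - 1 \le \lfloor nx \rfloor \le nx$. It follows that
\[ \frac{nx - 1}{n} \le \frac{\lfloor nx \rfloor}{n} \le \frac{nx}{n} \tag{4} \]
That is,
\[ x - \frac{1}{n} \le \frac{\lfloor nx \rfloor}{n} \le x \tag{5} \]
This implies
\[ \lim_{n\to\infty} \left( x - \frac{1}{n} \right) = x \le \lim_{n\to\infty} \frac{\lfloor nx \rfloor}{n} \le x \tag{6} \]

Problem 3.2.1 Solution

\[ f_X(x) = \begin{cases} cx & 0 \le x \le 2 \\ 0 & \text{otherwise} \end{cases} \tag{1} \]

(a) From the above PDF we can determine the value of $c$ by integrating the PDF and setting it equal to 1.
\[ \int_0^2 cx \, dx = 2c = 1 \tag{2} \]
Therefore $c = 1/2$.

(b) $P[0 \le X \le 1] = \int_0^1 \frac{x}{2} \, dx = 1/4$

(c) $P[-1/2 \le X \le 1/2] = \int_0^{1/2} \frac{x}{2} \, dx = 1/16$

(d) The CDF of $X$ is found by integrating the PDF from 0 to $x$.
\[ F_X(x) = \int_0^x f_X(x') \, dx' = \begin{cases} 0 & x < 0 \\ x^2/4 & 0 \le x \le 2 \\ 1 & x > 2 \end{cases} \tag{3} \]

Problem 3.2.2 Solution

From the CDF, we can find the PDF by direct differentiation. The CDF and corresponding PDF are
\[ F_X(x) = \begin{cases} 0 & x < -1 \\ (x+1)/2 & -1 \le x \le 1 \\ 1 & x > 1 \end{cases} \qquad f_X(x) = \begin{cases} 1/2 & -1 \le x \le 1 \\ 0 & \text{otherwise} \end{cases} \tag{1} \]

Problem 3.2.3 Solution

We find the PDF by taking the derivative of $F_U(u)$ on each piece on which $F_U(u)$ is defined. The CDF and corresponding PDF of $U$ are
\[ F_U(u) = \begin{cases} 0 & u < -5 \\ (u+5)/8 & -5 \le u < -3 \\ 1/4 & -3 \le u < 3 \\ 1/4 + 3(u-3)/8 & 3 \le u < 5 \\ 1 & u \ge 5. \end{cases} \qquad f_U(u) = \begin{cases} 0 & u < -5 \\ 1/8 & -5 \le u < -3 \\ 0 & -3 \le u < 3 \\ 3/8 & 3 \le u < 5 \\ 0 & u \ge 5. \end{cases} \tag{1} \]

Problem 3.2.4 Solution

For $x < 0$, $F_X(x) = 0$. For $x \ge 0$,
\begin{align}
F_X(x) &= \int_0^x f_X(y) \, dy \tag{1} \\
&= \int_0^x a^2 y e^{-a^2 y^2/2} \, dy \tag{2} \\
&= \left. -e^{-a^2 y^2/2} \right|_0^x = 1 - e^{-a^2 x^2/2} \tag{3}
\end{align}
A complete expression for the CDF of $X$ is
\[ F_X(x) = \begin{cases} 0 & x < 0 \\ 1 - e^{-a^2 x^2/2} & x \ge 0 \end{cases} \tag{4} \]

Problem 3.2.5 Solution

\[ f_X(x) = \begin{cases} ax^2 + bx & 0 \le x \le 1 \\ 0 & \text{otherwise} \end{cases} \tag{1} \]
First, we note that $a$ and $b$ must be chosen such that the above PDF integrates to 1.
\[ \int_0^1 (ax^2 + bx) \, dx = a/3 + b/2 = 1 \tag{2} \]
Hence, $b = 2 - 2a/3$ and our PDF becomes
\[ f_X(x) = x(ax + 2 - 2a/3) \tag{3} \]
For the PDF to be non-negative for $x \in [0, 1]$, we must have $ax + 2 - 2a/3 \ge 0$ for all $x \in [0, 1]$.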
This requirement can be written as
\[ a(2/3 - x) \le 2 \qquad (0 \le x \le 1) \tag{4} \]
For $x = 2/3$, the requirement holds for all $a$. However, the problem is tricky because we must consider the cases $0 \le x < 2/3$ and $2/3 < x \le 1$ separately because of the sign change of the inequality. When $0 \le x < 2/3$, we have $2/3 - x > 0$ and the requirement is most stringent at $x = 0$ where we require $2a/3 \le 2$ or $a \le 3$. When $2/3 < x \le 1$, we can write the constraint as $a(x - 2/3) \ge -2$. In this case, the constraint is most stringent at $x = 1$, where we must have $a/3 \ge -2$ or $a \ge -6$. Thus a complete expression for our requirements is
\[ -6 \le a \le 3 \qquad b = 2 - 2a/3 \tag{5} \]
As we see in the following plot, the shape of the PDF $f_X(x)$ varies greatly with the value of $a$.

[Figure: $f_X(x)$ versus $x$ on $[0, 1]$ for $a = -6$, $a = -3$, $a = 0$, and $a = 3$]

Problem 3.3.1 Solution

\[ f_X(x) = \begin{cases} 1/4 & -1 \le x \le 3 \\ 0 & \text{otherwise} \end{cases} \tag{1} \]
We recognize that $X$ is a uniform random variable on $[-1, 3]$.

(a) $E[X] = 1$ and $\mathrm{Var}[X] = \frac{(3+1)^2}{12} = 4/3$.

(b) The new random variable $Y$ is defined as $Y = h(X) = X^2$. Therefore
\[ h(E[X]) = h(1) = 1 \tag{2} \]
and
\[ E[h(X)] = E[X^2] = \mathrm{Var}[X] + E[X]^2 = 4/3 + 1 = 7/3 \tag{3} \]

(c) Finally
\begin{align}
E[Y] &= E[h(X)] = E[X^2] = 7/3 \tag{4} \\
\mathrm{Var}[Y] &= E[X^4] - \left(E[X^2]\right)^2 = \int_{-1}^{3} \frac{x^4}{4} \, dx - \frac{49}{9} = \frac{61}{5} - \frac{49}{9} = \frac{304}{45} \tag{5}
\end{align}

Problem 3.3.2 Solution

(a) Since the PDF is uniform over $[1, 9]$,
\[ E[X] = \frac{1 + 9}{2} = 5 \qquad \mathrm{Var}[X] = \frac{(9-1)^2}{12} = \frac{16}{3} \tag{1} \]

(b) Define $h(X) = 1/\sqrt{X}$. Then
\begin{align}
h(E[X]) &= 1/\sqrt{5} \tag{2} \\
E[h(X)] &= \int_1^9 \frac{x^{-1/2}}{8} \, dx = 1/2 \tag{3}
\end{align}

(c)
\begin{align}
E[Y] &= E[h(X)] = 1/2 \tag{4} \\
\mathrm{Var}[Y] &= E[Y^2] - (E[Y])^2 = \int_1^9 \frac{x^{-1}}{8} \, dx - (E[Y])^2 = \frac{\ln 9}{8} - \frac{1}{4} \tag{5}
\end{align}

Problem 3.3.3 Solution

The CDF of $X$ is
\[ F_X(x) = \begin{cases} 0 & x < 0 \\ x/2 & 0 \le x < 2 \\ 1 & x \ge 2 \end{cases} \tag{1} \]

(a) To find $E[X]$, we first find the PDF by differentiating the above CDF.
\[ f_X(x) = \begin{cases} 1/2 & 0 \le x \le 2 \\ 0 & \text{otherwise} \end{cases} \tag{2} \]
The expected value is then
\[ E[X] = \int_0^2 \frac{x}{2} \, dx = 1 \tag{3} \]

(b)
\begin{align}
E[X^2] &= \int_0^2 \frac{x^2}{2} \, dx = \left. \frac{x^3}{6} \right|_0^2 = 4/3 \tag{4} \\
\mathrm{Var}[X] &= E[X^2] - E[X]^2 = 4/3 - 1 = 1/3 \tag{5}
\end{align}

Problem 3.3.4 Solution

We can find the expected value of $Y$ by direct integration of the given PDF.
f Y (y) = y/2 0 ≤ y ≤ 2 0 otherwise (1) The expectation is E [Y ] = 2 0 y 2 2 dy = 4/3 (2) To find the variance, we first find the second moment E Y 2 = 2 0 y 3 2 dy = 2. (3) The variance is then Var[Y ] = E[Y 2 ] −E[Y ] 2 = 2 −(4/3) 2 = 2/9. Problem 3.3.5 Solution The CDF of Y is F Y (y) = 0 y < −1 (y + 1)/2 −1 ≤ y < 1 1 y ≥ 1 (1) 82 (a) We can find the expected value of Y by first differentiating the above CDF to find the PDF f Y (y) = 1/2 −1 ≤ y ≤ 1, 0 otherwise. (2) It follows that E [Y ] = 1 −1 y/2 dy = 0. (3) (b) E Y 2 = 1 −1 y 2 2 dy = 1/3 (4) Var[Y ] = E Y 2 −E [Y ] 2 = 1/3 −0 = 1/3 (5) Problem 3.3.6 Solution To evaluate the moments of V , we need the PDF f V (v), which we find by taking the derivative of the CDF F V (v). The CDF and corresponding PDF of V are F V (v) = 0 v < −5 (v + 5) 2 /144 −5 ≤ v < 7 1 v ≥ 7 f V (v) = 0 v < −5 (v + 5)/72 −5 ≤ v < 7 0 v ≥ 7 (1) (a) The expected value of V is E [V ] = ∞ −∞ vf V (v) dv = 1 72 7 −5 (v 2 + 5v) dv (2) = 1 72 v 3 3 + 5v 2 2 7 −5 = 1 72 343 3 + 245 2 + 125 3 − 125 2 = 3 (3) (b) To find the variance, we first find the second moment E V 2 = ∞ −∞ v 2 f V (v) dv = 1 72 7 −5 (v 3 + 5v 2 ) dv (4) = 1 72 v 4 4 + 5v 3 3 7 −5 = 6719/432 = 15.55 (5) The variance is Var[V ] = E[V 2 ] −(E[V ]) 2 = 2831/432 = 6.55. (c) The third moment of V is E V 3 = ∞ −∞ v 3 f V (v) dv = 1 72 7 −5 (v 4 + 5v 3 ) dv (6) = 1 72 v 5 5 + 5v 4 4 7 −5 = 86.2 (7) 83 Problem 3.3.7 Solution To find the moments, we first find the PDF of U by taking the derivative of F U (u). The CDF and corresponding PDF are F U (u) = 0 u < −5 (u + 5)/8 −5 ≤ u < −3 1/4 −3 ≤ u < 3 1/4 + 3(u −3)/8 3 ≤ u < 5 1 u ≥ 5. f U (u) = 0 u < −5 1/8 −5 ≤ u < −3 0 −3 ≤ u < 3 3/8 3 ≤ u < 5 0 u ≥ 5. 
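(1)

(a) The expected value of $U$ is
$$E[U] = \int_{-\infty}^{\infty} u f_U(u)\,du = \int_{-5}^{-3} \frac{u}{8}\,du + \int_{3}^{5} \frac{3u}{8}\,du \tag{2}$$
$$= \left.\frac{u^2}{16}\right|_{-5}^{-3} + \left.\frac{3u^2}{16}\right|_{3}^{5} = 2 \tag{3}$$

(b) The second moment of $U$ is
$$E[U^2] = \int_{-\infty}^{\infty} u^2 f_U(u)\,du = \int_{-5}^{-3} \frac{u^2}{8}\,du + \int_{3}^{5} \frac{3u^2}{8}\,du \tag{4}$$
$$= \left.\frac{u^3}{24}\right|_{-5}^{-3} + \left.\frac{u^3}{8}\right|_{3}^{5} = 49/3 \tag{5}$$
The variance of $U$ is $\operatorname{Var}[U] = E[U^2] - (E[U])^2 = 49/3 - 4 = 37/3$.

(c) Note that $2^U = e^{(\ln 2)U}$. This implies that
$$\int 2^u\,du = \int e^{(\ln 2)u}\,du = \frac{1}{\ln 2} e^{(\ln 2)u} = \frac{2^u}{\ln 2} \tag{6}$$
The expected value of $2^U$ is then
$$E[2^U] = \int_{-\infty}^{\infty} 2^u f_U(u)\,du = \int_{-5}^{-3} \frac{2^u}{8}\,du + \int_{3}^{5} \frac{3 \cdot 2^u}{8}\,du \tag{7}$$
$$= \left.\frac{2^u}{8\ln 2}\right|_{-5}^{-3} + \left.\frac{3 \cdot 2^u}{8\ln 2}\right|_{3}^{5} = \frac{2307}{256\ln 2} = 13.001 \tag{8}$$

Problem 3.3.8 Solution

The Pareto $(\alpha, \mu)$ random variable has PDF
$$f_X(x) = \begin{cases} (\alpha/\mu)(x/\mu)^{-(\alpha+1)} & x \ge \mu \\ 0 & \text{otherwise} \end{cases} \tag{1}$$
The $n$th moment is
$$E[X^n] = \int_{\mu}^{\infty} x^n \frac{\alpha}{\mu}\left(\frac{x}{\mu}\right)^{-(\alpha+1)} dx = \mu^n \int_{\mu}^{\infty} \frac{\alpha}{\mu}\left(\frac{x}{\mu}\right)^{-(\alpha-n+1)} dx \tag{2}$$
With the variable substitution $y = x/\mu$, we obtain
$$E[X^n] = \alpha\mu^n \int_1^{\infty} y^{-(\alpha-n+1)}\,dy \tag{3}$$
We see that $E[X^n] < \infty$ if and only if $\alpha - n + 1 > 1$, or, equivalently, $n < \alpha$. In this case,
$$E[X^n] = \left.\frac{\alpha\mu^n}{-(\alpha-n+1)+1}\, y^{-(\alpha-n+1)+1}\right|_{y=1}^{y=\infty} \tag{4}$$
$$= \left.\frac{-\alpha\mu^n}{\alpha-n}\, y^{-(\alpha-n)}\right|_{y=1}^{y=\infty} = \frac{\alpha\mu^n}{\alpha-n} \tag{5}$$

Problem 3.4.1 Solution

The reflected power $Y$ has an exponential ($\lambda = 1/P_0$) PDF. From Theorem 3.8, $E[Y] = P_0$. The probability that an aircraft is correctly identified is
$$P[Y > P_0] = \int_{P_0}^{\infty} \frac{1}{P_0} e^{-y/P_0}\,dy = e^{-1}. \tag{1}$$
Fortunately, real radar systems offer better performance.

Problem 3.4.2 Solution

From Appendix A, we observe that an exponential random variable $Y$ with parameter $\lambda > 0$ has PDF
$$f_Y(y) = \begin{cases} \lambda e^{-\lambda y} & y \ge 0 \\ 0 & \text{otherwise} \end{cases} \tag{1}$$
In addition, the mean and variance of $Y$ are
$$E[Y] = \frac{1}{\lambda}, \qquad \operatorname{Var}[Y] = \frac{1}{\lambda^2} \tag{2}$$

(a) Since $\operatorname{Var}[Y] = 25$, we must have $\lambda = 1/5$.

(b) The expected value of $Y$ is $E[Y] = 1/\lambda = 5$.

(c)
$$P[Y > 5] = \int_5^{\infty} f_Y(y)\,dy = \left.-e^{-y/5}\right|_5^{\infty} = e^{-1} \tag{3}$$

Problem 3.4.3 Solution

From Appendix A, an Erlang random variable $X$ with parameters $\lambda > 0$ and $n$ has PDF $f_X(x) = \lambda^n x^{n-1} e^{-\lambda x}/(n-1)!$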
for $x \ge 0$, and $f_X(x) = 0$ otherwise. (1)

In addition, the mean and variance of $X$ are
$$E[X] = \frac{n}{\lambda}, \qquad \operatorname{Var}[X] = \frac{n}{\lambda^2} \tag{2}$$

(a) Since $\lambda = 1/3$ and $E[X] = n/\lambda = 15$, we must have $n = 5$.

(b) Substituting the parameters $n = 5$ and $\lambda = 1/3$ into the given PDF, we obtain
$$f_X(x) = \begin{cases} (1/3)^5 x^4 e^{-x/3}/24 & x \ge 0 \\ 0 & \text{otherwise} \end{cases} \tag{3}$$

(c) From above, we know that $\operatorname{Var}[X] = n/\lambda^2 = 45$.

Problem 3.4.4 Solution

Since $Y$ is an Erlang random variable with parameters $\lambda = 2$ and $n = 2$, we find in Appendix A that
$$f_Y(y) = \begin{cases} 4ye^{-2y} & y \ge 0 \\ 0 & \text{otherwise} \end{cases} \tag{1}$$

(a) Appendix A tells us that $E[Y] = n/\lambda = 1$.

(b) Appendix A also tells us that $\operatorname{Var}[Y] = n/\lambda^2 = 1/2$.

(c) The probability that $1/2 \le Y < 3/2$ is
$$P[1/2 \le Y < 3/2] = \int_{1/2}^{3/2} f_Y(y)\,dy = \int_{1/2}^{3/2} 4ye^{-2y}\,dy \tag{2}$$
This integral is easily completed using the integration by parts formula $\int u\,dv = uv - \int v\,du$ with
$$u = 2y, \qquad dv = 2e^{-2y}\,dy, \qquad du = 2\,dy, \qquad v = -e^{-2y}$$
Making these substitutions, we obtain
$$P[1/2 \le Y < 3/2] = \left.-2ye^{-2y}\right|_{1/2}^{3/2} + \int_{1/2}^{3/2} 2e^{-2y}\,dy \tag{3}$$
$$= 2e^{-1} - 4e^{-3} = 0.537 \tag{4}$$

Problem 3.4.5 Solution

(a) The PDF of a continuous uniform $(-5, 5)$ random variable is
$$f_X(x) = \begin{cases} 1/10 & -5 \le x \le 5 \\ 0 & \text{otherwise} \end{cases} \tag{1}$$

(b) For $x < -5$, $F_X(x) = 0$. For $x \ge 5$, $F_X(x) = 1$. For $-5 \le x \le 5$, the CDF is
$$F_X(x) = \int_{-5}^{x} f_X(\tau)\,d\tau = \frac{x+5}{10} \tag{2}$$
The complete expression for the CDF of $X$ is
$$F_X(x) = \begin{cases} 0 & x < -5 \\ (x+5)/10 & -5 \le x \le 5 \\ 1 & x > 5 \end{cases} \tag{3}$$

(c) The expected value of $X$ is
$$\int_{-5}^{5} \frac{x}{10}\,dx = \left.\frac{x^2}{20}\right|_{-5}^{5} = 0 \tag{4}$$
Another way to obtain this answer is to use Theorem 3.6, which says the expected value of $X$ is $E[X] = (5 + (-5))/2 = 0$.

(d) The fifth moment of $X$ is
$$\int_{-5}^{5} \frac{x^5}{10}\,dx = \left.\frac{x^6}{60}\right|_{-5}^{5} = 0 \tag{5}$$

(e) The expected value of $e^X$ is
$$\int_{-5}^{5} \frac{e^x}{10}\,dx = \left.\frac{e^x}{10}\right|_{-5}^{5} = \frac{e^5 - e^{-5}}{10} = 14.84 \tag{6}$$

Problem 3.4.6 Solution

We know that $X$ has a uniform PDF over $[a, b)$ and has mean $\mu_X = 7$ and variance $\operatorname{Var}[X] = 3$. All that is left to do is determine the values of the constants $a$ and $b$, to complete the model of the uniform PDF.
$$E[X] = \frac{a+b}{2} = 7, \qquad \operatorname{Var}[X] = \frac{(b-a)^2}{12} = 3 \tag{1}$$
Since we assume $b > a$, this implies
$$a + b = 14, \qquad b - a = 6 \tag{2}$$
Solving these two equations, we arrive at
$$a = 4, \qquad b = 10 \tag{3}$$
and the resulting PDF of $X$ is
$$f_X(x) = \begin{cases} 1/6 & 4 \le x \le 10 \\ 0 & \text{otherwise} \end{cases} \tag{4}$$

Problem 3.4.7 Solution

Given that
$$f_X(x) = \begin{cases} (1/2)e^{-x/2} & x \ge 0 \\ 0 & \text{otherwise} \end{cases} \tag{1}$$

(a)
$$P[1 \le X \le 2] = \int_1^2 (1/2)e^{-x/2}\,dx = e^{-1/2} - e^{-1} = 0.2387 \tag{2}$$

(b) The CDF of $X$ may be expressed as
$$F_X(x) = \begin{cases} 0 & x < 0 \\ \int_0^x (1/2)e^{-\tau/2}\,d\tau & x \ge 0 \end{cases} = \begin{cases} 0 & x < 0 \\ 1 - e^{-x/2} & x \ge 0 \end{cases} \tag{3}$$

(c) $X$ is an exponential random variable with parameter $a = 1/2$. By Theorem 3.8, the expected value of $X$ is $E[X] = 1/a = 2$.

(d) By Theorem 3.8, the variance of $X$ is $\operatorname{Var}[X] = 1/a^2 = 4$.

Problem 3.4.8 Solution

Given the uniform PDF
$$f_U(u) = \begin{cases} 1/(b-a) & a \le u \le b \\ 0 & \text{otherwise} \end{cases} \tag{1}$$
the mean of $U$ can be found by integrating
$$E[U] = \int_a^b \frac{u}{b-a}\,du = \frac{b^2 - a^2}{2(b-a)} = \frac{b+a}{2} \tag{2}$$
where we factored $b^2 - a^2 = (b-a)(b+a)$. The variance of $U$ can be found by first finding $E[U^2]$.
$$E[U^2] = \int_a^b \frac{u^2}{b-a}\,du = \frac{b^3 - a^3}{3(b-a)} \tag{3}$$
Therefore the variance is
$$\operatorname{Var}[U] = \frac{b^3 - a^3}{3(b-a)} - \left(\frac{b+a}{2}\right)^2 = \frac{(b-a)^2}{12} \tag{4}$$

Problem 3.4.9 Solution

Let $X$ denote the holding time of a call. The PDF of $X$ is
$$f_X(x) = \begin{cases} (1/\tau)e^{-x/\tau} & x \ge 0 \\ 0 & \text{otherwise} \end{cases} \tag{1}$$
We will use $C_A(X)$ and $C_B(X)$ to denote the cost of a call under the two plans. From the problem statement, we note that $C_A(X) = 10X$ so that $E[C_A(X)] = 10E[X] = 10\tau$. On the other hand,
$$C_B(X) = 99 + 10(X - 20)^+ \tag{2}$$
where $y^+ = y$ if $y \ge 0$ and $y^+ = 0$ if $y < 0$. Thus,
$$E[C_B(X)] = E[99 + 10(X-20)^+] \tag{3}$$
$$= 99 + 10E[(X-20)^+] \tag{4}$$
$$= 99 + 10E[(X-20)^+ \mid X \le 20]\,P[X \le 20] + 10E[(X-20)^+ \mid X > 20]\,P[X > 20] \tag{5}$$
Given $X \le 20$, $(X-20)^+ = 0$. Thus $E[(X-20)^+ \mid X \le 20] = 0$ and
$$E[C_B(X)] = 99 + 10E[(X-20) \mid X > 20]\,P[X > 20] \tag{6}$$
Finally, we observe that $P[X > 20] = e^{-20/\tau}$ and that
$$E[(X-20) \mid X > 20] = \tau \tag{7}$$
since given $X > 20$, $X - 20$ has a PDF identical to that of $X$ by the memoryless property of the exponential random variable.
Thus,
$$E[C_B(X)] = 99 + 10\tau e^{-20/\tau} \tag{8}$$
Some numeric comparisons show that $E[C_B(X)] \le E[C_A(X)]$ if $\tau > 12.34$ minutes. That is, the flat price for the first 20 minutes is a good deal only if your average phone call is sufficiently long.

Problem 3.4.10 Solution

The integral $I_1$ is
$$I_1 = \int_0^{\infty} \lambda e^{-\lambda x}\,dx = \left.-e^{-\lambda x}\right|_0^{\infty} = 1 \tag{1}$$
For $n > 1$, we have
$$I_n = \int_0^{\infty} \underbrace{\frac{\lambda^{n-1} x^{n-1}}{(n-1)!}}_{u}\; \underbrace{\lambda e^{-\lambda x}\,dx}_{dv} \tag{2}$$
We define $u$ and $dv$ as shown above in order to use the integration by parts formula $\int u\,dv = uv - \int v\,du$. Since
$$du = \frac{\lambda^{n-1} x^{n-2}}{(n-2)!}\,dx, \qquad v = -e^{-\lambda x} \tag{3}$$
we can write
$$I_n = uv\big|_0^{\infty} - \int_0^{\infty} v\,du \tag{4}$$
$$= \left.-\frac{\lambda^{n-1} x^{n-1}}{(n-1)!} e^{-\lambda x}\right|_0^{\infty} + \int_0^{\infty} \frac{\lambda^{n-1} x^{n-2}}{(n-2)!} e^{-\lambda x}\,dx = 0 + I_{n-1} \tag{5}$$
Hence, $I_n = 1$ for all $n \ge 1$.

Problem 3.4.11 Solution

For an Erlang $(n, \lambda)$ random variable $X$, the $k$th moment is
$$E[X^k] = \int_0^{\infty} x^k f_X(x)\,dx \tag{1}$$
$$= \int_0^{\infty} \frac{\lambda^n x^{n+k-1}}{(n-1)!} e^{-\lambda x}\,dx = \frac{(n+k-1)!}{\lambda^k (n-1)!} \underbrace{\int_0^{\infty} \frac{\lambda^{n+k} x^{n+k-1}}{(n+k-1)!} e^{-\lambda x}\,dx}_{1} \tag{2}$$
The above marked integral equals 1 since it is the integral of an Erlang PDF with parameters $\lambda$ and $n + k$ over all possible values. Hence,
$$E[X^k] = \frac{(n+k-1)!}{\lambda^k (n-1)!} \tag{3}$$
This implies that the first and second moments are
$$E[X] = \frac{n!}{(n-1)!\,\lambda} = \frac{n}{\lambda}, \qquad E[X^2] = \frac{(n+1)!}{\lambda^2 (n-1)!} = \frac{(n+1)n}{\lambda^2} \tag{4}$$
It follows that the variance of $X$ is $E[X^2] - (E[X])^2 = n/\lambda^2$.

Problem 3.4.12 Solution

In this problem, we prove Theorem 3.11, which says that for $x \ge 0$, the CDF of an Erlang $(n, \lambda)$ random variable $X_n$ satisfies
$$F_{X_n}(x) = 1 - \sum_{k=0}^{n-1} \frac{(\lambda x)^k e^{-\lambda x}}{k!}. \tag{1}$$
We do this in two steps. First, we derive a relationship between $F_{X_n}(x)$ and $F_{X_{n-1}}(x)$. Second, we use that relationship to prove the theorem by induction.

(a) By Definition 3.7, the CDF of the Erlang $(n, \lambda)$ random variable $X_n$ is
$$F_{X_n}(x) = \int_{-\infty}^{x} f_{X_n}(t)\,dt = \int_0^x \frac{\lambda^n t^{n-1} e^{-\lambda t}}{(n-1)!}\,dt. \tag{2}$$

(b) To use integration by parts, we define
$$u = \frac{t^{n-1}}{(n-1)!}, \qquad dv = \lambda^n e^{-\lambda t}\,dt \tag{3}$$
$du = \frac{t^{n-2}}{(n-2)!}\,dt$,
$v = -\lambda^{n-1} e^{-\lambda t}$ (4)

Thus, using the integration by parts formula $\int u\,dv = uv - \int v\,du$, we have
$$F_{X_n}(x) = \int_0^x \frac{\lambda^n t^{n-1} e^{-\lambda t}}{(n-1)!}\,dt = \left.-\frac{\lambda^{n-1} t^{n-1} e^{-\lambda t}}{(n-1)!}\right|_0^x + \int_0^x \frac{\lambda^{n-1} t^{n-2} e^{-\lambda t}}{(n-2)!}\,dt \tag{5}$$
$$= -\frac{\lambda^{n-1} x^{n-1} e^{-\lambda x}}{(n-1)!} + F_{X_{n-1}}(x) \tag{6}$$

(c) Now we do proof by induction. For $n = 1$, the Erlang $(n, \lambda)$ random variable $X_1$ is simply an exponential random variable. Hence for $x \ge 0$, $F_{X_1}(x) = 1 - e^{-\lambda x}$. Now we suppose the claim is true for $F_{X_{n-1}}(x)$ so that
$$F_{X_{n-1}}(x) = 1 - \sum_{k=0}^{n-2} \frac{(\lambda x)^k e^{-\lambda x}}{k!}. \tag{7}$$
Using the result of part (b), we can write
$$F_{X_n}(x) = F_{X_{n-1}}(x) - \frac{(\lambda x)^{n-1} e^{-\lambda x}}{(n-1)!} \tag{8}$$
$$= 1 - \sum_{k=0}^{n-2} \frac{(\lambda x)^k e^{-\lambda x}}{k!} - \frac{(\lambda x)^{n-1} e^{-\lambda x}}{(n-1)!} \tag{9}$$
The last term is the $k = n-1$ term of the sum in Theorem 3.11, which proves the claim.

Problem 3.4.13 Solution

For $n = 1$, we have the fact $E[X] = 1/\lambda$ that is given in the problem statement. Now we assume that $E[X^{n-1}] = (n-1)!/\lambda^{n-1}$. To complete the proof, we show that this implies that $E[X^n] = n!/\lambda^n$. Specifically, we write
$$E[X^n] = \int_0^{\infty} x^n \lambda e^{-\lambda x}\,dx \tag{1}$$
Now we use the integration by parts formula $\int u\,dv = uv - \int v\,du$ with $u = x^n$ and $dv = \lambda e^{-\lambda x}\,dx$. This implies $du = nx^{n-1}\,dx$ and $v = -e^{-\lambda x}$ so that
$$E[X^n] = \left.-x^n e^{-\lambda x}\right|_0^{\infty} + \int_0^{\infty} nx^{n-1} e^{-\lambda x}\,dx \tag{2}$$
$$= 0 + \frac{n}{\lambda} \int_0^{\infty} x^{n-1} \lambda e^{-\lambda x}\,dx \tag{3}$$
$$= \frac{n}{\lambda} E[X^{n-1}] \tag{4}$$
By our induction hypothesis, $E[X^{n-1}] = (n-1)!/\lambda^{n-1}$, which implies
$$E[X^n] = n!/\lambda^n \tag{5}$$

Problem 3.4.14 Solution

(a) Since $f_X(x) \ge 0$ and $x \ge r$ over the entire integral, we can write
$$\int_r^{\infty} x f_X(x)\,dx \ge \int_r^{\infty} r f_X(x)\,dx = rP[X > r] \tag{1}$$

(b) We can write the expected value of $X$ in the form
$$E[X] = \int_0^r x f_X(x)\,dx + \int_r^{\infty} x f_X(x)\,dx \tag{2}$$
Hence,
$$rP[X > r] \le \int_r^{\infty} x f_X(x)\,dx = E[X] - \int_0^r x f_X(x)\,dx \tag{3}$$
Allowing $r$ to approach infinity yields
$$\lim_{r\to\infty} rP[X > r] \le E[X] - \lim_{r\to\infty} \int_0^r x f_X(x)\,dx = E[X] - E[X] = 0 \tag{4}$$
Since $rP[X > r] \ge 0$ for all $r \ge 0$, we must have $\lim_{r\to\infty} rP[X > r] = 0$.

(c) We can use the integration by parts formula $\int u\,dv = uv - \int v\,du$ by defining $u = 1 - F_X(x)$ and $dv = dx$.
This yields
$$\int_0^{\infty} [1 - F_X(x)]\,dx = x[1 - F_X(x)]\big|_0^{\infty} + \int_0^{\infty} x f_X(x)\,dx \tag{5}$$
By applying part (a), we now observe that
$$x[1 - F_X(x)]\big|_0^{\infty} = \lim_{r\to\infty} r[1 - F_X(r)] - 0 = \lim_{r\to\infty} rP[X > r] \tag{6}$$
By part (b), $\lim_{r\to\infty} rP[X > r] = 0$ and this implies $x[1 - F_X(x)]\big|_0^{\infty} = 0$. Thus,
$$\int_0^{\infty} [1 - F_X(x)]\,dx = \int_0^{\infty} x f_X(x)\,dx = E[X] \tag{7}$$

Problem 3.5.1 Solution

Given that the peak temperature, $T$, is a Gaussian random variable with mean 85 and standard deviation 10, we can use the fact that $F_T(t) = \Phi((t - \mu_T)/\sigma_T)$ and Table 3.1 on page 123 to evaluate the following:
$$P[T > 100] = 1 - P[T \le 100] = 1 - F_T(100) = 1 - \Phi\left(\frac{100 - 85}{10}\right) \tag{1}$$
$$= 1 - \Phi(1.5) = 1 - 0.933 = 0.067 \tag{2}$$
$$P[T < 60] = \Phi\left(\frac{60 - 85}{10}\right) = \Phi(-2.5) \tag{3}$$
$$= 1 - \Phi(2.5) = 1 - 0.993 = 0.007 \tag{4}$$
$$P[70 \le T \le 100] = F_T(100) - F_T(70) \tag{5}$$
$$= \Phi(1.5) - \Phi(-1.5) = 2\Phi(1.5) - 1 = 0.866 \tag{6}$$

Problem 3.5.2 Solution

The standard normal Gaussian random variable $Z$ has mean $\mu = 0$ and variance $\sigma^2 = 1$. Making these substitutions in Definition 3.8 yields
$$f_Z(z) = \frac{1}{\sqrt{2\pi}} e^{-z^2/2} \tag{1}$$

Problem 3.5.3 Solution

$X$ is a Gaussian random variable with zero mean but unknown variance. We do know, however, that
$$P[|X| \le 10] = 0.1 \tag{1}$$
We can find the variance $\operatorname{Var}[X]$ by expanding the above probability in terms of the $\Phi(\cdot)$ function.
$$P[-10 \le X \le 10] = F_X(10) - F_X(-10) = 2\Phi\left(\frac{10}{\sigma_X}\right) - 1 \tag{2}$$
This implies $\Phi(10/\sigma_X) = 0.55$. Using Table 3.1 for the Gaussian CDF, $\Phi(0.12) \approx 0.548$ and $\Phi(0.13) \approx 0.552$, so $10/\sigma_X \approx 0.126$, or $\sigma_X \approx 79.6$ and $\operatorname{Var}[X] = \sigma_X^2$.

Problem 3.5.4 Solution

Repeating Definition 3.11,
$$Q(z) = \frac{1}{\sqrt{2\pi}} \int_z^{\infty} e^{-u^2/2}\,du \tag{1}$$
Making the substitution $x = u/\sqrt{2}$, we have
$$Q(z) = \frac{1}{\sqrt{\pi}} \int_{z/\sqrt{2}}^{\infty} e^{-x^2}\,dx = \frac{1}{2}\operatorname{erfc}\left(\frac{z}{\sqrt{2}}\right) \tag{2}$$

Problem 3.5.5 Solution

Moving to Antarctica, we find that the temperature $T$ is still Gaussian but with variance 225. We also know that with probability 1/2, $T$ exceeds 10 degrees. First we would like to find the mean temperature, and we do so by looking at the second fact.
$$P[T > 10] = 1 - P[T \le 10] = 1 - \Phi\left(\frac{10 - \mu_T}{15}\right) = 1/2 \tag{1}$$
By looking at the table we find that if $\Phi(\Gamma) = 1/2$, then $\Gamma = 0$.
Therefore,
$$\Phi\left(\frac{10 - \mu_T}{15}\right) = 1/2 \tag{2}$$
implies that $(10 - \mu_T)/15 = 0$ or $\mu_T = 10$. Now we have a Gaussian $T$ with mean 10 and standard deviation 15. So we are prepared to answer the following problems.
$$P[T > 32] = 1 - P[T \le 32] = 1 - \Phi\left(\frac{32 - 10}{15}\right) \tag{3}$$
$$= 1 - \Phi(1.47) = 1 - 0.929 = 0.071 \tag{4}$$
$$P[T < 0] = F_T(0) = \Phi\left(\frac{0 - 10}{15}\right) \tag{5}$$
$$= \Phi(-2/3) = 1 - \Phi(2/3) \tag{6}$$
$$= 1 - \Phi(0.67) = 1 - 0.749 = 0.251 \tag{7}$$
$$P[T > 60] = 1 - P[T \le 60] = 1 - F_T(60) \tag{8}$$
$$= 1 - \Phi\left(\frac{60 - 10}{15}\right) = 1 - \Phi(10/3) \tag{9}$$
$$= Q(3.33) = 4.34 \cdot 10^{-4} \tag{10}$$

Problem 3.5.6 Solution

In this problem, we use Theorem 3.14 and the tables for the $\Phi$ and $Q$ functions to answer the questions. Since $E[Y_{20}] = 40(20) = 800$ and $\operatorname{Var}[Y_{20}] = 100(20) = 2000$, we can write
$$P[Y_{20} > 1000] = P\left[\frac{Y_{20} - 800}{\sqrt{2000}} > \frac{1000 - 800}{\sqrt{2000}}\right] \tag{1}$$
$$= P\left[Z > \frac{200}{20\sqrt{5}}\right] = Q(4.47) = 3.91 \times 10^{-6} \tag{2}$$
The second part is a little trickier. Since $E[Y_{25}] = 1000$, we know that the prof will spend around \$1000 in roughly 25 years. However, to be certain with probability 0.99 that the prof spends \$1000 will require more than 25 years. In particular, we know that
$$P[Y_n > 1000] = P\left[\frac{Y_n - 40n}{\sqrt{100n}} > \frac{1000 - 40n}{\sqrt{100n}}\right] = 1 - \Phi\left(\frac{100 - 4n}{\sqrt{n}}\right) = 0.99 \tag{3}$$
Hence, we must find $n$ such that
$$\Phi\left(\frac{100 - 4n}{\sqrt{n}}\right) = 0.01 \tag{4}$$
Recall that $\Phi(x) = 0.01$ for a negative value of $x$. This is consistent with our earlier observation that we would need $n > 25$, corresponding to $100 - 4n < 0$. Thus, we use the identity $\Phi(x) = 1 - \Phi(-x)$ to write
$$\Phi\left(\frac{100 - 4n}{\sqrt{n}}\right) = 1 - \Phi\left(\frac{4n - 100}{\sqrt{n}}\right) = 0.01 \tag{5}$$
Equivalently, we have
$$\Phi\left(\frac{4n - 100}{\sqrt{n}}\right) = 0.99 \tag{6}$$
From the table of the $\Phi$ function, we have that $(4n - 100)/\sqrt{n} = 2.33$, or
$$(n - 25)^2 = (0.58)^2 n = 0.3393n. \tag{7}$$
Solving this quadratic yields $n = 28.09$. Hence, only after 28 years are we 99 percent sure that the prof will have spent \$1000. Note that a second root of the quadratic yields $n = 22.25$. This root is not a valid solution to our problem. Mathematically, it is a solution of our quadratic in which we choose the negative root of $\sqrt{n}$.
This would correspond to assuming the standard deviation of $Y_n$ is negative.

Problem 3.5.7 Solution

We are given that there are 100,000,000 men in the United States and 23,000 of them are at least 7 feet tall, and the heights of U.S. men are independent Gaussian random variables with mean 5′10″.

(a) Let $H$ denote the height in inches of a U.S. male. To find $\sigma_X$, we look at the fact that the probability $P[H \ge 84]$ is the number of men who are at least 7 feet tall divided by the total number of men (the frequency interpretation of probability). Since we measure $H$ in inches, we have
$$P[H \ge 84] = \frac{23{,}000}{100{,}000{,}000} = \Phi\left(\frac{70 - 84}{\sigma_X}\right) = 0.00023 \tag{1}$$
Since $\Phi(-x) = 1 - \Phi(x) = Q(x)$,
$$Q(14/\sigma_X) = 2.3 \cdot 10^{-4} \tag{2}$$
From Table 3.2, this implies $14/\sigma_X = 3.5$ or $\sigma_X = 4$.

(b) The probability that a randomly chosen man is at least 8 feet tall is
$$P[H \ge 96] = Q\left(\frac{96 - 70}{4}\right) = Q(6.5) \tag{3}$$
Unfortunately, Table 3.2 doesn't include $Q(6.5)$, although it should be apparent that the probability is very small. In fact, $Q(6.5) = 4.0 \times 10^{-11}$.

(c) First we need to find the probability that a man is at least 7′6″.
$$P[H \ge 90] = Q\left(\frac{90 - 70}{4}\right) = Q(5) \approx 3 \cdot 10^{-7} = \beta \tag{4}$$
Although Table 3.2 stops at $Q(4.99)$, if you're curious, the exact value is $Q(5) = 2.87 \cdot 10^{-7}$. Now we can begin to find the probability that no man is at least 7′6″. This can be modeled as 100,000,000 repetitions of a Bernoulli trial with parameter $1 - \beta$. The probability that no man is at least 7′6″ is
$$(1 - \beta)^{100{,}000{,}000} = 9.4 \times 10^{-14} \tag{5}$$

(d) The expected value of $N$ is just the number of trials multiplied by the probability that a man is at least 7′6″.
$$E[N] = 100{,}000{,}000 \cdot \beta = 30 \tag{6}$$

Problem 3.5.8 Solution

This problem is in the wrong section since the $\operatorname{erf}(\cdot)$ function is defined later on in Section 3.9 as
$$\operatorname{erf}(x) = \frac{2}{\sqrt{\pi}} \int_0^x e^{-u^2}\,du. \tag{1}$$

(a) Since $Y$ is Gaussian $(0, 1/\sqrt{2})$, $Y$ has variance 1/2 and
$$f_Y(y) = \frac{1}{\sqrt{2\pi(1/2)}} e^{-y^2/[2(1/2)]} = \frac{1}{\sqrt{\pi}} e^{-y^2}. \tag{2}$$
For $y \ge 0$, $F_Y(y) = \int_{-\infty}^{y} f_Y(u)\,du = 1/2 + \int_0^y f_Y(u)\,du$.
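Substituting $f_Y(u)$ yields
$$F_Y(y) = \frac{1}{2} + \frac{1}{\sqrt{\pi}} \int_0^y e^{-u^2}\,du = \frac{1}{2} + \frac{1}{2}\operatorname{erf}(y). \tag{3}$$
The factor of 1/2 on $\operatorname{erf}(y)$ follows from the $2/\sqrt{\pi}$ normalization in the definition of $\operatorname{erf}(\cdot)$.

(b) Since $Y$ is Gaussian $(0, 1/\sqrt{2})$, $Z = \sqrt{2}\,Y$ is Gaussian with expected value $E[Z] = \sqrt{2}\,E[Y] = 0$ and variance $\operatorname{Var}[Z] = 2\operatorname{Var}[Y] = 1$. Thus $Z$ is Gaussian $(0, 1)$ and
$$\Phi(z) = F_Z(z) = P\left[\sqrt{2}\,Y \le z\right] = P\left[Y \le \frac{z}{\sqrt{2}}\right] = F_Y\left(\frac{z}{\sqrt{2}}\right) = \frac{1}{2} + \frac{1}{2}\operatorname{erf}\left(\frac{z}{\sqrt{2}}\right) \tag{4}$$

Problem 3.5.9 Solution

First we note that since $W$ has an $N[\mu, \sigma^2]$ distribution, the integral we wish to evaluate is
$$I = \int_{-\infty}^{\infty} f_W(w)\,dw = \frac{1}{\sqrt{2\pi\sigma^2}} \int_{-\infty}^{\infty} e^{-(w-\mu)^2/2\sigma^2}\,dw \tag{1}$$

(a) Using the substitution $x = (w - \mu)/\sigma$, we have $dx = dw/\sigma$ and
$$I = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-x^2/2}\,dx \tag{2}$$

(b) When we write $I^2$ as the product of integrals, we use $y$ to denote the other variable of integration so that
$$I^2 = \left(\frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-x^2/2}\,dx\right) \left(\frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-y^2/2}\,dy\right) \tag{3}$$
$$= \frac{1}{2\pi} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} e^{-(x^2+y^2)/2}\,dx\,dy \tag{4}$$

(c) By changing to polar coordinates, $x^2 + y^2 = r^2$ and $dx\,dy = r\,dr\,d\theta$ so that
$$I^2 = \frac{1}{2\pi} \int_0^{2\pi} \int_0^{\infty} e^{-r^2/2}\, r\,dr\,d\theta \tag{5}$$
$$= \frac{1}{2\pi} \int_0^{2\pi} \left.-e^{-r^2/2}\right|_0^{\infty} d\theta = \frac{1}{2\pi} \int_0^{2\pi} d\theta = 1 \tag{6}$$
Since $I \ge 0$, it follows that $I = 1$.

Problem 3.5.10 Solution

This problem is mostly calculus and only a little probability. From the problem statement, the SNR $Y$ is an exponential $(1/\gamma)$ random variable with PDF
$$f_Y(y) = \begin{cases} (1/\gamma)e^{-y/\gamma} & y \ge 0, \\ 0 & \text{otherwise.} \end{cases} \tag{1}$$
Thus, from the problem statement, the BER is
$$P_e = E[P_e(Y)] = \int_{-\infty}^{\infty} Q(\sqrt{2y})\, f_Y(y)\,dy = \int_0^{\infty} Q(\sqrt{2y})\, \frac{1}{\gamma} e^{-y/\gamma}\,dy \tag{2}$$
Like most integrals with exponential factors, it's a good idea to try integration by parts. Before doing so, we recall that if $X$ is a Gaussian $(0, 1)$ random variable with CDF $F_X(x)$, then
$$Q(x) = 1 - F_X(x).$$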
(3)

It follows that $Q(x)$ has derivative
$$Q'(x) = \frac{dQ(x)}{dx} = -\frac{dF_X(x)}{dx} = -f_X(x) = -\frac{1}{\sqrt{2\pi}} e^{-x^2/2} \tag{4}$$
To solve the integral, we use the integration by parts formula $\int_a^b u\,dv = uv\big|_a^b - \int_a^b v\,du$, where
$$u = Q(\sqrt{2y}), \qquad dv = \frac{1}{\gamma} e^{-y/\gamma}\,dy \tag{5}$$
$$du = Q'(\sqrt{2y})\frac{1}{\sqrt{2y}}\,dy = -\frac{e^{-y}}{2\sqrt{\pi y}}\,dy, \qquad v = -e^{-y/\gamma} \tag{6}$$
From integration by parts, it follows that
$$P_e = uv\big|_0^{\infty} - \int_0^{\infty} v\,du = \left.-Q(\sqrt{2y})\,e^{-y/\gamma}\right|_0^{\infty} - \int_0^{\infty} \frac{1}{2\sqrt{\pi y}}\, e^{-y[1 + (1/\gamma)]}\,dy \tag{7}$$
$$= 0 + Q(0)e^{-0} - \frac{1}{2\sqrt{\pi}} \int_0^{\infty} y^{-1/2} e^{-y/\bar{\gamma}}\,dy \tag{8}$$
where $\bar{\gamma} = \gamma/(1 + \gamma)$. Next, recalling that $Q(0) = 1/2$ and making the substitution $t = y/\bar{\gamma}$, we obtain
$$P_e = \frac{1}{2} - \frac{1}{2}\sqrt{\frac{\bar{\gamma}}{\pi}} \int_0^{\infty} t^{-1/2} e^{-t}\,dt \tag{9}$$
From Math Fact B.11, we see that the remaining integral is the $\Gamma(z)$ function evaluated at $z = 1/2$. Since $\Gamma(1/2) = \sqrt{\pi}$,
$$P_e = \frac{1}{2} - \frac{1}{2}\sqrt{\frac{\bar{\gamma}}{\pi}}\,\Gamma(1/2) = \frac{1}{2}\left[1 - \sqrt{\bar{\gamma}}\right] = \frac{1}{2}\left[1 - \sqrt{\frac{\gamma}{1 + \gamma}}\right] \tag{10}$$

Problem 3.6.1 Solution

(a) Using the given CDF,
$$P[X < -1] = F_X(-1^-) = 0 \tag{1}$$
$$P[X \le -1] = F_X(-1) = -1/3 + 1/3 = 0 \tag{2}$$
where $F_X(-1^-)$ denotes the limiting value of the CDF found by approaching $-1$ from the left. Likewise, $F_X(-1^+)$ is interpreted to be the value of the CDF found by approaching $-1$ from the right. We notice that these two probabilities are the same and therefore the probability that $X$ is exactly $-1$ is zero.

(b)
$$P[X < 0] = F_X(0^-) = 1/3 \tag{3}$$
$$P[X \le 0] = F_X(0) = 2/3 \tag{4}$$
Here we see that there is a discrete jump at $X = 0$. Approached from the left, the CDF yields a value of 1/3, but approached from the right the value is 2/3. This means that there is a nonzero probability that $X = 0$; in fact, that probability is the difference of the two values.
$$P[X = 0] = P[X \le 0] - P[X < 0] = 2/3 - 1/3 = 1/3 \tag{5}$$

(c)
$$P[0 < X \le 1] = F_X(1) - F_X(0^+) = 1 - 2/3 = 1/3 \tag{6}$$
$$P[0 \le X \le 1] = F_X(1) - F_X(0^-) = 1 - 1/3 = 2/3 \tag{7}$$
The difference in the last two probabilities above is that the first was concerned with the probability that $X$ was strictly greater than 0, and the second with the probability that $X$ was greater than or equal to zero.
Since the second probability is that of a larger set (it includes the event $X = 0$), it should always be greater than or equal to the first probability. The two differ by the probability that $X = 0$, and this difference is nonzero only when the random variable exhibits a discrete jump in the CDF.

Problem 3.6.2 Solution

Similar to the previous problem, we find

(a)
$$P[X < -1] = F_X(-1^-) = 0, \qquad P[X \le -1] = F_X(-1) = 1/4 \tag{1}$$
Here we notice the discontinuity of value 1/4 at $x = -1$.

(b)
$$P[X < 0] = F_X(0^-) = 1/2, \qquad P[X \le 0] = F_X(0) = 1/2 \tag{2}$$
Since there is no discontinuity at $x = 0$, $F_X(0^-) = F_X(0^+) = F_X(0)$.

(c)
$$P[X > 1] = 1 - P[X \le 1] = 1 - F_X(1) = 0 \tag{3}$$
$$P[X \ge 1] = 1 - P[X < 1] = 1 - F_X(1^-) = 1 - 3/4 = 1/4 \tag{4}$$
Again we notice a discontinuity of size 1/4, here occurring at $x = 1$.

Problem 3.6.3 Solution

(a) By taking the derivative of the CDF $F_X(x)$ given in Problem 3.6.2, we obtain the PDF
$$f_X(x) = \begin{cases} \dfrac{\delta(x+1)}{4} + \dfrac{1}{4} + \dfrac{\delta(x-1)}{4} & -1 \le x \le 1 \\ 0 & \text{otherwise} \end{cases} \tag{1}$$

(b) The first moment of $X$ is
$$E[X] = \int_{-\infty}^{\infty} x f_X(x)\,dx \tag{2}$$
$$= \left.\frac{x}{4}\right|_{x=-1} + \left.\frac{x^2}{8}\right|_{-1}^{1} + \left.\frac{x}{4}\right|_{x=1} = -1/4 + 0 + 1/4 = 0. \tag{3}$$

(c) The second moment of $X$ is
$$E[X^2] = \int_{-\infty}^{\infty} x^2 f_X(x)\,dx \tag{4}$$
$$= \left.\frac{x^2}{4}\right|_{x=-1} + \left.\frac{x^3}{12}\right|_{-1}^{1} + \left.\frac{x^2}{4}\right|_{x=1} = 1/4 + 1/6 + 1/4 = 2/3. \tag{5}$$
Since $E[X] = 0$, $\operatorname{Var}[X] = E[X^2] = 2/3$.

Problem 3.6.4 Solution

The PMF of a Bernoulli random variable with mean $p$ is
$$P_X(x) = \begin{cases} 1 - p & x = 0 \\ p & x = 1 \\ 0 & \text{otherwise} \end{cases} \tag{1}$$
The corresponding PDF of this discrete random variable is
$$f_X(x) = (1 - p)\delta(x) + p\,\delta(x - 1) \tag{2}$$

Problem 3.6.5 Solution

The PMF of a geometric random variable with mean $1/p$ is
$$P_X(x) = \begin{cases} p(1-p)^{x-1} & x = 1, 2, \ldots \\ 0 & \text{otherwise} \end{cases} \tag{1}$$
The corresponding PDF is
$$f_X(x) = p\,\delta(x - 1) + p(1-p)\,\delta(x - 2) + \cdots \tag{2}$$
$$= \sum_{j=1}^{\infty} p(1-p)^{j-1}\,\delta(x - j) \tag{3}$$

Problem 3.6.6 Solution

(a) Since the conversation time cannot be negative, we know that $F_W(w) = 0$ for $w < 0$. The conversation time $W$ is zero if and only if the phone is busy, no one answers, or the conversation time $X$ of a completed call is zero.
Let $A$ be the event that the call is answered. Note that the event $A^c$ implies $W = 0$. For $w \ge 0$,
$$F_W(w) = P[A^c] + P[A]\,F_{W|A}(w) = (1/2) + (1/2)F_X(w) \tag{1}$$
Thus the complete CDF of $W$ is
$$F_W(w) = \begin{cases} 0 & w < 0 \\ 1/2 + (1/2)F_X(w) & w \ge 0 \end{cases} \tag{2}$$

(b) By taking the derivative of $F_W(w)$, the PDF of $W$ is
$$f_W(w) = \begin{cases} (1/2)\delta(w) + (1/2)f_X(w) & w \ge 0 \\ 0 & \text{otherwise} \end{cases} \tag{3}$$
Next, we keep in mind that since $X$ must be nonnegative, $f_X(x) = 0$ for $x < 0$. Hence,
$$f_W(w) = (1/2)\delta(w) + (1/2)f_X(w) \tag{4}$$

(c) From the PDF $f_W(w)$, calculating the moments is straightforward.
$$E[W] = \int_{-\infty}^{\infty} w f_W(w)\,dw = (1/2)\int_{-\infty}^{\infty} w f_X(w)\,dw = E[X]/2 \tag{5}$$
The second moment is
$$E[W^2] = \int_{-\infty}^{\infty} w^2 f_W(w)\,dw = (1/2)\int_{-\infty}^{\infty} w^2 f_X(w)\,dw = E[X^2]/2 \tag{6}$$
The variance of $W$ is
$$\operatorname{Var}[W] = E[W^2] - (E[W])^2 = E[X^2]/2 - (E[X]/2)^2 \tag{7}$$
$$= (1/2)\operatorname{Var}[X] + (E[X])^2/4 \tag{8}$$

Problem 3.6.7 Solution

The professor is on time 80 percent of the time and when he is late his arrival time is uniformly distributed between 0 and 300 seconds. The PDF of $T$ is
$$f_T(t) = \begin{cases} 0.8\,\delta(t - 0) + \dfrac{0.2}{300} & 0 \le t \le 300 \\ 0 & \text{otherwise} \end{cases} \tag{1}$$
The CDF can be found by integrating:
$$F_T(t) = \begin{cases} 0 & t < 0 \\ 0.8 + \dfrac{0.2t}{300} & 0 \le t < 300 \\ 1 & t \ge 300 \end{cases} \tag{2}$$

Problem 3.6.8 Solution

Let $G$ denote the event that the throw is good, that is, no foul occurs. The CDF of $D$ obeys
$$F_D(y) = P[D \le y \mid G]\,P[G] + P[D \le y \mid G^c]\,P[G^c] \tag{1}$$
Given the event $G$,
$$P[D \le y \mid G] = P[X \le y - 60] = 1 - e^{-(y-60)/10} \qquad (y \ge 60) \tag{2}$$
Of course, for $y < 60$, $P[D \le y \mid G] = 0$. From the problem statement, if the throw is a foul, then $D = 0$. This implies
$$P[D \le y \mid G^c] = u(y) \tag{3}$$
where $u(\cdot)$ denotes the unit step function. Since $P[G] = 0.7$, we can write
$$F_D(y) = P[G]\,P[D \le y \mid G] + P[G^c]\,P[D \le y \mid G^c] \tag{4}$$
$$= \begin{cases} 0.3\,u(y) & y < 60 \\ 0.3 + 0.7\left(1 - e^{-(y-60)/10}\right) & y \ge 60 \end{cases} \tag{5}$$
Another way to write this CDF is
$$F_D(y) = 0.3\,u(y) + 0.7\,u(y - 60)\left(1 - e^{-(y-60)/10}\right) \tag{6}$$
When we take the derivative, either expression for the CDF will yield the PDF.
However, taking the derivative of the first expression may be simpler:
$$f_D(y) = \begin{cases} 0.3\,\delta(y) & y < 60 \\ 0.07\,e^{-(y-60)/10} & y \ge 60 \end{cases} \tag{7}$$
Taking the derivative of the second expression for the CDF is a little tricky because of the product of the exponential and the step function. However, applying the usual rule for the differentiation of a product does give the correct answer:
$$f_D(y) = 0.3\,\delta(y) + 0.7\,\delta(y - 60)\left(1 - e^{-(y-60)/10}\right) + 0.07\,u(y - 60)\,e^{-(y-60)/10} \tag{8}$$
$$= 0.3\,\delta(y) + 0.07\,u(y - 60)\,e^{-(y-60)/10} \tag{9}$$
The middle term $\delta(y - 60)(1 - e^{-(y-60)/10})$ dropped out because at $y = 60$, $e^{-(y-60)/10} = 1$.

Problem 3.6.9 Solution

The professor is on time and lectures the full 80 minutes with probability 0.7. In terms of math,
$$P[T = 80] = 0.7. \tag{1}$$
Likewise, when the professor is more than 5 minutes late, the students leave and a 0 minute lecture is observed. Since he is late 30% of the time and, given that he is late, his arrival is uniformly distributed between 0 and 10 minutes, the probability that there is no lecture is
$$P[T = 0] = (0.3)(0.5) = 0.15 \tag{2}$$
The only other possible lecture durations are uniformly distributed between 75 and 80 minutes, because the students will not wait longer than 5 minutes, and that probability must bring the total to 1, so it equals $1 - 0.7 - 0.15 = 0.15$. So the PDF of $T$ can be written as
$$f_T(t) = \begin{cases} 0.15\,\delta(t) & t = 0 \\ 0.03 & 75 \le t < 80 \\ 0.7\,\delta(t - 80) & t = 80 \\ 0 & \text{otherwise} \end{cases} \tag{3}$$

Problem 3.7.1 Solution

Since $0 \le X \le 1$, $Y = X^2$ satisfies $0 \le Y \le 1$. We can conclude that $F_Y(y) = 0$ for $y < 0$ and that $F_Y(y) = 1$ for $y \ge 1$. For $0 \le y < 1$,
$$F_Y(y) = P[X^2 \le y] = P[X \le \sqrt{y}] \tag{1}$$
Since $f_X(x) = 1$ for $0 \le x \le 1$, we see that for $0 \le y < 1$,
$$P[X \le \sqrt{y}] = \int_0^{\sqrt{y}} dx = \sqrt{y} \tag{2}$$
Hence, the CDF of $Y$ is
$$F_Y(y) = \begin{cases} 0 & y < 0 \\ \sqrt{y} & 0 \le y < 1 \\ 1 & y \ge 1 \end{cases} \tag{3}$$
By taking the derivative of the CDF, we obtain the PDF
$$f_Y(y) = \begin{cases} 1/(2\sqrt{y}) & 0 \le y < 1 \\ 0 & \text{otherwise} \end{cases} \tag{4}$$

Problem 3.7.2 Solution

Since $Y = \sqrt{X}$, the fact that $X$ is nonnegative and that we assume the square root is always positive implies $F_Y(y) = 0$ for $y < 0$.
In addition, for $y \ge 0$, we can find the CDF of $Y$ by writing
$$F_Y(y) = P[Y \le y] = P\left[\sqrt{X} \le y\right] = P[X \le y^2] = F_X(y^2) \tag{1}$$
For $x \ge 0$, $F_X(x) = 1 - e^{-\lambda x}$. Thus,
$$F_Y(y) = \begin{cases} 1 - e^{-\lambda y^2} & y \ge 0 \\ 0 & \text{otherwise} \end{cases} \tag{2}$$
By taking the derivative with respect to $y$, it follows that the PDF of $Y$ is
$$f_Y(y) = \begin{cases} 2\lambda y\, e^{-\lambda y^2} & y \ge 0 \\ 0 & \text{otherwise} \end{cases} \tag{3}$$
In comparing this result to the Rayleigh PDF given in Appendix A, we observe that $Y$ is a Rayleigh $(a)$ random variable with $a = \sqrt{2\lambda}$.

Problem 3.7.3 Solution

Since $X$ is nonnegative, $W = X^2$ is also nonnegative. Hence for $w < 0$, $f_W(w) = 0$. For $w \ge 0$,
$$F_W(w) = P[W \le w] = P[X^2 \le w] \tag{1}$$
$$= P[X \le \sqrt{w}] \tag{2}$$
$$= 1 - e^{-\lambda\sqrt{w}} \tag{3}$$
Taking the derivative with respect to $w$ yields $f_W(w) = \lambda e^{-\lambda\sqrt{w}}/(2\sqrt{w})$. The complete expression for the PDF is
$$f_W(w) = \begin{cases} \dfrac{\lambda e^{-\lambda\sqrt{w}}}{2\sqrt{w}} & w \ge 0 \\ 0 & \text{otherwise} \end{cases} \tag{4}$$

Problem 3.7.4 Solution

From Problem 3.6.1, random variable $X$ has CDF
$$F_X(x) = \begin{cases} 0 & x < -1 \\ x/3 + 1/3 & -1 \le x < 0 \\ x/3 + 2/3 & 0 \le x < 1 \\ 1 & 1 \le x \end{cases} \tag{1}$$

(a) We can find the CDF of $Y$, $F_Y(y)$, by noting that $Y$ can only take on two possible values, 0 and 100. The probabilities that $Y$ takes on these two values are the probabilities that $X < 0$ and $X \ge 0$, respectively. Therefore
$$F_Y(y) = P[Y \le y] = \begin{cases} 0 & y < 0 \\ P[X < 0] & 0 \le y < 100 \\ 1 & y \ge 100 \end{cases} \tag{2}$$
The probabilities concerned with $X$ can be found from the given CDF $F_X(x)$. This is the general strategy for solving problems of this type: to express the CDF of $Y$ in terms of the CDF of $X$. Since $P[X < 0] = F_X(0^-) = 1/3$, the CDF of $Y$ is
$$F_Y(y) = P[Y \le y] = \begin{cases} 0 & y < 0 \\ 1/3 & 0 \le y < 100 \\ 1 & y \ge 100 \end{cases} \tag{3}$$

(b) The CDF $F_Y(y)$ has jumps of 1/3 at $y = 0$ and 2/3 at $y = 100$. The corresponding PDF of $Y$ is
$$f_Y(y) = \delta(y)/3 + 2\delta(y - 100)/3 \tag{4}$$

(c) The expected value of $Y$ is
$$E[Y] = \int_{-\infty}^{\infty} y f_Y(y)\,dy = 0 \cdot \frac{1}{3} + 100 \cdot \frac{2}{3} = 66.67 \tag{5}$$

Problem 3.7.5 Solution

Before solving for the PDF, it is helpful to have a sketch of the function $X = -\ln(1 - U)$.

[Sketch: $X = -\ln(1 - U)$ versus $U$ for $0 \le U \le 1$; $X$ increases from 0 and grows without bound as $U \to 1$.]

(a) From the sketch, we observe that $X$ will be nonnegative. Hence $F_X(x) = 0$ for $x < 0$.
Since $U$ has a uniform distribution on $[0, 1]$, for $0 \le u \le 1$, $P[U \le u] = u$. We use this fact to find the CDF of $X$. For $x \ge 0$,
$$F_X(x) = P[-\ln(1 - U) \le x] = P[1 - U \ge e^{-x}] = P[U \le 1 - e^{-x}] \tag{1}$$
For $x \ge 0$, $0 \le 1 - e^{-x} \le 1$ and so
$$F_X(x) = F_U(1 - e^{-x}) = 1 - e^{-x} \tag{2}$$
The complete CDF can be written as
$$F_X(x) = \begin{cases} 0 & x < 0 \\ 1 - e^{-x} & x \ge 0 \end{cases} \tag{3}$$

(b) By taking the derivative, the PDF is
$$f_X(x) = \begin{cases} e^{-x} & x \ge 0 \\ 0 & \text{otherwise} \end{cases} \tag{4}$$
Thus, $X$ has an exponential PDF. In fact, since most computer languages provide uniform $[0, 1]$ random numbers, the procedure outlined in this problem provides a way to generate exponential random variables from uniform random variables.

(c) Since $X$ is an exponential random variable with parameter $a = 1$, $E[X] = 1$.

Problem 3.7.6 Solution

We wish to find a transformation that takes a uniformly distributed random variable on $[0, 1]$ to the following PDF for $Y$.
$$f_Y(y) = \begin{cases} 3y^2 & 0 \le y \le 1 \\ 0 & \text{otherwise} \end{cases} \tag{1}$$
We begin by realizing that in this case the CDF of $Y$ must be
$$F_Y(y) = \begin{cases} 0 & y < 0 \\ y^3 & 0 \le y \le 1 \\ 1 & y > 1 \end{cases} \tag{2}$$
Therefore, for $0 \le y \le 1$,
$$P[Y \le y] = P[g(X) \le y] = y^3 \tag{3}$$
Thus, using $g(X) = X^{1/3}$, we see that for $0 \le y \le 1$,
$$P[g(X) \le y] = P[X^{1/3} \le y] = P[X \le y^3] = y^3 \tag{4}$$
which is the desired answer.
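
The two transformations in Problems 3.7.5 and 3.7.6 can be checked numerically. The following is a minimal Python sketch, not part of the original solutions (which refer to Matlab); the function names and the helper `sample_mean` are hypothetical and chosen only for illustration. It applies $X = -\ln(1-U)$ and $Y = U^{1/3}$ to uniform samples and compares the sample means with $E[X] = 1$ and $E[Y] = \int_0^1 y \cdot 3y^2\,dy = 3/4$.

```python
import math
import random


def exponential_from_uniform(u: float) -> float:
    """Map u in [0, 1) to X = -ln(1 - u), the transformation of Problem 3.7.5."""
    return -math.log(1.0 - u)


def cube_root_from_uniform(u: float) -> float:
    """Map u in [0, 1] to Y = u**(1/3), the transformation of Problem 3.7.6."""
    return u ** (1.0 / 3.0)


def sample_mean(transform, n: int, seed: int = 0) -> float:
    """Average transform(U) over n uniform draws (hypothetical helper)."""
    rng = random.Random(seed)  # fixed seed so repeated runs agree
    return sum(transform(rng.random()) for _ in range(n)) / n


if __name__ == "__main__":
    # Sample means should be near E[X] = 1 and E[Y] = 3/4 respectively.
    print(sample_mean(exponential_from_uniform, 100_000))
    print(sample_mean(cube_root_from_uniform, 100_000))
```

Because `random.Random.random()` returns values in $[0, 1)$, the argument of the logarithm is never zero, which mirrors the observation that the endpoints of the uniform range carry zero probability.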
Problem 3.7.7 Solution

Since the microphone voltage $V$ is uniformly distributed between $-1$ and 1 volts, $V$ has PDF and CDF
$$f_V(v) = \begin{cases} 1/2 & -1 \le v \le 1 \\ 0 & \text{otherwise} \end{cases} \qquad F_V(v) = \begin{cases} 0 & v < -1 \\ (v+1)/2 & -1 \le v \le 1 \\ 1 & v > 1 \end{cases} \tag{1}$$
The voltage is processed by a limiter whose output magnitude is given by
$$L = \begin{cases} |V| & |V| \le 0.5 \\ 0.5 & \text{otherwise} \end{cases} \tag{2}$$

(a)
$$P[L = 0.5] = P[|V| \ge 0.5] = P[V \ge 0.5] + P[V \le -0.5] \tag{3}$$
$$= 1 - F_V(0.5) + F_V(-0.5) \tag{4}$$
$$= 1 - 1.5/2 + 0.5/2 = 1/2 \tag{5}$$

(b) For $0 \le l < 0.5$,
$$F_L(l) = P[|V| \le l] = P[-l \le V \le l] = F_V(l) - F_V(-l) \tag{6}$$
$$= (1/2)(l + 1) - (1/2)(-l + 1) = l \tag{7}$$
So the CDF of $L$ is
$$F_L(l) = \begin{cases} 0 & l < 0 \\ l & 0 \le l < 0.5 \\ 1 & l \ge 0.5 \end{cases} \tag{8}$$

(c) By taking the derivative of $F_L(l)$, the PDF of $L$ is
$$f_L(l) = \begin{cases} 1 + (0.5)\delta(l - 0.5) & 0 \le l \le 0.5 \\ 0 & \text{otherwise} \end{cases} \tag{9}$$
The expected value of $L$ is
$$E[L] = \int_{-\infty}^{\infty} l f_L(l)\,dl = \int_0^{0.5} l\,dl + \int_0^{0.5} l(0.5)\delta(l - 0.5)\,dl = 0.125 + 0.25 = 0.375 \tag{10}$$

Problem 3.7.8 Solution

Let $X$ denote the position of the pointer and $Y$ denote the area within the arc defined by the stopping position of the pointer.

(a) If the disc has radius $r$, then the area of the disc is $\pi r^2$. Since the circumference of the disc is 1 and $X$ is measured around the circumference, $Y = \pi r^2 X$. For example, when $X = 1$, the shaded area is the whole disc and $Y = \pi r^2$. Similarly, if $X = 1/2$, then $Y = \pi r^2/2$ is half the area of the disc. Since the disc has circumference 1, $r = 1/(2\pi)$ and
$$Y = \pi r^2 X = \frac{X}{4\pi} \tag{1}$$

(b) The CDF of $Y$ can be expressed as
$$F_Y(y) = P[Y \le y] = P\left[\frac{X}{4\pi} \le y\right] = P[X \le 4\pi y] = F_X(4\pi y) \tag{2}$$
Therefore the CDF is
$$F_Y(y) = \begin{cases} 0 & y < 0 \\ 4\pi y & 0 \le y \le \frac{1}{4\pi} \\ 1 & y > \frac{1}{4\pi} \end{cases} \tag{3}$$

(c) By taking the derivative of the CDF, the PDF of $Y$ is
$$f_Y(y) = \begin{cases} 4\pi & 0 \le y \le \frac{1}{4\pi} \\ 0 & \text{otherwise} \end{cases} \tag{4}$$

(d) The expected value of $Y$ is $E[Y] = \int_0^{1/(4\pi)} 4\pi y\,dy = 1/(8\pi)$.

Problem 3.7.9 Solution

The uniform $(0, 2)$ random variable $U$ has PDF and CDF
$$f_U(u) = \begin{cases} 1/2 & 0 \le u \le 2, \\ 0 & \text{otherwise,} \end{cases} \qquad F_U(u) = \begin{cases} 0 & u < 0, \\ u/2 & 0 \le u < 2, \\ 1 & u \ge 2. \end{cases} \tag{1}$$
The uniform random variable $U$ is subjected to the following clipper.
$$W = g(U) = \begin{cases} U & U \le 1 \\ 1 & U > 1 \end{cases} \tag{2}$$
To find the CDF of the output of the clipper, $W$, we remember that $W = U$ for $0 \le U \le 1$ while $W = 1$ for $1 \le U \le 2$. First, this implies $W$ is nonnegative, i.e., $F_W(w) = 0$ for $w < 0$. Furthermore, for $0 \le w < 1$,
$$F_W(w) = P[W \le w] = P[U \le w] = F_U(w) = w/2 \tag{3}$$
Lastly, we observe that it is always true that $W \le 1$. This implies $F_W(w) = 1$ for $w \ge 1$. Therefore the CDF of $W$ is
$$F_W(w) = \begin{cases} 0 & w < 0 \\ w/2 & 0 \le w < 1 \\ 1 & w \ge 1 \end{cases} \tag{4}$$
From the jump in the CDF at $w = 1$, we see that $P[W = 1] = 1/2$. The corresponding PDF can be found by taking the derivative and using the delta function to model the discontinuity.
$$f_W(w) = \begin{cases} 1/2 + (1/2)\delta(w - 1) & 0 \le w \le 1 \\ 0 & \text{otherwise} \end{cases} \tag{5}$$
The expected value of $W$ is
$$E[W] = \int_{-\infty}^{\infty} w f_W(w)\,dw = \int_0^1 w[1/2 + (1/2)\delta(w - 1)]\,dw \tag{6}$$
$$= 1/4 + 1/2 = 3/4. \tag{7}$$

Problem 3.7.10 Solution

Given the following function of random variable $X$,
$$Y = g(X) = \begin{cases} 10 & X < 0 \\ -10 & X \ge 0 \end{cases} \tag{1}$$
we follow the same procedure as in Problem 3.7.4. We attempt to express the CDF of $Y$ in terms of the CDF of $X$. We know that $Y$ is never less than $-10$. We also know that $Y = -10$ when $X \ge 0$, and finally, that $Y = 10$ when $X < 0$. Therefore
$$F_Y(y) = P[Y \le y] = \begin{cases} 0 & y < -10 \\ P[X \ge 0] = 1 - F_X(0^-) & -10 \le y < 10 \\ 1 & y \ge 10 \end{cases} \tag{2}$$
When $F_X(x)$ is continuous at $x = 0$, $F_X(0^-) = F_X(0)$ and the middle entry is simply $1 - F_X(0)$.

Problem 3.7.11 Solution

The PDF of $U$ is
$$f_U(u) = \begin{cases} 1/2 & -1 \le u \le 1 \\ 0 & \text{otherwise} \end{cases} \tag{1}$$
Since $W \ge 0$, we see that $F_W(w) = 0$ for $w < 0$. Next, we observe that the rectifier output $W$ is a mixed random variable since
$$P[W = 0] = P[U < 0] = \int_{-1}^{0} f_U(u)\,du = 1/2 \tag{2}$$
The above facts imply that
$$F_W(0) = P[W \le 0] = P[W = 0] = 1/2 \tag{3}$$
Next, we note that for $0 < w < 1$,
$$F_W(w) = P[U \le w] = \int_{-1}^{w} f_U(u)\,du = (w + 1)/2 \tag{4}$$
Finally, $U \le 1$ implies $W \le 1$, which implies $F_W(w) = 1$ for $w \ge 1$. Hence, the complete expression for the CDF is
$$F_W(w) = \begin{cases} 0 & w < 0 \\ (w + 1)/2 & 0 \le w \le 1 \\ 1 & w > 1 \end{cases} \tag{5}$$
By taking the derivative of the CDF, we find the PDF of $W$; however, we must keep in mind that the discontinuity in the CDF at $w = 0$ yields a corresponding impulse in the PDF.
$$f_W(w) = \begin{cases} (\delta(w) + 1)/2 & 0 \le w \le 1 \\ 0 & \text{otherwise} \end{cases} \tag{6}$$
From the PDF, we can calculate the expected value
$$E[W] = \int_0^1 w(\delta(w) + 1)/2\,dw = 0 + \int_0^1 (w/2)\,dw = 1/4 \tag{7}$$
Perhaps an easier way to find the expected value is to use Theorem 2.10. In this case,
$$E[W] = \int_{-\infty}^{\infty} g(u) f_U(u)\,du = \int_0^1 u(1/2)\,du = 1/4 \tag{8}$$
As we expect, both approaches give the same answer.

Problem 3.7.12 Solution

Theorem 3.19 states that for a constant $a > 0$, $Y = aX$ has CDF and PDF
$$F_Y(y) = F_X(y/a), \qquad f_Y(y) = \frac{1}{a} f_X(y/a) \tag{1}$$

(a) If $X$ is uniform $(b, c)$, then $Y = aX$ has PDF
$$f_Y(y) = \frac{1}{a} f_X(y/a) = \begin{cases} \dfrac{1}{a(c-b)} & b \le y/a \le c \\ 0 & \text{otherwise} \end{cases} = \begin{cases} \dfrac{1}{ac - ab} & ab \le y \le ac \\ 0 & \text{otherwise} \end{cases} \tag{2}$$
Thus $Y$ has the PDF of a uniform $(ab, ac)$ random variable.

(b) Using Theorem 3.19, the PDF of $Y = aX$ is
$$f_Y(y) = \frac{1}{a} f_X(y/a) = \begin{cases} \dfrac{\lambda}{a} e^{-\lambda(y/a)} & y/a \ge 0 \\ 0 & \text{otherwise} \end{cases} \tag{3}$$
$$= \begin{cases} (\lambda/a)e^{-(\lambda/a)y} & y \ge 0 \\ 0 & \text{otherwise} \end{cases} \tag{4}$$
Hence $Y$ is an exponential $(\lambda/a)$ random variable.

(c) Using Theorem 3.19, the PDF of $Y = aX$ is
$$f_Y(y) = \frac{1}{a} f_X(y/a) = \begin{cases} \dfrac{\lambda^n (y/a)^{n-1} e^{-\lambda(y/a)}}{a(n-1)!} & y/a \ge 0 \\ 0 & \text{otherwise} \end{cases} \tag{5}$$
$$= \begin{cases} \dfrac{(\lambda/a)^n y^{n-1} e^{-(\lambda/a)y}}{(n-1)!} & y \ge 0, \\ 0 & \text{otherwise,} \end{cases} \tag{6}$$
which is an Erlang $(n, \lambda/a)$ PDF.

(d) If $X$ is a Gaussian $(\mu, \sigma)$ random variable, then $Y = aX$ has PDF
$$f_Y(y) = \frac{1}{a} f_X(y/a) = \frac{1}{a\sqrt{2\pi\sigma^2}} e^{-((y/a) - \mu)^2/2\sigma^2} \tag{7}$$
$$= \frac{1}{\sqrt{2\pi a^2\sigma^2}} e^{-(y - a\mu)^2/2(a^2\sigma^2)} \tag{8}$$
Thus $Y$ is a Gaussian random variable with expected value $E[Y] = a\mu$ and $\operatorname{Var}[Y] = a^2\sigma^2$. That is, $Y$ is a Gaussian $(a\mu, a\sigma)$ random variable.
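
The scaling rule of Theorem 3.19 used throughout Problem 3.7.12 can be spot-checked numerically. The following is a minimal Python sketch under stated assumptions: the function names (`exponential_pdf`, `gaussian_pdf`, `scaled_pdf`) are hypothetical and not from the text or its Matlab archive. It builds $f_Y(y) = (1/a) f_X(y/a)$ from a given $f_X$ and compares it with the closed forms found in parts (b) and (d).

```python
import math


def exponential_pdf(x: float, lam: float) -> float:
    """PDF of an exponential (lam) random variable."""
    return lam * math.exp(-lam * x) if x >= 0 else 0.0


def gaussian_pdf(x: float, mu: float, sigma: float) -> float:
    """PDF of a Gaussian (mu, sigma) random variable; sigma is the standard deviation."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def scaled_pdf(pdf_x, a: float):
    """Return the PDF of Y = aX for a > 0, i.e. f_Y(y) = f_X(y/a)/a."""
    if a <= 0:
        raise ValueError("this sketch covers only a > 0")
    return lambda y: pdf_x(y / a) / a


if __name__ == "__main__":
    a, lam = 4.0, 2.0
    f_y = scaled_pdf(lambda x: exponential_pdf(x, lam), a)
    # Part (b): Y = aX should be exponential with parameter lam/a.
    print(f_y(1.3), exponential_pdf(1.3, lam / a))

    mu, sigma = 1.0, 2.0
    g_y = scaled_pdf(lambda x: gaussian_pdf(x, mu, sigma), a)
    # Part (d): Y = aX should be Gaussian (a*mu, a*sigma).
    print(g_y(0.7), gaussian_pdf(0.7, a * mu, a * sigma))
```

Each printed pair should agree to floating-point precision. The same wrapper could be applied to the uniform and Erlang PDFs of parts (a) and (c).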
Problem 3.7.13 Solution

If $X$ has a uniform distribution from 0 to 1 then the PDF and corresponding CDF of $X$ are

$$f_X(x) = \begin{cases} 1 & 0 \le x \le 1, \\ 0 & \text{otherwise} \end{cases} \qquad F_X(x) = \begin{cases} 0 & x < 0, \\ x & 0 \le x \le 1, \\ 1 & x > 1. \end{cases} \tag{1}$$

For $b - a > 0$, we can find the CDF of the function $Y = a + (b - a)X$. For $a \le y \le b$,

$$F_Y(y) = P[Y \le y] = P[a + (b - a)X \le y] \tag{2}$$
$$= P\left[X \le \frac{y - a}{b - a}\right] \tag{3}$$
$$= F_X\left(\frac{y - a}{b - a}\right) = \frac{y - a}{b - a} \tag{4}$$

Therefore the CDF of $Y$ is

$$F_Y(y) = \begin{cases} 0 & y < a, \\ \frac{y - a}{b - a} & a \le y \le b, \\ 1 & y > b. \end{cases} \tag{5}$$

By differentiating with respect to $y$ we arrive at the PDF

$$f_Y(y) = \begin{cases} 1/(b - a) & a \le y \le b, \\ 0 & \text{otherwise}, \end{cases} \tag{6}$$

which we recognize as the PDF of a uniform $(a, b)$ random variable.

Problem 3.7.14 Solution

Since $X = F^{-1}(U)$, it is desirable that the function $F^{-1}(u)$ exist for all $0 \le u \le 1$. However, for the continuous uniform random variable $U$, $P[U = 0] = P[U = 1] = 0$. Thus, it is a zero probability event that $F^{-1}(U)$ will be evaluated at $U = 0$ or $U = 1$. As a result, it doesn't matter whether $F^{-1}(u)$ exists at $u = 0$ or $u = 1$.

Problem 3.7.15 Solution

The relationship between $X$ and $Y$ is shown in the following figure:

[Figure: $Y$ versus $X$, with both axes running from 0 to 3.]

(a) Note that $Y = 1/2$ if and only if $0 \le X \le 1$. Thus,

$$P[Y = 1/2] = P[0 \le X \le 1] = \int_0^1 f_X(x)\,dx = \int_0^1 (x/2)\,dx = 1/4 \tag{1}$$

(b) Since $Y \ge 1/2$, we can conclude that $F_Y(y) = 0$ for $y < 1/2$. Also, $F_Y(1/2) = P[Y = 1/2] = 1/4$. Similarly, for $1/2 < y \le 1$,

$$F_Y(y) = P[0 \le X \le 1] = P[Y = 1/2] = 1/4 \tag{2}$$

Next, for $1 < y \le 2$,

$$F_Y(y) = P[X \le y] = \int_0^y f_X(x)\,dx = y^2/4 \tag{3}$$

Lastly, since $Y \le 2$, $F_Y(y) = 1$ for $y \ge 2$. The complete expression of the CDF is

$$F_Y(y) = \begin{cases} 0 & y < 1/2, \\ 1/4 & 1/2 \le y \le 1, \\ y^2/4 & 1 < y < 2, \\ 1 & y \ge 2. \end{cases} \tag{4}$$

Problem 3.7.16 Solution

We can prove the assertion by considering the cases where $a > 0$ and $a < 0$, respectively.
For the case where $a > 0$ we have

$$F_Y(y) = P[Y \le y] = P\left[X \le \frac{y - b}{a}\right] = F_X\left(\frac{y - b}{a}\right) \tag{1}$$

Therefore by taking the derivative we find that

$$f_Y(y) = \frac{1}{a} f_X\left(\frac{y - b}{a}\right) \qquad a > 0 \tag{2}$$

Similarly for the case when $a < 0$ we have

$$F_Y(y) = P[Y \le y] = P\left[X \ge \frac{y - b}{a}\right] = 1 - F_X\left(\frac{y - b}{a}\right) \tag{3}$$

And by taking the derivative, we find that for negative $a$,

$$f_Y(y) = -\frac{1}{a} f_X\left(\frac{y - b}{a}\right) \qquad a < 0 \tag{4}$$

A valid expression for both positive and negative $a$ is

$$f_Y(y) = \frac{1}{|a|} f_X\left(\frac{y - b}{a}\right) \tag{5}$$

Therefore the assertion is proved.

Problem 3.7.17 Solution

Understanding this claim may be harder than completing the proof. Since $0 \le F(x) \le 1$, we know that $0 \le U \le 1$. This implies $F_U(u) = 0$ for $u < 0$ and $F_U(u) = 1$ for $u \ge 1$. Moreover, since $F(x)$ is an increasing function, we can write for $0 \le u \le 1$,

$$F_U(u) = P[F(X) \le u] = P\left[X \le F^{-1}(u)\right] = F_X\left(F^{-1}(u)\right) \tag{1}$$

Since $F_X(x) = F(x)$, we have for $0 \le u \le 1$,

$$F_U(u) = F(F^{-1}(u)) = u \tag{2}$$

Hence the complete CDF of $U$ is

$$F_U(u) = \begin{cases} 0 & u < 0, \\ u & 0 \le u < 1, \\ 1 & u \ge 1. \end{cases} \tag{3}$$

That is, $U$ is a uniform $[0, 1]$ random variable.

Problem 3.7.18 Solution

(a) Given $F_X(x)$ is a continuous function, there exists $x_0$ such that $F_X(x_0) = u$. For each value of $u$, the corresponding $x_0$ is unique. To see this, suppose there were also $x_1$ such that $F_X(x_1) = u$. Without loss of generality, we can assume $x_1 > x_0$ since otherwise we could exchange the points $x_0$ and $x_1$. Since $F_X(x_0) = F_X(x_1) = u$, the fact that $F_X(x)$ is nondecreasing implies $F_X(x) = u$ for all $x \in [x_0, x_1]$, i.e., $F_X(x)$ is flat over the interval $[x_0, x_1]$, which contradicts the assumption that $F_X(x)$ has no flat intervals. Thus, for any $u \in (0, 1)$, there is a unique $x_0$ such that $F_X(x_0) = u$. Moreover, the same $x_0$ is the minimum of all $x'$ such that $F_X(x') \ge u$. The uniqueness of $x_0$ such that $F_X(x_0) = u$ permits us to define $\tilde{F}(u) = x_0 = F_X^{-1}(u)$.

(b) In this part, we are given that $F_X(x)$ has a jump discontinuity at $x_0$.
That is, there exists $u_0^- = F_X(x_0^-)$ and $u_0^+ = F_X(x_0^+)$ with $u_0^- < u_0^+$. Consider any $u$ in the interval $[u_0^-, u_0^+]$. Since $F_X(x_0) = F_X(x_0^+)$ and $F_X(x)$ is nondecreasing,

$$F_X(x) \ge F_X(x_0) = u_0^+, \qquad x \ge x_0. \tag{1}$$

Moreover,

$$F_X(x) < F_X\left(x_0^-\right) = u_0^-, \qquad x < x_0. \tag{2}$$

Thus for any $u$ satisfying $u_0^- \le u \le u_0^+$, $F_X(x) < u$ for $x < x_0$ and $F_X(x) \ge u$ for $x \ge x_0$. Thus, $\tilde{F}(u) = \min\{x \mid F_X(x) \ge u\} = x_0$.

(c) We note that the first two parts of this problem were just designed to show the properties of $\tilde{F}(u)$. First, we observe that

$$P\left[\hat{X} \le x\right] = P\left[\tilde{F}(U) \le x\right] = P\left[\min\left\{x' \mid F_X(x') \ge U\right\} \le x\right]. \tag{3}$$

To prove the claim, we define, for any $x$, the events

$$A: \min\left\{x' \mid F_X(x') \ge U\right\} \le x, \tag{4}$$
$$B: U \le F_X(x). \tag{5}$$

Note that $P[A] = P[\hat{X} \le x]$. In addition, $P[B] = P[U \le F_X(x)] = F_X(x)$ since $P[U \le u] = u$ for any $u \in [0, 1]$. We will show that the events $A$ and $B$ are the same. This fact implies

$$P\left[\hat{X} \le x\right] = P[A] = P[B] = P[U \le F_X(x)] = F_X(x). \tag{6}$$

All that remains is to show $A$ and $B$ are the same. As always, we need to show that $A \subset B$ and that $B \subset A$.

• To show $A \subset B$, suppose $A$ is true and $\min\{x' \mid F_X(x') \ge U\} \le x$. This implies there exists $x_0 \le x$ such that $F_X(x_0) \ge U$. Since $x_0 \le x$, it follows from $F_X(x)$ being nondecreasing that $F_X(x_0) \le F_X(x)$. We can thus conclude that

$$U \le F_X(x_0) \le F_X(x). \tag{7}$$

That is, event $B$ is true.

• To show $B \subset A$, we suppose event $B$ is true so that $U \le F_X(x)$. We define the set

$$L = \left\{x' \mid F_X(x') \ge U\right\}. \tag{8}$$

We note $x \in L$. It follows that the minimum element $\min\{x' \mid x' \in L\} \le x$. That is,

$$\min\left\{x' \mid F_X(x') \ge U\right\} \le x, \tag{9}$$

which is simply event $A$.
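The construction $\tilde{F}(u) = \min\{x \mid F_X(x) \ge u\}$ can be tried numerically on a grid. The sketch below is only an illustration under stated assumptions: the mixed CDF F (a ramp on $[-1, 1)$ followed by a jump of height 1/2 at $x = 1$), the grid xs, and the sample count m are all choices made here, and the grid search stands in for an exact minimum.

```matlab
% Sketch only: sample X = Ftilde(U) = min{x | F(x) >= U} on a grid.
% The CDF below is an assumed example with a jump of height 1/2 at x = 1.
F  = @(x) (x >= 1) + ((x >= -1) & (x < 1)).*(x + 1)/4;

xs = (-2000:2000)'/1000;     % grid on [-2,2]; contains -1, 0 and 1 exactly
Fx = F(xs);                  % CDF evaluated on the grid (nondecreasing)

m = 5000;                    % number of samples (assumed)
u = rand(m,1);               % uniform (0,1) samples
x = zeros(m,1);
for k = 1:m
    x(k) = xs(find(Fx >= u(k), 1));   % smallest grid point with F(x) >= u
end

p_jump = mean(x == 1);       % estimate of P[X = 1]; the jump height is 1/2
cdf0   = mean(x <= 0);       % estimate of F(0) = 1/4
fprintf('P[X = 1] ~ %.3f, F(0) ~ %.3f\n', p_jump, cdf0);
```

Every $u$ that falls inside the jump interval maps to the same point $x = 1$, which is the behavior established in part (b).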
Problem 3.8.1 Solution

The PDF of $X$ is

$$f_X(x) = \begin{cases} 1/10 & -5 \le x \le 5, \\ 0 & \text{otherwise}. \end{cases} \tag{1}$$

(a) The event $B$ has probability

$$P[B] = P[-3 \le X \le 3] = \int_{-3}^{3} \frac{1}{10}\,dx = \frac{3}{5} \tag{2}$$

From Definition 3.15, the conditional PDF of $X$ given $B$ is

$$f_{X|B}(x) = \begin{cases} f_X(x)/P[B] & x \in B, \\ 0 & \text{otherwise} \end{cases} = \begin{cases} 1/6 & |x| \le 3, \\ 0 & \text{otherwise}. \end{cases} \tag{3}$$

(b) Given $B$, we see that $X$ has a uniform PDF over $[a, b]$ with $a = -3$ and $b = 3$. From Theorem 3.6, the conditional expected value of $X$ is $E[X|B] = (a + b)/2 = 0$.

(c) From Theorem 3.6, the conditional variance of $X$ is $\mathrm{Var}[X|B] = (b - a)^2/12 = 3$.

Problem 3.8.2 Solution

From Definition 3.6, the PDF of $Y$ is

$$f_Y(y) = \begin{cases} (1/5)e^{-y/5} & y \ge 0, \\ 0 & \text{otherwise}. \end{cases} \tag{1}$$

(a) The event $A$ has probability

$$P[A] = P[Y < 2] = \int_0^2 (1/5)e^{-y/5}\,dy = \left. -e^{-y/5} \right|_0^2 = 1 - e^{-2/5} \tag{2}$$

From Definition 3.15, the conditional PDF of $Y$ given $A$ is

$$f_{Y|A}(y) = \begin{cases} f_Y(y)/P[A] & y \in A, \\ 0 & \text{otherwise} \end{cases} \tag{3}$$
$$= \begin{cases} (1/5)e^{-y/5}/(1 - e^{-2/5}) & 0 \le y < 2, \\ 0 & \text{otherwise}. \end{cases} \tag{4}$$

(b) The conditional expected value of $Y$ given $A$ is

$$E[Y|A] = \int_{-\infty}^{\infty} y f_{Y|A}(y)\,dy = \frac{1/5}{1 - e^{-2/5}} \int_0^2 y e^{-y/5}\,dy \tag{5}$$

Using the integration by parts formula $\int u\,dv = uv - \int v\,du$ with $u = y$ and $dv = e^{-y/5}\,dy$ yields

$$E[Y|A] = \frac{1/5}{1 - e^{-2/5}} \left( \left. -5y e^{-y/5} \right|_0^2 + \int_0^2 5e^{-y/5}\,dy \right) \tag{6}$$
$$= \frac{1/5}{1 - e^{-2/5}} \left( -10e^{-2/5} - \left. 25e^{-y/5} \right|_0^2 \right) \tag{7}$$
$$= \frac{5 - 7e^{-2/5}}{1 - e^{-2/5}} \tag{8}$$

Problem 3.8.3 Solution

The condition "right side of the circle" is the event $R = [0, 1/2]$. Using the PDF in Example 3.5, we have

$$P[R] = \int_0^{1/2} f_Y(y)\,dy = \int_0^{1/2} 3y^2\,dy = 1/8 \tag{1}$$

Therefore, the conditional PDF of $Y$ given event $R$ is

$$f_{Y|R}(y) = \begin{cases} 24y^2 & 0 \le y \le 1/2, \\ 0 & \text{otherwise}. \end{cases} \tag{2}$$

The conditional expected value and mean square value are

$$E[Y|R] = \int_{-\infty}^{\infty} y f_{Y|R}(y)\,dy = \int_0^{1/2} 24y^3\,dy = 3/8 \text{ meter} \tag{3}$$
$$E\left[Y^2|R\right] = \int_{-\infty}^{\infty} y^2 f_{Y|R}(y)\,dy = \int_0^{1/2} 24y^4\,dy = 3/20 \text{ m}^2 \tag{4}$$

The conditional variance is

$$\mathrm{Var}[Y|R] = E\left[Y^2|R\right] - (E[Y|R])^2 = \frac{3}{20} - \left(\frac{3}{8}\right)^2 = 3/320 \text{ m}^2 \tag{5}$$

The conditional standard deviation is $\sigma_{Y|R} = \sqrt{\mathrm{Var}[Y|R]} = 0.0968$ meters.
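The conditional moments in Problem 3.8.3 are easy to cross-check by numerical integration. The sketch below uses the PDF $3y^2$ quoted above on the event $R = [0, 1/2]$; the grid size and the use of trapz are assumptions made for this check only.

```matlab
% Sketch only: numerical check of the conditional moments of Y given R.
y   = linspace(0, 0.5, 100001)';   % integration grid over R = [0, 1/2] (assumed size)
fY  = 3*y.^2;                      % f_Y(y) on R, as quoted in the solution

PR  = trapz(y, fY);                % P[R], expected 1/8
fYR = fY/PR;                       % conditional PDF f_{Y|R}(y) = 24 y^2

EY  = trapz(y, y.*fYR);            % E[Y|R], expected 3/8
EY2 = trapz(y, (y.^2).*fYR);       % E[Y^2|R], expected 3/20
vY  = EY2 - EY^2;                  % Var[Y|R], expected 3/320
sY  = sqrt(vY);                    % conditional standard deviation

fprintf('P[R]=%.4f E[Y|R]=%.4f Var[Y|R]=%.6f sigma=%.4f\n', PR, EY, vY, sY);
```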
Problem 3.8.4 Solution

From Definition 3.8, the PDF of $W$ is

$$f_W(w) = \frac{1}{\sqrt{32\pi}} e^{-w^2/32} \tag{1}$$

(a) Since $W$ has expected value $\mu = 0$, $f_W(w)$ is symmetric about $w = 0$. Hence $P[C] = P[W > 0] = 1/2$. From Definition 3.15, the conditional PDF of $W$ given $C$ is

$$f_{W|C}(w) = \begin{cases} f_W(w)/P[C] & w \in C, \\ 0 & \text{otherwise} \end{cases} = \begin{cases} 2e^{-w^2/32}/\sqrt{32\pi} & w > 0, \\ 0 & \text{otherwise}. \end{cases} \tag{2}$$

(b) The conditional expected value of $W$ given $C$ is

$$E[W|C] = \int_{-\infty}^{\infty} w f_{W|C}(w)\,dw = \frac{2}{4\sqrt{2\pi}} \int_0^{\infty} w e^{-w^2/32}\,dw \tag{3}$$

Making the substitution $v = w^2/32$, we obtain

$$E[W|C] = \frac{32}{\sqrt{32\pi}} \int_0^{\infty} e^{-v}\,dv = \frac{32}{\sqrt{32\pi}} \tag{4}$$

(c) The conditional second moment of $W$ is

$$E\left[W^2|C\right] = \int_{-\infty}^{\infty} w^2 f_{W|C}(w)\,dw = 2\int_0^{\infty} w^2 f_W(w)\,dw \tag{5}$$

We observe that $w^2 f_W(w)$ is an even function. Hence

$$E\left[W^2|C\right] = 2\int_0^{\infty} w^2 f_W(w)\,dw \tag{6}$$
$$= \int_{-\infty}^{\infty} w^2 f_W(w)\,dw = E\left[W^2\right] = \sigma^2 = 16 \tag{7}$$

Lastly, the conditional variance of $W$ given $C$ is

$$\mathrm{Var}[W|C] = E\left[W^2|C\right] - (E[W|C])^2 = 16 - 32/\pi = 5.81 \tag{8}$$

Problem 3.8.5 Solution

(a) We first find the conditional PDF of $T$. The PDF of $T$ is

$$f_T(t) = \begin{cases} 100e^{-100t} & t \ge 0, \\ 0 & \text{otherwise}. \end{cases} \tag{1}$$

The conditioning event has probability

$$P[T > 0.02] = \int_{0.02}^{\infty} f_T(t)\,dt = \left. -e^{-100t} \right|_{0.02}^{\infty} = e^{-2} \tag{2}$$

From Definition 3.15, the conditional PDF of $T$ is

$$f_{T|T>0.02}(t) = \begin{cases} \frac{f_T(t)}{P[T > 0.02]} & t \ge 0.02, \\ 0 & \text{otherwise} \end{cases} = \begin{cases} 100e^{-100(t - 0.02)} & t \ge 0.02, \\ 0 & \text{otherwise}. \end{cases} \tag{3}$$

The conditional expected value of $T$ is

$$E[T|T > 0.02] = \int_{0.02}^{\infty} t(100)e^{-100(t - 0.02)}\,dt \tag{4}$$

The substitution $\tau = t - 0.02$ yields

$$E[T|T > 0.02] = \int_0^{\infty} (\tau + 0.02)(100)e^{-100\tau}\,d\tau \tag{5}$$
$$= \int_0^{\infty} (\tau + 0.02) f_T(\tau)\,d\tau = E[T + 0.02] = 0.03 \tag{6}$$

(b) The conditional second moment of $T$ is

$$E\left[T^2|T > 0.02\right] = \int_{0.02}^{\infty} t^2 (100)e^{-100(t - 0.02)}\,dt \tag{7}$$

The substitution $\tau = t - 0.02$ yields

$$E\left[T^2|T > 0.02\right] = \int_0^{\infty} (\tau + 0.02)^2 (100)e^{-100\tau}\,d\tau \tag{8}$$
$$= \int_0^{\infty} (\tau + 0.02)^2 f_T(\tau)\,d\tau \tag{9}$$
$$= E\left[(T + 0.02)^2\right] \tag{10}$$

Now we can calculate the conditional variance.
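$$\mathrm{Var}[T|T > 0.02] = E\left[T^2|T > 0.02\right] - (E[T|T > 0.02])^2 \tag{11}$$
$$= E\left[(T + 0.02)^2\right] - (E[T + 0.02])^2 \tag{12}$$
$$= \mathrm{Var}[T + 0.02] \tag{13}$$
$$= \mathrm{Var}[T] = 1/100^2 = 0.0001 \tag{14}$$

Here the last step uses the fact that an exponential random variable with rate $\lambda = 100$ has variance $1/\lambda^2$.

Problem 3.8.6 Solution

(a) In Problem 3.6.8, we found that the PDF of $D$ is

$$f_D(y) = \begin{cases} 0.3\delta(y) & y < 60, \\ 0.07e^{-(y - 60)/10} & y \ge 60. \end{cases} \tag{1}$$

First, we observe that $D > 0$ if the throw is good so that $P[D > 0] = 0.7$. A second way to find this probability is

$$P[D > 0] = \int_{0^+}^{\infty} f_D(y)\,dy = 0.7 \tag{2}$$

From Definition 3.15, we can write

$$f_{D|D>0}(y) = \begin{cases} \frac{f_D(y)}{P[D > 0]} & y > 0, \\ 0 & \text{otherwise} \end{cases} = \begin{cases} (1/10)e^{-(y - 60)/10} & y \ge 60, \\ 0 & \text{otherwise}. \end{cases} \tag{3}$$

(b) If instead we learn that $D \le 70$, we can calculate the conditional PDF by first calculating

$$P[D \le 70] = \int_0^{70} f_D(y)\,dy \tag{4}$$
$$= \int_0^{60} 0.3\delta(y)\,dy + \int_{60}^{70} 0.07e^{-(y - 60)/10}\,dy \tag{5}$$
$$= 0.3 + \left. -0.7e^{-(y - 60)/10} \right|_{60}^{70} = 1 - 0.7e^{-1} \tag{6}$$

The conditional PDF is

$$f_{D|D \le 70}(y) = \begin{cases} \frac{f_D(y)}{P[D \le 70]} & y \le 70, \\ 0 & \text{otherwise} \end{cases} \tag{7}$$
$$= \begin{cases} \frac{0.3}{1 - 0.7e^{-1}}\delta(y) & 0 \le y < 60, \\ \frac{0.07}{1 - 0.7e^{-1}} e^{-(y - 60)/10} & 60 \le y \le 70, \\ 0 & \text{otherwise}. \end{cases} \tag{8}$$

Problem 3.8.7 Solution

(a) Given that a person is healthy, $X$ is a Gaussian $(\mu = 90, \sigma = 20)$ random variable.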
Thus,

$$f_{X|H}(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-(x - \mu)^2/2\sigma^2} = \frac{1}{20\sqrt{2\pi}} e^{-(x - 90)^2/800} \tag{1}$$

(b) Given the event $H$, we use the conditional PDF $f_{X|H}(x)$ to calculate the required probabilities

$$P\left[T^+|H\right] = P[X \ge 140|H] = P[X - 90 \ge 50|H] \tag{2}$$
$$= P\left[\frac{X - 90}{20} \ge 2.5 \,\Big|\, H\right] = 1 - \Phi(2.5) = 0.006 \tag{3}$$

Similarly,

$$P\left[T^-|H\right] = P[X \le 110|H] = P[X - 90 \le 20|H] \tag{4}$$
$$= P\left[\frac{X - 90}{20} \le 1 \,\Big|\, H\right] = \Phi(1) = 0.841 \tag{5}$$

(c) Using Bayes Theorem, we have

$$P\left[H|T^-\right] = \frac{P[T^-|H]\,P[H]}{P[T^-]} = \frac{P[T^-|H]\,P[H]}{P[T^-|D]\,P[D] + P[T^-|H]\,P[H]} \tag{6}$$

In the denominator, we need to calculate

$$P\left[T^-|D\right] = P[X \le 110|D] = P[X - 160 \le -50|D] \tag{7}$$
$$= P\left[\frac{X - 160}{40} \le -1.25 \,\Big|\, D\right] \tag{8}$$
$$= \Phi(-1.25) = 1 - \Phi(1.25) = 0.106 \tag{9}$$

Thus,

$$P\left[H|T^-\right] = \frac{P[T^-|H]\,P[H]}{P[T^-|D]\,P[D] + P[T^-|H]\,P[H]} \tag{10}$$
$$= \frac{0.841(0.9)}{0.106(0.1) + 0.841(0.9)} = 0.986 \tag{11}$$

(d) Since $T^-$, $T^0$, and $T^+$ are mutually exclusive and collectively exhaustive,

$$P\left[T^0|H\right] = 1 - P\left[T^-|H\right] - P\left[T^+|H\right] = 1 - 0.841 - 0.006 = 0.153 \tag{12}$$

We say that a test is a failure if the result is $T^0$. Thus, given the event $H$, each test has conditional failure probability of $q = 0.153$, or success probability $p = 1 - q = 0.847$. Given $H$, the number of trials $N$ until a success is a geometric $(p)$ random variable with PMF

$$P_{N|H}(n) = \begin{cases} (1 - p)^{n-1} p & n = 1, 2, \ldots, \\ 0 & \text{otherwise}. \end{cases} \tag{13}$$

Problem 3.8.8 Solution

(a) The event $B_i$ that $Y = \Delta/2 + i\Delta$ occurs if and only if $i\Delta \le X < (i + 1)\Delta$. In particular, since $X$ has the uniform $(-r/2, r/2)$ PDF

$$f_X(x) = \begin{cases} 1/r & -r/2 \le x < r/2, \\ 0 & \text{otherwise}, \end{cases} \tag{1}$$

we observe that

$$P[B_i] = \int_{i\Delta}^{(i+1)\Delta} \frac{1}{r}\,dx = \frac{\Delta}{r} \tag{2}$$

In addition, the conditional PDF of $X$ given $B_i$ is

$$f_{X|B_i}(x) = \begin{cases} f_X(x)/P[B_i] & x \in B_i, \\ 0 & \text{otherwise} \end{cases} = \begin{cases} 1/\Delta & i\Delta \le x < (i + 1)\Delta, \\ 0 & \text{otherwise}. \end{cases} \tag{3}$$

It follows that given $B_i$, $Z = X - Y = X - \Delta/2 - i\Delta$, which is a uniform $(-\Delta/2, \Delta/2)$ random variable. That is,

$$f_{Z|B_i}(z) = \begin{cases} 1/\Delta & -\Delta/2 \le z < \Delta/2, \\ 0 & \text{otherwise}. \end{cases} \tag{4}$$

(b) We observe that $f_{Z|B_i}(z)$ is the same for every $i$.
Thus, we can write

$$f_Z(z) = \sum_i P[B_i]\, f_{Z|B_i}(z) = f_{Z|B_0}(z) \sum_i P[B_i] = f_{Z|B_0}(z) \tag{5}$$

Thus, $Z$ is a uniform $(-\Delta/2, \Delta/2)$ random variable. From the definition of a uniform $(a, b)$ random variable, $Z$ has mean and variance

$$E[Z] = 0, \qquad \mathrm{Var}[Z] = \frac{(\Delta/2 - (-\Delta/2))^2}{12} = \frac{\Delta^2}{12}. \tag{6}$$

Problem 3.8.9 Solution

For this problem, almost any non-uniform random variable $X$ will yield a non-uniform random variable $Z$. For example, suppose $X$ has the "triangular" PDF

$$f_X(x) = \begin{cases} 8x/r^2 & 0 \le x \le r/2, \\ 0 & \text{otherwise}. \end{cases} \tag{1}$$

In this case, the event $B_i$ that $Y = i\Delta + \Delta/2$ occurs if and only if $i\Delta \le X < (i + 1)\Delta$. Thus

$$P[B_i] = \int_{i\Delta}^{(i+1)\Delta} \frac{8x}{r^2}\,dx = \frac{8\Delta(i\Delta + \Delta/2)}{r^2} \tag{2}$$

It follows that the conditional PDF of $X$ given $B_i$ is

$$f_{X|B_i}(x) = \begin{cases} \frac{f_X(x)}{P[B_i]} & x \in B_i, \\ 0 & \text{otherwise} \end{cases} = \begin{cases} \frac{x}{\Delta(i\Delta + \Delta/2)} & i\Delta \le x < (i + 1)\Delta, \\ 0 & \text{otherwise}. \end{cases} \tag{3}$$

Given event $B_i$, $Y = i\Delta + \Delta/2$, so that $Z = X - Y = X - i\Delta - \Delta/2$. This implies

$$f_{Z|B_i}(z) = f_{X|B_i}(z + i\Delta + \Delta/2) = \begin{cases} \frac{z + i\Delta + \Delta/2}{\Delta(i\Delta + \Delta/2)} & -\Delta/2 \le z < \Delta/2, \\ 0 & \text{otherwise}. \end{cases} \tag{4}$$

We observe that the PDF of $Z$ depends on which event $B_i$ occurs. Moreover, $f_{Z|B_i}(z)$ is non-uniform for all $B_i$.

Problem 3.9.1 Solution

Taking the derivative of the CDF $F_Y(y)$ in Quiz 3.1, we obtain

$$f_Y(y) = \begin{cases} 1/4 & 0 \le y \le 4, \\ 0 & \text{otherwise}. \end{cases} \tag{1}$$

We see that $Y$ is a uniform $(0, 4)$ random variable. By Theorem 3.20, if $X$ is a uniform $(0, 1)$ random variable, then $Y = 4X$ is a uniform $(0, 4)$ random variable. Using rand as Matlab's uniform $(0, 1)$ random variable, the program quiz31rv is essentially a one line program:

    function y=quiz31rv(m)
    %Usage y=quiz31rv(m)
    %Returns the vector y holding m
    %samples of the uniform (0,4) random
    %variable Y of Quiz 3.1
    y=4*rand(m,1);

Problem 3.9.2 Solution

The modem receiver voltage is generated by taking a ±5 voltage representing data, and adding to it a Gaussian $(0, 2)$ noise variable. Although situations in which two random variables are added together are not analyzed until Chapter 4, generating samples of the receiver voltage is easy in Matlab.
Here is the code:

    function x=modemrv(m);
    %Usage: x=modemrv(m)
    %generates m samples of X, the modem
    %receiver voltage in Example 3.32.
    %X=+-5 + N where N is Gaussian (0,2)
    sb=[-5; 5]; pb=[0.5; 0.5];
    b=finiterv(sb,pb,m);
    noise=gaussrv(0,2,m);
    x=b+noise;

The commands

    x=modemrv(10000); hist(x,100);

generate 10,000 samples of the modem receiver voltage and plot the relative frequencies using 100 bins. Here is an example plot:

[Figure: histogram of x over the range −15 to 15; the vertical axis, labeled Relative Frequency, runs from 0 to 300.]

As expected, the result is qualitatively similar ("hills" around $X = -5$ and $X = 5$) to the sketch in Figure 3.3.

Problem 3.9.3 Solution

The code for $\hat{Q}(z)$ is the Matlab function

    function p=qapprox(z);
    %approximation to the Gaussian
    % (0,1) complementary CDF Q(z)
    t=1./(1.0+(0.231641888.*z(:)));
    a=[0.127414796; -0.142248368; 0.7107068705; ...
       -0.7265760135; 0.5307027145];
    p=([t t.^2 t.^3 t.^4 t.^5]*a).*exp(-(z(:).^2)/2);

This code generates two plots of the relative error $e(z)$ as a function of $z$:

    z=0:0.02:6;
    q=1.0-phi(z(:));
    qhat=qapprox(z);
    e=(q-qhat)./q;
    plot(z,e); figure;
    semilogy(z,abs(e));

Here are the output figures of qtest.m:

[Figure: two plots over $0 \le z \le 6$. Left: $e(z)$ versus $z$ on a linear scale of order $10^{-3}$. Right: $|e(z)|$ versus $z$ on a log scale from $10^{-10}$ to $10^{-2}$.]

The left side plot graphs $e(z)$ versus $z$. It appears that $e(z) = 0$ for $z \le 3$. In fact, $e(z)$ is nonzero over that range, but the relative error is so small that it isn't visible in comparison to $e(6) \approx -3.5 \times 10^{-3}$. To see the error for small $z$, the right hand graph plots $|e(z)|$ versus $z$ in log scale where we observe very small relative errors on the order of $10^{-7}$.

Problem 3.9.4 Solution

By Theorem 3.9, if $X$ is an exponential $(\lambda)$ random variable, then $K = \lceil X \rceil$ is a geometric $(p)$ random variable with $p = 1 - e^{-\lambda}$. Thus, given $p$, we can write $\lambda = -\ln(1 - p)$ and $\lceil X \rceil$ is a geometric $(p)$ random variable.
Here is the Matlab function that implements this technique:

    function k=georv(p,m);
    lambda= -log(1-p);
    k=ceil(exponentialrv(lambda,m));

To compare this technique with that used in geometricrv.m, we first examine the code for exponentialrv.m:

    function x=exponentialrv(lambda,m)
    x=-(1/lambda)*log(1-rand(m,1));

To analyze how $m = 1$ random sample is generated, let $R$ = rand(1,1). In terms of mathematics, exponentialrv(lambda,1) generates the random variable

$$X = -\frac{\ln(1 - R)}{\lambda} \tag{1}$$

For $\lambda = -\ln(1 - p)$, we have that

$$K = \lceil X \rceil = \left\lceil \frac{\ln(1 - R)}{\ln(1 - p)} \right\rceil \tag{2}$$

This is precisely the same function implemented by geometricrv.m. In short, the two methods for generating geometric $(p)$ random samples are one and the same.

Problem 3.9.5 Solution

Given $0 \le u \le 1$, we need to find the "inverse" function that finds the value of $w$ satisfying $u = F_W(w)$. The problem is that for $u = 1/4$, any $w$ in the interval $[-3, 3]$ satisfies $F_W(w) = 1/4$. However, in terms of generating samples of random variable $W$, this doesn't matter. For a uniform $(0, 1)$ random variable $U$, $P[U = 1/4] = 0$. Thus we can choose any $w \in [-3, 3]$. In particular, we define the inverse CDF as

$$w = F_W^{-1}(u) = \begin{cases} 8u - 5 & 0 \le u \le 1/4, \\ (8u + 7)/3 & 1/4 < u \le 1. \end{cases} \tag{1}$$

Note that because $0 \le F_W(w) \le 1$, the inverse $F_W^{-1}(u)$ is defined only for $0 \le u \le 1$. Careful inspection will show that $u = (w + 5)/8$ for $-5 \le w < -3$ and that $u = 1/4 + 3(w - 3)/8$ for $3 \le w \le 5$. Thus, for a uniform $(0, 1)$ random variable $U$, the function $W = F_W^{-1}(U)$ produces a random variable with CDF $F_W(w)$. To implement this solution in Matlab, we define

    function w=iwcdf(u);
    w=((u>=0).*(u <= 0.25).*(8*u-5))+...
      ((u > 0.25).*(u<=1).*((8*u+7)/3));

so that the Matlab code W=icdfrv(@iwcdf,m) generates m samples of random variable W.
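One way to gain confidence in the inverse CDF of Problem 3.9.5 is to push uniform samples through it and compare a few empirical probabilities with $F_W(w)$. The sketch below restates the mapping as an anonymous function iw so that it runs without icdfrv; the name iw, the sample count m, and the test points are assumptions made for this check.

```matlab
% Sketch only: empirical check of the inverse CDF from Problem 3.9.5.
% iw restates the piecewise inverse as an anonymous function (assumed name).
iw = @(u) (u <= 0.25).*(8*u - 5) + (u > 0.25).*((8*u + 7)/3);

m = 200000;                        % number of samples (assumed)
w = iw(rand(m,1));                 % samples of W = F_W^{-1}(U)

p_low = mean(w <= -3);             % compare with F_W(-3) = 1/4
p_gap = mean((w > -3) & (w < 3));  % F_W is flat on [-3,3], so expect 0
p_4   = mean(w <= 4);              % compare with F_W(4) = 1/4 + 3(4-3)/8 = 5/8

fprintf('P[W<=-3]=%.4f  P[-3<W<3]=%.4f  P[W<=4]=%.4f\n', p_low, p_gap, p_4);
```

No sample lands strictly inside $(-3, 3)$ because every $u \le 1/4$ maps to $w \le -3$ and every $u > 1/4$ maps to $w > 3$.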
Problem 3.9.6 Solution

(a) To test the exponential random variables, the following code

    function exponentialtest(lambda,n)
    delta=0.01;
    x=exponentialrv(lambda,n);
    xr=(0:delta:(5.0/lambda))';
    fxsample=(histc(x,xr)/(n*delta));
    fx=exponentialpdf(lambda,xr);
    plot(xr,fx,xr,fxsample);

generates $n$ samples of an exponential $\lambda$ random variable and plots the relative frequency $n_i/(n\Delta)$ against the corresponding exponential PDF. Note that the histc function generates a histogram using xr to define the edges of the bins. Two representative plots for $n = 1{,}000$ and $n = 100{,}000$ samples appear in the following figure:

[Figure: two plots of the exponential PDF and the relative frequency over $0 \le x \le 5$. Left: exponentialtest(1,1000). Right: exponentialtest(1,100000).]

For $n = 1{,}000$, the jaggedness of the relative frequency occurs because $\Delta$ is sufficiently small that the number of samples of $X$ in each bin $i\Delta < X \le (i + 1)\Delta$ is fairly small. For $n = 100{,}000$, the greater smoothness of the curve demonstrates how the relative frequency is becoming a better approximation to the actual PDF.

(b) Similar results hold for Gaussian random variables. The following code generates the same comparison between the Gaussian PDF and the relative frequency of $n$ samples.

    function gausstest(mu,sigma2,n)
    delta=0.01;
    x=gaussrv(mu,sigma2,n);
    xr=(0:delta:(mu+(3*sqrt(sigma2))))';
    fxsample=(histc(x,xr)/(n*delta));
    fx=gausspdf(mu,sigma2,xr);
    plot(xr,fx,xr,fxsample);

Here are two typical plots produced by gausstest.m:

[Figure: two plots of the Gaussian PDF and the relative frequency over $0 \le x \le 6$. Left: gausstest(3,1,1000). Right: gausstest(3,1,100000).]

Problem 3.9.7 Solution

First we need to build a uniform $(-r/2, r/2)$ $b$-bit quantizer. The function uquantize does this.

    function y=uquantize(r,b,x)
    %uniform (-r/2,r/2) b bit quantizer
    n=2^b;
    delta=r/n;
    x=min(x,(r-delta/2)/2);
    x=max(x,-(r-delta/2)/2);
    y=(delta/2)+delta*floor(x/delta);

Note that if $|x| > r/2$, then $x$ is truncated so that the quantizer output has maximum amplitude.
Next, we generate Gaussian samples, quantize them and record the errors:

    function stdev=quantizegauss(r,b,m)
    x=gaussrv(0,1,m);
    x=x((x<=r/2)&(x>=-r/2));
    y=uquantize(r,b,x);
    z=x-y;
    hist(z,100);
    stdev=sqrt(sum(z.^2)/length(z));

For a Gaussian random variable $X$, $P[|X| > r/2] > 0$ for any value of $r$. When we generate enough Gaussian samples, we will always see some quantization errors due to the finite $(-r/2, r/2)$ range. To focus our attention on the effect of $b$ bit quantization, quantizegauss.m eliminates Gaussian samples outside the range $(-r/2, r/2)$. Here are outputs of quantizegauss for $b = 1, 2, 3$ bits.

[Figure: three histograms of the quantization error, for b = 1, b = 2, and b = 3; each vertical axis runs from 0 to 15000.]

It is obvious that for $b = 1$ bit quantization, the error is decidedly not uniform. However, it appears that the error is uniform for $b = 2$ and $b = 3$. You can verify that a uniform error is a reasonable model for larger values of $b$.

Problem 3.9.8 Solution

To solve this problem, we want to use Theorem 3.22. One complication is that in the theorem, $U$ denotes the uniform random variable while $X$ is the derived random variable. In this problem, we are using $U$ for the random variable we want to derive. As a result, we will use Theorem 3.22 with the roles of $X$ and $U$ reversed. Given $U$ with CDF $F_U(u) = F(u)$, we need to find the inverse function $F^{-1}(x) = F_U^{-1}(x)$ so that for a uniform $(0, 1)$ random variable $X$, $U = F^{-1}(X)$.

Recall that random variable $U$ defined in Problem 3.3.7 has CDF

[Figure: sketch of $F_U(u)$ versus $u$ for $-5 \le u \le 5$.]

$$F_U(u) = \begin{cases} 0 & u < -5, \\ (u + 5)/8 & -5 \le u < -3, \\ 1/4 & -3 \le u < 3, \\ 1/4 + 3(u - 3)/8 & 3 \le u < 5, \\ 1 & u \ge 5. \end{cases} \tag{1}$$

At $x = 1/4$, there are multiple values of $u$ such that $F_U(u) = 1/4$. However, except for $x = 1/4$, the inverse $F_U^{-1}(x)$ is well defined over $0 < x < 1$. At $x = 1/4$, we can arbitrarily define a value for $F_U^{-1}(1/4)$ because when we produce sample values of $F_U^{-1}(X)$, the event $X = 1/4$ has probability zero.
To generate the inverse CDF, given a value of $x$, $0 < x < 1$, we have to find the value of $u$ such that $x = F_U(u)$. From the CDF we see that

$$0 \le x \le \frac{1}{4} \;\Rightarrow\; x = \frac{u + 5}{8} \tag{2}$$
$$\frac{1}{4} < x \le 1 \;\Rightarrow\; x = \frac{1}{4} + \frac{3}{8}(u - 3) \tag{3}$$

These conditions can be inverted to express $u$ as a function of $x$.

$$u = F^{-1}(x) = \begin{cases} 8x - 5 & 0 \le x \le 1/4, \\ (8x + 7)/3 & 1/4 < x \le 1. \end{cases} \tag{5}$$

In particular, when $X$ is a uniform $(0, 1)$ random variable, $U = F^{-1}(X)$ will generate samples of the random variable $U$. A Matlab program to implement this solution is now straightforward:

    function u=urv(m)
    %Usage: u=urv(m)
    %Generates m samples of the random
    %variable U defined in Problem 3.3.7
    x=rand(m,1);
    u=(x<=1/4).*(8*x-5);
    u=u+(x>1/4).*(8*x+7)/3;

To see that this generates the correct output, we can generate a histogram of a million sample values of $U$ using the commands

    u=urv(1000000); hist(u,100);

The output is shown in the following graph, alongside the corresponding PDF of $U$.

[Figure: histogram of u over $-5 \le u \le 5$; the vertical axis, labeled Count, runs from 0 to $4 \times 10^4$.]

$$f_U(u) = \begin{cases} 0 & u < -5, \\ 1/8 & -5 \le u < -3, \\ 0 & -3 \le u < 3, \\ 3/8 & 3 \le u < 5, \\ 0 & u \ge 5. \end{cases} \tag{6}$$

Note that the scaling constant $10^4$ on the histogram plot comes from the fact that the histogram was generated using $10^6$ sample points and 100 bins. The width of each bin is $\Delta = 10/100 = 0.1$. Consider a bin of width $\Delta$ centered at $u_0$. A sample value of $U$ would fall in that bin with probability $f_U(u_0)\Delta$. Given that we generate $m = 10^6$ samples, we would expect about $m f_U(u_0)\Delta = 10^5 f_U(u_0)$ samples in each bin. For $-5 < u_0 < -3$, we would expect to see about $1.25 \times 10^4$ samples in each bin. For $3 < u_0 < 5$, we would expect to see about $3.75 \times 10^4$ samples in each bin. As can be seen, these conclusions are consistent with the histogram data.

Finally, we comment that if you generate histograms for a range of values of $m$, the number of samples, you will see that the histograms will become more and more similar to a scaled version of the PDF.
This gives the (false) impression that any bin centered on $u_0$ has a number of samples increasingly close to $m f_U(u_0)\Delta$. Because the histogram is always the same height, what is actually happening is that the vertical axis is effectively scaled by $1/m$ and the height of a histogram bar is proportional to the fraction of $m$ samples that land in that bin. We will see in Chapter 7 that the fraction of samples in a bin does converge to the probability of a sample being in that bin as the number of samples $m$ goes to infinity.

Problem 3.9.9 Solution

From Quiz 3.6, random variable $X$ has CDF

[Figure: sketch of $F_X(x)$ versus $x$ for $-2 \le x \le 2$.]

$$F_X(x) = \begin{cases} 0 & x < -1, \\ (x + 1)/4 & -1 \le x < 1, \\ 1 & x \ge 1. \end{cases} \tag{1}$$

Following the procedure outlined in Problem 3.7.18, we define for $0 < u \le 1$,

$$\tilde{F}(u) = \min\{x \mid F_X(x) \ge u\}. \tag{2}$$

Since $F_X(x) = (x + 1)/4$ increases from 0 to 1/2 over $-1 \le x < 1$ and then jumps to 1 at $x = 1$, we observe that if $0 < u < 1/2$, then we can choose $x$ so that $F_X(x) = u$. In this case, $(x + 1)/4 = u$, or equivalently, $x = 4u - 1$. For $1/2 \le u \le 1$, the minimum $x$ that satisfies $F_X(x) \ge u$ is $x = 1$. These facts imply

$$\tilde{F}(u) = \begin{cases} 4u - 1 & 0 < u < 1/2, \\ 1 & 1/2 \le u \le 1. \end{cases} \tag{3}$$

It follows that if $U$ is a uniform $(0, 1)$ random variable, then $\tilde{F}(U)$ has the same CDF as $X$. This is trivial to implement in Matlab.

    function x=quiz36rv(m)
    %Usage x=quiz36rv(m)
    %Returns the vector x holding m samples
    %of the random variable X of Quiz 3.6
    u=rand(m,1);
    x=((4*u-1).*(u< 0.5))+(1.0*(u>=0.5));

Problem Solutions – Chapter 4

Problem 4.1.1 Solution

(a) The probability $P[X \le 2, Y \le 3]$ can be found by evaluating the joint CDF $F_{X,Y}(x, y)$ at $x = 2$ and $y = 3$. This yields

$$P[X \le 2, Y \le 3] = F_{X,Y}(2, 3) = (1 - e^{-2})(1 - e^{-3}) \tag{1}$$

(b) To find the marginal CDF of $X$, $F_X(x)$, we simply evaluate the joint CDF at $y = \infty$.

$$F_X(x) = F_{X,Y}(x, \infty) = \begin{cases} 1 - e^{-x} & x \ge 0, \\ 0 & \text{otherwise}. \end{cases} \tag{2}$$

(c) Likewise for the marginal CDF of $Y$, we evaluate the joint CDF at $x = \infty$.
$$F_Y(y) = F_{X,Y}(\infty, y) = \begin{cases} 1 - e^{-y} & y \ge 0, \\ 0 & \text{otherwise}. \end{cases} \tag{3}$$

Problem 4.1.2 Solution

(a) Because the probability that any random variable is less than $-\infty$ is zero, we have

$$F_{X,Y}(x, -\infty) = P[X \le x, Y \le -\infty] \le P[Y \le -\infty] = 0 \tag{1}$$

(b) The probability that any random variable is less than infinity is always one.

$$F_{X,Y}(x, \infty) = P[X \le x, Y \le \infty] = P[X \le x] = F_X(x) \tag{2}$$

(c) Although $P[Y \le \infty] = 1$, $P[X \le -\infty] = 0$. Therefore the following is true.

$$F_{X,Y}(-\infty, \infty) = P[X \le -\infty, Y \le \infty] \le P[X \le -\infty] = 0 \tag{3}$$

(d) Part (d) follows the same logic as that of part (a).

$$F_{X,Y}(-\infty, y) = P[X \le -\infty, Y \le y] \le P[X \le -\infty] = 0 \tag{4}$$

(e) Analogous to Part (b), we find that

$$F_{X,Y}(\infty, y) = P[X \le \infty, Y \le y] = P[Y \le y] = F_Y(y) \tag{5}$$

Problem 4.1.3 Solution

We wish to find the probability that $x_1 \le X \le x_2$ or $y_1 \le Y \le y_2$. We define events $A = \{x_1 \le X \le x_2\}$ and $B = \{y_1 \le Y \le y_2\}$ so that

$$P[A \cup B] = P[A] + P[B] - P[AB] \tag{1}$$

Keep in mind that the intersection of events $A$ and $B$ is the set of all outcomes such that both $A$ and $B$ occur, specifically, $AB = \{x_1 \le X \le x_2, y_1 \le Y \le y_2\}$. It follows that

$$P[A \cup B] = P[x_1 \le X \le x_2] + P[y_1 \le Y \le y_2] - P[x_1 \le X \le x_2, y_1 \le Y \le y_2]. \tag{2}$$

By Theorem 4.5,

$$P[x_1 \le X \le x_2, y_1 \le Y \le y_2] = F_{X,Y}(x_2, y_2) - F_{X,Y}(x_2, y_1) - F_{X,Y}(x_1, y_2) + F_{X,Y}(x_1, y_1). \tag{3}$$

Expressed in terms of the marginal and joint CDFs,

$$P[A \cup B] = F_X(x_2) - F_X(x_1) + F_Y(y_2) - F_Y(y_1) \tag{4}$$
$$\qquad - F_{X,Y}(x_2, y_2) + F_{X,Y}(x_2, y_1) + F_{X,Y}(x_1, y_2) - F_{X,Y}(x_1, y_1) \tag{5}$$

Problem 4.1.4 Solution

It's easy to show that the properties of Theorem 4.1 are satisfied. However, those properties are necessary but not sufficient to show $F(x, y)$ is a CDF.
To convince ourselves that $F(x, y)$ is a valid CDF, we show that for all $x_1 \le x_2$ and $y_1 \le y_2$,

$$P[x_1 < X \le x_2, y_1 < Y \le y_2] \ge 0 \tag{1}$$

In this case, for $x_1 \le x_2$ and $y_1 \le y_2$, Theorem 4.5 yields

$$P[x_1 < X \le x_2, y_1 < Y \le y_2] = F(x_2, y_2) - F(x_1, y_2) - F(x_2, y_1) + F(x_1, y_1) \tag{2}$$
$$= F_X(x_2) F_Y(y_2) - F_X(x_1) F_Y(y_2) \tag{3}$$
$$\qquad - F_X(x_2) F_Y(y_1) + F_X(x_1) F_Y(y_1) \tag{4}$$
$$= [F_X(x_2) - F_X(x_1)][F_Y(y_2) - F_Y(y_1)] \tag{5}$$
$$\ge 0 \tag{6}$$

Hence, $F_X(x) F_Y(y)$ is a valid joint CDF.

Problem 4.1.5 Solution

In this problem, we prove Theorem 4.5 which states

$$P[x_1 < X \le x_2, y_1 < Y \le y_2] = F_{X,Y}(x_2, y_2) - F_{X,Y}(x_2, y_1) \tag{1}$$
$$\qquad - F_{X,Y}(x_1, y_2) + F_{X,Y}(x_1, y_1) \tag{2}$$

(a) The events $A$, $B$, and $C$ are

[Figure: three sketches of the $X$, $Y$ plane showing the regions $A$, $B$, and $C$ relative to the lines $x_1$, $x_2$, $y_1$, $y_2$.] (3)

(b) In terms of the joint CDF $F_{X,Y}(x, y)$, we can write

$$P[A] = F_{X,Y}(x_1, y_2) - F_{X,Y}(x_1, y_1) \tag{4}$$
$$P[B] = F_{X,Y}(x_2, y_1) - F_{X,Y}(x_1, y_1) \tag{5}$$
$$P[A \cup B \cup C] = F_{X,Y}(x_2, y_2) - F_{X,Y}(x_1, y_1) \tag{6}$$

(c) Since $A$, $B$, and $C$ are mutually exclusive,

$$P[A \cup B \cup C] = P[A] + P[B] + P[C] \tag{7}$$

However, since we want to express

$$P[C] = P[x_1 < X \le x_2, y_1 < Y \le y_2] \tag{8}$$

in terms of the joint CDF $F_{X,Y}(x, y)$, we write

$$P[C] = P[A \cup B \cup C] - P[A] - P[B] \tag{9}$$
$$= F_{X,Y}(x_2, y_2) - F_{X,Y}(x_1, y_2) - F_{X,Y}(x_2, y_1) + F_{X,Y}(x_1, y_1) \tag{10}$$

which completes the proof of the theorem.

Problem 4.1.6 Solution

The given function is

$$F_{X,Y}(x, y) = \begin{cases} 1 - e^{-(x + y)} & x, y \ge 0, \\ 0 & \text{otherwise}. \end{cases} \tag{1}$$

First, we find the CDF $F_X(x)$ and $F_Y(y)$.

$$F_X(x) = F_{X,Y}(x, \infty) = \begin{cases} 1 & x \ge 0, \\ 0 & \text{otherwise} \end{cases} \tag{2}$$
$$F_Y(y) = F_{X,Y}(\infty, y) = \begin{cases} 1 & y \ge 0, \\ 0 & \text{otherwise} \end{cases} \tag{3}$$

Hence, for any $x \ge 0$ or $y \ge 0$,

$$P[X > x] = 0 \qquad P[Y > y] = 0 \tag{4}$$

For $x \ge 0$ and $y \ge 0$, this implies

$$P[\{X > x\} \cup \{Y > y\}] \le P[X > x] + P[Y > y] = 0 \tag{5}$$

However,

$$P[\{X > x\} \cup \{Y > y\}] = 1 - P[X \le x, Y \le y] = 1 - (1 - e^{-(x + y)}) = e^{-(x + y)} \tag{6}$$

Thus, we have the contradiction that $e^{-(x + y)} \le 0$ for all $x, y \ge 0$.
We can conclude that the given function is not a valid CDF.

Problem 4.2.1 Solution

In this problem, it is helpful to label points with nonzero probability on the $X$, $Y$ plane:

[Figure: the $X$, $Y$ plane with $P_{X,Y}(x, y)$ marked at six points: $c$ at $(1, 1)$, $3c$ at $(1, 3)$, $2c$ at $(2, 1)$, $6c$ at $(2, 3)$, $4c$ at $(4, 1)$, and $12c$ at $(4, 3)$.]

(a) We must choose $c$ so the PMF sums to one:

$$\sum_{x=1,2,4} \sum_{y=1,3} P_{X,Y}(x, y) = c \sum_{x=1,2,4} x \sum_{y=1,3} y \tag{1}$$
$$= c\,[1(1 + 3) + 2(1 + 3) + 4(1 + 3)] = 28c \tag{2}$$

Thus $c = 1/28$.

(b) The event $\{Y < X\}$ has probability

$$P[Y < X] = \sum_{x=1,2,4} \sum_{y<x} P_{X,Y}(x, y) = \frac{1(0) + 2(1) + 4(1 + 3)}{28} = \frac{18}{28} \tag{3}$$

(c) The event $\{Y > X\}$ has probability

$$P[Y > X] = \sum_{x=1,2,4} \sum_{y>x} P_{X,Y}(x, y) = \frac{1(3) + 2(3) + 4(0)}{28} = \frac{9}{28} \tag{4}$$

(d) There are two ways to solve this part. The direct way is to calculate

$$P[Y = X] = \sum_{x=1,2,4} \sum_{y=x} P_{X,Y}(x, y) = \frac{1(1) + 2(0)}{28} = \frac{1}{28} \tag{5}$$

The indirect way is to use the previous results and the observation that

$$P[Y = X] = 1 - P[Y < X] - P[Y > X] = (1 - 18/28 - 9/28) = 1/28 \tag{6}$$

(e)

$$P[Y = 3] = \sum_{x=1,2,4} P_{X,Y}(x, 3) = \frac{(1)(3) + (2)(3) + (4)(3)}{28} = \frac{21}{28} = \frac{3}{4} \tag{7}$$

Problem 4.2.2 Solution

On the $X$, $Y$ plane, the joint PMF is

[Figure: the $X$, $Y$ plane with $P_{X,Y}(x, y) = c|x + y|$ marked at the eight points with nonzero probability, $x \in \{-2, 0, 2\}$ and $y \in \{-1, 0, 1\}$; the masses are $3c$, $2c$, $c$, $c$, $c$, $c$, $2c$, $3c$.]

(a) To find $c$, we sum the PMF over all possible values of $X$ and $Y$. We choose $c$ so the sum equals one.

$$\sum_x \sum_y P_{X,Y}(x, y) = \sum_{x=-2,0,2} \sum_{y=-1,0,1} c\,|x + y| = 6c + 2c + 6c = 14c \tag{1}$$

Thus $c = 1/14$.

(b)

$$P[Y < X] = P_{X,Y}(0, -1) + P_{X,Y}(2, -1) + P_{X,Y}(2, 0) + P_{X,Y}(2, 1) \tag{2}$$
$$= c + c + 2c + 3c = 7c = 1/2 \tag{3}$$

(c)

$$P[Y > X] = P_{X,Y}(-2, -1) + P_{X,Y}(-2, 0) + P_{X,Y}(-2, 1) + P_{X,Y}(0, 1) \tag{4}$$
$$= 3c + 2c + c + c = 7c = 1/2 \tag{5}$$

(d) From the sketch of $P_{X,Y}(x, y)$ given above, $P[X = Y] = 0$.

(e)

$$P[X < 1] = P_{X,Y}(-2, -1) + P_{X,Y}(-2, 0) + P_{X,Y}(-2, 1) + P_{X,Y}(0, -1) + P_{X,Y}(0, 1) \tag{6}$$
$$= 8c = 8/14. \tag{7}$$

Problem 4.2.3 Solution

Let $r$ (reject) and $a$ (accept) denote the result of each test. There are four possible outcomes: $rr$, $ra$, $ar$, $aa$.
The sample tree is

    first test    second test    outcome    probability
    r  (p)        r  (p)         rr         p^2
    r  (p)        a  (1-p)       ra         p(1-p)
    a  (1-p)      r  (p)         ar         p(1-p)
    a  (1-p)      a  (1-p)       aa         (1-p)^2

Now we construct a table that maps the sample outcomes to values of $X$ and $Y$.

    outcome    P[.]        X    Y
    rr         p^2         1    1
    ra         p(1-p)      1    0
    ar         p(1-p)      0    1
    aa         (1-p)^2     0    0        (1)

This table is essentially the joint PMF $P_{X,Y}(x, y)$.

$$P_{X,Y}(x, y) = \begin{cases} p^2 & x = 1, y = 1, \\ p(1 - p) & x = 0, y = 1, \\ p(1 - p) & x = 1, y = 0, \\ (1 - p)^2 & x = 0, y = 0, \\ 0 & \text{otherwise}. \end{cases} \tag{2}$$

Problem 4.2.4 Solution

The sample space is the set $S = \{hh, ht, th, tt\}$ and each sample point has probability 1/4. Each sample outcome specifies the values of $X$ and $Y$ as given in the following table

    outcome    X    Y
    hh         0    1
    ht         1    0
    th         1    1
    tt         2    0        (1)

The joint PMF can be represented by the table

    P_{X,Y}(x,y)    y = 0    y = 1
    x = 0           0        1/4
    x = 1           1/4      1/4
    x = 2           1/4      0        (2)

Problem 4.2.5 Solution

As the problem statement says, reasonable arguments can be made for the labels being $X$ and $Y$ or $x$ and $y$. As we see in the arguments below, the lowercase choice of the text is somewhat arbitrary.

• Lowercase axis labels: For the lowercase labels, we observe that we are depicting the masses associated with the joint PMF $P_{X,Y}(x, y)$ whose arguments are $x$ and $y$. Since the PMF function is defined in terms of $x$ and $y$, the axis labels should be $x$ and $y$.

• Uppercase axis labels: On the other hand, we are depicting the possible outcomes (labeled with their respective probabilities) of the pair of random variables $X$ and $Y$. The corresponding axis labels should be $X$ and $Y$ just as in Figure 4.2. The fact that we have labeled the possible outcomes by their probabilities is irrelevant. Further, since the expression for the PMF $P_{X,Y}(x, y)$ given in the figure could just as well have been written $P_{X,Y}(\cdot, \cdot)$, it is clear that the lowercase $x$ and $y$ are not what matter.

Problem 4.2.6 Solution

As the problem statement indicates, $Y = y < n$ if and only if

A: the first $y$ tests are acceptable, and

B: test $y + 1$ is a rejection.
Thus $P[Y = y] = P[AB]$. Note that $Y \le X$ since the number of acceptable tests before the first failure cannot exceed the number of acceptable circuits. Moreover, given the occurrence of AB, the event $X = x < n$ occurs if and only if event C occurs: there are $x - y$ acceptable circuits in the remaining $n - y - 1$ tests. Since events A, B and C depend on disjoint sets of tests, they are independent events. Thus, for $0 \le y \le x < n$,
\[ P_{X,Y}(x,y) = P[X = x, Y = y] = P[ABC] \tag{1} \]
\[ = P[A]\,P[B]\,P[C] \tag{2} \]
\[ = \underbrace{p^y}_{P[A]}\ \underbrace{(1-p)}_{P[B]}\ \underbrace{\binom{n-y-1}{x-y} p^{x-y} (1-p)^{n-y-1-(x-y)}}_{P[C]} \tag{3} \]
\[ = \binom{n-y-1}{x-y} p^x (1-p)^{n-x} \tag{4} \]
The case $y = x = n$ occurs when all $n$ tests are acceptable and thus $P_{X,Y}(n,n) = p^n$.

Problem 4.2.7 Solution

The joint PMF of X and K is $P_{K,X}(k,x) = P[K = k, X = x]$, which is the probability that K = k and X = x. This means that both events must be satisfied. The approach we use is similar to that used in finding the Pascal PMF in Example 2.15. Since X can take on only the two values 0 and 1, let's consider each in turn. When X = 0, a rejection occurred on the last test and the other $k - 1$ rejections must have occurred in the previous $n - 1$ tests. Thus,
\[ P_{K,X}(k,0) = \binom{n-1}{k-1} (1-p)^{k-1} p^{n-1-(k-1)} (1-p), \qquad k = 1, \ldots, n \tag{1} \]
When X = 1, the last test was acceptable and therefore we know that the $K = k \le n - 1$ rejections must have occurred in the previous $n - 1$ tests. In this case,
\[ P_{K,X}(k,1) = \binom{n-1}{k} (1-p)^k p^{n-1-k}\, p, \qquad k = 0, \ldots, n-1 \tag{2} \]
We can combine these cases into a single complete expression for the joint PMF.
\[ P_{K,X}(k,x) = \begin{cases} \binom{n-1}{k-1} (1-p)^k p^{n-k} & x = 0,\ k = 1, 2, \ldots, n \\ \binom{n-1}{k} (1-p)^k p^{n-k} & x = 1,\ k = 0, 1, \ldots, n-1 \\ 0 & \text{otherwise} \end{cases} \tag{3} \]

Problem 4.2.8 Solution

Each circuit test produces an acceptable circuit with probability p. Let K denote the number of rejected circuits that occur in n tests and let X denote the number of acceptable circuits before the first reject.
The joint PMF, $P_{K,X}(k,x) = P[K = k, X = x]$, can be found by realizing that $\{K = k, X = x\}$ occurs if and only if the following events occur:

A: The first $x$ tests must be acceptable.

B: Test $x + 1$ must be a rejection since otherwise we would have $x + 1$ acceptable tests at the beginning.

C: The remaining $n - x - 1$ tests must contain $k - 1$ rejections.

Since the events A, B and C are independent, the joint PMF for $x + k \le n$, $x \ge 0$ and $k \ge 0$ is
\[ P_{K,X}(k,x) = \underbrace{p^x}_{P[A]}\ \underbrace{(1-p)}_{P[B]}\ \underbrace{\binom{n-x-1}{k-1} (1-p)^{k-1} p^{n-x-1-(k-1)}}_{P[C]} \tag{1} \]
After simplifying, a complete expression for the joint PMF is
\[ P_{K,X}(k,x) = \begin{cases} \binom{n-x-1}{k-1} p^{n-k} (1-p)^k & x + k \le n,\ x \ge 0,\ k \ge 0 \\ 0 & \text{otherwise} \end{cases} \tag{2} \]

Problem 4.3.1 Solution

On the X, Y plane, the joint PMF $P_{X,Y}(x,y)$ is

[Figure: joint PMF $P_{X,Y}(x,y)$ with masses $c$, $3c$, $2c$, $6c$, $4c$, $12c$ at the points $(1,1)$, $(1,3)$, $(2,1)$, $(2,3)$, $(4,1)$, $(4,3)$.]

By choosing $c = 1/28$, the PMF sums to one.

(a) The marginal PMFs of X and Y are
\[ P_X(x) = \sum_{y=1,3} P_{X,Y}(x,y) = \begin{cases} 4/28 & x = 1 \\ 8/28 & x = 2 \\ 16/28 & x = 4 \\ 0 & \text{otherwise} \end{cases} \tag{1} \]
\[ P_Y(y) = \sum_{x=1,2,4} P_{X,Y}(x,y) = \begin{cases} 7/28 & y = 1 \\ 21/28 & y = 3 \\ 0 & \text{otherwise} \end{cases} \tag{2} \]

(b) The expected values of X and Y are
\[ E[X] = \sum_{x=1,2,4} x P_X(x) = (4/28) + 2(8/28) + 4(16/28) = 3 \tag{3} \]
\[ E[Y] = \sum_{y=1,3} y P_Y(y) = 7/28 + 3(21/28) = 5/2 \tag{4} \]

(c) The second moments are
\[ E[X^2] = \sum_{x=1,2,4} x^2 P_X(x) = 1^2(4/28) + 2^2(8/28) + 4^2(16/28) = 73/7 \tag{5} \]
\[ E[Y^2] = \sum_{y=1,3} y^2 P_Y(y) = 1^2(7/28) + 3^2(21/28) = 7 \tag{6} \]
The variances are
\[ \mathrm{Var}[X] = E[X^2] - (E[X])^2 = 10/7, \qquad \mathrm{Var}[Y] = E[Y^2] - (E[Y])^2 = 3/4 \tag{7} \]
The standard deviations are $\sigma_X = \sqrt{10/7}$ and $\sigma_Y = \sqrt{3/4}$.

Problem 4.3.2 Solution

On the X, Y plane, the joint PMF is

[Figure: joint PMF $P_{X,Y}(x,y)$ with masses $3c$, $2c$, $c$, $c$, $c$, $c$, $2c$, $3c$ at the points $(-2,-1)$, $(-2,0)$, $(-2,1)$, $(0,-1)$, $(0,1)$, $(2,-1)$, $(2,0)$, $(2,1)$.]

The PMF sums to one when $c = 1/14$.
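As a cross-check on this constant, the PMF of Problem 4.2.2, $P_{X,Y}(x,y) = c|x+y|$ for $x \in \{-2, 0, 2\}$ and $y \in \{-1, 0, 1\}$, can be enumerated with exact rational arithmetic. The following is a minimal Python sketch and not part of the original solutions; the names `weights`, `pmf`, `p_x` and `p_y` are illustrative choices.

```python
from fractions import Fraction

# Support of the joint PMF P_{X,Y}(x, y) = c|x + y| (Problem 4.2.2).
xs, ys = (-2, 0, 2), (-1, 0, 1)
weights = {(x, y): abs(x + y) for x in xs for y in ys}

# Normalization: the masses must sum to one, so c = 1 / (sum of |x + y|).
c = Fraction(1, sum(weights.values()))
pmf = {point: c * w for point, w in weights.items()}

# Marginal PMFs: sum the joint PMF over the other variable.
p_x = {x: sum(pmf[(x, y)] for y in ys) for x in xs}
p_y = {y: sum(pmf[(x, y)] for x in xs) for y in ys}

print("c =", c)
print("P_X:", p_x)
print("P_Y:", p_y)
```

Using `Fraction` rather than floating point keeps every mass exact, so the marginals can be compared directly with the fractions written in the solution.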
(a) The marginal PMFs of X and Y are
\[ P_X(x) = \sum_{y=-1,0,1} P_{X,Y}(x,y) = \begin{cases} 6/14 & x = -2, 2 \\ 2/14 & x = 0 \\ 0 & \text{otherwise} \end{cases} \tag{1} \]
\[ P_Y(y) = \sum_{x=-2,0,2} P_{X,Y}(x,y) = \begin{cases} 5/14 & y = -1, 1 \\ 4/14 & y = 0 \\ 0 & \text{otherwise} \end{cases} \tag{2} \]

(b) The expected values of X and Y are
\[ E[X] = \sum_{x=-2,0,2} x P_X(x) = -2(6/14) + 2(6/14) = 0 \tag{3} \]
\[ E[Y] = \sum_{y=-1,0,1} y P_Y(y) = -1(5/14) + 1(5/14) = 0 \tag{4} \]

(c) Since X and Y both have zero mean, the variances are
\[ \mathrm{Var}[X] = E[X^2] = \sum_{x=-2,0,2} x^2 P_X(x) = (-2)^2(6/14) + 2^2(6/14) = 24/7 \tag{5} \]
\[ \mathrm{Var}[Y] = E[Y^2] = \sum_{y=-1,0,1} y^2 P_Y(y) = (-1)^2(5/14) + 1^2(5/14) = 5/7 \tag{6} \]
The standard deviations are $\sigma_X = \sqrt{24/7}$ and $\sigma_Y = \sqrt{5/7}$.

Problem 4.3.3 Solution

We recognize that the given joint PMF is written as the product of two marginal PMFs $P_N(n)$ and $P_K(k)$ where
\[ P_N(n) = \sum_{k=0}^{100} P_{N,K}(n,k) = \begin{cases} \dfrac{100^n e^{-100}}{n!} & n = 0, 1, \ldots \\ 0 & \text{otherwise} \end{cases} \tag{1} \]
\[ P_K(k) = \sum_{n=0}^{\infty} P_{N,K}(n,k) = \begin{cases} \dbinom{100}{k} p^k (1-p)^{100-k} & k = 0, 1, \ldots, 100 \\ 0 & \text{otherwise} \end{cases} \tag{2} \]

Problem 4.3.4 Solution

The joint PMF of N, K is
\[ P_{N,K}(n,k) = \begin{cases} (1-p)^{n-1} p / n & k = 1, 2, \ldots, n;\ n = 1, 2, \ldots \\ 0 & \text{otherwise} \end{cases} \tag{1} \]
For $n \ge 1$, the marginal PMF of N is
\[ P_N(n) = \sum_{k=1}^{n} P_{N,K}(n,k) = \sum_{k=1}^{n} (1-p)^{n-1} p / n = (1-p)^{n-1} p \tag{2} \]
The marginal PMF of K is found by summing $P_{N,K}(n,k)$ over all possible values of N. Note that if K = k, then $N \ge k$. Thus,
\[ P_K(k) = \sum_{n=k}^{\infty} \frac{1}{n} (1-p)^{n-1} p \tag{3} \]
Unfortunately, this sum cannot be simplified.

Problem 4.3.5 Solution

For $n = 0, 1, \ldots$, the marginal PMF of N is
\[ P_N(n) = \sum_k P_{N,K}(n,k) = \sum_{k=0}^{n} \frac{100^n e^{-100}}{(n+1)!} = \frac{100^n e^{-100}}{n!} \tag{1} \]
For $k = 0, 1, \ldots$, the marginal PMF of K is
\[ P_K(k) = \sum_{n=k}^{\infty} \frac{100^n e^{-100}}{(n+1)!} = \frac{1}{100} \sum_{n=k}^{\infty} \frac{100^{n+1} e^{-100}}{(n+1)!} \tag{2} \]
\[ = \frac{1}{100} \sum_{n=k}^{\infty} P_N(n+1) \tag{3} \]
\[ = P[N > k] / 100 \tag{4} \]

Problem 4.4.1 Solution

(a) The joint PDF of X and Y is
\[ f_{X,Y}(x,y) = \begin{cases} c & x + y \le 1,\ x, y \ge 0 \\ 0 & \text{otherwise} \end{cases} \tag{1} \]
[Figure: the triangular region of nonzero probability bounded by the axes and the line $Y + X = 1$.]

To find the constant $c$ we integrate over the region shown. This gives
\[ \int_0^1 \int_0^{1-x} c \, dy \, dx = \left. cx - \frac{cx^2}{2} \right|_0^1 = \frac{c}{2} = 1 \tag{2} \]
Therefore $c = 2$.
(b) To find $P[X \le Y]$, we integrate over the region indicated by the graph.

[Figure: the triangle $x + y \le 1$, $x, y \ge 0$, with the part above the line $X = Y$ (where $X \le Y$) shaded.]
\[ P[X \le Y] = \int_0^{1/2} \int_x^{1-x} 2 \, dy \, dx \tag{3} \]
\[ = \int_0^{1/2} (2 - 4x) \, dx \tag{4} \]
\[ = 1/2 \tag{5} \]

(c) The probability $P[X + Y \le 1/2]$ can be seen in the figure. Here we can set up the following integrals.

[Figure: the triangle $Y + X \le 1$ with the smaller triangle $Y + X \le 1/2$ shaded.]
\[ P[X + Y \le 1/2] = \int_0^{1/2} \int_0^{1/2 - x} 2 \, dy \, dx \tag{6} \]
\[ = \int_0^{1/2} (1 - 2x) \, dx \tag{7} \]
\[ = 1/2 - 1/4 = 1/4 \tag{8} \]

Problem 4.4.2 Solution

Given the joint PDF
\[ f_{X,Y}(x,y) = \begin{cases} cxy^2 & 0 \le x, y \le 1 \\ 0 & \text{otherwise} \end{cases} \tag{1} \]

(a) To find the constant $c$, integrate $f_{X,Y}(x,y)$ over all possible values of X and Y to get
\[ 1 = \int_0^1 \int_0^1 cxy^2 \, dx \, dy = c/6 \tag{2} \]
Therefore $c = 6$.

(b) The probability $P[X \ge Y]$ is the integral of the joint PDF $f_{X,Y}(x,y)$ over the region of the unit square below the line $Y = X$.
\[ P[X \ge Y] = \int_0^1 \int_0^x 6xy^2 \, dy \, dx \tag{3} \]
\[ = \int_0^1 2x^4 \, dx \tag{4} \]
\[ = 2/5 \tag{5} \]
Similarly, to find $P[Y \le X^2]$ we can integrate over the region of the unit square below the curve $Y = X^2$.
\[ P[Y \le X^2] = \int_0^1 \int_0^{x^2} 6xy^2 \, dy \, dx \tag{6} \]
\[ = 1/4 \tag{7} \]

(c) Here we can choose to either integrate $f_{X,Y}(x,y)$ over the lighter shaded region, which would require the evaluation of two integrals, or we can perform one integral over the darker region by recognizing

[Figure: unit square split into the lighter region $\min(X,Y) < 1/2$ and the darker region $\min(X,Y) > 1/2$.]
\[ P[\min(X,Y) \le 1/2] = 1 - P[\min(X,Y) > 1/2] \tag{8} \]
\[ = 1 - \int_{1/2}^1 \int_{1/2}^1 6xy^2 \, dx \, dy \tag{9} \]
\[ = 1 - \int_{1/2}^1 \frac{9y^2}{4} \, dy = \frac{11}{32} \tag{10} \]

(d) The probability $P[\max(X,Y) \le 3/4]$ can be found by integrating over the square region $\max(X,Y) < 3/4$.
\[ P[\max(X,Y) \le 3/4] = P[X \le 3/4, Y \le 3/4] \tag{11} \]
\[ = \int_0^{3/4} \int_0^{3/4} 6xy^2 \, dx \, dy \tag{12} \]
\[ = \left( \left. x^2 \right|_0^{3/4} \right) \left( \left. y^3 \right|_0^{3/4} \right) \tag{13} \]
\[ = (3/4)^5 = 0.237 \tag{14} \]

Problem 4.4.3 Solution

The joint PDF of X and Y is
\[ f_{X,Y}(x,y) = \begin{cases} 6e^{-(2x+3y)} & x \ge 0,\ y \ge 0, \\ 0 & \text{otherwise.} \end{cases} \]
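(a) The probability that $X \ge Y$ is:
\[ P[X \ge Y] = \int_0^\infty \int_0^x 6e^{-(2x+3y)} \, dy \, dx \tag{2} \]
\[ = \int_0^\infty 2e^{-2x} \left( \left. -e^{-3y} \right|_{y=0}^{y=x} \right) dx \tag{3} \]
\[ = \int_0^\infty [2e^{-2x} - 2e^{-5x}] \, dx = 3/5 \tag{4} \]
The probability $P[X + Y \le 1]$ is found by integrating over the region where $X + Y \le 1$:
\[ P[X + Y \le 1] = \int_0^1 \int_0^{1-x} 6e^{-(2x+3y)} \, dy \, dx \tag{5} \]
\[ = \int_0^1 2e^{-2x} \left( \left. -e^{-3y} \right|_{y=0}^{y=1-x} \right) dx \tag{6} \]
\[ = \int_0^1 2e^{-2x} \left[ 1 - e^{-3(1-x)} \right] dx \tag{7} \]
\[ = \left. -e^{-2x} - 2e^{x-3} \right|_0^1 \tag{8} \]
\[ = 1 + 2e^{-3} - 3e^{-2} \tag{9} \]

(b) The event $\min(X,Y) \ge 1$ is the same as the event $\{X \ge 1, Y \ge 1\}$. Thus,
\[ P[\min(X,Y) \ge 1] = \int_1^\infty \int_1^\infty 6e^{-(2x+3y)} \, dy \, dx = e^{-(2+3)} \tag{10} \]

(c) The event $\max(X,Y) \le 1$ is the same as the event $\{X \le 1, Y \le 1\}$ so that
\[ P[\max(X,Y) \le 1] = \int_0^1 \int_0^1 6e^{-(2x+3y)} \, dy \, dx = (1 - e^{-2})(1 - e^{-3}) \tag{11} \]

Problem 4.4.4 Solution

The only difference between this problem and Example 4.5 is that in this problem we must integrate the joint PDF over the regions to find the probabilities. Just as in Example 4.5, there are five cases. We will use variables $u$ and $v$ as dummy variables for $x$ and $y$.

• $x < 0$ or $y < 0$

In this case, the region of integration doesn't overlap the region of nonzero probability and
\[ F_{X,Y}(x,y) = \int_{-\infty}^{y} \int_{-\infty}^{x} f_{X,Y}(u,v) \, du \, dv = 0 \tag{1} \]

• $0 < y \le x \le 1$

In this case, the integral has a nonzero contribution only over the region $0 \le v \le y$, $v \le u \le x$:
\[ F_{X,Y}(x,y) = \int_{-\infty}^{y} \int_{-\infty}^{x} f_{X,Y}(u,v) \, du \, dv \tag{2} \]
\[ = \int_0^y \int_v^x 8uv \, du \, dv \tag{3} \]
\[ = \int_0^y 4(x^2 - v^2) v \, dv \tag{4} \]
\[ = \left. 2x^2v^2 - v^4 \right|_{v=0}^{v=y} = 2x^2y^2 - y^4 \tag{5} \]

• $0 < x \le y$ and $0 \le x \le 1$
\[ F_{X,Y}(x,y) = \int_{-\infty}^{x} \int_{-\infty}^{y} f_{X,Y}(u,v) \, dv \, du \tag{6} \]
\[ = \int_0^x \int_0^u 8uv \, dv \, du \tag{7} \]
\[ = \int_0^x 4u^3 \, du = x^4 \tag{8} \]

• $0 < y \le 1$ and $x \ge 1$
\[ F_{X,Y}(x,y) = \int_{-\infty}^{y} \int_{-\infty}^{x} f_{X,Y}(u,v) \, du \, dv \tag{9} \]
\[ = \int_0^y \int_v^1 8uv \, du \, dv \tag{10} \]
\[ = \int_0^y 4v(1 - v^2) \, dv \tag{11} \]
\[ = 2y^2 - y^4 \tag{12} \]

• $x \ge 1$ and $y \ge 1$

In this case, the region of integration completely covers the region of nonzero probability and
\[ F_{X,Y}(x,y) = \int_{-\infty}^{y} \int_{-\infty}^{x} f_{X,Y}(u,v) \, du \, dv \tag{13} \]
\[ = 1 \tag{14} \]

The complete answer for the joint CDF is
\[ F_{X,Y}(x,y) = \begin{cases} 0 & x < 0 \text{ or } y < 0 \\ 2x^2y^2 - y^4 & 0 < y \le x \le 1 \\ x^4 & 0 \le x \le y,\ 0 \le x \le 1 \\ 2y^2 - y^4 & 0 \le y \le 1,\ x \ge 1 \\ 1 & x \ge 1,\ y \ge 1 \end{cases} \tag{15} \]

Problem 4.5.1 Solution

(a) The joint PDF (and the corresponding region of nonzero probability) are
\[ f_{X,Y}(x,y) = \begin{cases} 1/2 & -1 \le x \le y \le 1 \\ 0 & \text{otherwise} \end{cases} \tag{1} \]
[Figure: the triangular region $-1 \le x \le y \le 1$ of the X, Y plane.]

(b)
\[ P[X > 0] = \int_0^1 \int_x^1 \frac{1}{2} \, dy \, dx = \int_0^1 \frac{1 - x}{2} \, dx = 1/4 \tag{2} \]
This result can be deduced by geometry. The shaded triangle of the X, Y plane corresponding to the event $X > 0$ is 1/4 of the total shaded area.

(c) For $x > 1$ or $x < -1$, $f_X(x) = 0$. For $-1 \le x \le 1$,
\[ f_X(x) = \int_{-\infty}^{\infty} f_{X,Y}(x,y) \, dy = \int_x^1 \frac{1}{2} \, dy = (1 - x)/2. \tag{3} \]
The complete expression for the marginal PDF is
\[ f_X(x) = \begin{cases} (1 - x)/2 & -1 \le x \le 1, \\ 0 & \text{otherwise.} \end{cases} \tag{4} \]

(d) From the marginal PDF $f_X(x)$, the expected value of X is
\[ E[X] = \int_{-\infty}^{\infty} x f_X(x) \, dx = \frac{1}{2} \int_{-1}^{1} x(1 - x) \, dx \tag{5} \]
\[ = \left. \frac{x^2}{4} - \frac{x^3}{6} \right|_{-1}^{1} = -\frac{1}{3}. \tag{6} \]

Problem 4.5.2 Solution

\[ f_{X,Y}(x,y) = \begin{cases} 2 & x + y \le 1,\ x, y \ge 0 \\ 0 & \text{otherwise} \end{cases} \tag{1} \]
[Figure: the triangular region bounded by the axes and the line $Y + X = 1$.]

Using the figure, we can find the marginal PDFs by integrating over the appropriate regions.
\[ f_X(x) = \int_0^{1-x} 2 \, dy = \begin{cases} 2(1 - x) & 0 \le x \le 1 \\ 0 & \text{otherwise} \end{cases} \tag{2} \]
Likewise for $f_Y(y)$:
\[ f_Y(y) = \int_0^{1-y} 2 \, dx = \begin{cases} 2(1 - y) & 0 \le y \le 1 \\ 0 & \text{otherwise} \end{cases} \tag{3} \]

Problem 4.5.3 Solution

Random variables X and Y have joint PDF
\[ f_{X,Y}(x,y) = \begin{cases} 1/(\pi r^2) & 0 \le x^2 + y^2 \le r^2 \\ 0 & \text{otherwise} \end{cases} \tag{1} \]

(a) The marginal PDF of X is
\[ f_X(x) = \int_{-\sqrt{r^2 - x^2}}^{\sqrt{r^2 - x^2}} \frac{1}{\pi r^2} \, dy = \begin{cases} \dfrac{2\sqrt{r^2 - x^2}}{\pi r^2} & -r \le x \le r, \\ 0 & \text{otherwise.} \end{cases} \tag{2} \]

(b) Similarly, for random variable Y,
\[ f_Y(y) = \int_{-\sqrt{r^2 - y^2}}^{\sqrt{r^2 - y^2}} \frac{1}{\pi r^2} \, dx = \begin{cases} \dfrac{2\sqrt{r^2 - y^2}}{\pi r^2} & -r \le y \le r, \\ 0 & \text{otherwise.} \end{cases} \tag{3} \]

Problem 4.5.4 Solution

The joint PDF of X and Y and the region of nonzero probability are
\[ f_{X,Y}(x,y) = \begin{cases} 5x^2/2 & -1 \le x \le 1,\ 0 \le y \le x^2 \\ 0 & \text{otherwise} \end{cases} \tag{1} \]
[Figure: the region $-1 \le x \le 1$, $0 \le y \le x^2$ under the parabola.]

We can find the appropriate marginal PDFs by integrating the joint PDF.

(a) The marginal PDF of X is
\[ f_X(x) = \int_0^{x^2} \frac{5x^2}{2} \, dy = \begin{cases} 5x^4/2 & -1 \le x \le 1 \\ 0 & \text{otherwise} \end{cases} \tag{2} \]

(b) Note that $f_Y(y) = 0$ for $y > 1$ or $y < 0$.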
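For $0 \le y \le 1$, the joint PDF is nonzero for $-1 \le x \le -\sqrt{y}$ and for $\sqrt{y} \le x \le 1$, so that
\[ f_Y(y) = \int_{-\infty}^{\infty} f_{X,Y}(x,y) \, dx \tag{3} \]
\[ = \int_{-1}^{-\sqrt{y}} \frac{5x^2}{2} \, dx + \int_{\sqrt{y}}^{1} \frac{5x^2}{2} \, dx \tag{4} \]
\[ = 5(1 - y^{3/2})/3 \tag{5} \]
The complete expression for the marginal PDF of Y is
\[ f_Y(y) = \begin{cases} 5(1 - y^{3/2})/3 & 0 \le y \le 1 \\ 0 & \text{otherwise} \end{cases} \tag{6} \]

Problem 4.5.5 Solution

In this problem, the joint PDF is
\[ f_{X,Y}(x,y) = \begin{cases} 2|xy| / r^4 & 0 \le x^2 + y^2 \le r^2 \\ 0 & \text{otherwise} \end{cases} \tag{1} \]

(a) Since $|xy| = |x||y|$, for $-r \le x \le r$, we can write
\[ f_X(x) = \int_{-\infty}^{\infty} f_{X,Y}(x,y) \, dy = \frac{2|x|}{r^4} \int_{-\sqrt{r^2 - x^2}}^{\sqrt{r^2 - x^2}} |y| \, dy \tag{2} \]
Since $|y|$ is symmetric about the origin, we can simplify the integral to
\[ f_X(x) = \frac{4|x|}{r^4} \int_0^{\sqrt{r^2 - x^2}} y \, dy = \frac{2|x|}{r^4} \left. y^2 \right|_0^{\sqrt{r^2 - x^2}} = \frac{2|x|(r^2 - x^2)}{r^4} \tag{3} \]
Note that for $|x| > r$, $f_X(x) = 0$. Hence the complete expression for the PDF of X is
\[ f_X(x) = \begin{cases} \dfrac{2|x|(r^2 - x^2)}{r^4} & -r \le x \le r \\ 0 & \text{otherwise} \end{cases} \tag{4} \]

(b) Note that the joint PDF is symmetric in $x$ and $y$ so that $f_Y(y) = f_X(y)$.

Problem 4.5.6 Solution

(a) The joint PDF of X and Y and the region of nonzero probability are
\[ f_{X,Y}(x,y) = \begin{cases} cy & 0 \le y \le x \le 1 \\ 0 & \text{otherwise} \end{cases} \tag{1} \]
[Figure: the triangular region $0 \le y \le x \le 1$.]

(b) To find the value of the constant $c$, we integrate the joint PDF over all $x$ and $y$.
\[ \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f_{X,Y}(x,y) \, dx \, dy = \int_0^1 \int_0^x cy \, dy \, dx = \int_0^1 \frac{cx^2}{2} \, dx = \left. \frac{cx^3}{6} \right|_0^1 = \frac{c}{6}. \tag{2} \]
Thus $c = 6$.

(c) We can find the CDF $F_X(x) = P[X \le x]$ by integrating the joint PDF over the event $X \le x$. For $x < 0$, $F_X(x) = 0$. For $x > 1$, $F_X(x) = 1$. For $0 \le x \le 1$,
\[ F_X(x) = \iint_{x' \le x} f_{X,Y}(x', y') \, dy' \, dx' \tag{3} \]
\[ = \int_0^x \int_0^{x'} 6y' \, dy' \, dx' \tag{4} \]
\[ = \int_0^x 3(x')^2 \, dx' = x^3. \tag{5} \]
The complete expression for the CDF of X is
\[ F_X(x) = \begin{cases} 0 & x < 0 \\ x^3 & 0 \le x \le 1 \\ 1 & x \ge 1 \end{cases} \tag{6} \]

(d) Similarly, we find the CDF of Y by integrating $f_{X,Y}(x,y)$ over the event $Y \le y$. For $y < 0$, $F_Y(y) = 0$ and for $y > 1$, $F_Y(y) = 1$. For $0 \le y \le 1$,
\[ F_Y(y) = \iint_{y' \le y} f_{X,Y}(x', y') \, dy' \, dx' \tag{7} \]
\[ = \int_0^y \int_{y'}^1 6y' \, dx' \, dy' \tag{8} \]
\[ = \int_0^y 6y'(1 - y') \, dy' \tag{9} \]
\[ = \left. 3(y')^2 - 2(y')^3 \right|_0^y = 3y^2 - 2y^3. \]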
The complete expression for the CDF of Y is
\[ F_Y(y) = \begin{cases} 0 & y < 0 \\ 3y^2 - 2y^3 & 0 \le y \le 1 \\ 1 & y > 1 \end{cases} \tag{11} \]

(e) To find $P[Y \le X/2]$, we integrate the joint PDF $f_{X,Y}(x,y)$ over the region $y \le x/2$.
\[ P[Y \le X/2] = \int_0^1 \int_0^{x/2} 6y \, dy \, dx \tag{12} \]
\[ = \int_0^1 \left. 3y^2 \right|_0^{x/2} dx \tag{13} \]
\[ = \int_0^1 \frac{3x^2}{4} \, dx = 1/4 \tag{14} \]

Problem 4.6.1 Solution

In this problem, it is helpful to label possible points X, Y along with the corresponding values of $W = X - Y$. From the statement of Problem 4.6.1, the points, their probabilities and the values of W are
\[ \begin{array}{c|c|c} (x, y) & P_{X,Y}(x,y) & W = X - Y \\ \hline (1,1) & 1/28 & 0 \\ (1,3) & 3/28 & -2 \\ (2,1) & 2/28 & 1 \\ (2,3) & 6/28 & -1 \\ (4,1) & 4/28 & 3 \\ (4,3) & 12/28 & 1 \end{array} \]

(a) To find the PMF of W, we simply add the probabilities associated with each possible value of W:
\[ P_W(-2) = P_{X,Y}(1,3) = 3/28, \qquad P_W(-1) = P_{X,Y}(2,3) = 6/28 \tag{1} \]
\[ P_W(0) = P_{X,Y}(1,1) = 1/28, \qquad P_W(1) = P_{X,Y}(2,1) + P_{X,Y}(4,3) = 14/28 \tag{2} \]
\[ P_W(3) = P_{X,Y}(4,1) = 4/28 \tag{3} \]
For all other values of $w$, $P_W(w) = 0$.

(b) The expected value of W is
\[ E[W] = \sum_w w P_W(w) \tag{4} \]
\[ = -2(3/28) - 1(6/28) + 0(1/28) + 1(14/28) + 3(4/28) = 1/2 \tag{5} \]

(c) $P[W > 0] = P_W(1) + P_W(3) = 18/28$.

Problem 4.6.2 Solution

In Problem 4.2.2, the joint PMF $P_{X,Y}(x,y)$ is given in terms of the parameter $c$. For this problem, we first need to find $c$. Before doing so, it is convenient to label each possible X, Y point with the corresponding value of $W = X + 2Y$.
\[ \begin{array}{c|c|c} (x, y) & P_{X,Y}(x,y) & W = X + 2Y \\ \hline (-2,-1) & 3c & -4 \\ (-2,0) & 2c & -2 \\ (-2,1) & c & 0 \\ (0,-1) & c & -2 \\ (0,1) & c & 2 \\ (2,-1) & c & 0 \\ (2,0) & 2c & 2 \\ (2,1) & 3c & 4 \end{array} \]
To find $c$, we sum the PMF over all possible values of X and Y. We choose $c$ so the sum equals one.
\[ \sum_x \sum_y P_{X,Y}(x,y) = \sum_{x=-2,0,2} \sum_{y=-1,0,1} c\,|x + y| \tag{1} \]
\[ = 6c + 2c + 6c = 14c \tag{2} \]
Thus $c = 1/14$. Now we can solve the actual problem.

(a) From the labeled points above, we can calculate the probability of each possible value of w.
\[ P_W(-4) = P_{X,Y}(-2,-1) = 3c \tag{3} \]
\[ P_W(-2) = P_{X,Y}(-2,0) + P_{X,Y}(0,-1) = 3c \tag{4} \]
\[ P_W(0) = P_{X,Y}(-2,1) + P_{X,Y}(2,-1) = 2c \tag{5} \]
\[ P_W(2) = P_{X,Y}(0,1) + P_{X,Y}(2,0) = 3c \tag{6} \]
\[ P_W(4) = P_{X,Y}(2,1) = 3c \tag{7} \]
With $c = 1/14$, we can summarize the PMF as
\[ P_W(w) = \begin{cases} 3/14 & w = -4, -2, 2, 4 \\ 2/14 & w = 0 \\ 0 & \text{otherwise} \end{cases} \tag{8} \]

(b) The expected value is now straightforward:
\[ E[W] = \frac{3}{14}(-4 - 2 + 2 + 4) + \frac{2}{14}(0) = 0. \tag{9} \]

(c) Lastly, $P[W > 0] = P_W(2) + P_W(4) = 3/7$.

Problem 4.6.3 Solution

We observe that when $X = x$, we must have $Y = w - x$ in order for $W = w$. That is,
\[ P_W(w) = \sum_{x=-\infty}^{\infty} P[X = x, Y = w - x] = \sum_{x=-\infty}^{\infty} P_{X,Y}(x, w - x) \tag{1} \]

Problem 4.6.4 Solution

[Figure: the $x, y$ pairs with nonzero probability, with the square region $W > w$ marked.]

The $x, y$ pairs with nonzero probability are shown in the figure. For $w = 0, 1, \ldots, 10$, we observe that
\[ P[W > w] = P[\min(X,Y) > w] \tag{1} \]
\[ = P[X > w, Y > w] \tag{2} \]
\[ = 0.01(10 - w)^2 \tag{3} \]
To find the PMF of W, we observe that for $w = 1, \ldots, 10$,
\[ P_W(w) = P[W > w - 1] - P[W > w] \tag{4} \]
\[ = 0.01[(10 - w + 1)^2 - (10 - w)^2] = 0.01(21 - 2w) \tag{5} \]
The complete expression for the PMF of W is
\[ P_W(w) = \begin{cases} 0.01(21 - 2w) & w = 1, 2, \ldots, 10 \\ 0 & \text{otherwise} \end{cases} \tag{6} \]

Problem 4.6.5 Solution

[Figure: the $x, y$ pairs with nonzero probability, with the square region $V < v$ marked.]

The $x, y$ pairs with nonzero probability are shown in the figure. For $v = 1, \ldots, 11$, we observe that
\[ P[V < v] = P[\max(X,Y) < v] \tag{1} \]
\[ = P[X < v, Y < v] \tag{2} \]
\[ = 0.01(v - 1)^2 \tag{3} \]
To find the PMF of V, we observe that for $v = 1, \ldots, 10$,
\[ P_V(v) = P[V < v + 1] - P[V < v] \tag{4} \]
\[ = 0.01[v^2 - (v - 1)^2] \tag{5} \]
\[ = 0.01(2v - 1) \tag{6} \]
The complete expression for the PMF of V is
\[ P_V(v) = \begin{cases} 0.01(2v - 1) & v = 1, 2, \ldots, 10 \\ 0 & \text{otherwise} \end{cases} \tag{7} \]

Problem 4.6.6 Solution

(a) The minimum value of W is $W = 0$, which occurs when $X = 0$ and $Y = 0$. The maximum value of W is $W = 1$, which occurs when $X = 1$ or $Y = 1$. The range of W is $S_W = \{w \mid 0 \le w \le 1\}$.
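(b) For $0 \le w \le 1$, the CDF of W is
\[ F_W(w) = P[\max(X,Y) \le w] \tag{1} \]
\[ = P[X \le w, Y \le w] \tag{2} \]
\[ = \int_0^w \int_0^w f_{X,Y}(x,y) \, dy \, dx \tag{3} \]
Substituting $f_{X,Y}(x,y) = x + y$ yields
\[ F_W(w) = \int_0^w \int_0^w (x + y) \, dy \, dx \tag{4} \]
\[ = \int_0^w \left( \left. xy + \frac{y^2}{2} \right|_{y=0}^{y=w} \right) dx = \int_0^w (wx + w^2/2) \, dx = w^3 \tag{5} \]
The complete expression for the CDF is
\[ F_W(w) = \begin{cases} 0 & w < 0 \\ w^3 & 0 \le w \le 1 \\ 1 & \text{otherwise} \end{cases} \tag{6} \]
The PDF of W is found by differentiating the CDF.
\[ f_W(w) = \frac{dF_W(w)}{dw} = \begin{cases} 3w^2 & 0 \le w \le 1 \\ 0 & \text{otherwise} \end{cases} \tag{7} \]

Problem 4.6.7 Solution

(a) Since the joint PDF $f_{X,Y}(x,y)$ is nonzero only for $0 \le y \le x \le 1$, we observe that $W = Y - X \le 0$ since $Y \le X$. In addition, the most negative value of W occurs when $Y = 0$ and $X = 1$ and $W = -1$. Hence the range of W is $S_W = \{w \mid -1 \le w \le 0\}$.

(b) For $w < -1$, $F_W(w) = 0$. For $w > 0$, $F_W(w) = 1$. For $-1 \le w \le 0$, the event $\{Y - X \le w\}$ is the part of the triangle $0 \le y \le x \le 1$ below the line $Y = X + w$, and the CDF of W is
\[ F_W(w) = P[Y - X \le w] \tag{1} \]
\[ = \int_{-w}^{1} \int_0^{x+w} 6y \, dy \, dx \tag{2} \]
\[ = \int_{-w}^{1} 3(x + w)^2 \, dx \tag{3} \]
\[ = \left. (x + w)^3 \right|_{-w}^{1} = (1 + w)^3 \tag{4} \]
Therefore, the complete CDF of W is
\[ F_W(w) = \begin{cases} 0 & w < -1 \\ (1 + w)^3 & -1 \le w \le 0 \\ 1 & w > 0 \end{cases} \tag{5} \]
By taking the derivative of $F_W(w)$ with respect to $w$, we obtain the PDF
\[ f_W(w) = \begin{cases} 3(w + 1)^2 & -1 \le w \le 0 \\ 0 & \text{otherwise} \end{cases} \tag{6} \]

Problem 4.6.8 Solution

Random variables X and Y have joint PDF
\[ f_{X,Y}(x,y) = \begin{cases} 2 & 0 \le y \le x \le 1 \\ 0 & \text{otherwise} \end{cases} \tag{1} \]
[Figure: the triangular region $0 \le y \le x \le 1$.]

(a) Since X and Y are both nonnegative, $W = Y/X \ge 0$. Since $Y \le X$, $W = Y/X \le 1$. Note that $W = 0$ can occur if $Y = 0$. Thus the range of W is $S_W = \{w \mid 0 \le w \le 1\}$.

(b) For $0 \le w \le 1$, the event $\{Y \le wX\}$ is the part of the triangle below the line $Y = wX$, and the CDF of W is
\[ F_W(w) = P[Y/X \le w] = P[Y \le wX] = w \tag{2} \]
The complete expression for the CDF is
\[ F_W(w) = \begin{cases} 0 & w < 0 \\ w & 0 \le w < 1 \\ 1 & w \ge 1 \end{cases} \tag{3} \]
By taking the derivative of the CDF, we find that the PDF of W is
\[ f_W(w) = \begin{cases} 1 & 0 \le w < 1 \\ 0 & \text{otherwise} \end{cases} \tag{4} \]
We see that W has a uniform PDF over $[0, 1]$. Thus $E[W] = 1/2$.

Problem 4.6.9 Solution

Random variables X and Y have joint PDF
\[ f_{X,Y}(x,y) = \begin{cases} 2 & 0 \le y \le x \le 1 \\ 0 & \text{otherwise} \end{cases} \tag{1} \]
[Figure: the triangular region $0 \le y \le x \le 1$.]

(a) Since $f_{X,Y}(x,y) = 0$ for $y > x$, we can conclude that $Y \le X$ and that $W = X/Y \ge 1$.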
Since Y can be arbitrarily small but positive, W can be arbitrarily large. Hence the range of W is $S_W = \{w \mid w \ge 1\}$.

(b) For $w \ge 1$, the CDF of W is
\[ F_W(w) = P[X/Y \le w] \tag{2} \]
\[ = 1 - P[X/Y > w] \tag{3} \]
\[ = 1 - P[Y < X/w] \tag{4} \]
\[ = 1 - 1/w \tag{5} \]
Note that we have used the fact that $P[Y < X/w]$ equals the PDF value 2 times the area $(1/2)(1)(1/w)$ of the corresponding triangle below the line $Y = X/w$. The complete CDF is
\[ F_W(w) = \begin{cases} 0 & w < 1 \\ 1 - 1/w & w \ge 1 \end{cases} \tag{6} \]
The PDF of W is found by differentiating the CDF.
\[ f_W(w) = \frac{dF_W(w)}{dw} = \begin{cases} 1/w^2 & w \ge 1 \\ 0 & \text{otherwise} \end{cases} \tag{7} \]
To find the expected value $E[W]$, we write
\[ E[W] = \int_{-\infty}^{\infty} w f_W(w) \, dw = \int_1^{\infty} \frac{dw}{w}. \tag{8} \]
However, the integral diverges and $E[W]$ is undefined.

Problem 4.6.10 Solution

The position of the mobile phone is equally likely to be anywhere in the area of a circle with radius 4 km. Let X and Y denote the position of the mobile. Since we are given that the cell has a radius of 4 km, we will measure X and Y in kilometers. Assuming the base station is at the origin of the X, Y plane, the joint PDF of X and Y is
\[ f_{X,Y}(x,y) = \begin{cases} \dfrac{1}{16\pi} & x^2 + y^2 \le 16 \\ 0 & \text{otherwise} \end{cases} \tag{1} \]
Since the mobile's radial distance from the base station is $R = \sqrt{X^2 + Y^2}$, the CDF of R is
\[ F_R(r) = P[R \le r] = P[X^2 + Y^2 \le r^2] \tag{2} \]
By changing to polar coordinates, we see that for $0 \le r \le 4$ km,
\[ F_R(r) = \int_0^{2\pi} \int_0^r \frac{r'}{16\pi} \, dr' \, d\theta = r^2/16 \tag{3} \]
So
\[ F_R(r) = \begin{cases} 0 & r < 0 \\ r^2/16 & 0 \le r < 4 \\ 1 & r \ge 4 \end{cases} \tag{4} \]
Then by taking the derivative with respect to $r$ we arrive at the PDF
\[ f_R(r) = \begin{cases} r/8 & 0 \le r \le 4 \\ 0 & \text{otherwise} \end{cases} \tag{5} \]

Problem 4.6.11 Solution

Following the hint, we observe that either $Y \ge X$ or $X \ge Y$, or, equivalently, $(Y/X) \ge 1$ or $(X/Y) \ge 1$. Hence, $W \ge 1$. To find the CDF $F_W(w)$, we know that $F_W(w) = 0$ for $w < 1$. For $w \ge 1$, we solve
\[ F_W(w) = P[\max[(X/Y), (Y/X)] \le w] \]
\[ = P[(X/Y) \le w, (Y/X) \le w] \]
\[ = P[Y \ge X/w, Y \le wX] \]
\[ = P[X/w \le Y \le wX] \]
[Figure: the square $0 \le x, y \le a$ with the lines $Y = wX$ and $Y = X/w$, which meet the edges of the square at distance $a/w$ from the axes; the region between the lines is shaded dark.]

We note that in the middle of the above steps, nonnegativity of X and Y was essential. We can depict the given set $\{X/w \le Y \le wX\}$ as the dark region on the X, Y plane.
Because the PDF is uniform over the square, it is easier to use geometry to calculate the probability. In particular, each of the lighter triangles that are not part of the region of interest has area $a^2/2w$. This implies
\[ P[X/w \le Y \le wX] = 1 - \frac{a^2/2w + a^2/2w}{a^2} = 1 - \frac{1}{w} \tag{1} \]
The final expression for the CDF of W is
\[ F_W(w) = \begin{cases} 0 & w < 1 \\ 1 - 1/w & w \ge 1 \end{cases} \tag{2} \]
By taking the derivative, we obtain the PDF
\[ f_W(w) = \begin{cases} 0 & w < 1 \\ 1/w^2 & w \ge 1 \end{cases} \tag{3} \]

Problem 4.7.1 Solution

[Figure: joint PMF $P_{X,Y}(x,y)$ with masses $1/28$, $3/28$, $2/28$, $6/28$, $4/28$, $12/28$ at the points $(1,1)$, $(1,3)$, $(2,1)$, $(2,3)$, $(4,1)$, $(4,3)$.]

In Problem 4.2.1, we found the joint PMF $P_{X,Y}(x,y)$ as shown. Also the expected values and variances were
\[ E[X] = 3, \qquad \mathrm{Var}[X] = 10/7 \tag{1} \]
\[ E[Y] = 5/2, \qquad \mathrm{Var}[Y] = 3/4 \tag{2} \]
We use these results now to solve this problem.

(a) Random variable $W = Y/X$ has expected value
\[ E[Y/X] = \sum_{x=1,2,4} \sum_{y=1,3} \frac{y}{x} P_{X,Y}(x,y) \tag{3} \]
\[ = \frac{1}{1}\frac{1}{28} + \frac{3}{1}\frac{3}{28} + \frac{1}{2}\frac{2}{28} + \frac{3}{2}\frac{6}{28} + \frac{1}{4}\frac{4}{28} + \frac{3}{4}\frac{12}{28} = 15/14 \tag{4} \]

(b) The correlation of X and Y is
\[ r_{X,Y} = \sum_{x=1,2,4} \sum_{y=1,3} xy P_{X,Y}(x,y) \tag{5} \]
\[ = \frac{1 \cdot 1 \cdot 1}{28} + \frac{1 \cdot 3 \cdot 3}{28} + \frac{2 \cdot 1 \cdot 2}{28} + \frac{2 \cdot 3 \cdot 6}{28} + \frac{4 \cdot 1 \cdot 4}{28} + \frac{4 \cdot 3 \cdot 12}{28} \tag{6} \]
\[ = 210/28 = 15/2 \tag{7} \]
Recognizing that $P_{X,Y}(x,y) = xy/28$ yields the faster calculation
\[ r_{X,Y} = E[XY] = \sum_{x=1,2,4} \sum_{y=1,3} \frac{(xy)^2}{28} \tag{8} \]
\[ = \frac{1}{28} \sum_{x=1,2,4} x^2 \sum_{y=1,3} y^2 \tag{9} \]
\[ = \frac{1}{28}(1 + 2^2 + 4^2)(1^2 + 3^2) = \frac{210}{28} = \frac{15}{2} \tag{10} \]

(c) The covariance of X and Y is
\[ \mathrm{Cov}[X,Y] = E[XY] - E[X]E[Y] = \frac{15}{2} - 3 \cdot \frac{5}{2} = 0 \tag{11} \]

(d) Since X and Y have zero covariance, the correlation coefficient is
\[ \rho_{X,Y} = \frac{\mathrm{Cov}[X,Y]}{\sqrt{\mathrm{Var}[X]\,\mathrm{Var}[Y]}} = 0. \tag{12} \]

(e) Since X and Y are uncorrelated, the variance of $X + Y$ is
\[ \mathrm{Var}[X + Y] = \mathrm{Var}[X] + \mathrm{Var}[Y] = \frac{61}{28}. \tag{13} \]

Problem 4.7.2 Solution

[Figure: joint PMF $P_{X,Y}(x,y)$ with masses $3/14$, $2/14$, $1/14$, $1/14$, $1/14$, $1/14$, $2/14$, $3/14$ at the points $(-2,-1)$, $(-2,0)$, $(-2,1)$, $(0,-1)$, $(0,1)$, $(2,-1)$, $(2,0)$, $(2,1)$.]

In Problem 4.2.2, we found the joint PMF $P_{X,Y}(x,y)$ shown here. The expected values and variances were found to be
\[ E[X] = 0, \qquad \mathrm{Var}[X] = 24/7 \tag{1} \]
\[ E[Y] = 0, \qquad \mathrm{Var}[Y] = 5/7 \tag{2} \]
We need these results to solve this problem.
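These moments, and any other expectation $E[g(X,Y)]$ under this PMF, can be checked by direct enumeration. The sketch below is a hedged illustration in Python using exact fractions; the helper name `expect` and the variable names are illustrative and not from the text, and the PMF is taken to be $|x+y|/14$ on $x \in \{-2,0,2\}$, $y \in \{-1,0,1\}$.

```python
from fractions import Fraction

# Joint PMF P_{X,Y}(x, y) = |x + y| / 14 on x in {-2, 0, 2}, y in {-1, 0, 1}.
pmf = {(x, y): Fraction(abs(x + y), 14) for x in (-2, 0, 2) for y in (-1, 0, 1)}

def expect(g):
    """Return E[g(X, Y)] by summing g(x, y) * P_{X,Y}(x, y) over the support."""
    return sum(g(x, y) * p for (x, y), p in pmf.items())

mean_x = expect(lambda x, y: x)
mean_y = expect(lambda x, y: y)
var_x = expect(lambda x, y: x * x) - mean_x ** 2
var_y = expect(lambda x, y: y * y) - mean_y ** 2

corr = expect(lambda x, y: x * y)                    # correlation E[XY]
cov = corr - mean_x * mean_y                         # covariance
var_sum = var_x + var_y + 2 * cov                    # Var[X + Y]
e_pow = expect(lambda x, y: Fraction(2) ** (x * y))  # E[2^(XY)]

print(mean_x, mean_y, var_x, var_y)
print(corr, cov, var_sum, e_pow)
```

Passing the function $g$ as a lambda keeps one summation loop for every moment, which mirrors the way each part of the problem applies the same double sum to a different function of X and Y.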
(a) Random variable $W = 2^{XY}$ has expected value
\[ E[2^{XY}] = \sum_{x=-2,0,2} \sum_{y=-1,0,1} 2^{xy} P_{X,Y}(x,y) \tag{3} \]
\[ = 2^{-2(-1)} \frac{3}{14} + 2^{-2(0)} \frac{2}{14} + 2^{-2(1)} \frac{1}{14} + 2^{0(-1)} \frac{1}{14} + 2^{0(1)} \frac{1}{14} \tag{4} \]
\[ \quad + 2^{2(-1)} \frac{1}{14} + 2^{2(0)} \frac{2}{14} + 2^{2(1)} \frac{3}{14} \tag{5} \]
\[ = 61/28 \tag{6} \]

(b) The correlation of X and Y is
\[ r_{X,Y} = \sum_{x=-2,0,2} \sum_{y=-1,0,1} xy P_{X,Y}(x,y) \tag{7} \]
\[ = \frac{-2(-1)(3)}{14} + \frac{-2(0)(2)}{14} + \frac{-2(1)(1)}{14} + \frac{2(-1)(1)}{14} + \frac{2(0)(2)}{14} + \frac{2(1)(3)}{14} \tag{8} \]
\[ = 4/7 \tag{9} \]

(c) The covariance of X and Y is
\[ \mathrm{Cov}[X,Y] = E[XY] - E[X]E[Y] = 4/7 \tag{10} \]

(d) The correlation coefficient is
\[ \rho_{X,Y} = \frac{\mathrm{Cov}[X,Y]}{\sqrt{\mathrm{Var}[X]\,\mathrm{Var}[Y]}} = \frac{2}{\sqrt{30}} \tag{11} \]

(e) By Theorem 4.16,
\[ \mathrm{Var}[X + Y] = \mathrm{Var}[X] + \mathrm{Var}[Y] + 2\,\mathrm{Cov}[X,Y] \tag{12} \]
\[ = \frac{24}{7} + \frac{5}{7} + 2 \cdot \frac{4}{7} = \frac{37}{7}. \tag{13} \]

Problem 4.7.3 Solution

In the solution to Quiz 4.3, the joint PMF and the marginal PMFs are
\[ \begin{array}{c|ccc|c} P_{H,B}(h,b) & b = 0 & b = 2 & b = 4 & P_H(h) \\ \hline h = -1 & 0 & 0.4 & 0.2 & 0.6 \\ h = 0 & 0.1 & 0 & 0.1 & 0.2 \\ h = 1 & 0.1 & 0.1 & 0 & 0.2 \\ \hline P_B(b) & 0.2 & 0.5 & 0.3 & \end{array} \tag{1} \]
From the joint PMF, the correlation is
\[ r_{H,B} = E[HB] = \sum_{h=-1}^{1} \sum_{b=0,2,4} hb P_{H,B}(h,b) \tag{2} \]
\[ = -1(2)(0.4) + 1(2)(0.1) - 1(4)(0.2) + 1(4)(0) \tag{3} \]
\[ = -1.4 \tag{4} \]
since only terms in which both $h$ and $b$ are nonzero make a contribution. Using the marginal PMFs, the expected values of H and B are
\[ E[H] = \sum_{h=-1}^{1} h P_H(h) = -1(0.6) + 0(0.2) + 1(0.2) = -0.4 \tag{5} \]
\[ E[B] = \sum_{b=0,2,4} b P_B(b) = 0(0.2) + 2(0.5) + 4(0.3) = 2.2 \tag{6} \]
The covariance is
\[ \mathrm{Cov}[H,B] = E[HB] - E[H]E[B] = -1.4 - (-0.4)(2.2) = -0.52 \tag{7} \]

Problem 4.7.4 Solution

From the joint PMF, $P_{X,Y}(x,y)$, found in Example 4.13, we can find the marginal PMF for X or Y by summing over the columns or rows of the joint PMF.
\[ P_Y(y) = \begin{cases} 25/48 & y = 1 \\ 13/48 & y = 2 \\ 7/48 & y = 3 \\ 3/48 & y = 4 \\ 0 & \text{otherwise} \end{cases} \qquad P_X(x) = \begin{cases} 1/4 & x = 1, 2, 3, 4 \\ 0 & \text{otherwise} \end{cases} \tag{1} \]

(a) The expected values are
\[ E[Y] = \sum_{y=1}^{4} y P_Y(y) = 1 \cdot \frac{25}{48} + 2 \cdot \frac{13}{48} + 3 \cdot \frac{7}{48} + 4 \cdot \frac{3}{48} = 7/4 \tag{2} \]
\[ E[X] = \sum_{x=1}^{4} x P_X(x) = \frac{1}{4}(1 + 2 + 3 + 4) = 5/2 \tag{3} \]

(b) To find the variances, we first find the second moments.
\[ E[Y^2] = \sum_{y=1}^{4} y^2 P_Y(y) = 1^2 \cdot \frac{25}{48} + 2^2 \cdot \frac{13}{48} + 3^2 \cdot \frac{7}{48} + 4^2 \cdot \frac{3}{48} = 47/12 \tag{4} \]
\[ E[X^2] = \sum_{x=1}^{4} x^2 P_X(x) = \frac{1}{4}(1^2 + 2^2 + 3^2 + 4^2) = 15/2 \tag{5} \]
Now the variances are
\[ \mathrm{Var}[Y] = E[Y^2] - (E[Y])^2 = 47/12 - (7/4)^2 = 41/48 \tag{6} \]
\[ \mathrm{Var}[X] = E[X^2] - (E[X])^2 = 15/2 - (5/2)^2 = 5/4 \tag{7} \]

(c) To find the correlation, we evaluate the product XY over all values of X and Y. Specifically,
\[ r_{X,Y} = E[XY] = \sum_{x=1}^{4} \sum_{y=1}^{x} xy P_{X,Y}(x,y) \tag{8} \]
\[ = \frac{1}{4} + \frac{2}{8} + \frac{3}{12} + \frac{4}{16} + \frac{4}{8} + \frac{6}{12} + \frac{8}{16} + \frac{9}{12} + \frac{12}{16} + \frac{16}{16} \tag{9} \]
\[ = 5 \tag{10} \]

(d) The covariance of X and Y is
\[ \mathrm{Cov}[X,Y] = E[XY] - E[X]E[Y] = 5 - (7/4)(10/4) = 10/16 \tag{11} \]

(e) The correlation coefficient is
\[ \rho_{X,Y} = \frac{\mathrm{Cov}[X,Y]}{\sqrt{\mathrm{Var}[X]\,\mathrm{Var}[Y]}} = \frac{10/16}{\sqrt{(41/48)(5/4)}} \approx 0.605 \tag{12} \]

Problem 4.7.5 Solution

For integers $0 \le x \le 5$, the marginal PMF of X is
\[ P_X(x) = \sum_y P_{X,Y}(x,y) = \sum_{y=0}^{x} (1/21) = \frac{x + 1}{21} \tag{1} \]
Similarly, for integers $0 \le y \le 5$, the marginal PMF of Y is
\[ P_Y(y) = \sum_x P_{X,Y}(x,y) = \sum_{x=y}^{5} (1/21) = \frac{6 - y}{21} \tag{2} \]
The complete expressions for the marginal PMFs are
\[ P_X(x) = \begin{cases} (x + 1)/21 & x = 0, \ldots, 5 \\ 0 & \text{otherwise} \end{cases} \tag{3} \]
\[ P_Y(y) = \begin{cases} (6 - y)/21 & y = 0, \ldots, 5 \\ 0 & \text{otherwise} \end{cases} \tag{4} \]
The expected values are
\[ E[X] = \sum_{x=0}^{5} x \frac{x + 1}{21} = \frac{70}{21} = \frac{10}{3}, \qquad E[Y] = \sum_{y=0}^{5} y \frac{6 - y}{21} = \frac{35}{21} = \frac{5}{3} \tag{5} \]
To find the covariance, we first find the correlation
\[ E[XY] = \sum_{x=0}^{5} \sum_{y=0}^{x} \frac{xy}{21} = \frac{1}{21} \sum_{x=1}^{5} x \sum_{y=1}^{x} y = \frac{1}{42} \sum_{x=1}^{5} x^2(x + 1) = \frac{280}{42} = \frac{20}{3} \tag{6} \]
The covariance of X and Y is
\[ \mathrm{Cov}[X,Y] = E[XY] - E[X]E[Y] = \frac{20}{3} - \frac{50}{9} = \frac{10}{9} \tag{7} \]

Problem 4.7.6 Solution

[Figure: joint PMF $P_{X,Y}(x,y) = 1/(4x)$ at the integer points $1 \le y \le x \le 4$ (masses $1/4$; $1/8$, $1/8$; $1/12$ three times; $1/16$ four times), each point labeled with $W = y$ and $V = x$.]

To solve this problem, we identify the values of $W = \min(X,Y)$ and $V = \max(X,Y)$ for each possible pair $x, y$. Here we observe that $W = Y$ and $V = X$. This is a result of the underlying experiment in that given X = x, each Y ∈ {1, 2, . . .
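, x} is equally likely. Hence $Y \le X$. This implies $\min(X,Y) = Y$ and $\max(X,Y) = X$. Using the results from Problem 4.7.4, we have the following answers.

(a) The expected values are
\[ E[W] = E[Y] = 7/4, \qquad E[V] = E[X] = 5/2 \tag{1} \]

(b) The variances are
\[ \mathrm{Var}[W] = \mathrm{Var}[Y] = 41/48, \qquad \mathrm{Var}[V] = \mathrm{Var}[X] = 5/4 \tag{2} \]

(c) The correlation is
\[ r_{W,V} = E[WV] = E[XY] = r_{X,Y} = 5 \tag{3} \]

(d) The covariance of W and V is
\[ \mathrm{Cov}[W,V] = \mathrm{Cov}[X,Y] = 10/16 \tag{4} \]

(e) The correlation coefficient is
\[ \rho_{W,V} = \rho_{X,Y} = \frac{10/16}{\sqrt{(41/48)(5/4)}} \approx 0.605 \tag{5} \]

Problem 4.7.7 Solution

First, we observe that Y has mean $\mu_Y = a\mu_X + b$ and variance $\mathrm{Var}[Y] = a^2\,\mathrm{Var}[X]$. The covariance of X and Y is
\[ \mathrm{Cov}[X,Y] = E[(X - \mu_X)(aX + b - a\mu_X - b)] \tag{1} \]
\[ = aE[(X - \mu_X)^2] \tag{2} \]
\[ = a\,\mathrm{Var}[X] \tag{3} \]
The correlation coefficient is
\[ \rho_{X,Y} = \frac{\mathrm{Cov}[X,Y]}{\sqrt{\mathrm{Var}[X]\,\mathrm{Var}[Y]}} = \frac{a\,\mathrm{Var}[X]}{\sqrt{\mathrm{Var}[X]}\sqrt{a^2\,\mathrm{Var}[X]}} = \frac{a}{|a|} \tag{4} \]
When $a > 0$, $\rho_{X,Y} = 1$. When $a < 0$, $\rho_{X,Y} = -1$.

Problem 4.7.8 Solution

The joint PDF of X and Y is
\[ f_{X,Y}(x,y) = \begin{cases} (x + y)/3 & 0 \le x \le 1,\ 0 \le y \le 2 \\ 0 & \text{otherwise} \end{cases} \tag{1} \]
Before calculating moments, we first find the marginal PDFs of X and Y. For $0 \le x \le 1$,
\[ f_X(x) = \int_{-\infty}^{\infty} f_{X,Y}(x,y) \, dy = \int_0^2 \frac{x + y}{3} \, dy = \left. \frac{xy}{3} + \frac{y^2}{6} \right|_{y=0}^{y=2} = \frac{2x + 2}{3} \tag{2} \]
For $0 \le y \le 2$,
\[ f_Y(y) = \int_{-\infty}^{\infty} f_{X,Y}(x,y) \, dx = \int_0^1 \left( \frac{x}{3} + \frac{y}{3} \right) dx = \left. \frac{x^2}{6} + \frac{xy}{3} \right|_{x=0}^{x=1} = \frac{2y + 1}{6} \tag{3} \]
Complete expressions for the marginal PDFs are
\[ f_X(x) = \begin{cases} \dfrac{2x + 2}{3} & 0 \le x \le 1 \\ 0 & \text{otherwise} \end{cases} \qquad f_Y(y) = \begin{cases} \dfrac{2y + 1}{6} & 0 \le y \le 2 \\ 0 & \text{otherwise} \end{cases} \tag{4} \]

(a) The expected value of X is
\[ E[X] = \int_{-\infty}^{\infty} x f_X(x) \, dx = \int_0^1 x \frac{2x + 2}{3} \, dx = \left. \frac{2x^3}{9} + \frac{x^2}{3} \right|_0^1 = \frac{5}{9} \tag{5} \]
The second moment of X is
\[ E[X^2] = \int_{-\infty}^{\infty} x^2 f_X(x) \, dx = \int_0^1 x^2 \frac{2x + 2}{3} \, dx = \left. \frac{x^4}{6} + \frac{2x^3}{9} \right|_0^1 = \frac{7}{18} \tag{6} \]
The variance of X is $\mathrm{Var}[X] = E[X^2] - (E[X])^2 = 7/18 - (5/9)^2 = 13/162$.

(b) The expected value of Y is
\[ E[Y] = \int_{-\infty}^{\infty} y f_Y(y) \, dy = \int_0^2 y \frac{2y + 1}{6} \, dy = \left. \frac{y^2}{12} + \frac{y^3}{9} \right|_0^2 = \frac{11}{9} \tag{7} \]
The second moment of Y is
\[ E[Y^2] = \int_{-\infty}^{\infty} y^2 f_Y(y) \, dy = \int_0^2 y^2 \frac{2y + 1}{6} \, dy = \left. \frac{y^3}{18} + \frac{y^4}{12} \right|_0^2 = \frac{16}{9} \tag{8} \]
The variance of Y is $\mathrm{Var}[Y] = E[Y^2] - (E[Y])^2 = 23/81$.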
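(c) The correlation of X and Y is
\[ E[XY] = \iint xy f_{X,Y}(x,y) \, dx \, dy \tag{9} \]
\[ = \int_0^1 \int_0^2 xy \frac{x + y}{3} \, dy \, dx \tag{10} \]
\[ = \int_0^1 \left( \left. \frac{x^2y^2}{6} + \frac{xy^3}{9} \right|_{y=0}^{y=2} \right) dx \tag{11} \]
\[ = \int_0^1 \left( \frac{2x^2}{3} + \frac{8x}{9} \right) dx = \left. \frac{2x^3}{9} + \frac{4x^2}{9} \right|_0^1 = \frac{2}{3} \tag{12} \]
The covariance is $\mathrm{Cov}[X,Y] = E[XY] - E[X]E[Y] = -1/81$.

(d) The expected value of $X + Y$ is
\[ E[X + Y] = E[X] + E[Y] = 5/9 + 11/9 = 16/9 \tag{13} \]

(e) By Theorem 4.15,
\[ \mathrm{Var}[X + Y] = \mathrm{Var}[X] + \mathrm{Var}[Y] + 2\,\mathrm{Cov}[X,Y] = \frac{13}{162} + \frac{23}{81} - \frac{2}{81} = \frac{55}{162} \tag{14} \]

Problem 4.7.9 Solution

(a) The first moment of X is
\[ E[X] = \int_0^1 \int_0^1 4x^2y \, dy \, dx = \int_0^1 2x^2 \, dx = \frac{2}{3} \tag{1} \]
The second moment of X is
\[ E[X^2] = \int_0^1 \int_0^1 4x^3y \, dy \, dx = \int_0^1 2x^3 \, dx = \frac{1}{2} \tag{2} \]
The variance of X is $\mathrm{Var}[X] = E[X^2] - (E[X])^2 = 1/2 - (2/3)^2 = 1/18$.

(b) The mean of Y is
\[ E[Y] = \int_0^1 \int_0^1 4xy^2 \, dy \, dx = \int_0^1 \frac{4x}{3} \, dx = \frac{2}{3} \tag{3} \]
The second moment of Y is
\[ E[Y^2] = \int_0^1 \int_0^1 4xy^3 \, dy \, dx = \int_0^1 x \, dx = \frac{1}{2} \tag{4} \]
The variance of Y is $\mathrm{Var}[Y] = E[Y^2] - (E[Y])^2 = 1/2 - (2/3)^2 = 1/18$.

(c) To find the covariance, we first find the correlation
\[ E[XY] = \int_0^1 \int_0^1 4x^2y^2 \, dy \, dx = \int_0^1 \frac{4x^2}{3} \, dx = \frac{4}{9} \tag{5} \]
The covariance is thus
\[ \mathrm{Cov}[X,Y] = E[XY] - E[X]E[Y] = \frac{4}{9} - \left( \frac{2}{3} \right)^2 = 0 \tag{6} \]

(d) $E[X + Y] = E[X] + E[Y] = 2/3 + 2/3 = 4/3$.

(e) By Theorem 4.15, the variance of $X + Y$ is
\[ \mathrm{Var}[X] + \mathrm{Var}[Y] + 2\,\mathrm{Cov}[X,Y] = 1/18 + 1/18 + 0 = 1/9 \tag{7} \]

Problem 4.7.10 Solution

The joint PDF of X and Y and the region of nonzero probability are
\[ f_{X,Y}(x,y) = \begin{cases} 5x^2/2 & -1 \le x \le 1,\ 0 \le y \le x^2 \\ 0 & \text{otherwise} \end{cases} \tag{1} \]
[Figure: the region $-1 \le x \le 1$, $0 \le y \le x^2$ under the parabola.]

(a) The first moment of X is
\[ E[X] = \int_{-1}^{1} \int_0^{x^2} x \frac{5x^2}{2} \, dy \, dx = \int_{-1}^{1} \frac{5x^5}{2} \, dx = \left. \frac{5x^6}{12} \right|_{-1}^{1} = 0 \tag{2} \]
Since $E[X] = 0$, the variance of X and the second moment are both
\[ \mathrm{Var}[X] = E[X^2] = \int_{-1}^{1} \int_0^{x^2} x^2 \frac{5x^2}{2} \, dy \, dx = \left. \frac{5x^7}{14} \right|_{-1}^{1} = \frac{10}{14} \tag{3} \]

(b) The first and second moments of Y are
\[ E[Y] = \int_{-1}^{1} \int_0^{x^2} y \frac{5x^2}{2} \, dy \, dx = \frac{5}{14} \tag{4} \]
\[ E[Y^2] = \int_{-1}^{1} \int_0^{x^2} y^2 \frac{5x^2}{2} \, dy \, dx = \frac{5}{27} \tag{5} \]
Therefore, $\mathrm{Var}[Y] = 5/27 - (5/14)^2 = 0.0576$.

(c) Since $E[X] = 0$, $\mathrm{Cov}[X,Y] = E[XY] - E[X]E[Y] = E[XY]$.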
Thus,
\[ \mathrm{Cov}[X,Y] = E[XY] = \int_{-1}^{1} \int_0^{x^2} xy \frac{5x^2}{2} \, dy \, dx = \int_{-1}^{1} \frac{5x^7}{4} \, dx = 0 \tag{6} \]

(d) The expected value of the sum $X + Y$ is
\[ E[X + Y] = E[X] + E[Y] = \frac{5}{14} \tag{7} \]

(e) By Theorem 4.15, the variance of $X + Y$ is
\[ \mathrm{Var}[X + Y] = \mathrm{Var}[X] + \mathrm{Var}[Y] + 2\,\mathrm{Cov}[X,Y] = 5/7 + 0.0576 = 0.7719 \tag{8} \]

Problem 4.7.11 Solution

Random variables X and Y have joint PDF
\[ f_{X,Y}(x,y) = \begin{cases} 2 & 0 \le y \le x \le 1 \\ 0 & \text{otherwise} \end{cases} \tag{1} \]
[Figure: the triangular region $0 \le y \le x \le 1$.]

Before finding moments, it is helpful to first find the marginal PDFs. For $0 \le x \le 1$,
\[ f_X(x) = \int_{-\infty}^{\infty} f_{X,Y}(x,y) \, dy = \int_0^x 2 \, dy = 2x \tag{2} \]
Note that $f_X(x) = 0$ for $x < 0$ or $x > 1$. For $0 \le y \le 1$,
\[ f_Y(y) = \int_{-\infty}^{\infty} f_{X,Y}(x,y) \, dx = \int_y^1 2 \, dx = 2(1 - y) \tag{3} \]
Also, for $y < 0$ or $y > 1$, $f_Y(y) = 0$. Complete expressions for the marginal PDFs are
\[ f_X(x) = \begin{cases} 2x & 0 \le x \le 1 \\ 0 & \text{otherwise} \end{cases} \qquad f_Y(y) = \begin{cases} 2(1 - y) & 0 \le y \le 1 \\ 0 & \text{otherwise} \end{cases} \tag{4} \]

(a) The first two moments of X are
\[ E[X] = \int_{-\infty}^{\infty} x f_X(x) \, dx = \int_0^1 2x^2 \, dx = 2/3 \tag{5} \]
\[ E[X^2] = \int_{-\infty}^{\infty} x^2 f_X(x) \, dx = \int_0^1 2x^3 \, dx = 1/2 \tag{6} \]
The variance of X is $\mathrm{Var}[X] = E[X^2] - (E[X])^2 = 1/2 - 4/9 = 1/18$.

(b) The expected value and second moment of Y are
\[ E[Y] = \int_{-\infty}^{\infty} y f_Y(y) \, dy = \int_0^1 2y(1 - y) \, dy = \left. y^2 - \frac{2y^3}{3} \right|_0^1 = 1/3 \tag{7} \]
\[ E[Y^2] = \int_{-\infty}^{\infty} y^2 f_Y(y) \, dy = \int_0^1 2y^2(1 - y) \, dy = \left. \frac{2y^3}{3} - \frac{y^4}{2} \right|_0^1 = 1/6 \tag{8} \]
The variance of Y is $\mathrm{Var}[Y] = E[Y^2] - (E[Y])^2 = 1/6 - 1/9 = 1/18$.

(c) Before finding the covariance, we find the correlation
\[ E[XY] = \int_0^1 \int_0^x 2xy \, dy \, dx = \int_0^1 x^3 \, dx = 1/4 \tag{9} \]
The covariance is
\[ \mathrm{Cov}[X,Y] = E[XY] - E[X]E[Y] = 1/36. \tag{10} \]

(d) $E[X + Y] = E[X] + E[Y] = 2/3 + 1/3 = 1$.

(e) By Theorem 4.15,
\[ \mathrm{Var}[X + Y] = \mathrm{Var}[X] + \mathrm{Var}[Y] + 2\,\mathrm{Cov}[X,Y] = 1/6. \tag{11} \]

Problem 4.7.12 Solution

Random variables X and Y have joint PDF
\[ f_{X,Y}(x,y) = \begin{cases} 1/2 & -1 \le x \le y \le 1 \\ 0 & \text{otherwise} \end{cases} \tag{1} \]
[Figure: the triangular region $-1 \le x \le y \le 1$.]

The region of possible pairs $(x, y)$ is shown with the joint PDF. The rest of this problem is just calculus.
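That calculus can be cross-checked numerically. The following Python sketch is an illustration only, not part of the original solution: it estimates $E[XY]$ and $E[e^{X+Y}]$ for the PDF $f_{X,Y}(x,y) = 1/2$ on $-1 \le x \le y \le 1$ with a midpoint rule, and compares the second estimate with the closed form $e^2/4 + e^{-2}/4 - 1/2$. The function name `expect` and the grid sizes `n` and `m` are assumed, illustrative choices.

```python
import math

def expect(g, n=400, m=400):
    """Midpoint-rule estimate of E[g(X, Y)] for f(x, y) = 1/2 on -1 <= x <= y <= 1."""
    total = 0.0
    hx = 2.0 / n                      # width of each strip in x over [-1, 1]
    for i in range(n):
        x = -1.0 + (i + 0.5) * hx     # midpoint of the i-th strip
        hy = (1.0 - x) / m            # y runs from x to 1 inside this strip
        inner = 0.0
        for j in range(m):
            y = x + (j + 0.5) * hy
            inner += g(x, y)
        total += 0.5 * inner * hy * hx   # 0.5 is the value of the PDF
    return total

e_xy = expect(lambda x, y: x * y)
e_exp = expect(lambda x, y: math.exp(x + y))
closed_form = math.exp(2) / 4 + math.exp(-2) / 4 - 0.5

print("E[XY] estimate:", e_xy)
print("E[exp(X+Y)] estimate:", e_exp, "closed form:", closed_form)
```

Letting the inner step `hy` depend on `x` keeps every sample point inside the triangle, which avoids the boundary error that a fixed square grid with an indicator function would introduce.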
\[ E[XY] = \int_{-1}^{1}\!\int_x^1 \frac{xy}{2}\,dy\,dx = \frac{1}{4}\int_{-1}^{1} x(1-x^2)\,dx = \left[\frac{x^2}{8} - \frac{x^4}{16}\right]_{-1}^{1} = 0 \tag{2} \]
\begin{align}
E\left[e^{X+Y}\right] &= \int_{-1}^{1}\!\int_x^1 \frac{1}{2} e^x e^y\,dy\,dx \tag{3}\\
&= \frac{1}{2}\int_{-1}^{1} e^x\left(e^1 - e^x\right) dx \tag{4}\\
&= \left[\frac{1}{2}e^{1+x} - \frac{1}{4}e^{2x}\right]_{-1}^{1} = \frac{e^2}{4} + \frac{e^{-2}}{4} - \frac{1}{2} \tag{5}
\end{align}

Problem 4.7.13 Solution

For this problem, calculating the marginal PMF of $K$ is not easy. However, the marginal PMF of $N$ is easy to find. For $n = 1, 2, \ldots$,
\[ P_N(n) = \sum_{k=1}^{n} \frac{(1-p)^{n-1}p}{n} = (1-p)^{n-1}p \tag{1} \]
That is, $N$ has a geometric PMF. From Appendix A, we note that
\[ E[N] = \frac{1}{p} \qquad \operatorname{Var}[N] = \frac{1-p}{p^2} \tag{2} \]
We can use these facts to find the second moment of $N$.
\[ E[N^2] = \operatorname{Var}[N] + (E[N])^2 = \frac{2-p}{p^2} \tag{3} \]
Now we can calculate the moments of $K$.
\[ E[K] = \sum_{n=1}^{\infty}\sum_{k=1}^{n} k\,\frac{(1-p)^{n-1}p}{n} = \sum_{n=1}^{\infty} \frac{(1-p)^{n-1}p}{n}\sum_{k=1}^{n} k \tag{4} \]
Since $\sum_{k=1}^{n} k = n(n+1)/2$,
\[ E[K] = \sum_{n=1}^{\infty} \frac{n+1}{2}(1-p)^{n-1}p = E\left[\frac{N+1}{2}\right] = \frac{1}{2p} + \frac{1}{2} \tag{5} \]
We can now calculate the expected value of the sum.
\[ E[N+K] = E[N] + E[K] = \frac{3}{2p} + \frac{1}{2} \tag{6} \]
The second moment of $K$ is
\[ E[K^2] = \sum_{n=1}^{\infty}\sum_{k=1}^{n} k^2\,\frac{(1-p)^{n-1}p}{n} = \sum_{n=1}^{\infty} \frac{(1-p)^{n-1}p}{n}\sum_{k=1}^{n} k^2 \tag{7} \]
Using the identity $\sum_{k=1}^{n} k^2 = n(n+1)(2n+1)/6$, we obtain
\[ E[K^2] = \sum_{n=1}^{\infty} \frac{(n+1)(2n+1)}{6}(1-p)^{n-1}p = E\left[\frac{(N+1)(2N+1)}{6}\right] \tag{8} \]
Applying the values of $E[N]$ and $E[N^2]$ found above, we find that
\[ E[K^2] = \frac{E[N^2]}{3} + \frac{E[N]}{2} + \frac{1}{6} = \frac{2}{3p^2} + \frac{1}{6p} + \frac{1}{6} \tag{9} \]
Thus, we can calculate the variance of $K$.
\[ \operatorname{Var}[K] = E[K^2] - (E[K])^2 = \frac{5}{12p^2} - \frac{1}{3p} - \frac{1}{12} \tag{10} \]
As a check, $p = 1$ forces $N = K = 1$ and the expression gives $\operatorname{Var}[K] = 0$.

To find the correlation of $N$ and $K$,
\[ E[NK] = \sum_{n=1}^{\infty}\sum_{k=1}^{n} nk\,\frac{(1-p)^{n-1}p}{n} = \sum_{n=1}^{\infty} (1-p)^{n-1}p\sum_{k=1}^{n} k \tag{11} \]
Since $\sum_{k=1}^{n} k = n(n+1)/2$,
\[ E[NK] = \sum_{n=1}^{\infty} \frac{n(n+1)}{2}(1-p)^{n-1}p = E\left[\frac{N(N+1)}{2}\right] = \frac{1}{p^2} \tag{12} \]
Finally, the covariance is
\[ \operatorname{Cov}[N,K] = E[NK] - E[N]E[K] = \frac{1}{2p^2} - \frac{1}{2p} \tag{13} \]

Problem 4.8.1 Solution

The event $A$ occurs iff $X > 5$ and $Y > 5$ and has probability
\[ P[A] = P[X > 5, Y > 5] = \sum_{x=6}^{10}\sum_{y=6}^{10} 0.01 = 0.25 \tag{1} \]
From Theorem 4.19,
\begin{align}
P_{X,Y|A}(x,y) &= \begin{cases} \dfrac{P_{X,Y}(x,y)}{P[A]} & (x,y) \in A \\ 0 & \text{otherwise} \end{cases} \tag{2}\\
&= \begin{cases} 0.04 & x = 6, \ldots, 10;\ y = 6, \ldots, 10 \\ 0 & \text{otherwise} \end{cases} \tag{3}
\end{align}

Problem 4.8.2 Solution

The event $B$ occurs iff $X \le 5$ and $Y \le 5$ and has probability
\[ P[B] = P[X \le 5, Y \le 5] = \sum_{x=1}^{5}\sum_{y=1}^{5} 0.01 = 0.25 \tag{1} \]
From Theorem 4.19,
\begin{align}
P_{X,Y|B}(x,y) &= \begin{cases} \dfrac{P_{X,Y}(x,y)}{P[B]} & (x,y) \in B \\ 0 & \text{otherwise} \end{cases} \tag{2}\\
&= \begin{cases} 0.04 & x = 1, \ldots, 5;\ y = 1, \ldots, 5 \\ 0 & \text{otherwise} \end{cases} \tag{3}
\end{align}

Problem 4.8.3 Solution

Given the event $A = \{X + Y \le 1\}$, we wish to find $f_{X,Y|A}(x,y)$. First we find
\[ P[A] = \int_0^1\!\int_0^{1-x} 6e^{-(2x+3y)}\,dy\,dx = 1 - 3e^{-2} + 2e^{-3} \tag{1} \]
So then
\[ f_{X,Y|A}(x,y) = \begin{cases} \dfrac{6e^{-(2x+3y)}}{1 - 3e^{-2} + 2e^{-3}} & x + y \le 1,\ x \ge 0,\ y \ge 0 \\ 0 & \text{otherwise} \end{cases} \tag{2} \]

Problem 4.8.4 Solution

First we observe that for $n = 1, 2, \ldots$, the marginal PMF of $N$ satisfies
\[ P_N(n) = \sum_{k=1}^{n} P_{N,K}(n,k) = (1-p)^{n-1}p\sum_{k=1}^{n}\frac{1}{n} = (1-p)^{n-1}p \tag{1} \]
Thus, the event $B$ has probability
\[ P[B] = \sum_{n=10}^{\infty} P_N(n) = (1-p)^9 p\left[1 + (1-p) + (1-p)^2 + \cdots\right] = (1-p)^9 \tag{2} \]
From Theorem 4.19,
\begin{align}
P_{N,K|B}(n,k) &= \begin{cases} \dfrac{P_{N,K}(n,k)}{P[B]} & (n,k) \in B \\ 0 & \text{otherwise} \end{cases} \tag{3}\\
&= \begin{cases} (1-p)^{n-10}p/n & n = 10, 11, \ldots;\ k = 1, \ldots, n \\ 0 & \text{otherwise} \end{cases} \tag{4}
\end{align}
The conditional PMF $P_{N|B}(n)$ could be found directly from $P_N(n)$ using Theorem 2.17. However, we can also find it just by summing the conditional joint PMF.
\[ P_{N|B}(n) = \sum_{k=1}^{n} P_{N,K|B}(n,k) = \begin{cases} (1-p)^{n-10}p & n = 10, 11, \ldots \\ 0 & \text{otherwise} \end{cases} \tag{5} \]
From the conditional PMF $P_{N|B}(n)$, we can calculate directly the conditional moments of $N$ given $B$. Instead, however, we observe that given $B$, $N' = N - 9$ has a geometric PMF with mean $1/p$. That is, for $n = 1, 2, \ldots$,
\[ P_{N'|B}(n) = P[N = n + 9 \mid B] = P_{N|B}(n+9) = (1-p)^{n-1}p \tag{6} \]
Hence, given $B$, $N = N' + 9$ and we can calculate the conditional expectations
\begin{align}
E[N|B] &= E[N' + 9 \mid B] = E[N'|B] + 9 = 1/p + 9 \tag{7}\\
\operatorname{Var}[N|B] &= \operatorname{Var}[N' + 9 \mid B] = \operatorname{Var}[N'|B] = (1-p)/p^2 \tag{8}
\end{align}
Note that further along in the problem we will need $E[N^2|B]$, which we now calculate.
\begin{align}
E[N^2|B] &= \operatorname{Var}[N|B] + (E[N|B])^2 \tag{9}\\
&= \frac{2}{p^2} + \frac{17}{p} + 81 \tag{10}
\end{align}
For the conditional moments of $K$, we work directly with the conditional PMF $P_{N,K|B}(n,k)$.
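Working directly with the conditional PMF is also easy to mirror numerically. The sketch below sums $g(n,k)\,P_{N,K|B}(n,k)$ over a truncated range of $n$ in Problem 4.8.4; the helper `cond_expect`, the choice $p = 0.25$, and the truncation point are illustrative assumptions, chosen so that the neglected tail is negligible.

```python
def cond_expect(g, p, n_min=10, n_max=500):
    """Truncated E[g(N,K) | N >= n_min] for P_{N,K}(n,k) = (1-p)^(n-1) p / n."""
    total = 0.0
    for n in range(n_min, n_max + 1):
        w = (1 - p) ** (n - n_min) * p / n   # conditional joint PMF at (n, k)
        total += w * sum(g(n, k) for k in range(1, n + 1))
    return total

p = 0.25
print(cond_expect(lambda n, k: 1, p))       # total conditional probability, close to 1
print(cond_expect(lambda n, k: n, p))       # compare with 1/p + 9 = 13
print(cond_expect(lambda n, k: n * n, p))   # compare with 2/p^2 + 17/p + 81 = 181
print(cond_expect(lambda n, k: k, p))       # conditional mean of K
```

Passing other functions $g(n,k)$, such as $k^2$ or $nk$, gives numerical values to compare against the closed-form conditional moments of $K$.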
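\[ E[K|B] = \sum_{n=10}^{\infty}\sum_{k=1}^{n} k\,\frac{(1-p)^{n-10}p}{n} = \sum_{n=10}^{\infty} \frac{(1-p)^{n-10}p}{n}\sum_{k=1}^{n} k \tag{11} \]
Since $\sum_{k=1}^{n} k = n(n+1)/2$,
\[ E[K|B] = \sum_{n=10}^{\infty} \frac{n+1}{2}(1-p)^{n-10}p = \frac{1}{2}E[N+1 \mid B] = \frac{1}{2p} + 5 \tag{12} \]
We can now calculate the conditional expectation of the sum.
\[ E[N+K \mid B] = E[N|B] + E[K|B] = 1/p + 9 + 1/(2p) + 5 = \frac{3}{2p} + 14 \tag{13} \]
The conditional second moment of $K$ is
\[ E[K^2|B] = \sum_{n=10}^{\infty}\sum_{k=1}^{n} k^2\,\frac{(1-p)^{n-10}p}{n} = \sum_{n=10}^{\infty} \frac{(1-p)^{n-10}p}{n}\sum_{k=1}^{n} k^2 \tag{14} \]
Using the identity $\sum_{k=1}^{n} k^2 = n(n+1)(2n+1)/6$, we obtain
\[ E[K^2|B] = \sum_{n=10}^{\infty} \frac{(n+1)(2n+1)}{6}(1-p)^{n-10}p = \frac{1}{6}E[(N+1)(2N+1) \mid B] \tag{15} \]
Applying the values of $E[N|B]$ and $E[N^2|B]$ found above, we find that
\[ E[K^2|B] = \frac{E[N^2|B]}{3} + \frac{E[N|B]}{2} + \frac{1}{6} = \frac{2}{3p^2} + \frac{37}{6p} + 31\tfrac{2}{3} \tag{16} \]
Thus, we can calculate the conditional variance of $K$.
\[ \operatorname{Var}[K|B] = E[K^2|B] - (E[K|B])^2 = \frac{5}{12p^2} + \frac{7}{6p} + 6\tfrac{2}{3} \tag{17} \]
As a check, $p = 1$ makes $K$ uniform on $\{1, \ldots, 10\}$ given $B$, with variance $99/12$, which matches (17).

To find the conditional correlation of $N$ and $K$,
\[ E[NK|B] = \sum_{n=10}^{\infty}\sum_{k=1}^{n} nk\,\frac{(1-p)^{n-10}p}{n} = \sum_{n=10}^{\infty} (1-p)^{n-10}p\sum_{k=1}^{n} k \tag{18} \]
Since $\sum_{k=1}^{n} k = n(n+1)/2$,
\[ E[NK|B] = \sum_{n=10}^{\infty} \frac{n(n+1)}{2}(1-p)^{n-10}p = \frac{1}{2}E[N(N+1) \mid B] = \frac{1}{p^2} + \frac{9}{p} + 45 \tag{19} \]

Problem 4.8.5 Solution

The joint PDF of $X$ and $Y$ is
\[ f_{X,Y}(x,y) = \begin{cases} (x+y)/3 & 0 \le x \le 1,\ 0 \le y \le 2 \\ 0 & \text{otherwise} \end{cases} \tag{1} \]

(a) The probability that $Y \le 1$ is
\begin{align}
P[A] = P[Y \le 1] &= \iint_{y \le 1} f_{X,Y}(x,y)\,dx\,dy \tag{2}\\
&= \int_0^1\!\int_0^1 \frac{x+y}{3}\,dy\,dx \tag{3}\\
&= \int_0^1 \left[\frac{xy}{3} + \frac{y^2}{6}\right]_{y=0}^{y=1} dx \tag{4}\\
&= \int_0^1 \frac{2x+1}{6}\,dx = \left[\frac{x^2}{6} + \frac{x}{6}\right]_0^1 = \frac{1}{3} \tag{5}
\end{align}
[Figure: rectangle $0 \le x \le 1$, $0 \le y \le 2$ with the region $Y \le 1$ marked.]

(b) By Definition 4.10, the conditional joint PDF of $X$ and $Y$ given $A$ is
\[ f_{X,Y|A}(x,y) = \begin{cases} \dfrac{f_{X,Y}(x,y)}{P[A]} & (x,y) \in A \\ 0 & \text{otherwise} \end{cases} = \begin{cases} x + y & 0 \le x \le 1,\ 0 \le y \le 1 \\ 0 & \text{otherwise} \end{cases} \tag{6} \]
From $f_{X,Y|A}(x,y)$, we find the conditional marginal PDF $f_{X|A}(x)$.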
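For $0 \le x \le 1$,
\[ f_{X|A}(x) = \int_{-\infty}^{\infty} f_{X,Y|A}(x,y)\,dy = \int_0^1 (x+y)\,dy = \left[xy + \frac{y^2}{2}\right]_{y=0}^{y=1} = x + \frac{1}{2} \tag{7} \]
The complete expression is
\[ f_{X|A}(x) = \begin{cases} x + 1/2 & 0 \le x \le 1 \\ 0 & \text{otherwise} \end{cases} \tag{8} \]
For $0 \le y \le 1$, the conditional marginal PDF of $Y$ is
\[ f_{Y|A}(y) = \int_{-\infty}^{\infty} f_{X,Y|A}(x,y)\,dx = \int_0^1 (x+y)\,dx = \left[\frac{x^2}{2} + xy\right]_{x=0}^{x=1} = y + 1/2 \tag{9} \]
The complete expression is
\[ f_{Y|A}(y) = \begin{cases} y + 1/2 & 0 \le y \le 1 \\ 0 & \text{otherwise} \end{cases} \tag{10} \]

Problem 4.8.6 Solution

Random variables $X$ and $Y$ have joint PDF
\[ f_{X,Y}(x,y) = \begin{cases} (4x+2y)/3 & 0 \le x \le 1,\ 0 \le y \le 1 \\ 0 & \text{otherwise} \end{cases} \tag{1} \]

(a) The probability of event $A = \{Y \le 1/2\}$ is
\[ P[A] = \iint_{y \le 1/2} f_{X,Y}(x,y)\,dy\,dx = \int_0^1\!\int_0^{1/2} \frac{4x+2y}{3}\,dy\,dx. \tag{2} \]
With some calculus,
\[ P[A] = \int_0^1 \left[\frac{4xy + y^2}{3}\right]_{y=0}^{y=1/2} dx = \int_0^1 \frac{2x + 1/4}{3}\,dx = \left[\frac{x^2}{3} + \frac{x}{12}\right]_0^1 = \frac{5}{12}. \tag{3} \]

(b) The conditional joint PDF of $X$ and $Y$ given $A$ is
\begin{align}
f_{X,Y|A}(x,y) &= \begin{cases} \dfrac{f_{X,Y}(x,y)}{P[A]} & (x,y) \in A \\ 0 & \text{otherwise} \end{cases} \tag{4}\\
&= \begin{cases} 8(2x+y)/5 & 0 \le x \le 1,\ 0 \le y \le 1/2 \\ 0 & \text{otherwise} \end{cases} \tag{5}
\end{align}
For $0 \le x \le 1$, the PDF of $X$ given $A$ is
\begin{align}
f_{X|A}(x) &= \int_{-\infty}^{\infty} f_{X,Y|A}(x,y)\,dy = \frac{8}{5}\int_0^{1/2} (2x+y)\,dy \tag{6}\\
&= \frac{8}{5}\left[2xy + \frac{y^2}{2}\right]_{y=0}^{y=1/2} = \frac{8x+1}{5} \tag{7}
\end{align}
The complete expression is
\[ f_{X|A}(x) = \begin{cases} (8x+1)/5 & 0 \le x \le 1 \\ 0 & \text{otherwise} \end{cases} \tag{8} \]
For $0 \le y \le 1/2$, the conditional marginal PDF of $Y$ given $A$ is
\begin{align}
f_{Y|A}(y) &= \int_{-\infty}^{\infty} f_{X,Y|A}(x,y)\,dx = \frac{8}{5}\int_0^1 (2x+y)\,dx \tag{9}\\
&= \left[\frac{8x^2 + 8xy}{5}\right]_{x=0}^{x=1} = \frac{8y+8}{5} \tag{10}
\end{align}
The complete expression is
\[ f_{Y|A}(y) = \begin{cases} (8y+8)/5 & 0 \le y \le 1/2 \\ 0 & \text{otherwise} \end{cases} \tag{11} \]

Problem 4.8.7 Solution

(a) The event $A = \{Y \le 1/4\}$ has probability
\begin{align}
P[A] &= 2\int_0^{1/2}\!\int_0^{x^2} \frac{5x^2}{2}\,dy\,dx + 2\int_{1/2}^{1}\!\int_0^{1/4} \frac{5x^2}{2}\,dy\,dx \tag{1}\\
&= \int_0^{1/2} 5x^4\,dx + \int_{1/2}^{1} \frac{5x^2}{4}\,dx \tag{2}\\
&= \left[x^5\right]_0^{1/2} + \left[\frac{5x^3}{12}\right]_{1/2}^{1} \tag{3}\\
&= 19/48 \tag{4}
\end{align}
[Figure: region $0 \le y \le x^2$, $-1 \le x \le 1$, with the part $Y \le 1/4$ marked; the curve $y = x^2$ meets $y = 1/4$ at $x = \pm 1/2$.]

This implies
\begin{align}
f_{X,Y|A}(x,y) &= \begin{cases} f_{X,Y}(x,y)/P[A] & (x,y) \in A \\ 0 & \text{otherwise} \end{cases} \tag{5}\\
&= \begin{cases} 120x^2/19 & -1 \le x \le 1,\ 0 \le y \le x^2,\ y \le 1/4 \\ 0 & \text{otherwise} \end{cases} \tag{6}
\end{align}

(b)
\[ f_{Y|A}(y) = \int_{-\infty}^{\infty} f_{X,Y|A}(x,y)\,dx = 2\int_{\sqrt{y}}^{1} \frac{120x^2}{19}\,dx = \begin{cases} \frac{80}{19}\left(1 - y^{3/2}\right) & 0 \le y \le 1/4 \\ 0 & \text{otherwise} \end{cases} \tag{7} \]

(c) The conditional expectation of $Y$ given $A$ is
\[ E[Y|A] = \int_0^{1/4} y\,\frac{80}{19}\left(1 - y^{3/2}\right) dy = \frac{80}{19}\left[\frac{y^2}{2} - \frac{2y^{7/2}}{7}\right]_0^{1/4} = \frac{65}{532} \tag{8} \]

(d) To find $f_{X|A}(x)$, we can write $f_{X|A}(x) = \int_{-\infty}^{\infty} f_{X,Y|A}(x,y)\,dy$. However, when we substitute $f_{X,Y|A}(x,y)$, the limits will depend on the value of $x$. When $|x| \le 1/2$,
\[ f_{X|A}(x) = \int_0^{x^2} \frac{120x^2}{19}\,dy = \frac{120x^4}{19} \tag{9} \]
When $-1 \le x \le -1/2$ or $1/2 \le x \le 1$,
\[ f_{X|A}(x) = \int_0^{1/4} \frac{120x^2}{19}\,dy = \frac{30x^2}{19} \tag{10} \]
The complete expression for the conditional PDF of $X$ given $A$ is
\[ f_{X|A}(x) = \begin{cases} 30x^2/19 & -1 \le x \le -1/2 \\ 120x^4/19 & -1/2 \le x \le 1/2 \\ 30x^2/19 & 1/2 \le x \le 1 \\ 0 & \text{otherwise} \end{cases} \tag{11} \]

(e) The conditional mean of $X$ given $A$ is
\[ E[X|A] = \int_{-1}^{-1/2} \frac{30x^3}{19}\,dx + \int_{-1/2}^{1/2} \frac{120x^5}{19}\,dx + \int_{1/2}^{1} \frac{30x^3}{19}\,dx = 0 \tag{12} \]

Problem 4.9.1 Solution

The main part of this problem is just interpreting the problem statement. No calculations are necessary. Since a trip is equally likely to last 2, 3 or 4 days,
\[ P_D(d) = \begin{cases} 1/3 & d = 2, 3, 4 \\ 0 & \text{otherwise} \end{cases} \tag{1} \]
Given a trip lasts $d$ days, the weight change is equally likely to be any value between $-d$ and $d$ pounds. Thus,
\[ P_{W|D}(w|d) = \begin{cases} 1/(2d+1) & w = -d, -d+1, \ldots, d \\ 0 & \text{otherwise} \end{cases} \tag{2} \]
The joint PMF is simply
\begin{align}
P_{D,W}(d,w) &= P_{W|D}(w|d)\,P_D(d) \tag{3}\\
&= \begin{cases} 1/(6d+3) & d = 2, 3, 4;\ w = -d, \ldots, d \\ 0 & \text{otherwise} \end{cases} \tag{4}
\end{align}

Problem 4.9.2 Solution

We can make a table of the possible outcomes and the corresponding values of $W$ and $Y$:
\[ \begin{array}{c|c|cc} \text{outcome} & P[\cdot] & W & Y \\ \hline hh & p^2 & 0 & 2 \\ ht & p(1-p) & 1 & 1 \\ th & p(1-p) & -1 & 1 \\ tt & (1-p)^2 & 0 & 0 \end{array} \tag{1} \]
In the following table, we write the joint PMF $P_{W,Y}(w,y)$ along with the marginal PMFs $P_Y(y)$ and $P_W(w)$.
\[ \begin{array}{c|ccc|c} P_{W,Y}(w,y) & w = -1 & w = 0 & w = 1 & P_Y(y) \\ \hline y = 0 & 0 & (1-p)^2 & 0 & (1-p)^2 \\ y = 1 & p(1-p) & 0 & p(1-p) & 2p(1-p) \\ y = 2 & 0 & p^2 & 0 & p^2 \\ \hline P_W(w) & p(1-p) & 1 - 2p + 2p^2 & p(1-p) & \end{array} \tag{2} \]
Using the definition $P_{W|Y}(w|y) = P_{W,Y}(w,y)/P_Y(y)$, we can find the conditional PMFs of $W$ given $Y$.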
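\[ P_{W|Y}(w|0) = \begin{cases} 1 & w = 0 \\ 0 & \text{otherwise} \end{cases} \qquad P_{W|Y}(w|1) = \begin{cases} 1/2 & w = -1, 1 \\ 0 & \text{otherwise} \end{cases} \tag{3} \]
\[ P_{W|Y}(w|2) = \begin{cases} 1 & w = 0 \\ 0 & \text{otherwise} \end{cases} \tag{4} \]
Similarly, the conditional PMFs of $Y$ given $W$ are
\[ P_{Y|W}(y|{-1}) = \begin{cases} 1 & y = 1 \\ 0 & \text{otherwise} \end{cases} \qquad P_{Y|W}(y|0) = \begin{cases} \dfrac{(1-p)^2}{1 - 2p + 2p^2} & y = 0 \\[1ex] \dfrac{p^2}{1 - 2p + 2p^2} & y = 2 \\ 0 & \text{otherwise} \end{cases} \tag{5} \]
\[ P_{Y|W}(y|1) = \begin{cases} 1 & y = 1 \\ 0 & \text{otherwise} \end{cases} \tag{6} \]

Problem 4.9.3 Solution

\[ f_{X,Y}(x,y) = \begin{cases} x + y & 0 \le x, y \le 1 \\ 0 & \text{otherwise} \end{cases} \tag{1} \]

(a) The conditional PDF $f_{X|Y}(x|y)$ is defined for all $y$ such that $0 \le y \le 1$. It is the joint PDF divided by the marginal PDF of the conditioning variable $Y$. For $0 \le y \le 1$,
\[ f_{X|Y}(x|y) = \frac{f_{X,Y}(x,y)}{f_Y(y)} = \frac{x+y}{\int_0^1 (x+y)\,dx} = \begin{cases} \dfrac{x+y}{y + 1/2} & 0 \le x \le 1 \\ 0 & \text{otherwise} \end{cases} \tag{2} \]

(b) The conditional PDF $f_{Y|X}(y|x)$ is defined for all values of $x$ in the interval $[0, 1]$. For $0 \le x \le 1$,
\[ f_{Y|X}(y|x) = \frac{f_{X,Y}(x,y)}{f_X(x)} = \frac{x+y}{\int_0^1 (x+y)\,dy} = \begin{cases} \dfrac{x+y}{x + 1/2} & 0 \le y \le 1 \\ 0 & \text{otherwise} \end{cases} \tag{3} \]

Problem 4.9.4 Solution

Random variables $X$ and $Y$ have joint PDF
\[ f_{X,Y}(x,y) = \begin{cases} 2 & 0 \le y \le x \le 1 \\ 0 & \text{otherwise} \end{cases} \tag{1} \]
[Figure: triangular region $0 \le y \le x \le 1$.]

For $0 \le y \le 1$,
\[ f_Y(y) = \int_{-\infty}^{\infty} f_{X,Y}(x,y)\,dx = \int_y^1 2\,dx = 2(1-y) \tag{2} \]
Also, for $y < 0$ or $y > 1$, $f_Y(y) = 0$. The complete expression for the marginal PDF is
\[ f_Y(y) = \begin{cases} 2(1-y) & 0 \le y \le 1 \\ 0 & \text{otherwise} \end{cases} \tag{3} \]
By Theorem 4.24, the conditional PDF of $X$ given $Y$ is
\[ f_{X|Y}(x|y) = \frac{f_{X,Y}(x,y)}{f_Y(y)} = \begin{cases} \dfrac{1}{1-y} & y \le x \le 1 \\ 0 & \text{otherwise} \end{cases} \tag{4} \]
That is, since $Y \le X \le 1$, $X$ is uniform over $[y, 1]$ when $Y = y$. The conditional expectation of $X$ given $Y = y$ can be calculated as
\begin{align}
E[X|Y = y] &= \int_{-\infty}^{\infty} x f_{X|Y}(x|y)\,dx \tag{5}\\
&= \int_y^1 \frac{x}{1-y}\,dx = \left[\frac{x^2}{2(1-y)}\right]_y^1 = \frac{1+y}{2} \tag{6}
\end{align}
In fact, since we know that the conditional PDF of $X$ is uniform over $[y, 1]$ when $Y = y$, it wasn't really necessary to perform the calculation.

Problem 4.9.5 Solution

Random variables $X$ and $Y$ have joint PDF
\[ f_{X,Y}(x,y) = \begin{cases} 2 & 0 \le y \le x \le 1 \\ 0 & \text{otherwise} \end{cases} \tag{1} \]
[Figure: triangular region $0 \le y \le x \le 1$.]

For $0 \le x \le 1$, the marginal PDF for $X$ satisfies
\[ f_X(x) = \int_{-\infty}^{\infty} f_{X,Y}(x,y)\,dy = \int_0^x 2\,dy = 2x \tag{2} \]
Note that $f_X(x) = 0$ for $x < 0$ or $x > 1$.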
Hence the complete expression for the marginal PDF of $X$ is
\[ f_X(x) = \begin{cases} 2x & 0 \le x \le 1 \\ 0 & \text{otherwise} \end{cases} \tag{3} \]
The conditional PDF of $Y$ given $X = x$ is
\[ f_{Y|X}(y|x) = \frac{f_{X,Y}(x,y)}{f_X(x)} = \begin{cases} 1/x & 0 \le y \le x \\ 0 & \text{otherwise} \end{cases} \tag{4} \]
Given $X = x$, $Y$ has a uniform PDF over $[0, x]$ and thus has conditional expected value $E[Y|X = x] = x/2$. Another way to obtain this result is to calculate $\int_{-\infty}^{\infty} y f_{Y|X}(y|x)\,dy$.

Problem 4.9.6 Solution

We are told in the problem statement that if we know $r$, the number of feet a student sits from the blackboard, then we also know that the student's grade is a Gaussian random variable with mean $80 - r$ and standard deviation $r$. This is exactly
\[ f_{X|R}(x|r) = \frac{1}{\sqrt{2\pi r^2}}\, e^{-(x - [80 - r])^2/2r^2} \tag{1} \]

Problem 4.9.7 Solution

(a) First we observe that $A$ takes on the values $S_A = \{-1, 1\}$ while $B$ takes on values from $S_B = \{0, 1\}$. To construct a table describing $P_{A,B}(a,b)$ we build a table for all possible values of pairs $(A, B)$. The general form of the entries is
\[ \begin{array}{c|cc} P_{A,B}(a,b) & b = 0 & b = 1 \\ \hline a = -1 & P_{B|A}(0|{-1})\,P_A(-1) & P_{B|A}(1|{-1})\,P_A(-1) \\ a = 1 & P_{B|A}(0|1)\,P_A(1) & P_{B|A}(1|1)\,P_A(1) \end{array} \tag{1} \]
Now we fill in the entries using the conditional PMFs $P_{B|A}(b|a)$ and the marginal PMF $P_A(a)$. This yields
\[ \begin{array}{c|cc} P_{A,B}(a,b) & b = 0 & b = 1 \\ \hline a = -1 & (1/3)(1/3) & (2/3)(1/3) \\ a = 1 & (1/2)(2/3) & (1/2)(2/3) \end{array} \quad\text{which simplifies to}\quad \begin{array}{c|cc} P_{A,B}(a,b) & b = 0 & b = 1 \\ \hline a = -1 & 1/9 & 2/9 \\ a = 1 & 1/3 & 1/3 \end{array} \tag{2} \]

(b) Since $P_A(1) = P_{A,B}(1,0) + P_{A,B}(1,1) = 2/3$,
\[ P_{B|A}(b|1) = \frac{P_{A,B}(1,b)}{P_A(1)} = \begin{cases} 1/2 & b = 0, 1, \\ 0 & \text{otherwise.} \end{cases} \tag{3} \]
If $A = 1$, the conditional expectation of $B$ is
\[ E[B|A = 1] = \sum_{b=0}^{1} b\,P_{B|A}(b|1) = P_{B|A}(1|1) = 1/2. \tag{4} \]

(c) Before finding the conditional PMF $P_{A|B}(a|1)$, we first sum the columns of the joint PMF table to find
\[ P_B(b) = \begin{cases} 4/9 & b = 0 \\ 5/9 & b = 1 \end{cases} \tag{5} \]
The conditional PMF of $A$ given $B = 1$ is
\[ P_{A|B}(a|1) = \frac{P_{A,B}(a,1)}{P_B(1)} = \begin{cases} 2/5 & a = -1 \\ 3/5 & a = 1 \end{cases} \tag{6} \]

(d) Now that we have the conditional PMF $P_{A|B}(a|1)$, calculating conditional expectations is easy.
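The table manipulations in Problem 4.9.7 are mechanical enough to sketch in a few lines. The block below rebuilds the joint PMF from $P_A(a)$ and $P_{B|A}(b|a)$ as given in part (a), then derives the marginal, the conditional PMF, and the conditional and joint moments with exact fractions. The variable names are illustrative assumptions, not taken from the manual.

```python
from fractions import Fraction as F

# Model from part (a): marginal PMF of A and conditional PMF of B given A
P_A = {-1: F(1, 3), 1: F(2, 3)}
P_B_given_A = {-1: {0: F(1, 3), 1: F(2, 3)}, 1: {0: F(1, 2), 1: F(1, 2)}}

# Joint PMF, marginal of B (column sums), and conditional PMF of A given B = 1
joint = {(a, b): P_B_given_A[a][b] * P_A[a] for a in P_A for b in (0, 1)}
P_B = {b: sum(joint[a, b] for a in P_A) for b in (0, 1)}
P_A_given_B1 = {a: joint[a, 1] / P_B[1] for a in P_A}

# Conditional mean and variance of A given B = 1
E_A_B1 = sum(a * q for a, q in P_A_given_B1.items())
Var_A_B1 = sum(a * a * q for a, q in P_A_given_B1.items()) - E_A_B1**2

# Covariance of A and B
E_A = sum(a * q for a, q in P_A.items())
E_B = sum(b * q for b, q in P_B.items())
E_AB = sum(a * b * q for (a, b), q in joint.items())
cov = E_AB - E_A * E_B

print(P_B[0], P_B[1])                       # 4/9 5/9
print(P_A_given_B1[-1], P_A_given_B1[1])    # 2/5 3/5
print(E_A_B1, Var_A_B1, cov)                # 1/5 24/25 -2/27
```

Using `Fraction` keeps every intermediate value exact, so the printed results can be compared digit for digit with the hand calculation.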
\begin{align}
E[A|B = 1] &= \sum_{a = -1, 1} a\,P_{A|B}(a|1) = -1(2/5) + (3/5) = 1/5 \tag{7}\\
E[A^2|B = 1] &= \sum_{a = -1, 1} a^2\,P_{A|B}(a|1) = 2/5 + 3/5 = 1 \tag{8}
\end{align}
The conditional variance is then
\[ \operatorname{Var}[A|B = 1] = E[A^2|B = 1] - (E[A|B = 1])^2 = 1 - (1/5)^2 = 24/25 \tag{9} \]

(e) To calculate the covariance, we need
\begin{align}
E[A] &= \sum_{a = -1, 1} a\,P_A(a) = -1(1/3) + 1(2/3) = 1/3 \tag{10}\\
E[B] &= \sum_{b=0}^{1} b\,P_B(b) = 0(4/9) + 1(5/9) = 5/9 \tag{11}\\
E[AB] &= \sum_{a = -1, 1}\sum_{b=0}^{1} ab\,P_{A,B}(a,b) \tag{12}\\
&= -1(0)(1/9) + -1(1)(2/9) + 1(0)(1/3) + 1(1)(1/3) = 1/9 \tag{13}
\end{align}
The covariance is just
\[ \operatorname{Cov}[A,B] = E[AB] - E[A]E[B] = 1/9 - (1/3)(5/9) = -2/27 \tag{14} \]

Problem 4.9.8 Solution

First we need to find the conditional expectations
\begin{align}
E[B|A = -1] &= \sum_{b=0}^{1} b\,P_{B|A}(b|{-1}) = 0(1/3) + 1(2/3) = 2/3 \tag{1}\\
E[B|A = 1] &= \sum_{b=0}^{1} b\,P_{B|A}(b|1) = 0(1/2) + 1(1/2) = 1/2 \tag{2}
\end{align}
Keep in mind that $E[B|A]$ is a random variable that is a function of $A$. That is, we can write
\[ E[B|A] = g(A) = \begin{cases} 2/3 & A = -1 \\ 1/2 & A = 1 \end{cases} \tag{3} \]
Writing $U = E[B|A]$, we see that the range of $U$ is $S_U = \{1/2, 2/3\}$. In particular,
\[ P_U(1/2) = P_A(1) = 2/3 \qquad P_U(2/3) = P_A(-1) = 1/3 \tag{4} \]
The complete PMF of $U$ is
\[ P_U(u) = \begin{cases} 2/3 & u = 1/2 \\ 1/3 & u = 2/3 \end{cases} \tag{5} \]
Note that
\[ E[E[B|A]] = E[U] = \sum_u u\,P_U(u) = (1/2)(2/3) + (2/3)(1/3) = 5/9 \tag{6} \]
You can check that $E[U] = E[B]$.

Problem 4.9.9 Solution

Random variables $N$ and $K$ have the joint PMF
\[ P_{N,K}(n,k) = \begin{cases} \dfrac{100^n e^{-100}}{(n+1)!} & k = 0, 1, \ldots, n;\ n = 0, 1, \ldots \\ 0 & \text{otherwise} \end{cases} \tag{1} \]
We can find the marginal PMF for $N$ by summing over all possible $K$. For $n \ge 0$,
\[ P_N(n) = \sum_{k=0}^{n} \frac{100^n e^{-100}}{(n+1)!} = \frac{100^n e^{-100}}{n!} \tag{2} \]
We see that $N$ has a Poisson PMF with expected value 100. For $n \ge 0$, the conditional PMF of $K$ given $N = n$ is
\[ P_{K|N}(k|n) = \frac{P_{N,K}(n,k)}{P_N(n)} = \begin{cases} 1/(n+1) & k = 0, 1, \ldots, n \\ 0 & \text{otherwise} \end{cases} \tag{3} \]
That is, given $N = n$, $K$ has a discrete uniform PMF over $\{0, 1, \ldots, n\}$. Thus,
\[ E[K|N = n] = \sum_{k=0}^{n} k/(n+1) = n/2 \tag{4} \]
We can conclude that $E[K|N] = N/2$. Thus, by Theorem 4.25,
\[ E[K] = E[E[K|N]] = E[N/2] = 50. \tag{5} \]

Problem 4.9.10 Solution

This problem is fairly easy when we use conditional PMFs. In particular, given that $N = n$ pizzas were sold before noon, each of those pizzas has mushrooms with probability 1/3. The conditional PMF of $M$ given $N$ is the binomial distribution
\[ P_{M|N}(m|n) = \begin{cases} \binom{n}{m}(1/3)^m(2/3)^{n-m} & m = 0, 1, \ldots, n \\ 0 & \text{otherwise} \end{cases} \tag{1} \]
The other fact we know is that for each of the 100 pizzas sold, the pizza is sold before noon with probability 1/2. Hence, $N$ has the binomial PMF
\[ P_N(n) = \begin{cases} \binom{100}{n}(1/2)^n(1/2)^{100-n} & n = 0, 1, \ldots, 100 \\ 0 & \text{otherwise} \end{cases} \tag{2} \]
The joint PMF of $N$ and $M$ is, for integers $m, n$,
\begin{align}
P_{M,N}(m,n) &= P_{M|N}(m|n)\,P_N(n) \tag{3}\\
&= \begin{cases} \binom{n}{m}\binom{100}{n}(1/3)^m(2/3)^{n-m}(1/2)^{100} & 0 \le m \le n \le 100 \\ 0 & \text{otherwise} \end{cases} \tag{4}
\end{align}

Problem 4.9.11 Solution

Random variables $X$ and $Y$ have joint PDF
\[ f_{X,Y}(x,y) = \begin{cases} 1/2 & -1 \le x \le y \le 1 \\ 0 & \text{otherwise} \end{cases} \tag{1} \]
[Figure: triangular region $-1 \le x \le y \le 1$.]

(a) For $-1 \le y \le 1$, the marginal PDF of $Y$ is
\[ f_Y(y) = \int_{-\infty}^{\infty} f_{X,Y}(x,y)\,dx = \frac{1}{2}\int_{-1}^{y} dx = (y+1)/2 \tag{2} \]
The complete expression for the marginal PDF of $Y$ is
\[ f_Y(y) = \begin{cases} (y+1)/2 & -1 \le y \le 1 \\ 0 & \text{otherwise} \end{cases} \tag{3} \]

(b) The conditional PDF of $X$ given $Y$ is
\[ f_{X|Y}(x|y) = \frac{f_{X,Y}(x,y)}{f_Y(y)} = \begin{cases} \dfrac{1}{1+y} & -1 \le x \le y \\ 0 & \text{otherwise} \end{cases} \tag{4} \]

(c) Given $Y = y$, the conditional PDF of $X$ is uniform over $[-1, y]$. Hence the conditional expected value is $E[X|Y = y] = (y-1)/2$.

Problem 4.9.12 Solution

We are given that the joint PDF of $X$ and $Y$ is
\[ f_{X,Y}(x,y) = \begin{cases} 1/(\pi r^2) & 0 \le x^2 + y^2 \le r^2 \\ 0 & \text{otherwise} \end{cases} \tag{1} \]

(a) The marginal PDF of $X$ is
\[ f_X(x) = 2\int_0^{\sqrt{r^2 - x^2}} \frac{1}{\pi r^2}\,dy = \begin{cases} \dfrac{2\sqrt{r^2 - x^2}}{\pi r^2} & -r \le x \le r \\ 0 & \text{otherwise} \end{cases} \tag{2} \]
The conditional PDF of $Y$ given $X$ is
\[ f_{Y|X}(y|x) = \frac{f_{X,Y}(x,y)}{f_X(x)} = \begin{cases} 1/\left(2\sqrt{r^2 - x^2}\right) & y^2 \le r^2 - x^2 \\ 0 & \text{otherwise} \end{cases} \tag{3} \]

(b) Given $X = x$, we observe that over the interval $[-\sqrt{r^2 - x^2}, \sqrt{r^2 - x^2}]$, $Y$ has a uniform PDF. Since the conditional PDF $f_{Y|X}(y|x)$ is symmetric about $y = 0$,
\[ E[Y|X = x] = 0 \tag{4} \]

Problem 4.9.13 Solution

The key to solving this problem is to find the joint PMF of $M$ and $N$. Note that $N \ge M$.
For $n > m$, the joint event $\{M = m, N = n\}$ has probability
\begin{align}
P[M = m, N = n] &= P[\underbrace{dd \cdots d}_{m-1\ \text{calls}}\; v\; \underbrace{dd \cdots d}_{n-m-1\ \text{calls}}\; v] \tag{1}\\
&= (1-p)^{m-1}p\,(1-p)^{n-m-1}p \tag{2}\\
&= (1-p)^{n-2}p^2 \tag{3}
\end{align}
A complete expression for the joint PMF of $M$ and $N$ is
\[ P_{M,N}(m,n) = \begin{cases} (1-p)^{n-2}p^2 & m = 1, 2, \ldots, n-1;\ n = m+1, m+2, \ldots \\ 0 & \text{otherwise} \end{cases} \tag{4} \]
The marginal PMF of $N$ satisfies
\[ P_N(n) = \sum_{m=1}^{n-1} (1-p)^{n-2}p^2 = (n-1)(1-p)^{n-2}p^2, \qquad n = 2, 3, \ldots \tag{5} \]
Similarly, for $m = 1, 2, \ldots$, the marginal PMF of $M$ satisfies
\begin{align}
P_M(m) &= \sum_{n=m+1}^{\infty} (1-p)^{n-2}p^2 \tag{6}\\
&= p^2\left[(1-p)^{m-1} + (1-p)^m + \cdots\right] \tag{7}\\
&= (1-p)^{m-1}p \tag{8}
\end{align}
The complete expressions for the marginal PMFs are
\begin{align}
P_M(m) &= \begin{cases} (1-p)^{m-1}p & m = 1, 2, \ldots \\ 0 & \text{otherwise} \end{cases} \tag{9}\\
P_N(n) &= \begin{cases} (n-1)(1-p)^{n-2}p^2 & n = 2, 3, \ldots \\ 0 & \text{otherwise} \end{cases} \tag{10}
\end{align}
Not surprisingly, if we view each voice call as a successful Bernoulli trial, $M$ has a geometric PMF since it is the number of trials up to and including the first success. Also, $N$ has a Pascal PMF since it is the number of trials required to see 2 successes. The conditional PMFs are now easy to find.
\[ P_{N|M}(n|m) = \frac{P_{M,N}(m,n)}{P_M(m)} = \begin{cases} (1-p)^{n-m-1}p & n = m+1, m+2, \ldots \\ 0 & \text{otherwise} \end{cases} \tag{11} \]
The interpretation of the conditional PMF of $N$ given $M$ is that given $M = m$, $N = m + N'$ where $N'$ has a geometric PMF with mean $1/p$. The conditional PMF of $M$ given $N$ is
\[ P_{M|N}(m|n) = \frac{P_{M,N}(m,n)}{P_N(n)} = \begin{cases} 1/(n-1) & m = 1, \ldots, n-1 \\ 0 & \text{otherwise} \end{cases} \tag{12} \]
Given that call $N = n$ was the second voice call, the first voice call is equally likely to occur in any of the previous $n - 1$ calls.

Problem 4.9.14 Solution

(a) The number of buses, $N$, must be greater than zero. Also, the number of minutes that pass cannot be less than the number of buses. Thus, $P[N = n, T = t] > 0$ for integers $n, t$ satisfying $1 \le n \le t$.

(b) First, we find the joint PMF of $N$ and $T$ by carefully considering the possible sample paths.
In particular, $P_{N,T}(n,t) = P[ABC] = P[A]P[B]P[C]$ where the events $A$, $B$ and $C$ are
\begin{align}
A &= \{n - 1 \text{ buses arrive in the first } t - 1 \text{ minutes}\} \tag{1}\\
B &= \{\text{none of the first } n - 1 \text{ buses are boarded}\} \tag{2}\\
C &= \{\text{at time } t \text{ a bus arrives and is boarded}\} \tag{3}
\end{align}
These events are independent since each trial to board a bus is independent of when the buses arrive. These events have probabilities
\begin{align}
P[A] &= \binom{t-1}{n-1} p^{n-1}(1-p)^{t-1-(n-1)} \tag{4}\\
P[B] &= (1-q)^{n-1} \tag{5}\\
P[C] &= pq \tag{6}
\end{align}
Consequently, the joint PMF of $N$ and $T$ is
\[ P_{N,T}(n,t) = \begin{cases} \binom{t-1}{n-1} p^{n-1}(1-p)^{t-n}(1-q)^{n-1}pq & n \ge 1,\ t \ge n \\ 0 & \text{otherwise} \end{cases} \tag{7} \]

(c) It is possible to find the marginal PMFs by summing the joint PMF. However, it is much easier to obtain the marginal PMFs by consideration of the experiment. Specifically, when a bus arrives, it is boarded with probability $q$. Moreover, the experiment ends when a bus is boarded. By viewing whether each arriving bus is boarded as an independent trial, $N$ is the number of trials until the first success. Thus, $N$ has the geometric PMF
\[ P_N(n) = \begin{cases} (1-q)^{n-1}q & n = 1, 2, \ldots \\ 0 & \text{otherwise} \end{cases} \tag{8} \]
To find the PMF of $T$, suppose we regard each minute as an independent trial in which a success occurs if a bus arrives and that bus is boarded. In this case, the success probability is $pq$ and $T$ is the number of minutes up to and including the first success. The PMF of $T$ is also geometric.
\[ P_T(t) = \begin{cases} (1-pq)^{t-1}pq & t = 1, 2, \ldots \\ 0 & \text{otherwise} \end{cases} \tag{9} \]

(d) Once we have the marginal PMFs, the conditional PMFs are easy to find.
\[ P_{N|T}(n|t) = \frac{P_{N,T}(n,t)}{P_T(t)} = \begin{cases} \binom{t-1}{n-1}\left(\dfrac{p(1-q)}{1-pq}\right)^{n-1}\left(\dfrac{1-p}{1-pq}\right)^{t-1-(n-1)} & n = 1, 2, \ldots, t \\ 0 & \text{otherwise} \end{cases} \tag{10} \]
That is, given you depart at time $T = t$, the number of buses that arrive during minutes $1, \ldots, t-1$ has a binomial PMF: in each of those minutes, given that no bus was boarded, a bus arrives with probability $p(1-q)/(1-pq)$. Similarly, the conditional PMF of $T$ given $N$ is
\[ P_{T|N}(t|n) = \frac{P_{N,T}(n,t)}{P_N(n)} = \begin{cases} \binom{t-1}{n-1} p^n(1-p)^{t-n} & t = n, n+1, \ldots \\ 0 & \text{otherwise} \end{cases} \tag{11} \]
This result can be explained.
Given that you board bus $N = n$, the time $T$ when you leave is the time for $n$ buses to arrive. If we view each bus arrival as a success of an independent trial, the time for $n$ buses to arrive has the above Pascal PMF.

Problem 4.9.15 Solution

If you construct a tree describing what type of call (if any) arrived in any 1 millisecond period, it will be apparent that a fax call arrives with probability $\alpha = pqr$ or no fax arrives with probability $1 - \alpha$. That is, whether a fax message arrives each millisecond is a Bernoulli trial with success probability $\alpha$. Thus, the time required for the first success has the geometric PMF
\[ P_T(t) = \begin{cases} (1-\alpha)^{t-1}\alpha & t = 1, 2, \ldots \\ 0 & \text{otherwise} \end{cases} \tag{1} \]
Note that $N$ is the number of trials required to observe 100 successes. Moreover, the number of trials needed to observe 100 successes is $N = T + N'$ where $N'$ is the number of trials needed to observe successes 2 through 100. Since $N'$ is just the number of trials needed to observe 99 successes, it has the Pascal $(k = 99, \alpha)$ PMF
\[ P_{N'}(n) = \binom{n-1}{98}\alpha^{99}(1-\alpha)^{n-99}. \tag{2} \]
Since the trials needed to generate successes 2 through 100 are independent of the trials that yield the first success, $N'$ and $T$ are independent. Hence
\[ P_{N|T}(n|t) = P_{N'|T}(n - t|t) = P_{N'}(n - t). \tag{3} \]
Applying the PMF of $N'$ found above, we have
\[ P_{N|T}(n|t) = \binom{n-t-1}{98}\alpha^{99}(1-\alpha)^{n-t-99}. \tag{4} \]
Finally the joint PMF of $N$ and $T$ is
\begin{align}
P_{N,T}(n,t) &= P_{N|T}(n|t)\,P_T(t) \tag{5}\\
&= \begin{cases} \binom{n-t-1}{98}\alpha^{100}(1-\alpha)^{n-100} & t = 1, 2, \ldots;\ n = 99 + t, 100 + t, \ldots \\ 0 & \text{otherwise} \end{cases} \tag{6}
\end{align}
This solution can also be found by considering the sample sequence of Bernoulli trials in which we either observe or do not observe a fax message.

To find the conditional PMF $P_{T|N}(t|n)$, we first must recognize that $N$ is simply the number of trials needed to observe 100 successes and thus has the Pascal PMF
\[ P_N(n) = \binom{n-1}{99}\alpha^{100}(1-\alpha)^{n-100} \tag{7} \]
Hence for any integer $n \ge 100$, the conditional PMF is
\[ P_{T|N}(t|n) = \frac{P_{N,T}(n,t)}{P_N(n)} = \begin{cases} \dbinom{n-t-1}{98}\bigg/\dbinom{n-1}{99} & t = 1, 2, \ldots, n - 99 \\ 0 & \text{otherwise.} \end{cases} \tag{8} \]

Problem 4.10.1 Solution

Flip a fair coin 100 times and let $X$ be the number of heads in the first 75 flips and $Y$ be the number of heads in the last 25 flips. We know that $X$ and $Y$ are independent and can find their PMFs easily.
\[ P_X(x) = \binom{75}{x}(1/2)^{75} \qquad P_Y(y) = \binom{25}{y}(1/2)^{25} \tag{1} \]
The joint PMF of $X$ and $Y$ can be expressed as the product of the marginal PMFs because we know that $X$ and $Y$ are independent.
\[ P_{X,Y}(x,y) = \binom{75}{x}\binom{25}{y}(1/2)^{100} \tag{2} \]

Problem 4.10.2 Solution

Using the probability model
\[ P_X(k) = P_Y(k) = \begin{cases} 3/4 & k = 0 \\ 1/4 & k = 20 \\ 0 & \text{otherwise} \end{cases} \tag{1} \]
we can calculate the requested moments.
\begin{align}
E[X] &= 3/4 \cdot 0 + 1/4 \cdot 20 = 5 \tag{2}\\
\operatorname{Var}[X] &= 3/4 \cdot (0 - 5)^2 + 1/4 \cdot (20 - 5)^2 = 75 \tag{3}\\
E[X+Y] &= E[X] + E[Y] = 2E[X] = 10 \tag{4}
\end{align}
Since $X$ and $Y$ are independent, Theorem 4.27 yields
\[ \operatorname{Var}[X+Y] = \operatorname{Var}[X] + \operatorname{Var}[Y] = 2\operatorname{Var}[X] = 150 \tag{5} \]
Since $X$ and $Y$ are independent, $P_{X,Y}(x,y) = P_X(x)P_Y(y)$. Only the term $x = y = 20$ is nonzero, so
\begin{align}
E\left[XY\,2^{XY}\right] &= \sum_{x = 0, 20}\ \sum_{y = 0, 20} xy\,2^{xy}\,P_{X,Y}(x,y) = (20)(20)\,2^{20(20)}\,P_X(20)\,P_Y(20) \tag{6}\\
&= 25 \cdot 2^{400} \approx 6.46 \times 10^{121} \tag{7}
\end{align}

Problem 4.10.3 Solution

(a) Normally, checking independence requires the marginal PMFs. However, in this problem, the zeroes in the table of the joint PMF $P_{X,Y}(x,y)$ allow us to verify very quickly that $X$ and $Y$ are dependent. In particular, $P_X(-1) = 1/4$ and $P_Y(1) = 14/48$ but
\[ P_{X,Y}(-1, 1) = 0 \ne P_X(-1)\,P_Y(1) \tag{1} \]

(b) To fill in the tree diagram, we need the marginal PMF $P_X(x)$ and the conditional PMFs $P_{Y|X}(y|x)$. By summing the rows of the table for the joint PMF, we obtain
\[ \begin{array}{c|ccc|c} P_{X,Y}(x,y) & y = -1 & y = 0 & y = 1 & P_X(x) \\ \hline x = -1 & 3/16 & 1/16 & 0 & 1/4 \\ x = 0 & 1/6 & 1/6 & 1/6 & 1/2 \\ x = 1 & 0 & 1/8 & 1/8 & 1/4 \end{array} \tag{2} \]
Now we use the conditional PMF $P_{Y|X}(y|x) = P_{X,Y}(x,y)/P_X(x)$ to write
\[ P_{Y|X}(y|{-1}) = \begin{cases} 3/4 & y = -1 \\ 1/4 & y = 0 \\ 0 & \text{otherwise} \end{cases} \qquad P_{Y|X}(y|0) = \begin{cases} 1/3 & y = -1, 0, 1 \\ 0 & \text{otherwise} \end{cases} \tag{3} \]
\[ P_{Y|X}(y|1) = \begin{cases} 1/2 & y = 0, 1 \\ 0 & \text{otherwise} \end{cases} \tag{4} \]
Now we can use these probabilities to label the tree.
The generic solution and the specific solution with the exact values are listed below, with each branch probability in brackets.

Generic tree:
$X = -1$ $[P_X(-1)]$: $Y = -1$ $[P_{Y|X}(-1|{-1})]$; $Y = 0$ $[P_{Y|X}(0|{-1})]$
$X = 0$ $[P_X(0)]$: $Y = -1$ $[P_{Y|X}(-1|0)]$; $Y = 0$ $[P_{Y|X}(0|0)]$; $Y = 1$ $[P_{Y|X}(1|0)]$
$X = 1$ $[P_X(1)]$: $Y = 0$ $[P_{Y|X}(0|1)]$; $Y = 1$ $[P_{Y|X}(1|1)]$

Tree with exact values:
$X = -1$ $[1/4]$: $Y = -1$ $[3/4]$; $Y = 0$ $[1/4]$
$X = 0$ $[1/2]$: $Y = -1$ $[1/3]$; $Y = 0$ $[1/3]$; $Y = 1$ $[1/3]$
$X = 1$ $[1/4]$: $Y = 0$ $[1/2]$; $Y = 1$ $[1/2]$

Problem 4.10.4 Solution

In the solution to Problem 4.9.10, we found that the conditional PMF of $M$ given $N$ is
\[ P_{M|N}(m|n) = \begin{cases} \binom{n}{m}(1/3)^m(2/3)^{n-m} & m = 0, 1, \ldots, n \\ 0 & \text{otherwise} \end{cases} \tag{1} \]
Since $P_{M|N}(m|n)$ depends on the event $N = n$, we see that $M$ and $N$ are dependent.

Problem 4.10.5 Solution

We can solve this problem for the general case when the probability of heads is $p$. For the fair coin, $p = 1/2$. Viewing each flip as a Bernoulli trial in which heads is a success, the number of flips until heads is the number of trials needed for the first success, which has the geometric PMF
\[ P_{X_1}(x) = \begin{cases} (1-p)^{x-1}p & x = 1, 2, \ldots \\ 0 & \text{otherwise} \end{cases} \tag{1} \]
Similarly, no matter how large $X_1$ may be, the number of additional flips for the second heads is the same experiment as the number of flips needed for the first occurrence of heads. That is, $P_{X_2}(x) = P_{X_1}(x)$. Moreover, the flips needed to generate the second occurrence of heads are independent of the flips that yield the first heads. Hence, it should be apparent that $X_1$ and $X_2$ are independent and
\[ P_{X_1,X_2}(x_1,x_2) = P_{X_1}(x_1)\,P_{X_2}(x_2) = \begin{cases} (1-p)^{x_1+x_2-2}p^2 & x_1 = 1, 2, \ldots;\ x_2 = 1, 2, \ldots \\ 0 & \text{otherwise} \end{cases} \tag{2} \]
However, if this independence is not obvious, it can be derived by examination of the sample path. When $x_1 \ge 1$ and $x_2 \ge 1$, the event $\{X_1 = x_1, X_2 = x_2\}$ occurs iff we observe the sample sequence
\[ \underbrace{tt \cdots t}_{x_1 - 1\ \text{times}}\; h\; \underbrace{tt \cdots t}_{x_2 - 1\ \text{times}}\; h \tag{3} \]
The above sample sequence has probability $(1-p)^{x_1-1}p\,(1-p)^{x_2-1}p$, which in fact equals $P_{X_1,X_2}(x_1,x_2)$ given earlier.

Problem 4.10.6 Solution

We will solve this problem when the probability of heads is $p$. For the fair coin, $p = 1/2$. The number $X_1$ of flips until the first heads and the number $X_2$ of additional flips for the second heads both have the geometric PMF
\[ P_{X_1}(x) = P_{X_2}(x) = \begin{cases} (1-p)^{x-1}p & x = 1, 2, \ldots \\ 0 & \text{otherwise} \end{cases} \tag{1} \]
Thus, $E[X_i] = 1/p$ and $\operatorname{Var}[X_i] = (1-p)/p^2$. By Theorem 4.14,
\[ E[Y] = E[X_1] - E[X_2] = 0 \tag{2} \]
Since $X_1$ and $X_2$ are independent, Theorem 4.27 says
\[ \operatorname{Var}[Y] = \operatorname{Var}[X_1] + \operatorname{Var}[-X_2] = \operatorname{Var}[X_1] + \operatorname{Var}[X_2] = \frac{2(1-p)}{p^2} \tag{3} \]

Problem 4.10.7 Solution

$X$ and $Y$ are independent random variables with PDFs
\[ f_X(x) = \begin{cases} \frac{1}{3}e^{-x/3} & x \ge 0 \\ 0 & \text{otherwise} \end{cases} \qquad f_Y(y) = \begin{cases} \frac{1}{2}e^{-y/2} & y \ge 0 \\ 0 & \text{otherwise} \end{cases} \tag{1} \]

(a) To calculate $P[X > Y]$, we use the joint PDF $f_{X,Y}(x,y) = f_X(x)f_Y(y)$.
\begin{align}
P[X > Y] &= \iint_{x > y} f_X(x)\,f_Y(y)\,dx\,dy \tag{2}\\
&= \int_0^{\infty} \frac{1}{2}e^{-y/2}\int_y^{\infty} \frac{1}{3}e^{-x/3}\,dx\,dy \tag{3}\\
&= \int_0^{\infty} \frac{1}{2}e^{-y/2}e^{-y/3}\,dy \tag{4}\\
&= \int_0^{\infty} \frac{1}{2}e^{-(1/2+1/3)y}\,dy = \frac{1/2}{1/2 + 1/3} = \frac{3}{5} \tag{5}
\end{align}

(b) Since $X$ and $Y$ are exponential random variables with parameters $\lambda_X = 1/3$ and $\lambda_Y = 1/2$, Appendix A tells us that $E[X] = 1/\lambda_X = 3$ and $E[Y] = 1/\lambda_Y = 2$. Since $X$ and $Y$ are independent, the correlation is $E[XY] = E[X]E[Y] = 6$.

(c) Since $X$ and $Y$ are independent, $\operatorname{Cov}[X,Y] = 0$.

Problem 4.10.8 Solution

(a) Since $E[-X_2] = -E[X_2]$, we can use Theorem 4.13 to write
\begin{align}
E[X_1 - X_2] &= E[X_1 + (-X_2)] = E[X_1] + E[-X_2] \tag{1}\\
&= E[X_1] - E[X_2] \tag{2}\\
&= 0 \tag{3}
\end{align}

(b) By Theorem 3.5(f), $\operatorname{Var}[-X_2] = (-1)^2\operatorname{Var}[X_2] = \operatorname{Var}[X_2]$. Since $X_1$ and $X_2$ are independent, Theorem 4.27(a) says that
\begin{align}
\operatorname{Var}[X_1 - X_2] &= \operatorname{Var}[X_1 + (-X_2)] \tag{4}\\
&= \operatorname{Var}[X_1] + \operatorname{Var}[-X_2] \tag{5}\\
&= 2\operatorname{Var}[X] \tag{6}
\end{align}

Problem 4.10.9 Solution

Since $X$ and $Y$ take on only integer values, $W = X + Y$ is integer valued as well. Thus for an integer $w$,
\[ P_W(w) = P[W = w] = P[X + Y = w]. \tag{1} \]
Suppose $X = k$; then $W = w$ if and only if $Y = w - k$. To find all ways that $X + Y = w$, we must consider each possible integer $k$ such that $X = k$. Thus
\[ P_W(w) = \sum_{k=-\infty}^{\infty} P[X = k, Y = w - k] = \sum_{k=-\infty}^{\infty} P_{X,Y}(k, w - k). \tag{2} \]
Since $X$ and $Y$ are independent, $P_{X,Y}(k, w - k) = P_X(k)P_Y(w - k)$. It follows that for any integer $w$,
\[ P_W(w) = \sum_{k=-\infty}^{\infty} P_X(k)\,P_Y(w - k). \tag{3} \]

Problem 4.10.10 Solution

The key to this problem is understanding that "short order" and "long order" are synonyms for $N = 1$ and $N = 2$. Similarly, "vanilla", "chocolate", and "strawberry" correspond to the events $D = 20$, $D = 100$ and $D = 300$.

(a) The following table is given in the problem statement.
\[ \begin{array}{c|ccc} & \text{vanilla} & \text{choc.} & \text{strawberry} \\ \hline \text{short order} & 0.2 & 0.2 & 0.2 \\ \text{long order} & 0.1 & 0.2 & 0.1 \end{array} \]
This table can be translated directly into the joint PMF of $N$ and $D$.
\[ \begin{array}{c|ccc} P_{N,D}(n,d) & d = 20 & d = 100 & d = 300 \\ \hline n = 1 & 0.2 & 0.2 & 0.2 \\ n = 2 & 0.1 & 0.2 & 0.1 \end{array} \tag{1} \]

(b) We find the marginal PMF $P_D(d)$ by summing the columns of the joint PMF. This yields
\[ P_D(d) = \begin{cases} 0.3 & d = 20, \\ 0.4 & d = 100, \\ 0.3 & d = 300, \\ 0 & \text{otherwise.} \end{cases} \tag{2} \]

(c) To find the conditional PMF $P_{D|N}(d|2)$, we first need to find the probability of the conditioning event
\[ P_N(2) = P_{N,D}(2, 20) + P_{N,D}(2, 100) + P_{N,D}(2, 300) = 0.4 \tag{3} \]
The conditional PMF of $D$ given $N = 2$ is
\[ P_{D|N}(d|2) = \frac{P_{N,D}(2,d)}{P_N(2)} = \begin{cases} 1/4 & d = 20 \\ 1/2 & d = 100 \\ 1/4 & d = 300 \\ 0 & \text{otherwise} \end{cases} \tag{4} \]

(d) The conditional expectation of $D$ given $N = 2$ is
\[ E[D|N = 2] = \sum_d d\,P_{D|N}(d|2) = 20(1/4) + 100(1/2) + 300(1/4) = 130 \tag{5} \]

(e) To check independence, we could calculate the marginal PMFs of $N$ and $D$. In this case, however, it is simpler to observe that $P_D(d) \ne P_{D|N}(d|2)$. Hence $N$ and $D$ are dependent.

(f) In terms of $N$ and $D$, the cost (in cents) of a fax is $C = ND$.
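Every quantity in parts (b) through (f) of Problem 4.10.10 is a sum over the six entries of the joint PMF table, so the whole problem can be enumerated directly. The following is a minimal sketch using exact fractions; the dictionary layout and variable names are illustrative assumptions, and the probabilities are copied from the table in part (a).

```python
from fractions import Fraction as F

# Joint PMF P_{N,D}(n, d) from the table in part (a)
joint = {
    (1, 20): F(2, 10), (1, 100): F(2, 10), (1, 300): F(2, 10),
    (2, 20): F(1, 10), (2, 100): F(2, 10), (2, 300): F(1, 10),
}

# Marginal PMFs: column sums give P_D, row sums give P_N
P_D, P_N = {}, {}
for (n, d), q in joint.items():
    P_D[d] = P_D.get(d, 0) + q
    P_N[n] = P_N.get(n, 0) + q

# Conditional PMF and conditional mean of D given N = 2
P_D_given_N2 = {d: joint[2, d] / P_N[2] for d in P_D}
E_D_given_N2 = sum(d * q for d, q in P_D_given_N2.items())

# N and D are independent only if every joint entry factors into the marginals
independent = all(joint[n, d] == P_N[n] * P_D[d] for (n, d) in joint)

# Expected cost E[C] = E[ND], summed over the joint PMF
E_C = sum(n * d * q for (n, d), q in joint.items())

print(P_D[20], P_D[100], P_D[300])   # 3/10 2/5 3/10
print(E_D_given_N2, independent)     # 130 False
print(E_C)                           # 188
```

Note that the expectation of $C = ND$ weights each product $nd$ by the joint probability $P_{N,D}(n,d)$, not by the marginal $P_D(d)$.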
The expected value of $C$ is
\begin{align}
E[C] &= \sum_{n,d} nd\,P_{N,D}(n,d) \tag{6}\\
&= 1(20)(0.2) + 1(100)(0.2) + 1(300)(0.2) \tag{7}\\
&\quad + 2(20)(0.1) + 2(100)(0.2) + 2(300)(0.1) = 188 \tag{8}
\end{align}

Problem 4.10.11 Solution

The key to this problem is understanding that "Factory Q" and "Factory R" are synonyms for $M = 60$ and $M = 180$. Similarly, "small", "medium", and "large" orders correspond to the events $B = 1$, $B = 2$ and $B = 3$.

(a) The following table given in the problem statement
\[ \begin{array}{c|cc} & \text{Factory Q} & \text{Factory R} \\ \hline \text{small order} & 0.3 & 0.2 \\ \text{medium order} & 0.1 & 0.2 \\ \text{large order} & 0.1 & 0.1 \end{array} \]
can be translated into the following joint PMF for $B$ and $M$.
\[ \begin{array}{c|cc} P_{B,M}(b,m) & m = 60 & m = 180 \\ \hline b = 1 & 0.3 & 0.2 \\ b = 2 & 0.1 & 0.2 \\ b = 3 & 0.1 & 0.1 \end{array} \tag{1} \]

(b) Before we find $E[B]$, it will prove helpful to find the marginal PMFs $P_B(b)$ and $P_M(m)$. These can be found from the row and column sums of the table of the joint PMF
\[ \begin{array}{c|cc|c} P_{B,M}(b,m) & m = 60 & m = 180 & P_B(b) \\ \hline b = 1 & 0.3 & 0.2 & 0.5 \\ b = 2 & 0.1 & 0.2 & 0.3 \\ b = 3 & 0.1 & 0.1 & 0.2 \\ \hline P_M(m) & 0.5 & 0.5 & \end{array} \tag{2} \]
The expected number of boxes is
\[ E[B] = \sum_b b\,P_B(b) = 1(0.5) + 2(0.3) + 3(0.2) = 1.7 \tag{3} \]

(c) From the marginal PMF of $B$, we know that $P_B(2) = 0.3$. The conditional PMF of $M$ given $B = 2$ is
\[ P_{M|B}(m|2) = \frac{P_{B,M}(2,m)}{P_B(2)} = \begin{cases} 1/3 & m = 60 \\ 2/3 & m = 180 \\ 0 & \text{otherwise} \end{cases} \tag{4} \]

(d) The conditional expectation of $M$ given $B = 2$ is
\[ E[M|B = 2] = \sum_m m\,P_{M|B}(m|2) = 60(1/3) + 180(2/3) = 140 \tag{5} \]

(e) From the marginal PMFs we calculated in the table of part (b), we can conclude that $B$ and $M$ are not independent, since $P_{B,M}(1, 60) = 0.3 \ne P_B(1)P_M(60) = 0.25$.

(f) In terms of $M$ and $B$, the cost (in cents) of sending a shipment is $C = BM$. The expected value of $C$ is
\begin{align}
E[C] &= \sum_{b,m} bm\,P_{B,M}(b,m) \tag{6}\\
&= 1(60)(0.3) + 2(60)(0.1) + 3(60)(0.1) \tag{7}\\
&\quad + 1(180)(0.2) + 2(180)(0.2) + 3(180)(0.1) = 210 \tag{8}
\end{align}

Problem 4.10.12 Solution

Random variables $X_1$ and $X_2$ are iid with PDF
\[ f_X(x) = \begin{cases} x/2 & 0 \le x \le 2, \\ 0 & \text{otherwise.} \end{cases} \tag{1} \]

(a) Since $X_1$ and $X_2$ are identically distributed they will share the same CDF $F_X(x)$.
$$F_X(x) = \int_0^x f_X(x')\, dx' = \begin{cases} 0 & x \le 0, \\ x^2/4 & 0 \le x \le 2, \\ 1 & x \ge 2. \end{cases} \tag{2}$$

(b) Since $X_1$ and $X_2$ are independent, we can say that
$$P[X_1 \le 1, X_2 \le 1] = P[X_1 \le 1] P[X_2 \le 1] = F_{X_1}(1) F_{X_2}(1) = [F_X(1)]^2 = \frac{1}{16}. \tag{3}$$

(c) For $W = \max(X_1, X_2)$,
$$F_W(1) = P[\max(X_1, X_2) \le 1] = P[X_1 \le 1, X_2 \le 1]. \tag{4}$$
Since $X_1$ and $X_2$ are independent,
$$F_W(1) = P[X_1 \le 1] P[X_2 \le 1] = [F_X(1)]^2 = 1/16. \tag{5}$$

(d) 
$$F_W(w) = P[\max(X_1, X_2) \le w] = P[X_1 \le w, X_2 \le w]. \tag{6}$$
Since $X_1$ and $X_2$ are independent,
$$F_W(w) = P[X_1 \le w] P[X_2 \le w] = [F_X(w)]^2 = \begin{cases} 0 & w \le 0, \\ w^4/16 & 0 \le w \le 2, \\ 1 & w \ge 2. \end{cases} \tag{7}$$

Problem 4.10.13 Solution

$X$ and $Y$ are independent random variables with PDFs
$$f_X(x) = \begin{cases} 2x & 0 \le x \le 1, \\ 0 & \text{otherwise,} \end{cases} \qquad f_Y(y) = \begin{cases} 3y^2 & 0 \le y \le 1, \\ 0 & \text{otherwise.} \end{cases} \tag{1}$$
For the event $A = \{X > Y\}$, this problem asks us to calculate the conditional expectations $E[X|A]$ and $E[Y|A]$. We will do this using the conditional joint PDF $f_{X,Y|A}(x, y)$. Since $X$ and $Y$ are independent, it is tempting to argue that the event $X > Y$ does not alter the probability model for $X$ and $Y$. Unfortunately, this is not the case. When we learn that $X > Y$, it increases the probability that $X$ is large and $Y$ is small. We will see this when we compare the conditional expectations $E[X|A]$ and $E[Y|A]$ to $E[X]$ and $E[Y]$.

(a) We can calculate the unconditional expectations, $E[X]$ and $E[Y]$, using the marginal PDFs $f_X(x)$ and $f_Y(y)$.
\begin{align}
E[X] &= \int_{-\infty}^{\infty} x f_X(x)\, dx = \int_0^1 2x^2\, dx = 2/3 \tag{2} \\
E[Y] &= \int_{-\infty}^{\infty} y f_Y(y)\, dy = \int_0^1 3y^3\, dy = 3/4 \tag{3}
\end{align}

(b) First, we need to calculate the conditional joint PDF $f_{X,Y|A}(x, y)$.
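The first step is to write down the joint PDF of $X$ and $Y$:
$$f_{X,Y}(x, y) = f_X(x) f_Y(y) = \begin{cases} 6xy^2 & 0 \le x \le 1,\ 0 \le y \le 1, \\ 0 & \text{otherwise.} \end{cases} \tag{4}$$

[Figure: the unit square in the $(X, Y)$ plane with the region $X > Y$ marked.]

The event $A$ has probability
\begin{align}
P[A] &= \iint_{x > y} f_{X,Y}(x, y)\, dy\, dx \tag{5} \\
&= \int_0^1 \int_0^x 6xy^2\, dy\, dx \tag{6} \\
&= \int_0^1 2x^4\, dx = 2/5. \tag{7}
\end{align}
The conditional joint PDF of $X$ and $Y$ given $A$ is
\begin{align}
f_{X,Y|A}(x, y) &= \begin{cases} \dfrac{f_{X,Y}(x, y)}{P[A]} & (x, y) \in A, \\ 0 & \text{otherwise} \end{cases} \tag{8} \\
&= \begin{cases} 15xy^2 & 0 \le y \le x \le 1, \\ 0 & \text{otherwise.} \end{cases} \tag{9}
\end{align}
The triangular region of nonzero probability is a signal that given $A$, $X$ and $Y$ are no longer independent. The conditional expected value of $X$ given $A$ is
\begin{align}
E[X|A] &= \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} x f_{X,Y|A}(x, y)\, dy\, dx \tag{10} \\
&= 15 \int_0^1 x^2 \int_0^x y^2\, dy\, dx \tag{11} \\
&= 5 \int_0^1 x^5\, dx = 5/6. \tag{12}
\end{align}
The conditional expected value of $Y$ given $A$ is
\begin{align}
E[Y|A] &= \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} y f_{X,Y|A}(x, y)\, dy\, dx \tag{13} \\
&= 15 \int_0^1 x \int_0^x y^3\, dy\, dx \tag{14} \\
&= \frac{15}{4} \int_0^1 x^5\, dx = 5/8. \tag{15}
\end{align}
We see that $E[X|A] > E[X]$ while $E[Y|A] < E[Y]$. That is, learning $X > Y$ gives us a clue that $X$ may be larger than usual while $Y$ may be smaller than usual.

Problem 4.10.14 Solution

This problem is quite straightforward. If $F_{X,Y}(x, y) = F_X(x) F_Y(y)$, then by Theorem 4.4 the joint PDF of $X$ and $Y$ is
$$f_{X,Y}(x, y) = \frac{\partial^2 [F_X(x) F_Y(y)]}{\partial x\, \partial y} = \frac{\partial [f_X(x) F_Y(y)]}{\partial y} = f_X(x) f_Y(y). \tag{1}$$
Hence, $F_{X,Y}(x, y) = F_X(x) F_Y(y)$ implies that $X$ and $Y$ are independent. Conversely, if $X$ and $Y$ are independent, then
$$f_{X,Y}(x, y) = f_X(x) f_Y(y). \tag{2}$$
By Definition 4.3,
\begin{align}
F_{X,Y}(x, y) &= \int_{-\infty}^{x} \int_{-\infty}^{y} f_{X,Y}(u, v)\, dv\, du \tag{3} \\
&= \left( \int_{-\infty}^{x} f_X(u)\, du \right) \left( \int_{-\infty}^{y} f_Y(v)\, dv \right) \tag{4} \\
&= F_X(x) F_Y(y). \tag{5}
\end{align}

Problem 4.10.15 Solution

Random variables $X$ and $Y$ have joint PDF
$$f_{X,Y}(x, y) = \begin{cases} \lambda^2 e^{-\lambda y} & 0 \le x \le y, \\ 0 & \text{otherwise.} \end{cases} \tag{1}$$
For $W = Y - X$ we can find $f_W(w)$ by integrating over the region indicated in the figure below to get $F_W(w)$, then taking the derivative with respect to $w$. Since $Y \ge X$, $W = Y - X$ is nonnegative. Hence $F_W(w) = 0$ for $w < 0$.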
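For $w \ge 0$,

[Figure: the region $X < Y < X + w$ in the $(X, Y)$ plane.]

\begin{align}
F_W(w) &= 1 - P[W > w] = 1 - P[Y > X + w] \tag{2} \\
&= 1 - \int_0^{\infty} \int_{x+w}^{\infty} \lambda^2 e^{-\lambda y}\, dy\, dx \tag{3} \\
&= 1 - e^{-\lambda w}. \tag{4}
\end{align}
The complete expressions for the CDF and corresponding PDF of $W$ are
$$F_W(w) = \begin{cases} 0 & w < 0, \\ 1 - e^{-\lambda w} & w \ge 0, \end{cases} \qquad f_W(w) = \begin{cases} 0 & w < 0, \\ \lambda e^{-\lambda w} & w \ge 0. \end{cases} \tag{5}$$

Problem 4.10.16 Solution

(a) To find if $W$ and $X$ are independent, we must be able to factor the joint density function $f_{X,W}(x, w)$ into the product $f_X(x) f_W(w)$ of marginal density functions. To verify this, we must find the joint PDF of $X$ and $W$. First we find the joint CDF.
\begin{align}
F_{X,W}(x, w) &= P[X \le x, W \le w] \tag{1} \\
&= P[X \le x, Y - X \le w] = P[X \le x, Y \le X + w]. \tag{2}
\end{align}
Since $Y \ge X$, the joint CDF of $X$ and $W$ satisfies $F_{X,W}(x, w) = P[X \le x, X \le Y \le X + w]$.

[Figure: the region $\{X < x\} \cap \{X < Y < X + w\}$ in the $(X, Y)$ plane.]

Thus, for $x \ge 0$ and $w \ge 0$,
\begin{align}
F_{X,W}(x, w) &= \int_0^x \int_{x'}^{x'+w} \lambda^2 e^{-\lambda y}\, dy\, dx' \tag{3} \\
&= \int_0^x \left( -\lambda e^{-\lambda y} \Big|_{x'}^{x'+w} \right) dx' \tag{4} \\
&= \int_0^x \left( -\lambda e^{-\lambda (x'+w)} + \lambda e^{-\lambda x'} \right) dx' \tag{5} \\
&= \left( e^{-\lambda (x'+w)} - e^{-\lambda x'} \right) \Big|_0^x \tag{6} \\
&= (1 - e^{-\lambda x})(1 - e^{-\lambda w}). \tag{7}
\end{align}
We see that $F_{X,W}(x, w) = F_X(x) F_W(w)$. Moreover, by applying Theorem 4.4,
$$f_{X,W}(x, w) = \frac{\partial^2 F_{X,W}(x, w)}{\partial x\, \partial w} = \lambda e^{-\lambda x} \lambda e^{-\lambda w} = f_X(x) f_W(w). \tag{8}$$
Since we have our desired factorization, $W$ and $X$ are independent.

(b) Following the same procedure, we find the joint CDF of $Y$ and $W$.
\begin{align}
F_{W,Y}(w, y) &= P[W \le w, Y \le y] = P[Y - X \le w, Y \le y] \tag{9} \\
&= P[Y \le X + w, Y \le y]. \tag{10}
\end{align}
The region of integration corresponding to the event $\{Y \le X + w, Y \le y\}$ depends on whether $y < w$ or $y \ge w$. Keep in mind that although $W = Y - X \le Y$, the dummy arguments $y$ and $w$ of $f_{W,Y}(w, y)$ need not obey the same constraints. In any case, we must consider each case separately.

[Figure: the region $\{Y < y\} \cap \{Y < X + w\}$ for $y > w$, with the points $y - w$ and $y$ marked on the $X$ axis.]

For $y > w$, the integration is
\begin{align}
F_{W,Y}(w, y) &= \int_0^{y-w} \int_u^{u+w} \lambda^2 e^{-\lambda v}\, dv\, du + \int_{y-w}^{y} \int_u^{y} \lambda^2 e^{-\lambda v}\, dv\, du \tag{11} \\
&= \lambda \int_0^{y-w} \left[ e^{-\lambda u} - e^{-\lambda (u+w)} \right] du + \lambda \int_{y-w}^{y} \left[ e^{-\lambda u} - e^{-\lambda y} \right] du. \tag{12}
\end{align}
It follows that
\begin{align}
F_{W,Y}(w, y) &= \left[ -e^{-\lambda u} + e^{-\lambda (u+w)} \right]_0^{y-w} + \left[ -e^{-\lambda u} - u \lambda e^{-\lambda y} \right]_{y-w}^{y} \tag{13} \\
&= 1 - e^{-\lambda w} - \lambda w e^{-\lambda y}. \tag{14}
\end{align}

[Figure: the region $\{Y < y\}$ for $y \le w$.]

For $y \le w$,
\begin{align}
F_{W,Y}(w, y) &= \int_0^y \int_u^y \lambda^2 e^{-\lambda v}\, dv\, du \tag{15} \\
&= \int_0^y \left[ -\lambda e^{-\lambda y} + \lambda e^{-\lambda u} \right] du \tag{16} \\
&= \left[ -\lambda u e^{-\lambda y} - e^{-\lambda u} \right]_0^y \tag{17} \\
&= 1 - (1 + \lambda y) e^{-\lambda y}. \tag{18}
\end{align}
The complete expression for the joint CDF is
$$F_{W,Y}(w, y) = \begin{cases} 1 - e^{-\lambda w} - \lambda w e^{-\lambda y} & 0 \le w \le y, \\ 1 - (1 + \lambda y) e^{-\lambda y} & 0 \le y \le w, \\ 0 & \text{otherwise.} \end{cases} \tag{19}$$
Applying Theorem 4.4 yields
$$f_{W,Y}(w, y) = \frac{\partial^2 F_{W,Y}(w, y)}{\partial w\, \partial y} = \begin{cases} \lambda^2 e^{-\lambda y} & 0 \le w \le y, \\ 0 & \text{otherwise.} \end{cases} \tag{20}$$
The joint PDF $f_{W,Y}(w, y)$ doesn't factor (its region of support $0 \le w \le y$ is not a product set) and thus $W$ and $Y$ are dependent.

Problem 4.10.17 Solution

We need to define the events $A = \{U \le u\}$ and $B = \{V \le v\}$. In this case,
$$F_{U,V}(u, v) = P[AB] = P[B] - P[A^c B] = P[V \le v] - P[U > u, V \le v]. \tag{1}$$
Note that $U = \min(X, Y) > u$ if and only if $X > u$ and $Y > u$. In the same way, since $V = \max(X, Y)$, $V \le v$ if and only if $X \le v$ and $Y \le v$. Thus
\begin{align}
P[U > u, V \le v] &= P[X > u, Y > u, X \le v, Y \le v] \tag{2} \\
&= P[u < X \le v, u < Y \le v]. \tag{3}
\end{align}
Thus, the joint CDF of $U$ and $V$ satisfies
\begin{align}
F_{U,V}(u, v) &= P[V \le v] - P[U > u, V \le v] \tag{4} \\
&= P[X \le v, Y \le v] - P[u < X \le v, u < Y \le v]. \tag{5}
\end{align}
Since $X$ and $Y$ are independent random variables,
\begin{align}
F_{U,V}(u, v) &= P[X \le v] P[Y \le v] - P[u < X \le v] P[u < Y \le v] \tag{6} \\
&= F_X(v) F_Y(v) - (F_X(v) - F_X(u))(F_Y(v) - F_Y(u)) \tag{7} \\
&= F_X(v) F_Y(u) + F_X(u) F_Y(v) - F_X(u) F_Y(u). \tag{8}
\end{align}
The joint PDF is
\begin{align}
f_{U,V}(u, v) &= \frac{\partial^2 F_{U,V}(u, v)}{\partial u\, \partial v} \tag{9} \\
&= \frac{\partial}{\partial u} \left[ f_X(v) F_Y(u) + F_X(u) f_Y(v) \right] \tag{10} \\
&= f_X(u) f_Y(v) + f_X(v) f_Y(u). \tag{11}
\end{align}

Problem 4.11.1 Solution

$$f_{X,Y}(x, y) = c e^{-(x^2/8) - (y^2/18)} \tag{1}$$
The omission of any limits for the PDF indicates that it is defined over all $x$ and $y$. We know that $f_{X,Y}(x, y)$ is in the form of the bivariate Gaussian distribution so we look to Definition 4.17 and attempt to find values for $\sigma_Y$, $\sigma_X$, $E[X]$, $E[Y]$ and $\rho$.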
First, we know that the constant is
$$c = \frac{1}{2\pi \sigma_X \sigma_Y \sqrt{1 - \rho^2}}. \tag{2}$$
Because the exponent of $f_{X,Y}(x, y)$ doesn't contain any cross terms we know that $\rho$ must be zero. With $\rho = 0$, the exponent of the bivariate Gaussian PDF carries a factor $1/[2(1 - \rho^2)] = 1/2$, and we are left to solve the following for $E[X]$, $E[Y]$, $\sigma_X$, and $\sigma_Y$:
$$\frac{1}{2} \left( \frac{x - E[X]}{\sigma_X} \right)^2 = \frac{x^2}{8}, \qquad \frac{1}{2} \left( \frac{y - E[Y]}{\sigma_Y} \right)^2 = \frac{y^2}{18}. \tag{3}$$
From which we can conclude that
\begin{align}
E[X] &= E[Y] = 0, \tag{4} \\
\sigma_X &= \sqrt{4} = 2, \tag{5} \\
\sigma_Y &= \sqrt{9} = 3. \tag{6}
\end{align}
Putting all the pieces together, we find that $c = \frac{1}{12\pi}$. Since $\rho = 0$, we also find that $X$ and $Y$ are independent.

Problem 4.11.2 Solution

For the joint PDF
$$f_{X,Y}(x, y) = c e^{-(2x^2 - 4xy + 4y^2)}, \tag{1}$$
we proceed as in Problem 4.11.1 to find values for $\sigma_Y$, $\sigma_X$, $E[X]$, $E[Y]$ and $\rho$.

(a) First, we try to solve the following equations
\begin{align}
\left( \frac{x - E[X]}{\sigma_X} \right)^2 &= 4(1 - \rho^2) x^2 \tag{2} \\
\left( \frac{y - E[Y]}{\sigma_Y} \right)^2 &= 8(1 - \rho^2) y^2 \tag{3} \\
\frac{2\rho}{\sigma_X \sigma_Y} &= 8(1 - \rho^2) \tag{4}
\end{align}
The first two equations yield $E[X] = E[Y] = 0$.

(b) To find the correlation coefficient $\rho$, we observe that
$$\sigma_X = 1/\sqrt{4(1 - \rho^2)}, \qquad \sigma_Y = 1/\sqrt{8(1 - \rho^2)}. \tag{5}$$
Using $\sigma_X$ and $\sigma_Y$ in the third equation yields $\rho = 1/\sqrt{2}$.

(c) Since $\rho = 1/\sqrt{2}$, now we can solve for $\sigma_X$ and $\sigma_Y$.
$$\sigma_X = 1/\sqrt{2}, \qquad \sigma_Y = 1/2. \tag{6}$$

(d) From here we can solve for $c$.
$$c = \frac{1}{2\pi \sigma_X \sigma_Y \sqrt{1 - \rho^2}} = \frac{2}{\pi}. \tag{7}$$

(e) $X$ and $Y$ are dependent because $\rho \neq 0$.

Problem 4.11.3 Solution

From the problem statement, we learn that
$$\mu_X = \mu_Y = 0, \qquad \sigma_X^2 = \sigma_Y^2 = 1. \tag{1}$$
From Theorem 4.30, the conditional expectation of $Y$ given $X$ is
$$E[Y|X] = \tilde{\mu}_Y(X) = \mu_Y + \rho \frac{\sigma_Y}{\sigma_X} (X - \mu_X) = \rho X. \tag{2}$$
In the problem statement, we learn that $E[Y|X] = X/2$. Hence $\rho = 1/2$. From Definition 4.17, the joint PDF is
$$f_{X,Y}(x, y) = \frac{1}{\sqrt{3\pi^2}} e^{-2(x^2 - xy + y^2)/3}. \tag{3}$$

Problem 4.11.4 Solution

The event $B$ is the set of outcomes satisfying $X^2 + Y^2 \le 2^2$. Of course, the calculation of $P[B]$ depends on the probability model for $X$ and $Y$.
(a) In this instance, $X$ and $Y$ have the same PDF
$$f_X(x) = f_Y(x) = \begin{cases} 0.01 & -50 \le x \le 50, \\ 0 & \text{otherwise.} \end{cases} \tag{1}$$
Since $X$ and $Y$ are independent, their joint PDF is
$$f_{X,Y}(x, y) = f_X(x) f_Y(y) = \begin{cases} 10^{-4} & -50 \le x \le 50,\ -50 \le y \le 50, \\ 0 & \text{otherwise.} \end{cases} \tag{2}$$
Because $X$ and $Y$ have a uniform PDF over the bullseye area, $P[B]$ is just the value of the joint PDF over the area times the area of the bullseye.
$$P[B] = P[X^2 + Y^2 \le 2^2] = 10^{-4} \cdot \pi 2^2 = 4\pi \cdot 10^{-4} \approx 0.0013. \tag{3}$$

(b) In this case, the joint PDF of $X$ and $Y$ is inversely proportional to the area of the target.
$$f_{X,Y}(x, y) = \begin{cases} 1/[\pi 50^2] & x^2 + y^2 \le 50^2, \\ 0 & \text{otherwise.} \end{cases} \tag{4}$$
The probability of a bullseye is
$$P[B] = P[X^2 + Y^2 \le 2^2] = \frac{\pi 2^2}{\pi 50^2} = \left( \frac{1}{25} \right)^2 = 0.0016. \tag{5}$$

(c) In this instance, $X$ and $Y$ have the identical Gaussian $(0, \sigma)$ PDF with $\sigma^2 = 100$; i.e.,
$$f_X(x) = f_Y(x) = \frac{1}{\sqrt{2\pi \sigma^2}} e^{-x^2/2\sigma^2}. \tag{6}$$
Since $X$ and $Y$ are independent, their joint PDF is
$$f_{X,Y}(x, y) = f_X(x) f_Y(y) = \frac{1}{2\pi \sigma^2} e^{-(x^2 + y^2)/2\sigma^2}. \tag{7}$$
To find $P[B]$, we write
\begin{align}
P[B] = P[X^2 + Y^2 \le 2^2] &= \iint_{x^2 + y^2 \le 2^2} f_{X,Y}(x, y)\, dx\, dy \tag{8} \\
&= \frac{1}{2\pi \sigma^2} \iint_{x^2 + y^2 \le 2^2} e^{-(x^2 + y^2)/2\sigma^2}\, dx\, dy. \tag{9}
\end{align}
This integral is easy using polar coordinates. With the substitutions $x^2 + y^2 = r^2$ and $dx\, dy = r\, dr\, d\theta$,
\begin{align}
P[B] &= \frac{1}{2\pi \sigma^2} \int_0^2 \int_0^{2\pi} e^{-r^2/2\sigma^2}\, r\, d\theta\, dr \tag{10} \\
&= \frac{1}{\sigma^2} \int_0^2 r e^{-r^2/2\sigma^2}\, dr \tag{11} \\
&= -e^{-r^2/2\sigma^2} \Big|_0^2 = 1 - e^{-4/200} \approx 0.0198. \tag{12}
\end{align}

Problem 4.11.5 Solution

(a) The person's temperature is high with probability
$$p = P[T > 38] = P[T - 37 > 38 - 37] = 1 - \Phi(1) = 0.159. \tag{1}$$
Given that the temperature is high, then $W$ is measured. Since $\rho = 0$, $W$ and $T$ are independent and
$$q = P[W > 10] = P\left[ \frac{W - 7}{2} > \frac{10 - 7}{2} \right] = 1 - \Phi(1.5) = 0.067. \tag{2}$$
The tree for this experiment is

[Tree diagram: the first branch is $T > 38$ with probability $p$ or $T \le 38$ with probability $1 - p$; following $T > 38$, the second branch is $W > 10$ with probability $q$ or $W \le 10$ with probability $1 - q$.]

The probability the person is ill is
$$P[I] = P[T > 38, W > 10] = P[T > 38] P[W > 10] = pq \approx 0.0106. \tag{3}$$

(b) The general form of the bivariate Gaussian PDF is
$$f_{W,T}(w, t) = \frac{\exp\left[ -\dfrac{\left( \frac{w - \mu_1}{\sigma_1} \right)^2 - \frac{2\rho (w - \mu_1)(t - \mu_2)}{\sigma_1 \sigma_2} + \left( \frac{t - \mu_2}{\sigma_2} \right)^2}{2(1 - \rho^2)} \right]}{2\pi \sigma_1 \sigma_2 \sqrt{1 - \rho^2}}. \tag{4}$$
With $\mu_1 = E[W] = 7$, $\sigma_1 = \sigma_W = 2$, $\mu_2 = E[T] = 37$, $\sigma_2 = \sigma_T = 1$ and $\rho = 1/\sqrt{2}$, we have
$$f_{W,T}(w, t) = \frac{1}{2\pi \sqrt{2}} \exp\left[ -\frac{(w - 7)^2}{4} + \frac{\sqrt{2} (w - 7)(t - 37)}{2} - (t - 37)^2 \right]. \tag{5}$$
To find the conditional probability $P[I|T = t]$, we need to find the conditional PDF of $W$ given $T = t$. The direct way is simply to use algebra to find
$$f_{W|T}(w|t) = \frac{f_{W,T}(w, t)}{f_T(t)}. \tag{6}$$
The required algebra is essentially the same as that needed to prove Theorem 4.29. It's easier just to apply Theorem 4.29, which says that given $T = t$, the conditional distribution of $W$ is Gaussian with
\begin{align}
E[W|T = t] &= E[W] + \rho \frac{\sigma_W}{\sigma_T} (t - E[T]), \tag{7} \\
\text{Var}[W|T = t] &= \sigma_W^2 (1 - \rho^2). \tag{8}
\end{align}
Plugging in the various parameters gives
$$E[W|T = t] = 7 + \sqrt{2}(t - 37) \quad \text{and} \quad \text{Var}[W|T = t] = 2. \tag{9}$$
Using this conditional mean and variance, we obtain the conditional Gaussian PDF
$$f_{W|T}(w|t) = \frac{1}{\sqrt{4\pi}} e^{-\left( w - (7 + \sqrt{2}(t - 37)) \right)^2 / 4}. \tag{10}$$
Given $T = t$, the conditional probability the person is declared ill is
\begin{align}
P[I|T = t] &= P[W > 10|T = t] \tag{11} \\
&= P\left[ \frac{W - (7 + \sqrt{2}(t - 37))}{\sqrt{2}} > \frac{10 - (7 + \sqrt{2}(t - 37))}{\sqrt{2}} \right] \tag{12} \\
&= P\left[ Z > \frac{3 - \sqrt{2}(t - 37)}{\sqrt{2}} \right] = Q\left( \frac{3\sqrt{2}}{2} - (t - 37) \right). \tag{13}
\end{align}

Problem 4.11.6 Solution

The given joint PDF is
$$f_{X,Y}(x, y) = d e^{-(a^2 x^2 + bxy + c^2 y^2)}. \tag{1}$$
In order to be an example of the bivariate Gaussian PDF given in Definition 4.17, we must have
$$a^2 = \frac{1}{2\sigma_X^2 (1 - \rho^2)}, \qquad c^2 = \frac{1}{2\sigma_Y^2 (1 - \rho^2)},$$
$$b = \frac{-\rho}{\sigma_X \sigma_Y (1 - \rho^2)}, \qquad d = \frac{1}{2\pi \sigma_X \sigma_Y \sqrt{1 - \rho^2}}.$$
We can solve for $\sigma_X$ and $\sigma_Y$ (taking $a > 0$ and $c > 0$), yielding
$$\sigma_X = \frac{1}{a\sqrt{2(1 - \rho^2)}}, \qquad \sigma_Y = \frac{1}{c\sqrt{2(1 - \rho^2)}}. \tag{2}$$
Plugging these values into the equation for $b$, it follows that $b = -2ac\rho$, or, equivalently, $\rho = -b/2ac$. This implies
$$d^2 = \frac{1}{4\pi^2 \sigma_X^2 \sigma_Y^2 (1 - \rho^2)} = \frac{(1 - \rho^2) a^2 c^2}{\pi^2} = \frac{a^2 c^2 - b^2/4}{\pi^2}. \tag{3}$$
Since $|\rho| \le 1$, we see that $|b| \le 2ac$.
Further, for any choice of $a$, $b$ and $c$ that meets this constraint with strict inequality, choosing $d = \sqrt{a^2 c^2 - b^2/4}\,/\pi$ yields a valid PDF.

Problem 4.11.7 Solution

From Equation (4.146), we can write the bivariate Gaussian PDF as
$$f_{X,Y}(x, y) = \frac{1}{\sigma_X \sqrt{2\pi}} e^{-(x - \mu_X)^2 / 2\sigma_X^2}\ \frac{1}{\tilde{\sigma}_Y \sqrt{2\pi}} e^{-(y - \tilde{\mu}_Y(x))^2 / 2\tilde{\sigma}_Y^2} \tag{1}$$
where $\tilde{\mu}_Y(x) = \mu_Y + \rho \frac{\sigma_Y}{\sigma_X}(x - \mu_X)$ and $\tilde{\sigma}_Y = \sigma_Y \sqrt{1 - \rho^2}$. However, the definitions of $\tilde{\mu}_Y(x)$ and $\tilde{\sigma}_Y$ are not particularly important for this exercise. When we integrate the joint PDF over all $x$ and $y$, we obtain
\begin{align}
\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f_{X,Y}(x, y)\, dx\, dy &= \int_{-\infty}^{\infty} \frac{1}{\sigma_X \sqrt{2\pi}} e^{-(x - \mu_X)^2 / 2\sigma_X^2} \underbrace{\int_{-\infty}^{\infty} \frac{1}{\tilde{\sigma}_Y \sqrt{2\pi}} e^{-(y - \tilde{\mu}_Y(x))^2 / 2\tilde{\sigma}_Y^2}\, dy}_{1}\ dx \tag{2} \\
&= \int_{-\infty}^{\infty} \frac{1}{\sigma_X \sqrt{2\pi}} e^{-(x - \mu_X)^2 / 2\sigma_X^2}\, dx. \tag{3}
\end{align}
The marked integral equals 1 because for each value of $x$, it is the integral of a Gaussian PDF of one variable over all possible values. In fact, it is the integral of the conditional PDF $f_{Y|X}(y|x)$ over all possible $y$. To complete the proof, we see that
$$\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f_{X,Y}(x, y)\, dx\, dy = \int_{-\infty}^{\infty} \frac{1}{\sigma_X \sqrt{2\pi}} e^{-(x - \mu_X)^2 / 2\sigma_X^2}\, dx = 1 \tag{4}$$
since the remaining integral is the integral of the marginal Gaussian PDF $f_X(x)$ over all possible $x$.

Problem 4.11.8 Solution

In this problem, $X_1$ and $X_2$ are jointly Gaussian random variables with $E[X_i] = \mu_i$, $\text{Var}[X_i] = \sigma_i^2$, and correlation coefficient $\rho_{12} = \rho$. The goal is to show that $Y = X_1 X_2$ has variance
$$\text{Var}[Y] = (1 + \rho^2) \sigma_1^2 \sigma_2^2 + \mu_1^2 \sigma_2^2 + \mu_2^2 \sigma_1^2 + 2\rho \mu_1 \mu_2 \sigma_1 \sigma_2. \tag{1}$$
Since $\text{Var}[Y] = E[Y^2] - (E[Y])^2$, we will find the moments of $Y$. The first moment is
$$E[Y] = E[X_1 X_2] = \text{Cov}[X_1, X_2] + E[X_1] E[X_2] = \rho \sigma_1 \sigma_2 + \mu_1 \mu_2. \tag{2}$$
For the second moment of $Y$, we follow the problem hint and use the iterated expectation
$$E[Y^2] = E[X_1^2 X_2^2] = E\left[ E[X_1^2 X_2^2 | X_2] \right] = E\left[ X_2^2 E[X_1^2 | X_2] \right]. \tag{3}$$
Given $X_2 = x_2$, we observe from Theorem 4.30 that $X_1$ is Gaussian with
$$E[X_1 | X_2 = x_2] = \mu_1 + \rho \frac{\sigma_1}{\sigma_2}(x_2 - \mu_2), \qquad \text{Var}[X_1 | X_2 = x_2] = \sigma_1^2 (1 - \rho^2). \tag{4}$$
Thus, the conditional second moment of $X_1$ is
\begin{align}
E[X_1^2 | X_2] &= (E[X_1 | X_2])^2 + \text{Var}[X_1 | X_2] \tag{5} \\
&= \left( \mu_1 + \rho \frac{\sigma_1}{\sigma_2}(X_2 - \mu_2) \right)^2 + \sigma_1^2 (1 - \rho^2) \tag{6} \\
&= [\mu_1^2 + \sigma_1^2 (1 - \rho^2)] + 2\rho \mu_1 \frac{\sigma_1}{\sigma_2}(X_2 - \mu_2) + \rho^2 \frac{\sigma_1^2}{\sigma_2^2}(X_2 - \mu_2)^2. \tag{7}
\end{align}
It follows that
\begin{align}
E[X_1^2 X_2^2] &= E\left[ X_2^2 E[X_1^2 | X_2] \right] \tag{8} \\
&= E\left[ [\mu_1^2 + \sigma_1^2 (1 - \rho^2)] X_2^2 + 2\rho \mu_1 \frac{\sigma_1}{\sigma_2}(X_2 - \mu_2) X_2^2 + \rho^2 \frac{\sigma_1^2}{\sigma_2^2}(X_2 - \mu_2)^2 X_2^2 \right]. \tag{9}
\end{align}
Since $E[X_2^2] = \sigma_2^2 + \mu_2^2$,
$$E[X_1^2 X_2^2] = \left( \mu_1^2 + \sigma_1^2 (1 - \rho^2) \right)(\sigma_2^2 + \mu_2^2) + 2\rho \mu_1 \frac{\sigma_1}{\sigma_2} E\left[ (X_2 - \mu_2) X_2^2 \right] + \rho^2 \frac{\sigma_1^2}{\sigma_2^2} E\left[ (X_2 - \mu_2)^2 X_2^2 \right]. \tag{10}$$
We observe that
\begin{align}
E\left[ (X_2 - \mu_2) X_2^2 \right] &= E\left[ (X_2 - \mu_2)(X_2 - \mu_2 + \mu_2)^2 \right] \tag{11} \\
&= E\left[ (X_2 - \mu_2) \left( (X_2 - \mu_2)^2 + 2\mu_2 (X_2 - \mu_2) + \mu_2^2 \right) \right] \tag{12} \\
&= E\left[ (X_2 - \mu_2)^3 \right] + 2\mu_2 E\left[ (X_2 - \mu_2)^2 \right] + \mu_2^2 E\left[ X_2 - \mu_2 \right]. \tag{13}
\end{align}
We recall that $E[X_2 - \mu_2] = 0$ and that $E[(X_2 - \mu_2)^2] = \sigma_2^2$. We now look ahead to Problem 6.3.4 to learn that
$$E\left[ (X_2 - \mu_2)^3 \right] = 0, \qquad E\left[ (X_2 - \mu_2)^4 \right] = 3\sigma_2^4. \tag{14}$$
This implies
$$E\left[ (X_2 - \mu_2) X_2^2 \right] = 2\mu_2 \sigma_2^2. \tag{15}$$
Following this same approach, we write
\begin{align}
E\left[ (X_2 - \mu_2)^2 X_2^2 \right] &= E\left[ (X_2 - \mu_2)^2 (X_2 - \mu_2 + \mu_2)^2 \right] \tag{16} \\
&= E\left[ (X_2 - \mu_2)^2 \left( (X_2 - \mu_2)^2 + 2\mu_2 (X_2 - \mu_2) + \mu_2^2 \right) \right] \tag{17} \\
&= E\left[ (X_2 - \mu_2)^4 + 2\mu_2 (X_2 - \mu_2)^3 + \mu_2^2 (X_2 - \mu_2)^2 \right] \tag{18} \\
&= E\left[ (X_2 - \mu_2)^4 \right] + 2\mu_2 E\left[ (X_2 - \mu_2)^3 \right] + \mu_2^2 E\left[ (X_2 - \mu_2)^2 \right]. \tag{19}
\end{align}
It follows that
$$E\left[ (X_2 - \mu_2)^2 X_2^2 \right] = 3\sigma_2^4 + \mu_2^2 \sigma_2^2. \tag{20}$$
Combining the above results, we can conclude that
\begin{align}
E[X_1^2 X_2^2] &= \left( \mu_1^2 + \sigma_1^2 (1 - \rho^2) \right)(\sigma_2^2 + \mu_2^2) + 2\rho \mu_1 \frac{\sigma_1}{\sigma_2}(2\mu_2 \sigma_2^2) + \rho^2 \frac{\sigma_1^2}{\sigma_2^2}(3\sigma_2^4 + \mu_2^2 \sigma_2^2) \tag{21} \\
&= (1 + 2\rho^2) \sigma_1^2 \sigma_2^2 + \mu_2^2 \sigma_1^2 + \mu_1^2 \sigma_2^2 + \mu_1^2 \mu_2^2 + 4\rho \mu_1 \mu_2 \sigma_1 \sigma_2. \tag{22}
\end{align}
Finally, combining Equations (2) and (22) yields
\begin{align}
\text{Var}[Y] &= E[X_1^2 X_2^2] - (E[X_1 X_2])^2 \tag{23} \\
&= (1 + \rho^2) \sigma_1^2 \sigma_2^2 + \mu_1^2 \sigma_2^2 + \mu_2^2 \sigma_1^2 + 2\rho \mu_1 \mu_2 \sigma_1 \sigma_2. \tag{24}
\end{align}

Problem 4.12.1 Solution

The script imagepmf in Example 4.27 generates the grid variables SX, SY, and PXY. Recall that for each entry in the grid, SX, SY and PXY are the corresponding values of $x$, $y$ and $P_{X,Y}(x, y)$. Displaying them as adjacent column vectors forms the list of all possible pairs $x, y$ and the probabilities $P_{X,Y}(x, y)$. Since any Matlab vector or matrix x is reduced to a column vector with the command x(:), the following simple commands will generate the list:

    >> format rat;
    >> imagepmf;
    >> [SX(:) SY(:) PXY(:)]
    ans =
         800    400    1/5
        1200    400    1/20
        1600    400    0
         800    800    1/20
        1200    800    1/5
        1600    800    1/10
         800   1200    1/10
        1200   1200    1/10
        1600   1200    1/5
    >>

Note that the command format rat wasn't necessary; it just formats the output as rational numbers, i.e., ratios of integers, which you may or may not find esthetically pleasing.

Problem 4.12.2 Solution

In this problem, we need to calculate $E[X]$, $E[Y]$, the correlation $E[XY]$, and the covariance $\text{Cov}[X, Y]$ for random variables $X$ and $Y$ in Example 4.27. In this case, we can use the script imagepmf.m in Example 4.27 to generate the grid variables SX, SY and PXY that describe the joint PMF $P_{X,Y}(x, y)$.

However, for the rest of the problem, a general solution is better than a specific solution. The general problem is that given a pair of finite random variables described by the grid variables SX, SY and PXY, we want Matlab to calculate an expected value $E[g(X, Y)]$. This problem is solved in a few simple steps. First we write a function that calculates the expected value of any finite random variable.

    function ex=finiteexp(sx,px);
    %Usage: ex=finiteexp(sx,px)
    %returns the expected value E[X]
    %of finite random variable X described
    %by samples sx and probabilities px
    ex=sum((sx(:)).*(px(:)));

Note that finiteexp performs its calculations on the sample values sx and probabilities px using the column vectors sx(:) and px(:). As a result, we can use the same finiteexp function when the random variable is represented by grid variables.
For example, we can calculate the correlation $r = E[XY]$ as

    r=finiteexp(SX.*SY,PXY)

It is also convenient to define a function that returns the covariance:

    function covxy=finitecov(SX,SY,PXY);
    %Usage: covxy=finitecov(SX,SY,PXY)
    %returns the covariance of
    %finite random variables X and Y
    %given by grids SX, SY, and PXY
    ex=finiteexp(SX,PXY);
    ey=finiteexp(SY,PXY);
    R=finiteexp(SX.*SY,PXY);
    covxy=R-ex*ey;

The following script calculates the desired quantities:

    %imageavg.m
    %Solution for Problem 4.12.2
    imagepmf; %defines SX, SY, PXY
    ex=finiteexp(SX,PXY)
    ey=finiteexp(SY,PXY)
    rxy=finiteexp(SX.*SY,PXY)
    cxy=finitecov(SX,SY,PXY)

Running the script produces the following output:

    >> imageavg
    ex =
        1180
    ey =
        860
    rxy =
        1064000
    cxy =
        49200
    >>

The careful reader will observe that imageavg is inefficiently coded in that the correlation $E[XY]$ is calculated twice, once directly and once inside of finitecov. For more complex problems, it would be worthwhile to avoid this duplication.

Problem 4.12.3 Solution

The script is just a Matlab calculation of $F_{X,Y}(x, y)$ in Equation (4.29).

    %trianglecdfplot.m
    [X,Y]=meshgrid(0:0.05:1.5);
    R=(0<=Y).*(Y<=X).*(X<=1).*(2*(X.*Y)-(Y.^2));
    R=R+((0<=X).*(X<Y).*(X<=1).*(X.^2));
    R=R+((0<=Y).*(Y<=1).*(1<X).*((2*Y)-(Y.^2)));
    R=R+((X>1).*(Y>1));
    mesh(X,Y,R);
    xlabel('\it x');
    ylabel('\it y');

For functions like $F_{X,Y}(x, y)$ that have multiple cases, we calculate the function for each case and multiply by the corresponding boolean condition so as to have a zero contribution when that case doesn't apply. Using this technique, it's important to define the boundary conditions carefully to make sure that no point is included in two different boundary conditions.
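The masking technique is easier to see in one dimension. The following minimal sketch is not part of the original solution; it applies the same idea to the CDF $F_X(x) = x^2/4$, $0 \le x \le 2$, of Problem 4.10.12, and the variable names x and F are chosen here only for illustration. The strict inequality in the second mask keeps the boundary point $x = 2$ from being counted in two cases.

```matlab
% Sketch: evaluate a piecewise CDF with boolean masks.
% F(x) = 0 for x<0, x^2/4 for 0<=x<=2, and 1 for x>2.
x = [-1 0 1 2 3];               % test points, including both boundaries
F = (0<=x).*(x<=2).*(x.^2/4);   % middle case; contributes zero elsewhere
F = F + (x>2);                  % upper case; strict > so x=2 is counted once
disp(F)
```

At the five test points the sketch evaluates to 0, 0, 0.25, 1 and 1. Replacing (x>2) by (x>=2) would add both cases at $x = 2$ and return 2 there, which is exactly the double counting the solution warns about.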
Problem 4.12.4 Solution

By following the formulation of Problem 4.2.6, the code to set up the sample grid is reasonably straightforward:

    function [SX,SY,PXY]=circuits(n,p);
    %Usage: [SX,SY,PXY]=circuits(n,p);
    % (See Problem 4.12.4)
    [SX,SY]=ndgrid(0:n,0:n);
    PXY=0*SX;
    PXY(find((SX==n) & (SY==n)))=p^n;
    for y=0:(n-1),
        I=find((SY==y) &(SX>=SY) &(SX<n));
        PXY(I)=(p^y)*(1-p)* ...
            binomialpmf(n-y-1,p,SX(I)-y);
    end;

The only catch is that for a given value of $y$, we need to calculate the binomial probability of $x - y$ successes in $(n - y - 1)$ trials. We can do this using the function call

    binomialpmf(n-y-1,p,x-y)

However, this function expects the argument n-y-1 to be a scalar. As a result, we must perform a separate call to binomialpmf for each value of $y$.

An alternate solution is direct calculation of the PMF $P_{X,Y}(x, y)$ in Problem 4.2.6. In this case, we calculate $m!$ using the Matlab function gamma(m+1). Because the gamma(x) function will calculate the gamma function for each element in a vector x, we can calculate the PMF without any loops:

    function [SX,SY,PXY]=circuits2(n,p);
    %Usage: [SX,SY,PXY]=circuits2(n,p);
    % (See Problem 4.12.4)
    [SX,SY]=ndgrid(0:n,0:n);
    PXY=0*SX;
    PXY(find((SX==n) & (SY==n)))=p^n;
    I=find((SY<=SX) &(SX<n));
    PXY(I)=(gamma(n-SY(I))./(gamma(SX(I)-SY(I)+1)...
        .*gamma(n-SX(I)))).*(p.^SX(I)).*((1-p).^(n-SX(I)));

Some experimentation with cputime will show that circuits2(n,p) runs much faster than circuits(n,p). As is typical, the for loop in circuits results in time wasted running the Matlab interpreter and in regenerating the binomial PMF in each cycle.

To finish the problem, we need to calculate the correlation coefficient
$$\rho_{X,Y} = \frac{\text{Cov}[X, Y]}{\sigma_X \sigma_Y}. \tag{1}$$
In fact, this is one of those problems where a general solution is better than a specific solution.
The general problem is that given a pair of finite random variables described by the grid variables SX, SY and PMF PXY, we wish to calculate the correlation coefficient. This problem is solved in a few simple steps. First we write a function that calculates the expected value of a finite random variable.

    function ex=finiteexp(sx,px);
    %Usage: ex=finiteexp(sx,px)
    %returns the expected value E[X]
    %of finite random variable X described
    %by samples sx and probabilities px
    ex=sum((sx(:)).*(px(:)));

Note that finiteexp performs its calculations on the sample values sx and probabilities px using the column vectors sx(:) and px(:). As a result, we can use the same finiteexp function when the random variable is represented by grid variables. We can build on finiteexp to calculate the variance using finitevar:

    function v=finitevar(sx,px);
    %Usage: v=finitevar(sx,px)
    % returns the variance Var[X]
    % of finite random variables X described by
    % samples sx and probabilities px
    ex2=finiteexp(sx.^2,px);
    ex=finiteexp(sx,px);
    v=ex2-(ex^2);

Putting these pieces together, we can calculate the correlation coefficient.

    function rho=finitecoeff(SX,SY,PXY);
    %Usage: rho=finitecoeff(SX,SY,PXY)
    %Calculate the correlation coefficient rho of
    %finite random variables X and Y
    ex=finiteexp(SX,PXY); vx=finitevar(SX,PXY);
    ey=finiteexp(SY,PXY); vy=finitevar(SY,PXY);
    R=finiteexp(SX.*SY,PXY);
    rho=(R-ex*ey)/sqrt(vx*vy);

Calculating the correlation coefficient of $X$ and $Y$ is now a two-line exercise.

    >> [SX,SY,PXY]=circuits2(50,0.9);
    >> rho=finitecoeff(SX,SY,PXY)
    rho =
        0.4451
    >>

Problem 4.12.5 Solution

In the first approach $X$ is an exponential $(\lambda)$ random variable, $Y$ is an independent exponential $(\mu)$ random variable, and $W = Y/X$. We implement this approach in the function wrv1.m shown below.

In the second approach, we use Theorem 3.22 and generate samples of a uniform $(0, 1)$ random variable $U$ and calculate $W = F_W^{-1}(U)$. In this problem,
$$F_W(w) = 1 - \frac{\lambda/\mu}{\lambda/\mu + w}. \tag{1}$$
Setting $u = F_W(w)$ and solving for $w$ yields
$$w = F_W^{-1}(u) = \frac{\lambda}{\mu} \left( \frac{u}{1 - u} \right). \tag{2}$$
We implement this solution in the function wrv2. Here are the two solutions:

    function w=wrv1(lambda,mu,m)
    %Usage: w=wrv1(lambda,mu,m)
    %Return m samples of W=Y/X
    %X is exponential (lambda)
    %Y is exponential (mu)
    x=exponentialrv(lambda,m);
    y=exponentialrv(mu,m);
    w=y./x;

    function w=wrv2(lambda,mu,m)
    %Usage: w=wrv2(lambda,mu,m)
    %Return m samples of W=Y/X
    %X is exponential (lambda)
    %Y is exponential (mu)
    %Uses CDF of F_W(w)
    u=rand(m,1);
    w=(lambda/mu)*u./(1-u);

We would expect that wrv2 would be faster simply because it does less work. In fact, it's instructive to account for the work each program does.

• wrv1: Each exponential random sample requires the generation of a uniform random variable, and the calculation of a logarithm. Thus, we generate $2m$ uniform random variables, calculate $2m$ logarithms, and perform $m$ floating point divisions.

• wrv2: Generate $m$ uniform random variables and perform $m$ floating point divisions.

This quickie analysis indicates that wrv1 executes roughly $5m$ operations while wrv2 executes about $2m$ operations. We might guess that wrv2 would be faster by a factor of 2.5. Experimentally, we calculated the execution time associated with generating a million samples:

    >> t2=cputime;w2=wrv2(1,1,1000000);t2=cputime-t2
    t2 =
        0.2500
    >> t1=cputime;w1=wrv1(1,1,1000000);t1=cputime-t1
    t1 =
        0.7610
    >>

We see in our simple experiments that wrv2 is faster by a rough factor of 3. (Note that repeating such trials yielded qualitatively similar results.)

Problem Solutions – Chapter 5

Problem 5.1.1 Solution

The repair of each laptop can be viewed as an independent trial with four possible outcomes corresponding to the four types of needed repairs.

(a) Since the four types of repairs are mutually exclusive choices and since 4 laptops are returned for repair, the joint distribution of $N_1, \ldots, N_4$ is the multinomial PMF
\begin{align}
P_{N_1,\ldots,N_4}(n_1, \ldots, n_4) &= \binom{4}{n_1, n_2, n_3, n_4} p_1^{n_1} p_2^{n_2} p_3^{n_3} p_4^{n_4} \tag{1} \\
&= \begin{cases} \dfrac{4!}{n_1!\, n_2!\, n_3!\, n_4!} \left( \dfrac{8}{15} \right)^{n_1} \left( \dfrac{4}{15} \right)^{n_2} \left( \dfrac{2}{15} \right)^{n_3} \left( \dfrac{1}{15} \right)^{n_4} & n_1 + \cdots + n_4 = 4;\ n_i \ge 0, \\ 0 & \text{otherwise.} \end{cases} \tag{2}
\end{align}

(b) Let $L_2$ denote the event that exactly two laptops need LCD repairs. Thus $P[L_2] = P_{N_1}(2)$. Since each laptop requires an LCD repair with probability $p_1 = 8/15$, the number of LCD repairs, $N_1$, is a binomial $(4, 8/15)$ random variable with PMF
$$P_{N_1}(n_1) = \binom{4}{n_1} (8/15)^{n_1} (7/15)^{4 - n_1}. \tag{3}$$
The probability that two laptops need LCD repairs is
$$P_{N_1}(2) = \binom{4}{2} (8/15)^2 (7/15)^2 = 0.3717. \tag{4}$$

(c) A repair is type (2) with probability $p_2 = 4/15$. A repair is type (3) with probability $p_3 = 2/15$; otherwise a repair is type “other” with probability $p_o = 9/15$. Define $X$ as the number of “other” repairs needed. The joint PMF of $X, N_2, N_3$ is the multinomial PMF
$$P_{N_2,N_3,X}(n_2, n_3, x) = \binom{4}{n_2, n_3, x} \left( \frac{4}{15} \right)^{n_2} \left( \frac{2}{15} \right)^{n_3} \left( \frac{9}{15} \right)^{x}. \tag{5}$$
However, since $X = 4 - N_2 - N_3$, we observe that
\begin{align}
P_{N_2,N_3}(n_2, n_3) &= P_{N_2,N_3,X}(n_2, n_3, 4 - n_2 - n_3) \tag{6} \\
&= \binom{4}{n_2, n_3, 4 - n_2 - n_3} \left( \frac{4}{15} \right)^{n_2} \left( \frac{2}{15} \right)^{n_3} \left( \frac{9}{15} \right)^{4 - n_2 - n_3} \tag{7} \\
&= \left( \frac{9}{15} \right)^{4} \binom{4}{n_2, n_3, 4 - n_2 - n_3} \left( \frac{4}{9} \right)^{n_2} \left( \frac{2}{9} \right)^{n_3}. \tag{8}
\end{align}
Similarly, since each repair is a motherboard repair with probability $p_2 = 4/15$, the number of motherboard repairs has binomial PMF
$$P_{N_2}(n_2) = \binom{4}{n_2} \left( \frac{4}{15} \right)^{n_2} \left( \frac{11}{15} \right)^{4 - n_2}. \tag{9}$$
Finally, the probability that more laptops require motherboard repairs than keyboard repairs is
$$P[N_2 > N_3] = P_{N_2,N_3}(1, 0) + P_{N_2,N_3}(2, 0) + P_{N_2,N_3}(2, 1) + P_{N_2}(3) + P_{N_2}(4) \tag{10}$$
where we use the fact that if $N_2 = 3$ or $N_2 = 4$, then we must have $N_2 > N_3$. Inserting the various probabilities, we obtain
$$P[N_2 > N_3] = \frac{11664 + 7776 + 3456 + 2816 + 256}{15^4} = \frac{25968}{50625}. \tag{11}$$
That is, $P[N_2 > N_3] = 8{,}656/16{,}875 \approx 0.5129$.

Problem 5.1.2 Solution

Whether a computer has feature $i$ is a Bernoulli trial with success probability $p_i = 2^{-i}$.
Given that $n$ computers were sold, the number of computers sold with feature $i$ has the binomial PMF
$$P_{N_i}(n_i) = \begin{cases} \dbinom{n}{n_i} p_i^{n_i} (1 - p_i)^{n - n_i} & n_i = 0, 1, \ldots, n, \\ 0 & \text{otherwise.} \end{cases} \tag{1}$$
Since a computer has feature $i$ with probability $p_i$ independent of whether any other feature is on the computer, the number $N_i$ of computers with feature $i$ is independent of the number of computers with any other features. That is, $N_1, \ldots, N_4$ are mutually independent and have joint PMF
$$P_{N_1,\ldots,N_4}(n_1, \ldots, n_4) = P_{N_1}(n_1) P_{N_2}(n_2) P_{N_3}(n_3) P_{N_4}(n_4). \tag{2}$$

Problem 5.1.3 Solution

(a) In terms of the joint PDF, we can write the joint CDF as
$$F_{X_1,\ldots,X_n}(x_1, \ldots, x_n) = \int_{-\infty}^{x_1} \cdots \int_{-\infty}^{x_n} f_{X_1,\ldots,X_n}(y_1, \ldots, y_n)\, dy_1 \cdots dy_n. \tag{1}$$
However, simplifying the above integral depends on the values of each $x_i$. In particular, $f_{X_1,\ldots,X_n}(y_1, \ldots, y_n) = 1$ if and only if $0 \le y_i \le 1$ for each $i$. Since $F_{X_1,\ldots,X_n}(x_1, \ldots, x_n) = 0$ if any $x_i < 0$, we limit, for the moment, our attention to the case where $x_i \ge 0$ for all $i$. In this case, some thought will show that we can write the limits in the following way:
\begin{align}
F_{X_1,\ldots,X_n}(x_1, \ldots, x_n) &= \int_0^{\min(1, x_1)} \cdots \int_0^{\min(1, x_n)} dy_1 \cdots dy_n \tag{2} \\
&= \min(1, x_1) \min(1, x_2) \cdots \min(1, x_n). \tag{3}
\end{align}
A complete expression for the CDF of $X_1, \ldots, X_n$ is
$$F_{X_1,\ldots,X_n}(x_1, \ldots, x_n) = \begin{cases} \prod_{i=1}^{n} \min(1, x_i) & 0 \le x_i,\ i = 1, 2, \ldots, n, \\ 0 & \text{otherwise.} \end{cases} \tag{4}$$

(b) For $n = 3$,
\begin{align}
1 - P\left[ \min_i X_i \le 3/4 \right] &= P\left[ \min_i X_i > 3/4 \right] \tag{5} \\
&= P[X_1 > 3/4, X_2 > 3/4, X_3 > 3/4] \tag{6} \\
&= \int_{3/4}^{1} \int_{3/4}^{1} \int_{3/4}^{1} dx_1\, dx_2\, dx_3 \tag{7} \\
&= (1 - 3/4)^3 = 1/64. \tag{8}
\end{align}
Thus $P[\min_i X_i \le 3/4] = 63/64$.

Problem 5.2.1 Solution

This problem is very simple. In terms of the vector $\mathbf{X}$, the PDF is
$$f_{\mathbf{X}}(\mathbf{x}) = \begin{cases} 1 & \mathbf{0} \le \mathbf{x} \le \mathbf{1}, \\ 0 & \text{otherwise.} \end{cases} \tag{1}$$
However, just keep in mind that the inequalities $\mathbf{0} \le \mathbf{x}$ and $\mathbf{x} \le \mathbf{1}$ are vector inequalities that must hold for every component $x_i$.
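The componentwise conditions in Problems 5.1.3 and 5.2.1 map directly onto Matlab's all, min and prod functions. The following is a minimal sketch under the assumption that a sample point is stored as a vector x; the names unifvecpdf and unifveccdf are hypothetical and are not part of matcode.

```matlab
% Sketch: iid uniform (0,1) components X_1,...,X_n stored in a vector x.
% unifvecpdf and unifveccdf are hypothetical names used only in this example.
unifvecpdf = @(x) double(all(x>=0 & x<=1));   % 1 only if every x_i is in [0,1]
unifveccdf = @(x) all(x>=0)*prod(min(1,x));   % product of min(1,x_i) when all x_i>=0
f1 = unifvecpdf([0.2 0.9 0.5]);   % every component satisfies 0<=x_i<=1
f2 = unifvecpdf([0.2 1.1 0.5]);   % one component violates x_i<=1
F  = unifveccdf([0.5 2 0.5]);     % min(1,2)=1, so F = 0.5*1*0.5
pmin = 1-(1-3/4)^3;               % P[min_i X_i <= 3/4] for n = 3
```

Here f1 is 1 and f2 is 0, since a single component outside the unit interval violates the vector inequality; F is 0.25 and pmin is 63/64, in agreement with Problem 5.1.3.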
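Problem 5.2.2 Solution

In this problem, we find the constant $c$ from the requirement that the integral of the vector PDF over all possible values is 1. That is, $\int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} f_{\mathbf{X}}(\mathbf{x})\, dx_1 \cdots dx_n = 1$. Since $f_{\mathbf{X}}(\mathbf{x}) = c\,\mathbf{a}'\mathbf{x} = c \sum_{i=1}^{n} a_i x_i$, we have that
\begin{align}
\int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} f_{\mathbf{X}}(\mathbf{x})\, dx_1 \cdots dx_n &= c \int_0^1 \cdots \int_0^1 \left( \sum_{i=1}^{n} a_i x_i \right) dx_1 \cdots dx_n \tag{1} \\
&= c \sum_{i=1}^{n} \left( \int_0^1 \cdots \int_0^1 a_i x_i\, dx_1 \cdots dx_n \right) \tag{2} \\
&= c \sum_{i=1}^{n} a_i \left[ \left( \int_0^1 dx_1 \right) \cdots \left( \int_0^1 x_i\, dx_i \right) \cdots \left( \int_0^1 dx_n \right) \right] \tag{3} \\
&= c \sum_{i=1}^{n} a_i \left( \frac{x_i^2}{2} \Big|_0^1 \right) = c \sum_{i=1}^{n} \frac{a_i}{2}. \tag{4}
\end{align}
The requirement that the PDF integrate to unity thus implies
$$c = \frac{2}{\sum_{i=1}^{n} a_i}. \tag{5}$$

Problem 5.3.1 Solution

Here we solve the following problem (footnote 1): Given $f_{\mathbf{X}}(\mathbf{x})$ with $c = 2/3$ and $a_1 = a_2 = a_3 = 1$ in Problem 5.2.2, find the marginal PDF $f_{X_3}(x_3)$.

Footnote 1: The wrong problem statement appears in the first printing.

Filling in the parameters in Problem 5.2.2, we obtain the vector PDF
$$f_{\mathbf{X}}(\mathbf{x}) = \begin{cases} \frac{2}{3}(x_1 + x_2 + x_3) & 0 \le x_1, x_2, x_3 \le 1, \\ 0 & \text{otherwise.} \end{cases} \tag{1}$$
In this case, for $0 \le x_3 \le 1$, the marginal PDF of $X_3$ is
\begin{align}
f_{X_3}(x_3) &= \frac{2}{3} \int_0^1 \int_0^1 (x_1 + x_2 + x_3)\, dx_1\, dx_2 \tag{2} \\
&= \frac{2}{3} \int_0^1 \left( \frac{x_1^2}{2} + x_2 x_1 + x_3 x_1 \right) \Big|_{x_1 = 0}^{x_1 = 1} dx_2 \tag{3} \\
&= \frac{2}{3} \int_0^1 \left( \frac{1}{2} + x_2 + x_3 \right) dx_2 \tag{4} \\
&= \frac{2}{3} \left( \frac{x_2}{2} + \frac{x_2^2}{2} + x_3 x_2 \right) \Big|_{x_2 = 0}^{x_2 = 1} = \frac{2}{3} \left( \frac{1}{2} + \frac{1}{2} + x_3 \right). \tag{5}
\end{align}
The complete expression for the marginal PDF of $X_3$ is
$$f_{X_3}(x_3) = \begin{cases} 2(1 + x_3)/3 & 0 \le x_3 \le 1, \\ 0 & \text{otherwise.} \end{cases} \tag{6}$$

Problem 5.3.2 Solution

Since $J_1$, $J_2$ and $J_3$ are independent, we can write
$$P_{\mathbf{K}}(\mathbf{k}) = P_{J_1}(k_1) P_{J_2}(k_2 - k_1) P_{J_3}(k_3 - k_2). \tag{1}$$
Since $P_{J_i}(j) > 0$ only for integers $j > 0$, we have that $P_{\mathbf{K}}(\mathbf{k}) > 0$ only for $0 < k_1 < k_2 < k_3$; otherwise $P_{\mathbf{K}}(\mathbf{k}) = 0$. Finally, for $0 < k_1 < k_2 < k_3$,
\begin{align}
P_{\mathbf{K}}(\mathbf{k}) &= (1 - p)^{k_1 - 1} p\, (1 - p)^{k_2 - k_1 - 1} p\, (1 - p)^{k_3 - k_2 - 1} p \tag{2} \\
&= (1 - p)^{k_3 - 3} p^3. \tag{3}
\end{align}

Problem 5.3.3 Solution

The joint PMF is
$$P_{\mathbf{K}}(\mathbf{k}) = P_{K_1,K_2,K_3}(k_1, k_2, k_3) = \begin{cases} p^3 (1 - p)^{k_3 - 3} & 1 \le k_1 < k_2 < k_3, \\ 0 & \text{otherwise.} \end{cases} \tag{1}$$

(a) We start by finding $P_{K_1,K_2}(k_1, k_2)$.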
For $1 \le k_1 < k_2$,
\begin{align}
P_{K_1,K_2}(k_1, k_2) &= \sum_{k_3 = -\infty}^{\infty} P_{K_1,K_2,K_3}(k_1, k_2, k_3) \tag{2} \\
&= \sum_{k_3 = k_2 + 1}^{\infty} p^3 (1 - p)^{k_3 - 3} \tag{3} \\
&= p^3 (1 - p)^{k_2 - 2} \left[ 1 + (1 - p) + (1 - p)^2 + \cdots \right] \tag{4} \\
&= p^2 (1 - p)^{k_2 - 2}. \tag{5}
\end{align}
The complete expression is
$$P_{K_1,K_2}(k_1, k_2) = \begin{cases} p^2 (1 - p)^{k_2 - 2} & 1 \le k_1 < k_2, \\ 0 & \text{otherwise.} \end{cases} \tag{6}$$
Next we find $P_{K_1,K_3}(k_1, k_3)$. For $k_1 \ge 1$ and $k_3 \ge k_1 + 2$, we have
\begin{align}
P_{K_1,K_3}(k_1, k_3) &= \sum_{k_2 = -\infty}^{\infty} P_{K_1,K_2,K_3}(k_1, k_2, k_3) = \sum_{k_2 = k_1 + 1}^{k_3 - 1} p^3 (1 - p)^{k_3 - 3} \tag{7} \\
&= (k_3 - k_1 - 1) p^3 (1 - p)^{k_3 - 3}. \tag{8}
\end{align}
The complete expression of the PMF of $K_1$ and $K_3$ is
$$P_{K_1,K_3}(k_1, k_3) = \begin{cases} (k_3 - k_1 - 1) p^3 (1 - p)^{k_3 - 3} & 1 \le k_1,\ k_1 + 2 \le k_3, \\ 0 & \text{otherwise.} \end{cases} \tag{9}$$
The next marginal PMF is
\begin{align}
P_{K_2,K_3}(k_2, k_3) &= \sum_{k_1 = -\infty}^{\infty} P_{K_1,K_2,K_3}(k_1, k_2, k_3) = \sum_{k_1 = 1}^{k_2 - 1} p^3 (1 - p)^{k_3 - 3} \tag{10} \\
&= (k_2 - 1) p^3 (1 - p)^{k_3 - 3}. \tag{11}
\end{align}
The complete expression of the PMF of $K_2$ and $K_3$ is
$$P_{K_2,K_3}(k_2, k_3) = \begin{cases} (k_2 - 1) p^3 (1 - p)^{k_3 - 3} & 1 \le k_2 < k_3, \\ 0 & \text{otherwise.} \end{cases} \tag{12}$$

(b) Going back to first principles, we note that $K_n$ is the number of trials up to and including the $n$th success. Thus $K_1$ is a geometric $(p)$ random variable, $K_2$ is a Pascal $(2, p)$ random variable, and $K_3$ is a Pascal $(3, p)$ random variable. We could write down the respective marginal PMFs of $K_1$, $K_2$ and $K_3$ just by looking up the Pascal $(n, p)$ PMF. Nevertheless, it is instructive to derive these PMFs from the joint PMF $P_{K_1,K_2,K_3}(k_1, k_2, k_3)$.

For $k_1 \ge 1$, we can find $P_{K_1}(k_1)$ via
\begin{align}
P_{K_1}(k_1) &= \sum_{k_2 = -\infty}^{\infty} P_{K_1,K_2}(k_1, k_2) = \sum_{k_2 = k_1 + 1}^{\infty} p^2 (1 - p)^{k_2 - 2} \tag{13} \\
&= p^2 (1 - p)^{k_1 - 1} \left[ 1 + (1 - p) + (1 - p)^2 + \cdots \right] \tag{14} \\
&= p (1 - p)^{k_1 - 1}. \tag{15}
\end{align}
The complete expression for the PMF of $K_1$ is the usual geometric PMF
$$P_{K_1}(k_1) = \begin{cases} p (1 - p)^{k_1 - 1} & k_1 = 1, 2, \ldots, \\ 0 & \text{otherwise.} \end{cases} \tag{16}$$
Following the same procedure, the marginal PMF of $K_2$ is
\begin{align}
P_{K_2}(k_2) &= \sum_{k_1 = -\infty}^{\infty} P_{K_1,K_2}(k_1, k_2) = \sum_{k_1 = 1}^{k_2 - 1} p^2 (1 - p)^{k_2 - 2} \tag{17} \\
&= (k_2 - 1) p^2 (1 - p)^{k_2 - 2}. \tag{18}
\end{align}
Since $P_{K_2}(k_2) = 0$ for $k_2 < 2$, the complete PMF is the Pascal $(2, p)$ PMF
$$P_{K_2}(k_2) = \binom{k_2 - 1}{1} p^2 (1 - p)^{k_2 - 2}. \tag{19}$$
Finally, for $k_3 \ge 3$, the PMF of $K_3$ is
\begin{align}
P_{K_3}(k_3) &= \sum_{k_2 = -\infty}^{\infty} P_{K_2,K_3}(k_2, k_3) = \sum_{k_2 = 2}^{k_3 - 1} (k_2 - 1) p^3 (1 - p)^{k_3 - 3} \tag{20} \\
&= [1 + 2 + \cdots + (k_3 - 2)]\, p^3 (1 - p)^{k_3 - 3} \tag{21} \\
&= \frac{(k_3 - 2)(k_3 - 1)}{2} p^3 (1 - p)^{k_3 - 3}. \tag{22}
\end{align}
Since $P_{K_3}(k_3) = 0$ for $k_3 < 3$, the complete expression for $P_{K_3}(k_3)$ is the Pascal $(3, p)$ PMF
$$P_{K_3}(k_3) = \binom{k_3 - 1}{2} p^3 (1 - p)^{k_3 - 3}. \tag{23}$$

Problem 5.3.4 Solution

For $0 \le y_1 \le y_4 \le 1$, the marginal PDF of $Y_1$ and $Y_4$ satisfies
\begin{align}
f_{Y_1,Y_4}(y_1, y_4) &= \iint f_{\mathbf{Y}}(\mathbf{y})\, dy_2\, dy_3 \tag{1} \\
&= \int_{y_1}^{y_4} \left( \int_{y_2}^{y_4} 24\, dy_3 \right) dy_2 \tag{2} \\
&= \int_{y_1}^{y_4} 24 (y_4 - y_2)\, dy_2 \tag{3} \\
&= -12 (y_4 - y_2)^2 \Big|_{y_2 = y_1}^{y_2 = y_4} = 12 (y_4 - y_1)^2. \tag{4}
\end{align}
The complete expression for the joint PDF of $Y_1$ and $Y_4$ is
$$f_{Y_1,Y_4}(y_1, y_4) = \begin{cases} 12 (y_4 - y_1)^2 & 0 \le y_1 \le y_4 \le 1, \\ 0 & \text{otherwise.} \end{cases} \tag{5}$$
For $0 \le y_1 \le y_2 \le 1$, the marginal PDF of $Y_1$ and $Y_2$ is
\begin{align}
f_{Y_1,Y_2}(y_1, y_2) &= \iint f_{\mathbf{Y}}(\mathbf{y})\, dy_3\, dy_4 \tag{6} \\
&= \int_{y_2}^{1} \left( \int_{y_3}^{1} 24\, dy_4 \right) dy_3 \tag{7} \\
&= \int_{y_2}^{1} 24 (1 - y_3)\, dy_3 = 12 (1 - y_2)^2. \tag{8}
\end{align}
The complete expression for the joint PDF of $Y_1$ and $Y_2$ is
$$f_{Y_1,Y_2}(y_1, y_2) = \begin{cases} 12 (1 - y_2)^2 & 0 \le y_1 \le y_2 \le 1, \\ 0 & \text{otherwise.} \end{cases} \tag{9}$$
For $0 \le y_1 \le 1$, the marginal PDF of $Y_1$ can be found from
$$f_{Y_1}(y_1) = \int_{-\infty}^{\infty} f_{Y_1,Y_2}(y_1, y_2)\, dy_2 = \int_{y_1}^{1} 12 (1 - y_2)^2\, dy_2 = 4 (1 - y_1)^3. \tag{10}$$
The complete expression of the PDF of $Y_1$ is
$$f_{Y_1}(y_1) = \begin{cases} 4 (1 - y_1)^3 & 0 \le y_1 \le 1, \\ 0 & \text{otherwise.} \end{cases} \tag{11}$$
Note that the integral $f_{Y_1}(y_1) = \int_{-\infty}^{\infty} f_{Y_1,Y_4}(y_1, y_4)\, dy_4$ would have yielded the same result. This is a good way to check our derivations of $f_{Y_1,Y_4}(y_1, y_4)$ and $f_{Y_1,Y_2}(y_1, y_2)$.

Problem 5.3.5 Solution

The value of each byte is an independent experiment with 255 possible outcomes.
Each byte takes on the value b i with probability p i = p = 1/255. The joint PMF of N 0 , . . . , N 255 is the multinomial PMF P N 0 ,...,N 255 (n 0 , . . . , n 255 ) = 10000! n 0 !n 1 ! · · · n 255 ! p n 0 p n 1 · · · p n 255 n 0 + · · · +n 255 = 10000 (1) = 10000! n 0 !n 1 ! · · · n 255 ! (1/255) 10000 n 0 + · · · +n 255 = 10000 (2) To evaluate the joint PMF of N 0 and N 1 , we define a new experiment with three categories: b 0 , b 1 and “other.” Let ˆ N denote the number of bytes that are “other.” In this case, a byte is in the “other” category with probability ˆ p = 253/255. The joint PMF of N 0 , N 1 , and ˆ N is P N 0 ,N 1 , ˆ N (n 0 , n 1 , ˆ n) = 10000! n 0 !n 1 !ˆ n! 1 255 n 0 1 255 n 1 253 255 ˆ n n 0 +n 1 + ˆ n = 10000 (3) Now we note that the following events are one in the same: {N 0 = n 0 , N 1 = n 1 } = N 0 = n 0 , N 1 = n 1 , ˆ N = 10000 −n 0 −n 1 ¸ (4) Hence, for non-negative integers n 0 and n 1 satisfying n 0 +n 1 ≤ 10000, P N 0 ,N 1 (n 0 , n 1 ) = P N 0 ,N 1 , ˆ N (n 0 , n 1 , 10000 −n 0 −n 1 ) (5) = 10000! n 0 !n 1 !(10000 −n 0 −n 1 )! 1 255 n 0 +n 1 253 255 10000−n 0 −n 1 (6) Problem 5.3.6 Solution In Example 5.1, random variables N 1 , . . . , N r have the multinomial distribution P N 1 ,...,N r (n 1 , . . . , n r ) = n n 1 , . . . , n r p n 1 1 · · · p nr r (1) where n > r > 2. 203 (a) To evaluate the joint PMF of N 1 and N 2 , we define a new experiment with mutually exclusive events: s 1 , s 2 and “other” Let ˆ N denote the number of trial outcomes that are “other”. In this case, a trial is in the “other” category with probability ˆ p = 1 −p 1 −p 2 . The joint PMF of N 1 , N 2 , and ˆ N is P N 1 ,N 2 , ˆ N (n 1 , n 2 , ˆ n) = n! n 1 !n 2 !ˆ n! 
p n 1 1 p n 2 2 (1 −p 1 −p 2 ) ˆ n n 1 +n 2 + ˆ n = n (2) Now we note that the following events are one in the same: {N 1 = n 1 , N 2 = n 2 } = N 1 = n 1 , N 2 = n 2 , ˆ N = n −n 1 −n 2 ¸ (3) Hence, for non-negative integers n 1 and n 2 satisfying n 1 +n 2 ≤ n, P N 1 ,N 2 (n 1 , n 2 ) = P N 1 ,N 2 , ˆ N (n 1 , n 2 , n −n 1 −n 2 ) (4) = n! n 1 !n 2 !(n −n 1 −n 2 )! p n 1 1 p n 2 2 (1 −p 1 −p 2 ) n−n 1 −n 2 (5) (b) We could find the PMF of T i by summing the joint PMF P N 1 ,...,Nr (n 1 , . . . , n r ). However, it is easier to start from first principles. Suppose we say a success occurs if the outcome of the trial is in the set {s 1 , s 2 , . . . , s i } and otherwise a failure occurs. In this case, the success probability is q i = p 1 +· · · +p i and T i is the number of successes in n trials. Thus, T i has the binomial PMF P T i (t) = n t q t i (1 −q i ) n−t t = 0, 1, . . . , n 0 otherwise (6) (c) The joint PMF of T 1 and T 2 satisfies P T 1 ,T 2 (t 1 , t 2 ) = P [N 1 = t 1 , N 1 +N 2 = t 2 ] (7) = P [N 1 = t 1 , N 2 = t 2 −t 1 ] (8) = P N 1 ,N 2 (t 1 , t 2 −t 1 ) (9) By the result of part (a), P T 1 ,T 2 (t 1 , t 2 ) = n! t 1 !(t 2 −t 1 )!(n −t 2 )! p t 1 1 p t 2 −t 1 2 (1 −p 1 −p 2 ) n−t 2 0 ≤ t 1 ≤ t 2 ≤ n (10) Problem 5.3.7 Solution (a) Note that Z is the number of three-page faxes. In principle, we can sum the joint PMF P X,Y,Z (x, y, z) over all x, y to find P Z (z). However, it is better to realize that each fax has 3 pages with probability 1/6, independent of any other fax. Thus, Z has the binomial PMF P Z (z) = 5 z (1/6) z (5/6) 5−z z = 0, 1, . . . , 5 0 otherwise (1) (b) From the properties of the binomial distribution given in Appendix A, we know that E[Z] = 5(1/6). 204 (c) We want to find the conditional PMF of the number X of 1-page faxes and number Y of 2-page faxes given Z = 2 3-page faxes. Note that given Z = 2, X + Y = 3. Hence for non-negative integers x, y satisfying x +y = 3, P X,Y |Z (x, y|2) = P X,Y,Z (x, y, 2) P Z (2) = 5! x!y!2! 
(1/3) x (1/2) y (1/6) 2 5 2 (1/6) 2 (5/6) 3 (2) With some algebra, the complete expression of the conditional PMF is P X,Y |Z (x, y|2) = 3! x!y! (2/5) x (3/5) y x +y = 3, x ≥ 0, y ≥ 0; x, y integer 0 otherwise (3) In the above expression, we note that if Z = 2, then Y = 3 −X and P X|Z (x|2) = P X,Y |Z (x, 3 −x|2) = 3 x (2/5) x (3/5) 3−x x = 0, 1, 2, 3 0 otherwise (4) That is, given Z = 2, there are 3 faxes left, each of which independently could be a 1-page fax. The conditonal PMF of the number of 1-page faxes is binomial where 2/5 is the conditional probability that a fax has 1 page given that it either has 1 page or 2 pages. Moreover given X = x and Z = 2 we must have Y = 3 −x. (d) Given Z = 2, the conditional PMF of X is binomial for 3 trials and success probability 2/5. The conditional expectation of X given Z = 2 is E[X|Z = 2] = 3(2/5) = 6/5. (e) There are several ways to solve this problem. The most straightforward approach is to realize that for integers 0 ≤ x ≤ 5 and 0 ≤ y ≤ 5, the event {X = x, Y = y} occurs iff {X = x, Y = y, Z = 5 −(x +y)}. For the rest of this problem, we assume x and y are non- negative integers so that P X,Y (x, y) = P X,Y,Z (x, y, 5 −(x +y)) (5) = 5! x!y!(5−x−y)! 1 3 x 1 2 y 1 6 5−x−y 0 ≤ x +y ≤ 5, x ≥ 0, y ≥ 0 0 otherwise (6) The above expression may seem unwieldy and it isn’t even clear that it will sum to 1. To simplify the expression, we observe that P X,Y (x, y) = P X,Y,Z (x, y, 5 −x −y) = P X,Y |Z (x, y|5 −x +y) P Z (5 −x −y) (7) Using P Z (z) found in part (c), we can calculate P X,Y |Z (x, y|5 −x −y) for 0 ≤ x + y ≤ 5, integer valued. P X,Y |Z (x, y|5 −x +y) = P X,Y,Z (x, y, 5 −x −y) P Z (5 −x −y) (8) = x +y x 1/3 1/2 + 1/3 x 1/2 1/2 + 1/3 y (9) = x +y x 2 5 x 3 5 (x+y)−x (10) In the above expression, it is wise to think of x +y as some fixed value. In that case, we see that given x+y is a fixed value, X and Y have a joint PMF given by a binomial distribution 205 in x. 
This should not be surprising since it is just a generalization of the case when Z = 2. That is, given that there were a fixed number of faxes that had either one or two pages, each of those faxes is a one page fax with probability (1/3)/(1/2 + 1/3) and so the number of one page faxes should have a binomial distribution, Moreover, given the number X of one page faxes, the number Y of two page faxes is completely specified. Finally, by rewriting P X,Y (x, y) given above, the complete expression for the joint PMF of X and Y is P X,Y (x, y) = 5 5−x−y 1 6 5−x−y 5 6 x+y x+y x 2 5 x 3 5 y x, y ≥ 0 0 otherwise (11) Problem 5.3.8 Solution In Problem 5.3.2, we found that the joint PMF of K = K 1 K 2 K 3 is P K (k) = p 3 (1 −p) k 3 −3 k 1 < k 2 < k 3 0 otherwise (1) In this problem, we generalize the result to n messages. (a) For k 1 < k 2 < · · · < k n , the joint event {K 1 = k 1 , K 2 = k 2 , · · · , K n = k n } (2) occurs if and only if all of the following events occur A 1 k 1 −1 failures, followed by a successful transmission A 2 (k 2 −1) −k 1 failures followed by a successful transmission A 3 (k 3 −1) −k 2 failures followed by a successful transmission . . . A n (k n −1) −k n−1 failures followed by a successful transmission Note that the events A 1 , A 2 , . . . , A n are independent and P [A j ] = (1 −p) k j −k j−1 −1 p. (3) Thus P K 1 ,...,Kn (k 1 , . . . , k n ) = P [A 1 ] P [A 2 ] · · · P [A n ] (4) = p n (1 −p) (k 1 −1)+(k 2 −k 1 −1)+(k 3 −k 2 −1)+···+(kn−k n−1 −1) (5) = p n (1 −p) kn−n (6) To clarify subsequent results, it is better to rename K as K n = K 1 K 2 · · · K n . We see that P Kn (k n ) = p n (1 −p) kn−n 1 ≤ k 1 < k 2 < · · · < k n , 0 otherwise. (7) 206 (b) For j < n, P K 1 ,K 2 ,...,K j (k 1 , k 2 , . . . , k j ) = P K j (k j ) . (8) Since K j is just K n with n = j, we have P K j (k j ) = p j (1 −p) k j −j 1 ≤ k 1 < k 2 < · · · < k j , 0 otherwise. 
(9) (c) Rather than try to deduce P K i (k i ) from the joint PMF P Kn (k n ), it is simpler to return to first principles. In particular, K i is the number of trials up to and including the ith success and has the Pascal (i, p) PMF P K i (k i ) = k i −1 i −1 p i (1 −p) k i −i . (10) Problem 5.4.1 Solution For i = j, X i and X j are independent and E[X i X j ] = E[X i ]E[X j ] = 0 since E[X i ] = 0. Thus the i, jth entry in the covariance matrix C X is C X (i, j) = E [X i X j ] = σ 2 i i = j, 0 otherwise. (1) Thus for random vector X = X 1 X 2 · · · X n , all the off-diagonal entries in the covariance matrix are zero and the covariance matrix is C X = σ 2 1 σ 2 2 . . . σ 2 n ¸ ¸ ¸ ¸ ¸ . (2) Problem 5.4.2 Solution The random variables N 1 , N 2 , N 3 and N 4 are dependent. To see this we observe that P N i (4) = p 4 i . However, P N 1 ,N 2 ,N 3 ,N 4 (4, 4, 4, 4) = 0 = p 4 1 p 4 2 p 4 3 p 4 4 = P N 1 (4) P N 2 (4) P N 3 (4) P N 4 (4) . (1) Problem 5.4.3 Solution We will use the PDF f X (x) = 1 0 ≤ x i ≤ 1, i = 1, 2, 3, 4 0 otherwise. (1) to find the marginal PDFs f X i (x i ). In particular, for 0 ≤ x 1 ≤ 1, f X 1 (x 1 ) = 1 0 1 0 1 0 f X (x) dx 2 dx 3 dx 4 (2) = 1 0 dx 2 1 0 dx 3 1 0 dx 4 = 1. (3) 207 Thus, f X 1 (x 1 ) = 1 0 ≤ x ≤ 1, 0 otherwise. (4) Following similar steps, one can show that f X 1 (x) = f X 2 (x) = f X 3 (x) = f X 4 (x) = 1 0 ≤ x ≤ 1, 0 otherwise. (5) Thus f X (x) = f X 1 (x) f X 2 (x) f X 3 (x) f X 4 (x) . (6) We conclude that X 1 , X 2 , X 3 and X 4 are independent. Problem 5.4.4 Solution We will use the PDF f X (x) = 6e −(x 1 +2x 2 +3x 3 ) x 1 ≥ 0, x 2 ≥ 0, x 3 ≥ 0 0 otherwise. (1) to find the marginal PDFs f X i (x i ). In particular, for x 1 ≥ 0, f X 1 (x 1 ) = ∞ 0 ∞ 0 f X (x) dx 2 dx 3 (2) = 6e −x 1 ∞ 0 e −2x 2 dx 2 ∞ 0 e −3x 3 dx 3 (3) = 6e −x 1 − 1 2 e −2x 2 ∞ 0 − 1 3 e −3x 3 ∞ 0 = e −x 1 . (4) Thus, f X 1 (x 1 ) = e −x 1 x 1 ≥ 0, 0 otherwise. 
(5) Following similar steps, one can show that f X 2 (x 2 ) = ∞ 0 ∞ 0 f X (x) dx 1 dx 3 = 2 −2x 2 x 2 ≥ 0, 0 otherwise. (6) f X 3 (x 3 ) = ∞ 0 ∞ 0 f X (x) dx 1 dx 2 = 3 −3x 3 x 3 ≥ 0, 0 otherwise. (7) Thus f X (x) = f X 1 (x 1 ) f X 2 (x 2 ) f X 3 (x 3 ) . (8) We conclude that X 1 , X 2 , and X 3 are independent. Problem 5.4.5 Solution This problem can be solved without any real math. Some thought should convince you that for any x i > 0, f X i (x i ) > 0. Thus, f X 1 (10) > 0, f X 2 (9) > 0, and f X 3 (8) > 0. Thus f X 1 (10)f X 2 (9)f X 3 (8) > 0. However, from the definition of the joint PDF f X 1 ,X 2 ,X 3 (10, 9, 8) = 0 = f X 1 (10) f X 2 (9) f X 3 (8) . (1) It follows that X 1 , X 2 and X 3 are dependent. Readers who find this quick answer dissatisfying are invited to confirm this conclusions by solving Problem 5.4.6 for the exact expressions for the marginal PDFs f X 1 (x 1 ), f X 2 (x 2 ), and f X 3 (x 3 ). 208 Problem 5.4.6 Solution We find the marginal PDFs using Theorem 5.5. First we note that for x < 0, f X i (x) = 0. For x 1 ≥ 0, f X 1 (x 1 ) = ∞ x 1 ∞ x 2 e −x 3 dx 3 dx 2 = ∞ x 1 e −x 2 dx 2 = e −x 1 (1) Similarly, for x 2 ≥ 0, X 2 has marginal PDF f X 2 (x 2 ) = x 2 0 ∞ x 2 e −x 3 dx 3 dx 1 = x 2 0 e −x 2 dx 1 = x 2 e −x 2 (2) Lastly, f X 3 (x 3 ) = x 3 0 x 3 x 1 e −x 3 dx 2 dx 1 = x 3 0 (x 3 −x 1 )e −x 3 dx 1 (3) = − 1 2 (x 3 −x 1 ) 2 e −x 3 x 1 =x 3 x 1 =0 = 1 2 x 2 3 e −x 3 (4) The complete expressions for the three marginal PDFs are f X 1 (x 1 ) = e −x 1 x 1 ≥ 0 0 otherwise (5) f X 2 (x 2 ) = x 2 e −x 2 x 2 ≥ 0 0 otherwise (6) f X 3 (x 3 ) = (1/2)x 2 3 e −x 3 x 3 ≥ 0 0 otherwise (7) In fact, each X i is an Erlang (n, λ) = (i, 1) random variable. Problem 5.4.7 Solution Since U 1 , . . . , U n are iid uniform (0, 1) random variables, f U 1 ,...,Un (u 1 , . . . , u n ) = 1/T n 0 ≤ u i ≤ 1; i = 1, 2, . . . , n 0 otherwise (1) Since U 1 , . . . , U n are continuous, P[U i = U j ] = 0 for all i = j. For the same reason, P[X i = X j ] = 0 for i = j. 
Thus we need only to consider the case when $x_1 < x_2 < \cdots < x_n$. To understand the claim, it is instructive to start with the $n = 2$ case. In this case, $(X_1, X_2) = (x_1, x_2)$ (with $x_1 < x_2$) if either $(U_1, U_2) = (x_1, x_2)$ or $(U_1, U_2) = (x_2, x_1)$. For infinitesimal $\Delta$,

\begin{align}
f_{X_1,X_2}(x_1, x_2)\, \Delta^2 &= P[x_1 < X_1 \le x_1 + \Delta,\ x_2 < X_2 \le x_2 + \Delta] \tag{2}\\
&= P[x_1 < U_1 \le x_1 + \Delta,\ x_2 < U_2 \le x_2 + \Delta] \nonumber\\
&\quad + P[x_2 < U_1 \le x_2 + \Delta,\ x_1 < U_2 \le x_1 + \Delta] \tag{3}\\
&= f_{U_1,U_2}(x_1, x_2)\, \Delta^2 + f_{U_1,U_2}(x_2, x_1)\, \Delta^2 \tag{4}
\end{align}

We see that for $0 \le x_1 < x_2 \le T$,

$$f_{X_1,X_2}(x_1, x_2) = 2/T^2. \tag{5}$$

For the general case of $n$ uniform random variables, we define $\boldsymbol{\pi} = [\pi(1)\ \ldots\ \pi(n)]'$ as a permutation vector of the integers $1, 2, \ldots, n$ and $\Pi$ as the set of $n!$ possible permutation vectors. In this case, the event $\{X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n\}$ occurs if

$$U_1 = x_{\pi(1)},\ U_2 = x_{\pi(2)},\ \ldots,\ U_n = x_{\pi(n)} \tag{6}$$

for any permutation $\boldsymbol{\pi} \in \Pi$. Thus, for $0 \le x_1 < x_2 < \cdots < x_n \le T$,

$$f_{X_1,\ldots,X_n}(x_1, \ldots, x_n)\, \Delta^n = \sum_{\boldsymbol{\pi} \in \Pi} f_{U_1,\ldots,U_n}\left( x_{\pi(1)}, \ldots, x_{\pi(n)} \right) \Delta^n. \tag{7}$$

Since there are $n!$ permutations and $f_{U_1,\ldots,U_n}(x_{\pi(1)}, \ldots, x_{\pi(n)}) = 1/T^n$ for each permutation $\boldsymbol{\pi}$, we can conclude that

$$f_{X_1,\ldots,X_n}(x_1, \ldots, x_n) = n!/T^n. \tag{8}$$

Since the order statistics are necessarily ordered, $f_{X_1,\ldots,X_n}(x_1, \ldots, x_n) = 0$ unless $x_1 < \cdots < x_n$.

Problem 5.5.1 Solution

For discrete random vectors, it is true in general that

$$P_{\mathbf{Y}}(\mathbf{y}) = P[\mathbf{Y} = \mathbf{y}] = P[\mathbf{A}\mathbf{X} + \mathbf{b} = \mathbf{y}] = P[\mathbf{A}\mathbf{X} = \mathbf{y} - \mathbf{b}]. \tag{1}$$

For an arbitrary matrix $\mathbf{A}$, the system of equations $\mathbf{A}\mathbf{x} = \mathbf{y} - \mathbf{b}$ may have no solutions (if the columns of $\mathbf{A}$ do not span the vector space), multiple solutions (if the columns of $\mathbf{A}$ are linearly dependent), or, when $\mathbf{A}$ is invertible, exactly one solution. In the invertible case,

$$P_{\mathbf{Y}}(\mathbf{y}) = P[\mathbf{A}\mathbf{X} = \mathbf{y} - \mathbf{b}] = P\left[ \mathbf{X} = \mathbf{A}^{-1}(\mathbf{y} - \mathbf{b}) \right] = P_{\mathbf{X}}\left( \mathbf{A}^{-1}(\mathbf{y} - \mathbf{b}) \right). \tag{2}$$

As an aside, we note that when $\mathbf{A}\mathbf{x} = \mathbf{y} - \mathbf{b}$ has multiple solutions, we would need to do some bookkeeping to add up the probabilities $P_{\mathbf{X}}(\mathbf{x})$ for all vectors $\mathbf{x}$ satisfying $\mathbf{A}\mathbf{x} = \mathbf{y} - \mathbf{b}$. This can get disagreeably complicated.

Problem 5.5.2 Solution

The random variable $J_n$ is the number of times that message $n$ is transmitted. Since each transmission is a success with probability $p$, independent of any other transmission, the number of transmissions of message $n$ is independent of the number of transmissions of message $m$. That is, for $m \ne n$, $J_m$ and $J_n$ are independent random variables. Moreover, because each message is transmitted over and over until it is transmitted successfully, each $J_m$ is a geometric $(p)$ random variable with PMF

$$P_{J_m}(j) = \begin{cases} (1-p)^{j-1} p & j = 1, 2, \ldots \\ 0 & \text{otherwise.} \end{cases} \tag{1}$$

Thus the PMF of $\mathbf{J} = [J_1\ J_2\ J_3]'$ is

$$P_{\mathbf{J}}(\mathbf{j}) = P_{J_1}(j_1)\, P_{J_2}(j_2)\, P_{J_3}(j_3) = \begin{cases} p^3 (1-p)^{j_1 + j_2 + j_3 - 3} & j_i = 1, 2, \ldots;\ i = 1, 2, 3 \\ 0 & \text{otherwise.} \end{cases} \tag{2}$$

Problem 5.5.3 Solution

The response time $X_i$ of the $i$th truck has PDF $f_{X_i}(x)$ and CDF $F_{X_i}(x)$ given by

$$f_{X_i}(x) = \begin{cases} \frac{1}{2} e^{-x/2} & x \ge 0, \\ 0 & \text{otherwise,} \end{cases} \qquad F_{X_i}(x) = F_X(x) = \begin{cases} 1 - e^{-x/2} & x \ge 0 \\ 0 & \text{otherwise.} \end{cases} \tag{1}$$

Let $R = \max(X_1, X_2, \ldots, X_6)$ denote the maximum response time. From Theorem 5.7, $R$ has CDF

$$F_R(r) = (F_X(r))^6. \tag{2}$$

(a) The probability that all six responses arrive within five seconds is

$$P[R \le 5] = F_R(5) = (F_X(5))^6 = (1 - e^{-5/2})^6 = 0.5982. \tag{3}$$

(b) This question is worded in a somewhat confusing way. The "expected response time" refers to $E[X_i]$, the response time of an individual truck, rather than $E[R]$. If the expected response time of a truck is $\tau$, then each $X_i$ has CDF

$$F_{X_i}(x) = F_X(x) = \begin{cases} 1 - e^{-x/\tau} & x \ge 0 \\ 0 & \text{otherwise.} \end{cases} \tag{4}$$

The goal of this problem is to find the maximum permissible value of $\tau$. When each truck has expected response time $\tau$, the CDF of $R$ is

$$F_R(r) = (F_X(r))^6 = \begin{cases} (1 - e^{-r/\tau})^6 & r \ge 0, \\ 0 & \text{otherwise.} \end{cases} \tag{5}$$

We need to find $\tau$ such that

$$P[R \le 3] = (1 - e^{-3/\tau})^6 = 0.9. \tag{6}$$

This implies

$$\tau = \frac{-3}{\ln\left( 1 - (0.9)^{1/6} \right)} = 0.7406 \text{ s.} \tag{7}$$

Problem 5.5.4 Solution

Let $X_i$ denote the finishing time of boat $i$. Since finishing times of all boats are iid Gaussian random variables with expected value 35 minutes and standard deviation 5 minutes, we know that each $X_i$ has CDF

$$F_{X_i}(x) = P[X_i \le x] = P\left[ \frac{X_i - 35}{5} \le \frac{x - 35}{5} \right] = \Phi\left( \frac{x - 35}{5} \right) \tag{1}$$

(a) The time of the winning boat is

$$W = \min(X_1, X_2, \ldots, X_{10}) \tag{2}$$

To find the probability that $W \le 25$, we will find the CDF $F_W(w)$ since this will also be useful for part (c).

\begin{align}
F_W(w) &= P[\min(X_1, X_2, \ldots, X_{10}) \le w] \tag{3}\\
&= 1 - P[\min(X_1, X_2, \ldots, X_{10}) > w] \tag{4}\\
&= 1 - P[X_1 > w, X_2 > w, \ldots, X_{10} > w] \tag{5}
\end{align}

Since the $X_i$ are iid,

\begin{align}
F_W(w) &= 1 - \prod_{i=1}^{10} P[X_i > w] = 1 - (1 - F_{X_i}(w))^{10} \tag{6}\\
&= 1 - \left( 1 - \Phi\left( \frac{w - 35}{5} \right) \right)^{10} \tag{7}
\end{align}

Thus,

\begin{align}
P[W \le 25] = F_W(25) &= 1 - (1 - \Phi(-2))^{10} \tag{8}\\
&= 1 - [\Phi(2)]^{10} = 0.2056. \tag{9}
\end{align}

(b) The finishing time of the last boat is $L = \max(X_1, \ldots, X_{10})$. The probability that the last boat finishes in more than 50 minutes is

\begin{align}
P[L > 50] &= 1 - P[L \le 50] \tag{10}\\
&= 1 - P[X_1 \le 50, X_2 \le 50, \ldots, X_{10} \le 50] \tag{11}
\end{align}

Once again, since the $X_i$ are iid Gaussian $(35, 5)$ random variables,

\begin{align}
P[L > 50] &= 1 - \prod_{i=1}^{10} P[X_i \le 50] = 1 - (F_{X_i}(50))^{10} \tag{12}\\
&= 1 - (\Phi([50 - 35]/5))^{10} \tag{13}\\
&= 1 - (\Phi(3))^{10} = 0.0134 \tag{14}
\end{align}

(c) A boat will finish in negative time if and only if the winning boat finishes in negative time, which has probability

$$F_W(0) = 1 - (1 - \Phi(-35/5))^{10} = 1 - (1 - \Phi(-7))^{10} = 1 - (\Phi(7))^{10}. \tag{15}$$

Unfortunately, the tables in the text have neither $\Phi(7)$ nor $Q(7)$. However, those with access to Matlab, or a programmable calculator, can find out that $Q(7) = 1 - \Phi(7) = 1.28 \times 10^{-12}$. This implies that a boat finishes in negative time with probability

$$F_W(0) = 1 - (1 - 1.28 \times 10^{-12})^{10} = 1.28 \times 10^{-11}. \tag{16}$$

Problem 5.5.5 Solution

Since 50 cents of each dollar ticket is added to the jackpot,

$$J_{i-1} = J_i + \frac{N_i}{2} \tag{1}$$

Given $J_i = j$, $N_i$ has a Poisson distribution with mean $j$.
It follows that $E[N_i | J_i = j] = j$ and that $\operatorname{Var}[N_i | J_i = j] = j$. This implies

$$E\left[ N_i^2 | J_i = j \right] = \operatorname{Var}[N_i | J_i = j] + (E[N_i | J_i = j])^2 = j + j^2 \tag{2}$$

In terms of the conditional expectations given $J_i$, these facts can be written as

$$E[N_i | J_i] = J_i, \qquad E\left[ N_i^2 | J_i \right] = J_i + J_i^2 \tag{3}$$

This permits us to evaluate the moments of $J_{i-1}$ in terms of the moments of $J_i$. Specifically,

$$E[J_{i-1} | J_i] = E[J_i | J_i] + \frac{1}{2} E[N_i | J_i] = J_i + \frac{J_i}{2} = \frac{3 J_i}{2} \tag{4}$$

This implies

$$E[J_{i-1}] = \frac{3}{2} E[J_i] \tag{5}$$

We can use this to calculate $E[J_i]$ for all $i$. Since the jackpot starts at 1 million dollars, $J_6 = 10^6$ and $E[J_6] = 10^6$. This implies

$$E[J_i] = (3/2)^{6-i}\, 10^6 \tag{6}$$

Now we will find the second moment $E[J_i^2]$. Since $J_{i-1}^2 = J_i^2 + N_i J_i + N_i^2/4$, we have

\begin{align}
E\left[ J_{i-1}^2 | J_i \right] &= E\left[ J_i^2 | J_i \right] + E[N_i J_i | J_i] + E\left[ N_i^2 | J_i \right]/4 \tag{7}\\
&= J_i^2 + J_i E[N_i | J_i] + (J_i + J_i^2)/4 \tag{8}\\
&= (3/2)^2 J_i^2 + J_i/4 \tag{9}
\end{align}

By taking the expectation over $J_i$ we have

$$E\left[ J_{i-1}^2 \right] = (3/2)^2 E\left[ J_i^2 \right] + E[J_i]/4 \tag{10}$$

This recursion allows us to calculate $E[J_i^2]$ for $i = 6, 5, \ldots, 0$. Since $J_6 = 10^6$, $E[J_6^2] = 10^{12}$. From the recursion, we obtain

\begin{align}
E\left[ J_5^2 \right] &= (3/2)^2 E\left[ J_6^2 \right] + E[J_6]/4 = (3/2)^2\, 10^{12} + \frac{1}{4}\, 10^6 \tag{11}\\
E\left[ J_4^2 \right] &= (3/2)^2 E\left[ J_5^2 \right] + E[J_5]/4 = (3/2)^4\, 10^{12} + \frac{1}{4} \left[ (3/2)^2 + (3/2) \right] 10^6 \tag{12}\\
E\left[ J_3^2 \right] &= (3/2)^2 E\left[ J_4^2 \right] + E[J_4]/4 = (3/2)^6\, 10^{12} + \frac{1}{4} \left[ (3/2)^4 + (3/2)^3 + (3/2)^2 \right] 10^6 \tag{13}
\end{align}

The same recursion will also allow us to show that

\begin{align}
E\left[ J_2^2 \right] &= (3/2)^8\, 10^{12} + \frac{1}{4} \left[ (3/2)^6 + (3/2)^5 + (3/2)^4 + (3/2)^3 \right] 10^6 \tag{14}\\
E\left[ J_1^2 \right] &= (3/2)^{10}\, 10^{12} + \frac{1}{4} \left[ (3/2)^8 + (3/2)^7 + (3/2)^6 + (3/2)^5 + (3/2)^4 \right] 10^6 \tag{15}\\
E\left[ J_0^2 \right] &= (3/2)^{12}\, 10^{12} + \frac{1}{4} \left[ (3/2)^{10} + (3/2)^9 + \cdots + (3/2)^5 \right] 10^6 \tag{16}
\end{align}

Finally, day 0 is the same as any other day in that $J = J_0 + N_0/2$ where $N_0$ is a Poisson random variable with mean $J_0$.
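As a numerical cross-check on the two recursions above, the following Python sketch iterates them with exact rational arithmetic. It is only an illustration: the function name `jackpot_moments` and its parameters are hypothetical, and Python is used here as an assumption (the manual's own tooling is Matlab). Seven steps are taken, six to go from day 6 down to day 0 and one more for the final relation $J = J_0 + N_0/2$.

```python
from fractions import Fraction

def jackpot_moments(start=10**6, steps=7):
    """Iterate E[J_{i-1}] = (3/2) E[J_i] and
    E[J_{i-1}^2] = (3/2)^2 E[J_i^2] + E[J_i]/4, starting from J_6 = start."""
    m1 = Fraction(start)        # first moment E[J_6]
    m2 = Fraction(start) ** 2   # second moment E[J_6^2]; J_6 is deterministic
    r = Fraction(3, 2)
    for _ in range(steps):      # 6 steps reach day 0, the 7th gives the final J
        # the right-hand side is evaluated with the old m1 before assignment
        m1, m2 = r * m1, r * r * m2 + m1 / 4
    return m1, m2

mean_J, second_J = jackpot_moments()
var_J = second_J - mean_J ** 2
# expected value, variance and standard deviation of the final jackpot
print(float(mean_J), float(var_J), float(var_J) ** 0.5)
```

Exact fractions are used so that the small variance is not lost when two numbers of order $10^{14}$ are subtracted.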
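By the same argument that we used to develop recursions for $E[J_i]$ and $E[J_i^2]$, we can show

$$E[J] = (3/2) E[J_0] = (3/2)^7\, 10^6 \approx 17 \times 10^6 \tag{17}$$

and

\begin{align}
E\left[ J^2 \right] &= (3/2)^2 E\left[ J_0^2 \right] + E[J_0]/4 \tag{18}\\
&= (3/2)^{14}\, 10^{12} + \frac{1}{4} \left[ (3/2)^{12} + (3/2)^{11} + \cdots + (3/2)^6 \right] 10^6 \tag{19}\\
&= (3/2)^{14}\, 10^{12} + \frac{10^6}{2} (3/2)^6 \left[ (3/2)^7 - 1 \right] \tag{20}
\end{align}

Finally, the variance of $J$ is

$$\operatorname{Var}[J] = E\left[ J^2 \right] - (E[J])^2 = \frac{10^6}{2} (3/2)^6 \left[ (3/2)^7 - 1 \right] \tag{21}$$

Since the variance is hard to interpret, we note that the standard deviation of $J$ is $\sigma_J \approx 9572$. Although the expected jackpot grows rapidly, the standard deviation of the jackpot is fairly small.

Problem 5.5.6 Solution

Let $A$ denote the event $X_n = \max(X_1, \ldots, X_n)$. We can find $P[A]$ by conditioning on the value of $X_n$.

\begin{align}
P[A] &= P[X_1 \le X_n, X_2 \le X_n, \cdots, X_{n-1} \le X_n] \tag{1}\\
&= \int_{-\infty}^{\infty} P[X_1 < X_n, X_2 < X_n, \cdots, X_{n-1} < X_n | X_n = x]\, f_{X_n}(x)\,dx \tag{2}\\
&= \int_{-\infty}^{\infty} P[X_1 < x, X_2 < x, \cdots, X_{n-1} < x | X_n = x]\, f_X(x)\,dx \tag{3}
\end{align}

Since $X_1, \ldots, X_{n-1}$ are independent of $X_n$,

$$P[A] = \int_{-\infty}^{\infty} P[X_1 < x, X_2 < x, \cdots, X_{n-1} < x]\, f_X(x)\,dx. \tag{4}$$

Since $X_1, \ldots, X_{n-1}$ are iid,

\begin{align}
P[A] &= \int_{-\infty}^{\infty} P[X_1 \le x]\, P[X_2 \le x] \cdots P[X_{n-1} \le x]\, f_X(x)\,dx \tag{5}\\
&= \int_{-\infty}^{\infty} [F_X(x)]^{n-1} f_X(x)\,dx = \left. \frac{1}{n} [F_X(x)]^n \right|_{-\infty}^{\infty} = \frac{1}{n} (1 - 0) \tag{6}
\end{align}

Since the $X_i$ are identically distributed, symmetry suggests that $X_n$ is as likely as any of the other $X_i$ to be the largest. Hence $P[A] = 1/n$ should not be surprising.

Problem 5.6.1 Solution

(a) The covariance matrix of $\mathbf{X} = [X_1\ X_2]'$ is

$$\mathbf{C}_X = \begin{bmatrix} \operatorname{Var}[X_1] & \operatorname{Cov}[X_1, X_2] \\ \operatorname{Cov}[X_1, X_2] & \operatorname{Var}[X_2] \end{bmatrix} = \begin{bmatrix} 4 & 3 \\ 3 & 9 \end{bmatrix}. \tag{1}$$

(b) From the problem statement,

$$\mathbf{Y} = \begin{bmatrix} Y_1 \\ Y_2 \end{bmatrix} = \begin{bmatrix} 1 & -2 \\ 3 & 4 \end{bmatrix} \mathbf{X} = \mathbf{A}\mathbf{X}. \tag{2}$$

By Theorem 5.13, $\mathbf{Y}$ has covariance matrix

$$\mathbf{C}_Y = \mathbf{A} \mathbf{C}_X \mathbf{A}' = \begin{bmatrix} 1 & -2 \\ 3 & 4 \end{bmatrix} \begin{bmatrix} 4 & 3 \\ 3 & 9 \end{bmatrix} \begin{bmatrix} 1 & 3 \\ -2 & 4 \end{bmatrix} = \begin{bmatrix} 28 & -66 \\ -66 & 252 \end{bmatrix}. \tag{3}$$

Problem 5.6.2 Solution

The mean value of a sum of random variables is always the sum of their individual means.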
$$E[Y] = \sum_{i=1}^{n} E[X_i] = 0 \tag{1}$$

The variance of any sum of random variables can be expressed in terms of the individual variances and covariances. Since $E[Y]$ is zero, $\operatorname{Var}[Y] = E[Y^2]$. Thus,

$$\operatorname{Var}[Y] = E\left[ \left( \sum_{i=1}^{n} X_i \right)^2 \right] = E\left[ \sum_{i=1}^{n} \sum_{j=1}^{n} X_i X_j \right] = \sum_{i=1}^{n} E\left[ X_i^2 \right] + \sum_{i=1}^{n} \sum_{j \ne i} E[X_i X_j] \tag{2}$$

Since $E[X_i] = 0$, $E[X_i^2] = \operatorname{Var}[X_i] = 1$ and for $i \ne j$,

$$E[X_i X_j] = \operatorname{Cov}[X_i, X_j] = \rho \tag{3}$$

Thus, $\operatorname{Var}[Y] = n + n(n-1)\rho$.

Problem 5.6.3 Solution

Since $\mathbf{X}$ and $\mathbf{Y}$ are independent and $E[Y_j] = 0$ for all components $Y_j$, we observe that $E[X_i Y_j] = E[X_i] E[Y_j] = 0$. This implies that the cross-covariance matrix is

$$E\left[ \mathbf{X}\mathbf{Y}' \right] = E[\mathbf{X}]\, E\left[ \mathbf{Y}' \right] = \mathbf{0}. \tag{1}$$

Problem 5.6.4 Solution

Inspection of the vector PDF $f_{\mathbf{X}}(\mathbf{x})$ will show that $X_1$, $X_2$, $X_3$, and $X_4$ are iid uniform $(0, 1)$ random variables. That is,

$$f_{\mathbf{X}}(\mathbf{x}) = f_{X_1}(x_1)\, f_{X_2}(x_2)\, f_{X_3}(x_3)\, f_{X_4}(x_4) \tag{1}$$

where each $X_i$ has the uniform $(0, 1)$ PDF

$$f_{X_i}(x) = \begin{cases} 1 & 0 \le x \le 1 \\ 0 & \text{otherwise} \end{cases} \tag{2}$$

It follows that for each $i$, $E[X_i] = 1/2$, $E[X_i^2] = 1/3$ and $\operatorname{Var}[X_i] = 1/12$. In addition, for $i \ne j$, $X_i$ and $X_j$ have correlation

$$E[X_i X_j] = E[X_i]\, E[X_j] = 1/4 \tag{3}$$

and covariance $\operatorname{Cov}[X_i, X_j] = 0$ since independent random variables always have zero covariance.

(a) The expected value vector is

$$E[\mathbf{X}] = \begin{bmatrix} E[X_1] & E[X_2] & E[X_3] & E[X_4] \end{bmatrix}' = \begin{bmatrix} 1/2 & 1/2 & 1/2 & 1/2 \end{bmatrix}'. \tag{4}$$

(b) The correlation matrix is

\begin{align}
\mathbf{R}_X = E\left[ \mathbf{X}\mathbf{X}' \right] &= \begin{bmatrix} E[X_1^2] & E[X_1 X_2] & E[X_1 X_3] & E[X_1 X_4] \\ E[X_2 X_1] & E[X_2^2] & E[X_2 X_3] & E[X_2 X_4] \\ E[X_3 X_1] & E[X_3 X_2] & E[X_3^2] & E[X_3 X_4] \\ E[X_4 X_1] & E[X_4 X_2] & E[X_4 X_3] & E[X_4^2] \end{bmatrix} \tag{5}\\
&= \begin{bmatrix} 1/3 & 1/4 & 1/4 & 1/4 \\ 1/4 & 1/3 & 1/4 & 1/4 \\ 1/4 & 1/4 & 1/3 & 1/4 \\ 1/4 & 1/4 & 1/4 & 1/3 \end{bmatrix} \tag{6}
\end{align}

(c) The covariance matrix for $\mathbf{X}$ is the diagonal matrix

\begin{align}
\mathbf{C}_X &= \begin{bmatrix} \operatorname{Var}[X_1] & \operatorname{Cov}[X_1, X_2] & \operatorname{Cov}[X_1, X_3] & \operatorname{Cov}[X_1, X_4] \\ \operatorname{Cov}[X_2, X_1] & \operatorname{Var}[X_2] & \operatorname{Cov}[X_2, X_3] & \operatorname{Cov}[X_2, X_4] \\ \operatorname{Cov}[X_3, X_1] & \operatorname{Cov}[X_3, X_2] & \operatorname{Var}[X_3] & \operatorname{Cov}[X_3, X_4] \\ \operatorname{Cov}[X_4, X_1] & \operatorname{Cov}[X_4, X_2] & \operatorname{Cov}[X_4, X_3] & \operatorname{Var}[X_4] \end{bmatrix} \tag{7}\\
&= \begin{bmatrix} 1/12 & 0 & 0 & 0 \\ 0 & 1/12 & 0 & 0 \\ 0 & 0 & 1/12 & 0 \\ 0 & 0 & 0 & 1/12 \end{bmatrix} \tag{8}
\end{align}

Note that it is easy to verify that $\mathbf{C}_X = \mathbf{R}_X - \boldsymbol{\mu}_X \boldsymbol{\mu}_X'$.

Problem 5.6.5 Solution

The random variable $J_m$ is the number of times that message $m$ is transmitted. Since each transmission is a success with probability $p$, independent of any other transmission, $J_1$, $J_2$ and $J_3$ are iid geometric $(p)$ random variables with

$$E[J_m] = \frac{1}{p}, \qquad \operatorname{Var}[J_m] = \frac{1-p}{p^2}. \tag{1}$$

Thus the vector $\mathbf{J} = [J_1\ J_2\ J_3]'$ has expected value

$$E[\mathbf{J}] = \begin{bmatrix} E[J_1] & E[J_2] & E[J_3] \end{bmatrix}' = \begin{bmatrix} 1/p & 1/p & 1/p \end{bmatrix}'. \tag{2}$$

For $m \ne n$, the correlation matrix $\mathbf{R}_J$ has $m, n$th entry

$$R_J(m, n) = E[J_m J_n] = E[J_m]\, E[J_n] = 1/p^2 \tag{3}$$

For $m = n$,

$$R_J(m, m) = E\left[ J_m^2 \right] = \operatorname{Var}[J_m] + (E[J_m])^2 = \frac{1-p}{p^2} + \frac{1}{p^2} = \frac{2-p}{p^2}. \tag{4}$$

Thus

$$\mathbf{R}_J = \frac{1}{p^2} \begin{bmatrix} 2-p & 1 & 1 \\ 1 & 2-p & 1 \\ 1 & 1 & 2-p \end{bmatrix}. \tag{5}$$

Because $J_m$ and $J_n$ are independent, off-diagonal terms in the covariance matrix are

$$C_J(m, n) = \operatorname{Cov}[J_m, J_n] = 0 \tag{6}$$

Since $C_J(m, m) = \operatorname{Var}[J_m]$, we have that

$$\mathbf{C}_J = \frac{1-p}{p^2} \mathbf{I} = \frac{1-p}{p^2} \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}. \tag{7}$$

Problem 5.6.6 Solution

This problem is quite difficult unless one uses the observation that the vector $\mathbf{K}$ can be expressed in terms of the vector $\mathbf{J} = [J_1\ J_2\ J_3]'$ where $J_i$ is the number of transmissions of message $i$.
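Note that we can write

$$\mathbf{K} = \mathbf{A}\mathbf{J} = \begin{bmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 1 & 1 & 1 \end{bmatrix} \mathbf{J} \tag{1}$$

We also observe that since each transmission is an independent Bernoulli trial with success probability $p$, the components of $\mathbf{J}$ are iid geometric $(p)$ random variables. Thus $E[J_i] = 1/p$ and $\operatorname{Var}[J_i] = (1-p)/p^2$. Thus $\mathbf{J}$ has expected value

$$E[\mathbf{J}] = \boldsymbol{\mu}_J = \begin{bmatrix} E[J_1] & E[J_2] & E[J_3] \end{bmatrix}' = \begin{bmatrix} 1/p & 1/p & 1/p \end{bmatrix}'. \tag{2}$$

Since the components of $\mathbf{J}$ are independent, it has the diagonal covariance matrix

$$\mathbf{C}_J = \begin{bmatrix} \operatorname{Var}[J_1] & 0 & 0 \\ 0 & \operatorname{Var}[J_2] & 0 \\ 0 & 0 & \operatorname{Var}[J_3] \end{bmatrix} = \frac{1-p}{p^2} \mathbf{I} \tag{3}$$

Given these properties of $\mathbf{J}$, finding the same properties of $\mathbf{K} = \mathbf{A}\mathbf{J}$ is simple.

(a) The expected value of $\mathbf{K}$ is

$$E[\mathbf{K}] = \mathbf{A} \boldsymbol{\mu}_J = \begin{bmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 1 & 1 & 1 \end{bmatrix} \begin{bmatrix} 1/p \\ 1/p \\ 1/p \end{bmatrix} = \begin{bmatrix} 1/p \\ 2/p \\ 3/p \end{bmatrix} \tag{4}$$

(b) From Theorem 5.13, the covariance matrix of $\mathbf{K}$ is

\begin{align}
\mathbf{C}_K &= \mathbf{A} \mathbf{C}_J \mathbf{A}' \tag{5}\\
&= \frac{1-p}{p^2} \mathbf{A} \mathbf{I} \mathbf{A}' \tag{6}\\
&= \frac{1-p}{p^2} \begin{bmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 1 & 1 & 1 \end{bmatrix} \begin{bmatrix} 1 & 1 & 1 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{bmatrix} = \frac{1-p}{p^2} \begin{bmatrix} 1 & 1 & 1 \\ 1 & 2 & 2 \\ 1 & 2 & 3 \end{bmatrix} \tag{7}
\end{align}

(c) Given the expected value vector $\boldsymbol{\mu}_K$ and the covariance matrix $\mathbf{C}_K$, we can use Theorem 5.12 to find the correlation matrix

\begin{align}
\mathbf{R}_K &= \mathbf{C}_K + \boldsymbol{\mu}_K \boldsymbol{\mu}_K' \tag{8}\\
&= \frac{1-p}{p^2} \begin{bmatrix} 1 & 1 & 1 \\ 1 & 2 & 2 \\ 1 & 2 & 3 \end{bmatrix} + \begin{bmatrix} 1/p \\ 2/p \\ 3/p \end{bmatrix} \begin{bmatrix} 1/p & 2/p & 3/p \end{bmatrix} \tag{9}\\
&= \frac{1-p}{p^2} \begin{bmatrix} 1 & 1 & 1 \\ 1 & 2 & 2 \\ 1 & 2 & 3 \end{bmatrix} + \frac{1}{p^2} \begin{bmatrix} 1 & 2 & 3 \\ 2 & 4 & 6 \\ 3 & 6 & 9 \end{bmatrix} \tag{10}\\
&= \frac{1}{p^2} \begin{bmatrix} 2-p & 3-p & 4-p \\ 3-p & 6-2p & 8-2p \\ 4-p & 8-2p & 12-3p \end{bmatrix} \tag{11}
\end{align}

Problem 5.6.7 Solution

The preliminary work for this problem appears in a few different places. In Example 5.5, we found the marginal PDF of $Y_3$ and in Example 5.6, we found the marginal PDFs of $Y_1$, $Y_2$, and $Y_4$. We summarize these results here:

\begin{align}
f_{Y_1}(y) = f_{Y_3}(y) &= \begin{cases} 2(1 - y) & 0 \le y \le 1, \\ 0 & \text{otherwise,} \end{cases} \tag{1}\\
f_{Y_2}(y) = f_{Y_4}(y) &= \begin{cases} 2y & 0 \le y \le 1, \\ 0 & \text{otherwise.} \end{cases} \tag{2}
\end{align}

This implies

\begin{align}
E[Y_1] = E[Y_3] &= \int_0^1 2y(1 - y)\,dy = 1/3 \tag{3}\\
E[Y_2] = E[Y_4] &= \int_0^1 2y^2\,dy = 2/3 \tag{4}
\end{align}

Thus $\mathbf{Y}$ has expected value $E[\mathbf{Y}] = [1/3\ \ 2/3\ \ 1/3\ \ 2/3]'$. The second part of the problem is to find the correlation matrix $\mathbf{R}_Y$. In fact, we need to find $R_Y(i, j) = E[Y_i Y_j]$ for each $i, j$ pair. We will see that these are seriously tedious calculations.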
For $i = j$, the second moments are

\begin{align}
E\left[ Y_1^2 \right] = E\left[ Y_3^2 \right] &= \int_0^1 2y^2(1 - y)\,dy = 1/6, \tag{5}\\
E\left[ Y_2^2 \right] = E\left[ Y_4^2 \right] &= \int_0^1 2y^3\,dy = 1/2. \tag{6}
\end{align}

In terms of the correlation matrix,

$$R_Y(1, 1) = R_Y(3, 3) = 1/6, \qquad R_Y(2, 2) = R_Y(4, 4) = 1/2. \tag{7}$$

To find the off-diagonal terms $R_Y(i, j) = E[Y_i Y_j]$, we need to find the marginal PDFs $f_{Y_i,Y_j}(y_i, y_j)$. Example 5.5 showed that

\begin{align}
f_{Y_1,Y_4}(y_1, y_4) &= \begin{cases} 4(1 - y_1) y_4 & 0 \le y_1 \le 1,\ 0 \le y_4 \le 1, \\ 0 & \text{otherwise,} \end{cases} \tag{8}\\
f_{Y_2,Y_3}(y_2, y_3) &= \begin{cases} 4 y_2 (1 - y_3) & 0 \le y_2 \le 1,\ 0 \le y_3 \le 1, \\ 0 & \text{otherwise.} \end{cases} \tag{9}
\end{align}

Inspection will show that $Y_1$ and $Y_4$ are independent since $f_{Y_1,Y_4}(y_1, y_4) = f_{Y_1}(y_1) f_{Y_4}(y_4)$. Similarly, $Y_2$ and $Y_3$ are independent since $f_{Y_2,Y_3}(y_2, y_3) = f_{Y_2}(y_2) f_{Y_3}(y_3)$. This implies

\begin{align}
R_Y(1, 4) &= E[Y_1 Y_4] = E[Y_1]\, E[Y_4] = 2/9 \tag{10}\\
R_Y(2, 3) &= E[Y_2 Y_3] = E[Y_2]\, E[Y_3] = 2/9 \tag{11}
\end{align}

We also need to calculate $f_{Y_1,Y_2}(y_1, y_2)$, $f_{Y_3,Y_4}(y_3, y_4)$, $f_{Y_1,Y_3}(y_1, y_3)$ and $f_{Y_2,Y_4}(y_2, y_4)$. To start, for $0 \le y_1 \le y_2 \le 1$,

\begin{align}
f_{Y_1,Y_2}(y_1, y_2) &= \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f_{Y_1,Y_2,Y_3,Y_4}(y_1, y_2, y_3, y_4)\,dy_3\,dy_4 \tag{12}\\
&= \int_0^1 \int_0^{y_4} 4\,dy_3\,dy_4 = \int_0^1 4 y_4\,dy_4 = 2. \tag{13}
\end{align}

Similarly, for $0 \le y_3 \le y_4 \le 1$,

\begin{align}
f_{Y_3,Y_4}(y_3, y_4) &= \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f_{Y_1,Y_2,Y_3,Y_4}(y_1, y_2, y_3, y_4)\,dy_1\,dy_2 \tag{14}\\
&= \int_0^1 \int_0^{y_2} 4\,dy_1\,dy_2 = \int_0^1 4 y_2\,dy_2 = 2. \tag{15}
\end{align}

In fact, these PDFs are the same in that

$$f_{Y_1,Y_2}(x, y) = f_{Y_3,Y_4}(x, y) = \begin{cases} 2 & 0 \le x \le y \le 1, \\ 0 & \text{otherwise.} \end{cases} \tag{16}$$

This implies $R_Y(1, 2) = R_Y(3, 4) = E[Y_3 Y_4]$ and that

$$E[Y_3 Y_4] = \int_0^1 \int_0^y 2xy\,dx\,dy = \int_0^1 \left( \left. y x^2 \right|_0^y \right) dy = \int_0^1 y^3\,dy = \frac{1}{4}. \tag{17}$$

Continuing in the same way, we see for $0 \le y_1 \le 1$ and $0 \le y_3 \le 1$ that

\begin{align}
f_{Y_1,Y_3}(y_1, y_3) &= \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f_{Y_1,Y_2,Y_3,Y_4}(y_1, y_2, y_3, y_4)\,dy_2\,dy_4 \tag{18}\\
&= 4 \left( \int_{y_1}^1 dy_2 \right) \left( \int_{y_3}^1 dy_4 \right) \tag{19}\\
&= 4(1 - y_1)(1 - y_3). \tag{20}
\end{align}

We observe that $Y_1$ and $Y_3$ are independent since $f_{Y_1,Y_3}(y_1, y_3) = f_{Y_1}(y_1) f_{Y_3}(y_3)$.
It follows that R Y (1, 3) = E [Y 1 Y 3 ] = E [Y 1 ] E [Y 3 ] = 1/9. (21) Finally, we need to calculate f Y 2 ,Y 4 (y 2 , y 4 ) = ∞ −∞ ∞ −∞ f Y 1 ,Y 2 ,Y 3 ,Y 4 (y 1 , y 2 , y 3 , y 4 ) dy 1 dy 3 (22) = 4 y 2 0 dy 1 y 4 0 dy 3 (23) = 4y 2 y 4 . (24) 219 We observe that Y 2 and Y 4 are independent since f Y 2 ,Y 4 (y 2 , y 4 ) = f Y 2 (y 2 )f Y 4 (y 4 ). It follows that R Y (2, 4) = E[Y 2 Y 4 ] = E[Y 2 ]E[Y 4 ] = 4/9. The above results give R Y (i, j) for i ≤ j. Since R Y is a symmetric matrix, R Y = 1/6 1/4 1/9 2/9 1/4 1/2 2/9 4/9 1/9 2/9 1/6 1/4 2/9 4/9 1/4 1/2 ¸ ¸ ¸ ¸ . (25) Since µ X = 1/3 2/3 1/3 2/3 , the covariance matrix is C Y = R Y −µ X µ X (26) = 1/6 1/4 1/9 2/9 1/4 1/2 2/9 4/9 2/9 4/9 1/4 1/2 ¸ ¸ − 1/3 2/3 1/3 2/3 ¸ ¸ ¸ ¸ 1/3 2/3 1/3 2/3 (27) = 1/18 1/36 0 0 1/36 1/18 0 0 0 0 1/18 1/36 0 0 1/36 1/18 ¸ ¸ ¸ ¸ . (28) The off-diagonal zero blocks are a consequence of Y 1 Y 2 being independent of Y 3 Y 4 . Along the diagonal, the two identical sub-blocks occur because f Y 1 ,Y 2 (x, y) = f Y 3 ,Y 4 (x, y). In short, the matrix structure is the result of Y 1 Y 2 and Y 3 Y 4 being iid random vectors. Problem 5.6.8 Solution The 2-dimensional random vector Y has PDF f Y (y) = 2 y ≥ 0, 1 1 y ≤ 1, 0 otherwise. (1) Rewritten in terms of the variables y 1 and y 2 , f Y 1 ,Y 2 (y 1 , y 2 ) = 2 y 1 ≥ 0, y 2 ≥ 0, y 1 +y 2 ≤ 1, 0 otherwise. (2) In this problem, the PDF is simple enough that we can compute E[Y n i ] for arbitrary integers n ≥ 0. E [Y n 1 ] = ∞ −∞ ∞ −∞ y n 1 f Y 1 ,Y 2 (y 1 , y 2 ) dy 1 dy 2 = 1 0 1−y 2 0 2y n 1 dy 1 dy 2 . (3) A little calculus yields E [Y n 1 ] = 1 0 2 n + 1 y n+1 1 1−y 2 0 dy 2 = 2 n + 1 1 0 (1 −y 2 ) n+1 dy 2 = 2 (n + 1)(n + 2) . (4) Symmetry of the joint PDF f Y 1 , 2 (y 1 , 2 ) implies that E[Y n 2 ] = E[Y n 1 ]. Thus, E[Y 1 ] = E[Y 2 ] = 1/3 and E [Y] = µ Y = 1/3 1/3 . (5) In addition, R Y (1, 1) = E Y 2 1 = 1/6, R Y (2, 2) = E Y 2 2 = 1/6. 
(6) 220 To complete the correlation matrix, we find R Y (1, 2) = E [Y 1 Y 2 ] = ∞ −∞ ∞ −∞ y 1 y 2 f Y 1 ,Y 2 (y 1 , y 2 ) dy 1 dy 2 = 1 0 1−y 2 0 2y 1 y 2 dy 1 dy 2 . (7) Following through on the calculus, we obtain R Y (1, 2) = 1 0 y 2 1 1−y−2 0 y 2 dy 2 = 1 0 y 2 (1 −y 2 ) 2 dy 2 = 1 2 y 2 2 − 2 3 y 3 2 + 1 4 y 4 2 1 0 = 1 12 . (8) Thus we have found that R Y = ¸ E Y 2 1 E [Y 1 Y 2 ] E [Y 2 Y 1 ] E Y 2 2 = ¸ 1/6 1/12 1/12 1/6 . (9) Lastly, Y has covariance matrix C Y = R Y −µ Y µ Y = ¸ 1/6 1/12 1/12 1/6 − ¸ 1/3 1/3 1/3 1/3 (10) = ¸ 1/9 −1/36 −1/36 1/9 . (11) Problem 5.6.9 Solution Given an arbitrary random vector X, we can define Y = X−µ X so that C X = E (X−µ X )(X−µ X ) = E YY = R Y . (1) It follows that the covariance matrix C X is positive semi-definite if and only if the correlation matrix R Y is positive semi-definite. Thus, it is sufficient to show that every correlation matrix, whether it is denoted R Y or R X , is positive semi-definite. To show a correlation matrix R X is positive semi-definite, we write a R X a = a E XX a = E a XX a = E (a X)(X a) = E (a X) 2 . (2) We note that W = a X is a random variable. Since E[W 2 ] ≥ 0 for any random variable W, a R X a = E W 2 ≥ 0. (3) Problem 5.7.1 Solution (a) From Theorem 5.12, the correlation matrix of X is R X = C X +µ X µ X (1) = 4 −2 1 −2 4 −2 1 −2 4 ¸ ¸ + 4 8 6 ¸ ¸ 4 8 6 (2) = 4 −2 1 −2 4 −2 1 −2 4 ¸ ¸ + 16 32 24 32 64 48 24 48 36 ¸ ¸ = 20 30 25 30 68 46 25 46 40 ¸ ¸ (3) 221 (b) Let Y = X 1 X 2 . Since Y is a subset of the components of X, it is a Gaussian random vector with expected velue vector µ Y = E [X 1 ] E [X 2 ] = 4 8 . (4) and covariance matrix C Y = ¸ Var[X 1 ] Cov [X 1 , X 2 ] C X 1 X 2 Var[X 2 ] = ¸ 4 −2 −2 4 (5) We note that det(C Y ) = 12 and that C −1 Y = 1 12 ¸ 4 2 2 4 = ¸ 1/3 1/6 1/6 1/3 . 
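This implies that
$$(\mathbf{y}-\boldsymbol{\mu}_Y)'\mathbf{C}_Y^{-1}(\mathbf{y}-\boldsymbol{\mu}_Y) = \begin{bmatrix} y_1-4 & y_2-8 \end{bmatrix}\begin{bmatrix} 1/3 & 1/6 \\ 1/6 & 1/3 \end{bmatrix}\begin{bmatrix} y_1-4 \\ y_2-8 \end{bmatrix} \tag{7}$$
$$= \begin{bmatrix} y_1-4 & y_2-8 \end{bmatrix}\begin{bmatrix} y_1/3 + y_2/6 - 8/3 \\ y_1/6 + y_2/3 - 10/3 \end{bmatrix} \tag{8}$$
$$= \frac{y_1^2}{3} + \frac{y_1 y_2}{3} - \frac{16y_1}{3} - \frac{20y_2}{3} + \frac{y_2^2}{3} + \frac{112}{3}. \tag{9}$$
The PDF of $\mathbf{Y}$ is
$$f_{\mathbf{Y}}(\mathbf{y}) = \frac{1}{2\pi\sqrt{12}}\,e^{-(\mathbf{y}-\boldsymbol{\mu}_Y)'\mathbf{C}_Y^{-1}(\mathbf{y}-\boldsymbol{\mu}_Y)/2} \tag{10}$$
$$= \frac{1}{\sqrt{48\pi^2}}\,e^{-(y_1^2 + y_1 y_2 - 16y_1 - 20y_2 + y_2^2 + 112)/6}. \tag{11}$$
Since $\mathbf{Y} = [X_1 \;\; X_2]'$, the PDF of $X_1$ and $X_2$ is simply
$$f_{X_1,X_2}(x_1,x_2) = f_{Y_1,Y_2}(x_1,x_2) = \frac{1}{\sqrt{48\pi^2}}\,e^{-(x_1^2 + x_1 x_2 - 16x_1 - 20x_2 + x_2^2 + 112)/6}. \tag{12}$$

(c) We can observe directly from $\boldsymbol{\mu}_X$ and $\mathbf{C}_X$ that $X_1$ is a Gaussian $(4, 2)$ random variable. Thus,
$$P[X_1 > 8] = P\left[\frac{X_1-4}{2} > \frac{8-4}{2}\right] = Q(2) = 0.0228. \tag{13}$$

Problem 5.7.2 Solution

We are given that $\mathbf{X}$ is a Gaussian random vector with
$$\boldsymbol{\mu}_X = \begin{bmatrix} 4 \\ 8 \\ 6 \end{bmatrix}, \qquad \mathbf{C}_X = \begin{bmatrix} 4 & -2 & 1 \\ -2 & 4 & -2 \\ 1 & -2 & 4 \end{bmatrix}. \tag{1}$$
We are also given that $\mathbf{Y} = \mathbf{A}\mathbf{X} + \mathbf{b}$ where
$$\mathbf{A} = \begin{bmatrix} 1 & 1/2 & 2/3 \\ 1 & -1/2 & 2/3 \end{bmatrix}, \qquad \mathbf{b} = \begin{bmatrix} -4 \\ -4 \end{bmatrix}. \tag{2}$$
Since the two rows of $\mathbf{A}$ are linearly independent row vectors, $\mathbf{A}$ has rank 2. By Theorem 5.16, $\mathbf{Y}$ is a Gaussian random vector. Given these facts, the various parts of this problem are just straightforward calculations using Theorem 5.16.

(a) The expected value of $\mathbf{Y}$ is
$$\boldsymbol{\mu}_Y = \mathbf{A}\boldsymbol{\mu}_X + \mathbf{b} = \begin{bmatrix} 1 & 1/2 & 2/3 \\ 1 & -1/2 & 2/3 \end{bmatrix}\begin{bmatrix} 4 \\ 8 \\ 6 \end{bmatrix} + \begin{bmatrix} -4 \\ -4 \end{bmatrix} = \begin{bmatrix} 8 \\ 0 \end{bmatrix}. \tag{3}$$

(b) The covariance matrix of $\mathbf{Y}$ is
$$\mathbf{C}_Y = \mathbf{A}\mathbf{C}_X\mathbf{A}' \tag{4}$$
$$= \begin{bmatrix} 1 & 1/2 & 2/3 \\ 1 & -1/2 & 2/3 \end{bmatrix}\begin{bmatrix} 4 & -2 & 1 \\ -2 & 4 & -2 \\ 1 & -2 & 4 \end{bmatrix}\begin{bmatrix} 1 & 1 \\ 1/2 & -1/2 \\ 2/3 & 2/3 \end{bmatrix} = \frac{1}{9}\begin{bmatrix} 43 & 55 \\ 55 & 103 \end{bmatrix}. \tag{5}$$

(c) $\mathbf{Y}$ has correlation matrix
$$\mathbf{R}_Y = \mathbf{C}_Y + \boldsymbol{\mu}_Y\boldsymbol{\mu}_Y' = \frac{1}{9}\begin{bmatrix} 43 & 55 \\ 55 & 103 \end{bmatrix} + \begin{bmatrix} 8 \\ 0 \end{bmatrix}\begin{bmatrix} 8 & 0 \end{bmatrix} = \frac{1}{9}\begin{bmatrix} 619 & 55 \\ 55 & 103 \end{bmatrix}. \tag{6}$$

(d) From $\boldsymbol{\mu}_Y$, we see that $E[Y_2] = 0$. From the covariance matrix $\mathbf{C}_Y$, we learn that $Y_2$ has variance $\sigma_2^2 = C_Y(2,2) = 103/9$. Since $Y_2$ is a Gaussian random variable,
$$P[-1 \le Y_2 \le 1] = P\left[-\frac{1}{\sigma_2} \le \frac{Y_2}{\sigma_2} \le \frac{1}{\sigma_2}\right] \tag{7}$$
$$= \Phi\left(\frac{1}{\sigma_2}\right) - \Phi\left(\frac{-1}{\sigma_2}\right) \tag{8}$$
$$= 2\Phi\left(\frac{1}{\sigma_2}\right) - 1 \tag{9}$$
$$= 2\Phi\left(\frac{3}{\sqrt{103}}\right) - 1 = 0.2325. \tag{10}$$

Problem 5.7.3 Solution

This problem is just a special case of Theorem 5.16 with the matrix $\mathbf{A}$ replaced by the row vector $\mathbf{a}'$ and a one-element vector $\mathbf{b} = b = 0$.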
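In this case, the vector $\mathbf{Y}$ becomes the scalar $Y$. The expected value vector is $\boldsymbol{\mu}_Y = [\mu_Y]$ and the covariance "matrix" of $Y$ is just the $1 \times 1$ matrix $[\sigma_Y^2]$. Directly from Theorem 5.16, we can conclude that $Y$ is a length 1 Gaussian random vector, which is just a Gaussian random variable. In addition,
$$\mu_Y = \mathbf{a}'\boldsymbol{\mu}_X \quad\text{and}\quad \operatorname{Var}[Y] = C_Y = \mathbf{a}'\mathbf{C}_X\mathbf{a}. \tag{1}$$

Problem 5.7.4 Solution

From Definition 5.17, the $n = 2$ dimensional Gaussian vector $\mathbf{X}$ has PDF
$$f_{\mathbf{X}}(\mathbf{x}) = \frac{1}{2\pi[\det(\mathbf{C}_X)]^{1/2}}\exp\left(-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu}_X)'\mathbf{C}_X^{-1}(\mathbf{x}-\boldsymbol{\mu}_X)\right) \tag{1}$$
where $\mathbf{C}_X$ has determinant
$$\det(\mathbf{C}_X) = \sigma_1^2\sigma_2^2 - \rho^2\sigma_1^2\sigma_2^2 = \sigma_1^2\sigma_2^2(1-\rho^2). \tag{2}$$
Thus,
$$\frac{1}{2\pi[\det(\mathbf{C}_X)]^{1/2}} = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-\rho^2}}. \tag{3}$$
Using the $2 \times 2$ matrix inverse formula
$$\begin{bmatrix} a & b \\ c & d \end{bmatrix}^{-1} = \frac{1}{ad-bc}\begin{bmatrix} d & -b \\ -c & a \end{bmatrix}, \tag{4}$$
we obtain
$$\mathbf{C}_X^{-1} = \frac{1}{\sigma_1^2\sigma_2^2(1-\rho^2)}\begin{bmatrix} \sigma_2^2 & -\rho\sigma_1\sigma_2 \\ -\rho\sigma_1\sigma_2 & \sigma_1^2 \end{bmatrix} = \frac{1}{1-\rho^2}\begin{bmatrix} \frac{1}{\sigma_1^2} & \frac{-\rho}{\sigma_1\sigma_2} \\ \frac{-\rho}{\sigma_1\sigma_2} & \frac{1}{\sigma_2^2} \end{bmatrix}. \tag{5}$$
Thus
$$-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu}_X)'\mathbf{C}_X^{-1}(\mathbf{x}-\boldsymbol{\mu}_X) = -\frac{\begin{bmatrix} x_1-\mu_1 & x_2-\mu_2 \end{bmatrix}\begin{bmatrix} \frac{1}{\sigma_1^2} & \frac{-\rho}{\sigma_1\sigma_2} \\ \frac{-\rho}{\sigma_1\sigma_2} & \frac{1}{\sigma_2^2} \end{bmatrix}\begin{bmatrix} x_1-\mu_1 \\ x_2-\mu_2 \end{bmatrix}}{2(1-\rho^2)} \tag{6}$$
$$= -\frac{\begin{bmatrix} x_1-\mu_1 & x_2-\mu_2 \end{bmatrix}\begin{bmatrix} \frac{x_1-\mu_1}{\sigma_1^2} - \frac{\rho(x_2-\mu_2)}{\sigma_1\sigma_2} \\ -\frac{\rho(x_1-\mu_1)}{\sigma_1\sigma_2} + \frac{x_2-\mu_2}{\sigma_2^2} \end{bmatrix}}{2(1-\rho^2)} \tag{7}$$
$$= -\frac{\frac{(x_1-\mu_1)^2}{\sigma_1^2} - \frac{2\rho(x_1-\mu_1)(x_2-\mu_2)}{\sigma_1\sigma_2} + \frac{(x_2-\mu_2)^2}{\sigma_2^2}}{2(1-\rho^2)}. \tag{8}$$
Combining Equations (1), (3), and (8), we see that
$$f_{\mathbf{X}}(\mathbf{x}) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-\rho^2}}\exp\left[-\frac{\frac{(x_1-\mu_1)^2}{\sigma_1^2} - \frac{2\rho(x_1-\mu_1)(x_2-\mu_2)}{\sigma_1\sigma_2} + \frac{(x_2-\mu_2)^2}{\sigma_2^2}}{2(1-\rho^2)}\right], \tag{9}$$
which is the bivariate Gaussian PDF in Definition 4.17.

Problem 5.7.5 Solution

We can write
$$\mathbf{W} = \begin{bmatrix} \mathbf{X} \\ \mathbf{Y} \end{bmatrix} = \begin{bmatrix} \mathbf{I} \\ \mathbf{A} \end{bmatrix}\mathbf{X} = \mathbf{D}\mathbf{X}. \tag{1}$$
Suppose that $\mathbf{X}$ is a Gaussian $(\mathbf{0}, \mathbf{I})$ random vector. By Theorem 5.13, $\boldsymbol{\mu}_W = \mathbf{0}$ and $\mathbf{C}_W = \mathbf{D}\mathbf{D}'$. The matrix $\mathbf{D}$ is $(m+n) \times n$ and has rank $n$. That is, the rows of $\mathbf{D}$ are dependent and there exists a nonzero vector $\mathbf{y}$ such that $\mathbf{y}'\mathbf{D} = \mathbf{0}$. This implies $\mathbf{y}'\mathbf{D}\mathbf{D}'\mathbf{y} = 0$. Hence $\det(\mathbf{C}_W) = 0$ and $\mathbf{C}_W^{-1}$ does not exist. Hence $\mathbf{W}$ is not a Gaussian random vector.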
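The rank argument for Problem 5.7.5 can be checked numerically. The following Matlab sketch uses a small hypothetical example; the sizes n = 2 and m = 1 and the matrix A are arbitrary choices made here for illustration and do not come from the problem statement. It confirms that $\mathbf{C}_W = \mathbf{D}\mathbf{D}'$ has rank $n < m+n$ and that a vector $\mathbf{y}$ with $\mathbf{y}'\mathbf{D} = \mathbf{0}$ makes the quadratic form vanish.

```matlab
% Sketch: numerical check that C_W = D*D' is singular (Problem 5.7.5).
% The sizes n, m and the matrix A are illustrative assumptions.
n = 2;                      % length of X
m = 1;                      % length of Y = A*X
A = [1 2];                  % arbitrary m-by-n matrix
D = [eye(n); A];            % W = D*X, so D is (m+n)-by-n
CW = D*D';                  % covariance of W when C_X = I
r = rank(CW);               % equals rank(D) = n, which is less than m+n
Ynull = null(D');           % columns y satisfy D'*y = 0, i.e. y'*D = 0
y = Ynull(:,1);
q = y'*CW*y;                % quadratic form y'*D*D'*y
fprintf('rank(C_W) = %d of %d, y''*C_W*y = %g\n', r, m+n, q);
```

Because the rank is smaller than the dimension of $\mathbf{C}_W$, the inverse required by the Gaussian vector PDF cannot be formed for this example.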
The point to keep in mind is that the definition of a Gaussian random vector does not permit a component random variable to be a deterministic linear combination of other components.

Problem 5.7.6 Solution

(a) From Theorem 5.13, $\mathbf{Y}$ has covariance matrix
$$\mathbf{C}_Y = \mathbf{Q}\mathbf{C}_X\mathbf{Q}' \tag{1}$$
$$= \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}\begin{bmatrix} \sigma_1^2 & 0 \\ 0 & \sigma_2^2 \end{bmatrix}\begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix} \tag{2}$$
$$= \begin{bmatrix} \sigma_1^2\cos^2\theta + \sigma_2^2\sin^2\theta & (\sigma_1^2-\sigma_2^2)\sin\theta\cos\theta \\ (\sigma_1^2-\sigma_2^2)\sin\theta\cos\theta & \sigma_1^2\sin^2\theta + \sigma_2^2\cos^2\theta \end{bmatrix}. \tag{3}$$
We conclude that $Y_1$ and $Y_2$ have covariance
$$\operatorname{Cov}[Y_1,Y_2] = C_Y(1,2) = (\sigma_1^2-\sigma_2^2)\sin\theta\cos\theta. \tag{4}$$
Since $Y_1$ and $Y_2$ are jointly Gaussian, they are independent if and only if $\operatorname{Cov}[Y_1,Y_2] = 0$. Thus, $Y_1$ and $Y_2$ are independent for all $\theta$ if and only if $\sigma_1^2 = \sigma_2^2$. In this case, the joint PDF $f_{\mathbf{X}}(\mathbf{x})$ is symmetric in $x_1$ and $x_2$. In terms of polar coordinates, the PDF $f_{\mathbf{X}}(\mathbf{x}) = f_{X_1,X_2}(x_1,x_2)$ depends on $r = \sqrt{x_1^2 + x_2^2}$ but, for a given $r$, is constant for all $\phi = \tan^{-1}(x_2/x_1)$. The transformation of $\mathbf{X}$ to $\mathbf{Y}$ is just a rotation of the coordinate system by $\theta$, which preserves this circular symmetry.

(b) If $\sigma_2^2 > \sigma_1^2$, then $Y_1$ and $Y_2$ are independent if and only if $\sin\theta\cos\theta = 0$. This occurs in the following cases:
• $\theta = 0$: $Y_1 = X_1$ and $Y_2 = X_2$
• $\theta = \pi/2$: $Y_1 = -X_2$ and $Y_2 = X_1$
• $\theta = \pi$: $Y_1 = -X_1$ and $Y_2 = -X_2$
• $\theta = -\pi/2$: $Y_1 = X_2$ and $Y_2 = -X_1$
In all four cases, $Y_1$ and $Y_2$ are just relabeled versions, possibly with sign changes, of $X_1$ and $X_2$. In these cases, $Y_1$ and $Y_2$ are independent because $X_1$ and $X_2$ are independent. For other values of $\theta$, each $Y_i$ is a linear combination of both $X_1$ and $X_2$. This mixing results in correlation between $Y_1$ and $Y_2$.

Problem 5.7.7 Solution

The difficulty of this problem is overrated since it's a pretty simple application of Problem 5.7.6. In particular,
$$\mathbf{Q} = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}\Bigg|_{\theta=45^\circ} = \frac{1}{\sqrt{2}}\begin{bmatrix} 1 & -1 \\ 1 & 1 \end{bmatrix}. \tag{1}$$
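Since $\mathbf{X} = \mathbf{Q}\mathbf{Y}$, we know from Theorem 5.16 that $\mathbf{X}$ is Gaussian with covariance matrix
$$\mathbf{C}_X = \mathbf{Q}\mathbf{C}_Y\mathbf{Q}' \tag{2}$$
$$= \frac{1}{\sqrt{2}}\begin{bmatrix} 1 & -1 \\ 1 & 1 \end{bmatrix}\begin{bmatrix} 1+\rho & 0 \\ 0 & 1-\rho \end{bmatrix}\frac{1}{\sqrt{2}}\begin{bmatrix} 1 & 1 \\ -1 & 1 \end{bmatrix} \tag{3}$$
$$= \frac{1}{2}\begin{bmatrix} 1+\rho & -(1-\rho) \\ 1+\rho & 1-\rho \end{bmatrix}\begin{bmatrix} 1 & 1 \\ -1 & 1 \end{bmatrix} \tag{4}$$
$$= \begin{bmatrix} 1 & \rho \\ \rho & 1 \end{bmatrix}. \tag{5}$$

Problem 5.7.8 Solution

As given in the problem statement, we define the $m$-dimensional vector $\mathbf{X}$, the $n$-dimensional vector $\mathbf{Y}$ and $\mathbf{W} = \begin{bmatrix} \mathbf{X} \\ \mathbf{Y} \end{bmatrix}$. Note that $\mathbf{W}$ has expected value
$$\boldsymbol{\mu}_W = E[\mathbf{W}] = E\left[\begin{bmatrix} \mathbf{X} \\ \mathbf{Y} \end{bmatrix}\right] = \begin{bmatrix} E[\mathbf{X}] \\ E[\mathbf{Y}] \end{bmatrix} = \begin{bmatrix} \boldsymbol{\mu}_X \\ \boldsymbol{\mu}_Y \end{bmatrix}. \tag{1}$$
The covariance matrix of $\mathbf{W}$ is
$$\mathbf{C}_W = E[(\mathbf{W}-\boldsymbol{\mu}_W)(\mathbf{W}-\boldsymbol{\mu}_W)'] \tag{2}$$
$$= E\left[\begin{bmatrix} \mathbf{X}-\boldsymbol{\mu}_X \\ \mathbf{Y}-\boldsymbol{\mu}_Y \end{bmatrix}\begin{bmatrix} (\mathbf{X}-\boldsymbol{\mu}_X)' & (\mathbf{Y}-\boldsymbol{\mu}_Y)' \end{bmatrix}\right] \tag{3}$$
$$= \begin{bmatrix} E[(\mathbf{X}-\boldsymbol{\mu}_X)(\mathbf{X}-\boldsymbol{\mu}_X)'] & E[(\mathbf{X}-\boldsymbol{\mu}_X)(\mathbf{Y}-\boldsymbol{\mu}_Y)'] \\ E[(\mathbf{Y}-\boldsymbol{\mu}_Y)(\mathbf{X}-\boldsymbol{\mu}_X)'] & E[(\mathbf{Y}-\boldsymbol{\mu}_Y)(\mathbf{Y}-\boldsymbol{\mu}_Y)'] \end{bmatrix} \tag{4}$$
$$= \begin{bmatrix} \mathbf{C}_X & \mathbf{C}_{XY} \\ \mathbf{C}_{YX} & \mathbf{C}_Y \end{bmatrix}. \tag{5}$$
The assumption that $\mathbf{X}$ and $\mathbf{Y}$ are independent implies that
$$\mathbf{C}_{XY} = E[(\mathbf{X}-\boldsymbol{\mu}_X)(\mathbf{Y}-\boldsymbol{\mu}_Y)'] = E[\mathbf{X}-\boldsymbol{\mu}_X]\,E[(\mathbf{Y}-\boldsymbol{\mu}_Y)'] = \mathbf{0}. \tag{6}$$
This also implies $\mathbf{C}_{YX} = \mathbf{C}_{XY}' = \mathbf{0}'$. Thus
$$\mathbf{C}_W = \begin{bmatrix} \mathbf{C}_X & \mathbf{0} \\ \mathbf{0}' & \mathbf{C}_Y \end{bmatrix}. \tag{7}$$

Problem 5.7.9 Solution

(a) If you are familiar with the Gram-Schmidt procedure, the argument is that applying Gram-Schmidt to the rows of $\mathbf{A}$ yields $m$ orthogonal row vectors. It is then possible to augment those vectors with an additional $n-m$ orthogonal vectors. Those orthogonal vectors would be the rows of $\tilde{\mathbf{A}}$. An alternate argument is that since $\mathbf{A}$ has rank $m$, the nullspace of $\mathbf{A}$, i.e., the set of all vectors $\mathbf{y}$ such that $\mathbf{A}\mathbf{y} = \mathbf{0}$, has dimension $n-m$. We can choose any $n-m$ linearly independent vectors $\mathbf{y}_1, \mathbf{y}_2, \ldots, \mathbf{y}_{n-m}$ in the nullspace of $\mathbf{A}$. We then define $\tilde{\mathbf{A}}'$ to have columns $\mathbf{y}_1, \mathbf{y}_2, \ldots, \mathbf{y}_{n-m}$. It follows that $\mathbf{A}\tilde{\mathbf{A}}' = \mathbf{0}$.

(b) To use Theorem 5.16 for the case $m = n$ to show that
$$\bar{\mathbf{Y}} = \begin{bmatrix} \mathbf{Y} \\ \hat{\mathbf{Y}} \end{bmatrix} = \begin{bmatrix} \mathbf{A} \\ \hat{\mathbf{A}} \end{bmatrix}\mathbf{X} \tag{1}$$
is a Gaussian random vector requires us to show that
$$\bar{\mathbf{A}} = \begin{bmatrix} \mathbf{A} \\ \hat{\mathbf{A}} \end{bmatrix} = \begin{bmatrix} \mathbf{A} \\ \tilde{\mathbf{A}}\mathbf{C}_X^{-1} \end{bmatrix} \tag{2}$$
is a rank $n$ matrix. To prove this fact, we will suppose there exists $\mathbf{w}$ such that $\bar{\mathbf{A}}\mathbf{w} = \mathbf{0}$, and then show that $\mathbf{w}$ is a zero vector. Since $\mathbf{A}$ and $\tilde{\mathbf{A}}$ together have $n$ linearly independent rows, we can write the row vector $\mathbf{w}'$ as a linear combination of the rows of $\mathbf{A}$ and $\tilde{\mathbf{A}}$.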
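That is, for some $\mathbf{v}$ and $\tilde{\mathbf{v}}$,
$$\mathbf{w}' = \mathbf{v}'\mathbf{A} + \tilde{\mathbf{v}}'\tilde{\mathbf{A}}. \tag{3}$$
The condition $\bar{\mathbf{A}}\mathbf{w} = \mathbf{0}$ implies
$$\begin{bmatrix} \mathbf{A} \\ \tilde{\mathbf{A}}\mathbf{C}_X^{-1} \end{bmatrix}\left(\mathbf{A}'\mathbf{v} + \tilde{\mathbf{A}}'\tilde{\mathbf{v}}\right) = \begin{bmatrix} \mathbf{0} \\ \mathbf{0} \end{bmatrix}. \tag{4}$$
This implies
$$\mathbf{A}\mathbf{A}'\mathbf{v} + \mathbf{A}\tilde{\mathbf{A}}'\tilde{\mathbf{v}} = \mathbf{0} \tag{5}$$
$$\tilde{\mathbf{A}}\mathbf{C}_X^{-1}\mathbf{A}'\mathbf{v} + \tilde{\mathbf{A}}\mathbf{C}_X^{-1}\tilde{\mathbf{A}}'\tilde{\mathbf{v}} = \mathbf{0} \tag{6}$$
Since $\mathbf{A}\tilde{\mathbf{A}}' = \mathbf{0}$, Equation (5) implies that $\mathbf{A}\mathbf{A}'\mathbf{v} = \mathbf{0}$. Since $\mathbf{A}$ is rank $m$, $\mathbf{A}\mathbf{A}'$ is an $m \times m$ rank $m$ matrix. It follows that $\mathbf{v} = \mathbf{0}$. We can then conclude from Equation (6) that
$$\tilde{\mathbf{A}}\mathbf{C}_X^{-1}\tilde{\mathbf{A}}'\tilde{\mathbf{v}} = \mathbf{0}. \tag{7}$$
This would imply that $\tilde{\mathbf{v}}'\tilde{\mathbf{A}}\mathbf{C}_X^{-1}\tilde{\mathbf{A}}'\tilde{\mathbf{v}} = 0$. Since $\mathbf{C}_X^{-1}$ is positive definite, this would imply that $\tilde{\mathbf{A}}'\tilde{\mathbf{v}} = \mathbf{0}$. Since the rows of $\tilde{\mathbf{A}}$ are linearly independent, it must be that $\tilde{\mathbf{v}} = \mathbf{0}$. Thus $\bar{\mathbf{A}}$ is full rank and $\bar{\mathbf{Y}}$ is a Gaussian random vector.

(c) By Theorem 5.16, the Gaussian vector $\bar{\mathbf{Y}} = \bar{\mathbf{A}}\mathbf{X}$ has covariance matrix
$$\bar{\mathbf{C}} = \bar{\mathbf{A}}\mathbf{C}_X\bar{\mathbf{A}}'. \tag{8}$$
Since $(\mathbf{C}_X^{-1})' = \mathbf{C}_X^{-1}$,
$$\bar{\mathbf{A}}' = \begin{bmatrix} \mathbf{A}' & (\tilde{\mathbf{A}}\mathbf{C}_X^{-1})' \end{bmatrix} = \begin{bmatrix} \mathbf{A}' & \mathbf{C}_X^{-1}\tilde{\mathbf{A}}' \end{bmatrix}. \tag{9}$$
Applying this result to Equation (8) yields
$$\bar{\mathbf{C}} = \begin{bmatrix} \mathbf{A} \\ \tilde{\mathbf{A}}\mathbf{C}_X^{-1} \end{bmatrix}\mathbf{C}_X\begin{bmatrix} \mathbf{A}' & \mathbf{C}_X^{-1}\tilde{\mathbf{A}}' \end{bmatrix} = \begin{bmatrix} \mathbf{A}\mathbf{C}_X \\ \tilde{\mathbf{A}} \end{bmatrix}\begin{bmatrix} \mathbf{A}' & \mathbf{C}_X^{-1}\tilde{\mathbf{A}}' \end{bmatrix} = \begin{bmatrix} \mathbf{A}\mathbf{C}_X\mathbf{A}' & \mathbf{A}\tilde{\mathbf{A}}' \\ \tilde{\mathbf{A}}\mathbf{A}' & \tilde{\mathbf{A}}\mathbf{C}_X^{-1}\tilde{\mathbf{A}}' \end{bmatrix}. \tag{10}$$
Since $\tilde{\mathbf{A}}\mathbf{A}' = \mathbf{0}$,
$$\bar{\mathbf{C}} = \begin{bmatrix} \mathbf{A}\mathbf{C}_X\mathbf{A}' & \mathbf{0} \\ \mathbf{0} & \tilde{\mathbf{A}}\mathbf{C}_X^{-1}\tilde{\mathbf{A}}' \end{bmatrix} = \begin{bmatrix} \mathbf{C}_Y & \mathbf{0} \\ \mathbf{0} & \mathbf{C}_{\hat{Y}} \end{bmatrix}. \tag{11}$$
We see that $\bar{\mathbf{C}}$ is a block diagonal covariance matrix. From the claim of Problem 5.7.8, we can conclude that $\mathbf{Y}$ and $\hat{\mathbf{Y}}$ are independent Gaussian random vectors.

Problem 5.8.1 Solution

We can use Theorem 5.16 since the scalar $Y$ is also a 1-dimensional vector. To do so, we write
$$Y = \begin{bmatrix} 1/3 & 1/3 & 1/3 \end{bmatrix}\mathbf{X} = \mathbf{A}\mathbf{X}. \tag{1}$$
By Theorem 5.16, $Y$ is a Gaussian vector with expected value
$$E[Y] = \mathbf{A}\boldsymbol{\mu}_X = (E[X_1] + E[X_2] + E[X_3])/3 = (4 + 8 + 6)/3 = 6 \tag{2}$$
and covariance matrix
$$\mathbf{C}_Y = \operatorname{Var}[Y] = \mathbf{A}\mathbf{C}_X\mathbf{A}' \tag{3}$$
$$= \begin{bmatrix} 1/3 & 1/3 & 1/3 \end{bmatrix}\begin{bmatrix} 4 & -2 & 1 \\ -2 & 4 & -2 \\ 1 & -2 & 4 \end{bmatrix}\begin{bmatrix} 1/3 \\ 1/3 \\ 1/3 \end{bmatrix} = \frac{2}{3}. \tag{4}$$
Thus $Y$ is a Gaussian $(6, \sqrt{2/3})$ random variable, implying
$$P[Y > 4] = P\left[\frac{Y-6}{\sqrt{2/3}} > \frac{4-6}{\sqrt{2/3}}\right] = 1 - \Phi(-\sqrt{6}) = \Phi(\sqrt{6}) = 0.9928. \tag{5}$$

Problem 5.8.2 Solution

(a) The covariance matrix $\mathbf{C}_X$ has $\operatorname{Var}[X_i] = 25$ for each diagonal entry.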
For $i \ne j$, the $i,j$th entry of $\mathbf{C}_X$ is
$$[\mathbf{C}_X]_{ij} = \rho_{X_i X_j}\sqrt{\operatorname{Var}[X_i]\operatorname{Var}[X_j]} = (0.8)(25) = 20. \tag{1}$$
The covariance matrix of $\mathbf{X}$ is a $10 \times 10$ matrix of the form
$$\mathbf{C}_X = \begin{bmatrix} 25 & 20 & \cdots & 20 \\ 20 & 25 & \ddots & \vdots \\ \vdots & \ddots & \ddots & 20 \\ 20 & \cdots & 20 & 25 \end{bmatrix}. \tag{2}$$

(b) We observe that
$$Y = \begin{bmatrix} 1/10 & 1/10 & \cdots & 1/10 \end{bmatrix}\mathbf{X} = \mathbf{A}\mathbf{X}. \tag{3}$$
Since $Y$ is the average of 10 identically distributed random variables, $E[Y] = E[X_i] = 35$. Since $Y$ is a scalar, the $1 \times 1$ covariance matrix $\mathbf{C}_Y = \operatorname{Var}[Y]$. By Theorem 5.13, the variance of $Y$ is
$$\operatorname{Var}[Y] = \mathbf{C}_Y = \mathbf{A}\mathbf{C}_X\mathbf{A}' = 20.5. \tag{4}$$
Since $Y$ is Gaussian,
$$P[Y \le 25] = P\left[\frac{Y-35}{\sqrt{20.5}} \le \frac{25-35}{\sqrt{20.5}}\right] = \Phi(-2.209) = 0.0136. \tag{5}$$

Problem 5.8.3 Solution

Under the model of Quiz 5.8, the temperature on day $i$ and on day $j$ have covariance
$$\operatorname{Cov}[T_i,T_j] = C_T[i-j] = \frac{36}{1+|i-j|}. \tag{1}$$
From this model, the vector $\mathbf{T} = [T_1 \;\cdots\; T_{31}]'$ has covariance matrix
$$\mathbf{C}_T = \begin{bmatrix} C_T[0] & C_T[1] & \cdots & C_T[30] \\ C_T[1] & C_T[0] & \ddots & \vdots \\ \vdots & \ddots & \ddots & C_T[1] \\ C_T[30] & \cdots & C_T[1] & C_T[0] \end{bmatrix}. \tag{2}$$
If you have read the solution to Quiz 5.8, you know that $\mathbf{C}_T$ is a symmetric Toeplitz matrix and that Matlab has a toeplitz function to generate Toeplitz matrices. Using the toeplitz function to generate the covariance matrix, it is easy to use gaussvector to generate samples of the random vector $\mathbf{T}$. Here is the code for estimating $P[A]$ using m samples.

    function p=julytemp583(m);
    c=36./(1+(0:30));          %first row of the Toeplitz covariance
    CT=toeplitz(c);
    mu=80*ones(31,1);
    T=gaussvector(mu,CT,m);    %each column is one sample month
    Y=sum(T)/31;               %average temperature in each month
    Tmin=min(T);               %minimum temperature in each month
    p=sum((Tmin>=72) & (Y <= 82))/m;

    >> julytemp583(100000)
    ans = 0.0684
    >> julytemp583(100000)
    ans = 0.0706
    >> julytemp583(100000)
    ans = 0.0714
    >> julytemp583(100000)
    ans = 0.0701

We see from repeated experiments with m = 100,000 trials that $P[A] \approx 0.07$.

Problem 5.8.4 Solution

The covariance matrix $\mathbf{C}_X$ has $\operatorname{Var}[X_i] = 25$ for each diagonal entry. For $i \ne j$, the $i,j$th entry of $\mathbf{C}_X$ is
$$[\mathbf{C}_X]_{ij} = \rho_{X_i X_j}\sqrt{\operatorname{Var}[X_i]\operatorname{Var}[X_j]} = (0.8)(25) = 20. \tag{1}$$
The covariance matrix of $\mathbf{X}$ is a $10 \times 10$ matrix of the form
$$\mathbf{C}_X = \begin{bmatrix} 25 & 20 & \cdots & 20 \\ 20 & 25 & \ddots & \vdots \\ \vdots & \ddots & \ddots & 20 \\ 20 & \cdots & 20 & 25 \end{bmatrix}. \tag{2}$$
A program to estimate $P[W \le 25]$ uses gaussvector to generate m sample vectors of race times $\mathbf{X}$. In the program sailboats.m, X is a $10 \times m$ matrix such that each column of X is a vector of race times. In addition, min(X) is a row vector indicating the fastest time in each race.

    function p=sailboats(w,m)
    %Usage: p=sailboats(w,m)
    %In Problem 5.8.4, W is the
    %winning time in a 10 boat race.
    %We use m trials to estimate
    %P[W<=w]
    CX=(5*eye(10))+(20*ones(10,10));
    mu=35*ones(10,1);
    X=gaussvector(mu,CX,m);
    W=min(X);
    p=sum(W<=w)/m;

    >> sailboats(25,10000)
    ans = 0.0827
    >> sailboats(25,100000)
    ans = 0.0801
    >> sailboats(25,100000)
    ans = 0.0803
    >> sailboats(25,100000)
    ans = 0.0798

We see from repeated experiments with m = 100,000 trials that $P[W \le 25] \approx 0.08$.

Problem 5.8.5 Solution

When we built poissonrv.m, we went to some trouble to be able to generate m iid samples at once. In this problem, each Poisson random variable that we generate has an expected value that is different from that of any other Poisson random variable. Thus, we must generate the daily jackpots sequentially. Here is a simple program for this purpose.

    function jackpot=lottery1(jstart,M,D)
    %Usage: function j=lottery1(jstart,M,D)
    %Perform M trials of the D day lottery
    %of Problem 5.5.5 and initial jackpot jstart
    jackpot=zeros(M,1);
    for m=1:M,
        disp(m)    %display the trial number
        jackpot(m)=jstart;
        for d=1:D,
            jackpot(m)=jackpot(m)+(0.5*poissonrv(jackpot(m),1));
        end
    end

The main problem with lottery1 is that it will run very slowly. Each call to poissonrv generates an entire Poisson PMF $P_X(x)$ for $x = 0, 1, \ldots, x_{\max}$ where $x_{\max} \ge 2 \cdot 10^6$. This is slow in several ways. First, we repeat the calculation of $\sum_{j=1}^{x_{\max}} \log j$ with each call to poissonrv. Second, each call to poissonrv asks for a Poisson sample value with expected value $\alpha > 1 \cdot 10^6$. In these cases, for small values of $x$, $P_X(x) = \alpha^x e^{-\alpha}/x!$
is so small that it is less than the smallest nonzero number that Matlab can store! To speed up the simulation, we have written a program bigpoissonrv which generates Poisson $(\alpha)$ samples for large $\alpha$. The program makes an approximation that for a Poisson $(\alpha)$ random variable $X$, $P_X(x) \approx 0$ for $|x-\alpha| > 6\sqrt{\alpha}$. Since $X$ has standard deviation $\sqrt{\alpha}$, we are assuming that $X$ cannot be more than six standard deviations away from its mean value. The error in this approximation is very small. In fact, for a Poisson $(a)$ random variable, the program poissonsigma(a,k) calculates the error $P[|X-a| > k\sqrt{a}]$. Here is poissonsigma.m and some simple calculations:

    function err=poissonsigma(a,k);
    xmin=max(0,floor(a-k*sqrt(a)));
    xmax=a+ceil(k*sqrt(a));
    sx=xmin:xmax;
    logfacts =cumsum([0,log(1:xmax)]);
    %logfacts includes 0 in case xmin=0
    %Now we extract needed values:
    logfacts=logfacts(sx+1);
    %pmf(i,:) is a Poisson a(i) PMF
    % from xmin to xmax
    pmf=exp(-a+ (log(a)*sx)-(logfacts));
    err=1-sum(pmf);

    >> poissonsigma(1,6)
    ans = 1.0249e-005
    >> poissonsigma(10,6)
    ans = 2.5100e-007
    >> poissonsigma(100,6)
    ans = 1.2620e-008
    >> poissonsigma(1000,6)
    ans = 2.6777e-009
    >> poissonsigma(10000,6)
    ans = 1.8081e-009
    >> poissonsigma(100000,6)
    ans = -1.6383e-010

The error reported by poissonsigma(a,k) should always be positive. In fact, we observe negative errors for very large a. For large $\alpha$ and $x$, numerical calculation of $P_X(x) = \alpha^x e^{-\alpha}/x!$ is tricky because we are taking ratios of very large numbers. In fact, for $\alpha = x = 1{,}000{,}000$, Matlab calculation of $\alpha^x$ and $x!$ will report infinity while $e^{-\alpha}$ will evaluate as zero. Our method of calculating the Poisson $(\alpha)$ PMF is to use the fact that $\ln x! = \sum_{j=1}^{x} \ln j$ to calculate
$$\exp(\ln P_X(x)) = \exp\left(x\ln\alpha - \alpha - \sum_{j=1}^{x}\ln j\right). \tag{1}$$
This method works reasonably well except that the calculation of the logarithm has finite precision.
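The overflow described above is easy to reproduce. The following Matlab sketch compares the direct formula with the log-domain calculation of Equation (1) at $\alpha = x = 10^6$. As an assumption made here for brevity, it uses the built-in gammaln to obtain $\ln x!$ in place of the cumulative sum of logarithms used in the programs of this solution.

```matlab
% Sketch: direct versus log-domain evaluation of the Poisson PMF.
alpha = 1e6;
x = 1e6;
% Direct formula: alpha^x and x! overflow to Inf and exp(-alpha)
% underflows to 0, so the result is NaN.
pdirect = (alpha^x)*exp(-alpha)/factorial(x);
% Log-domain formula; gammaln(x+1) returns ln(x!) (assumption: used
% here in place of cumsum(log(1:x))).
plog = exp(x*log(alpha) - alpha - gammaln(x+1));
% Stirling's approximation gives P_X(alpha) close to 1/sqrt(2*pi*alpha).
papprox = 1/sqrt(2*pi*alpha);
fprintf('direct = %g, log-domain = %g, Stirling = %g\n', ...
        pdirect, plog, papprox);
```

The log-domain value is only as accurate as the evaluation of the logarithms, since three terms on the order of $10^7$ are combined into an exponent of modest size.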
The consequence is that the calculated sum over the PMF can vary from 1 by a very small amount, on the order of $10^{-7}$ in our experiments. In our problem, the error is inconsequential; however, one should keep in mind that this may not be the case in other experiments using large Poisson random variables. In any case, we can conclude that within the accuracy of Matlab's simulated experiments, the approximations to be used by bigpoissonrv are not significant.

The other feature of bigpoissonrv is that for a vector alpha corresponding to expected values $[\alpha_1 \;\cdots\; \alpha_m]'$, bigpoissonrv returns a vector X such that X(i) is a Poisson alpha(i) sample. The work of calculating the sum of logarithms is done only once for all calculated samples. The result is a significant savings in cpu time as long as the values of alpha are reasonably close to each other.

    function x=bigpoissonrv(alpha)
    %for vector alpha, returns a vector x such that
    % x(i) is a Poisson (alpha(i)) rv
    %set up Poisson CDF from xmin to xmax for each alpha(i)
    alpha=alpha(:);
    amin=min(alpha(:));
    amax=max(alpha(:));
    %Assume Poisson PMF is negligible +-6 sigma from the average
    xmin=max(0,floor(amin-6*sqrt(amax)));
    xmax=amax+ceil(6*sqrt(amax));%set max range
    sx=xmin:xmax;
    %Now we include the basic code of poissonpmf (but starting at xmin)
    logfacts =cumsum([0,log(1:xmax)]); %include 0 in case xmin=0
    logfacts=logfacts(sx+1); %extract needed values
    %pmf(i,:) is a Poisson alpha(i) PMF from xmin to xmax
    pmf=exp(-alpha*ones(size(sx))+ ...
        (log(alpha)*sx)-(ones(size(alpha))*logfacts));
    cdf=cumsum(pmf,2); %each row is a cdf
    %inverse CDF: x(i) is xmin plus the number of CDF values below
    %the uniform sample in row i
    x=xmin+sum((rand(size(alpha))*ones(size(sx)))>cdf,2);

Finally, given bigpoissonrv, we can write a short program lottery that simulates trials of the jackpot experiment. Ideally, we would like to use lottery to perform m = 1,000 trials in a single pass. In general, Matlab is more efficient when calculations are executed in parallel using vectors.
However, in bigpoissonrv, the matrix pmf will have m rows and at least $12\sqrt{\alpha} = 12{,}000$ columns. For m more than several hundred, Matlab running on my laptop reported an "Out of Memory" error. Thus, we wrote the program lottery to perform M trials at once and to repeat that N times. The output is an $M \times N$ matrix where each $i,j$ entry is a sample jackpot after seven days.

    function jackpot=lottery(jstart,M,N,D)
    %Usage: function j=lottery(jstart,M,N,D)
    %Perform M trials of the D day lottery
    %of Problem 5.5.5 and initial jackpot jstart
    jackpot=zeros(M,N);
    for n=1:N,
        jackpot(:,n)=jstart*ones(M,1);
        for d=1:D,
            disp(d);
            jackpot(:,n)=jackpot(:,n)+(0.5*bigpoissonrv(jackpot(:,n)));
        end
    end

Executing J=lottery(1e6,200,10,7) generates a matrix J of 2,000 sample jackpots. The command hist(J(:),50) generates a histogram of the values with 50 bins. An example is shown here:

[Figure: histogram of the 2,000 sample jackpots. Horizontal axis: J, from 1.7076 × 10^7 to 1.7096 × 10^7; vertical axis: Frequency, from 0 to 150.]

If you go back and solve Problem 5.5.5, you will see that the jackpot $J$ has expected value $E[J] = (3/2)^7 \times 10^6 = 1.70859 \times 10^7$ dollars. Thus it is not surprising that the histogram is centered around a jackpot of $1.708 \times 10^7$ dollars. If we did more trials, and used more histogram bins, the histogram would appear to converge to the shape of a Gaussian PDF. This fact is explored in Chapter 6.

Problem Solutions – Chapter 6

Problem 6.1.1 Solution

The random variable $X_{33}$ is a Bernoulli random variable that indicates the result of flip 33. The PMF of $X_{33}$ is
$$P_{X_{33}}(x) = \begin{cases} 1-p & x = 0 \\ p & x = 1 \\ 0 & \text{otherwise} \end{cases} \tag{1}$$
Note that each $X_i$ has expected value $E[X] = p$ and variance $\operatorname{Var}[X] = p(1-p)$. The random variable $Y = X_1 + \cdots + X_{100}$ is the number of heads in 100 coin flips. Hence, $Y$ has the binomial PMF
$$P_Y(y) = \begin{cases} \binom{100}{y} p^y (1-p)^{100-y} & y = 0, 1, \ldots, 100 \\ 0 & \text{otherwise} \end{cases} \tag{2}$$
Since the $X_i$ are independent, by Theorems 6.1 and 6.3, the mean and variance of $Y$ are
$$E[Y] = 100E[X] = 100p, \qquad \operatorname{Var}[Y] = 100\operatorname{Var}[X] = 100p(1-p). \tag{3}$$

Problem 6.1.2 Solution

Let $Y = X_1 - X_2$.

(a) Since $Y = X_1 + (-X_2)$, Theorem 6.1 says that the expected value of the difference is
$$E[Y] = E[X_1] + E[-X_2] = E[X] - E[X] = 0. \tag{1}$$

(b) By Theorem 6.2, the variance of the difference is
$$\operatorname{Var}[Y] = \operatorname{Var}[X_1] + \operatorname{Var}[-X_2] = 2\operatorname{Var}[X]. \tag{2}$$

Problem 6.1.3 Solution

(a) The PMF of $N_1$, the number of phone calls needed to obtain the correct answer, can be determined by observing that if the correct answer is given on the $n$th call, then the previous $n-1$ calls must have given wrong answers so that
$$P_{N_1}(n) = \begin{cases} (3/4)^{n-1}(1/4) & n = 1, 2, \ldots \\ 0 & \text{otherwise} \end{cases} \tag{1}$$

(b) $N_1$ is a geometric random variable with parameter $p = 1/4$. In Theorem 2.5, the mean of a geometric random variable is found to be $1/p$. For our case, $E[N_1] = 4$.

(c) Using the same logic as in part (a), we recognize that in order for the $n$th call to yield the fourth correct answer, the previous $n-1$ calls must have contained exactly 3 correct answers and the fourth correct answer must have arrived on the $n$th call. This is described by a Pascal random variable.
$$P_{N_4}(n) = \begin{cases} \binom{n-1}{3}(3/4)^{n-4}(1/4)^4 & n = 4, 5, \ldots \\ 0 & \text{otherwise} \end{cases} \tag{2}$$

(d) Using the hint given in the problem statement, we can find the mean of $N_4$ by summing up the means of the 4 identically distributed geometric random variables, each with mean 4. This gives $E[N_4] = 4E[N_1] = 16$.

Problem 6.1.4 Solution

We can solve this problem using Theorem 6.2, which says that
$$\operatorname{Var}[W] = \operatorname{Var}[X] + \operatorname{Var}[Y] + 2\operatorname{Cov}[X,Y]. \tag{1}$$
The first two moments of $X$ are
$$E[X] = \int_0^1\int_0^{1-x} 2x\,dy\,dx = \int_0^1 2x(1-x)\,dx = 1/3, \tag{2}$$
$$E[X^2] = \int_0^1\int_0^{1-x} 2x^2\,dy\,dx = \int_0^1 2x^2(1-x)\,dx = 1/6. \tag{3}$$
Thus the variance of $X$ is $\operatorname{Var}[X] = E[X^2] - (E[X])^2 = 1/18$. By symmetry, it should be apparent that $E[Y] = E[X] = 1/3$ and $\operatorname{Var}[Y] = \operatorname{Var}[X] = 1/18$.
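The two moments of $X$ in Problem 6.1.4 can be confirmed by numerical integration. The following Matlab sketch is only a sanity check under the stated joint PDF, $f_{X,Y}(x,y) = 2$ on the triangle $x \ge 0$, $y \ge 0$, $x + y \le 1$; the variable names are chosen here for illustration.

```matlab
% Sketch: numerical check of E[X], E[X^2] and Var[X] for the joint PDF
% f_{X,Y}(x,y) = 2 on the triangle x >= 0, y >= 0, x + y <= 1.
ymax = @(x) 1 - x;                                % upper limit of y
EX  = integral2(@(x,y) 2*x,    0, 1, 0, ymax);    % first moment
EX2 = integral2(@(x,y) 2*x.^2, 0, 1, 0, ymax);    % second moment
VarX = EX2 - EX^2;
fprintf('E[X] = %.4f, E[X^2] = %.4f, Var[X] = %.4f\n', EX, EX2, VarX);
```

The printed values should agree with 1/3, 1/6 and 1/18 to the displayed precision.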
To find the covariance, we first find the correlation
$$E[XY] = \int_0^1\int_0^{1-x} 2xy\,dy\,dx = \int_0^1 x(1-x)^2\,dx = 1/12. \tag{5}$$
The covariance is
$$\operatorname{Cov}[X,Y] = E[XY] - E[X]E[Y] = 1/12 - (1/3)^2 = -1/36. \tag{6}$$
Finally, the variance of the sum $W = X + Y$ is
$$\operatorname{Var}[W] = \operatorname{Var}[X] + \operatorname{Var}[Y] + 2\operatorname{Cov}[X,Y] = 2/18 - 2/36 = 1/18. \tag{7}$$
For this specific problem, it's arguable whether it would be easier to find $\operatorname{Var}[W]$ by first deriving the CDF and PDF of $W$. In particular, for $0 \le w \le 1$,
$$F_W(w) = P[X + Y \le w] = \int_0^w\int_0^{w-x} 2\,dy\,dx = \int_0^w 2(w-x)\,dx = w^2. \tag{8}$$
Hence, by taking the derivative of the CDF, the PDF of $W$ is
$$f_W(w) = \begin{cases} 2w & 0 \le w \le 1 \\ 0 & \text{otherwise} \end{cases} \tag{9}$$
From the PDF, the first and second moments of $W$ are
$$E[W] = \int_0^1 2w^2\,dw = 2/3, \qquad E[W^2] = \int_0^1 2w^3\,dw = 1/2. \tag{10}$$
The variance of $W$ is $\operatorname{Var}[W] = E[W^2] - (E[W])^2 = 1/18$. Not surprisingly, we get the same answer both ways.

Problem 6.1.5 Solution

This problem should be in either Chapter 10 or Chapter 11. Since each $X_i$ has zero mean, the mean of $Y_n$ is
$$E[Y_n] = E[X_n + X_{n-1} + X_{n-2}]/3 = 0. \tag{1}$$
Since $Y_n$ has zero mean, the variance of $Y_n$ is
$$\operatorname{Var}[Y_n] = E[Y_n^2] \tag{2}$$
$$= \frac{1}{9}E[(X_n + X_{n-1} + X_{n-2})^2] \tag{3}$$
$$= \frac{1}{9}E[X_n^2 + X_{n-1}^2 + X_{n-2}^2 + 2X_n X_{n-1} + 2X_n X_{n-2} + 2X_{n-1}X_{n-2}] \tag{4}$$
$$= \frac{1}{9}(1 + 1 + 1 + 2/4 + 0 + 2/4) = \frac{4}{9}. \tag{5}$$

Problem 6.2.1 Solution

The joint PDF of $X$ and $Y$ is
$$f_{X,Y}(x,y) = \begin{cases} 2 & 0 \le x \le y \le 1 \\ 0 & \text{otherwise} \end{cases} \tag{1}$$
We wish to find the PDF of $W$ where $W = X + Y$. First we find the CDF of $W$, $F_W(w)$, but we must realize that the CDF will require different integrations for different values of $w$.

[Figure: the X-Y plane with the lines Y = X and X + Y = w; the shaded area of integration for 0 ≤ w ≤ 1.]

For values of $0 \le w \le 1$ we look to integrate the shaded area in the figure to the right.
$$F_W(w) = \int_0^{w/2}\int_x^{w-x} 2\,dy\,dx = \frac{w^2}{2}. \tag{2}$$

[Figure: the X-Y plane with the lines Y = X and X + Y = w; the shaded area of integration for 1 ≤ w ≤ 2.]

For values of $w$ in the region $1 \le w \le 2$ we look to integrate over the shaded region in the graph to the right.
From the graph we see that we can integrate with respect to $x$ first, ranging $y$ from 0 to $w/2$, thereby covering the lower triangle of the shaded region and leaving the upper trapezoid, which is accounted for in the second term of the following expression:
$$F_W(w) = \int_0^{w/2}\int_0^{y} 2\,dx\,dy + \int_{w/2}^{1}\int_0^{w-y} 2\,dx\,dy \tag{3}$$
$$= 2w - 1 - \frac{w^2}{2}. \tag{4}$$
Putting all the parts together gives the CDF $F_W(w)$ and (by taking the derivative) the PDF $f_W(w)$.
$$F_W(w) = \begin{cases} 0 & w < 0 \\ \frac{w^2}{2} & 0 \le w \le 1 \\ 2w - 1 - \frac{w^2}{2} & 1 \le w \le 2 \\ 1 & w > 2 \end{cases} \qquad f_W(w) = \begin{cases} w & 0 \le w \le 1 \\ 2-w & 1 \le w \le 2 \\ 0 & \text{otherwise} \end{cases} \tag{5}$$

Problem 6.2.2 Solution

The joint PDF of $X$ and $Y$ is
$$f_{X,Y}(x,y) = \begin{cases} 1 & 0 \le x, y \le 1 \\ 0 & \text{otherwise} \end{cases} \tag{1}$$
Proceeding as in Problem 6.2.1, we must first find $F_W(w)$ by integrating over the square defined by $0 \le x, y \le 1$. Again we are forced to find $F_W(w)$ in parts as we did in Problem 6.2.1, resulting in the following integrals for their appropriate regions. For $0 \le w \le 1$,
$$F_W(w) = \int_0^w\int_0^{w-x} dy\,dx = w^2/2. \tag{2}$$
For $1 \le w \le 2$,
$$F_W(w) = \int_0^{w-1}\int_0^1 dx\,dy + \int_{w-1}^1\int_0^{w-y} dx\,dy = 2w - 1 - w^2/2. \tag{3}$$
The complete CDF $F_W(w)$ is shown below along with the corresponding PDF $f_W(w) = dF_W(w)/dw$.
$$F_W(w) = \begin{cases} 0 & w < 0 \\ w^2/2 & 0 \le w \le 1 \\ 2w - 1 - w^2/2 & 1 \le w \le 2 \\ 1 & \text{otherwise} \end{cases} \qquad f_W(w) = \begin{cases} w & 0 \le w \le 1 \\ 2-w & 1 \le w \le 2 \\ 0 & \text{otherwise} \end{cases} \tag{4}$$

Problem 6.2.3 Solution

By using Theorem 6.5, we can find the PDF of $W = X + Y$ by convolving the two exponential distributions. For $\mu \ne \lambda$,
$$f_W(w) = \int_{-\infty}^{\infty} f_X(x) f_Y(w-x)\,dx \tag{1}$$
$$= \int_0^w \lambda e^{-\lambda x}\mu e^{-\mu(w-x)}\,dx \tag{2}$$
$$= \lambda\mu e^{-\mu w}\int_0^w e^{-(\lambda-\mu)x}\,dx \tag{3}$$
$$= \begin{cases} \frac{\lambda\mu}{\lambda-\mu}\left(e^{-\mu w} - e^{-\lambda w}\right) & w \ge 0 \\ 0 & \text{otherwise} \end{cases} \tag{4}$$
When $\mu = \lambda$, the previous derivation is invalid because of the denominator term $\lambda - \mu$. For $\mu = \lambda$, we have
$$f_W(w) = \int_{-\infty}^{\infty} f_X(x) f_Y(w-x)\,dx \tag{5}$$
$$= \int_0^w \lambda e^{-\lambda x}\lambda e^{-\lambda(w-x)}\,dx \tag{6}$$
$$= \lambda^2 e^{-\lambda w}\int_0^w dx \tag{7}$$
$$= \begin{cases} \lambda^2 w e^{-\lambda w} & w \ge 0 \\ 0 & \text{otherwise} \end{cases} \tag{8}$$
Note that when $\mu = \lambda$, $W$ is the sum of two iid exponential random variables and has a second order Erlang PDF.
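The closed form for the case $\mu \ne \lambda$ in Problem 6.2.3 can be compared against a direct numerical evaluation of the convolution integral. The Matlab sketch below does this at a few points; the rates lambda = 2 and mu = 0.5 and the evaluation points are arbitrary values assumed here for illustration.

```matlab
% Sketch: numerical check of the PDF of W = X + Y for independent
% exponential X (rate lambda) and Y (rate mu), with lambda ~= mu.
% The rates and evaluation points are illustrative assumptions.
lambda = 2;
mu = 0.5;
w = [0.5 1 2 4];                       % points at which to compare
fnum = zeros(size(w));
for i = 1:length(w)
    % integrand of the convolution integral over 0 <= x <= w
    g = @(x) lambda*exp(-lambda*x).*mu.*exp(-mu*(w(i) - x));
    fnum(i) = integral(g, 0, w(i));
end
% closed form for w >= 0
fclosed = (lambda*mu/(lambda - mu))*(exp(-mu*w) - exp(-lambda*w));
maxerr = max(abs(fnum - fclosed));
disp([w; fnum; fclosed]);
fprintf('largest difference = %g\n', maxerr);
```

Rows two and three of the displayed matrix should agree to within the tolerance of the numerical integration.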

Problem 6.2.4 Solution

In this problem, $X$ and $Y$ have joint PDF
$$f_{X,Y}(x,y) = \begin{cases} 8xy & 0 \le y \le x \le 1 \\ 0 & \text{otherwise} \end{cases} \tag{1}$$
We can find the PDF of $W$ using Theorem 6.4: $f_W(w) = \int_{-\infty}^{\infty} f_{X,Y}(x, w-x)\,dx$. The only tricky part remaining is to determine the limits of the integration. First, for $w < 0$, $f_W(w) = 0$. The two remaining cases are shown in the accompanying figure. The shaded area shows where the joint PDF $f_{X,Y}(x,y)$ is nonzero. The diagonal lines depict $y = w - x$ as a function of $x$. The intersection of the diagonal line and the shaded area defines our limits of integration.

[Figure: the x-y plane showing the shaded region 0 ≤ y ≤ x ≤ 1 and the diagonal lines y = w − x for the cases 0 < w < 1 and 1 < w < 2.]

For $0 \le w \le 1$,
$$f_W(w) = \int_{w/2}^{w} 8x(w-x)\,dx \tag{2}$$
$$= \left(4wx^2 - 8x^3/3\right)\Big|_{w/2}^{w} = 2w^3/3. \tag{3}$$
For $1 \le w \le 2$,
$$f_W(w) = \int_{w/2}^{1} 8x(w-x)\,dx \tag{4}$$
$$= \left(4wx^2 - 8x^3/3\right)\Big|_{w/2}^{1} \tag{5}$$
$$= 4w - 8/3 - 2w^3/3. \tag{6}$$
Since $X + Y \le 2$, $f_W(w) = 0$ for $w > 2$. Hence the complete expression for the PDF of $W$ is
$$f_W(w) = \begin{cases} 2w^3/3 & 0 \le w \le 1 \\ 4w - 8/3 - 2w^3/3 & 1 \le w \le 2 \\ 0 & \text{otherwise} \end{cases} \tag{7}$$

Problem 6.2.5 Solution

We first find the CDF of $W$ following the same procedure as in the proof of Theorem 6.4.
$$F_W(w) = P[X \le Y + w] = \int_{-\infty}^{\infty}\int_{-\infty}^{y+w} f_{X,Y}(x,y)\,dx\,dy \tag{1}$$
By taking the derivative with respect to $w$, we obtain
$$f_W(w) = \frac{dF_W(w)}{dw} = \int_{-\infty}^{\infty}\frac{d}{dw}\left(\int_{-\infty}^{y+w} f_{X,Y}(x,y)\,dx\right)dy \tag{2}$$
$$= \int_{-\infty}^{\infty} f_{X,Y}(w+y, y)\,dy. \tag{3}$$
With the variable substitution $y = x - w$, we have $dy = dx$ and
$$f_W(w) = \int_{-\infty}^{\infty} f_{X,Y}(x, x-w)\,dx. \tag{4}$$

Problem 6.2.6 Solution

The random variables $K$ and $J$ have PMFs
$$P_J(j) = \begin{cases} \frac{\alpha^j e^{-\alpha}}{j!} & j = 0, 1, 2, \ldots \\ 0 & \text{otherwise} \end{cases} \qquad P_K(k) = \begin{cases} \frac{\beta^k e^{-\beta}}{k!} & k = 0, 1, 2, \ldots \\ 0 & \text{otherwise} \end{cases} \tag{1}$$
For $n \ge 0$, we can find the PMF of $N = J + K$ via
$$P[N = n] = \sum_{k=-\infty}^{\infty} P[J = n-k, K = k]. \tag{2}$$
Since $J$ and $K$ are independent, non-negative random variables,
$$P[N = n] = \sum_{k=0}^{n} P_J(n-k)\,P_K(k) \tag{3}$$
$$= \sum_{k=0}^{n} \frac{\alpha^{n-k}e^{-\alpha}}{(n-k)!}\,\frac{\beta^k e^{-\beta}}{k!} \tag{4}$$
$$= \frac{(\alpha+\beta)^n e^{-(\alpha+\beta)}}{n!}\underbrace{\sum_{k=0}^{n}\frac{n!}{k!(n-k)!}\left(\frac{\alpha}{\alpha+\beta}\right)^{n-k}\left(\frac{\beta}{\alpha+\beta}\right)^{k}}_{1} \tag{5}$$
The marked sum above equals 1 because it is the sum of a binomial PMF over all possible values.
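Equations (3) through (5) of Problem 6.2.6 say that convolving the two Poisson PMFs produces a PMF of the Poisson form with parameter $\alpha + \beta$. The Matlab sketch below checks this numerically for one hypothetical pair of parameters; the values a = 1.5 and b = 2.5 and the truncation point nmax = 20 are assumptions made here for illustration.

```matlab
% Sketch: check that the convolution of a Poisson (a) PMF and a
% Poisson (b) PMF matches a Poisson (a + b) PMF for n = 0,...,nmax.
% The values of a, b and nmax are illustrative assumptions.
a = 1.5;
b = 2.5;
nmax = 20;
n = 0:nmax;
pj = exp(-a + n*log(a) - gammaln(n+1));    % P_J(0),...,P_J(nmax)
pk = exp(-b + n*log(b) - gammaln(n+1));    % P_K(0),...,P_K(nmax)
pn = conv(pj, pk);                         % entry n+1 is sum_k P_J(n-k)P_K(k)
pn = pn(1:nmax+1);                         % keep n = 0,...,nmax
ptarget = exp(-(a+b) + n*log(a+b) - gammaln(n+1));
maxerr = max(abs(pn - ptarget));
fprintf('largest difference = %g\n', maxerr);
```

Truncating both PMFs at nmax does not affect the first nmax + 1 terms of the convolution, since those terms only involve indices 0 through nmax.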
The PMF of $N$ is the Poisson PMF
$$P_N(n) = \begin{cases} \frac{(\alpha+\beta)^n e^{-(\alpha+\beta)}}{n!} & n = 0, 1, 2, \ldots \\ 0 & \text{otherwise} \end{cases} \tag{6}$$

Problem 6.3.1 Solution

For a constant $a > 0$, a zero mean Laplace random variable $X$ has PDF
$$f_X(x) = \frac{a}{2}e^{-a|x|}, \qquad -\infty < x < \infty. \tag{1}$$
The moment generating function of $X$ is
$$\phi_X(s) = E[e^{sX}] = \frac{a}{2}\int_{-\infty}^{0} e^{sx}e^{ax}\,dx + \frac{a}{2}\int_0^{\infty} e^{sx}e^{-ax}\,dx \tag{2}$$
$$= \frac{a}{2}\,\frac{e^{(s+a)x}}{s+a}\Bigg|_{-\infty}^{0} + \frac{a}{2}\,\frac{e^{(s-a)x}}{s-a}\Bigg|_0^{\infty} \tag{3}$$
$$= \frac{a}{2}\left(\frac{1}{s+a} - \frac{1}{s-a}\right) \tag{4}$$
$$= \frac{a^2}{a^2 - s^2}. \tag{5}$$

Problem 6.3.2 Solution

(a) By summing across the rows of the table, we see that $J$ has PMF
$$P_J(j) = \begin{cases} 0.6 & j = -2 \\ 0.4 & j = -1 \end{cases} \tag{1}$$
The MGF of $J$ is $\phi_J(s) = E[e^{sJ}] = 0.6e^{-2s} + 0.4e^{-s}$.

(b) Summing down the columns of the table, we see that $K$ has PMF
$$P_K(k) = \begin{cases} 0.7 & k = -1 \\ 0.2 & k = 0 \\ 0.1 & k = 1 \end{cases} \tag{2}$$
The MGF of $K$ is $\phi_K(s) = 0.7e^{-s} + 0.2 + 0.1e^{s}$.

(c) To find the PMF of $M = J + K$, it is easiest to annotate each entry in the table with the corresponding value of $M$:
$$\begin{array}{c|ccc} P_{J,K}(j,k) & k = -1 & k = 0 & k = 1 \\ \hline j = -2 & 0.42\ (M = -3) & 0.12\ (M = -2) & 0.06\ (M = -1) \\ j = -1 & 0.28\ (M = -2) & 0.08\ (M = -1) & 0.04\ (M = 0) \end{array} \tag{3}$$
We obtain $P_M(m)$ by summing over all $j, k$ such that $j + k = m$, yielding
$$P_M(m) = \begin{cases} 0.42 & m = -3 \\ 0.40 & m = -2 \\ 0.14 & m = -1 \\ 0.04 & m = 0 \end{cases} \tag{4}$$

(d) One way to solve this problem is to find the MGF $\phi_M(s)$ and then take four derivatives. Sometimes it's better to just work with the definition of $E[M^4]$:
$$E[M^4] = \sum_m P_M(m)\,m^4 \tag{5}$$
$$= 0.42(-3)^4 + 0.40(-2)^4 + 0.14(-1)^4 + 0.04(0)^4 = 40.56. \tag{6}$$
As best I can tell, the purpose of this problem is to check that you know when not to use the methods in this chapter.

Problem 6.3.3 Solution

We find the MGF by calculating $E[e^{sX}]$ from the PDF $f_X(x)$.
$$\phi_X(s) = E[e^{sX}] = \int_a^b e^{sx}\frac{1}{b-a}\,dx = \frac{e^{bs} - e^{as}}{s(b-a)}. \tag{1}$$
Now to find the first moment, we evaluate the derivative of $\phi_X(s)$ at $s = 0$.
$$E[X] = \frac{d\phi_X(s)}{ds}\Bigg|_{s=0} = \frac{s\left[be^{bs} - ae^{as}\right] - \left[e^{bs} - e^{as}\right]}{(b-a)s^2}\Bigg|_{s=0}. \tag{2}$$
Direct evaluation of the above expression at $s = 0$ yields $0/0$ so we must apply l'Hôpital's rule and differentiate the numerator and denominator.
$$E[X] = \lim_{s\to 0}\frac{be^{bs} - ae^{as} + s\left[b^2e^{bs} - a^2e^{as}\right] - \left[be^{bs} - ae^{as}\right]}{2(b-a)s} \tag{3}$$
$$= \lim_{s\to 0}\frac{b^2e^{bs} - a^2e^{as}}{2(b-a)} = \frac{b+a}{2}. \tag{4}$$
To find the second moment of $X$, we first find that the second derivative of $\phi_X(s)$ is
$$\frac{d^2\phi_X(s)}{ds^2} = \frac{s^2\left[b^2e^{bs} - a^2e^{as}\right] - 2s\left[be^{bs} - ae^{as}\right] + 2\left[e^{bs} - e^{as}\right]}{(b-a)s^3}. \tag{5}$$
Substituting $s = 0$ will yield $0/0$ so once again we apply l'Hôpital's rule and differentiate the numerator and denominator.
$$E[X^2] = \lim_{s\to 0}\frac{d^2\phi_X(s)}{ds^2} = \lim_{s\to 0}\frac{s^2\left[b^3e^{bs} - a^3e^{as}\right]}{3(b-a)s^2} \tag{6}$$
$$= \frac{b^3 - a^3}{3(b-a)} = (b^2 + ab + a^2)/3. \tag{7}$$
In this case, it is probably simpler to find these moments without using the MGF.

Problem 6.3.4 Solution

The moment generating function of $X$ is $\phi_X(s) = e^{\sigma^2s^2/2}$. We can find the $n$th moment of $X$, $E[X^n]$, by taking the $n$th derivative of $\phi_X(s)$ and setting $s = 0$.
$$E[X] = \sigma^2 s\,e^{\sigma^2s^2/2}\Big|_{s=0} = 0, \tag{1}$$
$$E[X^2] = \left(\sigma^2 e^{\sigma^2s^2/2} + \sigma^4s^2e^{\sigma^2s^2/2}\right)\Big|_{s=0} = \sigma^2. \tag{2}$$
Continuing in this manner we find that
$$E[X^3] = \left(3\sigma^4 s + \sigma^6 s^3\right)e^{\sigma^2s^2/2}\Big|_{s=0} = 0, \tag{3}$$
$$E[X^4] = \left(3\sigma^4 + 6\sigma^6s^2 + \sigma^8s^4\right)e^{\sigma^2s^2/2}\Big|_{s=0} = 3\sigma^4. \tag{4}$$
To calculate the moments of $Y$, we define $Y = X + \mu$ so that $Y$ is Gaussian $(\mu, \sigma)$. In this case the second moment of $Y$ is
$$E[Y^2] = E[(X+\mu)^2] = E[X^2 + 2\mu X + \mu^2] = \sigma^2 + \mu^2. \tag{5}$$
Similarly, the third moment of $Y$ is
$$E[Y^3] = E[(X+\mu)^3] \tag{6}$$
$$= E[X^3 + 3\mu X^2 + 3\mu^2X + \mu^3] = 3\mu\sigma^2 + \mu^3. \tag{7}$$
Finally, the fourth moment of $Y$ is
$$E[Y^4] = E[(X+\mu)^4] \tag{8}$$
$$= E[X^4 + 4\mu X^3 + 6\mu^2X^2 + 4\mu^3X + \mu^4] \tag{9}$$
$$= 3\sigma^4 + 6\mu^2\sigma^2 + \mu^4. \tag{10}$$

Problem 6.3.5 Solution

The PMF of $K$ is
$$P_K(k) = \begin{cases} 1/n & k = 1, 2, \ldots, n \\ 0 & \text{otherwise} \end{cases} \tag{1}$$
The corresponding MGF of $K$ is
$$\phi_K(s) = E[e^{sK}] = \frac{1}{n}\left(e^s + e^{2s} + \cdots + e^{ns}\right) \tag{2}$$
$$= \frac{e^s}{n}\left(1 + e^s + e^{2s} + \cdots + e^{(n-1)s}\right) \tag{3}$$
$$= \frac{e^s(e^{ns} - 1)}{n(e^s - 1)}. \tag{4}$$
We can evaluate the moments of $K$ by taking derivatives of the MGF. Some algebra will show that
$$\frac{d\phi_K(s)}{ds} = \frac{ne^{(n+2)s} - (n+1)e^{(n+1)s} + e^s}{n(e^s - 1)^2}. \tag{5}$$
Evaluating $d\phi_K(s)/ds$ at $s = 0$ yields $0/0$.
Hence, we apply l'Hôpital's rule twice (by twice differentiating the numerator and twice differentiating the denominator) when we write
$$\frac{d\phi_K(s)}{ds}\Bigg|_{s=0} = \lim_{s\to 0}\frac{n(n+2)e^{(n+2)s} - (n+1)^2e^{(n+1)s} + e^s}{2ne^s(e^s - 1)} \tag{6}$$
$$= \lim_{s\to 0}\frac{n(n+2)^2e^{(n+2)s} - (n+1)^3e^{(n+1)s} + e^s}{2ne^s(2e^s - 1)} = (n+1)/2. \tag{7}$$
A significant amount of algebra will show that the second derivative of the MGF is
$$\frac{d^2\phi_K(s)}{ds^2} = \frac{n^2e^{(n+3)s} - (2n^2 + 2n - 1)e^{(n+2)s} + (n+1)^2e^{(n+1)s} - e^{2s} - e^s}{n(e^s - 1)^3}. \tag{8}$$
Evaluating $d^2\phi_K(s)/ds^2$ at $s = 0$ yields $0/0$. Because $(e^s - 1)^3$ appears in the denominator, we need to use l'Hôpital's rule three times to obtain our answer.
$$\frac{d^2\phi_K(s)}{ds^2}\Bigg|_{s=0} = \lim_{s\to 0}\frac{n^2(n+3)^3e^{(n+3)s} - (2n^2 + 2n - 1)(n+2)^3e^{(n+2)s} + (n+1)^5e^{(n+1)s} - 8e^{2s} - e^s}{n\left(27e^{3s} - 24e^{2s} + 3e^s\right)} \tag{9}$$
$$= \frac{n^2(n+3)^3 - (2n^2 + 2n - 1)(n+2)^3 + (n+1)^5 - 9}{6n} \tag{10}$$
$$= (2n+1)(n+1)/6. \tag{11}$$
We can use these results to derive two well known results. We observe that we can directly use the PMF $P_K(k)$ to calculate the moments
$$E[K] = \frac{1}{n}\sum_{k=1}^{n} k, \qquad E[K^2] = \frac{1}{n}\sum_{k=1}^{n} k^2. \tag{12}$$
Using the answers we found for $E[K]$ and $E[K^2]$, we have the formulas
$$\sum_{k=1}^{n} k = \frac{n(n+1)}{2}, \qquad \sum_{k=1}^{n} k^2 = \frac{n(n+1)(2n+1)}{6}. \tag{13}$$

Problem 6.4.1 Solution

$N$ is a binomial $(n = 100, p = 0.4)$ random variable. $M$ is a binomial $(n = 50, p = 0.4)$ random variable. Thus $N$ is the sum of 100 independent Bernoulli $(p = 0.4)$ random variables and $M$ is the sum of 50 independent Bernoulli $(p = 0.4)$ random variables. Since $M$ and $N$ are independent, $L = M + N$ is the sum of 150 independent Bernoulli $(p = 0.4)$ random variables. Hence $L$ is a binomial $(n = 150, p = 0.4)$ random variable and has PMF
$$P_L(l) = \binom{150}{l}(0.4)^l(0.6)^{150-l}. \tag{1}$$

Problem 6.4.2 Solution

Random variable $Y$ has the moment generating function $\phi_Y(s) = 1/(1-s)$. Random variable $V$ has the moment generating function $\phi_V(s) = 1/(1-s)^4$. $Y$ and $V$ are independent. $W = Y + V$.

(a) From Table 6.1, $Y$ is an exponential $(\lambda = 1)$ random variable.
Example 6.5 derives the moments of an exponential $(\lambda)$ random variable. For $\lambda = 1$, the moments of Y are
\[ E[Y] = 1, \qquad E[Y^2] = 2, \qquad E[Y^3] = 3! = 6. \tag{1} \]

(b) Since Y and V are independent, $W = Y + V$ has MGF
\[ \phi_W(s) = \phi_Y(s)\phi_V(s) = \left(\frac{1}{1-s}\right)\left(\frac{1}{1-s}\right)^4 = \left(\frac{1}{1-s}\right)^5. \tag{2} \]
W is the sum of five independent exponential $(\lambda = 1)$ random variables $X_1, \ldots, X_5$. (That is, W is an Erlang $(n = 5, \lambda = 1)$ random variable.) Each $X_i$ has expected value $E[X] = 1$ and variance $\operatorname{Var}[X] = 1$. From Theorem 6.1 and Theorem 6.3,
\[ E[W] = 5E[X] = 5, \qquad \operatorname{Var}[W] = 5\operatorname{Var}[X] = 5. \tag{3} \]
It follows that
\[ E[W^2] = \operatorname{Var}[W] + (E[W])^2 = 5 + 25 = 30. \tag{4} \]

Problem 6.4.3 Solution

In the iid random sequence $K_1, K_2, \ldots$, each $K_i$ has PMF
\[ P_K(k) = \begin{cases} 1-p & k = 0, \\ p & k = 1, \\ 0 & \text{otherwise.} \end{cases} \tag{1} \]

(a) The MGF of K is $\phi_K(s) = E[e^{sK}] = 1 - p + pe^s$.

(b) By Theorem 6.8, $M = K_1 + K_2 + \cdots + K_n$ has MGF
\[ \phi_M(s) = [\phi_K(s)]^n = [1 - p + pe^s]^n. \tag{2} \]

(c) Although we could just use the fact that the expectation of the sum equals the sum of the expectations, the problem asks us to find the moments using $\phi_M(s)$. In this case,
\[ E[M] = \left.\frac{d\phi_M(s)}{ds}\right|_{s=0} = \left. n(1-p+pe^s)^{n-1}pe^s\right|_{s=0} = np. \tag{3} \]
The second moment of M can be found via
\[ E[M^2] = \left.\frac{d^2\phi_M(s)}{ds^2}\right|_{s=0} \tag{4} \]
\[ = \left. np\left[(n-1)(1-p+pe^s)^{n-2}pe^{2s} + (1-p+pe^s)^{n-1}e^s\right]\right|_{s=0} \tag{5} \]
\[ = np[(n-1)p + 1]. \tag{6} \]
The variance of M is
\[ \operatorname{Var}[M] = E[M^2] - (E[M])^2 = np(1-p) = n\operatorname{Var}[K]. \tag{7} \]

Problem 6.4.4 Solution

Based on the problem statement, the number of points $X_i$ that you earn for game i has PMF
\[ P_{X_i}(x) = \begin{cases} 1/3 & x = 0, 1, 2, \\ 0 & \text{otherwise.} \end{cases} \tag{1} \]

(a) The MGF of $X_i$ is
\[ \phi_{X_i}(s) = E[e^{sX_i}] = 1/3 + e^s/3 + e^{2s}/3. \tag{2} \]
Since $Y = X_1 + \cdots + X_n$, Theorem 6.8 implies
\[ \phi_Y(s) = [\phi_{X_i}(s)]^n = [1 + e^s + e^{2s}]^n/3^n. \tag{3} \]

(b) First we observe that the first and second moments of $X_i$ are
\[ E[X_i] = \sum_x xP_{X_i}(x) = 1/3 + 2/3 = 1, \tag{4} \]
\[ E[X_i^2] = \sum_x x^2P_{X_i}(x) = 1^2/3 + 2^2/3 = 5/3. \tag{5} \]
Hence,
\[ \operatorname{Var}[X_i] = E[X_i^2] - (E[X_i])^2 = 2/3. \tag{6} \]
By Theorems 6.1 and 6.3, the mean and variance of Y are
\[ E[Y] = nE[X] = n, \tag{7} \]
\[ \operatorname{Var}[Y] = n\operatorname{Var}[X] = 2n/3. \tag{8} \]
Another, more complicated, way to find the mean and variance is to evaluate derivatives of $\phi_Y(s)$ at $s = 0$.

Problem 6.4.5 Solution

Each $K_i$ has the Poisson PMF
\[ P_{K_i}(k) = \begin{cases} 2^ke^{-2}/k! & k = 0, 1, 2, \ldots, \\ 0 & \text{otherwise,} \end{cases} \tag{1} \]
and we let $R_i = K_1 + K_2 + \cdots + K_i$.

(a) From Table 6.1, we find that the Poisson $(\alpha = 2)$ random variable K has MGF $\phi_K(s) = e^{2(e^s-1)}$.

(b) The MGF of $R_i$ is the product of the MGFs of the $K_i$'s.
\[ \phi_{R_i}(s) = \prod_{n=1}^i \phi_K(s) = e^{2i(e^s-1)}. \tag{2} \]

(c) The MGF of $R_i$ has the same form as that of a Poisson random variable with parameter $\alpha = 2i$. Therefore we can conclude that $R_i$ is in fact a Poisson random variable with parameter $\alpha = 2i$. That is,
\[ P_{R_i}(r) = \begin{cases} (2i)^re^{-2i}/r! & r = 0, 1, 2, \ldots, \\ 0 & \text{otherwise.} \end{cases} \tag{3} \]

(d) Because $R_i$ is a Poisson random variable with parameter $\alpha = 2i$, the mean and variance of $R_i$ are both $2i$.

Problem 6.4.6 Solution

The total energy stored over the 31 days is
\[ Y = X_1 + X_2 + \cdots + X_{31}. \tag{1} \]
The random variables $X_1, \ldots, X_{31}$ are Gaussian and independent but not identically distributed. However, since the sum of independent Gaussian random variables is Gaussian, we know that Y is Gaussian. Hence, all we need to do is find the mean and variance of Y in order to specify the PDF of Y. The mean of Y is
\[ E[Y] = \sum_{i=1}^{31} E[X_i] = \sum_{i=1}^{31}(32 - i/4) = 32(31) - \frac{31(32)}{8} = 868 \text{ kW-hr}. \tag{2} \]
Since each $X_i$ has variance 100 (kW-hr)$^2$, the variance of Y is
\[ \operatorname{Var}[Y] = \operatorname{Var}[X_1] + \cdots + \operatorname{Var}[X_{31}] = 31\operatorname{Var}[X_i] = 3100. \tag{3} \]
Since $E[Y] = 868$ and $\operatorname{Var}[Y] = 3100$, the Gaussian PDF of Y is
\[ f_Y(y) = \frac{1}{\sqrt{6200\pi}}e^{-(y-868)^2/6200}. \tag{4} \]

Problem 6.4.7 Solution

By Theorem 6.8, we know that $\phi_M(s) = [\phi_K(s)]^n$.

(a) The first derivative of $\phi_M(s)$ is
\[ \frac{d\phi_M(s)}{ds} = n[\phi_K(s)]^{n-1}\frac{d\phi_K(s)}{ds}. \tag{1} \]
We can evaluate $d\phi_M(s)/ds$ at $s = 0$ to find $E[M]$.
\[ E[M] = \left.\frac{d\phi_M(s)}{ds}\right|_{s=0} = \left. n[\phi_K(s)]^{n-1}\frac{d\phi_K(s)}{ds}\right|_{s=0} = nE[K]. \tag{2} \]

(b) The second derivative of $\phi_M(s)$ is
\[ \frac{d^2\phi_M(s)}{ds^2} = n(n-1)[\phi_K(s)]^{n-2}\left(\frac{d\phi_K(s)}{ds}\right)^2 + n[\phi_K(s)]^{n-1}\frac{d^2\phi_K(s)}{ds^2}. \tag{3} \]
Evaluating the second derivative at $s = 0$ yields
\[ E[M^2] = \left.\frac{d^2\phi_M(s)}{ds^2}\right|_{s=0} = n(n-1)(E[K])^2 + nE[K^2]. \tag{4} \]

Problem 6.5.1 Solution

(a) From Table 6.1, we see that the exponential random variable X has MGF
\[ \phi_X(s) = \frac{\lambda}{\lambda - s}. \tag{1} \]

(b) Note that K is a geometric random variable identical to the geometric random variable X in Table 6.1 with parameter $p = 1 - q$. From Table 6.1, we know that random variable K has MGF
\[ \phi_K(s) = \frac{(1-q)e^s}{1 - qe^s}. \tag{2} \]
Since K is independent of each $X_i$, $V = X_1 + \cdots + X_K$ is a random sum of random variables. From Theorem 6.12,
\[ \phi_V(s) = \phi_K(\ln\phi_X(s)) = \frac{(1-q)\frac{\lambda}{\lambda-s}}{1 - q\frac{\lambda}{\lambda-s}} = \frac{(1-q)\lambda}{(1-q)\lambda - s}. \tag{3} \]
We see that the MGF of V is that of an exponential random variable with parameter $(1-q)\lambda$. The PDF of V is
\[ f_V(v) = \begin{cases} (1-q)\lambda e^{-(1-q)\lambda v} & v \ge 0, \\ 0 & \text{otherwise.} \end{cases} \tag{4} \]

Problem 6.5.2 Solution

The number N of passes thrown has the Poisson PMF and MGF
\[ P_N(n) = \begin{cases} (30)^ne^{-30}/n! & n = 0, 1, \ldots, \\ 0 & \text{otherwise,} \end{cases} \qquad \phi_N(s) = e^{30(e^s-1)}. \tag{1} \]
Let $X_i = 1$ if pass i is thrown and completed and otherwise $X_i = 0$. The PMF and MGF of each $X_i$ are
\[ P_{X_i}(x) = \begin{cases} 1/3 & x = 0, \\ 2/3 & x = 1, \\ 0 & \text{otherwise,} \end{cases} \qquad \phi_{X_i}(s) = 1/3 + (2/3)e^s. \tag{2} \]
The number of completed passes can be written as the random sum of random variables
\[ K = X_1 + \cdots + X_N. \tag{3} \]
Since each $X_i$ is independent of N, we can use Theorem 6.12 to write
\[ \phi_K(s) = \phi_N(\ln\phi_X(s)) = e^{30(\phi_X(s)-1)} = e^{30(2/3)(e^s-1)}. \tag{4} \]
We see that K has the MGF of a Poisson random variable with mean $E[K] = 30(2/3) = 20$, variance $\operatorname{Var}[K] = 20$, and PMF
\[ P_K(k) = \begin{cases} (20)^ke^{-20}/k! & k = 0, 1, \ldots, \\ 0 & \text{otherwise.} \end{cases} \tag{5} \]

Problem 6.5.3 Solution

In this problem, $Y = X_1 + \cdots + X_N$ is not a straightforward random sum of random variables because N and the $X_i$'s are dependent.
In particular, given $N = n$, we know that there were exactly 100 heads in N flips. Hence, given N, $X_1 + \cdots + X_N = 100$ no matter what the actual value of N is. Hence $Y = 100$ every time and the PMF of Y is
\[ P_Y(y) = \begin{cases} 1 & y = 100, \\ 0 & \text{otherwise.} \end{cases} \tag{1} \]

Problem 6.5.4 Solution

Donovan McNabb's passing yardage is the random sum of random variables
\[ V = Y_1 + \cdots + Y_K \tag{1} \]
where $Y_i$ has the exponential PDF
\[ f_{Y_i}(y) = \begin{cases} \frac{1}{15}e^{-y/15} & y \ge 0, \\ 0 & \text{otherwise.} \end{cases} \tag{2} \]
From Table 6.1, the MGFs of Y and K are
\[ \phi_Y(s) = \frac{1/15}{1/15 - s} = \frac{1}{1-15s}, \qquad \phi_K(s) = e^{20(e^s-1)}. \tag{3} \]
From Theorem 6.12, V has MGF
\[ \phi_V(s) = \phi_K(\ln\phi_Y(s)) = e^{20(\phi_Y(s)-1)} = e^{300s/(1-15s)}. \tag{4} \]
The PDF of V cannot be found in a simple form. However, we can use the MGF to calculate the mean and variance. In particular,
\[ E[V] = \left.\frac{d\phi_V(s)}{ds}\right|_{s=0} = \left. e^{300s/(1-15s)}\frac{300}{(1-15s)^2}\right|_{s=0} = 300, \tag{5} \]
\[ E[V^2] = \left.\frac{d^2\phi_V(s)}{ds^2}\right|_{s=0} \tag{6} \]
\[ = \left.\left[e^{300s/(1-15s)}\left(\frac{300}{(1-15s)^2}\right)^2 + e^{300s/(1-15s)}\frac{9000}{(1-15s)^3}\right]\right|_{s=0} = 99{,}000. \tag{7} \]
Thus, V has variance $\operatorname{Var}[V] = E[V^2] - (E[V])^2 = 9{,}000$ and standard deviation $\sigma_V \approx 94.9$. A second way to calculate the mean and variance of V is to use Theorem 6.13, which says
\[ E[V] = E[K]E[Y] = 20(15) = 300, \tag{8} \]
\[ \operatorname{Var}[V] = E[K]\operatorname{Var}[Y] + \operatorname{Var}[K](E[Y])^2 = (20)15^2 + (20)15^2 = 9000. \tag{9} \]

Problem 6.5.5 Solution

Since each ticket is equally likely to have one of $\binom{46}{6}$ combinations, the probability a ticket is a winner is
\[ q = \frac{1}{\binom{46}{6}}. \tag{1} \]
Let $X_i = 1$ if the ith ticket sold is a winner; otherwise $X_i = 0$. Since the number K of tickets sold has a Poisson PMF with $E[K] = r$, the number of winning tickets is the random sum
\[ V = X_1 + \cdots + X_K. \tag{2} \]
From Appendix A,
\[ \phi_X(s) = (1-q) + qe^s, \qquad \phi_K(s) = e^{r[e^s-1]}. \tag{3} \]
By Theorem 6.12,
\[ \phi_V(s) = \phi_K(\ln\phi_X(s)) = e^{r[\phi_X(s)-1]} = e^{rq(e^s-1)}. \tag{4} \]
Hence, we see that V has the MGF of a Poisson random variable with mean $E[V] = rq$. The PMF of V is
\[ P_V(v) = \begin{cases} (rq)^ve^{-rq}/v! & v = 0, 1, 2, \ldots, \\ 0 & \text{otherwise.} \end{cases} \tag{5} \]

Problem 6.5.6 Solution

(a) We can view K as a shifted geometric random variable. To find the MGF, we start from first principles with Definition 6.1:
\[ \phi_K(s) = \sum_{k=0}^\infty e^{sk}p(1-p)^k = p\sum_{k=0}^\infty [(1-p)e^s]^k = \frac{p}{1-(1-p)e^s}. \tag{1} \]

(b) First, we need to recall that each $X_i$ has MGF $\phi_X(s) = e^{s+s^2/2}$. From Theorem 6.12, the MGF of R is
\[ \phi_R(s) = \phi_K(\ln\phi_X(s)) = \phi_K(s + s^2/2) = \frac{p}{1-(1-p)e^{s+s^2/2}}. \tag{2} \]

(c) To use Theorem 6.13, we first need to calculate the mean and variance of K:
\[ E[K] = \left.\frac{d\phi_K(s)}{ds}\right|_{s=0} = \left.\frac{p(1-p)e^s}{[1-(1-p)e^s]^2}\right|_{s=0} = \frac{1-p}{p}, \tag{3} \]
\[ E[K^2] = \left.\frac{d^2\phi_K(s)}{ds^2}\right|_{s=0} = \left. p(1-p)\frac{[1-(1-p)e^s]e^s + 2(1-p)e^{2s}}{[1-(1-p)e^s]^3}\right|_{s=0} \tag{4} \]
\[ = \frac{(1-p)(2-p)}{p^2}. \tag{5} \]
Hence, $\operatorname{Var}[K] = E[K^2] - (E[K])^2 = (1-p)/p^2$. Finally, we can use Theorem 6.13 to write
\[ \operatorname{Var}[R] = E[K]\operatorname{Var}[X] + (E[X])^2\operatorname{Var}[K] = \frac{1-p}{p} + \frac{1-p}{p^2} = \frac{1-p^2}{p^2}. \tag{6} \]

Problem 6.5.7 Solution

The way to solve for the mean and variance of U is to use conditional expectations. Given $K = k$, $U = X_1 + \cdots + X_k$ and
\[ E[U|K=k] = E[X_1 + \cdots + X_k \mid X_1 + \cdots + X_n = k] \tag{1} \]
\[ = \sum_{i=1}^k E[X_i \mid X_1 + \cdots + X_n = k]. \tag{2} \]
Since $X_i$ is a Bernoulli random variable,
\[ E[X_i \mid X_1 + \cdots + X_n = k] = P\left[X_i = 1 \,\middle|\, \sum_{j=1}^n X_j = k\right] \tag{3} \]
\[ = \frac{P\left[X_i = 1, \sum_{j\ne i} X_j = k-1\right]}{P\left[\sum_{j=1}^n X_j = k\right]}. \tag{4} \]
Note that $\sum_{j=1}^n X_j$ is just a binomial random variable for n trials while $\sum_{j\ne i} X_j$ is a binomial random variable for $n - 1$ trials. In addition, $X_i$ and $\sum_{j\ne i} X_j$ are independent random variables. This implies
\[ E[X_i \mid X_1 + \cdots + X_n = k] = \frac{P[X_i = 1]\,P\left[\sum_{j\ne i} X_j = k-1\right]}{P\left[\sum_{j=1}^n X_j = k\right]} \tag{5} \]
\[ = \frac{p\binom{n-1}{k-1}p^{k-1}(1-p)^{n-1-(k-1)}}{\binom{n}{k}p^k(1-p)^{n-k}} = \frac{k}{n}. \tag{6} \]
A second way is to argue that symmetry implies $E[X_i \mid X_1 + \cdots + X_n = k] = \gamma$, the same for each i. In this case,
\[ n\gamma = \sum_{i=1}^n E[X_i \mid X_1 + \cdots + X_n = k] = E[X_1 + \cdots + X_n \mid X_1 + \cdots + X_n = k] = k. \tag{7} \]
Thus $\gamma = k/n$.
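The value γ = k/n can also be checked numerically. The following sketch evaluates the ratio of binomial probabilities from Equation (6) for every k and compares it to k/n. The values n = 10 and p = 0.3 and the variable names are illustrative assumptions, not part of the problem; only the base Matlab function nchoosek is used.

```matlab
% Sketch: check E[X_i | X_1+...+X_n = k] = k/n for iid Bernoulli (p) trials.
% n = 10 and p = 0.3 are illustrative assumptions, not values from the problem.
n = 10; p = 0.3;
g = zeros(1, n);                 % g(k) holds the conditional expectation given sum k
for k = 1:n
    % numerator: P[X_i = 1] * P[the other n-1 trials contain k-1 ones]
    num = p * nchoosek(n-1, k-1) * p^(k-1) * (1-p)^(n-k);
    % denominator: P[X_1 + ... + X_n = k]
    den = nchoosek(n, k) * p^k * (1-p)^(n-k);
    g(k) = num / den;
end
maxerr = max(abs(g - (1:n)/n));  % largest deviation from k/n over k = 1..n
disp(maxerr)
```

Changing p should leave g unchanged, which reflects the fact that the conditional expectation does not depend on p.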
At any rate, the conditional mean of U is
\[ E[U|K=k] = \sum_{i=1}^k E[X_i \mid X_1 + \cdots + X_n = k] = \sum_{i=1}^k \frac{k}{n} = \frac{k^2}{n}. \tag{8} \]
This says that the random variable $E[U|K] = K^2/n$. Using iterated expectations, we have
\[ E[U] = E[E[U|K]] = E[K^2/n]. \tag{9} \]
Since K is a binomial random variable, we know that $E[K] = np$ and $\operatorname{Var}[K] = np(1-p)$. Thus,
\[ E[U] = \frac{1}{n}E[K^2] = \frac{1}{n}\left(\operatorname{Var}[K] + (E[K])^2\right) = p(1-p) + np^2. \tag{10} \]
On the other hand, V is just an ordinary random sum of independent random variables, and its mean is $E[V] = E[X]E[M] = np^2$.

Problem 6.5.8 Solution

Using N to denote the number of games played, we can write the total number of points earned as the random sum
\[ Y = X_1 + X_2 + \cdots + X_N. \tag{1} \]

(a) It is tempting to use Theorem 6.12 to find $\phi_Y(s)$; however, this would be wrong since each $X_i$ is not independent of N. In this problem, we must start from first principles using iterated expectations.
\[ \phi_Y(s) = E\left[E\left[e^{s(X_1+\cdots+X_N)} \mid N\right]\right] = \sum_{n=1}^\infty P_N(n)\,E\left[e^{s(X_1+\cdots+X_n)} \mid N = n\right]. \tag{2} \]
Given $N = n$, $X_1, \ldots, X_n$ are independent so that
\[ E\left[e^{s(X_1+\cdots+X_n)} \mid N = n\right] = E\left[e^{sX_1} \mid N = n\right]E\left[e^{sX_2} \mid N = n\right]\cdots E\left[e^{sX_n} \mid N = n\right]. \tag{3} \]
Given $N = n$, we know that games 1 through $n - 1$ were either wins or ties and that game n was a loss. That is, given $N = n$, $X_n = 0$ and for $i < n$, $X_i \ne 0$. Moreover, for $i < n$, $X_i$ has the conditional PMF
\[ P_{X_i|N=n}(x) = P_{X_i|X_i\ne 0}(x) = \begin{cases} 1/2 & x = 1, 2, \\ 0 & \text{otherwise.} \end{cases} \tag{4} \]
These facts imply
\[ E\left[e^{sX_n} \mid N = n\right] = e^0 = 1 \tag{5} \]
and that for $i < n$,
\[ E\left[e^{sX_i} \mid N = n\right] = (1/2)e^s + (1/2)e^{2s} = e^s/2 + e^{2s}/2. \tag{6} \]
Now we can find the MGF of Y.
\[ \phi_Y(s) = \sum_{n=1}^\infty P_N(n)\,E\left[e^{sX_1} \mid N = n\right]E\left[e^{sX_2} \mid N = n\right]\cdots E\left[e^{sX_n} \mid N = n\right] \tag{7} \]
\[ = \sum_{n=1}^\infty P_N(n)\left[e^s/2 + e^{2s}/2\right]^{n-1} = \frac{1}{e^s/2 + e^{2s}/2}\sum_{n=1}^\infty P_N(n)\left[e^s/2 + e^{2s}/2\right]^n. \tag{8} \]
It follows that
\[ \phi_Y(s) = \frac{1}{e^s/2 + e^{2s}/2}\sum_{n=1}^\infty P_N(n)\,e^{n\ln[(e^s+e^{2s})/2]} = \frac{\phi_N(\ln[e^s/2 + e^{2s}/2])}{e^s/2 + e^{2s}/2}. \tag{9} \]
The tournament ends as soon as you lose a game.
Since each game is a loss with probability 1/3 independent of any previous game, the number of games played has the geometric PMF and corresponding MGF
\[ P_N(n) = \begin{cases} (2/3)^{n-1}(1/3) & n = 1, 2, \ldots, \\ 0 & \text{otherwise,} \end{cases} \qquad \phi_N(s) = \frac{(1/3)e^s}{1 - (2/3)e^s}. \tag{10} \]
Thus, the MGF of Y is
\[ \phi_Y(s) = \frac{1/3}{1 - (e^s + e^{2s})/3}. \tag{11} \]

(b) To find the moments of Y, we evaluate the derivatives of the MGF $\phi_Y(s)$. Since
\[ \frac{d\phi_Y(s)}{ds} = \frac{e^s + 2e^{2s}}{9\left[1 - e^s/3 - e^{2s}/3\right]^2}, \tag{12} \]
we see that
\[ E[Y] = \left.\frac{d\phi_Y(s)}{ds}\right|_{s=0} = \frac{3}{9(1/3)^2} = 3. \tag{13} \]
If you're curious, you may notice that $E[Y] = 3$ precisely equals $E[N]E[X_i]$, the answer you would get if you mistakenly assumed that N and each $X_i$ were independent. Although this may seem like a coincidence, it's actually the result of a theorem known as Wald's equality. The second derivative of the MGF is
\[ \frac{d^2\phi_Y(s)}{ds^2} = \frac{(1 - e^s/3 - e^{2s}/3)(e^s + 4e^{2s}) + 2(e^s + 2e^{2s})^2/3}{9(1 - e^s/3 - e^{2s}/3)^3}. \tag{14} \]
The second moment of Y is
\[ E[Y^2] = \left.\frac{d^2\phi_Y(s)}{ds^2}\right|_{s=0} = \frac{5/3 + 6}{1/3} = 23. \tag{15} \]
The variance of Y is $\operatorname{Var}[Y] = E[Y^2] - (E[Y])^2 = 23 - 9 = 14$.

Problem 6.6.1 Solution

We know that the waiting time W is uniformly distributed on [0, 10] and therefore has the following PDF.
\[ f_W(w) = \begin{cases} 1/10 & 0 \le w \le 10, \\ 0 & \text{otherwise.} \end{cases} \tag{1} \]
We also know that the total time is 3 milliseconds plus the waiting time, that is, $X = W + 3$.

(a) The expected value of X is $E[X] = E[W + 3] = E[W] + 3 = 5 + 3 = 8$.

(b) The variance of X is $\operatorname{Var}[X] = \operatorname{Var}[W + 3] = \operatorname{Var}[W] = 25/3$.

(c) The expected value of A is $E[A] = 12E[X] = 96$.

(d) The standard deviation of A is $\sigma_A = \sqrt{\operatorname{Var}[A]} = \sqrt{12(25/3)} = 10$.

(e) $P[A > 116] \approx 1 - \Phi\left(\frac{116-96}{10}\right) = 1 - \Phi(2) = 0.02275$.

(f) $P[A < 86] \approx \Phi\left(\frac{86-96}{10}\right) = \Phi(-1) = 1 - \Phi(1) = 0.1587$.

Problem 6.6.2 Solution

Knowing that the probability that a voice call occurs is 0.8 and the probability that a data call occurs is 0.2, we can define the random variable $D_i$ as the number of data calls in a single telephone call.
For any i there are only two possible values for $D_i$, namely 0 and 1. Furthermore, for all i the $D_i$'s are independent and identically distributed with the following PMF.
\[ P_D(d) = \begin{cases} 0.8 & d = 0, \\ 0.2 & d = 1, \\ 0 & \text{otherwise.} \end{cases} \tag{1} \]
From the above we can determine that
\[ E[D] = 0.2, \qquad \operatorname{Var}[D] = 0.2 - 0.04 = 0.16. \tag{2} \]
With these facts, we can answer the questions posed by the problem.

(a) $E[K_{100}] = 100E[D] = 20$.

(b) $\operatorname{Var}[K_{100}] = 100\operatorname{Var}[D] = 16$, so the standard deviation is $\sigma_{K_{100}} = \sqrt{16} = 4$.

(c) $P[K_{100} \ge 18] \approx 1 - \Phi\left(\frac{18-20}{4}\right) = 1 - \Phi(-1/2) = \Phi(1/2) = 0.6915$.

(d) $P[16 \le K_{100} \le 24] \approx \Phi\left(\frac{24-20}{4}\right) - \Phi\left(\frac{16-20}{4}\right) = \Phi(1) - \Phi(-1) = 2\Phi(1) - 1 = 0.6826$.

Problem 6.6.3 Solution

(a) Let $X_1, \ldots, X_{120}$ denote the set of call durations (measured in minutes) during the month. From the problem statement, each $X_i$ is an exponential $(\lambda)$ random variable with $E[X_i] = 1/\lambda = 2.5$ min and $\operatorname{Var}[X_i] = 1/\lambda^2 = 6.25$ min$^2$. The total number of minutes used during the month is $Y = X_1 + \cdots + X_{120}$. By Theorem 6.1 and Theorem 6.3,
\[ E[Y] = 120E[X_i] = 300, \qquad \operatorname{Var}[Y] = 120\operatorname{Var}[X_i] = 750. \tag{1} \]
The subscriber's bill is $30 + 0.4(y - 300)^+$ where $x^+ = x$ if $x \ge 0$ or $x^+ = 0$ if $x < 0$. The subscriber's bill is exactly \$36 if $Y = 315$. The probability the subscriber's bill exceeds \$36 equals
\[ P[Y > 315] = P\left[\frac{Y - 300}{\sigma_Y} > \frac{315 - 300}{\sigma_Y}\right] \approx Q\left(\frac{15}{\sqrt{750}}\right) = 0.2919. \tag{2} \]

(b) If the actual call duration is $X_i$, the subscriber is billed for $M_i = \lceil X_i \rceil$ minutes. Because each $X_i$ is an exponential $(\lambda)$ random variable, Theorem 3.9 says that $M_i$ is a geometric $(p)$ random variable with $p = 1 - e^{-\lambda} = 0.3297$. Since $M_i$ is geometric,
\[ E[M_i] = \frac{1}{p} = 3.033, \qquad \operatorname{Var}[M_i] = \frac{1-p}{p^2} = 6.167. \tag{3} \]
The number of billed minutes in the month is $B = M_1 + \cdots + M_{120}$. Since $M_1, \ldots, M_{120}$ are iid random variables,
\[ E[B] = 120E[M_i] = 364.0, \qquad \operatorname{Var}[B] = 120\operatorname{Var}[M_i] = 740.08. \tag{4} \]
Similar to part (a), the subscriber is billed \$36 if $B = 315$ minutes.
The probability the subscriber is billed more than \$36 is
\[ P[B > 315] = P\left[\frac{B - 364}{\sqrt{740.08}} > \frac{315 - 364}{\sqrt{740.08}}\right] \approx Q(-1.8) = \Phi(1.8) = 0.964. \tag{5} \]

Problem 6.7.1 Solution

In Problem 6.2.6, we learned that a sum of iid Poisson random variables is a Poisson random variable. Hence $W_n$ is a Poisson random variable with mean $E[W_n] = nE[K] = n$. Thus $W_n$ has variance $\operatorname{Var}[W_n] = n$ and PMF
\[ P_{W_n}(w) = \begin{cases} n^we^{-n}/w! & w = 0, 1, 2, \ldots, \\ 0 & \text{otherwise.} \end{cases} \tag{1} \]
All of this implies that we can exactly calculate
\[ P[W_n = n] = P_{W_n}(n) = n^ne^{-n}/n!. \tag{2} \]
Since we can perform the exact calculation, using a central limit theorem may seem silly; however, calculating $n^n$ or $n!$ is difficult for large n. Moreover, it's interesting to see how good the approximation is. In this case, the approximation is
\[ P[W_n = n] = P[n \le W_n \le n] \approx \Phi\left(\frac{n + 0.5 - n}{\sqrt{n}}\right) - \Phi\left(\frac{n - 0.5 - n}{\sqrt{n}}\right) = 2\Phi\left(\frac{1}{2\sqrt{n}}\right) - 1. \tag{3} \]
The comparison of the exact calculation and the approximation is given in the following table.

    P[W_n = n]     n = 1     n = 4     n = 16    n = 64
    exact          0.3679    0.1954    0.0992    0.0498
    approximate    0.3829    0.1974    0.0995    0.0498

Problem 6.7.2 Solution

(a) Since the number of requests N has expected value $E[N] = 300$ and variance $\operatorname{Var}[N] = 300$, we need C to satisfy
\[ P[N > C] = P\left[\frac{N - 300}{\sqrt{300}} > \frac{C - 300}{\sqrt{300}}\right] \tag{1} \]
\[ \approx 1 - \Phi\left(\frac{C - 300}{\sqrt{300}}\right) = 0.05. \tag{2} \]
From Table 3.1, we note that $\Phi(1.65) = 0.9505$. Thus,
\[ C = 300 + 1.65\sqrt{300} = 328.6. \tag{3} \]

(b) For $C = 328.6$, the exact probability of overload is
\[ P[N > C] = 1 - P[N \le 328] = 1 - \texttt{poissoncdf(300,328)} = 0.0516, \tag{4} \]
which shows the central limit theorem approximation is reasonable.

(c) This part of the problem could be stated more carefully. Re-examining Definition 2.10 for the Poisson random variable and the accompanying discussion in Chapter 2, we observe that the webserver has an arrival rate of $\lambda = 300$ hits/min, or equivalently $\lambda = 5$ hits/sec. Thus in a one-second interval, the number of requests $N'$ is a Poisson $(\alpha = 5)$ random variable.
However, since the server "capacity" in a one-second interval is not precisely defined, we will make the somewhat arbitrary definition that the server capacity is $C' = 328.6/60 = 5.477$ packets/sec. With this somewhat arbitrary definition, the probability of overload in a one-second interval is
\[ P[N' > C'] = 1 - P[N' \le 5.477] = 1 - P[N' \le 5]. \tag{5} \]
Because the number of arrivals in the interval is small, it would be a mistake to use the Central Limit Theorem to estimate this overload probability. However, the direct calculation of the overload probability is not hard. For $E[N'] = \alpha = 5$,
\[ 1 - P[N' \le 5] = 1 - \sum_{n=0}^5 P_{N'}(n) = 1 - e^{-\alpha}\sum_{n=0}^5 \frac{\alpha^n}{n!} = 0.3840. \tag{6} \]

(d) Here we find the smallest C such that $P[N' \le C] \ge 0.95$. From the previous step, we know that $C > 5$. Since $N'$ is a Poisson $(\alpha = 5)$ random variable, we need to find the smallest C such that
\[ P[N' \le C] = \sum_{n=0}^C \alpha^ne^{-\alpha}/n! \ge 0.95. \tag{7} \]
Some experiments with poissoncdf(alpha,c) will show that $P[N' \le 8] = 0.9319$ while $P[N' \le 9] = 0.9682$. Hence $C = 9$.

(e) If we use the Central Limit Theorem to estimate the overload probability in a one-second interval, we would use the facts that $E[N'] = 5$ and $\operatorname{Var}[N'] = 5$ to estimate the overload probability as
\[ 1 - P[N' \le 5] = 1 - \Phi\left(\frac{5 - 5}{\sqrt{5}}\right) = 0.5, \tag{8} \]
which overestimates the overload probability by roughly 30 percent. We recall from Chapter 2 that a Poisson random variable is the limiting case of the $(n, p)$ binomial random variable when n is large and $np = \alpha$. In general, for fixed p, the Poisson and binomial PMFs become closer as n increases. Since large n is also the case for which the central limit theorem applies, it is not surprising that the CLT approximation for the Poisson $(\alpha)$ CDF is better when $\alpha = np$ is large.

Comment: Perhaps a more interesting question is why the overload probability in a one-second interval is so much higher than that in a one-minute interval.
To answer this, consider a T-second interval in which the number of requests $N_T$ is a Poisson $(\lambda T)$ random variable while the server capacity is $cT$ hits. In the earlier problem parts, $c = 5.477$ hits/sec. We make the assumption that the server system is reasonably well-engineered in that $c > \lambda$. (We will learn in Chapter 12 that to assume otherwise means that the backlog of requests will grow without bound.) Further, assuming T is fairly large, we use the CLT to estimate the probability of overload in a T-second interval as
\[ P[N_T \ge cT] = P\left[\frac{N_T - \lambda T}{\sqrt{\lambda T}} \ge \frac{cT - \lambda T}{\sqrt{\lambda T}}\right] \approx Q\left(k\sqrt{T}\right), \tag{9} \]
where $k = (c - \lambda)/\sqrt{\lambda}$. As long as $c > \lambda$, the overload probability decreases with increasing T. In fact, the overload probability goes rapidly to zero as T becomes large. The reason is that the gap $cT - \lambda T$ between server capacity $cT$ and the expected number of requests $\lambda T$ grows linearly in T while the standard deviation of the number of requests grows proportional to $\sqrt{T}$. However, one should add that the definition of a T-second overload is somewhat arbitrary. In fact, one can argue that as T becomes large, the requirement for no overloads simply becomes less stringent. In Chapter 12, we will learn techniques to analyze a system such as this webserver in terms of the average backlog of requests and the average delay in serving a request. These statistics won't depend on a particular time period T and perhaps better describe the system performance.

Problem 6.7.3 Solution

(a) The number of tests L needed to identify 500 acceptable circuits is a Pascal $(k = 500, p = 0.8)$ random variable, which has expected value $E[L] = k/p = 625$ tests.

(b) Let K denote the number of acceptable circuits in $n = 600$ tests. Since K is binomial $(n = 600, p = 0.8)$, $E[K] = np = 480$ and $\operatorname{Var}[K] = np(1-p) = 96$. Using the CLT, we estimate the probability of finding at least 500 acceptable circuits as
\[ P[K \ge 500] = P\left[\frac{K - 480}{\sqrt{96}} \ge \frac{20}{\sqrt{96}}\right] \approx Q\left(\frac{20}{\sqrt{96}}\right) = 0.0206. \tag{1} \]

(c) Using Matlab, we observe that

    >> 1.0-binomialcdf(600,0.8,499)
    ans =
        0.0215

(d) We need to find the smallest value of n such that the binomial $(n, p)$ random variable K satisfies $P[K \ge 500] \ge 0.9$. Since $E[K] = np$ and $\operatorname{Var}[K] = np(1-p)$, the CLT approximation yields
\[ P[K \ge 500] = P\left[\frac{K - np}{\sqrt{np(1-p)}} \ge \frac{500 - np}{\sqrt{np(1-p)}}\right] \approx 1 - \Phi(z) = 0.90, \tag{2} \]
where $z = (500 - np)/\sqrt{np(1-p)}$. It follows that $1 - \Phi(z) = \Phi(-z) \ge 0.9$, implying $z = -1.29$. Since $p = 0.8$, we have that
\[ np - 500 = 1.29\sqrt{np(1-p)}. \tag{3} \]
Equivalently, for $p = 0.8$, solving the quadratic equation
\[ \left(n - \frac{500}{p}\right)^2 = (1.29)^2\,\frac{1-p}{p}\,n \tag{4} \]
we obtain $n = 641.3$. Thus we should test $n = 642$ circuits.

Problem 6.8.1 Solution

The N[0, 1] random variable Z has MGF $\phi_Z(s) = e^{s^2/2}$. Hence the Chernoff bound for Z is
\[ P[Z \ge c] \le \min_{s\ge 0} e^{-sc}e^{s^2/2} = \min_{s\ge 0} e^{s^2/2 - sc}. \tag{1} \]
We can minimize $e^{s^2/2 - sc}$ by minimizing the exponent $s^2/2 - sc$. By setting
\[ \frac{d}{ds}\left(s^2/2 - sc\right) = s - c = 0 \tag{2} \]
we obtain $s = c$. At $s = c$, the upper bound is $P[Z \ge c] \le e^{-c^2/2}$. The table below compares this upper bound to the true probability. Note that for $c = 1, 2$ we use Table 3.1 and the fact that $Q(c) = 1 - \Phi(c)$.

                      c = 1     c = 2     c = 3     c = 4          c = 5
    Chernoff bound    0.606     0.135     0.011     3.35 x 10^-4   3.73 x 10^-6
    Q(c)              0.1587    0.0228    0.0013    3.17 x 10^-5   2.87 x 10^-7

We see that in this case, the Chernoff bound typically overestimates the true probability by roughly a factor of 10.

Problem 6.8.2 Solution

For an N[$\mu$, $\sigma^2$] random variable X, we can write
\[ P[X \ge c] = P[(X - \mu)/\sigma \ge (c - \mu)/\sigma] = P[Z \ge (c - \mu)/\sigma]. \tag{1} \]
Since Z is N[0, 1], we can apply the result of Problem 6.8.1 with c replaced by $(c - \mu)/\sigma$. This yields
\[ P[X \ge c] = P[Z \ge (c - \mu)/\sigma] \le e^{-(c-\mu)^2/2\sigma^2}. \tag{2} \]

Problem 6.8.3 Solution

From Appendix A, we know that the MGF of K is
\[ \phi_K(s) = e^{\alpha(e^s - 1)}. \tag{1} \]
The Chernoff bound becomes
\[ P[K \ge c] \le \min_{s\ge 0} e^{-sc}e^{\alpha(e^s-1)} = \min_{s\ge 0} e^{\alpha(e^s-1) - sc}. \tag{2} \]
Since $e^y$ is an increasing function, it is sufficient to choose s to minimize $h(s) = \alpha(e^s - 1) - sc$.
Setting $dh(s)/ds = \alpha e^s - c = 0$ yields $e^s = c/\alpha$ or $s = \ln(c/\alpha)$. Note that for $c < \alpha$, the minimizing s is negative. In this case, we choose $s = 0$ and the Chernoff bound is $P[K \ge c] \le 1$. For $c \ge \alpha$, applying $s = \ln(c/\alpha)$ yields $P[K \ge c] \le e^{-\alpha}(\alpha e/c)^c$. A complete expression for the Chernoff bound is
\[ P[K \ge c] \le \begin{cases} 1 & c < \alpha, \\ \alpha^ce^ce^{-\alpha}/c^c & c \ge \alpha. \end{cases} \tag{3} \]

Problem 6.8.4 Solution

This problem is solved completely in the solution to Quiz 6.8! We repeat that solution here. Since $W = X_1 + X_2 + X_3$ is an Erlang $(n = 3, \lambda = 1/2)$ random variable, Theorem 3.11 says that for any $w > 0$, the CDF of W satisfies
\[ F_W(w) = 1 - \sum_{k=0}^2 \frac{(\lambda w)^ke^{-\lambda w}}{k!}. \tag{1} \]
Equivalently, for $\lambda = 1/2$ and $w = 20$,
\[ P[W > 20] = 1 - F_W(20) \tag{2} \]
\[ = e^{-10}\left(1 + \frac{10}{1!} + \frac{10^2}{2!}\right) = 61e^{-10} = 0.0028. \tag{3} \]

Problem 6.8.5 Solution

Let $W_n = X_1 + \cdots + X_n$. Since $M_n(X) = W_n/n$, we can write
\[ P[M_n(X) \ge c] = P[W_n \ge nc]. \tag{1} \]
Since $\phi_{W_n}(s) = (\phi_X(s))^n$, applying the Chernoff bound to $W_n$ yields
\[ P[W_n \ge nc] \le \min_{s\ge 0} e^{-snc}\phi_{W_n}(s) = \min_{s\ge 0}\left(e^{-sc}\phi_X(s)\right)^n. \tag{2} \]
For $y \ge 0$, $y^n$ is a nondecreasing function of y. This implies that the value of s that minimizes $e^{-sc}\phi_X(s)$ also minimizes $(e^{-sc}\phi_X(s))^n$. Hence
\[ P[M_n(X) \ge c] = P[W_n \ge nc] \le \left(\min_{s\ge 0} e^{-sc}\phi_X(s)\right)^n. \tag{3} \]

Problem 6.9.1 Solution

Note that $W_n$ is a binomial $(10^n, 0.5)$ random variable. We need to calculate
\[ P[B_n] = P[0.499 \cdot 10^n \le W_n \le 0.501 \cdot 10^n] \tag{1} \]
\[ = P[W_n \le 0.501 \cdot 10^n] - P[W_n < 0.499 \cdot 10^n]. \tag{2} \]
A complication is that the event $W_n < w$ is not the same as $W_n \le w$ when w is an integer. In this case, we observe that
\[ P[W_n < w] = P[W_n \le \lceil w \rceil - 1] = F_{W_n}(\lceil w \rceil - 1). \tag{3} \]
Thus
\[ P[B_n] = F_{W_n}(0.501 \cdot 10^n) - F_{W_n}\left(\lceil 0.499 \cdot 10^n \rceil - 1\right). \tag{4} \]
For $n = 1, \ldots, N$, we can calculate $P[B_n]$ in this Matlab program:

    function pb=binomialcdftest(N);
    pb=zeros(1,N);
    for n=1:N,
        w=[0.499 0.501]*10^n;   % endpoints of the event B_n
        w(1)=ceil(w(1))-1;      % P[W_n < w] = F(ceil(w)-1)
        pb(n)=diff(binomialcdf(10^n,0.5,w));
    end

Unfortunately, on this user's machine (a Windows XP laptop), the program fails for N = 4.
The problem, as noted earlier, is that binomialcdf.m uses binomialpmf.m, which fails for a binomial (10000, p) random variable. Of course, your mileage may vary. A slightly better solution is to use the bignomialcdf.m function, which is identical to binomialcdf.m except it calls bignomialpmf.m rather than binomialpmf.m. This enables calculations for larger values of n, although at some cost in numerical accuracy. Here is the code:

    function pb=bignomialcdftest(N);
    pb=zeros(1,N);
    for n=1:N,
        w=[0.499 0.501]*10^n;   % endpoints of the event B_n
        w(1)=ceil(w(1))-1;      % P[W_n < w] = F(ceil(w)-1)
        pb(n)=diff(bignomialcdf(10^n,0.5,w));
    end

For comparison, here are the outputs of the two programs:

    >> binomialcdftest(4)
    ans =
        0.2461 0.0796 0.0756 NaN
    >> bignomialcdftest(6)
    ans =
        0.2461 0.0796 0.0756 0.1663 0.4750 0.9546

The result 0.9546 for n = 6 corresponds to the exact probability in Example 6.15, which used the CLT to estimate the probability as 0.9544. Unfortunately for this user, for n = 7, bignomialcdftest(7) failed.

Problem 6.9.2 Solution

The Erlang $(n, \lambda = 1)$ random variable X has expected value $E[X] = n/\lambda = n$ and variance $\operatorname{Var}[X] = n/\lambda^2 = n$. The PDF of X as well as the PDF of a Gaussian random variable Y with the same expected value and variance are
\[ f_X(x) = \begin{cases} \dfrac{x^{n-1}e^{-x}}{(n-1)!} & x \ge 0, \\ 0 & \text{otherwise,} \end{cases} \qquad f_Y(x) = \frac{1}{\sqrt{2\pi n}}e^{-(x-n)^2/2n}. \tag{1} \]
From the forms of the functions, it is not likely to be apparent that $f_X(x)$ and $f_Y(x)$ are similar. The following program plots $f_X(x)$ and $f_Y(x)$ for values of x within three standard deviations of the expected value n.

    function df=erlangclt(n);
    r=3*sqrt(n);                 % three standard deviations
    x=(n-r):(2*r)/100:n+r;
    fx=erlangpdf(n,1,x);
    fy=gausspdf(n,sqrt(n),x);
    plot(x,fx,x,fy);
    df=fx-fy;

Below are sample outputs of erlangclt(n) for n = 4, 20, 100. In the graphs we will see that as n increases, the Erlang PDF becomes increasingly similar to the Gaussian PDF of the same expected value and variance.
This is not surprising since the Erlang $(n, \lambda)$ random variable is the sum of n exponential random variables and the CLT says that the Erlang CDF should converge to a Gaussian CDF as n gets large.

[Figure: plots of $f_X(x)$ and $f_Y(x)$ versus x in three panels: erlangclt(4), erlangclt(20), erlangclt(100)]

On the other hand, the convergence should be viewed with some caution. For example, the mode (the peak value) of the Erlang PDF occurs at $x = n - 1$ while the mode of the Gaussian PDF is at $x = n$. This difference only appears to go away for n = 100 because the graph x-axis range is expanding. More important, the two PDFs are quite different far away from the center of the distribution. The Erlang PDF is always zero for $x < 0$ while the Gaussian PDF is always positive. For large positive x, the two distributions do not have the same exponential decay. Thus it's not a good idea to use the CLT to estimate probabilities of rare events such as $\{X > x\}$ for extremely large values of x.

Problem 6.9.3 Solution

In this problem, we re-create the plots of Figure 6.3 except we use the binomial PMF and corresponding Gaussian PDF. Here is a Matlab program that compares the binomial $(n, p)$ PMF and the Gaussian PDF with the same expected value and variance.

    function y=binomcltpmf(n,p)
    x=-1:17;
    xx=-1:0.05:17;
    y=binomialpmf(n,p,x);
    std=sqrt(n*p*(1-p));
    clt=gausspdf(n*p,std,xx);
    hold off;
    pmfplot(x,y,'\it x','\it p_X(x) f_X(x)');
    hold on; plot(xx,clt); hold off;

Here are the output plots for p = 1/2 and n = 2, 4, 8, 16.

[Figure: plots of $p_X(x)$ and $f_X(x)$ versus x in four panels: binomcltpmf(2,0.5), binomcltpmf(4,0.5), binomcltpmf(8,0.5), binomcltpmf(16,0.5)]

To see why the values of the PDF and PMF are roughly the same, consider the Gaussian random variable Y.
For small $\Delta$,
\[ f_Y(x) \approx \frac{F_Y(x + \Delta/2) - F_Y(x - \Delta/2)}{\Delta}. \tag{1} \]
For $\Delta = 1$, we obtain
\[ f_Y(x) \approx F_Y(x + 1/2) - F_Y(x - 1/2). \tag{2} \]
Since the Gaussian CDF is approximately the same as the CDF of the binomial $(n, p)$ random variable X, we observe for an integer x that
\[ f_Y(x) \approx F_X(x + 1/2) - F_X(x - 1/2) = P_X(x). \tag{3} \]
Although the equivalence in heights of the PMF and PDF is only an approximation, it can be useful for checking the correctness of a result.

Problem 6.9.4 Solution

Since the conv function is for convolving signals in time, we treat $P_{X_1}(x)$ and $P_{X_2}(x)$ as though they were signals in time starting at time $x = 0$. That is,
\[ \texttt{px1} = \begin{bmatrix} P_{X_1}(0) & P_{X_1}(1) & \cdots & P_{X_1}(25) \end{bmatrix} \tag{1} \]
\[ \texttt{px2} = \begin{bmatrix} P_{X_2}(0) & P_{X_2}(1) & \cdots & P_{X_2}(100) \end{bmatrix} \tag{2} \]

    %convx1x2.m
    sw=(0:125);
    px1=[0,0.04*ones(1,25)];
    px2=zeros(1,101);
    % Matlab indices start at 1, so P_X2(x) is stored in px2(x+1)
    px2(10*(1:10)+1)=10*(1:10)/550;
    pw=conv(px1,px2);
    h=pmfplot(sw,pw,...
        '\itw','\itP_W(w)');
    set(h,'LineWidth',0.25);

In particular, between its minimum and maximum values, the vector px2 must enumerate all integer values, including those which have zero probability. In addition, we write down sw=0:125 directly based on knowledge that the range enumerated by px1 and px2 corresponds to $X_1 + X_2$ having a minimum value of 0 and a maximum value of 125. The resulting plot will be essentially identical to Figure 6.4. One final note: the command set(h,'LineWidth',0.25) is used to make the bars of the PMF thin enough to be resolved individually.

Problem 6.9.5 Solution

    %sumx1x2x3.m
    sx1=(1:10);px1=0.1*ones(1,10);
    sx2=(1:20);px2=0.05*ones(1,20);
    sx3=(1:30);px3=ones(1,30)/30;
    [SX1,SX2,SX3]=ndgrid(sx1,sx2,sx3);
    [PX1,PX2,PX3]=ndgrid(px1,px2,px3);
    SW=SX1+SX2+SX3;
    PW=PX1.*PX2.*PX3;
    sw=unique(SW);
    pw=finitepmf(SW,PW,sw);
    h=pmfplot(sw,pw,'\itw','\itP_W(w)');
    set(h,'LineWidth',0.25);

Since the ndgrid function extends naturally to higher dimensions, this solution follows the logic of sumx1x2 in Example 6.19. The output of sumx1x2x3 is the plot of the PMF of W shown below.
We use the command set(h,’LineWidth’,0.25) to ensure that the bars of the PMF are thin enough to be resolved individually. 0 10 20 30 40 50 60 0 0.01 0.02 0.03 0.04 w P W ( w ) Problem 6.9.6 Solution function [pw,sw]=sumfinitepmf(px,sx,py,sy); [SX,SY]=ndgrid(sx,sy); [PX,PY]=ndgrid(px,py); SW=SX+SY;PW=PX.*PY; sw=unique(SW); pw=finitepmf(SW,PW,sw); sumfinitepmf generalizes the method of Ex- ample 6.19. The only difference is that the PMFs px and py and ranges sx and sy are not hard coded, but instead are function in- puts. 260 As an example, suppose X is a discrete uniform (0, 20) random variable and Y is an independent discrete uniform (0, 80) random variable. The following program sum2unif will generate and plot the PMF of W = X +Y . %sum2unif.m sx=0:20;px=ones(1,21)/21; sy=0:80;py=ones(1,81)/81; [pw,sw]=sumfinitepmf(px,sx,py,sy); h=pmfplot(sw,pw,’\it w’,’\it P_W(w)’); set(h,’LineWidth’,0.25); Here is the graph generated by sum2unif. 0 10 20 30 40 50 60 70 80 90 100 0 0.005 0.01 0.015 w P W ( w ) 261 Problem Solutions – Chapter 7 Problem 7.1.1 Solution Recall that X 1 , X 2 . . . X n are independent exponential random variables with mean value µ X = 5 so that for x ≥ 0, F X (x) = 1 −e −x/5 . (a) Using Theorem 7.1, σ 2 Mn(x) = σ 2 X /n. Realizing that σ 2 X = 25, we obtain Var[M 9 (X)] = σ 2 X 9 = 25 9 (1) (b) P [X 1 ≥ 7] = 1 −P [X 1 ≤ 7] (2) = 1 −F X (7) = 1 −(1 −e −7/5 ) = e −7/5 ≈ 0.247 (3) (c) First we express P[M 9 (X) > 7] in terms of X 1 , . . . , X 9 . P [M 9 (X) > 7] = 1 −P [M 9 (X) ≤ 7] = 1 −P [(X 1 +. . . +X 9 ) ≤ 63] (4) Now the probability that M 9 (X) > 7 can be approximated using the Central Limit Theorem (CLT). P [M 9 (X) > 7] = 1 −P [(X 1 +. . . +X 9 ) ≤ 63] (5) ≈ 1 −Φ 63 −9µ X √ 9σ X = 1 −Φ(6/5) (6) Consulting with Table 3.1 yields P[M 9 (X) > 7] ≈ 0.1151. Problem 7.1.2 Solution X 1 , X 2 . . . 
$X_n$ are independent uniform random variables with mean value $\mu_X = 7$ and $\sigma^2_X = 3$.

(a) Since $X_1$ is a uniform random variable, it must have a uniform PDF over an interval $[a, b]$. From Appendix A, we can look up that $\mu_X = (a+b)/2$ and that $\mathrm{Var}[X] = (b-a)^2/12$. Hence, given the mean and variance, we obtain the following equations for $a$ and $b$.

$$(b-a)^2/12 = 3, \qquad (a+b)/2 = 7. \tag{1}$$

Solving these equations yields $a = 4$ and $b = 10$, from which we can state the distribution of $X$.

$$f_X(x) = \begin{cases} 1/6 & 4 \le x \le 10, \\ 0 & \text{otherwise.} \end{cases} \tag{2}$$

(b) From Theorem 7.1, we know that

$$\mathrm{Var}[M_{16}(X)] = \frac{\mathrm{Var}[X]}{16} = \frac{3}{16}. \tag{3}$$

(c)

$$P[X_1 \ge 9] = \int_9^\infty f_{X_1}(x)\,dx = \int_9^{10} (1/6)\,dx = 1/6. \tag{4}$$

(d) The variance of $M_{16}(X)$ is much less than $\mathrm{Var}[X_1]$. Hence, the PDF of $M_{16}(X)$ should be much more concentrated about $E[X]$ than the PDF of $X_1$. Thus we should expect $P[M_{16}(X) > 9]$ to be much less than $P[X_1 > 9]$.

$$P[M_{16}(X) > 9] = 1 - P[M_{16}(X) \le 9] = 1 - P[(X_1 + \cdots + X_{16}) \le 144]. \tag{5}$$

By a Central Limit Theorem approximation, with $16\mu_X = 112$ and $\sqrt{16}\,\sigma_X = 4\sqrt{3}$,

$$P[M_{16}(X) > 9] \approx 1 - \Phi\left(\frac{144 - 16\mu_X}{\sqrt{16}\,\sigma_X}\right) = 1 - \Phi(4.62) \approx 1.9 \times 10^{-6}. \tag{6}$$

As we predicted, $P[M_{16}(X) > 9] \ll P[X_1 > 9]$.

Problem 7.1.3 Solution

This problem is in the wrong section since the standard error isn't defined until Section 7.3. However, if we peek ahead to that section, the problem isn't very hard. Given the sample mean estimate $M_n(X)$, the standard error is defined as the standard deviation $e_n = \sqrt{\mathrm{Var}[M_n(X)]}$. In our problem, we use samples $X_i$ to generate $Y_i = X_i^2$. For the sample mean $M_n(Y)$, we need to find the standard error

$$e_n = \sqrt{\mathrm{Var}[M_n(Y)]} = \sqrt{\frac{\mathrm{Var}[Y]}{n}}. \tag{1}$$

Since $X$ is a uniform $(0, 1)$ random variable,

$$E[Y] = E[X^2] = \int_0^1 x^2\,dx = 1/3, \tag{2}$$

$$E[Y^2] = E[X^4] = \int_0^1 x^4\,dx = 1/5. \tag{3}$$

Thus $\mathrm{Var}[Y] = 1/5 - (1/3)^2 = 4/45$ and the sample mean $M_n(Y)$ has standard error

$$e_n = \sqrt{\frac{4}{45n}}.$$
(4)

Problem 7.1.4 Solution

(a) Since $Y_n = X_{2n-1} + (-X_{2n})$, Theorem 6.1 says that the expected value of the difference is

$$E[Y_n] = E[X_{2n-1}] + E[-X_{2n}] = E[X] - E[X] = 0. \tag{1}$$

(b) By Theorem 6.2, the variance of the difference between $X_{2n-1}$ and $X_{2n}$ is

$$\mathrm{Var}[Y_n] = \mathrm{Var}[X_{2n-1}] + \mathrm{Var}[-X_{2n}] = 2\,\mathrm{Var}[X]. \tag{2}$$

(c) Each $Y_n$ is the difference of two samples of $X$ that are independent of the samples used by any other $Y_m$. Thus $Y_1, Y_2, \ldots$ is an iid random sequence. By Theorem 7.1, the mean and variance of $M_n(Y)$ are

$$E[M_n(Y)] = E[Y_n] = 0, \tag{3}$$

$$\mathrm{Var}[M_n(Y)] = \frac{\mathrm{Var}[Y_n]}{n} = \frac{2\,\mathrm{Var}[X]}{n}. \tag{4}$$

Problem 7.2.1 Solution

If the average weight of a Maine black bear is 500 pounds with standard deviation equal to 100 pounds, we can use the Chebyshev inequality to upper bound the probability that a randomly chosen bear will be more than 200 pounds away from the average.

$$P[|W - E[W]| \ge 200] \le \frac{\mathrm{Var}[W]}{200^2} = \frac{100^2}{200^2} = 0.25. \tag{1}$$

Problem 7.2.2 Solution

We know from the Chebyshev inequality that

$$P[|X - E[X]| \ge c] \le \frac{\sigma^2_X}{c^2}. \tag{1}$$

Choosing $c = k\sigma_X$, we obtain

$$P[|X - E[X]| \ge k\sigma_X] \le \frac{1}{k^2}. \tag{2}$$

The actual probability that the Gaussian random variable $Y$ is more than $k$ standard deviations from its expected value is

$$P[|Y - E[Y]| \ge k\sigma_Y] = P[Y - E[Y] \le -k\sigma_Y] + P[Y - E[Y] \ge k\sigma_Y] \tag{3}$$

$$= 2P\left[\frac{Y - E[Y]}{\sigma_Y} \ge k\right] \tag{4}$$

$$= 2Q(k). \tag{5}$$

The following table compares the upper bound and the true probability:

$$\begin{array}{l|ccccc} & k=1 & k=2 & k=3 & k=4 & k=5 \\ \hline \text{Chebyshev bound} & 1 & 0.250 & 0.111 & 0.0625 & 0.040 \\ 2Q(k) & 0.317 & 0.046 & 0.0027 & 6.33\times 10^{-5} & 5.73\times 10^{-7} \end{array} \tag{6}$$

The Chebyshev bound gets increasingly weak as $k$ goes up. As an example, for $k = 4$, the bound exceeds the true probability by a factor of about 1,000, while for $k = 5$ the bound exceeds the actual probability by a factor of roughly 70,000.

Problem 7.2.3 Solution

The hard part of this problem is to derive the PDF of the sum $W = X_1 + X_2 + X_3$ of iid uniform $(0, 30)$ random variables. In this case, we need to use the techniques of Chapter 6 to convolve the three PDFs.
To simplify our calculations, we will instead find the PDF of $V = Y_1 + Y_2 + Y_3$ where the $Y_i$ are iid uniform $(0, 1)$ random variables. By Theorem 3.20, we can conclude that $W = 30V$ is the sum of three iid uniform $(0, 30)$ random variables.

To start, let $V_2 = Y_1 + Y_2$. Since each $Y_i$ has a PDF shaped like a unit area pulse, the PDF of $V_2$ is the triangular function

$$f_{V_2}(v) = \begin{cases} v & 0 \le v \le 1, \\ 2 - v & 1 < v \le 2, \\ 0 & \text{otherwise.} \end{cases} \tag{1}$$

[Figure: triangular PDF $f_{V_2}(v)$ versus $v$, for $0 \le v \le 2$.]

The PDF of $V$ is the convolution integral

$$f_V(v) = \int_{-\infty}^{\infty} f_{V_2}(y)\, f_{Y_3}(v - y)\,dy \tag{2}$$

$$= \int_0^1 y\, f_{Y_3}(v - y)\,dy + \int_1^2 (2 - y)\, f_{Y_3}(v - y)\,dy. \tag{3}$$

Evaluation of these integrals depends on $v$ through the function

$$f_{Y_3}(v - y) = \begin{cases} 1 & v - 1 < y < v, \\ 0 & \text{otherwise.} \end{cases} \tag{4}$$

To compute the convolution, it is helpful to depict the three distinct cases. In each case, the square "pulse" is $f_{Y_3}(v - y)$ and the triangular pulse is $f_{V_2}(y)$.

[Figure: three panels showing the square pulse and the triangular pulse for $0 \le v < 1$, $1 \le v < 2$, and $2 \le v < 3$.]

From the graphs, we can compute the convolution for each case, writing $V_3 = V$ for the three-term sum:

$$0 \le v < 1: \quad f_{V_3}(v) = \int_0^v y\,dy = \frac{1}{2}v^2, \tag{5}$$

$$1 \le v < 2: \quad f_{V_3}(v) = \int_{v-1}^1 y\,dy + \int_1^v (2 - y)\,dy = -v^2 + 3v - \frac{3}{2}, \tag{6}$$

$$2 \le v < 3: \quad f_{V_3}(v) = \int_{v-1}^2 (2 - y)\,dy = \frac{(3 - v)^2}{2}. \tag{7}$$

To complete the problem, we use Theorem 3.20 to observe that $W = 30V_3$ is the sum of three iid uniform $(0, 30)$ random variables. From Theorem 3.19,

$$f_W(w) = \frac{1}{30} f_{V_3}(w/30) = \begin{cases} (w/30)^2/60 & 0 \le w < 30, \\ [-(w/30)^2 + 3(w/30) - 3/2]/30 & 30 \le w < 60, \\ [3 - (w/30)]^2/60 & 60 \le w < 90, \\ 0 & \text{otherwise.} \end{cases} \tag{8}$$

Finally, we can compute the exact probability

$$P[W \ge 75] = \frac{1}{60}\int_{75}^{90} [3 - (w/30)]^2\,dw = \left. -\frac{(3 - w/30)^3}{6} \right|_{75}^{90} = \frac{1}{48}. \tag{9}$$

For comparison, the Markov inequality indicated that $P[W \ge 75] \le 3/5$ and the Chebyshev inequality showed that $P[W \ge 75] \le 1/4$. As we see, both inequalities are quite weak in this case.

Problem 7.2.4 Solution

On each roll of the dice, a success, namely snake eyes, occurs with probability $p = 1/36$.
The number of trials, $R$, needed for three successes is a Pascal $(k = 3, p)$ random variable with

$$E[R] = 3/p = 108, \qquad \mathrm{Var}[R] = 3(1-p)/p^2 = 3780. \tag{1}$$

(a) By the Markov inequality,

$$P[R \ge 250] \le \frac{E[R]}{250} = \frac{54}{125} = 0.432. \tag{2}$$

(b) By the Chebyshev inequality,

$$P[R \ge 250] = P[R - 108 \ge 142] = P[|R - 108| \ge 142] \tag{3}$$

$$\le \frac{\mathrm{Var}[R]}{(142)^2} = 0.1875. \tag{4}$$

(c) The exact value is $P[R \ge 250] = 1 - \sum_{r=3}^{249} P_R(r)$. Since there is no way around summing the Pascal PMF to find the CDF, this is what pascalcdf does.

    >> 1-pascalcdf(3,1/36,249)
    ans =
        0.0299

Thus the Markov and Chebyshev inequalities are valid bounds but not good estimates of $P[R \ge 250]$.

Problem 7.3.1 Solution

For an arbitrary Gaussian $(\mu, \sigma)$ random variable $Y$,

$$P[\mu - \sigma \le Y \le \mu + \sigma] = P[-\sigma \le Y - \mu \le \sigma] \tag{1}$$

$$= P\left[-1 \le \frac{Y - \mu}{\sigma} \le 1\right] \tag{2}$$

$$= \Phi(1) - \Phi(-1) = 2\Phi(1) - 1 = 0.6827. \tag{3}$$

Note that $Y$ can be any Gaussian random variable, including, for example, $M_n(X)$ when $X$ is Gaussian. When $X$ is not Gaussian, the same claim holds to the extent that the central limit theorem promises that $M_n(X)$ is nearly Gaussian for large $n$.

Problem 7.3.2 Solution

It should seem obvious that the result is true since $\mathrm{Var}[\hat{R}_n]$ going to zero implies the probability that $\hat{R}_n$ differs from $E[\hat{R}_n]$ is going to zero. Similarly, the difference between $E[\hat{R}_n]$ and $r$ is also going to zero deterministically. Hence it ought to follow that $\hat{R}_n$ is converging to $r$ in probability. Here are the details:

We must show that $\lim_{n\to\infty} P[|\hat{R}_n - r| \ge \epsilon] = 0$. First we note that $\hat{R}_n$ being asymptotically unbiased implies that $\lim_{n\to\infty} E[\hat{R}_n] = r$. Equivalently, given $\epsilon > 0$, there exists $n_0$ such that $|E[\hat{R}_n] - r|^2 \le \epsilon^2/4$ for all $n \ge n_0$.

Second, since $(a + b)^2 \le 2a^2 + 2b^2$ for any real $a$ and $b$, we observe that

$$|\hat{R}_n - r|^2 = \left|(\hat{R}_n - E[\hat{R}_n]) + (E[\hat{R}_n] - r)\right|^2 \le 2\left|\hat{R}_n - E[\hat{R}_n]\right|^2 + 2\left|E[\hat{R}_n] - r\right|^2. \tag{1}$$

Thus for all $n \ge n_0$,

$$|\hat{R}_n - r|^2 \le 2\left|\hat{R}_n - E[\hat{R}_n]\right|^2 + \epsilon^2/2. \tag{2}$$

It follows for $n \ge n_0$ that

$$P\left[|\hat{R}_n - r|^2 \ge \epsilon^2\right] \le P\left[2\left|\hat{R}_n - E[\hat{R}_n]\right|^2 + \epsilon^2/2 \ge \epsilon^2\right] = P\left[\left|\hat{R}_n - E[\hat{R}_n]\right|^2 \ge \epsilon^2/4\right]. \tag{3}$$

By the Chebyshev inequality, we have that

$$P\left[\left|\hat{R}_n - E[\hat{R}_n]\right|^2 \ge \epsilon^2/4\right] \le \frac{\mathrm{Var}[\hat{R}_n]}{(\epsilon/2)^2}. \tag{4}$$

Combining these facts, we see for $n \ge n_0$ that

$$P\left[|\hat{R}_n - r|^2 \ge \epsilon^2\right] \le \frac{\mathrm{Var}[\hat{R}_n]}{(\epsilon/2)^2}. \tag{5}$$

It follows that

$$\lim_{n\to\infty} P\left[|\hat{R}_n - r|^2 \ge \epsilon^2\right] \le \lim_{n\to\infty} \frac{\mathrm{Var}[\hat{R}_n]}{(\epsilon/2)^2} = 0. \tag{6}$$

This proves that $\hat{R}_n$ is a consistent estimator.

Problem 7.3.3 Solution

This problem is really very simple. If we let $Y = X_1 X_2$ and, for the $i$th trial, let $Y_i = X_1(i) X_2(i)$, then $\hat{R}_n = M_n(Y)$, the sample mean of random variable $Y$. By Theorem 7.5, $M_n(Y)$ is unbiased. Since $\mathrm{Var}[Y] = \mathrm{Var}[X_1 X_2] < \infty$, Theorem 7.7 tells us that $M_n(Y)$ is a consistent sequence.

Problem 7.3.4 Solution

(a) Since the rule that the expectation of a sum equals the sum of the expectations also holds for vectors,

$$E[\mathbf{M}(n)] = \frac{1}{n}\sum_{i=1}^n E[\mathbf{X}(i)] = \frac{1}{n}\sum_{i=1}^n \boldsymbol{\mu}_X = \boldsymbol{\mu}_X. \tag{1}$$

(b) The $j$th component of $\mathbf{M}(n)$ is $M_j(n) = \frac{1}{n}\sum_{i=1}^n X_j(i)$, which is just the sample mean of $X_j$. Defining $A_j = \{|M_j(n) - \mu_j| \ge c\}$, we observe that

$$P\left[\max_{j=1,\ldots,k} |M_j(n) - \mu_j| \ge c\right] = P[A_1 \cup A_2 \cup \cdots \cup A_k]. \tag{2}$$

Applying the Chebyshev inequality to $M_j(n)$, we find that

$$P[A_j] \le \frac{\mathrm{Var}[M_j(n)]}{c^2} = \frac{\sigma_j^2}{nc^2}. \tag{3}$$

By the union bound,

$$P\left[\max_{j=1,\ldots,k} |M_j(n) - \mu_j| \ge c\right] \le \sum_{j=1}^k P[A_j] \le \frac{1}{nc^2}\sum_{j=1}^k \sigma_j^2. \tag{4}$$

Since $\sum_{j=1}^k \sigma_j^2 < \infty$, $\lim_{n\to\infty} P[\max_{j=1,\ldots,k} |M_j(n) - \mu_j| \ge c] = 0$.

Problem 7.3.5 Solution

Note that we can write $Y_k$ as

$$Y_k = \left(\frac{X_{2k-1} - X_{2k}}{2}\right)^2 + \left(\frac{X_{2k} - X_{2k-1}}{2}\right)^2 = \frac{(X_{2k} - X_{2k-1})^2}{2}. \tag{1}$$

Hence,

$$E[Y_k] = \frac{1}{2} E\left[X_{2k}^2 - 2X_{2k}X_{2k-1} + X_{2k-1}^2\right] = E[X^2] - (E[X])^2 = \mathrm{Var}[X]. \tag{2}$$

Next we observe that $Y_1, Y_2, \ldots$ is an iid random sequence. If this independence is not obvious, consider that $Y_1$ is a function of $X_1$ and $X_2$, $Y_2$ is a function of $X_3$ and $X_4$, and so on. Since $X_1, X_2, \ldots$ is an iid sequence, $Y_1, Y_2, \ldots$ is an iid sequence.
Hence, $E[M_n(Y)] = E[Y] = \mathrm{Var}[X]$, implying $M_n(Y)$ is an unbiased estimator of $\mathrm{Var}[X]$. We can use Theorem 7.7 to prove that $M_n(Y)$ is consistent if we show that $\mathrm{Var}[Y]$ is finite. Since $\mathrm{Var}[Y] \le E[Y^2]$, it is sufficient to prove that $E[Y^2] < \infty$. Note that

$$Y_k^2 = \frac{X_{2k}^4 - 4X_{2k}^3 X_{2k-1} + 6X_{2k}^2 X_{2k-1}^2 - 4X_{2k} X_{2k-1}^3 + X_{2k-1}^4}{4}. \tag{3}$$

Taking expectations yields

$$E[Y_k^2] = \frac{1}{2}E[X^4] - 2E[X^3]E[X] + \frac{3}{2}\left(E[X^2]\right)^2. \tag{4}$$

Hence, if the first four moments of $X$ are finite, then $\mathrm{Var}[Y] \le E[Y^2] < \infty$. By Theorem 7.7, the sequence $M_n(Y)$ is consistent.

Problem 7.3.6 Solution

(a) From Theorem 6.2, we have

$$\mathrm{Var}[X_1 + \cdots + X_n] = \sum_{i=1}^n \mathrm{Var}[X_i] + 2\sum_{i=1}^{n-1}\sum_{j=i+1}^n \mathrm{Cov}[X_i, X_j]. \tag{1}$$

Note that $\mathrm{Var}[X_i] = \sigma^2$ and for $j > i$, $\mathrm{Cov}[X_i, X_j] = \sigma^2 a^{j-i}$. This implies

$$\mathrm{Var}[X_1 + \cdots + X_n] = n\sigma^2 + 2\sigma^2 \sum_{i=1}^{n-1}\sum_{j=i+1}^n a^{j-i} \tag{2}$$

$$= n\sigma^2 + 2\sigma^2 \sum_{i=1}^{n-1}\left(a + a^2 + \cdots + a^{n-i}\right) \tag{3}$$

$$= n\sigma^2 + \frac{2a\sigma^2}{1-a}\sum_{i=1}^{n-1}\left(1 - a^{n-i}\right). \tag{4}$$

With some more algebra, we obtain

$$\mathrm{Var}[X_1 + \cdots + X_n] = n\sigma^2 + \frac{2a\sigma^2}{1-a}(n-1) - \frac{2a\sigma^2}{1-a}\left(a + a^2 + \cdots + a^{n-1}\right) \tag{5}$$

$$= \frac{n(1+a)\sigma^2}{1-a} - \frac{2a\sigma^2}{1-a} - 2\sigma^2\left(\frac{a}{1-a}\right)^2\left(1 - a^{n-1}\right). \tag{6}$$

Since $a/(1-a)$ and $1 - a^{n-1}$ are both nonnegative,

$$\mathrm{Var}[X_1 + \cdots + X_n] \le n\sigma^2\,\frac{1+a}{1-a}. \tag{7}$$

(b) Since the expected value of a sum equals the sum of the expected values,

$$E[M(X_1, \ldots, X_n)] = \frac{E[X_1] + \cdots + E[X_n]}{n} = \mu. \tag{8}$$

The variance of $M(X_1, \ldots, X_n)$ is

$$\mathrm{Var}[M(X_1, \ldots, X_n)] = \frac{\mathrm{Var}[X_1 + \cdots + X_n]}{n^2} \le \frac{\sigma^2(1+a)}{n(1-a)}. \tag{9}$$

Applying the Chebyshev inequality to $M(X_1, \ldots, X_n)$ yields

$$P[|M(X_1, \ldots, X_n) - \mu| \ge c] \le \frac{\mathrm{Var}[M(X_1, \ldots, X_n)]}{c^2} \le \frac{\sigma^2(1+a)}{n(1-a)c^2}. \tag{10}$$

(c) Taking the limit as $n$ approaches infinity of the bound derived in part (b) yields

$$\lim_{n\to\infty} P[|M(X_1, \ldots, X_n) - \mu| \ge c] \le \lim_{n\to\infty} \frac{\sigma^2(1+a)}{n(1-a)c^2} = 0. \tag{11}$$

Thus

$$\lim_{n\to\infty} P[|M(X_1, \ldots, X_n) - \mu| \ge c] = 0. \tag{12}$$

Problem 7.3.7 Solution

(a) Since the expectation of the sum equals the sum of the expectations,

$$E[\hat{\mathbf{R}}(n)] = \frac{1}{n}\sum_{m=1}^n E[\mathbf{X}(m)\mathbf{X}'(m)] = \frac{1}{n}\sum_{m=1}^n \mathbf{R} = \mathbf{R}. \tag{1}$$

(b) This proof follows the method used to solve Problem 7.3.4. The $i,j$th element of $\hat{\mathbf{R}}(n)$ is $\hat{R}_{i,j}(n) = \frac{1}{n}\sum_{m=1}^n X_i(m)X_j(m)$, which is just the sample mean of $X_i X_j$. Defining the event

$$A_{i,j} = \left\{\left|\hat{R}_{i,j}(n) - E[X_i X_j]\right| \ge c\right\}, \tag{2}$$

we observe that

$$P\left[\max_{i,j}\left|\hat{R}_{i,j}(n) - E[X_i X_j]\right| \ge c\right] = P\left[\cup_{i,j} A_{i,j}\right]. \tag{3}$$

Applying the Chebyshev inequality to $\hat{R}_{i,j}(n)$, we find that

$$P[A_{i,j}] \le \frac{\mathrm{Var}[\hat{R}_{i,j}(n)]}{c^2} = \frac{\mathrm{Var}[X_i X_j]}{nc^2}. \tag{4}$$

By the union bound,

$$P\left[\max_{i,j}\left|\hat{R}_{i,j}(n) - E[X_i X_j]\right| \ge c\right] \le \sum_{i,j} P[A_{i,j}] \le \frac{1}{nc^2}\sum_{i,j}\mathrm{Var}[X_i X_j]. \tag{5}$$

By the result of Problem 4.11.8, $X_i X_j$, the product of jointly Gaussian random variables, has finite variance. Thus

$$\sum_{i,j}\mathrm{Var}[X_i X_j] = \sum_{i=1}^k\sum_{j=1}^k \mathrm{Var}[X_i X_j] \le k^2 \max_{i,j}\mathrm{Var}[X_i X_j] < \infty. \tag{6}$$

It follows that

$$\lim_{n\to\infty} P\left[\max_{i,j}\left|\hat{R}_{i,j}(n) - E[X_i X_j]\right| \ge c\right] \le \lim_{n\to\infty} \frac{k^2 \max_{i,j}\mathrm{Var}[X_i X_j]}{nc^2} = 0. \tag{7}$$

Problem 7.4.1 Solution

$$P_X(x) = \begin{cases} 0.1 & x = 0, \\ 0.9 & x = 1, \\ 0 & \text{otherwise.} \end{cases} \tag{1}$$

(a) $E[X]$ is in fact the same as $P_X(1)$ because $X$ is a Bernoulli random variable.

(b) We can use the Chebyshev inequality to find

$$P[|M_{90}(X) - P_X(1)| \ge 0.05] = P[|M_{90}(X) - E[X]| \ge 0.05] \le \alpha. \tag{2}$$

In particular, the Chebyshev inequality states that

$$\alpha = \frac{\sigma^2_X}{90(0.05)^2} = \frac{0.09}{90(0.05)^2} = 0.4. \tag{3}$$

(c) Now we wish to find the value of $n$ such that $P[|M_n(X) - P_X(1)| \ge 0.03] \le 0.1$. From the Chebyshev inequality, we write

$$0.1 = \frac{\sigma^2_X}{n(0.03)^2}. \tag{4}$$

Since $\sigma^2_X = 0.09$, the right side equals $100/n$, and solving for $n$ yields $n = 1{,}000$.

Problem 7.4.2 Solution

$X_1, X_2, \ldots$ are iid random variables each with mean 75 and standard deviation 15.
(a) We would like to find the value of $n$ such that

$$P[74 \le M_n(X) \le 76] = 0.99. \tag{1}$$

When we know only the mean and variance of $X_i$, our only real tool is the Chebyshev inequality, which says that

$$P[74 \le M_n(X) \le 76] = 1 - P[|M_n(X) - E[X]| \ge 1] \tag{2}$$

$$\ge 1 - \frac{\mathrm{Var}[X]}{n} = 1 - \frac{225}{n} \ge 0.99. \tag{3}$$

This yields $n \ge 22{,}500$.

(b) If each $X_i$ is Gaussian, the sample mean $M_n(X)$ will also be Gaussian with mean and variance

$$E[M_n(X)] = E[X] = 75, \tag{4}$$

$$\mathrm{Var}[M_n(X)] = \mathrm{Var}[X]/n = 225/n. \tag{5}$$

In this case, with $\mu = 75$ and $\sigma = 15/\sqrt{n}$,

$$P[74 \le M_n(X) \le 76] = \Phi\left(\frac{76 - \mu}{\sigma}\right) - \Phi\left(\frac{74 - \mu}{\sigma}\right) \tag{6}$$

$$= \Phi(\sqrt{n}/15) - \Phi(-\sqrt{n}/15) \tag{7}$$

$$= 2\Phi(\sqrt{n}/15) - 1 = 0.99. \tag{8}$$

Thus $\Phi(\sqrt{n}/15) = 0.995$; taking $\sqrt{n}/15 \approx 2.6$ yields $n = 1{,}521$.

Even under the Gaussian assumption, the number of samples $n$ is so large that, even if the $X_i$ are not Gaussian, the sample mean may be approximated by a Gaussian. Hence, about 1,500 samples is probably about right. However, in the absence of any information about the PDF of $X_i$ beyond the mean and variance, we cannot make any guarantees stronger than that given by the Chebyshev inequality.

Problem 7.4.3 Solution

(a) Since $X_A$ is a Bernoulli $(p = P[A])$ random variable,

$$E[X_A] = P[A] = 0.8, \qquad \mathrm{Var}[X_A] = P[A](1 - P[A]) = 0.16. \tag{1}$$

(b) Let $X_{A,i}$ denote $X_A$ on the $i$th trial. Since $\hat{P}_n(A) = M_n(X_A) = \frac{1}{n}\sum_{i=1}^n X_{A,i}$,

$$\mathrm{Var}[\hat{P}_n(A)] = \frac{1}{n^2}\sum_{i=1}^n \mathrm{Var}[X_{A,i}] = \frac{P[A](1 - P[A])}{n}. \tag{2}$$

(c) Since $\hat{P}_{100}(A) = M_{100}(X_A)$, we can use Theorem 7.12(b) to write

$$P\left[\left|\hat{P}_{100}(A) - P[A]\right| < c\right] \ge 1 - \frac{\mathrm{Var}[X_A]}{100c^2} = 1 - \frac{0.16}{100c^2} = 1 - \alpha. \tag{3}$$

For $c = 0.1$, $\alpha = 0.16/[100(0.1)^2] = 0.16$. Thus, with 100 samples, our confidence coefficient is $1 - \alpha = 0.84$.

(d) In this case, the number of samples $n$ is unknown. Once again, we use Theorem 7.12(b) to write

$$P\left[\left|\hat{P}_n(A) - P[A]\right| < c\right] \ge 1 - \frac{\mathrm{Var}[X_A]}{nc^2} = 1 - \frac{0.16}{nc^2} = 1 - \alpha. \tag{4}$$

For $c = 0.1$, we have confidence coefficient $1 - \alpha = 0.95$ if $\alpha = 0.16/[n(0.1)^2] = 0.05$, or $n = 320$.
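The arithmetic in parts (c) and (d) of Problem 7.4.3 is easy to script. The following is a minimal sketch, not part of the original solution, that evaluates the Chebyshev bound and, for comparison, the exact binomial probability for $n = 100$. It assumes only base Matlab functions (gammaln, exp, log); all variable names are illustrative.

```matlab
% Sketch with assumed variable names: Chebyshev confidence in Problem 7.4.3
pA = 0.8;                 % P[A] given in the problem
c  = 0.1;                 % half-width of the interval estimate
v  = pA*(1-pA);           % Var[X_A] = 0.16

% (c) confidence coefficient guaranteed by Chebyshev with n = 100
n = 100;
alpha = v/(n*c^2);
fprintf('n = %d: 1 - alpha >= %.2f\n', n, 1-alpha);

% (d) samples needed for confidence coefficient 0.95 (alpha = 0.05)
nreq = v/(0.05*c^2);
fprintf('n needed for 0.95: %.0f\n', nreq);

% Exact P[|K/100 - 0.8| < 0.1] = P[70 < K < 90] for K binomial (100, 0.8).
% The PMF is computed through logarithms to avoid overflow in 100!.
k = 71:89;
logpmf = gammaln(n+1) - gammaln(k+1) - gammaln(n-k+1) ...
         + k*log(pA) + (n-k)*log(1-pA);
exact = sum(exp(logpmf));
fprintf('exact probability for n = %d: %.4f\n', n, exact);
```

The first two printed values reproduce the confidence coefficient 0.84 and the sample size 320 derived above. The exact probability is at least as large as the Chebyshev guarantee, which illustrates how conservative the bound can be.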
Problem 7.4.4 Solution

Since $E[X] = \mu_X = p$ and $\mathrm{Var}[X] = p(1-p)$, we use Theorem 7.12(b) to write

$$P[|M_{100}(X) - p| < c] \ge 1 - \frac{p(1-p)}{100c^2} = 1 - \alpha. \tag{1}$$

For confidence coefficient 0.99, we require

$$\frac{p(1-p)}{100c^2} \le 0.01 \quad \text{or} \quad c \ge \sqrt{p(1-p)}. \tag{2}$$

Since $p$ is unknown, we must ensure that the constraint is met for every value of $p$. The worst case occurs at $p = 1/2$, which maximizes $p(1-p)$. In this case, $c = \sqrt{1/4} = 1/2$ is the smallest value of $c$ for which we have confidence coefficient of at least 0.99.

If $M_{100}(X) = 0.06$, our interval estimate for $p$ is

$$M_{100}(X) - c < p < M_{100}(X) + c. \tag{3}$$

Since $p \ge 0$, $M_{100}(X) = 0.06$ and $c = 0.5$ imply that our interval estimate is

$$0 \le p < 0.56. \tag{4}$$

Our interval estimate is not very tight because 100 samples is not very large for a confidence coefficient of 0.99.

Problem 7.4.5 Solution

First we observe that the interval estimate can be expressed as

$$\left|\hat{P}_n(A) - P[A]\right| < 0.05. \tag{1}$$

Since $\hat{P}_n(A) = M_n(X_A)$ and $E[M_n(X_A)] = P[A]$, we can use Theorem 7.12(b) to write

$$P\left[\left|\hat{P}_n(A) - P[A]\right| < 0.05\right] \ge 1 - \frac{\mathrm{Var}[X_A]}{n(0.05)^2}. \tag{2}$$

Note that $\mathrm{Var}[X_A] = P[A](1 - P[A]) \le 0.25$. Thus for confidence coefficient 0.9, we require that

$$1 - \frac{\mathrm{Var}[X_A]}{n(0.05)^2} \ge 1 - \frac{0.25}{n(0.05)^2} \ge 0.9. \tag{3}$$

This implies $n \ge 1{,}000$ samples are needed.

Problem 7.4.6 Solution

Both questions can be answered using the following equation from Example 7.6:

$$P\left[\left|\hat{P}_n(A) - P[A]\right| \ge c\right] \le \frac{P[A](1 - P[A])}{nc^2}. \tag{1}$$

The unusual part of this problem is that we are given the true value of $P[A]$. Since $P[A] = 0.01$, we can write

$$P\left[\left|\hat{P}_n(A) - P[A]\right| \ge c\right] \le \frac{0.0099}{nc^2}. \tag{2}$$

(a) In this part, we meet the requirement by choosing $c = 0.001$, yielding

$$P\left[\left|\hat{P}_n(A) - P[A]\right| \ge 0.001\right] \le \frac{9900}{n}. \tag{3}$$

Thus to have confidence level 0.01, we require that $9900/n \le 0.01$. This requires $n \ge 990{,}000$.

(b) In this case, we meet the requirement by choosing $c = 10^{-3}P[A] = 10^{-5}$.
This implies

$$P\left[\left|\hat{P}_n(A) - P[A]\right| \ge c\right] \le \frac{P[A](1 - P[A])}{nc^2} = \frac{0.0099}{n\,10^{-10}} = \frac{9.9 \times 10^7}{n}. \tag{4}$$

The confidence level 0.01 is met if $9.9 \times 10^7/n = 0.01$ or $n = 9.9 \times 10^9$.

Problem 7.4.7 Solution

Since the relative frequency of the error event $E$ is $\hat{P}_n(E) = M_n(X_E)$ and $E[M_n(X_E)] = P[E]$, we can use Theorem 7.12(a) to write

$$P\left[\left|\hat{P}_n(E) - P[E]\right| \ge c\right] \le \frac{\mathrm{Var}[X_E]}{nc^2}. \tag{1}$$

Note that $\mathrm{Var}[X_E] = P[E](1 - P[E])$ since $X_E$ is a Bernoulli $(p = P[E])$ random variable. Using the additional fact that $P[E] \le \epsilon$ and the fairly trivial fact that $1 - P[E] \le 1$, we can conclude that

$$\mathrm{Var}[X_E] = P[E](1 - P[E]) \le P[E] \le \epsilon. \tag{2}$$

Thus

$$P\left[\left|\hat{P}_n(E) - P[E]\right| \ge c\right] \le \frac{\mathrm{Var}[X_E]}{nc^2} \le \frac{\epsilon}{nc^2}. \tag{3}$$

Problem 7.5.1 Solution

In this problem, we have to keep straight that the Poisson expected value $\alpha = 1$ is a different $\alpha$ than the confidence coefficient $1 - \alpha$. That said, we will try to avoid using $\alpha$ for the confidence coefficient. Using $X$ to denote the Poisson $(\alpha = 1)$ random variable, the trace of the sample mean is the sequence $M_1(X), M_2(X), \ldots$ The confidence interval estimate of $\alpha$ has the form

$$M_n(X) - c \le \alpha \le M_n(X) + c. \tag{1}$$

The confidence coefficient of the estimate based on $n$ samples is

$$P[M_n(X) - c \le \alpha \le M_n(X) + c] = P[\alpha - c \le M_n(X) \le \alpha + c] \tag{2}$$

$$= P[-c \le M_n(X) - \alpha \le c]. \tag{3}$$

Since $\mathrm{Var}[M_n(X)] = \mathrm{Var}[X]/n = 1/n$, the 0.9 confidence interval shrinks with increasing $n$. In particular, $c = c_n$ will be a decreasing sequence. Using a Central Limit Theorem approximation, a 0.9 confidence implies

$$0.9 = P\left[\frac{-c_n}{\sqrt{1/n}} \le \frac{M_n(X) - \alpha}{\sqrt{1/n}} \le \frac{c_n}{\sqrt{1/n}}\right] \tag{4}$$

$$= \Phi(c_n\sqrt{n}) - \Phi(-c_n\sqrt{n}) = 2\Phi(c_n\sqrt{n}) - 1. \tag{5}$$

Equivalently, $\Phi(c_n\sqrt{n}) = 0.95$ or $c_n = 1.65/\sqrt{n}$.

Thus, as a function of the number of samples $n$, we plot three functions: the sample mean $M_n(X)$, and the upper limit $M_n(X) + 1.65/\sqrt{n}$ and lower limit $M_n(X) - 1.65/\sqrt{n}$ of the 0.9 confidence interval. We use the Matlab function poissonmeanseq(n) to generate these sequences for n sample values.
    function M=poissonmeanseq(n);
    x=poissonrv(1,n);           % n iid Poisson (alpha=1) samples
    nn=(1:n)';
    M=cumsum(x)./nn;            % sample mean trace M_1(X),...,M_n(X)
    r=(1.65)./sqrt(nn);         % half-width of the 0.9 confidence interval
    plot(nn,M,nn,M+r,nn,M-r);

Here are two output graphs:

[Figure: two panels, poissonmeanseq(60) and poissonmeanseq(600), each showing $M_n(X)$ and the confidence limits versus $n$.]

Problem 7.5.2 Solution

For a Bernoulli $(p = 1/2)$ random variable $X$, the sample mean $M_n(X)$ is the fraction of successes in $n$ Bernoulli trials. That is, $M_n(X) = K_n/n$ where $K_n$ is a binomial $(n, p = 1/2)$ random variable. Thus the probability the sample mean is within one standard error of $p = 1/2$ is

$$p_n = P\left[\frac{n}{2} - \frac{\sqrt{n}}{2} \le K_n \le \frac{n}{2} + \frac{\sqrt{n}}{2}\right] \tag{1}$$

$$= P\left[K_n \le \frac{n}{2} + \frac{\sqrt{n}}{2}\right] - P\left[K_n < \frac{n}{2} - \frac{\sqrt{n}}{2}\right] \tag{2}$$

$$= F_{K_n}\left(\frac{n}{2} + \frac{\sqrt{n}}{2}\right) - F_{K_n}\left(\left\lceil \frac{n}{2} - \frac{\sqrt{n}}{2} \right\rceil - 1\right). \tag{3}$$

Here is a Matlab function that graphs $p_n$ as a function of $n$ for $N$ steps, alongside the output graph for bernoullistderr(50).

    function p=bernoullistderr(N);
    p=zeros(1,N);
    for n=1:N,
        % lower and upper CDF arguments from equation (3)
        r=[ceil((n-sqrt(n))/2)-1; ...
            (n+sqrt(n))/2];
        p(n)=diff(binomialcdf(n,0.5,r));
    end
    plot(1:N,p);
    ylabel('\it p_n');
    xlabel('\it n');

[Figure: $p_n$ versus $n$, for $0 \le n \le 50$.]

The alternating up-down sawtooth pattern is a consequence of the CDF of $K_n$ jumping up at integer values. In particular, depending on the value of $\sqrt{n}/2$, every other value of $(n \pm \sqrt{n})/2$ exceeds the next integer and increases $p_n$. The less frequent but larger jumps in $p_n$ occur when $\sqrt{n}$ is an integer. For very large $n$, the central limit theorem takes over since the CDF of $M_n(X)$ converges to a Gaussian CDF. In this case, the sawtooth pattern dies out and $p_n$ will converge to 0.68, the probability that a Gaussian random variable is within one standard deviation of its expected value.

Finally, we note that the sawtooth pattern is essentially the same as the sawtooth pattern observed in Quiz 7.5 where we graphed the fraction of 1,000 Bernoulli sample mean traces that were within one standard error of the true expected value.
Because the number of traces in Quiz 7.5, 1,000, was so large, the fraction of traces within one standard error was always very close to the actual probability $p_n$.

Problem 7.5.3 Solution

First, we need to determine whether the relative performance of the two estimators depends on the actual value of $\lambda$. To address this, we observe that if $Y$ is an exponential (1) random variable, then Theorem 3.20 tells us that $X = Y/\lambda$ is an exponential $(\lambda)$ random variable. Thus if $Y_1, Y_2, \ldots$ are iid samples of $Y$, then $Y_1/\lambda, Y_2/\lambda, \ldots$ are iid samples of $X$. Moreover, the sample mean of $X$ is

$$M_n(X) = \frac{1}{n\lambda}\sum_{i=1}^n Y_i = \frac{1}{\lambda}M_n(Y). \tag{1}$$

Similarly, the sample variance of $X$ satisfies

$$V_n(X) = \frac{1}{n-1}\sum_{i=1}^n \left(X_i - M_n(X)\right)^2 = \frac{1}{n-1}\sum_{i=1}^n \left(\frac{Y_i}{\lambda} - \frac{1}{\lambda}M_n(Y)\right)^2 = \frac{V_n(Y)}{\lambda^2}. \tag{2}$$

We can conclude that

$$\hat{\lambda} = \frac{\lambda}{M_n(Y)}, \qquad \tilde{\lambda} = \frac{\lambda}{\sqrt{V_n(Y)}}. \tag{3}$$

For $\lambda \ne 1$, the estimators $\hat{\lambda}$ and $\tilde{\lambda}$ are just scaled versions of the estimators for the case $\lambda = 1$. Hence it is sufficient to consider only the $\lambda = 1$ case. The function z=lamest(n,m) returns the estimation errors for m trials of each estimator where each trial uses n iid exponential (1) samples.

    function z=lamest(n,m);
    x=exponentialrv(1,n*m);
    x=reshape(x,n,m);              % column i holds the samples of trial i
    mx=sum(x)/n;                   % sample mean of each trial
    MX=ones(n,1)*mx;
    vx=sum((x-MX).^2)/(n-1);       % sample variance of each trial
    z=[(1./mx); (1./sqrt(vx))]-1;  % estimation errors of the two estimators

In lamest.m, each column of matrix x represents one trial. Note that mx is a row vector such that mx(i) is the sample mean for trial i. The matrix MX has mx(i) for every element in column i. Thus vx is a row vector such that vx(i) is the sample variance for trial i.

Finally, z is a $2 \times m$ matrix such that column i of z records the estimation errors for trial i. If $\hat{\lambda}_i$ and $\tilde{\lambda}_i$ are the estimates for trial $i$, then z(1,i) is the error $\hat{Z}_i = \hat{\lambda}_i - 1$ while z(2,i) is the error $\tilde{Z}_i = \tilde{\lambda}_i - 1$.

Now that we can simulate the errors generated by each estimator, we need to determine which estimator is better.
We start by using the commands

    z=lamest(1000,1000);
    plot(z(1,:),z(2,:),'bd')

to perform 1,000 trials, each using 1,000 samples. The plot command generates a scatter plot of the error pairs $(\hat{Z}_i, \tilde{Z}_i)$ for each trial. Here is an example of the resulting scatter plot:

[Figure: scatter plot of z(2,i) versus z(1,i).]

In the scatter plot, each diamond marks an independent pair $(\hat{Z}, \tilde{Z})$ where $\hat{Z}$ is plotted on the x-axis and $\tilde{Z}$ is plotted on the y-axis. (Although it is outside the scope of this solution, it is interesting to note that the errors $\hat{Z}$ and $\tilde{Z}$ appear to be positively correlated.) From the plot, it may not be obvious that one estimator is better than the other. However, by reading the axis ticks carefully, one can observe that typical values for $\hat{Z}$ are in the interval $(-0.05, 0.05)$ while typical values for $\tilde{Z}$ are in the interval $(-0.1, 0.1)$. This suggests that $\hat{\lambda}$ may be the superior estimator. To verify this observation, we calculate the sample mean of the squared errors for each estimator:

$$M_m(\hat{Z}^2) = \frac{1}{m}\sum_{i=1}^m \hat{Z}_i^2, \qquad M_m(\tilde{Z}^2) = \frac{1}{m}\sum_{i=1}^m \tilde{Z}_i^2. \tag{4}$$

From our Matlab experiment with $m = 1{,}000$ trials, we calculate

    >> sum(z.^2,2)/1000
    ans =
        0.0010
        0.0021

That is, $M_{1,000}(\hat{Z}^2) = 0.0010$ and $M_{1,000}(\tilde{Z}^2) = 0.0021$. In fact, one can show (with a lot of work) that when each trial uses a large number $n$ of samples and the number of trials $m$ is large,

$$M_m(\hat{Z}^2) \approx 1/n, \qquad M_m(\tilde{Z}^2) \approx 2/n, \tag{5}$$

and that

$$\lim_{m\to\infty} \frac{M_m(\tilde{Z}^2)}{M_m(\hat{Z}^2)} \approx 2. \tag{6}$$

In short, the mean squared error of the $\tilde{\lambda}$ estimator is twice that of the $\hat{\lambda}$ estimator.

Problem 7.5.4 Solution

For the sample covariance matrix

$$\hat{\mathbf{R}}(n) = \frac{1}{n}\sum_{m=1}^n \mathbf{X}(m)\mathbf{X}'(m), \tag{1}$$

we wish to use a Matlab simulation to estimate the probability

$$p(n) = P\left[\max_{i,j}\left|\hat{R}_{ij} - I_{ij}\right| \ge 0.05\right]. \tag{2}$$

(In the original printing, 0.05 was 0.01 but that requirement demanded that $n$ be so large that most installations of Matlab would grind to a halt on the calculations.)

The Matlab program uses a matrix algebra identity that may (or may not) be familiar.
For a matrix

$$\mathbf{X} = \begin{bmatrix} \mathbf{x}(1) & \mathbf{x}(2) & \cdots & \mathbf{x}(n) \end{bmatrix}, \tag{3}$$

with columns $\mathbf{x}(i)$, we can write

$$\mathbf{X}\mathbf{X}' = \sum_{i=1}^n \mathbf{x}(i)\mathbf{x}'(i). \tag{4}$$

    function p=diagtest(n,t);
    y=zeros(t,1); p=[ ];
    for ntest=n,
        disp(ntest)
        for i=1:t,
            X=gaussvector(0,eye(10),ntest);
            R=X*(X')/ntest;              % sample covariance matrix
            y(i)=(max(max(abs(R-eye(10))))>= 0.05);
        end
        p=[p, sum(y)/t];                 % fraction of trials over threshold
    end
    semilogy(n,p);
    xlabel('\it n'); ylabel('\it p(n)');

The Matlab function diagtest(n,t) performs the calculation by generating t sample covariance matrices, each using n vectors x(i) in each sample covariance matrix. These n vectors are generated by the gaussvector function. The vector x(i) is the ith column of the matrix X. The function diagtest estimates p(n) as the fraction of t trials for which the threshold 0.05 is exceeded. In addition, if the input n to diagtest is a vector, then the experiment is repeated for each value in the vector n.

The following commands estimate $p(n)$ for $n \in \{10, 100, 1000, 10000\}$ using $t = 2000$ trials:

    n=[10 100 1000 10000];
    p=diagtest(n,2000);

The output is

    p=
        1.0000 1.0000 1.0000 0.0035

We see that $p(n)$ goes from roughly 1 to almost 0 in going from $n = 1{,}000$ to $n = 10{,}000$. To investigate this transition more carefully, we execute the commands

    nn=1000:500:10000;
    p=diagtest(nn,2000);

The output is shown in the following graph. We use a semilog plot to emphasize differences when $p(n)$ is close to zero.

[Figure: semilog plot of $p(n)$ versus $n$, for $1{,}000 \le n \le 10{,}000$.]

Beyond $n = 1{,}000$, the probability $p(n)$ declines rapidly. The "bumpiness" of the graph for large $n$ occurs because the probability $p(n)$ is small enough that out of 2,000 trials, the 0.05 threshold is exceeded only a few times.

Note that if x has dimension greater than 10, then the value of $n$ needed to ensure that $p(n)$ is small would increase.

Problem 7.5.5 Solution

The difficulty in this problem is that although $E[X]$ exists, $E[X^2]$ and higher order moments are infinite. Thus $\mathrm{Var}[X]$ is also infinite.
It also follows for any finite $n$ that the sample mean $M_n(X)$ has infinite variance. In this case, we cannot apply the Chebyshev inequality to the sample mean to show the convergence in probability of $M_n(X)$ to $E[X]$.

If $\lim_{n\to\infty} P[|M_n(X) - E[X]| \ge \epsilon] = p$, then there are two distinct possibilities:

• $p > 0$, or

• $p = 0$ but the Chebyshev inequality isn't a sufficiently powerful technique to verify this fact.

To resolve whether $p = 0$ (and the sample mean converges to the expected value) one can spend time trying to prove either $p = 0$ or $p > 0$. At this point, we try some simulation experiments to see if the experimental evidence points one way or the other.

As requested by the problem, we implement a Matlab function samplemeantest(n,a) to simulate one hundred traces of the sample mean when $E[X] = a$. Each trace is a length $n$ sequence $M_1(X), M_2(X), \ldots, M_n(X)$.

    function mx=samplemeantest(n,a);
    u=rand(n,100);
    x=a-2+(1./sqrt(1-u));        % n x 100 matrix of iid samples of X
    d=((1:n)')*ones(1,100);
    mx=cumsum(x)./d;             % each column is one sample mean trace
    plot(mx);
    xlabel('\it n'); ylabel('\it M_n(X)');
    axis([0 n a-1 a+1]);

The $n \times 100$ matrix x consists of iid samples of $X$. Taking cumulative sums along each column of x, and dividing row $i$ by $i$, each column of mx is a length $n$ sample mean trace. We then plot the traces.

The following graph was generated by samplemeantest(1000,5):

[Figure: one hundred traces of $M_n(X)$ versus $n$, for $0 \le n \le 1000$, with the vertical axis limited to the range 4 to 6.]

Frankly, it is difficult to draw strong conclusions from the graph. If the sample sequences $M_n(X)$ are converging to $E[X]$, the convergence is fairly slow. Even after averaging 1,000 samples, typical values for the sample mean appear to range from $a - 0.5$ to $a + 0.5$. There may also be outlier sequences which are still off the charts since we truncated the y-axis range. On the other hand, the sample mean sequences do not appear to be diverging (which is also possible since $\mathrm{Var}[X] = \infty$). Note the above graph was generated using $10^5$ sample values.
Repeating the experiment with more samples, say samplemeantest(10000,5), will yield a similarly inconclusive result. Even if your version of Matlab can support the generation of 100 times as many samples, you won't know for sure whether the sample mean sequence always converges. On the other hand, the experiment is probably enough that if you pursue the analysis, you should start by trying to prove that $p = 0$. (This will make a fun problem for the third edition!)

Problem Solutions – Chapter 8

Problem 8.1.1 Solution

Assuming the coin is fair, we must choose a rejection region $R$ such that $\alpha = P[R] = 0.05$. We can choose a rejection region $R = \{L > r\}$. What remains is to choose $r$ so that $P[R] = 0.05$. Note that $L > l$ if we first observe $l$ tails in a row. Under the hypothesis that the coin is fair, $l$ tails in a row occurs with probability

$$P[L > l] = (1/2)^l. \tag{1}$$

Thus, we need

$$P[R] = P[L > r] = 2^{-r} = 0.05. \tag{2}$$

Thus, $r = -\log_2(0.05) = \log_2(20) = 4.32$. In this case, we reject the hypothesis that the coin is fair if $L \ge 5$. The significance level of the test is $\alpha = P[L > 4] = 2^{-4} = 0.0625$, which is close to but not exactly 0.05.

The shortcoming of this test is that we always accept the hypothesis that the coin is fair whenever heads occurs on the first, second, third or fourth flip. If the coin was biased such that the probability of heads was much higher than 1/2, say 0.8 or 0.9, we would hardly ever reject the hypothesis that the coin is fair. In that sense, our test cannot identify that kind of biased coin.

Problem 8.1.2 Solution

(a) We wish to develop a hypothesis test of the form

$$P[|K - E[K]| > c] = 0.05 \tag{1}$$

to determine if the coin we've been flipping is indeed a fair one. We would like to find the value of $c$, which will determine the upper and lower limits on how far the number of heads in 100 flips can be from the expected number while we still accept our hypothesis.
Under our fair coin hypothesis, the expected number of heads and the standard deviation of the process are

$$E[K] = 50, \qquad \sigma_K = \sqrt{100 \cdot 1/2 \cdot 1/2} = 5. \tag{2}$$

Now in order to find $c$ we make use of the central limit theorem and divide the above inequality through by $\sigma_K$ to arrive at

$$P\left[\frac{|K - E[K]|}{\sigma_K} > \frac{c}{\sigma_K}\right] = 0.05. \tag{3}$$

Taking the complement, we get

$$P\left[-\frac{c}{\sigma_K} \le \frac{K - E[K]}{\sigma_K} \le \frac{c}{\sigma_K}\right] = 0.95. \tag{4}$$

Using the Central Limit Theorem we can write

$$\Phi\left(\frac{c}{\sigma_K}\right) - \Phi\left(\frac{-c}{\sigma_K}\right) = 2\Phi\left(\frac{c}{\sigma_K}\right) - 1 = 0.95. \tag{5}$$

This implies $\Phi(c/\sigma_K) = 0.975$ or $c/5 = 1.96$. That is, $c = 9.8$ flips. So we see that if we observe more than $50 + 10 = 60$ or fewer than $50 - 10 = 40$ heads, then with significance level $\alpha \approx 0.05$ we should reject the hypothesis that the coin is fair.

(b) Now we wish to develop a test of the form

$$P[K > c] = 0.01. \tag{6}$$

Thus we need to find the value of $c$ that makes the above probability true. This value will tell us that if we observe more than $c$ heads, then with significance level $\alpha = 0.01$, we should reject the hypothesis that the coin is fair. To find this value of $c$ we look to evaluate the CDF

$$F_K(k) = \sum_{i=0}^k \binom{100}{i}(1/2)^{100}. \tag{7}$$

Computation reveals that $c \approx 62$ flips. So if we observe 62 or more heads, then with a significance level of 0.01 we should reject the fair coin hypothesis. Another way to obtain this result is to use a Central Limit Theorem approximation. First, we express our rejection region in terms of a zero mean, unit variance random variable.

$$P[K > c] = 1 - P[K \le c] = 1 - P\left[\frac{K - E[K]}{\sigma_K} \le \frac{c - E[K]}{\sigma_K}\right] = 0.01. \tag{8}$$

Since $E[K] = 50$ and $\sigma_K = 5$, the CLT approximation is

$$P[K > c] \approx 1 - \Phi\left(\frac{c - 50}{5}\right) = 0.01. \tag{9}$$

Since $\Phi(2.33) \approx 0.99$, we have $(c - 50)/5 = 2.33$ or $c = 61.65$. Once again, we see that we reject the hypothesis if we observe 62 or more heads.

Problem 8.1.3 Solution

A reasonable test would reject the null hypothesis that the plant is operating normally if one or more of the chips fail the one-day test.
Exactly how many should be tested and how many failures $N$ are needed to reject the null hypothesis would depend on the significance level of the test.

(a) The lifetime of a chip is $X$, an exponential ($\lambda$) random variable with $\lambda = (T/200)^2$. The probability $p$ that a chip passes the one-day test is

$$p = P[X \ge 1/365] = e^{-\lambda/365}.$$ (1)

For an $m$ chip test, the significance level of the test is

$$\alpha = P[N > 0] = 1 - P[N = 0] = 1 - p^m = 1 - e^{-m\lambda/365}.$$ (2)

(b) At $T = 100^\circ$, $\lambda = 1/4$ and we obtain a significance level of $\alpha = 0.01$ if

$$m = -\frac{365 \ln(0.99)}{\lambda} = \frac{3.67}{\lambda} = 14.67.$$ (3)

In fact, at $m = 15$ chips, the significance level is $\alpha = 0.0102$.

(c) Raising $T$ raises the failure rate $\lambda = (T/200)^2$ and thus lowers $m = 3.67/\lambda$. In essence, raising the temperature makes a “tougher” test and thus requires fewer chips to be tested for the same significance level.

Problem 8.1.4 Solution

(a) The rejection region is $R = \{T > t_0\}$. The duration of a voice call has exponential PDF

$$f_T(t) = \begin{cases} (1/3)e^{-t/3} & t \ge 0, \\ 0 & \text{otherwise.} \end{cases}$$ (1)

The significance level of the test is

$$\alpha = P[T > t_0] = \int_{t_0}^{\infty} f_T(t)\,dt = e^{-t_0/3}.$$ (2)

(b) The significance level is $\alpha = 0.05$ if $t_0 = -3 \ln \alpha = 8.99$ minutes.

Problem 8.1.5 Solution

In order to test just a small number of pacemakers, we test $n$ pacemakers and we reject the null hypothesis if any pacemaker fails the test. Moreover, we choose the smallest $n$ such that we meet the required significance level of the test. The number of pacemakers that fail the test is $X$, a binomial ($n$, $q_0 = 10^{-4}$) random variable. The significance level of the test is

$$\alpha = P[X > 0] = 1 - P[X = 0] = 1 - (1 - q_0)^n.$$ (1)

For a significance level $\alpha = 0.01$, we have that

$$n = \frac{\ln(1 - \alpha)}{\ln(1 - q_0)} = 100.5.$$ (2)

Comment: For $\alpha = 0.01$, keep in mind that there is a one percent probability that a normal factory will fail the test. That is, a test failure is quite unlikely if the factory is operating normally.
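The sample size formula of Problem 8.1.5 gives a non-integer $n$, so it can help to see which significance levels the neighboring integer sample sizes actually achieve. The following is only a minimal sketch using base Matlab; the variable names (nreal, ncand, alphas) are illustrative assumptions and are not part of matcode.

```matlab
% Sketch for the "reject if any unit fails" test of Problem 8.1.5.
% Variable names are illustrative; only base Matlab functions are used.
alpha = 0.01;                       % target significance level
q0    = 1e-4;                       % failure probability of one pacemaker under H0
nreal = log(1-alpha)/log(1-q0);     % real-valued solution of 1-(1-q0)^n = alpha
ncand = [floor(nreal) ceil(nreal)]; % integer sample sizes on either side of nreal
alphas = 1-(1-q0).^ncand;           % significance level achieved by each candidate
fprintf('real-valued n = %.1f\n', nreal);
fprintf('n = %d gives alpha = %.5f\n', [ncand; alphas]);
```

Because $1 - (1 - q_0)^n$ increases with $n$, the smaller candidate stays just below the target $\alpha$ and the larger one lands just above it. The same calculation applies to the chip test of Problem 8.1.3 with $q_0$ replaced by $1 - e^{-\lambda/365}$.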
Problem 8.1.6 Solution

Since the null hypothesis $H_0$ asserts that the two exams have the same mean and variance, we reject $H_0$ if the difference in sample means is large. That is, $R = \{|D| \ge d_0\}$. Under $H_0$, the two sample means satisfy

$$E[M_A] = E[M_B] = \mu, \qquad \text{Var}[M_A] = \text{Var}[M_B] = \frac{\sigma^2}{n} = \frac{100}{n}$$ (1)

Since $n$ is large, it is reasonable to use the Central Limit Theorem to approximate $M_A$ and $M_B$ as Gaussian random variables. Since $M_A$ and $M_B$ are independent, $D$ is also Gaussian with

$$E[D] = E[M_A] - E[M_B] = 0, \qquad \text{Var}[D] = \text{Var}[M_A] + \text{Var}[M_B] = \frac{200}{n}.$$ (2)

Under the Gaussian assumption, we can calculate the significance level of the test as

$$\alpha = P[|D| \ge d_0] = 2\left(1 - \Phi(d_0/\sigma_D)\right).$$ (3)

For $\alpha = 0.05$, $\Phi(d_0/\sigma_D) = 0.975$, or $d_0 = 1.96\sigma_D = 1.96\sqrt{200/n}$. If $n = 100$ students take each exam, then $d_0 = 2.77$ and we reject the null hypothesis that the exams are the same if the sample means differ by more than 2.77 points.

Problem 8.2.1 Solution

For the MAP test, we must choose acceptance regions $A_0$ and $A_1$ for the two hypotheses $H_0$ and $H_1$. From Theorem 8.2, the MAP rule is

$$n \in A_0 \text{ if } \frac{P_{N|H_0}(n)}{P_{N|H_1}(n)} \ge \frac{P[H_1]}{P[H_0]}; \qquad n \in A_1 \text{ otherwise.}$$ (1)

Since $P_{N|H_i}(n) = \lambda_i^n e^{-\lambda_i}/n!$, the MAP rule becomes

$$n \in A_0 \text{ if } \left(\frac{\lambda_0}{\lambda_1}\right)^n e^{-(\lambda_0 - \lambda_1)} \ge \frac{P[H_1]}{P[H_0]}; \qquad n \in A_1 \text{ otherwise.}$$ (2)

Taking logarithms and assuming $\lambda_1 > \lambda_0$ yields the final form of the MAP rule

$$n \in A_0 \text{ if } n \le n^* = \frac{\lambda_1 - \lambda_0 + \ln(P[H_0]/P[H_1])}{\ln(\lambda_1/\lambda_0)}; \qquad n \in A_1 \text{ otherwise.}$$ (3)

From the MAP rule, we can get the ML rule by setting the a priori probabilities to be equal. This yields the ML rule

$$n \in A_0 \text{ if } n \le n^* = \frac{\lambda_1 - \lambda_0}{\ln(\lambda_1/\lambda_0)}; \qquad n \in A_1 \text{ otherwise.}$$ (4)

Problem 8.2.2 Solution

Hypotheses $H_0$ and $H_1$ have a priori probabilities $P[H_0] = 0.8$ and $P[H_1] = 0.2$ and likelihood functions

$$f_{T|H_0}(t) = \begin{cases} (1/3)e^{-t/3} & t \ge 0, \\ 0 & \text{otherwise,} \end{cases} \qquad f_{T|H_1}(t) = \begin{cases} (1/\mu_D)e^{-t/\mu_D} & t \ge 0, \\ 0 & \text{otherwise.} \end{cases}$$ (1)

The acceptance regions are $A_0 = \{t \,|\, t \le t_0\}$ and $A_1 = \{t \,|\, t > t_0\}$.
(a) The false alarm probability is

$$P_{\text{FA}} = P[A_1|H_0] = \int_{t_0}^{\infty} f_{T|H_0}(t)\,dt = e^{-t_0/3}.$$ (2)

(b) The miss probability is

$$P_{\text{MISS}} = P[A_0|H_1] = \int_0^{t_0} f_{T|H_1}(t)\,dt = 1 - e^{-t_0/\mu_D}.$$ (3)

(c) From Theorem 8.6, the maximum likelihood decision rule is

$$t \in A_0 \text{ if } \frac{f_{T|H_0}(t)}{f_{T|H_1}(t)} \ge 1; \qquad t \in A_1 \text{ otherwise.}$$ (4)

After some algebra, this rule simplifies to

$$t \in A_0 \text{ if } t \le t_{ML} = \frac{\ln(\mu_D/3)}{1/3 - 1/\mu_D}; \qquad t \in A_1 \text{ otherwise.}$$ (5)

When $\mu_D = 6$ minutes, $t_{ML} = 6 \ln 2 = 4.16$ minutes. When $\mu_D = 10$ minutes, $t_{ML} = (30/7) \ln(10/3) = 5.16$ minutes.

(d) The ML rule is the same as the MAP rule when $P[H_0] = P[H_1]$. When $P[H_0] > P[H_1]$, the MAP rule (which minimizes the probability of an error) should enlarge the $A_0$ acceptance region. Thus we would expect $t_{MAP} > t_{ML}$.

(e) From Theorem 8.2, the MAP rule is

$$t \in A_0 \text{ if } \frac{f_{T|H_0}(t)}{f_{T|H_1}(t)} \ge \frac{P[H_1]}{P[H_0]} = \frac{1}{4}; \qquad t \in A_1 \text{ otherwise.}$$ (6)

This rule simplifies to

$$t \in A_0 \text{ if } t \le t_{MAP} = \frac{\ln(4\mu_D/3)}{1/3 - 1/\mu_D}; \qquad t \in A_1 \text{ otherwise.}$$ (7)

When $\mu_D = 6$ minutes, $t_{MAP} = 6 \ln 8 = 12.48$ minutes. When $\mu_D = 10$ minutes, $t_{MAP} = (30/7) \ln(40/3) = 11.1$ minutes.

(f) For a given threshold $t_0$, we learned in parts (a) and (b) that

$$P_{\text{FA}} = e^{-t_0/3}, \qquad P_{\text{MISS}} = 1 - e^{-t_0/\mu_D}.$$ (8)

The Matlab program rocvoicedataout graphs both receiver operating curves. The program and the resulting ROC are shown here.

    t=0:0.05:30;
    PFA= exp(-t/3);
    PMISS6= 1-exp(-t/6);
    PMISS10=1-exp(-t/10);
    plot(PFA,PMISS6,PFA,PMISS10);
    legend('\mu_D=6','\mu_D=10');
    xlabel('\itP_{\rmFA}');
    ylabel('\itP_{\rmMISS}');

[Figure: receiver operating curves, P_MISS versus P_FA, for µ_D = 6 and µ_D = 10.]

As one might expect, larger $\mu_D$ resulted in reduced $P_{\text{MISS}}$ for the same $P_{\text{FA}}$.

Problem 8.2.3 Solution

By Theorem 8.5, the decision rule is

$$n \in A_0 \text{ if } L(n) = \frac{P_{N|H_0}(n)}{P_{N|H_1}(n)} \ge \gamma; \qquad n \in A_1 \text{ otherwise,}$$ (1)

where $\gamma$ is the largest possible value such that $\sum_{L(n)<\gamma} P_{N|H_0}(n) \le \alpha$. Given $H_0$, $N$ is Poisson ($a_0 = 1{,}000$) while given $H_1$, $N$ is Poisson ($a_1 = 1{,}300$).
We can solve for the acceptance set $A_0$ by observing that $n \in A_0$ if

$$\frac{P_{N|H_0}(n)}{P_{N|H_1}(n)} = \frac{a_0^n e^{-a_0}/n!}{a_1^n e^{-a_1}/n!} \ge \gamma.$$ (2)

Cancelling common factors and taking the logarithm, we find that $n \in A_0$ if

$$n \ln\frac{a_0}{a_1} \ge (a_0 - a_1) + \ln\gamma.$$ (3)

Since $\ln(a_0/a_1) < 0$, dividing through reverses the inequality and shows that

$$n \in A_0 \text{ if } n \le n^* = \frac{(a_0 - a_1) + \ln\gamma}{\ln(a_0/a_1)} = \frac{(a_1 - a_0) - \ln\gamma}{\ln(a_1/a_0)}; \qquad n \in A_1 \text{ otherwise.}$$ (4)

However, we still need to determine the constant $\gamma$. In fact, it is easier to work with the threshold $n^*$ directly. Note that $L(n) < \gamma$ if and only if $n > n^*$. Thus we choose the smallest $n^*$ such that

$$P[N > n^*|H_0] = \sum_{n > n^*} P_{N|H_0}(n) \le \alpha = 10^{-6}.$$ (5)

To find $n^*$ a reasonable approach would be to use a Central Limit Theorem approximation since given $H_0$, $N$ is a Poisson (1,000) random variable, which has the same PMF as the sum of 1,000 independent Poisson (1) random variables. Given $H_0$, $N$ has expected value $a_0$ and variance $a_0$. From the CLT,

$$P[N > n^*|H_0] = P\left[\frac{N - a_0}{\sqrt{a_0}} > \frac{n^* - a_0}{\sqrt{a_0}} \,\Big|\, H_0\right] \approx Q\left(\frac{n^* - a_0}{\sqrt{a_0}}\right) \le 10^{-6}.$$ (6)

From Table 3.2, $Q(4.75) = 1.02 \times 10^{-6}$ and $Q(4.76) < 10^{-6}$, implying

$$n^* = a_0 + 4.76\sqrt{a_0} = 1150.5.$$ (7)

On the other hand, perhaps the CLT should be used with some caution since $\alpha = 10^{-6}$ implies we are using the CLT approximation far from the center of the distribution. In fact, we can check our answer using the poissoncdf function:

    >> nstar=[1150 1151 1152 1153 1154 1155];
    >> (1.0-poissoncdf(1000,nstar))'
    ans =
      1.0e-005 *
        0.1644
        0.1420
        0.1225
        0.1056
        0.0910
        0.0783
    >>

Thus we see that $n^* = 1154$. Using this threshold, the miss probability is

$$P[N \le n^*|H_1] = P[N \le 1154|H_1] = \texttt{poissoncdf(1300,1154)} = 1.98 \times 10^{-5}.$$ (8)

Keep in mind that this is the smallest possible $P_{\text{MISS}}$ subject to the constraint that $P_{\text{FA}} \le 10^{-6}$.

Problem 8.2.4 Solution

(a) Given $H_0$, $X$ is Gaussian (0, 1). Given $H_1$, $X$ is Gaussian (4, 1).
From Theorem 8.2, the MAP hypothesis test is

$$x \in A_0 \text{ if } \frac{f_{X|H_0}(x)}{f_{X|H_1}(x)} = \frac{e^{-x^2/2}}{e^{-(x-4)^2/2}} \ge \frac{P[H_1]}{P[H_0]}; \qquad x \in A_1 \text{ otherwise.}$$ (1)

Since a target is present with probability $P[H_1] = 0.01$, the MAP rule simplifies to

$$x \in A_0 \text{ if } x \le x_{MAP} = 2 - \frac{1}{4}\ln\frac{P[H_1]}{P[H_0]} = 3.15; \qquad x \in A_1 \text{ otherwise.}$$ (2)

The false alarm and miss probabilities are

$$P_{\text{FA}} = P[X \ge x_{MAP}|H_0] = Q(x_{MAP}) = 8.16 \times 10^{-4}$$ (3)

$$P_{\text{MISS}} = P[X < x_{MAP}|H_1] = \Phi(x_{MAP} - 4) = 1 - \Phi(0.85) = 0.1977.$$ (4)

The average cost of the MAP policy is

$$E[C_{MAP}] = C_{10} P_{\text{FA}} P[H_0] + C_{01} P_{\text{MISS}} P[H_1]$$ (5)

$$= (1)(8.16 \times 10^{-4})(0.99) + (10^4)(0.1977)(0.01) = 19.77.$$ (6)

(b) The cost of a false alarm is $C_{10} = 1$ unit while the cost of a miss is $C_{01} = 10^4$ units. From Theorem 8.3, we see that the Minimum Cost test is the same as the MAP test except that $P[H_0]$ is replaced by $C_{10}P[H_0]$ and $P[H_1]$ is replaced by $C_{01}P[H_1]$. Thus, we see from the MAP test that the minimum cost test is

$$x \in A_0 \text{ if } x \le x_{MC} = 2 - \frac{1}{4}\ln\frac{C_{01}P[H_1]}{C_{10}P[H_0]} = 0.846; \qquad x \in A_1 \text{ otherwise.}$$ (7)

The false alarm and miss probabilities are

$$P_{\text{FA}} = P[X \ge x_{MC}|H_0] = Q(x_{MC}) = 0.1987$$ (8)

$$P_{\text{MISS}} = P[X < x_{MC}|H_1] = \Phi(x_{MC} - 4) = 1 - \Phi(3.154) = 8.06 \times 10^{-4}.$$ (9)

The average cost of the minimum cost policy is

$$E[C_{MC}] = C_{10} P_{\text{FA}} P[H_0] + C_{01} P_{\text{MISS}} P[H_1]$$ (10)

$$= (1)(0.1987)(0.99) + (10^4)(8.06 \times 10^{-4})(0.01) = 0.2773.$$ (11)

Because the cost of a miss is so high, the minimum cost test greatly reduces the miss probability, resulting in a much lower average cost than the MAP test.

Problem 8.2.5 Solution

Given $H_0$, $X$ is Gaussian (0, 1). Given $H_1$, $X$ is Gaussian ($v$, 1). By Theorem 8.4, the Neyman-Pearson test is

$$x \in A_0 \text{ if } L(x) = \frac{f_{X|H_0}(x)}{f_{X|H_1}(x)} = \frac{e^{-x^2/2}}{e^{-(x-v)^2/2}} \ge \gamma; \qquad x \in A_1 \text{ otherwise.}$$ (1)

This rule simplifies to

$$x \in A_0 \text{ if } L(x) = e^{-[x^2 - (x-v)^2]/2} = e^{-vx + v^2/2} \ge \gamma; \qquad x \in A_1 \text{ otherwise.}$$
(2)

Taking logarithms, the Neyman-Pearson rule becomes

$$x \in A_0 \text{ if } x \le x_0 = \frac{v}{2} - \frac{1}{v}\ln\gamma; \qquad x \in A_1 \text{ otherwise.}$$ (3)

The choice of $\gamma$ has a one-to-one correspondence with the choice of the threshold $x_0$. Moreover, $L(x) \ge \gamma$ if and only if $x \le x_0$. In terms of $x_0$, the false alarm probability is

$$P_{\text{FA}} = P[L(X) < \gamma|H_0] = P[X \ge x_0|H_0] = Q(x_0).$$ (4)

Thus we choose $x_0$ such that $Q(x_0) = \alpha$.

Problem 8.2.6 Solution

Given $H_0$, $M_n(T)$ has expected value $E[V] = 3$ and variance $\text{Var}[V]/n = 9/n$. Given $H_1$, $M_n(T)$ has expected value $E[D] = 6$ and variance $\text{Var}[D]/n = 36/n$.

(a) Using a Central Limit Theorem approximation, the false alarm probability is

$$P_{\text{FA}} = P[M_n(T) > t_0|H_0] = P\left[\frac{M_n(T) - 3}{\sqrt{9/n}} > \frac{t_0 - 3}{\sqrt{9/n}}\right] = Q(\sqrt{n}[t_0/3 - 1]).$$ (1)

(b) Again, using a CLT approximation, the miss probability is

$$P_{\text{MISS}} = P[M_n(T) \le t_0|H_1] = P\left[\frac{M_n(T) - 6}{\sqrt{36/n}} \le \frac{t_0 - 6}{\sqrt{36/n}}\right] = \Phi(\sqrt{n}[t_0/6 - 1]).$$ (2)

(c) From Theorem 8.6, the maximum likelihood decision rule is

$$t \in A_0 \text{ if } \frac{f_{M_n(T)|H_0}(t)}{f_{M_n(T)|H_1}(t)} \ge 1; \qquad t \in A_1 \text{ otherwise.}$$ (3)

We will see shortly that using a CLT approximation for the likelihood functions is something of a detour. Nevertheless, with a CLT approximation, the likelihood functions are

$$f_{M_n(T)|H_0}(t) = \sqrt{\frac{n}{18\pi}}\, e^{-n(t-3)^2/18}, \qquad f_{M_n(T)|H_1}(t) = \sqrt{\frac{n}{72\pi}}\, e^{-n(t-6)^2/72}$$ (4)

From the CLT approximation, the ML decision rule is

$$t \in A_0 \text{ if } \sqrt{\frac{72}{18}}\, \frac{e^{-n(t-3)^2/18}}{e^{-n(t-6)^2/72}} = 2e^{-n[4(t-3)^2 - (t-6)^2]/72} \ge 1; \qquad t \in A_1 \text{ otherwise.}$$ (5)

After some algebra, this rule simplifies to

$$t \in A_0 \text{ if } t^2 - 4t - \frac{24 \ln 2}{n} \le 0; \qquad t \in A_1 \text{ otherwise.}$$ (6)

Since the quadratic $t^2 - 4t - 24\ln(2)/n$ has two zeros, we use the quadratic formula to find the roots. One root corresponds to a negative value of $t$ and can be discarded since $M_n(T) \ge 0$. Thus the ML rule (for $n = 9$) becomes

$$t \in A_0 \text{ if } t \le t_{ML} = 2 + 2\sqrt{1 + 6\ln(2)/n} = 4.42; \qquad t \in A_1 \text{ otherwise.}$$
(7)

The negative root of the quadratic is the result of the Gaussian assumption which allows for a nonzero probability that $M_n(T)$ will be negative. In this case, hypothesis $H_1$ which has higher variance becomes more likely. However, since $M_n(T) \ge 0$, we can ignore this root since it is just an artifact of the CLT approximation.

In fact, the CLT approximation gives an incorrect answer. Note that $M_n(T) = Y_n/n$ where $Y_n$ is a sum of iid exponential random variables. Under hypothesis $H_0$, $Y_n$ is an Erlang ($n$, $\lambda_0 = 1/3$) random variable. Under hypothesis $H_1$, $Y_n$ is an Erlang ($n$, $\lambda_1 = 1/6$) random variable. Since $M_n(T) = Y_n/n$ is a scaled version of $Y_n$, Theorem 3.20 tells us that given hypothesis $H_i$, $M_n(T)$ is an Erlang ($n$, $n\lambda_i$) random variable. Thus $M_n(T)$ has likelihood functions

$$f_{M_n(T)|H_i}(t) = \begin{cases} \dfrac{(n\lambda_i)^n t^{n-1} e^{-n\lambda_i t}}{(n-1)!} & t \ge 0, \\ 0 & \text{otherwise.} \end{cases}$$ (8)

Using the Erlang likelihood functions, the ML rule becomes

$$t \in A_0 \text{ if } \frac{f_{M_n(T)|H_0}(t)}{f_{M_n(T)|H_1}(t)} = \left(\frac{\lambda_0}{\lambda_1}\right)^n e^{-n(\lambda_0 - \lambda_1)t} \ge 1; \qquad t \in A_1 \text{ otherwise.}$$ (9)

This rule simplifies to

$$t \in A_0 \text{ if } t \le t_{ML} = \frac{\ln(\lambda_0/\lambda_1)}{\lambda_0 - \lambda_1} = 6\ln 2 = 4.159; \qquad t \in A_1 \text{ otherwise.}$$ (10)

Since $6 \ln 2 = 4.159$, this rule is not the same as the rule derived using a CLT approximation. Using the exact Erlang PDF, the ML rule does not depend on $n$. Moreover, even if $n \to \infty$, the exact Erlang-derived rule and the CLT approximation rule remain different. In fact, the CLT-based rule is simply an approximation to the correct rule. This highlights that we should first check whether a CLT approximation is necessary before we use it.

(d) In this part, we will use the exact Erlang PDFs to find the MAP decision rule. From Theorem 8.2, the MAP rule is

$$t \in A_0 \text{ if } \frac{f_{M_n(T)|H_0}(t)}{f_{M_n(T)|H_1}(t)} = \left(\frac{\lambda_0}{\lambda_1}\right)^n e^{-n(\lambda_0 - \lambda_1)t} \ge \frac{P[H_1]}{P[H_0]}; \qquad t \in A_1 \text{ otherwise.}$$ (11)

Since $P[H_0] = 0.8$ and $P[H_1] = 0.2$, the MAP rule simplifies to

$$t \in A_0 \text{ if } t \le t_{MAP} = \frac{\ln\dfrac{\lambda_0}{\lambda_1} - \dfrac{1}{n}\ln\dfrac{P[H_1]}{P[H_0]}}{\lambda_0 - \lambda_1} = 6\left[\ln 2 + \frac{\ln 4}{n}\right]; \qquad t \in A_1 \text{ otherwise.}$$
(12)

For $n = 9$, $t_{MAP} = 5.083$.

(e) Although we have seen it is incorrect to use a CLT approximation to derive the decision rule, the CLT approximation used in parts (a) and (b) remains a good way to estimate the false alarm and miss probabilities. However, given $H_i$, $M_n(T)$ is an Erlang ($n$, $n\lambda_i$) random variable. In particular, given $H_0$, $M_n(T)$ is an Erlang ($n$, $n/3$) random variable while given $H_1$, $M_n(T)$ is an Erlang ($n$, $n/6$). Thus we can also use erlangcdf for an exact calculation of the false alarm and miss probabilities. To summarize the results of parts (a) and (b), a threshold $t_0$ implies that

$$P_{\text{FA}} = P[M_n(T) > t_0|H_0] = \texttt{1-erlangcdf(n,n/3,t0)} \approx Q(\sqrt{n}[t_0/3 - 1]),$$ (13)

$$P_{\text{MISS}} = P[M_n(T) \le t_0|H_1] = \texttt{erlangcdf(n,n/6,t0)} \approx \Phi(\sqrt{n}[t_0/6 - 1]).$$ (14)

Here is a program that generates the receiver operating curve.

    %voicedatroc.m
    t0=(1:0.1:8)';                        % column vector of thresholds
    n=9;
    PFA9=1.0-erlangcdf(n,n/3,t0);         % exact Erlang false alarm probability
    PFA9clt=1-phi(sqrt(n)*((t0/3)-1));    % CLT approximation
    PM9=erlangcdf(n,n/6,t0);              % exact Erlang miss probability
    PM9clt=phi(sqrt(n)*((t0/6)-1));       % CLT approximation
    n=16;
    PFA16=1.0-erlangcdf(n,n/3,t0);
    PFA16clt=1.0-phi(sqrt(n)*((t0/3)-1));
    PM16=erlangcdf(n,n/6,t0);
    PM16clt=phi(sqrt(n)*((t0/6)-1));
    plot(PFA9,PM9,PFA9clt,PM9clt,PFA16,PM16,PFA16clt,PM16clt);
    axis([0 0.8 0 0.8]);
    legend('Erlang n=9','CLT n=9','Erlang n=16','CLT n=16');

Here are the resulting ROCs.

[Figure: ROCs for n = 9 and n = 16, each shown for the exact Erlang calculation and for the CLT approximation.]

Both the true curve and CLT-based approximations are shown. The graph makes it clear that the CLT approximations are somewhat inaccurate. It is also apparent that the ROC for $n = 16$ is clearly better than for $n = 9$.

Problem 8.2.7 Solution

This problem is a continuation of Problem 8.2.6. In this case, we say a call is a “success” if $T > t_0$. The success probability depends on which hypothesis is true. In particular,

$$p_0 = P[T > t_0|H_0] = e^{-t_0/3}, \qquad p_1 = P[T > t_0|H_1] = e^{-t_0/6}.$$ (1)

Under hypothesis $H_i$, $K$ has the binomial ($n$, $p_i$) PMF

$$P_{K|H_i}(k) = \binom{n}{k} p_i^k (1 - p_i)^{n-k}.$$
(2)

(a) A false alarm occurs if $K > k_0$ under hypothesis $H_0$. The probability of this event is

$$P_{\text{FA}} = P[K > k_0|H_0] = \sum_{k=k_0+1}^{n} \binom{n}{k} p_0^k (1 - p_0)^{n-k}.$$ (3)

(b) From Theorem 8.6, the maximum likelihood decision rule is

$$k \in A_0 \text{ if } \frac{P_{K|H_0}(k)}{P_{K|H_1}(k)} = \frac{p_0^k (1 - p_0)^{n-k}}{p_1^k (1 - p_1)^{n-k}} \ge 1; \qquad k \in A_1 \text{ otherwise.}$$ (4)

This rule simplifies to

$$k \in A_0 \text{ if } k \ln\left(\frac{p_0/(1 - p_0)}{p_1/(1 - p_1)}\right) \ge n \ln\left(\frac{1 - p_1}{1 - p_0}\right); \qquad k \in A_1 \text{ otherwise.}$$ (5)

To proceed further, we need to know if $p_0 < p_1$ or if $p_0 \ge p_1$. For $t_0 = 4.5$,

$$p_0 = e^{-1.5} = 0.2231 < e^{-0.75} = 0.4724 = p_1.$$ (6)

In this case, the ML rule becomes

$$k \in A_0 \text{ if } k \le k_{ML} = \frac{\ln\left(\dfrac{1 - p_0}{1 - p_1}\right)}{\ln\left(\dfrac{p_1/(1 - p_1)}{p_0/(1 - p_0)}\right)}\, n = (0.340)n; \qquad k \in A_1 \text{ otherwise.}$$ (7)

For $n = 16$, $k_{ML} = 5.44$.

(c) From Theorem 8.2, the MAP test is

$$k \in A_0 \text{ if } \frac{P_{K|H_0}(k)}{P_{K|H_1}(k)} = \frac{p_0^k (1 - p_0)^{n-k}}{p_1^k (1 - p_1)^{n-k}} \ge \frac{P[H_1]}{P[H_0]}; \qquad k \in A_1 \text{ otherwise.}$$ (8)

With $P[H_0] = 0.8$ and $P[H_1] = 0.2$, so that $\ln(P[H_1]/P[H_0]) = -\ln 4$, this rule simplifies to

$$k \in A_0 \text{ if } k \ln\left(\frac{p_0/(1 - p_0)}{p_1/(1 - p_1)}\right) \ge n \ln\left(\frac{1 - p_1}{1 - p_0}\right) - \ln 4; \qquad k \in A_1 \text{ otherwise.}$$ (9)

For $t_0 = 4.5$, $p_0 = 0.2231 < p_1 = 0.4724$, the MAP rule becomes

$$k \in A_0 \text{ if } k \le k_{MAP} = \frac{n \ln\left(\dfrac{1 - p_0}{1 - p_1}\right) + \ln 4}{\ln\left(\dfrac{p_1/(1 - p_1)}{p_0/(1 - p_0)}\right)} = (0.340)n + 1.22; \qquad k \in A_1 \text{ otherwise.}$$ (10)

For $n = 16$, $k_{MAP} = 6.66$.

(d) For threshold $k_0$, the false alarm and miss probabilities are

$$P_{\text{FA}} = P[K > k_0|H_0] = \texttt{1-binomialcdf(n,p0,k0)}$$ (11)

$$P_{\text{MISS}} = P[K \le k_0|H_1] = \texttt{binomialcdf(n,p1,k0)}$$ (12)

The ROC is generated by evaluating $P_{\text{FA}}$ and $P_{\text{MISS}}$ for each value of $k_0$. Here is a Matlab program that does this task and plots the ROC.
    function [PFA,PMISS]=binvoicedataroc(n);
    % ROC of the binomial test for thresholds t0 = 3 and t0 = 4.5
    t0=[3; 4.5];
    p0=exp(-t0/3); p1=exp(-t0/6);     % success probabilities under H0 and H1
    k0=(0:n)';
    PFA=zeros(n+1,2); PMISS=zeros(n+1,2);
    for j=1:2,
        PFA(:,j) = 1.0-binomialcdf(n,p0(j),k0);
        PMISS(:,j)=binomialcdf(n,p1(j),k0);   % assign the declared output PMISS
    end
    plot(PFA(:,1),PMISS(:,1),'-o',PFA(:,2),PMISS(:,2),'-x');
    legend('t_0=3','t_0=4.5');
    axis([0 0.8 0 0.8]);
    xlabel('\itP_{\rmFA}');
    ylabel('\itP_{\rmMISS}');

and here is the resulting ROC:

[Figure: ROCs, P_MISS versus P_FA, for thresholds t_0 = 3 and t_0 = 4.5.]

As we see, the test works better with threshold $t_0 = 4.5$ than with $t_0 = 3$.

Problem 8.2.8 Solution

Given hypothesis $H_0$ that $X = 0$, $Y = W$ is an exponential ($\lambda = 1$) random variable. Given hypothesis $H_1$ that $X = 1$, $Y = V + W$ is an Erlang ($n = 2$, $\lambda = 1$) random variable. That is,

$$f_{Y|H_0}(y) = \begin{cases} e^{-y} & y \ge 0, \\ 0 & \text{otherwise,} \end{cases} \qquad f_{Y|H_1}(y) = \begin{cases} ye^{-y} & y \ge 0, \\ 0 & \text{otherwise.} \end{cases}$$ (1)

The probability of a decoding error is minimized by the MAP rule. Since $P[H_0] = P[H_1] = 1/2$, the MAP rule is

$$y \in A_0 \text{ if } \frac{f_{Y|H_0}(y)}{f_{Y|H_1}(y)} = \frac{e^{-y}}{ye^{-y}} \ge \frac{P[H_1]}{P[H_0]} = 1; \qquad y \in A_1 \text{ otherwise.}$$ (2)

Thus the MAP rule simplifies to

$$y \in A_0 \text{ if } y \le 1; \qquad y \in A_1 \text{ otherwise.}$$ (3)

The probability of error is

$$P_{\text{ERR}} = P[Y > 1|H_0]\,P[H_0] + P[Y \le 1|H_1]\,P[H_1]$$ (4)

$$= \frac{1}{2}\int_1^{\infty} e^{-y}\,dy + \frac{1}{2}\int_0^1 ye^{-y}\,dy$$ (5)

$$= \frac{e^{-1}}{2} + \frac{1 - 2e^{-1}}{2} = \frac{1 - e^{-1}}{2}.$$ (6)

Problem 8.2.9 Solution

Given hypothesis $H_i$, $K$ has the binomial PMF

$$P_{K|H_i}(k) = \binom{n}{k} q_i^k (1 - q_i)^{n-k}.$$ (1)

(a) The ML rule is

$$k \in A_0 \text{ if } P_{K|H_0}(k) > P_{K|H_1}(k); \qquad k \in A_1 \text{ otherwise.}$$ (2)

When we observe $K = k \in \{0, 1, \ldots, n\}$, plugging in the conditional PMFs yields the rule

$$k \in A_0 \text{ if } \binom{n}{k} q_0^k (1 - q_0)^{n-k} > \binom{n}{k} q_1^k (1 - q_1)^{n-k}; \qquad k \in A_1 \text{ otherwise.}$$ (3)

Cancelling common factors, taking the logarithm of both sides, and rearranging yields

$$k \in A_0 \text{ if } k \ln q_0 + (n - k)\ln(1 - q_0) > k \ln q_1 + (n - k)\ln(1 - q_1); \qquad k \in A_1 \text{ otherwise.}$$ (4)

By combining all terms with $k$, the rule can be simplified to

$$k \in A_0 \text{ if } k \ln\left(\frac{q_1/(1 - q_1)}{q_0/(1 - q_0)}\right) < n \ln\left(\frac{1 - q_0}{1 - q_1}\right); \qquad k \in A_1 \text{ otherwise.}$$
(5)

Note that $q_1 > q_0$ implies $q_1/(1 - q_1) > q_0/(1 - q_0)$. Thus, we can rewrite our ML rule as

$$k \in A_0 \text{ if } k < k^* = n\,\frac{\ln[(1 - q_0)/(1 - q_1)]}{\ln[q_1/q_0] + \ln[(1 - q_0)/(1 - q_1)]}; \qquad k \in A_1 \text{ otherwise.}$$ (6)

(b) Let $k^*$ denote the threshold given in part (a). Using $n = 500$, $q_0 = 10^{-4}$, and $q_1 = 10^{-2}$, we have

$$k^* = 500\,\frac{\ln[(1 - 10^{-4})/(1 - 10^{-2})]}{\ln[10^{-2}/10^{-4}] + \ln[(1 - 10^{-4})/(1 - 10^{-2})]} \approx 1.078$$ (7)

Thus the ML rule is that if we observe $K \le 1$, then we choose hypothesis $H_0$; otherwise, we choose $H_1$. The false alarm probability is

$$P_{\text{FA}} = P[A_1|H_0] = P[K > 1|H_0]$$ (8)

$$= 1 - P_{K|H_0}(0) - P_{K|H_0}(1)$$ (9)

$$= 1 - (1 - q_0)^{500} - 500q_0(1 - q_0)^{499} = 0.0012$$ (10)

and the miss probability is

$$P_{\text{MISS}} = P[A_0|H_1] = P[K \le 1|H_1]$$ (11)

$$= P_{K|H_1}(0) + P_{K|H_1}(1)$$ (12)

$$= (1 - q_1)^{500} + 500q_1(1 - q_1)^{499} = 0.0398.$$ (13)

(c) In the test of Example 8.8, the geometric random variable $N$, the number of tests needed to find the first failure, was used. In this problem, the binomial random variable $K$, the number of failures in 500 tests, was used. We will call these two procedures the geometric and the binomial tests. Also, we will use $P^{(N)}_{\text{FA}}$ and $P^{(N)}_{\text{MISS}}$ to denote the false alarm and miss probabilities using the geometric test. We also use $P^{(K)}_{\text{FA}}$ and $P^{(K)}_{\text{MISS}}$ for the error probabilities of the binomial test. From Example 8.8, we have the following comparison:

    geometric test                 binomial test              (14)
    P^(N)_FA   = 0.0045,           P^(K)_FA   = 0.0012,       (15)
    P^(N)_MISS = 0.0087,           P^(K)_MISS = 0.0398        (16)

When making comparisons between tests, we want to judge both the reliability of the test as well as the cost of the testing procedure. With respect to the reliability, we see that the conditional error probabilities appear to be comparable in that

$$\frac{P^{(N)}_{\text{FA}}}{P^{(K)}_{\text{FA}}} = 3.75 \quad \text{but} \quad \frac{P^{(K)}_{\text{MISS}}}{P^{(N)}_{\text{MISS}}} = 4.57.$$ (17)

Roughly, the false alarm probability of the geometric test is about four times higher than that of the binomial test.
However, the miss probability of the binomial test is about four times that of the geometric test. As for the cost of the test, it is reasonable to assume the cost is proportional to the number of disk drives that are tested. For the geometric test of Example 8.8, we test either until the first failure or until 46 drives pass the test. For the binomial test, we test until either 2 drives fail or until 500 drives pass the test! You can, if you wish, calculate the expected number of drives tested under each test method for each hypothesis. However, it isn’t necessary in order to see that a lot more drives will be tested using the binomial test. If we knew the a priori probabilities $P[H_i]$ and also the relative costs of the two types of errors, then we could determine which test procedure was better. However, without that information, it would not be unreasonable to conclude that the geometric test offers performance roughly comparable to that of the binomial test but with a significant reduction in the expected number of drives tested.

Problem 8.2.10 Solution

The key to this problem is to observe that

$$P[A_0|H_0] = 1 - P[A_1|H_0], \qquad P[A_1|H_1] = 1 - P[A_0|H_1].$$ (1)

The total expected cost can be written as

$$E[C'] = P[A_1|H_0]\,P[H_0]\,C'_{10} + (1 - P[A_1|H_0])\,P[H_0]\,C'_{00}$$ (2)

$$\qquad + P[A_0|H_1]\,P[H_1]\,C'_{01} + (1 - P[A_0|H_1])\,P[H_1]\,C'_{11}.$$ (3)

Rearranging terms, we have

$$E[C'] = P[A_1|H_0]\,P[H_0]\,(C'_{10} - C'_{00}) + P[A_0|H_1]\,P[H_1]\,(C'_{01} - C'_{11}) + P[H_0]\,C'_{00} + P[H_1]\,C'_{11}.$$ (4)

Since $P[H_0]C'_{00} + P[H_1]C'_{11}$ does not depend on the acceptance sets $A_0$ and $A_1$, the decision rule that minimizes $E[C']$ is the same decision rule that minimizes

$$E[C''] = P[A_1|H_0]\,P[H_0]\,(C'_{10} - C'_{00}) + P[A_0|H_1]\,P[H_1]\,(C'_{01} - C'_{11}).$$ (5)

The decision rule that minimizes $E[C'']$ is the same as the minimum cost test in Theorem 8.3 with the costs $C_{01}$ and $C_{10}$ replaced by the differential costs $C'_{01} - C'_{11}$ and $C'_{10} - C'_{00}$.
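The decomposition in Problem 8.2.10 can be checked numerically. The sketch below assumes arbitrary illustrative values for the priors, the error probabilities and the four costs (none of them come from the text), and it uses the hypothetical matrix C with C(i+1,j+1) holding the cost of deciding $H_i$ when $H_j$ is true.

```matlab
% Sketch: numerical check of the cost decomposition in Problem 8.2.10.
% All numbers are illustrative assumptions, not values from the text.
PH    = [0.8 0.2];     % [P[H0] P[H1]]
PFA   = 0.1;           % P[A1|H0]
PMISS = 0.3;           % P[A0|H1]
C = [2 50; 5 1];       % C(i+1,j+1): cost of deciding H_i when H_j is true
% total expected cost, summed over all four decision/hypothesis pairs
total = PFA*PH(1)*C(2,1) + (1-PFA)*PH(1)*C(1,1) ...
      + PMISS*PH(2)*C(1,2) + (1-PMISS)*PH(2)*C(2,2);
% part that depends on the decision rule, written with differential costs
ruledep = PFA*PH(1)*(C(2,1)-C(1,1)) + PMISS*PH(2)*(C(1,2)-C(2,2));
% part that is the same for every choice of acceptance sets
fixedpart = PH(1)*C(1,1) + PH(2)*C(2,2);
fprintf('total = %.4f, rule-dependent + fixed = %.4f\n', total, ruledep+fixedpart);
```

Changing PFA and PMISS moves only ruledep, which is why minimizing the total cost and minimizing the differential-cost expression select the same decision rule.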
Problem 8.3.1 Solution

Since the three hypotheses $H_0$, $H_1$, and $H_2$ are equally likely, the MAP and ML hypothesis tests are the same. From Theorem 8.8, the MAP rule is

$$x \in A_m \text{ if } f_{X|H_m}(x) \ge f_{X|H_j}(x) \text{ for all } j.$$ (1)

Since $N$ is Gaussian with zero mean and variance $\sigma_N^2$, the conditional PDF of $X$ given $H_i$ is

$$f_{X|H_i}(x) = \frac{1}{\sqrt{2\pi\sigma_N^2}}\, e^{-(x - a(i-1))^2/2\sigma_N^2}.$$ (2)

Thus, the MAP rule is

$$x \in A_m \text{ if } (x - a(m-1))^2 \le (x - a(j-1))^2 \text{ for all } j.$$ (3)

This implies that the rule for membership in $A_0$ is

$$x \in A_0 \text{ if } (x + a)^2 \le x^2 \text{ and } (x + a)^2 \le (x - a)^2.$$ (4)

This rule simplifies to

$$x \in A_0 \text{ if } x \le -a/2.$$ (5)

Similar rules can be developed for $A_1$ and $A_2$. These are:

$$x \in A_1 \text{ if } -a/2 \le x \le a/2$$ (6)

$$x \in A_2 \text{ if } x \ge a/2$$ (7)

To summarize, the three acceptance regions are

$$A_0 = \{x \,|\, x \le -a/2\}, \qquad A_1 = \{x \,|\, -a/2 < x \le a/2\}, \qquad A_2 = \{x \,|\, x > a/2\}$$ (8)

Graphically, the signal space is one dimensional and the acceptance regions are

[Figure: the X axis with signals s_0, s_1, s_2 at −a, 0, a and the acceptance regions A_0, A_1, A_2.]

Just as in the QPSK system of Example 8.13, the additive Gaussian noise dictates that the acceptance region $A_i$ is the set of observations $x$ that are closer to $s_i = (i - 1)a$ than any other $s_j$.

Problem 8.3.2 Solution

Let the components of $\mathbf{s}_{ijk}$ be denoted by $s^{(1)}_{ijk}$ and $s^{(2)}_{ijk}$ so that given hypothesis $H_{ijk}$,

$$\begin{bmatrix} X_1 \\ X_2 \end{bmatrix} = \begin{bmatrix} s^{(1)}_{ijk} \\ s^{(2)}_{ijk} \end{bmatrix} + \begin{bmatrix} N_1 \\ N_2 \end{bmatrix}$$ (1)

As in Example 8.13, we will assume $N_1$ and $N_2$ are iid zero mean Gaussian random variables with variance $\sigma^2$. Thus, given hypothesis $H_{ijk}$, $X_1$ and $X_2$ are independent and the conditional joint PDF of $X_1$ and $X_2$ is

$$f_{X_1,X_2|H_{ijk}}(x_1, x_2) = f_{X_1|H_{ijk}}(x_1)\, f_{X_2|H_{ijk}}(x_2)$$ (2)

$$= \frac{1}{2\pi\sigma^2}\, e^{-(x_1 - s^{(1)}_{ijk})^2/2\sigma^2}\, e^{-(x_2 - s^{(2)}_{ijk})^2/2\sigma^2}$$ (3)

$$= \frac{1}{2\pi\sigma^2}\, e^{-[(x_1 - s^{(1)}_{ijk})^2 + (x_2 - s^{(2)}_{ijk})^2]/2\sigma^2}$$ (4)

In terms of the distance $\|\mathbf{x} - \mathbf{s}_{ijk}\|$ between vectors

$$\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}, \qquad \mathbf{s}_{ijk} = \begin{bmatrix} s^{(1)}_{ijk} \\ s^{(2)}_{ijk} \end{bmatrix}$$ (5)

we can write

$$f_{X_1,X_2|H_{ijk}}(x_1, x_2) = \frac{1}{2\pi\sigma^2}\, e^{-\|\mathbf{x} - \mathbf{s}_{ijk}\|^2/2\sigma^2}$$ (6)

Since all eight symbols $\mathbf{s}_{000}$, . . .
, $\mathbf{s}_{111}$ are equally likely, the MAP and ML rules are

$$\mathbf{x} \in A_{ijk} \text{ if } f_{X_1,X_2|H_{ijk}}(x_1, x_2) \ge f_{X_1,X_2|H_{i'j'k'}}(x_1, x_2) \text{ for all other } H_{i'j'k'}.$$ (7)

This rule simplifies to

$$\mathbf{x} \in A_{ijk} \text{ if } \|\mathbf{x} - \mathbf{s}_{ijk}\| \le \|\mathbf{x} - \mathbf{s}_{i'j'k'}\| \text{ for all other } i'j'k'.$$ (8)

This means that $A_{ijk}$ is the set of all vectors $\mathbf{x}$ that are closer to $\mathbf{s}_{ijk}$ than any other signal. Graphically, to find the boundary between points closer to $\mathbf{s}_{ijk}$ than $\mathbf{s}_{i'j'k'}$, we draw the line segment connecting $\mathbf{s}_{ijk}$ and $\mathbf{s}_{i'j'k'}$. The boundary is then the perpendicular bisector. The resulting boundaries are shown in this figure:

[Figure: the (X_1, X_2) plane showing the eight signals s_000, ..., s_111 and their acceptance regions A_000, ..., A_111.]

Problem 8.3.3 Solution

In Problem 8.3.1, we found the MAP acceptance regions were

$$A_0 = \{x \,|\, x \le -a/2\}, \qquad A_1 = \{x \,|\, -a/2 < x \le a/2\}, \qquad A_2 = \{x \,|\, x > a/2\}$$ (1)

To calculate the probability of decoding error, we first calculate the conditional error probabilities

$$P[D_E|H_i] = P[X \notin A_i|H_i]$$ (2)

Given $H_i$, recall that $X = a(i - 1) + N$. This implies

$$P[X \notin A_0|H_0] = P[-a + N > -a/2] = P[N > a/2] = Q\left(\frac{a}{2\sigma_N}\right)$$ (3)

$$P[X \notin A_1|H_1] = P[N < -a/2] + P[N > a/2] = 2Q\left(\frac{a}{2\sigma_N}\right)$$ (4)

$$P[X \notin A_2|H_2] = P[a + N < a/2] = P[N < -a/2] = Q\left(\frac{a}{2\sigma_N}\right)$$ (5)

Since the three hypotheses $H_0$, $H_1$, and $H_2$ each have probability 1/3, the probability of error is

$$P[D_E] = \sum_{i=0}^{2} P[X \notin A_i|H_i]\,P[H_i] = \frac{4}{3}\,Q\left(\frac{a}{2\sigma_N}\right)$$ (6)

Problem 8.3.4 Solution

Let $H_i$ denote the hypothesis that symbol $a_i$ was transmitted. Since the four hypotheses are equally likely, the ML tests will maximize the probability of a correct decision. Given $H_i$, $N_1$ and $N_2$ are independent and thus $X_1$ and $X_2$ are independent.
This implies

$$f_{X_1,X_2|H_i}(x_1, x_2) = f_{X_1|H_i}(x_1)\, f_{X_2|H_i}(x_2)$$ (1)

$$= \frac{1}{2\pi\sigma^2}\, e^{-(x_1 - s_{i1})^2/2\sigma^2}\, e^{-(x_2 - s_{i2})^2/2\sigma^2}$$ (2)

$$= \frac{1}{2\pi\sigma^2}\, e^{-[(x_1 - s_{i1})^2 + (x_2 - s_{i2})^2]/2\sigma^2}$$ (3)

From Definition 8.2 the acceptance regions $A_i$ for the ML multiple hypothesis test must satisfy

$$(x_1, x_2) \in A_i \text{ if } f_{X_1,X_2|H_i}(x_1, x_2) \ge f_{X_1,X_2|H_j}(x_1, x_2) \text{ for all } j.$$ (4)

Equivalently, the ML acceptance regions are

$$(x_1, x_2) \in A_i \text{ if } (x_1 - s_{i1})^2 + (x_2 - s_{i2})^2 \le (x_1 - s_{j1})^2 + (x_2 - s_{j2})^2 \text{ for all } j$$ (5)

In terms of the vectors $\mathbf{x}$ and $\mathbf{s}_i$, the acceptance regions are defined by the rule

$$\mathbf{x} \in A_i \text{ if } \|\mathbf{x} - \mathbf{s}_i\|^2 \le \|\mathbf{x} - \mathbf{s}_j\|^2 \text{ for all } j$$ (6)

Just as in the case of QPSK, the acceptance region $A_i$ is the set of vectors $\mathbf{x}$ that are closest to $\mathbf{s}_i$.

Problem 8.3.5 Solution

From the signal constellation depicted in Problem 8.3.5, each signal $\mathbf{s}_{ij1}$ is below the x-axis while each signal $\mathbf{s}_{ij0}$ is above the x-axis. The event $B_3$ of an error in the third bit occurs if we transmit a signal $\mathbf{s}_{ij1}$ but the receiver output $\mathbf{x}$ is above the x-axis or if we transmit a signal $\mathbf{s}_{ij0}$ and the receiver output is below the x-axis. By symmetry, we need only consider the case when we transmit one of the four signals $\mathbf{s}_{ij1}$. In particular,

• Given $H_{011}$ or $H_{001}$, $X_2 = -1 + N_2$
• Given $H_{101}$ or $H_{111}$, $X_2 = -2 + N_2$

This implies

$$P[B_3|H_{011}] = P[B_3|H_{001}] = P[-1 + N_2 > 0] = Q(1/\sigma_N)$$ (1)

$$P[B_3|H_{101}] = P[B_3|H_{111}] = P[-2 + N_2 > 0] = Q(2/\sigma_N)$$ (2)

Assuming all four hypotheses are equally likely, the probability of an error decoding the third bit is

$$P[B_3] = \frac{P[B_3|H_{011}] + P[B_3|H_{001}] + P[B_3|H_{101}] + P[B_3|H_{111}]}{4}$$ (3)

$$= \frac{Q(1/\sigma_N) + Q(2/\sigma_N)}{2}$$ (4)

Problem 8.3.6 Solution

(a) Hypothesis $H_i$ is that $\mathbf{X} = \mathbf{s}_i + \mathbf{N}$, where $\mathbf{N}$ is a Gaussian random vector independent of which signal was transmitted. Thus, given $H_i$, $\mathbf{X}$ is a Gaussian ($\mathbf{s}_i$, $\sigma^2\mathbf{I}$) random vector.
Since $\mathbf{X}$ is two-dimensional,

$$f_{\mathbf{X}|H_i}(\mathbf{x}) = \frac{1}{2\pi\sigma^2}\, e^{-\frac{1}{2}(\mathbf{x} - \mathbf{s}_i)'(\sigma^2\mathbf{I})^{-1}(\mathbf{x} - \mathbf{s}_i)} = \frac{1}{2\pi\sigma^2}\, e^{-\frac{1}{2\sigma^2}\|\mathbf{x} - \mathbf{s}_i\|^2}.$$ (1)

Since the hypotheses $H_i$ are equally likely, the MAP and ML rules are the same and achieve the minimum probability of error. In this case, from the vector version of Theorem 8.8, the MAP rule is

$$\mathbf{x} \in A_m \text{ if } f_{\mathbf{X}|H_m}(\mathbf{x}) \ge f_{\mathbf{X}|H_j}(\mathbf{x}) \text{ for all } j.$$ (2)

[Figure: the (X_1, X_2) plane with signals s_0, s_1, s_2, ..., s_{M-2}, s_{M-1} on a circle and the corresponding acceptance regions A_0, A_1, A_2, ..., A_{M-2}, A_{M-1}.]

Using the conditional PDFs $f_{\mathbf{X}|H_i}(\mathbf{x})$, the MAP rule becomes

$$\mathbf{x} \in A_m \text{ if } \|\mathbf{x} - \mathbf{s}_m\|^2 \le \|\mathbf{x} - \mathbf{s}_j\|^2 \text{ for all } j.$$ (3)

In terms of geometry, the interpretation is that all vectors $\mathbf{x}$ closer to $\mathbf{s}_m$ than to any other signal $\mathbf{s}_j$ are assigned to $A_m$. In this problem, the signal constellation (i.e., the set of vectors $\mathbf{s}_i$) is the set of vectors on the circle of radius $\sqrt{E}$. The acceptance regions are the “pie slices” around each signal vector.

(b) Consider the following sketch to determine $d$.

[Figure: sketch in the (X_1, X_2) plane showing s_0 at distance E^{1/2} from the origin, the half-angle θ/2, and the distance d.]

Geometrically, the largest $d$ such that $\|\mathbf{x} - \mathbf{s}_i\| \le d$ defines the largest circle around $\mathbf{s}_i$ that can be inscribed into the pie slice $A_i$. By symmetry, this is the same for every $A_i$, hence we examine $A_0$. Each pie slice has angle $\theta = 2\pi/M$. Since the length of each signal vector is $\sqrt{E}$, the sketch shows that $\sin(\theta/2) = d/\sqrt{E}$. Thus $d = \sqrt{E}\sin(\pi/M)$.

(c) By symmetry, $P_{\text{ERR}}$ is the same as the conditional probability of error $1 - P[A_i|H_i]$, no matter which $\mathbf{s}_i$ is transmitted. Let $B$ denote a circle of radius $d$ at the origin and let $B_i$ denote the circle of radius $d$ around $\mathbf{s}_i$. Since $B_0 \subset A_0$,

$$P[A_0|H_0] = P[\mathbf{X} \in A_0|H_0] \ge P[\mathbf{X} \in B_0|H_0] = P[\mathbf{N} \in B].$$ (4)

Since the components of $\mathbf{N}$ are iid Gaussian (0, $\sigma^2$) random variables,

$$P[\mathbf{N} \in B] = \iint_B f_{N_1,N_2}(n_1, n_2)\,dn_1\,dn_2 = \frac{1}{2\pi\sigma^2}\iint_B e^{-(n_1^2 + n_2^2)/2\sigma^2}\,dn_1\,dn_2.$$
(5)

By changing to polar coordinates,

$$P[\mathbf{N} \in B] = \frac{1}{2\pi\sigma^2}\int_0^d \int_0^{2\pi} e^{-r^2/2\sigma^2}\, r\,d\theta\,dr$$ (6)

$$= \frac{1}{\sigma^2}\int_0^d r e^{-r^2/2\sigma^2}\,dr$$ (7)

$$= -e^{-r^2/2\sigma^2}\Big|_0^d = 1 - e^{-d^2/2\sigma^2} = 1 - e^{-E\sin^2(\pi/M)/2\sigma^2}$$ (8)

Thus

$$P_{\text{ERR}} = 1 - P[A_0|H_0] \le 1 - P[\mathbf{N} \in B] = e^{-E\sin^2(\pi/M)/2\sigma^2}.$$ (9)

Problem 8.3.7 Solution

(a) In Problem 8.3.4, we found that in terms of the vectors $\mathbf{x}$ and $\mathbf{s}_i$, the acceptance regions are defined by the rule

$$\mathbf{x} \in A_i \text{ if } \|\mathbf{x} - \mathbf{s}_i\|^2 \le \|\mathbf{x} - \mathbf{s}_j\|^2 \text{ for all } j.$$ (1)

Just as in the case of QPSK, the acceptance region $A_i$ is the set of vectors $\mathbf{x}$ that are closest to $\mathbf{s}_i$. Graphically, these regions are easily found from the sketch of the signal constellation:

[Figure: sketch of the signal constellation s_0, ..., s_15 in the (X_1, X_2) plane.]

(b) For hypothesis $H_1$, we see that the acceptance region is

$$A_1 = \{(X_1, X_2) \,|\, 0 < X_1 \le 2,\ 0 < X_2 \le 2\}$$ (2)

Given $H_1$, a correct decision is made if $(X_1, X_2) \in A_1$. Given $H_1$, $X_1 = 1 + N_1$ and $X_2 = 1 + N_2$. Thus,

$$P[C|H_1] = P[(X_1, X_2) \in A_1|H_1]$$ (3)

$$= P[0 < 1 + N_1 \le 2,\ 0 < 1 + N_2 \le 2]$$ (4)

$$= (P[-1 < N_1 \le 1])^2$$ (5)

$$= (\Phi(1/\sigma_N) - \Phi(-1/\sigma_N))^2$$ (6)

$$= (2\Phi(1/\sigma_N) - 1)^2$$ (7)

(c) Surrounding each signal $\mathbf{s}_i$ is an acceptance region $A_i$ that is no smaller than the acceptance region $A_1$. That is,

$$P[C|H_i] = P[(X_1, X_2) \in A_i|H_i]$$ (8)

$$\ge P[-1 < N_1 \le 1,\ -1 < N_2 \le 1]$$ (9)

$$= (P[-1 < N_1 \le 1])^2 = P[C|H_1].$$ (10)

This implies

$$P[C] = \sum_{i=0}^{15} P[C|H_i]\,P[H_i]$$ (11)

$$\ge \sum_{i=0}^{15} P[C|H_1]\,P[H_i] = P[C|H_1]\sum_{i=0}^{15} P[H_i] = P[C|H_1]$$ (12)

Problem 8.3.8 Solution

Let $p_i = P[H_i]$.
From Theorem 8.8, the MAP multiple hypothesis test is

$$(x_1, x_2) \in A_i \text{ if } p_i f_{X_1,X_2|H_i}(x_1, x_2) \ge p_j f_{X_1,X_2|H_j}(x_1, x_2) \text{ for all } j$$ (1)

From Example 8.13, the conditional PDF of $X_1, X_2$ given $H_i$ is

$$f_{X_1,X_2|H_i}(x_1, x_2) = \frac{1}{2\pi\sigma^2}\, e^{-[(x_1 - \sqrt{E}\cos\theta_i)^2 + (x_2 - \sqrt{E}\sin\theta_i)^2]/2\sigma^2}$$ (2)

Using this conditional joint PDF, the MAP rule becomes

• $(x_1, x_2) \in A_i$ if for all $j$,

$$-\frac{(x_1 - \sqrt{E}\cos\theta_i)^2 + (x_2 - \sqrt{E}\sin\theta_i)^2}{2\sigma^2} + \frac{(x_1 - \sqrt{E}\cos\theta_j)^2 + (x_2 - \sqrt{E}\sin\theta_j)^2}{2\sigma^2} \ge \ln\frac{p_j}{p_i}.$$ (3)

Expanding the squares and using the identity $\cos^2\theta + \sin^2\theta = 1$ yields the simplified rule

• $(x_1, x_2) \in A_i$ if for all $j$,

$$x_1[\cos\theta_i - \cos\theta_j] + x_2[\sin\theta_i - \sin\theta_j] \ge \frac{\sigma^2}{\sqrt{E}}\ln\frac{p_j}{p_i}$$ (4)

Note that the MAP rules define linear constraints in $x_1$ and $x_2$. Since $\theta_i = \pi/4 + i\pi/2$, we use the following table to enumerate the constraints:

$$\begin{array}{c|cc} & \cos\theta_i & \sin\theta_i \\ \hline i = 0 & 1/\sqrt{2} & 1/\sqrt{2} \\ i = 1 & -1/\sqrt{2} & 1/\sqrt{2} \\ i = 2 & -1/\sqrt{2} & -1/\sqrt{2} \\ i = 3 & 1/\sqrt{2} & -1/\sqrt{2} \end{array}$$ (5)

To be explicit, to determine whether $(x_1, x_2) \in A_i$, we need to check the MAP rule for each $j \ne i$. Thus, each $A_i$ is defined by three constraints.
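Using the above table, the acceptance regions are

• $(x_1, x_2) \in A_0$ if

$$x_1 \ge \frac{\sigma^2}{\sqrt{2E}} \ln\frac{p_1}{p_0}, \qquad x_2 \ge \frac{\sigma^2}{\sqrt{2E}} \ln\frac{p_3}{p_0}, \qquad x_1 + x_2 \ge \frac{\sigma^2}{\sqrt{2E}} \ln\frac{p_2}{p_0} \tag{6}$$

• $(x_1, x_2) \in A_1$ if

$$x_1 \le \frac{\sigma^2}{\sqrt{2E}} \ln\frac{p_1}{p_0}, \qquad x_2 \ge \frac{\sigma^2}{\sqrt{2E}} \ln\frac{p_2}{p_1}, \qquad -x_1 + x_2 \ge \frac{\sigma^2}{\sqrt{2E}} \ln\frac{p_3}{p_1} \tag{7}$$

• $(x_1, x_2) \in A_2$ if

$$x_1 \le \frac{\sigma^2}{\sqrt{2E}} \ln\frac{p_2}{p_3}, \qquad x_2 \le \frac{\sigma^2}{\sqrt{2E}} \ln\frac{p_2}{p_1}, \qquad x_1 + x_2 \le \frac{\sigma^2}{\sqrt{2E}} \ln\frac{p_2}{p_0} \tag{8}$$

• $(x_1, x_2) \in A_3$ if

$$x_1 \ge \frac{\sigma^2}{\sqrt{2E}} \ln\frac{p_2}{p_3}, \qquad x_2 \le \frac{\sigma^2}{\sqrt{2E}} \ln\frac{p_3}{p_0}, \qquad -x_1 + x_2 \le \frac{\sigma^2}{\sqrt{2E}} \ln\frac{p_3}{p_1} \tag{9}$$

Each constraint follows from rule (4) and the table of $\cos\theta_i$ and $\sin\theta_i$; when the bracketed coefficient in (4) is negative, dividing by it reverses the inequality, which is why $A_2$ and $A_3$ carry upper bounds on $x_1 + x_2$ and $-x_1 + x_2$. Using the parameters

$$\sigma = 0.8, \qquad E = 1, \qquad p_0 = 1/2, \qquad p_1 = p_2 = p_3 = 1/6, \tag{10}$$

the acceptance regions for the MAP rule are

$$A_0 = \{(x_1, x_2) \mid x_1 \ge -0.497,\ x_2 \ge -0.497,\ x_1 + x_2 \ge -0.497\} \tag{11}$$
$$A_1 = \{(x_1, x_2) \mid x_1 \le -0.497,\ x_2 \ge 0,\ -x_1 + x_2 \ge 0\} \tag{12}$$
$$A_2 = \{(x_1, x_2) \mid x_1 \le 0,\ x_2 \le 0,\ x_1 + x_2 \le -0.497\} \tag{13}$$
$$A_3 = \{(x_1, x_2) \mid x_1 \ge 0,\ x_2 \le -0.497,\ -x_1 + x_2 \le 0\} \tag{14}$$

Here is a sketch of these acceptance regions:

[Figure: acceptance regions $A_0, \ldots, A_3$ around the signals $\mathbf{s}_0, \ldots, \mathbf{s}_3$ in the $(X_1, X_2)$ plane.]

Note that the boundary between $A_1$ and $A_3$, the line $-x_1 + x_2 = 0$, plays no role because of the high value of $p_0$.

Problem 8.3.9 Solution

(a) First we note that

$$\mathbf{P}^{1/2}\mathbf{X} = \begin{bmatrix} \sqrt{p_1} & & \\ & \ddots & \\ & & \sqrt{p_k} \end{bmatrix} \begin{bmatrix} X_1 \\ \vdots \\ X_k \end{bmatrix} = \begin{bmatrix} \sqrt{p_1} X_1 \\ \vdots \\ \sqrt{p_k} X_k \end{bmatrix}. \tag{1}$$

Since each $\mathbf{S}_i$ is a column vector,

$$\mathbf{S}\mathbf{P}^{1/2}\mathbf{X} = \begin{bmatrix} \mathbf{S}_1 & \cdots & \mathbf{S}_k \end{bmatrix} \begin{bmatrix} \sqrt{p_1} X_1 \\ \vdots \\ \sqrt{p_k} X_k \end{bmatrix} = \sqrt{p_1} X_1 \mathbf{S}_1 + \cdots + \sqrt{p_k} X_k \mathbf{S}_k. \tag{2}$$

Thus $\mathbf{Y} = \mathbf{S}\mathbf{P}^{1/2}\mathbf{X} + \mathbf{N} = \sum_{i=1}^{k} \sqrt{p_i} X_i \mathbf{S}_i + \mathbf{N}$.

(b) Given the observation $\mathbf{Y} = \mathbf{y}$, a detector must decide which vector $\mathbf{X} = [X_1 \ \cdots \ X_k]'$ was (collectively) sent by the $k$ transmitters. A hypothesis $H_j$ must specify whether $X_i = 1$ or $X_i = -1$ for each $i$. That is, a hypothesis $H_j$ corresponds to a vector $\mathbf{x}_j \in B_k$ which has $\pm 1$ components. Since there are $2^k$ such vectors, there are $2^k$ hypotheses which we can enumerate as $H_1, \ldots, H_{2^k}$. Since each $X_i$ is independently and equally likely to be $\pm 1$, each hypothesis has probability $2^{-k}$. In this case, the MAP and ML rules are the same and achieve minimum probability of error.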
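The MAP/ML rule is

$$\mathbf{y} \in A_m \text{ if } f_{\mathbf{Y}|H_m}(\mathbf{y}) \ge f_{\mathbf{Y}|H_j}(\mathbf{y}) \text{ for all } j. \tag{3}$$

Under hypothesis $H_j$, $\mathbf{Y} = \mathbf{S}\mathbf{P}^{1/2}\mathbf{x}_j + \mathbf{N}$ is a Gaussian $(\mathbf{S}\mathbf{P}^{1/2}\mathbf{x}_j, \sigma^2\mathbf{I})$ random vector. The conditional PDF of $\mathbf{Y}$ is

$$f_{\mathbf{Y}|H_j}(\mathbf{y}) = \frac{1}{(2\pi\sigma^2)^{n/2}} e^{-\frac{1}{2}(\mathbf{y} - \mathbf{S}\mathbf{P}^{1/2}\mathbf{x}_j)'(\sigma^2\mathbf{I})^{-1}(\mathbf{y} - \mathbf{S}\mathbf{P}^{1/2}\mathbf{x}_j)} = \frac{1}{(2\pi\sigma^2)^{n/2}} e^{-\|\mathbf{y} - \mathbf{S}\mathbf{P}^{1/2}\mathbf{x}_j\|^2/2\sigma^2}. \tag{4}$$

The MAP rule is

$$\mathbf{y} \in A_m \text{ if } e^{-\|\mathbf{y} - \mathbf{S}\mathbf{P}^{1/2}\mathbf{x}_m\|^2/2\sigma^2} \ge e^{-\|\mathbf{y} - \mathbf{S}\mathbf{P}^{1/2}\mathbf{x}_j\|^2/2\sigma^2} \text{ for all } j, \tag{5}$$

or equivalently,

$$\mathbf{y} \in A_m \text{ if } \|\mathbf{y} - \mathbf{S}\mathbf{P}^{1/2}\mathbf{x}_m\| \le \|\mathbf{y} - \mathbf{S}\mathbf{P}^{1/2}\mathbf{x}_j\| \text{ for all } j. \tag{6}$$

That is, we choose the vector $\mathbf{x}^* = \mathbf{x}_m$ that minimizes the distance $\|\mathbf{y} - \mathbf{S}\mathbf{P}^{1/2}\mathbf{x}_j\|$ among all vectors $\mathbf{x}_j \in B_k$. Since this vector $\mathbf{x}^*$ is a function of the observation $\mathbf{y}$, this is described by the math notation

$$\mathbf{x}^*(\mathbf{y}) = \arg\min_{\mathbf{x} \in B_k} \|\mathbf{y} - \mathbf{S}\mathbf{P}^{1/2}\mathbf{x}\|, \tag{7}$$

where $\arg\min_{\mathbf{x}} g(\mathbf{x})$ returns the argument $\mathbf{x}$ that minimizes $g(\mathbf{x})$.

(c) To implement this detector, we must evaluate $\|\mathbf{y} - \mathbf{S}\mathbf{P}^{1/2}\mathbf{x}\|$ for each $\mathbf{x} \in B_k$. Since there are $2^k$ vectors in $B_k$, we have to evaluate $2^k$ hypotheses. Because the number of hypotheses grows exponentially with the number of users $k$, the maximum likelihood detector is considered to be computationally intractable for a large number of users $k$.

Problem 8.3.10 Solution

A short answer is that the decorrelator cannot be the same as the optimal maximum likelihood (ML) detector. If they were the same, that would mean we have reduced the $2^k$ comparisons of the optimal detector to a linear transformation followed by $k$ single bit comparisons. However, as this is not a satisfactory answer, we will build a simple example with $k = 2$ users and processing gain $n = 2$ to show the difference between the ML detector and the decorrelator. In particular, suppose user 1 transmits with code vector $\mathbf{S}_1 = [1 \ \ 0]'$ and user 2 transmits with code vector $\mathbf{S}_2 = [\cos\theta \ \ \sin\theta]'$. In addition, we assume that the users' powers are $p_1 = p_2 = 1$. In this case, $\mathbf{P} = \mathbf{I}$ and

$$\mathbf{S} = \begin{bmatrix} 1 & \cos\theta \\ 0 & \sin\theta \end{bmatrix}. \tag{1}$$

For the ML detector, there are four hypotheses corresponding to each possible transmitted bit of each user.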
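Using $H_i$ to denote the hypothesis that $\mathbf{X} = \mathbf{x}_i$, we have

$$\mathbf{X} = \mathbf{x}_1 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \qquad \mathbf{X} = \mathbf{x}_3 = \begin{bmatrix} -1 \\ 1 \end{bmatrix}, \tag{2}$$
$$\mathbf{X} = \mathbf{x}_2 = \begin{bmatrix} 1 \\ -1 \end{bmatrix}, \qquad \mathbf{X} = \mathbf{x}_4 = \begin{bmatrix} -1 \\ -1 \end{bmatrix}. \tag{3}$$

When $\mathbf{X} = \mathbf{x}_i$, $\mathbf{Y} = \mathbf{y}_i + \mathbf{N}$ where $\mathbf{y}_i = \mathbf{S}\mathbf{x}_i$. Thus under hypothesis $H_i$, $\mathbf{Y} = \mathbf{y}_i + \mathbf{N}$ is a Gaussian $(\mathbf{y}_i, \sigma^2\mathbf{I})$ random vector with PDF

$$f_{\mathbf{Y}|H_i}(\mathbf{y}) = \frac{1}{2\pi\sigma^2} e^{-(\mathbf{y} - \mathbf{y}_i)'(\sigma^2\mathbf{I})^{-1}(\mathbf{y} - \mathbf{y}_i)/2} = \frac{1}{2\pi\sigma^2} e^{-\|\mathbf{y} - \mathbf{y}_i\|^2/2\sigma^2}. \tag{4}$$

With the four hypotheses equally likely, the MAP and ML detectors are the same and minimize the probability of error. From Theorem 8.8, this decision rule is

$$\mathbf{y} \in A_m \text{ if } f_{\mathbf{Y}|H_m}(\mathbf{y}) \ge f_{\mathbf{Y}|H_j}(\mathbf{y}) \text{ for all } j. \tag{5}$$

This rule simplifies to

$$\mathbf{y} \in A_m \text{ if } \|\mathbf{y} - \mathbf{y}_m\| \le \|\mathbf{y} - \mathbf{y}_j\| \text{ for all } j. \tag{6}$$

It is useful to show these acceptance sets graphically. In this plot, the area around $\mathbf{y}_i$ is the acceptance set $A_i$ and the dashed lines are the boundaries between the acceptance sets.

[Figure: acceptance sets $A_1, \ldots, A_4$ around the points $\mathbf{y}_1, \ldots, \mathbf{y}_4$ in the $(Y_1, Y_2)$ plane, with the angle $\theta$ marked.]

$$\mathbf{y}_1 = \begin{bmatrix} 1 + \cos\theta \\ \sin\theta \end{bmatrix}, \qquad \mathbf{y}_3 = \begin{bmatrix} -1 + \cos\theta \\ \sin\theta \end{bmatrix}, \tag{7}$$
$$\mathbf{y}_2 = \begin{bmatrix} 1 - \cos\theta \\ -\sin\theta \end{bmatrix}, \qquad \mathbf{y}_4 = \begin{bmatrix} -1 - \cos\theta \\ -\sin\theta \end{bmatrix}. \tag{8}$$

The probability of a correct decision is

$$P[C] = \frac{1}{4} \sum_{i=1}^{4} \int_{A_i} f_{\mathbf{Y}|H_i}(\mathbf{y}) \, d\mathbf{y}. \tag{9}$$

Even though the components of $\mathbf{Y}$ are conditionally independent given $H_i$, the four integrals $\int_{A_i} f_{\mathbf{Y}|H_i}(\mathbf{y}) \, d\mathbf{y}$ cannot be represented in a simple form. Moreover, they cannot even be represented by the $\Phi(\cdot)$ function. Note that the probability of a correct decision is the probability that the bits $X_1$ and $X_2$ transmitted by both users are detected correctly.

The probability of a bit error is still somewhat more complex. For example, if $X_1 = 1$, then hypotheses $H_1$ and $H_2$ are equally likely, since $\mathbf{x}_1$ and $\mathbf{x}_2$ are the two vectors in (2) and (3) whose first component is 1. The detector guesses $\hat{X}_1 = 1$ if $\mathbf{Y} \in A_1 \cup A_2$. Given $X_1 = 1$, the conditional probability of a correct decision on this bit is

$$P[\hat{X}_1 = 1 | X_1 = 1] = \frac{1}{2} P[\mathbf{Y} \in A_1 \cup A_2 | H_1] + \frac{1}{2} P[\mathbf{Y} \in A_1 \cup A_2 | H_2] \tag{10}$$
$$= \frac{1}{2} \int_{A_1 \cup A_2} f_{\mathbf{Y}|H_1}(\mathbf{y}) \, d\mathbf{y} + \frac{1}{2} \int_{A_1 \cup A_2} f_{\mathbf{Y}|H_2}(\mathbf{y}) \, d\mathbf{y}. \tag{11}$$

By comparison, the decorrelator does something simpler.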
Since $\mathbf{S}$ is a square invertible matrix,

$$(\mathbf{S}'\mathbf{S})^{-1}\mathbf{S}' = \mathbf{S}^{-1}(\mathbf{S}')^{-1}\mathbf{S}' = \mathbf{S}^{-1} = \begin{bmatrix} 1 & -\dfrac{\cos\theta}{\sin\theta} \\ 0 & \dfrac{1}{\sin\theta} \end{bmatrix}. \tag{12}$$

We see that the components of $\tilde{\mathbf{Y}} = \mathbf{S}^{-1}\mathbf{Y}$ are

$$\tilde{Y}_1 = Y_1 - \frac{\cos\theta}{\sin\theta} Y_2, \qquad \tilde{Y}_2 = \frac{Y_2}{\sin\theta}. \tag{13}$$

Assuming (as in the earlier sketch) that $0 < \theta < \pi/2$, the decorrelator bit decisions are

$$\hat{X}_1 = \operatorname{sgn}(\tilde{Y}_1) = \operatorname{sgn}\left(Y_1 - \frac{\cos\theta}{\sin\theta} Y_2\right), \tag{14}$$
$$\hat{X}_2 = \operatorname{sgn}(\tilde{Y}_2) = \operatorname{sgn}\left(\frac{Y_2}{\sin\theta}\right) = \operatorname{sgn}(Y_2). \tag{15}$$

[Figure: decorrelator decision regions $A_1, \ldots, A_4$ around the points $\mathbf{y}_1, \ldots, \mathbf{y}_4$ in the $(Y_1, Y_2)$ plane, with the angle $\theta$ marked.]

Because we chose a coordinate system such that $\mathbf{S}_1$ lies along the x-axis, the effect of the decorrelator on the rule for bit $X_2$ is particularly easy to understand. For bit $X_2$, we just check whether the vector $\mathbf{Y}$ is in the upper half plane. Generally, the boundaries of the decorrelator decision regions are determined by straight lines, so the regions are easy to implement and the probability of error is easy to calculate. However, these regions are suboptimal in terms of probability of error.

Problem 8.4.1 Solution

Under hypothesis $H_i$, the conditional PMF of $X$ is

$$P_{X|H_i}(x) = \begin{cases} (1 - p_i) p_i^{x-1} / (1 - p_i^{20}) & x = 1, 2, \ldots, 20, \\ 0 & \text{otherwise,} \end{cases} \tag{1}$$

where $p_0 = 0.99$ and $p_1 = 0.9$. It follows that for $x_0 = 0, 1, \ldots, 19$,

$$P[X > x_0 | H_i] = \frac{1 - p_i}{1 - p_i^{20}} \sum_{x = x_0 + 1}^{20} p_i^{x-1} = \frac{1 - p_i}{1 - p_i^{20}} \left[ p_i^{x_0} + \cdots + p_i^{19} \right] \tag{2}$$
$$= \frac{p_i^{x_0}(1 - p_i)}{1 - p_i^{20}} \left[ 1 + p_i + \cdots + p_i^{19 - x_0} \right] \tag{3}$$
$$= \frac{p_i^{x_0}(1 - p_i^{20 - x_0})}{1 - p_i^{20}} = \frac{p_i^{x_0} - p_i^{20}}{1 - p_i^{20}}. \tag{4}$$

We note that the above formula is also correct for $x_0 = 20$. Using this formula, the false alarm and miss probabilities are

$$P_{\mathrm{FA}} = P[X > x_0 | H_0] = \frac{p_0^{x_0} - p_0^{20}}{1 - p_0^{20}}, \tag{5}$$
$$P_{\mathrm{MISS}} = 1 - P[X > x_0 | H_1] = \frac{1 - p_1^{x_0}}{1 - p_1^{20}}. \tag{6}$$

The Matlab program rocdisc(p0,p1) returns the false alarm and miss probabilities and also plots the ROC.
Here is the program and the output for rocdisc(0.9,0.99):

    function [PFA,PMISS]=rocdisc(p0,p1);
    x=0:20;
    PFA= (p0.^x-p0^(20))/(1-p0^(20));
    PMISS= (1.0-(p1.^x))/(1-p1^(20));
    plot(PFA,PMISS,'k.');
    xlabel('\itP_{\rm FA}');
    ylabel('\itP_{\rm MISS}');

[Figure: ROC showing $P_{\mathrm{MISS}}$ versus $P_{\mathrm{FA}}$.]

From the receiver operating curve, we learn that we have a fairly lousy sensor. No matter how we set the threshold $x_0$, either the false alarm probability or the miss probability (or both!) exceeds 0.5.

Problem 8.4.2 Solution

From Example 8.7, the probability of error is

$$P_{\mathrm{ERR}} = p\, Q\left( \frac{\sigma}{2v} \ln\frac{p}{1-p} + \frac{v}{\sigma} \right) + (1 - p)\, \Phi\left( \frac{\sigma}{2v} \ln\frac{p}{1-p} - \frac{v}{\sigma} \right). \tag{1}$$

It is straightforward to use Matlab to plot $P_{\mathrm{ERR}}$ as a function of $p$. The function bperr calculates $P_{\mathrm{ERR}}$ for a vector p and a scalar signal to noise ratio snr corresponding to $v/\sigma$. A second program bperrplot(snr) plots $P_{\mathrm{ERR}}$ as a function of $p$. Here are the programs:

    function perr=bperr(p,snr);
    %Problem 8.4.2 Solution
    r=log(p./(1-p))/(2*snr);
    perr=(p.*(qfunction(r+snr))) ...
        +((1-p).*phi(r-snr));

    function pe=bperrplot(snr);
    p=0.02:0.02:0.98;
    pe=bperr(p,snr);
    plot(p,pe);
    xlabel('\it p');
    ylabel('\it P_{ERR}');

Here are three outputs of bperrplot for the requested SNR values.

[Figure: three plots of $P_{\mathrm{ERR}}$ versus $p$, one per requested SNR value; all three panels are captioned bperrplot(0.1) in the source.]

In all three cases, we see that $P_{\mathrm{ERR}}$ is maximum at $p = 1/2$. When $p \ne 1/2$, the optimal (minimum probability of error) decision rule is able to exploit the one hypothesis having higher a priori probability than the other. This gives the wrong impression that one should consider building a communication system with $p \ne 1/2$. To see why this is wrong, consider the most extreme case, in which the error probability goes to zero as $p \to 0$ or $p \to 1$. In these extreme cases, no information is being communicated. When $p = 0$ or $p = 1$, the detector can simply guess the transmitted bit.
In fact, there is no need to transmit a bit; however, it becomes impossible to transmit any information.

Finally, we note that $v/\sigma$ is an SNR voltage ratio. For communication systems, it is common to measure SNR as a power ratio. In particular, $v/\sigma = 10$ corresponds to an SNR of $10 \log_{10}(v^2/\sigma^2) = 20$ dB.

Problem 8.4.3 Solution

With $v = 1.5$ and $d = 0.5$, it appeared in Example 8.14 that $T = 0.5$ was best among the values tested. However, it also seemed likely the error probability $P_e$ would decrease for larger values of $T$. To test this possibility we use sqdistor with 100,000 transmitted bits by trying the following:

    >> T=[0.4:0.1:1.0];Pe=sqdistor(1.5,0.5,100000,T);
    >> [Pmin,Imin]=min(Pe);T(Imin)
    ans =
       0.80000000000000

Thus among $\{0.4, 0.5, \ldots, 1.0\}$, it appears that $T = 0.8$ is best. Now we test values of $T$ in the neighborhood of 0.8:

    >> T=[0.70:0.02:0.9];Pe=sqdistor(1.5,0.5,100000,T);
    >> [Pmin,Imin]=min(Pe);T(Imin)
    ans =
       0.78000000000000

This suggests that $T = 0.78$ is best among these values. However, inspection of the vector Pe shows that all values are quite close. If we repeat this experiment a few times, we obtain:

    >> T=[0.70:0.02:0.9];Pe=sqdistor(1.5,0.5,100000,T);
    >> [Pmin,Imin]=min(Pe);T(Imin)
    ans =
       0.78000000000000
    >> T=[0.70:0.02:0.9];Pe=sqdistor(1.5,0.5,100000,T);
    >> [Pmin,Imin]=min(Pe);T(Imin)
    ans =
       0.80000000000000
    >> T=[0.70:0.02:0.9];Pe=sqdistor(1.5,0.5,100000,T);
    >> [Pmin,Imin]=min(Pe);T(Imin)
    ans =
       0.76000000000000
    >> T=[0.70:0.02:0.9];Pe=sqdistor(1.5,0.5,100000,T);
    >> [Pmin,Imin]=min(Pe);T(Imin)
    ans =
       0.78000000000000

This suggests that the best value of $T$ is in the neighborhood of 0.78. If someone were paying you to find the best $T$, you would probably want to do more testing. The only useful lesson here is that when you try to optimize parameters using simulation results, you should repeat your experiments to get a sense of the variance of your results.
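The lesson about repeating experiments can be illustrated with a short Python sketch. It does not reproduce sqdistor: the function toy_pe_curve is a hypothetical, nearly flat error curve that we made up for illustration, and the names estimate_error_rate, best_threshold and tally_winners are ours. The sketch estimates the error rate at each threshold from a finite number of simulated bits, records which threshold looks best, and repeats the whole experiment to see how much the apparent winner moves around.

```python
import random
from collections import Counter


def estimate_error_rate(true_pe, n_bits, rng):
    """Relative frequency of errors in n_bits independent trials."""
    errors = sum(1 for _ in range(n_bits) if rng.random() < true_pe)
    return errors / n_bits


def best_threshold(thresholds, pe_curve, n_bits, rng):
    """Threshold with the smallest *estimated* error rate in one experiment."""
    estimates = [estimate_error_rate(pe_curve(t), n_bits, rng)
                 for t in thresholds]
    return thresholds[estimates.index(min(estimates))]


def tally_winners(thresholds, pe_curve, n_bits, repeats, seed=0):
    """Repeat the experiment and count how often each threshold wins."""
    rng = random.Random(seed)
    return Counter(best_threshold(thresholds, pe_curve, n_bits, rng)
                   for _ in range(repeats))


def toy_pe_curve(t):
    # Hypothetical stand-in for the true error probability: nearly flat,
    # with its minimum at T = 0.78. This is NOT the sqdistor model.
    return 0.05 + 0.02 * (t - 0.78) ** 2


if __name__ == "__main__":
    grid = [round(0.70 + 0.02 * i, 2) for i in range(11)]
    winners = tally_winners(grid, toy_pe_curve, n_bits=20000, repeats=10)
    for threshold, wins in sorted(winners.items()):
        print(threshold, wins)
```

A tally spread over several neighboring thresholds is a sign that the differences between them are smaller than the simulation noise, which is the situation described above.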
Problem 8.4.4 Solution

Since the a priori probabilities $P[H_0]$ and $P[H_1]$ are unknown, we use a Neyman-Pearson formulation to find the ROC. For a threshold $\gamma$, the decision rule is

$$x \in A_0 \text{ if } \frac{f_{X|H_0}(x)}{f_{X|H_1}(x)} \ge \gamma; \qquad x \in A_1 \text{ otherwise.} \tag{1}$$

Using the given conditional PDFs, we obtain

$$x \in A_0 \text{ if } e^{-(8x - x^2)/16} \ge \gamma x / 4; \qquad x \in A_1 \text{ otherwise.} \tag{2}$$

Taking logarithms yields

$$x \in A_0 \text{ if } x^2 - 8x \ge 16 \ln(\gamma/4) + 16 \ln x; \qquad x \in A_1 \text{ otherwise.} \tag{3}$$

With some more rearranging,

$$x \in A_0 \text{ if } (x - 4)^2 \ge \underbrace{16 \ln(\gamma/4) + 16}_{\gamma_0} + 16 \ln x; \qquad x \in A_1 \text{ otherwise.} \tag{4}$$

When we plot the functions $f(x) = (x - 4)^2$ and $g(x) = \gamma_0 + 16 \ln x$, we see that there exist $x_1$ and $x_2$ such that $f(x_1) = g(x_1)$ and $f(x_2) = g(x_2)$. In terms of $x_1$ and $x_2$,

$$A_0 = [0, x_1] \cup [x_2, \infty), \qquad A_1 = (x_1, x_2). \tag{5}$$

Using a Taylor series expansion of $\ln x$ around $x = x_0 = 4$, we can show that

$$g(x) = \gamma_0 + 16 \ln x \le h(x) = \gamma_0 + 16(\ln 4 - 1) + 4x. \tag{6}$$

Since $h(x)$ is linear, we can use the quadratic formula to solve $f(x) = h(x)$, yielding a solution $\bar{x}_2 = 6 + \sqrt{4 + 16 \ln 4 + \gamma_0}$. One can show that $x_2 \le \bar{x}_2$. In the example shown here, corresponding to $\gamma = 1$, $x_1 = 1.95$, $x_2 = 9.5$ and $\bar{x}_2 = 6 + \sqrt{20} = 10.47$.

[Figure: plot of $f(x)$, $g(x)$ and $h(x)$ for $0 \le x \le 12$.]

Given $x_1$ and $x_2$, the false alarm and miss probabilities are

$$P_{\mathrm{FA}} = P[A_1 | H_0] = \int_{x_1}^{x_2} \frac{1}{2} e^{-x/2} \, dx = e^{-x_1/2} - e^{-x_2/2}, \tag{7}$$
$$P_{\mathrm{MISS}} = 1 - P[A_1 | H_1] = 1 - \int_{x_1}^{x_2} \frac{x}{8} e^{-x^2/16} \, dx = 1 - e^{-x_1^2/16} + e^{-x_2^2/16}. \tag{8}$$

To calculate the ROC, we need to find $x_1$ and $x_2$. Rather than find them exactly, we calculate $f(x)$ and $g(x)$ for discrete steps over the interval $[0, 1 + \bar{x}_2]$ and find the discrete values closest to $x_1$ and $x_2$. However, for these approximations to $x_1$ and $x_2$, we calculate the exact false alarm and miss probabilities. As a result, the optimal detector using the exact $x_1$ and $x_2$ cannot be worse than the ROC that we calculate.
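As a cross-check on the thresholds, here is a minimal Python sketch that locates $x_1$ and $x_2$ by bisection instead of a grid search and then evaluates equations (7) and (8). Bisection is our substitution, not the method of the solution. The function names are ours, and so is the choice to bracket the two roots around $x = 2 + 2\sqrt{3}$, the point where $f(x) - g(x)$ is smallest (set the derivative $2(x - 4) - 16/x$ to zero). The upper bracket is the bound $\bar{x}_2$ from the text.

```python
import math


def boundary_gap(x, gamma):
    """f(x) - g(x) with f(x) = (x-4)^2 and g(x) = gamma0 + 16 ln x."""
    gamma0 = 16.0 * math.log(gamma / 4.0) + 16.0
    return (x - 4.0) ** 2 - gamma0 - 16.0 * math.log(x)


def bisect(func, lo, hi, iterations=100):
    """Root of func in [lo, hi], assuming func changes sign on the interval."""
    lo_positive = func(lo) > 0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if (func(mid) > 0) == lo_positive:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def gas_thresholds(gamma):
    """Return (x1, x2), the two solutions of f(x) = g(x)."""
    gamma0 = 16.0 * math.log(gamma / 4.0) + 16.0
    x_min = 2.0 + 2.0 * math.sqrt(3.0)  # minimizer of f(x) - g(x)
    if boundary_gap(x_min, gamma) >= 0:
        raise ValueError("f(x) >= g(x) everywhere, so A1 is empty")
    x_bar2 = 6.0 + math.sqrt(4.0 + 16.0 * math.log(4.0) + gamma0)

    def gap(x):
        return boundary_gap(x, gamma)

    return bisect(gap, 1e-9, x_min), bisect(gap, x_min, x_bar2)


def gas_roc_point(gamma):
    """(P_FA, P_MISS) from equations (7) and (8)."""
    x1, x2 = gas_thresholds(gamma)
    pfa = math.exp(-x1 / 2.0) - math.exp(-x2 / 2.0)
    pmiss = 1.0 - math.exp(-x1 ** 2 / 16.0) + math.exp(-x2 ** 2 / 16.0)
    return pfa, pmiss


if __name__ == "__main__":
    print(gas_thresholds(1.0))
    print(gas_roc_point(1.0))
```

For $\gamma = 1$ the roots land close to the values read off the plot in the text. For sufficiently small $\gamma$ the curves $f$ and $g$ do not cross at all, which the sketch reports as an error rather than returning a pair of thresholds.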
In terms of Matlab, we divide the work between two functions: gasroc(n) generates the ROC by calling [x1,x2]=gasrange(gamma), which calculates $x_1$ and $x_2$ for a given value of $\gamma$.

    function [pfa,pmiss]=gasroc(n);
    a=(400)^(1/(n-1));
    k=1:n;
    g=0.05*(a.^(k-1));
    pfa=zeros(n,1);
    pmiss=zeros(n,1);
    for k=1:n,
        [x1,x2]=gasrange(g(k));
        pmiss(k)=1-(exp(-x1^2/16)...
            -exp(-x2^2/16));
        pfa(k)=exp(-x1/2)-exp(-x2/2);
    end
    plot(pfa,pmiss);
    ylabel('P_{\rm MISS}');
    xlabel('P_{\rm FA}');

    function [x1,x2]=gasrange(gamma);
    g=16+16*log(gamma/4);
    xmax=7+ ...
        sqrt(max(0,4+(16*log(4))+g));
    dx=xmax/500;
    x=dx:dx:4;
    y=(x-4).^2-g-16*log(x);
    [ym,i]=min(abs(y));
    x1=x(i);
    x=4:dx:xmax;
    y=(x-4).^2-g-16*log(x);
    [ym,i]=min(abs(y));
    x2=x(i);

The function gasroc(n) generates the ROC for n values of $\gamma$, ranging from 1/20 to 20 in multiplicative steps. Here is the resulting ROC:

[Figure: ROC showing $P_{\mathrm{MISS}}$ versus $P_{\mathrm{FA}}$.]

After all of this work, we see that the sensor is not particularly good in the sense that no matter how we choose the thresholds, we cannot reduce both the miss and false alarm probabilities under 30 percent.

Problem 8.4.5 Solution

[Figure: M-PSK signal constellation $\mathbf{s}_0, \mathbf{s}_1, \ldots, \mathbf{s}_{M-1}$ in the $(X_1, X_2)$ plane with acceptance regions $A_0, A_1, \ldots, A_{M-1}$.]

In the solution to Problem 8.3.6, we found the signal constellation and acceptance regions shown in the adjacent figure. We could solve this problem by a general simulation of an M-PSK system. This would include a random sequence of data symbols, mapping symbol $i$ to vector $\mathbf{s}_i$, and adding the noise vector $\mathbf{N}$ to produce the receiver output $\mathbf{X} = \mathbf{s}_i + \mathbf{N}$. However, we are only asked to find the probability of symbol error, not the probability that symbol $i$ is decoded as symbol $j$ at the receiver. Because of the symmetry of the signal constellation and the acceptance regions, the probability of symbol error is the same no matter what symbol is transmitted.
[Figure: "pie slice" of angle $\theta$ in the $(N_1, N_2)$ plane, with its apex at $(-E^{1/2}, 0)$, the half-angle $\theta/2$ marked, and the origin $(0,0)$ inside the slice.]

Thus it is simpler to assume that $\mathbf{s}_0$ is transmitted every time and check that the noise vector $\mathbf{N}$ is in the pie slice around $\mathbf{s}_0$. In fact, by translating $\mathbf{s}_0$ to the origin, we obtain the "pie slice" geometry shown in the figure. Because the lines marking the boundaries of the pie slice have slopes $\pm\tan\theta/2$, the pie slice region is given by the constraints

$$N_2 \le \tan(\theta/2) \left[ N_1 + \sqrt{E} \right], \qquad N_2 \ge -\tan(\theta/2) \left[ N_1 + \sqrt{E} \right]. \tag{1}$$

We can rearrange these inequalities to express them in vector form as

$$\begin{bmatrix} -\tan\theta/2 & 1 \\ -\tan\theta/2 & -1 \end{bmatrix} \begin{bmatrix} N_1 \\ N_2 \end{bmatrix} \le \begin{bmatrix} 1 \\ 1 \end{bmatrix} \sqrt{E} \tan\theta/2. \tag{2}$$

Finally, since each $N_i$ has variance $\sigma^2$, we define the Gaussian $(\mathbf{0}, \mathbf{I})$ random vector $\mathbf{Z} = \mathbf{N}/\sigma$ and write our constraints as

$$\begin{bmatrix} -\tan\theta/2 & 1 \\ -\tan\theta/2 & -1 \end{bmatrix} \begin{bmatrix} Z_1 \\ Z_2 \end{bmatrix} \le \begin{bmatrix} 1 \\ 1 \end{bmatrix} \sqrt{\gamma} \tan\theta/2, \tag{3}$$

where $\gamma = E/\sigma^2$ is the signal to noise ratio of the system.

The Matlab "simulation" simply generates many pairs $[Z_1 \ \ Z_2]'$ and checks what fraction meets these constraints. The function mpsksim(M,snr,n) simulates the M-PSK system with SNR snr for n bit transmissions. The script mpsktest graphs the symbol error probability for $M = 8, 16, 32$.

    function Pe=mpsksim(M,snr,n);
    %Problem 8.4.5 Solution:
    %Pe=mpsksim(M,snr,n)
    %n bit M-PSK simulation
    t=tan(pi/M);
    A =[-t 1; -t -1];
    Z=randn(2,n);
    PC=zeros(size(snr)); %one entry per SNR value
    for k=1:length(snr),
        B=(A*Z)<=t*sqrt(snr(k));
        PC(k)=sum(min(B))/n;
    end
    Pe=1-PC;

    %mpsktest.m;
    snr=10.^((0:30)/10);
    n=500000;
    Pe8=mpsksim(8,snr,n);
    Pe16=mpsksim(16,snr,n);
    Pe32=mpsksim(32,snr,n);
    loglog(snr,Pe8,snr,Pe16,snr,Pe32);
    legend('M=8','M=16','M=32',3);

In mpsksim, each column of the matrix Z corresponds to a pair of noise variables $[Z_1 \ \ Z_2]'$. The code B=(A*Z)<=t*sqrt(snr(k)) checks whether each pair of noise variables is in the pie slice region. That is, B(1,j) and B(2,j) indicate if the jth pair meets the first and second constraints. Since min(B) operates on each column of B, min(B) is a row vector indicating which pairs of noise variables passed the test.
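For readers without Matlab, the pie slice test of equation (3) can be sketched in plain Python. This is an illustrative restatement under our own naming (mpsk_symbol_error and the seed parameter are ours), not a port of mpsksim: it loops over noise pairs one at a time and handles a single SNR value per call.

```python
import math
import random


def mpsk_symbol_error(M, snr, n, seed=0):
    """Estimate the M-PSK symbol error probability from n noise pairs.

    snr is the power ratio gamma = E / sigma^2 from equation (3).
    """
    rng = random.Random(seed)
    t = math.tan(math.pi / M)      # tan(theta/2) with theta = 2*pi/M
    bound = t * math.sqrt(snr)     # right-hand side of equation (3)
    correct = 0
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        # Both rows of the matrix inequality must hold for a correct decision.
        if -t * z1 + z2 <= bound and -t * z1 - z2 <= bound:
            correct += 1
    return 1.0 - correct / n


if __name__ == "__main__":
    for snr_db in (0, 10, 20):
        snr = 10.0 ** (snr_db / 10.0)
        print(snr_db, mpsk_symbol_error(8, snr, 100000))
```

Because the same seed produces the same noise pairs, raising the SNR only enlarges the slice around them, so the estimate can never increase with SNR for a fixed seed.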
Here is the output of mpsktest:

[Figure: log-log plot of the symbol error probability versus SNR for M=8, M=16 and M=32.]

The curves for $M = 8$ and $M = 16$ end prematurely because for high SNR, the error rate is so low that no errors are generated in 500,000 symbols. In this case, the measured $P_e$ is zero and since $\log 0 = -\infty$, the loglog function simply ignores the zero values.

Problem 8.4.6 Solution

When the transmitted bit vector is $\mathbf{X} = \mathbf{x}$, the received signal vector $\mathbf{Y} = \mathbf{S}\mathbf{P}^{1/2}\mathbf{x} + \mathbf{N}$ is a Gaussian $(\mathbf{S}\mathbf{P}^{1/2}\mathbf{x}, \sigma^2\mathbf{I})$ random vector with conditional PDF

$$f_{\mathbf{Y}|\mathbf{X}}(\mathbf{y}|\mathbf{x}) = \frac{1}{(2\pi\sigma^2)^{n/2}} e^{-\|\mathbf{y} - \mathbf{S}\mathbf{P}^{1/2}\mathbf{x}\|^2/2\sigma^2}, \tag{1}$$

where $n$ is the length of $\mathbf{Y}$. The transmitted data vector $\mathbf{x}$ belongs to the set $B_k$ of all binary $\pm 1$ vectors of length $k$. In principle, we can enumerate the vectors in $B_k$ as $\mathbf{x}_0, \mathbf{x}_1, \ldots, \mathbf{x}_{2^k - 1}$. Moreover, each possible data vector $\mathbf{x}_m$ represents a hypothesis. Since there are $2^k$ possible data vectors, there are $2^k$ acceptance sets $A_m$. The set $A_m$ is the set of all vectors $\mathbf{y}$ such that the decision rule is to guess $\hat{\mathbf{X}} = \mathbf{x}_m$. Our normal procedure is to write a decision rule as "$\mathbf{y} \in A_m$ if ..."; however, this problem has so many hypotheses that it is more straightforward to refer to a hypothesis $\mathbf{X} = \mathbf{x}_m$ by the function $\hat{\mathbf{x}}(\mathbf{y})$ which returns the vector $\mathbf{x}_m$ when $\mathbf{y} \in A_m$. In short, $\hat{\mathbf{x}}(\mathbf{y})$ is our best guess as to which vector $\mathbf{x}$ was transmitted when $\mathbf{y}$ is received.

Because each hypothesis has a priori probability $2^{-k}$, the probability of error is minimized by the maximum likelihood (ML) rule

$$\hat{\mathbf{x}}(\mathbf{y}) = \arg\max_{\mathbf{x} \in B_k} f_{\mathbf{Y}|\mathbf{X}}(\mathbf{y}|\mathbf{x}). \tag{2}$$

Keep in mind that $\arg\max_{\mathbf{x}} g(\mathbf{x})$ returns the argument $\mathbf{x}$ that maximizes $g(\mathbf{x})$. In any case, the form of $f_{\mathbf{Y}|\mathbf{X}}(\mathbf{y}|\mathbf{x})$ implies that the ML rule should minimize the negative exponent of $f_{\mathbf{Y}|\mathbf{X}}(\mathbf{y}|\mathbf{x})$.
That is, the ML rule is

$$\hat{\mathbf{x}}(\mathbf{y}) = \arg\min_{\mathbf{x} \in B_k} \|\mathbf{y} - \mathbf{S}\mathbf{P}^{1/2}\mathbf{x}\| \tag{3}$$
$$= \arg\min_{\mathbf{x} \in B_k} (\mathbf{y} - \mathbf{S}\mathbf{P}^{1/2}\mathbf{x})'(\mathbf{y} - \mathbf{S}\mathbf{P}^{1/2}\mathbf{x}) \tag{4}$$
$$= \arg\min_{\mathbf{x} \in B_k} \mathbf{y}'\mathbf{y} - 2\mathbf{y}'\mathbf{S}\mathbf{P}^{1/2}\mathbf{x} + \mathbf{x}'\mathbf{P}^{1/2}\mathbf{S}'\mathbf{S}\mathbf{P}^{1/2}\mathbf{x}. \tag{5}$$

Since the term $\mathbf{y}'\mathbf{y}$ is the same for every $\mathbf{x}$, we can define the function

$$h(\mathbf{x}) = -2\mathbf{y}'\mathbf{S}\mathbf{P}^{1/2}\mathbf{x} + \mathbf{x}'\mathbf{P}^{1/2}\mathbf{S}'\mathbf{S}\mathbf{P}^{1/2}\mathbf{x}. \tag{6}$$

In this case, the ML rule can be expressed as $\hat{\mathbf{x}}(\mathbf{y}) = \arg\min_{\mathbf{x} \in B_k} h(\mathbf{x})$. We use Matlab to evaluate $h(\mathbf{x})$ for each $\mathbf{x} \in B_k$. Since for $k = 10$, $B_k$ has $2^{10} = 1024$ vectors, it is desirable to make the calculation as easy as possible. To this end, we define $\mathbf{w} = \mathbf{S}\mathbf{P}^{1/2}\mathbf{x}$ and we write, with some abuse of notation, $h(\cdot)$ as a function of $\mathbf{w}$:

$$h(\mathbf{w}) = -2\mathbf{y}'\mathbf{w} + \mathbf{w}'\mathbf{w}. \tag{7}$$

Still, given $\mathbf{y}$, we need to evaluate $h(\mathbf{w})$ for each vector $\mathbf{w}$. In Matlab, this will be convenient because we can form the matrices X and W with columns consisting of all possible vectors $\mathbf{x}$ and $\mathbf{w}$. In Matlab, it is easy to calculate $\mathbf{w}'\mathbf{w}$ by operating on the matrix W without looping through all columns $\mathbf{w}$.

In terms of Matlab, we start by defining X=allbinaryseqs(n) which returns an $n \times 2^n$ matrix X, corresponding to $\mathbf{X}$, such that the columns of X enumerate all possible binary $\pm 1$ sequences of length $n$. How allbinaryseqs works will be clear by generating the matrices A and P and reading the help for bitget.

    function X=allbinaryseqs(n)
    %See Problem 8.4.6
    %X: n by 2^n matrix of all
    %length n binary vectors
    %Thanks to Jasvinder Singh
    A=repmat([0:2^n-1],[n,1]);
    P=repmat([1:n]',[1,2^n]);
    X = bitget(A,P);
    X=(2*X)-1;

Next is a short program that generates k random signals, each of length n. Each random signal is just a binary $\pm 1$ sequence normalized to have length 1.

    function S=randomsignals(n,k);
    %S is an n by k matrix, columns are
    %random unit length signal vectors
    S=(rand(n,k)>0.5);
    S=((2*S)-1.0)/sqrt(n);

Next, for a set of signal vectors (spreading sequences in CDMA parlance) given by the $n \times k$ matrix S, err=cdmasim(S,P,m) simulates the transmission of a frame of m symbols through a k user CDMA system with additive Gaussian noise.
A "symbol" is just a vector $\mathbf{x}$ corresponding to the k transmitted bits of the k users.

In addition, the function Pe=rcdma(n,k,snr,s,m) runs cdmasim for the pairs of values of users k and SNR snr. Here is the pair of functions:

    function err=cdmasim(S,P,m);
    %err=cdmasim(S,P,m);
    %S= n x k matrix of signals
    %P= diag matrix of SNRs (power
    % normalized by noise variance)
    %See Problem 8.4.6
    k=size(S,2); %number of users
    n=size(S,1); %processing gain
    X=allbinaryseqs(k);%all data
    Phalf=sqrt(P);
    W=S*Phalf*X;
    WW=sum(W.*W);
    err=0;
    for j=1:m,
        s=duniformrv(1,2^k,1);
        y=S*Phalf*X(:,s)+randn(n,1);
        [hmin,imin]=min(-2*y'*W+WW);
        err=err+sum(X(:,s)~=X(:,imin));
    end

    function Pe=rcdma(n,k,snr,s,m);
    %Pe=rcdma(n,k,snr,s,m);
    %R-CDMA simulation:
    % proc gain=n, users=k
    % rand signal set/frame
    % s frames, m symbols/frame
    %See Problem 8.4.6 Solution
    [K,SNR]=ndgrid(k,snr);
    Pe=zeros(size(SNR));
    for j=1:prod(size(SNR)),
        p=SNR(j);k=K(j);
        e=0;
        for i=1:s,
            S=randomsignals(n,k);
            e=e+cdmasim(S,p*eye(k),m);
        end
        Pe(j)=e/(s*m*k);
        % disp([p k e Pe(j)]);
    end

In cdmasim, the kth diagonal element of P is the "power" $p_k$ of user k. Technically, we assume that the additive Gaussian noise variables have variance 1, and thus $p_k$ is actually the signal to noise ratio of user k. In addition, WW is a length $2^k$ row vector, with elements $\mathbf{w}'\mathbf{w}$ for each possible $\mathbf{w}$. For each of the m random data symbols, represented by $\mathbf{x}$ (or X(:,s) in Matlab), cdmasim calculates a received signal $\mathbf{y}$ (y in Matlab). Finally, hmin is the minimum $h(\mathbf{w})$ and imin is the index of the column of W that minimizes $h(\mathbf{w})$. Thus imin is also the index of the minimizing column of X. Finally, cdmasim compares $\hat{\mathbf{x}}(\mathbf{y})$ and the transmitted vector $\mathbf{x}$ bit by bit and counts the total number of bit errors.

The function rcdma repeats cdmasim for s frames, with a random signal set for each frame. Dividing the total number of bit errors over s frames by the total number of transmitted bits, we find the bit error rate $P_e$.
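The exhaustive search at the heart of the ML detector can also be sketched without matrix operations. The following Python fragment is an illustration under our own assumptions, not a port of cdmasim: the name ml_detect is ours, S is passed as a list of rows, and p is a list holding the diagonal of $\mathbf{P}$. It evaluates $h(\mathbf{w}) = -2\mathbf{y}'\mathbf{w} + \mathbf{w}'\mathbf{w}$ for every $\mathbf{x} \in B_k$ and keeps the minimizer.

```python
import itertools
import math


def ml_detect(y, S, p):
    """Return the +/-1 tuple x minimizing h(w) = -2 y'w + w'w, w = S P^(1/2) x.

    y: received vector (length n)
    S: n rows of k signal components each
    p: k user powers (diagonal of P)
    """
    n, k = len(S), len(S[0])
    best_x, best_h = None, math.inf
    # Enumerate all 2^k hypotheses; this is the exponential cost noted in
    # Problem 8.3.9(c).
    for x in itertools.product((-1, 1), repeat=k):
        w = [sum(S[r][i] * math.sqrt(p[i]) * x[i] for i in range(k))
             for r in range(n)]
        h = -2.0 * sum(yr * wr for yr, wr in zip(y, w)) \
            + sum(wr * wr for wr in w)
        if h < best_h:
            best_x, best_h = x, h
    return best_x


if __name__ == "__main__":
    # Two users with orthogonal unit-length signals and processing gain 4.
    S = [[0.5, 0.5], [0.5, -0.5], [0.5, 0.5], [0.5, -0.5]]
    p = [4.0, 1.0]
    sent = (1, -1)
    y = [sum(S[r][i] * math.sqrt(p[i]) * sent[i] for i in range(2))
         for r in range(4)]          # noise-free received vector
    print(ml_detect(y, S, p))
```

With a noise-free observation and linearly independent signal vectors, the distance is zero only at the transmitted vector, so the detector returns exactly what was sent.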
For an SNR of 4 dB and processing gain 16, the requested tests are generated with the commands

    >> n=16;
    >> k=[2 4 8 16];
    >> Pe=rcdma(n,k,snr,100,1000);
    >> Pe
    Pe =
        0.0252
        0.0272
        0.0385
        0.0788
    >>

To answer part (b), the code for the matched filter (MF) detector is much simpler because there is no need to test $2^k$ hypotheses for every transmitted symbol. Just as for the case of the ML detector, we define a function err=mfcdmasim(S,P,m) that simulates the MF detector for m symbols for a given set of signal vectors S. In mfcdmasim, there is no need for looping. The mth transmitted symbol is represented by the mth column of X and the corresponding received signal is given by the mth column of Y. The matched filter processing can be applied to all m columns at once. A second function Pe=mfrcdma(n,k,snr,s,m) cycles through all combinations of users k and SNR snr and calculates the bit error rate for each pair of values. Here are the functions:

    function err=mfcdmasim(S,P,m);
    %err=mfcdmasim(S,P,m);
    %S= n x k matrix of signals
    %P= diag matrix of SNRs
    % SNR=power/var(noise)
    %See Problem 8.4.6b
    k=size(S,2); %no. of users
    n=size(S,1); %proc. gain
    Phalf=sqrt(P);
    X=randombinaryseqs(k,m);
    Y=S*Phalf*X+randn(n,m);
    XR=sign(S'*Y);
    err=sum(sum(XR ~= X));

    function Pe=mfrcdma(n,k,snr,s,m);
    %Pe=mfrcdma(n,k,snr,s,m);
    %R-CDMA, MF detection
    % proc gain=n, users=k
    % rand signal set/frame
    % s frames, m symbols/frame
    %See Problem 8.4.6 Solution
    [K,SNR]=ndgrid(k,snr);
    Pe=zeros(size(SNR));
    for j=1:prod(size(SNR)),
        p=SNR(j);kt=K(j);
        e=0;
        for i=1:s,
            S=randomsignals(n,kt);
            e=e+mfcdmasim(S,p*eye(kt),m);
        end
        Pe(j)=e/(s*m*kt);
        disp([snr kt e]);
    end

Here is a run of mfrcdma.

    >> pemf=mfrcdma(16,k,4,1000,1000);
         4     2     73936
         4     4     264234
         4     8     908558
         4     16    2871356
    >> pemf'
    ans =
        0.0370 0.0661 0.1136 0.1795
    >>

The following plot compares the maximum likelihood (ML) and matched filter (MF) detectors.
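[Figure: bit error rate versus number of users (0 to 16) for the ML and MF detectors.]

As the ML detector offers the minimum probability of error, it should not be surprising that it has a lower bit error rate. Although the MF detector is worse, the reduction in detector complexity makes it attractive. In fact, in practical CDMA-based cellular phones, the processing gain ranges from roughly 128 to 512. In such cases, the complexity of the ML detector is prohibitive and thus only matched filter detectors are used.

Problem 8.4.7 Solution

For the CDMA system of Problem 8.3.10, the received signal resulting from the transmissions of $k$ users was given by

$$\mathbf{Y} = \mathbf{S}\mathbf{P}^{1/2}\mathbf{X} + \mathbf{N}, \tag{1}$$

where $\mathbf{S}$ is an $n \times k$ matrix with $i$th column $\mathbf{S}_i$, $\mathbf{P}^{1/2} = \operatorname{diag}[\sqrt{p_1}, \ldots, \sqrt{p_k}]$ is a $k \times k$ diagonal matrix of received powers, and $\mathbf{N}$ is a Gaussian $(\mathbf{0}, \sigma^2\mathbf{I})$ noise vector.

(a) When $\mathbf{S}$ has linearly independent columns, $\mathbf{S}'\mathbf{S}$ is invertible. In this case, the decorrelating detector applies a transformation to $\mathbf{Y}$ to generate

$$\tilde{\mathbf{Y}} = (\mathbf{S}'\mathbf{S})^{-1}\mathbf{S}'\mathbf{Y} = \mathbf{P}^{1/2}\mathbf{X} + \tilde{\mathbf{N}}, \tag{2}$$

where $\tilde{\mathbf{N}} = (\mathbf{S}'\mathbf{S})^{-1}\mathbf{S}'\mathbf{N}$ is still a Gaussian noise vector with expected value $E[\tilde{\mathbf{N}}] = \mathbf{0}$. Decorrelation separates the signals in that the $i$th component of $\tilde{\mathbf{Y}}$ is

$$\tilde{Y}_i = \sqrt{p_i} X_i + \tilde{N}_i. \tag{3}$$

This is the same as a single user-receiver output of the binary communication system of Example 8.6. The single-user decision rule $\hat{X}_i = \operatorname{sgn}(\tilde{Y}_i)$ for the transmitted bit $X_i$ has probability of error

$$P_{e,i} = P[\tilde{Y}_i > 0 | X_i = -1] = P[-\sqrt{p_i} + \tilde{N}_i > 0] = Q\left( \sqrt{\frac{p_i}{\operatorname{Var}[\tilde{N}_i]}} \right). \tag{4}$$

However, since $\tilde{\mathbf{N}} = \mathbf{A}\mathbf{N}$ where $\mathbf{A} = (\mathbf{S}'\mathbf{S})^{-1}\mathbf{S}'$, Theorem 5.16 tells us that $\tilde{\mathbf{N}}$ has covariance matrix $\mathbf{C}_{\tilde{\mathbf{N}}} = \mathbf{A}\mathbf{C}_{\mathbf{N}}\mathbf{A}'$. We note that the general property that $(\mathbf{B}^{-1})' = (\mathbf{B}')^{-1}$ implies that $\mathbf{A}' = \mathbf{S}((\mathbf{S}'\mathbf{S})')^{-1} = \mathbf{S}(\mathbf{S}'\mathbf{S})^{-1}$. These facts imply

$$\mathbf{C}_{\tilde{\mathbf{N}}} = (\mathbf{S}'\mathbf{S})^{-1}\mathbf{S}'(\sigma^2\mathbf{I})\mathbf{S}(\mathbf{S}'\mathbf{S})^{-1} = \sigma^2(\mathbf{S}'\mathbf{S})^{-1}. \tag{5}$$

Note that $\mathbf{S}'\mathbf{S}$ is called the correlation matrix since its $i,j$th entry $\mathbf{S}_i'\mathbf{S}_j$ is the correlation between the signal of user $i$ and that of user $j$.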
Thus $\operatorname{Var}[\tilde{N}_i] = \sigma^2 (\mathbf{S}'\mathbf{S})^{-1}_{ii}$ and the probability of bit error for user $i$ is

$$P_{e,i} = Q\left( \sqrt{\frac{p_i}{\operatorname{Var}[\tilde{N}_i]}} \right) = Q\left( \sqrt{\frac{p_i}{\sigma^2 (\mathbf{S}'\mathbf{S})^{-1}_{ii}}} \right). \tag{6}$$

To find the probability of error for a randomly chosen bit, we average over the bits of all users and find that

$$P_e = \frac{1}{k} \sum_{i=1}^{k} P_{e,i} = \frac{1}{k} \sum_{i=1}^{k} Q\left( \sqrt{\frac{p_i}{\sigma^2 (\mathbf{S}'\mathbf{S})^{-1}_{ii}}} \right). \tag{7}$$

(b) When $\mathbf{S}'\mathbf{S}$ is not invertible, the detector flips a coin to decide each bit. In this case, $P_{e,i} = 1/2$ and thus $P_e = 1/2$.

(c) When $\mathbf{S}$ is chosen randomly, we need to average over all possible matrices $\mathbf{S}$ to find the average probability of bit error. However, there are $2^{kn}$ possible matrices $\mathbf{S}$ and averaging over all of them is too much work. Instead, we randomly generate $m$ matrices $\mathbf{S}$ and estimate the average $P_e$ by averaging over these $m$ matrices.

A function berdecorr uses this method to evaluate the decorrelator BER. The code has a lot of lines because it evaluates the BER using m signal sets for each combination of users k and SNRs snr. However, because the program generates signal sets and calculates the BER associated with each, there is no need for the simulated transmission of bits. Thus the program runs quickly. Since there are only $2^n$ distinct columns for matrix $\mathbf{S}$, it is quite possible to generate signal sets that are not linearly independent. In this case, berdecorr assumes the "flip a coin" rule is used. Just to see whether this rule dominates the error probability, we also display counts of how often S is rank deficient.
Here is the (somewhat tedious) code:

    function Pe=berdecorr(n,k,snr,m);
    %Problem 8.4.7 Solution: R-CDMA with decorrelation
    %proc gain=n, users=k, average Pe for m signal sets
    count=zeros(1,length(k)); %counts rank<k signal sets
    Pe=zeros(length(k),length(snr));
    snr=snr(:)';
    for mm=1:m,
        for i=1:length(k),
            S=randomsignals(n,k(i));
            R=S'*S;
            if (rank(R)<k(i))
                count(i)=count(i)+1;
                Pe(i,:)=Pe(i,:)+0.5*ones(1,length(snr));
            else
                G=diag(inv(R));
                %sum over users (dimension 1), also when k(i)=1
                Pe(i,:)=Pe(i,:)+sum(qfunction(sqrt((1./G)*snr)),1)/k(i);
            end
        end
    end
    disp('Rank deficiency count:');disp(k);disp(count);
    Pe=Pe/m;

Running berdecorr with processing gains n = 16 and n = 32 yields the following output:

    >> k=[1 2 4 8 16 32];
    >> pe16=berdecorr(16,k,4,10000);
    Rank deficiency count:
         1     2     4     8    16    32
         0     2     2    12   454 10000
    >> pe16'
    ans =
        0.0228 0.0273 0.0383 0.0755 0.3515 0.5000
    >> pe32=berdecorr(32,k,4,10000);
    Rank deficiency count:
         1     2     4     8    16    32
         0     0     0     0     0     0
    >> pe32'
    ans =
        0.0228 0.0246 0.0290 0.0400 0.0771 0.3904
    >>

As you might expect, the BER increases as the number of users increases. This occurs because the decorrelator must suppress a large set of interferers. Also, in generating 10,000 signal matrices S for each value of k, we see that rank deficiency is fairly uncommon. However, it occasionally occurs for processing gain n = 16, even if k = 4 or k = 8. Finally, here is a plot of these same BER statistics for $n = 16$ and $k \in \{2, 4, 8, 16\}$. Just for comparison, on the same graph is the BER for the matched filter detector and the maximum likelihood detector found in Problem 8.4.6.

[Figure: $P_e$ versus number of users $k$ (0 to 16) for the ML, Decorr and MF detectors.]

We see from the graph that the decorrelator is better than the matched filter for a small number of users. However, when the number of users $k$ is large (relative to the processing gain $n$), the decorrelator suffers because it must suppress all interfering users. Finally, we note that these conclusions are specific to this scenario when all users have equal SNR.
When some users have very high SNR, the decorrelator is good for the low-SNR user because it zeros out the interference from the high-SNR user.

Problem 8.4.8 Solution

Each transmitted symbol $\mathbf{s}_{b_1 b_2 b_3}$ corresponds to the transmission of three bits given by the vector $\mathbf{b} = [b_1 \ \ b_2 \ \ b_3]'$. Note that $\mathbf{s}_{b_1 b_2 b_3}$ is a two dimensional vector with components $[s^{(1)}_{b_1 b_2 b_3} \ \ s^{(2)}_{b_1 b_2 b_3}]'$. The key to this problem is the mapping from bits to symbol components and then back to bits. From the signal constellation shown with Problem 8.3.2, we make the following observations:

• $\mathbf{s}_{b_1 b_2 b_3}$ is in the right half plane if $b_2 = 0$; otherwise it is in the left half plane.

• $\mathbf{s}_{b_1 b_2 b_3}$ is in the upper half plane if $b_3 = 0$; otherwise it is in the lower half plane.

• There is an inner ring and an outer ring of signals. $\mathbf{s}_{b_1 b_2 b_3}$ is in the inner ring if $b_1 = 0$; otherwise it is in the outer ring.

Given a bit vector $\mathbf{b}$, we use these facts by first using $b_2$ and $b_3$ to map $\mathbf{b} = [b_1 \ \ b_2 \ \ b_3]'$ to an inner ring signal vector

$$\mathbf{s} \in \left\{ \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \begin{bmatrix} -1 \\ 1 \end{bmatrix}, \begin{bmatrix} -1 \\ -1 \end{bmatrix}, \begin{bmatrix} 1 \\ -1 \end{bmatrix} \right\}. \tag{1}$$

In the next step we scale $\mathbf{s}$ by $(1 + b_1)$. If $b_1 = 1$, then $\mathbf{s}$ is stretched to the outer ring. Finally, we add a Gaussian noise vector $\mathbf{N}$ to generate the received signal $\mathbf{X} = \mathbf{s}_{b_1 b_2 b_3} + \mathbf{N}$.

[Figure: the eight signals $\mathbf{s}_{000}, \mathbf{s}_{100}, \mathbf{s}_{010}, \mathbf{s}_{110}, \mathbf{s}_{011}, \mathbf{s}_{111}, \mathbf{s}_{001}, \mathbf{s}_{101}$ and their acceptance sets $A_{000}, \ldots, A_{101}$ in the $(X_1, X_2)$ plane.]

In the solution to Problem 8.3.2, we found that the acceptance set for the hypothesis $H_{b_1 b_2 b_3}$ that $\mathbf{s}_{b_1 b_2 b_3}$ is transmitted is the set of signal space points closest to $\mathbf{s}_{b_1 b_2 b_3}$. Graphically, these acceptance sets are given in the adjacent figure. These acceptance sets correspond to an inverse mapping of the received signal vector $\mathbf{X}$ to a bit vector guess $\hat{\mathbf{b}} = [\hat{b}_1 \ \ \hat{b}_2 \ \ \hat{b}_3]'$ using the following rules:

• $\hat{b}_2 = 1$ if $X_1 < 0$; otherwise $\hat{b}_2 = 0$.

• $\hat{b}_3 = 1$ if $X_2 < 0$; otherwise $\hat{b}_3 = 0$.

• If $|X_1| + |X_2| > 3\sqrt{2}/2$, then $\hat{b}_1 = 1$; otherwise $\hat{b}_1 = 0$.
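The forward mapping from bits to signal components and the inverse rules above can be checked with a few lines of Python. This is a sketch with names of our own choosing (bits_to_signal and signal_to_bits); the ring threshold $3\sqrt{2}/2$ is copied from the last rule as stated, and no noise is added.

```python
import math


def bits_to_signal(b1, b2, b3):
    """Map (b1, b2, b3) to the signal components (s1, s2)."""
    scale = 1 + b1                  # 1 for the inner ring, 2 for the outer
    return (scale * (1 - 2 * b2),   # b2 = 0 -> right half plane
            scale * (1 - 2 * b3))   # b3 = 0 -> upper half plane


def signal_to_bits(x1, x2):
    """Apply the three decision rules to a received vector (x1, x2)."""
    b2 = 1 if x1 < 0 else 0
    b3 = 1 if x2 < 0 else 0
    b1 = 1 if abs(x1) + abs(x2) > 3 * math.sqrt(2) / 2 else 0
    return (b1, b2, b3)


if __name__ == "__main__":
    for b1 in (0, 1):
        for b2 in (0, 1):
            for b3 in (0, 1):
                s = bits_to_signal(b1, b2, b3)
                print((b1, b2, b3), s, signal_to_bits(*s))
```

In the absence of noise, every bit vector survives the round trip through both mappings.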
We implement these steps with the function [Pe,ber]=myqam(sigma,m) which simulates the transmission of m symbols for each value of the vector sigma. Each column of B corresponds to a bit vector b. Similarly, each column of S and X corresponds to a transmitted signal s and received signal X. We calculate both the symbol decision errors and the bit decision errors. Finally, a script myqamplot.m plots the symbol error rate Pe and bit error rate ber as a function of sigma. Here are the programs:

    function [Pe,ber]=myqam(sigma,m);
    Pe=zeros(size(sigma)); ber=Pe;
    B=reshape(bernoullirv(0.5,3*m),3,m);
    %S(1,:)=1-2*B(2,:);
    %S(2,:)=1-2*B(3,:);
    S=1-2*B([2; 3],:);
    S=([1;1]*(1+B(1,:))).*S;
    N=randn(2,m);
    for i=1:length(sigma),
      X=S+sigma(i)*N;
      BR=zeros(size(B));
      BR([2;3],:)=(X<0);
      BR(1,:)=sum(abs(X))>(3/sqrt(2));
      E=(BR~=B);
      Pe(i)=sum(max(E))/m;
      ber(i)=sum(sum(E))/(3*m);
    end

    %myqamplot.m
    sig=10.^(0.2*(-8:0));
    [Pe,ber]=myqam(sig,1e6);
    loglog(sig,Pe,'-d', ...
           sig,ber,'-s');
    legend('SER','BER',4);

Note that we generate the bits, the transmitted signals, and the normalized noise only once. However, for each value of sigma, we rescale the additive noise and recalculate the received signal and the receiver bit decisions. The output of myqamplot is shown in this figure:

[Figure: log-log plot of the symbol error rate (SER) and bit error rate (BER) versus sigma.]

Careful reading of the figure will show that the ratio of the symbol error rate to the bit error rate is always very close to 3. This occurs because in the acceptance set for $b_1b_2b_3$, the adjacent acceptance sets correspond to a one bit difference. Since the usual type of symbol error occurs when the vector X is in the adjacent set, a symbol error typically results in one bit being in error but two bits being received correctly. Thus the bit error rate is roughly one third the symbol error rate.

Problem 8.4.9 Solution

(a) For the M-PSK communication system with additive Gaussian noise, $A_j$ denotes the hypothesis that signal $s_j$ was transmitted.
The solution to Problem 8.3.6 derived the MAP decision rule

$$\mathbf{x} \in A_m \ \text{ if } \ \|\mathbf{x} - \mathbf{s}_m\|^2 \le \|\mathbf{x} - \mathbf{s}_j\|^2 \ \text{ for all } j.$$ (1)

[Figure: M-PSK constellation in the $(X_1, X_2)$ plane showing signals $s_0, s_1, s_2, \ldots, s_{M-2}, s_{M-1}$ and their acceptance regions $A_0, A_1, A_2, \ldots, A_{M-2}, A_{M-1}$.]

In terms of geometry, the interpretation is that all vectors $\mathbf{x}$ closer to $\mathbf{s}_m$ than to any other signal $\mathbf{s}_j$ are assigned to $A_m$. In this problem, the signal constellation (i.e., the set of vectors $\mathbf{s}_i$) is the set of vectors on the circle of radius $\sqrt{E}$. The acceptance regions are the “pie slices” around each signal vector. We observe that

$$\|\mathbf{x} - \mathbf{s}_j\|^2 = (\mathbf{x} - \mathbf{s}_j)'(\mathbf{x} - \mathbf{s}_j) = \mathbf{x}'\mathbf{x} - 2\mathbf{x}'\mathbf{s}_j + \mathbf{s}_j'\mathbf{s}_j.$$ (2)

Since all the signals are on the same circle, $\mathbf{s}_j'\mathbf{s}_j$ is the same for all j. Also, $\mathbf{x}'\mathbf{x}$ is the same for all j. Thus

$$\arg\min_j \|\mathbf{x} - \mathbf{s}_j\|^2 = \arg\min_j \left(-\mathbf{x}'\mathbf{s}_j\right) = \arg\max_j \mathbf{x}'\mathbf{s}_j.$$ (3)

Since $\mathbf{x}'\mathbf{s}_j = \|\mathbf{x}\|\,\|\mathbf{s}_j\| \cos\phi$, where $\phi$ is the angle between $\mathbf{x}$ and $\mathbf{s}_j$, maximizing $\mathbf{x}'\mathbf{s}_j$ is equivalent to minimizing the angle between $\mathbf{x}$ and $\mathbf{s}_j$.

(b) In Problem 8.4.5, we estimated the probability of symbol error without building a complete simulation of the M-PSK system. In this problem, we need to build a more complete simulation to determine the probabilities $P_{ij}$. By symmetry, it is sufficient to transmit $s_0$ repeatedly and count how often the receiver guesses $s_j$. This is done by the function p=mpskerr(M,snr,n).

    function p=mpskerr(M,snr,n);
    %Problem 8.4.9 Solution:
    %p=mpskerr(M,snr,n)
    %n symbol M-PSK simulation
    t=(2*pi/M)*(0:(M-1));
    S=sqrt(snr)*[cos(t);sin(t)];
    X=repmat(S(:,1),1,n)+randn(2,n);
    [y,e]=max(S'*X);
    p=countequal(e-1,(0:(M-1)))'/n;

Note that column i of S is the signal $s_{i-1}$. The kth column of X corresponds to $\mathbf{X}_k = \mathbf{s}_0 + \mathbf{N}_k$, the received signal for the kth transmission. Thus y(k) corresponds to $\max_j \mathbf{X}_k'\mathbf{s}_j$ and e(k) reports the receiver decision for the kth transmission. The vector p calculates the relative frequency of each receiver decision. The next step is to translate the vector $[P_{00}\ P_{01}\ \cdots\ P_{0,M-1}]'$ (corresponding to p in Matlab) into an entire matrix P with elements $P_{ij}$.
The symmetry of the phase rotation dictates that each row of P should be a one element cyclic rotation of the previous row. Moreover, by symmetry we observe that $P_{01} = P_{0,M-1}$, $P_{02} = P_{0,M-2}$ and so on. However, because p is derived from a simulation experiment, it will exhibit this symmetry only approximately.

    function P=mpskmatrix(p);
    M=length(p);
    r=[0.5 zeros(1,M-2)];
    A=toeplitz(r)+...
      hankel(fliplr(r));
    A=[zeros(1,M-1);A];
    A=[[1; zeros(M-1,1)] A];
    P=toeplitz(A*(p(:)));

Our ad hoc (and largely unjustified) solution is to take the average of estimates of probabilities that symmetry says should be identical. (Why this might be a good thing to do would make an interesting exam problem.) In mpskmatrix(p), the matrix A implements the averaging. The code will become clear by examining the matrices A and the output P.

(c) The next step is to determine the effect of the mapping of bits to transmission vectors $s_j$. We define the matrix D with i,jth element $d_{ij}$ that indicates the number of bit positions in which the bit string assigned to $s_i$ differs from the bit string assigned to $s_j$. In this case, the integers provide a compact representation of this mapping. For example the binary mapping is

    s_0  s_1  s_2  s_3  s_4  s_5  s_6  s_7
    000  001  010  011  100  101  110  111
     0    1    2    3    4    5    6    7

The Gray mapping is

    s_0  s_1  s_2  s_3  s_4  s_5  s_6  s_7
    000  001  011  010  110  111  101  100
     0    1    3    2    6    7    5    4

Thus the binary mapping can be represented by a vector $\mathbf{c}_1 = [0\ 1\ \cdots\ 7]'$ while the Gray mapping is described by $\mathbf{c}_2 = [0\ 1\ 3\ 2\ 6\ 7\ 5\ 4]'$.

    function D=mpskdist(c);
    L=length(c);m=log2(L);
    [C1,C2]=ndgrid(c,c);
    B1=dec2bin(C1,m);
    B2=dec2bin(C2,m);
    D=reshape(sum((B1~=B2),2),L,L);

The function D=mpskdist(c) translates the mapping vector c into the matrix D with entries $d_{ij}$. The method is to generate grids C1 and C2 for the pairs of integers, convert each integer into a length $\log_2 M$ binary string, and then to count the number of bit positions in which each pair differs.
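As a cross-check on the distance matrix, here is a minimal Python sketch of the same idea. It is not the manual's code: the function name `hamming_matrix` is hypothetical, and instead of building binary strings it counts the 1 bits in the XOR of each pair of integer labels, which gives the same number of differing bit positions.

```python
def hamming_matrix(c):
    """D[i][j] = number of bit positions in which labels c[i] and c[j] differ."""
    return [[bin(ci ^ cj).count("1") for cj in c] for ci in c]

binary = list(range(8))              # labels 0, 1, ..., 7
gray = [0, 1, 3, 2, 6, 7, 5, 4]      # Gray-coded labels

D_bin = hamming_matrix(binary)
D_gray = hamming_matrix(gray)

# First row: distance from the label of s_0 to the label of every s_j.
print(D_bin[0])
print(D_gray[0])
```

In the Gray mapping, the entries next to the diagonal (and the two corner entries linking $s_7$ and $s_0$) all equal 1, since neighboring signals differ in exactly one bit.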
Given matrices P and D, the rest is easy. We treat the number of bit errors as a finite random variable that takes on value $d_{ij}$ with probability $P_{ij}$. The expected value of this finite random variable is the expected number of bit errors. Note that the BER is a “rate” in that

$$\text{BER} = \frac{1}{M} \sum_i \sum_j P_{ij}\, d_{ij}$$ (4)

is the expected number of bit errors per transmitted symbol.

    function Pb=mpskmap(c,snr,n);
    M=length(c);
    D=mpskdist(c);
    Pb=zeros(size(snr));
    for i=1:length(snr),
      p=mpskerr(M,snr(i),n);
      P=mpskmatrix(p);
      Pb(i)=finiteexp(D,P)/M;
    end

Given the integer mapping vector c, we estimate the BER of a mapping using just one more function Pb=mpskmap(c,snr,n). First we calculate the matrix D with elements $d_{ij}$. Next, for each value of the vector snr, we use n transmissions to estimate the probabilities $P_{ij}$. Last, we calculate the expected number of bit errors per transmission.

(d) We evaluate the binary mapping with the following commands:

    >> c1=0:7;
    >> snr=[4 8 16 32 64];
    >> Pb=mpskmap(c1,snr,1000000);
    >> Pb
    Pb =
        0.7640  0.4878  0.2198  0.0529  0.0038

(e) Here is the performance of the Gray mapping:

    >> c2=[0 1 3 2 6 7 5 4];
    >> snr=[4 8 16 32 64];
    >> Pg=mpskmap(c2,snr,1000000);
    >> Pg
    Pg =
        0.4943  0.2855  0.1262  0.0306  0.0023

Experimentally, we observe that the BER of the binary mapping is higher than the BER of the Gray mapping by a factor in the neighborhood of 1.5 to 1.7. In fact, this approximate ratio can be derived by a quick and dirty analysis. For high SNR, suppose that $s_i$ is decoded as $s_{i+1}$ or $s_{i-1}$ with probability $q = P_{i,i+1} = P_{i,i-1}$ and all other types of errors are negligible. In this case, the BER formula based on this approximation corresponds to summing the matrix D for the first off-diagonals and the corner elements.
Here are the calculations:

    >> D=mpskdist(c1);
    >> sum(diag(D,1))+sum(diag(D,-1))+D(1,8)+D(8,1)
    ans =
        28
    >> DG=mpskdist(c2);
    >> sum(diag(DG,1))+sum(diag(DG,-1))+DG(1,8)+DG(8,1)
    ans =
        16

Thus in high SNR, we would expect

$$\text{BER(binary)} \approx 28q/M, \qquad \text{BER(Gray)} \approx 16q/M.$$ (5)

The ratio of BERs is 28/16 = 1.75. Experimentally, we found at high SNR that the ratio of BERs was 0.0038/0.0023 = 1.65, which seems to be in the right ballpark.

Problem 8.4.10 Solution

As this problem is a continuation of Problem 8.4.9, this solution is also a continuation. In this problem, we want to determine the error probability for each bit k in a mapping of bits to the M-PSK signal constellation. The bit error rate associated with bit k is

$$\text{BER}(k) = \frac{1}{M} \sum_i \sum_j P_{ij}\, d_{ij}(k)$$ (1)

where $d_{ij}(k)$ indicates whether the bit strings mapped to $s_i$ and $s_j$ differ in bit position k. As in Problem 8.4.9, we describe the mapping by the vector of integers c. For example the binary mapping is

    s_0  s_1  s_2  s_3  s_4  s_5  s_6  s_7
    000  001  010  011  100  101  110  111
     0    1    2    3    4    5    6    7

The Gray mapping is

    s_0  s_1  s_2  s_3  s_4  s_5  s_6  s_7
    000  001  011  010  110  111  101  100
     0    1    3    2    6    7    5    4

Thus the binary mapping can be represented by a vector $\mathbf{c}_1 = [0\ 1\ \cdots\ 7]'$ while the Gray mapping is described by $\mathbf{c}_2 = [0\ 1\ 3\ 2\ 6\ 7\ 5\ 4]'$.

    function D=mpskdbit(c,k);
    %See Problem 8.4.10: For mapping
    %c, calculate BER of bit k
    L=length(c);m=log2(L);
    [C1,C2]=ndgrid(c,c);
    B1=bitget(C1,k);
    B2=bitget(C2,k);
    D=(B1~=B2);

The function D=mpskdbit(c,k) translates the mapping vector c into the matrix D with entries $d_{ij}$ that indicate whether bit k is in error when transmitted symbol $s_i$ is decoded by the receiver as $s_j$. The method is to generate grids C1 and C2 for the pairs of integers, identify bit k in each integer, and then check if the integers differ in bit k. Thus, there is a matrix D associated with each bit position and we calculate the expected number of bit errors associated with each bit position.
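The per-bit difference matrix can be sketched in Python as follows. This is an illustrative sketch with hypothetical names (`bit_difference_matrix`, `neighbor_flips`), assuming the same convention as Matlab's bitget, in which k = 1 refers to the least significant bit. The helper `neighbor_flips` is an addition for illustration only: it counts, going once around the circle, how many adjacent symbol pairs differ in bit k.

```python
def bit_difference_matrix(c, k):
    """D[i][j] = 1 if labels c[i] and c[j] differ in bit k (k = 1 is the LSB)."""
    return [[((ci ^ cj) >> (k - 1)) & 1 for cj in c] for ci in c]

def neighbor_flips(c, k):
    """Count cyclically adjacent symbol pairs whose labels differ in bit k."""
    D = bit_difference_matrix(c, k)
    M = len(c)
    return sum(D[i][(i + 1) % M] for i in range(M))

binary = list(range(8))
gray = [0, 1, 3, 2, 6, 7, 5, 4]

flips_binary = [neighbor_flips(binary, k) for k in (1, 2, 3)]
flips_gray = [neighbor_flips(gray, k) for k in (1, 2, 3)]
print(flips_binary, flips_gray)
```

Because errors to an adjacent symbol dominate at moderate to high SNR, these neighbor counts give a rough indication of which bit positions are most exposed to errors under each mapping.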
For each bit, the rest of the solution is the same as in Problem 8.4.9. We use the commands p=mpskerr(M,snr,n) and P=mpskmatrix(p) to calculate the matrix P which holds an estimate of each probability $P_{ij}$. Finally, using matrices P and D, we treat the number of bit k errors as a finite random variable that takes on value $d_{ij}(k)$ with probability $P_{ij}$. The expected value of this finite random variable is the expected number of bit errors.

    function Pb=mpskbitmap(c,snr,n);
    %Problem 8.4.10: Calculate prob. of
    %bit error for each bit position for
    %an MPSK bit to symbol mapping c
    M=length(c);m=log2(M);
    p=mpskerr(M,snr,n);
    P=mpskmatrix(p);
    Pb=zeros(1,m);
    for k=1:m,
      D=mpskdbit(c,k);
      Pb(k)=finiteexp(D,P)/M;
    end

Given the integer mapping vector c, we estimate the BER of each bit of a mapping using just one more function Pb=mpskbitmap(c,snr,n). For each bit position k, we calculate the matrix D with elements $d_{ij}(k)$. For a given value of snr, we use n transmissions to estimate the probabilities $P_{ij}$. Last, we calculate the expected number of bit k errors per transmission.

For an SNR of 10 dB, we evaluate the two mappings with the following commands:

    >> c1=0:7;
    >> mpskbitmap(c1,10,100000)
    ans =
        0.2247  0.1149  0.0577
    >> c2=[0 1 3 2 6 7 5 4];
    >> mpskbitmap(c2,10,100000)
    ans =
        0.1140  0.0572  0.0572

We see that in the binary mapping, the 0.22 error rate of bit 1 is roughly double that of bit 2, which is roughly double that of bit 3. For the Gray mapping, the error rate of bit 1 is cut in half relative to the binary mapping. However, the bit error rates at each position are still not identical since the error rate of bit 1 is still double that for bit 2 or bit 3. One might surmise that careful study of the matrix D might lead one to prove for the Gray map that the error rate for bit 1 is exactly double that for bits 2 and 3 . . . but that would be some other homework problem.

Problem Solutions – Chapter 9

Problem 9.1.1 Solution

First we note that the event $T > t_0$ has probability

$$P[T > t_0] = \int_{t_0}^{\infty} \lambda e^{-\lambda t}\, dt = e^{-\lambda t_0}.$$
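(1)

Given $T > t_0$, the conditional PDF of T is

$$f_{T|T>t_0}(t) = \begin{cases} \dfrac{f_T(t)}{P[T>t_0]} & t > t_0, \\ 0 & \text{otherwise,} \end{cases} = \begin{cases} \lambda e^{-\lambda(t-t_0)} & t > t_0, \\ 0 & \text{otherwise.} \end{cases}$$ (2)

Given $T > t_0$, the minimum mean square error estimate of T is

$$\hat{T} = E[T|T > t_0] = \int_{-\infty}^{\infty} t f_{T|T>t_0}(t)\, dt = \int_{t_0}^{\infty} \lambda t e^{-\lambda(t-t_0)}\, dt.$$ (3)

With the substitution $t' = t - t_0$, we obtain

$$\hat{T} = \int_0^{\infty} \lambda(t_0 + t') e^{-\lambda t'}\, dt'$$ (4)
$$= t_0 \underbrace{\int_0^{\infty} \lambda e^{-\lambda t'}\, dt'}_{1} + \underbrace{\int_0^{\infty} t' \lambda e^{-\lambda t'}\, dt'}_{E[T]} = t_0 + E[T].$$ (5)

Problem 9.1.2 Solution

(a) For $0 \le x \le 1$,

$$f_X(x) = \int_{-\infty}^{\infty} f_{X,Y}(x, y)\, dy = \int_x^1 6(y - x)\, dy$$ (1)
$$= \left. \left(3y^2 - 6xy\right) \right|_{y=x}^{y=1}$$ (2)
$$= 3(1 - 2x + x^2) = 3(1 - x)^2.$$ (3)

The complete expression for the marginal PDF of X is

$$f_X(x) = \begin{cases} 3(1 - x)^2 & 0 \le x \le 1, \\ 0 & \text{otherwise.} \end{cases}$$ (4)

(b) The blind estimate of X is

$$\hat{x}_B = E[X] = \int_0^1 3x(1 - x)^2\, dx = \left. \left( \frac{3}{2}x^2 - 2x^3 + \frac{3}{4}x^4 \right) \right|_0^1 = \frac{1}{4}.$$ (5)

(c) First we calculate

$$P[X < 0.5] = \int_0^{0.5} f_X(x)\, dx = \int_0^{0.5} 3(1 - x)^2\, dx = \left. -(1 - x)^3 \right|_0^{0.5} = \frac{7}{8}.$$ (6)

The conditional PDF of X given $X < 0.5$ is

$$f_{X|X<0.5}(x) = \begin{cases} \dfrac{f_X(x)}{P[X<0.5]} & 0 \le x < 0.5, \\ 0 & \text{otherwise,} \end{cases} = \begin{cases} \dfrac{24}{7}(1 - x)^2 & 0 \le x < 0.5, \\ 0 & \text{otherwise.} \end{cases}$$ (7)

The minimum mean square estimate of X given $X < 0.5$ is

$$E[X|X < 0.5] = \int_{-\infty}^{\infty} x f_{X|X<0.5}(x)\, dx = \int_0^{0.5} \frac{24x}{7}(1 - x)^2\, dx$$ (8)
$$= \left. \left( \frac{12x^2}{7} - \frac{16x^3}{7} + \frac{6x^4}{7} \right) \right|_0^{0.5} = \frac{11}{56}.$$ (9)

(d) For $y < 0$ or $y > 1$, $f_Y(y) = 0$. For $0 \le y \le 1$,

$$f_Y(y) = \int_0^y 6(y - x)\, dx = \left. \left(6xy - 3x^2\right) \right|_0^y = 3y^2.$$ (10)

The complete expression for the marginal PDF of Y is

$$f_Y(y) = \begin{cases} 3y^2 & 0 \le y \le 1, \\ 0 & \text{otherwise.} \end{cases}$$ (11)

(e) The blind estimate of Y is

$$\hat{y}_B = E[Y] = \int_{-\infty}^{\infty} y f_Y(y)\, dy = \int_0^1 3y^3\, dy = \frac{3}{4}.$$ (12)

(f) First we calculate

$$P[Y > 0.5] = \int_{0.5}^{\infty} f_Y(y)\, dy = \int_{0.5}^1 3y^2\, dy = \frac{7}{8}.$$ (13)

The conditional PDF of Y given $Y > 0.5$ is

$$f_{Y|Y>0.5}(y) = \begin{cases} \dfrac{f_Y(y)}{P[Y>0.5]} & y > 0.5, \\ 0 & \text{otherwise,} \end{cases} = \begin{cases} 24y^2/7 & 0.5 < y \le 1, \\ 0 & \text{otherwise.} \end{cases}$$ (14)

The minimum mean square estimate of Y given $Y > 0.5$ is

$$E[Y|Y > 0.5] = \int_{-\infty}^{\infty} y f_{Y|Y>0.5}(y)\, dy = \int_{0.5}^1 \frac{24y^3}{7}\, dy = \left. \frac{6y^4}{7} \right|_{0.5}^1 = \frac{45}{56}.$$ (15)

Problem 9.1.3 Solution

(a) For $0 \le x \le 1$,

$$f_X(x) = \int_{-\infty}^{\infty} f_{X,Y}(x, y)\, dy = \int_x^1 2\, dy = 2(1 - x).$$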
(1)

The complete expression of the PDF of X is

$$f_X(x) = \begin{cases} 2(1 - x) & 0 \le x \le 1, \\ 0 & \text{otherwise.} \end{cases}$$ (2)

(b) The blind estimate of X is

$$\hat{X}_B = E[X] = \int_0^1 2x(1 - x)\, dx = \left. \left( x^2 - \frac{2x^3}{3} \right) \right|_0^1 = \frac{1}{3}.$$ (3)

(c) First we calculate

$$P[X > 1/2] = \int_{1/2}^1 f_X(x)\, dx = \int_{1/2}^1 2(1 - x)\, dx = \left. (2x - x^2) \right|_{1/2}^1 = \frac{1}{4}.$$ (4)

Now we calculate the conditional PDF of X given $X > 1/2$.

$$f_{X|X>1/2}(x) = \begin{cases} \dfrac{f_X(x)}{P[X>1/2]} & x > 1/2, \\ 0 & \text{otherwise,} \end{cases} = \begin{cases} 8(1 - x) & 1/2 < x \le 1, \\ 0 & \text{otherwise.} \end{cases}$$ (5)

The minimum mean square error estimate of X given $X > 1/2$ is

$$E[X|X > 1/2] = \int_{-\infty}^{\infty} x f_{X|X>1/2}(x)\, dx$$ (6)
$$= \int_{1/2}^1 8x(1 - x)\, dx = \left. \left( 4x^2 - \frac{8x^3}{3} \right) \right|_{1/2}^1 = \frac{2}{3}.$$ (7)

(d) For $0 \le y \le 1$, the marginal PDF of Y is

$$f_Y(y) = \int_{-\infty}^{\infty} f_{X,Y}(x, y)\, dx = \int_0^y 2\, dx = 2y.$$ (8)

The complete expression for the marginal PDF of Y is

$$f_Y(y) = \begin{cases} 2y & 0 \le y \le 1, \\ 0 & \text{otherwise.} \end{cases}$$ (9)

(e) The blind estimate of Y is

$$\hat{y}_B = E[Y] = \int_0^1 2y^2\, dy = \frac{2}{3}.$$ (10)

(f) We already know that $P[X > 1/2] = 1/4$. However, this problem differs from the other problems in this section because we will estimate Y based on the observation of X. In this case, we need to calculate the conditional joint PDF

$$f_{X,Y|X>1/2}(x, y) = \begin{cases} \dfrac{f_{X,Y}(x,y)}{P[X>1/2]} & x > 1/2, \\ 0 & \text{otherwise,} \end{cases} = \begin{cases} 8 & 1/2 < x \le y \le 1, \\ 0 & \text{otherwise.} \end{cases}$$ (11)

The MMSE estimate of Y given $X > 1/2$ is

$$E[Y|X > 1/2] = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} y f_{X,Y|X>1/2}(x, y)\, dx\, dy$$ (12)
$$= \int_{1/2}^1 y \left( \int_{1/2}^y 8\, dx \right) dy$$ (13)
$$= \int_{1/2}^1 y(8y - 4)\, dy = \frac{5}{6}.$$ (14)

Problem 9.1.4 Solution

The joint PDF of X and Y is

$$f_{X,Y}(x, y) = \begin{cases} 6(y - x) & 0 \le x \le y \le 1, \\ 0 & \text{otherwise.} \end{cases}$$ (1)

(a) The conditional PDF of X given Y is found by dividing the joint PDF by the marginal with respect to Y. For $y < 0$ or $y > 1$, $f_Y(y) = 0$.
For $0 \le y \le 1$,

$$f_Y(y) = \int_0^y 6(y - x)\, dx = \left. \left(6xy - 3x^2\right) \right|_0^y = 3y^2.$$ (2)

The complete expression for the marginal PDF of Y is

$$f_Y(y) = \begin{cases} 3y^2 & 0 \le y \le 1, \\ 0 & \text{otherwise.} \end{cases}$$ (3)

Thus for $0 < y \le 1$,

$$f_{X|Y}(x|y) = \frac{f_{X,Y}(x, y)}{f_Y(y)} = \begin{cases} \dfrac{6(y-x)}{3y^2} & 0 \le x \le y, \\ 0 & \text{otherwise.} \end{cases}$$ (4)

(b) The minimum mean square estimator of X given $Y = y$ is

$$\hat{X}_M(y) = E[X|Y = y] = \int_{-\infty}^{\infty} x f_{X|Y}(x|y)\, dx$$ (5)
$$= \int_0^y \frac{6x(y - x)}{3y^2}\, dx = \left. \frac{3x^2 y - 2x^3}{3y^2} \right|_{x=0}^{x=y} = y/3.$$ (6)

(c) First we must find the marginal PDF for X. For $0 \le x \le 1$,

$$f_X(x) = \int_{-\infty}^{\infty} f_{X,Y}(x, y)\, dy = \int_x^1 6(y - x)\, dy = \left. \left(3y^2 - 6xy\right) \right|_{y=x}^{y=1}$$ (7)
$$= 3(1 - 2x + x^2) = 3(1 - x)^2.$$ (8)

The conditional PDF of Y given X is

$$f_{Y|X}(y|x) = \frac{f_{X,Y}(x, y)}{f_X(x)} = \begin{cases} \dfrac{2(y-x)}{1-2x+x^2} & x \le y \le 1, \\ 0 & \text{otherwise.} \end{cases}$$ (9)

(d) The minimum mean square estimator of Y given X is

$$\hat{Y}_M(x) = E[Y|X = x] = \int_{-\infty}^{\infty} y f_{Y|X}(y|x)\, dy$$ (10)
$$= \int_x^1 \frac{2y(y - x)}{1 - 2x + x^2}\, dy$$ (11)
$$= \left. \frac{(2/3)y^3 - y^2 x}{1 - 2x + x^2} \right|_{y=x}^{y=1} = \frac{2 - 3x + x^3}{3(1 - x)^2}.$$ (12)

Perhaps surprisingly, since $2 - 3x + x^3 = (1 - x)^2(x + 2)$, this result can be simplified to

$$\hat{Y}_M(x) = \frac{x}{3} + \frac{2}{3}.$$ (13)

Problem 9.1.5 Solution

(a) First we find the marginal PDF $f_Y(y)$.

[Figure: the region $0 \le x \le y \le 1$ in the (x, y) plane, bounded by the lines x = 0 and x = y.]

For $0 \le y \le 1$,

$$f_Y(y) = \int_{-\infty}^{\infty} f_{X,Y}(x, y)\, dx = \int_0^y 2\, dx = 2y.$$ (1)

Hence, for $0 < y \le 1$, the conditional PDF of X given Y is

$$f_{X|Y}(x|y) = \frac{f_{X,Y}(x, y)}{f_Y(y)} = \begin{cases} 1/y & 0 \le x \le y, \\ 0 & \text{otherwise.} \end{cases}$$ (2)

(b) The optimum mean squared error estimate of X given $Y = y$ is

$$\hat{x}_M(y) = E[X|Y = y] = \int_{-\infty}^{\infty} x f_{X|Y}(x|y)\, dx = \int_0^y \frac{x}{y}\, dx = y/2.$$ (3)

(c) The MMSE estimator of X given Y is $\hat{X}_M(Y) = E[X|Y] = Y/2$. The mean squared error is

$$e^*_{X,Y} = E\left[(X - \hat{X}_M(Y))^2\right] = E\left[(X - Y/2)^2\right] = E\left[X^2 - XY + Y^2/4\right].$$ (4)

Of course, the integral must be evaluated.

$$e^*_{X,Y} = \int_0^1 \int_0^y 2(x^2 - xy + y^2/4)\, dx\, dy$$ (5)
$$= \int_0^1 \left. \left( 2x^3/3 - x^2 y + xy^2/2 \right) \right|_{x=0}^{x=y} dy$$ (6)
$$= \int_0^1 \frac{y^3}{6}\, dy = 1/24.$$ (7)

Another approach to finding the mean square error is to recognize that the MMSE estimator is a linear estimator and thus must be the optimal linear estimator.
Hence, the mean square error of the optimal linear estimator given by Theorem 9.4 must equal $e^*_{X,Y}$. That is, $e^*_{X,Y} = \text{Var}[X](1 - \rho^2_{X,Y})$. However, calculation of the correlation coefficient $\rho_{X,Y}$ is at least as much work as direct calculation of $e^*_{X,Y}$.

Problem 9.2.1 Solution

(a) The marginal PMFs of X and Y are listed below.

$$P_X(x) = \begin{cases} 1/3 & x = -1, 0, 1, \\ 0 & \text{otherwise,} \end{cases} \qquad P_Y(y) = \begin{cases} 1/4 & y = -3, -1, 1, 3, \\ 0 & \text{otherwise.} \end{cases}$$ (1)

(b) No, the random variables X and Y are not independent since

$$P_{X,Y}(1, -3) = 0 \ne P_X(1)\, P_Y(-3).$$ (2)

(c) Direct evaluation leads to

$$E[X] = 0, \qquad \text{Var}[X] = 2/3,$$ (3)
$$E[Y] = 0, \qquad \text{Var}[Y] = 5.$$ (4)

This implies

$$\text{Cov}[X, Y] = E[XY] - E[X]E[Y] = E[XY] = 7/6.$$ (5)

(d) From Theorem 9.4, the optimal linear estimate of X given Y is

$$\hat{X}_L(Y) = \rho_{X,Y} \frac{\sigma_X}{\sigma_Y}(Y - \mu_Y) + \mu_X = \frac{7}{30} Y + 0.$$ (6)

Therefore, $a^* = 7/30$ and $b^* = 0$.

(e) From the previous part, X and Y have correlation coefficient

$$\rho_{X,Y} = \frac{\text{Cov}[X, Y]}{\sqrt{\text{Var}[X]\,\text{Var}[Y]}} = \frac{7}{\sqrt{120}}, \qquad \rho^2_{X,Y} = \frac{49}{120}.$$ (7)

From Theorem 9.4, the minimum mean square error of the optimum linear estimate is

$$e^*_L = \sigma^2_X (1 - \rho^2_{X,Y}) = \frac{2}{3} \cdot \frac{71}{120} = \frac{71}{180}.$$ (8)

(f) The conditional probability mass function is

$$P_{X|Y}(x|-3) = \frac{P_{X,Y}(x, -3)}{P_Y(-3)} = \begin{cases} \dfrac{1/6}{1/4} = 2/3 & x = -1, \\ \dfrac{1/12}{1/4} = 1/3 & x = 0, \\ 0 & \text{otherwise.} \end{cases}$$ (9)

(g) The minimum mean square estimator of X given that $Y = -3$ is

$$\hat{x}_M(-3) = E[X|Y = -3] = \sum_x x P_{X|Y}(x|-3) = -2/3.$$ (10)

(h) The mean squared error of this estimator is

$$\hat{e}_M(-3) = E\left[(X - \hat{x}_M(-3))^2 \,|\, Y = -3\right]$$ (11)
$$= \sum_x (x + 2/3)^2 P_{X|Y}(x|-3)$$ (12)
$$= (-1/3)^2(2/3) + (2/3)^2(1/3) = 2/9.$$ (13)

Problem 9.2.2 Solution

The problem statement tells us that

$$f_V(v) = \begin{cases} 1/12 & -6 \le v \le 6, \\ 0 & \text{otherwise.} \end{cases}$$ (1)

Furthermore, we are also told that $R = V + X$ where X is a Gaussian $(0, \sqrt{3})$ random variable.

(a) The expected value of R is the expected value of V plus the expected value of X. We already know that X has zero expected value, and that V is uniformly distributed between -6 and 6 volts and therefore also has zero expected value.
So

$$E[R] = E[V + X] = E[V] + E[X] = 0.$$ (2)

(b) Because X and V are independent random variables, the variance of R is the sum of the variance of V and the variance of X.

$$\text{Var}[R] = \text{Var}[V] + \text{Var}[X] = 12 + 3 = 15.$$ (3)

(c) Since $E[R] = E[V] = 0$,

$$\text{Cov}[V, R] = E[VR] = E[V(V + X)] = E[V^2] = \text{Var}[V].$$ (4)

(d) The correlation coefficient of V and R is

$$\rho_{V,R} = \frac{\text{Cov}[V, R]}{\sqrt{\text{Var}[V]\,\text{Var}[R]}} = \frac{\text{Var}[V]}{\sqrt{\text{Var}[V]\,\text{Var}[R]}} = \frac{\sigma_V}{\sigma_R}.$$ (5)

The LMSE estimate of V given R is

$$\hat{V}(R) = \rho_{V,R} \frac{\sigma_V}{\sigma_R}(R - E[R]) + E[V] = \frac{\sigma^2_V}{\sigma^2_R} R = \frac{12}{15} R.$$ (6)

Therefore $a^* = 12/15 = 4/5$ and $b^* = 0$.

(e) The minimum mean square error in the estimate is

$$e^* = \text{Var}[V](1 - \rho^2_{V,R}) = 12(1 - 12/15) = 12/5.$$ (7)

Problem 9.2.3 Solution

The solution to this problem is to simply calculate the various quantities required for the optimal linear estimator given by Theorem 9.4. First we calculate the necessary moments of X and Y.

$$E[X] = -1(1/4) + 0(1/2) + 1(1/4) = 0$$ (1)
$$E[X^2] = (-1)^2(1/4) + 0^2(1/2) + 1^2(1/4) = 1/2$$ (2)
$$E[Y] = -1(17/48) + 0(17/48) + 1(14/48) = -1/16$$ (3)
$$E[Y^2] = (-1)^2(17/48) + 0^2(17/48) + 1^2(14/48) = 31/48$$ (4)
$$E[XY] = 3/16 - 0 - 0 + 1/8 = 5/16$$ (5)

The variances and covariance are

$$\text{Var}[X] = E[X^2] - (E[X])^2 = 1/2$$ (6)
$$\text{Var}[Y] = E[Y^2] - (E[Y])^2 = 493/768$$ (7)
$$\text{Cov}[X, Y] = E[XY] - E[X]E[Y] = 5/16$$ (8)
$$\rho_{X,Y} = \frac{\text{Cov}[X, Y]}{\sqrt{\text{Var}[X]\,\text{Var}[Y]}} = \frac{5\sqrt{6}}{\sqrt{493}}$$ (9)

By reversing the labels of X and Y in Theorem 9.4, we find that the optimal linear estimator of Y given X is

$$\hat{Y}_L(X) = \rho_{X,Y} \frac{\sigma_Y}{\sigma_X}(X - E[X]) + E[Y] = \frac{5}{8} X - \frac{1}{16}.$$ (10)

The mean square estimation error is

$$e^*_L = \text{Var}[Y](1 - \rho^2_{X,Y}) = 343/768.$$ (11)

Problem 9.2.4 Solution

These four joint PMFs are actually related to each other. In particular, completing the row sums and column sums shows that each random variable has the same marginal PMF.
That is,

$$P_X(x) = P_Y(x) = P_U(x) = P_V(x) = P_S(x) = P_T(x) = P_Q(x) = P_R(x)$$ (1)
$$= \begin{cases} 1/3 & x = -1, 0, 1, \\ 0 & \text{otherwise.} \end{cases}$$ (2)

This implies

$$E[X] = E[Y] = E[U] = E[V] = E[S] = E[T] = E[Q] = E[R] = 0$$ (3)

and that

$$E[X^2] = E[Y^2] = E[U^2] = E[V^2] = E[S^2] = E[T^2] = E[Q^2] = E[R^2] = 2/3.$$ (4)

Since each random variable has zero expected value, the second moment equals the variance. Also, the standard deviation of each random variable is $\sqrt{2/3}$. These common properties will make it much easier to answer the questions.

(a) Random variables X and Y are independent since for all x and y,

$$P_{X,Y}(x, y) = P_X(x)\, P_Y(y).$$ (5)

Since each other pair of random variables has the same marginal PMFs as X and Y but a different joint PMF, all of the other pairs of random variables must be dependent. Since X and Y are independent, $\rho_{X,Y} = 0$. For the other pairs, we must compute the covariances.

$$\text{Cov}[U, V] = E[UV] = (1/3)(-1) + (1/3)(-1) = -2/3$$ (6)
$$\text{Cov}[S, T] = E[ST] = 1/6 - 1/6 + 0 - 1/6 + 1/6 = 0$$ (7)
$$\text{Cov}[Q, R] = E[QR] = 1/12 - 1/6 - 1/6 + 1/12 = -1/6$$ (8)

The correlation coefficient of U and V is

$$\rho_{U,V} = \frac{\text{Cov}[U, V]}{\sqrt{\text{Var}[U]}\sqrt{\text{Var}[V]}} = \frac{-2/3}{\sqrt{2/3}\sqrt{2/3}} = -1.$$ (9)

In fact, since the marginal PMFs are the same, the denominator of the correlation coefficient will be 2/3 in each case. The other correlation coefficients are

$$\rho_{S,T} = \frac{\text{Cov}[S, T]}{2/3} = 0, \qquad \rho_{Q,R} = \frac{\text{Cov}[Q, R]}{2/3} = -1/4.$$ (10)

(b) From Theorem 9.4, the least mean square linear estimator of U given V is

$$\hat{U}_L(V) = \rho_{U,V} \frac{\sigma_U}{\sigma_V}(V - E[V]) + E[U] = \rho_{U,V} V = -V.$$ (11)

Similarly for the other pairs, all expected values are zero and the ratio of the standard deviations is always 1.
Hence,

$$\hat{X}_L(Y) = \rho_{X,Y} Y = 0$$ (12)
$$\hat{S}_L(T) = \rho_{S,T} T = 0$$ (13)
$$\hat{Q}_L(R) = \rho_{Q,R} R = -R/4$$ (14)

From Theorem 9.4, the mean square errors are

$$e^*_L(X, Y) = \text{Var}[X](1 - \rho^2_{X,Y}) = 2/3$$ (15)
$$e^*_L(U, V) = \text{Var}[U](1 - \rho^2_{U,V}) = 0$$ (16)
$$e^*_L(S, T) = \text{Var}[S](1 - \rho^2_{S,T}) = 2/3$$ (17)
$$e^*_L(Q, R) = \text{Var}[Q](1 - \rho^2_{Q,R}) = 5/8$$ (18)

Problem 9.2.5 Solution

To solve this problem, we use Theorem 9.4. The only difficulty is in computing E[X], E[Y], Var[X], Var[Y], and $\rho_{X,Y}$. First we calculate the marginal PDFs

$$f_X(x) = \int_x^1 2(y + x)\, dy = \left. \left(y^2 + 2xy\right) \right|_{y=x}^{y=1} = 1 + 2x - 3x^2$$ (1)
$$f_Y(y) = \int_0^y 2(y + x)\, dx = \left. \left(2xy + x^2\right) \right|_{x=0}^{x=y} = 3y^2$$ (2)

The first and second moments of X are

$$E[X] = \int_0^1 (x + 2x^2 - 3x^3)\, dx = \left. \left( x^2/2 + 2x^3/3 - 3x^4/4 \right) \right|_0^1 = 5/12$$ (3)
$$E[X^2] = \int_0^1 (x^2 + 2x^3 - 3x^4)\, dx = \left. \left( x^3/3 + x^4/2 - 3x^5/5 \right) \right|_0^1 = 7/30$$ (4)

The first and second moments of Y are

$$E[Y] = \int_0^1 3y^3\, dy = 3/4, \qquad E[Y^2] = \int_0^1 3y^4\, dy = 3/5.$$ (5)

Thus, X and Y have variances

$$\text{Var}[X] = E[X^2] - (E[X])^2 = \frac{129}{2160}, \qquad \text{Var}[Y] = E[Y^2] - (E[Y])^2 = \frac{3}{80}.$$ (6)

To calculate the correlation coefficient, we first must calculate the correlation

$$E[XY] = \int_0^1 \int_0^y 2xy(x + y)\, dx\, dy$$ (7)
$$= \int_0^1 \left. \left( 2x^3 y/3 + x^2 y^2 \right) \right|_{x=0}^{x=y} dy = \int_0^1 \frac{5y^4}{3}\, dy = 1/3.$$ (8)

Hence, the correlation coefficient is

$$\rho_{X,Y} = \frac{\text{Cov}[X, Y]}{\sqrt{\text{Var}[X]\,\text{Var}[Y]}} = \frac{E[XY] - E[X]E[Y]}{\sqrt{\text{Var}[X]\,\text{Var}[Y]}} = \frac{5}{\sqrt{129}}.$$ (9)

Finally, we use Theorem 9.4 to combine these quantities in the optimal linear estimator.

$$\hat{X}_L(Y) = \rho_{X,Y} \frac{\sigma_X}{\sigma_Y}(Y - E[Y]) + E[X]$$ (10)
$$= \frac{5}{\sqrt{129}} \cdot \frac{\sqrt{129}}{9} \left( Y - \frac{3}{4} \right) + \frac{5}{12} = \frac{5}{9} Y.$$ (11)

Problem 9.2.6 Solution

The linear mean square estimator of X given Y is

$$\hat{X}_L(Y) = \left( \frac{E[XY] - \mu_X \mu_Y}{\text{Var}[Y]} \right)(Y - \mu_Y) + \mu_X.$$ (1)

To find the parameters of this estimator, we calculate

$$f_Y(y) = \int_0^y 6(y - x)\, dx = \left. \left(6xy - 3x^2\right) \right|_0^y = 3y^2 \qquad (0 \le y \le 1)$$ (2)
$$f_X(x) = \int_x^1 6(y - x)\, dy = \begin{cases} 3(1 - 2x + x^2) & 0 \le x \le 1, \\ 0 & \text{otherwise.} \end{cases}$$
(3)

The moments of X and Y are

$$E[Y] = \int_0^1 3y^3\, dy = 3/4, \qquad E[X] = \int_0^1 3x(1 - 2x + x^2)\, dx = 1/4,$$ (4)
$$E[Y^2] = \int_0^1 3y^4\, dy = 3/5, \qquad E[X^2] = \int_0^1 3x^2(1 - 2x + x^2)\, dx = 1/10.$$ (5)

The correlation between X and Y is

$$E[XY] = 6 \int_0^1 \int_x^1 xy(y - x)\, dy\, dx = 1/5.$$ (6)

Putting these pieces together, the optimal linear estimate of X given Y is

$$\hat{X}_L(Y) = \left( \frac{1/5 - 3/16}{3/5 - (3/4)^2} \right) \left( Y - \frac{3}{4} \right) + \frac{1}{4} = \frac{Y}{3}.$$ (7)

Problem 9.2.7 Solution

We are told that random variable X has a second order Erlang distribution

$$f_X(x) = \begin{cases} \lambda^2 x e^{-\lambda x} & x \ge 0, \\ 0 & \text{otherwise.} \end{cases}$$ (1)

We also know that given $X = x$, random variable Y is uniform on [0, x] so that

$$f_{Y|X}(y|x) = \begin{cases} 1/x & 0 \le y \le x, \\ 0 & \text{otherwise.} \end{cases}$$ (2)

(a) Given $X = x$, Y is uniform on [0, x]. Hence $E[Y|X = x] = x/2$. Thus the minimum mean square estimate of Y given X is

$$\hat{Y}_M(X) = E[Y|X] = X/2.$$ (3)

(b) The minimum mean square estimate of X given Y can be found by finding the conditional probability density function of X given Y. First we find the joint density function.

$$f_{X,Y}(x, y) = f_{Y|X}(y|x) \cdot f_X(x) = \begin{cases} \lambda^2 e^{-\lambda x} & 0 \le y \le x, \\ 0 & \text{otherwise.} \end{cases}$$ (4)

Now we can find the marginal of Y

$$f_Y(y) = \int_y^{\infty} \lambda^2 e^{-\lambda x}\, dx = \begin{cases} \lambda e^{-\lambda y} & y \ge 0, \\ 0 & \text{otherwise.} \end{cases}$$ (5)

By dividing the joint density by the marginal density of Y we arrive at the conditional density of X given Y.

$$f_{X|Y}(x|y) = \frac{f_{X,Y}(x, y)}{f_Y(y)} = \begin{cases} \lambda e^{-\lambda(x-y)} & x \ge y, \\ 0 & \text{otherwise.} \end{cases}$$ (6)

Now we are in a position to find the minimum mean square estimate of X given Y. Given $Y = y$, the conditional expected value of X is

$$E[X|Y = y] = \int_y^{\infty} \lambda x e^{-\lambda(x-y)}\, dx.$$ (7)

Making the substitution $u = x - y$ yields

$$E[X|Y = y] = \int_0^{\infty} \lambda(u + y) e^{-\lambda u}\, du.$$ (8)

We observe that if U is an exponential random variable with parameter $\lambda$, then

$$E[X|Y = y] = E[U + y] = \frac{1}{\lambda} + y.$$ (9)

The minimum mean square error estimate of X given Y is

$$\hat{X}_M(Y) = E[X|Y] = \frac{1}{\lambda} + Y.$$ (10)

(c) Since the MMSE estimate of Y given X is the linear estimate $\hat{Y}_M(X) = X/2$, the optimal linear estimate of Y given X must also be the MMSE estimate. That is, $\hat{Y}_L(X) = X/2$.
(d) Since the MMSE estimate of X given Y is the linear estimate $\hat{X}_M(Y) = Y + 1/\lambda$, the optimal linear estimate of X given Y must also be the MMSE estimate. That is, $\hat{X}_L(Y) = Y + 1/\lambda$.

Problem 9.2.8 Solution

From the problem statement, we learn the following facts:

$$f_R(r) = \begin{cases} e^{-r} & r \ge 0, \\ 0 & \text{otherwise,} \end{cases} \qquad f_{X|R}(x|r) = \begin{cases} r e^{-rx} & x \ge 0, \\ 0 & \text{otherwise.} \end{cases}$$ (1)

Note that $f_{X,R}(x, r) > 0$ for all non-negative X and R. Hence, for the remainder of the problem, we assume both X and R are non-negative and we omit the usual “zero otherwise” considerations.

(a) To find $\hat{r}_M(X)$, we need the conditional PDF

$$f_{R|X}(r|x) = \frac{f_{X|R}(x|r)\, f_R(r)}{f_X(x)}.$$ (2)

The marginal PDF of X is

$$f_X(x) = \int_0^{\infty} f_{X|R}(x|r)\, f_R(r)\, dr = \int_0^{\infty} r e^{-(x+1)r}\, dr.$$ (3)

We use the integration by parts formula $\int u\, dv = uv - \int v\, du$ by choosing $u = r$ and $dv = e^{-(x+1)r}\, dr$. Thus $v = -e^{-(x+1)r}/(x + 1)$ and

$$f_X(x) = \left. \frac{-r}{x + 1} e^{-(x+1)r} \right|_0^{\infty} + \frac{1}{x + 1} \int_0^{\infty} e^{-(x+1)r}\, dr = \left. \frac{-1}{(x + 1)^2} e^{-(x+1)r} \right|_0^{\infty} = \frac{1}{(x + 1)^2}.$$ (4)

Now we can find the conditional PDF of R given X.

$$f_{R|X}(r|x) = \frac{f_{X|R}(x|r)\, f_R(r)}{f_X(x)} = (x + 1)^2 r e^{-(x+1)r}.$$ (5)

By comparing $f_{R|X}(r|x)$ to the Erlang PDF shown in Appendix A, we see that given $X = x$, the conditional PDF of R is an Erlang PDF with parameters $n = 2$ and $\lambda = x + 1$. This implies

$$E[R|X = x] = \frac{2}{x + 1}, \qquad \text{Var}[R|X = x] = \frac{2}{(x + 1)^2}.$$ (6)

Hence, the MMSE estimator of R given X is

$$\hat{r}_M(X) = E[R|X] = \frac{2}{X + 1}.$$ (7)

(b) The MMSE estimate of X given $R = r$ is $E[X|R = r]$. From the initial problem statement, we know that given $R = r$, X is exponential with expected value 1/r. That is, $E[X|R = r] = 1/r$. Another way of writing this statement is

$$\hat{x}_M(R) = E[X|R] = 1/R.$$ (8)

(c) Note that the expected value of X is

$$E[X] = \int_0^{\infty} x f_X(x)\, dx = \int_0^{\infty} \frac{x}{(x + 1)^2}\, dx = \infty.$$ (9)

Because E[X] doesn’t exist, the LMSE estimate of X given R doesn’t exist.

(d) Just as in part (c), because E[X] doesn’t exist, the LMSE estimate of R given X doesn’t exist.
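The marginal PDF in equation (4) and the divergence of E[X] in equation (9) of Problem 9.2.8 can be checked numerically. The following is a rough Python sketch under stated assumptions: the function names are hypothetical, the integral over r is truncated at an arbitrary upper limit and evaluated with a simple midpoint rule, and the truncated mean uses the closed form $\int_0^b x/(x+1)^2\,dx = \ln(1+b) - b/(1+b)$.

```python
import math

def marginal_pdf_x(x, upper=60.0, steps=100_000):
    """Midpoint-rule estimate of f_X(x) = integral over r >= 0 of r*exp(-(x+1)r)."""
    h = upper / steps
    total = 0.0
    for i in range(steps):
        r = (i + 0.5) * h               # midpoint of the i-th subinterval
        total += r * math.exp(-(x + 1.0) * r)
    return total * h

def truncated_mean_x(b):
    """Closed form of the integral of x/(x+1)^2 from 0 to b."""
    return math.log1p(b) - b / (1.0 + b)

# f_X(1) should be close to 1/(1+1)^2 = 0.25.
print(marginal_pdf_x(1.0))

# The truncated mean keeps growing (roughly like ln b) as the limit b increases.
for b in (10.0, 1e3, 1e6):
    print(b, truncated_mean_x(b))
```

The logarithmic growth of the truncated mean is the numerical signature of the infinite expected value in part (c).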
Problem 9.2.9 Solution

(a) As a function of a, the mean squared error is

$$e = E\left[(aY - X)^2\right] = a^2 E[Y^2] - 2aE[XY] + E[X^2].$$ (1)

Setting $de/da|_{a=a^*} = 0$ yields

$$a^* = \frac{E[XY]}{E[Y^2]}.$$ (2)

(b) Using $a = a^*$, the mean squared error is

$$e^* = E[X^2] - \frac{(E[XY])^2}{E[Y^2]}.$$ (3)

(c) We can write the LMSE estimator given in Theorem 9.4 in the form

$$\hat{x}_L(Y) = \rho_{X,Y} \frac{\sigma_X}{\sigma_Y} Y - b$$ (4)

where

$$b = \rho_{X,Y} \frac{\sigma_X}{\sigma_Y} E[Y] - E[X].$$ (5)

When $b = 0$, $\hat{X}(Y) = a^* Y$ is the LMSE estimate. Note that the typical way that $b = 0$ occurs is when $E[X] = E[Y] = 0$. However, it is possible that the right combination of expected values, variances, and correlation coefficient can also yield $b = 0$.

Problem 9.3.1 Solution

In this case, the joint PDF of X and R is

$$f_{X,R}(x, r) = f_{X|R}(x|r)\, f_R(r)$$ (1)
$$= \begin{cases} \dfrac{1}{r_0 \sqrt{128\pi}}\, e^{-(x + 40 + 40 \log_{10} r)^2/128} & 0 \le r \le r_0, \\ 0 & \text{otherwise.} \end{cases}$$ (2)

From Theorem 9.6, the MAP estimate of R given $X = x$ is the value of r that maximizes $f_{X|R}(x|r) f_R(r)$. Since R has a uniform PDF over [0, 1000],

$$\hat{r}_{\text{MAP}}(x) = \arg\max_{0 \le r} f_{X|R}(x|r)\, f_R(r) = \arg\max_{0 \le r \le 1000} f_{X|R}(x|r).$$ (3)

Hence, the maximizing value of r is the same as for the ML estimate in Quiz 9.3 unless the maximizing r exceeds 1000 m. In this case, the maximizing value is r = 1000 m. Using the ML estimate from the solution to Quiz 9.3, the resulting MAP estimator is

$$\hat{r}_{\text{MAP}}(x) = \begin{cases} 1000 & x < -160, \\ (0.1) 10^{-x/40} & x \ge -160. \end{cases}$$ (4)

Problem 9.3.2 Solution

From the problem statement we know that R is an exponential random variable with expected value $1/\mu$. Therefore it has the following probability density function.

$$f_R(r) = \begin{cases} \mu e^{-\mu r} & r \ge 0, \\ 0 & \text{otherwise.} \end{cases}$$ (1)

It is also known that, given $R = r$, the number of phone calls arriving at a telephone switch, N, is a Poisson ($\alpha = rT$) random variable. So we can write the following conditional probability mass function of N given R.

$$P_{N|R}(n|r) = \begin{cases} \dfrac{(rT)^n e^{-rT}}{n!} & n = 0, 1, \ldots, \\ 0 & \text{otherwise.} \end{cases}$$ (2)

(a) The minimum mean square error estimate of N given R is the conditional expected value of N given $R = r$.
This is given directly in the problem statement as rT.

$$\hat{N}_M(r) = E[N|R = r] = rT.$$ (3)

(b) The maximum a posteriori estimate of N given R is simply the value of n that will maximize $P_{N|R}(n|r)$. That is,

$$\hat{n}_{\text{MAP}}(r) = \arg\max_{n \ge 0} P_{N|R}(n|r) = \arg\max_{n \ge 0} (rT)^n e^{-rT}/n!$$ (4)

Usually, we set a derivative to zero to solve for the maximizing value. In this case, that technique doesn’t work because n is discrete. Since $e^{-rT}$ is a common factor in the maximization, we can define $g(n) = (rT)^n/n!$ so that $\hat{n}_{\text{MAP}} = \arg\max_n g(n)$. We observe that

$$g(n) = \frac{rT}{n}\, g(n - 1).$$ (5)

This implies that for $n \le rT$, $g(n) \ge g(n - 1)$. Hence the maximizing value of n is the largest n such that $n \le rT$. That is, $\hat{n}_{\text{MAP}} = \lfloor rT \rfloor$.

(c) The maximum likelihood estimate of N given R selects the value of n that maximizes $f_{R|N}(r|n)$, the conditional PDF of R given N. When dealing with situations in which we mix continuous and discrete random variables, it’s often helpful to start from first principles. In this case,

$$f_{R|N}(r|n)\, dr = P[r < R \le r + dr \,|\, N = n]$$ (6)
$$= \frac{P[r < R \le r + dr, N = n]}{P[N = n]}$$ (7)
$$= \frac{P[N = n|R = r]\, P[r < R \le r + dr]}{P[N = n]}.$$ (8)

In terms of PDFs and PMFs, we have

$$f_{R|N}(r|n) = \frac{P_{N|R}(n|r)\, f_R(r)}{P_N(n)}.$$ (9)

To find the value of n that maximizes $f_{R|N}(r|n)$, we need to find the denominator $P_N(n)$.

$$P_N(n) = \int_{-\infty}^{\infty} P_{N|R}(n|r)\, f_R(r)\, dr$$ (10)
$$= \int_0^{\infty} \frac{(rT)^n e^{-rT}}{n!} \mu e^{-\mu r}\, dr$$ (11)
$$= \frac{\mu T^n}{n!(\mu + T)} \int_0^{\infty} r^n (\mu + T) e^{-(\mu+T)r}\, dr$$ (12)
$$= \frac{\mu T^n}{n!(\mu + T)} E[X^n]$$ (13)

where X is an exponential random variable with expected value $1/(\mu + T)$. There are several ways to derive the nth moment of an exponential random variable including integration by parts. In Example 6.5, the MGF is used to show that $E[X^n] = n!/(\mu + T)^n$. Hence, for $n \ge 0$,

$$P_N(n) = \frac{\mu T^n}{(\mu + T)^{n+1}}.$$ (14)

Finally, the conditional PDF of R given N is

$$f_{R|N}(r|n) = \frac{P_{N|R}(n|r)\, f_R(r)}{P_N(n)} = \frac{\dfrac{(rT)^n e^{-rT}}{n!} \mu e^{-\mu r}}{\dfrac{\mu T^n}{(\mu+T)^{n+1}}}$$ (15)
$$= (\mu + T) \frac{[(\mu + T)r]^n e^{-(\mu+T)r}}{n!}.$$
(16)

The ML estimate of $N$ given $R$ is

$$\hat{n}_{\rm ML}(r) = \arg\max_{n \ge 0} f_{R|N}(r|n) = \arg\max_{n \ge 0} \frac{(\mu + T)\,[(\mu + T)r]^n e^{-(\mu+T)r}}{n!} \qquad (17)$$

This maximization is exactly the same as in the previous part except $rT$ is replaced by $(\mu + T)r$. The maximizing value of $n$ is $\hat{n}_{\rm ML} = \lfloor (\mu + T)r \rfloor$.

Problem 9.3.3 Solution

Both parts (a) and (b) rely on the conditional PDF of $R$ given $N = n$. When dealing with situations in which we mix continuous and discrete random variables, it's often helpful to start from first principles.

$$f_{R|N}(r|n)\, dr = P[r < R \le r + dr \,|\, N = n] = \frac{P[r < R \le r + dr,\, N = n]}{P[N = n]} \qquad (1)$$

$$= \frac{P[N = n|R = r]\, P[r < R \le r + dr]}{P[N = n]} \qquad (2)$$

In terms of PDFs and PMFs, we have

$$f_{R|N}(r|n) = \frac{P_{N|R}(n|r)\, f_R(r)}{P_N(n)} \qquad (3)$$

To evaluate $f_{R|N}(r|n)$, we need to find the denominator $P_N(n)$.

$$P_N(n) = \int_{-\infty}^{\infty} P_{N|R}(n|r)\, f_R(r)\, dr \qquad (4)$$

$$= \int_0^{\infty} \frac{(rT)^n e^{-rT}}{n!}\, \mu e^{-\mu r}\, dr \qquad (5)$$

$$= \frac{\mu T^n}{n!(\mu + T)} \int_0^{\infty} r^n (\mu + T) e^{-(\mu+T)r}\, dr = \frac{\mu T^n}{n!(\mu + T)}\, E[X^n] \qquad (6)$$

where $X$ is an exponential random variable with expected value $1/(\mu + T)$. There are several ways to derive the $n$th moment of an exponential random variable, including integration by parts. In Example 6.5, the MGF is used to show that $E[X^n] = n!/(\mu + T)^n$. Hence, for $n \ge 0$,

$$P_N(n) = \frac{\mu T^n}{(\mu + T)^{n+1}} \qquad (7)$$

Finally, the conditional PDF of $R$ given $N$ is

$$f_{R|N}(r|n) = \frac{P_{N|R}(n|r)\, f_R(r)}{P_N(n)} = \frac{\dfrac{(rT)^n e^{-rT}}{n!}\, \mu e^{-\mu r}}{\dfrac{\mu T^n}{(\mu+T)^{n+1}}} = \frac{(\mu + T)^{n+1} r^n e^{-(\mu+T)r}}{n!} \qquad (8)$$

(a) The MMSE estimate of $R$ given $N = n$ is the conditional expected value $E[R|N = n]$. Given $N = n$, the conditional PDF of $R$ is that of an Erlang random variable of order $n + 1$. From Appendix A, we find that $E[R|N = n] = (n + 1)/(\mu + T)$. The MMSE estimate of $R$ given $N$ is

$$\hat{R}_M(N) = E[R|N] = \frac{N + 1}{\mu + T} \qquad (9)$$

(b) The MAP estimate of $R$ given $N = n$ is the value of $r$ that maximizes $f_{R|N}(r|n)$.

$$\hat{R}_{\rm MAP}(n) = \arg\max_{r \ge 0} f_{R|N}(r|n) = \arg\max_{r \ge 0} \frac{(\mu + T)^{n+1}}{n!}\, r^n e^{-(\mu+T)r} \qquad (10)$$

By setting the derivative with respect to $r$ to zero, we obtain the MAP estimate

$$\hat{R}_{\rm MAP}(n) = \frac{n}{\mu + T} \qquad (11)$$

(c) The ML estimate of $R$ given $N = n$ is the value of $r$ that maximizes $P_{N|R}(n|r)$. That is,

$$\hat{R}_{\rm ML}(n) = \arg\max_{r \ge 0} \frac{(rT)^n e^{-rT}}{n!} \qquad (12)$$

Setting the derivative with respect to $r$ to zero yields

$$\hat{R}_{\rm ML}(n) = n/T \qquad (13)$$

Problem 9.3.4 Solution

This problem is closely related to Example 9.7.

(a) Given $Q = q$, the conditional PMF of $K$ is

$$P_{K|Q}(k|q) = \begin{cases} \dbinom{n}{k} q^k (1 - q)^{n-k} & k = 0, 1, \ldots, n \\ 0 & \text{otherwise} \end{cases} \qquad (1)$$

The ML estimate of $Q$ given $K = k$ is

$$\hat{q}_{\rm ML}(k) = \arg\max_{0 \le q \le 1} P_{K|Q}(k|q) \qquad (2)$$

Differentiating $P_{K|Q}(k|q)$ with respect to $q$ and setting equal to zero yields

$$\frac{dP_{K|Q}(k|q)}{dq} = \binom{n}{k}\left[ k q^{k-1}(1 - q)^{n-k} - (n - k) q^k (1 - q)^{n-k-1} \right] = 0 \qquad (3)$$

The maximizing value is $q = k/n$ so that

$$\hat{Q}_{\rm ML}(K) = \frac{K}{n} \qquad (4)$$

(b) To find the PMF of $K$, we average over all $q$.

$$P_K(k) = \int_{-\infty}^{\infty} P_{K|Q}(k|q)\, f_Q(q)\, dq = \int_0^1 \binom{n}{k} q^k (1 - q)^{n-k}\, dq \qquad (5)$$

We can evaluate this integral by expressing it in terms of the integral of a beta PDF. Since $\beta(k + 1, n - k + 1) = \frac{(n+1)!}{k!(n-k)!}$, we can write

$$P_K(k) = \frac{1}{n + 1} \int_0^1 \beta(k + 1, n - k + 1)\, q^k (1 - q)^{n-k}\, dq = \frac{1}{n + 1} \qquad (6)$$

That is, $K$ has the uniform PMF

$$P_K(k) = \begin{cases} 1/(n + 1) & k = 0, 1, \ldots, n \\ 0 & \text{otherwise} \end{cases} \qquad (7)$$

Hence, $E[K] = n/2$.

(c) The conditional PDF of $Q$ given $K$ is

$$f_{Q|K}(q|k) = \frac{P_{K|Q}(k|q)\, f_Q(q)}{P_K(k)} = \begin{cases} \dfrac{(n+1)!}{k!(n-k)!}\, q^k (1 - q)^{n-k} & 0 \le q \le 1 \\ 0 & \text{otherwise} \end{cases} \qquad (8)$$

That is, given $K = k$, $Q$ has a beta $(k + 1, n - k + 1)$ PDF.

(d) The MMSE estimate of $Q$ given $K = k$ is the conditional expectation $E[Q|K = k]$. From the beta PDF described in Appendix A, $E[Q|K = k] = (k + 1)/(n + 2)$. The MMSE estimator is

$$\hat{Q}_M(K) = E[Q|K] = \frac{K + 1}{n + 2} \qquad (9)$$

Problem 9.4.1 Solution

From the problem statement, we learn for vectors $\mathbf{X} = [X_1\ X_2\ X_3]'$ and $\mathbf{Y} = [Y_1\ Y_2]'$ that

$$E[\mathbf{X}] = \mathbf{0}, \quad \mathbf{R}_X = \begin{bmatrix} 1 & 3/4 & 1/2 \\ 3/4 & 1 & 3/4 \\ 1/2 & 3/4 & 1 \end{bmatrix}, \quad \mathbf{Y} = \mathbf{A}\mathbf{X} = \begin{bmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \end{bmatrix} \mathbf{X}.$$
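(1)

(a) Since $E[\mathbf{Y}] = \mathbf{A}E[\mathbf{X}] = \mathbf{0}$, we can apply Theorem 9.7(a), which states that the minimum mean square error estimate of $X_1$ is $\hat{X}_1(\mathbf{Y}) = \hat{\mathbf{a}}'\mathbf{Y}$ where $\hat{\mathbf{a}} = \mathbf{R}_Y^{-1}\mathbf{R}_{YX_1}$. The rest of the solution is just calculation. (We note that even in the case of a $3 \times 3$ matrix, it's convenient to use Matlab with format rat mode to perform the calculations and display the results as nice fractions.) From Theorem 5.13,

$$\mathbf{R}_Y = \mathbf{A}\mathbf{R}_X\mathbf{A}' = \begin{bmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \end{bmatrix} \begin{bmatrix} 1 & 3/4 & 1/2 \\ 3/4 & 1 & 3/4 \\ 1/2 & 3/4 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 1 & 1 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} 7/2 & 3 \\ 3 & 7/2 \end{bmatrix}. \qquad (2)$$

In addition, since $\mathbf{R}_{YX_1} = E[\mathbf{Y}X_1] = E[\mathbf{A}\mathbf{X}X_1] = \mathbf{A}E[\mathbf{X}X_1]$,

$$\mathbf{R}_{YX_1} = \mathbf{A} \begin{bmatrix} E[X_1^2] \\ E[X_2X_1] \\ E[X_3X_1] \end{bmatrix} = \mathbf{A} \begin{bmatrix} R_X(1,1) \\ R_X(2,1) \\ R_X(3,1) \end{bmatrix} = \begin{bmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \end{bmatrix} \begin{bmatrix} 1 \\ 3/4 \\ 1/2 \end{bmatrix} = \begin{bmatrix} 7/4 \\ 5/4 \end{bmatrix}. \qquad (3)$$

Finally,

$$\hat{\mathbf{a}} = \mathbf{R}_Y^{-1}\mathbf{R}_{YX_1} = \begin{bmatrix} 14/13 & -12/13 \\ -12/13 & 14/13 \end{bmatrix} \begin{bmatrix} 7/4 \\ 5/4 \end{bmatrix} = \frac{1}{26} \begin{bmatrix} 19 \\ -7 \end{bmatrix}. \qquad (4)$$

Thus the linear MMSE estimator of $X_1$ given $\mathbf{Y}$ is

$$\hat{X}_1(\mathbf{Y}) = \hat{\mathbf{a}}'\mathbf{Y} = \frac{19}{26} Y_1 - \frac{7}{26} Y_2 = 0.7308\,Y_1 - 0.2692\,Y_2. \qquad (5)$$

(b) By Theorem 9.7(c), the mean squared error of the optimal estimator is

$$e_L^* = \mathrm{Var}[X_1] - \hat{\mathbf{a}}'\mathbf{R}_{YX_1} \qquad (6)$$

$$= R_X(1,1) - \mathbf{R}_{YX_1}'\mathbf{R}_Y^{-1}\mathbf{R}_{YX_1} \qquad (7)$$

$$= 1 - \begin{bmatrix} 7/4 & 5/4 \end{bmatrix} \begin{bmatrix} 14/13 & -12/13 \\ -12/13 & 14/13 \end{bmatrix} \begin{bmatrix} 7/4 \\ 5/4 \end{bmatrix} = \frac{3}{52} \qquad (8)$$

(c) We can estimate random variable $X_1$ based on the observation of random variable $Y_1$ using Theorem 9.4. Note that Theorem 9.4 is just a special case of Theorem 9.8, which covers observations that are random vectors. In any case, from Theorem 9.4, the optimum linear estimate is $\hat{X}_1(Y_1) = a^*Y_1 + b^*$ where

$$a^* = \frac{\mathrm{Cov}[X_1, Y_1]}{\mathrm{Var}[Y_1]}, \quad b^* = \mu_{X_1} - a^*\mu_{Y_1}. \qquad (9)$$

Since $Y_1 = X_1 + X_2$, we see that

$$\mu_{X_1} = E[X_1] = 0, \quad \mu_{Y_1} = E[Y_1] = E[X_1] + E[X_2] = 0. \qquad (10)$$

These facts, along with $\mathbf{R}_X$ and $\mathbf{R}_Y$ from part (a), imply

$$\mathrm{Cov}[X_1, Y_1] = E[X_1Y_1] \qquad (11)$$

$$= E[X_1(X_1 + X_2)] = R_X(1,1) + R_X(1,2) = 7/4, \qquad (12)$$

$$\mathrm{Var}[Y_1] = E[Y_1^2] = R_Y(1,1) = 7/2 \qquad (13)$$

Thus

$$a^* = \frac{\mathrm{Cov}[X_1, Y_1]}{\mathrm{Var}[Y_1]} = \frac{7/4}{7/2} = \frac{1}{2}, \quad b^* = \mu_{X_1} - a^*\mu_{Y_1} = 0. \qquad (14)$$

Thus the optimum linear estimate of $X_1$ given $Y_1$ is

$$\hat{X}_1(Y_1) = \frac{1}{2} Y_1.$$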
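(15)

From Theorem 9.4(b), the mean square error of this estimator is

$$e_L^* = \sigma_{X_1}^2 (1 - \rho_{X_1,Y_1}^2) \qquad (16)$$

Since $X_1$ and $Y_1$ have zero expected value, $\sigma_{X_1}^2 = R_X(1,1) = 1$ and $\sigma_{Y_1}^2 = R_Y(1,1) = 7/2$. Also, since $\mathrm{Cov}[X_1, Y_1] = 7/4$, we see that

$$\rho_{X_1,Y_1} = \frac{\mathrm{Cov}[X_1, Y_1]}{\sigma_{X_1}\sigma_{Y_1}} = \frac{7/4}{\sqrt{7/2}} = \sqrt{\frac{7}{8}}. \qquad (17)$$

Thus $e_L^* = 1 - (\sqrt{7/8})^2 = 1/8$. Note that $1/8 > 3/52$. As we would expect, the estimate of $X_1$ based on just $Y_1$ has larger mean square error than the estimate based on both $Y_1$ and $Y_2$.

Problem 9.4.2 Solution

From the problem statement, we learn for vectors $\mathbf{X} = [X_1\ X_2\ X_3]'$ and $\mathbf{W} = [W_1\ W_2]'$ that

$$E[\mathbf{X}] = \mathbf{0}, \quad \mathbf{R}_X = \begin{bmatrix} 1 & 3/4 & 1/2 \\ 3/4 & 1 & 3/4 \\ 1/2 & 3/4 & 1 \end{bmatrix}, \quad E[\mathbf{W}] = \mathbf{0}, \quad \mathbf{R}_W = \begin{bmatrix} 0.1 & 0 \\ 0 & 0.1 \end{bmatrix} \qquad (1)$$

In addition,

$$\mathbf{Y} = \begin{bmatrix} Y_1 \\ Y_2 \end{bmatrix} = \mathbf{A}\mathbf{X} + \mathbf{W} = \begin{bmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \end{bmatrix} \mathbf{X} + \mathbf{W}. \qquad (2)$$

(a) Since $E[\mathbf{Y}] = \mathbf{A}E[\mathbf{X}] = \mathbf{0}$, we can apply Theorem 9.7(a), which states that the minimum mean square error estimate of $X_1$ is $\hat{X}_1(\mathbf{Y}) = \hat{\mathbf{a}}'\mathbf{Y}$ where $\hat{\mathbf{a}} = \mathbf{R}_Y^{-1}\mathbf{R}_{YX_1}$. First we find $\mathbf{R}_Y$.

$$\mathbf{R}_Y = E[\mathbf{Y}\mathbf{Y}'] = E[(\mathbf{A}\mathbf{X} + \mathbf{W})(\mathbf{A}\mathbf{X} + \mathbf{W})'] \qquad (3)$$

$$= E[(\mathbf{A}\mathbf{X} + \mathbf{W})(\mathbf{X}'\mathbf{A}' + \mathbf{W}')] \qquad (4)$$

$$= E[\mathbf{A}\mathbf{X}\mathbf{X}'\mathbf{A}'] + E[\mathbf{W}\mathbf{X}'\mathbf{A}'] + E[\mathbf{A}\mathbf{X}\mathbf{W}'] + E[\mathbf{W}\mathbf{W}'] \qquad (5)$$

Since $\mathbf{X}$ and $\mathbf{W}$ are independent, $E[\mathbf{W}\mathbf{X}'] = \mathbf{0}$ and $E[\mathbf{X}\mathbf{W}'] = \mathbf{0}$. This implies

$$\mathbf{R}_Y = \mathbf{A}E[\mathbf{X}\mathbf{X}']\mathbf{A}' + E[\mathbf{W}\mathbf{W}'] \qquad (6)$$

$$= \mathbf{A}\mathbf{R}_X\mathbf{A}' + \mathbf{R}_W \qquad (7)$$

$$= \begin{bmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \end{bmatrix} \begin{bmatrix} 1 & 3/4 & 1/2 \\ 3/4 & 1 & 3/4 \\ 1/2 & 3/4 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 1 & 1 \\ 0 & 1 \end{bmatrix} + \begin{bmatrix} 0.1 & 0 \\ 0 & 0.1 \end{bmatrix} = \begin{bmatrix} 3.6 & 3 \\ 3 & 3.6 \end{bmatrix}. \qquad (8)$$

Once again, independence of $\mathbf{W}$ and $X_1$ yields

$$\mathbf{R}_{YX_1} = E[\mathbf{Y}X_1] = E[(\mathbf{A}\mathbf{X} + \mathbf{W})X_1] = \mathbf{A}E[\mathbf{X}X_1]. \qquad (9)$$

This implies

$$\mathbf{R}_{YX_1} = \mathbf{A} \begin{bmatrix} E[X_1^2] \\ E[X_2X_1] \\ E[X_3X_1] \end{bmatrix} = \mathbf{A} \begin{bmatrix} R_X(1,1) \\ R_X(2,1) \\ R_X(3,1) \end{bmatrix} = \begin{bmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \end{bmatrix} \begin{bmatrix} 1 \\ 3/4 \\ 1/2 \end{bmatrix} = \begin{bmatrix} 7/4 \\ 5/4 \end{bmatrix}. \qquad (10)$$

Putting these facts together, we find that

$$\hat{\mathbf{a}} = \mathbf{R}_Y^{-1}\mathbf{R}_{YX_1} = \begin{bmatrix} 10/11 & -25/33 \\ -25/33 & 10/11 \end{bmatrix} \begin{bmatrix} 7/4 \\ 5/4 \end{bmatrix} = \frac{1}{132} \begin{bmatrix} 85 \\ -25 \end{bmatrix}. \qquad (11)$$

Thus the linear MMSE estimator of $X_1$ given $\mathbf{Y}$ is

$$\hat{X}_1(\mathbf{Y}) = \hat{\mathbf{a}}'\mathbf{Y} = \frac{85}{132} Y_1 - \frac{25}{132} Y_2 = 0.6439\,Y_1 - 0.1894\,Y_2. \qquad (12)$$

(b) By Theorem 9.7(c), the mean squared error of the optimal estimator is

$$e_L^* = \mathrm{Var}[X_1] - \hat{\mathbf{a}}'\mathbf{R}_{YX_1} \qquad (13)$$

$$= R_X(1,1) - \mathbf{R}_{YX_1}'\mathbf{R}_Y^{-1}\mathbf{R}_{YX_1} \qquad (14)$$

$$= 1 - \begin{bmatrix} 7/4 & 5/4 \end{bmatrix} \begin{bmatrix} 10/11 & -25/33 \\ -25/33 & 10/11 \end{bmatrix} \begin{bmatrix} 7/4 \\ 5/4 \end{bmatrix} = \frac{29}{264} = 0.1098.$$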
(15)

In Problem 9.4.1, we solved essentially the same problem but the observations $\mathbf{Y}$ were not subjected to the additive noise $\mathbf{W}$. In comparing the estimators, we see that the additive noise perturbs the estimator somewhat but not dramatically because the correlation structure of $\mathbf{X}$ and the mapping $\mathbf{A}$ from $\mathbf{X}$ to $\mathbf{Y}$ remain unchanged. On the other hand, in the noiseless case, the resulting mean square error was about half as much, $3/52 = 0.0577$ versus $0.1098$.

(c) We can estimate random variable $X_1$ based on the observation of random variable $Y_1$ using Theorem 9.4. Note that Theorem 9.4 is a special case of Theorem 9.8, which covers observations that are random vectors. In any case, from Theorem 9.4, the optimum linear estimate is $\hat{X}_1(Y_1) = a^*Y_1 + b^*$ where

$$a^* = \frac{\mathrm{Cov}[X_1, Y_1]}{\mathrm{Var}[Y_1]}, \quad b^* = \mu_{X_1} - a^*\mu_{Y_1}. \qquad (16)$$

Since $E[X_i] = \mu_{X_i} = 0$ and $Y_1 = X_1 + X_2 + W_1$, we see that

$$\mu_{Y_1} = E[Y_1] = E[X_1] + E[X_2] + E[W_1] = 0. \qquad (17)$$

These facts, along with independence of $X_1$ and $W_1$, imply

$$\mathrm{Cov}[X_1, Y_1] = E[X_1Y_1] = E[X_1(X_1 + X_2 + W_1)] \qquad (18)$$

$$= R_X(1,1) + R_X(1,2) = 7/4 \qquad (19)$$

In addition, using $\mathbf{R}_Y$ from part (a), we see that

$$\mathrm{Var}[Y_1] = E[Y_1^2] = R_Y(1,1) = 3.6. \qquad (20)$$

Thus

$$a^* = \frac{\mathrm{Cov}[X_1, Y_1]}{\mathrm{Var}[Y_1]} = \frac{7/4}{3.6} = \frac{35}{72}, \quad b^* = \mu_{X_1} - a^*\mu_{Y_1} = 0. \qquad (21)$$

Thus the optimum linear estimate of $X_1$ given $Y_1$ is

$$\hat{X}_1(Y_1) = \frac{35}{72} Y_1. \qquad (22)$$

From Theorem 9.4(b), the mean square error of this estimator is

$$e_L^* = \sigma_{X_1}^2 (1 - \rho_{X_1,Y_1}^2) \qquad (23)$$

Since $X_1$ and $Y_1$ have zero expected value, $\sigma_{X_1}^2 = R_X(1,1) = 1$ and $\sigma_{Y_1}^2 = R_Y(1,1) = 3.6$. Also, since $\mathrm{Cov}[X_1, Y_1] = 7/4$, we see that

$$\rho_{X_1,Y_1} = \frac{\mathrm{Cov}[X_1, Y_1]}{\sigma_{X_1}\sigma_{Y_1}} = \frac{7/4}{\sqrt{3.6}} = \frac{\sqrt{490}}{24}. \qquad (24)$$

Thus $e_L^* = 1 - (490/24^2) = 0.1493$. As we would expect, the estimate of $X_1$ based on just $Y_1$ has larger mean square error than the estimate based on both $Y_1$ and $Y_2$.
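The calculations in Problems 9.4.1 and 9.4.2 can be checked numerically. The following is a minimal Matlab sketch, assuming only the matrices given in the problem statements; the variable names (RX, A, RW, a_hat, eL, a1, eL1) are illustrative choices and do not come from the text.

```matlab
% Sketch: linear MMSE estimate of X1 from Y = A*X + W (Problem 9.4.2).
% All variable names are illustrative assumptions, not from the text.
format rat                               % display results as fractions
RX = [1 3/4 1/2; 3/4 1 3/4; 1/2 3/4 1];  % correlation matrix of X
A  = [1 1 0; 0 1 1];                     % observation matrix
RW = 0.1*eye(2);                         % noise correlation matrix

RY   = A*RX*A' + RW;                     % correlation matrix of Y
RYX1 = A*RX(:,1);                        % cross-correlation of Y and X1

a_hat = RY\RYX1                          % part (a): equals [85/132; -25/132]
eL    = RX(1,1) - a_hat'*RYX1            % part (b): equals 29/264

% Part (c): estimate X1 from Y1 alone. All means are zero, so
% covariances and correlations coincide.
a1  = RYX1(1)/RY(1,1)                    % equals 35/72
eL1 = RX(1,1) - a1*RYX1(1)               % equals 1 - 490/576, about 0.1493
```

Setting RW = zeros(2) in this sketch should reproduce the noiseless results of Problem 9.4.1, namely coefficients 19/26 and -7/26 and mean square error 3/52.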
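Problem 9.4.3 Solution

From the problem statement, we learn for vectors $\mathbf{X} = [X_1\ X_2\ X_3]'$ and $\mathbf{W} = [W_1\ W_2]'$ that

$$E[\mathbf{X}] = \begin{bmatrix} -0.1 \\ 0 \\ 0.1 \end{bmatrix}, \quad \mathbf{R}_X = \begin{bmatrix} 1 & 3/4 & 1/2 \\ 3/4 & 1 & 3/4 \\ 1/2 & 3/4 & 1 \end{bmatrix}, \quad E[\mathbf{W}] = \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \quad \mathbf{R}_W = \begin{bmatrix} 0.1 & 0 \\ 0 & 0.1 \end{bmatrix} \qquad (1)$$

In addition, from Theorem 5.12,

$$\mathbf{C}_X = \mathbf{R}_X - \boldsymbol{\mu}_X\boldsymbol{\mu}_X' \qquad (2)$$

$$= \begin{bmatrix} 1 & 3/4 & 1/2 \\ 3/4 & 1 & 3/4 \\ 1/2 & 3/4 & 1 \end{bmatrix} - \begin{bmatrix} -0.1 \\ 0 \\ 0.1 \end{bmatrix} \begin{bmatrix} -0.1 & 0 & 0.1 \end{bmatrix} = \begin{bmatrix} 0.99 & 0.75 & 0.51 \\ 0.75 & 1.0 & 0.75 \\ 0.51 & 0.75 & 0.99 \end{bmatrix}, \qquad (3)$$

and

$$\mathbf{Y} = \begin{bmatrix} Y_1 \\ Y_2 \end{bmatrix} = \mathbf{A}\mathbf{X} + \mathbf{W} = \begin{bmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \end{bmatrix} \mathbf{X} + \mathbf{W}. \qquad (4)$$

(a) Since $E[\mathbf{X}]$ is nonzero, we use Theorem 9.8, which states that the minimum mean square error estimate of $X_1$ is $\hat{X}_1(\mathbf{Y}) = \hat{\mathbf{a}}'\mathbf{Y} + \hat{b}$ where $\hat{\mathbf{a}} = \mathbf{C}_Y^{-1}\mathbf{C}_{YX_1}$ and $\hat{b} = E[X_1] - \hat{\mathbf{a}}'E[\mathbf{Y}]$. First we find $\mathbf{C}_Y$. Since $E[\mathbf{Y}] = \mathbf{A}E[\mathbf{X}] + E[\mathbf{W}] = \mathbf{A}E[\mathbf{X}]$,

$$\mathbf{C}_Y = E[(\mathbf{Y} - E[\mathbf{Y}])(\mathbf{Y} - E[\mathbf{Y}])'] \qquad (5)$$

$$= E[(\mathbf{A}(\mathbf{X} - E[\mathbf{X}]) + \mathbf{W})(\mathbf{A}(\mathbf{X} - E[\mathbf{X}]) + \mathbf{W})'] \qquad (6)$$

$$= E[(\mathbf{A}(\mathbf{X} - E[\mathbf{X}]) + \mathbf{W})((\mathbf{X} - E[\mathbf{X}])'\mathbf{A}' + \mathbf{W}')] \qquad (7)$$

$$= \mathbf{A}E[(\mathbf{X} - E[\mathbf{X}])(\mathbf{X} - E[\mathbf{X}])']\mathbf{A}' + E[\mathbf{W}(\mathbf{X} - E[\mathbf{X}])']\mathbf{A}' + \mathbf{A}E[(\mathbf{X} - E[\mathbf{X}])\mathbf{W}'] + E[\mathbf{W}\mathbf{W}'] \qquad (8)$$

Since $\mathbf{X}$ and $\mathbf{W}$ are independent, $E[\mathbf{W}(\mathbf{X} - E[\mathbf{X}])'] = \mathbf{0}$ and $E[(\mathbf{X} - E[\mathbf{X}])\mathbf{W}'] = \mathbf{0}$. This implies

$$\mathbf{C}_Y = \mathbf{A}\mathbf{C}_X\mathbf{A}' + \mathbf{R}_W \qquad (9)$$

$$= \begin{bmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \end{bmatrix} \begin{bmatrix} 0.99 & 0.75 & 0.51 \\ 0.75 & 1.0 & 0.75 \\ 0.51 & 0.75 & 0.99 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 1 & 1 \\ 0 & 1 \end{bmatrix} + \begin{bmatrix} 0.1 & 0 \\ 0 & 0.1 \end{bmatrix} = \begin{bmatrix} 3.59 & 3.01 \\ 3.01 & 3.59 \end{bmatrix}. \qquad (10)$$

Once again, independence of $\mathbf{W}$ and $X_1$ yields $E[\mathbf{W}(X_1 - E[X_1])] = \mathbf{0}$, implying

$$\mathbf{C}_{YX_1} = E[(\mathbf{Y} - E[\mathbf{Y}])(X_1 - E[X_1])] \qquad (11)$$

$$= E[(\mathbf{A}(\mathbf{X} - E[\mathbf{X}]) + \mathbf{W})(X_1 - E[X_1])] \qquad (12)$$

$$= \mathbf{A}E[(\mathbf{X} - E[\mathbf{X}])(X_1 - E[X_1])] \qquad (13)$$

$$= \mathbf{A} \begin{bmatrix} C_X(1,1) \\ C_X(2,1) \\ C_X(3,1) \end{bmatrix} = \begin{bmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \end{bmatrix} \begin{bmatrix} 0.99 \\ 0.75 \\ 0.51 \end{bmatrix} = \begin{bmatrix} 1.74 \\ 1.26 \end{bmatrix}. \qquad (14)$$

Finally,

$$\hat{\mathbf{a}} = \mathbf{C}_Y^{-1}\mathbf{C}_{YX_1} = \begin{bmatrix} 0.9378 & -0.7863 \\ -0.7863 & 0.9378 \end{bmatrix} \begin{bmatrix} 1.74 \\ 1.26 \end{bmatrix} = \begin{bmatrix} 0.6411 \\ -0.1865 \end{bmatrix} \qquad (15)$$

Next, we find that

$$E[\mathbf{Y}] = \mathbf{A}E[\mathbf{X}] = \begin{bmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \end{bmatrix} \begin{bmatrix} -0.1 \\ 0 \\ 0.1 \end{bmatrix} = \begin{bmatrix} -0.1 \\ 0.1 \end{bmatrix}. \qquad (16)$$

This implies

$$\hat{b} = E[X_1] - \hat{\mathbf{a}}'E[\mathbf{Y}] = -0.1 - \begin{bmatrix} 0.6411 & -0.1865 \end{bmatrix} \begin{bmatrix} -0.1 \\ 0.1 \end{bmatrix} = -0.0172 \qquad (17)$$

Thus the linear MMSE estimator of $X_1$ given $\mathbf{Y}$ is

$$\hat{X}_1(\mathbf{Y}) = \hat{\mathbf{a}}'\mathbf{Y} + \hat{b} = 0.6411\,Y_1 - 0.1865\,Y_2 - 0.0172. \qquad (18)$$

We note that although $E[X_1] = -0.1$, the magnitude of the estimator's offset is only $0.0172$. This is because the change in $E[\mathbf{X}]$ is also included in the change in $E[\mathbf{Y}]$.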
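The numbers in part (a) can be reproduced with a few lines of Matlab. This is only a sketch under stated assumptions: it takes $E[\mathbf{X}] = [-0.1\ 0\ 0.1]'$ and the symmetric $\mathbf{R}_X$ implied by the $\mathbf{C}_X$ computed in equation (3), and the variable names (muX, CX, CY, a_hat, b_hat) are illustrative rather than taken from the text.

```matlab
% Sketch for Problem 9.4.3(a): nonzero-mean linear MMSE estimation.
% Variable names and the assumed mean vector are illustrative.
muX = [-0.1; 0; 0.1];                    % assumed E[X]
RX  = [1 3/4 1/2; 3/4 1 3/4; 1/2 3/4 1]; % correlation matrix of X
A   = [1 1 0; 0 1 1];                    % observation matrix
RW  = 0.1*eye(2);                        % noise correlation matrix

CX   = RX - muX*muX';                    % covariance matrix of X
CY   = A*CX*A' + RW;                     % covariance matrix of Y
CYX1 = A*CX(:,1);                        % cross-covariance of Y and X1

a_hat = CY\CYX1                          % approximately [0.6411; -0.1865]
b_hat = muX(1) - a_hat'*(A*muX)          % approximately -0.0172
```

Solving the linear system with the backslash operator avoids forming the inverse of the covariance matrix explicitly, which is a design choice of this sketch and not something the text prescribes.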
(b) By Theorem 9.8(c), the mean square error of the optimal estimator is

$$e_L^* = \mathrm{Var}[X_1] - \hat{\mathbf{a}}'\mathbf{C}_{YX_1} \qquad (19)$$

$$= C_X(1,1) - \mathbf{C}_{YX_1}'\mathbf{C}_Y^{-1}\mathbf{C}_{YX_1} \qquad (20)$$

$$= 0.99 - \begin{bmatrix} 1.7400 & 1.2600 \end{bmatrix} \begin{bmatrix} 0.9378 & -0.7863 \\ -0.7863 & 0.9378 \end{bmatrix} \begin{bmatrix} 1.7400 \\ 1.2600 \end{bmatrix} = 0.1096. \qquad (21)$$

(c) We can estimate random variable $X_1$ based on the observation of random variable $Y_1$ using Theorem 9.4. Note that Theorem 9.4 is a special case of Theorem 9.8, which covers observations that are random vectors. In any case, from Theorem 9.4, the optimum linear estimate is $\hat{X}_1(Y_1) = a^*Y_1 + b^*$ where

$$a^* = \frac{\mathrm{Cov}[X_1, Y_1]}{\mathrm{Var}[Y_1]}, \quad b^* = \mu_{X_1} - a^*\mu_{Y_1}. \qquad (22)$$

First we note that $E[X_1] = \mu_{X_1} = -0.1$. Second, since $Y_1 = X_1 + X_2 + W_1$ and $E[W_1] = 0$, we see that

$$\mu_{Y_1} = E[Y_1] = E[X_1] + E[X_2] + E[W_1] \qquad (23)$$

$$= E[X_1] + E[X_2] \qquad (24)$$

$$= -0.1 \qquad (25)$$

These facts, along with independence of $X_1$ and $W_1$, imply

$$\mathrm{Cov}[X_1, Y_1] = E[(X_1 - E[X_1])(Y_1 - E[Y_1])] \qquad (26)$$

$$= E[(X_1 - E[X_1])((X_1 - E[X_1]) + (X_2 - E[X_2]) + W_1)] \qquad (27)$$

$$= C_X(1,1) + C_X(1,2) = 1.74 \qquad (28)$$

In addition, using $\mathbf{C}_Y$ from part (a), we see that

$$\mathrm{Var}[Y_1] = C_Y(1,1) = 3.59. \qquad (29)$$

Thus

$$a^* = \frac{\mathrm{Cov}[X_1, Y_1]}{\mathrm{Var}[Y_1]} = \frac{1.74}{3.59} = 0.4847, \quad b^* = \mu_{X_1} - a^*\mu_{Y_1} = -0.0515. \qquad (30)$$

Thus the optimum linear estimate of $X_1$ given $Y_1$ is

$$\hat{X}_1(Y_1) = 0.4847\,Y_1 - 0.0515. \qquad (31)$$

From Theorem 9.4(b), the mean square error of this estimator is

$$e_L^* = \sigma_{X_1}^2 (1 - \rho_{X_1,Y_1}^2) \qquad (32)$$

Note that $\sigma_{X_1}^2 = C_X(1,1) = 0.99$ and $\sigma_{Y_1}^2 = C_Y(1,1) = 3.59$. Also, since $\mathrm{Cov}[X_1, Y_1] = 1.74$, we see that

$$\rho_{X_1,Y_1} = \frac{\mathrm{Cov}[X_1, Y_1]}{\sigma_{X_1}\sigma_{Y_1}} = \frac{1.74}{\sqrt{(0.99)(3.59)}} = 0.923. \qquad (33)$$

Thus $e_L^* = 0.99(1 - (0.923)^2) = 0.1466$. As we would expect, the estimate of $X_1$ based on just $Y_1$ has larger mean square error than the estimate based on both $Y_1$ and $Y_2$.

Problem 9.4.4 Solution

From Theorem 9.7(a), we know that the minimum mean square error estimate of $X$ given $\mathbf{Y}$ is $\hat{X}_L(\mathbf{Y}) = \hat{\mathbf{a}}'\mathbf{Y}$, where $\hat{\mathbf{a}} = \mathbf{R}_Y^{-1}\mathbf{R}_{YX}$.
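In this problem, $\mathbf{Y}$ is simply a scalar $Y$ and $\hat{\mathbf{a}}$ is a scalar $\hat{a}$. Since $E[Y] = 0$,

$$\mathbf{R}_Y = E[\mathbf{Y}\mathbf{Y}'] = E[Y^2] = \sigma_Y^2. \qquad (1)$$

Similarly,

$$\mathbf{R}_{YX} = E[\mathbf{Y}X] = E[YX] = \mathrm{Cov}[X, Y]. \qquad (2)$$

It follows that

$$\hat{a} = \mathbf{R}_Y^{-1}\mathbf{R}_{YX} = (\sigma_Y^2)^{-1}\,\mathrm{Cov}[X, Y] = \frac{\sigma_X}{\sigma_Y}\,\frac{\mathrm{Cov}[X, Y]}{\sigma_X\sigma_Y} = \frac{\sigma_X}{\sigma_Y}\,\rho_{X,Y}. \qquad (3)$$

Problem 9.4.5 Solution

In this problem, we view $\mathbf{Y} = [X_1\ X_2]'$ as the observation and $X = X_3$ as the variable we wish to estimate. Since $E[X] = 0$, we can use Theorem 9.7(a) to find the minimum mean square error estimate $\hat{X}_L(\mathbf{Y}) = \hat{\mathbf{a}}'\mathbf{Y}$ where $\hat{\mathbf{a}} = \mathbf{R}_Y^{-1}\mathbf{R}_{YX}$.

(a) In this case,

$$\mathbf{R}_Y = E[\mathbf{Y}\mathbf{Y}'] = \begin{bmatrix} E[X_1^2] & E[X_1X_2] \\ E[X_2X_1] & E[X_2^2] \end{bmatrix} \qquad (1)$$

$$= \begin{bmatrix} r_{11} & r_{12} \\ r_{21} & r_{22} \end{bmatrix} = \begin{bmatrix} 1 & -0.8 \\ -0.8 & 1 \end{bmatrix}. \qquad (2)$$

Similarly,

$$\mathbf{R}_{YX} = E[\mathbf{Y}X] = E\left[ \begin{bmatrix} X_1 \\ X_2 \end{bmatrix} X_3 \right] = \begin{bmatrix} E[X_1X_3] \\ E[X_2X_3] \end{bmatrix} = \begin{bmatrix} r_{13} \\ r_{23} \end{bmatrix} = \begin{bmatrix} 0.64 \\ -0.8 \end{bmatrix}. \qquad (3)$$

Thus

$$\hat{\mathbf{a}} = \mathbf{R}_Y^{-1}\mathbf{R}_{YX} = \begin{bmatrix} 25/9 & 20/9 \\ 20/9 & 25/9 \end{bmatrix} \begin{bmatrix} 0.64 \\ -0.8 \end{bmatrix} = \begin{bmatrix} 0 \\ -0.8 \end{bmatrix}. \qquad (4)$$

The optimum linear estimator of $X_3$ given $X_1$ and $X_2$ is

$$\hat{X}_3 = -0.8\,X_2. \qquad (5)$$

The fact that this estimator depends only on $X_2$ while ignoring $X_1$ is an example of a result to be proven in Problem 9.4.6.

(b) By Theorem 9.7(c), the mean squared error of the optimal estimator is

$$e_L^* = \mathrm{Var}[X_3] - \hat{\mathbf{a}}'\mathbf{R}_{YX_3} = 1 - \begin{bmatrix} 0 & -0.8 \end{bmatrix} \begin{bmatrix} 0.64 \\ -0.8 \end{bmatrix} = 0.36 \qquad (6)$$

(c) In the previous part, we found that the optimal linear estimate of $X_3$ based on the observation of random variables $X_1$ and $X_2$ employed only $X_2$. Hence this same estimate, $\hat{X}_3 = -0.8X_2$, is the optimal linear estimate of $X_3$ just using $X_2$. (This can be derived using Theorem 9.4, if you wish to do more algebra.)

(d) Since the estimator is the same, the mean square error is still $e_L^* = 0.36$.

Problem 9.4.6 Solution

For this problem, let $\mathbf{Y} = [X_1\ X_2\ \cdots\ X_{n-1}]'$ and let $X = X_n$. Since $E[\mathbf{Y}] = \mathbf{0}$ and $E[X] = 0$, Theorem 9.7(a) tells us that the minimum mean square linear estimate of $X$ given $\mathbf{Y}$ is $\hat{X}_n(\mathbf{Y}) = \hat{\mathbf{a}}'\mathbf{Y}$, where $\hat{\mathbf{a}} = \mathbf{R}_Y^{-1}\mathbf{R}_{YX}$. This implies that $\hat{\mathbf{a}}$ is the solution to

$$\mathbf{R}_Y\hat{\mathbf{a}} = \mathbf{R}_{YX}. \qquad (1)$$

Note that

$$\mathbf{R}_Y = E[\mathbf{Y}\mathbf{Y}'] = \begin{bmatrix} 1 & c & \cdots & c^{n-2} \\ c & 1 & \ddots & \vdots \\ \vdots & \ddots & \ddots & c \\ c^{n-2} & \cdots & c & 1 \end{bmatrix}, \quad \mathbf{R}_{YX} = E\left[ \begin{bmatrix} X_1 \\ X_2 \\ \vdots \\ X_{n-1} \end{bmatrix} X_n \right] = \begin{bmatrix} c^{n-1} \\ c^{n-2} \\ \vdots \\ c \end{bmatrix}. \qquad (2)$$

We see that the last column of $c\mathbf{R}_Y$ equals $\mathbf{R}_{YX}$. Equivalently, if $\hat{\mathbf{a}} = [0\ \cdots\ 0\ c]'$, then $\mathbf{R}_Y\hat{\mathbf{a}} = \mathbf{R}_{YX}$. It follows that the optimal linear estimator of $X_n$ given $\mathbf{Y}$ is

$$\hat{X}_n(\mathbf{Y}) = \hat{\mathbf{a}}'\mathbf{Y} = cX_{n-1}, \qquad (3)$$

which completes the proof of the claim.

The mean square error of this estimate is

$$e_L^* = E[(X_n - cX_{n-1})^2] \qquad (4)$$

$$= R_X(n, n) - cR_X(n, n-1) - cR_X(n-1, n) + c^2R_X(n-1, n-1) \qquad (5)$$

$$= 1 - 2c^2 + c^2 = 1 - c^2 \qquad (6)$$

When $c$ is close to 1, $X_{n-1}$ and $X_n$ are highly correlated and the estimation error will be small.

Comment: We will see in Chapter 11 that correlation matrices with this structure arise frequently in the study of wide sense stationary random sequences. In fact, if you read ahead, you will find that the claim we just proved is essentially the same as that made in Theorem 11.10.

Problem 9.4.7 Solution

(a) In this case, we use the observation $\mathbf{Y}$ to estimate each $X_i$. Since $E[X_i] = 0$,

$$E[\mathbf{Y}] = \sum_{j=1}^{k} E[X_j]\sqrt{p_j}\,\mathbf{S}_j + E[\mathbf{N}] = \mathbf{0}. \qquad (1)$$

Thus, Theorem 9.7(a) tells us that the MMSE linear estimate of $X_i$ is $\hat{X}_i(\mathbf{Y}) = \hat{\mathbf{a}}'\mathbf{Y}$ where $\hat{\mathbf{a}} = \mathbf{R}_Y^{-1}\mathbf{R}_{YX_i}$. First we note that

$$\mathbf{R}_{YX_i} = E[\mathbf{Y}X_i] = E\left[ \left( \sum_{j=1}^{k} X_j\sqrt{p_j}\,\mathbf{S}_j + \mathbf{N} \right) X_i \right] \qquad (2)$$

Since $\mathbf{N}$ and $X_i$ are independent, $E[\mathbf{N}X_i] = E[\mathbf{N}]E[X_i] = \mathbf{0}$. Because $X_i$ and $X_j$ are independent for $i \ne j$, $E[X_iX_j] = E[X_i]E[X_j] = 0$ for $i \ne j$. In addition, $E[X_i^2] = 1$, and it follows that

$$\mathbf{R}_{YX_i} = \sum_{j=1}^{k} E[X_jX_i]\sqrt{p_j}\,\mathbf{S}_j + E[\mathbf{N}X_i] = \sqrt{p_i}\,\mathbf{S}_i. \qquad (3)$$

For the same reasons,

$$\mathbf{R}_Y = E[\mathbf{Y}\mathbf{Y}'] = E\left[ \left( \sum_{j=1}^{k} \sqrt{p_j}\,X_j\mathbf{S}_j + \mathbf{N} \right) \left( \sum_{l=1}^{k} \sqrt{p_l}\,X_l\mathbf{S}_l' + \mathbf{N}' \right) \right] \qquad (4)$$

$$= \sum_{j=1}^{k}\sum_{l=1}^{k} \sqrt{p_jp_l}\,E[X_jX_l]\,\mathbf{S}_j\mathbf{S}_l' + \sum_{j=1}^{k} \sqrt{p_j}\,\mathbf{S}_j\underbrace{E[X_j\mathbf{N}']}_{=\mathbf{0}} + \sum_{l=1}^{k} \sqrt{p_l}\,\underbrace{E[X_l\mathbf{N}]}_{=\mathbf{0}}\mathbf{S}_l' + E[\mathbf{N}\mathbf{N}'] \qquad (5)$$

$$= \sum_{j=1}^{k} p_j\mathbf{S}_j\mathbf{S}_j' + \sigma^2\mathbf{I} \qquad (6)$$

Now we use a linear algebra identity. For a matrix $\mathbf{S}$ with columns $\mathbf{S}_1, \mathbf{S}_2, \ldots, \mathbf{S}_k$, and a diagonal matrix $\mathbf{P} = \mathrm{diag}[p_1, p_2, \ldots, p_k]$,

$$\sum_{j=1}^{k} p_j\mathbf{S}_j\mathbf{S}_j' = \mathbf{S}\mathbf{P}\mathbf{S}'. \qquad (7)$$

Although this identity may be unfamiliar, it is handy in manipulating correlation matrices. (Also, if this is unfamiliar, you may wish to work out an example with $k = 2$ vectors of length 2 or 3.) Thus,

$$\mathbf{R}_Y = \mathbf{S}\mathbf{P}\mathbf{S}' + \sigma^2\mathbf{I}, \qquad (8)$$

and

$$\hat{\mathbf{a}} = \mathbf{R}_Y^{-1}\mathbf{R}_{YX_i} = \left( \mathbf{S}\mathbf{P}\mathbf{S}' + \sigma^2\mathbf{I} \right)^{-1} \sqrt{p_i}\,\mathbf{S}_i. \qquad (9)$$

Recall that if $\mathbf{C}$ is symmetric, then $\mathbf{C}^{-1}$ is also symmetric. This implies the MMSE estimate of $X_i$ given $\mathbf{Y}$ is

$$\hat{X}_i(\mathbf{Y}) = \hat{\mathbf{a}}'\mathbf{Y} = \sqrt{p_i}\,\mathbf{S}_i' \left( \mathbf{S}\mathbf{P}\mathbf{S}' + \sigma^2\mathbf{I} \right)^{-1} \mathbf{Y} \qquad (10)$$

(b) We observe that $\mathbf{V} = (\mathbf{S}\mathbf{P}\mathbf{S}' + \sigma^2\mathbf{I})^{-1}\mathbf{Y}$ is a vector that does not depend on which bit $X_i$ we want to estimate. Since $\hat{X}_i = \sqrt{p_i}\,\mathbf{S}_i'\mathbf{V}$, we can form the vector of estimates

$$\hat{\mathbf{X}} = \begin{bmatrix} \hat{X}_1 \\ \vdots \\ \hat{X}_k \end{bmatrix} = \begin{bmatrix} \sqrt{p_1}\,\mathbf{S}_1'\mathbf{V} \\ \vdots \\ \sqrt{p_k}\,\mathbf{S}_k'\mathbf{V} \end{bmatrix} = \begin{bmatrix} \sqrt{p_1} & & \\ & \ddots & \\ & & \sqrt{p_k} \end{bmatrix} \begin{bmatrix} \mathbf{S}_1' \\ \vdots \\ \mathbf{S}_k' \end{bmatrix} \mathbf{V} \qquad (11)$$

$$= \mathbf{P}^{1/2}\mathbf{S}'\mathbf{V} \qquad (12)$$

$$= \mathbf{P}^{1/2}\mathbf{S}' \left( \mathbf{S}\mathbf{P}\mathbf{S}' + \sigma^2\mathbf{I} \right)^{-1} \mathbf{Y} \qquad (13)$$

Problem 9.5.1 Solution

This problem can be solved using the function mse defined in Example 9.10. All we need to do is define the correlation structure of the vector $\mathbf{X} = [X_1\ \cdots\ X_{21}]'$. Just as in Example 9.10, we do this by defining just the first row of the correlation matrix. Here are the commands we need, and the resulting plot.

    r1=sinc(0.1*(0:20)); mse(r1);
    hold on;
    r5=sinc(0.5*(0:20)); mse(r5);
    r9=sinc(0.9*(0:20)); mse(r9);

[Plot: three MSE(n) curves; horizontal axis from 0 to 25, vertical axis from 0 to 1]

Although the plot lacks labels, there are three curves for the mean square error MSE($n$) corresponding to $\phi_0 \in \{0.1, 0.5, 0.9\}$. Keep in mind that MSE($n$) is the MSE of the linear estimate of $X_{21}$ using random variables $X_1, \ldots, X_n$.

If you run the commands, you'll find that $\phi_0 = 0.1$ yields the lowest mean square error while $\phi_0 = 0.9$ results in the highest mean square error. When $\phi_0 = 0.1$, random variables $X_n$ for $n = 10, 11, \ldots, 20$ are increasingly correlated with $X_{21}$. The result is that the MSE starts to decline rapidly for $n > 10$. As $\phi_0$ increases, fewer observations $X_n$ are correlated with $X_{21}$. The result is that the MSE is simply worse as $\phi_0$ increases.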
For example, when $\phi_0 = 0.9$, even $X_{20}$ has only a small correlation with $X_{21}$. We only get a good estimate of $X_{21}$ at time $n = 21$ when we observe $X_{21} + W_{21}$.

Problem 9.5.2 Solution

This problem can be solved using the function mse defined in Example 9.10. All we need to do is define the correlation structure of the vector $\mathbf{X} = [X_1\ \cdots\ X_{21}]'$. Just as in Example 9.10, we do this by defining just the first row r of the correlation matrix. Here are the commands we need, and the resulting plots.

    n=0:20;
    r1=cos(0.1*pi*n);
    mse(r1);

    n=0:20;
    r5=cos(0.5*pi*n);
    mse(r5);

    n=0:20;
    r9=cos(0.9*pi*n);
    mse(r9);

[Plots: one MSE(n) curve per case; horizontal axis from 0 to 20, vertical axis from 0 to 0.1]

All three cases report similar results for the mean square error (MSE). The reason is that in all three cases, $X_1$ and $X_{21}$ are completely correlated; that is, $\rho_{X_1,X_{21}} = 1$. As a result, $X_1 = X_{21}$ so that at time $n = 1$, the observation is

$$Y_1 = X_1 + W_1 = X_{21} + W_1. \qquad (1)$$

The MSE at time $n = 1$ is about 0.1, corresponding roughly to the variance of the additive noise. Subsequent improvements in the estimates are the result of making other measurements of the form $Y_n = X_n + W_n$ where $X_n$ is highly correlated with $X_{21}$. The result is a form of averaging of the additive noise, which effectively reduces its variance. The choice of $\phi_0$ changes the values of $n$ for which $X_n$ and $X_{21}$ are highly correlated. However, in all three cases, there are several such values of $n$ and the result in all cases is an improvement in the MSE due to averaging the additive noise.

Problem 9.5.3 Solution

The solution to this problem is almost the same as the solution to Example 9.10, except perhaps the Matlab code is somewhat simpler. As in the example, let $\mathbf{W}^{(n)}$, $\mathbf{X}^{(n)}$, and $\mathbf{Y}^{(n)}$ denote the vectors consisting of the first $n$ components of $\mathbf{W}$, $\mathbf{X}$, and $\mathbf{Y}$.
Just as in Examples 9.8 and 9.10, independence of $\mathbf{X}^{(n)}$ and $\mathbf{W}^{(n)}$ implies that the correlation matrix of $\mathbf{Y}^{(n)}$ is

$$\mathbf{R}_{Y^{(n)}} = E[(\mathbf{X}^{(n)} + \mathbf{W}^{(n)})(\mathbf{X}^{(n)} + \mathbf{W}^{(n)})'] = \mathbf{R}_{X^{(n)}} + \mathbf{R}_{W^{(n)}} \qquad (1)$$

Note that $\mathbf{R}_{X^{(n)}}$ and $\mathbf{R}_{W^{(n)}}$ are the $n \times n$ upper-left submatrices of $\mathbf{R}_X$ and $\mathbf{R}_W$. In addition,

$$\mathbf{R}_{Y^{(n)}X} = E\left[ \begin{bmatrix} X_1 + W_1 \\ \vdots \\ X_n + W_n \end{bmatrix} X_1 \right] = \begin{bmatrix} r_0 \\ \vdots \\ r_{n-1} \end{bmatrix}. \qquad (2)$$

Compared to the solution of Example 9.10, the only difference in the solution is in the reversal of the vector $\mathbf{R}_{Y^{(n)}X}$. The optimal filter based on the first $n$ observations is $\hat{\mathbf{a}}^{(n)} = \mathbf{R}_{Y^{(n)}}^{-1}\mathbf{R}_{Y^{(n)}X}$, and the mean square error is

$$e_L^* = \mathrm{Var}[X_1] - (\hat{\mathbf{a}}^{(n)})'\mathbf{R}_{Y^{(n)}X}. \qquad (3)$$

    function e=mse953(r)
    N=length(r);
    e=[];
    for n=1:N,
        RYX=r(1:n)';
        RY=toeplitz(r(1:n))+0.1*eye(n);
        a=RY\RYX;
        en=r(1)-(a')*RYX;
        e=[e;en];
    end
    plot(1:N,e);

The program mse953.m simply calculates the mean square error $e_L^*$. The input is the vector r corresponding to the vector $[r_0\ \cdots\ r_{20}]$, which holds the first row of the Toeplitz correlation matrix $\mathbf{R}_X$. Note that $\mathbf{R}_{X^{(n)}}$ is the Toeplitz matrix whose first row is the first $n$ elements of r.

To plot the mean square error as a function of the number of observations, $n$, we generate the vector r and then run mse953(r). For the requested cases (a) and (b), the necessary Matlab commands and corresponding mean square estimation error output as a function of $n$ are shown here:

    ra=sinc(0.1*pi*(0:20));
    mse953(ra)

    rb=cos(0.5*pi*(0:20));
    mse953(rb)

[Plots (a) and (b): MSE versus n; horizontal axis n from 0 to 25, vertical axis MSE from 0 to 0.1]

In comparing the results of cases (a) and (b), we see that the mean square estimation error depends strongly on the correlation structure given by $r_{|i-j|}$. For case (a), $Y_1$ is a noisy observation of $X_1$ and is highly correlated with $X_1$. The MSE at $n = 1$ is just the variance of $W_1$. Additional samples of $Y_n$ mostly help to average the additive noise. Also, samples $X_n$ for $n \ge 10$ have very little correlation with $X_1$.
Thus for $n \ge 10$, the samples of $Y_n$ result in almost no improvement in the estimate of $X_1$.

In case (b), $Y_1 = X_1 + W_1$, just as in case (a), is simply a noisy copy of $X_1$ and the estimation error is due to the variance of $W_1$. On the other hand, for case (b), $X_5$, $X_9$, $X_{13}$, $X_{17}$ and $X_{21}$ are completely correlated with $X_1$. Other samples also have significant correlation with $X_1$. As a result, the MSE continues to go down with increasing $n$.

Problem 9.5.4 Solution

When the transmitted bit vector is $\mathbf{X}$, the received signal vector is $\mathbf{Y} = \mathbf{S}\mathbf{P}^{1/2}\mathbf{X} + \mathbf{N}$. In Problem 9.4.7, we found that the linear MMSE estimate of the vector $\mathbf{X}$ is

$$\hat{\mathbf{X}} = \mathbf{P}^{1/2}\mathbf{S}' \left( \mathbf{S}\mathbf{P}\mathbf{S}' + \sigma^2\mathbf{I} \right)^{-1} \mathbf{Y}. \qquad (1)$$

As indicated in the problem statement, the evaluation of the LMSE detector follows the approach outlined in Problem 8.4.6. Thus the solutions are essentially the same.

    function S=randomsignals(n,k);
    %S is an n by k matrix, columns are
    %random unit length signal vectors
    S=(rand(n,k)>0.5);
    S=((2*S)-1.0)/sqrt(n);

The transmitted data vector $\mathbf{x}$ belongs to the set $B_k$ of all binary $\pm 1$ vectors of length $k$. This short program generates $k$ random signals, each of length $n$. Each random signal is a binary $\pm 1$ sequence normalized to length 1.

The evaluation of the LMSE detector is most similar to evaluation of the matched filter (MF) detector in Problem 8.4.6. We define a function err=lmsecdmasim(S,P,m) that simulates the LMSE detector for m symbols for a given set of signal vectors S. In lmsecdmasim, there is no need for looping. The mth transmitted symbol is represented by the mth column of X and the corresponding received signal is given by the mth column of Y. The LMSE filter can be applied to all m columns at once. A second function Pe=lmsecdma(n,k,snr,s,m) cycles through all combinations of users k and SNR snr and calculates the bit error rate for each pair of values.
Here are the functions:

    function e=lmsecdmasim(S,P,m);
    %err=lmsecdmasim(S,P,m);
    %S= n x k matrix of signals
    %P= diag matrix of SNRs
    % SNR=power/var(noise)
    k=size(S,2); %no. of users
    n=size(S,1); %proc. gain
    P2=sqrt(P);
    X=randombinaryseqs(k,m);
    Y=S*P2*X+randn(n,m);
    L=P2*S'*inv((S*P*S')+eye(n));
    XR=sign(L*Y);
    e=sum(sum(XR ~= X));

    function Pe=lmsecdma(n,k,snr,s,m);
    %Pe=lmsecdma(n,k,snr,s,m);
    %RCDMA, LMSE detector, users=k
    %proc gain=n, rand signals/frame
    % s frames, m symbols/frame
    %See Problem 9.5.4 Solution
    [K,SNR]=ndgrid(k,snr);
    Pe=zeros(size(SNR));
    for j=1:prod(size(SNR)),
        p=SNR(j);kt=K(j); e=0;
        for i=1:s,
            S=randomsignals(n,kt);
            e=e+lmsecdmasim(S,p*eye(kt),m);
        end
        Pe(j)=e/(s*m*kt);
        disp([p kt e]);
    end

Here is a run of lmsecdma.

    >> pelmse = lmsecdma(32,k,4,1000,1000);
    4 2 48542
    4 4 109203
    4 8 278266
    4 16 865358
    4 32 3391488
    >> pelmse'
    ans =
    0.0243 0.0273 0.0348 0.0541 0.1060
    >>

For processing gain $n = 32$, the maximum likelihood detector is too complex for my version of Matlab to run quickly. Instead we can compare the LMSE detector to the matched filter (MF) detector of Problem 8.4.6 and the decorrelator of Problem 8.4.7 with the following script:

    k=[2 4 8 16 32];
    pemf = mfrcdma(32,k,4,1000,1000);
    pedec=berdecorr(32,k,4,10000);
    pelmse = lmsecdma(32,k,4,1000,1000);
    plot(k,pemf,'-d',k,pedec,'-s',k,pelmse);
    legend('MF','DEC','LMSE',2);
    axis([0 32 0 0.5]);

The resulting plot resembles

[Plot: bit error rate versus number of users k for the MF, DEC and LMSE detectors; horizontal axis from 0 to 30, vertical axis from 0 to 0.5]

Compared to the matched filter and the decorrelator, the linear pre-processing of the LMSE detector offers an improvement in the bit error rate. Recall that for each bit $X_i$, the decorrelator zeroes out the interference from all users $j \ne i$ at the expense of enhancing the receiver noise. When the number of users is small, the decorrelator works well because the cost of suppressing other users is small. When the number of users equals the processing gain, the decorrelator works poorly because the noise is greatly enhanced.
By comparison, the LMSE detector applies linear processing that results in an output that minimizes the mean square error between the output and the original transmitted bit. It works about as well as the decorrelator when the number of users is small. For a large number of users, it still works better than the matched filter detector.

Problem Solutions – Chapter 10

Problem 10.2.1 Solution

• In Example 10.3, the daily noontime temperature at Newark Airport is a discrete time, continuous value random process. However, if the temperature is recorded only in units of one degree, then the process would be discrete value.

• In Example 10.4, the number of active telephone calls is discrete time and discrete value.

• The dice rolling experiment of Example 10.5 yields a discrete time, discrete value random process.

• The QPSK system of Example 10.6 is a continuous time and continuous value random process.

Problem 10.2.2 Solution

The sample space of the underlying experiment is $S = \{s_0, s_1, s_2, s_3\}$. The four elements in the sample space are equally likely. The ensemble of sample functions is $\{x(t, s_i) \,|\, i = 0, 1, 2, 3\}$ where

$$x(t, s_i) = \cos(2\pi f_0 t + \pi/4 + i\pi/2) \quad (0 \le t \le T) \qquad (1)$$

For $f_0 = 5/T$, this ensemble is shown below.

[Plots: the four sample functions $x(t, s_0)$, $x(t, s_1)$, $x(t, s_2)$, $x(t, s_3)$ versus $t$ for $0 \le t \le T$; vertical axis from −1 to 1]

Problem 10.2.3 Solution

The eight possible waveforms correspond to the bit sequences

$$\{(0, 0, 0), (1, 0, 0), (1, 1, 0), \ldots, (1, 1, 1)\} \qquad (1)$$

The corresponding eight waveforms are:

[Plots: eight waveforms over $0 \le t \le 3T$; vertical axis from −1 to 1]

Problem 10.2.4 Solution

The statement is false. As a counterexample, consider the rectified cosine waveform $X(t) = R|\cos 2\pi f t|$ of Example 10.9. When $2\pi f t = \pi/2$, that is, $t = 1/(4f)$, we have $\cos 2\pi f t = 0$ so that $X(1/(4f)) = 0$. Hence $X(1/(4f))$ has PDF

$$f_{X(1/(4f))}(x) = \delta(x) \qquad (1)$$

That is, $X(1/(4f))$ is a discrete random variable.

Problem 10.3.1 Solution

In this problem, we start from first principles. What makes this problem fairly straightforward is that the ramp is defined for all time. That is, the ramp doesn't start at time $t = W$.

$$P[X(t) \le x] = P[t - W \le x] = P[W \ge t - x] \qquad (1)$$

Since $W \ge 0$, if $x \ge t$ then $P[W \ge t - x] = 1$. When $x < t$,

$$P[W \ge t - x] = \int_{t-x}^{\infty} f_W(w)\, dw = e^{-(t-x)} \qquad (2)$$

Combining these facts, we have

$$F_{X(t)}(x) = P[W \ge t - x] = \begin{cases} e^{-(t-x)} & x < t \\ 1 & t \le x \end{cases} \qquad (3)$$

We note that the CDF contains no discontinuities. Taking the derivative of the CDF $F_{X(t)}(x)$ with respect to $x$, we obtain the PDF

$$f_{X(t)}(x) = \begin{cases} e^{x-t} & x < t \\ 0 & \text{otherwise} \end{cases} \qquad (4)$$

Problem 10.3.2 Solution

(a) Each oscillator has frequency $W$ in Hertz with uniform PDF

$$f_W(w) = \begin{cases} 0.025 & 9980 \le w \le 10020 \\ 0 & \text{otherwise} \end{cases} \qquad (1)$$

The probability that a test yields a one part in $10^4$ oscillator is

$$p = P[9999 \le W \le 10001] = \int_{9999}^{10001} (0.025)\, dw = 0.05 \qquad (2)$$

(b) To find the PMF of $T_1$, we view each oscillator test as an independent trial. A success occurs on a trial with probability $p$ if we find a one part in $10^4$ oscillator. The first one part in $10^4$ oscillator is found at time $T_1 = t$ if we observe failures on trials $1, \ldots, t - 1$ followed by a success on trial $t$. Hence, just as in Example 2.11, $T_1$ has the geometric PMF

$$P_{T_1}(t) = \begin{cases} (1 - p)^{t-1} p & t = 1, 2, \ldots \\ 0 & \text{otherwise} \end{cases} \qquad (3)$$

A geometric random variable with success probability $p$ has mean $1/p$. This is derived in Theorem 2.5. The expected time to find the first good oscillator is $E[T_1] = 1/p = 20$ minutes.

(c) Since $p = 0.05$, the probability the first one part in $10^4$ oscillator is found in exactly 20 minutes is $P_{T_1}(20) = (0.95)^{19}(0.05) = 0.0189$.

(d) The time $T_5$ required to find the 5th one part in $10^4$ oscillator is the number of trials needed for 5 successes. $T_5$ is a Pascal random variable. If this is not clear, see Example 2.15 where the Pascal PMF is derived.
When we are looking for 5 successes, the Pascal PMF is

$$P_{T_5}(t) = \begin{cases} \dbinom{t-1}{4} p^5 (1 - p)^{t-5} & t = 5, 6, \ldots \\ 0 & \text{otherwise} \end{cases} \qquad (4)$$

Looking up the Pascal PMF in Appendix A, we find that $E[T_5] = 5/p = 100$ minutes. The following argument is a second derivation of the mean of $T_5$. Once we find the first one part in $10^4$ oscillator, the number of additional trials needed to find the next one part in $10^4$ oscillator once again has a geometric PMF with mean $1/p$ since each independent trial is a success with probability $p$. Similarly, the time required to find 5 one part in $10^4$ oscillators is the sum of five independent geometric random variables. That is,

$$T_5 = K_1 + K_2 + K_3 + K_4 + K_5 \qquad (5)$$

where each $K_i$ is identically distributed to $T_1$. Since the expectation of the sum equals the sum of the expectations,

$$E[T_5] = E[K_1 + K_2 + K_3 + K_4 + K_5] = 5E[K_i] = 5/p = 100 \text{ minutes} \qquad (6)$$

Problem 10.3.3 Solution

Once we find the first one part in $10^4$ oscillator, the number of additional tests needed to find the next one part in $10^4$ oscillator once again has a geometric PMF with mean $1/p$ since each independent trial is a success with probability $p$. That is, $T_2 = T_1 + T'$ where $T'$ is independent of and identically distributed to $T_1$. Thus,

$$E[T_2|T_1 = 3] = E[T_1|T_1 = 3] + E[T'|T_1 = 3] \qquad (1)$$

$$= 3 + E[T'] = 23 \text{ minutes.} \qquad (2)$$

Problem 10.3.4 Solution

Since the problem states that the pulse is delayed, we will assume $T \ge 0$. This problem is difficult because the answer will depend on $t$. In particular, for $t < 0$, $X(t) = 0$ and $f_{X(t)}(x) = \delta(x)$. Things are more complicated when $t > 0$. For $x < 0$, $P[X(t) > x] = 1$. For $x \ge 1$, $P[X(t) > x] = 0$. Lastly, for $0 \le x < 1$,

$$P[X(t) > x] = P[e^{-(t-T)}u(t - T) > x] \qquad (1)$$

$$= P[t + \ln x < T \le t] \qquad (2)$$

$$= F_T(t) - F_T(t + \ln x) \qquad (3)$$

Note that the condition $T \le t$ is needed to make sure that the pulse doesn't arrive after time $t$. The other condition $T > t + \ln x$ ensures that the pulse didn't arrive too early and already decay too much.
We can express these facts in terms of the CDF of $X(t)$.

$$F_{X(t)}(x) = 1 - P[X(t) > x] = \begin{cases} 0 & x < 0 \\ 1 + F_T(t + \ln x) - F_T(t) & 0 \le x < 1 \\ 1 & x \ge 1 \end{cases} \tag{4}$$

We can take the derivative of the CDF to find the PDF. However, we need to keep in mind that the CDF has a jump discontinuity at $x = 0$. In particular, since $\ln 0 = -\infty$,

$$F_{X(t)}(0) = 1 + F_T(-\infty) - F_T(t) = 1 - F_T(t) \tag{5}$$

Hence, when we take a derivative, we will see an impulse at $x = 0$. The PDF of $X(t)$ is

$$f_{X(t)}(x) = \begin{cases} (1 - F_T(t))\,\delta(x) + f_T(t + \ln x)/x & 0 \le x < 1 \\ 0 & \text{otherwise} \end{cases} \tag{6}$$

Problem 10.4.1 Solution

Each $Y_k$ is the sum of two independent, identically distributed Gaussian random variables. Hence, each $Y_k$ must have the same PDF. That is, the $Y_k$ are identically distributed. Next, we observe that the sequence of $Y_k$ is independent. To see this, we observe that each $Y_k$ is composed of two samples of $X_k$ that are unused by any other $Y_j$ for $j \ne k$.

Problem 10.4.2 Solution

Each $W_n$ is the sum of two independent, identically distributed Gaussian random variables. Hence, each $W_n$ must have the same PDF. That is, the $W_n$ are identically distributed. However, since $W_{n-1}$ and $W_n$ both use $X_{n-1}$ in their averaging, $W_{n-1}$ and $W_n$ are dependent. We can verify this observation by calculating the covariance of $W_{n-1}$ and $W_n$. First, we observe that for all $n$,

$$E[W_n] = (E[X_n] + E[X_{n-1}])/2 = 30 \tag{1}$$

Next, we observe that $W_{n-1}$ and $W_n$ have covariance

$$\operatorname{Cov}[W_{n-1}, W_n] = E[W_{n-1} W_n] - E[W_n]E[W_{n-1}] \tag{2}$$
$$= \frac{1}{4} E[(X_{n-1} + X_{n-2})(X_n + X_{n-1})] - 900 \tag{3}$$

We observe that for $n \ne m$, $E[X_n X_m] = E[X_n]E[X_m] = 900$ while

$$E[X_n^2] = \operatorname{Var}[X_n] + (E[X_n])^2 = 916 \tag{4}$$

Thus,

$$\operatorname{Cov}[W_{n-1}, W_n] = \frac{900 + 916 + 900 + 900}{4} - 900 = 4 \tag{5}$$

Since $\operatorname{Cov}[W_{n-1}, W_n] \ne 0$, $W_n$ and $W_{n-1}$ must be dependent.

Problem 10.4.3 Solution

The number $Y_k$ of failures between successes $k-1$ and $k$ is exactly $y \ge 0$ iff after success $k-1$, there are $y$ failures followed by a success. Since the Bernoulli trials are independent, the probability of this event is $(1-p)^y p$.
The complete PMF of $Y_k$ is

$$P_{Y_k}(y) = \begin{cases} (1-p)^y p & y = 0, 1, \ldots \\ 0 & \text{otherwise} \end{cases} \tag{1}$$

Since this argument is valid for all $k$ including $k = 1$, we can conclude that $Y_1, Y_2, \ldots$ are identically distributed. Moreover, since the trials are independent, the number of failures between successes $k-1$ and $k$ and the number of failures between successes $k'-1$ and $k'$ are independent for $k' \ne k$. Hence, $Y_1, Y_2, \ldots$ is an iid sequence.

Problem 10.5.1 Solution

This is a very straightforward problem. The Poisson process has rate $\lambda = 4$ calls per second. When $t$ is measured in seconds, each $N(t)$ is a Poisson random variable with mean $4t$ and thus has PMF

$$P_{N(t)}(n) = \begin{cases} \dfrac{(4t)^n}{n!} e^{-4t} & n = 0, 1, 2, \ldots \\ 0 & \text{otherwise} \end{cases} \tag{1}$$

Using the general expression for the PMF, we can write down the answer for each part.

(a) $P_{N(1)}(0) = 4^0 e^{-4}/0! = e^{-4} \approx 0.0183$.

(b) $P_{N(1)}(4) = 4^4 e^{-4}/4! = 32e^{-4}/3 \approx 0.1954$.

(c) $P_{N(2)}(2) = 8^2 e^{-8}/2! = 32e^{-8} \approx 0.0107$.

Problem 10.5.2 Solution

Following the instructions given, we express each answer in terms of $N(m)$ which has PMF

$$P_{N(m)}(n) = \begin{cases} (6m)^n e^{-6m}/n! & n = 0, 1, 2, \ldots \\ 0 & \text{otherwise} \end{cases} \tag{1}$$

(a) The probability of no queries in a one minute interval is $P_{N(1)}(0) = 6^0 e^{-6}/0! = 0.00248$.

(b) The probability of exactly 6 queries arriving in a one minute interval is $P_{N(1)}(6) = 6^6 e^{-6}/6! = 0.161$.

(c) The probability of exactly three queries arriving in a one-half minute interval is $P_{N(0.5)}(3) = 3^3 e^{-3}/3! = 0.224$.

Problem 10.5.3 Solution

Since there is always a backlog and the service times are iid exponential random variables, the times between service completions are a sequence of iid exponential random variables. That is, the service completions are a Poisson process. Since the expected service time is 30 minutes, the rate of the Poisson process is $\lambda = 1/30$ per minute. Since $t$ hours equals $60t$ minutes, the expected number serviced is $\lambda(60t)$ or $2t$. Moreover, the number serviced in the first $t$ hours has the Poisson PMF

$$P_{N(t)}(n) = \begin{cases} \dfrac{(2t)^n e^{-2t}}{n!} & n = 0, 1, 2, \ldots \\ 0 & \text{otherwise} \end{cases} \tag{1}$$

Problem 10.5.4 Solution

Since $D(t)$ is a Poisson process with rate 0.1 drops/day, the random variable $D(t)$ is a Poisson random variable with parameter $\alpha = 0.1t$. The PMF of $D(t)$, the number of drops after $t$ days, is

$$P_{D(t)}(d) = \begin{cases} (0.1t)^d e^{-0.1t}/d! & d = 0, 1, 2, \ldots \\ 0 & \text{otherwise} \end{cases} \tag{1}$$

Problem 10.5.5 Solution

Note that it matters whether $t \ge 2$ minutes. If $t \le 2$, then any customers that have arrived must still be in service. Since a Poisson number of arrivals occur during $(0, t]$,

$$P_{N(t)}(n) = \begin{cases} (\lambda t)^n e^{-\lambda t}/n! & n = 0, 1, 2, \ldots \\ 0 & \text{otherwise} \end{cases} \qquad (0 \le t \le 2) \tag{1}$$

For $t \ge 2$, the customers in service are precisely those customers that arrived in the interval $(t-2, t]$. The number of such customers has a Poisson PMF with mean $\lambda[t - (t-2)] = 2\lambda$. The resulting PMF of $N(t)$ is

$$P_{N(t)}(n) = \begin{cases} (2\lambda)^n e^{-2\lambda}/n! & n = 0, 1, 2, \ldots \\ 0 & \text{otherwise} \end{cases} \qquad (t \ge 2) \tag{2}$$

Problem 10.5.6 Solution

The times $T$ between queries are independent exponential random variables with PDF

$$f_T(t) = \begin{cases} (1/8)e^{-t/8} & t \ge 0 \\ 0 & \text{otherwise} \end{cases} \tag{1}$$

From the PDF, we can calculate for $t > 0$,

$$P[T \ge t] = \int_t^{\infty} f_T(t')\,dt' = e^{-t/8} \tag{2}$$

Using this formula, each question can be easily answered.

(a) $P[T \ge 4] = e^{-4/8} \approx 0.607$.

(b)

$$P[T \ge 13 \mid T \ge 5] = \frac{P[T \ge 13, T \ge 5]}{P[T \ge 5]} \tag{3}$$
$$= \frac{P[T \ge 13]}{P[T \ge 5]} = \frac{e^{-13/8}}{e^{-5/8}} = e^{-1} \approx 0.368 \tag{4}$$

(c) Although the times between queries are independent exponential random variables, $N(t)$ is not exactly a Poisson random process because the first query occurs at time $t = 0$. Recall that in a Poisson process, the first arrival occurs some time after $t = 0$. However, $N(t) - 1$ is a Poisson process of rate $1/8$. Hence, for $n = 0, 1, 2, \ldots$,

$$P[N(t) - 1 = n] = (t/8)^n e^{-t/8}/n! \tag{5}$$

Thus, for $n = 1, 2, \ldots$, the PMF of $N(t)$ is

$$P_{N(t)}(n) = P[N(t) - 1 = n - 1] = (t/8)^{n-1} e^{-t/8}/(n-1)! \tag{6}$$

The complete expression of the PMF of $N(t)$ is

$$P_{N(t)}(n) = \begin{cases} (t/8)^{n-1} e^{-t/8}/(n-1)! & n = 1, 2, \ldots \\ 0 & \text{otherwise} \end{cases} \tag{7}$$

Problem 10.5.7 Solution

This proof is just a simplified version of the proof given for Theorem 10.3. The first arrival occurs at time $X_1 > x \ge 0$ iff there are no arrivals in the interval $(0, x]$. Hence, for $x \ge 0$,

$$P[X_1 > x] = P[N(x) = 0] = (\lambda x)^0 e^{-\lambda x}/0! = e^{-\lambda x} \tag{1}$$

Since $P[X_1 \le x] = 0$ for $x < 0$, the CDF of $X_1$ is the exponential CDF

$$F_{X_1}(x) = \begin{cases} 0 & x < 0 \\ 1 - e^{-\lambda x} & x \ge 0 \end{cases} \tag{2}$$

Problem 10.5.8 Solution

(a) For $X_i = -\ln U_i$, we can write

$$P[X_i > x] = P[-\ln U_i > x] = P[\ln U_i < -x] = P\left[U_i < e^{-x}\right] \tag{1}$$

When $x < 0$, $e^{-x} > 1$ so that $P[U_i < e^{-x}] = 1$. When $x \ge 0$, we have $0 < e^{-x} \le 1$, implying $P[U_i < e^{-x}] = e^{-x}$. Combining these facts, we have

$$P[X_i > x] = \begin{cases} 1 & x < 0 \\ e^{-x} & x \ge 0 \end{cases} \tag{2}$$

This permits us to show that the CDF of $X_i$ is

$$F_{X_i}(x) = 1 - P[X_i > x] = \begin{cases} 0 & x < 0 \\ 1 - e^{-x} & x \ge 0 \end{cases} \tag{3}$$

We see that $X_i$ has an exponential CDF with mean 1.

(b) Note that $N = n$ iff

$$\prod_{i=1}^{n} U_i \ge e^{-t} > \prod_{i=1}^{n+1} U_i \tag{4}$$

By taking the logarithm of both inequalities, we see that $N = n$ iff

$$\sum_{i=1}^{n} \ln U_i \ge -t > \sum_{i=1}^{n+1} \ln U_i \tag{5}$$

Next, we multiply through by $-1$ and recall that $X_i = -\ln U_i$ is an exponential random variable. This yields $N = n$ iff

$$\sum_{i=1}^{n} X_i \le t < \sum_{i=1}^{n+1} X_i \tag{6}$$

Now we recall that a Poisson process $N(t)$ of rate 1 has independent exponential interarrival times $X_1, X_2, \ldots$. That is, the $i$th arrival occurs at time $\sum_{j=1}^{i} X_j$. Moreover, $N(t) = n$ iff the first $n$ arrivals occur by time $t$ but arrival $n+1$ occurs after time $t$. Since the random variable $N(t)$ has a Poisson distribution with mean $t$, we can write

$$P\left[\sum_{i=1}^{n} X_i \le t < \sum_{i=1}^{n+1} X_i\right] = P[N(t) = n] = \frac{t^n e^{-t}}{n!}. \tag{7}$$

Problem 10.6.1 Solution

Customers entering (or not entering) the casino is a Bernoulli decomposition of the Poisson process of arrivals at the casino doors. By Theorem 10.6, customers entering the casino are a Poisson process of rate $100/2 = 50$ customers/hour.
Thus in the two hours from 5 to 7 PM, the number, $N$, of customers entering the casino is a Poisson random variable with expected value $\alpha = 2 \cdot 50 = 100$. The PMF of $N$ is

$$P_N(n) = \begin{cases} 100^n e^{-100}/n! & n = 0, 1, 2, \ldots \\ 0 & \text{otherwise} \end{cases} \tag{1}$$

Problem 10.6.2 Solution

In an interval $(t, t+\Delta]$ with an infinitesimal $\Delta$, let $A_i$ denote the event of an arrival of the process $N_i(t)$. Also, let $A = A_1 \cup A_2$ denote the event of an arrival of either process. Since $N_i(t)$ is a Poisson process, the alternative model says that $P[A_i] = \lambda_i \Delta$. Also, since $N_1(t) + N_2(t)$ is a Poisson process, the proposed Poisson process model says

$$P[A] = (\lambda_1 + \lambda_2)\Delta \tag{1}$$

Lastly, the conditional probability of a type 1 arrival given an arrival of either type is

$$P[A_1 \mid A] = \frac{P[A_1 A]}{P[A]} = \frac{P[A_1]}{P[A]} = \frac{\lambda_1 \Delta}{(\lambda_1 + \lambda_2)\Delta} = \frac{\lambda_1}{\lambda_1 + \lambda_2} \tag{2}$$

This solution is something of a cheat in that we have used the fact that the sum of Poisson processes is a Poisson process without using the proposed model to derive this fact.

Problem 10.6.3 Solution

We start with the case when $t \ge 2$. When each service time is equally likely to be either 1 minute or 2 minutes, we have the following situation. Let $M_1$ denote the number of customers that arrived in the interval $(t-1, t]$. All $M_1$ of these customers will be in the bank at time $t$ and $M_1$ is a Poisson random variable with mean $\lambda$.

Let $M_2'$ denote the number of customers that arrived during $(t-2, t-1]$. Of course, $M_2'$ is Poisson with expected value $\lambda$. We can view each of the $M_2'$ customers as flipping a coin to determine whether to choose a 1 minute or a 2 minute service time. Only those customers that choose a 2 minute service time will be in service at time $t$. Let $M_2$ denote the number of those customers choosing a 2 minute service time. It should be clear that $M_2$ is the sum of a Poisson number of Bernoulli random variables. Theorem 10.6 verifies that using Bernoulli trials to decide whether the arrivals of a rate $\lambda$ Poisson process should be counted yields a Poisson process of rate $p\lambda$.
A consequence of this result is that the sum of a Poisson (mean $\lambda$) number of Bernoulli (success probability $p$) random variables has a Poisson PMF with mean $p\lambda$. In this case, $M_2$ is Poisson with mean $\lambda/2$. Moreover, the number of customers in service at time $t$ is $N(t) = M_1 + M_2$. Since $M_1$ and $M_2$ are independent Poisson random variables, their sum $N(t)$ also has a Poisson PMF. This was verified in Theorem 6.9. Hence $N(t)$ is Poisson with mean $E[N(t)] = E[M_1] + E[M_2] = 3\lambda/2$. The PMF of $N(t)$ is

$$P_{N(t)}(n) = \begin{cases} (3\lambda/2)^n e^{-3\lambda/2}/n! & n = 0, 1, 2, \ldots \\ 0 & \text{otherwise} \end{cases} \qquad (t \ge 2) \tag{1}$$

Now we can consider the special cases arising when $t < 2$. When $0 \le t < 1$, every arrival is still in service. Thus the number in service $N(t)$ equals the number of arrivals and has the PMF

$$P_{N(t)}(n) = \begin{cases} (\lambda t)^n e^{-\lambda t}/n! & n = 0, 1, 2, \ldots \\ 0 & \text{otherwise} \end{cases} \qquad (0 \le t \le 1) \tag{2}$$

When $1 \le t < 2$, let $M_1$ denote the number of customers arriving in the interval $(t-1, t]$. All $M_1$ customers arriving in that interval will be in service at time $t$. The $M_2'$ customers arriving in the interval $(0, t-1]$ must each flip a coin to decide on a 1 minute or a 2 minute service time. Only those customers choosing the 2 minute service time will be in service at time $t$. Since $M_2'$ has a Poisson PMF with mean $\lambda(t-1)$, the number $M_2$ of those customers in the system at time $t$ has a Poisson PMF with mean $\lambda(t-1)/2$. Finally, the number of customers in service at time $t$ has a Poisson PMF with expected value $E[N(t)] = E[M_1] + E[M_2] = \lambda + \lambda(t-1)/2 = \lambda(t+1)/2$. Hence, the PMF of $N(t)$ becomes

$$P_{N(t)}(n) = \begin{cases} (\lambda(t+1)/2)^n e^{-\lambda(t+1)/2}/n! & n = 0, 1, 2, \ldots \\ 0 & \text{otherwise} \end{cases} \qquad (1 \le t \le 2) \tag{3}$$

Problem 10.6.4 Solution

Since the arrival times $S_1, \ldots, S_n$ are ordered in time and since a Poisson process cannot have two simultaneous arrivals, the conditional PDF $f_{S_1,\ldots,S_n|N}(s_1, \ldots, s_n|n)$ is nonzero only if $s_1 < s_2 < \cdots < s_n < T$.
In this case, consider an arbitrarily small $\Delta$; in particular, $\Delta < \min_i (s_{i+1} - s_i)/2$ implies that the intervals $(s_i, s_i + \Delta]$ are non-overlapping. We now find the joint probability

$$P[s_1 < S_1 \le s_1 + \Delta, \ldots, s_n < S_n \le s_n + \Delta, N = n]$$

that each $S_i$ is in the interval $(s_i, s_i + \Delta]$ and that $N = n$. This joint event implies that there were zero arrivals in each interval $(s_i + \Delta, s_{i+1}]$. That is, over the interval $[0, T]$, the Poisson process has exactly one arrival in each interval $(s_i, s_i + \Delta]$ and zero arrivals in the remaining time $[0, T] \setminus \bigcup_{i=1}^{n} (s_i, s_i + \Delta]$. The collection of intervals in which there was no arrival has a total duration of $T - n\Delta$. Note that the probability of exactly one arrival in the interval $(s_i, s_i + \Delta]$ is $\lambda\Delta e^{-\lambda\Delta}$ and the probability of zero arrivals in a period of duration $T - n\Delta$ is $e^{-\lambda(T - n\Delta)}$. In addition, the event of one arrival in each interval $(s_i, s_i + \Delta]$ and the event of zero arrivals in the period of length $T - n\Delta$ are independent events because they consider non-overlapping periods of the Poisson process. Thus,

$$P[s_1 < S_1 \le s_1 + \Delta, \ldots, s_n < S_n \le s_n + \Delta, N = n] = \left(\lambda\Delta e^{-\lambda\Delta}\right)^n e^{-\lambda(T - n\Delta)} \tag{1}$$
$$= (\lambda\Delta)^n e^{-\lambda T} \tag{2}$$

Since $P[N = n] = (\lambda T)^n e^{-\lambda T}/n!$, we see that

$$P[s_1 < S_1 \le s_1 + \Delta, \ldots, s_n < S_n \le s_n + \Delta \mid N = n] = \frac{P[s_1 < S_1 \le s_1 + \Delta, \ldots, s_n < S_n \le s_n + \Delta, N = n]}{P[N = n]} \tag{3}$$
$$= \frac{(\lambda\Delta)^n e^{-\lambda T}}{(\lambda T)^n e^{-\lambda T}/n!} \tag{4}$$
$$= \frac{n!}{T^n}\Delta^n \tag{5}$$

Finally, for infinitesimal $\Delta$, the conditional PDF of $S_1, \ldots, S_n$ given $N = n$ satisfies

$$f_{S_1,\ldots,S_n|N}(s_1, \ldots, s_n|n)\,\Delta^n = P[s_1 < S_1 \le s_1 + \Delta, \ldots, s_n < S_n \le s_n + \Delta \mid N = n] \tag{6}$$
$$= \frac{n!}{T^n}\Delta^n \tag{7}$$

Since the conditional PDF is zero unless $s_1 < s_2 < \cdots < s_n \le T$, it follows that

$$f_{S_1,\ldots,S_n|N}(s_1, \ldots, s_n|n) = \begin{cases} n!/T^n & 0 \le s_1 < \cdots < s_n \le T, \\ 0 & \text{otherwise.} \end{cases} \tag{8}$$

If it seems that the above argument had some "hand-waving," we now do the derivation of $P[s_1 < S_1 \le s_1 + \Delta, \ldots, s_n < S_n \le s_n + \Delta \mid N = n]$ in somewhat excruciating detail.
(Feel free to skip the following if you were satisfied with the earlier explanation.)

For the interval $(s, t]$, we use the shorthand notation $0_{(s,t)}$ and $1_{(s,t)}$ to denote the events of 0 arrivals and 1 arrival respectively. This notation permits us to write

$$P[s_1 < S_1 \le s_1 + \Delta, \ldots, s_n < S_n \le s_n + \Delta, N = n] = P\left[0_{(0,s_1)} 1_{(s_1,s_1+\Delta)} 0_{(s_1+\Delta,s_2)} 1_{(s_2,s_2+\Delta)} 0_{(s_2+\Delta,s_3)} \cdots 1_{(s_n,s_n+\Delta)} 0_{(s_n+\Delta,T)}\right] \tag{9}$$

The events $0_{(0,s_1)}$, $0_{(s_n+\Delta,T)}$, $0_{(s_i+\Delta,s_{i+1})}$ for $i = 1, \ldots, n-1$, and $1_{(s_i,s_i+\Delta)}$ for $i = 1, \ldots, n$ are independent because each event depends on the Poisson process in a time interval that overlaps none of the other time intervals. In addition, since the Poisson process has rate $\lambda$, $P[0_{(s,t)}] = e^{-\lambda(t-s)}$ and $P[1_{(s_i,s_i+\Delta)}] = (\lambda\Delta)e^{-\lambda\Delta}$. Thus,

$$P[s_1 < S_1 \le s_1 + \Delta, \ldots, s_n < S_n \le s_n + \Delta, N = n] = P\left[0_{(0,s_1)}\right] P\left[1_{(s_1,s_1+\Delta)}\right] P\left[0_{(s_1+\Delta,s_2)}\right] \cdots P\left[1_{(s_n,s_n+\Delta)}\right] P\left[0_{(s_n+\Delta,T)}\right] \tag{10}$$
$$= e^{-\lambda s_1}\left(\lambda\Delta e^{-\lambda\Delta}\right) e^{-\lambda(s_2 - s_1 - \Delta)} \cdots \left(\lambda\Delta e^{-\lambda\Delta}\right) e^{-\lambda(T - s_n - \Delta)} \tag{11}$$
$$= (\lambda\Delta)^n e^{-\lambda T} \tag{12}$$

Problem 10.7.1 Solution

From the problem statement, the change in the stock price is $X(8) - X(0)$ and the standard deviation of $X(8) - X(0)$ is 1/2 point. In other words, the variance of $X(8) - X(0)$ is $\operatorname{Var}[X(8) - X(0)] = 1/4$. By the definition of Brownian motion, $\operatorname{Var}[X(8) - X(0)] = 8\alpha$. Hence $\alpha = 1/32$.

Problem 10.7.2 Solution

We need to verify that $Y(t) = X(ct)$ satisfies the conditions given in Definition 10.10. First we observe that $Y(0) = X(c \cdot 0) = X(0) = 0$. Second, since $X(t)$ is a Brownian motion process, $Y(t) - Y(s) = X(ct) - X(cs)$ is a Gaussian random variable; for $c > 0$ and $t > s$ it has mean zero and variance $\alpha c(t - s)$. Further, $X(ct) - X(cs)$ is independent of $X(t')$ for all $t' \le cs$. Equivalently, we can say that $X(ct) - X(cs)$ is independent of $X(c\tau)$ for all $\tau \le s$. In other words, $Y(t) - Y(s)$ is independent of $Y(\tau)$ for all $\tau \le s$. Thus $Y(t)$ is a Brownian motion process.
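As a rough numerical check on Problems 10.7.1 and 10.7.2, the Python sketch below computes $\alpha$ from the stated standard deviation and then estimates the variance of $Y(t) - Y(s) = X(ct) - X(cs)$ by simulation. It assumes the convention of Definition 10.10 used above, namely that $X(t) - X(s)$ is Gaussian with mean zero and variance $\alpha(t - s)$. The helper names, the values of `c`, `s`, `t`, the seed, and the sample size are illustrative choices, not part of the original solutions.

```python
import math
import random


def brownian_increment(alpha, s, t, rng):
    """Draw one sample of X(t) - X(s), assumed Gaussian(0, alpha * (t - s))."""
    return rng.gauss(0.0, math.sqrt(alpha * (t - s)))


def sample_variance(values):
    """Unbiased sample variance of a list of floats."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)


# Problem 10.7.1: Var[X(8) - X(0)] = (1/2)^2 = 8 * alpha
alpha = (0.5 ** 2) / 8

# Problem 10.7.2: Y(t) = X(ct), so Y(t) - Y(s) = X(ct) - X(cs)
rng = random.Random(0)          # fixed seed so the run is repeatable
c, s, t = 3.0, 1.0, 2.5         # hypothetical values for illustration
y_increments = [brownian_increment(alpha, c * s, c * t, rng)
                for _ in range(200_000)]

estimated = sample_variance(y_increments)
expected = alpha * c * (t - s)  # variance predicted for Y(t) - Y(s)

print(f"alpha = {alpha}")
print(f"estimated Var[Y(t)-Y(s)] = {estimated:.5f}, predicted = {expected:.5f}")
```

If the two printed variances agree to within sampling error, that is consistent with $Y(t)$ behaving like a Brownian motion whose parameter is $c\alpha$ rather than $\alpha$.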
Problem 10.7.3 Solution

First we observe that $Y_n = X_n - X_{n-1} = X(n) - X(n-1)$ is a Gaussian random variable with mean zero and variance $\alpha$. Since this fact is true for all $n$, we can conclude that $Y_1, Y_2, \ldots$ are identically distributed. By Definition 10.10 for Brownian motion, $Y_n = X(n) - X(n-1)$ is independent of $X(m)$ for any $m \le n-1$. Hence $Y_n$ is independent of $Y_m = X(m) - X(m-1)$ for any $m \le n-1$. Equivalently, $Y_1, Y_2, \ldots$ is a sequence of independent random variables.

Problem 10.7.4 Solution

Recall that the vector $\mathbf{X}$ of increments has independent components $X_n = W_n - W_{n-1}$. Alternatively, each $W_n$ can be written as the sum

$$W_1 = X_1 \tag{1}$$
$$W_2 = X_1 + X_2 \tag{2}$$
$$\vdots$$
$$W_k = X_1 + X_2 + \cdots + X_k. \tag{3}$$

In terms of matrices, $\mathbf{W} = \mathbf{A}\mathbf{X}$ where $\mathbf{A}$ is the lower triangular matrix

$$\mathbf{A} = \begin{bmatrix} 1 & & & \\ 1 & 1 & & \\ \vdots & & \ddots & \\ 1 & \cdots & \cdots & 1 \end{bmatrix}. \tag{4}$$

Since $E[\mathbf{W}] = \mathbf{A}E[\mathbf{X}] = \mathbf{0}$, it follows from Theorem 5.16 that

$$f_{\mathbf{W}}(\mathbf{w}) = \frac{1}{|\det(\mathbf{A})|} f_{\mathbf{X}}\left(\mathbf{A}^{-1}\mathbf{w}\right). \tag{5}$$

Since $\mathbf{A}$ is a lower triangular matrix, $\det(\mathbf{A}) = 1$, the product of its diagonal entries. In addition, reflecting the fact that each $X_n = W_n - W_{n-1}$,

$$\mathbf{A}^{-1} = \begin{bmatrix} 1 & & & & \\ -1 & 1 & & & \\ 0 & -1 & 1 & & \\ \vdots & \ddots & \ddots & \ddots & \\ 0 & \cdots & 0 & -1 & 1 \end{bmatrix} \quad \text{and} \quad \mathbf{A}^{-1}\mathbf{W} = \begin{bmatrix} W_1 \\ W_2 - W_1 \\ W_3 - W_2 \\ \vdots \\ W_k - W_{k-1} \end{bmatrix}. \tag{6}$$

Combining these facts with the observation that $f_{\mathbf{X}}(\mathbf{x}) = \prod_{n=1}^{k} f_{X_n}(x_n)$, we can write

$$f_{\mathbf{W}}(\mathbf{w}) = f_{\mathbf{X}}\left(\mathbf{A}^{-1}\mathbf{w}\right) = \prod_{n=1}^{k} f_{X_n}(w_n - w_{n-1}), \tag{7}$$

with the convention $w_0 = 0$, which completes the missing steps in the proof of Theorem 10.8.

Problem 10.8.1 Solution

The discrete time autocovariance function is

$$C_X[m, k] = E[(X_m - \mu_X)(X_{m+k} - \mu_X)] \tag{1}$$

For $k = 0$, $C_X[m, 0] = \operatorname{Var}[X_m] = \sigma_X^2$. For $k \ne 0$, $X_m$ and $X_{m+k}$ are independent so that

$$C_X[m, k] = E[(X_m - \mu_X)]\,E[(X_{m+k} - \mu_X)] = 0 \tag{2}$$

Thus the autocovariance of $X_n$ is

$$C_X[m, k] = \begin{cases} \sigma_X^2 & k = 0 \\ 0 & k \ne 0 \end{cases} \tag{3}$$

Problem 10.8.2 Solution

Recall that $X(t) = t - W$ where $E[W] = 1$ and $E[W^2] = 2$.

(a) The mean is $\mu_X(t) = E[t - W] = t - E[W] = t - 1$.
(b) The autocovariance is

$$C_X(t, \tau) = E[X(t)X(t+\tau)] - \mu_X(t)\mu_X(t+\tau) \tag{1}$$
$$= E[(t - W)(t + \tau - W)] - (t-1)(t+\tau-1) \tag{2}$$
$$= t(t+\tau) - E[(2t+\tau)W] + E[W^2] - t(t+\tau) + 2t + \tau - 1 \tag{3}$$
$$= -(2t+\tau)E[W] + 2 + 2t + \tau - 1 \tag{4}$$
$$= 1 \tag{5}$$

Problem 10.8.3 Solution

In this problem, the daily temperature process results from

$$C_n = 16\left[1 - \cos\frac{2\pi n}{365}\right] + 4X_n \tag{1}$$

where $X_n$ is an iid random sequence of $N[0, 1]$ random variables. The hardest part of this problem is distinguishing between the process $C_n$ and the covariance function $C_C[k]$.

(a) The expected value of the process is

$$E[C_n] = 16E\left[1 - \cos\frac{2\pi n}{365}\right] + 4E[X_n] = 16\left[1 - \cos\frac{2\pi n}{365}\right] \tag{2}$$

(b) The autocovariance of $C_n$ is

$$C_C[m, k] = E\left[\left(C_m - 16\left[1 - \cos\frac{2\pi m}{365}\right]\right)\left(C_{m+k} - 16\left[1 - \cos\frac{2\pi(m+k)}{365}\right]\right)\right] \tag{3}$$
$$= 16E[X_m X_{m+k}] = \begin{cases} 16 & k = 0 \\ 0 & \text{otherwise} \end{cases} \tag{4}$$

(c) A model of this type may be able to capture the mean and variance of the daily temperature. However, one reason this model is overly simple is that it treats day to day temperature fluctuations as uncorrelated. A more realistic model might incorporate the effects of "heat waves" or "cold spells" through correlated daily temperatures.

Problem 10.8.4 Solution

By repeated application of the recursion $C_n = C_{n-1}/2 + 4X_n$, we obtain

$$C_n = \frac{C_{n-2}}{4} + 4\left[\frac{X_{n-1}}{2} + X_n\right] \tag{1}$$
$$= \frac{C_{n-3}}{8} + 4\left[\frac{X_{n-2}}{4} + \frac{X_{n-1}}{2} + X_n\right] \tag{2}$$
$$\vdots \tag{3}$$
$$= \frac{C_0}{2^n} + 4\left[\frac{X_1}{2^{n-1}} + \frac{X_2}{2^{n-2}} + \cdots + X_n\right] = \frac{C_0}{2^n} + 4\sum_{i=1}^{n} \frac{X_i}{2^{n-i}} \tag{4}$$

(a) Since $C_0, X_1, X_2, \ldots$ all have zero mean,

$$E[C_n] = \frac{E[C_0]}{2^n} + 4\sum_{i=1}^{n} \frac{E[X_i]}{2^{n-i}} = 0 \tag{5}$$

(b) The autocovariance is

$$C_C[m, k] = E\left[\left(\frac{C_0}{2^m} + 4\sum_{i=1}^{m} \frac{X_i}{2^{m-i}}\right)\left(\frac{C_0}{2^{m+k}} + 4\sum_{j=1}^{m+k} \frac{X_j}{2^{m+k-j}}\right)\right] \tag{6}$$

Since $C_0, X_1, X_2, \ldots$ are independent (and zero mean), $E[C_0 X_i] = 0$. This implies

$$C_C[m, k] = \frac{E[C_0^2]}{2^{2m+k}} + 16\sum_{i=1}^{m}\sum_{j=1}^{m+k} \frac{E[X_i X_j]}{2^{m-i}\,2^{m+k-j}} \tag{7}$$

For $i \ne j$, $E[X_i X_j] = 0$ so that only the $i = j$ terms make any contribution to the double sum. However, at this point, we must consider the cases $k \ge 0$ and $k < 0$ separately.
Since each $X_i$ has variance 1 (and taking $E[C_0^2] = 1$), the autocovariance for $k \ge 0$ is

$$C_C[m, k] = \frac{1}{2^{2m+k}} + 16\sum_{i=1}^{m} \frac{1}{2^{2m+k-2i}} \tag{8}$$
$$= \frac{1}{2^{2m+k}} + \frac{16}{2^k}\sum_{i=1}^{m} (1/4)^{m-i} \tag{9}$$
$$= \frac{1}{2^{2m+k}} + \frac{16}{2^k}\,\frac{1 - (1/4)^m}{3/4} \tag{10}$$

For $k < 0$, we can write

$$C_C[m, k] = \frac{E[C_0^2]}{2^{2m+k}} + 16\sum_{i=1}^{m}\sum_{j=1}^{m+k} \frac{E[X_i X_j]}{2^{m-i}\,2^{m+k-j}} \tag{11}$$
$$= \frac{1}{2^{2m+k}} + 16\sum_{i=1}^{m+k} \frac{1}{2^{2m+k-2i}} \tag{12}$$
$$= \frac{1}{2^{2m+k}} + \frac{16}{2^{-k}}\sum_{i=1}^{m+k} (1/4)^{m+k-i} \tag{13}$$
$$= \frac{1}{2^{2m+k}} + \frac{16}{2^{-k}}\,\frac{1 - (1/4)^{m+k}}{3/4} \tag{14}$$

A general expression that's valid for all $m$ and $k$ is

$$C_C[m, k] = \frac{1}{2^{2m+k}} + \frac{16}{2^{|k|}}\,\frac{1 - (1/4)^{\min(m, m+k)}}{3/4} \tag{15}$$

(c) Since $E[C_i] = 0$ for all $i$, our model has a mean daily temperature of zero degrees Celsius for the entire year. This is not a reasonable model for a year.

(d) For the month of January, a mean temperature of zero degrees Celsius seems quite reasonable. We can calculate the variance of $C_n$ by evaluating the covariance at $m = n$ and $k = 0$. This yields

$$\operatorname{Var}[C_n] = \frac{1}{4^n} + \frac{16}{4^n}\,\frac{4(4^n - 1)}{3} \tag{16}$$

Note that the variance is upper bounded by

$$\operatorname{Var}[C_n] \le 64/3 \tag{17}$$

Hence the daily temperature has a standard deviation of at most $8/\sqrt{3} \approx 4.6$ degrees. Without actual evidence of daily temperatures in January, this model is more difficult to discredit.

Problem 10.8.5 Solution

This derivation of the Poisson process covariance is almost identical to the derivation of the Brownian motion autocovariance since both rely on the use of independent increments. From the definition of the Poisson process, we know that $\mu_N(t) = \lambda t$. When $\tau \ge 0$, we can write

$$C_N(t, \tau) = E[N(t)N(t+\tau)] - (\lambda t)[\lambda(t+\tau)] \tag{1}$$
$$= E[N(t)[(N(t+\tau) - N(t)) + N(t)]] - \lambda^2 t(t+\tau) \tag{2}$$
$$= E[N(t)[N(t+\tau) - N(t)]] + E[N^2(t)] - \lambda^2 t(t+\tau) \tag{3}$$

By the definition of the Poisson process, $N(t+\tau) - N(t)$ is the number of arrivals in the interval $(t, t+\tau]$ and is independent of $N(t)$ for $\tau > 0$. This implies

$$E[N(t)[N(t+\tau) - N(t)]] = E[N(t)]\,E[N(t+\tau) - N(t)] = \lambda t[\lambda(t+\tau) - \lambda t] \tag{4}$$

Note that since $N(t)$ is a Poisson random variable, $\operatorname{Var}[N(t)] = \lambda t$.
Hence

$$E[N^2(t)] = \operatorname{Var}[N(t)] + (E[N(t)])^2 = \lambda t + (\lambda t)^2 \tag{5}$$

Therefore, for $\tau \ge 0$,

$$C_N(t, \tau) = \lambda t[\lambda(t+\tau) - \lambda t] + \lambda t + (\lambda t)^2 - \lambda^2 t(t+\tau) = \lambda t \tag{6}$$

If $\tau < 0$, then we can interchange the labels $t$ and $t+\tau$ in the above steps to show $C_N(t, \tau) = \lambda(t+\tau)$. For arbitrary $t$ and $\tau$, we can combine these facts to write

$$C_N(t, \tau) = \lambda\min(t, t+\tau) \tag{7}$$

Problem 10.9.1 Solution

For an arbitrary set of samples $Y(t_1), \ldots, Y(t_k)$, we observe that $Y(t_j) = X(t_j + a)$. This implies

$$f_{Y(t_1),\ldots,Y(t_k)}(y_1, \ldots, y_k) = f_{X(t_1+a),\ldots,X(t_k+a)}(y_1, \ldots, y_k) \tag{1}$$

Thus,

$$f_{Y(t_1+\tau),\ldots,Y(t_k+\tau)}(y_1, \ldots, y_k) = f_{X(t_1+\tau+a),\ldots,X(t_k+\tau+a)}(y_1, \ldots, y_k) \tag{2}$$

Since $X(t)$ is a stationary process,

$$f_{X(t_1+\tau+a),\ldots,X(t_k+\tau+a)}(y_1, \ldots, y_k) = f_{X(t_1+a),\ldots,X(t_k+a)}(y_1, \ldots, y_k) \tag{3}$$

This implies

$$f_{Y(t_1+\tau),\ldots,Y(t_k+\tau)}(y_1, \ldots, y_k) = f_{X(t_1+a),\ldots,X(t_k+a)}(y_1, \ldots, y_k) \tag{4}$$
$$= f_{Y(t_1),\ldots,Y(t_k)}(y_1, \ldots, y_k) \tag{5}$$

We can conclude that $Y(t)$ is a stationary process.

Problem 10.9.2 Solution

For an arbitrary set of samples $Y(t_1), \ldots, Y(t_k)$, we observe that $Y(t_j) = X(at_j)$. This implies

$$f_{Y(t_1),\ldots,Y(t_k)}(y_1, \ldots, y_k) = f_{X(at_1),\ldots,X(at_k)}(y_1, \ldots, y_k) \tag{1}$$

Thus,

$$f_{Y(t_1+\tau),\ldots,Y(t_k+\tau)}(y_1, \ldots, y_k) = f_{X(at_1+a\tau),\ldots,X(at_k+a\tau)}(y_1, \ldots, y_k) \tag{2}$$

We see that a time offset of $\tau$ for the $Y(t)$ process corresponds to an offset of time $\tau' = a\tau$ for the $X(t)$ process. Since $X(t)$ is a stationary process,

$$f_{Y(t_1+\tau),\ldots,Y(t_k+\tau)}(y_1, \ldots, y_k) = f_{X(at_1+\tau'),\ldots,X(at_k+\tau')}(y_1, \ldots, y_k) \tag{3}$$
$$= f_{X(at_1),\ldots,X(at_k)}(y_1, \ldots, y_k) \tag{4}$$
$$= f_{Y(t_1),\ldots,Y(t_k)}(y_1, \ldots, y_k) \tag{5}$$

We can conclude that $Y(t)$ is a stationary process.

Problem 10.9.3 Solution

For a set of time samples $n_1, \ldots, n_m$ and an offset $k$, we note that $Y_{n_i+k} = X((n_i+k)\Delta)$. This implies

$$f_{Y_{n_1+k},\ldots,Y_{n_m+k}}(y_1, \ldots, y_m) = f_{X((n_1+k)\Delta),\ldots,X((n_m+k)\Delta)}(y_1, \ldots, y_m) \tag{1}$$

Since $X(t)$ is a stationary process,

$$f_{X((n_1+k)\Delta),\ldots,X((n_m+k)\Delta)}(y_1, \ldots, y_m) = f_{X(n_1\Delta),\ldots,X(n_m\Delta)}(y_1, \ldots, y_m) \tag{2}$$

Since $X(n_i\Delta) = Y_{n_i}$, we see that

$$f_{Y_{n_1+k},\ldots,Y_{n_m+k}}(y_1, \ldots, y_m) = f_{Y_{n_1},\ldots,Y_{n_m}}(y_1, \ldots, y_m) \tag{3}$$

Hence $Y_n$ is a stationary random sequence.

Problem 10.9.4 Solution

Since $Y_n = X_{kn}$,

$$f_{Y_{n_1+l},\ldots,Y_{n_m+l}}(y_1, \ldots, y_m) = f_{X_{kn_1+kl},\ldots,X_{kn_m+kl}}(y_1, \ldots, y_m) \tag{1}$$

Stationarity of the $X_n$ process implies

$$f_{X_{kn_1+kl},\ldots,X_{kn_m+kl}}(y_1, \ldots, y_m) = f_{X_{kn_1},\ldots,X_{kn_m}}(y_1, \ldots, y_m) \tag{2}$$
$$= f_{Y_{n_1},\ldots,Y_{n_m}}(y_1, \ldots, y_m). \tag{3}$$

We combine these steps to write

$$f_{Y_{n_1+l},\ldots,Y_{n_m+l}}(y_1, \ldots, y_m) = f_{Y_{n_1},\ldots,Y_{n_m}}(y_1, \ldots, y_m). \tag{4}$$

Thus $Y_n$ is a stationary process.

Comment: The first printing of the text asks whether $Y_n$ is wide sense stationary if $X_n$ is wide sense stationary. This fact is also true; however, since wide sense stationarity isn't addressed until the next section, the problem was corrected to ask about stationarity.

Problem 10.9.5 Solution

Given $A = a$, $Y(t) = aX(t)$ which is a special case of $Y(t) = aX(t) + b$ given in Theorem 10.10. Applying the result of Theorem 10.10 with $b = 0$ yields

$$f_{Y(t_1),\ldots,Y(t_n)|A}(y_1, \ldots, y_n|a) = \frac{1}{a^n} f_{X(t_1),\ldots,X(t_n)}\left(\frac{y_1}{a}, \ldots, \frac{y_n}{a}\right) \tag{1}$$

Integrating over the PDF $f_A(a)$ yields

$$f_{Y(t_1),\ldots,Y(t_n)}(y_1, \ldots, y_n) = \int_0^{\infty} f_{Y(t_1),\ldots,Y(t_n)|A}(y_1, \ldots, y_n|a)\,f_A(a)\,da \tag{2}$$
$$= \int_0^{\infty} \frac{1}{a^n} f_{X(t_1),\ldots,X(t_n)}\left(\frac{y_1}{a}, \ldots, \frac{y_n}{a}\right) f_A(a)\,da \tag{3}$$

This complicated expression can be used to find the joint PDF of $Y(t_1+\tau), \ldots, Y(t_n+\tau)$:

$$f_{Y(t_1+\tau),\ldots,Y(t_n+\tau)}(y_1, \ldots, y_n) = \int_0^{\infty} \frac{1}{a^n} f_{X(t_1+\tau),\ldots,X(t_n+\tau)}\left(\frac{y_1}{a}, \ldots, \frac{y_n}{a}\right) f_A(a)\,da \tag{4}$$

Since $X(t)$ is a stationary process, the joint PDF of $X(t_1+\tau), \ldots, X(t_n+\tau)$ is the same as the joint PDF of $X(t_1), \ldots, X(t_n)$.
Thus

$$f_{Y(t_1+\tau),\ldots,Y(t_n+\tau)}(y_1, \ldots, y_n) = \int_0^{\infty} \frac{1}{a^n} f_{X(t_1+\tau),\ldots,X(t_n+\tau)}\left(\frac{y_1}{a}, \ldots, \frac{y_n}{a}\right) f_A(a)\,da \tag{5}$$
$$= \int_0^{\infty} \frac{1}{a^n} f_{X(t_1),\ldots,X(t_n)}\left(\frac{y_1}{a}, \ldots, \frac{y_n}{a}\right) f_A(a)\,da \tag{6}$$
$$= f_{Y(t_1),\ldots,Y(t_n)}(y_1, \ldots, y_n) \tag{7}$$

We can conclude that $Y(t)$ is a stationary process.

Problem 10.9.6 Solution

Since $g(\cdot)$ is an unspecified function, we will work with the joint CDF of $Y(t_1+\tau), \ldots, Y(t_n+\tau)$. To show $Y(t)$ is a stationary process, we will show that for all $\tau$,

$$F_{Y(t_1+\tau),\ldots,Y(t_n+\tau)}(y_1, \ldots, y_n) = F_{Y(t_1),\ldots,Y(t_n)}(y_1, \ldots, y_n) \tag{1}$$

By taking partial derivatives with respect to $y_1, \ldots, y_n$, it should be apparent that this implies that the joint PDF $f_{Y(t_1+\tau),\ldots,Y(t_n+\tau)}(y_1, \ldots, y_n)$ will not depend on $\tau$. To proceed, we write

$$F_{Y(t_1+\tau),\ldots,Y(t_n+\tau)}(y_1, \ldots, y_n) = P[Y(t_1+\tau) \le y_1, \ldots, Y(t_n+\tau) \le y_n] \tag{2}$$
$$= P\big[\underbrace{g(X(t_1+\tau)) \le y_1, \ldots, g(X(t_n+\tau)) \le y_n}_{A_\tau}\big] \tag{3}$$

In principle, we can calculate $P[A_\tau]$ by integrating $f_{X(t_1+\tau),\ldots,X(t_n+\tau)}(x_1, \ldots, x_n)$ over the region corresponding to event $A_\tau$. Since $X(t)$ is a stationary process,

$$f_{X(t_1+\tau),\ldots,X(t_n+\tau)}(x_1, \ldots, x_n) = f_{X(t_1),\ldots,X(t_n)}(x_1, \ldots, x_n) \tag{4}$$

This implies $P[A_\tau]$ does not depend on $\tau$. In particular,

$$F_{Y(t_1+\tau),\ldots,Y(t_n+\tau)}(y_1, \ldots, y_n) = P[A_\tau] \tag{5}$$
$$= P[g(X(t_1)) \le y_1, \ldots, g(X(t_n)) \le y_n] \tag{6}$$
$$= F_{Y(t_1),\ldots,Y(t_n)}(y_1, \ldots, y_n) \tag{7}$$

Problem 10.10.1 Solution

The autocorrelation function $R_X(\tau) = \delta(\tau)$ is mathematically valid in the sense that it meets the conditions required in Theorem 10.12. That is,

$$R_X(\tau) = \delta(\tau) \ge 0 \tag{1}$$
$$R_X(\tau) = \delta(\tau) = \delta(-\tau) = R_X(-\tau) \tag{2}$$
$$R_X(\tau) \le R_X(0) = \delta(0) \tag{3}$$

However, for a process $X(t)$ with the autocorrelation $R_X(\tau) = \delta(\tau)$, Definition 10.16 says that the average power of the process is

$$E[X^2(t)] = R_X(0) = \delta(0) = \infty \tag{4}$$

Processes with infinite average power cannot exist in practice.
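One way to see why $R_X(0) = \delta(0)$ signals unbounded power is to replace the impulse by a narrow finite pulse and watch the value at $\tau = 0$ grow as the pulse narrows. The Python sketch below does this with a unit-area triangle of half-width `eps`. The triangular shape is an assumption made purely for illustration (it is even and has its maximum at $\tau = 0$); it is not taken from the problem or the text, and the function names are hypothetical.

```python
def triangular_autocorrelation(tau, eps):
    """Illustrative stand-in for delta(tau): a unit-area triangle on [-eps, eps]."""
    return max(0.0, 1.0 - abs(tau) / eps) / eps


def area(eps, steps=20_000):
    """Midpoint-rule integral of the triangle over [-eps, eps]."""
    h = 2.0 * eps / steps
    return sum(triangular_autocorrelation(-eps + (i + 0.5) * h, eps)
               for i in range(steps)) * h


widths = (1.0, 0.1, 0.01, 0.001)

# Average power of the approximating process is R(0) = 1 / eps.
powers = {eps: triangular_autocorrelation(0.0, eps) for eps in widths}
# The area stays at 1 for every width, as it does for the impulse itself.
areas = {eps: area(eps) for eps in widths}

for eps in widths:
    print(f"eps = {eps:<6} area = {areas[eps]:.6f}  R(0) = {powers[eps]:.1f}")
```

The area under each pulse stays fixed while the value at $\tau = 0$, which plays the role of the average power, increases without bound as `eps` shrinks. This mirrors the conclusion above that the limiting process has infinite average power.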
Problem 10.10.2 Solution

Since $Y(t) = A + X(t)$, the mean of $Y(t)$ is

$$E[Y(t)] = E[A] + E[X(t)] = E[A] + \mu_X \tag{1}$$

The autocorrelation of $Y(t)$ is

$$R_Y(t, \tau) = E[(A + X(t))(A + X(t+\tau))] \tag{2}$$
$$= E[A^2] + E[A]\,E[X(t)] + E[A]\,E[X(t+\tau)] + E[X(t)X(t+\tau)] \tag{3}$$
$$= E[A^2] + 2E[A]\mu_X + R_X(\tau) \tag{4}$$

where step (3) uses $E[A X(t)] = E[A]E[X(t)]$, which holds when $A$ is independent of $X(t)$. We see that neither $E[Y(t)]$ nor $R_Y(t, \tau)$ depends on $t$. Thus $Y(t)$ is a wide sense stationary process.

Problem 10.10.3 Solution

In this problem, we find the autocorrelation $R_W(t, \tau)$ when

$$W(t) = X\cos 2\pi f_0 t + Y\sin 2\pi f_0 t, \tag{1}$$

and $X$ and $Y$ are uncorrelated random variables with $E[X] = E[Y] = 0$. We start by writing

$$R_W(t, \tau) = E[W(t)W(t+\tau)] \tag{2}$$
$$= E[(X\cos 2\pi f_0 t + Y\sin 2\pi f_0 t)(X\cos 2\pi f_0(t+\tau) + Y\sin 2\pi f_0(t+\tau))]. \tag{3}$$

Since $X$ and $Y$ are uncorrelated, $E[XY] = E[X]E[Y] = 0$. Thus, when we expand $E[W(t)W(t+\tau)]$ and take the expectation, all of the $XY$ cross terms will be zero. This implies

$$R_W(t, \tau) = E[X^2]\cos 2\pi f_0 t\,\cos 2\pi f_0(t+\tau) + E[Y^2]\sin 2\pi f_0 t\,\sin 2\pi f_0(t+\tau) \tag{4}$$

Since $E[X] = E[Y] = 0$,

$$E[X^2] = \operatorname{Var}[X] + (E[X])^2 = \sigma^2, \qquad E[Y^2] = \operatorname{Var}[Y] + (E[Y])^2 = \sigma^2. \tag{5}$$

In addition, from Math Fact B.2, we use the formulas

$$\cos A\cos B = \frac{1}{2}\left[\cos(A - B) + \cos(A + B)\right] \tag{6}$$
$$\sin A\sin B = \frac{1}{2}\left[\cos(A - B) - \cos(A + B)\right] \tag{7}$$

to write

$$R_W(t, \tau) = \frac{\sigma^2}{2}\left(\cos 2\pi f_0\tau + \cos 2\pi f_0(2t+\tau)\right) + \frac{\sigma^2}{2}\left(\cos 2\pi f_0\tau - \cos 2\pi f_0(2t+\tau)\right) \tag{8}$$
$$= \sigma^2\cos 2\pi f_0\tau \tag{9}$$

Thus $R_W(t, \tau) = R_W(\tau)$. Since

$$E[W(t)] = E[X]\cos 2\pi f_0 t + E[Y]\sin 2\pi f_0 t = 0, \tag{10}$$

we can conclude that $W(t)$ is a wide sense stationary process. However, we note that if $E[X^2] \ne E[Y^2]$, then the $\cos 2\pi f_0(2t+\tau)$ terms in $R_W(t, \tau)$ would not cancel and $W(t)$ would not be wide sense stationary.

Problem 10.10.4 Solution

(a) In the problem statement, we are told that $X(t)$ has average power equal to 1. By Definition 10.16, the average power of $X(t)$ is $E[X^2(t)] = 1$.
(b) Since $\Theta$ has a uniform PDF over $[0, 2\pi]$,

$$f_\Theta(\theta) = \begin{cases} 1/(2\pi) & 0 \le \theta \le 2\pi \\ 0 & \text{otherwise} \end{cases} \tag{1}$$

The expected value of the random phase cosine is

$$E[\cos(2\pi f_c t + \Theta)] = \int_{-\infty}^{\infty} \cos(2\pi f_c t + \theta) f_\Theta(\theta)\,d\theta \tag{2}$$
$$= \int_0^{2\pi} \cos(2\pi f_c t + \theta)\frac{1}{2\pi}\,d\theta \tag{3}$$
$$= \frac{1}{2\pi}\sin(2\pi f_c t + \theta)\Big|_0^{2\pi} \tag{4}$$
$$= \frac{1}{2\pi}\left(\sin(2\pi f_c t + 2\pi) - \sin(2\pi f_c t)\right) = 0 \tag{5}$$

(c) Since $X(t)$ and $\Theta$ are independent,

$$E[Y(t)] = E[X(t)\cos(2\pi f_c t + \Theta)] = E[X(t)]\,E[\cos(2\pi f_c t + \Theta)] = 0 \tag{6}$$

Note that the mean of $Y(t)$ is zero no matter what the mean of $X(t)$ since the random phase cosine has zero mean.

(d) Independence of $X(t)$ and $\Theta$ results in the average power of $Y(t)$ being

$$E[Y^2(t)] = E[X^2(t)\cos^2(2\pi f_c t + \Theta)] \tag{7}$$
$$= E[X^2(t)]\,E[\cos^2(2\pi f_c t + \Theta)] \tag{8}$$
$$= E[\cos^2(2\pi f_c t + \Theta)] \tag{9}$$

Note that we have used the fact from part (a) that $X(t)$ has unity average power. To finish the problem, we use the trigonometric identity $\cos^2\phi = (1 + \cos 2\phi)/2$. This yields

$$E[Y^2(t)] = E\left[\frac{1}{2}\left(1 + \cos(2\pi(2f_c)t + 2\Theta)\right)\right] = 1/2 \tag{10}$$

Note that $E[\cos(2\pi(2f_c)t + 2\Theta)] = 0$ by the argument given in part (b) with $2f_c$ replacing $f_c$ and $2\Theta$ replacing $\Theta$: the integrand simply completes two full periods as $\theta$ ranges over $[0, 2\pi]$.

Problem 10.10.5 Solution

This proof simply parallels the proof of Theorem 10.12. For the first item, $R_X[0] = R_X[m, 0] = E[X_m^2]$. Since $X_m^2 \ge 0$, we must have $E[X_m^2] \ge 0$. For the second item, Definition 10.13 implies that

$$R_X[k] = R_X[m, k] = E[X_m X_{m+k}] = E[X_{m+k} X_m] = R_X[m+k, -k] \tag{1}$$

Since $X_m$ is wide sense stationary, $R_X[m+k, -k] = R_X[-k]$. The final item requires more effort. First, we note that when $X_m$ is wide sense stationary, $\operatorname{Var}[X_m] = C_X[0]$, a constant for all $m$. Second, Theorem 4.17 says that

$$|C_X[m, k]| \le \sigma_{X_m}\sigma_{X_{m+k}} = C_X[0]. \tag{2}$$

Note that $C_X[m, k] \le |C_X[m, k]|$, and thus it follows that

$$C_X[m, k] \le \sigma_{X_m}\sigma_{X_{m+k}} = C_X[0], \tag{3}$$

(This little step was unfortunately omitted from the proof of Theorem 10.12.) Now for any numbers $a$, $b$, and $c$, if $|a| \le b$ and $c \ge 0$, then $(a + c)^2 \le (b + c)^2$, because $|a + c| \le |a| + c \le b + c$.
Choosing $a = C_X[m, k]$, $b = C_X[0]$, and $c = \mu_X^2$ yields

$$\left(C_X[m, k] + \mu_X^2\right)^2 \le \left(C_X[0] + \mu_X^2\right)^2 \tag{4}$$

In the above expression, the left side equals $(R_X[k])^2$ while the right side is $(R_X[0])^2$, which proves the third part of the theorem.

Problem 10.10.6 Solution

The solution to this problem is essentially the same as the proof of Theorem 10.13 except integrals are replaced by sums. First we verify that the sample mean $\overline{X}_m$ is unbiased:

$$E\left[\overline{X}_m\right] = \frac{1}{2m+1} E\left[\sum_{n=-m}^{m} X_n\right] \tag{1}$$
$$= \frac{1}{2m+1}\sum_{n=-m}^{m} E[X_n] = \frac{1}{2m+1}\sum_{n=-m}^{m} \mu_X = \mu_X \tag{2}$$

To show consistency, it is sufficient to show that $\lim_{m\to\infty} \operatorname{Var}[\overline{X}_m] = 0$. First, we observe that $\overline{X}_m - \mu_X = \frac{1}{2m+1}\sum_{n=-m}^{m}(X_n - \mu_X)$. This implies

$$\operatorname{Var}\left[\overline{X}_m\right] = E\left[\left(\frac{1}{2m+1}\sum_{n=-m}^{m}(X_n - \mu_X)\right)^2\right] \tag{3}$$
$$= E\left[\frac{1}{(2m+1)^2}\left(\sum_{n=-m}^{m}(X_n - \mu_X)\right)\left(\sum_{n'=-m}^{m}(X_{n'} - \mu_X)\right)\right] \tag{4}$$
$$= \frac{1}{(2m+1)^2}\sum_{n=-m}^{m}\sum_{n'=-m}^{m} E[(X_n - \mu_X)(X_{n'} - \mu_X)] \tag{5}$$
$$= \frac{1}{(2m+1)^2}\sum_{n=-m}^{m}\sum_{n'=-m}^{m} C_X[n' - n]. \tag{6}$$

We note that

$$\sum_{n'=-m}^{m} C_X[n' - n] \le \sum_{n'=-m}^{m} \left|C_X[n' - n]\right| \tag{7}$$
$$\le \sum_{n'=-\infty}^{\infty} \left|C_X[n' - n]\right| = \sum_{k=-\infty}^{\infty} |C_X[k]| < \infty. \tag{8}$$

Hence there exists a constant $K$ such that

$$\operatorname{Var}\left[\overline{X}_m\right] \le \frac{1}{(2m+1)^2}\sum_{n=-m}^{m} K = \frac{K}{2m+1}. \tag{9}$$

Thus $\lim_{m\to\infty} \operatorname{Var}[\overline{X}_m] \le \lim_{m\to\infty} \frac{K}{2m+1} = 0$.

Problem 10.11.1 Solution

(a) Since $X(t)$ and $Y(t)$ are independent processes,

$$E[W(t)] = E[X(t)Y(t)] = E[X(t)]\,E[Y(t)] = \mu_X\mu_Y. \tag{1}$$

In addition,

$$R_W(t, \tau) = E[W(t)W(t+\tau)] \tag{2}$$
$$= E[X(t)Y(t)X(t+\tau)Y(t+\tau)] \tag{3}$$
$$= E[X(t)X(t+\tau)]\,E[Y(t)Y(t+\tau)] \tag{4}$$
$$= R_X(\tau)R_Y(\tau) \tag{5}$$

We can conclude that $W(t)$ is wide sense stationary.

(b) To examine whether $X(t)$ and $W(t)$ are jointly wide sense stationary, we calculate

$$R_{WX}(t, \tau) = E[W(t)X(t+\tau)] = E[X(t)Y(t)X(t+\tau)]. \tag{6}$$

By independence of $X(t)$ and $Y(t)$,

$$R_{WX}(t, \tau) = E[X(t)X(t+\tau)]\,E[Y(t)] = \mu_Y R_X(\tau). \tag{7}$$

Since $W(t)$ and $X(t)$ are both wide sense stationary and since $R_{WX}(t, \tau)$ depends only on the time difference $\tau$, we can conclude from Definition 10.18 that $W(t)$ and $X(t)$ are jointly wide sense stationary.
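The product rule $R_W(\tau) = R_X(\tau)R_Y(\tau)$ from Problem 10.11.1 can be spot-checked numerically. The Python sketch below uses a discrete-time analog rather than the continuous-time processes of the problem, and the two sequences are hypothetical choices made only so that their autocorrelations are known in closed form: $X_n = A_n + 0.5A_{n-1}$ and $Y_n = 1 + B_n$ with $A_n$, $B_n$ independent iid Gaussian (0, 1) sequences. Under those assumptions $R_X[0] = 1.25$, $R_X[\pm 1] = 0.5$, $R_X[k] = 0$ otherwise, and $R_Y[0] = 2$, $R_Y[k] = 1$ for $k \ne 0$.

```python
import random


def estimate_autocorrelation(seq, lag):
    """Time-average estimate of E[S_n * S_{n+lag}] for a nonnegative lag."""
    n = len(seq) - lag
    return sum(seq[i] * seq[i + lag] for i in range(n)) / n


rng = random.Random(1)            # fixed seed so the run is repeatable
n_samples = 200_000

# X_n = A_n + 0.5 * A_{n-1}: a zero-mean sequence correlated at lag 1
a = [rng.gauss(0.0, 1.0) for _ in range(n_samples + 1)]
x = [a[i + 1] + 0.5 * a[i] for i in range(n_samples)]

# Y_n = 1 + B_n: an iid sequence with mean 1, generated independently of X_n
y = [1.0 + rng.gauss(0.0, 1.0) for _ in range(n_samples)]

# W_n = X_n * Y_n, the discrete-time counterpart of W(t) = X(t) Y(t)
w = [xi * yi for xi, yi in zip(x, y)]

# Closed-form autocorrelations of the two assumed sequences
r_x = {0: 1.25, 1: 0.5, 2: 0.0}
r_y = {0: 2.0, 1: 1.0, 2: 1.0}

predicted = {k: r_x[k] * r_y[k] for k in r_x}
estimated = {k: estimate_autocorrelation(w, k) for k in r_x}

for k in sorted(predicted):
    print(f"lag {k}: estimated {estimated[k]:.3f}, predicted {predicted[k]:.3f}")
```

The estimates should land close to the products $R_X[k]R_Y[k]$ at each lag, up to Monte Carlo error. Time averages are used in place of ensemble averages here, which is reasonable for these particular sequences but is an additional assumption of the sketch.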
Problem 10.11.2 Solution

To show that $X(t)$ and $X_i(t)$ are jointly wide sense stationary, we must first show that $X_i(t)$ is wide sense stationary and then we must show that the cross correlation $R_{XX_i}(t, \tau)$ is only a function of the time difference $\tau$. For each $X_i(t)$, we have to check whether these facts are implied by the fact that $X(t)$ is wide sense stationary.

(a) Since $E[X_1(t)] = E[X(t+a)] = \mu_X$ and
\begin{align}
R_{X_1}(t, \tau) &= E[X_1(t)X_1(t+\tau)] \tag{1}\\
&= E[X(t+a)X(t+\tau+a)] \tag{2}\\
&= R_X(\tau), \tag{3}
\end{align}
we have verified that $X_1(t)$ is wide sense stationary. Now we calculate the cross correlation
\begin{align}
R_{XX_1}(t, \tau) &= E[X(t)X_1(t+\tau)] \tag{4}\\
&= E[X(t)X(t+\tau+a)] \tag{5}\\
&= R_X(\tau + a). \tag{6}
\end{align}
Since $R_{XX_1}(t, \tau)$ depends on the time difference $\tau$ but not on the absolute time $t$, we conclude that $X(t)$ and $X_1(t)$ are jointly wide sense stationary.

(b) Since $E[X_2(t)] = E[X(at)] = \mu_X$ and
\begin{align}
R_{X_2}(t, \tau) &= E[X_2(t)X_2(t+\tau)] \tag{7}\\
&= E[X(at)X(a(t+\tau))] \tag{8}\\
&= E[X(at)X(at + a\tau)] = R_X(a\tau), \tag{9}
\end{align}
we have verified that $X_2(t)$ is wide sense stationary. Now we calculate the cross correlation
\begin{align}
R_{XX_2}(t, \tau) &= E[X(t)X_2(t+\tau)] \tag{10}\\
&= E[X(t)X(a(t+\tau))] \tag{11}\\
&= R_X((a-1)t + a\tau). \tag{12}
\end{align}
Except for the trivial case when $a = 1$ and $X_2(t) = X(t)$, $R_{XX_2}(t, \tau)$ depends on both the absolute time $t$ and the time difference $\tau$, so we conclude that $X(t)$ and $X_2(t)$ are not jointly wide sense stationary.

Problem 10.11.3 Solution

(a) $Y(t)$ has autocorrelation function
\begin{align}
R_Y(t, \tau) &= E[Y(t)Y(t+\tau)] \tag{1}\\
&= E[X(t - t_0)X(t + \tau - t_0)] \tag{2}\\
&= R_X(\tau). \tag{3}
\end{align}

(b) The cross correlation of $X(t)$ and $Y(t)$ is
\begin{align}
R_{XY}(t, \tau) &= E[X(t)Y(t+\tau)] \tag{4}\\
&= E[X(t)X(t + \tau - t_0)] \tag{5}\\
&= R_X(\tau - t_0). \tag{6}
\end{align}

(c) We have already verified that $R_Y(t, \tau)$ depends only on the time difference $\tau$. Since $E[Y(t)] = E[X(t - t_0)] = \mu_X$, we have verified that $Y(t)$ is wide sense stationary.
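(d) Since $X(t)$ and $Y(t)$ are wide sense stationary and since we have shown that $R_{XY}(t, \tau)$ depends only on $\tau$, we know that $X(t)$ and $Y(t)$ are jointly wide sense stationary.

Comment: This problem is badly designed since the conclusions don't depend on the specific $R_X(\tau)$ given in the problem text. (Sorry about that!)

Problem 10.12.1 Solution

Writing $Y(t+\tau) = \int_0^{t+\tau} N(v)\,dv$ permits us to write the autocorrelation of $Y(t)$ as
\begin{align}
R_Y(t, \tau) = E[Y(t)Y(t+\tau)] &= E\left[\int_0^t\int_0^{t+\tau} N(u)N(v)\,dv\,du\right] \tag{1}\\
&= \int_0^t\int_0^{t+\tau} E[N(u)N(v)]\,dv\,du \tag{2}\\
&= \int_0^t\int_0^{t+\tau} \alpha\delta(u - v)\,dv\,du. \tag{3}
\end{align}
At this point, it matters whether $\tau \ge 0$ or $\tau < 0$. When $\tau \ge 0$, $v$ ranges from 0 to $t + \tau$ and at some point in the integral over $v$ we will have $v = u$. That is, when $\tau \ge 0$,
$$R_Y(t, \tau) = \int_0^t \alpha\,du = \alpha t. \tag{4}$$
When $\tau < 0$, we must reverse the order of integration. In this case, when the inner integral is over $u$, we will have $u = v$ at some point so that
$$R_Y(t, \tau) = \int_0^{t+\tau}\int_0^t \alpha\delta(u - v)\,du\,dv = \int_0^{t+\tau}\alpha\,dv = \alpha(t + \tau). \tag{5}$$
Thus we see the autocorrelation of the output is
$$R_Y(t, \tau) = \alpha\min\{t, t + \tau\} \tag{6}$$
Perhaps surprisingly, $R_Y(t, \tau)$ is what we found in Example 10.19 to be the autocorrelation of a Brownian motion process. In fact, Brownian motion is the integral of the white noise process.

Problem 10.12.2 Solution

Let $\mu_i = E[X(t_i)]$.

(a) Since $C_X(t_1, t_2 - t_1) = \rho\sigma_1\sigma_2$, the covariance matrix is
$$\mathbf{C} = \begin{bmatrix} C_X(t_1, 0) & C_X(t_1, t_2 - t_1) \\ C_X(t_2, t_1 - t_2) & C_X(t_2, 0) \end{bmatrix} = \begin{bmatrix} \sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2 \end{bmatrix} \tag{1}$$
Since $\mathbf{C}$ is a $2\times 2$ matrix, it has determinant $|\mathbf{C}| = \sigma_1^2\sigma_2^2(1 - \rho^2)$.

(b) It is easy to verify that
$$\mathbf{C}^{-1} = \frac{1}{1 - \rho^2}\begin{bmatrix} \dfrac{1}{\sigma_1^2} & \dfrac{-\rho}{\sigma_1\sigma_2} \\ \dfrac{-\rho}{\sigma_1\sigma_2} & \dfrac{1}{\sigma_2^2} \end{bmatrix} \tag{2}$$

(c) The general form of the multivariate density for $X(t_1), X(t_2)$ is
$$f_{X(t_1),X(t_2)}(x_1, x_2) = \frac{1}{(2\pi)^{k/2}|\mathbf{C}|^{1/2}}\, e^{-\frac{1}{2}(\mathbf{x} - \boldsymbol{\mu}_X)'\mathbf{C}^{-1}(\mathbf{x} - \boldsymbol{\mu}_X)} \tag{3}$$
where $k = 2$, $\mathbf{x} = \begin{bmatrix} x_1 & x_2 \end{bmatrix}'$ and $\boldsymbol{\mu}_X = \begin{bmatrix} \mu_1 & \mu_2 \end{bmatrix}'$. Hence,
$$\frac{1}{(2\pi)^{k/2}|\mathbf{C}|^{1/2}} = \frac{1}{2\pi\sqrt{\sigma_1^2\sigma_2^2(1 - \rho^2)}}. \tag{4}$$
Furthermore, the exponent is
\begin{align}
-\frac{1}{2}(\mathbf{x} - \boldsymbol{\mu}_X)'\mathbf{C}^{-1}(\mathbf{x} - \boldsymbol{\mu}_X)
&= -\frac{1}{2}\begin{bmatrix} x_1 - \mu_1 & x_2 - \mu_2 \end{bmatrix}\frac{1}{1 - \rho^2}\begin{bmatrix} \dfrac{1}{\sigma_1^2} & \dfrac{-\rho}{\sigma_1\sigma_2} \\ \dfrac{-\rho}{\sigma_1\sigma_2} & \dfrac{1}{\sigma_2^2} \end{bmatrix}\begin{bmatrix} x_1 - \mu_1 \\ x_2 - \mu_2 \end{bmatrix} \tag{5}\\
&= -\frac{\left(\dfrac{x_1 - \mu_1}{\sigma_1}\right)^2 - \dfrac{2\rho(x_1 - \mu_1)(x_2 - \mu_2)}{\sigma_1\sigma_2} + \left(\dfrac{x_2 - \mu_2}{\sigma_2}\right)^2}{2(1 - \rho^2)} \tag{6}
\end{align}
Plugging each piece into the joint PDF $f_{X(t_1),X(t_2)}(x_1, x_2)$ given above, we obtain the bivariate Gaussian PDF.

Problem 10.12.3 Solution

Let $\mathbf{W} = \begin{bmatrix} W(t_1) & W(t_2) & \cdots & W(t_n) \end{bmatrix}'$ denote a vector of samples of a Brownian motion process. To prove that $W(t)$ is a Gaussian random process, we must show that $\mathbf{W}$ is a Gaussian random vector. To do so, let
\begin{align}
\mathbf{X} &= \begin{bmatrix} X_1 & \cdots & X_n \end{bmatrix}' \tag{1}\\
&= \begin{bmatrix} W(t_1) & W(t_2) - W(t_1) & W(t_3) - W(t_2) & \cdots & W(t_n) - W(t_{n-1}) \end{bmatrix}' \tag{2}
\end{align}
denote the vector of increments. By the definition of Brownian motion, $X_1, \ldots, X_n$ is a sequence of independent Gaussian random variables. Thus $\mathbf{X}$ is a Gaussian random vector. Finally,
$$\mathbf{W} = \begin{bmatrix} W_1 \\ W_2 \\ \vdots \\ W_n \end{bmatrix} = \begin{bmatrix} X_1 \\ X_1 + X_2 \\ \vdots \\ X_1 + \cdots + X_n \end{bmatrix} = \underbrace{\begin{bmatrix} 1 & & & \\ 1 & 1 & & \\ \vdots & & \ddots & \\ 1 & \cdots & \cdots & 1 \end{bmatrix}}_{\mathbf{A}} \mathbf{X}. \tag{3}$$
Since $\mathbf{X}$ is a Gaussian random vector and $\mathbf{W} = \mathbf{A}\mathbf{X}$ with $\mathbf{A}$ a rank $n$ matrix, Theorem 5.16 implies that $\mathbf{W}$ is a Gaussian random vector.

Problem 10.13.1 Solution

From the instructions given in the problem, the program noisycosine.m will generate the four plots.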
    n=1000; t=0.001*(-n:n);
    w=gaussrv(0,0.01,(2*n)+1);
    %Continuous Time, Continuous Value
    xcc=2*cos(2*pi*t) + w';
    plot(t,xcc);
    xlabel('\it t');ylabel('\it X_{cc}(t)');
    axis([-1 1 -3 3]);
    figure; %Continuous Time, Discrete Value
    xcd=round(xcc); plot(t,xcd);
    xlabel('\it t');ylabel('\it X_{cd}(t)');
    axis([-1 1 -3 3]);
    figure; %Discrete time, Continuous Value
    ts=subsample(t,100); xdc=subsample(xcc,100);
    plot(ts,xdc,'b.');
    xlabel('\it t');ylabel('\it X_{dc}(t)');
    axis([-1 1 -3 3]);
    figure; %Discrete Time, Discrete Value
    xdd=subsample(xcd,100); plot(ts,xdd,'b.');
    xlabel('\it t');ylabel('\it X_{dd}(t)');
    axis([-1 1 -3 3]);

In noisycosine.m, we use a function subsample.m to obtain the discrete time sample functions. In fact, subsample is hardly necessary since it's such a simple one-line Matlab function:

    function y=subsample(x,n)
    %input x(1), x(2) ...
    %output y(1)=x(1), y(2)=x(1+n), y(3)=x(2n+1)
    y=x(1:n:length(x));

However, we use it just to make noisycosine.m a little more clear.

Problem 10.13.2 Solution

    >> t=(1:600)';
    >> M=simswitch(10,0.1,t);
    >> Mavg=cumsum(M)./t;
    >> plot(t,M,t,Mavg);

These commands will simulate the switch for 600 minutes, producing the vector M of samples of $M(t)$ each minute, the vector Mavg which is the sequence of time average estimates, and a plot resembling this one:

[Figure: $M(t)$ and the time average of $M(t)$ versus $t$, for $0 \le t \le 600$.]

From the figure, it appears that the time average is converging to a value in the neighborhood of 100. In particular, because the switch is initially empty with $M(0) = 0$, it takes a few hundred minutes for the time average to climb to something close to 100.
Following the problem instructions, we can write the following short program to examine ten simulation runs:

    function Mavg=simswitchavg(T,k)
    %Usage: Mavg=simswitchavg(T,k)
    %simulate k runs of duration T of the
    %telephone switch in Chapter 10
    %and plot the time average of each run
    t=(1:T)';
    %each column of Mavg is a time average sample run
    Mavg=zeros(T,k);
    for n=1:k,
        M=simswitch(10,0.1,t);
        Mavg(:,n)=cumsum(M)./t;
    end
    plot(t,Mavg);

The command simswitchavg(600,10) produced this graph:

[Figure: time average of $M(t)$ versus $t$ for ten sample runs, $0 \le t \le 600$.]

From the graph, one can see that even after $T = 600$ minutes, each sample run produces a time average $\overline{M}_{600}$ around 100. Note that in Chapter 12, we will be able to use Markov chains to prove that the expected number of calls in the switch is in fact 100. However, note that even if $T$ is large, $\overline{M}_T$ is still a random variable. From the above plot, one might guess that $\overline{M}_{600}$ has a standard deviation of perhaps $\sigma = 2$ or $\sigma = 3$. An exact calculation of the variance of $\overline{M}_{600}$ is fairly difficult because it is a sum of dependent random variables, each of which has a PDF that is in itself reasonably difficult to calculate.

Problem 10.13.3 Solution

In this problem, our goal is to find out the average number of ongoing calls in the switch. Before we use the approach of Problem 10.13.2, it's worth a moment to consider the physical situation. In particular, calls arrive as a Poisson process of rate $\lambda = 100$ calls/minute and each call has duration of exactly one minute. As a result, if we inspect the system at an arbitrary time $t$ at least one minute past initialization, the number of calls at the switch will be exactly the number of calls $N_1$ that arrived in the previous minute. Since calls arrive as a Poisson process of rate $\lambda = 100$ calls/minute, $N_1$ is a Poisson random variable with $E[N_1] = 100$. In fact, this should be true for every inspection time $t$.
Hence it should be surprising if we compute the time average and find the time average number in the queue to be something other than 100. To check out this quickie analysis, we use the method of Problem 10.13.2. However, unlike Problem 10.13.2, we cannot directly use the function simswitch.m because the call durations are no longer exponential random variables. Instead, we must modify simswitch.m for the deterministic one minute call durations, yielding the function simswitchd.m:

    function M=simswitchd(lambda,T,t)
    %Poisson arrivals, rate lambda
    %Deterministic (T) call duration
    %For vector t of times
    %M(i) = no. of calls at time t(i)
    s=poissonarrivals(lambda,max(t));
    y=s+T;
    A=countup(s,t);
    D=countup(y,t);
    M=A-D;

Note that if you compare simswitch.m in the text with simswitchd.m here, two changes occurred. The first is that the exponential call durations are replaced by the deterministic time T. The other change is that count(s,t) is replaced by countup(s,t). In fact, n=countup(x,y) does exactly the same thing as n=count(x,y); in both cases, n(i) is the number of elements less than or equal to y(i). The difference is that countup requires that the vectors x and y be nondecreasing.

Now we use the same procedure as in Problem 10.13.2 and form the time average
$$\overline{M}(T) = \frac{1}{T}\sum_{t=1}^{T} M(t). \tag{1}$$
We form and plot the time average using these commands:

    >> t=(1:600)';
    >> M=simswitchd(100,1,t);
    >> Mavg=cumsum(M)./t;
    >> plot(t,Mavg);

The result is a plot vaguely similar to that shown below.

[Figure: time average of $M(t)$ versus $t$, for $0 \le t \le 600$; vertical axis from 95 to 105.]

We used the word "vaguely" because at $t = 1$, the time average is simply the number of arrivals in the first minute, which is a Poisson ($\alpha = 100$) random variable which has not been averaged. Thus, the left side of the graph will be random for each run. As expected, the time average appears to be converging to 100.

Problem 10.13.4 Solution

The random variable $S_n$ is the sum of $n$ exponential ($\lambda$) random variables.
That is, $S_n$ is an Erlang $(n, \lambda)$ random variable. Since $K = 1$ if and only if $S_n > T$, $P[K = 1] = P[S_n > T]$. Typically, $P[K = 1]$ is fairly high because
$$E[S_n] = \frac{n}{\lambda} = \frac{\lceil 1.1\lambda T\rceil}{\lambda} \approx 1.1T. \tag{1}$$
Increasing $n$ increases $P[K = 1]$; however, poissonarrivals then does more work generating exponential random variables. Although we don't want to generate more exponential random variables than necessary, if we need to generate a lot of arrivals (i.e., a lot of exponential interarrival times), then Matlab is typically faster generating a vector of them all at once rather than generating them one at a time. Choosing $n = \lceil 1.1\lambda T\rceil$ generates about 10 percent more exponential random variables than we typically need. However, as long as $P[K = 1]$ is high, a ten percent penalty won't be too costly.

When $n$ is small, it doesn't much matter if we are efficient because the amount of calculation is small. The question that must be addressed is to estimate $P[K = 1]$ when $n$ is large. In this case, we can use the central limit theorem because $S_n$ is the sum of $n$ exponential random variables. Since $E[S_n] = n/\lambda$ and $\mathrm{Var}[S_n] = n/\lambda^2$,
$$P[S_n > T] = P\left[\frac{S_n - n/\lambda}{\sqrt{n/\lambda^2}} > \frac{T - n/\lambda}{\sqrt{n/\lambda^2}}\right] \approx Q\left(\frac{\lambda T - n}{\sqrt{n}}\right) \tag{2}$$
To simplify our algebra, we assume for large $n$ that $0.1\lambda T$ is an integer. In this case, $n = 1.1\lambda T$ and
$$P[S_n > T] \approx Q\left(-\frac{0.1\lambda T}{\sqrt{1.1\lambda T}}\right) = \Phi\left(\sqrt{\frac{\lambda T}{110}}\right) \tag{3}$$
Thus for large $\lambda T$, $P[K = 1]$ is very close to 1. For example, if $\lambda T = 1{,}000$, $P[S_n > T] \approx \Phi(3.01) = 0.9987$. If $\lambda T = 10{,}000$, $P[S_n > T] \approx \Phi(9.5)$.

Problem 10.13.5 Solution

Following the problem instructions, we can write the function newarrivals.m. For convenience, here are newarrivals and poissonarrivals:

    function s=newarrivals(lam,T)
    %Usage s=newarrivals(lam,T)
    %Returns Poisson arrival times
    %s=[s(1) ... s(n)] over [0,T]
    n=poissonrv(lam*T,1);
    s=sort(T*rand(n,1));

    function s=poissonarrivals(lam,T)
    %arrival times s=[s(1) ... s(n)]
    % s(n)<= T < s(n+1)
    n=ceil(1.1*lam*T);
    s=cumsum(exponentialrv(lam,n));
    while (s(length(s))< T),
        s_new=s(length(s))+ ...
            cumsum(exponentialrv(lam,n));
        s=[s; s_new];
    end
    s=s(s<=T);

Clearly the code for newarrivals is shorter, more readable, and perhaps, with the help of Problem 10.6.4, more logical than poissonarrivals. Unfortunately this doesn't mean the code runs better. Here are some cputime comparisons:

    >> t=cputime;s=poissonarrivals(1,100000);t=cputime-t
    t =
        0.1110
    >> t=cputime;s=newarrivals(1,100000);t=cputime-t
    t =
        0.5310
    >> t=cputime;poissonrv(100000,1);t=cputime-t
    t =
        0.5200
    >>

Unfortunately, these results were highly repeatable. The function poissonarrivals generated 100,000 arrivals of a rate 1 Poisson process in roughly 0.1 seconds of cpu time. The same task took newarrivals about 0.5 seconds, or roughly 5 times as long! In the newarrivals code, the culprit is the way poissonrv generates a single Poisson random variable with expected value 100,000. In this case, poissonrv generates the first 200,000 terms of the Poisson PMF! This required calculation is so large that it dominates the work needed to generate 100,000 uniform random numbers. In fact, this suggests that a more efficient way to generate a Poisson ($\alpha$) random variable $N$ is to generate arrivals of a rate $\alpha$ Poisson process until the $N$th arrival is after time 1.

Problem 10.13.6 Solution

We start with brownian.m to simulate the Brownian motion process with barriers. Since the goal is to estimate the barrier probability $P[|X(t)| = b]$, we don't keep track of the value of the process over all time. Also, we simply assume a unit time step $\tau = 1$ for the process. Thus, the process starts at $n = 0$ at position $W_0 = 0$. At each step $n$, the position, if we haven't reached a barrier, is $W_n = W_{n-1} + X_n$, where $X_1, \ldots, X_T$ are iid Gaussian $(0, \sqrt{\alpha})$ random variables. Accounting for the effect of barriers,
$$W_n = \max(\min(W_{n-1} + X_n, b), -b). \tag{1}$$
To implement the simulation, we can generate the vector x of increments all at once.
However, to check at each time step whether we are crossing a barrier, we need to proceed sequentially. (This is analogous to the problem in Quiz 10.13.)

In brownwall shown below, pb(1) tracks how often the process touches the left barrier at $-b$ while pb(2) tracks how often the right side barrier at $b$ is reached. By symmetry, $P[X(t) = b] = P[X(t) = -b]$. Thus if $T$ is chosen very large, we should expect pb(1)=pb(2). The extent to which this is not the case gives an indication of the extent to which we are merely estimating the barrier probability. Here is the code and, for each $T \in \{10{,}000,\ 100{,}000,\ 1{,}000{,}000\}$, here are two sample runs:

    function pb=brownwall(alpha,b,T)
    %pb=brownwall(alpha,b,T)
    %Brownian motion, param. alpha
    %walls at [-b, b], sampled each
    %unit of time until time T
    %Returns vector pb:
    %pb(1)=fraction of time at -b
    %pb(2)=fraction of time at b
    T=ceil(T);
    x=sqrt(alpha).*gaussrv(0,1,T);
    w=0;pb=zeros(1,2);
    for k=1:T,
        w=w+x(k);
        if (w <= -b)
            w=-b;
            pb(1)=pb(1)+1;
        elseif (w >= b)
            w=b;
            pb(2)=pb(2)+1;
        end
    end
    pb=pb/T;

    >> pb=brownwall(0.01,1,1e4)
    pb =
        0.0301    0.0353
    >> pb=brownwall(0.01,1,1e4)
    pb =
        0.0417    0.0299
    >> pb=brownwall(0.01,1,1e5)
    pb =
        0.0333    0.0360
    >> pb=brownwall(0.01,1,1e5)
    pb =
        0.0341    0.0305
    >> pb=brownwall(0.01,1,1e6)
    pb =
        0.0323    0.0342
    >> pb=brownwall(0.01,1,1e6)
    pb =
        0.0333    0.0324
    >>

The sample runs show that for $\alpha = 0.01$ and $b = 1$,
$$P[X(t) = -b] \approx P[X(t) = b] \approx 0.03. \tag{2}$$
Otherwise, the numerical simulations are not particularly instructive. Perhaps the most important thing to understand is that the Brownian motion process with barriers is very different from the ordinary Brownian motion process. Remember that for ordinary Brownian motion, the variance of $X(t)$ always increases linearly with $t$. For the process with barriers, $X^2(t) \le b^2$ and thus $\mathrm{Var}[X(t)] \le b^2$. In fact, for the process with barriers, the PDF of $X(t)$ converges to a limit as $t$ becomes large.
If you're curious, you shouldn't have much trouble digging in the library to find out more.

Problem 10.13.7 Solution

In this problem, we start with the simswitch.m code to generate the vector of departure times y. We then construct the vector I of inter-departure times. The command hist(I,20) will generate a 20 bin histogram of the inter-departure times. The fact that this histogram resembles an exponential PDF suggests that perhaps it is reasonable to try to match the PDF of an exponential ($\mu$) random variable against the histogram.

In most problems in which one wants to fit a PDF to measured data, a key issue is how to choose the parameters of the PDF. In this problem, choosing $\mu$ is simple. Recall that the switch has a Poisson arrival process of rate $\lambda$ so interarrival times are exponential ($\lambda$) random variables. If $1/\mu < 1/\lambda$, then the average time between departures from the switch is less than the average time between arrivals to the switch. In this case, calls depart the switch faster than they arrive, which is impossible because each departing call was an arriving call at an earlier time. Similarly, if $1/\mu > 1/\lambda$, then calls would be departing from the switch more slowly than they arrived. This can happen to an overloaded switch; however, it's impossible in this system because each arrival departs after an exponential time. Thus the only possibility is that $1/\mu = 1/\lambda$. In the program simswitchdepart.m, we plot a histogram of inter-departure times for a switch with arrival rate $\lambda$ against the scaled exponential ($\lambda$) PDF $\lambda e^{-\lambda x} b$ where $b$ is the histogram bin size. Here is the code:

    function I=simswitchdepart(lambda,mu,T)
    %Usage: I=simswitchdepart(lambda,mu,T)
    %Poisson arrivals, rate lambda
    %Exponential (mu) call duration
    %Over time [0,T], returns I,
    %the vector of inter-departure times
    %M(i) = no. of calls at time t(i)
    s=poissonarrivals(lambda,T);
    y=s+exponentialrv(mu,length(s));
    y=sort(y);
    n=length(y);
    I=y-[0; y(1:n-1)]; %interdeparture times
    imax=max(I);b=ceil(n/100);
    id=imax/b; x=id/2:id:imax;
    pd=hist(I,x); pd=pd/sum(pd);
    px=exponentialpdf(lambda,x)*id;
    plot(x,px,x,pd);
    xlabel('\it x');ylabel('Probability');
    legend('Exponential PDF','Relative Frequency');

Here is an example of the output corresponding to simswitchdepart(10,1,1000).

[Figure: exponential PDF and relative frequency of the inter-departure times versus $x$, for $0 \le x \le 1$; vertical axis "Probability" from 0 to 0.1.]

As seen in the figure, the match is quite good. Although this is not a carefully designed statistical test of whether the inter-departure times are exponential random variables, it is enough evidence that one may want to pursue whether such a result can be proven. In fact, the switch in this problem is an example of an M/M/∞ queuing system for which it has been shown that not only do the inter-departure times have an exponential distribution, but the steady-state departure process is a Poisson process. For the curious reader, details can be found, for example, in the text Discrete Stochastic Processes by Gallager.

Problem Solutions – Chapter 11

Problem 11.1.1 Solution

For this problem, it is easiest to work with the expectation operator. The mean function of the output is
$$E[Y(t)] = 2 + E[X(t)] = 2 \tag{1}$$
The autocorrelation of the output is
\begin{align}
R_Y(t, \tau) &= E[(2 + X(t))(2 + X(t+\tau))] \tag{2}\\
&= E[4 + 2X(t) + 2X(t+\tau) + X(t)X(t+\tau)] \tag{3}\\
&= 4 + 2E[X(t)] + 2E[X(t+\tau)] + E[X(t)X(t+\tau)] \tag{4}\\
&= 4 + R_X(\tau) \tag{5}
\end{align}
We see that $R_Y(t, \tau)$ only depends on the time difference $\tau$. Thus $Y(t)$ is wide sense stationary.

Problem 11.1.2 Solution

By Theorem 11.2, the mean of the output is
\begin{align}
\mu_Y &= \mu_X\int_{-\infty}^{\infty} h(t)\,dt \tag{1}\\
&= -3\int_0^{10^{-3}} (1 - 10^6 t^2)\,dt \tag{2}\\
&= -3\left(t - (10^6/3)t^3\right)\Big|_0^{10^{-3}} \tag{3}\\
&= -2\times 10^{-3}\ \text{volts} \tag{4}
\end{align}

Problem 11.1.3 Solution

By Theorem 11.2, the mean of the output is
$$\mu_Y = \mu_X\int_{-\infty}^{\infty} h(t)\,dt = 4\int_0^{\infty} e^{-t/a}\,dt = -4ae^{-t/a}\Big|_0^{\infty} = 4a. \tag{1}$$
Since $\mu_Y = 1 = 4a$, we must have $a = 1/4$.

Problem 11.1.4 Solution

Since $E[Y^2(t)] = R_Y(0)$, we use Theorem 11.2(a) to evaluate $R_Y(\tau)$ at $\tau = 0$. That is,
\begin{align}
R_Y(0) &= \int_{-\infty}^{\infty} h(u)\int_{-\infty}^{\infty} h(v)R_X(u - v)\,dv\,du \tag{1}\\
&= \int_{-\infty}^{\infty} h(u)\int_{-\infty}^{\infty} h(v)\eta_0\delta(u - v)\,dv\,du \tag{2}\\
&= \eta_0\int_{-\infty}^{\infty} h^2(u)\,du, \tag{3}
\end{align}
by the sifting property of the delta function.

Problem 11.2.1 Solution

(a) Note that
$$Y_i = \sum_{n=-\infty}^{\infty} h_n X_{i-n} = \frac{1}{3}X_{i+1} + \frac{1}{3}X_i + \frac{1}{3}X_{i-1} \tag{1}$$
By matching coefficients, we see that
$$h_n = \begin{cases} 1/3 & n = -1, 0, 1 \\ 0 & \text{otherwise} \end{cases} \tag{2}$$

(b) By Theorem 11.5, the output autocorrelation is
\begin{align}
R_Y[n] &= \sum_{i=-\infty}^{\infty}\sum_{j=-\infty}^{\infty} h_i h_j R_X[n + i - j] \tag{3}\\
&= \frac{1}{9}\sum_{i=-1}^{1}\sum_{j=-1}^{1} R_X[n + i - j] \tag{4}\\
&= \frac{1}{9}\left(R_X[n+2] + 2R_X[n+1] + 3R_X[n] + 2R_X[n-1] + R_X[n-2]\right) \tag{5}
\end{align}
Substituting in $R_X[n]$ yields
$$R_Y[n] = \begin{cases} 1/3 & n = 0 \\ 2/9 & |n| = 1 \\ 1/9 & |n| = 2 \\ 0 & \text{otherwise} \end{cases} \tag{6}$$

Problem 11.2.2 Solution

Applying Theorem 11.4 with sampling period $T_s = 1/4000$ s yields
\begin{align}
R_X[k] = R_X(kT_s) &= 10\,\frac{\sin(2000\pi kT_s) + \sin(1000\pi kT_s)}{2000\pi kT_s} \tag{1}\\
&= 20\,\frac{\sin(0.5\pi k) + \sin(0.25\pi k)}{\pi k} \tag{2}\\
&= 10\,\mathrm{sinc}(0.5k) + 5\,\mathrm{sinc}(0.25k) \tag{3}
\end{align}

Problem 11.2.3 Solution

(a) By Theorem 11.5, the expected value of the output is
$$\mu_W = \mu_Y\sum_{n=-\infty}^{\infty} h_n = 2\mu_Y = 2 \tag{1}$$

(b) Theorem 11.5 also says that the output autocorrelation is
\begin{align}
R_W[n] &= \sum_{i=-\infty}^{\infty}\sum_{j=-\infty}^{\infty} h_i h_j R_Y[n + i - j] \tag{2}\\
&= \sum_{i=0}^{1}\sum_{j=0}^{1} R_Y[n + i - j] \tag{3}\\
&= R_Y[n-1] + 2R_Y[n] + R_Y[n+1] \tag{4}
\end{align}
For $n = -3$,
$$R_W[-3] = R_Y[-4] + 2R_Y[-3] + R_Y[-2] = R_Y[-2] = 0.5 \tag{5}$$
Following the same procedure, it's easy to show that $R_W[n]$ is nonzero for $|n| = 0, 1, 2, 3$. Specifically,
$$R_W[n] = \begin{cases} 0.5 & |n| = 3 \\ 3 & |n| = 2 \\ 7.5 & |n| = 1 \\ 10 & n = 0 \\ 0 & \text{otherwise} \end{cases} \tag{6}$$

(c) The second moment of the output is $E[W_n^2] = R_W[0] = 10$. The variance of $W_n$ is
$$\mathrm{Var}[W_n] = E[W_n^2] - (E[W_n])^2 = 10 - 2^2 = 6 \tag{7}$$

(d) This part doesn't require any probability. It just checks your knowledge of linear systems and convolution.
There is a bit of confusion because $h_n$ is used to denote both the filter that transforms $X_n$ to $Y_n$ as well as the filter that transforms $Y_n$ to $W_n$. To avoid confusion, we will use $\hat{h}_n$ to denote the filter that transforms $X_n$ to $Y_n$. Using Equation (11.25) for discrete-time convolution, we can write
$$W_n = \sum_{j=-\infty}^{\infty} h_j Y_{n-j}, \qquad Y_{n-j} = \sum_{i=-\infty}^{\infty} \hat{h}_i X_{n-j-i}. \tag{8}$$
Combining these equations yields
$$W_n = \sum_{j=-\infty}^{\infty} h_j\sum_{i=-\infty}^{\infty} \hat{h}_i X_{n-j-i}. \tag{9}$$
For each $j$, we make the substitution $k = i + j$, and then reverse the order of summation to obtain
$$W_n = \sum_{j=-\infty}^{\infty} h_j\sum_{k=-\infty}^{\infty} \hat{h}_{k-j} X_{n-k} = \sum_{k=-\infty}^{\infty}\left(\sum_{j=-\infty}^{\infty} h_j\hat{h}_{k-j}\right) X_{n-k}. \tag{10}$$
Thus we see that
$$g_k = \sum_{j=-\infty}^{\infty} h_j\hat{h}_{k-j}. \tag{11}$$
That is, the filter $g_n$ is the convolution of the filters $\hat{h}_n$ and $h_n$. In the context of our particular problem, the filter $\hat{h}_n$ that transforms $X_n$ to $Y_n$ is given in Example 11.5. The two filters are
$$\hat{\mathbf{h}} = \begin{bmatrix} \hat{h}_0 & \hat{h}_1 \end{bmatrix}' = \begin{bmatrix} 1/2 & 1/2 \end{bmatrix}', \qquad \mathbf{h} = \begin{bmatrix} h_0 & h_1 \end{bmatrix}' = \begin{bmatrix} 1 & 1 \end{bmatrix}' \tag{12}$$
Keep in mind that $h_n = \hat{h}_n = 0$ for $n < 0$ or $n > 1$. From Equation (11),
$$g_k = \hat{h}_k + \hat{h}_{k-1} = \begin{cases} 1/2 & k = 0 \\ 1 & k = 1 \\ 1/2 & k = 2 \\ 0 & \text{otherwise} \end{cases} \tag{13}$$

Problem 11.2.4 Solution

(a) By Theorem 11.5, the mean output is
$$\mu_V = \mu_Y\sum_{n=-\infty}^{\infty} h_n = (-1 + 1)\mu_Y = 0 \tag{1}$$

(b) Theorem 11.5 also says that the output autocorrelation is
\begin{align}
R_V[n] &= \sum_{i=-\infty}^{\infty}\sum_{j=-\infty}^{\infty} h_i h_j R_Y[n + i - j] \tag{2}\\
&= \sum_{i=0}^{1}\sum_{j=0}^{1} h_i h_j R_Y[n + i - j] \tag{3}\\
&= -R_Y[n-1] + 2R_Y[n] - R_Y[n+1] \tag{4}
\end{align}
For $n = -3$,
$$R_V[-3] = -R_Y[-4] + 2R_Y[-3] - R_Y[-2] = -R_Y[-2] = -0.5 \tag{5}$$
Following the same procedure, it's easy to show that $R_V[n]$ is nonzero for $|n| = 0, 1, 2, 3$. Specifically,
$$R_V[n] = \begin{cases} -0.5 & |n| = 3 \\ -1 & |n| = 2 \\ 0.5 & |n| = 1 \\ 2 & n = 0 \\ 0 & \text{otherwise} \end{cases} \tag{6}$$

(c) Since $E[V_n] = 0$, the second moment of the output is $E[V_n^2] = R_V[0] = 2$. The variance of $V_n$ is
$$\mathrm{Var}[V_n] = E[V_n^2] = R_V[0] = 2 \tag{7}$$

(d) This part doesn't require any probability. It just checks your knowledge of linear systems and convolution.
There is a bit of confusion because $h_n$ is used to denote both the filter that transforms $X_n$ to $Y_n$ as well as the filter that transforms $Y_n$ to $V_n$. To avoid confusion, we will use $\hat{h}_n$ to denote the filter that transforms $X_n$ to $Y_n$. Using Equation (11.25) for discrete-time convolution, we can write
$$V_n = \sum_{j=-\infty}^{\infty} h_j Y_{n-j}, \qquad Y_{n-j} = \sum_{i=-\infty}^{\infty} \hat{h}_i X_{n-j-i}. \tag{8}$$
Combining these equations yields
$$V_n = \sum_{j=-\infty}^{\infty} h_j\sum_{i=-\infty}^{\infty} \hat{h}_i X_{n-j-i}. \tag{9}$$
For the inner sum, we make the substitution $k = i + j$ and then reverse the order of summation to obtain
$$V_n = \sum_{j=-\infty}^{\infty} h_j\sum_{k=-\infty}^{\infty} \hat{h}_{k-j} X_{n-k} = \sum_{k=-\infty}^{\infty}\left(\sum_{j=-\infty}^{\infty} h_j\hat{h}_{k-j}\right) X_{n-k}. \tag{10}$$
Thus we see that
$$f_k = \sum_{j=-\infty}^{\infty} h_j\hat{h}_{k-j}. \tag{11}$$
That is, the filter $f_n$ is the convolution of the filters $\hat{h}_n$ and $h_n$. In the context of our particular problem, the filter $\hat{h}_n$ that transforms $X_n$ to $Y_n$ is given in Example 11.5. The two filters are
$$\hat{\mathbf{h}} = \begin{bmatrix} \hat{h}_0 & \hat{h}_1 \end{bmatrix}' = \begin{bmatrix} 1/2 & 1/2 \end{bmatrix}', \qquad \mathbf{h} = \begin{bmatrix} h_0 & h_1 \end{bmatrix}' = \begin{bmatrix} 1 & -1 \end{bmatrix}' \tag{12}$$
Keep in mind that $h_n = \hat{h}_n = 0$ for $n < 0$ or $n > 1$. From Equation (11),
$$f_k = \hat{h}_k - \hat{h}_{k-1} = \begin{cases} 1/2 & k = 0 \\ 0 & k = 1 \\ -1/2 & k = 2 \\ 0 & \text{otherwise} \end{cases} \tag{13}$$

Problem 11.2.5 Solution

We start with Theorem 11.5:
\begin{align}
R_Y[n] &= \sum_{i=-\infty}^{\infty}\sum_{j=-\infty}^{\infty} h_i h_j R_X[n + i - j] \tag{1}\\
&= R_X[n-1] + 2R_X[n] + R_X[n+1] \tag{2}
\end{align}
First we observe that for $n \le -2$ or $n \ge 2$,
$$R_Y[n] = R_X[n-1] + 2R_X[n] + R_X[n+1] = 0 \tag{3}$$
This suggests that $R_X[n] = 0$ for $|n| > 1$. In addition, we have the following facts:
\begin{align}
R_Y[0] &= R_X[-1] + 2R_X[0] + R_X[1] = 2 \tag{4}\\
R_Y[-1] &= R_X[-2] + 2R_X[-1] + R_X[0] = 1 \tag{5}\\
R_Y[1] &= R_X[0] + 2R_X[1] + R_X[2] = 1 \tag{6}
\end{align}
A simple solution to this set of equations is $R_X[0] = 1$ and $R_X[n] = 0$ for $n \ne 0$.

Problem 11.2.6 Solution

The mean of $Y_n = (X_n + Y_{n-1})/2$ can be found by realizing that $Y_n$ is an infinite sum of the $X_i$'s:
$$Y_n = \frac{1}{2}X_n + \frac{1}{4}X_{n-1} + \frac{1}{8}X_{n-2} + \cdots \tag{1}$$
Since the $X_i$'s are each of zero mean, the mean of $Y_n$ is also 0. The variance of $Y_n$ can be expressed as
$$\mathrm{Var}[Y_n] = \left[\frac{1}{4} + \frac{1}{16} + \frac{1}{64} + \cdots\right]\mathrm{Var}[X] = \sum_{i=1}^{\infty}\left(\frac{1}{4}\right)^i\sigma^2 = \left(\frac{1}{1 - 1/4} - 1\right)\sigma^2 = \sigma^2/3 \tag{2}$$
The above infinite sum converges to $\frac{1}{1 - 1/4} - 1 = 1/3$, implying
$$\mathrm{Var}[Y_n] = (1/3)\,\mathrm{Var}[X] = \sigma^2/3 \tag{3}$$
The covariance of $Y_{i+1}$ and $Y_i$ can be found by the same method:
$$\mathrm{Cov}[Y_{i+1}, Y_i] = E\left[\left(\frac{1}{2}X_{i+1} + \frac{1}{4}X_i + \frac{1}{8}X_{i-1} + \cdots\right)\left(\frac{1}{2}X_i + \frac{1}{4}X_{i-1} + \frac{1}{8}X_{i-2} + \cdots\right)\right] \tag{4}$$
Since $E[X_i X_j] = 0$ for all $i \ne j$, the only terms that are left pair each $X$ with itself. The $j$th such term has coefficient $1/2^{j+1}$ from $Y_{i+1}$ and $1/2^{j}$ from $Y_i$, so that
$$\mathrm{Cov}[Y_{i+1}, Y_i] = \sum_{j=1}^{\infty}\frac{1}{2^{j+1}}\,\frac{1}{2^{j}}\,E[X^2] = \frac{1}{2}\sum_{j=1}^{\infty}\frac{1}{4^j}\,E[X^2] \tag{5}$$
Since $E[X^2] = \sigma^2$, we can solve the above equation, yielding
$$\mathrm{Cov}[Y_{i+1}, Y_i] = \sigma^2/6 \tag{6}$$
Finally the correlation coefficient of $Y_{i+1}$ and $Y_i$ is
$$\rho_{Y_{i+1}Y_i} = \frac{\mathrm{Cov}[Y_{i+1}, Y_i]}{\sqrt{\mathrm{Var}[Y_{i+1}]}\sqrt{\mathrm{Var}[Y_i]}} = \frac{\sigma^2/6}{\sigma^2/3} = \frac{1}{2} \tag{7}$$

Problem 11.2.7 Solution

There is a technical difficulty with this problem since $X_n$ is not defined for $n < 0$. This implies $C_X[n, k]$ is not defined for $k < -n$ and thus $C_X[n, k]$ cannot be completely independent of $n$. When $n$ is large, corresponding to a process that has been running for a long time, this is a technical issue, and not a practical concern. Instead, we will find $\bar{\sigma}^2$ such that $C_X[n, k] = C_X[k]$ for all $n$ and $k$ for which the covariance function is defined. To do so, we need to express $X_n$ in terms of $Z_0, Z_1, \ldots, Z_{n-1}$. We do this in the following way:
\begin{align}
X_n &= cX_{n-1} + Z_{n-1} \tag{1}\\
&= c[cX_{n-2} + Z_{n-2}] + Z_{n-1} \tag{2}\\
&= c^2[cX_{n-3} + Z_{n-3}] + cZ_{n-2} + Z_{n-1} \tag{3}\\
&\;\;\vdots \tag{4}\\
&= c^n X_0 + c^{n-1}Z_0 + c^{n-2}Z_1 + \cdots + Z_{n-1} \tag{5}\\
&= c^n X_0 + \sum_{i=0}^{n-1} c^{n-1-i}Z_i \tag{6}
\end{align}
Since $E[Z_i] = 0$, the mean function of the $X_n$ process is
$$E[X_n] = c^n E[X_0] + \sum_{i=0}^{n-1} c^{n-1-i}E[Z_i] = c^n E[X_0] \tag{7}$$
Thus, for $X_n$ to be a zero mean process, we require that $E[X_0] = 0$. The autocorrelation function can be written as
$$R_X[n, k] = E[X_n X_{n+k}] = E\left[\left(c^n X_0 + \sum_{i=0}^{n-1} c^{n-1-i}Z_i\right)\left(c^{n+k}X_0 + \sum_{j=0}^{n+k-1} c^{n+k-1-j}Z_j\right)\right] \tag{8}$$
Although it was unstated in the problem, we will assume that $X_0$ is independent of $Z_0, Z_1, \ldots$ so that $E[X_0 Z_i] = 0$.
Since $E[Z_i] = 0$ and $E[Z_i Z_j] = 0$ for $i \ne j$, most of the cross terms will drop out. For $k \ge 0$, the autocorrelation simplifies to
$$R_X[n, k] = c^{2n+k}\,\mathrm{Var}[X_0] + \sum_{i=0}^{n-1} c^{2(n-1)+k-2i}\,\bar{\sigma}^2 = c^{2n+k}\,\mathrm{Var}[X_0] + \bar{\sigma}^2 c^k\,\frac{1 - c^{2n}}{1 - c^2} \tag{9}$$
Since $E[X_n] = 0$, $\mathrm{Var}[X_0] = R_X[0, 0] = \sigma^2$ and we can write for $k \ge 0$,
$$R_X[n, k] = \bar{\sigma}^2\,\frac{c^k}{1 - c^2} + c^{2n+k}\left(\sigma^2 - \frac{\bar{\sigma}^2}{1 - c^2}\right) \tag{10}$$
For $k < 0$, we have
\begin{align}
R_X[n, k] &= E\left[\left(c^n X_0 + \sum_{i=0}^{n-1} c^{n-1-i}Z_i\right)\left(c^{n+k}X_0 + \sum_{j=0}^{n+k-1} c^{n+k-1-j}Z_j\right)\right] \tag{11}\\
&= c^{2n+k}\,\mathrm{Var}[X_0] + c^{-k}\sum_{j=0}^{n+k-1} c^{2(n+k-1-j)}\,\bar{\sigma}^2 \tag{12}\\
&= c^{2n+k}\sigma^2 + \bar{\sigma}^2 c^{-k}\,\frac{1 - c^{2(n+k)}}{1 - c^2} \tag{13}\\
&= \frac{\bar{\sigma}^2}{1 - c^2}\,c^{-k} + c^{2n+k}\left(\sigma^2 - \frac{\bar{\sigma}^2}{1 - c^2}\right) \tag{14}
\end{align}
We see that $R_X[n, k] = \sigma^2 c^{|k|}$ by choosing
$$\bar{\sigma}^2 = (1 - c^2)\sigma^2 \tag{15}$$

Problem 11.2.8 Solution

We can recursively solve for $Y_n$ as follows.
\begin{align}
Y_n &= aX_n + aY_{n-1} \tag{1}\\
&= aX_n + a[aX_{n-1} + aY_{n-2}] \tag{2}\\
&= aX_n + a^2X_{n-1} + a^2[aX_{n-2} + aY_{n-3}] \tag{3}
\end{align}
By continuing the same procedure, we can conclude that
$$Y_n = \sum_{j=0}^{n} a^{j+1}X_{n-j} + a^n Y_0 \tag{4}$$
Since $Y_0 = 0$, the substitution $i = n - j$ yields
$$Y_n = \sum_{i=0}^{n} a^{n-i+1}X_i \tag{5}$$
Now we can calculate the mean
$$E[Y_n] = E\left[\sum_{i=0}^{n} a^{n-i+1}X_i\right] = \sum_{i=0}^{n} a^{n-i+1}E[X_i] = 0 \tag{6}$$
Since $Y_n$ has zero mean, the autocorrelation $R_Y[m, k]$ equals the autocovariance $C_Y[m, k]$. To calculate it, we consider first the case when $k \ge 0$.
$$C_Y[m, k] = E\left[\sum_{i=0}^{m} a^{m-i+1}X_i\sum_{j=0}^{m+k} a^{m+k-j+1}X_j\right] = \sum_{i=0}^{m}\sum_{j=0}^{m+k} a^{m-i+1}a^{m+k-j+1}E[X_i X_j] \tag{7}$$
Since the $X_i$ is a sequence of iid standard normal random variables,
$$E[X_i X_j] = \begin{cases} 1 & i = j \\ 0 & \text{otherwise} \end{cases} \tag{8}$$
Thus, only the $i = j$ terms make a nonzero contribution. This implies
\begin{align}
C_Y[m, k] &= \sum_{i=0}^{m} a^{m-i+1}a^{m+k-i+1} \tag{9}\\
&= a^k\sum_{i=0}^{m} a^{2(m-i+1)} \tag{10}\\
&= a^k\left[(a^2)^{m+1} + (a^2)^m + \cdots + a^2\right] \tag{11}\\
&= \frac{a^2}{1 - a^2}\,a^k\left[1 - (a^2)^{m+1}\right] \tag{12}
\end{align}
For $k \le 0$, we start from
$$C_Y[m, k] = \sum_{i=0}^{m}\sum_{j=0}^{m+k} a^{m-i+1}a^{m+k-j+1}E[X_i X_j] \tag{13}$$
As in the case of $k \ge 0$, only the $i = j$ terms make a contribution.
Also, since m+k ≤ m, C Y [m, k] = m+k ¸ j=0 a m−j+1 a m+k−j+1 = a −k m+k ¸ j=0 a m+k−j+1 a m+k−j+1 (14) By steps quite similar to those for k ≥ 0, we can show that C Y [m, k] = a 2 1 −a 2 a −k 1 −(a 2 ) m+k+1 (15) A general expression that is valid for all m and k would be C Y [m, k] = a 2 1 −a 2 a |k| 1 −(a 2 ) min(m,m+k)+1 (16) Since C Y [m, k] depends on m, the Y n process is not wide sense stationary. 388 Problem 11.3.1 Solution Since the process X n has expected value E[X n ] = 0, we know that C X (k) = R X (k) = 2 −|k| . Thus X = X 1 X 2 X 3 has covariance matrix C X = 2 0 2 −1 2 −2 2 −1 2 0 2 −1 2 −2 2 −1 2 0 ¸ ¸ = 1 1/2 1/4 1/2 1 1/2 1/4 1/2 1 ¸ ¸ . (1) From Definition 5.17, the PDF of X is f X (x) = 1 (2π) n/2 [det (C X )] 1/2 exp − 1 2 x C −1 X x . (2) If we are using Matlab for calculations, it is best to decalre the problem solved at this point. However, if you like algebra, we can write out the PDF in terms of the variables x 1 , x 2 and x 3 . To do so we find that the inverse covariance matrix is C −1 X = 4/3 −2/3 0 −2/3 5/3 −2/3 0 −2/3 4/3 ¸ ¸ (3) A little bit of algebra will show that det(C X ) = 9/16 and that 1 2 x C −1 X x = 2x 2 1 3 + 5x 2 2 6 + 2x 2 3 3 − 2x 1 x 2 3 − 2x 2 x 3 3 . (4) It follows that f X (x) = 4 3(2π) 3/2 exp − 2x 2 1 3 − 5x 2 2 6 − 2x 2 3 3 + 2x 1 x 2 3 + 2x 2 x 3 3 . (5) Problem 11.3.2 Solution The sequence X n is passed through the filter h = h 0 h 1 h 2 = 1 −1 1 (1) The output sequence is Y n . (a) Following the approach of Equation (11.58), we can write the output Y 3 = Y 1 Y 2 Y 3 as Y 3 = Y 1 Y 2 Y 3 ¸ ¸ = h 1 h 0 0 0 h 2 h 1 h 0 0 0 h 2 h 1 h 0 ¸ ¸ X 0 X 1 X 2 X 3 ¸ ¸ ¸ ¸ = −1 1 0 0 1 −1 1 0 0 1 −1 1 ¸ ¸ . .. . H X 0 X 1 X 2 X 3 ¸ ¸ ¸ ¸ . .. . X . (2) We note that the components of X are iid Gaussian (0, 1) random variables. Hence X has covariance matrix C X = I, the identity matrix. Since Y 3 = HX, C Y 3 = HC X H = HH = 2 −2 1 −2 3 −2 1 −2 3 ¸ ¸ . 
(3) 389 Some calculation (by hand or by Matlab) will show that det(C Y 3 ) = 3 and that C −1 Y 3 = 1 3 5 4 1 4 5 2 1 2 2 ¸ ¸ . (4) Some algebra will show that y C −1 Y 3 y = 5y 2 1 + 5y 2 2 + 2y 2 3 + 8y 1 y 2 + 2y 1 y 3 + 4y 2 y 3 3 . (5) This implies Y 3 has PDF f Y 3 (y) = 1 (2π) 3/2 [det (C Y 3 )] 1/2 exp − 1 2 y C −1 Y 3 y (6) = 1 (2π) 3/2 √ 3 exp − 5y 2 1 + 5y 2 2 + 2y 2 3 + 8y 1 y 2 + 2y 1 y 3 + 4y 2 y 3 6 . (7) (b) To find the PDF of Y 2 = Y 1 Y 2 , we start by observing that the covariance matrix of Y 2 is just the upper left 2 2 submatrix of C Y 3 . That is, C Y 2 = ¸ 2 −2 −2 3 and C −1 Y 2 = ¸ 3/2 1 1 1 . (8) Since det(C Y 2 ) = 2, it follows that f Y 2 (y) = 1 (2π) 3/2 [det (C Y 2 )] 1/2 exp − 1 2 y C −1 Y 2 y (9) = 1 (2π) 3/2 √ 2 exp − 3 2 y 2 1 −2y 1 y 2 −y 2 2 . (10) Problem 11.3.3 Solution The sequence X n is passed through the filter h = h 0 h 1 h 2 = 1 −1 1 (1) The output sequence is Y n . Following the approach of Equation (11.58), we can write the output Y = Y 1 Y 2 Y 3 as Y = Y 1 Y 2 Y 3 ¸ ¸ = h 2 h 1 h 0 0 0 0 h 2 h 1 h 0 0 0 0 h 2 h 1 h 0 ¸ ¸ X −1 X 0 X 1 X 2 X 3 ¸ ¸ ¸ ¸ ¸ ¸ = 1 −1 1 0 0 0 1 −1 1 0 0 0 1 −1 1 ¸ ¸ . .. . H X −1 X 0 X 1 X 2 X 3 ¸ ¸ ¸ ¸ ¸ ¸ . .. . X . (2) Since X n has autocovariance function C X (k) = 2 −|k| , X has covariance matrix C X = 1 1/2 1/4 1/8 1/16 1/2 1 1/2 1/4 1/8 1/4 1/2 1 1/2 1/4 1/8 1/4 1/2 1 1/2 1/16 1/8 1/4 1/2 1 ¸ ¸ ¸ ¸ ¸ ¸ . (3) 390 Since Y = HX, C Y = HC X H = 3/2 −3/8 9/16 −3/8 3/2 −3/8 9/16 −3/8 3/2 ¸ ¸ . (4) Some calculation (by hand or preferably by Matlab) will show that det(C Y ) = 675/256 and that C −1 Y = 1 15 12 2 −4 2 11 2 −4 2 12 ¸ ¸ . (5) Some algebra will show that y C −1 Y y = 12y 2 1 + 11y 2 2 + 12y 2 3 + 4y 1 y 2 +−8y 1 y 3 + 4y 2 y 3 15 . (6) This implies Y has PDF f Y (y) = 1 (2π) 3/2 [det (C Y )] 1/2 exp − 1 2 y C −1 Y y (7) = 16 (2π) 3/2 15 √ 3 exp − 12y 2 1 + 11y 2 2 + 12y 2 3 + 4y 1 y 2 +−8y 1 y 3 + 4y 2 y 3 30 . 
(8)

This solution is another demonstration of why the PDF of a Gaussian random vector should be left in vector form.

Comment: We know from Theorem 11.5 that $Y_n$ is a stationary Gaussian process. As a result, the random variables $Y_1$, $Y_2$ and $Y_3$ are identically distributed and $\mathbf{C}_Y$ is a symmetric Toeplitz matrix. This might make one think that the PDF $f_{\mathbf{Y}}(\mathbf{y})$ should be symmetric in the variables $y_1$, $y_2$ and $y_3$. However, because $Y_2$ is in the middle of $Y_1$ and $Y_3$, the information provided by $Y_1$ and $Y_3$ about $Y_2$ is different from the information $Y_1$ and $Y_2$ convey about $Y_3$. This fact appears as asymmetry in $f_{\mathbf{Y}}(\mathbf{y})$.

Problem 11.3.4 Solution

The sequence $X_n$ is passed through the filter

\[ \mathbf{h} = [h_0\ h_1\ h_2]' = [1\ 0\ {-1}]'. \] (1)

The output sequence is $Y_n$. Following the approach of Equation (11.58), we can write the output $\mathbf{Y} = [Y_1\ Y_2\ Y_3]'$ as

\[ \mathbf{Y} = \begin{bmatrix} Y_1 \\ Y_2 \\ Y_3 \end{bmatrix} = \begin{bmatrix} h_2 & h_1 & h_0 & 0 & 0 \\ 0 & h_2 & h_1 & h_0 & 0 \\ 0 & 0 & h_2 & h_1 & h_0 \end{bmatrix} \begin{bmatrix} X_{-1} \\ X_0 \\ X_1 \\ X_2 \\ X_3 \end{bmatrix} = \underbrace{\begin{bmatrix} -1 & 0 & 1 & 0 & 0 \\ 0 & -1 & 0 & 1 & 0 \\ 0 & 0 & -1 & 0 & 1 \end{bmatrix}}_{\mathbf{H}} \underbrace{\begin{bmatrix} X_{-1} \\ X_0 \\ X_1 \\ X_2 \\ X_3 \end{bmatrix}}_{\mathbf{X}}. \] (2)

Since $X_n$ has autocovariance function $C_X(k) = 2^{-|k|}$, $\mathbf{X}$ has the Toeplitz covariance matrix

\[ \mathbf{C}_X = \begin{bmatrix} 1 & 1/2 & 1/4 & 1/8 & 1/16 \\ 1/2 & 1 & 1/2 & 1/4 & 1/8 \\ 1/4 & 1/2 & 1 & 1/2 & 1/4 \\ 1/8 & 1/4 & 1/2 & 1 & 1/2 \\ 1/16 & 1/8 & 1/4 & 1/2 & 1 \end{bmatrix}. \] (3)

Since $\mathbf{Y} = \mathbf{H}\mathbf{X}$,

\[ \mathbf{C}_Y = \mathbf{H} \mathbf{C}_X \mathbf{H}' = \begin{bmatrix} 3/2 & 3/8 & -9/16 \\ 3/8 & 3/2 & 3/8 \\ -9/16 & 3/8 & 3/2 \end{bmatrix}. \] (4)

Some calculation (preferably by Matlab) will show that $\det(\mathbf{C}_Y) = 297/128$ and that

\[ \mathbf{C}_Y^{-1} = \begin{bmatrix} 10/11 & -1/3 & 14/33 \\ -1/3 & 5/6 & -1/3 \\ 14/33 & -1/3 & 10/11 \end{bmatrix}. \] (5)

Some algebra will show that

\[ \mathbf{y}' \mathbf{C}_Y^{-1} \mathbf{y} = \frac{10}{11} y_1^2 + \frac{5}{6} y_2^2 + \frac{10}{11} y_3^2 - \frac{2}{3} y_1 y_2 + \frac{28}{33} y_1 y_3 - \frac{2}{3} y_2 y_3. \] (6)

This implies $\mathbf{Y}$ has PDF

\[ f_{\mathbf{Y}}(\mathbf{y}) = \frac{1}{(2\pi)^{3/2} [\det(\mathbf{C}_Y)]^{1/2}} \exp\left( -\frac{1}{2} \mathbf{y}' \mathbf{C}_Y^{-1} \mathbf{y} \right) \] (7)

\[ = \frac{8\sqrt{2}}{(2\pi)^{3/2}\, 3\sqrt{33}} \exp\left( -\frac{5}{11} y_1^2 - \frac{5}{12} y_2^2 - \frac{5}{11} y_3^2 + \frac{1}{3} y_1 y_2 - \frac{14}{33} y_1 y_3 + \frac{1}{3} y_2 y_3 \right). \] (8)

This solution is yet another demonstration of why the PDF of a Gaussian random vector should be left in vector form.
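The matrix calculations in Problem 11.3.4 are the kind of step that is easier to confirm numerically than by hand. The following is a minimal Matlab sketch, not part of matcode; the variable names (rx, CX, H, CY, CYinv) are chosen here for illustration, and the filter matrix is built directly from the taps h0, h1, h2 of the problem.

```matlab
% Numerical check of the covariance calculation in Problem 11.3.4.
% All variable names are illustrative; none come from matcode.
rx = 2.^(-(0:4));              % C_X[k] = 2^(-|k|) for k = 0,...,4
CX = toeplitz(rx);             % 5x5 symmetric Toeplitz covariance of X
h  = [1 0 -1];                 % filter taps h0, h1, h2
% Rows of H are [h2 h1 h0] shifted right, so that Y = H*X
H  = [h(3) h(2) h(1) 0    0;
      0    h(3) h(2) h(1) 0;
      0    0    h(3) h(2) h(1)];
CY    = H*CX*H';               % covariance matrix of Y
detCY = det(CY);               % compare with 297/128
CYinv = CY\eye(3);             % inverse via backslash rather than inv
disp(CY); disp(detCY); disp(CYinv);
```

Replacing h by [1 -1 1] gives the corresponding check for Problem 11.3.3.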
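Problem 11.4.1 Solution

This problem is solved using Theorem 11.9 with $M = 2$ and $k = 1$. The optimum linear predictor filter $\mathbf{h} = [h_0\ h_1]'$ of $X_{n+1}$ given $\mathbf{X}_n = [X_{n-1}\ X_n]'$ is given by

\[ \overleftarrow{\mathbf{h}} = \begin{bmatrix} h_1 \\ h_0 \end{bmatrix} = \mathbf{R}_{\mathbf{X}_n}^{-1} \mathbf{R}_{\mathbf{X}_n X_{n+k}}, \] (1)

where

\[ \mathbf{R}_{\mathbf{X}_n} = \begin{bmatrix} R_X[0] & R_X[1] \\ R_X[1] & R_X[0] \end{bmatrix} = \begin{bmatrix} 1 & 3/4 \\ 3/4 & 1 \end{bmatrix} \] (2)

and

\[ \mathbf{R}_{\mathbf{X}_n X_{n+1}} = E\left[ \begin{bmatrix} X_{n-1} \\ X_n \end{bmatrix} X_{n+1} \right] = \begin{bmatrix} R_X[2] \\ R_X[1] \end{bmatrix} = \begin{bmatrix} 1/2 \\ 3/4 \end{bmatrix}. \] (3)

Thus the filter vector $\mathbf{h}$ satisfies

\[ \overleftarrow{\mathbf{h}} = \begin{bmatrix} h_1 \\ h_0 \end{bmatrix} = \begin{bmatrix} 1 & 3/4 \\ 3/4 & 1 \end{bmatrix}^{-1} \begin{bmatrix} 1/2 \\ 3/4 \end{bmatrix} = \begin{bmatrix} -1/7 \\ 6/7 \end{bmatrix}. \] (4)

Thus $\mathbf{h} = [6/7\ {-1/7}]'$ and the optimum linear predictor of $X_{n+1}$ given $X_n$ and $X_{n-1}$ is

\[ \hat{X}_{n+1} = \overleftarrow{\mathbf{h}}' \mathbf{X}_n = \begin{bmatrix} -\frac{1}{7} & \frac{6}{7} \end{bmatrix} \begin{bmatrix} X_{n-1} \\ X_n \end{bmatrix} = -\frac{1}{7} X_{n-1} + \frac{6}{7} X_n. \] (5)

To find the mean square error of this predictor, we can calculate it directly as

\[ e_L^* = E\left[ (\hat{X}_{n+1} - X_{n+1})^2 \right] = E\left[ \left( -\frac{1}{7} X_{n-1} + \frac{6}{7} X_n - X_{n+1} \right)^2 \right]. \] (6)

By expanding the square and taking the expectation, each cross-term is of the form $E[X_i X_j] = R_X[i-j]$, so that

\[ e_L^* = E\left[ \left( -\frac{1}{7} X_{n-1} + \frac{6}{7} X_n - X_{n+1} \right) \left( -\frac{1}{7} X_{n-1} + \frac{6}{7} X_n - X_{n+1} \right) \right] \] (7)

\[ = \frac{1}{49} R_X[0] - \frac{12}{49} R_X[1] + \frac{2}{7} R_X[2] + \frac{36}{49} R_X[0] - \frac{12}{7} R_X[1] + R_X[0] \] (8)

\[ = \frac{86}{49} R_X[0] - \frac{96}{49} R_X[1] + \frac{2}{7} R_X[2] = \frac{3}{7}. \] (9)

This direct method is already tedious, even for a simple filter of order $M = 2$. A better way to calculate the mean square error is to recall that Theorem 11.9 is just Theorem 9.7 expressed in the terminology of filters. Expressing part (c) of Theorem 9.7 in terms of the linear prediction filter $\mathbf{h}$, the mean square error of the predictor is

\[ e_L^* = \mathrm{Var}[X_{n+1}] - \overleftarrow{\mathbf{h}}' \mathbf{R}_{\mathbf{X}_n X_{n+k}} \] (10)

\[ = R_X[0] - \overleftarrow{\mathbf{h}}' \begin{bmatrix} R_X[2] \\ R_X[1] \end{bmatrix} \] (11)

\[ = 1 - \begin{bmatrix} -1/7 & 6/7 \end{bmatrix} \begin{bmatrix} 1/2 \\ 3/4 \end{bmatrix} = \frac{3}{7}. \] (12)

For an arbitrary filter order $M$, Equation (10) is a much simpler way to compute the mean square error.

Problem 11.4.2 Solution

This problem is solved using Theorem 11.9 with $k = 1$.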
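The optimum linear predictor filter $\mathbf{h} = [h_0\ h_1]'$ of $X_{n+1}$ given $\mathbf{X}_n = [X_{n-1}\ X_n]'$ is given by

\[ \overleftarrow{\mathbf{h}} = \begin{bmatrix} h_1 \\ h_0 \end{bmatrix} = \mathbf{R}_{\mathbf{X}_n}^{-1} \mathbf{R}_{\mathbf{X}_n X_{n+k}}, \] (1)

where

\[ \mathbf{R}_{\mathbf{X}_n} = \begin{bmatrix} R_X[0] & R_X[1] \\ R_X[1] & R_X[0] \end{bmatrix} = \begin{bmatrix} 1.1 & 0.75 \\ 0.75 & 1.1 \end{bmatrix} \] (2)

and

\[ \mathbf{R}_{\mathbf{X}_n X_{n+1}} = E\left[ \begin{bmatrix} X_{n-1} \\ X_n \end{bmatrix} X_{n+1} \right] = \begin{bmatrix} R_X[2] \\ R_X[1] \end{bmatrix} = \begin{bmatrix} 0.5 \\ 0.75 \end{bmatrix}. \] (3)

Thus the filter vector $\mathbf{h}$ satisfies

\[ \overleftarrow{\mathbf{h}} = \begin{bmatrix} h_1 \\ h_0 \end{bmatrix} = \begin{bmatrix} 1.1 & 0.75 \\ 0.75 & 1.1 \end{bmatrix}^{-1} \begin{bmatrix} 0.5 \\ 0.75 \end{bmatrix} = \begin{bmatrix} -0.0193 \\ 0.6950 \end{bmatrix}. \] (4)

Thus $\mathbf{h} = [0.6950\ {-0.0193}]'$ and the optimum linear predictor of $X_{n+1}$ given $X_n$ and $X_{n-1}$ is

\[ \hat{X}_{n+1} = \overleftarrow{\mathbf{h}}' \mathbf{X}_n = \begin{bmatrix} -0.0193 & 0.6950 \end{bmatrix} \begin{bmatrix} X_{n-1} \\ X_n \end{bmatrix} = -0.0193 X_{n-1} + 0.6950 X_n. \] (5)

To find the mean square error of this predictor, we can calculate it directly as

\[ e_L^* = E\left[ (\hat{X}_{n+1} - X_{n+1})^2 \right] = E\left[ (-0.0193 X_{n-1} + 0.6950 X_n - X_{n+1})^2 \right]. \] (6)

We can expand the square and take the expectation term by term since each cross-term is of the form $E[X_i X_j] = R_X[i-j]$. This approach is followed in the solution of Problem 11.4.1 and it is quite tedious. A better way to calculate the mean square error is to recall that Theorem 11.9 is just Theorem 9.7 expressed in the terminology of filters. Expressing part (c) of Theorem 9.7 in terms of the linear prediction filter $\mathbf{h}$, the mean square error of the predictor is

\[ e_L^* = \mathrm{Var}[X_{n+1}] - \overleftarrow{\mathbf{h}}' \mathbf{R}_{\mathbf{X}_n X_{n+k}} \] (7)

\[ = R_X[0] - \overleftarrow{\mathbf{h}}' \begin{bmatrix} R_X[2] \\ R_X[1] \end{bmatrix} \] (8)

\[ = 1.1 - \begin{bmatrix} -0.0193 & 0.6950 \end{bmatrix} \begin{bmatrix} 1/2 \\ 3/4 \end{bmatrix} = 0.5884. \] (9)

Comment: It is instructive to compare this solution to the solution of Problem 11.4.1 where the random process, denoted $\tilde{X}_n$ here to distinguish it from $X_n$ in this problem, has autocorrelation function

\[ R_{\tilde{X}}[k] = \begin{cases} 1 - 0.25|k| & |k| \le 4, \\ 0 & \text{otherwise.} \end{cases} \] (10)

The difference is simply that $R_{\tilde{X}}[0] = 1$, rather than $R_X[0] = 1.1$ as in this problem. This difference corresponds to adding an iid noise sequence to $\tilde{X}_n$ to create $X_n$. That is,

\[ X_n = \tilde{X}_n + Z_n \] (11)

where $Z_n$ is an iid additive noise sequence with autocorrelation function $R_Z[k] = 0.1\delta[k]$ that is independent of the $\tilde{X}_n$ process.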
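Thus $X_n$ in this problem can be viewed as a noisy version of $\tilde{X}_n$ in Problem 11.4.1. Because the $\tilde{X}_n$ process is less noisy, the optimal predictor filter of $\tilde{X}_{n+1}$ given $\tilde{X}_{n-1}$ and $\tilde{X}_n$ is $\tilde{\mathbf{h}} = [6/7\ {-1/7}]' = [0.8571\ {-0.1429}]'$, which places more emphasis on the current value $\tilde{X}_n$ in predicting the next value. In addition, the mean squared error of the predictor of $\tilde{X}_{n+1}$ is only $3/7 = 0.4286$, which is less than 0.5884. Not surprisingly, the noise in the $X_n$ process reduces the performance of the predictor.

Problem 11.4.3 Solution

This problem generalizes Example 11.14 in that $-0.9$ is replaced by the parameter $c$ and the noise variance 0.2 is replaced by $\eta^2$. Because we are only finding the first order filter $\mathbf{h} = [h_0\ h_1]'$, it is relatively simple to generalize the solution of Example 11.14 to the parameter values $c$ and $\eta^2$.

Based on the observation $\mathbf{Y} = [Y_{n-1}\ Y_n]'$, Theorem 11.11 states that the linear MMSE estimate of $X = X_n$ is $\overleftarrow{\mathbf{h}}' \mathbf{Y}$ where

\[ \overleftarrow{\mathbf{h}} = \mathbf{R}_{\mathbf{Y}}^{-1} \mathbf{R}_{\mathbf{Y} X_n} = (\mathbf{R}_{\mathbf{X}_n} + \mathbf{R}_{\mathbf{W}_n})^{-1} \mathbf{R}_{\mathbf{X}_n X_n}. \] (1)

From Equation (11.82), $\mathbf{R}_{\mathbf{X}_n X_n} = [R_X[1]\ R_X[0]]' = [c\ 1]'$. From the problem statement,

\[ \mathbf{R}_{\mathbf{X}_n} + \mathbf{R}_{\mathbf{W}_n} = \begin{bmatrix} 1 & c \\ c & 1 \end{bmatrix} + \begin{bmatrix} \eta^2 & 0 \\ 0 & \eta^2 \end{bmatrix} = \begin{bmatrix} 1 + \eta^2 & c \\ c & 1 + \eta^2 \end{bmatrix}. \] (2)

This implies

\[ \overleftarrow{\mathbf{h}} = \begin{bmatrix} 1 + \eta^2 & c \\ c & 1 + \eta^2 \end{bmatrix}^{-1} \begin{bmatrix} c \\ 1 \end{bmatrix} \] (3)

\[ = \frac{1}{(1 + \eta^2)^2 - c^2} \begin{bmatrix} 1 + \eta^2 & -c \\ -c & 1 + \eta^2 \end{bmatrix} \begin{bmatrix} c \\ 1 \end{bmatrix} \] (4)

\[ = \frac{1}{(1 + \eta^2)^2 - c^2} \begin{bmatrix} c\eta^2 \\ 1 + \eta^2 - c^2 \end{bmatrix}. \] (5)

The optimal filter is

\[ \mathbf{h} = \frac{1}{(1 + \eta^2)^2 - c^2} \begin{bmatrix} 1 + \eta^2 - c^2 \\ c\eta^2 \end{bmatrix}. \] (6)

To find the mean square error of this estimator, we recall that Theorem 11.11 is just Theorem 9.7 expressed in the terminology of filters. Expressing part (c) of Theorem 9.7 in terms of the linear estimation filter $\mathbf{h}$, the mean square error of the estimator is

\[ e_L^* = \mathrm{Var}[X_n] - \overleftarrow{\mathbf{h}}' \mathbf{R}_{\mathbf{Y}_n X_n} \] (7)

\[ = \mathrm{Var}[X_n] - \overleftarrow{\mathbf{h}}' \mathbf{R}_{\mathbf{X}_n X_n} \] (8)

\[ = R_X[0] - \overleftarrow{\mathbf{h}}' \begin{bmatrix} c \\ 1 \end{bmatrix} \] (9)

\[ = 1 - \frac{c^2\eta^2 + \eta^2 + 1 - c^2}{(1 + \eta^2)^2 - c^2}. \] (10)

Note that we always find that $e_L^* < \mathrm{Var}[X_n] = 1$ simply because the optimal estimator cannot be worse than the blind estimator that ignores the observation $\mathbf{Y}_n$.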
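As a sanity check on the closed-form results of Problem 11.4.3, the filter and its mean square error can be computed numerically for one pair of parameter values. The sketch below uses $c = -0.9$ and $\eta^2 = 0.2$, the values the solution attributes to Example 11.14; the variable names (RY, RYX, hrev, hClosed, eClosed) are illustrative and not taken from matcode.

```matlab
% Sketch: compare the numerical and closed-form filter of Problem 11.4.3.
% c and eta2 are the Example 11.14 values quoted in the solution;
% all variable names are illustrative.
c = -0.9;  eta2 = 0.2;
RY   = [1+eta2, c; c, 1+eta2];    % R_Xn + R_Wn
RYX  = [c; 1];                    % [R_X[1]; R_X[0]]
hrev = RY\RYX;                    % reversed filter [h1; h0]
h    = flipud(hrev);              % filter h = [h0; h1]
e    = 1 - hrev'*RYX;             % MSE = R_X[0] - hrev'*RYX
D       = (1+eta2)^2 - c^2;       % common denominator
hClosed = [1+eta2-c^2; c*eta2]/D; % closed-form filter
eClosed = 1 - (c^2*eta2 + eta2 + 1 - c^2)/D;  % closed-form MSE
disp([h hClosed]); disp([e eClosed]);
```

The two columns of the first display and the two entries of the second should agree to within rounding error, and the mean square error should be less than 1.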
Problem 11.4.4 Solution

In this problem, we find the mean square estimation error of the optimal first order filter in Problem 11.4.3. This problem highlights a shortcoming of Theorem 11.11 in that the theorem doesn't explicitly provide the mean square error associated with the optimal filter $\mathbf{h}$. We recall from the discussion at the start of Section 11.4 that Theorem 11.11 is derived from Theorem 9.7 with

\[ \overleftarrow{\mathbf{h}} = \mathbf{a} = \mathbf{R}_{\mathbf{Y}}^{-1} \mathbf{R}_{\mathbf{Y} X} = (\mathbf{R}_{\mathbf{X}_n} + \mathbf{R}_{\mathbf{W}_n})^{-1} \mathbf{R}_{\mathbf{X}_n X_n}. \] (1)

From Theorem 9.7, the mean square error of the filter output is

\[ e_L^* = \mathrm{Var}[X] - \mathbf{a}' \mathbf{R}_{\mathbf{Y} X} \] (2)

\[ = R_X[0] - \overleftarrow{\mathbf{h}}' \mathbf{R}_{\mathbf{X}_n X_n} \] (3)

\[ = R_X[0] - \mathbf{R}_{\mathbf{X}_n X_n}' (\mathbf{R}_{\mathbf{X}_n} + \mathbf{R}_{\mathbf{W}_n})^{-1} \mathbf{R}_{\mathbf{X}_n X_n}. \] (4)

Equations (3) and (4) are general expressions for the mean square error of the optimal linear filter that can be applied to any situation described by Theorem 11.11.

To apply this result to the problem at hand, we observe that $R_X[0] = c^0 = 1$ and that

\[ \overleftarrow{\mathbf{h}} = \frac{1}{(1 + \eta^2)^2 - c^2} \begin{bmatrix} c\eta^2 \\ 1 + \eta^2 - c^2 \end{bmatrix}, \qquad \mathbf{R}_{\mathbf{X}_n X_n} = \begin{bmatrix} R_X[1] \\ R_X[0] \end{bmatrix} = \begin{bmatrix} c \\ 1 \end{bmatrix}. \] (5)

This implies

\[ e_L^* = R_X[0] - \overleftarrow{\mathbf{h}}' \mathbf{R}_{\mathbf{X}_n X_n} \] (6)

\[ = 1 - \frac{1}{(1 + \eta^2)^2 - c^2} \begin{bmatrix} c\eta^2 & 1 + \eta^2 - c^2 \end{bmatrix} \begin{bmatrix} c \\ 1 \end{bmatrix} \] (7)

\[ = 1 - \frac{c^2\eta^2 + 1 + \eta^2 - c^2}{(1 + \eta^2)^2 - c^2} \] (8)

\[ = \eta^2\, \frac{1 + \eta^2 - c^2}{(1 + \eta^2)^2 - c^2}. \] (9)

The remaining question is what value of $c$ minimizes the mean square error $e_L^*$. The usual approach is to set the derivative $de_L^*/dc$ to zero. This would yield the incorrect answer $c = 0$. In fact, evaluating the second derivative at $c = 0$ shows that $\left. d^2 e_L^*/dc^2 \right|_{c=0} < 0$. Thus the mean square error $e_L^*$ is maximum at $c = 0$. For a more careful analysis, we observe that $e_L^* = \eta^2 f(x)$ where

\[ f(x) = \frac{a - x}{a^2 - x}, \] (10)

with $x = c^2$ and $a = 1 + \eta^2$. In this case, minimizing $f(x)$ is equivalent to minimizing the mean square error. Note that for $R_X[k]$ to be a respectable autocorrelation function, we must have $|c| \le 1$. Thus we consider only values of $x$ in the interval $0 \le x \le 1$. We observe that

\[ \frac{df(x)}{dx} = -\frac{a^2 - a}{(a^2 - x)^2}. \] (11)

Since $a > 1$, the derivative is negative for $0 \le x \le 1$.
This implies the mean square error is minimized by making $x$ as large as possible, i.e., $x = 1$. Thus $c = 1$ minimizes the mean square error. In fact $c = 1$ corresponds to the autocorrelation function $R_X[k] = 1$ for all $k$. Since each $X_n$ has zero expected value, every pair of samples $X_n$ and $X_m$ has correlation coefficient

\[ \rho_{X_n,X_m} = \frac{\mathrm{Cov}[X_n, X_m]}{\sqrt{\mathrm{Var}[X_n]\,\mathrm{Var}[X_m]}} = \frac{R_X[n-m]}{R_X[0]} = 1. \] (12)

That is, $c = 1$ corresponds to a degenerate process in which every pair of samples $X_n$ and $X_m$ is perfectly correlated. Physically, this corresponds to the case where the random process $X_n$ is generated by generating a sample of a random variable $X$ and setting $X_n = X$ for all $n$. The observations are then of the form $Y_n = X + Z_n$. That is, each observation is just a noisy observation of the random variable $X$. For $c = 1$, the optimal filter

\[ \mathbf{h} = \frac{1}{2 + \eta^2} \begin{bmatrix} 1 \\ 1 \end{bmatrix} \] (13)

is just an equally weighted average of the past two samples.

Problem 11.4.5 Solution

This problem involves both linear estimation and prediction because we are using $Y_{n-1}$, the noisy observation of $X_{n-1}$, to estimate the future value $X_n$. Thus we can't follow the cookbook recipes of Theorem 11.9 or Theorem 11.11. Instead we go back to Theorem 9.4 to find the minimum mean square error linear estimator. In that theorem, $X_n$ and $Y_{n-1}$ play the roles of $X$ and $Y$. That is, our estimate $\hat{X}_n$ of $X_n$ is

\[ \hat{X}_n = \hat{X}_L(Y_{n-1}) = \rho_{X_n,Y_{n-1}} \left( \frac{\mathrm{Var}[X_n]}{\mathrm{Var}[Y_{n-1}]} \right)^{1/2} (Y_{n-1} - E[Y_{n-1}]) + E[X_n]. \] (1)

By recursive application of $X_n = cX_{n-1} + Z_{n-1}$, we obtain

\[ X_n = c^n X_0 + \sum_{j=1}^{n} c^{j-1} Z_{n-j}. \] (2)

The expected value of $X_n$ is $E[X_n] = c^n E[X_0] + \sum_{j=1}^{n} c^{j-1} E[Z_{n-j}] = 0$. The variance of $X_n$ is

\[ \mathrm{Var}[X_n] = c^{2n}\,\mathrm{Var}[X_0] + \sum_{j=1}^{n} [c^{j-1}]^2\,\mathrm{Var}[Z_{n-j}] = c^{2n}\,\mathrm{Var}[X_0] + \sigma^2 \sum_{j=1}^{n} [c^2]^{j-1}. \] (3)

Since $\mathrm{Var}[X_0] = \sigma^2/(1 - c^2)$, we obtain

\[ \mathrm{Var}[X_n] = \frac{c^{2n}\sigma^2}{1 - c^2} + \frac{\sigma^2 (1 - c^{2n})}{1 - c^2} = \frac{\sigma^2}{1 - c^2}. \] (4)

Note that $E[Y_{n-1}] = dE[X_{n-1}] + E[W_{n-1}] = 0$.
The variance of $Y_{n-1}$ is

\[ \mathrm{Var}[Y_{n-1}] = d^2\,\mathrm{Var}[X_{n-1}] + \mathrm{Var}[W_{n-1}] = \frac{d^2\sigma^2}{1 - c^2} + \eta^2. \] (5)

Since $X_n$ and $Y_{n-1}$ have zero mean, the covariance of $X_n$ and $Y_{n-1}$ is

\[ \mathrm{Cov}[X_n, Y_{n-1}] = E[X_n Y_{n-1}] = E[(cX_{n-1} + Z_{n-1})(dX_{n-1} + W_{n-1})]. \] (6)

From the problem statement, we learn that

\[ E[X_{n-1} W_{n-1}] = E[X_{n-1}]E[W_{n-1}] = 0, \qquad E[Z_{n-1} X_{n-1}] = 0, \qquad E[Z_{n-1} W_{n-1}] = 0. \]

Hence, the covariance of $X_n$ and $Y_{n-1}$ is

\[ \mathrm{Cov}[X_n, Y_{n-1}] = cd\,\mathrm{Var}[X_{n-1}]. \] (7)

The correlation coefficient of $X_n$ and $Y_{n-1}$ is

\[ \rho_{X_n,Y_{n-1}} = \frac{\mathrm{Cov}[X_n, Y_{n-1}]}{\sqrt{\mathrm{Var}[X_n]\,\mathrm{Var}[Y_{n-1}]}}. \] (8)

Since $E[Y_{n-1}]$ and $E[X_n]$ are zero, the linear predictor for $X_n$ becomes

\[ \hat{X}_n = \rho_{X_n,Y_{n-1}} \left( \frac{\mathrm{Var}[X_n]}{\mathrm{Var}[Y_{n-1}]} \right)^{1/2} Y_{n-1} = \frac{\mathrm{Cov}[X_n, Y_{n-1}]}{\mathrm{Var}[Y_{n-1}]} Y_{n-1} = \frac{cd\,\mathrm{Var}[X_{n-1}]}{\mathrm{Var}[Y_{n-1}]} Y_{n-1}. \] (9)

Substituting the above results for $\mathrm{Var}[X_{n-1}]$ and $\mathrm{Var}[Y_{n-1}]$, we obtain the optimal linear predictor of $X_n$ given $Y_{n-1}$:

\[ \hat{X}_n = \frac{c}{d}\, \frac{1}{1 + \beta^2 (1 - c^2)}\, Y_{n-1}, \] (10)

where $\beta^2 = \eta^2/(d^2\sigma^2)$. From Theorem 9.4, the mean square estimation error at step $n$ is

\[ e_L^*(n) = E[(X_n - \hat{X}_n)^2] = \mathrm{Var}[X_n](1 - \rho_{X_n,Y_{n-1}}^2) = \sigma^2\, \frac{1 + \beta^2}{1 + \beta^2 (1 - c^2)}. \] (11)

We see that the mean square estimation error $e_L^*(n) = e_L^*$ is a constant for all $n$. In addition, $e_L^*$ is an increasing function of $\beta$.

Problem 11.5.1 Solution

To use Table 11.1, we write $R_X(\tau)$ in terms of the function

\[ \mathrm{sinc}(x) = \frac{\sin(\pi x)}{\pi x}. \] (1)

In terms of the $\mathrm{sinc}(\cdot)$ function, we obtain

\[ R_X(\tau) = 10\,\mathrm{sinc}(2000\tau) + 5\,\mathrm{sinc}(1000\tau). \] (2)

From Table 11.1,

\[ S_X(f) = \frac{10}{2{,}000}\,\mathrm{rect}\left( \frac{f}{2{,}000} \right) + \frac{5}{1{,}000}\,\mathrm{rect}\left( \frac{f}{1{,}000} \right). \] (3)

Here is a graph of the PSD.

[Figure: plot of the PSD $S_X(f)$ versus $f$ for $-1500 \le f \le 1500$.]

Problem 11.5.2 Solution

The process $Y(t)$ has expected value $E[Y(t)] = 0$. The autocorrelation of $Y(t)$ is

\[ R_Y(t, \tau) = E[Y(t)Y(t + \tau)] = E[X(\alpha t)X(\alpha(t + \tau))] = R_X(\alpha\tau). \] (1)

Thus $Y(t)$ is wide sense stationary. The power spectral density is

\[ S_Y(f) = \int_{-\infty}^{\infty} R_X(\alpha\tau) e^{-j2\pi f\tau}\, d\tau. \] (2)

At this point, we consider the cases $\alpha > 0$ and $\alpha < 0$ separately.
For $\alpha > 0$, the substitution $\tau' = \alpha\tau$ yields

\[ S_Y(f) = \frac{1}{\alpha} \int_{-\infty}^{\infty} R_X(\tau') e^{-j2\pi (f/\alpha)\tau'}\, d\tau' = \frac{S_X(f/\alpha)}{\alpha}. \] (3)

When $\alpha < 0$, we start with Equation (2) and make the substitution $\tau' = -\alpha\tau$, yielding

\[ S_Y(f) = \frac{1}{-\alpha} \int_{-\infty}^{\infty} R_X(-\tau') e^{-j2\pi \frac{f}{-\alpha} \tau'}\, d\tau'. \] (4)

Since $R_X(-\tau') = R_X(\tau')$,

\[ S_Y(f) = \frac{1}{-\alpha} \int_{-\infty}^{\infty} R_X(\tau') e^{-j2\pi \frac{f}{-\alpha} \tau'}\, d\tau' = \frac{1}{-\alpha} S_X\left( \frac{f}{-\alpha} \right). \] (5)

Since $-\alpha = |\alpha|$ for $\alpha < 0$, we can combine the $\alpha > 0$ and $\alpha < 0$ cases in the expression

\[ S_Y(f) = \frac{1}{|\alpha|} S_X\left( \frac{f}{|\alpha|} \right). \] (6)

Problem 11.6.1 Solution

Since the random sequence $X_n$ has autocorrelation function

\[ R_X[k] = \delta_k + (0.1)^{|k|}, \] (1)

we can find the PSD directly from Table 11.2 with $0.1^{|k|}$ corresponding to $a^{|k|}$. The table yields

\[ S_X(\phi) = 1 + \frac{1 - (0.1)^2}{1 + (0.1)^2 - 2(0.1)\cos 2\pi\phi} = \frac{2 - 0.2\cos 2\pi\phi}{1.01 - 0.2\cos 2\pi\phi}. \] (2)

Problem 11.7.1 Solution

First we show that $S_{YX}(f) = S_{XY}(-f)$. From the definition of the cross spectral density,

\[ S_{YX}(f) = \int_{-\infty}^{\infty} R_{YX}(\tau) e^{-j2\pi f\tau}\, d\tau. \] (1)

Making the substitution $\tau' = -\tau$ yields

\[ S_{YX}(f) = \int_{-\infty}^{\infty} R_{YX}(-\tau') e^{j2\pi f\tau'}\, d\tau'. \] (2)

By Theorem 10.14, $R_{YX}(-\tau') = R_{XY}(\tau')$. This implies

\[ S_{YX}(f) = \int_{-\infty}^{\infty} R_{XY}(\tau') e^{-j2\pi(-f)\tau'}\, d\tau' = S_{XY}(-f). \] (3)

To complete the problem, we need to show that $S_{XY}(-f) = [S_{XY}(f)]^*$. First we note that since $R_{XY}(\tau)$ is real valued, $[R_{XY}(\tau)]^* = R_{XY}(\tau)$. This implies

\[ [S_{XY}(f)]^* = \int_{-\infty}^{\infty} [R_{XY}(\tau)]^* [e^{-j2\pi f\tau}]^*\, d\tau \] (4)

\[ = \int_{-\infty}^{\infty} R_{XY}(\tau) e^{-j2\pi(-f)\tau}\, d\tau \] (5)

\[ = S_{XY}(-f). \] (6)

Problem 11.8.1 Solution

Let $a = 1/RC$. The solution to this problem parallels Example 11.22.

(a) From Table 11.1, we observe that

\[ S_X(f) = \frac{2 \cdot 10^4}{(2\pi f)^2 + 10^4}, \qquad H(f) = \frac{1}{a + j2\pi f}. \] (1)

By Theorem 11.16,

\[ S_Y(f) = |H(f)|^2 S_X(f) = \frac{2 \cdot 10^4}{[(2\pi f)^2 + a^2][(2\pi f)^2 + 10^4]}. \] (2)

To find $R_Y(\tau)$, we use a form of partial fractions expansion to write

\[ S_Y(f) = \frac{A}{(2\pi f)^2 + a^2} + \frac{B}{(2\pi f)^2 + 10^4}. \] (3)

Note that this method will work only if $a \ne 100$. This same method was also used in Example 11.22.
The values of $A$ and $B$ can be found by

\[ A = \left. \frac{2 \cdot 10^4}{(2\pi f)^2 + 10^4} \right|_{f = \frac{ja}{2\pi}} = \frac{-2 \cdot 10^4}{a^2 - 10^4}, \qquad B = \left. \frac{2 \cdot 10^4}{(2\pi f)^2 + a^2} \right|_{f = \frac{j100}{2\pi}} = \frac{2 \cdot 10^4}{a^2 - 10^4}. \] (4)

This implies the output power spectral density is

\[ S_Y(f) = \frac{-10^4/a}{a^2 - 10^4}\, \frac{2a}{(2\pi f)^2 + a^2} + \frac{100}{a^2 - 10^4}\, \frac{200}{(2\pi f)^2 + 10^4}. \] (5)

Since $e^{-c|\tau|}$ and $2c/((2\pi f)^2 + c^2)$ are Fourier transform pairs for any constant $c > 0$, we see that

\[ R_Y(\tau) = \frac{-10^4/a}{a^2 - 10^4} e^{-a|\tau|} + \frac{100}{a^2 - 10^4} e^{-100|\tau|}. \] (6)

(b) To find $a = 1/(RC)$, we use the fact that

\[ E[Y^2(t)] = 100 = R_Y(0) = \frac{-10^4/a}{a^2 - 10^4} + \frac{100}{a^2 - 10^4}. \] (7)

Rearranging, we find that $a$ must satisfy

\[ a^3 - (10^4 + 1)a + 100 = 0. \] (8)

This cubic polynomial has three roots:

\[ a = 100, \qquad a = -50 + \sqrt{2501}, \qquad a = -50 - \sqrt{2501}. \] (9)

Recall that $a = 100$ is not a valid solution because our expansion of $S_Y(f)$ was not valid for $a = 100$. Also, we require $a > 0$ in order to take the inverse transform of $S_Y(f)$. Thus $a = -50 + \sqrt{2501} \approx 0.01$ and $RC \approx 100$.

Problem 11.8.2 Solution

(a) $R_W(\tau) = \delta(\tau)$ is the autocorrelation function whose Fourier transform is $S_W(f) = 1$.
(b) The output $Y(t)$ has power spectral density

\[ S_Y(f) = |H(f)|^2 S_W(f) = |H(f)|^2. \] (1)

(c) Since $|H(f)| = 1$ for $f \in [-B, B]$, the average power of $Y(t)$ is

\[ E[Y^2(t)] = \int_{-\infty}^{\infty} S_Y(f)\, df = \int_{-B}^{B} df = 2B. \] (2)

(d) Since the white noise $W(t)$ has zero mean, the mean value of the filter output is

\[ E[Y(t)] = E[W(t)]\, H(0) = 0. \] (3)

Problem 11.8.3 Solution

Since $S_Y(f) = |H(f)|^2 S_X(f)$, we first find

\[ |H(f)|^2 = H(f)H^*(f) \] (1)

\[ = \left( a_1 e^{-j2\pi f t_1} + a_2 e^{-j2\pi f t_2} \right) \left( a_1 e^{j2\pi f t_1} + a_2 e^{j2\pi f t_2} \right) \] (2)

\[ = a_1^2 + a_2^2 + a_1 a_2 \left( e^{-j2\pi f(t_2 - t_1)} + e^{-j2\pi f(t_1 - t_2)} \right). \] (3)

It follows that the output power spectral density is

\[ S_Y(f) = (a_1^2 + a_2^2) S_X(f) + a_1 a_2 S_X(f)\, e^{-j2\pi f(t_2 - t_1)} + a_1 a_2 S_X(f)\, e^{-j2\pi f(t_1 - t_2)}. \] (4)

Using Table 11.1, the autocorrelation of the output is

\[ R_Y(\tau) = (a_1^2 + a_2^2) R_X(\tau) + a_1 a_2 \left( R_X(\tau - (t_1 - t_2)) + R_X(\tau + (t_1 - t_2)) \right). \] (5)

Problem 11.8.4 Solution

(a) The average power of the input is

\[ E[X^2(t)] = R_X(0) = 1. \] (1)

(b) From Table 11.1, the input has power spectral density

\[ S_X(f) = \frac{1}{2} e^{-\pi f^2/4}. \] (2)

The output power spectral density is

\[ S_Y(f) = |H(f)|^2 S_X(f) = \begin{cases} \frac{1}{2} e^{-\pi f^2/4} & |f| \le 2, \\ 0 & \text{otherwise.} \end{cases} \] (3)

(c) The average output power is

\[ E[Y^2(t)] = \int_{-\infty}^{\infty} S_Y(f)\, df = \frac{1}{2} \int_{-2}^{2} e^{-\pi f^2/4}\, df. \] (4)

This integral cannot be expressed in closed form. However, we can express it in the form of the integral of a standardized Gaussian PDF by making the substitution $f = z\sqrt{2/\pi}$. With this substitution,

\[ E[Y^2(t)] = \frac{1}{\sqrt{2\pi}} \int_{-\sqrt{2\pi}}^{\sqrt{2\pi}} e^{-z^2/2}\, dz \] (5)

\[ = \Phi(\sqrt{2\pi}) - \Phi(-\sqrt{2\pi}) \] (6)

\[ = 2\Phi(\sqrt{2\pi}) - 1 \approx 0.988. \] (7)

The output power almost equals the input power because the filter bandwidth is sufficiently wide to pass through nearly all of the power of the input.
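The output power in Problem 11.8.4(c) can also be evaluated numerically instead of with a table of $\Phi(\cdot)$. The sketch below assumes only base Matlab functions (integral and erf) and uses the identity $2\Phi(x) - 1 = \mathrm{erf}(x/\sqrt{2})$; the names SY, P_num and P_erf are illustrative.

```matlab
% Sketch: evaluate the average output power of Problem 11.8.4(c) two ways.
% SY, P_num and P_erf are illustrative names.
SY    = @(f) 0.5*exp(-pi*f.^2/4);    % output PSD for |f| <= 2
P_num = integral(SY, -2, 2);         % direct numerical integration
% 2*Phi(x) - 1 = erf(x/sqrt(2)); with x = sqrt(2*pi) this is erf(sqrt(pi))
P_erf = erf(sqrt(pi));
fprintf('numerical: %.4f, via erf: %.4f\n', P_num, P_erf);
```

Both values should agree and come out slightly below the input power of 1.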
Problem 11.8.5 Solution

(a) From Theorem 11.13(b),

\[ E[X^2(t)] = \int_{-\infty}^{\infty} S_X(f)\, df = \int_{-100}^{100} 10^{-4}\, df = 0.02. \] (1)

(b) From Theorem 11.17,

\[ S_{XY}(f) = H(f) S_X(f) = \begin{cases} 10^{-4} H(f) & |f| \le 100, \\ 0 & \text{otherwise.} \end{cases} \] (2)

(c) From Theorem 10.14,

\[ R_{YX}(\tau) = R_{XY}(-\tau). \] (3)

From Table 11.1, if $g(\tau)$ and $G(f)$ are a Fourier transform pair, then $g(-\tau)$ and $G^*(f)$ are a Fourier transform pair. This implies

\[ S_{YX}(f) = S_{XY}^*(f) = \begin{cases} 10^{-4} H^*(f) & |f| \le 100, \\ 0 & \text{otherwise.} \end{cases} \] (4)

(d) By Theorem 11.17,

\[ S_Y(f) = H^*(f) S_{XY}(f) = |H(f)|^2 S_X(f) \] (5)

\[ = \begin{cases} 10^{-4}/[10^4\pi^2 + (2\pi f)^2] & |f| \le 100, \\ 0 & \text{otherwise.} \end{cases} \] (6)

(e) By Theorem 11.13,

\[ E[Y^2(t)] = \int_{-\infty}^{\infty} S_Y(f)\, df = \int_{-100}^{100} \frac{10^{-4}}{10^4\pi^2 + 4\pi^2 f^2}\, df \] (7)

\[ = \frac{2}{10^8\pi^2} \int_{0}^{100} \frac{df}{1 + (f/50)^2}. \] (8)

By making the substitution $f = 50\tan\theta$, we have $df = 50\sec^2\theta\, d\theta$. Using the identity $1 + \tan^2\theta = \sec^2\theta$, we have

\[ E[Y^2(t)] = \frac{100}{10^8\pi^2} \int_{0}^{\tan^{-1}(2)} d\theta = \frac{\tan^{-1}(2)}{10^6\pi^2} = 1.12 \times 10^{-7}. \] (9)

Problem 11.8.6 Solution

The easy way to do this problem is to use Theorem 11.17 which states

\[ S_{XY}(f) = H(f) S_X(f). \] (1)

(a) From Table 11.1, we observe that

\[ S_X(f) = \frac{8}{16 + (2\pi f)^2}, \qquad H(f) = \frac{1}{7 + j2\pi f}. \] (2)

From Theorem 11.17,

\[ S_{XY}(f) = H(f) S_X(f) = \frac{8}{[7 + j2\pi f][16 + (2\pi f)^2]}. \] (3)

(b) To find the cross correlation, we need to find the inverse Fourier transform of $S_{XY}(f)$. A straightforward way to do this is to use a partial fraction expansion of $S_{XY}(f)$. That is, by defining $s = j2\pi f$, we observe that

\[ \frac{8}{(7 + s)(4 + s)(4 - s)} = \frac{-8/33}{7 + s} + \frac{1/3}{4 + s} + \frac{1/11}{4 - s}. \] (4)

Hence, we can write the cross spectral density as

\[ S_{XY}(f) = \frac{-8/33}{7 + j2\pi f} + \frac{1/3}{4 + j2\pi f} + \frac{1/11}{4 - j2\pi f}. \] (5)

Unfortunately, terms like $1/(a - j2\pi f)$ do not have an inverse transform.
The solution is to write $S_{XY}(f)$ in the following way:

\[ S_{XY}(f) = \frac{-8/33}{7 + j2\pi f} + \frac{8/33}{4 + j2\pi f} + \frac{1/11}{4 + j2\pi f} + \frac{1/11}{4 - j2\pi f} \] (6)

\[ = \frac{-8/33}{7 + j2\pi f} + \frac{8/33}{4 + j2\pi f} + \frac{8/11}{16 + (2\pi f)^2}. \] (7)

Now, we see from Table 11.1 that the inverse transform is

\[ R_{XY}(\tau) = -\frac{8}{33} e^{-7\tau} u(\tau) + \frac{8}{33} e^{-4\tau} u(\tau) + \frac{1}{11} e^{-4|\tau|}. \] (9)

Problem 11.8.7 Solution

(a) Since $E[N(t)] = \mu_N = 0$, the expected value of the output is $\mu_Y = \mu_N H(0) = 0$.

(b) The output power spectral density is

\[ S_Y(f) = |H(f)|^2 S_N(f) = 10^{-3} e^{-2 \times 10^6 |f|}. \] (1)

(c) The average power is

\[ E[Y^2(t)] = \int_{-\infty}^{\infty} S_Y(f)\, df = \int_{-\infty}^{\infty} 10^{-3} e^{-2 \times 10^6 |f|}\, df \] (2)

\[ = 2 \times 10^{-3} \int_{0}^{\infty} e^{-2 \times 10^6 f}\, df \] (3)

\[ = \frac{2 \times 10^{-3}}{2 \times 10^6} = 10^{-9}. \] (4)

(d) Since $N(t)$ is a Gaussian process, Theorem 11.3 says $Y(t)$ is a Gaussian process. Thus the random variable $Y(t)$ is Gaussian with

\[ E[Y(t)] = 0, \qquad \mathrm{Var}[Y(t)] = E[Y^2(t)] = 10^{-9}. \] (5)

Thus we can calculate

\[ P[Y(t) > 0.01] = P\left[ \frac{Y(t)}{\sqrt{\mathrm{Var}[Y(t)]}} > \frac{0.01}{\sqrt{\mathrm{Var}[Y(t)]}} \right] \] (6)

\[ = 1 - \Phi\left( \frac{0.01}{\sqrt{10^{-9}}} \right) \] (7)

\[ = 1 - \Phi(316.2) \approx 0. \] (8)

The argument of $\Phi(\cdot)$ is far beyond the range of Table 3.1, so the probability is negligible.

Problem 11.8.8 Solution

Suppose we assume that $N(t)$ and $Y(t)$ are the input and output of a linear time invariant filter $h(u)$. In that case,

\[ Y(t) = \int_{0}^{t} N(u)\, du = \int_{-\infty}^{\infty} h(t - u) N(u)\, du. \] (1)

For the above two integrals to be the same, we must have

\[ h(t - u) = \begin{cases} 1 & 0 \le t - u \le t, \\ 0 & \text{otherwise.} \end{cases} \] (2)

Making the substitution $v = t - u$, we have

\[ h(v) = \begin{cases} 1 & 0 \le v \le t, \\ 0 & \text{otherwise.} \end{cases} \] (3)

Thus the impulse response $h(v)$ depends on $t$. That is, the filter response is linear but not time invariant. Since Theorem 11.2 requires that $h(t)$ be time invariant, this example does not violate the theorem.

Problem 11.8.9 Solution

(a) Note that $|H(f)| = 1$. This implies $S_{\hat{M}}(f) = S_M(f)$.
Thus the average power of $\hat{M}(t)$ is

\[ \hat{q} = \int_{-\infty}^{\infty} S_{\hat{M}}(f)\, df = \int_{-\infty}^{\infty} S_M(f)\, df = q. \] (1)

(b) The average power of the upper sideband signal is

\[ E[U^2(t)] = E[M^2(t)\cos^2(2\pi f_c t + \Theta)] \] (2)

\[ \quad - E[2M(t)\hat{M}(t)\cos(2\pi f_c t + \Theta)\sin(2\pi f_c t + \Theta)] \] (3)

\[ \quad + E[\hat{M}^2(t)\sin^2(2\pi f_c t + \Theta)]. \] (4)

To find the expected value of the random phase cosine, for an integer $n \ne 0$, we evaluate

\[ E[\cos(2\pi f_c t + n\Theta)] = \int_{0}^{2\pi} \cos(2\pi f_c t + n\theta)\, \frac{1}{2\pi}\, d\theta \] (5)

\[ = \left. \frac{1}{2n\pi} \sin(2\pi f_c t + n\theta) \right|_{0}^{2\pi} \] (6)

\[ = \frac{1}{2n\pi} \left( \sin(2\pi f_c t + 2n\pi) - \sin(2\pi f_c t) \right) = 0. \] (7)

Similar steps will show that for any integer $n \ne 0$, the random phase sine also has expected value

\[ E[\sin(2\pi f_c t + n\Theta)] = 0. \] (8)

Using the trigonometric identity $\cos^2\phi = (1 + \cos 2\phi)/2$, we can show

\[ E[\cos^2(2\pi f_c t + \Theta)] = E\left[ \frac{1}{2} (1 + \cos(2\pi(2f_c)t + 2\Theta)) \right] = 1/2. \] (9)

Similarly,

\[ E[\sin^2(2\pi f_c t + \Theta)] = E\left[ \frac{1}{2} (1 - \cos(2\pi(2f_c)t + 2\Theta)) \right] = 1/2. \] (10)

In addition, the identity $2\sin\phi\cos\phi = \sin 2\phi$ implies

\[ E[2\sin(2\pi f_c t + \Theta)\cos(2\pi f_c t + \Theta)] = E[\sin(4\pi f_c t + 2\Theta)] = 0. \] (11)

Since $M(t)$ and $\hat{M}(t)$ are independent of $\Theta$, the average power of the upper sideband signal is

\[ E[U^2(t)] = E[M^2(t)]\, E[\cos^2(2\pi f_c t + \Theta)] + E[\hat{M}^2(t)]\, E[\sin^2(2\pi f_c t + \Theta)] \] (12)

\[ \quad - E[M(t)\hat{M}(t)]\, E[2\cos(2\pi f_c t + \Theta)\sin(2\pi f_c t + \Theta)] \] (13)

\[ = q/2 + q/2 + 0 = q. \] (14)

Problem 11.8.10 Solution

(a) Since $S_W(f) = 10^{-15}$ for all $f$, $R_W(\tau) = 10^{-15}\delta(\tau)$.

(b) Since $\Theta$ is independent of $W(t)$,

\[ E[V(t)] = E[W(t)\cos(2\pi f_c t + \Theta)] = E[W(t)]\, E[\cos(2\pi f_c t + \Theta)] = 0. \] (1)

(c) We cannot initially assume $V(t)$ is WSS so we first find

\[ R_V(t, \tau) = E[V(t)V(t + \tau)] \] (2)

\[ = E[W(t)\cos(2\pi f_c t + \Theta)\, W(t + \tau)\cos(2\pi f_c(t + \tau) + \Theta)] \] (3)

\[ = E[W(t)W(t + \tau)]\, E[\cos(2\pi f_c t + \Theta)\cos(2\pi f_c(t + \tau) + \Theta)] \] (4)

\[ = 10^{-15}\delta(\tau)\, E[\cos(2\pi f_c t + \Theta)\cos(2\pi f_c(t + \tau) + \Theta)]. \] (5)

We see that for all $\tau \ne 0$, $R_V(t, \tau) = 0$. Thus we need to find the expected value

\[ E[\cos(2\pi f_c t + \Theta)\cos(2\pi f_c(t + \tau) + \Theta)] \] (6)

only at $\tau = 0$.
However, it's good practice to solve for arbitrary $\tau$:

\[ E[\cos(2\pi f_c t + \Theta)\cos(2\pi f_c(t + \tau) + \Theta)] \] (7)

\[ = \frac{1}{2} E[\cos(2\pi f_c\tau) + \cos(2\pi f_c(2t + \tau) + 2\Theta)] \] (8)

\[ = \frac{1}{2}\cos(2\pi f_c\tau) + \frac{1}{2} \int_{0}^{2\pi} \cos(2\pi f_c(2t + \tau) + 2\theta)\, \frac{1}{2\pi}\, d\theta \] (9)

\[ = \frac{1}{2}\cos(2\pi f_c\tau) + \left. \frac{1}{8\pi} \sin(2\pi f_c(2t + \tau) + 2\theta) \right|_{0}^{2\pi} \] (10)

\[ = \frac{1}{2}\cos(2\pi f_c\tau) + \frac{1}{8\pi}\sin(2\pi f_c(2t + \tau) + 4\pi) - \frac{1}{8\pi}\sin(2\pi f_c(2t + \tau)) \] (11)

\[ = \frac{1}{2}\cos(2\pi f_c\tau). \] (12)

Consequently,

\[ R_V(t, \tau) = \frac{1}{2} 10^{-15}\delta(\tau)\cos(2\pi f_c\tau) = \frac{1}{2} 10^{-15}\delta(\tau). \] (13)

(d) Since $E[V(t)] = 0$ and since $R_V(t, \tau) = R_V(\tau)$, we see that $V(t)$ is a wide sense stationary process. Since $L(f)$ is a linear time invariant filter, the filter output $Y(t)$ is also a wide sense stationary process.

(e) The filter input $V(t)$ has power spectral density $S_V(f) = \frac{1}{2} 10^{-15}$. The filter output has power spectral density

\[ S_Y(f) = |L(f)|^2 S_V(f) = \begin{cases} 10^{-15}/2 & |f| \le B, \\ 0 & \text{otherwise.} \end{cases} \] (14)

The average power of $Y(t)$ is

\[ E[Y^2(t)] = \int_{-\infty}^{\infty} S_Y(f)\, df = \int_{-B}^{B} \frac{1}{2} 10^{-15}\, df = 10^{-15} B. \] (15)

Problem 11.9.1 Solution

The system described in this problem corresponds exactly to the system in the text that yielded Equation (11.146).

(a) From Equation (11.146), the optimal linear filter is

\[ \hat{H}(f) = \frac{S_X(f)}{S_X(f) + S_N(f)}. \] (1)

In this problem, $R_X(\tau) = \mathrm{sinc}(2W\tau)$ so that

\[ S_X(f) = \frac{1}{2W}\,\mathrm{rect}\left( \frac{f}{2W} \right). \] (2)

It follows that the optimal filter is

\[ \hat{H}(f) = \frac{\frac{1}{2W}\,\mathrm{rect}\left( \frac{f}{2W} \right)}{\frac{1}{2W}\,\mathrm{rect}\left( \frac{f}{2W} \right) + 10^{-5}} = \frac{10^5}{10^5 + 2W}\,\mathrm{rect}\left( \frac{f}{2W} \right). \] (3)

(b) From Equation (11.147), the minimum mean square error is

\[ e_L^* = \int_{-\infty}^{\infty} \frac{S_X(f) S_N(f)}{S_X(f) + S_N(f)}\, df = \int_{-\infty}^{\infty} \hat{H}(f) S_N(f)\, df \] (4)

\[ = \frac{10^5}{10^5 + 2W} \int_{-W}^{W} 10^{-5}\, df \] (5)

\[ = \frac{2W}{10^5 + 2W}. \] (6)

It follows that the mean square error satisfies $e_L^* \le 0.04$ if and only if $W \le 2{,}083.3$ Hz. What is occurring in this problem is that the optimal filter is simply an ideal lowpass filter of bandwidth $W$. Increasing $W$ increases the bandwidth of the signal and the bandwidth of the filter $\hat{H}(f)$. This allows more noise to pass through the filter and decreases the quality of our estimator.
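The bandwidth limit in Problem 11.9.1(b) follows from solving $2W/(10^5 + 2W) = 0.04$ for $W$. As a rough check, the sketch below evaluates the mean square error directly from the flat spectra inside the band $|f| \le W$ and then computes the largest admissible $W$; the names SN, mse and Wmax are illustrative.

```matlab
% Sketch: MSE of the optimal filter in Problem 11.9.1 as a function of W.
% SN, mse and Wmax are illustrative names.
SN  = 1e-5;                                   % noise PSD S_N(f)
% Inside |f| <= W the signal PSD is 1/(2W); the integrand SX*SN/(SX+SN)
% is constant there, so the integral is just 2W times that constant.
mse = @(W) 2*W .* ((1./(2*W))*SN ./ (1./(2*W) + SN));
% Solve 2W/(1e5 + 2W) = 0.04 for W
Wmax = 0.04*1e5/(2*(1 - 0.04));
fprintf('Wmax = %.1f Hz, MSE at Wmax = %.4f\n', Wmax, mse(Wmax));
```

Evaluating mse at a few bandwidths above and below Wmax shows the error growing with $W$, which matches the qualitative explanation in the solution.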
Problem 11.9.2 Solution

The system described in this problem corresponds exactly to the system in the text that yielded Equation (11.146).

(a) From Equation (11.146), the optimal linear filter is

\[ \hat{H}(f) = \frac{S_X(f)}{S_X(f) + S_N(f)}. \] (1)

In this problem, $R_X(\tau) = e^{-5000|\tau|}$ so that

\[ S_X(f) = \frac{10^4}{(5{,}000)^2 + (2\pi f)^2}. \] (2)

It follows that the optimal filter is

\[ \hat{H}(f) = \frac{\frac{10^4}{(5{,}000)^2 + (2\pi f)^2}}{\frac{10^4}{(5{,}000)^2 + (2\pi f)^2} + 10^{-5}} = \frac{10^9}{1.025 \times 10^9 + (2\pi f)^2}. \] (3)

From Table 11.1, we see that the filter $\hat{H}(f)$ has impulse response

\[ \hat{h}(\tau) = \frac{10^9}{2\alpha} e^{-\alpha|\tau|}, \] (4)

where $\alpha = \sqrt{1.025 \times 10^9} = 3.20 \times 10^4$.

(b) From Equation (11.147), the minimum mean square error is

\[ e_L^* = \int_{-\infty}^{\infty} \frac{S_X(f) S_N(f)}{S_X(f) + S_N(f)}\, df = \int_{-\infty}^{\infty} \hat{H}(f) S_N(f)\, df \] (5)

\[ = 10^{-5} \int_{-\infty}^{\infty} \hat{H}(f)\, df \] (6)

\[ = 10^{-5}\, \hat{h}(0) = \frac{10^4}{2\alpha} = 0.1562. \] (7)

Problem 11.10.1 Solution

Although it is straightforward to calculate sample paths of $Y_n$ using the filter response $Y_n = \frac{1}{2} Y_{n-1} + \frac{1}{2} X_n$ directly, the necessary loops make for a slow program. A solution using vectors and matrices tends to run faster. From the filter response, we can write

\[ Y_1 = \frac{1}{2} X_1 \] (1)

\[ Y_2 = \frac{1}{4} X_1 + \frac{1}{2} X_2 \] (2)

\[ Y_3 = \frac{1}{8} X_1 + \frac{1}{4} X_2 + \frac{1}{2} X_3 \] (3)

\[ \vdots \] (4)

\[ Y_n = \frac{1}{2^n} X_1 + \frac{1}{2^{n-1}} X_2 + \cdots + \frac{1}{2} X_n \] (5)

In vector notation, these equations become

\[ \underbrace{\begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix}}_{\mathbf{Y}} = \underbrace{\begin{bmatrix} 1/2 & 0 & \cdots & 0 \\ 1/4 & 1/2 & \ddots & \vdots \\ \vdots & \ddots & \ddots & 0 \\ 1/2^n & \cdots & 1/4 & 1/2 \end{bmatrix}}_{\mathbf{H}} \underbrace{\begin{bmatrix} X_1 \\ X_2 \\ \vdots \\ X_n \end{bmatrix}}_{\mathbf{X}}. \] (6)

When $\mathbf{X}$ is a column of iid Gaussian (0, 1) random variables, the column vector $\mathbf{Y} = \mathbf{H}\mathbf{X}$ is a single sample path of $Y_1, \ldots, Y_n$. When $\mathbf{X}$ is an $n \times m$ matrix of iid Gaussian (0, 1) random variables, each column of $\mathbf{Y} = \mathbf{H}\mathbf{X}$ is a sample path of $Y_1, \ldots, Y_n$. In this case, let matrix entry $Y_{i,j}$ denote a sample $Y_i$ of the $j$th sample path. The samples $Y_{i,1}, Y_{i,2}, \ldots, Y_{i,m}$ are iid samples of $Y_i$. We can estimate the mean and variance of $Y_i$ using the sample mean $M_m(Y_i)$ and sample variance $V_m(Y_i)$ of Section 7.3.
These estimates are

\[ M_m(Y_i) = \frac{1}{m} \sum_{j=1}^{m} Y_{i,j}, \qquad V_m(Y_i) = \frac{1}{m-1} \sum_{j=1}^{m} (Y_{i,j} - M_m(Y_i))^2. \] (7)

This is the approach of the following program.

    function ymv=yfilter(m);
    %ymv(i) is the mean and var (over m paths) of y(i),
    %the filter output of 11.2.6 and 11.10.1
    X=randn(500,m);
    H=toeplitz([(0.5).^(1:500)],[0.5 zeros(1,499)]);
    Y=H*X;
    yav=sum(Y,2)/m;
    yavmat=yav*ones(1,m);
    yvar=sum((Y-yavmat).^2,2)/(m-1);
    ymv=[yav yvar];

The commands ymv=yfilter(100);plot(ymv) will generate a plot similar to this:

[Figure: sample mean and sample variance of $Y_n$ versus $n$ for $0 \le n \le 500$.]

We see that each sample mean is small, on the order of 0.1. Note that $E[Y_i] = 0$. For $m = 100$ samples, the sample mean has variance $\mathrm{Var}[Y_i]/m \approx 1/(3m) \approx 0.0033$ and standard deviation of about 0.06. Thus it is to be expected that we observe sample mean values with magnitude on the order of 0.1. Also, it can be shown (in the solution to Problem 11.2.6 for example) that as $i$ becomes large, $\mathrm{Var}[Y_i]$ converges to 1/3. Thus our sample variance results are also not surprising.

Comment: Although within each sample path, $Y_i$ and $Y_{i+1}$ are quite correlated, the sample means of $Y_i$ and $Y_{i+1}$ are not very correlated when a large number of sample paths are averaged. Exact calculation of the covariance of the sample means of $Y_i$ and $Y_{i+1}$ might be an interesting exercise. The same observations apply to the sample variance as well.

Problem 11.10.2 Solution

This is just a Matlab question that has nothing to do with probability. In the Matlab operation R=fft(r,N), the shape of the output R is the same as the shape of the input r. If r is a column vector, then R is a column vector. If r is a row vector, then R is a row vector. For fftc to work the same way, the shape of n must be the same as the shape of R. The instruction n=reshape(0:(N-1),size(R)) does this.

Problem 11.10.3 Solution

The program cospaths.m generates Gaussian sample paths with the desired autocorrelation function $R_X(k) = \cos(0.04\pi k)$.
Here is the code:

    function x=cospaths(n,m);
    %Generate m sample paths of length n of a
    %Gaussian process with ACF R[k]=cos(0.04*pi*k)
    k=0:n-1;
    rx=cos(0.04*pi*k)';
    x=gaussvector(0,rx,m);

The program is simple because if the second input parameter to gaussvector is a length n vector rx, then rx is assumed to be the first row of a symmetric Toeplitz covariance matrix. The commands x=cospaths(100,10);plot(x) will produce a graph like this one:

[Figure: ten sample paths $X_n$ versus $n$ for $0 \le n \le 100$.]

We note that every sample path of the process is a Gaussian random sequence. However, it would also appear from the graph that every sample path is a perfect sinusoid. This may seem strange if you are used to seeing Gaussian processes simply as noisy processes or fluctuating Brownian motion processes. However, in this case, the amplitude and phase of each sample path is random such that over the ensemble of sinusoidal sample functions, each sample $X_n$ is a Gaussian (0, 1) random variable. Finally, to confirm that each sample path is a perfect sinusoid, rather than just resembling a sinusoid, we calculate the DFT of each sample path. The commands

    >> x=cospaths(100,10);
    >> X=fft(x);
    >> stem((0:99)/100,abs(X));

will produce a plot similar to this:

[Figure: DFT magnitudes $|X_k|$ versus $k/100$ for $0 \le k/100 \le 1$.]

The above plot consists of ten overlaid 100-point DFT magnitude stem plots, one for each Gaussian sample function. Each plot has exactly two nonzero components at frequencies $k/100 = 0.02$ and $(100 - k)/100 = 0.98$, corresponding to each sample path sinusoid having frequency 0.02. Note that the magnitude of each 0.02 frequency component depends on the magnitude of the corresponding sinusoidal sample path.

Problem 11.10.4 Solution

Searching the Matlab full product help for inv yields this bit of advice:

In practice, it is seldom necessary to form the explicit inverse of a matrix. A frequent misuse of inv arises when solving the system of linear equations Ax = b.
One way to solve this is with x = inv(A)*b. A better way, from both an execution time and numerical accuracy standpoint, is to use the matrix division operator x = A\b. This produces the solution using Gaussian elimination, without forming the inverse. See \ and / for further information.

The same discussion goes on to give an example where x = A\b is both faster and more accurate.

Problem 11.10.5 Solution

The function lmsepredictor.m is designed so that if the sequence $X_n$ has a finite duration autocorrelation function such that $R_X[k] = 0$ for $|k| \ge m$, but the LMSE filter of order $M - 1$ for $M \ge m$ is supposed to be returned, then lmsepredictor automatically pads the autocorrelation vector rx with a sufficient number of zeros so that the output is the order $M - 1$ filter. Conversely, if rx specifies more values $R_X[k]$ than are needed, then the operation rx(1:M) extracts the $M$ values $R_X[0], \ldots, R_X[M-1]$ that are needed.

However, in this problem $R_X[k] = (-0.9)^{|k|}$ has infinite duration. When we pass the truncated representation rx of length $m = 6$ and request lmsepredictor(rx,M) for $M \ge 6$, the result is that rx is incorrectly padded with zeros. The resulting filter output will be the LMSE filter for the autocorrelation function

\[ R_X[k] = \begin{cases} (-0.9)^{|k|} & |k| \le 5, \\ 0 & \text{otherwise,} \end{cases} \] (1)

rather than the LMSE filter for the true autocorrelation function.

Problem 11.10.6 Solution

Applying Theorem 11.4 with sampling period $T_s = 1/4000$ s yields

\[ R_X[k] = R_X(kT_s) = 10\,\mathrm{sinc}(0.5k) + 5\,\mathrm{sinc}(0.25k). \] (1)

To find the power spectral density $S_X(\phi)$, we need to find the DTFT of $\mathrm{sinc}(\phi_0 k)$. Unfortunately, this was omitted from Table 11.2 so we now take a detour and derive it here. As with any derivation of the transform of a sinc function, we guess the answer and calculate the inverse transform. In this case, suppose

\[ S_X(\phi) = \frac{1}{\phi_0}\,\mathrm{rect}(\phi/\phi_0) = \begin{cases} 1/\phi_0 & |\phi| \le \phi_0/2, \\ 0 & \text{otherwise.} \end{cases} \] (2)

We find $R_X[k]$ from the inverse DTFT.
For $|\phi_0| \le 1$,

$$R_X[k] = \int_{-1/2}^{1/2} S_X(\phi)\, e^{j2\pi\phi k}\, d\phi = \frac{1}{\phi_0} \int_{-\phi_0/2}^{\phi_0/2} e^{j2\pi\phi k}\, d\phi = \frac{1}{\phi_0}\, \frac{e^{j\pi\phi_0 k} - e^{-j\pi\phi_0 k}}{j2\pi k} = \mathrm{sinc}(\phi_0 k). \tag{3}$$

Now we apply this result to take the transform of $R_X[k]$ in Equation (1). This yields

$$S_X(\phi) = \frac{10}{0.5}\,\mathrm{rect}(\phi/0.5) + \frac{5}{0.25}\,\mathrm{rect}(\phi/0.25). \tag{4}$$

Ideally, a $(2N+1)$-point DFT would yield a sampled version of the DTFT $S_X(\phi)$. However, the truncation of the autocorrelation $R_X[k]$ to 201 points results in a difference. For $N = 100$, the DFT will be a sampled version of the DTFT of $R_X[k]\,\mathrm{rect}(k/(2N+1))$. Here is a Matlab program that shows the difference when the autocorrelation is truncated to $2N+1$ terms.

    function DFT=twosincsdft(N);
    %Usage: SX=twosincsdft(N);
    %Returns and plots the 2N+1
    %point DFT of R(-N) ... R(0) ... R(N)
    %for ACF R[k] in Problem 11.2.2
    k=-N:N;
    rx=10*sinc(0.5*k) + 5*sinc(0.25*k);
    DFT=fftc(rx);
    M=ceil(0.6*N);
    phi=(0:M)/(2*N+1);
    stem(phi,abs(DFT(1:(M+1))));
    xlabel('\it \phi');
    ylabel('\it S_X(\phi)');

Here is the output of twosincsdft(100).

[Figure: stem plot of the DFT approximation to $S_X(\phi)$ against $\phi$ for $0 \le \phi \le 0.3$; the vertical axis runs from 0 to 50.]

From the stem plot of the DFT, it is easy to see the deviations from the two rectangles that make up the DTFT $S_X(\phi)$. We see that the effects of windowing are particularly pronounced at the break points.

Comment: In twosincsdft, DFT must be real-valued since it is the DFT of an autocorrelation function. Hence the command stem(DFT) should be sufficient. However, due to numerical precision issues, the actual DFT tends to have a tiny imaginary part; hence we use the abs operator.

Problem 11.10.7 Solution

In this problem, we generalize the solution of Problem 11.4.1 using Theorem 11.9 with $k = 1$ for filter order $M > 2$. The optimum linear predictor filter $\mathbf{h} = \begin{bmatrix} h_0 & h_1 & \cdots & h_{M-1} \end{bmatrix}'$ of $X_{n+1}$ given $\mathbf{X}_n = \begin{bmatrix} X_{n-(M-1)} & \cdots & X_{n-1} & X_n \end{bmatrix}'$ is given by

$$\overleftarrow{\mathbf{h}} = \begin{bmatrix} h_{M-1} \\ \vdots \\ h_0 \end{bmatrix} = \mathbf{R}_{\mathbf{X}_n}^{-1}\, \mathbf{R}_{\mathbf{X}_n X_{n+1}}, \tag{1}$$

where $\mathbf{R}_{\mathbf{X}_n}$ is given by Theorem 11.6 and $\mathbf{R}_{\mathbf{X}_n X_{n+1}}$ is given by Equation (11.66). In this problem,

$$\mathbf{R}_{\mathbf{X}_n X_{n+1}} = E\left[ \begin{bmatrix} X_{n-M+1} \\ \vdots \\ X_n \end{bmatrix} X_{n+1} \right] = \begin{bmatrix} R_X[M] \\ \vdots \\ R_X[1] \end{bmatrix}. \tag{2}$$

Going back to Theorem 9.7(a), the mean square error is

$$e_L^* = E\left[ \left( X_{n+1} - \overleftarrow{\mathbf{h}}' \mathbf{X}_n \right)^2 \right] = \mathrm{Var}[X_{n+1}] - \overleftarrow{\mathbf{h}}' \mathbf{R}_{\mathbf{X}_n X_{n+1}}. \tag{3}$$

For $M > 2$, we use the Matlab function onesteppredictor(r,M) to perform the calculations.

    function [h,e]=onesteppredictor(r,M);
    %usage: h=onesteppredictor(r,M);
    %input: r=[R_X(0) R_X(1) .. R_X(m-1)]
    %assumes R_X(n)==0 for n >=m
    %output=vector h for lmse predictor
    % xx=h'[X(n),X(n-1),..,X(n-M+1)] for X(n+1)
    m=length(r);
    r=[r(:);zeros(M-m+1,1)];%append zeros if needed
    RY=toeplitz(r(1:M));
    RYX=r(M+1:-1:2);
    h=flipud(RY\RYX);
    e=r(1)-(flipud(h))'*RYX;

The code is pretty straightforward. Here are two examples just to show it works.

    >> [h2,e2]=onesteppredictor(r,2)
    h2 =
        0.8571
       -0.1429
    e2 =
        0.4286
    >> [h4,e4]=onesteppredictor(r,4)
    h4 =
        0.8000
        0.0000
       -0.0000
       -0.2000
    e4 =
        0.4000

The problem also requested that we calculate the mean square error as a function of the filter order $M$. Here is a script and the resulting plot of the MSE.

    %onestepmse.m
    r=1-0.25*(0:3);
    ee=[ ];
    for M=2:10,
      [h,e]=onesteppredictor(r,M);
      ee=[ee,e];
    end
    plot(2:10,ee,'-d');
    xlabel('\itM'); ylabel('\it MSE');

[Figure: MSE plotted against the filter order $M$ for $M = 2, \ldots, 10$; the vertical axis runs from 0.3 to 0.44.]

Problem 11.10.8 Solution

This problem generalizes the solution of Problem 11.10.7 to a $k$-step predictor. The optimum linear predictor filter $\mathbf{h} = \begin{bmatrix} h_0 & h_1 & \cdots & h_{M-1} \end{bmatrix}'$ of $X_{n+k}$ given $\mathbf{X}_n = \begin{bmatrix} X_{n-(M-1)} & \cdots & X_{n-1} & X_n \end{bmatrix}'$ is given by

$$\overleftarrow{\mathbf{h}} = \begin{bmatrix} h_{M-1} \\ \vdots \\ h_0 \end{bmatrix} = \mathbf{R}_{\mathbf{X}_n}^{-1}\, \mathbf{R}_{\mathbf{X}_n X_{n+k}}, \tag{1}$$

where $\mathbf{R}_{\mathbf{X}_n}$ is given by Theorem 11.6 and $\mathbf{R}_{\mathbf{X}_n X_{n+k}}$ is given by Equation (11.66). In this problem,

$$\mathbf{R}_{\mathbf{X}_n X_{n+k}} = E\left[ \begin{bmatrix} X_{n-M+1} \\ \vdots \\ X_n \end{bmatrix} X_{n+k} \right] = \begin{bmatrix} R_X[M+k-1] \\ \vdots \\ R_X[k] \end{bmatrix}. \tag{2}$$

Going back to Theorem 9.7(a), the mean square error is

$$e_L^* = E\left[ \left( X_{n+k} - \overleftarrow{\mathbf{h}}' \mathbf{X}_n \right)^2 \right] = \mathrm{Var}[X_{n+k}] - \overleftarrow{\mathbf{h}}' \mathbf{R}_{\mathbf{X}_n X_{n+k}}. \tag{3}$$

The Matlab function kpredictor(r,M,k) implements this solution.

    function h=kpredictor(r,M,k);
    %usage: h=kpredictor(r,M,k);
    %input: r=[R_X(0) R_X(1) ..
    %          R_X(m-1)]
    %assumes R_X(n)==0 for n >=m
    %output=vector h
    % for lmse predictor xx=h'[X(n),X(n-1),..,X(n-M+1)] for X(n+k)
    m=length(r);
    r=[r(:);zeros(M+k-m,1)]; %append zeros if needed so that r has M+k entries
    RY=toeplitz(r(1:M));
    RYX=r(M+k:-1:1+k); %[R_X(M+k-1) .. R_X(k)]', as in Equation (2)
    h=flipud(RY\RYX);

The code mirrors onesteppredictor: the zero padding is extended to $M+k$ entries and the cross-correlation vector is listed in the same reversed order, so that for $k = 1$ the two functions return the same filter.

Problem 11.10.9 Solution

Generating the sample paths $X_n$ and $Y_n$ is relatively straightforward. For $\sigma = \eta = 1$, the solution to Problem 11.4.5 showed that the optimal linear predictor $\hat{X}_n(Y_{n-1})$ of $X_n$ given $Y_{n-1}$ is

$$\hat{X}_n = \frac{cd}{d^2 + (1 - c^2)}\, Y_{n-1}. \tag{1}$$

In the following, we plot sample paths of $X_n$ and $\hat{X}_n$. To show how sample paths are similar for different values of $c$ and $d$, we construct all sample paths using the same sequence of noise samples $W_n$ and $Z_n$. In addition, given $c$, we choose $X_0 = X_{00}/\sqrt{1 - c^2}$ where $X_{00}$ is a Gaussian (0, 1) random variable that is the same for all sample paths. To do this, we write a function xpaths(c,d,x00,z,w) which constructs a sample path given the parameters c and d, and the noise vectors z and w. A simple script predictpaths.m calls xpaths to generate the requested sample paths. Here are the two programs:

    function [x,xhat]=xpaths(c,d,x00,z,w)
    n=length(z);
    n0=(0:(n-1))';
    n1=(1:n)';
    vx=1/(1-c^2);
    x0=sqrt(vx)*x00;
    x=(c.^(n1)*x0)+...
      toeplitz(c.^(n0),eye(1,n))*z;
    y=d*[x0;x(1:n-1)]+w;
    vy=((d^2)*vx) + 1;
    xhat=((c*d*vx)/vy)*y;
    plot(n1,x,'b-',n1,xhat,'k:');
    axis([0 50 -3 3]);
    xlabel('\itn');
    ylabel('\itX_n');

    %predictpaths.m
    w=randn(50,1);z=randn(50,1);
    x00=randn(1);
    xpaths(0.9,10,x00,z,w); pause;
    xpaths(0.9,1,x00,z,w); pause;
    xpaths(0.9,0.1,x00,z,w); pause;
    xpaths(0.6,10,x00,z,w); pause;
    xpaths(0.6,1,x00,z,w); pause;
    xpaths(0.6,0.1,x00,z,w);

Some sample paths for $X_n$ and $\hat{X}_n$ for the requested parameters are shown below. In each pair, the one-step prediction $\hat{X}_n$ is marked by dots.
[Figure: six panels of sample paths of $X_n$ and $\hat{X}_n$ against $n$ for $0 \le n \le 50$: (a) c = 0.9, d = 10; (b) c = 0.9, d = 1; (c) c = 0.9, d = 0.1; (d) c = 0.6, d = 10; (e) c = 0.6, d = 1; (f) c = 0.6, d = 0.1.]

The mean square estimation error at step $n$ was found to be

$$e_L^*(n) = e_L^* = \sigma^2\, \frac{d^2 + 1}{d^2 + (1 - c^2)}. \tag{2}$$

We see that the mean square estimation error is $e_L^*(n) = e_L^*$, a constant for all $n$. In addition, $e_L^*$ is a decreasing function of $d$. In graphs (a) through (c), we see that the predictor tracks $X_n$ less well as $d$ decreases because decreasing $d$ corresponds to decreasing the contribution of $X_{n-1}$ to the measurement $Y_{n-1}$. Effectively, the impact of the measurement noise is increased. As $d$ decreases, the predictor places less emphasis on the measurement $Y_{n-1}$ and instead makes predictions closer to $E[X] = 0$. That is, when $d$ is small in graphs (c) and (f), the predictor stays close to zero.

With respect to $c$, the performance of the predictor is less easy to understand. In Equation (2), the mean square error $e_L^*$ is the product of

$$\mathrm{Var}[X_n] = \frac{\sigma^2}{1 - c^2}, \qquad 1 - \rho^2_{X_n, Y_{n-1}} = \frac{(d^2 + 1)(1 - c^2)}{d^2 + (1 - c^2)}. \tag{3}$$

As a function of increasing $c^2$, $\mathrm{Var}[X_n]$ increases while $1 - \rho^2_{X_n, Y_{n-1}}$ decreases. Overall, the mean square error $e_L^*$ is an increasing function of $c^2$. However, $\mathrm{Var}[X]$ is the mean square error obtained using a blind estimator that always predicts $E[X]$, while $1 - \rho^2_{X_n, Y_{n-1}}$ characterizes the extent to which the optimal linear predictor is better than the blind predictor. When we compare graphs (a)-(c) with $c = 0.9$ to graphs (d)-(f) with $c = 0.6$, we see greater variation in $X_n$ for larger $c$, but in both cases the predictor worked well when $d$ was large.
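The dependence of the mean square error on c and d can also be checked numerically. The following is a minimal Matlab sketch, not part of the original solution, that evaluates the mean square error formula of Equation (2) with σ = 1 (an assumption matching the parameters of this problem) for the six (c, d) pairs used in the graphs. The variable names cvals, dvals and mse are chosen here for illustration only.

```matlab
% Sketch: evaluate e_L = (d^2+1)/(d^2+(1-c^2)), assuming sigma = 1
cvals = [0.9 0.6];      % values of c used in graphs (a)-(c) and (d)-(f)
dvals = [10 1 0.1];     % values of d used in the graphs
mse = zeros(numel(dvals), numel(cvals));
for i = 1:numel(dvals)
    for j = 1:numel(cvals)
        c = cvals(j);
        d = dvals(i);
        mse(i,j) = (d^2 + 1)/(d^2 + (1 - c^2));
    end
end
disp(mse);   % rows: d = 10, 1, 0.1; columns: c = 0.9, 0.6
```

Each column grows as d decreases, and the c = 0.9 column is larger than the c = 0.6 column, which agrees with the statements that the error is decreasing in d and increasing in c².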
Problem Solutions – Chapter 12

Problem 12.1.1 Solution

From the given Markov chain, the state transition matrix is

$$\mathbf{P} = \begin{bmatrix} P_{00} & P_{01} & P_{02} \\ P_{10} & P_{11} & P_{12} \\ P_{20} & P_{21} & P_{22} \end{bmatrix} = \begin{bmatrix} 0.5 & 0.5 & 0 \\ 0.5 & 0.5 & 0 \\ 0.25 & 0.25 & 0.5 \end{bmatrix}. \tag{1}$$

Problem 12.1.2 Solution

This problem is very straightforward if we keep in mind that $P_{ij}$ is the probability that we transition from state $i$ to state $j$. From Example 12.1, the state transition matrix is

$$\mathbf{P} = \begin{bmatrix} P_{00} & P_{01} \\ P_{10} & P_{11} \end{bmatrix} = \begin{bmatrix} 1-p & p \\ q & 1-q \end{bmatrix}. \tag{1}$$

Problem 12.1.3 Solution

In addition to the normal OFF and ON states for packetized voice, we add state 2, the "mini-OFF" state. The Markov chain is

[Figure: three-state Markov chain on states 0, 1, 2 with branches $P_{00}$, $P_{01}$, $P_{10}$, $P_{11}$, $P_{12}$, $P_{20}$, $P_{21}$, $P_{22}$.]

The only difference between this chain and an arbitrary 3 state chain is that transitions from 0, the OFF state, to state 2, the mini-OFF state, are not allowed. From the problem statement, the corresponding state transition matrix is

$$\mathbf{P} = \begin{bmatrix} P_{00} & P_{01} & P_{02} \\ P_{10} & P_{11} & P_{12} \\ P_{20} & P_{21} & P_{22} \end{bmatrix} = \begin{bmatrix} 0.999929 & 0.000071 & 0 \\ 0.000100 & 0.899900 & 0.1 \\ 0.000100 & 0.699900 & 0.3 \end{bmatrix}. \tag{1}$$

Problem 12.1.4 Solution

Based on the problem statement, the state of the wireless LAN is given by the following Markov chain:

[Figure: four-state Markov chain on states 0, 1, 2, 3.]

The Markov chain has state transition matrix

$$\mathbf{P} = \begin{bmatrix} 0.5 & 0.5 & 0 & 0 \\ 0.04 & 0.9 & 0.06 & 0 \\ 0.04 & 0 & 0.9 & 0.06 \\ 0.04 & 0.02 & 0.04 & 0.9 \end{bmatrix}. \tag{1}$$

Problem 12.1.5 Solution

In this problem, it is helpful to go fact by fact to identify the information given.

• ". . . each read or write operation reads or writes an entire file and that files contain a geometric number of sectors with mean 50."

This statement says that the length $L$ of a file has PMF

$$P_L(l) = \begin{cases} (1-p)^{l-1} p & l = 1, 2, \ldots \\ 0 & \text{otherwise} \end{cases} \tag{1}$$

with $p = 1/50 = 0.02$. This says that when we write a sector, we will write another sector with probability $49/50 = 0.98$. In terms of our Markov chain, if we are in the write state, we write another sector and stay in the write state with probability $P_{22} = 0.98$. This fact also implies $P_{20} + P_{21} = 0.02$.
Also, since files that are read obey the same length distribution,

$$P_{11} = 0.98, \qquad P_{10} + P_{12} = 0.02. \tag{2}$$

• "Further, suppose idle periods last for a geometric time with mean 500."

This statement simply says that given the system is idle, it remains idle for another unit of time with probability $P_{00} = 499/500 = 0.998$. This also says that $P_{01} + P_{02} = 0.002$.

• "After an idle period, the system is equally likely to read or write a file."

Given that at time $n$, $X_n = 0$, this statement says that

$$P[X_{n+1} = 1 \mid X_n = 0, X_{n+1} \ne 0] = \frac{P_{01}}{P_{01} + P_{02}} = 0.5. \tag{3}$$

Combined with the earlier fact that $P_{01} + P_{02} = 0.002$, we learn that

$$P_{01} = P_{02} = 0.001. \tag{4}$$

• "Following the completion of a read, a write follows with probability 0.8."

Here we learn that given that at time $n$, $X_n = 1$,

$$P[X_{n+1} = 2 \mid X_n = 1, X_{n+1} \ne 1] = \frac{P_{12}}{P_{10} + P_{12}} = 0.8. \tag{5}$$

Combined with the earlier fact that $P_{10} + P_{12} = 0.02$, we learn that

$$P_{10} = 0.004, \qquad P_{12} = 0.016. \tag{6}$$

• "However, on completion of a write operation, a read operation follows with probability 0.6."

Now we find that given that at time $n$, $X_n = 2$,

$$P[X_{n+1} = 1 \mid X_n = 2, X_{n+1} \ne 2] = \frac{P_{21}}{P_{20} + P_{21}} = 0.6. \tag{7}$$

Combined with the earlier fact that $P_{20} + P_{21} = 0.02$, we learn that

$$P_{20} = 0.008, \qquad P_{21} = 0.012. \tag{8}$$

The complete Markov chain is

[Figure: three-state Markov chain on states 0, 1, 2 with $P_{00} = 0.998$, $P_{01} = P_{02} = 0.001$, $P_{10} = 0.004$, $P_{11} = 0.98$, $P_{12} = 0.016$, $P_{20} = 0.008$, $P_{21} = 0.012$, $P_{22} = 0.98$.]

Problem 12.1.6 Solution

To determine whether the sequence $\hat{X}_n$ forms a Markov chain, we write

\begin{align}
P\left[\hat{X}_{n+1} = j \mid \hat{X}_n = i, \hat{X}_{n-1} = i_{n-1}, \ldots, \hat{X}_0 = i_0\right]
&= P\left[X_{m(n+1)} = j \mid X_{mn} = i, X_{m(n-1)} = i_{n-1}, \ldots, X_0 = i_0\right] \tag{1} \\
&= P\left[X_{m(n+1)} = j \mid X_{mn} = i\right] \tag{2} \\
&= P_{ij}(m). \tag{3}
\end{align}

The key step is in observing that the Markov property of $X_n$ implies that $X_{mn}$ summarizes the past history of the $X_n$ process. That is, given $X_{mn}$, $X_{m(n+1)}$ is independent of $X_{mk}$ for all $k < n$.
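Finally, this implies that the state $\hat{X}_n$ has one-step state transition probabilities equal to the $m$-step transition probabilities for the Markov chain $X_n$. That is, $\hat{\mathbf{P}} = \mathbf{P}^m$.

Problem 12.1.7 Solution

Under construction.

Problem 12.1.8 Solution

Under construction.

Problem 12.2.1 Solution

From Example 12.1, the state transition matrix is

$$\mathbf{P} = \begin{bmatrix} P_{00} & P_{01} \\ P_{10} & P_{11} \end{bmatrix} = \begin{bmatrix} 1-p & p \\ q & 1-q \end{bmatrix} = \begin{bmatrix} 0.2 & 0.8 \\ 0.9 & 0.1 \end{bmatrix}. \tag{1}$$

This chain is a special case of the two-state chain in Example 12.5 and Example 12.6 with $p = 0.8$ and $q = 0.9$. You may wish to derive the eigenvalues and eigenvectors of $\mathbf{P}$ in order to diagonalize and then find $\mathbf{P}^n$. Or, you may wish just to refer to Example 12.6, which showed that the chain has eigenvalues $\lambda_1 = 1$ and $\lambda_2 = 1 - (p+q) = -0.7$ and that the $n$-step transition matrix is

\begin{align}
\mathbf{P}^n = \begin{bmatrix} P_{00}(n) & P_{01}(n) \\ P_{10}(n) & P_{11}(n) \end{bmatrix}
&= \frac{1}{p+q} \begin{bmatrix} q & p \\ q & p \end{bmatrix} + \frac{\lambda_2^n}{p+q} \begin{bmatrix} p & -p \\ -q & q \end{bmatrix} \tag{2} \\
&= \frac{1}{1.7} \begin{bmatrix} 0.9 & 0.8 \\ 0.9 & 0.8 \end{bmatrix} + \frac{(-0.7)^n}{1.7} \begin{bmatrix} 0.8 & -0.8 \\ -0.9 & 0.9 \end{bmatrix}. \tag{3}
\end{align}

Problem 12.2.2 Solution

From the given Markov chain, the state transition matrix is

$$\mathbf{P} = \begin{bmatrix} P_{00} & P_{01} & P_{02} \\ P_{10} & P_{11} & P_{12} \\ P_{20} & P_{21} & P_{22} \end{bmatrix} = \begin{bmatrix} 0.5 & 0.5 & 0 \\ 0.5 & 0.5 & 0 \\ 0.25 & 0.25 & 0.5 \end{bmatrix}. \tag{1}$$

The way to find $\mathbf{P}^n$ is to make the decomposition $\mathbf{P} = \mathbf{S}\mathbf{D}\mathbf{S}^{-1}$ where the columns of $\mathbf{S}$ are the eigenvectors of $\mathbf{P}$ and $\mathbf{D}$ is a diagonal matrix containing the eigenvalues of $\mathbf{P}$. The eigenvalues are

$$\lambda_1 = 1, \qquad \lambda_2 = 0, \qquad \lambda_3 = 1/2. \tag{2}$$

The corresponding eigenvectors are

$$\mathbf{s}_1 = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}, \qquad \mathbf{s}_2 = \begin{bmatrix} -1 \\ 1 \\ 0 \end{bmatrix}, \qquad \mathbf{s}_3 = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}. \tag{3}$$

The decomposition of $\mathbf{P}$ is

$$\mathbf{P} = \mathbf{S}\mathbf{D}\mathbf{S}^{-1} = \begin{bmatrix} 1 & -1 & 0 \\ 1 & 1 & 0 \\ 1 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0.5 \end{bmatrix} \begin{bmatrix} 0.5 & 0.5 & 0 \\ -0.5 & 0.5 & 0 \\ -0.5 & -0.5 & 1 \end{bmatrix}. \tag{4}$$

Finally, $\mathbf{P}^n$ is

\begin{align}
\mathbf{P}^n = \mathbf{S}\mathbf{D}^n\mathbf{S}^{-1}
&= \begin{bmatrix} 1 & -1 & 0 \\ 1 & 1 & 0 \\ 1 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & (0.5)^n \end{bmatrix} \begin{bmatrix} 0.5 & 0.5 & 0 \\ -0.5 & 0.5 & 0 \\ -0.5 & -0.5 & 1 \end{bmatrix} \tag{5} \\
&= \begin{bmatrix} 0.5 & 0.5 & 0 \\ 0.5 & 0.5 & 0 \\ 0.5 - (0.5)^{n+1} & 0.5 - (0.5)^{n+1} & (0.5)^n \end{bmatrix}. \tag{6}
\end{align}

Problem 12.3.1 Solution

From Example 12.8, the state probabilities at time $n$ are

$$\mathbf{p}(n) = \begin{bmatrix} 7/12 \\ 5/12 \end{bmatrix} + \lambda_2^n \begin{bmatrix} \frac{5}{12} p_0 - \frac{7}{12} p_1 \\ -\frac{5}{12} p_0 + \frac{7}{12} p_1 \end{bmatrix} \tag{1}$$

with $\lambda_2 = 1 - (p+q) = 344/350$.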
With initial state probabilities $\begin{bmatrix} p_0 & p_1 \end{bmatrix} = \begin{bmatrix} 1 & 0 \end{bmatrix}$,

$$\mathbf{p}(n) = \begin{bmatrix} 7/12 \\ 5/12 \end{bmatrix} + \lambda_2^n \begin{bmatrix} 5/12 \\ -5/12 \end{bmatrix}. \tag{2}$$

The limiting state probabilities are $\begin{bmatrix} \pi_0 & \pi_1 \end{bmatrix} = \begin{bmatrix} 7/12 & 5/12 \end{bmatrix}$. Note that $p_j(n)$ is within 1% of $\pi_j$ if

$$|\pi_j - p_j(n)| \le 0.01\,\pi_j. \tag{3}$$

These requirements become

$$\lambda_2^n \le 0.01\, \frac{7/12}{5/12}, \qquad \lambda_2^n \le 0.01. \tag{4}$$

The minimum value $n$ that meets both requirements is

$$n = \left\lceil \frac{\ln 0.01}{\ln \lambda_2} \right\rceil = 267. \tag{5}$$

Hence, after 267 time steps, the state probabilities are all within one percent of the limiting state probability vector. Note that in the packet voice system, the time step corresponded to a 10 ms time slot. Hence, 2.67 seconds are required.

Problem 12.3.2 Solution

At time $n-1$, let $p_i(n-1)$ denote the state probabilities. By Theorem 12.4, the probability of state $k$ at time $n$ is

$$p_k(n) = \sum_{i=0}^{\infty} p_i(n-1) P_{ik}. \tag{1}$$

Since $P_{ik} = q$ for every state $i$,

$$p_k(n) = q \sum_{i=0}^{\infty} p_i(n-1) = q. \tag{2}$$

Thus for any time $n > 0$, the probability of state $k$ is $q$.

Problem 12.3.3 Solution

In this problem, the arrivals are the occurrences of packets in error. It would seem that $N(t)$ cannot be a renewal process because the interarrival times seem to depend on the previous interarrival times. However, following a packet error, the sequence of packets that are correct (c) or in error (e) up to and including the next error is given by the tree

[Tree diagram: following an error, the next packet is in error (e) with probability 0.9, giving X = 1, or correct (c) with probability 0.1. After each correct packet, the next packet is in error with probability 0.01, giving X = 2, X = 3, ..., or correct with probability 0.99.]

Assuming that sending a packet takes one unit of time, the time $X$ until the next packet error has the PMF

$$P_X(x) = \begin{cases} 0.9 & x = 1, \\ 0.001(0.99)^{x-2} & x = 2, 3, \ldots \\ 0 & \text{otherwise}. \end{cases} \tag{1}$$

Thus, following an error, the time until the next error always has the same PMF. Moreover, this time is independent of previous interarrival times since it depends only on the Bernoulli trials following a packet error. It would appear that $N(t)$ is a renewal process; however, there is one additional complication. At time 0, we need to know the probability $p$ of an error for the first packet.
If $p = 0.9$, then $X_1$, the time until the first error, has the same PMF as $X$ above and the process is a renewal process. If $p \ne 0.9$, then the time until the first error is different from subsequent renewal times. In this case, the process is a delayed renewal process.

Problem 12.4.1 Solution

The hardest part of this problem is that we are asked to find all ways of replacing a branch. The primary problem with the Markov chain in Problem 12.1.1 is that state 2 is a transient state. We can get rid of the transient behavior by making a nonzero branch probability $P_{12}$ or $P_{02}$. The possible ways to do this are:

• Replace $P_{00} = 1/2$ with $P_{02} = 1/2$
• Replace $P_{01} = 1/2$ with $P_{02} = 1/2$
• Replace $P_{11} = 1/2$ with $P_{12} = 1/2$
• Replace $P_{10} = 1/2$ with $P_{12} = 1/2$

Keep in mind that even if we make one of these replacements, there will be at least one self transition probability, either $P_{00}$ or $P_{11}$, that will be nonzero. This will guarantee that the resulting Markov chain will be aperiodic.

Problem 12.4.2 Solution

The chain given in Example 12.11 has two communicating classes as well as the transient state 2. To create a single communicating class, we need to add a transition that enters state 2. Yet, no matter how we add such a transition, we will still have two communicating classes. A second transition will be needed to create a single communicating class. Thus, we need to add two branches. There are many possible pairs of branches. Some pairs of positive branch probabilities that create an irreducible chain are

$$\{P_{12}, P_{23}\}, \qquad \{P_{50}, P_{02}\}, \qquad \{P_{51}, P_{02}\}. \tag{1}$$

Problem 12.4.3 Solution

The idea behind this claim is that if states $j$ and $i$ communicate, then sometimes when we go from state $j$ back to state $j$, we will pass through state $i$. If $E[T_{ij}] = \infty$, then on those occasions we pass through $i$, the expected time to go back to $j$ will be infinite. This would suggest $E[T_{jj}] = \infty$ and thus state $j$ would not be positive recurrent.
Using math to prove this requires a little bit of care. Suppose $E[T_{ij}] = \infty$. Since $i$ and $j$ communicate, we can find $n$, the smallest nonnegative integer such that $P_{ji}(n) > 0$. Given we start in state $j$, let $G_i$ denote the event that we go through state $i$ on our way back to $j$. By conditioning on $G_i$,

$$E[T_{jj}] = E[T_{jj} \mid G_i]\, P[G_i] + E[T_{jj} \mid G_i^c]\, P[G_i^c]. \tag{1}$$

Since $E[T_{jj} \mid G_i^c]\, P[G_i^c] \ge 0$,

$$E[T_{jj}] \ge E[T_{jj} \mid G_i]\, P[G_i]. \tag{2}$$

Given the event $G_i$, $T_{jj} = T_{ji} + T_{ij}$. This implies

$$E[T_{jj} \mid G_i] = E[T_{ji} \mid G_i] + E[T_{ij} \mid G_i] \ge E[T_{ij} \mid G_i]. \tag{3}$$

Since the random variable $T_{ij}$ assumes that we start in state $i$, $E[T_{ij} \mid G_i] = E[T_{ij}]$. Thus $E[T_{jj} \mid G_i] \ge E[T_{ij}]$. In addition, $P[G_i] \ge P_{ji}(n)$ since there may be paths with more than $n$ hops that take the system from state $j$ to $i$. These facts imply

$$E[T_{jj}] \ge E[T_{jj} \mid G_i]\, P[G_i] \ge E[T_{ij}]\, P_{ji}(n) = \infty. \tag{4}$$

Thus, state $j$ is not positive recurrent, which is a contradiction. Hence, it must be that $E[T_{ij}] < \infty$.

Problem 12.5.1 Solution

In the solution to Problem 12.1.5, we found that the state transition matrix was

$$\mathbf{P} = \begin{bmatrix} 0.998 & 0.001 & 0.001 \\ 0.004 & 0.98 & 0.016 \\ 0.008 & 0.012 & 0.98 \end{bmatrix}. \tag{1}$$

We can find the stationary probability vector $\boldsymbol{\pi} = \begin{bmatrix} \pi_0 & \pi_1 & \pi_2 \end{bmatrix}'$ by solving $\boldsymbol{\pi}' = \boldsymbol{\pi}'\mathbf{P}$ along with $\pi_0 + \pi_1 + \pi_2 = 1$. It's possible to find the solution by hand but it's easier to use MATLAB or a similar tool. The solution is $\boldsymbol{\pi} = \begin{bmatrix} 0.7536 & 0.1159 & 0.1304 \end{bmatrix}'$.

Problem 12.5.2 Solution

From the Markov chain given in Problem 12.1.1, the state transition matrix is

$$\mathbf{P} = \begin{bmatrix} 0.5 & 0.5 & 0 \\ 0.5 & 0.5 & 0 \\ 0.25 & 0.25 & 0.5 \end{bmatrix}. \tag{1}$$

We find the stationary probabilities $\boldsymbol{\pi} = \begin{bmatrix} \pi_0 & \pi_1 & \pi_2 \end{bmatrix}'$ by solving

$$\boldsymbol{\pi}' = \boldsymbol{\pi}'\mathbf{P}, \qquad \sum_{j=0}^{2} \pi_j = 1. \tag{2}$$

Of course, one equation of $\boldsymbol{\pi}' = \boldsymbol{\pi}'\mathbf{P}$ will be redundant. The three independent equations are

\begin{align}
\pi_0 &= 0.5\pi_0 + 0.5\pi_1 + 0.25\pi_2 \tag{3} \\
\pi_2 &= 0.5\pi_2 \tag{4} \\
1 &= \pi_0 + \pi_1 + \pi_2 \tag{5}
\end{align}

From the second equation, we see that $\pi_2 = 0$.
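This leaves the two equations:

\begin{align}
\pi_0 &= 0.5\pi_0 + 0.5\pi_1 \tag{6} \\
1 &= \pi_0 + \pi_1 \tag{7}
\end{align}

Solving these two equations yields $\pi_0 = \pi_1 = 0.5$. The stationary probability vector is

$$\boldsymbol{\pi} = \begin{bmatrix} \pi_0 & \pi_1 & \pi_2 \end{bmatrix}' = \begin{bmatrix} 0.5 & 0.5 & 0 \end{bmatrix}'. \tag{8}$$

If you happened to solve Problem 12.2.2, you would have found that the $n$-step transition matrix is

$$\mathbf{P}^n = \begin{bmatrix} 0.5 & 0.5 & 0 \\ 0.5 & 0.5 & 0 \\ 0.5 - (0.5)^{n+1} & 0.5 - (0.5)^{n+1} & (0.5)^n \end{bmatrix}. \tag{9}$$

From Theorem 12.21, we know that each row of the $n$-step transition matrix converges to $\boldsymbol{\pi}'$. In this case,

$$\lim_{n\to\infty} \mathbf{P}^n = \lim_{n\to\infty} \begin{bmatrix} 0.5 & 0.5 & 0 \\ 0.5 & 0.5 & 0 \\ 0.5 - (0.5)^{n+1} & 0.5 - (0.5)^{n+1} & (0.5)^n \end{bmatrix} = \begin{bmatrix} 0.5 & 0.5 & 0 \\ 0.5 & 0.5 & 0 \\ 0.5 & 0.5 & 0 \end{bmatrix} = \begin{bmatrix} \boldsymbol{\pi}' \\ \boldsymbol{\pi}' \\ \boldsymbol{\pi}' \end{bmatrix}. \tag{10}$$

Problem 12.5.3 Solution

From the problem statement, the Markov chain is

[Figure: birth-death chain on states 0, 1, 2, 3, 4 with probability $p$ of moving from state $i$ to $i+1$ and probability $1-p$ of moving from state $i$ to $i-1$; state 0 has a self-transition with probability $1-p$ and state 4 has a self-transition with probability $p$.]

The self-transitions in state 0 and state 4 guarantee that the Markov chain is aperiodic. Since the chain is also irreducible, we can find the stationary probabilities by solving $\boldsymbol{\pi}' = \boldsymbol{\pi}'\mathbf{P}$; however, in this problem it is simpler to apply Theorem 12.13. In particular, by partitioning the chain between states $i$ and $i+1$, we obtain

$$\pi_i\, p = \pi_{i+1}(1-p). \tag{1}$$

This implies $\pi_{i+1} = \alpha\pi_i$ where $\alpha = p/(1-p)$. It follows that $\pi_i = \alpha^i\pi_0$. Requiring the stationary probabilities to sum to 1 yields

$$\sum_{i=0}^{4} \pi_i = \pi_0(1 + \alpha + \alpha^2 + \alpha^3 + \alpha^4) = 1. \tag{2}$$

Since $1 + \alpha + \cdots + \alpha^4 = (1-\alpha^5)/(1-\alpha)$ for $\alpha \ne 1$, this implies

$$\pi_0 = \frac{1-\alpha}{1-\alpha^5}. \tag{3}$$

Thus, for $i = 0, 1, \ldots, 4$,

$$\pi_i = \frac{1-\alpha}{1-\alpha^5}\,\alpha^i = \frac{1 - \frac{p}{1-p}}{1 - \left(\frac{p}{1-p}\right)^5} \left(\frac{p}{1-p}\right)^i. \tag{4}$$

Problem 12.5.4 Solution

From the problem statement, the Markov chain is

[Figure: birth-death chain on states 0, 1, ..., i, i+1, ..., K with probability $p$ of moving from state $i$ to $i+1$ and probability $1-p$ of moving from state $i$ to $i-1$; state 0 has a self-transition with probability $1-p$ and state K has a self-transition with probability $p$.]

The self-transitions in state 0 and state $K$ guarantee that the Markov chain is aperiodic. Since the chain is also irreducible, we can find the stationary probabilities by solving $\boldsymbol{\pi}' = \boldsymbol{\pi}'\mathbf{P}$; however, in this problem it is simpler to apply Theorem 12.13. In particular, by partitioning the chain between states $i$ and $i+1$, we obtain

$$\pi_i\, p = \pi_{i+1}(1-p). \tag{1}$$

This implies $\pi_{i+1} = \alpha\pi_i$ where $\alpha = p/(1-p)$. It follows that $\pi_i = \alpha^i\pi_0$.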
Requiring the stationary probabilities to sum to 1 yields

$$\sum_{i=0}^{K} \pi_i = \pi_0(1 + \alpha + \alpha^2 + \cdots + \alpha^K) = 1. \tag{2}$$

Since $1 + \alpha + \cdots + \alpha^K = (1-\alpha^{K+1})/(1-\alpha)$ for $\alpha \ne 1$, this implies

$$\pi_0 = \frac{1-\alpha}{1-\alpha^{K+1}}. \tag{3}$$

Thus, for $i = 0, 1, \ldots, K$,

$$\pi_i = \frac{1-\alpha}{1-\alpha^{K+1}}\,\alpha^i = \frac{1 - \frac{p}{1-p}}{1 - \left(\frac{p}{1-p}\right)^{K+1}} \left(\frac{p}{1-p}\right)^i. \tag{4}$$

Problem 12.5.5 Solution

For this system, it's hard to draw the entire Markov chain since from each state $n$ there are six branches, each with probability 1/6, to states $n+1, n+2, \ldots, n+6$. (Of course, if $n+k > K-1$, then the transition is to state $(n+k) \bmod K$.) Nevertheless, finding the stationary probabilities is not very hard. In particular, the $n$th equation of $\boldsymbol{\pi}' = \boldsymbol{\pi}'\mathbf{P}$ yields

$$\pi_n = \frac{1}{6}\left(\pi_{n-6} + \pi_{n-5} + \pi_{n-4} + \pi_{n-3} + \pi_{n-2} + \pi_{n-1}\right). \tag{1}$$

Rather than try to solve these equations algebraically, it's easier to guess that the solution is

$$\boldsymbol{\pi} = \begin{bmatrix} 1/K & 1/K & \cdots & 1/K \end{bmatrix}'. \tag{2}$$

It's easy to check that $1/K = (1/6) \cdot 6 \cdot (1/K)$.

Problem 12.5.6 Solution

This system has three states:

0 front teller busy, rear teller idle
1 front teller busy, rear teller busy
2 front teller idle, rear teller busy

We will assume the units of time are seconds. Thus, if a teller is busy one second, the teller will become idle in the next second with probability $p = 1/120$. The Markov chain for this system is

[Figure: three-state Markov chain on states 0, 1, 2. States 0 and 2 each have a self-transition with probability $1-p$ and a transition to state 1 with probability $p$. State 1 has a self-transition with probability $p^2 + (1-p)^2$ and transitions to states 0 and 2, each with probability $p(1-p)$.]

We can solve this chain very easily for the stationary probability vector $\boldsymbol{\pi}$. In particular,

$$\pi_0 = (1-p)\pi_0 + p(1-p)\pi_1. \tag{1}$$

This implies that $\pi_0 = (1-p)\pi_1$. Similarly,

$$\pi_2 = (1-p)\pi_2 + p(1-p)\pi_1 \tag{2}$$

yields $\pi_2 = (1-p)\pi_1$. Hence, by applying $\pi_0 + \pi_1 + \pi_2 = 1$, we obtain

\begin{align}
\pi_0 = \pi_2 &= \frac{1-p}{3-2p} = 119/358, \tag{3} \\
\pi_1 &= \frac{1}{3-2p} = 120/358. \tag{4}
\end{align}

The stationary probability that both tellers are busy is $\pi_1 = 120/358$.

Problem 12.5.7 Solution

In this case, we will examine the system each minute. For each customer in service, we need to keep track of how soon the customer will depart.
For the state of the system, we will use $(i, j)$, the remaining service requirements of the two customers. To reduce the number of states, we will order the requirements so that $i \le j$. For example, when two new customers start service each requiring two minutes of service, the system state will be (2, 2). Since the system assumes there is always a backlog of cars waiting to enter service, the set of states is

0 (0, 1) One teller is idle, the other teller has a customer requiring one more minute of service.
1 (1, 1) Each teller has a customer requiring one more minute of service.
2 (1, 2) One teller has a customer requiring one minute of service. The other teller has a customer requiring two minutes of service.
3 (2, 2) Each teller has a customer requiring two minutes of service.

The resulting Markov chain is shown in the figure. Note that departing from either state (0, 1) or (1, 1) corresponds to both customers finishing service and two new customers entering service. The state transition probabilities reflect the fact that both customers will have two minute service requirements with probability 1/4, or both customers will have one minute service requirements with probability 1/4, or one customer will need one minute of service and the other will need two minutes of service with probability 1/2.

[Figure: four-state Markov chain on states 0 (0,1), 1 (1,1), 2 (1,2), 3 (2,2) with branch probabilities 1/4, 1/2 and 1.]

Writing the stationary probability equations for states 0, 2, and 3 and adding the constraint $\sum_j \pi_j = 1$ yields the following equations:

\begin{align}
\pi_0 &= \pi_2 \tag{1} \\
\pi_2 &= (1/2)\pi_0 + (1/2)\pi_1 \tag{2} \\
\pi_3 &= (1/4)\pi_0 + (1/4)\pi_1 \tag{3} \\
1 &= \pi_0 + \pi_1 + \pi_2 + \pi_3 \tag{4}
\end{align}

Substituting $\pi_2 = \pi_0$ in the second equation yields $\pi_1 = \pi_0$. Substituting that result in the third equation yields $\pi_3 = \pi_0/2$. Making sure the probabilities add up to 1 yields

$$\boldsymbol{\pi} = \begin{bmatrix} \pi_0 & \pi_1 & \pi_2 & \pi_3 \end{bmatrix}' = \begin{bmatrix} 2/7 & 2/7 & 2/7 & 1/7 \end{bmatrix}'. \tag{5}$$

Both tellers are busy unless the system is in state 0. The stationary probability that both tellers are busy is $1 - \pi_0 = 5/7$.
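As a numerical cross-check of the stationary probabilities in Problem 12.5.7, the following Matlab sketch solves the stationary equations together with the normalization constraint as one linear system. The transition matrix P is reconstructed here from the verbal description of the transitions (it is not printed in the solution), so it should be read as an assumption. The names P, A, b, piv and busy are illustrative.

```matlab
% Sketch: stationary probabilities for the two-teller chain of Problem 12.5.7.
% State order: 0=(0,1), 1=(1,1), 2=(1,2), 3=(2,2). The matrix below is an
% assumption reconstructed from the text of the solution.
P = [0 1/4 1/2 1/4;    % from (0,1): two new customers enter service
     0 1/4 1/2 1/4;    % from (1,1): two new customers enter service
     1 0   0   0;      % from (1,2): one minute later the state is (0,1)
     0 1   0   0];     % from (2,2): one minute later the state is (1,1)
n = size(P,1);
% Stack pi'*P = pi' (written as (P'-I)*pi = 0) on top of sum(pi) = 1
A = [P' - eye(n); ones(1,n)];
b = [zeros(n,1); 1];
piv = (A\b)';          % backslash, rather than forming an explicit inverse
busy = 1 - piv(1);     % probability that both tellers are busy
disp(piv); disp(busy);
```

The stacked system has one redundant equation but is consistent, so the backslash operator returns the exact solution up to rounding. The result can be compared with the vector [2/7 2/7 2/7 1/7] and the value 5/7 found above.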
Problem 12.5.8 Solution

Under construction.

Problem 12.5.9 Solution

Under construction.

Problem 12.6.1 Solution

Equivalently, we can prove that if $P_{ii} \ne 0$ for some $i$, then the chain cannot be periodic. So, suppose for state $i$, $P_{ii} > 0$. Since $P_{ii} = P_{ii}(1)$, we see that the largest $d$ that divides $n$ for all $n$ such that $P_{ii}(n) > 0$ is $d = 1$. Hence, state $i$ is aperiodic and thus the chain is aperiodic.

The converse, that $P_{ii} = 0$ for all $i$ implies the chain is periodic, is false. As a counterexample, consider the simple chain in the figure with $P_{ii} = 0$ for each $i$. Note that $P_{00}(2) > 0$ and $P_{00}(3) > 0$. The largest $d$ that divides both 2 and 3 is $d = 1$. Hence, state 0 is aperiodic. Since the chain has one communicating class, the chain is also aperiodic.

[Figure: three-state chain on states 0, 1, 2 in which each state moves to each of the other two states with probability 0.5.]

Problem 12.6.2 Solution

The Markov chain for this system is

[Figure: Markov chain on states 0, 1, 2, ..., K−1, K. From state $n-1$ there is a transition to state $n$ with probability $P[N > n \mid N > n-1]$, and from state $n$ there is a transition back to state 0 with probability $P[N = n+1 \mid N > n]$.]

Note that $P[N > 0] = 1$ and that

$$P[N > n \mid N > n-1] = \frac{P[N > n, N > n-1]}{P[N > n-1]} = \frac{P[N > n]}{P[N > n-1]}. \tag{1}$$

Solving $\boldsymbol{\pi}' = \boldsymbol{\pi}'\mathbf{P}$ yields

\begin{align}
\pi_1 &= P[N > 1 \mid N > 0]\,\pi_0 = P[N > 1]\,\pi_0 \tag{2} \\
\pi_2 &= P[N > 2 \mid N > 1]\,\pi_1 = \frac{P[N > 2]}{P[N > 1]}\,\pi_1 = P[N > 2]\,\pi_0 \tag{3} \\
&\ \ \vdots \notag \\
\pi_n &= P[N > n \mid N > n-1]\,\pi_{n-1} = \frac{P[N > n]}{P[N > n-1]}\,\pi_{n-1} = P[N > n]\,\pi_0 \tag{4}
\end{align}

Next we apply the requirement that the stationary probabilities sum to 1. Since $P[N \le K+1] = 1$, we see for $n \ge K+1$ that $P[N > n] = 0$. Thus

$$1 = \sum_{n=0}^{K} \pi_n = \pi_0 \sum_{n=0}^{K} P[N > n] = \pi_0 \sum_{n=0}^{\infty} P[N > n]. \tag{5}$$

From Problem 2.5.11, we recall that $\sum_{n=0}^{\infty} P[N > n] = E[N]$. This implies $\pi_0 = 1/E[N]$ and that

$$\pi_n = \frac{P[N > n]}{E[N]}. \tag{6}$$

This is exactly the same stationary distribution found in Quiz 12.5! In this problem, we can view the system state as describing the age of an object that is repeatedly replaced. In state 0, we start with a new (zero age) object, and each unit of time, the object ages one unit of time. The random
The random 426 variable N is the lifetime of the object. A transition to state 0 corresponds to the current object expiring and being replaced by a new object. In Quiz 12.5, the system state described a countdown timer for the residual life of an object. At state 0, the system would transition to a state N = n corresponding to the lifetime of n for a new object. This object expires and is replaced each time that state 0 is reached. This solution and the solution to Quiz 12.5 show that the age and the residual life have the same stationary distribution. That is, if we inspect an object at an arbitrary time in the distant future, the PMF of the age of the object is the same as the PMF of the residual life. Problem 12.6.3 Solution Consider an arbitrary state i ∈ C l . To show that state i has period L, we must show that P ii (n) > 0 implies n is divisible by L. Consider any sequence of states i, i 1 , i 2 , . . . , i n = i that returns to state i in n steps. The structure of the Markov chain implies that i 1 ∈ C l+1 , i 2 ∈ C l+2 and so on. In general, i j ∈ C (l+j) mod L . Moreover, since state i ∈ C l , i n = i only if i n ∈ C l . Since i n ∈ C (l+n) mod L , we must have that (l +n) mod L = l. This occurs if and only if n is divisible by L. Problem 12.6.4 Solution This problem is easy as long as you don’t try to prove that the chain must have a single communi- cating class. A counterexample is easy to find. Here is a 4 state Markov chain with two recurrent communicating classes and each state having period 2. The two classes C 0 and C 1 are shown. 0 1 2 3 C 0 C 1 From each state i ∈ C 0 , all transitions are to states j ∈ C 1 . Similarly, from each state i ∈ C 1 , only transitions to states j ∈ C 0 are permitted. The sets {0, 1} and {2, 3} are each communicating classes. However, each state has period 2. Problem 12.6.5 Solution Since the Markov chain has period d, there exists a state i 0 such that P i 0 i 0 (d) > 0. 
For this state $i_0$, let $C_n(i_0)$ denote the set of states that are reachable from $i_0$ in $n$ hops. We make the following claim:

• If $j \in C_n(i_0)$ and $j \in C_{n'}(i_0)$ with $n' > n$, then $n' = n + kd$ for some integer $k$.

To prove this claim, we observe that irreducibility of the Markov chain implies there is a sequence of $m$ hops from state $j$ to state $i_0$. Since there is an $n$ hop path from $i_0$ to $j$ and an $m$ hop path from $j$ back to $i_0$, $n + m = kd$ for some integer $k$. Similarly, since there is an $n'$ hop path from $i_0$ to $j$, $n' + m = k'd$ for some integer $k'$. Thus

$$n' - n = (n' + m) - (n + m) = (k' - k)d. \tag{1}$$

Now we define

$$C_n = \bigcup_{k=0}^{\infty} C_{n+kd}(i_0), \qquad n = 0, 1, \ldots, d-1. \tag{2}$$

Because the chain is irreducible, any state $i$ belongs to some set $C_n(i_0)$, and thus any state $i$ belongs to at least one set $C_n$. By our earlier claim, each state $i$ belongs to exactly one set $C_n$. Hence, $\{C_0, \ldots, C_{d-1}\}$ is a partition of the states of the Markov chain. By construction of the set $C_n(i_0)$, there exist states $i \in C_n(i_0)$ and $j \in C_{n+1}(i_0)$ such that $P_{ij} > 0$. Hence there exist $i \in C_n$ and $j \in C_{n+1}$ such that $P_{ij} > 0$.

Now suppose there exist states $i \in C_n$ and $j \in C_{n+m}$ such that $P_{ij} > 0$ and $m > 1$. In this case, the sequence of $n + kd$ hops from $i_0$ to $i$ followed by one hop to state $j$ is an $n + 1 + kd$ hop path from $i_0$ to $j$, implying $j \in C_{n+1}$. This contradicts the fact that $j$ cannot belong to both $C_{n+1}$ and $C_{n+m}$. Hence no such transition from $i \in C_n$ to $j \in C_{n+m}$ is possible.

Problem 12.8.1 Solution

A careful reader of the text will observe that the Markov chain in this problem is identical to the Markov chain introduced in Example 12.21:

[Figure: infinite birth-death chain on states 0, 1, 2, ... with probability $p$ of moving from state $i$ to $i+1$ and probability $1-p$ of moving from state $i$ to $i-1$; state 0 has a self-transition with probability $1-p$.]

The stationary probabilities were found in Example 12.26 to be

$$\pi_i = (1-\alpha)\alpha^i, \qquad i = 0, 1, 2, \ldots \tag{1}$$

where $\alpha = p/(1-p)$. Note that the stationary probabilities do not exist if $\alpha \ge 1$, or equivalently, $p \ge 1/2$.
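The geometric form of the stationary probabilities in Problem 12.8.1 can be checked numerically by truncating the infinite chain. The sketch below is an illustration only: the value p = 0.3, the truncation level K = 60, and the self-transition added at state K to close the truncated chain are all assumptions introduced here, and the transition structure is inferred from the chain diagram.

```matlab
% Sketch: compare (1-alpha)*alpha^i with the stationary probabilities of
% a truncated version of the chain (states 0..K). p and K are assumed values.
p = 0.3;
K = 60;
alpha = p/(1-p);
P = zeros(K+1);
P(1,1) = 1-p;  P(1,2) = p;           % state 0: stay w.p. 1-p, up w.p. p
for i = 2:K
    P(i,i-1) = 1-p;                  % down one state
    P(i,i+1) = p;                    % up one state
end
P(K+1,K) = 1-p;  P(K+1,K+1) = p;     % truncation: self-transition at state K
A = [P' - eye(K+1); ones(1,K+1)];    % stationary equations plus normalization
b = [zeros(K+1,1); 1];
piNum = (A\b)';
piFormula = (1-alpha)*alpha.^(0:K);
err = max(abs(piNum - piFormula));
disp(err);
```

Because α^(K+1) is negligible for these values, the truncated chain and the formula agree to within rounding error. Repeating the experiment with p close to 1/2 requires a much larger K, which reflects the fact that the stationary probabilities cease to exist as α approaches 1.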
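Problem 12.8.2 Solution

If there are $k$ customers in the system at time $n$, then at time $n+1$, the number of customers in the system is either $k-1$ (if the customer in service departs and no new customer arrives), $k$ (if either there is no new arrival and no departure or if there is both a new arrival and a departure) or $k+1$ (if there is a new arrival but no new departure). The transition probabilities are given by the following chain:

[Figure: infinite birth-death chain on states 0, 1, 2, .... State 0 moves to state 1 with probability $p$ and stays in state 0 with probability $1-p$. Each state $i > 0$ moves to state $i+1$ with probability $\alpha$, moves to state $i-1$ with probability $\delta$, and stays in state $i$ with probability $1-\alpha-\delta$.]

where $\alpha = p(1-q)$ and $\delta = q(1-p)$. To find the stationary probabilities, we apply Theorem 12.13 by partitioning the state space between states $S = \{0, 1, \ldots, i\}$ and $S' = \{i+1, i+2, \ldots\}$ as shown in Figure 12.4. By Theorem 12.13, for state $i > 0$,

$$\pi_i\,\alpha = \pi_{i+1}\,\delta. \tag{1}$$

This implies $\pi_{i+1} = (\alpha/\delta)\pi_i$. A cut between states 0 and 1 yields $\pi_1 = (p/\delta)\pi_0$. Combining these results, we have for any state $i > 0$,

$$\pi_i = \frac{p}{\delta}\left(\frac{\alpha}{\delta}\right)^{i-1}\pi_0. \tag{2}$$

Under the condition $\alpha < \delta$, it follows that

$$\sum_{i=0}^{\infty} \pi_i = \pi_0 + \pi_0 \sum_{i=1}^{\infty} \frac{p}{\delta}\left(\frac{\alpha}{\delta}\right)^{i-1} = \pi_0\left(1 + \frac{p/\delta}{1 - \alpha/\delta}\right) \tag{3}$$

since $p < q$ implies $\alpha/\delta < 1$. Thus, applying $\sum_i \pi_i = 1$ and noting $\delta - \alpha = q - p$, so that $1 + p/(\delta - \alpha) = q/(q-p)$, we have

$$\pi_0 = \frac{q-p}{q}, \qquad \pi_i = \frac{p(q-p)}{q^2(1-p)}\left[\frac{p/(1-p)}{q/(1-q)}\right]^{i-1}, \quad i = 1, 2, \ldots \tag{4}$$

Note that $\alpha < \delta$ if and only if $p < q$, which is both sufficient and necessary for the Markov chain to be positive recurrent.

Problem 12.9.1 Solution

From the problem statement, we learn that in each state $i$, the tiger spends an exponential time with parameter $\lambda_i$. When we measure time in hours,

$$\lambda_0 = q_{01} = 1/3, \qquad \lambda_1 = q_{12} = 1/2, \qquad \lambda_2 = q_{20} = 2. \tag{1}$$

The corresponding continuous time Markov chain is shown below:

[Figure: three-state continuous time Markov chain with transition rates $q_{01} = 1/3$, $q_{12} = 1/2$ and $q_{20} = 2$.]

The state probabilities satisfy

$$\frac{1}{3}p_0 = 2p_2, \qquad \frac{1}{2}p_1 = \frac{1}{3}p_0, \qquad p_0 + p_1 + p_2 = 1. \tag{2}$$

The solution is

$$\begin{bmatrix} p_0 & p_1 & p_2 \end{bmatrix} = \begin{bmatrix} 6/11 & 4/11 & 1/11 \end{bmatrix}. \tag{3}$$

Problem 12.9.2 Solution

In the continuous time chain, we have states 0 (silent) and 1 (active).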
The transition rates and the chain are

[Figure: two-state continuous time Markov chain; state 0 moves to state 1 at rate 0.71 and state 1 moves to state 0 at rate 1.]

$$q_{01} = \frac{1}{1.4} = 0.7143, \qquad q_{10} = 1.0. \tag{1}$$

The stationary probabilities satisfy $(1/1.4)p_0 = p_1$. Since $p_0 + p_1 = 1$, the stationary probabilities are

$$p_0 = \frac{1.4}{2.4} = \frac{7}{12}, \qquad p_1 = \frac{1}{2.4} = \frac{5}{12}. \tag{2}$$

In this case, the continuous time chain and the discrete time chain have the exact same state probabilities. In this problem, this is not surprising since we could use a renewal-reward process to calculate the fraction of time spent in state 0. From the renewal-reward process, we know that the fraction of time spent in state 0 depends only on the expected time in each state. Since in both the discrete time and continuous time chains, the expected time in each state is the same, the stationary probabilities must be the same. It is always possible to approximate a continuous time chain with a discrete time chain in which the unit of time is chosen to be very small. In general however, the stationary probabilities of the two chains will be close though not identical.

Problem 12.9.3 Solution

From each state $i$, there are transitions of rate $q_{ij} = 1$ to each of the other $k - 1$ states. Thus each state $i$ has departure rate $\nu_i = k - 1$. Thus, the stationary probabilities satisfy

$$p_j(k-1) = \sum_{i \ne j} p_i, \qquad j = 1, 2, \ldots, k. \tag{1}$$

It is easy to verify that the solution to these equations is

$$p_j = \frac{1}{k}, \qquad j = 1, 2, \ldots, k. \tag{2}$$

Problem 12.9.4 Solution

In this problem, we build a two-state Markov chain such that the system is in state $i \in \{0, 1\}$ if the most recent arrival of either Poisson process is type $i$. Note that if the system is in state 0, transitions to state 1 occur with rate $\lambda_1$. If the system is in state 1, transitions to state 0 occur at rate $\lambda_0$. The continuous time Markov chain is just

[Figure: two-state continuous time Markov chain; state 0 moves to state 1 at rate $\lambda_1$ and state 1 moves to state 0 at rate $\lambda_0$.]

The stationary probabilities satisfy $p_0\lambda_1 = p_1\lambda_0$. Thus $p_1 = (\lambda_1/\lambda_0)p_0$. Since $p_0 + p_1 = 1$, we have that

$$p_0 + (\lambda_1/\lambda_0)p_0 = 1. \tag{1}$$

This implies

$$p_0 = \frac{\lambda_0}{\lambda_0 + \lambda_1}, \qquad p_1 = \frac{\lambda_1}{\lambda_0 + \lambda_1}.$$
(2)

It is also possible to solve this problem using a discrete time Markov chain. One way to do this is to assume a very small time step $\Delta$. In state 0, a transition to state 1 occurs with probability $\lambda_1\Delta$; otherwise the system stays in state 0 with probability $1 - \lambda_1\Delta$. Similarly, in state 1, a transition to state 0 occurs with probability $\lambda_0\Delta$; otherwise the system stays in state 1 with probability $1 - \lambda_0\Delta$. Here is the Markov chain for this discrete time system:

[Figure: two-state discrete time Markov chain; state 0 moves to state 1 with probability $\lambda_1\Delta$ and stays put with probability $1-\lambda_1\Delta$; state 1 moves to state 0 with probability $\lambda_0\Delta$ and stays put with probability $1-\lambda_0\Delta$.]

Not surprisingly, the stationary probabilities for this discrete time system are

$$\pi_0 = \frac{\lambda_0}{\lambda_0 + \lambda_1}, \qquad \pi_1 = \frac{\lambda_1}{\lambda_0 + \lambda_1}. \tag{3}$$

Problem 12.10.1 Solution

In Equation (12.93), we found that the blocking probability of the M/M/c/c queue was given by the Erlang-B formula

$$P[B] = P_N(c) = \frac{\rho^c/c!}{\sum_{k=0}^{c} \rho^k/k!}. \tag{1}$$

The parameter $\rho = \lambda/\mu$ is the normalized load. When $c = 2$, the blocking probability is

$$P[B] = \frac{\rho^2/2}{1 + \rho + \rho^2/2}. \tag{2}$$

Setting $P[B] = 0.1$ yields the quadratic equation

$$\rho^2 - \frac{2}{9}\rho - \frac{2}{9} = 0. \tag{3}$$

The solutions to this quadratic are

$$\rho = \frac{1 \pm \sqrt{19}}{9}. \tag{4}$$

The meaningful nonnegative solution is $\rho = (1 + \sqrt{19})/9 = 0.5954$.

Problem 12.10.2 Solution

When we double the call arrival rate, the offered load $\rho = \lambda/\mu$ doubles to $\rho = 160$. However, since we double the number of circuits to $c = 200$, the offered load per circuit remains the same. In this case, if we inspect the system after a long time, the number of calls in progress is described by the stationary probabilities and has PMF

$$P_N(n) = \begin{cases} \dfrac{\rho^n/n!}{\sum_{k=0}^{200} \rho^k/k!} & n = 0, 1, \ldots, 200, \\ 0 & \text{otherwise.} \end{cases} \tag{1}$$

The probability a call is blocked is

$$P[B] = P_N(200) = \frac{\rho^{200}/200!}{\sum_{k=0}^{200} \rho^k/k!} = 2.76 \times 10^{-4}. \tag{2}$$

Note that although the load per server remains the same, doubling the number of circuits to 200 caused the blocking probability to go down by more than a factor of 10 (from 0.004 to $2.76 \times 10^{-4}$). This is a general property of the Erlang-B formula and is called trunking efficiency by telephone system engineers.
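The two Erlang-B results above can be cross-checked with a short Python sketch. It is not the manual's Matlab code: it uses the standard recursion $B_0 = 1$, $B_k = \rho B_{k-1}/(k + \rho B_{k-1})$, which follows from the Erlang-B formula and avoids large factorials altogether. The function name `erlang_b` and the assumption that the original system had $c = 100$ circuits with load $\rho = 80$ (half of the doubled values) are illustrative.

```python
import math


def erlang_b(rho, c):
    """Blocking probability of an M/M/c/c queue with offered load rho.

    Uses the recursion B_k = rho * B_{k-1} / (k + rho * B_{k-1}), B_0 = 1,
    so no factorial or large power is ever formed.
    """
    b = 1.0
    for k in range(1, c + 1):
        b = rho * b / (k + rho * b)
    return b


# Problem 12.10.1: c = 2 with the load found from the quadratic.
b2 = erlang_b((1 + math.sqrt(19)) / 9, 2)

# Problem 12.10.2: same load per circuit, before and after doubling.
b100 = erlang_b(80.0, 100)
b200 = erlang_b(160.0, 200)
print(b2, b100, b200)
```

With the load $\rho = (1+\sqrt{19})/9$ the first value should reproduce the target blocking probability of 0.1, and the last two can be compared against the 0.004 and $2.76 \times 10^{-4}$ quoted in the solution.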
The basic principle is that it's more efficient to share resources among larger groups. The hard part of calculating $P[B]$ is that most calculators, including MATLAB, have trouble calculating 200!. (In MATLAB, factorial is calculated using the gamma function. That is, 200! = gamma(201).) To do these calculations, you need to observe that if $q_n = \rho^n/n!$, then

$$q_{n+1} = \frac{\rho}{n+1}\, q_n. \tag{3}$$

A simple MATLAB program that uses this fact to calculate the Erlang-B formula for large values of $c$ is

    function y=erlangbsimple(r,c)
    %load is r=lambda/mu
    %number of servers is c
    p=1.0;
    psum=1.0;
    for k=1:c
        p=p*r/k;
        psum=psum+p;
    end
    y=p/psum;

Essentially the problems with the calculations of erlangbsimple.m are the same as those of calculating the Poisson PMF. A better program for calculating the Erlang-B formula uses the improvements employed in poissonpmf to calculate the Poisson PMF for large values. Here is the code:

    function pb=erlangb(rho,c);
    %Usage: pb=erlangb(rho,c)
    %returns the Erlang-B blocking
    %probability for an M/M/c/c
    %queue with load rho
    pn=exp(-rho)*poissonpmf(rho,0:c);
    pb=pn(c+1)/sum(pn);

Problem 12.10.3 Solution

In the M/M/1/c queue, there is one server and the system has capacity $c$. That is, in addition to the server, there is a waiting room that can hold $c - 1$ customers. With arrival rate $\lambda$ and service rate $\mu$, the Markov chain for this queue is

[Figure: Markov chain on states 0, 1, …, c − 1, c; each state moves up at rate $\lambda$ and down at rate $\mu$.]

By Theorem 12.24, the stationary probabilities satisfy $p_{i-1}\lambda = p_i\mu$. By defining $\rho = \lambda/\mu$, we have $p_i = \rho p_{i-1}$, which implies

$$p_n = \rho^n p_0, \qquad n = 0, 1, \ldots, c. \tag{1}$$

Applying the requirement that the stationary probabilities sum to 1 yields

$$\sum_{i=0}^{c} p_i = p_0\left[1 + \rho + \rho^2 + \cdots + \rho^c\right] = 1. \tag{2}$$

This implies

$$p_0 = \frac{1 - \rho}{1 - \rho^{c+1}}. \tag{3}$$

The stationary probabilities are

$$p_n = \frac{(1 - \rho)\rho^n}{1 - \rho^{c+1}}, \qquad n = 0, 1, \ldots, c. \tag{4}$$

Problem 12.10.4 Solution

Since the arrivals are a Poisson process and since the service requirements are exponentially distributed, the set of toll booths is an M/M/c/∞ queue.
With an arrival rate of $\lambda$ and a service rate of $\mu = 1$, the Markov chain is:

[Figure: Markov chain on states 0, 1, …, c, c + 1, …; each state moves up at rate $\lambda$; the departure rates are $\mu, 2\mu, \ldots, c\mu$ for states 1 through c and $c\mu$ for every state above c.]

In the solution to Quiz 12.10, we found that the stationary probabilities for the queue satisfied

$$p_n = \begin{cases} p_0\rho^n/n! & n = 1, 2, \ldots, c, \\ p_0(\rho/c)^{n-c}\rho^c/c! & n = c+1, c+2, \ldots \end{cases} \tag{1}$$

where $\rho = \lambda/\mu = \lambda$. We must be sure that $\rho$ is small enough that there exists $p_0 > 0$ such that

$$\sum_{n=0}^{\infty} p_n = p_0\left[1 + \sum_{n=1}^{c} \frac{\rho^n}{n!} + \frac{\rho^c}{c!}\sum_{n=c+1}^{\infty}\left(\frac{\rho}{c}\right)^{n-c}\right] = 1. \tag{2}$$

This requirement is met if and only if the infinite sum converges, which occurs if and only if

$$\sum_{n=c+1}^{\infty}\left(\frac{\rho}{c}\right)^{n-c} = \sum_{j=1}^{\infty}\left(\frac{\rho}{c}\right)^{j} < \infty. \tag{3}$$

That is, $p_0 > 0$ if and only if $\rho/c < 1$, or $\lambda < c$. In short, if the arrival rate in cars per second is less than the service rate (in cars per second) when all booths are busy, then the Markov chain has a stationary distribution. Note that if $\rho > c$, then the Markov chain is no longer positive recurrent and the backlog of cars will grow to infinity.

Problem 12.10.5 Solution

(a) In this case, we have two M/M/1 queues, each with an arrival rate of $\lambda/2$. By defining $\rho = \lambda/\mu$, each queue has a stationary distribution

$$p_n = (1 - \rho/2)(\rho/2)^n, \qquad n = 0, 1, \ldots \tag{1}$$

Note that in this case, the expected number in queue $i$ is

$$E[N_i] = \sum_{n=0}^{\infty} n p_n = \frac{\rho/2}{1 - \rho/2}. \tag{2}$$

The expected number in the system is

$$E[N_1] + E[N_2] = \frac{\rho}{1 - \rho/2}. \tag{3}$$

(b) The combined queue is an M/M/2/∞ queue. As in the solution to Quiz 12.10, the stationary probabilities satisfy

$$p_n = \begin{cases} p_0\rho^n/n! & n = 1, 2, \\ p_0(\rho/2)^{n-2}\rho^2/2 & n = 3, 4, \ldots \end{cases} \tag{4}$$

The requirement that $\sum_{n=0}^{\infty} p_n = 1$ yields

$$p_0 = \left[1 + \rho + \frac{\rho^2}{2} + \frac{\rho^2}{2}\,\frac{\rho/2}{1 - \rho/2}\right]^{-1} = \frac{1 - \rho/2}{1 + \rho/2}. \tag{5}$$

The expected number in the system is $E[N] = \sum_{n=1}^{\infty} n p_n$. Some algebra will show that

$$E[N] = \frac{\rho}{1 - (\rho/2)^2}. \tag{6}$$

We see that the average number in the combined queue is lower than in the system with individual queues.
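The comparison in Problem 12.10.5 can be checked numerically. The Python sketch below is an illustration rather than part of the manual: the function names, the truncation at 2000 states and the sample loads are assumptions. It evaluates the two closed forms and also sums the M/M/2 stationary probabilities directly, as a check on the "some algebra" step.

```python
def mean_separate(rho):
    """E[N1] + E[N2] for two M/M/1 queues, each fed at rate lambda / 2."""
    return rho / (1 - rho / 2)


def mean_combined(rho):
    """Closed form E[N] for the shared M/M/2 queue."""
    return rho / (1 - (rho / 2) ** 2)


def mean_combined_numeric(rho, n_max=2000):
    """E[N] for the M/M/2 queue by summing truncated stationary weights."""
    # Unnormalized weights: 1, rho, rho^2/2, then a constant ratio rho/2.
    weights = [1.0, rho]
    while len(weights) <= n_max:
        weights.append(weights[-1] * rho / 2)
    total = sum(weights)
    return sum(n * w for n, w in enumerate(weights)) / total


for rho in (0.5, 1.0, 1.5, 1.8):
    print(rho, mean_separate(rho), mean_combined(rho), mean_combined_numeric(rho))
```

For each load $0 < \rho < 2$ tried here, the direct sum should agree with the closed form, and the combined queue should hold fewer customers on average than the two separate queues.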
The reason for this is that in the system with individual queues, there is a possibility that one of the queues becomes empty while there is more than one person in the other queue.

Problem 12.10.6 Solution

The LCFS queue operates in a way that is quite different from the usual first come, first served queue. However, under the assumptions of exponential service times and Poisson arrivals, customers arrive at rate $\lambda$ and depart at rate $\mu$, no matter which service discipline is used. The Markov chain for the LCFS queue is the same as the Markov chain for the M/M/1 first come, first served queue:

[Figure: Markov chain on states 0, 1, 2, 3, …; each state moves up at rate $\lambda$ and down at rate $\mu$.]

It would seem that the LCFS queue should be less efficient than the ordinary M/M/1 queue because a new arrival causes us to discard the work done on the customer in service. This is not the case, however, because the memoryless property of the exponential PDF implies that no matter how much service had already been performed, the remaining service time remains identical to that of a new customer.

Problem 12.10.7 Solution

Since both types of calls have exponential holding times, the number of calls in the system can be used as the system state. The corresponding Markov chain is

[Figure: Markov chain on states 0, 1, …, c − r, …, c − 1, c; the arrival rate is $\lambda + h$ in states below c − r and $h$ in states c − r through c − 1; the departure rate from state n is n.]

When the number of calls, $n$, is less than $c - r$, we admit either type of call and $q_{n,n+1} = \lambda + h$. When $n \ge c - r$, we block the new calls and we admit only handoff calls so that $q_{n,n+1} = h$. Since the service times are exponential with an average time of 1 minute, the call departure rate in state $n$ is $n$ calls per minute. Theorem 12.24 says that the stationary probabilities $p_n$ satisfy

$$p_n = \begin{cases} \dfrac{\lambda + h}{n}\, p_{n-1} & n = 1, 2, \ldots, c - r, \\[2mm] \dfrac{h}{n}\, p_{n-1} & n = c - r + 1, c - r + 2, \ldots, c. \end{cases} \tag{1}$$

This implies

$$p_n = \begin{cases} \dfrac{(\lambda + h)^n}{n!}\, p_0 & n = 1, 2, \ldots, c - r, \\[2mm] \dfrac{(\lambda + h)^{c-r} h^{n-(c-r)}}{n!}\, p_0 & n = c - r + 1, c - r + 2, \ldots, c. \end{cases} \tag{2}$$

The requirement that $\sum_{n=0}^{c} p_n = 1$ yields

$$p_0\left[\sum_{n=0}^{c-r} \frac{(\lambda + h)^n}{n!} + (\lambda + h)^{c-r}\sum_{n=c-r+1}^{c} \frac{h^{n-(c-r)}}{n!}\right] = 1. \tag{3}$$

Finally, a handoff call is dropped if and only if it finds the system with $c$ calls in progress. The probability that a handoff call is dropped is

$$P[H] = p_c = \frac{(\lambda + h)^{c-r} h^r}{c!}\, p_0 = \frac{(\lambda + h)^{c-r} h^r / c!}{\displaystyle\sum_{n=0}^{c-r} \frac{(\lambda + h)^n}{n!} + \left(\frac{\lambda + h}{h}\right)^{c-r}\sum_{n=c-r+1}^{c} \frac{h^n}{n!}}. \tag{4}$$

Problem 12.11.1 Solution

Here is the Markov chain describing the free throws.

[Figure: Markov chain on states −4, −3, …, 3, 4; in state k a made free throw has probability 0.5 + 0.1k and a miss has probability 0.5 − 0.1k.]

Note that state 4 corresponds to “4 or more consecutive successes” while state −4 corresponds to “4 or more consecutive misses.” We denote the stationary probabilities by the vector

$$\boldsymbol{\pi} = \begin{bmatrix} \pi_{-4} & \pi_{-3} & \pi_{-2} & \pi_{-1} & \pi_0 & \pi_1 & \pi_2 & \pi_3 & \pi_4 \end{bmatrix}'. \tag{1}$$

For this vector $\boldsymbol{\pi}$, the state transition matrix is

$$\mathbf{P} = \begin{bmatrix}
0.9 & 0 & 0 & 0 & 0 & 0.1 & 0 & 0 & 0 \\
0.8 & 0 & 0 & 0 & 0 & 0.2 & 0 & 0 & 0 \\
0 & 0.7 & 0 & 0 & 0 & 0.3 & 0 & 0 & 0 \\
0 & 0 & 0.6 & 0 & 0 & 0.4 & 0 & 0 & 0 \\
0 & 0 & 0 & 0.5 & 0 & 0.5 & 0 & 0 & 0 \\
0 & 0 & 0 & 0.4 & 0 & 0 & 0.6 & 0 & 0 \\
0 & 0 & 0 & 0.3 & 0 & 0 & 0 & 0.7 & 0 \\
0 & 0 & 0 & 0.2 & 0 & 0 & 0 & 0 & 0.8 \\
0 & 0 & 0 & 0.1 & 0 & 0 & 0 & 0 & 0.9
\end{bmatrix}. \tag{2}$$

To solve the problem at hand, we divide the work into two functions; freethrowmat(n) returns the n step transition matrix and freethrowp(n) calculates the probability of a success on free throw n.

    function Pn=freethrowmat(n);
    P=[0.9 0 0 0 0 0.1 0 0 0;...
       0.8 0 0 0 0 0.2 0 0 0;...
       0 0.7 0 0 0 0.3 0 0 0;...
       0 0 0.6 0 0 0.4 0 0 0;...
       0 0 0 0.5 0 0.5 0 0 0;...
       0 0 0 0.4 0 0 0.6 0 0;...
       0 0 0 0.3 0 0 0 0.7 0;...
       0 0 0 0.2 0 0 0 0 0.8;...
       0 0 0 0.1 0 0 0 0 0.9];
    Pn=P^n;

    function ps=freethrowp(n);
    PP=freethrowmat(n-1);
    p0=[zeros(1,4) 1 ...
        zeros(1,4)];
    ps=p0*PP*0.1*(1:9)';

In freethrowp.m, p0 is the initial state probability row vector $\boldsymbol{\pi}'(0)$. Thus p0*PP is the state probability row vector after $n - 1$ free throws. Finally, p0*PP*0.1*(1:9)' multiplies the state probability vector by the conditional probability of a successful free throw given the current state. The answer to our problem is simply

    >> freethrowp(11)
    ans =
        0.5000
    >>

In retrospect, the calculations are unnecessary!
Because the system starts in state 0, symmetry of the Markov chain dictates that states $-k$ and $k$ will have the same probability at every time step. Because state $-k$ has success probability $0.5 - 0.1k$ while state $k$ has success probability $0.5 + 0.1k$, the conditional success probability given the system is in state $-k$ or $k$ is 0.5. Averaged over $k = 1, 2, 3, 4$, the average success probability is still 0.5.

Comment: Perhaps finding the stationary distribution is more interesting. This is done fairly easily:

    >> p=dmcstatprob(freethrowmat(1));
    >> p'
    ans =
        0.3123 0.0390 0.0558 0.0929 0 0.0929 0.0558 0.0390 0.3123

About sixty percent of the time the shooter has either made four or more consecutive free throws or missed four or more free throws. On the other hand, one can argue that in a basketball game, a shooter rarely gets to take more than a half dozen (or so) free throws, so perhaps the stationary distribution isn’t all that interesting.

Problem 12.11.2 Solution

In this problem, we model the system as a continuous time Markov chain. The clerk and the manager each represent a “server.” The state describes the number of customers in the queue and the number of active servers. The Markov chain is somewhat complicated because when the number of customers in the store is 2, 3, or 4, the number of servers may be 1 or may be 2, depending on whether the manager became an active server. When just the clerk is serving, the service rate is 1 customer per minute. When the manager and clerk are both serving, the rate is 2 customers per minute. Here is the Markov chain:

[Figure: continuous time Markov chain on states 0, 1, 2c, 3c, 4c, 2m, 3m, 4m, 5, 6, …; arrivals occur at rate $\lambda$; the service rate is 1 when only the clerk is working and 2 when the manager is also working.]

In states 2c, 3c and 4c, only the clerk is working. In states 2m, 3m and 4m, the manager is also working. The state space {0, 1, 2c, 3c, 4c, 2m, 3m, 4m, 5, 6, …} is countably infinite.
Finding the state probabilities is a little bit complicated because there are enough states that we would like to use Matlab; however, Matlab can only handle a finite state space. Fortunately, we can use Matlab because the state space for states $n \ge 5$ has a simple structure. We observe for $n \ge 5$ that the average rate of transitions from state $n$ to state $n + 1$ must equal the average rate of transitions from state $n + 1$ to state $n$, implying

$$\lambda p_n = 2p_{n+1}, \qquad n = 5, 6, \ldots \tag{1}$$

It follows that $p_{n+1} = (\lambda/2)p_n$ and that

$$p_n = \alpha^{n-5} p_5, \qquad n = 5, 6, \ldots, \tag{2}$$

where $\alpha = \lambda/2 < 1$. The requirement that the stationary probabilities sum to 1 implies

$$\begin{aligned}
1 &= p_0 + p_1 + \sum_{j=2}^{4}(p_{jc} + p_{jm}) + \sum_{n=5}^{\infty} p_n &\text{(3)} \\
&= p_0 + p_1 + \sum_{j=2}^{4}(p_{jc} + p_{jm}) + p_5\sum_{n=5}^{\infty}\alpha^{n-5} &\text{(4)} \\
&= p_0 + p_1 + \sum_{j=2}^{4}(p_{jc} + p_{jm}) + \frac{p_5}{1 - \alpha}. &\text{(5)}
\end{aligned}$$

This is convenient because for each state $j < 5$, we can solve for the stationary probabilities. In particular, we use Theorem 12.23 to write $\sum_i r_{ij} p_i = 0$. This leads to a set of matrix equations for the state probability vector

$$\mathbf{p} = \begin{bmatrix} p_0 & p_1 & p_{2c} & p_{3c} & p_{4c} & p_{2m} & p_{3m} & p_{4m} & p_5 \end{bmatrix}'. \tag{6}$$

The rate transition matrix associated with $\mathbf{p}$ is

$$\mathbf{Q} = \begin{array}{ccccccccc}
p_0 & p_1 & p_{2c} & p_{3c} & p_{4c} & p_{2m} & p_{3m} & p_{4m} & p_5 \\ \hline
0 & \lambda & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
1 & 0 & \lambda & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & \lambda & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & \lambda & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & \lambda \\
0 & 2 & 0 & 0 & 0 & 0 & \lambda & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 2 & 0 & \lambda & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 2 & 0 & \lambda \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 2 & 0
\end{array}, \tag{7}$$

where the first row just shows the correspondence of the state probabilities and the matrix columns. For each state $i$, excepting state 5, the departure rate $\nu_i$ from that state equals the sum of entries of the corresponding row of $\mathbf{Q}$. To find the stationary probabilities, our normal procedure is to use Theorem 12.23 and solve $\mathbf{p}'\mathbf{R} = \mathbf{0}'$ and $\mathbf{p}'\mathbf{1} = 1$, where $\mathbf{R}$ is the same as $\mathbf{Q}$ except the zero diagonal entries are replaced by $-\nu_i$. The equation $\mathbf{p}'\mathbf{1} = 1$ replaces one column of the set of matrix equations. This is the approach of cmcstatprob.m. In this problem, we follow almost the same procedure.
We form the matrix $\mathbf{R}$ by replacing the diagonal entries of $\mathbf{Q}$. However, instead of replacing an arbitrary column with the equation $\mathbf{p}'\mathbf{1} = 1$, we replace the column corresponding to $p_5$ with the equation

$$p_0 + p_1 + p_{2c} + p_{3c} + p_{4c} + p_{2m} + p_{3m} + p_{4m} + \frac{p_5}{1 - \alpha} = 1. \tag{8}$$

That is, we solve

$$\mathbf{p}'\mathbf{R} = \begin{bmatrix} 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \end{bmatrix}, \tag{9}$$

where

$$\mathbf{R} = \begin{bmatrix}
-\lambda & \lambda & 0 & 0 & 0 & 0 & 0 & 0 & 1 \\
1 & -1-\lambda & \lambda & 0 & 0 & 0 & 0 & 0 & 1 \\
0 & 1 & -1-\lambda & \lambda & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 1 & -1-\lambda & \lambda & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 1 & -1-\lambda & 0 & 0 & 0 & 1 \\
0 & 2 & 0 & 0 & 0 & -2-\lambda & \lambda & 0 & 1 \\
0 & 0 & 0 & 0 & 0 & 2 & -2-\lambda & \lambda & 1 \\
0 & 0 & 0 & 0 & 0 & 0 & 2 & -2-\lambda & 1 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 2 & \frac{1}{1-\alpha}
\end{bmatrix}. \tag{10}$$

Given the stationary distribution, we can now find $E[N]$ and $P[W]$. Recall that $N$ is the number of customers in the system at a time in the distant future. Defining

$$p_n = p_{nc} + p_{nm}, \qquad n = 2, 3, 4, \tag{11}$$

we can write

$$E[N] = \sum_{n=0}^{\infty} n p_n = \sum_{n=0}^{4} n p_n + \sum_{n=5}^{\infty} n p_5 \alpha^{n-5}. \tag{12}$$

The substitution $k = n - 5$ yields

$$\begin{aligned}
E[N] &= \sum_{n=0}^{4} n p_n + p_5\sum_{k=0}^{\infty}(k + 5)\alpha^k &\text{(13)} \\
&= \sum_{n=0}^{4} n p_n + p_5\frac{5}{1 - \alpha} + p_5\sum_{k=0}^{\infty} k\alpha^k. &\text{(14)}
\end{aligned}$$

From Math Fact B.7, we conclude that

$$\begin{aligned}
E[N] &= \sum_{n=0}^{4} n p_n + p_5\left[\frac{5}{1 - \alpha} + \frac{\alpha}{(1 - \alpha)^2}\right] &\text{(15)} \\
&= \sum_{n=0}^{4} n p_n + p_5\frac{5 - 4\alpha}{(1 - \alpha)^2}. &\text{(16)}
\end{aligned}$$

Furthermore, the manager is working unless the system is in state 0, 1, 2c, 3c, or 4c. Thus

$$P[W] = 1 - (p_0 + p_1 + p_{2c} + p_{3c} + p_{4c}). \tag{17}$$

We implement these equations in the following program, alongside the corresponding output.

    function [EN,PW]=clerks(lam);
    Q=diag(lam*[1 1 1 1 0 1 1 1],1);
    Q=Q+diag([1 1 1 1 0 2 2 2],-1);
    Q(6,2)=2; Q(5,9)=lam;
    R=Q-diag(sum(Q,2));
    n=size(Q,1);
    a=lam/2;
    R(:,n)=[ones(1,n-1) 1/(1-a)]';
    pv=([zeros(1,n-1) 1]*R^(-1));
    EN=pv*[0;1;2;3;4;2;3;4; ...
        (5-4*a)/(1-a)^2];
    PW=1-sum(pv(1:5));

    >> [en05,pw05]=clerks(0.5)
    en05 =
        0.8217
    pw05 =
        0.0233
    >> [en10,pw10]=clerks(1.0)
    en10 =
        2.1111
    pw10 =
        0.2222
    >> [en15,pw15]=clerks(1.5)
    en15 =
        4.5036
    pw15 =
        0.5772
    >>

We see that in going from an arrival rate of 0.5 customers per minute to 1.5 customers per minute, the average number of customers goes from 0.82 to 4.5 customers.
Similarly, the probability the manager is working rises from 0.02 to 0.58.

Problem 12.11.3 Solution

Although the inventory system in this problem is relatively simple, the performance analysis is surprisingly complicated. We can model the system as a Markov chain with state $X_n$ equal to the number of brake pads in stock at the start of day $n$. Note that the range of $X_n$ is $S_X = \{50, 51, \ldots, 109\}$. To evaluate the system, we need to find the state transition matrix for the Markov chain. We express the transition probabilities in terms of $P_K(\cdot)$, the PMF of the number of brake pads ordered on an arbitrary day. In state $i$, there are two possibilities:

• If $50 \le i \le 59$, then there will be $\min(i, K)$ brake pads sold. At the end of the day, the number of pads remaining is less than 60, and so 50 more pads are delivered overnight. Thus the next state is $j = 50$ if $K \ge i$ pads are ordered, $i$ pads are sold and 50 pads are delivered overnight. On the other hand, if there are $K < i$ orders, then the next state is $j = i - K + 50$. In this case,

$$P_{ij} = \begin{cases} P[K \ge i] & j = 50, \\ P_K(50 + i - j) & j = 51, 52, \ldots, 50 + i. \end{cases} \tag{1}$$

• If $60 \le i \le 109$, then there are several subcases:

– $j = 50$: If there are $K \ge i$ orders, then all $i$ pads are sold, 50 pads are delivered overnight, and the next state is $j = 50$. Thus

$$P_{ij} = P[K \ge i], \qquad j = 50. \tag{2}$$

– $51 \le j \le 59$: If $50 + i - j$ pads are sold, then $j - 50$ pads are left at the end of the day. In this case, 50 pads are delivered overnight, and the next state is $j$ with probability

$$P_{ij} = P_K(50 + i - j), \qquad j = 51, 52, \ldots, 59. \tag{3}$$

– $60 \le j \le i$: If there are $K = i - j$ pads ordered, then there will be $j \ge 60$ pads at the end of the day and the next state is $j$. On the other hand, if $K = 50 + i - j$ pads are ordered, then there will be $i - (50 + i - j) = j - 50$ pads at the end of the day. Since $60 \le j \le 109$, $10 \le j - 50 \le 59$, there will be 50 pads delivered overnight and the next state will be $j$. Thus

$$P_{ij} = P_K(i - j) + P_K(50 + i - j), \qquad j = 60, 61, \ldots, i.$$
(4)

– For $i < j \le 109$, state $j$ can be reached from state $i$ if there are $50 + i - j$ orders, leaving $i - (50 + i - j) = j - 50$ in stock at the end of the day. This implies 50 pads are delivered overnight and the next state is $j$. The probability of this event is

$$P_{ij} = P_K(50 + i - j), \qquad j = i + 1, i + 2, \ldots, 109. \tag{5}$$

We can summarize these observations in this set of state transition probabilities:

$$P_{ij} = \begin{cases}
P[K \ge i] & 50 \le i \le 109,\ j = 50, \\
P_K(50 + i - j) & 50 \le i \le 59,\ 51 \le j \le 50 + i, \\
P_K(50 + i - j) & 60 \le i \le 109,\ 51 \le j \le 59, \\
P_K(i - j) + P_K(50 + i - j) & 60 \le i \le 109,\ 60 \le j \le i, \\
P_K(50 + i - j) & 60 \le i \le 108,\ i + 1 \le j \le 109, \\
0 & \text{otherwise.}
\end{cases} \tag{6}$$

Note that the “0 otherwise” rule comes into effect when $50 \le i \le 59$ and $j > 50 + i$. To simplify these rules, we observe that $P_K(k) = 0$ for $k < 0$. This implies $P_K(50 + i - j) = 0$ for $j > 50 + i$. In addition, for $j > i$, $P_K(i - j) = 0$. These facts imply that we can write the state transition probabilities in the simpler form:

$$P_{ij} = \begin{cases}
P[K \ge i] & 50 \le i \le 109,\ j = 50, \\
P_K(50 + i - j) & 50 \le i \le 59,\ 51 \le j, \\
P_K(50 + i - j) & 60 \le i \le 109,\ 51 \le j \le 59, \\
P_K(i - j) + P_K(50 + i - j) & 60 \le i \le 109,\ 60 \le j.
\end{cases} \tag{7}$$

Finally, we make the definitions

$$\beta_i = P[K \ge i], \qquad \gamma_k = P_K(50 + k), \qquad \delta_k = P_K(k) + P_K(50 + k). \tag{8}$$

With these definitions, the state transition probabilities are

$$P_{ij} = \begin{cases}
\beta_i & 50 \le i \le 109,\ j = 50, \\
\gamma_{i-j} & 50 \le i \le 59,\ 51 \le j, \\
\gamma_{i-j} & 60 \le i \le 109,\ 51 \le j \le 59, \\
\delta_{i-j} & 60 \le i \le 109,\ 60 \le j.
\end{cases} \tag{9}$$

Expressed as a table, the state transition matrix $\mathbf{P}$ is

$$\begin{array}{c|ccccccc}
i \backslash j & 50 & 51 & \cdots & 59 & 60 & \cdots & 109 \\ \hline
50 & \beta_{50} & \gamma_{-1} & \cdots & \gamma_{-9} & \gamma_{-10} & \cdots & \gamma_{-59} \\
51 & \beta_{51} & \gamma_{0} & \cdots & \gamma_{-8} & \gamma_{-9} & \cdots & \gamma_{-58} \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\
59 & \beta_{59} & \gamma_{8} & \cdots & \gamma_{0} & \gamma_{-1} & \cdots & \gamma_{-50} \\
60 & \beta_{60} & \gamma_{9} & \cdots & \gamma_{1} & \delta_{0} & \cdots & \delta_{-49} \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\
109 & \beta_{109} & \gamma_{58} & \cdots & \gamma_{50} & \delta_{49} & \cdots & \delta_{0}
\end{array} \tag{10}$$

In terms of Matlab, all we need to do is to encode the matrix $\mathbf{P}$, calculate the stationary probability vector $\boldsymbol{\pi}$, and then calculate $E[Y]$, the expected number of pads sold on a typical day. To calculate $E[Y]$, we use iterated expectation. The number of pads ordered is the Poisson random variable $K$. We assume that on a day $n$ that $X_n = i$ and we calculate the conditional expectation

$$E[Y \mid X_n = i] = E[\min(K, i)] = \sum_{j=0}^{i-1} j P_K(j) + i P[K \ge i]. \tag{11}$$

Since only states $i \ge 50$ are possible, we can write

$$E[Y \mid X_n = i] = \sum_{j=0}^{48} j P_K(j) + \sum_{j=49}^{i-1} j P_K(j) + i P[K \ge i]. \tag{12}$$

Finally, we assume that on a typical day $n$, the state of the system $X_n$ is described by the stationary probabilities $P[X_n = i] = \pi_i$ and we calculate

$$E[Y] = \sum_{i=50}^{109} E[Y \mid X_n = i]\, \pi_i. \tag{13}$$

These calculations are given in this Matlab program:

    function [pstat,ey]=brakepads(alpha);
    s=(50:109)';
    beta=1-poissoncdf(alpha,s-1);
    grow=poissonpmf(alpha,50+(-1:-1:-59));
    gcol=poissonpmf(alpha,50+(-1:58));
    drow=poissonpmf(alpha,0:-1:-49);
    dcol=poissonpmf(alpha,0:49);
    P=[beta,toeplitz(gcol,grow)];
    P(11:60,11:60)=P(11:60,11:60)...
        +toeplitz(dcol,drow);
    pstat=dmcstatprob(P);
    [I,J]=ndgrid(49:108,49:108);
    G=J.*(I>=J);
    EYX=(G*gcol)+(s.*beta);
    pk=poissonpmf(alpha,0:48);
    EYX=EYX+(0:48)*pk;
    ey=(EYX')*pstat;

The first half of brakepads.m constructs $\mathbf{P}$ to calculate the stationary probabilities. The first column of $\mathbf{P}$ is just the vector

$$\texttt{beta} = \begin{bmatrix} \beta_{50} & \cdots & \beta_{109} \end{bmatrix}'. \tag{14}$$

The rest of $\mathbf{P}$ is easy to construct using the toeplitz function. We first build an asymmetric Toeplitz matrix with first row and first column

$$\texttt{grow} = \begin{bmatrix} \gamma_{-1} & \gamma_{-2} & \cdots & \gamma_{-59} \end{bmatrix}, \tag{15}$$

$$\texttt{gcol} = \begin{bmatrix} \gamma_{-1} & \gamma_{0} & \cdots & \gamma_{58} \end{bmatrix}'. \tag{16}$$

Note that $\delta_k = P_K(k) + \gamma_k$. Thus, to construct the Toeplitz matrix in the lower right corner of $\mathbf{P}$, we simply add the Toeplitz matrix corresponding to the missing $P_K(k)$ term. The second half of brakepads.m calculates $E[Y]$ using the iterated expectation.
Note that

$$\texttt{EYX} = \begin{bmatrix} E[Y \mid X_n = 50] & \cdots & E[Y \mid X_n = 109] \end{bmatrix}'. \tag{17}$$

The somewhat convoluted code becomes clearer by noting the following correspondences:

$$E[Y \mid X_n = i] = \underbrace{\sum_{j=0}^{48} j P_K(j)}_{\texttt{(0:48)*pk}} + \underbrace{\sum_{j=49}^{i-1} j P_K(j)}_{\texttt{G*gcol}} + \underbrace{i P[K \ge i]}_{\texttt{s.*beta}}. \tag{18}$$

To find $E[Y]$, we execute the commands:

    >> [ps,ey]=brakepads(50);
    >> ey
    ey =
        49.4154
    >>

We see that although the store receives 50 orders for brake pads on average, the average number sold is only 49.42 because once in a while the pads are out of stock. Some experimentation will show that if the expected number of orders $\alpha$ is significantly less than 50, then the expected number of brake pads sold each day is very close to $\alpha$. On the other hand, if $\alpha \gg 50$, then each day the store will run out of pads and will get a delivery of 50 pads each night. The expected number of unfulfilled orders will be very close to $\alpha - 50$. Note that a new inventory policy in which the overnight delivery is more than 50 pads or the threshold for getting a shipment is more than 60 will reduce the expected number of unfulfilled orders. Whether such a change in policy is a good idea depends on factors such as the carrying cost of inventory that are absent from our simple model.

Problem 12.11.4 Solution

This problem is actually very easy. The state of the system is given by $X$, the number of cars in the system. When $X = 0$, both tellers are idle. When $X = 1$, one teller is busy; however, we do not need to keep track of which teller is busy. When $X = n \ge 2$, both tellers are busy and there are $n - 2$ cars waiting. Here is the Markov chain:

[Figure: continuous time Markov chain on states 0, 1, 2, …, 7, 8; each state moves up at rate $\lambda$; the departure rate is 1 from state 1 and 2 from states 2 through 8.]

Since this is a birth death process, we could easily solve this problem using analysis.
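The analysis mentioned above can be sketched in a few lines of Python (an illustration, not the manual's code). For a birth death process, the cut equations give $p_n = (\lambda/\mu_n)\,p_{n-1}$, where $\mu_n$ is the departure rate from state $n$. The function name, the default capacity of 8 cars and the service rates (1 in state 1, 2 in states 2 and above) are assumptions read off the chain for this problem.

```python
def two_teller_stationary(lam, capacity=8):
    """Stationary probabilities and mean number of cars for the two-teller chain.

    Assumes arrival rate lam in every state below capacity, departure rate 1
    in state 1 and departure rate 2 in states 2, ..., capacity.
    """
    weights = [1.0]  # unnormalized p_0
    for n in range(1, capacity + 1):
        service_rate = 1.0 if n == 1 else 2.0
        # Cut between states n - 1 and n: p_{n-1} * lam = p_n * service_rate.
        weights.append(weights[-1] * lam / service_rate)
    total = sum(weights)
    p = [w / total for w in weights]
    expected_cars = sum(n * pn for n, pn in enumerate(p))
    return p, expected_cars


p, en = two_teller_stationary(0.75)
print([round(x, 4) for x in p], round(en, 4))
```

For an arrival rate of 0.75 the values can be compared with the output reported for this problem, which begins with $p_0 = 0.4546$ and gives an expected number of cars of 0.8709.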
However, as this problem is in the Matlab section of this chapter, we might as well construct a Matlab solution:

    function [p,en]=veryfast2(lambda);
    c=2*[0,eye(1,8)]';
    r=lambda*[0,eye(1,8)];
    Q=toeplitz(c,r);
    Q(2,1)=1;
    p=cmcstatprob(Q);
    en=(0:8)*p;

The code solves for the stationary distribution and the expected number of cars in the system for an arbitrary arrival rate $\lambda$. Here is the output:

    >> [p,en]=veryfast2(0.75);
    >> p'
    ans =
        0.4546 0.3410 0.1279 0.0480 0.0180 0.0067 0.0025 0.0009 0.0004
    >> en
    en =
        0.8709
    >>

Problem 12.11.5 Solution

Under construction.

Problem 12.11.6 Solution

Under construction.

Problem 12.11.7 Solution

Under construction.

Problem 12.11.8 Solution

Under construction.

Problem 12.11.9 Solution

Under construction.

Problem Solutions – Chapter 1

Problem 1.1.1 Solution

Based on the Venn diagram

[Figure: Venn diagram of the events M, O and T.]

the answers are fairly straightforward:

(a) Since $T \cap M \ne \phi$, $T$ and $M$ are not mutually exclusive.

(b) Every pizza is either Regular ($R$), or Tuscan ($T$). Hence $R \cup T = S$ so that $R$ and $T$ are collectively exhaustive. Thus it's also (trivially) true that $R \cup T \cup M = S$. That is, $R$, $T$ and $M$ are also collectively exhaustive.

(c) From the Venn diagram, $T$ and $O$ are mutually exclusive. In words, this means that Tuscan pizzas never have onions or pizzas with onions are never Tuscan. As an aside, “Tuscan” is a fake pizza designation; one shouldn’t conclude that people from Tuscany actually dislike onions.

(d) From the Venn diagram, $M \cap T$ and $O$ are mutually exclusive. Thus Gerlanda’s doesn’t make Tuscan pizza with mushrooms and onions.

(e) Yes. In terms of the Venn diagram, these pizzas are in the set $(T \cup M \cup O)^c$.
Problem 1.1.2 Solution

Based on the Venn diagram,

[Figure: Venn diagram of the events M, O and T.]

the complete Gerlanda's pizza menu is

• Regular without toppings
• Regular with mushrooms
• Regular with onions
• Regular with mushrooms and onions
• Tuscan without toppings
• Tuscan with mushrooms

Problem 1.2.1 Solution

(a) An outcome specifies whether the fax is high ($h$), medium ($m$), or low ($l$) speed, and whether the fax has two ($t$) pages or four ($f$) pages. The sample space is

$$S = \{ht, hf, mt, mf, lt, lf\}. \tag{1}$$

(b) The event that the fax is medium speed is $A_1 = \{mt, mf\}$.

(c) The event that a fax has two pages is $A_2 = \{ht, mt, lt\}$.

(d) The event that a fax is either high speed or low speed is $A_3 = \{ht, hf, lt, lf\}$.

(e) Since $A_1 \cap A_2 = \{mt\}$ and is not empty, $A_1$, $A_2$, and $A_3$ are not mutually exclusive.

(f) Since

$$A_1 \cup A_2 \cup A_3 = \{ht, hf, mt, mf, lt, lf\} = S, \tag{2}$$

the collection $A_1$, $A_2$, $A_3$ is collectively exhaustive.

Problem 1.2.2 Solution

(a) The sample space of the experiment is

$$S = \{aaa, aaf, afa, faa, ffa, faf, aff, fff\}. \tag{1}$$

(b) The event that the circuit from $Z$ fails is

$$Z_F = \{aaf, aff, faf, fff\}. \tag{2}$$

The event that the circuit from $X$ is acceptable is

$$X_A = \{aaa, aaf, afa, aff\}. \tag{3}$$

(c) Since $Z_F \cap X_A = \{aaf, aff\} \ne \phi$, $Z_F$ and $X_A$ are not mutually exclusive.

(d) Since $Z_F \cup X_A = \{aaa, aaf, afa, aff, faf, fff\} \ne S$, $Z_F$ and $X_A$ are not collectively exhaustive.

(e) The event that more than one circuit is acceptable is

$$C = \{aaa, aaf, afa, faa\}. \tag{4}$$

The event that at least two circuits fail is

$$D = \{ffa, faf, aff, fff\}. \tag{5}$$

(f) Inspection shows that $C \cap D = \phi$ so $C$ and $D$ are mutually exclusive.

(g) Since $C \cup D = S$, $C$ and $D$ are collectively exhaustive.

Problem 1.2.3 Solution

The sample space is

$$S = \{A♣, \ldots, K♣, A♦, \ldots, K♦, A♥, \ldots, K♥, A♠, \ldots, K♠\}. \tag{1}$$

The event $H$ is the set

$$H = \{A♥, \ldots, K♥\}. \tag{2}$$

Problem 1.2.4 Solution

The sample space is

$$S = \left\{\begin{array}{l}
1/1 \ldots 1/31,\ 2/1 \ldots 2/29,\ 3/1 \ldots 3/31,\ 4/1 \ldots 4/30, \\
5/1 \ldots 5/31,\ 6/1 \ldots 6/30,\ 7/1 \ldots 7/31,\ 8/1 \ldots 8/31, \\
9/1 \ldots 9/31,\ 10/1 \ldots 10/31,\ 11/1 \ldots 11/30,\ 12/1 \ldots 12/31
\end{array}\right\}. \tag{1}$$

The event $H$ defined by the event of a July birthday is described by the following 31 sample points.

$$H = \{7/1, 7/2, \ldots, 7/31\}. \tag{2}$$

Problem 1.2.5 Solution

Of course, there are many answers to this problem. Here are four event spaces.

1. We can divide students into engineers or non-engineers. Let $A_1$ equal the set of engineering students and $A_2$ the non-engineers. The pair $\{A_1, A_2\}$ is an event space.

2. We can also separate students by GPA. Let $B_i$ denote the subset of students with GPAs $G$ satisfying $i - 1 \le G < i$. At Rutgers, $\{B_1, B_2, \ldots, B_5\}$ is an event space. Note that $B_5$ is the set of all students with perfect 4.0 GPAs. Of course, other schools use different scales for GPA.

3. We can also divide the students by age. Let $C_i$ denote the subset of students of age $i$ in years. At most universities, $\{C_{10}, C_{11}, \ldots, C_{100}\}$ would be an event space. Since a university may have prodigies either under 10 or over 100, we note that $\{C_0, C_1, \ldots\}$ is always an event space.

4. Lastly, we can categorize students by attendance. Let $D_0$ denote the set of students who have missed zero lectures and let $D_1$ denote all other students. Although it is likely that $D_0$ is an empty set, $\{D_0, D_1\}$ is a well defined event space.

Problem 1.2.6 Solution

Let $R_1$ and $R_2$ denote the measured resistances. The pair $(R_1, R_2)$ is an outcome of the experiment. Some event spaces include

1. If we need to check that neither resistance is too high, an event space is

$$A_1 = \{R_1 < 100, R_2 < 100\}, \qquad A_2 = \{\text{either } R_1 \ge 100 \text{ or } R_2 \ge 100\}. \tag{1}$$

2. If we need to check whether the first resistance exceeds the second resistance, an event space is

$$B_1 = \{R_1 > R_2\}, \qquad B_2 = \{R_1 \le R_2\}. \tag{2}$$

3.
If we need to check whether each resistance doesn’t fall below a minimum value (in this case 50 ohms for R1 and 100 ohms for R2 ), an event space is C3 = {R1 ≥ 50, R2 < 100} , C1 = {R1 < 50, R2 < 100} , C2 = {R1 < 50, R2 ≥ 100} , C4 = {R1 ≥ 50, R2 ≥ 100} . (3) (4) 4. If we want to check whether the resistors in parallel are within an acceptable range of 90 to 110 ohms, an event space is D1 = (1/R1 + 1/R2 )−1 < 90 , D2 = 110 < (1/R1 + 1/R2 ) D2 = 90 ≤ (1/R1 + 1/R2 ) −1 −1 (5) (6) (7) . ≤ 110 , 4 Problem 1.3.1 Solution The sample space of the experiment is S = {LF, BF, LW, BW } . (1) From the problem statement, we know that P [LF ] = 0.5, P [BF ] = 0.2 and P [BW ] = 0.2. This implies P [LW ] = 1 − 0.5 − 0.2 − 0.2 = 0.1. The questions can be answered using Theorem 1.5. (a) The probability that a program is slow is P [W ] = P [LW ] + P [BW ] = 0.1 + 0.2 = 0.3. (b) The probability that a program is big is P [B] = P [BF ] + P [BW ] = 0.2 + 0.2 = 0.4. (c) The probability that a program is slow or big is P [W ∪ B] = P [W ] + P [B] − P [BW ] = 0.3 + 0.4 − 0.2 = 0.5. (4) (3) (2) Problem 1.3.2 Solution A sample outcome indicates whether the cell phone is handheld (H) or mobile (M ) and whether the speed is fast (F ) or slow (W ). The sample space is S = {HF, HW, M F, M W } . (1) The problem statement tells us that P [HF ] = 0.2, P [M W ] = 0.1 and P [F ] = 0.5. We can use these facts to find the probabilities of the other outcomes. In particular, P [F ] = P [HF ] + P [M F ] . This implies Also, since the probabilities must sum to 1, P [M F ] = P [F ] − P [HF ] = 0.5 − 0.2 = 0.3. (3) (2) P [HW ] = 1 − P [HF ] − P [M F ] − P [M W ] = 1 − 0.2 − 0.3 − 0.1 = 0.4. (a) The probability a cell phone is slow is P [W ] = P [HW ] + P [M W ] = 0.4 + 0.1 = 0.5. (b) The probability that a cell hpone is mobile and fast is P [M F ] = 0.3. (c) The probability that a cell phone is handheld is P [H] = P [HF ] + P [HW ] = 0.2 + 0.4 = 0.6. 
(4) Now that we have found the probabilities of the outcomes, finding any other probability is easy. (5) (6) 5 A reasonable probability model that is consistent with the notion of a shuffled deck is that each card in the deck is equally likely to be the first card. Let Hi denote the event that the first card drawn is the ith heart where the first heart is the ace, the second heart is the deuce and so on. In that case, P [Hi ] = 1/52 for 1 ≤ i ≤ 13. The event H that the first card is a heart can be written as the disjoint union (1) H = H1 ∪ H2 ∪ · · · ∪ H13 . Using Theorem 1.1, we have 13 Problem 1.3.3 Solution P [H] = i=1 P [Hi ] = 13/52. (2) This is the answer you would expect since 13 out of 52 cards are hearts. The point to keep in mind is that this is not just the common sense answer but is the result of a probability model for a shuffled deck and the axioms of probability. Problem 1.3.4 Solution Let si denote the outcome that the down face has i dots. The sample space is S = {s1 , . . . , s6 }. The probability of each sample outcome is P [si ] = 1/6. From Theorem 1.1, the probability of the event E that the roll is even is P [E] = P [s2 ] + P [s4 ] + P [s6 ] = 3/6. (1) Let si equal the outcome of the student’s quiz. The sample space is then composed of all the possible grades that she can receive. S = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10} . (1) Problem 1.3.5 Solution Since each of the 11 possible outcomes is equally likely, the probability of receiving a grade of i, for each i = 0, 1, . . . , 10 is P [si ] = 1/11. The probability that the student gets an A is the probability that she gets a score of 9 or higher. That is P [Grade of A] = P [9] + P [10] = 1/11 + 1/11 = 2/11. The probability of failing requires the student to get a grade less than 4. P [Failing] = P [3] + P [2] + P [1] + P [0] = 1/11 + 1/11 + 1/11 + 1/11 = 4/11. 
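The solutions to Problems 1.3.3 through 1.3.5 all add up the probabilities of the equally likely outcomes that make up an event. The following Python sketch is not part of the original manual; the helper name event_probability and the dictionary layout are assumptions made for illustration. It reproduces those sums with exact fractions.

```python
from fractions import Fraction

def event_probability(outcome_probs, event):
    """Sum the probabilities of the outcomes that belong to the event."""
    return sum((outcome_probs[s] for s in event), Fraction(0))

# Problem 1.3.3: 52 equally likely first cards; the event is "first card is a heart".
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = {(rank, suit): Fraction(1, 52) for rank in range(1, 14) for suit in suits}
p_heart = event_probability(deck, {(rank, "hearts") for rank in range(1, 14)})

# Problem 1.3.4: six equally likely faces; the event is "roll is even".
die = {i: Fraction(1, 6) for i in range(1, 7)}
p_even = event_probability(die, {2, 4, 6})

# Problem 1.3.5: grades 0..10, each with probability 1/11.
grades = {g: Fraction(1, 11) for g in range(11)}
p_a = event_probability(grades, {9, 10})
p_fail = event_probability(grades, {0, 1, 2, 3})

print(p_heart, p_even, p_a, p_fail)  # 1/4 1/2 2/11 4/11
```

Using Fraction rather than floating point keeps the results in the same form as the hand calculations (13/52 reduces to 1/4 and 3/6 to 1/2).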
Problem 1.4.1 Solution

From the table we look to add all the disjoint events that contain H0 to express the probability that a caller makes no hand-offs as

    P[H0] = P[LH0] + P[BH0] = 0.1 + 0.4 = 0.5.

In a similar fashion we can express the probability that a call is brief by

    P[B] = P[BH0] + P[BH1] + P[BH2] = 0.4 + 0.1 + 0.1 = 0.6.

The probability that a call is long or makes at least two hand-offs is

    P[L ∪ H2] = P[LH0] + P[LH1] + P[LH2] + P[BH2] = 0.1 + 0.1 + 0.2 + 0.1 = 0.5.

Problem 1.4.2 Solution

(a) From the given probability distribution of billed minutes, M, the probability that a call is billed for more than 3 minutes is

    P[L] = 1 − P[3 or fewer billed minutes]
         = 1 − P[B1] − P[B2] − P[B3]
         = 1 − α − α(1 − α) − α(1 − α)^2
         = (1 − α)^3 = 0.57.

(b) The probability that a call will be billed for 9 minutes or less is

    P[9 minutes or less] = Σ_{i=1}^{9} α(1 − α)^(i−1) = 1 − (1 − α)^9 = 1 − (0.57)^3.

Problem 1.4.3 Solution

The first generation consists of two plants each with genotype yg or gy. They are crossed to produce the following second generation genotypes, S = {yy, yg, gy, gg}. Each genotype is just as likely as any other so the probability of each genotype is consequently 1/4. A pea plant has yellow seeds if it possesses at least one dominant y gene. The set of pea plants with yellow seeds is

    Y = {yy, yg, gy}.

So the probability of a pea plant with yellow seeds is

    P[Y] = P[yy] + P[yg] + P[gy] = 3/4.

Problem 1.4.4 Solution

Each statement is a consequence of part 4 of Theorem 1.4.

(a) Since A ⊂ A ∪ B, P[A] ≤ P[A ∪ B].

(b) Since B ⊂ A ∪ B, P[B] ≤ P[A ∪ B].

(c) Since A ∩ B ⊂ A, P[A ∩ B] ≤ P[A].

(d) Since A ∩ B ⊂ B, P[A ∩ B] ≤ P[B].

Problem 1.4.5 Solution

Specifically, we will use Theorem 1.7(c) which states that for any events A and B,

    P[A ∪ B] = P[A] + P[B] − P[A ∩ B].

To prove the union bound by induction, we first prove the theorem for the case of n = 2 events.
In this case, by Theorem 1.7(c), P [A1 ∪ A2 ] = P [A1 ] + P [A2 ] − P [A1 ∩ A2 ] . 7 (2) By the first axiom of probability, P [A1 ∩ A2 ] ≥ 0. Thus, P [A1 ∪ A2 ] ≤ P [A1 ] + P [A2 ] . (3) which proves the union bound for the case n = 2. Now we make our induction hypothesis that the union-bound holds for any collection of n − 1 subsets. In this case, given subsets A1 , . . . , An , we define B = An . (4) A = A1 ∪ A2 ∪ · · · ∪ An−1 , By our induction hypothesis, P [A] = P [A1 ∪ A2 ∪ · · · ∪ An−1 ] ≤ P [A1 ] + · · · + P [An−1 ] . This permits us to write P [A1 ∪ · · · ∪ An ] = P [A ∪ B] (6) (by the union bound for n = 2) (7) (8) (9) (5) ≤ P [A] + P [B] ≤ P [A1 ] + · · · P [An−1 ] + P [An ] which completes the inductive proof. = P [A1 ∪ · · · ∪ An−1 ] + P [An ] Problem 1.4.6 Solution (a) For convenience, let pi = P [F Hi ] and qi = P [V Hi ]. Using this shorthand, the six unknowns p0 , p1 , p2 , q0 , q1 , q2 fill the table as F V However, we are given a number of facts: p0 + q0 = 1/3, p2 + q2 = 1/3, p1 + q1 = 1/3, p0 + p1 + p2 = 5/12. (2) (3) H0 H1 H2 p0 p1 p2 . q0 q1 q2 (1) Other facts, such as q0 + q1 + q2 = 7/12, can be derived from these facts. Thus, we have four equations and six unknowns, choosing p0 and p1 will specify the other unknowns. Unfortunately, arbitrary choices for either p0 or p1 will lead to negative values for the other probabilities. In terms of p0 and p1 , the other unknowns are q0 = 1/3 − p0 , q1 = 1/3 − p1 , p2 = 5/12 − (p0 + p1 ), q2 = p0 + p1 − 1/12. (4) (5) Because the probabilities must be nonnegative, we see that 0 ≤ p0 ≤ 1/3, 1/12 ≤ p0 + p1 ≤ 5/12. 8 0 ≤ p1 ≤ 1/3, (6) (7) (8) In this case.4. q2 = 0. q1 = 3/12. p2 = 0. the fact that A2 = φ permits us to write P [φ] + By Axiom 1. (3) By subtracting P [A1 ] from both sides. and since S = S ∪ φ. p1 = 1/12. (2) (1) The problem is that this property is a consequence of the three axioms. we can use Axiom 3 to write i=1 P [A1 ] = P [∪∞ Ai ] i=1 = P [A1 ] + P [A2 ] + ∞ i=3 P [Ai ] . q0 = 1/12. 
qi notation. let A1 be an arbitrary set and for n = 2. q0 = 1/12. The above “proof” used the property that for mutually exclusive sets A1 and A2 .7 Solution It is tempting to use the following proof: Since S and φ are mutually exclusive. P [Ai ] ≥ 0 for all i. . P [A1 ∪ A2 ] = P [A1 ] + P [A2 ] . q0 = 1/3. q0 = 0. three possible solutions are: p0 = 1/3. p1 = 1/6. 1 = P [S ∪ φ] = P [S] + P [φ] .Although there are an infinite number of solutions. p2 = 1/3. (15) (16) Problem 1. Since Axiom 1 9 . q2 = 1/3. Thus. 3. we must have P [φ] = 0. let An = φ. For a proof that uses just the three axioms. q1 = 1/6. This implies P [φ] ≤ 0.. q1 = 3/12. (13) (14) p1 = 1/12. and p0 = 0. and thus must be proven. (11) (12) p1 = 1/12. ∞ n=3 ∞ n=3 P [Ai ] P [Ai ] = 0. and p0 = 1/4. (4) ≥ 0. requires P [φ] ≥ 0. p2 = 1/12. p2 = 0. Since P [S] = 1. (9) (10) (b) In terms of the pi . q2 = 1/3. q2 = 3/12. These extra facts uniquely specify the probabilities. the new facts are p0 = 1/4 and q1 = 1/6. p0 = 1/4. . Since A1 = ∪∞ Ai . we must have P [φ] = 0. q1 = 1/4. . (4) Thus. .4: P [∪m Bi ] = i=1 Problem 1. Am+1 . Am+1 . P [B1 ∪ B2 ∪ · · · ∪ Bm ] = P [A1 ∪ A2 ∪ · · ·] m−1 i=1 m−1 (1) P [Ai ] P [Ai ] .4. P [Ai ] = P [φ] = 0. P [φ] = 0. yielding the claim P [∪m Bi ] = m P [Ai ] = m P [Bi ]. (1) For i > m. m i=1 P [Bi ]. . we use Axiom 3 again on the countably infinite sequence Am .4. by Axiom 3. let B1 = S and let B2 = φ. P [S] = P [B1 ∪ B2 ] = P [B1 ] + P [B2 ] = P [S] + P [φ] . In that case. we have used just Axiom 3 to prove Theorem 1. .4.Following the hint. Thus.4: m P [B1 ∪ B2 ∪ · · · ∪ Bm ] = P [Bi ] . . . (2) (3) = = i=1 P [Ai ] + P [Bi ] + ∞ i=m ∞ i=m Now. i=1 (5) (a) To show P [φ] = 0. By construction. .} such that i = 1. (2) Now. m and let Ai = φ for i > m. We start by observing m−1 P [∪m Bi ] = i=1 i=1 P [Bi ] + ∞ i=m P [Ai ] . .8 Solution P [Ai ] . to write ∞ i=m P [Ai ] = P [Am ∪ Am+1 ∪ · · ·] = P [Bm ] . . m. 2. we have used just Axiom 3 to prove Theorem 1. 10 (6) . . 
the proof of each part will need Theorem 1. (3) Thus. . . .4 which uses only Axiom 3.9 Solution Each claim in Theorem 1. . . Bm . i=1 i=1 i=1 Note that the fact that P [φ] = 0 follows from Axioms 1 and 2.4 which we now prove. . Axiom 3 then implies i=1 i=1 P [∪m Bi ] = P [∪∞ Ai ] = i=1 i=1 ∞ i=1 Problem 1. . ∪m Bi = ∪∞ Ai . . Still. we define the set of events {Ai |i = 1. let Ai = Bi for i = 1. . . However. Ai = Bi and for i > m. Thus by Theorem 1.7 requires a proof from which we can check which axioms are used. For the mutually exclusive events B1 . This problem is more challenging if you just use Axiom 3. we use Axiom 3 again on Am . Note that this proof uses only Theorem 1. . to write ∞ i=m P [Ai ] = P [Am ∪ Am+1 ∪ · · ·] = P [Bm ] . . the problem is somewhat hard because there may still be a simpler proof that uses fewer axioms. Ai = φ. By Theorem 1. Note that so far we have used only Axiom 3.6. (4) P [L] P [L] 0. Axiom 2 says P [S] = 1. P [A ∪ B] = P [AB] + P [AB c ] + P [Ac B] (d) Observe that since A ⊂ B.4 2 P [H0 B] = = .2 3 P [H1 L ∪ H2 L] P [H1 ∪ H2 |L] = = = = . P [H1 ] 0. we have A ∪ B = (AB) ∪ (AB c ) ∪ (Ac B).1 1 P [H1 L] = = . P [B] 0. we can write both A and B as unions of disjoint events: Now we apply Theorem 1. (7) (c) By Theorem 1. using Theorem 1. The probability that a long call will have one or more handoffs is P [H1 L] + P [H2 L] 0. The probability a brief call will have no handoffs is P [H0 |B] = 0.6 3 (2) (1) (b) The probability of one handoff is P [H1 ] = P [H1 B] + P [H1 L] = 0.4 to write We can rewrite these facts as P [A] = P [AB] + P [AB c ] .4 with B1 = A and B2 = Ac . This proof uses Axioms 1 and 3.2 2 (3) (c) The probability a call is long is P [L] = 1 − P [B] = 0.(b) Using Theorem 1. P [Ac ] = 1 − P [A]. we observe that A ∪ B can be written as the union of mutually exclusive events Once again. This proof uses Axioms 2 and 3.2. 
we can write B as the disjoint union B = A ∪ (Ac B).1 Solution Each question requests a conditional probability.4 (which uses Axiom 3). P [AB c ] = P [A] − P [AB].2. P [Ac B] = P [B] − P [AB]. (11) (12) (13) Substituting the results of Equation (10) into Equation (12) yields which completes the proof. (8) (9) (10) P [S] = P [A ∪ Ac ] = P [A] + P [Ac ] . Finally.4 4 11 .5. (a) Note that the probability a call is brief is P [B] = P [H0 B] + P [H1 B] + P [H2 B] = 0. we have Since. Note that this claim required only Axiom 3. P [B] = P [A] + P [Ac B] . A = (AB) ∪ (AB c ) B = (AB) ∪ (Ac B).4. P [B] = P [AB] + P [Ac B] . hich implies P [A] ≤ P [B]. The probability that a call with one handoff will be long is P [L|H1 ] = 0.1 + 0. P [A ∪ B] = P [AB] + P [A] − P [AB] + P [B] − P [AB] .4. (14) Problem 1. P [Ac B] ≥ 0. By Axiom 1. (a) Since G1 = {s2 . The conditional probabilities of G3 given E is P [G3 |E] = 1/3 2 P [G3 E] = = . yg.2 Solution Let si denote the outcome that the roll is i. . C2 ⊂ E so that P [C2 E] = P [C2 ] = 1/3. s5 . P [C2 ] 1/3 (2) Problem 1. From Problem 1.5. P [G1 ] 5 (1) (b) The conditional probability that 6 is rolled given that the roll is greater than 3 is P [R6 |G3 ] = P [s6 ] 1/6 P [R6 G3 ] = = . Gj = {sj+1 . P [G3 ] P [s4 . Ri = {si }.3. So. This implies P [DY ] = P [yy] = 1/4 P [Y ] = P [yy.4. the conditional probability can be expressed as P [D|Y ] = P [DY ] 1/4 = = 1/3. for 1 ≤ i ≤ 6. Similarly. The joint probability of G3 and E is (3) P [G3 E] = P [s4 . 1/3 P [C2 E] = = 1/2.5. P [E] 1/2 3 1/3 2 P [EG3 ] = = . s6 ] = 1/3. gy. s4 . s6 } and all outcomes have probability 1/6. . . P [G1 ] = 5/6. s4 . P [G3 ] 1/2 3 (4) (d) The conditional probability that the roll is even given that it’s greater than 3 is P [E|G3 ] = (5) Since the 2 of clubs is an even numbered card. we found that with respect to the color of the peas. To find the conditional probability of D given the event Y . s5 . 
The event R3 G1 = {s3 } and P [R3 G1 ] = 1/6 so that P [R3 |G1 ] = 1 P [R3 G1 ] = . the genotypes yy.4 Solution Note that P [DY ] is just the probability of the genotype yy. . gy.5.Problem 1. s6 }. s6 } and has probability 3/6. we look to evaluate P [D|Y ] = P [DY ] . corresponding to a plant having yellow seeds. P [Y ] 3/4 12 (3) . yg] = 3/4. s6 ] 3/6 (2) (c) The event E that the roll is even is E = {s2 . Since P [E] = 2/3. P [Y ] (1) Problem 1.3 Solution Define D as the event that a pea plant has two dominant y genes. and gg were all equally likely. (1) P [C2 |E] = P [E] 2/3 The probability that an even numbered card is picked given that the 2 is picked is P [E|C2 ] = 1/3 P [C2 E] = = 1. (2) Thus. s3 . 423. 423] 2/6 P [E1 E2 ] = = = 1/2. 324.5 Solution The sample outcomes can be written ijk where the first card drawn is i. P [LH] P [LH ∩ (L ∪ H)] = = 0.16 and P [H] = 0. 432} . The sample space is S = {234. P [L ∪ H] P [L ∪ H] (1) (a) Since LH ⊂ L ∪ H. 342} . 432} . O2 = {234. O3 = {243. P [LH|L ∪ H] = Thus. The events E1 .16 and P [H] = 0.10 (0. (1) and each of the six outcomes has probability 1/6.10. (2) (3) Since P [L] = 0.0236. P [E1 ] P [234. 432} . 423} . (2) (3) (4) (a) The conditional probability the second card is even given that the first card is even is P [243.10. 423. P [O1 ] P [O1 ] (8) (e) The conditional probability the second card is odd given that the first card is odd is P [O2 |O1 ] = P [O1 O2 ] = 0. E2 .5. 1. 342. 423} .10P [L ∪ H] = 0. 342. the second is j and the third is k. 243. E2 = {243. 324. 324. 342. 243. P [LH] = 0. The words “10% of the ticks that had either Lyme disease or HGE carried both diseases” can be written as P [LH|L ∪ H] = 0.10. P [E2 ] P [243. 423] 4/6 (6) (c) The probability the first two cards are even given the third card is even is P [E1 E2 |E3 ] = P [E1 E2 E3 ] = 0.1 13 (4) . 423] 2/6 P [E2 E1 ] = = = 1/2.5.16 + 0.Problem 1. P [E2 |E1 ] = O1 = {324. E3 . 423. 
P [E3 ] (7) (d) The conditional probabilities the second card is even given that the first card is odd is P [E2 |O1 ] = P [O1 ] P [O1 E2 ] = = 1. P [LH] = 0. P [O1 ] (9) Problem 1. 432] 4/6 (5) (b) The conditional probability the first card is even given that the second card is even is P [E1 |E2 ] = P [243. O2 .6 Solution The problem statement yields the obvious facts that P [L] = 0. 243.10 (P [L] + P [H] − P [LH]) . O1 . 432} .10) = 0.10. 342. O3 are E1 = {234. E3 = {234. 324. (1) A B Problem 1. both A and B have area 1/4 so that P [A] = P [B] = 1/4.6. events A and B are independent if and only if P [AB] = P [A]P [B]. That is. P [A ∪ B] = P [A] + P [B] − P [A ∩ B] = 3/8. P [A] = P [AB] = P [A] P [B] = (P [A])2 . A Venn diagram should convince you that A ⊂ B c so that A ∩ B c = A. the intersection AB has area 1/16 and covers 1/4 of A and 1/4 of B.3 Solution (a) Since A and B are disjoint.6. Since P [A ∩ B] = 0.1475. assume the sample space has area 1 corresponding to probability 1. for A and B to be the same set and also independent. Moreover.6. (b) Events A and B are dependent since P [AB] = P [A]P [B]. P [A ∩ B] = 0. As drawn.2 Solution In the Venn diagram.16 (5) This problems asks whether A and B can be independent events yet satisfy A = B? By definition. We can see that if A = B.0236 P [LH] = = 0. that is they are the same set.(b) The conditional probability that a tick has HGE given that it has Lyme disease is P [H|L] = 0. A and B are independent since P [AB] = P [A] P [B] . P [L] 0. This implies P [A ∩ B c ] = P [A] = 1/4.1 Solution Problem 1. (2) (1) Problem 1. • P [A] = 0 implying A = B = φ. There are two ways that this requirement can be satisfied: • P [A] = 1 implying A = B = S. then P [AB] = P [AA] = P [A] = P [B] . Thus. It also follows that P [A ∪ B c ] = P [B c ] = 1 − 1/8 = 7/8. (2) (1) 14 . The next few items are a little trickier. since A is a subset of B c . 
we see It follows that P [C ∩ Dc ] = P [C] − P [C ∩ D] = 5/8 − 15/64 = 25/64.4 Solution (a) Since A ∩ B = ∅. P [C ∩ Dc ] = P [C] − P [C ∩ D] = 1/2 − 1/3 = 1/6. P [AB] = 0 = 3/32 = P [A] P [B] . To find P [B]. Problem 1. P [B] = 1/4. since C and D are independent events. C c and Dc are independent. Furthermore. events C and Dc are independent because P [C ∩ Dc ] = 1/6 = (1/2)(1/3) = P [C] P [Dc ] .(c) Since C and D are independent. P [C ∪ Dc ] = P [C] + P [Dc ] − P [C ∩ Dc ] Using DeMorgan’s law. So P [D] = P [CD] 1/3 = = 2/3. P [A ∪ B c ] = P [B c ] = 3/4. P [C c ∩ Dc ] = P [C c ] P [Dc ] = (1 − P [C])(1 − P [D]) = 1/6. P [A ∩ B] = 0. P [A ∩ B c ] = P [A] = 3/8. (5) (6) Note that a second way to find P [C c ∩ Dc ] is to use the fact that if C and D are independent. (7) (d) Note that we found P [C ∪ D] = 5/6.7. P [C c ∩ Dc ] = P [(C ∪ D)c ] = 1 − P [C ∪ D] = 1/6. P [C c ∩ Dc ] = P [(C ∪ D)c ] = 1 − P [C ∪ D] = 15/64. we first observe that By De Morgan’s Law. 15 P [C ∪ Dc ] = P [C] + P [D] − P [C ∩ Dc ] = 1/2 + (1 − 2/3) − 1/6 = 2/3. (3) = 5/8 + (1 − 3/8) − 25/64 = 55/64. (3) (c) Since C and D are independent P [CD] = P [C]P [D]. This implies P [C ∪ D] = P [C] + P [D] − P [C ∩ D] = 1/2 + 2/3 − 1/3 = 5/6. we have (4) (5) (6) (7) P [C ∩ D] = P [C] P [D] = 15/64. (8) (9) . (d) Since P [C c Dc ] = P [C c ]P [Dc ]. To find P [C c ∩ Dc ]. we can write 5/8 = 3/8 + P [B] − 0. P [A ∪ B] = P [A] + P [B] − P [A ∩ B] (1) (2) (b) The events A and B are dependent because Thus. Thus Finally. then C c and Dc are independent. P [C] 1/2 (4) In addition. We can also use the earlier results to show (e) By Definition 1. C c ∩ Dc = (C ∪ D)c . P [C|D] = P [C] = 1/2. Since A is a subset of B c .6. From Venn diagrams. the three events A1 . wryg. rwgy.Problem 1. There are four visibly different pea plants. 3. the remaining 12 outcomes have yellow seeds. 2} A2 = {2. 4} with equiprobable outcomes. wrgy}. wrgy} . we see that P [Y ] = 12/16 = 3/4 (2) and P [R] = 12/16 = 3/4. 
or yellow (Y ) or not (Y c ). that is event Y occurs. 1} . rrgy.6 Solution There are 16 distinct equally likely outcomes for the second generation of pea plants based on a first generation of {rwyg. Except for the four outcomes with a pair of recessive g genes. They are listed below rryy rwyy wryy wwyy rryg rwyg wryg wwyg rrgy rwgy wrgy wwgy rrgg rwgg wrgg wwgg (1) A plant has yellow seeds.6. corresponding to whether the peas are round (R) or not (Rc ). is the set of outcomes RY = {rryy. we first must find P [RY ]. Note that RY . A2 . wryy. rwyy. consider the events A1 = {1. Since P [RY ] = 9/16. P [R Y ] = 1/16. (3) To find the conditional probabilities P [R|Y ] and P [Y |R]. the event that a plant has rounded yellow seeds. if a plant has at least one dominant y gene. c c (7) (8) 16 . 2. rwyg. However.5 Solution For a sample space S = {1. (1) Each event Ai has probability 1/2. (3) (2) Problem 1. These four visible events have probabilities P [RY ] = 9/16 P [R Y ] = 3/16 c P [RY c ] = 3/16. P [Y |R ] = and P [R |Y ] = 9/16 P [RY ] = = 3/4 P [R] 3/4 9/16 P [RY ] = = 3/4. Moreover. each pair of events is independent since P [A1 A2 ] = P [A2 A3 ] = P [A3 A1 ] = 1/4. From the above. P [Y ] 3/4 (4) (5) (6) Thus P [R|Y ] = P [R] and P [Y |R] = P [Y ] and R and Y are independent events. 3} A3 = {3. wryg. A3 are not independent since P [A1 A2 A3 ] = 0 = P [A1 ] P [A2 ] P [A3 ] . rryg.6. rwgy. P [BC] = P [B] P [C] . As drawn. P [AB] = P [A]P [B]. it is really the same claim as in part (a). (2) (1) Problem 1.8 Solution A AC AB ABC C In the Venn diagram at right.6. Since A and B are arbitrary labels. assume the sample space has area 1 corresponding to probability 1.7 Solution (a) For any events A and B. and AC each has area 1/9. As drawn. and C are mutually independent since P [ABC] = P [A] P [B] P [C] . Thus A. Since A and B are independent. we apply the result of part (a) to the sets A and B c . That is.6. B. part (b) says that Ac and B c are independent. 
Thus A and B c are independent. simply reversing the labels of A and B proves the claim. implying A. and C each have area 1/2 and thus probability 1/2. AB. A.6. The three way intersection ABC has zero probability. the three way intersection ABC has probability 1/8. B. we can write the law of total probability in the form of P [A] = P [AB] + P [AB c ] . and C are not mutually independent since P [ABC] = 0 = P [A] P [B] P [C] . Since we know from part (a) that A and B c are independent. P [AC] = P [A] P [C] . B. assume the sample space has area 1 corresponding to probability 1. Moreover. Alternatively. (2) 17 . each pair of events is independent since P [AB] = P [A] P [B] . and C each have area 1/3 and thus probability 1/3.9 Solution A AC AB C B BC In the Venn diagram at right. (1) B BC Problem 1. (c) To prove that Ac and B c are independent. one can construct exactly the same proof as in part (a) with the labels A and B reversed.Problem 1. A. (b) Proving that Ac and B are independent is not really necessary. B. This implies P [AB c ] = P [A] − P [A] P [B] = P [A] (1 − P [B]) = P [A] P [B c ] . BC. As a result. (1) However. the probability the second light is green is P [G2 ] = P [G1 G2 ] + P [R1 G2 ] = 3/8 + 1/8 = 1/2. Problem 1. P [G2 ] P [G2 ] (2) (1) Finally.7. P [H2 ] 1/4 (1) (2) (b) The probability that the first flip is heads and the second flip is tails is P [H1 T2 ] = 3/16. Let Gi and Bi denote events indicating whether free throw i was good (Gi ) or bad (Bi ).7. we observe P [H2 ] = P [H1 H2 ] + P [T1 H2 ] = 1/4.7. we can directly read that P [G2 |G1 ] = 3/4. This implies P [H1 |H2 ] = 1/16 P [H1 H2 ] = = 1/4.1 Solution A sequential sample space for this experiment is 1/4 $ H2 •H1 H2 1/16 $ $$$ 1/4 $ H1 $ T2 •H1 T2 3/16 3/4 $$ $$$ ˆˆˆ ˆˆ 1/4 H2 •T1 H2 3/16 3/4 ˆ T1 ˆˆ ˆˆˆ ˆ 3/4 T2 •T1 T2 9/16 (a) From the tree. The tree for the free throw experiment is Problem 1. 
The conditional probability that the first light was green given the second light was green is P [G1 |G2 ] = P [G2 |G1 ] P [G1 ] P [G1 G2 ] = = 3/4.2 Solution The tree with adjusted probabilities is $ G2 $$$ $$ 1/2 ¨ G1 ˆˆˆ ¨ ˆˆ 1/4 ˆ R2 ¨¨ ¨¨ r rr 1/4 $ G2 rr $$$ 1/2 r R1 $$ ˆˆˆ ˆˆ 3/4 ˆ R2 3/4 •G1 G2 •G1 R2 •R1 G2 •R1 R2 3/8 1/8 1/8 3/8 From the tree. from the tree diagram.3 Solution 18 .Problem 1. 5 Solution We can use Bayes’ formula to evaluate these joint probabilities. P [+] P [+.0194.4 Solution The tree for this experiment is 1/4 $ H •AH 1/8 $$ $$$ 1/2 $ A 3/8 T •AT 3/4 $$ $$$ ˆˆ ˆˆˆ 3/4 •BH 3/8 1/2 ˆ B ˆˆ ˆˆˆ H ˆ T •BT 1/8 1/4 The probability that you guess correctly is P [C] = P [AT ] + P [BH] = 3/8 + 3/8 = 3/4.7. This event has probability P [O] = P [G1 B2 ] + P [B1 G2 ] = 1/8 + 1/8 = 1/4. The reason this probability is so low is that the a priori probability that a person has HIV is very small. even though the test is correct 99% of the time. is called a false-positive result and has probability P [+|H c ].99)(0. (1) The P [− |H ] is the probability that a person who has HIV tests negative for the disease. (1) P [−|H] = P [+|H c ] = 0.0002) + (0. H] P [+. Now the probability that a person who has tested positive for HIV actually has the disease is P [H|+] = P [+. 19 .01.$ $ $ G2 $$$ 1/2 ¨ G1 ˆˆˆ ¨ ˆˆ 1/4 ˆ B2 ¨¨ ¨¨ rr rr 1/4 $ G2 $ $$ r 1/2 r B1 $$ ˆˆˆ ˆˆ 3/4 ˆ B2 3/4 •G1 G2 •G1 B2 •B1 G2 •B1 B2 3/8 1/8 1/8 3/8 The game goes into overtime if exactly one free throw is made.0002) = (0.7. the probability that a random person who tests positive actually has HIV is less than 0. P [H|+] = P [+|H] P [H] P [+|H] P [H] + P [+|H c ] P [H c ] (0. Since the test is correct 99% of the time. The case where a person who does not have HIV but tests positive for the disease.01)(0.02. (3) (4) (5) Thus. H] = . This is referred to as a false-negative result. (1) Problem 1. H c ] (2) Problem 1. H] + P [+.99)(0.9998) = 0. (3) (2) (1) Thus P [H1 H2 ] = P [H1 ]P [H2 ]. 
4/5 $ A2 •A1 A2 $ $$$ 3/5 $ A1 $ D2 •A1 D2 1/5 $$ $$$ ˆˆˆ ˆˆ 2/5 A2 •D1 A2 2/5 ˆ D1 ˆˆ ˆˆˆ ˆ 3/5 12/25 3/25 4/25 6/25 D2 •D1 D2 (a) We wish to find the probability P [E1 ] that exactly one photodetector is acceptable. This result should not be surprising since if the first flip is heads. implying H1 and H2 are not independent.Problem 1. From the tree. we have P [E1 ] = P [A1 D2 ] + P [D1 A2 ] = 3/25 + 4/25 = 7/25.7.7.6 Solution Let Ai and Di indicate whether the ith photodetector is acceptable or defective. P [H2 ] = P [A1 H1 H2 ] + P [A1 T1 H2 ] + P [B1 H1 H2 ] + P [B1 T1 H2 ] = 1/2. the second flip is less likely to be heads since it becomes more likely that the second coin flipped was coin A.7 Solution The tree for this experiment is 3/4 $ H2 $ $$$ 1/4 $ H1 $ T2 1/4 $$ 3/4 $$$ T1 ˆˆ H2 1/2 ¨ A1 3/4 ˆˆ ˆ T ¨¨ ˆ 2 1/4 ¨ ¨¨ rr rr 1/4 $ H2 $$ r 3/4 1/2 r B1 ˆ $$$ H1 T2 ˆˆˆ 3/4 1/4 ˆ T ˆ ˆ 1 H2 1/4 ˆˆˆ ˆ ˆ 3/4 •A1 H1 H2 •A1 H1 T2 •A1 T1 H2 •A1 T1 T2 •B1 H1 H2 •B1 H1 T2 •B1 T1 H2 3/32 1/32 9/32 3/32 3/32 9/32 1/32 3/32 T2 •B1 T1 T2 The event H1 H2 that heads occurs on both flips has probability P [H1 H2 ] = P [A1 H1 H2 ] + P [B1 H1 H2 ] = 6/32. (b) The probability that both photodetectors are defective is P [D1 D2 ] = 6/25. 20 . it is likely that coin B was picked first. Similarly. The probability of H1 is P [H1 ] = P [A1 H1 H2 ] + P [A1 H1 T2 ] + P [B1 H1 H2 ] + P [B1 H1 T2 ] = 1/2. In this case. (1) Problem 1. (d) From part (b).7. The conditional probability of heads on flip 1 given that Dagwood must diet is P [H1 |D] = P [H1 D] = 0. (c) The event that Dagwood must diet is The probability that Dagwood must diet is D = (T1 H2 T3 T4 ) ∪ (T1 T2 H3 T4 ) ∪ (T1 T2 T3 ). P [D] (8) Remember. P [T3 ] = P [T1 H2 T3 H4 ] + P [T1 H2 T3 T4 ] + P [T1 T2 T3 ] = 1/8 + 1/16 + 1/16 = 1/4. In fact. then we know there will be a third flip which may result in H3 . H2 and H3 are dependent events. if there was heads on flip 1. Now we find that P [H2 H3 ] = 1/8 = P [H2 ] P [H3 ] . 
21 (9) (10) . Similarly. we calculate P [H2 ] = P [T1 H2 H3 ] + P [T1 H2 T3 ] + P [T1 H2 T4 T4 ] = 1/4 P [H2 H3 ] = P [T1 H2 H3 ] = 1/8. (11) Hence. The reason for the dependence is that given H2 occurred. we found that P [H3 ] = 1/4. P [H3 |H2 ] = 1/2 while P [H3 ] = 1/4. The tree for this problem is shown below. & & 1/2 &H3 •T1 H2 H3 1/8 1/2 & & 1/2 $H4 •T1 H2 T3 H4 1/16 & $ & & $$$ T3 T4 •T1 H2 T3 T4 1/16 1/2 ¨H2 & 1/2 1/2 &ˆ ¨¨ ˆ ˆ ¨ 1/2 1/2 ˆ 1/16 T2 H3 ˆ ˆ H •T T H H 1/2 ˆ T1 ¨ 1/2 ˆˆ 4 1 2 3 4  ˆ T4 •T1 T2 H3 T4 1/16  1/2   1/2  T3 •T1 T2 T3 1/8 H1 •H1 1/2 (b) From the tree. knowledge of H2 tells us that the experiment didn’t end after the first flip. then Dagwood always postpones his diet. To check independence.Problem 1.8 Solution (a) The primary difficulty in this problem is translating the words into the correct tree diagram. That is. P [H3 ] = P [T1 H2 H3 ] + P [T1 T2 H3 H4 ] + P [T1 T2 H3 H4 ] = 1/8 + 1/16 + 1/16 = 1/4. (5) (6) (7) (3) (4) (1) (2) P [D] = P [T1 H2 T3 T4 ] + P [T1 T2 H3 T4 ] + P [T1 T2 T3 ] = 1/16 + 1/16 + 1/8 = 1/4. If we allow each letter to appear only once then we have 4 choices for the first letter and 3 choices for the second and two choices for the third letter. 1−p 1−p 1−p From the tree. we require 6 25 n ≤ 0. n = 4 pairs must be tested.2 Solution 22 .10 Solution The experiment ends as soon as a fish is caught.01 ⇒ n≥ ln 0. Thus..6. The number of codes with exactly 3 zeros equals the number of ways of choosing the bits in which those zeros occur. P [C1 ] = p and P [C2 ] = (1 − p)p. From Problem 1.8. the probability of zero acceptable diodes out of n pairs of diodes is pn because on each test of a pair of diodes.01 = 3. there are a total of 4 · 3 · 2 = 24 possible codes. both diodes of a pair are bad. Testing each pair of diodes is an independent trial such that with probability p.7.. Since P [Zn ] = (6/25)n .9 Solution (a) We wish to know what the probability that we find no good photodiodes in n pairs of diodes. 
p = P [both diodes are defective] = P [D1 D2 ] = 6/25.8. the number of 3 letter words that can be formed is 43 = 64.7. Problem 1.01. ln 6/25 (3) Since n must be an integer. Problem 1. a fish is caught on the nth cast if no fish were caught on the previous n − 1 casts. Since each letter can take on any one of the 4 possible letters in the alphabet. P [Cn ] = (1 − p)n−1 p. (1) The probability of Zn . both must be defective.Problem 1. Therefore. Finally.23. The tree resembles p ¨ C1 p ¨ C2 p ¨ C3 ¨¨ ¨¨ ¨¨ ¨¨ ¨¨ ¨¨ c c c ¨ C1 ¨ C2 ¨ C3 .7. (1) Problem 1. n P [Zn ] = i=1 p = pn = 6 25 n (2) (b) Another way to phrase this question is to ask how many pairs must we test until P [Zn ] ≤ 0. Therefore there are 5 3 = 10 codes with exactly 3 zeros. we can easily calculate p.1 Solution There are 25 = 32 different binary codes with 5 bits. 52 · 51 = 2652 outcomes. K♥) were distinct. Problem 1. which does not change their ratio. (2) (c) The probability that the two cards are of the same type but different suit is the number of outcomes that are of the same type but different suit divided by the total number of outcomes involved in picking two cards at random from a deck of 52 cards. Of the remaining 14 field players. 15 1 2. the outcomes (K♥. Choose 1 of the 15 field players to be the designated hitter (DH). There are N4 = 9! ways to do this. the second sub-experiment is to pick the second card without replacing the first and recording it. = 15 3. Then we need to pick one of the 3 remaining cards that are of the same type but different suit out of the remaining 51 cards. 8 4. First we need to pick one of the 52 cards. There are N2 = ways to do this.Problem 1. So before. Choose 1 of the 10 pitchers. For the first sub-experiment we can have any one of the possible 52 cards for a total of 52 possibilities.3 Solution (a) The experiment of picking two cards and recording them in the order in which they were selected can be modeled by two sub-experiments. Now. 
The first is to pick the first card and record it. The second experiment consists of all the cards minus the one that was picked first (because we are sampling without replacement), for a total of 51 possible outcomes. So the total number of outcomes is the product of the number of outcomes for each sub-experiment:

\[ 52 \cdot 51 = 2652 \text{ outcomes.} \tag{1} \]

(b) To have the same card but different suit we can make the following sub-experiments. First pick any one of the 52 cards; then pick one of the 3 remaining cards of the same type but a different suit. So the total number of outcomes is

\[ 52 \cdot 3 = 156 \text{ outcomes.} \tag{2} \]

The corresponding probability is

\[ P[\text{same type, different suit}] = \frac{156}{2652} = \frac{1}{17}. \tag{3} \]

(d) Now we are not concerned with the ordering of the cards. Outcomes such as (K♥, 8♦) and (8♦, K♥), which were distinct before, are now considered to be the single outcome that a King of hearts and 8 of diamonds were selected. So every pair of outcomes before collapses to a single outcome when we disregard ordering, and we can redo parts (a) and (b) above by halving the corresponding values found there. The probability, however, does not change because both the numerator and the denominator have been reduced by an equal factor of 2.

Problem 1.8.4 Solution

We can break down the experiment of choosing a starting lineup into a sequence of subexperiments:

1. Choose 1 of the 10 pitchers. There are $N_1 = \binom{10}{1} = 10$ ways to do this.
2. Choose 1 of the 15 field players to be the designated hitter (DH). There are $N_2 = \binom{15}{1} = 15$ ways to do this.
3. Of the remaining 14 field players, choose 8 for the remaining field positions. There are $N_3 = \binom{14}{8}$ ways to do this.
4. For the 9 batters (consisting of the 8 field players and the designated hitter), choose a batting lineup. There are $N_4 = 9!$ ways to do this.

So the total number of different starting lineups when the DH is selected among the field players is

\[ N = N_1 N_2 N_3 N_4 = (10)(15)\binom{14}{8} 9! = 163{,}459{,}296{,}000. \tag{1} \]

Note that this overestimates the number of combinations the manager must really consider because most field players can play only one or two positions. Although these constraints on the manager reduce the number of possible lineups, they typically make the manager's job more difficult. As for the counting, we note that our count did not need to specify the positions played by the field players. Although this is an important consideration for the manager, it is not part of our counting of different lineups. In fact, the 8 nonpitching field players are allowed to switch positions at any time in the field. For example, the shortstop and second baseman could trade positions in the middle of an inning. Although the DH can go play the field, there are some complicated rules about this. Here is an excerpt from Major League Baseball Rule 6.10:

The Designated Hitter may be used defensively, continuing to bat in the same position in the batting order, but the pitcher must then bat in the place of the substituted defensive player, unless more than one substitution is made, and the manager then must designate their spots in the batting order.

If you're curious, you can find the complete rule on the web.

Problem 1.8.5 Solution

When the DH can be chosen among all the players, including the pitchers, there are two cases:

• The DH is a field player. In this case, the number of possible lineups, $N_F$, is given in Problem 1.8.4, where the designated hitter must be chosen from the 15 field players. We repeat the solution of Problem 1.8.4 here. We can break down the experiment of choosing a starting lineup into a sequence of subexperiments:

1. Choose 1 of the 10 pitchers. There are $N_1 = \binom{10}{1} = 10$ ways to do this.
2. Choose 1 of the 15 field players to be the designated hitter (DH). There are $N_2 = \binom{15}{1} = 15$ ways to do this.
3. Of the remaining 14 field players, choose 8 for the remaining field positions. There are $N_3 = \binom{14}{8}$ ways to do this.
4. For the 9 batters (consisting of the 8 field players and the designated hitter), choose a batting lineup. There are $N_4 = 9!$ ways to do this.

So the total number of different starting lineups when the DH is selected among the field players is

\[ N_F = N_1 N_2 N_3 N_4 = (10)(15)\binom{14}{8} 9! = 163{,}459{,}296{,}000. \tag{1} \]

• The DH is a pitcher. In this case, there are 10 choices for the pitcher, 10 choices for the DH among the pitchers (including the pitcher batting for himself), $\binom{15}{8}$ choices for the field players, and $9!$ ways of ordering the batters into a lineup. The number of possible lineups is

\[ N_P = (10)(10)\binom{15}{8} 9! = 233{,}513{,}280{,}000. \tag{2} \]

The total number of ways of choosing a lineup is $N_F + N_P = 396{,}972{,}576{,}000$.

Problem 1.8.6 Solution

(a) We can find the number of valid starting lineups by noticing that the swingman presents three situations: (1) the swingman plays guard, (2) the swingman plays forward, and (3) the swingman doesn't play. Let $N_i$ denote the number of lineups corresponding to case $i$. Then we can write the total number of lineups as $N_1 + N_2 + N_3$. In the first situation, with the swingman at guard, we have to choose 1 out of 3 centers, 2 out of 4 forwards, and 1 out of 4 guards so that

\[ N_1 = \binom{3}{1}\binom{4}{2}\binom{4}{1} = 72. \tag{1} \]

In the second case, with the swingman at forward, we need to choose 1 out of 3 centers, 1 out of 4 forwards and 2 out of 4 guards, yielding

\[ N_2 = \binom{3}{1}\binom{4}{1}\binom{4}{2} = 72. \tag{2} \]

Finally, with the swingman on the bench, we choose 1 out of 3 centers, 2 out of 4 forwards, and 2 out of 4 guards. This implies

\[ N_3 = \binom{3}{1}\binom{4}{2}\binom{4}{2} = 108, \tag{3} \]

and the total number of lineups is $N_1 + N_2 + N_3 = 252$.

Problem 1.8.7 Solution

What our design must specify is the number of boxes on the ticket, and the number of specially marked boxes. Suppose each ticket has $n$ boxes and $5 + k$ specially marked boxes. Note that when $k > 0$, a winning ticket will still have $k$ unscratched boxes with the special mark. A ticket is a winner if each time a box is scratched off, the box has the special mark. Assuming the boxes are scratched off randomly, the first box scratched off has the mark with probability $(5 + k)/n$ since there are $5 + k$ marked boxes out of $n$ boxes. Moreover, if the first scratched box has the mark, then there are $4 + k$ marked boxes out of $n - 1$ remaining boxes. Continuing this argument, the probability that a ticket is a winner is

\[ p = \frac{5+k}{n}\cdot\frac{4+k}{n-1}\cdot\frac{3+k}{n-2}\cdot\frac{2+k}{n-3}\cdot\frac{1+k}{n-4} = \frac{(k+5)!\,(n-5)!}{k!\,n!}. \tag{1} \]

By careful choice of $n$ and $k$, we can choose $p$ close to 0.01. For example,

n     9        11       14       17
k     0        1        2        3
p     0.0079   0.0130   0.0105   0.0090

A gamecard with $n = 14$ boxes and $5 + k = 7$ shaded boxes would be quite reasonable.
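As a quick numerical check of the design table in Problem 1.8.7, the following Matlab sketch evaluates the winning probability for each listed pair of n and k. The handle name pwin is an illustrative choice, not part of the original solution.

```matlab
% Winning probability for a ticket with n boxes, 5+k of them marked,
% when 5 boxes are scratched at random:
%   p = (5+k)/n * (4+k)/(n-1) * ... * (1+k)/(n-4).
% pwin is a hypothetical helper name used only in this sketch.
pwin = @(n,k) prod((k+1):(k+5)) / prod((n-4):n);

n = [9 11 14 17];          % candidate numbers of boxes
k = [0 1 2 3];             % extra marked boxes beyond the 5 needed
p = zeros(size(n));
for i = 1:length(n)
    p(i) = pwin(n(i), k(i));
end
disp([n; k; p]);           % rows: n, k, p
```

Changing the vectors n and k lets you explore other ticket designs with p near 0.01.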
Problem 1.9.1 Solution

(a) Since the probability of a zero is 0.8, we can express the probability of the code word 00111 as 2 occurrences of a 0 and three occurrences of a 1. Therefore

\[ P[00111] = (0.8)^2 (0.2)^3 = 0.00512. \tag{1} \]

(b) The probability that a code word has exactly three 1's is

\[ P[\text{three 1's}] = \binom{5}{3}(0.8)^2(0.2)^3 = 0.0512. \tag{2} \]

Problem 1.9.2 Solution

Given that the probability that the Celtics win a single championship in any given year is 0.32, we can find the probability that they win 8 straight NBA championships:

\[ P[\text{8 straight championships}] = (0.32)^8 = 0.00011. \tag{1} \]

The probability that they win 10 titles in 11 years is

\[ P[\text{10 titles in 11 years}] = \binom{11}{10}(0.32)^{10}(0.68) = 0.000084. \tag{2} \]

The probability of each of these events is less than 1 in 1000! Given that these events took place in the relatively short fifty year history of the NBA, it would seem that these probabilities should be much higher. What the model overlooks is that the sequence of 10 titles in 11 years started when Bill Russell joined the Celtics. In the years with Russell (and a strong supporting cast) the probability of a championship was much higher.

Problem 1.9.3 Solution

We know that the probability of a green light and of a red light is each 7/16, and that of a yellow light is 1/8. Since there are always 5 lights, G, Y, and R obey the multinomial probability law:

\[ P[G=2, Y=1, R=2] = \frac{5!}{2!\,1!\,2!}\left(\frac{7}{16}\right)^2 \left(\frac{1}{8}\right) \left(\frac{7}{16}\right)^2. \tag{1} \]

The probability that the number of green lights equals the number of red lights is

\[ P[G=R] = P[G=1,R=1,Y=3] + P[G=2,R=2,Y=1] + P[G=0,R=0,Y=5] \tag{2} \]
\[ = \frac{5!}{1!\,1!\,3!}\left(\frac{7}{16}\right)\left(\frac{7}{16}\right)\left(\frac{1}{8}\right)^3 + \frac{5!}{2!\,1!\,2!}\left(\frac{7}{16}\right)^2\left(\frac{7}{16}\right)^2\left(\frac{1}{8}\right) + \frac{5!}{0!\,0!\,5!}\left(\frac{1}{8}\right)^5 \tag{3} \]
\[ \approx 0.1449. \tag{4} \]

Problem 1.9.4 Solution

For the team with the homecourt advantage, let $W_i$ and $L_i$ denote whether game $i$ was a win or a loss. Because games 1 and 3 are home games and game 2 is an away game, the tree has home wins with probability $p$ and an away win with probability $1-p$.

[Tree diagram; leaves and their probabilities:]

Outcome      Probability
W1 W2        p(1−p)
W1 L2 W3     p^3
W1 L2 L3     p^2(1−p)
L1 W2 W3     p(1−p)^2
L1 W2 L3     (1−p)^3
L1 L2        p(1−p)

The probability that the team with the home court advantage wins is

\[ P[H] = P[W_1 W_2] + P[W_1 L_2 W_3] + P[L_1 W_2 W_3] \tag{1} \]
\[ = p(1-p) + p^3 + p(1-p)^2. \tag{2} \]

Note that $P[H] \le p$ for $1/2 \le p \le 1$. Since the team with the home court advantage would win a 1 game playoff with probability $p$, the home court team is less likely to win a three game series than a 1 game playoff!

Problem 1.9.5 Solution

(a) There are 3 group 1 kickers and 6 group 2 kickers. Using $G_i$ to denote that a group $i$ kicker was chosen, we have

\[ P[G_1] = 1/3, \qquad P[G_2] = 2/3. \tag{1} \]

In addition, the problem statement tells us that

\[ P[K|G_1] = 1/2, \qquad P[K|G_2] = 1/3. \tag{2} \]

Combining these facts using the Law of Total Probability yields

\[ P[K] = P[K|G_1]P[G_1] + P[K|G_2]P[G_2] \tag{3} \]
\[ = (1/2)(1/3) + (1/3)(2/3) = 7/18. \tag{4} \]

(b) To solve this part, we need to identify the groups from which the first and second kicker were chosen. Let $c_i$ indicate whether a kicker was chosen from group $i$ and let $C_{ij}$ indicate that the first kicker was chosen from group $i$ and the second kicker from group $j$. The experiment to choose the kickers is described by the sample tree:

[Tree diagram; branches and leaf probabilities:]

First kicker   Second kicker   Event   Probability
c1 (3/9)       c1 (2/8)        C11     1/12
c1 (3/9)       c2 (6/8)        C12     1/4
c2 (6/9)       c1 (3/8)        C21     1/4
c2 (6/9)       c2 (5/8)        C22     5/12

Since a kicker from group 1 makes a kick with probability 1/2 while a kicker from group 2 makes a kick with probability 1/3,

\[ P[K_1K_2|C_{11}] = (1/2)^2, \qquad P[K_1K_2|C_{12}] = (1/2)(1/3), \tag{5} \]
\[ P[K_1K_2|C_{21}] = (1/3)(1/2), \qquad P[K_1K_2|C_{22}] = (1/3)^2. \tag{6} \]

By the law of total probability,

\[ P[K_1K_2] = P[K_1K_2|C_{11}]P[C_{11}] + P[K_1K_2|C_{12}]P[C_{12}] \tag{7} \]
\[ \qquad + P[K_1K_2|C_{21}]P[C_{21}] + P[K_1K_2|C_{22}]P[C_{22}] \tag{8} \]
\[ = \frac{1}{4}\cdot\frac{1}{12} + \frac{1}{6}\cdot\frac{1}{4} + \frac{1}{6}\cdot\frac{1}{4} + \frac{1}{9}\cdot\frac{5}{12} = \frac{65}{432}. \tag{9} \]

It should be apparent that $P[K_1] = P[K]$ from part (a). Symmetry should also make it clear that $P[K_1] = P[K_2]$ since for any ordering of two kickers, the reverse ordering is equally likely. If this is not clear, we derive this result by calculating $P[K_2|C_{ij}]$ and using the law of total probability to calculate $P[K_2]$.

\[ P[K_2|C_{11}] = 1/2, \qquad P[K_2|C_{12}] = 1/3, \tag{10} \]
\[ P[K_2|C_{21}] = 1/2, \qquad P[K_2|C_{22}] = 1/3. \tag{11} \]

By the law of total probability,

\[ P[K_2] = P[K_2|C_{11}]P[C_{11}] + P[K_2|C_{12}]P[C_{12}] + P[K_2|C_{21}]P[C_{21}] + P[K_2|C_{22}]P[C_{22}] \tag{12} \]
\[ = \frac{1}{2}\cdot\frac{1}{12} + \frac{1}{3}\cdot\frac{1}{4} + \frac{1}{2}\cdot\frac{1}{4} + \frac{1}{3}\cdot\frac{5}{12} = \frac{7}{18}. \tag{13} \]

We observe that $K_1$ and $K_2$ are not independent since

\[ P[K_1K_2] = \frac{65}{432} \ne \left(\frac{7}{18}\right)^2 = P[K_1]P[K_2]. \tag{14} \]

Note that $65/432 \approx 0.1505$ and $(7/18)^2 \approx 0.1512$ are close but not exactly the same. The reason $K_1$ and $K_2$ are dependent is that if the first kicker is successful, then it is more likely that kicker is from group 1. This makes it more likely that the second kicker is from group 2 and is thus more likely to miss.

(c) Once a kicker is chosen, each of the 10 field goals is an independent trial. If the kicker is from group 1, then the success probability is 1/2. If the kicker is from group 2, the success probability is 1/3. Out of 10 kicks, there are 5 misses iff there are 5 successful kicks. Given the type of kicker chosen, the probability of 5 misses is

\[ P[M|G_1] = \binom{10}{5}(1/2)^5(1/2)^5, \qquad P[M|G_2] = \binom{10}{5}(1/3)^5(2/3)^5. \tag{15} \]

We use the Law of Total Probability to find

\[ P[M] = P[M|G_1]P[G_1] + P[M|G_2]P[G_2] \tag{16} \]
\[ = \binom{10}{5}\left[(1/3)(1/2)^{10} + (2/3)(1/3)^5(2/3)^5\right]. \tag{17} \]
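The law-of-total-probability sums in Problem 1.9.5(b) are easy to mistype by hand, so here is a minimal Matlab sketch that evaluates them numerically from the tree probabilities. The variable names are illustrative assumptions; the four entries of each vector are ordered as C11, C12, C21, C22.

```matlab
% Branch probabilities and conditional success probabilities from the
% kicker tree, ordered as C11, C12, C21, C22 (names are illustrative).
pC  = [1/12 1/4 1/4 5/12];     % P[Cij]
pK1 = [1/2 1/2 1/3 1/3];       % P[K1|Cij]: depends on the first kicker's group
pK2 = [1/2 1/3 1/2 1/3];       % P[K2|Cij]: depends on the second kicker's group

pK1K2 = sum(pK1 .* pK2 .* pC); % P[K1 K2] by total probability
pK1m  = sum(pK1 .* pC);        % P[K1]
pK2m  = sum(pK2 .* pC);        % P[K2]

fprintf('P[K1K2] = %.6f, P[K1]P[K2] = %.6f\n', pK1K2, pK1m*pK2m);
```

Comparing the two printed numbers shows directly whether the two kicks are independent.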
Problem 1.10.1 Solution

From the problem statement, we can conclude that the device components are configured in the following way.

[Diagram: components 1, 2 and 3 in series, that chain in parallel with component 4; the result is in series with the parallel pair of components 5 and 6.]

To find the probability that the device works, we replace series devices 1, 2, and 3, and parallel devices 5 and 6 each with a single device labeled with the probability that it works. In particular,

\[ P[W_1 W_2 W_3] = (1-q)^3, \tag{1} \]
\[ P[W_5 \cup W_6] = 1 - P[W_5^c W_6^c] = 1 - q^2. \tag{2} \]

This yields a composite device of the form

[Diagram: a device with reliability $(1-q)^3$ in parallel with a device with reliability $1-q$; the pair is in series with a device with reliability $1-q^2$.]

The probability $P[W']$ that the two devices in parallel work is 1 minus the probability that neither works:

\[ P[W'] = 1 - q\left(1 - (1-q)^3\right). \tag{3} \]

Finally, for the device to work, both composite devices in series must work. Thus, the probability the device works is

\[ P[W] = \left[1 - q\left(1 - (1-q)^3\right)\right]\left[1 - q^2\right]. \tag{4} \]

Problem 1.10.2 Solution

Suppose that the transmitted bit was a 1. We can view each repeated transmission as an independent trial. We call each repeated bit the receiver decodes as 1 a success. Using $S_{k,5}$ to denote the event of $k$ successes in the five trials, the probability that $k$ 1's are decoded at the receiver is

\[ P[S_{k,5}] = \binom{5}{k} p^k (1-p)^{5-k}, \qquad k = 0, 1, \ldots, 5. \tag{1} \]

The probability a bit is decoded correctly is

\[ P[C] = P[S_{5,5}] + P[S_{4,5}] = p^5 + 5p^4(1-p) = 0.91854. \tag{2} \]

The probability a deletion occurs is

\[ P[D] = P[S_{3,5}] + P[S_{2,5}] = 10p^3(1-p)^2 + 10p^2(1-p)^3 = 0.081. \tag{3} \]

The probability of an error is

\[ P[E] = P[S_{1,5}] + P[S_{0,5}] = 5p(1-p)^4 + (1-p)^5 = 0.00046. \tag{4} \]

Note that if a 0 is transmitted, then 0 is sent five times and we call decoding a 0 a success. You should convince yourself that this is a symmetric situation with the same deletion and error probabilities. Introducing deletions reduces the probability of an error by roughly a factor of 20. However, the probability of successful decoding is also reduced.

Problem 1.10.3 Solution

Note that each digit 0 through 9 is mapped to the 4 bit binary representation of the digit. That is, 0 corresponds to 0000, 1 to 0001, up to 9 which corresponds to 1001. Of course, the 4 bit binary numbers corresponding to numbers 10 through 15 go unused, however this is unimportant to our problem. The 10 digit number results in the transmission of 40 bits. For each bit, an independent trial determines whether the bit was correct, a deletion, or an error. In Problem 1.10.2, we found the probabilities of these events to be

\[ P[C] = \gamma = 0.91854, \qquad P[D] = \delta = 0.081, \qquad P[E] = \epsilon = 0.00046. \tag{1} \]

Since each of the 40 bit transmissions is an independent trial, the joint probability of $c$ correct bits, $d$ deletions, and $e$ errors has the multinomial probability

\[ P[C=c, D=d, E=e] = \begin{cases} \dfrac{40!}{c!\,d!\,e!}\,\gamma^c \delta^d \epsilon^e & c+d+e=40;\ c,d,e \ge 0, \\ 0 & \text{otherwise.} \end{cases} \tag{2} \]

Problem 1.10.4 Solution

From the statement of Problem 1.10.1, the configuration of device components is

[Diagram: components 1, 2 and 3 in series, that chain in parallel with component 4; the result is in series with the parallel pair of components 5 and 6.]

By symmetry, note that the reliability of the system is the same whether we replace component 1, component 2, or component 3. Similarly, the reliability is the same whether we replace component 5 or component 6. Thus we consider the following cases:

I. Replace component 1. In this case

\[ P[W_1W_2W_3] = \left(1 - \frac{q}{2}\right)(1-q)^2, \qquad P[W_4] = 1 - q, \qquad P[W_5 \cup W_6] = 1 - q^2. \tag{1} \]

This implies

\[ P[W_1W_2W_3 \cup W_4] = 1 - (1 - P[W_1W_2W_3])(1 - P[W_4]) = 1 - \frac{q^2}{2}\left(5 - 4q + q^2\right). \tag{2} \]

In this case, the probability the system works is

\[ P[W_I] = P[W_1W_2W_3 \cup W_4]\,P[W_5 \cup W_6] = \left[1 - \frac{q^2}{2}\left(5 - 4q + q^2\right)\right]\left(1 - q^2\right). \tag{3} \]

II. Replace component 4. In this case,

\[ P[W_1W_2W_3] = (1-q)^3, \qquad P[W_4] = 1 - \frac{q}{2}, \qquad P[W_5 \cup W_6] = 1 - q^2. \tag{4} \]

This implies

\[ P[W_1W_2W_3 \cup W_4] = 1 - (1 - P[W_1W_2W_3])(1 - P[W_4]) = 1 - \frac{q}{2} + \frac{q}{2}(1-q)^3. \tag{5} \]

In this case, the probability the system works is

\[ P[W_{II}] = P[W_1W_2W_3 \cup W_4]\,P[W_5 \cup W_6] = \left[1 - \frac{q}{2} + \frac{q}{2}(1-q)^3\right]\left(1 - q^2\right). \tag{6} \]

III. Replace component 5. In this case,

\[ P[W_1W_2W_3] = (1-q)^3, \qquad P[W_4] = 1 - q, \qquad P[W_5 \cup W_6] = 1 - \frac{q^2}{2}. \tag{7} \]

This implies

\[ P[W_1W_2W_3 \cup W_4] = 1 - (1 - P[W_1W_2W_3])(1 - P[W_4]) = (1-q)\left[1 + q(1-q)^2\right]. \tag{8} \]

In this case, the probability the system works is

\[ P[W_{III}] = P[W_1W_2W_3 \cup W_4]\,P[W_5 \cup W_6] \tag{9} \]
\[ = (1-q)\left(1 - \frac{q^2}{2}\right)\left[1 + q(1-q)^2\right]. \tag{10} \]

From these expressions, it's hard to tell which substitution creates the most reliable circuit. First, we observe that $P[W_{II}] > P[W_I]$ if and only if

\[ 1 - \frac{q}{2} + \frac{q}{2}(1-q)^3 > 1 - \frac{q^2}{2}\left(5 - 4q + q^2\right). \tag{11} \]

Some algebra will show that $P[W_{II}] > P[W_I]$ if and only if $q < 2$, which holds for all nontrivial (i.e., nonzero) values of $q$. Similar algebra will show that $P[W_{II}] > P[W_{III}]$ for all values of $0 \le q \le 1$. Thus the best policy is to replace component 4.

Problem 1.11.1 Solution

We can generate the 200 × 1 vector T, denoted T in Matlab, via the command

    T=50+ceil(50*rand(200,1))

Keep in mind that 50*rand(200,1) produces a 200 × 1 vector of random numbers, each in the interval (0, 50). Applying the ceiling function converts these random numbers to random integers in the set {1, 2, . . . , 50}. Finally, we add 50 to produce random numbers between 51 and 100.

Problem 1.11.2 Solution

Rather than just solve the problem for 50 trials, we can write a function that generates vectors C and H for an arbitrary number of trials n. The code for this task is

    function [C,H]=twocoin(n);
    C=ceil(2*rand(n,1));
    P=1-(C/4);
    H=(rand(n,1)< P);

The first line produces the n × 1 vector C such that C(i) indicates whether coin 1 or coin 2 is chosen for trial i. Next, we generate the vector P such that P(i)=0.75 if C(i)=1; otherwise, if C(i)=2, then P(i)=0.5. As a result, H(i) is the simulated result of a coin flip with heads, corresponding to H(i)=1, occurring with probability P(i).
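For the two-coin simulation of Problem 1.11.2, one way to sanity-check the output is to compare the simulated fraction of heads with the value given by the law of total probability. The sketch below inlines the body of twocoin so that it runs as a plain script; the trial count and the names pH_exact and pH_sim are assumptions made for illustration.

```matlab
% Inline version of the twocoin logic, followed by a comparison of the
% simulated heads frequency with the exact value.
n = 100000;                      % number of simulated trials (illustrative)
C = ceil(2*rand(n,1));           % coin index: 1 or 2, equally likely
P = 1 - (C/4);                   % P(heads): 0.75 for coin 1, 0.5 for coin 2
H = (rand(n,1) < P);             % H(i)=1 when trial i comes up heads

pH_exact = 0.5*0.75 + 0.5*0.5;   % law of total probability
pH_sim   = mean(H);              % relative frequency of heads
fprintf('exact %.4f, simulated %.4f\n', pH_exact, pH_sim);
```

The simulated value fluctuates from run to run, so only approximate agreement should be expected.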
Problem 1.11.3 Solution

Rather than just solve the problem for 100 trials, we can write a function that generates n packets for an arbitrary number of trials n. The code for this task is

    function C=bit100(n);
    % n is the number of 100 bit packets sent
    B=floor(2*rand(n,100));
    P=0.03-0.02*B;
    E=(rand(n,100)< P);
    C=sum((sum(E,2)<=5));

First, B is an n × 100 matrix such that B(i,j) indicates whether bit j of packet i is zero or one. Next, we generate the n × 100 matrix P such that P(i,j)=0.03 if B(i,j)=0; otherwise, if B(i,j)=1, then P(i,j)=0.01. As a result, E(i,j) is the simulated error indicator for bit j of packet i. That is, E(i,j)=1 if bit j of packet i is in error; otherwise E(i,j)=0. Next we sum across the rows of E to obtain the number of errors in each packet. Finally, we count the number of packets with 5 or fewer errors. For n = 100 packets, the packet success probability is inconclusive. Experimentation will show that C=97, C=98, C=99 and C=100 correct packets are typical values that might be observed. By increasing n, more consistent results are obtained. For example, repeated trials with n = 100,000 packets typically produce around C = 98,400 correct packets. Thus 0.984 is a reasonable estimate for the probability of a packet being transmitted correctly.

Problem 1.11.4 Solution

To test n 6-component devices (such that each component fails with probability q) we use the following function:

    function N=reliable6(n,q);
    % n is the number of 6 component devices
    % N is the number of working devices
    W=rand(n,6)>q;
    D=(W(:,1)&W(:,2)&W(:,3))|W(:,4);
    D=D&(W(:,5)|W(:,6));
    N=sum(D);

The n × 6 matrix W is a logical matrix such that W(i,j)=1 if component j of device i works properly. Because W is a logical matrix, we can use the Matlab logical operators | and & to implement the logic requirements for a working device. By applying these logical operators to the n × 1 columns of W, we simulate the test of n circuits. Note that D(i)=1 if device i works. Otherwise, D(i)=0. Lastly, we count the number N of working devices. The following code snippet produces ten sample runs, where each sample run tests n=100 devices for q = 0.2.

    >> for n=1:10, w(n)=reliable6(100,0.2); end
    >> w
    w =
        82 87 87 92 91 85 85 83 90 89
    >>

As we see, the number of working devices is typically around 85 out of 100. Solving Problem 1.10.1 will show that the probability the device works is actually 0.8663.

Problem 1.11.5 Solution

The code

    function n=countequal(x,y)
    %Usage: n=countequal(x,y)
    %n(j)= # elements of x = y(j)
    [MX,MY]=ndgrid(x,y);
    %each column of MX = x
    %each row of MY = y
    n=(sum((MX==MY),1))';

for countequal is quite short (just two lines excluding comments) but needs some explanation. The key is in the operation

    [MX,MY]=ndgrid(x,y).

The Matlab built-in function ndgrid facilitates plotting a function g(x, y) as a surface over the x, y plane. The x, y plane is represented by a grid of all pairs of points x(i), y(j). When x has n elements, and y has m elements, ndgrid(x,y) creates a grid (an n × m array) of all possible pairs [x(i) y(j)]. This grid is represented by two separate n × m matrices: MX and MY which indicate the x and y values at each grid point. Mathematically,

    MX(i,j) = x(i),    MY(i,j) = y(j).

Next, C=(MX==MY) is an n × m array such that C(i,j)=1 if x(i)=y(j); otherwise C(i,j)=0. That is, the jth column of C indicates which elements of x equal y(j). Lastly, we sum along each column j to count the number of elements of x equal to y(j), that is, the number of occurrences (in x) of y(j).

Problem 1.11.6 Solution

For arbitrary number of trials n and failure probability q, the following function evaluates replacing each of the six components by an ultrareliable device.

    function N=ultrareliable6(n,q);
    % n is the number of 6 component devices
    % N is the number of working devices
    for r=1:6,
        W=rand(n,6)>q;
        R=rand(n,1)>(q/2);
        W(:,r)=R;
        D=(W(:,1)&W(:,2)&W(:,3))|W(:,4);
        D=D&(W(:,5)|W(:,6));
        N(r)=sum(D);
    end

The above code is based on the code for the solution of Problem 1.11.4. The n × 6 matrix W is a logical matrix such that W(i,j)=1 if component j of device i works properly. Because W is a logical matrix, we can use the Matlab logical operators | and & to implement the logic requirements for a working device. By applying these logical operators to the n × 1 columns of W, we simulate the test of n circuits. Note that D(i)=1 if device i works. Otherwise, D(i)=0. Note that in the code, we first generate the matrix W such that each component has failure probability q. We then simulate the replacement of the rth device by the ultrareliable version by replacing the rth column of W by the column vector R in which a device has failure probability q/2. Lastly, for each column replacement, we count the number N of working devices. A sample run for n = 100 trials and q = 0.2 yielded these results:

    >> ultrareliable6(100,0.2)
    ans =
        93 89 91 92 90 93

From the above, we see, for example, that replacing the third component with an ultrareliable component resulted in 91 working devices. The results are fairly inconclusive in that replacing devices 1, 2, or 3 should yield the same probability of device failure. If we experiment with n = 10,000 runs, the results are more definitive:

    >> ultrareliable6(10000,0.2)
    ans =
        8738 8762 8806 9135 8800 8796
    >> ultrareliable6(10000,0.2)
    ans =
        8771 8795 8806 9178 8886 8875
    >>

In both cases, it is clear that replacing component 4 maximizes the device reliability. The somewhat complicated solution of Problem 1.10.4 will confirm this observation.
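The simulated counts in Problem 1.11.6 can be compared with the closed-form reliabilities of Problems 1.10.1 and 1.10.4. The following Matlab sketch evaluates those expressions at q = 0.2 and scales them to 10,000 devices; the variable names are illustrative and not taken from the original solutions.

```matlab
% Closed-form device reliabilities at the failure probability used in the
% sample runs (variable names are hypothetical).
q = 0.2;
PW   = (1 - q*(1 - (1-q)^3)) * (1 - q^2);          % no replacement
PI   = (1 - (q^2/2)*(5 - 4*q + q^2)) * (1 - q^2);  % replace component 1, 2 or 3
PII  = (1 - q/2 + (q/2)*(1-q)^3) * (1 - q^2);      % replace component 4
PIII = (1 - q^2/2) * (1 - q) * (1 + q*(1-q)^2);    % replace component 5 or 6

% Expected number of working devices out of 10,000 for r = 1,...,6.
expected = 10000 * [PI PI PI PII PIII PIII];
disp(round(expected));
```

The fourth entry is the largest, which matches the simulation's conclusion that component 4 is the one to replace.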
Problem Solutions – Chapter 2

Problem 2.2.1 Solution

(a) We wish to find the value of $c$ that makes the PMF sum up to one.

\[ P_N(n) = \begin{cases} c(1/2)^n & n = 0, 1, 2, \\ 0 & \text{otherwise.} \end{cases} \tag{1} \]

Therefore, $\sum_{n=0}^{2} P_N(n) = c + c/2 + c/4 = 1$, implying $c = 4/7$.

(b) The probability that $N \le 1$ is

\[ P[N \le 1] = P[N = 0] + P[N = 1] = 4/7 + 2/7 = 6/7. \tag{2} \]

Problem 2.2.2 Solution

From Example 2.5, we can write the PMF of X and the PMF of R as

\[ P_X(x) = \begin{cases} 1/8 & x = 0, \\ 3/8 & x = 1, \\ 3/8 & x = 2, \\ 1/8 & x = 3, \\ 0 & \text{otherwise,} \end{cases} \qquad P_R(r) = \begin{cases} 1/4 & r = 0, \\ 3/4 & r = 2, \\ 0 & \text{otherwise.} \end{cases} \tag{1} \]

From the PMFs $P_X(x)$ and $P_R(r)$, we can calculate the requested probabilities.

(a) $P[X = 0] = P_X(0) = 1/8$.
(b) $P[X < 3] = P_X(0) + P_X(1) + P_X(2) = 7/8$.
(c) $P[R > 1] = P_R(2) = 3/4$.

Problem 2.2.3 Solution

(a) We must choose $c$ to make the PMF of V sum to one.

\[ \sum_{v=1}^{4} P_V(v) = c(1^2 + 2^2 + 3^2 + 4^2) = 30c = 1. \tag{1} \]

Hence $c = 1/30$.

(b) Let $U = \{u^2 \mid u = 1, 2, \ldots\}$ so that

\[ P[V \in U] = P_V(1) + P_V(4) = \frac{1}{30} + \frac{4^2}{30} = \frac{17}{30}. \tag{2} \]

(c) The probability that V is even is

\[ P[V \text{ is even}] = P_V(2) + P_V(4) = \frac{2^2}{30} + \frac{4^2}{30} = \frac{2}{3}. \tag{3} \]

(d) The probability that $V > 2$ is

\[ P[V > 2] = P_V(3) + P_V(4) = \frac{3^2}{30} + \frac{4^2}{30} = \frac{5}{6}. \tag{4} \]

Problem 2.2.4 Solution

(a) We choose $c$ so that the PMF sums to one.

\[ \sum_x P_X(x) = \frac{c}{2} + \frac{c}{4} + \frac{c}{8} = \frac{7c}{8} = 1. \tag{1} \]

Thus $c = 8/7$.

(b) \[ P[X = 4] = P_X(4) = \frac{8}{7 \cdot 4} = \frac{2}{7}. \tag{2} \]

(c) \[ P[X < 4] = P_X(2) = \frac{8}{7 \cdot 2} = \frac{4}{7}. \tag{3} \]

(d) \[ P[3 \le X \le 9] = P_X(4) + P_X(8) = \frac{8}{7 \cdot 4} + \frac{8}{7 \cdot 8} = \frac{3}{7}. \tag{4} \]

Problem 2.2.5 Solution

Using B (for Bad) to denote a miss and G (for Good) to denote a successful free throw, the sample tree for the number of points scored in the 1 and 1 is

[Tree diagram: first shot B with probability 1−p gives Y=0; first shot G with probability p is followed by B with probability 1−p (Y=1) or G with probability p (Y=2).]

From the tree, the PMF of Y is

\[ P_Y(y) = \begin{cases} 1-p & y = 0, \\ p(1-p) & y = 1, \\ p^2 & y = 2, \\ 0 & \text{otherwise.} \end{cases} \tag{1} \]

Problem 2.2.6 Solution

The probability that a caller fails to get through in three tries is $(1-p)^3$. To be sure that at least 95% of all callers get through, we need $(1-p)^3 \le 0.05$. This implies $p \ge 0.6316$.

Problem 2.2.7 Solution

In Problem 2.2.6, each caller is willing to make 3 attempts to get through. An attempt is a failure if all $n$ operators are busy, which occurs with probability $q = (0.8)^n$. Assuming call attempts are independent, a caller will suffer three failed attempts with probability $q^3 = (0.8)^{3n}$. The problem statement requires that $(0.8)^{3n} \le 0.05$. This implies $n \ge 4.48$ and so we need 5 operators.

Problem 2.2.8 Solution

From the problem statement, a single is twice as likely as a double, which is twice as likely as a triple, which is twice as likely as a home-run. If $p$ is the probability of a home run, then

\[ P_B(4) = p, \qquad P_B(3) = 2p, \qquad P_B(2) = 4p, \qquad P_B(1) = 8p. \tag{1} \]

Since a hit of any kind occurs with probability 0.300, $p + 2p + 4p + 8p = 0.300$ which implies $p = 0.02$. Hence, the PMF of B is

\[ P_B(b) = \begin{cases} 0.70 & b = 0, \\ 0.16 & b = 1, \\ 0.08 & b = 2, \\ 0.04 & b = 3, \\ 0.02 & b = 4, \\ 0 & \text{otherwise.} \end{cases} \tag{2} \]

Problem 2.2.9 Solution

(a) In the setup of a mobile call, the phone will send the "SETUP" message up to six times. Each time the setup message is sent, we have a Bernoulli trial with success probability $p$. Of course, the phone stops trying as soon as there is a success. Using r to denote a successful response, and n a non-response, the sample tree is

[Tree diagram: on each of attempts 1 through 6, a response r occurs with probability p and ends the experiment with K equal to the attempt number; a non-response n occurs with probability 1−p and leads to the next attempt, with K=6 after a non-response on attempt 6.]

(b) We can write the PMF of K, the number of "SETUP" messages sent, as

\[ P_K(k) = \begin{cases} (1-p)^{k-1}p & k = 1, 2, \ldots, 5, \\ (1-p)^5 p + (1-p)^6 = (1-p)^5 & k = 6, \\ 0 & \text{otherwise.} \end{cases} \tag{1} \]

Note that the expression for $P_K(6)$ is different because $K = 6$ if either there was a success or a failure on the sixth attempt. In fact, $K = 6$ whenever there were failures on the first five attempts, which is why $P_K(6)$ simplifies to $(1-p)^5$.

(c) Let B denote the event that a busy signal is given after six failed setup attempts. The probability of six consecutive failures is $P[B] = (1-p)^6$.

(d) To be sure that $P[B] \le 0.02$, we need $p \ge 1 - (0.02)^{1/6} = 0.479$.

Problem 2.3.1 Solution

(a) If it is indeed true that Y, the number of yellow M&M's in a package, is uniformly distributed between 5 and 15, then the PMF of Y is

\[ P_Y(y) = \begin{cases} 1/11 & y = 5, 6, 7, \ldots, 15, \\ 0 & \text{otherwise.} \end{cases} \tag{1} \]

(b) \[ P[Y < 10] = P_Y(5) + P_Y(6) + \cdots + P_Y(9) = 5/11. \tag{2} \]

(c) \[ P[Y > 12] = P_Y(13) + P_Y(14) + P_Y(15) = 3/11. \tag{3} \]

(d) \[ P[8 \le Y \le 12] = P_Y(8) + P_Y(9) + \cdots + P_Y(12) = 5/11. \tag{4} \]

Problem 2.3.2 Solution

(a) Each paging attempt is an independent Bernoulli trial with success probability $p$. The number of times K that the pager receives a message is the number of successes in $n$ Bernoulli trials and has the binomial PMF

\[ P_K(k) = \begin{cases} \binom{n}{k} p^k (1-p)^{n-k} & k = 0, 1, \ldots, n, \\ 0 & \text{otherwise.} \end{cases} \tag{1} \]

(b) Let R denote the event that the paging message was received at least once. The event R has probability

\[ P[R] = P[K > 0] = 1 - P[K = 0] = 1 - (1-p)^n. \tag{2} \]

To ensure that $P[R] \ge 0.95$ requires that $n \ge \ln(0.05)/\ln(1-p)$. For $p = 0.8$, we must have $n \ge 1.86$. Thus, $n = 2$ pages would be necessary.

Problem 2.3.3 Solution

Whether a hook catches a fish is an independent trial with success probability $h$. The number of fish hooked, K, has the binomial PMF

\[ P_K(k) = \begin{cases} \binom{m}{k} h^k (1-h)^{m-k} & k = 0, 1, \ldots, m, \\ 0 & \text{otherwise.} \end{cases} \tag{1} \]

Problem 2.3.4 Solution

(a) Let X be the number of times the frisbee is thrown until the dog catches it and runs away. Each throw of the frisbee can be viewed as a Bernoulli trial in which a success occurs if the dog catches the frisbee and runs away. Thus, the experiment ends on the first success and X has the geometric PMF

\[ P_X(x) = \begin{cases} (1-p)^{x-1}p & x = 1, 2, \ldots, \\ 0 & \text{otherwise.} \end{cases} \tag{1} \]

(b) The child will throw the frisbee more than four times iff there are failures on the first 4 trials, which has probability $(1-p)^4$. If $p = 0.2$, the probability of more than four throws is $(0.8)^4 = 0.4096$.
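Several of the solutions above (Problems 2.2.6, 2.2.7, 2.2.9(d) and 2.3.2(b)) reduce to solving a simple inequality for a threshold. The Matlab sketch below evaluates those thresholds numerically; the variable names are illustrative assumptions, and each line restates the inequality it solves in a comment.

```matlab
% Thresholds from the inequalities in the solutions above
% (all variable names are hypothetical).
p_min   = 1 - 0.05^(1/3);                  % (1-p)^3 <= 0.05        (Problem 2.2.6)
n_ops   = ceil(log(0.05) / (3*log(0.8)));  % (0.8)^(3n) <= 0.05     (Problem 2.2.7)
p_setup = 1 - 0.02^(1/6);                  % (1-p)^6 <= 0.02        (Problem 2.2.9(d))
n_pages = ceil(log(0.05) / log(1 - 0.8));  % 1-(1-p)^n >= 0.95, p=0.8 (Problem 2.3.2(b))

fprintf('p_min = %.4f, operators = %d\n', p_min, n_ops);
fprintf('p_setup = %.3f, pages = %d\n', p_setup, n_pages);
```

The ceil calls round the real-valued bounds up to the smallest integer that satisfies each requirement.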
Problem 2.3.5 Solution

Each paging attempt is a Bernoulli trial with success probability $p$ where a success occurs if the pager receives the paging message.

(a) The paging message is sent again and again until a success occurs. Hence the number of paging messages is $N = n$ if there are $n - 1$ paging failures followed by a paging success. That is, N has the geometric PMF

\[ P_N(n) = \begin{cases} (1-p)^{n-1}p & n = 1, 2, \ldots, \\ 0 & \text{otherwise.} \end{cases} \tag{1} \]

(b) The probability that no more than three paging attempts are required is

\[ P[N \le 3] = 1 - P[N > 3] = 1 - \sum_{n=4}^{\infty} P_N(n) = 1 - (1-p)^3. \tag{2} \]

This answer can be obtained without calculation since $N > 3$ if the first three paging attempts fail and that event occurs with probability $(1-p)^3$. Hence, we must choose $p$ to satisfy $1 - (1-p)^3 \ge 0.95$ or $(1-p)^3 \le 0.05$. This implies

\[ p \ge 1 - (0.05)^{1/3} \approx 0.6316. \tag{3} \]

Problem 2.3.6 Solution

The probability of more than 500,000 bits is

\[ P[B > 500{,}000] = 1 - \sum_{b=1}^{500{,}000} P_B(b) \tag{1} \]
\[ = 1 - p \sum_{b=1}^{500{,}000} (1-p)^{b-1}. \tag{2} \]

Math Fact B.4 implies that $(1-x)\sum_{b=1}^{500{,}000} x^{b-1} = 1 - x^{500{,}000}$. Substituting $x = 1 - p$, we obtain:

\[ P[B > 500{,}000] = 1 - \left(1 - (1-p)^{500{,}000}\right) \tag{3} \]
\[ = \left(1 - 0.25 \times 10^{-5}\right)^{500{,}000} \approx \exp(-500{,}000/400{,}000) = 0.29. \tag{4} \]

Problem 2.3.7 Solution

Since an average of $T/5$ buses arrive in an interval of $T$ minutes, buses arrive at the bus stop at a rate of 1/5 buses per minute.

(a) From the definition of the Poisson PMF, the PMF of B, the number of buses in $T$ minutes, is

\[ P_B(b) = \begin{cases} (T/5)^b e^{-T/5}/b! & b = 0, 1, \ldots, \\ 0 & \text{otherwise.} \end{cases} \tag{1} \]

(b) Choosing $T = 2$ minutes, the probability that three buses arrive in a two minute interval is

\[ P_B(3) = (2/5)^3 e^{-2/5}/3! \approx 0.0072. \tag{2} \]

(c) By choosing $T = 10$ minutes, the probability of zero buses arriving in a ten minute interval is

\[ P_B(0) = e^{-10/5}/0! = e^{-2} \approx 0.135. \tag{3} \]

(d) The probability that at least one bus arrives in $T$ minutes is

\[ P[B \ge 1] = 1 - P[B = 0] = 1 - e^{-T/5} \ge 0.99. \tag{4} \]

Rearranging yields $T \ge 5 \ln 100 \approx 23.0$ minutes.

Problem 2.3.8 Solution

(a) If each message is transmitted 8 times and the probability of a successful transmission is $p$, then N, the number of successful transmissions, has the binomial PMF

\[ P_N(n) = \begin{cases} \binom{8}{n} p^n (1-p)^{8-n} & n = 0, 1, \ldots, 8, \\ 0 & \text{otherwise.} \end{cases} \tag{1} \]

(b) The indicator random variable I equals zero if and only if $N = 0$. Hence,

\[ P[I = 0] = P[N = 0] = 1 - P[I = 1]. \tag{2} \]

Thus, the complete expression for the PMF of I is

\[ P_I(i) = \begin{cases} (1-p)^8 & i = 0, \\ 1 - (1-p)^8 & i = 1, \\ 0 & \text{otherwise.} \end{cases} \tag{3} \]

Problem 2.3.9 Solution

The requirement that $\sum_{x=1}^{n} P_X(x) = 1$ implies

\[ n = 1: \quad c(1)\left[\frac{1}{1}\right] = 1 \quad\Rightarrow\quad c(1) = 1 \tag{1} \]
\[ n = 2: \quad c(2)\left[\frac{1}{1} + \frac{1}{2}\right] = 1 \quad\Rightarrow\quad c(2) = \frac{2}{3} \tag{2} \]
\[ n = 3: \quad c(3)\left[\frac{1}{1} + \frac{1}{2} + \frac{1}{3}\right] = 1 \quad\Rightarrow\quad c(3) = \frac{6}{11} \tag{3} \]
\[ n = 4: \quad c(4)\left[\frac{1}{1} + \frac{1}{2} + \frac{1}{3} + \frac{1}{4}\right] = 1 \quad\Rightarrow\quad c(4) = \frac{12}{25} \tag{4} \]
\[ n = 5: \quad c(5)\left[\frac{1}{1} + \frac{1}{2} + \frac{1}{3} + \frac{1}{4} + \frac{1}{5}\right] = 1 \quad\Rightarrow\quad c(5) = \frac{60}{137} \tag{5} \]
\[ n = 6: \quad c(6)\left[\frac{1}{1} + \frac{1}{2} + \frac{1}{3} + \frac{1}{4} + \frac{1}{5} + \frac{1}{6}\right] = 1 \quad\Rightarrow\quad c(6) = \frac{20}{49} \tag{6} \]

As an aside, finding $c(n)$ for large values of $n$ is easy using the recursion

\[ \frac{1}{c(n+1)} = \frac{1}{c(n)} + \frac{1}{n+1}. \tag{7} \]
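The recursion at the end of Problem 2.3.9 can be turned into a few lines of Matlab. This is a minimal sketch, assuming (as the sums in the solution suggest) that the PMF has the form c(n)/x for x = 1, ..., n; the names nmax and invc are illustrative.

```matlab
% Normalizing constants c(1),...,c(nmax) via the recursion
%   1/c(n+1) = 1/c(n) + 1/(n+1),  with c(1) = 1.
% nmax and invc are hypothetical names for this sketch.
nmax = 6;
invc = zeros(1, nmax);       % invc(n) stores 1/c(n)
invc(1) = 1;
for n = 1:nmax-1
    invc(n+1) = invc(n) + 1/(n+1);
end
c = 1 ./ invc;               % c(n) for n = 1,...,nmax
disp(c);
```

Raising nmax extends the table to larger n at the cost of one addition per extra term.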
Problem 2.3.10 Solution

(a) We can view whether each caller knows the birthdate as a Bernoulli trial. As a result, L is the number of trials needed for 6 successes. That is, L has a Pascal PMF with parameters $p = 0.75$ and $k = 6$ as defined by Definition 2.8. In particular,

\[ P_L(l) = \begin{cases} \binom{l-1}{5}(0.75)^6(0.25)^{l-6} & l = 6, 7, \ldots, \\ 0 & \text{otherwise.} \end{cases} \tag{1} \]

(b) The probability of finding the winner on the tenth call is

\[ P_L(10) = \binom{9}{5}(0.75)^6(0.25)^4 \approx 0.0876. \tag{2} \]

(c) The probability that the station will need nine or more calls to find a winner is

\[ P[L \ge 9] = 1 - P[L < 9] \tag{3} \]
\[ = 1 - P_L(6) - P_L(7) - P_L(8) \tag{4} \]
\[ = 1 - (0.75)^6\left[1 + 6(0.25) + 21(0.25)^2\right] \approx 0.321. \tag{5} \]

Problem 2.3.11 Solution

The packets are delay sensitive and can only be retransmitted $d$ times. For $t < d$, a packet is transmitted $t$ times if the first $t - 1$ attempts fail followed by a successful transmission on attempt $t$. Further, the packet is transmitted $d$ times if there are failures on the first $d - 1$ transmissions, no matter what the outcome of attempt $d$. So the random variable T, the number of times that a packet is transmitted, can be represented by the following PMF.

\[ P_T(t) = \begin{cases} p(1-p)^{t-1} & t = 1, 2, \ldots, d-1, \\ (1-p)^{d-1} & t = d, \\ 0 & \text{otherwise.} \end{cases} \tag{1} \]

Problem 2.3.12 Solution

(a) Since each day is independent of any other day, $P[W_{33}]$ is just the probability that a winning lottery ticket was bought. Similarly, $P[L_{87}]$ and $P[N_{99}]$ are just the probabilities that a losing ticket was bought and that no ticket was bought on a single day, respectively. Therefore

\[ P[W_{33}] = p/2, \qquad P[L_{87}] = (1-p)/2, \qquad P[N_{99}] = 1/2. \tag{1} \]

(b) Suppose we say a success occurs on the $k$th trial if on day $k$ we buy a ticket. Otherwise, a failure occurs. The probability of success is simply 1/2. The random variable K is just the number of trials until the first success and has the geometric PMF

\[ P_K(k) = \begin{cases} (1/2)(1/2)^{k-1} = (1/2)^k & k = 1, 2, \ldots, \\ 0 & \text{otherwise.} \end{cases} \tag{2} \]

(c) The probability that you decide to buy a ticket and it is a losing ticket is $(1-p)/2$, independent of any other day. If we view buying a losing ticket as a Bernoulli success, R, the number of losing lottery tickets bought in $m$ days, has the binomial PMF

\[ P_R(r) = \begin{cases} \binom{m}{r}\left[(1-p)/2\right]^r\left[(1+p)/2\right]^{m-r} & r = 0, 1, \ldots, m, \\ 0 & \text{otherwise.} \end{cases} \tag{3} \]

(d) Letting D be the day on which the $j$-th losing ticket is bought, we can find the probability that $D = d$ by noting that $j - 1$ losing tickets must have been purchased in the $d - 1$ previous days. Therefore D has the Pascal PMF

\[ P_D(d) = \begin{cases} \binom{d-1}{j-1}\left[(1-p)/2\right]^j\left[(1+p)/2\right]^{d-j} & d = j, j+1, \ldots, \\ 0 & \text{otherwise.} \end{cases} \tag{4} \]

Problem 2.3.13 Solution

(a) Let $S_n$ denote the event that the Sixers win the series in $n$ games. Similarly, $C_n$ is the event that the Celtics win in $n$ games. The Sixers win the series in 3 games if they win three straight, which occurs with probability

\[ P[S_3] = (1/2)^3 = 1/8. \tag{1} \]

The Sixers win the series in 4 games if they win two out of the first three games and they win the fourth game so that

\[ P[S_4] = \binom{3}{2}(1/2)^3(1/2) = 3/16. \tag{2} \]

The Sixers win the series in five games if they win two out of the first four games and then win game five. Hence,

\[ P[S_5] = \binom{4}{2}(1/2)^4(1/2) = 3/16. \tag{3} \]

By symmetry, $P[C_n] = P[S_n]$. Further we observe that the series lasts $n$ games if either the Sixers or the Celtics win the series in $n$ games. Thus,

\[ P[N = n] = P[S_n] + P[C_n] = 2P[S_n]. \tag{4} \]

Consequently, the total number of games, N, played in a best of 5 series between the Celtics and the Sixers can be described by the PMF

\[ P_N(n) = \begin{cases} 2(1/2)^3 = 1/4 & n = 3, \\ 2\binom{3}{1}(1/2)^4 = 3/8 & n = 4, \\ 2\binom{4}{2}(1/2)^5 = 3/8 & n = 5, \\ 0 & \text{otherwise.} \end{cases} \tag{5} \]

(b) For the total number of Celtic wins W, we note that if the Celtics get $w < 3$ wins, then the Sixers won the series in $3 + w$ games. Also, the Celtics win 3 games if they win the series in 3, 4, or 5 games. Mathematically,

\[ P[W = w] = \begin{cases} P[S_{3+w}] & w = 0, 1, 2, \\ P[C_3] + P[C_4] + P[C_5] & w = 3. \end{cases} \tag{6} \]

Thus, the number of wins by the Celtics, W, has the PMF shown below.

\[ P_W(w) = \begin{cases} P[S_3] = 1/8 & w = 0, \\ P[S_4] = 3/16 & w = 1, \\ P[S_5] = 3/16 & w = 2, \\ 1/8 + 3/16 + 3/16 = 1/2 & w = 3, \\ 0 & \text{otherwise.} \end{cases} \tag{7} \]

(c) The number of Celtic losses L equals the number of Sixers' wins $W_S$. This implies $P_L(l) = P_{W_S}(l)$. Since either team is equally likely to win any game, by symmetry, $P_{W_S}(w) = P_W(w)$. This implies $P_L(l) = P_{W_S}(l) = P_W(l)$. The complete expression for the PMF of L is

\[ P_L(l) = P_W(l) = \begin{cases} 1/8 & l = 0, \\ 3/16 & l = 1, \\ 3/16 & l = 2, \\ 1/2 & l = 3, \\ 0 & \text{otherwise.} \end{cases} \tag{8} \]
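The best-of-5 series PMFs in Problem 2.3.13 follow a single pattern: the winner takes exactly two of the first n − 1 games and then wins game n. The Matlab sketch below builds the PMFs of N and W from that pattern; the variable names PS, PN and PW are illustrative assumptions.

```matlab
% Best-of-5 series between evenly matched teams.
% PS(i) = P[S_n] for n = 3,4,5: the Sixers win 2 of the first n-1 games
% and then win game n. (Names are hypothetical.)
n  = 3:5;
PS = arrayfun(@(m) nchoosek(m-1, 2), n) .* (1/2).^n;

PN = 2*PS;               % PMF of the series length N at n = 3,4,5
PW = [PS, sum(PS)];      % PMF of Celtic wins W at w = 0,1,2,3

disp(PN);
disp(PW);
```

Replacing 1/2 with unequal win probabilities would break the symmetry argument, so this sketch applies only to the evenly matched case treated in the solution.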
has the PMF   P [S3 ] = 1/8    P [S4 ] = 3/16  P [S5 ] = 3/16 PW (w) =   1/8 + 3/16 + 3/16 = 1/2    0 shown below. by symmetry. First. w=0 w=1 w=2 w=3 otherwise (7) (c) The number of Celtic losses L equals the number of Sixers’ wins WS . we observe that the sum of over all possible values of the PMF of K is n n PK (k) = k=0 k=0 n n k p (1 − p)n−k k n k a a+b k (1) b a+b n−k = = Since n k=0 PK (k) (2) (3) k=0 n n k=0 k ak bn−k (a + b)n = 1. (f) P [Y = 3] = 1/2 (g) From the staircase CDF of Problem 2.8 5 ≤ x < 7 0   1 x≥7 −3 0 5 7 FX(x) x (1) (b) The corresponding PMF of X is   0.4 −3 ≤ x < 5 0.2 Solution (a) The given CDF is shown in the diagram below.2 0 −2 −1 0 x 1 2 (b) The corresponding PMF of X is   0   0.3   0 (2) Problem 2.4   0.8 x < −3  0  0.8 0.3 Solution (a) Similar to the previous problem.4.2  0. we see that Y is a discrete random variable.2 FX (x) =   0.4.4.2   0 x = −3 x=5 x=7 otherwise (2) 44 .6  0.2   0. The height of each jump equals the probability of that value.4 0.4 PX (x) =  0.1. the graph of the CDF is shown below.6 0.  1 0. The PMF of Y is   1/4 y = 1   1/4 y = 2 PY (y) = (1)  1/2 y = 3   0 otherwise Problem 2.4 FX (x) = 0. 1 0.5 PX (x) =  0. The jumps in the CDF occur at at the values that Y can take on.7  1 x = −1 x=0 x=1 otherwise x < −1 −1 ≤ x < 0 0≤x<1 x≥1 FX(x) (1)   0. 08 b = 2 PB (b) =  0. the CDF obeys k k pq k−1 k = 1. you should review Example 2. the CDF of N obeys n n Problem 2. . (3) FK (k) = k 1 − (1 − p) k ≥ 1.5 Solution FN (n) = i=0 PN (i) = i=0 (1/3)i (2/3) = 1 − (1/3)n+1 (2) A complete expression for FN (n) must give a valid answer for every value of n. .4 Solution Let q = 1 − p. including non-integer values.02 b = 4    0 otherwise 45 (1) .8. . . . We can write the CDF using the floor function x which denote the largest integer less than or equal to X. FK (k) = FK ( k ) for all integer and non-integer values of k.2. 
the number of pizzas sold before the first mushroom pizza is N = n < 100 if the first n pizzas do not have mushrooms followed by mushrooms on pizza n + 1. (2) Since K is integer valued. 99 n = 100 (1/3)100 PN (n) = (1)  0 otherwise For integers n < 100.16 b = 1     0. Also. .6 Solution From Problem 2. .4.24. the complete expression for the CDF of K is 0 k < 1. 2.4.) Thus.4.04 b = 3    0.70 b = 0   0. it is possible that N = 100 if all 100 pizzas are sold without mushrooms. the resulting PMF is   (1/3)n (2/3) n = 0. (If this point is not clear.Problem 2. The complete expression for the CDF is  x<0  0 FN (x) = (3) 1 − (1/3) x +1 0 ≤ x < 100  1 x ≥ 100 Problem 2. Since mushrooms occur with probability 2/3. 0 otherwise. so the PMF of the geometric (p) random variable K is PK (k) = For any integer k ≥ 1. (1) FK (k) = j=1 PK (j) = j=1 pq j−1 = 1 − q k . . 1. the PMF of B is   0. 9. 2. .2.25 0 −1 (1) FY(y) FY(y) FY(y) (2) 0 1 y 2 3 0 1 y 2 3 0 1 y 2 3 p = 1/4 p = 1/2 p = 3/4 Problem 2. . .4. 5 (1/2)5 n = 6 PN (n) =  0 otherwise The corresponding CDF of N is 1 0.75 0. . 2.70     0.5 0.7 Solution   0   0. the PMF of the number of call attempts is  k = 1.The corresponding CDF is 1 0.25 0 −1 1 0. .75 0. .86 FB (b) =  0.5 0. and its corresponding CDF are   y=0 y<0  1−p  0     p(1 − p) y = 1 1−p 0≤y <1 PY (y) = FY (y) = y=2  p2  1 − p2 1 ≤ y < 2     0 otherwise 1 y≥2 1 0. the CDF resembles 1 0.2.0 b<0 0≤b<1 1≤b<2 2≤b<3 3≤b<4 b≥4 (2) For the three values of p.75 0.5 0.5. the PMF can be simplified to   (1/2)n n = 1.75 0.25 0 −1 0 1 2 3 4 5 b FB(b) Problem 2.5 0.75 0.5 0. .98    1.25 0 −1 In Problem 2.8 Solution From Problem 2. 5  (1 − p)k−1 p (1 − p)5 p + (1 − p)6 = (1 − p)5 k = 6 PN (n) =  0 otherwise For p = 1/2.25 0 0 1 2 3 4 5 6 7 n (1) (2)   0   1/2      3/4  7/8 FN (n) =   15/16     31/32    1 46 n<1 1≤n<2 2≤n<3 3≤n<4 4≤n<5 5≤n<6 n≥6 FN(n) (3) . This PMF. .4.94    0. we found the PMF of Y . 5. 
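The staircase CDFs in the Section 2.4 solutions all come from the same step: accumulate the PMF in increasing order of the sample values. The sketch below does this for the call-attempt PMF of Problem 2.4.8 with p = 1/2. It is only an illustrative check, not part of the original manual; the names `pmf`, `cdf_steps` and `cdf` are made up for this example, and exact fractions are used so the results can be compared directly with the values 1/2, 3/4, ..., 31/32, 1.

```python
from fractions import Fraction
from itertools import accumulate

# PMF of N, the number of call attempts, for p = 1/2 (Problem 2.4.8):
# P_N(n) = (1/2)^n for n = 1, ..., 5 and P_N(6) = (1/2)^5.
half = Fraction(1, 2)
pmf = {n: half**n for n in range(1, 6)}
pmf[6] = half**5

# Height of the CDF staircase just after each jump.
values = sorted(pmf)
cdf_steps = dict(zip(values, accumulate(pmf[n] for n in values)))


def cdf(x):
    """F_N(x) = P[N <= x], defined for every real x, not only integers."""
    return sum((p for n, p in pmf.items() if n <= x), Fraction(0))


for n in values:
    print(n, cdf_steps[n])
```

Because `cdf` sums over all sample values not exceeding `x`, it also returns the correct flat value between jumps, for example at x = 3.7.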
the set of all modes is (1) Xmod = {1. the cost of one call can only take the two values associated with the cost of each type of call. for any x satisfying 50 < x < 51.5 is a median since it satisfies P [X < xmed ] = P [X > xmed ] = 1/2 In fact.5. we just need to pay careful attention to the definitions of mode and median.6 x = 20 0.4) = 24 cents (2) Problem 2.6) + 30(0. Hence.5.Problem 2. E[C]. any integer x between 1 and 100 is a mode of the random variable X. 100} (b) The median must satisfy P [X < xmed ] = P [X > xmed ]. .4.3 Solution From the solution to Problem 2.4.6 and 0. is simply the sum of the cost of each type of call multiplied by the probability of such a call occurring. Since P [X ≤ 50] = P [X ≥ 51] = 1/2 we observe that xmed = 50. P [X < x ] = P [X > x ] = 1/2. . Furthermore the respective probabilities of each type of call are 0. (a) Since each call is either a voice or data call. Xmed = {x|50 < x < 51} (4) (3) (2) Problem 2.4 x = 30 (1) PX (x) =  0 otherwise (b) The expected cost. 2. In the case of the uniform PMF.1. Thus. .1 Solution For this problem.2 Solution Voice calls and data calls each cost 20 cents and 30 cents respectively. (a) The mode must satisfy PX (xmod ) ≥ PX (x) for all x. . Therefore the PMF of X is   0. the PMF of Y   1/4   1/4 PY (y) =  1/2   0 The expected value of Y is E [Y ] = y is y=1 y=2 y=3 otherwise (1) yPY (y) = 1(1/4) + 2(1/4) + 3(1/2) = 9/4 (2) 47 . E [C] = 20(0. 3. 4.4.4   0.5 48 .Problem 2.4) + 5(0.2 (2) Problem 2. 1. 4 0 otherwise 4 x (1) E [X] = x=0 xPX (x) = 0 4 1 4 1 4 1 4 1 4 1 +1 +2 +3 +4 4 4 4 4 0 2 1 2 2 2 3 2 4 24 (2) (3) = [4 + 12 + 12 + 4]/24 = 2 Problem 2.2) + 0(0.7 Solution From Definition 2.5) + 1(0.5.3. the PMF of X   0.3   0 The expected value of X is E [X] = x is x = −1 x=0 x=1 otherwise (1) xPX (x) = −1(0. random variable X has PMF PX (x) = The expected value of X is 4 (1/2)4 x = 0. 
5 0 otherwise 5 x (1) E [X] = x=0 xPX (x) 5 1 5 1 5 1 5 1 5 1 5 1 +1 +2 +3 +4 +5 5 5 5 5 5 0 2 1 2 2 2 3 2 4 2 5 25 (2) (3) (4) =0 = [5 + 20 + 30 + 20 + 5]/25 = 2.5.6 Solution From Definition 2. random variable X has PMF PX (x) = The expected value of X is 5 (1/2)5 x = 0.1 (2) Problem 2. 2. 3. 1. 2.4 Solution From the solution to Problem 2.2.4.2) = 2.5. the PMF of X   0.7.7.5 Solution From the solution to Problem 2.4 PX (x) =  0.4) + 7(0.2   0.5.5 PX (x) =  0.2   0 The expected value of X is E [X] = x is x = −3 x=5 x=7 otherwise (1) xPX (x) = −3(0.3) = 0. and the second payoff is 64 dollars which happens unless we lose 6 straight bets. So the PMF of Y is  y=0  (1/2)6 = 1/64 6 = 63/64 y = 64 1 − (1/2) (1) PY (y) =  0 otherwise The expected amount you take home is E [Y ] = 0(1/64) + 64(63/64) = 63 So. 0 otherwise (7) The expected value of N is E[N ] = 5000. we can expect to break even.999 (2) (b) Let Y denote the number of packets received in error out of 100 packets transmitted. .999)100−y y = 0. we assume that a packet is corrupted with probability = 0.001 0. .001.001) (0.1 (4) (c) Let L equal the number of packets that must be received to decode 5 packets in error. 1. In these networks. Problem 2. (a) Let X = 1 if a data packet is decoded correctly. 4 (0. (2) Bernoulli. . . . Y has the binomial PMF PY (y) = The expected value of Y is E [Y ] = 100 = 0.999) (5) PL (l) = 0 otherwise The expected value of L is E [L] = 5 = 5 = 5000 0. on the average.001 is the probability a packet is corrupted. each data packet contains a cylic redundancy check (CRC) code that permits the receiver to determine whether the packet was decoded correctly. 6.Problem 2. there are only two possible payoffs.999 PX (x) =  0 otherwise X = 0. . .9 Solution In this ”double-or-nothing” type game. then the number N of packets that arrive in 5 seconds has the Poisson PMF PN (n) = 5000n e−5000 /n! n = 0. . 1. 
.001)y (0.one or the other 49 prob is raised to the 6 because it takes 6 times to reach $0 . Random variable X is a x=0 x=1 otherwise (1) The parameter = 0. The first is zero dollars. which is not a very exciting proposition. L has the Pascal PMF l−1 5 l−5 l = 5.8 Solution The following experiments are based on a common model of packet transmissions in data networks.001 (6) 0 100 y (0. . 100 otherwise (3) (d) If packet arrivals obey a Poisson model with an average arrival rate of 1000 packets per second. which happens when we lose 6 straight bets. The expected value of X is E [X] = 1 − = 0.5. In the following. Bernoulli random variable with PMF   0. independent of whether any other packet is corrupted.5. 4.5.6. The complete expression for the PMF of U is   1/4 u = 1   1/4 u = 4 PU (u) =  1/2 u = 9   0 otherwise 50 (4) . PU (1) = PY (1) = 1/4 PU (4) = PY (2) = 1/4 PU (9) = PY (3) = 1/2 (3) For all other values of u. Hence. PU (u) = 0. ∞ i=0 P [X > i] = ∞ j−1 j=1 i=0 PX (j) = ∞ j=1 jPX (j) = E [X] (2) Problem 2.11 Solution We write the sum as a double sum in the following way: ∞ i=0 P [X > i] = ∞ ∞ PX (j) (1) i=0 j=i+1 At this point.1. 9}.5.1 Solution From the solution to Problem 2. PU (u) = PY ( u).Problem 2. we have n−1 E [Xn ] = np x =0 PXn−1 (x) = np x =0 (3) The above sum is 1 because it is the sum of a binomial random variable for n − 1 trials over all possible values. The PMF of U can be found by observing that √ √ P [U = u] = P Y 2 = u = P Y = u + P Y = − u (2) √ Since Y is never negative. 2. 3}.10 Solution n By the definition of the expected value. the key step is to reverse the order of summation.4. the range of U = Y 2 is SU = {1. the PMF of Y   1/4   1/4 PY (y) =  1/2   0 is y=1 y=2 y=3 otherwise (1) (a) Since Y has range SY = {1. E [Xn ] = x=1 n x n x p (1 − p)n−x x (n − 1)! px−1 (1 − p)n−1−(x−1) (x − 1)!(n − 1 − (x − 1))! n−1 x p (1 − p)n−x = np x 1 n−1 (1) (2) = np x=1 With the substitution x = x − 1. In this case. 
You may need to make a sketch of the feasible values for i and j to see how this reversal occurs. Problem 2. the expected value of V is E [V ] = v (5) PV (v) = 0(1/2) + 1(1/2) = 1/2 (6) You can also compute E[V ] directly by using Theorem 2. we can calculate the expected value of U as E [U ] = E Y 2 = y y 2 PY (y) = 12 (1/4) + 22 (1/4) + 32 (1/2) = 5. the PMF of X   0.75 (7) As we expect. the expected value of U is E [U ] = u (5) uPU (u) = 1(1/4) + 4(1/4) + 9(1/2) = 5.4.(b) From the PMF.5 0 ≤ v < 1 FV (v) =  1 v≥1 (c) From the PMF PV (v).5 PX (x) =  0. both methods yield the same answer. it is straighforward to write down the CDF.5 v = 1 PV (v) =  same v so are added together 0 otherwise (4) (b) From the PMF.3   0 (a) The PMF of V = |X| satisfies In particular. Problem 2.5 (3) The complete expression for the PMF of V is   0.5 is x = −1 x=0 x=1 otherwise (1) PV (v) = P [|X| = v] = PX (v) + PX (−v) (2) PV (1) = PX (−1) + PX (1) = 0. 1 and -1 are the 0.  u<1  0   1/4 1 ≤ u < 4 FU (u) =  1/2 4 ≤ u < 9   1 u≥9 (c) From Definition 2.75 (6) From Theorem 2.14.2   0. we can construct the staircase CDF of V . 51 .2.5 v = 0 b/c of the abs.10.2 Solution From the solution to Problem 2.10.  v<0  0 0.6. PV (0) = PX (0) = 0. 
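The derived-PMF calculations of Problems 2.6.1 and 2.6.2 follow one rule: the probability of each value of g(X) is the sum of P_X(x) over every x that maps to that value. The following sketch applies that rule to U = Y^2 and V = |X| and compares the two ways of computing E[U]. The helper names `derived_pmf` and `expectation` are hypothetical and chosen only for this illustration.

```python
from collections import defaultdict
from fractions import Fraction


def derived_pmf(pmf, g):
    """PMF of g(X): add P_X(x) over all x that share the same value g(x)."""
    out = defaultdict(Fraction)
    for x, p in pmf.items():
        out[g(x)] += p
    return dict(out)


def expectation(pmf, g=lambda x: x):
    """E[g(X)], computed as the sum of g(x) P_X(x) over the range of X."""
    return sum(g(x) * p for x, p in pmf.items())


# Problem 2.6.1: U = Y^2, with the PMF of Y taken from Problem 2.4.1.
pmf_y = {1: Fraction(1, 4), 2: Fraction(1, 4), 3: Fraction(1, 2)}
pmf_u = derived_pmf(pmf_y, lambda y: y * y)

# Problem 2.6.2: V = |X|, with the PMF of X taken from Problem 2.4.2.
pmf_x = {-1: Fraction(1, 5), 0: Fraction(1, 2), 1: Fraction(3, 10)}
pmf_v = derived_pmf(pmf_x, abs)

print(pmf_u, expectation(pmf_u))
print(pmf_v, expectation(pmf_v))
```

In the second case the values x = -1 and x = 1 both map to v = 1, so their probabilities are added, which is why P_V(1) = 0.2 + 0.3 = 0.5.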
Problem 2.6.3 Solution

From the solution to Problem 2.4.3, the PMF of \(X\) is
\[ P_X(x) = \begin{cases} 0.4 & x = -3 \\ 0.4 & x = 5 \\ 0.2 & x = 7 \\ 0 & \text{otherwise} \end{cases} \tag{1} \]

(a) The PMF of \(W = -X\) satisfies
\[ P_W(w) = P[-X = w] = P_X(-w) \tag{2} \]
This implies
\[ P_W(-7) = P_X(7) = 0.2 \qquad P_W(-5) = P_X(5) = 0.4 \qquad P_W(3) = P_X(-3) = 0.4 \tag{3} \]
The complete PMF for \(W\) is
\[ P_W(w) = \begin{cases} 0.2 & w = -7 \\ 0.4 & w = -5 \\ 0.4 & w = 3 \\ 0 & \text{otherwise} \end{cases} \tag{4} \]

(b) From the PMF, the CDF of \(W\) is
\[ F_W(w) = \begin{cases} 0 & w < -7 \\ 0.2 & -7 \le w < -5 \\ 0.6 & -5 \le w < 3 \\ 1 & w \ge 3 \end{cases} \tag{5} \]

(c) From the PMF, \(W\) has expected value
\[ E[W] = \sum_w w P_W(w) = -7(0.2) + (-5)(0.4) + 3(0.4) = -2.2 \tag{6} \]

Problem 2.6.4 Solution

A tree for the experiment has three equally likely branches:

D = 99.75 (probability 1/3) gives C = 10,074.75
D = 100 (probability 1/3) gives C = 10,100
D = 100.25 (probability 1/3) gives C = 10,125.13

Thus \(C\) has three equally likely outcomes. The PMF of \(C\) is
\[ P_C(c) = \begin{cases} 1/3 & c = 10{,}074.75,\ 10{,}100,\ 10{,}125.13 \\ 0 & \text{otherwise} \end{cases} \tag{1} \]

Problem 2.6.5 Solution

(a) The source continues to transmit packets until one is received correctly. Hence, the total number of times that a packet is transmitted is \(X = x\) if the first \(x - 1\) transmissions were in error. Therefore the PMF of \(X\) is
\[ P_X(x) = \begin{cases} q^{x-1}(1-q) & x = 1, 2, \ldots \\ 0 & \text{otherwise} \end{cases} \tag{1} \]

(b) The time required to send a packet is a millisecond and the time required to send an acknowledgment back to the source takes another millisecond. Thus, if \(X\) transmissions of a packet are needed to send the packet correctly, then the packet is correctly received after \(T = 2X - 1\) milliseconds. Therefore, for an odd integer \(t > 0\), \(T = t\) iff \(X = (t+1)/2\). Thus,
\[ P_T(t) = P_X((t+1)/2) = \begin{cases} q^{(t-1)/2}(1-q) & t = 1, 3, 5, \ldots \\ 0 & \text{otherwise} \end{cases} \tag{2} \]

Problem 2.6.6 Solution

The cellular calling plan charges a flat rate of $20 per month up to and including the 30th minute, and an additional 50 cents for each minute over 30 minutes. Knowing that the time you spend on the phone is a geometric random variable \(M\) with mean \(1/p = 30\), the PMF of \(M\) is
\[ P_M(m) = \begin{cases} (1-p)^{m-1} p & m = 1, 2, \ldots \\ 0 & \text{otherwise} \end{cases} \tag{1} \]
The monthly cost \(C\) obeys
\[ P_C(20) = P[M \le 30] = \sum_{m=1}^{30} (1-p)^{m-1} p = 1 - (1-p)^{30} \tag{2} \]
When \(M \ge 30\), \(C = 20 + (M - 30)/2\) or \(M = 2C - 10\). Thus,
\[ P_C(c) = P_M(2c - 10) \qquad c = 20.5, 21, 21.5, \ldots \tag{3} \]
The complete PMF of \(C\) is
\[ P_C(c) = \begin{cases} 1 - (1-p)^{30} & c = 20 \\ (1-p)^{2c-10-1} p & c = 20.5, 21, 21.5, \ldots \end{cases} \tag{4} \]

Problem 2.7.1 Solution

From the solution to Quiz 2.6, we found that \(T = 120 - 15N\). By Theorem 2.10,
\[ E[T] = \sum_{n \in S_N} (120 - 15n) P_N(n) \tag{1} \]
\[ = 0.1(120) + 0.3(120 - 15) + 0.3(120 - 30) + 0.3(120 - 45) = 93 \tag{2} \]
Also from the solution to Quiz 2.6, we found that
\[ P_T(t) = \begin{cases} 0.3 & t = 75, 90, 105 \\ 0.1 & t = 120 \\ 0 & \text{otherwise} \end{cases} \tag{3} \]
Using Definition 2.14,
\[ E[T] = \sum_{t \in S_T} t P_T(t) = 0.3(75) + 0.3(90) + 0.3(105) + 0.1(120) = 93 \tag{4} \]
As expected, the two calculations give the same answer.

Problem 2.7.2 Solution

Whether a lottery ticket is a winner is a Bernoulli trial with a success probability of 0.001. If we buy one every day for 50 years for a total of \(50 \cdot 365 = 18250\) tickets, then the number of winning tickets \(T\) is a binomial random variable with mean
\[ E[T] = 18250(0.001) = 18.25 \tag{1} \]
Since each winning ticket grosses $1000, the revenue we collect over 50 years is \(R = 1000T\) dollars. The expected revenue is
\[ E[R] = 1000 E[T] = 18250 \tag{2} \]
But buying a lottery ticket every day for 50 years, at $2.00 a pop, isn't cheap and will cost us a total of \(18250 \cdot 2 = 36500\) dollars. Our net profit is then \(Q = R - 36500\), and the result of our loyal 50-year patronage of the lottery system is a disappointing expected loss of
\[ E[Q] = E[R] - 36500 = -18250 \tag{3} \]

Problem 2.7.3 Solution

Let \(X\) denote the number of points the shooter scores. If the shot is uncontested, the expected number of points scored is
\[ E[X] = (0.6)2 = 1.2 \tag{1} \]
If we foul the shooter, then \(X\) is a binomial random variable with mean \(E[X] = 2p\). If \(2p > 1.2\), then we should not foul the shooter. Generally, \(p\) will exceed 0.6 since a free throw is usually even easier than an uncontested shot taken during the action of the game.
Furthermore, fouling the shooter ultimately works to our detriment because players may foul out. This suggests that fouling a player is not a good idea. The only real exception occurs when facing a player like Shaquille O'Neal whose free throw probability \(p\) is lower than his field goal percentage during a game.

Problem 2.7.4 Solution

Given the distributions of \(D\), the waiting time in days, and the resulting cost \(C\), we can answer the following questions.

(a) The expected waiting time is simply the expected value of \(D\).
\[ E[D] = \sum_{d=1}^{4} d \cdot P_D(d) = 1(0.2) + 2(0.4) + 3(0.3) + 4(0.1) = 2.3 \tag{1} \]

(b) The expected deviation from the waiting time is
\[ E[D - \mu_D] = E[D] - E[\mu_D] = \mu_D - \mu_D = 0 \tag{2} \]

(c) \(C\) can be expressed as a function of \(D\) in the following manner.
\[ C(D) = \begin{cases} 90 & D = 1 \\ 70 & D = 2 \\ 40 & D = 3 \\ 40 & D = 4 \end{cases} \tag{3} \]

(d) The expected service charge is
\[ E[C] = 90(0.2) + 70(0.4) + 40(0.3) + 40(0.1) = 62 \text{ dollars} \tag{4} \]

Problem 2.7.5 Solution

As a function of the number of minutes used, \(M\), the monthly cost is
\[ C(M) = \begin{cases} 20 & M \le 30 \\ 20 + (M - 30)/2 & M \ge 30 \end{cases} \tag{1} \]
The expected cost per month is
\[ E[C] = \sum_{m=1}^{\infty} C(m) P_M(m) = \sum_{m=1}^{30} 20 P_M(m) + \sum_{m=31}^{\infty} \bigl(20 + (m - 30)/2\bigr) P_M(m) \tag{2} \]
\[ = 20 \sum_{m=1}^{\infty} P_M(m) + \frac{1}{2} \sum_{m=31}^{\infty} (m - 30) P_M(m) \tag{3} \]
Since \(\sum_{m=1}^{\infty} P_M(m) = 1\) and since \(P_M(m) = (1-p)^{m-1} p\) for \(m \ge 1\), we have
\[ E[C] = 20 + \frac{(1-p)^{30}}{2} \sum_{m=31}^{\infty} (m - 30)(1-p)^{m-31} p \tag{4} \]
Making the substitution \(j = m - 30\) yields
\[ E[C] = 20 + \frac{(1-p)^{30}}{2} \sum_{j=1}^{\infty} j (1-p)^{j-1} p = 20 + \frac{(1-p)^{30}}{2p} \tag{5} \]

Problem 2.7.6 Solution

Since our phone use is a geometric random variable \(M\) with mean value \(1/p\),
\[ P_M(m) = \begin{cases} (1-p)^{m-1} p & m = 1, 2, \ldots \\ 0 & \text{otherwise} \end{cases} \tag{1} \]
For this cellular billing plan, we are given no free minutes, but are charged half the flat fee. That is, we are going to pay 15 dollars regardless and $1 for each minute we use the phone. Hence \(C = 15 + M\) and for \(c \ge 16\), \(P[C = c] = P[M = c - 15]\). Thus we can construct the PMF of the cost \(C\):
\[ P_C(c) = \begin{cases} (1-p)^{c-16} p & c = 16, 17, \ldots \\ 0 & \text{otherwise} \end{cases} \tag{2} \]
Since \(C = 15 + M\), the expected cost per month of the plan is
\[ E[C] = E[15 + M] = 15 + E[M] = 15 + 1/p \tag{3} \]
In Problem 2.7.5, we found that the expected cost of the plan was
\[ E[C] = 20 + [(1-p)^{30}]/(2p) \tag{4} \]
In comparing the expected costs of the two plans, we see that the new plan is better (i.e. cheaper) if
\[ 15 + 1/p \le 20 + [(1-p)^{30}]/(2p) \tag{5} \]
A simple plot will show that the new plan is better if \(p \ge p_0 \approx 0.2\). For small \(p\) (heavy phone use), the left side grows like \(1/p\) while the right side grows only like \(1/(2p)\), so the inequality fails.

Problem 2.7.7 Solution

(Annotation: standard devices have failure probability q = 0.1 and cost $1 each; ultra-reliable devices have failure probability q/2 = 0.05 and cost $3 each; each circuit uses 10 devices.)

We define random variable \(W\) such that \(W = 1\) if the circuit works or \(W = 0\) if the circuit is defective. (In the probability literature, \(W\) is called an indicator random variable.) Let \(R_s\) denote the profit on a circuit with standard devices. Let \(R_u\) denote the profit on a circuit with ultra-reliable devices. We will compare \(E[R_s]\) and \(E[R_u]\) to decide which circuit implementation offers the highest expected profit.

The circuit with standard devices works with probability \((1-q)^{10}\) and generates revenue of \(k\) dollars if all of its 10 constituent devices work. We observe that we can express \(R_s\) as a function \(r_s(W)\) and that we can find the PMF \(P_W(w)\):
\[ P_W(w) = \begin{cases} 1 - (1-q)^{10} & w = 0, \\ (1-q)^{10} & w = 1, \\ 0 & \text{otherwise,} \end{cases} \qquad R_s = r_s(W) = \begin{cases} -10 & W = 0, \\ k - 10 & W = 1. \end{cases} \tag{1} \]
(Annotation: the left expression gives the probabilities that the circuit fails or works; the right expression gives the profit with standard devices in each case.)

Thus we can express the expected profit as
\[ E[r_s(W)] = \sum_{w=0}^{1} P_W(w)\, r_s(w) \tag{2} \]
\[ = P_W(0)(-10) + P_W(1)(k - 10) \tag{3} \]
\[ = (1 - (1-q)^{10})(-10) + (1-q)^{10}(k - 10) = (0.9)^{10} k - 10. \tag{4} \]
(Annotation: expected value = sum of probability times profit.)

For the ultra-reliable case,
\[ R_u = r_u(W) = \begin{cases} -30 & W = 0, \\ k - 30 & W = 1, \end{cases} \qquad P_W(w) = \begin{cases} 1 - (1-q/2)^{10} & w = 0, \\ (1-q/2)^{10} & w = 1, \\ 0 & \text{otherwise.} \end{cases} \tag{5} \]
Thus we can express the expected profit as
\[ E[r_u(W)] = \sum_{w=0}^{1} P_W(w)\, r_u(w) \tag{6} \]
\[ = P_W(0)(-30) + P_W(1)(k - 30) \tag{7} \]
\[ = (1 - (1-q/2)^{10})(-30) + (1-q/2)^{10}(k - 30) = (0.95)^{10} k - 30 \tag{8} \]
To determine which implementation generates the most profit, we solve \(E[R_u] \ge E[R_s]\), yielding \(k \ge 20/[(0.95)^{10} - (0.9)^{10}] \approx 79.98\). So for \(k < \$79.98\) using all standard devices results in greater expected profit, while for \(k > \$79.98\) more expected profit will be generated by implementing all ultra-reliable devices. That is, when the price commanded for a working circuit is sufficiently high, we should build more-expensive higher-reliability circuits.

If you have read ahead to Section 2.9 and learned about conditional expected values, you might prefer the following solution. If not, you might want to come back and review this alternate approach after reading Section 2.9.

Let \(W\) denote the event that a circuit works. The circuit works and generates revenue of \(k\) dollars if all of its 10 constituent devices work. For each implementation, standard or ultra-reliable, let \(R\) denote the profit on a circuit. We can express the expected profit as
\[ E[R] = P[W]\, E[R|W] + P[W^c]\, E[R|W^c] \tag{9} \]
Let's first consider the case when only standard devices are used. In this case, a circuit works with probability \(P[W] = (1-q)^{10}\). The profit made on a working circuit is \(k - 10\) dollars while a nonworking circuit has a profit of \(-10\) dollars. That is, \(E[R|W] = k - 10\) and \(E[R|W^c] = -10\). Of course, a negative profit is actually a loss. Using \(R_s\) to denote the profit using standard circuits, the expected profit is
\[ E[R_s] = (1-q)^{10}(k - 10) + (1 - (1-q)^{10})(-10) = (0.9)^{10} k - 10 \tag{10} \]
And for the ultra-reliable case, the circuit works with probability \(P[W] = (1-q/2)^{10}\). The profit per working circuit is \(E[R|W] = k - 30\) dollars while the profit for a nonworking circuit is \(E[R|W^c] = -30\) dollars.
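The comparison between the two implementations in Problem 2.7.7 reduces to evaluating one ratio, which is easy to get slightly wrong by hand. The sketch below recomputes the expected profits and the break-even price under the values used in the solution (failure probability q = 0.1 for standard devices and q/2 for ultra-reliable ones, device costs of 1 and 3 dollars, 10 devices per circuit). The function names `expected_profit` and `break_even_price` are invented for this illustration.

```python
def expected_profit(k, fail_prob, device_cost, n_devices=10):
    """Expected profit of one circuit that sells for k dollars if it works."""
    p_works = (1.0 - fail_prob) ** n_devices   # every device must work
    total_cost = device_cost * n_devices       # paid whether or not it works
    return p_works * (k - total_cost) + (1.0 - p_works) * (-total_cost)


def break_even_price(q=0.1, n_devices=10, standard_cost=1.0, ultra_cost=3.0):
    """Price k at which both implementations have equal expected profit."""
    p_standard = (1.0 - q) ** n_devices
    p_ultra = (1.0 - q / 2.0) ** n_devices
    extra_cost = (ultra_cost - standard_cost) * n_devices
    return extra_cost / (p_ultra - p_standard)


k_star = break_even_price()
print(round(k_star, 2))
```

Evaluating the ratio 20/[(0.95)^10 - (0.9)^10] this way gives roughly 79.98. Above that price the ultra-reliable circuit has the larger expected profit, and below it the standard circuit does.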
The expected profit is
\[ E[R_u] = (1-q/2)^{10}(k - 30) + (1 - (1-q/2)^{10})(-30) = (0.95)^{10} k - 30 \tag{11} \]
Not surprisingly, we get the same answers for \(E[R_u]\) and \(E[R_s]\) as in the first solution by performing essentially the same calculations. It should be apparent that indicator random variable \(W\) in the first solution indicates the occurrence of the conditioning event \(W\) in the second solution. That is, indicators are a way to track conditioning events.

Problem 2.7.8 Solution

(a) There are \(\binom{46}{6}\) equally likely winning combinations so that
\[ q = \frac{1}{\binom{46}{6}} = \frac{1}{9{,}366{,}819} \approx 1.07 \times 10^{-7} \tag{1} \]

(b) Assuming each ticket is chosen randomly, each of the \(2n - 1\) other tickets is independently a winner with probability \(q\). The number of other winning tickets \(K_n\) has the binomial PMF
\[ P_{K_n}(k) = \begin{cases} \binom{2n-1}{k} q^k (1-q)^{2n-1-k} & k = 0, 1, \ldots, 2n - 1 \\ 0 & \text{otherwise} \end{cases} \tag{2} \]

(c) Since there are \(K_n + 1\) winning tickets in all, the value of your winning ticket is \(W_n = n/(K_n + 1)\) which has mean
\[ E[W_n] = n E\left[\frac{1}{K_n + 1}\right] \tag{3} \]
Calculating the expected value
\[ E\left[\frac{1}{K_n + 1}\right] = \sum_{k=0}^{2n-1} \frac{1}{k+1} P_{K_n}(k) \tag{4} \]
is fairly complicated. The trick is to express the sum in terms of the sum of a binomial PMF.
\[ E\left[\frac{1}{K_n + 1}\right] = \sum_{k=0}^{2n-1} \frac{1}{k+1} \frac{(2n-1)!}{k!(2n-1-k)!} q^k (1-q)^{2n-1-k} \tag{5} \]
\[ = \frac{1}{2n} \sum_{k=0}^{2n-1} \frac{(2n)!}{(k+1)!(2n-(k+1))!} q^k (1-q)^{2n-(k+1)} \tag{6} \]
By factoring out \(1/q\), we obtain
\[ E\left[\frac{1}{K_n + 1}\right] = \frac{1}{2nq} \sum_{k=0}^{2n-1} \binom{2n}{k+1} q^{k+1} (1-q)^{2n-(k+1)} \tag{7} \]
\[ = \frac{1}{2nq} \underbrace{\sum_{j=1}^{2n} \binom{2n}{j} q^j (1-q)^{2n-j}}_{A} \tag{8} \]
We observe that the above sum labeled \(A\) is the sum of a binomial PMF for \(2n\) trials and success probability \(q\) over all possible values except \(j = 0\).
Thus
\[ A = 1 - \binom{2n}{0} q^0 (1-q)^{2n-0} = 1 - (1-q)^{2n} \tag{9} \]
This implies
\[ E\left[\frac{1}{K_n + 1}\right] = \frac{1 - (1-q)^{2n}}{2nq} \tag{10} \]
Our expected return on a winning ticket is
\[ E[W_n] = n E\left[\frac{1}{K_n + 1}\right] = \frac{1 - (1-q)^{2n}}{2q} \tag{11} \]
Note that when \(nq \ll 1\), we can use the approximation that \((1-q)^{2n} \approx 1 - 2nq\) to show that
\[ E[W_n] \approx \frac{1 - (1 - 2nq)}{2q} = n \qquad (nq \ll 1) \tag{12} \]
However, in the limit as the value of the prize \(n\) approaches infinity, we have
\[ \lim_{n \to \infty} E[W_n] = \frac{1}{2q} \approx 4.683 \times 10^{6} \tag{13} \]
That is, as the pot grows to infinity, the expected return on a winning ticket doesn't approach infinity because there is a corresponding increase in the number of other winning tickets. If it's not clear how large \(n\) must be for this effect to be seen, consider the following table:
\[ \begin{array}{c|ccc} n & 10^{6} & 10^{7} & 10^{8} \\ \hline E[W_n] & 9.00 \times 10^{5} & 4.13 \times 10^{6} & 4.68 \times 10^{6} \end{array} \tag{14} \]
When the pot is $1 million, our expected return is $900,000. However, we see that when the pot reaches $100 million, our expected return is very close to \(1/(2q)\), less than $5 million!

Problem 2.7.9 Solution

(a) There are \(\binom{46}{6}\) equally likely winning combinations so that
\[ q = \frac{1}{\binom{46}{6}} = \frac{1}{9{,}366{,}819} \approx 1.07 \times 10^{-7} \tag{1} \]

(b) Assuming each ticket is chosen randomly, each of the \(2n - 1\) other tickets is independently a winner with probability \(q\). The number of other winning tickets \(K_n\) has the binomial PMF
\[ P_{K_n}(k) = \begin{cases} \binom{2n-1}{k} q^k (1-q)^{2n-1-k} & k = 0, 1, \ldots, 2n - 1 \\ 0 & \text{otherwise} \end{cases} \tag{2} \]
Since the pot has \(n + r\) dollars, the expected amount that you win on your ticket is
\[ E[V] = 0(1 - q) + q E\left[\frac{n + r}{K_n + 1}\right] = q(n + r) E\left[\frac{1}{K_n + 1}\right] \tag{3} \]
Note that \(E[1/(K_n + 1)]\) was also evaluated in Problem 2.7.8. For completeness, we repeat those steps here.
\[ \begin{aligned} E\left[\frac{1}{K_n + 1}\right] &= \sum_{k=0}^{2n-1} \frac{1}{k+1} \frac{(2n-1)!}{k!(2n-1-k)!} q^k (1-q)^{2n-1-k} \\ &= \frac{1}{2n} \sum_{k=0}^{2n-1} \frac{(2n)!}{(k+1)!(2n-(k+1))!} q^k (1-q)^{2n-(k+1)} \end{aligned} \]
(4) (5) By factoring out 1/q, we obtain 1 1 = E Kn + 1 2nq = 1 2nq 2n−1 k=0 2n j=1 2n q k+1 (1 − q)2n−(k+1) k+1 2n j q (1 − q)2n−j j A (6) (7) We observe that the above sum labeled A is the sum of a binomial PMF for 2n trials and success probability q over all possible values except j = 0. Thus A = 1 − 2n q 0 (1 − q)2n−0 , 0 which implies A 1 − (1 − q)2n 1 = = (8) E Kn + 1 2nq 2nq The expected value of your ticket is E [V ] = q(n + r)[1 − (1 − q)2n ] 1 r = 1+ [1 − (1 − q)2n ] 2nq 2 n (9) Each ticket tends to be more valuable when the carryover pot r is large and the number of new tickets sold, 2n, is small. For any fixed number n, corresponding to 2n tickets sold, 59 your expected profit E [R] = E r + n − 1/2 + 1/(2q) 1 − Kn + 1 q 1 1 q(2r + 2n − 1) + 1 E − = 2q Kn + 1 q 2n ) [q(2r + 2n − 1) + 1](1 − (1 − q) 1 = − 4nq 2 q (12) (13) (14) For fixed n.29 √ (d) σN = Var[N ] = 0. you must buy 1/q tickets.1 Solution Given the following PMF   0. If you buy one of each possible ticket. Since the cost of the tickets is 1/q dollars. limn→∞ E[R] = −1/(2q). you are guaranteed to have one winning ticket. as n approaches infinity.8. For example if n = 107 .7)1 + (0.29 60 . The 2n − 1 other tickets are sold add n − 1/2 dollars to the pot.2)0 + (0. From the other 2n − 1 tickets. your expected loss will be quite large.7)12 + (0.1)2 = 0. The total number of winning tickets will be Kn + 1. This is an unusual situation because normally a carryover pot of 30 million dollars will result in far more than 20 million tickets being sold.2   0.1  0 n=0 n=1 n=2 otherwise (1) (a) E[N ] = (0.7 PN (n) =   0.76. On the other hand.1 − (0. (20 million tickets sold) then r E [V ] = 0. sufficiently large r will make E[R] > 0. The pot had r dollars beforehand. This suggests that buying a one dollar ticket is a good idea. for fixed r.a sufficiently large pot r will guarantee that E[V ] > 1. 
suppose there were 2n − 1 tickets sold before you must make your decision.44 1 + 7 (10) 10 If the carryover pot r is 30 million dollars. That is. Problem 2.9 (b) E[N 2 ] = (0.1)22 = 1.1 (c) Var[N ] = E[N 2 ] − E[N ]2 = 1.2)02 + (0. then E[V ] = 1. there will be Kn winners. (c) So that we can use the results of the previous part. adding 1/(2q) dollars to the pot.9)2 = 0. Furthermore. In the previous part we found that E 1 − (1 − q)2n 1 = Kn + 1 2nq (11) Let R denote the expected return from buying one of each possible ticket. 2   0.1)2 = 0.4.4 PX (x) =  0.8.2.5) + 12 (0.1 (2) The expected value of X 2 is E X2 = x x2 PX (x) = (−1)2 (0.2 Solution From the solution to Problem 2.2) + 0(0.2) + 02 (0.4.8. the PMF of X   0.49 (4) Problem 2.4   0.1.2   0 61 is x = −3 x=5 x=7 otherwise (1) .Problem 2.3) = 0.5 (3) The variance of X is Var[X] = E X 2 − (E [X])2 = 0.4 Solution From the solution to Problem 2.5 − (0.3.3   0 The expected value of X is E [X] = x is x = −1 x=0 x=1 otherwise (1) xPX (x) = (−1)(0. the PMF of Y   1/4   1/4 PY (y) =  1/2   0 The expected value of Y is E [Y ] = y is y=1 y=2 y=3 otherwise (1) yPY (y) = 1(1/4) + 2(1/4) + 3(1/2) = 9/4 (2) The expected value of Y 2 is E Y2 = y y 2 PY (y) = 12 (1/4) + 22 (1/4) + 32 (1/2) = 23/4 (3) The variance of Y is Var[Y ] = E Y 2 − (E [Y ])2 = 23/4 − (9/4)2 = 11/16 (4) Problem 2.4.3 Solution From the solution to Problem 2.5 PX (x) =  0.8.3) = 0. the PMF of X   0.5) + 1(0. 56 (4) Problem 2.4 − (2. 
(b) The probability that X is within one standard deviation of its expected value is P [µX − σX ≤ X ≤ µX + σX ] = P [2 − 1 ≤ X ≤ 2 + 1] = P [1 ≤ X ≤ 3] This calculation is easy using the PMF of X.8.6 Solution (a) The expected value of X is 5 E [X] = x=0 xPX (x) 5 1 5 1 5 1 5 1 5 1 5 1 +1 +2 +3 +4 +5 5 5 5 5 5 0 2 1 2 2 2 3 2 4 2 5 25 (1) (2) (3) =0 = [5 + 20 + 30 + 20 + 5]/25 = 5/2 62 .2)2 = 18.The expected value of X is E [X] = x xPX (x) = −3(0.2 (2) The expected value of X 2 is E X2 = x x2 PX (x) = (−3)2 (0. X has standard deviation σX = Var[X] = E X 2 − (E [X])2 = 5 − 22 = 1 Var[X] = 1.4 (3) The variance of X is Var[X] = E X 2 − (E [X])2 = 23.2) = 2.4) + 7(0. P [1 ≤ X ≤ 3] = PX (1) + PX (2) + PX (3) = 7/8 (7) (6) Problem 2.4) + 52 (0.8.4) + 72 (0.5 Solution (a) The expected value of X is 4 E [X] = x=0 xPX (x) = 0 4 1 4 1 4 1 4 1 4 1 +1 +2 +3 +4 4 4 4 4 0 2 1 2 2 2 3 2 4 24 (1) (2) = [4 + 12 + 12 + 4]/24 = 2 The expected value of X 2 is 4 E X 2 = x=0 x2 PX (x) = 02 4 1 4 1 4 1 4 1 4 1 + 12 + 22 + 32 + 42 4 4 4 4 0 2 1 2 2 2 3 2 4 24 (3) (4) (5) = [4 + 24 + 36 + 16]/24 = 5 The variance of X is Thus.2) = 23.4) + 5(0. the assertion is proved.12] = P [1. by the definition of variance.12 ≤ X ≤ 2. we obtain P [2 ≤ X ≤ 3] = PX (2) + PX (3) = 10/32 + 10/32 = 5/8 (11) For Y = aX + b.12 says that E[aX + b] = aE[X] + b.62] = P [2 ≤ X ≤ 3] (7) 5/4 ≈ 1. the standard deviation of X is σX = (b) The probability that X is within one standard deviation of its mean is P [µX − σX ≤ X ≤ µX + σX ] = P [2.5 − 1.38 ≤ X ≤ 3. Var [Y ] = E (aX + b − (aE [X] + b))2 = a E (X − E [X]) 2 Problem 2. We begin by noting that Theorem 2. (8) (9) (10) By summing the PMF over the desired range.8 Solution Given the following description of the random variable Y . 
Hence.The expected value of X 2 is 5 E X 2 = x=0 x2 PX (x) 5 1 5 1 5 1 5 1 5 1 5 1 + 12 + 22 + 32 + 42 + 52 5 5 5 5 5 0 2 1 2 2 2 3 2 4 2 5 25 (4) (5) (6) = 02 = [5 + 40 + 90 + 80 + 25]/25 = 240/32 = 15/2 The variance of X is Var[X] = E X 2 − (E [X])2 = 15/2 − 25/4 = 5/4 By taking the square root of the variance. Y = 1 (X − µX ) σx (1) we can use the linearity property of the expectation operator to find the mean value E [Y ] = E [X − µX ] E [X] − E [X] = =0 σX σX (2) Using the fact that Var[aX + b] = a2 Var[X]. Problem 2.8. the variance of Y is found to be Var [Y ] = 1 2 Var [X] = 1 σX 63 (3) .7 Solution (1) (2) (3) = E a2 (X − E [X])2 2 Since E[(X − E[X])2 ] = Var[X].8.5 + 1.12. we wish to show that Var[Y ] = a2 Var[X]. Problem 2.8. both E[X 2 ] and ˆ x ˆ E[X] are simply constants. we can express the jitter as a function of q by realizing that Var[T ] = 4 Var[X] = Therefore. d E X 2 − 2ˆE [X] + x2 = 2E [X] − 2ˆ = 0 x ˆ x dˆ x (3) Hence we see that x = E[X].9 Solution With our measure of jitter being σT .11 Solution The PMF of K is the Poisson PMF PK (k) = The mean of K is E [K] = ∞ k=0 λk e−λ /k! k = 0. we know that a value of q = 3/2 − 5/2 ≈ 0.10 Solution We wish to minimize the function with respect to x. and the fact that T = 2X − 1.8. 1. By solving this quadratic equation.382 will ensure a jitter of at most 2 milliseconds. . . We can expand the square and take the expectation while treating x as a ˆ ˆ constant. we obtain √ √ 3± 5 q= = 3/2 ± 5/2 (3) 2 √ Since q must be a value between 0 and 1. we use the hint and first find E [K(K − 1)] = ∞ k=0 k(k − 1) 64 λk e−λ = k! λk e−λ (k − 2)! (3) . This yields x ˆ x ˆ e(ˆ) = E X 2 − 2ˆX + x2 = E X 2 − 2ˆE [X] + x2 x (2) e(ˆ) = E (X − x)2 x ˆ (1) Solving for the value of x that makes the derivative de(ˆ)/dˆ equal to zero results in the value of ˆ x x x that minimizes e(ˆ). Problem 2. 0 otherwise ∞ k=1 (1) k λk e−λ =λ k! λk−1 e−λ =λ (k − 1)! ∞ k=2 (2) To find E[K 2 ]. 
the best guess for a random ˆ variable is the mean value.8. In the sense of mean squared error. Problem 2. In Chapter 9 this idea is extended to develop minimum mean squared error estimation. Note that when we take the derivative with respect to x. . our maximum permitted jitter is σT = 4q (1 − q)2 (1) √ 2 q = 2 msec (1 − q) (2) Solving for q yields q 2 − 3q + 1 = 0. 9. we obtain E [K(K − 1)] = λ2 ∞ j=0 λj e−λ = λ2 j! 1 (4) The above sum equals 1 because it is the sum of a Poisson PMF over all possible values.By factoring out λ2 and substituting j = k − 2.17.1 − 2.1 √ (2) 0. the PMF of Y   1/4   1/4 PY (y) =  1/2   0 (1) yPY |B (y) = 1(1/2) + 2(1/2) = 3/2 y 2 PY |B (y) = 12 (1/2) + 22 (1/2) = 5/2 (3) (4) y .6 = 6.32 = is y=1 y=2 y=3 otherwise 2 4 Var[D] = E [D2 ] − E [D]2 (1) = d=1 d2 PD (d) = 0.6 + 2. From Theorem 2.8. the conditional PMF of Y given B is  What's the probability of y=1 or 2  1/2 y = 1 PY (y) y∈B P [B] 1/2 y = 2 PY |B (y) = = (2)  0 otherwise 0 otherwise The conditional first and second moments of Y are E [Y |B] = E Y 2 |B = The conditional variance of Y is Var[Y |B] = E Y 2 |B − (E [Y |B])2 = 5/2 − 9/4 = 1/4 65 (5) y From the solution to Problem 2.12 Solution The standard deviation can be expressed as σD = where E D So finally we have σD = 6.7 + 1. Since E[K] = λ.81 = 0.1.9 (3) Problem 2.2 + 1.4. the variance of K is Var[K] = E K 2 − (E [K])2 = λ2 + λ − λ2 = λ (5) 2 = E [K(K − 1)] + E [K] − (E [K]) (6) (7) Problem 2.1 Solution The probability of the event B = {Y < 3} is P [B] = 1 − P [Y = 3] = 1/2. 4 PX (x) =  0.9.5 PX (x) =  0.4.96 (5) Problem 2.17.6 = PX|B (x) =  0 otherwise 0 The conditional first and second moments of X are E [X|B] = x = 0.4.17.3.4) + 1(0.4 PX (x) x∈B P [B] 0.Problem 2.6) = 1 x (3) (4) E X 2 |B = The conditional variance of X is Var[X|B] = E X 2 |B − (E [X|B])2 = 1 − (0.9. 
Problem 2.9.2 Solution

From the solution to Problem 2.4.2, the PMF of $X$ is
\[ P_X(x) = \begin{cases} 0.2 & x = -1 \\ 0.5 & x = 0 \\ 0.3 & x = 1 \\ 0 & \text{otherwise} \end{cases} \]
The event $B = \{|X| > 0\}$ has probability $P[B] = P[X \neq 0] = 0.5$. From Theorem 2.17, the conditional PMF of $X$ given $B$ is
\[ P_{X|B}(x) = \begin{cases} P_X(x)/P[B] & x \in B \\ 0 & \text{otherwise} \end{cases} = \begin{cases} 0.4 & x = -1 \\ 0.6 & x = 1 \\ 0 & \text{otherwise} \end{cases} \]
The conditional first and second moments of $X$ are
\[ E[X|B] = \sum_x x P_{X|B}(x) = (-1)(0.4) + 1(0.6) = 0.2, \]
\[ E[X^2|B] = \sum_x x^2 P_{X|B}(x) = (-1)^2(0.4) + 1^2(0.6) = 1. \]
The conditional variance of $X$ is
\[ \mathrm{Var}[X|B] = E[X^2|B] - (E[X|B])^2 = 1 - (0.2)^2 = 0.96. \]

Problem 2.9.3 Solution

From the solution to Problem 2.4.3, the PMF of $X$ is
\[ P_X(x) = \begin{cases} 0.4 & x = -3 \\ 0.4 & x = 5 \\ 0.2 & x = 7 \\ 0 & \text{otherwise} \end{cases} \]
The event $B = \{X > 0\}$ has probability $P[B] = P_X(5) + P_X(7) = 0.6$. From Theorem 2.17, the conditional PMF of $X$ given $B$ is
\[ P_{X|B}(x) = \begin{cases} P_X(x)/P[B] & x \in B \\ 0 & \text{otherwise} \end{cases} = \begin{cases} 2/3 & x = 5 \\ 1/3 & x = 7 \\ 0 & \text{otherwise} \end{cases} \]
The conditional first and second moments of $X$ are
\[ E[X|B] = \sum_x x P_{X|B}(x) = 5(2/3) + 7(1/3) = 17/3, \]
\[ E[X^2|B] = \sum_x x^2 P_{X|B}(x) = 5^2(2/3) + 7^2(1/3) = 33. \]
The conditional variance of $X$ is
\[ \mathrm{Var}[X|B] = E[X^2|B] - (E[X|B])^2 = 33 - (17/3)^2 = 8/9. \]

Problem 2.9.4 Solution

The event $B = \{X \neq 0\}$ has probability $P[B] = 1 - P[X = 0] = 15/16$. The conditional PMF of $X$ given $B$ is
\[ P_{X|B}(x) = \begin{cases} P_X(x)/P[B] & x \in B \\ 0 & \text{otherwise} \end{cases} = \begin{cases} \binom{4}{x}\frac{1}{15} & x = 1, 2, 3, 4 \\ 0 & \text{otherwise} \end{cases} \]
The conditional first and second moments of $X$ are
\[ E[X|B] = \sum_{x=1}^{4} x P_{X|B}(x) = 1\binom{4}{1}\frac{1}{15} + 2\binom{4}{2}\frac{1}{15} + 3\binom{4}{3}\frac{1}{15} + 4\binom{4}{4}\frac{1}{15} = [4 + 12 + 12 + 4]/15 = 32/15, \]
\[ E[X^2|B] = \sum_{x=1}^{4} x^2 P_{X|B}(x) = [4 + 24 + 36 + 16]/15 = 80/15. \]
The conditional variance of $X$ is
\[ \mathrm{Var}[X|B] = E[X^2|B] - (E[X|B])^2 = 80/15 - (32/15)^2 = 176/225 \approx 0.782. \]

Problem 2.9.5 Solution

The probability of the event $B$ is
\[ P[B] = P[X \geq \mu_X] = P[X \geq 3] = P_X(3) + P_X(4) + P_X(5) = \frac{\binom{5}{3} + \binom{5}{4} + \binom{5}{5}}{32} = \frac{10 + 5 + 1}{32} = 1/2. \]
The conditional PMF of $X$ given $B$ is
\[ P_{X|B}(x) = \begin{cases} P_X(x)/P[B] & x \in B \\ 0 & \text{otherwise} \end{cases} = \begin{cases} \binom{5}{x}\frac{1}{16} & x = 3, 4, 5 \\ 0 & \text{otherwise} \end{cases} \]
The conditional first and second moments of $X$ are
\[ E[X|B] = \sum_{x=3}^{5} x P_{X|B}(x) = 3\binom{5}{3}\frac{1}{16} + 4\binom{5}{4}\frac{1}{16} + 5\binom{5}{5}\frac{1}{16} = [30 + 20 + 5]/16 = 55/16, \]
\[ E[X^2|B] = \sum_{x=3}^{5} x^2 P_{X|B}(x) = [90 + 80 + 25]/16 = 195/16. \]
The conditional variance of $X$ is
\[ \mathrm{Var}[X|B] = E[X^2|B] - (E[X|B])^2 = 195/16 - (55/16)^2 = 95/256 \approx 0.371. \]

Problem 2.9.6 Solution

(a) Consider each circuit test as a Bernoulli trial such that a failed circuit is called a success. The number of trials until the first success (i.e. a failed circuit) has the geometric PMF
\[ P_N(n) = \begin{cases} (1 - p)^{n-1} p & n = 1, 2, \ldots \\ 0 & \text{otherwise} \end{cases} \]

(b) The probability there are at least 20 tests is
\[ P[B] = P[N \geq 20] = \sum_{n=20}^{\infty} P_N(n) = (1 - p)^{19}. \]
Note that $(1 - p)^{19}$ is just the probability that the first 19 circuits pass the test, which is what we would expect since there must be at least 20 tests if the first 19 circuits pass. The conditional PMF of $N$ given $B$ is
\[ P_{N|B}(n) = \begin{cases} P_N(n)/P[B] & n \in B \\ 0 & \text{otherwise} \end{cases} = \begin{cases} (1 - p)^{n-20} p & n = 20, 21, \ldots \\ 0 & \text{otherwise} \end{cases} \]

(c) Given the event $B$, the conditional expectation of $N$ is
\[ E[N|B] = \sum_n n P_{N|B}(n) = \sum_{n=20}^{\infty} n (1 - p)^{n-20} p. \]
Making the substitution $j = n - 19$ yields
\[ E[N|B] = \sum_{j=1}^{\infty} (j + 19)(1 - p)^{j-1} p = 1/p + 19. \]
We see that in the above sum, we effectively have the expected value of $J + 19$ where $J$ is a geometric random variable with parameter $p$. This is not surprising since $N \geq 20$ if and only if we observed 19 successful tests. After 19 successful tests, the number of additional tests needed to find the first failure is still a geometric random variable with mean $1/p$.
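As a cross-check on the arithmetic in Problem 2.9.5, the following Python sketch recomputes P[B] and the conditional moments exactly. It assumes, as the conditional PMF in that solution indicates, that X is binomial (5, 1/2); the variable names are illustrative only.

```python
from fractions import Fraction
from math import comb

# Assumed PMF: X is binomial (n = 5, p = 1/2), so P_X(x) = C(5, x) / 32
pmf_x = {x: Fraction(comb(5, x), 32) for x in range(6)}
mu_x = sum(x * p for x, p in pmf_x.items())            # E[X] = 5/2

# Event B = {X >= mu_X}, i.e. X in {3, 4, 5}
p_b = sum(p for x, p in pmf_x.items() if x >= mu_x)
cond = {x: p / p_b for x, p in pmf_x.items() if x >= mu_x}

m1 = sum(x * p for x, p in cond.items())               # E[X | B]
m2 = sum(x * x * p for x, p in cond.items())           # E[X^2 | B]
cond_var = m2 - m1 ** 2
print(p_b, m1, m2, cond_var)
```

Because B contains only the values 3, 4 and 5, the conditional variance cannot exceed 1, which is a quick sanity bound on any hand calculation.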
Problem 2.9.7 Solution

(a) The PMF of $M$, the number of miles run on an arbitrary day, is
\[ P_M(m) = \begin{cases} q(1 - q)^m & m = 0, 1, \ldots \\ 0 & \text{otherwise} \end{cases} \]
And we can see that the probability that $M > 0$ is
\[ P[M > 0] = 1 - P[M = 0] = 1 - q. \]

(b) The probability that we run a marathon on any particular day is the probability that $M \geq 26$:
\[ r = P[M \geq 26] = \sum_{m=26}^{\infty} q(1 - q)^m = (1 - q)^{26}. \]

(c) We run a marathon on each day with probability equal to $r$, and we do not run a marathon with probability $1 - r$. Therefore in a year we have 365 tests of our jogging resolve, and thus 365 chances to run a marathon. So the PMF of the number of marathons run in a year, $J$, can be expressed as
\[ P_J(j) = \begin{cases} \binom{365}{j} r^j (1 - r)^{365-j} & j = 0, 1, \ldots, 365 \\ 0 & \text{otherwise} \end{cases} \]

(d) The random variable $K$ is defined as the number of miles we run above that required for a marathon, $K = M - 26$. Given the event, $A$, that we have run a marathon, we wish to know how many miles in excess of 26 we in fact ran. So we want to know the conditional PMF $P_{K|A}(k)$:
\[ P_{K|A}(k) = \frac{P[K = k, A]}{P[A]} = \frac{P[M = 26 + k]}{P[A]}. \]
Since $P[A] = r$, for $k = 0, 1, \ldots$,
\[ P_{K|A}(k) = \frac{(1 - q)^{26+k} q}{(1 - q)^{26}} = (1 - q)^k q. \]
The complete expression for the conditional PMF of $K$ is
\[ P_{K|A}(k) = \begin{cases} (1 - q)^k q & k = 0, 1, \ldots \\ 0 & \text{otherwise} \end{cases} \]

Problem 2.9.8 Solution

Recall that the PMF of the number of pages in a fax is
\[ P_X(x) = \begin{cases} 0.15 & x = 1, 2, 3, 4 \\ 0.1 & x = 5, 6, 7, 8 \\ 0 & \text{otherwise} \end{cases} \]

(a) The event that a fax was sent to machine A can be expressed mathematically as the event that the number of pages $X$ is an even number. Similarly, the event that a fax was sent to B is the event that $X$ is an odd number. Since $S_X = \{1, 2, \ldots, 8\}$, we define the set $A = \{2, 4, 6, 8\}$. Using this definition for $A$, we have that the event that a fax is sent to A is equivalent to the event $X \in A$. The event $A$ has probability
\[ P[A] = P_X(2) + P_X(4) + P_X(6) + P_X(8) = 0.5. \]
Given the event $A$, the conditional PMF of $X$ is
\[ P_{X|A}(x) = \begin{cases} P_X(x)/P[A] & x \in A \\ 0 & \text{otherwise} \end{cases} = \begin{cases} 0.3 & x = 2, 4 \\ 0.2 & x = 6, 8 \\ 0 & \text{otherwise} \end{cases} \]
The conditional first and second moments of $X$ given $A$ are
\[ E[X|A] = \sum_x x P_{X|A}(x) = 2(0.3) + 4(0.3) + 6(0.2) + 8(0.2) = 4.6, \]
\[ E[X^2|A] = \sum_x x^2 P_{X|A}(x) = 4(0.3) + 16(0.3) + 36(0.2) + 64(0.2) = 26. \]
The conditional variance and standard deviation are
\[ \mathrm{Var}[X|A] = E[X^2|A] - (E[X|A])^2 = 26 - (4.6)^2 = 4.84, \qquad \sigma_{X|A} = \sqrt{\mathrm{Var}[X|A]} = 2.2. \]

(b) Let the event $B'$ denote the event that the fax was sent to B and that the fax had no more than 6 pages. Hence, the event $B' = \{1, 3, 5\}$ has probability
\[ P[B'] = P_X(1) + P_X(3) + P_X(5) = 0.4. \]
The conditional PMF of $X$ given $B'$ is
\[ P_{X|B'}(x) = \begin{cases} P_X(x)/P[B'] & x \in B' \\ 0 & \text{otherwise} \end{cases} = \begin{cases} 3/8 & x = 1, 3 \\ 1/4 & x = 5 \\ 0 & \text{otherwise} \end{cases} \]
Given the event $B'$, the conditional first and second moments are
\[ E[X|B'] = \sum_x x P_{X|B'}(x) = 1(3/8) + 3(3/8) + 5(1/4) = 11/4, \]
\[ E[X^2|B'] = \sum_x x^2 P_{X|B'}(x) = 1(3/8) + 9(3/8) + 25(1/4) = 10. \]
The conditional variance and standard deviation are
\[ \mathrm{Var}[X|B'] = E[X^2|B'] - (E[X|B'])^2 = 10 - (11/4)^2 = 39/16, \qquad \sigma_{X|B'} = \sqrt{\mathrm{Var}[X|B']} = \sqrt{39}/4 \approx 1.56. \]

Problem 2.10.1 Solution

For a binomial $(n, p)$ random variable $X$, the solution in terms of math is
\[ P[E_2] = \sum_{x=0}^{\lfloor \sqrt{n} \rfloor} P_X(x^2). \]
In terms of Matlab, the efficient solution is to generate the vector of perfect squares x = [0 1 4 9 16 ...] and then to pass that vector to binomialpmf.m. In this case, the values of the binomial PMF are calculated only once. Here is the code:

    function q=perfectbinomial(n,p);
    i=0:floor(sqrt(n));
    x=i.^2;
    q=sum(binomialpmf(n,p,x));

For a binomial (100, 0.5) random variable $X$, the probability $X$ is a perfect square is

    >> perfectbinomial(100,0.5)
    ans = 0.0811
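For readers without the matcode functions, the perfect-square sum in Problem 2.10.1 can be reproduced with a short Python sketch. The functions `binomial_pmf` and `perfect_binomial` below are stand-ins written for this illustration; they are not the manual's binomialpmf.m or perfectbinomial.m.

```python
from math import comb, isqrt


def binomial_pmf(n, p, x):
    """P[X = x] for a binomial (n, p) random variable."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)


def perfect_binomial(n, p):
    """Sum the binomial PMF over the perfect squares 0, 1, 4, ..., floor(sqrt(n))^2."""
    return sum(binomial_pmf(n, p, i * i) for i in range(isqrt(n) + 1))


q = perfect_binomial(100, 0.5)
print(round(q, 4))
```

For n = 100 and p = 0.5 almost all of the probability comes from x = 49, with smaller contributions from x = 36 and x = 64.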
Problem 2.10.2 Solution

The random variable $X$ given in Example 2.29 is just a finite random variable. We can generate random samples using the finiterv function. The code is

    function x=faxlength8(m);
    sx=1:8;
    p=[0.15*ones(1,4) 0.1*ones(1,4)];
    x=finiterv(sx,p,m);

Problem 2.10.3 Solution

First we use faxlength8 from Problem 2.10.2 to generate $m$ samples of the fax length $X$. Next we convert that to $m$ samples of the fax cost $Y$. Summing these samples and dividing by $m$, we obtain the average cost of $m$ samples. Here is the code:

    function y=avgfax(m);
    x=faxlength8(m);
    yy=cumsum([10 9 8 7 6]);
    yy=[yy 50 50 50];
    y=sum(yy(x))/m;

Each time we perform the experiment of executing the function avgfax, we generate $m$ random samples of $X$, and $m$ corresponding samples of $Y$. The sum $Y = \frac{1}{m}\sum_{i=1}^{m} Y_i$ is random. For $m = 10$, four samples of $Y$ are produced by

    >> [avgfax(10) avgfax(10) avgfax(10) avgfax(10)]

For $m = 100$, the results are arguably more consistent:

    >> [avgfax(100) avgfax(100) avgfax(100) avgfax(100)]

Finally, for $m = 1000$, we obtain results reasonably close to $E[Y]$:

    >> [avgfax(1000) avgfax(1000) avgfax(1000) avgfax(1000)]

In Chapter 7, we will develop techniques to show how $Y$ converges to $E[Y]$ as $m \to \infty$.

Problem 2.10.4 Solution

Suppose $X_n$ is a Zipf $(n, \alpha = 1)$ random variable and thus has PMF
\[ P_X(x) = \begin{cases} c(n)/x & x = 1, 2, \ldots, n \\ 0 & \text{otherwise} \end{cases} \]
The problem asks us to find the smallest value of $k$ such that $P[X_n \leq k] \geq 0.75$. That is, if the server caches the $k$ most popular files, then with probability $P[X_n \leq k]$ the request is for one of the $k$ cached files. First, we might as well solve this problem for any probability $p$ rather than just $p = 0.75$. Thus, in math terms, we are looking for
\[ k = \min\{ k' \mid P[X_n \leq k'] \geq p \}. \]
What makes the Zipf distribution hard to analyze is that there is no closed form expression for
\[ c(n) = \left( \sum_{x=1}^{n} \frac{1}{x} \right)^{-1}. \]
Thus, we use Matlab to grind through the calculations. The following simple program generates the Zipf distributions and returns the correct value of $k$.

    function k=zipfcache(n,p);
    %Usage: k=zipfcache(n,p);
    %for the Zipf (n,alpha=1) distribution, returns the smallest k
    %such that the first k items have total probability >= p
    pmf=1./(1:n);
    pmf=pmf/sum(pmf); %normalize to sum to 1
    cdf=cumsum(pmf);
    k=1+sum(cdf<p); %count the k' with P[X<=k'] strictly below p

The program zipfcache generalizes 0.75 to be the probability $p$. Although this program is sufficient, the problem asks us to find $k$ for all values of $n$ from 1 to $10^3$. One way to do this is to call zipfcache a thousand times to find $k$ for each value of $n$. A better way is to use the properties of the Zipf PMF. In particular,
\[ P[X_n \leq k'] = c(n) \sum_{x=1}^{k'} \frac{1}{x} = \frac{c(n)}{c(k')}. \]
Thus we wish to find
\[ k = \min\left\{ k' \,\Big|\, \frac{c(n)}{c(k')} \geq p \right\} = \min\left\{ k' \,\Big|\, \frac{1}{c(k')} \geq \frac{p}{c(n)} \right\}. \]
Note that the definition of $k$ implies that
\[ \frac{1}{c(k')} < \frac{p}{c(n)}, \qquad k' = 1, \ldots, k - 1. \]
Using the notation $|A|$ to denote the number of elements in the set $A$, we can write
\[ k = 1 + \left| \left\{ k' \,\Big|\, \frac{1}{c(k')} < \frac{p}{c(n)} \right\} \right|. \]
This is the basis for a very short Matlab program:

    function k=zipfcacheall(n,p);
    %Usage: k=zipfcacheall(n,p);
    %returns vector k such that the first
    %k(m) items have total probability >= p
    %for the Zipf(m,1) distribution.
    c=1./cumsum(1./(1:n));
    k=1+countless(1./c,p./c);

Note that zipfcacheall uses a short Matlab program countless.m that is almost the same as count.m introduced in Example 2.47. If n=countless(x,y), then n(i) is the number of elements of x that are strictly less than y(i), while count returns the number of elements less than or equal to y(i). In any case, the commands

    k=zipfcacheall(1000,0.75);
    plot(1:1000,k);

are sufficient to produce this figure of $k$ as a function of $n$:

[Figure: plot of k versus n for 1 ≤ n ≤ 1000.]

We see in the figure that the number of files that must be cached grows slowly with the total number of files $n$. Finally, we make one last observation. It is generally desirable for Matlab to execute operations in parallel. The program zipfcacheall generally will run faster than $n$ calls to zipfcache. However, to do its counting all at once, countless generates an $n \times n$ array. When $n$ is not too large, say $n \leq 1000$, the resulting array with $n^2 = 1{,}000{,}000$ elements fits in memory. For much larger values of $n$, say $n = 10^6$ (as was proposed in the original printing of this edition of the text), countless will cause an "out of memory" error.

Problem 2.10.5 Solution

We use poissonrv.m to generate random samples of a Poisson $(\alpha = 5)$ random variable. To compare the Poisson PMF against the output of poissonrv, relative frequencies are calculated using the hist function. The following code plots the relative frequency against the PMF.

    function diff=poissontest(alpha,m)
    x=poissonrv(alpha,m);
    xr=0:ceil(3*alpha);
    pxsample=hist(x,xr)/m;
    pxsample=pxsample(:);
    %pxsample=(countequal(x,xr)/m);
    px=poissonpmf(alpha,xr);
    plot(xr,pxsample,xr,px);
    diff=sum((pxsample-px).^2);

For $m = 100, 1000, 10000$, here are sample plots comparing the PMF and the relative frequency. The plots show reasonable agreement for $m = 10000$ samples.

[Figure: three pairs of plots, panels (a) and (b), of relative frequency and PMF versus x for m = 100, m = 1000, and m = 10,000.]
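The counting argument in Problem 2.10.4 can also be sketched in Python. The version below is an illustration, not a translation of the manual's programs: `zipf_cache` mirrors the single-n calculation, while `zipf_cache_all` swaps the n-by-n comparison done by countless for a single forward scan, using the fact that the required k never decreases as n grows. Both function names are hypothetical.

```python
from itertools import accumulate


def zipf_cache(n, p):
    """Smallest k with P[X_n <= k] >= p for a Zipf (n, alpha = 1) PMF."""
    harmonic = list(accumulate(1.0 / x for x in range(1, n + 1)))  # 1/c(k)
    target = p * harmonic[-1]                                      # p/c(n)
    return 1 + sum(1 for h in harmonic if h < target)


def zipf_cache_all(n, p):
    """List of k for every Zipf (m, 1) distribution, m = 1..n, in O(n) time."""
    harmonic = list(accumulate(1.0 / x for x in range(1, n + 1)))
    ks, k = [], 1
    for m in range(1, n + 1):
        # advance k until 1/c(k) >= p/c(m); k only ever moves forward
        while harmonic[k - 1] < p * harmonic[m - 1]:
            k += 1
        ks.append(k)
    return ks


k_values = zipf_cache_all(1000, 0.75)
print(k_values[3], k_values[-1])
```

The forward scan uses memory proportional to n, so it sidesteps the out-of-memory concern raised for large n, at the cost of an explicit loop.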
pxsample=hist(x. Finally. When n is not too large.000 elements fits in memory. 10000. Problem 2. %pxsample=(countequal(x. To compare the Poisson PMF against the output of poissonrv.0. then n(i) is the number of elements of x that are strictly less than y(i) while count returns the number of elements less than or equal to y(i). relative frequencies are calculated using the hist function. countless generates and n × n array. the resulting array with n2 = 1.Note that zipfcacheall uses a short Matlab program countless.pxsample. 1 0.01).15 0 5 10 15 20 (a) n = 10.1 0 0 5 10 15 0 0 5 10 15 (a) m = 100 0. p = 1 (b) n = 100. the approximation is reasonable: 1 0.2 0.1 0. p) = (10. the binomial PMF has no randomness.2 0.1 0 0 5 10 15 0 0 5 10 15 (a) m = 1000 0.p. the approximation is very good: 74 .10.x).8 0. p = 0.1 Finally.1 0.1. and (n.b). p) = (10000. 0.1) using the following Matlab code: x=0:20.x.001).4 0. p=poissonpmf(100.6 Solution We can compare the binomial and Poisson PMFs for (n.0. plot(x.2 0 0 5 10 15 20 0. b=binomialpmf(100. 0.2 0.0.x). 1). For (n.2 (b) m = 1000 0. For (n. 0. for (n.1 0. p) = (100.6 0.1 0 0 5 10 15 0 0 5 10 15 (a) m = 10.2 0.2 (b) m = 100 0. p) = (1000.000 (b) m = 10. p) = (100.1).2 0.05 0 0. 0.000 Problem 2. 15 0. Thus.m) %Usage: x=geometricrv(p.4. (3) Next we take the logarithm of each side. < R ≤ 1 − (1 − p)k . (5) ln(1 − p) Since k ∗ is an integer. x=ceil(log(1-r)/log(1-p)).15 0. Thus dividing through by ln(1 − p) reverses the inequalities. we have that ln(1 − p) < 0.4.05 0 0. yielding ln(1 − R) k∗ − 1 > ≤ k∗ . we generate a sample value R = rand(1) and then we find k ∗ such that (1) FK (k ∗ − 1) < R < FK (k ∗ ) .2 0. ∗ (2) Subtracting 1 from each side and then multiplying through by −1 (which reverses the inequalities). (4) Since 0 < p < 1.1 0. we know for integers k ≥ 1 that geometric (p) random variable K has CDF FK (k) = 1 − (1 − p)k . it must be the smallest integer greater than or equal to ln(1 − R)/ ln(1 − p). 
Since logarithms are monotonic functions.m) % returns m samples of a geometric (p) rv r=rand(m.05 0 0 5 10 15 20 0 5 10 15 20 (a) n = 1000. 75 .1 0. That is.10. p = 0. p = 0.2 0. we obtain ∗ ∗ (1 − p)k −1 > 1 − R ≥ (1 − p)k . we have (k ∗ − 1) ln(1 − p) > ln(1 − R) ≥ k ∗ ln(1 − p).1).0.001 Problem 2. 1 − (1 − p)k ∗ −1 Following the Random Sample algorithm. following the last step of the random sample algorithm.01 (b) n = 10000. K = k∗ = ln(1 − R) ln(1 − p) (6) The Matlab algorithm that implements this operation is quite simple: function x=geometricrv(p.7 Solution From Problem 2. -alpha+ (k*log(alpha))-logfacts]). okx=(x>=0).44 is that the cumulative product that calculated nk /k! can have an overflow. %out=vector pmf: pmf(i)=P[X=x(i)] x=x(:). 76 . else k=(1:ceil(max(x)))’. we can write an alternate poissonpmf function as follows: function pmf=poissonpmf(alpha. . x=okx. end %pmf(i)=0 for zero-prob x(i) Problem 2. logfacts =cumsum(log(k)).For the PC version of Matlab employed for this test.n) reported Inf for n = n∗ = 714.. pmf=okx.0*(x==0). Following the hint.10.. pb=exp([-alpha. The problem with the poissonpmf function in Example 2.*pb(x+1). the intermediate terms are much less likely to overflow.x) %Poisson (alpha) rv X.*(x==floor(x)).8 Solution By summing logarithms.*x. if (alpha==0) pmf=1. poissonpmf(n. Being careful. we must have a ≤ 1. We meet this requirement if c(7 + 5)2 = 1. a+1 = 0. This implies c = 1/144. (a) P [X > 1/2] = 1 − P [X ≤ 1/2] = 1 − FX (1/2) = 1 − 3/4 = 1/4 (b) This is a little trickier than it should be. This occurs if we choose c such that FV (v) doesn’t have a discontinuity at v = 7. FV (v) must be a continuous function.1 Solution The CDF of X is  x < −1  0 (x + 1)/2 −1 ≤ x < 1 FX (x) =  1 x≥1 (1) Each question can be answered by expressing the requested probability in terms of FX (x).8 2 (7) (6) P [|X| ≤ 1/2] = P [−1/2 ≤ X ≤ 1/2] = P [X ≤ 1/2] − P [X < −1/2] (5) (4) Problem 3.6.1. 
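The log-domain idea in Problem 2.10.8 carries over directly to other languages. The Python sketch below is an assumption-laden illustration rather than the manual's code: it evaluates one PMF value at a time and uses the standard-library function `math.lgamma` for log(x!) in place of the cumulative sum of logs.

```python
from math import exp, lgamma, log


def poisson_pmf(alpha, x):
    """P[X = x] for a Poisson (alpha) random variable, computed via logarithms."""
    if x < 0 or x != int(x):
        return 0.0                      # zero-probability values
    if alpha == 0:
        return 1.0 if x == 0 else 0.0
    # log(alpha^x e^{-alpha} / x!) = x*log(alpha) - alpha - log(x!)
    return exp(x * log(alpha) - alpha - lgamma(x + 1))


p_714 = poisson_pmf(714, 714)
total_5 = sum(poisson_pmf(5, k) for k in range(60))
print(p_714, total_5)
```

At alpha = x = 714 the terms alpha^x and x! are each far beyond double-precision range, yet their log-domain difference is a modest negative number, so the exponential is well behaved.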
Problem Solutions – Chapter 3

Problem 3.1.1 Solution

The CDF of $X$ is
\[ F_X(x) = \begin{cases} 0 & x < -1 \\ (x + 1)/2 & -1 \leq x < 1 \\ 1 & x \geq 1 \end{cases} \]
Each question can be answered by expressing the requested probability in terms of $F_X(x)$.

(a) $P[X > 1/2] = 1 - P[X \leq 1/2] = 1 - F_X(1/2) = 1 - 3/4 = 1/4$.

(b) This is a little trickier than it should be. Being careful, we can write
\[ P[-1/2 \leq X < 3/4] = P[-1/2 < X \leq 3/4] + P[X = -1/2] - P[X = 3/4]. \]
Since the CDF of $X$ is a continuous function, the probability that $X$ takes on any specific value is zero. This implies $P[X = 3/4] = 0$ and $P[X = -1/2] = 0$. (If this is not clear at this point, it will become clear in Section 3.6.) Thus,
\[ P[-1/2 \leq X < 3/4] = P[-1/2 < X \leq 3/4] = F_X(3/4) - F_X(-1/2) = 5/8. \]

(c) $P[|X| \leq 1/2] = P[-1/2 \leq X \leq 1/2] = P[X \leq 1/2] - P[X < -1/2]$. Note that $P[X \leq 1/2] = F_X(1/2) = 3/4$. Since $P[X = -1/2] = 0$, $P[X < -1/2] = P[X \leq -1/2]$. Hence $P[X < -1/2] = F_X(-1/2) = 1/4$. This implies
\[ P[|X| \leq 1/2] = P[X \leq 1/2] - P[X < -1/2] = 3/4 - 1/4 = 1/2. \]

(d) Since $F_X(1) = 1$, we must have $a \leq 1$. For $a \leq 1$, we need to satisfy
\[ P[X \leq a] = F_X(a) = \frac{a + 1}{2} = 0.8. \]
Thus $a = 0.6$.

Problem 3.1.2 Solution

The CDF of $V$ was given to be
\[ F_V(v) = \begin{cases} 0 & v < -5 \\ c(v + 5)^2 & -5 \leq v < 7 \\ 1 & v \geq 7 \end{cases} \]

(a) For $V$ to be a continuous random variable, $F_V(v)$ must be a continuous function. This occurs if we choose $c$ such that $F_V(v)$ doesn't have a discontinuity at $v = 7$. We meet this requirement if $c(7 + 5)^2 = 1$. This implies $c = 1/144$.

(b) $P[V > 4] = 1 - P[V \leq 4] = 1 - F_V(4) = 1 - 81/144 = 63/144$.

(c) $P[-3 < V \leq 0] = F_V(0) - F_V(-3) = 25/144 - 4/144 = 21/144$.

(d) Since $0 \leq F_V(v) \leq 1$ and since $F_V(v)$ is a nondecreasing function, it must be that $-5 \leq a \leq 7$. In this range,
\[ P[V > a] = 1 - F_V(a) = 1 - (a + 5)^2/144 = 2/3. \]
The unique solution in the range $-5 \leq a \leq 7$ is $a = 4\sqrt{3} - 5 = 1.928$.

Problem 3.1.3 Solution

In this problem, the CDF of $W$ is
\[ F_W(w) = \begin{cases} 0 & w < -5 \\ (w + 5)/8 & -5 \leq w < -3 \\ 1/4 & -3 \leq w < 3 \\ 1/4 + 3(w - 3)/8 & 3 \leq w < 5 \\ 1 & w \geq 5 \end{cases} \]
Each question can be answered directly from this CDF.

(a) $P[W \leq 4] = F_W(4) = 1/4 + 3/8 = 5/8$.

(b) $P[-2 < W \leq 2] = F_W(2) - F_W(-2) = 1/4 - 1/4 = 0$.

(c) $P[W > 0] = 1 - P[W \leq 0] = 1 - F_W(0) = 3/4$.

(d) By inspection of $F_W(w)$, we observe that $P[W \leq a] = F_W(a) = 1/2$ for $a$ in the range $3 \leq a \leq 5$. In this range,
\[ F_W(a) = 1/4 + 3(a - 3)/8 = 1/2. \]
This implies $a = 11/3$.

Problem 3.1.4 Solution

(a) By definition, $\lceil nx \rceil$ is the smallest integer that is greater than or equal to $nx$. This implies $nx \leq \lceil nx \rceil \leq nx + 1$.

(b) By part (a),
\[ \frac{nx}{n} \leq \frac{\lceil nx \rceil}{n} \leq \frac{nx + 1}{n}. \]
That is,
\[ x \leq \frac{\lceil nx \rceil}{n} \leq x + \frac{1}{n}. \]
This implies
\[ x \leq \lim_{n \to \infty} \frac{\lceil nx \rceil}{n} \leq \lim_{n \to \infty} \left( x + \frac{1}{n} \right) = x. \]

(c) In the same way, $\lfloor nx \rfloor$ is the largest integer that is less than or equal to $nx$. This implies $nx - 1 \leq \lfloor nx \rfloor \leq nx$. It follows that
\[ \frac{nx - 1}{n} \leq \frac{\lfloor nx \rfloor}{n} \leq \frac{nx}{n}. \]
That is,
\[ x - \frac{1}{n} \leq \frac{\lfloor nx \rfloor}{n} \leq x. \]
This implies
\[ \lim_{n \to \infty} \left( x - \frac{1}{n} \right) = x \leq \lim_{n \to \infty} \frac{\lfloor nx \rfloor}{n} \leq x. \]

Problem 3.2.1 Solution

\[ f_X(x) = \begin{cases} cx & 0 \leq x \leq 2 \\ 0 & \text{otherwise} \end{cases} \]

(a) From the above PDF we can determine the value of $c$ by integrating the PDF and setting it equal to 1:
\[ \int_0^2 cx \, dx = 2c = 1. \]
Therefore $c = 1/2$.

(b) $P[0 \leq X \leq 1] = \int_0^1 \frac{x}{2} \, dx = 1/4$.

(c) $P[-1/2 \leq X \leq 1/2] = \int_0^{1/2} \frac{x}{2} \, dx = 1/16$.

(d) The CDF of $X$ is found by integrating the PDF from 0 to $x$:
\[ F_X(x) = \int_0^x f_X(x') \, dx' = \begin{cases} 0 & x < 0 \\ x^2/4 & 0 \leq x \leq 2 \\ 1 & x > 2 \end{cases} \]

Problem 3.2.2 Solution

From the CDF, we can find the PDF by direct differentiation. The CDF and corresponding PDF are
\[ F_X(x) = \begin{cases} 0 & x < -1 \\ (x + 1)/2 & -1 \leq x \leq 1 \\ 1 & x > 1 \end{cases} \qquad f_X(x) = \begin{cases} 1/2 & -1 \leq x \leq 1 \\ 0 & \text{otherwise} \end{cases} \]

Problem 3.2.3 Solution

We find the PDF by taking the derivative of $F_U(u)$ on each piece that $F_U(u)$ is defined. The CDF and corresponding PDF of $U$ are
\[ F_U(u) = \begin{cases} 0 & u < -5 \\ (u + 5)/8 & -5 \leq u < -3 \\ 1/4 & -3 \leq u < 3 \\ 1/4 + 3(u - 3)/8 & 3 \leq u < 5 \\ 1 & u \geq 5 \end{cases} \qquad f_U(u) = \begin{cases} 0 & u < -5 \\ 1/8 & -5 \leq u < -3 \\ 0 & -3 \leq u < 3 \\ 3/8 & 3 \leq u < 5 \\ 0 & u \geq 5 \end{cases} \]
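The piecewise CDF that appears in Problems 3.1.3 and 3.2.3 is easy to encode and probe. The following Python sketch, with the illustrative names `cdf_u` and `slope`, evaluates the probabilities from Problem 3.1.3 and recovers the PDF values of Problem 3.2.3 as central-difference slopes; exact fractions are used so that the comparison is free of rounding.

```python
from fractions import Fraction as F


def cdf_u(u):
    """Piecewise-linear CDF shared by W (Problem 3.1.3) and U (Problem 3.2.3)."""
    u = F(u)
    if u < -5:
        return F(0)
    if u < -3:
        return (u + 5) / 8
    if u < 3:
        return F(1, 4)
    if u < 5:
        return F(1, 4) + 3 * (u - 3) / 8
    return F(1)


def slope(u, h=F(1, 100)):
    """Central-difference estimate of the PDF at an interior point of a piece."""
    return (cdf_u(u + h) - cdf_u(u - h)) / (2 * h)


p_a = cdf_u(4)                      # P[W <= 4]
p_b = cdf_u(2) - cdf_u(-2)          # P[-2 < W <= 2]
p_c = 1 - cdf_u(0)                  # P[W > 0]
median = cdf_u(F(11, 3))            # F_W(11/3)
pdf_values = [slope(-4), slope(0), slope(4)]
print(p_a, p_b, p_c, median, pdf_values)
```

Because each piece is linear, the finite-difference slope is exact away from the breakpoints at -5, -3, 3 and 5.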
Problem 3.2.4 Solution

For $x < 0$, $F_X(x) = 0$. For $x \geq 0$,
\[ F_X(x) = \int_0^x f_X(y) \, dy = \int_0^x a^2 y e^{-a^2 y^2/2} \, dy = \left. -e^{-a^2 y^2/2} \right|_0^x = 1 - e^{-a^2 x^2/2}. \]
A complete expression for the CDF of $X$ is
\[ F_X(x) = \begin{cases} 0 & x < 0 \\ 1 - e^{-a^2 x^2/2} & x \geq 0 \end{cases} \]

Problem 3.2.5 Solution

\[ f_X(x) = \begin{cases} ax^2 + bx & 0 \leq x \leq 1 \\ 0 & \text{otherwise} \end{cases} \]
First, we note that $a$ and $b$ must be chosen such that the above PDF integrates to 1:
\[ \int_0^1 (ax^2 + bx) \, dx = a/3 + b/2 = 1. \]
Hence, $b = 2 - 2a/3$ and our PDF becomes
\[ f_X(x) = x(ax + 2 - 2a/3). \]
For the PDF to be non-negative for $x \in [0, 1]$, we must have $ax + 2 - 2a/3 \geq 0$ for all $x \in [0, 1]$. This requirement can be written as
\[ a(2/3 - x) \leq 2 \qquad (0 \leq x \leq 1). \]
For $x = 2/3$, the requirement holds for all $a$. However, the problem is tricky because we must consider the cases $0 \leq x < 2/3$ and $2/3 < x \leq 1$ separately because of the sign change of the inequality. When $0 \leq x < 2/3$, we have $2/3 - x > 0$ and the requirement is most stringent at $x = 0$ where we require $2a/3 \leq 2$ or $a \leq 3$. When $2/3 < x \leq 1$, we can write the constraint as $a(x - 2/3) \geq -2$. In this case, the constraint is most stringent at $x = 1$, where we must have $a/3 \geq -2$ or $a \geq -6$. Thus a complete expression for our requirements is
\[ -6 \leq a \leq 3, \qquad b = 2 - 2a/3. \]
As we see in the following plot, the shape of the PDF $f_X(x)$ varies greatly with the value of $a$.

[Figure: plots of f_X(x) for 0 ≤ x ≤ 1 with a = −6, a = −3, a = 0, and a = 3.]

Problem 3.3.1 Solution

\[ f_X(x) = \begin{cases} 1/4 & -1 \leq x \leq 3 \\ 0 & \text{otherwise} \end{cases} \]
We recognize that $X$ is a uniform random variable from $[-1, 3]$.

(a) $E[X] = 1$ and $\mathrm{Var}[X] = \frac{(3 + 1)^2}{12} = 4/3$.

(b) The new random variable $Y$ is defined as $Y = h(X) = X^2$. Therefore
\[ h(E[X]) = h(1) = 1 \]
and
\[ E[h(X)] = E[X^2] = \mathrm{Var}[X] + E[X]^2 = 4/3 + 1 = 7/3. \]

(c) Finally
\[ E[Y] = E[h(X)] = E[X^2] = 7/3, \]
\[ \mathrm{Var}[Y] = E[X^4] - (E[X^2])^2 = \int_{-1}^{3} \frac{x^4}{4} \, dx - \frac{49}{9} = \frac{61}{5} - \frac{49}{9} = \frac{304}{45}. \]

Problem 3.3.2 Solution

(a) Since the PDF is uniform over $[1, 9]$,
\[ E[X] = \frac{1 + 9}{2} = 5, \qquad \mathrm{Var}[X] = \frac{(9 - 1)^2}{12} = \frac{16}{3}. \]

(b) Define $h(X) = 1/\sqrt{X}$. Then
\[ h(E[X]) = 1/\sqrt{5}, \qquad E[h(X)] = \int_1^9 \frac{x^{-1/2}}{8} \, dx = 1/2. \]

(c)
\[ E[Y] = E[h(X)] = 1/2, \]
\[ \mathrm{Var}[Y] = E[Y^2] - (E[Y])^2 = \int_1^9 \frac{x^{-1}}{8} \, dx - (E[Y])^2 = \frac{\ln 9}{8} - \frac{1}{4} \approx 0.0247. \]

Problem 3.3.3 Solution

The CDF of $X$ is
\[ F_X(x) = \begin{cases} 0 & x < 0 \\ x/2 & 0 \leq x < 2 \\ 1 & x \geq 2 \end{cases} \]

(a) To find $E[X]$, we first find the PDF by differentiating the above CDF:
\[ f_X(x) = \begin{cases} 1/2 & 0 \leq x \leq 2 \\ 0 & \text{otherwise} \end{cases} \]
The expected value is then
\[ E[X] = \int_0^2 \frac{x}{2} \, dx = 1. \]

(b)
\[ E[X^2] = \int_0^2 \frac{x^2}{2} \, dx = 4/3, \]
\[ \mathrm{Var}[X] = E[X^2] - E[X]^2 = 4/3 - 1 = 1/3. \]

Problem 3.3.4 Solution

We can find the expected value of $Y$ by direct integration of the given PDF
\[ f_Y(y) = \begin{cases} y/2 & 0 \leq y \leq 2 \\ 0 & \text{otherwise} \end{cases} \]
The expectation is
\[ E[Y] = \int_0^2 \frac{y^2}{2} \, dy = 4/3. \]
To find the variance, we first find the second moment
\[ E[Y^2] = \int_0^2 \frac{y^3}{2} \, dy = 2. \]
The variance is then $\mathrm{Var}[Y] = E[Y^2] - E[Y]^2 = 2 - (4/3)^2 = 2/9$.

Problem 3.3.5 Solution

The CDF of $Y$ is
\[ F_Y(y) = \begin{cases} 0 & y < -1 \\ (y + 1)/2 & -1 \leq y < 1 \\ 1 & y \geq 1 \end{cases} \]

(a) We can find the expected value of $Y$ by first differentiating the above CDF to find the PDF
\[ f_Y(y) = \begin{cases} 1/2 & -1 \leq y \leq 1 \\ 0 & \text{otherwise} \end{cases} \]
It follows that
\[ E[Y] = \int_{-1}^{1} \frac{y}{2} \, dy = 0. \]

(b)
\[ E[Y^2] = \int_{-1}^{1} \frac{y^2}{2} \, dy = 1/3, \]
\[ \mathrm{Var}[Y] = E[Y^2] - E[Y]^2 = 1/3 - 0 = 1/3. \]

Problem 3.3.6 Solution

To evaluate the moments of $V$, we need the PDF $f_V(v)$, which we find by taking the derivative of the CDF $F_V(v)$. The CDF and corresponding PDF of $V$ are
\[ F_V(v) = \begin{cases} 0 & v < -5 \\ (v + 5)^2/144 & -5 \leq v < 7 \\ 1 & v \geq 7 \end{cases} \qquad f_V(v) = \begin{cases} 0 & v < -5 \\ (v + 5)/72 & -5 \leq v < 7 \\ 0 & v \geq 7 \end{cases} \]

(a) The expected value of $V$ is
\[ E[V] = \int_{-\infty}^{\infty} v f_V(v) \, dv = \frac{1}{72} \int_{-5}^{7} (v^2 + 5v) \, dv = \frac{1}{72} \left. \left( \frac{v^3}{3} + \frac{5v^2}{2} \right) \right|_{-5}^{7} = \frac{1}{72} \left( \frac{343}{3} + \frac{245}{2} + \frac{125}{3} - \frac{125}{2} \right) = 3. \]

(b) To find the variance, we first find the second moment
\[ E[V^2] = \int_{-\infty}^{\infty} v^2 f_V(v) \, dv = \frac{1}{72} \int_{-5}^{7} (v^3 + 5v^2) \, dv = \frac{1}{72} \left. \left( \frac{v^4}{4} + \frac{5v^3}{3} \right) \right|_{-5}^{7} = \frac{444 + 780}{72} = 17. \]
The variance is $\mathrm{Var}[V] = E[V^2] - (E[V])^2 = 17 - 9 = 8$.

(c) The third moment of $V$ is
\[ E[V^3] = \int_{-\infty}^{\infty} v^3 f_V(v) \, dv = \frac{1}{72} \int_{-5}^{7} (v^4 + 5v^3) \, dv = \frac{1}{72} \left. \left( \frac{v^5}{5} + \frac{5v^4}{4} \right) \right|_{-5}^{7} = 86.2. \]
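The polynomial integrals in Problem 3.3.6 are a natural place for an exact arithmetic check. The sketch below assumes the PDF f_V(v) = (v + 5)/72 on [-5, 7) from that solution and evaluates each moment from the antiderivative using Python fractions; `moment_v` is a hypothetical helper name.

```python
from fractions import Fraction as F


def moment_v(k):
    """E[V^k] = (1/72) * integral of (v^(k+1) + 5 v^k) over [-5, 7], evaluated exactly."""
    def antiderivative(v):
        v = F(v)
        return v ** (k + 2) / (k + 2) + 5 * v ** (k + 1) / (k + 1)
    return (antiderivative(7) - antiderivative(-5)) / 72


total_probability = moment_v(0)
mean_v = moment_v(1)
second_moment_v = moment_v(2)
variance_v = second_moment_v - mean_v ** 2
third_moment_v = moment_v(3)
print(total_probability, mean_v, second_moment_v, variance_v, float(third_moment_v))
```

A second way to see the variance is to note that V + 5 has the triangular PDF w/72 on [0, 12], whose mean is 8 and second moment is 72.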
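Problem 3.3.7 Solution

To find the moments, we first find the PDF of $U$ by taking the derivative of $F_U(u)$. The CDF and corresponding PDF are
\[ F_U(u) = \begin{cases} 0 & u < -5 \\ (u + 5)/8 & -5 \leq u < -3 \\ 1/4 & -3 \leq u < 3 \\ 1/4 + 3(u - 3)/8 & 3 \leq u < 5 \\ 1 & u \geq 5 \end{cases} \qquad f_U(u) = \begin{cases} 0 & u < -5 \\ 1/8 & -5 \leq u < -3 \\ 0 & -3 \leq u < 3 \\ 3/8 & 3 \leq u < 5 \\ 0 & u \geq 5 \end{cases} \]

(a) The expected value of $U$ is
\[ E[U] = \int_{-\infty}^{\infty} u f_U(u) \, du = \int_{-5}^{-3} \frac{u}{8} \, du + \int_3^5 \frac{3u}{8} \, du = \left. \frac{u^2}{16} \right|_{-5}^{-3} + \left. \frac{3u^2}{16} \right|_3^5 = 2. \]

(b) The second moment of $U$ is
\[ E[U^2] = \int_{-\infty}^{\infty} u^2 f_U(u) \, du = \int_{-5}^{-3} \frac{u^2}{8} \, du + \int_3^5 \frac{3u^2}{8} \, du = \left. \frac{u^3}{24} \right|_{-5}^{-3} + \left. \frac{u^3}{8} \right|_3^5 = 49/3. \]
The variance of $U$ is $\mathrm{Var}[U] = E[U^2] - (E[U])^2 = 37/3$.

(c) Note that $2^U = e^{(\ln 2)U}$. This implies that
\[ \int 2^u \, du = \int e^{(\ln 2)u} \, du = \frac{1}{\ln 2} e^{(\ln 2)u} = \frac{2^u}{\ln 2}. \]
The expected value of $2^U$ is then
\[ E[2^U] = \int_{-\infty}^{\infty} 2^u f_U(u) \, du = \int_{-5}^{-3} \frac{2^u}{8} \, du + \int_3^5 \frac{3 \cdot 2^u}{8} \, du = \left. \frac{2^u}{8 \ln 2} \right|_{-5}^{-3} + \left. \frac{3 \cdot 2^u}{8 \ln 2} \right|_3^5 = \frac{2307}{256 \ln 2} = 13.001. \]

Problem 3.3.8 Solution

The Pareto $(\alpha, \mu)$ random variable has PDF
\[ f_X(x) = \begin{cases} (\alpha/\mu)(x/\mu)^{-(\alpha+1)} & x \geq \mu \\ 0 & \text{otherwise} \end{cases} \]
The $n$th moment is
\[ E[X^n] = \int_{\mu}^{\infty} x^n \frac{\alpha}{\mu} \left( \frac{x}{\mu} \right)^{-(\alpha+1)} dx = \mu^n \int_{\mu}^{\infty} \frac{\alpha}{\mu} \left( \frac{x}{\mu} \right)^{-(\alpha-n+1)} dx. \]
With the variable substitution $y = x/\mu$, we obtain
\[ E[X^n] = \alpha \mu^n \int_1^{\infty} y^{-(\alpha-n+1)} \, dy. \]
We see that $E[X^n] < \infty$ if and only if $\alpha - n + 1 > 1$, or, equivalently, $n < \alpha$. In this case,
\[ E[X^n] = \left. \frac{\alpha \mu^n}{-(\alpha - n + 1) + 1} y^{-(\alpha-n+1)+1} \right|_{y=1}^{y=\infty} = \left. \frac{-\alpha \mu^n}{\alpha - n} y^{-(\alpha-n)} \right|_{y=1}^{y=\infty} = \frac{\alpha \mu^n}{\alpha - n}. \]

Problem 3.4.1 Solution

The reflected power $Y$ has an exponential $(\lambda = 1/P_0)$ PDF. From Theorem 3.8, $E[Y] = P_0$. The probability that an aircraft is correctly identified is
\[ P[Y > P_0] = \int_{P_0}^{\infty} \frac{1}{P_0} e^{-y/P_0} \, dy = e^{-1}. \]
Fortunately, real radar systems offer better performance.

Problem 3.4.2 Solution

From Appendix A, we observe that an exponential random variable $Y$ with parameter $\lambda > 0$ has PDF
\[ f_Y(y) = \begin{cases} \lambda e^{-\lambda y} & y \geq 0 \\ 0 & \text{otherwise} \end{cases} \]
In addition, the mean and variance of $Y$ are
\[ E[Y] = \frac{1}{\lambda}, \qquad \mathrm{Var}[Y] = \frac{1}{\lambda^2}. \]

(a) Since $\mathrm{Var}[Y] = 25$, we must have $\lambda = 1/5$.

(b) The expected value of $Y$ is $E[Y] = 1/\lambda = 5$.

(c)
\[ P[Y > 5] = \int_5^{\infty} f_Y(y) \, dy = \left. -e^{-y/5} \right|_5^{\infty} = e^{-1}. \]

Problem 3.4.3 Solution

From Appendix A, an Erlang random variable $X$ with parameters $\lambda > 0$ and $n$ has PDF
\[ f_X(x) = \begin{cases} \lambda^n x^{n-1} e^{-\lambda x}/(n - 1)! & x \geq 0 \\ 0 & \text{otherwise} \end{cases} \]
In addition, the mean and variance of $X$ are
\[ E[X] = \frac{n}{\lambda}, \qquad \mathrm{Var}[X] = \frac{n}{\lambda^2}. \]

(a) Since $\lambda = 1/3$ and $E[X] = n/\lambda = 15$, we must have $n = 5$.

(b) Substituting the parameters $n = 5$ and $\lambda = 1/3$ into the given PDF, we obtain
\[ f_X(x) = \begin{cases} (1/3)^5 x^4 e^{-x/3}/24 & x \geq 0 \\ 0 & \text{otherwise} \end{cases} \]

(c) From above, we know that $\mathrm{Var}[X] = n/\lambda^2 = 45$.

Problem 3.4.4 Solution

Since $Y$ is an Erlang random variable with parameters $\lambda = 2$ and $n = 2$, we find in Appendix A that
\[ f_Y(y) = \begin{cases} 4y e^{-2y} & y \geq 0 \\ 0 & \text{otherwise} \end{cases} \]

(a) Appendix A tells us that $E[Y] = n/\lambda = 1$.

(b) Appendix A also tells us that $\mathrm{Var}[Y] = n/\lambda^2 = 1/2$.

(c) The probability that $1/2 \leq Y < 3/2$ is
\[ P[1/2 \leq Y < 3/2] = \int_{1/2}^{3/2} f_Y(y) \, dy = \int_{1/2}^{3/2} 4y e^{-2y} \, dy. \]
This integral is easily completed using the integration by parts formula $\int u \, dv = uv - \int v \, du$ with
\[ u = 2y, \quad dv = 2e^{-2y} \, dy, \quad du = 2 \, dy, \quad v = -e^{-2y}. \]
Making these substitutions, we obtain
\[ P[1/2 \leq Y < 3/2] = \left. -2y e^{-2y} \right|_{1/2}^{3/2} + \int_{1/2}^{3/2} 2e^{-2y} \, dy = 2e^{-1} - 4e^{-3} = 0.537. \]

Problem 3.4.5 Solution

(a) The PDF of a continuous uniform $(-5, 5)$ random variable is
\[ f_X(x) = \begin{cases} 1/10 & -5 \leq x \leq 5 \\ 0 & \text{otherwise} \end{cases} \]

(b) For $x < -5$, $F_X(x) = 0$. For $x \geq 5$, $F_X(x) = 1$. For $-5 \leq x \leq 5$, the CDF is
\[ F_X(x) = \int_{-5}^{x} f_X(\tau) \, d\tau = \frac{x + 5}{10}. \]
The complete expression for the CDF of $X$ is
\[ F_X(x) = \begin{cases} 0 & x < -5 \\ (x + 5)/10 & -5 \leq x \leq 5 \\ 1 & x > 5 \end{cases} \]

(c) The expected value of $X$ is
\[ \int_{-5}^{5} \frac{x}{10} \, dx = \left. \frac{x^2}{20} \right|_{-5}^{5} = 0. \]
Another way to obtain this answer is to use Theorem 3.6 which says the expected value of $X$ is $E[X] = (5 + (-5))/2 = 0$.

(d) The fifth moment of $X$ is
\[ \int_{-5}^{5} \frac{x^5}{10} \, dx = \left. \frac{x^6}{60} \right|_{-5}^{5} = 0. \]

(e) The expected value of $e^X$ is
\[ \int_{-5}^{5} \frac{e^x}{10} \, dx = \left. \frac{e^x}{10} \right|_{-5}^{5} = \frac{e^5 - e^{-5}}{10} = 14.84. \]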
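The integration by parts result in Problem 3.4.4(c) can be cross-checked against the closed-form Erlang CDF of Theorem 3.11. The Python sketch below is illustrative only; `erlang_cdf` is a name chosen here, and the formula it encodes is the theorem's sum rather than anything taken from matcode.

```python
from math import exp, factorial


def erlang_cdf(x, n, lam):
    """Erlang (n, lam) CDF for x >= 0: 1 - sum_{k<n} (lam x)^k e^{-lam x} / k!."""
    if x < 0:
        return 0.0
    return 1.0 - sum((lam * x) ** k * exp(-lam * x) / factorial(k) for k in range(n))


# Problem 3.4.4(c): Y is Erlang with n = 2, lambda = 2
prob = erlang_cdf(1.5, 2, 2.0) - erlang_cdf(0.5, 2, 2.0)
closed_form = 2 * exp(-1) - 4 * exp(-3)
print(round(prob, 3), round(closed_form, 3))
```

Since Y is continuous, it makes no difference whether the endpoints 1/2 and 3/2 are included in the event.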
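Problem 3.4.6 Solution

We know that $X$ has a uniform PDF over $[a, b)$ and has mean $\mu_X = 7$ and variance $\mathrm{Var}[X] = 3$. All that is left to do is determine the values of the constants $a$ and $b$, to complete the model of the uniform PDF.
\[ E[X] = \frac{a + b}{2} = 7, \qquad \mathrm{Var}[X] = \frac{(b - a)^2}{12} = 3. \]
Since we assume $b > a$, this implies
\[ a + b = 14, \qquad b - a = 6. \]
Solving these two equations, we arrive at
\[ a = 4, \qquad b = 10. \]
And the resulting PDF of $X$ is
\[ f_X(x) = \begin{cases} 1/6 & 4 \leq x \leq 10 \\ 0 & \text{otherwise} \end{cases} \]

Problem 3.4.7 Solution

Given that
\[ f_X(x) = \begin{cases} (1/2)e^{-x/2} & x \geq 0 \\ 0 & \text{otherwise} \end{cases} \]

(a)
\[ P[1 \leq X \leq 2] = \int_1^2 (1/2)e^{-x/2} \, dx = e^{-1/2} - e^{-1} = 0.2387. \]

(b) The CDF of $X$ may be expressed as
\[ F_X(x) = \begin{cases} 0 & x < 0 \\ \int_0^x (1/2)e^{-\tau/2} \, d\tau & x \geq 0 \end{cases} = \begin{cases} 0 & x < 0 \\ 1 - e^{-x/2} & x \geq 0 \end{cases} \]

(c) $X$ is an exponential random variable with parameter $a = 1/2$. By Theorem 3.8, the expected value of $X$ is $E[X] = 1/a = 2$.

(d) By Theorem 3.8, the variance of $X$ is $\mathrm{Var}[X] = 1/a^2 = 4$.

Problem 3.4.8 Solution

Given the uniform PDF
\[ f_U(u) = \begin{cases} 1/(b - a) & a \leq u \leq b \\ 0 & \text{otherwise} \end{cases} \]
the mean of $U$ can be found by integrating
\[ E[U] = \int_a^b \frac{u}{b - a} \, du = \frac{b^2 - a^2}{2(b - a)} = \frac{b + a}{2}, \]
where we factored $(b^2 - a^2) = (b - a)(b + a)$. The variance of $U$ can also be found by finding $E[U^2]$:
\[ E[U^2] = \int_a^b \frac{u^2}{b - a} \, du = \frac{b^3 - a^3}{3(b - a)}. \]
Therefore the variance is
\[ \mathrm{Var}[U] = \frac{b^3 - a^3}{3(b - a)} - \left( \frac{b + a}{2} \right)^2 = \frac{(b - a)^2}{12}. \]

Problem 3.4.9 Solution

Let $X$ denote the holding time of a call. The PDF of $X$ is
\[ f_X(x) = \begin{cases} (1/\tau)e^{-x/\tau} & x \geq 0 \\ 0 & \text{otherwise} \end{cases} \]
We will use $C_A(X)$ and $C_B(X)$ to denote the cost of a call under the two plans. From the problem statement, we note that $C_A(X) = 10X$ so that $E[C_A(X)] = 10E[X] = 10\tau$. On the other hand
\[ C_B(X) = 99 + 10(X - 20)^+ \]
where $y^+ = y$ if $y \geq 0$; otherwise $y^+ = 0$ for $y < 0$. Thus,
\[ E[C_B(X)] = E[99 + 10(X - 20)^+] = 99 + 10E[(X - 20)^+] \]
\[ = 99 + 10E[(X - 20)^+ | X \leq 20] P[X \leq 20] + 10E[(X - 20)^+ | X > 20] P[X > 20]. \]
Given $X \leq 20$, $(X - 20)^+ = 0$. Thus $E[(X - 20)^+ | X \leq 20] = 0$ and
\[ E[C_B(X)] = 99 + 10E[(X - 20) | X > 20] P[X > 20]. \]
Finally, we observe that $P[X > 20] = e^{-20/\tau}$ and that
\[ E[(X - 20) | X > 20] = \tau \]
since given $X \geq 20$, $X - 20$ has a PDF identical to $X$ by the memoryless property of the exponential random variable. Thus,
\[ E[C_B(X)] = 99 + 10\tau e^{-20/\tau}. \]
Some numeric comparisons show that $E[C_B(X)] \leq E[C_A(X)]$ if $\tau > 12.34$ minutes. That is, the flat price for the first 20 minutes is a good deal only if your average phone call is sufficiently long.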
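The "numeric comparisons" mentioned at the end of Problem 3.4.9 can be made concrete with a root finder. The following Python sketch uses plain bisection on the difference of the two expected costs; the function names and the bracketing interval [1, 100] are assumptions made for this illustration.

```python
from math import exp


def cost_a(tau):
    """Expected cost under plan A: 10 * E[X] = 10 * tau."""
    return 10.0 * tau


def cost_b(tau):
    """Expected cost under plan B: 99 + 10 * tau * exp(-20 / tau)."""
    return 99.0 + 10.0 * tau * exp(-20.0 / tau)


def break_even(lo=1.0, hi=100.0, tol=1e-9):
    """Bisection for the tau where the two plans cost the same.

    Plan A is cheaper at lo and more expensive at hi, so the crossing lies between.
    """
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if cost_a(mid) < cost_b(mid):
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


tau_star = break_even()
print(round(tau_star, 2))
```

For average holding times above the break-even value plan B is the cheaper choice, and below it plan A is.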
Problem 3.4.10 Solution

The integral $I_1$ is
\[ I_1 = \int_0^{\infty} \lambda e^{-\lambda x} \, dx = \left. -e^{-\lambda x} \right|_0^{\infty} = 1. \]
For $n > 1$, we have
\[ I_n = \int_0^{\infty} \underbrace{\frac{\lambda^{n-1} x^{n-1}}{(n - 1)!}}_{u} \, \underbrace{\lambda e^{-\lambda x} \, dx}_{dv}. \]
We define $u$ and $dv$ as shown above in order to use the integration by parts formula $\int u \, dv = uv - \int v \, du$. Since
\[ du = \frac{\lambda^{n-1} x^{n-2}}{(n - 2)!} \, dx, \qquad v = -e^{-\lambda x}, \]
we can write
\[ I_n = \left. uv \right|_0^{\infty} - \int_0^{\infty} v \, du = \left. -\frac{\lambda^{n-1} x^{n-1}}{(n - 1)!} e^{-\lambda x} \right|_0^{\infty} + \int_0^{\infty} \frac{\lambda^{n-1} x^{n-2}}{(n - 2)!} e^{-\lambda x} \, dx = 0 + I_{n-1}. \]
Hence, $I_n = 1$ for all $n \geq 1$.

Problem 3.4.11 Solution

For an Erlang $(n, \lambda)$ random variable $X$, the $k$th moment is
\[ E[X^k] = \int_0^{\infty} x^k f_X(x) \, dx = \int_0^{\infty} \frac{\lambda^n x^{n+k-1}}{(n - 1)!} e^{-\lambda x} \, dx = \frac{(n + k - 1)!}{\lambda^k (n - 1)!} \underbrace{\int_0^{\infty} \frac{\lambda^{n+k} x^{n+k-1}}{(n + k - 1)!} e^{-\lambda x} \, dx}_{1}. \]
The above marked integral equals 1 since it is the integral of an Erlang PDF with parameters $\lambda$ and $n + k$ over all possible values. Hence,
\[ E[X^k] = \frac{(n + k - 1)!}{\lambda^k (n - 1)!}. \]
This implies that the first and second moments are
\[ E[X] = \frac{n!}{(n - 1)! \lambda} = \frac{n}{\lambda}, \qquad E[X^2] = \frac{(n + 1)!}{\lambda^2 (n - 1)!} = \frac{(n + 1)n}{\lambda^2}. \]
It follows that the variance of $X$ is $n/\lambda^2$.

Problem 3.4.12 Solution

In this problem, we prove Theorem 3.11 which says that for $x \geq 0$, the CDF of an Erlang $(n, \lambda)$ random variable $X_n$ satisfies
\[ F_{X_n}(x) = 1 - \sum_{k=0}^{n-1} \frac{(\lambda x)^k e^{-\lambda x}}{k!}. \]
We do this in two steps. First, we derive a relationship between $F_{X_n}(x)$ and $F_{X_{n-1}}(x)$. Second, we use that relationship to prove the theorem by induction.

(a) By Definition 3.7, the CDF of the Erlang $(n, \lambda)$ random variable $X_n$ is
\[ F_{X_n}(x) = \int_{-\infty}^{x} f_{X_n}(t) \, dt = \int_0^x \frac{\lambda^n t^{n-1} e^{-\lambda t}}{(n - 1)!} \, dt. \]

(b) To use integration by parts, we define
\[ u = \frac{t^{n-1}}{(n - 1)!}, \quad dv = \lambda^n e^{-\lambda t} \, dt, \quad du = \frac{t^{n-2}}{(n - 2)!} \, dt, \quad v = -\lambda^{n-1} e^{-\lambda t}. \]
Thus, using the integration by parts formula $\int u \, dv = uv - \int v \, du$, we have
\[ F_{X_n}(x) = \int_0^x \frac{\lambda^n t^{n-1} e^{-\lambda t}}{(n - 1)!} \, dt = \left. -\frac{\lambda^{n-1} t^{n-1} e^{-\lambda t}}{(n - 1)!} \right|_0^x + \int_0^x \frac{\lambda^{n-1} t^{n-2} e^{-\lambda t}}{(n - 2)!} \, dt = -\frac{\lambda^{n-1} x^{n-1} e^{-\lambda x}}{(n - 1)!} + F_{X_{n-1}}(x). \]

(c) Now we do proof by induction. For $n = 1$, the Erlang $(n, \lambda)$ random variable $X_1$ is simply an exponential random variable. Hence for $x \geq 0$, $F_{X_1}(x) = 1 - e^{-\lambda x}$. Now we suppose the claim is true for $F_{X_{n-1}}(x)$ so that
\[ F_{X_{n-1}}(x) = 1 - \sum_{k=0}^{n-2} \frac{(\lambda x)^k e^{-\lambda x}}{k!}. \]
Using the result of part (b), we can write
\[ F_{X_n}(x) = F_{X_{n-1}}(x) - \frac{(\lambda x)^{n-1} e^{-\lambda x}}{(n - 1)!} = 1 - \sum_{k=0}^{n-2} \frac{(\lambda x)^k e^{-\lambda x}}{k!} - \frac{(\lambda x)^{n-1} e^{-\lambda x}}{(n - 1)!} = 1 - \sum_{k=0}^{n-1} \frac{(\lambda x)^k e^{-\lambda x}}{k!}, \]
which proves the claim.

Problem 3.4.13 Solution

For $n = 1$, we have the fact $E[X] = 1/\lambda$ that is given in the problem statement. Now we assume that $E[X^{n-1}] = (n - 1)!/\lambda^{n-1}$. To complete the proof, we show that this implies that $E[X^n] = n!/\lambda^n$. Specifically, we write
\[ E[X^n] = \int_0^{\infty} x^n \lambda e^{-\lambda x} \, dx. \]
Now we use the integration by parts formula $\int u \, dv = uv - \int v \, du$ with $u = x^n$ and $dv = \lambda e^{-\lambda x} \, dx$. This implies $du = nx^{n-1} \, dx$ and $v = -e^{-\lambda x}$ so that
\[ E[X^n] = \left. -x^n e^{-\lambda x} \right|_0^{\infty} + \int_0^{\infty} nx^{n-1} e^{-\lambda x} \, dx = 0 + \frac{n}{\lambda} \int_0^{\infty} x^{n-1} \lambda e^{-\lambda x} \, dx = \frac{n}{\lambda} E[X^{n-1}]. \]
By our induction hypothesis, $E[X^{n-1}] = (n - 1)!/\lambda^{n-1}$ which implies
\[ E[X^n] = n!/\lambda^n. \]

Problem 3.4.14 Solution

(a) Since $f_X(x) \geq 0$ and $x \geq r$ over the entire integral, we can write
\[ \int_r^{\infty} x f_X(x) \, dx \geq \int_r^{\infty} r f_X(x) \, dx = rP[X > r]. \]

(b) We can write the expected value of $X$ in the form
\[ E[X] = \int_0^r x f_X(x) \, dx + \int_r^{\infty} x f_X(x) \, dx. \]
Hence,
\[ rP[X > r] \leq \int_r^{\infty} x f_X(x) \, dx = E[X] - \int_0^r x f_X(x) \, dx. \]
Allowing $r$ to approach infinity yields
\[ \lim_{r \to \infty} rP[X > r] \leq E[X] - \lim_{r \to \infty} \int_0^r x f_X(x) \, dx = E[X] - E[X] = 0. \]
Since $rP[X > r] \geq 0$ for all $r \geq 0$, we must have $\lim_{r \to \infty} rP[X > r] = 0$.

(c) We can use the integration by parts formula $\int u \, dv = uv - \int v \, du$ by defining $u = 1 - F_X(x)$ and $dv = dx$. This yields
\[ \int_0^{\infty} [1 - F_X(x)] \, dx = \left. x[1 - F_X(x)] \right|_0^{\infty} + \int_0^{\infty} x f_X(x) \, dx. \]
We now observe that
\[ \left. x[1 - F_X(x)] \right|_0^{\infty} = \lim_{r \to \infty} r[1 - F_X(r)] - 0 = \lim_{r \to \infty} rP[X > r]. \]
By part (b), $\lim_{r \to \infty} rP[X > r] = 0$ and this implies $\left. x[1 - F_X(x)] \right|_0^{\infty} = 0$. Thus,
\[ \int_0^{\infty} [1 - F_X(x)] \, dx = \int_0^{\infty} x f_X(x) \, dx = E[X]. \]
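The Erlang moment formula from Problem 3.4.11 can be compared against direct numerical integration. The Python sketch below uses a hand-written composite Simpson rule over a truncated range; the truncation point, step count and function names are all assumptions chosen for this illustration, not part of the text.

```python
from math import exp, factorial


def erlang_pdf(x, n, lam):
    """Erlang (n, lam) PDF for x >= 0."""
    return lam ** n * x ** (n - 1) * exp(-lam * x) / factorial(n - 1)


def simpson(f, a, b, steps):
    """Composite Simpson rule on [a, b]; steps must be even."""
    h = (b - a) / steps
    total = f(a) + f(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def erlang_moment_numeric(k, n, lam, upper=60.0, steps=12000):
    """E[X^k] by integrating x^k f_X(x) over [0, upper]; the tail beyond upper is ignored."""
    return simpson(lambda x: x ** k * erlang_pdf(x, n, lam), 0.0, upper, steps)


def erlang_moment_formula(k, n, lam):
    """E[X^k] = (n + k - 1)! / (lam^k (n - 1)!)."""
    return factorial(n + k - 1) / (lam ** k * factorial(n - 1))


numeric = erlang_moment_numeric(2, 3, 2.0)
formula = erlang_moment_formula(2, 3, 2.0)
print(numeric, formula)
```

With n = 1 the same functions cover the exponential case of Problem 3.4.13, where the formula reduces to k!/lam^k.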
we must have limr→∞ rP [X > r] = 0.866 91 . This yields ∞ 0 Since rP [X > r] ≥ 0 for all r ≥ 0.5) = 2Φ(1.933 = 0. 0 ∞ 0 [1 − FX (x)] dx = ∞ xfX (x) dx = E [X] (7) 0 Problem 3.5) = 1 − . we now observe that x [1 − FX (x)]|∞ = lim r[1 − FX (r)] − 0 = lim rP [X > r] 0 r→∞ r→∞ (6) By part (b).993 = 0. 1 for the Gaussian CDF. Using Table 3. Φ 10 − µT 15 = 1/2 (2) implies that (10 − µT )/15 = 0 or µT = 10. T exceeds 10 degrees. then Γ = 0.5. We do know. Now we have a Gaussian T with mean 10 and standard 92 .5.2 Solution The standard normal Gaussian random variable Z has mean µ = 0 and variance σ 2 = 1.11.5. and we do so by looking at the second fact. we have 1 Q(z) = √ π −x2 √ e z/ 2 ∞ 1 dx = erfc 2 z √ 2 (2) Problem 3. Problem 3. Therefore. 1 Q(z) = √ 2π ∞ z e−u 2 /2 du (1) √ Making the substitution x = u/ 2.Problem 3. First we would like to find the mean temperature. P [−10 ≤ X ≤ 10] = FX (10) − FX (−10) = 2Φ 10 σX −1 (2) This implies Φ(10/σX ) = 0. Making these substitutions in Definition 3. we find that 10/σX = 0.5 Solution Moving to Antarctica.4 Solution Repeating Definition 3.6.55.1 (1) We can find the variance Var[X] by expanding the above probability in terms of the Φ(·) function.15 or σX = 66. We also know that with probability 1/2. however.3 Solution X is a Gaussian random variable with zero mean but unknown variance. P [T > 10] = 1 − P [T ≤ 10] = 1 − Φ 10 − µT 15 = 1/2 (1) By looking at the table we find that if Φ(Γ) = 1/2. T is still Gaussian but with variance 225. that P [|X| ≤ 10] = 0. we find that the temperature.8 yields 1 2 fZ (z) = √ e−z /2 2π (1) Problem 3.5. 01 (5) Φ n n Equivalently.926 = 0.45) = 1 − 0.09.749 = 0.14 and the tables for the Φ and Q functions to answer the questions. we use Theorem 3. to be certain with probability 0. Hence.34 · 10−4 = 1 − Φ(0.91 × 10−6 20 5 (1) (2) The second part is a little trickier. we know that Yn − 40n 1000 − 40n 100 − 4n √ √ > √ = 0. In particular.99 that the prof spends $1000 will require more than 25 years. 
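The table lookups in Problems 3.5.1, 3.5.3 and 3.5.5 can be cross-checked numerically. The following is a minimal sketch, assuming Python's standard-library `statistics.NormalDist` as a stand-in for Table 3.1; the helper names `phi` and `q` are illustrative and are not part of the text's matcode library.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def phi(z):
    """Standard normal CDF, Phi(z)."""
    return STD_NORMAL.cdf(z)


def q(z):
    """Standard normal complementary CDF, Q(z) = 1 - Phi(z)."""
    return 1.0 - STD_NORMAL.cdf(z)


# Problem 3.5.1: T is Gaussian with mean 85 and standard deviation 10.
p_above_100 = q((100 - 85) / 10)
p_below_60 = phi((60 - 85) / 10)
p_70_to_100 = phi(1.5) - phi(-1.5)

# Problem 3.5.3: solve 2*Phi(10/sigma) - 1 = 0.1, i.e. Phi(10/sigma) = 0.55.
sigma_x = 10 / STD_NORMAL.inv_cdf(0.55)

# Problem 3.5.5: T is Gaussian with mean 10 and standard deviation 15.
p_above_32 = q((32 - 10) / 15)
p_below_0 = phi((0 - 10) / 15)
p_above_60 = q((60 - 10) / 15)

for name, value in [
    ("P[T > 100]", p_above_100),
    ("P[T < 60]", p_below_60),
    ("P[70 <= T <= 100]", p_70_to_100),
    ("sigma_X", sigma_x),
    ("P[T > 32]", p_above_32),
    ("P[T < 0]", p_below_0),
    ("P[T > 60]", p_above_60),
]:
    print(f"{name} = {value:.6g}")
```

Because the code evaluates \(\Phi\) at the exact argument rather than at a two-decimal table entry, small differences in the third significant digit relative to a table-based answer are expected.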
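Problem 3.5.6 Solution

In this problem, we use Theorem 3.14 and the tables for the \(\Phi\) and \(Q\) functions to answer the questions. Since \(E[Y_{20}] = 40(20) = 800\) and \(\mathrm{Var}[Y_{20}] = 100(20) = 2000\), we can write
\[ P[Y_{20} > 1000] = P\!\left[\frac{Y_{20} - 800}{\sqrt{2000}} > \frac{1000 - 800}{\sqrt{2000}}\right] = P\!\left[Z > \frac{200}{\sqrt{2000}}\right] = Q(4.47) = 3.91 \times 10^{-6}. \]
The second part is a little trickier. Since \(E[Y_{25}] = 1000\), we know that the prof will spend around \$1000 in roughly 25 years. However, to be certain with probability 0.99 that the prof spends \$1000 will require more than 25 years. In particular, we know that
\[ P[Y_n > 1000] = P\!\left[\frac{Y_n - 40n}{\sqrt{100n}} > \frac{1000 - 40n}{\sqrt{100n}}\right] = 1 - \Phi\!\left(\frac{100 - 4n}{\sqrt{n}}\right) = 0.99. \]
Hence, we must find \(n\) such that
\[ \Phi\!\left(\frac{100 - 4n}{\sqrt{n}}\right) = 0.01. \]
Recall that \(\Phi(x) = 0.01\) for a negative value of \(x\). This is consistent with our earlier observation that we would need \(n > 25\), corresponding to \(100 - 4n < 0\). Thus, we use the identity \(\Phi(x) = 1 - \Phi(-x)\) to write
\[ \Phi\!\left(\frac{100 - 4n}{\sqrt{n}}\right) = 1 - \Phi\!\left(\frac{4n - 100}{\sqrt{n}}\right) = 0.01. \]
Equivalently, we have
\[ \Phi\!\left(\frac{4n - 100}{\sqrt{n}}\right) = 0.99. \]
From the table of the \(\Phi\) function, we have that \((4n - 100)/\sqrt{n} = 2.33\), or
\[ (n - 25)^2 = (0.58)^2 n = 0.3393n. \]
Solving this quadratic yields \(n = 28.09\). Hence, only after 28 years are we 99 percent sure that the prof will have spent \$1000. Note that a second root of the quadratic yields \(n = 22.25\). This root is not a valid solution to our problem. Mathematically, it is a solution of our quadratic in which we choose the negative root of \(\sqrt{n}\). This would correspond to assuming the standard deviation of \(Y_n\) is negative.

Problem 3.5.7 Solution

We are given that there are 100,000,000 men in the United States and 23,000 of them are at least 7 feet tall, and the heights of U.S. men are independent Gaussian random variables with mean 5'10''.

(a) Let \(H\) denote the height in inches of a U.S. male. To find \(\sigma_X\), we look at the fact that the probability \(P[H \ge 84]\) is the number of men who are at least 7 feet tall divided by the total number of men (the frequency interpretation of probability). Since we measure \(H\) in inches, we have
\[ P[H \ge 84] = \frac{23{,}000}{100{,}000{,}000} = \Phi\!\left(\frac{70 - 84}{\sigma_X}\right) = 0.00023. \]
Since \(\Phi(-x) = 1 - \Phi(x) = Q(x)\),
\[ Q(14/\sigma_X) = 2.3 \cdot 10^{-4}. \]
From Table 3.2, this implies \(14/\sigma_X = 3.5\) or \(\sigma_X = 4\).

(b) The probability that a randomly chosen man is at least 8 feet tall is
\[ P[H \ge 96] = Q\!\left(\frac{96 - 70}{4}\right) = Q(6.5). \]
Unfortunately, Table 3.2 doesn't include \(Q(6.5)\), although it should be apparent that the probability is very small. In fact, \(Q(6.5) = 4.0 \times 10^{-11}\).

(c) First we need to find the probability that a man is at least 7'6'':
\[ P[H \ge 90] = Q\!\left(\frac{90 - 70}{4}\right) = Q(5) \approx 3 \cdot 10^{-7} = \beta. \]
Although Table 3.2 stops at \(Q(4.99)\), if you're curious, the exact value is \(Q(5) = 2.87 \cdot 10^{-7}\). Now we can begin to find the probability that no man is at least 7'6''. This can be modeled as 100,000,000 repetitions of a Bernoulli trial with parameter \(1 - \beta\). The probability that no man is at least 7'6'' is
\[ (1 - \beta)^{100{,}000{,}000} = 9.4 \times 10^{-14}. \]

(d) The expected value of \(N\) is just the number of trials multiplied by the probability that a man is at least 7'6'':
\[ E[N] = 100{,}000{,}000 \cdot \beta = 30. \]

Problem 3.5.8 Solution

This problem is in the wrong section since the \(\mathrm{erf}(\cdot)\) function is defined later on in Section 3.9 as
\[ \mathrm{erf}(x) = \frac{2}{\sqrt{\pi}}\int_0^x e^{-u^2}\,du. \]

(a) Since \(Y\) is Gaussian \((0, 1/\sqrt{2})\), \(Y\) has variance 1/2 and
\[ f_Y(y) = \frac{1}{\sqrt{2\pi(1/2)}}e^{-y^2/[2(1/2)]} = \frac{1}{\sqrt{\pi}}e^{-y^2}. \]
For \(y \ge 0\), \(F_Y(y) = \int_{-\infty}^y f_Y(u)\,du = 1/2 + \int_0^y f_Y(u)\,du\). Substituting \(f_Y(u)\) yields
\[ F_Y(y) = \frac{1}{2} + \frac{1}{\sqrt{\pi}}\int_0^y e^{-u^2}\,du = \frac{1}{2} + \frac{1}{2}\mathrm{erf}(y). \]

(b) Since \(Y\) is Gaussian \((0, 1/\sqrt{2})\), \(Z = \sqrt{2}\,Y\) is Gaussian with expected value \(E[Z] = \sqrt{2}\,E[Y] = 0\) and variance \(\mathrm{Var}[Z] = 2\,\mathrm{Var}[Y] = 1\). Thus \(Z\) is Gaussian \((0, 1)\) and
\[ \Phi(z) = F_Z(z) = P[\sqrt{2}\,Y \le z] = P\!\left[Y \le \frac{z}{\sqrt{2}}\right] = F_Y\!\left(\frac{z}{\sqrt{2}}\right) = \frac{1}{2} + \frac{1}{2}\mathrm{erf}\!\left(\frac{z}{\sqrt{2}}\right). \]

Problem 3.5.9 Solution

First we note that since \(W\) has an \(N[\mu, \sigma^2]\) distribution, the integral we wish to evaluate is
\[ I = \int_{-\infty}^\infty f_W(w)\,dw = \frac{1}{\sqrt{2\pi\sigma^2}}\int_{-\infty}^\infty e^{-(w-\mu)^2/2\sigma^2}\,dw. \]

(a) Using the substitution \(x = (w - \mu)/\sigma\), we have \(dx = dw/\sigma\) and
\[ I = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^\infty e^{-x^2/2}\,dx. \]

(b) When we write \(I^2\) as the product of integrals, we use \(y\) to denote the other variable of integration so that
\[ I^2 = \left(\frac{1}{\sqrt{2\pi}}\int_{-\infty}^\infty e^{-x^2/2}\,dx\right)\left(\frac{1}{\sqrt{2\pi}}\int_{-\infty}^\infty e^{-y^2/2}\,dy\right) = \frac{1}{2\pi}\int_{-\infty}^\infty\int_{-\infty}^\infty e^{-(x^2+y^2)/2}\,dx\,dy. \]

(c) By changing to polar coordinates, \(x^2 + y^2 = r^2\) and \(dx\,dy = r\,dr\,d\theta\) so that
\[ I^2 = \frac{1}{2\pi}\int_0^{2\pi}\int_0^\infty e^{-r^2/2}\,r\,dr\,d\theta = \frac{1}{2\pi}\int_0^{2\pi}\left(-e^{-r^2/2}\Big|_0^\infty\right)d\theta = \frac{1}{2\pi}\int_0^{2\pi}d\theta = 1. \]

Problem 3.5.10 Solution

This problem is mostly calculus and only a little probability. From the problem statement, the SNR \(Y\) is an exponential \((1/\gamma)\) random variable with PDF
\[ f_Y(y) = \begin{cases} (1/\gamma)e^{-y/\gamma} & y \ge 0, \\ 0 & \text{otherwise.} \end{cases} \]
Thus, from the problem statement, the BER is
\[ \overline{P}_e = E[P_e(Y)] = \int_{-\infty}^\infty Q(\sqrt{2y})f_Y(y)\,dy = \int_0^\infty Q(\sqrt{2y})\frac{1}{\gamma}e^{-y/\gamma}\,dy. \]
Like most integrals with exponential factors, it's a good idea to try integration by parts. Before doing so, we recall that if \(X\) is a Gaussian \((0, 1)\) random variable with CDF \(F_X(x)\), then \(Q(x) = 1 - F_X(x)\). It follows that \(Q(x)\) has derivative
\[ Q'(x) = \frac{dQ(x)}{dx} = -\frac{dF_X(x)}{dx} = -f_X(x) = -\frac{1}{\sqrt{2\pi}}e^{-x^2/2}. \]
To solve the integral, we use the integration by parts formula \(\int_a^b u\,dv = uv\big|_a^b - \int_a^b v\,du\), where
\[ u = Q(\sqrt{2y}), \quad dv = \frac{1}{\gamma}e^{-y/\gamma}\,dy, \quad du = Q'(\sqrt{2y})\frac{1}{\sqrt{2y}}\,dy = -\frac{e^{-y}}{2\sqrt{\pi y}}\,dy, \quad v = -e^{-y/\gamma}. \]
From integration by parts, it follows that
\[ \overline{P}_e = uv\Big|_0^\infty - \int_0^\infty v\,du = -Q(\sqrt{2y})e^{-y/\gamma}\Big|_0^\infty - \int_0^\infty \frac{1}{2\sqrt{\pi y}}e^{-y[1+(1/\gamma)]}\,dy = 0 + Q(0)e^{-0} - \frac{1}{2\sqrt{\pi}}\int_0^\infty y^{-1/2}e^{-y/\bar{\gamma}}\,dy, \]
where \(\bar{\gamma} = \gamma/(1 + \gamma)\). Next, recalling that \(Q(0) = 1/2\) and making the substitution \(t = y/\bar{\gamma}\), we obtain
\[ \overline{P}_e = \frac{1}{2} - \frac{1}{2}\sqrt{\frac{\bar{\gamma}}{\pi}}\int_0^\infty t^{-1/2}e^{-t}\,dt. \]
From Math Fact B.11, we see that the remaining integral is the \(\Gamma(z)\) function evaluated at \(z = 1/2\). Since \(\Gamma(1/2) = \sqrt{\pi}\),
\[ \overline{P}_e = \frac{1}{2} - \frac{1}{2}\sqrt{\frac{\bar{\gamma}}{\pi}}\,\Gamma(1/2) = \frac{1}{2}\left[1 - \sqrt{\bar{\gamma}}\right] = \frac{1}{2}\left[1 - \sqrt{\frac{\gamma}{1 + \gamma}}\right]. \]

Problem 3.6.1 Solution

(a) Using the given CDF,
\[ P[X < -1] = F_X(-1^-) = 0, \qquad P[X \le -1] = F_X(-1) = -1/3 + 1/3 = 0, \]
where \(F_X(-1^-)\) denotes the limiting value of the CDF found by approaching \(-1\) from the left. Likewise, \(F_X(-1^+)\) is interpreted to be the value of the CDF found by approaching \(-1\) from the right. We notice that these two probabilities are the same and therefore the probability that \(X\) is exactly \(-1\) is zero.

(b)
\[ P[X < 0] = F_X(0^-) = 1/3, \qquad P[X \le 0] = F_X(0) = 2/3. \]
Here we see that there is a discrete jump at \(X = 0\). Approached from the left the CDF yields a value of 1/3 but approached from the right the value is 2/3. This means that there is a non-zero probability that \(X = 0\); in fact, that probability is the difference of the two values:
\[ P[X = 0] = P[X \le 0] - P[X < 0] = 2/3 - 1/3 = 1/3. \]

(c)
\[ P[0 < X \le 1] = F_X(1) - F_X(0^+) = 1 - 2/3 = 1/3, \qquad P[0 \le X \le 1] = F_X(1) - F_X(0^-) = 1 - 1/3 = 2/3. \]
The difference in the last two probabilities above is that the first was concerned with the probability that \(X\) was strictly greater than 0, and the second with the probability that \(X\) was greater than or equal to zero. Since the second probability is a larger set (it includes the probability that \(X = 0\)) it should always be greater than or equal to the first probability. The two differ by the probability that \(X = 0\), and this difference is non-zero only when the random variable exhibits a discrete jump in the CDF.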
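The left-limit reasoning in Problem 3.6.1 can be made concrete in code. The following is a minimal sketch, assuming the piecewise CDF of Problem 3.6.1 (0 for x < -1, x/3 + 1/3 for -1 <= x < 0, x/3 + 2/3 for 0 <= x < 1, and 1 otherwise); the function names are illustrative. Exact rational arithmetic is used so that the jump sizes come out as exact fractions.

```python
from fractions import Fraction


def cdf(x):
    """F_X(x) = P[X <= x] for the mixed random variable of Problem 3.6.1."""
    x = Fraction(x)
    if x < -1:
        return Fraction(0)
    if x < 0:
        return x / 3 + Fraction(1, 3)
    if x < 1:
        return x / 3 + Fraction(2, 3)
    return Fraction(1)


def cdf_left(x):
    """F_X(x^-) = P[X < x]: the same pieces, with each boundary point
    assigned to the piece on its left."""
    x = Fraction(x)
    if x <= -1:
        return Fraction(0)
    if x <= 0:
        return x / 3 + Fraction(1, 3)
    if x <= 1:
        return x / 3 + Fraction(2, 3)
    return Fraction(1)


def point_mass(x):
    """P[X = x] is the size of the CDF jump at x."""
    return cdf(x) - cdf_left(x)


print("P[X = -1] =", point_mass(-1))
print("P[X = 0]  =", point_mass(0))
print("P[0 < X <= 1]  =", cdf(1) - cdf(0))
print("P[0 <= X <= 1] =", cdf(1) - cdf_left(0))
```

The only design choice here is keeping two evaluators, one right-continuous and one left-continuous, so that every probability of the form P[a < X <= b], P[a <= X <= b] or P[X = a] reduces to a difference of two function values.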
Problem 3.6.2 Solution

(a) Similar to the previous problem we find
\[ P[X < -1] = F_X(-1^-) = 0, \qquad P[X \le -1] = F_X(-1) = 1/4. \]
Here we notice the discontinuity of value 1/4 at \(x = -1\).

(b)
\[ P[X < 0] = F_X(0^-) = 1/2, \qquad P[X \le 0] = F_X(0) = 1/2. \]
Since there is no discontinuity at \(x = 0\), \(F_X(0^-) = F_X(0^+) = F_X(0)\).

(c)
\[ P[X > 1] = 1 - P[X \le 1] = 1 - F_X(1) = 0, \qquad P[X \ge 1] = 1 - P[X < 1] = 1 - F_X(1^-) = 1 - 3/4 = 1/4. \]
Again we notice a discontinuity of size 1/4, here occurring at \(x = 1\).

Problem 3.6.3 Solution

(a) By taking the derivative of the CDF \(F_X(x)\) given in Problem 3.6.2, we obtain the PDF
\[ f_X(x) = \begin{cases} \dfrac{\delta(x+1)}{4} + \dfrac{1}{4} + \dfrac{\delta(x-1)}{4} & -1 \le x \le 1, \\ 0 & \text{otherwise.} \end{cases} \]

(b) The first moment of \(X\) is
\[ E[X] = \int_{-\infty}^\infty x f_X(x)\,dx = \frac{x}{4}\Big|_{x=-1} + \frac{x^2}{8}\Big|_{-1}^{1} + \frac{x}{4}\Big|_{x=1} = -1/4 + 0 + 1/4 = 0. \]

(c) The second moment of \(X\) is
\[ E[X^2] = \int_{-\infty}^\infty x^2 f_X(x)\,dx = \frac{x^2}{4}\Big|_{x=-1} + \frac{x^3}{12}\Big|_{-1}^{1} + \frac{x^2}{4}\Big|_{x=1} = 1/4 + 1/6 + 1/4 = 2/3. \]
Since \(E[X] = 0\), \(\mathrm{Var}[X] = E[X^2] = 2/3\).

Problem 3.6.4 Solution

The PMF of a Bernoulli random variable with mean \(p\) is
\[ P_X(x) = \begin{cases} 1 - p & x = 0, \\ p & x = 1, \\ 0 & \text{otherwise.} \end{cases} \]
The corresponding PDF of this discrete random variable is
\[ f_X(x) = (1 - p)\delta(x) + p\,\delta(x - 1). \]

Problem 3.6.5 Solution

The PMF of a geometric random variable with mean \(1/p\) is
\[ P_X(x) = \begin{cases} p(1-p)^{x-1} & x = 1, 2, \ldots, \\ 0 & \text{otherwise.} \end{cases} \]
The corresponding PDF is
\[ f_X(x) = p\,\delta(x - 1) + p(1-p)\,\delta(x - 2) + \cdots = \sum_{j=1}^\infty p(1-p)^{j-1}\delta(x - j). \]

Problem 3.6.6 Solution

(a) Since the conversation time cannot be negative, we know that \(F_W(w) = 0\) for \(w < 0\). The conversation time \(W\) is zero iff either the phone is busy, no one answers, or the conversation time \(X\) of a completed call is zero. Let \(A\) be the event that the call is answered. Note that the event \(A^c\) implies \(W = 0\). For \(w \ge 0\),
\[ F_W(w) = P[A^c] + P[A]F_{W|A}(w) = (1/2) + (1/2)F_X(w). \]
Thus the complete CDF of \(W\) is
\[ F_W(w) = \begin{cases} 0 & w < 0, \\ 1/2 + (1/2)F_X(w) & w \ge 0. \end{cases} \]

(b) By taking the derivative of \(F_W(w)\), the PDF of \(W\) is
\[ f_W(w) = \begin{cases} (1/2)\delta(w) + (1/2)f_X(w) & w \ge 0, \\ 0 & \text{otherwise.} \end{cases} \]
Next, we keep in mind that since \(X\) must be nonnegative, \(f_X(x) = 0\) for \(x < 0\). Hence,
\[ f_W(w) = (1/2)\delta(w) + (1/2)f_X(w). \]

(c) From the PDF \(f_W(w)\), calculating the moments is straightforward:
\[ E[W] = \int_{-\infty}^\infty w f_W(w)\,dw = (1/2)\int_{-\infty}^\infty w f_X(w)\,dw = E[X]/2. \]
The second moment is
\[ E[W^2] = \int_{-\infty}^\infty w^2 f_W(w)\,dw = (1/2)\int_{-\infty}^\infty w^2 f_X(w)\,dw = E[X^2]/2. \]
The variance of \(W\) is
\[ \mathrm{Var}[W] = E[W^2] - (E[W])^2 = E[X^2]/2 - (E[X]/2)^2 = (1/2)\mathrm{Var}[X] + (E[X])^2/4. \]

Problem 3.6.7 Solution

The professor is on time 80 percent of the time and when he is late his arrival time is uniformly distributed between 0 and 300 seconds. The PDF of \(T\) is
\[ f_T(t) = \begin{cases} 0.8\,\delta(t - 0) + \dfrac{0.2}{300} & 0 \le t \le 300, \\ 0 & \text{otherwise.} \end{cases} \]
The CDF can be found by integrating:
\[ F_T(t) = \begin{cases} 0 & t < 0, \\ 0.8 + \dfrac{0.2t}{300} & 0 \le t < 300, \\ 1 & t \ge 300. \end{cases} \]

Problem 3.6.8 Solution

Let \(G\) denote the event that the throw is good, that is, no foul occurs. The CDF of \(D\) obeys
\[ F_D(y) = P[D \le y|G]P[G] + P[D \le y|G^c]P[G^c]. \]
Given the event \(G\),
\[ P[D \le y|G] = P[X \le y - 60] = 1 - e^{-(y-60)/10} \quad (y \ge 60). \]
Of course, for \(y < 60\), \(P[D \le y|G] = 0\). From the problem statement, if the throw is a foul, then \(D = 0\). This implies
\[ P[D \le y|G^c] = u(y), \]
where \(u(\cdot)\) denotes the unit step function. Since \(P[G] = 0.7\), we can write
\[ F_D(y) = P[G]P[D \le y|G] + P[G^c]P[D \le y|G^c] = \begin{cases} 0.3u(y) & y < 60, \\ 0.3 + 0.7(1 - e^{-(y-60)/10}) & y \ge 60. \end{cases} \]
Another way to write this CDF is
\[ F_D(y) = 0.3u(y) + 0.7u(y - 60)(1 - e^{-(y-60)/10}). \]
However, when we take the derivative, either expression for the CDF will yield the PDF. However, taking the derivative of the first expression perhaps may be simpler:
\[ f_D(y) = \begin{cases} 0.3\delta(y) & y < 60, \\ 0.07e^{-(y-60)/10} & y \ge 60. \end{cases} \]
Taking the derivative of the second expression for the CDF is a little tricky because of the product of the exponential and the step function. However, applying the usual rule for the differentiation of a product does give the correct answer:
\[ f_D(y) = 0.3\delta(y) + 0.7\delta(y - 60)(1 - e^{-(y-60)/10}) + 0.07u(y - 60)e^{-(y-60)/10} = 0.3\delta(y) + 0.07u(y - 60)e^{-(y-60)/10}. \]
The middle term \(\delta(y - 60)(1 - e^{-(y-60)/10})\) dropped out because at \(y = 60\), \(e^{-(y-60)/10} = 1\).

Problem 3.6.9 Solution

The professor is on time and lectures the full 80 minutes with probability 0.7. In terms of math,
\[ P[T = 80] = 0.7. \]
Likewise when the professor is more than 5 minutes late, the students leave and a 0 minute lecture is observed. Since he is late 30% of the time and given that he is late, his arrival is uniformly distributed between 0 and 10 minutes, the probability that there is no lecture is
\[ P[T = 0] = (0.3)(0.5) = 0.15. \]
The only other possible lecture durations are uniformly distributed between 75 and 80 minutes, because the students will not wait longer than 5 minutes, and that probability must add to a total of \(1 - 0.7 - 0.15 = 0.15\). So the PDF of \(T\) can be written as
\[ f_T(t) = \begin{cases} 0.15\,\delta(t) & t = 0, \\ 0.03 & 75 \le t < 80, \\ 0.7\,\delta(t - 80) & t = 80, \\ 0 & \text{otherwise.} \end{cases} \]

Problem 3.7.1 Solution

Since \(0 \le X \le 1\), \(Y = X^2\) satisfies \(0 \le Y \le 1\). We can conclude that \(F_Y(y) = 0\) for \(y < 0\) and that \(F_Y(y) = 1\) for \(y \ge 1\). For \(0 \le y < 1\),
\[ F_Y(y) = P[X^2 \le y] = P[X \le \sqrt{y}]. \]
Since \(f_X(x) = 1\) for \(0 \le x \le 1\), we see that for \(0 \le y < 1\),
\[ P[X \le \sqrt{y}] = \int_0^{\sqrt{y}} dx = \sqrt{y}. \]
Hence, the CDF of \(Y\) is
\[ F_Y(y) = \begin{cases} 0 & y < 0, \\ \sqrt{y} & 0 \le y < 1, \\ 1 & y \ge 1. \end{cases} \]
By taking the derivative of the CDF, we obtain the PDF
\[ f_Y(y) = \begin{cases} 1/(2\sqrt{y}) & 0 \le y < 1, \\ 0 & \text{otherwise.} \end{cases} \]

Problem 3.7.2 Solution

Since \(Y = \sqrt{X}\), the fact that \(X\) is nonnegative and that we assume the square root is always positive implies \(F_Y(y) = 0\) for \(y < 0\). In addition, for \(y \ge 0\), we can find the CDF of \(Y\) by writing
\[ F_Y(y) = P[Y \le y] = P[\sqrt{X} \le y] = P[X \le y^2] = F_X(y^2). \]
For \(x \ge 0\), \(F_X(x) = 1 - e^{-\lambda x}\). Thus,
\[ F_Y(y) = \begin{cases} 1 - e^{-\lambda y^2} & y \ge 0, \\ 0 & \text{otherwise.} \end{cases} \]
By taking the derivative with respect to \(y\), it follows that the PDF of \(Y\) is
\[ f_Y(y) = \begin{cases} 2\lambda y e^{-\lambda y^2} & y \ge 0, \\ 0 & \text{otherwise.} \end{cases} \]
In comparing this result to the Rayleigh PDF given in Appendix A, we observe that \(Y\) is a Rayleigh \((a)\) random variable with \(a = \sqrt{2\lambda}\).

Problem 3.7.3 Solution

Since \(X\) is non-negative, \(W = X^2\) is also non-negative. Hence for \(w < 0\), \(f_W(w) = 0\). For \(w \ge 0\),
\[ F_W(w) = P[W \le w] = P[X^2 \le w] = P[X \le \sqrt{w}] = 1 - e^{-\lambda\sqrt{w}}. \]
Taking the derivative with respect to \(w\) yields \(f_W(w) = \lambda e^{-\lambda\sqrt{w}}/(2\sqrt{w})\). The complete expression for the PDF is
\[ f_W(w) = \begin{cases} \dfrac{\lambda e^{-\lambda\sqrt{w}}}{2\sqrt{w}} & w \ge 0, \\ 0 & \text{otherwise.} \end{cases} \]
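The "take the derivative of the CDF" step used in Problems 3.7.1 through 3.7.3 can be sanity-checked with a finite difference. The following is a minimal sketch; the rate `LAM = 2.0` and the evaluation points are arbitrary illustrative choices, not values from the problems.

```python
import math

LAM = 2.0  # hypothetical exponential rate, chosen only for the check


def central_difference(cdf, y, h=1e-6):
    """Numerical derivative of a CDF at y, valid where the CDF is smooth."""
    return (cdf(y + h) - cdf(y - h)) / (2 * h)


# Each entry: (label, derived CDF, derived PDF, interior evaluation point).
cases = [
    ("3.7.1: Y = X^2, X uniform (0,1)",
     lambda y: math.sqrt(y),
     lambda y: 1.0 / (2.0 * math.sqrt(y)),
     0.25),
    ("3.7.2: Y = sqrt(X), X exponential",
     lambda y: 1.0 - math.exp(-LAM * y * y),
     lambda y: 2.0 * LAM * y * math.exp(-LAM * y * y),
     0.7),
    ("3.7.3: W = X^2, X exponential",
     lambda w: 1.0 - math.exp(-LAM * math.sqrt(w)),
     lambda w: LAM * math.exp(-LAM * math.sqrt(w)) / (2.0 * math.sqrt(w)),
     0.5),
]

errors = []
for label, cdf, pdf, point in cases:
    err = abs(central_difference(cdf, point) - pdf(point))
    errors.append(err)
    print(f"{label}: |numerical - analytic| = {err:.2e}")

max_error = max(errors)
```

A check like this only applies at points where the CDF is differentiable; it says nothing about impulses in a mixed PDF, which have to be read off from the jumps in the CDF.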
Problem 3.7.4 Solution

From Problem 3.6.1, random variable \(X\) has CDF
\[ F_X(x) = \begin{cases} 0 & x < -1, \\ x/3 + 1/3 & -1 \le x < 0, \\ x/3 + 2/3 & 0 \le x < 1, \\ 1 & 1 \le x. \end{cases} \]

(a) We can find the CDF of \(Y\), \(F_Y(y)\), by noting that \(Y\) can only take on two possible values, 0 and 100. And the probability that \(Y\) takes on these two values depends on the probability that \(X < 0\) and \(X \ge 0\), respectively. Therefore
\[ F_Y(y) = P[Y \le y] = \begin{cases} 0 & y < 0, \\ P[X < 0] & 0 \le y < 100, \\ 1 & y \ge 100. \end{cases} \]
The probabilities concerned with \(X\) can be found from the given CDF \(F_X(x)\). This is the general strategy for solving problems of this type: to express the CDF of \(Y\) in terms of the CDF of \(X\). Since \(P[X < 0] = F_X(0^-) = 1/3\), the CDF of \(Y\) is
\[ F_Y(y) = P[Y \le y] = \begin{cases} 0 & y < 0, \\ 1/3 & 0 \le y < 100, \\ 1 & y \ge 100. \end{cases} \]

(b) The CDF \(F_Y(y)\) has jumps of 1/3 at \(y = 0\) and 2/3 at \(y = 100\). The corresponding PDF of \(Y\) is
\[ f_Y(y) = \delta(y)/3 + 2\delta(y - 100)/3. \]

(c) The expected value of \(Y\) is
\[ E[Y] = \int_{-\infty}^\infty y f_Y(y)\,dy = 0 \cdot \frac{1}{3} + 100 \cdot \frac{2}{3} = 66.66. \]

Problem 3.7.5 Solution

Before solving for the PDF, it is helpful to have a sketch of the function \(X = -\ln(1 - U)\).

[Figure: sketch of X = -ln(1 - U) versus U for 0 <= U <= 1]

(a) From the sketch, we observe that \(X\) will be nonnegative. Hence \(F_X(x) = 0\) for \(x < 0\). Since \(U\) has a uniform distribution on \([0, 1]\), for \(0 \le u \le 1\), \(P[U \le u] = u\). We use this fact to find the CDF of \(X\). For \(x \ge 0\),
\[ F_X(x) = P[-\ln(1 - U) \le x] = P[1 - U \ge e^{-x}] = P[U \le 1 - e^{-x}]. \]
For \(x \ge 0\), \(0 \le 1 - e^{-x} \le 1\) and so
\[ F_X(x) = F_U(1 - e^{-x}) = 1 - e^{-x}. \]
The complete CDF can be written as
\[ F_X(x) = \begin{cases} 0 & x < 0, \\ 1 - e^{-x} & x \ge 0. \end{cases} \]

(b) By taking the derivative, the PDF is
\[ f_X(x) = \begin{cases} e^{-x} & x \ge 0, \\ 0 & \text{otherwise.} \end{cases} \]
Thus, \(X\) has an exponential PDF. In fact, since most computer languages provide uniform \([0, 1]\) random numbers, the procedure outlined in this problem provides a way to generate exponential random variables from uniform random variables.

(c) Since \(X\) is an exponential random variable with parameter \(a = 1\), \(E[X] = 1\).

Problem 3.7.6 Solution

We wish to find a transformation that takes a uniformly distributed random variable on \([0, 1]\) to the following PDF for \(Y\):
\[ f_Y(y) = \begin{cases} 3y^2 & 0 \le y \le 1, \\ 0 & \text{otherwise.} \end{cases} \]
We begin by realizing that in this case the CDF of \(Y\) must be
\[ F_Y(y) = \begin{cases} 0 & y < 0, \\ y^3 & 0 \le y \le 1, \\ 1 & \text{otherwise.} \end{cases} \]
Therefore, for \(0 \le y \le 1\),
\[ P[Y \le y] = P[g(X) \le y] = y^3. \]
Thus, using \(g(X) = X^{1/3}\), we see that for \(0 \le y \le 1\),
\[ P[g(X) \le y] = P[X^{1/3} \le y] = P[X \le y^3] = y^3, \]
which is the desired answer.

Problem 3.7.7 Solution

Since the microphone voltage \(V\) is uniformly distributed between \(-1\) and 1 volts, \(V\) has PDF and CDF
\[ f_V(v) = \begin{cases} 1/2 & -1 \le v \le 1, \\ 0 & \text{otherwise,} \end{cases} \qquad F_V(v) = \begin{cases} 0 & v < -1, \\ (v + 1)/2 & -1 \le v \le 1, \\ 1 & v > 1. \end{cases} \]
The voltage is processed by a limiter whose output magnitude is given below:
\[ L = \begin{cases} |V| & |V| \le 0.5, \\ 0.5 & \text{otherwise.} \end{cases} \]

(a)
\[ P[L = 0.5] = P[|V| \ge 0.5] = P[V \ge 0.5] + P[V \le -0.5] = 1 - F_V(0.5) + F_V(-0.5) = 1 - 1.5/2 + 0.5/2 = 1/2. \]

(b) For \(0 \le l \le 0.5\),
\[ F_L(l) = P[|V| \le l] = P[-l \le V \le l] = F_V(l) - F_V(-l) = \tfrac{1}{2}(l + 1) - \tfrac{1}{2}(-l + 1) = l. \]
So the CDF of \(L\) is
\[ F_L(l) = \begin{cases} 0 & l < 0, \\ l & 0 \le l < 0.5, \\ 1 & l \ge 0.5. \end{cases} \]

(c) By taking the derivative of \(F_L(l)\), the PDF of \(L\) is
\[ f_L(l) = \begin{cases} 1 + (0.5)\delta(l - 0.5) & 0 \le l \le 0.5, \\ 0 & \text{otherwise.} \end{cases} \]
The expected value of \(L\) is
\[ E[L] = \int_{-\infty}^\infty l f_L(l)\,dl = \int_0^{0.5} l\,dl + 0.5\int_0^{0.5} l\,\delta(l - 0.5)\,dl = 0.125 + 0.25 = 0.375. \]

Problem 3.7.8 Solution

Let \(X\) denote the position of the pointer and \(Y\) denote the area within the arc defined by the stopping position of the pointer.

(a) If the disc has radius \(r\), then the area of the disc is \(\pi r^2\). Since the circumference of the disc is 1 and \(X\) is measured around the circumference, \(Y = \pi r^2 X\). For example, when \(X = 1\), the shaded area is the whole disc and \(Y = \pi r^2\). Similarly, if \(X = 1/2\), then \(Y = \pi r^2/2\) is half the area of the disc. Since the disc has circumference 1, \(r = 1/(2\pi)\) and
\[ Y = \pi r^2 X = \frac{X}{4\pi}. \]

(b) The CDF of \(Y\) can be expressed as
\[ F_Y(y) = P[Y \le y] = P\!\left[\frac{X}{4\pi} \le y\right] = P[X \le 4\pi y] = F_X(4\pi y). \]
Therefore the CDF is
\[ F_Y(y) = \begin{cases} 0 & y < 0, \\ 4\pi y & 0 \le y \le \frac{1}{4\pi}, \\ 1 & y \ge \frac{1}{4\pi}. \end{cases} \]

(c) By taking the derivative of the CDF, the PDF of \(Y\) is
\[ f_Y(y) = \begin{cases} 4\pi & 0 \le y \le \frac{1}{4\pi}, \\ 0 & \text{otherwise.} \end{cases} \]

(d) The expected value of \(Y\) is \(E[Y] = \int_0^{1/(4\pi)} 4\pi y\,dy = 1/(8\pi)\).

Problem 3.7.9 Solution

The uniform \((0, 2)\) random variable \(U\) has PDF and CDF
\[ f_U(u) = \begin{cases} 1/2 & 0 \le u \le 2, \\ 0 & \text{otherwise,} \end{cases} \qquad F_U(u) = \begin{cases} 0 & u < 0, \\ u/2 & 0 \le u < 2, \\ 1 & u \ge 2. \end{cases} \]
The uniform random variable \(U\) is subjected to the following clipper:
\[ W = g(U) = \begin{cases} U & U \le 1, \\ 1 & U > 1. \end{cases} \]
To find the CDF of the output of the clipper, \(W\), we remember that \(W = U\) for \(0 \le U \le 1\) while \(W = 1\) for \(1 \le U \le 2\). First, this implies \(W\) is nonnegative, i.e., \(F_W(w) = 0\) for \(w < 0\). Furthermore, for \(0 \le w \le 1\),
\[ F_W(w) = P[W \le w] = P[U \le w] = F_U(w) = w/2. \]
Lastly, we observe that it is always true that \(W \le 1\). This implies \(F_W(w) = 1\) for \(w \ge 1\). Therefore the CDF of \(W\) is
\[ F_W(w) = \begin{cases} 0 & w < 0, \\ w/2 & 0 \le w < 1, \\ 1 & w \ge 1. \end{cases} \]
From the jump in the CDF at \(w = 1\), we see that \(P[W = 1] = 1/2\). The corresponding PDF can be found by taking the derivative and using the delta function to model the discontinuity:
\[ f_W(w) = \begin{cases} 1/2 + (1/2)\delta(w - 1) & 0 \le w \le 1, \\ 0 & \text{otherwise.} \end{cases} \]
The expected value of \(W\) is
\[ E[W] = \int_{-\infty}^\infty w f_W(w)\,dw = \int_0^1 w[1/2 + (1/2)\delta(w - 1)]\,dw = 1/4 + 1/2 = 3/4. \]

Problem 3.7.10 Solution

Given the following function of random variable \(X\),
\[ Y = g(X) = \begin{cases} 10 & X < 0, \\ -10 & X \ge 0, \end{cases} \]
we follow the same procedure as in Problem 3.7.4. We attempt to express the CDF of \(Y\) in terms of the CDF of \(X\). We know that \(Y\) is never less than \(-10\). We also know that \(-10 \le Y < 10\) when \(X \ge 0\), and finally, that \(Y = 10\) when \(X < 0\). Therefore
\[ F_Y(y) = P[Y \le y] = \begin{cases} 0 & y < -10, \\ P[X \ge 0] = 1 - F_X(0) & -10 \le y < 10, \\ 1 & y \ge 10. \end{cases} \]
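Problems 3.7.5 and 3.7.6 describe how to turn uniform random numbers into samples with a prescribed distribution. The following is a minimal simulation sketch of both transformations, assuming Python's standard-library `random` module as the source of uniform [0, 1) numbers; the seed, sample size and function names are arbitrary illustrative choices.

```python
import math
import random


def sample_exponential(rng):
    """Problem 3.7.5: X = -ln(1 - U) is exponential with E[X] = 1.
    rng.random() lies in [0, 1), so 1 - U lies in (0, 1] and the log is defined."""
    return -math.log(1.0 - rng.random())


def sample_cubic(rng):
    """Problem 3.7.6: Y = U**(1/3) has PDF 3y^2 on [0, 1]."""
    return rng.random() ** (1.0 / 3.0)


rng = random.Random(2005)  # fixed seed so the run is repeatable
n = 200_000

mean_x = sum(sample_exponential(rng) for _ in range(n)) / n
mean_y = sum(sample_cubic(rng) for _ in range(n)) / n

# Theory: E[X] = 1 and E[Y] = integral of y * 3y^2 over [0, 1] = 3/4.
print(f"sample mean of X: {mean_x:.4f} (theory 1)")
print(f"sample mean of Y: {mean_y:.4f} (theory 0.75)")
```

Comparing sample means with the theoretical means is only a coarse check; a histogram of the samples against the target PDF would be a more complete one.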
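Problem 3.7.11 Solution

The PDF of \(U\) is
\[ f_U(u) = \begin{cases} 1/2 & -1 \le u \le 1, \\ 0 & \text{otherwise.} \end{cases} \]
Since \(W \ge 0\), we see that \(F_W(w) = 0\) for \(w < 0\). Next, we observe that the rectifier output \(W\) is a mixed random variable since
\[ P[W = 0] = P[U < 0] = \int_{-1}^0 f_U(u)\,du = 1/2. \]
The above facts imply that
\[ F_W(0) = P[W \le 0] = P[W = 0] = 1/2. \]
Next, we note that for \(0 < w < 1\),
\[ F_W(w) = P[U \le w] = \int_{-1}^w f_U(u)\,du = (w + 1)/2. \]
Finally, \(U \le 1\) implies \(W \le 1\), which implies \(F_W(w) = 1\) for \(w \ge 1\). Hence, the complete expression for the CDF is
\[ F_W(w) = \begin{cases} 0 & w < 0, \\ (w + 1)/2 & 0 \le w \le 1, \\ 1 & w > 1. \end{cases} \]
By taking the derivative of the CDF, we find the PDF of \(W\); however, we must keep in mind that the discontinuity in the CDF at \(w = 0\) yields a corresponding impulse in the PDF:
\[ f_W(w) = \begin{cases} (\delta(w) + 1)/2 & 0 \le w \le 1, \\ 0 & \text{otherwise.} \end{cases} \]
From the PDF, we can calculate the expected value
\[ E[W] = \int_0^1 w(\delta(w) + 1)/2\,dw = 0 + \int_0^1 (w/2)\,dw = 1/4. \]
Perhaps an easier way to find the expected value is to use Theorem 2.10. In this case,
\[ E[W] = \int_{-\infty}^\infty g(u)f_U(u)\,du = \int_0^1 u(1/2)\,du = 1/4. \]
As we expect, both approaches give the same answer.

Problem 3.7.12 Solution

Theorem 3.19 states that for a constant \(a > 0\), \(Y = aX\) has CDF and PDF
\[ F_Y(y) = F_X(y/a), \qquad f_Y(y) = \frac{1}{a}f_X(y/a). \]

(a) If \(X\) is uniform \((b, c)\), then \(Y = aX\) has PDF
\[ f_Y(y) = \frac{1}{a}f_X(y/a) = \begin{cases} \dfrac{1}{a(c-b)} & b \le y/a \le c, \\ 0 & \text{otherwise,} \end{cases} = \begin{cases} \dfrac{1}{ac - ab} & ab \le y \le ac, \\ 0 & \text{otherwise.} \end{cases} \]
Thus \(Y\) has the PDF of a uniform \((ab, ac)\) random variable.

(b) Using Theorem 3.19, the PDF of \(Y = aX\) is
\[ f_Y(y) = \frac{1}{a}f_X(y/a) = \begin{cases} \dfrac{\lambda}{a}e^{-\lambda(y/a)} & y/a \ge 0, \\ 0 & \text{otherwise,} \end{cases} = \begin{cases} (\lambda/a)e^{-(\lambda/a)y} & y \ge 0, \\ 0 & \text{otherwise.} \end{cases} \]
Hence \(Y\) is an exponential \((\lambda/a)\) random variable.

(c) Using Theorem 3.19, the PDF of \(Y = aX\) is
\[ f_Y(y) = \frac{1}{a}f_X(y/a) = \begin{cases} \dfrac{\lambda^n (y/a)^{n-1}e^{-\lambda(y/a)}}{a(n-1)!} & y/a \ge 0, \\ 0 & \text{otherwise,} \end{cases} = \begin{cases} \dfrac{(\lambda/a)^n y^{n-1}e^{-(\lambda/a)y}}{(n-1)!} & y \ge 0, \\ 0 & \text{otherwise,} \end{cases} \]
which is an Erlang \((n, \lambda/a)\) PDF.

(d) If \(X\) is a Gaussian \((\mu, \sigma)\) random variable, then \(Y = aX\) has PDF
\[ f_Y(y) = \frac{1}{a}f_X(y/a) = \frac{1}{a\sqrt{2\pi\sigma^2}}e^{-((y/a)-\mu)^2/2\sigma^2} = \frac{1}{\sqrt{2\pi a^2\sigma^2}}e^{-(y-a\mu)^2/2(a^2\sigma^2)}. \]
Thus \(Y\) is a Gaussian random variable with expected value \(E[Y] = a\mu\) and \(\mathrm{Var}[Y] = a^2\sigma^2\). That is, \(Y\) is a Gaussian \((a\mu, a\sigma)\) random variable.

Problem 3.7.13 Solution

If \(X\) has a uniform distribution from 0 to 1 then the PDF and corresponding CDF of \(X\) are
\[ f_X(x) = \begin{cases} 1 & 0 \le x \le 1, \\ 0 & \text{otherwise,} \end{cases} \qquad F_X(x) = \begin{cases} 0 & x < 0, \\ x & 0 \le x \le 1, \\ 1 & x > 1. \end{cases} \]
For \(b - a > 0\), we can find the CDF of the function \(Y = a + (b - a)X\):
\[ F_Y(y) = P[Y \le y] = P[a + (b - a)X \le y] = P\!\left[X \le \frac{y - a}{b - a}\right] = F_X\!\left(\frac{y - a}{b - a}\right) = \frac{y - a}{b - a}. \]
Therefore the CDF of \(Y\) is
\[ F_Y(y) = \begin{cases} 0 & y < a, \\ \dfrac{y - a}{b - a} & a \le y \le b, \\ 1 & y \ge b. \end{cases} \]
By differentiating with respect to \(y\) we arrive at the PDF
\[ f_Y(y) = \begin{cases} 1/(b - a) & a \le y \le b, \\ 0 & \text{otherwise,} \end{cases} \]
which we recognize as the PDF of a uniform \((a, b)\) random variable.

Problem 3.7.14 Solution

Since \(X = F^{-1}(U)\), it is desirable that the function \(F^{-1}(u)\) exist for all \(0 \le u \le 1\). However, for the continuous uniform random variable \(U\), \(P[U = 0] = P[U = 1] = 0\). Thus, it is a zero probability event that \(F^{-1}(U)\) will be evaluated at \(U = 0\) or \(U = 1\). As a result, it doesn't matter whether \(F^{-1}(u)\) exists at \(u = 0\) or \(u = 1\).

Problem 3.7.15 Solution

The relationship between \(X\) and \(Y\) is shown in the following figure:

[Figure: Y versus X, both axes running from 0 to 3]

(a) Note that \(Y = 1/2\) if and only if \(0 \le X \le 1\). Thus,
\[ P[Y = 1/2] = P[0 \le X \le 1] = \int_0^1 f_X(x)\,dx = \int_0^1 (x/2)\,dx = 1/4. \]

(b) Since \(Y \ge 1/2\), we can conclude that \(F_Y(y) = 0\) for \(y < 1/2\). Also, \(F_Y(1/2) = P[Y = 1/2] = 1/4\). Similarly, for \(1/2 < y \le 1\),
\[ F_Y(y) = P[0 \le X \le 1] = P[Y = 1/2] = 1/4. \]
Next, for \(1 < y \le 2\),
\[ F_Y(y) = P[X \le y] = \int_0^y f_X(x)\,dx = y^2/4. \]
Lastly, since \(Y \le 2\), \(F_Y(y) = 1\) for \(y \ge 2\). The complete expression of the CDF is
\[ F_Y(y) = \begin{cases} 0 & y < 1/2, \\ 1/4 & 1/2 \le y \le 1, \\ y^2/4 & 1 < y < 2, \\ 1 & y \ge 2. \end{cases} \]

Problem 3.7.16 Solution

We can prove the assertion by considering the cases where \(a > 0\) and \(a < 0\), respectively. For the case where \(a > 0\) we have
\[ F_Y(y) = P[Y \le y] = P\!\left[X \le \frac{y - b}{a}\right] = F_X\!\left(\frac{y - b}{a}\right). \]
Therefore by taking the derivative we find that
\[ f_Y(y) = \frac{1}{a}f_X\!\left(\frac{y - b}{a}\right), \qquad a > 0. \]
Similarly for the case when \(a < 0\) we have
\[ F_Y(y) = P[Y \le y] = P\!\left[X \ge \frac{y - b}{a}\right] = 1 - F_X\!\left(\frac{y - b}{a}\right). \]
And by taking the derivative, we find that for negative \(a\),
\[ f_Y(y) = -\frac{1}{a}f_X\!\left(\frac{y - b}{a}\right), \qquad a < 0. \]
A valid expression for both positive and negative \(a\) is
\[ f_Y(y) = \frac{1}{|a|}f_X\!\left(\frac{y - b}{a}\right). \]
Therefore the assertion is proved.

Problem 3.7.17 Solution

Understanding this claim may be harder than completing the proof. Since \(0 \le F(x) \le 1\), we know that \(0 \le U \le 1\). This implies \(F_U(u) = 0\) for \(u < 0\) and \(F_U(u) = 1\) for \(u \ge 1\). Moreover, since \(F(x)\) is an increasing function, we can write for \(0 \le u \le 1\),
\[ F_U(u) = P[F(X) \le u] = P[X \le F^{-1}(u)] = F_X(F^{-1}(u)). \]
Since \(F_X(x) = F(x)\), we have for \(0 \le u \le 1\),
\[ F_U(u) = F(F^{-1}(u)) = u. \]
Hence the complete CDF of \(U\) is
\[ F_U(u) = \begin{cases} 0 & u < 0, \\ u & 0 \le u < 1, \\ 1 & u \ge 1. \end{cases} \]
That is, \(U\) is a uniform \([0, 1]\) random variable.

Problem 3.7.18 Solution

(a) Given \(F_X(x)\) is a continuous function, there exists \(x_0\) such that \(F_X(x_0) = u\). For each value of \(u\), the corresponding \(x_0\) is unique. To see this, suppose there were also \(x_1\) such that \(F_X(x_1) = u\). Without loss of generality, we can assume \(x_1 > x_0\) since otherwise we could exchange the points \(x_0\) and \(x_1\). Since \(F_X(x_0) = F_X(x_1) = u\), the fact that \(F_X(x)\) is nondecreasing implies \(F_X(x) = u\) for all \(x \in [x_0, x_1]\), i.e., \(F_X(x)\) is flat over the interval \([x_0, x_1]\), which contradicts the assumption that \(F_X(x)\) has no flat intervals. Thus, for any \(u \in (0, 1)\), there is a unique \(x_0\) such that \(F_X(x_0) = u\). Moreover, the same \(x_0\) is the minimum of all \(x'\) such that \(F_X(x') \ge u\). The uniqueness of \(x_0\) such that \(F_X(x_0) = u\) permits us to define \(\tilde{F}(u) = x_0 = F_X^{-1}(u)\).

(b) In this part, we are given that \(F_X(x)\) has a jump discontinuity at \(x_0\). That is, there exists \(u_0^- = F_X(x_0^-)\) and \(u_0^+ = F_X(x_0^+)\) with \(u_0^- < u_0^+\). Consider any \(u\) in the interval \([u_0^-, u_0^+]\). Since \(F_X(x_0) = F_X(x_0^+)\) and \(F_X(x)\) is nondecreasing,
\[ F_X(x) \ge F_X(x_0) = u_0^+, \qquad x \ge x_0. \]
Moreover,
\[ F_X(x) < F_X(x_0^-) = u_0^-, \qquad x < x_0. \]
Thus for any \(u\) satisfying \(u_0^- \le u \le u_0^+\), \(F_X(x) < u\) for \(x < x_0\) and \(F_X(x) \ge u\) for \(x \ge x_0\). Thus, \(\tilde{F}(u) = \min\{x \,|\, F_X(x) \ge u\} = x_0\).

(c) We note that the first two parts of this problem were just designed to show the properties of \(\tilde{F}(u)\). First, we observe that
\[ P[\hat{X} \le x] = P[\tilde{F}(U) \le x] = P[\min\{x' \,|\, F_X(x') \ge U\} \le x]. \]
To prove the claim, we define, for any \(x\), the events
\[ A: \min\{x' \,|\, F_X(x') \ge U\} \le x, \qquad B: U \le F_X(x). \]
Note that \(P[A] = P[\hat{X} \le x]\). In addition, \(P[B] = P[U \le F_X(x)] = F_X(x)\) since \(P[U \le u] = u\) for any \(u \in [0, 1]\). We will show that the events \(A\) and \(B\) are the same. This fact implies
\[ P[\hat{X} \le x] = P[A] = P[B] = P[U \le F_X(x)] = F_X(x). \]
All that remains is to show \(A\) and \(B\) are the same. As always, we need to show that \(A \subset B\) and that \(B \subset A\).

• To show \(A \subset B\), suppose \(A\) is true and \(\min\{x' \,|\, F_X(x') \ge U\} \le x\). This implies there exists \(x_0 \le x\) such that \(F_X(x_0) \ge U\). Since \(x_0 \le x\), it follows from \(F_X(x)\) being nondecreasing that \(F_X(x_0) \le F_X(x)\). We can thus conclude that
\[ U \le F_X(x_0) \le F_X(x). \]
That is, event \(B\) is true.

• To show \(B \subset A\), we suppose event \(B\) is true so that \(U \le F_X(x)\). We define the set
\[ L = \{x' \,|\, F_X(x') \ge U\}. \]
We note \(x \in L\). It follows that the minimum element \(\min\{x' \,|\, x' \in L\} \le x\). That is,
\[ \min\{x' \,|\, F_X(x') \ge U\} \le x, \]
which is simply event \(A\).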
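The generalized inverse \(\tilde{F}(u) = \min\{x \,|\, F_X(x) \ge u\}\) of Problem 3.7.18 can be computed by bisection even when the CDF has a jump. The following is a minimal sketch, assuming the mixed CDF of Problem 3.6.1 as the example and a bracketing interval [-2, 2] chosen to contain its support; the function names and iteration count are illustrative.

```python
def cdf(x):
    """Mixed CDF of Problem 3.6.1, with a jump of size 1/3 at x = 0."""
    if x < -1:
        return 0.0
    if x < 0:
        return x / 3 + 1 / 3
    if x < 1:
        return x / 3 + 2 / 3
    return 1.0


def generalized_inverse(cdf_fn, u, lo=-2.0, hi=2.0, iters=80):
    """Approximate min{x : cdf_fn(x) >= u} by bisection.

    Assumes cdf_fn is nondecreasing and right-continuous, with
    cdf_fn(lo) < u <= cdf_fn(hi). The invariant cdf_fn(lo) < u <= cdf_fn(hi)
    is maintained, so hi converges to the smallest x with cdf_fn(x) >= u.
    """
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if cdf_fn(mid) >= u:
            hi = mid
        else:
            lo = mid
    return hi


# Every u in the jump interval (1/3, 2/3] maps to the jump location x0 = 0,
# as in part (b); outside the jump the ordinary inverse is recovered.
for u in (1 / 6, 0.4, 0.5, 0.6, 5 / 6):
    print(f"u = {u:.4f} -> x = {generalized_inverse(cdf, u):.6f}")
```

Feeding uniform (0, 1) samples through `generalized_inverse` would then produce samples with CDF `cdf`, which is the claim proved in part (c).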
Problem 3.8.1 Solution

The PDF of \(X\) is
\[ f_X(x) = \begin{cases} 1/10 & -5 \le x \le 5, \\ 0 & \text{otherwise.} \end{cases} \]

(a) The event \(B\) has probability
\[ P[B] = P[-3 \le X \le 3] = \int_{-3}^3 \frac{1}{10}\,dx = \frac{3}{5}. \]
From Definition 3.15, the conditional PDF of \(X\) given \(B\) is
\[ f_{X|B}(x) = \begin{cases} f_X(x)/P[B] & x \in B, \\ 0 & \text{otherwise,} \end{cases} = \begin{cases} 1/6 & |x| \le 3, \\ 0 & \text{otherwise.} \end{cases} \]

(b) Given \(B\), we see that \(X\) has a uniform PDF over \([a, b]\) with \(a = -3\) and \(b = 3\). From Theorem 3.6, the conditional expected value of \(X\) is \(E[X|B] = (a + b)/2 = 0\).

(c) From Theorem 3.6, the conditional variance of \(X\) is \(\mathrm{Var}[X|B] = (b - a)^2/12 = 3\).

Problem 3.8.2 Solution

From Definition 3.6, the PDF of \(Y\) is
\[ f_Y(y) = \begin{cases} (1/5)e^{-y/5} & y \ge 0, \\ 0 & \text{otherwise.} \end{cases} \]

(a) The event \(A\) has probability
\[ P[A] = P[Y < 2] = \int_0^2 (1/5)e^{-y/5}\,dy = -e^{-y/5}\Big|_0^2 = 1 - e^{-2/5}. \]
From Definition 3.15, the conditional PDF of \(Y\) given \(A\) is
\[ f_{Y|A}(y) = \begin{cases} f_Y(y)/P[A] & y \in A, \\ 0 & \text{otherwise,} \end{cases} = \begin{cases} (1/5)e^{-y/5}/(1 - e^{-2/5}) & 0 \le y < 2, \\ 0 & \text{otherwise.} \end{cases} \]

(b) The conditional expected value of \(Y\) given \(A\) is
\[ E[Y|A] = \int_{-\infty}^\infty y f_{Y|A}(y)\,dy = \frac{1/5}{1 - e^{-2/5}}\int_0^2 y e^{-y/5}\,dy. \]
Using the integration by parts formula \(\int u\,dv = uv - \int v\,du\) with \(u = y\) and \(dv = e^{-y/5}\,dy\) yields
\[ E[Y|A] = \frac{1/5}{1 - e^{-2/5}}\left(-5ye^{-y/5}\Big|_0^2 + \int_0^2 5e^{-y/5}\,dy\right) = \frac{1/5}{1 - e^{-2/5}}\left(-10e^{-2/5} - 25e^{-y/5}\Big|_0^2\right) = \frac{5 - 7e^{-2/5}}{1 - e^{-2/5}}. \]

Problem 3.8.3 Solution

The condition "right side of the circle" is \(R = [0, 1/2]\). Using the PDF in Example 3.5, we have
\[ P[R] = \int_0^{1/2} f_Y(y)\,dy = \int_0^{1/2} 3y^2\,dy = 1/8. \]
Therefore, the conditional PDF of \(Y\) given event \(R\) is
\[ f_{Y|R}(y) = \begin{cases} 24y^2 & 0 \le y \le 1/2, \\ 0 & \text{otherwise.} \end{cases} \]
The conditional expected value and mean square value are
\[ E[Y|R] = \int_{-\infty}^\infty y f_{Y|R}(y)\,dy = \int_0^{1/2} 24y^3\,dy = 3/8 \text{ meter,} \]
\[ E[Y^2|R] = \int_{-\infty}^\infty y^2 f_{Y|R}(y)\,dy = \int_0^{1/2} 24y^4\,dy = 3/20 \text{ m}^2. \]
The conditional variance is
\[ \mathrm{Var}[Y|R] = E[Y^2|R] - (E[Y|R])^2 = \frac{3}{20} - \left(\frac{3}{8}\right)^2 = 3/320 \text{ m}^2. \]
The conditional standard deviation is \(\sigma_{Y|R} = \sqrt{\mathrm{Var}[Y|R]} = 0.0968\) meters.

Problem 3.8.4 Solution

From Definition 3.8, the PDF of \(W\) is
\[ f_W(w) = \frac{1}{\sqrt{32\pi}}e^{-w^2/32}. \]

(a) Since \(W\) has expected value \(\mu = 0\), \(f_W(w)\) is symmetric about \(w = 0\). Hence \(P[C] = P[W > 0] = 1/2\). From Definition 3.15, the conditional PDF of \(W\) given \(C\) is
\[ f_{W|C}(w) = \begin{cases} f_W(w)/P[C] & w \in C, \\ 0 & \text{otherwise,} \end{cases} = \begin{cases} 2e^{-w^2/32}/\sqrt{32\pi} & w > 0, \\ 0 & \text{otherwise.} \end{cases} \]

(b) The conditional expected value of \(W\) given \(C\) is
\[ E[W|C] = \int_{-\infty}^\infty w f_{W|C}(w)\,dw = \frac{2}{4\sqrt{2\pi}}\int_0^\infty w e^{-w^2/32}\,dw. \]
Making the substitution \(v = w^2/32\), we obtain
\[ E[W|C] = \frac{32}{\sqrt{32\pi}}\int_0^\infty e^{-v}\,dv = \frac{32}{\sqrt{32\pi}}. \]

(c) The conditional second moment of \(W\) is
\[ E[W^2|C] = \int_{-\infty}^\infty w^2 f_{W|C}(w)\,dw = 2\int_0^\infty w^2 f_W(w)\,dw. \]
We observe that \(w^2 f_W(w)\) is an even function. Hence
\[ E[W^2|C] = 2\int_0^\infty w^2 f_W(w)\,dw = \int_{-\infty}^\infty w^2 f_W(w)\,dw = E[W^2] = \sigma^2 = 16. \]
Lastly, the conditional variance of \(W\) given \(C\) is
\[ \mathrm{Var}[W|C] = E[W^2|C] - (E[W|C])^2 = 16 - 32/\pi = 5.81. \]

Problem 3.8.5 Solution

(a) We first find the conditional PDF of \(T\). The PDF of \(T\) is
\[ f_T(t) = \begin{cases} 100e^{-100t} & t \ge 0, \\ 0 & \text{otherwise.} \end{cases} \]
The conditioning event has probability
\[ P[T > 0.02] = \int_{0.02}^\infty f_T(t)\,dt = -e^{-100t}\Big|_{0.02}^\infty = e^{-2}. \]
From Definition 3.15, the conditional PDF of \(T\) is
\[ f_{T|T>0.02}(t) = \begin{cases} \dfrac{f_T(t)}{P[T > 0.02]} & t \ge 0.02, \\ 0 & \text{otherwise,} \end{cases} = \begin{cases} 100e^{-100(t-0.02)} & t \ge 0.02, \\ 0 & \text{otherwise.} \end{cases} \]
The conditional expected value of \(T\) is
\[ E[T|T > 0.02] = \int_{0.02}^\infty t(100)e^{-100(t-0.02)}\,dt. \]
The substitution \(\tau = t - 0.02\) yields
\[ E[T|T > 0.02] = \int_0^\infty (\tau + 0.02)(100)e^{-100\tau}\,d\tau = \int_0^\infty (\tau + 0.02)f_T(\tau)\,d\tau = E[T + 0.02] = 0.03. \]

(b) The conditional second moment of \(T\) is
\[ E[T^2|T > 0.02] = \int_{0.02}^\infty t^2(100)e^{-100(t-0.02)}\,dt. \]
The substitution \(\tau = t - 0.02\) yields
\[ E[T^2|T > 0.02] = \int_0^\infty (\tau + 0.02)^2(100)e^{-100\tau}\,d\tau = \int_0^\infty (\tau + 0.02)^2 f_T(\tau)\,d\tau = E[(T + 0.02)^2]. \]
Now we can calculate the conditional variance:
\[ \mathrm{Var}[T|T > 0.02] = E[T^2|T > 0.02] - (E[T|T > 0.02])^2 = E[(T + 0.02)^2] - (E[T + 0.02])^2 = \mathrm{Var}[T + 0.02] = \mathrm{Var}[T] = (0.01)^2. \]

Problem 3.8.6 Solution

(a) In Problem 3.6.8, we found that the PDF of \(D\) is
\[ f_D(y) = \begin{cases} 0.3\delta(y) & y < 60, \\ 0.07e^{-(y-60)/10} & y \ge 60. \end{cases} \]
First, we observe that \(D > 0\) if the throw is good so that \(P[D > 0] = 0.7\). A second way to find this probability is
\[ P[D > 0] = \int_{0^+}^\infty f_D(y)\,dy = 0.7. \]
From Definition 3.15, we can write
\[ f_{D|D>0}(y) = \begin{cases} \dfrac{f_D(y)}{P[D > 0]} & y > 0, \\ 0 & \text{otherwise,} \end{cases} = \begin{cases} (1/10)e^{-(y-60)/10} & y \ge 60, \\ 0 & \text{otherwise.} \end{cases} \]

(b) If instead we learn that \(D \le 70\), we can calculate the conditional PDF by first calculating
\[ P[D \le 70] = \int_0^{70} f_D(y)\,dy = \int_0^{60} 0.3\delta(y)\,dy + \int_{60}^{70} 0.07e^{-(y-60)/10}\,dy = 0.3 + \left(-0.7e^{-(y-60)/10}\right)\Big|_{60}^{70} = 1 - 0.7e^{-1}. \]
The conditional PDF is
\[ f_{D|D\le 70}(y) = \begin{cases} \dfrac{f_D(y)}{P[D \le 70]} & y \le 70, \\ 0 & \text{otherwise,} \end{cases} = \begin{cases} \dfrac{0.3}{1 - 0.7e^{-1}}\delta(y) & 0 \le y < 60, \\ \dfrac{0.07}{1 - 0.7e^{-1}}e^{-(y-60)/10} & 60 \le y \le 70, \\ 0 & \text{otherwise.} \end{cases} \]

Problem 3.8.7 Solution

(a) Given that a person is healthy, \(X\) is a Gaussian \((\mu = 90, \sigma = 20)\) random variable. Thus,
\[ f_{X|H}(x) = \frac{1}{\sigma\sqrt{2\pi}}e^{-(x-\mu)^2/2\sigma^2} = \frac{1}{20\sqrt{2\pi}}e^{-(x-90)^2/800}. \]

(b) Given the event \(H\), we use the conditional PDF \(f_{X|H}(x)\) to calculate the required probabilities
\[ P[T^+|H] = P[X \ge 140|H] = P[X - 90 \ge 50|H] = P\!\left[\frac{X - 90}{20} \ge 2.5\,\Big|\,H\right] = 1 - \Phi(2.5) = 0.006. \]
Similarly,
\[ P[T^-|H] = P[X \le 110|H] = P[X - 90 \le 20|H] = P\!\left[\frac{X - 90}{20} \le 1\,\Big|\,H\right] = \Phi(1) = 0.841. \]

(c) Using Bayes' Theorem, we have
\[ P[H|T^-] = \frac{P[T^-|H]P[H]}{P[T^-]} = \frac{P[T^-|H]P[H]}{P[T^-|D]P[D] + P[T^-|H]P[H]}. \]
In the denominator, we need to calculate
\[ P[T^-|D] = P[X \le 110|D] = P[X - 160 \le -50|D] = P\!\left[\frac{X - 160}{40} \le -1.25\,\Big|\,D\right] = \Phi(-1.25) = 1 - \Phi(1.25) = 0.106. \]
Thus,
\[ P[H|T^-] = \frac{P[T^-|H]P[H]}{P[T^-|D]P[D] + P[T^-|H]P[H]} = \frac{0.841(0.9)}{0.106(0.1) + 0.841(0.9)} = 0.986. \]

(d) Since \(T^-\), \(T^0\), and \(T^+\) are mutually exclusive and collectively exhaustive,
\[ P[T^0|H] = 1 - P[T^-|H] - P[T^+|H] = 1 - 0.841 - 0.006 = 0.153. \]
We say that a test is a failure if the result is \(T^0\). Thus, given the event \(H\), each test has conditional failure probability of \(q = 0.153\), or success probability \(p = 1 - q = 0.847\). Given \(H\), the number of trials \(N\) until a success is a geometric \((p)\) random variable with PMF
\[ P_{N|H}(n) = \begin{cases} (1 - p)^{n-1}p & n = 1, 2, \ldots, \\ 0 & \text{otherwise.} \end{cases} \]
Thus P [Bi ] = (i+1)∆ i∆ 8x 8∆(i∆ + ∆/2) dx = 2 r r2 x ∆(i∆+∆/2) (2) It follows that the conditional PDF of X given Bi is fX|Bi (x) = 0 fX (x) P [Bi ] x ∈ Bi = otherwise 0 i∆ ≤ x < (i + 1)∆ otherwise (3) Given event Bi . we can write fZ (z) = i P [Bi ] fZ|Bi (z) = fZ|B0 (z) i P [Bi ] = fZ|B0 (z) (5) Thus. Z is a uniform (−∆/2. b) random variable. Thus. suppose X has the “triangular” PDF fX (x) = 8x/r2 0 ≤ x ≤ r/2 0 otherwise (1) In this case. Var[Z] = ∆2 (∆/2 − (−∆/2))2 = .8. 116 . since X has the uniform (−r/2. This implies fZ|Bi (z) = fX|Bi (z + i∆ + ∆/2) = 0 z+i∆+∆/2 ∆(i∆+∆/2) −∆/2 ≤ z < ∆/2 otherwise (4) We observe that the PDF of Z depends on which event Bi occurs. almost any non-uniform random variable X will yield a non-uniform random variable Z. 12 12 (6) Problem 3. the conditional PDF of X given Bi is fX|Bi (x) = fX (x) /P [B] x ∈ Bi = 0 otherwise (3) It follows that given Bi . the event Bi that Y = i∆ + ∆/2 occurs if and only if i∆ ≤ X < (i + 1)∆. In particular. From the definition of a uniform (a. Z = X − Y = X − ∆/2 − i∆. Moreover. Z has mean and variance E [Z] = 0. which is a uniform (−∆/2. (i+1)∆ i∆ (1) ∆ 1 dx = r r 1/∆ i∆ ≤ x < (i + 1)∆ 0 otherwise (2) In addition. That is. so that Z = X − Y = X − i∆ − ∆/2. ∆/2) random variable. 2) sb=[-5. Here is an example plot: 300 Relative Frequency 250 200 150 100 50 0 −15 −10 −5 0 x 5 10 15 As expected.3. generating samples of the receiver voltage is easy in Matlab. hist(x. generate 10.2 Solution The modem receiver voltage is genrated by taking a ±5 voltage representing data. 117 .1 y=4*rand(m. 4) random variable.20.m).4) random %variable Y of Quiz 3. %Usage: x=modemrv(m) %generates m samples of X.1 Solution Taking the derivative of the CDF FY (y) in Quiz 3.pb. By Theorem 3. The commands x=modemrv(10000). noise=gaussrv(0. 0.32. Problem 3.1). we obtain fY (y) = 1/4 0 ≤ y ≤ 4 0 otherwise (1) We see that Y is a uniform (0. %X=+-5 + N where N is Gaussian (0. 5]. x=b+noise.Problem 3. then Y = 4X is a uniform (0.1. 
1) random variable.9.2. the modem %receiver voltage in Exampe 3. Although siuations in which two random variables are added together are not analyzed until Chapter 4. 4) random variable.9. Here is the code: function x=modemrv(m).5. the result is qualitatively similar (“hills” around X = −5 and X = 5) to the sketch in Figure 3. the program quiz31rv is essentially a one line program: function y=quiz31rv(m) %Usage y=quiz31rv(m) %Returns the vector y holding m %samples of the uniform (0. 1) random variable. and adding to it a Gaussian (0. 2) noise variable.000 sample of the modem receiver voltage and plots the relative frequencies using 100 bins. b=finiterv(sb.5]. if X is a uniform (0.100). Using rand as Matlab’s uniform (0. pb=[0.m). figure. . In fact.5 × 10−3 . -0.231641888. but the relative error is so small that it isn’t visible in comparison to e(6) ≈ −3. the right hand graph plots |e(z)| versus z in log scale where we observe very small relative errors on the order of 10−7 . -0. qhat=qapprox(z)..^5]*a).9.7265760135. semilogy(z. To compare this technique with that use in geometricrv. given p. Here are the output figures of qtest.*z(:))).^3 t. Problem 3.e).7107068705. Here is the Matlab function that implements this technique: function k=georv(p. %approximation to the Gaussian % (0.^2 t. we first examine the code for exponentialrv.5307027145].0+(0. we can write λ = − ln(1 − p) and X is a geometric (p) random variable. q=1. This code generates two plots of the relative error e(z) as a function of z: z=0:0.9.127414796.1)).02:6.m) x=-(1/lambda)*log(1-rand(m.^2)/2).m.m))./q. if X is an exponential (λ) random variable. 0.9. e=(q-qhat).0-phi(z(:)). a=[0.Problem 3.*exp(-(z(:). k=ceil(exponentialrv(lambda./(1. e(z) is nonzero over that range.^4 t.abs(e)). To see the error for small z. then K = X is a geometric (p) random variable with p = 1 − e−λ . p=([t t. lambda= -log(1-p).142248368.m: function x=exponentialrv(lambda.4 Solution By Theorem 3. 
plot(z.3 Solution ˆ The code for Q(z) is the Matlab function function p=qapprox(z).m: 1 0 e(z) −1 −2 −3 −4 0 2 z 4 6 x 10 −3 10 10 10 10 10 −2 −4 −6 −8 −10 0 2 4 6 The left side plot graphs e(z) versus z. 0.1) complementary CDF Q(z) t=1.m). It appears that the e(z) = 0 for z ≤ 3.. 118 . Thus. n). any w in the interval [−3. so that the Matlab code W=icdfrv(@iwcdf.n) delta=0. 3] satisfies FW (w) = 1/4. the function W = FW (U ) produces a random variable with CDF FW (w). exponentialrv(lambda. For a uniform (0. in terms of generating samples of random variable W . generates n samples of an exponential λ random variable and plots the relative frequency ni /(n∆) against the corresponding exponential PDF.1). Careful inspection will show that u = (w + 5)/8 for −5 ≤ w < −3 and that u = 1/4 + 3(w − 3)/8 for −1 −3 ≤ w ≤ 5. plot(xr. we have that K= X = ln(1 − R) ln(1 − p) (2) ln(1 − R) λ (1) This is precisely the same function implemented by geometricrv. In particular. However. Problem 3. Thus. In short. let R = rand(1. we define the inverse CDF as −1 w = FW (u) = Problem 3. for a uniform (0.9. the following code function exponentialtest(lambda. P [U = 1/4] = 0. x=exponentialrv(lambda.fxsample).0/lambda))’. w=((u>=0).*(8*u-5))+.*((8*u+7)/3)).000 and n = 100.5 Solution 8u − 5 0 ≤ u ≤ 1/4 (8u + 7)/3 1/4 < u ≤ 1 (1) −1 Note that because 0 ≤ FW (w) ≤ 1. the two methods for generating geometric (p) random samples are one in the same.m. ((u > 0.. 3]. 1) random variable U . fx=exponentialpdf(lambda.To analyze how m = 1 random sample is generated. xr=(0:delta:(5.*(u<=1).01.25). we define function w=iwcdf(u).9.25). we need to find the “inverse” function that finds the value of w satisfying u = FW (w). In terms of mathematics. The problem is that for u = 1/4. 1) random variable U . Two representative plots for n = 1. Given 0 ≤ u ≤ 1. To implement this solution in Matlab.m) generates m samples of random variable W .1) generates the random variable X=− For λ = − ln(1 − p).xr)/(n*delta)). 
the inverse FW (u) is defined only for 0 ≤ u ≤ 1.xr).*(u <= 0. fxsample=(histc(x. Note that the histc function generates a histogram using xr to define the edges of the bins..fx. Thus we can choose any w ∈ [−3.000 samples appear in the following figure: 119 .xr. this doesn’t matter.6 Solution (a) To test the exponential random variables. 1. (b) Similar results hold for Gaussian random variables.5 0.1 0 0 2 4 6 gausstest(3. y=(delta/2)+delta*floor(x/delta). fx=gausspdf(mu.r/2) b bit quantizer n=2^b.xr)/(n*delta)).(r-delta/2)/2).-(r-delta/2)/2). r/2) b-bit quantizer.1.1.xr).100000) For n = 1.xr.8 0. function y=uquantize(r. Here are two typical plots produced by gaussiantest.x) %uniform (-r/2.sigma2.4 0.000.n).01. x=min(x. fxsample=(histc(x. For n = 100.1000) exponentialtest(1.6 0. plot(xr.2 0.5 1.m: 1 0.4 0.000. The function uquantize does this.n) delta=0.5 0.100000) Problem 3.sigma2. xr=(0:delta:(mu+(3*sqrt(sigma2))))’. x=max(x.sigma2.5 0 0 1 2 3 4 5 0 0 1 2 3 4 5 exponentialtest(1. function gausstest(mu. delta=r/n. the greater smoothness of the curve demonstrates how the relative frequency is becoming a better approximation to the actual PDF.3 0.fxsample). 120 .fx.9.2 0 0 2 4 6 0. The following code generates the same comparison between the Gaussian PDF and the relative frequency of n samples.7 Solution First we need to build a uniform (−r/2. the jaggedness of the relative frequency occurs because δ is sufficiently small that the number of sample of X in each bin i∆ < X ≤ (i+1)∆ is fairly small.1000) gausstest(3.b.5 1 1 0. x=gaussrv(mu. the event X = 1/4 has probability zero. r/2). Next. U = F −1 (X). For a Gaussian random variable X. z=x-y. In this problem. given a value of x. we ave to find the value of u such that x = FU (u). However.m) x=gaussrv(0. Given U with CDF FU (u) = F (u). One complication is that in the theorem.3. To generate the inverse CDF. 3 bits. 15000 10000 5000 0 −2 15000 10000 5000 0 −1 15000 10000 5000 0 −0.5 0 2 0 1 0 0. 1) random variable X. 
To solve this problem.m eliminates Gaussian samples outside the range (−r/2.22 with the roles of X and U reversed. At x = 1/4. However. Recall that random variable U defined in Problem 3. except for x = 1/4. we can arbitrarily define a value for −1 −1 FU (1/4) because when we produce sample values of FU (X). hist(z.22. As a result. we are using U for the random variable we want to derive.5 b=1 b=2 b=3 It is obvious that for b = 1 bit quantization.8 Solution 1 <x≤1 4 0≤x≤ 1 4 ⇒x= u+5 8 1 3 ⇒ x = + (u − 3) 4 8 (2) (3) (4) 121 . we will use Theorem 3. To focus our attention on the effect of b bit quantization. we need to find the inverse −1 functon F −1 (x) = FU (x) so that for a uniform (0.Note that if |x| > r/2.x).100).1. quantize them and record the errors: function stdev=quantizegauss(r. When we generate enough Gaussian samples. we want to use Theorem 3. r/2) range. −1 the inverse FU (x) is well defined over 0 < x < 1.5 1/4 −3 ≤ u < 3 FU (u) = (1)   1/4 + 3(u − 3)/8 3 ≤ u < 5 0    −5 0 5 1 u ≥ 5. U denotes the uniform random variable while X is the derived random variable. we will always see some quantization errors due to the finite (−r/2.^2)/length(z)). quantizegauss. there are multiple values of u such that FU (u) = 1/4. stdev=sqrt(sum(z. u At x = 1/4.b. then x is truncated so that the quantizer output has maximum amplitude. P [|X| > r/2] > 0 for any value of r. Here are outputs of quantizegauss for b = 1. it appears that the error is uniform for b = 2 and b = 3. the error is decidely not uniform.b. 0 < x < 1. From the CDF we see that FU(u) Problem 3. 2. we generate Gaussian samples.m). y=uquantize(r. x=x((x<=r/2)&(x>=-r/2)). You can verify that uniform errors is a reasonable model for larger values of b.7 has CDF  1 u < −5  0    (u + 5)/8 −5 ≤ u < −3  0.9. we would expect to see about 3.   0    1/8  0 fU (u) =   3/8    0 u < −5 −5 ≤ u < −3 −3 ≤ u < 3 3≤u<5 u ≥ 5. Finally. 
we can generate a histogram of a million sample values of U using the commands u=urv(1000000). We will see in Chapter 7 that the fraction of samples in a bin does converge to the probability of a sample being in that bin as the number of samples m goes to infinity. the number of samples. Because the histpgram is always the same height. A sample value of U would fall in that bin with probability fU (u0 )∆. (6) Problem 3. For −5 < u0 < −3.100). A Matlab program to implement this solution is now straightforward: function u=urv(m) %Usage: u=urv(m) %Generates m samples of the random %variable U defined in Problem 3. 4 4 3 Count 2 1 0 −5 0 u 5 x 10 Note that the scaling constant 104 on the histogram plot comes from the fact that the histogram was generated using 106 sample points and 100 bins. what is actually happening is that the vertical axis is effectively scaled by 1/m and the height of a histogram bar is proportional to the fraction of m samples that land in that bin.9.6. u = F −1 (x) = 8x − 5 0 ≤ x ≤ 1/4 (8x + 7)/3 1/4 < x ≤ 1 (5) In particular. Consider a bin of idth ∆ centered at u0 .7 x=rand(m. u=u+(x>1/4). we comment that if you generate histograms for a range of values of m.1). u=(x<=1/4). when X is a uniform (0. This gives the (false) impression that any bin centered on u0 has a number of samples increasingly close to mfU (u0 )∆. hist(u.1. we would expect about mfU (u0 )∆ = 105 fU (u0 ) samples in each bin.9 Solution 1 FX(x) 0. we would expect to see about 1. 1) random variable. The width of each bin is ∆ = 10/100 = 0.5 0 −2 From Quiz 3.75 × 104 samples in each bin.These conditions can be inverted to express u as a function of x.  0 (x + 1)/4 −1 ≤ x < 1. As can be seen. you will see that the histograms will become more and more similar to a scaled version of the PDF. Given that we generate m = 106 samples. alongside the corresponding PDF of U . 122 (1) 0 x 2 . The output is shown in the following graph.3. To see that this generates the correct output. 
FX (x) =  1 x ≥ 1. these conclusions are consistent with the histogam data. random variable X has CDF The CDF of X is  x < −1.25 × 104 samples in each bin.*(8*x-5).*(8*x+7)/3. U = F −1 (X) will generate samples of the rndom variable U . For 3 < u0 < 5. Following the procedure outlined in Problem 3.0*(u>=0.7. then F (U ) has the same CDF as X. These facts imply 4u − 1 0 < u < 1/4 ˜ F (u) = (3) 1 1/4 ≤ u ≤ 1 123 . or equivalently. 1) random variable. (x+1)/4 = u. function x=quiz36rv(m) %Usage x=quiz36rv(m) %Returns the vector x holding m samples %of the random variable X of Quiz 3. x = 4u − 1.25))+(1.18. (2) ˜ It follows that if U is a uniform (0. We observe that if 0 < u < 1/4. For 1/4 ≤ u ≤ 1. the minimum x that satisfies FX (x) ≥ u is x = 1. x=((4*u-1). ˜ F (u) = min {x|FX (x) ≥ u} .1).6 u=rand(m. In this case.*(u< 0. This is trivial to implement in Matlab. then we can choose x so that FX (x) = u. we define for 0 < u ≤ 1.25)). ∞) = 1 − e−x x ≥ 0 0 otherwise (2) (1) (c) Likewise for the marginal CDF of Y . Y ≤ ∞] ≤ P [X ≤ −∞] = 0 (d) Part (d) follows the same logic as that of part (a).Y (x. Y ≤ 3] can be found be evaluating the joint CDF FX.Y (x.Y (∞. Therefore the following is true. Y ≤ −∞] ≤ P [Y ≤ −∞] = 0 (b) The probability that any random variable is less than infinity is always one. FY (y) = FX. FX. we have FX. FX. Y ≤ y] = P [Y ≤ y] = FY (y) (5) (4) (3) (2) (1) 124 . FX (x) = FX.1. This yields P [X ≤ 2. P [X ≤ −∞] = 0. FX.Y (−∞. ∞) = P [X ≤ −∞. FX (x). Y ≤ 3] = FX. we evaluate the joint CDF at X = ∞.2 Solution (a) Because the probability that any random variable is less than −∞ is zero.1 Solution (a) The probability P [X ≤ 2. 3) = (1 − e−2 )(1 − e−3 ) (b) To find the marginal CDF of X. Y ≤ y] ≤ P [X ≤ −∞] = 0 (e) Analogous to Part (b). y) = P [X ≤ ∞.Problem Solutions – Chapter 4 Problem 4.1. −∞) = P [X ≤ x. y) = 1 − e−y y ≥ 0 0 otherwise (3) Problem 4. we find that FX.Y (2. y) at x = 2 and y = 3.Y (−∞. y) = P [X ≤ −∞. 
Y ≤ ∞] = P [X ≤ x] = FX (x) (c) Although P [Y ≤ ∞] = 1.Y (x.Y (∞.Y (x. we simply evaluate the joint CDF at y = ∞. ∞) = P [X ≤ x. (2) By Theorem 4. we show that for all x1 ≤ x2 and y1 ≤ y2 .3 Solution (1) Keep in mind that the intersection of events A and B are all the outcomes such that both A and B occur. y1 ≤ Y ≤ y2 ] . y2 ) − F (x1 .1 are satisfied. specifically.1. FX (x)FY (y) is a valid joint CDF. y1 ≤ Y ≤ y2 ] Expressed in terms of the marginal and joint CDFs.5 yields P [x1 < X1 ≤ x2 . However. y2 ) − F (x2 .Y (x1 . y1 ) − FX.Y (x2 . y1 ) Problem 4. y1 ) (a) The events A.Y (x1 . ≥0 = [FX (x2 ) − FX (x1 )][FY (y2 ) − FY (y1 )] Problem 4. y1 ) Y y2 y1 Y (3) x1 x2 x1 X x1 x2 X X A 125 B C . y1 ) = FX (x2 ) FY (y2 ) − FX (x1 ) FY (y2 ) − FX (x2 ) FY (y1 ) + FX (x1 ) FY (y1 ) (2) (3) (4) (5) (6) Hence.4 Solution Its easy to show that the properties of Theorem 4. y1 < Y ≤ y2 ] ≥ 0 (1) P [x1 < X ≤ x2 .Y (x2 .5 Solution In this problem. In this case. for x1 ≤ x2 and y1 ≤ y2 . and C are Y y2 y1 y2 y1 (1) (2) − FX.We wish to find P [x1 ≤ X ≤ x2 ] or P [y1 ≤ Y ≤ y2 ].Y (x1 .Y (x2 . To convince ourselves that F (x. P [x1 ≤ X ≤ x2 . Theorem 4. We define events A = {y1 ≤ Y ≤ y2 } and B = {y1 ≤ Y ≤ y2 } so that P [A ∪ B] = P [A] + P [B] − P [AB] Problem 4. y2 ) + FX. y1 < Y ≤ y2 ] = FX. y2 ) − FX.Y (x2 . It follows that P [A ∪ B] = P [x1 ≤ X ≤ x2 ] + P [y1 ≤ Y ≤ y2 ] − P [x1 ≤ X ≤ x2 .1.Y (x1 .1. B. y1 ) .Y (x1 . y1 ≤ Y ≤ y2 }.Y (x2 . y2 ) − FX. those properties are necessary but not sufficient to show F (x.5. y2 ) + FX.Y (x1 . y1 < Y ≤ y2 ] = F (x2 .Y (x2 .5 which states P [x1 < X ≤ x2 . y1 ) + FX. y2 ) + FX. y) is a CDF. (3) (4) (5) P [A ∪ B] = FX (x2 ) − FX (x1 ) + FY (y2 ) − FY (y1 ) − FX. we prove Theorem 4. y2 ) − FX. = FX. y) is a valid CDF. AB = {x1 ≤ X ≤ x2 . y1 ) + F (x1 . y).Y (x. y ≥ 0 0 otherwise (1) Thus.Y (x2 .Y (x1 . we have the contradiction that e−(x+y) ≤ 0 for all x.Y (∞. we can write P [B] = FX.Y (x1 . P [{X > x} ∪ {Y > y}] = 1 − P [X ≤ x. y1 ) + FX. 
We can conclude that the given function is not a valid CDF. y1 ) (c) Since A. y2 ) − FX.Y (x. y) = First. 126 . y2 ) − FX.6 Solution The given function is FX.Y (x1 . this implies P [{X > x} ∪ {Y > y}] ≤ P [X > x] + P [Y > y] = 0 However. y). y2 ) − FX. P [X > x] = 0 For x ≥ 0 and y ≥ 0.Y (x2 . y1 ) (4) (5) (6) P [A ∪ B ∪ C] = FX.Y (x2 . y1 < Y ≤ y2 ] in terms of the joint CDF FX. we find the CDF FX (x) and FY (y). B.Y (x1 . y2 ) − FX. y ≥ 0.Y (x.1. Y ≤ y] = 1 − (1 − e−(x+y) ) = e−(x+y) (6) (5) P [Y > y] = 0 (4) 1 x≥0 0 otherwise 1 y≥0 0 otherwise (2) (3) 1 − e−(x+y) x. y1 ) − FX. and C are mutually exclusive. ∞) = FY (y) = FX. since we want to express P [C] = P [x1 < X ≤ x2 . we write P [C] = P [A ∪ B ∪ C] − P [A] − P [B] which completes the proof of the theorem. (7) (8) (9) (10) = FX.Y (x1 . for any x ≥ 0 or y ≥ 0. P [A ∪ B ∪ C] = P [A] + P [B] + P [C] However. y1 ) P [A] = FX.Y (x1 . y) = Hence.Y (x2 . FX (x) = FX.(b) In terms of the joint CDF FX.Y (x. y1 ) Problem 4. Y plane. Y plane: 4 3 2 1 0 0 y P X.4 y=1.4 x y=1.4 (6) PX.Problem 4.2.4 y>x PX.3 y (1) (2) = c [1(1 + 3) + 2(1 + 3) + 4(1 + 3)] = 28c Thus c = 1/28. (b) The event {Y < X} has probability P [Y < X] = x=1.2. y) = 1 1(1) + 2(0) = 28 28 (5) The indirect way is to use the previous results and the observation that P [Y = X] = 1 − P [Y < X] − P [Y > X] = (1 − 18/28 − 9/28) = 1/28 (e) P [Y = 3] = x=1.Y (x. it is helpful to label points with nonzero probability on the X. y) = 18 1(0) + 2(1) + 4(1 + 3) = 28 28 (3) (c) The event {Y > X} has probability P [Y > X] = x=1. The direct way is to calculate P [Y = X] = x=1.2.4 y<x PX.Y (x. y) T •3c •6c •c •2c 1 2 3 •12c E x 4 •4c (a) We must choose c so the PMF sums to one: PX.2 Solution On the X.Y (x.1 Solution In this problem.2.2.Y (x.Y (x.2. y) = 9 1(3) + 2(3) + 4(0) = 28 28 (4) (d) There are two ways to solve this part.2. y) = c x=1.Y (x. the joint PMF is 127 . 3) = 21 3 (1)(3) + (2)(3) + (4)(3) = = 28 28 4 (7) Problem 4.4 y=x PX.2.3 x=1. 0) + PX. 1) + PX. 
0) + PX.Y (x.Y (−2.Y (−2.Y (2. P [X = Y ] = 0. 0) + PX. (4) (5) (2) (3) (6) (7) Problem 4. (b) P [Y < X] = PX. aa. 1) = 8c = 8/14.Y (−2.Y (−2.Y (0.Y (x. There are four possible outcomes: rr.3 Solution Let r (reject) and a (accept) denote the result of each test.0. We choose c so the sum equals one.Y (0. −1) + PX. 1) = c + c + 2c + 3c = 7c = 1/2 (c) P [Y > X] = PX. PX.Y (0. ar. −1) + PX. 1) + PX. The sample tree is $ $ $$$ r $ˆ ˆ ˆ p ¨ ˆˆ ¨¨ 1−p ˆ a ¨ ¨¨ rr rr p $ $$ r r 1−p r a $$$ ˆˆˆ ˆˆ 1−p ˆ a p r •rr •ra •ar •aa p2 p(1−p) p(1−p) (1−p)2 128 .2.Y (0. y) given above.1 c |x + y| = 6c + 2c + 6c = 14c (1) Thus c = 1/14. (e) P [X < 1] = PX.Y (2. −1) + PX. −1) + PX. we sum the PMF over all possible values of X and Y .Y (−2.0.Y (−2.y PX.Y (x. −1) + PX. y) = x y x=−2. 1) = 3c + 2c + c + c = 7c = 1/2 (d) From the sketch of PX. y) T 1 •c •c •2c •3c •c c 1 ' •3c •c 2 •2c E x (a) To find c.2 y=−1.Y (2. ra. we observe that we are depicting the masses associated with the joint PMF PX.Y (x. y = 1 x = 1. Since the PMF function is defined in terms of x and y. y) =   (1 − p)2    0 (1) x = 1. y = 0 otherwise (2) The sample space is the set S = {hh.Y (x. y) given in the figure could just as well have been written PX.Y (x. y) y = 0 y = 1 x=0 0 1/4 1/4 1/4 x=1 1/4 0 x=2 Problem 4. since the expression for the PMF PX.  2  p    p(1 − p)  p(1 − p) PX. Each sample outcome specifies the values of X and Y as given in the following table outcome X Y hh 0 1 1 0 ht 1 1 th 2 0 tt The joint PMF can represented by the table PX. the axis labels should be x and y. reasonable arguments can be made for the labels being X and Y or x and y. y = 0 x = 0. The fact that we have labeled the possible outcomes by their probabilities is irrelevant. As we see in the arguments below. ·). P [·] X Y outcome 2 rr p 1 1 p(1 − p) 1 0 ra p(1 − p) 0 1 ar (1 − p)2 0 0 aa This table is esentially the joint PMF PX.2. 129 . y). th.4 Solution (1) (2) Problem 4. ht. 
The corresponding axis labels should be X and Y just as in Figure 4. Further. • Uppercase axis labels: On the other hand. the lowercase choice of the text is somewhat arbitrary.2.Y (x.2. it is clear that the lowercase x and y are not what matter.5 Solution As the problem statement says. • Lowercase axis labels: For the lowercase labels. tt} and each sample point has probability 1/4. we are depicting the possible outcomes (labeled with their respective probabilities) of the pair of random variables X and Y . y = 1 x = 0.Y (·. y) whose arguments are x and y.Now we construct a table that maps the sample outcomes to values of X and Y .Y (x. X (k.  n−1  k−1 (1 − p)k pn−k x = 0. k = 1.15.7 Solution When X = 1 the last test was acceptable and therefore we know that the K = k ≤ n − 1 tails must have occurred in the previous n − 1 tests. and B: test y + 1 is a rejection. Note that Y ≤ X since the number of acceptable tests before the first failure cannot exceed the number of acceptable circuits. When X = 0 that means that a rejection occurred on the last test and that the other k − 1 rejections must have occurred in the previous n − 1 tests. . n) = pn . 2.Y (n. PK. . .2. . n − 1 (2) Optional We can combine these cases into a single complete expression for the joint PMF.2. let’s consider each in turn. Since events A. Thus P [Y = y] = P [AB]. This means that both events must be satisfied. .Y (x. 0) = n−1 (1 − p)k−1 pn−1−(k−1) (1 − p) k−1 k = 1. 1) = n−1 (1 − p)k pn−1−k p k k = 0. The joint PMF of X and K is PK. n (1) Problem 4. X = x].X (k. . . PK. y) = P [X = x. PX. Y = y] = P [ABC] = P [A] P [B] P [C] = py (1 − p) P [A] P [B] (1) (2) (3) n − y − 1 x−y p (1 − p)n−y−1−(x−y) x−y P [C] = n−y−1 x p (1 − p)n−x x−y (4) The case y = x = n occurs when all n tests are acceptable and thus PX. The joint PMF.X (k. x) = P [K = k. Thus. .Problem 4. . .X (k. x) = P [K = k. PK. Thus. for 0 ≤ y ≤ x < n.2. Since X can take on only the two values 0 and 1. x) = (1 − p)k pn−k x = 1. Moreover. 
In this case. .8 Solution . Let K denote the number of rejected circuits that occur in n tests and X is the number of acceptable circuits before the first reject. The approach we use is similar to that used in finding the Pascal PMF in Example 2. . Y = y < n if and only if A: the first y tests are acceptable. the event X = x < n occurs if and only if there are x − y acceptable circuits in the remaining n − y − 1 tests. . X = x] can be found by realizing that {K = k. they are independent events. X = x} occurs if and only if the following events occur: 130 Problem 4.6 Solution As the problem statement indicates. . n n−1 PK. n − 1  k 0 otherwise (3) Each circuit test produces an acceptable circuit with probability p. which is the probability that K = k and X = x. . given the occurrence of AB. B and C depend on disjoint sets of tests.X (k. k = 0. 1. . Y (x. y) = PY (y) =  x=1.Y (x.Y (x.X (k.2.4 0 otherwise (b) The expected values of X and Y are E [X] = x=1.A The first x tests must be acceptable. x) = px (1 − p) P [A] P [B] n−x−1 (1 − p)k−1 pn−x−1−(k−1) k−1 P [C] (1) After simplifying. y) =  16/28 x = 4  y=1. x) = 0 n−x−1 k−1 pn−k (1 − p)k x + k ≤ n. Since the events A.2. C The remaining n − x − 1 tests must contain k − 1 rejections. the joint PMF PX. (a) The marginal PMFs of X and Y are   4/28 x = 1   8/28 x = 2 PX (x) = PX. B and C are independent. x ≥ 0. B Test x+1 must be a rejection since otherwise we would have x+1 acceptable at the beginnning. Y plane. x ≥ 0 and k ≥ 0 is PK.X (k.3. y) 3c 6c • • 12c • • 4 c • 2c • 1 2 3 4c E x By choosing c = 1/28. the PMF sums to one. k ≥ 0 otherwise (2) Problem 4.3  0 otherwise   7/28 y = 1 21/28 y = 3 PX. a complete expression for the joint PMF is PK.1 Solution On the X.3 (3) (4) E [Y ] = 131 .4 (1) (2) xPX (x) = (4/28) + 2(8/28) + 4(16/28) = 3 yPY (y) = 7/28 + 3(21/28) = 5/2 y=1.Y (x. y) is y T 4 3 2 1 0 0 PX. the joint PMF for x + k ≤ r. 0.0.0.2   6/14 2/14 (x.Y (x.1 (c) Since X and Y both have zero mean. 
2 x=0 otherwise y = −1.3 (5) (6) E Y The variances are 2 = Var[X] = E X 2 − (E [X])2 = 10/7 The standard deviations are σX = Var[Y ] = E Y 2 − (E [Y ])2 = 3/4 3/4.(c) The second moments are E X2 = x=1.0.Y (b) The expected values of X and Y are E [X] = x=−2. y) T •c 1 •c •3c ' •2c •2c E x 1 2 •3c •c •c c The PMF sums to one when c = 1/14.1 (5) (6) Var[Y ] = E Y 2 = The standard deviations are σX = 24/7 and σY = 132 5/7. (7) 10/7 and σY = Problem 4.2 Solution On the X. the joint PMF is y PX.4 xPX (x) = 12 (4/28) + 22 (8/28) + 42 (16/28) = 73/7 yPY (y) = 12 (7/28) + 32 (21/28) = 7 y=1.0. the variances are Var[X] = E X 2 = x=−2. y) =  0   5/14 4/14 (x.3.0. Y plane.2 PX.Y PY (y) = x=−2.2 x2 PX (x) = (−2)2 (6/14) + 22 (6/14) = 24/7 y 2 PY (y) = (−1)2 (5/14) + 12 (5/14) = 5/7 y=−1.1 PX. y) =  0 x = −2.2. 1 y=0 otherwise (1) (2) xPX (x) = −2(6/14) + 2(6/14) = 0 yPY (y) = −1(5/14) + 1(5/14) = 0 (3) (4) E [Y ] = y=−1. . (a) The marginal PMFs of X and Y are PX (x) = y=−1. . . . k) = PN.. Problem 4.4. otherwise (1) (2) 0 pk (1 − p)100−k k = 0. .5 Solution For n = 0. . . 1. . k) =  o otherwise n n (1) For n ≥ 1.3. PN. n n = 1. 1. the marginal PMF of N is n PN (n) = k PN. 2. . 1..K (n.3.4 Solution The joint PMF of N. .K (n. . 100 otherwise Problem 4. 1. 2 .K (n. k) over all possible N . Thus.3.We recognize that the given joint PMF is written as the product of two marginal PMFs PN (n) and PK (k) where 100 Problem 4.K (n. .K (n. .3 Solution PN (n) = PK (k) = k=0 ∞ n=0 PN. this sum cannot be simplified. k) = k=0 100n e−100 100n e−100 = (n + 1)! n! (1) For k = 0. . the marginal PMF of K is PK (k) = ∞ n=k 1 100n e−100 = (n + 1)! 100 = 1 100 ∞ n=k ∞ n=k 100n+1 e−100 (n + 1)! PN (n + 1) (2) (3) (4) = P [N > k] /100 Problem 4. . the marginal PMF of N is PN (n) = k=1 PN. . . K is   (1 − p)n−1 p/n k = 1. .1 Solution 133 .K (n. then N ≥ k. . k) = k=1 (1 − p)n−1 p/n = (1 − p)n−1 p (2) The marginal PMF of K is found by summing PN. Note that if K = k. k) = 0 100n e−100 n! 100 k n = 0. 
∞ 1 (1 − p)n−1 p (3) PK (k) = n n=k Unfortunately. . 134 .Y (x. y ≥ 0 0 otherwise (1) To find the constant c we integrate over the region shown. y) = cxy 2 0 ≤ x.2 Solution Given the joint PDF fX. x.(a) The joint PDF of X and Y is Y 1 Y+X=1 fX. Here we can set up the following integrals Y 1 Y+X=1 Y+X=½ P [X + Y ≤ 1/2] = = 1/2 0 1/2 0 0 1/2−x 2 dy dx (6) (7) (8) X 1 (1 − 2x) dx = 1/2 − 1/4 = 1/4 Problem 4. This gives 1 0 0 1−x 1 c dy dx = cx − cx 2 1 0 = c =1 2 (2) Therefore c = 2. y ≤ 1 0 otherwise (1) (a) To find the constant c integrate fX.Y (x. y) = X c x + y ≤ 1. y) over the all possible values of X and Y to get 1= 0 1 0 1 cxy 2 dx dy = c/6 (2) Therefore c = 6.Y (x. (b) To find the P [X ≤ Y ] we look to integrate over the area indicated by the graph Y 1 X£Y X=Y P [X ≤ Y ] = = X 1/2 0 1/2 0 x 1−x dy dx (2 − 4x) dx (3) (4) (5) 1 = 1/2 (c) The probability P [X + Y ≤ 1/2] can be seen in the figure.4. y) over the indicated shaded region.Y) < ½ Y 1 min(X. to find P [Y ≤ X 2 ] we can integrate over the region shown in the figure. Y ) ≤ 1/2] = 1 − P [min(X. which would require the evaluation of two integrals. Y ≤ 3/4] = 0 3 4 3 4 (11) (12) (13) (14) 6xy 2 dx dy y3 3/4 0 = X 1 = (3/4) = 0.Y) < ¾ P [max(X. 135 (1) . y) over the lighter shaded region. Y 1 1 0 1 0 0 x P [X ≥ Y ] = = X 1 Y 1 Y=X 2 6xy 2 dy dx (3) (4) (5) 2x4 dx = 2/5 Similarly.Y (x. y ≥ 0.Y) > ½ P [min(X. Y ) ≤ 3/4] can be found be integrating over the shaded region shown below. y) = 6e−(2x+3y) x ≥ 0.4. Y ) > 1/2] =1− =1− 1 1 (8) (9) (10) 6xy 2 dx dy dy = 11 32 1/2 1 1/2 1/2 9y 2 X 1 4 (d) The probability P [max(X. or we can perform one integral over the darker region by recognizing min(X. Y ) ≤ 3/4] = P [X ≤ 3/4.237 0 2 3/4 x 0 5 Problem 4.Y (x. Y 1 max(X.(b) The probability P [X ≥ Y ] is the integral of the joint PDF fX.Y (x. 0 otherwise. P Y ≤X X 1 2 = 0 1 0 x2 6xy 2 dy dx (6) (7) = 1/4 (c) Here we can choose to either integrate fX.3 Solution The joint PDF of X and Y is fX. 4. 
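As a quick sanity check on the joint PDF of Problem 4.4.3 (added here as a worked step, not part of the original solution), note that $6e^{-(2x+3y)}$ factors into the product of an exponential PDF with rate 2 in x and an exponential PDF with rate 3 in y. It therefore integrates to one:

\[ \int_0^{\infty}\!\!\int_0^{\infty} 6e^{-(2x+3y)}\,dy\,dx = \left(\int_0^{\infty} 2e^{-2x}\,dx\right)\left(\int_0^{\infty} 3e^{-3y}\,dy\right) = 1. \]

The same factorization means that the probability of any rectangle splits into a product of one-dimensional integrals. For constants $a, b \ge 0$ (symbols introduced only for this illustration),

\[ P[X \le a, Y \le b] = \left(1 - e^{-2a}\right)\left(1 - e^{-3b}\right), \qquad P[X \ge a, Y \ge b] = e^{-2a}e^{-3b}. \]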
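(a) The probability that X ≥ Y is:

[Figure: the first quadrant with the region X ≥ Y shaded.]

\[ P[X \ge Y] = \int_0^{\infty} \int_0^x 6e^{-(2x+3y)}\,dy\,dx \tag{2} \]
\[ = \int_0^{\infty} 2e^{-2x}\left(-e^{-3y}\Big|_{y=0}^{y=x}\right) dx \tag{3} \]
\[ = \int_0^{\infty} \left[2e^{-2x} - 2e^{-5x}\right] dx = 3/5. \tag{4} \]

The probability P[X + Y ≤ 1] is found by integrating over the region where X + Y ≤ 1:

[Figure: the triangle X + Y ≤ 1 in the first quadrant.]

\[ P[X + Y \le 1] = \int_0^1 \int_0^{1-x} 6e^{-(2x+3y)}\,dy\,dx \tag{5} \]
\[ = \int_0^1 2e^{-2x}\left[-e^{-3y}\Big|_{y=0}^{y=1-x}\right] dx \tag{6} \]
\[ = \int_0^1 2e^{-2x}\left[1 - e^{-3(1-x)}\right] dx \tag{7} \]
\[ = \left(-e^{-2x} - 2e^{x-3}\right)\Big|_0^1 \tag{8} \]
\[ = 1 + 2e^{-3} - 3e^{-2}. \tag{9} \]

(b) The event min(X, Y) ≥ 1 is the same as the event {X ≥ 1, Y ≥ 1}. Thus,

\[ P[\min(X, Y) \ge 1] = \int_1^{\infty} \int_1^{\infty} 6e^{-(2x+3y)}\,dy\,dx = e^{-(2+3)}. \tag{10} \]

(c) The event max(X, Y) ≤ 1 is the same as the event {X ≤ 1, Y ≤ 1} so that

\[ P[\max(X, Y) \le 1] = \int_0^1 \int_0^1 6e^{-(2x+3y)}\,dy\,dx = (1 - e^{-2})(1 - e^{-3}). \tag{11} \]

Problem 4.4.4 Solution

The only difference between this problem and Example 4.5 is that in this problem we must integrate the joint PDF over the regions to find the probabilities. Just as in Example 4.5, there are five cases. We will use variables u and v as dummy variables for x and y.

• x < 0 or y < 0

[Figure: the point (x, y) lies outside the triangular region of nonzero probability.]

In this case, the region of integration doesn't overlap the region of nonzero probability and

\[ F_{X,Y}(x, y) = \int_{-\infty}^{y} \int_{-\infty}^{x} f_{X,Y}(u, v)\,du\,dv = 0. \tag{1} \]

• 0 < y ≤ x ≤ 1

[Figure: the point (x, y) lies inside the triangle, on or below the diagonal.]

In this case, the region where the integral has a nonzero contribution is

\[ F_{X,Y}(x, y) = \int_{-\infty}^{y} \int_{-\infty}^{x} f_{X,Y}(u, v)\,du\,dv \tag{2} \]
\[ = \int_0^y \int_v^x 8uv\,du\,dv \tag{3} \]
\[ = \int_0^y 4(x^2 - v^2)v\,dv \tag{4} \]
\[ = \left(2x^2v^2 - v^4\right)\Big|_{v=0}^{v=y} = 2x^2y^2 - y^4. \tag{5} \]

• 0 < x ≤ y and 0 ≤ x ≤ 1

[Figure: the point (x, y) lies above the diagonal with 0 ≤ x ≤ 1.]

\[ F_{X,Y}(x, y) = \int_{-\infty}^{x} \int_{-\infty}^{y} f_{X,Y}(u, v)\,dv\,du \tag{6} \]
\[ = \int_0^x \int_0^u 8uv\,dv\,du \tag{7} \]
\[ = \int_0^x 4u^3\,du = x^4. \tag{8} \]

• 0 < y ≤ 1 and x ≥ 1

[Figure: the point (x, y) lies to the right of the triangle with 0 < y ≤ 1.]

\[ F_{X,Y}(x, y) = \int_{-\infty}^{y} \int_{-\infty}^{x} f_{X,Y}(u, v)\,du\,dv \tag{9} \]
\[ = \int_0^y \int_v^1 8uv\,du\,dv \tag{10} \]
\[ = \int_0^y 4v(1 - v^2)\,dv \tag{11} \]
\[ = 2y^2 - y^4. \tag{12} \]

• x ≥ 1 and y ≥ 1

[Figure: the point (x, y) lies above and to the right of the triangle.]

In this case, the region of integration completely covers the region of nonzero probability and

\[ F_{X,Y}(x, y) = \int_{-\infty}^{y} \int_{-\infty}^{x} f_{X,Y}(u, v)\,du\,dv \tag{13} \]
\[ = 1. \tag{14} \]

The complete answer for the joint CDF is

\[ F_{X,Y}(x, y) = \begin{cases} 0 & x < 0 \text{ or } y < 0 \\ 2x^2y^2 - y^4 & 0 < y \le x \le 1 \\ x^4 & 0 \le x \le y,\ 0 \le x \le 1 \\ 2y^2 - y^4 & 0 \le y \le 1,\ x \ge 1 \\ 1 & x \ge 1,\ y \ge 1 \end{cases} \tag{15} \]

Problem 4.5.1 Solution

(a) The joint PDF (and the corresponding region of nonzero probability) are

[Figure: the triangle −1 ≤ x ≤ y ≤ 1 in the X, Y plane.]

\[ f_{X,Y}(x, y) = \begin{cases} 1/2 & -1 \le x \le y \le 1 \\ 0 & \text{otherwise} \end{cases} \tag{1} \]

(b)

\[ P[X > 0] = \int_0^1 \int_x^1 \frac{1}{2}\,dy\,dx = \int_0^1 \frac{1 - x}{2}\,dx = 1/4. \tag{2} \]

This result can be deduced by geometry. The shaded triangle of the X, Y plane corresponding to the event X > 0 is 1/4 of the total shaded area.

(c) For x > 1 or x < −1, $f_X(x) = 0$. For −1 ≤ x ≤ 1,

\[ f_X(x) = \int_{-\infty}^{\infty} f_{X,Y}(x, y)\,dy = \int_x^1 \frac{1}{2}\,dy = (1 - x)/2. \tag{3} \]

The complete expression for the marginal PDF is

\[ f_X(x) = \begin{cases} (1 - x)/2 & -1 \le x \le 1 \\ 0 & \text{otherwise} \end{cases} \tag{4} \]

(d) From the marginal PDF $f_X(x)$, the expected value of X is

\[ E[X] = \int_{-\infty}^{\infty} x f_X(x)\,dx = \frac{1}{2} \int_{-1}^{1} x(1 - x)\,dx \tag{5} \]
\[ = \left(\frac{x^2}{4} - \frac{x^3}{6}\right)\Big|_{-1}^{1} = -\frac{1}{3}. \tag{6} \]

Problem 4.5.2 Solution

[Figure: the triangular region of nonzero probability bounded by the axes and the line Y + X = 1.]

\[ f_{X,Y}(x, y) = \begin{cases} 2 & x + y \le 1,\ x, y \ge 0 \\ 0 & \text{otherwise} \end{cases} \tag{1} \]

Using the figure we can find the marginal PDFs by integrating over the appropriate regions.

\[ f_X(x) = \int_0^{1-x} 2\,dy = \begin{cases} 2(1 - x) & 0 \le x \le 1 \\ 0 & \text{otherwise} \end{cases} \tag{2} \]

Likewise for $f_Y(y)$:

\[ f_Y(y) = \int_0^{1-y} 2\,dx = \begin{cases} 2(1 - y) & 0 \le y \le 1 \\ 0 & \text{otherwise} \end{cases} \tag{3} \]

Problem 4.5.3 Solution

Random variables X and Y have joint PDF

\[ f_{X,Y}(x, y) = \begin{cases} 1/(\pi r^2) & 0 \le x^2 + y^2 \le r^2 \\ 0 & \text{otherwise} \end{cases} \tag{1} \]

(a) The marginal PDF of X is

\[ f_X(x) = 2\int_0^{\sqrt{r^2 - x^2}} \frac{1}{\pi r^2}\,dy = \begin{cases} \dfrac{2\sqrt{r^2 - x^2}}{\pi r^2} & -r \le x \le r \\ 0 & \text{otherwise} \end{cases} \tag{2} \]

(b) Similarly, for random variable Y,

\[ f_Y(y) = 2\int_0^{\sqrt{r^2 - y^2}} \frac{1}{\pi r^2}\,dx = \begin{cases} \dfrac{2\sqrt{r^2 - y^2}}{\pi r^2} & -r \le y \le r \\ 0 & \text{otherwise} \end{cases} \tag{3} \]

Problem 4.5.4 Solution

The joint PDF of X and Y and the region of nonzero probability are

[Figure: the region 0 ≤ y ≤ x² for −1 ≤ x ≤ 1.]

\[ f_{X,Y}(x, y) = \begin{cases} 5x^2/2 & -1 \le x \le 1,\ 0 \le y \le x^2 \\ 0 & \text{otherwise} \end{cases} \tag{1} \]

We can find the appropriate marginal PDFs by integrating the joint PDF.

(a) The marginal PDF of X is

\[ f_X(x) = \int_0^{x^2} \frac{5x^2}{2}\,dy = \begin{cases} 5x^4/2 & -1 \le x \le 1 \\ 0 & \text{otherwise} \end{cases} \tag{2} \]

(b) Note that $f_Y(y) = 0$ for y > 1 or y < 0. For 0 ≤ y ≤ 1,

[Figure: a horizontal line at height y crossing the region on −1 ≤ x ≤ −√y and √y ≤ x ≤ 1.]

\[ f_Y(y) = \int_{-\infty}^{\infty} f_{X,Y}(x, y)\,dx \tag{3} \]
\[ = \int_{-1}^{-\sqrt{y}} \frac{5x^2}{2}\,dx + \int_{\sqrt{y}}^{1} \frac{5x^2}{2}\,dx \tag{4} \]
\[ = 5(1 - y^{3/2})/3. \tag{5} \]

The complete expression for the marginal PDF of Y is

\[ f_Y(y) = \begin{cases} 5(1 - y^{3/2})/3 & 0 \le y \le 1 \\ 0 & \text{otherwise} \end{cases} \tag{6} \]

Problem 4.5.5 Solution

In this problem, the joint PDF is

\[ f_{X,Y}(x, y) = \begin{cases} 2|xy|/r^4 & 0 \le x^2 + y^2 \le r^2 \\ 0 & \text{otherwise} \end{cases} \tag{1} \]

(a) Since |xy| = |x||y|, for −r ≤ x ≤ r, we can write

\[ f_X(x) = \int_{-\infty}^{\infty} f_{X,Y}(x, y)\,dy = \frac{2|x|}{r^4} \int_{-\sqrt{r^2 - x^2}}^{\sqrt{r^2 - x^2}} |y|\,dy. \tag{2} \]

Since |y| is symmetric about the origin, we can simplify the integral to

\[ f_X(x) = \frac{4|x|}{r^4} \int_0^{\sqrt{r^2 - x^2}} y\,dy = \frac{2|x|}{r^4}\,y^2\Big|_0^{\sqrt{r^2 - x^2}} = \frac{2|x|(r^2 - x^2)}{r^4}. \tag{3} \]

Note that for |x| > r, $f_X(x) = 0$. Hence the complete expression for the PDF of X is

\[ f_X(x) = \begin{cases} \dfrac{2|x|(r^2 - x^2)}{r^4} & -r \le x \le r \\ 0 & \text{otherwise} \end{cases} \tag{4} \]

(b) Note that the joint PDF is symmetric in x and y so that $f_Y(y) = f_X(y)$.

Problem 4.5.6 Solution

(a) The joint PDF of X and Y and the region of nonzero probability are

[Figure: the triangle 0 ≤ y ≤ x ≤ 1 in the X, Y plane.]

\[ f_{X,Y}(x, y) = \begin{cases} cy & 0 \le y \le x \le 1 \\ 0 & \text{otherwise} \end{cases} \tag{1} \]

(b) To find the value of the constant, c, we integrate the joint PDF over all x and y.

\[ \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f_{X,Y}(x, y)\,dx\,dy = \int_0^1 \int_0^x cy\,dy\,dx = \int_0^1 \frac{cx^2}{2}\,dx = \frac{cx^3}{6}\Big|_0^1 = \frac{c}{6}. \tag{2} \]

Thus c = 6.

(c) We can find the CDF $F_X(x) = P[X \le x]$ by integrating the joint PDF over the event X ≤ x. For x < 0, $F_X(x) = 0$. For x > 1, $F_X(x) = 1$. For 0 ≤ x ≤ 1,

[Figure: the part of the triangle to the left of the vertical line at x.]

\[ F_X(x) = \iint_{x' \le x} f_{X,Y}(x', y')\,dy'\,dx' \tag{3} \]
\[ = \int_0^x \int_0^{x'} 6y'\,dy'\,dx' \tag{4} \]
\[ = \int_0^x 3(x')^2\,dx' = x^3. \tag{5} \]

The complete expression for the CDF of X is

\[ F_X(x) = \begin{cases} 0 & x < 0 \\ x^3 & 0 \le x \le 1 \\ 1 & x \ge 1 \end{cases} \tag{6} \]

(d) Similarly, we find the CDF of Y by integrating $f_{X,Y}(x, y)$ over the event Y ≤ y. For y < 0, $F_Y(y) = 0$ and for y > 1, $F_Y(y) = 1$. For 0 ≤ y ≤ 1,

[Figure: the part of the triangle below the horizontal line at y.]

\[ F_Y(y) = \iint_{y' \le y} f_{X,Y}(x', y')\,dy'\,dx' \tag{7} \]
\[ = \int_0^y \int_{y'}^1 6y'\,dx'\,dy' \tag{8} \]
\[ = \int_0^y 6y'(1 - y')\,dy' \tag{9} \]
\[ = \left(3(y')^2 - 2(y')^3\right)\Big|_0^y = 3y^2 - 2y^3. \tag{10} \]

The complete expression for the CDF of Y is

\[ F_Y(y) = \begin{cases} 0 & y < 0 \\ 3y^2 - 2y^3 & 0 \le y \le 1 \\ 1 & y > 1 \end{cases} \tag{11} \]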
Y 1 ½ X P [Y ≤ X/2] = = 1 0 1 0 0 x/2 6y dy dx (12) (13) (14) 3y 2 x/2 dx 0 1 = 0 1 3x2 dx = 1/4 4 Problem 4.(e) To find P [Y ≤ X/2]. it is helpful to label possible points X. PW (w) = 0. we integrate the joint PDF fX.Y (x. 142 .Y (1. y) over the region y ≤ x/2. From the statement of Problem 4.Y (2. 3) = 6/28 PW (1) = PX.6. (b) The expected value of W is E [W ] = w PW (−1) = PX. we simply add the probabilities associated with each possible value of W : PW (−2) = PX.Y (x. Y along with the corresponding values of W = X − Y .Y (1. y) • • W =−1 6/28 W =1 12/28 3 • 2 W =0 1/28 W =1 2/28 W =3 4/28 1 • 1 • 2 • 4 0 0 3 E x (a) To find the PMF of W .Y (4.Y (2.6.Y (4.1. 1) + PX. 1) = 4/28 For all other values of w.1 Solution In this problem. 1) = 1/28 PW (3) = PX. 3) = 14/28 (1) (2) (3) wPW (w) (4) (5) = −2(3/28) + −1(6/28) + 0(1/28) + 1(14/28) + 3(4/28) = 1/2 (c) P [W > 0] = PW (1) + PW (3) = 18/28. 6.2 y=−1.Y (−2. 14 14 (8) (9) (c) Lastly. 0) = 3c PW (4) = PX. Y point with the corresponding value of W = X + 2Y .2 Solution y T c W =0 2c W =−2 ' PX.Y (−2. 1) = 3c (b) The expected value is now straightforward: E [W ] = With c = 1/14. That is. 1) + PX. −1) = 2c PW (2) = PX. y) • • 2 3c W =4 2c W =2 E c W =0 • • • 1 • • c W =2 1 c W =−2 x 3c W =−4 • In Problem 4. it is convenient to label each possible X. P [W > 0] = PW (2) + PW (4) = 3/7. y) is given in terms of the parameter c.Y (2. we sum the PMF over all possible values of X and Y . 1) + PX.Y (0. we first need to find c.Y (x. PW (−4) = PX.1 c |x + y| (1) (2) = 6c + 2c + 6c = 14c Thus c = 1/14.6.0.Y (0. −1) = 3c (3) (4) (5) (6) (7) PW (−2) = PX.Y (−2.Y (x. y) = x y x=−2. (a) From the above graph. c To find c. 4 2/14 w = 0 PW (w) =  0 otherwise 2 3 (−4 + −2 + 2 + 4) + 0 = 0.0. PX.Y (2. PW (w) = ∞ x=−∞ P [X = x.Y (2. −2. we can calculate the probability of each possible value of w. we must have Y = w − x in order for W = w. 2.Problem 4. we can summarize the PMF as   3/14 w = −4. Problem 4. We choose c so the sum equals one. 
−1) = 3c PW (0) = PX.Y (x. w − x) (1) 143 .2. Y = w − x] = ∞ x=−∞ PX. 0) + PX. Before doing so. the joint PMF PX. Now we can solve the actual problem.2.3 Solution We observe that when X = x.Y (x. For this problem. 01(10 − w) 2 To find the PMF of W . 10.4 Solution Y W>w The x. 1. . . 10 0 otherwise (7) = 0. . PV (v) = P [V < v + 1] − P [V < v] = 0. we observe that w P [W > w] = P [min(X. . For v = 1. . . Y > w] X w (1) (2) (3) = 0.01[(10 − w − 1)2 − (10 − w)2 ] = 0. 11. . . .01(v − 1) 2 v To find the PMF of V . . For w = 0. . we observe that for v = 1. .01(21 − 2w) Problem 4. . 2.6.01[v − (v − 1) ] 2 2 (4) (5) (6) 144 .01(2v − 1) v = 1.6. .01(21 − 2w) w = 1.Problem 4. . 10. . . y pairs with nonzero probability are shown in the figure. y pairs with nonzero probability are shown in the figure. Y ) > w] = P [X > w. Y < v] X (1) (2) (3) = 0. . Y ) < v] = P [X < v. we observe that v V<v P [V < v] = P [max(X. 2.5 Solution Y The x. . . 10 0 otherwise (6) (4) (5) = 0. 10. . PW (w) = P [W > w − 1] − P [W > w] The complete expression for the PMF of W is PW (w) = 0. . we observe that for w = 1. . .01(2v − 1) The complete expression for the PMF of V is PV (v) = 0. which occurs when X = 1 or Y = 1. fW (w) = (7) Problem 4. the CDF of W is Y 1 w W<w FW (w) = P [max(X. Y ≤ w] = 0 w w 0 (1) (2) (3) X fX. The maximum value of W is W = 1.Y (x. the CDF of W is Y 1 Y=X+w ½ X -w 1 FW (w) = P [Y − X ≤ w] = = 1 x+w (1) (2) (3) (4) 6y dy dx −w 1 −w 0 3(x + w)2 dx 1 −w = (x + w)3 145 = (1 + w)3 .6. which occurs when X = 0 and Y = 0. (b) For 0 ≤ w ≤ 1. Hence the range of W is SW = {w| − 1 ≤ w ≤ 0}.Problem 4. we observe that W = Y − X ≤ 0 since Y ≤ X. FW (w) = 1. y) = x + y yields FW (w) = 0 w 0 w 0 w w 1 (x + y) dy dx y2 2 y=w (4) dx = w 0 = xy + (wx + w2 /2) dx = w3 (5) y=0 The complete expression for the CDF is FW  w<0  0 w3 0 ≤ w ≤ 1 (w) =  1 otherwise dFW (w) = dw 3w2 0 ≤ w ≤ 1 0 otherwise (6) The PDF of W is found by differentiating the CDF. For w > 0. 
the most negative value of W occurs when Y = 0 and X = 1 and W = −1. For −1 ≤ w ≤ 0. y) is nonzero only for 0 ≤ y ≤ x ≤ 1.6 Solution (a) The minimum value of W is W = 0.7 Solution (a) Since the joint PDF fX. The range of W is SW = {w|0 ≤ w ≤ 1}. y) dy dx Substituting fX.Y (x.Y (x. Y ) ≤ w] = P [X ≤ w. (b) For w < −1. FW (w) = 0. In addition.6. y) = X 2 0≤y≤x≤1 0 otherwise (1) 1 (a) Since X and Y are both nonnegative.Y (x. Thus E[W ] = 1/2. the CDF of W is Y 1 w FW (w) = P [Y /X ≤ w] = P [Y ≤ wX] = w (2) The complete expression for the CDF is P[Y<wX]   0 w<0 X w 0≤w<1 FW (w) = 1  1 w≥1 By taking the derivative of the CDF. Problem 4.8 Solution Y Random variables X and Y have joint PDF 1 fX.9 Solution Y Random variables X and Y have joint PDF 1 fX. y) = X 2 0≤y≤x≤1 0 otherwise (1) 1 146 . 1]. the complete CDF of W is  w < −1  0 3 −1 ≤ w ≤ 0 (1 + w) FW (w) =  1 w>0 (5) Problem 4. Since Y ≤ X.Y (x. we find that the PDF of W is fW (w) = 1 0≤w<1 0 otherwise (3) (4) We see that W has a uniform PDF over [0. Thus the range of W is SW = {w|0 ≤ w ≤ 1}. W = Y /X ≤ 1. (b) For 0 ≤ w ≤ 1.By taking the derivative of fW (w) with respect to w.6. we obtain the PDF fW (w) = 3(w + 1)2 −1 ≤ w ≤ 0 0 otherwise (6) Therefore.6. Note that W = 0 can occur if Y = 0. W = Y /X ≥ 0. W can be arbitrarily large. Problem 4. fW (w) = dFW (w) = dw ∞ −∞ 0 w<1 1 − 1/w w ≥ 1 1/w2 w ≥ 1 0 otherwise ∞ 1 (6) (7) To find the expected value E[W ]. the integral diverges and E[W ] is undefined. 1 1/w X P[Y<X/w] FW (w) = P [X/Y ≤ w] (2) (3) (4) = 1 − P [X/Y > w] = 1 − P [Y < X/w] = 1 − 1/w (5) Note that we have used the fact that P [Y < X/w] equals 1/2 times the area of the corresponding triangle.10 Solution The position of the mobile phone is equally likely to be anywhere in the area of a circle with radius 16 km. 
FR (r) = 0 2π r 0 FR (r) = P [R ≤ r] = P X 2 + Y 2 ≤ r2 r dr dθ = r2 /16 16π (2) (3) So Then by taking the derivative with respect to r we arrive at the PDF fR (r) = r/8 0 ≤ r ≤ 4 0 otherwise 147 (5)  r<0  0 r2 /16 0 ≤ r < 4 FR (r) =  1 r≥4 (4) . y) = 0 for y > x. we will measure X and Y in kilometers. the CDF of R is fX. the CDF of W is Y (a) Since fX. Let X and Y denote the position of the mobile.Y (x. we write E [W ] = wfW (w) dw = dw . we can conclude that Y ≤ X and that W = X/Y ≥ 1. Assuming the base station is at the origin of the X.(b) For w ≥ 1. Since Y can be arbitrarily small but positive.6. y) = 1 16π (1) By changing to polar coordinates. Since we are given that the cell has a radius of 4 km.Y (x. we see that for 0 ≤ r ≤ 4 km. The complete CDF is 1 FW (w) = The PDF of W is found by differentiating the CDF. the joint PDF of X and Y is x2 + y 2 ≤ 16 0 otherwise √ Since the mobile’s radial distance from the base station is R = X 2 + Y 2 . Hence the range of W is SW = {w|w ≥ 1}. Y plane. w (8) However. we observe that either Y ≥ X or X ≥ Y . each of the lighter triangles that are not part of the region of interest has area a/w a2 /2w. y) 3/28 • •6/28 •2/28 2 3 •12/28 •4/28 4 In Problem 4. X 1 a2 /2w + a2 /2w =1− 2 a w This implies P [X/w ≤ Y ≤ wX] = 1 − The final expression for the CDF of W is FW (w) = By taking the derivative. we obtain the PDF fW (w) = 0 w<1 1/w2 w ≥ 1 (3) 0 w<1 1 − 1/w w ≥ 1 (2) (1) a Problem 4.11 Solution = P [(X/Y ) ≤ w.3 y PX. Y plane. Also the expected values and variances were E [X] = 3 E [Y ] = 5/2 Var[X] = 10/7 Var[Y ] = 3/4 (1) (2) • 1 E x We use these results now to solve this problem. we solve FW (w) = P [max[(X/Y ). Because the PDF is uniform over the square. W ≥ 1. it Y=X/w is easier to use geometry to calculate the probability. we know that FW (w) = 0 for w < 1.Following the hint. equivalently. we found the joint PMF PX. y) x (3) (4) = 1 1 3 3 1 2 3 6 1 4 3 12 + + + + + = 15/14 1 28 1 28 2 28 2 28 4 28 4 28 148 .Y (x.2. 
(Y /X)] ≤ w] = P [Y ≥ X/w. We can depict the given set {X/w ≤ Y ≤ wX} as the dark region on the X.7. (a) Random variable W = Y /X has expected value E [Y /X] = x=1.6. or.1 Solution y 4 3 2 1 0 0 1/28 T PX. To find the CDF FW (w). nonnegativity of X and Y was essential. Y ≤ wX] = P [X/w ≤ Y ≤ wX] Y a a/w Problem 4.Y (x. For w ≥ 1.Y (x. y) as shown.4 y=1.1.2. (Y /X) ≥ 1 or (X/Y ) ≥ 1. In particular. (Y /X) ≤ w] Y=wX We note that in the middle of the above steps. Hence. y) T ' •1/14 •2/14 •3/14 1 •1/14 1 •3/14 2 In Problem 4.Y (x. The expected values and variances were found to be E [X] = 0 E [Y ] = 0 Var[X] = 24/7 Var[Y ] = 5/7 (1) (2) •2/14E x •1/14 •1/14 c We need these results to solve this problem. 149 . Y ] Var[X] Var[Y ] = 0. Y ] = E [XY ] − E [X] E [Y ] = 5 15 −3 =0 2 2 (11) (d) Since X and Y have zero covariance.Y (x.2. we found the joint PMF PX.Y = x=1. the correlation coefficient is ρX.7.2.4 y=1.Y = E [XY ] = x=1. y) (5) (6) (7) 1 · 1 · 1 1 · 3 · 3 2 · 1 · 2 2 · 3 · 6 4 · 1 · 4 4 · 3 · 12 + + + + + 28 28 28 28 28 28 = 210/28 = 15/2 = Recognizing that PX. 28 (13) Problem 4.1.3 xyPX.2.3 (xy)2 28 y2 y=1.Y (x.Y (x.2.3 (8) (9) (10) = 1 28 x2 x=1. (12) (e) Since X and Y are uncorrelated.4 y=1. the variance of X + Y is Var[X + Y ] = Var[X] + Var[Y ] = 61 .Y = Cov [X.4 1 15 210 = (1 + 22 + 42 )(12 + 32 ) = = 28 28 2 (c) The covariance of X and Y is Cov [X. y) shown here.2 Solution y PX. y) = xy/28 yields the faster calculation rX.(b) The correlation of X and Y is rX. 1 2xy PX. the joint PMF and the marginal PMFs are PH.2 h=0 0.4 0. b) b = 0 b = 2 b = 4 PH (h) h = −1 0 0.1 xyPX.3 From the joint PMF.2 0.1 0 0.16.2 y=−1.1 0 0.2 h=1 PB (b) 0.3 Solution In the solution to Quiz 4.4 hbPH.1) + −1(4)(0. y) (3) (4) (5) (6) 3 2 1 1 1 + 2−2(0) + 2−2(1) + 20(−1) + 20(1) 14 14 14 14 14 2(−1) 1 2(0) 2 2(1) 3 +2 +2 +2 14 14 14 = 61/28 = 2−2(−1) (b) The correlation of X and Y is rX.2 y=−1.4 150 .5 0.2 0. b) (2) (3) (4) = −1(2)(0. 
Y ] (10) (11) (12) (13) Problem 4.Y (x.(a) Random variable W = 2XY has expected value E 2XY = x=−2. the correlation coefficient is 1 (1) rH.B (h.7. 7 7 7 7 2 =√ 30 Var[X] Var[Y ] Cov [X.6 0.0. Y ] 24 5 4 37 = + +2 = .1 0.0.3. y) (7) (8) (9) −2(−1)(3) −2(0)(2) −2(1)(1) 2(−1)(1) 2(0)(2) 2(1)(3) + + + + + 14 14 14 14 14 14 = 4/7 = (c) The covariance of X and Y is Cov [X.B (h.4) + 1(2)(0.2) + 1(4)(0) = −1.Y (x.0.Y = x=−2.0. Y ] = E [XY ] − E [X] E [Y ] = 4/7 (d) The correlation coefficient is ρX.1 0.Y = (e) By Theorem 4.B = E [HB] = h=−1 b=0. Var [X + Y ] = Var [X] + Var[Y ] + 2 Cov [X.2. 2 bPB (b) = 0(0. the expected values of X and Y are 1 Using the marginal E [H] = h=−1 hPH (h) = −1(0.5) + 4(0. B] = E [HB] − E [H] E [B] = −1. we first find the second moments. found in Example 4.2) = −0. 2.4 − (−0.6) + 0(0.4 The covariance is Cov [H.96 (7) Problem 4.2 (5) (6) E [B] = b=0. 3.   25/48 y = 1     13/48 y = 2 1/4 x = 1.7. 4 E Y 2 = y=1 4 y 2 PY (y) = 12 x2 PX (x) = x=1 25 13 7 3 + 22 + 32 + 42 = 47/12 48 48 48 48 (4) (5) E X 2 = 1 2 1 + 22 + 32 + 42 = 15/2 4 Now the variances are Var[Y ] = E Y 2 − (E [Y ])2 = 47/12 − (7/4)2 = 41/48 2 (6) (7) Var[X] = E X − (E [X]) = 15/2 − (5/2) = 5/4 2 2 151 .2) + 1(0. PMFs.2) + 2(0.3) = 2.2) = −0.13.2)(2. 4 7/48 y = 3 PY (y) = (1) PX (x) = 0 otherwise   3/48 y = 4    0 otherwise (a) The expected values are 4 E [Y ] = y=1 4 yPY (y) = 1 xPX (x) = x=1 13 7 3 25 + 2 + 3 + 4 = 7/4 48 48 48 48 (2) (3) E [X] = 1 (1 + 2 + 3 + 4) = 5/2 4 (b) To find the variances.4 Solution From the joint PMF.since only terms in which both h and b are nonzero make a contribution. we can find the marginal PMF for X or Y by summing over the columns or rows of the joint PMF. PX (x)Y .2. V ] Var[W ] Var[V ] = 10/16 (41/48)(5/4) ≈ 0. .605 (11) (12) Problem 4. . Y ] = E [XY ] − E [X] E [Y ] = 5 − (7/4)(10/4) = 10/16 (e) The correlation coefficient is ρX.Y = E [XY ] = x=1 y=1 xyPX. for integers 0 ≤ y ≤ 5. Specifically. 
5 0 otherwise (3) (4) E [X] = x=0 x 70 10 x+1 = = 21 21 3 5 E [Y ] = y=0 y 35 5 6−y = = 21 21 3 (5) To find the covariance.7. the marginal PMF of X is x PX (x) = y PX.Y (x. we first find the correlation 5 x E [XY ] = x=0 y=0 1 xy = 21 21 5 x x x=1 y=1 y= 1 42 5 x2 (x + 1) = x=1 20 280 = 42 3 (6) The covariance of X and Y is Cov [X. . Y ] = E [XY ] − E [X] E [Y ] = 152 10 20 50 − = 3 9 9 (7) . we evaluate the product XY over all values of X and Y . . .Y = Cov [W. 5 0 otherwise (6 − y)/21 y = 0. y) (8) (9) (10) 1 2 3 4 4 6 8 9 12 16 + + + + + + + + + 4 8 12 16 8 12 16 12 16 16 =5 = (d) The covariance of X and Y is Cov [X. the marginal PMF of Y is 5 PY (y) = x PX. .Y (x.(c) To find the correlation. . . y) = y=0 (1/21) = x+1 21 (1) Similarly.5 Solution For integers 0 ≤ x ≤ 5. 4 x rX.Y (x. y) = x=y (1/21) = 6−y 21 (2) The complete expressions for the marginal PMFs are PX (x) = PY (y) = The expected values are 5 (x + 1)/21 x = 0. 7. This implies min(X. Here we observe that W = Y and V = X. 2. The covariance of X and Y is Cov [X.V = ρX. . ρX.7. This is a result of the underlying experiment in that given X = x. each Y ∈ {1. y. Y ) for each possible pair x. Y ) and V = max(X.V = E [W V ] = E [XY ] = rX. Y ) = Y and max(X. 1 • 1/8 W =1 V =2 0 0 1 2 3 E 4 x Using the results from Problem 4.7 Solution (1) (2) (3) The correlation coefficient is ρX. . ρX. 153 . we identify the values of W = min(X. Y ] = 10/16 (e) The correlation coefficient is ρW. Y ) = X.Y = 10/16 (41/48)(5/4) ≈ 0. (a) The expected values are E [W ] = E [Y ] = 7/4 (b) The variances are Var[W ] = Var[Y ] = 41/48 (c) The correlation is rW. we have the following answers.Y = Cov [X.Y = 5 (d) The covariance of W and V is Cov [W.605 (5) (4) (3) Var[V ] = Var[X] = 5/4 (2) E [V ] = E [X] = 5/2 (1) First. Y ] = E [(X − µX )(aX + b − aµX − b)] = aE (X − µX ) = a Var[X] 2 Problem 4.4.Y (x. . Y ] Var[X] Var[Y ] = a Var[X] Var[X] a2 Var[X] = a |a| (4) When a > 0.Y = 1. Hence Y ≤ X. 
y) 1/12 W =3 V =3 1/8 W =2 V =2 1/4 W =1 V =1 1/16 W =4 V =4 • • • • 3 • • • 1/16 W =3 V =4 1/16 W =2 V =4 1/16 W =1 V =4 2 • • 1/12 W =2 V =3 1/12 W =1 V =3 To solve this problem.6 Solution y 4 T PX. When a < 0. .Y = 1.Problem 4. we observe that Y has mean µY = aµX + b and variance Var[Y ] = a2 Var[X]. V ] = Cov [X.7. x} is equally likely. Y (x. y) = (x + y)/3 0 ≤ x ≤ 1. Y ] = E[XY ] − E[X]E[Y ] = −1/81. fY (y) = ∞ −∞ fX. ∞ −∞ ∞ −∞ 2 0 yfY (y) dy = y2 y3 2y + 1 dy = + y 6 12 9 2 2y 2 = 0 2 11 9 16 9 (7) The second moment of Y is E Y 2 = y fY (y) dy = 2 0 2 y y3 y4 +1 dy = + 6 18 12 = 0 (8) (c) The correlation of X and Y is The variance of Y is Var[Y ] = E[Y 2 ] − (E[Y ])2 = 23/81. y) dx = 0 1 x y + 3 3 x2 xy + dx = 6 3 2y+1 6 x=1 = x=0 2y + 1 6 (3) Complete expressions for the marginal PDFs are fX (x) = 0 2x+2 3 0≤x≤1 otherwise fY (y) = 0 0≤y≤2 otherwise 1 (4) (a) The expected value of X is E [X] = ∞ −∞ ∞ −∞ xfX (x) dx = 0 1 2x3 x2 2x + 2 dx = + x 3 9 3 2 2x = 0 1 5 9 7 18 (5) The second moment of X is E X 2 = x fX (x) dx = 2 0 1 x x4 2x3 +2 dx = + 3 6 9 = 0 (6) (b) The expected value of Y is E [Y ] = The variance of X is Var[X] = E[X 2 ] − (E[X])2 = 7/18 − (5/9)2 = 13/162.8 Solution The joint PDF of X and Y is fX. we first find the marginal PDFs of X and Y . fX (x) = For 0 ≤ y ≤ 2. y) dy = 0 xy y 2 x+y dy = + 3 3 6 y=2 y=0 = 2x + 2 3 (2) ∞ −∞ fX. 154 . y) dx dy 2 (9) (10) (11) 1 xy x+y 3 dy dx y=2 = = 0 x2 y 2 xy 3 + 6 9 2x2 8x + 3 9 dx y=0 1 dx = 2x3 4x2 + 9 9 = 0 2 3 (12) The covariance is Cov[X.Problem 4. For 0 ≤ x ≤ 1.Y (x.7. E [XY ] = = 0 1 0 1 0 xyfX.Y (x.Y (x. 0 ≤ y ≤ 2 0 otherwise 2 (1) Before calculating moments. (d) The expected value of X and Y is E [X + Y ] = E [X] + E [Y ] = 5/9 + 11/9 = 16/9 (e) By Theorem 4. 
Var[X + Y ] = Var[X] + Var[Y ] + 2 Cov [X.15.9 Solution (a) The first moment of X is E [X] = 0 1 0 1 4x2 y dy dx = 0 1 2x2 dx = 2 3 (1) The second moment of X is E X2 = 0 1 0 1 4x3 y dy dx = 0 1 2x3 dx = 1 2 (2) The variance of X is Var[X] = E[X 2 ] − (E[X])2 = 1/2 − (2/3)2 = 1/18. (e) By Theorem 4. Y ] = 1/18 + 1/18 + 0 = 1/9 (7) 2 3 2 =0 (6) 155 .15. (b) The mean of Y is E [Y ] = 0 1 0 1 4xy 2 dy dx = 0 1 2 4x dx = 3 3 1 (3) The second moment of Y is E Y2 = 0 1 0 1 4xy 3 dy dx = 0 x dx = 1 2 (4) The variance of Y is Var[Y ] = E[Y 2 ] − (E[Y ])2 = 1/2 − (2/3)2 = 1/18.7. Y ] = 13 23 2 55 + − = 162 81 81 162 (14) (13) Problem 4. the variance of X + Y is Var[X] + Var[Y ] + 2 Cov [X. Y ] = E [XY ] − E [X] E [Y ] = − 9 (d) E[X + Y ] = E[X] + E[Y ] = 2/3 + 2/3 = 4/3. (c) To find the covariance. we first find the correlation E [XY ] = 0 1 0 1 1 0 4x2 y 2 dy dx = 4 4x2 dx = 3 9 (5) The covariance is thus 4 Cov [X. Y (x.7. Var[Y ] = 5/26 − (5/14)2 = . Thus.0576 = 0.10 Solution Y 1 The joint PDF of X and Y and the region of nonzero probability are 5x2 /2 −1 ≤ x ≤ 1. Y ] = 5/7 + 0.11 Solution Y Random variables X and Y have joint PDF 1 fX.7. y) = X 2 0≤y≤x≤1 0 otherwise (1) 1 156 .15. 0 ≤ y ≤ x2 0 otherwise fX. the variance of X and the second moment are both Var[X] = E X 2 = 1 −1 0 x2 x2 5x7 5x2 dy dx = 2 14 1 = −1 10 14 (3) (b) The first and second moments of Y are E [Y ] = E Y2 = 1 −1 1 −1 0 x2 y 5 5x2 dy dx = 2 14 5 5x2 dy dx = 2 26 (4) (5) x2 y 2 0 Therefore.Problem 4. Cov[X.7719 (8) 5 14 (7) Problem 4. (c) Since E[X] = 0. Y ] = E [XY ] = 1 1 0 x2 xy 5x2 dy dx = 2 1 −1 5x7 dx = 0 4 (6) (d) The expected value of the sum X + Y is E [X + Y ] = E [X] + E [Y ] = (e) By Theorem 4. Cov [X. the variance of X + Y is Var[X + Y ] = Var[X] + Var[Y ] + 2 Cov [X.0576. Y ] = E[XY ] − E[X]E[Y ] = E[XY ]. y) = X -1 1 (1) (a) The first moment of X is E [X] = 1 −1 0 x2 5x2 dy dx = x 2 1 −1 5x6 5x5 dx = 2 12 1 =0 −1 (2) Since E[X] = 0.Y (x. fY (y) = ∞ −∞ fX.Y (x. 
y) dy = 0 x 2 dy = 2x (2) Note that fX (x) = 0 for x < 0 or x > 1.15. Y ] = 1/6. it is helpful to first find the marginal PDFs. we find the correlation E [XY ] = 0 1 0 x 1 0 2xy dy dx = x3 dx = 1/4 (9) The covariance is Cov [X. fX (x) = ∞ −∞ fX. y) dx = y 1 2 dx = 2(1 − y) (3) Also. Y ] = E [XY ] − E [X] E [Y ] = 1/36. For 0 ≤ y ≤ 1. (b) The expected value and second moment of Y are E [Y ] = E Y2 = ∞ −∞ ∞ −∞ yfY (y) dy = 0 1 2y(1 − y) dy = y 2 − 1 2y 3 3 1 = 1/3 0 1 y4 0 (7) (8) y 2 fY (y) dy = 0 2y 2 (1 − y) dy = 2y 3 − 3 2 = 1/6 The variance of Y is Var[Y ] = E[Y 2 ] − (E[Y ])2 = 1/6 − 1/9 = 1/18. for y < 0 or y > 1.Before finding moments. (c) Before finding the covariance. (d) E[X + Y ] = E[X] + E[Y ] = 2/3 + 1/3 = 1 (e) By Theorem 4.Y (x. Complete expressions for the marginal PDFs are fX (x) = 2x 0 ≤ x ≤ 1 f (y) 0 otherwise Y = 2(1 − y) 0 ≤ y ≤ 1 0 otherwise (4) (a) The first two moments of X are E [X] = E X2 = ∞ −∞ ∞ −∞ xfX (x) dx = 0 1 2x2 dx = 2/3 1 (5) (6) x2 fX (x) dx = 0 2x3 dx = 1/2 The variance of X is Var[X] = E[X 2 ] − (E[X])2 = 1/2 − 4/9 = 1/18. For 0 ≤ x ≤ 1. fY (y) = 0. Var[X + Y ] = Var[X] + Var[Y ] + 2 Cov [X. (11) (10) 157 . . E [XY ] = E eX+Y = = = 1 −1 1 −1 x 1 x 1 1 1 xy dy dx = 2 4 1 x y e e dy dx 2 1 −1 x(1 − x2 ) dx = x2 x4 − 8 16 1 =0 −1 (2) (3) (4) 1 2 1 1+x 1 2x e − e 2 4 −1 ex (e1 − ex ) dx 1 = −1 e2 e−2 1 + − 4 4 2 (5) Problem 4..7. calculating the marginal PMF of K is not easy. For n = 1. . y) is shown with the joint PDF. . 2. we note that E [N ] = 1 p Var[N ] = 1−p p2 2−p p2 (2) We can use these facts to find the second moment of N .13 Solution For this problem. the marginal PMF of N is easy to find. E [K] = Since n k=1 k ∞ n (3) k n=1 k=1 (1 − p)n−1 p = n ∞ n=1 (1 − p)n−1 p n n k k=1 (4) = n(n + 1)/2. E N 2 = Var[N ] + (E [N ])2 = Now we can calculate the moments of K. N has a geometric PMF.7. The rest of this problem is just calculus. From Appendix A. 
E [N + K] = E [N ] + E [K] = 158 1 3 + 2p 2 (6) .12 Solution Y Random variables X and Y have joint PDF fX. E [K] = ∞ n=1 1 1 n+1 N +1 (1 − p)n−1 p = E = + 2 2 2p 2 (5) We now can calculate the sum of the moments. y) = X 1/2 −1 ≤ x ≤ y ≤ 1 0 otherwise (1) 1 -1 The region of possible pairs (x.Y (x. However.Problem 4. n PN (n) = k=1 (1 − p)n−1 p = (1 − p)n−1 p n (1) That is. 04 x = 6. 20 0 otherwise 159 .19. we can calculate the variance of K. the covariance is Cov [N. K] = E [N K] − E [N ] E [K] = 1 1 − 2 2p 2p (13) Problem 4.8. Var[K] = E K 2 − (E [K])2 = To find the correlation of N and K. y) ∈ A otherwise (2) (3) 0. . 10.Y (x. . PX. we find that E K2 = E N2 E [N ] 1 2 1 1 + + = 2+ + 3 2 6 3p 6p 6 (9) Thus. . y) = = PX.1 Solution The event A occurs iff X > 5 and Y > 5 and has probability 10 10 P [A] = P [X > 5. Y > 5] = x=6 y=6 0. . we obtain (n + 1)(2n + 1) (N + 1)(2N + 1) (1 − p)n−1 p = E 6 6 (8) E K2 = ∞ n=1 Applying the values of E[N ] and E[N 2 ] found above. . E [N K] = ∞ n=1 1 n(n + 1) N (N + 1) (1 − p)n−1 p = E = 2 2 2 p (12) Finally.The second moment of K is E K2 = Using the identity n 2 k=1 k ∞ n k2 n=1 k=1 (1 − p)n−1 p = n ∞ n=1 (1 − p)n−1 p n n k2 k=1 (7) = n(n + 1)(2n + 1)/6. .01 = 0.Y |A (x. .25 (1) From Theorem 4. E [N K] = Since n k=1 k ∞ n 5 5 1 + − 12p2 3p 12 (10) n=1 k=1 (1 − p)n−1 p = nk n ∞ n n=1 (1 − p) n−1 p k=1 k (11) = n(n + 1)/2.y) P [A] 0 (x. y = 6. . Y |A (x.3 Solution Given the event A = {X + Y ≤ 1}.19.2 Solution The event B occurs iff X ≤ 5 and Y ≤ 5 and has probability 5 5 P [B] = P [X ≤ 5. y) ∈ A otherwise (2) (3) 0.01 = 0.K|B (n. 2. .8. n PN |B (n) = PN. . . Y ≤ 5] = From Theorem 4.y) P [B] 0. y). PX. 5 0 otherwise Problem 4. . k) = k=1 (1 − p)n−10 p n = 10. .8.K (n. . the event B has probability P [B] = From Theorem 4. . . we wish to find fX. k = 1. . . . the marginal PMF of N satisfies n n PN (n) = k=1 PN. .4 Solution First we observe that for n = 1. k) = (1 − p)n−1 p k=1 1 = (1 − p)n−1 p n (1) Thus. . .8.19. k) = = PN.K (n. 
.k) P [B] ∞ n=10 PN (n) = (1 − p)9 p[1 + (1 − p) + (1 − p)2 + · · · ] = (1 − p)9 (2) 0 n. . . 0 otherwise (5) 160 . n 0 otherwise The conditional PMF PN |B (n|b) could be found directly from PN (n) using Theorem 2.Y |B (x.04 x = 1. y ≥ 0 otherwise (1) So then fX. y) = 0 6e−(2x+3y) 1−3e−2 +2e−3 (2) Problem 4.Y |A (x.. First we find P [A] = 0 1 0 1−x 6e−(2x+3y) dy dx = 1 − 3e−2 + 2e−3 x + y ≤ 1. . we can also find it just by summing the conditional joint PMF. .25 x=1 y=1 (1) 0 (x. 11. x ≥ 0. . 11. . y) = = PX.K|B (n. PN. k ∈ B otherwise (3) (4) (1 − p)n−10 p/n n = 10. y = 1. However. 5. .Problem 4.17.Y (x. for n = 1. however. N = N + 9 and we can calculate the conditional expectations E [N |B] = E N + 9|B = E N |B + 9 = 1/p + 9 (7) (8) Var[N |B] = Var[N + 9|B] = Var[N |B] = (1 − p)/p2 Note that further along in the problem we will need E[N 2 |B] which we now calculate. E [N + K|B] = E [N |B] + E [K|B] = 1/p + 9 + 1/(2p) + 5 = The conditional second moment of K is E K |B = 2 ∞ n 3 + 14 2p (13) k 2 (1 n=10 k=1 − p)n−10 p = n ∞ n=10 (1 − p)n−10 p n n k2 k=1 (14) Using the identity 2 n 2 k=1 k = n(n + 1)(2n + 1)/6. PN |B (n) = P [N = n + 9|B] = PN |B (n + 9) = (1 − p)n−1 p (6) Hence. E [K|B] = ∞ n=1 1 n+1 1 (1 − p)n−1 p = E [N + 1|B] = +5 2 2 2p (12) We now can calculate the conditional expectation of the sum.K|B (n. we can calculate directly the conditional moments of N given B.. E N 2 |B = Var[N |B] + (E [N |B])2 2 17 + 81 = 2+ p p For the conditional moments of K. we obtain (n + 1)(2n + 1) 1 (1 − p)n−10 p = E [(N + 1)(2N + 1)|B] 6 6 (15) E K |B = ∞ n=10 Applying the values of E[N |B] and E[N 2 |B] found above.From the conditional PMF PN |B (n). we find that E K 2 |B = E N 2 |B E [N |B] 1 2 2 37 + + = 2+ + 31 3 2 6 3p 6p 3 (16) Thus. N = N − 9 has a geometric PMF with mean 1/p. . given B. we observe that given B. That is. we work directly with the conditional PMF PN. Var[K|B] = E K 2 |B − (E [K|B])2 = 161 2 5 7 +6 − 2 12p 6p 3 (17) . Instead. 
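The conditional variance of $K$ above relies on the conditional mean of $K$, which follows from
\[
E[K|B] = \sum_{n=10}^{\infty}\sum_{k=1}^{n} k\,\frac{(1-p)^{n-10}p}{n}
       = \sum_{n=10}^{\infty}\frac{(1-p)^{n-10}p}{n}\sum_{k=1}^{n} k \tag{11}
\]
and the identity $\sum_{k=1}^{n} k = n(n+1)/2$.

To find the conditional correlation of $N$ and $K$,
\[
E[NK|B] = \sum_{n=10}^{\infty}\sum_{k=1}^{n} nk\,\frac{(1-p)^{n-10}p}{n}
        = \sum_{n=10}^{\infty}(1-p)^{n-10}p\sum_{k=1}^{n} k. \tag{18}
\]
Since $\sum_{k=1}^{n} k = n(n+1)/2$,
\[
E[NK|B] = \sum_{n=10}^{\infty}\frac{n(n+1)}{2}(1-p)^{n-10}p = \frac{1}{2}E[N(N+1)|B] = \frac{1}{p^2}+\frac{9}{p}+45. \tag{19}
\]

Problem 4.8.5 Solution

The joint PDF of $X$ and $Y$ is
\[
f_{X,Y}(x,y) = \begin{cases} (x+y)/3 & 0 \le x \le 1,\ 0 \le y \le 2 \\ 0 & \text{otherwise} \end{cases} \tag{1}
\]

(a) The probability that $Y \le 1$ is

[Figure: sketch of the region of nonzero probability for the event $Y \le 1$.]
\begin{align}
P[A] = P[Y \le 1] &= \iint_{y \le 1} f_{X,Y}(x,y)\,dx\,dy \tag{2} \\
&= \int_0^1\!\int_0^1 \frac{x+y}{3}\,dy\,dx \tag{3} \\
&= \int_0^1 \left.\left(\frac{xy}{3}+\frac{y^2}{6}\right)\right|_{y=0}^{y=1} dx \tag{4} \\
&= \int_0^1 \frac{2x+1}{6}\,dx = \left.\frac{x^2}{6}+\frac{x}{6}\right|_0^1 = \frac{1}{3}. \tag{5}
\end{align}

(b) By Definition 4.10, the conditional joint PDF of $X$ and $Y$ given $A$ is
\[
f_{X,Y|A}(x,y) = \begin{cases} \dfrac{f_{X,Y}(x,y)}{P[A]} & (x,y) \in A \\ 0 & \text{otherwise} \end{cases}
= \begin{cases} x+y & 0 \le x \le 1,\ 0 \le y \le 1 \\ 0 & \text{otherwise} \end{cases} \tag{6}
\]
From $f_{X,Y|A}(x,y)$, we find the conditional marginal PDF $f_{X|A}(x)$. For $0 \le x \le 1$,
\[
f_{X|A}(x) = \int_{-\infty}^{\infty} f_{X,Y|A}(x,y)\,dy = \int_0^1 (x+y)\,dy = \left. xy+\frac{y^2}{2}\right|_{y=0}^{y=1} = x+\frac{1}{2}. \tag{7}
\]
The complete expression is
\[
f_{X|A}(x) = \begin{cases} x+1/2 & 0 \le x \le 1 \\ 0 & \text{otherwise} \end{cases} \tag{8}
\]
For $0 \le y \le 1$, the conditional marginal PDF of $Y$ is
\[
f_{Y|A}(y) = \int_{-\infty}^{\infty} f_{X,Y|A}(x,y)\,dx = \int_0^1 (x+y)\,dx = \left.\frac{x^2}{2}+xy\right|_{x=0}^{x=1} = y+\frac{1}{2}. \tag{9}
\]
The complete expression is
\[
f_{Y|A}(y) = \begin{cases} y+1/2 & 0 \le y \le 1 \\ 0 & \text{otherwise} \end{cases} \tag{10}
\]

Problem 4.8.6 Solution

Random variables $X$ and $Y$ have joint PDF
\[
f_{X,Y}(x,y) = \begin{cases} (4x+2y)/3 & 0 \le x \le 1,\ 0 \le y \le 1 \\ 0 & \text{otherwise} \end{cases} \tag{1}
\]

(a) The probability of event $A = \{Y \le 1/2\}$ is
\[
P[A] = \iint_{y \le 1/2} f_{X,Y}(x,y)\,dy\,dx = \int_0^1\!\int_0^{1/2} \frac{4x+2y}{3}\,dy\,dx. \tag{2}
\]
With some calculus,
\[
P[A] = \int_0^1 \left.\frac{4xy+y^2}{3}\right|_{y=0}^{y=1/2} dx = \int_0^1 \frac{2x+1/4}{3}\,dx = \left.\frac{x^2}{3}+\frac{x}{12}\right|_0^1 = \frac{5}{12}. \tag{3}
\]

(b) The conditional joint PDF of $X$ and $Y$ given $A$ is
\begin{align}
f_{X,Y|A}(x,y) &= \begin{cases} \dfrac{f_{X,Y}(x,y)}{P[A]} & (x,y) \in A \\ 0 & \text{otherwise} \end{cases} \tag{4} \\
&= \begin{cases} 8(2x+y)/5 & 0 \le x \le 1,\ 0 \le y \le 1/2 \\ 0 & \text{otherwise} \end{cases} \tag{5}
\end{align}
For $0 \le x \le 1$, the PDF of $X$ given $A$ is
\begin{align}
f_{X|A}(x) &= \int_{-\infty}^{\infty} f_{X,Y|A}(x,y)\,dy = \frac{8}{5}\int_0^{1/2} (2x+y)\,dy \tag{6} \\
&= \frac{8}{5}\left.\left(2xy+\frac{y^2}{2}\right)\right|_{y=0}^{y=1/2} = \frac{8x+1}{5}. \tag{7}
\end{align}
The complete expression is
\[
f_{X|A}(x) = \begin{cases} (8x+1)/5 & 0 \le x \le 1 \\ 0 & \text{otherwise} \end{cases} \tag{8}
\]
For $0 \le y \le 1/2$, the conditional marginal PDF of $Y$ given $A$ is
\begin{align}
f_{Y|A}(y) &= \int_{-\infty}^{\infty} f_{X,Y|A}(x,y)\,dx = \frac{8}{5}\int_0^1 (2x+y)\,dx \tag{9} \\
&= \left.\frac{8x^2+8xy}{5}\right|_{x=0}^{x=1} = \frac{8y+8}{5}. \tag{10}
\end{align}
The complete expression is
\[
f_{Y|A}(y) = \begin{cases} (8y+8)/5 & 0 \le y \le 1/2 \\ 0 & \text{otherwise} \end{cases} \tag{11}
\]

Problem 4.8.7 Solution

(a) The event $A = \{Y \le 1/4\}$ has probability

[Figure: the region $0 \le y \le x^2$, $-1 \le x \le 1$, with the part where $Y < 1/4$ marked.]
\begin{align}
P[A] &= 2\int_0^{1/2}\!\int_0^{x^2} \frac{5x^2}{2}\,dy\,dx + 2\int_{1/2}^{1}\!\int_0^{1/4} \frac{5x^2}{2}\,dy\,dx \tag{1} \\
&= \int_0^{1/2} 5x^4\,dx + \int_{1/2}^{1} \frac{5x^2}{4}\,dx \tag{2} \\
&= \left. x^5\right|_0^{1/2} + \left.\frac{5x^3}{12}\right|_{1/2}^{1} \tag{3} \\
&= \frac{19}{48}. \tag{4}
\end{align}
This implies
\begin{align}
f_{X,Y|A}(x,y) &= \begin{cases} f_{X,Y}(x,y)/P[A] & (x,y) \in A \\ 0 & \text{otherwise} \end{cases} \tag{5} \\
&= \begin{cases} 120x^2/19 & -1 \le x \le 1,\ 0 \le y \le x^2,\ y \le 1/4 \\ 0 & \text{otherwise} \end{cases} \tag{6}
\end{align}

(b) The conditional PDF of $Y$ given $A$ is
\[
f_{Y|A}(y) = \int_{-\infty}^{\infty} f_{X,Y|A}(x,y)\,dx = 2\int_{\sqrt{y}}^{1} \frac{120x^2}{19}\,dx
= \begin{cases} \frac{80}{19}\left(1-y^{3/2}\right) & 0 \le y \le 1/4 \\ 0 & \text{otherwise} \end{cases} \tag{7}
\]

(c) The conditional expectation of $Y$ given $A$ is
\[
E[Y|A] = \int_0^{1/4} y\,\frac{80}{19}\left(1-y^{3/2}\right)dy = \frac{80}{19}\left.\left(\frac{y^2}{2}-\frac{2y^{7/2}}{7}\right)\right|_0^{1/4} = \frac{65}{532}. \tag{8}
\]

(d) To find $f_{X|A}(x)$, we can write $f_{X|A}(x) = \int_{-\infty}^{\infty} f_{X,Y|A}(x,y)\,dy$. However, when we substitute $f_{X,Y|A}(x,y)$, the limits depend on the value of $x$. When $|x| \le 1/2$,
\[
f_{X|A}(x) = \int_0^{x^2} \frac{120x^2}{19}\,dy = \frac{120x^4}{19}. \tag{9}
\]
When $-1 \le x \le -1/2$ or $1/2 \le x \le 1$,
\[
f_{X|A}(x) = \int_0^{1/4} \frac{120x^2}{19}\,dy = \frac{30x^2}{19}. \tag{10}
\]
The complete expression for the conditional PDF of $X$ given $A$ is
\[
f_{X|A}(x) = \begin{cases} 30x^2/19 & -1 \le x \le -1/2 \\ 120x^4/19 & -1/2 \le x \le 1/2 \\ 30x^2/19 & 1/2 \le x \le 1 \\ 0 & \text{otherwise} \end{cases} \tag{11}
\]

(e) The conditional mean of $X$ given $A$ is
\[
E[X|A] = \int_{-1}^{-1/2} \frac{30x^3}{19}\,dx + \int_{-1/2}^{1/2} \frac{120x^5}{19}\,dx + \int_{1/2}^{1} \frac{30x^3}{19}\,dx = 0. \tag{12}
\]

Problem 4.9.1 Solution

The main part of this problem is just interpreting the problem statement. No calculations are necessary.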
Since a trip is equally likely to last 2, 3 or 4 days,
\[
P_D(d) = \begin{cases} 1/3 & d = 2, 3, 4 \\ 0 & \text{otherwise} \end{cases} \tag{1}
\]
Given a trip lasts $d$ days, the weight change is equally likely to be any value between $-d$ and $d$ pounds. Thus,
\[
P_{W|D}(w|d) = \begin{cases} 1/(2d+1) & w = -d, -d+1, \ldots, d \\ 0 & \text{otherwise} \end{cases} \tag{2}
\]
The joint PMF is simply
\begin{align}
P_{D,W}(d,w) &= P_{W|D}(w|d)\,P_D(d) \tag{3} \\
&= \begin{cases} 1/(6d+3) & d = 2, 3, 4;\ w = -d, \ldots, d \\ 0 & \text{otherwise} \end{cases} \tag{4}
\end{align}

Problem 4.9.2 Solution

We can make a table of the possible outcomes and the corresponding values of $W$ and $Y$:

    outcome   P[.]         W    Y
    hh        p^2          0    2
    ht        p(1-p)       1    1
    th        p(1-p)      -1    1
    tt        (1-p)^2      0    0          (1)

In the following table, we write the joint PMF $P_{W,Y}(w,y)$ along with the marginal PMFs $P_Y(y)$ and $P_W(w)$:

    P_{W,Y}(w,y)   w = -1     w = 0          w = 1      P_Y(y)
    y = 0          0          (1-p)^2        0          (1-p)^2
    y = 1          p(1-p)     0              p(1-p)     2p(1-p)
    y = 2          0          p^2            0          p^2
    P_W(w)         p(1-p)     1-2p+2p^2      p(1-p)                (2)

Using the definition $P_{W|Y}(w|y) = P_{W,Y}(w,y)/P_Y(y)$, we can find the conditional PMFs of $W$ given $Y$:
\[
P_{W|Y}(w|0) = \begin{cases} 1 & w = 0 \\ 0 & \text{otherwise} \end{cases}
\qquad
P_{W|Y}(w|1) = \begin{cases} 1/2 & w = -1, 1 \\ 0 & \text{otherwise} \end{cases} \tag{3}
\]
\[
P_{W|Y}(w|2) = \begin{cases} 1 & w = 0 \\ 0 & \text{otherwise} \end{cases} \tag{4}
\]
Similarly, the conditional PMFs of $Y$ given $W$ are
\[
P_{Y|W}(y|-1) = \begin{cases} 1 & y = 1 \\ 0 & \text{otherwise} \end{cases}
\qquad
P_{Y|W}(y|0) = \begin{cases} \dfrac{(1-p)^2}{1-2p+2p^2} & y = 0 \\[1ex] \dfrac{p^2}{1-2p+2p^2} & y = 2 \\[1ex] 0 & \text{otherwise} \end{cases} \tag{5}
\]
\[
P_{Y|W}(y|1) = \begin{cases} 1 & y = 1 \\ 0 & \text{otherwise} \end{cases} \tag{6}
\]

Problem 4.9.3 Solution

\[
f_{X,Y}(x,y) = \begin{cases} x+y & 0 \le x, y \le 1 \\ 0 & \text{otherwise} \end{cases} \tag{1}
\]

(a) The conditional PDF $f_{X|Y}(x|y)$ is defined for all $y$ such that $0 \le y \le 1$. Since conditioning on $Y$ divides by the marginal PDF of $Y$, for $0 \le y \le 1$,
\[
f_{X|Y}(x|y) = \frac{f_{X,Y}(x,y)}{f_Y(y)} = \frac{x+y}{\int_0^1 (x+y)\,dx}
= \begin{cases} \dfrac{x+y}{y+1/2} & 0 \le x \le 1 \\ 0 & \text{otherwise} \end{cases} \tag{2}
\]

(b) The conditional PDF $f_{Y|X}(y|x)$ is defined for all values of $x$ in the interval $[0, 1]$. For $0 \le x \le 1$,
\[
f_{Y|X}(y|x) = \frac{f_{X,Y}(x,y)}{f_X(x)} = \frac{x+y}{\int_0^1 (x+y)\,dy}
= \begin{cases} \dfrac{x+y}{x+1/2} & 0 \le y \le 1 \\ 0 & \text{otherwise} \end{cases} \tag{3}
\]

Problem 4.9.4 Solution

Random variables $X$ and $Y$ have joint PDF
\[
f_{X,Y}(x,y) = \begin{cases} 2 & 0 \le y \le x \le 1 \\ 0 & \text{otherwise} \end{cases} \tag{1}
\]
For $0 \le y \le 1$,
\[
f_Y(y) = \int_{-\infty}^{\infty} f_{X,Y}(x,y)\,dx = \int_y^1 2\,dx = 2(1-y). \tag{2}
\]
Also, for $y < 0$ or $y > 1$, $f_Y(y) = 0$.
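As a rough numerical sanity check of this marginal for Problem 4.9.4, the sketch below integrates the joint PDF over $x$ with a simple midpoint rule and compares the result with $2(1-y)$. The names `joint_pdf` and `marginal_y` and the step count are illustrative choices, not part of the original solution.

```python
def joint_pdf(x, y):
    """Joint PDF of Problem 4.9.4: equal to 2 on the triangle 0 <= y <= x <= 1."""
    return 2.0 if 0.0 <= y <= x <= 1.0 else 0.0


def marginal_y(y, steps=20000):
    """Midpoint-rule approximation of the integral of joint_pdf(x, y) over x in [0, 1]."""
    h = 1.0 / steps
    return sum(joint_pdf((i + 0.5) * h, y) for i in range(steps)) * h


# Compare the numerical marginal with the closed form 2(1 - y) at a few points.
for y in (0.1, 0.5, 0.9):
    print(y, marginal_y(y), 2 * (1 - y))
```

The midpoint rule is only approximate near the jump at $x = y$, so the two printed columns should agree to a few decimal places rather than exactly.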
The complete expression for the marginal PDF is
\[
f_Y(y) = \begin{cases} 2(1-y) & 0 \le y \le 1 \\ 0 & \text{otherwise} \end{cases} \tag{3}
\]
By Theorem 4.24, the conditional PDF of $X$ given $Y$ is
\[
f_{X|Y}(x|y) = \frac{f_{X,Y}(x,y)}{f_Y(y)} = \begin{cases} \dfrac{1}{1-y} & y \le x \le 1 \\ 0 & \text{otherwise} \end{cases} \tag{4}
\]
That is, since $Y \le X \le 1$, $X$ is uniform over $[y, 1]$ when $Y = y$. The conditional expectation of $X$ given $Y = y$ can be calculated as
\begin{align}
E[X|Y=y] &= \int_{-\infty}^{\infty} x f_{X|Y}(x|y)\,dx \tag{5} \\
&= \int_y^1 \frac{x}{1-y}\,dx = \left.\frac{x^2}{2(1-y)}\right|_y^1 = \frac{1+y}{2}. \tag{6}
\end{align}
In fact, since we know that the conditional PDF of $X$ is uniform over $[y, 1]$ when $Y = y$, it wasn't really necessary to perform the calculation.

Problem 4.9.5 Solution

Random variables $X$ and $Y$ have joint PDF
\[
f_{X,Y}(x,y) = \begin{cases} 2 & 0 \le y \le x \le 1 \\ 0 & \text{otherwise} \end{cases} \tag{1}
\]
For $0 \le x \le 1$, the marginal PDF for $X$ satisfies
\[
f_X(x) = \int_{-\infty}^{\infty} f_{X,Y}(x,y)\,dy = \int_0^x 2\,dy = 2x. \tag{2}
\]
Note that $f_X(x) = 0$ for $x < 0$ or $x > 1$. Hence the complete expression for the marginal PDF of $X$ is
\[
f_X(x) = \begin{cases} 2x & 0 \le x \le 1 \\ 0 & \text{otherwise} \end{cases} \tag{3}
\]
The conditional PDF of $Y$ given $X = x$ is
\[
f_{Y|X}(y|x) = \frac{f_{X,Y}(x,y)}{f_X(x)} = \begin{cases} 1/x & 0 \le y \le x \\ 0 & \text{otherwise} \end{cases} \tag{4}
\]
Given $X = x$, $Y$ has a uniform PDF over $[0, x]$ and thus has conditional expected value $E[Y|X=x] = x/2$. Another way to obtain this result is to calculate $\int_{-\infty}^{\infty} y f_{Y|X}(y|x)\,dy$.

Problem 4.9.6 Solution

We are told in the problem statement that if we know $r$, the number of feet a student sits from the blackboard, then we also know that the student's grade is a Gaussian random variable with mean $80 - r$ and standard deviation $r$. This is exactly
\[
f_{X|R}(x|r) = \frac{1}{\sqrt{2\pi r^2}}\,e^{-(x-[80-r])^2/2r^2}. \tag{1}
\]

Problem 4.9.7 Solution

(a) First we observe that $A$ takes on the values $S_A = \{-1, 1\}$ while $B$ takes on values from $S_B = \{0, 1\}$. To construct a table describing $P_{A,B}(a,b)$ we build a table for all possible values of pairs $(A, B)$. The general form of the entries is

    P_{A,B}(a,b)   b = 0                       b = 1
    a = -1         P_{B|A}(0|-1) P_A(-1)       P_{B|A}(1|-1) P_A(-1)
    a = 1          P_{B|A}(0|1) P_A(1)         P_{B|A}(1|1) P_A(1)         (1)

Now we fill in the entries using the conditional PMFs $P_{B|A}(b|a)$ and the marginal PMF $P_A(a)$.
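The fill-in step is just the multiplication rule $P_{A,B}(a,b) = P_{B|A}(b|a)\,P_A(a)$. As a rough cross-check, the sketch below builds the table with exact fractions. The numerical values of $P_A$ and $P_{B|A}$ are read off the factors shown in this solution's table (they come from the problem statement, which is not reproduced here), and the names `p_a`, `p_b_given_a` and `joint_pmf` are illustrative only.

```python
from fractions import Fraction as F

# Assumed inputs, read off the factors in this solution's table.
p_a = {-1: F(1, 3), 1: F(2, 3)}
p_b_given_a = {
    -1: {0: F(1, 3), 1: F(2, 3)},
    1: {0: F(1, 2), 1: F(1, 2)},
}


def joint_pmf(p_a, p_b_given_a):
    """Multiplication rule: P_{A,B}(a, b) = P_{B|A}(b|a) * P_A(a)."""
    return {(a, b): p_b_given_a[a][b] * p_a[a]
            for a in p_a for b in p_b_given_a[a]}


joint = joint_pmf(p_a, p_b_given_a)

# Marginal PMF of B: sum each column of the joint table.
p_b = {}
for (a, b), prob in joint.items():
    p_b[b] = p_b.get(b, F(0)) + prob

for (a, b), prob in sorted(joint.items()):
    print(f"P_A,B({a},{b}) = {prob}")
print("P_B =", p_b)
```

Working with `Fraction` rather than floats keeps every entry exact, which makes it easy to compare the printed values with the hand-computed table.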
This yields b=0 b=1 PA,B (a, b) b = 0 b = 1 PA,B (a, b) a = −1 (1/3)(1/3) (2/3)(1/3) which simplifies to a = −1 1/9 2/9 (1/2)(2/3) (1/2)(2/3) 1/3 1/3 a=1 a=1 (b) Since PA (1) = PA,B (1, 0) + PA,B (1, 1) = 2/3, PB|A (b|1) = PA,B (1, b) = PA (1) 1/2 b = 0, 1, 0 otherwise. (3) (2) If A = 1, the conditional expectation of B is 1 E [B|A = 1] = b=0 bPB|A (b|1) = PB|A (1|1) = 1/2. (4) (c) Before finding the conditional PMF PA|B (a|1), we first sum the columns of the joint PMF table to find 4/9 b = 0 (5) PB (b) = 5/9 b = 1 The conditional PMF of A given B = 1 is PA|B (a|1) = PA,B (a, 1) = PB (1) 2/5 a = −1 3/5 a = 1 (6) (d) Now that we have the conditional PMF PA|B (a|1), calculating conditional expectations is easy. E [A|B = 1] = a=−1,1 aPA|B (a|1) = −1(2/5) + (3/5) = 1/5 a2 PA|B (a|1) = 2/5 + 3/5 = 1 (7) (8) E A2 |B = 1 = a=−1,1 The conditional variance is then Var[A|B = 1] = E A2 |B = 1 − (E [A|B = 1])2 = 1 − (1/5)2 = 24/25 (9) 168 (e) To calculate the covariance, we need E [A] = a=−1,1 1 aPA (a) = −1(1/3) + 1(2/3) = 1/3 (10) (11) (12) (13) E [B] = b=0 bPB (b) = 0(4/9) + 1(5/9) = 5/9 1 E [AB] = a=−1,1 b=0 abPA,B (a, b) = −1(0)(1/9) + −1(1)(2/9) + 1(0)(1/3) + 1(1)(1/3) = 1/9 The covariance is just Cov [A, B] = E [AB] − E [A] E [B] = 1/9 − (1/3)(5/9) = −2/27 (14) Problem 4.9.8 Solution First we need to find the conditional expectations 1 E [B|A = −1] = E [B|A = 1] = b=0 1 b=0 bPB|A (b| − 1) = 0(1/3) + 1(2/3) = 2/3 bPB|A (b|1) = 0(1/2) + 1(1/2) = 1/2 (1) (2) Keep in mind that E[B|A] is a random variable that is a function of A. that is we can write E [B|A] = g(A) = 2/3 A = −1 1/2 A = 1 (3) We see that the range of U is SU = {1/2, 2/3}. In particular, PU (1/2) = PA (1) = 2/3 The complete PMF of U is PU (u) = Note that E [E [B|A]] = E [U ] = u PU (2/3) = PA (−1) = 1/3 2/3 u = 1/2 1/3 u = 2/3 (4) (5) (6) uPU (u) = (1/2)(2/3) + (2/3)(1/3) = 5/9 You can check that E[U ] = E[B]. Problem 4.9.9 Solution Random variables N and K have the joint PMF   100n e−100 (n+1)! 
PN,K (n, k) =  0 169 k = 0, 1, . . . , n; n = 0, 1, . . . otherwise (1) . the pizza is sold before noon with probability 1/2. E [K] = E [E [K|N ]] = E [N/2] = 50. k) 1/(n + 1) k = 0. y) = 1 -1 1/2 −1 ≤ x ≤ y ≤ 1 0 otherwise (1) 170 . .9. 1. n E [K|N = n] = k=0 k/(n + 1) = n/2 (4) We can conclude that E[K|N ] = N/2. n. n) = PM |N (m|n) PN (n) = 0 n m 100 n (3) (4) (1/3)m (2/3)n−m (1/2)100 0 ≤ m ≤ n ≤ 100 otherwise Problem 4. For n ≥ 0. . . n}. In particular. PM. 1. . . . For n ≥ 0. n PN (n) = k=0 100n e−100 100n e−100 = (n + 1)! n! (2) We see that N has a Poisson PMF with expected value 100.25.N (m. . . (5) Problem 4.K (n. 1. . Hence. each of those pizzas has mushrooms with probability 1/3. .We can find the marginal PMF for N by summing over all possible K. Thus. . the conditional PMF of K given N = n is PN. K has a discrete uniform PMF over {0. by Theorem 4. . . N has the binomial PMF PN (n) = 0 100 n (1/2)n (1/2)100−n n = 0.9. 100 otherwise (2) The joint PMF of N and M is for integers m. . given N = n. n PK|N (k|n) = = (3) 0 otherwise PN (n) That is. given that N = n pizzas were sold before noon. . 1. n otherwise (1) The other fact we know is that for each of the 100 pizzas sold. Thus.Y (x.11 Solution Y Random variables X and Y have joint PDF X fX.10 Solution This problem is fairly easy when we use conditional PMF’s. The conditional PMF of M given N is the binomial distribution PM |N (m|n) = 0 n m (1/3)m (2/3)n−m m = 0. y) = (a) The marginal PDF of X is fX (x) = 2 0 √ r 2 −x2 1/(πr2 ) 0 ≤ x2 + y 2 ≤ r2 0 otherwise (1) 1 dy = πr2 0 √ 2 r 2 −x2 πr 2 −r ≤ x ≤ r otherwise (2) The conditional PDF of Y given X is fX.9. Note that N ≥ M .Y (x.12 Solution We are given that the joint PDF of X and Y is fX.13 Solution P [M = m. Hence the conditional expected value is E[X|Y = y] = (y − 1)/2. r2 − x2 ]. y) = fY (y) 0 1 1+y −1 ≤ x ≤ y otherwise (4) (c) Given Y = y.Y (x. Since the conditional PDF fY |X (y|x) is symmetric about y = 0. Y has a uniform PDF. 
N = n] = P [dd · · · d v dd · · · d v] = (1 − p)n−2 p2 171 = (1 − p) m−1 (1) p (2) (3) p(1 − p) n−m−1 . N = n} has probability m−1 calls n−m−1 calls Problem 4. the conditional PDF of X is uniform over [−1. y]. E [Y |X = x] = 0 (4) The key to solving this problem is to find the joint PMF of M and N .(a) For −1 ≤ y ≤ 1. y) dx = 1 2 y −1 dx = (y + 1)/2 (2) The complete expression for the marginal PDF of Y is fY (y) = (y + 1)/2 −1 ≤ y ≤ 1 0 otherwise (3) (b) The conditional PDF of X given Y is fX|Y (x|y) = fX. Problem 4. we observe that over the interval [− r2 − x2 .Y (x.9. the joint event {M = m.Y (x. y) = fY |X (y|x) = fX (x) √ 1/(2 r2 − x2 ) y 2 ≤ r2 − x2 0 otherwise (3) √ √ (b) Given X = x. For n > m. the marginal PDF of Y is fY (y) = ∞ −∞ fX. N (m. 0 otherwise (4) The marginal PMF of N satisfies n−1 PN (n) = m=1 (1 − p)n−2 p2 = (n − 1)(1 − p)n−2 p2 . (5) Similarly. . . 2. n = 2. . Thus. 0 otherwise (11) The interpretation of the conditional PMF of N given M is that given M = m. m + 2. . PN |M (n|m) = PM. for m = 1. N has a Pascal PMF since it is the number of trials required to see 2 successes.14 Solution (a) The number of buses.. n) = (1 − p)n−2 p2 m = 1. The conditional PMF’s are now easy to find. . n = m + 1. n) = PM (m) (1 − p)n−m−1 p n = m + 1. . .A complete expression for the joint PMF of M and N is PM. . N . 2. . . . . P [N = n. . the first voice call is equally likely to occur in any of the previous n − 1 calls. the marginal PMF of M satisfies PM (m) = ∞ = p [(1 − p)m−1 + (1 − p)m + · · · ] = (1 − p) m−1 n=m+1 2 (1 − p)n−2 p2 (6) (7) (8) p The complete expressions for the marginal PMF’s are PM (m) = PN (n) = (1 − p)m−1 p m = 1. 3.N (m. . . M has a geometric PMF since it is the number of trials up to and including the first success. 172 . 0 otherwise (9) (10) Not surprisingly. n − 1 0 otherwise (12) Given that call N = n was the second voice call. The conditional PMF of M given N is PM |N (m|n) = PM. Problem 4. . . 0 otherwise (n − 1)(1 − p)n−2 p2 n = 2. 
t satisfying 1 ≤ n ≤ t. 3. . . must be greater than zero. if we view each voice call as a successful Bernoulli trial. . . m + 2. n − 1. .9. . . .N (m. T = t] > 0 for integers n. Also. n) = PN (n) 1/(n − 1) m = 1. Also. 2. . N = m + N where N has a geometric PMF with mean 1/p. the number of minutes that pass cannot be less than the number of buses. t) = PN (n) 0 t−1 n−1 pn (1 − p)t−n t = n. suppose we regard each minute as an independent trial in which a success occurs if a bus arrives and that bus is boarded. Specifically. If we view each bus arrival as a success of an independent trial. . the time for n buses to arrive has the above Pascal PMF. . PN |T PN. the conditional PMF of T given N is PT |N (t|n) = PN. . (1 − pq)t−1 pq t = 1. Given that you board bus N = n. 0 otherwise (8) To find the PMF of T . 2. 173 . . In particular. the experiment ends when a bus is boarded. given you depart at time T = t.T (n. . it is boarded with probability q. PN.(b) First. the success probability is pq and T is the number of minutes up to and including the first success. n + 1. In this case. N is the number of trials until the first success. t) = P [ABC] = P [A]P [B]P [C] where the events A.T (n. when a bus arrives. N has the geometric PMF PN (n) = (1 − q)n−1 q n = 1. B and C are B = {none of the first n − 1 buses are boarded} C = {at time t a bus arrives and is boarded} A = {n − 1 buses arrive in the first t − 1 minutes} (1) (2) (3) These events are independent since each trial to board a bus is independent of when the buses arrive. t otherwise 0 (10) That is. the time T when you leave is the time for n buses to arrive. it is much easier to obtain the marginal PMFs by consideration of the experiment. 2. . the joint PMF of N and T is PN.T (n. . we find the joint PMF of N and T by carefully considering the possible sample paths. Moreover. PT (t) = (9) 0 otherwise (d) Once we have the marginal PMFs. .T (n. 2. 
These events have probabilities P [A] = t − 1 n−1 (1 − p)t−1−(n−1) p n−1 (4) (5) (6) P [B] = (1 − q)n−1 P [C] = pq Consequently. . the conditional PMFs are easy to find. . . t) = (n|t) = PT (t) t−1 n−1 p(1−q) 1−pq n−1 1−p 1−pq t−1−(n−1) n = 1. t) = 0 t−1 n−1 pn−1 (1 − p)t−n (1 − q)n−1 pq n ≥ 1. . t ≥ n otherwise (7) (c) It is possible to find the marginal PMF’s by summing the joint PMF. Similarly. . . Thus. However. . The PMF of T is also geometric. the number of buses that arrive during minutes 1. . otherwise (11) This result can be explained. By viewing whether each arriving bus is boarded as an independent trial. t−1 has a binomial PMF since in each minute a bus arrives with probability p. . (3) Applying the PMF of N found above. . . .10.9. n − 99 (8) Flip a fair coin 100 times and let X be the number of heads in the first 75 flips and Y be the number of heads in the last 25 flips.15 Solution If you construct a tree describing what type of call (if any) that arrived in any 1 millisecond period. p) PMF PN (n) = n − 1 99 α (1 − α)n−99 . otherwise (6) α100 (1 This solution can also be found a consideration of the sample sequence of Bernoulli trials in which we either observe or do not observe a fax message. . 98 (2) Since the trials needed to generate successes 2 though 100 are independent of the trials that yield the first success. t)  ( n−1 ) ( 99 ) PT |N (t|n) = =  0 PN (n) otherwise. 2. We know that X and Y are independent and can find their PMFs easily. it has the Pascal (k = 99. . the time required for the first success has the geometric PMF PT (t) = (1 − α)t−1 α t = 1. .Problem 4. . . . . it will be apparent that a fax call arrives with probability α = pqr or no fax arrives with probability 1 − α. Since N is just the number of trials needed to observe 99 successes.1 Solution Hence for any integer n ≥ 100. we have PN |T (n|t) = Finally the joint PMF of N and T is PN.T (n. PX (x) = 75 (1/2)75 x 174 PY (y) = 25 (1/2)25 y (1) . 
0 otherwise (1) Note that N is the number of trials required to observe 100 successes. whether a fax message arrives each millisecond is a Bernoulli trial with success probability α. . . . 2. t = 1. To find the conditional PMF PT |N (t|n). N and T are independent. n = 99 + t. Hence PN |T (n|t) = PN |T (n − t|t) = PN (n − t) . Thus. we first must recognize that N is simply the number of trials needed to observe 100 successes and thus has the Pascal PMF PN (n) = n − 1 100 α (1 − α)n−100 99 (7) Problem 4. 100 + t. t) = PN |T (n|t) PT (t) = 0 n−t−1 98 n − t − 1 99 α (1 − α)n−t−99 . Moreover. 2. the number of trials needed to observe 100 successes is N = T + N where N is the number of trials needed to observe successes 2 through 100. the conditional PMF is  n−t−1 98 PN. That is. . 98 (4) (5) − α)n−100 t = 1.T (n. y) = (20)(20)220(20) PX (20) PY (20) = 2. y)/PX (x) to write   3/4 y = −1 1/3 y = −1. However.   3/4 k = 0 1/4 k = 20 PX (k) = PY (k) =  0 otherwise (1) E [X + Y ] = E [X] + E [X] = 2E [X] = 10 Since X and Y are independent.75 × 1012 (6) (7) Problem 4.Y (x.Y (x. Theorem 4.Y (−1. we need the marginal PMF PX (x) and the conditional PMFs PY |X (y|x).Y (x. the zeroes in the table of the joint PMF PX. PX. checking independence requires the marginal PMFs. By summing the rows on the table for the joint PMF.Y (x.10. 0. PX. y) y = −1 y = 0 y = 1 PX (x) x = −1 3/16 1/16 0 1/4 1/6 1/6 1/6 1/2 x=0 0 1/8 1/8 1/4 x=1 Now we use the conditional PMF PY |X (y|x) = PX. in this problem. 1 0 otherwise (2) (3) (4) 175 . PX (−1) = 1/4 and PY (1) = 14/48 but PX.Y (x. we obtain PX.20 (5) XY 2XY PX. y) allows us to verify very quickly that X and Y are dependent.27 yields Var[X] = 3/4 · (0 − 5)2 + 1/4 · (20 − 5)2 = 75 E [X] = 3/4 · 0 + 1/4 · 20 = 5 (2) (3) (4) Var[X + Y ] = Var[X] + Var[Y ] = 2 Var[X] = 150 Since X and Y are independent. 1 1/4 y = 0 PY |X (y| − 1) = PY |X (y|0) = 0 otherwise  0 otherwise PY |X (y|1) = 1/2 y = 0.10. 
y) = 75 x 25 (1/2)100 y (2) Problem 4.Y (x.2 Solution Using the following probability model We can calculate the requested moments.The joint PMF of X and N can be expressed as the product of the marginal PMFs because we know that X and Y are independent.3 Solution (a) Normally. In particular. 1) = 0 = PX (−1) PY (1) (1) (b) To fill in the tree diagram. y) = PX (x)PY (y) and E XY 2XY = x=0.20 y=0. 9. .10. . p = 1/2. it should be apparent that X1 and X2 are independent and PX1 . the number of flips until heads is the number of trials needed for the first success which has the geometric PMF PX1 (x) = (1 − p)x−1 p x = 1. 0 otherwise (2) However. the number of additional flips for the second heads is the same experiment as the number of flips needed for the first occurrence of heads.5 Solution We can solve this problem for the general case when the probability of heads is p. . . That is. 1. The generic solution and the specific solution with the exact values are PY |X (−1|−1) $ Y =−1 $   PX (−1)       d PX (0) d d d PX (1) d d     X=−1 $ PY |X (0|−1) $$$ Y =0 PY |X (−1|0) ¨ Y =−1 ¨ X=0 ¨¨ ¨¨ rr (0|0) PY |X rr PY |X (1|0) r r Y |X ˆˆˆ ˆˆ ˆ Y =0   d       1/2   1/4   X=−1 $ 3/4 $ Y =−1 $ $$$ Y =0 1/4 1/3 ¨ Y =−1 ¨ Y =0 X=0 Y =1 X=1 P (0|1) Y =0 Y =1 d d d 1/4 d d ¨¨ ¨¨ rr 1/3 rr 1/3 r r 1/2 ˆˆˆ ˆˆ 1/2 ˆ Y =1 X=1 Y =0 Y =1 PY |X (1|1) Problem 4.10.X2 (x1 . . the flips needed to generate the second occurrence of heads are independent of the flips that yield the first heads. Moreover. For the fair coin. .Now we can us these probabilities to label the tree. 0 otherwise (1) Similarly. n otherwise (1) Since PM |N (m|n) depends on the event N = n. . x2 = 1. . we found that the conditional PMF of M given N is PM |N (m|n) = 0 n m (1/3)m (2/3)n−m m = 0. if this independence is not obvious. When x1 ≥ 1 and x2 ≥ 1. Hence. no matter how large X1 may be. the event {X1 = x1 . 
X2 = x2 } occurs iff we observe the sample sequence x1 − 1 times tt · · · t h tt · · · t h x2 − 1 times (3) The above sample sequence has probability (1−p)x1 −1 p(1−p)x2 −1 p which in fact equals PX1 . . it can be derived by examination of the sample path. 176 . . . Viewing each flip as a Bernoulli trial in which heads is a success. . 2.X2 (x1 . . Problem 4. we see that M and N are dependent. x2 ) given earlier.10. 2.4 Solution In the solution to Problem 4. . 2. PX2 (x) = PX1 (x). x2 ) = PX1 (x1 ) PX2 (x2 ) = (1 − p)x1 +x2 −2 p2 x1 = 1. we use the joint PDF fX. Y ] = 0. .We will solve this problem when the probability of heads is p. Theorem 4.27 says Var[Y ] = Var[X1 ] + Var[−X2 ] = Var[X1 ] + Var[X2 ] = 2(1 − p) p2 (3) (2) Problem 4. Cov[X. For the fair coin. p = 1/2. we can use Theorem 4.Y (x. By Theorem 4.10. 0 otherwise (1) Problem 4. . y) = fX (x)fY (y).6 Solution Thus. the correlation is E[XY ] = E[X]E[Y ] = 6. Problem 4. Appendix A tells us that E[X] = 1/λX = 3 and E[Y ] = 1/λY = 2.7 Solution X and Y are independent random variables with PDFs fX (x) = 0 1 −x/3 3e x≥0 otherwise fY (y) = 0 1 −y/2 2e y≥0 otherwise (1) (a) To calculate P [X > Y ]. 2. (c) Since X and Y are independent. E[Xi ] = 1/p and Var[Xi ] = (1 − p)/p2 .14.10. The number X1 of flips until the first heads and the number X2 of additional flips for the second heads both have the geometric PMF PX1 (x) = PX2 (x) = (1 − p)x−1 p x = 1.10. E [Y ] = E [X1 ] − E [X2 ] = 0 Since X1 and X2 are independent. . Since X and Y are independent. P [X > Y ] = = 0 x>y ∞ ∞ 0 ∞ fX (x) fY (y) dx dy 1 −y/2 ∞ 1 −x/3 e e dx dy 2 3 y 1 −y/2 −y/3 e e dy 2 3 1 −(1/2+1/3)y 1/2 e = dy = 2 1/2 + 2/3 7 (2) (3) (4) (5) = = 0 (b) Since X and Y are exponential random variables with parameters λX = 1/3 and λY = 1/2.8 Solution (a) Since E[−X2 ] = −E[X2 ].13 to write E [X1 − X2 ] = E [X1 + (−X2 )] = E [X1 ] + E [−X2 ] =0 177 = E [X1 ] − E [X2 ] (1) (2) (3) . PN. Thus for an integer w.1 choc.2 0. 0. 
PW (w) = P [W = w] = P [X + Y = w] . (a) The following table is given in the problem statement. (3) Problem 4.27(a) says that Var[X1 − X2 ] = Var[X1 + (−X2 )] = 2 Var[X] (4) (5) (6) = Var[X1 ] + Var[−X2 ] Since X and Y are take on only integer values. “chocolate”. and “strawberry” correspond to the events D = 20.2 0. w − k) . d) d = 20 d = 100 d = 300 n=1 n=2 0. then W = w if and only if Y = w − k. “vanilla”.10.Y (k.9 Solution Suppose X = k. It follows that for any integer w.2 0. (1) Problem 4.5(f).1 0. Thus PW (w) = ∞ k=−∞ P [X = k.(b) By Theorem 3. Var[−X2 ] = (−1)2 Var[X2 ] = Var[X2 ]. Since X1 and X2 are independent. Similarly.2 strawberry 0.1 This table can be translated directly into the joint PMF of N and D. Theorem 4.D (n.2 0. w − k) = PX (k)PY (w − k). W = X + Y is integer valued as well. PX.2 0.10 Solution The key to this problem is understanding that “short order” and “long order” are synonyms for N = 1 and N = 2.10.2 0. D = 100 and D = 300. PW (w) = ∞ k=−∞ PX (k) PY (w − k) .1 (1) 178 . To find all ways that X + Y = w. we must consider each possible integer k such that X = k.2 0.Y (k. vanilla short order long order 0. Y = w − k] = ∞ k=−∞ PX. (2) Since X and Y are independent. 2 0.2) + 1(300)(0.2) + 2(20)(0. “small”. (f) In terms of N and D.D (2.D (2.11 Solution The key to this problem is understanding that “Factory Q” and “Factory R” are synonyms for M = 60 and M = 180.(b) We find the marginal PMF PD (d) by summing the columns of the joint PMF.1 can be translated into the following joint PMF for B and M . 20) + PN.   0 otherwise.3 d = 20.   0.3) + 2(100)(0. however. Hence N and D are dependent.2 0. In this case.d ndPN. and “large” orders correspond to the events B = 1. B = 2 and B = 3.3 0.2) + 1(100)(0.1 b=3 179 (1) .D (2. “medium”. PD (d) = (2)  0. The expected value of C is E [C] = n.3 0.1 0. (c) To find the conditional PMF PD|N (d|2). it is simpler to observe that PD (d) = PD|N (d|2).D (2.4) + 2(300)(0. This yields   0. 
d) (6) (7) (8) = 1(20)(0.10.1 0. we could calculate the marginal PMFs of N and D.2 b=2 0.4 The conditional PMF of N D given N = 2 is   1/4  PN.2 0. d)  1/2 = PD|N (d|2) =  1/4 PN (2)   0 d = 20 d = 100 d = 300 otherwise (4) (d) The conditional expectation of D given N = 2 is E [D|N = 2] = d dPD|N (d|2) = 20(1/4) + 100(1/2) + 300(1/4) = 130 (5) (e) To check independence.4 d = 100.3) = 356 Problem 4.3 d = 300. Similarly.1 0.D (n. PB.M (b. 300) = 0. 100) + PN. the cost (in cents) of a fax is C = N D. m) m = 60 m = 180 b=1 0. we first need to find the probability of the conditioning event (3) PN (2) = PN. (a) The following table given in the problem statement small order medium order large order Factory Q 0.1 Factory R 0. (1) (a) Since X1 and X2 are identically distributed they will share the same CDF FX (x).1 0.M (1.M (b. since PB.12 Solution Random variables X1 and X2 are iiid with PDF fX (x) = x/2 0 ≤ x ≤ 2. m) (6) (7) (8) = 1(60)(0.3) + 3(0.7 (3) (d) The conditional expectation of M given B = 2 is E [M |B = 2] = (c) From the marginal PMF of B. B = 2 is  1/3 PB.3) + 2(60)(0.5 The expected number of boxes is E [B] = b (2) bPB (b) = 1(0. 0 otherwise. These can be found from the row and column sums of the table of the joint PMF PB.5 0. it will prove helpful to find the marginal PMFs PB (b) and PM (m).10.3 0. we know that PB (2) = 0.2 b=3 PM (m) 0.2) + 3(180)(0.1 0.M (b.5 0.m bmPB.1) + 3(60)(0.2) + 2(180)(0. m)  2/3 = PM |B (m|2) =  PB (2) 0 The conditional PMF of M given m = 60 m = 180 otherwise (4) m mPM |B (m|2) = 60(1/3) + 180(2/3) = 140 (5) (e) From the marginal PMFs we calculated in the table of part (b). (f) In terms of M and B.2 0. The expected value of C is E [C] = b.3. m) m = 60 m = 180 PB (b) b=1 0.1) = 210 Problem 4. we can conclude that B and M are not independent. the cost (in cents) of sending a shipment is C = BM .(b) Before we find E[B].2) = 1. 60) = PB (1)PM (m)60.2 0.M (2.1 0.1) + 1(180)(0.5) + 2(0. 
 x≤0  0 x x2 /4 0 ≤ x ≤ 2 fX x dx = FX (x) =  0 1 x≥2 180 (2) .3 b=2 0. X2 ≤ 1] Since X1 and X2 are independent. E [X] = E [Y ] = ∞ −∞ ∞ −∞ fX (x) dx = fY (y) dy = 1 0 1 0 2x2 dx = 2/3 3y 3 dy = 3/4 (2) (3) (b) First. X2 ) ≤ 1] = P [X1 ≤ 1. 0 ≤ y ≤ 1 0 otherwise (4) . FW (1) = P [X1 ≤ 1] P [X2 ≤ 1] = [FX (1)]2 = 1/16 (d) FW (w) = P [max(X1 . We will see this when we compare the conditional expectations E[X|A] and E[Y |A] to E[X] and E[Y ].10. we can say that P [X1 ≤ 1. y) = fX (x) fY (y) = 181 6xy 2 0 ≤ x ≤ 1. using the marginal PDFs fX (x) and fY (y). Unfortunately. y. this problem asks us to calculate the conditional expectations E[X|A] and E[Y |A]. FW (1) = P [max(X1 .Y |A (x. When we learn that X > Y . this is not the case.13 Solution fX (x) = X and Y are independent random variables with PDFs 2x 0 ≤ x ≤ 1 0 otherwise fY (y) = 3y 2 0 ≤ y ≤ 1 0 otherwise (1) For the event A = {X > Y }. E[X] and E[Y ]. it is tempting to argue that the event X > Y does not alter the probability model for X and Y . we need to calculate the conditional joint PDF ipdf X.Y (x. FW  w≤0  0 2 4 /16 0 ≤ w ≤ 2 w (w) = P [X1 ≤ w] P [X2 ≤ w] = [FX (w)] =  1 w≥2 (7) (6) (5) (4) 1 16 (3) Problem 4. X2 ). X2 ) ≤ w] = P [X1 ≤ w. it increases the probability that X is large and Y is small. X2 ≤ 1] = P [X1 ≤ 1] P [X2 ≤ 1] = FX1 (1) FX2 (1) = [FX (1)]2 = (c) For W = max(X1 . We will do this using the conditional joint PDF fX. Since X and Y are independent. (a) We can calculate the unconditional expectations. y). The first step is to write down the joint PDF of X and Y : fX.(b) Since X1 and X2 are independent. X2 ≤ w] Since X1 and X2 are independent. Y |Ax. y) = X fX.Y The event A has probability P [A] = X>Y X 1 fX. then fX. We see that E[X|A] > E[X] while E[Y |A] < E[Y ].Y |A (x. y) ∈ A otherwise (8) 15xy 2 0 ≤ y ≤ x ≤ 1 (9) 0 otherwise 1 The triangular region of nonzero probability is a signal that given A. The conditional expected value of Y given A is E [Y |A] = ∞ −∞ 0 1 ∞ −∞ yfX.Y (x. 
From Theorem 4. The conditional expected value of X given A is = E [X|A] = = 15 =5 0 0 1 ∞ −∞ 1 ∞ −∞ xfX.y) P [A] 1 0 (x.4. y) = ∂[fX (x) FY (y)] ∂ 2 [FX (x) FY (y)] = = fX (x) fY (y) ∂x ∂y ∂y (1) Hence. That is. y) dy dx x>y 1 x 0 0 1 0 (5) (6) (7) = = 6xy 2 dy dx 1 Y 2x4 dx = 2/5 The conditional joint PDF of X and Y given A is fX. FX.14 Solution This problem is quite straightforward. y|a) x.Y |A (x.10.Y |A (x. learning X > Y gives us a clue that X may be larger than usual while Y may be smaller than usual.Y (x.Y (x. y) = FX (x)FY (y) implies that X and Y are independent. X and Y are no longer independent. y) = fX (x) fY (y) 182 (2) . we can find the joint PDF of X and Y is fX. If X and Y are independent. y) dy dx x (13) (14) (15) = 15 = 15 4 x 0 1 0 y 3 dy dx x5 dx = 5/8.Y (x. Problem 4.Y (x. y dy dx x 0 (10) (11) (12) x2 y 2 dy dx x5 dx = 5/6. W ≤ w] (1) (2) = P [X ≤ x. To verify this.15 Solution Random variables X and Y have joint PDF fX. X ≤ Y ≤ X + w]. Y − X ≤ w] = P [X ≤ x.W (x.10.Y (x.Y (x. v) dv du y −∞ (3) (4) (5) fX (u) du fY (v) dv = FX (x) FX (x) Problem 4. FX. Y FW (w) = 1 − P [W > w] = 1 − P [Y > X + w] w X<Y<X+w X (2) (3) (4) =1− ∞ ∞ λ2 e−λy dy dx The complete expressions for the joint CDF and corresponding joint PDF are FW (w) = 0 w<0 1 − e−λw w ≥ 0 fW (w) = 0 w<0 λe−λw w ≥ 0 (5) =1−e 0 x+w −λw Problem 4. First we find the joint CDF. Thus.W (x.Y (u.By Definition 4. w) = P [X ≤ x.10. For w ≥ 0. Hence FW (w) = 0 for w < 0.W (x. w) into the product fX (x)fW (w) of marginal density functions. Since Y ≥ X. y) = = x y −∞ −∞ x −∞ fX. Y ≤ X + w] Since Y ≥ X. Y {X<x}∩{X<Y<X+w} FX.3. w) = P [X ≤ x. w) = 0 x x x 0 x +w λ2 e−λy dy dx x +w x (3) (4) dx (5) (6) = w X x −λe−λy dx = 0 x −λe−λ(x +w) + λe−λx x 0 −λw = e−λ(x +w) − e−λx = (1 − e−λx )(1 − e 183 ) (7) . we must be able to factor the joint density function fX. FX. the CDF of W satisfies FX. 
y) = λ2 e−λy 0 ≤ x ≤ y 0 otherwise (1) For W = Y − X we can find fW (w) by integrating over the region indicated in the figure below to get FW (w) then taking the derivative with respect to w. for x ≥ 0 and w ≥ 0. W = Y − X is nonnegative.W (x.16 Solution (a) To find if W and X are independent. we must find the joint PDF of X and W . by applying Theorem 4. y) =  0 otherwise fW. FW. we must consider each case separately. the integration is Y {Y<y}Ç{Y<X+w} y w X y-w y FW. ∂x ∂w Since we have our desired factorization. Moreover.Y (w. y) = 0 y−w u+w u λ2 e−λv dv du y u + =λ 0 y−w y y−w λ2 e−λv dv du (11) e−λu − e−λ(u+w) du y y−w +λ It follows that FW.4 yields The complete expression for the joint CDF is   1 − e−λw − λwe−λy 0 ≤ w ≤ y 1 − (1 + λy)e−λy 0≤y≤w FW. y) = P [W ≤ w. For y > w.Y (w.Y (w.Y (w. w) = (b) Following the same procedure. y) = ∂w ∂y 2λ2 e−λy 0 ≤ w ≤ y 0 otherwise (19) (20) The joint PDF fW. y) = ∂ 2 FW. Keep in mind that although W = Y − X ≤ Y . Y ≤ y] . w) = FX (x)FW (w).We see that FX.W (x.W (x. y) doesn’t factor and thus W and Y are dependent. In any case. Y ≤ y] = P [Y − X ≤ w.Y (w. W and X are independent.Y (w.Y (w. we find the joint CDF of Y and W .Y (w.4. y) need not obey the same constraints. Y ≤ y] (9) (10) = P [Y ≤ X + w. 184 . y 0 y u FW. Y ≤ y} depends on whether y < w or y ≥ w. (8) The region of integration corresponding to the event {Y ≤ x + w. the dummy arguments y and w of fW. fX.Y (w.W (x. ∂ 2 FX. y) = −e−λu + e−λ(u+w) For y ≤ w. Y {Y<y} w y X y−w 0 e−λu − e−λy du y y−w (12) + −e−λu − uλe−λy (13) (14) = 1 − e−λw − λwe−λy . w) = λe−λx λe−λw = fX (x) fW (w) . y) = = 0 λ2 e−λv dv du (15) (16) (17) (18) y −λe−λy + λe−λu du y 0 = −λue−λy − e−λu = 1 − (1 + λy)e−λy Applying Theorem 4. V (u. In this case. we find that c = independent. u < Y ≤ v] (2) (3) (4) (5) (6) (7) (8) FU. 
v) ∂u∂v ∂ = [fX (v) FY (u) + FX (u) fY (v)] ∂u = fX (u) fY (v) + fX (v) fY (v) = FX (v) FY (v) − (FX (v) − FX (u)) (FY (v) − FY (u)) The joint PDF is (9) (10) (11) Problem 4. Thus P [U > u. In the same way. First.10.V (u.Y (x. and we are left to solve the following for E[X]. since V = max(X. Y ) > u if and only if X > u and Y > u. Y > u. E[Y ].Problem 4. y) = ce−(x 2 /8)−(y 2 /18) (1) The omission of any limits for the PDF indicates that it is defined over all x and y. v) = P [X ≤ v] P [Y ≤ v] − P [u < X ≤ v] P [u < X ≤ v] = FX (v) FY (u) + FX (u) FY (v) − FX (u) FY (u) fU. X ≤ v.17 Solution We need to define the events A = {U ≤ u} and B = {V ≤ v}. = P [X ≤ v. we also find that X and Y are 185 . Y ≤ v] = P [u < X ≤ v. y) is in the form of the bivariate Gaussian distribution so we look to Definition 4. y) doesn’t contain any cross terms we know that ρ must be zero. u < X ≤ v] FU. and σY : x − E [X] σX From which we can conclude that E [X] = E [Y ] = 0 √ σX = 8 √ σY = 18 Putting all the pieces together. E[X]. v) = P [AB] = P [B] − P [Ac B] = P [V ≤ v] − P [U > u. σX . 1 24π . we know that the constant is c= 1 2πσX σY 1 − ρ2 (2) Because the exponent of fX.Y (x. V ≤ v] = P [X > u.V (u. E[Y ] and ρ. the joint CDF of U and V satisfies FU. v) = ∂ 2 FU. Y ≤ v] − P [u < X ≤ v. V ≤ v if and only if X ≤ v and Y ≤ v.Y (x. σX . Y ). V ≤ v] Since X and Y are independent random variables. V ≤ v] (1) Thus. Note that U = min(X. We know that fX.11.17 and attempt to find values for σY .V (u.1 Solution fX. v) = P [V ≤ v] − P [U > u. 2 = x2 8 y − E [Y ] σY 2 = y2 18 (3) (4) (5) (6) Since ρ = 0.V (u. E[X].Y (x. we try to solve the following equations x − E [X] σX 2 = 4(1 − ρ2 )x2 (2) (3) (4) y − E [Y ] 2 = 8(1 − ρ2 )y 2 σY 2ρ = 8(1 − ρ2 ) σX σY The first two equations yield E[X] = E[Y ] = 0 (b) To find the correlation coefficient ρ. (a) First. Hence ρ = 1/2. σX = 1/ 4(1 − ρ2 ) 8(1 − ρ2 ) (5) Using σX and σY √ (c) Since ρ = 1/ 2.11. 
we learn that µX = µY = 0 2 2 σX = σ Y = 1 (1) From Theorem 4.11.Y (x.11. the joint PDF is 1 2 2 fX. we observe that σY = 1/ √ in the third equation yields ρ = 1/ 2. From Definition 4. now we can solve for σX and σY . √ σX = 1/ 2 σY = 1/2 (d) From here we can solve for c.30. Problem 4. we learn that E[Y |X] = X/2. y) = √ e−2(x −xy+y )/3 (3) 2 3π 186 .17. E[Y ] and ρ. (1) we proceed as in Problem 4. σX . the conditional expectation of Y given X is E [Y |X] = µY (X) = µY + ρ ˜ σY (X − µX ) = ρX σX (2) In the problem statement.Problem 4.3 Solution From the problem statement. c= 1 2πσX σY 1− ρ2 = 2 π (6) (7) (e) X and Y are dependent because ρ = 0.2 Solution For the joint PDF fX.1 to find values for σY . y) = ce−(2x 2 −4xy+4y 2 ) . fX.11. y) = fX (x) fY (y) = (2) Because X and Y have a uniform PDF over the bullseye area. Of ocurse. P [B] = 2 2π 1 2 2 e−r /2σ r dr dθ 2 2πσ 0 0 2 1 2 2 re−r /2σ dr = 2 σ 0 2 /2σ 2 (10) (11) (12) = −e−r 2 0 = 1 − e−4/200 ≈ 0. −50 ≤ y ≤ 50 0 otherwise (1) Since X and Y are independent.Y (x. i. their joint PDF is fX. the calculation of P [B] depends on the probability model for X and Y .01 −50 ≤ x ≤ 50 0 otherwise 10−4 −50 ≤ x ≤ 50. P [B] = P X 2 + Y 2 ≤ 22 = 10−4 · π22 = 4π · 10−4 ≈ 0. y) = The probability of a bullseye is P [B] = P X 2 + Y 2 ≤ 22 = π22 = π502 1 25 2 (4) ≈ 0..0016. With the substitutions x2 + y 2 = r2 . 187 .Y (x. y) = fX (x) fY (y) = To find P [B].4 Solution The event B is the set of outcomes satisfying X 2 + Y 2 ≤ 22 . P [B] is just the value of the joint PDF over the area times the area of the bullseye. X and Y have the same PDF fX (x) = fY (x) = 0. (a) In this instance.e. and dx dy = r dr dθ. (5) (c) In this instance. their joint PDF is fX. we write P [B] = P X 2 + Y 2 ≤ 22 = = x2 +y 2 ≤22 1 −(x2 +y2 )/2σ2 e 2πσ 2 (7) fX.0013 1/[π502 ] x2 + y 2 ≤ 502 0 otherwise (3) (b) In this case. fX (x) = fY (x) = √ 1 2πσ 2 e−x 2 /2σ 2 (6) Since X and Y are independent. σ) PDF with σ 2 = 100.Y (x. 
X and Y have the identical Gaussian (0.Y (x. y) dx dy e−(x 2 +y 2 )/2σ 2 (8) dx dy (9) 1 2πσ 2 x2 +y 2 ≤22 This integral is easy using polar coordinates.Problem 4. the joint PDF of X and Y is inversely proportional to the area of the target.0198. Since ρ = 0.29. t) = √ exp − 4 2 2π 2 To find the conditional probability P [I|T = t].29 which says that given T = t. µ2 = E[T ] = 37 and σ2 = σT = 1 and ρ = 1/ 2.T (w. t) = 2πσ1 σ2 1 − ρ2 t−µ2 σ2 2    (4) √ With µ1 = E[W ] = 7.159. we need to find the conditional PDF of W given T = t. we have √ 1 (w − 7)2 2(w − 7)(t − 37) − + (t − 37)2 (5) fW. (3) (b) The general form of the bivariate Gaussian PDF is  2 )(t−µ w−µ1 − 2ρ(w−µ1σ2 2 ) + σ1 σ1  exp − 2(1 − ρ2 ) fW. W and T are independent and q = P [W > 10] = P The tree for this experiment is q $ W >10 $ $ $$$ $$$ 1−q W ≤10 10 − 7 W −7 > = 1 − Φ(1.0107.T (w. 2 2 (2) p $T >38 $$$ $$$ $$ T ≤38 1−p The probability the person is ill is P [I] = P [T > 38. σ1 = σW = 2. Its easier just to apply Theorem 4. the conditional distribution of W is Gaussian with σW (t − E [T ]) (7) E [W |T = t] = E [W ] + ρ σT 2 Var[W |T = t] = σW (1 − ρ2 ) (8) 188 .067. W > 10] = P [T > 38] P [W > 10] = pq = 0. then W is measured.T (w.Problem 4. The direct way is simply to use algebra to find fW |T (w|t) = fW.5) = 0. t) fT (t) (6) The required algebra is essentially the same as that needed to prove Theorem 4.11. (1) Given that the temperature is high.5 Solution (a) The person’s temperature is high with probability p = P [T > 38] = P [T − 37 > 38 − 37] = 1 − Φ(1) = 0. choosing d = a2 c2 − b2 /4 yields a valid PDF.146). or. However.17. the definitions of µY (x) and ˜ ˜ ˜ σY are not particularly important for this exercise. it follows that b = −2acρ. When we integrate the joint PDF over all x ˜ 189 . we see that |b| ≤ 2ac. the conditional probability the person is declared ill is P [I|T = t] = P [W > 10|T = t] √ √ W − (7 + 2(t − 37)) 10 − (7 + 2(t − 37)) √ √ =P > 2 2 √ √ 3 2 3 − 2(t − 37) √ − (t − 37) . 
y) = σX 2π 1 √ e−(x−µX ) 2 2 /2σX σY ˜ 1 √ 2π µ e−(y−˜Y (x)) 2 /2˜Y σ2 (1) σY where µY (x) = µY + ρ σX (x − µX ) and σY = σY 1 − ρ2 . we obtain the conditional Gaussian PDF √ 2 1 fW |T (w|t) = √ e−(w−(7+ 2(t−37))) /4 .6 Solution The given joint PDF is fX. for any choice of a. b and c that meets this constraint.11.11. y) = de−(a 1 2 2σX (1 2 x2 +bxy+c2 y 2 ) (1) In order to be an example of the bivariate Gaussian PDF given in Definition 4. Further. =Q =P Z> 2 2 (11) (12) (13) Problem 4. we must have a2 = − −ρ b= σX σY (1 − ρ2 ) 1 a 2(1 − ρ2 ) ρ2 ) c2 = d= 1 2 2σY (1 − ρ2 ) 1 1 − ρ2 2πσX σY We can solve for σX and σY . we can write the bivariate Gaussian PDF as fX. yielding σX = σY = 1 c 2(1 − ρ2 ) (2) Since |ρ| ≤ 1.Y (x. Plugging these values into the equation for b. ρ = −b/2ac.Y (x. This implies 1 d2 = 2 2 2 (3) = (1 − ρ2 )a2 c2 = a2 c2 − b2 /4 4π σX σY (1 − ρ2 ) Problem 4.7 Solution From Equation (4. equivalently. 4π (10) Given T = t.Plugging in the various parameters gives √ E [W |T = t] = 7 + 2(t − 37) and Var [W |T = t] = 2 (9) Using this conditional mean and variance. Y (x. The first moment is E [Y ] = E [X1 X2 ] = Cov [X1 . Var[Xi ] = σi . (4) Thus. we obtain ∞ −∞ ∞ −∞ fX. 2 σ2 σ2 It follows that 2 2 2 2 2 E X1 X2 = E X2 E X1 |X2 2 2 = E [µ2 + σ1 (1 − ρ2 )]X2 + 2ρµ1 1 (8) σ1 σ2 2 2 (X2 − µ2 )X2 + ρ2 1 (X2 − µ2 )2 X2 .Y (x. 1 2 (1) Since Var[Y ] = E[Y 2 ] − (E[Y ])2 . In fact. The goal is to show that Y = X1 X2 has variance 2 2 2 2 Var[Y ] = (1 + ρ2 )σ1 σ2 + µ2 σ2 + µ2 σ1 + 2ρµ1 µ2 σ1 σ2 . it is the integral of the conditional PDF fY |X (y|x) over all possible y. we follow the problem hint and use the iterated expectation 2 2 2 2 E Y 2 = E X1 X2 = E E X1 X2 |X2 2 2 = E X2 E X1 |X2 (2) . To complete the proof. y) dx dy = ∞ −∞ ∞ −∞ σX 2π 1 √ 1 √ e−(x−µX ) 2 2 /2σX ∞ −∞ σY ˜ 1 √ 2π µ e−(y−˜Y (x)) 1 2 /2˜Y σ2 dy dx (2) = σX 2π e−(x−µX ) 2 2 /2σX dx (3) The marked integral equals 1 because for each value of x. 2 σ2 σ2 (9) 190 .and y. 
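Problem 4.11.8 Solution

In this problem, $X_1$ and $X_2$ are jointly Gaussian random variables with $E[X_i] = \mu_i$, $\mathrm{Var}[X_i] = \sigma_i^2$, and correlation coefficient $\rho_{12} = \rho$. The goal is to show that $Y = X_1X_2$ has variance
\[ \mathrm{Var}[Y] = (1+\rho^2)\sigma_1^2\sigma_2^2 + \mu_1^2\sigma_2^2 + \mu_2^2\sigma_1^2 + 2\rho\mu_1\mu_2\sigma_1\sigma_2. \tag{1} \]
Since $\mathrm{Var}[Y] = E[Y^2] - (E[Y])^2$, we will find the moments of Y. The first moment is
\[ E[Y] = E[X_1X_2] = \mathrm{Cov}[X_1,X_2] + E[X_1]E[X_2] = \rho\sigma_1\sigma_2 + \mu_1\mu_2. \tag{2} \]
For the second moment of Y, we follow the problem hint and use the iterated expectation
\[ E[Y^2] = E[X_1^2X_2^2] = E\left[E[X_1^2X_2^2|X_2]\right] = E\left[X_2^2\,E[X_1^2|X_2]\right]. \tag{3} \]
Given $X_2 = x_2$, we observe from Theorem 4.30 that $X_1$ is Gaussian with
\[ E[X_1|X_2=x_2] = \mu_1 + \rho\frac{\sigma_1}{\sigma_2}(x_2-\mu_2), \qquad \mathrm{Var}[X_1|X_2=x_2] = \sigma_1^2(1-\rho^2). \tag{4} \]
Thus, the conditional second moment of $X_1$ is
\begin{align}
E[X_1^2|X_2] &= (E[X_1|X_2])^2 + \mathrm{Var}[X_1|X_2] \tag{5}\\
&= \left(\mu_1 + \rho\frac{\sigma_1}{\sigma_2}(X_2-\mu_2)\right)^2 + \sigma_1^2(1-\rho^2) \tag{6}\\
&= [\mu_1^2 + \sigma_1^2(1-\rho^2)] + 2\rho\mu_1\frac{\sigma_1}{\sigma_2}(X_2-\mu_2) + \rho^2\frac{\sigma_1^2}{\sigma_2^2}(X_2-\mu_2)^2. \tag{7}
\end{align}
It follows that
\begin{align}
E[X_1^2X_2^2] &= E\left[X_2^2\,E[X_1^2|X_2]\right] \tag{8}\\
&= E\left[[\mu_1^2+\sigma_1^2(1-\rho^2)]X_2^2 + 2\rho\mu_1\frac{\sigma_1}{\sigma_2}(X_2-\mu_2)X_2^2 + \rho^2\frac{\sigma_1^2}{\sigma_2^2}(X_2-\mu_2)^2X_2^2\right]. \tag{9}
\end{align}
Since $E[X_2^2] = \sigma_2^2 + \mu_2^2$,
\[ E[X_1^2X_2^2] = \left(\mu_1^2+\sigma_1^2(1-\rho^2)\right)(\sigma_2^2+\mu_2^2) + 2\rho\mu_1\frac{\sigma_1}{\sigma_2}E\left[(X_2-\mu_2)X_2^2\right] + \rho^2\frac{\sigma_1^2}{\sigma_2^2}E\left[(X_2-\mu_2)^2X_2^2\right]. \tag{10} \]
We observe that
\begin{align}
E\left[(X_2-\mu_2)X_2^2\right] &= E\left[(X_2-\mu_2)(X_2-\mu_2+\mu_2)^2\right] \tag{11}\\
&= E\left[(X_2-\mu_2)\left((X_2-\mu_2)^2 + 2\mu_2(X_2-\mu_2) + \mu_2^2\right)\right] \tag{12}\\
&= E\left[(X_2-\mu_2)^3\right] + 2\mu_2E\left[(X_2-\mu_2)^2\right] + \mu_2^2E\left[X_2-\mu_2\right]. \tag{13}
\end{align}
We recall that $E[X_2-\mu_2] = 0$ and that $E[(X_2-\mu_2)^2] = \sigma_2^2$. We now look ahead to Problem 6.3.4 to learn that
\[ E\left[(X_2-\mu_2)^3\right] = 0, \qquad E\left[(X_2-\mu_2)^4\right] = 3\sigma_2^4. \tag{14} \]
This implies
\[ E\left[(X_2-\mu_2)X_2^2\right] = 2\mu_2\sigma_2^2. \tag{15} \]
Following this same approach, we write
\begin{align}
E\left[(X_2-\mu_2)^2X_2^2\right] &= E\left[(X_2-\mu_2)^2(X_2-\mu_2+\mu_2)^2\right] \tag{16}\\
&= E\left[(X_2-\mu_2)^2\left((X_2-\mu_2)^2 + 2\mu_2(X_2-\mu_2) + \mu_2^2\right)\right] \tag{17}\\
&= E\left[(X_2-\mu_2)^4 + 2\mu_2(X_2-\mu_2)^3 + \mu_2^2(X_2-\mu_2)^2\right] \tag{18}\\
&= E\left[(X_2-\mu_2)^4\right] + 2\mu_2E\left[(X_2-\mu_2)^3\right] + \mu_2^2E\left[(X_2-\mu_2)^2\right]. \tag{19}
\end{align}
It follows that
\[ E\left[(X_2-\mu_2)^2X_2^2\right] = 3\sigma_2^4 + \mu_2^2\sigma_2^2. \tag{20} \]
Combining the above results, we can conclude that
\begin{align}
E[X_1^2X_2^2] &= \left(\mu_1^2+\sigma_1^2(1-\rho^2)\right)(\sigma_2^2+\mu_2^2) + 2\rho\mu_1\frac{\sigma_1}{\sigma_2}(2\mu_2\sigma_2^2) + \rho^2\frac{\sigma_1^2}{\sigma_2^2}(3\sigma_2^4+\mu_2^2\sigma_2^2) \tag{21}\\
&= (1+2\rho^2)\sigma_1^2\sigma_2^2 + \mu_2^2\sigma_1^2 + \mu_1^2\sigma_2^2 + \mu_1^2\mu_2^2 + 4\rho\mu_1\mu_2\sigma_1\sigma_2. \tag{22}
\end{align}
Finally, combining Equations (2) and (22) yields
\begin{align}
\mathrm{Var}[Y] &= E[X_1^2X_2^2] - (E[X_1X_2])^2 \tag{23}\\
&= (1+\rho^2)\sigma_1^2\sigma_2^2 + \mu_1^2\sigma_2^2 + \mu_2^2\sigma_1^2 + 2\rho\mu_1\mu_2\sigma_1\sigma_2. \tag{24}
\end{align}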
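Two special cases give a quick check on the variance formula of Problem 4.11.8. They are added here as a cross-check and are not part of the original solution. With $\rho = 0$ the variables are independent, and with $\rho = 1$, $\mu_1=\mu_2=\mu$, $\sigma_1=\sigma_2=\sigma$ we have $Y = X^2$ for a single Gaussian X:

```latex
\[
\rho = 0:\quad
\mathrm{Var}[X_1X_2] = E[X_1^2]E[X_2^2] - \mu_1^2\mu_2^2
 = (\sigma_1^2+\mu_1^2)(\sigma_2^2+\mu_2^2) - \mu_1^2\mu_2^2
 = \sigma_1^2\sigma_2^2 + \mu_1^2\sigma_2^2 + \mu_2^2\sigma_1^2 ,
\]
\[
Y = X^2:\quad
\mathrm{Var}[X^2] = E[X^4] - (E[X^2])^2
 = (\mu^4 + 6\mu^2\sigma^2 + 3\sigma^4) - (\mu^2+\sigma^2)^2
 = 2\sigma^4 + 4\mu^2\sigma^2 .
\]
```

Both results agree with the general expression $(1+\rho^2)\sigma_1^2\sigma_2^2 + \mu_1^2\sigma_2^2 + \mu_2^2\sigma_1^2 + 2\rho\mu_1\mu_2\sigma_1\sigma_2$ evaluated at the corresponding parameters.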
Problem 4.12.1 Solution

The script imagepmf in Example 4.27 generates the grid variables SX, SY, and PXY. Recall that for each entry in the grid, SX, SY and PXY are the corresponding values of x, y and $P_{X,Y}(x,y)$. Displaying them as adjacent column vectors forms the list of all possible pairs x, y and the probabilities $P_{X,Y}(x,y)$. Since any Matlab vector or matrix x is reduced to a column vector with the command x(:), the following simple commands will generate the list:

    >> format rat;
    >> imagepmf;
    >> [SX(:) SY(:) PXY(:)]
    ans =
         800      400      1/5
        1200      400      1/20
        1600      400      0
         800      800      1/20
        1200      800      1/5
        1600      800      1/10
         800     1200      1/10
        1200     1200      1/10
        1600     1200      1/5
    >>

Note that the command format rat wasn't necessary; it just formats the output as rational numbers, i.e., ratios of integers, which you may or may not find esthetically pleasing.
Problem 4.12.2 Solution

In this problem, we need to calculate E[X], E[Y], the correlation E[XY], and the covariance Cov[X, Y] for random variables X and Y in Example 4.27. In this case, we can use the script imagepmf.m in Example 4.27 to generate the grid variables SX, SY and PXY that describe the joint PMF $P_{X,Y}(x,y)$.

However, for the rest of the problem, a general solution is better than a specific solution. The general problem is that given a pair of finite random variables described by the grid variables SX, SY and PXY, we want Matlab to calculate an expected value E[g(X, Y)]. This problem is solved in a few simple steps. First we write a function that calculates the expected value of any finite random variable.

    function ex=finiteexp(sx,px);
    %Usage: ex=finiteexp(sx,px)
    %returns the expected value E[X]
    %of finite random variable X described
    %by samples sx and probabilities px
    ex=sum((sx(:)).*(px(:)));

Note that finiteexp performs its calculations on the sample values sx and probabilities px using the column vectors sx(:) and px(:). As a result, we can use the same finiteexp function when the random variable is represented by grid variables. For example, we can calculate the correlation r = E[XY] as

    r=finiteexp(SX.*SY,PXY)

It is also convenient to define a function that returns the covariance:

    function covxy=finitecov(SX,SY,PXY);
    %Usage: cxy=finitecov(SX,SY,PXY)
    %returns the covariance of
    %finite random variables X and Y
    %given by grids SX, SY, and PXY
    ex=finiteexp(SX,PXY);
    ey=finiteexp(SY,PXY);
    R=finiteexp(SX.*SY,PXY);
    covxy=R-ex*ey;

The following script calculates the desired quantities:

    %imageavg.m
    %Solution for Problem 4.12.2
    imagepmf; %defines SX, SY, PXY
    ex=finiteexp(SX,PXY)
    ey=finiteexp(SY,PXY)
    rxy=finiteexp(SX.*SY,PXY)
    cxy=finitecov(SX,SY,PXY)

    >> imageavg
    ex =
        1180
    ey =
        860
    rxy =
        1064000
    cxy =
        49200
    >>

The careful reader will observe that imageavg is inefficiently coded in that the correlation E[XY] is calculated twice, once directly and once inside of finitecov. For more complex problems, it would be worthwhile to avoid this duplication.
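The values reported for Problem 4.12.2 can be checked without the matcode files. The sketch below is self-contained: the grid is typed in by hand from the list of (x, y, probability) triples displayed in Problem 4.12.1, and the anonymous function fexp is a hypothetical stand-in for finiteexp. It also computes E[XY] only once and reuses it for the covariance.

```matlab
% Grid for the PMF of Example 4.27, written out explicitly
% (in the text this grid is produced by the script imagepmf).
[SX,SY] = ndgrid([800 1200 1600],[400 800 1200]);
PXY = [0.20 0.05 0.10; ...
       0.05 0.20 0.10; ...
       0.00 0.10 0.20];

% E[g(X,Y)] for a finite PMF: stack samples and probabilities as columns.
fexp = @(s,p) sum(s(:).*p(:));

ex  = fexp(SX,PXY);        % E[X]
ey  = fexp(SY,PXY);        % E[Y]
rxy = fexp(SX.*SY,PXY);    % E[XY], computed once
cxy = rxy - ex*ey;         % Cov[X,Y], reusing rxy
```

Reusing rxy in the last line is one simple way to avoid computing the correlation twice.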
Problem 4.12.3 Solution

The script is just a Matlab calculation of $F_{X,Y}(x,y)$ in Equation (4.29).

    %trianglecdfplot.m
    [X,Y]=meshgrid(0:0.05:1.5);
    R=(0<=Y).*(Y<=X).*(X<=1).*(2*(X.*Y)-(Y.^2));
    R=R+((0<=X).*(X<Y).*(X<=1).*(X.^2));
    R=R+((0<=Y).*(Y<=1).*(1<X).*((2*Y)-(Y.^2)));
    R=R+((X>1).*(Y>1));
    mesh(X,Y,R);
    xlabel('\it x');
    ylabel('\it y');

For functions like $F_{X,Y}(x,y)$ that have multiple cases, we calculate the function for each case and multiply by the corresponding boolean condition so as to have a zero contribution when that case doesn't apply. Using this technique, it's important to define the boundary conditions carefully to make sure that no point is included in two different boundary conditions.
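The masking technique of Problem 4.12.3 can be tried at a handful of points before plotting. In this sketch the anonymous function F and the test points are illustrative choices, not part of the original script; each case of the CDF is multiplied by its logical condition, so at most one term contributes at any point.

```matlab
% CDF of Equation (4.29): one term per case, each gated by its condition.
F = @(X,Y) (0<=Y).*(Y<=X).*(X<=1).*(2*X.*Y - Y.^2) ...
         + (0<=X).*(X<Y).*(X<=1).*(X.^2) ...
         + (0<=Y).*(Y<=1).*(1<X).*(2*Y - Y.^2) ...
         + (X>1).*(Y>1);

% One point from each region, plus a boundary point and a point with x < 0.
xs = [0.5  0.5 2   2 1 -1  ];
ys = [0.25 2   0.5 2 1  0.5];
vals = F(xs,ys);
```

The boundary point (1, 1) is picked up by the first case only, which illustrates why the inequalities must be chosen so that the cases do not overlap.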
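Problem 4.12.4 Solution

By following the formulation of Problem 4.2.6, the code to set up the sample grid is reasonably straightforward:

    function [SX,SY,PXY]=circuits(n,p);
    %Usage: [SX,SY,PXY]=circuits(n,p);
    % (See Problem 4.12.4)
    [SX,SY]=ndgrid(0:n,0:n);
    PXY=0*SX;
    PXY(find((SX==n) & (SY==n)))=p^n;
    for y=0:(n-1),
        I=find((SY==y) &(SX>=SY) &(SX<n));
        PXY(I)=(p^y)*(1-p)* ...
            binomialpmf(n-y-1,p,SX(I)-y);
    end;

The only catch is that for a given value of y, we need to calculate the binomial probability of x − y successes in (n − y − 1) trials. We can do this using the function call

    binomialpmf(n-y-1,p,x-y)

However, this function expects the argument n-y-1 to be a scalar. As a result, we must perform a separate call to binomialpmf for each value of y.

An alternate solution is direct calculation of the PMF $P_{X,Y}(x,y)$ in Problem 4.2.6. In this case, we calculate m! using the Matlab function gamma(m+1). Because the gamma(x) function will calculate the gamma function for each element in a vector x, we can calculate the PMF without any loops:

    function [SX,SY,PXY]=circuits2(n,p);
    %Usage: [SX,SY,PXY]=circuits2(n,p);
    % (See Problem 4.12.4)
    [SX,SY]=ndgrid(0:n,0:n);
    PXY=0*SX;
    PXY(find((SX==n) & (SY==n)))=p^n;
    I=find((SY<=SX) &(SX<n));
    PXY(I)=(gamma(n-SY(I))./(gamma(SX(I)-SY(I)+1)...
        .*gamma(n-SX(I)))).*(p.^SX(I)).*((1-p).^(n-SX(I)));

Some experimentation with cputime will show that circuits2(n,p) runs much faster than circuits(n,p). As is typical, the for loop in circuits results in time wasted running the Matlab interpreter and in regenerating the binomial PMF in each cycle.

To finish the problem, we need to calculate the correlation coefficient
\[ \rho_{X,Y} = \frac{\mathrm{Cov}[X,Y]}{\sigma_X\sigma_Y}. \tag{1} \]
In fact, this is one of those problems where a general solution is better than a specific solution. The general problem is that given a pair of finite random variables described by the grid variables SX, SY and PMF PXY, we wish to calculate the correlation coefficient.

This problem is solved in a few simple steps. First we write a function that calculates the expected value of a finite random variable.

    function ex=finiteexp(sx,px);
    %Usage: ex=finiteexp(sx,px)
    %returns the expected value E[X]
    %of finite random variable X described
    %by samples sx and probabilities px
    ex=sum((sx(:)).*(px(:)));

Note that finiteexp performs its calculations on the sample values sx and probabilities px using the column vectors sx(:) and px(:). As a result, we can use the same finiteexp function when the random variable is represented by grid variables. We can build on finiteexp to calculate the variance using finitevar:

    function v=finitevar(sx,px);
    %Usage: ex=finitevar(sx,px)
    % returns the variance Var[X]
    % of finite random variables X described by
    % samples sx and probabilities px
    ex2=finiteexp(sx.^2,px);
    ex=finiteexp(sx,px);
    v=ex2-(ex^2);

Putting these pieces together, we can calculate the correlation coefficient.

    function rho=finitecoeff(SX,SY,PXY);
    %Usage: rho=finitecoeff(SX,SY,PXY)
    %Calculate the correlation coefficient rho of
    %finite random variables X and Y
    ex=finiteexp(SX,PXY); vx=finitevar(SX,PXY);
    ey=finiteexp(SY,PXY); vy=finitevar(SY,PXY);
    R=finiteexp(SX.*SY,PXY);
    rho=(R-ex*ey)/sqrt(vx*vy);

Calculating the correlation coefficient of X and Y is now a two-line exercise.

    >> [SX,SY,PXY]=circuits2(50,0.9);
    >> rho=finitecoeff(SX,SY,PXY)
    rho =
        0.4451
    >>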
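A useful check on the loop-free construction in Problem 4.12.4 is that the resulting PMF sums to 1. The sketch below repeats the gamma-based calculation inline for small illustrative values n = 5 and p = 0.9 (chosen here, not taken from the problem) and uses logical indexing in place of find.

```matlab
n = 5; p = 0.9;                      % small illustrative values (assumed)
[SX,SY] = ndgrid(0:n,0:n);
PXY = zeros(size(SX));
PXY((SX==n) & (SY==n)) = p^n;        % every circuit passes
I = (SY<=SX) & (SX<n);               % support when at least one circuit fails
% Binomial coefficient (n-y-1 choose x-y) via gamma, since m! = gamma(m+1).
C = gamma(n-SY(I))./(gamma(SX(I)-SY(I)+1).*gamma(n-SX(I)));
PXY(I) = C.*(p.^SX(I)).*((1-p).^(n-SX(I)));
total = sum(PXY(:));                 % should equal 1 up to rounding
```

For y < n, summing over x gives p^y(1-p), and summing that over y gives 1 - p^n, so adding the point mass p^n at x = y = n completes the PMF.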
Problem 4.12.5 Solution

In the first approach, X is an exponential (λ) random variable, Y is an independent exponential (µ) random variable, and W = Y/X. We implement this approach in the function wrv1.m shown below.

In the second approach, we use Theorem 3.22 and generate samples of a uniform (0, 1) random variable U and calculate $W = F_W^{-1}(U)$. In this problem,
\[ F_W(w) = 1 - \frac{\lambda/\mu}{\lambda/\mu + w}. \tag{1} \]
Setting $u = F_W(w)$ and solving for w yields
\[ w = F_W^{-1}(u) = \frac{\lambda}{\mu}\,\frac{u}{1-u}. \tag{2} \]
We implement this solution in the function wrv2. Here are the two solutions:

    function w=wrv1(lambda,mu,m)
    %Usage: w=wrv1(lambda,mu,m)
    %Return m samples of W=Y/X
    %X is exponential (lambda)
    %Y is exponential (mu)
    x=exponentialrv(lambda,m);
    y=exponentialrv(mu,m);
    w=y./x;

    function w=wrv2(lambda,mu,m)
    %Usage: w=wrv2(lambda,mu,m)
    %Return m samples of W=Y/X
    %X is exponential (lambda)
    %Y is exponential (mu)
    %Uses CDF of F_W(w)
    u=rand(m,1);
    w=(lambda/mu)*u./(1-u);

We would expect that wrv2 would be faster simply because it does less work. In fact, it's instructive to account for the work each program does.

• wrv1: Each exponential random sample requires the generation of a uniform random variable, and the calculation of a logarithm. Thus, we generate 2m uniform random variables, calculate 2m logarithms, and perform m floating point divisions.

• wrv2: Generate m uniform random variables and perform m floating point divisions.

This quickie analysis indicates that wrv1 executes roughly 5m operations while wrv2 executes about 2m operations. We might guess that wrv2 would be faster by a factor of 2.5. Experimentally, we calculated the execution time associated with generating a million samples:

    >> t2=cputime;w2=wrv2(1,1,1000000);t2=cputime-t2
    t2 =
        0.2500
    >> t1=cputime;w1=wrv1(1,1,1000000);t1=cputime-t1
    t1 =
        0.7610
    >>

We see in our simple experiments that wrv2 is faster by a rough factor of 3. (Note that repeating such trials yielded qualitatively similar results.)

Problem Solutions – Chapter 5
Problem 5.1.1 Solution

The repair of each laptop can be viewed as an independent trial with four possible outcomes corresponding to the four types of needed repairs.

(a) Since the four types of repairs are mutually exclusive choices and since 4 laptops are returned for repair, the joint distribution of $N_1,\ldots,N_4$ is the multinomial PMF
\begin{align}
P_{N_1,\ldots,N_4}(n_1,\ldots,n_4) &= \binom{4}{n_1,n_2,n_3,n_4}p_1^{n_1}p_2^{n_2}p_3^{n_3}p_4^{n_4} \tag{1}\\
&= \begin{cases}\dfrac{4!}{n_1!\,n_2!\,n_3!\,n_4!}\left(\dfrac{8}{15}\right)^{n_1}\left(\dfrac{4}{15}\right)^{n_2}\left(\dfrac{2}{15}\right)^{n_3}\left(\dfrac{1}{15}\right)^{n_4} & n_1+\cdots+n_4 = 4;\ n_i \ge 0,\\ 0 & \text{otherwise.}\end{cases} \tag{2}
\end{align}

(b) Let $L_2$ denote the event that exactly two laptops need LCD repairs. Thus $P[L_2] = P_{N_1}(2)$. Since each laptop requires an LCD repair with probability $p_1 = 8/15$, the number of LCD repairs, $N_1$, is a binomial (4, 8/15) random variable with PMF
\[ P_{N_1}(n_1) = \binom{4}{n_1}(8/15)^{n_1}(7/15)^{4-n_1}. \tag{3} \]
The probability that two laptops need LCD repairs is
\[ P_{N_1}(2) = \binom{4}{2}(8/15)^2(7/15)^2 = 0.3717. \tag{4} \]

(c) A repair is type (2) with probability $p_2 = 4/15$. A repair is type (3) with probability $p_3 = 2/15$; otherwise a repair is type "other" with probability $p_o = 9/15$. Define X as the number of "other" repairs needed. The joint PMF of X, $N_2$, $N_3$ is the multinomial PMF
\[ P_{N_2,N_3,X}(n_2,n_3,x) = \binom{4}{n_2,n_3,x}\left(\frac{4}{15}\right)^{n_2}\left(\frac{2}{15}\right)^{n_3}\left(\frac{9}{15}\right)^{x}. \tag{5} \]
However, since $X = 4 - N_2 - N_3$, we observe that
\begin{align}
P_{N_2,N_3}(n_2,n_3) &= P_{N_2,N_3,X}(n_2,n_3,4-n_2-n_3) \tag{6}\\
&= \binom{4}{n_2,n_3,4-n_2-n_3}\left(\frac{4}{15}\right)^{n_2}\left(\frac{2}{15}\right)^{n_3}\left(\frac{9}{15}\right)^{4-n_2-n_3} \tag{7}\\
&= \left(\frac{9}{15}\right)^{4}\binom{4}{n_2,n_3,4-n_2-n_3}\left(\frac{4}{9}\right)^{n_2}\left(\frac{2}{9}\right)^{n_3}. \tag{8}
\end{align}
Similarly, since each repair is a motherboard repair with probability $p_2 = 4/15$, the number of motherboard repairs has binomial PMF
\[ P_{N_2}(n_2) = \binom{4}{n_2}\left(\frac{4}{15}\right)^{n_2}\left(\frac{11}{15}\right)^{4-n_2}. \tag{9} \]
Finally, the probability that more laptops require motherboard repairs than keyboard repairs is
\[ P[N_2 > N_3] = P_{N_2,N_3}(1,0) + P_{N_2,N_3}(2,0) + P_{N_2,N_3}(2,1) + P_{N_2}(3) + P_{N_2}(4) \tag{10} \]
where we use the fact that if $N_2 = 3$ or $N_2 = 4$, then we must have $N_2 > N_3$. Plugging in the various probabilities yields
\[ P[N_2 > N_3] = 8{,}656/16{,}875 \approx 0.5129. \tag{11} \]
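The two numerical answers in Problem 5.1.1 are easy to confirm in Matlab. In the sketch below, mpmf and bpmf are hypothetical helper names for the multinomial and binomial PMFs; they are not matcode functions.

```matlab
% Multinomial PMF: n is a vector of counts, p the matching probabilities.
mpmf = @(n,p) factorial(sum(n))/prod(factorial(n))*prod(p.^n);
% Binomial (m,q) PMF evaluated at k.
bpmf = @(k,m,q) nchoosek(m,k)*q^k*(1-q)^(m-k);

PL2 = bpmf(2,4,8/15);                  % part (b): P[N1 = 2]

pc = [4 2 9]/15;                       % motherboard, keyboard, "other"
PN2gtN3 = mpmf([1 0 3],pc) + mpmf([2 0 2],pc) + mpmf([2 1 1],pc) ...
        + bpmf(3,4,4/15) + bpmf(4,4,4/15);   % part (c): P[N2 > N3]
```

The five terms of PN2gtN3 correspond one-for-one to the five terms of Equation (10).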
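Problem 5.1.2 Solution

Whether a computer has feature i is a Bernoulli trial with success probability $p_i = 2^{-i}$. Given that n computers were sold, the number of computers sold with feature i has the binomial PMF
\[ P_{N_i}(n_i) = \begin{cases}\dbinom{n}{n_i}p_i^{n_i}(1-p_i)^{n-n_i} & n_i = 0,1,\ldots,n,\\ 0 & \text{otherwise.}\end{cases} \tag{1} \]
Since a computer has feature i with probability $p_i$ independent of whether any other feature is on the computer, the number $N_i$ of computers with feature i is independent of the number of computers with any other features. That is, $N_1,\ldots,N_4$ are mutually independent and have joint PMF
\[ P_{N_1,\ldots,N_4}(n_1,\ldots,n_4) = P_{N_1}(n_1)\,P_{N_2}(n_2)\,P_{N_3}(n_3)\,P_{N_4}(n_4). \tag{2} \]

Problem 5.1.3 Solution

(a) In terms of the joint PDF, we can write the joint CDF as
\[ F_{X_1,\ldots,X_n}(x_1,\ldots,x_n) = \int_{-\infty}^{x_1}\cdots\int_{-\infty}^{x_n} f_{X_1,\ldots,X_n}(y_1,\ldots,y_n)\,dy_1\cdots dy_n. \tag{1} \]
However, simplifying the above integral depends on the values of each $x_i$. In particular, $f_{X_1,\ldots,X_n}(y_1,\ldots,y_n) = 1$ if and only if $0 \le y_i \le 1$ for each i. Since $F_{X_1,\ldots,X_n}(x_1,\ldots,x_n) = 0$ if any $x_i < 0$, we limit, for the moment, our attention to the case where $x_i \ge 0$ for all i. In this case, some thought will show that we can write the limits in the following way:
\begin{align}
F_{X_1,\ldots,X_n}(x_1,\ldots,x_n) &= \int_0^{\min(1,x_1)}\cdots\int_0^{\min(1,x_n)} dy_1\cdots dy_n \tag{2}\\
&= \min(1,x_1)\min(1,x_2)\cdots\min(1,x_n). \tag{3}
\end{align}
A complete expression for the CDF of $X_1,\ldots,X_n$ is
\[ F_{X_1,\ldots,X_n}(x_1,\ldots,x_n) = \begin{cases}\prod_{i=1}^n \min(1,x_i) & 0 \le x_i,\ i = 1,2,\ldots,n,\\ 0 & \text{otherwise.}\end{cases} \tag{4} \]

(b) For n = 3,
\begin{align}
1 - P\left[\min_i X_i \le 3/4\right] &= P\left[\min_i X_i > 3/4\right] \tag{5}\\
&= P[X_1 > 3/4, X_2 > 3/4, X_3 > 3/4] \tag{6}\\
&= \int_{3/4}^1\int_{3/4}^1\int_{3/4}^1 dx_1\,dx_2\,dx_3 \tag{7}\\
&= (1-3/4)^3 = 1/64. \tag{8}
\end{align}
Thus $P[\min_i X_i \le 3/4] = 63/64$.

Problem 5.2.1 Solution

This problem is very simple. In terms of the vector X, the PDF is
\[ f_{\mathbf X}(\mathbf x) = \begin{cases}1 & \mathbf 0 \le \mathbf x \le \mathbf 1,\\ 0 & \text{otherwise.}\end{cases} \tag{1} \]
However, just keep in mind that the inequalities $\mathbf 0 \le \mathbf x$ and $\mathbf x \le \mathbf 1$ are vector inequalities that must hold for every component $x_i$.

Problem 5.2.2 Solution

In this problem, we find the constant c from the requirement that the integral of the vector PDF over all possible values is 1. That is, $\int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty} f_{\mathbf X}(\mathbf x)\,dx_1\cdots dx_n = 1$. Since $f_{\mathbf X}(\mathbf x) = c\,\mathbf a'\mathbf x = c\sum_{i=1}^n a_ix_i$ for $\mathbf 0 \le \mathbf x \le \mathbf 1$, we have that
\begin{align}
\int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty} f_{\mathbf X}(\mathbf x)\,dx_1\cdots dx_n &= c\int_0^1\cdots\int_0^1\left(\sum_{i=1}^n a_ix_i\right)dx_1\cdots dx_n \tag{1}\\
&= c\sum_{i=1}^n\int_0^1\cdots\int_0^1 a_ix_i\,dx_1\cdots dx_n \tag{2}\\
&= c\sum_{i=1}^n a_i\left[\int_0^1 dx_1\right]\cdots\left[\int_0^1 x_i\,dx_i\right]\cdots\left[\int_0^1 dx_n\right] \tag{3}\\
&= c\sum_{i=1}^n a_i\left(\left.\frac{x_i^2}{2}\right|_0^1\right) = c\sum_{i=1}^n\frac{a_i}{2}. \tag{4}
\end{align}
The requirement that the PDF integrate to unity thus implies
\[ c = \frac{2}{\sum_{i=1}^n a_i}. \tag{5} \]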
Problem 5.3.1 Solution

Here we solve the following problem:^1

    Given $f_{\mathbf X}(\mathbf x)$ with c = 2/3 and $a_1 = a_2 = a_3 = 1$ in Problem 5.2.2, find the marginal PDF $f_{X_3}(x_3)$.

^1 The wrong problem statement appears in the first printing.

Filling in the parameters in Problem 5.2.2, we obtain the vector PDF
\[ f_{\mathbf X}(\mathbf x) = \begin{cases}\frac{2}{3}(x_1+x_2+x_3) & 0 \le x_1,x_2,x_3 \le 1,\\ 0 & \text{otherwise.}\end{cases} \tag{1} \]
In this case, for $0 \le x_3 \le 1$, the marginal PDF of $X_3$ is
\begin{align}
f_{X_3}(x_3) &= \frac{2}{3}\int_0^1\int_0^1 (x_1+x_2+x_3)\,dx_1\,dx_2 \tag{2}\\
&= \frac{2}{3}\int_0^1\left.\left(\frac{x_1^2}{2} + x_2x_1 + x_3x_1\right)\right|_{x_1=0}^{x_1=1}dx_2 \tag{3}\\
&= \frac{2}{3}\int_0^1\left(\frac{1}{2} + x_2 + x_3\right)dx_2 \tag{4}\\
&= \frac{2}{3}\left.\left(\frac{x_2}{2} + \frac{x_2^2}{2} + x_3x_2\right)\right|_{x_2=0}^{x_2=1} = \frac{2}{3}\left(\frac{1}{2} + \frac{1}{2} + x_3\right). \tag{5}
\end{align}
The complete expression for the marginal PDF of $X_3$ is
\[ f_{X_3}(x_3) = \begin{cases}2(1+x_3)/3 & 0 \le x_3 \le 1,\\ 0 & \text{otherwise.}\end{cases} \tag{6} \]

Problem 5.3.2 Solution

Since $J_1$, $J_2$ and $J_3$ are independent, we can write
\[ P_{\mathbf K}(\mathbf k) = P_{J_1}(k_1)\,P_{J_2}(k_2-k_1)\,P_{J_3}(k_3-k_2). \tag{1} \]
Since $P_{J_i}(j) > 0$ only for integers j > 0, we have that $P_{\mathbf K}(\mathbf k) > 0$ only for $0 < k_1 < k_2 < k_3$; otherwise $P_{\mathbf K}(\mathbf k) = 0$. Finally, for $0 < k_1 < k_2 < k_3$,
\begin{align}
P_{\mathbf K}(\mathbf k) &= (1-p)^{k_1-1}p\,(1-p)^{k_2-k_1-1}p\,(1-p)^{k_3-k_2-1}p \tag{2}\\
&= (1-p)^{k_3-3}p^3. \tag{3}
\end{align}
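Problem 5.3.3 Solution

The joint PMF is
\[ P_{\mathbf K}(\mathbf k) = P_{K_1,K_2,K_3}(k_1,k_2,k_3) = \begin{cases}p^3(1-p)^{k_3-3} & 1 \le k_1 < k_2 < k_3,\\ 0 & \text{otherwise.}\end{cases} \tag{1} \]

(a) We start by finding $P_{K_1,K_2}(k_1,k_2)$. For $1 \le k_1 < k_2$,
\begin{align}
P_{K_1,K_2}(k_1,k_2) &= \sum_{k_3=-\infty}^{\infty} P_{K_1,K_2,K_3}(k_1,k_2,k_3) \tag{2}\\
&= \sum_{k_3=k_2+1}^{\infty} p^3(1-p)^{k_3-3} \tag{3}\\
&= p^3(1-p)^{k_2-2}\left[1 + (1-p) + (1-p)^2 + \cdots\right] \tag{4}\\
&= p^2(1-p)^{k_2-2}. \tag{5}
\end{align}
The complete expression is
\[ P_{K_1,K_2}(k_1,k_2) = \begin{cases}p^2(1-p)^{k_2-2} & 1 \le k_1 < k_2,\\ 0 & \text{otherwise.}\end{cases} \tag{6} \]
Next we find $P_{K_1,K_3}(k_1,k_3)$. For $k_1 \ge 1$ and $k_3 \ge k_1+2$, we have
\begin{align}
P_{K_1,K_3}(k_1,k_3) &= \sum_{k_2=-\infty}^{\infty} P_{K_1,K_2,K_3}(k_1,k_2,k_3) = \sum_{k_2=k_1+1}^{k_3-1} p^3(1-p)^{k_3-3} \tag{7}\\
&= (k_3-k_1-1)\,p^3(1-p)^{k_3-3}. \tag{8}
\end{align}
The complete expression of the PMF of $K_1$ and $K_3$ is
\[ P_{K_1,K_3}(k_1,k_3) = \begin{cases}(k_3-k_1-1)\,p^3(1-p)^{k_3-3} & 1 \le k_1,\ k_1+2 \le k_3,\\ 0 & \text{otherwise.}\end{cases} \tag{9} \]
The next marginal PMF is
\begin{align}
P_{K_2,K_3}(k_2,k_3) &= \sum_{k_1=-\infty}^{\infty} P_{K_1,K_2,K_3}(k_1,k_2,k_3) = \sum_{k_1=1}^{k_2-1} p^3(1-p)^{k_3-3} \tag{10}\\
&= (k_2-1)\,p^3(1-p)^{k_3-3}. \tag{11}
\end{align}
The complete expression of the PMF of $K_2$ and $K_3$ is
\[ P_{K_2,K_3}(k_2,k_3) = \begin{cases}(k_2-1)\,p^3(1-p)^{k_3-3} & 1 \le k_2 < k_3,\\ 0 & \text{otherwise.}\end{cases} \tag{12} \]

(b) Going back to first principles, we note that $K_n$ is the number of trials up to and including the nth success. Thus $K_1$ is a geometric (p) random variable, $K_2$ is a Pascal (2, p) random variable, and $K_3$ is a Pascal (3, p) random variable. We could write down the respective marginal PMFs of $K_1$, $K_2$ and $K_3$ just by looking up the Pascal (n, p) PMF. Nevertheless, it is instructive to derive these PMFs from the joint PMF $P_{K_1,K_2,K_3}(k_1,k_2,k_3)$.

For $k_1 \ge 1$, we can find $P_{K_1}(k_1)$ via
\begin{align}
P_{K_1}(k_1) &= \sum_{k_2=-\infty}^{\infty} P_{K_1,K_2}(k_1,k_2) = \sum_{k_2=k_1+1}^{\infty} p^2(1-p)^{k_2-2} \tag{13}\\
&= p^2(1-p)^{k_1-1}\left[1 + (1-p) + (1-p)^2 + \cdots\right] \tag{14}\\
&= p(1-p)^{k_1-1}. \tag{15}
\end{align}
The complete expression for the PMF of $K_1$ is the usual geometric PMF
\[ P_{K_1}(k_1) = \begin{cases}p(1-p)^{k_1-1} & k_1 = 1,2,\ldots,\\ 0 & \text{otherwise.}\end{cases} \tag{16} \]
Following the same procedure, the marginal PMF of $K_2$ is
\begin{align}
P_{K_2}(k_2) &= \sum_{k_1=-\infty}^{\infty} P_{K_1,K_2}(k_1,k_2) = \sum_{k_1=1}^{k_2-1} p^2(1-p)^{k_2-2} \tag{17}\\
&= (k_2-1)\,p^2(1-p)^{k_2-2}. \tag{18}
\end{align}
Since $P_{K_2}(k_2) = 0$ for $k_2 < 2$, the complete PMF is the Pascal (2, p) PMF
\[ P_{K_2}(k_2) = \binom{k_2-1}{1}p^2(1-p)^{k_2-2}. \tag{19} \]
Finally, for $k_3 \ge 3$, the PMF of $K_3$ is
\begin{align}
P_{K_3}(k_3) &= \sum_{k_2=-\infty}^{\infty} P_{K_2,K_3}(k_2,k_3) = \sum_{k_2=2}^{k_3-1}(k_2-1)\,p^3(1-p)^{k_3-3} \tag{20}\\
&= [1 + 2 + \cdots + (k_3-2)]\,p^3(1-p)^{k_3-3} \tag{21}\\
&= \frac{(k_3-2)(k_3-1)}{2}\,p^3(1-p)^{k_3-3}. \tag{22}
\end{align}
Since $P_{K_3}(k_3) = 0$ for $k_3 < 3$, the complete expression for $P_{K_3}(k_3)$ is the Pascal (3, p) PMF
\[ P_{K_3}(k_3) = \binom{k_3-1}{2}p^3(1-p)^{k_3-3}. \tag{23} \]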
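The marginalization in Problem 5.3.3 can be checked by brute force for one value of k3. The sketch below uses illustrative values p = 0.5 and k3 = 5 (assumptions made here): it sums the joint PMF over every pair 1 ≤ k1 < k2 < k3 and compares the result with the Pascal (3, p) PMF.

```matlab
p = 0.5; k3 = 5;                       % illustrative values (assumed)

% Sum the joint PMF p^3 (1-p)^(k3-3) over all 1 <= k1 < k2 < k3.
PK3 = 0;
for k2 = 2:(k3-1)
    for k1 = 1:(k2-1)
        PK3 = PK3 + p^3*(1-p)^(k3-3);
    end
end

% Pascal (3,p) PMF evaluated at k3, as in Equation (23).
pascal3 = nchoosek(k3-1,2)*p^3*(1-p)^(k3-3);
```

The double loop visits (k3-1)(k3-2)/2 pairs, which is exactly the binomial coefficient that appears in the Pascal PMF.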
Problem 5.3.4 Solution

For $0 \le y_1 \le y_4 \le 1$, the marginal PDF of $Y_1$ and $Y_4$ satisfies
\begin{align}
f_{Y_1,Y_4}(y_1,y_4) &= \iint f_{\mathbf Y}(\mathbf y)\,dy_2\,dy_3 \tag{1}\\
&= \int_{y_1}^{y_4}\left(\int_{y_2}^{y_4} 24\,dy_3\right)dy_2 \tag{2}\\
&= \int_{y_1}^{y_4} 24(y_4-y_2)\,dy_2 \tag{3}\\
&= \left.-12(y_4-y_2)^2\right|_{y_2=y_1}^{y_2=y_4} = 12(y_4-y_1)^2. \tag{4}
\end{align}
The complete expression for the joint PDF of $Y_1$ and $Y_4$ is
\[ f_{Y_1,Y_4}(y_1,y_4) = \begin{cases}12(y_4-y_1)^2 & 0 \le y_1 \le y_4 \le 1,\\ 0 & \text{otherwise.}\end{cases} \tag{5} \]
For $0 \le y_1 \le y_2 \le 1$, the marginal PDF of $Y_1$ and $Y_2$ is
\begin{align}
f_{Y_1,Y_2}(y_1,y_2) &= \iint f_{\mathbf Y}(\mathbf y)\,dy_3\,dy_4 \tag{6}\\
&= \int_{y_2}^{1}\left(\int_{y_3}^{1} 24\,dy_4\right)dy_3 \tag{7}\\
&= \int_{y_2}^{1} 24(1-y_3)\,dy_3 = 12(1-y_2)^2. \tag{8}
\end{align}
The complete expression for the joint PDF of $Y_1$ and $Y_2$ is
\[ f_{Y_1,Y_2}(y_1,y_2) = \begin{cases}12(1-y_2)^2 & 0 \le y_1 \le y_2 \le 1,\\ 0 & \text{otherwise.}\end{cases} \tag{9} \]
For $0 \le y_1 \le 1$, the marginal PDF of $Y_1$ can be found from
\[ f_{Y_1}(y_1) = \int_{-\infty}^{\infty} f_{Y_1,Y_2}(y_1,y_2)\,dy_2 = \int_{y_1}^{1} 12(1-y_2)^2\,dy_2 = 4(1-y_1)^3. \tag{10} \]
The complete expression of the PDF of $Y_1$ is
\[ f_{Y_1}(y_1) = \begin{cases}4(1-y_1)^3 & 0 \le y_1 \le 1,\\ 0 & \text{otherwise.}\end{cases} \tag{11} \]
Note that the integral $f_{Y_1}(y_1) = \int_{-\infty}^{\infty} f_{Y_1,Y_4}(y_1,y_4)\,dy_4$ would have yielded the same result. This is a good way to check our derivations of $f_{Y_1,Y_4}(y_1,y_4)$ and $f_{Y_1,Y_2}(y_1,y_2)$.
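The consistency check suggested at the end of Problem 5.3.4 takes one line when written out. Integrating the joint PDF of $Y_1$ and $Y_4$ over $y_4$, for $0 \le y_1 \le 1$:

```latex
\[
f_{Y_1}(y_1) = \int_{y_1}^{1} 12\,(y_4 - y_1)^2\,dy_4
 = \left. 4\,(y_4 - y_1)^3 \right|_{y_4 = y_1}^{y_4 = 1}
 = 4\,(1 - y_1)^3 ,
\]
```

which matches the marginal PDF obtained from $f_{Y_1,Y_2}(y_1,y_2)$.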
Problem 5.3.5 Solution

The value of each byte is an independent experiment with 255 possible outcomes. Each byte takes on the value $b_i$ with probability $p_i = p = 1/255$. The joint PMF of $N_0,\ldots,N_{255}$ is the multinomial PMF
\begin{align}
P_{N_0,\ldots,N_{255}}(n_0,\ldots,n_{255}) &= \frac{10000!}{n_0!\,n_1!\cdots n_{255}!}\,p^{n_0}p^{n_1}\cdots p^{n_{255}} & n_0+\cdots+n_{255} = 10000 \tag{1}\\
&= \frac{10000!}{n_0!\,n_1!\cdots n_{255}!}\,(1/255)^{10000} & n_0+\cdots+n_{255} = 10000. \tag{2}
\end{align}
To evaluate the joint PMF of $N_0$ and $N_1$, we define a new experiment with three categories: $b_0$, $b_1$ and "other." Let $\hat N$ denote the number of bytes that are "other." In this case, a byte is in the "other" category with probability $\hat p = 253/255$. The joint PMF of $N_0$, $N_1$, and $\hat N$ is
\[ P_{N_0,N_1,\hat N}(n_0,n_1,\hat n) = \frac{10000!}{n_0!\,n_1!\,\hat n!}\left(\frac{1}{255}\right)^{n_0}\left(\frac{1}{255}\right)^{n_1}\left(\frac{253}{255}\right)^{\hat n} \qquad n_0+n_1+\hat n = 10000. \tag{3} \]
Now we note that the following events are one and the same:
\[ \{N_0 = n_0, N_1 = n_1\} = \{N_0 = n_0, N_1 = n_1, \hat N = 10000 - n_0 - n_1\}. \tag{4} \]
Hence, for non-negative integers $n_0$ and $n_1$ satisfying $n_0+n_1 \le 10000$,
\begin{align}
P_{N_0,N_1}(n_0,n_1) &= P_{N_0,N_1,\hat N}(n_0,n_1,10000-n_0-n_1) \tag{5}\\
&= \frac{10000!}{n_0!\,n_1!\,(10000-n_0-n_1)!}\left(\frac{1}{255}\right)^{n_0+n_1}\left(\frac{253}{255}\right)^{10000-n_0-n_1}. \tag{6}
\end{align}
Problem 5.3.6 Solution

In Example 5.1, random variables $N_1,\ldots,N_r$ have the multinomial distribution
\[ P_{N_1,\ldots,N_r}(n_1,\ldots,n_r) = \binom{n}{n_1,\ldots,n_r}p_1^{n_1}\cdots p_r^{n_r} \tag{1} \]
where n > r > 2.

(a) To evaluate the joint PMF of $N_1$ and $N_2$, we define a new experiment with mutually exclusive events: $s_1$, $s_2$ and "other". Let $\hat N$ denote the number of trial outcomes that are "other". In this case, a trial is in the "other" category with probability $\hat p = 1 - p_1 - p_2$. The joint PMF of $N_1$, $N_2$, and $\hat N$ is
\[ P_{N_1,N_2,\hat N}(n_1,n_2,\hat n) = \frac{n!}{n_1!\,n_2!\,\hat n!}\,p_1^{n_1}p_2^{n_2}(1-p_1-p_2)^{\hat n} \qquad n_1+n_2+\hat n = n. \tag{2} \]
Now we note that the following events are one and the same:
\[ \{N_1 = n_1, N_2 = n_2\} = \{N_1 = n_1, N_2 = n_2, \hat N = n - n_1 - n_2\}. \tag{3} \]
Hence, for non-negative integers $n_1$ and $n_2$ satisfying $n_1+n_2 \le n$,
\begin{align}
P_{N_1,N_2}(n_1,n_2) &= P_{N_1,N_2,\hat N}(n_1,n_2,n-n_1-n_2) \tag{4}\\
&= \frac{n!}{n_1!\,n_2!\,(n-n_1-n_2)!}\,p_1^{n_1}p_2^{n_2}(1-p_1-p_2)^{n-n_1-n_2}. \tag{5}
\end{align}

(b) We could find the PMF of $T_i$ by summing the joint PMF $P_{N_1,\ldots,N_r}(n_1,\ldots,n_r)$. However, it is easier to start from first principles. Suppose we say a success occurs if the outcome of the trial is in the set $\{s_1,s_2,\ldots,s_i\}$ and otherwise a failure occurs. In this case, the success probability is $q_i = p_1+\cdots+p_i$ and $T_i$ is the number of successes in n trials. Thus, $T_i$ has the binomial PMF
\[ P_{T_i}(t) = \begin{cases}\dbinom{n}{t}q_i^t(1-q_i)^{n-t} & t = 0,1,\ldots,n,\\ 0 & \text{otherwise.}\end{cases} \tag{6} \]

(c) The joint PMF of $T_1$ and $T_2$ satisfies
\begin{align}
P_{T_1,T_2}(t_1,t_2) &= P[N_1 = t_1, N_1+N_2 = t_2] \tag{7}\\
&= P[N_1 = t_1, N_2 = t_2-t_1] \tag{8}\\
&= P_{N_1,N_2}(t_1,t_2-t_1). \tag{9}
\end{align}
By the result of part (a),
\[ P_{T_1,T_2}(t_1,t_2) = \frac{n!}{t_1!\,(t_2-t_1)!\,(n-t_2)!}\,p_1^{t_1}p_2^{t_2-t_1}(1-p_1-p_2)^{n-t_2} \qquad 0 \le t_1 \le t_2 \le n. \tag{10} \]
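Problem 5.3.7 Solution

(a) Note that Z is the number of three-page faxes. In principle, we can sum the joint PMF $P_{X,Y,Z}(x,y,z)$ over all x, y to find $P_Z(z)$. However, it is better to realize that each fax has 3 pages with probability 1/6, independent of any other fax. Thus, Z has the binomial PMF
\[ P_Z(z) = \begin{cases}\dbinom{5}{z}(1/6)^z(5/6)^{5-z} & z = 0,1,\ldots,5,\\ 0 & \text{otherwise.}\end{cases} \tag{1} \]

(b) From the properties of the binomial distribution given in Appendix A, we know that E[Z] = 5(1/6).

(c) We want to find the conditional PMF of the number X of 1-page faxes and number Y of 2-page faxes given Z = 2 3-page faxes. Note that given Z = 2, X + Y = 3. Hence for non-negative integers x, y satisfying x + y = 3,
\[ P_{X,Y|Z}(x,y|2) = \frac{P_{X,Y,Z}(x,y,2)}{P_Z(2)} = \frac{\frac{5!}{x!\,y!\,2!}(1/3)^x(1/2)^y(1/6)^2}{\binom{5}{2}(1/6)^2(5/6)^3}. \tag{2} \]
With some algebra, the complete expression of the conditional PMF is
\[ P_{X,Y|Z}(x,y|2) = \begin{cases}\dfrac{3!}{x!\,y!}(2/5)^x(3/5)^y & x+y = 3,\ x \ge 0,\ y \ge 0;\ x, y \text{ integer},\\ 0 & \text{otherwise.}\end{cases} \tag{3} \]
In the above expression, we note that if Z = 2, then Y = 3 − X and
\[ P_{X|Z}(x|2) = P_{X,Y|Z}(x,3-x|2) = \begin{cases}\dbinom{3}{x}(2/5)^x(3/5)^{3-x} & x = 0,1,2,3,\\ 0 & \text{otherwise.}\end{cases} \tag{4} \]
That is, given Z = 2, there are 3 faxes left, each of which independently could be a 1-page fax. The conditional PMF of the number of 1-page faxes is binomial where 2/5 is the conditional probability that a fax has 1 page given that it either has 1 page or 2 pages. Moreover, given X = x and Z = 2 we must have Y = 3 − x.

(d) Given Z = 2, the conditional PMF of X is binomial for 3 trials and success probability 2/5. The conditional expectation of X given Z = 2 is E[X|Z = 2] = 3(2/5) = 6/5.

(e) There are several ways to solve this problem. The most straightforward approach is to realize that for integers 0 ≤ x ≤ 5 and 0 ≤ y ≤ 5, the event {X = x, Y = y} occurs iff {X = x, Y = y, Z = 5 − (x + y)}. For the rest of this problem, we assume x and y are nonnegative integers so that
\begin{align}
P_{X,Y}(x,y) &= P_{X,Y,Z}(x,y,5-(x+y)) \tag{5}\\
&= \begin{cases}\dfrac{5!}{x!\,y!\,(5-x-y)!}\left(\dfrac{1}{3}\right)^x\left(\dfrac{1}{2}\right)^y\left(\dfrac{1}{6}\right)^{5-x-y} & 0 \le x+y \le 5,\ x \ge 0,\ y \ge 0,\\ 0 & \text{otherwise.}\end{cases} \tag{6}
\end{align}
The above expression may seem unwieldy and it isn't even clear that it will sum to 1. To simplify the expression, we observe that
\[ P_{X,Y}(x,y) = P_{X,Y,Z}(x,y,5-x-y) = P_{X,Y|Z}(x,y|5-x-y)\,P_Z(5-x-y). \tag{7} \]
Using $P_Z(z)$ found in part (a), we can calculate $P_{X,Y|Z}(x,y|5-x-y)$ for 0 ≤ x + y ≤ 5, integer valued:
\begin{align}
P_{X,Y|Z}(x,y|5-x-y) &= \frac{P_{X,Y,Z}(x,y,5-x-y)}{P_Z(5-x-y)} \tag{8}\\
&= \binom{x+y}{x}\left(\frac{1/3}{1/2+1/3}\right)^x\left(\frac{1/2}{1/2+1/3}\right)^y \tag{9}\\
&= \binom{x+y}{x}\left(\frac{2}{5}\right)^x\left(\frac{3}{5}\right)^{(x+y)-x}. \tag{10}
\end{align}
In the above expression, it is wise to think of x + y as some fixed value. In that case, we see that given x + y is a fixed value, X and Y have a joint PMF given by a binomial distribution in x. This should not be surprising since it is just a generalization of the case when Z = 2. That is, given that there were a fixed number of faxes that had either one or two pages, each of those faxes is a one-page fax with probability (1/3)/(1/2 + 1/3) and so the number of one-page faxes should have a binomial distribution. Moreover, given the number X of one-page faxes, the number Y of two-page faxes is completely specified. Finally, by rewriting $P_{X,Y}(x,y)$ given above, the complete expression for the joint PMF of X and Y is
\[ P_{X,Y}(x,y) = \begin{cases}\dbinom{5}{5-x-y}\left(\dfrac{1}{6}\right)^{5-x-y}\left(\dfrac{5}{6}\right)^{x+y}\dbinom{x+y}{x}\left(\dfrac{2}{5}\right)^x\left(\dfrac{3}{5}\right)^y & x, y \ge 0,\ x+y \le 5,\\ 0 & \text{otherwise.}\end{cases} \tag{11} \]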
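Part (e) of Problem 5.3.7 remarks that it is not obvious the joint PMF sums to 1. The sketch below checks this numerically and also confirms that the trinomial form (6) and the factored form (11) agree at every point of the support. The names P6 and P11 are hypothetical, chosen to match the equation numbers.

```matlab
% Form (6): trinomial PMF with z = 5 - x - y three-page faxes.
P6  = @(x,y) factorial(5)/(factorial(x)*factorial(y)*factorial(5-x-y)) ...
            *(1/3)^x*(1/2)^y*(1/6)^(5-x-y);
% Form (11): binomial PMF of Z times the conditional binomial PMF of X.
P11 = @(x,y) nchoosek(5,5-x-y)*(1/6)^(5-x-y)*(5/6)^(x+y) ...
            *nchoosek(x+y,x)*(2/5)^x*(3/5)^y;

total = 0; maxdiff = 0;
for x = 0:5
    for y = 0:(5-x)
        total   = total + P6(x,y);                        % accumulate the PMF
        maxdiff = max(maxdiff, abs(P6(x,y)-P11(x,y)));    % compare the two forms
    end
end
```

After the loops, total should be 1 and maxdiff should be at the level of floating point rounding.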
Problem 5.3.8 Solution

In Problem 5.3.2, we found that the joint PMF of $\mathbf K = [K_1\ K_2\ K_3]'$ is
\[ P_{\mathbf K}(\mathbf k) = \begin{cases}p^3(1-p)^{k_3-3} & k_1 < k_2 < k_3,\\ 0 & \text{otherwise.}\end{cases} \tag{1} \]
In this problem, we generalize the result to n messages.

(a) For $k_1 < k_2 < \cdots < k_n$, the joint event
\[ \{K_1 = k_1, K_2 = k_2, \ldots, K_n = k_n\} \tag{2} \]
occurs if and only if all of the following events occur:

    A_1: k_1 − 1 failures, followed by a successful transmission
    A_2: (k_2 − 1) − k_1 failures followed by a successful transmission
    A_3: (k_3 − 1) − k_2 failures followed by a successful transmission
    ...
    A_n: (k_n − 1) − k_{n−1} failures followed by a successful transmission

Note that the events $A_1, A_2, \ldots, A_n$ are independent and, with $k_0 = 0$,
\[ P[A_j] = (1-p)^{k_j-k_{j-1}-1}p. \tag{3} \]
Thus
\begin{align}
P_{K_1,\ldots,K_n}(k_1,\ldots,k_n) &= P[A_1]\,P[A_2]\cdots P[A_n] \tag{4}\\
&= p^n(1-p)^{(k_1-1)+(k_2-k_1-1)+(k_3-k_2-1)+\cdots+(k_n-k_{n-1}-1)} \tag{5}\\
&= p^n(1-p)^{k_n-n}. \tag{6}
\end{align}
To clarify subsequent results, it is better to rename $\mathbf K$ as $\mathbf K_n = [K_1\ K_2\ \cdots\ K_n]'$. We see that
\[ P_{\mathbf K_n}(\mathbf k_n) = \begin{cases}p^n(1-p)^{k_n-n} & 1 \le k_1 < k_2 < \cdots < k_n,\\ 0 & \text{otherwise.}\end{cases} \tag{7} \]

(b) For j < n,
\[ P_{K_1,K_2,\ldots,K_j}(k_1,k_2,\ldots,k_j) = P_{\mathbf K_j}(\mathbf k_j). \tag{8} \]
Since $\mathbf K_j$ is just $\mathbf K_n$ with n = j, we have
\[ P_{\mathbf K_j}(\mathbf k_j) = \begin{cases}p^j(1-p)^{k_j-j} & 1 \le k_1 < k_2 < \cdots < k_j,\\ 0 & \text{otherwise.}\end{cases} \tag{9} \]

(c) Rather than try to deduce $P_{K_i}(k_i)$ from the joint PMF $P_{\mathbf K_n}(\mathbf k_n)$, it is simpler to return to first principles. In particular, $K_i$ is the number of trials up to and including the ith success and has the Pascal (i, p) PMF
\[ P_{K_i}(k_i) = \binom{k_i-1}{i-1}p^i(1-p)^{k_i-i}. \tag{10} \]

Problem 5.4.1 Solution

For i ≠ j, $X_i$ and $X_j$ are independent and $E[X_iX_j] = E[X_i]E[X_j] = 0$ since $E[X_i] = 0$. Thus the i, jth entry in the covariance matrix $\mathbf C_X$ is
\[ C_X(i,j) = E[X_iX_j] = \begin{cases}\sigma_i^2 & i = j,\\ 0 & \text{otherwise.}\end{cases} \tag{1} \]
Thus for random vector $\mathbf X = [X_1\ X_2\ \cdots\ X_n]'$, all the off-diagonal entries in the covariance matrix are zero and the covariance matrix is
\[ \mathbf C_X = \begin{bmatrix}\sigma_1^2 & & & \\ & \sigma_2^2 & & \\ & & \ddots & \\ & & & \sigma_n^2\end{bmatrix}. \tag{2} \]

Problem 5.4.2 Solution

The random variables $N_1$, $N_2$, $N_3$ and $N_4$ are dependent. To see this we observe that $P_{N_i}(4) = p_i^4$. However,
\[ P_{N_1,N_2,N_3,N_4}(4,4,4,4) = 0 \ne p_1^4p_2^4p_3^4p_4^4 = P_{N_1}(4)\,P_{N_2}(4)\,P_{N_3}(4)\,P_{N_4}(4). \tag{1} \]
Problem 5.4.3 Solution

We will use the PDF
\[ f_{\mathbf X}(\mathbf x) = \begin{cases}1 & 0 \le x_i \le 1,\ i = 1,2,3,4,\\ 0 & \text{otherwise}\end{cases} \tag{1} \]
to find the marginal PDFs $f_{X_i}(x_i)$. In particular, for $0 \le x_1 \le 1$,
\begin{align}
f_{X_1}(x_1) &= \int_0^1\int_0^1\int_0^1 f_{\mathbf X}(\mathbf x)\,dx_2\,dx_3\,dx_4 \tag{2}\\
&= \left(\int_0^1 dx_2\right)\left(\int_0^1 dx_3\right)\left(\int_0^1 dx_4\right) = 1. \tag{3}
\end{align}
Thus,
\[ f_{X_1}(x_1) = \begin{cases}1 & 0 \le x_1 \le 1,\\ 0 & \text{otherwise.}\end{cases} \tag{4} \]
Following similar steps, one can show that
\[ f_{X_1}(x) = f_{X_2}(x) = f_{X_3}(x) = f_{X_4}(x) = \begin{cases}1 & 0 \le x \le 1,\\ 0 & \text{otherwise.}\end{cases} \tag{5} \]
Thus
\[ f_{\mathbf X}(\mathbf x) = f_{X_1}(x_1)\,f_{X_2}(x_2)\,f_{X_3}(x_3)\,f_{X_4}(x_4). \tag{6} \]
We conclude that $X_1$, $X_2$, $X_3$ and $X_4$ are independent.

Problem 5.4.4 Solution

We will use the PDF
\[ f_{\mathbf X}(\mathbf x) = \begin{cases}6e^{-(x_1+2x_2+3x_3)} & x_1 \ge 0,\ x_2 \ge 0,\ x_3 \ge 0,\\ 0 & \text{otherwise}\end{cases} \tag{1} \]
to find the marginal PDFs $f_{X_i}(x_i)$. In particular, for $x_1 \ge 0$,
\begin{align}
f_{X_1}(x_1) &= \int_0^\infty\int_0^\infty f_{\mathbf X}(\mathbf x)\,dx_2\,dx_3 \tag{2}\\
&= 6e^{-x_1}\left(\int_0^\infty e^{-2x_2}\,dx_2\right)\left(\int_0^\infty e^{-3x_3}\,dx_3\right) \tag{3}\\
&= 6e^{-x_1}\left(\left.-\frac{1}{2}e^{-2x_2}\right|_0^\infty\right)\left(\left.-\frac{1}{3}e^{-3x_3}\right|_0^\infty\right) = e^{-x_1}. \tag{4}
\end{align}
Thus,
\[ f_{X_1}(x_1) = \begin{cases}e^{-x_1} & x_1 \ge 0,\\ 0 & \text{otherwise.}\end{cases} \tag{5} \]
Following similar steps, one can show that
\begin{align}
f_{X_2}(x_2) &= \int_0^\infty\int_0^\infty f_{\mathbf X}(\mathbf x)\,dx_1\,dx_3 = \begin{cases}2e^{-2x_2} & x_2 \ge 0,\\ 0 & \text{otherwise,}\end{cases} \tag{6}\\
f_{X_3}(x_3) &= \int_0^\infty\int_0^\infty f_{\mathbf X}(\mathbf x)\,dx_1\,dx_2 = \begin{cases}3e^{-3x_3} & x_3 \ge 0,\\ 0 & \text{otherwise.}\end{cases} \tag{7}
\end{align}
Thus
\[ f_{\mathbf X}(\mathbf x) = f_{X_1}(x_1)\,f_{X_2}(x_2)\,f_{X_3}(x_3). \tag{8} \]
We conclude that $X_1$, $X_2$, and $X_3$ are independent.

Problem 5.4.5 Solution

This problem can be solved without any real math. Some thought should convince you that for any $x_i > 0$, $f_{X_i}(x_i) > 0$. Thus, $f_{X_1}(10) > 0$, $f_{X_2}(9) > 0$, and $f_{X_3}(8) > 0$. Thus $f_{X_1}(10)f_{X_2}(9)f_{X_3}(8) > 0$. However, from the definition of the joint PDF
\[ f_{X_1,X_2,X_3}(10,9,8) = 0 \ne f_{X_1}(10)\,f_{X_2}(9)\,f_{X_3}(8). \tag{1} \]
It follows that $X_1$, $X_2$ and $X_3$ are dependent. Readers who find this quick answer dissatisfying are invited to confirm this conclusion by solving Problem 5.4.6 for the exact expressions for the marginal PDFs $f_{X_1}(x_1)$, $f_{X_2}(x_2)$, and $f_{X_3}(x_3)$.
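Problem 5.4.6 Solution

We find the marginal PDFs using Theorem 5.5. First we note that for x < 0, $f_{X_i}(x) = 0$. For $x_1 \ge 0$,
\[ f_{X_1}(x_1) = \int_{x_1}^{\infty}\left(\int_{x_2}^{\infty} e^{-x_3}\,dx_3\right)dx_2 = \int_{x_1}^{\infty} e^{-x_2}\,dx_2 = e^{-x_1}. \tag{1} \]
Similarly, for $x_2 \ge 0$, $X_2$ has marginal PDF
\[ f_{X_2}(x_2) = \int_0^{x_2}\left(\int_{x_2}^{\infty} e^{-x_3}\,dx_3\right)dx_1 = \int_0^{x_2} e^{-x_2}\,dx_1 = x_2e^{-x_2}. \tag{2} \]
Lastly, for $x_3 \ge 0$,
\begin{align}
f_{X_3}(x_3) &= \int_0^{x_3}\left(\int_{x_1}^{x_3} e^{-x_3}\,dx_2\right)dx_1 = \int_0^{x_3}(x_3-x_1)e^{-x_3}\,dx_1 \tag{3}\\
&= \left.-\frac{1}{2}(x_3-x_1)^2e^{-x_3}\right|_{x_1=0}^{x_1=x_3} = \frac{1}{2}x_3^2e^{-x_3}. \tag{4}
\end{align}
The complete expressions for the three marginal PDFs are
\begin{align}
f_{X_1}(x_1) &= \begin{cases}e^{-x_1} & x_1 \ge 0,\\ 0 & \text{otherwise,}\end{cases} \tag{5}\\
f_{X_2}(x_2) &= \begin{cases}x_2e^{-x_2} & x_2 \ge 0,\\ 0 & \text{otherwise,}\end{cases} \tag{6}\\
f_{X_3}(x_3) &= \begin{cases}(1/2)x_3^2e^{-x_3} & x_3 \ge 0,\\ 0 & \text{otherwise.}\end{cases} \tag{7}
\end{align}
In fact, each $X_i$ is an Erlang (n, λ) = (i, 1) random variable.

Problem 5.4.7 Solution

Since $U_1,\ldots,U_n$ are iid uniform (0, T) random variables,
\[ f_{U_1,\ldots,U_n}(u_1,\ldots,u_n) = \begin{cases}1/T^n & 0 \le u_i \le T;\ i = 1,2,\ldots,n,\\ 0 & \text{otherwise.}\end{cases} \tag{1} \]
Since $U_1,\ldots,U_n$ are continuous, $P[U_i = U_j] = 0$ for all i ≠ j. For the same reason, $P[X_i = X_j] = 0$ for i ≠ j. Thus we need only to consider the case when $x_1 < x_2 < \cdots < x_n$.

To understand the claim, it is instructive to start with the n = 2 case. In this case, $(X_1,X_2) = (x_1,x_2)$ (with $x_1 < x_2$) if either $(U_1,U_2) = (x_1,x_2)$ or $(U_1,U_2) = (x_2,x_1)$. For infinitesimal Δ,
\begin{align}
f_{X_1,X_2}(x_1,x_2)\Delta^2 &= P[x_1 < X_1 \le x_1+\Delta,\ x_2 < X_2 \le x_2+\Delta] \tag{2}\\
&= P[x_1 < U_1 \le x_1+\Delta,\ x_2 < U_2 \le x_2+\Delta] + P[x_2 < U_1 \le x_2+\Delta,\ x_1 < U_2 \le x_1+\Delta] \tag{3}\\
&= f_{U_1,U_2}(x_1,x_2)\Delta^2 + f_{U_1,U_2}(x_2,x_1)\Delta^2. \tag{4}
\end{align}
We see that for $0 \le x_1 < x_2 \le T$ that
\[ f_{X_1,X_2}(x_1,x_2) = 2/T^2. \tag{5} \]
For the general case of n uniform random variables, we define $\pi = [\pi(1)\ \cdots\ \pi(n)]'$ as a permutation vector of the integers 1, 2, ..., n and Π as the set of n! possible permutation vectors. In this case, the event $\{X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n\}$ occurs if
\[ U_1 = x_{\pi(1)},\ U_2 = x_{\pi(2)},\ \ldots,\ U_n = x_{\pi(n)} \tag{6} \]
for any permutation π ∈ Π. Thus, for $0 \le x_1 < x_2 < \cdots < x_n \le T$,
\[ f_{X_1,\ldots,X_n}(x_1,\ldots,x_n)\Delta^n = \sum_{\pi\in\Pi} f_{U_1,\ldots,U_n}\left(x_{\pi(1)},\ldots,x_{\pi(n)}\right)\Delta^n. \tag{7} \]
Since there are n! permutations and $f_{U_1,\ldots,U_n}(x_{\pi(1)},\ldots,x_{\pi(n)}) = 1/T^n$ for each permutation π, we can conclude that
\[ f_{X_1,\ldots,X_n}(x_1,\ldots,x_n) = n!/T^n. \tag{8} \]
Since the order statistics are necessarily ordered, $f_{X_1,\ldots,X_n}(x_1,\ldots,x_n) = 0$ unless $x_1 < \cdots < x_n$.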
we define π = π(1) . . That is.. . xπ(n) ∆n . the number of transmissions of message n is independent of the number of transmissions of message m.Xn (x1 . for 0 ≤ x1 < x2 < · · · < xn ≤ 1. independent of any other transmission.Un (xπ(1) . we can conclude that (8) fX1 . PY (y) = P [AX = y − b] = P X = A−1 (y − b) = PX A−1 (y − b) . when A is invertible. the event {X1 = x1 .5. because each message is transmitted over and over until it is transmitted succesfully. Un = xπ(n) for any permutation π ∈ Π. ... Since each transmission is a success with probability p.. . (1) PJm (j) = 0 otherwise. .. Jm and Jn are independent random variables.. . The random variable Jn is the number of times that message n is transmitted.5. . ... . Moreover.. .. . 2.Un xπ(1) . X2 = x2 . . we note that when Ax = y − b has multiple solutions. . (2) As an aside. n and Π as the set of n! possible permutation vectors. .2 Solution  3  p (1 − p)j1 +j2 +j3 −3 ji = 1. 2.. FXi (xi ) = FX (xi ) = 1 − e−x/2 x ≥ 0 0 otherwise.Problem 5.9)1/6 (6) (7) (1 − e−r/τ )6 r ≥ 0. FW (w) = P [min(X1 .5.7. . . The “expected response time” refers to E[Xi ]. (1) Let R = max(X1 . X2 > w. 0 otherwise. When each truck has expected response time τ . we will find the CDF FW (w) since this will also be useful for part (c). . X2 .9. we know that each Xi has CDF x − 35 x − 35 Xi − 35 ≤ =Φ (1) FXi (x) = P [Xi ≤ x] = P 5 5 5 (a) The time of the winning boat is W = min(X1 . .3 Solution fXi (xi ) = 1 −x/2 2e The response time Xi of the ith truck has PDF fXi (xi ) and CDF FXi (xi ) given by 0 x ≥ 0. . This implies τ= −3 = 0. . rather than E[R]. . X6 ) denote the maximum response time. then each Xi has CDF FXi (x) = FX (x) = 1 − e−x/τ 0 x≥0 otherwise. . ln 1 − (0. otherwise. . (5) Let Xi denote the finishing time of boat i. . X10 > w] 211 . the CDF of R is FR (r) = (FX (x) r)6 = We need to find τ such that P [R ≤ 3] = (1 − e−3/τ )6 = 0.4 Solution To find the probability that W ≤ 25. X10 ) (2) Problem 5. 
(a) The probability that all six responses arrive within five seconds is P [R ≤ 5] = FR (5) = (FX (5))6 = (1 − e−5/2 )6 = 0. (3) (2) (b) This question is worded in a somewhat confusing way. . . .5. . X2 . . .7406 s.5982. X10 ) > w] = 1 − P [X1 > w. . From Theorem 5. If the expected response time of a truck is τ . X10 ) ≤ w] (3) (4) (5) = 1 − P [min(X1 . (4) The goal of this problem is to find the maximum permissible value of τ . X2 . . R has PDF FR (r) = (FX (r))6 . the response time of an individual truck. Since finishing times of all boats are iid Gaussian random variables with expected value 35 minutes and standard deviation 5 minutes. . X2 . . However. 10 FW (w) = 1 − i=1 P [Xi > w] = 1 − (1 − FXi (w))10 =1− 1−Φ w − 35 5 10 (6) (7) Thus. 10 P [L > 50] = 1 − i=1 P [Xi ≤ 50] = 1 − (FXi (50))10 = 1 − (Φ([50 − 35]/5))10 = 1 − (Φ(3)) 10 (12) (13) (14) = 0. P [W ≤ 25] = FW (25) = 1 − (1 − Φ(−2))10 = 1 − [Φ(2)] 10 (8) (9) = 0. or a programmable calculator. X10 ≤ 50] Once again.28 × 10−11 .5.28 × 10−12 )10 = 1.5 Solution Since 50 cents of each dollar ticket is added to the jackpot. . . the tables in the text have neither Φ(7) nor Q(7). X10 ). . Ni has a Poisson distribution with mean j.2056. . which has probability FW (0) = 1 − (1 − Φ(−35/5))10 = 1 − (1 − Φ(−7))10 = 1 − (Φ(7))10 . can find out that Q(7) = 1 − Φ(7) = 1. since the Xi are iid Gaussian (35. 5) random variables. It follows that E[Ni |Ji = j] = j and that Var[Ni |Ji = j] = j. (15) Unfortunately. . Ji−1 = Ji + Ni 2 (1) Given Ji = j. . X2 ≤ 50. . (16) Problem 5. those with access to Matlab. This implies that a boat finishes in negative time with probability FW (0) = 1 − (1 − 1.28 × 10−12 . This implies E Ni2 |Ji = j = Var[Ni |Ji = j] + (E [Ni |Ji = j])2 = j + j 2 212 (2) .Since the Xi are iid. (b) The finishing time of the last boat is L = max(X1 . The probability that the last boat finishes in more than 50 minutes is P [L > 50] = 1 − P [L ≤ 50] (10) (11) = 1 − P [X1 ≤ 50. 
.0134 (c) A boat will finish in negative time if and only iff the winning boat finishes in negative time. we have 2 E Ji−1 |Ji = E Ji2 |Ji + E [Ni Ji |Ji ] + E Ni2 |Ji /4 (7) (8) (9) = Ji2 = (3/2)2 Ji2 + Ji /4 By taking the expectation over Ji we have + Ji E [Ni |Ji ] + (Ji + Ji2 )/4 2 E Ji−1 = (3/2)2 E Ji2 + E [Ji ] /4 (10) 2 This recursion allows us to calculate E[Ji2 ] for i = 6. these facts can be written as This permits us to evaluate the moments of Ji−1 in terms of the moments of Ji . Since the jackpot starts at 1 million dollars. 5. Since J6 = 106 . This implies (6) E [Ji ] = (3/2)6−i 106 2 Now we will find the second moment E[Ji2 ].In terms of the conditional expectations given Ji . Specifically. From the recursion. . Since Ji−1 = Ji2 + Ni Ji + Ni2 /4. E[J6 ] = 1012 . J6 = 106 and E[J6 ] = 106 . . . 3Ji 1 Ji = E [Ji−1 |Ji ] = E [Ji |Ji ] + E [Ni |Ji ] = Ji + 2 2 2 This implies (4) E [Ni |Ji ] = Ji E Ni2 |Ji = Ji + Ji2 (3) 3 E [Ji−1 ] = E [Ji ] (5) 2 We can use this the calculate E[Ji ] for all i. By the same argument that we used to develop recursions for E[Ji ] and E[Ji2 ]. we can show (17) E [J] = (3/2)E [J0 ] = (3/2)7 106 ≈ 17 × 106 and 2 E J 2 = (3/2)2 E J0 + E [J0 ] /4 1 (3/2)12 + (3/2)11 + · · · + (3/2)6 106 = (3/2)14 1012 + 4 106 (3/2)6 [(3/2)7 − 1] = (3/2)14 1012 + 2 (18) (19) (20) 213 . 0. we obtain 1 2 2 (11) E J5 = (3/2)2 E J6 + E [J6 ] /4 = (3/2)2 1012 + 106 4 1 2 2 E J4 = (3/2)2 E J5 + E [J5 ] /4 = (3/2)4 1012 + (3/2)2 + (3/2) 106 (12) 4 1 2 2 (3/2)4 + (3/2)3 + (3/2)2 106 E J3 = (3/2)2 E J4 + E [J4 ] /4 = (3/2)6 1012 + (13) 4 The same recursion will also allow us to show that 1 2 E J2 = (3/2)8 1012 + (3/2)6 + (3/2)5 + (3/2)4 + (3/2)3 106 (14) 4 1 2 E J1 = (3/2)10 1012 + (3/2)8 + (3/2)7 + (3/2)6 + (3/2)5 + (3/2)4 106 (15) 4 1 2 (3/2)10 + (3/2)9 + · · · + (3/2)5 106 E J0 = (3/2)12 1012 + (16) 4 Finally. . day 0 is the same as any other day in that J = J0 + N0 /2 where N0 is a Poisson random variable with mean J0 . . 
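The two recursions of Problem 5.5.5, $E[J_{i-1}] = (3/2)E[J_i]$ and $E[J_{i-1}^2] = (3/2)^2 E[J_i^2] + E[J_i]/4$, are easy to check numerically. The following is a minimal sketch, not part of the original solution: it is written in Python with exact rational arithmetic, and the function name `jackpot_moments` and its parameters are hypothetical names chosen for illustration. It iterates the recursions for seven days starting from $J_6 = 10^6$ and compares the result against the closed forms for $E[J]$ and $E[J^2]$.

```python
from fractions import Fraction


def jackpot_moments(start=10**6, days=7):
    """Iterate the Problem 5.5.5 recursions for the first two jackpot moments.

    start: the deterministic initial jackpot J_6 (assumed 10^6 dollars).
    days:  number of recursion steps (7 steps take J_6 to the final jackpot J).
    """
    mean = Fraction(start)          # E[J_6]
    second = Fraction(start) ** 2   # E[J_6^2]
    for _ in range(days):
        # The second-moment update must use the mean from before this step.
        second = Fraction(9, 4) * second + mean / 4
        mean = Fraction(3, 2) * mean
    return mean, second


mean, second = jackpot_moments()
r = Fraction(3, 2)
closed_mean = r**7 * 10**6
closed_second = r**14 * 10**12 + Fraction(10**6, 2) * r**6 * (r**7 - 1)
variance = second - mean**2

print("E[J]      =", float(mean))
print("matches closed forms:", mean == closed_mean, second == closed_second)
print("std dev   =", float(variance) ** 0.5)
```

Exact fractions are used so that the comparison with the closed-form expressions is an equality test rather than a floating-point tolerance check.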
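Finally, the variance of $J$ is
\[ \mathrm{Var}[J] = E\left[J^2\right] - (E[J])^2 = \frac{10^6}{2} (3/2)^6 \left[(3/2)^7 - 1\right]. \tag{21} \]
Since the variance is hard to interpret, we note that the standard deviation of $J$ is $\sigma_J \approx 9572$. Although the expected jackpot grows rapidly, the standard deviation of the jackpot is fairly small.

Problem 5.5.6 Solution

Let $A$ denote the event $X_n = \max(X_1, \ldots, X_n)$. We can find $P[A]$ by conditioning on the value of $X_n$.
\begin{align}
P[A] &= P[X_1 \le X_n, X_2 \le X_n, \cdots, X_{n-1} \le X_n] \tag{1} \\
&= \int_{-\infty}^{\infty} P[X_1 < X_n, X_2 < X_n, \cdots, X_{n-1} < X_n | X_n = x]\, f_{X_n}(x)\, dx \tag{2} \\
&= \int_{-\infty}^{\infty} P[X_1 < x, X_2 < x, \cdots, X_{n-1} < x | X_n = x]\, f_X(x)\, dx. \tag{3}
\end{align}
Since $X_1, \ldots, X_{n-1}$ are independent of $X_n$,
\[ P[A] = \int_{-\infty}^{\infty} P[X_1 < x, X_2 < x, \cdots, X_{n-1} < x]\, f_X(x)\, dx. \tag{4} \]
Since $X_1, \ldots, X_{n-1}$ are iid,
\begin{align}
P[A] &= \int_{-\infty}^{\infty} P[X_1 \le x]\, P[X_2 \le x] \cdots P[X_{n-1} \le x]\, f_X(x)\, dx \tag{5} \\
&= \int_{-\infty}^{\infty} [F_X(x)]^{n-1} f_X(x)\, dx = \frac{1}{n} [F_X(x)]^n \Big|_{-\infty}^{\infty} = \frac{1}{n}(1 - 0). \tag{6}
\end{align}
Not surprisingly, since the $X_i$ are identical, symmetry would suggest that $X_n$ is as likely as any of the other $X_i$ to be the largest. Hence $P[A] = 1/n$ should not be surprising.

Problem 5.6.1 Solution

(a) The covariance matrix of $\mathbf{X} = [X_1\ X_2]'$ is
\[ \mathbf{C}_X = \begin{bmatrix} \mathrm{Var}[X_1] & \mathrm{Cov}[X_1, X_2] \\ \mathrm{Cov}[X_1, X_2] & \mathrm{Var}[X_2] \end{bmatrix} = \begin{bmatrix} 4 & 3 \\ 3 & 9 \end{bmatrix}. \tag{1} \]
(b) From the problem statement,
\[ \mathbf{Y} = \begin{bmatrix} Y_1 \\ Y_2 \end{bmatrix} = \begin{bmatrix} 1 & -2 \\ 3 & 4 \end{bmatrix} \mathbf{X} = \mathbf{A}\mathbf{X}. \tag{2} \]
By Theorem 5.13, $\mathbf{Y}$ has covariance matrix
\[ \mathbf{C}_Y = \mathbf{A}\mathbf{C}_X\mathbf{A}' = \begin{bmatrix} 1 & -2 \\ 3 & 4 \end{bmatrix} \begin{bmatrix} 4 & 3 \\ 3 & 9 \end{bmatrix} \begin{bmatrix} 1 & 3 \\ -2 & 4 \end{bmatrix} = \begin{bmatrix} 28 & -66 \\ -66 & 252 \end{bmatrix}. \tag{3} \]

Problem 5.6.2 Solution

The mean value of a sum of random variables is always the sum of their individual means.
\[ E[Y] = \sum_{i=1}^n E[X_i] = 0. \tag{1} \]
The variance of any sum of random variables can be expressed in terms of the individual variances and co-variances. Since $E[Y]$ is zero, $\mathrm{Var}[Y] = E[Y^2]$. Thus,
\[ \mathrm{Var}[Y] = E\left[\left(\sum_{i=1}^n X_i\right)^2\right] = E\left[\sum_{i=1}^n \sum_{j=1}^n X_i X_j\right] = \sum_{i=1}^n E\left[X_i^2\right] + \sum_{i=1}^n \sum_{j \ne i} E[X_i X_j]. \tag{2} \]
Since $E[X_i] = 0$, $E[X_i^2] = \mathrm{Var}[X_i] = 1$ and for $i \ne j$,
\[ E[X_i X_j] = \mathrm{Cov}[X_i, X_j] = \rho. \tag{3} \]
Thus, $\mathrm{Var}[Y] = n + n(n-1)\rho$.

Problem 5.6.3 Solution

Since $\mathbf{X}$ and $\mathbf{Y}$ are independent and $E[Y_j] = 0$ for all components $Y_j$, we observe that $E[X_i Y_j] = E[X_i] E[Y_j] = 0$. This implies that the cross-covariance matrix is
\[ E\left[\mathbf{X}\mathbf{Y}'\right] = E[\mathbf{X}]\, E\left[\mathbf{Y}'\right] = \mathbf{0}. \tag{1} \]

Problem 5.6.4 Solution

Inspection of the vector PDF $f_{\mathbf{X}}(\mathbf{x})$ will show that $X_1$, $X_2$, $X_3$, and $X_4$ are iid uniform $(0, 1)$ random variables. That is,
\[ f_{\mathbf{X}}(\mathbf{x}) = f_{X_1}(x_1)\, f_{X_2}(x_2)\, f_{X_3}(x_3)\, f_{X_4}(x_4) \tag{1} \]
where each $X_i$ has the uniform $(0, 1)$ PDF
\[ f_{X_i}(x) = \begin{cases} 1 & 0 \le x \le 1, \\ 0 & \text{otherwise.} \end{cases} \tag{2} \]
It follows that for each $i$, $E[X_i] = 1/2$, $E[X_i^2] = 1/3$ and $\mathrm{Var}[X_i] = 1/12$. In addition, for $i \ne j$, $X_i$ and $X_j$ have correlation
\[ E[X_i X_j] = E[X_i]\, E[X_j] = 1/4 \tag{3} \]
and covariance $\mathrm{Cov}[X_i, X_j] = 0$ since independent random variables always have zero covariance.

(a) The expected value vector is
\[ E[\mathbf{X}] = \begin{bmatrix} E[X_1] & E[X_2] & E[X_3] & E[X_4] \end{bmatrix}' = \begin{bmatrix} 1/2 & 1/2 & 1/2 & 1/2 \end{bmatrix}'. \tag{4} \]
(b) The correlation matrix is
\begin{align}
\mathbf{R}_X = E\left[\mathbf{X}\mathbf{X}'\right] &= \begin{bmatrix} E[X_1^2] & E[X_1 X_2] & E[X_1 X_3] & E[X_1 X_4] \\ E[X_2 X_1] & E[X_2^2] & E[X_2 X_3] & E[X_2 X_4] \\ E[X_3 X_1] & E[X_3 X_2] & E[X_3^2] & E[X_3 X_4] \\ E[X_4 X_1] & E[X_4 X_2] & E[X_4 X_3] & E[X_4^2] \end{bmatrix} \tag{5} \\
&= \begin{bmatrix} 1/3 & 1/4 & 1/4 & 1/4 \\ 1/4 & 1/3 & 1/4 & 1/4 \\ 1/4 & 1/4 & 1/3 & 1/4 \\ 1/4 & 1/4 & 1/4 & 1/3 \end{bmatrix}. \tag{6}
\end{align}
(c) The covariance matrix for $\mathbf{X}$ is the diagonal matrix
\begin{align}
\mathbf{C}_X &= \begin{bmatrix} \mathrm{Var}[X_1] & \mathrm{Cov}[X_1, X_2] & \mathrm{Cov}[X_1, X_3] & \mathrm{Cov}[X_1, X_4] \\ \mathrm{Cov}[X_2, X_1] & \mathrm{Var}[X_2] & \mathrm{Cov}[X_2, X_3] & \mathrm{Cov}[X_2, X_4] \\ \mathrm{Cov}[X_3, X_1] & \mathrm{Cov}[X_3, X_2] & \mathrm{Var}[X_3] & \mathrm{Cov}[X_3, X_4] \\ \mathrm{Cov}[X_4, X_1] & \mathrm{Cov}[X_4, X_2] & \mathrm{Cov}[X_4, X_3] & \mathrm{Var}[X_4] \end{bmatrix} \tag{7} \\
&= \begin{bmatrix} 1/12 & 0 & 0 & 0 \\ 0 & 1/12 & 0 & 0 \\ 0 & 0 & 1/12 & 0 \\ 0 & 0 & 0 & 1/12 \end{bmatrix}. \tag{8}
\end{align}
Note that it is easy to verify that $\mathbf{C}_X = \mathbf{R}_X - \boldsymbol{\mu}_X \boldsymbol{\mu}_X'$.

Problem 5.6.5 Solution

The random variable $J_m$ is the number of times that message $m$ is transmitted. Since each transmission is a success with probability $p$, independent of any other transmission, $J_1$, $J_2$ and $J_3$ are iid geometric $(p)$ random variables with
\[ E[J_m] = \frac{1}{p}, \qquad \mathrm{Var}[J_m] = \frac{1-p}{p^2}. \tag{1} \]
Thus the vector $\mathbf{J} = [J_1\ J_2\ J_3]'$ has expected value
\[ E[\mathbf{J}] = \begin{bmatrix} E[J_1] & E[J_2] & E[J_3] \end{bmatrix}' = \begin{bmatrix} 1/p & 1/p & 1/p \end{bmatrix}'. \tag{2} \]
For $m \ne n$, the correlation matrix $\mathbf{R}_J$ has $m,n$th entry
\[ R_J(m, n) = E[J_m J_n] = E[J_m]\, E[J_n] = 1/p^2. \tag{3} \]
For $m = n$,
\[ R_J(m, m) = E\left[J_m^2\right] = \mathrm{Var}[J_m] + (E[J_m])^2 = \frac{1-p}{p^2} + \frac{1}{p^2} = \frac{2-p}{p^2}. \tag{4} \]
Thus
\[ \mathbf{R}_J = \frac{1}{p^2} \begin{bmatrix} 2-p & 1 & 1 \\ 1 & 2-p & 1 \\ 1 & 1 & 2-p \end{bmatrix}. \tag{5} \]
Because $J_m$ and $J_n$ are independent, off-diagonal terms in the covariance matrix are
\[ C_J(m, n) = \mathrm{Cov}[J_m, J_n] = 0. \tag{6} \]
Since $C_J(m, m) = \mathrm{Var}[J_m]$, we have that
\[ \mathbf{C}_J = \frac{1-p}{p^2} \mathbf{I} = \frac{1-p}{p^2} \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}. \tag{7} \]

Problem 5.6.6 Solution

This problem is quite difficult unless one uses the observation that the vector $\mathbf{K}$ can be expressed in terms of the vector $\mathbf{J} = [J_1\ J_2\ J_3]'$ where $J_i$ is the number of transmissions of message $i$. Note that we can write
\[ \mathbf{K} = \mathbf{A}\mathbf{J} = \begin{bmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 1 & 1 & 1 \end{bmatrix} \mathbf{J}. \tag{1} \]
We also observe that since each transmission is an independent Bernoulli trial with success probability $p$, the components of $\mathbf{J}$ are iid geometric $(p)$ random variables. Thus $E[J_i] = 1/p$ and $\mathrm{Var}[J_i] = (1-p)/p^2$. Thus $\mathbf{J}$ has expected value
\[ E[\mathbf{J}] = \boldsymbol{\mu}_J = \begin{bmatrix} E[J_1] & E[J_2] & E[J_3] \end{bmatrix}' = \begin{bmatrix} 1/p & 1/p & 1/p \end{bmatrix}'. \tag{2} \]
Since the components of $\mathbf{J}$ are independent, it has the diagonal covariance matrix
\[ \mathbf{C}_J = \begin{bmatrix} \mathrm{Var}[J_1] & 0 & 0 \\ 0 & \mathrm{Var}[J_2] & 0 \\ 0 & 0 & \mathrm{Var}[J_3] \end{bmatrix} = \frac{1-p}{p^2} \mathbf{I}. \tag{3} \]
Given these properties of $\mathbf{J}$, finding the same properties of $\mathbf{K} = \mathbf{A}\mathbf{J}$ is simple.

(a) The expected value of $\mathbf{K}$ is
\[ E[\mathbf{K}] = \mathbf{A}\boldsymbol{\mu}_J = \begin{bmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 1 & 1 & 1 \end{bmatrix} \begin{bmatrix} 1/p \\ 1/p \\ 1/p \end{bmatrix} = \begin{bmatrix} 1/p \\ 2/p \\ 3/p \end{bmatrix}. \tag{4} \]
(b) From Theorem 5.13, the covariance matrix of $\mathbf{K}$ is
\begin{align}
\mathbf{C}_K &= \mathbf{A}\mathbf{C}_J\mathbf{A}' \tag{5} \\
&= \frac{1-p}{p^2} \mathbf{A}\mathbf{I}\mathbf{A}' \tag{6} \\
&= \frac{1-p}{p^2} \begin{bmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 1 & 1 & 1 \end{bmatrix} \begin{bmatrix} 1 & 1 & 1 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{bmatrix} = \frac{1-p}{p^2} \begin{bmatrix} 1 & 1 & 1 \\ 1 & 2 & 2 \\ 1 & 2 & 3 \end{bmatrix}. \tag{7}
\end{align}
(c) Given the expected value vector $\boldsymbol{\mu}_K$ and the covariance matrix $\mathbf{C}_K$, we can use Theorem 5.12 to find the correlation matrix
\begin{align}
\mathbf{R}_K &= \mathbf{C}_K + \boldsymbol{\mu}_K \boldsymbol{\mu}_K' \tag{8} \\
&= \frac{1-p}{p^2} \begin{bmatrix} 1 & 1 & 1 \\ 1 & 2 & 2 \\ 1 & 2 & 3 \end{bmatrix} + \begin{bmatrix} 1/p \\ 2/p \\ 3/p \end{bmatrix} \begin{bmatrix} 1/p & 2/p & 3/p \end{bmatrix} \tag{9} \\
&= \frac{1-p}{p^2} \begin{bmatrix} 1 & 1 & 1 \\ 1 & 2 & 2 \\ 1 & 2 & 3 \end{bmatrix} + \frac{1}{p^2} \begin{bmatrix} 1 & 2 & 3 \\ 2 & 4 & 6 \\ 3 & 6 & 9 \end{bmatrix} \tag{10} \\
&= \frac{1}{p^2} \begin{bmatrix} 2-p & 3-p & 4-p \\ 3-p & 6-2p & 8-2p \\ 4-p & 8-2p & 12-3p \end{bmatrix}. \tag{11}
\end{align}

Problem 5.6.7 Solution

The preliminary work for this problem appears in a few different places. In Example 5.5, we found the marginal PDF of $Y_3$ and in Example 5.6, we found the marginal PDFs of $Y_1$, $Y_2$, and $Y_4$. We summarize these results here:
\begin{align}
f_{Y_1}(y) = f_{Y_3}(y) &= \begin{cases} 2(1-y) & 0 \le y \le 1, \\ 0 & \text{otherwise,} \end{cases} \tag{1} \\
f_{Y_2}(y) = f_{Y_4}(y) &= \begin{cases} 2y & 0 \le y \le 1, \\ 0 & \text{otherwise.} \end{cases} \tag{2}
\end{align}
This implies
\begin{align}
E[Y_1] = E[Y_3] &= \int_0^1 2y(1-y)\, dy = 1/3, \tag{3} \\
E[Y_2] = E[Y_4] &= \int_0^1 2y^2\, dy = 2/3. \tag{4}
\end{align}
Thus $\mathbf{Y}$ has expected value $E[\mathbf{Y}] = [1/3\ \ 2/3\ \ 1/3\ \ 2/3]'$. The second part of the problem is to find the correlation matrix $\mathbf{R}_Y$. In fact, we need to find $R_Y(i, j) = E[Y_i Y_j]$ for each $i, j$ pair. We will see that these are seriously tedious calculations. For $i = j$, the second moments are
\begin{align}
E\left[Y_1^2\right] = E\left[Y_3^2\right] &= \int_0^1 2y^2(1-y)\, dy = 1/6, \tag{5} \\
E\left[Y_2^2\right] = E\left[Y_4^2\right] &= \int_0^1 2y^3\, dy = 1/2. \tag{6}
\end{align}
In terms of the correlation matrix,
\[ R_Y(1, 1) = R_Y(3, 3) = 1/6, \qquad R_Y(2, 2) = R_Y(4, 4) = 1/2. \tag{7} \]
To find the off-diagonal terms $R_Y(i, j) = E[Y_i Y_j]$, we need to find the marginal PDFs $f_{Y_i,Y_j}(y_i, y_j)$. Example 5.5 showed that
\begin{align}
f_{Y_1,Y_4}(y_1, y_4) &= \begin{cases} 4(1-y_1) y_4 & 0 \le y_1 \le 1,\ 0 \le y_4 \le 1, \\ 0 & \text{otherwise,} \end{cases} \tag{8} \\
f_{Y_2,Y_3}(y_2, y_3) &= \begin{cases} 4 y_2 (1-y_3) & 0 \le y_2 \le 1,\ 0 \le y_3 \le 1, \\ 0 & \text{otherwise.} \end{cases} \tag{9}
\end{align}
Inspection will show that $Y_1$ and $Y_4$ are independent since $f_{Y_1,Y_4}(y_1, y_4) = f_{Y_1}(y_1) f_{Y_4}(y_4)$. Similarly, $Y_2$ and $Y_3$ are independent since $f_{Y_2,Y_3}(y_2, y_3) = f_{Y_2}(y_2) f_{Y_3}(y_3)$. This implies
\begin{align}
R_Y(1, 4) &= E[Y_1 Y_4] = E[Y_1]\, E[Y_4] = 2/9, \tag{10} \\
R_Y(2, 3) &= E[Y_2 Y_3] = E[Y_2]\, E[Y_3] = 2/9. \tag{11}
\end{align}
We also need to calculate $f_{Y_1,Y_2}(y_1, y_2)$, $f_{Y_3,Y_4}(y_3, y_4)$, $f_{Y_1,Y_3}(y_1, y_3)$ and $f_{Y_2,Y_4}(y_2, y_4)$. To start, for $0 \le y_1 \le y_2 \le 1$,
\begin{align}
f_{Y_1,Y_2}(y_1, y_2) &= \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f_{Y_1,Y_2,Y_3,Y_4}(y_1, y_2, y_3, y_4)\, dy_3\, dy_4 \tag{12} \\
&= \int_0^1 \int_0^{y_4} 4\, dy_3\, dy_4 = \int_0^1 4 y_4\, dy_4 = 2. \tag{13}
\end{align}
Similarly, for $0 \le y_3 \le y_4 \le 1$,
\begin{align}
f_{Y_3,Y_4}(y_3, y_4) &= \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f_{Y_1,Y_2,Y_3,Y_4}(y_1, y_2, y_3, y_4)\, dy_1\, dy_2 \tag{14} \\
&= \int_0^1 \int_0^{y_2} 4\, dy_1\, dy_2 = \int_0^1 4 y_2\, dy_2 = 2. \tag{15}
\end{align}
In fact, these PDFs are the same in that
\[ f_{Y_1,Y_2}(x, y) = f_{Y_3,Y_4}(x, y) = \begin{cases} 2 & 0 \le x \le y \le 1, \\ 0 & \text{otherwise.} \end{cases} \tag{16} \]
This implies $R_Y(1, 2) = R_Y(3, 4) = E[Y_3 Y_4]$ and that
\[ E[Y_3 Y_4] = \int_0^1 \int_0^y 2xy\, dx\, dy = \int_0^1 \left( y x^2 \Big|_0^y \right) dy = \int_0^1 y^3\, dy = \frac{1}{4}. \tag{17} \]
Continuing in the same way, we see for $0 \le y_1 \le 1$ and $0 \le y_3 \le 1$ that
\begin{align}
f_{Y_1,Y_3}(y_1, y_3) &= \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f_{Y_1,Y_2,Y_3,Y_4}(y_1, y_2, y_3, y_4)\, dy_2\, dy_4 \tag{18} \\
&= 4 \left( \int_{y_1}^1 dy_2 \right) \left( \int_{y_3}^1 dy_4 \right) \tag{19} \\
&= 4(1-y_1)(1-y_3). \tag{20}
\end{align}
We observe that $Y_1$ and $Y_3$ are independent since $f_{Y_1,Y_3}(y_1, y_3) = f_{Y_1}(y_1) f_{Y_3}(y_3)$. It follows that
\[ R_Y(1, 3) = E[Y_1 Y_3] = E[Y_1]\, E[Y_3] = 1/9. \tag{21} \]
Finally,
\begin{align}
f_{Y_2,Y_4}(y_2, y_4) &= \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f_{Y_1,Y_2,Y_3,Y_4}(y_1, y_2, y_3, y_4)\, dy_1\, dy_3 \tag{22} \\
&= 4 \left( \int_0^{y_2} dy_1 \right) \left( \int_0^{y_4} dy_3 \right) \tag{23} \\
&= 4 y_2 y_4. \tag{24}
\end{align}
We observe that $Y_2$ and $Y_4$ are independent since $f_{Y_2,Y_4}(y_2, y_4) = f_{Y_2}(y_2) f_{Y_4}(y_4)$. It follows that $R_Y(2, 4) = E[Y_2 Y_4] = E[Y_2] E[Y_4] = 4/9$. The above results give $R_Y(i, j)$ for $i \le j$. Since $\mathbf{R}_Y$ is a symmetric matrix,
\[ \mathbf{R}_Y = \begin{bmatrix} 1/6 & 1/4 & 1/9 & 2/9 \\ 1/4 & 1/2 & 2/9 & 4/9 \\ 1/9 & 2/9 & 1/6 & 1/4 \\ 2/9 & 4/9 & 1/4 & 1/2 \end{bmatrix}. \tag{25} \]
Since $\boldsymbol{\mu}_Y = [1/3\ \ 2/3\ \ 1/3\ \ 2/3]'$, the covariance matrix is
\begin{align}
\mathbf{C}_Y &= \mathbf{R}_Y - \boldsymbol{\mu}_Y \boldsymbol{\mu}_Y' \tag{26} \\
&= \begin{bmatrix} 1/6 & 1/4 & 1/9 & 2/9 \\ 1/4 & 1/2 & 2/9 & 4/9 \\ 1/9 & 2/9 & 1/6 & 1/4 \\ 2/9 & 4/9 & 1/4 & 1/2 \end{bmatrix} - \begin{bmatrix} 1/3 \\ 2/3 \\ 1/3 \\ 2/3 \end{bmatrix} \begin{bmatrix} 1/3 & 2/3 & 1/3 & 2/3 \end{bmatrix} \tag{27} \\
&= \begin{bmatrix} 1/18 & 1/36 & 0 & 0 \\ 1/36 & 1/18 & 0 & 0 \\ 0 & 0 & 1/18 & 1/36 \\ 0 & 0 & 1/36 & 1/18 \end{bmatrix}. \tag{28}
\end{align}
The off-diagonal zero blocks are a consequence of $[Y_1\ Y_2]'$ being independent of $[Y_3\ Y_4]'$. Along the diagonal, the two identical sub-blocks occur because $f_{Y_1,Y_2}(x, y) = f_{Y_3,Y_4}(x, y)$. In short, the matrix structure is the result of $[Y_1\ Y_2]'$ and $[Y_3\ Y_4]'$ being iid random vectors.

Problem 5.6.8 Solution

The 2-dimensional random vector $\mathbf{Y}$ has PDF
\[ f_{\mathbf{Y}}(\mathbf{y}) = \begin{cases} 2 & \mathbf{y} \ge \mathbf{0},\ [1\ \ 1]\,\mathbf{y} \le 1, \\ 0 & \text{otherwise.} \end{cases} \tag{1} \]
Rewritten in terms of the variables $y_1$ and $y_2$,
\[ f_{Y_1,Y_2}(y_1, y_2) = \begin{cases} 2 & y_1 \ge 0,\ y_2 \ge 0,\ y_1 + y_2 \le 1, \\ 0 & \text{otherwise.} \end{cases} \tag{2} \]
In this problem, the PDF is simple enough that we can compute $E[Y_i^n]$ for arbitrary integers $n \ge 0$.
\[ E\left[Y_1^n\right] = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} y_1^n f_{Y_1,Y_2}(y_1, y_2)\, dy_1\, dy_2 = \int_0^1 \int_0^{1-y_2} 2 y_1^n\, dy_1\, dy_2. \tag{3} \]
A little calculus yields
\[ E\left[Y_1^n\right] = \int_0^1 \left( \frac{2}{n+1} y_1^{n+1} \Big|_0^{1-y_2} \right) dy_2 = \frac{2}{n+1} \int_0^1 (1-y_2)^{n+1}\, dy_2 = \frac{2}{(n+1)(n+2)}. \tag{4} \]
Symmetry of the joint PDF $f_{Y_1,Y_2}(y_1, y_2)$ implies that $E[Y_2^n] = E[Y_1^n]$. Thus, $E[Y_1] = E[Y_2] = 1/3$ and
\[ E[\mathbf{Y}] = \boldsymbol{\mu}_Y = \begin{bmatrix} 1/3 & 1/3 \end{bmatrix}'. \tag{5} \]
In addition,
\[ R_Y(1, 1) = E\left[Y_1^2\right] = 1/6, \qquad R_Y(2, 2) = E\left[Y_2^2\right] = 1/6. \tag{6} \]
To complete the correlation matrix, we find
\[ R_Y(1, 2) = E[Y_1 Y_2] = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} y_1 y_2 f_{Y_1,Y_2}(y_1, y_2)\, dy_1\, dy_2 = \int_0^1 \int_0^{1-y_2} 2 y_1 y_2\, dy_1\, dy_2. \tag{7} \]
Following through on the calculus, we obtain
\[ R_Y(1, 2) = \int_0^1 \left( y_1^2 \Big|_0^{1-y_2} \right) y_2\, dy_2 = \int_0^1 y_2 (1-y_2)^2\, dy_2 = \left( \frac{1}{2} y_2^2 - \frac{2}{3} y_2^3 + \frac{1}{4} y_2^4 \right) \Big|_0^1 = \frac{1}{12}. \tag{8} \]
Thus we have found that
\[ \mathbf{R}_Y = \begin{bmatrix} E[Y_1^2] & E[Y_1 Y_2] \\ E[Y_2 Y_1] & E[Y_2^2] \end{bmatrix} = \begin{bmatrix} 1/6 & 1/12 \\ 1/12 & 1/6 \end{bmatrix}. \tag{9} \]
Lastly, $\mathbf{Y}$ has covariance matrix
\begin{align}
\mathbf{C}_Y = \mathbf{R}_Y - \boldsymbol{\mu}_Y \boldsymbol{\mu}_Y' &= \begin{bmatrix} 1/6 & 1/12 \\ 1/12 & 1/6 \end{bmatrix} - \begin{bmatrix} 1/3 \\ 1/3 \end{bmatrix} \begin{bmatrix} 1/3 & 1/3 \end{bmatrix} \tag{10} \\
&= \begin{bmatrix} 1/18 & -1/36 \\ -1/36 & 1/18 \end{bmatrix}. \tag{11}
\end{align}

Problem 5.6.9 Solution

Given an arbitrary random vector $\mathbf{X}$, we can define $\mathbf{Y} = \mathbf{X} - \boldsymbol{\mu}_X$ so that
\[ \mathbf{C}_X = E\left[(\mathbf{X} - \boldsymbol{\mu}_X)(\mathbf{X} - \boldsymbol{\mu}_X)'\right] = E\left[\mathbf{Y}\mathbf{Y}'\right] = \mathbf{R}_Y. \tag{1} \]
It follows that the covariance matrix $\mathbf{C}_X$ is positive semi-definite if and only if the correlation matrix $\mathbf{R}_Y$ is positive semi-definite. Thus, it is sufficient to show that every correlation matrix, whether it is denoted $\mathbf{R}_Y$ or $\mathbf{R}_X$, is positive semi-definite. To show a correlation matrix $\mathbf{R}_X$ is positive semi-definite, we write
\[ \mathbf{a}'\mathbf{R}_X\mathbf{a} = \mathbf{a}' E\left[\mathbf{X}\mathbf{X}'\right] \mathbf{a} = E\left[\mathbf{a}'\mathbf{X}\mathbf{X}'\mathbf{a}\right] = E\left[(\mathbf{a}'\mathbf{X})(\mathbf{X}'\mathbf{a})\right] = E\left[(\mathbf{a}'\mathbf{X})^2\right]. \tag{2} \]
We note that $W = \mathbf{a}'\mathbf{X}$ is a random variable. Since $E[W^2] \ge 0$ for any random variable $W$,
\[ \mathbf{a}'\mathbf{R}_X\mathbf{a} = E\left[W^2\right] \ge 0. \tag{3} \]

Problem 5.7.1 Solution

(a) From Theorem 5.12, the correlation matrix of $\mathbf{X}$ is
\begin{align}
\mathbf{R}_X &= \mathbf{C}_X + \boldsymbol{\mu}_X \boldsymbol{\mu}_X' \tag{1} \\
&= \begin{bmatrix} 4 & -2 & 1 \\ -2 & 4 & -2 \\ 1 & -2 & 4 \end{bmatrix} + \begin{bmatrix} 4 \\ 8 \\ 6 \end{bmatrix} \begin{bmatrix} 4 & 8 & 6 \end{bmatrix} \tag{2} \\
&= \begin{bmatrix} 4 & -2 & 1 \\ -2 & 4 & -2 \\ 1 & -2 & 4 \end{bmatrix} + \begin{bmatrix} 16 & 32 & 24 \\ 32 & 64 & 48 \\ 24 & 48 & 36 \end{bmatrix} = \begin{bmatrix} 20 & 30 & 25 \\ 30 & 68 & 46 \\ 25 & 46 & 40 \end{bmatrix}. \tag{3}
\end{align}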
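The relation $\mathbf{R}_X = \mathbf{C}_X + \boldsymbol{\mu}_X \boldsymbol{\mu}_X'$ used in Problem 5.7.1(a) is simple enough to verify by direct computation. The snippet below is an illustrative sketch in plain Python (the helper names `outer` and `mat_add` are hypothetical, introduced only for this example); it rebuilds the correlation matrix from $\boldsymbol{\mu}_X = [4\ 8\ 6]'$ and the covariance matrix given in the problem.

```python
def outer(u, v):
    """Outer product u v' of two vectors, returned as a list of rows."""
    return [[a * b for b in v] for a in u]


def mat_add(A, B):
    """Entry-by-entry sum of two equally sized matrices."""
    return [[a + b for a, b in zip(row_a, row_b)] for row_a, row_b in zip(A, B)]


# Expected value vector and covariance matrix of X from Problem 5.7.1.
mu_X = [4, 8, 6]
C_X = [[4, -2, 1],
       [-2, 4, -2],
       [1, -2, 4]]

# Correlation matrix R_X = C_X + mu_X mu_X'.
R_X = mat_add(C_X, outer(mu_X, mu_X))

for row in R_X:
    print(row)
```

Integer arithmetic keeps the check exact, so the printed rows can be compared directly with the matrix in the solution.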
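(b) Let $\mathbf{Y} = [X_1\ X_2]'$. Since $\mathbf{Y}$ is a subset of the components of $\mathbf{X}$, it is a Gaussian random vector with expected value vector
\[ \boldsymbol{\mu}_Y = \begin{bmatrix} E[X_1] & E[X_2] \end{bmatrix}' = \begin{bmatrix} 4 & 8 \end{bmatrix}' \tag{4} \]
and covariance matrix
\[ \mathbf{C}_Y = \begin{bmatrix} \mathrm{Var}[X_1] & \mathrm{Cov}[X_1, X_2] \\ \mathrm{Cov}[X_1, X_2] & \mathrm{Var}[X_2] \end{bmatrix} = \begin{bmatrix} 4 & -2 \\ -2 & 4 \end{bmatrix}. \tag{5} \]
We note that $\det(\mathbf{C}_Y) = 12$ and that
\[ \mathbf{C}_Y^{-1} = \frac{1}{12} \begin{bmatrix} 4 & 2 \\ 2 & 4 \end{bmatrix} = \begin{bmatrix} 1/3 & 1/6 \\ 1/6 & 1/3 \end{bmatrix}. \tag{6} \]
This implies that
\begin{align}
(\mathbf{y} - \boldsymbol{\mu}_Y)' \mathbf{C}_Y^{-1} (\mathbf{y} - \boldsymbol{\mu}_Y) &= \begin{bmatrix} y_1 - 4 & y_2 - 8 \end{bmatrix} \begin{bmatrix} 1/3 & 1/6 \\ 1/6 & 1/3 \end{bmatrix} \begin{bmatrix} y_1 - 4 \\ y_2 - 8 \end{bmatrix} \tag{7} \\
&= \begin{bmatrix} y_1 - 4 & y_2 - 8 \end{bmatrix} \begin{bmatrix} y_1/3 + y_2/6 - 8/3 \\ y_1/6 + y_2/3 - 10/3 \end{bmatrix} \tag{8} \\
&= \frac{y_1^2}{3} + \frac{y_1 y_2}{3} - \frac{16 y_1}{3} - \frac{20 y_2}{3} + \frac{y_2^2}{3} + \frac{112}{3}. \tag{9}
\end{align}
The PDF of $\mathbf{Y}$ is
\begin{align}
f_{\mathbf{Y}}(\mathbf{y}) &= \frac{1}{2\pi\sqrt{12}}\, e^{-(\mathbf{y} - \boldsymbol{\mu}_Y)' \mathbf{C}_Y^{-1} (\mathbf{y} - \boldsymbol{\mu}_Y)/2} \tag{10} \\
&= \frac{1}{\sqrt{48\pi^2}}\, e^{-(y_1^2 + y_1 y_2 - 16 y_1 - 20 y_2 + y_2^2 + 112)/6}. \tag{11}
\end{align}
Since $\mathbf{Y} = [X_1\ X_2]'$, the PDF of $X_1$ and $X_2$ is simply
\[ f_{X_1,X_2}(x_1, x_2) = f_{Y_1,Y_2}(x_1, x_2) = \frac{1}{\sqrt{48\pi^2}}\, e^{-(x_1^2 + x_1 x_2 - 16 x_1 - 20 x_2 + x_2^2 + 112)/6}. \tag{12} \]
(c) We can observe directly from $\boldsymbol{\mu}_X$ and $\mathbf{C}_X$ that $X_1$ is a Gaussian $(4, 2)$ random variable. Thus,
\[ P[X_1 > 8] = P\left[\frac{X_1 - 4}{2} > \frac{8 - 4}{2}\right] = Q(2) = 0.0228. \tag{13} \]

Problem 5.7.2 Solution

We are given that $\mathbf{X}$ is a Gaussian random vector with
\[ \boldsymbol{\mu}_X = \begin{bmatrix} 4 \\ 8 \\ 6 \end{bmatrix}, \qquad \mathbf{C}_X = \begin{bmatrix} 4 & -2 & 1 \\ -2 & 4 & -2 \\ 1 & -2 & 4 \end{bmatrix}. \tag{1} \]
We are also given that $\mathbf{Y} = \mathbf{A}\mathbf{X} + \mathbf{b}$ where
\[ \mathbf{A} = \begin{bmatrix} 1 & 1/2 & 2/3 \\ 1 & -1/2 & 2/3 \end{bmatrix}, \qquad \mathbf{b} = \begin{bmatrix} -4 \\ -4 \end{bmatrix}. \tag{2} \]
Since the two rows of $\mathbf{A}$ are linearly independent row vectors, $\mathbf{A}$ has rank 2. By Theorem 5.16, $\mathbf{Y}$ is a Gaussian random vector. Given these facts, the various parts of this problem are just straightforward calculations using Theorem 5.16.

(a) The expected value of $\mathbf{Y}$ is
\[ \boldsymbol{\mu}_Y = \mathbf{A}\boldsymbol{\mu}_X + \mathbf{b} = \begin{bmatrix} 1 & 1/2 & 2/3 \\ 1 & -1/2 & 2/3 \end{bmatrix} \begin{bmatrix} 4 \\ 8 \\ 6 \end{bmatrix} + \begin{bmatrix} -4 \\ -4 \end{bmatrix} = \begin{bmatrix} 8 \\ 0 \end{bmatrix}. \tag{3} \]
(b) The covariance matrix of $\mathbf{Y}$ is
\begin{align}
\mathbf{C}_Y &= \mathbf{A}\mathbf{C}_X\mathbf{A}' \tag{4} \\
&= \begin{bmatrix} 1 & 1/2 & 2/3 \\ 1 & -1/2 & 2/3 \end{bmatrix} \begin{bmatrix} 4 & -2 & 1 \\ -2 & 4 & -2 \\ 1 & -2 & 4 \end{bmatrix} \begin{bmatrix} 1 & 1 \\ 1/2 & -1/2 \\ 2/3 & 2/3 \end{bmatrix} = \frac{1}{9} \begin{bmatrix} 43 & 55 \\ 55 & 103 \end{bmatrix}. \tag{5}
\end{align}
(c) $\mathbf{Y}$ has correlation matrix
\[ \mathbf{R}_Y = \mathbf{C}_Y + \boldsymbol{\mu}_Y \boldsymbol{\mu}_Y' = \frac{1}{9} \begin{bmatrix} 43 & 55 \\ 55 & 103 \end{bmatrix} + \begin{bmatrix} 8 \\ 0 \end{bmatrix} \begin{bmatrix} 8 & 0 \end{bmatrix} = \frac{1}{9} \begin{bmatrix} 619 & 55 \\ 55 & 103 \end{bmatrix}. \tag{6} \]
(d) From $\boldsymbol{\mu}_Y$, we see that $E[Y_2] = 0$. From the covariance matrix $\mathbf{C}_Y$, we learn that $Y_2$ has variance $\sigma_2^2 = C_Y(2, 2) = 103/9$. Since $Y_2$ is a Gaussian random variable,
\begin{align}
P[-1 \le Y_2 \le 1] &= P\left[-\frac{1}{\sigma_2} \le \frac{Y_2}{\sigma_2} \le \frac{1}{\sigma_2}\right] \tag{7} \\
&= \Phi\left(\frac{1}{\sigma_2}\right) - \Phi\left(\frac{-1}{\sigma_2}\right) \tag{8} \\
&= 2\Phi\left(\frac{1}{\sigma_2}\right) - 1 \tag{9} \\
&= 2\Phi\left(\frac{3}{\sqrt{103}}\right) - 1 = 0.2325. \tag{10}
\end{align}

Problem 5.7.3 Solution

This problem is just a special case of Theorem 5.16 with the matrix $\mathbf{A}$ replaced by the row vector $\mathbf{a}'$ and a 1 element vector $\mathbf{b} = b = 0$. In this case, the vector $\mathbf{Y}$ becomes the scalar $Y$. The expected value vector $\boldsymbol{\mu}_Y = [\mu_Y]$ and the covariance "matrix" of $Y$ is just the $1 \times 1$ matrix $[\sigma_Y^2]$. Directly from Theorem 5.16, we can conclude that $Y$ is a length 1 Gaussian random vector, which is just a Gaussian random variable. In addition, $\mu_Y = \mathbf{a}'\boldsymbol{\mu}_X$ and
\[ \mathrm{Var}[Y] = \mathbf{C}_Y = \mathbf{a}'\mathbf{C}_X\mathbf{a}. \tag{1} \]

Problem 5.7.4 Solution

From Definition 5.17, the $n = 2$ dimensional Gaussian vector $\mathbf{X}$ has PDF
\[ f_{\mathbf{X}}(\mathbf{x}) = \frac{1}{2\pi[\det(\mathbf{C}_X)]^{1/2}} \exp\left( -\frac{1}{2} (\mathbf{x} - \boldsymbol{\mu}_X)' \mathbf{C}_X^{-1} (\mathbf{x} - \boldsymbol{\mu}_X) \right) \tag{1} \]
where $\mathbf{C}_X$ has determinant
\[ \det(\mathbf{C}_X) = \sigma_1^2 \sigma_2^2 - \rho^2 \sigma_1^2 \sigma_2^2 = \sigma_1^2 \sigma_2^2 (1 - \rho^2). \tag{2} \]
Thus,
\[ \frac{1}{2\pi[\det(\mathbf{C}_X)]^{1/2}} = \frac{1}{2\pi \sigma_1 \sigma_2 \sqrt{1 - \rho^2}}. \tag{3} \]
Using the $2 \times 2$ matrix inverse formula
\[ \begin{bmatrix} a & b \\ c & d \end{bmatrix}^{-1} = \frac{1}{ad - bc} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix}, \tag{4} \]
we obtain
\[ \mathbf{C}_X^{-1} = \frac{1}{\sigma_1^2 \sigma_2^2 (1 - \rho^2)} \begin{bmatrix} \sigma_2^2 & -\rho\sigma_1\sigma_2 \\ -\rho\sigma_1\sigma_2 & \sigma_1^2 \end{bmatrix} = \frac{1}{1 - \rho^2} \begin{bmatrix} \dfrac{1}{\sigma_1^2} & \dfrac{-\rho}{\sigma_1\sigma_2} \\ \dfrac{-\rho}{\sigma_1\sigma_2} & \dfrac{1}{\sigma_2^2} \end{bmatrix}. \tag{5} \]
Thus
\begin{align}
-\frac{1}{2} (\mathbf{x} - \boldsymbol{\mu}_X)' \mathbf{C}_X^{-1} (\mathbf{x} - \boldsymbol{\mu}_X) &= -\frac{ \begin{bmatrix} x_1 - \mu_1 & x_2 - \mu_2 \end{bmatrix} \begin{bmatrix} \frac{1}{\sigma_1^2} & \frac{-\rho}{\sigma_1\sigma_2} \\ \frac{-\rho}{\sigma_1\sigma_2} & \frac{1}{\sigma_2^2} \end{bmatrix} \begin{bmatrix} x_1 - \mu_1 \\ x_2 - \mu_2 \end{bmatrix} }{2(1 - \rho^2)} \tag{6} \\
&= -\frac{ \begin{bmatrix} x_1 - \mu_1 & x_2 - \mu_2 \end{bmatrix} \begin{bmatrix} \frac{x_1 - \mu_1}{\sigma_1^2} - \frac{\rho(x_2 - \mu_2)}{\sigma_1\sigma_2} \\ -\frac{\rho(x_1 - \mu_1)}{\sigma_1\sigma_2} + \frac{x_2 - \mu_2}{\sigma_2^2} \end{bmatrix} }{2(1 - \rho^2)} \tag{7} \\
&= -\frac{ \frac{(x_1 - \mu_1)^2}{\sigma_1^2} - \frac{2\rho(x_1 - \mu_1)(x_2 - \mu_2)}{\sigma_1\sigma_2} + \frac{(x_2 - \mu_2)^2}{\sigma_2^2} }{2(1 - \rho^2)}. \tag{8}
\end{align}
Combining Equations (1), (3), and (8), we see that
\[ f_{\mathbf{X}}(\mathbf{x}) = \frac{1}{2\pi \sigma_1 \sigma_2 \sqrt{1 - \rho^2}} \exp\left[ -\frac{ \frac{(x_1 - \mu_1)^2}{\sigma_1^2} - \frac{2\rho(x_1 - \mu_1)(x_2 - \mu_2)}{\sigma_1\sigma_2} + \frac{(x_2 - \mu_2)^2}{\sigma_2^2} }{2(1 - \rho^2)} \right], \tag{9} \]
which is the bivariate Gaussian PDF in Definition 4.17.

Problem 5.7.5 Solution

Since
\[ \mathbf{W} = \begin{bmatrix} \mathbf{X} \\ \mathbf{Y} \end{bmatrix} = \begin{bmatrix} \mathbf{I} \\ \mathbf{A} \end{bmatrix} \mathbf{X} = \mathbf{D}\mathbf{X}, \tag{1} \]
suppose that $\mathbf{X}$ is a Gaussian $(\mathbf{0}, \mathbf{I})$ random vector. By Theorem 5.13, $\boldsymbol{\mu}_W = \mathbf{0}$ and $\mathbf{C}_W = \mathbf{D}\mathbf{D}'$. The matrix $\mathbf{D}$ is $(m + n) \times n$ and has rank $n$. That is, the rows of $\mathbf{D}$ are dependent and there exists a vector $\mathbf{y}$ such that $\mathbf{y}'\mathbf{D} = \mathbf{0}$. This implies $\mathbf{y}'\mathbf{D}\mathbf{D}'\mathbf{y} = 0$. Hence $\det(\mathbf{C}_W) = 0$ and $\mathbf{C}_W^{-1}$ does not exist. Hence $\mathbf{W}$ is not a Gaussian random vector.

The point to keep in mind is that the definition of a Gaussian random vector does not permit a component random variable to be a deterministic linear combination of other components.

Problem 5.7.6 Solution

(a) From Theorem 5.13, $\mathbf{Y}$ has covariance matrix
\begin{align}
\mathbf{C}_Y &= \mathbf{Q}\mathbf{C}_X\mathbf{Q}' \tag{1} \\
&= \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \begin{bmatrix} \sigma_1^2 & 0 \\ 0 & \sigma_2^2 \end{bmatrix} \begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix} \tag{2} \\
&= \begin{bmatrix} \sigma_1^2 \cos^2\theta + \sigma_2^2 \sin^2\theta & (\sigma_1^2 - \sigma_2^2) \sin\theta\cos\theta \\ (\sigma_1^2 - \sigma_2^2) \sin\theta\cos\theta & \sigma_1^2 \sin^2\theta + \sigma_2^2 \cos^2\theta \end{bmatrix}. \tag{3}
\end{align}
We conclude that $Y_1$ and $Y_2$ have covariance
\[ \mathrm{Cov}[Y_1, Y_2] = C_Y(1, 2) = (\sigma_1^2 - \sigma_2^2) \sin\theta\cos\theta. \tag{4} \]
Since $Y_1$ and $Y_2$ are jointly Gaussian, they are independent if and only if $\mathrm{Cov}[Y_1, Y_2] = 0$. Thus, $Y_1$ and $Y_2$ are independent for all $\theta$ if and only if $\sigma_1^2 = \sigma_2^2$. In this case, the joint PDF $f_{\mathbf{X}}(\mathbf{x})$ is symmetric in $x_1$ and $x_2$. In terms of polar coordinates, the PDF $f_{\mathbf{X}}(\mathbf{x}) = f_{X_1,X_2}(x_1, x_2)$ depends on $r = \sqrt{x_1^2 + x_2^2}$ but, for a given $r$, is constant for all $\phi = \tan^{-1}(x_2/x_1)$. The transformation of $\mathbf{X}$ to $\mathbf{Y}$ is just a rotation of the coordinate system by $\theta$, which preserves this circular symmetry.

(b) If $\sigma_2^2 > \sigma_1^2$, then $Y_1$ and $Y_2$ are independent if and only if $\sin\theta\cos\theta = 0$. This occurs in the following cases:

• $\theta = 0$: $Y_1 = X_1$ and $Y_2 = X_2$
• $\theta = \pi/2$: $Y_1 = -X_2$ and $Y_2 = X_1$
• $\theta = \pi$: $Y_1 = -X_1$ and $Y_2 = -X_2$
• $\theta = -\pi/2$: $Y_1 = X_2$ and $Y_2 = -X_1$

In all four cases, $Y_1$ and $Y_2$ are just relabeled versions, possibly with sign changes, of $X_1$ and $X_2$. In these cases, $Y_1$ and $Y_2$ are independent because $X_1$ and $X_2$ are independent. For other values of $\theta$, each $Y_i$ is a linear combination of both $X_1$ and $X_2$. This mixing results in correlation between $Y_1$ and $Y_2$.

Problem 5.7.7 Solution

The difficulty of this problem is overrated since it is a pretty simple application of Problem 5.7.6. In particular,
\[ \mathbf{Q} = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \Bigg|_{\theta = 45^\circ} = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 & -1 \\ 1 & 1 \end{bmatrix}. \tag{1} \]
Since $\mathbf{X} = \mathbf{Q}\mathbf{Y}$, we know from Theorem 5.16 that $\mathbf{X}$ is Gaussian with covariance matrix
\begin{align}
\mathbf{C}_X &= \mathbf{Q}\mathbf{C}_Y\mathbf{Q}' \tag{2} \\
&= \frac{1}{\sqrt{2}} \begin{bmatrix} 1 & -1 \\ 1 & 1 \end{bmatrix} \begin{bmatrix} 1 + \rho & 0 \\ 0 & 1 - \rho \end{bmatrix} \frac{1}{\sqrt{2}} \begin{bmatrix} 1 & 1 \\ -1 & 1 \end{bmatrix} \tag{3} \\
&= \frac{1}{2} \begin{bmatrix} 1 + \rho & -(1 - \rho) \\ 1 + \rho & 1 - \rho \end{bmatrix} \begin{bmatrix} 1 & 1 \\ -1 & 1 \end{bmatrix} \tag{4} \\
&= \begin{bmatrix} 1 & \rho \\ \rho & 1 \end{bmatrix}. \tag{5}
\end{align}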
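The rotation results of Problems 5.7.6 and 5.7.7 can be sanity-checked numerically. The sketch below is an illustration rather than part of the original solutions: it is plain Python, the helper names (`matmul`, `transpose`, `rotate_cov`) are hypothetical, and the values of $\rho$, $\sigma_1^2$, $\sigma_2^2$ and $\theta$ are arbitrary test choices. It forms $\mathbf{Q}\mathbf{C}\mathbf{Q}'$ for the rotation matrix $\mathbf{Q}$ and compares the result with the closed-form expressions.

```python
import math


def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]


def rotate_cov(C, theta):
    """Return Q C Q' for the rotation Q = [[cos, -sin], [sin, cos]]."""
    c, s = math.cos(theta), math.sin(theta)
    Q = [[c, -s], [s, c]]
    return matmul(matmul(Q, C), transpose(Q))


# Problem 5.7.7: rotating diag(1 + rho, 1 - rho) by 45 degrees.
rho = 0.6  # arbitrary illustrative correlation coefficient
C_X = rotate_cov([[1 + rho, 0.0], [0.0, 1 - rho]], math.pi / 4)
print(C_X)

# Problem 5.7.6: off-diagonal term (sigma1^2 - sigma2^2) sin(theta) cos(theta).
var1, var2, theta = 4.0, 9.0, 0.3  # arbitrary illustrative values
C_Y = rotate_cov([[var1, 0.0], [0.0, var2]], theta)
predicted_cov = (var1 - var2) * math.sin(theta) * math.cos(theta)
print(C_Y[0][1], predicted_cov)
```

Up to floating-point rounding, the first matrix has ones on the diagonal and $\rho$ off the diagonal, and the two numbers on the last line agree.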
Problem 5.7.8 Solution

As given in the problem statement, we define the $m$-dimensional vector $\mathbf{X}$, the $n$-dimensional vector $\mathbf{Y}$ and $\mathbf{W} = \begin{bmatrix} \mathbf{X} \\ \mathbf{Y} \end{bmatrix}$. Note that $\mathbf{W}$ has expected value
\[ \boldsymbol{\mu}_W = E[\mathbf{W}] = E\begin{bmatrix} \mathbf{X} \\ \mathbf{Y} \end{bmatrix} = \begin{bmatrix} E[\mathbf{X}] \\ E[\mathbf{Y}] \end{bmatrix} = \begin{bmatrix} \boldsymbol{\mu}_X \\ \boldsymbol{\mu}_Y \end{bmatrix}. \tag{1} \]
The covariance matrix of $\mathbf{W}$ is
\begin{align}
\mathbf{C}_W &= E\left[(\mathbf{W} - \boldsymbol{\mu}_W)(\mathbf{W} - \boldsymbol{\mu}_W)'\right] \tag{2} \\
&= E\left[ \begin{bmatrix} \mathbf{X} - \boldsymbol{\mu}_X \\ \mathbf{Y} - \boldsymbol{\mu}_Y \end{bmatrix} \begin{bmatrix} (\mathbf{X} - \boldsymbol{\mu}_X)' & (\mathbf{Y} - \boldsymbol{\mu}_Y)' \end{bmatrix} \right] \tag{3} \\
&= \begin{bmatrix} E[(\mathbf{X} - \boldsymbol{\mu}_X)(\mathbf{X} - \boldsymbol{\mu}_X)'] & E[(\mathbf{X} - \boldsymbol{\mu}_X)(\mathbf{Y} - \boldsymbol{\mu}_Y)'] \\ E[(\mathbf{Y} - \boldsymbol{\mu}_Y)(\mathbf{X} - \boldsymbol{\mu}_X)'] & E[(\mathbf{Y} - \boldsymbol{\mu}_Y)(\mathbf{Y} - \boldsymbol{\mu}_Y)'] \end{bmatrix} \tag{4} \\
&= \begin{bmatrix} \mathbf{C}_X & \mathbf{C}_{XY} \\ \mathbf{C}_{YX} & \mathbf{C}_Y \end{bmatrix}. \tag{5}
\end{align}
The assumption that $\mathbf{X}$ and $\mathbf{Y}$ are independent implies that
\[ \mathbf{C}_{XY} = E\left[(\mathbf{X} - \boldsymbol{\mu}_X)(\mathbf{Y} - \boldsymbol{\mu}_Y)'\right] = E[(\mathbf{X} - \boldsymbol{\mu}_X)]\, E\left[(\mathbf{Y} - \boldsymbol{\mu}_Y)'\right] = \mathbf{0}. \tag{6} \]
This also implies $\mathbf{C}_{YX} = \mathbf{C}_{XY}' = \mathbf{0}'$. Thus
\[ \mathbf{C}_W = \begin{bmatrix} \mathbf{C}_X & \mathbf{0} \\ \mathbf{0}' & \mathbf{C}_Y \end{bmatrix}. \tag{7} \]

Problem 5.7.9 Solution

(a) If you are familiar with the Gram-Schmidt procedure, the argument is that applying Gram-Schmidt to the rows of $\mathbf{A}$ yields $m$ orthogonal row vectors. It is then possible to augment those vectors with an additional $n - m$ orthogonal vectors. Those orthogonal vectors would be the rows of $\tilde{\mathbf{A}}$.

An alternate argument is that since $\mathbf{A}$ has rank $m$, the nullspace of $\mathbf{A}$, i.e., the set of all vectors $\mathbf{y}$ such that $\mathbf{A}\mathbf{y} = \mathbf{0}$, has dimension $n - m$. We can choose any $n - m$ linearly independent vectors $\mathbf{y}_1, \mathbf{y}_2, \ldots, \mathbf{y}_{n-m}$ in the nullspace of $\mathbf{A}$. We then define $\tilde{\mathbf{A}}'$ to have columns $\mathbf{y}_1, \mathbf{y}_2, \ldots, \mathbf{y}_{n-m}$. It follows that $\mathbf{A}\tilde{\mathbf{A}}' = \mathbf{0}$.

(b) To use Theorem 5.16 for the case $m = n$ to show
\[ \bar{\mathbf{Y}} = \begin{bmatrix} \mathbf{Y} \\ \hat{\mathbf{Y}} \end{bmatrix} = \begin{bmatrix} \mathbf{A} \\ \hat{\mathbf{A}} \end{bmatrix} \mathbf{X} \tag{1} \]
is a Gaussian random vector requires us to show that
\[ \bar{\mathbf{A}} = \begin{bmatrix} \mathbf{A} \\ \hat{\mathbf{A}} \end{bmatrix} = \begin{bmatrix} \mathbf{A} \\ \tilde{\mathbf{A}}\mathbf{C}_X^{-1} \end{bmatrix} \tag{2} \]
is a rank $n$ matrix. To prove this fact, we will suppose there exists $\mathbf{w}$ such that $\bar{\mathbf{A}}\mathbf{w} = \mathbf{0}$, and then show that $\mathbf{w}$ is a zero vector. Since $\mathbf{A}$ and $\tilde{\mathbf{A}}$ together have $n$ linearly independent rows, we can write the row vector $\mathbf{w}'$ as a linear combination of the rows of $\mathbf{A}$ and $\tilde{\mathbf{A}}$. That is, for some $\mathbf{v}$ and $\tilde{\mathbf{v}}$,
\[ \mathbf{w}' = \mathbf{v}'\mathbf{A} + \tilde{\mathbf{v}}'\tilde{\mathbf{A}}. \tag{3} \]
The condition $\bar{\mathbf{A}}\mathbf{w} = \mathbf{0}$ implies
\[ \begin{bmatrix} \mathbf{A} \\ \tilde{\mathbf{A}}\mathbf{C}_X^{-1} \end{bmatrix} \left( \mathbf{A}'\mathbf{v} + \tilde{\mathbf{A}}'\tilde{\mathbf{v}} \right) = \begin{bmatrix} \mathbf{0} \\ \mathbf{0} \end{bmatrix}. \tag{4} \]
This implies
\begin{align}
\mathbf{A}\mathbf{A}'\mathbf{v} + \mathbf{A}\tilde{\mathbf{A}}'\tilde{\mathbf{v}} &= \mathbf{0}, \tag{5} \\
\tilde{\mathbf{A}}\mathbf{C}_X^{-1}\mathbf{A}'\mathbf{v} + \tilde{\mathbf{A}}\mathbf{C}_X^{-1}\tilde{\mathbf{A}}'\tilde{\mathbf{v}} &= \mathbf{0}. \tag{6}
\end{align}
Since $\mathbf{A}\tilde{\mathbf{A}}' = \mathbf{0}$, Equation (5) implies that $\mathbf{A}\mathbf{A}'\mathbf{v} = \mathbf{0}$. Since $\mathbf{A}$ is rank $m$, $\mathbf{A}\mathbf{A}'$ is an $m \times m$ rank $m$ matrix. It follows that $\mathbf{v} = \mathbf{0}$. We can then conclude from Equation (6) that
\[ \tilde{\mathbf{A}}\mathbf{C}_X^{-1}\tilde{\mathbf{A}}'\tilde{\mathbf{v}} = \mathbf{0}. \tag{7} \]
This would imply that $\tilde{\mathbf{v}}'\tilde{\mathbf{A}}\mathbf{C}_X^{-1}\tilde{\mathbf{A}}'\tilde{\mathbf{v}} = 0$. Since $\mathbf{C}_X^{-1}$ is invertible, this would imply that $\tilde{\mathbf{A}}'\tilde{\mathbf{v}} = \mathbf{0}$. Since the rows of $\tilde{\mathbf{A}}$ are linearly independent, it must be that $\tilde{\mathbf{v}} = \mathbf{0}$. Thus $\bar{\mathbf{A}}$ is full rank and $\bar{\mathbf{Y}}$ is a Gaussian random vector.

(c) We note that by Theorem 5.16, the Gaussian vector $\bar{\mathbf{Y}} = \bar{\mathbf{A}}\mathbf{X}$ has covariance matrix
\[ \bar{\mathbf{C}} = \bar{\mathbf{A}}\mathbf{C}_X\bar{\mathbf{A}}'. \tag{8} \]
Since $(\mathbf{C}_X^{-1})' = \mathbf{C}_X^{-1}$,
\[ \bar{\mathbf{A}}' = \begin{bmatrix} \mathbf{A}' & (\tilde{\mathbf{A}}\mathbf{C}_X^{-1})' \end{bmatrix} = \begin{bmatrix} \mathbf{A}' & \mathbf{C}_X^{-1}\tilde{\mathbf{A}}' \end{bmatrix}. \tag{9} \]
Applying this result to Equation (8) yields
\[ \bar{\mathbf{C}} = \begin{bmatrix} \mathbf{A} \\ \tilde{\mathbf{A}}\mathbf{C}_X^{-1} \end{bmatrix} \mathbf{C}_X \begin{bmatrix} \mathbf{A}' & \mathbf{C}_X^{-1}\tilde{\mathbf{A}}' \end{bmatrix} = \begin{bmatrix} \mathbf{A}\mathbf{C}_X \\ \tilde{\mathbf{A}} \end{bmatrix} \begin{bmatrix} \mathbf{A}' & \mathbf{C}_X^{-1}\tilde{\mathbf{A}}' \end{bmatrix} = \begin{bmatrix} \mathbf{A}\mathbf{C}_X\mathbf{A}' & \mathbf{A}\tilde{\mathbf{A}}' \\ \tilde{\mathbf{A}}\mathbf{A}' & \tilde{\mathbf{A}}\mathbf{C}_X^{-1}\tilde{\mathbf{A}}' \end{bmatrix}. \tag{10} \]
Since $\tilde{\mathbf{A}}\mathbf{A}' = \mathbf{0}$,
\[ \bar{\mathbf{C}} = \begin{bmatrix} \mathbf{A}\mathbf{C}_X\mathbf{A}' & \mathbf{0} \\ \mathbf{0} & \tilde{\mathbf{A}}\mathbf{C}_X^{-1}\tilde{\mathbf{A}}' \end{bmatrix} = \begin{bmatrix} \mathbf{C}_Y & \mathbf{0} \\ \mathbf{0} & \mathbf{C}_{\hat{Y}} \end{bmatrix}. \tag{11} \]
We see that $\bar{\mathbf{C}}$ is a block diagonal covariance matrix. From the claim of Problem 5.7.8, we can conclude that $\mathbf{Y}$ and $\hat{\mathbf{Y}}$ are independent Gaussian random vectors.

Problem 5.8.1 Solution

We can use Theorem 5.16 since the scalar $Y$ is also a 1-dimensional vector. To do so, we write
\[ Y = \begin{bmatrix} 1/3 & 1/3 & 1/3 \end{bmatrix} \mathbf{X} = \mathbf{A}\mathbf{X}. \tag{1} \]
By Theorem 5.16, $Y$ is a Gaussian vector with expected value
\[ E[Y] = \mathbf{A}\boldsymbol{\mu}_X = (E[X_1] + E[X_2] + E[X_3])/3 = (4 + 8 + 6)/3 = 6 \tag{2} \]
and covariance matrix
\begin{align}
\mathbf{C}_Y = \mathrm{Var}[Y] &= \mathbf{A}\mathbf{C}_X\mathbf{A}' \tag{3} \\
&= \begin{bmatrix} 1/3 & 1/3 & 1/3 \end{bmatrix} \begin{bmatrix} 4 & -2 & 1 \\ -2 & 4 & -2 \\ 1 & -2 & 4 \end{bmatrix} \begin{bmatrix} 1/3 \\ 1/3 \\ 1/3 \end{bmatrix} = \frac{2}{3}. \tag{4}
\end{align}
Thus $Y$ is a Gaussian $(6, \sqrt{2/3})$ random variable, implying
\[ P[Y > 4] = P\left[\frac{Y - 6}{\sqrt{2/3}} > \frac{4 - 6}{\sqrt{2/3}}\right] = 1 - \Phi(-\sqrt{6}) = \Phi(\sqrt{6}) = 0.9928. \tag{5} \]

Problem 5.8.2 Solution

(a) The covariance matrix $\mathbf{C}_X$ has $\mathrm{Var}[X_i] = 25$ for each diagonal entry. For $i \ne j$, the $i,j$th entry of $\mathbf{C}_X$ is
\[ [\mathbf{C}_X]_{ij} = \rho_{X_i X_j} \sqrt{\mathrm{Var}[X_i]\, \mathrm{Var}[X_j]} = (0.8)(25) = 20. \tag{1} \]
The covariance matrix of $\mathbf{X}$ is a $10 \times 10$ matrix of the form
\[ \mathbf{C}_X = \begin{bmatrix} 25 & 20 & \cdots & 20 \\ 20 & 25 & \ddots & \vdots \\ \vdots & \ddots & \ddots & 20 \\ 20 & \cdots & 20 & 25 \end{bmatrix}. \tag{2} \]
(b) We observe that
\[ Y = \begin{bmatrix} 1/10 & 1/10 & \cdots & 1/10 \end{bmatrix} \mathbf{X} = \mathbf{A}\mathbf{X}. \tag{3} \]
Since $Y$ is the average of 10 identically distributed random variables, $E[Y] = E[X_i] = 5$. Since $Y$ is a scalar, the $1 \times 1$ covariance matrix $\mathbf{C}_Y = \mathrm{Var}[Y]$. By Theorem 5.13, the variance of $Y$ is
\[ \mathrm{Var}[Y] = \mathbf{C}_Y = \mathbf{A}\mathbf{C}_X\mathbf{A}' = 20.5. \tag{4} \]
Since $Y$ is Gaussian,
\[ P[Y \le 25] = P\left[\frac{Y - 5}{\sqrt{20.5}} \le \frac{25 - 20.5}{\sqrt{20.5}}\right] = \Phi(0.9939) = 0.8399. \tag{5} \]

Problem 5.8.3 Solution

Under the model of Quiz 5.8, the temperature on day $i$ and on day $j$ have covariance
\[ \mathrm{Cov}[T_i, T_j] = C_T[i - j] = \frac{36}{1 + |i - j|}. \tag{1} \]
From this model, the vector $\mathbf{T} = [T_1\ \cdots\ T_{31}]'$ has covariance matrix
\[ \mathbf{C}_T = \begin{bmatrix} C_T[0] & C_T[1] & \cdots & C_T[30] \\ C_T[1] & C_T[0] & \ddots & \vdots \\ \vdots & \ddots & \ddots & C_T[1] \\ C_T[30] & \cdots & C_T[1] & C_T[0] \end{bmatrix}. \tag{2} \]
If you have read the solution to Quiz 5.8, you know that $\mathbf{C}_T$ is a symmetric Toeplitz matrix and that Matlab has a toeplitz function to generate Toeplitz matrices. Using the toeplitz function to generate the covariance matrix, it is easy to use gaussvector to generate samples of the random vector $\mathbf{T}$. Here is the code for estimating $P[A]$ using $m$ samples.

    function p=julytemp583(m);
    c=36./(1+(0:30));
    CT=toeplitz(c);
    mu=80*ones(31,1);
    T=gaussvector(mu,CT,m);
    Y=sum(T)/31;
    Tmin=min(T);
    p=sum((Tmin>=72) & (Y <= 82))/m;

    >> julytemp583(100000)
    ans = 0.0684
    >> julytemp583(100000)
    ans = 0.0706
    >> julytemp583(100000)
    ans = 0.0714
    >> julytemp583(100000)
    ans = 0.0701

We see from repeated experiments with $m = 100{,}000$ trials that $P[A] \approx 0.07$.

Problem 5.8.4 Solution

The covariance matrix $\mathbf{C}_X$ has $\mathrm{Var}[X_i] = 25$ for each diagonal entry. For $i \ne j$, the $i,j$th entry of $\mathbf{C}_X$ is
\[ [\mathbf{C}_X]_{ij} = \rho_{X_i X_j} \sqrt{\mathrm{Var}[X_i]\, \mathrm{Var}[X_j]} = (0.8)(25) = 20. \tag{1} \]
The covariance matrix of $\mathbf{X}$ is a $10 \times 10$ matrix of the form
\[ \mathbf{C}_X = \begin{bmatrix} 25 & 20 & \cdots & 20 \\ 20 & 25 & \ddots & \vdots \\ \vdots & \ddots & \ddots & 20 \\ 20 & \cdots & 20 & 25 \end{bmatrix}. \tag{2} \]
A program to estimate $P[W \le 25]$ uses gaussvector to generate $m$ sample vectors of race times $\mathbf{X}$. In the program sailboats.m, X is a $10 \times m$ matrix such that each column of X is a vector of race times. In addition, min(X) is a row vector indicating the fastest time in each race.

    function p=sailboats(w,m)
    %Usage: p=sailboats(w,m)
    %In Problem 5.5.4, W is the
    %winning time in a 10 boat race.
    %We use m trials to estimate
    %P[W<=w]
    CX=(5*eye(10))+(20*ones(10,10));
    mu=35*ones(10,1);
    X=gaussvector(mu,CX,m);
    W=min(X);
    p=sum(W<=w)/m;

    >> sailboats(25,10000)
    ans = 0.0827
    >> sailboats(25,100000)
    ans = 0.0801
    >> sailboats(25,100000)
    ans = 0.0803
    >> sailboats(25,100000)
    ans = 0.0798

We see from repeated experiments with $m = 100{,}000$ trials that $P[W \le 25] \approx 0.08$.

Problem 5.8.5 Solution

When we built poissonrv.m, we went to some trouble to be able to generate $m$ iid samples at once. In this problem, each Poisson random variable that we generate has an expected value that is different from that of any other Poisson random variables. Thus, we must generate the daily jackpots sequentially. Here is a simple program for this purpose.

    function jackpot=lottery1(jstart,M,D)
    %Usage: function j=lottery1(jstart,M,D)
    %Perform M trials of the D day lottery
    %of Problem 5.5.5 and initial jackpot jstart
    jackpot=zeros(M,1);
    for m=1:M,
        disp(m); %show the trial number as a progress indicator
        jackpot(m)=jstart;
        for d=1:D,
            jackpot(m)=jackpot(m)+(0.5*poissonrv(jackpot(m),1));
        end
    end

The main problem with lottery1 is that it will run very slowly. Each call to poissonrv generates an entire Poisson PMF $P_X(x)$ for $x = 0, 1, \ldots, x_{\max}$ where $x_{\max} \ge 2 \cdot 10^6$. This is slow in several ways. First, we are repeating the calculation of $\sum_{j=1}^{x_{\max}} \log j$ with each call to poissonrv. Second, each call to poissonrv asks for a Poisson sample value with expected value $\alpha > 1 \cdot 10^6$. In these cases, for small values of $x$, $P_X(x) = \alpha^x e^{-\alpha}/x!$ is so small that it is less than the smallest nonzero number that Matlab can store!

To speed up the simulation, we have written a program bigpoissonrv which generates Poisson $(\alpha)$ samples for large $\alpha$. The program makes an approximation that for a Poisson $(\alpha)$ random variable $X$, $P_X(x) \approx 0$ for $|x - \alpha| > 6\sqrt{\alpha}$. Since $X$ has standard deviation $\sqrt{\alpha}$, we are assuming that $X$ cannot be more than six standard deviations away from its mean value. The error in this approximation is very small. In fact, for a Poisson $(a)$ random variable, the program poissonsigma(a,k) calculates the error $P[|X - a| > k\sqrt{a}]$. Here is poissonsigma.m and some simple calculations:

    function err=poissonsigma(a,k);
    xmin=max(0,floor(a-k*sqrt(a)));
    xmax=a+ceil(k*sqrt(a));
    sx=xmin:xmax;
    logfacts =cumsum([0,log(1:xmax)]);
    %logfacts includes 0 in case xmin=0
    %Now we extract needed values:
    logfacts=logfacts(sx+1);
    %pmf is the Poisson (a) PMF
    % from xmin to xmax
    pmf=exp(-a+ (log(a)*sx)-(logfacts));
    err=1-sum(pmf);

    >> poissonsigma(1,6)
    ans = 1.0249e-005
    >> poissonsigma(10,6)
    ans = 2.5100e-007
    >> poissonsigma(100,6)
    ans = 1.2620e-008
    >> poissonsigma(1000,6)
    ans = 2.6777e-009
    >> poissonsigma(10000,6)
    ans = 1.8081e-009
    >> poissonsigma(100000,6)
    ans = -1.6383e-010

The error reported by poissonsigma(a,k) should always be positive. In fact, we observe negative errors for very large a. For large $\alpha$ and $x$, numerical calculation of $P_X(x) = \alpha^x e^{-\alpha}/x!$ is tricky because we are taking ratios of very large numbers. In fact, for $\alpha = x = 1{,}000{,}000$, Matlab calculation of $\alpha^x$ and $x!$ will report infinity while $e^{-\alpha}$ will evaluate as zero. Our method of calculating the Poisson $(\alpha)$ PMF is to use the fact that $\ln x! = \sum_{j=1}^x \ln j$ to calculate
\[ \exp(\ln P_X(x)) = \exp\left( x \ln \alpha - \alpha - \sum_{j=1}^x \ln j \right). \tag{1} \]
This method works reasonably well except that the calculation of the logarithm has finite precision. The consequence is that the calculated sum over the PMF can vary from 1 by a very small amount, on the order of $10^{-7}$ in our experiments. In our problem, the error is inconsequential; however, one should keep in mind that this may not be the case in other experiments using large Poisson random variables. In any case, we can conclude that within the accuracy of Matlab's simulated experiments, the approximations to be used by bigpoissonrv are not significant.

The other feature of bigpoissonrv is that for a vector alpha corresponding to expected values $[\alpha_1\ \cdots\ \alpha_m]'$, bigpoissonrv returns a vector X such that X(i) is a Poisson alpha(i) sample. The work of calculating the sum of logarithms is done only once for all calculated samples. The result is a significant savings in cpu time as long as the values of alpha are reasonably close to each other.

    function x=bigpoissonrv(alpha)
    %for vector alpha, returns a vector x such that
    % x(i) is a Poisson (alpha(i)) rv
    %set up Poisson CDF from xmin to xmax for each alpha(i)
    alpha=alpha(:);
    amin=min(alpha(:));
    amax=max(alpha(:));
    %Assume Poisson PMF is negligible +-6 sigma from the average
    xmin=max(0,floor(amin-6*sqrt(amax)));
    xmax=amax+ceil(6*sqrt(amax)); %set max range
    sx=xmin:xmax;
    %Now we include the basic code of poissonpmf (but starting at xmin)
    logfacts =cumsum([0,log(1:xmax)]); %include 0 in case xmin=0
    logfacts=logfacts(sx+1); %extract needed values
    %pmf(i,:) is a Poisson alpha(i) PMF from xmin to xmax
    pmf=exp(-alpha*ones(size(sx))+ ...
        (log(alpha)*sx)-(ones(size(alpha))*logfacts));
    cdf=cumsum(pmf,2); %each row is a cdf
    %inverse CDF method: x(i) is xmin plus the number of
    %cdf(i,:) values that fall below the uniform sample for row i
    x=xmin+sum((rand(size(alpha))*ones(size(sx)))>cdf,2);

Finally, given bigpoissonrv, we can write a short program lottery that simulates trials of the jackpot experiment. Ideally, we would like to use lottery to perform $m = 1{,}000$ trials in a single pass. In general, Matlab is more efficient when calculations are executed in parallel using vectors. However, in bigpoissonrv, the matrix pmf will have $m$ rows and at least $12\sqrt{\alpha} = 12{,}000$ columns. For $m$ more than several hundred, Matlab running on my laptop reported an "Out of Memory" error. Thus, we wrote the program lottery to perform M trials at once and to repeat that N times. The output is an $M \times N$ matrix where each $i,j$ entry is a sample jackpot after seven days.
Thus it is not surprising that the histogram is centered around a jackpot of 1.function jackpot=lottery(jstart.M.200.7092 1. for n=1:N. end end Executing J=lottery(1e6. jackpot(:. This fact is explored in Chapter 6. you will see that the jackpot J has expected value E[J] = (3/2)7 × 106 = 1. disp(d).5 and initial jackpot jstart jackpot=zeros(M.000 sample jackpots.7) generates a matrix J of 2. and used more histogram bins.7082 1.7094 1.M. The command hist(J(:).709 1.7078 1.7084 1. jackpot(:. An example is shown here: 150 Frequency 100 50 0 1.708 × 107 dollars.N).7088 1.N.5*bigpoissonrv(jackpot(:.7086 1.1).n))).n)=jstart*ones(M.N.7096 7 J x 10 If you go back and solve Problem 5. If we did more trials.5.D) %Usage: function j=lottery(jstart.7076 1.5.D) %Perform M trials of the D day lottery %of Problem 5. 2. the number of phone calls needed to obtain the correct answer.2 Solution Let Y = X1 − X2 .1. the mean and variance of Y are E [Y ] = 100E [X] = 100p Var[Y ] = 100 Var[X] = 100p(1 − p) (3) Problem 6.Problem Solutions – Chapter 6 Problem 6.1. . (c) Using the same logic as in part (a) we recognize that in order for n to be the fourth correct answer. that the previous n − 1 calls must have contained exactly 3 correct answers and that the fourth correct answer arrived on the n-th call.1.1 says that the expected value of the difference is E [Y ] = E [X1 ] + E [−X2 ] = E [X] − E [X] = 0 (b) By Theorem 6. Y has the binomial PMF 100 y 100−y y = 0. . The PMF of X33 is   1−p x=0 p x=1 PX33 (x) = (1)  0 otherwise Note that each Xi has expected value E[X] = p and variance Var[X] = p(1 − p). . Hence. 0 otherwise (1) (b) N1 is a geometric random variable with parameter p = 1/4.1 and 6.3 Solution (a) The PMF of N1 . For our case. 5. n−1 n−4 (1/4)4 n = 4. .1 Solution The random variable X33 is a Bernoulli random variable that indicates the result of flip 33. then the previous n − 1 calls must have given wrong answers so that PN1 (n) = (3/4)n−1 (1/4) n = 1. E[N1 ] = 4. 
the mean of a geometric random variable is found to be 1/p. . by Theorems 6. This gives E[N4 ] = 4E[N1 ] = 16. (a) Since Y = X1 + (−X2 ). 3 (3/4) (2) PN4 (n4 ) = 0 otherwise (d) Using the hint given in the problem statement we can find the mean of N4 by summing up the means of the 4 identically distributed geometric random variables each with mean 4. 100 y p (1 − p) PY (y) = (2) 0 otherwise Since the Xi are independent.3. . the variance of the difference is Var[Y ] = Var[X1 ] + Var[−X2 ] = 2 Var[X] (2) (1) Problem 6. The random variable Y = X1 + · · · + X100 is the number of heads in 100 coin flips. 1. 2. This is described by a Pascal random variable. 233 . Theorem 6. can be determined by observing that if the correct answer is given on the nth call. . . In Theorem 2. .5. . FW (w) = P [X + Y ≤ w] = w 0 0 w−x 2 dy dx = 0 w 2(w − x) dx = w2 (8) Hence.1. In particular. the mean of Yn is E [Yn ] = E [Xn + Xn−1 + Xn−2 ] /3 = 0 234 (1) . for 0 ≤ w ≤ 1.2 which says that Var[W ] = Var[X] + Var[Y ] + 2 Cov [X.5 Solution This problem should be in either Chapter 10 or Chapter 11.Problem 6.4 Solution We can solve this problem using Theorem 6. Y ] = 2/18 − 2/36 = 1/18 (7) (6) For this specific problem. it should be apparent that E[Y ] = E[X] = 1/3 and Var[Y ] = Var[X] = 1/18. the first and second moments of W are E [W ] = 0 1 2w2 dw = 2/3 E W2 = 0 1 2w3 dw = 1/2 (10) The variance of W is Var[W ] = E[W 2 ] − (E[W ])2 = 1/18. the PDF of W is fW (w) = 2w 0 ≤ w ≤ 1 0 otherwise (9) From the PDF. we get the same answer both ways. Since each Xi has zero mean. by taking the derivative of the CDF. Problem 6. Y ] The first two moments of X are E [X] = 0 1 0 1 0 0 1−x 1−x (1) 2x dy dx = 0 1 2x(1 − x) dx = 1/3 1 (2) (3) (4) E X2 = 2x2 dy dx = 0 2x2 (1 − x) dx = 1/6 Thus the variance of X is Var[X] = E[X 2 ] − (E[X])2 = 1/18.1. By symmetry. it’s arguable whether it would easier to find Var[W ] by first deriving the CDF and PDF of W . 
Y ] = E [XY ] − E [X] E [Y ] = 1/12 − (1/3)2 = −1/36 Finally. To find the covariance. Not surprisingly. we first find the correlation E [XY ] = 0 1 0 1−x 2xy dy dx = 0 1 x(1 − x)2 dx = 1/12 (5) The covariance is Cov [X. the variance of the sum W = X + Y is Var[W ] = Var[X] + Var[Y ] − 2 Cov [X. Since Yn has zero mean, the variance of Yn is 2 Var[Yn ] = E Yn 1 = E (Xn + Xn−1 + Xn−2 )2 9 1 2 2 2 = E Xn + Xn−1 + Xn−2 + 2Xn Xn−1 + 2Xn Xn−2 + 2Xn−1 Xn−2 9 1 4 = (1 + 1 + 1 + 2/4 + 0 + 2/4) = 9 9 (2) (3) (4) (5) Problem 6.2.1 Solution The joint PDF of X and Y is fX,Y (x, y) = 2 0≤x≤y≤1 0 otherwise (1) We wish to find the PDF of W where W = X + Y . First we find the CDF of W , FW (w), but we must realize that the CDF will require different integrations for different values of w. Y Y=X w Area of Integration X+Y=w X w Y w Y=X For values of 0 ≤ w ≤ 1 we look to integrate the shaded area in the figure to the right. w w−x 2 w2 (2) 2 dy dx = FW (w) = 2 x 0 For values of w in the region 1 ≤ w ≤ 2 we look to integrate over the shaded region in the graph to the right. From the graph we see that we can integrate with respect to x first, ranging y from 0 to w/2, thereby covering the lower right triangle of the shaded region and leaving the upper trapezoid, which is accounted for in the second term of the following expression: FW (w) = 0 w 2 Area of Integration y 0 X+Y=w X w 2 dx dy + 1 w 2 w−y 0 2 dx dy (3) w2 (4) 2 Putting all the parts together gives the CDF FW (w) and (by taking the derivative) the PDF fW (w).   w<0  02  w 0≤w≤1   w 0≤w≤1 2 2−w 1≤w ≤2 (5) FW (w) = fW (w) = 2  2w − 1 − w 1 ≤ w ≤ 2   2 0 otherwise  1 w>2 = 2w − 1 − Problem 6.2.2 Solution The joint PDF of X and Y is fX,Y (x, y) = 1 0 ≤ x, y ≤ 1 0 otherwise 235 (1) Proceeding as in Problem 6.2.1, we must first find FW (w) by integrating over the square defined by 0 ≤ x, y ≤ 1. 
Again we are forced to find $F_W(w)$ in parts as we did in Problem 6.2.1, resulting in the following integrals for their appropriate regions. For 0 ≤ w ≤ 1,

\[ F_W(w) = \int_0^w \int_0^{w-x} dy \, dx = w^2/2 \tag{2} \]

For 1 ≤ w ≤ 2,

\[ F_W(w) = \int_0^{w-1} \int_0^1 dx \, dy + \int_{w-1}^1 \int_0^{w-y} dx \, dy = 2w - 1 - w^2/2 \tag{3} \]

The complete CDF $F_W(w)$ is shown below along with the corresponding PDF $f_W(w) = dF_W(w)/dw$.

\[ F_W(w) = \begin{cases} 0 & w < 0 \\ w^2/2 & 0 \le w \le 1 \\ 2w - 1 - w^2/2 & 1 \le w \le 2 \\ 1 & w > 2 \end{cases} \qquad f_W(w) = \begin{cases} w & 0 \le w \le 1 \\ 2 - w & 1 \le w \le 2 \\ 0 & \text{otherwise} \end{cases} \tag{4} \]

Problem 6.2.3 Solution

By using Theorem 6.5, we can find the PDF of W = X + Y by convolving the two exponential distributions. For µ ≠ λ,

\[ f_W(w) = \int_{-\infty}^{\infty} f_X(x) f_Y(w - x) \, dx \tag{1} \]
\[ = \int_0^w \lambda e^{-\lambda x} \mu e^{-\mu(w-x)} \, dx \tag{2} \]
\[ = \lambda\mu e^{-\mu w} \int_0^w e^{-(\lambda-\mu)x} \, dx \tag{3} \]
\[ = \begin{cases} \frac{\lambda\mu}{\lambda-\mu}\left(e^{-\mu w} - e^{-\lambda w}\right) & w \ge 0 \\ 0 & \text{otherwise} \end{cases} \tag{4} \]

When µ = λ, the previous derivation is invalid because of the denominator term λ − µ. For µ = λ, we have

\[ f_W(w) = \int_{-\infty}^{\infty} f_X(x) f_Y(w - x) \, dx \tag{5} \]
\[ = \int_0^w \lambda e^{-\lambda x} \lambda e^{-\lambda(w-x)} \, dx \tag{6} \]
\[ = \lambda^2 e^{-\lambda w} \int_0^w dx \tag{7} \]
\[ = \begin{cases} \lambda^2 w e^{-\lambda w} & w \ge 0 \\ 0 & \text{otherwise} \end{cases} \tag{8} \]

Note that when µ = λ, W is the sum of two iid exponential random variables and has a second order Erlang PDF.

Problem 6.2.4 Solution

In this problem, X and Y have joint PDF

\[ f_{X,Y}(x, y) = \begin{cases} 8xy & 0 \le y \le x \le 1 \\ 0 & \text{otherwise} \end{cases} \tag{1} \]

We can find the PDF of W using Theorem 6.4: $f_W(w) = \int_{-\infty}^{\infty} f_{X,Y}(x, w - x) \, dx$. The only tricky part remaining is to determine the limits of the integration. First, for w < 0, $f_W(w) = 0$. The two remaining cases are shown in the accompanying figure. The shaded area shows where the joint PDF $f_{X,Y}(x, y)$ is nonzero. The diagonal lines depict y = w − x as a function of x. The intersection of the diagonal line and the shaded area defines our limits of integration.

[Figure: the region 0 ≤ y ≤ x ≤ 1 in the (x, y) plane with the line y = w − x drawn for the two cases 0 < w < 1 and 1 < w < 2.]

For 0 ≤ w ≤ 1,

\[ f_W(w) = \int_{w/2}^{w} 8x(w - x) \, dx \tag{2} \]
\[ = \left. 4wx^2 - 8x^3/3 \right|_{w/2}^{w} = 2w^3/3 \tag{3} \]

For 1 ≤ w ≤ 2,

\[ f_W(w) = \int_{w/2}^{1} 8x(w - x) \, dx \tag{4} \]
\[ = \left. 4wx^2 - 8x^3/3 \right|_{w/2}^{1} \tag{5} \]
\[ = 4w - 8/3 - 2w^3/3 \tag{6} \]

Since X + Y ≤ 2, $f_W(w) = 0$ for w > 2.
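As a quick consistency check (not part of the original solution), the two nonzero pieces of $f_W(w)$ found above for Problem 6.2.4 should integrate to one over 0 ≤ w ≤ 2. A short worked calculation, under that assumption only:

```latex
\begin{align*}
\int_0^1 \frac{2w^3}{3}\,dw &= \frac{1}{6}, \\
\int_1^2 \Big(4w - \frac{8}{3} - \frac{2w^3}{3}\Big)\,dw
  &= \Big[\,2w^2 - \frac{8w}{3} - \frac{w^4}{6}\,\Big]_1^2
   = 0 - \Big(-\frac{5}{6}\Big) = \frac{5}{6}, \\
\frac{1}{6} + \frac{5}{6} &= 1 .
\end{align*}
```

The total probability is 1, which supports the limits of integration chosen for the two cases.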
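Hence the complete expression for the PDF of W is

\[ f_W(w) = \begin{cases} 2w^3/3 & 0 \le w \le 1 \\ 4w - 8/3 - 2w^3/3 & 1 \le w \le 2 \\ 0 & \text{otherwise} \end{cases} \tag{7} \]

Problem 6.2.5 Solution

We first find the CDF of W following the same procedure as in the proof of Theorem 6.4.

\[ F_W(w) = P[X \le Y + w] = \int_{-\infty}^{\infty} \int_{-\infty}^{y+w} f_{X,Y}(x, y) \, dx \, dy \tag{1} \]

By taking the derivative with respect to w, we obtain

\[ f_W(w) = \frac{dF_W(w)}{dw} = \int_{-\infty}^{\infty} \frac{d}{dw} \left( \int_{-\infty}^{y+w} f_{X,Y}(x, y) \, dx \right) dy \tag{2} \]
\[ = \int_{-\infty}^{\infty} f_{X,Y}(w + y, y) \, dy \tag{3} \]

With the variable substitution y = x − w, we have dy = dx and

\[ f_W(w) = \int_{-\infty}^{\infty} f_{X,Y}(x, x - w) \, dx \tag{4} \]

Problem 6.2.6 Solution

The random variables K and J have PMFs

\[ P_J(j) = \begin{cases} \alpha^j e^{-\alpha}/j! & j = 0, 1, 2, \ldots \\ 0 & \text{otherwise} \end{cases} \qquad P_K(k) = \begin{cases} \beta^k e^{-\beta}/k! & k = 0, 1, 2, \ldots \\ 0 & \text{otherwise} \end{cases} \tag{1} \]

For n ≥ 0, we can find the PMF of N = J + K via

\[ P[N = n] = \sum_{k=-\infty}^{\infty} P[J = n - k, K = k] \tag{2} \]

Since J and K are independent, non-negative random variables,

\[ P[N = n] = \sum_{k=0}^{n} P_J(n - k) P_K(k) \tag{3} \]
\[ = \sum_{k=0}^{n} \frac{\alpha^{n-k} e^{-\alpha}}{(n-k)!} \, \frac{\beta^k e^{-\beta}}{k!} \tag{4} \]
\[ = \frac{(\alpha+\beta)^n e^{-(\alpha+\beta)}}{n!} \underbrace{\sum_{k=0}^{n} \frac{n!}{k!(n-k)!} \left(\frac{\alpha}{\alpha+\beta}\right)^{n-k} \left(\frac{\beta}{\alpha+\beta}\right)^{k}}_{1} \tag{5} \]

The marked sum above equals 1 because it is the sum of a binomial PMF over all possible values. The PMF of N is the Poisson PMF

\[ P_N(n) = \begin{cases} (\alpha+\beta)^n e^{-(\alpha+\beta)}/n! & n = 0, 1, 2, \ldots \\ 0 & \text{otherwise} \end{cases} \tag{6} \]

Problem 6.3.1 Solution

For a constant a > 0, a zero mean Laplace random variable X has PDF

\[ f_X(x) = \frac{a}{2} e^{-a|x|} \qquad -\infty < x < \infty \tag{1} \]

The moment generating function of X is

\[ \phi_X(s) = E\left[e^{sX}\right] = \frac{a}{2} \int_{-\infty}^{0} e^{sx} e^{ax} \, dx + \frac{a}{2} \int_0^{\infty} e^{sx} e^{-ax} \, dx \tag{2} \]
\[ = \frac{a}{2} \left. \frac{e^{(s+a)x}}{s+a} \right|_{-\infty}^{0} + \frac{a}{2} \left. \frac{e^{(s-a)x}}{s-a} \right|_{0}^{\infty} \tag{3} \]
\[ = \frac{a}{2} \left( \frac{1}{s+a} - \frac{1}{s-a} \right) \tag{4} \]
\[ = \frac{a^2}{a^2 - s^2} \tag{5} \]

Problem 6.3.2 Solution

(a) By summing across the rows of the table, we see that J has PMF

\[ P_J(j) = \begin{cases} 0.6 & j = -2 \\ 0.4 & j = -1 \\ 0 & \text{otherwise} \end{cases} \tag{1} \]

The MGF of J is $\phi_J(s) = E[e^{sJ}] = 0.6e^{-2s} + 0.4e^{-s}$.

(b) Summing down the columns of the table, we see that K has PMF

\[ P_K(k) = \begin{cases} 0.7 & k = -1 \\ 0.2 & k = 0 \\ 0.1 & k = 1 \\ 0 & \text{otherwise} \end{cases} \tag{2} \]

The MGF of K is $\phi_K(s) = 0.7e^{-s} + 0.2 + 0.1e^{s}$.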
(c) To find the PMF of M = J + K, it is easiest to annotate each entry in the table with the corresponding value of M:

    P_{J,K}(j,k)   k = −1           k = 0            k = 1
    j = −2         0.42 (M = −3)    0.12 (M = −2)    0.06 (M = −1)
    j = −1         0.28 (M = −2)    0.08 (M = −1)    0.04 (M = 0)          (3)

We obtain $P_M(m)$ by summing over all j, k such that j + k = m, yielding

\[ P_M(m) = \begin{cases} 0.42 & m = -3 \\ 0.40 & m = -2 \\ 0.14 & m = -1 \\ 0.04 & m = 0 \\ 0 & \text{otherwise} \end{cases} \tag{4} \]

(d) One way to solve this problem is to find the MGF $\phi_M(s)$ and then take four derivatives. Sometimes it's better to just work with the definition of $E[M^4]$:

\[ E[M^4] = \sum_m P_M(m) m^4 \tag{5} \]
\[ = 0.42(-3)^4 + 0.40(-2)^4 + 0.14(-1)^4 + 0.04(0)^4 = 40.56 \tag{6} \]

As best I can tell, the purpose of this problem is to check that you know when not to use the methods in this chapter.

Problem 6.3.3 Solution

We find the MGF by calculating $E[e^{sX}]$ from the PDF $f_X(x)$.

\[ \phi_X(s) = E\left[e^{sX}\right] = \int_a^b e^{sx} \frac{1}{b-a} \, dx = \frac{e^{bs} - e^{as}}{s(b-a)} \tag{1} \]

Now to find the first moment, we evaluate the derivative of $\phi_X(s)$ at s = 0.

\[ E[X] = \left. \frac{d\phi_X(s)}{ds} \right|_{s=0} = \left. \frac{s\left(be^{bs} - ae^{as}\right) - \left(e^{bs} - e^{as}\right)}{(b-a)s^2} \right|_{s=0} \tag{2} \]

Direct evaluation of the above expression at s = 0 yields 0/0 so we must apply l'Hôpital's rule and differentiate the numerator and denominator.

\[ E[X] = \lim_{s \to 0} \frac{be^{bs} - ae^{as} + s\left(b^2 e^{bs} - a^2 e^{as}\right) - \left(be^{bs} - ae^{as}\right)}{2(b-a)s} \tag{3} \]
\[ = \lim_{s \to 0} \frac{b^2 e^{bs} - a^2 e^{as}}{2(b-a)} = \frac{b+a}{2} \tag{4} \]

To find the second moment of X, we first find that the second derivative of $\phi_X(s)$ is

\[ \frac{d^2\phi_X(s)}{ds^2} = \frac{s^2\left(b^2 e^{bs} - a^2 e^{as}\right) - 2s\left(be^{bs} - ae^{as}\right) + 2\left(e^{bs} - e^{as}\right)}{(b-a)s^3} \tag{5} \]

Substituting s = 0 will yield 0/0 so once again we apply l'Hôpital's rule and differentiate the numerator and denominator.

\[ E[X^2] = \lim_{s \to 0} \frac{d^2\phi_X(s)}{ds^2} = \lim_{s \to 0} \frac{s^2\left(b^3 e^{bs} - a^3 e^{as}\right)}{3(b-a)s^2} \tag{6} \]
\[ = \frac{b^3 - a^3}{3(b-a)} = (b^2 + ab + a^2)/3 \tag{7} \]

In this case, it is probably simpler to find these moments without using the MGF.

Problem 6.3.4 Solution

Using the moment generating function of X, $\phi_X(s) = e^{\sigma^2 s^2/2}$. We can find the nth moment of X, $E[X^n]$, by taking the nth derivative of $\phi_X(s)$ and setting s = 0.
\[ E[X] = \left. \sigma^2 s e^{\sigma^2 s^2/2} \right|_{s=0} = 0 \tag{1} \]
\[ E[X^2] = \left. \sigma^2 e^{\sigma^2 s^2/2} + \sigma^4 s^2 e^{\sigma^2 s^2/2} \right|_{s=0} = \sigma^2 \tag{2} \]

Continuing in this manner we find that

\[ E[X^3] = \left. \left(3\sigma^4 s + \sigma^6 s^3\right) e^{\sigma^2 s^2/2} \right|_{s=0} = 0 \tag{3} \]
\[ E[X^4] = \left. \left(3\sigma^4 + 6\sigma^6 s^2 + \sigma^8 s^4\right) e^{\sigma^2 s^2/2} \right|_{s=0} = 3\sigma^4 \tag{4} \]

To calculate the moments of Y, we define Y = X + µ so that Y is Gaussian (µ, σ). In this case the second moment of Y is

\[ E[Y^2] = E[(X + \mu)^2] = E[X^2 + 2\mu X + \mu^2] = \sigma^2 + \mu^2 \tag{5} \]

Similarly, the third moment of Y is

\[ E[Y^3] = E[(X + \mu)^3] = E[X^3 + 3\mu X^2 + 3\mu^2 X + \mu^3] = 3\mu\sigma^2 + \mu^3 \tag{6} \]

Finally, the fourth moment of Y is

\[ E[Y^4] = E[(X + \mu)^4] = E[X^4 + 4\mu X^3 + 6\mu^2 X^2 + 4\mu^3 X + \mu^4] = 3\sigma^4 + 6\mu^2\sigma^2 + \mu^4 \tag{7} \]

Problem 6.3.5 Solution

The PMF of K is

\[ P_K(k) = \begin{cases} 1/n & k = 1, 2, \ldots, n \\ 0 & \text{otherwise} \end{cases} \tag{1} \]

The corresponding MGF of K is

\[ \phi_K(s) = E\left[e^{sK}\right] = \frac{1}{n}\left(e^s + e^{2s} + \cdots + e^{ns}\right) \tag{2} \]
\[ = \frac{e^s}{n}\left(1 + e^s + e^{2s} + \cdots + e^{(n-1)s}\right) \tag{3} \]
\[ = \frac{e^s\left(e^{ns} - 1\right)}{n\left(e^s - 1\right)} \tag{4} \]

We can evaluate the moments of K by taking derivatives of the MGF. Some algebra will show that

\[ \frac{d\phi_K(s)}{ds} = \frac{ne^{(n+2)s} - (n+1)e^{(n+1)s} + e^s}{n\left(e^s - 1\right)^2} \tag{5} \]

Evaluating $d\phi_K(s)/ds$ at s = 0 yields 0/0. Hence, we apply l'Hôpital's rule twice (by twice differentiating the numerator and twice differentiating the denominator) when we write

\[ \left. \frac{d\phi_K(s)}{ds} \right|_{s=0} = \lim_{s \to 0} \frac{n(n+2)e^{(n+2)s} - (n+1)^2 e^{(n+1)s} + e^s}{2n\left(e^s - 1\right)} \tag{6} \]
\[ = \lim_{s \to 0} \frac{n(n+2)^2 e^{(n+2)s} - (n+1)^3 e^{(n+1)s} + e^s}{2ne^s} = (n+1)/2 \tag{7} \]

A significant amount of algebra will show that the second derivative of the MGF is

\[ \frac{d^2\phi_K(s)}{ds^2} = \frac{n^2 e^{(n+3)s} - (2n^2 + 2n - 1)e^{(n+2)s} + (n+1)^2 e^{(n+1)s} - e^{2s} - e^s}{n\left(e^s - 1\right)^3} \tag{8} \]

Evaluating $d^2\phi_K(s)/ds^2$ at s = 0 yields 0/0. Because $(e^s - 1)^3$ appears in the denominator, we need to use l'Hôpital's rule three times to obtain our answer.

\[ \left. \frac{d^2\phi_K(s)}{ds^2} \right|_{s=0} = \lim_{s \to 0} \frac{n^2(n+3)^3 e^{(n+3)s} - (2n^2 + 2n - 1)(n+2)^3 e^{(n+2)s} + (n+1)^5 e^{(n+1)s} - 8e^{2s} - e^s}{6ne^s} \tag{9} \]
\[ = \frac{n^2(n+3)^3 - (2n^2 + 2n - 1)(n+2)^3 + (n+1)^5 - 9}{6n} \tag{10} \]
\[ = (2n+1)(n+1)/6 \tag{11} \]

We can use these results to derive two well known results.
We observe that we can directly use the PMF $P_K(k)$ to calculate the moments

\[ E[K] = \frac{1}{n} \sum_{k=1}^{n} k \qquad E[K^2] = \frac{1}{n} \sum_{k=1}^{n} k^2 \tag{12} \]

Using the answers we found for E[K] and E[K²], we have the formulas

\[ \sum_{k=1}^{n} k = \frac{n(n+1)}{2} \qquad \sum_{k=1}^{n} k^2 = \frac{n(n+1)(2n+1)}{6} \tag{13} \]

Problem 6.4.1 Solution

N is a binomial (n = 100, p = 0.4) random variable. M is a binomial (n = 50, p = 0.4) random variable. Thus N is the sum of 100 independent Bernoulli (p = 0.4) random variables and M is the sum of 50 independent Bernoulli (p = 0.4) random variables. Since M and N are independent, L = M + N is the sum of 150 independent Bernoulli (p = 0.4) random variables. Hence L is a binomial (n = 150, p = 0.4) random variable and has PMF

\[ P_L(l) = \binom{150}{l} (0.4)^l (0.6)^{150-l} \tag{1} \]

Problem 6.4.2 Solution

Random variable Y has the moment generating function $\phi_Y(s) = 1/(1 - s)$. Random variable V has the moment generating function $\phi_V(s) = 1/(1 - s)^4$. Y and V are independent. W = Y + V.

(a) From Table 6.1, Y is an exponential (λ = 1) random variable. Example 6.5 derives the moments of an exponential (λ) random variable. For λ = 1, the moments of Y are

\[ E[Y] = 1, \qquad E[Y^2] = 2, \qquad E[Y^3] = 3! = 6 \tag{1} \]

(b) Since Y and V are independent, W = Y + V has MGF

\[ \phi_W(s) = \phi_Y(s)\phi_V(s) = \left(\frac{1}{1-s}\right)\left(\frac{1}{1-s}\right)^4 = \left(\frac{1}{1-s}\right)^5 \tag{2} \]

W is the sum of five independent exponential (λ = 1) random variables $X_1, \ldots, X_5$. (That is, W is an Erlang (n = 5, λ = 1) random variable.) Each $X_i$ has expected value E[X] = 1 and variance Var[X] = 1. From Theorem 6.1 and Theorem 6.3,

\[ E[W] = 5E[X] = 5, \qquad \mathrm{Var}[W] = 5\,\mathrm{Var}[X] = 5 \tag{3} \]

It follows that

\[ E[W^2] = \mathrm{Var}[W] + (E[W])^2 = 5 + 25 = 30 \tag{4} \]

Problem 6.4.3 Solution

In the iid random sequence $K_1, K_2, \ldots$, each $K_i$ has PMF

\[ P_K(k) = \begin{cases} 1-p & k = 0 \\ p & k = 1 \\ 0 & \text{otherwise} \end{cases} \tag{1} \]

(a) The MGF of K is $\phi_K(s) = E[e^{sK}] = 1 - p + pe^s$.

(b) By Theorem 6.8, $M = K_1 + K_2 + \cdots + K_n$ has MGF

\[ \phi_M(s) = [\phi_K(s)]^n = \left[1 - p + pe^s\right]^n \tag{2} \]

(c) Although we could just use the fact that the expectation of the sum equals the sum of the expectations, the problem asks us to find the moments using $\phi_M(s)$. In this case,

\[ E[M] = \left. \frac{d\phi_M(s)}{ds} \right|_{s=0} = \left. n\left(1 - p + pe^s\right)^{n-1} pe^s \right|_{s=0} = np \tag{3} \]

The second moment of M can be found via

\[ E[M^2] = \left. \frac{d^2\phi_M(s)}{ds^2} \right|_{s=0} \tag{4} \]
\[ = \left. np\left[(n-1)\left(1 - p + pe^s\right)^{n-2} pe^{2s} + \left(1 - p + pe^s\right)^{n-1} e^s\right] \right|_{s=0} \tag{5} \]
\[ = np[(n-1)p + 1] \tag{6} \]

The variance of M is

\[ \mathrm{Var}[M] = E[M^2] - (E[M])^2 = np(1-p) = n\,\mathrm{Var}[K] \tag{7} \]

Problem 6.4.4 Solution

Based on the problem statement, the number of points $X_i$ that you earn for game i has PMF

\[ P_{X_i}(x) = \begin{cases} 1/3 & x = 0, 1, 2 \\ 0 & \text{otherwise} \end{cases} \tag{1} \]

(a) The MGF of $X_i$ is

\[ \phi_{X_i}(s) = E\left[e^{sX_i}\right] = 1/3 + e^s/3 + e^{2s}/3 \tag{2} \]

Since $Y = X_1 + \cdots + X_n$, Theorem 6.8 implies

\[ \phi_Y(s) = [\phi_{X_i}(s)]^n = \left[1 + e^s + e^{2s}\right]^n / 3^n \tag{3} \]

(b) First we observe that the first and second moments of $X_i$ are

\[ E[X_i] = \sum_x x P_{X_i}(x) = 1/3 + 2/3 = 1 \tag{4} \]
\[ E[X_i^2] = \sum_x x^2 P_{X_i}(x) = 1^2/3 + 2^2/3 = 5/3 \tag{5} \]

Hence,

\[ \mathrm{Var}[X_i] = E[X_i^2] - (E[X_i])^2 = 2/3 \tag{6} \]

By Theorems 6.1 and 6.3, the mean and variance of Y are

\[ E[Y] = nE[X] = n \tag{7} \]
\[ \mathrm{Var}[Y] = n\,\mathrm{Var}[X] = 2n/3 \tag{8} \]

Another more complicated way to find the mean and variance is to evaluate derivatives of $\phi_Y(s)$ at s = 0.

Problem 6.4.5 Solution

Each $K_i$ has PMF

\[ P_{K_i}(k) = \begin{cases} 2^k e^{-2}/k! & k = 0, 1, 2, \ldots \\ 0 & \text{otherwise} \end{cases} \tag{1} \]

and we let $R_i = K_1 + K_2 + \cdots + K_i$.

(a) From Table 6.1, we find that the Poisson (α = 2) random variable K has MGF $\phi_K(s) = e^{2(e^s - 1)}$.

(b) The MGF of $R_i$ is the product of the MGFs of the $K_i$'s.

\[ \phi_{R_i}(s) = \prod_{n=1}^{i} \phi_K(s) = e^{2i(e^s - 1)} \tag{2} \]

(c) The MGF of $R_i$ is of the same form as that of the Poisson with parameter α = 2i. Therefore we can conclude that $R_i$ is in fact a Poisson random variable with parameter α = 2i. That is,

\[ P_{R_i}(r) = \begin{cases} (2i)^r e^{-2i}/r! & r = 0, 1, 2, \ldots \\ 0 & \text{otherwise} \end{cases} \tag{3} \]

(d) Because $R_i$ is a Poisson random variable with parameter α = 2i, the mean and variance of $R_i$ are then both 2i.

Problem 6.4.6 Solution

The total energy stored over the 31 days is

\[ Y = X_1 + X_2 + \cdots + X_{31} \tag{1} \]

The random variables $X_1, \ldots, X_{31}$ are Gaussian and independent but not identically distributed. However, since the sum of independent Gaussian random variables is Gaussian, we know that Y is Gaussian. Hence, all we need to do is find the mean and variance of Y in order to specify the PDF of Y. The mean of Y is

\[ E[Y] = \sum_{i=1}^{31} E[X_i] = \sum_{i=1}^{31} (32 - i/4) = 32(31) - \frac{31(32)}{8} = 868 \text{ kW-hr} \tag{2} \]

Since each $X_i$ has variance of 100 (kW-hr)², the variance of Y is

\[ \mathrm{Var}[Y] = \mathrm{Var}[X_1] + \cdots + \mathrm{Var}[X_{31}] = 31\,\mathrm{Var}[X_i] = 3100 \tag{3} \]

Since E[Y] = 868 and Var[Y] = 3100, the Gaussian PDF of Y is

\[ f_Y(y) = \frac{1}{\sqrt{6200\pi}} e^{-(y-868)^2/6200} \tag{4} \]

Problem 6.4.7 Solution

By Theorem 6.8, we know that $\phi_M(s) = [\phi_K(s)]^n$.

(a) The first derivative of $\phi_M(s)$ is

\[ \frac{d\phi_M(s)}{ds} = n[\phi_K(s)]^{n-1} \frac{d\phi_K(s)}{ds} \tag{1} \]

We can evaluate $d\phi_M(s)/ds$ at s = 0 to find E[M].

\[ E[M] = \left. \frac{d\phi_M(s)}{ds} \right|_{s=0} = \left. n[\phi_K(s)]^{n-1} \frac{d\phi_K(s)}{ds} \right|_{s=0} = nE[K] \tag{2} \]

(b) The second derivative of $\phi_M(s)$ is

\[ \frac{d^2\phi_M(s)}{ds^2} = n(n-1)[\phi_K(s)]^{n-2} \left(\frac{d\phi_K(s)}{ds}\right)^2 + n[\phi_K(s)]^{n-1} \frac{d^2\phi_K(s)}{ds^2} \tag{3} \]

Evaluating the second derivative at s = 0 yields

\[ E[M^2] = \left. \frac{d^2\phi_M(s)}{ds^2} \right|_{s=0} = n(n-1)(E[K])^2 + nE[K^2] \tag{4} \]

Problem 6.5.1 Solution

(a) From Table 6.1, we see that the exponential random variable X has MGF

\[ \phi_X(s) = \frac{\lambda}{\lambda - s} \tag{1} \]

(b) Note that K is a geometric random variable identical to the geometric random variable X in Table 6.1 with parameter p = 1 − q. From Table 6.1, we know that random variable K has MGF

\[ \phi_K(s) = \frac{(1-q)e^s}{1 - qe^s} \tag{2} \]

Since K is independent of each $X_i$, $V = X_1 + \cdots + X_K$ is a random sum of random variables. From Theorem 6.12,

\[ \phi_V(s) = \phi_K(\ln \phi_X(s)) = \frac{(1-q)\frac{\lambda}{\lambda - s}}{1 - q\frac{\lambda}{\lambda - s}} = \frac{(1-q)\lambda}{(1-q)\lambda - s} \tag{3} \]

We see that the MGF of V is that of an exponential random variable with parameter (1 − q)λ. The PDF of V is

\[ f_V(v) = \begin{cases} (1-q)\lambda e^{-(1-q)\lambda v} & v \ge 0 \\ 0 & \text{otherwise} \end{cases} \tag{4} \]

Problem 6.5.2 Solution

The number N of passes thrown has the Poisson PMF and MGF

\[ P_N(n) = \begin{cases} (30)^n e^{-30}/n! & n = 0, 1, \ldots \\ 0 & \text{otherwise} \end{cases} \qquad \phi_N(s) = e^{30(e^s - 1)} \tag{1} \]

Let $X_i = 1$ if pass i is thrown and completed and otherwise $X_i = 0$. The PMF and MGF of each $X_i$ is

\[ P_{X_i}(x) = \begin{cases} 1/3 & x = 0 \\ 2/3 & x = 1 \\ 0 & \text{otherwise} \end{cases} \qquad \phi_{X_i}(s) = 1/3 + (2/3)e^s \tag{2} \]

The number of completed passes can be written as the random sum of random variables

\[ K = X_1 + \cdots + X_N \tag{3} \]

Since each $X_i$ is independent of N, we can use Theorem 6.12 to write

\[ \phi_K(s) = \phi_N(\ln \phi_X(s)) = e^{30(\phi_X(s) - 1)} = e^{30(2/3)(e^s - 1)} \tag{4} \]

We see that K has the MGF of a Poisson random variable with mean E[K] = 30(2/3) = 20, variance Var[K] = 20, and PMF

\[ P_K(k) = \begin{cases} (20)^k e^{-20}/k! & k = 0, 1, \ldots \\ 0 & \text{otherwise} \end{cases} \tag{5} \]

Problem 6.5.3 Solution

In this problem, $Y = X_1 + \cdots + X_N$ is not a straightforward random sum of random variables because N and the $X_i$'s are dependent. In particular, given N = n, we know that there were exactly 100 heads in N flips. Hence, given N, $X_1 + \cdots + X_N = 100$ no matter what the actual value of N is. Hence Y = 100 every time and the PMF of Y is

\[ P_Y(y) = \begin{cases} 1 & y = 100 \\ 0 & \text{otherwise} \end{cases} \tag{1} \]

Problem 6.5.4 Solution

Donovan McNabb's passing yardage is the random sum of random variables

\[ V = Y_1 + \cdots + Y_K \tag{1} \]

where $Y_i$ has the exponential PDF

\[ f_{Y_i}(y) = \begin{cases} \frac{1}{15} e^{-y/15} & y \ge 0 \\ 0 & \text{otherwise} \end{cases} \tag{2} \]

From Table 6.1, the MGFs of Y and K are

\[ \phi_Y(s) = \frac{1/15}{1/15 - s} = \frac{1}{1 - 15s} \qquad \phi_K(s) = e^{20(e^s - 1)} \tag{3} \]

From Theorem 6.12, V has MGF

\[ \phi_V(s) = \phi_K(\ln \phi_Y(s)) = e^{20(\phi_Y(s) - 1)} = e^{300s/(1 - 15s)} \tag{4} \]

The PDF of V cannot be found in a simple form. However, we can use the MGF to calculate the mean and variance. In particular,

\[ E[V] = \left. \frac{d\phi_V(s)}{ds} \right|_{s=0} = \left. e^{300s/(1-15s)} \frac{300}{(1 - 15s)^2} \right|_{s=0} = 300 \tag{5} \]
\[ E[V^2] = \left. \frac{d^2\phi_V(s)}{ds^2} \right|_{s=0} = \left. e^{300s/(1-15s)} \left(\frac{300}{(1 - 15s)^2}\right)^2 + e^{300s/(1-15s)} \frac{9000}{(1 - 15s)^3} \right|_{s=0} = 99{,}000 \tag{6} \]

Thus, V has variance $\mathrm{Var}[V] = E[V^2] - (E[V])^2 = 9{,}000$ and standard deviation $\sigma_V \approx 94.9$. A second way to calculate the mean and variance of V is to use Theorem 6.13 which says

\[ E[V] = E[K]E[Y] = 20(15) = 300 \tag{7} \]
\[ \mathrm{Var}[V] = E[K]\,\mathrm{Var}[Y] + \mathrm{Var}[K](E[Y])^2 = (20)15^2 + (20)15^2 = 9000 \tag{8} \]

Problem 6.5.5 Solution

Since each ticket is equally likely to have one of $\binom{46}{6}$ combinations, the probability a ticket is a winner is

\[ q = \frac{1}{\binom{46}{6}} \tag{1} \]

Let $X_i = 1$ if the ith ticket sold is a winner; otherwise $X_i = 0$. Since the number K of tickets sold has a Poisson PMF with E[K] = r, the number of winning tickets is the random sum

\[ V = X_1 + \cdots + X_K \tag{2} \]

From Appendix A,

\[ \phi_X(s) = (1 - q) + qe^s \qquad \phi_K(s) = e^{r[e^s - 1]} \tag{3} \]

By Theorem 6.12,

\[ \phi_V(s) = \phi_K(\ln \phi_X(s)) = e^{r[\phi_X(s) - 1]} = e^{rq(e^s - 1)} \tag{4} \]

Hence, we see that V has the MGF of a Poisson random variable with mean E[V] = rq. The PMF of V is

\[ P_V(v) = \begin{cases} (rq)^v e^{-rq}/v! & v = 0, 1, 2, \ldots \\ 0 & \text{otherwise} \end{cases} \tag{5} \]

Problem 6.5.6 Solution

(a) We can view K as a shifted geometric random variable. To find the MGF, we start from first principles with Definition 6.1:

\[ \phi_K(s) = \sum_{k=0}^{\infty} e^{sk} p(1-p)^k = p \sum_{k=0}^{\infty} \left[(1-p)e^s\right]^k = \frac{p}{1 - (1-p)e^s} \tag{1} \]

(b) First, we need to recall that each $X_i$ has MGF $\phi_X(s) = e^{s + s^2/2}$. From Theorem 6.12, the MGF of R is

\[ \phi_R(s) = \phi_K(\ln \phi_X(s)) = \phi_K(s + s^2/2) = \frac{p}{1 - (1-p)e^{s + s^2/2}} \tag{2} \]

(c) To use Theorem 6.13, we first need to calculate the mean and variance of K:

\[ E[K] = \left. \frac{d\phi_K(s)}{ds} \right|_{s=0} = \left. \frac{p(1-p)e^s}{\left[1 - (1-p)e^s\right]^2} \right|_{s=0} = \frac{1-p}{p} \tag{3} \]
\[ E[K^2] = \left. \frac{d^2\phi_K(s)}{ds^2} \right|_{s=0} = \left. p(1-p) \frac{\left[1 - (1-p)e^s\right]e^s + 2(1-p)e^{2s}}{\left[1 - (1-p)e^s\right]^3} \right|_{s=0} = \frac{(1-p)(2-p)}{p^2} \tag{4} \]

Hence, $\mathrm{Var}[K] = E[K^2] - (E[K])^2 = (1-p)/p^2$. Finally, we can use Theorem 6.13 to write

\[ \mathrm{Var}[R] = E[K]\,\mathrm{Var}[X] + (E[X])^2\,\mathrm{Var}[K] = \frac{1-p}{p} + \frac{1-p}{p^2} = \frac{1 - p^2}{p^2} \tag{5} \]

Problem 6.5.7 Solution

The way to solve for the mean and variance of U is to use conditional expectations. Given K = k, $U = X_1 + \cdots + X_k$ and

\[ E[U | K = k] = E[X_1 + \cdots + X_k | X_1 + \cdots + X_n = k] \tag{1} \]
\[ = \sum_{i=1}^{k} E[X_i | X_1 + \cdots + X_n = k] \tag{2} \]

Since $X_i$ is a Bernoulli random variable,

\[ E[X_i | X_1 + \cdots + X_n = k] = P\Big[X_i = 1 \,\Big|\, \sum_{j=1}^{n} X_j = k\Big] \tag{3} \]
\[ = \frac{P\big[X_i = 1, \sum_{j \ne i} X_j = k - 1\big]}{P\big[\sum_{j=1}^{n} X_j = k\big]} \tag{4} \]

Note that $\sum_{j=1}^{n} X_j$ is just a binomial random variable for n trials while $\sum_{j \ne i} X_j$ is a binomial random variable for n − 1 trials. In addition, $X_i$ and $\sum_{j \ne i} X_j$ are independent random variables. This implies

\[ E[X_i | X_1 + \cdots + X_n = k] = \frac{P[X_i = 1] \, P\big[\sum_{j \ne i} X_j = k - 1\big]}{P\big[\sum_{j=1}^{n} X_j = k\big]} \tag{5} \]
\[ = \frac{p \binom{n-1}{k-1} p^{k-1} (1-p)^{n-1-(k-1)}}{\binom{n}{k} p^k (1-p)^{n-k}} = \frac{k}{n} \tag{6} \]

A second way is to argue that symmetry implies $E[X_i | X_1 + \cdots + X_n = k] = \gamma$, the same for each i. In this case,

\[ n\gamma = \sum_{i=1}^{n} E[X_i | X_1 + \cdots + X_n = k] = E[X_1 + \cdots + X_n | X_1 + \cdots + X_n = k] = k \tag{7} \]

Thus γ = k/n. At any rate, the conditional mean of U is

\[ E[U | K = k] = \sum_{i=1}^{k} E[X_i | X_1 + \cdots + X_n = k] = \sum_{i=1}^{k} \frac{k}{n} = \frac{k^2}{n} \tag{8} \]

This says that the random variable $E[U|K] = K^2/n$. Using iterated expectations, we have

\[ E[U] = E[E[U|K]] = E[K^2/n] \tag{9} \]

Since K is a binomial random variable, we know that E[K] = np and Var[K] = np(1 − p). Thus,

\[ E[U] = \frac{1}{n} E[K^2] = \frac{1}{n}\left(\mathrm{Var}[K] + (E[K])^2\right) = p(1-p) + np^2 \tag{10} \]

On the other hand, V is just an ordinary random sum of independent random variables, and its mean is $E[V] = E[X]E[M] = np^2$.

Problem 6.5.8 Solution

Using N to denote the number of games played, we can write the total number of points earned as the random sum

\[ Y = X_1 + X_2 + \cdots + X_N \tag{1} \]

(a) It is tempting to use Theorem 6.12 to find $\phi_Y(s)$; however, this would be wrong since each $X_i$ is not independent of N. In this problem, we must start from first principles using iterated expectations.

\[ \phi_Y(s) = E\left[E\left[e^{s(X_1 + \cdots + X_N)} | N\right]\right] = \sum_{n=1}^{\infty} P_N(n) E\left[e^{s(X_1 + \cdots + X_n)} | N = n\right] \tag{2} \]

Given N = n, $X_1, \ldots, X_n$ are independent so that

\[ E\left[e^{s(X_1 + \cdots + X_n)} | N = n\right] = E\left[e^{sX_1} | N = n\right] E\left[e^{sX_2} | N = n\right] \cdots E\left[e^{sX_n} | N = n\right] \tag{3} \]

Given N = n, we know that games 1 through n − 1 were either wins or ties and that game n was a loss. That is, given N = n, $X_n = 0$ and for i < n, $X_i \ne 0$. Moreover, for i < n, $X_i$ has the conditional PMF

\[ P_{X_i | N = n}(x) = P_{X_i | X_i \ne 0}(x) = \begin{cases} 1/2 & x = 1, 2 \\ 0 & \text{otherwise} \end{cases} \tag{4} \]

These facts imply

\[ E\left[e^{sX_n} | N = n\right] = e^0 = 1 \tag{5} \]

and that for i < n,

\[ E\left[e^{sX_i} | N = n\right] = (1/2)e^s + (1/2)e^{2s} = e^s/2 + e^{2s}/2 \tag{6} \]

Now we can find the MGF of Y.

\[ \phi_Y(s) = \sum_{n=1}^{\infty} P_N(n) E\left[e^{sX_1} | N = n\right] E\left[e^{sX_2} | N = n\right] \cdots E\left[e^{sX_n} | N = n\right] \tag{7} \]
\[ = \sum_{n=1}^{\infty} P_N(n) \left[e^s/2 + e^{2s}/2\right]^{n-1} = \frac{1}{e^s/2 + e^{2s}/2} \sum_{n=1}^{\infty} P_N(n) \left[e^s/2 + e^{2s}/2\right]^{n} \tag{8} \]

It follows that

\[ \phi_Y(s) = \frac{1}{e^s/2 + e^{2s}/2} \sum_{n=1}^{\infty} P_N(n) e^{n \ln[(e^s + e^{2s})/2]} = \frac{\phi_N\left(\ln[e^s/2 + e^{2s}/2]\right)}{e^s/2 + e^{2s}/2} \tag{9} \]

The tournament ends as soon as you lose a game. Since each game is a loss with probability 1/3 independent of any previous game, the number of games played has the geometric PMF and corresponding MGF

\[ P_N(n) = \begin{cases} (2/3)^{n-1}(1/3) & n = 1, 2, \ldots \\ 0 & \text{otherwise} \end{cases} \qquad \phi_N(s) = \frac{(1/3)e^s}{1 - (2/3)e^s} \tag{10} \]

Thus, the MGF of Y is

\[ \phi_Y(s) = \frac{1/3}{1 - (e^s + e^{2s})/3} \tag{11} \]

(b) To find the moments of Y, we evaluate the derivatives of the MGF $\phi_Y(s)$. Since

\[ \frac{d\phi_Y(s)}{ds} = \frac{e^s + 2e^{2s}}{9\left[1 - e^s/3 - e^{2s}/3\right]^2} \tag{12} \]

we see that

\[ E[Y] = \left. \frac{d\phi_Y(s)}{ds} \right|_{s=0} = \frac{3}{9(1/3)^2} = 3 \tag{13} \]

If you're curious, you may notice that E[Y] = 3 precisely equals $E[N]E[X_i]$, the answer you would get if you mistakenly assumed that N and each $X_i$ were independent. Although this may seem like a coincidence, it's actually the result of a theorem known as Wald's equality.

The second derivative of the MGF is

\[ \frac{d^2\phi_Y(s)}{ds^2} = \frac{\left(1 - e^s/3 - e^{2s}/3\right)\left(e^s + 4e^{2s}\right) + 2\left(e^s + 2e^{2s}\right)^2/3}{9\left(1 - e^s/3 - e^{2s}/3\right)^3} \tag{14} \]

The second moment of Y is

\[ E[Y^2] = \left. \frac{d^2\phi_Y(s)}{ds^2} \right|_{s=0} = \frac{5/3 + 6}{1/3} = 23 \tag{15} \]

The variance of Y is $\mathrm{Var}[Y] = E[Y^2] - (E[Y])^2 = 23 - 9 = 14$.

Problem 6.6.1 Solution

We know that the waiting time W is uniformly distributed on [0, 10] and therefore has the following PDF.

\[ f_W(w) = \begin{cases} 1/10 & 0 \le w \le 10 \\ 0 & \text{otherwise} \end{cases} \tag{1} \]

We also know that the total time is 3 milliseconds plus the waiting time, that is X = W + 3.

(a) The expected value of X is E[X] = E[W + 3] = E[W] + 3 = 5 + 3 = 8.

(b) The variance of X is Var[X] = Var[W + 3] = Var[W] = 25/3.

(c) The expected value of A is E[A] = 12E[X] = 96.

(d) The standard deviation of A is $\sigma_A = \sqrt{\mathrm{Var}[A]} = \sqrt{12(25/3)} = 10$.

(e) $P[A > 116] = 1 - \Phi\left(\frac{116 - 96}{10}\right) = 1 - \Phi(2) = 0.02275$.

(f) $P[A < 86] = \Phi\left(\frac{86 - 96}{10}\right) = \Phi(-1) = 1 - \Phi(1) = 0.1587$.

Problem 6.6.2 Solution

Knowing that the probability that a voice call occurs is 0.8 and the probability that a data call occurs is 0.2, we can define the random variable $D_i$ as the number of data calls in a single telephone call. It is obvious that for any i there are only two possible values for $D_i$, namely 0 and 1. Furthermore, for all i the $D_i$'s are independent and identically distributed with the following PMF.

\[ P_D(d) = \begin{cases} 0.8 & d = 0 \\ 0.2 & d = 1 \\ 0 & \text{otherwise} \end{cases} \tag{1} \]

From the above we can determine that

\[ E[D] = 0.2 \qquad \mathrm{Var}[D] = 0.2 - 0.04 = 0.16 \tag{2} \]

With these facts, we can answer the questions posed by the problem.

(a) $E[K_{100}] = 100E[D] = 20$.

(b) $\mathrm{Var}[K_{100}] = 100\,\mathrm{Var}[D] = 16$, so the standard deviation of $K_{100}$ is $\sqrt{16} = 4$.

(c) $P[K_{100} \ge 18] = 1 - \Phi\left(\frac{18 - 20}{4}\right) = 1 - \Phi(-1/2) = \Phi(1/2) = 0.6915$.

(d) $P[16 \le K_{100} \le 24] = \Phi\left(\frac{24 - 20}{4}\right) - \Phi\left(\frac{16 - 20}{4}\right) = \Phi(1) - \Phi(-1) = 2\Phi(1) - 1 = 0.6826$.

Problem 6.6.3 Solution

(a) Let $X_1, \ldots, X_{120}$ denote the set of call durations (measured in minutes) during the month. From the problem statement, each $X_i$ is an exponential (λ) random variable with $E[X_i] = 1/\lambda = 2.5$ min and $\mathrm{Var}[X_i] = 1/\lambda^2 = 6.25$ min². The total number of minutes used during the month is $Y = X_1 + \cdots + X_{120}$. By Theorem 6.1 and Theorem 6.3,

\[ E[Y] = 120E[X_i] = 300 \qquad \mathrm{Var}[Y] = 120\,\mathrm{Var}[X_i] = 750 \tag{1} \]

The subscriber's bill is $30 + 0.4(y - 300)^+$ where $x^+ = x$ if x ≥ 0 or $x^+ = 0$ if x < 0. Thus the subscriber's bill is exactly $36 if Y = 315. The probability the subscriber's bill exceeds $36 equals

\[ P[Y > 315] = P\left[\frac{Y - 300}{\sigma_Y} > \frac{315 - 300}{\sigma_Y}\right] = Q\left(\frac{15}{\sqrt{750}}\right) = 0.2919 \tag{2} \]

(b) If the actual call duration is $X_i$, the subscriber is billed for $M_i = \lceil X_i \rceil$ minutes. Because each $X_i$ is an exponential (λ) random variable, Theorem 3.9 says that $M_i$ is a geometric (p) random variable with $p = 1 - e^{-\lambda} = 0.3297$. Since $M_i$ is geometric,

\[ E[M_i] = \frac{1}{p} = 3.033 \qquad \mathrm{Var}[M_i] = \frac{1 - p}{p^2} = 6.167 \tag{3} \]

The number of billed minutes in the month is $B = M_1 + \cdots + M_{120}$. Since $M_1, \ldots, M_{120}$ are iid random variables,

\[ E[B] = 120E[M_i] = 364.0 \qquad \mathrm{Var}[B] = 120\,\mathrm{Var}[M_i] = 740.08 \tag{4} \]

Similar to part (a), the subscriber is billed $36 if B = 315 minutes. The probability the subscriber is billed more than $36 is

\[ P[B > 315] = P\left[\frac{B - 364}{\sqrt{740.08}} > \frac{315 - 364}{\sqrt{740.08}}\right] = Q(-1.8) = \Phi(1.8) = 0.964 \tag{5} \]

Problem 6.7.1 Solution

In Problem 6.2.6, we learned that a sum of iid Poisson random variables is a Poisson random variable. Hence $W_n$ is a Poisson random variable with mean $E[W_n] = nE[K] = n$. Thus $W_n$ has variance $\mathrm{Var}[W_n] = n$ and PMF

\[ P_{W_n}(w) = \begin{cases} n^w e^{-n}/w! & w = 0, 1, 2, \ldots \\ 0 & \text{otherwise} \end{cases} \tag{1} \]

All of this implies that we can exactly calculate

\[ P[W_n = n] = P_{W_n}(n) = n^n e^{-n}/n! \tag{2} \]

Since we can perform the exact calculation, using a central limit theorem may seem silly; however, calculating $n^n$ or $n!$ is difficult for large n. Moreover, it's interesting to see how good the approximation is. In this case, the approximation is

\[ P[W_n = n] = P[n \le W_n \le n] \approx \Phi\left(\frac{n + 0.5 - n}{\sqrt{n}}\right) - \Phi\left(\frac{n - 0.5 - n}{\sqrt{n}}\right) = 2\Phi\left(\frac{1}{2\sqrt{n}}\right) - 1 \tag{3} \]

The comparison of the exact calculation and the approximation is given in the following table.

    P[W_n = n]     n = 1     n = 4     n = 16    n = 64
    exact          0.3679    0.1954    0.0992    0.0498
    approximate    0.3829    0.1974    0.0995    0.0498          (4)

Problem 6.7.2 Solution

(a) Since the number of requests N has expected value E[N] = 300 and variance Var[N] = 300, we need C to satisfy

\[ P[N > C] = P\left[\frac{N - 300}{\sqrt{300}} > \frac{C - 300}{\sqrt{300}}\right] \tag{1} \]
\[ = 1 - \Phi\left(\frac{C - 300}{\sqrt{300}}\right) = 0.05 \tag{2} \]

From Table 3.1, we note that Φ(1.65) = 0.9505. Thus,

\[ C = 300 + 1.65\sqrt{300} = 328.6 \tag{3} \]

(b) For C = 328.6, the exact probability of overload is

\[ P[N > C] = 1 - P[N \le 328] = 1 - \texttt{poissoncdf(300,328)} = 0.0516 \tag{4} \]

which shows the central limit theorem approximation is reasonable.

(c) This part of the problem could be stated more carefully. Re-examining Definition 2.10 for the Poisson random variable and the accompanying discussion in Chapter 2, we observe that the webserver has an arrival rate of λ = 300 hits/min, or equivalently λ = 5 hits/sec. Thus in a one second interval, the number of requests N′ is a Poisson (α = 5) random variable. However, since the server "capacity" in a one second interval is not precisely defined, we will make the somewhat arbitrary definition that the server capacity is C′ = 328.6/60 = 5.477 packets/sec. With this somewhat arbitrary definition, the probability of overload in a one second interval is

\[ P[N' > C'] = 1 - P[N' \le 5.477] = 1 - P[N' \le 5] \tag{5} \]

Because the number of arrivals in the interval is small, it would be a mistake to use the Central Limit Theorem to estimate this overload probability. However, the direct calculation of the overload probability is not hard. For E[N′] = α = 5,

\[ 1 - P[N' \le 5] = 1 - \sum_{n=0}^{5} P_{N'}(n) = 1 - e^{-\alpha} \sum_{n=0}^{5} \frac{\alpha^n}{n!} = 0.3840 \tag{6} \]

(d) Here we find the smallest C such that P[N′ ≤ C] ≥ 0.95. From the previous step, we know that C > 5. Since N′ is a Poisson (α = 5) random variable, we need to find the smallest C such that

\[ P[N' \le C] = \sum_{n=0}^{C} \alpha^n e^{-\alpha}/n! \ge 0.95 \tag{7} \]

Some experiments with poissoncdf(alpha,c) will show that P[N′ ≤ 8] = 0.9319 while P[N′ ≤ 9] = 0.9682. Hence C = 9.

(e) If we use the Central Limit theorem to estimate the overload probability in a one second interval, we would use the facts that E[N′] = 5 and Var[N′] = 5 to estimate the overload probability as

\[ 1 - P[N' \le 5] = 1 - \Phi\left(\frac{5 - 5}{\sqrt{5}}\right) = 0.5 \tag{8} \]

which overestimates the overload probability by roughly 30 percent. We recall from Chapter 2 that a Poisson random variable is the limiting case of the (n, p) binomial random variable when n is large and np = α. In general, for fixed p, the Poisson and binomial PMFs become closer as n increases. Since large n is also the case for which the central limit theorem applies, it is not surprising that the CLT approximation for the Poisson (α) CDF is better when α = np is large.

Comment: Perhaps a more interesting question is why the overload probability in a one-second interval is so much higher than that in a one-minute interval. To answer this, consider a T-second interval in which the number of requests $N_T$ is a Poisson (λT) random variable while the server capacity is cT hits. In the earlier problem parts, c = 5.477 hits/sec. We make the assumption that the server system is reasonably well-engineered in that c > λ. (We will learn in Chapter 12 that to assume otherwise means that the backlog of requests will grow without bound.) Further, the CLT estimate of the overload probability in a T-second interval is

\[ P[N_T > cT] = P\left[\frac{N_T - \lambda T}{\sqrt{\lambda T}} > \frac{cT - \lambda T}{\sqrt{\lambda T}}\right] \approx Q\left(k\sqrt{T}\right) \tag{9} \]

where $k = (c - \lambda)/\sqrt{\lambda}$. As long as c > λ, the overload probability decreases with increasing T. In fact, the overload probability goes rapidly to zero as T becomes large. The reason is that the gap cT − λT between server capacity cT and the expected number of requests λT grows linearly in T while the standard deviation of the number of requests grows proportional to $\sqrt{T}$. However, one should add that the definition of a T-second overload is somewhat arbitrary. In fact, one can argue that as T becomes large, the requirement for no overloads simply becomes less stringent. In Chapter 12, we will learn techniques to analyze a system such as this webserver in terms of the average backlog of requests and the average delay in serving a request. These statistics won't depend on a particular time period T and perhaps better describe the system performance.

Problem 6.7.3 Solution

(a) The number of tests L needed to identify 500 acceptable circuits is a Pascal (k = 500, p = 0.8) random variable, which has expected value E[L] = k/p = 625 tests.

(b) Let K denote the number of acceptable circuits in n = 600 tests. Since K is binomial (n = 600, p = 0.8), E[K] = np = 480 and Var[K] = np(1 − p) = 96. Using the CLT, we estimate the probability of finding at least 500 acceptable circuits as

\[ P[K \ge 500] = P\left[\frac{K - 480}{\sqrt{96}} \ge \frac{20}{\sqrt{96}}\right] \approx Q\left(\frac{20}{\sqrt{96}}\right) = 0.0206 \tag{1} \]

(c) Using Matlab, we observe that

    >> 1.0-binomialcdf(600,0.8,499)
    ans = 0.0215
assuming T is fairly large. we use the CLT to estimate the probability of overload in a T -second interval as P [NT ≥ cT ] = P cT − λT NT − λT √ ≥ √ λT λT √ =Q k T . solving the quadratic equation n− 500 p = (1. At s = c. By setting d 2 s /2 − sc = 2s − c = 0 ds 2 (2) we obtain s = c.35 × 10−4 3. 2 we use Table 3. c=1 c=2 c=3 c=4 c=5 Chernoff bound 0. we have that np − 500 = 1.29)2 1−p n p (4) we obtain n = 641.135 0.29. Since p = 0. we can write P [X ≥ c] = P [(X − µ)/σ ≥ (c − µ)/σ] = P [Z ≥ (c − µ)/σ] (1) Since Z is N [0. σ 2 ] random variable X. p) random variable K satisfies P [K ≥ 500] ≥ 0.1 with c replaced by (c − µ)/σ. implying z = −1. the Chernoff bound typically overestimates the true probability by roughly a factor of 10.1587 0.0013 3.29 2 np(1 − p).90. (3) Equivalently.17 × 10−5 2.3.1 and the fact that Q(c) = 1 − Φ(c). the CLT approximation yields 500 − np K − np ≥ ≈ 1 − Φ(z) = 0.87 × 10−7 (3) We see that in this case. for p = 0.2 Solution For an N [µ. Since E[K] = np and Var[K] = np(1−p). It follows that 1 − Φ(z) = Φ(−z) ≥ 0.73 × 10−6 Q(c) 0.1 Solution The N [0.011 3.(d) We need to find the smallest value of n such that the binomial (n.8.9.8. 1] random variable Z has MGF φZ (s) = es P [Z ≥ c] ≤ min e−sc es s≥0 2 /2 . (2) P [K ≥ 500] = P np(1 − p) np(1 − p) where z = (500 − np)/ np(1 − p). Problem 6.8. Hence the Chernoff bound for Z is 2 /2−sc 2 /2 = min es s≥0 (1) We can minimize es 2 /2−sc by minimizing the exponent s2 /2 − sc. Note that for c = 1. 1].8. This yields 2 2 P [X ≥ c] = P [Z ≥ (c − µ)/σ] ≤ e−(c−µ) /2σ (2) 255 .8.606 0.9. Thus we should test n = 642 circuits.0228 0. we can apply the result of Problem 6. Problem 6. The table below compares this upper bound to the true probability. the upper bound is P [Z ≥ c] ≤ e−c /2 . A complete expression for the Chernoff bound is 1 c<α (3) P [K ≥ c] ≤ αc ec e−α /cc c ≥ α Problem 6. In this case. λ = 1/2) random variable. Setting dh(s)/ds = αes − c = 0 yields es = c/α or s = ln(c/α). Note that for c < α. 
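To make the last observation of Problem 6.8.5 concrete, here is a minimal Python sketch rather than part of the original solution. It assumes a hypothetical example that the problem does not specify: X is exponential with mean 1, so that $\phi_X(s) = 1/(1-s)$ for s < 1, with threshold c = 2 and n = 5 samples. A grid search over s should find the same minimizer for $e^{-sc}\phi_X(s)$ and for its nth power.

```python
import math

def g(s, c):
    """e^{-sc} * phi_X(s) for X exponential with mean 1 (phi_X(s) = 1/(1-s), s < 1)."""
    return math.exp(-s * c) / (1.0 - s)

c, n = 2.0, 5                            # hypothetical threshold and sample size
grid = [i / 1000 for i in range(1000)]   # s = 0, 0.001, ..., 0.999

# Minimize the single-sample factor and its nth power over the same grid.
s_single = min(grid, key=lambda s: g(s, c))
s_power = min(grid, key=lambda s: g(s, c) ** n)

print(s_single, s_power)                 # the two minimizers
print(g(s_single, c) ** n)               # resulting Chernoff bound on P[M_n(X) >= c]
```

For this particular X, setting the derivative of $-sc - \ln(1-s)$ to zero gives $s = 1 - 1/c$, which the grid search can be compared against.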
Hence
\[ P[M_n(X) \ge c] = P[W_n \ge nc] \le \left(\min_{s \ge 0} e^{-sc} \phi_X(s)\right)^n \tag{3} \]

Problem 6.9.1 Solution

Note that $W_n$ is a binomial ($10^n$, 0.5) random variable. We need to calculate
\[ P[B_n] = P[0.499 \times 10^n \le W_n \le 0.501 \times 10^n] \tag{1} \]
\[ = P[W_n \le 0.501 \times 10^n] - P[W_n < 0.499 \times 10^n]. \tag{2} \]
A complication is that the event $W_n < w$ is not the same as $W_n \le w$ when w is an integer. In this case, we observe that
\[ P[W_n < w] = P[W_n \le \lceil w \rceil - 1] = F_{W_n}(\lceil w \rceil - 1) \tag{3} \]
Thus
\[ P[B_n] = F_{W_n}(0.501 \times 10^n) - F_{W_n}\left(\lceil 0.499 \times 10^n \rceil - 1\right) \tag{4} \]
For n = 1, ..., N, we can calculate $P[B_n]$ in this Matlab program:

    function pb=binomialcdftest(N);
    pb=zeros(1,N);
    for n=1:N,
        w=[0.499 0.501]*10^n;
        w(1)=ceil(w(1))-1;
        pb(n)=diff(binomialcdf(10^n,0.5,w));
    end

Unfortunately, on this user's machine (a Windows XP laptop), the program fails for N = 4. The problem, as noted earlier, is that binomialcdf.m uses binomialpmf.m, which fails for a binomial (10000, p) random variable. Of course, your mileage may vary. A slightly better solution is to use the bignomialcdf.m function, which is identical to binomialcdf.m except it calls bignomialpmf.m rather than binomialpmf.m. This enables calculations for larger values of n, although at some cost in numerical accuracy. Here is the code:

    function pb=bignomialcdftest(N);
    pb=zeros(1,N);
    for n=1:N,
        w=[0.499 0.501]*10^n;
        w(1)=ceil(w(1))-1;
        pb(n)=diff(bignomialcdf(10^n,0.5,w));
    end

For comparison, here are the outputs of the two programs:

    >> binomialcdftest(4)
    ans =
        0.2461    0.0796    0.0756       NaN
    >> bignomialcdftest(6)
    ans =
        0.2461    0.0796    0.0756    0.1663    0.4750    0.9546

The result 0.9546 for n = 6 corresponds to the exact probability in Example 6.15, which used the CLT to estimate the probability as 0.9544. Unfortunately for this user, for n = 7, bignomialcdftest(7) failed.

Problem 6.9.2 Solution

The Erlang (n, $\lambda = 1$) random variable X has expected value $E[X] = n/\lambda = n$ and variance $\mathrm{Var}[X] = n/\lambda^2 = n$. The PDF of X as well as the PDF of a Gaussian random variable Y with the same expected value and variance are
\[ f_X(x) = \begin{cases} \dfrac{x^{n-1} e^{-x}}{(n-1)!} & x \ge 0 \\ 0 & \text{otherwise} \end{cases} \qquad f_Y(x) = \frac{1}{\sqrt{2\pi n}}\, e^{-(x-n)^2/2n} \tag{1} \]
From the forms of the functions, it is not likely to be apparent that $f_X(x)$ and $f_Y(x)$ are similar. The following program plots $f_X(x)$ and $f_Y(x)$ for values of x within three standard deviations of the expected value n.

    function df=erlangclt(n);
    r=3*sqrt(n);
    x=(n-r):(2*r)/100:n+r;
    fx=erlangpdf(n,1,x);
    fy=gausspdf(n,sqrt(n),x);
    plot(x,fx,x,fy);
    df=fx-fy;

Below are sample outputs of erlangclt(n) for n = 4, 20, 100.

[Figure: plots of f_X(x) and f_Y(x) versus x in three panels: erlangclt(4), erlangclt(20), erlangclt(100).]

In the graphs we will see that as n increases, the Erlang PDF becomes increasingly similar to the Gaussian PDF of the same expected value and variance. This is not surprising since the Erlang (n, $\lambda$) random variable is the sum of n exponential random variables and the CLT says that the Erlang CDF should converge to a Gaussian CDF as n gets large.

On the other hand, the convergence should be viewed with some caution. For example, the mode (the peak value) of the Erlang PDF occurs at x = n - 1 while the mode of the Gaussian PDF is at x = n. This difference only appears to go away for n = 100 because the graph x-axis range is expanding. More important, the two PDFs are quite different far away from the center of the distribution. The Erlang PDF is always zero for x < 0 while the Gaussian PDF is always positive. For large positive x, the two distributions do not have the same exponential decay. Thus it's not a good idea to use the CLT to estimate probabilities of rare events such as {X > x} for extremely large values of x.

Problem 6.9.3 Solution

In this problem, we re-create the plots of Figure 6.3 except we use the binomial PMF and corresponding Gaussian PDF. Here is a Matlab program that compares the binomial (n, p) PMF and the Gaussian PDF with the same expected value and variance.

    function y=binomcltpmf(n,p)
    x=-1:17;
    xx=-1:0.05:17;
    y=binomialpmf(n,p,x);
    std=sqrt(n*p*(1-p));
    clt=gausspdf(n*p,std,xx);
    hold off;
    pmfplot(x,y,'\it x','\it p_X(x)    f_X(x)');
    hold on; plot(xx,clt); hold off;

Here are the output plots for p = 1/2 and n = 2, 4, 8, 16.

[Figure: p_X(x) and f_X(x) versus x in four panels: binomcltpmf(2,0.5), binomcltpmf(4,0.5), binomcltpmf(8,0.5), binomcltpmf(16,0.5).]

To see why the values of the PDF and PMF are roughly the same, consider the Gaussian random variable Y. For small $\Delta$,
\[ f_Y(x) \approx \frac{F_Y(x + \Delta/2) - F_Y(x - \Delta/2)}{\Delta}. \tag{1} \]
For $\Delta = 1$, we obtain
\[ f_Y(x) \approx F_Y(x + 1/2) - F_Y(x - 1/2). \tag{2} \]
Since the Gaussian CDF is approximately the same as the CDF of the binomial (n, p) random variable X, we observe for an integer x that
\[ f_Y(x) \approx F_X(x + 1/2) - F_X(x - 1/2) = P_X(x). \tag{3} \]
Although the equivalence in heights of the PMF and PDF is only an approximation, it can be useful for checking the correctness of a result.

Problem 6.9.4 Solution

Since the conv function is for convolving signals in time, we treat $P_{X_1}(x)$ and $P_{X_2}(x)$ as though they were signals in time starting at time x = 0. That is,
\[ \texttt{px1} = \begin{bmatrix} P_{X_1}(0) & P_{X_1}(1) & \cdots & P_{X_1}(25) \end{bmatrix} \tag{1} \]
\[ \texttt{px2} = \begin{bmatrix} P_{X_2}(0) & P_{X_2}(1) & \cdots & P_{X_2}(100) \end{bmatrix} \tag{2} \]
In particular, between its minimum and maximum values, the vector px2 must enumerate all integer values, including those which have zero probability. In addition, we write down sw=0:125 directly based on knowledge that the range enumerated by px1 and px2 corresponds to $X_1 + X_2$ having a minimum value of 0 and a maximum value of 125.

    %convx1x2.m
    sw=(0:125);
    px1=[0,0.04*ones(1,25)];
    px2=zeros(1,101);
    % index k+1 holds P_X2(k), so the values 10,20,...,100 sit at 11,21,...,101
    px2(10*(1:10)+1)=10*(1:10)/550;
    pw=conv(px1,px2);
    h=pmfplot(sw,pw,'\itw','\itP_W(w)');
    set(h,'LineWidth',0.25);

The resulting plot will be essentially identical to Figure 6.4. One final note: the command set(h,'LineWidth',0.25) is used to make the bars of the PMF thin enough to be resolved individually.

Problem 6.9.5 Solution

    sx1=(1:10);px1=0.1*ones(1,10);
    sx2=(1:20);px2=0.05*ones(1,20);
    sx3=(1:30);px3=ones(1,30)/30;
    [SX1,SX2,SX3]=ndgrid(sx1,sx2,sx3);
    [PX1,PX2,PX3]=ndgrid(px1,px2,px3);
    SW=SX1+SX2+SX3;
    PW=PX1.*PX2.*PX3;
    sw=unique(SW);
    pw=finitepmf(SW,PW,sw);
    h=pmfplot(sw,pw,'\itw','\itP_W(w)');
    set(h,'LineWidth',0.25);

Since the ndgrid function extends naturally to higher dimensions, this solution follows the logic of sumx1x2 in Example 6.19. The output of sumx1x2x3 is the plot of the PMF of W shown below. We use the command set(h,'LineWidth',0.25) to ensure that the bars of the PMF are thin enough to be resolved individually.

[Figure: PMF P_W(w) versus w for 0 ≤ w ≤ 60.]

Problem 6.9.6 Solution

    function [pw,sw]=sumfinitepmf(px,sx,py,sy);
    [SX,SY]=ndgrid(sx,sy);
    [PX,PY]=ndgrid(px,py);
    SW=SX+SY;PW=PX.*PY;
    sw=unique(SW);
    pw=finitepmf(SW,PW,sw);

sumfinitepmf generalizes the method of Example 6.19. The only difference is that the PMFs px and py and ranges sx and sy are not hard coded, but instead are function inputs.

As an example, suppose X is a discrete uniform (0, 20) random variable and Y is an independent discrete uniform (0, 80) random variable. The following program sum2unif will generate and plot the PMF of W = X + Y.

    %sum2unif.m
    sx=0:20;px=ones(1,21)/21;
    sy=0:80;py=ones(1,81)/81;
    [pw,sw]=sumfinitepmf(px,sx,py,sy);
    h=pmfplot(sw,pw,'\it w','\it P_W(w)');
    set(h,'LineWidth',0.25);

Here is the graph generated by sum2unif.

[Figure: PMF P_W(w) versus w for 0 ≤ w ≤ 100.]

Problem Solutions – Chapter 7

Problem 7.1.1 Solution

Recall that $X_1, X_2, \ldots, X_n$ are independent exponential random variables with mean value $\mu_X = 5$ so that for $x \ge 0$, $F_X(x) = 1 - e^{-x/5}$.

(a) Using Theorem 7.1, $\sigma^2_{M_n(X)} = \sigma_X^2/n$. Realizing that $\sigma_X^2 = 25$, we obtain
\[ \mathrm{Var}[M_9(X)] = \frac{\sigma_X^2}{9} = \frac{25}{9} \tag{1} \]

(b)
\[ P[X_1 \ge 7] = 1 - P[X_1 \le 7] \tag{2} \]
\[ = 1 - F_X(7) = 1 - (1 - e^{-7/5}) = e^{-7/5} \approx 0.247 \tag{3} \]

(c) First we express $P[M_9(X) > 7]$ in terms of $X_1, \ldots, X_9$.
\[ P[M_9(X) > 7] = 1 - P[M_9(X) \le 7] = 1 - P[(X_1 + \cdots + X_9) \le 63] \tag{4} \]
Now the probability that $M_9(X) > 7$ can be approximated using the Central Limit Theorem (CLT).
\[ P[M_9(X) > 7] = 1 - P[(X_1 + \cdots + X_9) \le 63] \approx 1 - \Phi\left(\frac{63 - 9\mu_X}{\sqrt{9}\,\sigma_X}\right) \tag{5} \]
\[ = 1 - \Phi(6/5) \tag{6} \]
Consulting Table 3.1 yields $P[M_9(X) > 7] \approx 0.1151$.

Problem 7.1.2 Solution

$X_1, X_2, \ldots, X_n$ are independent uniform random variables with mean value $\mu_X = 7$ and $\sigma_X^2 = 3$.

(a) Since $X_1$ is a uniform random variable, it must have a uniform PDF over an interval [a, b]. From Appendix A, we can look up that $\mu_X = (a+b)/2$ and that $\mathrm{Var}[X] = (b-a)^2/12$. Hence, given the mean and variance, we obtain the following equations for a and b.
\[ (b-a)^2/12 = 3, \qquad (a+b)/2 = 7 \tag{1} \]
Solving these equations yields a = 4 and b = 10 from which we can state the distribution of X.
\[ f_X(x) = \begin{cases} 1/6 & 4 \le x \le 10 \\ 0 & \text{otherwise} \end{cases} \tag{2} \]

(b) From Theorem 7.1, we know that
\[ \mathrm{Var}[M_{16}(X)] = \frac{\mathrm{Var}[X]}{16} = \frac{3}{16} \tag{3} \]

(c)
\[ P[X_1 \ge 9] = \int_9^{\infty} f_{X_1}(x)\, dx = \int_9^{10} (1/6)\, dx = 1/6 \tag{4} \]

(d) The variance of $M_{16}(X)$ is much less than $\mathrm{Var}[X_1]$. Hence, the PDF of $M_{16}(X)$ should be much more concentrated about E[X] than the PDF of $X_1$. Thus we should expect $P[M_{16}(X) > 9]$ to be much less than $P[X_1 > 9]$.
\[ P[M_{16}(X) > 9] = 1 - P[M_{16}(X) \le 9] = 1 - P[(X_1 + \cdots + X_{16}) \le 144] \tag{5} \]
By a Central Limit Theorem approximation, using $\sigma_X = \sqrt{3}$ from part (a),
\[ P[M_{16}(X) > 9] \approx 1 - \Phi\left(\frac{144 - 16\mu_X}{\sqrt{16}\,\sigma_X}\right) = 1 - \Phi\left(\frac{32}{4\sqrt{3}}\right) = 1 - \Phi(4.62) \approx 1.9 \times 10^{-6} \tag{6} \]
(Evaluating (6) with $\sigma_X = 3$ instead of $\sqrt{3}$ would give $1 - \Phi(2.66) = 0.0039$, which is inconsistent with the variance used in parts (a) and (b).) As we predicted, $P[M_{16}(X) > 9] \ll P[X_1 > 9]$.

Problem 7.1.3 Solution

This problem is in the wrong section since the standard error isn't defined until Section 7.3. However, if we peek ahead to this section, the problem isn't very hard. Given the sample mean estimate $M_n(X)$, the standard error is defined as the standard deviation $e_n = \sqrt{\mathrm{Var}[M_n(X)]}$. In our problem, we use samples $X_i$ to generate $Y_i = X_i^2$. For the sample mean $M_n(Y)$, we need to find the standard error
\[ e_n = \sqrt{\mathrm{Var}[M_n(Y)]} = \sqrt{\frac{\mathrm{Var}[Y]}{n}}. \tag{1} \]
Since X is a uniform (0, 1) random variable,
\[ E[Y] = E[X^2] = \int_0^1 x^2\, dx = 1/3, \tag{2} \]
\[ E[Y^2] = E[X^4] = \int_0^1 x^4\, dx = 1/5. \tag{3} \]
Thus $\mathrm{Var}[Y] = 1/5 - (1/3)^2 = 4/45$ and the sample mean $M_n(Y)$ has standard error
\[ e_n = \sqrt{\frac{4}{45n}}. \tag{4} \]

Problem 7.1.4 Solution

(a) Since $Y_n = X_{2n-1} + (-X_{2n})$, Theorem 6.1 says that the expected value of the difference is
\[ E[Y_n] = E[X_{2n-1}] + E[-X_{2n}] = E[X] - E[X] = 0 \tag{1} \]

(b) By Theorem 6.2, the variance of the difference between $X_{2n-1}$ and $X_{2n}$ is
\[ \mathrm{Var}[Y_n] = \mathrm{Var}[X_{2n-1}] + \mathrm{Var}[-X_{2n}] = 2\,\mathrm{Var}[X] \tag{2} \]
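Parts (a) and (b) of Problem 7.1.4 can be checked exactly on a small example. The sketch below is illustrative Python, not the manual's Matlab. It assumes a hypothetical distribution for X (a fair six-sided die) that is not part of the problem, builds the PMF of the difference of two independent copies by enumeration, and compares its mean and variance with 0 and 2 Var[X].

```python
from fractions import Fraction
from itertools import product

# Hypothetical example distribution: X is a fair six-sided die.
pmf_x = {x: Fraction(1, 6) for x in range(1, 7)}

def mean(pmf):
    """Expected value of a finite PMF given as {value: probability}."""
    return sum(v * p for v, p in pmf.items())

def variance(pmf):
    """Variance of a finite PMF given as {value: probability}."""
    m = mean(pmf)
    return sum((v - m) ** 2 * p for v, p in pmf.items())

# PMF of Y = X1 - X2 for independent X1, X2 that both have PMF pmf_x.
pmf_y = {}
for (x1, p1), (x2, p2) in product(pmf_x.items(), repeat=2):
    pmf_y[x1 - x2] = pmf_y.get(x1 - x2, 0) + p1 * p2

print(mean(pmf_y))                            # expected value of the difference
print(variance(pmf_y), 2 * variance(pmf_x))   # Var[Y] next to 2 Var[X]
```

Exact rational arithmetic is used so that the comparison does not depend on floating-point rounding.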
(c) Each $Y_n$ is the difference of two samples of X that are independent of the samples used by any other $Y_m$. Thus $Y_1, Y_2, \ldots$ is an iid random sequence. By Theorem 7.1, the mean and variance of $M_n(Y)$ are
\[ E[M_n(Y)] = E[Y_n] = 0 \tag{3} \]
\[ \mathrm{Var}[M_n(Y)] = \frac{\mathrm{Var}[Y_n]}{n} = \frac{2\,\mathrm{Var}[X]}{n} \tag{4} \]

Problem 7.2.1 Solution

If the average weight of a Maine black bear is 500 pounds with standard deviation equal to 100 pounds, we can use the Chebyshev inequality to upper bound the probability that a randomly chosen bear will be more than 200 pounds away from the average.
\[ P[|W - E[W]| \ge 200] \le \frac{\mathrm{Var}[W]}{200^2} \le \frac{100^2}{200^2} = 0.25 \tag{1} \]

Problem 7.2.2 Solution

We know from the Chebyshev inequality that
\[ P[|X - E[X]| \ge c] \le \frac{\sigma_X^2}{c^2} \tag{1} \]
Choosing $c = k\sigma_X$, we obtain
\[ P[|X - E[X]| \ge k\sigma_X] \le \frac{1}{k^2} \tag{2} \]
The actual probability the Gaussian random variable Y is more than k standard deviations from its expected value is
\[ P[|Y - E[Y]| \ge k\sigma_Y] = P[Y - E[Y] \le -k\sigma_Y] + P[Y - E[Y] \ge k\sigma_Y] \tag{3} \]
\[ = 2P\left[\frac{Y - E[Y]}{\sigma_Y} \ge k\right] \tag{4} \]
\[ = 2Q(k) \tag{5} \]
The following table compares the upper bound and the true probability: (6)

                       k = 1    k = 2    k = 3     k = 4          k = 5
    Chebyshev bound    1        0.250    0.111     0.0625         0.040
    2Q(k)              0.317    0.046    0.0027    6.33 × 10^-5   5.73 × 10^-7

The Chebyshev bound gets increasingly weak as k goes up. As an example, for k = 4, the bound exceeds the true probability by a factor of 1,000 while for k = 5 the bound exceeds the actual probability by a factor of nearly 100,000.

Problem 7.2.3 Solution

The hard part of this problem is to derive the PDF of the sum $W = X_1 + X_2 + X_3$ of iid uniform (0, 30) random variables. In this case, we need to use the techniques of Chapter 6 to convolve the three PDFs. To simplify our calculations, we will instead find the PDF of $V = Y_1 + Y_2 + Y_3$ where the $Y_i$ are iid uniform (0, 1) random variables. Then we use Theorem 3.20 to conclude that W = 30V is the sum of three iid uniform (0, 30) random variables.

To start, let $V_2 = Y_1 + Y_2$. Since each $Y_i$ has a PDF shaped like a unit area pulse, the PDF of $V_2$ is the triangular function
\[ f_{V_2}(v) = \begin{cases} v & 0 \le v \le 1 \\ 2 - v & 1 < v \le 2 \\ 0 & \text{otherwise} \end{cases} \tag{1} \]

[Figure: the triangular PDF f_{V_2}(v) versus v for 0 ≤ v ≤ 2.]

The PDF of V is the convolution integral
\[ f_V(v) = \int_{-\infty}^{\infty} f_{V_2}(y)\, f_{Y_3}(v - y)\, dy \tag{2} \]
\[ = \int_0^1 y\, f_{Y_3}(v - y)\, dy + \int_1^2 (2 - y)\, f_{Y_3}(v - y)\, dy. \tag{3} \]
Evaluation of these integrals depends on v through the function
\[ f_{Y_3}(v - y) = \begin{cases} 1 & v - 1 < y < v \\ 0 & \text{otherwise} \end{cases} \tag{4} \]
To compute the convolution, it is helpful to depict the three distinct cases. In each case, the square "pulse" is $f_{Y_3}(v - y)$ and the triangular pulse is $f_{V_2}(y)$.

[Figure: three panels showing the square pulse and the triangular pulse for the cases 0 ≤ v < 1, 1 ≤ v < 2, and 2 ≤ v < 3.]

From the graphs, we can compute the convolution for each case:
\[ 0 \le v < 1: \quad f_V(v) = \int_0^v y\, dy = \frac{v^2}{2} \tag{5} \]
\[ 1 \le v < 2: \quad f_V(v) = \int_{v-1}^1 y\, dy + \int_1^v (2 - y)\, dy = -v^2 + 3v - \frac{3}{2} \tag{6} \]
\[ 2 \le v < 3: \quad f_V(v) = \int_{v-1}^2 (2 - y)\, dy = \frac{(3 - v)^2}{2} \tag{7} \]
To complete the problem, we use Theorem 3.20 to observe that W = 30V is the sum of three iid uniform (0, 30) random variables. From Theorem 3.19,
\[ f_W(w) = \frac{1}{30} f_V(w/30) = \begin{cases} (w/30)^2/60 & 0 \le w < 30, \\ \left[-(w/30)^2 + 3(w/30) - 3/2\right]/30 & 30 \le w < 60, \\ \left[3 - (w/30)\right]^2/60 & 60 \le w < 90, \\ 0 & \text{otherwise.} \end{cases} \tag{8} \]
Finally, we can compute the exact probability
\[ P[W \ge 75] = \frac{1}{60} \int_{75}^{90} \left[3 - (w/30)\right]^2 dw = \left. -\frac{(3 - w/30)^3}{6} \right|_{75}^{90} = \frac{1}{48} \tag{9} \]
For comparison, the Markov inequality indicated that $P[W \ge 75] \le 3/5$ and the Chebyshev inequality showed that $P[W \ge 75] \le 1/4$. As we see, both inequalities are quite weak in this case.

Problem 7.2.4 Solution

On each roll of the dice, a success, namely snake eyes, occurs with probability p = 1/36. The number of trials, R, needed for three successes is a Pascal (k = 3, p) random variable with
\[ E[R] = 3/p = 108, \qquad \mathrm{Var}[R] = 3(1 - p)/p^2 = 3780. \tag{1} \]

(a) By the Markov inequality,
\[ P[R \ge 250] \le \frac{E[R]}{250} = \frac{54}{125} = 0.432. \tag{2} \]

(b) By the Chebyshev inequality,
\[ P[R \ge 250] = P[R - 108 \ge 142] = P[|R - 108| \ge 142] \tag{3} \]
\[ \le \frac{\mathrm{Var}[R]}{(142)^2} = 0.1875. \tag{4} \]

(c) The exact value is $P[R \ge 250] = 1 - \sum_{r=3}^{249} P_R(r)$. Since there is no way around summing the Pascal PMF to find the CDF, this is what pascalcdf does.

    >> 1-pascalcdf(3,1/36,249)
    ans =
        0.0299

Thus the Markov and Chebyshev inequalities are valid bounds but not good estimates of $P[R \ge 250]$.

Problem 7.3.1 Solution

For an arbitrary Gaussian ($\mu$, $\sigma$) random variable Y,
\[ P[\mu - \sigma \le Y \le \mu + \sigma] = P[-\sigma \le Y - \mu \le \sigma] \tag{1} \]
\[ = P\left[-1 \le \frac{Y - \mu}{\sigma} \le 1\right] \tag{2} \]
\[ = \Phi(1) - \Phi(-1) = 2\Phi(1) - 1 = 0.6827. \tag{3} \]
Note that Y can be any Gaussian random variable, including, for example, $M_n(X)$ when X is Gaussian. When X is not Gaussian, the same claim holds to the extent that the central limit theorem promises that $M_n(X)$ is nearly Gaussian for large n.

Problem 7.3.2 Solution

It should seem obvious that the result is true since $\mathrm{Var}[\hat{R}_n]$ going to zero implies the probability that $\hat{R}_n$ differs from $E[\hat{R}_n]$ is going to zero. Similarly, the difference between $E[\hat{R}_n]$ and r is also going to zero deterministically. Hence it ought to follow that $\hat{R}_n$ is converging to r in probability. Here are the details:

We must show that $\lim_{n\to\infty} P[|\hat{R}_n - r| \ge \epsilon] = 0$. First we note that $\hat{R}_n$ being asymptotically unbiased implies that $\lim_{n\to\infty} E[\hat{R}_n] = r$. Equivalently, given $\epsilon > 0$, there exists $n_0$ such that $|E[\hat{R}_n] - r|^2 \le \epsilon^2/4$ for all $n \ge n_0$.

Second, since $(a + b)^2 \le 2a^2 + 2b^2$ for all real a and b, we observe that
\[ |\hat{R}_n - r|^2 = \left|(\hat{R}_n - E[\hat{R}_n]) + (E[\hat{R}_n] - r)\right|^2 \le 2\left|\hat{R}_n - E[\hat{R}_n]\right|^2 + 2\left|E[\hat{R}_n] - r\right|^2. \tag{1} \]
Thus for all $n \ge n_0$,
\[ |\hat{R}_n - r|^2 \le 2\left|\hat{R}_n - E[\hat{R}_n]\right|^2 + \epsilon^2/2. \tag{2} \]
It follows for $n \ge n_0$ that
\[ P\left[|\hat{R}_n - r|^2 \ge \epsilon^2\right] \le P\left[2\left|\hat{R}_n - E[\hat{R}_n]\right|^2 + \epsilon^2/2 \ge \epsilon^2\right] = P\left[\left|\hat{R}_n - E[\hat{R}_n]\right|^2 \ge \epsilon^2/4\right] \tag{3} \]
By the Chebyshev inequality, we have that
\[ P\left[\left|\hat{R}_n - E[\hat{R}_n]\right|^2 \ge \epsilon^2/4\right] = P\left[\left|\hat{R}_n - E[\hat{R}_n]\right| \ge \epsilon/2\right] \le \frac{\mathrm{Var}[\hat{R}_n]}{(\epsilon/2)^2} \tag{4} \]
Combining these facts, we see for $n \ge n_0$ that
\[ P\left[|\hat{R}_n - r|^2 \ge \epsilon^2\right] \le \frac{\mathrm{Var}[\hat{R}_n]}{(\epsilon/2)^2}. \tag{5} \]
It follows that
\[ \lim_{n\to\infty} P\left[|\hat{R}_n - r|^2 \ge \epsilon^2\right] \le \lim_{n\to\infty} \frac{\mathrm{Var}[\hat{R}_n]}{(\epsilon/2)^2} = 0. \tag{6} \]
This proves that $\hat{R}_n$ is a consistent estimator.

Problem 7.3.3 Solution

This problem is really very simple. If we let $Y = X_1 X_2$ and for the ith trial, let $Y_i = X_1(i) X_2(i)$, then $\hat{R}_n = M_n(Y)$, the sample mean of random variable Y. By Theorem 7.5, $M_n(Y)$ is unbiased. Since $\mathrm{Var}[Y] = \mathrm{Var}[X_1 X_2] < \infty$, Theorem 7.7 tells us that $M_n(Y)$ is a consistent sequence.

Problem 7.3.4 Solution

(a) Since the rule that the expectation of a sum equals the sum of the expectations also holds for vectors,
\[ E[\mathbf{M}(n)] = \frac{1}{n} \sum_{i=1}^n E[\mathbf{X}(i)] = \frac{1}{n} \sum_{i=1}^n \boldsymbol{\mu}_X = \boldsymbol{\mu}_X. \tag{1} \]

(b) The jth component of $\mathbf{M}(n)$ is $M_j(n) = \frac{1}{n}\sum_{i=1}^n X_j(i)$, which is just the sample mean of $X_j$. Defining $A_j = \{|M_j(n) - \mu_j| \ge c\}$, we observe that
\[ P\left[\max_{j=1,\ldots,k} |M_j(n) - \mu_j| \ge c\right] = P[A_1 \cup A_2 \cup \cdots \cup A_k]. \tag{2} \]
Applying the Chebyshev inequality to $M_j(n)$, we find that
\[ P[A_j] \le \frac{\mathrm{Var}[M_j(n)]}{c^2} = \frac{\sigma_j^2}{nc^2}. \tag{3} \]
By the union bound,
\[ P\left[\max_{j=1,\ldots,k} |M_j(n) - \mu_j| \ge c\right] \le \sum_{j=1}^k P[A_j] \le \frac{1}{nc^2} \sum_{j=1}^k \sigma_j^2 \tag{4} \]
Since $\sum_{j=1}^k \sigma_j^2 < \infty$, $\lim_{n\to\infty} P[\max_{j=1,\ldots,k} |M_j(n) - \mu_j| \ge c] = 0$.

Problem 7.3.5 Solution

Note that we can write $Y_k$ as
\[ Y_k = \left(\frac{X_{2k-1} - X_{2k}}{2}\right)^2 + \left(\frac{X_{2k} - X_{2k-1}}{2}\right)^2 = \frac{(X_{2k} - X_{2k-1})^2}{2} \tag{1} \]
Hence,
\[ E[Y_k] = \frac{1}{2} E\left[X_{2k}^2 - 2X_{2k}X_{2k-1} + X_{2k-1}^2\right] = E[X^2] - (E[X])^2 = \mathrm{Var}[X] \tag{2} \]
Next we observe that $Y_1, Y_2, \ldots$ is an iid random sequence. If this independence is not obvious, consider that $Y_1$ is a function of $X_1$ and $X_2$, $Y_2$ is a function of $X_3$ and $X_4$, and so on. Since $X_1, X_2, \ldots$ is an iid sequence, $Y_1, Y_2, \ldots$ is an iid sequence. Hence, $E[M_n(Y)] = E[Y] = \mathrm{Var}[X]$, implying $M_n(Y)$ is an unbiased estimator of Var[X]. We can use Theorem 7.5 to prove that $M_n(Y)$ is consistent if we show that Var[Y] is finite. Since $\mathrm{Var}[Y] \le E[Y^2]$, it is sufficient to prove that $E[Y^2] < \infty$. Note that
\[ Y_k^2 = \frac{X_{2k}^4 - 4X_{2k}^3 X_{2k-1} + 6X_{2k}^2 X_{2k-1}^2 - 4X_{2k} X_{2k-1}^3 + X_{2k-1}^4}{4} \tag{3} \]
Taking expectations yields
\[ E[Y_k^2] = \frac{1}{2} E[X^4] - 2E[X^3]E[X] + \frac{3}{2}\left(E[X^2]\right)^2 \tag{4} \]
Hence, if the first four moments of X are finite, then $\mathrm{Var}[Y] \le E[Y^2] < \infty$. By Theorem 7.5, the sequence $M_n(Y)$ is consistent.

Problem 7.3.6 Solution

(a) From Theorem 6.2, we have
\[ \mathrm{Var}[X_1 + \cdots + X_n] = \sum_{i=1}^n \mathrm{Var}[X_i] + 2\sum_{i=1}^{n-1} \sum_{j=i+1}^n \mathrm{Cov}[X_i, X_j] \tag{1} \]
Note that $\mathrm{Var}[X_i] = \sigma^2$ and for j > i, $\mathrm{Cov}[X_i, X_j] = \sigma^2 a^{j-i}$. This implies
\[ \mathrm{Var}[X_1 + \cdots + X_n] = n\sigma^2 + 2\sigma^2 \sum_{i=1}^{n-1} \sum_{j=i+1}^n a^{j-i} \tag{2} \]
\[ = n\sigma^2 + 2\sigma^2 \sum_{i=1}^{n-1} \left(a + a^2 + \cdots + a^{n-i}\right) \tag{3} \]
\[ = n\sigma^2 + \frac{2a\sigma^2}{1-a} \sum_{i=1}^{n-1} \left(1 - a^{n-i}\right) \tag{4} \]
With some more algebra, we obtain
\[ \mathrm{Var}[X_1 + \cdots + X_n] = n\sigma^2 + \frac{2a\sigma^2}{1-a}(n-1) - \frac{2a\sigma^2}{1-a}\left(a + a^2 + \cdots + a^{n-1}\right) \tag{5} \]
\[ = \frac{n(1+a)\sigma^2}{1-a} - \frac{2a\sigma^2}{1-a} - 2\sigma^2 \left(\frac{a}{1-a}\right)^2 \left(1 - a^{n-1}\right) \tag{6} \]
Since $a/(1-a)$ and $1 - a^{n-1}$ are both nonnegative,
\[ \mathrm{Var}[X_1 + \cdots + X_n] \le n\sigma^2\, \frac{1+a}{1-a} \tag{7} \]

(b) Since the expected value of a sum equals the sum of the expected values,
\[ E[M(X_1, \ldots, X_n)] = \frac{E[X_1] + \cdots + E[X_n]}{n} = \mu \tag{8} \]
The variance of $M(X_1, \ldots, X_n)$ is
\[ \mathrm{Var}[M(X_1, \ldots, X_n)] = \frac{\mathrm{Var}[X_1 + \cdots + X_n]}{n^2} \le \frac{\sigma^2(1+a)}{n(1-a)} \tag{9} \]
Applying the Chebyshev inequality to $M(X_1, \ldots, X_n)$ yields
\[ P[|M(X_1, \ldots, X_n) - \mu| \ge c] \le \frac{\mathrm{Var}[M(X_1, \ldots, X_n)]}{c^2} \le \frac{\sigma^2(1+a)}{n(1-a)c^2} \tag{10} \]

(c) Taking the limit as n approaches infinity of the bound derived in part (b) yields
\[ \lim_{n\to\infty} P[|M(X_1, \ldots, X_n) - \mu| \ge c] \le \lim_{n\to\infty} \frac{\sigma^2(1+a)}{n(1-a)c^2} = 0 \tag{11} \]
Thus
\[ \lim_{n\to\infty} P[|M(X_1, \ldots, X_n) - \mu| \ge c] = 0 \tag{12} \]

Problem 7.3.7 Solution

(a) Since the expectation of the sum equals the sum of the expectations,
\[ E[\hat{\mathbf{R}}(n)] = \frac{1}{n} \sum_{m=1}^n E[\mathbf{X}(m)\mathbf{X}'(m)] = \frac{1}{n} \sum_{m=1}^n \mathbf{R} = \mathbf{R}. \tag{1} \]

(b) This proof follows the method used to solve Problem 7.3.4. The i, jth element of $\hat{\mathbf{R}}(n)$ is $\hat{R}_{i,j}(n) = \frac{1}{n}\sum_{m=1}^n X_i(m) X_j(m)$, which is just the sample mean of $X_i X_j$. Defining the event
\[ A_{i,j} = \left\{\left|\hat{R}_{i,j}(n) - E[X_i X_j]\right| \ge c\right\}, \tag{2} \]
we observe that
\[ P\left[\max_{i,j} \left|\hat{R}_{i,j}(n) - E[X_i X_j]\right| \ge c\right] = P\left[\cup_{i,j} A_{i,j}\right]. \tag{3} \]
Applying the Chebyshev inequality to $\hat{R}_{i,j}(n)$, we find that
\[ P[A_{i,j}] \le \frac{\mathrm{Var}[\hat{R}_{i,j}(n)]}{c^2} = \frac{\mathrm{Var}[X_i X_j]}{nc^2}. \tag{4} \]
By the union bound,
\[ P\left[\max_{i,j} \left|\hat{R}_{i,j}(n) - E[X_i X_j]\right| \ge c\right] \le \sum_{i,j} P[A_{i,j}] \le \frac{1}{nc^2} \sum_{i,j} \mathrm{Var}[X_i X_j] \tag{5} \]
By the result of Problem 4.11.8, $X_i X_j$, the product of jointly Gaussian random variables, has finite variance. Thus
\[ \sum_{i,j} \mathrm{Var}[X_i X_j] = \sum_{i=1}^k \sum_{j=1}^k \mathrm{Var}[X_i X_j] \le k^2 \max_{i,j} \mathrm{Var}[X_i X_j] < \infty. \tag{6} \]
It follows that
\[ \lim_{n\to\infty} P\left[\max_{i,j} \left|\hat{R}_{i,j}(n) - E[X_i X_j]\right| \ge c\right] \le \lim_{n\to\infty} \frac{k^2 \max_{i,j} \mathrm{Var}[X_i X_j]}{nc^2} = 0 \tag{7} \]

Problem 7.4.1 Solution

\[ P_X(x) = \begin{cases} 0.1 & x = 0 \\ 0.9 & x = 1 \\ 0 & \text{otherwise} \end{cases} \tag{1} \]

(a) E[X] is in fact the same as $P_X(1)$ because X is a Bernoulli random variable.

(b) We can use the Chebyshev inequality to find
\[ P[|M_{90}(X) - P_X(1)| \ge .05] = P[|M_{90}(X) - E[X]| \ge .05] \le \alpha \tag{2} \]
In particular, the Chebyshev inequality states that
\[ \alpha = \frac{\sigma_X^2}{90(.05)^2} = \frac{.09}{90(.05)^2} = 0.4 \tag{3} \]

(c) Now we wish to find the value of n such that $P[|M_n(X) - P_X(1)| \ge .03] \le .1$. From the Chebyshev inequality, we write
\[ 0.1 = \frac{\sigma_X^2}{n(.03)^2}. \tag{4} \]
Since $\sigma_X^2 = 0.09$, solving for n yields $n = 0.09/[(0.1)(.03)^2] = 1{,}000$.

Problem 7.4.2 Solution

$X_1, X_2, \ldots$ are iid random variables each with mean 75 and standard deviation 15.

(a) We would like to find the value of n such that
\[ P[74 \le M_n(X) \le 76] = 0.99 \tag{1} \]
When we know only the mean and variance of $X_i$, our only real tool is the Chebyshev inequality which says that
\[ P[74 \le M_n(X) \le 76] = 1 - P[|M_n(X) - E[X]| \ge 1] \tag{2} \]
\[ \ge 1 - \frac{\mathrm{Var}[X]}{n} = 1 - \frac{225}{n} \ge 0.99 \tag{3} \]
This yields $n \ge 22{,}500$.

(b) If each $X_i$ is Gaussian, the sample mean $M_n(X)$ will also be Gaussian with mean and variance
\[ E[M_n(X)] = E[X] = 75 \tag{4} \]
\[ \mathrm{Var}[M_n(X)] = \mathrm{Var}[X]/n = 225/n \tag{5} \]
In this case, with $\mu = 75$ and $\sigma = 15/\sqrt{n}$,
\[ P[74 \le M_n(X) \le 76] = \Phi\left(\frac{76 - \mu}{\sigma}\right) - \Phi\left(\frac{74 - \mu}{\sigma}\right) \tag{6} \]
\[ = \Phi(\sqrt{n}/15) - \Phi(-\sqrt{n}/15) \tag{7} \]
\[ = 2\Phi(\sqrt{n}/15) - 1 = 0.99 \tag{8} \]
Thus, n = 1,521.

Even under the Gaussian assumption, the number of samples n is so large that even if the $X_i$ are not Gaussian, the sample mean may be approximated by a Gaussian. Hence, about 1500 samples probably is about right. However, in the absence of any information about the PDF of $X_i$ beyond the mean and variance, we cannot make any guarantees stronger than that given by the Chebyshev inequality.

Problem 7.4.3 Solution

(a) Since $X_A$ is a Bernoulli (p = P[A]) random variable,
\[ E[X_A] = P[A] = 0.8, \qquad \mathrm{Var}[X_A] = P[A](1 - P[A]) = 0.16. \tag{1} \]

(b) Let $X_{A,i}$ denote $X_A$ on the ith trial. Since $\hat{P}_n(A) = M_n(X_A) = \frac{1}{n}\sum_{i=1}^n X_{A,i}$,
\[ \mathrm{Var}[\hat{P}_n(A)] = \frac{1}{n^2} \sum_{i=1}^n \mathrm{Var}[X_{A,i}] = \frac{P[A](1 - P[A])}{n}. \tag{2} \]

(c) Since $\hat{P}_{100}(A) = M_{100}(X_A)$, we can use Theorem 7.12(b) to write
\[ P\left[\left|\hat{P}_{100}(A) - P[A]\right| < c\right] \ge 1 - \frac{\mathrm{Var}[X_A]}{100c^2} = 1 - \frac{0.16}{100c^2} = 1 - \alpha. \tag{3} \]
For c = 0.1, $\alpha = 0.16/[100(0.1)^2] = 0.16$. Thus, with 100 samples, our confidence coefficient is $1 - \alpha = 0.84$.

(d) In this case, the number of samples n is unknown. Once again, we use Theorem 7.12(b) to write
\[ P\left[\left|\hat{P}_n(A) - P[A]\right| < c\right] \ge 1 - \frac{\mathrm{Var}[X_A]}{nc^2} = 1 - \frac{0.16}{nc^2} = 1 - \alpha. \tag{4} \]
For c = 0.1, we have confidence coefficient $1 - \alpha = 0.95$ if $\alpha = 0.16/[n(0.1)^2] = 0.05$, or n = 320.

Problem 7.4.4 Solution

Since $E[X] = \mu_X = p$ and $\mathrm{Var}[X] = p(1 - p)$, we use Theorem 7.12(b) to write
\[ P[|M_{100}(X) - p| < c] \ge 1 - \frac{p(1 - p)}{100c^2} = 1 - \alpha. \tag{1} \]
For confidence coefficient 0.99, we require
\[ \frac{p(1 - p)}{100c^2} \le 0.01 \quad \text{or} \quad c \ge \sqrt{p(1 - p)}. \tag{2} \]
Since p is unknown, we must ensure that the constraint is met for every value of p. The worst case occurs at p = 1/2 which maximizes p(1 - p). In this case, $c = \sqrt{1/4} = 1/2$ is the smallest value of c for which we have confidence coefficient of at least 0.99.

If $M_{100}(X) = 0.06$, our interval estimate for p is
\[ M_{100}(X) - c < p < M_{100}(X) + c. \tag{3} \]
Since $p \ge 0$, $M_{100}(X) = 0.06$ and c = 0.5 imply that our interval estimate is
\[ 0 \le p < 0.56. \tag{4} \]
Our interval estimate is not very tight because 100 samples is not very large for a confidence coefficient of 0.99.

Problem 7.4.5 Solution

First we observe that the interval estimate can be expressed as
\[ \left|\hat{P}_n(A) - P[A]\right| < 0.05. \tag{1} \]
Since $\hat{P}_n(A) = M_n(X_A)$ and $E[M_n(X_A)] = P[A]$, we can use Theorem 7.12(b) to write
\[ P\left[\left|\hat{P}_n(A) - P[A]\right| < 0.05\right] \ge 1 - \frac{\mathrm{Var}[X_A]}{n(0.05)^2}. \tag{2} \]
Note that $\mathrm{Var}[X_A] = P[A](1 - P[A]) \le 0.25$. Thus for confidence coefficient 0.9, we require that
\[ 1 - \frac{\mathrm{Var}[X_A]}{n(0.05)^2} \ge 1 - \frac{0.25}{n(0.05)^2} \ge 0.9. \tag{3} \]
This implies that $n \ge 1{,}000$ samples are needed.

Problem 7.4.6 Solution

Both questions can be answered using the following equation from Example 7.6:
\[ P\left[\left|\hat{P}_n(A) - P[A]\right| \ge c\right] \le \frac{P[A](1 - P[A])}{nc^2} \tag{1} \]
The unusual part of this problem is that we are given the true value of P[A]. Since P[A] = 0.01, we can write
\[ P\left[\left|\hat{P}_n(A) - P[A]\right| \ge c\right] \le \frac{0.0099}{nc^2} \tag{2} \]

(a) In this part, we meet the requirement by choosing c = 0.001 yielding
\[ P\left[\left|\hat{P}_n(A) - P[A]\right| \ge 0.001\right] \le \frac{9900}{n} \tag{3} \]
Thus to have confidence level 0.01, we require that $9900/n \le 0.01$. This requires $n \ge 990{,}000$.

(b) In this case, we meet the requirement by choosing $c = 10^{-3} P[A] = 10^{-5}$. This implies
\[ P\left[\left|\hat{P}_n(A) - P[A]\right| \ge c\right] \le \frac{P[A](1 - P[A])}{nc^2} = \frac{0.0099}{n10^{-10}} = \frac{9.9 \times 10^7}{n} \tag{4} \]
The confidence level 0.01 is met if $9.9 \times 10^7/n = 0.01$ or $n = 9.9 \times 10^9$.

Problem 7.4.7 Solution

Since the relative frequency of the error event E is $\hat{P}_n(E) = M_n(X_E)$ and $E[M_n(X_E)] = P[E]$, we can use Theorem 7.12(a) to write
\[ P\left[\left|\hat{P}_n(E) - P[E]\right| \ge c\right] \le \frac{\mathrm{Var}[X_E]}{nc^2}. \tag{1} \]
Note that $\mathrm{Var}[X_E] = P[E](1 - P[E])$ since $X_E$ is a Bernoulli (p = P[E]) random variable. Using the additional fact that $P[E] \le \epsilon$ and the fairly trivial fact that $1 - P[E] \le 1$, we can conclude that
\[ \mathrm{Var}[X_E] = P[E](1 - P[E]) \le P[E] \le \epsilon. \tag{2} \]
Thus
\[ P\left[\left|\hat{P}_n(E) - P[E]\right| \ge c\right] \le \frac{\mathrm{Var}[X_E]}{nc^2} \le \frac{\epsilon}{nc^2}. \tag{3} \]

Problem 7.5.1 Solution

In this problem, we have to keep straight that the Poisson expected value $\alpha = 1$ is a different $\alpha$ than the confidence coefficient $1 - \alpha$. That said, we will try to avoid using $\alpha$ for the confidence coefficient. Using X to denote the Poisson ($\alpha = 1$) random variable, the trace of the sample mean is the sequence $M_1(X), M_2(X), \ldots$ The confidence interval estimate of $\alpha$ has the form
\[ M_n(X) - c \le \alpha \le M_n(X) + c. \tag{1} \]
The confidence coefficient of the estimate based on n samples is
\[ P[M_n(X) - c \le \alpha \le M_n(X) + c] = P[\alpha - c \le M_n(X) \le \alpha + c] \tag{2} \]
\[ = P[-c \le M_n(X) - \alpha \le c]. \tag{3} \]
Since $\mathrm{Var}[M_n(X)] = \mathrm{Var}[X]/n = 1/n$, the 0.9 confidence interval shrinks with increasing n. In particular, $c = c_n$ will be a decreasing sequence. Using a Central Limit Theorem approximation, a 0.9 confidence implies
\[ 0.9 = P\left[\frac{-c_n}{\sqrt{1/n}} \le \frac{M_n(X) - \alpha}{\sqrt{1/n}} \le \frac{c_n}{\sqrt{1/n}}\right] \tag{4} \]
\[ = \Phi(c_n\sqrt{n}) - \Phi(-c_n\sqrt{n}) = 2\Phi(c_n\sqrt{n}) - 1. \tag{5} \]
Equivalently, $\Phi(c_n\sqrt{n}) = 0.95$ or $c_n = 1.65/\sqrt{n}$.

Thus, as a function of the number of samples n, we plot three functions: the sample mean $M_n(X)$, and the upper limit $M_n(X) + 1.65/\sqrt{n}$ and lower limit $M_n(X) - 1.65/\sqrt{n}$ of the 0.9 confidence interval. We use the Matlab function poissonmeanseq(n) to generate these sequences for n sample values.

    function M=poissonmeanseq(n);
    x=poissonrv(1,n);
    nn=(1:n)';
    M=cumsum(x)./nn;
    r=(1.65)./sqrt(nn);
    plot(nn,M,nn,M+r,nn,M-r);

Here are two output graphs:

[Figure: M_n(X) with its upper and lower confidence limits versus n in two panels: poissonmeanseq(60) and poissonmeanseq(600).]

Problem 7.5.2 Solution

For a Bernoulli (p = 1/2) random variable X, the sample mean $M_n(X)$ is the fraction of successes in n Bernoulli trials. That is, $M_n(X) = K_n/n$ where $K_n$ is a binomial (n, p = 1/2) random variable. Thus the probability the sample mean is within one standard error of (p = 1/2) is
\[ p_n = P\left[\frac{n}{2} - \frac{\sqrt{n}}{2} \le K_n \le \frac{n}{2} + \frac{\sqrt{n}}{2}\right] \tag{1} \]
\[ = P\left[K_n \le \frac{n}{2} + \frac{\sqrt{n}}{2}\right] - P\left[K_n < \frac{n}{2} - \frac{\sqrt{n}}{2}\right] \tag{2} \]
\[ = F_{K_n}\left(\frac{n}{2} + \frac{\sqrt{n}}{2}\right) - F_{K_n}\left(\left\lceil \frac{n}{2} - \frac{\sqrt{n}}{2} \right\rceil - 1\right) \tag{3} \]
Here is a Matlab function that graphs $p_n$ as a function of n for N steps alongside the output graph for bernoullistderr(50).

    function p=bernoullistderr(N);
    p=zeros(1,N);
    for n=1:N,
        r=[ceil((n-sqrt(n))/2)-1; ...
           (n+sqrt(n))/2];
        p(n)=diff(binomialcdf(n,0.5,r));
    end
    plot(1:N,p);
    ylabel('\it p_n');
    xlabel('\it n');

[Figure: p_n versus n for 0 ≤ n ≤ 50, output of bernoullistderr(50).]

The alternating up-down sawtooth pattern is a consequence of the CDF of $K_n$ jumping up at integer values. In particular, depending on the value of $\sqrt{n}/2$, every other value of $(n \pm \sqrt{n})/2$ exceeds the next integer and increases $p_n$. The less frequent but larger jumps in $p_n$ occur when $\sqrt{n}$ is an integer. For very large n, the central limit theorem takes over since the CDF of $M_n(X)$ converges to a Gaussian CDF. In this case, the sawtooth pattern dies out and $p_n$ will converge to 0.68, the probability that a Gaussian random variable is within one standard deviation of its expected value.

Finally, we note that the sawtooth pattern is essentially the same as the sawtooth pattern observed in Quiz 7.5 where we graphed the fraction of 1,000 Bernoulli sample mean traces that were within one standard error of the true expected value. Because the number of traces (1,000) in Quiz 7.5 was so large, the fraction of the number of traces within one standard error was always very close to the actual probability $p_n$.

Problem 7.5.3 Solution

First, we need to determine whether the relative performance of the two estimators depends on the actual value of $\lambda$. To address this, we observe that if Y is an exponential (1) random variable, then Theorem 3.20 tells us that $X = Y/\lambda$ is an exponential ($\lambda$) random variable. Thus if $Y_1, Y_2, \ldots$ are iid samples of Y, then $Y_1/\lambda, Y_2/\lambda, \ldots$ are iid samples of X. Moreover, the sample mean of X is
\[ M_n(X) = \frac{1}{n\lambda} \sum_{i=1}^n Y_i = \frac{1}{\lambda} M_n(Y) \tag{1} \]
Similarly, the sample variance of X satisfies
\[ V_n(X) = \frac{1}{n-1} \sum_{i=1}^n (X_i - M_n(X))^2 = \frac{1}{n-1} \sum_{i=1}^n \left(\frac{Y_i}{\lambda} - \frac{1}{\lambda} M_n(Y)\right)^2 = \frac{V_n(Y)}{\lambda^2} \tag{2} \]

In the simulation, each column of matrix x represents one trial. We start by using the commands

    z=lamest(1000,1000);
    plot(z(1,:),z(2,:),'bd')

to perform 1,000 trials. The plot command generates a scatter plot of the error pairs $(\hat{Z}_i, \tilde{Z}_i)$ for each trial, where $\hat{Z}$ is plotted on the x-axis and $\tilde{Z}$ is plotted on the y-axis. From the plot, it may not be obvious that one estimator is better than the other.
the estimators λ and λ are just scaled versions of the estimators for the case λ = 1. Now that we can simulate the errors generated by each estimator.n*m).:). 0. Note that mx is a row vector such that mx(i) is the sample mean for trial i. function z=lamest(n.z(2. z=[(1.^2)/(n-1).000 samples.15 ˆ ˜ ˆ In the scatter plot.m) returns the estimation errors for m trials of each estimator where each trial uses n iid exponential (1) samples. mx=sum(x)/n. each diamond marks an independent pair (Z.i) 0 −0. Thus vx is a row vector such that vx(i) is the sample variance for trial i.m).2 −0. (1. Finally. x=exponentialrv(1.1)*mx.1.We can conclude that ˆ λ= λ Mn (Y ) ˜ λ= λ Vn (Y ) (3) ˆ ˜ For λ = 1.m). ˆ one can observe that it appears that typical values for Z are in the interval (−0. In lamest.1000).1 −0.1 z(2. The matrix MX has mx(i) for every element in column i. vx=sum((x-MX). we calculate 276 .i) is the ˜ ˜ error Zi = λi − 1. z is a 2 × m matrix such that column i of z records the estimation errors for trial i./mx). by reading the axis ticks carefully.000 trials. To verify ˆ typical values for Z this observation.05 0. (Although it is outside the scope of this solution.2 0.000 trials. If ˜ ˆ ˆ ˆ λi and λi are the estimates for for trial i. MX=ones(n. we calculate the sample mean for each squared errors ˆ Mm (Z 2 ) = 1 m m i=1 ˆ Zi2 ˜ Mm (Z 2 ) = 1 m m i=1 ˜ Zi2 (4) From our Matlab experiment with m = 1. However. The function z=lamest(n.i) 0./sqrt(vx))]-1. each using 1. plot(z(1. it is interesting ˆ ˜ to note that the errors Z and Z appear to be positively correlated. Here is an example of the resulting scatter plot: 0.i) is the error Zi = λi − 1 while z(2. Hence it is sufficient to consider only the λ = 1 case.05) while ˜ are in the interval (−0. This suggests that Z may be superior. we need to determine which estimator is better. R=X*(X’)/ntest.0021 ˆ ˜ That is. 
    >> sum(z.^2,2)/1000
    ans =
        0.0010
        0.0021

That is, $M_{1,000}(\hat{Z}^2) = 0.0010$ and $M_{1,000}(\tilde{Z}^2) = 0.0021$. In fact, one can show (with a lot of work) for large $m$ that

$$M_m(\hat{Z}^2) \approx 1/m, \qquad M_m(\tilde{Z}^2) \approx 2/m, \tag{5}$$

and that

$$\lim_{m\to\infty}\frac{M_m(\tilde{Z}^2)}{M_m(\hat{Z}^2)} = 2. \tag{6}$$

(In this experiment the number of samples per trial equals the number of trials, $n = m = 1{,}000$, which is why the approximations in (5) are written in terms of $m$.) In short, the mean squared error of the $\tilde{\lambda}$ estimator is twice that of the $\hat{\lambda}$ estimator.

Problem 7.5.4 Solution

For the sample covariance matrix

$$\hat{\mathbf{R}}(n) = \frac{1}{n}\sum_{m=1}^{n}\mathbf{X}(m)\mathbf{X}'(m), \tag{1}$$

we wish to use a Matlab simulation to estimate the probability

$$p(n) = P\left[\max_{i,j}\left|\hat{R}_{ij} - I_{ij}\right| \ge 0.05\right]. \tag{2}$$

(In the original printing, 0.05 was 0.01, but that requirement demanded that n be so large that most installations of Matlab would grind to a halt on the calculations.)

The Matlab program uses a matrix algebra identity that may (or may not) be familiar. For a matrix

$$\mathbf{X} = \begin{bmatrix}\mathbf{x}(1) & \mathbf{x}(2) & \cdots & \mathbf{x}(n)\end{bmatrix} \tag{3}$$

with columns $\mathbf{x}(i)$, we can write

$$\mathbf{X}\mathbf{X}' = \sum_{i=1}^{n}\mathbf{x}(i)\mathbf{x}'(i). \tag{4}$$

    function p=diagtest(n,t);
    y=zeros(t,1); p=[ ];
    for ntest=n,
        disp(ntest)
        for i=1:t,
            X=gaussvector(0,eye(10),ntest);
            R=X*(X')/ntest;
            y(i)=(max(max(abs(R-eye(10))))>= 0.05);
        end
        p=[p, sum(y)/t];
    end
    semilogy(n,p);
    xlabel('\it n'); ylabel('\it p(n)');

The Matlab function diagtest(n,t) performs the calculation by generating t sample covariance matrices, each using n vectors x(i). These n vectors are generated by the gaussvector function. The vector x(i) is the ith column of the matrix X. The function diagtest estimates p(n) as the fraction of t trials for which the threshold 0.05 is exceeded. In addition, if the input n to diagtest is a vector, then the experiment is repeated for each value in the vector n.

The following commands estimate p(n) for n ∈ {10, 100, 1000, 10000} using t = 2000 trials:
    n=[10 100 1000 10000];
    p=diagtest(n,2000);

The output is

    p=
        1.0000 1.0000 1.0000 0.0035

We see that p(n) goes from roughly 1 to almost 0 in going from n = 1,000 to n = 10,000. To investigate this transition more carefully, we execute the commands

    nn=1000:500:10000;
    p=diagtest(nn,2000);

The output is shown in the following graph. We use a semilog plot to emphasize differences when p(n) is close to zero.

[Figure: semilog plot of p(n) versus n for n = 1000 to 10000.]

Beyond n = 1,000, the probability p(n) declines rapidly. The "bumpiness" of the graph for large n occurs because the probability p(n) is small enough that out of 2,000 trials, the 0.05 threshold is exceeded only a few times.

Note that if x has dimension greater than 10, then the value of n needed to ensure that p(n) is small would increase.

Problem 7.5.5 Solution

The difficulty in this problem is that although $E[X]$ exists, $E[X^2]$ and higher order moments are infinite. Thus $\mathrm{Var}[X]$ is also infinite. It also follows for any finite $n$ that the sample mean $M_n(X)$ has infinite variance. In this case, we cannot apply the Chebyshev inequality to the sample mean to show the convergence in probability of $M_n(X)$ to $E[X]$.

If $\lim_{n\to\infty} P[|M_n(X) - E[X]| \ge \epsilon] = p$, then there are two distinct possibilities:

• $p > 0$, or

• $p = 0$ but the Chebyshev inequality isn't a sufficiently powerful technique to verify this fact.

To resolve whether $p = 0$ (and the sample mean converges to the expected value) one can spend time trying to prove either $p = 0$ or $p > 0$. At this point, we try some simulation experiments to see if the experimental evidence points one way or the other.

As requested by the problem, we implement a Matlab function samplemeantest(n,a) to simulate one hundred traces of the sample mean when $E[X] = a$. Each trace is a length $n$ sequence $M_1(X), M_2(X), \ldots, M_n(X)$.
    function mx=samplemeantest(n,a);
    u=rand(n,100);
    x=a-2+(1./sqrt(1-u));
    d=((1:n)')*ones(1,100);
    mx=cumsum(x)./d;
    plot(mx);
    xlabel('\it n'); ylabel('\it M_n(X)');
    axis([0 n a-1 a+1]);

The n × 100 matrix x consists of iid samples of X. Taking cumulative sums along each column of x, and dividing row i by i, each column of mx is a length n sample mean trace. We then plot the traces.

The following graph was generated by samplemeantest(1000,5):

[Figure: one hundred sample mean traces Mn(X) versus n for n up to 1000, with the y-axis truncated to the range 4 to 6.]

Frankly, it is difficult to draw strong conclusions from the graph. If the sample sequences $M_n(X)$ are converging to $E[X]$, the convergence is fairly slow. Even after averaging 1,000 samples, typical values for the sample mean appear to range from $a - 0.5$ to $a + 0.5$. There may also be outlier sequences which are still off the charts since we truncated the y-axis range. On the other hand, the sample mean sequences do not appear to be diverging (which is also possible since $\mathrm{Var}[X] = \infty$).

Note the above graph was generated using $10^5$ sample values. Repeating the experiment with more samples, say samplemeantest(10000,5), will yield a similarly inconclusive result. Even if your version of Matlab can support the generation of 100 times as many samples, you won't know for sure whether the sample mean sequence always converges. On the other hand, the experiment is probably enough that if you pursue the analysis, you should start by trying to prove that $p = 0$. (This will make a fun problem for the third edition!)
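The moment claims at the start of this solution can be checked directly from the sampling line x=a-2+(1./sqrt(1-u)) in samplemeantest. The short derivation below is a sketch that assumes only that line; the auxiliary variable $W$ is introduced here for illustration and does not appear in the original solution.

```latex
% Let U be uniform (0,1) and W = (1-U)^{-1/2}, so that X = a - 2 + W.
\begin{align*}
P[W > w] &= P\left[1 - U < w^{-2}\right] = w^{-2}, \qquad w \ge 1,\\
f_W(w) &= 2w^{-3}, \qquad w \ge 1,\\
E[W] &= \int_1^\infty w \cdot 2w^{-3}\,dw = \int_1^\infty 2w^{-2}\,dw = 2,\\
E[W^2] &= \int_1^\infty w^2 \cdot 2w^{-3}\,dw = \int_1^\infty \frac{2}{w}\,dw = \infty.
\end{align*}
% Hence E[X] = a - 2 + E[W] = a, while E[X^2] and Var[X] are infinite.
```

Under this reading, the simulated samples have expected value $a$, as the function intends, and infinite variance, which is consistent with the slow and erratic behavior of the traces.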
Problem Solutions – Chapter 8

Problem 8.1.1 Solution

Assuming the coin is fair, we must choose a rejection region $R$ such that $\alpha = P[R] = 0.05$. We can choose a rejection region $R = \{L > r\}$. What remains is to choose $r$ so that $P[R] = 0.05$. Note that $L > l$ if we first observe $l$ tails in a row. Under the hypothesis that the coin is fair, $l$ tails in a row occurs with probability

$$P[L > l] = (1/2)^l. \tag{1}$$

Thus, we need

$$P[R] = P[L > r] = 2^{-r} = 0.05. \tag{2}$$

Thus, $r = -\log_2(0.05) = \log_2(20) = 4.32$. In this case, we reject the hypothesis that the coin is fair if $L \ge 5$. The significance level of the test is $\alpha = P[L > 4] = 2^{-4} = 0.0625$, which is close to but not exactly 0.05.

The shortcoming of this test is that we always accept the hypothesis that the coin is fair whenever heads occurs on the first, second, third or fourth flip. If the coin was biased such that the probability of heads was much higher than 1/2, say 0.8 or 0.9, we would hardly ever reject the hypothesis that the coin is fair. In that sense, our test cannot identify that kind of biased coin.

Problem 8.1.2 Solution

(a) We wish to develop a hypothesis test of the form

$$P[|K - E[K]| > c] = 0.05 \tag{1}$$

to determine if the coin we've been flipping is indeed a fair one. We would like to find the value of $c$, which will determine the upper and lower limits on how many heads we can get away from the expected number out of 100 flips and still accept our hypothesis. Under our fair coin hypothesis, the expected number of heads and the standard deviation of the process are

$$E[K] = 50, \qquad \sigma_K = \sqrt{100 \cdot 1/2 \cdot 1/2} = 5. \tag{2}$$

Now in order to find $c$ we make use of the central limit theorem and divide the above inequality through by $\sigma_K$ to arrive at

$$P\left[\frac{|K - E[K]|}{\sigma_K} > \frac{c}{\sigma_K}\right] = 0.05. \tag{3}$$

Taking the complement, we get

$$P\left[-\frac{c}{\sigma_K} \le \frac{K - E[K]}{\sigma_K} \le \frac{c}{\sigma_K}\right] = 0.95. \tag{4}$$

Using the Central Limit Theorem we can write

$$\Phi\left(\frac{c}{\sigma_K}\right) - \Phi\left(\frac{-c}{\sigma_K}\right) = 2\Phi\left(\frac{c}{\sigma_K}\right) - 1 = 0.95. \tag{5}$$

This implies $\Phi(c/\sigma_K) = 0.975$ or $c/5 = 1.96$. That is, $c = 9.8$ flips. So we see that if we observe more than 50 + 10 = 60 or fewer than 50 − 10 = 40 heads, then with significance level $\alpha \approx 0.05$ we should reject the hypothesis that the coin is fair.
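The CLT threshold in part (a) can be compared with the exact binomial (100, 1/2) probabilities. The following is a minimal sketch rather than part of the original solution; it uses only the built-in function gammaln, and the variable names pmf, alpha10 and alpha98 are chosen here for illustration.

```matlab
% Exact significance level of the two-sided test in part (a).
% Assumption: K is binomial (100, 1/2) under the fair-coin hypothesis.
n = 100;
k = 0:n;
% Binomial PMF evaluated in the log domain to avoid large factorials.
pmf = exp(gammaln(n+1) - gammaln(k+1) - gammaln(n-k+1) - n*log(2));
% Rule stated in the text: reject if K > 60 or K < 40.
alpha10 = sum(pmf(abs(k - 50) > 10));
% Rule implied by c = 9.8 before rounding: reject if |K - 50| > 9.8.
alpha98 = sum(pmf(abs(k - 50) > 9.8));
fprintf('P[|K-50| > 10]  = %.4f\n', alpha10);
fprintf('P[|K-50| > 9.8] = %.4f\n', alpha98);
```

Because K takes only integer values, the two rules differ only in whether exactly 40 or exactly 60 heads leads to rejection. The exact levels should fall on either side of 0.05 (roughly 0.035 and 0.057), which is why the text describes the significance level as approximately 0.05.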
(b) Now we wish to develop a test of the form

$$P[K > c] = 0.01. \tag{6}$$

Thus we need to find the value of $c$ that makes the above probability true. This value will tell us that if we observe more than $c$ heads, then with significance level $\alpha = 0.01$, we should reject the hypothesis that the coin is fair. To find this value of $c$ we look to evaluate the CDF

$$F_K(k) = \sum_{i=0}^{k}\binom{100}{i}(1/2)^{100}. \tag{7}$$

Computation reveals that $c \approx 62$ flips. So if we observe 62 or more heads, then with a significance level of 0.01 we should reject the fair coin hypothesis. Another way to obtain this result is to use a Central Limit Theorem approximation. First, we express our rejection region in terms of a zero mean, unit variance random variable:

$$P[K > c] = 1 - P[K \le c] = 1 - P\left[\frac{K - E[K]}{\sigma_K} \le \frac{c - E[K]}{\sigma_K}\right] = 0.01. \tag{8}$$

Since $E[K] = 50$ and $\sigma_K = 5$, the CLT approximation is

$$P[K > c] \approx 1 - \Phi\left(\frac{c - 50}{5}\right) = 0.01. \tag{9}$$

From Table 3.1, we have $(c - 50)/5 = 2.33$ or $c = 61.65$. Once again, we see that we reject the hypothesis if we observe 62 or more heads.

Problem 8.1.3 Solution

A reasonable test would reject the null hypothesis that the plant is operating normally if one or more of the chips fail the one-day test. Exactly how many should be tested and how many failures $N$ are needed to reject the null hypothesis would depend on the significance level of the test.

(a) The lifetime of a chip is $X$, an exponential ($\lambda$) random variable with $\lambda = (T/200)^2$. The probability $p$ that a chip passes the one-day test is

$$p = P[X \ge 1/365] = e^{-\lambda/365}. \tag{1}$$

For an $m$ chip test, the significance level of the test is

$$\alpha = P[N > 0] = 1 - P[N = 0] = 1 - p^m = 1 - e^{-m\lambda/365}. \tag{2}$$

(b) At $T = 100^\circ$, $\lambda = 1/4$ and we obtain a significance level of $\alpha = 0.01$ if

$$m = -\frac{365\ln(0.99)}{\lambda} = \frac{3.67}{\lambda} = 14.67. \tag{3}$$

In fact, at $m = 15$ chips, the significance level is $\alpha = 0.0102$.

(c) Raising $T$ raises the failure rate $\lambda = (T/200)^2$ and thus lowers $m = 3.67/\lambda$. In essence, raising the temperature makes a "tougher" test and thus requires fewer chips to be tested for the same significance level.

Problem 8.1.4 Solution

(a) The rejection region is $R = \{T > t_0\}$. The duration of a voice call has exponential PDF

$$f_T(t) = \begin{cases}(1/3)e^{-t/3} & t \ge 0,\\ 0 & \text{otherwise.}\end{cases} \tag{1}$$

The significance level of the test is

$$\alpha = P[T > t_0] = \int_{t_0}^{\infty} f_T(t)\,dt = e^{-t_0/3}. \tag{2}$$

(b) The significance level is $\alpha = 0.05$ if $t_0 = -3\ln\alpha = 8.99$ minutes.

Problem 8.1.5 Solution

In order to test just a small number of pacemakers, we test $n$ pacemakers and we reject the null hypothesis if any pacemaker fails the test. Moreover, we choose the smallest $n$ such that we meet the required significance level of the test.

The number of pacemakers that fail the test is $X$, a binomial ($n$, $q_0 = 10^{-4}$) random variable. The significance level of the test is

$$\alpha = P[X > 0] = 1 - P[X = 0] = 1 - (1 - q_0)^n. \tag{1}$$

For a significance level $\alpha = 0.01$, we have that

$$n = \frac{\ln(1 - \alpha)}{\ln(1 - q_0)} = 100.5. \tag{2}$$

Comment: For $\alpha = 0.01$, keep in mind that there is a one percent probability that a normal factory will fail the test. That is, a test failure is quite unlikely if the factory is operating normally.

Problem 8.1.6 Solution

Since the null hypothesis $H_0$ asserts that the two exams have the same mean and variance, we reject $H_0$ if the difference in sample means is large. That is, $R = \{|D| \ge d_0\}$.

Under $H_0$, the two sample means satisfy

$$E[M_A] = E[M_B] = \mu, \qquad \mathrm{Var}[M_A] = \mathrm{Var}[M_B] = \frac{\sigma^2}{n} = \frac{100}{n}. \tag{1}$$

Since $n$ is large, it is reasonable to use the Central Limit Theorem to approximate $M_A$ and $M_B$ as Gaussian random variables. Since $M_A$ and $M_B$ are independent, $D$ is also Gaussian with

$$E[D] = E[M_A] - E[M_B] = 0, \qquad \mathrm{Var}[D] = \mathrm{Var}[M_A] + \mathrm{Var}[M_B] = \frac{200}{n}. \tag{2}$$

Under the Gaussian assumption, we can calculate the significance level of the test as

$$\alpha = P[|D| \ge d_0] = 2\left(1 - \Phi(d_0/\sigma_D)\right). \tag{3}$$

For $\alpha = 0.05$, $\Phi(d_0/\sigma_D) = 0.975$, or $d_0 = 1.96\sigma_D = 1.96\sqrt{200/n}$. If $n = 100$ students take each exam, then $d_0 = 2.77$ and we reject the null hypothesis that the exams are the same if the sample means differ by more than 2.77 points.

Problem 8.2.1 Solution

For the MAP test, we must choose acceptance regions $A_0$ and $A_1$ for the two hypotheses $H_0$ and $H_1$. From Theorem 8.2, the MAP rule is

$$n \in A_0 \text{ if } \frac{P_{N|H_0}(n)}{P_{N|H_1}(n)} \ge \frac{P[H_1]}{P[H_0]}; \qquad n \in A_1 \text{ otherwise.} \tag{1}$$

Since $P_{N|H_i}(n) = \lambda_i^n e^{-\lambda_i}/n!$, the MAP rule becomes

$$n \in A_0 \text{ if } \left(\frac{\lambda_0}{\lambda_1}\right)^n e^{-(\lambda_0 - \lambda_1)} \ge \frac{P[H_1]}{P[H_0]}; \qquad n \in A_1 \text{ otherwise.} \tag{2}$$

Taking logarithms and assuming $\lambda_1 > \lambda_0$ yields the final form of the MAP rule

$$n \in A_0 \text{ if } n \le n^* = \frac{\lambda_1 - \lambda_0 + \ln(P[H_0]/P[H_1])}{\ln(\lambda_1/\lambda_0)}; \qquad n \in A_1 \text{ otherwise.} \tag{3}$$

From the MAP rule, we can get the ML rule by setting the a priori probabilities to be equal. This yields the ML rule

$$n \in A_0 \text{ if } n \le n^* = \frac{\lambda_1 - \lambda_0}{\ln(\lambda_1/\lambda_0)}; \qquad n \in A_1 \text{ otherwise.} \tag{4}$$

Problem 8.2.2 Solution

Hypotheses $H_0$ and $H_1$ have a priori probabilities $P[H_0] = 0.8$ and $P[H_1] = 0.2$ and likelihood functions

$$f_{T|H_0}(t) = \begin{cases}(1/3)e^{-t/3} & t \ge 0,\\ 0 & \text{otherwise,}\end{cases} \qquad f_{T|H_1}(t) = \begin{cases}(1/\mu_D)e^{-t/\mu_D} & t \ge 0,\\ 0 & \text{otherwise.}\end{cases} \tag{1}$$

The acceptance regions are $A_0 = \{t \mid t \le t_0\}$ and $A_1 = \{t \mid t > t_0\}$.

(a) The false alarm probability is

$$P_{\mathrm{FA}} = P[A_1|H_0] = \int_{t_0}^{\infty} f_{T|H_0}(t)\,dt = e^{-t_0/3}. \tag{2}$$

(b) The miss probability is

$$P_{\mathrm{MISS}} = P[A_0|H_1] = \int_{0}^{t_0} f_{T|H_1}(t)\,dt = 1 - e^{-t_0/\mu_D}. \tag{3}$$

(c) From Theorem 8.6, the maximum likelihood decision rule is

$$t \in A_0 \text{ if } \frac{f_{T|H_0}(t)}{f_{T|H_1}(t)} \ge 1; \qquad t \in A_1 \text{ otherwise.} \tag{4}$$

After some algebra, this rule simplifies to

$$t \in A_0 \text{ if } t \le t_{ML} = \frac{\ln(\mu_D/3)}{1/3 - 1/\mu_D}; \qquad t \in A_1 \text{ otherwise.} \tag{5}$$

When $\mu_D = 6$ minutes, $t_{ML} = 6\ln 2 = 4.16$ minutes. When $\mu_D = 10$ minutes, $t_{ML} = (30/7)\ln(10/3) = 5.16$ minutes.

(d) The ML rule is the same as the MAP rule when $P[H_0] = P[H_1]$. When $P[H_0] > P[H_1]$, the MAP rule (which minimizes the probability of an error) should enlarge the $A_0$ acceptance region. Thus we would expect $t_{MAP} > t_{ML}$.

(e) From Theorem 8.2, the MAP rule is

$$t \in A_0 \text{ if } \frac{f_{T|H_0}(t)}{f_{T|H_1}(t)} \ge \frac{P[H_1]}{P[H_0]} = \frac{1}{4}; \qquad t \in A_1 \text{ otherwise.} \tag{6}$$

This rule simplifies to

$$t \in A_0 \text{ if } t \le t_{MAP} = \frac{\ln(4\mu_D/3)}{1/3 - 1/\mu_D}; \qquad t \in A_1 \text{ otherwise.} \tag{7}$$

When $\mu_D = 6$ minutes, $t_{MAP} = 6\ln 8 = 12.48$ minutes. When $\mu_D = 10$ minutes, $t_{MAP} = (30/7)\ln(40/3) = 11.1$ minutes.

(f) For a given threshold $t_0$, we learned in parts (a) and (b) that

$$P_{\mathrm{FA}} = e^{-t_0/3}, \qquad P_{\mathrm{MISS}} = 1 - e^{-t_0/\mu_D}. \tag{8}$$

The Matlab program rocvoicedataout graphs both receiver operating curves. The program and the resulting ROC are shown here.
    t=0:0.05:30;
    PFA= exp(-t/3);
    PMISS6= 1-exp(-t/6);
    PMISS10=1-exp(-t/10);
    plot(PFA,PMISS6,PFA,PMISS10);
    legend('\mu_D=6','\mu_D=10');
    xlabel('\itP_{\rmFA}');
    ylabel('\itP_{\rmMISS}');

[Figure: receiver operating curves, PMISS versus PFA, for µD = 6 and µD = 10.]

As one might expect, larger $\mu_D$ resulted in reduced $P_{\mathrm{MISS}}$ for the same $P_{\mathrm{FA}}$.

Problem 8.2.3 Solution

By Theorem 8.5, the decision rule is

$$n \in A_0 \text{ if } L(n) = \frac{P_{N|H_0}(n)}{P_{N|H_1}(n)} \ge \gamma; \qquad n \in A_1 \text{ otherwise,} \tag{1}$$

where $\gamma$ is the largest possible value such that $\sum_{L(n)<\gamma} P_{N|H_0}(n) \le \alpha$.

Given $H_0$, $N$ is Poisson ($a_0 = 1{,}000$) while given $H_1$, $N$ is Poisson ($a_1 = 1{,}300$). We can solve for the acceptance set $A_0$ by observing that $n \in A_0$ if

$$\frac{P_{N|H_0}(n)}{P_{N|H_1}(n)} = \frac{a_0^n e^{-a_0}/n!}{a_1^n e^{-a_1}/n!} \ge \gamma. \tag{2}$$

Cancelling common factors and taking the logarithm, we find that $n \in A_0$ if

$$n\ln\frac{a_0}{a_1} \ge (a_0 - a_1) + \ln\gamma. \tag{3}$$

Since $\ln(a_0/a_1) < 0$, dividing through reverses the inequality and shows that

$$n \in A_0 \text{ if } n \le n^* = \frac{(a_0 - a_1) + \ln\gamma}{\ln(a_0/a_1)} = \frac{(a_1 - a_0) - \ln\gamma}{\ln(a_1/a_0)}; \qquad n \in A_1 \text{ otherwise.} \tag{4}$$

However, we still need to determine the constant $\gamma$. In fact, it is easier to work with the threshold $n^*$ directly. Note that $L(n) < \gamma$ if and only if $n > n^*$. Thus we choose the smallest $n^*$ such that

$$P[N > n^*|H_0] = \sum_{n > n^*} P_{N|H_0}(n) \le \alpha = 10^{-6}. \tag{5}$$

To find $n^*$ a reasonable approach would be to use a Central Limit Theorem approximation since, given $H_0$, $N$ is a Poisson (1,000) random variable, which has the same PMF as the sum of 1,000 independent Poisson (1) random variables. Given $H_0$, $N$ has expected value $a_0$ and variance $a_0$. From the CLT,

$$P[N > n^*|H_0] = P\left[\frac{N - a_0}{\sqrt{a_0}} > \frac{n^* - a_0}{\sqrt{a_0}}\,\Big|\,H_0\right] \approx Q\left(\frac{n^* - a_0}{\sqrt{a_0}}\right) \le 10^{-6}. \tag{6}$$

From Table 3.2, $Q(4.75) = 1.02 \times 10^{-6}$ and $Q(4.76) < 10^{-6}$, implying

$$n^* = a_0 + 4.76\sqrt{a_0} = 1150.5. \tag{7}$$

On the other hand, perhaps the CLT should be used with some caution since $\alpha = 10^{-6}$ implies we are using the CLT approximation far from the center of the distribution.
In fact, we can check our answer using the poissoncdf function:

    >> nstar=[1150 1151 1152 1153 1154 1155];
    >> (1.0-poissoncdf(1000,nstar))'
    ans =
      1.0e-005 *
        0.1644    0.1420    0.1225    0.1056    0.0910    0.0783
    >>

Thus we see that $n^* = 1154$. Using this threshold, the miss probability is

$$P[N \le n^*|H_1] = P[N \le 1154|H_1] = \texttt{poissoncdf(1300,1154)} = 1.98 \times 10^{-5}. \tag{8}$$

Keep in mind that this is the smallest possible $P_{\mathrm{MISS}}$ subject to the constraint that $P_{\mathrm{FA}} \le 10^{-6}$.

Problem 8.2.4 Solution

(a) Given $H_0$, $X$ is Gaussian (0, 1). Given $H_1$, $X$ is Gaussian (4, 1). From Theorem 8.2, the MAP hypothesis test is

$$x \in A_0 \text{ if } \frac{f_{X|H_0}(x)}{f_{X|H_1}(x)} = \frac{e^{-x^2/2}}{e^{-(x-4)^2/2}} \ge \frac{P[H_1]}{P[H_0]}; \qquad x \in A_1 \text{ otherwise.} \tag{1}$$

Since a target is present with probability $P[H_1] = 0.01$, the MAP rule simplifies to

$$x \in A_0 \text{ if } x \le x_{MAP} = 2 - \frac{1}{4}\ln\left(\frac{P[H_1]}{P[H_0]}\right) = 3.15; \qquad x \in A_1 \text{ otherwise.} \tag{2}$$

The false alarm and miss probabilities are

$$P_{\mathrm{FA}} = P[X \ge x_{MAP}|H_0] = Q(x_{MAP}) = 8.16 \times 10^{-4}, \tag{3}$$
$$P_{\mathrm{MISS}} = P[X < x_{MAP}|H_1] = \Phi(x_{MAP} - 4) = 1 - \Phi(0.85) = 0.1977. \tag{4}$$

The average cost of the MAP policy is

$$E[C_{MAP}] = C_{10}P_{\mathrm{FA}}P[H_0] + C_{01}P_{\mathrm{MISS}}P[H_1] \tag{5}$$
$$= (1)(8.16 \times 10^{-4})(0.99) + (10^4)(0.1977)(0.01) = 19.77. \tag{6}$$

(b) The cost of a false alarm is $C_{10} = 1$ unit while the cost of a miss is $C_{01} = 10^4$ units. From Theorem 8.3, we see that the Minimum Cost test is the same as the MAP test except that $P[H_0]$ is replaced by $C_{10}P[H_0]$ and $P[H_1]$ is replaced by $C_{01}P[H_1]$. Thus, we see from the MAP test that the minimum cost test is

$$x \in A_0 \text{ if } x \le x_{MC} = 2 - \frac{1}{4}\ln\left(\frac{C_{01}P[H_1]}{C_{10}P[H_0]}\right) = 0.846; \qquad x \in A_1 \text{ otherwise.} \tag{7}$$

The false alarm and miss probabilities are

$$P_{\mathrm{FA}} = P[X \ge x_{MC}|H_0] = Q(x_{MC}) = 0.1987, \tag{8}$$
$$P_{\mathrm{MISS}} = P[X < x_{MC}|H_1] = \Phi(x_{MC} - 4) = 1 - \Phi(3.154) = 8.06 \times 10^{-4}. \tag{9}$$

The average cost of the minimum cost policy is

$$E[C_{MC}] = C_{10}P_{\mathrm{FA}}P[H_0] + C_{01}P_{\mathrm{MISS}}P[H_1] \tag{10}$$
$$= (1)(0.1987)(0.99) + (10^4)(8.06 \times 10^{-4})(0.01) = 0.2773. \tag{11}$$

Because the cost of a miss is so high, the minimum cost test greatly reduces the miss probability, resulting in a much lower average cost than the MAP test.

Problem 8.2.5 Solution

Given $H_0$, $X$ is Gaussian (0, 1). Given $H_1$, $X$ is Gaussian ($v$, 1). By Theorem 8.4, the Neyman-Pearson test is

$$x \in A_0 \text{ if } L(x) = \frac{f_{X|H_0}(x)}{f_{X|H_1}(x)} = \frac{e^{-x^2/2}}{e^{-(x-v)^2/2}} \ge \gamma; \qquad x \in A_1 \text{ otherwise.} \tag{1}$$

This rule simplifies to

$$x \in A_0 \text{ if } L(x) = e^{-[x^2 - (x-v)^2]/2} = e^{-vx + v^2/2} \ge \gamma; \qquad x \in A_1 \text{ otherwise.} \tag{2}$$

Taking logarithms, the Neyman-Pearson rule becomes

$$x \in A_0 \text{ if } x \le x_0 = \frac{v}{2} - \frac{1}{v}\ln\gamma; \qquad x \in A_1 \text{ otherwise.} \tag{3}$$

The choice of $\gamma$ has a one-to-one correspondence with the choice of the threshold $x_0$. Moreover, $L(x) \ge \gamma$ if and only if $x \le x_0$. In terms of $x_0$, the false alarm probability is

$$P_{\mathrm{FA}} = P[L(X) < \gamma|H_0] = P[X \ge x_0|H_0] = Q(x_0). \tag{4}$$

Thus we choose $x_0$ such that $Q(x_0) = \alpha$.

Problem 8.2.6 Solution

Given $H_0$, $M_n(T)$ has expected value $E[V] = 3$ and variance $\mathrm{Var}[V]/n = 9/n$. Given $H_1$, $M_n(T)$ has expected value $E[D] = 6$ and variance $\mathrm{Var}[D]/n = 36/n$.

(a) Using a Central Limit Theorem approximation, the false alarm probability is

$$P_{\mathrm{FA}} = P[M_n(T) > t_0|H_0] = P\left[\frac{M_n(T) - 3}{\sqrt{9/n}} > \frac{t_0 - 3}{\sqrt{9/n}}\right] = Q\left(\sqrt{n}\,[t_0/3 - 1]\right). \tag{1}$$

(b) Again, using a CLT approximation, the miss probability is

$$P_{\mathrm{MISS}} = P[M_n(T) \le t_0|H_1] = P\left[\frac{M_n(T) - 6}{\sqrt{36/n}} \le \frac{t_0 - 6}{\sqrt{36/n}}\right] = \Phi\left(\sqrt{n}\,[t_0/6 - 1]\right). \tag{2}$$

(c) From Theorem 8.6, the maximum likelihood decision rule is

$$t \in A_0 \text{ if } \frac{f_{M_n(T)|H_0}(t)}{f_{M_n(T)|H_1}(t)} \ge 1; \qquad t \in A_1 \text{ otherwise.} \tag{3}$$

We will see shortly that using a CLT approximation for the likelihood functions is something of a detour. Nevertheless, with a CLT approximation, the likelihood functions are

$$f_{M_n(T)|H_0}(t) = \sqrt{\frac{n}{18\pi}}\,e^{-n(t-3)^2/18}, \qquad f_{M_n(T)|H_1}(t) = \sqrt{\frac{n}{72\pi}}\,e^{-n(t-6)^2/72}. \tag{4}$$

From the CLT approximation, the ML decision rule is

$$t \in A_0 \text{ if } \sqrt{\frac{72}{18}}\,\frac{e^{-n(t-3)^2/18}}{e^{-n(t-6)^2/72}} = 2e^{-n[4(t-3)^2 - (t-6)^2]/72} \ge 1; \qquad t \in A_1 \text{ otherwise.} \tag{5}$$

After some algebra, this rule simplifies to

$$t \in A_0 \text{ if } t^2 - 4t - \frac{24\ln 2}{n} \le 0; \qquad t \in A_1 \text{ otherwise.} \tag{6}$$

Since the quadratic $t^2 - 4t - 24\ln(2)/n$ has two zeros, we use the quadratic formula to find the roots. One root corresponds to a negative value of $t$ and can be discarded since $M_n(T) \ge 0$. Thus the ML rule (for $n = 9$) becomes

$$t \in A_0 \text{ if } t \le t_{ML} = 2 + 2\sqrt{1 + 6\ln(2)/n} = 4.42; \qquad t \in A_1 \text{ otherwise.} \tag{7}$$

The negative root of the quadratic is the result of the Gaussian assumption, which allows for a nonzero probability that $M_n(T)$ will be negative. In this case, hypothesis $H_1$, which has higher variance, becomes more likely. However, since $M_n(T) \ge 0$, we can ignore this root since it is just an artifact of the CLT approximation.

In fact, the CLT approximation gives an incorrect answer. Note that $M_n(T) = Y_n/n$ where $Y_n$ is a sum of iid exponential random variables. Under hypothesis $H_0$, $Y_n$ is an Erlang ($n$, $\lambda_0 = 1/3$) random variable. Under hypothesis $H_1$, $Y_n$ is an Erlang ($n$, $\lambda_1 = 1/6$) random variable. Since $M_n(T) = Y_n/n$ is a scaled version of $Y_n$, Theorem 3.20 tells us that given hypothesis $H_i$, $M_n(T)$ is an Erlang ($n$, $n\lambda_i$) random variable. Thus $M_n(T)$ has likelihood functions

$$f_{M_n(T)|H_i}(t) = \begin{cases}\dfrac{(n\lambda_i)^n t^{n-1}e^{-n\lambda_i t}}{(n-1)!} & t \ge 0,\\ 0 & \text{otherwise.}\end{cases} \tag{8}$$

Using the Erlang likelihood functions, the ML rule becomes

$$t \in A_0 \text{ if } \frac{f_{M_n(T)|H_0}(t)}{f_{M_n(T)|H_1}(t)} = \left(\frac{\lambda_0}{\lambda_1}\right)^n e^{-n(\lambda_0 - \lambda_1)t} \ge 1; \qquad t \in A_1 \text{ otherwise.} \tag{9}$$

This rule simplifies to

$$t \in A_0 \text{ if } t \le t_{ML} = \frac{\ln(\lambda_0/\lambda_1)}{\lambda_0 - \lambda_1} = 6\ln 2 = 4.159; \qquad t \in A_1 \text{ otherwise.} \tag{10}$$

Since $6\ln 2 = 4.159$, this rule is not the same as the rule derived using a CLT approximation. Using the exact Erlang PDF, the ML rule does not depend on $n$. Moreover, even if $n \to \infty$, the exact Erlang-derived rule and the CLT approximation rule remain different. In fact, the CLT-based rule is simply an approximation to the correct rule. This highlights that we should first check whether a CLT approximation is necessary before we use it.
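The gap between the two ML thresholds in part (c) is easy to tabulate. The lines below are a minimal sketch using only built-in Matlab functions; the list of sample sizes in n is an illustrative assumption and is not taken from the problem.

```matlab
% Compare the CLT-based and exact (Erlang) ML thresholds of part (c).
lambda0 = 1/3;  lambda1 = 1/6;        % exponential rates under H0 and H1
n = [9 16 100 10000];                 % illustrative sample sizes (assumed)
% Positive root of t^2 - 4t - 24*ln(2)/n = 0, the CLT rule in (7).
tCLT = 2 + 2*sqrt(1 + 6*log(2)./n);
% Exact Erlang rule (10), which does not depend on n.
tML = log(lambda0/lambda1)/(lambda0 - lambda1);
for i = 1:length(n)
    fprintf('n = %5d: t_CLT = %.3f, t_ML = %.3f\n', n(i), tCLT(i), tML);
end
```

As n grows, the CLT threshold approaches 4 rather than $6\ln 2$, which illustrates the remark that the two rules remain different even as $n \to \infty$.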
(d) In this part, we will use the exact Erlang PDFs to find the MAP decision rule. From Theorem 8.2, the MAP rule is

$$t \in A_0 \text{ if } \frac{f_{M_n(T)|H_0}(t)}{f_{M_n(T)|H_1}(t)} = \left(\frac{\lambda_0}{\lambda_1}\right)^n e^{-n(\lambda_0 - \lambda_1)t} \ge \frac{P[H_1]}{P[H_0]}; \qquad t \in A_1 \text{ otherwise.} \tag{11}$$

Since $P[H_0] = 0.8$ and $P[H_1] = 0.2$, the MAP rule simplifies to

$$t \in A_0 \text{ if } t \le t_{MAP} = \frac{\ln\frac{\lambda_0}{\lambda_1} - \frac{1}{n}\ln\frac{P[H_1]}{P[H_0]}}{\lambda_0 - \lambda_1} = 6\left[\ln 2 + \frac{\ln 4}{n}\right]; \qquad t \in A_1 \text{ otherwise.} \tag{12}$$

For $n = 9$, $t_{MAP} = 5.083$.

(e) Although we have seen it is incorrect to use a CLT approximation to derive the decision rule, the CLT approximation used in parts (a) and (b) remains a good way to estimate the false alarm and miss probabilities. However, given $H_i$, $M_n(T)$ is an Erlang ($n$, $n\lambda_i$) random variable. In particular, given $H_0$, $M_n(T)$ is an Erlang ($n$, $n/3$) random variable while given $H_1$, $M_n(T)$ is an Erlang ($n$, $n/6$) random variable. Thus we can also use erlangcdf for an exact calculation of the false alarm and miss probabilities. To summarize the results of parts (a) and (b), a threshold $t_0$ implies that

$$P_{\mathrm{FA}} = P[M_n(T) > t_0|H_0] = \texttt{1-erlangcdf(n,n/3,t0)} \approx Q\left(\sqrt{n}\,[t_0/3 - 1]\right), \tag{13}$$
$$P_{\mathrm{MISS}} = P[M_n(T) \le t_0|H_1] = \texttt{erlangcdf(n,n/6,t0)} \approx \Phi\left(\sqrt{n}\,[t_0/6 - 1]\right). \tag{14}$$

Here is a program that generates the receiver operating curve.

    %voicedatroc.m
    t0=1:0.1:8';
    n=9;
    PFA9=1.0-erlangcdf(n,n/3,t0);
    PFA9clt=1-phi(sqrt(n)*((t0/3)-1));
    PM9=erlangcdf(n,n/6,t0);
    PM9clt=phi(sqrt(n)*((t0/6)-1));
    n=16;
    PFA16=1.0-erlangcdf(n,n/3,t0);
    PFA16clt=1.0-phi(sqrt(n)*((t0/3)-1));
    PM16=erlangcdf(n,n/6,t0);
    PM16clt=phi(sqrt(n)*((t0/6)-1));
    plot(PFA9,PM9,PFA9clt,PM9clt,PFA16,PM16,PFA16clt,PM16clt);
    axis([0 0.8 0 0.8]);
    legend('Erlang n=9','CLT n=9','Erlang n=16','CLT n=16');

Here are the resulting ROCs.

[Figure: receiver operating curves for Erlang n=9, CLT n=9, Erlang n=16 and CLT n=16, with both axes running from 0 to 0.8.]

Both the true curves and the CLT-based approximations are shown. The graph makes it clear that the CLT approximations are somewhat inaccurate. It is also apparent that the ROC for n = 16 is clearly better than for n = 9.
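For readers without the textbook's erlangcdf and phi functions, the exact and CLT error probabilities at a single threshold can be evaluated with built-in Matlab functions. This is a sketch under stated assumptions: gammainc(lambda*t,n) is used here as the Erlang ($n$, $\lambda$) CDF at $t$, the helper Phi is defined locally, and the threshold is taken to be $t_{MAP}$ for $n = 9$ from part (d).

```matlab
% Exact and CLT error probabilities at the MAP threshold of part (d).
% Assumption: gammainc(lambda*t, n) stands in for erlangcdf(n,lambda,t),
% i.e. the CDF of an Erlang (n, lambda) random variable evaluated at t.
n = 9;
t0 = 6*(log(2) + log(4)/n);              % t_MAP from equation (12)
Phi = @(z) 0.5*erfc(-z/sqrt(2));         % standard normal CDF (local helper)
PFA      = 1 - gammainc((n/3)*t0, n);    % P[M_n(T) >  t0 | H0], exact
PMISS    = gammainc((n/6)*t0, n);        % P[M_n(T) <= t0 | H1], exact
PFAclt   = 1 - Phi(sqrt(n)*(t0/3 - 1));  % CLT estimate from part (a)
PMISSclt = Phi(sqrt(n)*(t0/6 - 1));      % CLT estimate from part (b)
fprintf('t0 = %.3f\n', t0);
fprintf('PFA:   exact %.4f, CLT %.4f\n', PFA, PFAclt);
fprintf('PMISS: exact %.4f, CLT %.4f\n', PMISS, PMISSclt);
```

At this threshold the exact false alarm probability should come out noticeably larger than its CLT estimate, which is in line with the comment that the CLT approximations are somewhat inaccurate for n = 9.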
Problem 8.2.7 Solution

This problem is a continuation of Problem 8.2.6. In this case, we say a call is a "success" if $T > t_0$. The success probability depends on which hypothesis is true. In particular,

$$p_0 = P[T > t_0|H_0] = e^{-t_0/3}, \qquad p_1 = P[T > t_0|H_1] = e^{-t_0/6}. \tag{1}$$

Under hypothesis $H_i$, $K$ has the binomial ($n$, $p_i$) PMF

$$P_{K|H_i}(k) = \binom{n}{k}p_i^k(1 - p_i)^{n-k}. \tag{2}$$

(a) A false alarm occurs if $K > k_0$ under hypothesis $H_0$. The probability of this event is

$$P_{\mathrm{FA}} = P[K > k_0|H_0] = \sum_{k=k_0+1}^{n}\binom{n}{k}p_0^k(1 - p_0)^{n-k}. \tag{3}$$

(b) From Theorem 8.6, the maximum likelihood decision rule is

$$k \in A_0 \text{ if } \frac{P_{K|H_0}(k)}{P_{K|H_1}(k)} = \frac{p_0^k(1 - p_0)^{n-k}}{p_1^k(1 - p_1)^{n-k}} \ge 1; \qquad k \in A_1 \text{ otherwise.} \tag{4}$$

This rule simplifies to

$$k \in A_0 \text{ if } k\ln\left(\frac{p_0/(1 - p_0)}{p_1/(1 - p_1)}\right) \ge n\ln\left(\frac{1 - p_1}{1 - p_0}\right); \qquad k \in A_1 \text{ otherwise.} \tag{5}$$

To proceed further, we need to know if $p_0 < p_1$ or if $p_0 \ge p_1$. For $t_0 = 4.5$,

$$p_0 = e^{-1.5} = 0.2231 < e^{-0.75} = 0.4724 = p_1. \tag{6}$$

In this case, the ML rule becomes

$$k \in A_0 \text{ if } k \le k_{ML} = \frac{\ln\left(\frac{1 - p_0}{1 - p_1}\right)}{\ln\left(\frac{p_1/(1 - p_1)}{p_0/(1 - p_0)}\right)}\,n = (0.340)n; \qquad k \in A_1 \text{ otherwise.} \tag{7}$$

For $n = 16$, $k_{ML} = 5.44$.

(c) From Theorem 8.2, the MAP test is

$$k \in A_0 \text{ if } \frac{P_{K|H_0}(k)}{P_{K|H_1}(k)} = \frac{p_0^k(1 - p_0)^{n-k}}{p_1^k(1 - p_1)^{n-k}} \ge \frac{P[H_1]}{P[H_0]}; \qquad k \in A_1 \text{ otherwise.} \tag{8}$$

With $P[H_0] = 0.8$ and $P[H_1] = 0.2$, this rule simplifies to

$$k \in A_0 \text{ if } k\ln\left(\frac{p_0/(1 - p_0)}{p_1/(1 - p_1)}\right) \ge n\ln\left(\frac{1 - p_1}{1 - p_0}\right) - \ln 4; \qquad k \in A_1 \text{ otherwise.} \tag{9}$$

For $t_0 = 4.5$, $p_0 = 0.2231 < p_1 = 0.4724$, and the MAP rule becomes

$$k \in A_0 \text{ if } k \le k_{MAP} = \frac{n\ln\left(\frac{1 - p_0}{1 - p_1}\right) + \ln 4}{\ln\left(\frac{p_1/(1 - p_1)}{p_0/(1 - p_0)}\right)} = (0.340)n + 1.22; \qquad k \in A_1 \text{ otherwise.} \tag{10}$$

For $n = 16$, $k_{MAP} = 6.66$.

(d) For threshold $k_0$, the false alarm and miss probabilities are

$$P_{\mathrm{FA}} = P[K > k_0|H_0] = \texttt{1-binomialcdf(n,p0,k0)}, \tag{11}$$
$$P_{\mathrm{MISS}} = P[K \le k_0|H_1] = \texttt{binomialcdf(n,p1,k0)}. \tag{12}$$

The ROC is generated by evaluating $P_{\mathrm{FA}}$ and $P_{\mathrm{MISS}}$ for each value of $k_0$. Here is a Matlab program that does this task and plots the ROC.

    function [PFA,PMISS]=binvoicedataroc(n);
    t0=[3; 4.5];
    p0=exp(-t0/3); p1=exp(-t0/6);
    k0=(0:n)';
    PFA=zeros(n+1,2);
    PMISS=zeros(n+1,2);
    for j=1:2,
      PFA(:,j) = 1.0-binomialcdf(n,p0(j),k0);
      PMISS(:,j)=binomialcdf(n,p1(j),k0);
    end
    plot(PFA(:,1),PMISS(:,1),'-o',PFA(:,2),PMISS(:,2),'-x');
    legend('t_0=3','t_0=4.5');
    axis([0 0.8 0 0.8]);
    xlabel('\itP_{\rmFA}');
    ylabel('\itP_{\rmMISS}');

and here is the resulting ROC:

[Figure: ROC, $P_{MISS}$ versus $P_{FA}$, with one curve for $t_0 = 3$ and one for $t_0 = 4.5$.]

As we see, the test works better with threshold $t_0 = 4.5$ than with $t_0 = 3$.

Problem 8.2.8 Solution

The probability of a decoding error is minimized by the MAP rule. Since $P[H_0] = P[H_1] = 1/2$, the MAP rule is
\[ y \in A_0 \text{ if } \frac{f_{Y|H_0}(y)}{f_{Y|H_1}(y)} \ge \frac{P[H_1]}{P[H_0]} = 1; \qquad y \in A_1 \text{ otherwise.} \tag{1} \]
Given hypothesis $H_0$ that $X = 0$, $Y = W$ is an exponential $(\lambda = 1)$ random variable. Given hypothesis $H_1$ that $X = 1$, $Y = V + W$ is an Erlang $(n = 2, \lambda = 1)$ random variable. That is,
\[ f_{Y|H_0}(y) = \begin{cases} e^{-y} & y \ge 0, \\ 0 & \text{otherwise,} \end{cases} \qquad f_{Y|H_1}(y) = \begin{cases} y e^{-y} & y \ge 0, \\ 0 & \text{otherwise.} \end{cases} \tag{2} \]
The likelihood ratio is $e^{-y}/(ye^{-y}) = 1/y$, so the MAP rule simplifies to
\[ y \in A_0 \text{ if } y \le 1; \qquad y \in A_1 \text{ otherwise.} \tag{3} \]
The probability of error is
\[ P_{ERR} = P[Y > 1|H_0]P[H_0] + P[Y \le 1|H_1]P[H_1] \tag{4} \]
\[ = \frac{1}{2}\int_1^\infty e^{-y}\,dy + \frac{1}{2}\int_0^1 y e^{-y}\,dy \tag{5} \]
\[ = \frac{e^{-1}}{2} + \frac{1 - 2e^{-1}}{2} = \frac{1 - e^{-1}}{2}. \tag{6} \]

Problem 8.2.9 Solution

Given hypothesis $H_i$, $K$ has the binomial PMF
\[ P_{K|H_i}(k) = \binom{n}{k} q_i^k (1-q_i)^{n-k}. \tag{1} \]

(a) The ML rule is
\[ k \in A_0 \text{ if } P_{K|H_0}(k) > P_{K|H_1}(k); \qquad k \in A_1 \text{ otherwise.} \tag{2} \]
When we observe $K = k \in \{0, 1, \ldots, n\}$, plugging in the conditional PMFs yields the rule
\[ k \in A_0 \text{ if } \binom{n}{k} q_0^k (1-q_0)^{n-k} > \binom{n}{k} q_1^k (1-q_1)^{n-k}; \qquad k \in A_1 \text{ otherwise.} \tag{3} \]
Cancelling common factors, taking the logarithm of both sides, and rearranging yields
\[ k \in A_0 \text{ if } k \ln q_0 + (n-k)\ln(1-q_0) > k \ln q_1 + (n-k)\ln(1-q_1); \qquad k \in A_1 \text{ otherwise.} \tag{4} \]
By combining all terms with $k$, the rule can be simplified to
\[ k \in A_0 \text{ if } k \ln\frac{q_1/(1-q_1)}{q_0/(1-q_0)} < n \ln\frac{1-q_0}{1-q_1}; \qquad k \in A_1 \text{ otherwise.} \tag{5} \]
Note that $q_1 > q_0$ implies $q_1/(1-q_1) > q_0/(1-q_0)$. Thus, we can rewrite our ML rule as
\[ k \in A_0 \text{ if } k < k^* = n\,\frac{\ln[(1-q_0)/(1-q_1)]}{\ln[q_1/q_0] + \ln[(1-q_0)/(1-q_1)]}; \qquad k \in A_1 \text{ otherwise.} \tag{6} \]

(b) Let $k^*$ denote the threshold given in part (a). Using $n = 500$, $q_0 = 10^{-4}$, and $q_1 = 10^{-2}$, we have
\[ k^* = 500\,\frac{\ln[(1-10^{-4})/(1-10^{-2})]}{\ln[10^{-2}/10^{-4}] + \ln[(1-10^{-4})/(1-10^{-2})]} \approx 1.078. \tag{7} \]
Thus the ML rule is that if we observe $K \le 1$, then we choose hypothesis $H_0$; otherwise, we choose $H_1$. The false alarm probability is
\[ P_{FA} = P[A_1|H_0] = P[K > 1|H_0] \tag{8} \]
\[ = 1 - P_{K|H_0}(0) - P_{K|H_0}(1) \tag{9} \]
\[ = 1 - (1-q_0)^{500} - 500 q_0 (1-q_0)^{499} = 0.0012, \tag{10} \]
and the miss probability is
\[ P_{MISS} = P[A_0|H_1] = P[K \le 1|H_1] \tag{11} \]
\[ = P_{K|H_1}(0) + P_{K|H_1}(1) \tag{12} \]
\[ = (1-q_1)^{500} + 500 q_1 (1-q_1)^{499} = 0.0398. \tag{13} \]

(c) In the test of Example 8.8, the geometric random variable $N$, the number of tests needed to find the first failure, was used. In this problem, the binomial random variable $K$, the number of failures in 500 tests, was used. We will call these two procedures the geometric and the binomial tests. Also, we will use $P_{FA}^{(N)}$ and $P_{MISS}^{(N)}$ to denote the false alarm and miss probabilities using the geometric test. We also use $P_{FA}^{(K)}$ and $P_{MISS}^{(K)}$ for the error probabilities of the binomial test. From Example 8.8, we have the following comparison:

    geometric test                  binomial test
    PFA(N)   = 0.0045               PFA(K)   = 0.0012      (14)
    PMISS(N) = 0.0087               PMISS(K) = 0.0398      (15, 16)

When making comparisons between tests, we want to judge both the reliability of the test as well as the cost of the testing procedure. With respect to the reliability, we see that the conditional error probabilities appear to be comparable in that
\[ \frac{P_{FA}^{(N)}}{P_{FA}^{(K)}} = 3.75 \quad\text{but}\quad \frac{P_{MISS}^{(K)}}{P_{MISS}^{(N)}} = 4.57. \tag{17} \]
Roughly, the false alarm probability of the geometric test is about four times higher than that of the binomial test. However, the miss probability of the binomial test is about four times that of the geometric test. As for the cost of the test, it is reasonable to assume the cost is proportional to the number of disk drives that are tested. For the geometric test of Example 8.8, we test either until the first failure or until 46 drives pass the test. For the binomial test, we test until either 2 drives fail or until 500 drives pass the test! You can, if you wish, calculate the expected number of drives tested under each test method for each hypothesis. However, it isn't necessary in order to see that a lot more drives will be tested using the binomial test. If we knew the a priori probabilities $P[H_i]$ and also the relative costs of the two types of errors, then we could determine which test procedure was better. However, without that information, it would not be unreasonable to conclude that the geometric test offers performance roughly comparable to that of the binomial test but with a significant reduction in the expected number of drives tested.

Problem 8.2.10 Solution

The key to this problem is to observe that
\[ P[A_0|H_0] = 1 - P[A_1|H_0], \qquad P[A_1|H_1] = 1 - P[A_0|H_1]. \tag{1} \]
The total expected cost can be written as
\[ E[C'] = P[A_1|H_0]P[H_0]C'_{10} + (1 - P[A_1|H_0])P[H_0]C'_{00} \tag{2} \]
\[ \qquad + P[A_0|H_1]P[H_1]C'_{01} + (1 - P[A_0|H_1])P[H_1]C'_{11}. \tag{3} \]
Rearranging terms, we have
\[ E[C'] = P[A_1|H_0]P[H_0](C'_{10} - C'_{00}) + P[A_0|H_1]P[H_1](C'_{01} - C'_{11}) + P[H_0]C'_{00} + P[H_1]C'_{11}. \tag{4} \]
Since $P[H_0]C'_{00} + P[H_1]C'_{11}$ does not depend on the acceptance sets $A_0$ and $A_1$, the decision rule that minimizes $E[C']$ is the same decision rule that minimizes
\[ E[C''] = P[A_1|H_0]P[H_0](C'_{10} - C'_{00}) + P[A_0|H_1]P[H_1](C'_{01} - C'_{11}). \tag{5} \]
The decision rule that minimizes $E[C'']$ is the same as the minimum cost test in Theorem 8.3 with the costs $C_{01}$ and $C_{10}$ replaced by the differential costs $C'_{01} - C'_{11}$ and $C'_{10} - C'_{00}$.

Problem 8.3.1 Solution

Since the three hypotheses $H_0$, $H_1$, and $H_2$ are equally likely, the MAP and ML hypothesis tests are the same. From Theorem 8.8, the MAP rule is
\[ x \in A_m \text{ if } f_{X|H_m}(x) \ge f_{X|H_j}(x) \text{ for all } j. \tag{1} \]
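
Before the algebra for Problem 8.3.1, it may help to apply the MAP rule for three equally likely hypotheses numerically. The following is a minimal Python sketch, not part of the manual's Matlab code. It assumes Gaussian noise with standard deviation `sigma` and signal levels `a*(i-1)` for hypothesis $H_i$, as in this problem; the function names and the values `a=1.0`, `sigma=0.5` are illustrative choices. It scans a grid of observations and reports where the decision changes.

```python
import math


def gaussian_pdf(x, mean, sigma):
    """PDF of a Gaussian (mean, sigma^2) random variable evaluated at x."""
    return math.exp(-(x - mean) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))


def map_decision(x, a, sigma):
    """Index i in {0, 1, 2} whose conditional PDF f_{X|H_i}(x) is largest.

    Hypothesis H_i places the signal at a*(i-1). The priors are equal, so the
    MAP rule reduces to picking the largest conditional PDF.
    """
    pdfs = [gaussian_pdf(x, a * (i - 1), sigma) for i in range(3)]
    return max(range(3), key=lambda i: pdfs[i])


def decision_boundaries(a, sigma, step=0.001):
    """Scan x over [-2a, 2a] and return the points where the decision changes."""
    boundaries = []
    n_steps = int(round(4 * a / step))
    previous = map_decision(-2 * a, a, sigma)
    for k in range(1, n_steps + 1):
        x = -2 * a + k * step
        current = map_decision(x, a, sigma)
        if current != previous:
            # Midpoint of the grid step in which the decision changed.
            boundaries.append(x - step / 2)
            previous = current
    return boundaries


if __name__ == "__main__":
    # Illustrative values only; the problem leaves a and sigma symbolic.
    print(decision_boundaries(a=1.0, sigma=0.5))
```

Under these assumptions the two reported change points should sit within one grid step of $-a/2$ and $a/2$, whatever value of `sigma` is used, since the noise variance cancels out of the comparison.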
From Theorem 8. H1 . the signal space is one dimensional and the acceptance regions are A1 s1 0 A2 s2 a X Just as in the QPSK system of Example 8. (2) x ∈ Am if (x − a(m − 1))2 ≤ (x − a(j − 1))2 for all j. we will assume N1 and N2 are iid zero mean Gaussian random variables with variance σ 2 . the conditional PDF of X given Hi is fX|Hi (x) = Thus.X2 |Hijk (x1 .2 Solution Let the components of sijk be denoted by sijk and sijk so that given hypothesis Hijk . These are: x ∈ A1 if −a/2 ≤ x ≤ a/2 x ∈ A2 if x ≥ a/2 To summarize. given hypothesis Hijk . the three acceptance regions are A0 = {x|x ≤ −a/2} A1 = {x| − a/2 < x ≤ a/2} A0 s0 -a (3) (4) (5) (6) (7) A2 = {x|x > a/2} (8) Graphically. X1 and X2 are independent and the conditional joint PDF of X1 and X2 is fX1 . the additive Gaussian noise dictates that the acceptance region Ai is the set of observations x that are closer to si = (i − 1)a than any other sj . Similar rules can be developed for A1 and A2 .2 Since N is Gaussian with zero mean and variance σN . This rule simplifies to x ∈ A0 if x ≤ −a/2.13. This implies that the rule for membership in A0 is x ∈ A0 if (x + a)2 ≤ x2 and (x + a)2 ≤ (x − a)2 . x2 ) = fX1 |Hijk (x1 ) fX2 |Hijk (x2 ) 1 −(x1 −s(1) )2 /2σ2 −(x2 −s(2) )2 /2σ2 ijk ijk = e e 2πσ 2 1 −[(x1 −s(1) )2 +(x2 −s(2) )2 ]/2σ2 ijk ijk = e 2πσ 2 294 (2) (3) (4) . Problem 8. Thus. s X1 N1 = ijk + (2) X2 N2 sijk (1) (1) (2) (1) As in Example 8. the MAP rule is 1 2 2πσN e−(x−a(i−1)) 2 /2σ 2 N .3.13. X2 |Hi j k (x1 . the MAP and ML rules are (6) x ∈ Aijk if fX1 . we found the MAP acceptance regions were A0 = {x|x ≤ −a/2} A1 = {x| − a/2 < x ≤ a/2} P [DE |Hi ] = P [X ∈ Ai |Hi ] Given Hi .X2 |Hi (x1 .3. . (7) (8) This means that Aijk is the set of all vectors x that are closer to sijk than any other signal. . x2 ) ≥ fX1 .1. 
This implies P [X ∈ A0 |H0 ] = P [−a + N > −a/2] = P [N > a/2] = Q P [X ∈ A1 |H1 ] = P [N < −a/2] + P [N > a/2] = 2Q a 2σN a 2σN a 2σN (3) (4) (5) A2 = {x|x > a/2} (1) To calculate the probability of decoding error. The resulting boundaries are shown in this figure: X2 A110 s110 A010 s010 A011 s011 A111 s111 A000 s000 X1 A001 s001 A101 s101 A100 s100 Problem 8. . s111 fX1 . we first calculate the conditional error probabilities (2) P [X ∈ A2 |H2 ] = P [a + N < a/2] = P [N < −a/2] = Q 295 . . to find the boundary between points closer to sijk than si j k .X2 |Hijk (x1 . x2 ) = sijk = sijk sijk (2) (1) (5) 1 − x−sijk 2 /2σ2 e 2πσ 2 are equally likely. Graphically.3. The boundary is then the perpendicular bisector.3 Solution In Problem 8. we draw the line segment connecting sijk and si j k .In terms of the distance x − sijk between vectors x x= 1 x2 we can write Since all eight symbols s000 . This rule simplifies to x ∈ Aijk if x − sijk ≤ x − sijk for all other i j k . x2 ) for all other Hi j k . recall that X = a(i − 1) + N . x2 ) = fX1 |Hi (x1 ) fX2 |Hi (x2 ) 1 −(x1 −si1 )2 /2σ2 −(x2 −si2 )2 /2σ2 e e = 2πσ 2 1 −[(x1 −si1 )2 +(x2 −si2 )2 ]/2σ2 = e 2πσ 2 (x1 .5 Solution Assuming all four hypotheses are equally likely. x2 ) ≥ fX1 . x2 ) for all j. the acceptance region Ai is the set of vectors x that are closest to si . Equivalently.X2 |Hi (x1 . the acceptance regions are defined by the rule x ∈ Ai if x − si 2 Problem 8. the ML acceptance regions are (x1 . the probability of error is 2 P [DE ] = i=0 4 P [X ∈ Ai |Hi ] P [Hi ] = Q 3 a 2σN (6) Let Hi denote the hypothesis that symbol ai was transmitted. and H2 each have probability 1/3. Given Hi . each signal sij1 is below the x-axis while each signal sij0 is above the x-axis. X2 = −1 + N2 This implies • Given H101 or H111 . Since the four hypotheses are equally likely. X2 = −2 + N2 P [B3 |H011 ] = P [B3 |H001 ] = P [−1 + N2 > 0] = Q(1/σN ) P [B3 |H101 ] = P [B3 |H111 ] = P [−2 + N2 > 0] = Q(2/σN ) (1) (2) Problem 8. 
x2 ) ∈ Ai if fX1 . This implies fX1 . N1 and N2 are independent and thus X1 and X2 are independent.2 the acceptance regions Ai for the ML multiple hypothesis test must satisfy (4) (5) ≤ x − sj 2 (6) Just as in the case of QPSK.3. H1 .5. From the signal constellation depicted in Problem 8.X2 |Hj (x1 .3. The event B3 of an error in the third bit occurs if we transmit a signal sij1 but the receiver output x is above the x-axis or if we transmit a signal sij0 and the receiver output is below the x-axis.4 Solution (1) (2) (3) From Definition 8.3. the probability of an error decoding the third bit is P [B3 ] = P [B3 |H011 ] + P [B3 |H001 ] + P [B3 |H101 ] + P [B3 |H111 ] 4 Q(1/σN ) + Q(2/σN ) = 2 296 (3) (4) .Since the three hypotheses H0 . the ML tests will maximize the probability of a correct decision. we need only consider the case when we transmit one of the four signals sij1 . By symmetry. In particular. • Given H011 or H001 . x2 ) ∈ Ai if (x1 − si1 )2 + (x2 − si2 )2 ≤ (x1 − sj1 )2 + (x2 − sj2 )2 for all j In terms of the vectors x and si .X2 |Hi (x1 . Thus d = E sin(π/M ).N2 (n1 . the signal constellation (i. PERR is the same as the conditional probability of error 1−P [Ai |Hi ]. hence we examine A0 . √ √ the sketch shows that sin(θ/2) = d/ E.8. where N is a Gaussian random vector independent of which signal was transmitted. X2 q/2 E1/2 d s0 X1 Geometrically.e. σ 2 ) random variables. (3) s0 sM-1 sM-2 AM-1 AM-2 In terms of geometry. (c) By symmetry. By symmeslice try. the MAP rule is x ∈ Am if fX|Hm (x) ≥ fX|Hj (x) for all j. (b) Consider the following sketch to determine d. the interpretation is that all vectors x closer to sm than to any other signal sj are assigned to Am . no matter which si is transmitted. In this problem. the set of vectors si ) is the set of vectors on the circle of radius E. the MAP and ML rules are the same and achieve the minimum probability of error. fX|Hi (x) = 1 − 1 (x−si ) σ2 I−1 (x−si ) 1 − 12 e 2 = e 2σ 2 2πσ 2πσ 2 x−si 2 . 
Let B denote a circle of radius d at the origin and let Bi denote the circle of radius d around si . Since the length of each signal vector is E. P [A0 |H0 ] = P [X ∈ A0 |H0 ] ≥ P [X ∈ B0 |H0 ] = P [N ∈ B] .Problem 8. Each pie √ has angle θ = 2π/M . Since the components of N are iid Gaussian (0. In this case. X is a Gaussian (si . Thus. Since X is two-dimensional. The acceptance regions are the “pie slices” around each signal vector. P [N ∈ B] = fN1 . X2 s2 (2) A2 s1 Using the conditional PDFs fX|Hi (x). B 2 2 2 (4) (5) B 297 . given Hi .. n2 ) dn1 dn2 = 1 2πσ 2 e−(n1 +n2 )/2σ dn1 dn2 . Since B0 ⊂ A0 . σ 2 I) random vector. from the vector version of Theorem 8. (1) Since the hypotheses Hi are equally likely. the MAP rule becomes A1 A0 X1 x ∈ Am if x − sm 2 ≤ x − sj 2 for all j. this is the same for every Ai .3.6 Solution (a) Hypothesis Hi is that X = si +N. the largest d such that x − si ≤ d defines the largest circle around si that can be inscribed into the pie slice Ai . 7 Solution (a) In Problem 8. we found that in terms of the vectors x and si .3. (9) Problem 8.4. the acceptance regions are defined by the rule x ∈ Ai if x − si 2 ≤ x − sj 2 for all j. the acceptance region Ai is the set of vectors x that are closest to si . X2 )|0 < X1 ≤ 2.3.By changing to polar coordinates. we see that the acceptance region is A1 = {(X1 . (1) Just as in the case of QPSK. 0 < X2 ≤ 2} (2) 298 . P [N ∈ B] = d 2π 1 2 2 e−r /2σ r dθ dr 2 2πσ 0 0 d 1 2 2 = 2 re−r /2σ r dr σ 0 (6) (7) 2 (π/M )/2σ 2 = −e−r Thus 2 /2σ 2 d 0 = 1 − e−d 2 /2σ 2 = 1 − e−E sin (8) PERR = 1 − P [A0 |H0 ] ≤ 1 − P [N ∈ B] = e−E sin 2 (π/M )/2σ 2 . these regions are easily found from the sketch of the signal constellation: X2 s7 s5 s2 s3 s6 s4 s1 s0 X1 s9 s8 s1 2 s1 4 s1 0 s11 s1 3 s1 5 (b) For hypothesis A1 . Graphically. 8. X2 ) ∈ A1 |H1 ] = (P [−1 < N1 ≤ 1])2 = (2Φ(1/σN ) − 1) 2 (3) (4) (5) (6) (7) = P [0 < 1 + N1 ≤ 2. 
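That is,
\[ P[C|H_i] = P[(X_1,X_2) \in A_i|H_i] \tag{8} \]
\[ \ge P[-1 < N_1 \le 1,\ -1 < N_2 \le 1] \tag{9} \]
\[ = (P[-1 < N_1 \le 1])^2 = P[C|H_1]. \tag{10} \]
This implies
\[ P[C] = \sum_{i=0}^{15} P[C|H_i]\,P[H_i] \tag{11} \]
\[ \ge \sum_{i=0}^{15} P[C|H_1]\,P[H_i] = P[C|H_1]\sum_{i=0}^{15} P[H_i] = P[C|H_1]. \tag{12} \]

Problem 8.3.8 Solution

Let $p_i = P[H_i]$. From Theorem 8.8, the MAP multiple hypothesis test is
\[ (x_1,x_2) \in A_i \text{ if } p_i f_{X_1,X_2|H_i}(x_1,x_2) \ge p_j f_{X_1,X_2|H_j}(x_1,x_2) \text{ for all } j. \tag{1} \]
From Example 8.13, the conditional PDF of $X_1, X_2$ given $H_i$ is
\[ f_{X_1,X_2|H_i}(x_1,x_2) = \frac{1}{2\pi\sigma^2}\, e^{-[(x_1-\sqrt{E}\cos\theta_i)^2+(x_2-\sqrt{E}\sin\theta_i)^2]/2\sigma^2}. \tag{2} \]
Using this conditional joint PDF, the MAP rule becomes

• $(x_1,x_2) \in A_i$ if for all $j$,
\[ -\frac{(x_1-\sqrt{E}\cos\theta_i)^2 + (x_2-\sqrt{E}\sin\theta_i)^2}{2\sigma^2} + \frac{(x_1-\sqrt{E}\cos\theta_j)^2 + (x_2-\sqrt{E}\sin\theta_j)^2}{2\sigma^2} \ge \ln\frac{p_j}{p_i}. \tag{3} \]

Expanding the squares and using the identity $\cos^2\theta + \sin^2\theta = 1$ yields the simplified rule

• $(x_1,x_2) \in A_i$ if for all $j$,
\[ x_1[\cos\theta_i - \cos\theta_j] + x_2[\sin\theta_i - \sin\theta_j] \ge \frac{\sigma^2}{\sqrt{E}}\ln\frac{p_j}{p_i}. \tag{4} \]

Note that the MAP rules define linear constraints in $x_1$ and $x_2$. Since $\theta_i = \pi/4 + i\pi/2$, we use the following table to enumerate the constraints:

             cos θi      sin θi
    i = 0     1/√2        1/√2
    i = 1    −1/√2        1/√2
    i = 2    −1/√2       −1/√2
    i = 3     1/√2       −1/√2                          (5)

To be explicit, to determine whether $(x_1,x_2) \in A_i$, we need to check the MAP rule for each $j \ne i$. Thus, each $A_i$ is defined by three constraints. Using the above table, the acceptance regions are

• $(x_1,x_2) \in A_0$ if
\[ x_1 \ge \frac{\sigma^2}{\sqrt{2E}}\ln\frac{p_1}{p_0}, \qquad x_2 \ge \frac{\sigma^2}{\sqrt{2E}}\ln\frac{p_3}{p_0}, \qquad x_1 + x_2 \ge \frac{\sigma^2}{\sqrt{2E}}\ln\frac{p_2}{p_0}. \tag{6} \]
• $(x_1,x_2) \in A_1$ if
\[ x_1 \le \frac{\sigma^2}{\sqrt{2E}}\ln\frac{p_1}{p_0}, \qquad x_2 \ge \frac{\sigma^2}{\sqrt{2E}}\ln\frac{p_2}{p_1}, \qquad -x_1 + x_2 \ge \frac{\sigma^2}{\sqrt{2E}}\ln\frac{p_3}{p_1}. \tag{7} \]
• $(x_1,x_2) \in A_2$ if
\[ x_1 \le \frac{\sigma^2}{\sqrt{2E}}\ln\frac{p_2}{p_3}, \qquad x_2 \le \frac{\sigma^2}{\sqrt{2E}}\ln\frac{p_2}{p_1}, \qquad x_1 + x_2 \le \frac{\sigma^2}{\sqrt{2E}}\ln\frac{p_2}{p_0}. \tag{8} \]
• $(x_1,x_2) \in A_3$ if
\[ x_1 \ge \frac{\sigma^2}{\sqrt{2E}}\ln\frac{p_2}{p_3}, \qquad x_2 \le \frac{\sigma^2}{\sqrt{2E}}\ln\frac{p_3}{p_0}, \qquad -x_1 + x_2 \le \frac{\sigma^2}{\sqrt{2E}}\ln\frac{p_3}{p_1}. \tag{9} \]

Using the parameters
\[ \sigma = 0.8, \qquad E = 1, \qquad p_0 = 1/2, \qquad p_1 = p_2 = p_3 = 1/6, \tag{10} \]
we have $(\sigma^2/\sqrt{2E})\ln(1/3) = -0.497$, and the acceptance regions for the MAP rule are
\[ A_0 = \{(x_1,x_2) \mid x_1 \ge -0.497,\ x_2 \ge -0.497,\ x_1 + x_2 \ge -0.497\}, \tag{11} \]
\[ A_1 = \{(x_1,x_2) \mid x_1 \le -0.497,\ x_2 \ge 0,\ -x_1 + x_2 \ge 0\}, \tag{12} \]
\[ A_2 = \{(x_1,x_2) \mid x_1 \le 0,\ x_2 \le 0,\ x_1 + x_2 \le -0.497\}, \tag{13} \]
\[ A_3 = \{(x_1,x_2) \mid x_1 \ge 0,\ x_2 \le -0.497,\ -x_1 + x_2 \le 0\}. \tag{14} \]
Here is a sketch of these acceptance regions:

[Figure: the $(X_1, X_2)$ plane with signals $\mathbf{s}_0, \ldots, \mathbf{s}_3$ and acceptance regions $A_0, \ldots, A_3$.]

Note that the boundary between $A_1$ and $A_3$, the line $-x_1 + x_2 = 0$, plays no role because of the high value of $p_0$.

Problem 8.3.9 Solution

(a) First we note that
\[ \mathbf{P}^{1/2}\mathbf{X} = \begin{bmatrix} \sqrt{p_1} & & \\ & \ddots & \\ & & \sqrt{p_k} \end{bmatrix}\begin{bmatrix} X_1 \\ \vdots \\ X_k \end{bmatrix} = \begin{bmatrix} \sqrt{p_1}X_1 \\ \vdots \\ \sqrt{p_k}X_k \end{bmatrix}. \tag{1} \]
Since each $\mathbf{S}_i$ is a column vector,
\[ \mathbf{S}\mathbf{P}^{1/2}\mathbf{X} = \begin{bmatrix} \mathbf{S}_1 & \cdots & \mathbf{S}_k \end{bmatrix}\begin{bmatrix} \sqrt{p_1}X_1 \\ \vdots \\ \sqrt{p_k}X_k \end{bmatrix} = \sqrt{p_1}X_1\mathbf{S}_1 + \cdots + \sqrt{p_k}X_k\mathbf{S}_k. \tag{2} \]
Thus $\mathbf{Y} = \mathbf{S}\mathbf{P}^{1/2}\mathbf{X} + \mathbf{N} = \sum_{i=1}^{k}\sqrt{p_i}X_i\mathbf{S}_i + \mathbf{N}$.

(b) Given the observation $\mathbf{Y} = \mathbf{y}$, a detector must decide which vector $\mathbf{X} = [X_1 \cdots X_k]'$ was (collectively) sent by the $k$ transmitters. A hypothesis $H_j$ must specify whether $X_i = 1$ or $X_i = -1$ for each $i$. That is, a hypothesis $H_j$ corresponds to a vector $\mathbf{x}_j \in B_k$ which has $\pm 1$ components. Since there are $2^k$ such vectors, there are $2^k$ hypotheses which we can enumerate as $H_1, \ldots, H_{2^k}$. Since each $X_i$ is independently and equally likely to be $\pm 1$, each hypothesis has probability $2^{-k}$. In this case, the MAP and ML rules are the same and achieve minimum probability of error. The MAP/ML rule is
\[ \mathbf{y} \in A_m \text{ if } f_{\mathbf{Y}|H_m}(\mathbf{y}) \ge f_{\mathbf{Y}|H_j}(\mathbf{y}) \text{ for all } j. \tag{3} \]
Under hypothesis $H_j$, $\mathbf{Y} = \mathbf{S}\mathbf{P}^{1/2}\mathbf{x}_j + \mathbf{N}$ is a Gaussian $(\mathbf{S}\mathbf{P}^{1/2}\mathbf{x}_j, \sigma^2\mathbf{I})$ random vector. The conditional PDF of $\mathbf{Y}$ is
\[ f_{\mathbf{Y}|H_j}(\mathbf{y}) = \frac{1}{(2\pi\sigma^2)^{n/2}}\, e^{-\frac{1}{2}(\mathbf{y}-\mathbf{S}\mathbf{P}^{1/2}\mathbf{x}_j)'(\sigma^2\mathbf{I})^{-1}(\mathbf{y}-\mathbf{S}\mathbf{P}^{1/2}\mathbf{x}_j)} = \frac{1}{(2\pi\sigma^2)^{n/2}}\, e^{-\|\mathbf{y}-\mathbf{S}\mathbf{P}^{1/2}\mathbf{x}_j\|^2/2\sigma^2}. \tag{4} \]
The MAP rule is
\[ \mathbf{y} \in A_m \text{ if } e^{-\|\mathbf{y}-\mathbf{S}\mathbf{P}^{1/2}\mathbf{x}_m\|^2/2\sigma^2} \ge e^{-\|\mathbf{y}-\mathbf{S}\mathbf{P}^{1/2}\mathbf{x}_j\|^2/2\sigma^2} \text{ for all } j, \tag{5} \]
or equivalently,
\[ \mathbf{y} \in A_m \text{ if } \|\mathbf{y}-\mathbf{S}\mathbf{P}^{1/2}\mathbf{x}_m\| \le \|\mathbf{y}-\mathbf{S}\mathbf{P}^{1/2}\mathbf{x}_j\| \text{ for all } j. \tag{6} \]
That is, we choose the vector $\mathbf{x}^* = \mathbf{x}_m$ that minimizes the distance $\|\mathbf{y}-\mathbf{S}\mathbf{P}^{1/2}\mathbf{x}_j\|$ among all vectors $\mathbf{x}_j \in B_k$. Since this vector $\mathbf{x}^*$ is a function of the observation $\mathbf{y}$, this is described by the math notation
\[ \mathbf{x}^*(\mathbf{y}) = \arg\min_{\mathbf{x} \in B_k} \|\mathbf{y}-\mathbf{S}\mathbf{P}^{1/2}\mathbf{x}\|, \tag{7} \]
where $\arg\min_{\mathbf{x}} g(\mathbf{x})$ returns the argument $\mathbf{x}$ that minimizes $g(\mathbf{x})$.

(c) To implement this detector, we must evaluate $\|\mathbf{y}-\mathbf{S}\mathbf{P}^{1/2}\mathbf{x}\|$ for each $\mathbf{x} \in B_k$. Since there are $2^k$ vectors in $B_k$, we have to evaluate $2^k$ hypotheses. Because the number of hypotheses grows exponentially with the number of users $k$, the maximum likelihood detector is considered to be computationally intractable for a large number of users $k$.

Problem 8.3.10 Solution

A short answer is that the decorrelator cannot be the same as the optimal maximum likelihood (ML) detector. If they were the same, that means we have reduced the $2^k$ comparisons of the optimal detector to a linear transformation followed by $k$ single bit comparisons.

However, as this is not a satisfactory answer, we will build a simple example with $k = 2$ users and processing gain $n = 2$ to show the difference between the ML detector and the decorrelator. In particular, suppose user 1 transmits with code vector $\mathbf{S}_1 = [1\ \ 0]'$ and user 2 transmits with code vector $\mathbf{S}_2 = [\cos\theta\ \ \sin\theta]'$. In addition, we assume that the users' powers are $p_1 = p_2 = 1$. In this case, $\mathbf{P} = \mathbf{I}$ and
\[ \mathbf{S} = \begin{bmatrix} 1 & \cos\theta \\ 0 & \sin\theta \end{bmatrix}. \tag{1} \]
For the ML detector, there are four hypotheses corresponding to each possible transmitted bit of each user. Using $H_i$ to denote the hypothesis that $\mathbf{X} = \mathbf{x}_i$, we have
\[ \mathbf{x}_1 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \qquad \mathbf{x}_3 = \begin{bmatrix} -1 \\ 1 \end{bmatrix}, \tag{2} \]
\[ \mathbf{x}_2 = \begin{bmatrix} 1 \\ -1 \end{bmatrix}, \qquad \mathbf{x}_4 = \begin{bmatrix} -1 \\ -1 \end{bmatrix}. \tag{3} \]
When $\mathbf{X} = \mathbf{x}_i$, $\mathbf{Y} = \mathbf{y}_i + \mathbf{N}$ where $\mathbf{y}_i = \mathbf{S}\mathbf{x}_i$. Thus under hypothesis $H_i$, $\mathbf{Y} = \mathbf{y}_i + \mathbf{N}$ is a Gaussian $(\mathbf{y}_i, \sigma^2\mathbf{I})$ random vector with PDF
\[ f_{\mathbf{Y}|H_i}(\mathbf{y}) = \frac{1}{2\pi\sigma^2}\, e^{-(\mathbf{y}-\mathbf{y}_i)'(\sigma^2\mathbf{I})^{-1}(\mathbf{y}-\mathbf{y}_i)/2} = \frac{1}{2\pi\sigma^2}\, e^{-\|\mathbf{y}-\mathbf{y}_i\|^2/2\sigma^2}. \tag{4} \]
With the four hypotheses equally likely, the MAP and ML detectors are the same and minimize the probability of error. From Theorem 8.8, this decision rule is
\[ \mathbf{y} \in A_m \text{ if } f_{\mathbf{Y}|H_m}(\mathbf{y}) \ge f_{\mathbf{Y}|H_j}(\mathbf{y}) \text{ for all } j. \tag{5} \]
This rule simplifies to
\[ \mathbf{y} \in A_m \text{ if } \|\mathbf{y}-\mathbf{y}_m\| \le \|\mathbf{y}-\mathbf{y}_j\| \text{ for all } j. \tag{6} \]
It is useful to show these acceptance sets graphically. In this plot, the area around $\mathbf{y}_i$ is the acceptance set $A_i$ and the dashed lines are the boundaries between the acceptance sets.

[Figure: the $(Y_1, Y_2)$ plane with the points $\mathbf{y}_1, \ldots, \mathbf{y}_4$, the angle $\theta$, and the acceptance sets $A_1, \ldots, A_4$.]

\[ \mathbf{y}_1 = \begin{bmatrix} 1 + \cos\theta \\ \sin\theta \end{bmatrix}, \qquad \mathbf{y}_3 = \begin{bmatrix} -1 + \cos\theta \\ \sin\theta \end{bmatrix}, \tag{7} \]
\[ \mathbf{y}_2 = \begin{bmatrix} 1 - \cos\theta \\ -\sin\theta \end{bmatrix}, \qquad \mathbf{y}_4 = \begin{bmatrix} -1 - \cos\theta \\ -\sin\theta \end{bmatrix}. \tag{8} \]
The probability of a correct decision is
\[ P[C] = \frac{1}{4}\sum_{i=1}^{4}\int_{A_i} f_{\mathbf{Y}|H_i}(\mathbf{y})\,d\mathbf{y}. \tag{9} \]
Even though the components of $\mathbf{Y}$ are conditionally independent given $H_i$, the four integrals $\int_{A_i} f_{\mathbf{Y}|H_i}(\mathbf{y})\,d\mathbf{y}$ cannot be represented in a simple form. Moreover, they cannot even be represented by the $\Phi(\cdot)$ function. Note that the probability of a correct decision is the probability that the bits $X_1$ and $X_2$ transmitted by both users are detected correctly.

The probability of a bit error is still somewhat more complex. For example if $X_1 = 1$, then hypotheses $H_1$ and $H_2$ are equally likely. The detector guesses $\hat{X}_1 = 1$ if $\mathbf{Y} \in A_1 \cup A_2$. Given $X_1 = 1$, the conditional probability of a correct decision on this bit is
\[ P[\hat{X}_1 = 1|X_1 = 1] = \frac{1}{2}P[\mathbf{Y} \in A_1 \cup A_2|H_1] + \frac{1}{2}P[\mathbf{Y} \in A_1 \cup A_2|H_2] \tag{10} \]
\[ = \frac{1}{2}\int_{A_1 \cup A_2} f_{\mathbf{Y}|H_1}(\mathbf{y})\,d\mathbf{y} + \frac{1}{2}\int_{A_1 \cup A_2} f_{\mathbf{Y}|H_2}(\mathbf{y})\,d\mathbf{y}. \tag{11} \]
By comparison, the decorrelator does something simpler. Since $\mathbf{S}$ is a square invertible matrix,
\[ (\mathbf{S}'\mathbf{S})^{-1}\mathbf{S}' = \mathbf{S}^{-1}(\mathbf{S}')^{-1}\mathbf{S}' = \mathbf{S}^{-1} = \frac{1}{\sin\theta}\begin{bmatrix} \sin\theta & -\cos\theta \\ 0 & 1 \end{bmatrix}. \tag{12} \]
We see that the components of $\tilde{\mathbf{Y}} = \mathbf{S}^{-1}\mathbf{Y}$ are
\[ \tilde{Y}_1 = Y_1 - \frac{\cos\theta}{\sin\theta}Y_2, \qquad \tilde{Y}_2 = \frac{Y_2}{\sin\theta}. \tag{13} \]
Assuming (as in the earlier sketch) that $0 < \theta < \pi/2$, the decorrelator bit decisions are
\[ \hat{X}_1 = \operatorname{sgn}(\tilde{Y}_1) = \operatorname{sgn}\!\left(Y_1 - \frac{\cos\theta}{\sin\theta}Y_2\right), \tag{14} \]
\[ \hat{X}_2 = \operatorname{sgn}(\tilde{Y}_2) = \operatorname{sgn}\!\left(\frac{Y_2}{\sin\theta}\right) = \operatorname{sgn}(Y_2). \tag{15} \]

[Figure: the $(Y_1, Y_2)$ plane with the points $\mathbf{y}_1, \ldots, \mathbf{y}_4$ and the decorrelator decision regions.]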
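
The gap between the ML detector and the decorrelator in the two-user example of Problem 8.3.10 can also be seen by simulation. The sketch below is a Python illustration rather than part of the manual's Matlab code. It assumes $\mathbf{P} = \mathbf{I}$, code vectors $[1\ 0]'$ and $[\cos\theta\ \sin\theta]'$ with $0 < \theta < \pi/2$, iid Gaussian noise of standard deviation `sigma`, and equally likely bit pairs; the function name `simulate` and the test values are hypothetical.

```python
import math
import random


def simulate(theta, sigma, trials, seed=1):
    """Return (ML symbol error rate, decorrelator symbol error rate).

    A "symbol error" means the detected pair (x1, x2) differs from the
    transmitted pair in at least one bit.
    """
    rng = random.Random(seed)
    c, s = math.cos(theta), math.sin(theta)
    candidates = [(1, 1), (1, -1), (-1, 1), (-1, -1)]
    # Noise-free received points y = S x with S = [[1, cos], [0, sin]].
    points = {x: (x[0] + c * x[1], s * x[1]) for x in candidates}
    ml_errors = 0
    dec_errors = 0
    for _ in range(trials):
        x = rng.choice(candidates)
        y1 = points[x][0] + rng.gauss(0.0, sigma)
        y2 = points[x][1] + rng.gauss(0.0, sigma)
        # ML detector: pick the candidate whose noise-free point is nearest to y.
        x_ml = min(
            candidates,
            key=lambda u: (y1 - points[u][0]) ** 2 + (y2 - points[u][1]) ** 2,
        )
        # Decorrelator: sign of each component of S^{-1} y (sin(theta) > 0 assumed).
        x_dec = (1 if y1 - (c / s) * y2 >= 0 else -1, 1 if y2 >= 0 else -1)
        ml_errors += x_ml != x
        dec_errors += x_dec != x
    return ml_errors / trials, dec_errors / trials


if __name__ == "__main__":
    ml_rate, dec_rate = simulate(theta=math.pi / 4, sigma=0.5, trials=20000)
    print("ML:", ml_rate, "decorrelator:", dec_rate)
```

Because the ML rule minimizes the probability that the detected pair is wrong, the first rate should not exceed the second beyond simulation noise. The size of the gap depends on $\theta$: as $\theta$ approaches $\pi/2$ the code vectors become orthogonal and the two detectors coincide.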
Because we chose a coordinate system such that $\mathbf{S}_1$ lies along the x-axis, the effect of the decorrelator on the rule for bit $X_2$ is particularly easy to understand. For bit $X_2$, we just check whether the vector $\mathbf{Y}$ is in the upper half plane. Generally, the boundaries of the decorrelator decision regions are determined by straight lines, they are easy to implement, and the probability of error is easy to calculate. However, these regions are suboptimal in terms of probability of error.

Problem 8.4.1 Solution

Under hypothesis $H_i$, the conditional PMF of $X$ is
\[ P_{X|H_i}(x) = \begin{cases} (1-p_i)p_i^{x-1}/(1-p_i^{20}) & x = 1, 2, \ldots, 20, \\ 0 & \text{otherwise,} \end{cases} \tag{1} \]
where $p_0 = 0.99$ and $p_1 = 0.9$. It follows that for $x_0 = 0, 1, \ldots, 19$ that
\[ P[X > x_0|H_i] = \frac{1-p_i}{1-p_i^{20}}\sum_{x=x_0+1}^{20} p_i^{x-1} = \frac{1-p_i}{1-p_i^{20}}\left[p_i^{x_0} + \cdots + p_i^{19}\right] \tag{2} \]
\[ = \frac{p_i^{x_0}(1-p_i)}{1-p_i^{20}}\left[1 + p_i + \cdots + p_i^{19-x_0}\right] \tag{3} \]
\[ = \frac{p_i^{x_0}(1-p_i^{20-x_0})}{1-p_i^{20}} = \frac{p_i^{x_0} - p_i^{20}}{1-p_i^{20}}. \tag{4} \]
We note that the above formula is also correct for $x_0 = 20$. Using this formula, the false alarm and miss probabilities are
\[ P_{FA} = P[X > x_0|H_0] = \frac{p_0^{x_0} - p_0^{20}}{1-p_0^{20}}, \tag{5} \]
\[ P_{MISS} = 1 - P[X > x_0|H_1] = \frac{1 - p_1^{x_0}}{1-p_1^{20}}. \tag{6} \]
The Matlab program rocdisc(p0,p1) returns the false alarm and miss probabilities and also plots the ROC. Here is the program and its output for these parameters:

    function [PFA,PMISS]=rocdisc(p0,p1);
    x=0:20;
    PFA= (p0.^x-p0^(20))/(1-p0^(20));
    PMISS= (1.0-(p1.^x))/(1-p1^(20));
    plot(PFA,PMISS,'k.');
    xlabel('\itP_{\rm FA}');
    ylabel('\itP_{\rm MISS}');

[Figure: ROC, $P_{MISS}$ versus $P_{FA}$, both axes from 0 to 1.]

From the receiver operating curve, we learn that we have a fairly lousy sensor. No matter how we set the threshold $x_0$, either the false alarm probability or the miss probability (or both!) exceed 0.5.

Problem 8.4.2 Solution

From the corresponding example in the text, the probability of error is
\[ P_{ERR} = pQ\!\left(\frac{\sigma}{2v}\ln\frac{p}{1-p} + \frac{v}{\sigma}\right) + (1-p)\Phi\!\left(\frac{\sigma}{2v}\ln\frac{p}{1-p} - \frac{v}{\sigma}\right). \tag{1} \]
It is straightforward to use Matlab to plot $P_{ERR}$ as a function of $p$. The function bperr calculates $P_{ERR}$ for a vector p and a scalar signal to noise ratio snr corresponding to $v/\sigma$. A second program bperrplot(snr) plots $P_{ERR}$ as a function of $p$. Here are the programs

    function perr=bperr(p,snr);
    %Problem 8.4.2 Solution
    r=log(p./(1-p))/(2*snr);
    perr=(p.*(qfunction(r+snr))) ...
       +((1-p).*phi(r-snr));

    function pe=bperrplot(snr);
    p=0.02:0.02:0.98;
    pe=bperr(p,snr);
    plot(p,pe);
    xlabel('\it p');
    ylabel('\it P_{ERR}');

Here are three outputs of bperrplot for the requested SNR values.

[Figure: three panels of $P_{ERR}$ versus $p$ produced by bperrplot, one for each requested SNR value.]

In all three cases, we see that $P_{ERR}$ is maximum at $p = 1/2$. When $p \ne 1/2$, the optimal (minimum probability of error) decision rule is able to exploit the one hypothesis having higher a priori probability than the other.

This gives the wrong impression that one should consider building a communication system with $p \ne 1/2$. To see this, consider the most extreme case in which the error probability goes to zero as $p \to 0$ or $p \to 1$. However, in these extreme cases, no information is being communicated. When $p = 0$ or $p = 1$, the detector can simply guess the transmitted bit. In fact, there is no need to transmit a bit; however, it becomes impossible to transmit any information.

Finally, we note that $v/\sigma$ is an SNR voltage ratio. For communication systems, it is common to measure SNR as a power ratio. In particular, $v/\sigma = 10$ corresponds to a SNR of $10\log_{10}(v^2/\sigma^2) = 20$ dB.

Problem 8.4.3 Solution

With $v = 1.5$ and $d = 0.5$, it appeared in Example 8.14 that $T = 0.5$ was best among the values tested. However, it also seemed likely the error probability $P_e$ would decrease for larger values of $T$. To test this possibility we use sqdistor with 100,000 transmitted bits by trying the following:

    >> T=[0.4:0.1:1.0];Pe=sqdistor(1.5,0.5,100000,T);
    >> [Pmin,Imin]=min(Pe);T(Imin)
    ans =
       0.80000000000000

Thus among $\{0.4, 0.5, \cdots, 1.0\}$, it appears that $T = 0.8$ is best. Now we test values of $T$ in the neighborhood of 0.8:

    >> T=[0.70:0.02:0.9];Pe=sqdistor(1.5,0.5,100000,T);
    >>[Pmin,Imin]=min(Pe);T(Imin)
    ans =
       0.78000000000000

This suggests that $T = 0.78$ is best among these values. However, inspection of the vector Pe shows that all values are quite close. If we repeat this experiment a few times, we obtain:

    >> T=[0.70:0.02:0.9];Pe=sqdistor(1.5,0.5,100000,T);
    >> [Pmin,Imin]=min(Pe);T(Imin)
    ans =
       0.78000000000000
    >> T=[0.70:0.02:0.9];Pe=sqdistor(1.5,0.5,100000,T);
    >> [Pmin,Imin]=min(Pe);T(Imin)
    ans =
       0.80000000000000
    >> T=[0.70:0.02:0.9];Pe=sqdistor(1.5,0.5,100000,T);
    >> [Pmin,Imin]=min(Pe);T(Imin)
    ans =
       0.76000000000000
    >> T=[0.70:0.02:0.9];Pe=sqdistor(1.5,0.5,100000,T);
    >> [Pmin,Imin]=min(Pe);T(Imin)
    ans =
       0.78000000000000

This suggests that the best value of $T$ is in the neighborhood of 0.78. If someone were paying you to find the best $T$, you would probably want to do more testing. The only useful lesson here is that when you try to optimize parameters using simulation results, you should repeat your experiments to get a sense of the variance of your results.

Problem 8.4.4 Solution

Since the a priori probabilities $P[H_0]$ and $P[H_1]$ are unknown, we use a Neyman-Pearson formulation to find the ROC. For a threshold $\gamma$, the decision rule is
\[ x \in A_0 \text{ if } \frac{f_{X|H_0}(x)}{f_{X|H_1}(x)} \ge \gamma; \qquad x \in A_1 \text{ otherwise.} \tag{1} \]
Using the given conditional PDFs, we obtain
\[ x \in A_0 \text{ if } e^{-(8x-x^2)/16} \ge \gamma x/4; \qquad x \in A_1 \text{ otherwise.} \tag{2} \]
Taking logarithms yields
\[ x \in A_0 \text{ if } x^2 - 8x \ge 16\ln(\gamma/4) + 16\ln x; \qquad x \in A_1 \text{ otherwise.} \tag{3} \]
With some more rearranging,
\[ x \in A_0 \text{ if } (x-4)^2 \ge \gamma_0 + 16\ln x, \quad \gamma_0 = 16\ln(\gamma/4) + 16; \qquad x \in A_1 \text{ otherwise.} \tag{4} \]
When we plot the functions $f(x) = (x-4)^2$ and $g(x) = \gamma_0 + 16\ln x$, we see that there exist $x_1$ and $x_2$ such that $f(x_1) = g(x_1)$ and $f(x_2) = g(x_2)$. In terms of $x_1$ and $x_2$,
\[ A_0 = [0, x_1] \cup [x_2, \infty), \qquad A_1 = (x_1, x_2). \tag{5} \]
Using a Taylor series expansion of $\ln x$ around $x = x_0 = 4$, we can show that
\[ g(x) = \gamma_0 + 16\ln x \le h(x) = \gamma_0 + 16(\ln 4 - 1) + 4x. \tag{6} \]
Since $h(x)$ is linear, we can use the quadratic formula to solve $f(x) = h(x)$, yielding a solution $\bar{x}_2 = 6 + \sqrt{4 + 16\ln 4 + \gamma_0}$. One can show that $x_2 \le \bar{x}_2$. In the example shown below, corresponding to $\gamma = 1$, $x_1 = 1.95$, $x_2 = 9.5$ and $\bar{x}_2 = 6 + \sqrt{20} = 10.47$.

[Figure: $f(x)$, $g(x)$ and $h(x)$ plotted for $0 \le x \le 12$.]

Given $x_1$ and $x_2$, the false alarm and miss probabilities are
\[ P_{FA} = P[A_1|H_0] = \int_{x_1}^{x_2} \frac{1}{2}e^{-x/2}\,dx = e^{-x_1/2} - e^{-x_2/2}, \tag{7} \]
\[ P_{MISS} = 1 - P[A_1|H_1] = 1 - \int_{x_1}^{x_2} \frac{x}{8}e^{-x^2/16}\,dx = 1 - e^{-x_1^2/16} + e^{-x_2^2/16}. \tag{8} \]
To calculate the ROC, we need to find $x_1$ and $x_2$. Rather than find them exactly, we calculate $f(x)$ and $g(x)$ for discrete steps over the interval $[0, 1 + \bar{x}_2]$ and find the discrete values closest to $x_1$ and $x_2$. However, for these approximations to $x_1$ and $x_2$, we calculate the exact false alarm and miss probabilities. As a result, the optimal detector using the exact $x_1$ and $x_2$ cannot be worse than the ROC that we calculate.

In terms of Matlab, we divide the work into the functions gasroc(n) which generates the ROC by calling [x1,x2]=gasrange(gamma) to calculate $x_1$ and $x_2$ for a given value of $\gamma$.

    function [pfa,pmiss]=gasroc(n);
    a=(400)^(1/(n-1));
    k=1:n;
    g=0.05*(a.^(k-1));
    pfa=zeros(n,1);
    pmiss=zeros(n,1);
    for k=1:n,
      [x1,x2]=gasrange(g(k));
      pmiss(k)=1-(exp(-x1^2/16)...
         -exp(-x2^2/16));
      pfa(k)=exp(-x1/2)-exp(-x2/2);
    end
    plot(pfa,pmiss);
    ylabel('P_{\rm MISS}');
    xlabel('P_{\rm FA}');

    function [x1,x2]=gasrange(gamma);
    g=16+16*log(gamma/4);
    xmax=7+ ...
      sqrt(max(0,4+(16*log(4))+g));
    dx=xmax/500;
    x=dx:dx:4;
    y=(x-4).^2-g-16*log(x);
    [ym,i]=min(abs(y));
    x1=x(i);
    x=4:dx:xmax;
    y=(x-4).^2-g-16*log(x);
    [ym,i]=min(abs(y));
    x2=x(i);

The argument n of gasroc(n) generates the ROC for n values of $\gamma$, ranging from 1/20 to 20 in multiplicative steps. Here is the resulting ROC:

[Figure: ROC, $P_{MISS}$ versus $P_{FA}$, both axes from 0 to 1.]

After all of this work, we see that the sensor is not particularly good in the sense that no matter how we choose the thresholds, we cannot reduce both the miss and false alarm probabilities under 30 percent.

Problem 8.4.5 Solution

In the solution to Problem 8.3.6, we found the signal constellation and acceptance regions shown in the adjacent figure.

[Figure: the $(X_1, X_2)$ plane with signals $\mathbf{s}_0, \mathbf{s}_1, \mathbf{s}_2, \ldots, \mathbf{s}_{M-2}, \mathbf{s}_{M-1}$ and pie-slice acceptance regions $A_0, A_1, A_2, \ldots, A_{M-2}, A_{M-1}$.]

We could solve this problem by a general simulation of an M-PSK system. This would include a random sequence of data symbols, mapping symbol $i$ to vector $\mathbf{s}_i$, adding the noise vector $\mathbf{N}$ to produce the receiver output $\mathbf{X} = \mathbf{s}_i + \mathbf{N}$.

However, we are only asked to find the probability of symbol error, but not the probability that symbol $i$ is decoded as symbol $j$ at the receiver. Because of the symmetry of the signal constellation and the acceptance regions, the probability of symbol error is the same no matter what symbol is transmitted.

Thus it is simpler to assume that $\mathbf{s}_0$ is transmitted every time and check that the noise vector $\mathbf{N}$ is in the pie slice around $\mathbf{s}_0$. In fact by translating $\mathbf{s}_0$ to the origin, we obtain the "pie slice" geometry shown in the figure.

[Figure: the $(N_1, N_2)$ plane with the pie slice of half-angle $\theta/2$ whose vertex is at $(-E^{1/2}, 0)$ and which contains the origin $(0, 0)$.]

Because the lines marking the boundaries of the pie slice have slopes $\pm\tan\theta/2$, the pie slice region is given by the constraints
\[ N_2 \le \tan(\theta/2)\left[N_1 + \sqrt{E}\right], \qquad N_2 \ge -\tan(\theta/2)\left[N_1 + \sqrt{E}\right]. \tag{1} \]
We can rearrange these inequalities to express them in vector form as
\[ \begin{bmatrix} -\tan\theta/2 & 1 \\ -\tan\theta/2 & -1 \end{bmatrix}\begin{bmatrix} N_1 \\ N_2 \end{bmatrix} \le \begin{bmatrix} 1 \\ 1 \end{bmatrix}\sqrt{E}\tan\theta/2. \tag{2} \]
Finally, since each $N_i$ has variance $\sigma^2$, we define the Gaussian $(\mathbf{0}, \mathbf{I})$ random vector $\mathbf{Z} = \mathbf{N}/\sigma$ and write our constraints as
\[ \begin{bmatrix} -\tan\theta/2 & 1 \\ -\tan\theta/2 & -1 \end{bmatrix}\begin{bmatrix} Z_1 \\ Z_2 \end{bmatrix} \le \begin{bmatrix} 1 \\ 1 \end{bmatrix}\sqrt{\gamma}\tan\theta/2, \tag{3} \]
where $\gamma = E/\sigma^2$ is the signal to noise ratio of the system.

The Matlab "simulation" simply generates many pairs $[Z_1\ Z_2]'$ and checks what fraction meets these constraints. The function mpsksim(M,snr,n) simulates the M-PSK system with SNR snr for n bit transmissions. The script mpsktest graphs the symbol error probability for $M = 8, 16, 32$.
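
The pie-slice check for Problem 8.4.5 can also be sketched in a few lines of Python, shown here only as an illustration alongside the manual's Matlab code. It assumes unit-variance iid Gaussian pairs $(Z_1, Z_2)$, slice angle $\theta = 2\pi/M$ so that $\tan(\theta/2) = \tan(\pi/M)$, and `snr` equal to $\gamma = E/\sigma^2$; the function name `mpsk_symbol_error` is hypothetical.

```python
import math
import random


def mpsk_symbol_error(M, snr, n, seed=1):
    """Estimate the M-PSK symbol error probability from n noise pairs.

    A pair (z1, z2) counts as a correct decision when it satisfies both
    linear constraints that define the pie slice around s0.
    """
    rng = random.Random(seed)
    t = math.tan(math.pi / M)        # tan(theta/2) with theta = 2*pi/M
    bound = t * math.sqrt(snr)       # right-hand side of both constraints
    correct = 0
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        if -t * z1 + z2 <= bound and -t * z1 - z2 <= bound:
            correct += 1
    return 1.0 - correct / n


if __name__ == "__main__":
    for M in (8, 16, 32):
        print(M, mpsk_symbol_error(M, snr=100.0, n=100000))
```

One way to sanity-check such a sketch is the case $M = 4$: the two constraints become independent half-plane tests, so the error probability should be close to $1 - (1 - Q(\sqrt{\gamma/2}))^2$.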
    function Pe=mpsksim(M,snr,n);
    %Problem 8.4.5 Solution:
    %Pe=mpsksim(M,snr,n)
    %n bit M-PSK simulation
    t=tan(pi/M);
    A=[-t 1; -t -1];
    Z=randn(2,n);
    PC=zeros(size(snr)); %fraction of correct decisions at each SNR
    for k=1:length(snr),
        B=(A*Z)<=t*sqrt(snr(k));
        PC(k)=sum(min(B))/n;
    end
    Pe=1-PC;

    %mpsktest.m;
    snr=10.^((0:30)/10);
    n=500000;
    Pe8=mpsksim(8,snr,n);
    Pe16=mpsksim(16,snr,n);
    Pe32=mpsksim(32,snr,n);
    loglog(snr,Pe8,snr,Pe16,snr,Pe32);
    legend('M=8','M=16','M=32',3);

In mpsksim.m, each column of the matrix Z corresponds to a pair of noise variables $[Z_1\ Z_2]'$. The code B=(A*Z)<=t*sqrt(snr(k)) checks whether each pair of noise variables is in the pie slice region. That is, B(1,j) and B(2,j) indicate if the jth pair meets the first and second constraints. Since min(B) operates on each column of B, min(B) is a row vector indicating which pairs of noise variables passed the test.

Here is the output of mpsktest:

[Figure: log-log plot of Pe versus SNR with curves for M=8, M=16 and M=32.]

The curves for M = 8 and M = 16 end prematurely because for high SNR, the error rate is so low that no errors are generated in 500,000 symbols. In this case, the measured Pe is zero and since log 0 = −∞, the loglog function simply ignores the zero values.

Problem 8.4.6 Solution

When the transmitted bit vector is X = x, the received signal vector $Y = SP^{1/2}x + N$ is a Gaussian $(SP^{1/2}x, \sigma^2 I)$ random vector of length n with conditional PDF

\[ f_{Y|X}(y|x) = \frac{1}{(2\pi\sigma^2)^{n/2}}\, e^{-\|y - SP^{1/2}x\|^2/2\sigma^2}. \tag{1} \]

The transmitted data vector x belongs to the set $B_k$ of all binary ±1 vectors of length k. In principle, we can enumerate the vectors in $B_k$ as $x_0, x_1, \ldots, x_{2^k-1}$. That is, each possible data vector $x_m$ represents a hypothesis. Since there are $2^k$ possible data vectors, there are $2^k$ acceptance sets $A_m$. The set $A_m$ is the set of all vectors y such that the decision rule is to guess $\hat{X} = x_m$. Our normal procedure is to write a decision rule as "$y \in A_m$ if . . . "; however, this problem has so many hypotheses that it is more straightforward to refer to a hypothesis $X = x_m$ by the function $\hat{x}(y)$ which returns the vector $x_m$ when $y \in A_m$. In this case, $\hat{x}(y)$ is our best guess as to which vector x was transmitted when y is received.

Because each hypothesis has a priori probability $2^{-k}$, the probability of error is minimized by the maximum likelihood (ML) rule

\[ \hat{x}(y) = \arg\max_{x \in B_k} f_{Y|X}(y|x). \tag{2} \]

Keep in mind that $\arg\max_x g(x)$ returns the argument x that maximizes g(x). In any case, the form of $f_{Y|X}(y|x)$ implies that the ML rule should minimize the negative exponent of $f_{Y|X}(y|x)$. That is, the ML rule is

\[ \hat{x}(y) = \arg\min_{x \in B_k} \|y - SP^{1/2}x\| \tag{3} \]
\[ = \arg\min_{x \in B_k} (y - SP^{1/2}x)'(y - SP^{1/2}x) \tag{4} \]
\[ = \arg\min_{x \in B_k} y'y - 2y'SP^{1/2}x + x'P^{1/2}S'SP^{1/2}x. \tag{5} \]

Since the term $y'y$ is the same for every x, we can define the function

\[ h(x) = -2y'SP^{1/2}x + x'P^{1/2}S'SP^{1/2}x. \tag{6} \]

In this case, the ML rule can be expressed as $\hat{x}(y) = \arg\min_{x \in B_k} h(x)$. We use Matlab to evaluate h(x) for each $x \in B_k$. Since for k = 10, $B_k$ has $2^{10} = 1024$ vectors, it is desirable to make the calculation as easy as possible. To this end, we define $w = SP^{1/2}x$ and we write, with some abuse of notation, h(·) as a function of w:

\[ h(w) = -2y'w + w'w. \tag{7} \]

Still, given y, we need to evaluate h(w) for each vector w. In Matlab, this will be convenient because we can form the matrices X and W with columns consisting of all possible vectors x and w. In Matlab, it is easy to calculate $w'w$ by operating on the matrix W without looping through all columns w.

In terms of Matlab, we start by defining X=allbinaryseqs(n) which returns an $n \times 2^n$ matrix X such that the columns of X enumerate all possible binary ±1 sequences of length n. How allbinaryseqs works will be clear by generating the matrices A and P and reading the help for bitget.

    function X=allbinaryseqs(n)
    %See Problem 8.4.6
    %X: n by 2^n matrix of all
    %length n binary vectors
    %Thanks to Jasvinder Singh
    A=repmat([0:2^n-1],[n,1]);
    P=repmat([1:n]',[1,2^n]);
    X=bitget(A,P);
    X=(2*X)-1;

Next is a short program that generates k random signals, each of length n. Each random signal is just a binary ±1 sequence normalized to have length 1.

    function S=randomsignals(n,k);
    %S is an n by k matrix, columns are
    %random unit length signal vectors
    S=(rand(n,k)>0.5);
    S=((2*S)-1.0)/sqrt(n);

Next, for a set of signal vectors (spreading sequences in CDMA parlance) given by the n × k matrix S, err=cdmasim(S,P,m) simulates the transmission of a frame of m symbols through a k user CDMA system with additive Gaussian noise. A "symbol" is just a vector x corresponding to the k transmitted bits of the k users.
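The exhaustive search behind the ML rule can be sketched outside Matlab as well. The following Python fragment is only an illustration under stated assumptions, not the manual's implementation: the names `all_binary_seqs` and `ml_detect` are hypothetical, S is passed as a list of n rows, and p holds the k diagonal entries of P. It enumerates every x in $B_k$, forms $w = SP^{1/2}x$, and keeps the x that minimizes $h(w) = -2y'w + w'w$.

```python
import math
from itertools import product

def all_binary_seqs(k):
    """Return all 2**k vectors of length k with entries -1 or +1 (the set B_k)."""
    return [list(x) for x in product((-1, 1), repeat=k)]

def ml_detect(y, S, p):
    """Exhaustive ML detection (illustrative sketch).

    y: received vector of length n
    S: n-by-k signal matrix given as a list of n rows
    p: list of the k diagonal entries of P (user powers)
    Returns the x in B_k minimizing h(w) = -2*y'w + w'w with w = S*sqrt(P)*x.
    """
    n, k = len(S), len(p)
    best_x, best_h = None, math.inf
    for x in all_binary_seqs(k):
        # w = S * P^(1/2) * x, computed row by row
        w = [sum(S[r][c] * math.sqrt(p[c]) * x[c] for c in range(k))
             for r in range(n)]
        h = -2 * sum(yi * wi for yi, wi in zip(y, w)) + sum(wi * wi for wi in w)
        if h < best_h:
            best_x, best_h = x, h
    return best_x

# Noiseless example: two orthogonal unit-length signals of length n = 4.
S = [[0.5, 0.5], [0.5, -0.5], [0.5, 0.5], [0.5, -0.5]]
p = [4.0, 4.0]
x_sent = [1, -1]
y = [sum(S[r][c] * math.sqrt(p[c]) * x_sent[c] for c in range(2)) for r in range(4)]
x_hat = ml_detect(y, S, p)
```

Without noise the received vector equals one of the candidate vectors w, so the search returns the transmitted bits. The cost grows as $2^k$, which is why the vectorized Matlab version precomputes all columns w once per signal set.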
In addition, the function Pe=rcdma(n,k,snr,s,m) runs cdmasim for the pairs of values of users k and SNR snr. Here is the pair of functions:

    function err=cdmasim(S,P,m);
    %err=cdmasim(S,P,m);
    %S= n x k matrix of signals
    %P= diag matrix of SNRs (power
    % normalized by noise variance)
    %See Problem 8.4.6
    k=size(S,2); %number of users
    n=size(S,1); %processing gain
    X=allbinaryseqs(k);%all data
    Phalf=sqrt(P);
    W=S*Phalf*X;
    WW=sum(W.*W);
    err=0;
    for j=1:m,
        s=duniformrv(1,2^k,1);
        y=S*Phalf*X(:,s)+randn(n,1);
        [hmin,imin]=min(-2*y'*W+WW);
        err=err+sum(X(:,s)~=X(:,imin));
    end

    function Pe=rcdma(n,k,snr,s,m);
    %Pe=rcdma(n,k,snr,s,m);
    %R-CDMA simulation:
    % proc gain=n, users=k
    % rand signal set/frame
    % s frames, m symbols/frame
    %See Problem 8.4.6 Solution
    [K,SNR]=ndgrid(k,snr);
    Pe=zeros(size(SNR));
    for j=1:prod(size(SNR)),
        p=SNR(j);k=K(j);
        e=0;
        for i=1:s,
            S=randomsignals(n,k);
            e=e+cdmasim(S,p*eye(k),m);
        end
        Pe(j)=e/(s*m*k);
        % disp([p k e Pe(j)]);
    end

In cdmasim, the kth diagonal element of P is the "power" $p_k$ of user k. Technically, we assume that the additive Gaussian noise variables have variance 1, and thus $p_k$ is actually the signal to noise ratio of user k. In addition, WW is a length $2^k$ row vector, with elements $w'w$ for each possible w. For each of the m random data symbols, represented by x (or X(:,s) in Matlab), cdmasim calculates a received signal y (y). Finally, hmin is the minimum h(w) and imin is the index of the column of W that minimizes h(w). Thus imin is also the index of the minimizing column of X. Finally, cdmasim compares $\hat{x}(y)$ and the transmitted vector x bit by bit and counts the total number of bit errors.

The function rcdma repeats cdmasim for s frames, with a random signal set for each frame. Dividing the total number of bit errors over s frames by the total number of transmitted bits, we find the bit error rate Pe. For an SNR of 4 dB and processing gain 16, the requested tests are generated with the commands

    >> n=16;
    >> k=[2 4 8 16];
    >> Pe=rcdma(n,k,snr,100,1000);
    >> Pe
    Pe =
        0.0252 0.0272 0.0385 0.0788
    >>

To answer part (b), the code for the matched filter (MF) detector is much simpler because there is no need to test $2^k$ hypotheses for every transmitted symbol. Just as for the case of the ML detector, we define a function err=mfcdmasim(S,P,m) that simulates the MF detector for m symbols for a given set of signal vectors S. In mfcdmasim, there is no need for looping. The mth transmitted symbol is represented by the mth column of X and the corresponding received signal is given by the mth column of Y. The matched filter processing can be applied to all m columns at once. A second function Pe=mfrcdma(n,k,snr,s,m) cycles through all combinations of users k and SNR snr and calculates the bit error rate for each pair of values. Here are the functions:

    function err=mfcdmasim(S,P,m);
    %err=mfcdmasim(S,P,m);
    %S= n x k matrix of signals
    %P= diag matrix of SNRs
    % SNR=power/var(noise)
    %See Problem 8.4.6b
    k=size(S,2); %no. of users
    n=size(S,1); %proc. gain
    Phalf=sqrt(P);
    X=randombinaryseqs(k,m);
    Y=S*Phalf*X+randn(n,m);
    XR=sign(S'*Y);
    err=sum(sum(XR ~= X));

    function Pe=mfrcdma(n,k,snr,s,m);
    %Pe=mfrcdma(n,k,snr,s,m);
    %R-CDMA, MF detection
    % proc gain=n, users=k
    % rand signal set/frame
    % s frames, m symbols/frame
    %See Problem 8.4.6 Solution
    [K,SNR]=ndgrid(k,snr);
    Pe=zeros(size(SNR));
    for j=1:prod(size(SNR)),
        p=SNR(j);kt=K(j);
        e=0;
        for i=1:s,
            S=randomsignals(n,kt);
            e=e+mfcdmasim(S,p*eye(kt),m);
        end
        Pe(j)=e/(s*m*kt);
        disp([snr kt e]);
    end

Here is a run of mfrcdma.

    >> pemf=mfrcdma(16,k,4,1000,1000);
         4     2   73936
         4     4  264234
         4     8  908558
         4    16 2871356
    >> pemf'
    ans =
        0.0370 0.0661 0.1136 0.1795
    >>

The following plot compares the maximum likelihood (ML) and matched filter (MF) detectors.

[Figure: bit error rate versus number of users k (0 to 16) for the ML and MF detectors.]

As the ML detector offers the minimum probability of error, it should not be surprising that it has a lower bit error rate. Although the MF detector is worse, the reduction in detector complexity makes it attractive. In fact, in practical CDMA-based cellular phones, the processing gain ranges from roughly 128 to 512. In such cases, the complexity of the ML detector is prohibitive and thus only matched filter detectors are used.

Problem 8.4.7 Solution

For the CDMA system of Problem 8.3.10, the received signal resulting from the transmissions of k users was given by

\[ Y = SP^{1/2}X + N \tag{1} \]
where S is an n × k matrix with ith column $S_i$ and $P^{1/2} = \mathrm{diag}[\sqrt{p_1}, \ldots, \sqrt{p_k}]$ is a k × k diagonal matrix of received powers, and N is a Gaussian $(0, \sigma^2 I)$ noise vector.

(a) When S has linearly independent columns, $S'S$ is invertible. In this case, the decorrelating detector applies a transformation to Y to generate

\[ \tilde{Y} = (S'S)^{-1}S'Y = P^{1/2}X + \tilde{N}, \tag{2} \]

where $\tilde{N} = (S'S)^{-1}S'N$ is still a Gaussian noise vector with expected value $E[\tilde{N}] = 0$. Decorrelation separates the signals in that the ith component of $\tilde{Y}$ is

\[ \tilde{Y}_i = \sqrt{p_i}\,X_i + \tilde{N}_i. \tag{3} \]

This is the same as a single user-receiver output of the binary communication system of Example 8.6. The single-user decision rule $\hat{X}_i = \mathrm{sgn}(\tilde{Y}_i)$ for the transmitted bit $X_i$ has probability of error

\[ P_{e,i} = P[\tilde{Y}_i > 0 \mid X_i = -1] = P[-\sqrt{p_i} + \tilde{N}_i > 0] = Q\left(\sqrt{\frac{p_i}{\mathrm{Var}[\tilde{N}_i]}}\right). \tag{4} \]

However, since $\tilde{N} = AN$ where $A = (S'S)^{-1}S'$, Theorem 5.16 tells us that $\tilde{N}$ has covariance matrix $C_{\tilde{N}} = AC_NA'$. We note that the general property that $(B^{-1})' = (B')^{-1}$ implies that $A' = S((S'S)')^{-1} = S(S'S)^{-1}$. These facts imply

\[ C_{\tilde{N}} = (S'S)^{-1}S'(\sigma^2 I)S(S'S)^{-1} = \sigma^2 (S'S)^{-1}. \tag{5} \]

Note that $S'S$ is called the correlation matrix since its i, jth entry $S_i'S_j$ is the correlation between the signal of user i and that of user j. Thus $\mathrm{Var}[\tilde{N}_i] = \sigma^2 (S'S)^{-1}_{ii}$ and, with the noise variance normalized to $\sigma^2 = 1$, the probability of bit error for user i is

\[ P_{e,i} = Q\left(\sqrt{\frac{p_i}{\mathrm{Var}[\tilde{N}_i]}}\right) = Q\left(\sqrt{\frac{p_i}{(S'S)^{-1}_{ii}}}\right). \tag{6} \]

To find the probability of error for a randomly chosen bit, we average over the bits of all users and find that

\[ P_e = \frac{1}{k}\sum_{i=1}^{k} P_{e,i} = \frac{1}{k}\sum_{i=1}^{k} Q\left(\sqrt{\frac{p_i}{(S'S)^{-1}_{ii}}}\right). \tag{7} \]

(b) When $S'S$ is not invertible, the detector flips a coin to decide each bit. In this case, $P_{e,i} = 1/2$ and thus $P_e = 1/2$.

(c) When S is chosen randomly, we need to average over all possible matrices S to find the average probability of bit error. However, there are $2^{kn}$ possible matrices S and averaging over all of them is too much work. Instead, we randomly generate m matrices S and estimate the average $P_e$ by averaging over these m matrices.

A function berdecorr uses this method to evaluate the decorrelator BER. The code has a lot of lines because it evaluates the BER using m signal sets for each combination of users k and SNRs snr. However, because the program generates signal sets and calculates the BER associated with each, there is no need for the simulated transmission of bits. Thus the program runs quickly. Since there are only $2^n$ distinct columns for matrix S, it is quite possible to generate signal sets that are not linearly independent. In this case, berdecorr assumes the "flip a coin" rule is used. Just to see whether this rule dominates the error probability, we also display counts of how often S is rank deficient. Here is the (somewhat tedious) code:

    function Pe=berdecorr(n,k,snr,m);
    %Problem 8.4.7 Solution: R-CDMA with decorrelation
    %proc gain=n, users=k, average Pe for m signal sets
    count=zeros(1,length(k)); %counts rank<k signal sets
    Pe=zeros(length(k),length(snr)); snr=snr(:)';
    for mm=1:m,
        for i=1:length(k),
            S=randomsignals(n,k(i)); R=S'*S;
            if (rank(R)<k(i))
                count(i)=count(i)+1;
                Pe(i,:)=Pe(i,:)+0.5*ones(1,length(snr));
            else
                G=diag(inv(R));
                Pe(i,:)=Pe(i,:)+sum(qfunction(sqrt((1./G)*snr)))/k(i);
            end
        end
    end
    disp('Rank deficiency count:');disp(k);disp(count);
    Pe=Pe/m;

Running berdecorr with processing gains n = 16 and n = 32 yields the following output:

    >> k=[1 2 4 8 16 32];
    >> pe16=berdecorr(16,k,4,10000);
    Rank deficiency count:
         1     2     4     8    16    32
         0     2     2    12   454 10000
    >> pe16'
    ans =
        0.0228 0.0273 0.0383 0.0755 0.3515 0.5000
    >> pe32=berdecorr(32,k,4,10000);
    Rank deficiency count:
         1     2     4     8    16    32
         0     0     0     0     0     0
    >> pe32'
    ans =
        0.0228 0.0246 0.0290 0.0400 0.0771 0.3904
    >>

As you might expect, the BER increases as the number of users increases. This occurs because the decorrelator must suppress a large set of interferers. Also, in generating 10,000 signal matrices S for each value of k we see that rank deficiency is fairly uncommon; however, it occasionally occurs for processing gain n = 16, even if k = 4 or k = 8. Finally, here is a plot of these same BER statistics for n = 16 and k ∈ {2, 4, 8, 16}. Just for comparison, on the same graph is the BER for the matched filter detector and the maximum likelihood detector found in Problem 8.4.6.
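For a single signal set, the decorrelator error probability $P_e = \frac{1}{k}\sum_i Q(\sqrt{p_i/(S'S)^{-1}_{ii}})$ with the coin-flip fallback can be evaluated directly. The Python sketch below is an illustration only: the helper names `q_function`, `invert` and `decorrelator_ber` are hypothetical, all users are assumed to share one SNR value, the noise variance is taken as 1, and a small Gauss-Jordan routine stands in for Matlab's inv and rank.

```python
import math

def q_function(x):
    """Gaussian tail probability Q(x) = P[N(0,1) > x]."""
    return 0.5 * math.erfc(x / math.sqrt(2))

def invert(R):
    """Gauss-Jordan inverse of a small square matrix; returns None if singular."""
    k = len(R)
    # Augment R with the identity matrix.
    A = [[float(v) for v in row] + [float(i == j) for j in range(k)]
         for i, row in enumerate(R)]
    for c in range(k):
        piv = max(range(c, k), key=lambda r: abs(A[r][c]))
        if abs(A[piv][c]) < 1e-12:
            return None  # rank deficient
        A[c], A[piv] = A[piv], A[c]
        d = A[c][c]
        A[c] = [v / d for v in A[c]]
        for r in range(k):
            if r != c:
                f = A[r][c]
                A[r] = [vr - f * vc for vr, vc in zip(A[r], A[c])]
    return [row[k:] for row in A]

def decorrelator_ber(S, snr):
    """Average bit error probability of the decorrelator for one signal set S
    (list of n rows, k columns), assuming every user has the same SNR."""
    n, k = len(S), len(S[0])
    # Correlation matrix R = S'S.
    R = [[sum(S[r][i] * S[r][j] for r in range(n)) for j in range(k)]
         for i in range(k)]
    Rinv = invert(R)
    if Rinv is None:
        return 0.5  # "flip a coin" rule when S'S is not invertible
    return sum(q_function(math.sqrt(snr / Rinv[i][i])) for i in range(k)) / k

orthogonal = [[0.5, 0.5], [0.5, -0.5], [0.5, 0.5], [0.5, -0.5]]
correlated = [[0.5, 0.5], [0.5, 0.5], [0.5, 0.5], [0.5, -0.5]]
identical = [[0.5, 0.5], [0.5, 0.5], [0.5, 0.5], [0.5, 0.5]]
```

With orthogonal signals the decorrelator reduces to the single-user receiver, so at SNR 4 the result is Q(2) ≈ 0.0228, the value reported for k = 1 in the berdecorr output. Correlated signals inflate $(S'S)^{-1}_{ii}$ and raise the error probability.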
[Figure: Pe versus number of users k (0 to 16) for the ML, decorrelator (Decorr) and MF detectors.]

We see from the graph that the decorrelator is better than the matched filter for a small number of users. However, when the number of users k is large (relative to the processing gain n), the decorrelator suffers because it must suppress all interfering users. Finally, we note that these conclusions are specific to this scenario when all users have equal SNR. When some users have very high SNR, the decorrelator is good for the low-SNR user because it zeros out the interference from the high-SNR user.

Problem 8.4.8 Solution

Each transmitted symbol $s_{b_1b_2b_3}$ corresponds to the transmission of three bits given by the vector $b = [b_1\ b_2\ b_3]'$. Note that $s_{b_1b_2b_3}$ is a two dimensional vector with components $[s^{(1)}_{b_1b_2b_3}\ s^{(2)}_{b_1b_2b_3}]'$. The key to this problem is the mapping from bits to symbol components and then back to bits. From the signal constellation shown with Problem 8.3.2, we make the following observations:

• $s_{b_1b_2b_3}$ is in the right half plane if $b_2 = 0$; otherwise it is in the left half plane.
• $s_{b_1b_2b_3}$ is in the upper half plane if $b_3 = 0$; otherwise it is in the lower half plane.
• There is an inner ring and an outer ring of signals. $s_{b_1b_2b_3}$ is in the inner ring if $b_1 = 0$; otherwise it is in the outer ring.

Given a bit vector b, we use these facts by first using $b_2$ and $b_3$ to map $b = [b_1\ b_2\ b_3]'$ to an inner ring signal vector

\[ s \in \left\{ \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \begin{bmatrix} -1 \\ 1 \end{bmatrix}, \begin{bmatrix} -1 \\ -1 \end{bmatrix}, \begin{bmatrix} 1 \\ -1 \end{bmatrix} \right\}. \tag{1} \]

In the next step we scale s by $(1 + b_1)$. If $b_1 = 1$, then s is stretched to the outer ring. Finally, we add a Gaussian noise vector N to generate the received signal $X = s_{b_1b_2b_3} + N$.

In the solution to Problem 8.3.2, we found that the acceptance set for the hypothesis $H_{b_1b_2b_3}$ that $s_{b_1b_2b_3}$ is transmitted is the set of signal space points closest to $s_{b_1b_2b_3}$. Graphically, these acceptance sets are given in the adjacent figure.

[Figure: signal space with axes X1 and X2, the eight signals s000 through s111 and their acceptance sets A000 through A111.]

These acceptance sets correspond to an inverse mapping of the received signal vector X to a bit vector guess $\hat{b} = [\hat{b}_1\ \hat{b}_2\ \hat{b}_3]'$ using the following rules:

• $\hat{b}_2 = 1$ if $X_1 < 0$; otherwise $\hat{b}_2 = 0$.
• $\hat{b}_3 = 1$ if $X_2 < 0$; otherwise $\hat{b}_3 = 0$.
• If $|X_1| + |X_2| > 3\sqrt{2}/2$, then $\hat{b}_1 = 1$; otherwise $\hat{b}_1 = 0$.
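The bit-to-symbol mapping and the three decision rules above can be sketched in a few lines of Python. This is an illustrative sketch, not the manual's myqam function: the names `bits_to_symbol` and `symbol_to_bits` are hypothetical, and the inner-ring points are assumed to sit at (±1, ±1) with the outer ring at twice that scale.

```python
def bits_to_symbol(b):
    """Map bits (b1, b2, b3) to a 2-D signal point.

    b2 and b3 pick the half planes (0 -> positive coordinate), giving an
    inner-ring point at (+-1, +-1); b1 = 1 stretches it to the outer ring.
    """
    b1, b2, b3 = b
    scale = 1 + b1
    return (scale * (1 - 2 * b2), scale * (1 - 2 * b3))

def symbol_to_bits(x, threshold=3 / 2 ** 0.5):
    """Invert the mapping with the three decision rules.

    threshold is the ring boundary 3*sqrt(2)/2 applied to |X1| + |X2|.
    """
    x1, x2 = x
    b2 = 1 if x1 < 0 else 0
    b3 = 1 if x2 < 0 else 0
    b1 = 1 if abs(x1) + abs(x2) > threshold else 0
    return (b1, b2, b3)

# Without noise, every one of the eight bit patterns is recovered.
all_bits = [(b1, b2, b3) for b1 in (0, 1) for b2 in (0, 1) for b3 in (0, 1)]
round_trip = [symbol_to_bits(bits_to_symbol(b)) for b in all_bits]
```

Adding a noise pair to the output of `bits_to_symbol` before calling `symbol_to_bits` would give one simulated transmission; counting mismatched bits over many such transmissions estimates the bit error rate.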
We implement these steps with the function [Pe,ber]=myqam(sigma,m) which simulates the transmission of m symbols for each value of the vector sigma. Each column of B corresponds to a bit vector b. Similarly, each column of S and X corresponds to a transmitted signal s and received signal X. We calculate both the symbol decision errors that are made as well as the bit decision errors. Finally, a script myqamplot.m plots the symbol error rate Pe and bit error rate ber as a function of sigma. Here are the programs:

    function [Pe,ber]=myqam(sigma,m);
    Pe=zeros(size(sigma)); ber=Pe;
    B=reshape(bernoullirv(0.5,3*m),3,m);
    %S(1,:)=1-2*B(2,:);
    %S(2,:)=1-2*B(3,:);
    S=1-2*B([2; 3],:);
    S=([1;1]*(1+B(1,:))).*S;
    N=randn(2,m);
    for i=1:length(sigma),
        X=S+sigma(i)*N;
        BR=zeros(size(B));
        BR([2;3],:)=(X<0);
        BR(1,:)=sum(abs(X))>(3/sqrt(2));
        E=(BR~=B);
        Pe(i)=sum(max(E))/m;
        ber(i)=sum(sum(E))/(3*m);
    end

    %myqamplot.m
    sig=10.^(0.2*(-8:0));
    [Pe,ber]=myqam(sig,1e6);
    loglog(sig,Pe,'-d', ...
        sig,ber,'-s');
    legend('SER','BER',4);

Note that we generate the bits and transmitted signals, and normalized noise only once. However, for each value of sigma, we rescale the additive noise, recalculate the received signal and receiver bit decisions. The output of myqamplot is shown in this figure:

[Figure: log-log plot of the symbol error rate (SER) and bit error rate (BER) versus sigma.]

Careful reading of the figure will show that the ratio of the symbol error rate to the bit error rate is always very close to 3. This occurs because in the acceptance set for $b_1b_2b_3$, the adjacent acceptance sets correspond to a one bit difference. Since the usual type of symbol error occurs when the vector X is in the adjacent set, a symbol error typically results in one bit being in error but two bits being received correctly. Thus the bit error rate is roughly one third the symbol error rate.

Problem 8.4.9 Solution

(a) For the M-PSK communication system with additive Gaussian noise, $A_j$ denoted the hypothesis that signal $s_j$ was transmitted. The solution to Problem 8.3.6 derived the MAP decision rule
\[ x \in A_m \text{ if } \|x - s_m\|^2 \le \|x - s_j\|^2 \text{ for all } j. \tag{1} \]

[Figure: signal space with axes X1 and X2, signals s0, s1, s2, ..., sM-2, sM-1 on a circle and the pie-slice acceptance regions A0, A1, A2, ..., AM-2, AM-1.]

In terms of geometry, the interpretation is that all vectors x closer to $s_m$ than to any other signal $s_j$ are assigned to $A_m$. In this problem, the signal constellation (i.e., the set of vectors $s_i$) is the set of vectors on the circle of radius $\sqrt{E}$. The acceptance regions are the "pie slices" around each signal vector. We observe that

\[ \|x - s_j\|^2 = (x - s_j)'(x - s_j) = x'x - 2x's_j + s_j's_j. \tag{2} \]

Since all the signals are on the same circle, $s_j's_j$ is the same for all j. Also, $x'x$ is the same for all j. Thus

\[ \arg\min_j \|x - s_j\|^2 = \arg\min_j\, (-x's_j) = \arg\max_j\, x's_j. \tag{3} \]

Since $x's_j = \|x\|\,\|s_j\| \cos\phi$ where φ is the angle between x and $s_j$, maximizing $x's_j$ is equivalent to minimizing the angle between x and $s_j$.

(b) In Problem 8.4.5, we estimated the probability of symbol error without building a complete simulation of the M-PSK system. In this problem, we need to build a more complete simulation to determine the probabilities $P_{ij}$. By symmetry, it is sufficient to transmit $s_0$ repeatedly and count how often the receiver guesses $s_j$. This is done by the function p=mpskerr(M,snr,n).

    function p=mpskerr(M,snr,n);
    %Problem 8.4.9 Solution:
    %p=mpskerr(M,snr,n)
    %n bit M-PSK simulation
    t=(2*pi/M)*(0:(M-1));
    S=sqrt(snr)*[cos(t);sin(t)];
    X=repmat(S(:,1),1,n)+randn(2,n);
    [y,e]=max(S'*X);
    p=countequal(e-1,(0:(M-1)))'/n;

Note that column i of S is the signal $s_{i-1}$. The kth column of X corresponds to $X_k = s_0 + N_k$, the received signal for the kth transmission. Thus y(k) corresponds to $\max_j X_k's_j$ and e(k) reports the receiver decision for the kth transmission. The vector p calculates the relative frequency of each receiver decision.

The next step is to translate the vector $[P_{00}\ P_{01}\ \cdots\ P_{0,M-1}]'$ (corresponding to p in Matlab) into an entire matrix P with elements $P_{ij}$. The symmetry of the phase rotation dictates that each row of P should be a one element cyclic rotation of the previous row. Moreover, by symmetry we observe that $P_{01} = P_{0,M-1}$, $P_{02} = P_{0,M-2}$ and so on. However, because p is derived from a simulation experiment, it will exhibit this symmetry only approximately. Our ad hoc (and largely unjustified) solution is to take the average of estimates of probabilities that symmetry says should be identical. (Why this might be a good thing to do would make an interesting exam problem.) In mpskmatrix(p), the matrix A implements the averaging. The code will become clear by examining the matrices A and the output P.

    function P=mpskmatrix(p);
    M=length(p);
    r=[0.5 zeros(1,M-2)];
    A=toeplitz(r)+...
        hankel(fliplr(r));
    A=[zeros(1,M-1);A];
    A=[[1; zeros(M-1,1)] A];
    P=toeplitz(A*(p(:)));

(c) The next step is to determine the effect of the mapping of bits to transmission vectors $s_j$. The matrix D has i, jth element $d_{ij}$ that indicates the number of bit positions in which the bit string assigned to $s_i$ differs from the bit string assigned to $s_j$. In this case, the integers provide a compact representation of this mapping. For example the binary mapping is

    s0   s1   s2   s3   s4   s5   s6   s7
    000  001  010  011  100  101  110  111
    0    1    2    3    4    5    6    7

The Gray mapping is

    s0   s1   s2   s3   s4   s5   s6   s7
    000  001  011  010  110  111  101  100
    0    1    3    2    6    7    5    4

Thus the binary mapping can be represented by a vector $c_1 = [0\ 1\ \cdots\ 7]'$ while the Gray mapping is described by $c_2 = [0\ 1\ 3\ 2\ 6\ 7\ 5\ 4]'$.

The function D=mpskdist(c) translates the mapping vector c into the matrix D with entries $d_{ij}$. The method is to generate grids C1 and C2 for the pairs of integers, convert each integer into a length $\log_2 M$ binary string, and then to count the number of bit positions in which each pair differs.

    function D=mpskdist(c);
    L=length(c);m=log2(L);
    [C1,C2]=ndgrid(c,c);
    B1=dec2bin(C1,m);
    B2=dec2bin(C2,m);
    D=reshape(sum((B1~=B2),2),L,L);

Given matrices P and D, the rest is easy. We treat BER as a finite random variable that takes on value $d_{ij}$ with probability $P_{ij}$. The expected value of this finite random variable is the expected number of bit errors. Note that the BER is a "rate" in that

\[ \mathrm{BER} = \frac{1}{M}\sum_i \sum_j P_{ij} d_{ij} \tag{4} \]

is the expected number of bit errors per transmitted symbol.

    function Pb=mpskmap(c,snr,n);
    M=length(c);
    D=mpskdist(c);
    Pb=zeros(size(snr));
    for i=1:length(snr),
        p=mpskerr(M,snr(i),n);
        P=mpskmatrix(p);
        Pb(i)=finiteexp(D,P)/M;
    end

Given the integer mapping vector c, we estimate the BER of a mapping using just one more function Pb=mpskmap(c,snr,n). First we calculate the matrix D with elements $d_{ij}$. Next, for each value of the vector snr, we use n transmissions to estimate the probabilities $P_{ij}$. Last, we calculate the expected number of bit errors per transmission.

(d) We evaluate the binary mapping with the following commands:

    >> c1=0:7;
    >> snr=[4 8 16 32 64];
    >> Pb=mpskmap(c1,snr,1000000);
    >> Pb
    Pb =
        0.7640 0.4878 0.2198 0.0529 0.0038

(e) Here is the performance of the Gray mapping:

    >> c2=[0 1 3 2 6 7 5 4];
    >> snr=[4 8 16 32 64];
    >> Pg=mpskmap(c2,snr,1000000);
    >> Pg
    Pg =
        0.4943 0.2855 0.1262 0.0306 0.0023

Experimentally, we observe that the BER of the binary mapping is higher than the BER of the Gray mapping by a factor in the neighborhood of 1.5 to 1.7.

In fact, this approximate ratio can be derived by a quick and dirty analysis. For high SNR, suppose that $s_i$ is decoded as $s_{i+1}$ or $s_{i-1}$ with probability $q = P_{i,i+1} = P_{i,i-1}$ and all other types of errors are negligible. In this case, the BER formula based on this approximation corresponds to summing the matrix D for the first off-diagonals and the corner elements. Here are the calculations:

    >> D=mpskdist(c1);
    >> sum(diag(D,1))+sum(diag(D,-1))+D(1,8)+D(8,1)
    ans =
        28
    >> DG=mpskdist(c2);
    >> sum(diag(DG,1))+sum(diag(DG,-1))+DG(1,8)+DG(8,1)
    ans =
        16

Thus in high SNR, we would expect

\[ \mathrm{BER(binary)} \approx 28q/M, \qquad \mathrm{BER(Gray)} \approx 16q/M. \tag{5} \]

The ratio of BERs is 28/16 = 1.75. Experimentally, we found at high SNR that the ratio of BERs was 0.0038/0.0023 = 1.65, which seems to be in the right ballpark.

Problem 8.4.10 Solution

As this problem is a continuation of Problem 8.4.9, this solution is also a continuation. In this problem, we want to determine the error probability for each bit k in a mapping of bits to the M-PSK signal constellation. The bit error rate associated with bit k is

\[ \mathrm{BER}(k) = \frac{1}{M}\sum_i \sum_j P_{ij} d_{ij}(k) \tag{1} \]

where $d_{ij}(k)$ indicates whether the bit strings mapped to $s_i$ and $s_j$ differ in bit position k. As in Problem 8.4.9, we describe the mapping by the vector of integers c. For example the binary mapping is

    s0   s1   s2   s3   s4   s5   s6   s7
    000  001  010  011  100  101  110  111
    0    1    2    3    4    5    6    7

The Gray mapping is
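    s0   s1   s2   s3   s4   s5   s6   s7
    000  001  011  010  110  111  101  100
    0    1    3    2    6    7    5    4

Thus the binary mapping can be represented by a vector $c_1 = [0\ 1\ \cdots\ 7]'$ while the Gray mapping is described by $c_2 = [0\ 1\ 3\ 2\ 6\ 7\ 5\ 4]'$.

The function D=mpskdbit(c,k) translates the mapping vector c into the matrix D with entries $d_{ij}$ that indicate whether bit k is in error when transmitted symbol $s_i$ is decoded by the receiver as $s_j$. The method is to generate grids C1 and C2 for the pairs of integers, identify bit k in each integer, and then check if the integers differ in bit k.

    function D=mpskdbit(c,k);
    %See Problem 8.4.10: For mapping
    %c, calculate BER of bit k
    L=length(c);m=log2(L);
    [C1,C2]=ndgrid(c,c);
    B1=bitget(C1,k);
    B2=bitget(C2,k);
    D=(B1~=B2);

Thus, there is a matrix D associated with each bit position and we calculate the expected number of bit errors associated with each bit position. For each bit, the rest of the solution is the same as in Problem 8.4.9. We use the commands p=mpskerr(M,snr,n) and P=mpskmatrix(p) to calculate the matrix P which holds an estimate of each probability $P_{ij}$. Finally, using matrices P and D, we treat BER(k) as a finite random variable that takes on value $d_{ij}$ with probability $P_{ij}$. The expected value of this finite random variable is the expected number of bit errors.

    function Pb=mpskbitmap(c,snr,n);
    %Problem 8.4.10: Calculate prob. of
    %bit error for each bit position for
    %an MPSK bit to symbol mapping c
    M=length(c);m=log2(M);
    p=mpskerr(M,snr,n);
    P=mpskmatrix(p);
    Pb=zeros(1,m);
    for k=1:m,
        D=mpskdbit(c,k);
        Pb(k)=finiteexp(D,P)/M;
    end

Given the integer mapping vector c, we estimate the per-bit BER of a mapping with the function Pb=mpskbitmap(c,snr,n). First, for a given value of snr, we use n transmissions to estimate the probabilities $P_{ij}$. Next, for each bit position k, we calculate the matrix D with elements $d_{ij}(k)$. Last, we calculate the expected number of bit k errors per transmission.

For an SNR of 10dB, we evaluate the two mappings with the following commands:

    >> c1=0:7;
    >> mpskbitmap(c1,10,100000)
    ans =
        0.2247 0.1149 0.0577
    >> c2=[0 1 3 2 6 7 5 4];
    >> mpskbitmap(c2,10,100000)
    ans =
        0.1140 0.0572 0.0572

We see that in the binary mapping, the 0.22 error rate of bit 1 is roughly double that of bit 2, which is roughly double that of bit 3. For the Gray mapping, the error rate of bit 1 is cut in half relative to the binary mapping. However, the bit error rates at each position are still not identical since the error rate of bit 1 is still double that for bit 2 or bit 3. One might surmise that careful study of the matrix D might lead one to prove for the Gray map that the error rate for bit 1 is exactly double that for bits 2 and 3 . . . but that would be some other homework problem.

Problem Solutions – Chapter 9

Problem 9.1.1 Solution

First we note that the event T > t0 has probability

\[ P[T > t_0] = \int_{t_0}^{\infty} \lambda e^{-\lambda t}\,dt = e^{-\lambda t_0}. \tag{1} \]

Given T > t0, the conditional PDF of T is

\[ f_{T|T>t_0}(t) = \begin{cases} \dfrac{f_T(t)}{P[T>t_0]} & t > t_0, \\ 0 & \text{otherwise,} \end{cases} = \begin{cases} \lambda e^{-\lambda(t-t_0)} & t > t_0, \\ 0 & \text{otherwise.} \end{cases} \tag{2} \]

Given T > t0, the minimum mean square error estimate of T is

\[ \hat{T} = E[T \mid T > t_0] = \int_{-\infty}^{\infty} t f_{T|T>t_0}(t)\,dt = \int_{t_0}^{\infty} \lambda t e^{-\lambda(t-t_0)}\,dt. \tag{3} \]

With the substitution $t' = t - t_0$, we obtain

\[ \hat{T} = \int_0^{\infty} \lambda (t_0 + t') e^{-\lambda t'}\,dt' \tag{4} \]
\[ = t_0 \underbrace{\int_0^{\infty} \lambda e^{-\lambda t'}\,dt'}_{1} + \underbrace{\int_0^{\infty} t' \lambda e^{-\lambda t'}\,dt'}_{E[T]} = t_0 + E[T]. \tag{5} \]

Problem 9.1.2 Solution

(a) For 0 ≤ x ≤ 1,

\[ f_X(x) = \int_{-\infty}^{\infty} f_{X,Y}(x,y)\,dy = \int_x^1 6(y-x)\,dy \tag{1} \]
\[ = \left. 3y^2 - 6xy \right|_{y=x}^{y=1} \tag{2} \]
\[ = 3(1 - 2x + x^2) = 3(1-x)^2. \tag{3} \]

The complete expression for the marginal PDF of X is

\[ f_X(x) = \begin{cases} 3(1-x)^2 & 0 \le x \le 1, \\ 0 & \text{otherwise.} \end{cases} \tag{4} \]

(b) The blind estimate of X is

\[ \hat{x}_B = E[X] = \int_0^1 3x(1-x)^2\,dx = \left. \frac{3}{2}x^2 - 2x^3 + \frac{3}{4}x^4 \right|_0^1 = \frac{1}{4}. \tag{5} \]

(c) First we calculate

\[ P[X < 0.5] = \int_0^{0.5} f_X(x)\,dx = \int_0^{0.5} 3(1-x)^2\,dx = \left. -(1-x)^3 \right|_0^{0.5} = \frac{7}{8}. \tag{6} \]

The conditional PDF of X given X < 0.5 is

\[ f_{X|X<0.5}(x) = \begin{cases} \dfrac{f_X(x)}{P[X<0.5]} & 0 \le x < 0.5, \\ 0 & \text{otherwise,} \end{cases} = \begin{cases} \dfrac{24}{7}(1-x)^2 & 0 \le x < 0.5, \\ 0 & \text{otherwise.} \end{cases} \tag{7} \]

The minimum mean square estimate of X given X < 0.5 is

\[ E[X \mid X < 0.5] = \int_{-\infty}^{\infty} x f_{X|X<0.5}(x)\,dx = \int_0^{0.5} \frac{24x}{7}(1-x)^2\,dx \tag{8} \]
\[ = \left. \frac{12x^2}{7} - \frac{16x^3}{7} + \frac{6x^4}{7} \right|_0^{0.5} = \frac{11}{56}. \tag{9} \]

(d) For y < 0 or y > 1, $f_Y(y) = 0$. For 0 ≤ y ≤ 1,

\[ f_Y(y) = \int_0^y 6(y-x)\,dx = \left. 6xy - 3x^2 \right|_0^y = 3y^2. \tag{10} \]

The complete expression for the marginal PDF of Y is

\[ f_Y(y) = \begin{cases} 3y^2 & 0 \le y \le 1, \\ 0 & \text{otherwise.} \end{cases} \tag{11} \]

(e) The blind estimate of Y is

\[ \hat{y}_B = E[Y] = \int_{-\infty}^{\infty} y f_Y(y)\,dy = \int_0^1 3y^3\,dy = \frac{3}{4}. \tag{12} \]

(f) First we calculate

\[ P[Y > 0.5] = \int_{0.5}^{\infty} f_Y(y)\,dy = \int_{0.5}^1 3y^2\,dy = \frac{7}{8}. \tag{13} \]

The conditional PDF of Y given Y > 0.5 is

\[ f_{Y|Y>0.5}(y) = \begin{cases} \dfrac{f_Y(y)}{P[Y>0.5]} & y > 0.5, \\ 0 & \text{otherwise,} \end{cases} = \begin{cases} 24y^2/7 & 0.5 < y \le 1, \\ 0 & \text{otherwise.} \end{cases} \tag{14} \]

The minimum mean square estimate of Y given Y > 0.5 is

\[ E[Y \mid Y > 0.5] = \int_{-\infty}^{\infty} y f_{Y|Y>0.5}(y)\,dy = \int_{0.5}^1 \frac{24y^3}{7}\,dy = \left. \frac{6y^4}{7} \right|_{0.5}^1 = \frac{45}{56}. \tag{15} \]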
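The blind and conditional estimates in Problem 9.1.2 all reduce to integrals of polynomials, so they can be checked with exact rational arithmetic. The Python sketch below is only a verification aid; the helper names `integrate_poly` and `conditional_mean` are hypothetical, and each PDF is represented by its polynomial coefficients in increasing order of power.

```python
from fractions import Fraction

def integrate_poly(coeffs, a, b):
    """Exact integral over [a, b] of sum(coeffs[i] * x**i)."""
    a, b = Fraction(a), Fraction(b)
    return sum(Fraction(c) * (b ** (i + 1) - a ** (i + 1)) / (i + 1)
               for i, c in enumerate(coeffs))

def conditional_mean(pdf_coeffs, a, b):
    """E[X | a < X < b] for a PDF that is a polynomial on the interval [a, b]."""
    prob = integrate_poly(pdf_coeffs, a, b)   # P[a < X < b]
    x_times_pdf = [0] + list(pdf_coeffs)      # coefficients of x * f(x)
    return integrate_poly(x_times_pdf, a, b) / prob

f_X = [3, -6, 3]   # 3(1 - x)^2 = 3 - 6x + 3x^2 on [0, 1]
f_Y = [0, 0, 3]    # 3y^2 on [0, 1]

half = Fraction(1, 2)
blind_x = conditional_mean(f_X, 0, 1)       # E[X]
blind_y = conditional_mean(f_Y, 0, 1)       # E[Y]
x_given = conditional_mean(f_X, 0, half)    # E[X | X < 0.5]
y_given = conditional_mean(f_Y, half, 1)    # E[Y | Y > 0.5]
```

Using `Fraction` avoids rounding, so the results can be compared directly with the fractions 1/4, 3/4, 11/56 and 45/56 derived above.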
Problem 9.1.3 Solution

(a) For 0 ≤ x ≤ 1,

\[ f_X(x) = \int_{-\infty}^{\infty} f_{X,Y}(x,y)\,dy = \int_x^1 2\,dy = 2(1-x). \tag{1} \]

The complete expression of the PDF of X is

\[ f_X(x) = \begin{cases} 2(1-x) & 0 \le x \le 1, \\ 0 & \text{otherwise.} \end{cases} \tag{2} \]

(b) The blind estimate of X is

\[ \hat{X}_B = E[X] = \int_0^1 2x(1-x)\,dx = \left. x^2 - \frac{2x^3}{3} \right|_0^1 = \frac{1}{3}. \tag{3} \]

(c) First we calculate

\[ P[X > 1/2] = \int_{1/2}^1 f_X(x)\,dx = \int_{1/2}^1 2(1-x)\,dx = \left. (2x - x^2) \right|_{1/2}^1 = \frac{1}{4}. \tag{4} \]

Now we calculate the conditional PDF of X given X > 1/2.

\[ f_{X|X>1/2}(x) = \begin{cases} \dfrac{f_X(x)}{P[X>1/2]} & x > 1/2, \\ 0 & \text{otherwise,} \end{cases} = \begin{cases} 8(1-x) & 1/2 < x \le 1, \\ 0 & \text{otherwise.} \end{cases} \tag{5} \]

The minimum mean square error estimate of X given X > 1/2 is

\[ E[X \mid X > 1/2] = \int_{-\infty}^{\infty} x f_{X|X>1/2}(x)\,dx \tag{6} \]
\[ = \int_{1/2}^1 8x(1-x)\,dx = \left. 4x^2 - \frac{8x^3}{3} \right|_{1/2}^1 = \frac{2}{3}. \tag{7} \]

(d) For 0 ≤ y ≤ 1, the marginal PDF of Y is

\[ f_Y(y) = \int_{-\infty}^{\infty} f_{X,Y}(x,y)\,dx = \int_0^y 2\,dx = 2y. \tag{8} \]

The complete expression for the marginal PDF of Y is

\[ f_Y(y) = \begin{cases} 2y & 0 \le y \le 1, \\ 0 & \text{otherwise.} \end{cases} \tag{9} \]

(e) The blind estimate of Y is

\[ \hat{y}_B = E[Y] = \int_0^1 2y^2\,dy = \frac{2}{3}. \tag{10} \]

(f) We already know that P[X > 1/2] = 1/4. However, this problem differs from the other problems in this section because we will estimate Y based on the observation of X. In this case, we need to calculate the conditional joint PDF

\[ f_{X,Y|X>1/2}(x,y) = \begin{cases} \dfrac{f_{X,Y}(x,y)}{P[X>1/2]} & x > 1/2, \\ 0 & \text{otherwise,} \end{cases} = \begin{cases} 8 & 1/2 < x \le y \le 1, \\ 0 & \text{otherwise.} \end{cases} \tag{11} \]

The MMSE estimate of Y given X > 1/2 is

\[ E[Y \mid X > 1/2] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} y f_{X,Y|X>1/2}(x,y)\,dx\,dy \tag{12} \]
\[ = \int_{1/2}^1 y \int_{1/2}^y 8\,dx\,dy \tag{13} \]
\[ = \int_{1/2}^1 y(8y - 4)\,dy = \frac{5}{6}. \tag{14} \]

Problem 9.1.4 Solution

The joint PDF of X and Y is

\[ f_{X,Y}(x,y) = \begin{cases} 6(y-x) & 0 \le x \le y \le 1, \\ 0 & \text{otherwise.} \end{cases} \tag{1} \]

(a) The conditional PDF of X given Y is found by dividing the joint PDF by the marginal with respect to Y. For y < 0 or y > 1, $f_Y(y) = 0$. For 0 ≤ y ≤ 1,

\[ f_Y(y) = \int_0^y 6(y-x)\,dx = \left. 6xy - 3x^2 \right|_0^y = 3y^2. \tag{2} \]

The complete expression for the marginal PDF of Y is

\[ f_Y(y) = \begin{cases} 3y^2 & 0 \le y \le 1, \\ 0 & \text{otherwise.} \end{cases} \tag{3} \]

Thus for 0 < y ≤ 1,

\[ f_{X|Y}(x|y) = \frac{f_{X,Y}(x,y)}{f_Y(y)} = \begin{cases} \dfrac{6(y-x)}{3y^2} & 0 \le x \le y, \\ 0 & \text{otherwise.} \end{cases} \tag{4} \]

(b) The minimum mean square estimator of X given Y = y is

\[ \hat{X}_M(y) = E[X \mid Y = y] = \int_{-\infty}^{\infty} x f_{X|Y}(x|y)\,dx \tag{5} \]
\[ = \int_0^y \frac{6x(y-x)}{3y^2}\,dx = \left. \frac{3x^2y - 2x^3}{3y^2} \right|_{x=0}^{x=y} = y/3. \tag{6} \]

(c) First we must find the marginal PDF for X. For 0 ≤ x ≤ 1,

\[ f_X(x) = \int_{-\infty}^{\infty} f_{X,Y}(x,y)\,dy = \int_x^1 6(y-x)\,dy = \left. 3y^2 - 6xy \right|_{y=x}^{y=1} \tag{7} \]
\[ = 3(1 - 2x + x^2) = 3(1-x)^2. \tag{8} \]

The conditional PDF of Y given X is

\[ f_{Y|X}(y|x) = \frac{f_{X,Y}(x,y)}{f_X(x)} = \begin{cases} \dfrac{2(y-x)}{1 - 2x + x^2} & x \le y \le 1, \\ 0 & \text{otherwise.} \end{cases} \tag{9} \]

(d) The minimum mean square estimator of Y given X is

\[ \hat{Y}_M(x) = E[Y \mid X = x] = \int_{-\infty}^{\infty} y f_{Y|X}(y|x)\,dy \tag{10} \]
\[ = \int_x^1 \frac{2y(y-x)}{1 - 2x + x^2}\,dy \tag{11} \]
\[ = \left. \frac{(2/3)y^3 - y^2x}{1 - 2x + x^2} \right|_{y=x}^{y=1} = \frac{2 - 3x + x^3}{3(1-x)^2}. \tag{12} \]

Perhaps surprisingly, this result can be simplified to

\[ \hat{Y}_M(x) = \frac{x}{3} + \frac{2}{3}. \tag{13} \]

Problem 9.1.5 Solution

(a) First we find the marginal PDF $f_Y(y)$. For 0 ≤ y ≤ 1,

[Figure: the region 0 ≤ x ≤ y ≤ 1 in the (x, y) plane, bounded by the lines x = 0 and x = y.]

\[ f_Y(y) = \int_{-\infty}^{\infty} f_{X,Y}(x,y)\,dx = \int_0^y 2\,dx = 2y. \tag{1} \]

Hence, for 0 ≤ y ≤ 1, the conditional PDF of X given Y is

\[ f_{X|Y}(x|y) = \frac{f_{X,Y}(x,y)}{f_Y(y)} = \begin{cases} 1/y & 0 \le x \le y, \\ 0 & \text{otherwise.} \end{cases} \tag{2} \]

(b) The optimum mean squared error estimate of X given Y = y is

\[ \hat{x}_M(y) = E[X \mid Y = y] = \int_{-\infty}^{\infty} x f_{X|Y}(x|y)\,dx = \int_0^y \frac{x}{y}\,dx = y/2. \tag{3} \]

(c) The MMSE estimator of X given Y is $\hat{X}_M(Y) = E[X|Y] = Y/2$. The mean squared error is

\[ e^*_{X,Y} = E\left[(X - \hat{X}_M(Y))^2\right] = E\left[(X - Y/2)^2\right] = E\left[X^2 - XY + Y^2/4\right]. \tag{4} \]
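The expectation $E[X^2 - XY + Y^2/4]$ for the estimator Y/2 can be checked exactly, since for the joint PDF equal to 2 on 0 ≤ x ≤ y ≤ 1 every mixed moment has a closed form: $E[X^aY^b] = \int_0^1\int_0^y 2x^ay^b\,dx\,dy = 2/((a+1)(a+b+2))$. The Python sketch below is a verification aid with a hypothetical helper name `moment`; it also cross-checks the result against $\mathrm{Var}[X](1-\rho^2_{X,Y})$, the mean square error of the best linear estimator.

```python
from fractions import Fraction

def moment(a, b):
    """E[X**a * Y**b] when f(x, y) = 2 on the triangle 0 <= x <= y <= 1."""
    return Fraction(2, (a + 1) * (a + b + 2))

# Mean square error of the estimator Y/2: E[X^2] - E[XY] + E[Y^2]/4.
mse = moment(2, 0) - moment(1, 1) + moment(0, 2) / 4

# Cross-check with Var[X] * (1 - rho^2), using rho^2 = Cov^2 / (Var[X] Var[Y]).
var_x = moment(2, 0) - moment(1, 0) ** 2
var_y = moment(0, 2) - moment(0, 1) ** 2
cov = moment(1, 1) - moment(1, 0) * moment(0, 1)
mse_linear = var_x * (1 - cov ** 2 / (var_x * var_y))
```

Both expressions evaluate to 1/24. They agree because the MMSE estimator Y/2 is itself linear in Y.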
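Of course, the integral must be evaluated.

\[ e^*_{X,Y} = \int_0^1 \int_0^y 2(x^2 - xy + y^2/4)\,dx\,dy \tag{5} \]
\[ = \int_0^1 \left. \left( 2x^3/3 - x^2y + xy^2/2 \right) \right|_{x=0}^{x=y}\,dy \tag{6} \]
\[ = \int_0^1 \frac{y^3}{6}\,dy = 1/24. \tag{7} \]

Another approach to finding the mean square error is to recognize that the MMSE estimator is a linear estimator and thus must be the optimal linear estimator. Hence, the mean square error of the optimal linear estimator given by Theorem 9.4 must equal $e^*_{X,Y}$. That is, $e^*_{X,Y} = \mathrm{Var}[X](1 - \rho^2_{X,Y})$. However, calculation of the correlation coefficient $\rho_{X,Y}$ is at least as much work as direct calculation of $e^*_{X,Y}$.

Problem 9.2.1 Solution

(a) The marginal PMFs of X and Y are listed below

\[ P_X(x) = \begin{cases} 1/3 & x = -1, 0, 1, \\ 0 & \text{otherwise,} \end{cases} \qquad P_Y(y) = \begin{cases} 1/4 & y = -3, -1, 1, 3, \\ 0 & \text{otherwise.} \end{cases} \tag{1} \]

(b) No, the random variables X and Y are not independent since

\[ P_{X,Y}(1,-3) = 0 \ne P_X(1)P_Y(-3). \tag{2} \]

(c) Direct evaluation leads to

\[ E[X] = 0, \qquad \mathrm{Var}[X] = 2/3, \tag{3} \]
\[ E[Y] = 0, \qquad \mathrm{Var}[Y] = 5. \tag{4} \]

This implies

\[ \mathrm{Cov}[X,Y] = E[XY] - E[X]E[Y] = E[XY] = 7/6. \tag{5} \]

(d) From Theorem 9.4, the optimal linear estimate of X given Y is

\[ \hat{X}_L(Y) = \rho_{X,Y}\frac{\sigma_X}{\sigma_Y}(Y - \mu_Y) + \mu_X = \frac{7}{30}Y + 0. \tag{6} \]

Therefore, a* = 7/30 and b* = 0.

(e) From the previous part, X and Y have correlation coefficient

\[ \rho_{X,Y} = \mathrm{Cov}[X,Y]/\sqrt{\mathrm{Var}[X]\mathrm{Var}[Y]} = \sqrt{49/120}. \tag{7} \]

From Theorem 9.4, the minimum mean square error of the optimum linear estimate is

\[ e^*_L = \sigma_X^2(1 - \rho^2_{X,Y}) = \frac{2}{3}\cdot\frac{71}{120} = \frac{71}{180}. \tag{8} \]

(f) The conditional probability mass function is

\[ P_{X|Y}(x|-3) = \frac{P_{X,Y}(x,-3)}{P_Y(-3)} = \begin{cases} \dfrac{1/6}{1/4} = 2/3 & x = -1, \\[2mm] \dfrac{1/12}{1/4} = 1/3 & x = 0, \\ 0 & \text{otherwise.} \end{cases} \tag{9} \]

(g) The minimum mean square estimator of X given that Y = −3 is

\[ \hat{x}_M(-3) = E[X \mid Y = -3] = \sum_x x P_{X|Y}(x|-3) = -2/3. \tag{10} \]

(h) The mean squared error of this estimator is

\[ \hat{e}_M(-3) = E\left[(X - \hat{x}_M(-3))^2 \mid Y = -3\right] \tag{11} \]
\[ = \sum_x (x + 2/3)^2 P_{X|Y}(x|-3) \tag{12} \]
\[ = (-1/3)^2(2/3) + (2/3)^2(1/3) = 2/9. \tag{13} \]

Problem 9.2.2 Solution

The problem statement tells us that

\[ f_V(v) = \begin{cases} 1/12 & -6 \le v \le 6, \\ 0 & \text{otherwise.} \end{cases} \tag{1} \]

Furthermore, we are also told that R = V + X where X is a Gaussian $(0, \sqrt{3})$ random variable.

(a) The expected value of R is the expected value of V plus the expected value of X. We already know that X has zero expected value, and that V is uniformly distributed between −6 and 6 volts and therefore also has zero expected value. So

\[ E[R] = E[V + X] = E[V] + E[X] = 0. \tag{2} \]

(b) Because X and V are independent random variables, the variance of R is the sum of the variance of V and the variance of X.

\[ \mathrm{Var}[R] = \mathrm{Var}[V] + \mathrm{Var}[X] = 12 + 3 = 15. \tag{3} \]

(c) Since E[R] = E[V] = 0,

\[ \mathrm{Cov}[V,R] = E[VR] = E[V(V+X)] = E[V^2] = \mathrm{Var}[V]. \tag{4} \]

(d) The correlation coefficient of V and R is

\[ \rho_{V,R} = \frac{\mathrm{Cov}[V,R]}{\sqrt{\mathrm{Var}[V]\mathrm{Var}[R]}} = \frac{\mathrm{Var}[V]}{\sqrt{\mathrm{Var}[V]\mathrm{Var}[R]}} = \frac{\sigma_V}{\sigma_R}. \tag{5} \]

The LMSE estimate of V given R is

\[ \hat{V}(R) = \rho_{V,R}\frac{\sigma_V}{\sigma_R}(R - E[R]) + E[V] = \frac{\sigma_V^2}{\sigma_R^2}R = \frac{12}{15}R. \tag{6} \]

Therefore a* = 12/15 = 4/5 and b* = 0.

(e) The minimum mean square error in the estimate is

\[ e^* = \mathrm{Var}[V](1 - \rho^2_{V,R}) = 12(1 - 12/15) = 12/5. \tag{7} \]

Problem 9.2.3 Solution

The solution to this problem is to simply calculate the various quantities required for the optimal linear estimator given by Theorem 9.4. First we calculate the necessary moments of X and Y.

\[ E[X] = -1(1/4) + 0(1/2) + 1(1/4) = 0 \tag{1} \]
\[ E[X^2] = (-1)^2(1/4) + 0^2(1/2) + 1^2(1/4) = 1/2 \tag{2} \]
\[ E[Y] = -1(17/48) + 0(17/48) + 1(14/48) = -1/16 \tag{3} \]
\[ E[Y^2] = (-1)^2(17/48) + 0^2(17/48) + 1^2(14/48) = 31/48 \tag{4} \]
\[ E[XY] = 3/16 - 0 - 0 + 1/8 = 5/16 \tag{5} \]

The variances and covariance are

\[ \mathrm{Var}[X] = E[X^2] - (E[X])^2 = 1/2 \tag{6} \]
\[ \mathrm{Var}[Y] = E[Y^2] - (E[Y])^2 = 493/768 \tag{7} \]
\[ \mathrm{Cov}[X,Y] = E[XY] - E[X]E[Y] = 5/16 \tag{8} \]
\[ \rho_{X,Y} = \frac{\mathrm{Cov}[X,Y]}{\sqrt{\mathrm{Var}[X]\mathrm{Var}[Y]}} = \frac{5\sqrt{6}}{\sqrt{493}} \tag{9} \]

By reversing the labels of X and Y in Theorem 9.4, we find that the optimal linear estimator of Y given X is

\[ \hat{Y}_L(X) = \rho_{X,Y}\frac{\sigma_Y}{\sigma_X}(X - E[X]) + E[Y] = \frac{5}{8}X - \frac{1}{16}. \tag{10} \]

The mean square estimation error is

\[ e^*_L = \mathrm{Var}[Y](1 - \rho^2_{X,Y}) = 343/768. \tag{11} \]

Problem 9.2.4 Solution

These four joint PMFs are actually related to each other. In particular, completing the row sums and column sums shows that each random variable has the same marginal PMF. That is,

\[ P_X(x) = P_Y(x) = P_U(x) = P_V(x) \tag{1} \]
\[ = P_S(x) = P_T(x) = P_Q(x) = P_R(x) = \begin{cases} 1/3 & x = -1, 0, 1, \\ 0 & \text{otherwise.} \end{cases} \tag{2} \]

This implies

\[ E[X] = E[Y] = E[U] = E[V] = E[S] = E[T] = E[Q] = E[R] = 0 \tag{3} \]

and that

\[ E[X^2] = E[Y^2] = E[U^2] = E[V^2] = E[S^2] = E[T^2] = E[Q^2] = E[R^2] = 2/3. \tag{4} \]

Since each random variable has zero expected value, the second moment equals the variance. Also, the standard deviation of each random variable is $\sqrt{2/3}$. These common properties will make it much easier to answer the questions.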
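The optimal linear estimators in Problems 9.2.1 and 9.2.3 depend only on first and second moments, so the arithmetic can be reproduced with exact fractions. The Python sketch below is a check under stated assumptions rather than part of the original solutions: `linear_estimator` is a hypothetical helper that uses the standard identities $a^* = \mathrm{Cov}[X,Y]/\mathrm{Var}[Y]$, $b^* = E[X] - a^*E[Y]$ and $e^*_L = \mathrm{Var}[X] - \mathrm{Cov}[X,Y]^2/\mathrm{Var}[Y]$, which are equivalent to the $\rho_{X,Y}$ form used in the text.

```python
from fractions import Fraction as F

def linear_estimator(EX, EX2, EY, EY2, EXY):
    """Best linear estimate of X from Y, X_hat = a*Y + b, and its mean square error.

    Inputs are the moments E[X], E[X^2], E[Y], E[Y^2] and E[XY].
    """
    var_x = EX2 - EX ** 2
    var_y = EY2 - EY ** 2
    cov = EXY - EX * EY
    a = cov / var_y                    # equals rho * sigma_X / sigma_Y
    b = EX - a * EY
    mse = var_x - cov ** 2 / var_y     # equals Var[X] * (1 - rho^2)
    return a, b, mse

# Problem 9.2.1: estimate X from Y.
est_921 = linear_estimator(F(0), F(2, 3), F(0), F(5), F(7, 6))

# Problem 9.2.3: estimate Y from X, so the roles of the two variables are swapped.
est_923 = linear_estimator(F(-1, 16), F(31, 48), F(0), F(1, 2), F(5, 16))
```

The first call returns the slope 7/30, intercept 0 and error 71/180; the second returns 5/8, −1/16 and 343/768, matching the values obtained above.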
The other correlation coefficients are ρS. ˆ XL (Y ) = ρX. V ) = Var[U ](1 − ρ2 ) = 0 L U. The only difficulty is in computing E[X].R (15) (16) (17) (18) (12) (13) (14) Problem 9. V ] = E [U V ] = (1/3)(−1) + (1/3)(−1) = −2/3 (6) (7) (8) Cov [S. and ρX.Y (x. Cov [U.V V = −V σV (11) Similarly for the other pairs. R) e∗ (U.2. Since X and Y are independent.T = Cov [S.T T = 0 ˆ QL (R) = ρQ. we use Theorem 9. Var[Y ]. since the marginal PMF’s are the same.Y = 0.V = √ Cov [U. all expected values are zero and the ratio of the standard deviations is always 1. T ] = E [ST ] = 1/6 − 1/6 + 0 + −1/6 + 1/6 = 0 Cov [Q. we must compute the covariances.4. Var[X].Y . E[Y ].R = Cov [Q. For the other pairs.R R = −R/4 From Theorem 9.V (V − E [V ]) + E [U ] = ρU.(a) Random variables X and Y are independent since for all x and y. Hence. ρX. the mean square errors are e∗ (X.Y e∗ (S. R] = E [QR] = 1/12 − 1/6 − 1/6 + 1/12 = −1/6 The correlation coefficient of U and V is ρU. PX. First we calculate the marginal PDFs fX (x) = x 1 2(y + x) dy = y 2 + 2xy 2(y + x) dx = 2xy + x2 329 y=1 y=x x=y x=0 = 1 + 2x − 3x2 = 3y 2 (1) (2) fY (y) = 0 y . the least mean square linear estimator of U given V is σU ˆ UL (V ) = ρU. y) = PX (x) PY (y) (5) Since each other pair of random variables has the same marginal PMFs as X and Y but a different joint PMF. 2.4 to combine these quantities in the optimal linear estimator. we calculate fY (y) = 0 y 6(y − x) dx = 6xy − 3x2 6(y − x) dy = y 0 = 3y 2 (0 ≤ y ≤ 1) (2) (3) fX (x) = x 1 3(1 + −2x + x2 ) 0 ≤ x ≤ 1.6 Solution The linear mean square estimator of X given Y is ˆ XL (Y ) = E [XY ] − µX µY Var[Y ] (Y − µY ) + µX . 1 0 The moments of X and Y are E [Y ] = 0 1 3y 3 dy = 3/4 3y 4 dy = 3/5 E [X] = E X2 = 0 3x(1 − 2x + x2 ) dx = 1/4 3x2 (1 + −2x + x2 ) dx = 1/10 (4) (5) E Y2 = 0 1 1 330 . 0 otherwise. 
E Y2 = 3y 4 dy = 3/5.The first and second moments of X are E [X] = 0 1 (x + 2x2 − 3x3 ) dx = x2 /2 + 2x3 /3 − 3x4 /4 (x2 + 2x3 − 3x4 ) dx = x3 /3 + x4 /2 − 3x5 /5 1 0 1 0 1 0 = 5/12 = 7/30 (3) (4) E X2 = 0 1 The first and second moments of Y are E [Y ] = 0 1 3y 3 dy = 3/4. =√ 9 4 12 9 129 (10) (11) Problem 9. the correlation coefficient is ρX. (5) Thus. X and Y each have variance Var[X] = E X 2 − (E [X])2 = 1 0 0 1 0 y 129 2160 Var[Y ] = E Y 2 − (E [Y ])2 = 3 80 (6) To calculate the correlation coefficient. we use Theorem 9. σX ˆ XL (Y ) = ρX. (1) To find the parameters of this estimator.Y = Cov [X. Y ] Var[X] Var[Y ] E [XY ] − E [X] E [Y ] Var[X] Var[Y ] =√ (9) Finally.Y (Y − E [Y ]) + E [X] σY √ 3 5 5 5 129 Y − + = Y. we first must calculate the correlation E [XY ] = = 2xy(x + y) dx dy x=y x=0 (7) dy = 0 1 2x3 y/3 + x2 y 2 5y 4 dy = 1/3 3 5 129 (8) Hence. fX. fX. y) λe−λ(x−y) x ≥ y = fX|Y (x|y) = (6) 0 otherwise fY (y) Now we are in a position to find the minimum mean square estimate of X given Y .2. Thus the minimum mean square estimate of Y given X is ˆ YM (X) = E [Y |X] = X/2 (3) (b) The minimum mean square estimate of X given Y can be found by finding the conditional probability density function of X given Y . Y is uniform on [0.Y (x.7 Solution We are told that random variable X has a second order Erlang distribution fX (x) = λxe−λx x ≥ 0 0 otherwise (1) We also know that given X = x. y) = fY |X (y|x) · fX (x) = Now we can find the marginal of Y fY (y) = y ∞ λe−λx 0 ≤ y ≤ x 0 otherwise (4) λe−λx dx = e−λy y ≥ 0 0 otherwise (5) By dividing the joint density by the marginal density of Y we arrive at the conditional density of X given Y . Hence E[Y |X = x] = x/2.Y (x. First we find the joint density function. x]. x] so that fY |X (y|x) = 1/x 0 ≤ y ≤ x 0 otherwise (2) (a) Given X = x. 
the conditional expected value of X is E [X|Y = y] = y ∞ λxe−λ(x−y) dx (7) Making the substitution u = x − y yields E [X|Y = y] = 0 ∞ λ(u + y)e−λu du (8) 331 .The correlation between X and Y is E [XY ] = 6 0 1 x 1 xy(y − x) dy dx = 1/5 (6) Putting these pieces together. Given Y = y. random variable Y is uniform on [0. the optimal linear estimate of X given Y is ˆ XL (Y ) = 1/5 − 3/16 3/5 − (3/4)2 Y − 3 4 + 1 Y = 4 3 (7) Problem 9. ˆ (d) Since the MMSE estimate of X given Y is the linear estimate XM (Y ) = Y + 1/λ. That is. Problem 9.We observe that if U is an exponential random variable with parameter λ. we need the conditional PDF ˆ fR|X (r|x) = The marginal PDF of X is fX (x) = 0 ∞ fX|R (x|r) fR (r) . This implies 1 1 Var [R|X = x] = (6) E [R|X = x] = x+1 (x + 1)2 Hence. YL (X) = X/2. fX|R (x|r) = re−rx x ≥ 0. the conditional PDF of R is an Erlang PDF with parameters n = 1 and λ = x + 1. then E [X|Y = y] = E [U + y] = 1 +y λ (9) The minimum mean square error estimate of X given Y is 1 ˆ XM (Y ) = E [X|Y ] = + Y λ (10) ˆ (c) Since the MMSE estimate of Y given X is the linear estimate YM (X) = X/2. the MMSE estimator of R given X is rM (X) = E [R|X] = ˆ 332 1 X +1 (7) . fR|X (r|x) to the Erlang PDF shown in Appendix A. (1) Note that fX. the optimal ˆ linear estimate of X given Y must also be the MMSE estimate. we learn the following facts: e−r r ≥ 0. 0 otherwise. 0 otherwise. Thus v = −e−(x+1)r /(x + 1) and fX (x) = −r −(x+1)r e x+1 ∞ 0 v du by choosing u = r and dv = −1 e−(x+1)r (x + 1)2 ∞ 0 + 1 x+1 ∞ e−(x+1)r dr = = 0 1 (x + 1)2 (4) Now we can find the conditional PDF of R given X. we see that given X = x. fR|X (r|x) = fX|R (x|r) fR (r) = (x + 1)2 re−(x+1)r fX (x) (5) By comparing. for the remainder of the problem. we assume both X and R are non-negative and we omit the usual “zero otherwise” considerations. Hence.2.8 Solution fR (r) = From the problem statement. the optimal ˆ linear estimate of Y given X must also be the MMSE estimate. 
fX (x) ∞ 0 (2) fX|R (x|r) fR (r) dr = re−(x+1)r dr (3) We use the integration by parts formula u dv = uv − e−(x+1)r dr. XL (Y ) = Y + 1/λ. r) > 0 for all non-negative X and R. (a) To find rM (X).R (x. That is. 9 Solution (a) As a function of a. Note that the typical way that b = 0 occurs when E[X] = E[Y ] = 0.Y ˆ where b = ρX. the LMSE estimate of X given R doesn’t exist. we know that given R = r. (d) Just as in part (c). the mean squared error is e∗ = E X 2 − (E [XY ])2 E [Y 2 ] (3) E [XY ] E [Y 2 ] (1) (2) (c) We can write the LMSE estimator given in Theorem 9. Another way of writing this statement is xM (R) = E [X|R] = 1/R. because E[X] doesn’t exist. and correlation coefficent can also yield b = 0. ˆ (8) (c) Note that the expected value of X is E [X] = 0 ∞ xfX (x) dx = 0 ∞ x dx = ∞ (x + 1)2 (9) Because E[X] doesn’t exist. 333 . variances. E[X|R = r] = 1/r. However. it is possible that the right combination of expected values.2.Y σX Y −b σY (4) σX E [Y ] − E [X] σY (5) ˆ When b = 0. From the initial problem statement.4 in the form xL (()Y ) = ρX. X is exponential with expectred value 1/r. That is. Problem 9.(b) The MMSE estimate of X given R = r is E[X|R = r]. the LMSE estimate of R given X doesn’t exist. X(Y ) is the LMSE estimate. the mean squared error is e = E (aY − X)2 = a2 E Y 2 − 2aE [XY ] + E X 2 Setting de/da|a=a∗ = 0 yields a∗ = (b) Using a = a∗ . From the solution to Quiz 9. Since e−rT is a common factor in the maximization. We observe that g(n) = rT g(n − 1) n (5) this implies that for n ≤ rT . ˆ NM (r) = E [N |R = r] = rT (3) (b) The maximum a posteriori estimate of N given R is simply the value of n that will maximize PN |R (n|r). ˆ 334 .Problem 9. PN |R (n|r) = 0 (rT )n e−rT n! n = 0. In this case.2 Solution From the problem statement we know that R is an exponential random variable with expected value 1/µ. ˆ we can define g(n) = (rT )n /n! so that nM AP = arg maxn g(n). we set a derivative to zero to solve for the maximizing value. 
Hence the maximizing value of n is the largest n such that n ≤ rT . 1000]. otherwise. In this case. nM AP (r) = arg max PN |R (n|r) = arg max(rT )n e−rT /n! ˆ n≥0 n≥0 (4) Usually.3. Since R has a uniform PDF over [0. the number of phone calls arriving at a telephone switch. otherwise (2) (a) The minimum mean square error estimate of N given R is the conditional expected value of N given R = r. This is given directly in the problem statement as r. the maximizing value is r = 1000 m.3 unless the maximizing r exceeds 1000 m.6.3. r) = fX|R (x|r) fR (r) = 0 2 √1 e−(x+40+40 log10 r) /128 r0 128π (1) 0 ≤ r ≤ r0 . the resulting ML estimator is rML (x) = ˆ 1000 x < −160 (0. the maximizing value of r is the same as for the ML estimate in Quiz 9. (2) From Theorem 9.1)10−x/40 x ≥ −160 (4) Problem 9. . fR (r) = µe−µr r ≥ 0 0 otherwise (1) It is also known that. So we can write the following conditional probability mass function of N given R. N . 1. Therefore it has the following probability density function.1 Solution In this case. That is. g(n) ≥ g(n − 1). the MAP estimate of R given X = x is the value of r that maximizes fX|R (x|r)fR (r). that technique doesn’t work because n is discrete.R (x. nM AP = rT . That is.3. the joint PDF of X and R is fX. rMAP (x) = arg max fX|R (x|r) fR (r) = arg ˆ 0≤r 0≤r≤1000 max fX|R (x|r) (3) Hence. is a Poisson (α = rT ) random variable. . given R = r. . Hence. we have fR|N (r|n) = PN |R (n|r) fR (r) PN (n) (9) To find the value of n that maximizes fR|N (r|n). the conditional PDF of R given N . the conditional PDF of R given N is PN |R (n|r) fR (r) = fR|N (r|n) = PN (n) (rT )n e−rT µe−µr n! n µT (µ+T )n+1 (15) (16) = (µ + T ) The ML estimate of N given R is [(µ + T )r]n e−(µ+T )r n! nM L (r) = arg max fR|N (r|n) = arg max(µ + T ) ˆ n≥0 n≥0 [(µ + T )r]n e−(µ+T )r n! (17) This maximization is exactly the same as in the previous part except rT is replaced by (µ + T )r. 
There are several ways to derive the nth moment of an exponential random variable including integration by parts. the MGF is used to show that E[X n ] = n!/(µ + T )n . PN (n) = = ∞ −∞ ∞ PN |R (n|r) fR (r) dr (10) (11) (12) (13) (rT )n e−rT −µr µe dr n! 0 ∞ µT n = rn (µ + T )e−(µ+T )r dr n!(µ + T ) 0 µT n E [X n ] = n!(µ + T ) where X is an exponential random variable with expected value 1/(µ + T ). In Example 6. ˆ 335 . The maximizing value of n is nM L = (µ + T )r . fR|N (r|n) dr = P [r < R ≤ r + dr|N = n] = (6) (7) (8) P [r < R ≤ r + dr.(c) The maximum likelihood estimate of N given R selects the value of n that maximizes fR|N =n (r). for n ≥ 0. we need to find the denominator PN (n). In this case.5. When dealing with situations in which we mix continuous and discrete random variables. N = n] P [N = n] P [N = n|R = r] P [r < R ≤ r + dr] = P [N = n] In terms of PDFs and PMFs. its often helpful to start from first principles. PN (n) = µT n (µ + T )n+1 (14) Finally. the conditional PDF of R given N is PN |R (n|r) fR (r) fR|N (r|n) = = PN (n) (rT )n e−rT µe−µr n! µT n (µ+T )n+1 µT n (µ + T )n+1 (7) = (µ + T )n+1 rn e−(µ+T )r n! (8) (a) The MMSE estimate of R given N = n is the conditional expected value E[R|N = n].Problem 9.3 Solution Both parts (a) and (b) rely on the conditional PDF of R given N = n. PN (n) = = ∞ −∞ ∞ PN |R (n|r) fR (r) dr (4) (5) (6) (rT )n e−rT −µr µe dr n! 0 ∞ µT n µT n = E [X n ] rn (µ + T )e−(µ+T )r dr = n!(µ + T ) 0 n!(µ + T ) where X is an exponential random variable with expected value 1/(µ + T ). N = n] P [N = n] P [N = n|R = r] P [r < R ≤ r + dr] = P [N = n] (1) (2) In terms of PDFs and PMFs. Hence. we obtain the MAP estimate ˆ RMAP (n) = 336 (11) . In Example 6. There are several ways to derive the nth moment of an exponential random variable including integration by parts. we find that E[R|N = n] = (n + 1)/(µ + T ). we need to find the denominator PN (n). 
When dealing with situations in which we mix continuous and discrete random variables.3. the MGF is used to show that E[X n ] = n!/(µ + T )n . the conditional PDF oF R is that of an Erlang random variable or order n + 1. its often helpful to start from first principles. Given N = n. PN (n) = Finally. we have fR|N (r|n) = PN |R (n|r) fR (r) PN (n) (3) To find the value of n that maximizes fR|N (r|n).5. fR|N (r|n) dr = P [r < R ≤ r + dr|N = n] = P [r < R ≤ r + dr. The MMSE estimate of R given N is N +1 ˆ (9) RM (N ) = E [R|N ] = µ+T (b) The MAP estimate of R given N = n is the value of r that maximizes fR|N (r|n). (µ + T ) ˆ RMAP (n) = arg max fR|N (r|n) = arg max r≥0 r≥0 n! n µ+T n+1 rn e−(µ+T )r (10) By setting the derivative with respect to r to zero. for n ≥ 0. From Appendix A. 1. That is. 1. n 0 otherwise (7) PK|Q (k|q) fQ (q) = PK (k) 0 (n+1)! k k!(n−k)! q (1 − q)n−k 0 ≤ q ≤ 1 otherwise (8) (d) The MMSE estimate of Q given K = k is the conditional expectation E[Q|K = k]. the conditional PMF of K is PK|Q (k|q) = q k (1 − q)n−k k = 0. PK (k) = ∞ −∞ 1 0 (3) T‘¡he maximizing value is q = k/n so that (4) PK|Q (k|q) fQ (q) dq = n k q (1 − q)n−k dq k (5) We can evaluate this integral by expressing it in terms of the integral of a beta PDF. . . The MMSE estimator is K +1 ˆ (9) QM (K) = E [Q|K] = n+2 337 That is. . K has the uniform PMF PK (k) = (6) PK (k) = Hence.7.(c) The ML estimate of R given N = n is the value of R that maximizes PN |R (n|r). . n − k + 1) PDF. we can write 1 1 1 β(k + 1. Since (n+1)! β(k + 1. (c) The conditional PDF of Q given K is fQ|K (q|k) = 1/(n + 1) k = 0. n 0 otherwise n k (1) The ML estimate of Q given K = k is qML (k) = arg max PQ|K (q|k) ˆ 0≤q≤1 (2) Differentiating PQ|K (q|k) with respect to q and setting equal to zero yields dPQ|K (q|k) = dq n k kq k−1 (1 − q)n−k − (n − k)q k (1 − q)n−k−1 = 0 K ˆ QML (K) = n (b) To find the PMF of K. . . .3. n − k + 1)q k (1 − q)n−k dq = n+1 0 n+1 That is. Q has a beta (k + 1. 
(rT ) e ˆ RML (n) = arg max r≥0 n! Seting the derivative with respect to r to zero yields ˆ RML (n) = n/T n −rT (12) (13) Problem 9. n − k + 1) = k!(n−k)! .4 Solution This problem is closely related to Example 9. E[Q|K = k] = (k + 1)/(n + 2). we average over all q. (a) Given Q = q. From the beta PDF described in Appendix A. given K = k. . . E[K] = n/2. 4. ˆ a = R−1 RYX1 = Y 14/13 −12/13 −12/13 14/13 1 19 7/4 . the mean squared error of the optimal estimator is ˆ e∗ = Var[X1 ] − a RYX1 L = RX (1. In any case. 1) 1 E X1 1 1 0  7/4 3/4 = RYX1 = A E [X2 X1 ] = A RX (2. Y1 ] .4. the optimum linear estimate ˆ is X1 (Y1 ) = a∗ Y1 + b∗ where a∗ = Cov [X1 .7(c). 26 26 (b) By Theorem 9. 1) Finally. 1/2 3/4 1  and Y = Y1 Y2 1 1 0 X. we see that µX1 = E [X1 ] = 0. 0 1 1 5/4 1/2 E [X3 X1 ] RX (3. 338 (10) .4 is just a special case of Theorem 9. The rest of the Y solution is just calculation.       2 RX (1. = 5/4 26 −7 (3) (4) Thus the linear MMSE estimator of X1 given Y is 19 7 ˆ ˆ X1 (Y) = a Y = Y1 − Y2 = 0.7(a) which states that the minimum ˆ ˆ ˆ mean square error estimate of X1 is X1 (Y) = a Y where a = R−1 RYX1 . its convenient to use Matlab with format rat mode to perform the calculations and display the results as nice fractions. Note that Theorem 9. we learn for vectors X = X1 X2 X3 E [X] = 0.13. (We note that even in the case of a 3 × 3 matrix. 1) = . (9) Since Y1 = X1 + X2 . µY1 = E [Y1 ] = E [X1 ] + E [X2 ] = 0.7308Y1 − 0.1 Solution From the problem statement. Var[Y1 ] b∗ = µX1 − a∗ µY1 . 1) − = 1 − 7/4 5/4 (5) (6) (7) 3 7/4 = 5/4 52 (8) RYX1 R−1 RYX1 Y 14/13 −12/13 −12/13 14/13 (c) We can estimate random variable X1 based on the observation of random variable Y1 using Theorem 9. (2) RY = ARX A = 0 1 1 3 7/2 1/2 3/4 1 0 1 In addition. from Theorem 9.    1 3/4 1/2 1 0 1 1 0  7/2 3 3/4 1 3/4 1 1 = . since RYX1 = E[YX1 ] = E[AXX1 ] = AE[XX1 ].Problem 9.4.8 in which the observation is a random vector.) From Theorem 5. 
0 1 1 that (1) Y = AX = (a) Since E[Y] = AE[X] = 0. we can apply Theorem 9.  1 3/4 1/2 RX = 3/4 1 3/4 .2692Y2 . 7(a) which states that the minimum ˆ ˆ ˆ mean square error estimate of X1 is X1 (Y) = a Y where a = R−1 RYX1 . Y1 ] 7/4 = = σX1 σY1 7/2 7 . 1/2 3/4 1 In addition. 1) + RX (1. Y1 ] = 7/4. Var[Y1 ] 7/2 2 Thus the optimum linear estimate of X1 given Y1 is 1 ˆ X1 (Y1 ) = Y1 .2 Solution From the problem statement. RX = 3/4 1 3/4 . 1) = 1 and σY1 = RY (1.6 0 1 1 1/2 3/4 1 0 1 339 (6) (7) (8) (3) (4) (5) . Var[Y1 ] = E Thus a∗ = Y12 = RY (1. 8 (17) Thus e∗ = 1 − ( 7/8)2 = 1/8. 1) = 7/2. b∗ = µX1 − a∗ µY1 = 0. First we find RY .1 3 3. Y1 ] = = .4(b). Problem 9. 2 (15) From Theorem 9.1 that (1) (2) (a) Since E[Y] = AE[X] = 0.1 0 3.These facts. since Cov[X1 . Y= 1 1 0 Y1 = AX + W = X + W. As we would expect.4. E[WX ] = 0 and E[XW ] = 0. we learn for vectors X = X1 X2 X3   1 3/4 1/2 E [X] = 0.Y1 = Cov [X1 . Y2 0 1 1 and W = W1 W2 RW = 0. Y RY = E YY = E (AX + W)(AX + W) = E (AX + W)(X A + W ) = E AXX A + E WX A + E AXW + E WW Since X and W are independent. we can apply Theorem 9. the mean square error of this estimator is 2 e∗ = σX1 (1 − ρ2 1 . Note that 1/8 > 3/52. This implies RY = AE XX A + E WW = ARX A + RW    1 3/4 1/2 1 0 0.Y1 ) L X (16) 2 2 Since X1 and Y1 have zero expected value. imply Cov [X1 . = 0 0. we see that ρX1 .6 3 1 1 0  3/4 1 3/4 1 1 + = .1 0 0 0. along with RX and RY from part (a). Also. the estimate of L X1 based on just Y1 has larger mean square error than the estimate based on both Y1 and Y2 . Y1 ] = E [X1 Y1 ] = E [X1 (X1 + X2 )] = RX (1. 1) = 7/2 (11) (12) (13) (14) 7/4 1 Cov [X1 . σX1 = RX (1. E [W] = 0. 2) = 7/4. In any case. using RY from part (a). in the noiseless case.6439Y1 − 0. 132 132 (b) By Theorem 9. we see that Var[Y1 ] = E Y12 = RY (1.4.1. 3/52 = 0. b∗ = µX1 − a∗ µY1 . (16) Var[Y1 ] Since E[Xi ] = µXi = 0 and Y1 = X1 + X2 + W1 . = 5/4 132 −25 (10) (9) Putting these facts together. 
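As a cross-check on the matrix arithmetic up to this point, the following Matlab sketch recomputes the Problem 9.4.1 estimator and the noisy correlation matrix of Problem 9.4.2. It is only an illustration: the variable names (RY0, a0, e0, RY1) are choices made here and are not taken from matcode.

```matlab
% Values copied from the statements of Problems 9.4.1 and 9.4.2.
RX = [1 3/4 1/2; 3/4 1 3/4; 1/2 3/4 1];  % correlation matrix of X
A  = [1 1 0; 0 1 1];                     % observation matrix, Y = A*X (+ W)
RW = 0.1*eye(2);                         % noise correlation matrix (Problem 9.4.2)

RYX1 = A*RX(:,1);                        % cross-correlation E[Y*X1] = [7/4; 5/4]

% Noiseless case (Problem 9.4.1)
RY0 = A*RX*A';                           % [7/2 3; 3 7/2]
a0  = RY0\RYX1;                          % optimal coefficients, [19; -7]/26
e0  = RX(1,1) - a0'*RYX1;                % mean square error, 3/52

% Noisy case (Problem 9.4.2): only the correlation matrix changes
RY1 = RY0 + RW;                          % [3.6 3; 3 3.6]

format rat                               % show results as fractions
disp(RY0); disp(a0'); disp(e0);
format short
disp(RY1);
```

Using the backslash operator instead of an explicit inverse is a numerical convenience; for a 2 x 2 matrix either choice gives the same fractions.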
Once again, independence of $\mathbf{W}$ and $X_1$ yields
\[ \mathbf{R}_{YX_1} = E[\mathbf{Y}X_1] = E[(\mathbf{AX}+\mathbf{W})X_1] = \mathbf{A}E[\mathbf{X}X_1]. \]
This implies
\[ \mathbf{R}_{YX_1} = \mathbf{A}\begin{bmatrix} E[X_1^2] \\ E[X_2X_1] \\ E[X_3X_1] \end{bmatrix} = \mathbf{A}\begin{bmatrix} R_X(1,1) \\ R_X(2,1) \\ R_X(3,1) \end{bmatrix} = \begin{bmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \end{bmatrix}\begin{bmatrix} 1 \\ 3/4 \\ 1/2 \end{bmatrix} = \begin{bmatrix} 7/4 \\ 5/4 \end{bmatrix}. \]
Putting these facts together, we find that
\[ \hat{\mathbf{a}} = \mathbf{R}_Y^{-1}\mathbf{R}_{YX_1} = \begin{bmatrix} 10/11 & -25/33 \\ -25/33 & 10/11 \end{bmatrix}\begin{bmatrix} 7/4 \\ 5/4 \end{bmatrix} = \frac{1}{132}\begin{bmatrix} 85 \\ -25 \end{bmatrix}. \]
Thus the linear MMSE estimator of $X_1$ given $\mathbf{Y}$ is
\[ \hat{X}_1(\mathbf{Y}) = \hat{\mathbf{a}}'\mathbf{Y} = \frac{85}{132}Y_1 - \frac{25}{132}Y_2 = 0.6439Y_1 - 0.1894Y_2. \]

(b) By Theorem 9.7(c), the mean squared error of the optimal estimator is
\[ e_L^* = \mathrm{Var}[X_1] - \hat{\mathbf{a}}'\mathbf{R}_{YX_1} = R_X(1,1) - \mathbf{R}_{YX_1}'\mathbf{R}_Y^{-1}\mathbf{R}_{YX_1} = 1 - \begin{bmatrix} 7/4 & 5/4 \end{bmatrix}\begin{bmatrix} 10/11 & -25/33 \\ -25/33 & 10/11 \end{bmatrix}\begin{bmatrix} 7/4 \\ 5/4 \end{bmatrix} = \frac{29}{264} = 0.1098. \]
In Problem 9.4.1, we solved essentially the same problem but the observations $\mathbf{Y}$ were not subjected to the additive noise $\mathbf{W}$. In comparing the estimators, we see that the additive noise perturbs the estimator somewhat but not dramatically because the correlation structure of $\mathbf{X}$ and the mapping $\mathbf{A}$ from $\mathbf{X}$ to $\mathbf{Y}$ remains unchanged. On the other hand, in the noiseless case, the resulting mean square error was about half as much, $3/52 = 0.0577$ versus $0.1098$.

(c) We can estimate random variable $X_1$ based on the observation of random variable $Y_1$ using Theorem 9.4. Note that Theorem 9.4 is a special case of Theorem 9.8 in which the observation is a random vector. In any case, from Theorem 9.4, the optimum linear estimate is $\hat{X}_1(Y_1) = a^*Y_1 + b^*$ where
\[ a^* = \frac{\mathrm{Cov}[X_1,Y_1]}{\mathrm{Var}[Y_1]}, \qquad b^* = \mu_{X_1} - a^*\mu_{Y_1}. \]
Since $E[X_i] = \mu_{X_i} = 0$ and $Y_1 = X_1 + X_2 + W_1$, we see that
\[ \mu_{Y_1} = E[Y_1] = E[X_1] + E[X_2] + E[W_1] = 0. \]
These facts, along with independence of $X_1$ and $W_1$, imply
\[ \mathrm{Cov}[X_1,Y_1] = E[X_1Y_1] = E[X_1(X_1+X_2+W_1)] = R_X(1,1) + R_X(1,2) = 7/4. \]
In addition, using $\mathbf{R}_Y$ from part (a), we see that
\[ \mathrm{Var}[Y_1] = E[Y_1^2] = R_Y(1,1) = 3.6. \]
Thus
\[ a^* = \frac{\mathrm{Cov}[X_1,Y_1]}{\mathrm{Var}[Y_1]} = \frac{7/4}{3.6} = \frac{35}{72}, \qquad b^* = \mu_{X_1} - a^*\mu_{Y_1} = 0. \]
Thus the optimum linear estimate of $X_1$ given $Y_1$ is
\[ \hat{X}_1(Y_1) = \frac{35}{72}Y_1. \]
From Theorem 9.4(b), the mean square error of this estimator is
\[ e_L^* = \sigma_{X_1}^2(1-\rho_{X_1,Y_1}^2). \]
Since $X_1$ and $Y_1$ have zero expected value, $\sigma_{X_1}^2 = R_X(1,1) = 1$ and $\sigma_{Y_1}^2 = R_Y(1,1) = 3.6$. Also, since $\mathrm{Cov}[X_1,Y_1] = 7/4$, we see that
\[ \rho_{X_1,Y_1} = \frac{\mathrm{Cov}[X_1,Y_1]}{\sigma_{X_1}\sigma_{Y_1}} = \frac{7/4}{\sqrt{3.6}} = \frac{\sqrt{490}}{24}. \]
Thus $e_L^* = 1 - (490/24^2) = 0.1493$. As we would expect, the estimate of $X_1$ based on just $Y_1$ has larger mean square error than the estimate based on both $Y_1$ and $Y_2$.

Problem 9.4.3 Solution

From the problem statement, we learn for vectors $\mathbf{X} = [X_1\ X_2\ X_3]'$ and $\mathbf{W} = [W_1\ W_2]'$ that
\[ E[\mathbf{X}] = \begin{bmatrix} -0.1 \\ 0 \\ 0.1 \end{bmatrix}, \qquad \mathbf{R}_X = \begin{bmatrix} 1 & 3/4 & 1/2 \\ 3/4 & 1 & 3/4 \\ 1/2 & 3/4 & 1 \end{bmatrix}, \qquad E[\mathbf{W}] = \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \qquad \mathbf{R}_W = \begin{bmatrix} 0.1 & 0 \\ 0 & 0.1 \end{bmatrix}. \]
In addition, from Theorem 5.12,
\[ \mathbf{C}_X = \mathbf{R}_X - \boldsymbol{\mu}_X\boldsymbol{\mu}_X' = \begin{bmatrix} 1 & 3/4 & 1/2 \\ 3/4 & 1 & 3/4 \\ 1/2 & 3/4 & 1 \end{bmatrix} - \begin{bmatrix} -0.1 \\ 0 \\ 0.1 \end{bmatrix}\begin{bmatrix} -0.1 & 0 & 0.1 \end{bmatrix} = \begin{bmatrix} 0.99 & 0.75 & 0.51 \\ 0.75 & 1.0 & 0.75 \\ 0.51 & 0.75 & 0.99 \end{bmatrix}, \]
and
\[ \mathbf{Y} = \begin{bmatrix} Y_1 \\ Y_2 \end{bmatrix} = \mathbf{AX} + \mathbf{W} = \begin{bmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \end{bmatrix}\mathbf{X} + \mathbf{W}. \]

(a) Since $E[\mathbf{X}]$ is nonzero, we use Theorem 9.8 which states that the minimum mean square error estimate of $X_1$ is $\hat{X}_1(\mathbf{Y}) = \hat{\mathbf{a}}'\mathbf{Y} + \hat{b}$ where $\hat{\mathbf{a}} = \mathbf{C}_Y^{-1}\mathbf{C}_{YX_1}$ and $\hat{b} = E[X_1] - \hat{\mathbf{a}}'E[\mathbf{Y}]$. First we find $\mathbf{C}_Y$. Since $E[\mathbf{Y}] = \mathbf{A}E[\mathbf{X}] + E[\mathbf{W}] = \mathbf{A}E[\mathbf{X}]$,
\[ \mathbf{C}_Y = E[(\mathbf{Y} - E[\mathbf{Y}])(\mathbf{Y} - E[\mathbf{Y}])'] \]
\[ = E[(\mathbf{A}(\mathbf{X} - E[\mathbf{X}]) + \mathbf{W})(\mathbf{A}(\mathbf{X} - E[\mathbf{X}]) + \mathbf{W})'] \]
\[ = E[(\mathbf{A}(\mathbf{X} - E[\mathbf{X}]) + \mathbf{W})((\mathbf{X} - E[\mathbf{X}])'\mathbf{A}' + \mathbf{W}')] \]
\[ = \mathbf{A}E[(\mathbf{X} - E[\mathbf{X}])(\mathbf{X} - E[\mathbf{X}])']\mathbf{A}' + E[\mathbf{W}(\mathbf{X} - E[\mathbf{X}])']\mathbf{A}' + \mathbf{A}E[(\mathbf{X} - E[\mathbf{X}])\mathbf{W}'] + E[\mathbf{WW}']. \]
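Since $\mathbf{X}$ and $\mathbf{W}$ are independent, $E[\mathbf{W}(\mathbf{X} - E[\mathbf{X}])'] = \mathbf{0}$ and $E[(\mathbf{X} - E[\mathbf{X}])\mathbf{W}'] = \mathbf{0}$. This implies
\[ \mathbf{C}_Y = \mathbf{A}\mathbf{C}_X\mathbf{A}' + \mathbf{R}_W = \begin{bmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \end{bmatrix}\begin{bmatrix} 0.99 & 0.75 & 0.51 \\ 0.75 & 1.0 & 0.75 \\ 0.51 & 0.75 & 0.99 \end{bmatrix}\begin{bmatrix} 1 & 0 \\ 1 & 1 \\ 0 & 1 \end{bmatrix} + \begin{bmatrix} 0.1 & 0 \\ 0 & 0.1 \end{bmatrix} = \begin{bmatrix} 3.59 & 3.01 \\ 3.01 & 3.59 \end{bmatrix}. \]
Once again, independence of $\mathbf{W}$ and $X_1$ yields $E[\mathbf{W}(X_1 - E[X_1])] = \mathbf{0}$, implying
\[ \mathbf{C}_{YX_1} = E[(\mathbf{Y} - E[\mathbf{Y}])(X_1 - E[X_1])] = E[(\mathbf{A}(\mathbf{X} - E[\mathbf{X}]) + \mathbf{W})(X_1 - E[X_1])] = \mathbf{A}E[(\mathbf{X} - E[\mathbf{X}])(X_1 - E[X_1])] \]
\[ = \mathbf{A}\begin{bmatrix} C_X(1,1) \\ C_X(2,1) \\ C_X(3,1) \end{bmatrix} = \begin{bmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \end{bmatrix}\begin{bmatrix} 0.99 \\ 0.75 \\ 0.51 \end{bmatrix} = \begin{bmatrix} 1.7400 \\ 1.2600 \end{bmatrix}. \]
Finally,
\[ \hat{\mathbf{a}} = \mathbf{C}_Y^{-1}\mathbf{C}_{YX_1} = \begin{bmatrix} 0.9378 & -0.7863 \\ -0.7863 & 0.9378 \end{bmatrix}\begin{bmatrix} 1.7400 \\ 1.2600 \end{bmatrix} = \begin{bmatrix} 0.6411 \\ -0.1865 \end{bmatrix}. \]
Next, we find that
\[ E[\mathbf{Y}] = \mathbf{A}E[\mathbf{X}] = \begin{bmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \end{bmatrix}\begin{bmatrix} -0.1 \\ 0 \\ 0.1 \end{bmatrix} = \begin{bmatrix} -0.1 \\ 0.1 \end{bmatrix}. \]
This implies
\[ \hat{b} = E[X_1] - \hat{\mathbf{a}}'E[\mathbf{Y}] = -0.1 - \begin{bmatrix} 0.6411 & -0.1865 \end{bmatrix}\begin{bmatrix} -0.1 \\ 0.1 \end{bmatrix} = -0.0172. \]
Thus the linear MMSE estimator of $X_1$ given $\mathbf{Y}$ is
\[ \hat{X}_1(\mathbf{Y}) = \hat{\mathbf{a}}'\mathbf{Y} + \hat{b} = 0.6411Y_1 - 0.1865Y_2 - 0.0172. \]
We note that although $E[X_1] = -0.1$, the estimator's offset is only $-0.0172$. This is because the change in $E[\mathbf{X}]$ is also included in the change in $E[\mathbf{Y}]$.

(b) By Theorem 9.8(c), the mean square error of the optimal estimator is
\[ e_L^* = \mathrm{Var}[X_1] - \hat{\mathbf{a}}'\mathbf{C}_{YX_1} = C_X(1,1) - \mathbf{C}_{YX_1}'\mathbf{C}_Y^{-1}\mathbf{C}_{YX_1} = 0.99 - \begin{bmatrix} 1.7400 & 1.2600 \end{bmatrix}\begin{bmatrix} 0.9378 & -0.7863 \\ -0.7863 & 0.9378 \end{bmatrix}\begin{bmatrix} 1.7400 \\ 1.2600 \end{bmatrix} = 0.1096. \]

(c) We can estimate random variable $X_1$ based on the observation of random variable $Y_1$ using Theorem 9.4. Note that Theorem 9.4 is a special case of Theorem 9.8 in which the observation is a random vector. In any case, from Theorem 9.4, the optimum linear estimate is $\hat{X}_1(Y_1) = a^*Y_1 + b^*$ where
\[ a^* = \frac{\mathrm{Cov}[X_1,Y_1]}{\mathrm{Var}[Y_1]}, \qquad b^* = \mu_{X_1} - a^*\mu_{Y_1}. \]
First we note that $E[X_1] = \mu_{X_1} = -0.1$. Second, since $Y_1 = X_1 + X_2 + W_1$ and $E[W_1] = 0$, we see that
\[ \mu_{Y_1} = E[Y_1] = E[X_1] + E[X_2] + E[W_1] = E[X_1] + E[X_2] = -0.1. \]
These facts, along with independence of $X_1$ and $W_1$, imply
\[ \mathrm{Cov}[X_1,Y_1] = E[(X_1 - E[X_1])(Y_1 - E[Y_1])] = E[(X_1 - E[X_1])((X_1 - E[X_1]) + (X_2 - E[X_2]) + W_1)] = C_X(1,1) + C_X(1,2) = 1.74. \]
In addition, using $\mathbf{C}_Y$ from part (a), we see that
\[ \mathrm{Var}[Y_1] = C_Y(1,1) = 3.59. \]
Thus
\[ a^* = \frac{\mathrm{Cov}[X_1,Y_1]}{\mathrm{Var}[Y_1]} = \frac{1.74}{3.59} = 0.4847, \qquad b^* = \mu_{X_1} - a^*\mu_{Y_1} = -0.0515. \]
Thus the optimum linear estimate of $X_1$ given $Y_1$ is
\[ \hat{X}_1(Y_1) = 0.4847Y_1 - 0.0515. \]
From Theorem 9.4(b), the mean square error of this estimator is
\[ e_L^* = \sigma_{X_1}^2(1-\rho_{X_1,Y_1}^2). \]
Note that $\sigma_{X_1}^2 = C_X(1,1) = 0.99$ and $\sigma_{Y_1}^2 = C_Y(1,1) = 3.59$. Also, since $\mathrm{Cov}[X_1,Y_1] = 1.74$, we see that
\[ \rho_{X_1,Y_1} = \frac{\mathrm{Cov}[X_1,Y_1]}{\sigma_{X_1}\sigma_{Y_1}} = \frac{1.74}{\sqrt{(0.99)(3.59)}} = 0.923. \]
Thus $e_L^* = 0.99(1 - (0.923)^2) = 0.1466$. As we would expect, the estimate of $X_1$ based on just $Y_1$ has larger mean square error than the estimate based on both $Y_1$ and $Y_2$.

Problem 9.4.4 Solution

From Theorem 9.7(a), we know that the minimum mean square error estimate of $X$ given $\mathbf{Y}$ is $\hat{X}_L(\mathbf{Y}) = \hat{\mathbf{a}}'\mathbf{Y}$, where $\hat{\mathbf{a}} = \mathbf{R}_Y^{-1}\mathbf{R}_{YX}$. Since $E[Y] = 0$,
\[ \mathbf{R}_Y = E[\mathbf{YY}'] = E[Y^2] = \sigma_Y^2. \]
Similarly,
\[ \mathbf{R}_{YX} = E[\mathbf{Y}X] = E[YX] = \mathrm{Cov}[X,Y]. \]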
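In this problem, $\mathbf{Y}$ is simply a scalar $Y$ and $\hat{\mathbf{a}}$ is a scalar $\hat{a}$. It follows that
\[ \hat{a} = \mathbf{R}_Y^{-1}\mathbf{R}_{YX} = \left(\sigma_Y^2\right)^{-1}\mathrm{Cov}[X,Y] = \frac{\sigma_X}{\sigma_Y}\cdot\frac{\mathrm{Cov}[X,Y]}{\sigma_X\sigma_Y} = \rho_{X,Y}\frac{\sigma_X}{\sigma_Y}. \]

Problem 9.4.5 Solution

In this problem, we view $\mathbf{Y} = [X_1\ X_2]'$ as the observation and $X = X_3$ as the variable we wish to estimate. Since $E[X] = 0$, we can use Theorem 9.7(a) to find the minimum mean square error estimate $\hat{X}_L(\mathbf{Y}) = \hat{\mathbf{a}}'\mathbf{Y}$ where $\hat{\mathbf{a}} = \mathbf{R}_Y^{-1}\mathbf{R}_{YX}$.

(a) In this case,
\[ \mathbf{R}_Y = E[\mathbf{YY}'] = \begin{bmatrix} E[X_1^2] & E[X_1X_2] \\ E[X_2X_1] & E[X_2^2] \end{bmatrix} = \begin{bmatrix} r_{11} & r_{12} \\ r_{21} & r_{22} \end{bmatrix} = \begin{bmatrix} 1 & -0.8 \\ -0.8 & 1 \end{bmatrix}. \]
Similarly,
\[ \mathbf{R}_{YX} = E[\mathbf{Y}X] = E\left[\begin{bmatrix} X_1 \\ X_2 \end{bmatrix}X_3\right] = \begin{bmatrix} E[X_1X_3] \\ E[X_2X_3] \end{bmatrix} = \begin{bmatrix} r_{13} \\ r_{23} \end{bmatrix} = \begin{bmatrix} 0.64 \\ -0.8 \end{bmatrix}. \]
Thus
\[ \hat{\mathbf{a}} = \mathbf{R}_Y^{-1}\mathbf{R}_{YX} = \begin{bmatrix} 25/9 & 20/9 \\ 20/9 & 25/9 \end{bmatrix}\begin{bmatrix} 0.64 \\ -0.8 \end{bmatrix} = \begin{bmatrix} 0 \\ -0.8 \end{bmatrix}. \]
The optimum linear estimator of $X_3$ given $X_1$ and $X_2$ is
\[ \hat{X}_3 = -0.8X_2. \]
The fact that this estimator depends only on $X_2$ while ignoring $X_1$ is an example of a result to be proven in Problem 9.4.6.

(b) By Theorem 9.7(c), the mean squared error of the optimal estimator is
\[ e_L^* = \mathrm{Var}[X_3] - \hat{\mathbf{a}}'\mathbf{R}_{YX_3} = 1 - \begin{bmatrix} 0 & -0.8 \end{bmatrix}\begin{bmatrix} 0.64 \\ -0.8 \end{bmatrix} = 0.36. \]

(c) In the previous part, we found that the optimal linear estimate of $X_3$ based on the observation of random variables $X_1$ and $X_2$ employed only $X_2$. Hence this same estimate, $\hat{X}_3 = -0.8X_2$, is the optimal linear estimate of $X_3$ just using $X_2$. (This can be derived using Theorem 9.4, if you wish to do more algebra.)

(d) Since the estimator is the same, the mean square error is still $e_L^* = 0.36$.

Problem 9.4.6 Solution

For this problem, let $\mathbf{Y} = [X_1\ X_2\ \cdots\ X_{n-1}]'$ and let $X = X_n$. Since $E[\mathbf{Y}] = \mathbf{0}$ and $E[X] = 0$, Theorem 9.7(a) tells us that the minimum mean square linear estimate of $X$ given $\mathbf{Y}$ is $\hat{X}_n(\mathbf{Y}) = \hat{\mathbf{a}}'\mathbf{Y}$, where $\hat{\mathbf{a}} = \mathbf{R}_Y^{-1}\mathbf{R}_{YX}$. This implies that $\hat{\mathbf{a}}$ is the solution to
\[ \mathbf{R}_Y\hat{\mathbf{a}} = \mathbf{R}_{YX}. \]
Note that
\[ \mathbf{R}_Y = E[\mathbf{YY}'] = \begin{bmatrix} 1 & c & \cdots & c^{n-2} \\ c & 1 & \ddots & \vdots \\ \vdots & \ddots & \ddots & c \\ c^{n-2} & \cdots & c & 1 \end{bmatrix}, \qquad \mathbf{R}_{YX} = E\left[\begin{bmatrix} X_1 \\ X_2 \\ \vdots \\ X_{n-1} \end{bmatrix}X_n\right] = \begin{bmatrix} c^{n-1} \\ c^{n-2} \\ \vdots \\ c \end{bmatrix}. \]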
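To make the structure of this linear system concrete, here is a small numerical instance in Matlab. The values c = 0.5 and n = 5 are assumed purely for illustration, and the variable names are not from the text; the sketch only relies on the correlation structure $R_X(i,j) = c^{|i-j|}$ shown in the matrices above.

```matlab
c = 0.5;  n = 5;                  % assumed example values (hypothetical)
RY  = toeplitz(c.^(0:n-2));       % (n-1) x (n-1) matrix with entries c^|i-j|
RYX = (c.^(n-1:-1:1))';           % E[X_i X_n] = c^(n-i) for i = 1,...,n-1
a   = RY\RYX;                     % solves RY*a = RYX
disp(a');                         % all weight falls on the last observation
```

For these values the solver returns a vector whose only nonzero entry is the last one, equal to c, so only $X_{n-1}$ is used in the estimate.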
We see that the last column of $c\mathbf{R}_Y$ equals $\mathbf{R}_{YX}$. Equivalently, if $\hat{\mathbf{a}} = [0\ \cdots\ 0\ c]'$, then $\mathbf{R}_Y\hat{\mathbf{a}} = \mathbf{R}_{YX}$. It follows that the optimal linear estimator of $X_n$ given $\mathbf{Y}$ is
\[ \hat{X}_n(\mathbf{Y}) = \hat{\mathbf{a}}'\mathbf{Y} = cX_{n-1}, \]
which completes the proof of the claim.

The mean square error of this estimate is
\[ e_L^* = E[(X_n - cX_{n-1})^2] = R_X(n,n) - cR_X(n,n-1) - cR_X(n-1,n) + c^2R_X(n-1,n-1) = 1 - 2c^2 + c^2 = 1 - c^2. \]
When $c$ is close to 1, $X_{n-1}$ and $X_n$ are highly correlated and the estimation error will be small.

Comment: We will see in Chapter 11 that correlation matrices with this structure arise frequently in the study of wide sense stationary random sequences. In fact, if you read ahead, you will find that the claim we just proved is essentially the same as that made in Theorem 11.10.

Problem 9.4.7 Solution

(a) In this case, we use the observation $\mathbf{Y}$ to estimate each $X_i$. Since $E[X_i] = 0$,
\[ E[\mathbf{Y}] = \sum_{j=1}^k E[X_j]\sqrt{p_j}\,\mathbf{S}_j + E[\mathbf{N}] = \mathbf{0}. \]
Thus, Theorem 9.7(a) tells us that the MMSE linear estimate of $X_i$ is $\hat{X}_i(\mathbf{Y}) = \hat{\mathbf{a}}'\mathbf{Y}$ where $\hat{\mathbf{a}} = \mathbf{R}_Y^{-1}\mathbf{R}_{YX_i}$. First we note that
\[ \mathbf{R}_{YX_i} = E[\mathbf{Y}X_i] = E\left[\left(\sum_{j=1}^k X_j\sqrt{p_j}\,\mathbf{S}_j + \mathbf{N}\right)X_i\right]. \]
Since $\mathbf{N}$ and $X_i$ are independent, $E[\mathbf{N}X_i] = E[\mathbf{N}]E[X_i] = \mathbf{0}$. Because $X_i$ and $X_j$ are independent for $i \ne j$, $E[X_iX_j] = E[X_i]E[X_j] = 0$ for $i \ne j$. In addition, $E[X_i^2] = 1$, and it follows that
\[ \mathbf{R}_{YX_i} = \sum_{j=1}^k E[X_jX_i]\sqrt{p_j}\,\mathbf{S}_j + E[\mathbf{N}X_i] = \sqrt{p_i}\,\mathbf{S}_i. \]
For the same reasons,
\[ \mathbf{R}_Y = E[\mathbf{YY}'] = E\left[\left(\sum_{j=1}^k \sqrt{p_j}X_j\mathbf{S}_j + \mathbf{N}\right)\left(\sum_{l=1}^k \sqrt{p_l}X_l\mathbf{S}_l' + \mathbf{N}'\right)\right] \]
\[ = \sum_{j=1}^k\sum_{l=1}^k \sqrt{p_jp_l}\,E[X_jX_l]\,\mathbf{S}_j\mathbf{S}_l' + \sum_{j=1}^k \sqrt{p_j}\underbrace{E[X_j\mathbf{N}']}_{=0}\mathbf{S}_j + \sum_{l=1}^k \sqrt{p_l}\underbrace{E[X_l\mathbf{N}]}_{=0}\mathbf{S}_l' + E[\mathbf{NN}'] \]
\[ = \sum_{j=1}^k p_j\mathbf{S}_j\mathbf{S}_j' + \sigma^2\mathbf{I}. \]
Now we use a linear algebra identity. For a matrix $\mathbf{S}$ with columns $\mathbf{S}_1, \mathbf{S}_2, \ldots, \mathbf{S}_k$, and a diagonal matrix $\mathbf{P} = \mathrm{diag}[p_1, p_2, \ldots, p_k]$,
\[ \sum_{j=1}^k p_j\mathbf{S}_j\mathbf{S}_j' = \mathbf{SPS}'. \]
Although this identity may be unfamiliar, it is handy in manipulating correlation matrices. (Also, if this is unfamiliar, you may wish to work out an example with $k = 2$ vectors of length 2 or 3.) Thus,
\[ \mathbf{R}_Y = \mathbf{SPS}' + \sigma^2\mathbf{I}, \]
and
\[ \hat{\mathbf{a}} = \mathbf{R}_Y^{-1}\mathbf{R}_{YX_i} = \left(\mathbf{SPS}' + \sigma^2\mathbf{I}\right)^{-1}\sqrt{p_i}\,\mathbf{S}_i. \]
Recall that if $\mathbf{C}$ is symmetric, then $\mathbf{C}^{-1}$ is also symmetric. This implies the MMSE estimate of $X_i$ given $\mathbf{Y}$ is
\[ \hat{X}_i(\mathbf{Y}) = \hat{\mathbf{a}}'\mathbf{Y} = \sqrt{p_i}\,\mathbf{S}_i'\left(\mathbf{SPS}' + \sigma^2\mathbf{I}\right)^{-1}\mathbf{Y}. \]

(b) We observe that $\mathbf{V} = (\mathbf{SPS}' + \sigma^2\mathbf{I})^{-1}\mathbf{Y}$ is a vector that does not depend on which bit $X_i$ that we want to estimate. Since $\hat{X}_i = \sqrt{p_i}\,\mathbf{S}_i'\mathbf{V}$, we can form the vector of estimates
\[ \hat{\mathbf{X}} = \begin{bmatrix} \hat{X}_1 \\ \vdots \\ \hat{X}_k \end{bmatrix} = \begin{bmatrix} \sqrt{p_1}\,\mathbf{S}_1'\mathbf{V} \\ \vdots \\ \sqrt{p_k}\,\mathbf{S}_k'\mathbf{V} \end{bmatrix} = \begin{bmatrix} \sqrt{p_1} & & \\ & \ddots & \\ & & \sqrt{p_k} \end{bmatrix}\begin{bmatrix} \mathbf{S}_1' \\ \vdots \\ \mathbf{S}_k' \end{bmatrix}\mathbf{V} = \mathbf{P}^{1/2}\mathbf{S}'\mathbf{V} = \mathbf{P}^{1/2}\mathbf{S}'\left(\mathbf{SPS}' + \sigma^2\mathbf{I}\right)^{-1}\mathbf{Y}. \]

Problem 9.5.1 Solution

This problem can be solved using the function mse defined in Example 9.10. All we need to do is define the correlation structure of the vector $\mathbf{X} = [X_1\ \cdots\ X_{21}]'$. Just as in Example 9.10, we do this by defining just the first row of the correlation matrix. Here are the commands we need, and the resulting plot.

    r1=sinc(0.1*(0:20)); mse(r1);
    hold on;
    r5=sinc(0.5*(0:20)); mse(r5);
    r9=sinc(0.9*(0:20)); mse(r9);

[Figure: MSE(n) versus n, three unlabeled curves]

Although the plot lacks labels, there are three curves for the mean square error MSE(n) corresponding to $\phi_0 \in \{0.1, 0.5, 0.9\}$. Keep in mind that MSE(n) is the MSE of the linear estimate of $X_{21}$ using random variables $X_1, \ldots, X_n$. If you run the commands, you'll find that $\phi_0 = 0.1$ yields the lowest mean square error while $\phi_0 = 0.9$ results in the highest mean square error. When $\phi_0 = 0.1$, random variables $X_n$ for $n = 10, 11, \ldots, 20$ are increasingly correlated with $X_{21}$. The result is that the MSE starts to
decline rapidly for $n > 10$. As $\phi_0$ increases, fewer observations $X_n$ are correlated with $X_{21}$. The result is the MSE is simply worse as $\phi_0$ increases. For example, when $\phi_0 = 0.9$, even $X_{20}$ has only a small correlation with $X_{21}$. We only get a good estimate of $X_{21}$ at time $n = 21$ when we observe $X_{21} + W_{21}$.

Problem 9.5.2 Solution

This problem can be solved using the function mse defined in Example 9.10. All we need to do is define the correlation structure of the vector $\mathbf{X} = [X_1\ \cdots\ X_{21}]'$. Just as in Example 9.10, we do this by defining just the first row r of the correlation matrix. Here are the commands we need, and the resulting plots.

    n=0:20; r1=cos(0.1*pi*n); mse(r1);
    n=0:20; r5=cos(0.5*pi*n); mse(r5);
    n=0:20; r9=cos(0.9*pi*n); mse(r9);

[Figure: three plots of MSE(n) versus n, one for each of r1, r5 and r9]

All three cases report similar results for the mean square error (MSE). The reason is that in all three cases, $X_1$ and $X_{21}$ are completely correlated; that is, $\rho_{X_1,X_{21}} = 1$. As a result, $X_1 = X_{21}$ so that at time $n = 1$, the observation is
\[ Y_1 = X_1 + W_1 = X_{21} + W_1. \]
The MSE at time $n = 1$ is 0.1, corresponding to the variance of the additive noise. Subsequent improvements in the estimates are the result of making other measurements of the form $Y_n = X_n + W_n$ where $X_n$ is highly correlated with $X_{21}$. The result is a form of averaging of the additive noise, which effectively reduces its variance. The choice of $\phi_0$ changes the values of $n$ for which $X_n$ and $X_{21}$ are highly correlated. However, in all three cases, there are several such values of $n$ and the result in all cases is an improvement in the MSE due to averaging the additive noise.

Problem 9.5.3 Solution

The solution to this problem is almost the same as the solution to Example 9.10, except perhaps the Matlab code is somewhat simpler. As in the example, let $\mathbf{W}^{(n)}$, $\mathbf{X}^{(n)}$, and $\mathbf{Y}^{(n)}$ denote the vectors consisting of the first $n$ components of $\mathbf{W}$, $\mathbf{X}$, and $\mathbf{Y}$. Just as in Examples 9.8 and 9.10, independence of $\mathbf{X}^{(n)}$ and $\mathbf{W}^{(n)}$ implies that the correlation matrix of $\mathbf{Y}^{(n)}$ is
\[ \mathbf{R}_{Y^{(n)}} = E[(\mathbf{X}^{(n)} + \mathbf{W}^{(n)})(\mathbf{X}^{(n)} + \mathbf{W}^{(n)})'] = \mathbf{R}_{X^{(n)}} + \mathbf{R}_{W^{(n)}}. \]
Note that $\mathbf{R}_{X^{(n)}}$ and $\mathbf{R}_{W^{(n)}}$ are the $n \times n$ upper-left submatrices of $\mathbf{R}_X$ and $\mathbf{R}_W$. In addition,
\[ \mathbf{R}_{Y^{(n)}X} = E\left[\begin{bmatrix} X_1 + W_1 \\ \vdots \\ X_n + W_n \end{bmatrix}X_1\right] = \begin{bmatrix} r_0 \\ \vdots \\ r_{n-1} \end{bmatrix}. \]
Compared to the solution of Example 9.10, the only difference in the solution is in the reversal of the vector $\mathbf{R}_{Y^{(n)}X}$. The optimal filter based on the first $n$ observations is $\hat{\mathbf{a}}^{(n)} = \mathbf{R}_{Y^{(n)}}^{-1}\mathbf{R}_{Y^{(n)}X}$, and the mean square error is
\[ e_L^* = \mathrm{Var}[X_1] - (\hat{\mathbf{a}}^{(n)})'\mathbf{R}_{Y^{(n)}X}. \]
The program mse953.m simply calculates the mean square error $e_L^*$. The input is the vector r corresponding to the vector $[r_0\ \cdots\ r_{20}]$, which holds the first row of the Toeplitz correlation matrix $\mathbf{R}_X$. Note that $\mathbf{R}_{X^{(n)}}$ is the Toeplitz matrix whose first row is the first $n$ elements of r.

    function e=mse953(r)
    N=length(r);
    e=[];
    for n=1:N,
        RYX=r(1:n)';
        RY=toeplitz(r(1:n))+0.1*eye(n);
        a=RY\RYX;
        en=r(1)-(a')*RYX;
        e=[e;en];
    end
    plot(1:N,e);

To plot the mean square error as a function of the number of observations, $n$, we generate the vector r and then run mse953(r). For the requested cases (a) and (b), the necessary Matlab commands and corresponding mean square estimation error output as a function of $n$ are shown here:

    ra=sinc(0.1*pi*(0:20)); mse953(ra)     % case (a)
    rb=cos(0.5*pi*(0:20)); mse953(rb)      % case (b)

[Figure: two plots of MSE versus n, labeled (a) and (b)]

In comparing the results of cases (a) and (b), we see that the mean square estimation error depends strongly on the correlation structure given by $r_{|i-j|}$. For case (a), $Y_1$ is a noisy observation of $X_1$ and is highly correlated with $X_1$. The MSE at $n = 1$ is just the variance of $W_1$. Additional samples of $Y_n$ mostly help to average the additive noise. Also, samples $X_n$ for $n \ge 10$ have very little correlation with $X_1$. Thus for $n \ge 10$, the samples of $Y_n$ result in almost no improvement in the estimate of $X_1$.

In case (b), $Y_1 = X_1 + W_1$, just as in case (a), is simply a noisy copy of $X_1$ and the estimation error is due to the variance of $W_1$. On the other hand, for case (b), $X_5$, $X_9$, $X_{13}$, $X_{17}$ and $X_{21}$ are completely correlated with $X_1$. Other samples also have significant correlation with $X_1$. As a result, the MSE continues to go down with increasing $n$.

Problem 9.5.4 Solution

When the transmitted bit vector is $\mathbf{X}$, the received signal vector is $\mathbf{Y} = \mathbf{SP}^{1/2}\mathbf{X} + \mathbf{N}$. In Problem 9.4.7, we found that the linear MMSE estimate of the vector $\mathbf{X}$ is
\[ \hat{\mathbf{X}} = \mathbf{P}^{1/2}\mathbf{S}'\left(\mathbf{SPS}' + \sigma^2\mathbf{I}\right)^{-1}\mathbf{Y}. \]
As indicated in the problem statement,
the evaluation of the LMSE detector follows the approach outlined in Problem 8. Additional samples of Yn mostly help to average the additive noise. To plot the mean square error as a function of the number of observations. we see that the mean square estimation error depends strongly on the correlation structure given by r|i−j| .1 MSE MSE 0 5 10 n 15 20 25 0. X13 and X17 and X21 are completely correlated with X1 . Thus for n ≥ 10. the only difference in the solution is in the reversal of ˆ the vector RY(n) X .1 0. 1000.P.k).0)/sqrt(n).’-s’.k. e=sum(sum(XR ~= X)).pedec.m). L=P2*S’*inv((S*P*S’)+eye(n)).4. LMSE detector. pelmse = lmsecdma(32.snr.kt=K(j).s.1000).pemf. %S is an n by k matrix. pedec=berdecorr(32.7 with the following script: k=[2 4 8 16 32]. >> pelmse = lmsecdma(32. for i=1:s.m).m) that simulates the LMSE detector for m symbols for a given set of signal vectors S. X=randombinaryseqs(k.1).1000).1060 For processing gain n = 32. XR=sign(L*Y).kt). end Here is a run of lmsecdma.4.2).p*eye(kt).k.k.’LMSE’.0348 0. The evaluation of the LMSE detector is most similar to evaluation of the matched filter (MF) detector in Problem 8. S=((2*S)-1.k. axis([0 32 0 0. e=e+lmsecdmasim(S.6. each of length n. %S= n x k matrix of signals %P= diag matrix of SNRs % SNR=power/var(noise) k=size(S. m symbols/frame %See Problem 9. e=0.snr. Pe=zeros(size(SNR)). %Pe=lmsecdma(n. %no. Y=S*P2*X+randn(n.4.’DEC’. function Pe=lmsecdma(n.1000.1000.10000). p=SNR(j).s.SNR]=ndgrid(k.m). pemf = mfrcdma(32.4.5.pelmse).’-d’.m).6 and the decorrelator of Problem 8. We define a function err=lmsecdmasim(S. S=randomsignals(n. The matched filter processing can be applied to all m columns at once. Each random signal is a binary ±1 sequence normalized to length 1. %RCDMA.m).4.P.4. %err=lmsecdmasim(P. of users n=size(S.k. disp([snr kt e]). 4 2 48542 4 4 109203 4 8 278266 4 16 865358 4 32 3391488 >> pelmse’ ans = 0.k)>0.k. 
Instead we can compare the LMSE detector to the matched filter (MF) detector of Problem 8.4. %proc.5]).5).k.0273 0. plot(k.2). function e=lmsecdmasim(S. legend(’MF’. there is no need for looping. Here are the functions: function S=randomsignals(n.0541 >> 0. the maximum likelihood detector is too complex for my version of Matlab to run quickly.S. columns are %random unit length signal vectors S=(rand(n. end Pe(j)=e/(s*m*kt).1000).snr). gain P2=sqrt(P).snr.m). The mth transmitted symbol is represented by the mth column of X and the corresponding received signal is given by the mth column of Y. A second function Pe=lmsecdma(n.0243 0.The transmitted data vector x belongs to the set Bk of all binary ±1 vectors of length k. users=k %proc gain=n.k.m) cycles through all combination of users k and SNR snr and calculates the bit error rate for each pair of values. This short program generates k random signals.m). rand signals/frame % s frames. 349 . In lmsecdmasim.s.k. for j=1:prod(size(SNR)).4 Solution [K. When the number of users equals the processing gain. the decorrelator works well because the cost of suppressing other users is small. When the number of users is small.The resulting plot resembles 0. Recall that for each bit Xi .2 0. the decorrelator owrks poorly because the noise is gratly enhanced. it still works better than the matched filter detector.3 0.4 0. 350 . the decorrelator zeroes out the interference from all users j = i at the expense of enhancing the receiver noise. the linear pre-processing of the LMSE detector offers an improvement in the bit error rate.1 0 0 5 10 15 20 25 30 MF DEC LMSE Compared to the matched filter and the decorrelator. the LMSE detector applies linear processing that results in an output that minimizes the mean square error between the output and the original transmitted bit. By comparison. For a large number of users. It works about as well as the decorrelator when the number of users is small.5 0. However.5 0 −0.s3) −0.5 −1 0 1 0. 
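As a small sanity check on the filter matrix L computed inside lmsecdmasim, the sketch below evaluates it for a toy case. The sizes n and k, the signal matrix S and the SNR value are assumptions chosen only for illustration; they do not come from the problem. Because the two columns of this S are orthonormal, the identity S'(SPS' + I)^(-1) = (S'SP + I)^(-1)S' suggests that L should reduce to 0.4*S'. The sketch also uses a right-division solve in place of inv, which gives the same matrix.

```matlab
% Toy check of the LMSE filter L = P^(1/2) S' (S P S' + I)^(-1).
% n, k, S and the SNR of 4 are illustrative assumptions.
n = 4; k = 2;
S = [1 1; 1 -1; 1 1; 1 -1]/sqrt(n);   % two orthonormal +/-1 signal columns
P = 4*eye(k);                          % each user has SNR 4
P2 = sqrt(P);                          % elementwise sqrt of a diagonal matrix
A = S*P*S' + eye(n);                   % correlation matrix of Y, unit noise variance
L1 = P2*S'*inv(A);                     % form used in lmsecdmasim
L2 = (P2*S')/A;                        % same matrix via a linear solve
Lref = 0.4*S';                         % closed form when S'*S = I and P = 4I
disp(max(abs(L1(:)-L2(:))));           % round-off level
disp(max(abs(L2(:)-Lref(:))));         % round-off level
```

For random signal sets the columns of S are not orthogonal, so the closed form no longer applies, but the two ways of computing L still agree.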
Problem Solutions – Chapter 10

Problem 10.2.1 Solution

• In Example 10.3, the daily noontime temperature at Newark Airport is a discrete time, continuous value random process. However, if the temperature is recorded only in units of one degree, then the process would be discrete value.
• In Example 10.4, the number of active telephone calls is discrete time and discrete value.
• The dice rolling experiment of Example 10.5 yields a discrete time, discrete value random process.
• The QPSK system of Example 10.6 is a continuous time and continuous value random process.

Problem 10.2.2 Solution

The sample space of the underlying experiment is

\[ S = \{s_0, s_1, s_2, s_3\}. \tag{1} \]

The four elements in the sample space are equally likely. The ensemble of sample functions is $\{x(t, s_i) \mid i = 0, 1, 2, 3\}$ where

\[ x(t, s_i) = \cos(2\pi f_0 t + \pi/4 + i\pi/2) \qquad (0 \le t \le T). \tag{2} \]

For $f_0 = 5/T$, this ensemble is shown below.

[Figure: the four sample functions x(t, s0), x(t, s1), x(t, s2) and x(t, s3), each plotted for 0 ≤ t ≤ T with values between −1 and 1.]

Problem 10.2.3 Solution

The eight possible waveforms correspond to the bit sequences

\[ \{(0,0,0), (1,0,0), (1,1,0), \ldots, (1,1,1)\}. \tag{1} \]

The corresponding eight waveforms are:

[Figure: the eight waveforms, each plotted for 0 ≤ t ≤ 3T with values between −1 and 1.]

Problem 10.2.4 Solution

The statement is false. As a counterexample, consider the rectified cosine waveform $X(t) = R|\cos 2\pi f t|$ of Example 10.9. When $t = 1/(4f)$, then $\cos 2\pi f t = \cos(\pi/2) = 0$ so that $X(1/(4f)) = 0$. Hence $X(1/(4f))$ has PDF

\[ f_{X(1/(4f))}(x) = \delta(x). \tag{1} \]

That is, $X(1/(4f))$ is a discrete random variable.

Problem 10.3.1 Solution

In this problem, we start from first principles. What makes this problem fairly straightforward is that the ramp is defined for all time. That is, the ramp doesn't start at time $t = W$.

\[ P[X(t) \le x] = P[t - W \le x] = P[W \ge t - x]. \tag{1} \]

Since $W \ge 0$, if $x \ge t$ then $P[W \ge t - x] = 1$. When $x < t$,

\[ P[W \ge t - x] = \int_{t-x}^{\infty} f_W(w)\,dw = e^{-(t-x)}. \tag{2} \]

Combining these facts, we have

\[ F_{X(t)}(x) = P[W \ge t - x] = \begin{cases} e^{-(t-x)} & x < t \\ 1 & t \le x \end{cases} \tag{3} \]

We note that the CDF contains no discontinuities. Taking the derivative of the CDF $F_{X(t)}(x)$ with respect to $x$, we obtain the PDF

\[ f_{X(t)}(x) = \begin{cases} e^{x-t} & x < t \\ 0 & \text{otherwise} \end{cases} \tag{4} \]

Problem 10.3.2 Solution

(a) Each oscillator has frequency $W$ in Hertz with uniform PDF

\[ f_W(w) = \begin{cases} 0.025 & 9980 \le w \le 10020 \\ 0 & \text{otherwise} \end{cases} \tag{1} \]

The probability that a test yields a one part in $10^4$ oscillator is

\[ p = P[9999 \le W \le 10001] = \int_{9999}^{10001} (0.025)\,dw = 0.05. \tag{2} \]

(b) To find the PMF of $T_1$, we view each oscillator test as an independent trial. A success occurs on a trial with probability $p$ if we find a one part in $10^4$ oscillator. The first one part in $10^4$ oscillator is found at time $T_1 = t$ if we observe failures on trials $1, \ldots, t-1$ followed by a success on trial $t$. Hence, just as in Example 2.11, $T_1$ has the geometric PMF

\[ P_{T_1}(t) = \begin{cases} (1-p)^{t-1}p & t = 1, 2, \ldots \\ 0 & \text{otherwise} \end{cases} \tag{3} \]

A geometric random variable with success probability $p$ has mean $1/p$. This is derived in Theorem 2.5. The expected time to find the first good oscillator is $E[T_1] = 1/p = 20$ minutes.

(c) Since $p = 0.05$, the probability the first one part in $10^4$ oscillator is found in exactly 20 minutes is $P_{T_1}(20) = (0.95)^{19}(0.05) = 0.0189$.

(d) The time $T_5$ required to find the 5th one part in $10^4$ oscillator is the number of trials needed for 5 successes. $T_5$ is a Pascal random variable. If this is not clear, see Example 2.15 where the Pascal PMF is derived. When we are looking for 5 successes, the Pascal PMF is

\[ P_{T_5}(t) = \begin{cases} \binom{t-1}{4} p^5 (1-p)^{t-5} & t = 5, 6, \ldots \\ 0 & \text{otherwise} \end{cases} \tag{4} \]

Looking up the Pascal PMF in Appendix A, we find that $E[T_5] = 5/p = 100$ minutes. The following argument is a second derivation of the mean of $T_5$. Once we find the first one part in $10^4$ oscillator, the number of additional trials needed to find the next one part in $10^4$ oscillator once again has a geometric PMF with mean $1/p$ since each independent trial is a success with probability $p$. Similarly, the time required to find 5 one part in $10^4$ oscillators is the sum of five independent geometric random variables. That is,

\[ T_5 = K_1 + K_2 + K_3 + K_4 + K_5 \tag{5} \]

where each $K_i$ is identically distributed to $T_1$. Since the expectation of the sum equals the sum of the expectations,

\[ E[T_5] = E[K_1 + K_2 + K_3 + K_4 + K_5] = 5E[K_i] = 5/p = 100 \text{ minutes}. \tag{6} \]
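The numbers quoted in Problem 10.3.2 can be reproduced with a few lines of Matlab. This is only a sketch: the handles PT1 and PT5 are illustrative names for the geometric and Pascal PMFs, not functions from matcode.

```matlab
% Numerical check of Problem 10.3.2 (PT1, PT5 are illustrative names).
p = 0.025*(10001 - 9999);                        % success probability per test
PT1 = @(t) (1-p).^(t-1)*p;                       % geometric PMF of T1
PT5 = @(t) nchoosek(t-1,4)*p^5*(1-p)^(t-5);      % Pascal PMF of T5, scalar t >= 5
ET1 = 1/p;                                       % expected minutes to first success
ET5 = 5/p;                                       % expected minutes to fifth success
disp([p PT1(20) ET1 ET5]);
disp(sum(PT1(1:2000)));                          % geometric PMF sums to about 1
```

The displayed PT1(20) rounds to 0.0189, and the expected values are 20 and 100 minutes.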
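Problem 10.3.3 Solution

Once we find the first one part in $10^4$ oscillator, the number of additional tests needed to find the next one part in $10^4$ oscillator once again has a geometric PMF with mean $1/p$ since each independent trial is a success with probability $p$. That is, $T_2 = T_1 + T'$ where $T'$ is independent and identically distributed to $T_1$. Thus,

\[ E[T_2 \mid T_1 = 3] = E[T_1 \mid T_1 = 3] + E[T' \mid T_1 = 3] \tag{1} \]
\[ = 3 + E[T'] = 23 \text{ minutes}. \tag{2} \]

Problem 10.3.4 Solution

Since the problem states that the pulse is delayed, we will assume $T \ge 0$. This problem is difficult because the answer will depend on $t$. In particular, for $t < 0$, $X(t) = 0$ and $f_{X(t)}(x) = \delta(x)$. Things are more complicated when $t > 0$. For $x < 0$, $P[X(t) > x] = 1$. For $x \ge 1$, $P[X(t) > x] = 0$. Lastly, for $0 \le x < 1$,

\[ P[X(t) > x] = P\left[e^{-(t-T)}u(t-T) > x\right] \tag{1} \]
\[ = P[t + \ln x < T \le t] \tag{2} \]
\[ = F_T(t) - F_T(t + \ln x). \tag{3} \]

Note that the condition $T \le t$ is needed to make sure that the pulse doesn't arrive after time $t$. The other condition $T > t + \ln x$ ensures that the pulse didn't arrive too early and already decay too much. We can express these facts in terms of the CDF of $X(t)$.

\[ F_{X(t)}(x) = 1 - P[X(t) > x] = \begin{cases} 0 & x < 0 \\ 1 + F_T(t + \ln x) - F_T(t) & 0 \le x < 1 \\ 1 & x \ge 1 \end{cases} \tag{4} \]

We can take the derivative of the CDF to find the PDF. However, we need to keep in mind that the CDF has a jump discontinuity at $x = 0$. In particular, since $\ln 0 = -\infty$,

\[ F_{X(t)}(0) = 1 + F_T(-\infty) - F_T(t) = 1 - F_T(t). \tag{5} \]

Hence, when we take a derivative, we will see an impulse at $x = 0$. The PDF of $X(t)$ is

\[ f_{X(t)}(x) = \begin{cases} (1 - F_T(t))\delta(x) + f_T(t + \ln x)/x & 0 \le x < 1 \\ 0 & \text{otherwise} \end{cases} \tag{6} \]

Problem 10.4.1 Solution

Each $Y_k$ is the sum of two identical independent Gaussian random variables. Hence, each $Y_k$ must have the same PDF. That is, the $Y_k$ are identically distributed. Next, we observe that the sequence of $Y_k$ is independent. To see this, we observe that each $Y_k$ is composed of two samples of $X_n$ that are unused by any other $Y_j$ for $j \ne k$.

Problem 10.4.2 Solution

Each $W_n$ is the sum of two identical independent Gaussian random variables. Hence, each $W_n$ must have the same PDF. That is, the $W_n$ are identically distributed. However, since $W_{n-1}$ and $W_n$ both use $X_{n-1}$ in their averaging, $W_{n-1}$ and $W_n$ are dependent. We can verify this observation by calculating the covariance of $W_{n-1}$ and $W_n$. First, we observe that for all $n$,

\[ E[W_n] = (E[X_n] + E[X_{n-1}])/2 = 30. \tag{1} \]

Next, we observe that $W_{n-1}$ and $W_n$ have covariance

\[ \mathrm{Cov}[W_{n-1}, W_n] = E[W_{n-1}W_n] - E[W_n]E[W_{n-1}] \tag{2} \]
\[ = \frac{1}{4}E[(X_{n-1} + X_{n-2})(X_n + X_{n-1})] - 900. \tag{3} \]

We observe that for $n \ne m$, $E[X_nX_m] = E[X_n]E[X_m] = 900$ while

\[ E[X_n^2] = \mathrm{Var}[X_n] + (E[X_n])^2 = 916. \tag{4} \]

Thus,

\[ \mathrm{Cov}[W_{n-1}, W_n] = \frac{900 + 916 + 900 + 900}{4} - 900 = 4. \tag{5} \]

Since $\mathrm{Cov}[W_{n-1}, W_n] \ne 0$, $W_n$ and $W_{n-1}$ must be dependent.

Problem 10.4.3 Solution

The number $Y_k$ of failures between successes $k-1$ and $k$ is exactly $y \ge 0$ iff after success $k-1$, there are $y$ failures followed by a success. Since the Bernoulli trials are independent, the probability of this event is $(1-p)^y p$. The complete PMF of $Y_k$ is

\[ P_{Y_k}(y) = \begin{cases} (1-p)^y p & y = 0, 1, \ldots \\ 0 & \text{otherwise} \end{cases} \tag{1} \]

Since this argument is valid for all $k$ including $k = 1$, we can conclude that $Y_1, Y_2, \ldots$ are identically distributed. Moreover, since the trials are independent, the number of failures between successes $k-1$ and $k$ and the number of failures between successes $k'-1$ and $k'$ are independent. Hence, $Y_1, Y_2, \ldots$ is an iid sequence.

Problem 10.5.1 Solution

This is a very straightforward problem. The Poisson process has rate $\lambda = 4$ calls per second. When $t$ is measured in seconds, each $N(t)$ is a Poisson random variable with mean $4t$ and thus has PMF

\[ P_{N(t)}(n) = \begin{cases} \dfrac{(4t)^n}{n!}e^{-4t} & n = 0, 1, 2, \ldots \\ 0 & \text{otherwise} \end{cases} \tag{1} \]

Using the general expression for the PMF, we can write down the answer for each part.

(a) $P_{N(1)}(0) = 4^0 e^{-4}/0! = e^{-4} \approx 0.0183$.
(b) $P_{N(1)}(4) = 4^4 e^{-4}/4! = 32e^{-4}/3 \approx 0.1954$.
(c) $P_{N(2)}(2) = 8^2 e^{-8}/2! = 32e^{-8} \approx 0.0107$.

Problem 10.5.2 Solution

Following the instructions given, we express each answer in terms of $N(m)$ which has PMF

\[ P_{N(m)}(n) = \begin{cases} (6m)^n e^{-6m}/n! & n = 0, 1, 2, \ldots \\ 0 & \text{otherwise} \end{cases} \tag{1} \]

(a) The probability of no queries in a one minute interval is $P_{N(1)}(0) = 6^0 e^{-6}/0! = 0.00248$.
(b) The probability of exactly 6 queries arriving in a one minute interval is $P_{N(1)}(6) = 6^6 e^{-6}/6! = 0.161$.
(c) The probability of exactly three queries arriving in a one-half minute interval is $P_{N(0.5)}(3) = 3^3 e^{-3}/3! = 0.224$.

Problem 10.5.3 Solution

Since there is always a backlog and the service times are iid exponential random variables, the times between service completions are a sequence of iid exponential random variables. That is, the service completions are a Poisson process. Since the expected service time is 30 minutes, the rate of the Poisson process is $\lambda = 1/30$ per minute. Since $t$ hours equals $60t$ minutes, the expected number serviced is $\lambda(60t)$ or $2t$. Moreover, the number serviced in the first $t$ hours has the Poisson PMF

\[ P_{N(t)}(n) = \begin{cases} \dfrac{(2t)^n e^{-2t}}{n!} & n = 0, 1, 2, \ldots \\ 0 & \text{otherwise} \end{cases} \tag{1} \]

Problem 10.5.4 Solution

Since $D(t)$ is a Poisson process with rate 0.1 drops/day, the random variable $D(t)$ is a Poisson random variable with parameter $\alpha = 0.1t$. The PMF of $D(t)$, the number of drops after $t$ days, is

\[ P_{D(t)}(d) = \begin{cases} (0.1t)^d e^{-0.1t}/d! & d = 0, 1, 2, \ldots \\ 0 & \text{otherwise} \end{cases} \tag{1} \]

Problem 10.5.5 Solution

Note that it matters whether $t \ge 2$ minutes. If $t \le 2$, then any customers that have arrived must still be in service. Since a Poisson number of arrivals occur during $(0, t]$,

\[ P_{N(t)}(n) = \begin{cases} (\lambda t)^n e^{-\lambda t}/n! & n = 0, 1, 2, \ldots \\ 0 & \text{otherwise} \end{cases} \qquad (0 \le t \le 2) \tag{1} \]

For $t \ge 2$, the customers in service are precisely those customers that arrived in the interval $(t-2, t]$. The number of such customers has a Poisson PMF with mean $\lambda[t - (t-2)] = 2\lambda$. The resulting PMF of $N(t)$ is

\[ P_{N(t)}(n) = \begin{cases} (2\lambda)^n e^{-2\lambda}/n! & n = 0, 1, 2, \ldots \\ 0 & \text{otherwise} \end{cases} \qquad (t \ge 2) \tag{2} \]

Problem 10.5.6 Solution

The times $T$ between queries are independent exponential random variables with PDF

\[ f_T(t) = \begin{cases} (1/8)e^{-t/8} & t \ge 0 \\ 0 & \text{otherwise} \end{cases} \tag{1} \]

From the PDF, we can calculate for $t > 0$,

\[ P[T \ge t] = \int_t^{\infty} f_T(t')\,dt' = e^{-t/8}. \tag{2} \]

Using this formula, each question can be easily answered.

(a) $P[T \ge 4] = e^{-4/8} \approx 0.607$.

(b)
\[ P[T \ge 13 \mid T \ge 5] = \frac{P[T \ge 13, T \ge 5]}{P[T \ge 5]} \tag{3} \]
\[ = \frac{P[T \ge 13]}{P[T \ge 5]} = \frac{e^{-13/8}}{e^{-5/8}} = e^{-1} \approx 0.368. \tag{4} \]

(c) Although the times between queries are independent exponential random variables, $N(t)$ is not exactly a Poisson random process because the first query occurs at time $t = 0$. Recall that in a Poisson process, the first arrival occurs some time after $t = 0$. However $N(t) - 1$ is a Poisson process of rate $1/8$. Hence, for $n = 0, 1, 2, \ldots$,

\[ P[N(t) - 1 = n] = (t/8)^n e^{-t/8}/n! \tag{5} \]

Thus, for $n = 1, 2, \ldots$, the PMF of $N(t)$ is

\[ P_{N(t)}(n) = P[N(t) - 1 = n - 1] = (t/8)^{n-1}e^{-t/8}/(n-1)! \tag{6} \]

The complete expression of the PMF of $N(t)$ is

\[ P_{N(t)}(n) = \begin{cases} (t/8)^{n-1}e^{-t/8}/(n-1)! & n = 1, 2, \ldots \\ 0 & \text{otherwise} \end{cases} \tag{7} \]

Problem 10.5.7 Solution

This proof is just a simplified version of the proof given for Theorem 10.3. The first arrival occurs at time $X_1 > x \ge 0$ iff there are no arrivals in the interval $(0, x]$. Hence, for $x \ge 0$,

\[ P[X_1 > x] = P[N(x) = 0] = (\lambda x)^0 e^{-\lambda x}/0! = e^{-\lambda x}. \tag{1} \]

Since $P[X_1 \le x] = 0$ for $x < 0$, the CDF of $X_1$ is the exponential CDF

\[ F_{X_1}(x) = \begin{cases} 0 & x < 0 \\ 1 - e^{-\lambda x} & x \ge 0 \end{cases} \tag{2} \]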
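The Poisson and exponential probabilities in Problems 10.5.1, 10.5.2 and 10.5.6 are easy to evaluate numerically. The sketch below uses an anonymous function named pois, which is an illustrative helper and not the matcode function. For the exponential PDF with mean 8 given in Problem 10.5.6, $e^{-4/8}$ evaluates to about 0.607.

```matlab
% Poisson PMF alpha^n e^(-alpha)/n! as an anonymous function (illustrative helper).
pois = @(alpha,n) alpha.^n.*exp(-alpha)./factorial(n);
a51 = [pois(4,0) pois(4,4) pois(8,2)];     % Problem 10.5.1 (a)-(c), rate 4 per second
a52 = [pois(6,0) pois(6,6) pois(3,3)];     % Problem 10.5.2 (a)-(c), rate 6 per minute
a56 = [exp(-4/8) exp(-13/8)/exp(-5/8)];    % Problem 10.5.6 (a),(b), P[T>=t] = e^(-t/8)
disp(a51);
disp(a52);
disp(a56);
```

The first two rows round to 0.0183, 0.1954, 0.0107 and 0.00248, 0.161, 0.224.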
Problem 10.5.8 Solution

(a) For $X_i = -\ln U_i$, we can write

\[ P[X_i > x] = P[-\ln U_i > x] = P[\ln U_i \le -x] = P\left[U_i \le e^{-x}\right]. \tag{1} \]

When $x < 0$, $e^{-x} > 1$ so that $P[U_i \le e^{-x}] = 1$. When $x \ge 0$, we have $0 < e^{-x} \le 1$, implying $P[U_i \le e^{-x}] = e^{-x}$. Combining these facts, we have

\[ P[X_i > x] = \begin{cases} 1 & x < 0 \\ e^{-x} & x \ge 0 \end{cases} \tag{2} \]

This permits us to show that the CDF of $X_i$ is

\[ F_{X_i}(x) = 1 - P[X_i > x] = \begin{cases} 0 & x < 0 \\ 1 - e^{-x} & x \ge 0 \end{cases} \tag{3} \]

We see that $X_i$ has an exponential CDF with mean 1.

(b) Note that $N = n$ iff

\[ \prod_{i=1}^{n} U_i \ge e^{-t} > \prod_{i=1}^{n+1} U_i. \tag{4} \]

By taking the logarithm of both inequalities, we see that $N = n$ iff

\[ \sum_{i=1}^{n} \ln U_i \ge -t > \sum_{i=1}^{n+1} \ln U_i. \tag{5} \]

Next, we multiply through by $-1$ and recall that $X_i = -\ln U_i$ is an exponential random variable. This yields $N = n$ iff

\[ \sum_{i=1}^{n} X_i \le t < \sum_{i=1}^{n+1} X_i. \tag{6} \]

Now we recall that a Poisson process $N(t)$ of rate 1 has independent exponential interarrival times $X_1, X_2, \ldots$. That is, the $i$th arrival occurs at time $\sum_{j=1}^{i} X_j$. Moreover, $N(t) = n$ iff the first $n$ arrivals occur by time $t$ but arrival $n+1$ occurs after time $t$. Since the random variable $N(t)$ has a Poisson distribution with mean $t$, we can write

\[ P\left[\sum_{i=1}^{n} X_i \le t < \sum_{i=1}^{n+1} X_i\right] = P[N(t) = n] = \frac{t^n e^{-t}}{n!}. \tag{7} \]

Problem 10.6.1 Solution

Customers entering (or not entering) the casino is a Bernoulli decomposition of the Poisson process of arrivals at the casino doors. By Theorem 10.6, customers entering the casino are a Poisson process of rate $100/2 = 50$ customers/hour. Thus in the two hours from 5 to 7 PM, the number, $N$, of customers entering the casino is a Poisson random variable with expected value $\alpha = 2 \cdot 50 = 100$. The PMF of $N$ is

\[ P_N(n) = \begin{cases} 100^n e^{-100}/n! & n = 0, 1, 2, \ldots \\ 0 & \text{otherwise} \end{cases} \tag{1} \]

Problem 10.6.2 Solution

In an interval $(t, t + \Delta]$ with an infinitesimal $\Delta$, let $A_i$ denote the event of an arrival of the process $N_i(t)$. Also, let $A = A_1 \cup A_2$ denote the event of an arrival of either process. Since $N_i(t)$ is a Poisson process, the alternative model says that $P[A_i] = \lambda_i\Delta$. Also, since $N_1(t) + N_2(t)$ is a Poisson process, the proposed Poisson process model says

\[ P[A] = (\lambda_1 + \lambda_2)\Delta. \tag{1} \]

Lastly, the conditional probability of a type 1 arrival given an arrival of either type is

\[ P[A_1 \mid A] = \frac{P[A_1A]}{P[A]} = \frac{P[A_1]}{P[A]} = \frac{\lambda_1\Delta}{(\lambda_1 + \lambda_2)\Delta} = \frac{\lambda_1}{\lambda_1 + \lambda_2}. \tag{2} \]

This solution is something of a cheat in that we have used the fact that the sum of Poisson processes is a Poisson process without using the proposed model to derive this fact.

Problem 10.6.3 Solution

We start with the case when $t \ge 2$. When each service time is equally likely to be either 1 minute or 2 minutes, we have the following situation. Let $M_1$ denote those customers that arrived in the interval $(t-1, t]$. All $M_1$ of these customers will be in the bank at time $t$ and $M_1$ is a Poisson random variable with mean $\lambda$.

Let $M_2$ denote the number of customers that arrived during $(t-2, t-1]$. Of course, $M_2$ is Poisson with expected value $\lambda$. We can view each of the $M_2$ customers as flipping a coin to determine whether to choose a 1 minute or a 2 minute service time. Only those customers that choose a 2 minute service time will be in service at time $t$. Let $M_2'$ denote those customers choosing a 2 minute service time. It should be clear that $M_2'$ is a Poisson number of Bernoulli random variables. Theorem 10.6 verifies that using Bernoulli trials to decide whether the arrivals of a rate $\lambda$ Poisson process should be counted yields a Poisson process of rate $p\lambda$. A consequence of this result is that a Poisson number of Bernoulli (success probability $p$) random variables has Poisson PMF with mean $p\lambda$. In this case, $M_2'$ is Poisson with mean $\lambda/2$. Moreover, the number of customers in service at time $t$ is $N(t) = M_1 + M_2'$. Finally, since $M_1$ and $M_2'$ are independent Poisson random variables, their sum $N(t)$ also has a Poisson PMF. This was verified in Theorem 6.9. Hence $N(t)$ is Poisson with mean $E[N(t)] = E[M_1] + E[M_2'] = 3\lambda/2$. The PMF of $N(t)$ is

\[ P_{N(t)}(n) = \begin{cases} (3\lambda/2)^n e^{-3\lambda/2}/n! & n = 0, 1, 2, \ldots \\ 0 & \text{otherwise} \end{cases} \qquad (t \ge 2) \tag{1} \]

Now we can consider the special cases arising when $t < 2$. When $0 \le t < 1$, every arrival is still in service. Thus the number in service $N(t)$ equals the number of arrivals and has the PMF

\[ P_{N(t)}(n) = \begin{cases} (\lambda t)^n e^{-\lambda t}/n! & n = 0, 1, 2, \ldots \\ 0 & \text{otherwise} \end{cases} \qquad (0 \le t \le 1) \tag{2} \]

When $1 \le t < 2$, let $M_1$ denote the number of customers in the interval $(t-1, t]$. All $M_1$ customers arriving in that interval will be in service at time $t$. The $M_2$ customers arriving in the interval $(0, t-1]$ must each flip a coin to decide on a 1 minute or 2 minute service time. Only those customers choosing the 2 minute service time will be in service at time $t$. Since $M_2$ has a Poisson PMF with mean $\lambda(t-1)$, the number $M_2'$ of those customers in the system at time $t$ has a Poisson PMF with mean $\lambda(t-1)/2$. Finally, the number of customers in service at time $t$ has a Poisson PMF with expected value $E[N(t)] = E[M_1] + E[M_2'] = \lambda + \lambda(t-1)/2$. Hence, the PMF of $N(t)$ becomes

\[ P_{N(t)}(n) = \begin{cases} (\lambda(t+1)/2)^n e^{-\lambda(t+1)/2}/n! & n = 0, 1, 2, \ldots \\ 0 & \text{otherwise} \end{cases} \qquad (1 \le t \le 2) \tag{3} \]

Problem 10.6.4 Solution

Since the arrival times $S_1, \ldots, S_n$ are ordered in time and since a Poisson process cannot have two simultaneous arrivals, the conditional PDF $f_{S_1,\ldots,S_n|N}(s_1, \ldots, s_n|n)$ is nonzero only if $s_1 < s_2 < \cdots < s_n < T$. In this case, consider an arbitrarily small $\Delta$; in particular, $\Delta < \min_i (s_{i+1} - s_i)/2$ implies that the intervals $(s_i, s_i + \Delta]$ are non-overlapping. We now find the joint probability

\[ P[s_1 < S_1 \le s_1 + \Delta, \ldots, s_n < S_n \le s_n + \Delta, N = n] \]

that each $S_i$ is in the interval $(s_i, s_i + \Delta]$ and that $N = n$. This joint event implies that there were zero arrivals in each interval $(s_i + \Delta, s_{i+1}]$. That is, over the interval $[0, T]$, the Poisson process has exactly one arrival in each interval $(s_i, s_i + \Delta]$ and zero arrivals in the time period $T - \bigcup_{i=1}^{n}(s_i, s_i + \Delta]$. The collection of intervals in which there was no arrival had a total duration of $T - n\Delta$. Note that the probability of exactly one arrival in the interval $(s_i, s_i + \Delta]$ is $\lambda\Delta e^{-\lambda\Delta}$ and the probability of zero arrivals in a period of duration $T - n\Delta$ is $e^{-\lambda(T - n\Delta)}$. In addition, the event of one arrival in each interval $(s_i, s_i + \Delta]$ and zero events in the period of length $T - n\Delta$ are independent events because they consider non-overlapping periods of the Poisson process. Thus,

\[ P[s_1 < S_1 \le s_1 + \Delta, \ldots, s_n < S_n \le s_n + \Delta, N = n] = \left(\lambda\Delta e^{-\lambda\Delta}\right)^n e^{-\lambda(T - n\Delta)} \tag{1} \]
\[ = (\lambda\Delta)^n e^{-\lambda T}. \tag{2} \]

Since $P[N = n] = (\lambda T)^n e^{-\lambda T}/n!$, we see that

\[ P[s_1 < S_1 \le s_1 + \Delta, \ldots, s_n < S_n \le s_n + \Delta \mid N = n] = \frac{P[s_1 < S_1 \le s_1 + \Delta, \ldots, s_n < S_n \le s_n + \Delta, N = n]}{P[N = n]} \tag{3} \]
\[ = \frac{(\lambda\Delta)^n e^{-\lambda T}}{(\lambda T)^n e^{-\lambda T}/n!} \tag{4} \]
\[ = \frac{n!}{T^n}\Delta^n. \tag{5} \]

Finally, for infinitesimal $\Delta$, the conditional PDF of $S_1, \ldots, S_n$ given $N = n$ satisfies

\[ f_{S_1,\ldots,S_n|N}(s_1, \ldots, s_n|n)\,\Delta^n = P[s_1 < S_1 \le s_1 + \Delta, \ldots, s_n < S_n \le s_n + \Delta \mid N = n] \tag{6} \]
\[ = \frac{n!}{T^n}\Delta^n. \tag{7} \]

Since the conditional PDF is zero unless $s_1 < s_2 < \cdots < s_n \le T$, it follows that

\[ f_{S_1,\ldots,S_n|N}(s_1, \ldots, s_n|n) = \begin{cases} n!/T^n & 0 \le s_1 < \cdots < s_n \le T, \\ 0 & \text{otherwise.} \end{cases} \tag{8} \]

If it seems that the above argument had some "hand-waving," we now do the derivation of $P[s_1 < S_1 \le s_1 + \Delta, \ldots, s_n < S_n \le s_n + \Delta \mid N = n]$ in somewhat excruciating detail. (Feel free to skip the following if you were satisfied with the earlier explanation.)
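As a quick consistency check on the result of Problem 10.6.4, a conditional PDF equal to $n!/T^n$ on the ordered region $0 \le s_1 < \cdots < s_n \le T$ should integrate to one. The worked case below takes $n = 2$; the general statement uses the standard fact that the ordered region occupies a fraction $1/n!$ of the cube $[0,T]^n$.

```latex
\[
\int_0^T \int_{s_1}^T \frac{2}{T^2}\, ds_2\, ds_1
  = \frac{2}{T^2}\int_0^T (T - s_1)\, ds_1
  = \frac{2}{T^2}\cdot\frac{T^2}{2} = 1 .
\]
\[
\text{In general: } \int_{0 \le s_1 < \cdots < s_n \le T} \frac{n!}{T^n}\, ds_1 \cdots ds_n
  = \frac{n!}{T^n}\cdot\frac{T^n}{n!} = 1 .
\]
```

For $n = 1$ the formula reduces to $1/T$, so a single arrival known to fall in $[0, T]$ is uniformly distributed over that interval.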
For the interval $(s, t]$, we use the shorthand notation $0_{(s,t)}$ and $1_{(s,t)}$ to denote the events of 0 arrivals and 1 arrival respectively. This notation permits us to write

\[ P[s_1 < S_1 \le s_1 + \Delta, \ldots, s_n < S_n \le s_n + \Delta, N = n] = P\left[0_{(0,s_1)} 1_{(s_1,s_1+\Delta)} 0_{(s_1+\Delta,s_2)} 1_{(s_2,s_2+\Delta)} 0_{(s_2+\Delta,s_3)} \cdots 1_{(s_n,s_n+\Delta)} 0_{(s_n+\Delta,T)}\right]. \tag{9} \]

The set of events $0_{(0,s_1)}$, $0_{(s_n+\Delta,T)}$, and, for $i = 1, \ldots, n-1$, $0_{(s_i+\Delta,s_{i+1})}$ and $1_{(s_i,s_i+\Delta)}$ are independent because each event depends on the Poisson process in a time interval that overlaps none of the other time intervals. In addition, since the Poisson process has rate $\lambda$, $P[0_{(s,t)}] = e^{-\lambda(t-s)}$ and $P[1_{(s_i,s_i+\Delta)}] = (\lambda\Delta)e^{-\lambda\Delta}$. Thus,

\[ P[s_1 < S_1 \le s_1 + \Delta, \ldots, s_n < S_n \le s_n + \Delta, N = n] = P\left[0_{(0,s_1)}\right]P\left[1_{(s_1,s_1+\Delta)}\right]P\left[0_{(s_1+\Delta,s_2)}\right] \cdots P\left[1_{(s_n,s_n+\Delta)}\right]P\left[0_{(s_n+\Delta,T)}\right] \tag{10} \]
\[ = e^{-\lambda s_1}\,\lambda\Delta e^{-\lambda\Delta}\,e^{-\lambda(s_2 - s_1 - \Delta)} \cdots \lambda\Delta e^{-\lambda\Delta}\,e^{-\lambda(T - s_n - \Delta)} \tag{11} \]
\[ = (\lambda\Delta)^n e^{-\lambda T}. \tag{12} \]

Problem 10.7.1 Solution

From the problem statement, the change in the stock price is $X(8) - X(0)$ and the standard deviation of $X(8) - X(0)$ is 1/2 point. In other words, the variance of $X(8) - X(0)$ is $\mathrm{Var}[X(8) - X(0)] = 1/4$. By the definition of Brownian motion, $\mathrm{Var}[X(8) - X(0)] = 8\alpha$. Hence $\alpha = 1/32$.

Problem 10.7.2 Solution

We need to verify that $Y(t) = X(ct)$ satisfies the conditions given in Definition 10.10. First we observe that $Y(0) = X(c \cdot 0) = X(0) = 0$. Second, we note that since $X(t)$ is a Brownian motion process, $Y(t) - Y(s) = X(ct) - X(cs)$ is a Gaussian random variable. Further, $X(ct) - X(cs)$ is independent of $X(t')$ for all $t' \le cs$. Equivalently, we can say that $X(ct) - X(cs)$ is independent of $X(c\tau)$ for all $\tau \le s$. In other words, $Y(t) - Y(s)$ is independent of $Y(\tau)$ for all $\tau \le s$. Thus $Y(t)$ is a Brownian motion process.

Problem 10.7.3 Solution

First we observe that $Y_n = X_n - X_{n-1} = X(n) - X(n-1)$ is a Gaussian random variable with mean zero and variance $\alpha$. Since this fact is true for all $n$, we can conclude that $Y_1, Y_2, \ldots$ are identically distributed. By Definition 10.10 for Brownian motion, $Y_n = X(n) - X(n-1)$ is independent of $X(m)$ for any $m \le n-1$. Hence $Y_n$ is independent of $Y_m = X(m) - X(m-1)$ for any $m \le n-1$. Equivalently, $Y_1, Y_2, \ldots$ is a sequence of independent random variables.

Problem 10.7.4 Solution

Recall that the vector $\mathbf{X}$ of increments has independent components $X_n = W_n - W_{n-1}$. Alternatively, each $W_n$ can be written as the sum

\[ W_1 = X_1 \tag{1} \]
\[ W_2 = X_1 + X_2 \tag{2} \]
\[ \vdots \]
\[ W_k = X_1 + X_2 + \cdots + X_k. \tag{3} \]

In terms of matrices, $\mathbf{W} = \mathbf{A}\mathbf{X}$ where $\mathbf{A}$ is the lower triangular matrix

\[ \mathbf{A} = \begin{bmatrix} 1 & & & \\ 1 & 1 & & \\ \vdots & & \ddots & \\ 1 & \cdots & \cdots & 1 \end{bmatrix}. \tag{4} \]

Since $E[\mathbf{W}] = \mathbf{A}E[\mathbf{X}] = \mathbf{0}$, it follows from Theorem 5.16 that

\[ f_{\mathbf{W}}(\mathbf{w}) = \frac{1}{|\det(\mathbf{A})|} f_{\mathbf{X}}\left(\mathbf{A}^{-1}\mathbf{w}\right). \tag{5} \]

Since $\mathbf{A}$ is a lower triangular matrix, $\det(\mathbf{A}) = 1$, the product of its diagonal entries. In addition, reflecting the fact that each $X_n = W_n - W_{n-1}$,

\[ \mathbf{A}^{-1} = \begin{bmatrix} 1 & & & & \\ -1 & 1 & & & \\ 0 & -1 & 1 & & \\ \vdots & \ddots & \ddots & \ddots & \\ 0 & \cdots & 0 & -1 & 1 \end{bmatrix} \quad\text{and}\quad \mathbf{A}^{-1}\mathbf{W} = \begin{bmatrix} W_1 \\ W_2 - W_1 \\ W_3 - W_2 \\ \vdots \\ W_k - W_{k-1} \end{bmatrix}. \tag{6} \]

Combining these facts with the observation that $f_{\mathbf{X}}(\mathbf{x}) = \prod_{n=1}^{k} f_{X_n}(x_n)$, we can write

\[ f_{\mathbf{W}}(\mathbf{w}) = f_{\mathbf{X}}\left(\mathbf{A}^{-1}\mathbf{w}\right) = \prod_{n=1}^{k} f_{X_n}(w_n - w_{n-1}), \tag{7} \]

with the convention $w_0 = 0$, which completes the missing steps in the proof of Theorem 10.8.

Problem 10.8.1 Solution

The discrete time autocovariance function is

\[ C_X[m, k] = E[(X_m - \mu_X)(X_{m+k} - \mu_X)]. \tag{1} \]

For $k = 0$, $C_X[m, 0] = \mathrm{Var}[X_m] = \sigma_X^2$. For $k \ne 0$, $X_m$ and $X_{m+k}$ are independent so that

\[ C_X[m, k] = E[(X_m - \mu_X)]\,E[(X_{m+k} - \mu_X)] = 0. \tag{2} \]

Thus the autocovariance of $X_n$ is

\[ C_X[m, k] = \begin{cases} \sigma_X^2 & k = 0 \\ 0 & k \ne 0 \end{cases} \tag{3} \]

Problem 10.8.2 Solution

Recall that $X(t) = t - W$ where $E[W] = 1$ and $E[W^2] = 2$.

(a) The mean is $\mu_X(t) = E[t - W] = t - E[W] = t - 1$.

(b) The autocovariance is

\[ C_X(t, \tau) = E[X(t)X(t+\tau)] - \mu_X(t)\mu_X(t+\tau) \tag{1} \]
\[ = E[(t - W)(t + \tau - W)] - (t - 1)(t + \tau - 1) \tag{2} \]
\[ = t(t+\tau) - E[(2t+\tau)W] + E[W^2] - t(t+\tau) + 2t + \tau - 1 \tag{3} \]
\[ = -(2t+\tau)E[W] + 2 + 2t + \tau - 1 \tag{4} \]
\[ = 1. \tag{5} \]
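The matrix facts used in Problem 10.7.4 can be checked numerically for a small case. This is a sketch for k = 4 (the value of k is an arbitrary choice for illustration): it confirms that the lower triangular matrix of ones has determinant 1 and that its inverse is the first-difference matrix.

```matlab
% Check of A and A^(-1) from Problem 10.7.4 for k = 4 (k chosen for illustration).
k = 4;
A = tril(ones(k));                       % W = A*X, running sums of the increments
Ainv = inv(A);                           % should map W back to the increments
D = eye(k) - diag(ones(k-1,1),-1);       % first-difference matrix: 1 on diagonal, -1 below
disp(det(A));                            % product of the diagonal entries
disp(max(abs(Ainv(:)-D(:))));            % round-off level
```

Applying D to a vector w returns w(1), w(2)-w(1), and so on, which is exactly the change of variables used in the PDF of W.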
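Problem 10.8.3 Solution

In this problem, the daily temperature process results from

\[ C_n = 16\left[1 - \cos\frac{2\pi n}{365}\right] + 4X_n \tag{1} \]

where $X_n$ is an iid random sequence of $N[0, 1]$ random variables. The hardest part of this problem is distinguishing between the process $C_n$ and the covariance function $C_C[k]$.

(a) The expected value of the process is

\[ E[C_n] = 16E\left[1 - \cos\frac{2\pi n}{365}\right] + 4E[X_n] = 16\left[1 - \cos\frac{2\pi n}{365}\right]. \tag{2} \]

(b) The autocovariance of $C_n$ is

\[ C_C[m, k] = E\left[\left(C_m - 16\left[1 - \cos\frac{2\pi m}{365}\right]\right)\left(C_{m+k} - 16\left[1 - \cos\frac{2\pi(m+k)}{365}\right]\right)\right] \tag{3} \]
\[ = 16E[X_mX_{m+k}] = \begin{cases} 16 & k = 0 \\ 0 & \text{otherwise} \end{cases} \tag{4} \]

(c) A model of this type may be able to capture the mean and variance of the daily temperature. However, one reason this model is overly simple is because day to day temperatures are uncorrelated. A more realistic model might incorporate the effects of "heat waves" or "cold spells" through correlated daily temperatures.

Problem 10.8.4 Solution

By repeated application of the recursion $C_n = C_{n-1}/2 + 4X_n$, we obtain

\[ C_n = \frac{C_{n-2}}{4} + 4\left[\frac{X_{n-1}}{2} + X_n\right] \tag{1} \]
\[ = \frac{C_{n-3}}{8} + 4\left[\frac{X_{n-2}}{4} + \frac{X_{n-1}}{2} + X_n\right] \tag{2} \]
\[ \vdots \]
\[ = \frac{C_0}{2^n} + 4\left[\frac{X_1}{2^{n-1}} + \frac{X_2}{2^{n-2}} + \cdots + X_n\right] \tag{3} \]
\[ = \frac{C_0}{2^n} + 4\sum_{i=1}^{n}\frac{X_i}{2^{n-i}}. \tag{4} \]

(a) Since $C_0, X_1, X_2, \ldots$ all have zero mean,

\[ E[C_n] = \frac{E[C_0]}{2^n} + 4\sum_{i=1}^{n}\frac{E[X_i]}{2^{n-i}} = 0. \tag{5} \]

(b) The autocovariance is

\[ C_C[m, k] = E\left[\left(\frac{C_0}{2^m} + 4\sum_{i=1}^{m}\frac{X_i}{2^{m-i}}\right)\left(\frac{C_0}{2^{m+k}} + 4\sum_{j=1}^{m+k}\frac{X_j}{2^{m+k-j}}\right)\right]. \tag{6} \]

Since $C_0, X_1, X_2, \ldots$ are independent (and zero mean), $E[C_0X_i] = 0$. This implies

\[ C_C[m, k] = \frac{E[C_0^2]}{2^{2m+k}} + 16\sum_{i=1}^{m}\sum_{j=1}^{m+k}\frac{E[X_iX_j]}{2^{m-i}2^{m+k-j}}. \tag{7} \]

For $i \ne j$, $E[X_iX_j] = 0$ so that only the $i = j$ terms make any contribution to the double sum. However, at this point, we must consider the cases $k \ge 0$ and $k < 0$ separately. Since each $X_i$ has variance 1, the autocovariance for $k \ge 0$ is

\[ C_C[m, k] = \frac{1}{2^{2m+k}} + 16\sum_{i=1}^{m}\frac{1}{2^{2m+k-2i}} \tag{8} \]
\[ = \frac{1}{2^{2m+k}} + \frac{16}{2^k}\sum_{i=1}^{m}(1/4)^{m-i} \tag{9} \]
\[ = \frac{1}{2^{2m+k}} + \frac{16}{2^k}\,\frac{1 - (1/4)^m}{3/4}. \tag{10} \]

For $k < 0$, we can write

\[ C_C[m, k] = \frac{E[C_0^2]}{2^{2m+k}} + 16\sum_{i=1}^{m}\sum_{j=1}^{m+k}\frac{E[X_iX_j]}{2^{m-i}2^{m+k-j}} \tag{11} \]
\[ = \frac{1}{2^{2m+k}} + 16\sum_{i=1}^{m+k}\frac{1}{2^{2m+k-2i}} \tag{12} \]
\[ = \frac{1}{2^{2m+k}} + \frac{16}{2^{-k}}\sum_{i=1}^{m+k}(1/4)^{m+k-i} \tag{13} \]
\[ = \frac{1}{2^{2m+k}} + \frac{16}{2^{-k}}\,\frac{1 - (1/4)^{m+k}}{3/4}. \tag{14} \]

A general expression that's valid for all $m$ and $k$ is

\[ C_C[m, k] = \frac{1}{2^{2m+k}} + \frac{16}{2^{|k|}}\,\frac{1 - (1/4)^{\min(m,m+k)}}{3/4}. \tag{15} \]

(c) Since $E[C_i] = 0$ for all $i$, our model has a mean daily temperature of zero degrees Celsius for the entire year. This is not a reasonable model for a year.

(d) For the month of January, a mean temperature of zero degrees Celsius seems quite reasonable. We can calculate the variance of $C_n$ by evaluating the covariance at $n = m$. This yields

\[ \mathrm{Var}[C_n] = \frac{1}{4^n} + \frac{16}{4^n}\,\frac{4(4^n - 1)}{3}. \tag{16} \]

Note that the variance is upper bounded by

\[ \mathrm{Var}[C_n] \le 64/3. \tag{17} \]

Hence the daily temperature has a standard deviation of $8/\sqrt{3} \approx 4.6$ degrees. Without actual evidence of daily temperatures in January, this model is more difficult to discredit.

Problem 10.8.5 Solution

This derivation of the Poisson process covariance is almost identical to the derivation of the Brownian motion autocovariance since both rely on the use of independent increments. From the definition of the Poisson process, we know that $\mu_N(t) = \lambda t$. When $\tau \ge 0$, we can write

\[ C_N(t, \tau) = E[N(t)N(t+\tau)] - (\lambda t)[\lambda(t+\tau)] \tag{1} \]
\[ = E[N(t)[(N(t+\tau) - N(t)) + N(t)]] - \lambda^2 t(t+\tau) \tag{2} \]
\[ = E[N(t)[N(t+\tau) - N(t)]] + E[N^2(t)] - \lambda^2 t(t+\tau). \tag{3} \]

By the definition of the Poisson process, $N(t+\tau) - N(t)$ is the number of arrivals in the interval $[t, t+\tau)$ and is independent of $N(t)$ for $\tau > 0$. This implies

\[ E[N(t)[N(t+\tau) - N(t)]] = E[N(t)]\,E[N(t+\tau) - N(t)] = \lambda t[\lambda(t+\tau) - \lambda t]. \tag{4} \]

Note that since $N(t)$ is a Poisson random variable, $\mathrm{Var}[N(t)] = \lambda t$. Hence

\[ E[N^2(t)] = \mathrm{Var}[N(t)] + (E[N(t)])^2 = \lambda t + (\lambda t)^2. \tag{5} \]

Therefore, for $\tau \ge 0$,

\[ C_N(t, \tau) = \lambda t[\lambda(t+\tau) - \lambda t] + \lambda t + (\lambda t)^2 - \lambda^2 t(t+\tau) = \lambda t. \tag{6} \]

If $\tau < 0$, then we can interchange the labels $t$ and $t+\tau$ in the above steps to show $C_N(t, \tau) = \lambda(t+\tau)$. For arbitrary $t$ and $\tau$, we can combine these facts to write

\[ C_N(t, \tau) = \lambda\min(t, t+\tau). \tag{7} \]

Problem 10.9.1 Solution

For an arbitrary set of samples $Y(t_1), \ldots, Y(t_k)$, we observe that $Y(t_j) = X(t_j + a)$. This implies

\[ f_{Y(t_1),\ldots,Y(t_k)}(y_1, \ldots, y_k) = f_{X(t_1+a),\ldots,X(t_k+a)}(y_1, \ldots, y_k). \tag{1} \]

Thus,

\[ f_{Y(t_1+\tau),\ldots,Y(t_k+\tau)}(y_1, \ldots, y_k) = f_{X(t_1+\tau+a),\ldots,X(t_k+\tau+a)}(y_1, \ldots, y_k). \tag{2} \]

Since $X(t)$ is a stationary process,

\[ f_{X(t_1+\tau+a),\ldots,X(t_k+\tau+a)}(y_1, \ldots, y_k) = f_{X(t_1+a),\ldots,X(t_k+a)}(y_1, \ldots, y_k). \tag{3} \]

This implies

\[ f_{Y(t_1+\tau),\ldots,Y(t_k+\tau)}(y_1, \ldots, y_k) = f_{X(t_1+a),\ldots,X(t_k+a)}(y_1, \ldots, y_k) \tag{4} \]
\[ = f_{Y(t_1),\ldots,Y(t_k)}(y_1, \ldots, y_k). \tag{5} \]

We can conclude that $Y(t)$ is a stationary process.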
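The closed-form autocovariance of Problem 10.8.4 can be compared against a direct evaluation of the sum it came from. The sketch below assumes, as the solution does, that $E[C_0^2] = 1$ and that each $X_i$ has variance 1; the handle names direct and closed are illustrative.

```matlab
% Compare the closed form of C_C[m,k] with the direct i = j sum (Problem 10.8.4).
% Assumes E[C0^2] = 1 and Var[Xi] = 1; requires min(m,m+k) >= 1.
idx    = @(m,k) 1:min(m,m+k);                                   % surviving terms of the double sum
direct = @(m,k) 2^(-(2*m+k)) + 16*sum(2.^(-((m-idx(m,k)) + (m+k-idx(m,k)))));
closed = @(m,k) 2^(-(2*m+k)) + (16/2^abs(k))*(1-(1/4)^min(m,m+k))/(3/4);
disp([direct(3,2)  closed(3,2)]);     % a case with k >= 0
disp([direct(5,-2) closed(5,-2)]);    % a case with k < 0
disp([closed(10,0) 64/3]);            % variance at n = 10 against the bound 64/3
```

Both test cases evaluate to 5.25390625, and the variance at n = 10 sits just below 64/3, consistent with the bound in part (d).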
Problem 10.9.2 Solution

For an arbitrary set of samples $Y(t_1), \ldots, Y(t_k)$, we observe that $Y(t_j) = X(at_j)$. This implies

\[ f_{Y(t_1),\ldots,Y(t_k)}(y_1, \ldots, y_k) = f_{X(at_1),\ldots,X(at_k)}(y_1, \ldots, y_k). \tag{1} \]

Thus,

\[ f_{Y(t_1+\tau),\ldots,Y(t_k+\tau)}(y_1, \ldots, y_k) = f_{X(at_1+a\tau),\ldots,X(at_k+a\tau)}(y_1, \ldots, y_k). \tag{2} \]

We see that a time offset of $\tau$ for the $Y(t)$ process corresponds to an offset of time $\tau' = a\tau$ for the $X(t)$ process. Since $X(t)$ is a stationary process,

\[ f_{Y(t_1+\tau),\ldots,Y(t_k+\tau)}(y_1, \ldots, y_k) = f_{X(at_1+\tau'),\ldots,X(at_k+\tau')}(y_1, \ldots, y_k) \tag{3} \]
\[ = f_{X(at_1),\ldots,X(at_k)}(y_1, \ldots, y_k) \tag{4} \]
\[ = f_{Y(t_1),\ldots,Y(t_k)}(y_1, \ldots, y_k). \tag{5} \]

We can conclude that $Y(t)$ is a stationary process.

Problem 10.9.3 Solution

For a set of time samples $n_1, \ldots, n_m$ and an offset $k$, we note that $Y_{n_i+k} = X((n_i + k)\Delta)$. This implies

\[ f_{Y_{n_1+k},\ldots,Y_{n_m+k}}(y_1, \ldots, y_m) = f_{X((n_1+k)\Delta),\ldots,X((n_m+k)\Delta)}(y_1, \ldots, y_m). \tag{1} \]

Since $X(t)$ is a stationary process,

\[ f_{X((n_1+k)\Delta),\ldots,X((n_m+k)\Delta)}(y_1, \ldots, y_m) = f_{X(n_1\Delta),\ldots,X(n_m\Delta)}(y_1, \ldots, y_m). \tag{2} \]

Since $X(n_i\Delta) = Y_{n_i}$, we see that

\[ f_{Y_{n_1+k},\ldots,Y_{n_m+k}}(y_1, \ldots, y_m) = f_{Y_{n_1},\ldots,Y_{n_m}}(y_1, \ldots, y_m). \tag{3} \]

Hence $Y_n$ is a stationary random sequence.

Problem 10.9.4 Solution

Since $Y_n = X_{kn}$,

\[ f_{Y_{n_1+l},\ldots,Y_{n_m+l}}(y_1, \ldots, y_m) = f_{X_{kn_1+kl},\ldots,X_{kn_m+kl}}(y_1, \ldots, y_m). \tag{1} \]

Stationarity of the $X_n$ process implies

\[ f_{X_{kn_1+kl},\ldots,X_{kn_m+kl}}(y_1, \ldots, y_m) = f_{X_{kn_1},\ldots,X_{kn_m}}(y_1, \ldots, y_m) \tag{2} \]
\[ = f_{Y_{n_1},\ldots,Y_{n_m}}(y_1, \ldots, y_m). \tag{3} \]

We combine these steps to write

\[ f_{Y_{n_1+l},\ldots,Y_{n_m+l}}(y_1, \ldots, y_m) = f_{Y_{n_1},\ldots,Y_{n_m}}(y_1, \ldots, y_m). \tag{4} \]

Thus $Y_n$ is a stationary process.

Comment: The first printing of the text asks whether $Y_n$ is wide sense stationary if $X_n$ is wide sense stationary. This fact is also true; however, since wide sense stationarity isn't addressed until the next section, the problem was corrected to ask about stationarity.
Problem 10.9.5 Solution

Given $A = a$, $Y(t) = aX(t)$, which is a special case of $Y(t) = aX(t) + b$ given in Theorem 10.10. Applying the result of Theorem 10.10 with $b = 0$ yields
\[ f_{Y(t_1),\ldots,Y(t_n)|A}(y_1,\ldots,y_n|a) = \frac{1}{a^n} f_{X(t_1),\ldots,X(t_n)}\!\left(\frac{y_1}{a},\ldots,\frac{y_n}{a}\right) \tag{1} \]
Integrating over the PDF $f_A(a)$ yields
\begin{align}
f_{Y(t_1),\ldots,Y(t_n)}(y_1,\ldots,y_n) &= \int_0^\infty f_{Y(t_1),\ldots,Y(t_n)|A}(y_1,\ldots,y_n|a)\, f_A(a)\, da \tag{2}\\
&= \int_0^\infty \frac{1}{a^n} f_{X(t_1),\ldots,X(t_n)}\!\left(\frac{y_1}{a},\ldots,\frac{y_n}{a}\right) f_A(a)\, da \tag{3}
\end{align}
This complicated expression can be used to find the joint PDF of $Y(t_1+\tau),\ldots,Y(t_n+\tau)$:
\[ f_{Y(t_1+\tau),\ldots,Y(t_n+\tau)}(y_1,\ldots,y_n) = \int_0^\infty \frac{1}{a^n} f_{X(t_1+\tau),\ldots,X(t_n+\tau)}\!\left(\frac{y_1}{a},\ldots,\frac{y_n}{a}\right) f_A(a)\, da \tag{4} \]
Since $X(t)$ is a stationary process, the joint PDF of $X(t_1+\tau),\ldots,X(t_n+\tau)$ is the same as the joint PDF of $X(t_1),\ldots,X(t_n)$. Thus
\begin{align}
f_{Y(t_1+\tau),\ldots,Y(t_n+\tau)}(y_1,\ldots,y_n) &= \int_0^\infty \frac{1}{a^n} f_{X(t_1+\tau),\ldots,X(t_n+\tau)}\!\left(\frac{y_1}{a},\ldots,\frac{y_n}{a}\right) f_A(a)\, da \tag{5}\\
&= \int_0^\infty \frac{1}{a^n} f_{X(t_1),\ldots,X(t_n)}\!\left(\frac{y_1}{a},\ldots,\frac{y_n}{a}\right) f_A(a)\, da \tag{6}\\
&= f_{Y(t_1),\ldots,Y(t_n)}(y_1,\ldots,y_n) \tag{7}
\end{align}
We can conclude that $Y(t)$ is a stationary process.

Problem 10.9.6 Solution

Since $g(\cdot)$ is an unspecified function, we will work with the joint CDF of $Y(t_1+\tau),\ldots,Y(t_n+\tau)$. To show $Y(t)$ is a stationary process, we will show that for all $\tau$,
\[ F_{Y(t_1+\tau),\ldots,Y(t_n+\tau)}(y_1,\ldots,y_n) = F_{Y(t_1),\ldots,Y(t_n)}(y_1,\ldots,y_n) \tag{1} \]
By taking partial derivatives with respect to $y_1,\ldots,y_n$, it should be apparent that this implies that the joint PDF $f_{Y(t_1+\tau),\ldots,Y(t_n+\tau)}(y_1,\ldots,y_n)$ will not depend on $\tau$. To proceed, we write
\begin{align}
F_{Y(t_1+\tau),\ldots,Y(t_n+\tau)}(y_1,\ldots,y_n) &= P[Y(t_1+\tau) \le y_1,\ldots,Y(t_n+\tau) \le y_n] \tag{2}\\
&= P[\underbrace{g(X(t_1+\tau)) \le y_1,\ldots,g(X(t_n+\tau)) \le y_n}_{A_\tau}] \tag{3}
\end{align}
In principle, we can calculate $P[A_\tau]$ by integrating $f_{X(t_1+\tau),\ldots,X(t_n+\tau)}(x_1,\ldots,x_n)$ over the region corresponding to event $A_\tau$. Since $X(t)$ is a stationary process,
\[ f_{X(t_1+\tau),\ldots,X(t_n+\tau)}(x_1,\ldots,x_n) = f_{X(t_1),\ldots,X(t_n)}(x_1,\ldots,x_n) \tag{4} \]
This implies $P[A_\tau]$ does not depend on $\tau$. In particular,
\begin{align}
F_{Y(t_1+\tau),\ldots,Y(t_n+\tau)}(y_1,\ldots,y_n) &= P[A_\tau] \tag{5}\\
&= P[g(X(t_1)) \le y_1,\ldots,g(X(t_n)) \le y_n] \tag{6}\\
&= F_{Y(t_1),\ldots,Y(t_n)}(y_1,\ldots,y_n) \tag{7}
\end{align}
Problem 10.10.1 Solution

The autocorrelation function $R_X(\tau) = \delta(\tau)$ is mathematically valid in the sense that it meets the conditions required in Theorem 10.12. That is,
\begin{align}
R_X(\tau) &= \delta(\tau) \ge 0 \tag{1}\\
R_X(\tau) &= \delta(\tau) = \delta(-\tau) = R_X(-\tau) \tag{2}\\
R_X(\tau) &\le R_X(0) = \delta(0) \tag{3}
\end{align}
However, for a process $X(t)$ with the autocorrelation $R_X(\tau) = \delta(\tau)$, Definition 10.16 says that the average power of the process is
\[ E[X^2(t)] = R_X(0) = \delta(0) = \infty \tag{4} \]
Processes with infinite average power cannot exist in practice.

Problem 10.10.2 Solution

Since $Y(t) = A + X(t)$, the mean of $Y(t)$ is
\[ E[Y(t)] = E[A] + E[X(t)] = E[A] + \mu_X \tag{1} \]
The autocorrelation of $Y(t)$ is
\begin{align}
R_Y(t,\tau) &= E[(A + X(t))(A + X(t+\tau))] \tag{2}\\
&= E[A^2] + E[A]E[X(t)] + E[A]E[X(t+\tau)] + E[X(t)X(t+\tau)] \tag{3}\\
&= E[A^2] + 2E[A]\mu_X + R_X(\tau) \tag{4}
\end{align}
We see that neither $E[Y(t)]$ nor $R_Y(t,\tau)$ depends on $t$. Thus $Y(t)$ is a wide sense stationary process.

Problem 10.10.3 Solution

In this problem, we find the autocorrelation $R_W(t,\tau)$ when
\[ W(t) = X\cos 2\pi f_0 t + Y\sin 2\pi f_0 t, \tag{1} \]
and $X$ and $Y$ are uncorrelated random variables with $E[X] = E[Y] = 0$. We start by writing
\begin{align}
R_W(t,\tau) &= E[W(t)W(t+\tau)] \tag{2}\\
&= E[(X\cos 2\pi f_0 t + Y\sin 2\pi f_0 t)(X\cos 2\pi f_0(t+\tau) + Y\sin 2\pi f_0(t+\tau))] \tag{3}
\end{align}
Since $X$ and $Y$ are uncorrelated, $E[XY] = E[X]E[Y] = 0$. Thus, when we expand $E[W(t)W(t+\tau)]$ and take the expectation, all of the $XY$ cross terms will be zero. This implies
\[ R_W(t,\tau) = E[X^2]\cos 2\pi f_0 t \cos 2\pi f_0(t+\tau) + E[Y^2]\sin 2\pi f_0 t \sin 2\pi f_0(t+\tau) \tag{4} \]
Since $E[X] = E[Y] = 0$,
\[ E[X^2] = \mathrm{Var}[X] + (E[X])^2 = \sigma^2, \qquad E[Y^2] = \mathrm{Var}[Y] + (E[Y])^2 = \sigma^2. \tag{5} \]
In addition, from Math Fact B.2, we use the formulas
\begin{align}
\cos A\cos B &= \tfrac{1}{2}[\cos(A-B) + \cos(A+B)] \tag{6}\\
\sin A\sin B &= \tfrac{1}{2}[\cos(A-B) - \cos(A+B)] \tag{7}
\end{align}
to write
\begin{align}
R_W(t,\tau) &= \frac{\sigma^2}{2}(\cos 2\pi f_0\tau + \cos 2\pi f_0(2t+\tau)) + \frac{\sigma^2}{2}(\cos 2\pi f_0\tau - \cos 2\pi f_0(2t+\tau)) \tag{8}\\
&= \sigma^2\cos 2\pi f_0\tau \tag{9}
\end{align}
Thus $R_W(t,\tau) = R_W(\tau)$. Since
\[ E[W(t)] = E[X]\cos 2\pi f_0 t + E[Y]\sin 2\pi f_0 t = 0, \tag{10} \]
we can conclude that $W(t)$ is a wide sense stationary process. However, we note that if $E[X^2] \ne E[Y^2]$, then the $\cos 2\pi f_0(2t+\tau)$ terms in $R_W(t,\tau)$ would not cancel and $W(t)$ would not be wide sense stationary.
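The cancellation of the $\cos 2\pi f_0(2t+\tau)$ terms in Problem 10.10.3 can be checked numerically. The following Matlab sketch uses only built-in functions; the frequency f0, the offset tau and the second moments are arbitrary values assumed for illustration and do not come from the problem statement.

```matlab
% Numerical check for Problem 10.10.3 (illustrative values, all assumed).
f0  = 5;                   % frequency in Hz (assumed)
tau = 0.03;                % time offset (assumed)
t   = linspace(0,1,201);   % range of absolute times t
% R_W(t,tau) = E[X^2] cos(.)cos(.) + E[Y^2] sin(.)sin(.)
RW = @(EX2,EY2) EX2*cos(2*pi*f0*t).*cos(2*pi*f0*(t+tau)) ...
              + EY2*sin(2*pi*f0*t).*sin(2*pi*f0*(t+tau));
Req  = RW(1,1);            % equal second moments
Rneq = RW(1,2);            % unequal second moments
errEq     = max(abs(Req - cos(2*pi*f0*tau)));  % distance from sigma^2 cos(2 pi f0 tau)
spreadEq  = max(Req)  - min(Req);              % variation over t, equal case
spreadNeq = max(Rneq) - min(Rneq);             % variation over t, unequal case
```

With equal second moments the variation over t should be at the level of rounding error, while with unequal second moments it is of order one, which is the failure of wide sense stationarity noted at the end of the solution.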
Problem 10.10.4 Solution

(a) In the problem statement, we are told that $X(t)$ has average power equal to 1. By Definition 10.16, the average power of $X(t)$ is $E[X^2(t)] = 1$.

(b) Since $\Theta$ has a uniform PDF over $[0, 2\pi]$,
\[ f_\Theta(\theta) = \begin{cases} 1/(2\pi) & 0 \le \theta \le 2\pi \\ 0 & \text{otherwise} \end{cases} \tag{1} \]
The expected value of the random phase cosine is
\begin{align}
E[\cos(2\pi f_c t + \Theta)] &= \int_{-\infty}^{\infty} \cos(2\pi f_c t + \theta) f_\Theta(\theta)\, d\theta \tag{2}\\
&= \int_0^{2\pi} \cos(2\pi f_c t + \theta)\frac{1}{2\pi}\, d\theta \tag{3}\\
&= \frac{1}{2\pi}\sin(2\pi f_c t + \theta)\Big|_0^{2\pi} \tag{4}\\
&= \frac{1}{2\pi}(\sin(2\pi f_c t + 2\pi) - \sin(2\pi f_c t)) = 0 \tag{5}
\end{align}

(c) Since $X(t)$ and $\Theta$ are independent,
\[ E[Y(t)] = E[X(t)\cos(2\pi f_c t + \Theta)] = E[X(t)]\,E[\cos(2\pi f_c t + \Theta)] = 0 \tag{6} \]
Note that the mean of $Y(t)$ is zero no matter what the mean of $X(t)$ since the random phase cosine has zero mean.

(d) Independence of $X(t)$ and $\Theta$ results in the average power of $Y(t)$ being
\begin{align}
E[Y^2(t)] &= E[X^2(t)\cos^2(2\pi f_c t + \Theta)] \tag{7}\\
&= E[X^2(t)]\,E[\cos^2(2\pi f_c t + \Theta)] \tag{8}\\
&= E[\cos^2(2\pi f_c t + \Theta)] \tag{9}
\end{align}
Note that we have used the fact from part (a) that $X(t)$ has unity average power. To finish the problem, we use the trigonometric identity $\cos^2\phi = (1 + \cos 2\phi)/2$. This yields
\[ E[Y^2(t)] = E\left[\tfrac{1}{2}(1 + \cos(2\pi(2f_c)t + 2\Theta))\right] = 1/2 \tag{10} \]
Note that $E[\cos(2\pi(2f_c)t + 2\Theta)] = 0$ by the argument given in part (b) with $2f_c$ replacing $f_c$ and $2\Theta$ replacing $\Theta$: the integral over $0 \le \theta \le 2\pi$ now covers two full periods of the cosine.
Problem 10.10.5 Solution

This proof simply parallels the proof of Theorem 10.12. For the first item, $R_X[0] = R_X[m,0] = E[X_m^2]$. Since $X_m^2 \ge 0$, we must have $E[X_m^2] \ge 0$. For the second item, Definition 10.13 implies that
\[ R_X[k] = R_X[m,k] = E[X_m X_{m+k}] = E[X_{m+k}X_m] = R_X[m+k,-k] \tag{1} \]
Since $X_m$ is wide sense stationary, $R_X[m+k,-k] = R_X[-k]$. The final item requires more effort. First, we note that when $X_m$ is wide sense stationary, $\mathrm{Var}[X_m] = C_X[0]$, a constant for all $m$. Second, Theorem 4.17 says that
\[ |C_X[m,k]| \le \sigma_{X_m}\sigma_{X_{m+k}} = C_X[0]. \tag{2} \]
Note that $C_X[m,k] \le |C_X[m,k]|$, and thus it follows that
\[ C_X[m,k] \le \sigma_{X_m}\sigma_{X_{m+k}} = C_X[0]. \tag{3} \]
(This little step was unfortunately omitted from the proof of Theorem 10.12.) Now for any numbers $a$, $b$, and $c$, if $|a| \le b$ and $c \ge 0$, then $(a+c)^2 \le (b+c)^2$, since $|a + c| \le |a| + c \le b + c$. Choosing $a = C_X[m,k]$, $b = C_X[0]$, and $c = \mu_X^2$ yields
\[ \left(C_X[m,k] + \mu_X^2\right)^2 \le \left(C_X[0] + \mu_X^2\right)^2 \tag{4} \]
In the above expression, the left side equals $(R_X[k])^2$ while the right side is $(R_X[0])^2$, which proves the third part of the theorem.

Problem 10.10.6 Solution

The solution to this problem is essentially the same as the proof of Theorem 10.13 except integrals are replaced by sums. First we verify that $\overline{X}_m$ is unbiased:
\begin{align}
E[\overline{X}_m] &= \frac{1}{2m+1} E\left[\sum_{n=-m}^{m} X_n\right] \tag{1}\\
&= \frac{1}{2m+1}\sum_{n=-m}^{m} E[X_n] = \frac{1}{2m+1}\sum_{n=-m}^{m}\mu_X = \mu_X \tag{2}
\end{align}
To show consistency, it is sufficient to show that $\lim_{m\to\infty}\mathrm{Var}[\overline{X}_m] = 0$. First, we observe that $\overline{X}_m - \mu_X = \frac{1}{2m+1}\sum_{n=-m}^{m}(X_n - \mu_X)$. This implies
\begin{align}
\mathrm{Var}[\overline{X}_m] &= E\left[\left(\frac{1}{2m+1}\sum_{n=-m}^{m}(X_n - \mu_X)\right)^2\right] \tag{3}\\
&= E\left[\frac{1}{(2m+1)^2}\sum_{n=-m}^{m}(X_n - \mu_X)\sum_{n'=-m}^{m}(X_{n'} - \mu_X)\right] \tag{4}\\
&= \frac{1}{(2m+1)^2}\sum_{n=-m}^{m}\sum_{n'=-m}^{m} E[(X_n - \mu_X)(X_{n'} - \mu_X)] \tag{5}\\
&= \frac{1}{(2m+1)^2}\sum_{n=-m}^{m}\sum_{n'=-m}^{m} C_X[n'-n]. \tag{6}
\end{align}
We note that
\begin{align}
\sum_{n'=-m}^{m} C_X[n'-n] &\le \sum_{n'=-m}^{m} |C_X[n'-n]| \tag{7}\\
&\le \sum_{n'=-\infty}^{\infty} |C_X[n'-n]| = \sum_{k=-\infty}^{\infty} |C_X(k)| < \infty. \tag{8}
\end{align}
Hence there exists a constant $K$ such that
\[ \mathrm{Var}[\overline{X}_m] \le \frac{1}{(2m+1)^2}\sum_{n=-m}^{m} K = \frac{K}{2m+1}. \tag{9} \]
Thus $\lim_{m\to\infty}\mathrm{Var}[\overline{X}_m] \le \lim_{m\to\infty}\frac{K}{2m+1} = 0$.
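Problem 10.11.1 Solution

(a) Since $X(t)$ and $Y(t)$ are independent processes,
\[ E[W(t)] = E[X(t)Y(t)] = E[X(t)]E[Y(t)] = \mu_X\mu_Y. \tag{1} \]
In addition,
\begin{align}
R_W(t,\tau) &= E[W(t)W(t+\tau)] \tag{2}\\
&= E[X(t)Y(t)X(t+\tau)Y(t+\tau)] \tag{3}\\
&= E[X(t)X(t+\tau)]\,E[Y(t)Y(t+\tau)] \tag{4}\\
&= R_X(\tau)R_Y(\tau) \tag{5}
\end{align}
We can conclude that $W(t)$ is wide sense stationary.

(b) To examine whether $X(t)$ and $W(t)$ are jointly wide sense stationary, we calculate
\[ R_{WX}(t,\tau) = E[W(t)X(t+\tau)] = E[X(t)Y(t)X(t+\tau)]. \tag{6} \]
By independence of $X(t)$ and $Y(t)$,
\[ R_{WX}(t,\tau) = E[X(t)X(t+\tau)]\,E[Y(t)] = \mu_Y R_X(\tau). \tag{7} \]
Since $W(t)$ and $X(t)$ are both wide sense stationary and since $R_{WX}(t,\tau)$ depends only on the time difference $\tau$, we can conclude from Definition 10.18 that $W(t)$ and $X(t)$ are jointly wide sense stationary.

Problem 10.11.2 Solution

To show that $X(t)$ and $X_i(t)$ are jointly wide sense stationary, we must first show that $X_i(t)$ is wide sense stationary and then we must show that the cross correlation $R_{XX_i}(t,\tau)$ is only a function of the time difference $\tau$. For each $X_i(t)$, we have to check whether these facts are implied by the fact that $X(t)$ is wide sense stationary.

(a) Since $E[X_1(t)] = E[X(t+a)] = \mu_X$ and
\begin{align}
R_{X_1}(t,\tau) &= E[X_1(t)X_1(t+\tau)] \tag{1}\\
&= E[X(t+a)X(t+\tau+a)] \tag{2}\\
&= R_X(\tau), \tag{3}
\end{align}
we have verified that $X_1(t)$ is wide sense stationary. Now we calculate the cross correlation
\begin{align}
R_{XX_1}(t,\tau) &= E[X(t)X_1(t+\tau)] \tag{4}\\
&= E[X(t)X(t+\tau+a)] \tag{5}\\
&= R_X(\tau+a). \tag{6}
\end{align}
Since $R_{XX_1}(t,\tau)$ depends on the time difference $\tau$ but not on the absolute time $t$, we conclude that $X(t)$ and $X_1(t)$ are jointly wide sense stationary.

(b) Since $E[X_2(t)] = E[X(at)] = \mu_X$ and
\begin{align}
R_{X_2}(t,\tau) &= E[X_2(t)X_2(t+\tau)] \tag{7}\\
&= E[X(at)X(a(t+\tau))] \tag{8}\\
&= E[X(at)X(at+a\tau)] = R_X(a\tau), \tag{9}
\end{align}
we have verified that $X_2(t)$ is wide sense stationary. Now we calculate the cross correlation
\begin{align}
R_{XX_2}(t,\tau) &= E[X(t)X_2(t+\tau)] \tag{10}\\
&= E[X(t)X(a(t+\tau))] \tag{11}\\
&= R_X((a-1)t + a\tau). \tag{12}
\end{align}
Except for the trivial case when $a = 1$ and $X_2(t) = X(t)$, $R_{XX_2}(t,\tau)$ depends on both the absolute time $t$ and the time difference $\tau$, so we conclude that $X(t)$ and $X_2(t)$ are not jointly wide sense stationary.

Problem 10.11.3 Solution

(a) $Y(t)$ has autocorrelation function
\begin{align}
R_Y(t,\tau) &= E[Y(t)Y(t+\tau)] \tag{1}\\
&= E[X(t-t_0)X(t+\tau-t_0)] \tag{2}\\
&= R_X(\tau). \tag{3}
\end{align}

(b) The cross correlation of $X(t)$ and $Y(t)$ is
\begin{align}
R_{XY}(t,\tau) &= E[X(t)Y(t+\tau)] \tag{4}\\
&= E[X(t)X(t+\tau-t_0)] \tag{5}\\
&= R_X(\tau - t_0). \tag{6}
\end{align}

(c) We have already verified that $R_Y(t,\tau)$ depends only on the time difference $\tau$. Since $E[Y(t)] = E[X(t-t_0)] = \mu_X$, we have verified that $Y(t)$ is wide sense stationary.

(d) Since $X(t)$ and $Y(t)$ are wide sense stationary and since we have shown that $R_{XY}(t,\tau)$ depends only on $\tau$, we know that $X(t)$ and $Y(t)$ are jointly wide sense stationary.

Comment: This problem is badly designed since the conclusions don't depend on the specific $R_X(\tau)$ given in the problem text. (Sorry about that!)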
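Problem 10.12.1 Solution

Writing $Y(t+\tau) = \int_0^{t+\tau} N(v)\, dv$ permits us to write the autocorrelation of $Y(t)$ as
\begin{align}
R_Y(t,\tau) = E[Y(t)Y(t+\tau)] &= E\left[\int_0^t\int_0^{t+\tau} N(u)N(v)\, dv\, du\right] \tag{1}\\
&= \int_0^t\int_0^{t+\tau} E[N(u)N(v)]\, dv\, du \tag{2}\\
&= \int_0^t\int_0^{t+\tau} \alpha\delta(u-v)\, dv\, du. \tag{3}
\end{align}
At this point, it matters whether $\tau \ge 0$ or $\tau < 0$. When $\tau \ge 0$, then $v$ ranges from 0 to $t+\tau$ and at some point in the integral over $v$ we will have $v = u$. That is, when $\tau \ge 0$,
\[ R_Y(t,\tau) = \int_0^t \alpha\, du = \alpha t. \tag{4} \]
When $\tau < 0$, then we must reverse the order of integration. In this case, when the inner integral is over $u$, we will have $u = v$ at some point so that
\[ R_Y(t,\tau) = \int_0^{t+\tau}\int_0^t \alpha\delta(u-v)\, du\, dv = \int_0^{t+\tau}\alpha\, dv = \alpha(t+\tau). \tag{5} \]
Thus we see the autocorrelation of the output is
\[ R_Y(t,\tau) = \alpha\min\{t, t+\tau\} \tag{6} \]
Perhaps surprisingly, $R_Y(t,\tau)$ is what we found in Example 10.19 to be the autocorrelation of a Brownian motion process. In fact, Brownian motion is the integral of the white noise process.

Problem 10.12.2 Solution

Let $\mu_i = E[X(t_i)]$.

(a) Since $C_X(t_1, t_2 - t_1) = \rho\sigma_1\sigma_2$, the covariance matrix is
\[ \mathbf{C} = \begin{bmatrix} C_X(t_1,0) & C_X(t_1,t_2-t_1) \\ C_X(t_2,t_1-t_2) & C_X(t_2,0) \end{bmatrix} = \begin{bmatrix} \sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2 \end{bmatrix} \tag{1} \]
Since $\mathbf{C}$ is a $2\times 2$ matrix, it has determinant $|\mathbf{C}| = \sigma_1^2\sigma_2^2(1-\rho^2)$.

(b) It is easy to verify that
\[ \mathbf{C}^{-1} = \frac{1}{1-\rho^2}\begin{bmatrix} \dfrac{1}{\sigma_1^2} & \dfrac{-\rho}{\sigma_1\sigma_2} \\ \dfrac{-\rho}{\sigma_1\sigma_2} & \dfrac{1}{\sigma_2^2} \end{bmatrix} \tag{2} \]

(c) The general form of the multivariate density for $X(t_1), X(t_2)$ is
\[ f_{X(t_1),X(t_2)}(x_1,x_2) = \frac{1}{(2\pi)^{k/2}|\mathbf{C}|^{1/2}}\, e^{-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu}_X)'\mathbf{C}^{-1}(\mathbf{x}-\boldsymbol{\mu}_X)} \tag{3} \]
where $k = 2$, $\mathbf{x} = [x_1\ x_2]'$ and $\boldsymbol{\mu}_X = [\mu_1\ \mu_2]'$. Hence,
\[ \frac{1}{(2\pi)^{k/2}|\mathbf{C}|^{1/2}} = \frac{1}{2\pi\sqrt{\sigma_1^2\sigma_2^2(1-\rho^2)}}. \tag{4} \]
Furthermore, the exponent is
\begin{align}
-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu}_X)'\mathbf{C}^{-1}(\mathbf{x}-\boldsymbol{\mu}_X) &= -\frac{1}{2}\begin{bmatrix} x_1-\mu_1 & x_2-\mu_2 \end{bmatrix}\frac{1}{1-\rho^2}\begin{bmatrix} \dfrac{1}{\sigma_1^2} & \dfrac{-\rho}{\sigma_1\sigma_2} \\ \dfrac{-\rho}{\sigma_1\sigma_2} & \dfrac{1}{\sigma_2^2} \end{bmatrix}\begin{bmatrix} x_1-\mu_1 \\ x_2-\mu_2 \end{bmatrix} \tag{5}\\
&= -\frac{\left(\dfrac{x_1-\mu_1}{\sigma_1}\right)^2 - \dfrac{2\rho(x_1-\mu_1)(x_2-\mu_2)}{\sigma_1\sigma_2} + \left(\dfrac{x_2-\mu_2}{\sigma_2}\right)^2}{2(1-\rho^2)} \tag{6}
\end{align}
Plugging in each piece into the joint PDF $f_{X(t_1),X(t_2)}(x_1,x_2)$ given above, we obtain the bivariate Gaussian PDF.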
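The determinant and inverse quoted in parts (a) and (b) of Problem 10.12.2 are easy to confirm numerically. The Matlab sketch below uses only built-in functions; the values chosen for sigma_1, sigma_2 and rho are assumptions made purely for illustration.

```matlab
% Check of Problem 10.12.2 (a)-(b) for illustrative parameter values (assumed).
s1 = 2; s2 = 3; rho = 0.4;
C = [s1^2,      rho*s1*s2; ...
     rho*s1*s2, s2^2];                       % covariance matrix
Cinv = (1/(1-rho^2)) * [1/s1^2,        -rho/(s1*s2); ...
                        -rho/(s1*s2),  1/s2^2];   % closed-form inverse
detErr = abs(det(C) - s1^2*s2^2*(1-rho^2));  % determinant formula error
invErr = max(max(abs(C*Cinv - eye(2))));     % C*Cinv should be the identity
```

Both error terms should be at the level of floating point rounding for any valid choice with $|\rho| < 1$.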
Problem 10.12.3 Solution

Let $\mathbf{W} = [W(t_1)\ W(t_2)\ \cdots\ W(t_n)]'$ denote a vector of samples of a Brownian motion process. To prove that $W(t)$ is a Gaussian random process, we must show that $\mathbf{W}$ is a Gaussian random vector. To do so, let
\begin{align}
\mathbf{X} &= [X_1\ \cdots\ X_n]' \tag{1}\\
&= [W(t_1)\ \ W(t_2)-W(t_1)\ \ W(t_3)-W(t_2)\ \cdots\ W(t_n)-W(t_{n-1})]' \tag{2}
\end{align}
denote the vector of increments. By the definition of Brownian motion, $X_1,\ldots,X_n$ is a sequence of independent Gaussian random variables. Thus $\mathbf{X}$ is a Gaussian random vector. Finally,
\[ \mathbf{W} = \begin{bmatrix} W_1 \\ W_2 \\ \vdots \\ W_n \end{bmatrix} = \begin{bmatrix} X_1 \\ X_1 + X_2 \\ \vdots \\ X_1 + \cdots + X_n \end{bmatrix} = \underbrace{\begin{bmatrix} 1 & & & \\ 1 & 1 & & \\ \vdots & & \ddots & \\ 1 & \cdots & \cdots & 1 \end{bmatrix}}_{\mathbf{A}} \mathbf{X}. \tag{3} \]
Since $\mathbf{X}$ is a Gaussian random vector and $\mathbf{W} = \mathbf{A}\mathbf{X}$ with $\mathbf{A}$ a rank $n$ matrix, Theorem 5.16 implies that $\mathbf{W}$ is a Gaussian random vector.

Problem 10.13.1 Solution

From the instructions given in the problem, the program noisycosine.m will generate the four plots.

    n=1000; t=0.001*(-n:n);
    w=gaussrv(0,0.01,(2*n)+1);
    %Continuous Time, Continuous Value
    xcc=2*cos(2*pi*t) + w';
    plot(t,xcc);
    xlabel('\it t');ylabel('\it X_{cc}(t)');
    axis([-1 1 -3 3]);
    figure; %Continuous Time, Discrete Value
    xcd=round(xcc); plot(t,xcd);
    xlabel('\it t');ylabel('\it X_{cd}(t)');
    axis([-1 1 -3 3]);
    figure; %Discrete time, Continuous Value
    ts=subsample(t,100); xdc=subsample(xcc,100);
    plot(ts,xdc,'b.');
    xlabel('\it t');ylabel('\it X_{dc}(t)');
    axis([-1 1 -3 3]);
    figure; %Discrete Time, Discrete Value
    xdd=subsample(xcd,100); plot(ts,xdd,'b.');
    xlabel('\it t');ylabel('\it X_{dd}(t)');
    axis([-1 1 -3 3]);
In noisycosine.m, we use a function subsample.m to obtain the discrete time sample functions. In fact, subsample is hardly necessary since it's such a simple one-line Matlab function:

    function y=subsample(x,n)
    %input x(1), x(2) ...
    %output y(1)=x(1), y(2)=x(1+n), y(3)=x(2n+1)
    y=x(1:n:length(x));

However, we use it just to make noisycosine.m a little more clear.

Problem 10.13.2 Solution

    >> t=(1:600)';
    >> M=simswitch(10,0.1,t);
    >> Mavg=cumsum(M)./t;
    >> plot(t,M,t,Mavg);

These commands will simulate the switch for 600 minutes, producing the vector M of samples of $M(t)$ each minute, the vector Mavg which is the sequence of time average estimates, and a plot resembling this one:

[Figure: M(t) and the time average of M(t) versus t, for 0 ≤ t ≤ 600.]

From the figure, it appears that the time average is converging to a value in the neighborhood of 100. In particular, because the switch is initially empty with $M(0) = 0$, it takes a few hundred minutes for the time average to climb to something close to 100. Following the problem instructions, we can write the following short program to examine ten simulation runs:

    function Mavg=simswitchavg(T,k)
    %Usage: Mavg=simswitchavg(T,k)
    %simulate k runs of duration T of the
    %telephone switch in Chapter 10
    %and plot the time average of each run
    t=(1:T)';
    %each column of Mavg is a time average sample run
    Mavg=zeros(T,k);
    for n=1:k,
        M=simswitch(10,0.1,t);
        Mavg(:,n)=cumsum(M)./t;
    end
    plot(t,Mavg);

The command simswitchavg(600,10) produced this graph:

[Figure: ten sample runs of the time average of M(t) versus t, for 0 ≤ t ≤ 600.]
From the graph, one can see that even after $T = 600$ minutes, each sample run produces a time average $\overline{M}_{600}$ around 100. Note that in Chapter 12, we will be able to use Markov chains to prove that the expected number of calls in the switch is in fact 100. However, note that even if $T$ is large, $\overline{M}_T$ is still a random variable. From the above plot, one might guess that $\overline{M}_{600}$ has a standard deviation of perhaps $\sigma = 2$ or $\sigma = 3$. An exact calculation of the variance of $\overline{M}_{600}$ is fairly difficult because it is a sum of dependent random variables, each of which has a PDF that is in itself reasonably difficult to calculate.

Problem 10.13.3 Solution

In this problem, our goal is to find out the average number of ongoing calls in the switch. Before we use the approach of Problem 10.13.2, it's worth a moment to consider the physical situation. In particular, calls arrive as a Poisson process of rate $\lambda = 100$ calls/minute and each call has duration of exactly one minute. As a result, if we inspect the system at an arbitrary time $t$ at least one minute past initialization, the number of calls at the switch will be exactly the number of calls $N_1$ that arrived in the previous minute. Since calls arrive as a Poisson process of rate $\lambda = 100$ calls/minute, $N_1$ is a Poisson random variable with $E[N_1] = 100$. In fact, this should be true for every inspection time $t$. Hence it should be surprising if we compute the time average and find the time average number in the queue to be something other than 100.

To check out this quickie analysis, we use the method of Problem 10.13.2. However, unlike Problem 10.13.2, we cannot directly use the function simswitch.m because the call durations are no longer exponential random variables. Instead, we must modify simswitch.m for the deterministic one minute call durations, yielding the function simswitchd.m:

    function M=simswitchd(lambda,T,t)
    %Poisson arrivals, rate lambda
    %Deterministic (T) call duration
    %For vector t of times
    %M(i) = no. of calls at time t(i)
    s=poissonarrivals(lambda,max(t));
    y=s+T;
    A=countup(s,t);
    D=countup(y,t);
    M=A-D;

Note that if you compare simswitch.m in the text with simswitchd.m here, two changes occurred. The first is that the exponential call durations are replaced by the deterministic time T. The other change is that count(s,t) is replaced by countup(s,t). In fact, n=countup(x,y) does exactly the same thing as n=count(x,y); in both cases, n(i) is the number of elements of x less than or equal to y(i). The difference is that countup requires that the vectors x and y be nondecreasing.
Now we use the same procedure as in Problem 10.13.2 and form the time average
\[ \overline{M}(T) = \frac{1}{T}\sum_{t=1}^{T} M(t). \tag{1} \]
Forming and plotting the time average using these commands will yield a plot vaguely similar to that shown below.

    >> t=(1:600)';
    >> M=simswitchd(100,1,t);
    >> Mavg=cumsum(M)./t;
    >> plot(t,Mavg);

[Figure: time average of M(t) versus t, for 0 ≤ t ≤ 600, with vertical axis from 95 to 105.]

We used the word "vaguely" because at $t = 1$, the time average is simply the number of arrivals in the first minute, which is a Poisson ($\alpha = 100$) random variable which has not been averaged. Thus, the left side of the graph will be random for each run. As expected, the time average appears to be converging to 100.
Problem 10.13.4 Solution

The random variable $S_n$ is the sum of $n$ exponential ($\lambda$) random variables. That is, $S_n$ is an Erlang ($n, \lambda$) random variable. Since $K = 1$ if and only if $S_n > T$, $P[K = 1] = P[S_n > T]$. Typically, $P[K = 1]$ is fairly high because
\[ E[S_n] = \frac{n}{\lambda} = \frac{\lceil 1.1\lambda T\rceil}{\lambda} \approx 1.1T. \tag{1} \]
Increasing $n$ increases $P[K = 1]$; however, poissonarrivals then does more work generating exponential random variables. Although we don't want to generate more exponential random variables than necessary, if we need to generate a lot of arrivals (i.e., a lot of exponential interarrival times), then Matlab is typically faster generating a vector of them all at once rather than generating them one at a time. Choosing $n = \lceil 1.1\lambda T\rceil$ generates about 10 percent more exponential random variables than we typically need. However, as long as $P[K = 1]$ is high, a ten percent penalty won't be too costly.

When $n$ is small, it doesn't much matter if we are efficient because the amount of calculation is small. The question that must be addressed is to estimate $P[K = 1]$ when $n$ is large. In this case, we can use the central limit theorem because $S_n$ is the sum of $n$ exponential random variables. Since $E[S_n] = n/\lambda$ and $\mathrm{Var}[S_n] = n/\lambda^2$,
\[ P[S_n > T] = P\left[\frac{S_n - n/\lambda}{\sqrt{n/\lambda^2}} > \frac{T - n/\lambda}{\sqrt{n/\lambda^2}}\right] \approx Q\left(\frac{\lambda T - n}{\sqrt{n}}\right) \tag{2} \]
To simplify our algebra, we assume for large $n$ that $0.1\lambda T$ is an integer. In this case, $n = 1.1\lambda T$ and
\[ P[S_n > T] \approx Q\left(-\frac{0.1\lambda T}{\sqrt{1.1\lambda T}}\right) = \Phi\left(\sqrt{\frac{\lambda T}{110}}\right) \tag{3} \]
Thus for large $\lambda T$, $P[K = 1]$ is very close to 1. For example, if $\lambda T = 1{,}000$, $P[S_n > T] \approx \Phi(3.01) = 0.9987$. If $\lambda T = 10{,}000$, $P[S_n > T] \approx \Phi(9.5)$.
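The central limit theorem estimate $\Phi(\sqrt{\lambda T/110})$ from Problem 10.13.4 can be evaluated without any toolbox. The Matlab sketch below is an illustration, not part of the original solution: it builds the standard normal CDF from the built-in erfc, and the helper name Phi is a local choice made here.

```matlab
% CLT estimate of P[K = 1] = P[S_n > T] with n = 1.1*lambda*T (Problem 10.13.4).
Phi  = @(z) 0.5*erfc(-z/sqrt(2));   % standard normal CDF via built-in erfc
lamT = [1e3 1e4];                   % the two values of lambda*T used in the text
p    = Phi(sqrt(lamT/110));         % estimated P[K = 1] for each value
fprintf('lambda*T = %g: P[K=1] ~ %.4f\n', [lamT; p]);
```

For $\lambda T = 1{,}000$ this reproduces the value 0.9987 quoted above; for $\lambda T = 10{,}000$ the estimate is indistinguishable from 1 in double precision.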
Problem 10.13.5 Solution

Following the problem instructions, we can write the function newarrivals.m. For convenience, here are newarrivals and poissonarrivals side by side.

    function s=newarrivals(lam,T)
    %Usage s=newarrivals(lam,T)
    %Returns Poisson arrival times
    %s=[s(1) ... s(n)] over [0,T]
    n=poissonrv(lam*T,1);
    s=sort(T*rand(n,1));

    function s=poissonarrivals(lam,T)
    %arrival times s=[s(1) ... s(n)]
    % s(n)<= T < s(n+1)
    n=ceil(1.1*lam*T);
    s=cumsum(exponentialrv(lam,n));
    while (s(length(s))< T),
        s_new=s(length(s))+ ...
            cumsum(exponentialrv(lam,n));
        s=[s; s_new];
    end
    s=s(s<=T);

Clearly the code for newarrivals is shorter, more readable, and perhaps, with the help of Problem 10.6.4, more logical than poissonarrivals. Unfortunately this doesn't mean the code runs better. Here are some cputime comparisons:

    >> t=cputime;s=poissonarrivals(1,100000);t=cputime-t
    t =
        0.1110
    >> t=cputime;s=newarrivals(1,100000);t=cputime-t
    t =
        0.5310
    >> t=cputime;poissonrv(100000,1);t=cputime-t
    t =
        0.5200
    >>

Unfortunately, these results were highly repeatable. The function poissonarrivals generated 100,000 arrivals of a rate 1 Poisson process in roughly 0.1 seconds of cpu time. The same task took newarrivals about 0.5 seconds, or roughly 5 times as long! In the newarrivals code, the culprit is the way poissonrv generates a single Poisson random variable with expected value 100,000. In this case, poissonrv generates the first 200,000 terms of the Poisson PMF! This required calculation is so large that it dominates the work needed to generate 100,000 uniform random numbers. In fact, this suggests that a more efficient way to generate a Poisson ($\alpha$) random variable $N$ is to generate arrivals of a rate $\alpha$ Poisson process until arrival $N+1$ is the first arrival after time 1.

Problem 10.13.6 Solution

We start with brownian.m to simulate the Brownian motion process with barriers. Since the goal is to estimate the barrier probability $P[|X(t)| = b]$, we don't keep track of the value of the process over all time. Also, we simply assume a unit time step $\tau = 1$ for the process. Thus, the process starts at $n = 0$ at position $W_0 = 0$; at each step $n$, the position, if we haven't reached a barrier, is $W_n = W_{n-1} + X_n$, where $X_1,\ldots,X_T$ are iid Gaussian $(0, \sqrt{\alpha})$ random variables. Accounting for the effect of barriers,
\[ W_n = \max(\min(W_{n-1} + X_n, b), -b). \tag{1} \]
To implement the simulation, we can generate the vector x of increments all at once. However, to check at each time step whether we are crossing a barrier, we need to proceed sequentially. (This is analogous to the problem in Quiz 10.13.)
In brownwall shown below, pb(1) tracks how often the process touches the left barrier at $-b$ while pb(2) tracks how often the right side barrier at $b$ is reached. By symmetry, $P[X(t) = b] = P[X(t) = -b]$. Thus if $T$ is chosen very large, we should expect pb(1)=pb(2). The extent to which this is not the case gives an indication of the extent to which we are merely estimating the barrier probability. Here is the code, followed by two sample runs for each $T \in \{10{,}000,\ 100{,}000,\ 1{,}000{,}000\}$:

    function pb=brownwall(alpha,b,T)
    %pb=brownwall(alpha,b,T)
    %Brownian motion, param. alpha
    %walls at [-b, b], sampled
    %each unit of time until time T
    %Returns vector pb:
    %pb(1)=fraction of time at -b
    %pb(2)=fraction of time at b
    T=ceil(T);
    x=sqrt(alpha).*gaussrv(0,1,T);
    w=0;pb=zeros(1,2);
    for k=1:T,
        w=w+x(k);
        if (w <= -b)
            w=-b;
            pb(1)=pb(1)+1;
        elseif (w >= b)
            w=b;
            pb(2)=pb(2)+1;
        end
    end
    pb=pb/T;

    >> pb=brownwall(0.01,1,1e4)
    >> pb=brownwall(0.01,1,1e4)
    >> pb=brownwall(0.01,1,1e5)
    >> pb=brownwall(0.01,1,1e5)
    >> pb=brownwall(0.01,1,1e6)
    >> pb=brownwall(0.01,1,1e6)

Across these six runs, the reported values of pb(1) and pb(2) all lie between 0.0299 and 0.0417.

The sample runs show that for $\alpha = 0.01$ and $b = 1$, $P[X(t) = -b] \approx P[X(t) = b] \approx 0.03$. Otherwise, the numerical simulations are not particularly instructive. Perhaps the most important thing to understand is that the Brownian motion process with barriers is very different from the ordinary Brownian motion process. Remember that for ordinary Brownian motion, the variance of $X(t)$ always increases linearly with $t$. For the process with barriers, $X^2(t) \le b^2$ and thus $\mathrm{Var}[X(t)] \le b^2$. In fact, for the process with barriers, the PDF of $X(t)$ converges to a limit as $t$ becomes large. If you're curious, you shouldn't have much trouble digging in the library to find out more.
Problem 10.13.7 Solution

In this problem, we start with the simswitch.m code to generate the vector of departure times y. We then construct the vector I of inter-departure times. The command hist(I,20) will generate a 20 bin histogram of the inter-departure times. The fact that this histogram resembles an exponential PDF suggests that perhaps it is reasonable to try to match the PDF of an exponential ($\mu$) random variable against the histogram.

In most problems in which one wants to fit a PDF to measured data, a key issue is how to choose the parameters of the PDF. In this problem, choosing $\mu$ is simple. Recall that the switch has a Poisson arrival process of rate $\lambda$ so interarrival times are exponential ($\lambda$) random variables. If $1/\mu < 1/\lambda$, then the average time between departures from the switch is less than the average time between arrivals to the switch. In this case, calls depart the switch faster than they arrive, which is impossible because each departing call was an arriving call at an earlier time. Similarly, if $1/\mu > 1/\lambda$, then calls would be departing from the switch more slowly than they arrived. This can happen to an overloaded switch; however, it's impossible in this system because each arrival departs after an exponential time. Thus the only possibility is that $1/\mu = 1/\lambda$.
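In the program simswitchdepart.m, we plot a histogram of inter-departure times for a switch with arrival rate $\lambda$ against the scaled exponential ($\lambda$) PDF $\lambda e^{-\lambda x} b$ where $b$ is the histogram bin size. Here is the code:

    function I=simswitchdepart(lambda,mu,T)
    %Usage: I=simswitchdepart(lambda,mu,T)
    %Poisson arrivals, rate lambda
    %Exponential (mu) call duration
    %Over time [0,T], returns I,
    %the vector of inter-departure times
    s=poissonarrivals(lambda,T);
    y=s+exponentialrv(mu,length(s));
    y=sort(y);
    n=length(y);
    I=y-[0; y(1:n-1)]; %interdeparture times
    imax=max(I);b=ceil(n/100);
    id=imax/b; x=id/2:id:imax;
    pd=hist(I,x); pd=pd/sum(pd);
    px=exponentialpdf(lambda,x)*id;
    plot(x,px,x,pd);
    xlabel('\it x');ylabel('Probability');
    legend('Exponential PDF','Relative Frequency');

Here is an example of the output corresponding to simswitchdepart(10,1,1000).

[Figure: exponential PDF and relative frequency of inter-departure times versus x, for 0 ≤ x ≤ 1; vertical axis "Probability" from 0 to 0.1.]

As seen in the figure, the match is quite good. Although this is not a carefully designed statistical test of whether the inter-departure times are exponential random variables, it is enough evidence that one may want to pursue whether such a result can be proven. In fact, the switch in this problem is an example of an M/M/∞ queuing system for which it has been shown that not only do the inter-departure times have an exponential distribution, but the steady-state departure process is a Poisson process. For the curious reader, details can be found, for example, in the text Discrete Stochastic Processes by Gallager.

Problem Solutions – Chapter 11

Problem 11.1.1 Solution

For this problem, it is easiest to work with the expectation operator. The mean function of the output is
\[ E[Y(t)] = 2 + E[X(t)] = 2 \tag{1} \]
The autocorrelation of the output is
\begin{align}
R_Y(t,\tau) &= E[(2 + X(t))(2 + X(t+\tau))] \tag{2}\\
&= E[4 + 2X(t) + 2X(t+\tau) + X(t)X(t+\tau)] \tag{3}\\
&= 4 + 2E[X(t)] + 2E[X(t+\tau)] + E[X(t)X(t+\tau)] \tag{4}\\
&= 4 + R_X(\tau) \tag{5}
\end{align}
We see that $R_Y(t,\tau)$ only depends on the time difference $\tau$. Thus $Y(t)$ is wide sense stationary.

Problem 11.1.2 Solution

By Theorem 11.2, the mean of the output is
\begin{align}
\mu_Y &= \mu_X\int_{-\infty}^{\infty} h(t)\, dt \tag{1}\\
&= -3\int_0^{10^{-3}} (1 - 10^6 t^2)\, dt \tag{2}\\
&= -3\left(t - (10^6/3)t^3\right)\Big|_0^{10^{-3}} \tag{3}\\
&= -2\times 10^{-3}\ \text{volts} \tag{4}
\end{align}

Problem 11.1.3 Solution

By Theorem 11.2, the mean of the output is
\[ \mu_Y = \mu_X\int_{-\infty}^{\infty} h(t)\, dt = 4\int_0^{\infty} e^{-t/a}\, dt = -4ae^{-t/a}\Big|_0^{\infty} = 4a. \tag{1} \]
Since $\mu_Y = 1 = 4a$, we must have $a = 1/4$.

Problem 11.1.4 Solution

Since $E[Y^2(t)] = R_Y(0)$, we use Theorem 11.2(a) to evaluate $R_Y(\tau)$ at $\tau = 0$. That is,
\begin{align}
R_Y(0) &= \int_{-\infty}^{\infty} h(u)\int_{-\infty}^{\infty} h(v)R_X(u-v)\, dv\, du \tag{1}\\
&= \int_{-\infty}^{\infty} h(u)\int_{-\infty}^{\infty} h(v)\eta_0\delta(u-v)\, dv\, du \tag{2}\\
&= \eta_0\int_{-\infty}^{\infty} h^2(u)\, du, \tag{3}
\end{align}
by the sifting property of the delta function.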
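The two output means in Problems 11.1.2 and 11.1.3 can be confirmed by numerical integration. The Matlab sketch below is illustrative only and relies on the built-in integral function; the variable names muY_2 and muY_3 are local choices made here, and the integrands are the ones appearing in the solutions above.

```matlab
% Problem 11.1.2: mu_Y = mu_X * integral of h(t), with mu_X = -3 and
% h(t) = 1 - 1e6 t^2 on [0, 1e-3], as used in the solution.
muY_2 = -3 * integral(@(t) 1 - 1e6*t.^2, 0, 1e-3);   % about -2e-3 volts
% Problem 11.1.3: mu_Y = 4 * integral of exp(-t/a) over [0, inf) = 4a.
a = 1/4;                                             % value found in the solution
muY_3 = 4 * integral(@(t) exp(-t/a), 0, Inf);        % about 1
```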
Problem 11.2.1 Solution

(a) Note that
\[ Y_i = \sum_{n=-\infty}^{\infty} h_n X_{i-n} = \frac{1}{3}X_{i+1} + \frac{1}{3}X_i + \frac{1}{3}X_{i-1} \tag{1} \]
By matching coefficients, we see that
\[ h_n = \begin{cases} 1/3 & n = -1, 0, 1 \\ 0 & \text{otherwise} \end{cases} \tag{2} \]

(b) By Theorem 11.5, the output autocorrelation is
\begin{align}
R_Y[n] &= \sum_{i=-\infty}^{\infty}\sum_{j=-\infty}^{\infty} h_i h_j R_X[n+i-j] \tag{3}\\
&= \frac{1}{9}\sum_{i=-1}^{1}\sum_{j=-1}^{1} R_X[n+i-j] \tag{4}\\
&= \frac{1}{9}\left(R_X[n+2] + 2R_X[n+1] + 3R_X[n] + 2R_X[n-1] + R_X[n-2]\right) \tag{5}
\end{align}
Substituting in $R_X[n]$ yields
\[ R_Y[n] = \begin{cases} 1/3 & n = 0 \\ 2/9 & |n| = 1 \\ 1/9 & |n| = 2 \\ 0 & \text{otherwise} \end{cases} \tag{6} \]

Problem 11.2.2 Solution

Applying Theorem 11.4 with sampling period $T_s = 1/4000$ s yields
\begin{align}
R_X[k] = R_X(kT_s) &= 10\,\frac{\sin(2000\pi kT_s) + \sin(1000\pi kT_s)}{2000\pi kT_s} \tag{1}\\
&= 20\,\frac{\sin(0.5\pi k) + \sin(0.25\pi k)}{\pi k} \tag{2}\\
&= 10\,\mathrm{sinc}(0.5k) + 5\,\mathrm{sinc}(0.25k) \tag{3}
\end{align}
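The double sum of Theorem 11.5 used in Problem 11.2.1(b) is a convolution of $R_X$ with the autocorrelation of the filter taps, which makes it a one-liner to check. In the Matlab sketch below, the input autocorrelation $R_X[n] = \delta[n]$ is an assumption: it is the choice that reproduces the table of $R_Y[n]$ values quoted in the solution, and is not restated from the problem text.

```matlab
% Problem 11.2.1(b): R_Y = c * R_X, where c[m] = sum_i h_i h_{i+m}.
h    = [1 1 1]/3;            % h_n for n = -1, 0, 1
RX   = 1;                    % R_X[n] = delta[n] (assumed, see lead-in)
c    = conv(h, fliplr(h));   % autocorrelation of the filter taps, lags -2..2
RY   = conv(c, RX);          % output autocorrelation, lags -2..2
lags = -2:2;
disp([lags; RY]);            % first row: lag n, second row: R_Y[n]
```

The same two conv calls work for any finite-length real filter and any finite-support $R_X$, provided the lag axes are tracked by hand.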
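Problem 11.2.3 Solution

(a) By Theorem 11.5, the expected value of the output is
\[ \mu_W = \mu_Y\sum_{n=-\infty}^{\infty} h_n = 2\mu_Y = 2 \tag{1} \]

(b) Theorem 11.5 also says that the output autocorrelation is
\begin{align}
R_W[n] &= \sum_{i=-\infty}^{\infty}\sum_{j=-\infty}^{\infty} h_i h_j R_Y[n+i-j] \tag{2}\\
&= \sum_{i=0}^{1}\sum_{j=0}^{1} R_Y[n+i-j] \tag{3}\\
&= R_Y[n-1] + 2R_Y[n] + R_Y[n+1] \tag{4}
\end{align}
For $n = -3$,
\[ R_W[-3] = R_Y[-4] + 2R_Y[-3] + R_Y[-2] = R_Y[-2] = 0.5 \tag{5} \]
Following the same procedure, it's easy to show that $R_W[n]$ is nonzero for $|n| = 0, 1, 2, 3$. Specifically,
\[ R_W[n] = \begin{cases} 0.5 & |n| = 3 \\ 3 & |n| = 2 \\ 7.5 & |n| = 1 \\ 10 & n = 0 \\ 0 & \text{otherwise} \end{cases} \tag{6} \]

(c) The second moment of the output is $E[W_n^2] = R_W[0] = 10$. The variance of $W_n$ is
\[ \mathrm{Var}[W_n] = E[W_n^2] - (E[W_n])^2 = 10 - 2^2 = 6 \tag{7} \]

(d) This part doesn't require any probability. It just checks your knowledge of linear systems and convolution. There is a bit of confusion because $h_n$ is used to denote both the filter that transforms $X_n$ to $Y_n$ as well as the filter that transforms $Y_n$ to $W_n$. To avoid confusion, we will use $\hat{h}_n$ to denote the filter that transforms $X_n$ to $Y_n$. Using Equation (11.25) for discrete-time convolution, we can write
\[ W_n = \sum_{j=-\infty}^{\infty} h_j Y_{n-j}, \qquad Y_{n-j} = \sum_{i=-\infty}^{\infty} \hat{h}_i X_{n-j-i}. \tag{8} \]
Combining these equations yields
\[ W_n = \sum_{j=-\infty}^{\infty} h_j\sum_{i=-\infty}^{\infty} \hat{h}_i X_{n-j-i}. \tag{9} \]
For each $j$, we make the substitution $k = i + j$, and then reverse the order of summation to obtain
\[ W_n = \sum_{j=-\infty}^{\infty} h_j\sum_{k=-\infty}^{\infty} \hat{h}_{k-j}X_{n-k} = \sum_{k=-\infty}^{\infty}\left(\sum_{j=-\infty}^{\infty} h_j\hat{h}_{k-j}\right)X_{n-k}. \tag{10} \]
Thus we see that
\[ g_k = \sum_{j=-\infty}^{\infty} h_j\hat{h}_{k-j}. \tag{11} \]
That is, the filter $g_n$ is the convolution of the filters $\hat{h}_n$ and $h_n$. In the context of our particular problem, the filter $\hat{h}_n$ that transforms $X_n$ to $Y_n$ is given in Example 11.5. The two filters are
\[ \hat{\mathbf{h}} = [\hat{h}_0\ \hat{h}_1] = [1/2\ \ 1/2], \qquad \mathbf{h} = [h_0\ h_1] = [1\ \ 1] \tag{12} \]
Keep in mind that $\hat{h}_n = h_n = 0$ for $n < 0$ or $n > 1$. From Equation (11),
\[ g_k = \hat{h}_k + \hat{h}_{k-1} = \begin{cases} 1/2 & k = 0 \\ 1 & k = 1 \\ 1/2 & k = 2 \\ 0 & \text{otherwise} \end{cases} \tag{13} \]

Problem 11.2.4 Solution

(a) By Theorem 11.5, the mean output is
\[ \mu_V = \mu_Y\sum_{n=-\infty}^{\infty} h_n = (-1 + 1)\mu_Y = 0 \tag{1} \]

(b) Theorem 11.5 also says that the output autocorrelation is
\begin{align}
R_V[n] &= \sum_{i=-\infty}^{\infty}\sum_{j=-\infty}^{\infty} h_i h_j R_Y[n+i-j] \tag{2}\\
&= \sum_{i=0}^{1}\sum_{j=0}^{1} h_i h_j R_Y[n+i-j] \tag{3}\\
&= -R_Y[n-1] + 2R_Y[n] - R_Y[n+1] \tag{4}
\end{align}
For $n = -3$,
\[ R_V[-3] = -R_Y[-4] + 2R_Y[-3] - R_Y[-2] = -R_Y[-2] = -0.5 \tag{5} \]
Following the same procedure, it's easy to show that $R_V[n]$ is nonzero for $|n| = 0, 1, 2, 3$. Specifically,
\[ R_V[n] = \begin{cases} -0.5 & |n| = 3 \\ -1 & |n| = 2 \\ 0.5 & |n| = 1 \\ 2 & n = 0 \\ 0 & \text{otherwise} \end{cases} \tag{6} \]

(c) Since $E[V_n] = 0$, the variance of the output is
\[ \mathrm{Var}[V_n] = E[V_n^2] = R_V[0] = 2 \tag{7} \]

(d) This part doesn't require any probability. It just checks your knowledge of linear systems and convolution. There is a bit of confusion because $h_n$ is used to denote both the filter that transforms $X_n$ to $Y_n$ as well as the filter that transforms $Y_n$ to $V_n$. To avoid confusion, we will use $\hat{h}_n$ to denote the filter that transforms $X_n$ to $Y_n$. Using Equation (11.25) for discrete-time convolution, we can write
\[ V_n = \sum_{j=-\infty}^{\infty} h_j Y_{n-j}, \qquad Y_{n-j} = \sum_{i=-\infty}^{\infty} \hat{h}_i X_{n-j-i}. \tag{8} \]
Combining these equations yields
\[ V_n = \sum_{j=-\infty}^{\infty} h_j\sum_{i=-\infty}^{\infty} \hat{h}_i X_{n-j-i}. \tag{9} \]
For the inner sum, we make the substitution $k = i + j$ and then reverse the order of summation to obtain
\[ V_n = \sum_{j=-\infty}^{\infty} h_j\sum_{k=-\infty}^{\infty} \hat{h}_{k-j}X_{n-k} = \sum_{k=-\infty}^{\infty}\left(\sum_{j=-\infty}^{\infty} h_j\hat{h}_{k-j}\right)X_{n-k}. \tag{10} \]
Thus we see that
\[ f_k = \sum_{j=-\infty}^{\infty} h_j\hat{h}_{k-j}. \tag{11} \]
That is, the filter $f_n$ is the convolution of the filters $\hat{h}_n$ and $h_n$. In the context of our particular problem, the filter $\hat{h}_n$ that transforms $X_n$ to $Y_n$ is given in Example 11.5. The two filters are
\[ \hat{\mathbf{h}} = [\hat{h}_0\ \hat{h}_1] = [1/2\ \ 1/2], \qquad \mathbf{h} = [h_0\ h_1] = [1\ \ {-1}] \tag{12} \]
Keep in mind that $\hat{h}_n = h_n = 0$ for $n < 0$ or $n > 1$. From Equation (11),
\[ f_k = \hat{h}_k - \hat{h}_{k-1} = \begin{cases} 1/2 & k = 0 \\ 0 & k = 1 \\ -1/2 & k = 2 \\ 0 & \text{otherwise} \end{cases} \tag{13} \]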
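Both the cascade filters of part (d) and the autocorrelation tables of part (b) in Problems 11.2.3 and 11.2.4 reduce to calls to the built-in conv. The Matlab sketch below is a hedged illustration: the values of $R_Y[n]$ on lags $-2,\ldots,2$ are an assumption, inferred here from the $R_W[n]$ table (the solutions state only $R_Y[-2] = 0.5$ explicitly).

```matlab
% Problems 11.2.3(d) and 11.2.4(d): cascade filters by convolution.
hhat = [1/2 1/2];            % X_n -> Y_n filter, taps n = 0, 1
g = conv(hhat, [1  1]);      % X_n -> W_n filter, taps k = 0, 1, 2
f = conv(hhat, [1 -1]);      % X_n -> V_n filter, taps k = 0, 1, 2
% Parts (b): output autocorrelations on lags -3..3 from R_Y on lags -2..2.
% The R_Y values below are assumed (inferred from the R_W table in the text).
RY = [0.5 2 3 2 0.5];
RW = conv(conv([1  1], fliplr([1  1])), RY);   % R_Y[n-1] + 2R_Y[n] + R_Y[n+1]
RV = conv(conv([1 -1], fliplr([1 -1])), RY);   % -R_Y[n-1] + 2R_Y[n] - R_Y[n+1]
```

Under the assumed $R_Y$, the vectors RW and RV reproduce the two tables above, read from lag $-3$ to lag $3$.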
Problem 11.2.5 Solution

We start with Theorem 11.5:
\begin{align}
R_Y[n] &= \sum_{i=-\infty}^{\infty}\sum_{j=-\infty}^{\infty} h_i h_j R_X[n+i-j] \tag{1}\\
&= R_X[n-1] + 2R_X[n] + R_X[n+1]. \tag{2}
\end{align}
First we observe that for $n \le -2$ or $n \ge 2$,
\[ R_Y[n] = R_X[n-1] + 2R_X[n] + R_X[n+1] = 0. \tag{3} \]
This suggests that $R_X[n] = 0$ for $|n| > 1$. In addition, we have the following facts:
\begin{align}
R_Y[0] &= R_X[-1] + 2R_X[0] + R_X[1] = 2, \tag{4}\\
R_Y[-1] &= R_X[-2] + 2R_X[-1] + R_X[0] = 1, \tag{5}\\
R_Y[1] &= R_X[0] + 2R_X[1] + R_X[2] = 1. \tag{6}
\end{align}
A simple solution to this set of equations is $R_X[0] = 1$ and $R_X[n] = 0$ for $n \ne 0$.

Problem 11.2.6 Solution

The mean of $Y_n = (X_n + Y_{n-1})/2$ can be found by realizing that $Y_n$ is an infinite sum of the $X_i$'s:
\[ Y_n = \frac{1}{2}X_n + \frac{1}{4}X_{n-1} + \frac{1}{8}X_{n-2} + \cdots. \tag{1} \]
Since the $X_i$'s are each of zero mean, the mean of $Y_n$ is also 0. The variance of $Y_n$ can be expressed as
\[ \mathrm{Var}[Y_n] = \left[ \frac{1}{4} + \frac{1}{16} + \frac{1}{64} + \cdots \right] \mathrm{Var}[X] = \sum_{i=1}^{\infty} \left(\frac{1}{4}\right)^i \sigma^2 = \left( \frac{1}{1 - 1/4} - 1 \right)\sigma^2 = \sigma^2/3. \tag{2} \]
The above infinite sum converges to $\frac{1}{1-1/4} - 1 = 1/3$, implying
\[ \mathrm{Var}[Y_n] = (1/3)\,\mathrm{Var}[X] = 1/3. \tag{3} \]
The covariance of $Y_{i+1}$ and $Y_i$ can be found by the same method:
\[ \mathrm{Cov}[Y_{i+1}, Y_i] = E\left[ \left( \frac{1}{2}X_{i+1} + \frac{1}{4}X_i + \frac{1}{8}X_{i-1} + \cdots \right)\left( \frac{1}{2}X_i + \frac{1}{4}X_{i-1} + \frac{1}{8}X_{i-2} + \cdots \right) \right]. \tag{4} \]
Since $E[X_iX_j] = 0$ for all $i \ne j$, the only terms that are left are the products of matching samples. The sample $X_{i+1-k}$, $k \ge 1$, has coefficient $2^{-(k+1)}$ in $Y_{i+1}$ and coefficient $2^{-k}$ in $Y_i$, so
\[ \mathrm{Cov}[Y_{i+1}, Y_i] = \sum_{k=1}^{\infty} \frac{1}{2^{k+1}}\,\frac{1}{2^{k}}\, E[X_{i+1-k}^2] = \frac{1}{2}\sum_{k=1}^{\infty} \frac{1}{4^k}\, E[X_{i+1-k}^2]. \tag{5} \]
Since $E[X_i^2] = \sigma^2$, we can solve the above equation, yielding
\[ \mathrm{Cov}[Y_{i+1}, Y_i] = \sigma^2/6. \tag{6} \]
Finally the correlation coefficient of $Y_{i+1}$ and $Y_i$ is
\[ \rho_{Y_{i+1}Y_i} = \frac{\mathrm{Cov}[Y_{i+1}, Y_i]}{\sqrt{\mathrm{Var}[Y_{i+1}]}\sqrt{\mathrm{Var}[Y_i]}} = \frac{\sigma^2/6}{\sigma^2/3} = \frac{1}{2}. \tag{7} \]
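The geometric sums in Problem 11.2.6 can be confirmed numerically by truncating the infinite impulse response. This is only a sketch, assuming unit-variance inputs (σ² = 1) and a truncation length of 61 taps, which is an arbitrary choice that makes the neglected tail far smaller than machine precision.

```matlab
k = 0:60;                  % truncation of the infinite sum (assumed length)
hy = (1/2).^(k+1);         % Y_n = sum_k hy(k) X_{n-k}: coefficients 1/2, 1/4, 1/8, ...

varY = sum(hy.^2);                       % Var[Y_n] for sigma^2 = 1
covY = sum(hy(2:end).*hy(1:end-1));      % Cov[Y_{n+1}, Y_n]: products of matching samples
rho  = covY/varY;                        % correlation coefficient

fprintf('Var = %.6f, Cov = %.6f, rho = %.6f\n', varY, covY, rho);
```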
Problem 11.2.7 Solution

There is a technical difficulty with this problem since $X_n$ is not defined for $n < 0$. This implies $C_X[n,k]$ is not defined for $k < -n$ and thus $C_X[n,k]$ cannot be completely independent of $k$. When $n$ is large, corresponding to a process that has been running for a long time, this is a technical issue, and not a practical concern. Instead, we will find $\bar{\sigma}^2$ such that $C_X[n,k] = C_X[k]$ for all $n$ and $k$ for which the covariance function is defined. To do so, we need to express $X_n$ in terms of $Z_0, Z_1, \ldots, Z_{n-1}$. We do this in the following way:
\begin{align}
X_n &= cX_{n-1} + Z_{n-1} \tag{1}\\
&= c[cX_{n-2} + Z_{n-2}] + Z_{n-1} \tag{2}\\
&= c^2[cX_{n-3} + Z_{n-3}] + cZ_{n-2} + Z_{n-1} \tag{3}\\
&\;\;\vdots \tag{4}\\
&= c^nX_0 + c^{n-1}Z_0 + c^{n-2}Z_1 + \cdots + Z_{n-1} \tag{5}\\
&= c^nX_0 + \sum_{i=0}^{n-1} c^{n-1-i}Z_i. \tag{6}
\end{align}
Since $E[Z_i] = 0$, the mean function of the $X_n$ process is
\[ E[X_n] = c^nE[X_0] + \sum_{i=0}^{n-1} c^{n-1-i}E[Z_i] = c^nE[X_0]. \tag{7} \]
Thus, for $X_n$ to be a zero mean process, we require that $E[X_0] = 0$. The autocorrelation function can be written as
\[ R_X[n,k] = E[X_nX_{n+k}] = E\left[ \left( c^nX_0 + \sum_{i=0}^{n-1} c^{n-1-i}Z_i \right)\left( c^{n+k}X_0 + \sum_{j=0}^{n+k-1} c^{n+k-1-j}Z_j \right) \right]. \tag{8} \]
Although it was unstated in the problem, we will assume that $X_0$ is independent of $Z_0, Z_1, \ldots$, so that $E[X_0Z_i] = 0$. Since $E[Z_i] = 0$ and $E[Z_iZ_j] = 0$ for $i \ne j$, most of the cross terms will drop out. For $k \ge 0$, the autocorrelation simplifies to
\[ R_X[n,k] = c^{2n+k}\,\mathrm{Var}[X_0] + \sum_{i=0}^{n-1} c^{2(n-1)+k-2i}\,\bar{\sigma}^2 = c^{2n+k}\,\mathrm{Var}[X_0] + \bar{\sigma}^2c^k\,\frac{1 - c^{2n}}{1 - c^2}. \tag{9} \]
Since $E[X_n] = 0$, $\mathrm{Var}[X_0] = R_X[n,0] = \sigma^2$ and we can write for $k \ge 0$,
\[ R_X[n,k] = \bar{\sigma}^2\,\frac{c^k}{1-c^2} + c^{2n+k}\left( \sigma^2 - \frac{\bar{\sigma}^2}{1-c^2} \right). \tag{10} \]
For $k < 0$, we have
\begin{align}
R_X[n,k] &= E\left[ \left( c^nX_0 + \sum_{i=0}^{n-1} c^{n-1-i}Z_i \right)\left( c^{n+k}X_0 + \sum_{j=0}^{n+k-1} c^{n+k-1-j}Z_j \right) \right] \tag{11}\\
&= c^{2n+k}\,\mathrm{Var}[X_0] + c^{-k}\sum_{j=0}^{n+k-1} c^{2(n+k-1-j)}\,\bar{\sigma}^2 \tag{12}\\
&= c^{2n+k}\sigma^2 + \bar{\sigma}^2c^{-k}\,\frac{1 - c^{2(n+k)}}{1-c^2} \tag{13}\\
&= \frac{\bar{\sigma}^2}{1-c^2}\,c^{-k} + c^{2n+k}\left( \sigma^2 - \frac{\bar{\sigma}^2}{1-c^2} \right). \tag{14}
\end{align}
We see that $R_X[n,k] = \sigma^2c^{|k|}$ by choosing
\[ \bar{\sigma}^2 = (1 - c^2)\sigma^2. \tag{15} \]

Problem 11.2.8 Solution

We can recursively solve for $Y_n$ as follows.
\begin{align}
Y_n &= aX_n + aY_{n-1} \tag{1}\\
&= aX_n + a[aX_{n-1} + aY_{n-2}] \tag{2}\\
&= aX_n + a^2X_{n-1} + a^2[aX_{n-2} + aY_{n-3}]. \tag{3}
\end{align}
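By continuing the same procedure, we can conclude that
\[ Y_n = \sum_{j=0}^{n} a^{j+1}X_{n-j} + a^nY_0. \tag{4} \]
Since $Y_0 = 0$, the substitution $i = n - j$ yields
\[ Y_n = \sum_{i=0}^{n} a^{n-i+1}X_i. \tag{5} \]
Now we can calculate the mean
\[ E[Y_n] = E\left[ \sum_{i=0}^{n} a^{n-i+1}X_i \right] = \sum_{i=0}^{n} a^{n-i+1}E[X_i] = 0. \tag{6} \]
To calculate the autocorrelation $R_Y[m,k]$, which equals the autocovariance $C_Y[m,k]$ because the mean is zero, we consider first the case when $k \ge 0$:
\[ C_Y[m,k] = E\left[ \sum_{i=0}^{m} a^{m-i+1}X_i \sum_{j=0}^{m+k} a^{m+k-j+1}X_j \right] = \sum_{i=0}^{m}\sum_{j=0}^{m+k} a^{m-i+1}a^{m+k-j+1}E[X_iX_j]. \tag{7} \]
Since the $X_i$ is a sequence of iid standard normal random variables,
\[ E[X_iX_j] = \begin{cases} 1 & i = j, \\ 0 & \text{otherwise.} \end{cases} \tag{8} \]
Thus, only the $i = j$ terms make a nonzero contribution. This implies
\begin{align}
C_Y[m,k] &= \sum_{i=0}^{m} a^{m-i+1}a^{m+k-i+1} \tag{9}\\
&= a^k\sum_{i=0}^{m} a^{2(m-i+1)} \tag{10}\\
&= a^k\left[ (a^2)^{m+1} + (a^2)^m + \cdots + a^2 \right] \tag{11}\\
&= \frac{a^2}{1-a^2}\,a^k\left[ 1 - (a^2)^{m+1} \right]. \tag{12}
\end{align}
For $k \le 0$, we start from
\[ C_Y[m,k] = \sum_{i=0}^{m}\sum_{j=0}^{m+k} a^{m-i+1}a^{m+k-j+1}E[X_iX_j]. \tag{13} \]
As in the case of $k \ge 0$, only the $i = j$ terms make a contribution. Also, since $m + k \le m$,
\[ C_Y[m,k] = \sum_{j=0}^{m+k} a^{m-j+1}a^{m+k-j+1} = a^{-k}\sum_{j=0}^{m+k} a^{m+k-j+1}a^{m+k-j+1}. \tag{14} \]
By steps quite similar to those for $k \ge 0$, we can show that
\[ C_Y[m,k] = \frac{a^2}{1-a^2}\,a^{-k}\left[ 1 - (a^2)^{m+k+1} \right]. \tag{15} \]
A general expression that is valid for all $m$ and $k$ would be
\[ C_Y[m,k] = \frac{a^2}{1-a^2}\,a^{|k|}\left[ 1 - (a^2)^{\min(m,m+k)+1} \right]. \tag{16} \]
Since $C_Y[m,k]$ depends on $m$, the $Y_n$ process is not wide sense stationary.

Problem 11.3.1 Solution

Since the process $X_n$ has expected value $E[X_n] = 0$, we know that $C_X(k) = R_X(k) = 2^{-|k|}$. Thus $\mathbf{X} = \begin{bmatrix} X_1 & X_2 & X_3 \end{bmatrix}'$ has covariance matrix
\[ \mathbf{C}_X = \begin{bmatrix} 2^0 & 2^{-1} & 2^{-2} \\ 2^{-1} & 2^0 & 2^{-1} \\ 2^{-2} & 2^{-1} & 2^0 \end{bmatrix} = \begin{bmatrix} 1 & 1/2 & 1/4 \\ 1/2 & 1 & 1/2 \\ 1/4 & 1/2 & 1 \end{bmatrix}. \tag{1} \]
From Definition 5.17, the PDF of $\mathbf{X}$ is
\[ f_{\mathbf{X}}(\mathbf{x}) = \frac{1}{(2\pi)^{n/2}[\det(\mathbf{C}_X)]^{1/2}} \exp\left( -\frac{1}{2}\mathbf{x}'\mathbf{C}_X^{-1}\mathbf{x} \right). \tag{2} \]
If we are using Matlab for calculations, it is best to declare the problem solved at this point. However, if you like algebra, we can write out the PDF in terms of the variables $x_1$, $x_2$ and $x_3$. To do so we find that the inverse covariance matrix is
\[ \mathbf{C}_X^{-1} = \begin{bmatrix} 4/3 & -2/3 & 0 \\ -2/3 & 5/3 & -2/3 \\ 0 & -2/3 & 4/3 \end{bmatrix}. \tag{3} \]
A little bit of algebra will show that $\det(\mathbf{C}_X) = 9/16$ and that
\[ \frac{1}{2}\mathbf{x}'\mathbf{C}_X^{-1}\mathbf{x} = \frac{2x_1^2}{3} + \frac{5x_2^2}{6} + \frac{2x_3^2}{3} - \frac{2x_1x_2}{3} - \frac{2x_2x_3}{3}. \tag{4} \]
It follows that
\[ f_{\mathbf{X}}(\mathbf{x}) = \frac{4}{3(2\pi)^{3/2}} \exp\left( -\frac{2x_1^2}{3} - \frac{5x_2^2}{6} - \frac{2x_3^2}{3} + \frac{2x_1x_2}{3} + \frac{2x_2x_3}{3} \right). \tag{5} \]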
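The solution to Problem 11.3.1 suggests stopping at the vector form of the PDF when Matlab is available. A short sketch of that route follows, with a check that the expanded formula agrees with the vector form at one point. The test point `x` is an arbitrary, hypothetical choice.

```matlab
% Covariance matrix of [X1 X2 X3]' for C_X(k) = 2^(-|k|)
CX = toeplitz((1/2).^(0:2));

detCX = det(CX);       % expected 9/16
invCX = inv(CX);       % expected [4 -2 0; -2 5 -2; 0 -2 4]/3

% Evaluate the Gaussian PDF in vector form at a hypothetical test point
x = [0.3; -0.2; 0.5];
n = length(x);
fVector = exp(-0.5*(x'*invCX*x)) / ((2*pi)^(n/2)*sqrt(detCX));

% Same PDF using the expanded expression in x1, x2, x3
q = 2*x(1)^2/3 + 5*x(2)^2/6 + 2*x(3)^2/3 - 2*x(1)*x(2)/3 - 2*x(2)*x(3)/3;
fExpanded = 4/(3*(2*pi)^(3/2)) * exp(-q);

fprintf('vector form %.8f, expanded form %.8f\n', fVector, fExpanded);
```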
Problem 11.3.2 Solution

The sequence $X_n$ is passed through the filter
\[ \mathbf{h} = \begin{bmatrix} h_0 & h_1 & h_2 \end{bmatrix}' = \begin{bmatrix} 1 & -1 & 1 \end{bmatrix}'. \tag{1} \]
The output sequence is $Y_n$.

(a) Following the approach of Equation (11.58), we can write the output $\mathbf{Y}_3 = \begin{bmatrix} Y_1 & Y_2 & Y_3 \end{bmatrix}'$ as
\[ \mathbf{Y}_3 = \begin{bmatrix} Y_1 \\ Y_2 \\ Y_3 \end{bmatrix} = \underbrace{\begin{bmatrix} h_1 & h_0 & 0 & 0 \\ h_2 & h_1 & h_0 & 0 \\ 0 & h_2 & h_1 & h_0 \end{bmatrix}}_{\mathbf{H}} \underbrace{\begin{bmatrix} X_0 \\ X_1 \\ X_2 \\ X_3 \end{bmatrix}}_{\mathbf{X}} = \begin{bmatrix} -1 & 1 & 0 & 0 \\ 1 & -1 & 1 & 0 \\ 0 & 1 & -1 & 1 \end{bmatrix} \begin{bmatrix} X_0 \\ X_1 \\ X_2 \\ X_3 \end{bmatrix}. \tag{2} \]
We note that the components of $\mathbf{X}$ are iid Gaussian (0, 1) random variables. Hence $\mathbf{X}$ has covariance matrix $\mathbf{C}_X = \mathbf{I}$, the identity matrix. Since $\mathbf{Y}_3 = \mathbf{H}\mathbf{X}$,
\[ \mathbf{C}_{Y_3} = \mathbf{H}\mathbf{C}_X\mathbf{H}' = \mathbf{H}\mathbf{H}' = \begin{bmatrix} 2 & -2 & 1 \\ -2 & 3 & -2 \\ 1 & -2 & 3 \end{bmatrix}. \tag{3} \]
Some calculation (by hand or by Matlab) will show that $\det(\mathbf{C}_{Y_3}) = 3$ and that
\[ \mathbf{C}_{Y_3}^{-1} = \frac{1}{3}\begin{bmatrix} 5 & 4 & 1 \\ 4 & 5 & 2 \\ 1 & 2 & 2 \end{bmatrix}. \tag{4} \]
Some algebra will show that
\[ \mathbf{y}'\mathbf{C}_{Y_3}^{-1}\mathbf{y} = \frac{5y_1^2 + 5y_2^2 + 2y_3^2 + 8y_1y_2 + 2y_1y_3 + 4y_2y_3}{3}. \tag{5} \]
This implies $\mathbf{Y}_3$ has PDF
\begin{align}
f_{\mathbf{Y}_3}(\mathbf{y}) &= \frac{1}{(2\pi)^{3/2}[\det(\mathbf{C}_{Y_3})]^{1/2}} \exp\left( -\frac{1}{2}\mathbf{y}'\mathbf{C}_{Y_3}^{-1}\mathbf{y} \right) \tag{6}\\
&= \frac{1}{(2\pi)^{3/2}\sqrt{3}} \exp\left( -\frac{5y_1^2 + 5y_2^2 + 2y_3^2 + 8y_1y_2 + 2y_1y_3 + 4y_2y_3}{6} \right). \tag{7}
\end{align}

(b) To find the PDF of $\mathbf{Y}_2 = \begin{bmatrix} Y_1 & Y_2 \end{bmatrix}'$, we start by observing that the covariance matrix of $\mathbf{Y}_2$ is just the upper left $2 \times 2$ submatrix of $\mathbf{C}_{Y_3}$. That is,
\[ \mathbf{C}_{Y_2} = \begin{bmatrix} 2 & -2 \\ -2 & 3 \end{bmatrix} \quad\text{and}\quad \mathbf{C}_{Y_2}^{-1} = \begin{bmatrix} 3/2 & 1 \\ 1 & 1 \end{bmatrix}. \tag{8} \]
Since $\det(\mathbf{C}_{Y_2}) = 2$ and $\mathbf{Y}_2$ has $n = 2$ components, it follows that
\begin{align}
f_{\mathbf{Y}_2}(\mathbf{y}) &= \frac{1}{(2\pi)^{2/2}[\det(\mathbf{C}_{Y_2})]^{1/2}} \exp\left( -\frac{1}{2}\mathbf{y}'\mathbf{C}_{Y_2}^{-1}\mathbf{y} \right) \tag{9}\\
&= \frac{1}{2\pi\sqrt{2}} \exp\left( -\frac{3}{4}y_1^2 - y_1y_2 - \frac{1}{2}y_2^2 \right). \tag{10}
\end{align}
2 2 Problem 11.  1/2 1 1/2  1/4 1/2 1 (3) 390 . CY2 = 2 −2 −2 3 and C−1 = Y2 3/2 1 . (7) (b) To find the PDF of Y2 = Y1 Y2 .      X2   X2  0 0 1 −1 1 Y3 0 0 h2 h1 h0 X3 X3 H X h = h0 h1 h2 = 1 −1 1 (1) Since Xn has autocovariance function CX (k) =  1 1/2  1/2 1  CX =  1/4 1/2   1/8 1/4 1/16 1/8 2−|k| . This fact appears as asymmetry in fY (y). we can write the output Y = Y1 Y2 Y3 as     X−1 X−1       1 0 −1 0 0  X0  Y1 h2 h1 h0 0 0  X0      Y = Y2  =  0 h2 h1 h0 0   X1  = 0 1 0 −1 0   X1  . because Y2 is in the middle of Y1 and Y3 .3. This might make on think that the PDF fY (y) should be symmetric in the variables y1 .  1/2 1 1/2  1/4 1/2 1 (3) 391 .58). However. y2 and y3 .Since Y = HX.5 that Yn is a stationary Gaussian process. As a result. CY Some algebra will show that Some calculation (by hand or preferably by Matlab) will show that det(CY ) = 675/256 and that   12 2 −4 1  2 11 2  .4 Solution The sequence Xn is passed through the filter The output sequence is Yn . (5) C−1 = Y 15 −4 2 12 y C−1 y = Y This implies Y has PDF fY (y) = 1 exp − y C−1 y Y 2 2 2 2 12y1 + 11y2 + 12y3 + 4y1 y2 + −8y1 y3 + 4y2 y3 16 √ exp − = 30 (2π)3/2 15 3 (2π)3/2 [det (CY )]1/2 1 (7) . the random variables Y1 . X has the Toeplitz covariance matrix  1/4 1/8 1/16 1/2 1/4 1/8   1 1/2 1/4  . Problem 11. Comment: We know from Theorem 11. (2)      X2   X2  Y3 0 0 1 0 −1 0 0 h2 h1 h0 X3 X3 H X h = h0 h1 h2 = 1 0 −1 (1) Since Xn has autocovariance function CX (k) =  1 1/2  1/2 1  CX =  1/4 1/2   1/8 1/4 1/16 1/8 2−|k| . 9/16 −3/8 3/2  (4) (6) This solution is another demonstration of why the PDF of a Gaussian random vector should be left in vector form. Y2 and Y3 are identically distributed and CY is a symmetric Toeplitz matrix. the information provided by Y1 and Y3 about Y2 is different than the information Y1 and Y2 convey about Y3 . Following the approach of Equation (11. 
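Problem 11.3.4 Solution

The sequence $X_n$ is passed through the filter
\[ \mathbf{h} = \begin{bmatrix} h_0 & h_1 & h_2 \end{bmatrix}' = \begin{bmatrix} 1 & 0 & -1 \end{bmatrix}'. \tag{1} \]
The output sequence is $Y_n$. Following the approach of Equation (11.58), we can write the output $\mathbf{Y} = \begin{bmatrix} Y_1 & Y_2 & Y_3 \end{bmatrix}'$ as
\[ \mathbf{Y} = \begin{bmatrix} Y_1 \\ Y_2 \\ Y_3 \end{bmatrix} = \underbrace{\begin{bmatrix} h_2 & h_1 & h_0 & 0 & 0 \\ 0 & h_2 & h_1 & h_0 & 0 \\ 0 & 0 & h_2 & h_1 & h_0 \end{bmatrix}}_{\mathbf{H}} \underbrace{\begin{bmatrix} X_{-1} \\ X_0 \\ X_1 \\ X_2 \\ X_3 \end{bmatrix}}_{\mathbf{X}} = \begin{bmatrix} -1 & 0 & 1 & 0 & 0 \\ 0 & -1 & 0 & 1 & 0 \\ 0 & 0 & -1 & 0 & 1 \end{bmatrix} \begin{bmatrix} X_{-1} \\ X_0 \\ X_1 \\ X_2 \\ X_3 \end{bmatrix}. \tag{2} \]
Since $X_n$ has autocovariance function $C_X(k) = 2^{-|k|}$, $\mathbf{X}$ has the Toeplitz covariance matrix
\[ \mathbf{C}_X = \begin{bmatrix} 1 & 1/2 & 1/4 & 1/8 & 1/16 \\ 1/2 & 1 & 1/2 & 1/4 & 1/8 \\ 1/4 & 1/2 & 1 & 1/2 & 1/4 \\ 1/8 & 1/4 & 1/2 & 1 & 1/2 \\ 1/16 & 1/8 & 1/4 & 1/2 & 1 \end{bmatrix}. \tag{3} \]
Since $\mathbf{Y} = \mathbf{H}\mathbf{X}$,
\[ \mathbf{C}_Y = \mathbf{H}\mathbf{C}_X\mathbf{H}' = \begin{bmatrix} 3/2 & 3/8 & -9/16 \\ 3/8 & 3/2 & 3/8 \\ -9/16 & 3/8 & 3/2 \end{bmatrix}. \tag{4} \]
Some calculation (preferably by Matlab) will show that $\det(\mathbf{C}_Y) = 297/128$ and that
\[ \mathbf{C}_Y^{-1} = \begin{bmatrix} 10/11 & -1/3 & 14/33 \\ -1/3 & 5/6 & -1/3 \\ 14/33 & -1/3 & 10/11 \end{bmatrix}. \tag{5} \]
Some algebra will show that
\[ \mathbf{y}'\mathbf{C}_Y^{-1}\mathbf{y} = \frac{10}{11}y_1^2 + \frac{5}{6}y_2^2 + \frac{10}{11}y_3^2 - \frac{2}{3}y_1y_2 + \frac{28}{33}y_1y_3 - \frac{2}{3}y_2y_3. \tag{6} \]
This implies $\mathbf{Y}$ has PDF
\begin{align}
f_{\mathbf{Y}}(\mathbf{y}) &= \frac{1}{(2\pi)^{3/2}[\det(\mathbf{C}_Y)]^{1/2}} \exp\left( -\frac{1}{2}\mathbf{y}'\mathbf{C}_Y^{-1}\mathbf{y} \right) \tag{7}\\
&= \frac{8\sqrt{2}}{3\sqrt{33}\,(2\pi)^{3/2}} \exp\left( -\frac{5}{11}y_1^2 - \frac{5}{12}y_2^2 - \frac{5}{11}y_3^2 + \frac{1}{3}y_1y_2 - \frac{14}{33}y_1y_3 + \frac{1}{3}y_2y_3 \right). \tag{8}
\end{align}
This solution is yet another demonstration of why the PDF of a Gaussian random vector should be left in vector form.

Problem 11.4.1 Solution

This problem is solved using Theorem 11.9 with $M = 2$ and $k = 1$. The optimum linear predictor filter $\mathbf{h} = \begin{bmatrix} h_0 & h_1 \end{bmatrix}'$ of $X_{n+1}$ given $\mathbf{X}_n = \begin{bmatrix} X_{n-1} & X_n \end{bmatrix}'$ is given by
\[ \overleftarrow{\mathbf{h}} = \begin{bmatrix} h_1 \\ h_0 \end{bmatrix} = \mathbf{R}_{\mathbf{X}_n}^{-1}\mathbf{R}_{\mathbf{X}_nX_{n+k}}, \tag{1} \]
where
\[ \mathbf{R}_{\mathbf{X}_n} = \begin{bmatrix} R_X[0] & R_X[1] \\ R_X[1] & R_X[0] \end{bmatrix} = \begin{bmatrix} 1 & 3/4 \\ 3/4 & 1 \end{bmatrix} \tag{2} \]
and
\[ \mathbf{R}_{\mathbf{X}_nX_{n+1}} = E\left[ \begin{bmatrix} X_{n-1} \\ X_n \end{bmatrix} X_{n+1} \right] = \begin{bmatrix} R_X[2] \\ R_X[1] \end{bmatrix} = \begin{bmatrix} 1/2 \\ 3/4 \end{bmatrix}. \tag{3} \]
Thus the filter vector $\mathbf{h}$ satisfies
\[ \overleftarrow{\mathbf{h}} = \begin{bmatrix} h_1 \\ h_0 \end{bmatrix} = \begin{bmatrix} 1 & 3/4 \\ 3/4 & 1 \end{bmatrix}^{-1} \begin{bmatrix} 1/2 \\ 3/4 \end{bmatrix} = \begin{bmatrix} -1/7 \\ 6/7 \end{bmatrix}. \tag{4} \]
Thus $\mathbf{h} = \begin{bmatrix} 6/7 & -1/7 \end{bmatrix}'$ and the optimum linear predictor of $X_{n+1}$ given $X_n$ and $X_{n-1}$ is
\[ \hat{X}_{n+1} = \overleftarrow{\mathbf{h}}'\mathbf{X}_n = -\frac{1}{7}X_{n-1} + \frac{6}{7}X_n. \tag{5} \]
To find the mean square error of this predictor, we can calculate it directly as
\[ e_L^* = E\left[ (X_{n+1} - \hat{X}_{n+1})^2 \right] = E\left[ \left( -\frac{1}{7}X_{n-1} + \frac{6}{7}X_n - X_{n+1} \right)^2 \right]. \tag{6} \]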
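By expanding the square and taking the expectation, each cross-term is of the form $E[X_iX_j] = R_X[i-j]$, so that
\begin{align}
e_L^* &= E\left[ \left( -\frac{1}{7}X_{n-1} + \frac{6}{7}X_n - X_{n+1} \right)\left( -\frac{1}{7}X_{n-1} + \frac{6}{7}X_n - X_{n+1} \right) \right] \tag{7}\\
&= \frac{1}{49}R_X[0] - \frac{12}{49}R_X[1] + \frac{2}{7}R_X[2] + \frac{36}{49}R_X[0] - \frac{12}{7}R_X[1] + R_X[0] \tag{8}\\
&= \frac{86}{49}R_X[0] - \frac{96}{49}R_X[1] + \frac{2}{7}R_X[2] = \frac{3}{7}. \tag{9}
\end{align}
This direct method is already tedious, even for a simple filter of order $M = 2$. A better way to calculate the mean square error is to recall that Theorem 11.9 is just Theorem 9.7 expressed in the terminology of filters. Expressing part (c) of Theorem 9.7 in terms of the linear prediction filter $\mathbf{h}$, the mean square error of the predictor is
\begin{align}
e_L^* &= \mathrm{Var}[X_{n+1}] - \overleftarrow{\mathbf{h}}'\mathbf{R}_{\mathbf{X}_nX_{n+k}} \tag{10}\\
&= R_X[0] - \overleftarrow{\mathbf{h}}'\begin{bmatrix} R_X[2] \\ R_X[1] \end{bmatrix} \tag{11}\\
&= 1 - \begin{bmatrix} -1/7 & 6/7 \end{bmatrix}\begin{bmatrix} 1/2 \\ 3/4 \end{bmatrix} = \frac{3}{7}. \tag{12}
\end{align}
For an arbitrary filter order $M$, Equation (10) is a much simpler way to compute the mean square error.

Problem 11.4.2 Solution

This problem is solved using Theorem 11.9 with $k = 1$. The optimum linear predictor filter $\mathbf{h} = \begin{bmatrix} h_0 & h_1 \end{bmatrix}'$ of $X_{n+1}$ given $\mathbf{X}_n = \begin{bmatrix} X_{n-1} & X_n \end{bmatrix}'$ is given by
\[ \overleftarrow{\mathbf{h}} = \begin{bmatrix} h_1 \\ h_0 \end{bmatrix} = \mathbf{R}_{\mathbf{X}_n}^{-1}\mathbf{R}_{\mathbf{X}_nX_{n+k}}, \tag{1} \]
where
\[ \mathbf{R}_{\mathbf{X}_n} = \begin{bmatrix} R_X[0] & R_X[1] \\ R_X[1] & R_X[0] \end{bmatrix} = \begin{bmatrix} 1.1 & 0.75 \\ 0.75 & 1.1 \end{bmatrix} \tag{2} \]
and
\[ \mathbf{R}_{\mathbf{X}_nX_{n+1}} = E\left[ \begin{bmatrix} X_{n-1} \\ X_n \end{bmatrix} X_{n+1} \right] = \begin{bmatrix} R_X[2] \\ R_X[1] \end{bmatrix} = \begin{bmatrix} 0.5 \\ 0.75 \end{bmatrix}. \tag{3} \]
Thus the filter vector $\mathbf{h}$ satisfies
\[ \overleftarrow{\mathbf{h}} = \begin{bmatrix} h_1 \\ h_0 \end{bmatrix} = \begin{bmatrix} 1.1 & 0.75 \\ 0.75 & 1.1 \end{bmatrix}^{-1} \begin{bmatrix} 0.5 \\ 0.75 \end{bmatrix} = \begin{bmatrix} -0.0193 \\ 0.6950 \end{bmatrix}. \tag{4} \]
Thus $\mathbf{h} = \begin{bmatrix} 0.6950 & -0.0193 \end{bmatrix}'$ and the optimum linear predictor of $X_{n+1}$ given $X_n$ and $X_{n-1}$ is
\[ \hat{X}_{n+1} = \overleftarrow{\mathbf{h}}'\mathbf{X}_n = -0.0193X_{n-1} + 0.6950X_n. \tag{5} \]
To find the mean square error of this predictor, we can calculate it directly as
\[ e_L^* = E\left[ (X_{n+1} - \hat{X}_{n+1})^2 \right] = E\left[ (-0.0193X_{n-1} + 0.6950X_n - X_{n+1})^2 \right]. \tag{6} \]
We can expand the square and take the expectation term by term since each cross-term is of the form $E[X_iX_j] = R_X[i-j]$. This approach is followed in the solution of Problem 11.4.1 and it is quite tedious. A better way to calculate the mean square error is to recall that Theorem 11.9 is just Theorem 9.7 expressed in the terminology of filters. Expressing part (c) of Theorem 9.7 in terms of the linear prediction filter $\mathbf{h}$, the mean square error of the predictor is
\begin{align}
e_L^* &= \mathrm{Var}[X_{n+1}] - \overleftarrow{\mathbf{h}}'\mathbf{R}_{\mathbf{X}_nX_{n+k}} \tag{7}\\
&= R_X[0] - \overleftarrow{\mathbf{h}}'\begin{bmatrix} R_X[2] \\ R_X[1] \end{bmatrix} \tag{8}\\
&= 1.1 - \begin{bmatrix} -0.0193 & 0.6950 \end{bmatrix}\begin{bmatrix} 1/2 \\ 3/4 \end{bmatrix} = 0.5884. \tag{9}
\end{align}

Comment: It is instructive to compare this solution to the solution of Problem 11.4.1 where the random process, denoted $\tilde{X}_n$ here to distinguish it from $X_n$ in this problem, has autocorrelation function
\[ R_{\tilde{X}}[k] = \begin{cases} 1 - 0.25|k| & |k| \le 4, \\ 0 & \text{otherwise.} \end{cases} \tag{10} \]
The difference is simply that $R_{\tilde{X}}[0] = 1$, rather than $R_X[0] = 1.1$ as in this problem. This difference corresponds to adding an iid noise sequence to $\tilde{X}_n$ to create $X_n$. That is,
\[ X_n = \tilde{X}_n + Z_n \tag{11} \]
where $Z_n$ is an iid additive noise sequence with autocorrelation function $R_Z[k] = 0.1\delta[k]$ that is independent of the $\tilde{X}_n$ process. Thus $X_n$ in this problem can be viewed as a noisy version of $\tilde{X}_n$ in Problem 11.4.1. Because the $\tilde{X}_n$ process is less noisy, the optimal predictor filter of $\tilde{X}_{n+1}$ given $\tilde{X}_{n-1}$ and $\tilde{X}_n$ is $\tilde{\mathbf{h}} = \begin{bmatrix} 6/7 & -1/7 \end{bmatrix}' = \begin{bmatrix} 0.8571 & -0.1429 \end{bmatrix}'$, which places more emphasis on the current value $\tilde{X}_n$ in predicting the next value. In addition, the mean squared error of the predictor of $\tilde{X}_{n+1}$ is only $3/7 = 0.4286$, which is less than 0.5884. Not surprisingly, the noise in the $X_n$ process reduces the performance of the predictor.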
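Both predictors reduce to solving one 2 × 2 linear system and one inner product, which is easy to script. The sketch below handles Problems 11.4.1 and 11.4.2 in a single loop, assuming only the autocorrelation values quoted in the two solutions; the variable names are illustrative.

```matlab
r0  = [1 1.1];                 % R_X[0] for Problem 11.4.1 and Problem 11.4.2
rxy = [0.5; 0.75];             % [R_X[2]; R_X[1]], the same in both problems
hb  = zeros(2, 2);             % column p holds the reversed filter [h1; h0]
mse = zeros(1, 2);

for p = 1:2
    R = toeplitz([r0(p) 0.75]);        % [R_X[0] R_X[1]; R_X[1] R_X[0]]
    hb(:, p) = R \ rxy;                % solve R * [h1; h0] = rxy
    mse(p) = r0(p) - hb(:, p)'*rxy;    % e_L = Var[X] - hb' * rxy
end

disp(hb);
disp(mse);
```

Using the backslash operator avoids forming the matrix inverse explicitly, which matters little at order 2 but is the usual habit for larger filter orders.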
Problem 11.4.3 Solution

This problem generalizes Example 11.14 in that $-0.9$ is replaced by the parameter $c$ and the noise variance 0.2 is replaced by $\eta^2$. Because we are only finding the first order filter $\mathbf{h} = \begin{bmatrix} h_0 & h_1 \end{bmatrix}'$, it is relatively simple to generalize the solution of Example 11.14 to the parameter values $c$ and $\eta^2$. Based on the observation $\mathbf{Y} = \begin{bmatrix} Y_{n-1} & Y_n \end{bmatrix}'$, Theorem 11.11 states that the linear MMSE estimate of $X = X_n$ is $\overleftarrow{\mathbf{h}}'\mathbf{Y}$ where
\[ \overleftarrow{\mathbf{h}} = \mathbf{R}_{\mathbf{Y}}^{-1}\mathbf{R}_{\mathbf{Y}X_n} = (\mathbf{R}_{\mathbf{X}_n} + \mathbf{R}_{\mathbf{W}_n})^{-1}\mathbf{R}_{\mathbf{X}_nX_n}. \tag{1} \]
From Equation (11.82), $\mathbf{R}_{\mathbf{X}_nX_n} = \begin{bmatrix} R_X[1] & R_X[0] \end{bmatrix}' = \begin{bmatrix} c & 1 \end{bmatrix}'$. From the problem statement,
\[ \mathbf{R}_{\mathbf{X}_n} + \mathbf{R}_{\mathbf{W}_n} = \begin{bmatrix} 1 & c \\ c & 1 \end{bmatrix} + \begin{bmatrix} \eta^2 & 0 \\ 0 & \eta^2 \end{bmatrix} = \begin{bmatrix} 1 + \eta^2 & c \\ c & 1 + \eta^2 \end{bmatrix}. \tag{2} \]
This implies
\begin{align}
\overleftarrow{\mathbf{h}} &= \begin{bmatrix} 1 + \eta^2 & c \\ c & 1 + \eta^2 \end{bmatrix}^{-1}\begin{bmatrix} c \\ 1 \end{bmatrix} \tag{3}\\
&= \frac{1}{(1+\eta^2)^2 - c^2}\begin{bmatrix} 1 + \eta^2 & -c \\ -c & 1 + \eta^2 \end{bmatrix}\begin{bmatrix} c \\ 1 \end{bmatrix} \tag{4}\\
&= \frac{1}{(1+\eta^2)^2 - c^2}\begin{bmatrix} c\eta^2 \\ 1 + \eta^2 - c^2 \end{bmatrix}. \tag{5}
\end{align}
The optimal filter is
\[ \mathbf{h} = \frac{1}{(1+\eta^2)^2 - c^2}\begin{bmatrix} 1 + \eta^2 - c^2 \\ c\eta^2 \end{bmatrix}. \tag{6} \]
To find the mean square error of this estimator, we recall that Theorem 11.11 is just Theorem 9.7 expressed in the terminology of filters. Expressing part (c) of Theorem 9.7 in terms of the linear estimation filter $\mathbf{h}$, the mean square error of the estimator is
\begin{align}
e_L^* &= \mathrm{Var}[X_n] - \overleftarrow{\mathbf{h}}'\mathbf{R}_{\mathbf{Y}_nX_n} \tag{7}\\
&= \mathrm{Var}[X_n] - \overleftarrow{\mathbf{h}}'\mathbf{R}_{\mathbf{X}_nX_n} \tag{8}\\
&= R_X[0] - \overleftarrow{\mathbf{h}}'\begin{bmatrix} c \\ 1 \end{bmatrix} \tag{9}\\
&= 1 - \frac{c^2\eta^2 + \eta^2 + 1 - c^2}{(1+\eta^2)^2 - c^2}. \tag{10}
\end{align}
Note that we always find that $e_L^* < \mathrm{Var}[X_n] = 1$ simply because the optimal estimator cannot be worse than the blind estimator that ignores the observation $\mathbf{Y}_n$.

Problem 11.4.4 Solution

In this problem, we find the mean square estimation error of the optimal first order filter in Problem 11.4.3. This problem highlights a shortcoming of Theorem 11.11 in that the theorem doesn't explicitly provide the mean square error associated with the optimal filter $\mathbf{h}$. We recall from the discussion at the start of Section 11.4 that Theorem 11.11 is derived from Theorem 9.7 with
\[ \overleftarrow{\mathbf{h}} = \mathbf{a} = \mathbf{R}_{\mathbf{Y}}^{-1}\mathbf{R}_{\mathbf{Y}X} = (\mathbf{R}_{\mathbf{X}_n} + \mathbf{R}_{\mathbf{W}_n})^{-1}\mathbf{R}_{\mathbf{X}_nX_n}. \tag{1} \]
From Theorem 9.7, the mean square error of the filter output is
\begin{align}
e_L^* &= \mathrm{Var}[X] - \mathbf{a}'\mathbf{R}_{\mathbf{Y}X} \tag{2}\\
&= R_X[0] - \overleftarrow{\mathbf{h}}'\mathbf{R}_{\mathbf{X}_nX_n} \tag{3}\\
&= R_X[0] - \mathbf{R}_{\mathbf{X}_nX_n}'(\mathbf{R}_{\mathbf{X}_n} + \mathbf{R}_{\mathbf{W}_n})^{-1}\mathbf{R}_{\mathbf{X}_nX_n}. \tag{4}
\end{align}
Equations (3) and (4) are general expressions for the mean square error of the optimal linear filter that can be applied to any situation described by Theorem 11.11. To apply this result to the problem at hand, we observe that $R_X[0] = c^0 = 1$ and that
\[ \overleftarrow{\mathbf{h}} = \frac{1}{(1+\eta^2)^2 - c^2}\begin{bmatrix} c\eta^2 \\ 1 + \eta^2 - c^2 \end{bmatrix}, \qquad \mathbf{R}_{\mathbf{X}_nX_n} = \begin{bmatrix} R_X[1] \\ R_X[0] \end{bmatrix} = \begin{bmatrix} c \\ 1 \end{bmatrix}. \tag{5} \]
This implies
\begin{align}
e_L^* &= R_X[0] - \overleftarrow{\mathbf{h}}'\mathbf{R}_{\mathbf{X}_nX_n} \tag{6}\\
&= 1 - \frac{1}{(1+\eta^2)^2 - c^2}\begin{bmatrix} c\eta^2 & 1 + \eta^2 - c^2 \end{bmatrix}\begin{bmatrix} c \\ 1 \end{bmatrix} \tag{7}\\
&= 1 - \frac{c^2\eta^2 + 1 + \eta^2 - c^2}{(1+\eta^2)^2 - c^2} \tag{8}\\
&= \eta^2\,\frac{1 + \eta^2 - c^2}{(1+\eta^2)^2 - c^2}. \tag{9}
\end{align}
The remaining question is what value of $c$ minimizes the mean square error $e_L^*$. The usual approach is to set the derivative $de_L^*/dc$ to zero. This would yield the incorrect answer $c = 0$. In fact, evaluating the second derivative at $c = 0$ shows that $\left.\frac{d^2e_L^*}{dc^2}\right|_{c=0} < 0$. Thus the mean square error $e_L^*$ is maximum at $c = 0$. For a more careful analysis, we observe that $e_L^* = \eta^2f(x)$ where
\[ f(x) = \frac{a - x}{a^2 - x}, \tag{10} \]
with $x = c^2$, and $a = 1 + \eta^2$. In this case, minimizing $f(x)$ is equivalent to minimizing the mean square error. Note that for $R_X[k]$ to be a respectable autocorrelation function, we must have $|c| \le 1$. Thus we consider only values of $x$ in the interval $0 \le x \le 1$. We observe that
\[ \frac{df(x)}{dx} = -\frac{a^2 - a}{(a^2 - x)^2}. \tag{11} \]
Since $a > 1$, the derivative is negative for $0 \le x \le 1$. This implies the mean square error is minimized by making $x$ as large as possible, i.e., $x = 1$. Thus $c = 1$ minimizes the mean square error. In fact $c = 1$ corresponds to the autocorrelation function $R_X[k] = 1$ for all $k$. Since each $X_n$ has zero expected value, every pair of samples $X_n$ and $X_m$ has correlation coefficient
\[ \rho_{X_n,X_m} = \frac{\mathrm{Cov}[X_n, X_m]}{\sqrt{\mathrm{Var}[X_n]\,\mathrm{Var}[X_m]}} = \frac{R_X[n-m]}{R_X[0]} = 1. \tag{12} \]
That is, $c = 1$ corresponds to a degenerate process in which every pair of samples $X_n$ and $X_m$ are perfectly correlated. Physically, this corresponds to the case where the random process $X_n$ is generated by generating a sample of a random variable $X$ and setting $X_n = X$ for all $n$. The observations are then of the form $Y_n = X + Z_n$. That is, each observation is just a noisy observation of the random variable $X$. For $c = 1$, the optimal filter
\[ \mathbf{h} = \frac{1}{2 + \eta^2}\begin{bmatrix} 1 \\ 1 \end{bmatrix} \tag{13} \]
is just an equally weighted average of the past two samples.
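The claim in Problem 11.4.4 that the error is largest at c = 0 and smallest at c = 1 can be seen by evaluating Equation (9) on a grid. This is a rough sketch for one hypothetical noise variance, η² = 0.5; any positive value should show the same monotone behavior.

```matlab
eta2 = 0.5;                    % hypothetical noise variance eta^2
c = 0:0.01:1;                  % admissible range 0 <= c <= 1

% Mean square error from Equation (9) of Problem 11.4.4
e = eta2*(1 + eta2 - c.^2) ./ ((1 + eta2)^2 - c.^2);

[emin, idx] = min(e);
fprintf('minimum e_L = %.4f at c = %.2f\n', emin, c(idx));
fprintf('e_L at c = 0 is %.4f\n', e(1));
```

At c = 1 the formula collapses to η²/(2 + η²), which matches the equally weighted filter in Equation (13).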
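Problem 11.4.5 Solution

This problem involves both linear estimation and prediction because we are using $Y_{n-1}$, the noisy observation of $X_{n-1}$, to estimate the future value $X_n$. Thus we can't follow the cookbook recipes of Theorem 11.9 or Theorem 11.11. Instead we go back to Theorem 9.4 to find the minimum mean square error linear estimator. In that theorem, $X_n$ and $Y_{n-1}$ play the roles of $X$ and $Y$. That is, our estimate $\hat{X}_n$ of $X_n$ is
\[ \hat{X}_n = \hat{X}_L(Y_{n-1}) = \rho_{X_n,Y_{n-1}}\left( \frac{\mathrm{Var}[X_n]}{\mathrm{Var}[Y_{n-1}]} \right)^{1/2}(Y_{n-1} - E[Y_{n-1}]) + E[X_n]. \tag{1} \]
By recursive application of $X_n = cX_{n-1} + Z_{n-1}$, we obtain
\[ X_n = c^nX_0 + \sum_{j=1}^{n} c^{j-1}Z_{n-j}. \tag{2} \]
The expected value of $X_n$ is $E[X_n] = c^nE[X_0] + \sum_{j=1}^{n} c^{j-1}E[Z_{n-j}] = 0$. The variance of $X_n$ is
\[ \mathrm{Var}[X_n] = c^{2n}\,\mathrm{Var}[X_0] + \sum_{j=1}^{n} [c^{j-1}]^2\,\mathrm{Var}[Z_{n-j}] = c^{2n}\,\mathrm{Var}[X_0] + \sigma^2\sum_{j=1}^{n} [c^2]^{j-1}. \tag{3} \]
Since $\mathrm{Var}[X_0] = \sigma^2/(1-c^2)$, we obtain
\[ \mathrm{Var}[X_n] = \frac{c^{2n}\sigma^2}{1-c^2} + \frac{\sigma^2(1 - c^{2n})}{1-c^2} = \frac{\sigma^2}{1-c^2}. \tag{4} \]
Note that $E[Y_{n-1}] = dE[X_{n-1}] + E[W_{n-1}] = 0$. The variance of $Y_{n-1}$ is
\[ \mathrm{Var}[Y_{n-1}] = d^2\,\mathrm{Var}[X_{n-1}] + \mathrm{Var}[W_{n-1}] = \frac{d^2\sigma^2}{1-c^2} + \eta^2. \tag{5} \]
Since $X_n$ and $Y_{n-1}$ have zero mean, the covariance of $X_n$ and $Y_{n-1}$ is
\[ \mathrm{Cov}[X_n, Y_{n-1}] = E[X_nY_{n-1}] = E[(cX_{n-1} + Z_{n-1})(dX_{n-1} + W_{n-1})]. \tag{6} \]
From the problem statement, we learn that
\[ E[X_{n-1}W_{n-1}] = 0, \quad E[X_{n-1}]E[W_{n-1}] = 0, \quad E[Z_{n-1}X_{n-1}] = 0, \quad E[Z_{n-1}W_{n-1}] = 0. \tag{7} \]
Hence, the covariance of $X_n$ and $Y_{n-1}$ is
\[ \mathrm{Cov}[X_n, Y_{n-1}] = cd\,\mathrm{Var}[X_{n-1}]. \tag{8} \]
The correlation coefficient of $X_n$ and $Y_{n-1}$ is
\[ \rho_{X_n,Y_{n-1}} = \frac{\mathrm{Cov}[X_n, Y_{n-1}]}{\sqrt{\mathrm{Var}[X_n]\,\mathrm{Var}[Y_{n-1}]}}. \]
Since $E[Y_{n-1}]$ and $E[X_n]$ are zero, the linear predictor for $X_n$ becomes
\[ \hat{X}_n = \rho_{X_n,Y_{n-1}}\left( \frac{\mathrm{Var}[X_n]}{\mathrm{Var}[Y_{n-1}]} \right)^{1/2}Y_{n-1} = \frac{\mathrm{Cov}[X_n, Y_{n-1}]}{\mathrm{Var}[Y_{n-1}]}\,Y_{n-1} = \frac{cd\,\mathrm{Var}[X_{n-1}]}{\mathrm{Var}[Y_{n-1}]}\,Y_{n-1}. \tag{9} \]
Substituting the above result for $\mathrm{Var}[X_n]$, we obtain the optimal linear predictor of $X_n$ given $Y_{n-1}$:
\[ \hat{X}_n = \frac{c}{d}\,\frac{1}{1 + \beta^2(1-c^2)}\,Y_{n-1}, \tag{10} \]
where $\beta^2 = \eta^2/(d^2\sigma^2)$. From Theorem 9.4, the mean square estimation error at step $n$ is
\[ e_L^*(n) = E[(X_n - \hat{X}_n)^2] = \mathrm{Var}[X_n](1 - \rho_{X_n,Y_{n-1}}^2) = \sigma^2\,\frac{1 + \beta^2}{1 + \beta^2(1-c^2)}. \tag{11} \]
We see that the mean square estimation error $e_L^*(n) = e_L^*$ is a constant for all $n$. In addition, $e_L^*$ is an increasing function of $\beta$.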
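The closed forms in Equations (10) and (11) of Problem 11.4.5 follow from a few lines of algebra that are easy to get wrong, so a numerical consistency check is a cheap safeguard. The sketch below uses hypothetical parameter values (c = 0.9, d = 2, σ² = 1, η² = 0.5) and compares the moment-based expressions with the closed forms.

```matlab
% Hypothetical parameter values, chosen only for illustration
c = 0.9; d = 2; sigma2 = 1; eta2 = 0.5;

% Second-order moments from Equations (4), (5) and (8)
varX  = sigma2/(1 - c^2);
varY  = d^2*varX + eta2;
covXY = c*d*varX;

% Predictor gain and mean square error computed from the moments
gain = covXY/varY;
rho2 = covXY^2/(varX*varY);
mse  = varX*(1 - rho2);

% Closed forms from Equations (10) and (11)
beta2 = eta2/(d^2*sigma2);
gainClosed = (c/d)/(1 + beta2*(1 - c^2));
mseClosed  = sigma2*(1 + beta2)/(1 + beta2*(1 - c^2));

fprintf('gain %.6f vs %.6f, mse %.6f vs %.6f\n', gain, gainClosed, mse, mseClosed);
```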
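Problem 11.5.1 Solution

To use Table 11.1, we write $R_X(\tau)$ in terms of the function
\[ \mathrm{sinc}(x) = \frac{\sin(\pi x)}{\pi x}. \tag{1} \]
In terms of the sinc(·) function, we obtain
\[ R_X(\tau) = 10\,\mathrm{sinc}(2000\tau) + 5\,\mathrm{sinc}(1000\tau). \tag{2} \]
From Table 11.1,
\[ S_X(f) = \frac{10}{2{,}000}\,\mathrm{rect}\left( \frac{f}{2000} \right) + \frac{5}{1{,}000}\,\mathrm{rect}\left( \frac{f}{1{,}000} \right). \tag{3} \]
Here is a graph of the PSD.

[Figure: $S_X(f)$ versus $f$ for $-1500 \le f \le 1500$]

Problem 11.5.2 Solution

The process $Y(t)$ has expected value $E[Y(t)] = 0$. The autocorrelation of $Y(t)$ is
\[ R_Y(t,\tau) = E[Y(t)Y(t+\tau)] = E[X(\alpha t)X(\alpha(t+\tau))] = R_X(\alpha\tau). \tag{1} \]
Thus $Y(t)$ is wide sense stationary. The power spectral density is
\[ S_Y(f) = \int_{-\infty}^{\infty} R_X(\alpha\tau)e^{-j2\pi f\tau}\,d\tau. \tag{2} \]
At this point, we consider the cases $\alpha > 0$ and $\alpha < 0$ separately. For $\alpha > 0$, the substitution $\tau' = \alpha\tau$ yields
\[ S_Y(f) = \frac{1}{\alpha}\int_{-\infty}^{\infty} R_X(\tau')e^{-j2\pi(f/\alpha)\tau'}\,d\tau' = \frac{S_X(f/\alpha)}{\alpha}. \tag{3} \]
When $\alpha < 0$, we start with Equation (2) and make the substitution $\tau' = -\alpha\tau$, yielding
\[ S_Y(f) = \frac{1}{-\alpha}\int_{-\infty}^{\infty} R_X(-\tau')e^{-j2\pi\frac{f}{-\alpha}\tau'}\,d\tau'. \tag{4} \]
Since $R_X(-\tau') = R_X(\tau')$,
\[ S_Y(f) = \frac{1}{-\alpha}\int_{-\infty}^{\infty} R_X(\tau')e^{-j2\pi\frac{f}{-\alpha}\tau'}\,d\tau' = \frac{1}{-\alpha}S_X\left( \frac{f}{-\alpha} \right). \tag{5} \]
Since $-\alpha = |\alpha|$ for $\alpha < 0$, we can combine the $\alpha > 0$ and $\alpha < 0$ cases in the expression
\[ S_Y(f) = \frac{1}{|\alpha|}S_X\left( \frac{f}{|\alpha|} \right). \tag{6} \]

Problem 11.6.1 Solution

Since the random sequence $X_n$ has autocorrelation function
\[ R_X[k] = \delta_k + (0.1)^{|k|}, \tag{1} \]
we can find the PSD directly from Table 11.2 with $0.1^{|k|}$ corresponding to $a^{|k|}$. The table yields
\[ S_X(\phi) = 1 + \frac{1 - (0.1)^2}{1 + (0.1)^2 - 2(0.1)\cos 2\pi\phi} = \frac{2 - 0.2\cos 2\pi\phi}{1.01 - 0.2\cos 2\pi\phi}. \tag{2} \]

Problem 11.7.1 Solution

First we show that $S_{YX}(f) = S_{XY}(-f)$. From the definition of the cross spectral density,
\[ S_{YX}(f) = \int_{-\infty}^{\infty} R_{YX}(\tau)e^{-j2\pi f\tau}\,d\tau. \tag{1} \]
Making the substitution $\tau' = -\tau$ yields
\[ S_{YX}(f) = \int_{-\infty}^{\infty} R_{YX}(-\tau')e^{j2\pi f\tau'}\,d\tau'. \tag{2} \]
By Theorem 10.14, $R_{YX}(-\tau') = R_{XY}(\tau')$. This implies
\[ S_{YX}(f) = \int_{-\infty}^{\infty} R_{XY}(\tau')e^{-j2\pi(-f)\tau'}\,d\tau' = S_{XY}(-f). \tag{3} \]
To complete the problem, we need to show that $S_{XY}(-f) = [S_{XY}(f)]^*$. First we note that since $R_{XY}(\tau)$ is real valued, $[R_{XY}(\tau)]^* = R_{XY}(\tau)$. This implies
\begin{align}
[S_{XY}(f)]^* &= \int_{-\infty}^{\infty} [R_{XY}(\tau)]^*[e^{-j2\pi f\tau}]^*\,d\tau \tag{4}\\
&= \int_{-\infty}^{\infty} R_{XY}(\tau)e^{-j2\pi(-f)\tau}\,d\tau \tag{5}\\
&= S_{XY}(-f). \tag{6}
\end{align}

Problem 11.8.1 Solution

Let $a = 1/RC$. The solution to this problem parallels Example 11.22.

(a) From Table 11.1, we observe that
\[ S_X(f) = \frac{2\cdot 10^4}{(2\pi f)^2 + 10^4}, \qquad H(f) = \frac{1}{a + j2\pi f}. \tag{1} \]
By Theorem 11.16,
\[ S_Y(f) = |H(f)|^2S_X(f) = \frac{2\cdot 10^4}{[(2\pi f)^2 + a^2][(2\pi f)^2 + 10^4]}. \tag{2} \]
To find $R_Y(\tau)$, we use a form of partial fractions expansion to write
\[ S_Y(f) = \frac{A}{(2\pi f)^2 + a^2} + \frac{B}{(2\pi f)^2 + 10^4}. \tag{3} \]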
Note that this method will work only if $a \ne 100$. This same method was also used in Example 11.22. The values of $A$ and $B$ can be found by
\[ A = \left.\frac{2\cdot 10^4}{(2\pi f)^2 + 10^4}\right|_{f = \frac{ja}{2\pi}} = \frac{-2\cdot 10^4}{a^2 - 10^4}, \qquad B = \left.\frac{2\cdot 10^4}{(2\pi f)^2 + a^2}\right|_{f = \frac{j100}{2\pi}} = \frac{2\cdot 10^4}{a^2 - 10^4}. \tag{4} \]
This implies the output power spectral density is
\[ S_Y(f) = \frac{-10^4/a}{a^2 - 10^4}\,\frac{2a}{(2\pi f)^2 + a^2} + \frac{100}{a^2 - 10^4}\,\frac{200}{(2\pi f)^2 + 10^4}. \tag{5} \]
Since $e^{-c|\tau|}$ and $2c/((2\pi f)^2 + c^2)$ are Fourier transform pairs for any constant $c > 0$, we see that
\[ R_Y(\tau) = \frac{-10^4/a}{a^2 - 10^4}\,e^{-a|\tau|} + \frac{100}{a^2 - 10^4}\,e^{-100|\tau|}. \tag{6} \]

(b) To find $a = 1/(RC)$, we use the fact that
\[ E[Y^2(t)] = 100 = R_Y(0) = \frac{-10^4/a}{a^2 - 10^4} + \frac{100}{a^2 - 10^4}. \tag{7} \]
Rearranging, we find that $a$ must satisfy
\[ a^3 - (10^4 + 1)a + 100 = 0. \tag{8} \]
This cubic polynomial has three roots:
\[ a = 100, \qquad a = -50 + \sqrt{2501}, \qquad a = -50 - \sqrt{2501}. \tag{9} \]
Recall that $a = 100$ is not a valid solution because our expansion of $S_Y(f)$ was not valid for $a = 100$. Also, we require $a > 0$ in order to take the inverse transform of $S_Y(f)$. Thus $a = -50 + \sqrt{2501} \approx 0.01$ and $RC \approx 100$.

Problem 11.8.2 Solution

(a) $R_W(\tau) = \delta(\tau)$ is the autocorrelation function whose Fourier transform is $S_W(f) = 1$.

(b) The output $Y(t)$ has power spectral density
\[ S_Y(f) = |H(f)|^2S_W(f) = |H(f)|^2. \tag{1} \]

(c) Since $|H(f)| = 1$ for $f \in [-B, B]$, the average power of $Y(t)$ is
\[ E[Y^2(t)] = \int_{-\infty}^{\infty} S_Y(f)\,df = \int_{-B}^{B} df = 2B. \tag{2} \]

(d) Since the white noise $W(t)$ has zero mean, the mean value of the filter output is
\[ E[Y(t)] = E[W(t)]H(0) = 0. \tag{3} \]
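The cubic in part (b) of Problem 11.8.1 can also be solved numerically, which gives a quick check on the factored roots. A minimal sketch follows; the selection rule for the valid root (positive and not equal to 100) mirrors the argument in the solution.

```matlab
% Roots of a^3 - (10^4 + 1)a + 100 = 0
r = real(roots([1 0 -(1e4 + 1) 100]));

% Keep the positive root that differs from 100
a = r(r > 0 & r < 1);
RC = 1/a;

% Confirm that this root reproduces R_Y(0) = 100 in Equation (7)
RY0 = (-1e4/a + 100)/(a^2 - 1e4);

fprintf('a = %.6f, RC = %.2f, R_Y(0) = %.4f\n', a, RC, RY0);
```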
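Problem 11.8.3 Solution

Since $S_Y(f) = |H(f)|^2S_X(f)$, we first find
\begin{align}
|H(f)|^2 &= H(f)H^*(f) \tag{1}\\
&= \left( a_1e^{-j2\pi ft_1} + a_2e^{-j2\pi ft_2} \right)\left( a_1e^{j2\pi ft_1} + a_2e^{j2\pi ft_2} \right) \tag{2}\\
&= a_1^2 + a_2^2 + a_1a_2\left( e^{-j2\pi f(t_2 - t_1)} + e^{-j2\pi f(t_1 - t_2)} \right). \tag{3}
\end{align}
It follows that the output power spectral density is
\[ S_Y(f) = (a_1^2 + a_2^2)S_X(f) + a_1a_2S_X(f)\,e^{-j2\pi f(t_2 - t_1)} + a_1a_2S_X(f)\,e^{-j2\pi f(t_1 - t_2)}. \tag{4} \]
Using Table 11.1, the autocorrelation of the output is
\[ R_Y(\tau) = (a_1^2 + a_2^2)R_X(\tau) + a_1a_2\left( R_X(\tau - (t_1 - t_2)) + R_X(\tau + (t_1 - t_2)) \right). \tag{5} \]

Problem 11.8.4 Solution

(a) The average power of the input is
\[ E[X^2(t)] = R_X(0) = 1. \tag{1} \]

(b) From Table 11.1, the input has power spectral density
\[ S_X(f) = \frac{1}{2}e^{-\pi f^2/4}. \tag{2} \]
The output power spectral density is
\[ S_Y(f) = |H(f)|^2S_X(f) = \begin{cases} \frac{1}{2}e^{-\pi f^2/4} & |f| \le 2, \\ 0 & \text{otherwise.} \end{cases} \tag{3} \]

(c) The average output power is
\[ E[Y^2(t)] = \int_{-\infty}^{\infty} S_Y(f)\,df = \frac{1}{2}\int_{-2}^{2} e^{-\pi f^2/4}\,df. \tag{4} \]
This integral cannot be expressed in closed form. However, we can express it in the form of the integral of a standardized Gaussian PDF by making the substitution $f = z\sqrt{2/\pi}$. With this substitution,
\begin{align}
E[Y^2(t)] &= \frac{1}{\sqrt{2\pi}}\int_{-\sqrt{2\pi}}^{\sqrt{2\pi}} e^{-z^2/2}\,dz \tag{5}\\
&= \Phi(\sqrt{2\pi}) - \Phi(-\sqrt{2\pi}) \tag{6}\\
&= 2\Phi(\sqrt{2\pi}) - 1 \approx 0.988. \tag{7}
\end{align}
The output power almost equals the input power because the filter bandwidth is sufficiently wide to pass through nearly all of the power of the input.

Problem 11.8.5 Solution

(a) From Theorem 11.13(b),
\[ E[X^2(t)] = \int_{-\infty}^{\infty} S_X(f)\,df = \int_{-100}^{100} 10^{-4}\,df = 0.02. \tag{1} \]

(b) From Theorem 11.17,
\[ S_{XY}(f) = H(f)S_X(f) = \begin{cases} 10^{-4}H(f) & |f| \le 100, \\ 0 & \text{otherwise.} \end{cases} \tag{2} \]

(c) From Theorem 10.14,
\[ R_{YX}(\tau) = R_{XY}(-\tau). \tag{3} \]
From Table 11.1, if $g(\tau)$ and $G(f)$ are a Fourier transform pair, then $g(-\tau)$ and $G^*(f)$ are a Fourier transform pair. This implies
\[ S_{YX}(f) = S_{XY}^*(f) = \begin{cases} 10^{-4}H^*(f) & |f| \le 100, \\ 0 & \text{otherwise.} \end{cases} \tag{4} \]

(d) By Theorem 11.17,
\[ S_Y(f) = H^*(f)S_{XY}(f) = |H(f)|^2S_X(f) \tag{5} \]
\[ = \begin{cases} 10^{-4}/[10^4\pi^2 + (2\pi f)^2] & |f| \le 100, \\ 0 & \text{otherwise.} \end{cases} \tag{6} \]

(e) By Theorem 11.13,
\begin{align}
E[Y^2(t)] = \int_{-\infty}^{\infty} S_Y(f)\,df &= \int_{-100}^{100} \frac{10^{-4}}{10^4\pi^2 + 4\pi^2f^2}\,df \tag{7}\\
&= \frac{2}{10^8\pi^2}\int_{0}^{100} \frac{df}{1 + (f/50)^2}. \tag{8}
\end{align}
By making the substitution $f = 50\tan\theta$, we have $df = 50\sec^2\theta\,d\theta$. Using the identity $1 + \tan^2\theta = \sec^2\theta$, we have
\[ E[Y^2(t)] = \frac{100}{10^8\pi^2}\int_{0}^{\tan^{-1}(2)} d\theta = \frac{\tan^{-1}(2)}{10^6\pi^2} = 1.12 \times 10^{-7}. \tag{9} \]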
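The two power integrals in Problems 11.8.4(c) and 11.8.5(e) are simple enough to cross-check by numerical integration. The sketch below uses the trapezoidal rule on a fine grid (the grid sizes are arbitrary choices) and compares against the closed forms, using the identity 2Φ(z) − 1 = erf(z/√2).

```matlab
% Problem 11.8.4(c): half the integral of exp(-pi f^2/4) over [-2, 2]
f4 = linspace(-2, 2, 4001);
P4 = trapz(f4, 0.5*exp(-pi*f4.^2/4));
P4closed = erf(sqrt(pi));              % equals 2*Phi(sqrt(2*pi)) - 1

% Problem 11.8.5(e): integral of S_Y(f) over [-100, 100]
f5 = linspace(-100, 100, 200001);
P5 = trapz(f5, 1e-4 ./ (1e4*pi^2 + (2*pi*f5).^2));
P5closed = atan(2)/(1e6*pi^2);

fprintf('11.8.4: numeric %.6f, closed form %.6f\n', P4, P4closed);
fprintf('11.8.5: numeric %.4e, closed form %.4e\n', P5, P5closed);
```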
Problem 11.8.6 Solution

The easy way to do this problem is to use Theorem 11.17 which states
\[ S_{XY}(f) = H(f)S_X(f). \tag{1} \]

(a) From Table 11.1, we observe that
\[ S_X(f) = \frac{8}{16 + (2\pi f)^2}, \qquad H(f) = \frac{1}{7 + j2\pi f}. \tag{2} \]
From Theorem 11.17,
\[ S_{XY}(f) = H(f)S_X(f) = \frac{8}{[7 + j2\pi f][16 + (2\pi f)^2]}. \tag{3} \]

(b) To find the cross correlation, we need to find the inverse Fourier transform of $S_{XY}(f)$. A straightforward way to do this is to use a partial fraction expansion of $S_{XY}(f)$. That is, by defining $s = j2\pi f$, we observe that
\[ \frac{8}{(7+s)(4+s)(4-s)} = \frac{-8/33}{7+s} + \frac{1/3}{4+s} + \frac{1/11}{4-s}. \tag{4} \]
Hence, we can write the cross spectral density as
\[ S_{XY}(f) = \frac{-8/33}{7 + j2\pi f} + \frac{1/3}{4 + j2\pi f} + \frac{1/11}{4 - j2\pi f}. \tag{5} \]
Unfortunately, terms like $1/(a - j2\pi f)$ do not have an inverse transform. The solution is to write $S_{XY}(f)$ in the following way:
\begin{align}
S_{XY}(f) &= \frac{-8/33}{7 + j2\pi f} + \frac{8/33}{4 + j2\pi f} + \frac{1/11}{4 + j2\pi f} + \frac{1/11}{4 - j2\pi f} \tag{6}\\
&= \frac{-8/33}{7 + j2\pi f} + \frac{8/33}{4 + j2\pi f} + \frac{8/11}{16 + (2\pi f)^2}. \tag{7}
\end{align}
Now, we see from Table 11.1 that the inverse transform is
\[ R_{XY}(\tau) = -\frac{8}{33}e^{-7\tau}u(\tau) + \frac{8}{33}e^{-4\tau}u(\tau) + \frac{1}{11}e^{-4|\tau|}. \tag{8} \]

Problem 11.8.7 Solution

(a) Since $E[N(t)] = \mu_N = 0$, the expected value of the output is $\mu_Y = \mu_NH(0) = 0$.

(b) The output power spectral density is
\[ S_Y(f) = |H(f)|^2S_N(f) = 10^{-3}e^{-2\times 10^6|f|}. \tag{1} \]

(c) The average power is
\begin{align}
E[Y^2(t)] = \int_{-\infty}^{\infty} S_Y(f)\,df &= \int_{-\infty}^{\infty} 10^{-3}e^{-2\times 10^6|f|}\,df \tag{2}\\
&= 2\times 10^{-3}\int_{0}^{\infty} e^{-2\times 10^6f}\,df \tag{3}\\
&= \frac{2\times 10^{-3}}{2\times 10^6} = 10^{-9}. \tag{4}
\end{align}

(d) Since $N(t)$ is a Gaussian process, Theorem 11.3 says $Y(t)$ is a Gaussian process. Thus the random variable $Y(t)$ is Gaussian with
\[ E[Y(t)] = 0, \qquad \mathrm{Var}[Y(t)] = E[Y^2(t)] = 10^{-9}. \tag{5} \]
Thus we can calculate
\begin{align}
P[Y(t) > 0.01] &= P\left[ \frac{Y(t)}{\sqrt{\mathrm{Var}[Y(t)]}} > \frac{0.01}{\sqrt{\mathrm{Var}[Y(t)]}} \right] \tag{6}\\
&= 1 - \Phi\left( \frac{0.01}{\sqrt{10^{-9}}} \right) \tag{7}\\
&= 1 - \Phi(316.2) \approx 0. \tag{8}
\end{align}
The argument 316.2 is far beyond the range of Table 3.1, so the probability is zero for all practical purposes.
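The average power in part (c) of Problem 11.8.7 is the integral of a two-sided exponential, and it is worth evaluating numerically because the result feeds directly into the Gaussian tail probability in part (d). This is a sketch only; the integration limit of 2 × 10⁻⁵ is an assumed cutoff, chosen so that the neglected tail is of order e⁻⁴⁰.

```matlab
% Integrate S_Y(f) = 1e-3*exp(-2e6*|f|) over f >= 0 and double it
f = linspace(0, 2e-5, 200001);
P = 2*trapz(f, 1e-3*exp(-2e6*f));

% Closed form: 2e-3 / 2e6
Pclosed = 2e-3/2e6;

% Gaussian tail P[Y > 0.01] with variance P, written with erfc
tail = 0.5*erfc(0.01/sqrt(2*P));

fprintf('numeric power %.4e, closed form %.4e, tail %.3e\n', P, Pclosed, tail);
```

Writing the tail as `0.5*erfc(x/sqrt(2))` avoids any toolbox dependency for the Gaussian CDF.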
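Problem 11.8.8 Solution

Suppose we assume that $N(t)$ and $Y(t)$ are the input and output of a linear time invariant filter $h(u)$. In that case,
\[ Y(t) = \int_{0}^{t} N(u)\,du = \int_{-\infty}^{\infty} h(t-u)N(u)\,du. \tag{1} \]
For the above two integrals to be the same, we must have
\[ h(t-u) = \begin{cases} 1 & 0 \le t - u \le t, \\ 0 & \text{otherwise.} \end{cases} \tag{2} \]
Making the substitution $v = t - u$, we have
\[ h(v) = \begin{cases} 1 & 0 \le v \le t, \\ 0 & \text{otherwise.} \end{cases} \tag{3} \]
Thus the impulse response $h(v)$ depends on $t$. That is, the filter response is linear but not time invariant. Since Theorem 11.2 requires that $h(t)$ be time invariant, this example does not violate the theorem.

Problem 11.8.9 Solution

(a) Note that $|\hat{H}(f)| = 1$. This implies $S_{\hat{M}}(f) = S_M(f)$. Thus the average power of $\hat{M}(t)$ is
\[ \hat{q} = \int_{-\infty}^{\infty} S_{\hat{M}}(f)\,df = \int_{-\infty}^{\infty} S_M(f)\,df = q. \tag{1} \]

(b) The average power of the upper sideband signal is
\begin{align}
E[U^2(t)] &= E[M^2(t)\cos^2(2\pi f_ct + \Theta)] \tag{2}\\
&\quad - E[2M(t)\hat{M}(t)\cos(2\pi f_ct + \Theta)\sin(2\pi f_ct + \Theta)] \tag{3}\\
&\quad + E[\hat{M}^2(t)\sin^2(2\pi f_ct + \Theta)]. \tag{4}
\end{align}
To find the expected value of the random phase cosine, for an integer $n \ne 0$, we evaluate
\begin{align}
E[\cos(2\pi f_ct + n\Theta)] &= \int_{0}^{2\pi} \cos(2\pi f_ct + n\theta)\,\frac{1}{2\pi}\,d\theta \tag{5}\\
&= \frac{1}{2n\pi}\sin(2\pi f_ct + n\theta)\Big|_{0}^{2\pi} \tag{6}\\
&= \frac{1}{2n\pi}\left( \sin(2\pi f_ct + 2n\pi) - \sin(2\pi f_ct) \right) = 0. \tag{7}
\end{align}
Similar steps will show that for any integer $n \ne 0$, the random phase sine also has expected value
\[ E[\sin(2\pi f_ct + n\Theta)] = 0. \tag{8} \]
Using the trigonometric identity $\cos^2\phi = (1 + \cos 2\phi)/2$, we can show
\[ E[\cos^2(2\pi f_ct + \Theta)] = E\left[ \frac{1}{2}(1 + \cos(2\pi(2f_c)t + 2\Theta)) \right] = 1/2. \tag{9} \]
Similarly,
\[ E[\sin^2(2\pi f_ct + \Theta)] = E\left[ \frac{1}{2}(1 - \cos(2\pi(2f_c)t + 2\Theta)) \right] = 1/2. \tag{10} \]
In addition, the identity $2\sin\phi\cos\phi = \sin 2\phi$ implies
\[ E[2\sin(2\pi f_ct + \Theta)\cos(2\pi f_ct + \Theta)] = E[\sin(4\pi f_ct + 2\Theta)] = 0. \tag{11} \]
Since $M(t)$ and $\hat{M}(t)$ are independent of $\Theta$, the average power of the upper sideband signal is
\begin{align}
E[U^2(t)] &= E[M^2(t)]E[\cos^2(2\pi f_ct + \Theta)] + E[\hat{M}^2(t)]E[\sin^2(2\pi f_ct + \Theta)] \tag{12}\\
&\quad - E[M(t)\hat{M}(t)]E[2\cos(2\pi f_ct + \Theta)\sin(2\pi f_ct + \Theta)] \tag{13}\\
&= q/2 + q/2 + 0 = q. \tag{14}
\end{align}

Problem 11.8.10 Solution

(a) Since $S_W(f) = 10^{-15}$ for all $f$, $R_W(\tau) = 10^{-15}\delta(\tau)$.

(b) Since $\Theta$ is independent of $W(t)$,
\[ E[V(t)] = E[W(t)\cos(2\pi f_ct + \Theta)] = E[W(t)]E[\cos(2\pi f_ct + \Theta)] = 0. \]

(c) We cannot initially assume $V(t)$ is WSS so we first find the autocorrelation $R_V(t,\tau)$.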
The autocorrelation is
\begin{align}
R_V(t,\tau) &= E[V(t)V(t+\tau)] \tag{1}\\
&= E[W(t)\cos(2\pi f_ct + \Theta)\,W(t+\tau)\cos(2\pi f_c(t+\tau) + \Theta)] \tag{2}\\
&= E[W(t)W(t+\tau)]\,E[\cos(2\pi f_ct + \Theta)\cos(2\pi f_c(t+\tau) + \Theta)] \tag{3}\\
&= 10^{-15}\delta(\tau)\,E[\cos(2\pi f_ct + \Theta)\cos(2\pi f_c(t+\tau) + \Theta)]. \tag{4}
\end{align}
We see that for all $\tau \ne 0$, $R_V(t,\tau) = 0$. Thus we need to find the expected value of
\[ E[\cos(2\pi f_ct + \Theta)\cos(2\pi f_c(t+\tau) + \Theta)] \tag{5} \]
only at $\tau = 0$. However, it's good practice to solve for arbitrary $\tau$:
\begin{align}
&E[\cos(2\pi f_ct + \Theta)\cos(2\pi f_c(t+\tau) + \Theta)] \tag{6}\\
&\quad= \frac{1}{2}E[\cos(2\pi f_c\tau) + \cos(2\pi f_c(2t+\tau) + 2\Theta)] \tag{7}\\
&\quad= \frac{1}{2}\cos(2\pi f_c\tau) + \frac{1}{2}\int_{0}^{2\pi} \cos(2\pi f_c(2t+\tau) + 2\theta)\,\frac{1}{2\pi}\,d\theta \tag{8}\\
&\quad= \frac{1}{2}\cos(2\pi f_c\tau) + \frac{1}{8\pi}\sin(2\pi f_c(2t+\tau) + 2\theta)\Big|_{0}^{2\pi} \tag{9}\\
&\quad= \frac{1}{2}\cos(2\pi f_c\tau) + \frac{1}{8\pi}\left[ \sin(2\pi f_c(2t+\tau) + 4\pi) - \sin(2\pi f_c(2t+\tau)) \right] \tag{10}\\
&\quad= \frac{1}{2}\cos(2\pi f_c\tau). \tag{11}
\end{align}
Consequently,
\[ R_V(t,\tau) = \frac{1}{2}10^{-15}\delta(\tau)\cos(2\pi f_c\tau) = \frac{1}{2}10^{-15}\delta(\tau). \tag{12} \]

(d) Since $E[V(t)] = 0$ and since $R_V(t,\tau) = R_V(\tau)$, we see that $V(t)$ is a wide sense stationary process. Since $L(f)$ is a linear time invariant filter, the filter output $Y(t)$ is also a wide sense stationary process.

(e) The filter input $V(t)$ has power spectral density $S_V(f) = \frac{1}{2}10^{-15}$. The filter output has power spectral density
\[ S_Y(f) = |L(f)|^2S_V(f) = \begin{cases} 10^{-15}/2 & |f| \le B, \\ 0 & \text{otherwise.} \end{cases} \tag{13} \]
The average power of $Y(t)$ is
\[ E[Y^2(t)] = \int_{-\infty}^{\infty} S_Y(f)\,df = \int_{-B}^{B} \frac{1}{2}10^{-15}\,df = 10^{-15}B. \tag{14} \]
the minimum mean square error is

\begin{align*}
e_L^* &= \int_{-\infty}^{\infty} \frac{S_X(f)S_N(f)}{S_X(f) + S_N(f)}\,df = \int_{-\infty}^{\infty} \hat{H}(f)S_N(f)\,df \\
&= \int_{-W}^{W} \frac{10^5}{10^5 + 2W}\,10^{-5}\,df = \frac{2W}{10^5 + 2W}.
\end{align*}

It follows that the mean square error satisfies $e_L^* \le 0.04$ if and only if $W \le 2{,}083.3$ Hz. What is occurring in this problem is that the optimal filter is simply an ideal lowpass filter of bandwidth $W$. Increasing $W$ increases the bandwidth of the signal and the bandwidth of the filter $\hat{H}(f)$. This allows more noise to pass through the filter and decreases the quality of our estimator.

Problem 11.9.2 Solution

The system described in this problem corresponds exactly to the system in the text that yielded Equation (11.146).

(a) From Equation (11.146), the optimal linear filter is

\[ \hat{H}(f) = \frac{S_X(f)}{S_X(f) + S_N(f)}. \]

In this problem, $R_X(\tau) = e^{-5000|\tau|}$ so that

\[ S_X(f) = \frac{10^4}{(5{,}000)^2 + (2\pi f)^2}. \]

It follows that the optimal filter is

\[ \hat{H}(f) = \frac{\frac{10^4}{(5{,}000)^2 + (2\pi f)^2}}{\frac{10^4}{(5{,}000)^2 + (2\pi f)^2} + 10^{-5}} = \frac{10^9}{1.025 \times 10^9 + (2\pi f)^2}. \]

From Table 11.1, we see that the filter $\hat{H}(f)$ has impulse response

\[ \hat{h}(\tau) = \frac{10^9}{2\alpha}e^{-\alpha|\tau|} \]

where $\alpha = \sqrt{1.025 \times 10^9} = 3.20 \times 10^4$.

(b) From Equation (11.147), the minimum mean square error is

\begin{align*}
e_L^* &= \int_{-\infty}^{\infty} \frac{S_X(f)S_N(f)}{S_X(f) + S_N(f)}\,df = \int_{-\infty}^{\infty} \hat{H}(f)S_N(f)\,df \\
&= 10^{-5}\int_{-\infty}^{\infty} \hat{H}(f)\,df = 10^{-5}\,\hat{h}(0) = \frac{10^4}{2\alpha} = 0.1562.
\end{align*}

Problem 11.10.1 Solution

Although it is straightforward to calculate sample paths of $Y_n$ using the filter response $Y_n = \tfrac{1}{2}Y_{n-1} + \tfrac{1}{2}X_n$ directly, the necessary loops make for a slow program. A solution using vectors and matrices tends to run faster. From the filter response, we can write

\begin{align*}
Y_1 &= \tfrac{1}{2}X_1 \\
Y_2 &= \tfrac{1}{4}X_1 + \tfrac{1}{2}X_2 \\
Y_3 &= \tfrac{1}{8}X_1 + \tfrac{1}{4}X_2 + \tfrac{1}{2}X_3 \\
&\;\;\vdots \\
Y_n &= \tfrac{1}{2^n}X_1 + \tfrac{1}{2^{n-1}}X_2 + \cdots + \tfrac{1}{2}X_n
\end{align*}

In vector notation, these equations become

\[ \underbrace{\begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix}}_{\mathbf{Y}} = \underbrace{\begin{bmatrix} 1/2 & 0 & \cdots & 0 \\ 1/4 & 1/2 & \ddots & \vdots \\ \vdots & \ddots & \ddots & 0 \\ 1/2^n & \cdots & 1/4 & 1/2 \end{bmatrix}}_{\mathbf{H}} \underbrace{\begin{bmatrix} X_1 \\ X_2 \\ \vdots \\ X_n \end{bmatrix}}_{\mathbf{X}} \]

When $\mathbf{X}$ is a column of iid Gaussian (0, 1) random variables, the column vector $\mathbf{Y} = \mathbf{H}\mathbf{X}$ is a single sample path of $Y_1, \ldots, Y_n$. When $\mathbf{X}$ is an $n \times m$ matrix of iid Gaussian (0, 1) random variables, each column of $\mathbf{Y} = \mathbf{H}\mathbf{X}$ is a sample path of $Y_1, \ldots, Y_n$. In this case, let matrix entry $Y_{i,j}$ denote a sample $Y_i$ of the $j$th sample path. The samples $Y_{i,1}, Y_{i,2}, \ldots, Y_{i,m}$ are iid samples of $Y_i$. We can estimate the mean and variance of $Y_i$ using the sample mean $M_m(Y_i)$ and sample variance $V_m(Y_i)$ of Section 7.3. These estimates are

\[ M_m(Y_i) = \frac{1}{m}\sum_{j=1}^{m} Y_{i,j}, \qquad V_m(Y_i) = \frac{1}{m-1}\sum_{j=1}^{m}\left(Y_{i,j} - M_m(Y_i)\right)^2. \]

This is the approach of the following program.

    function ymv=yfilter(m);
    %ymv(i) is the mean and var (over m paths) of y(i),
    %the filter output of 11.2.6 and 11.10.1
    X=randn(500,m);
    H=toeplitz([(0.5).^(1:500)],[0.5 zeros(1,499)]);
    Y=H*X;
    yav=sum(Y,2)/m;
    yavmat=yav*ones(1,m);
    yvar=sum((Y-yavmat).^2,2)/(m-1);
    ymv=[yav yvar];

The commands ymv=yfilter(100);plot(ymv) will generate a plot similar to this:

[Figure: sample mean and sample variance of $Y_n$ versus $n$, for $0 \le n \le 500$]

We see that each sample mean is small, on the order of 0.1. Note that $E[Y_i] = 0$. For $m = 100$ samples, the sample mean has variance $\operatorname{Var}[Y_i]/m \le 1/m = 0.01$ and standard deviation at most 0.1. Thus it is to be expected that we observe sample mean values on the order of 0.1. Also, it can be shown (in the solution to Problem 11.2.6 for example) that as $i$ becomes large, $\operatorname{Var}[Y_i]$ converges to 1/3. Thus our sample variance results are also not surprising.

Comment: Although within each sample path, $Y_i$ and $Y_{i+1}$ are quite correlated, the sample means of $Y_i$ and $Y_{i+1}$ are not very correlated when a large number of sample paths are averaged. Exact calculation of the covariance of the sample means of $Y_i$ and $Y_{i+1}$ might be an interesting exercise. The same observations apply to the sample variance as well.

Problem 11.10.2 Solution

This is just a Matlab question that has nothing to do with probability. In the Matlab operation R=fft(r,N), the shape of the output R is the same as the shape of the input r. If r is a column vector, then R is a column vector. If r is a row vector, then R is a row vector. For fftc to work the same way, the shape of n must be the same as the shape of R. The instruction n=reshape(0:(N-1),size(R)) does this.

Problem 11.10.3 Solution

The program cospaths.m generates Gaussian sample paths with the desired autocorrelation function $R_X(k) = \cos(0.04\pi k)$. Here is the code:

    function x=cospaths(n,m);
    %Generate m sample paths of length n of a
    %Gaussian process with ACF R[k]=cos(0.04*pi*k)
    k=0:n-1;
    rx=cos(0.04*pi*k)';
    x=gaussvector(0,rx,m);

The program is simple because if the second input parameter to gaussvector is a length m vector rx, then rx is assumed to be the first row of a symmetric Toeplitz covariance matrix. The commands x=cospaths(100,10);plot(x) will produce a graph like this one:

[Figure: ten sample paths $X_n$ versus $n$, for $0 \le n \le 100$]

We note that every sample path of the process is a Gaussian random sequence. However, it would also appear from the graph that every sample path is a perfect sinusoid. This may seem strange if you are used to seeing Gaussian processes simply as noisy processes or fluctuating Brownian motion processes. However, in this case, the amplitude and phase of each sample path is random such that over the ensemble of sinusoidal sample functions, each sample $X_n$ is a Gaussian (0, 1) random variable. Finally, to confirm that each sample path is a perfect sinusoid, rather than just resembling a sinusoid, we calculate the DFT of each sample path. The commands

    >> x=cospaths(100,10);
    >> X=fft(x);
    >> stem((0:99)/100,abs(X));

will produce a plot similar to this:

[Figure: stem plot of the DFT magnitudes $|X_k|$ versus $k/100$]

The above plot consists of ten overlaid 100-point DFT magnitude stem plots, one for each Gaussian sample function. Each plot has exactly two nonzero components at frequencies $k/100 = 0.02$ and $(100-k)/100 = 0.98$ corresponding to each sample path sinusoid having frequency 0.02. Note that the magnitude of each 0.02 frequency component depends on the magnitude of the corresponding sinusoidal sample path.

Problem 11.10.4 Solution

Searching the Matlab full product help for inv yields this bit of advice:

In practice, it is seldom necessary to form the explicit inverse of a matrix. A frequent misuse of inv arises when solving the system of linear equations Ax = b. One way to solve this is with x = inv(A)*b. A better way, from both an execution time and numerical accuracy standpoint, is to use the matrix division operator x = A\b. This produces the solution using Gaussian elimination, without forming the inverse. See \ and / for further information.

The same discussion goes on to give an example where x = A\b is both faster and more accurate.

Problem 11.10.5 Solution

The function lmsepredictor.m is designed so that if the sequence $X_n$ has a finite duration autocorrelation function such that $R_X[k] = 0$ for $|k| \ge m$, but the LMSE filter of order $M - 1$ for $M \ge m$ is supposed to be returned, then lmsepredictor automatically pads the autocorrelation vector rx with a sufficient number of zeros so that the output is the order $M - 1$ filter. Conversely, if rx specifies more values $R_X[k]$ than are needed, then the operation rx(1:M) extracts the $M$ values $R_X[0], \ldots, R_X[M-1]$ that are needed.

However, in this problem $R_X[k] = (-0.9)^{|k|}$ has infinite duration. When we pass the truncated representation rx of length $m = 6$ and request lmsepredictor(rx,M) for $M \ge 6$, the result is that rx is incorrectly padded with zeros. The resulting filter output will be the LMSE filter for the truncated autocorrelation

\[ R_X[k] = \begin{cases} (-0.9)^{|k|} & |k| \le 5, \\ 0 & \text{otherwise,} \end{cases} \]

rather than the LMSE filter for the true autocorrelation function.

Problem 11.10.6 Solution

Applying Theorem 11.4 with sampling period $T_s = 1/4000$ s yields

\[ R_X[k] = R_X(kT_s) = 10\operatorname{sinc}(0.5k) + 5\operatorname{sinc}(0.25k). \tag{1} \]

To find the power spectral density $S_X(\phi)$, we need to find the DTFT of $\operatorname{sinc}(\phi_0 k)$. Unfortunately, this was omitted from Table 11.2 so we now take a detour and derive it here. As with any derivation of the transform of a sinc function, we guess the answer and calculate the inverse transform. In this case, suppose

\[ S_X(\phi) = \frac{1}{\phi_0}\operatorname{rect}(\phi/\phi_0) = \begin{cases} 1/\phi_0 & |\phi| \le \phi_0/2, \\ 0 & \text{otherwise.} \end{cases} \tag{2} \]

We find $R_X[k]$ from the inverse DTFT. For $|\phi_0| \le 1$,

\[ R_X[k] = \int_{-1/2}^{1/2} S_X(\phi)\,e^{j2\pi\phi k}\,d\phi = \frac{1}{\phi_0}\int_{-\phi_0/2}^{\phi_0/2} e^{j2\pi\phi k}\,d\phi = \frac{1}{\phi_0}\,\frac{e^{j\pi\phi_0 k} - e^{-j\pi\phi_0 k}}{j2\pi k} = \operatorname{sinc}(\phi_0 k). \tag{3} \]

Now we apply this result to take the transform of $R_X[k]$ in Equation (1). This yields

\[ S_X(\phi) = \frac{10}{0.5}\operatorname{rect}(\phi/0.5) + \frac{5}{0.25}\operatorname{rect}(\phi/0.25). \tag{4} \]

Ideally, a $(2N+1)$-point DFT would yield a sampled version of the DTFT $S_X(\phi)$. However, the truncation of the autocorrelation $R_X[k]$ to 201 points results in a difference. For $N = 100$, the DFT will be a sampled version of the DTFT of $R_X[k]\operatorname{rect}(k/(2N+1))$. Here is a Matlab program that shows the difference when the autocorrelation is truncated to $2N + 1$ terms.

    function DFT=twosincsdft(N);
    %Usage: SX=twosincsdft(N);
    %Returns and plots the 2N+1
    %point DFT of R(-N) ... R(0) ... R(N)
    %for ACF R[k] in Problem 11.2.2
    k=-N:N;
    rx=10*sinc(0.5*k) + 5*sinc(0.25*k);
    DFT=fftc(rx);
    M=ceil(0.6*N);
    phi=(0:M)/(2*N+1);
    stem(phi,abs(DFT(1:(M+1))));
    xlabel('\it \phi');
    ylabel('\it S_X(\phi)');

Here is the output of twosincsdft(100).
[Figure: stem plot of $S_X(\phi)$ versus $\phi$, for $0 \le \phi \le 0.3$]

From the stem plot of the DFT, it is easy to see the deviations from the two rectangles that make up the DTFT $S_X(\phi)$. We see that the effects of windowing are particularly pronounced at the break points.

Comment: In twosincsdft, DFT must be real-valued since it is the DFT of an autocorrelation function. Hence the command stem(DFT) should be sufficient. However, due to numerical precision issues, the actual DFT tends to have a tiny imaginary part, hence we use the abs operator.

Problem 11.10.7 Solution

In this problem, we generalize the solution of Problem 11.4.1 using Theorem 11.9 with $k = 1$ for filter order $M > 2$. The optimum linear predictor filter $\mathbf{h} = [h_0\ h_1\ \cdots\ h_{M-1}]'$ of $X_{n+1}$ given $\mathbf{X}_n = [X_{n-(M-1)}\ \cdots\ X_{n-1}\ X_n]'$ is given by

\[ \overleftarrow{\mathbf{h}} = \begin{bmatrix} h_{M-1} \\ \vdots \\ h_0 \end{bmatrix} = \mathbf{R}_{\mathbf{X}_n}^{-1}\mathbf{R}_{\mathbf{X}_n X_{n+1}}, \]

where $\mathbf{R}_{\mathbf{X}_n}$ is given by Theorem 11.6 and $\mathbf{R}_{\mathbf{X}_n X_{n+1}}$ is given by Equation (11.66). In this problem,

\[ \mathbf{R}_{\mathbf{X}_n X_{n+1}} = E\left[\begin{bmatrix} X_{n-M+1} \\ \vdots \\ X_n \end{bmatrix} X_{n+1}\right] = \begin{bmatrix} R_X[M] \\ \vdots \\ R_X[1] \end{bmatrix}. \]

Going back to Theorem 9.7(a), the mean square error is

\[ e_L^* = E\left[(X_{n+1} - \overleftarrow{\mathbf{h}}'\mathbf{X}_n)^2\right] = \operatorname{Var}[X_{n+1}] - \overleftarrow{\mathbf{h}}'\mathbf{R}_{\mathbf{X}_n X_{n+1}}. \]

For $M > 2$, we use the Matlab function onesteppredictor(r,M) to perform the calculations.

    function [h,e]=onesteppredictor(r,M);
    %usage: h=onesteppredictor(r,M);
    %input: r=[R_X(0) R_X(1) .. R_X(m-1)]
    %assumes R_X(n)==0 for n >=m
    %output=vector h for lmse predictor
    % xx=h'[X(n),X(n-1),..,X(n-M+1)] for X(n+1)
    m=length(r);
    r=[r(:);zeros(M-m+1,1)];%append zeros if needed
    RY=toeplitz(r(1:M));
    RYX=r(M+1:-1:2);
    h=flipud(RY\RYX);
    e=r(1)-(flipud(h))'*RYX;

The code is pretty straightforward. Here are two examples just to show it works.

    >> [h2,e2]=onesteppredictor(r,2)
    h2 =
        0.8571
       -0.1429
    e2 =
        0.4286
    >> [h4,e4]=onesteppredictor(r,4)
    h4 =
        0.8000
        0.0000
       -0.0000
       -0.2000
    e4 =
        0.4000

The problem also requested that we calculate the mean square error as a function of the filter order $M$. Here is a script and the resulting plot of the MSE.

    %onestepmse.m
    r=1-0.25*(0:3);
    ee=[ ];
    for M=2:10,
        [h,e]=onesteppredictor(r,M);
        ee=[ee,e];
    end
    plot(2:10,ee,'-d');
    xlabel('\itM'); ylabel('\it MSE');

[Figure: MSE versus filter order $M$, for $2 \le M \le 10$]

Problem 11.10.8 Solution

This problem generalizes the solution of Problem 11.10.7 to a $k$-step predictor. The optimum linear predictor filter $\mathbf{h} = [h_0\ h_1\ \cdots\ h_{M-1}]'$ of $X_{n+k}$ given $\mathbf{X}_n = [X_{n-(M-1)}\ \cdots\ X_{n-1}\ X_n]'$ is given by

\[ \overleftarrow{\mathbf{h}} = \begin{bmatrix} h_{M-1} \\ \vdots \\ h_0 \end{bmatrix} = \mathbf{R}_{\mathbf{X}_n}^{-1}\mathbf{R}_{\mathbf{X}_n X_{n+k}}, \]

where $\mathbf{R}_{\mathbf{X}_n}$ is given by Theorem 11.6 and $\mathbf{R}_{\mathbf{X}_n X_{n+k}}$ is given by Equation (11.66). In this problem,

\[ \mathbf{R}_{\mathbf{X}_n X_{n+k}} = E\left[\begin{bmatrix} X_{n-M+1} \\ \vdots \\ X_n \end{bmatrix} X_{n+k}\right] = \begin{bmatrix} R_X[M+k-1] \\ \vdots \\ R_X[k] \end{bmatrix}. \]

Going back to Theorem 9.7(a), the mean square error is

\[ e_L^* = E\left[(X_{n+k} - \overleftarrow{\mathbf{h}}'\mathbf{X}_n)^2\right] = \operatorname{Var}[X_{n+k}] - \overleftarrow{\mathbf{h}}'\mathbf{R}_{\mathbf{X}_n X_{n+k}}. \]

The Matlab function kpredictor(r,M,k) implements this solution.

    function h=kpredictor(r,M,k);
    %usage: h=kpredictor(r,M,k);
    %input: r=[R_X(0) R_X(1) .. R_X(m-1)]
    %assumes R_X(n)==0 for n >=m
    %output=vector h for lmse predictor
    % xx=h'[X(n),X(n-1),..,X(n-M+1)] for X(n+k)
    m=length(r);
    r=[r(:);zeros(M+k-m,1)]; %appends zeros so that R_X(M+k-1) exists
    RY=toeplitz(r(1:M));
    RYX=r(M+k:-1:1+k); %[R_X(M+k-1);...;R_X(k)], ordered as in the equation above
    h=flipud(RY\RYX);

Problem 11.10.9 Solution

To generate the sample paths $X_n$ and $Y_n$ is relatively straightforward. For $\sigma = \eta = 1$, the solution to Problem 11.4.5 showed that the optimal linear predictor $\hat{X}_n(Y_{n-1})$ of $X_n$ given $Y_{n-1}$ is

\[ \hat{X}_n = \frac{cd}{d^2 + (1 - c^2)}\,Y_{n-1}. \tag{1} \]

In the following, we plot sample paths of $X_n$ and $\hat{X}_n$. To show how sample paths are similar for different values of $c$ and $d$, we construct all sample paths using the same sequence of noise samples $W_n$ and $Z_n$. In addition, given $c$, we choose $X_0 = X_{00}/\sqrt{1 - c^2}$ where $X_{00}$ is a Gaussian (0, 1) random variable that is the same for all sample paths. To do this, we write a function xpaths(c,d,x00,z,w) which constructs a sample path given the parameters c and d, and the noise vectors z and w. A simple script predictpaths.m calls xpaths to generate the requested sample paths. Here are the two programs:

    function [x,xhat]=xpaths(c,d,x00,z,w)
    n=length(z);
    n0=(0:(n-1))';
    n1=(1:n)';
    vx=1/(1-c^2);
    x0=sqrt(vx)*x00;
    x=(c.^(n1)*x0)+...
        toeplitz(c.^(n0),eye(1,n))*z;
    y=d*[x0;x(1:n-1)]+w;
    vy=((d^2)*vx) + 1;
    xhat=((c*d*vx)/vy)*y;
    plot(n1,x,'b-',n1,xhat,'k:');
    axis([0 50 -3 3]);
    xlabel('\itn'); ylabel('\itX_n');

    %predictpaths.m
    w=randn(50,1);z=randn(50,1);
    x00=randn(1);
    xpaths(0.9,10,x00,z,w); pause;
    xpaths(0.9,1,x00,z,w); pause;
    xpaths(0.9,0.1,x00,z,w); pause;
    xpaths(0.6,10,x00,z,w); pause;
    xpaths(0.6,1,x00,z,w); pause;
    xpaths(0.6,0.1,x00,z,w);

Some sample paths for $X_n$ and $\hat{X}_n$ for the requested parameters are shown below. In each pair, the one-step prediction $\hat{X}_n$ is marked by dots.

[Figure: six panels of $X_n$ versus $n$, for $0 \le n \le 50$: (a) c = 0.9, d = 10; (b) c = 0.9, d = 1; (c) c = 0.9, d = 0.1; (d) c = 0.6, d = 10; (e) c = 0.6, d = 1; (f) c = 0.6, d = 0.1]

The mean square estimation error at step $n$ was found to be

\[ e_L^*(n) = e_L^* = \sigma^2\,\frac{d^2 + 1}{d^2 + (1 - c^2)}. \tag{2} \]

We see that the mean square estimation error is $e_L^*(n) = e_L^*$, a constant for all $n$. In addition, $e_L^*$ is a decreasing function of $d$. In graphs (a) through (c), we see that the predictor tracks $X_n$ less well as $d$ decreases because decreasing $d$ corresponds to decreasing the contribution of $X_{n-1}$ to the measurement $Y_{n-1}$. Effectively, the impact of the measurement noise is increased. As $d$ decreases, the predictor places less emphasis on the measurement $Y_{n-1}$ and instead makes predictions closer to $E[X] = 0$. That is, when $d$ is small in graphs (c) and (f), the predictor stays close to zero.

With respect to $c$, the performance of the predictor is less easy to understand. In Equation (11), the mean square error $e_L^*$ is the product of

\[ \operatorname{Var}[X_n] = \frac{\sigma^2}{1 - c^2} \qquad \text{and} \qquad 1 - \rho^2_{X_n,Y_{n-1}} = \frac{(d^2 + 1)(1 - c^2)}{d^2 + (1 - c^2)}. \tag{3} \]

As a function of increasing $c^2$, $\operatorname{Var}[X_n]$ increases while $1 - \rho^2_{X_n,Y_{n-1}}$ decreases. Overall, the mean square error $e_L^*$ is an increasing function of $c^2$. However, $\operatorname{Var}[X]$ is the mean square error obtained using a blind estimator that always predicts $E[X]$, while $1 - \rho^2_{X_n,Y_{n-1}}$ characterizes the extent to which the optimal linear predictor is better than the blind predictor. When we compare graphs (a)-(c) with $c = 0.9$ to graphs (d)-(f) with $c = 0.6$, we see greater variation in $X_n$ for larger $c$ but in both cases, the predictor worked well when $d$ was large.

Problem Solutions – Chapter 12

Problem 12.1.1 Solution

From the given Markov chain, the state transition matrix is

\[ \mathbf{P} = \begin{bmatrix} P_{00} & P_{01} & P_{02} \\ P_{10} & P_{11} & P_{12} \\ P_{20} & P_{21} & P_{22} \end{bmatrix} = \begin{bmatrix} 0.5 & 0.5 & 0 \\ 0.5 & 0.5 & 0 \\ 0.25 & 0.25 & 0.5 \end{bmatrix}. \]

Problem 12.1.2 Solution

This problem is very straightforward if we keep in mind that $P_{ij}$ is the probability that we transition from state $i$ to state $j$. From Example 12.1, the state transition matrix is

\[ \mathbf{P} = \begin{bmatrix} P_{00} & P_{01} \\ P_{10} & P_{11} \end{bmatrix} = \begin{bmatrix} 1-p & p \\ q & 1-q \end{bmatrix}. \]

Problem 12.1.3 Solution

In addition to the normal OFF and ON states for packetized voice, we add state 2, the “mini-OFF” state. The Markov chain is

[Figure: three-state Markov chain on states 0, 1, 2 with branches $P_{00}$, $P_{01}$, $P_{10}$, $P_{11}$, $P_{12}$, $P_{20}$, $P_{21}$, $P_{22}$]

The only difference between this chain and an arbitrary 3 state chain is that transitions from 0, the OFF state, to state 2, the mini-OFF state, are not allowed. From the problem statement, the corresponding Markov chain is

\[ \mathbf{P} = \begin{bmatrix} P_{00} & P_{01} & P_{02} \\ P_{10} & P_{11} & P_{12} \\ P_{20} & P_{21} & P_{22} \end{bmatrix} = \begin{bmatrix} 0.999929 & 0.000071 & 0 \\ 0.000100 & 0.899900 & 0.1 \\ 0.000100 & 0.699900 & 0.3 \end{bmatrix}. \]

Problem 12.1.4 Solution

Based on the problem statement, the state of the wireless LAN is given by the following Markov chain:

[Figure: four-state Markov chain on states 0, 1, 2, 3 for the wireless LAN]

The Markov chain has state transition matrix

\[ \mathbf{P} = \begin{bmatrix} 0.5 & 0.5 & 0 & 0 \\ 0.04 & 0.9 & 0.06 & 0 \\ 0.04 & 0 & 0.9 & 0.06 \\ 0.04 & 0.02 & 0.04 & 0.9 \end{bmatrix}. \]
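A quick sanity check on any state transition matrix is that its entries are nonnegative and every row sums to 1. The following Matlab lines are a minimal sketch of that check for the matrices of Problems 12.1.1 and 12.1.3; the variable names P1, P3, and rowerr are illustrative and do not come from matcode.

```matlab
% Sketch: check that the transition matrices of Problems 12.1.1 and
% 12.1.3 are valid stochastic matrices (illustrative variable names).
P1 = [0.5 0.5 0; 0.5 0.5 0; 0.25 0.25 0.5];
P3 = [0.999929 0.000071 0; ...
      0.000100 0.899900 0.1; ...
      0.000100 0.699900 0.3];
% Largest deviation of any row sum from 1
rowerr = max(abs([sum(P1,2); sum(P3,2)] - 1));
% True when every entry of both matrices is nonnegative
nonneg = all(P1(:) >= 0) && all(P3(:) >= 0);
disp([rowerr nonneg]);
```

The same two lines, applied to any matrix typed in from a chain diagram, catch most transcription mistakes before the matrix is used in later calculations.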
Problem 12.1.5 Solution

In this problem, it is helpful to go fact by fact to identify the information given.

• “. . . each read or write operation reads or writes an entire file and that files contain a geometric number of sectors with mean 50.”

This statement says that the length $L$ of a file has PMF

\[ P_L(l) = \begin{cases} (1-p)^{l-1}p & l = 1, 2, \ldots \\ 0 & \text{otherwise} \end{cases} \]

with $p = 1/50 = 0.02$. This says that when we write a sector, we will write another sector with probability $49/50 = 0.98$. In terms of our Markov chain, if we are in the write state, we write another sector and stay in the write state with probability $P_{22} = 0.98$. This fact also implies $P_{20} + P_{21} = 0.02$.

Also, since files that are read obey the same length distribution,

\[ P_{11} = 0.98, \qquad P_{10} + P_{12} = 0.02. \]

• “Further, suppose idle periods last for a geometric time with mean 500.”

This statement simply says that given the system is idle, it remains idle for another unit of time with probability $P_{00} = 499/500 = 0.998$. This also says that $P_{01} + P_{02} = 0.002$.

• “After an idle period, the system is equally likely to read or write a file.”

Given that at time $n$, $X_n = 0$, this statement says that the conditional probability that

\[ P[X_{n+1} = 1 \mid X_n = 0, X_{n+1} \ne 0] = \frac{P_{01}}{P_{01} + P_{02}} = 0.5. \]

Combined with the earlier fact that $P_{01} + P_{02} = 0.002$, we learn that

\[ P_{01} = P_{02} = 0.001. \]

• “Following the completion of a read, a write follows with probability 0.8.”

Here we learn that given that at time $n$, $X_n = 1$, the conditional probability that

\[ P[X_{n+1} = 2 \mid X_n = 1, X_{n+1} \ne 1] = \frac{P_{12}}{P_{10} + P_{12}} = 0.8. \]

Combined with the earlier fact that $P_{10} + P_{12} = 0.02$, we learn that

\[ P_{10} = 0.004, \qquad P_{12} = 0.016. \]

• “However, on completion of a write operation, a read operation follows with probability 0.6.”

Now we find that given that at time $n$, $X_n = 2$, the conditional probability that

\[ P[X_{n+1} = 1 \mid X_n = 2, X_{n+1} \ne 2] = \frac{P_{21}}{P_{20} + P_{21}} = 0.6. \]

Combined with the earlier fact that $P_{20} + P_{21} = 0.02$, we learn that

\[ P_{20} = 0.008, \qquad P_{21} = 0.012. \]

The complete Markov chain is

[Figure: three-state Markov chain with $P_{00} = 0.998$, $P_{01} = P_{02} = 0.001$, $P_{10} = 0.004$, $P_{11} = 0.98$, $P_{12} = 0.016$, $P_{20} = 0.008$, $P_{21} = 0.012$, $P_{22} = 0.98$]

Problem 12.1.6 Solution

To determine whether the sequence $\hat{X}_n$ forms a Markov chain, we write

\begin{align*}
P[\hat{X}_{n+1} = j \mid \hat{X}_n = i, \hat{X}_{n-1} = i_{n-1}, \ldots, \hat{X}_0 = i_0] &= P[X_{m(n+1)} = j \mid X_{mn} = i, X_{m(n-1)} = i_{n-1}, \ldots, X_0 = i_0] \\
&= P[X_{m(n+1)} = j \mid X_{mn} = i] \\
&= P_{ij}(m).
\end{align*}

The key step is in observing that the Markov property of $X_n$ implies that $X_{mn}$ summarizes the past history of the $X_n$ process. That is, given $X_{mn}$, $X_{m(n+1)}$ is independent of $X_{mk}$ for all $k < n$.

Finally, this implies that the state $\hat{X}_n$ has one-step state transition probabilities equal to the $m$-step transition probabilities for the Markov chain $X_n$. That is, $\hat{\mathbf{P}} = \mathbf{P}^m$.

Problem 12.1.7 Solution

Under construction.

Problem 12.1.8 Solution

Under construction.

Problem 12.2.1 Solution

From Example 12.1, the state transition matrix is

\[ \mathbf{P} = \begin{bmatrix} P_{00} & P_{01} \\ P_{10} & P_{11} \end{bmatrix} = \begin{bmatrix} 1-p & p \\ q & 1-q \end{bmatrix} = \begin{bmatrix} 0.2 & 0.8 \\ 0.9 & 0.1 \end{bmatrix}. \]

This chain is a special case of the two-state chain in Example 12.5 and Example 12.6 with $p = 0.8$ and $q = 0.9$. You may wish to derive the eigenvalues and eigenvectors of $\mathbf{P}$ in order to diagonalize and then find $\mathbf{P}^n$. Or, you may wish just to refer to Example 12.6 which showed that the chain has eigenvalues $\lambda_1 = 1$ and $\lambda_2 = 1 - (p + q) = -0.7$ and that the $n$-step transition matrix is

\begin{align*}
\mathbf{P}^n = \begin{bmatrix} P_{00}(n) & P_{01}(n) \\ P_{10}(n) & P_{11}(n) \end{bmatrix} &= \frac{1}{p+q}\begin{bmatrix} q & p \\ q & p \end{bmatrix} + \frac{\lambda_2^n}{p+q}\begin{bmatrix} p & -p \\ -q & q \end{bmatrix} \\
&= \frac{1}{1.7}\begin{bmatrix} 0.9 & 0.8 \\ 0.9 & 0.8 \end{bmatrix} + \frac{(-0.7)^n}{1.7}\begin{bmatrix} 0.8 & -0.8 \\ -0.9 & 0.9 \end{bmatrix}.
\end{align*}
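The closed form for the $n$-step transition matrix in Problem 12.2.1 is easy to check against direct matrix powers. The following Matlab lines are a minimal sketch of that check; the variable names (Pn, maxerr) are illustrative and are not part of matcode.

```matlab
% Sketch: compare the closed-form n-step matrix of Problem 12.2.1
% with direct matrix powers P^n (illustrative variable names).
p = 0.8; q = 0.9;
P = [1-p p; q 1-q];
lambda2 = 1 - (p+q);          % second eigenvalue, equal to -0.7
maxerr = 0;
for n = 1:10
    % closed form: stationary part plus decaying eigenvalue part
    Pn = ([q p; q p] + (lambda2^n)*[p -p; -q q])/(p+q);
    maxerr = max(maxerr, max(max(abs(Pn - P^n))));
end
disp(maxerr);                 % largest entrywise difference over n = 1..10
```

Since $|\lambda_2| < 1$, the second term dies out as $n$ grows and each row of $\mathbf{P}^n$ approaches $[q\ \ p]/(p+q)$.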
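Problem 12.2.2 Solution

From the given Markov chain, the state transition matrix is

\[ \mathbf{P} = \begin{bmatrix} P_{00} & P_{01} & P_{02} \\ P_{10} & P_{11} & P_{12} \\ P_{20} & P_{21} & P_{22} \end{bmatrix} = \begin{bmatrix} 0.5 & 0.5 & 0 \\ 0.5 & 0.5 & 0 \\ 0.25 & 0.25 & 0.5 \end{bmatrix}. \]

The way to find $\mathbf{P}^n$ is to make the decomposition $\mathbf{P} = \mathbf{S}\mathbf{D}\mathbf{S}^{-1}$ where the columns of $\mathbf{S}$ are the eigenvectors of $\mathbf{P}$ and $\mathbf{D}$ is a diagonal matrix containing the eigenvalues of $\mathbf{P}$. The eigenvalues are

\[ \lambda_1 = 1, \qquad \lambda_2 = 0, \qquad \lambda_3 = 1/2. \]

The corresponding eigenvectors are

\[ \mathbf{s}_1 = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}, \qquad \mathbf{s}_2 = \begin{bmatrix} -1 \\ 1 \\ 0 \end{bmatrix}, \qquad \mathbf{s}_3 = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}. \]

The decomposition of $\mathbf{P}$ is

\[ \mathbf{P} = \mathbf{S}\mathbf{D}\mathbf{S}^{-1} = \begin{bmatrix} 1 & -1 & 0 \\ 1 & 1 & 0 \\ 1 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0.5 \end{bmatrix} \begin{bmatrix} 0.5 & 0.5 & 0 \\ -0.5 & 0.5 & 0 \\ -0.5 & -0.5 & 1 \end{bmatrix}. \]

Finally, $\mathbf{P}^n$ is

\begin{align*}
\mathbf{P}^n = \mathbf{S}\mathbf{D}^n\mathbf{S}^{-1} &= \begin{bmatrix} 1 & -1 & 0 \\ 1 & 1 & 0 \\ 1 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & (0.5)^n \end{bmatrix} \begin{bmatrix} 0.5 & 0.5 & 0 \\ -0.5 & 0.5 & 0 \\ -0.5 & -0.5 & 1 \end{bmatrix} \\
&= \begin{bmatrix} 0.5 & 0.5 & 0 \\ 0.5 & 0.5 & 0 \\ 0.5 - (0.5)^{n+1} & 0.5 - (0.5)^{n+1} & (0.5)^n \end{bmatrix}.
\end{align*}

Problem 12.3.1 Solution

From Example 12.8, the state probabilities at time $n$ are

\[ \mathbf{p}(n) = \begin{bmatrix} \tfrac{7}{12} & \tfrac{5}{12} \end{bmatrix} + \lambda_2^n \begin{bmatrix} \tfrac{5}{12}p_0 - \tfrac{7}{12}p_1 & -\tfrac{5}{12}p_0 + \tfrac{7}{12}p_1 \end{bmatrix} \]

with $\lambda_2 = 1 - (p + q) = 344/350$. With initial state probabilities $[p_0\ \ p_1] = [1\ \ 0]$,

\[ \mathbf{p}(n) = \begin{bmatrix} \tfrac{7}{12} & \tfrac{5}{12} \end{bmatrix} + \lambda_2^n \begin{bmatrix} \tfrac{5}{12} & -\tfrac{5}{12} \end{bmatrix}. \]

The limiting state probabilities are $[\pi_0\ \ \pi_1] = [7/12\ \ 5/12]$. Note that $p_j(n)$ is within 1% of $\pi_j$ if

\[ |\pi_j - p_j(n)| \le 0.01\,\pi_j. \]

These requirements become

\[ \lambda_2^n \le 0.01\,\frac{7/12}{5/12}, \qquad \lambda_2^n \le 0.01\,\frac{5/12}{5/12}. \]

The minimum value $n$ that meets both requirements is

\[ n = \left\lceil \frac{\ln 0.01}{\ln \lambda_2} \right\rceil = 267. \]

Hence, after 267 time steps, the state probabilities are all within one percent of the limiting state probability vector. Note that in the packet voice system, the time step corresponded to a 10 ms time slot. Hence, 2.67 seconds are required.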
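The value $n = 267$ in Problem 12.3.1 can be confirmed by iterating the state probabilities directly. The following Matlab lines are a minimal sketch under an assumption: the transition probabilities $p = 1/140$ and $q = 1/100$ are not stated in the solution above and are chosen here only because they reproduce $\lambda_2 = 344/350$ and the limiting probabilities $[7/12\ \ 5/12]$. The variable names are illustrative.

```matlab
% Sketch: check the 1% convergence time of Problem 12.3.1 numerically.
% Assumed values: p = 1/140 and q = 1/100, chosen to match
% lambda2 = 344/350 and the limiting probabilities [7/12 5/12].
p = 1/140; q = 1/100;
P = [1-p p; q 1-q];               % two-state transition matrix
lambda2 = 1 - (p+q);              % second eigenvalue, 344/350
nformula = ceil(log(0.01)/log(lambda2));
pilim = [q p]/(p+q);              % limiting state probabilities
pn = [1 0];                       % initial state probabilities [p0 p1]
n = 0;
while any(abs(pn - pilim) > 0.01*pilim)
    pn = pn*P;                    % one time step: p(n+1) = p(n)*P
    n = n + 1;
end
disp([nformula n]);               % formula versus direct iteration
```

The loop stops at the first $n$ for which both state probabilities are within one percent of their limits, so it tests the same pair of requirements as the formula.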
Problem 12.3.2 Solution

At time $n - 1$, let $p_i(n-1)$ denote the state probabilities. By Theorem 12.4, the probability of state $k$ at time $n$ is

\[ p_k(n) = \sum_{i=0}^{\infty} p_i(n-1)P_{ik}. \]

Since $P_{ik} = q$ for every state $i$,

\[ p_k(n) = q\sum_{i=0}^{\infty} p_i(n-1) = q. \]

Thus for any time $n > 0$, the probability of state $k$ is $q$.

Problem 12.3.3 Solution

In this problem, the arrivals are the occurrences of packets in error. It would seem that $N(t)$ cannot be a renewal process because the interarrival times seem to depend on the previous interarrival times. However, following a packet error, the sequence of packets that are correct (c) or in error (e) up to and including the next error is given by the tree

[Figure: tree of packet outcomes following an error. The next packet is in error ($X = 1$) with probability 0.9 or correct with probability 0.1; after a correct packet, each subsequent packet is in error with probability 0.01 ($X = 2$, $X = 3$, ...) or correct with probability 0.99]

Assuming that sending a packet takes one unit of time, the time $X$ until the next packet error has the PMF

\[ P_X(x) = \begin{cases} 0.9 & x = 1, \\ 0.001(0.99)^{x-2} & x = 2, 3, \ldots \\ 0 & \text{otherwise.} \end{cases} \]

Thus, following an error, the time until the next error always has the same PMF. Moreover, this time is independent of previous interarrival times since it depends only on the Bernoulli trials following a packet error. It would appear that $N(t)$ is a renewal process; however, there is one additional complication. At time 0, we need to know the probability $p$ of an error for the first packet. If $p = 0.9$, then $X_1$, the time until the first error, has the same PMF as $X$ above and the process is a renewal process. If $p \ne 0.9$, then the time until the first error is different from subsequent renewal times. In this case, the process is a delayed renewal process.

Problem 12.4.1 Solution

The hardest part of this problem is that we are asked to find all ways of replacing a branch. The primary problem with the Markov chain in Problem 12.1.1 is that state 2 is a transient state. We can get rid of the transient behavior by making a nonzero branch probability $P_{12}$ or $P_{02}$. The possible ways to do this are:

• Replace $P_{00} = 1/2$ with $P_{02} = 1/2$
• Replace $P_{01} = 1/2$ with $P_{02} = 1/2$
• Replace $P_{11} = 1/2$ with $P_{12} = 1/2$
• Replace $P_{10} = 1/2$ with $P_{12} = 1/2$

Keep in mind that even if we make one of these replacements, there will be at least one self transition probability, either $P_{00}$ or $P_{11}$, that will be nonzero. This will guarantee that the resulting Markov chain will be aperiodic.

Problem 12.4.2 Solution

The chain given in Example 12.11 has two communicating classes as well as the transient state 2. To create a single communicating class, we need to add a transition that enters state 2. Yet, no matter how we add such a transition, we will still have two communicating classes. A second transition will be needed to create a single communicating class. Thus, we need to add two branches. There are many possible pairs of branches. Some pairs of positive branch probabilities that create an irreducible chain are

\[ \{P_{50}, P_{02}\} \qquad \{P_{51}, P_{02}\} \qquad \{P_{12}, P_{23}\} \]

Problem 12.4.3 Solution

The idea behind this claim is that if states $j$ and $i$ communicate, then sometimes when we go from state $j$ back to state $j$, we will pass through state $i$. If $E[T_{ij}] = \infty$, then on those occasions we pass through $i$, the expected time to go back to $j$ will be infinite. This would suggest $E[T_{jj}] = \infty$ and thus state $j$ would not be positive recurrent. Using math to prove this requires a little bit of care.

Suppose $E[T_{ij}] = \infty$. Since $i$ and $j$ communicate, we can find $n$, the smallest nonnegative integer such that $P_{ji}(n) > 0$. Given we start in state $j$, let $G_i$ denote the event that we go through state $i$ on our way back to $j$. By conditioning on $G_i$,

\[ E[T_{jj}] = E[T_{jj} \mid G_i]P[G_i] + E[T_{jj} \mid G_i^c]P[G_i^c]. \]

Since $E[T_{jj} \mid G_i^c]P[G_i^c] \ge 0$,

\[ E[T_{jj}] \ge E[T_{jj} \mid G_i]P[G_i]. \]

Given the event $G_i$, $T_{jj} = T_{ji} + T_{ij}$. This implies

\[ E[T_{jj} \mid G_i] = E[T_{ji} \mid G_i] + E[T_{ij} \mid G_i] \ge E[T_{ij} \mid G_i]. \]

Since the random variable $T_{ij}$ assumes that we start in state $i$, $E[T_{ij} \mid G_i] = E[T_{ij}]$. Thus $E[T_{jj} \mid G_i] \ge E[T_{ij}]$. In addition, $P[G_i] \ge P_{ji}(n)$ since there may be paths with more than $n$ hops that take the system from state $j$ to $i$. These facts imply

\[ E[T_{jj}] \ge E[T_{jj} \mid G_i]P[G_i] \ge E[T_{ij}]P_{ji}(n) = \infty. \]

Thus, state $j$ is not positive recurrent, which is a contradiction. Hence, it must be that $E[T_{ij}] < \infty$.

Problem 12.5.1 Solution

In the solution to Problem 12.1.5, we found that the state transition matrix was

\[ \mathbf{P} = \begin{bmatrix} 0.998 & 0.001 & 0.001 \\ 0.004 & 0.98 & 0.016 \\ 0.008 & 0.012 & 0.98 \end{bmatrix}. \]

We can find the stationary probability vector $\boldsymbol{\pi} = [\pi_0\ \pi_1\ \pi_2]'$ by solving $\boldsymbol{\pi}' = \boldsymbol{\pi}'\mathbf{P}$ along with $\pi_0 + \pi_1 + \pi_2 = 1$. It's possible to find the solution by hand but it's easier to use MATLAB or a similar tool. The solution is $\boldsymbol{\pi} = [0.7536\ \ 0.1159\ \ 0.1304]'$.

Problem 12.5.2 Solution

From the Markov chain given in Problem 12.1.1, the state transition matrix is

\[ \mathbf{P} = \begin{bmatrix} 0.5 & 0.5 & 0 \\ 0.5 & 0.5 & 0 \\ 0.25 & 0.25 & 0.5 \end{bmatrix}. \]

We find the stationary probabilities $\boldsymbol{\pi} = [\pi_0\ \pi_1\ \pi_2]'$ by solving

\[ \boldsymbol{\pi}' = \boldsymbol{\pi}'\mathbf{P}, \qquad \sum_{j=0}^{2}\pi_j = 1. \]

Of course, one equation of $\boldsymbol{\pi}' = \boldsymbol{\pi}'\mathbf{P}$ will be redundant. The three independent equations are

\begin{align*}
\pi_0 &= 0.5\pi_0 + 0.5\pi_1 + 0.25\pi_2 \\
\pi_2 &= 0.5\pi_2 \\
1 &= \pi_0 + \pi_1 + \pi_2
\end{align*}

From the second equation, we see that $\pi_2 = 0$. This leaves the two equations:

\begin{align*}
\pi_0 &= 0.5\pi_0 + 0.5\pi_1 \\
1 &= \pi_0 + \pi_1
\end{align*}

Solving these two equations yields $\pi_0 = \pi_1 = 0.5$. The stationary probability vector is

\[ \boldsymbol{\pi} = [\pi_0\ \pi_1\ \pi_2]' = [0.5\ \ 0.5\ \ 0]'. \]

If you happened to solve Problem 12.2.2, you would have found that the $n$-step transition matrix is

\[ \mathbf{P}^n = \begin{bmatrix} 0.5 & 0.5 & 0 \\ 0.5 & 0.5 & 0 \\ 0.5 - (0.5)^{n+1} & 0.5 - (0.5)^{n+1} & (0.5)^n \end{bmatrix}. \]

From Theorem 12.21, we know that each row of the $n$-step transition matrix converges to $\boldsymbol{\pi}'$. In this case,

\[ \lim_{n\to\infty}\mathbf{P}^n = \lim_{n\to\infty}\begin{bmatrix} 0.5 & 0.5 & 0 \\ 0.5 & 0.5 & 0 \\ 0.5 - (0.5)^{n+1} & 0.5 - (0.5)^{n+1} & (0.5)^n \end{bmatrix} = \begin{bmatrix} 0.5 & 0.5 & 0 \\ 0.5 & 0.5 & 0 \\ 0.5 & 0.5 & 0 \end{bmatrix} = \begin{bmatrix} \boldsymbol{\pi}' \\ \boldsymbol{\pi}' \\ \boldsymbol{\pi}' \end{bmatrix}. \]

Problem 12.5.3 Solution

From the problem statement, the Markov chain is

[Figure: Markov chain on states 0, 1, 2, 3, 4 with forward transitions of probability $p$, backward transitions of probability $1 - p$, and self-transitions at states 0 and 4]

The self-transitions in state 0 and state 4 guarantee that the Markov chain is aperiodic. Since the chain is also irreducible, we can find the stationary probabilities by solving $\boldsymbol{\pi}' = \boldsymbol{\pi}'\mathbf{P}$; however, in this problem it is simpler to apply Theorem 12.13. In particular, by partitioning the chain between states $i$ and $i + 1$, we obtain

\[ \pi_i p = \pi_{i+1}(1 - p). \]

This implies $\pi_{i+1} = \alpha\pi_i$ where $\alpha = p/(1 - p)$. It follows that $\pi_i = \alpha^i\pi_0$. Requiring the stationary probabilities to sum to 1 yields

\[ \sum_{i=0}^{4}\pi_i = \pi_0(1 + \alpha + \alpha^2 + \alpha^3 + \alpha^4) = 1. \]

This implies

\[ \pi_0 = \frac{1 - \alpha}{1 - \alpha^5}. \]

Thus, for $i = 0, 1, \ldots, 4$,

\[ \pi_i = \frac{1 - \alpha}{1 - \alpha^5}\,\alpha^i = \frac{1 - \frac{p}{1-p}}{1 - \left(\frac{p}{1-p}\right)^5}\left(\frac{p}{1-p}\right)^i. \]

Problem 12.5.4 Solution

From the problem statement, the Markov chain is

[Figure: Markov chain on states 0, 1, ..., i, i+1, ..., K with forward transitions of probability $p$, backward transitions of probability $1 - p$, and self-transitions at states 0 and K]

The self-transitions in state 0 and state $K$ guarantee that the Markov chain is aperiodic. Since the chain is also irreducible, we can find the stationary probabilities by solving $\boldsymbol{\pi}' = \boldsymbol{\pi}'\mathbf{P}$; however, in this problem it is simpler to apply Theorem 12.13. In particular, by partitioning the chain between states $i$ and $i + 1$, we obtain

\[ \pi_i p = \pi_{i+1}(1 - p). \]

This implies $\pi_{i+1} = \alpha\pi_i$ where $\alpha = p/(1 - p)$. It follows that $\pi_i = \alpha^i\pi_0$. Requiring the stationary probabilities to sum to 1 yields

\[ \sum_{i=0}^{K}\pi_i = \pi_0(1 + \alpha + \alpha^2 + \cdots + \alpha^K) = 1. \]

This implies

\[ \pi_0 = \frac{1 - \alpha}{1 - \alpha^{K+1}}. \]

Thus, for $i = 0, 1, \ldots, K$,

\[ \pi_i = \frac{1 - \alpha}{1 - \alpha^{K+1}}\,\alpha^i = \frac{1 - \frac{p}{1-p}}{1 - \left(\frac{p}{1-p}\right)^{K+1}}\left(\frac{p}{1-p}\right)^i. \]

Problem 12.5.5 Solution

For this system, it's hard to draw the entire Markov chain since from each state $n$ there are six branches, each with probability 1/6 to states $n + 1, n + 2, \ldots, n + 6$. (Of course, if $n + k > K - 1$, then the transition is to state $n + k \bmod K$.) Nevertheless, finding the stationary probabilities is not very hard. In particular, the $n$th equation of $\boldsymbol{\pi}' = \boldsymbol{\pi}'\mathbf{P}$ yields

\[ \pi_n = \frac{1}{6}\left(\pi_{n-6} + \pi_{n-5} + \pi_{n-4} + \pi_{n-3} + \pi_{n-2} + \pi_{n-1}\right). \]

Rather than try to solve these equations algebraically, it's easier to guess that the solution is

\[ \boldsymbol{\pi} = [1/K\ \ 1/K\ \ \cdots\ \ 1/K]'. \]

It's easy to check that $1/K = (1/6) \cdot 6 \cdot (1/K)$.

Problem 12.5.6 Solution

This system has three states:

0 front teller busy, rear teller idle
1 front teller busy, rear teller busy
2 front teller idle, rear teller busy

We will assume the units of time are seconds. Thus, if a teller is busy one second, the teller will become idle in the next second with probability $p = 1/120$. The Markov chain for this system is

[Figure: three-state Markov chain for the two-teller system]

We can solve this chain very easily for the stationary probability vector $\boldsymbol{\pi}$. In particular,

\[ \pi_0 = (1 - p)\pi_0 + p(1 - p)\pi_1. \]

This implies that $\pi_0 = (1 - p)\pi_1$. Similarly,

\[ \pi_2 = (1 - p)\pi_2 + p(1 - p)\pi_1 \]

yields $\pi_2 = (1 - p)\pi_1$. Hence, by applying $\pi_0 + \pi_1 + \pi_2 = 1$, we obtain

\[ \pi_0 = \pi_2 = \frac{1 - p}{3 - 2p} = 119/358, \qquad \pi_1 = \frac{1}{3 - 2p} = 120/358. \]

The stationary probability that both tellers are busy is $\pi_1 = 120/358$.

Problem 12.5.7 Solution

In this case, we will examine the system each minute. For each customer in service, we need to keep track of how soon the customer will depart. For the state of the system, we will use $(i, j)$, the remaining service requirements of the two customers. To reduce the number of states, we will order the requirements so that $i \le j$. For example, when two new customers start service each requiring two minutes of service, the system state will be (2, 2). Since the system assumes there is always a backlog of cars waiting to enter service, the set of states is

0 (0, 1) One teller is idle, the other teller has a customer requiring one more minute of service.
1 (1, 1) Each teller has a customer requiring one more minute of service.
2 (1, 2) One teller has a customer requiring one minute of service. The other teller has a customer requiring two minutes of service.
3 (2, 2) Each teller has a customer requiring two minutes of service.

[Figure: four-state Markov chain on states 0, 1, 2, 3 with transition probabilities 1/4, 1/2, and 1]

The resulting Markov chain is shown in the figure. Note that departing from either state (0, 1) or (1, 1) corresponds to both customers finishing service and two new customers entering service. The state transition probabilities reflect the fact that both customers will have two minute service requirements with probability 1/4, or both customers will have one minute service requirements with probability 1/4, or one customer will need one minute of service and the other will need two minutes of service with probability 1/2.

Writing the stationary probability equations for states 0, 2, and 3 and adding the constraint $\sum_j \pi_j = 1$ yields the following equations:

\begin{align*}
\pi_0 &= \pi_2 \\
\pi_2 &= (1/2)\pi_0 + (1/2)\pi_1 \\
\pi_3 &= (1/4)\pi_0 + (1/4)\pi_1 \\
1 &= \pi_0 + \pi_1 + \pi_2 + \pi_3
\end{align*}

Substituting $\pi_2 = \pi_0$ in the second equation yields $\pi_1 = \pi_0$. Substituting that result in the third equation yields $\pi_3 = \pi_0/2$. Making sure the probabilities add up to 1 yields

\[ \boldsymbol{\pi} = [\pi_0\ \pi_1\ \pi_2\ \pi_3]' = [2/7\ \ 2/7\ \ 2/7\ \ 1/7]'. \]

Both tellers are busy unless the system is in state 0. The stationary probability both tellers are busy is $1 - \pi_0 = 5/7$.

Problem 12.5.8 Solution

Under construction.

Problem 12.5.9 Solution

Under construction.

Problem 12.6.1 Solution

Equivalently, we can prove that if $P_{ii} \ne 0$ for some $i$, then the chain cannot be periodic. So, suppose for state $i$, $P_{ii} > 0$. Since $P_{ii} = P_{ii}(1)$, we see that the largest $d$ that divides $n$ for all $n$ such that $P_{ii}(n) > 0$ is $d = 1$. Hence, state $i$ is aperiodic and thus the chain is aperiodic.

The converse that $P_{ii} = 0$ for all $i$ implies the chain is periodic is false. As a counterexample, consider the simple chain in the figure with $P_{ii} = 0$ for each $i$.

[Figure: three-state Markov chain on states 0, 1, 2 with every branch probability equal to 0.5 and no self-transitions]

Note that $P_{00}(2) > 0$ and $P_{00}(3) > 0$. The largest $d$ that divides both 2 and 3 is $d = 1$. Hence, state 0 is aperiodic. Since the chain has one communicating class, the chain is also aperiodic.

Problem 12.6.2 Solution

The Markov chain for this system is

[Figure: Markov chain on states 0, 1, ..., K. From state $n - 1$ the chain moves to state $n$ with probability $P[N > n \mid N > n-1]$ and returns to state 0 with probability $P[N = n \mid N > n-1]$]

Note that $P[N > 0] = 1$ and that

\[ P[N > n \mid N > n-1] = \frac{P[N > n, N > n-1]}{P[N > n-1]} = \frac{P[N > n]}{P[N > n-1]}. \]

Solving $\boldsymbol{\pi}' = \boldsymbol{\pi}'\mathbf{P}$ yields

\begin{align*}
\pi_1 &= P[N > 1 \mid N > 0]\,\pi_0 = P[N > 1]\,\pi_0 \\
\pi_2 &= P[N > 2 \mid N > 1]\,\pi_1 = \frac{P[N > 2]}{P[N > 1]}\,\pi_1 = P[N > 2]\,\pi_0 \\
&\;\;\vdots \\
\pi_n &= P[N > n \mid N > n-1]\,\pi_{n-1} = \frac{P[N > n]}{P[N > n-1]}\,\pi_{n-1} = P[N > n]\,\pi_0
\end{align*}

Next we apply the requirement that the stationary probabilities sum to 1. Since $P[N \le K+1] = 1$, we see for $n \ge K + 1$ that $P[N > n] = 0$. Thus

\[ 1 = \sum_{n=0}^{K}\pi_n = \pi_0\sum_{n=0}^{K}P[N > n] = \pi_0\sum_{n=0}^{\infty}P[N > n]. \]

From Problem 2.5.11, we recall that $\sum_{n=0}^{\infty}P[N > n] = E[N]$. This implies $\pi_0 = 1/E[N]$ and that

\[ \pi_n = \frac{P[N > n]}{E[N]}. \]

This is exactly the same stationary distribution found in Quiz 12.5! In this problem, we can view the system state as describing the age of an object that is repeatedly replaced. In state 0, we start with a new (zero age) object, and each unit of time, the object ages one unit of time. The random variable $N$ is the lifetime of the object. A transition to state 0 corresponds to the current object expiring and being replaced by a new object.

In Quiz 12.5, the system would transition to a state $N = n$ corresponding to the lifetime of $n$ for a new object. This object expires and is replaced each time that state 0 is reached. This solution and the solution to Quiz 12.5 show that the age and the residual life have the same stationary distribution. That is, if we inspect an object at an arbitrary time in the distant future, the PMF of the age of the object is the same as the PMF of the residual life.

Problem 12.6.3 Solution

Consider an arbitrary state $i \in C_l$. To show that state $i$ has period $L$, we must show that $P_{ii}(n) > 0$ implies $n$ is divisible by $L$. Consider any sequence of states $i, i_1, i_2, \ldots, i_n = i$ that returns to state $i$ in $n$ steps. The structure of the Markov chain implies that $i_1 \in C_{l+1}$, $i_2 \in C_{l+2}$ and so on. In general, $i_j \in C_{(l+j) \bmod L}$. Moreover, since state $i \in C_l$, $i_n = i$ only if $i_n \in C_l$. Since $i_n \in C_{(l+n) \bmod L}$, we must have that $(l + n) \bmod L = l$, which holds only when $n$ is divisible by $L$.

Problem 12.6.4 Solution

This problem is easy as long as you don't try to prove that the chain must have a single communicating class. A counterexample is easy to find. Here is a 4 state Markov chain with two recurrent communicating classes and each state having period 2.

[Figure: four-state Markov chain with $C_0$ containing states 0 and 2 and $C_1$ containing states 1 and 3]

In this chain, {0, 1} and {2, 3} are each communicating classes. From each state $i \in C_0$, all transitions are to states $j \in C_1$. Similarly, from each state $i \in C_1$, only transitions to states $j \in C_0$ are permitted. Thus each state has period 2.

Problem 12.6.5 Solution

Since the Markov chain has period $d$, there exists a state $i_0$ such that $P_{i_0 i_0}(d) > 0$. For this state $i_0$, let $C_n(i_0)$ denote the set of states that are reachable from $i_0$ in $n$ hops. We make the following claim:

• If $j \in C_n(i_0)$ and $j \in C_{n'}(i_0)$ with $n' > n$, then $n' = n + kd$ for some integer $k$.

To prove this claim, we observe that irreducibility of the Markov chain implies there is a sequence of $m$ hops from state $j$ to state $i_0$. Since there is an $n$ hop path from $i_0$ to $j$ and an $m$ hop path from $j$ back to $i_0$, $n + m = kd$ for some integer $k$. Similarly, since there is an $n'$ hop path from $i_0$ to $j$, $n' + m = k'd$ for some integer $k'$. Thus

\[ n' - n = (n' + m) - (n + m) = (k' - k)d. \]

Now we define

\[ C_n = \bigcup_{k=0}^{\infty} C_{n+kd}(i_0), \qquad n = 0, 1, \ldots, d - 1. \]

Because the chain is irreducible, any state $i$ belongs to some set $C_n(i_0)$. By our earlier claim, each node $i$ belongs to exactly one set $C_n$.
This occurs if and only if n is divisible by L. n + m = kd for some integer k. 1} and {2. and thus any state i belongs to at least one set Cn . At state 0. 1. The two classes C0 and C1 are shown. the system state described a countdown timer for the residual life of an object. The sets {0. However. A cut between states 0 and 1 yields π1 = (p/δ)π0 . {C0 . πi α = πi+1 δ. In this case. the sequence of n + kd hops from i0 to i followed by one hop to state j is an n + 1 + kd hop path from i0 to j. the number of customers in the system is either n − 1 (if the customer in service departs and no new customer arrives). i = 0. . . . If there are k customers in the system at time n. Problem 12. . then at time n + 1. for state i > 0. This contradicts the fact that j cannot belong to both Cn+1 and Cn+m . . p α i−1 πi = π0 (2) δ δ Under the condition α < δ. To find the stationary probabilities.} as shown in Figure 12.13 by partitioning the state space between states S = {0. Now suppose there exists states i ∈ Cn and j ∈ Cn+m such that Pij > 0 and m > 1. Hence there exists i ∈ Cn and j ∈ Cn+1 such that Pij > 0. Combining these results.Hence. implying j ∈ Cn+1 . p ≥ 1/2. . The transition probabilities are given by the following chain: 1-p p 0 d 1 d 1-a-d a 2 d 1-a-d a Problem 12. i + 2. . (1) This implies πi+1 = (α/δ)πi . 2. we have for any state i > 0. Note that the stationary probabilities do not exist if α ≥ 1. .8. n (if either there is no new arrival and no departure or if there is both a new arrival and a departure) or n + 1. 1.26 to be πi = (1 − α)αi . there exists states i ∈ Cn (i0 ) and j ∈ Cn+1 (i0 ) such that Pij > 0. Cd−1 } is a partition of the states of the states of the Markov chain. (1) where α = p/(1 − p). . By construction of the set Cn (i0 ). i} and S = {i + 1. it follows that ∞ i=0 πi = π0 + π0 ∞ i=1 p α δ δ i−1 = π0 1 + p/δ 1 − α/δ (3) 428 . if there is a new arrival but no new departure. 
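The period arguments in Problems 12.6.1 and 12.6.4 can be checked numerically. The following is a minimal Matlab sketch, not part of the manual's matcode: it estimates the period of a state as the gcd of all return times n up to a cutoff nmax for which P_ii(n) > 0. The matrix P1 is a hypothetical three-state chain with zero diagonal chosen for illustration (it is not necessarily the chain drawn in the manual), P2 encodes the two-class chain described in Problem 12.6.4, and all variable names are assumptions.

```matlab
% Sketch: period of a state as the gcd of all n <= nmax with P_ii(n) > 0.

% Hypothetical 3-state chain with P_ii = 0 for every i (illustration only).
P1 = [0 0.5 0.5; 0.5 0 0.5; 0.5 0.5 0];

% Four-state chain with communicating classes {0,1} and {2,3};
% rows and columns are ordered as states 0, 1, 2, 3.
P2 = [0 1 0 0; 1 0 0 0; 0 0 0 1; 0 0 1 0];

nmax = 20;                       % longest return time examined
chains = {P1, P2};
d = zeros(1, numel(chains));     % period of the first state of each chain
for m = 1:numel(chains)
    A = double(chains{m} > 0);   % pattern of one-step transitions
    An = eye(size(A, 1));        % pattern of n-step transitions
    for n = 1:nmax
        An = double(An * A > 0);
        if An(1, 1) > 0
            d(m) = gcd(d(m), n); % gcd(0, n) = n starts the running gcd
        end
    end
end
disp(d)
```

Under these assumptions d is [1 2]: the zero-diagonal chain is still aperiodic because state 0 can return in both 2 and 3 steps, while every state of the two-class chain has period 2. Working with the 0/1 transition pattern rather than powers of P avoids any question of numerical underflow.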
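Problem 12.8.1 Solution

A careful reader of the text will observe that the Markov chain in this problem is identical to the Markov chain introduced in Example 12.21:

[Figure: Markov chain on states 0, 1, 2, ... with transition probabilities p and 1 − p.]

The stationary probabilities were found in Example 12.26 to be

\[ \pi_i = (1 - \alpha)\alpha^i, \qquad i = 0, 1, \ldots, \tag{1} \]

where $\alpha = p/(1-p)$. Note that the stationary probabilities do not exist if $\alpha \ge 1$, or equivalently, $p \ge 1/2$.

Problem 12.8.2 Solution

If there are $k$ customers in the system at time $n$, then at time $n + 1$, the number of customers in the system is either $k - 1$ (if the customer in service departs and no new customer arrives), $k$ (if either there is no new arrival and no departure or if there is both a new arrival and a departure) or $k + 1$, if there is a new arrival but no new departure. The transition probabilities are given by the following chain:

[Figure: Markov chain on states 0, 1, 2, .... State 0 moves up with probability p and stays with probability 1 − p. Each state i > 0 moves up with probability α, down with probability δ, and stays with probability 1 − α − δ.]

where $\alpha = p(1 - q)$ and $\delta = q(1 - p)$. To find the stationary probabilities, we apply Theorem 12.13 by partitioning the state space between states $S = \{0, 1, \ldots, i\}$ and $S' = \{i + 1, i + 2, \ldots\}$ as shown in Figure 12.4. By Theorem 12.13, for state $i > 0$,

\[ \pi_i \alpha = \pi_{i+1} \delta. \tag{1} \]

This implies $\pi_{i+1} = (\alpha/\delta)\pi_i$. A cut between states 0 and 1 yields $\pi_1 = (p/\delta)\pi_0$. Combining these results, we have for any state $i > 0$,

\[ \pi_i = \frac{p}{\delta}\left(\frac{\alpha}{\delta}\right)^{i-1} \pi_0. \tag{2} \]

Under the condition $\alpha < \delta$, it follows that

\[ \sum_{i=0}^{\infty} \pi_i = \pi_0 + \pi_0 \sum_{i=1}^{\infty} \frac{p}{\delta}\left(\frac{\alpha}{\delta}\right)^{i-1} = \pi_0\left(1 + \frac{p/\delta}{1 - \alpha/\delta}\right), \tag{3} \]

since $p < q$ implies $\alpha/\delta < 1$. Thus, applying $\sum_i \pi_i = 1$ and noting $\delta - \alpha = q - p$, we have

\[ \pi_0 = \frac{q - p}{q}, \qquad \pi_i = \frac{p\,(q - p)}{q^2 (1 - p)} \left[\frac{p/(1-p)}{q/(1-q)}\right]^{i-1}, \quad i = 1, 2, \ldots \tag{4} \]

Note that $\alpha < \delta$ if and only if $p < q$, which is both sufficient and necessary for the Markov chain to be positive recurrent.

Problem 12.9.1 Solution

From the problem statement, we learn that in each state $i$, the tiger spends an exponential time with parameter $\lambda_i$. When we measure time in hours,

\[ \lambda_0 = q_{01} = 1/3, \qquad \lambda_1 = q_{12} = 1/2, \qquad \lambda_2 = q_{20} = 2. \tag{1} \]

The corresponding continuous time Markov chain is shown below:

[Figure: three-state chain with transitions 0 → 1 at rate 1/3, 1 → 2 at rate 1/2, and 2 → 0 at rate 2.]

The state probabilities satisfy

\[ \tfrac{1}{3} p_0 = 2 p_2, \qquad \tfrac{1}{2} p_1 = \tfrac{1}{3} p_0, \qquad p_0 + p_1 + p_2 = 1. \tag{2} \]

The solution is

\[ \begin{bmatrix} p_0 & p_1 & p_2 \end{bmatrix} = \begin{bmatrix} 6/11 & 4/11 & 1/11 \end{bmatrix}. \tag{3} \]

Problem 12.9.2 Solution

In the continuous time chain, we have states 0 (silent) and 1 (active). The transition rates and the chain are

[Figure: two-state chain with transitions 0 → 1 at rate 0.71 and 1 → 0 at rate 1.]

\[ q_{01} = \frac{1}{1.4} = 0.7143, \qquad q_{10} = 1.0. \tag{1} \]

The stationary probabilities satisfy $(1/1.4)p_0 = p_1$. Since $p_0 + p_1 = 1$, the stationary probabilities are

\[ p_0 = \frac{1.4}{2.4} = \frac{7}{12}, \qquad p_1 = \frac{1}{2.4} = \frac{5}{12}. \tag{2} \]

In this case, the continuous time chain and the discrete time chain have the exact same state probabilities. In this problem, this is not surprising since we could use a renewal-reward process to calculate the fraction of time spent in state 0. From the renewal-reward process, we know that the fraction of time spent in state 0 depends only on the expected time in each state. Since in both the discrete time and continuous time chains, the expected time in each state is the same, the stationary probabilities must be the same. It is always possible to approximate a continuous time chain with a discrete time chain in which the unit of time is chosen to be very small. In general however, the stationary probabilities of the two chains will be close though not identical.

Problem 12.9.3 Solution

From each state $i$, there are transitions of rate $q_{ij} = 1$ to each of the other $k - 1$ states. Thus each state $i$ has departure rate $\nu_i = k - 1$. Thus, the stationary probabilities satisfy

\[ p_j (k - 1) = \sum_{i \ne j} p_i, \qquad j = 1, 2, \ldots, k. \tag{1} \]

It is easy to verify that the solution to these equations is

\[ p_j = \frac{1}{k}, \qquad j = 1, 2, \ldots, k. \tag{2} \]

Problem 12.9.4 Solution

In this problem, we build a two-state Markov chain such that the system is in state $i \in \{0, 1\}$ if the most recent arrival of either Poisson process is type $i$. Note that if the system is in state 0, transitions to state 1 occur with rate $\lambda_1$. If the system is in state 1, transitions to state 0 occur at rate $\lambda_0$. The continuous time Markov chain is just

[Figure: two-state chain with transitions 0 → 1 at rate λ1 and 1 → 0 at rate λ0.]

The stationary probabilities satisfy $p_0 \lambda_1 = p_1 \lambda_0$. Thus $p_1 = (\lambda_1/\lambda_0)p_0$. Since $p_0 + p_1 = 1$, we have that

\[ p_0 + (\lambda_1/\lambda_0)\,p_0 = 1. \tag{1} \]

This implies

\[ p_0 = \frac{\lambda_0}{\lambda_0 + \lambda_1}, \qquad p_1 = \frac{\lambda_1}{\lambda_0 + \lambda_1}. \tag{2} \]

It is also possible to solve this problem using a discrete time Markov chain. One way to do this is to assume a very small time step $\Delta$. In state 0, a transition to state 1 occurs with probability $\lambda_1 \Delta$; otherwise the system stays in state 0 with probability $1 - \lambda_1 \Delta$. Similarly, in state 1, a transition to state 0 occurs with probability $\lambda_0 \Delta$; otherwise the system stays in state 1 with probability $1 - \lambda_0 \Delta$. Here is the Markov chain for this discrete time system:

[Figure: two-state discrete time chain with transition probabilities λ1Δ from 0 to 1, λ0Δ from 1 to 0, and self-transition probabilities 1 − λ1Δ and 1 − λ0Δ.]

Not surprisingly, the stationary probabilities for this discrete time system are

\[ \pi_0 = \frac{\lambda_0}{\lambda_0 + \lambda_1}, \qquad \pi_1 = \frac{\lambda_1}{\lambda_0 + \lambda_1}. \tag{3} \]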
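The stationary probabilities of Problem 12.9.1 can also be obtained by solving the balance equations as a linear system. The following is a minimal Matlab sketch using only base Matlab; it follows the usual recipe of replacing one balance equation with the normalization condition, and the variable names are assumptions rather than anything from the manual's matcode.

```matlab
% Sketch: solve p*R = 0 with sum(p) = 1 for the three-state chain of
% Problem 12.9.1 (variable names are ours).
Q = [0   1/3 0  ;    % state 0 -> 1 at rate 1/3
     0   0   1/2;    % state 1 -> 2 at rate 1/2
     2   0   0  ];   % state 2 -> 0 at rate 2
R = Q - diag(sum(Q, 2));    % put -nu_i on the diagonal
A = R;
A(:, end) = 1;              % replace one balance equation by sum(p) = 1
b = [0 0 1];
p = b / A;                  % row vector solving p*A = b
disp(p)
```

The result agrees with the solution [6/11 4/11 1/11] given above. Any column of R could be replaced by the column of ones, since the balance equations are linearly dependent.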
Problem 12.10.1 Solution

In Equation (12.93), we found that the blocking probability of the M/M/c/c queue was given by the Erlang-B formula

\[ P[B] = P_N(c) = \frac{\rho^c/c!}{\sum_{k=0}^{c} \rho^k/k!}. \tag{1} \]

The parameter $\rho = \lambda/\mu$ is the normalized load. When $c = 2$, the blocking probability is

\[ P[B] = \frac{\rho^2/2}{1 + \rho + \rho^2/2}. \tag{2} \]

Setting $P[B] = 0.1$ yields the quadratic equation

\[ \rho^2 - \frac{2}{9}\rho - \frac{2}{9} = 0. \tag{3} \]

The solutions to this quadratic are

\[ \rho = \frac{1 \pm \sqrt{19}}{9}. \tag{4} \]

The meaningful nonnegative solution is $\rho = (1 + \sqrt{19})/9 = 0.5954$.

Problem 12.10.2 Solution

When we double the call arrival rate, the offered load $\rho = \lambda/\mu$ doubles to $\rho = 160$. However, since we double the number of circuits to $c = 200$, the offered load per circuit remains the same. In this case, if we inspect the system after a long time, the number of calls in progress is described by the stationary probabilities and has PMF

\[ P_N(n) = \begin{cases} \dfrac{\rho^n/n!}{\sum_{k=0}^{200} \rho^k/k!} & n = 0, 1, \ldots, 200, \\ 0 & \text{otherwise.} \end{cases} \tag{1} \]

The probability a call is blocked is

\[ P[B] = P_N(200) = \frac{\rho^{200}/200!}{\sum_{k=0}^{200} \rho^k/k!} = 2.76 \times 10^{-4}. \tag{2} \]

Note that although the load per server remains the same, doubling the number of circuits to 200 caused the blocking probability to go down by more than a factor of 10 (from 0.004 to $2.76 \times 10^{-4}$). This is a general property of the Erlang-B formula and is called trunking efficiency by telephone system engineers. The basic principle is that it's more efficient to share resources among larger groups.

The hard part of calculating $P[B]$ is that most calculators, including MATLAB, have trouble calculating 200!. (In MATLAB, factorial is calculated using the gamma function. That is, 200! = gamma(201).) To do these calculations, you need to observe that if $q_n = \rho^n/n!$, then

\[ q_{n+1} = \frac{\rho}{n+1}\, q_n. \tag{3} \]

A simple MATLAB program that uses this fact to calculate the Erlang-B formula for large values of $c$ is

    function y=erlangbsimple(r,c)
    %load is r=lambda/mu
    %number of servers is c
    p=1.0;
    psum=1.0;
    for k=1:c
        p=p*r/k;
        psum=psum+p;
    end
    y=p/psum;

Essentially the problems with the calculations of erlangbsimple.m are the same as those of calculating the Poisson PMF. A better program for calculating the Erlang-B formula uses the improvements employed in poissonpmf to calculate the Poisson PMF for large values. Here is the code:

    function pb=erlangb(rho,c);
    %Usage: pb=erlangb(rho,c)
    %returns the Erlang-B blocking
    %probability for an M/M/c/c
    %queue with load rho
    pn=exp(-rho)*poissonpmf(rho,0:c);
    pb=pn(c+1)/sum(pn);

Problem 12.10.3 Solution

In the M/M/1/c queue, there is one server and the system has capacity $c$. That is, in addition to the server, there is a waiting room that can hold $c - 1$ customers. With arrival rate $\lambda$ and service rate $\mu$, the Markov chain for this queue is

[Figure: birth-death chain on states 0, 1, ..., c with arrival rate λ and departure rate µ.]

By Theorem 12.24, the stationary probabilities satisfy $p_{i-1}\lambda = p_i \mu$. By defining $\rho = \lambda/\mu$, we have $p_i = \rho p_{i-1}$, which implies

\[ p_n = \rho^n p_0, \qquad n = 0, 1, \ldots, c. \tag{1} \]

Applying the requirement that the stationary probabilities sum to 1 yields

\[ \sum_{i=0}^{c} p_i = p_0\left[1 + \rho + \rho^2 + \cdots + \rho^c\right] = 1. \tag{2} \]

This implies

\[ p_0 = \frac{1 - \rho}{1 - \rho^{c+1}}. \tag{3} \]

The stationary probabilities are

\[ p_n = \frac{(1 - \rho)\rho^n}{1 - \rho^{c+1}}, \qquad n = 0, 1, \ldots, c. \tag{4} \]
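The closed form in Problem 12.10.3 can be compared against a direct numerical solution of the balance equations. The following is a minimal Matlab sketch in base Matlab only; the values of lambda, mu and c are arbitrary illustrative choices, not taken from the problem, and the variable names are assumptions.

```matlab
% Sketch: check p_n = (1-rho)*rho^n/(1-rho^(c+1)) for the M/M/1/c queue
% against a numerical solution. lambda, mu and c are illustrative values.
lambda = 3; mu = 4; c = 5;
rho = lambda / mu;

% Rate matrix of the birth-death chain on states 0..c.
Q = diag(lambda * ones(1, c), 1) + diag(mu * ones(1, c), -1);
R = Q - diag(sum(Q, 2));
A = R;
A(:, end) = 1;                    % normalization replaces one equation
pnum = [zeros(1, c) 1] / A;       % numerical stationary probabilities

n = 0:c;
pformula = (1 - rho) * rho.^n / (1 - rho^(c + 1));
err = max(abs(pnum - pformula));  % largest disagreement between the two
disp(err)
```

The displayed error should be at the level of floating point roundoff. The formula requires ρ ≠ 1; when ρ = 1 the geometric sum in equation (2) equals c + 1 and every state has probability 1/(c + 1).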
Problem 12.10.4 Solution

Since the arrivals are a Poisson process and since the service requirements are exponentially distributed, the set of toll booths are an M/M/c/∞ queue. With an arrival rate of $\lambda$ and a service rate of $\mu = 1$, the Markov chain is:

[Figure: birth-death chain on states 0, 1, ..., c, c+1, ... with arrival rate λ, departure rate nµ in state n ≤ c, and departure rate cµ in states n > c.]

In the solution to Quiz 12.10, we found that the stationary probabilities for the queue satisfied

\[ p_n = \begin{cases} p_0\, \rho^n/n! & n = 1, 2, \ldots, c, \\ p_0\, (\rho/c)^{n-c} \rho^c/c! & n = c + 1, c + 2, \ldots \end{cases} \tag{1} \]

where $\rho = \lambda/\mu = \lambda$. We must be sure that $\rho$ is small enough that there exists $p_0 > 0$ such that

\[ \sum_{n=0}^{\infty} p_n = p_0\left[1 + \sum_{n=1}^{c} \frac{\rho^n}{n!} + \frac{\rho^c}{c!} \sum_{n=c+1}^{\infty} \left(\frac{\rho}{c}\right)^{n-c}\right] = 1. \tag{2} \]

This requirement is met if and only if the infinite sum converges, which occurs if and only if

\[ \sum_{n=c+1}^{\infty} \left(\frac{\rho}{c}\right)^{n-c} = \sum_{j=1}^{\infty} \left(\frac{\rho}{c}\right)^{j} < \infty. \tag{3} \]

That is, $p_0 > 0$ if and only if $\rho/c < 1$, or $\lambda < c$. In short, if the arrival rate in cars per second is less than the service rate (in cars per second) when all booths are busy, then the Markov chain has a stationary distribution. Note that if $\rho > c$, then the Markov chain is no longer positive recurrent and the backlog of cars will grow to infinity.

Problem 12.10.5 Solution

(a) In this case, we have two M/M/1 queues, each with an arrival rate of $\lambda/2$. By defining $\rho = \lambda/\mu$, each queue has a stationary distribution

\[ p_n = (1 - \rho/2)(\rho/2)^n, \qquad n = 0, 1, \ldots \tag{1} \]

Note that in this case, the expected number in queue $i$ is

\[ E[N_i] = \sum_{n=0}^{\infty} n p_n = \frac{\rho/2}{1 - \rho/2}. \tag{2} \]

The expected number in the system is

\[ E[N_1] + E[N_2] = \frac{\rho}{1 - \rho/2}. \tag{3} \]

(b) The combined queue is an M/M/2/∞ queue. As in the solution to Quiz 12.10, the stationary probabilities satisfy

\[ p_n = \begin{cases} p_0\, \rho^n/n! & n = 1, 2, \\ p_0\, (\rho/2)^{n-2} \rho^2/2 & n = 3, 4, \ldots \end{cases} \tag{4} \]

The requirement that $\sum_{n=0}^{\infty} p_n = 1$ yields

\[ p_0 = \left[1 + \rho + \frac{\rho^2}{2} + \frac{\rho^2}{2}\,\frac{\rho/2}{1 - \rho/2}\right]^{-1} = \frac{1 - \rho/2}{1 + \rho/2}. \tag{5} \]

The expected number in the system is $E[N] = \sum_{n=1}^{\infty} n p_n$. Some algebra will show that

\[ E[N] = \frac{\rho}{1 - (\rho/2)^2}. \tag{6} \]

We see that the average number in the combined queue is lower than in the system with individual queues. The reason for this is that in the system with individual queues, there is a possibility that one of the queues becomes empty while there is more than one person in the other queue.
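The comparison in Problem 12.10.5 can be checked numerically by truncating the infinite sums. The following is a minimal Matlab sketch under stated assumptions: rho = 1.2 is an arbitrary load below 2, nmax is a truncation point chosen large enough that the neglected tail is negligible, and the variable names are ours. It uses the fact that for the M/M/2 queue, equation (4) reduces to p_n = 2 p_0 (rho/2)^n for n ≥ 1.

```matlab
% Sketch: E[N] for two separate M/M/1 queues versus one M/M/2 queue.
% rho = 1.2 and nmax = 2000 are illustrative choices (any rho < 2 works).
rho = 1.2;
x = rho / 2;
nmax = 2000;
n = 0:nmax;

% (a) each M/M/1 queue has PMF (1-x)*x^n; there are two such queues
pa = (1 - x) * x.^n;
ENsep = 2 * sum(n .* pa);

% (b) M/M/2 queue: p0 = (1-x)/(1+x) and p_n = 2*p0*x^n for n >= 1
p0 = (1 - x) / (1 + x);
pb = [p0, 2 * p0 * x.^(1:nmax)];
ENcomb = sum(n .* pb);

fprintf('separate: %.4f  combined: %.4f\n', ENsep, ENcomb);
```

For this load the two values agree with the closed forms ρ/(1 − ρ/2) = 3 and ρ/(1 − (ρ/2)²) = 1.875, so the combined queue holds fewer customers on average.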
Problem 12.10.6 Solution

The LCFS queue operates in a way that is quite different from the usual first come, first served queue. However, under the assumptions of exponential service times and Poisson arrivals, customers arrive at rate $\lambda$ and depart at rate $\mu$, no matter which service discipline is used. The Markov chain for the LCFS queue is the same as the Markov chain for the M/M/1 first come, first served queue:

[Figure: birth-death chain on states 0, 1, 2, 3, ... with arrival rate λ and departure rate µ.]

It would seem that the LCFS queue should be less efficient than the ordinary M/M/1 queue because a new arrival causes us to discard the work done on the customer in service. This is not the case, however, because the memoryless property of the exponential PDF implies that no matter how much service had already been performed, the remaining service time remains identical to that of a new customer.

Problem 12.10.7 Solution

Since both types of calls have exponential holding times, the number of calls in the system can be used as the system state. The corresponding Markov chain is

[Figure: birth-death chain on states 0, 1, ..., c−r, c−r+1, ..., c−1, c. Arrival rates are labeled λ+h up to state c−r and λ beyond it; the departure rate in state n is n.]

When the number of calls, $n$, is less than $c - r$, we admit either type of call and $q_{n,n+1} = \lambda + h$. When $n \ge c - r$, we block the new calls and we admit only handoff calls so that $q_{n,n+1} = h$. Since the service times are exponential with an average time of 1 minute, the call departure rate in state $n$ is $n$ calls per minute. Theorem 12.24 says that the stationary probabilities $p_n$ satisfy

\[ p_n = \begin{cases} \dfrac{\lambda + h}{n}\, p_{n-1} & n = 1, 2, \ldots, c - r, \\[1ex] \dfrac{\lambda}{n}\, p_{n-1} & n = c - r + 1, c - r + 2, \ldots, c. \end{cases} \tag{1} \]

This implies

\[ p_n = \begin{cases} \dfrac{(\lambda + h)^n}{n!}\, p_0 & n = 1, 2, \ldots, c - r, \\[1ex] \dfrac{(\lambda + h)^{c-r} \lambda^{n-(c-r)}}{n!}\, p_0 & n = c - r + 1, c - r + 2, \ldots, c. \end{cases} \tag{2} \]

The requirement that $\sum_{n=0}^{c} p_n = 1$ yields

\[ p_0\left[\sum_{n=0}^{c-r} \frac{(\lambda + h)^n}{n!} + (\lambda + h)^{c-r} \sum_{n=c-r+1}^{c} \frac{\lambda^{n-(c-r)}}{n!}\right] = 1. \tag{3} \]

Finally, a handoff call is dropped if and only if a new call finds the system with $c$ calls in progress. The probability that a handoff call is dropped is

\[ P[H] = p_c = \frac{(\lambda + h)^{c-r} \lambda^r}{c!}\, p_0 = \frac{(\lambda + h)^{c-r} \lambda^r / c!}{\sum_{n=0}^{c-r} \frac{(\lambda + h)^n}{n!} + \left(\frac{\lambda + h}{\lambda}\right)^{c-r} \sum_{n=c-r+1}^{c} \frac{\lambda^n}{n!}}. \tag{4} \]

Problem 12.11.1 Solution

Here is the Markov chain describing the free throws.

[Figure: nine-state Markov chain on states −4, −3, ..., 4. In state k the next free throw succeeds with probability 0.5 + 0.1k.]

Note that state 4 corresponds to "4 or more consecutive successes" while state −4 corresponds to "4 or more consecutive misses." We denote the stationary probabilities by the vector

\[ \boldsymbol{\pi} = \begin{bmatrix} \pi_{-4} & \pi_{-3} & \pi_{-2} & \pi_{-1} & \pi_0 & \pi_1 & \pi_2 & \pi_3 & \pi_4 \end{bmatrix}'. \tag{1} \]

For this vector $\boldsymbol{\pi}$, the state transition matrix is

\[ \mathbf{P} = \begin{bmatrix}
0.9 & 0 & 0 & 0 & 0 & 0.1 & 0 & 0 & 0 \\
0.8 & 0 & 0 & 0 & 0 & 0.2 & 0 & 0 & 0 \\
0 & 0.7 & 0 & 0 & 0 & 0.3 & 0 & 0 & 0 \\
0 & 0 & 0.6 & 0 & 0 & 0.4 & 0 & 0 & 0 \\
0 & 0 & 0 & 0.5 & 0 & 0.5 & 0 & 0 & 0 \\
0 & 0 & 0 & 0.4 & 0 & 0 & 0.6 & 0 & 0 \\
0 & 0 & 0 & 0.3 & 0 & 0 & 0 & 0.7 & 0 \\
0 & 0 & 0 & 0.2 & 0 & 0 & 0 & 0 & 0.8 \\
0 & 0 & 0 & 0.1 & 0 & 0 & 0 & 0 & 0.9
\end{bmatrix}. \tag{2} \]

To solve the problem at hand, we divide the work into two functions: freethrowmat(n) returns the n step transition matrix and freethrowp(n) calculates the probability of a success on free throw n.

    function Pn=freethrowmat(n);
    P=[0.9 0 0 0 0 0.1 0 0 0;...
       0.8 0 0 0 0 0.2 0 0 0;...
       0 0.7 0 0 0 0.3 0 0 0;...
       0 0 0.6 0 0 0.4 0 0 0;...
       0 0 0 0.5 0 0.5 0 0 0;...
       0 0 0 0.4 0 0 0.6 0 0;...
       0 0 0 0.3 0 0 0 0.7 0;...
       0 0 0 0.2 0 0 0 0 0.8;...
       0 0 0 0.1 0 0 0 0 0.9];
    Pn=P^n;

    function ps=freethrowp(n);
    PP=freethrowmat(n-1);
    p0=[zeros(1,4) 1 zeros(1,4)];
    ps=p0*PP*0.1*(1:9)';

In freethrowp.m, p0 is the initial state probability row vector $\boldsymbol{\pi}'(0)$. Thus p0*PP is the state probability row vector after $n - 1$ free throws. Finally, p0*PP*0.1*(1:9)' multiplies the state probability vector by the conditional probability of a successful free throw given the current state. The answer to our problem is simply

    >> freethrowp(11)
    ans =
        0.5000
    >>

In retrospect, the calculations are unnecessary! Because the system starts in state 0, symmetry of the Markov chain dictates that states $-k$ and $k$ will have the same probability at every time step. Because state $-k$ has success probability $0.5 - 0.1k$ while state $k$ has success probability $0.5 + 0.1k$, the conditional success probability given the system is in state $-k$ or $k$ is 0.5. Averaged over $k = 1, 2, 3, 4$, the average success probability is still 0.5.

Comment: Perhaps finding the stationary distribution is more interesting. This is done fairly easily:

    >> p=dmcstatprob(freethrowmat(1));
    >> p'
    ans =
        0.3123 0.0390 0.0558 0.0929 0 0.0929 0.0558 0.0390 0.3123

About sixty percent of the time the shooter has either made four or more consecutive free throws or missed four or more free throws. On the other hand, one can argue that in a basketball game, a shooter rarely gets to take more than a half dozen (or so) free throws, so perhaps the stationary distribution isn't all that interesting.
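The symmetry argument in Problem 12.11.1 says the success probability is 0.5 on every free throw, not just the eleventh. The following is a minimal Matlab sketch that checks this for free throws 1 through 12 without relying on matcode. It rebuilds the transition matrix from the rule that state k has success probability 0.5 + 0.1k; the loop-based construction and the variable names are assumptions made for this illustration, not the manual's code.

```matlab
% Sketch: success probability of free throws 1..12, base Matlab only.
% States are ordered -4,...,4, matching the transition matrix in the solution.
P = zeros(9);
for s = 1:9
    k = s - 5;                      % state label, -4..4
    ps = 0.5 + 0.1 * k;             % success probability in state k
    up = min(max(k, 0) + 1, 4);     % next state after a success
    dn = max(min(k, 0) - 1, -4);    % next state after a miss
    P(s, up + 5) = ps;
    P(s, dn + 5) = 1 - ps;
end

succ = 0.1 * (1:9)';                % conditional success probability by state
p = [zeros(1, 4) 1 zeros(1, 4)];    % the shooter starts in state 0
prob = zeros(1, 12);
for n = 1:12
    prob(n) = p * succ;             % P[success on free throw n]
    p = p * P;                      % state probabilities after throw n
end
disp(prob)
```

Every entry of prob equals 0.5 up to roundoff, as the symmetry argument predicts. Updating the row vector p one step at a time avoids forming a matrix power for each n.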
Q= 0 1 0 0 0 0 λ 0 0  0 2 0 0 0 0 λ 0 0   0 0 0 0 0 2 0 λ 0   0 0 0 0 0 0 2 0 λ 0 0 0 0 0 0 0 2 0 (6) (7) where the first row just shows the correspondence of the state probabilities and the matrix columns. This leads to a set of matrix equations for the state probability vector p = p0 p1 p2c p3c p3c p4c p2m p3m p4m p5 The rate transition matrix associated with p is   p0 p1 p2c p3c p4c p2m p3m p4m p5 0 λ 0 0 0 0 0 0 0   1 0 λ 0 0 0 0 0 0   0 1 0 λ 0 0 0 0 0   0 0 1 0 λ 0 0 0 0  . In this problem. The requirement that the stationary probabilities sum to 1 implies 4 1 = p0 + p 1 + j=2 4 (pjc + pjm ) + ∞ n=5 pn ∞ (3) αn−5 (4) (5) = p0 + p1 + j=2 4 (pjc + pjm ) + p5 (pjc + pjm ) + j=2 n=5 = p0 + p1 + p5 1−α This is convenient because for each state j < 5. In particular. This is the approach of cmcstatprob. However. we solve p5 = 1.where α = λ < 2 < 1.m.23 and solve p R = 0 and p 1 = 1. our normal procedure is to use Theorem 12. we use Theorem 12. we follow almost the same procedure. The equation p 1 = 1 replaces one column of the set of matrix equations. instead of replacing an arbitrary column with the equation p 1 = 1. excepting state 5. the departure rate νi from that state equals the sum of entries of the corresponding row of Q. 1−α (8) pR= 0 0 0 0 0 0 0 0 0 1 . we can solve for the staitonary probabilities. we replace the column corresponding to p5 with the equation p0 + p1 + p2c + p3c + p4c + p2m + p3m + p4m + That is. (9) 437 . or 4c.7. 1. we can write E [N ] = The substitution k = n − 5 yields 4 ∞ n=0 4 n = 2. Thus P [W ] = 1 − (p0 + p1 + p2c + p3c + p4c ). alongside the corresponding output. (11) npn = n=0 npn + ∞ n=5 np5 αn−5 (12) E [N ] = n=0 4 npn + p5 npn + p5 n=0 ∞ k=0 (k + 5)αk ∞ k=0 (13) kαk (14) = From Math Fact B. 
Defining pn = pnc + pnm .where  −λ λ 0 0 0 0 0 0  1 −1 − λ λ 0 0 0 0 0   0 1 −1 − λ λ 0 0 0 0   0 0 1 −1 − λ λ 0 0 0  0 0 1 −1 − λ 0 0 0 R= 0   0 2 0 0 0 −2 − λ λ 0   0 0 0 0 0 2 −2 − λ λ   0 0 0 0 0 0 2 −2 − λ 0 0 0 0 0 0 0 2 1 1 1 1 1 1 1 1 1 1−α               (10) Given the stationary distribution. the manager is working unless the system is in state 0. Recall that N is the number of customers in the system at a time in the distant future. (17) We implement these equations in the following program. 3. 3c. 438 . 4. 2c. we conclude that 4 5 + p5 1−α E [N ] = n=0 4 npn + p5 npn + p5 n=0 5 α + 1 − α (1 − α)2 5 − 4α (1 − α)2 (15) (16) = Furthermore. we can now find E[N ] and P [W ]. 2)=2. then there will be min(i. if there are K < i orders. then j − 50 pads ar left at the end of the day. then there will be j ≥ 60 pads at the end of the day and the next state is j.3. .n-1) 1/(1-a)]’. . 109}.pw15]=clerks(1. 50 pads are delivered overnight. >> [en05.1111 pw10 = 0. n=size(Q.5) en15 = 4. PW=1-sum(pv(1:5)).8217 pw05 = 0. In this case. On the other hand.4. and the next state is j = 50. Q(5. j = 50. pv=([zeros(1.0233 >> [en10. 52. if K = 50 + i − j pads are 439 . then all i pads are sold. In state i. (5-4*a)/(1-a)^2].57.9)=lam.. .pw10]=clerks(1.1). the PMF of the number of brake pads ordered on an arbitary day. In this case.82 to 4.5036 pw15 = 0.pw05]=clerks(0.02 to 0.5 customers per minute. j = 51. Similarly. the number of pads remaining is less than 60. there are two possibilities: • If 50 ≤ i ≤ 59. Q=Q+diag([1 1 1 1 0 2 2 2].3. then there are several subcases: – j = 50: If there are K ≥ i orders. 50 pads are delivered overnight. Note that the range of Xn is SX = {50. P [K ≥ i] j = 50. . . K) brake pads sold. 51. 52.. On the other hand.function [EN. We express the transition probabilities in terms of PK (·). and so 50 more pads are delivered overnight. . Q(6.n)=[ones(1. the probability the manager is working rises from 0.1).5 customers.4.2. 
(2) – 51 ≤ j ≤ 59: If 50 + i − j pads are sold.5772 >> We see that in going from an arrival rate of 0.-1).2. and the next state is j with probability Pij = PK (50 + i − j) . we need to find the state transition matrix for the Markov chain. . • If 60 ≤ i ≤ 109. i pads are sold and 50 pads are delivered overnight. . R(:. . Thus the next state is j = 50 if K ≥ i pads are ordered. . To evaluate the system. Thus Pij = P [K ≥ i] . Q=diag(lam*[1 1 1 1 0 1 1 1]. At the end of the day.n-1) 1]*R^(-1)).PW]=clerks(lam). 50 + i.5) en05 = 0.1.0) en10 = 2. We can model the system as a Markov chain with state Xn equal to the number of brake pads in stock at the start of day n.3 Solution Although the inventory system in this problem is relatively simple.5 customers per minute to 1. then the next state is j = i − K + 50. . . the performance analysis is suprisingly complicated. 59. (1) Pij = PK (50 + i − j) j = 51. a=lam/2. EN=pv*[0. the average number of customers goes from 0. R=Q-diag(sum(Q.2)). Problem 12.2222 >> [en15. (3) – 60 ≤ j ≤ i: If there are K = i − j pads ordered.11. . . the probability of this event is Pij = PK (50 + i − j) . (5) We can summarize these observations in this set of state transition probabilities:  50 ≤ i ≤ 109. j = 50. . 10 ≤ j − 50 ≤ 59. j = i + 1. j = 50. 51 ≤ j ≤ 59. These facts imply that we can write the state transition probabilities in the simpler form:  50 ≤ i ≤ 109. . for j > i.  P [K ≥ i]   P (50 + i − j)  K 50 ≤ i ≤ 59. 51 ≤ j ≤ 50 + i. PK (50 + i − j) Pij =  PK (i − j) + PK (50 + i − j) 60 ≤ i ≤ 109. . . Note that the “0 otherwise” rule comes into effect when 50 ≤ i ≤ 59 and j > 50 + i. 51 ≤ j PK (50 + i − j) (7) Pij = 60 ≤ i ≤ 109. we observe that PK (k) = 0 for k < 0.   δi−j 60 ≤ i ≤ 109. 109. 60 ≤ j γk = PK (50 + k) . i + 2. 51 ≤ j ≤ 59. i + 1 ≤ j ≤ 109   0 otherwise (6) Finally. 60 ≤ j ≤ i. In addition. we make the definitions βi = P [K ≥ i] . This implies PK (50 + i − j) = 0 for j > 50 + i. 
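Problem 12.11.3 Solution

Although the inventory system in this problem is relatively simple, the performance analysis is surprisingly complicated. We can model the system as a Markov chain with state $X_n$ equal to the number of brake pads in stock at the start of day $n$. Note that the range of $X_n$ is $S_X = \{50, 51, \ldots, 109\}$. To evaluate the system, we need to find the state transition matrix for the Markov chain. We express the transition probabilities in terms of $P_K(\cdot)$, the PMF of the number of brake pads ordered on an arbitrary day. In state $i$, there are two possibilities:

• If $50 \le i \le 59$, then there will be $\min(i, K)$ brake pads sold. At the end of the day, the number of pads remaining is less than 60, and so 50 more pads are delivered overnight. Thus the next state is $j = 50$ if $K \ge i$ pads are ordered, $i$ pads are sold and 50 pads are delivered overnight. On the other hand, if there are $K < i$ orders, then the next state is $j = i - K + 50$. In this case,

\[ P_{ij} = \begin{cases} P[K \ge i] & j = 50, \\ P_K(50 + i - j) & j = 51, 52, \ldots, 50 + i. \end{cases} \tag{1} \]

• If $60 \le i \le 109$, then there are several subcases:

– $j = 50$: If there are $K \ge i$ orders, then all $i$ pads are sold, 50 pads are delivered overnight, and the next state is $j = 50$. Thus
\[ P_{ij} = P[K \ge i], \qquad j = 50. \tag{2} \]

– $51 \le j \le 59$: If $50 + i - j$ pads are sold, then $j - 50$ pads are left at the end of the day. In this case, 50 pads are delivered overnight, and the next state is $j$ with probability
\[ P_{ij} = P_K(50 + i - j), \qquad j = 51, 52, \ldots, 59. \tag{3} \]

– $60 \le j \le i$: If there are $K = i - j$ pads ordered, then there will be $j \ge 60$ pads at the end of the day and the next state is $j$. On the other hand, if $K = 50 + i - j$ pads are ordered, then there will be $i - (50 + i - j) = j - 50$ pads at the end of the day. Since $60 \le j \le 109$, $10 \le j - 50 \le 59$, there will be 50 pads delivered overnight and the next state will be $j$. Thus
\[ P_{ij} = P_K(i - j) + P_K(50 + i - j), \qquad j = 60, 61, \ldots, i. \tag{4} \]

– For $i < j \le 109$, state $j$ can be reached from state $i$ if there are $50 + i - j$ orders, leaving $i - (50 + i - j) = j - 50$ in stock at the end of the day. This implies 50 pads are delivered overnight and the next state is $j$. The probability of this event is
\[ P_{ij} = P_K(50 + i - j), \qquad j = i + 1, i + 2, \ldots, 109. \tag{5} \]

We can summarize these observations in this set of state transition probabilities:

\[ P_{ij} = \begin{cases}
P[K \ge i] & 50 \le i \le 109,\ j = 50, \\
P_K(50 + i - j) & 50 \le i \le 59,\ 51 \le j \le 50 + i, \\
P_K(50 + i - j) & 60 \le i \le 109,\ 51 \le j \le 59, \\
P_K(i - j) + P_K(50 + i - j) & 60 \le i \le 109,\ 60 \le j \le i, \\
P_K(50 + i - j) & 60 \le i \le 108,\ i + 1 \le j \le 109, \\
0 & \text{otherwise.}
\end{cases} \tag{6} \]

Note that the "0 otherwise" rule comes into effect when $50 \le i \le 59$ and $j > 50 + i$. To simplify these rules, we observe that $P_K(k) = 0$ for $k < 0$. This implies $P_K(50 + i - j) = 0$ for $j > 50 + i$. In addition, for $j > i$, $P_K(i - j) = 0$. These facts imply that we can write the state transition probabilities in the simpler form:

\[ P_{ij} = \begin{cases}
P[K \ge i] & 50 \le i \le 109,\ j = 50, \\
P_K(50 + i - j) & 50 \le i \le 59,\ 51 \le j, \\
P_K(50 + i - j) & 60 \le i \le 109,\ 51 \le j \le 59, \\
P_K(i - j) + P_K(50 + i - j) & 60 \le i \le 109,\ 60 \le j.
\end{cases} \tag{7} \]

Finally, we make the definitions

\[ \beta_i = P[K \ge i], \qquad \gamma_k = P_K(50 + k), \qquad \delta_k = P_K(k) + P_K(50 + k). \tag{8} \]

With these definitions, the state transition probabilities are

\[ P_{ij} = \begin{cases}
\beta_i & 50 \le i \le 109,\ j = 50, \\
\gamma_{i-j} & 50 \le i \le 59,\ 51 \le j, \\
\gamma_{i-j} & 60 \le i \le 109,\ 51 \le j \le 59, \\
\delta_{i-j} & 60 \le i \le 109,\ 60 \le j.
\end{cases} \tag{9} \]

Expressed as a table, the state transition matrix $\mathbf{P}$ is

    i\j  |  50     51    ...   59     60     ...   109
    -----+---------------------------------------------
    50   |  β50    γ−1   ...   γ−9    γ−10   ...   γ−59
    51   |  β51    γ0    ...   γ−8    γ−9    ...   γ−58
    ...  |  ...    ...   ...   ...    ...    ...   ...
    59   |  β59    γ8    ...   γ0     γ−1    ...   γ−50
    60   |  β60    γ9    ...   γ1     δ0     ...   δ−49
    ...  |  ...    ...   ...   ...    ...    ...   ...
    109  |  β109   γ58   ...   γ50    δ49    ...   δ0        (10)

In terms of Matlab, all we need to do is to encode the matrix $\mathbf{P}$, calculate the stationary probability vector $\boldsymbol{\pi}$, and then calculate $E[Y]$, the expected number of pads sold on a typical day. To calculate $E[Y]$, we use iterated expectation. The number of pads ordered is the Poisson random variable $K$. We assume that on a day $n$ that $X_n = i$ and we calculate the conditional expectation

\[ E[Y \mid X_n = i] = E[\min(K, i)] = \sum_{j=0}^{i-1} j P_K(j) + i P[K \ge i]. \tag{11} \]

Since only states $i \ge 50$ are possible, we can write

\[ E[Y \mid X_n = i] = \sum_{j=0}^{48} j P_K(j) + \sum_{j=49}^{i-1} j P_K(j) + i P[K \ge i]. \tag{12} \]

Finally, we assume that on a typical day $n$, the state of the system $X_n$ is described by the stationary probabilities $P[X_n = i] = \pi_i$ and we calculate

\[ E[Y] = \sum_{i=50}^{109} E[Y \mid X_n = i]\, \pi_i. \tag{13} \]

These calculations are given in this Matlab program:

    function [pstat,ey]=brakepads(alpha);
    s=(50:109)';
    beta=1-poissoncdf(alpha,s-1);
    grow=poissonpmf(alpha,50+(-1:-1:-59));
    gcol=poissonpmf(alpha,50+(-1:58));
    drow=poissonpmf(alpha,0:-1:-49);
    dcol=poissonpmf(alpha,0:49);
    P=[beta,toeplitz(gcol,grow)];
    P(11:60,11:60)=P(11:60,11:60)...
        +toeplitz(dcol,drow);
    pstat=dmcstatprob(P);
    [I,J]=ndgrid(49:108,49:108);
    G=J.*(I>=J);
    EYX=(G*gcol)+(s.*beta);
    pk=poissonpmf(alpha,0:48);
    EYX=EYX+(0:48)*pk;
    ey=(EYX')*pstat;

The first half of brakepads.m constructs $\mathbf{P}$ to calculate the stationary probabilities. The first column of $\mathbf{P}$ is just the vector

\[ \texttt{beta} = \begin{bmatrix} \beta_{50} & \cdots & \beta_{109} \end{bmatrix}'. \tag{14} \]

The rest of $\mathbf{P}$ is easy to construct using the toeplitz function. We first build an asymmetric Toeplitz matrix with first row and first column

\[ \texttt{grow} = \begin{bmatrix} \gamma_{-1} & \gamma_{-2} & \cdots & \gamma_{-59} \end{bmatrix} \tag{15} \]
\[ \texttt{gcol} = \begin{bmatrix} \gamma_{-1} & \gamma_0 & \cdots & \gamma_{58} \end{bmatrix}' \tag{16} \]

Note that $\delta_k = P_K(k) + \gamma_k$. Thus, to construct the Toeplitz matrix in the lower right corner of $\mathbf{P}$, we simply add the Toeplitz matrix corresponding to the missing $P_K(k)$ term. The second half of brakepads.m calculates $E[Y]$ using the iterated expectation. Note that

\[ \texttt{EYX} = \begin{bmatrix} E[Y \mid X_n = 50] & \cdots & E[Y \mid X_n = 109] \end{bmatrix}'. \tag{17} \]

The somewhat convoluted code becomes clearer by noting the following correspondences:

\[ E[Y \mid X_n = i] = \underbrace{\sum_{j=0}^{48} j P_K(j)}_{\texttt{(0:48)*pk}} + \underbrace{\sum_{j=49}^{i-1} j P_K(j)}_{\texttt{G*gcol}} + \underbrace{i P[K \ge i]}_{\texttt{s.*beta}}. \tag{18} \]

To find $E[Y]$, we execute the commands:

    >> [ps,ey]=brakepads(50);
    >> ey
    ey =
        49.4154

We see that although the store receives 50 orders for brake pads on average, the average number sold is only 49.42 because once in a while the pads are out of stock. Some experimentation will show that if the expected number of orders $\alpha$ is significantly less than 50, then the expected number of brake pads sold each day is very close to $\alpha$. On the other hand, if $\alpha \gg 50$, then each day the store will run out of pads and will get a delivery of 50 pads each night. The expected number of unfulfilled orders will be very close to $\alpha - 50$.

Note that a new inventory policy in which the overnight delivery is more than 50 pads or the threshold for getting a shipment is more than 60 will reduce the expected number of unfulfilled orders. Whether such a change in policy is a good idea depends on factors such as the carrying cost of inventory that are absent from our simple model.

Problem 12.11.4 Solution

This problem is actually very easy. The state of the system is given by $X$, the number of cars in the system. When $X = 0$, both tellers are idle. When $X = 1$, one teller is busy; however, we do not need to keep track of which teller is busy. When $X = n \ge 2$, both tellers are busy and there are $n - 2$ cars waiting. Here is the Markov chain:

[Figure: birth-death chain on states 0, 1, ..., 8 with arrival rate λ, departure rate 1 in state 1, and departure rate 2 in states 2 through 8.]

Since this is a birth death process, we could easily solve this problem using analysis. However, as this problem is in the Matlab section of this chapter, we might as well construct a Matlab solution:

    function [p,en]=veryfast2(lambda);
    c=2*[0,eye(1,8)]';
    r=lambda*[0,eye(1,8)];
    Q=toeplitz(c,r);
    Q(2,1)=1;
    p=cmcstatprob(Q);
    en=(0:8)*p;

The code solves the stationary distribution and the expected number of cars in the system for an arbitrary arrival rate $\lambda$. Here is the output:

    >> [p,en]=veryfast2(0.75);
    >> p'
    ans =
        0.4546 0.3410 0.1279 0.0480 0.0180 0.0067 0.0025 0.0009 0.0004
    >> en
    en =
        0.8709
    >>

Problem 12.11.5 Solution

Under construction.

Problem 12.11.6 Solution

Under construction.

Problem 12.11.7 Solution

Under construction.

Problem 12.11.8 Solution

Under construction.

Problem 12.11.9 Solution

Under construction.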