John E. Freund's Mathematical Statistics with Applications
Irwin Miller, Marylees Miller
Eighth Edition

Pearson Education Limited, Edinburgh Gate, Harlow, Essex CM20 2JE, England, and Associated Companies throughout the world. Visit us on the World Wide Web at: www.pearsoned.co.uk

© Pearson Education Limited 2014. All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, without either the prior written permission of the publisher or a licence permitting restricted copying in the United Kingdom issued by the Copyright Licensing Agency Ltd, Saffron House, 6–10 Kirby Street, London EC1N 8TS.

All trademarks used herein are the property of their respective owners. The use of any trademark in this text does not vest in the author or publisher any trademark ownership rights in such trademarks, nor does the use of such trademarks imply any affiliation with or endorsement of this book by such owners.

ISBN 10: 1-292-02500-X
ISBN 13: 978-1-292-02500-1

British Library Cataloguing-in-Publication Data: a catalogue record for this book is available from the British Library. Printed in the United States of America.

Pearson Custom Library

Table of Contents (all chapters by Irwin Miller and Marylees Miller)

1. Introduction
2. Probability
3. Probability Distributions and Probability Densities
4. Mathematical Expectation
5. Special Probability Distributions
6. Special Probability Densities
7. Functions of Random Variables
8. Sampling Distributions
9. Decision Theory
10. Point Estimation
11. Interval Estimation
12. Hypothesis Testing
13. Tests of Hypothesis Involving Means, Variances, and Proportions
14. Regression and Correlation
Appendix: Sums and Products
Appendix: Special Probability Distributions
Appendix: Special Probability Densities
Statistical Tables
Index

Introduction

1 Introduction
2 Combinatorial Methods
3 Binomial Coefficients
4 The Theory in Practice

From Chapter 1 of John E. Freund's Mathematical Statistics with Applications, Eighth Edition. Irwin Miller, Marylees Miller. Copyright © 2014 by Pearson Education, Inc. All rights reserved.

1 Introduction

In recent years, the growth of statistics has made itself felt in almost every phase of human activity. Statistics no longer consists merely of the collection of data and their presentation in charts and tables; it is now considered to encompass the science of basing inferences on observed data and the entire problem of making decisions in the face of uncertainty. This covers considerable ground, since uncertainties are met when we flip a coin, when a dietician experiments with food additives, when an actuary determines life insurance premiums, when a quality control engineer accepts or rejects manufactured products, when a teacher compares the abilities of students, when an economist forecasts trends, when a newspaper predicts an election, and even when a physicist describes quantum mechanics.
It would be presumptuous to say that statistics, in its present state of development, can handle all situations involving uncertainties, but new techniques are constantly being developed and modern statistics can, at least, provide the framework for looking at these situations in a logical and systematic fashion. In other words, statistics provides the models that are needed to study situations involving uncertainties, in the same way as calculus provides the models that are needed to describe, say, the concepts of Newtonian physics.

The beginnings of the mathematics of statistics may be found in mid-eighteenth-century studies in probability motivated by interest in games of chance. The theory thus developed for "heads or tails" or "red or black" soon found applications in situations where the outcomes were "boy or girl," "life or death," or "pass or fail," and scholars began to apply probability theory to actuarial problems and some aspects of the social sciences. Later, probability and statistics were introduced into physics by L. Boltzmann, J. Gibbs, and J. Maxwell, and by this century they have found applications in all phases of human endeavor that in some way involve an element of uncertainty or risk. The names that are connected most prominently with the growth of mathematical statistics in the first half of the twentieth century are those of R. A. Fisher, J. Neyman, E. S. Pearson, and A. Wald. More recently, the work of R. Schlaifer, L. J. Savage, and others has given impetus to statistical theories based essentially on methods that date back to the eighteenth-century English clergyman Thomas Bayes.

Mathematical statistics is a recognized branch of mathematics, and it can be studied for its own sake by students of mathematics. Today, the theory of statistics is applied to engineering, physics and astronomy, quality assurance and reliability, drug development, public health and medicine, the design of agricultural or industrial experiments, experimental psychology, and so forth. Those wishing to participate in such applications or to develop new applications will do well to understand the mathematical theory of statistics, for only through such an understanding can applications proceed without the serious mistakes that sometimes occur. The applications are illustrated by means of examples and a separate set of applied exercises, many of them involving the use of computers. To this end, we have added at the end of the chapter a discussion of how the theory of the chapter can be applied in practice.

We begin with a brief review of combinatorial methods and binomial coefficients.

2 Combinatorial Methods

In many problems of statistics we must list all the alternatives that are possible in a given situation, or at least determine how many different possibilities there are. In connection with the latter, we often use the following theorem, sometimes called the basic principle of counting, the counting rule for compound events, or the rule for the multiplication of choices.

THEOREM 1. If an operation consists of two steps, of which the first can be done in n1 ways and for each of these the second can be done in n2 ways, then the whole operation can be done in n1 · n2 ways.

Here, "operation" stands for any kind of procedure, process, or method of selection.
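The multiplication rule is easy to check by brute-force enumeration. The following sketch is ours, not the book's; it assumes Python with its standard itertools module, and the step labels are purely illustrative:

```python
from itertools import product

# A two-step operation: the first step has n1 possibilities,
# the second has n2 possibilities for each choice of the first.
first_step = ["x1", "x2", "x3"]   # n1 = 3 ways
second_step = ["y1", "y2"]        # n2 = 2 ways

# All outcomes are the ordered pairs (xi, yj).
outcomes = list(product(first_step, second_step))
print(outcomes)                   # [('x1', 'y1'), ('x1', 'y2'), ...]

# Theorem 1: there are n1 * n2 = 6 outcomes in all.
assert len(outcomes) == len(first_step) * len(second_step)
```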
To justify this theorem, let us define the ordered pair (xi, yj) to be the outcome that arises when the first step results in possibility xi and the second step results in possibility yj. Then, the set of all possible outcomes is composed of the following n1 · n2 pairs:

(x1, y1), (x1, y2), ..., (x1, yn2)
(x2, y1), (x2, y2), ..., (x2, yn2)
...
(xn1, y1), (xn1, y2), ..., (xn1, yn2)

EXAMPLE 1
Suppose that someone wants to go by bus, train, or plane on a week's vacation to one of the five East North Central States. Find the number of different ways in which this can be done.

Solution
The particular state can be chosen in n1 = 5 ways and the means of transportation can be chosen in n2 = 3 ways. Therefore, the trip can be carried out in 5 · 3 = 15 possible ways. If an actual listing of all the possibilities is desirable, a tree diagram like that in Figure 1 provides a systematic approach. This diagram shows that there are n1 = 5 branches (possibilities) for the number of states, and for each of these branches there are n2 = 3 branches (possibilities) for the different means of transportation. It is apparent that the 15 possible ways of taking the vacation are represented by the 15 distinct paths along the branches of the tree.

[Figure 1. Tree diagram: from each of the five states (Ohio, Indiana, Illinois, Michigan, Wisconsin) there branch three means of transportation, labeled bus, train, and plane.]

EXAMPLE 2
How many possible outcomes are there when we roll a pair of dice, one red and one green?

Solution
The red die can land in any one of six ways, and for each of these six ways the green die can also land in six ways. Therefore, the pair of dice can land in 6 · 6 = 36 ways.

Theorem 1 may be extended to cover situations where an operation consists of two or more steps. In this case, we have the following theorem.

THEOREM 2. If an operation consists of k steps, of which the first can be done in n1 ways, for each of these the second step can be done in n2 ways, for each of the first two the third step can be done in n3 ways, and so forth, then the whole operation can be done in n1 · n2 · ... · nk ways.

EXAMPLE 3
A quality control inspector wishes to select a part for inspection from each of four different bins containing 4, 3, 5, and 4 parts, respectively. In how many different ways can she choose the four parts?

Solution
The total number of ways is 4 · 3 · 5 · 4 = 240.

EXAMPLE 4
In how many different ways can one answer all the questions of a true–false test consisting of 20 questions?

Solution
Altogether there are 2 · 2 · 2 · ... · 2 = 2^20 = 1,048,576 different ways in which one can answer all the questions; only one of these corresponds to the case where all the questions are correct and only one corresponds to the case where all the answers are wrong.

Frequently, we are interested in situations where the outcomes are the different ways in which a group of objects can be ordered or arranged. For instance, we might want to know in how many different ways the 24 members of a club can elect a president, a vice president, a treasurer, and a secretary, or we might want to know in how many different ways six persons can be seated around a table. Different arrangements like these are called permutations.

DEFINITION 1. PERMUTATIONS. A permutation is a distinct arrangement of n different elements of a set.

EXAMPLE 5
How many permutations are there of the letters a, b, and c?
Solution
The possible arrangements are abc, acb, bac, bca, cab, and cba, so the number of distinct permutations is six. Using Theorem 2, we could have arrived at this answer without actually listing the different permutations: since there are three choices to select a letter for the first position, then two for the second position, leaving only one letter for the third position, the total number of permutations is 3 · 2 · 1 = 6.

Generalizing the argument used in the preceding example, we find that n distinct objects can be arranged in n(n − 1)(n − 2) · ... · 3 · 2 · 1 different ways. To simplify our notation, we represent this product by the symbol n!, which is read "n factorial." Thus, 1! = 1, 2! = 2 · 1 = 2, 3! = 3 · 2 · 1 = 6, 4! = 4 · 3 · 2 · 1 = 24, 5! = 5 · 4 · 3 · 2 · 1 = 120, and so on. Also, by definition we let 0! = 1.

THEOREM 3. The number of permutations of n distinct objects is n!.

EXAMPLE 6
In how many different ways can the five starting players of a basketball team be introduced to the public?

Solution
There are 5! = 5 · 4 · 3 · 2 · 1 = 120 ways in which they can be introduced.

EXAMPLE 7
The number of permutations of the four letters a, b, c, and d is 24, but what is the number of permutations if we take only two of the four letters or, as it is usually put, if we take the four letters two at a time?

Solution
We have two positions to fill, with four choices for the first and then three choices for the second. Therefore, by Theorem 1, the number of permutations is 4 · 3 = 12.

Generalizing the argument that we used in the preceding example, we find that n distinct objects taken r at a time, for r > 0, can be arranged in n(n − 1) · ... · (n − r + 1) ways. We denote this product by $_nP_r$, and we let $_nP_0 = 1$ by definition. Therefore, we can state the following theorem.

THEOREM 4. The number of permutations of n distinct objects taken r at a time is
$_nP_r = \frac{n!}{(n-r)!}$
for r = 0, 1, 2, ..., n.

Proof The formula $_nP_r = n(n-1) \cdots (n-r+1)$ cannot be used for r = 0, but we do have
$_nP_0 = \frac{n!}{(n-0)!} = 1$
For r = 1, 2, ..., n, we have
$_nP_r = n(n-1)(n-2) \cdots (n-r+1) = \frac{n(n-1)(n-2) \cdots (n-r+1)(n-r)!}{(n-r)!} = \frac{n!}{(n-r)!}$

In problems concerning permutations, it is usually easier to proceed by using Theorem 2 as in Example 7, but the factorial formula of Theorem 4 is somewhat easier to remember. Many statistical software packages provide values of $_nP_r$ and other combinatorial quantities upon simple commands. Indeed, these quantities are also preprogrammed in many hand-held statistical (or scientific) calculators.

EXAMPLE 8
Four names are drawn from among the 24 members of a club for the offices of president, vice president, treasurer, and secretary. In how many different ways can this be done?

Solution
The number of permutations of 24 distinct objects taken four at a time is
$_{24}P_4 = \frac{24!}{20!} = 24 \cdot 23 \cdot 22 \cdot 21 = 255{,}024$

EXAMPLE 9
In how many ways can a local chapter of the American Chemical Society schedule three speakers for three different meetings if they are all available on any of five possible dates?

Solution
Since we must choose three of the five dates and the order in which they are chosen (assigned to the three speakers) matters, we get
$_5P_3 = \frac{5!}{2!} = \frac{120}{2} = 60$
We might also argue that the first speaker can be scheduled in five ways, the second speaker in four ways, and the third speaker in three ways, so that the answer is 5 · 4 · 3 = 60.

Permutations that occur when objects are arranged in a circle are called circular permutations. Two circular permutations are not considered different (and are counted only once) if corresponding objects in the two arrangements have the same objects to their left and to their right. For example, if four persons are playing bridge, we do not get a different permutation if everyone moves to the chair at his or her right.
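As the text notes, these quantities are preprogrammed in much statistical software. A minimal sketch of our own (in Python, which the book does not use) checks the factorial formula of Theorem 4 against the built-in math.perm and against Examples 8 and 9:

```python
from math import factorial, perm

def n_P_r(n, r):
    """Permutations of n distinct objects taken r at a time (Theorem 4)."""
    return factorial(n) // factorial(n - r)

assert n_P_r(24, 4) == perm(24, 4) == 24 * 23 * 22 * 21 == 255_024  # Example 8
assert n_P_r(5, 3) == perm(5, 3) == 5 * 4 * 3 == 60                 # Example 9
assert n_P_r(5, 0) == 1                                             # nP0 = 1 by definition
```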
EXAMPLE 10
How many circular permutations are there of four persons playing bridge?

Solution
If we arbitrarily consider the position of one of the four players as fixed, we can seat (arrange) the other three players in 3! = 6 different ways. In other words, there are six different circular permutations.

Generalizing the argument used in the preceding example, we obtain the following theorem.

THEOREM 5. The number of permutations of n distinct objects arranged in a circle is (n − 1)!.

We have been assuming until now that the n objects from which we select r objects and form permutations are all distinct. Thus, the various formulas cannot be used, for example, to determine the number of ways in which we can arrange the letters in the word "book," or the number of ways in which three copies of one novel and one copy each of four other novels can be arranged on a shelf.

EXAMPLE 11
How many different permutations are there of the letters in the word "book"?

Solution
If we distinguish for the moment between the two o's by labeling them o1 and o2, there are 4! = 24 different permutations of the symbols b, o1, o2, and k. However, if we drop the subscripts, then bo1ko2 and bo2ko1, for instance, both yield boko, and since each pair of permutations with subscripts yields but one arrangement without subscripts, the total number of arrangements of the letters in the word "book" is 24/2 = 12.

EXAMPLE 12
In how many different ways can three copies of one novel and one copy each of four other novels be arranged on a shelf?

Solution
If we denote the three copies of the first novel by a1, a2, and a3 and the other four novels by b, c, d, and e, we find that with subscripts there are 7! different permutations of a1, a2, a3, b, c, d, and e. However, since there are 3! permutations of a1, a2, and a3 that lead to the same permutation of a, a, a, b, c, d, and e, we find that there are only 7!/3! = 7 · 6 · 5 · 4 = 840 ways in which the seven books can be arranged on a shelf.

Generalizing the argument used in the two preceding examples, we obtain the following theorem.

THEOREM 6. The number of permutations of n objects of which n1 are of one kind, n2 are of a second kind, ..., nk are of a kth kind, and n1 + n2 + ··· + nk = n is
$\frac{n!}{n_1! \cdot n_2! \cdots n_k!}$

EXAMPLE 13
In how many ways can two paintings by Monet, three paintings by Renoir, and two paintings by Degas be hung side by side on a museum wall if we do not distinguish between the paintings by the same artists?

Solution
Substituting n = 7, n1 = 2, n2 = 3, and n3 = 2 into the formula of Theorem 6, we get
$\frac{7!}{2! \cdot 3! \cdot 2!} = 210$

There are many problems in which we are interested in determining the number of ways in which r objects can be selected from among n distinct objects without regard to the order in which they are selected.

DEFINITION 2. COMBINATIONS. A combination is a selection of r objects taken from n distinct objects without regard to the order of selection.

EXAMPLE 14
In how many different ways can a person gathering data for a market research organization select three of the 20 households living in a certain apartment complex?

Solution
If we care about the order in which the households are selected, the answer is
$_{20}P_3 = 20 \cdot 19 \cdot 18 = 6{,}840$
but each set of three households would then be counted 3! = 6 times. If we do not care about the order in which the households are selected, there are only 6,840 ÷ 6 = 1,140 ways in which the person gathering the data can do his or her job.

Actually, "combination" means the same as "subset," and when we ask for the number of combinations of r objects selected from a set of n distinct objects, we are simply asking for the total number of subsets of r objects that can be selected from a set of n distinct objects. In general, there are r! permutations of the objects in a subset of r objects, so that the $_nP_r$ permutations of r objects selected from a set of n distinct objects contain each subset r! times. Dividing $_nP_r$ by r! and denoting the result by the symbol $\binom{n}{r}$, we thus have the following theorem.

THEOREM 7. The number of combinations of n distinct objects taken r at a time is
$\binom{n}{r} = \frac{n!}{r!(n-r)!}$
for r = 0, 1, 2, ..., n.

EXAMPLE 15
In how many different ways can six tosses of a coin yield two heads and four tails?

Solution
This question is the same as asking for the number of ways in which we can select the two tosses on which heads is to occur. Therefore, applying Theorem 7, we find that the answer is
$\binom{6}{2} = \frac{6!}{2! \cdot 4!} = 15$
This result could also have been obtained by the rather tedious process of enumerating the various possibilities, HHTTTT, HTHTTT, TTHTHT, ..., where H stands for head and T for tail.

EXAMPLE 16
How many different committees of two chemists and one physicist can be formed from the four chemists and three physicists on the faculty of a small college?

Solution
Since two of four chemists can be selected in $\binom{4}{2} = \frac{4!}{2! \cdot 2!} = 6$ ways and one of three physicists can be selected in $\binom{3}{1} = \frac{3!}{1! \cdot 2!} = 3$ ways, Theorem 1 shows that the number of committees is 6 · 3 = 18.

A combination of r objects selected from a set of n distinct objects may be considered a partition of the n objects into two subsets containing, respectively, the r objects that are selected and the n − r objects that are left. Often, we are concerned with the more general problem of partitioning a set of n distinct objects into k subsets, which requires that each of the n objects must belong to one and only one of the subsets.† The order of the objects within a subset is of no importance.

† Symbolically, the subsets A1, A2, ..., Ak constitute a partition of set A if A1 ∪ A2 ∪ ··· ∪ Ak = A and Ai ∩ Aj = ∅ for all i ≠ j.
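The division by r! in Theorem 7 can be checked directly. A short sketch (ours, not the book's; Python again) verifies Examples 14 and 16:

```python
from math import comb, factorial, perm

# Each 3-household subset is counted 3! times among the ordered selections,
# so dividing nPr by r! yields the combination count (Theorem 7).
assert perm(20, 3) == 6_840                                 # Example 14, ordered
assert perm(20, 3) // factorial(3) == comb(20, 3) == 1_140  # Example 14, unordered
assert comb(4, 2) * comb(3, 1) == 6 * 3 == 18               # Example 16, committees
```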
EXAMPLE 17
In how many ways can a set of four objects be partitioned into three subsets containing, respectively, two, one, and one of the objects?

Solution
Denoting the four objects by a, b, c, and d, we find by enumeration that there are the following 12 possibilities:

ab|c|d  ab|d|c  ac|b|d  ac|d|b
ad|b|c  ad|c|b  bc|a|d  bc|d|a
bd|a|c  bd|c|a  cd|a|b  cd|b|a

The number of partitions for this example is denoted by the symbol
$\binom{4}{2, 1, 1} = 12$
where the number at the top represents the total number of objects and the numbers at the bottom represent the number of objects going into each subset.

Had we not wanted to enumerate all the possibilities in the preceding example, we could have argued that the two objects going into the first subset can be chosen in $\binom{4}{2} = 6$ ways, the object going into the second subset can then be chosen in $\binom{2}{1} = 2$ ways, and the object going into the third subset can then be chosen in $\binom{1}{1} = 1$ way. Thus, by Theorem 2 there are 6 · 2 · 1 = 12 partitions. Generalizing this argument, we have the following theorem.

THEOREM 8. The number of ways in which a set of n distinct objects can be partitioned into k subsets with n1 objects in the first subset, n2 objects in the second subset, ..., and nk objects in the kth subset is
$\binom{n}{n_1, n_2, \ldots, n_k} = \frac{n!}{n_1! \cdot n_2! \cdots n_k!}$

Proof Since the n1 objects going into the first subset can be chosen in $\binom{n}{n_1}$ ways, the n2 objects going into the second subset can then be chosen in $\binom{n-n_1}{n_2}$ ways, the n3 objects going into the third subset can then be chosen in $\binom{n-n_1-n_2}{n_3}$ ways, and so forth, it follows by Theorem 2 that the total number of partitions is
$\binom{n}{n_1, n_2, \ldots, n_k} = \binom{n}{n_1} \cdot \binom{n-n_1}{n_2} \cdots \binom{n-n_1-n_2-\cdots-n_{k-1}}{n_k}$
$= \frac{n!}{n_1!(n-n_1)!} \cdot \frac{(n-n_1)!}{n_2!(n-n_1-n_2)!} \cdots \frac{(n-n_1-n_2-\cdots-n_{k-1})!}{n_k! \cdot 0!}$
$= \frac{n!}{n_1! \cdot n_2! \cdots n_k!}$

EXAMPLE 18
In how many ways can seven businessmen attending a convention be assigned to one triple and two double hotel rooms?

Solution
Substituting n = 7, n1 = 3, n2 = 2, and n3 = 2 into the formula of Theorem 8, we get
$\binom{7}{3, 2, 2} = \frac{7!}{3! \cdot 2! \cdot 2!} = 210$

3 Binomial Coefficients

If n is a positive integer and we multiply out (x + y)^n term by term, each term will be the product of x's and y's, with an x or a y coming from each of the n factors x + y. For instance, the expansion
(x + y)^3 = (x + y)(x + y)(x + y)
= x·x·x + x·x·y + x·y·x + x·y·y + y·x·x + y·x·y + y·y·x + y·y·y
= x^3 + 3x^2y + 3xy^2 + y^3
yields terms of the form x^3, x^2y, xy^2, and y^3. Their coefficients are 1, 3, 3, and 1, and the coefficient of x^2y is $\binom{3}{1} = 3$, the number of ways in which we can choose the one factor providing the y. Similarly, the coefficient of xy^2 is $\binom{3}{2} = 3$, the number of ways in which we can choose the two factors providing the y's, and the coefficients of x^3 and y^3 are $\binom{3}{0} = 1$ and $\binom{3}{3} = 1$.

More generally, if n is a positive integer and we multiply out (x + y)^n term by term, the coefficient of x^(n−r) y^r is $\binom{n}{r}$, the number of ways in which we can choose the r factors providing the y's. Accordingly, we refer to $\binom{n}{r}$ as a binomial coefficient. Values of the binomial coefficients for n = 0, 1, ..., 20 and r = 0, 1, ..., 10 are given in table Factorials and Binomial Coefficients of "Statistical Tables." We can now state the following theorem.

THEOREM 9.
$(x + y)^n = \sum_{r=0}^{n} \binom{n}{r} x^{n-r} y^r$ for any positive integer n
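The counting argument behind Theorem 9 can be replayed by machine: expand (x + y)^n by literally picking an x or a y from each factor and tallying how many products contain exactly r y's. This sketch is ours, not the book's, and assumes Python:

```python
from itertools import product
from math import comb

n = 5
counts = [0] * (n + 1)
# Pick "x" or "y" from each of the n factors of (x + y)^n.
for picks in product("xy", repeat=n):
    counts[picks.count("y")] += 1   # this product is the term x^(n-r) y^r

# Theorem 9: the coefficient of x^(n-r) y^r is C(n, r).
assert counts == [comb(n, r) for r in range(n + 1)]  # [1, 5, 10, 10, 5, 1]
```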
DEFINITION 3. BINOMIAL COEFFICIENTS. The coefficient of x^(n−r) y^r in the binomial expansion of (x + y)^n is called the binomial coefficient $\binom{n}{r}$.

The calculation of binomial coefficients can often be simplified by making use of the three theorems that follow.

THEOREM 10. For any positive integers n and r = 0, 1, 2, ..., n,
$\binom{n}{r} = \binom{n}{n-r}$

Proof We might argue that when we select a subset of r objects from a set of n distinct objects, we leave a subset of n − r objects; hence, there are as many ways of selecting r objects as there are ways of leaving (or selecting) n − r objects. To prove the theorem algebraically, we write
$\binom{n}{n-r} = \frac{n!}{(n-r)![n-(n-r)]!} = \frac{n!}{(n-r)! \, r!} = \frac{n!}{r!(n-r)!} = \binom{n}{r}$

Theorem 10 implies that if we calculate the binomial coefficients for r = 0, 1, ..., n/2 when n is even and for r = 0, 1, ..., (n − 1)/2 when n is odd, the remaining binomial coefficients can be obtained by making use of the theorem.

EXAMPLE 19
Given $\binom{4}{0} = 1$, $\binom{4}{1} = 4$, and $\binom{4}{2} = 6$, find $\binom{4}{3}$ and $\binom{4}{4}$.

Solution
$\binom{4}{3} = \binom{4}{4-3} = \binom{4}{1} = 4$ and $\binom{4}{4} = \binom{4}{4-4} = \binom{4}{0} = 1$

EXAMPLE 20
Given $\binom{5}{0} = 1$, $\binom{5}{1} = 5$, and $\binom{5}{2} = 10$, find $\binom{5}{3}$, $\binom{5}{4}$, and $\binom{5}{5}$.

Solution
$\binom{5}{3} = \binom{5}{5-3} = \binom{5}{2} = 10$, $\binom{5}{4} = \binom{5}{5-4} = \binom{5}{1} = 5$, and $\binom{5}{5} = \binom{5}{5-5} = \binom{5}{0} = 1$

It is precisely in this fashion that Theorem 10 may have to be used in connection with table Factorials and Binomial Coefficients of "Statistical Tables."
EXAMPLE 21
Find $\binom{20}{12}$ and $\binom{17}{10}$.

Solution
Since $\binom{20}{12}$ is not given in the table, we make use of the fact that $\binom{20}{12} = \binom{20}{8}$, look up $\binom{20}{8}$, and get $\binom{20}{12} = 125{,}970$. Similarly, to find $\binom{17}{10}$, we make use of the fact that $\binom{17}{10} = \binom{17}{7}$, look up $\binom{17}{7}$, and get $\binom{17}{10} = 19{,}448$.

THEOREM 11. For any positive integer n and r = 1, 2, ..., n − 1,
$\binom{n}{r} = \binom{n-1}{r} + \binom{n-1}{r-1}$

Proof Substituting x = 1 into (x + y)^n, let us write
(1 + y)^n = (1 + y)(1 + y)^(n−1) = (1 + y)^(n−1) + y(1 + y)^(n−1)
and equate the coefficient of y^r in (1 + y)^n with that in (1 + y)^(n−1) + y(1 + y)^(n−1). Since the coefficient of y^r in (1 + y)^n is $\binom{n}{r}$ and the coefficient of y^r in (1 + y)^(n−1) + y(1 + y)^(n−1) is the sum of the coefficient of y^r in (1 + y)^(n−1), that is, $\binom{n-1}{r}$, and the coefficient of y^(r−1) in (1 + y)^(n−1), that is, $\binom{n-1}{r-1}$, we obtain
$\binom{n}{r} = \binom{n-1}{r} + \binom{n-1}{r-1}$
which completes the proof.

Alternatively, take any one of the n objects. If it is not to be included among the r objects, there are $\binom{n-1}{r}$ ways of selecting the r objects; if it is to be included, there are $\binom{n-1}{r-1}$ ways of selecting the other r − 1 objects. Therefore, there are $\binom{n-1}{r} + \binom{n-1}{r-1}$ ways of selecting the r objects, that is,
$\binom{n}{r} = \binom{n-1}{r} + \binom{n-1}{r-1}$
Theorem 11 can also be proved by expressing the binomial coefficients on both sides of the equation in terms of factorials and then proceeding algebraically, but we shall leave this to the reader in Exercise 12.

An important application of Theorem 11 is a construct known as Pascal's triangle. When no table is available, it is sometimes convenient to determine binomial coefficients by means of a simple construction. Applying Theorem 11, we can generate Pascal's triangle as follows:

1
1  1
1  2  1
1  3  3  1
1  4  6  4  1
1  5  10  10  5  1
...

In this triangle, the first and last entries of each row are the numeral "1"; each other entry in any given row is obtained by adding the two entries in the preceding row immediately to its left and to its right.

To state the third theorem about binomial coefficients, let us make the following definition: $\binom{n}{r} = 0$ whenever n is a positive integer and r is a positive integer greater than n. (Clearly, there is no way in which we can select a subset that contains more elements than the whole set itself.)

THEOREM 12.
$\sum_{r=0}^{k} \binom{m}{r} \binom{n}{k-r} = \binom{m+n}{k}$

Proof Using the same technique as in the proof of Theorem 11, let us prove this theorem by equating the coefficients of y^k in the expressions on both sides of the equation
(1 + y)^(m+n) = (1 + y)^m (1 + y)^n
The coefficient of y^k in (1 + y)^(m+n) is $\binom{m+n}{k}$, and the coefficient of y^k in
$(1+y)^m (1+y)^n = \left[\binom{m}{0} + \binom{m}{1} y + \cdots + \binom{m}{m} y^m\right] \cdot \left[\binom{n}{0} + \binom{n}{1} y + \cdots + \binom{n}{n} y^n\right]$
is the sum of the products that we obtain by multiplying the constant term of the first factor by the coefficient of y^k in the second factor, the coefficient of y in the first factor by the coefficient of y^(k−1) in the second factor, ..., and the coefficient of y^k in the first factor by the constant term of the second factor. Thus, the coefficient of y^k in (1 + y)^m (1 + y)^n is
$\binom{m}{0}\binom{n}{k} + \binom{m}{1}\binom{n}{k-1} + \binom{m}{2}\binom{n}{k-2} + \cdots + \binom{m}{k}\binom{n}{0} = \sum_{r=0}^{k} \binom{m}{r}\binom{n}{k-r}$
and this completes the proof.
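Both Theorem 11 and Theorem 12 are easy to check by machine. A sketch of ours (Python; the helper name pascal_row is our own) builds rows of Pascal's triangle from the recurrence and spot-checks Theorem 12 with the values used in Example 22 below; note that Python's math.comb already returns 0 when r > n, matching the convention just defined:

```python
from math import comb

def pascal_row(n):
    """Row n of Pascal's triangle, built with the recurrence of Theorem 11."""
    row = [1]
    for _ in range(n):
        row = [1] + [a + b for a, b in zip(row, row[1:])] + [1]
    return row

assert pascal_row(5) == [1, 5, 10, 10, 5, 1]

# Theorem 12: sum over r of C(m, r) * C(n, k - r) equals C(m + n, k).
m, n, k = 2, 3, 4
assert sum(comb(m, r) * comb(n, k - r) for r in range(k + 1)) == comb(m + n, k)
```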
EXAMPLE 22
Verify Theorem 12 numerically for m = 2, n = 3, and k = 4.

Solution
Substituting these values, we get
$\binom{2}{0}\binom{3}{4} + \binom{2}{1}\binom{3}{3} + \binom{2}{2}\binom{3}{2} + \binom{2}{3}\binom{3}{1} + \binom{2}{4}\binom{3}{0} = \binom{5}{4}$
and since $\binom{3}{4}$, $\binom{2}{3}$, and $\binom{2}{4}$ equal 0 according to the preceding definition, the equation reduces to
$\binom{2}{1}\binom{3}{3} + \binom{2}{2}\binom{3}{2} = \binom{5}{4}$
which checks, since 2 · 1 + 1 · 3 = 5.

Using Theorem 8, we can extend our discussion to multinomial coefficients, that is, to the coefficients that arise in the expansion of (x1 + x2 + ··· + xk)^n. The multinomial coefficient of the term $x_1^{r_1} \cdot x_2^{r_2} \cdots x_k^{r_k}$ in the expansion of $(x_1 + x_2 + \cdots + x_k)^n$ is
$\binom{n}{r_1, r_2, \ldots, r_k} = \frac{n!}{r_1! \cdot r_2! \cdots r_k!}$

EXAMPLE 23
What is the coefficient of $x_1^3 x_2 x_3^2$ in the expansion of $(x_1 + x_2 + x_3)^6$?

Solution
Substituting n = 6, r1 = 3, r2 = 1, and r3 = 2 into the preceding formula, we get
$\frac{6!}{3! \cdot 1! \cdot 2!} = 60$

Exercises

1. An operation consists of two steps, of which the first can be made in n1 ways. If the first step is made in the ith way, the second step can be made in n2i ways.
(a) Use a tree diagram to find a formula for the total number of ways in which the total operation can be made.
(b) A student can study 0, 1, 2, or 3 hours for a history test on any given day. Use the formula obtained in part (a) to verify that there are 13 ways in which the student can study at most 4 hours for the test on two consecutive days.

2. With reference to Exercise 1, verify that if n2i equals the constant n2, the formula obtained in part (a) reduces to that of Theorem 1.

3. With reference to Exercise 1, suppose that there is a third step, and if the first step is made in the ith way and the second step in the jth way, the third step can be made in n3ij ways.†
(a) Use a tree diagram to verify that the whole operation can be made in
$\sum_{i=1}^{n_1} \sum_{j=1}^{n_{2i}} n_{3ij}$
different ways.
(b) With reference to part (b) of Exercise 1, use the formula of part (a) to verify that there are 32 ways in which the student can study at most 4 hours for the test on three consecutive days.

† The first subscript denotes the row to which a particular element belongs, and the second subscript denotes the column.

4. Show that if n2i equals the constant n2 and n3ij equals the constant n3, the formula of part (a) of Exercise 3 reduces to that of Theorem 2.

5. In a two-team basketball play-off, the winner is the first team to win m games.
(a) Counting separately the number of play-offs requiring m, m + 1, ..., and 2m − 1 games, show that the total number of different outcomes (sequences of wins and losses by one of the teams) is
$2\left[\binom{m-1}{m-1} + \binom{m}{m-1} + \cdots + \binom{2m-2}{m-1}\right]$
(b) How many different outcomes are there in a "2 out of 3" play-off, a "3 out of 5" play-off, and a "4 out of 7" play-off?

6. When n is large, n! can be approximated by means of the expression
$n^n e^{-n} \sqrt{2\pi n}$
called Stirling's formula, where e is the base of natural logarithms. (A derivation of this formula may be found in the book by W. Feller cited among the references at the end of this chapter.)
(a) Use Stirling's formula to obtain approximations for 10! and 12!, and find the percentage errors of these approximations by comparing them with the exact values given in table Factorials and Binomial Coefficients of "Statistical Tables."
(b) Use Stirling's formula to obtain an approximation for the number of 13-card bridge hands that can be dealt with an ordinary deck of 52 playing cards.

7. Using Stirling's formula (see Exercise 6) to approximate (2n)! and n!, show that
$\binom{2n}{n} \approx \frac{2^{2n}}{\sqrt{\pi n}}$

8. In some problems of occupancy theory we are concerned with the number of ways in which certain distinguishable objects can be distributed among individuals, urns, boxes, or cells. Find an expression for the number of ways in which r distinguishable objects can be distributed among n cells, and use it to find the number of ways in which three different books can be distributed among the 12 students in an English literature class.

9. In some problems of occupancy theory we are concerned with the number of ways in which certain indistinguishable objects can be distributed among individuals, urns, boxes, or cells. Find an expression for the number of ways in which r indistinguishable objects can be distributed among n cells, and use it to find the number of ways in which a baker can sell five (indistinguishable) loaves of bread to three customers. (Hint: We might argue that L|LLL|L represents the case where the three customers buy one loaf, three loaves, and one loaf, respectively, and that LLLL||L represents the case where the three customers buy four loaves, none of the loaves, and one loaf. Thus, we must look for the number of ways in which we can arrange the five L's and the two vertical bars.)

10. In some problems of occupancy theory we are concerned with the number of ways in which certain indistinguishable objects can be distributed among individuals, urns, boxes, or cells with at least one in each cell. Find an expression for the number of ways in which r indistinguishable objects can be distributed among n cells with at least one in each cell, and rework the numerical part of Exercise 9 with each of the three customers getting at least one loaf of bread.

11. Construct the seventh and eighth rows of Pascal's triangle and write the binomial expansions of (x + y)^6 and (x + y)^7.

12. Prove Theorem 11 by expressing all the binomial coefficients in terms of factorials and then simplifying algebraically.

13. Expressing the binomial coefficients in terms of factorials and simplifying algebraically, show that
(a) $\binom{n}{r} = \frac{n-r+1}{r} \cdot \binom{n}{r-1}$;
(b) $\binom{n}{r} = \frac{n}{n-r} \cdot \binom{n-1}{r}$;
(c) $n\binom{n-1}{r} = (r+1)\binom{n}{r+1}$.

14. Substituting appropriate values for x and y into the formula of Theorem 9, show that
(a) $\sum_{r=0}^{n} \binom{n}{r} = 2^n$;
(b) $\sum_{r=0}^{n} (-1)^r \binom{n}{r} = 0$;
(c) $\sum_{r=0}^{n} \binom{n}{r} (a-1)^r = a^n$.
15. Repeatedly applying Theorem 11, show that
$\binom{n}{r} = \sum_{i=1}^{r+1} \binom{n-i}{r-i+1}$

16. Use Theorem 12 to show that
$\sum_{r=0}^{n} \binom{n}{r}^2 = \binom{2n}{n}$

17. Show that $\sum_{r=0}^{n} r\binom{n}{r} = n \cdot 2^{n-1}$ by setting x = 1 in Theorem 9, then differentiating the expressions on both sides with respect to y, and finally substituting y = 1.

18. Rework Exercise 17 by making use of part (a) of Exercise 14 and part (c) of Exercise 13.

19. If n is not a positive integer or zero, the binomial expansion of (1 + y)^n yields, for −1 < y < 1, the infinite series
$1 + \binom{n}{1} y + \binom{n}{2} y^2 + \binom{n}{3} y^3 + \cdots + \binom{n}{r} y^r + \cdots$
where $\binom{n}{r} = \frac{n(n-1)\cdots(n-r+1)}{r!}$ for r = 1, 2, 3, .... Use this generalized definition of binomial coefficients to evaluate
(a) $\binom{1/2}{4}$ and $\binom{-3}{3}$;
(b) $\sqrt{5}$, writing $\sqrt{5} = 2(1 + \frac{1}{4})^{1/2}$ and using the first four terms of the binomial expansion of $(1 + \frac{1}{4})^{1/2}$.

20. With reference to the generalized definition of binomial coefficients in Exercise 19, show that
(a) $\binom{-1}{r} = (-1)^r$;
(b) $\binom{-n}{r} = (-1)^r \binom{n+r-1}{r}$ for n > 0.

21. Find the coefficient of $x^2 y^3 z^3$ in the expansion of $(x + y + z)^8$.

22. Find the coefficient of $x^3 y^2 z^3 w$ in the expansion of $(2x + 3y - 4z + w)^9$.

23. Show that
$\binom{n}{n_1, n_2, \ldots, n_k} = \binom{n-1}{n_1-1, n_2, \ldots, n_k} + \binom{n-1}{n_1, n_2-1, \ldots, n_k} + \cdots + \binom{n-1}{n_1, n_2, \ldots, n_k-1}$
by expressing all these multinomial coefficients in terms of factorials and simplifying algebraically.

4 The Theory in Practice

Applications of the preceding theory of combinatorial methods and binomial coefficients are quite straightforward, and a variety of them have been given in Sections 2 and 3. The following examples illustrate further applications of this theory.

EXAMPLE 24
An assembler of electronic equipment has 20 integrated-circuit chips on her table, and she must solder three of them as part of a larger component. In how many ways can she choose the three chips for assembly?

Solution
Using Theorem 4, we obtain the result
$_{20}P_3 = \frac{20!}{17!} = 20 \cdot 19 \cdot 18 = 6{,}840$
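Example 24's count is immediate on a machine as well. A one-line check (ours, in Python, not part of the text):

```python
from math import factorial, perm

# Example 24: ordered choice of 3 of 20 chips, by Theorem 4.
assert factorial(20) // factorial(17) == perm(20, 3) == 20 * 19 * 18 == 6_840
```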
EXAMPLE 25
A lot of manufactured goods, presented for sampling inspection, contains 16 units. In how many ways can 4 of the 16 units be selected for inspection?

Solution
According to Theorem 7, this can be done in
$\binom{16}{4} = \frac{16!}{4! \cdot 12!} = \frac{16 \cdot 15 \cdot 14 \cdot 13}{4 \cdot 3 \cdot 2 \cdot 1} = 1{,}820$
ways.

Applied Exercises SECS. 1–4

24. A thermostat will call for heat 0, 1, or 2 times a night. Construct a tree diagram to show that there are 10 different ways that it can turn on the furnace for a total of 6 times over 4 nights.

25. On August 31 there are five wild-card teams in the American League that can make it to the play-offs, but only two will win spots. Draw a tree diagram which shows the various possible play-off wild-card teams.

26. There are four routes, A, B, C, and D, between a person's home and the place where he works, but route B is one-way, so he cannot take it on the way to work, and route C is one-way, so he cannot take it on the way home.
(a) Draw a tree diagram showing the various ways the person can go to and from work.
(b) Draw a tree diagram showing the various ways he can go to and from work without taking the same route both ways.

27. A person with $2 in her pocket bets $1, even money, on the flip of a coin, and she continues to bet $1 as long as she has any money. Draw a tree diagram to show the various things that can happen during the first four flips of the coin. After the fourth flip of the coin, in how many of the cases will she be
(a) exactly even;
(b) exactly $2 ahead?

28. The pro at a golf course stocks two identical sets of women's clubs, reordering at the end of each day (for delivery early the next morning) if and only if he has sold them both. Construct a tree diagram to show that if he starts on a Monday with two sets of the clubs, there are altogether eight different ways in which he can make sales on the first two days of that week.

29. Counting the number of outcomes in games of chance has been a popular pastime for many centuries. This was of interest not only because of the gambling that was involved, but also because the outcomes of games of chance were often interpreted as divine intent. Thus, it was just about a thousand years ago that a bishop in what is now Belgium determined that there are 56 different ways in which three dice can fall provided one is interested only in the overall result and not in which die does what. He assigned a virtue to each of these possibilities and each sinner had to concentrate for some time on the virtue that corresponded to his cast of the dice.
(a) Find the number of ways in which three dice can all come up with the same number of points.
(b) Find the number of ways in which two of the three dice can come up with the same number of points, while the third comes up with a different number of points.
(c) Find the number of ways in which all three of the dice can come up with a different number of points.
(d) Use the results of parts (a), (b), and (c) to verify the bishop's calculations that there are altogether 56 possibilities.

30. In a primary election, there are four candidates for mayor, five candidates for city treasurer, and two candidates for county attorney.
(a) In how many ways can a voter mark his ballot for all three of these offices?
(b) In how many ways can a person vote if he exercises his option of not voting for a candidate for any or all of these offices?

31. Suppose that in a baseball World Series (in which the winner is the first team to win four games) the National League champion leads the American League champion three games to two. Construct a tree diagram to show the number of ways in which these teams may win or lose the remaining game or games.

32. If the NCAA has applications from six universities for hosting its intercollegiate tennis championships in two consecutive years, in how many ways can they select the hosts for these championships
(a) if they are not both to be held at the same university;
(b) if they may both be held at the same university?

33. The five finalists in the Miss Universe contest are Miss Argentina, Miss Belgium, Miss U.S.A., Miss Japan, and Miss Norway. In how many ways can the judges choose
(a) the winner and the first runner-up;
(b) the winner, the first runner-up, and the second runner-up?
34. How many permutations are there of the letters in the word
(a) "great";
(b) "greet"?

35. Determine the number of ways in which a distributor can choose 2 of 15 warehouses to ship a large order.

36. In how many ways can five persons line up to get on a bus? In how many ways can they line up if two of the persons refuse to follow each other?

37. A carton of 15 light bulbs contains one that is defective. In how many ways can an inspector choose 3 of the bulbs and
(a) get the one that is defective;
(b) not get the one that is defective?

38. In how many ways can a television director schedule a sponsor's six different commercials during the six time slots allocated to commercials during a two-hour program?

39. In how many ways can the television director of Exercise 38 fill the six time slots for commercials if there are three different sponsors and the commercial for each is to be shown twice?

40. The price of a European tour includes four stopovers to be selected from among 10 cities. In how many different ways can one plan such a tour
(a) if the order of the stopovers matters;
(b) if the order of the stopovers does not matter?

41. In how many ways can eight persons form a circle for a folk dance?

42. A college team plays 10 football games during a season. In how many ways can it end the season with five wins, four losses, and one tie?

43. How many distinct permutations are there of the letters in the word "statistics"? How many of these begin and end with the letter s?

44. A shipment of 10 television sets includes three that are defective. In how many ways can a hotel purchase four of these sets and receive at least two of the defective sets?

45. If eight persons are having dinner together, in how many different ways can three order chicken, four order steak, and one order lobster?

46. In Example 4 we showed that a true–false test consisting of 20 questions can be marked in 1,048,576 different ways. In how many ways can each question be marked true or false so that
(a) 7 are right and 13 are wrong;
(b) 10 are right and 10 are wrong;
(c) at least 17 are right?

47. Among the seven nominees for two vacancies on a city council are three men and four women. In how many ways can these vacancies be filled
(a) with any two of the seven nominees;
(b) with any two of the four women;
(c) with one of the men and one of the women?

48. A multiple-choice test consists of 15 questions, each permitting a choice of three alternatives. In how many different ways can a student check off her answers to these questions?

49. Ms. Jones has four skirts, seven blouses, and three sweaters. In how many ways can she choose two of the skirts, three of the blouses, and one of the sweaters to take along on a trip?

50. How many different bridge hands are possible containing five spades, three diamonds, three clubs, and two hearts?

51. Find the number of ways in which one A, three B's, two C's, and one F can be distributed among seven students taking a course in statistics.

52. An art collector, who owns 10 paintings by famous artists, is preparing her will. In how many different ways can she leave these paintings to her three heirs?

53. A baseball fan has a pair of tickets for six different home games of the Chicago Cubs. If he has five friends who like baseball, in how many different ways can he take one of them along to each of the six games?

54. At the end of the day, a bakery gives everything that is unsold to food banks for the needy. If it has 12 apple pies left at the end of a given day, in how many different ways can it distribute these pies among six food banks for the needy?

55. With reference to Exercise 54, in how many different ways can the bakery distribute the 12 apple pies if each of the six food banks is to receive at least one pie?

56. On a Friday morning, the pro shop of a tennis club has 14 identical cans of tennis balls. If they are all sold by Sunday night and we are interested only in how many were sold on each day, in how many different ways could the tennis balls have been sold on Friday, Saturday, and Sunday?

57. Rework Exercise 56 given that at least two of the cans of tennis balls were sold on each of the three days.
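Several of the counts asked for above can be spot-checked computationally. This sketch is ours, not the book's; it is written in Python and assumes the exercise numbering used above:

```python
from math import comb, factorial

# Exercise 43: permutations of the letters in "statistics" (3 s's, 3 t's, 2 i's).
assert factorial(10) // (factorial(3) * factorial(3) * factorial(2)) == 50_400
# ...and those beginning and ending with s (arrange the middle 8 letters).
assert factorial(8) // (factorial(3) * factorial(2)) == 3_360

# Exercise 45: three chicken, four steak, one lobster among eight diners.
assert factorial(8) // (factorial(3) * factorial(4) * factorial(1)) == 280

# Exercise 47: two vacancies from seven nominees (three men, four women).
assert comb(7, 2) == 21 and comb(4, 2) == 6 and comb(3, 1) * comb(4, 1) == 12
```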
Inc. New York: John Wiley & Sons. 47 (a) 21. Applied Combinatorics. 37 (a) 91. Inc. (b) 2.   41 5040.. M. T.. 1962. 27 (a) 5. S.. and 70. 6. Sons. 35. M... M.: York: Hafner Publishing Co. 27. New History of Statistics and Probability. An Introduction to Combinatorial Analysis. W. David.J. Vol.. r+n−1 9 and 21.. L. Combinatorial Mass. Darien. 53 15.. (b) 60.. 5 (b) 6. 1986. 1959. D.. 21 560... 5th ed. 55 462. Contributions to the History of Statis. II.. N. Upper Saddle History of Statistics and Probability. F. E. and BARTON. eds. 35. New York: Hafner Publishing Co. Mathematics of Choice. 1965. W. (c) 12. Pearson. 33 (a) 20.. 1986. 15. r 45 280. 1968. Studies in the Roberts. New York: John Wiley & Sons.. J. 21. 1984.. Kendall. 1929. ory. King & Son. G. 7. 19 (a) −15 384 and −10. R. N.400 and 3360. 7. Answers to Odd-Numbered Exercises  n 35 (a) 105. 1964. The History of Statistics. J. Cambridge. M. 11 Seventh row: 1.: Prentice Hall. 1. New Walker. F. D.. (b) 30. Choice and Chance. Inc. M. 20 . 6. (b) 6. Inc. Inc. An Introduction to Probability Theory and Its Westergaard.. 3rd ed.230. E. H. 51 420.. Inc. H. S. Inc.. (c) 20. ed.: Princeton University Press. Eighth row: 1. (b) 364.... Chance. M. Porter. New York: River. Niven. N. Elementary Combinatorial Analysis. (d) 56. Macmillan Publishing Co. 1970. More advanced treatments may be found in Beckenbach. Basic Techniques of Combinatorial The- New York: John Wiley & Sons. New York: John Wiley & tics. and Kendall. Cohen. A. 43 50. F. London: P..: Harvard University Press. the oldest way of defining probabilities. Marylees Miller. if the weather bureau From Chapter 2 of John E. the probability of drawing 4 an ace is 52 = 13 1 .” then n the probability of a “success” is given by the ratio N . Copyright © 2014 by Pearson Education. Eighth Edition. according to which the probability of an event (outcome or happening) is the proportion of the time that events of the same kind will occur in the long run. This would be the case.) Although equally likely possibilities are found mostly in games of chance. All rights reserved. Inc. Freund’s Mathematical Statistics with Applications. most widely held is the frequency inter- pretation.Probability 1 Introduction 6 Conditional Probability 2 Sample Spaces 7 Independent Events 3 Events 8 Bayes’ Theorem 4 The Probability of an Event 9 The Theory in Practice 5 Some Rules of Probability 1 Introduction Historically. or if we are concerned with a person’s recovery from a disease. Irwin Miller. the classical probability concept applies also in a great variety of situations where gam- bling devices are used to make random selections—when office space is assigned to teaching assistants by lot. EXAMPLE 1 What is the probability of drawing an ace from an ordinary deck of 52 playing cards? Solution Since there are n = 4 aces among the N = 52 cards. 21 . of which one must occur and n are regarded as favorable. when machine parts are chosen for inspection so that each part produced has the same chance of being selected. Among the various probability concepts. that each card has the same chance of being drawn. Similarly. we mean (in accordance with the frequency interpretation) that such flights arrive on time 84 percent of the time.84 that a jet from Los Angeles to San Francisco will arrive on time. (It is assumed. when some of the families in a township are chosen in such a way that each one has the same chance of being included in a sample study. if we are con- cerned with the question whether it will rain on a given day. 
If we say that the probability is 0. for instance. if we are concerned with the outcome of an election. and so forth. A major shortcoming of the classical probability concept is its limited applica- bility. of course. applies when all possible outcomes are equally likely. as is presumably the case in most games of chance. for there are many situations in which the possibilities that arise cannot all be regarded as equally likely. or as a “success. We can then say that if there are N equally likely possibili- ties. the classical probability con- cept. the sample space for the possible outcomes of one flip of a coin may be written S = {H. we write S = {2k + 1|k = 0.} How we formulate the sample space for a given situation will depend on the problem at hand. If an experiment consists of one roll of a die and we are interested in which face is turned up. Each outcome in a sample space is called an element of the sample space. T} where H and T stand for head and tail. and sample space. for instance. can be used in applications as long as it is consistent with these rules. or simply a sample point. let us explain first what we mean here by event and by the related terms experiment. The set of all possible outcomes of an experiment is called the sample space and it is usually denoted by the letter S. The results one obtains from an experi- ment. 2.30). we say that an event has a probability of. If a sample space has a finite number of elements. or it may consist of the very complicated process of determining the mass of an electron.90. . or interpretations. out- come. SAMPLE SPACE. Then. whether they are instrument readings. we would use the sample space 22 . Sample spaces with a large or infinite number of elements are best described by a statement or rule. we should find that the proportion of “successes” is very close to 0. 2 Sample Spaces Since all probabilities pertain to the occurrence or nonoccurrence of events. in the same sense in which we might say that our car will start in cold weather 90 percent of the time. counts. an experiment may consist of the simple pro- cess of checking whether a switch is turned on or off.90. any one of the preceding probability concepts. . In this sense. 1.” Similarly. if S is the set of odd positive integers. More generally. or values obtained through extensive calculations. the sample space may be written S = {x|x is an automobile with a satellite radio} This is read “S is the set of all x such that x is an automobile with a satellite radio. are called the outcomes of the experiment. in which probabilities are defined as “mathematical objects” that behave according to certain well-defined rules. It is customary in statistics to refer to any process of observation or measure- ment as an experiment. The approach to probability that we shall use in this chapter is the axiomatic approach. Probability predicts that there is a 30 percent chance for rain (that is. We cannot guarantee what will happen on any particular occasion—the car may start and then it may not—but if we kept records over a long period of time. a probability of 0. we may list the elements in the usual set notation. say. for example. . if the possible outcomes of an experiment are the set of automobiles equipped with satellite radios. 0. DEFINITION 1. this means that under the same weather conditions it will rain 30 percent of the time. it may consist of counting the imperfections in a piece of cloth. “yes” or “no” answers. Probability S1 = {1. is given by S2 = {2. 
How we formulate the sample space for a given situation will depend on the problem at hand. If an experiment consists of one roll of a die and we are interested in which face is turned up, we would use the sample space
S1 = {1, 2, 3, 4, 5, 6}
However, if we are interested only in whether the face turned up is even or odd, we would use the sample space
S2 = {even, odd}
This demonstrates that different sample spaces may well be used to describe an experiment. In general, it is desirable to use sample spaces whose elements cannot be divided (partitioned or separated) into more primitive or more elementary kinds of outcomes. In other words, it is preferable that an element of a sample space not represent two or more outcomes that are distinguishable in some way. Thus, in the preceding illustration S1 would be preferable to S2.

EXAMPLE 2
Describe a sample space that might be appropriate for an experiment in which we roll a pair of dice, one red and one green. (The different colors are used to emphasize that the dice are distinct from one another.)

Solution
The sample space that provides the most information consists of the 36 points given by
S1 = {(x, y) | x = 1, 2, ..., 6; y = 1, 2, ..., 6}
where x represents the number turned up by the red die and y represents the number turned up by the green die. A second sample space, adequate for most purposes (though less desirable in general as it provides less information), is given by
S2 = {2, 3, 4, ..., 12}
where the elements are the totals of the numbers turned up by the two dice.

Sample spaces are usually classified according to the number of elements that they contain. In the preceding example the sample spaces S1 and S2 contained a finite number of elements, but if a coin is flipped until a head appears for the first time, this could happen on the first flip, the second flip, the third flip, the fourth flip, and so forth, and there are infinitely many possibilities. For this experiment we obtain the sample space
S = {H, TH, TTH, TTTH, TTTTH, ...}
with an unending sequence of elements. But even here the number of elements can be matched one-to-one with the whole numbers, and in this sense the sample space is said to be countable. If a sample space contains a finite number of elements or an infinite though countable number of elements, it is said to be discrete.

The outcomes of some experiments are neither finite nor countably infinite. Such is the case, for example, when one conducts an investigation to determine the distance that a certain make of car will travel over a prescribed test course on 5 liters of gasoline. If we assume that distance is a variable that can be measured to any desired degree of accuracy, there is an infinity of possibilities (distances) that cannot be matched one-to-one with the whole numbers. Also, if we want to measure the amount of time it takes for two chemicals to react, the amounts making up the sample space are infinite in number and not countable.
Thus, sample spaces need not be discrete. If a sample space consists of a continuum, such as all the points of a line segment or all the points in a plane, it is said to be continuous. Continuous sample spaces arise in practice whenever the outcomes of experiments are measurements of physical properties, such as temperature, pressure, speed, and length, that are measured on continuous scales.

3 Events

In many problems we are interested in results that are not given directly by a specific element of a sample space.

EXAMPLE 3
With reference to the sample space S1 = {1, 2, 3, 4, 5, 6} for one roll of a die, describe the event A that the number of points rolled with the die is divisible by 3.

Solution
Among 1, 2, 3, 4, 5, and 6, only 3 and 6 are divisible by 3. Therefore, A is represented by the subset {3, 6} of the sample space S1.

EXAMPLE 4
With reference to the sample space S1 of Example 2, describe the event B that the total number of points rolled with the pair of dice is 7.

Solution
Among the 36 possibilities, only (1, 6), (2, 5), (3, 4), (4, 3), (5, 2), and (6, 1) yield a total of 7. So, we write
B = {(1, 6), (2, 5), (3, 4), (4, 3), (5, 2), (6, 1)}
Note that in Figure 1 the event of rolling a total of 7 with the two dice is represented by the set of points inside the region bounded by the dotted line.

[Figure 1. Rolling a total of 7 with a pair of dice: the 36 points of S1 are plotted with the red die on the horizontal axis and the green die on the vertical axis; the six points totaling 7 lie inside a dotted boundary.]

In the same way, any event (outcome or result) can be identified with a collection of points, which constitute a subset of an appropriate sample space. Such a subset consists of all the elements of the sample space for which the event occurs, and in probability and statistics we identify the subset with the event.

EXAMPLE 5
If someone takes three shots at a target and we care only whether each shot is a hit or a miss, describe a suitable sample space, the elements of the sample space that constitute event M that the person will miss the target three times in a row, and the elements of event N that the person will hit the target once and miss it twice.

Solution
If we let 0 and 1 represent a miss and a hit, respectively, the eight possibilities (0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 0), (1, 0, 1), (0, 1, 1), and (1, 1, 1) may be displayed as in Figure 2. Thus, it can be seen that
M = {(0, 0, 0)}
and
N = {(1, 0, 0), (0, 1, 0), (0, 0, 1)}

[Figure 2. Sample space for Example 5: the eight points are plotted in three dimensions along axes labeled First shot, Second shot, and Third shot.]

EXAMPLE 6
Construct a sample space for the length of the useful life of a certain electronic component and indicate the subset that represents the event F that the component fails before the end of the sixth year.

Solution
If t is the length of the component's useful life in years, the sample space may be written S = {t | t ≥ 0}, and the subset F = {t | 0 ≤ t < 6} is the event that the component fails before the end of the sixth year.
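Representing events as subsets is equally direct in code. A sketch of ours (Python, not part of the text) rebuilds the dice sample space of Example 2 and the event of Example 4:

```python
from itertools import product

# Sample space S1 of Example 2: all 36 (red, green) outcomes.
S1 = set(product(range(1, 7), repeat=2))
assert len(S1) == 36

# Event B of Example 4: the total rolled is 7 -- a subset of S1.
B = {point for point in S1 if sum(point) == 7}
assert B == {(1, 6), (2, 5), (3, 4), (4, 3), (5, 2), (6, 1)}
```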
Sample spaces and events, particularly relationships among events, are often depicted by means of Venn diagrams, in which the sample space is represented by a rectangle, while events are represented by regions within the rectangle, usually by circles or parts of circles. For instance, the shaded regions of the four Venn diagrams of Figure 3 represent, respectively, event A, the complement of event A, the union of events A and B, and the intersection of events A and B.

[Figure 3: Venn diagrams.]

When we are dealing with three events, we usually draw the circles as in Figure 4. Here, the regions are numbered 1 through 8 for easy reference.

[Figure 4: Venn diagram.]

To indicate special relationships among events, we sometimes draw diagrams like those of Figure 5. Here, the one on the left serves to indicate that events A and B are mutually exclusive.

DEFINITION 3. MUTUALLY EXCLUSIVE EVENTS. Two events having no elements in common are said to be mutually exclusive.

When A and B are mutually exclusive, we write A ∩ B = ∅, where ∅ denotes the empty set, which has no elements at all. The diagram on the right serves to indicate that A is contained in B, and symbolically we express this by writing A ⊂ B.

[Figure 5: Diagrams showing special relationships among events.]

Exercises

1. Use Venn diagrams to verify that
(a) (A ∪ B) ∪ C is the same event as A ∪ (B ∪ C);
(b) A ∩ (B ∪ C) is the same event as (A ∩ B) ∪ (A ∩ C);
(c) A ∪ (B ∩ C) is the same event as (A ∪ B) ∩ (A ∪ C).

2. Use Venn diagrams to verify the two De Morgan laws:
(a) (A ∩ B)′ = A′ ∪ B′;
(b) (A ∪ B)′ = A′ ∩ B′.

3. Use Venn diagrams to verify that
(a) (A ∩ B) ∪ (A ∩ B′) = A;
(b) (A ∩ B) ∪ (A ∩ B′) ∪ (A′ ∩ B) = A ∪ B;
(c) A ∪ (A′ ∩ B) = A ∪ B.

4. Use Venn diagrams to verify that if A is contained in B, then A ∩ B = A and A ∩ B′ = ∅.
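For readers who want a quick numerical sanity check of Exercise 2, the first De Morgan law can be tested on any small discrete sample space. The sketch below (illustrative only, with complements taken relative to S) does exactly that:

# Checking the first De Morgan law of Exercise 2 on a small sample space.
S = set(range(1, 11))
A = {1, 2, 3, 4}
B = {3, 4, 5, 6}

lhs = S - (A & B)           # (A ∩ B)′
rhs = (S - A) | (S - B)     # A′ ∪ B′
assert lhs == rhs           # the two events are identical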
4 The Probability of an Event

To formulate the postulates of probability, we shall follow the practice of denoting events by means of capital letters, and we shall write the probability of event A as P(A), the probability of event B as P(B), and so forth. The following postulates of probability apply only to discrete sample spaces, S.

POSTULATE 1. The probability of an event is a nonnegative real number; that is, P(A) ≥ 0 for any subset A of S.

POSTULATE 2. P(S) = 1.

POSTULATE 3. If A1, A2, A3, ... is a finite or infinite sequence of mutually exclusive events of S, then

P(A1 ∪ A2 ∪ A3 ∪ ···) = P(A1) + P(A2) + P(A3) + ···

Postulates per se require no proof, but if the resulting theory is to be applied, we must show that the postulates are satisfied when we give probabilities a "real" meaning. Let us illustrate this in connection with the frequency interpretation; the relationship between the postulates and the classical probability concept will be discussed below, while the relationship between the postulates and subjective probabilities is left for the reader to examine in Exercises 16 and 82.

Since proportions are always positive or zero, the first postulate is in complete agreement with the frequency interpretation. The second postulate states indirectly that certainty is identified with a probability of 1; after all, it is always assumed that one of the possibilities in S must occur, and it is to this certain event that we assign a probability of 1. As far as the frequency interpretation is concerned, a probability of 1 implies that the event in question will occur 100 percent of the time or, in other words, that it is certain to occur.

Taking the third postulate in the simplest case, that is, for two mutually exclusive events A1 and A2, it can easily be seen that it is satisfied by the frequency interpretation. If one event occurs, say, 28 percent of the time, another event occurs 39 percent of the time, and the two events cannot both occur at the same time (that is, they are mutually exclusive), then one or the other will occur 28 + 39 = 67 percent of the time. Thus, the third postulate is satisfied, and the same kind of argument applies when there are more than two mutually exclusive events.

Before we study some of the immediate consequences of the postulates of probability, let us emphasize the point that the three postulates do not tell us how to assign probabilities to events; they merely restrict the ways in which it can be done.

EXAMPLE 7

An experiment has four possible outcomes, A, B, C, and D, that are mutually exclusive. Explain why the following assignments of probabilities are not permissible:
(a) P(A) = 0.12, P(B) = 0.63, P(C) = 0.45, P(D) = −0.20;
(b) P(A) = 9/120, P(B) = 45/120, P(C) = 27/120, P(D) = 46/120.

Solution

(a) P(D) = −0.20 violates Postulate 1.
(b) P(S) = P(A ∪ B ∪ C ∪ D) = 9/120 + 45/120 + 27/120 + 46/120 = 127/120 ≠ 1, and this violates Postulate 2.
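The two inadmissible assignments of Example 7 can be screened automatically. The following illustrative Python sketch tests an assignment of probabilities to mutually exclusive outcomes against Postulates 1 and 2 (the helper name permissible is ours):

from fractions import Fraction

# Postulate 1: every probability nonnegative; Postulate 2: they sum to 1.
def permissible(probs):
    return all(p >= 0 for p in probs) and sum(probs) == 1

print(permissible([0.12, 0.63, 0.45, -0.20]))                    # False: violates Postulate 1
print(permissible([Fraction(n, 120) for n in (9, 45, 27, 46)]))  # False: the sum is 127/120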
Of course, in actual practice probabilities are assigned on the basis of past experience, on the basis of a careful analysis of all underlying conditions, on the basis of subjective judgments, or on the basis of assumptions, sometimes the assumption that all possible outcomes are equiprobable.

To assign a probability measure to a sample space, it is not necessary to specify the probability of each possible subset. This is fortunate, for a sample space with as few as 20 possible outcomes has already 2^20 = 1,048,576 subsets, and the number of subsets grows very rapidly when there are 50 possible outcomes, 100 possible outcomes, or more. Instead of listing the probabilities of all possible subsets, we often list the probabilities of the individual outcomes, or sample points of S, and then make use of the following theorem.

THEOREM 1. If A is an event in a discrete sample space S, then P(A) equals the sum of the probabilities of the individual outcomes comprising A.

Proof. Let O1, O2, O3, ..., be the finite or infinite sequence of outcomes that comprise the event A. Thus,

A = O1 ∪ O2 ∪ O3 ∪ ···

and since the individual outcomes, the O's, are mutually exclusive, the third postulate of probability yields

P(A) = P(O1) + P(O2) + P(O3) + ···

This completes the proof.

To use this theorem, we must be able to assign probabilities to the individual outcomes of experiments. How this is done in some special situations is illustrated by the following examples.

EXAMPLE 8

If we twice flip a balanced coin, what is the probability of getting at least one head?

Solution

The sample space is S = {HH, HT, TH, TT}, where H and T denote head and tail. Since we assume that the coin is balanced, these outcomes are equally likely and we assign to each sample point the probability 1/4. Letting A denote the event that we will get at least one head, we get A = {HH, HT, TH} and

P(A) = P(HH) + P(HT) + P(TH) = 1/4 + 1/4 + 1/4 = 3/4

EXAMPLE 9

A die is loaded in such a way that each odd number is twice as likely to occur as each even number. Find P(G), where G is the event that a number greater than 3 occurs on a single roll of the die.

Solution

The sample space is S = {1, 2, 3, 4, 5, 6}. Hence, if we assign probability w to each even number and probability 2w to each odd number, we find that

2w + w + 2w + w + 2w + w = 9w = 1

in accordance with Postulate 2. It follows that w = 1/9 and

P(G) = 1/9 + 2/9 + 1/9 = 4/9

If a sample space is countably infinite, probabilities will have to be assigned to the individual outcomes by means of a mathematical rule, preferably by means of a formula or equation.

EXAMPLE 10

If, for a given experiment, O1, O2, O3, ..., is an infinite sequence of outcomes, verify that

P(Oi) = (1/2)^i for i = 1, 2, 3, ...

is, indeed, a probability measure.

Solution

Since the probabilities are all positive, it remains to be shown that P(S) = 1. Getting

P(S) = 1/2 + 1/4 + 1/8 + 1/16 + ···

and making use of the formula for the sum of the terms of an infinite geometric progression, we find that

P(S) = (1/2)/(1 − 1/2) = 1

In connection with the preceding example, the word "sum" in Theorem 1 will have to be interpreted so that it includes the value of an infinite series. The probability measure of Example 10 would be appropriate, for example, if Oi is the event that a person flipping a balanced coin will get a tail for the first time on the ith flip of the coin. Thus, the probability that the first tail will come on the third, fourth, or fifth flip of the coin is

(1/2)^3 + (1/2)^4 + (1/2)^5 = 7/32

and the probability that the first tail will come on an odd-numbered flip of the coin is

(1/2) + (1/2)^3 + (1/2)^5 + ··· = (1/2)/(1 − 1/4) = 2/3

Here again we made use of the formula for the sum of the terms of an infinite geometric progression.

If an experiment is such that we can assume equal probabilities for all the sample points, as was the case in Example 8, we can take advantage of the following special case of Theorem 1.

THEOREM 2. If an experiment can result in any one of N different equally likely outcomes, and if n of these outcomes together constitute event A, then the probability of event A is

P(A) = n/N

Proof. Let O1, O2, ..., ON represent the individual outcomes in S, each with probability 1/N. If A is the union of n of these mutually exclusive outcomes, and it does not matter which ones, then

P(A) = P(O1 ∪ O2 ∪ ··· ∪ On) = P(O1) + P(O2) + ··· + P(On) = 1/N + 1/N + ··· + 1/N (n terms) = n/N

Observe that the formula P(A) = n/N of Theorem 2 is identical with the one for the classical probability concept (see below). Indeed, what we have shown here is that the classical probability concept is consistent with the postulates of probability; it follows from the postulates in the special case where the individual outcomes are all equiprobable.

EXAMPLE 11

A five-card poker hand dealt from a deck of 52 playing cards is said to be a full house if it consists of three of a kind and a pair. If all the five-card hands are equally likely, what is the probability of being dealt a full house?

Solution

The number of ways in which we can be dealt a particular full house, say three kings and two aces, is (4 choose 3)(4 choose 2). Since there are 13 ways of selecting the face value for the three of a kind and for each of these there are 12 ways of selecting the face value for the pair, there are altogether

n = 13 · 12 · (4 choose 3)(4 choose 2)

different full houses.
Also, the total number of equally likely five-card poker hands is

N = (52 choose 5)

and it follows by Theorem 2 that the probability of getting a full house is

P(A) = n/N = [13 · 12 · (4 choose 3)(4 choose 2)] / (52 choose 5) = 0.0014

5 Some Rules of Probability

Based on the three postulates of probability, we can derive many other rules that have important applications. Among them, the next four theorems are immediate consequences of the postulates.

THEOREM 3. If A and A′ are complementary events in a sample space S, then

P(A′) = 1 − P(A)

Proof. In the second and third steps of the proof that follows, we make use of the definition of a complement, according to which A and A′ are mutually exclusive and A ∪ A′ = S. Thus, we write

1 = P(S) (by Postulate 2)
  = P(A ∪ A′)
  = P(A) + P(A′) (by Postulate 3)

and it follows that P(A′) = 1 − P(A).

In connection with the frequency interpretation, this result implies that if an event occurs, say, 37 percent of the time, then it does not occur 63 percent of the time.

THEOREM 4. P(∅) = 0 for any sample space S.

Proof. Since S and ∅ are mutually exclusive and S ∪ ∅ = S in accordance with the definition of the empty set ∅, it follows that

P(S) = P(S ∪ ∅) = P(S) + P(∅) (by Postulate 3)

and, hence, that P(∅) = 0.

It is important to note that it does not necessarily follow from P(A) = 0 that A = ∅. In practice, we often assign 0 probability to events that, in colloquial terms, would not happen in a million years. Among them, there is the classical example that we assign a probability of 0 to the event that a monkey set loose on a typewriter will type Plato's Republic word for word without a mistake. The fact that P(A) = 0 does not imply that A = ∅ is of relevance, especially, in the continuous case.

THEOREM 5. If A and B are events in a sample space S and A ⊂ B, then P(A) ≤ P(B).

Proof. Since A ⊂ B, we can write

B = A ∪ (A′ ∩ B)

as can easily be verified by means of a Venn diagram. Then, since A and A′ ∩ B are mutually exclusive, we get

P(B) = P(A) + P(A′ ∩ B) (by Postulate 3)
     ≥ P(A) (by Postulate 1)

In words, this theorem states that if A is a subset of B, then P(A) cannot be greater than P(B). For instance, the probability of drawing a heart from an ordinary deck of 52 playing cards cannot be greater than the probability of drawing a red card. Indeed, the probability of drawing a heart is 1/4, compared with 1/2, the probability of drawing a red card.

THEOREM 6. 0 ≤ P(A) ≤ 1 for any event A.

Proof. Using Theorem 5 and the fact that ∅ ⊂ A ⊂ S for any event A in S, we have

P(∅) ≤ P(A) ≤ P(S)

Then, P(∅) = 0 and P(S) = 1 leads to the result that 0 ≤ P(A) ≤ 1.

The third postulate of probability is sometimes referred to as the special addition rule; it is special in the sense that events A1, A2, A3, ..., must all be mutually exclusive. For any two events A and B, there exists the general addition rule, or the inclusion–exclusion principle:

THEOREM 7. If A and B are any two events in a sample space S, then

P(A ∪ B) = P(A) + P(B) − P(A ∩ B)

Proof. Assigning the probabilities a, b, and c to the mutually exclusive events A ∩ B, A ∩ B′, and A′ ∩ B, as in the Venn diagram of Figure 6, we find that

P(A ∪ B) = a + b + c = (a + b) + (c + a) − a = P(A) + P(B) − P(A ∩ B)

[Figure 6 (a Venn diagram for two events A and B, with probability b assigned to A ∩ B′, a to A ∩ B, and c to A′ ∩ B): Venn diagram for proof of Theorem 7.]
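Returning to Example 11, the counting argument is easy to verify by machine; the illustrative sketch below reproduces n, N, and the probability 0.0014 with exact binomial coefficients:

from math import comb

# The counting argument of Example 11: a full house.
n = 13 * 12 * comb(4, 3) * comb(4, 2)   # face values and suits
N = comb(52, 5)                         # all five-card hands
print(n, N, n / N)                      # 3744 2598960 0.00144...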
EXAMPLE 12

In a large metropolitan area, the probabilities are 0.86, 0.35, and 0.29, respectively, that a family (randomly chosen for a sample survey) owns a color television set, a HDTV set, or both kinds of sets. What is the probability that a family owns either or both kinds of sets?

Solution

If A is the event that a family in this metropolitan area owns a color television set and B is the event that it owns a HDTV set, we have P(A) = 0.86, P(B) = 0.35, and P(A ∩ B) = 0.29; substitution into the formula of Theorem 7 yields

P(A ∪ B) = 0.86 + 0.35 − 0.29 = 0.92

EXAMPLE 13

Near a certain exit of I-17, the probabilities are 0.23 and 0.24, respectively, that a truck stopped at a roadblock will have faulty brakes or badly worn tires. Also, the probability is 0.38 that a truck stopped at the roadblock will have faulty brakes and/or badly worn tires. What is the probability that a truck stopped at this roadblock will have faulty brakes as well as badly worn tires?

Solution

If B is the event that a truck stopped at the roadblock will have faulty brakes and T is the event that it will have badly worn tires, we have P(B) = 0.23, P(T) = 0.24, and P(B ∪ T) = 0.38; substitution into the formula of Theorem 7 yields

0.38 = 0.23 + 0.24 − P(B ∩ T)

Solving for P(B ∩ T), we thus get

P(B ∩ T) = 0.23 + 0.24 − 0.38 = 0.09

Repeatedly using the formula of Theorem 7, we can generalize this addition rule so that it will apply to any number of events. For instance, for three events we obtain the following theorem.

THEOREM 8. If A, B, and C are any three events in a sample space S, then

P(A ∪ B ∪ C) = P(A) + P(B) + P(C) − P(A ∩ B) − P(A ∩ C) − P(B ∩ C) + P(A ∩ B ∩ C)

Proof. Writing A ∪ B ∪ C as A ∪ (B ∪ C) and using the formula of Theorem 7 twice, once for P[A ∪ (B ∪ C)] and once for P(B ∪ C), we get

P(A ∪ B ∪ C) = P[A ∪ (B ∪ C)]
= P(A) + P(B ∪ C) − P[A ∩ (B ∪ C)]
= P(A) + P(B) + P(C) − P(B ∩ C) − P[A ∩ (B ∪ C)]

Then, using the distributive law that the reader was asked to verify in part (b) of Exercise 1, we find that

P[A ∩ (B ∪ C)] = P[(A ∩ B) ∪ (A ∩ C)]
= P(A ∩ B) + P(A ∩ C) − P[(A ∩ B) ∩ (A ∩ C)]
= P(A ∩ B) + P(A ∩ C) − P(A ∩ B ∩ C)

and hence that

P(A ∪ B ∪ C) = P(A) + P(B) + P(C) − P(A ∩ B) − P(A ∩ C) − P(B ∩ C) + P(A ∩ B ∩ C)

(In Exercise 12 the reader will be asked to give an alternative proof of this theorem, based on the method used in the text to prove Theorem 7.)

EXAMPLE 14

If a person visits his dentist, suppose that the probability that he will have his teeth cleaned is 0.44, the probability that he will have a cavity filled is 0.24, the probability that he will have a tooth extracted is 0.21, the probability that he will have his teeth cleaned and a cavity filled is 0.08, the probability that he will have his teeth cleaned and a tooth extracted is 0.11, the probability that he will have a cavity filled and a tooth extracted is 0.07, and the probability that he will have his teeth cleaned, a cavity filled, and a tooth extracted is 0.03. What is the probability that a person visiting his dentist will have at least one of these things done to him?

Solution

If C is the event that the person will have his teeth cleaned, F is the event that he will have a cavity filled, and E is the event that he will have a tooth extracted, we are given P(C) = 0.44, P(F) = 0.24, P(E) = 0.21, P(C ∩ F) = 0.08, P(C ∩ E) = 0.11, P(F ∩ E) = 0.07, and P(C ∩ F ∩ E) = 0.03, and substitution into the formula of Theorem 8 yields

P(C ∪ F ∪ E) = 0.44 + 0.24 + 0.21 − 0.08 − 0.11 − 0.07 + 0.03 = 0.66
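As a check on the arithmetic of Example 14, the formula of Theorem 8 can be evaluated directly (an illustrative sketch; the variable names are ours):

# Inclusion-exclusion (Theorem 8) for the probabilities of Example 14.
pC, pF, pE = 0.44, 0.24, 0.21
pCF, pCE, pFE, pCFE = 0.08, 0.11, 0.07, 0.03
print(round(pC + pF + pE - pCF - pCE - pFE + pCFE, 2))   # 0.66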
Exercises

5. Use parts (a) and (b) of Exercise 3 to show that
(a) P(A) ≥ P(A ∩ B);
(b) P(A) ≤ P(A ∪ B).

6. Referring to Figure 6, verify that

P(A ∩ B′) = P(A) − P(A ∩ B)

7. Referring to Figure 6 and letting P(A′ ∩ B′) = d, verify that

P(A′ ∩ B′) = 1 − P(A) − P(B) + P(A ∩ B)

8. The event that "A or B but not both" will occur can be written as

(A ∩ B′) ∪ (A′ ∩ B)

Express the probability of this event in terms of P(A), P(B), and P(A ∩ B).

9. Use the formula of Theorem 7 to show that
(a) P(A ∩ B) ≤ P(A) + P(B);
(b) P(A ∩ B) ≥ P(A) + P(B) − 1.

10. Use the Venn diagram of Figure 7, with the probabilities a, b, c, d, e, f, and g assigned to A ∩ B ∩ C, ..., and A′ ∩ B ∩ C, to show that if P(A) = P(B) = P(C) = 1, then P(A ∩ B ∩ C) = 1. [Hint: Start with the argument that since P(A) = 1, it follows that e = c = f = 0.]

11. Prove by induction that

P(E1 ∪ E2 ∪ ··· ∪ En) ≤ P(E1) + P(E2) + ··· + P(En)

for any finite sequence of events E1, E2, ..., and En.

12. Give an alternative proof of Theorem 7 by making use of the relationships A ∪ B = A ∪ (A′ ∩ B) and B = (A ∩ B) ∪ (A′ ∩ B).

13. Use the Venn diagram of Figure 7 and the method by which we proved Theorem 7 to prove Theorem 8.

14. Duplicate the method of proof used in Exercise 12 to show that

P(A ∪ B ∪ C ∪ D) = P(A) + P(B) + P(C) + P(D) − P(A ∩ B) − P(A ∩ C) − P(A ∩ D) − P(B ∩ C) − P(B ∩ D) − P(C ∩ D) + P(A ∩ B ∩ C) + P(A ∩ B ∩ D) + P(A ∩ C ∩ D) + P(B ∩ C ∩ D) − P(A ∩ B ∩ C ∩ D)

(Hint: With reference to the Venn diagram of Figure 7, divide each of the eight regions into two parts, designating one to be inside D and the other outside D and letting a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, and p be the probabilities associated with the resulting 16 regions.)

[Figure 7 (a Venn diagram for three events A, B, and C, with the probabilities a, b, c, d, e, f, and g marked in its regions): Venn diagram for Exercises 10, 13, and 14.]

15. The odds that an event will occur are given by the ratio of the probability that the event will occur to the probability that it will not occur, provided neither probability is zero. Odds are usually quoted in terms of positive integers having no common factor. Show that if the odds are A to B that an event will occur, its probability is

p = A/(A + B)

16. Subjective probabilities may be determined by exposing persons to risk-taking situations and finding the odds at which they would consider it fair to bet on the outcome. The odds are then converted into probabilities by means of the formula of Exercise 15. For instance, if a person feels that 3 to 2 are fair odds that a business venture will succeed (or that it would be fair to bet $30 against $20 that it will succeed), the probability is 3/(3 + 2) = 0.6 that the business venture will succeed. Show that if subjective probabilities are determined in this way, they satisfy
(a) Postulate 1;
(b) Postulate 2.
See also Exercise 82.
6 Conditional Probability

Difficulties can easily arise when probabilities are quoted without specification of the sample space. For instance, if we ask for the probability that a lawyer makes more than $75,000 per year, we may well get several different answers, and they may all be correct. One of them might apply to all those who are engaged in the private practice of law, another might apply to lawyers employed by corporations, and so forth. Since the choice of the sample space (that is, the set of all possibilities under consideration) is by no means always self-evident, it often helps to use the symbol P(A|S) to denote the conditional probability of event A relative to the sample space S or, as we also call it, "the probability of A given S." The symbol P(A|S) makes it explicit that we are referring to a particular sample space S, and it is preferable to the abbreviated notation P(A) unless the tacit choice of S is clearly understood. It is also preferable when we want to refer to several sample spaces in the same example. If A is the event that a person makes more than $75,000 per year, G is the event that a person is a law school graduate, L is the event that a person is licensed to practice law, and E is the event that a person is actively engaged in the practice of law, then P(A|G) is the probability that a law school graduate makes more than $75,000 per year, P(A|L) is the probability that a person licensed to practice law makes more than $75,000 per year, and P(A|E) is the probability that a person actively engaged in the practice of law makes more than $75,000 per year.

Some ideas connected with conditional probabilities are illustrated in the following example.

EXAMPLE 15

A consumer research organization has studied the services under warranty provided by the 50 new-car dealers in a certain city, and its findings are summarized in the following table.

                                   Good service       Poor service
                                   under warranty     under warranty
In business 10 years or more            16                  4
In business less than 10 years          10                 20

If a person randomly selects one of these new-car dealers, what is the probability that he gets one who provides good service under warranty? Also, if a person randomly selects one of the dealers who has been in business for 10 years or more, what is the probability that he gets one who provides good service under warranty?

Solution

By "randomly" we mean that, in each case, all possible selections are equally likely, and we can therefore use the formula of Theorem 2. If we let G denote the selection of a dealer who provides good service under warranty, and if we let n(G) denote the number of elements in G and n(S) the number of elements in the whole sample space, we get

P(G) = n(G)/n(S) = (16 + 10)/50 = 0.52

This answers the first question.
For the second question, we limit ourselves to the reduced sample space, which consists of the first line of the table, that is, the 16 + 4 = 20 dealers who have been in business 10 years or more. Of these, 16 provide good service under warranty, and we get

P(G|T) = 16/20 = 0.80

where T denotes the selection of a dealer who has been in business 10 years or more. This answers the second question and, as should have been expected, P(G|T) is considerably higher than P(G).

Since the numerator of P(G|T) is n(T ∩ G) = 16 in the preceding example, the number of dealers who have been in business for 10 years or more and provide good service under warranty, and the denominator is n(T), the number of dealers who have been in business 10 years or more, we can write symbolically

P(G|T) = n(T ∩ G)/n(T)

Then, if we divide the numerator and the denominator by n(S), the total number of new-car dealers in the given city, we get

P(G|T) = [n(T ∩ G)/n(S)] / [n(T)/n(S)] = P(T ∩ G)/P(T)

and we have, thus, expressed the conditional probability P(G|T) in terms of two probabilities defined for the whole sample space S.

Generalizing from the preceding, let us now make the following definition of conditional probability.

DEFINITION 4. CONDITIONAL PROBABILITY. If A and B are any two events in a sample space S and P(A) ≠ 0, the conditional probability of B given A is

P(B|A) = P(A ∩ B)/P(A)

EXAMPLE 16

With reference to Example 15, what is the probability that one of the dealers who has been in business less than 10 years will provide good service under warranty?

Solution

Since P(T′ ∩ G) = 10/50 = 0.20 and P(T′) = (10 + 20)/50 = 0.60, substitution into the formula yields

P(G|T′) = P(T′ ∩ G)/P(T′) = 0.20/0.60 = 1/3

Although we introduced the formula for P(B|A) by means of an example in which the possibilities were all equally likely, this is not a requirement for its use.

EXAMPLE 17

With reference to the loaded die of Example 9, what is the probability that the number of points rolled is a perfect square? Also, what is the probability that it is a perfect square given that it is greater than 3?

Solution

If A is the event that the number of points rolled is greater than 3 and B is the event that it is a perfect square, we have A = {4, 5, 6}, B = {1, 4}, and A ∩ B = {4}. Since the probabilities of rolling a 1, 2, 3, 4, 5, or 6 with the die are 2/9, 1/9, 2/9, 1/9, 2/9, and 1/9, we find that the answer to the first question is

P(B) = 2/9 + 1/9 = 1/3

To determine P(B|A), we first calculate

P(A ∩ B) = 1/9 and P(A) = 1/9 + 2/9 + 1/9 = 4/9

Then, substituting into the formula of Definition 4, we get

P(B|A) = P(A ∩ B)/P(A) = (1/9)/(4/9) = 1/4

To verify that the formula of Definition 4 has yielded the "right" answer in the preceding example, we have only to assign probability v to the two even numbers in the reduced sample space A and probability 2v to the odd number, such that the sum of the three probabilities is equal to 1. We thus have v + 2v + v = 1, hence v = 1/4, and, therefore, P(B|A) = 1/4 as before.

EXAMPLE 18

A manufacturer of airplane parts knows from past experience that the probability is 0.80 that an order will be ready for shipment on time, and it is 0.72 that an order will be ready for shipment on time and will also be delivered on time. What is the probability that such an order will be delivered on time given that it was ready for shipment on time?

Solution

If we let R stand for the event that an order is ready for shipment on time and D be the event that it is delivered on time, we have P(R) = 0.80 and P(R ∩ D) = 0.72, and it follows that

P(D|R) = P(R ∩ D)/P(R) = 0.72/0.80 = 0.90

Thus, 90 percent of the shipments will be delivered on time provided they are shipped on time. Note that P(R|D), the probability that a shipment that is delivered on time was also ready for shipment on time, cannot be determined without further information; for this purpose we would also have to know P(D).
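The conditional probabilities of Examples 15 and 16 all come from simple counts, which makes them convenient to verify mechanically. The following illustrative Python sketch (the dictionary keys are ours) works directly from the dealer table:

# Conditional probabilities from the counts of Example 15.
# T = in business 10 years or more, G = good service under warranty.
n = {("T", "G"): 16, ("T", "G'"): 4, ("T'", "G"): 10, ("T'", "G'"): 20}
total = sum(n.values())                                             # 50 dealers

P_G = (n[("T", "G")] + n[("T'", "G")]) / total                      # 0.52
P_G_given_T = n[("T", "G")] / (n[("T", "G")] + n[("T", "G'")])      # 0.80
P_G_given_Tc = n[("T'", "G")] / (n[("T'", "G")] + n[("T'", "G'")])  # 1/3
print(P_G, P_G_given_T, round(P_G_given_Tc, 4))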
If we multiply the expressions on both sides of the formula of Definition 4 by P(A), we obtain the following multiplication rule.

THEOREM 9. If A and B are any two events in a sample space S and P(A) ≠ 0, then

P(A ∩ B) = P(A) · P(B|A)

In words, the probability that A and B will both occur is the product of the probability of A and the conditional probability of B given A. Alternatively, if P(B) ≠ 0, the probability that A and B will both occur is the product of the probability of B and the conditional probability of A given B; symbolically,

P(A ∩ B) = P(B) · P(A|B)

To derive this alternative multiplication rule, we interchange A and B in the formula of Theorem 9 and make use of the fact that A ∩ B = B ∩ A.

EXAMPLE 19

If we randomly pick two television sets in succession from a shipment of 240 television sets of which 15 are defective, what is the probability that they will both be defective?

Solution

If we assume equal probabilities for each selection (which is what we mean by "randomly" picking the sets), the probability that the first set will be defective is 15/240, and the probability that the second set will be defective given that the first set is defective is 14/239. Thus, the probability that both sets will be defective is (15/240)(14/239) = 7/1,912. This assumes that we are sampling without replacement, that is, that the first set is not replaced before the second set is selected.

EXAMPLE 20

Find the probabilities of randomly drawing two aces in succession from an ordinary deck of 52 playing cards if we sample
(a) without replacement;
(b) with replacement.

Solution

(a) If the first card is not replaced before the second card is drawn, the probability of getting two aces in succession is

(4/52)(3/51) = 1/221

(b) If the first card is replaced before the second card is drawn, the corresponding probability is

(4/52)(4/52) = 1/169

In the situations described in the two preceding examples there is a definite temporal order between the two events A and B. In general, this need not be the case when we write P(A|B) or P(B|A). For instance, we could ask for the probability that the first card drawn was an ace given that the second card drawn (without replacement) is an ace; the answer would also be 3/51.

Theorem 9 can easily be generalized so that it applies to more than two events; for instance, for three events we have the following theorem.

THEOREM 10. If A, B, and C are any three events in a sample space S such that P(A ∩ B) ≠ 0, then

P(A ∩ B ∩ C) = P(A) · P(B|A) · P(C|A ∩ B)

Proof. Writing A ∩ B ∩ C as (A ∩ B) ∩ C and using the formula of Theorem 9 twice, we get

P(A ∩ B ∩ C) = P[(A ∩ B) ∩ C]
= P(A ∩ B) · P(C|A ∩ B)
= P(A) · P(B|A) · P(C|A ∩ B)

EXAMPLE 21

A box of fuses contains 20 fuses, of which 5 are defective. If 3 of the fuses are selected at random and removed from the box in succession without replacement, what is the probability that all 3 fuses are defective?

Solution

If A is the event that the first fuse is defective, B is the event that the second fuse is defective, and C is the event that the third fuse is defective, then P(A) = 5/20, P(B|A) = 4/19, and P(C|A ∩ B) = 3/18, and substitution into the formula yields

P(A ∩ B ∩ C) = (5/20)(4/19)(3/18) = 1/114

Further generalization of Theorems 9 and 10 to k events is straightforward, and the resulting formula can be proved by mathematical induction.
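Example 20 is a convenient check on the multiplication rule; using exact rational arithmetic avoids any rounding (an illustrative sketch):

from fractions import Fraction

# Two aces in succession (Example 20), by the multiplication rule.
without_repl = Fraction(4, 52) * Fraction(3, 51)   # 1/221
with_repl    = Fraction(4, 52) * Fraction(4, 52)   # 1/169
print(without_repl, with_repl)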
7 Independent Events

Informally speaking, two events A and B are independent if the occurrence or nonoccurrence of either one does not affect the probability of the occurrence of the other. For instance, in the preceding example the selections would all have been independent had each fuse been replaced before the next one was selected; the probability of getting a defective fuse would have remained 5/20. Symbolically, two events A and B are independent if P(B|A) = P(B) and P(A|B) = P(A), and it can be shown that either of these equalities implies the other when both of the conditional probabilities exist, that is, when neither P(A) nor P(B) equals zero (see Exercise 21).

Now, if we substitute P(B) for P(B|A) into the formula of Theorem 9, we get

P(A ∩ B) = P(A) · P(B|A) = P(A) · P(B)

and we shall use this as our formal definition of independence.

DEFINITION 5. INDEPENDENCE. Two events A and B are independent if and only if

P(A ∩ B) = P(A) · P(B)

If two events are not independent, they are said to be dependent. In the derivation of the formula of Definition 5, we assume that P(B|A) exists and, hence, that P(A) ≠ 0. For mathematical convenience, we shall let the definition apply also when P(A) = 0 and/or P(B) = 0. Reversing the steps, we can also show that Definition 5 implies the definition of independence that we gave earlier.

EXAMPLE 22

A coin is tossed three times and the eight possible outcomes, HHH, HHT, HTH, THH, HTT, THT, TTH, and TTT, are assumed to be equally likely. If A is the event that a head occurs on each of the first two tosses, B is the event that a tail occurs on the third toss, and C is the event that exactly two tails occur in the three tosses, show that
(a) events A and B are independent;
(b) events B and C are dependent.

Solution

Since

A = {HHH, HHT}
B = {HHT, HTT, THT, TTT}
C = {HTT, THT, TTH}
A ∩ B = {HHT}
B ∩ C = {HTT, THT}

the assumption that the eight possible outcomes are all equiprobable yields P(A) = 1/4, P(B) = 1/2, P(C) = 3/8, P(A ∩ B) = 1/8, and P(B ∩ C) = 1/4.

(a) Since P(A) · P(B) = (1/4)(1/2) = 1/8 = P(A ∩ B), events A and B are independent.
(b) Since P(B) · P(C) = (1/2)(3/8) = 3/16 ≠ P(B ∩ C), events B and C are not independent.

In connection with Definition 5, it can be shown that if A and B are independent, then so are A and B′, A′ and B, and A′ and B′. For instance, consider the following theorem.

THEOREM 11. If A and B are independent, then A and B′ are also independent.

Proof. Since A = (A ∩ B) ∪ (A ∩ B′), as the reader was asked to show in part (a) of Exercise 3, A ∩ B and A ∩ B′ are mutually exclusive, and A and B are independent by assumption, we have

P(A) = P[(A ∩ B) ∪ (A ∩ B′)]
= P(A ∩ B) + P(A ∩ B′)
= P(A) · P(B) + P(A ∩ B′)

It follows that

P(A ∩ B′) = P(A) − P(A) · P(B) = P(A) · [1 − P(B)] = P(A) · P(B′)

and hence that A and B′ are independent.

In Exercises 22 and 23 the reader will be asked to show that if A and B are independent, then A′ and B are independent and so are A′ and B′, and that if A and B are dependent, then A and B′ are dependent.

To extend the concept of independence to more than two events, let us make the following definition.

DEFINITION 6. INDEPENDENCE OF MORE THAN TWO EVENTS. Events A1, A2, ..., and Ak are independent if and only if the probability of the intersection of any 2, 3, ..., or k of these events equals the product of their respective probabilities.

For three events A, B, and C, for example, independence requires that

P(A ∩ B) = P(A) · P(B)
P(A ∩ C) = P(A) · P(C)
P(B ∩ C) = P(B) · P(C)
and

P(A ∩ B ∩ C) = P(A) · P(B) · P(C)

It is of interest to note that three or more events can be pairwise independent without being independent.

EXAMPLE 23

Figure 8 shows a Venn diagram with probabilities assigned to its various regions. Verify that A and B are independent, A and C are independent, and B and C are independent, but A, B, and C are not independent.

Solution

As can be seen from the diagram, P(A) = P(B) = P(C) = 1/2, P(A ∩ B) = P(A ∩ C) = P(B ∩ C) = 1/4, and P(A ∩ B ∩ C) = 1/4. Thus,

P(A) · P(B) = 1/4 = P(A ∩ B)
P(A) · P(C) = 1/4 = P(A ∩ C)
P(B) · P(C) = 1/4 = P(B ∩ C)

but

P(A) · P(B) · P(C) = 1/8 ≠ P(A ∩ B ∩ C)

[Figure 8 (a Venn diagram for three events A, B, and C in which four regions each carry probability 1/4): Venn diagram for Example 23.]

Incidentally, the preceding example can be given a "real" interpretation by considering a large room that has three separate switches controlling the ceiling lights. These lights will be on when all three switches are "up" and hence also when one of the switches is "up" and the other two are "down." If A is the event that the first switch is "up," B is the event that the second switch is "up," and C is the event that the third switch is "up," the Venn diagram of Figure 8 shows a possible set of probabilities associated with the switches being "up" or "down" when the ceiling lights are on.

It can also happen that P(A ∩ B ∩ C) = P(A) · P(B) · P(C) without A, B, and C being pairwise independent; this the reader will be asked to verify in Exercise 24.

Of course, if we are given that certain events are independent, the probability that they will all occur is simply the product of their respective probabilities.

EXAMPLE 24

Find the probabilities of getting
(a) three heads in three random tosses of a balanced coin;
(b) four sixes and then another number in five random rolls of a balanced die.

Solution

(a) The probability of a head on each toss is 1/2 and the three outcomes are independent. Thus we can multiply, obtaining

(1/2)(1/2)(1/2) = 1/8

(b) The probability of a six on each toss is 1/6; thus the probability of tossing a number other than 6 is 5/6. Inasmuch as the tosses are independent, we can multiply the respective probabilities to obtain

(1/6)(1/6)(1/6)(1/6)(5/6) = 5/7,776
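The distinction drawn in Example 23, pairwise independence without independence, can be confirmed by direct computation from the four regional probabilities of Figure 8 (an illustrative sketch):

from fractions import Fraction

# Example 23: four regions of the Venn diagram each carry probability 1/4.
q = Fraction(1, 4)
P_A = P_B = P_C = q + q          # each event consists of two such regions
P_AB = P_AC = P_BC = q           # each pairwise intersection is A ∩ B ∩ C itself
P_ABC = q

print(P_AB == P_A * P_B)         # True: pairwise independent
print(P_ABC == P_A * P_B * P_C)  # False: 1/4 is not 1/8, so not independent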
8 Bayes' Theorem

In many situations the outcome of an experiment depends on what happens in various intermediate stages. The following is a simple example in which there is one intermediate stage consisting of two alternatives:

EXAMPLE 25

The completion of a construction job may be delayed because of a strike. The probabilities are 0.60 that there will be a strike, 0.85 that the construction job will be completed on time if there is no strike, and 0.35 that the construction job will be completed on time if there is a strike. What is the probability that the construction job will be completed on time?

Solution

If A is the event that the construction job will be completed on time and B is the event that there will be a strike, we are given P(B) = 0.60, P(A|B′) = 0.85, and P(A|B) = 0.35. Making use of the formula of part (a) of Exercise 3, the fact that A ∩ B and A ∩ B′ are mutually exclusive, and the alternative form of the multiplication rule, we can write

P(A) = P[(A ∩ B) ∪ (A ∩ B′)]
= P(A ∩ B) + P(A ∩ B′)
= P(B) · P(A|B) + P(B′) · P(A|B′)

Then, substituting the given numerical values, we get

P(A) = (0.60)(0.35) + (1 − 0.60)(0.85) = 0.55

An immediate generalization of this kind of situation is the case where the intermediate stage permits k different alternatives (whose occurrence is denoted by B1, B2, ..., Bk). It requires the following theorem, sometimes called the rule of total probability or the rule of elimination.

THEOREM 12. If the events B1, B2, ..., and Bk constitute a partition of the sample space S and P(Bi) ≠ 0 for i = 1, 2, ..., k, then for any event A in S

P(A) = P(B1) · P(A|B1) + P(B2) · P(A|B2) + ··· + P(Bk) · P(A|Bk)

The B's constitute a partition of the sample space if they are pairwise mutually exclusive and if their union equals S. A formal proof of Theorem 12 consists, essentially, of the same steps we used in Example 25, and it is left to the reader in Exercise 32.

EXAMPLE 26

The members of a consulting firm rent cars from three rental agencies: 60 percent from agency 1, 30 percent from agency 2, and 10 percent from agency 3. If 9 percent of the cars from agency 1 need an oil change, 20 percent of the cars from agency 2 need an oil change, and 6 percent of the cars from agency 3 need an oil change, what is the probability that a rental car delivered to the firm will need an oil change?

Solution

If A is the event that the car needs an oil change, and B1, B2, and B3 are the events that the car comes from rental agencies 1, 2, or 3, we have P(B1) = 0.60, P(B2) = 0.30, P(B3) = 0.10, P(A|B1) = 0.09, P(A|B2) = 0.20, and P(A|B3) = 0.06. Substituting these values into the formula of Theorem 12, we get

P(A) = (0.60)(0.09) + (0.30)(0.20) + (0.10)(0.06) = 0.12

Thus, 12 percent of all the rental cars delivered to this firm will need an oil change.

With reference to the preceding example, suppose that we are interested in the following question: If a rental car delivered to the consulting firm needs an oil change, what is the probability that it came from rental agency 2? To answer questions of this kind, we need the following theorem, called Bayes' theorem:

THEOREM 13. If B1, B2, ..., Bk constitute a partition of the sample space S and P(Bi) ≠ 0 for i = 1, 2, ..., k, then for any event A in S such that P(A) ≠ 0

P(Br|A) = P(Br) · P(A|Br) / [P(B1) · P(A|B1) + ··· + P(Bk) · P(A|Bk)]

for r = 1, 2, ..., k.

In words, the probability that event A was reached via the rth branch of the tree diagram of Figure 9, given that it was reached via one of its k branches, is the ratio of the probability associated with the rth branch to the sum of the probabilities associated with all k branches of the tree.

Proof. Writing P(Br|A) = P(A ∩ Br)/P(A) in accordance with the definition of conditional probability, we have only to substitute P(Br) · P(A|Br) for P(A ∩ Br) and the formula of Theorem 12 for P(A).

[Figure 9 (a tree diagram whose k branches B1, B2, ..., Bk lead to A, the ith branch carrying the probability P(Bi) · P(A|Bi)): Tree diagram for Bayes' theorem.]

EXAMPLE 27

With reference to Example 26, if a rental car delivered to the consulting firm needs an oil change, what is the probability that it came from rental agency 2?

Solution

Substituting the probabilities on the previous page into the formula of Theorem 13, we get

P(B2|A) = (0.30)(0.20) / [(0.60)(0.09) + (0.30)(0.20) + (0.10)(0.06)] = 0.060/0.120 = 0.5

Observe that although only 30 percent of the cars delivered to the firm come from agency 2, 50 percent of those requiring an oil change come from that agency.
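Examples 26 and 27 together exercise both the rule of total probability and Bayes' theorem. The illustrative sketch below reproduces P(A) = 0.12 and P(B2|A) = 0.5 (the dictionary keys are ours):

# Theorems 12 and 13 applied to Examples 26 and 27.
prior = {1: 0.60, 2: 0.30, 3: 0.10}          # P(Bi): share of cars per agency
needs_oil = {1: 0.09, 2: 0.20, 3: 0.06}      # P(A|Bi)

P_A = sum(prior[i] * needs_oil[i] for i in prior)    # rule of total probability
posterior_2 = prior[2] * needs_oil[2] / P_A          # Bayes' theorem
print(round(P_A, 3), round(posterior_2, 3))          # 0.12 0.5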
EXAMPLE 28

A rare but serious disease, D, has been found in 0.01 percent of a certain population. A test has been developed that will be positive, p, for 98 percent of those who have the disease and be positive for only 3 percent of those who do not have the disease. Find the probability that a person tested as positive does not have the disease.

Solution

Let D and p represent the events that a person randomly selected from the given population has the disease and is found positive for the disease by the test, respectively; D′ and p′ then represent the events that such a person does not have the disease and is found negative for the disease by the test. Substituting the given probabilities into the formula of Theorem 13, we get

P(D′|p) = P(D′) · P(p|D′) / [P(D) · P(p|D) + P(D′) · P(p|D′)]
= (0.9999)(0.03) / [(0.0001)(0.98) + (0.9999)(0.03)]
= 0.997

This example demonstrates the near impossibility of finding a test for a rare disease that does not have an unacceptably high probability of false positives.

Although Bayes' theorem follows from the postulates of probability and the definition of conditional probability, it has been the subject of extensive controversy. There can be no question about the validity of Bayes' theorem, but considerable arguments have been raised about the assignment of the prior probabilities P(Bi). Also, a good deal of mysticism surrounds Bayes' theorem because it entails a "backward," or "inverse," sort of reasoning, that is, reasoning "from effect to cause." For instance, in Example 27, needing an oil change is the effect and coming from agency 2 is a possible cause.

Exercises

17. Show that the postulates of probability are satisfied by conditional probabilities; in other words, show that if P(B) ≠ 0, then
(a) P(A|B) ≥ 0;
(b) P(B|B) = 1;
(c) P(A1 ∪ A2 ∪ ··· |B) = P(A1|B) + P(A2|B) + ··· for any sequence of mutually exclusive events A1, A2, ....

18. If P(A|B) < P(A), prove that P(B|A) < P(B).

19. Duplicating the method of proof of Theorem 10, show that P(A ∩ B ∩ C ∩ D) = P(A) · P(B|A) · P(C|A ∩ B) · P(D|A ∩ B ∩ C) provided that P(A ∩ B ∩ C) ≠ 0.

20. Show by means of numerical examples that P(B|A) + P(B|A′)
(a) may be equal to 1;
(b) need not be equal to 1.

21. Show that if P(B|A) = P(B) and P(B) ≠ 0, then P(A|B) = P(A).

22. Show that if events A and B are independent, then
(a) events A′ and B are independent;
(b) events A′ and B′ are independent.

23. Show that if events A and B are dependent, then events A and B′ are dependent.

24. Refer to Figure 10 to show that P(A ∩ B ∩ C) = P(A) · P(B) · P(C) does not necessarily imply that A, B, and C are all pairwise independent.

25. Refer to Figure 10 to show that if A is independent of B and A is independent of C, then B is not necessarily independent of C.

26. Refer to Figure 10 to show that if A is independent of B and A is independent of C, then A is not necessarily independent of B ∪ C.

[Figure 10 (a Venn diagram for three events A, B, and C; the probabilities shown in its regions include 0.24, 0.06, 0.18, 0.14, and 0.02): Diagram for Exercises 24, 25, and 26.]

27. If events A, B, and C are independent, show that
(a) A and B ∩ C are independent;
(b) A and B ∪ C are independent.

28. Given three events A, B, and C such that P(A ∩ B ∩ C) ≠ 0 and P(C|A ∩ B) = P(C|B), show that P(A|B ∩ C) = P(A|B).

29. If A1, A2, ..., and An are independent events, prove that

P(A1 ∪ A2 ∪ ··· ∪ An) = 1 − {1 − P(A1)} · {1 − P(A2)} ··· {1 − P(An)}

30. For any event A, show that A and ∅ are independent.

31. Show that P(A ∪ B) ≥ 1 − P(A′) − P(B′) for any two events A and B defined in the sample space S.

32. Prove Theorem 12 by making use of the following generalization of the distributive law given in part (b) of Exercise 1:

A ∩ (B1 ∪ B2 ∪ ··· ∪ Bk) = (A ∩ B1) ∪ (A ∩ B2) ∪ ··· ∪ (A ∩ Bk)

33. Show that 2^k − k − 1 conditions must be satisfied for k events to be independent.

34. Suppose that a die has n sides numbered i = 1, 2, ..., n. Assume that the probability of it coming up on the side numbered i is the same for each value of i. The die is rolled n times (assume independence) and a "match" is defined to be the occurrence of side i on the ith roll. Prove that the probability of at least one match is given by

1 − ((n − 1)/n)^n = 1 − (1 − 1/n)^n
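Returning to Example 28, the striking 0.997 figure is worth recomputing; the illustrative sketch below carries out the substitution into Theorem 13 (the variable names are ours):

# The screening computation of Example 28.
P_D = 0.0001                 # prevalence: 0.01 percent of the population
P_pos_D = 0.98               # probability of a positive test given the disease
P_pos_not_D = 0.03           # probability of a positive test without the disease

P_pos = P_D * P_pos_D + (1 - P_D) * P_pos_not_D
print(round((1 - P_D) * P_pos_not_D / P_pos, 3))   # 0.997: most positives are false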
9 The Theory in Practice

The word "probability" is a part of everyday language, but it is difficult to define this word without using the word "probable" or its synonym "likely" in the definition. Webster's Third New International Dictionary defines "probability" as "the quality or state of being probable."* If the concept of probability is to be used in mathematics and scientific applications, we require a more exact, less circular, definition.

The postulates of probability given in Section 4 satisfy this criterion. Together with the rules given in Section 5, this definition lends itself to calculations of probabilities that "make sense" and that can be verified experimentally. The entire theory of statistics is based on the notion of probability. It seems remarkable that the entire structure of probability and, therefore of statistics, can be built on the relatively straightforward foundation given in this chapter.

Probabilities were first considered in games of chance, or gambling. Players of various games of chance observed that there seemed to be "rules" that governed the roll of dice or the results of spinning a roulette wheel. Some of them went as far as to postulate some of these rules entirely on the basis of experience. But differences arose among gamblers about probabilities, and they brought their questions to the noted mathematicians of their day. With this motivation, the modern theory of probability began to be developed.

Motivated by problems associated with games of chance, the theory of probability first was developed under the assumption of equal likelihood, expressed in Theorem 2. Under this assumption one only had to count the number of "successful" outcomes and divide by the total number of "possible" outcomes to arrive at the probability of an event.

The assumption of equal likelihood fails when we attempt, for example, to find the probability that a trifecta at the race track will pay off. Here, the different horses have different probabilities of winning, and we are forced to rely on a different method of evaluating probabilities. It is common to take into account the various horses' records in previous races, calculating each horse's probability of winning by dividing its number of wins by the number of starts.

* From MERRIAM-WEBSTER'S COLLEGIATE DICTIONARY, ELEVENTH EDITION. Copyright © 2012 by Merriam-Webster, Incorporated (www.Merriam-Webster.com). Reprinted with permission.
This idea gives rise to the frequency interpretation of probabilities, which interprets the probability of an event to be the proportion of times the event has occurred in a long series of repeated experiments.

Application of the frequency interpretation requires a well-documented history of the outcomes of an event over a large number of experimental trials. For example, the probability that a lot of manufactured items will have at most three defectives is estimated to be 0.90 if, in 90 percent of many previous lots produced to the same specifications by the same process, the number of defectives was three or less. In the absence of such a history, a series of experiments can be planned and their results observed. As another example, the probability that the major stock market indexes will go up in a given future period of time cannot be estimated very well using the frequency interpretation because economic and world conditions rarely replicate themselves very closely.

A more recently employed method of calculating probabilities is called the subjective method. Here, a personal or subjective assessment is made of the probability of an event which is difficult or impossible to estimate in any other way. For example, juries use this method when determining the guilt or innocence of a defendant "beyond a reasonable doubt." Subjective probabilities should be used only when all other methods fail, and then only with a high level of skepticism.

An important application of probability theory relates to the theory of reliability. The reliability of a component or system can be defined as follows.

DEFINITION 7. RELIABILITY. The reliability of a product is the probability that it will function within specified limits for a specified period of time under specified environmental conditions.

Thus, the reliability of a "standard equipment" automobile tire is close to 1 for 10,000 miles of operation on a passenger car traveling within the speed limits on paved roads, but it is close to zero for even short distances at the Indianapolis "500."

The reliability of a system of components can be calculated from the reliabilities of the individual components if the system consists entirely of components connected in series, or in parallel, or both. A series system is one in which all components are so interrelated that the entire system will fail if any one (or more) of its components fails. An example of a series system is a string of Christmas lights connected electrically "in series." If one bulb fails, the entire string will fail to light. We shall assume that the components connected in a series system are independent; that is, the performance of one part does not affect the reliability of the others. Under this assumption, we have the following theorem.

THEOREM 14. The reliability of a series system consisting of n independent components is given by

Rs = R1 · R2 ··· Rn

where Ri is the reliability of the ith component.

Proof. The proof follows immediately by iterating in Definition 5.

Theorem 14 vividly demonstrates the effect of increased complexity on reliability. For example, if a series system has 5 components, each with a reliability of 0.970, the reliability of the entire system is only (0.970)^5 = 0.859. If the system complexity were increased so it now has 10 such components, the reliability would be reduced to (0.970)^10 = 0.738.

One way to improve the reliability of a series system is to introduce parallel redundancy by replacing some or all of its components by several components connected in parallel. A parallel system will fail to function only if all of its components fail. For example, if the hydraulic system on a commercial aircraft that lowers the landing wheels fails, they may be lowered manually with a crank. Parallel systems are sometimes called "redundant" systems.

If a system consists of n independent components connected in parallel, the probability of failure of the ith component is Fi = 1 − Ri, called the "unreliability" of the component. Again applying Definition 5, we obtain the following theorem.

THEOREM 15. The reliability of a parallel system consisting of n independent components is given by

Rp = 1 − (1 − R1)(1 − R2) ··· (1 − Rn)

Proof. The proof of this theorem is identical to that of Theorem 14, with (1 − Ri) replacing Ri.

EXAMPLE 29

Consider the system diagramed in Figure 11, which consists of eight components having the reliabilities shown in the figure. Find the reliability of the system.

[Figure 11 (components A (0.99) and B (0.95) in series, followed by the parallel group C, D, E (each 0.70), the parallel group F, G (each 0.75), and then H (0.90)): Combination of series and parallel systems.]

Solution

The parallel subsystem C, D, E can be replaced by an equivalent component, C′, having the reliability 1 − (1 − 0.70)^3 = 0.973. Likewise, F, G can be replaced by F′ having the reliability 1 − (1 − 0.75)^2 = 0.9375. Thus, the system is reduced to the series system A, B, C′, F′, H having the reliability

(0.99)(0.95)(0.973)(0.9375)(0.90) = 0.772

Applied Exercises
SECS. 1–3

35.
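Example 29's reduction of a series-parallel network can be expressed compactly in code; the illustrative sketch below applies Theorems 14 and 15 (the helper names are ours):

from math import prod

# Theorems 14 and 15 applied to the system of Example 29.
def series(*rs):   return prod(rs)
def parallel(*rs): return 1 - prod(1 - r for r in rs)

C_eq = parallel(0.70, 0.70, 0.70)        # C, D, E  -> 0.973
F_eq = parallel(0.75, 0.75)              # F, G     -> 0.9375
print(round(series(0.99, 0.95, C_eq, F_eq, 0.90), 3))   # 0.772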
Probability

Applied Exercises

SECS. 1–3

35. If S = {1, 2, 3, 4, 5, 6, 7, 8, 9}, A = {1, 3, 5, 7}, B = {6, 7, 8, 9}, C = {2, 4, 8}, and D = {1, 5, 9}, list the elements of the subsets of S corresponding to the following events: (a) A′ ∩ B; (b) (A′ ∩ B) ∩ C; (c) B′ ∪ C; (d) (B′ ∪ C) ∩ D; (e) A′ ∩ C; (f) (A′ ∩ C) ∩ D.

36. An electronics firm plans to build a research laboratory in Southern California, and its management has to decide between sites in Los Angeles, San Diego, Long Beach, Pasadena, Santa Barbara, Anaheim, Santa Monica, and Westwood. If A represents the event that they will choose a site in San Diego or Santa Barbara, B represents the event that they will choose a site in San Diego or Long Beach, C represents the event that they will choose a site in Santa Barbara or Anaheim, and D represents the event that they will choose a site in Los Angeles or Santa Barbara, list the elements of each of the following subsets of the sample space, which consists of the eight site selections: (a) A′; (b) D′; (c) C ∩ D; (d) B ∩ C; (e) C ∪ D.

37. Among the eight cars that a dealer has in his showroom, Car 1 is new and has air-conditioning, power steering, and bucket seats; Car 2 is one year old and has air-conditioning, but neither power steering nor bucket seats; Car 3 is two years old and has air-conditioning and power steering, but no bucket seats; Car 4 is three years old and has air-conditioning, but no power steering and no bucket seats; Car 5 is new and has no air-conditioning, no power steering, and no bucket seats; Car 6 is one year old and has power steering, but neither air-conditioning nor bucket seats; Car 7 is two years old and has no air-conditioning, no power steering, and no bucket seats; and Car 8 is three years old and has no air-conditioning, but has power steering as well as bucket seats. If a customer buys one of these cars and the event that he chooses a new car, for example, is represented by the set {Car 1, Car 5}, indicate similarly the sets that represent the events that (a) he chooses a car without air-conditioning; (b) he chooses a car without power steering; (c) he chooses a car with bucket seats; (d) he chooses a car that is either two or three years old.

38. With reference to Exercise 37, state in words what kind of car the customer will choose if his choice is given by (a) the complement of the set of part (a); (b) the union of the sets of parts (b) and (c); (c) the intersection of the sets of parts (b) and (c) of that exercise.

39. If Ms. Brown buys one of the houses advertised for sale in a Seattle newspaper (on a given Sunday), T is the event that the house has three or more baths, U is the event that it has a fireplace, V is the event that it costs more than $200,000, and W is the event that it is new, describe (in words) each of the following events: (a) T′; (b) U′; (c) V′; (d) W′; (e) T ∩ U; (f) T ∩ V; (g) U′ ∩ V; (h) V ∪ W; (i) V′ ∪ W; (j) T ∪ U; (k) T ∪ V; (l) V ∩ W.

40. A resort hotel has two station wagons, which it uses to shuttle its guests to and from the airport. If the larger of the two station wagons can carry five passengers and the smaller can carry four passengers, and if (4, 2), for example, represents the event that at a given moment the larger station wagon has four passengers while the smaller one has two passengers, draw a figure showing the 30 points of the corresponding sample space. Also, if E stands for the event that at least one of the station wagons is empty, F stands for the event that together they carry two, four, or six passengers, and G stands for the event that each carries the same number of passengers, list the points of the sample space that correspond to each of the following events: (a) E; (b) F; (c) G; (d) E ∪ F; (e) E ∩ F; (f) F ∪ G; (g) E ∩ G′; (h) F′ ∩ E′.

41. A coin is tossed once. If it comes up heads, a die is tossed once; if it comes up tails, the coin is tossed twice more. Using the notation in which (H, 2), for example, denotes the event that the coin comes up heads and then the die comes up 2, and (T, T, H) denotes the event that the coin comes up tails, then tails, then heads, list (a) the 10 elements of the sample space S; (b) the elements of S corresponding to the event A that exactly one head occurs; (c) the elements of S corresponding to the event B that at least two tails occur or a number greater than 4 occurs.

42. An electronic game contains three components arranged in the series–parallel circuit shown in Figure 12. At any given time, each component may or may not be operative, and the game will operate only if there is a continuous circuit from P to Q. Let A be the event that the game will operate; let B be the event that the game will operate though component x is not operative; and let C be the event that the game will operate though component y is not operative. Using the notation in which (0, 0, 1), for example, denotes that component z is operative but components x and y are not, (a) list the elements of the sample space S and also the elements of S corresponding to events A, B, and C; (b) determine which pairs of events, A and B, A and C, or B and C, are mutually exclusive.

Figure 12. Diagram for Exercise 42.

43. An experiment consists of rolling a die until a 3 appears. Describe the sample space and determine (a) how many elements of the sample space correspond to the event that the 3 appears on the kth roll of the die; (b) how many elements of the sample space correspond to the event that the 3 appears not later than the kth roll of the die.

44. Express symbolically the sample space S that consists of all the points (x, y) on or in the circle of radius 3 centered at the point (2, −3).

45. If S = {x|0 < x < 10}, M = {x|3 < x ≤ 8}, and N = {x|5 < x < 10}, find (a) M ∪ N; (b) M ∩ N; (c) M ∩ N′; (d) M′ ∪ N.

46. In Figure 13, L is the event that a driver has liability insurance and C is the event that she has collision insurance. Express in words what events are represented by regions 1, 2, 3, and 4 of the Venn diagram.

Figure 13. Venn diagram for Exercise 46.

47. With reference to Exercise 46 and Figure 13, what events are represented by (a) regions 1 and 2 together; (b) regions 2 and 4 together; (c) regions 2 and 3 together; (d) regions 2, 3, and 4 together?

48. In Figure 14, E, T, and N are the events that a car brought to a garage needs an engine overhaul, transmission repairs, or new tires. Express in words the events represented by (a) region 1; (b) region 3; (c) region 7; (d) regions 1 and 4 together; (e) regions 2 and 5 together; (f) regions 3, 5, 6, and 8 together.

Figure 14. Venn diagram for Exercise 48.

49. With reference to Exercise 48 and Figure 14, what region or combinations of regions represent the events that a car brought to the garage needs (a) transmission repairs, but neither an engine overhaul nor new tires; (b) an engine overhaul and transmission repairs; (c) transmission repairs or new tires, but not an engine overhaul; (d) new tires?

50. A market research organization claims that, among 500 shoppers interviewed, 308 regularly buy Product X, 266 regularly buy Product Y, 103 regularly buy both, and 59 buy neither on a regular basis. Using a Venn diagram and filling in the number of shoppers associated with the various regions, check whether the results of this study should be questioned.

51. In a group of 200 college students, 138 are enrolled in a course in psychology, 115 are enrolled in a course in sociology, and 91 are enrolled in both. How many of these students are not enrolled in either course? (Hint: Draw a suitable Venn diagram and fill in the numbers associated with the various regions.)

52. Among 120 visitors to Disneyland, 74 stayed for at least 3 hours, 86 spent at least $20, 64 went on the Matterhorn ride, 60 stayed for at least 3 hours and spent at least $20, 52 stayed for at least 3 hours and went on the Matterhorn ride, 54 spent at least $20 and went on the Matterhorn ride, and 48 stayed for at least 3 hours, spent at least $20, and went on the Matterhorn ride. Drawing a Venn diagram with three circles (like that of Figure 4) and filling in the numbers associated with the various regions, find how many of the 120 visitors to Disneyland (a) stayed for at least 3 hours, spent at least $20, but did not go on the Matterhorn ride; (b) went on the Matterhorn ride, but stayed less than 3 hours and spent less than $20; (c) stayed less than 3 hours, spent at least $20, but did not go on the Matterhorn ride.

SECS. 4–5

53. An experiment has five possible outcomes, A, B, C, D, and E, that are mutually exclusive. Check whether the following assignments of probabilities are permissible and explain your answers: (a) P(A) = 0.20, P(B) = 0.20, P(C) = 0.20, P(D) = 0.20, and P(E) = 0.20; (b) P(A) = 0.21, P(B) = 0.26, P(C) = 0.58, P(D) = 0.01, and P(E) = 0.06; (c) P(A) = 0.18, P(B) = 0.19, P(C) = 0.20, P(D) = 0.21, and P(E) = 0.22; (d) P(A) = 0.10, P(B) = 0.30, P(C) = 0.10, P(D) = 0.60, and P(E) = −0.10; (e) P(A) = 0.08, P(B) = 0.27, P(C) = 0.32, P(D) = 0.15, and P(E) = 0.10.

54. The probabilities that the serviceability of a new X-ray machine will be rated very difficult, difficult, average, easy, or very easy are, respectively, 0.12, 0.17, 0.34, 0.29, and 0.08. Find the probabilities that the serviceability of the machine will be rated (a) difficult or very difficult; (b) neither very difficult nor very easy; (c) average or worse; (d) average or better.

55. Explain why there must be a mistake in each of the following statements: (a) The probability that Jean will pass the bar examination is 0.66 and the probability that she will not pass is −0.34. (b) The probability that the home team will win an upcoming football game is 0.77, the probability that the game will end in a tie is 0.08, and the probability that it will win or tie the game is 0.95. (c) The probabilities that a secretary will make 0, 1, 2, 3, 4, or 5 or more mistakes in typing a report are, respectively, 0.12, 0.25, 0.36, 0.14, 0.09, and 0.07. (d) The probabilities that a bank will get 0, 1, 2, or 3 or more bad checks on any given day are, respectively, 0.08, 0.21, 0.29, and 0.40.

56. Two cards are randomly drawn from a deck of 52 playing cards. Find the probability that both cards will be greater than 3 and less than 8.

57. Suppose that each of the 30 points of the sample space of Exercise 40 is assigned the probability 1/30. Find the probabilities that at a given moment (a) at least one of the station wagons is empty; (b) each of the two station wagons carries the same number of passengers; (c) the larger station wagon carries more passengers than the smaller station wagon; (d) together they carry at least six passengers.

58. A hat contains 20 white slips of paper numbered from 1 through 20, 10 red slips of paper numbered from 1 through 10, 40 yellow slips of paper numbered from 1 through 40, and 10 blue slips of paper numbered from 1 through 10. If these 80 slips of paper are thoroughly shuffled so that each slip has the same probability of being drawn, find the probabilities of drawing a slip of paper that is (a) blue or white; (b) numbered 1, 2, 3, 4, or 5; (c) red or yellow and also numbered 1, 2, 3, or 4; (d) numbered 5, 15, 25, or 35; (e) white and numbered higher than 12 or yellow and numbered higher than 26.

59. A police department needs new tires for its patrol cars, and the probabilities that it will buy Uniroyal tires, Goodyear tires, Michelin tires, General tires, Goodrich tires, or Armstrong tires are, respectively, 0.17, 0.25, 0.03, 0.02, 0.21, and 0.08. Find the probabilities that it will buy (a) Goodyear or Goodrich tires; (b) Uniroyal, General, or Goodrich tires; (c) Michelin or Armstrong tires; (d) Uniroyal, Goodyear, Michelin, General, or Goodrich tires.

60. In a poker game, five cards are dealt at random from an ordinary deck of 52 playing cards. Find the probabilities of getting (a) two pairs (any two distinct face values occurring exactly twice); (b) four of a kind (four cards of equal face value).

61. Four candidates are seeking a vacancy on a school board. If A is twice as likely to be elected as B, and B and C are given about the same chance of being elected, while C is twice as likely to be elected as D, what are the probabilities that (a) C will win; (b) A will not win?

63. In a game of Yahtzee, five balanced dice are rolled simultaneously. Find the probabilities of getting (a) two pairs; (b) three of a kind; (c) a full house (three of a kind and a pair); (d) four of a kind.

64. A line segment of length l is divided by a point selected at random within the segment. What is the probability that it will divide the line segment in a ratio greater than 1:2? What is the probability that it will divide the segment exactly in half?

65. Among the 78 doctors on the staff of a hospital, 64 carry malpractice insurance, 36 are surgeons, and 34 of the surgeons carry malpractice insurance. If one of these doctors is chosen by lot to represent the hospital staff at an A.M.A. convention (that is, each doctor has a probability of 1/78 of being selected), what is the probability that the one chosen is not a surgeon and does not carry malpractice insurance?

66. Explain on the basis of the various rules of Exercises 5 through 9 why there is a mistake in each of the following statements: (a) The probability that it will rain is 0.67, and the probability that it will rain or snow is 0.55. (b) The probability that a student will get a passing grade in English is 0.82, and the probability that she will get a passing grade in English and French is 0.86. (c) The probability that a person visiting the San Diego Zoo will see the giraffes is 0.70, the probability that he will see the bears is 0.84, and the probability that he will see both is 0.52.

67. A right triangle has the legs 3 and 4 units, respectively. Find the probability that a line segment, drawn at random parallel to the hypotenuse and contained entirely in the triangle, will divide the triangle so that the area between the line and the vertex opposite the hypotenuse will equal at least half the area of the triangle.

68. For married couples living in a certain suburb, the probability that the husband will vote in a school board election is 0.21, the probability that the wife will vote in the election is 0.28, and the probability that they will both vote is 0.15. What is the probability that at least one of them will vote?

69. Given P(A) = 0.59, P(B) = 0.30, and P(A ∩ B) = 0.21, find (a) P(A ∪ B); (b) P(A ∩ B′); (c) P(A′ ∪ B′); (d) P(A′ ∩ B′).

70. Suppose that if a person visits Disneyland, the probability that he will go on the Jungle Cruise is 0.74, the probability that he will ride the Monorail is 0.70, the probability that he will go on the Matterhorn ride is 0.62, the probability that he will go on the Jungle Cruise and ride the Monorail is 0.52, the probability that he will go on the Jungle Cruise as well as the Matterhorn ride is 0.46, the probability that he will ride the Monorail and go on the Matterhorn ride is 0.44, and the probability that he will go on all three of these rides is 0.34. What is the probability that a person visiting Disneyland will go on at least one of these three rides?

71. A biology professor has two graduate assistants helping her with her research. The probability that the older of the two assistants will be absent on any given day is 0.08, the probability that the younger of the two will be absent on any given day is 0.05, and the probability that they will both be absent on any given day is 0.02. Find the probabilities that (a) either or both of the graduate assistants will be absent on any given day; (b) at least one of the two graduate assistants will not be absent on any given day; (c) only one of the two graduate assistants will be absent on any given day.

72. At Roanoke College it is known that 1/3 of the students are from within the state of Virginia. It is also known that 5/9 of the students live off campus and that 3/4 of the students are from out of state or live on campus. What is the probability that a student selected at random from Roanoke College is from out of state and lives on campus?

73. Suppose that if a person travels to Europe for the first time, the probability that he will see London is 0.70, the probability that he will see Paris is 0.64, the probability that he will see Rome is 0.58, the probability that he will see Amsterdam is 0.58, the probability that he will see London and Paris is 0.45, the probability that he will see London and Rome is 0.42, the probability that he will see London and Amsterdam is 0.41, the probability that he will see Paris and Rome is 0.35, the probability that he will see Paris and Amsterdam is 0.39, the probability that he will see Rome and Amsterdam is 0.32, the probability that he will see London, Paris, and Rome is 0.23, the probability that he will see London, Paris, and Amsterdam is 0.26, the probability that he will see London, Rome, and Amsterdam is 0.21, the probability that he will see Paris, Rome, and Amsterdam is 0.20, and the probability that he will see all four of these cities is 0.12. What is the probability that a person traveling to Europe for the first time will see at least one of these four cities? (Hint: Use the formula of Exercise 13.)

74. There are two Porsches in a road race in Italy, and a reporter feels that the odds against their winning are 3 to 1 and 5 to 3. To be consistent (see Exercise 82), what odds should the reporter assign to the event that either car will win?

SECS. 6–8

75. Use the definition of "odds" given in Exercise 15 to convert each of the following probabilities to odds: (a) The probability that the last digit of a car's license plate is a 2, 3, 4, 5, 6, or 7 is 6/10. (b) The probability of getting at least two heads in four flips of a balanced coin is 11/16. (c) The probability of rolling "7 or 11" with a pair of balanced dice is 2/9.

76. There are 90 applicants for a job with the news department of a television station. Some of them are college graduates and some are not; some of them have at least three years' experience and some have not, with the exact breakdown being

                                        College graduates   Not college graduates
At least three years' experience               18                     9
Less than three years' experience              36                    27

If the order in which the applicants are interviewed by the station manager is random, G is the event that the first applicant interviewed is a college graduate, and T is the event that the first applicant interviewed has at least three years' experience, determine each of the following probabilities directly from the entries and the row and column totals of the table: (a) P(G); (b) P(T′); (c) P(G ∩ T); (d) P(G′ ∩ T′); (e) P(T|G); (f) P(G′|T′).

77. Use the results of Exercise 76 to verify that (a) P(T|G) = P(G ∩ T)/P(G); (b) P(G′|T′) = P(G′ ∩ T′)/P(T′).

78. Use the formula of Exercise 15 to convert each of the following odds to probabilities: (a) If three eggs are randomly chosen from a carton of 12 eggs of which 3 are cracked, the odds are 34 to 21 that at least one of them will be cracked. (b) If a person has eight $1 bills, five $5 bills, and one $20 bill, and randomly selects three of them, the odds are 11 to 2 that they will not all be $1 bills. (c) If we arbitrarily arrange the letters in the word "nest," the odds are 5 to 1 that we will not get a meaningful word in the English language.

79. With reference to Exercise 68, what is the probability that a husband will vote in the school board election given that his wife is going to vote?

80. With reference to Exercise 72, what is the probability that one of the students will be living on campus given that he or she is from out of state?

81. A bin contains 100 balls, of which 25 are red, 40 are white, and 35 are black. If two balls are selected from the bin, what is the probability that one will be red and one will be white if the selection is (a) with replacement; (b) without replacement?

82. If subjective probabilities are determined by the method suggested in Exercise 16, the third postulate of probability may not be satisfied. However, proponents of the subjective probability concept usually impose this postulate as a consistency criterion; in other words, they regard subjective probabilities that do not satisfy the postulate as inconsistent. (a) A high school principal feels that the odds are 7 to 5 against her getting a $1,000 raise and 11 to 1 against her getting a $2,000 raise. Furthermore, she feels that it is an even-money bet that she will get one of these raises or the other. Discuss the consistency of the corresponding subjective probabilities. (b) Asked about his political future, a party official replies that the odds are 2 to 1 that he will not run for the House of Representatives and 4 to 1 that he will not run for the Senate. Furthermore, he feels that the odds are 7 to 5 that he will run for one or the other. Are the corresponding probabilities consistent?

83. If we let x = the number of spots facing up when a pair of dice is cast, then we can use the sample space S2 of Example 2 to describe the outcomes of the experiment. (a) Find the probability of each outcome in S2. (b) Verify that the sum of these probabilities is 1.

84. With reference to Exercise 65, what is the probability that the doctor chosen to represent the hospital staff at the convention carries malpractice insurance given that he or she is a surgeon?

85. Using a computer program that can generate random integers on the interval (0, 9) with equal probabilities, generate 1,000 such integers and use the frequency interpretation to estimate the probability that such a randomly chosen integer will have a value less than 1.

86. Using the method of Exercise 85, generate a second set of 1,000 random integers on (0, 9). Estimate the probability that A: an integer selected at random from the first set will be less than 1, or B: an integer selected at random from the second set will be less than 1, (a) using the frequency interpretation of probabilities; (b) using Theorem 7 and P(A ∩ B) = 1/81.

87. It is felt that the probabilities are 0.20, 0.40, 0.30, and 0.10 that the basketball teams of four universities, T, U, V, and W, will win their conference championship. If university U is placed on probation and declared ineligible for the championship, what is the probability that university T will win the conference championship?

88. The probability of surviving a certain transplant operation is 0.55. If a patient survives the operation, the probability that his or her body will reject the transplant within a month is 0.20. What is the probability of surviving both of these critical stages?

89. Crates of eggs are inspected for blood clots by randomly removing three eggs in succession and examining their contents. If all three eggs are good, the crate is shipped; otherwise it is rejected. What is the probability that a crate will be shipped if it contains 120 eggs, of which 10 have blood clots?

90. With reference to Exercise 70, find the probabilities that a person who visits Disneyland will (a) ride the Monorail given that he will go on the Jungle Cruise; (b) go on the Matterhorn ride given that he will go on the Jungle Cruise and ride the Monorail; (c) not go on the Jungle Cruise given that he will ride the Monorail and/or go on the Matterhorn ride; (d) go on the Matterhorn ride and the Jungle Cruise given that he will not ride the Monorail. (Hint: Draw a Venn diagram and fill in the probabilities associated with the various regions.)

91. Suppose that in Vancouver, B.C., the probability that a rainy fall day is followed by a rainy day is 0.80 and the probability that a sunny fall day is followed by a rainy day is 0.60. Find the probabilities that a rainy fall day is followed by (a) a rainy day, a sunny day, and another rainy day; (b) two sunny days and then a rainy day; (c) two rainy days and then two sunny days; (d) rain two days later. [Hint: In part (c) use the formula of Exercise 19.]

92. A department store that bills its charge-account customers once a month has found that if a customer pays promptly one month, the probability is 0.90 that he or she will also pay promptly the next month; however, if a customer does not pay promptly one month, the probability that he or she will pay promptly the next month is only 0.40. (Assume that the probability of paying or not paying on any given month depends only on the outcome of the previous month.) (a) What is the probability that a customer who pays promptly one month will also pay promptly the next three months? (b) What is the probability that a customer who does not pay promptly one month will also not pay promptly the next two months and then make a prompt payment the month after that?

93. A sharpshooter hits a target with probability 0.75. Assuming independence, find the probabilities of getting (a) a hit followed by two misses; (b) two hits and a miss in any order.

94. Medical records show that one out of 10 persons in a certain town has a thyroid deficiency. If 12 persons in this town are randomly chosen and tested, what is the probability that at least one of them will have a thyroid deficiency?

95. A shipment of 1,000 parts contains 1 percent defective parts. Find the probability that (a) the first four parts chosen arbitrarily for inspection are nondefective; (b) the first defective part found will be on the fourth inspection.

96. A balanced die is tossed twice. If A is the event that an even number comes up on the first toss, B is the event that an even number comes up on the second toss, and C is the event that both tosses result in the same number, are the events A, B, and C (a) pairwise independent; (b) independent?

97. If 5 of a company's 10 delivery trucks do not meet emission standards and 3 of them are chosen for inspection, what is the probability that none of the trucks chosen will meet emission standards?

98. Use the formula of Exercise 19 to find the probability of randomly choosing (without replacement) four healthy guinea pigs from a cage containing 20 guinea pigs, of which 15 are healthy and 5 are diseased.

99. If a person randomly picks 4 of the 15 gold coins a dealer has in stock, and 6 of the coins are counterfeits, what is the probability that the coins picked will all be counterfeits?

100. A coin is loaded so that the probabilities of heads and tails are 0.52 and 0.48, respectively. If the coin is tossed three times, what are the probabilities of getting (a) all heads; (b) two tails and a head in that order?

101. With reference to Figure 15, verify that events A, B, C, and D are independent. Note that the region representing A consists of two circles, and so do the regions representing B and C.

Figure 15. Diagram for Exercise 101.

102. At an electronics plant, it is known from past experience that the probability is 0.84 that a new worker who has attended the company's training program will meet the production quota, and that the corresponding probability is 0.49 for a new worker who has not attended the company's training program. If 70 percent of all new workers attend the training program, what is the probability that a new worker will meet the production quota?

103. It is known from experience that in a certain industry 60 percent of all labor–management disputes are over wages, 15 percent are over working conditions, and 25 percent are over fringe issues. Also, 45 percent of the disputes over wages are resolved without strikes, 70 percent of the disputes over working conditions are resolved without strikes, and 40 percent of the disputes over fringe issues are resolved without strikes. What is the probability that a labor–management dispute in this industry will be resolved without a strike?

104. In a T-maze, a rat is given food if it turns left and an electric shock if it turns right. On the first trial there is a 50–50 chance that a rat will turn either way; then, if it receives food on the first trial, the probability is 0.68 that it will turn left on the next trial, and if it receives a shock on the first trial, the probability is 0.84 that it will turn left on the next trial. What is the probability that a rat will turn left on the second trial?

105. With reference to Exercise 103, what is the probability that if a labor–management dispute in this industry is resolved without a strike, it was over wages?

106. The probability that a one-car accident is due to faulty brakes is 0.04, the probability that a one-car accident is correctly attributed to faulty brakes is 0.82, and the probability that a one-car accident is incorrectly attributed to faulty brakes is 0.03. What is the probability that (a) a one-car accident will be attributed to faulty brakes; (b) a one-car accident attributed to faulty brakes was actually due to faulty brakes?

107. With reference to Example 25, suppose that we discover later that the job was completed on time. What is the probability that there had been a strike?

108. In a certain community, 8 percent of all adults over 50 have diabetes. If a health service in this community correctly diagnoses 95 percent of all persons with diabetes as having the disease and incorrectly diagnoses 2 percent of all persons without diabetes as having the disease, find the probabilities that (a) the community health service will diagnose an adult over 50 as having diabetes; (b) a person over 50 diagnosed by the health service as having diabetes actually has the disease.

109. An explosion at a construction site could have occurred as the result of static electricity, malfunctioning of equipment, carelessness, or sabotage. Interviews with construction engineers analyzing the risks involved led to the estimates that such an explosion would occur with probability 0.25 as a result of static electricity, 0.20 as a result of malfunctioning of equipment, 0.40 as a result of carelessness, and 0.75 as a result of sabotage. It is also felt that the prior probabilities of the four causes of the explosion are 0.20, 0.40, 0.25, and 0.15. Based on all this information, what is (a) the most likely cause of the explosion; (b) the least likely cause of the explosion?

110. A mail-order house employs three stock clerks, U, V, and W, who pull items from shelves and assemble them for subsequent verification and packaging. U makes a mistake in an order (gets a wrong item or the wrong quantity) one time in a hundred, V makes a mistake in an order five times in a hundred, and W makes a mistake in an order three times in a hundred. If U, V, and W fill, respectively, 30, 40, and 30 percent of all orders, what are the probabilities that (a) a mistake will be made in an order; (b) if a mistake is made in an order, the order was filled by U; (c) if a mistake is made in an order, the order was filled by V?

111. An art dealer receives a shipment of five old paintings from abroad, and, on the basis of past experience, she feels that the probabilities are, respectively, 0.76, 0.09, 0.02, 0.01, 0.02, and 0.10 that 0, 1, 2, 3, 4, or all 5 of them are forgeries. Since the cost of authentication is fairly high, she decides to select one of the five paintings at random and send it away for authentication. If it turns out that this painting is a forgery, what probability should she now assign to the possibility that all the other paintings are also forgeries?

112. To get answers to sensitive questions, we sometimes use a method called the randomized response technique. Suppose, for instance, that we want to determine what percentage of the students at a large university smoke marijuana. We construct 20 flash cards, write "I smoke marijuana at least once a week" on 12 of the cards, where 12 is an arbitrary choice, and "I do not smoke marijuana at least once a week" on the others. Then, we let each student (in the sample interviewed) select one of the 20 cards at random and respond "yes" or "no" without divulging the question. (a) Establish a relationship between P(Y), the probability that a student will give a "yes" response, and P(M), the probability that a student randomly selected at that university smokes marijuana at least once a week. (b) If 106 of 250 students answered "yes" under these conditions, use the result of part (a) and 106/250 as an estimate of P(Y) to estimate P(M).

SEC. 9

113. A series system consists of three components, each having the reliability 0.95, and three components, each having the reliability 0.99. Find the reliability of the system.

114. Find the reliability of a series system having five components with reliabilities 0.995, 0.990, 0.992, 0.995, and 0.998, respectively.

115. What must be the reliability of each component in a series system consisting of six components that must have a system reliability of 0.95?

116. Referring to Exercise 115, suppose now that there are 10 components, and the system reliability must be 0.90.

117. Suppose a system consists of four components, connected in parallel, having the reliabilities 0.8, 0.7, 0.7, and 0.65, respectively. Find the system reliability.

118. Referring to Exercise 117, suppose now that the system has five components with reliabilities 0.85, 0.80, 0.65, 0.60, and 0.70, respectively. Find the system reliability.

119. A system consists of two components having the reliabilities 0.95 and 0.90, connected in series to two parallel subsystems, the first containing four components, each having the reliability 0.60, and the second containing two components, each having the reliability 0.75. Find the system reliability.

120. A series system consists of two components having the reliabilities 0.98 and 0.99, respectively, connected to a parallel subsystem containing five components having the reliabilities 0.75, 0.60, 0.65, 0.70, and 0.60, respectively. Find the system reliability.

References

Among the numerous textbooks on probability theory published in recent years, one of the most popular is

Feller, W., An Introduction to Probability Theory and Its Applications, Vol. I, 3rd ed. New York: John Wiley & Sons, Inc., 1968.

More elementary treatments may be found in

Barr, D. R., and Zehna, P. W., Probability: Modeling Uncertainty. Reading, Mass.: Addison-Wesley Publishing Company, Inc., 1983,
Draper, N. R., and Lawrence, W. E., Probability: An Introductory Course. Chicago: Markam Publishing Company, 1970,
Freund, J. E., Introduction to Probability. New York: Dover Publications, Inc., 1993 Reprint,
Goldberg, S., Probability—An Introduction. Mineola, N.Y.: Dover Publications, Inc. (republication of 1960 edition),
Hodges, J. L., and Lehmann, E. L., Elements of Finite Probability. San Francisco: Holden Day, Inc., 1965,
Nosal, M., Basic Probability and Applications. Philadelphia: W. B. Saunders Company, 1977.

More advanced treatments are given in many texts, for instance,

Hoel, P., Port, S. C., and Stone, C. J., Introduction to Probability Theory. Boston: Houghton Mifflin Company, 1971,
Khazanie, R., Basic Probability Theory and Applications. Pacific Palisades, Calif.: Goodyear Publishing Company, Inc., 1976,
Parzen, E., Modern Probability Theory and Its Applications. New York: John Wiley & Sons, Inc., 1960,
Ross, S., A First Course in Probability, 3rd ed. New York: Macmillan Publishing Company, 1988,
Solomon, F., Probability and Stochastic Processes. Upper Saddle River, N.J.: Prentice Hall, 1987.

For more on reliability and related topics, see

Johnson, R. A., Miller and Freund's Probability and Statistics for Engineers. Upper Saddle River, N.J.: Prentice Hall, 2000,
Miller, I., and Miller, M., Statistical Methods for Quality with Applications to Engineering and Management. Upper Saddle River, N.J.: Prentice Hall, 1995.
Answers to Odd-Numbered Exercises

35 (a) {6, 8, 9}; (b) {8}; (c) {1, 2, 3, 4, 5, 8}; (d) {1, 5}; (e) {2, 4, 8}; (f) Ø.
37 (a) {Car 5, Car 6, Car 7, Car 8}; (b) {Car 2, Car 4, Car 5, Car 7}; (c) {Car 1, Car 8}; (d) {Car 3, Car 4, Car 7, Car 8}.
39 (a) The house has fewer than three baths. (b) The house does not have a fireplace. (c) The house does not cost more than $200,000. (d) The house is not new. (e) The house has three or more baths and a fireplace. (f) The house has three or more baths and costs more than $200,000. (g) The house costs more than $200,000 but has no fireplace. (h) The house is new or costs more than $200,000. (i) The house is new or costs at most $200,000. (j) The house has three or more baths and/or a fireplace. (k) The house has three or more baths and/or costs more than $200,000. (l) The house is new and costs more than $200,000.
41 (a) (H,1), (H,2), (H,3), (H,4), (H,5), (H,6), (T,H,H), (T,H,T), (T,T,H), and (T,T,T); (b) (H,1), (H,2), (H,3), (H,4), (H,5), (H,6), (T,H,T), and (T,T,H); (c) (H,5), (H,6), (T,H,T), (T,T,H), and (T,T,T).
43 (a) 5^(k−1); (b) (5^k − 1)/4.
45 (a) {x|3 < x < 10}; (b) {x|5 < x ≤ 8}; (c) {x|3 < x ≤ 5}; (d) {x|0 < x ≤ 3 or 5 < x < 10}.
47 (a) The event that a driver has liability insurance. (b) The event that a driver does not have collision insurance. (c) The event that a driver has liability insurance or collision insurance, but not both. (d) The event that a driver does not have both kinds of insurance.
49 (a) Region 5; (b) regions 1 and 2 together; (c) regions 3, 5, and 6 together; (d) regions 1, 3, 4, and 6 together.
51 38.
53 (a) Permissible; (b) not permissible because the sum of the probabilities exceeds 1; (c) permissible; (d) not permissible because P(E) is negative; (e) not permissible because the sum of the probabilities is less than 1.
55 (a) The probability that she will not pass cannot be negative. (b) 0.77 + 0.08 = 0.85 ≠ 0.95; (c) 0.12 + 0.25 + 0.36 + 0.14 + 0.09 + 0.07 = 1.03 > 1; (d) 0.08 + 0.21 + 0.29 + 0.40 = 0.98 < 1.
57 (a) 1/3; (b) 1/6; (c) 1/2; (d) 1/3.
59 (a) 0.46; (b) 0.40; (c) 0.11; (d) 0.68.
61 (a) 2/9; (b) 5/9.
63 (a) 25/108; (b) 25/162; (c) 25/648; (d) 25/1296.
65 2/13.
67 1 − √2/2.
69 (a) 0.68; (b) 0.38; (c) 0.79; (d) 0.32.
71 (a) 0.11; (b) 0.98; (c) 0.09.
73 0.94.
75 (a) 3 to 2; (b) 11 to 5; (c) 7 to 2 against it.
77 (a) 1/3; (b) 3/7.
79 15/28.
81 (a) 0.2; (b) 20/99.
83 Outcome:     2     3     4    5    6     7    8     9    10   11    12
   Probability: 1/36  1/18  1/12 1/9  5/36  1/6  5/36  1/9  1/12 1/18  1/36
87 1/3.
89 0.7685.
91 (a) 0.096; (b) 0.048; (c) 0.0512; (d) 0.76.
93 (a) 3/64; (b) 27/64.
95 (a) Required probability = 0.9606; exact probability = 0.9605; (b) required probability = 0.0097 (assuming independence); exact probability = 0.0097.
97 1/12.
99 1/91.
103 0.475.
105 0.5684.
107 0.3818.
109 (a) Most likely cause is sabotage (P = 0.3285); (b) least likely cause is static electricity (P = 0.1460).
111 0.6757.
113 0.832.
115 0.991.
117 0.9937.
119 0.781.

Probability Distributions and Probability Densities

1 Random Variables                  5 Multivariate Distributions
2 Probability Distributions         6 Marginal Distributions
3 Continuous Random Variables       7 Conditional Distributions
4 Probability Density Functions     8 The Theory in Practice

1 Random Variables

In most applied problems involving probabilities we are interested only in a particular aspect (or in two or a few particular aspects) of the outcomes of experiments. For instance, when we roll a pair of dice we are usually interested only in the total, and not in the outcome for each die; when we interview a randomly chosen married couple we may be interested in the size of their family and in their joint income, but not in the number of years they have been married or their total assets; and when we sample mass-produced light bulbs we may be interested in their durability or their brightness, but not in their price.

In each of these examples we are interested in numbers that are associated with the outcomes of chance experiments, that is, in the values taken on by random variables. In the language of probability and statistics, the total we roll with a pair of dice is a random variable, the size of the family of a randomly chosen married couple and their joint income are random variables, and so are the durability and the brightness of a light bulb randomly picked for inspection.

To be more explicit, consider Figure 1, which pictures the sample space for an experiment in which we roll a pair of dice, and let us assume that each of the 36 possible outcomes has the probability 1/36. Note, however, that in Figure 1 we have attached a number to each point: For instance, we attached the number 2 to the point (1, 1), the number 6 to the point (1, 5), the number 8 to the point (6, 2), the number 11 to the point (5, 6), and so forth. Evidently, we associated with each point the value of a random variable, that is, the corresponding total rolled with the pair of dice. Since "associating a number with each point (element) of a sample space" is merely another way of saying that we are "defining a function over the points of a sample space," let us now make the following definition.

From Chapter 3 of John E. Freund's Mathematical Statistics with Applications, Eighth Edition. Irwin Miller, Marylees Miller. Copyright © 2014 by Pearson Education, Inc. All rights reserved.

Figure 1. The total number of points rolled with a pair of dice.

DEFINITION 1. RANDOM VARIABLE. If S is a sample space with a probability measure and X is a real-valued function defined over the elements of S, then X is called a random variable.†

† Instead of "random variable," the terms "chance variable," "stochastic variable," and "variate" are also used in some books.

In this chapter we shall denote random variables by capital letters and their values by the corresponding lowercase letters; for instance, we shall write x to denote a value of the random variable X. With reference to the preceding example and Figure 1, observe that the random variable X takes on the value 9, and we write X = 9, for the subset {(6, 3), (5, 4), (4, 5), (3, 6)} of the sample space S.
Thus, X = 9 is to be interpreted as the set of elements of S for which the total is 9 and, more generally, X = x is to be interpreted as the set of elements of the sample space for which the random variable X takes on the value x. This may seem confusing, but it reminds one of mathematicians who say "f(x) is a function of x" instead of "f(x) is the value of a function at x."

EXAMPLE 1
Two socks are selected at random and removed in succession from a drawer containing five brown socks and three green socks. List the elements of the sample space, the corresponding probabilities, and the corresponding values w of the random variable W, where W is the number of brown socks selected.

Solution
If B and G stand for brown and green, the probabilities for BB, BG, GB, and GG are, respectively, 5/8 · 4/7 = 5/14, 5/8 · 3/7 = 15/56, 3/8 · 5/7 = 15/56, and 3/8 · 2/7 = 3/28, and the results are shown in the following table:

Element of sample space   Probability   w
BB                        5/14          2
BG                        15/56         1
GB                        15/56         1
GG                        3/28          0

Also, we can write P(W = 2) = 5/14, for example, for the probability of the event that the random variable W will take on the value 2.

EXAMPLE 2
A balanced coin is tossed four times. List the elements of the sample space that are presumed to be equally likely, as this is what we mean by a coin being balanced, and the corresponding values x of the random variable X, the total number of heads.

Solution
If H and T stand for heads and tails, the results are as shown in the following table:

Element of sample space   Probability   x
HHHH                      1/16          4
HHHT                      1/16          3
HHTH                      1/16          3
HTHH                      1/16          3
THHH                      1/16          3
HHTT                      1/16          2
HTHT                      1/16          2
HTTH                      1/16          2
THHT                      1/16          2
THTH                      1/16          2
TTHH                      1/16          2
HTTT                      1/16          1
THTT                      1/16          1
TTHT                      1/16          1
TTTH                      1/16          1
TTTT                      1/16          0

Thus, we can write P(X = 3) = 4/16, for example, for the probability of the event that the random variable X will take on the value 3.

The fact that Definition 1 is limited to real-valued functions does not impose any restrictions. If the numbers we want to assign to the outcomes of an experiment are complex numbers, we can always look upon the real and the imaginary parts separately as values taken on by two random variables. Also, if we want to describe the outcomes of an experiment qualitatively, say, by giving the color of a person's hair, we can arbitrarily make the descriptions real-valued by coding the various colors, perhaps by representing them with the numbers 1, 2, 3, and so on.

In all of the examples of this section we have limited our discussion to discrete sample spaces, and hence to discrete random variables, namely, random variables whose range is finite or countably infinite. Continuous random variables defined over continuous sample spaces will be taken up in Section 3.

2 Probability Distributions

As we already saw in Examples 1 and 2, the probability measure defined over a discrete sample space automatically provides the probabilities that a random variable will take on any given value within its range.
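Tabulations like the ones in Examples 1 and 2 are easy to check by brute-force enumeration. The following minimal Python sketch (an illustration added here, not part of the text's formal apparatus, and using only the standard library) enumerates the 8 · 7 = 56 equally likely ordered draws of two socks and recovers the distribution of W:

    from fractions import Fraction
    from itertools import permutations

    # Five brown (B) and three green (G) socks; every ordered pair of
    # distinct socks is equally likely, as in Example 1.
    socks = ['B'] * 5 + ['G'] * 3
    counts = {}
    for first, second in permutations(range(len(socks)), 2):
        w = (socks[first] == 'B') + (socks[second] == 'B')  # number of brown socks drawn
        counts[w] = counts.get(w, 0) + 1

    total = sum(counts.values())  # 8 * 7 = 56 ordered pairs
    for w in sorted(counts):
        print(w, Fraction(counts[w], total))  # 0: 3/28, 1: 15/28, 2: 5/14

Note that P(W = 1) = 15/28 is the sum of the probabilities 15/56 + 15/56 listed for BG and GB in the table of Example 1.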
For instance, having assigned the probability 1/36 to each element of the sample space of Figure 1, we immediately find that the random variable X, the total rolled with the pair of dice, takes on the value 9 with probability 4/36; as described in Section 1, X = 9 contains four of the equally likely elements of the sample space. The probabilities associated with all possible values of X are shown in the following table:

x     P(X = x)
2     1/36
3     2/36
4     3/36
5     4/36
6     5/36
7     6/36
8     5/36
9     4/36
10    3/36
11    2/36
12    1/36

Instead of displaying the probabilities associated with the values of a random variable in a table, as we did in the preceding illustration, it is usually preferable to give a formula, that is, to express the probabilities by means of a function such that its values, f(x), equal P(X = x) for each x within the range of the random variable X. For instance, for the total rolled with a pair of dice we could write

f(x) = (6 − |x − 7|)/36    for x = 2, 3, . . . , 12

as can easily be verified by substitution. Clearly,

f(2) = (6 − |2 − 7|)/36 = (6 − 5)/36 = 1/36
f(3) = (6 − |3 − 7|)/36 = (6 − 4)/36 = 2/36
.....................
f(12) = (6 − |12 − 7|)/36 = (6 − 5)/36 = 1/36

and all these values agree with the ones shown in the preceding table.

DEFINITION 2. PROBABILITY DISTRIBUTION. If X is a discrete random variable, the function given by f(x) = P(X = x) for each x within the range of X is called the probability distribution of X.

Based on the postulates of probability, we obtain the following theorem.

THEOREM 1. A function can serve as the probability distribution of a discrete random variable X if and only if its values, f(x), satisfy the conditions
1. f(x) ≥ 0 for each value within its domain;
2. Σ_x f(x) = 1, where the summation extends over all the values within its domain.

EXAMPLE 3
Find a formula for the probability distribution of the total number of heads obtained in four tosses of a balanced coin.

Solution
Based on the probabilities in the table, we find that P(X = 0) = 1/16, P(X = 1) = 4/16, P(X = 2) = 6/16, P(X = 3) = 4/16, and P(X = 4) = 1/16. Observing that the numerators of these five fractions, 1, 4, 6, 4, and 1, are the binomial coefficients \binom{4}{0}, \binom{4}{1}, \binom{4}{2}, \binom{4}{3}, and \binom{4}{4}, we find that the formula for the probability distribution can be written as

f(x) = \binom{4}{x} / 16    for x = 0, 1, 2, 3, 4

EXAMPLE 4
Check whether the function given by

f(x) = (x + 2)/25    for x = 1, 2, 3, 4, 5

can serve as the probability distribution of a discrete random variable.

Solution
Substituting the different values of x, we get f(1) = 3/25, f(2) = 4/25, f(3) = 5/25, f(4) = 6/25, and f(5) = 7/25. Since these values are all nonnegative, the first condition of Theorem 1 is satisfied, and since

f(1) + f(2) + f(3) + f(4) + f(5) = 3/25 + 4/25 + 5/25 + 6/25 + 7/25 = 1

the second condition of Theorem 1 is satisfied. Thus, the given function can serve as the probability distribution of a random variable having the range {1, 2, 3, 4, 5}. Of course, whether any given random variable actually has this probability distribution is an entirely different matter.
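The two conditions of Theorem 1 are mechanical to verify, and exact rational arithmetic avoids any rounding doubts. A brief Python sketch of the check for Example 4 (again an added illustration, using only the standard library):

    from fractions import Fraction

    # Conditions of Theorem 1 for f(x) = (x + 2)/25, x = 1, ..., 5 (Example 4).
    f = {x: Fraction(x + 2, 25) for x in range(1, 6)}
    assert all(p >= 0 for p in f.values())   # condition 1: nonnegativity
    assert sum(f.values()) == 1              # condition 2: probabilities sum to 1
    print(f)  # {1: 3/25, 2: 4/25, 3: 1/5, 4: 6/25, 5: 7/25}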
In some problems it is desirable to present probability distributions graphically, and two kinds of graphical presentations used for this purpose are shown in Figures 2 and 3. The one shown in Figure 2, called a probability histogram, represents the probability distribution of Example 3. The height of each rectangle equals the probability that X takes on the value that corresponds to the midpoint of its base.

Figure 2. Probability histogram.

The graph of Figure 3 is called a bar chart, but it is also referred to as a histogram. As in Figure 2, the height of each rectangle, or bar, equals the probability of the corresponding value of the random variable, but there is no pretense of having a continuous horizontal scale.

Figure 3. Bar chart.

Since each rectangle of the probability histogram of Figure 2 has unit width, we could have said that the areas of the rectangles, rather than their heights, equal the corresponding probabilities. There are certain advantages to identifying the areas of the rectangles with the probabilities, for instance, when we wish to approximate the graph of a discrete probability distribution with a continuous curve. By representing 0 with the interval from −0.5 to 0.5, 1 with the interval from 0.5 to 1.5, and so on up to 4 with the interval from 3.5 to 4.5, we are, so to speak, "spreading" the values of the given discrete random variable over a continuous scale. This can be done even when the rectangles of a probability histogram do not all have unit width by adjusting the heights of the rectangles or by modifying the vertical scale.

Sometimes, as shown in Figure 4, we use lines (rectangles with no width) instead of the rectangles, for instance, when we wish to approximate the graph of a discrete probability distribution with a continuous curve, but we still refer to the graphs as probability histograms. In this chapter, histograms and bar charts are used mainly in descriptive statistics to convey visually the information provided by a probability distribution or a distribution of actual data (see Section 8).

Figure 4. Probability histogram.

There are many problems in which it is of interest to know the probability that the value of a random variable is less than or equal to some real number x. Thus, let us write the probability that X takes on a value less than or equal to x as F(x) = P(X ≤ x) and refer to this function defined for all real numbers x as the distribution function, or the cumulative distribution, of X.

DEFINITION 3. DISTRIBUTION FUNCTION. If X is a discrete random variable, the function given by

F(x) = P(X ≤ x) = Σ_{t ≤ x} f(t)    for −∞ < x < ∞

where f(t) is the value of the probability distribution of X at t, is called the distribution function, or the cumulative distribution, of X.
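Definition 3 translates directly into code: summing the probability distribution over all values t ≤ x gives F(x) for any real x, not just for values in the range of X. A minimal Python sketch for the four-coin-toss distribution of Example 3 (an added illustration; comb is the binomial coefficient from the standard library):

    from fractions import Fraction
    from math import comb

    # f(x) = C(4, x)/16 for x = 0, 1, 2, 3, 4 (Example 3).
    f = {x: Fraction(comb(4, x), 16) for x in range(5)}

    def F(x):
        # Definition 3: accumulate f(t) over all t <= x; works for any real x.
        return sum(p for t, p in f.items() if t <= x)

    print(F(1), F(2.5), F(4))  # 5/16, 11/16, 1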
Based on the postulates of probability and some of their immediate consequences, we obtain the following theorem.

THEOREM 2. The values F(x) of the distribution function of a discrete random variable X satisfy the conditions
1. F(−∞) = 0 and F(∞) = 1;
2. if a < b, then F(a) ≤ F(b) for any real numbers a and b.

If we are given the probability distribution of a discrete random variable, the corresponding distribution function is generally easy to find.

EXAMPLE 5
Find the distribution function of the total number of heads obtained in four tosses of a balanced coin.

Solution
Given f(0) = 1/16, f(1) = 4/16, f(2) = 6/16, f(3) = 4/16, and f(4) = 1/16 from Example 3, it follows that

F(0) = f(0) = 1/16
F(1) = f(0) + f(1) = 5/16
F(2) = f(0) + f(1) + f(2) = 11/16
F(3) = f(0) + f(1) + f(2) + f(3) = 15/16
F(4) = f(0) + f(1) + f(2) + f(3) + f(4) = 1

Hence, the distribution function is given by

F(x) = 0        for x < 0
       1/16     for 0 ≤ x < 1
       5/16     for 1 ≤ x < 2
       11/16    for 2 ≤ x < 3
       15/16    for 3 ≤ x < 4
       1        for x ≥ 4

Observe that this distribution function is defined not only for the values taken on by the given random variable, but for all real numbers. For instance, we can write F(1.7) = 5/16 and F(100) = 1, although the probabilities of getting "at most 1.7 heads" or "at most 100 heads" in four tosses of a balanced coin may not be of any real significance.

EXAMPLE 6
Find the distribution function of the random variable W of Example 1 and plot its graph.

Solution
Based on the probabilities given in the table in Section 1, we can write f(0) = 3/28, f(1) = 15/56 + 15/56 = 15/28, and f(2) = 5/14, so that

F(0) = f(0) = 3/28
F(1) = f(0) + f(1) = 9/14
F(2) = f(0) + f(1) + f(2) = 1

Hence, the distribution function of W is given by

F(w) = 0       for w < 0
       3/28    for 0 ≤ w < 1
       9/14    for 1 ≤ w < 2
       1       for w ≥ 2

The graph of this distribution function, shown in Figure 5, was obtained by first plotting the points (w, F(w)) for w = 0, 1, and 2 and then completing the step function as indicated. Note that at all points of discontinuity the distribution function takes on the greater of the two values, as indicated by the heavy dots in Figure 5.

Figure 5. Graph of the distribution function of Example 6.
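A step function such as the one in Figure 5 can be evaluated efficiently by locating, for any real w, the last jump point that does not exceed w. A short Python sketch for the distribution function of Example 6 (an added illustration; bisect is the standard-library binary-search module):

    from fractions import Fraction
    import bisect

    # Jump points of F and the cumulative values attained there (Example 6).
    jumps = [0, 1, 2]
    cum = [Fraction(3, 28), Fraction(9, 14), Fraction(1)]

    def F(w):
        i = bisect.bisect_right(jumps, w)  # number of jump points <= w
        return Fraction(0) if i == 0 else cum[i - 1]

    print(F(-1), F(0), F(1.5), F(2))  # 0, 3/28, 9/14, 1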
We can also reverse the process illustrated in the two preceding examples, that is, obtain values of the probability distribution of a random variable from its distribution function. To this end, we use the following result.

THEOREM 3. If the range of a random variable X consists of the values x1 < x2 < x3 < · · · < xn, then f(x1) = F(x1) and

f(xi) = F(xi) − F(xi−1)    for i = 2, 3, . . . , n

EXAMPLE 7
If the distribution function of X is given by

F(x) = 0        for x < 2
       1/36     for 2 ≤ x < 3
       3/36     for 3 ≤ x < 4
       6/36     for 4 ≤ x < 5
       10/36    for 5 ≤ x < 6
       15/36    for 6 ≤ x < 7
       21/36    for 7 ≤ x < 8
       26/36    for 8 ≤ x < 9
       30/36    for 9 ≤ x < 10
       33/36    for 10 ≤ x < 11
       35/36    for 11 ≤ x < 12
       1        for x ≥ 12

find the probability distribution of this random variable.

Solution
Making use of Theorem 3, we get f(2) = 1/36, f(3) = 3/36 − 1/36 = 2/36, f(4) = 6/36 − 3/36 = 3/36, f(5) = 10/36 − 6/36 = 4/36, . . . , f(12) = 1 − 35/36 = 1/36, and comparison with the probabilities in the table in Section 2 reveals that the random variable with which we are concerned here is the total number of points rolled with a pair of dice.

In the remainder of this chapter we will be concerned with continuous random variables and their distributions and with problems relating to the simultaneous occurrence of the values of two or more random variables.

Exercises

1. For each of the following, determine whether the given values can serve as the values of a probability distribution of a random variable with the range x = 1, 2, 3, and 4:
(a) f(1) = 0.25, f(2) = 0.75, f(3) = 0.25, and f(4) = −0.25;
(b) f(1) = 0.15, f(2) = 0.27, f(3) = 0.29, and f(4) = 0.29;
(c) f(1) = 3/19, f(2) = 10/19, f(3) = 2/19, and f(4) = 5/19.

2. For each of the following, determine whether the given function can serve as the probability distribution of a random variable with the given range:
(a) f(x) = (x − 2)/5 for x = 1, 2, 3, 4, 5;
(b) f(x) = x²/30 for x = 0, 1, 2, 3, 4;
(c) f(x) = 1/5 for x = 0, 1, 2, 3, 4, 5.

3. Verify that the following can serve as probability distributions:
(a) f(x) = \binom{2}{x}\binom{4}{3−x} / \binom{6}{3} for x = 0, 1, 2;
(b) f(x) = \binom{5}{x} (1/5)^x (4/5)^{5−x} for x = 0, 1, 2, 3, 4, 5.

4. For each of the following, determine c so that the function can serve as the probability distribution of a random variable with the given range:
(a) f(x) = cx for x = 1, 2, 3, 4, 5;
(b) f(x) = c\binom{5}{x} for x = 0, 1, 2, 3, 4, 5;
(c) f(x) = cx² for x = 1, 2, 3, . . . , k;
(d) f(x) = c(1/4)^x for x = 1, 2, 3, . . . .

5. Verify that f(x) = 2x/[k(k + 1)] for x = 1, 2, . . . , k can serve as the probability distribution of a random variable with the given range.

6. Show that there are no values of c such that f(x) = c/x for x = 1, 2, 3, . . . can serve as the probability distribution of a random variable with this countably infinite range.

7. For what values of k can f(x) = (1 − k)k^x serve as the values of the probability distribution of a random variable with the countably infinite range x = 0, 1, 2, . . . ?

8. Prove Theorem 2.

9. For each of the following, determine whether the given values can serve as the values of a distribution function of a random variable with the range x = 1, 2, 3, and 4:
(a) F(1) = 0.3, F(2) = 0.5, F(3) = 0.8, and F(4) = 1.2;
(b) F(1) = 0.5, F(2) = 0.4, F(3) = 0.7, and F(4) = 1.0;
(c) F(1) = 0.25, F(2) = 0.61, F(3) = 0.83, and F(4) = 1.0.

10. Construct a probability histogram for each of the following probability distributions:
(a) f(x) = (x + 2)/25 for x = 1, 2, 3, 4, 5;
(b) f(x) = x²/30 for x = 0, 1, 2, 3, 4.

11. Find the distribution function of the random variable that has the probability distribution f(x) = x/15 for x = 1, 2, 3, 4, 5, and plot its graph.

12. If X has the distribution function

F(x) = 0      for x < 1
       1/3    for 1 ≤ x < 4
       1/2    for 4 ≤ x < 6
       5/6    for 6 ≤ x < 10
       1      for x ≥ 10

find (a) P(2 < X ≤ 6); (b) P(X = 4); (c) the probability distribution of X.

13. If X has the distribution function

F(x) = 0      for x < −1
       1/4    for −1 ≤ x < 1
       1/2    for 1 ≤ x < 3
       3/4    for 3 ≤ x < 5
       1      for x ≥ 5

find (a) P(X ≤ 3); (b) P(X = 3); (c) P(X < 3); (d) P(X ≥ 1); (e) P(−0.4 < X < 4); (f) P(X = 5).

14. With reference to Theorem 3, verify that
(a) P(X > xi) = 1 − F(xi) for i = 1, 2, 3, . . . , n;
(b) P(X ≥ xi) = 1 − F(xi−1) for i = 2, 3, . . . , n, and P(X ≥ x1) = 1.

15. With reference to Example 4, verify that the values of the distribution function are given by

F(x) = (x² + 5x)/50    for x = 1, 2, 3, 4, 5

3 Continuous Random Variables

In Section 1 we introduced the concept of a random variable as a real-valued function defined over the points of a sample space with a probability measure, and in Figure 1 we illustrated this by assigning the total rolled with a pair of dice to each of the 36 equally likely points of the sample space. In the continuous case, where random variables can take on values on a continuous scale, the procedure is very much the same. The outcomes of experiments are represented by the points on line segments or lines, and the values of random variables are numbers appropriately assigned to the points by means of rules or equations. When the value of a random variable is given directly by a measurement or observation, we generally do not bother to distinguish between the value of the random variable (the measurement that we obtain) and the outcome of the experiment (the corresponding point on the real axis). Thus, if an experiment consists of determining the actual content of a 230-gram jar of instant coffee, the result itself, say, 225.3 grams, is the value of the random variable with which we are concerned, and there is no real need to add that the sample space consists of a certain continuous interval of points on the positive real axis.

The problem of defining probabilities in connection with continuous sample spaces and continuous random variables involves some complications. To illustrate, let us consider the following situation.

EXAMPLE 8
Suppose that we are concerned with the possibility that an accident will occur on a freeway that is 200 kilometers long and that we are interested in the probability that it will occur at a given location, or perhaps on a given stretch of the road. The sample space of this "experiment" consists of a continuum of points, those on the interval from 0 to 200, and we shall assume, for the sake of argument, that the probability that an accident will occur on any interval of length d is d/200, with d measured in kilometers. Note that this assignment of probabilities is consistent with Postulates 1 and 2. (Postulate 1 states that the probability of an event is a nonnegative real number, P(A) ≥ 0 for any subset A of S; Postulate 2 states that P(S) = 1.) The probabilities d/200 are all nonnegative and P(S) = 200/200 = 1. So far this assignment of probabilities applies only to intervals on the line segment from 0 to 200, but if we use Postulate 3 (Postulate 3: If A1, A2, A3, . . . is a finite or infinite sequence of mutually exclusive events of S, then P(A1 ∪ A2 ∪ A3 ∪ · · · ) = P(A1) + P(A2) + P(A3) + · · · ), we can also obtain probabilities for the union of any finite or countably infinite sequence of nonoverlapping intervals. Evidently, the probability that an accident will occur on either of two nonoverlapping intervals of length d1 and d2 is

(d1 + d2)/200

and the probability that it will occur on any one of a countably infinite sequence of nonoverlapping intervals of length d1, d2, d3, . . . is

(d1 + d2 + d3 + · · · )/200

With reference to Example 8, observe also that the probability of the accident occurring on a very short interval, say, an interval of 1 centimeter, is only 0.00000005, which is very small. As the length of the interval approaches zero, the probability that an accident will occur on it also approaches zero; indeed, in the continuous case we always assign zero probability to individual points. This does not mean that the corresponding events cannot occur; after all, when an accident occurs on the 200-kilometer stretch of road, it has to occur at some point even though each point has zero probability.
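The uniform assignment of Example 8 is simple enough to code directly, and doing so makes the additivity of Postulate 3 for nonoverlapping intervals concrete. A minimal Python sketch (an added illustration, with interval endpoints chosen arbitrarily for the demonstration):

    from fractions import Fraction

    # Example 8: an accident location is uniform on [0, 200] km, so the
    # probability attached to an interval is its length divided by 200.
    def interval_prob(a, b):
        return Fraction(b - a, 200)

    # Postulate 3 for nonoverlapping intervals: probabilities simply add.
    pieces = [(0, 10), (50, 75), (120, 121)]
    print(sum(interval_prob(a, b) for a, b in pieces))  # (10 + 25 + 1)/200 = 9/50

Shrinking any one interval toward a single point drives its probability to zero, which is the observation made above about individual points in the continuous case.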
4 Probability Density Functions

The way in which we assigned probabilities in Example 8 is very special, and it is similar in nature to the way in which we assign equal probabilities to the six faces of a die, heads and tails, the 52 playing cards in a standard deck, and so forth.

To treat the problem of associating probabilities with values of continuous random variables more generally, suppose that a bottler of soft drinks is concerned about the actual amount of a soft drink that his bottling machine puts into 16-ounce bottles. Evidently, the amount will vary somewhat from bottle to bottle; it is, in fact, a continuous random variable. However, if he rounds the amounts to the nearest tenth of an ounce, he will be dealing with a discrete random variable that has a probability distribution, and this probability distribution may be pictured as a histogram in which the probabilities are given by the areas of rectangles, as in the diagram at the top of Figure 6. If he rounds the amounts to the nearest hundredth of an ounce, he will again be dealing with a discrete random variable (a different one) that has a probability distribution, and this probability distribution may be pictured as a probability histogram in which the probabilities are given by the areas of rectangles, as in the diagram in the middle of Figure 6.

It should be apparent that if he rounds the amounts to the nearest thousandth of an ounce or to the nearest ten-thousandth of an ounce, the probability histograms of the probability distributions of the corresponding discrete random variables will approach the continuous curve shown in the diagram at the bottom of Figure 6, and the sum of the areas of the rectangles that represent the probability that the amount falls within any specified interval approaches the corresponding area under the curve.

Figure 6. Definition of probability in the continuous case.

In other words, the definition of probability in the continuous case presumes for each random variable the existence of a function, called a probability density function, such that areas under the curve give the probabilities associated with the corresponding intervals along the horizontal axis.
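The limiting argument pictured in Figure 6 can be demonstrated numerically. In the following Python sketch (an added illustration; the normal fill distribution with mean 16.0 and a spread of 0.05 ounces is an assumption made purely for the demonstration, not a value from the text), the area-scaled heights of the rounded amounts' histogram approach the height of a fixed continuous curve as the rounding gets finer:

    import random

    random.seed(1)
    draws = [random.gauss(16.0, 0.05) for _ in range(100_000)]

    for step in (0.1, 0.01):  # round to tenths, then hundredths of an ounce
        hist = {}
        for x in draws:
            key = round(round(x / step) * step, 3)
            hist[key] = hist.get(key, 0) + 1
        # Height = relative frequency / bin width, so that areas give probabilities.
        peak = max(hist.values()) / len(draws) / step
        print(f"bin width {step}: histogram height near the mode ~ {peak:.2f}")

With these assumptions the printed heights move toward roughly 8.0, the peak height of the underlying density, illustrating how the rectangles' areas come to match areas under a single continuous curve.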
DEFINITION 4. PROBABILITY DENSITY FUNCTION. A function with values f(x), defined over the set of all real numbers, is called a probability density function of the continuous random variable X if and only if

P(a ≤ X ≤ b) = ∫_a^b f(x) dx

for any real constants a and b with a ≤ b.

Probability density functions are also referred to, more briefly, as probability densities, density functions, densities, or p.d.f.'s.

Note that f(c), the value of the probability density of X at c, does not give P(X = c) as in the discrete case. In connection with continuous random variables, probabilities are always associated with intervals and P(X = c) = 0 for any real constant c. This agrees with what we said on the previous page and it also follows directly from Definition 4 with a = b = c.

Because of this property, the value of a probability density function can be changed for some of the values of a random variable without changing the probabilities, and this is why we said in Definition 4 that f(x) is the value of a probability density, not the probability density, of the random variable X at x. Also, in view of this property, we have the following theorem.

THEOREM 4. If X is a continuous random variable and a and b are real constants with a ≤ b, then

P(a ≤ X ≤ b) = P(a ≤ X < b) = P(a < X ≤ b) = P(a < X < b)

Analogous to Theorem 1, let us now state the following properties of probability densities, which again follow directly from the postulates of probability.

THEOREM 5. A function can serve as a probability density of a continuous random variable X if its values, f(x), satisfy the conditions†
1. f(x) ≥ 0 for −∞ < x < ∞;
2. ∫_{−∞}^{∞} f(x) dx = 1.

† The conditions are not "if and only if" as in Theorem 1 because f(x) could be negative for some values of the random variable without affecting any of the probabilities. However, both conditions of Theorem 5 will be satisfied by nearly all the probability densities used in practice and studied in this text.
which again follow directly from the postulates of probability. defined over the set of all real numbers. probabilities are always associated with intervals and P(X = c) = 0 for any real constant c. then P(a F X F b) = P(a F X < b) = P(a < X F b) = P(a < X < b) Analogous to Theorem 1.  q 2. Note that f (c). A function can serve as a probability density of a continuous random variable X if its values.’s. symbolically. THEOREM 4. Probability density functions are also referred to. If X is a continuous random variable and a and b are real constants with a F b. 76 . density functions. THEOREM 5. both conditions of Theorem 5 will be satisfied by nearly all the probability densities used in practice and studied in this text. f (x). However. −q EXAMPLE 9 If X has the probability density  k · e−3x for x > 0 f (x) = 0 elsewhere find k and P(0. The properties of distribution functions given in Theorem 2 hold also for the continuous case.5 = 0. This is a practice we shall follow throughout this text.5 F X F 1). Fur- thermore. we can state the following theorem. EXAMPLE 10 Find the distribution function of the random variable X of Example 9.5 Although the random variable of the preceding example cannot take on negative values. 77 . F(q) = 1. we must have   q q e−3x t k f (x) dx = k · e−3x dx = k · lim  = =1 −q 0 t→q −3 0 3 and it follows that k = 3. and F(a) F F(b) when a < b. F(−q) = 0. As in the discrete case.173 0. that is.5 F X F 1) = 3e−3x dx = −e−3x  = −e−3 + e−1. For the probability we get  1 1  P(0. we artificially extended the domain of its probability density to include all the real numbers.5 0. If X is a continuous random variable and the value of its probability density at t is f(t). If f (x) and F(x) are the values of the probability density and the distribution function of X at x. and dF(x) f (x) = dx where the derivative exists. Probability Distributions and Probability Densities Solution To satisfy the second condition of Theorem 5. based on Definition 5. DISTRIBUTION FUNCTION. Thus. DEFINITION 5. there are many problems in which it is of interest to know the probability that the value of a continuous random variable X is less than or equal to some real number x. let us make the following definition analogous to Definition 3. THEOREM 6. then P(a F X F b) = F(b) − F(a) for any real constants a and b with a F b. and use it to reevaluate P(0. then the function given by  x F(x) = P(X ≤ x) = f (t)dt for − q < x < q −q is called the distribution function or the cumulative distribution function of X. it does not matter how the probability density is defined at these two points. we can write ⎧ ⎪ ⎪ ⎨0 for x < 0 f (x) = 1 for 0 < x < 1 ⎪ ⎪ ⎩0 for x > 1 To fill the gaps at x = 0 and x = 1. and x > 1. Probability Distributions and Probability Densities Solution For x > 0. we can write  0 for x F 0 F(x) = 1 − e−3x for x > 0 To determine the probability P(0. we differentiate for x < 0. but there are certain advantages for choosing the values in such a way that the probability density is nonzero over an open interval.5 F X F 1) = F(1) − F(0. we can write the probability density of the original random variable as  1 for 0 < x < 1 f (x) = 0 elsewhere Its graph is shown in Figure 7. and 0. EXAMPLE 11 Find a probability density function for the random variable whose distribution func- tion is given by ⎧ ⎪ ⎪ ⎨0 for x F 0 F(x) = x for 0 < x < 1 ⎪ ⎪ ⎩1 for x G 1 and plot its graph.  x  x x  F(x) = f (t)dt = 3e−3t dt = −e−3t  = 1 − e−3x −q 0 0 and since F(x) = 0 for x F 0. 
Solution Since the given density function is differentiable everywhere except at x = 0 and x = 1. 78 .5 F X F 1).5) = (1 − e−3 ) − (1 − e−1. Thus. 0 < x < 1. Actually. Thus. getting P(0.173 This agrees with the result obtained by using the probability density directly in Example 9. we use the first part of Theorem 6. according to the second part of Theorem 6.5 ) = 0. getting 0. we let f (0) and f (1) both equal zero. 1. F (x) 1 3 4 1 2 1 4 x 0 0. 79 . Probability Distributions and Probability Densities f (x) 1 x 0 1 Figure 7. Such a distribution function will be discontinuous at each point having a nonzero probability and continuous elsewhere. As in the dis- crete case.5 1 Figure 9. or they are continuous curves or combinations of lines as in Figure 8. so the corresponding distribution functions have a steplike appearance as in Figure 5. Probability density of Example 11. the height of the step at a point of discontinuity gives the probability that F (x) 1 x 0 1 Figure 8. which shows the graph of the distribution function of Example 11. Distribution function of a mixed random variable. Discontinuous distribution functions like the one shown in Figure 9 arise when random variables are mixed. Distribution function of Example 11. In most applications we encounter random variables that are either discrete or continuous. f. of the random variable X is given by X of Exercise 17 and use it to reevaluate part (b). With reference to Exercise 24. but otherwise the random variable is like a continuous random variable. (b) Find P(3 < X < 5). Find the distribution function of the random variable curve (above the x-axis) is equal to 1. f (y) = 8 ⎪ ⎩0 28. ⎩0 elsewhere 80 .1 < x < 0. and indicate the area 6x(1 − x) for 0 < x < 1 g(x) = associated with the probability that 0. ⎪ ⎪ bilities asked for in that exercise. Find the distribution function of the random variable ⎨x f (x) = 2 − x for 1 F x < 2 Y of Exercise 20 and use it to determine the two proba. given by 27.5) = 34 − 14 = 12 .9 < Y < 3.2). The probability density of the random variable Z is 18.d. P(X = 0. 25.1 < x < 0. find the distribution ⎧ function of X and use it to reevaluate the two probabili- ⎪ ⎨ 1 (y + 1) for 2 < y < 4 ties asked for in that exercise. Find the distribution function of the random variable 22. The p.  (b) Sketch a graph of this function. With reference to Exercise 26. The probability density of the random variable Y is Find P(X < 14 ) and P(X > 12 ). Probability Distributions and Probability Densities the random variable will take on that particular value. (c) Calculate the probability that x > 1. (a) Show that given by  f (x) = e−x for 0 < x < q kze−z for z > 0 2 f (z) = 0 for z F 0 represents a probability density function. Exercises 16. X of Exercise 22 and use it to determine the two proba- bilities asked for in part (b) of that exercise. (a) Show that function of Z and draw its graph.5. find the distribution 19. 20. The density function of the random variable X is f (x) = 3x2 for 0 < x < 1 given by represents a density function. The probability density of the continuous random ⎪ ⎨√ c for 0 < x < 4 variable X is given by f (x) = x ⎪ ⎩0 elsewhere ⎧ ⎪ ⎪ ⎨1 for 2 < x < 7 Find f (x) = 5 ⎪ ⎪ (a) the value of c. In this chapter we shall limit ourselves to random variables that are discrete or continuous with the latter having distribution functions that are differentiable for all but a finite set of values of the random variables. With reference to Figure 9. ⎧ ⎪ ⎪ for 0 < x < 1 21. 
(a) Draw its graph and verify that the total area under the 23. Find the distribution function of the random variable elsewhere X whose probability density is given by Find P(Y < 3. ⎩0 elsewhere (b) P(X < 14 ) and P(X > 1). Find k and draw the graph of this probability density. 24. ⎧ 17.2) and P(2.5. 26. 0 elsewhere (c) Calculate the probability that 0. (b) Sketch a graph of this function and indicate the area associated with the probability that x > 1. The distribution function of the mixed random vari- ⎪ ⎪ 2 ⎪ ⎪ able Z is given by ⎩1 for x G 1 ⎧ ⎪ ⎪ 0 for z < −2 ⎪ ⎪ Find P(− 12 < X < 12 ) and P(2 < X < 3). (b) 0 < x < 0. P(Z = 2). ⎪ ⎪ 0 for x < −1 ⎪ ⎪ P(X = 0.5 F x < 1. P(−2 < Z < 1). sketch the graphs of the distribution function and the probability density of Y. ⎨ F(x) = x + 1 for −1 F x < 1 41. 81 . (b) 0 < x < 0. Also sketch the graphs of these probability density and 40. letting Also sketch the graphs of the probability density and dis. F(x) = 0 for x F 0 31. find the probabil- ⎪ ⎪ 3 ⎨ ity density of Y and use it to recalculate the two f (x) = 1 probabilities. Probability Distributions and Probability Densities ⎧ Also sketch the graphs of the probability density and dis. ⎧ ⎪ x 38. 37. ⎪ ⎪ 1 ⎪ ⎪ for 0 < x < 1 35. find P(0.8 < X < 1. The distribution function of the random variable X is 30. F(y) = y2 ⎪ ⎩0 29.5 < x < 1. f (3) = 0. and P(X > 4). Find the distribution function of the random variable elsewhere X whose probability density is given by ⎧ Find P(Y F 5) and P(Y > 8). ⎪ ⎪ ⎪ ⎪ ⎪ ⎨1 39. ⎨ F(z) = z + 4 for −2 F z < 2 33. find the probability ⎪ ⎪ for 0 < x F 1 ⎪ ⎪ 2 density of X. find the probability ⎪ ⎪ 8 ⎪ ⎪ density of X and use it to recalculate the two proba. The distribution function of the random variable Y is Find P(Z = −2). With reference to Exercise 28.  1 − (1 + x)e−x for x > 0 (b) the distribution function. Use the results of Exercise 39 to find expressions for distribution functions. The distribution function of the random variable X is (a) x < 0. 34. Find the distribution function of the random variable X whose probability density is given by Find P(X F 2).5. ⎪ ⎪ for 2 < x < 4 ⎪3 ⎪ ⎪ ⎪ 36. ⎪ ⎨1 − 9 for y > 3 tribution functions. find expressions for the for 1 < x F 2 f (x) = 2 values of the distribution function of the mixed random ⎪ ⎪ ⎪3 − x ⎪ variable X for ⎪ ⎪ for 2 < x < 3 ⎪ ⎪ 2 (a) x F 0. (d) x G 1. given by ⎧ (c) 0. P(1 < X < 3).2) given by using (a) the probability density.5) = 1 2. (d) x > 1. With reference to Exercise 34. and f (0) and f (1) are undefined. the values of the probability density of the mixed random variable X for 32.5. and P(0 F given by Z F 2). With reference to Figure 9. tribution functions. With reference to Exercise 32. ⎩1 for z G 2 bilities. ⎪ ⎪ ⎩0 elsewhere (c) 0. With reference to Exercise 37. With reference to Exercise 34 and the result of ⎩0 elsewhere Exercise 35. find the probabilities associated with all possible pairs of values of X and Y. and so forth. their teachers in the number of days they have been absent. The number of ways in which this can be done is 3 2 4 1 0 1 = 12.’s. where we dealt with one random variable and could display the probabilities associated with all values of X by means of a table. With reference to the sample space of Figure 1. Similarly. in the bivariate case. In this section we shall be concerned first with the bivariate case. covering any finite number of random variables. respectively. the random variable whose values are the dif- ferences between the numbers rolled with the red die and the green die. 
then the probability of event A is P(A) = n/N) 36 = 3 . we considered only the random variable whose values were the totals rolled with a pair of dice. hence. Since those possibilities are all equally likely by virtue of the assumption that the selection is random. (1. the random variable whose values are 0. the school nurse in their weights. we shall extend this discussion to the mul- tivariate case. (1. and if n of these out- comes together constitute event A. 0). If X and Y are. that is. Closer to life. for example. the probability associ- that the probability associated with (1. Thus. To find the prob- ability associated with (1. 0). but we could also have considered the random variable whose values are the products of the numbers rolled with the two dice. 0) is 12 1 ated with (1. (0. we write the probability that X will take on the value x and Y will take on the value y as P(X = x. or 2 depending on the number of dice that come up 2. display the probabilities associated with all pairs of values of X and Y by means of a table. none of the 2 sedative caplets. an experiment may consist of randomly choosing some of the 345 students attending an elementary school. Y = y) is the probability of the intersection of the events X = x and Y = y.    one   of the 4 laxative caplets. Solution The possible pairs are (0. 1) is     3 2 4 1 1 0 6 1 = = 36 36 6 82 . and. for example.Q. and 4 laxative caplets. 1). it follows from a theorem (If an experiment can result in any one of N different equally likely outcomes. 2 sedative. and so forth. Probability Distributions and Probability Densities 5 Multivariate Distributions In the beginning of this chapter we defined a random variable as a real-valued func- tion defined over a sample space with a probability measure. 2). and the total number of ways in which 2 of the 9 caplets can be   9 selected is 2 = 36. and the principal may be inter- ested in their I. Y = y). Later. If X and Y are discrete random variables. we can now. 1. observe that we are concerned with the event of getting one of the 3 aspirin caplets. As in the univariate case. the numbers of aspirin and sedative caplets included among the 2 caplets drawn from the bottle. 0). with situations where we are interested at the same time in a pair of random variables defined over a joint sample space. EXAMPLE 12 Two caplets are selected at random from a bottle containing 3 aspirin. 0). (0. P(X = x. and (2. and it stands to reason that many different random variables can be defined over one and the same sample space. 1). y) = P(X = x. 2. EXAMPLE 13 Determine the value of k for which the function given by f (x. JOINT PROBABILITY DISTRIBUTION. In other words. 3. y). f (x. Analogous to Theorem 1. y) within the range of X and Y is called the joint probability distribution of X and Y. Y = y) for each pair of values (x.  2. where the double summation extends over all x y possible pairs (x. continuing this way. 1. satisfy the conditions 1. 2. which follows from the postulates of probability. 1. it is generally preferable to represent proba- bilities such as these by means of a formula. the function given by f(x. Y = y) for any pair of values (x. y) = P(X = x. 2. f (x. y) = kxy for x = 1. y) within its domain. as in the univariate case. we obtain the values shown in the following table: x 0 1 2 1 1 1 0 6 3 12 2 1 y 1 9 6 1 2 36 Actually. 83 . A bivariate function can serve as the joint probability distri- bution of a pair of discrete random variables X and Y if and only if its values. 
for the two random variables of Example 12 we can write     3 2 4 x y 2−x−y for x = 0. y) G 0 for each pair of values (x. f (x. y = 0. it is preferable to express the probabilities by means of a function with the values f (x. y) within the range of the random variables X and Y. f (x. 2. For instance. y) within its domain. let us state the following theorem. y = 1. y) = 1. 3 can serve as a joint probability distribution. y) =   9 0 F x+y F 2 2 DEFINITION 6. If X and Y are discrete random variables. THEOREM 7. Probability Distributions and Probability Densities and. we get f (1. find F(1. k + 2k + 3k + 2k + 4k + 6k + 3k + 6k + 9k = 1 so that 36k = 1 and k = 1 36 . 84 . Y F 1) = f (0. As in the univariate case. f (2. t) is the value of the joint probability distribution of X and Y at (s. 1) = k. Let us now extend the various concepts introduced in this section to the contin- uous case. 2) = 2k. Solution F(1. 0) + f (0. f (1. the joint distribution function of two random variables is defined for all real numbers. there are many problems in which it is of interest to know the probability that the values of two random variables are less than or equal to some real numbers x and y.7. 3) = 9k.5) = P(X F 3. t) for − q < x < q s≤x t≤y −q < y < q where f(s. t). f (3. 1) = P(X F −2. EXAMPLE 14 With reference to Example 12. f (2. 1) = 2k. the function given by  F(x. 3) = 6k.5) = 1. 2) = 6k. 1). or the joint cumulative distribution of X and Y. Probability Distributions and Probability Densities Solution Substituting the various values of x and y. In Exercise 48 the reader will be asked to prove properties of joint distribution func- tions that are analogous to those of Theorem 2. the constant k must be nonnegative. Y F 4. for Example 12 we also get F(−2. For instance. To satisfy the first condition of Theorem 7. Y ≤ y) = f (s. f (1.7. 3) = 3k. f (2. 1) = 3k. 4. 1) + f (1. f (3. If X and Y are discrete random vari- ables. Y F 1) = 0 and F(3. JOINT DISTRIBUTION FUNCTION. 2) = 4k. y) = P(X ≤ x. DEFINITION 7. and to satisfy the second condition. 1) 1 2 1 1 = + + + 6 9 3 6 8 = 9 As in the univariate case. and f (3. 1) = P(X F 1. 0) + f (1. is called the joint distribution function. Y) ∈ A = f (x. y) G 0 for −q < x < q. −q < y < q. Y) ∈ A]. f (x. 1 < y < 2}. 85 . y) dx dy = 1. −q −q EXAMPLE 15 Given the joint probability density function ⎧ ⎪ ⎪ 3 ⎨ x(y + x) for 0 < x < 1. A bivariate function with values f(x. Probability Distributions and Probability Densities DEFINITION 8. f (x. f (x. Y) ∈ A] = P 0 < X < . y) defined over the xy-plane is called a joint probability density function of the continuous random variables X and Y if and only if  P(X. Solution   1 P[(X. y). it follows from the postulates of probability that THEOREM 8. Analogous to Theorem 5. find P[(X. JOINT PROBABILITY DENSITY FUNCTION. y)|0 < x < 12 .  q q 2. A bivariate function can serve as a joint probability density function of a pair of continuous random variables X and Y if its values. 1 < Y < 2 2  2 1 2 3 = x(y + x) dx dy 1 0 5  2 2 x= 1 3x y 3x3  2 = +  dy 1 10 15   2   3y 1 3y2 y 2 = + dy = + 1 40 40 80 40 1 11 = 80 Analogous to Definition 7. we have the following definition of the joint distribu- tion function of two continuous random variables. y) = 5 ⎪ ⎪ ⎩0 elsewhere of two random variables X and Y. 0 < y < 2 f (x. y)dxdy A for any region A in the xy-plane. where A is the region {(x. satisfy the conditions 1. hold also for the continuous case. Note that the properties of joint distribution functions. Also as in Section 4. 
which the reader will be asked to prove in Exercise 48 for the discrete case. y) = 0. Y ≤ y) = f (s. dF(x) Analogous to the relationship f (x) = of Theorem 6. we get  y 1 1 F(x. y) = (s + t) ds dt = xy(x + y) 0 0 2 for x > 1 and 0 < y < 1 (Region II of Figure 10). Probability Distributions and Probability Densities DEFINITION 9. it follows immediately that F(x. −q −q −q<y<q where f(s. y) = P(X ≤ x. y) = 0 elsewhere find the joint distribution function of these two random variables. t) ds dt for − q < x < q. t). EXAMPLE 16 If the joint probability density of X and Y is given by  x+y for 0 < x < 1. the function given by  y  x F(x. As in Section 4. we generally let the values of joint probability densities equal zero wherever they are not defined by the above relationship. If X and Y are continuous random variables. As in Section 4. is called the joint distribution function of X and Y. Solution If either x < 0 or y < 0. partial differentia- dx tion in Definition 9 leads to ⭸2 f (x. For 0 < x < 1 and 0 < y < 1 (Region I of Figure 10). 0 < y < 1 f (x. y) where the joint density is con- tinuous. we get  y x 1 F(x. t) is the joint probability density of X and Y at (s. y) = (s + t) ds dt = y(y + 1) 0 0 2 86 . y) = F(x. we shall limit our discussion here to random variables whose joint distribution function is continuous everywhere and partially differentiable with respect to each variable for all but a finite set of values of the two random variables. y) ⭸x⭸y wherever these partial derivatives exist. JOINT DISTRIBUTION FUNCTION. the joint distribution func- tion of two continuous random variables determines their joint density (short for joint probability density function) at all points (x. y) = (s + t) ds dt = x(x + 1) 0 0 2 and for x > 1 and y > 1 (Region IV of Figure 10). Probability Distributions and Probability Densities Figure 10. 0 < y < 1 ⎪ ⎪ ⎪ ⎪ 2 ⎪ ⎨1 F(x. y G 1 EXAMPLE 17 Find the joint probability density of the two random variables X and Y whose joint distribution function is given by  (1 − e−x )(1 − e−y ) for x > 0 and y > 0 F(x. we get  1 x 1 F(x. 0 < y < 1 ⎪ ⎪ 2 ⎪ ⎪ ⎪ ⎪ 1 ⎪ ⎪ x(x + 1) for 0 < x < 1. Diagram for Example 16. y) = 0 elsewhere Also use the joint probability density to determine P(1 < X < 3. we get  1 1 F(x. y G 1 ⎪ ⎪ ⎪ ⎪ 2 ⎪ ⎪ ⎪ ⎩1 for x G 1. y) = (s + t) ds dt = 1 0 0 Since the joint distribution function is everywhere continuous. the boundaries between any two of these regions can be included in either one. y) = y(y + 1) for x G 1. and we can write ⎧ ⎪ ⎪ for x F 0 or y F 0 ⎪ ⎪ 0 ⎪ ⎪ ⎪ ⎪1 ⎪ ⎪ xy(x + y) for 0 < x < 1. for 0 < x < 1 and y > 1 (Region III of Figure 10). 1 < Y < 2). 87 . . xn ) = P(X1 = x1 . All the definitions of this section can be generalized to the multivariate case. y) = 0 elsewhere Thus. . y) e 1 2 3 y 1 2 3 x Figure 11. and the probability that we calculated in the preceding example is given by the volume under this surface. X2 . Diagram for Example 17. y) (x y) f (x. X2 = x2 . . . Corresponding to Definition 6. Probability Distributions and Probability Densities Solution Since partial differentiation yields ⭸2 F(x.074 for P(1 < X < 3. . we find that the joint probability density of X and Y is given by  e−(x+y) for x > 0 and y > 0 f (x. . integration yields  2 3 e−(x+y) dx dy = (e−1 − e−3 )(e−1 − e−2 ) 1 1 = e−2 − e−3 − e−4 + e−5 = 0. 1 < Y < 2). . the values of the joint probability distribution of n discrete random variables X1 . y) = e−(x+y) ⭸x⭸y for x > 0 and y > 0 and 0 elsewhere. 88 . . geometrically speaking. a sur- face. 
the joint probability is. x2 . . where there are n random variables. as shown in Figure 11. and Xn are given by f (x1 . . . Xn = xn ) f (x.. For two random variables. f (t1 . where A is the region   1 1 (x1 . xn ) = F(x1 . . . EXAMPLE 19 If the trivariate probability density of X1 . . . . x2 . . dtn −q −q −q for −q < x1 < q. 1. . x2 . −q < xn < q. X2 . x3 > 0 f (x1 . X2 F x2 . and Z is given by (x + y)z f (x. . y = 1. t2 . . 2 63 find P(X = 2. Also. xn ) ⭸x1 ⭸x2 · · · ⭸xn wherever these partial derivatives exist. . < x2 < 1. X3 ) ∈ A]. 2) + f (2. probabilities are again obtained by integrating the joint probability density. . and corre- sponding to Definition 7. −q < xn < q. .. the values of their joint distribution function are given by F(x1 . . . −q < x2 < q. x2 . . . Y + Z F 3) = f (2. . Solution P(X = 2. . . 3. . . 1. x3 )|0 < x1 < . x2 . . z = 1. 2. Y + Z F 3). . Y. . Xn F xn ) for −q < x1 < q. 2. xn ) within the range of the random variables. . xn ) = . −q < x2 < q. . xn ) = P(X1 F x1 . 1) + f (2. . tn ) dt1 dt2 . y. . . . . . X2 . 1) 3 6 4 = + + 63 63 63 13 = 63 In the continuous case. Probability Distributions and Probability Densities for each n-tuple (x1 . x2 .. x2 . . 2. z) = for x = 1. and X3 is given by  (x1 + x2 )e−x3 for 0 < x1 < 1. partial differentiation yields ⭸n f (x1 . . x3 ) = 0 elsewhere find P[(X1 . analogous to Definition 9. EXAMPLE 18 If the joint probability distribution of three discrete random variables X. . . x2 . 0 < x2 < 1. . x3 < 1 2 2 89 . . and the joint distribution function is given by  xn  x2  x1 F(x1 . . (d) F(4. (d) P(X > Y).9). (3. x + y < 1 ned for c.2. 2 0 1 2 1 1 1 can serve as the joint probability distribution of two ran- 0 12 6 24 dom variables. 3. y) = 24xy for 0 < x < 1. . X3 ) ∈ A] = P 0 < X1 < .  45. 1. 0. (b) F(q.5). (b) P(X = 0. Determine k so that  (c) F(2. 49. 1. 43. If the joint probability density of X and Y is given by find the value of c. 3 50. 3. ues of the joint distribution function of the two ran. 2. y). 2. y) = ky(2y − x) for x = 0. (c) if a < b and c < d. (c) P(X + Y > 2). find 0 elsewhere (a) P(X F 1.7). (c) P(X + Y F 1). X2 . With reference to Exercise 42. 1. d). show that (a) F(−q. Y > 2). y) = (x + y) for x = 0. y) = c(x2 + y2 ) for x = −1. Show that there is no value of k for which and Y are as shown in the table x f (x. find 48. f (x. 0). If the values of the joint probability distribution of X 46. If F(x. y = 0. (0. y) is the value of the joint distribution func- (a) P(X = 1. y = −1. f (x. 1). find P(X + Y < 12 ). . −x < y < x 44. 1 F Y < 3). y = 0. then F(a. tion of two discrete random variables X and Y at (x. 1. c) F F(b. Y = 2). 1 1 1 1 47. −q) = 0. y) = kx(x − y) for 0 < x < 1. 2). If the joint probability distribution of X and Y is 0 elsewhere given by can serve as a joint probability density. 0 < y < 1. find the following val. 1. f (x. q) = 1. < X2 < 1. 2 8 20 30 1 construct a table showing the values of the joint distribu- 3 120 tion function of the two random variables at the 12 points (0. 0. With reference to Exercise 44 and the value obtai. Probability Distributions and Probability Densities Solution   1 1 P[(X1 .158 4 Exercises 42. . dom variables: (a) F(1. (b) F(−3. 3. X3 < 1 2 2  1 1 1 2 = (x1 + x2 )e−x3 dx1 dx2 dx3 1 0 2 0  1 1 1 x2  = + e−x3 dx2 dx3 0 1 2 8 2  1 1 −x3 = e dx3 0 4 1 = (1 − e−1 ) = 0. 90 . 2. If the joint probability distribution of X and Y is 4 4 40 given by y 1 1 1 2 f (x. Y F 2). (b) P(X = 0. 0). Y. 91 . given by 58. 
Find the joint probability density of the two random ⎪ ⎪ ⎪ x1 x2 (x1 + x2 )(1 − e−x3 ) for 0 < x1 < 1. Use the formula obtained in Exercise 58 to verify the result of Exercise 57. 2). X2 . f (x. Use the formula obtained in Exercise 58 to verify the (a) P(X = 12 . z) = 0 < z < 1. and use it to verify the 64. y. 2. With reference to Exercise 62. Observe that the result holds ⎪ ⎪ ⎪ ⎪ also for discrete random variables. and 0 elsewhere X3 of Example 19 is given by 55. If the joint probability density of X and Y is given by 61. ues of the joint distribution function of the three ran- dom variables: 53. and Z elsewhere is given by find the probability that the sum of the values of X and Y ⎧ ⎪ ⎪ will exceed 12 . f (x. z) = kxyz find (a) P(X F 12 . ⎨3 (x. 2.  (1 − e−x )(1 − e−y ) for x > 0. and F(b. y) = 0 elsewhere Z is given by f (x. Use the formula obtained in Exercise 58 to verify the result. c). Y F 12 ). for 0 < x < y. c < Y F d) in terms of F(a. ⎩0 elsewhere 59. Use the result of Example 16 to verify that the joint F(x. 0 < y < 1 f (x. x2 G 1. y) = y ⎪ ⎩0 65. y). x3 > 0 ⎪ 1 1 1 − e−x − e−y + e−x−y for x > 0. find P(X + Y < 12 ). Probability Distributions and Probability Densities 51.074. If F(x. express P(a < X F b. Find k if the joint probability distribution of X. y > 0 2 2 67. Y < 12 . find 60. and Z is cise 56 to find P(X + Y > 3). y = 1. (b) P(X + Y > 23 ). and x + y < 1. ⎧ ⎪ ⎪ 1 tion of the two continuous random variables X and Y at ⎪ (2x + 3y + z) ⎪ for 0 < x < 1. Use the joint probability density obtained in Exer. x3 > 0 ⎪ ⎪ 2 ⎪ ⎪ ⎪ ⎪  ⎪ 1 x (x + 1)(1 − e−x3 ) ⎪ ⎪ for 0 < x1 < 1. find (c) P(X > 2Y). Use the joint probability density obtained in Exer. y. Y = 12 . or x3 F 0 ⎪ ⎪ ⎪1 ⎪ 56. and f (x.  2 for x > 0. 1 < Y F 2). 68. y) is the value of the joint distribution func. ⎧ ⎪ cise 54 to find P(1 < X F 2. Z < 12 ). y > 0. With reference to Exercise 51. 0. Find k if the joint probability density of X. y) = ⎪ ⎩1 − e−x3 for x1 G 1. for x = 1. 0 < y < 1. 0 < y < 1. 0 < x2 < 1. Z = 1). (b) P(X < 12 . 1). F(b. With reference to Exercise 62. x + y < 1 62. of Example 17. the values of the joint distribution function of X and Y when x > 0. Z = 12 ). x2 . y > 0 ⎪ ⎪2 ⎪ ⎪ F(x. (a) P(X = 1. z = 1. x + y + z < 1 54. ⎨kxy(1 − z) for 0 < x < 1. 3. z) = 0<z<1 F(a. 2. y. find the following val- result of part (a). x3 > 0 ⎪2 ⎪ variables X and Y whose joint distribution function is ⎪ ⎨1 given by F(x1 . 63. ⎧ (b) F(1. d). c). y) = distribution function of the random variables X1 . ⎪ ⎨1 (c) F(4. 4. d). If the joint probability density of X. If the joint probability density of X and Y is given by (a) F(2. x3 > 0 0 elsewhere 57. Y F 2. Find the joint probability density of the two random ⎪ ⎪ ⎩0 elsewhere variables X and Y whose joint distribution function is given by 66. Y. 4). 52. result of Exercise 55. ⎪ ⎪ ⎪ ⎪ 0 for x1 F 0. 0 < x2 < 1. Y. 0. x2 F 0. y > 0. With reference to Exercise 65. find an expression for (b) P(X = 2. Y + Z = 4). 1. x3 ) = x2 (x2 + 1)(1 − e−x3 ) for x1 G 1. x2 G 1. 1. two seda- tive. We are thus led to the following definition. By the same token. 1. the function given by  g(x) = f (x. y) y 92 . together with the marginal totals. MARGINAL DISTRIBUTION. 2 x=0 of the probability distribution of Y. the number of aspirin caplets and the number of sedative caplets included among two caplets drawn at random from a bottle containing three aspirin. In other words. and 2. Solution The results of Example 12 are shown in the following table. that is. DEFINITION 10. 
they are the values  2 g(x) = f (x. 2 y=0 of the probability distribution of X. 1. the totals of the respective rows and columns: x 0 1 2 1 1 1 7 0 6 3 12 12 2 1 7 y 1 9 6 18 1 1 2 36 36 5 1 1 12 2 12 The column totals are the probabilities that X will take on the values 0. If X and Y are discrete random variables and f(x. y) is the value of their joint probability distribution at (x. let us consider the following example. and four laxative caplets. Find the probability distribution of X alone and that of Y alone. y). the row totals are the values  2 h(y) = f (x. EXAMPLE 20 In Example 12 we derived the joint probability distribution of two random variables X and Y. y) for x = 0. y) for y = 0. Probability Distributions and Probability Densities 6 Marginal Distributions To introduce the concept of a marginal distribution. we get  q  1 2 2 g(x) = f (x. and we obtain the following definition. Corre- spondingly. If X and Y are continuous random variables and f(x. y) x for each y within the range of Y is called the marginal distribution of Y. y). Correspondingly. Probability Distributions and Probability Densities for each x within the range of X is called the marginal distribution of X. y) dy for − q < x < q −q is called the marginal density of X. we can speak not only of the marginal distributions of the individual random variables. Likewise. y) is the value of their joint probability density at (x. y) = 3 ⎪ ⎪ ⎩0 elsewhere find the marginal densities of X and Y. the function given by  h(y) = f (x. EXAMPLE 21 Given the joint probability density ⎧ ⎪ ⎪ 2 ⎨ (x + 2y) for 0 < x < 1. 0 < y < 1 f (x. When X and Y are continuous random variables. the function given by  q h(y) = f (x. DEFINITION 11. the probability distributions are replaced by probability densities. the function given by  q g(x) = f (x. but also of the 93 . the summations are replaced by integrals. When we are dealing with more than two random variables. MARGINAL DENSITY. y) dy = (x + 2y) dy = (x + 1) −q 0 3 3 for 0 < x < 1 and g(x) = 0 elsewhere. y) dx for − q < y < q −q is called the marginal density of Y. Solution Performing the necessary integrations. y) dx = (x + 2y) dx = (1 + 4y) −q 0 3 3 for 0 < y < 1 and h(y) = 0 elsewhere.  q  1 2 1 h(y) = f (x. xn ) dx1 dx3 · · · dxn −q −q for −q < x2 < q. x2 . we find that the marginal density of X1 alone is given by  q 1  q g(x1 ) = f (x1 . x3 ) dx2 dx3 = m(x1 . . . x2 . . xn ) = ··· f (x1 . the joint marginal density of X1 and Xn is given by  q  q ϕ(x1 . . . . and Xn has the values f (x1 . . . . EXAMPLE 22 Considering again the trivariate probability density of Example 19. x3 ) = 0 elsewhere find the joint marginal density of X1 and X3 and the marginal density of X1 alone. . . x2 . Solution Performing the necessary integration. xn ) x2 xn for all values within the range of X1 . x3 ) = ··· f (x1 . x2 . 94 . x3 > 0 f (x1 . . Using this result. Probability Distributions and Probability Densities joint marginal distributions of several of the random variables. . probability distributions are replaced by probability densities. we find that the joint marginal density of X1 and X3 is given by  1   1 −x3 m(x1 .  (x1 + x2 )e−x3 for 0 < x1 < 1. x2 . the joint marginal distribution of X1 . x2 . summations are replaced by integrals. xn ) x4 xn for all values within the range of X1 . and X3 . . . . xn ) dx2 dx3 · · · dxn−1 −q −q for −q < x1 < q and −q < xn < q. and if the joint probability density of the continuous random variables X1 . x2 . X2 . . x2 .. . . x2 . and Xn has the values f (x1 . 
x3 ) = (x1 + x2 )e−x3 dx2 = x1 + e 0 2 for 0 < x1 < 1 and x3 > 0 and m(x1 . xn ). . . X2 . . 0 < x2 < 1. x3 ) = 0 elsewhere. and X3 is given by   m(x1 . x3 ) dx3 0 0 0  q  1 −x3 1 = x1 + e dx3 = x1 + 0 2 2 for 0 < x1 < 1 and g(x1 ) = 0 elsewhere. . the marginal density of X2 alone is given by  q  q h(x2 ) = ··· f (x1 . . and other marginal distributions can be defined in the same way. . . For the continuous case. xn ).. the marginal distribution of X1 alone is given by   g(x1 ) = ··· f (x1 . and so forth. . . X2 . If the joint probabil- ity distribution of the discrete random variables X1 . . . X2 . we get 2 4 f (0|1) = 9 = 7 7 18 95 . given event B. Denoting the conditional probability by f (x|y) to indicate that x is a variable and y is fixed. where f (x. y) is the value of the joint probability distribution of X and Y at (x. Y = y) P(X = x|Y = y) = P(Y = y) f (x. y). y) w(y|x) = g(x) Z 0 g(x) for each y within the range of Y is called the conditional distribution of Y given X = x. Correspondingly. 79. y) is the value of the joint prob- ability distribution of the discrete random variables X and Y at (x. DEFINITION 12. Some problems relating to such distribution func- tions will be left to the reader in Exercises 72. y) and h(y) is the value of the marginal distribution of Y at y. CONDITIONAL DISTRIBUTION. y) = h(y) provided P(Y = y) = h(y) Z 0. as P(A ∩ B) P(A|B) = P(B) provided P(B) Z 0. EXAMPLE 23 With reference to Examples 12 and 20. let us now make the following definition. 7 Conditional Distributions In the conditional probability of event A. if g(x) is the value of the marginal distribution of X at x. Probability Distributions and Probability Densities Corresponding to the various marginal and joint marginal distributions and den- sities we have introduced in this section. If f(x. the function given by f (x. we can also define marginal and joint marginal distribution functions. Suppose now that A and B are the events X = x and Y = y so that we can write P(X = x. and 80. y) f (x|y) = h(y) Z 0 h(y) for each x within the range of X is called the conditional distribution of X given Y = y. the function given by f (x. Solution Substituting the appropriate values from the table in Example 20. find the conditional distribution of X given Y = 1. and h(y) is the value of the marginal distribution of Y at y. y) f (x|y) = h(y) Z 0 h(y) for −q < x < q. is called the conditional density of X given Y = y. and use it to evaluate P(X F 12 |Y = 12 ). y) (x + 2y) f (x|y) = = 3 h(y) 1 (1 + 4y) 3 2x + 4y = 1 + 4y for 0 < x < 1 and f (x|y) = 0 elsewhere. and we obtain the following definition. y) and h(y) is the value of the marginal distribution of Y at y. find the conditional density of X given Y = y. if g(x) is the value of the marginal density of X at x. is called the conditional density of Y given X = x.    1 1 2x + 4 · f x  = 2 2 1 1+4· 2 2x + 2 = 3 96 . Solution Using the results obtained on the previous page. Probability Distributions and Probability Densities 1 3 f (1|1) = 6 = 7 7 18 0 f (2|1) = =0 7 18 When X and Y are continuous random variables. CONDITIONAL DENSITY. the function given by f (x. Correspond- ingly. EXAMPLE 24 With reference to Example 21. Now. DEFINITION 13. we have 2 f (x. If f(x. the probability distributions are replaced by probability densities. the function given by f (x. y) is the value of the joint density of the continuous random variables X and Y at (x. y) w(y|x) = g(x) Z 0 g(x) for −q < y < q. 
Probability Distributions and Probability Densities and we can write     1 1  1 2 2x + 2 5 P X F Y = = dx = 2 2 0 3 12 It is of interest to note that in Figure 12 this probability is given by the ratio of the area of trapezoid ABCD to the area of trapezoid AEFD. and g(x) = 0 elsewhere. Figure 12. 0 < y < 1 f (x. also  q  1 h(y) = f (x. Solution Performing the necessary integrations. EXAMPLE 25 Given the joint probability density  4xy for 0 < x < 1. y) = 0 elsewhere find the marginal densities of X and Y and the conditional density of X given Y = y. we get  q  1 g(x) = f (x. y) dy = 4xy dy −q 0 y=1  2 = 2xy  = 2x  y=0 for 0 < x < 1. y) dx = 4xy dx −q 0 x=1  2  = 2x y = 2y  x=0 97 . Diagram for Example 24. we can consider various different kinds of conditional distributions or densities. x2 . x2 . where g(x1 . . x4 ). x3 . x4 ) given X1 = x1 and X3 = x3 . xn ) is the value of the joint probability distribution of the discrete random variables X1 . x3 . x4 |x1 ) = b(x1 ) Z 0 b(x1 ) for the value of the joint conditional distribution of X2 . or f (x1 . If f(x1 . and hence the formulas of Definitions 12 and 13 yield f (x. xn ) and fi (xi ) is the value of the marginal distribution of Xi at xi for i = 1. Generalizing from this observa- tion. x2 . . . x2 . and X4 = x4 . we get f (x. When we are dealing with two or more random variables. . x4 ) r(x2 . . . y) = f (x|y) · h(y) = g(x) · h(y) That is. if f (x1 . it can easily be verified that the three ran- dom variables of Example 22 are not independent. x3 ) Z 0 m(x1 . x4 ) Z 0 g(x1 . x4 ) q(x2 . x4 |x1 . . X3 . . x4 ). whether continuous or discrete.” With this definition of independence. n. Xn at (x1 . x4 ) = g(x1 . x3 . but this is clearly not the case in Example 24. x4 ) is the value of the joint marginal distribution of X1 . Whenever the values of the conditional distribution of X 1 + 4y given Y = y do not depend on y. X2 . x2 . we can write f (x1 . x3 . and f (x|y) = 0 elsewhere. x2 . y) 4xy f (x|y) = = = 2x h(y) 2y for 0 < x < 1. it follows that f (x|y) = g(x). questions of indepen- dence are usually of great importance. x2 . x2 . and X4 at (x1 . then the n random variables are independent if and only if f (x1 . xn ) = f1 (x1 ) · f2 (x2 ) · . . x3 ) for the value of the joint conditional distribution of X2 and X4 at (x2 . Probability Distributions and Probability Densities for 0 < y < 1. . . . . X2 = x2 . xn ) within their range. When we are dealing with more than two random variables. . x3 . . . x2 . and h(y) = 0 elsewhere. x4 ) p(x3 |x1 . x4 ) given X1 = x1 . . For instance. Then. . DEFINITION 14. . we simply substitute the word “density” for the word “distribution. x2 . . INDEPENDENCE OF DISCRETE RANDOM VARIABLES. and X4 at (x2 . We can also write f (x1 . x4 ) is the value of the joint distribution of the discrete random variables X1 . substituting into the formula for a con- ditional density. To give a corresponding definition for continuous random variables. x2 . X2 . . let us now make the following definition. x3 . · fn (xn ) for all (x1 . x2 . x3 ) = m(x1 . . but that the two random variables 98 . In Example 25 we see that f (x|y) = 2x does not depend on the given value Y = y. . . the values of the joint distribution are given by the products of the corre- sponding values of the two marginal distributions. 2x + 4y where f (x|y) = . X2 . X3 . . and X4 at (x1 . x3 . x2 . . 2. x4 ) for the value of the conditional distribution of X3 at x3 given X1 = x1 . x2 . Solution According to Definition 14. 
their joint probability distribution is given by f (x1 . . x2 . x3 ) = f1 (x1 ) · f2 (x2 ) · f3 (x3 ) = e−x1 · 2e−2x2 · 3e−3x3 = 6e−x1 −2x2 −3x3 99 .. EXAMPLE 27 Given the independent random variables X1 . . . n. EXAMPLE 26 Considering n independent flips of a balanced coin. Solution Since each of the random variables Xi . . and use it to evaluate the probability P(X1 + X2 F 1. . 1 2 and the n random variables are independent. Probability Distributions and Probability Densities X1 and X3 and also the two random variables X2 and X3 are pairwise independent (see Exercise 81). X2 . . Find the joint probability distribu- tion of these n random variables. . . . let Xi be the number of heads (0 or 1) obtained in the ith flip for i = 1. for i = 1. . . n. .· = 2 2 2 2 where xi = 0 or 1 for i = 1. . x2 . 2. n.. X3 > 1). 2. has the probability distri- bution 1 fi (xi ) = for xi = 0. and X3 with the probability densities  e−x1 for x1 > 0 f1 (x1 ) = 0 elsewhere  2e−2x2 for x2 > 0 f2 (x2 ) = 0 elsewhere  −3x3 for x > 0 f3 (x3 ) = 3e 3 0 elsewhere find their joint probability density. . 2. . . . · fn (xn )  n 1 1 1 1 = · ·. xn ) = f1 (x1 ) · f2 (x2 ) · . the values of the joint probability density are given by f (x1 . The following examples serve to illustrate the use of Definition 14 in finding probabilities relating to several independent random variables. . Given the joint probability distribution xyz find f (x. Check whether X and Y are independent if their joint of X and Y shown in the table probability distribution is given by x (a) f (x. z) = for x = 1. y) = 4 1 0 ⎪ ⎪ 8 ⎩0 elsewhere find find (a) the marginal distribution of X. x2 > 0. y) = 24y(1 − x − y) for x > 0. y = 1. and 1 1 x = 1 and y = 1. (a) the marginal distribution of X. With reference to Example 22. If the joint probability density of X and Y is given by 1 ⎧ y 0 0 ⎪ ⎪ 4 ⎨ 1 (2x + y) for 0 < x < 1. find (a) the marginal density of X. (b) f (x. that is. With reference to Exercise 53. 2 (a) the marginal density of X. (c) the conditional distribution of X given Y = −1. x = −1 and −1 1 y = 1. X3 = 2. find (a) the marginal density of Y. (b) the conditional density of Y given X = 14 . 76. and x = 1 and y = 1. y > 0. With reference to Exercise 74. 75. x3 > 0. y) = 14 for x = −1 and y = −1. 2. 0 < y < 2 1 f (x. Thus. (b) the marginal distribution of Y. find (a) the marginal distribution function of X. (b) the marginal density of Y. that is. f (x.020 Exercises 69. X3 > 1) = 6e−x1 −2x2 −3x3 dx1 dx2 dx3 1 0 0 −1 = (1 − 2e + e−2 )e−3 = 0. (b) the marginal distribution of Y. Given the values of the joint probability distribution 73. 77. −1 8 2 74. x3 ) = 0 elsewhere. (b) the conditional density of X given Y = 1. (e) the joint conditional distribution of Y and Z given Also determine whether the two random variables are X = 3.  q  1  1−x2 P(X1 + X2 F 1. for −q < x < q. z = 1. 100 . 108 (b) the marginal density of Y. If the joint probability density of X and Y is given by (c) the conditional distribution of X given Y = 1. (c) the marginal distribution of X. find 70. 3. x = 0 and y = 1. y. 72. x2 . y) = 13 for x = 0 and y = 0. the function given by F(x|1) = P(X F x|Y = 1) X1 = 12 . and f (x1 . (d) the conditional distribution of Z given X = 1 and Y = 2. 3. independent. (b) the joint marginal distribution of X and Z. With reference to Example 20. x = 1 and y = −1. (a) the marginal density of X. Probability Distributions and Probability Densities for x1 > 0. x + y < 1 0 elsewhere 71. With reference to Exercise 42. independent. 
(b) the conditional distribution function of X given Y = (b) the joint conditional density of X2 and X3 given 1.  (d) the conditional distribution of Y given X = 0. the (a) the conditional density of X2 given X1 = 13 and function given by G(x) = P(X F x) for −q < x < q. find Also determine whether the two random variables are (a) the joint marginal distribution of X and Y. find 78. 2. G(x) = F(x.0 3. x3 ).6 3.) In this section.4 4. q.4 3. the limitations of measuring instruments and roundoff produce discrete values. however.1 4. If F(x1 .7 4.7 4. show that the marginal distribution (b) the marginal distribution function of X1 . or service which gave rise to them.1 3. these distribu- tions become probability density functions.7 4. If the independent random variables X and Y have 80. q.1 4. and how. show that the ⎧ joint marginal distribution function of X1 and X3 is ⎪ ⎪ ⎨1 for 0 < x < 2 given by f (x) = 2 ⎪ ⎪ ⎩0 elsewhere M(x1 . illustrate this point: Integrated Circuit Response Times (ps) 4. To construct such a display.1 3. X2 . The following data. x2 . With reference to Example 22.4 3. giving the response times of 30 integrated circuits (in picoseconds). When confronted with raw data.7 4. use these results to find (b) the value of P(X 2 + Y 2 > 1).3 3.6 4. x2 . even this information would be difficult to elicit. an important element of what is called data analysis.6 4. Probability Distributions and Probability Densities 79.3 4. and X3 at (x1 . In practice. function of X is given by 81. verify that the three random variables X1 .0 3. 8 The Theory in Practice This chapter has been about how probabilities can group themselves into probability distributions. and X3 are not independent. we obtain the following stem-and-leaf display: 101 .4 4. product. −q < x3 < q ⎧ ⎪ ⎪ ⎨1 for 0 < y < 3 and that the marginal distribution function of X1 is π(y) = 3 ⎪ ⎪ given by ⎩0 elsewhere G(x1 ) = F(x1 .6 4.0 6.) A start at exploring data can be made by constructing a stem-and-leaf display. x3 ) for −q < x1 < q.7 3.2 4. 82. all data appear to be discrete. y) is the value of the joint distribution function (a) the joint marginal distribution function of X1 and X3 . With reference to Example 19. q) for −q < x < q but that the two random variables X1 and X3 and also the two random variables X2 and X3 are pairwise Use this result to find the marginal distribution function independent. If F(x. y). per- haps. it is difficult to understand what the data are informing us about the process.4 5. the first digit of each response time is listed in a column at the left. often consisting of a long list of measurements.8 4. we shall introduce some applications of the ideas of probability distributions and den- sities to the exploration of raw data.6 4. of X for the random variables of Exercise 54. For the response-time data. X2 .9 Examination of this long list of numbers seems to tell us little other than. of X and Y at (x. (If the list contained several hundred numbers. the response times are greater than 3 ps or less than 7 ps. (Even if data arise from continuous random variables. x3 ) is the value of the joint distribution the marginal densities function of X1 .5 6. and the associated second digits are listed to the right of each first digit. q) for −q < x1 < q find (a) the joint probability density of X and Y.5 4.1 5. in the case of continuous random variables. x3 ) = F(x1 . 1. but it is rarely helpful to use fewer than 5 or more than 15 classes. To construct a frequency distribution. and 9. 
namely there is no loss of information in a stem-and-leaf display. and more detail might be desirable. For example. 3. To obtain a finer subdivision of the data in each stem. 7. and those in the second half are 5. FREQUENCY DISTRIBUTION.9 ps than any other. Numerical data can be grouped according to their values in several other ways in addition to stem-and-leaf displays.0 to 4. The asterisk is used with stem labels 5 and 6 to show that all 10 digits are included in these stems. The stem-and-leaf display allows examination of the data in a way that would be difficult. Probability Distributions and Probability Densities 3 7 4 4 7 7 4 3 7 9 4 6 0 1 1 5 6 2 6 7 1 1 5 6 4 8 3 4 5 6 1 6 0 0 In this display. The first two stems of this stem-and-leaf display contain the great majority of the observations. The construction of a frequency distribution is easily facilitated with a computer program such as MINITAB. This method of exploratory data analysis yields another advantage. and s (for second) to denote that the leaves are 5–9. Each number on a stem to the right of the vertical line is called a leaf. The resulting double-stem display looks like this: 3f 4 4 4 3 3s 7 7 7 7 9 4f 0 1 1 2 1 1 4 3 4 4s 6 5 6 6 7 5 6 8 5∗ 6 1 6∗ 0 0 The stem labels include the letter f (for first) to denote that the leaves of this stem are 0–4. 102 . and 4. 2. it can quickly be seen that there are more response times in the range 4. the number of classes should increase as the number of observations becomes larger. 6. Generally. from the original listing. A grouping of numerical data into classes having definite upper and lower limits is called a frequency distribution. a double-stem display can be constructed by dividing each stem in half so that the leaves in the first half of each stem are 0. and that the great majority of circuits had response times of less than 5. The number of classes can be chosen to make the specification of upper and lower class limits convenient. first a decision is made about the number of classes to use in grouping the data. DEFINITION 15. if not impossible. each row is a stem and the numbers in the column to the left of the vertical line are called stem labels. 8. The following discussion may be omitted if a computer program is used to construct frequency distributions. Finally. given to the nearest 10 psi: 4890 4830 5490 4820 5230 4860 5040 5060 4500 5260 4610 5100 4730 5250 5540 4910 4430 4850 5040 5000 4600 4630 5330 5160 4950 4480 5310 4730 4700 4390 4710 5160 4970 4710 4430 4260 4890 5110 5030 4850 4820 4550 4970 4740 4840 4910 5200 4880 5150 4890 4900 4990 4570 4790 4480 5060 4340 4830 4670 4750 Solution Since the smallest observation is 4260 and the largest is 5540. (Note that class limits of 4200–4400. The interval between successive class boundaries is called the class interval. are used in constructing cumulative distributions (Exercise 88). so there is no ambiguity about which class contains any given observation. 5400–5990. 4400–4590. the observations are tallied to determine the class frequencies. . In choosing class limits. counting the number that fall into each class: Class Limits Tally Frequency 4200–4390 3 4400–4590 7 4600–4790 12 4800–4990 19 5000–5190 11 5200–5390 6 5400–5590 2 Total 60 Note the similarity between frequency distributions and probability distribu- tions. Class boundaries.) The following table exhibits the results of tallying the observations. are not used because they would overlap and assignment of 4400. 
it is important that the classes do not over- lap. . it will be convenient to choose seven classes. but a probability distribution rep- resents a theoretical distribution of probabilities. . Probability Distributions and Probability Densities The smallest and largest observations that can be put into each class are called the class limits. EXAMPLE 28 Construct a frequency distribution of the following compressive strengths (in psi) of concrete samples. A frequency distribution represents data. for example. having the class limits 4200–4390. would be ambiguous. The midpoint between the upper class limit of a class and the lower class limit of the next class in a frequency distribution is called a class boundary. rather than class marks. the number of obser- vations falling into each class. it can also be defined as the difference between successive lower class lim- its or successive upper class limits. 4400–4600. etc. that is. it could fit into either of the first two classes.. enough classes should be included to accommodate all observations.. Also. (Note that the class interval is not obtained by 103 . Using MINITAB software to construct the histogram of compressive strengths. and 5195–5395.9 30.) A class can be represented by a single number. This number is calculated for any class by averaging its upper and lower class limits. it is (4400 + 4590)/ 2 = 4495 for the second class.2 26.9 12. its actual value is lost.0 17. 5095.1 18.8 14. In so doing. the class boundaries overlap. 4795–4995. Histograms are easily constructed using most statistical software packages. Probability Distributions and Probability Densities subtracting the lower class limit of a class from its upper class limit.8 22.8 18. 4590 − 4390 = 200. and (c) the class mark of each class.9 14. and the class marks are 4695. Once data have been grouped into a frequency distribution. also is given by the difference between any two successive class marks.7 9. for example. or by subtracting successive upper class limits.4 16.7 7.4 23.0 14.4 13.3 12.1 12.1 14. each observation in a given class is treated as if its value is the class mark of that class. 4595–4795.6 23.9 24. (b) The class interval is 200. Also note that.8 13. (c) The class mark of the first class is (4200 + 4390)/2 = 4295.5 10. Solution (a) The class boundaries of the first class are 4195–4395. and 5495 for the remaining five classes.1 13.0 37. it is known only that its value lies somewhere between the class limits of its class. 4400 − 4200 = 200 psi. The class boundaries of the second through the sixth classes are 4395–4595. for example.8 13. 4995–5195.4 9. unlike class limits.0 12. The class boundaries of the last class are 5395–5595.1 16.8 10.8 21. It also can be found by subtracting successive lower class limits. EXAMPLE 29 For the frequency distribution of compressive strengths of concrete given in Example 28. 5295.2 27. (b) the class interval.5 5.4 11.6 26. 104 . Note that the lower class boundary of the first class is calculated as if there were a class below the first class.6 36.0 7.6 6. 4895. Note that the class interval. 200. Such an approximation is the price paid for the convenience of working with a frequency distribution.3 Use MINITAB or other statistical software to obtain a histogram of these data. find (a) the class boundaries. EXAMPLE 30 Suppose a wire is soldered to a board and pulled with continuously increasing force until the bond breaks. respectively. the difference between the upper and lower class boundaries of any class. 
and the upper class boundary of the last class is calculated as if there were a class above it. we obtain the result shown in Figure 13.8 10. The forces required to break the solder bonds are as follows: Force Required to Break Solder Bonds (grams) 19.7 12. called the class mark. Solution The resulting histogram is shown in Figure 14. This histogram exhibits a right-hand “tail. such as the proportion of cloudiness on a given day. Histogram of compressive strengths. 14 12 10 Frequency 8 6 4 2 0 4 12 20 28 36 Strength (psi) Figure 14. 105 . workers’ incomes. many kinds of stress tests. Histogram of solder-bond strengths. the data are said the have negative skewness. a few had strengths that were much greater than the rest. Examples of data that often are skewed include product lifetimes. if the tail is on the left.” suggesting that while most of the solder bonds have low or moderate breaking strengths. A histogram exhibiting a long right-hand tail arises when the data have pos- itive skewness. Likewise. Data having histograms with a long tail on the right or on the left are said to be skewed. Probability Distributions and Probability Densities 14 12 10 Frequency 8 6 4 2 0 4200 4500 4800 5100 5400 Strength (psi) Figure 13. and many weather-related phenomena. . (b) the probability of getting at most two heads. the weekly num- ber of accidents at a certain intersection. The probability distribution of V. find the distribution 86. A histogram exhibiting two modes is said to be bimodal. 88. If g(0) = 0. that is. and one having more than two modes is said to be multimodal. For example. and g(3) = 0. Histograms sometimes show more than one mode. Thus. find the probability gram. . the difference between the number of heads and the number of tails obtained in four tosses of a balanced coin. is given by 84. A coin is biased so that heads is twice as likely as tails. g(1) = 0. Some examples of “naturally” skewed data include the duration of telephone calls. two balls are drawn from the urn at random (that is. tribution of Y. where S = 2.20. 3. With reference to Exercise 85. each mode representing the center of the data that would arise from the corresponding cause if it were operating alone. Use probability that this sum of the spots on the dice will be the distribution function of X to find at most S. 3. and 4. or “high points. This question has been intentionally omitted for this of heads. . g(2) = 0. the time intervals between emissions of radioactive particles. each Construct the distribution function of V and draw pair has the same chance of being selected) and Z is the its graph. find the distribution function of the sum of the spots on the dice. If there are several causes operating. 1–2 83. the total number 89. An example of a bimodal histogram is shown in Figure 15. and the histogram of all the data may be multimodal. With reference to Exercise 80. (b) the values of the distribution function. and. 87. incomes of workers. find (a) the probability distribution of X. Probability Distributions and Probability Densities The shape of a histogram can be a valuable guide to a search for causes of production problems in the early stages of an investigation. With reference to Example 3. With reference to Exercise 87.40. a skewed histogram often arises from “drifting” of machine settings from their nominal val- ues. edition. . 90. sum of the numbers on the two balls drawn.10.” A mode is a bar in a histogram that is surrounded by bars of lower frequency. 85. 
using (b) the distribution function of Z and draw its graph. Bimodal histogram. (a) the original probabilities. find (a) the probability distribution of Z and draw a histo. An urn contains four balls numbered 1. each cause may generate its own distribution. that there will be at least two accidents in any one week. as previously mentioned. Sometimes skewed distributions do not arise from underlying causes but are the natural consequences of the type of measurements made. 2. 12. Figure 15. the function of the random variable X and plot its graph. 106 . multimodality can facilitate a search for underlying causes of error with the aim of eliminating them. (b) P(X > 2). (a) P(1 < X F 3).30. For three independent tosses of the coin. Applied Exercises SECS. find the probability dis. Two textbooks are selected at random from a shelf ⎧ that contains three statistics texts. Tucson is early or late is a random variable whose proba- bility density is given by 96. 91. and Y is the number of dice that come up 4. The actual amount of coffee (in grams) in a 230-gram (b) at most 100 hours. 3–4 (a) at least 200 hours. this city is 9 million liters? 93. 5.000 to 36. In a certain city the daily consumption of water (in ⎧ millions of liters) is a random variable whose probability ⎪ ⎨ 1 density is given by (36 − x2 ) for −6 < x < 6 f (x) = 288 ⎧ ⎪ ⎩ ⎪ ⎪ ⎨ 1 xe− 3 x 0 elsewhere for x > 0 f (x) = 9 ⎪ ⎪ where negative values are indicative of the flight’s being ⎩0 elsewhere early and positive values are indicative of its being late.000 99. likely points of the sample space.65 grams of coffee. jar filled by a certain machine is a random variable whose (c) anywhere from 80 to 120 hours. probability density is given by 95. What are the probabilities that on a given day (b) at least 1 minute late. ⎪ ⎨ 1 − 30x and three physics texts. two mathematics texts.5 ⎧ ⎪ ⎪ ⎪ ⎪ 5 ⎪ ⎨0 for x F 5 ⎩ 0 for x G 232.5 F(x) = 25 ⎪ ⎩1 − for x > 5 Find the probabilities that a 230-gram jar filled by this x2 machine will contain (a) at most 228. (c) at least 229. (b) anywhere from 27. A sharpshooter is aiming at a circular target with a shelf life of radius 1. If X is the number of statistics e for x > 0 texts and Y the number of mathematics texts actually f (x) = 30 ⎪ ⎩ chosen. (a) beyond 10 years. The tread wear (in thousands of kilometers) that car SEC. (b) the water supply is inadequate if the daily capacity of (d) exactly 5 minutes late. Probability Distributions and Probability Densities SECS.66 grams of coffee. or 6. aged food is a random variable whose probability density (b) Construct a table showing the values of the joint prob- function is given by ability distribution of X and Y. construct a table showing the values of the joint 0 for x F 0 probability distribution of X and Y. The shelf life (in hours) of a certain perishable pack. Find the probabilities that one of these tires will last 98. (c) anywhere from 1 to 3 minutes early.000 kilometers. (a) Draw a diagram like that of Figure 1 showing the val- (c) at least 48. If X is the number of heads and Y the number of ⎨ for x > 0 heads minus the number of tails obtained in three flips f (x) = (x + 100) 3 ⎪ ⎪ of a balanced coin.000 kilometers.5 ⎪ ⎨ function is given by f (x) = 1 for 227. 92. Find the probabilities that such a five-year-old dog will live (b) anywhere from 229. ⎧ ⎪ ⎪ 20.000 kilometers. (a) the water consumption in this city is no more than 6 million liters. 
The total lifetime (in years) of five-year-old dogs of ⎧ ⎪ a certain breed is a random variable whose distribution ⎪ ⎪ 0 for x F 227. Find the probabilities that one of these packages will have 100. the number of dice that come up 1. 5 owners get with a certain kind of tire is a random variable whose probability density is given by 97.34 to 231.5 < x < 232. Suppose that we roll a pair of balanced dice and X is (a) at most 18. The number of minutes that a flight from Phoenix to (c) anywhere from 12 to 15 years. Find the probabilities that one of these flights will be (a) at least 2 minutes early. If we draw a rectangular system of coordinates 107 .85 grams of coffee. construct a table showing the values ⎩0 elsewhere of the joint probability distribution of X and Y. (b) less than eight years. ues of X and Y associated with each of the 36 equally 94. Show that the two random variables of Exercise 102 on the humanities test? are not independent. the proportions of correct answers that variables is given by a student gets on the tests in the two subjects. and the joint probability density of X and Y of the point of impact. Z is the dom variables whose joint probability distribution can be number of aces obtained in the first draw. y)|0 < x2 + y2 < 12 }. when p = 25 cents.50 108. (a) P[(X. (a) the price will be less than 30 cents and sales will (c) the probability that sales will be less than 30. s > 0 (c) the conditional distribution of W given Z = 1.40. f (p. (X.000 units. If X is the proportion of persons who will respond to ⎨ for x > 0 f (x) = (x + 100)3 one kind of mail-order solicitation. 101. ment) from an ordinary deck of 52 playing cards. Find the probabilities that (b) the conditional density of S given P = p. y) = 25 x 2 ⎪ ⎪ ⎧ ⎩0 elsewhere ⎪ ⎪ 2 (2x + 3y) ⎨ for 0 < x < 1. Y is the proportion of ⎪ ⎪ persons who will respond to another kind of mail-order ⎩0 elsewhere 108 . If two cards are randomly drawn (without replace- dollars). Suppose that P. y) = 5 ⎪ ⎪ find ⎩0 elsewhere (a) the marginal density of X. If X is the amount of money (in dollars) that a sales- person spends on gasoline during a day and Y is the cor- 102. find (a) the marginal density of P. 6–7 109. the price of a certain commodity (in 105. ⎧ ⎪ ⎪ 20. SECS. A certain college gives aptitude tests in the sciences responding amount of money (in dollars) for which he or and the humanities to all entering freshmen. Y) ∈ B]. find grated circuit is a random variable having the probabil- (a) the marginal distribution of X. ity density (b) the conditional distribution of Y given X = 0. Y).000 104. the coordinates solicitation. With reference to Exercise 101. bursed at least $8 when spending $12.40 on both tests. where B = {(x.000 units). where A is the sector of the circle in (b) at most 50 percent will respond to the second kind of the first quadrant bounded by the lines y = 0 and y = x. y) = 5 ⎨ for 0 < x2 + y2 < 1 ⎪ ⎪ ⎩0 f (x. respectively.20 < p < 0. 0 < y < 1 ⎪ ⎪1 f (x. is given by ing the joint probability density ⎧ ⎪ ⎪ 2 ⎧ ⎨ (x + 4y) for 0 < x < 1. What are the probabilities that a student will get (c) the probability that the salesperson will be reim- (a) less than 0. its total sales (in 10. The useful life (in hours) of a certain kind of inte- 103. (b) the price will be between 25 cents and 30 cents and sales will be less than 10. the joint density of these two random are. and S. 
the joint ⎧   probability distribution of these random variables can be ⎪ approximated with the joint probability density ⎨ 1 20 − x ⎪ for 10 < x < 20. (b) more than 0. 5pe−ps for 0.000 units exceed 20. Probability Distributions and Probability Densities with its origin at the center of the target. and W is the approximated closely with the joint probability density total number of aces obtained in both draws. 0 < y < 1 f (x. If X and Y she is reimbursed. 107. cent response to the first kind of mail-order solicitation.80 on the science test and less than 0.000 units. (b) the conditional density of Y given X = 12.  (b) the marginal distribution of Z. x <y<x f (x. Y) ∈ A]. y) = π elsewhere ⎪ ⎪ ⎩0 elsewhere find the probabilities that (a) at least 30 percent will respond to the first kind of Find mail-order solicitation. are ran. are random variables hav. With reference to Exercise 97. find (a) the joint probability distribution of Z and W. s) = 0 elsewhere 106. mail-order solicitation given that there has been a 20 per- (b) P[(X. 0 56. and the construct a stem-and-leaf display for the following data class marks of the frequency distribution constructed in representing the time to make coke (in hours) in succes- Exercise 116.1 3.5 are 6.6 9.5 8. with the following 105 and the second row came from station 107.0 5.5 7. find 45.5 10.2 ments made on 24 solder joints: 59.2 7.8 11.2 58.0 6. Suppose the first row of 12 observations in Exer.4 3.6 56. 120. and the class 114.8 54.7 3.46 1.0 56. and the of different diameters.9 7.2 10.1 62.5 55.0 12.6 5.2 3.1 5. Measurements made of their diameters (in cm) 9.8 57.1 4.9 6.6 8.9 71.4 8.8 49.3 6.4 7. using (b) Construct a double-stem display.0 112.7 56.2 65.2 3.7 73. Two different lathes turn shafts to be used in electric motors.38 1.8 74.1 62. construct a stem-and-leaf display for the combined data of Exercise 112.7 8.4 4.2 75.8 40.9 11. X2 .8 3.4 60. 7.0 58.8 5.2 6.4 69.9 7.6 70.28 1.5 62.4 9.6 53.3 64.9 12. sheets coated with polyurethane under various ambient Construct a percentage distribution using the reaction- conditions: time data of Exercise 116. The following are the number of highway accidents 7.8 10.4 6. sive runs of a coking oven. and X3 .9 Lathe B: 1.1 63.7 7.4 57 64 65 62 59 59 60 62 61 63 58 61 68.6 58.6 68. X3 G 200). Construct two stem-and-leaf diagrams to see if you should suspect that the two lathes are turning out shafts 117.3 7.7 42. 61. The following are the percentages of tin in measure.7 53.8 53.6 7.0 Lathe A: 1.42 1.41 1.9 66.4 7.3 63.6 6. repre- senting the lengths of their useful lives.4 SEC.2 59.8 10.6 50.4 43.1 5.6 10. X2 < 100.1 110.8 67.3 46.6 13.0 6.6 61 63 59 54 65 60 62 61 67 60 55 68 64.56 1.9 (a) the joint probability density of X1 .5 5. Use MINITAB or some other computer software to 118.5 63.0 67.4 54.4 7.9 5.6 73. Probability Distributions and Probability Densities If three of these circuits operate independently. 109 .8 6.1 9.1 4.6 8.8 63.8 52. eight classes.7 14.1 57.6 7. 119.8 4.8 5.5 68.5 7.0 8.9 47.0 64.5 72.40 4. the time for each to take corrective action for a given cise 110 came from solder connections made at station emergency was measured in seconds.6 12.7 4. the class interval.1 57.9 (b) the value of P(X1 < 100. Use MINITAB or some other computer software to interval.1 46.8 68.8 7.1 5.8 62.8 57.4 Construct a frequency distribution of these data.42 1.2 51.9 9.8 6.6 5.7 6. the class interval.9 (a) Construct a stem-and-leaf diagram using 5 and 6 as 55.2 8.4 5.8 6.9 54.2 61.1 5.9 62. 
Construct a frequency distribution of these data.2 6.1 62.1 8.0 64. Find the class boundaries.8 66.40 1.4 7.44 1.7 9.3 8.4 73.1 13.3 12.4 9.2 63.39 1.9 58.2 6 4 0 3 5 6 2 0 0 12 3 7 2 1 1 6.5 6.4 6.1 62.0 5.1 8. class marks of the frequency distribution constructed in Exercise 115. Find the class boundaries. The following are the drying times (minutes) of 100 times the ratio of that frequency to the total frequency.8 10.6 should suspect a difference in the soldering process at the two stations.4 9.5 7.1 68.9 63. 51.2 4. Use a results: pair of stem-and-leaf diagrams to determine whether you 11.7 5.3 42.31 1.9 70.0 48.9 59.6 6.5 51.3 55. the class marks.6 57. Iden- tify the class boundaries. 8 55.3 52.5 3.5 the stem labels.47 1.36 1.3 61.51 6. (c) Which is more informative? 116.1 6.9 7. 56.6 63.8 50.5 64.8 9.5 reported on 30 successive days in a certain county: 10.3 60.0 4. A percentage distribution is obtained from a fre- quency distribution by replacing each frequency by 100 115.1 60.33 1. 113.2 8.1 43.0 8.6 8.0 66.6 62.4 50.4 7.1 5.2 0 4 0 0 0 1 8 0 2 4 7 3 6 2 0 Construct a frequency distribution of these data.1 9.29 1.8 58.5 64.3 11.3 56.8 64.3 14. Eighty pilots were tested in a flight simulator and 111.7 54.9 64.4 48. (b) Can you find the class mark of every class? 128. (a) Combining the data for both lathes in Exer- 124. (b) How would you describe the shape of the histogram? 125. 136. the distributions of daily absences in the two departments construct a frequency distribution from these data. The small number of obser. 137. given in hours of operation. Use the data of Exercise 128 to illustrate that class 6–7 4 3 marks are given by the midpoint between successive class boundaries as well as the midpoint between successive 8–9 2 1 class limits. show that the class marks also are given by the midpoint between successive class boundaries. To cise 115. boundary on the x-axis is called an ogive. using Construct a frequency polygon using the data in Exer- a larger interval for the last class. Construct a 133. (b) How would you describe the shape of the histogram? 134. Unequal class intervals. Construct percentage distributions from the follow- ing two frequency distributions and determine whether (a) Dropping the rule that class intervals must be equal. Class Shipping Security (b) What can be said about the shape of this histogram? Limits Department Department 129. one is tempted to drop the rule of equal-size classes. and the frequencies of all classes above it. Construct a histogram using the solder-joint data in quency with the sum of the frequency of the given class Exercise 110. 332 331 327 344 328 341 325 2 311 320 122. construct a histogram. (b) Is there a unique class interval? 139. the data of Exercise 116. (a) If that were done. Construct a percentage distribution using the drying. A cumulative frequency distribution is constructed from a frequency distribution by replacing each fre. one is faced with the dilemma of either creating too many classes for only 30 observa. 355 309 375 316 336 278 396 287 cies. keep class intervals equal. f ). Use MINITAB or some other computer software to the frequency distributions of absences given in Exer. 123. construct a histogram of the coking-time data given in cise 122. where x represents the tions or using a small number of classes with excessive class mark of a given class in a frequency distribution and loss of information in the first few classes. 150 389 345 310 20 310 175 376 334 340 time data of Exercise 115. 
what would the resulting frequency distribution become? 138. (a) Using only the first two rows of the data for the cumulative frequency distribution using the data of Exer. Probability Distributions and Probability Densities 121. f represents its frequency. Construct cumulative percentage distributions from 135. In such cases. follow similar patterns. response times given in Section 8. Percentage distributions are useful in comparing two 256 315 55 345 111 349 245 367 81 327 frequency distributions having different total frequen. cise 116. A plot of the cumulative frequency (see Exer- 127. is called a frequency polygon. Using the data of Exercise 129. Construct a cumulative frequency distribution using cise 112. (a) Construct a histogram of the reaction times of FREQUENCIES pilots from the data in Exercise 116. and represent- ing each class by its upper class boundary. Totals 60 40 131. 132. 126. A plot of the points (x. (a) Construct a histogram of the drying times of 0–1 26 18 polyurethane from the data in Exercise 115. cise 115. Exercise 113. Use MINITAB or some other computer software to vations greater than 7 in Exercise 119 may cause some construct a histogram of the drying-time data in Exer- difficulty in constructing a frequency distribution. Construct a frequency polygon from the data in Exercise 115. 2–3 18 11 (b) What can be said about the shape of this histogram? 4–5 10 7 130. 110 . construct a histogram. The following are the times to failure of 38 light cise 123) on the y-axis and the corresponding upper class bulbs. 89 89 89 5 and 1 . North Scituate. (d) 119 . New York: Macmillan Publishing Brunk. (d) 34 .. Inc. 29 F(x) = 2 9 (a) no.. 1978. ⎪ ⎪ ⎧ ⎪ ⎪ 1 (6x − x2 − 5) for 2 < x < 3 ⎪ for y … 2 ⎪ ⎪ 4 ⎪ ⎨ 0 ⎩1 for x Ú 3 21 F(y) = 1 (y2 + 2y − 8) for 2 < y < 4 ⎪ ⎪ 16 33 f (x) = 12 for − 1 < x < 1 and f (x) = 0 elsewhere. Inc. ⎪ ⎩1 37 The three probabilities are 1 − 3e−2 . f (x) = 0 elsewhere. (c) 55 .: Duxbury Press. ⎪ ⎪ 2 ⎪ ⎩1 11 (a) 12 . R. A. 1976. M.124.. 1977. Applications. Co.  0 for z … 0 39 (a) F(x) = 0. (b) yes.. 1976. Probability and Statistics. (b) 16 . Hogg. Introduction to Mathemat- rial in this chapter may be found in ical Statistics. (a) Construct an ogive for the reaction times given (b) Using the same set of axes. G. Read. The probabilities are 32 2 111 . ⎪ ⎩1 120 for x Ú 1 45 (a) 29 . (b) 14 and 12 . S. Basic Probability Theory and Applica- 1986..1519. (b) 14 . for x Ú 2 ⎧ f (10) = 16 .. (e) 12 . References More advanced or more detailed treatments of the mate. Inc. lishing Co. 31 F(x) = 1 (2x − 1) for 1 < x … 2 ⎪ ⎪ 4 19 (c) 0. Pacific Palisades.: Xerox College Publishing. 35 f (y) = for y > 0 and f (y) = 0 elsewhere. the same graph also shows the ogive of the percentage (b) Construct an ogive representing the cumulative per- distribution of drying times. (c) f (1) = 13 . M. and Craig. T. A. Khazanie.. H. Lexington. ⎪x ⎪ for 0 < x … 1 ⎨ 4 17 (b) 25 . ⎪ ⎪ 0 for x … 0 ⎪ ⎪ ⎪ ⎪ 2 13 (a) 34 . ing. New York: Macmillan Pub- DeGroot. (c) yes. 14 . 38 . H. (b) 5 . (c) no. (c) F(x) = 12 (x + 1).: Addison-Wesley Publishing Company. V. and for x Ú 4 5e−5 . (f) 14 . Mass. Mass. centage distribution. 2e−1 − 4e−3 . Calif.: Goodyear Publishing Fraser. Mass. Answers to Odd-Numbered Exercises ⎧ 1 (a) no.. Kendall. because F(2) is ⎪ ⎪ x2 ⎪ ⎪2x − −1 for … x < 2 less than F(1). A. The Advanced Theory 1975. of Statistics. and 12 . 140.. 2nd ed. R.. 4th ed.454 and 0.. ⎩1 for y Ú 4 18 The probabilities are 0. √ 25 23 F(x) = 12 x for 0 < x < 4. (b) 0. 
relabel the y-axis so that in Exercise 116. ⎧ ⎪ for x … 0 41 The probabilities are 14 . ⎪ ⎪ x2 ⎪ ⎨ for 0 < x < 1 5 0 < k < 1. 4th ed. ⎨0 27 G(x) = 3x2 − 2x3 for 0 < x < 1 43 (a) 14 . D.. (b) F(x) = 12 x. (b) no. f (6) = 23 . (c) 12 . Vol. Inc. Probability Distributions and Probability Densities (a) Construct an ogive for the data of Exercise 115. f (4) = 16 . because F(4) exceeds 1.. because ⎪ ⎪ 0 for x … 0 ⎪ ⎪ the sum of the probabilities is less than 1. 25 F(z) = 1 − e−z 2 for z > 0 (d) F(x) = 0. Probability and Statistics: Theory and Company. D. the two prob- ⎧ y3 ⎪0 ⎨ for x … 0 abilities are 16 and 649 . (c) 24 7 . tions.. and Stuart. An Introduction to Mathematical Statistics. 1. 3rd ed. because f (4) is negative. z|3) = for y = 1. (b) φ(y|12) = 16 for 6 < y < 12 and φ(y|12) = 0 75 (a) h(y) = 14 (1 + y) for 0 < y < 2 and h(y) = 0 elsewhere. 107 (a) g(x) = for 10 < x < 20 and g(x) = 0 else- 18 50 73 (a) Independent. 1) = 171 . 2. 2. . 2. 6 3 yz 20 − x (e) ψ(y.3038. 3 and y = 1. 93 (a) 0. 91 (a) 0. 0) = 188 221 . (c) 0. (d) φ(z|1. 3 .1054. 49 k = 2. φ(1. 63 (a) 18 27 103 (a) g(0) = 14 g(1) = 15g(2) = 28 28 . (b) 16 79 G(x) = 1 − e−x for x > 0 and G(x) = 0 elsewhere. (b) 0. 1) = 221 18 221 221 17 x z (c) g(x) = for x = 1. (c) 0. where. 65 k = 144. (c) 16 57 (e−2 − e−3 )2 . 1 . Probability Distributions and Probability Densities ⎧ 47 x ⎪ ⎪ 0 for V < 0 ⎪ ⎪ 0 1 2 3 ⎪ ⎨0. 3 and z = 1. y) = for x = 1. 1 P(Y) 16 4 6 4 1 16 16 16 16 85 (a) X 0 1 2 3 (b) 19 27 . 3. φ(2|0) = 10 71 (a) m(x. (b) not independent.6354.90 ⎪ for 2 … V < 3 1 y 1 30 2 3 8 ⎪ ⎩ 1 15 10 15 for V Ú 3 1 2 10 3 3 1 10 5  12 89 Yes. 55 (e−1 − e−4 )2 . g(1) = 17 . 1 . 3 .53. 30 10 5 ⎪ ⎪ ⎪0. f (1. 5 . 95 (a) 14 . x3 > 0 and f (x1 . 221 (b) n(x.70 for 1 … V < 2 . (c) 13 .464.4512. (b) 39 64 1 . x2 > 0. 000)3 77 (a) g(x) = − ln x for 0 < x < 1 and g(x) = 0 elsewhere. (b) 0.2019. x2 . 53 1 − 12 ln 2 = 0. (b) 59 . elsewhere. 2.40 for 0 … V < 1 0 0 1 1 1 87 F(V) = 0. 2) = for z = 1. (b) 221 1 .23. (20. f (0. xz 36 105 (a) f (0. 2. x2 . 109 (a) f (x1 . The two for x1 > 0. x3 ) = 0 elsewhere. 0) = 16 . 1) = 22116 . (c) 23 . 3. 2. 2. xy (b) φ(0|0) = 10 φ(1|0) = 10 . 101 (a) 0. (b) f (x|1) = 12 (2x + 1) for 0 < x < 1 and f (x|1) = 0 elsewhere. random variables are not independent. x=2 51 (a) 12 . 2 111 Station 107 data show less variability than station 83 Y −4 −2 0 2 4 105 data. f (x) = 1. (c) φ(0|0) = 16 . 2. 6 1 . z) = for x = 1. 3 and z = 1. (b) 7 . x3 ) = (x1 + 100)3 (x2 + 100)3 (x3 + 100)3 (b) h(y) = 1 for 0 < y < 1 and h(y) = 0 elsewhere. (b) g(0) = 204 . 1 . 1 P(X) 27 6 12 8 27 27 27 112 . f (1. Marylees Miller.0001) = $0.200 and a third prize worth $400.01 percent of the time (or with probability 0.9997) and win each of the prizes 0. All rights reserved.64 which is the sum of the products obtained by multiplying each amount by the corre- sponding probability.0001). and the mathematical expectation of this random variable was the sum of the products obtained by multiplying each value of the random variable by the corre- sponding probability. If there is also a second prize worth $1. Irwin Miller. the concept of a mathematical expectation arose in connection with games of chance.000 = $0.0001) + 400(0.48.Mathematical Expectation 1 Introduction 6 Product Moments 2 The Expected Value of a Random 7 Moments of Linear Combinations of Random Variable Variables 3 Moments 8 Conditional Expectations 4 Chebyshev’s Theorem 9 The Theory in Practice 5 Moment-Generating Functions 1 Introduction Originally.48 per ticket.400.9997) + 4.800 + $1. 
On the average we would thus win 0(0.400 10.800 10. and in its simplest form it is the product of the amount a player stands to win and the probability that he or she will win. we could argue that if the raffle is repeated many times.0001) + 1. Referring to the mathematical expectation of a random vari- able simply as its expected value. Inc.800 · 10. From Chapter 4 of John E.000 tickets in a raffle for which the grand prize is a trip worth $4. Looking at this in a different way. For instance.000 = $0.97 percent of the time (or with probability 0.000 tickets pay $4.64 per ticket.800. the amount we won was a random vari- able. or on the average $6. if we hold one of 10. Copyright © 2014 by Pearson Education. we thus have the following definition.000 1 = $0. 113 .800(0. Eighth Edition.200 + $400 = $6. we can argue that altogether the 10. Freund’s Mathematical Statistics with Applications. This amount will have to be interpreted in the sense of an average—altogether the 10.200(0. our mathemati- cal expectation is 4. 2 The Expected Value of a Random Variable In the illustration of the preceding section.800.000 tickets pay $4. or on the average $4. and extending the definition to the continuous case by replacing the operation of summation by integration. we would lose 99. 114 . Indeed. that the sum or the integral exists. other- wise. 3 of the 12 sets can be chosen in ways. 1. If X is a discrete random variable and f(x) is the value of its probability distribution at x. EXAMPLE 1 A lot of 12 television sets includes 2 with white cords. x 0 1 2 6 9 1 f (x) 11 22 22 Now. the expected value of X is  E(X) = x · f (x) x Correspondingly. it should be clear that the term “expect” is not used in its colloquial sense. and these x 3−x 3 3 possibilities are presumably equiprobable. we find that the probability distribution of X. If 3 of the sets are chosen at random for shipment to a hotel. in tabular form. if X is a continuous random variable and f(x) is the value of its probability density at x. the number of sets with white cords shipped to the hotel. 2 12 3 or. the mathematical expectation is undefined. the expected value of X is  q E(X) = x · f (x)dx −q In this definition it is assumed. of course. EXPECTED VALUE. Mathematical Expectation DEFINITION 1. is given by    2 10 x 3−x f (x) =   for x = 0. 6 9 1 1 E(X) = 0 · +1 · +2 · = 11 22 22 2 and since half a set cannot possibly be shipped. it should be interpreted as an average pertaining to repeated shipments made under the given conditions. how many sets with white cords can the shipper expect to send to the hotel? Solution  x  Since of the2 sets with white cords and 3 − x of the   10 other   sets can be chosen 2 10 12 12 in ways. if X is a continuous random variable and f (x) is the value of its probability density at x. to simplify our notation. If X is a discrete random variable and f (x) is the value of its probability distribution at x. Mathematical Expectation EXAMPLE 2 Certain coded measurements of the pitch diameter of threads of a fitting have the probability density ⎧ ⎪ ⎨ 4 for 0 < x < 1 f (x) = π(1 + x2 ) ⎪ ⎩0 elsewhere Find the expected value of this random variable. we might be interested in the random variable Y. but also in the expected values of random variables related to X. THEOREM 1. Since y = g(x) does not necessarily define a one-to-one correspondence. the expected value of g(X) is given by  E[g(X)] = g(x) · f (x) x Correspondingly. 
If we want to find the expected value of such a random variable g(X).4413 π There are many problems in which we are interested not only in the expected value of a random variable X. we have  1 4 E(X) = x· dx 0 π(1 + x2 )  1 4 x = dx π 0 1 + x2 ln 4 = = 0. the expected value of g(X) is given by  q E[g(X)] = g(x) · f (x) dx −q Proof Since a more general proof is beyond the scope of this chapter. we could first determine its probability distribution or density and then use Definition 1. whose values are related to those of X by means of the equation y = g(x). we denote this random variable by g(X). but generally it is easier and more straightforward to use the following theorem. g(X) takes on the value 23 = 8. Thus. g(X) might be X 3 so that when X takes on the value 2. Solution Using Definition 1. suppose that g(x) takes on the value gi when x takes on 115 . For instance. we shall prove this theorem here only for the case where X is discrete and has a finite range. . xi2 . Then. 116 . . g2 . we get  6 1 E[g(X)] = (2x2 + 1) · 6 x=1 1 1 = (2 · 12 + 1) · + · · · + (2 · 62 + 1) · 6 6 94 = 3 EXAMPLE 4 If X has the probability density ex for x > 0 f (x) = 0 elsewhere find the expected value of g(X) = e3X/4 . gm . . . the probability that g(X) will take on the value gi is ni P[g(X) = gi ] = f (xij ) j=1 and if g(x) takes on the values g1 . . . . find the expected value of g(X) = 2X 2 + 1. Solution Since each possible outcome has the probability 16 . xini . EXAMPLE 3 If X is the number of points rolled with a balanced die. Mathematical Expectation the values xi1 . . it follows that  m E[g(X)] = gi · P[g(X) = gi ] i=1  m  ni = gi · f (xij ) i=1 j=1  m  ni = gi · f (xij ) i=1 j=1  = g(x) · f (x) x where the summation extends over all values of X. we have  q E[e3X/4 ] = e3x/4 · e−x dx 0  q = e−x/4 dx 0 =4 The determination of mathematical expectations can often be simplified by using the following theorems. If a and b are constants. then E(aX) = aE(X) COROLLARY 2. then E(aX + b) = aE(X) + b Proof Using Theorem 1 with g(X) = aX + b. . . If b is a constant. . some proofs will be given for either the discrete case or the continuous case. which enable us to calculate expected values from other known or easily computed expectations. and cn are constants. . then E(b) = b Observe that if we write E(b). c2 . Mathematical Expectation Solution According to Theorem 1. If a is a constant. we get  q E(aX + b) = (ax + b) · f (x) dx −q  q  q =a x · f (x) dx + b f (x) dx −q −q = aE(X) + b If we set b = 0 or a = 0. we can state the following corollaries to Theorem 2. then ⎡ ⎤ n  n E⎣ ci gi (X)⎦ = ci E[gi (X)] i=1 i=1 117 . COROLLARY 1. others are left for the reader as exercises. Since the steps are essentially the same. If c1 . THEOREM 2. THEOREM 3. the constant b may be looked upon as a random variable that always takes on the value b. we get i=1 ⎡ ⎤ ⎡ ⎤  n  n E⎣ ci gi (X)⎦ = ⎣ ci gi (x)⎦ f (x) i=1 x i=1  n  = ci gi (x)f (x) i=1 x  n  = ci gi (x)f (x) i=1 x  n = ci E[gi (X)] i=1 EXAMPLE 5 Making use of the fact that 1 91 E(X 2 ) = (12 + 22 + 32 + 42 + 52 + 62 ) · = 6 6 for the random variable of Example 3. Solution 91 94 E(2X 2 + 1) = 2E(X 2 ) + 1 = 2 · +1 = 6 3 EXAMPLE 6 If the probability density of X is given by 2(1 − x) for 0 < x < 1 f (x) = 0 elsewhere (a) show that 2 E(X r ) = (r + 1)(r + 2) (b) and use this result to evaluate E[(2X + 1)2 ] Solution (a)   1 1 r r E(X ) = x · 2(1 − x) dx = 2 (xr − xr+1 ) dx 0 0   1 1 2 =2 − = r+1 r+2 (r + 1)(r + 2) 118 . 
Mathematical Expectation  n Proof According to Theorem 1 with g(X) = ci gi (X). rework that example. we get 1 1 E[(2X + 1)2 ] = 4 · +4· +1 = 3 6 3 EXAMPLE 7 Show that    n n n−i i n E[(aX + b) ] = a b E(X n−i ) i i=0 Solution   n n Since (ax + b)n = (ax)n−i bi . Y)] = g(x. it follows that i=0 i ⎡   ⎤ n n E[(aX + b)n ] = E ⎣ an−i bi X n−i ⎦ i i=0    n n n−i i = a b E(X n−i ) i i=0 The concept of a mathematical expectation can easily be extended to situations involving more than one random variable. y) dx dy −q −q Generalization of this theorem to functions of any finite number of random variables is straightforward. the expected value of g(X. if X and Y are continuous random variables and f (x. y) is the value of their joint probability density at (x. y). the expected value of g(X. y) · f (x. For instance. THEOREM 4. y). y)f (x. y) is the value of their joint probability distribution at (x. Mathematical Expectation (b) Since E[(2X + 1)2 ] = 4E(X 2 ) + 4E(X) + 1 and substitution of r = 1 and r = 2 into the preceding formula yields E(X) = 2·3 2 = 13 and E(X 2 ) = 3·4 2 = 16 . y). If X and Y are discrete random variables and f (x. Y) is   q q E[g(X. y) x y Correspondingly. 119 . if Z is the random variable whose values are related to those of the two random variables X and Y by means of the equation z = g(x. Y) is  E[g(X. we can state the following theorem. Y)] = g(x. Y) = X + Y. . X2 . THEOREM 5. . . . Xk )] i=1 i=1 120 . . y) = 7 0 elsewhere find the expected value of g(X. . . then ⎡ ⎤  n n E ⎣ ⎦ ci gi (X1 . y) x=0 y=0 1 2 1 1 = (0 + 0) · + (0 + 1) · + (0 + 2) · + (1 + 0) · 6 9 36 3 1 1 + (1 + 1) · + (2 + 0) · 6 12 10 = 9 EXAMPLE 9 If the joint probability density of X and Y is given by 2 (x + 2y) for 0 < x < 1. and its proof parallels the proof of that theorem. Mathematical Expectation EXAMPLE 8 Find the expected value of g(X. Xk ) = ci E[gi (X1 . c2 . If c1 . . and cn are constants. Solution  2  2 E(X + Y) = (x + y) · f (x. . . . Solution  2 1 3 2x(x + 2y) E(X/Y ) = dx dy 1 0 7y3    2 2 1 1 = + dy 7 1 3y3 y2 15 = 84 The following is another theorem that finds useful applications in subsequent work. X2 . 1 < y < 2 f (x. . It is a generalization of Theorem 3. Y) = X/Y 3 . ⎪ ⎪ 3−x ⎪ ⎪ (b) the marginal density of X. Prove Theorem 3 for continuous random variables. This is the famous ⎧ Petersburg paradox. ⎪ ⎪ for 2 < x < 3 ⎪ ⎪ 2 ⎪ ⎪ 6. Find the expected value of the random variable Y 13. Given two continuous random variables X and Y. the mathematical expectations defined here and in Definition 4. and 125 . 2. whose probability density is given by 14. 1. x (b) Use the results of part (a) to determine E(X 3 + 2X 2 − 3X + 1). 0. 8. 2. 0. Prove Theorem 5 for discrete random variables. f (−1). This has been intentionally omitted for this edition. Find the expected value of the random variable X 2 whose probability density is given by show that E(2X ) does not exist. |x − 2| f (x) = for x = −1. which takes on the values −2. and 3 with probabilities f (−2). f (0). and E(X 3 ). 3. use ⎪ ⎨ for 1 < x F 2 Theorem 4 to express E(X) in terms of f (x) = 2 (a) the joint density of X and Y. . (a) g1 . 2. To illustrate the proof of Theorem 1. If the probability density of X is given by 3. g2 . 11. f ( y) = 8 16. 3 12. find E[(3X + 2)2 ]. (a) If X takes on the values 0. 2. and 3 with probabil- dom variable X. (b) the probabilities P[g(X) = gi ] for i = 1. find E(X) and E(X 2 ). consider the ran. 2. are of special importance. . 10. ⎪ ⎪ for 0 < x F 1 ⎪ ⎪ 2 ⎪ ⎪ ⎪ ⎪ 1 5. 1 ities 125 12 48 . 
⎪ ⎩0 elsewhere able X having the probability distribution find the expected value of g(X) = X 2 − 5X + 3. and g4 . 3. called the moments of the distribution of a random variable or simply the moments of a random variable. 125 . according to which a player’s expec- ⎪ ⎪ for 0 < x < 1 tation is infinite (does not exist) if he or she is to receive ⎨x f (x) = 2 − x for 1 F x < 2 2x dollars when. f (1). This has been intentionally omitted for this edition. Mathematical Expectation Exercises 1. If g(X) = X 2 . 7 7. ⎧ ⎪ ⎨ 1 ( y + 1) for 2 < y < 4 15. the ⎪ ⎪ first head appears on the xth flip. in a series of flips of a balanced coin. 121 . ⎩0 elsewhere 3 Moments In statistics. (a) If the probability density of X is given by ⎧ 4 ⎪ (c) E[g(X)] = gi · P[g(X) = gi ]. ⎧ ⎪ ⎪ x 4. Prove Theorem 2 for discrete random variables. 9. 1. −1. (b) Use the results of part (a) to determine the value of and f (3). f (2). 125 64 . E(X 2 ). If the probability distribution of X is given by ⎪ ⎩0 elsewhere  x 1 f (x) = for x = 1. This has been intentionally omitted for this edition. 1. g3 . and show that ⎨ 1 for 1 < x < 3 i=1 f (x) = x(ln 3) it equals ⎪ ⎩0 elsewhere  g(x) · f (x) find E(X). 4. This has been intentionally omitted for this edition. . the four possible values of g(x). Find the expected value of the discrete random vari. the first moment divided by f (x) = 1. The analogy applies also in the continuous case. . 1. we have μ1 = E(X). and μ2 would be the moment of inertia. when X is discrete. or simply the mean of X. that is. μ1 would  be the x-coordinate of the center of gravity. symbolically    μr = E (X − μ)r = (x − μ)r · f (x) x for r = 0. 2. is the expected value of (X − ␮)r . MOMENTS ABOUT THE ORIGIN. When r = 1. ␮1 is called the mean of the distribution of X. and it is denoted simply by ␮. and  q   μr = E (X − μ)r = (x − u)r · f (x)dx −q when X is continuous. symbolically  μr = E(X r ) = xr · f (x) x for r = 0. MOMENTS ABOUT THE MEAN. . . that is. we have μ0 = E(X 0 ) = E(1) = 1 by Corollary 2 of Theorem 2. This also explains why the moments μr are called moments about the origin: In the analogy to physics. is the expected value of X . the length of the lever arm is in each case the distance from the origin. . . when X is discrete. 2. It is of interest to note that the term “moment” comes from the field of physics: If the quantities f (x) in the discrete case were point masses acting perpendicularly to the x-axis at distances x from the origin. denoted by ␮r . the shape of the graph of its probability distribution or probability density. and in view of its importance in statistics we give it a special symbol and a special name. and  q μr r = E(X ) = xr · f (x)dx −q when X is continuous. . where μ1 and μ2 might be the x-coordinate of the center of gravity and the moment of inertia of a rod of variable density. The special moments we shall define next are of importance in statistics because they serve to describe the shape of the distribution of a random variable. denoted by ␮r . 1. . When r = 0. The rth moment about the mean of a random variable X. The rth moment about the origin of a random variable X. MEAN OF A DISTRIBUTION. DEFINITION 4. DEFINITION 3. Mathematical Expectation DEFINITION 2. which is just the expected value of the random variable X. 122 . Here. Figure 1. As can be seen. DEFINITION 5. To serve this purpose. Distributions with different dispersions. ␮2 is called the variance of the distribution of X.66. it is given a special symbol and a special name. σ . 
The positive square root of the variance. 123 . a small value of σ 2 suggests that we are likely to get a value close to the mean. describes the symmetry or skewness (lack of symmetry) of a distribution is given in Exercise 26. and a large value of σ 2 suggests that there is a greater probability of getting a value that is not close to the mean.26. or V(X). and it is denoted by σ 2 . the third moment about the mean. 3. VARIANCE. A brief discussion of how μ3 . The second moment about the mean is of special importance in statistics because it is indicative of the spread or dispersion of the distribution of a random variable. Figure 1 shows how the variance reflects the spread or dispersion of the distribution of a random variable. In many instances. This will be discussed further in Section 4.88. the reader will be asked to verify a general formula in Exercise 25. or sim- ply the variance of X.18. Mathematical Expectation Note that μ0 = 1 and μ1 = 0 for any random variable for which μ exists (see Exercise 17). is called the standard deviation of X. σx 2 . and 0. Here we show the histograms of the probability distributions of four random variables with the same mean μ = 5 but variances equaling 5. moments about the mean are obtained by first calculating moments about the origin and then expressing the μr in terms of the μr . thus. var(X). let us merely derive the following computing formula for σ 2 . 1. representing the number of points rolled with a balanced die.4413. find the standard deviation of the random variable X. 1 1 1 1 1 1 μ2 = E(X 2 ) = 12 · + 22 · + 32 · + 42 · + 52 · + 62 · 6 6 6 6 6 6 91 = 6 and it follows that  2 2 91 7 35 σ = − = 6 2 12 EXAMPLE 11 With reference to Example 2. Solution In Example 2 we showed that μ = E(X) = 0. Mathematical Expectation THEOREM 6. Solution First we compute 1 1 1 1 1 1 μ = E(X) = 1 · +2· +3· +4· +5· +6· 6 6 6 6 6 6 7 = 2 Now. Now  1 4 x2 μ2 = E(X 2 ) = dx π0 1+x 2    4 1 1 = 1− dx π 0 1 + x2 124 . σ 2 = μ2 − μ2 Proof σ 2 = E[(X − μ)2 ] = E(X 2 − 2μX + μ2 ) = E(X 2 ) − 2μE(X) + E(μ2 ) = E(X 2 ) − 2μ · μ + μ2 = μ2 − μ2 EXAMPLE 10 Use Theorem 6 to calculate the variance of X. 2732 and it follows that σ 2 = 0. If X has the variance σ 2 . We shall prove it here only for the continuous case.4413)2 = 0. let us now prove the following theorem. THEOREM 7. in no way affects the spread of its distribution. we write  q 2 2 σ = E[(X − μ) ] = (x − μ)2 · f (x) dx −q 125 . resulting in a corresponding change in the spread of the distribution. then for any positive constant k the probability is at least 1 − k12 that X will take on a value within k standard deviations of the mean. The following is another theorem that is of importance in work connected with standard deviations or variances.2732 − (0. leaving the discrete case as an exercise. symbolically. then var(aX + b) = a2 σ 2 The proof of this theorem will be left to the reader. we find that the addition of a constant to the values of a random variable. Mathematical Expectation 4 = −1 π = 0. for b = 0. (Chebyshev’s Theorem) If μ and σ are the mean and the stan- dard deviation of a random variable X. called Chebyshev’s theorem after the nineteenth-century Russian mathematician P. σ Z0 k2 Proof According to Definitions 4 and 5. resulting in a shift of all the values of X to the left or to the right. 4 Chebyshev’s Theorem To demonstrate how σ or σ 2 is indicative of the spread or dispersion of the distribu- tion of a random variable. the variance is multiplied by the square of that constant. THEOREM 8. 
we find that if the values of a random variable are multiplied by a constant. but let us point out the following corollaries: For a = 1.0785 = 0. L.2802.0785 √ and σ = 0. 1 P(|x − μ| < kσ ) Ú 1 − . Chebyshev. we can form the inequality  μ−kσ  q σ2 G (x − μ)2 · f (x) dx + (x − μ)2 · f (x) dx −q μ+kσ by deleting the second integral. the probability is at least 1 − 212 = 34 that a random variable X will take on a value within two standard deviations of the mean. we get  μ−kσ  μ+kσ 2 2 σ = (x − μ) · f (x) dx + (x − μ)2 · f (x) dx −q μ−kσ  q + (x − μ)2 · f (x) dx μ+kσ Since the integrand (x − μ)2 · f (x) is nonnegative. Mathematical Expectation Figure 2. we have thus shown that 1 P(|X − μ| G kσ ) F 2 k and it follows that 1 P(|X − μ| < kσ ) G 1 − 2 k For instance. Therefore. the probability is at least 1 − 312 = 89 that it will take on a value within three standard deviations of the mean. dividing the integral into three parts as shown in Figure 2. Diagram for proof of Chebyshev’s theorem. Since the sum of the two integrals on the right-hand side is the probability that X will take on a value less than or equal to μ − kσ or greater than or equal to μ + kσ . and the probability is at least 1 − 512 = 24 25 that it will take on a value within 126 . Then. since (x − μ)2 G k2 σ 2 for x F μ − kσ or x G μ + kσ it follows that  μ−kσ  q σ2 G k2 σ 2 · f (x) dx + k2 σ 2 · f (x) dx −q μ+kσ and hence that  μ−kσ  q 1 G f (x) dx + f (x) dx k2 −q μ+kσ provided σ 2 Z 0. 15. is given by  MX (t) = E(etX ) = etX · f (x) x when X is discrete. the probability given by Chebyshev’s theorem is only a lower bound.80. EXAMPLE 12 If the probability density of X is given by 630x4 (1 − x)4 for 0 < x < 1 f (x) = 0 elsewhere find the probability that it will take on a value within two standard deviations of the mean and compare this probability with the lower bound provided by Chebyshev’s theorem. whether the probability that a given random variable will take on a value within k standard deviations of the mean is actually greater than 1 − k12 and. Mathematical Expectation five standard deviations of the mean. an alternative procedure sometimes provides considerable simplifications.75. so that σ = 1/44 or approximately 0.80 P(0. 127 .20 and 0.96” is a much stronger state- ment than “the probability is at least 0.80) = 630x4 (1 − x)4 dx 0.” which is provided by Chebyshev’s theorem. MOMENT GENERATING FUNCTION.20 = 0. where it exists. by how much we cannot say.20 < X < 0. DEFINITION 6. if so. but Chebyshev’s theorem assures us that this probability cannot be less than 1 − k12 . the probability that X will take on a value within two standard deviations of the mean is the probability that it will take on a value between 0. It is in this sense that σ controls the spread or dispersion of the distribution of a random variable. The moment generating function of a random variable X.96 Observe that the statement “the probability is 0. Solution  Straightforward integration shows that μ = 12 and σ 2 = 44 1 . Only when the distribution of a random variable is known can we calculate the exact probability. 5 Moment-Generating Functions Although the moments of most distributions can be determined directly by evalu- ating the necessary integrals or sums.  0. and  q tX MX (t) = E(e ) = etx · f (x)dx −q when X is continuous. that is. This technique utilizes moment-generating functions. Thus. Clearly. Solution By definition  q MX (t) = E(etX ) = etx · e−x dx 0  q = e−x(1−t) dx 0 1 = for t < 1 1−t As is well known. . 
the rth moment about the origin. 128 . Mathematical Expectation The independent variable is t. In the continuous r! case. . . that is. . let us substitute for etx its Maclaurin’s series expansion. EXAMPLE 13 Find the moment-generating function of the random variable whose probability den- sity is given by e−x for x > 0 f (x) = 0 elsewhere and use it to find an expression for μr . 2. t2 x2 t3 x3 tr xr etx = 1 + tx + + +···+ +··· 2! 3! r! For the discrete case. and we are usually interested in values of t in the neighborhood of 0. the argument is the same. we thus get    t2 x2 tr xr MX (t) = 1 + tx + +···+ + · · · f (x) x 2! r!   t2  2 tr  r = f (x) + t · xf (x) + · x f (x) + · · · + · x f (x) + · · · x x 2! x r! x t2 tr = 1 + μ · t + μ2 · + · · · + μr · + · · · 2! r! and it can be seen that in the Maclaurin’s series of the moment-generating function tr of X the coefficient of is μr . To explain why we refer to this function as a “moment-generating” function. when |t| < 1 the Maclaurin’s series for this moment-generating function is MX (t) = 1 + t + t2 + t3 + · · · + tr + · · · t t2 t3 tr = 1 + 1! · + 2! · + 3! · + · · · + r! · + · · · 1! 2! 3! r! and hence μr = r! for r = 0. 1. but that of expanding it into a Maclaurin’s series. MbX (t) = E(ebXt ) = MX (bt). Mathematical Expectation The main difficulty in using the Maclaurin’s series of a moment-generating func- tion to determine the moments of a random variable is usually not that of finding the moment-generating function.   1  tx 3 3 tX MX (t) = E(e ) = · e 8 x x=0 1 = (1 + 3et + 3e2t + e3t ) 8 1 = (1 + et )3 8 Then. find 8 x the moment-generating function of this random variable and use it to determine μ1 and μ2 . say. If we are interested only in the first few moments of a random variable.     X+a t a t 3. and 3. MX+a (t) = E[e(X+a)t ] = eat · MX (t). M X+a (t) = E[e b ]=e bt · MX . dr MX (t)   = μr dtr t=0 This follows from the fact that if a function is expanded as a power series in t.  3  3 μ1 = MX  (0) = (1 + et )2 et  = 8 t=0 2 and  3 3  μ2 = MX  (0) = (1 + et )e2t + (1 + et )2 et  =3 4 8 t=0 Often the work involved in using moment-generating functions can be simplified by making use of the following theorem. THEOREM 10. then 1. Solution In accordance with Definition 6. their determination can usually be simplified by using the following theorem. 1. EXAMPLE 14   1 3 Given that X has the probability distribution f (x) = for x = 0. If a and b are constants. the tr coefficient of r! is the rth derivative of the function with respect to t at t = 0. 2. 2. b b 129 . THEOREM 9. μ1 and μ2 . by Theorem 9. f (2) = 0. show that the random variable Z whose α4 = 4 values are related to those of X by means of the equation σ z = x−μσ has Use the formula for μ4 obtained in Exercise 25 to find α4 for each of the following symmetrical distributions. f (0) = said to be in standard form. . f (4) = 0.05. . The symmetry or skewness (lack of symmetry) of a ⎧ distribution is often measured by means of the quantity ⎪ ⎨ x for 0 < x < 2 μ3 f (x) = 2 α3 = 3 ⎪ σ ⎩0 elsewhere Use the formula for μ3 obtained in Exercise 25 to deter- mine α3 for each of the following distributions (which 20. f (2) = 0. Prove Theorem 7. and the third part is of special importance when a = −μ and b = σ .20. μr = μr − μr−1 · μ + · · · + (−1)i μr−i · μi 1 i 18.11. g(X) = 2X + 3. f (−2) = 0. The extent to which a distribution is peaked or flat. and when we perform the 0. f (3) = 0. f (2) = 0. we are said to be standardizing (b) f (−3) = 0. . and f (6) = 0.10. show that μ0 = 1 25.20. 
f (3) = 0. the probability density 26. μ2 . f (x) = ln 3 x ⎪ ⎩0 elsewhere Also draw histograms of the two distributions and note that whereas the first is symmetrical. find the variance of 27. μ4 dard deviation σ . for r = 1. and f (3) = 0.10. Show that if X is a random variable with the mean μ for which f (x) = 0 for x < 0.06.15. Duplicate the steps used in the proof of Theorem 8 to prove Chebyshev’s theorem for a discrete random vari- 2x−3 for x > 1 able X. for 1 < x < 3 f (5) = 0. f (1) = 0. also called the kurtosis of the distribution. above change of variable.50. 24. Find μr and σ 2 for the random variable X that has the have equal means and standard deviations): probability density (a) f (1) = 0.15. f (−1) = 0. 130 .30. If the random variable X has the mean μ and the stan. and f (6) = 0. 0. in which case   μt t M X−μ (t) = e− σ · MX σ σ Exercises 17.05. Find μ. and f (3) = 0.30. f (1) = 0. Show that and that μ1 = 0 for any random variable for which    r r E(X) exists. Mathematical Expectation The proof of this theorem is left to the reader in Exercise 39.04. then for any positive con- check whether its mean and its variance exist. μ2 .10. 2. 22. f (5) = 0.30. and use this formula to express μ3 and 19. If the probability density of X is given by 28.05. f (−1) = 0. f (x) = 0 elsewhere 29. The first part of the theorem is of special importance when a = −μ. f (0) = the distribution of X. f (−2) = 0.45.15. . With reference to Exercise 8.06. is often mea- sured by means of the quantity 23. 3. Find μ. With reference to Definition 4. E(Z) = 0 and var(Z) = 1 of which the first is more peaked (narrow humped) than the second: A distribution that has the mean 0 and the variance 1 is (a) f (−3) = 0.09.04. skewed. stant a. f (2) = 0.09. f (4) = 0.20.05. the second has a “tail” on the left-hand side and is said to be negatively 21. and σ 2 for the random variable X that has + · · · + (−1)r−1 (r − 1) · μr the probability distribution f (x) = 12 for x = −2 and x = 2.11. and σ 2 for the random variable X that has μ4 in terms of moments about the origin. ⎧ ⎪ ⎨ 1 ·1 (b) f (1) = 0. and use it to determine the and use it to determine the values of μ1 and μ2 . find the moment-generating function of the ran- 3 dom variable Z = 14 (X − 3). and we have generating function given it here mainly because it leads to a relatively simple MX (t) = e4(e −1) t alternative proof of Chebyshev’s theorem. find the variance of the given by random variable by 1 for 0 < x < 1 (a) expanding the moment-generating function as an infi- f (x) = 0 elsewhere nite series and reading off the necessary coefficients. PRODUCT MOMENTS ABOUT THE ORIGIN. 3. 34. and  q  q μr.  μr. What is the smallest value of k in Chebyshev’s theo- rem for which the probability that a random variable will 37. Show that if a random variable has the probabil- take on a value between μ − kσ and μ + kσ is ity density (a) at least 0. . 2. 6 Product Moments To continue the discussion of Section 3. denoted by ␮r. and σ 2 . . Prove the three parts of Theorem 10. . 1 −|x| (b) at least 0.s . when X and Y are discrete. . Find the moment-generating function of the contin- uous random variable X whose probability density is 38. Explain why there can be no random variable for which MX (t) = 1−t t . random variable X that has the probability distribution  x 40. Find the moment-generating function of the discrete 39. If we let RX (t) = ln MX (t). (b) using Theorem 9. y) x y for r = 0. symbolically. DEFINITION 7.99? f (x) = e for − q < x < q 2 32. 31. 
use these results to find the mean and the variance of a random variable X having the moment- This inequality is called Markov’s inequality. With reference to Exercise 37. y)dxdy −q −q when X and Y are continuous. Given the moment-generating function MX (t) = 1 2 f (x) = 2 for x = 1. [Hint: Substitute (X − μ)2 for X. is the expected value of Xr Ys . 1.s = E(X r Y s ) = xr ys · f (x.95.] 36. Mathematical Expectation μ 35. The rth and sth product moment about the origin of the random variables X and Y. . . e3t+8t . and s = 0. 1. 2. . . 30. μ2 . . 2. what does its moment-generating function is given by this theorem assert about the probability that a random variable will take on a value between μ − c and μ + c? 1 MX (t) = 1 − t2 33.s = E(X r Y s ) = xr ys · f (x. Also. and use it to find μ1 . mean and the variance of Z. Use the inequality of Exercise 29 to prove Cheby- shev’s theorem. If we let kσ = c in Chebyshev’s theorem. let us now present the product moments of two random variables. show that RX (0) = μ and P(X G a) F a RX (0) = σ 2 . 131 . μ1. symbolically. which we denote here by μX . Y). Y).s = E[(X − μX )r (Y − μY )s ]  = (x − μX )r (y − μY )s ·f (x. denoted by ␮r. . if any. Analogous to Definition 4. and that μ0. In statistics. 1 − μX μY Proof Using the various theorems about expected values. THEOREM 11.1 is of special importance because it is indicative of the relation- ship. let us now state the following definition of product moments about the respective means. Let us now prove the following result. or C(X. y)dxdy −q −q when X and Y are continuous. which we denote here by μY . DEFINITION 9. and vice versa. and μr. PRODUCT MOMENTS ABOUT THE MEAN. the covariance will be posi- tive. 2. we can write σXY = E[(X − μX )(Y − μY )] = E(XY − XμY − YμX + μX μY ) = E(XY) − μY E(X) − μX E(Y) + μX μY = E(XY) − μY μX − μX μY + μX μY = μ1. the double summation extends over the entire joint range of the two random variables. when X and Y are discrete. and it is denoted by ␴XY . DEFINITION 8. which is useful in actually determining covariances.1 is called the covariance of X and Y. is the expected value of (X − ␮X)r (Y − ␮Y )s . it is given a special symbol and a special name. . cov(X. COVARIANCE. thus. μr. ␮1.s = E[(X − μX )r (Y − μY )s ]  q  q = (x − μX )r (y − μY )s · f (x. 2. . and s = 0.1 = E(Y). Note that μ1. σXY = μ1. It is in this sense that the covariance measures the relationship.0 = E(X). 1. Observe that if there is a high probability that large values of X will go with large values of Y and small values of X with small values of Y.s . . . . the covariance will be negative. Mathematical Expectation In the discrete case. 1. The rth and sth product moment about the means of the random variables X and Y. y) x y for r = 0. analogous to Theorem 6. between the values of X and Y. if there is a high probability that large values of X will go with small values of Y. 1 − μX μY 132 . between the values of X and Y. or association. and four laxative caplets. y) = 0 elsewhere 133 . two sedative. Mathematical Expectation EXAMPLE 15 The joint and marginal probabilities of X and Y. we get μ1. makes sense. Solution Referring to the joint probabilities given here. and vice versa. y > 0. x + y < 1 f (x. of course. we get 5 1 1 2 μX = E(X) = 0 · +1· +2· = 12 2 12 3 and 7 7 1 4 μY = E(Y) = 0 · +1· +2· = 12 18 36 9 It follows that 1 2 4 7 σXY = − · =− 6 3 9 54 The negative result suggests that the more aspirin tablets we get the fewer sedative tablets we will get. 
the numbers of aspirin and sedative caplets among two caplets drawn at random from a bottle containing three aspirin. 1 = E(XY) 1 2 1 1 1 1 = 0·0· +0·1· +0·2· +1·0· +1·1· +2·0· 6 9 36 3 6 12 1 = 6 and using the marginal probabilities. and this. EXAMPLE 16 Find the covariance of the random variables whose joint probability density is given by 2 for x > 0. are recorded as follows: x 0 1 2 1 1 1 7 0 6 3 12 12 2 1 7 y 1 9 6 18 1 1 2 36 36 5 1 1 12 2 12 Find the covariance of X and Y. 134 . we can write f (x. but a zero covariance does not necessarily imply their indepen- dence. we have the following theorem.1 − μX μY = E(X) · E(Y) − E(X) · E(Y) =0 It is of interest to note that the independence of two random variables implies a zero covariance. If X and Y are independent. by definition. This is illustrated by the following example (see also Exercises 46 and 47). then E(XY) = E(X) · E(Y) and σXY = 0. observe that if X and Y are independent. Mathematical Expectation Solution Evaluating the necessary integrals. symbolically. y) x y Since X and Y are independent.1 = 2xy dy dx = 0 0 12 It follows that 1 1 1 1 σXY = − · =− 12 3 3 36 As far as the relationship between X and Y is concerned. Proof For the discrete case we have. y) = g(x) · h(y). THEOREM 12. we get  1  1−x 1 μX = 2x dy dx = 0 0 3  1  1−x 1 μY = 2y dy dx = 0 0 3 and  1  1−x 1  σ1. and we get  E(XY) = xy · g(x)h(y) x y ⎡ ⎤⎡ ⎤   =⎣ x · g(x)⎦ ⎣ y · h(y)⎦ x y = E(X) · E(Y) Hence. where g(x) and h(y) are the values of the marginal distributions of X and Y.  E(XY) = xy · f (x. σXY = μ1. their covariance is zero. but the two random variables are not independent. . . then E(X1 X2 · . . . Xn are independent. For instance. 135 . we get 1 1 1 μX = (−1) · +0· +1· = 0 3 3 3 2 1 1 μY = (−1) · + 0 · 0 + 1 · = − 3 3 3 and 1 1 1 1 1 μ1. . . If X1 . . . y) Z g(x) · h(y) for x = −1 and y = −1. · E(Xn ) This is a generalization of the first part of Theorem 12. Solution Using the probabilities shown in the margins. in the following theorem. σXY = 0 − 0(− 13 ) = 0. Here let us merely state the important result. the covariance is zero. THEOREM 13. X2 . Mathematical Expectation EXAMPLE 17 If the joint probability distribution of X and Y is given by x −1 0 1 1 1 1 2 −1 6 3 6 3 y 0 0 0 0 0 1 1 1 1 0 6 6 3 1 1 1 3 3 3 show that their covariance is zero even though the two random variables are not independent.1 = (−1)(−1) · + 0(−1) · + 1(−1) · + (−1)1 · + 1 · 1 · 6 3 6 6 6 =0 Thus. Product moments can also be defined for the case where there are more than two random variables. f (x. · Xn ) = E(X1 ) · E(X2 ) · . Xk ) = Xi for i = 0. . 1. . . Xi ). Mathematical Expectation 7 Moments of Linear Combinations of Random Variables In this section we shall derive expressions for the mean and the variance of a linear combination of n random variables and the covariance of two linear combinations of n random variables. Xn are random variables and  n Y= ai Xi i=1 where a1 . X2 . . THEOREM 14. for which i < j. . 136 . . n. . let us write μi for E(Xi ) so that we get ⎧⎡ ⎤2 ⎫   ⎪ ⎨  n n ⎪ ⎬ var(Y) = E [Y − E(Y)]2 = E ⎣ ai Xi − ai E(Xi )⎦ ⎪ ⎩ i=1 ⎪ ⎭ i=1 ⎧⎡ ⎤2 ⎫ ⎪ ⎨ n ⎪ ⎬ =E ⎣ ai (Xi − μi )⎦ ⎪ ⎩ i=1 ⎪ ⎭ Then. from 1 to n. . a2 . expanding by means of the multinomial theorem. . we get  n  var(Y) = a2i E[(Xi − μi )2 ] + 2 ai aj E[(Xi − μi )(Xj − μj )] i=1 i<j  n  = a2i · var(Xi ) + 2 ai aj · cov(Xi . . for example. . . To obtain the expression for the variance of Y. . . 
then  n E(Y) = ai E(Xi ) i=1 and  n  var(Y) = a2i · var(Xi ) + 2 ai aj · cov(Xi Xj ) i=1 i<j where the double summation extends over all values of i and j. . . Xj ) = cov(Xj . 2. and again referring to Theorem 5. If X1 . it follows immediately that ⎛ ⎞ n  n E(Y) = E ⎝ ai Xi ⎠ = ai E(Xi ) i=1 i=1 and this proves the first part of the theorem. Applications of these results will be important in our later discussion of sampling theory and problems of statistical inference. Xj ) i=1 i<j Note that we have tacitly made use of the fact that cov(Xi . according to which (a + b + c + d)2 . Proof From Theorem 5 with gi (X1 . an are constants. X2 . equals a2 + b2 + c2 + d2 + 2ab + 2ac + 2ad + 2bc + 2bd + 2cd. X2 . will be left to the reader in Exercise 52. the variances σX2 = 1. . . . we obtain the following corollary. . Xj ) = 0 when Xi and Xj are independent. Mathematical Expectation Since cov(Xi . we get E(W) = E(3X − Y + 2Z) = 3E(X) − E(Y) + 2E(Z) = 3 · 2 − (−3) + 2 · 4 = 17 and var(W) = 9 var(X) + var(Y) + 4 var(Z) − 6 cov(X. then i=1  n var(Y) = a2i · var(Xi ) i=1 EXAMPLE 18 If the random variables X. If the random variables X1 . and cov(Y. b2 . it concerns the covariance of two linear combinations of n random variables. Since cov(Xi . . . b1 . we obtain the following corollary. which is very similar to that of Theorem 14. Xn are independent and  n Y= ai Xi . Y) + 12 cov(X. and σZ2 = 2. Xj ) i=1 i<j The proof of this theorem. Z) = 1. . and μZ = 4. Xn are random variables and  n  n Y1 = ai Xi and Y2 = bi Xi i=1 i=1 where a1 . Y) = −2. Xj ) = 0 when Xi and Xj are independent. If X1 . . σY2 = 5. . bn are constants. and Z have the means μX = 2. . Y2 ) = ai bi · var(Xi ) + (ai bj + aj bi ) · cov(Xi . Solution By Theorem 14. then  n  cov(Y1 . a2 . Z) = −1. . . 137 . and the covariances cov(X. an . . find the mean and the variance of W = 3X − Y + 2Z. THEOREM 15. cov(X. Y. X2 . . COROLLARY 3. Z) − 4 cov(Y. . Z) = 9 · 1 + 5 + 4 · 2 − 6(−2) + 12(−1) − 4 · 1 = 18 The following is another important theorem about linear combinations of ran- dom variables. μY = −3. . σY2 = 12. CONDITIONAL EXPECTATION. Y. . μY = 5. or integrating the values of conditional probability densities. Y) + 5 cov(X. cov(X. Y) = 1. Mathematical Expectation COROLLARY 4. 3X − Y − Z) = 3 var(X) − 4 var(Y) − 2 var(Z) + 11 cov(X. Xn are independent. Z) = 2. the variances σX2 = 8. and cov(Y. the conditional expectation of u(X) given Y = y is  E[u(X)|y)] = u(x) · f (x|y) x Correspondingly. Y2 ) = ai bi · var(Xi ) i=1 EXAMPLE 19 If the random variables X. then i=1 i=1  n cov(Y1 . and μZ = 2. . Z) = 3 · 8 − 4 · 12 − 2 · 18 + 11 · 1 + 5(−3) − 6 · 2 = −76 8 Conditional Expectations Conditional probabilities are obtained by adding the values of conditional prob- ability distributions. Z) = −3. Z) − 6 cov(Y. and cov(X. and Z have the means μX = 3. and σZ2 = 18. and f(x|y) is the value of the conditional probability distribution of X given Y = y at x. . we get cov(U. the conditional expectation of u(X) given Y = y is  q E[(u(X)|y)] = u(x) · f (x|y)dx =q Similar expressions based on the conditional probability distribution or density of Y given X = x define the conditional expectation of υ(Y) given X = x. 138 . DEFINITION 10. if X is a continuous variable and f(x|y) is the value of the condi- tional probability distribution of X given Y = y at x. V) = cov(X + 4Y + 2Z. Y1 = n n ai Xi and Y2 = bi Xi . Conditional expectations of random variables are likewise defined in terms of their conditional distributions. If X is a discrete random variable. 
If the random variables X, Y, and Z have given variances and covariances, find the covariance of U = X + 4Y + 2Z and V = 3X − Y − Z.

Solution
By Theorem 15,

cov(U, V) = 3 var(X) − 4 var(Y) − 2 var(Z) + 11 cov(X, Y) + 5 cov(X, Z) − 6 cov(Y, Z)

into which the given variances and covariances can be substituted.

If we let u(X) = X in Definition 10, we obtain the conditional mean of the random variable X given Y = y, which we denote by

μ_{X|y} = E(X|y)

Correspondingly, the conditional variance of X given Y = y is

σ²_{X|y} = E[(X − μ_{X|y})²|y] = E(X²|y) − μ²_{X|y}

where E(X²|y) is given by Definition 10 with u(X) = X². The reader should not find it difficult to generalize Definition 10 for conditional expectations involving more than two random variables.

EXAMPLE 20
If the joint probability density of X and Y is given by

f(x, y) = (2/3)(x + 2y) for 0 < x < 1, 0 < y < 1; 0 elsewhere

find the conditional mean and the conditional variance of X given Y = 1/2.

Solution
For these random variables the conditional density of X given Y = y is

f(x|y) = (2x + 4y)/(1 + 4y) for 0 < x < 1; 0 elsewhere

so that

f(x | 1/2) = (2/3)(x + 1) for 0 < x < 1; 0 elsewhere

Thus, μ_{X|1/2} is given by

E(X | 1/2) = \int_0^1 x · (2/3)(x + 1) dx = 5/9

Next we find

E(X² | 1/2) = \int_0^1 x² · (2/3)(x + 1) dx = 7/18

and it follows that

σ²_{X|1/2} = 7/18 − (5/9)² = 13/162

Exercises

41. For k random variables X₁, X₂, ..., X_k, the values of their joint moment-generating function are given by E(e^{t₁X₁ + t₂X₂ + ··· + t_kX_k}).
(a) Show for either the discrete case or the continuous case that the partial derivative of the joint moment-generating function with respect to tᵢ at t₁ = t₂ = ··· = t_k = 0 is E(Xᵢ).
(b) Show for either the discrete case or the continuous case that the second partial derivative of the joint moment-generating function with respect to tᵢ and tⱼ, i ≠ j, at t₁ = t₂ = ··· = t_k = 0 is E(XᵢXⱼ).
(c) If two random variables have the joint density given by f(x, y) = e^{−x−y} for x > 0, y > 0, and 0 elsewhere, find their joint moment-generating function and use it to determine the values of E(XY), E(X), E(Y), and cov(X, Y).

42. This question has been intentionally omitted for this edition.

43. If the probability density of X is given by f(x) = 1 + x for −1 < x ≤ 0, f(x) = 1 − x for 0 < x < 1, and f(x) = 0 elsewhere, and U = X and V = X², show that
(a) cov(U, V) = 0;
(b) U and V are dependent.

44. If X and Y have the joint probability distribution f(−1, 0) = 0, f(−1, 1) = 1/4, f(0, 0) = 1/6, f(0, 1) = 0, f(1, 0) = 1/12, and f(1, 1) = 1/2, show that
(a) cov(X, Y) = 0;
(b) the two random variables are not independent.

45. If var(X₁) = 5, var(X₂) = 4, var(X₃) = 7, cov(X₁, X₂) = 3, cov(X₁, X₃) = −2, and X₂ and X₃ are independent, find the covariance of Y₁ = X₁ − 2X₂ + 3X₃ and Y₂ = −2X₁ + 3X₂ + 4X₃.

46. This question has been intentionally omitted for this edition.

47. Prove Theorem 15.

48. Prove that cov(X, Y) = cov(Y, X) for both discrete and continuous random variables X and Y.

49. If X₁, X₂, and X₃ are independent and have the means 4, 9, and 3 and the variances 3, 7, and 5, find the mean and the variance of
(a) Y = 2X₁ − 3X₂ + 4X₃;
(b) Z = X₁ + 2X₂ − X₃.

50. Repeat both parts of Exercise 49, dropping the assumption of independence and using instead the information that cov(X₁, X₂) = 1, cov(X₂, X₃) = −2, and cov(X₁, X₃) = −3.

51. If the joint probability density of X and Y is given by f(x, y) = (1/3)(x + y) for 0 < x < 1, 0 < y < 2, and 0 elsewhere, find the variance of W = 3X + 4Y − 5.

52. If X and Y have the joint probability distribution f(x, y) = 1/4 for x = −3 and y = −5, x = −1 and y = −1, x = 1 and y = 1, and x = 3 and y = 5, find cov(X, Y).

53. Express var(X + Y), var(X − Y), and cov(X + Y, X − Y) in terms of the variances and covariance of X and Y.

54. This question has been intentionally omitted for this edition.

55. This question has been intentionally omitted for this edition.

56. This question has been intentionally omitted for this edition.

57. This question has been intentionally omitted for this edition.

58. This question has been intentionally omitted for this edition.

59. This question has been intentionally omitted for this edition.

60. (a) Show that the conditional distribution function of the continuous random variable X, given a < X ≤ b, is given by

F(x | a < X ≤ b) = 0 for x ≤ a; [F(x) − F(a)] / [F(b) − F(a)] for a < x ≤ b; 1 for x > b

(b) Differentiate the result of part (a) with respect to x to find the conditional probability density of X given a < X ≤ b, and show that

E[u(X) | a < X ≤ b] = \int_a^b u(x) f(x) dx / \int_a^b f(x) dx

9 The Theory in Practice

Empirical distributions, those arising from data, can be described by their shape. Here we will discuss descriptive measures, calculated from data, that extend the methodology of describing data. These descriptive measures, based on the ideas of moments given in Section 3, give the location of data and describe their dispersion.

The analog of the first moment, μ′₁ = μ, is the sample mean, x̄, defined as

x̄ = \sum_{i=1}^{n} x_i / n

where i = 1, 2, ..., n and n is the number of observations.

The usefulness of the sample mean as a description of data can be envisioned by imagining that the histogram of a data distribution has been cut out of a piece of cardboard and balanced by inserting a fulcrum along the horizontal axis. This balance point corresponds to the mean of the data. Thus, the mean can be thought of as the centroid of the data and, as such, it describes its location. The mean is an excellent measure of location for symmetric or nearly symmetric distributions. But it can be misleading when used to measure the location of highly skewed data.

To give an example, suppose, in a small company, the annual salaries of its 10 employees (rounded to the nearest $1,000) are 25, 18, 36, 28, 16, 20, 29, 32, 41, and 150. The mean of these observations is $39,500. One of the salaries, namely $150,000, is much higher than the others (it's what the owner pays himself) and only one other employee earns as much as $39,500. Suppose the owner, in a recruiting ad, claimed that "Our company pays an average salary of $39,500." He would be technically correct, but very misleading.

Other descriptive measures for the location of data should be used in cases like the one just described. The median describes the center of the data as the middle point of the observations. If the data are ranked from, say, smallest to largest, the median becomes observation number (n + 1)/2 if n is an odd integer, and it is defined as the mean value of observations n/2 and (n/2) + 1 if n is an even integer. The median of the 10 observations given in the preceding example is $28,500, and it is a much better description of what an employee of this company can expect to earn. You may very well have heard the term "median income" for, say, the incomes of American families. The median is used instead of the mean here because it is well known that the distribution of family incomes in the United States is highly skewed: the great majority of families earn low to moderate incomes, but a relatively few have very high incomes.
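As a quick illustration, the following minimal Python sketch (added here for illustration; the variable names are ours, not the text's) reproduces the mean and the median of the ten salaries and shows how the one large salary pulls the mean upward:

```python
# Why the median beats the mean for skewed data: the ten salaries
# (in $1,000s) from the example above.
salaries = [25, 18, 36, 28, 16, 20, 29, 32, 41, 150]

n = len(salaries)
mean = sum(salaries) / n          # sum of the observations divided by n

ranked = sorted(salaries)         # rank from smallest to largest
if n % 2 == 1:                    # odd n: the middle observation
    median = ranked[n // 2]
else:                             # even n: mean of the two middle observations
    median = (ranked[n // 2 - 1] + ranked[n // 2]) / 2

print(mean)    # 39.5  -> "$39,500", inflated by the single $150,000 salary
print(median)  # 28.5  -> "$28,500", a better summary of a typical salary
```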
The dispersion of data also is important in its description. Once the location of data is given, one reasonably wants to know how closely the observations are grouped around this value. A reasonable measure of dispersion can be based on the square root of the second moment about the mean, σ. The sample standard deviation, s, is calculated analogously to the second moment, as follows:

s = \sqrt{ \sum_{i=1}^{n} (x_i − x̄)² / (n − 1) }

Since this formula requires first the calculation of the mean, then subtraction of the mean from each observation before squaring and adding, it is much easier to use the following calculating formula for s:

s = \sqrt{ \left[ n \sum_{i=1}^{n} x_i² − \left( \sum_{i=1}^{n} x_i \right)² \right] / [n(n − 1)] }

Note that in both formulas we divide by n − 1 instead of n.

EXAMPLE 21
The following are the lengths (in feet) of 10 steel beams rolled in a steel mill and cut to a nominal length of 12 feet:

11.8  12.1  12.5  11.7  11.9  12.0  12.2  11.5  11.9  12.2

Calculate the mean length and its standard deviation. Is the mean a reasonable measure of the location of the data? Why or why not?

Solution
The mean is given by the sum of the observations, 11.8 + 12.1 + ··· + 12.2 = 119.8, divided by 10, or x̄ = 11.98 feet. The mean seems to be a reasonable measure of location inasmuch as the data seem to be approximately symmetrically distributed. To calculate the standard deviation, we first calculate the sum of the squares of the observations, (11.8)² + (12.1)² + ··· + (12.2)² = 1,435.94. Then, substituting into the calculating formula for s, we obtain

s² = [(10)(1,435.94) − (119.8)²] / [(10)(9)] = 0.0818

and, taking the square root, s = 0.29 foot.

Using either formula for the calculation of s requires tedious calculation, but every statistical computer program in common use will calculate both the sample mean and the sample standard deviation once the data have been inputted.

The standard deviation is not the only measure of the dispersion, or variability, of data. The sample range sometimes is used for this purpose. To calculate the range, we find the largest and the smallest observations, x_l and x_s, defining the range to be

r = x_l − x_s

This measure of dispersion is used only for small samples; for larger and larger sample sizes, the range becomes a poorer and poorer measure of dispersion.

Applied Exercises   SECS. 1–2

61. This question has been intentionally omitted for this edition.

62. The probability that Ms. Brown will sell a piece of property at a profit of $3,000 is 3/20, the probability that she will sell it at a profit of $1,500 is 7/20, the probability that she will break even is 7/20, and the probability that she will lose $1,500 is 3/20. What is her expected profit?

63. A game of chance is considered fair, or equitable, if each player's expectation is equal to zero. If someone pays us $10 each time that we roll a 3 or a 4 with a balanced die, how much should we pay that person when we roll a 1, 2, 5, or 6 to make the game equitable?

64. The manager of a bakery knows that the number of chocolate cakes he can sell on any given day is a random variable having the probability distribution f(x) = 1/6 for x = 0, 1, 2, 3, 4, and 5. He also knows that there is a profit of $1.00 for each cake that he sells and a loss (due to spoilage) of $0.40 for each cake that he does not sell. Assuming that each cake can be sold only on the day it is made, find the baker's expected profit for a day on which he bakes
(a) one of the cakes;
(b) two of the cakes;
(c) three of the cakes;
(d) four of the cakes;
(e) five of the cakes.
How many should he bake in order to maximize his expected profit?
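The calculating formula for s is easy to mechanize. The following minimal Python sketch (our illustration, using the beam data of Example 21) carries out exactly the arithmetic shown in the solution above:

```python
from math import sqrt

# Example 21 with the calculating formula
# s = sqrt([n*sum(x^2) - (sum x)^2] / [n*(n-1)]).
lengths = [11.8, 12.1, 12.5, 11.7, 11.9, 12.0, 12.2, 11.5, 11.9, 12.2]

n = len(lengths)
sum_x = sum(lengths)                    # 119.8
sum_x2 = sum(x * x for x in lengths)    # 1435.94

mean = sum_x / n                        # 11.98 feet
s = sqrt((n * sum_x2 - sum_x ** 2) / (n * (n - 1)))

print(mean, round(s, 2))                # 11.98 and 0.29 (s^2 = 0.0818)
```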
65. If a contractor's profit on a construction job can be looked upon as a continuous random variable having the probability density f(x) = (1/18)(x + 1) for −1 < x < 5 and 0 elsewhere, where the units are in $1,000, what is her expected profit?

66. This question has been intentionally omitted for this edition.

67. This question has been intentionally omitted for this edition.

SECS. 3–5

68. A study of the nutritional value of a certain kind of bread shows that the amount of thiamine (vitamin B₁) in a slice may be looked upon as a random variable with μ = 0.260 milligram and σ = 0.005 milligram. According to Chebyshev's theorem, between what values must be the thiamine content of
(a) at least 35/36 of all slices of this bread;
(b) at least 143/144 of all slices of this bread?

69. Mr. Adams and Ms. Smith are betting on repeated flips of a coin. At the start of the game Mr. Adams has a dollars and Ms. Smith has b dollars, at each flip the loser pays the winner one dollar, and the game continues until either player is "ruined." Making use of the fact that in an equitable game each player's mathematical expectation is zero, find the probability that Mr. Adams will win Ms. Smith's b dollars before he loses his a dollars.

70. A quarter is bent so that the probabilities of heads and tails are 0.40 and 0.60. If it is tossed twice, what is the covariance of Z, the number of heads obtained on the first toss, and W, the total number of heads obtained in the two tosses of the coin?

71. The amount of time it takes a person to be served at a given restaurant is a random variable with the probability density f(x) = (1/4)e^{−x/4} for x > 0 and 0 elsewhere. Find the mean and the variance of this random variable.

72. With reference to Exercise 71, what can we assert about the amount of time it takes a person to be served at the given restaurant if we use Chebyshev's theorem with k = 1.5? What is the corresponding probability rounded to four decimals?

73. This question has been intentionally omitted for this edition.

74. This question has been intentionally omitted for this edition.

75. The number of marriage licenses issued in a certain city during the month of June may be looked upon as a random variable with μ = 124 and σ = 7.5. According to Chebyshev's theorem, with what probability can we assert that between 64 and 184 marriage licenses will be issued there during the month of June?

76. This question has been intentionally omitted for this edition.

77. The following are some applications of the Markov inequality of Exercise 29:
(a) The scores that high school juniors get on the verbal part of the PSAT/NMSQT test may be looked upon as values of a random variable with the mean μ = 41. Find an upper bound to the probability that one of the students will get a score of 65 or more.
(b) The weight of certain animals may be looked upon as a random variable with a mean of 212 grams. If none of the animals weighs less than 165 grams, find an upper bound to the probability that such an animal will weigh at least 250 grams.

SECS. 6–9

78. This question has been intentionally omitted for this edition.

79. The inside diameter of a cylindrical tube is a random variable with a mean of 3 inches and a standard deviation of 0.02 inch, the thickness of the tube is a random variable with a mean of 0.3 inch and a standard deviation of 0.005 inch, and the two random variables are independent. Find the mean and the standard deviation of the outside diameter of the tube.

80. The length of certain bricks is a random variable with a mean of 8 inches and a standard deviation of 0.1 inch, and the thickness of the mortar between two bricks is a random variable with a mean of 0.5 inch and a standard deviation of 0.03 inch. What is the mean and the standard deviation of the length of a wall made of 50 of these bricks laid side by side, if we can assume that all the random variables involved are independent?
81. If heads is a success when we flip a coin, getting a six is a success when we roll a die, and getting an ace is a success when we draw a card from an ordinary deck of 52 playing cards, find the mean and the standard deviation of the total number of successes when we
(a) flip a balanced coin, roll a balanced die, and then draw a card from a well-shuffled deck;
(b) flip a balanced coin three times, roll a balanced die twice, and then draw a card from a well-shuffled deck.

82. If we alternately flip a balanced coin and a coin that is loaded so that the probability of getting heads is 0.45, what are the mean and the standard deviation of the number of heads that we obtain in 10 flips of these coins?

83. This question has been intentionally omitted for this edition.

84. This question has been intentionally omitted for this edition.

85. The amount of time (in minutes) that an executive of a certain firm talks on the telephone is a random variable having the probability density f(x) = x/4 for 0 < x ≤ 2, f(x) = 4/x³ for x > 2, and f(x) = 0 elsewhere. With reference to part (b) of Exercise 60, find the expected length of one of these telephone conversations that has lasted at least 1 minute.

Answers to Odd-Numbered Exercises

1 (a) g₁ = 0, g₂ = 1, g₃ = 4, and g₄ = 9; (b) f(0), f(−1) + f(1), f(−2) + f(2), and f(3); (c) 0·f(0) + 1·{f(−1) + f(1)} + 4·{f(−2) + f(2)} + 9·f(3) = (−2)²·f(−2) + (−1)²·f(−1) + 0²·f(0) + 1²·f(1) + 2²·f(2) + 3²·f(3) = Σ g(x)·f(x).
3 Replace the integrals with sums in the proof of Theorem 3.
5 (a) E(X) = \int \int x f(x, y) dy dx; (b) E(X) = \int x g(x) dx.
7 E(Y) = 37/12.
11 −11/6.
13 1/2.
15 1/2.
19 μ = 4/3 and σ² = 2/9.
25 μ₃ = μ′₃ − 3μμ′₂ + 2μ³ and μ₄ = μ′₄ − 4μμ′₃ + 6μ²μ′₂ − 3μ⁴.
33 M_X(t) = 2e^t/(3 − e^t); μ′₁ = 3/2, μ′₂ = 3, and σ² = 3/4.
45 72.
49 (a) μ_Y = −7, σ²_Y = 155; (b) μ_Z = 19, σ²_Z = 36.
51 805/162.
53 var(X) + var(Y) + 2cov(X, Y), var(X) + var(Y) − 2cov(X, Y), and var(X) − var(Y).
63 $5.
65 $3,000.
69 a/(a + b).
71 μ = 4, σ² = 16.
75 At least 63/64.
77 (a) 0.63; (b) 0.55.
79 μ = 3.6 inches, σ = 0.0224 inch.
81 (a) μ = 0.74, σ = 0.68; (b) μ = 1.91, σ = 1.05.
85 2.95 min.
Special Probability Distributions

1 Introduction
2 The Discrete Uniform Distribution
3 The Bernoulli Distribution
4 The Binomial Distribution
5 The Negative Binomial and Geometric Distributions
6 The Hypergeometric Distribution
7 The Poisson Distribution
8 The Multinomial Distribution
9 The Multivariate Hypergeometric Distribution
10 The Theory in Practice

1 Introduction

In this chapter we shall study some of the probability distributions that figure most prominently in statistical theory and applications. We shall also study their parameters, that is, the quantities that are constants for particular distributions but that can take on different values for different members of families of distributions of the same kind. The most common parameters are the lower moments, mainly μ and σ², and there are essentially two ways in which they can be obtained: We can evaluate the necessary sums directly or we can work with moment-generating functions. Although it would seem logical to use in each case whichever method is simplest, we shall sometimes use both. In some instances this will be done because the results are needed later; in others it will merely serve to provide the reader with experience in the application of the respective mathematical techniques. Also, to keep the size of this chapter within bounds, many of the details are left as exercises.

From Chapter 5 of John E. Freund's Mathematical Statistics with Applications, Eighth Edition. Irwin Miller, Marylees Miller. Copyright © 2014 by Pearson Education, Inc. All rights reserved.

2 The Discrete Uniform Distribution

If a random variable can take on k different values with equal probability, we say that it has a discrete uniform distribution; symbolically, we have the following definition.

DEFINITION 1. DISCRETE UNIFORM DISTRIBUTION. A random variable X has a discrete uniform distribution and it is referred to as a discrete uniform random variable if and only if its probability distribution is given by

f(x) = 1/k   for x = x₁, x₂, ..., x_k

where xᵢ ≠ xⱼ when i ≠ j.

In the special case where xᵢ = i, the discrete uniform distribution becomes

f(x) = 1/k   for x = 1, 2, ..., k

and in this form it applies, for example, to the number of points we roll with a balanced die. The mean and the variance of this discrete uniform distribution and its moment-generating function are treated in Exercises 1 and 2.
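The results of Exercises 1 and 2 can be checked by direct enumeration. The following minimal Python sketch (our illustration; exact fractions are used so that no rounding intrudes) verifies that the special case x = 1, 2, ..., k has mean (k + 1)/2 and variance (k² − 1)/12:

```python
from fractions import Fraction

# Direct check of the discrete uniform distribution with x = 1, 2, ..., k:
# mean (k + 1)/2 and variance (k^2 - 1)/12 (the results of Exercises 1 and 2).
k = 6                                   # e.g., the roll of a balanced die
values = range(1, k + 1)
p = Fraction(1, k)                      # f(x) = 1/k for every value

mu = sum(x * p for x in values)
var = sum((x - mu) ** 2 * p for x in values)

print(mu, var)                          # 7/2 and 35/12
assert mu == Fraction(k + 1, 2)
assert var == Fraction(k * k - 1, 12)
```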
3 The Bernoulli Distribution

If an experiment has two possible outcomes, "success" and "failure," and their probabilities are, respectively, θ and 1 − θ, then the number of successes, 0 or 1, has a Bernoulli distribution; symbolically, we have the following definition.

DEFINITION 2. BERNOULLI DISTRIBUTION. A random variable X has a Bernoulli distribution and it is referred to as a Bernoulli random variable if and only if its probability distribution is given by

f(x; θ) = θ^x (1 − θ)^{1−x}   for x = 0, 1

Thus, f(0; θ) = 1 − θ and f(1; θ) = θ are combined into a single formula. Observe that we used the notation f(x; θ) to indicate explicitly that the Bernoulli distribution has the one parameter θ.

In connection with the Bernoulli distribution, a success may be getting heads with a balanced coin, it may be catching pneumonia, it may be passing (or failing) an examination, and it may be losing a race. This inconsistency is a carryover from the days when probability theory was applied only to games of chance (and one player's failure was the other's success). Also for this reason, we refer to an experiment to which the Bernoulli distribution applies as a Bernoulli trial, or simply a trial, and to sequences of such experiments as repeated trials.

4 The Binomial Distribution

Repeated trials play a very important role in probability and statistics, especially when the number of trials is fixed, the parameter θ (the probability of a success) is the same for each trial, and the trials are all independent. As we shall see, several random variables arise in connection with repeated trials. The one we shall study here concerns the total number of successes; others will be given in Section 5.

The theory that we shall discuss in this section has many applications; for instance, it applies if we want to know the probability of getting 5 heads in 12 flips of a coin, the probability that 7 of 10 persons will recover from a tropical disease, or the probability that 35 of 80 persons will respond to a mail-order solicitation. However, this is the case only if each of the 10 persons has the same chance of recovering from the disease and their recoveries are independent (say, they are treated by different doctors in different hospitals), and if the probability of getting a reply to the mail-order solicitation is the same for each of the 80 persons and there is independence (say, no two of them belong to the same household).

To derive a formula for the probability of getting "x successes in n trials" under the stated conditions, observe that the probability of getting x successes and n − x failures in a specific order is θ^x (1 − θ)^{n−x}. There is one factor θ for each success, one factor 1 − θ for each failure, and the x factors θ and n − x factors 1 − θ are all multiplied together by virtue of the assumption of independence. Since this probability applies to any sequence of n trials in which there are x successes and n − x failures, we have only to count how many sequences of this kind there are and then multiply θ^x (1 − θ)^{n−x} by that number. Clearly, the number of ways in which we can select the x trials on which there is to be a success is \binom{n}{x}, and it follows that the desired probability for "x successes in n trials" is \binom{n}{x} θ^x (1 − θ)^{n−x}.

DEFINITION 3. BINOMIAL DISTRIBUTION. A random variable X has a binomial distribution and it is referred to as a binomial random variable if and only if its probability distribution is given by

b(x; n, θ) = \binom{n}{x} θ^x (1 − θ)^{n−x}   for x = 0, 1, 2, ..., n

Thus, the number of successes in n trials is a random variable having a binomial distribution with the parameters n and θ. The name "binomial distribution" derives from the fact that the values of b(x; n, θ) for x = 0, 1, 2, ..., n are the successive terms of the binomial expansion of [(1 − θ) + θ]^n; this shows also that the sum of the probabilities equals 1, as it should.

EXAMPLE 1
Find the probability of getting five heads and seven tails in 12 flips of a balanced coin.

Solution
Substituting x = 5, n = 12, and θ = 1/2 into the formula for the binomial distribution, we get

b(5; 12, 1/2) = \binom{12}{5} (1/2)^5 (1 − 1/2)^{12−5}

and, looking up the value of \binom{12}{5} in Table VII of "Statistical Tables", we find that the result is 792(1/2)^{12}, or approximately 0.19.
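The formula of Definition 3 is easily evaluated by machine. A minimal Python sketch (our illustration; the function name is ours) reproducing Example 1:

```python
from math import comb

def binom_pmf(x, n, theta):
    """b(x; n, theta): probability of exactly x successes in n independent trials."""
    return comb(n, x) * theta**x * (1 - theta)**(n - x)

# Example 1: five heads (and seven tails) in 12 flips of a balanced coin
print(binom_pmf(5, 12, 0.5))     # 0.19335..., i.e. about 0.19
print(792 * (1 / 2) ** 12)       # the same value, matching the text's 792(1/2)^12
```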
EXAMPLE 2
Find the probability that 7 of 10 persons will recover from a tropical disease if we can assume independence and the probability is 0.80 that any one of them will recover from the disease.

Solution
Substituting x = 7, n = 10, and θ = 0.80 into the formula for the binomial distribution, we get

b(7; 10, 0.80) = \binom{10}{7} (0.80)^7 (1 − 0.80)^{10−7}

and, looking up the value of \binom{10}{7} in Table VII of "Statistical Tables", we find that the result is 120(0.80)^7(0.20)^3, or approximately 0.20.

If we tried to calculate the third probability asked for on the previous page, the one concerning the responses to the mail-order solicitation, by substituting x = 35, n = 80, and the appropriate value of θ into the formula for the binomial distribution, we would find that this requires a prohibitive amount of work. In actual practice, binomial probabilities are rarely calculated directly, for they are tabulated extensively for various values of θ and n, and there exists an abundance of computer software yielding binomial probabilities as well as the corresponding cumulative probabilities

B(x; n, θ) = \sum_{k=0}^{x} b(k; n, θ)

upon simple commands. An example of such a printout (with somewhat different notation) is shown in Figure 1. In the past, the National Bureau of Standards table and the book by H. G. Romig have been widely used; they are listed among the references at the end of this chapter.

Figure 1. Computer printout of binomial probabilities for n = 10 and θ = 0.80.

Table I of "Statistical Tables" gives the values of b(x; n, θ) to four decimal places for n = 1 to n = 20 and θ = 0.05, 0.10, 0.15, ..., 0.45, 0.50. To use this table when θ is greater than 0.50, we refer to the following identity.

THEOREM 1.

b(x; n, θ) = b(n − x; n, 1 − θ)

which the reader will be asked to prove in part (a) of Exercise 5. For instance, to find b(11; 18, 0.70), we look up b(7; 18, 0.30) and get 0.1376. Also, there are several ways in which binomial probabilities can be approximated when n is large; one of these will be mentioned in Section 7.
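The identity of Theorem 1 is easy to confirm numerically. A short Python sketch (our illustration) checks the table lookup just described, and then the identity for every value of x:

```python
from math import comb, isclose

def b(x, n, theta):
    return comb(n, x) * theta**x * (1 - theta)**(n - x)

# Theorem 1: b(x; n, theta) = b(n - x; n, 1 - theta).
# The lookup from the text: b(11; 18, 0.70) = b(7; 18, 0.30).
print(round(b(11, 18, 0.70), 4), round(b(7, 18, 0.30), 4))   # both 0.1376

# The identity holds for every x, not just this one:
assert all(isclose(b(x, 18, 0.70), b(18 - x, 18, 0.30)) for x in range(19))
```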
Let us now find formulas for the mean and the variance of the binomial distribution.

THEOREM 2. The mean and the variance of the binomial distribution are

μ = nθ   and   σ² = nθ(1 − θ)

Proof.

μ = \sum_{x=0}^{n} x · \binom{n}{x} θ^x (1 − θ)^{n−x} = \sum_{x=1}^{n} \frac{n!}{(x−1)!(n−x)!} θ^x (1 − θ)^{n−x}

where we omitted the term corresponding to x = 0, which is 0, and canceled the x against the first factor of x! = x(x − 1)! in the denominator of \binom{n}{x}. Then, factoring out the factor n in n! = n(n − 1)! and one factor θ, we get

μ = nθ \sum_{x=1}^{n} \binom{n−1}{x−1} θ^{x−1} (1 − θ)^{n−x}

and, letting y = x − 1 and m = n − 1, this becomes

μ = nθ \sum_{y=0}^{m} \binom{m}{y} θ^y (1 − θ)^{m−y} = nθ

since the last summation is the sum of all the values of a binomial distribution with the parameters m and θ, and hence equal to 1.

To find expressions for μ′₂ and σ², let us make use of the fact that E(X²) = E[X(X − 1)] + E(X) and first evaluate E[X(X − 1)]. Duplicating for all practical purposes the steps used before, we thus get

E[X(X − 1)] = \sum_{x=0}^{n} x(x − 1) \binom{n}{x} θ^x (1 − θ)^{n−x} = \sum_{x=2}^{n} \frac{n!}{(x−2)!(n−x)!} θ^x (1 − θ)^{n−x} = n(n − 1)θ² \sum_{x=2}^{n} \binom{n−2}{x−2} θ^{x−2} (1 − θ)^{n−x}

and, letting y = x − 2 and m = n − 2, this becomes

E[X(X − 1)] = n(n − 1)θ² \sum_{y=0}^{m} \binom{m}{y} θ^y (1 − θ)^{m−y} = n(n − 1)θ²

Therefore, μ′₂ = E[X(X − 1)] + E(X) = n(n − 1)θ² + nθ and, finally,

σ² = μ′₂ − μ² = n(n − 1)θ² + nθ − n²θ² = nθ(1 − θ)

An alternative proof of this theorem, requiring much less algebraic detail, is suggested in Exercise 6.

It should not have come as a surprise that the mean of the binomial distribution is given by the product nθ. After all, if a balanced coin is flipped 200 times, we expect (in the sense of a mathematical expectation) 200 · (1/2) = 100 heads and 100 tails; similarly, if a balanced die is rolled 240 times, we expect 240 · (1/6) = 40 sixes, and if the probability is 0.80 that a person shopping at a department store will make a purchase, we would expect 400(0.80) = 320 of 400 persons shopping at the department store to make a purchase.

The formula for the variance of the binomial distribution, being a measure of variation, has many important applications; but, to emphasize its significance, let us consider the random variable Y = X/n, where X is a random variable having a binomial distribution with the parameters n and θ. This random variable is the proportion of successes in n trials, and in Exercise 7 the reader will be asked to prove the following result.

THEOREM 3. If X has a binomial distribution with the parameters n and θ and Y = X/n, then

E(Y) = θ   and   σ²_Y = θ(1 − θ)/n

Now, if we apply Chebyshev's theorem with kσ = c, we can assert that for any positive constant c the probability is at least

1 − θ(1 − θ)/(nc²)

that the proportion of successes in n trials falls between θ − c and θ + c. Hence, when n → ∞, the probability approaches 1 that the proportion of successes will differ from θ by less than any arbitrary constant c. This result is called a law of large numbers, and it should be observed that it applies to the proportion of successes, not to their actual number. It is a fallacy to suppose that when n is large the number of successes must necessarily be close to nθ.

An easy illustration of this law of large numbers can be obtained through a computer simulation of the repeated flipping of a balanced coin. This is shown in Figure 2, where the 1's and 0's denote heads and tails.

Figure 2. Computer simulation of 100 flips of a balanced coin.

Reading across successive rows, we find that among the first five simulated flips there are 3 heads, among the first ten there are 6 heads, among the first fifteen there are 8 heads, among the first twenty there are 12 heads, among the first twenty-five there are 14 heads, ..., and among all hundred there are 51 heads. The corresponding proportions, plotted in Figure 3, are 3/5 = 0.60, 6/10 = 0.60, 8/15 = 0.53, 12/20 = 0.60, 14/25 = 0.56, ..., and 51/100 = 0.51. Observe that the proportion of heads fluctuates but comes closer and closer to 0.50, the probability of heads for each flip of the coin.

Figure 3. Graph illustrating the law of large numbers: the proportion of heads (vertical axis) plotted against the number of simulated flips of a coin (horizontal axis).
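The simulation behind Figures 2 and 3 is easy to re-create. The following Python sketch (our illustration; the seed is arbitrary, so the particular flips of Figure 2 are not reproduced) shows the running proportion of heads settling near 0.50:

```python
import random

# A small re-creation of the Figure 2/3 experiment: simulate 100 flips of a
# balanced coin and watch the running proportion of heads approach 0.50.
random.seed(1)                                        # any seed will do
flips = [random.randint(0, 1) for _ in range(100)]    # 1 = heads, 0 = tails

heads = 0
for i, flip in enumerate(flips, start=1):
    heads += flip
    if i % 25 == 0:                   # report every 25 flips
        print(i, heads / i)           # fluctuates, but settles near 0.5
```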
Since the moment-generating function of the binomial distribution is easy to obtain, let us find it and use it to verify the results of Theorem 2.

THEOREM 4. The moment-generating function of the binomial distribution is given by

M_X(t) = [1 + θ(e^t − 1)]^n

If we differentiate M_X(t) twice with respect to t, we get

M′_X(t) = nθe^t [1 + θ(e^t − 1)]^{n−1}
M″_X(t) = nθe^t [1 + θ(e^t − 1)]^{n−1} + n(n − 1)θ²e^{2t} [1 + θ(e^t − 1)]^{n−2} = nθe^t (1 − θ + nθe^t) [1 + θ(e^t − 1)]^{n−2}

and, upon substituting t = 0, we get μ′₁ = nθ and μ′₂ = nθ(1 − θ + nθ). Thus, μ = nθ and σ² = μ′₂ − μ² = nθ(1 − θ + nθ) − (nθ)² = nθ(1 − θ), which agrees with the formulas given in Theorem 2.

From the work of this section it may seem easier to find the moments of the binomial distribution with the moment-generating function than to evaluate them directly, but it should be apparent that the differentiation becomes fairly involved if we want to determine, say, μ₃ or μ₄. Actually, there exists yet an easier way of determining the moments of the binomial distribution; it is based on its factorial moment-generating function, which is explained in Exercise 12.

Exercises

1. If X has the discrete uniform distribution f(x) = 1/k for x = 1, 2, ..., k, show that
(a) its mean is μ = (k + 1)/2;
(b) its variance is σ² = (k² − 1)/12.

2. If X has the discrete uniform distribution f(x) = 1/k for x = 1, 2, ..., k, show that its moment-generating function is given by

M_X(t) = e^t (1 − e^{kt}) / [k(1 − e^t)]

Also find the mean of this distribution by evaluating lim_{t→0} M′_X(t), and compare the result with that obtained in Exercise 1.

3. Show that for the Bernoulli distribution, μ′_r = θ for r = 1, 2, 3, ..., by
(a) evaluating the sum \sum_x x^r · f(x; θ);
(b) letting n = 1 in the moment-generating function of the binomial distribution and examining its Maclaurin's series.

4. We did not study the Bernoulli distribution in any detail in Section 3, because it can be looked upon as a binomial distribution with n = 1. Verify directly (that is, without making use of the fact that the Bernoulli distribution is a special case of the binomial distribution) that the mean and the variance of the Bernoulli distribution are μ = θ and σ² = θ(1 − θ).

5. Verify that
(a) b(x; n, θ) = b(n − x; n, 1 − θ).
Also show that if B(x; n, θ) = \sum_{k=0}^{x} b(k; n, θ) for x = 0, 1, 2, ..., n, then
(b) b(x; n, θ) = B(x; n, θ) − B(x − 1; n, θ);
(c) b(x; n, θ) = B(n − x; n, 1 − θ) − B(n − x − 1; n, 1 − θ);
(d) B(x; n, θ) = 1 − B(n − x − 1; n, 1 − θ).

6. An alternative proof of Theorem 2 may be based on the fact that if X₁, X₂, ..., and X_n are independent random variables having the same Bernoulli distribution with the parameter θ, then Y = X₁ + X₂ + ··· + X_n is a random variable having the binomial distribution with the parameters n and θ. Use this fact to prove Theorem 2.

7. Prove Theorem 3.

8. When calculating all the values of a binomial distribution, the work can usually be simplified by first calculating b(0; n, θ) and then using the recursion formula

b(x + 1; n, θ) = \frac{θ(n − x)}{(x + 1)(1 − θ)} · b(x; n, θ)

Verify this formula and use it to calculate the values of the binomial distribution with n = 7 and θ = 0.25.

9. Use the recursion formula of Exercise 8 to show that for θ = 1/2 the binomial distribution has
(a) a maximum at x = n/2 when n is even;
(b) maxima at x = (n − 1)/2 and x = (n + 1)/2 when n is odd.

10. If X is a binomial random variable, for what value of θ is the probability b(x; n, θ) a maximum?

11. In the proof of Theorem 2 we determined the quantity E[X(X − 1)], called the second factorial moment. In general, the rth factorial moment of X is given by

μ′_(r) = E[X(X − 1)(X − 2) · ... · (X − r + 1)]

Express μ′₂, μ₃, and μ₄ in terms of factorial moments.

12. The factorial moment-generating function of a discrete random variable X is given by

F_X(t) = E(t^X) = \sum_x t^x · f(x)

Show that the rth derivative of F_X(t) with respect to t at t = 1 is μ′_(r), the rth factorial moment defined in Exercise 11.

13. With reference to Exercise 12, find the factorial moment-generating function of
(a) the Bernoulli distribution, and show that μ′_(1) = θ and μ′_(r) = 0 for r > 1;
(b) the binomial distribution, and use it to find μ and σ².

14. This question has been intentionally omitted for this edition.

15. This question has been intentionally omitted for this edition.
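The differentiation carried out after Theorem 4 can be delegated to a computer algebra system. The following sketch (our illustration; it assumes the third-party SymPy library is available) differentiates the binomial moment-generating function symbolically and recovers the formulas of Theorem 2:

```python
import sympy as sp   # assumes SymPy is installed

# Differentiate M_X(t) = [1 + theta*(e^t - 1)]^n at t = 0 (Theorem 4)
# to recover mu = n*theta and sigma^2 = n*theta*(1 - theta) (Theorem 2).
t, th, n = sp.symbols('t theta n', positive=True)
M = (1 + th * (sp.exp(t) - 1)) ** n

mu1 = sp.diff(M, t).subs(t, 0)             # first moment about the origin
mu2 = sp.diff(M, t, 2).subs(t, 0)          # second moment about the origin

print(sp.simplify(mu1))                    # n*theta
print(sp.factor(sp.expand(mu2 - mu1**2)))  # n*theta*(1 - theta), up to sign of factors
```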
5 The Negative Binomial and Geometric Distributions

In connection with repeated Bernoulli trials, we are sometimes interested in the number of the trial on which the kth success occurs. For instance, we may be interested in the probability that the tenth child exposed to a contagious disease will be the third to catch it, the probability that the fifth person to hear a rumor will be the first one to believe it, or the probability that a burglar will be caught for the second time on his or her eighth job.

If the kth success is to occur on the xth trial, there must be k − 1 successes on the first x − 1 trials, and the probability for this is

b(k − 1; x − 1, θ) = \binom{x−1}{k−1} θ^{k−1} (1 − θ)^{x−k}

The probability of a success on the xth trial is θ, and the probability that the kth success occurs on the xth trial is, therefore,

θ · b(k − 1; x − 1, θ) = \binom{x−1}{k−1} θ^k (1 − θ)^{x−k}

DEFINITION 4. NEGATIVE BINOMIAL DISTRIBUTION. A random variable X has a negative binomial distribution and it is referred to as a negative binomial random variable if and only if

b*(x; k, θ) = \binom{x−1}{k−1} θ^k (1 − θ)^{x−k}   for x = k, k + 1, k + 2, ...

Thus, the number of the trial on which the kth success occurs is a random variable having a negative binomial distribution with the parameters k and θ. The name "negative binomial distribution" derives from the fact that the values of b*(x; k, θ) for x = k, k + 1, k + 2, ... are the successive terms of the binomial expansion of

[ (1/θ) − (1 − θ)/θ ]^{−k} †

In the literature of statistics, negative binomial distributions are also referred to as binomial waiting-time distributions or as Pascal distributions.

† Binomial expansions with negative exponents are explained in Feller, W., An Introduction to Probability Theory and Its Applications, Vol. I, 3rd ed. New York: John Wiley & Sons, Inc., 1968.

When a table of binomial probabilities is available, the determination of negative binomial probabilities can generally be simplified by making use of the following identity, which the reader will be asked to verify in Exercise 18.

THEOREM 5.

b*(x; k, θ) = (k/x) · b(k; x, θ)

EXAMPLE 3
If the probability is 0.40 that a child exposed to a certain contagious disease will catch it, what is the probability that the tenth child exposed to the disease will be the third to catch it?

Solution
Substituting x = 10, k = 3, and θ = 0.40 into the formula for the negative binomial distribution, we get

b*(10; 3, 0.40) = \binom{9}{2} (0.40)³ (0.60)⁷ = 0.0645

EXAMPLE 4
Use Theorem 5 and Table I of "Statistical Tables" to rework Example 3.

Solution
Substituting x = 10, k = 3, and θ = 0.40 into the formula of Theorem 5, we get

b*(10; 3, 0.40) = (3/10) · b(3; 10, 0.40) = (3/10)(0.2150) = 0.0645

Moments of the negative binomial distribution may be obtained by proceeding as in the proof of Theorem 2; for the mean and the variance we obtain the following theorem, which the reader will be asked to verify in Exercise 19.

THEOREM 6. The mean and the variance of the negative binomial distribution are

μ = k/θ   and   σ² = (k/θ)(1/θ − 1)

Since the negative binomial distribution with k = 1 has many important applications, it is given a special name; it is called the geometric distribution.

DEFINITION 5. GEOMETRIC DISTRIBUTION. A random variable X has a geometric distribution and it is referred to as a geometric random variable if and only if its probability distribution is given by

g(x; θ) = θ(1 − θ)^{x−1}   for x = 1, 2, 3, ...

EXAMPLE 5
If the probability is 0.75 that an applicant for a driver's license will pass the road test on any given try, what is the probability that an applicant will finally pass the test on the fourth try?

Solution
Substituting x = 4 and θ = 0.75 into the formula for the geometric distribution, we get

g(4; 0.75) = 0.75(1 − 0.75)^{4−1} = 0.75(0.25)³ = 0.0117

Of course, this result is based on the assumption that the trials are all independent, and there may be some question here about its validity.
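Both distributions of this section are direct to evaluate. A minimal Python sketch (our illustration; the function names are ours) reproducing Examples 3 and 5:

```python
from math import comb

def neg_binom_pmf(x, k, theta):
    """b*(x; k, theta): P(the k-th success occurs on trial x), x = k, k+1, ..."""
    return comb(x - 1, k - 1) * theta**k * (1 - theta)**(x - k)

def geom_pmf(x, theta):
    """g(x; theta): P(the first success occurs on trial x), x = 1, 2, ..."""
    return theta * (1 - theta)**(x - 1)

# Example 3: the 10th child exposed is the 3rd to catch it, theta = 0.40
print(round(neg_binom_pmf(10, 3, 0.40), 4))   # 0.0645

# Example 5: the applicant finally passes on the fourth try, theta = 0.75
print(round(geom_pmf(4, 0.75), 4))            # 0.0117
```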
6 The Hypergeometric Distribution

To obtain a formula analogous to that of the binomial distribution that applies to sampling without replacement, in which case the trials are not independent, let us consider a set of N elements of which M are looked upon as successes and the other N − M as failures. As in connection with the binomial distribution, we are interested in the probability of getting x successes in n trials, but now we are choosing, without replacement, n of the N elements contained in the set.

There are \binom{M}{x} ways of choosing x of the M successes and \binom{N−M}{n−x} ways of choosing n − x of the N − M failures, and, hence, \binom{M}{x} \binom{N−M}{n−x} ways of choosing x successes and n − x failures. Since there are \binom{N}{n} ways of choosing n of the N elements in the set, and we shall assume that they are all equally likely (which is what we mean when we say that the selection is random), the probability of "x successes in n trials" is

\binom{M}{x} \binom{N−M}{n−x} / \binom{N}{n}

DEFINITION 6. HYPERGEOMETRIC DISTRIBUTION. A random variable X has a hypergeometric distribution and it is referred to as a hypergeometric random variable if and only if its probability distribution is given by

h(x; n, N, M) = \binom{M}{x} \binom{N−M}{n−x} / \binom{N}{n}   for x = 0, 1, 2, ..., n, with x ≤ M and n − x ≤ N − M

Thus, for sampling without replacement, the number of successes in n trials is a random variable having a hypergeometric distribution with the parameters n, N, and M.

EXAMPLE 6
As part of an air-pollution survey, an inspector decides to examine the exhaust of 6 of a company's 24 trucks. If 4 of the company's trucks emit excessive amounts of pollutants, what is the probability that none of them will be included in the inspector's sample?

Solution
Substituting x = 0, n = 6, N = 24, and M = 4 into the formula for the hypergeometric distribution, we get

h(0; 6, 24, 4) = \binom{4}{0} \binom{20}{6} / \binom{24}{6} = 0.2880
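A short Python sketch (our illustration) evaluates Example 6 and, as a preview of Theorem 7 below, checks numerically that the mean of this distribution equals nM/N:

```python
from math import comb

def hyper_pmf(x, n, N, M):
    """h(x; n, N, M): x successes in a sample of n drawn without replacement
    from N items of which M count as successes."""
    return comb(M, x) * comb(N - M, n - x) / comb(N, n)

# Example 6: none of the 4 polluting trucks among the 6 of 24 examined
print(round(hyper_pmf(0, 6, 24, 4), 4))           # 0.2880

# Numerical check of the mean formula n*M/N for these parameters:
n, N, M = 6, 24, 4
mean = sum(x * hyper_pmf(x, n, N, M) for x in range(n + 1))
print(mean, n * M / N)                             # both 1.0
```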
The method by which we find the mean and the variance of the hypergeometric distribution is very similar to that employed in the proof of Theorem 2.

THEOREM 7. The mean and the variance of the hypergeometric distribution are

μ = nM/N   and   σ² = nM(N − M)(N − n) / [N²(N − 1)]

Proof. To determine the mean, let us directly evaluate the sum

μ = \sum_{x=0}^{n} x · \binom{M}{x} \binom{N−M}{n−x} / \binom{N}{n} = \frac{1}{\binom{N}{n}} \sum_{x=1}^{n} \frac{M!}{(x−1)!(M−x)!} \binom{N−M}{n−x}

where we omitted the term corresponding to x = 0, which is 0, and canceled the x against the first factor of x! = x(x − 1)! in the denominator of \binom{M}{x}. Then, factoring out M and letting y = x − 1 and m = n − 1, this becomes

μ = \frac{M}{\binom{N}{n}} \sum_{y=0}^{m} \binom{M−1}{y} \binom{N−M}{m−y}

Finally, using the identity \sum_{r=0}^{k} \binom{m}{r} \binom{n}{k−r} = \binom{m+n}{k}, we get

μ = \frac{M}{\binom{N}{n}} · \binom{N−1}{m} = \frac{M}{\binom{N}{n}} · \binom{N−1}{n−1} = \frac{nM}{N}

To obtain the formula for σ², we proceed as in the proof of Theorem 2 by first evaluating E[X(X − 1)] and then making use of the fact that E(X²) = E[X(X − 1)] + E(X). Leaving it to the reader to show in Exercise 27 that

E[X(X − 1)] = M(M − 1)n(n − 1) / [N(N − 1)]

we thus get

σ² = \frac{M(M − 1)n(n − 1)}{N(N − 1)} + \frac{nM}{N} − \left( \frac{nM}{N} \right)² = \frac{nM(N − M)(N − n)}{N²(N − 1)}

The moment-generating function of the hypergeometric distribution is fairly complicated; details of this may be found in the book The Advanced Theory of Statistics by M. G. Kendall and A. Stuart.

When N is large and n is relatively small compared to N (the usual rule of thumb is that n should not exceed 5 percent of N), there is not much difference between sampling with replacement and sampling without replacement, and the formula for the binomial distribution with the parameters n and θ = M/N may be used to approximate hypergeometric probabilities.

EXAMPLE 7
Among the 120 applicants for a job, only 80 are actually qualified. If 5 of the applicants are randomly selected for an in-depth interview, find the probability that only 2 of the 5 will be qualified for the job by using
(a) the formula for the hypergeometric distribution;
(b) the formula for the binomial distribution with θ = 80/120 as an approximation.

Solution
(a) Substituting x = 2, n = 5, N = 120, and M = 80 into the formula for the hypergeometric distribution, we get

h(2; 5, 120, 80) = \binom{80}{2} \binom{40}{3} / \binom{120}{5} = 0.164

rounded to three decimals.
(b) Substituting x = 2, n = 5, and θ = 80/120 = 2/3 into the formula for the binomial distribution, we get

b(2; 5, 2/3) = \binom{5}{2} (2/3)² (1 − 2/3)³ = 0.165

rounded to three decimals. As can be seen from these results, the approximation is very close.
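The closeness of the approximation in Example 7 can be checked with a few lines of Python (our illustration):

```python
from math import comb

def hyper_pmf(x, n, N, M):
    return comb(M, x) * comb(N - M, n - x) / comb(N, n)

def binom_pmf(x, n, theta):
    return comb(n, x) * theta**x * (1 - theta)**(n - x)

# Example 7: exact hypergeometric probability vs. its binomial approximation
print(round(hyper_pmf(2, 5, 120, 80), 3))    # 0.164
print(round(binom_pmf(2, 5, 80 / 120), 3))   # 0.165
```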
7 The Poisson Distribution

When n is large, the calculation of binomial probabilities with the formula of Definition 3 will usually involve a prohibitive amount of work. For instance, to calculate the probability that 18 of 3,000 persons watching a parade on a very hot summer day will suffer from heat exhaustion, we first have to determine \binom{3000}{18}, and if the probability is 0.005 that any one of the 3,000 persons watching the parade will suffer from heat exhaustion, we also have to calculate the value of (0.005)^{18}(0.995)^{2982}.

In this section we shall present a probability distribution that can be used to approximate binomial probabilities of this kind. Specifically, we shall investigate the limiting form of the binomial distribution when n → ∞, θ → 0, while nθ remains constant. Letting this constant be λ, that is, nθ = λ and, hence, θ = λ/n, we can write

b(x; n, θ) = \binom{n}{x} (λ/n)^x (1 − λ/n)^{n−x} = \frac{n(n−1)(n−2) · ... · (n−x+1)}{x!} (λ/n)^x (1 − λ/n)^{n−x}

Then, if we divide one of the x factors n in (λ/n)^x into each factor of the product n(n − 1)(n − 2) · ... · (n − x + 1) and write

(1 − λ/n)^{n−x}   as   [(1 − λ/n)^{−n/λ}]^{−λ} (1 − λ/n)^{−x}

we obtain

\frac{1(1 − 1/n)(1 − 2/n) · ... · (1 − (x−1)/n)}{x!} λ^x [(1 − λ/n)^{−n/λ}]^{−λ} (1 − λ/n)^{−x}

Finally, if we let n → ∞ while x and λ remain fixed, we find that

1(1 − 1/n)(1 − 2/n) · ... · (1 − (x−1)/n) → 1
(1 − λ/n)^{−x} → 1
(1 − λ/n)^{−n/λ} → e

and, hence, that the limiting distribution becomes

p(x; λ) = λ^x e^{−λ} / x!   for x = 0, 1, 2, ...

DEFINITION 7. POISSON DISTRIBUTION. A random variable X has a Poisson distribution and it is referred to as a Poisson random variable if and only if its probability distribution is given by

p(x; λ) = λ^x e^{−λ} / x!   for x = 0, 1, 2, ...

Thus, in the limit when n → ∞, θ → 0, and nθ = λ remains constant, the number of successes is a random variable having a Poisson distribution with the parameter λ. This distribution is named after the French mathematician Simeon Poisson (1781–1840). In general, the Poisson distribution will provide a good approximation to binomial probabilities when n ≥ 20 and θ ≤ 0.05. When n ≥ 100 and nθ < 10, the approximation will generally be excellent.

To get some idea about the closeness of the Poisson approximation to the binomial distribution, consider the computer printout of Figure 4, which shows, one above the other, the binomial distribution with n = 150 and θ = 0.05 and the Poisson distribution with λ = 150(0.05) = 7.5.
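The comparison shown in Figure 4 is easy to reproduce. A minimal Python sketch (our illustration) prints the binomial probabilities b(x; 150, 0.05) next to the Poisson probabilities p(x; 7.5):

```python
from math import comb, exp, factorial

def binom_pmf(x, n, theta):
    return comb(n, x) * theta**x * (1 - theta)**(n - x)

def poisson_pmf(x, lam):
    return lam**x * exp(-lam) / factorial(x)

# The comparison behind Figure 4: with n >= 100 and n*theta < 10
# the agreement should be excellent.
for x in range(5, 11):
    print(x, round(binom_pmf(x, 150, 0.05), 4), round(poisson_pmf(x, 7.5), 4))
```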
Figure 4. Computer printout of the binomial distribution with n = 150 and θ = 0.05 and the Poisson distribution with λ = 7.5.

EXAMPLE 8
Use Figure 4 to determine the value of x (from 5 to 15) for which the error is greatest when we use the Poisson distribution with λ = 7.5 to approximate the binomial distribution with n = 150 and θ = 0.05.

Solution
Calculating the differences between the corresponding binomial and Poisson probabilities for x = 5, 6, ..., 15, we find that the maximum error (numerically) is 0.0037, and it corresponds to x = 8.

The examples that follow illustrate the Poisson approximation to the binomial distribution.

EXAMPLE 9
If 2 percent of the books bound at a certain bindery have defective bindings, use the Poisson approximation to the binomial distribution to determine the probability that 5 of 400 books bound by this bindery will have defective bindings.

Solution
Substituting x = 5, λ = 400(0.02) = 8, and e^{−8} = 0.00034 (from Table VIII of "Statistical Tables") into the formula of Definition 7, we get

p(5; 8) = 8⁵ · e^{−8} / 5! = (32,768)(0.00034) / 120 = 0.093

EXAMPLE 10
Records show that the probability is 0.00005 that a car will have a flat tire while crossing a certain bridge. Use the Poisson distribution to approximate the binomial probabilities that, among 10,000 cars crossing this bridge,
(a) exactly two will have a flat tire;
(b) at most two will have a flat tire.

Solution
(a) Referring to Table II of "Statistical Tables", we find that for x = 2 and λ = 10,000(0.00005) = 0.5, the Poisson probability is 0.0758.
(b) Referring to Table II of "Statistical Tables", we find that for x = 0, 1, and 2 and λ = 0.5, the Poisson probabilities are 0.6065, 0.3033, and 0.0758. Thus, the probability that at most 2 of 10,000 cars crossing the bridge will have a flat tire is

0.6065 + 0.3033 + 0.0758 = 0.9856

In actual practice, Poisson probabilities are seldom obtained by direct substitution into the formula of Definition 7. Sometimes we refer to tables of Poisson probabilities, such as Table II of "Statistical Tables", or more extensive tables in handbooks of statistical tables, but more often than not, nowadays, we refer to suitable computer software. The use of tables or computers is of special importance when we are concerned with probabilities relating to several values of x.

Figure 5. Computer printout of the Poisson distribution with λ = 0.5.

EXAMPLE 11
Use Figure 5 to rework the preceding example.

Solution
(a) Reading off the value for K = 2 in the P(X = K) column, we get 0.0758.
(b) Here we could add the values for K = 0, K = 1, and K = 2 in the P(X = K) column, or we could read the value for K = 2 in the P(X LESS OR = K) column, getting 0.9856.

Having derived the Poisson distribution as a limiting form of the binomial distribution, we can obtain formulas for its mean and its variance by applying the same limiting conditions (n → ∞, θ → 0, and nθ = λ remains constant) to the mean and the variance of the binomial distribution. For the mean we get μ = nθ = λ and for the variance we get σ² = nθ(1 − θ) = λ(1 − θ), which approaches λ when θ → 0.

THEOREM 8. The mean and the variance of the Poisson distribution are given by

μ = λ   and   σ² = λ

These results can also be obtained by directly evaluating the necessary summations (see Exercise 33) or by working with the moment-generating function given in the following theorem.

THEOREM 9. The moment-generating function of the Poisson distribution is given by

M_X(t) = e^{λ(e^t − 1)}

Proof. By Definition 7 and the definition of the moment-generating function (the moment-generating function of a random variable X, where it exists, is given by M_X(t) = E(e^{tX}) = \sum_x e^{tx} · f(x) when X is discrete, and by M_X(t) = \int_{−∞}^{∞} e^{tx} · f(x) dx when X is continuous), we get

M_X(t) = \sum_{x=0}^{∞} e^{tx} · \frac{λ^x e^{−λ}}{x!} = e^{−λ} \sum_{x=0}^{∞} \frac{(λe^t)^x}{x!}

where \sum_{x=0}^{∞} (λe^t)^x / x! can be recognized as the Maclaurin's series of e^z with z = λe^t. Thus,

M_X(t) = e^{−λ} · e^{λe^t} = e^{λ(e^t − 1)}

Then, if we differentiate M_X(t) twice with respect to t, we get

M′_X(t) = λe^t e^{λ(e^t − 1)}
M″_X(t) = λe^t e^{λ(e^t − 1)} + λ²e^{2t} e^{λ(e^t − 1)}

so that μ′₁ = M′_X(0) = λ and μ′₂ = M″_X(0) = λ + λ². Hence, μ = λ and σ² = μ′₂ − μ² = (λ + λ²) − λ² = λ, which agrees with Theorem 8.

Although the Poisson distribution has been derived as a limiting form of the binomial distribution, it has many applications that have no direct connection with binomial distributions.
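Theorem 8 can also be checked numerically. The following Python sketch (our illustration; it builds the probabilities recursively to avoid large factorials) confirms that the mean and the variance both come out equal to λ:

```python
from math import exp

# Numerical check of Theorem 8 for lambda = 7.5: summing far enough out
# in the tail, both the mean and the variance equal lambda.
lam = 7.5
probs = [exp(-lam)]                       # p(0; lam) = e^{-lam}
for x in range(1, 100):
    probs.append(probs[-1] * lam / x)     # p(x) = p(x-1) * lam / x

mean = sum(x * p for x, p in enumerate(probs))
var = sum((x - mean) ** 2 * p for x, p in enumerate(probs))
print(round(mean, 6), round(var, 6))      # both 7.5
```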
For example, a Poisson distribution might describe the number of telephone calls per hour received by an office, the number of typing errors per page, or the number of bacteria in a given culture. In such situations, the Poisson distribution can serve as a model for the number of successes that occur during a given time interval or in a specified region when (1) the numbers of successes occurring in nonoverlapping time intervals or regions are independent, (2) the probability of a single success occurring in a very short time interval or in a very small region is proportional to the length of the time interval or the size of the region, and (3) the probability of more than one success occurring in such a short time interval or falling in such a small region is negligible. The distribution applies when the average number of successes, λ, for the given time interval or specified region is known.

EXAMPLE 12
The average number of trucks arriving on any one day at a truck depot in a certain city is known to be 12. What is the probability that on a given day fewer than 9 trucks will arrive at this depot?

Solution
Let X be the number of trucks arriving on a given day. Then, using Table II of "Statistical Tables" with λ = 12, we get

P(X < 9) = \sum_{x=0}^{8} p(x; 12) = 0.1550

Furthermore, if, in a situation where the preceding conditions apply, successes occur at a mean rate of α per unit time or per unit region, then the number of successes in an interval of t units of time or t units of the specified region is a Poisson random variable with the mean λ = αt (see Exercise 31); that is,

p(x; αt) = e^{−αt} (αt)^x / x!   for x = 0, 1, 2, ...

EXAMPLE 13
A certain kind of sheet metal has, on the average, five defects per 10 square feet. If we assume a Poisson distribution, what is the probability that a 15-square-foot sheet of the metal will have at least six defects?

Solution
Let X denote the number of defects in a 15-square-foot sheet of the metal. Then, since the unit of area is 10 square feet, we have

λ = αt = (5)(1.5) = 7.5

and

P(X ≥ 6) = 1 − P(X ≤ 5) = 1 − 0.2414 = 0.7586

according to the computer printout shown in Figure 4.
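Both of these cumulative calculations are one-liners in code. A minimal Python sketch (our illustration) reproducing Examples 12 and 13:

```python
from math import exp, factorial

def poisson_cdf(x, lam):
    """P(X <= x) for a Poisson random variable with mean lam."""
    return sum(lam**k * exp(-lam) / factorial(k) for k in range(x + 1))

# Example 12: fewer than 9 trucks when lambda = 12
print(round(poisson_cdf(8, 12), 4))         # 0.1550

# Example 13: at least six defects when lambda = alpha*t = 5 * 1.5 = 7.5
print(round(1 - poisson_cdf(5, 7.5), 4))    # 1 - 0.2414 = 0.7586
```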
Exercises

16. The negative binomial distribution is sometimes defined in a different way as the distribution of the number of failures that precede the kth success. If the kth success occurs on the xth trial, it must be preceded by x − k failures. Thus, find the distribution of Y = X − k, where X has the distribution of Definition 4.

17. With reference to Exercise 16, find expressions for μ_Y and σ²_Y.

18. Prove Theorem 5.

19. Prove Theorem 6 by first determining E(X) and E[X(X + 1)].

20. Show that the moment-generating function of the geometric distribution is given by

M_X(t) = θe^t / [1 − e^t(1 − θ)]

21. Use the moment-generating function derived in Exercise 20 to show that for the geometric distribution, μ = 1/θ and σ² = (1 − θ)/θ².

22. Differentiating with respect to θ the expressions on both sides of the equation

\sum_{x=1}^{∞} θ(1 − θ)^{x−1} = 1

show that the mean of the geometric distribution is given by μ = 1/θ. Then, differentiating again with respect to θ, show that μ′₂ = (2 − θ)/θ² and hence that σ² = (1 − θ)/θ².

23. If X is a random variable having a geometric distribution, show that

P(X = x + n | X > n) = P(X = x)

24. If the probability is f(x) that a product fails the xth time it is being used, that is, on the xth trial, then its failure rate at the xth trial is the probability that it will fail on the xth trial given that it has not failed on the first x − 1 trials; symbolically, it is given by

Z(x) = f(x) / [1 − F(x − 1)]

where F(x) is the value of the corresponding distribution function at x. Show that if X is a geometric random variable, its failure rate is constant and equal to θ.

25. A variation of the binomial distribution arises when the n trials are all independent, but the probability of a success on the ith trial is θᵢ, and these probabilities are not all equal. If X is the number of successes obtained under these conditions in n trials, show that
(a) μ_X = nθ̄, where θ̄ = (1/n) \sum_{i=1}^{n} θᵢ;
(b) σ²_X = nθ̄(1 − θ̄) − nσ²_θ, where θ̄ is as defined in part (a) and σ²_θ = (1/n) \sum_{i=1}^{n} (θᵢ − θ̄)².

26. When calculating all the values of a hypergeometric distribution, the work can often be simplified by first calculating h(0; n, N, M) and then using the recursion formula

h(x + 1; n, N, M) = \frac{(n − x)(M − x)}{(x + 1)(N − M − n + x + 1)} · h(x; n, N, M)

Verify this formula and use it to calculate the values of the hypergeometric distribution with n = 4, N = 9, and M = 5.

27. Verify the expression given for E[X(X − 1)] in the proof of Theorem 7.

28. Show that if we let θ = M/N in Theorem 7, the mean and the variance of the hypergeometric distribution can be written as

μ = nθ   and   σ² = nθ(1 − θ) · (N − n)/(N − 1)

How do these results tie in with the discussion in the text?

29. When calculating all the values of a Poisson distribution, the work can often be simplified by first calculating p(0; λ) and then using the recursion formula

p(x + 1; λ) = \frac{λ}{x + 1} · p(x; λ)

Verify this formula and use it and e^{−2} = 0.1353 to verify the values given in Table II of "Statistical Tables" for λ = 2.

30. Use repeated integration by parts to show that

\sum_{y=0}^{x} \frac{λ^y e^{−λ}}{y!} = \frac{1}{x!} \int_λ^{∞} t^x e^{−t} dt

This result is important because values of the distribution function of a Poisson random variable may thus be obtained by referring to a table of incomplete gamma functions.

31. Suppose that f(x, t) is the probability of getting x successes during a time interval of length t when (i) the probability of a success during a very small time interval from t to t + Δt is α · Δt, (ii) the probability of more than one success during such a time interval is negligible, and (iii) the probability of a success during such a time interval does not depend on what happened prior to time t.
(a) Show that under these conditions

f(x, t + Δt) = f(x, t)[1 − α · Δt] + f(x − 1, t)α · Δt

and hence that

d[f(x, t)]/dt = α[f(x − 1, t) − f(x, t)]

(b) Show by direct substitution that a solution of this infinite system of differential equations (there is one for each value of x) is given by the Poisson distribution with λ = αt.

32. Show that if the limiting conditions n → ∞, θ → 0, while nθ remains constant, are applied to the moment-generating function of the binomial distribution, we get the moment-generating function of the Poisson distribution. [Hint: Make use of the fact that lim_{n→∞} (1 + z/n)^n = e^z.]

33. Derive the formulas for the mean and the variance of the Poisson distribution by first evaluating E(X) and E[X(X − 1)].

34. Use Theorem 9 to find the moment-generating function of Y = X − λ, where X is a random variable having the Poisson distribution with the parameter λ, and use it to verify that σ²_Y = λ.

35. This question has been intentionally omitted for this edition.

36. Differentiating with respect to λ the expressions on both sides of the equation

μ_r = \sum_{x=0}^{∞} (x − λ)^r · \frac{λ^x e^{−λ}}{x!}

derive the following recursion formula for the moments about the mean of the Poisson distribution:

μ_{r+1} = λ [ rμ_{r−1} + dμ_r/dλ ]   for r = 1, 2, 3, ...

Also, use this recursion formula and the fact that μ₀ = 1 and μ₁ = 0 to find μ₂, μ₃, and μ₄.

37. Approximate the binomial probability b(3; 100, 0.10) by using
(a) the formula for the binomial distribution and logarithms;
(b) Table II of "Statistical Tables."
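The recursion formula of Exercise 29 is easily checked by machine. The following Python sketch (our illustration) advances the recursion from p(0; 2) = e^{−2} and compares each value against direct evaluation of Definition 7:

```python
from math import exp, factorial

# Exercise 29's recursion: p(x+1; lam) = lam/(x+1) * p(x; lam),
# started from p(0; lam) = e^{-lam}.
lam = 2
p = exp(-lam)                        # p(0; 2) = e^{-2} = 0.1353
for x in range(6):
    direct = lam**x * exp(-lam) / factorial(x)
    assert abs(p - direct) < 1e-12   # recursion matches direct evaluation
    print(x, round(p, 4))
    p *= lam / (x + 1)               # advance the recursion
```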
8 The Multinomial Distribution

An immediate generalization of the binomial distribution arises when each trial has more than two possible outcomes, the probabilities of the respective outcomes are the same for each trial, and the trials are all independent. This would be the case, for instance, when persons interviewed by an opinion poll are asked whether they are for a candidate, against her, or undecided, or when samples of manufactured products are rated excellent, above average, average, or inferior.

To treat this kind of problem in general, let us consider the case where there are n independent trials permitting k mutually exclusive outcomes whose respective probabilities are θ₁, θ₂, ..., θ_k (with \sum_{i=1}^{k} θᵢ = 1). Referring to the outcomes as being of the first kind, the second kind, ..., and the kth kind, we shall be interested in the probability of getting x₁ outcomes of the first kind, x₂ outcomes of the second kind, ..., and x_k outcomes of the kth kind (with \sum_{i=1}^{k} xᵢ = n).

Proceeding as in the derivation of the formula for the binomial distribution, we first find that the probability of getting x₁ outcomes of the first kind, x₂ outcomes of the second kind, ..., and x_k outcomes of the kth kind in a specific order is θ₁^{x₁} · θ₂^{x₂} · ... · θ_k^{x_k}. To get the corresponding probability for that many outcomes of each kind in any order, we shall have to multiply the probability for any specific order by

\binom{n}{x₁, x₂, ..., x_k} = \frac{n!}{x₁! · x₂! · ... · x_k!}

DEFINITION 8. MULTINOMIAL DISTRIBUTION. The random variables X₁, X₂, ..., X_k have a multinomial distribution and they are referred to as multinomial random variables if and only if their joint probability distribution is given by

f(x₁, x₂, ..., x_k; n, θ₁, θ₂, ..., θ_k) = \binom{n}{x₁, x₂, ..., x_k} · θ₁^{x₁} · θ₂^{x₂} · ... · θ_k^{x_k}

for xᵢ = 0, 1, ..., n for each i, where \sum_{i=1}^{k} xᵢ = n and \sum_{i=1}^{k} θᵢ = 1.

Thus, the numbers of outcomes of the different kinds are random variables having the multinomial distribution with the parameters n, θ₁, θ₂, ..., and θ_k. The name "multinomial" derives from the fact that for various values of the xᵢ, the probabilities equal corresponding terms of the multinomial expansion of (θ₁ + θ₂ + ··· + θ_k)^n.

EXAMPLE 14
A certain city has 3 newspapers, A, B, and C. Newspaper A has 50 percent of the readers in that city, newspaper B has 30 percent of the readers, and newspaper C has the remaining 20 percent. Find the probability that, among 8 randomly-chosen readers in that city, 5 will read newspaper A, 2 will read newspaper B, and 1 will read newspaper C. (For the purpose of this example, assume that no one reads more than one newspaper.)

Solution
Substituting x₁ = 5, x₂ = 2, x₃ = 1, θ₁ = 0.50, θ₂ = 0.30, θ₃ = 0.20, and n = 8 into the formula of Definition 8, we get

f(5, 2, 1; 8, 0.50, 0.30, 0.20) = \frac{8!}{5! · 2! · 1!} (0.50)⁵ (0.30)² (0.20) = 0.0945

9 The Multivariate Hypergeometric Distribution

Just as the hypergeometric distribution takes the place of the binomial distribution for sampling without replacement, there also exists a multivariate distribution analogous to the multinomial distribution that applies to sampling without replacement. To derive its formula, let us consider a set of N elements, of which M₁ are elements of the first kind, M₂ are elements of the second kind, ..., and M_k are elements of the kth kind, such that \sum_{i=1}^{k} Mᵢ = N. As in connection with the multinomial distribution, we are interested in the probability of getting x₁ elements (outcomes) of the first kind, x₂ elements of the second kind, ..., and x_k elements of the kth kind, but now we are choosing, without replacement, n of the N elements of the set.

There are \binom{M₁}{x₁} ways of choosing x₁ of the M₁ elements of the first kind, \binom{M₂}{x₂} ways of choosing x₂ of the M₂ elements of the second kind, ..., and \binom{M_k}{x_k} ways of choosing x_k of the M_k elements of the kth kind, and, hence, \binom{M₁}{x₁} \binom{M₂}{x₂} · ... · \binom{M_k}{x_k} ways of choosing the required \sum_{i=1}^{k} xᵢ = n elements. Since there are \binom{N}{n} ways of choosing n of the N elements in the set and we assume that they are all equally likely (which is what we mean when we say that the selection is random), it follows that the desired probability is given by

\binom{M₁}{x₁} \binom{M₂}{x₂} · ... · \binom{M_k}{x_k} / \binom{N}{n}

DEFINITION 9. MULTIVARIATE HYPERGEOMETRIC DISTRIBUTION. The random variables X₁, X₂, ..., X_k have a multivariate hypergeometric distribution and they are referred to as multivariate hypergeometric random variables if and only if their joint probability distribution is given by

f(x₁, x₂, ..., x_k; n, M₁, M₂, ..., M_k) = \binom{M₁}{x₁} \binom{M₂}{x₂} · ... · \binom{M_k}{x_k} / \binom{N}{n}

for xᵢ = 0, 1, 2, ..., n and xᵢ ≤ Mᵢ for each i, where \sum_{i=1}^{k} xᵢ = n and \sum_{i=1}^{k} Mᵢ = N.

Thus, the joint distribution of the random variables under consideration, that is, the distribution of the numbers of outcomes of the different kinds, is a multivariate hypergeometric distribution with the parameters n, M₁, M₂, ..., and M_k.
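The multinomial formula is straightforward to program. A minimal Python sketch (our illustration; the function name is ours) reproducing Example 14:

```python
from math import factorial

def multinomial_pmf(xs, thetas):
    """f(x1,...,xk) = n!/(x1!...xk!) * theta1^x1 * ... * thetak^xk."""
    n = sum(xs)
    coef = factorial(n)
    for x in xs:
        coef //= factorial(x)        # exact integer multinomial coefficient
    p = float(coef)
    for x, th in zip(xs, thetas):
        p *= th ** x
    return p

# Example 14: among 8 readers, P(5 read A, 2 read B, 1 reads C)
print(round(multinomial_pmf([5, 2, 1], [0.50, 0.30, 0.20]), 4))   # 0.0945
```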
10 The Theory in Practice In this section we shall discuss an important application of the binomial distribution. x4 = 2. 2. that is. . and n = 12 into the formula of Definition 9. EXAMPLE 15 A panel of prospective jurors includes six married men. M1 = 6. show that the covariance of Xi and Xj is tion of Xi is nθi for i = 1. . . . one single man. that is. 7. but it is rarely scrapped. .. Xk have the multinomial distribution of Definition 8. show that the mean of the marginal distribu. (A rejected lot may be subjected to closer inspection. X2 . x2 = 1. is a multivariate hypergeometric distribution with the parameters n. k. 12. we calculate the probability of accepting a lot for several different val- ues of p. . 5. . . Suppose. and i Z j. . Since the value of p is unknown. If the number of defectives found in the sample exceeds a given acceptance number. for large 168 . The probability that a lot will be accepted by a given sampling plan. j = 1. . k. seven married women. and a statement about the maximum number of defectives allowed before rejection takes place.0450 Exercises 38. supervised conditions. and Mk . M1 . . . M3 = 7. the lot is rejected. (Since sam- pling inspection is done without replacement. . that the acceptance number is c. M2 . . Definition 8. M2 = 3. the probability of finding c or fewer defectives in a sample of size n. what is the prob- ability that a jury will consist of four married men. and four single women. −nθi θj for i = 1. Special Probability Distributions Thus. 3. 2. . 2. . But if the sample size is small relative to the lot size. the actual proportion of defectives in the lot. 4) =   20 12 = 0. the distribution of the numbers of outcomes of the different kinds. 2. the joint distribution of the random variables under consideration. In sampling inspection. this assumption is nearly satisfied. .) A sampling plan consists of a specification of the number of items to be included in the sample taken from each lot. and two single women? Solution Substituting x1 = 4. . further. . . the lot will be accepted if c defectives or fewer are found in the sample. If the selection is random.) Thus. If X1 . x3 = 5. and that the lot size is large with respect to n. underlying the binomial distribution. . the assumption of equal probabilities from trial to trial. X2 . we get      6 3 7 4 4 1 5 2 f (4. N = 20. namely sampling inspection. a specified sample of a lot of manufactured product is inspected under controlled. . three single men. If X1 . is given by the binomial distribution to a close approximation. The probability of acceptance. M4 = 4. . of course. 6. five married women. k. 1. n. or OC curve.35 0. for a given sampling plan (sample size.6477 0. PROBABILITY OF ACCEPTANCE.15 0. p) = B(c.9841 0. it is certain that a lot with a given small value of p or less will be accepted. a better OC curve can be obtained by increasing the sample size. .0049 A graph of L(p) versus p is shown in Figure 6. the OC curve of Figure 6 seems to do a poor job of discriminating between “good” and “bad” lots.25 0. p0 .10) for values of p greater than about 0. . In such cases. Also. let us consider the sampling plan having n = 20 and c = 3. sampling plans can be evaluated by choosing two values of p considered to be important and calculating the probabilities of lot acceptance at these values.20 0. This equation simply states that the probability of c or fewer defectives in the sample is given by the probability of 0 defectives. In this figure. there is no “gray area”. 
To illustrate the construction of an OC curve. It can be seen from this definition that.30. up to the probability of c defectives. If the actual proportion of defectives in the lot lies between 0. that is. c).1071 0.4114 0.40 0. and it is certain that a lot with a value of p greater than the given value will be rejected. as there always will be some statistical error associated with sampling. however. and a lot is accepted if the sample contains 3 or fewer defectives. p.0444 0. the probabilities that a random variable having the binomial distribution b(x. plus the probability of 1 defec- tive.30.30 0. the probability of accepting a lot having the proportion of defectives p is closely approximated by the following definition.2252 0. the actual (unknown) proportion of defectives in the lot. samples of size 20 are drawn from each lot. Special Probability Distributions lots. 20. n.10 0. DEFINITION 10. p) k=0 where p is the actual proportion of defectives in the lot.05 0. . Definition 10 is closely related to the power function. By comparison. it is somewhat of a tossup whether the lot will be accepted or rejected. and acceptance number. a number. defines the characteristics of the sampling plan. the probability of acceptance depends upon p.9) for small values of p.0160 0.45 L(p) 0. p) will assume a value less than or equal to 3 for various values of p are as follows: p 0.8670 0.10 and 0. If n is the size of the sample taken from each large lot and c is the acceptance number. Inspection of the OC curve given in Figure 6 shows that the probability of accep- tance is quite high (greater than 0. . Thus a curve can be drawn that gives the probability of accepting a lot as a function of the lot proportion defective. An “ideal” OC curve would be like the one shown in Figure 7. This value of p is 169 . That is. Referring to the line in Table I of “Statistical Tables” corresponding to n = 20 and x = 3. the probability of acceptance is low (less than 0. called the operating characteristic curve. is chosen so that a lot containing a proportion of defectives less than or equal to p0 is desired to be accepted.10. This curve. n. First. n. with each probability being approxi- mated by the binomial distribution having the parameters n and θ = p. However. The OC curve of a sampling plan never can be like the ideal curve of Figure 7 with finite sample sizes. the probability of acceptance is closely approximated by  c L(p) = b(k. say values less than about 0. 20 . The probability that a “good” lot will be rejected is called the producer’s risk.8 .3 .10 . and the probability that a “bad” lot will be accepted is called the consumer’s risk. The producer’s risk expresses the probability that a “good” lot (one with p < p0 ) will erroneously be rejected by the sampling plan. p1 . 170 .05 .2 Consumer's risk . or LTPD. It is the risk that the producer takes as a consequence of sampling variability.30 . On the other hand.25 . We evaluate a sampling plan by finding the probability that a “good” lot (a lot with p … p0 ) will be rejected and the probability that a “bad” lot (one with p Ú p1 ) will be accepted. called the acceptable quality level. or AQL.4 . a second value of p.0 } Producer's risk . “Ideal” OC curve. he is committing an error referred to as a type I error.97 . is chosen so that we wish to reject a lot containing a proportion of defectives greater than p1 .15 . if the true value of the parameter θ L(p) 1 0 p 1 Figure 7. Special Probability Distributions L( p) 1. 
This value of p is called the lot tolerance percentage defective.5 . OC curve.45 AQL LTPD Figure 6. Then.1 0 p .40 . The consumer’s risk is the probability that the consumer erroneously will receive a “bad” lot (one with p > p1 ). These risks are analogous to the type I and type II errors.9 .6 .35 . α and β (If the true value of the parameter θ is θ0 and the statistician incorrectly concludes that θ = θ1 .7 . 1756 0. and the consumer’s risk is 0.0001 A graph of this OC curve is shown in Figure 8.7358 = 0.4 .) Suppose an AQL of 0.3 .97.0076 0.0005 0.3917 0.3 . we observe that the producer’s risk is 1 − 0. To produce a plan with better characteristics. This plan obviously has an unacceptably high consumer’s risk—over 40 percent of the lots received by the consumer will have 20 percent defectives or greater.40 0.2 .0243 0.15 0.05). Special Probability Distributions is θ1 and the statistician incorrectly concludes that θ = θ0 . to decrease the acceptance number.25 0.0692 p 0 . Solution First. or both.20 for the sampling plan defined by n = 20 and c = 1.05 0.41.2642.0 .0692 0.20 0. it can be seen from Figure 6 that the given sampling plan has a producer’s risk of about 0.30 0.9 . since the probability of acceptance of a lot with an actual proportion defective of 0. EXAMPLE 16 Find the producer’s and consumer’s risks corresponding to an AQL of 0. we calculate L(p) for various values of p.05 and an LTPD of 0.8 .20 is chosen.05 is chosen (p0 = 0.4 .05 is approximately 0.45 L(p) 0.0021 0.10 0. Similarly.5 .1 .35 0.0692. c. From this graph. the consumer’s risk is about 0. n. if an LTPD of 0.03.2 . L( p) 1. Referring to Table I of “Statistical Tables” with n = 20 and x = 1. Then.7 .7358 . it will be necessary to increase the sample size. OC curve for Example 16.05 . The following example shows what happens to these characteristics when c is decreased to 1. he is committing a type II error. 171 . while n remains fixed at 20.6 . Note that the work of constructing OC curves can be shortened considerably using computer software such as Excel or MINITAB.6 AQL LTPD Figure 8.1 . we obtain the following table: p 0.5 .7358 0. what is the probability that he will get (b) the values in the P(X LESS OR = K) column.000 hours of operation. If 40 percent of the mice used in an experiment will be correct to say three out of five. If one drug will become very aggressive within 1 minute. In practice. ability that five of the next six divorce cases filed in this city will claim incompatibility as the reason. Choice then is made of the sampling plan whose OC curve has as nearly as possible the desired characteristics. and the third answer (a) the values in the P(X = K) column.20 (20 percent defectives). distribution? 45. cars stolen in the given city anywhere from 3 to 5 will be anced die and checking the first answer if he gets a 1 or 2. The preceding example has been somewhat artificial. Assuming that this claim is true. A multiple-choice test consists of eight questions and 46. using the second answer if he gets a 3 or 4. if he gets a 5 or 6. use Table I distribution by half. suppose that the per- centage had been 51 instead of 50. four decimals. Twenty products at least 8 of 10 cars stolen in this city will be recov. It was found that three of them required service (a) the values in the P(X = K) column. find the probability predictions. printout of Figure 1.42 to rework both parts of that exercise. suppose that the per- 41. exactly four correct answers? 47. time a given product will sustain fewer than 1. 
18 and θ = 0. board member claims that four out of five newly hired (b) Table I of “Statistical Tables. Find the prob. tribution having the parameters n and θ . It would be quite unusual to specify an LTPD as high as 0.63 that a car stolen 51. and producer’s risk for sample sizes in an acceptable range. find the probability that among 10 rect). A social scientist claims that only 50 percent of all high school seniors capable of doing college work actually 50. and higher sample sizes than 20 usually are used for acceptance sampling. An automobile safety engineer claims that 1 in 10 centage had been 42 instead of 40. With reference to Exercise 43. In planning the operation of a new school. the two become very aggressive within 1 minute after having been board members have been about equally reliable in their administered an experimental drug. In the past. before 1. In a certain city. using tested.” teachers will stay with the school for more than a year. (a) To reduce the standard deviation of the binomial go to college. Use a suitable table or 42. A manufacturer claims that at most 5 percent of the in a certain Western city will be recovered. can be made about the standard deviation of the resulting (c) at most 8 will go to college. With reference to Exercise 44.” newly hired teachers stayed with the school for more than a year? 44. using or the other has to be right. recovered. consumer’s risk. Suppose that the probability is 0. Applied Exercises SECS. LTPD. but now the producer’s risk seems unacceptably high. were selected at random from the production line and ered. while another school board member claims that it would 43. one school (a) the formula for the binomial distribution. so in the absence of any other information that exactly 6 of 15 mice that have been administered the we would assign their judgments equal weight. what is the probability that at least 3 of 5 automobile accidents are due to driver fatigue? 48. Evidently.51 to rework the three parts of that exercise. If a student answers each question by rolling a bal. Comment on the manu- (b) the values in the P(X LESS OR = K) column. With reference to Exercise 45 and the computer three answers to each question (of which only one is cor. Use a suitable table automobile accidents is due to driver fatigue. 1–4 40. what probabilities would (a) the formula for the binomial distribution. Using the or a computer printout of the binomial distribution with formula for the binomial distribution and rounding to n = 15 and θ = 0. using 49. OC curves have been calculated for sampling plans having many different combinations of n and c. 172 . what change must be made in the of “Statistical Tables” to find the probabilities that among number of trials? 18 high school seniors capable of doing college work (b) If n is multiplied by the factor k in the binomial dis- (a) exactly 10 will go to college. a larger sample size is needed. AQL. Special Probability Distributions Reduction of the acceptance number from 3 to 1 obviously has improved the consumer’s risk. what statement (b) at least 10 will go to college. Use the com. facturer’s claim. incompatibility is given as the legal a computer printout of the binomial distribution with n = reason in 70 percent of all divorce cases.000 hours puter printout of Figure 1 to find the probability that of operation before requiring service. we assign to their claims if it were found that 11 of 12 (b) Table I of “Statistical Tables. but the law of large (c) 2 have college degrees. 
tical Tables” to (b) Can this result be used as evidence that the assump. In (a) none has a college degree. note degrees.000 flips of a balanced coin the proportion of is turned on or off? Assume that the conditions underly- heads will be between 0. If the probability is 0. Check in each case whether the condition for the believe it. 10 have college ing in advertisements. If the probability is 0. (a) verify the result of Example 5. results may seem quite surprising. A shipment of 80 burglar alarms contains 4 that are straight for the first time on the sixth take? defective.47 and 0.497 and 0. ing condition. beginning with page 1 and proceeding in any con- venient. (c) in 1. otherwise.30 that a certain actor will get his lines straight on any one take. of two hand-held calculators from each incoming lot of Note that this serves to illustrate the law of large num. (a) N = 200 and n = 12. (Note that 0 cannot be a leftmost digit. 64. the probability (c) N = 640 and n = 30. Find the mean and the variance of the hypergeometric 57. Record the first 200 numbers encountered in a news. . 2’s. of rumor about the transgressions of a certain politician. numbers tells you that you must be estimating correctly. . what are the probabilities that 3’s.000 flips of a balanced coin the proportion of 63. is 0. tion is reasonable? Why? (b) rework Exercise 59. rolled? Why? (b) Theorem 5 and Table I of “Statistical Tables. size 18 and accepts the lot if they are both in good work- bers.) The (b) 1 has a college degree. what is the probability will be between 0. 66. and use the formula and Table I of “Statis- such calls last that long. 5–7 65. A quality control engineer inspects a random sample heads will be between 0. Include also numbers appear. Special Probability Distributions 52.000. What are the probabilities 55. (d) all 3 have college degrees? SECS.0074.503. 60. using are both 0. (a) a family’s fourth child is their first son.001 that the switch will (a) in 900 flips of a balanced coin the proportion of heads fail any time it is turned on or off. and M = 10. the leftmost digit is 7. systematic fashion. for the second time on the fifteenth shot using (b) Would it surprise you if more than 18 “sevens” were (a) the formula for the negative binomial distribution. N = 16. If 3 of the applicants are randomly chosen for the leftmost digit. If the probabilities of having a male or female child distribution with n = 3. (a) Use a computer program to calculate the proba. . interviews. (b) 8 calculators that are not in good working condition. What is the probability that he will get his lines 68. 56. Use Chebyshev’s theorem and Theorem 3 to verify 62.” 53.50. the entire lot is inspected with the cost charged to the vendor. that the switch will not fail during the first 800 times that it (b) in 10. 59. (b) the formulas of Theorem 7. Adapt the formula of Theorem 5 so that it can be used bility that more than 12 of 80 business telephone calls last to express geometric probabilities in terms of binomial longer than five minutes if it is assumed that 10 percent of probabilities. 61. You can get a feeling for the law of large numbers that such a lot will be accepted without further inspection given Section 4 by flipping coins.40 and 0. (b) a family’s seventh child is their second daughter. What is the probability that an IRS auditor will catch only 2 income tax returns with illegitimate deductions if 58. and record the proportions of 1’s.60. five flips. ing the geometric distribution are met and use logarithms. 
binomial approximation to the hypergeometric distribu- (b) the fifteenth person to hear the rumor will be the tion is satisfied: tenth to believe it. (b) N = 500 and n = 20. In a “torture test” a light switch is turned on and off that the probability is at least 35 36 that until it fails. An expert sharpshooter misses a target 5 percent of bility of rolling between 14 and 18 “sevens” in 100 rolls of the time. (c) 12 calculators that are not in good working condition? paper. Find the probability that she will miss the target a pair of dice. Among the 16 applicants for a job. Flip a coin 100 times if it contains and plot the accumulated proportion of heads after each (a) 4 calculators that are not in good working condition. For each of these numbers. (c) a family’s tenth child is their fourth or fifth son.75 that a person will believe a she randomly selects 5 returns from among 15 returns. . When taping a television commercial. If 3 from the shipment are randomly selected 173 . find the probabilities that (a) the results of Exercise 64. the decimal number 0. 54. (a) Use a computer program to calculate the proba. find which 9 contain illegitimate deductions? the probabilities that (a) the eighth person to hear the rumor will be the fifth to 67.53. and 9’s. 3. 81. x (from 5 to 15) for which the percentage error is great- est when we use the Poisson distribution with λ = 7. administers the pension fund. (b) Approximate this probability using the appropriate Use the Poisson approximation to the binomial distribu.000 per- customer will get exactly one bad unit using sons attending the fair at most 2 will get food poisoning.000 manufactured products assumed to contain six defectives. 74. Special Probability Distributions and shipped to a customer. a certain kind of compact car will average less 75. most one imperfection using (b) n = 25 and θ = 0. A panel of 300 persons chosen for jury duty includes the Poisson distribution to find the probability that it will 30 under 25 years of age. 8–9 (b) at most 3 will be involved in at least one accident in any given year. tion to determine the probability that among 150 licensed (c) Approximate this probability using the appropriate drivers randomly chosen in this city Poisson distribution and compare the results of parts (a).0012 that a per.40.10 that. printout of Figure 4.50. determine the value of (c) anywhere from 4 to 6 such illnesses in a given year. Poisson distribution with λ = 3. Check in each case whether the values of n and θ results of Exercise 78. Use Table II of “Statistical Tables” determine the probability that among 150 calls received to find the probabilities of by the switchboard 2 are wrong numbers. from 28 to 32 miles per gallon. Use the formula for the Poisson distribution representative. Records show that the probability is 0. and 0. an 80.05. the probability of having one of the 12 jurors (a) without a breakdown. he argues. 240 are foot sheet of the metal will have anywhere from 8 to 12 union members. (c) n = 120 and θ = 0. Use sonous plant is a random variable having a Poisson distri- the Poisson approximation to the binomial distribution to bution with λ = 5. Find the probability state fair.8. or neither when we want to use rolls. ple of size 100 taken from a lot of 1.4 percent of the become seriously ill each year from eating a certain poi- calls received by a switchboard are wrong numbers. 3 will average less than 174 . (b) the computer printout of Figure 5. Since the jury of 12 persons cho. 
Use the Poisson approximation to the binomial that among 10 such cars tested. SECS. (a) 3 such illnesses in a given year.04. The number of complaints that a dry-cleaning estab- lishment receives per day is a random variable having a (b) the binomial distribution as an approximation. in city driving. what is the ratio of these two probabilities? 79. It is known from experience that 1. under 25 years of age should be many times the probabil- ity of having none of them under 25 years of age. 73. and (c). 77. (a) Table II of “Statistical Tables”. sen from this panel to judge a narcotics violation does 78. (b) at least 10 such illnesses in a given year. (a) the formula of the hypergeometric distribution. The probabilities are 0. variable having the Poisson distribution with λ = 0. In a certain desert region the number of persons who 72. 4 percent of all licensed drivers will be involved in at least one car accident in any given year. the youthful puter is a random variable having a Poisson distribution defendant’s attorney complains that this jury is not really with λ = 1. (a) Use a computer program to calculate the exact approximate the binomial distribution with n = 150 and probability of obtaining one or more defectives in a sam- θ = 0. given year.2. bilities. Use Table II of “Statistical Tables” to verify the 71. than 28 miles per gallon. find the probability that 4 (b) the values in the P(X LESS OR = K) column. The number of monthly breakdowns of a super com- not include anyone under 25 years of age. using employees are chosen by lot to serve on a committee that (a) the values in the P(X = K) column. 0. find the probability that the distribution to find the probability that among 1. Find the probability that 2 yards of the fabric will have at (a) n = 125 and θ = 0. find the probability that a 15-square- 69. Use the formula for 70. if the selection were to find the probabilities that this computer will function random. binomial distribution. (a) only 5 will be involved in at least one accident in any (b). 83. In the inspection of a fabric produced in continuous excellent approximation.10. With reference to Example 8. whereas the others are not. (d) n = 40 and θ = 0. son will get food poisoning spending a day at a certain or more than 32 miles per gallon. Among the 300 employees of a company. satisfy the rule of thumb for a good approximation. Indeed.06. Actually.25.5 to 82. (b) with only one breakdown. If 6 of the defects. receive only two complaints on any given day.05. With reference to Example 13 and the computer (b) the binomial distribution as an approximation. the number of imperfections per yard is a random the Poisson distribution to approximate binomial proba. In a given city. of the 6 will be union members using (a) the formula for the hypergeometric distribution. 76. Statistical Distribu. Gleser. Find the AQL and the LTPD of the sampling plan in other 2 mints. If the AQL is 0. Keeping the producer’s risk at 0. tributions may be found in Boston: Houghton Mifflin Company. the consumer’s risk is 0. C. sample size of 10 and an acceptance number of 0. find the producer’s and is the probability that among 6 of the bricks (chosen at consumer’s risks. that it will contain only errors favoring the state. S. (b) What is the probability of accepting a lot whose true tain only errors favoring the state.C. National Hastings. No.20. and Kotz.. 89. 1975.1 and the LTPD is 0. 0. three that produce round green seeds. 
Among 25 silver dollars struck in 1903 there are 15 sample size of 15 and an acceptance number of 1. If 18 defective glass bricks include 10 that have cracks are 0. A sampling inspection program has a 0.. 175 .03 is the LTPD. 4 proportion of defectives is 0. the AQL is 0. A.10.. (b) find the LTPD corresponding to a consumer’s risk of kled yellow seeds. Ltd. wrinkled yellow of the AQL and the LTPD? seeds. Government Printing and Office. Derman. Sketch the OC curve for a sampling plan having a Orleans mint. and a 0. 6 will average from 28 to 32 miles per 0. 0. and none that produce wrinkled green seeds? 92.01 is the AQL and 0.05 and the LTPD is 0. If 5 of these sil. true proportion of defectives is 0..10 probability if the AQL is 0. 1980. and 2 97.10. and Olkin. that it will contain only errors favoring the tax. that among nine plants thus obtained there will be four that produce round yellow seeds. 16 .3 in both sampling of rejecting a lot when the true proportion of defectivesis plans? References Useful information about various special probability dis.60. 5 that have discoloration but no 96. 6.03 and the LTPD or that it will contain both kinds of errors. Tables of the Binomial Probability Distribution..07. Probability Mod. (a) What is the probability of accepting a lot whose true domly chosen for audit.10 that a state income tax return will be filled out cor- rectly. D. or wrinkled green seeds are. Binomial probabilities for n = 2 to n = 49 may be els and Applications. and 1 will contain both proportion of defectives is 0. 10 (b) How do the producer’s and consumer’s risks change 88. If 0. Discrete Distributions. (a) In Exercise 92 change the acceptance number will have cracks and discoloration? from 1 to 0 and sketch the OC curve. SEC. L.10. Washington. 0.05. what are the producer’s and con- 84.. 2 will con. from the Philadelphia mint. and 3 from the San Francisco mint. respectively. the probabilities of getting a and the consumer’s risk at 0. New York: Macmillan Publishing found in Co.. Sketch the OC curve for a sampling plan having a ver dollars are picked at random. What is the is 0. 1969.05 with wrinkled green seeds. probability that among 12 such income tax returns ran. I. From Figure 6. Special Probability Distributions 28 miles per gallon.10. London: Butterworth & Co.: U.S.07? kinds of errors? 85. and sumer’s risks? 0. 93. round green seeds. if 90. L. what pling plan given in Exercise 92. J. random for further checks) 3 will have cracks but no dis- coloration. but no discoloration. The producer’s risk in a sampling program is 0. 7 from the New Orleans mint. 1 will have discoloration but no cracks. N. 16 . and 1 will average more than 32 miles per gallon. Sketch the OC curve for a sampling plan having a 86.25 in the sam- cracks.. and 3 that have cracks and discoloration.03? will contain only errors favoring the taxpayer. What is the probability (a) find the producer’s risk if the AQL is 0. (b) 3 from the Philadelphia mint and 1 from each of the 95. Suppose that the probabilities are 0. Johnson. Inc. 9 3 3 1 91.05 and payer.. and Peacock. find the probabilities sample size of 25 and an acceptance number of 2. and 16 . what are the new values plant that produces round yellow seeds. J. 5 will be filled out correctly.01. N. B. Exercise 93 if both the producer’s and consumer’s risks 87. two that produce wrin.. According to the Mendelian theory of heredity. 16 . of getting (a) 4 from the Philadelphia mint and 1 from the New 94. Bureau of Standards Applied Mathematics Series tions.03. 
Suppose the acceptance number in Example 16 is plants with round yellow seeds are crossbred with plants changed from 1 to 2.95 probability of rejecting the lot when the gallon. 1950.10. H. 43 (a) 0. 50–100 Binomial Tables. 63 (a) 0. (b) 0. 1953. (b) 0. (b) α3 →0 when n→q. 85 0.  (0) = λ. μ3 = μ(3) + 3μ(2) + μ(1) . Poisson’s Exponential Binomial Limit. pany.4944. (b) The condition is sat- μ(4) + 6μ(3) + 7μ(2) + μ(1) . (c) The condition is satisfied. 57 (a) 0. and μ4 = 67 (a) The condition is not satisfied.17.1493. (b) Fx (t) = [1 + θ (t − 1)]n . σY2 = −1 . 53 (a) 0.0117. Inc. (b) 0.2206. (b) 0. (c) 0.     thumb for good approximation is satisfied..0861 and con- (a) μ = 15 2 39 15 2 39 sumer’s risk = 0. 87 0. (b) The rule of 15 (a) α3 = 0 when θ = 12 . 59 0. G. E. Special Probability Distributions and for n = 50 to n = 100 in Molina. 95 AQL = 0. (b) μ = 8 and σ = 64 .0397. 176 . Fla. 49 0.0841. 51 0.0538. 41 0. 77 0.0504. 71 (a) Neither rule of thumb is satisfied. LTPD = 0.2458.0469. 91 (a) 0.2941. 1973 Reprint. (c) 0.4013 and consumer’s risk = 0. (b) 0.0282. (c) The rule of 1 k 1 17 μY = k − 1 .1653. (c) 0. 89 (a) 0.0980.35.0504. (b) 0.2975. MY (t) = eλ(e −t−1) .0754. 61 (a) 0.8795. 75 0.. 69 (a) 0.0625.9222.2478. (b) 0. (b) 0.0292. 81 (a) 0. (b) 0. 0. The most widely used table of Poisson probabilities is Answers to Odd-Numbered Exercises 11 μ2 = μ(2) + μ(1) . 79 (a) 0. Romig. Krieger Publishing Com- Wiley & Sons. (d) Neither θ θ θ rule of thumb is satisfied.2066. (b) 0. σY2 = MY t 37 73 x = 15.07. thumb for excellent approximation is satisfied..: Robert E.2066.2051.0970. 45 (a) 0. 47 0. Plan 2 (c = 1): producer’s risk = 65 8 and σ = 64 . 83 0.0086.95. C. New York: John Melbourne. 97 (b) Plan 1 (c = 0): producer’s risk = 0. isfied.10.2008.2205. 13 (a) Fx (t) = 1 − θ + θt.1293.5948.2041.33. Irwin Miller. 2 The Uniform Distribution DEFINITION 1. Eighth Edition. 177 . THEOREM 1. and Chi-Square Distribution Distributions 7 The Bivariate Normal Distribution 4 The Beta Distribution 8 The Theory in Practice 1 Introduction In this chapter we shall study some of the probability densities that figure most prominently in statistical theory and in applications. In addition to the ones given in the text. again leaving some of the details as exercises. In Exercise 2 the reader will be asked to verify the following theorem. and may be pictured as in Figure 1. UNIFORM DISTRIBUTION. We shall derive parameters and moment-generating functions. Inc. several others are introduced in the exercises following Section 4. A random variable X has a uniform distri- bution and it is referred to as a continuous uniform random variable if and only if its probability density is given by ⎧ ⎪ ⎨ 1 for α < x < β u(x. β) = β − α ⎪ ⎩0 elsewhere The parameters α and β of this probability density are real constants. α. Freund’s Mathematical Statistics with Applications. Marylees Miller. Copyright © 2014 by Pearson Education.Special Probability Densities 1 Introduction 5 The Normal Distribution 2 The Uniform Distribution 6 The Normal Approximation to the Binomial 3 The Gamma. All rights reserved. The mean and the variance of the uniform distribution are given by α+β 1 μ= and σ 2 = (β − α)2 2 12 From Chapter 6 of John E. with α < β. Exponential. b) 1 ba x a b Figure 1. it lends itself readily to the task of illustrating various aspects of statistical theory. its main value is that. 3 The Gamma. and k must be such that the total area under the curve is equal x to 1. due to its simplicity. 
and it defines the well-known gamma function  q (α) = yα−1 e−y dy for α > 0 0 which is treated in detail in most advanced calculus texts. a. To evaluate k. Integrating by parts. Although the uniform distribution has some direct applications. which is left to the reader in Exercise 7. which yields β  q  q kxα−1 e−x/β dx = kβ α yα−1 e−y dy 0 0 The integral thus obtained depends on α alone. we first make the substitution y = . The uniform distribution. we find that the gamma function satisfies the recur- sion formula (α) = (α − 1) · (α − 1) 178 . β > 0. Special Probability Densities u(x. and Chi-Square Distributions Let’s start with random variables having probability densities of the form  kxα−1 e−x/β for x > 0 f (x) = 0 elsewhere where α > 0. Exponential. Special Probability Densities for α > 1. GAMMA DISTRIBUTION. Graphs of gamma distributions. To give the reader some idea about the shape of the graphs of gamma densities. we obtain the following definition. α. as the reader will be asked to verify in Exercise 9. b1 2 3 4 1 a  2. When α is not a positive integer. DEFINITION 2. b  5 4 x 0 1 2 3 4 5 6 Figure 2. the value of (α) will have to be looked up in a special table. and since  q (1) = e−y dy = 1 0  (α)√= (α − 1)! when it follows by repeated application of the recursion formula that α is a positive integer. Returning now to the problem of evaluating k. for α = 1 and β = θ . Some special cases of the gamma distribution play important roles in statistics. an important special value is  12 = π . we equate the integral we obtained to 1. A random variable X has a gamma distribu- tion and it is referred to as a gamma random variable if and only if its probability density is given by ⎧ ⎪ ⎨ 1 α xα−1 e−x/β for x > 0 g(x. Also. those for several special values of α and β are shown in Figure 2. β) = β (α) ⎪ ⎩0 elsewhere where ␣ > 0 and ␤ > 0. f (x) 1 1 a . b  2 1 2 1 1 a  11. 179 . getting  q kxα−1 e−x/β dx = kβ α (α) = 1 0 and hence 1 k= β α (α) This leads to the following definition of the gamma distribution. for instance. (ii) the probability of more than one success during such a time interval is negligible. the waiting time until the first success. Clearly. Exponential distribution. This density is pictured in Figure 3. 180 . αy) e−αy (αy)0 = 1− 0! = 1 − e−αy for y > 0 g (x . Special Probability Densities DEFINITION 3. and (iii) the probability of a success during such a time interval does not depend on what happened prior to time t. A random variable X has an exponen- tial distribution and it is referred to as an exponential random variable if and only if its probability density is given by ⎧ ⎪ ⎨ 1 e−x/θ for x > 0 g(x. EXPONENTIAL DISTRIBUTION. Let us determine the probability density of the continuous random variable Y. u) 1 u x 0 Figure 3. θ ) = θ ⎪ ⎩0 elsewhere where ␪ > 0. The number of successes is a value of the discrete random variable X having the Poisson distribution with λ = αt. F(y) = P(Y F y) = 1 − P(Y > y) = 1 − P(0 successes in a time interval of length y) = 1 − p(0. Let us consider there is the probability of getting x successes during a time inter- val of length t when (i) the probability of a success during a very small time interval from t to t + t is α · t. v) = 2 (ν/2) ⎪ ⎩0 elsewhere The parameter ν is referred to as the number of degrees of freedom.4 1 . by virtue of condition (see Exercise 16). The chi-square distribution plays a very important role in sam- pling theory.4x −1. 
we find that differentiation with respect to y yields  αe−αy for y > 0 f (y) = 0 elsewhere 1 which is the exponential distribution with θ = . DEFINITION 4. A random variable X has a chi-square distribution and it is referred to as a chi-square random variable if and only if its probability density is given by ⎧ ⎪ ⎨ 1 ν−2 x ν/2 x 2 e− 2 for x > 0 f (x. we find that the desired probability is  1/6 1/6 −8. it applies also to the waiting times between successes. CHI-SQUARE DISTRIBUTION. ν Another special case of the gamma distribution arises when α = and β = 2. let us first prove the follow- ing theorem.4. To derive formulas for the mean and the variance of the gamma distribution.4 8. the waiting time is a random variable having an exponential distribution with θ = 8.75. Having thus found the distribution function of Y.4x −8. and since 1 5 minutes is 6 of the unit of time.4e dx = −e = −e +1 0 0 which is approximately 0. What is the probability of a waiting time of less than 5 minutes between cars exceeding the speed limit by more than 10 miles per hour? Solution Using half an hour as the unit of time. Therefore. the number of cars exceeding the speed limit by more than 10 miles per hour in half an hour is a random variable having a Poisson distribution with λ = 8. we have α = λ = 8. and hence the exponential and chi-square distributions. Special Probability Densities and F(y) = 0 for y F 0. 181 . α The exponential distribution applies not only to the occurrence of the first suc- cess in a Poisson process but. 2 where ν is the lowercase Greek letter nu. or simply the degrees of freedom.4. EXAMPLE 1 At a certain location on highway I-10. Since the integral on the right is (r + α) according β to the definition of gamma function. let us now derive the following results about the gamma distribution. Special Probability Densities THEOREM 2. The rth moment about the origin of the gamma distribution is given by β r (α + r) μr = (α) Proof By using the definition of the rth moment about the origin. THEOREM 3. THEOREM 4. we obtain the following 2 corollaries. The mean and the variance of the exponential distribution are given by μ = θ and σ 2 = θ 2 COROLLARY 2. COROLLARY 1. The moment-generating function of the gamma distribution is given by MX (t) = (1 − βt)−α 182 . Using this theorem. Substituting into these formulas α = 1 and β = θ for the exponential distribu- ν tion and α = and β = 2 for the chi-square distribution. The mean and the variance of the chi-square distribution are given by μ=ν and σ 2 = 2ν For future reference. this completes the proof. we get β(α + 1) μ1 = = αβ (α) and β 2 (α + 2) μ2 = = α(α + 1)β 2 (α) so μ = αβ and σ 2 = α(α + 1)β 2 − (αβ)2 = αβ 2 .  q  q 1 βr μr = xr · α xα−1 e−x/β dx = · yα+r−1 e−y dy 0 β (α) (α) 0 x where we let y = . The mean and the variance of the gamma distribution are given by μ = αβ and σ 2 = αβ 2 Proof From Theorem 2 with r = 1 and r = 2. let us give here also the moment-generating function of the gamma distribution. β) = (α) · (β) ⎪ ⎩0 elsewhere where ␣ > 0 and ␤ > 0. A random variable X has a beta distribution and it is referred to as a beta random variable if and only if its probability density is given by ⎧ ⎨ (α + β) xα−1 (1 − x)β−1 ⎪ for 0 < x < 1 f (x. Detailed discussion of the beta function may be found (α + β) in any textbook on advanced calculus. which is defined in the following way. In recent years. BETA DISTRIBUTION. whose values are denoted B(α. 
By “flex- ible” we mean that the probability density can take on a great variety of different shapes. where parameters are looked upon as random variables. β). THEOREM 5. as the reader will be asked to verify for the beta distribution in Exercise 27. we shall make use of the fact that  1 (α + β) α−1 x (1 − x)β−1 dx = 1 0 (α) · (β) and hence that  1 (α) · (β) xα−1 (1 − x)β−1 dx = 0 (α + β) This integral defines the beta function. Special Probability Densities The reader will be asked to prove this result and use it to find some of the lower moments in Exercises 12 and 13. like that of any probability density. and there is a need for a fairly “flexible” probability density for the parameter θ of the binomial distribution. α. The mean and the variance of the beta distribution are given by α αβ μ= and σ2 = α+β (α + β)2 (α + β + 1) 183 . the beta distribution has found important applications in Bayesian inference. β) = . which takes on nonzero values only on the interval from 0 to 1. B(α. 4 The Beta Distribution The uniform density f (x) = 1 for 0 < x < 1 and f (x) = 0 elsewhere is a special case of the beta distribution. We shall not prove here that the total area under the curve of the beta distribu- tion. is equal to 1. but in the proof of the theorem that follows. in other (α) · (β) words. DEFINITION 5. Similar steps.  (α + β) 1 μ= · x · xα−1 (1 − x)β−1 dx (α) · (β) 0 (α + β) (α + 1) · (β) = · (α) · (β) (α + β + 1) α = α+β where we recognized the integral as B(α + 1. Special Probability Densities Proof By definition. β) and made use of the fact that (α + 1) = α · (α) and (α + β + 1) = (α + β) · (α + β). which will be left to the reader in Exercise 28. yield (α + 1)α μ2 = (α + β + 1)(α + β) and it follows that . parameters α and β. 8. find its distribution function. the probability integral defining the gamma function can be written as that it will take on a value less than α + p(β − α) is  q equal to p. Show that if a random variable has a uniform den. Using the form of the gamma function of Exercise 8. Prove Theorem 1. If a random variable X has a uniform density with the 9. 1 2 (α) = 21−α · z2α−1 e− 2 z dz for α > 0 2. Perform a suitable change of variable to show that the sity with the parameters α and β. 0 3.2 2 (α + 1)α α σ = − (α + β + 1)(α + β) α+β αβ = (α + β)2 (α + β + 1) Exercises 1. Show that if a random variable has a uniform density . we can write 4. √  q 1 1 2 with the parameters α and β. . the rth moment about the  = 2 e− 2 z dz 2 0 mean equals (a) 0 when r is odd. and hence 1 β −α r (b) when r is even. . A random variable is said to have a Cauchy distribu. and thus show that ( 12 ) = π. (x − α)2 + β 2 10. Use integration by parts to show that (α) = (α − 1) · (a) α = 2 and β = 3. (α − 1) for α > 1. able will exceed 4 if it has a gamma distribution with 7. 1 e− 2 (x 2 +y2 ) =2 dx dy tion if its density is given by 0 0 β Change to polar coordinates to evaluate this double inte- π √ f (x) = for − q < x < q gral. 184 . Use the results of Exercise 4 to find α3 and α4 for the 2 0 0 uniform density with the parameters α and β.  q q 6.2  q   q  r+1 2 1 1 2 1 2  =2 e− 2 x dx e− 2 y dy 5. (b) α = 3 and β = 4. Find the probabilities that the value of a random vari- Show that for this distribution μ1 and μ2 do not exist. Show that a gamma distribution with α > 1 has a rel. What happens when only if its probability density is given by 0 < α < 1 and when α = 1?  β kxβ−1 e−αx for x > 0 12. = f (x) = Prove. 23. 
A random variable X has a Weibull distribution if and ative maximum at x = β(α − 1). Special Probability Densities 11. Expand the moment-generating function of the . (a) Express k in terms of α and β. 13. β where α > 0 and β > 0. making the substitution y 0 elsewhere 1 x − t in the integral defining MX (t). Theorem 4. gamma distribution. make use of the recur- 1 π (a) μ = . (d) α = 2 and β = 5. 18. Use the results of Exercise 13 to find α3 and α4 for the tial distributions. f (t) take on a value less than −θ · ln(1 − p) is equal to p for 0 F then its failure rate at time t is given by . Show that for this distribution [Hint: To evaluate ( 32 ) and ( 52 ). exponential distribution with the parameter θ . f (x) = 0 elsewhere (c) α = 2 and β = 12 . the failure rate is given by αβtβ−1 . A random variable X has a Rayleigh distribution if 27. 19. Show that if a random variable has an exponential commercial product and the values of its probability den- density with the parameter θ . What happens when relative maximum at α−1 ν = 2 or 0 < ν < 2? x= . If the random variable T is the time to failure of a 15. the probability that it will sity and distribution function at time t are f (t) and F(t). 17. μ3 . Show that if ν > 2. 24. cise 23). μ2 . 1 − F(t) failure rate at time t is the probability density of failure at 16. the chi-square distribution has a 26. With reference to Exercise 17. and μ4 . show that time t given that failure does not occur prior to time t. Show that if α > 1 and β > 1. (b) α = 3 and β = 3.  2αxe−αx for x > 0 2 (b) α = 12 and β = 1. 1 gamma distribution as a binomial series. (a) Show that if T has an exponential distribution. find α3 and α4 for the (a) α = 2 and β = 4. This question has been intentionally omitted for this (b) Show that if T has a Weibull distribution (see Exer- edition. and read off the (b) Show that μ = α −1/β  1 + . Sketch the graphs of the beta densities having and only if its probability density is given by (a) α = 2 and β = 2. the p < 1. values of μ1 . sion formula (α) = (α − 1) · (α − 1) and the result of 2 α . Thus. β Note that Weibull distributions with β = 1 are exponen- 14. α+β −2 20. the P[(X Ú t + T)|(x Ú T)] = P(X Ú t) failure rate is constant. Verify that the integral of the beta density from −q moments of Y about the origin are the corresponding to q equals 1 for moments of X about the mean. the beta density has a relative maximum at x = ν − 2. using the fact that the 25. If X has an exponential distribution. where α >0. ] 1 π (b) σ 2 = 1− . Karl Pearson. showed that the differential equation Pareto distribution α 1 d[f (x)] d−x μ= provided α > 1. Exercise 9. σ2 30. With reference to Exercise 21. one of the founders of modern statis- 22. · = α−1 f (x) dx a + bx + cx2 185 . A random variable X has a Pareto distribution if and 29. Verify the expression given for μ2 in the proof of The- α 4 orem 5. Show that the parameters of the beta distribution can only if its probability density is given by be expressed as follows in terms of the mean and the vari- ⎧ ance of this distribution: ⎪ α   ⎨ for x > 1 x α+1 μ(1 − μ) f (x) = (a) α = μ − 1 . 21. where α > 0. 28. ⎪ ⎩0 σ2 elsewhere   μ(1 − μ) (b) β = (1 − μ) − 1 . show that for the tics. Show that μr exists only if r < α. that the parameter μ is. b > 0. Pierre Laplace (1749–1827). (a) the gamma distribution when a = c = 0. 186 . which we shall study in this section. Graph of normal distribution. 
They found that the patterns (distributions) that they observed could be closely approximated by continuous curves. the square root of var(X). b. Special Probability Densities yields (for appropriate values of the constants a. is shown in Figure 4. and Karl Gauss (1777–1855). Verify that the differential equation gives (c) the beta distribution when a = 0. NORMAL DISTRIBUTION. The notation used here shows explicitly that the two parameters of the normal distribution are μ and σ . It was investigated first in the eighteenth century when scientists observed an astonishing degree of regularity in errors of measurement. 5 The Normal Distribution The normal distribution. (b) the exponential distribution when a = c = d = 0 and and d) most of the important distributions of statistics. μ. First. shaped like the cross section of a bell. d > −b. μ. A random variable X has a normal distribu- tion and it is referred to as a normal random variable if and only if its probability density is given by  1 x−μ 2 −1 n(x. where X is a random variable having the normal distribution with these two parameters. σ ) = √ e 2 σ for − q < x < q σ 2π where ␴ > 0. c. let us show that the formula of Definition 6 can serve as a prob- ability density. x m Figure 4. b = −c. which they referred to as “normal curves of errors” and attributed to the laws of chance. E(X) and that the parameter σ is. however. and and db > −1. in fact. DEFINITION 6. Since the values of n(x. is in many ways the cornerstone of modern statistical theory. The mathematical properties of such normal curves were first studied by Abraham de Moivre (1667–1745). in fact. It remains to be shown. b > 0. d−1b < 1. σ ) are evidently positive as long as σ > 0. though. The graph of a normal distribution. we get σ  q  2  q  q 1 − 1 x−μ 1 1 2 2 1 2 √ e 2 σ dx = √ e− 2 z dz = √ e− 2 z dz −q σ 2π 2π −q 2π 0   12 √ π Then. indeed. 187 . since the integral on the right equals √ = √ according to Exercise 9.  q  2 1 − 1 x−μ MX (t) = ext · √ e 2 σ dx −q σ 2π  q 1 − 1 [−2xtσ 2 +(x−μ)2 ] = √ · e 2σ 2 dx σ 2π −q and if we complete the square. use the identity −2xtσ 2 + (x − μ)2 = [x − (μ + tσ 2 )]2 − 2μtσ 2 − t2 σ 4 we get ⎧  2 ⎫ ⎪ ⎨  q x−(μ+tσ 2 ) ⎪ ⎬ 1 2 1 − 12 σ MX (t) = eμt+ 2 t σ 2 √ · e dx ⎪ ⎩ σ 2π −q ⎪ ⎭ Since the quantity inside the braces is the integral from −q to q of a normal density with the parameters μ + tσ 2 and σ . that is. the mean and the standard deviation of the normal distribution. The moment-generating function of the normal distribution is given by 1 2 2 MX (t) = eμt+ 2 σ t Proof By definition. and hence is equal to 1. 2π 2 Next let us prove the following theorem. E(X) = μ and var(X) = (μ2 + X σ ) − μ = σ 2. Thus. THEOREM 6. it follows that 1 MX (t) = eμt+ 2 σ 2 t2 We are now ready to verify that the parameters μ and σ in Definition 6 are. it 2 2 √ 2 π follows that the total area under the curve is equal to √ · √ = 1. Integrating from −q x−μ to q and making the substitution z = . 2 2 Since the normal distribution plays a basic role in statistics and its density cannot be integrated directly. Twice dif- ferentiating MX (t) with respect to t. Special Probability Densities we must show that the total area under the curve is equal to 1. we get  MX (t) = (μ + σ 2 t) · MX (t) and  MX (t) = [(μ + σ 2 t)2 + σ 2 ] · MX (t) so that MX (0) = μ and M (0) = μ2 + σ 2 . its areas have been tabulated for the special case where μ = 0 and σ = 1. 08.4599 − 0. (b) less than −0. By virtue of the symmetry of the normal distribution about its mean. and 3. and z = 6. 
Solution (a) We look up the entry corresponding to z = 1.0987 + 0.02.5000 = 0.0. add 0. subtract it from 0.72 in the standard normal distri- bution table.75 and z = 1. (b) We look up the entry corresponding to z = 0.0. EXAMPLE 2 Find the probabilities that a random variable having the standard normal distribu- tion will take on a value (a) less than 1.00.1736 = 0. (d) between −0. add them (see Figure 6).01. for z = 0. .25 and z = 0.5000 − 0. it is unnecessary to extend the table to negative values of z.1894. 188 . 0 z Figure 5. the probabilities that a random variable having the standard normal distribu- tion will take on a value on the interval from 0 to z. (c) We look up the entries corresponding to z = 1. . The entries in standard normal distribution table. . .88.45. The normal distribution with ␮ = 0 and ␴ = 1 is referred to as the standard normal distribution. and get 0. 0. and get 0. and get 0.25 and 0. 3.5000 (see Figure 6).4573 + 0.75.09 and also z = 4.2723. Special Probability Densities DEFINITION 7. are the values of  z 1 1 2 √ e− 2 x dx 0 2π that is. Tabulated areas under the standard normal distribution.5000 (see Figure 6). STANDARD NORMAL DISTRIBUTION. and get 0.45 in the table.30 and 1.9573.4032 = 0.0. 0. represented by the shaded area of Figure 5. subtract the second from the first (see Figure 6).72.3106 = 0. (d) We look up the entries corresponding to z = 0.88 in the table. (c) between 1. z = 5.30 in the table.0567. 04 and z = 1.2549. THEOREM 7. Diagrams for Example 2.25 0 0. corresponding to z = 0. we always choose the z value corresponding to the tabular value that comes closest to the specified probability. Solution (a) Since 0. Occasionally. In that case. (b) Since 0.3508 and 0.1736 0. for convenience. EXAMPLE 3 With reference to the standard normal distribution table.3531.88 0 0.45 Figure 6. and since 0.4032 0. we choose z = 0.75 0. if the given probability falls midway between tabular values.3512.05.3508 than 0.69. However. we are required to find a value of z corresponding to a specified probability that falls between values listed in the table.2533 falls midway between 0. we make use of the follow- ing theorem.3512 falls between 0.3531.3512 is closer to 0. then X −μ Z= σ has the standard normal distribution. To determine probabilities relating to random variables having normal distri- butions other than the standard normal distribution.4573 0. find the values of z that correspond to entries of (a) 0.5000 0. (b) 0. If X has a normal distribution with the mean μ and the stan- dard deviation σ .3106 0.72 0.0987 0. corresponding to z = 1. 189 . we choose z = 1.2533.0567 z z 0 1.1894 z z 0 1.685. Special Probability Densities 0.30 1.2517 and 0.68 and z = 0. we shall choose for z the value falling midway between the corre- sponding values of z.04. 0.0749. 190 . Z x1 − μ x2 − μ must take on a value between z1 = and z2 = when X takes σ σ on a value between x1 and x2 . we can write  x2 1  x−μ 2 1 − P(x1 < X < x2 ) = √ e 2 σ dx 2π σ x1  z2 1 1 2 =√ e− 2 z dz 2π z1  z2 = n(z. to use the standard normal distribution table in connection with any ran- dom variable having a normal distribution.20 mrem of cosmic radiation on such a flight? Solution 5. What is the prob- ability that a person will be exposed to more than 5.44 Figure 7.20 z0 z  1.4251 0.35 mrem and a standard deviation of 0. we get 0. Special Probability Densities Proof Since the relationship between the values of X and Z is linear.59 subtracting it from 0.5000 (see Figure 7). Thus.0749 x 4. Diagram for Example 4. 
Probabilities relating to random variables having the normal distribution and several other continuous distributions can be found directly with the aid of computer 0. 1) dz z1 = P(z1 < Z < z2 ) where Z is seen to be a random variable having the standard normal distribution. we simply perform the change of scale x−μ z= .59 mrem.5000 − 0. Hence.4251 = 0.44 in the table and 0.20 − 4.35 Looking up the entry corresponding to z = = 1.35 5. σ EXAMPLE 4 Suppose that the amount of cosmic radiation to which a person is exposed when fly- ing by jet across the United States is a random variable having a normal distribution with a mean of 4. 191 .ØØØØ Ø. Figure 8 shows the histograms of binomial distributions with θ = 12 and n = 2.5620.6ØØØ Ø.7757 Thus. the probability of a success on an individual trial. n2 n5 n  10 n  25 Figure 8.0000 − 0. the required probability is 1.7487 − 0. is close to 12 .1867 24.7 9.1 will assume a value between 10.7 and the standard deviation 9.8.6 and 24. Binomial distributions with θ = 12 . SUBC>Chisquare 25 3Ø. SUBC>Normal 18. EXAMPLE 5 Use a computer program to find the probability that a random variable having (a) the chi-square distribution with 25 degrees of freedom will assume a value greater than 30.2243.7757 = 0. The following example illus- trates such calculations using MINITAB statistical software. Special Probability Densities programs especially written for statistical applications. the required probability is 0. 6 The Normal Approximation to the Binomial Distribution The normal distribution is sometimes introduced as a continuous distribution that provides a close approximation to the binomial distribution when n. is very large and θ .1. SUBC>Normal 18. (b) MTB>CDF C2. and MTB>CDF C3. the number of trials. we select the option “cumulative distribution” to obtain the following: (a) MTB>CDF C1. (b) the normal distribution with the mean 18. 1Ø.1867 = 0. Solution Using MINITAB software.8ØØ Ø.1.7487 Thus.7 9. Then. taking logarithms and substi- tuting the Maclaurin’s series of et/σ . To provide a theoretical foundation for this argument. let us first prove the fol- lowing theorem. then the moment-generating function of X − nθ Z= √ nθ (1 − θ ) approaches that of the standard normal distribution when n→q. Proof Making use of theorems relating to moment-generating functions of the binomial distribution. Special Probability Densities 5. and 25. and it can be seen that with increasing n these distributions approach the symmetrical bell-shaped pattern of the normal distribution. If X is a random variable having a binomial distribution with the parameters n and θ . we can write MZ (t) = M X−μ (t) = e−μt/σ · [1 + θ (et/σ − 1)]n σ √ where μ = nθ and σ = nθ (1 − θ ). 10. we get μt ln M X−μ (t) = − + n · ln[1 + θ (et/σ − 1)] σ σ ⎡  ⎤ . THEOREM 8. 2 . using the infinite series ln(1 + x) = x − 12 x2 + 13 x3 − · · · . to expand this logarithm. it follows that .3 μt t 1 t 1 t = − + n · ln ⎣1 + θ + + +··· ⎦ σ σ 2 σ 6 σ and. which con- verges for |x| < 1. 2 .  μt t 1 t 1 t 3 ln M X−μ (t) = − + nθ + + +··· σ σ σ 2 σ 6 σ . 2 . 2 nθ 2 t 1 t 1 t 3 − + + +··· 2 σ 2 σ 6 σ 2 3 nθ 3 t 1 t 1 t 3 + + + +··· −··· 3 σ 2 σ 6 σ Collecting powers of t, we obtain   μ nθ nθ nθ 2 2 ln M X−μ (t) = − + t+ − t σ σ σ 2σ 2 2σ 2   nθ nθ 2 nθ 3 + − + t3 + · · · 6σ 3 2σ 3 3σ 3     1 nθ − nθ 2 2 n θ − 3θ 2 + 2θ 3 = 2 t + 3 t3 + · · · σ 2 σ 6 192 Special Probability Densities √ since μ = nθ . 
Then, substituting σ = nθ (1 − θ ), we find that   1 n θ − 3θ 2 + 2θ 3 3 ln M X−μ (t) = t2 + 3 t +··· σ 2 σ 6 n For r > 2 the coefficient of tr is a constant times r , which approaches 0 σ when n→q. It follows that 1 2 lim ln M X−μ (t) = t n→ q σ 2 and since the limit of a logarithm equals the logarithm of the limit (pro- vided the two limits exist), we conclude that 1 2 lim M X−μ (t) = e 2 t n→ q σ which is the moment-generating function of Theorem 6 with μ = 0 and σ = 1. This completes the proof of Theorem 8, but have we shown that when n→q the distribution of Z, the standardized binomial random variable, approaches the standard normal distribution? Not quite. To this end, we must refer to two theorems that we shall state here without proof: 1. There is a one-to-one correspondence between moment-generating functions and probability distributions (densities) when the former exist. 2. If the moment-generating function of one random variable approaches that of another random variable, then the distribution (density) of the first random variable approaches that of the second random variable under the same limit- ing conditions. Strictly speaking, our results apply only when n→q, but the normal distribution is often used to approximate binomial probabilities even when n is fairly small. A good rule of thumb is to use this approximation only when nθ and n(1 − θ ) are both greater than 5. EXAMPLE 6 Use the normal approximation to the binomial distribution to determine the proba- bility of getting 6 heads and 10 tails in 16 flips of a balanced coin. Solution To find this approximation, we must use the continuity correction according to which each nonnegative integer k is represented by the interval from k − 12 to k + 12 . With reference to Figure 9, we must thus determine the  area under the curve between 5.5 and 6.5, and since μ = 16 · 12 = 8 and σ = 16 · 12 · 12 = 2, we must find the area between 5.5 − 8 6.5 − 8 z= = −1.25 and z= = −0.75 2 2 The entries in the standard normal distribution table corresponding to z = 1.25 and z = 0.75 are 0.3944 and 0.2734, and we find that the normal approximation to the 193 Special Probability Densities 0.2734 0.1210 Number 5.5 6.5 8 of heads z  1.25 z  0.75 Figure 9. Diagram for Example 6. probability of “6 heads and 10 tails” is 0.3944 − 0.2734 = 0.1210. Since the corre- sponding value in the binomial probabilities table of “Statistical Tables” is 0.1222, we find that the error of the approximation is −0.0012 and that the percentage error 0.0012 is · 100 = 0.98% in absolute value. 0.1222 The normal approximation to the binomial distribution used to be applied quite extensively, particularly in approximating probabilities associated with large sets of values of binomial random variables. Nowadays, most of this work is done with computers, as illustrated in Example 5, and we have mentioned the relationship between the binomial and normal distributions primarily because of its theoretical applications. Exercises 31. Show that the normal distribution has 37. If X is a random variable having the standard normal (a) a relative maximum at x = μ; distribution and Y = X 2 , show that cov(X, Y) = 0 even (b) inflection points at x = μ − σ and x = μ + σ . though X and Y are evidently not independent. 32. Show that the differential equation of Exercise 30 38. Use the Maclaurin’s series expansion of the moment- with b = c = 0 and a > 0 yields a normal distribution. generating function of the standard normal distribution to show that 33. 
This question has been intentionally omitted for this (a) μr = 0 when r is odd; edition. r! (b) μr = when r is even. 34. If X is a random variable having a normal distribu- r/2 r 2 ! tion with the mean μ and the standard deviation σ , find 2 the moment-generating function of Y = X − c, where c is tr a constant, and use it to rework Exercise 33. 39. If we let KX (t) = ln MX−μ (t), the coefficient of r! in the Maclaurin’s series of KX (t) is called the rth cumu- 35. This question has been intentionally omitted for this lant, and it is denoted by κr . Equating coefficients of like edition. powers, show that 36. This question has been intentionally omitted for this (a) κ2 = μ2 ; edition. (b) κ3 = μ3 ; (c) κ4 = μ4 − 3μ22 . 194 Special Probability Densities 40. With reference to Exercise 39, show that for normal that is, that of a standardized Poisson random variable, distributions κ2 = σ 2 and all other cumulants are zero. approaches the moment-generating function of the stan- dard normal distribution. 41. Show that if X is a random variable having the Pois- son distribution with the parameter λ and λ→q, then the 42. Show that when α →q and β remains constant, the moment-generating function of moment-generating function of a standardized gamma X −λ random variable approaches the moment-generating Z= √ function of the standard normal distribution. λ 7 The Bivariate Normal Distribution Among multivariate densities, of special importance is the multivariate normal dis- tribution, which is a generalization of the normal distribution in one variable. As it is best (indeed, virtually necessary) to present this distribution in matrix notation, we shall give here only the bivariate case; discussions of the general case are listed among the references at the end of this chapter. DEFINITION 8. BIVARIATE NORMAL DISTRIBUTION. A pair of random variables X and Y have a bivariate normal distribution and they are referred to as jointly nor- mally distributed random variables if and only if their joint probability density is given by      1 x−μ 2 x−μ y−μ y−μ 2 − σ1 1 −2ρ σ1 1 σ2 2 + σ2 2 2(1−ρ)2 e f (x, y) =  2π σ1 σ2 1 − ρ 2 for −q < x < q and −q < y < q, where ␴1 > 0, ␴2 > 0, and −1 < ␳ < 1. To study this joint distribution, let us first show that the parameters μ1 , μ2 , σ1 , and σ2 are, respectively, the means and the standard deviations of the two random variables X and Y. To begin with, we integrate on y from −q to q, getting    x−μ1 2   − 1 σ1  q 1 y−μ2 2 x−μ y−μ2 e 2(1−ρ 2 ) − σ2 −2ρ σ 1 σ2 2(1−ρ 2 ) g(x) =  e 1 dy 2π σ1 σ2 1 − ρ 2 −q x − μ1 for the marginal density of X. Then, temporarily making the substitution u = σ1 to simplify the notation and changing the variable of integration by letting v = y − μ2 , we obtain σ2 − 1 2(1−ρ 2 ) μ2  q 1 e − (v2 −2ρuv) g(x) =  e 2(1−ρ 2 ) dv 2π σ1 1 − ρ 2 −q After completing the square by letting v2 − 2ρuv = (v − ρu)2 − ρ 2 u2 and collecting terms, this becomes ⎧  2 ⎫ ⎪ 1 2 ⎪  ⎪ ⎪ e− 2 u ⎨ q −2 √ ⎬ 1 v−ρu 1 2 g(x) = √ √  e 1−ρ dv σ1 2π ⎪ ⎩ 2π 1 − ρ −q ⎪ 2 ⎪ ⎪ ⎭ 195 Special Probability Densities Finally, identifying the quantity in parentheses as the integral of a normal density from −q to q, and hence equaling 1, we get 1 2  e− 2 u 1 −1 x−μ1 2 g(x) = √ = √ e 2 σ1 σ1 2π σ1 2π for −q < x < q. It follows by inspection that the marginal density of X is a normal distribution with the mean μ1 and the standard deviation σ1 and, by symmetry, that the marginal density of Y is a normal distribution with the mean μ2 and the standard deviation σ2 . 
As far as the parameter ρ is concerned, where ρ is the lowercase Greek letter rho, it is called the correlation coefficient, and the necessary integration will show that cov(X, Y) = ρσ1 σ2 . Thus, the parameter ρ measures how the two random vari- ables X and Y vary together. When we deal with a pair of random variables having a bivariate normal distri- bution, their conditional densities are also of importance; let us prove the follow- ing theorem. THEOREM 9. If X and Y have a bivariate normal distribution, the condi- tional density of Y given X = x is a normal distribution with the mean σ2 μY|x = μ2 + ρ (x − μ1 ) σ1 and the variance 2 σY|x = σ22 (1 − ρ 2 ) and the conditional density of X given Y = y is a normal distribution with the mean σ1 μX|y = μ1 + ρ (y − μ2 ) σ2 and the variance 2 σX|y = σ12 (1 − ρ 2 ) f (x, y) Proof Writing w(y|x) = in accordance with the definition of con- g(x) x − μ1 y − μ2 ditional density and letting u = and v = to simplify the σ1 σ2 notation, we get 1 1 − [u2 −2ρuv+v2 ]  e 2(1−ρ 2 ) 2π σ1 σ2 1 − ρ 2 w(y|x) = 1 1 2 √ e− 2 u 2π σ1 1 − 1 [v2 −2ρuv+ρ 2 u2 ] =√  e 2(1−ρ 2 ) 2π σ2 1 − ρ 2 2 1 − 12 √v−ρu 1−ρ 2 =√  e 2π σ2 1 − ρ 2 196 they are independent if and only if ρ = 0. and it can be seen by inspection that this is a normal σ2 density with the mean μY|x = μ2 + ρ (x − μ1 ) and the variance σY|x 2 = σ1 σ22 (1 − ρ 2 ). we have shown that for two random variables having a bivariate normal distribution the two marginal densities are normal. The bivariate normal distribution has many important properties. which the reader will be asked to prove in Exercise 43. y). The corresponding results for the conditional density of X given Y = y follow by symmetry. and ρ = 0 at (x. In this connection. y) inside squares 2 and 4 of Figure 10 ∗ f (x. it is easy to see that the marginal densities of X and Y are normal even though their joint density is not a bivariate normal distribution. In other words. if ρ = 0. μ2 = 0. Special Probability Densities Then. 197 . y) elsewhere where f (x. y 2 1 x 3 4 Figure 10. y) is the value of the bivariate normal density with μ1 = 0. some statisti- cal and some purely mathematical. we obtain ⎡   ⎤2 σ y− μ2 +ρ σ2 (x−μ1 ) ⎢ √1 ⎥ − 12 ⎣ ⎦ 1 σ2 1−ρ 2 w(y|x) = √  e σ2 2π 1 − ρ 2 for −q < y < q. THEOREM 10. if the bivariate density of X and Y is given by ⎧ ⎪ ⎪ ⎨2f (x. Among the former. there is the following prop- erty. y). Also. Sample space for the bivariate density given by f∗ (x. but the converse is not necessar- ily true. the marginal distributions may both be normal without the joint distribution being a bivariate normal distribution. For instance. expressing this result in terms of the original variables. y) = 0 inside squares 1 and 3 of Figure 10 ⎪ ⎪ ⎩f (x. the random variables are said to be uncorrelated. If two random variables have a bivariate normal distribution. 49. then μ1 = 2. find μY|1 (a) their independence implies that ρ = 0. Verify that (a) the first partial derivative of this function with respect 46. show that if X and Y have a 47. at t1 = 0 and t2 = 0 is ρσ1 σ2 + μ1 μ2 .8(x + 2)(y − 1) + 4(y − 1)2 ] 102 MX. 54 (c) the second partial derivative with respect to t1 and t2 find σ1 . the bivariate nor- mal surface has a maximum at (μ1 . and it is customary to refer to the corresponding joint density as a circular normal distribution. y) is the value of the bivariate normal density at (x. Thus. If the exponent of e of a bivariate normal density is be shown that their joint moment-generating function is given by −1 [(x + 2)2 − 2. 
If X and Y have a bivariate normal distribution and 44. pictured in Figure 11. Show that any plane perpendicular to the xy-plane U = X + Y and V = X − Y. σ2 = 6. (b) the second partial derivative with respect to t1 at t1 = 0 −1 2 (x + 4y2 + 2xy + 2x + 8y + 4) and t2 = 0 is σ12 + μ21 . Many interesting properties of the bivariate normal density are obtained by studying the bivariate normal surface. the shape of a normal distribution. μ2 ). 8 The Theory in Practice In many of the applications of statistics it is assumed that the data are approxi- mately normally distributed. If X and Y have a bivariate normal distribution. and ρ. and ρ. When ρ = 0 and σ1 = σ2 . whose equation is z = f (x. Exercises 43. (b) μY|x and σY|x 2 . If the exponent of e of a bivariate normal density is to t1 at t1 = 0 and t2 = 0 is μ1 . σ1 = 3. it can 45. 48.Y (t1 . t2 ) = E(et1 X + t2 Y ) 1 = et1 μ1 + t2 μ2 + 2 (σ1 t1 + 2ρσ1 σ2 t1 t2 + σ2 t2 ) 2 2 2 2 find (a) μ1 . σ2 . it is important to make sure that the assumption 198 . and σY|1 . To prove Theorem 10. μ2 = 5. (b) ρ = 0 implies that they are independent. and any plane parallel to the xy-plane that intersects the surface intersects it in an ellipse called a contour of constant probability density. given that μ1 = 0 and μ2 = −1. y). the contours of constant probability density are circles. Special Probability Densities Figure 11. If X and Y have the bivariate normal distribution with bivariate normal distribution. σ1 . y). As the reader will be asked to verify in some of the exercises that follow. Bivariate normal surface. σ2 . μ2 . any plane parallel to the z-axis intersects the surface in a curve having the shape of a normal distribution. where f (x. find an expression for the intersects the bivariate normal surface in a curve having correlation coefficient of U and V. and ρ = 23 . MINITAB offers three tests for normality that are less subjective than mere examination of a normal-scores plot.25. 3. The cumulative percentage distribution is as follows: Class Boundary Cumulative Percentage Normal Score 4395 5 −1.88 A graph of the class boundaries versus the normal scores is shown in Figure 12.18 0. the following table results: Observation: 2 3 3 4 5 7 Normal score: −1. they divide the area under the normal curve into n + 1 equal parts. Special Probability Densities of normality can.08. the assumption that the data set comes from a normal distribution cannot be supported. −z0.14 = −1. bell-shaped histograms may not be normally distributed. The procedure involves identifying 199 . zp . examination of the histogram picturing the frequency distribution of the data is useful in checking the assumption of normal- ity. The ordered observations then are plotted against the corre- sponding normal scores on ordinary graph paper. Another somewhat less subjective method for checking data is the normal-scores plot. It can be seen from this graph that the points lie in an almost perfect straight line. 5 Solution Since n = 6. It is based on the calculation of normal scores. be supported by the data.25.55 −0. In addition. strongly suggesting that the underlying data are very close to being normally distributed. In modern practice.29 = 0.29 = −0. and z20 = 0. the normal scores for n = 4 observations are −z0.84. or −z1/(n+1) . EXAMPLE 7 Find the normal scores and the coordinates for making a normal-scores plot of the following six observations: 3.18.20 = −0. or if it is symmetric but not bell-shaped.18 0. 
use of MINITAB or other statistical software eases the com- putation considerably. −z0. at least reasonably.08.95 4795 37 −0.50 5195 87 1. 2.55.33 4995 69 0.18. and z0. Since the normal dis- tribution is symmetric and bell-shaped.14 = 1. 4. The normal score for the first of these areas is the value of z such that the area under the standard normal curve to the left of z is 1/(n + 1). Of course. If n observations are ordered from smallest to largest. data that appear to have symmetric.55 1. If the histogram is not symmetric. there are 6 normal scores.40 = −0. When the observa- tions are ordered and tabulated together with the normal scores. z0.43 = −0. This plot makes use of ordinary graph paper.64 4595 17 −0.40 = 0.84. Sometimes a normal-scores plot showing a curve can be changed to a straight line by means of an appropriate transformation.08 −0.08 The coordinates for a normal-scores plot make use of a cumulative percentage distribution of the data. z0. Thus.43 = 0. z0. −z0.55. each having the area 1/(n + 1).13 5395 97 1. 7. as follows: −z0. this method is subjective. If x is a value of a normally distributed random variable. where a > 1 On rare occasions. and then to use one of the indicated transformations.0 Normal score Figure 12.3 5.2 32. This strategy becomes necessary when some of the data have negative values and logarithmic. the following transformations may produce approximately normal data: power transformation u = xa .4 15. making a linear transfor- mation alone cannot be effective. a linear transformation alone cannot transform nonnormally dis- tributed data into normality. When data appear not to be normally distributed because of too many large values. square-root. and check the transformed data for normality. EXAMPLE 8 Make a normal-scores plot of the following data. it helps to make a linear transformation of the form u = a + bx first. and then checking the transformed data by means of a normal-scores plot to see if they can be assumed to have a normal distribution. the following transformations are good candidates to try: logarithmic transformation u = log(x) √ square-root transformation u = x 1 reciprocal transformation u = x When data exhibit too many small values. then the random variable having the values a + bx also has the normal distribution. making the transformation.0 0 1.9 8. Thus. or certain power transformations are to be tried.0 1. Special Probability Densities Class boundary 5395 5195 4995 4795 4595 4395 2. 54.0 2.5 200 . However. Normal-scores plot. If the plot does not appear to show normality. where a > 1 exponential transformation u = ax . the type of transformation needed. make an appropriate transformation. There is always a great temptation to drop outliers from a data set entirely on the basis that they do not seem to belong to the main body of data. Outliers which occur infrequently. 0. and 0. A normal-scores plot of these data (Figure 13[a]) shows sharp curvature. −0. It is difficult to give a hard-and-fast rule for identifying outliers. a logarithmic transformation (base 10) was used to transform the data to 1.92 0.72 1. an observation that clearly does not lie on the straight line defined by the other observations in a normal-scores plot can be considered an outlier. an error of observation. a single large observation. thus producing one or two outliers. Special Probability Densities Solution The normal scores are −0. Outlying observations may result from several causes. 
indicating that the transformed data are approximately normally dis- tributed.5 50 40 1. Normal-scores plot for Example 8.51 1. In the presence of suspected outliers. it is not likely that the data can be transformed to normality.0 1. 0.0 Original data Transformed data (a) (b) Figure 13. such as an error in record- ing data.5 10 1. Perhaps the condition was corrected after one or two holes were drilled. 201 . but regularly in successive data sets.0 0 1. or both. For example. While outliers some- times are separated from the other data for the purpose of performing a preliminary 60 1. a hole with an unusually large diameter might result from a drill not having been inserted properly into the chuck.0 0 1. Ordinarily. or an unusual event such as a particle of dust settling on a material during thin-film deposition.44. a single small observation. it is customary to examine normal-scores plots of the data after the outlier or outliers have been omitted. Since two of the five values are very large compared with the other three values.95. If lack of normality seems to result from one or a small number of aberrant observations called outliers. For example.95.19 A normal-scores plot of these transformed data (Figure 13[b]) shows a nearly straight line. But an outlier can be as informative about the process from which the data were taken as the remainder of the data.44. it may be inappropri- ate to define an outlier as an observation whose value is more than three standard deviations from the mean. give evidence that should not be ignored. since such an observation can occur with a reasonable probability in a large number of observations taken from a normal distribution. and the operator failed to discard the parts with the “bad” hole.0 30 20 0.74 0. 00 2 C2 1.70 (b) Transformed data Figure 14.002 and 0. A normal-scores plot. (a) be between −0. as shown in Figure 14(b).80 0. the distance from D to uniform density with α = −0. BD. A point D is chosen on the line AB. To illustrate the procedure using MINITAB. The cube-root transformation u = X1/3 . The points in this graph clearly do not follow a straight line. If X. Special Probability Densities 300.80 2.00 0.90 0.3333 PUT IN C3. Normal scores and normal-score plots can be obtained with a variety of statis- tical software.00 0.00 100. generated by the command PLOT C1 VS C2.00 2 C2 1. they should be discarded only after a good reason for their existence has been found. Then.015 and β = 0. seemed to work best. Normal-scores plots. made by giving the command RAISE C1 TO THE POWER . whose midpoint the density of a substance is a random variable having a is C and whose length is a.90 1. In certain experiments.00 0.015.005 in absolute value.70 (a) Original data 6. 202 . is shown in Figure 14(a).90 1. Several power transformations were tried in an attempt to transform the data to normality. Find the A.00 4. and AC will form a triangle? (b) exceed 0. what is the probability that AD. 20 numbers are entered with the following command and data-entry instructions SET C1: 0 215 31 7 15 80 17 41 51 3 58 158 0 11 42 11 17 32 64 100 END Then the command NSCORES C1 PUT IN C2 is given to find the normal scores and place them in the second column.80 2. the error made in determining 51. Applied Exercises SECS.00 200. a normal-scores plot of the transformed data was generated with the command PLOT C3 VS C2. It appears from this graph that the cube roots of the original data are approximately normally distributed. 1–4 50.80 0.00 2 2 2.003.00 2 2 0. analysis. 
is a random variable having the uniform density with probabilities that such an error will α = 0 and β = a.90 0. 79). what is the probability that in any given year there tion. If a company employs n salespersons. returns filed with the IRS can be looked upon as a ran- dom variable having a beta distribution with α = 2 and 66. Find the probabilities that 63. the daily consumption of electric 61.7704. Find z if the standard-normal-curve area vate airport is a random variable having a Poisson dis. check on any one day during the first 2 hours of business? (d) P(−z4 F Z < z4 ) = 0. Assuming that the times (d) within four standard deviations of the mean? between repairs are exponentially distributed.025. Suppose that the service life in hours of a semicon- power in millions of kilowatt-hours can be treated as a ductor is a random variable having a Weibull distribution random variable having a gamma distribution with α = 3 (see Exercise 23) with α = 0.05. the annual pro.025 and β = 0. Find the (d) between −0. If the power plant of this city has a daily (a) How long can such a semiconductor be expected capacity of 12 million kilowatt-hours.36.22).8502.8. find the probabilities that it will take on (a) have to be reset in less than 24 days. find β = 2.12. its gross sales in SECS.005.9868. many salespersons should the company employ to maxi- mize the expected profit? (b) P(Z G −0. 54. A certain kind of appliance requires repairs on the (c) within three standard deviations of the mean. 0. 58. The number of planes arriving per day at a small pri. find the respective values z1 . The number of bad checks that a bank receives during (b) P(Z G z2 ) = 0. that is. probabilities that one of these tires will last 64. 203 . (d) P(−1.33). (c) α = 0. If Z is a random variable having the standard normal able having a gamma distribution with α = 80 n and distribution. 59. a value (b) not have to be reset in at least 180 days.55 < Z < 1. 56.000 miles. 1) dz = α in a given city may be looked upon as a random vari.46 and −0. If the sales cost is $8. (a) α = 0. If zα is defined by years without requiring repairs?  q 60. how (a) P(Z < 1. 55.000 miles. The amount of time that a watch will run without hav. distribution. z3 . z2 . (a) between 0 and z is 0. (b) the probability that at least 25 percent of all new (d) α = 0. (b) within two standard deviations of the mean.14. (a) greater than 1. the given city. In a certain city. What is the probability that the (b) to the left of z is 0.000 hours? 53. a 5-hour business day is a Poisson random variable with λ = 2.000 per salesperson. If the annual proportion of erroneous income tax (d) between −z and z is 0. zα able having a beta distribution with α = 1 and β = 4. If the annual proportion of new restaurants that fail n(z. ing to be reset is a random variable having an exponential distribution with θ = 120 days. (c) P(0.2912. and β = 2. such that (a) P(0 < Z < z1 ) = 0.4306. 62. The mileage (in thousands of miles) that car owners (c) between −0. 57.90 … Z … 0. what is the probability that such an appliance will work at least 3 67. restaurants will fail in the given city in any one year. and z4 (b) at most 30. If Z is a random variable having the standard normal (a) at least 20. find find its values for (a) the mean of this distribution. get with a certain kind of radial tire is a random variable having an exponential distribution with θ = 40.500.01.4726. what are the probabilities of getting a value will be fewer than 10 percent erroneous returns? 
(a) within one standard deviation of the mean.58 and 1. (b) greater than −0.09. time between two such arrivals is at least 1 hour? (c) to the right of z is 0. tribution with λ = 28. 65. 5–7 thousands of dollars may be regarded as a random √ vari.9700.1314. Special Probability Densities 52. portion of new restaurants that can be expected to fail in (b) α = 0. If Z is a random variable having the standard nor- such a watch will mal distribution. average once every 2 years. What is the probability that it will not receive a bad (c) P(Z > z3 ) = 0. to last? ability that this power supply will be inadequate on any (b) What is the probability that such a semiconductor will given day? still be in operating condition after 4. what is the prob. If X is a random variable having a normal distribu- β = 9.44). show that the Poisson to find this probability and compare your result with the distribution would have yielded a better approximation. the developing time of approximation to the binomial distribution to determine prints may be looked upon as a random variable hav. have bad side effects from a certain kind of medicine. Suppose that we want to use the normal approxi- 85. Check the following data for normality by finding nor- 73. (c) anywhere from 30. 80.9 4. Check in each case whether the normal approxima- tion to the binomial distribution may be used according 84. A random variable has a normal distribution with mal scores and making a normal-scores plot: σ = 10. 150. (b) Interpolate in the standard normal distribution table 77. (a) at least 16. The weights (in pounds) of seven shipments of to the rule of thumb in Section 6. Use a computer program to make a normal-scores justified in using the approximation? plot for the data on the time to make coke in successive (b) Make the approximation and round to four decimals. (c) anywhere from 15. 0.00 to 15. SEC. (a) Use a computer program to find the probability bution to determine (to four decimals) the probability of that a random variable having the normal distribution getting 7 heads and 7 tails in 14 flips of a balanced coin. that during a period of meditation a person’s oxygen con- sumption will be reduced by 79.98.05). (a) Use a computer program to find the probability (c) If a computer printout shows that b(1. 37 45 11 51 13 48 61 (b) n = 65 and θ = 0.10. This question has been intentionally omitted for this edition.00 seconds.8212. Suppose that the actual amount of instant coffee that mal scores and making a normal-scores plot: a filling machine puts into “6-ounce” jars is a random vari- able having a normal distribution with σ = 0.361 will Also refer to the binomial probabilities table of “Statisti- assume a value greater than 8. use the normal approximation to determine (to three decimals) the probability that the (b) at most 35.5670. what must be the mean fill of these jars? 83.6 4.5 1. Special Probability Densities 68. more exact value found in part (a).625. edition.0416 error of the approximation obtained in part (b)? will assume a value between −2. 76. use the normal 71. 78.80 seconds.40 seconds and where from 0. regarded as having come from a normal distribution? 75. cal Tables” to find the error of this approximation.6 cc per among 120 patients with high blood pressure treated with minute and σ = 4. tion of a person’s oxygen consumption is a random vari. the probabilities that the proportion of heads will be any- ing the normal distribution with μ = 15.48 second.6 4. 
Check the following data for normality by finding nor- 72.0 cc per minute.05 ounce. If 36 22 3 13 31 45 only 3 percent of the jars are to contain less than 6 ounces of coffee. use the normal approximation to find the probability that able having a normal distribution with μ = 37. 0. 74. what is the proba.786 and the standard deviation 1. This question has been intentionally omitted for this mation to the binomial distribution to determine b(1. takes to develop one of the prints will be (b) 1.159 and 0. If the probability that the random variable will take on a value less than 82. bolts are (a) n = 16 and θ = 0.2 bility that it will take on a value greater than 58. Find the probabilities this medicine more than 32 will have bad side effects. (b) at most 14. In a photographic process.05) = that a random variable having the normal distribution 0. with mean 5. more exact value found in part (a). would we be 86. If 23 percent of all patients with high blood pressure 70. making approximations like this also to find this probability and compare your result with the requires a good deal of professional judgment. 8 81. This serves to illustrate that the rule of thumb is just (b) Interpolate in the standard normal distribution table that and no more. If the probability is 0. Can they be (c) n = 120 and θ = 0. what is the percentage with the mean −1. To illustrate the law of large numbers. refuse a loan application. Suppose that during periods of meditation the reduc.20 that a certain bank will (a) at least 44.5 cc per minute. Use the normal approximation to the binomial distri- 69. 150.0 cc per minute.000 times. (a) Based on the rule of thumb in Section 6.0036 rounded to four decimals.49 to 0. runs of a coke oven (given in hours). (c) 10.5 is 0. bank will refuse at most 40 of 225 loan applications. 204 .3? 82. Find the probabilities that the time it (a) 100 times.51 when a balanced coin is flipped σ = 0. With reference to Exercise 75.853 and the standard deviation 1.000 times.20.20 seconds. Make a normal-scores plot of these weights.6 cc per minute. 3.0 to 40. 2 6. σ1 = 10.. Answers to Odd-Numbered Exercises ⎧ ⎪ ⎪ 0 for x … α 51 1.1056.8 11. the function has an absolute maximum at x = 0.6 12. S.5 6. Vols.2 10. N. 1980.6406. J.2 3.227. tions. Inc.. An Introduction to the Co. Princeton.8 6. Inc. Theory of Statistics.22. S. Continuous Univariate Dis. 205 .5276. 1 − θt 19 For 0 < v < 2 the function → q when x → 0.33. 69 (a) 0. 1976..44..7 9. New York: Macmillan Publishing approaches the standard normal distribution when Co. (b) 0.6 9. (b) 0.7 7. J.4 6. Mathematical Statis- tributions.5 6.4 7.8 3. lishing Co.1 4..7 3.0 4.6 10.1 9. μ2 = 1.2231. 57 0.094 ounces.1 5.7 5.3 8.4 5. I. Inc.0 6.4 7. New York: Macmillan Publishing Yule.2 8. (c) 2. 14th ed..0 5. and Olkin.0 6.5 7.8 5. W. B..575. 45 (a) μ1 = −2.8 9. σY|1 = 20 = 4.: D.6 8.3 6. Inc. Probability Mod.1 13. Inc... 11 For 0 < α < 1 the function → q when x → 0.6 6. 73 6.. 67 (a) 1. New York: Hafner Pub- Hastings.. (b) 0. n→q is given in Lindgren.6 7.8 10.9 given by MINITAB.1 5. and Doksum.7 6.9 7.8 6.4 87.6 5. 33 μ3 = 0 and μ4 = 3σ 4 . and Kotz.47. Inc. Van Nostrand Co. 1970.5934. Ltd. Introduction to Statistical Inference. with the following data. Special Probability Densities 7.8 7. 1975.5 3. in outline form. San Francisco: Company.1 3.3 7. (b) 1. (d) 0.4 9.. √ 47 μY|1 = 113 . L.96.. E.92.1 6. Bickel.645. London: Butterworth and Co. (c) 117%. for v = 2 71 (a) 0.1271.9 7.. μ3 = α(α + 1)(α + 2)β 3 .J..7 4.4 9.9 12.0 9. and 65 (a) 1. R. 
μ4 = α(α + 1)(α + 2)(α + 3)β 4 .4 4.1 5. P. for α = 1 61 (a) 3200 hours. μ2 = α(α + 1)β 2 .1 4.12.9 11..6 7.7. 79 0.0 12.4 8.2 8.4 3. 1950.1 9. 75 (a) yes.0041. 1977. 23 (a) k = αβ. Hogg..6065. σ2 = 5.6 13.1827.4 7.5 10. and Peacock.6 8. and test for normality.0 8.2 7.4 7.5 7. A.1. Statistical Distribu. found in els and Applications. (c) 0. N. and Craig. (d) .. 5 α3 = 0 and α4 = 95 . (d) 2. G. 59 0.4 6.8 5. B.. (b) 0.8 4. and Kendall.9 6..6 6.5 5. V.9 7. 77 0. A. the function has an absolute maximum at x = 0.3 14. Boston: Houghton Mifflin tics: Basic Ideas and Selected Topics. N. ⎪ ⎪ β −α ⎩ 1 for x Ú β 55 (a) 0. Gleser. New York: Keeping. C. 13 μ1 = αβ.. 63 (a) 0.2 7. K. G. 1962.3 11.0062. 6.7 6. ⎨x−α 2 3 F(x) = for α < x < β 53 n = 100. T.. may be found in cal properties of the bivariate normal surface may be Derman.1 8. and ρ = 0.3 12. e−θt 17 MY (t) = .. (b) 2.4 7.2060.8 10. Eighty pilots were tested in a flight simulator and the time for each to take corrective action for a given Use a computer to make a normal-scores plot of these emergency was measured in seconds.6 8.7 8.1413. J. L.. 1978.9 5.1 5.6 10.1 5.0078.2 3. (b) 0.5876..6 5.8 10.0 5. 1 and 2..5 11. A. (c) 1. Statistical Theory. M.5 8. A detailed treatment of the mathematical and statisti- sities. U.8 6.5 7. Holden-Day. Macmillan Publishing Co.. The multivariate normal distribution is treated in matrix and notation in Johnson.9 9.0 8. 4th ed.0208.2 14.2 6. 3rd ed. Introduction to Mathemat- A direct proof that the standardized binomial distribution ical Statistics.4 9.0 Also test these data for normality using the three tests 4.1 8. results: References Useful information about various special probability den. (c) 0.2 4. This page intentionally left blank . . . . Although all three methods can be used in some situations. Eighth Edition. Freund’s Mathematical Statistics with Applications. 207 . X2 . the transformation technique. . . . Inc. and the moment-generating function technique. . 2 Distribution Function Technique A straightforward method of obtaining the probability density of a function of continuous random variables consists of first finding its distribution function and then its probability density by differentiation. . X2 . Irwin Miller. The ones we shall discuss in the next four sections are called the distribution function technique. All rights reserved. . . if X1 . Xn . in most problems one technique will be preferable (easier to use than the others). . Xn ) F y] and then differentiating to get dF(y) f (y) = dy From Chapter 7 of John E. Thus. X2 . Xn and their joint probability distribution or density. for example. given a set of random variables X1 . . This is true. That is. . Xn ). X2 . Copyright © 2014 by Pearson Education. Xn are continu- ous random variables with a given joint probability density.Functions of Random Variables 1 Introduction 4 Transformation Technique: Several Variables 2 Distribution Function Technique 5 Moment-Generating Function Technique 3 Transformation Technique: One Variable 6 The Theory in Application 1 Introduction In this chapter we shall concern ourselves with the problem of finding the probability distributions or densities of functions of one or more random variables. . the probability density of Y = u(X1 . . Xn ) is obtained by first determining an expression for the probability F(y) = P(Y F y) = P[u(X1 . . X2 . . X2 . . . . . in some instances where the function in question is linear in the random variables X1 . xn ) Several methods are available for solving this kind of problem. . 
. . . and the moment-generating function technique yields the simplest derivations. . Marylees Miller. . we shall be interested in finding the probability distribution or density of some random variable Y = u(X1 . . This means that the values of Y are related to those of the X’s by means of the equation y = u(x1 . x2 . EXAMPLE 2 If Y = |X|. elsewhere. upon differentiation. Solution Letting G(y) denote the value of the distribution function of Y at y. show that ⎧ ⎨f (y) + f (−y) for y > 0 g(y) = ⎩0 elsewhere where f (x) is the value of the probability density of X at x and g(y) is the value of the probability density of Y at y. g(y) = f (y) + f (−y) 208 . Solution For y > 0 we have G(y) = P(Y F y) = P(|X| F y) = P(−y F X F y) = F(y) − F(−y) and. we can write G(y) = P(Y F y) = P(X 3 F y) = P(X F y1/3 )  y1/3 = 6x(1 − x) dx 0 = 3y2/3 − 2y and hence g(y) = 2(y−1/3 − 1) for 0 < y < 1. g(y) = 0. use this result to find the probability density of Y = |X| when X has the standard normal distribution. Also. Functions of Random Variables EXAMPLE 1 If the probability density of X is given by ⎧ ⎨6x(1 − x) for 0 < x < 1 f (x) = ⎩0 elsewhere find the probability density of Y = X 3 . In Exercise 15 the reader will be asked to verify this result by a different technique. x2 > 0 f (x1 . EXAMPLE 3 If the joint density of X1 and X2 is given by ⎧ ⎨6e−3x1 −2x2 for x1 > 0. x2 ) = ⎩0 elsewhere find the probability density of Y = X1 + X2 . 1) = 2n(y. 0. g(y) = 0 for y < 0. we get  y  y−x2 F(y) = 6e−3x1 −2x2 dx1 dx2 0 0 = 1 + 2e−3y − 3e−2y x2 x1  x2  y x1 Figure 1. 0. 0. it follows that g(y) = n(y. 1) + n(−y. Solution Integrating the joint density over the shaded region of Figure 1. Functions of Random Variables Also. 1) for y > 0 and g(y) = 0 elsewhere. Diagram for Example 3. since |x| cannot be negative. 209 . An important application of this result may be found in Example 9. we can thus write  f (y) + f (−y) for y > 0 g(y) = 0 elsewhere If X has the standard normal distribution and Y = |X|. Arbitrarily letting g(0) = 0. 4. Functions of Random Variables and. y > 0 for x > 0 f (x) = f (x. differentiating with respect to y. If X has the uniform density with the parameters α = 0 eter θ . 210 . find (a) the distribution function of Y. use the distribution function technique to and β = 1. find and Z = X 2 + Y 2 . (b) the probability density of Y. y) = ⎩0 ⎪ ⎩0 elsewhere elsewhere  and Y = X 2 . (a) the distribution function of Z. If X has an exponential distribution with the param. If the probability density of X is given by ⎧ ⎧ ⎨2xe−x2 ⎪ ⎨4xye−(x +y ) 2 2 for x > 0. If the joint probability density of X and Y is given by 2. Diagram for Exercise 6. elsewhere. we obtain f (y) = 6(e−2y − e−3y ) for y > 0. x2 x2 2 2 y  x1  x2  2 1  y  x1  x2  2 1 1 x1 x1 1 2 1 2 x2 x2 2 2 0  y  x1  x2  1 y  x1  x2  0 1 1 x1 x1 1 2 1 2 Figure 2. Exercises 1. f (y) = 0. 3. Y = ln X. (b) the probability density of Z. use the distribution function technique to√find find the probability density of the random variable the probability density of the random variable Y = X. distribution function technique. X +Y (b) 0 < y < 1. and Z = . the random variable (a) θ1 Z θ2 . y > 0 to Figure 2. find the 1 probability distribution of Y = . X1 (b) θ1 = θ2 . find the probability density of Z by the 2 (c) 1 < y < 2. θ2 = 12 . If X1 and X2 are independent random variables having (d) y G 2. show that if θ1 = θ2 = 1. With reference to the two random variables of Exer- cise 5. 
we find that the probability distribution of X is given by x 0 1 2 3 4 1 4 6 4 1 f (x) 16 16 16 16 16 1 Then. find expressions for the distribution function f (x. using the relationship y = to substitute values of Y for values of X. In the discrete case there is no real problem as long as the relationship between the val- ues of X and Y = u(X) is one-to-one. Z= X1 + X2 (Example 3 is a special case of this with θ1 = 1 3 and has the uniform density with α = 0 and β = 1. 1+X Solution Using the formula for the binomial distribution with n = 4 and θ = 12 . 3 Transformation Technique: One Variable Let us show how the probability distribution or density of a function of a random variable can be determined without first getting its distribution function. we 1+x find that the probability distribution of Y is given by 1 1 1 1 y 1 2 3 4 5 1 4 6 4 1 g(y) 16 16 16 16 16 211 . exponential densities with the parameters θ1 and θ2 . all we have to do is make the appropriate substitution. EXAMPLE 4 If X is the number of heads obtained in four tosses of a balanced coin. ⎧ ing the uniform density with α = 0 and β = 1.) 8. the distribution function technique to find the probability density of Y = X1 + X2 when 7. Referring ⎨e−(x+y) for x > 0. y) = ⎩0 elsewhere of Y = X1 + X2 for (a) y F 0. Functions of Random Variables 5. use Also find the probability density of Y. Let X1 and X2 be independent random variables hav. If the joint density of X and Y is given by 6. Functions of Random Variables If we had wanted to make the substitution directly in the formula for the binomial 1 distribution with n = 4 and θ = 12 . we could have substituted x = − 1 for x in y . 4 x 2 getting . 1. 3. 2. 4 1 4 f (x) = for x = 0. . 1 4 1 4 1 1 1 1 g(y) = f −1 = 1 for y = 1, , , , y y −1 2 2 3 4 5 Note that in the preceding example the probabilities remained unchanged; the only difference is that in the result they are associated with the various values of Y instead of the corresponding values of X. That is all there is to the transformation (or change-of-variable) technique in the discrete case as long as the relationship is one-to-one. If the relationship is not one-to-one, we may proceed as in the follow- ing example. EXAMPLE 5 With reference to Example 4, find the probability distribution of the random vari- able Z = (X − 2)2 . Solution Calculating the probabilities h(z) associated with the various values of Z, we get 6 h(0) = f (2) = 16 4 4 8 h(1) = f (1) + f (3) = + = 16 16 16 1 1 2 h(4) = f (0) + f (4) = + = 16 16 16 and hence z 0 1 4 3 4 1 h(z) 8 8 8 To perform a transformation of variable in the continuous case, we shall assume that the function given by y = u(x) is differentiable and either increasing or decreas- ing for all values within the range of X for which f (x) Z 0, so the inverse function, given by x = w(y), exists for all the corresponding values of y and is differentiable except where u (x) = 0.† Under these conditions, we can prove the following theorem. † To avoid points where u (x) might be 0, we generally do not include the endpoints of the intervals for which probability densities are nonzero. This is the practice that we follow throughout this chapter. 212 Functions of Random Variables THEOREM 1. Let f (x) be the value of the probability density of the con- tinuous random variable X at x. 
If the function given by y = u(x) is differentiable and either increasing or decreasing for all values within the range of X for which f (x) Z 0, then, for these values of x, the equation y = u(x) can be uniquely solved for x to give x = w(y), and for the corre- sponding values of y the probability density of Y = u(X) is given by g(y) = f [w(y)] · |w (y)| provided u (x) Z 0 Elsewhere, g(y) = 0. Proof First, let us prove the case where the function given by y = u(x) is increasing. As can be seen from Figure 3, X must take on a value between w(a) and w(b) when Y takes on a value between a and b. Hence, P(a < Y < b) = P[w(a) < X < w(b)]  w(b) = f (x) dx w(a)  b = f [w(y)]w (y) dy a where we performed the change of variable y = u(x), or equivalently x = w(y), in the integral. The integrand gives the probability density of Y as long as w (y) exists, and we can write g(y) = f [w(y)]w (y) When the function given by y = u(x) is decreasing, it can be seen from Figure 3 that X must take on a value between w(b) and w(a) when Y takes on a value between a and b. Hence, P(a < Y < b) = P[w(b) < X < w(a)]  w(a) = f (x) dx w(b)  a = f [w(y)]w (y) dy b  b =− f [w(y)]w (y) dy a where we performed the same change of variable as before, and it fol- lows that g(y) = −f [w(y)]w (y) dx 1 Since w (y) = = is positive when the function given by y = u(x) is dy dy dx increasing, and −w (y) is positive when the function given by y = u(x) is decreasing, we can combine the two cases by writing g(y) = f [w(y)] · |w (y)| 213 Functions of Random Variables y y = u(x) b a x w(a) w(b) Increasing function y b y = u(x) a x w(b) w(a) Decreasing function Figure 3. Diagrams for proof of Theorem 1. EXAMPLE 6 If X has the exponential distribution given by ⎧ ⎨e−x for x > 0 f (x) = ⎩0 elsewhere √ find the probability density of the random variable Y = X. Solution √ The equation y = x, relating the values of X and Y, has the unique inverse x = y2 , dx which yields w (y) = = 2y. Therefore, dy g(y) = e−y |2y| = 2ye−y 2 2 for y > 0 in accordance with Theorem 1. Since the probability of getting a value of Y less than or equal to 0, like the probability of getting a value of X less than or equal to 0, is zero, it follows that the probability density of Y is given by 214 Functions of Random Variables  2ye−y 2 for y > 0 g(y) = 0 elsewhere 1.0 1.0 0.9 0.9 0.8 0.8 0.7 0.7 0.6 0.6 0.5 f (x)  ex 0.5 0.4 0.4 2 0.3 0.3 g(y)  2yey 0.2 0.2 0.1 0.35 0.1 0.35 x y 1 2 3 4 5 1 2 3 4 5 Figure 4. Diagrams for Example 6. The two diagrams of Figure 4 illustrate what happened in this example when we transformed from X to Y. As in the discrete case (for instance, Example 4), the prob- abilities remain the same, but they pertain to different values (intervals of values) of the respective random variables. In the diagram on the left, the 0.35 probability per- tains to the event that X will take on a value on the interval from 1 to 4, and in the diagram on the right, the 0.35 probability pertains to the event that Y will take on a value on the interval from 1 to 2. EXAMPLE 7 If the double arrow of Figure 5 is spun so that the random variable  has the uni- form density  1 for − π2 < θ < π2 f (θ ) = π 0 elsewhere determine the probability density of X, the abscissa of the point on the x-axis to which the arrow will point. u a x  a · tan u 0 x Figure 5. Diagram for Example 7. 
215 Functions of Random Variables Solution As is apparent from the diagram, the relationship between x and θ is given by x = a · tan θ , so that dθ a = 2 dx a + x2 and it follows that 1 a g(x) = · 2 π a + x2 1 a = · for − q < x < q π a2 + x2 according to Theorem 1. EXAMPLE 8 If F(x) is the value of the distribution function of the continuous random variable X at x, find the probability density of Y = F(X). Solution As can be seen from Figure 6, the value of Y corresponding to any particular value of X is given by the area under the curve, that is, the area under the graph of the density of X to the left of x. Differentiating y = F(x) with respect to x, we get dy = F  (x) = f (x) dx and hence dx 1 1 = = dy dy f (x) dx provided f (x) Z 0. It follows from Theorem 1 that 1 g(y) = f (x) · =1 f (x) for 0 < y < 1, and we can say that y has the uniform density with α = 0 and β = 1. y  F (x) x Figure 6. Diagram for Example 8. 216 Functions of Random Variables The transformation that we performed in this example is called the probability integral transformation. Not only is the result of theoretical importance, but it facili- tates the simulation of observed values of continuous random variables. A reference to how this is done, especially in connection with the normal distribution, is given in the end of the chapter. When the conditions underlying Theorem 1 are not met, we can be in serious difficulties, and we may have to use the method of Section 2 or a generalization of Theorem 1 referred to among the references at the end of the chapter; sometimes, there is an easy way out, as in the following example. EXAMPLE 9 If X has the standard normal distribution, find the probability density of Z = X 2 . Solution Since the function given by z = x2 is decreasing for negative values of x and increas- ing for positive values of x, the conditions of Theorem 1 are not met. However, the transformation from X to Z can be made in two steps: First, we find the probability density of Y = |X|, and then we find the probability density of Z = Y 2 (= X 2 ). As far as the first step is concerned, we already studied the transformation Y = |X| in Example 2; in fact, we showed that if X has the standard normal dis- tribution, then Y = |X| has the probability density 2 1 2 g(y) = 2n(y; 0, 1) = √ e− 2 y 2π for y > 0, and g(y) = 0 elsewhere. For the second step, the function given by z = y2 is increasing for y > 0, that is, for all values of Y for which g(y) Z 0. Thus, we can use Theorem 1, and since dy 1 1 = z− 2 dz 2 we get 2 − 1 z 1 − 1 h(z) = √ e 2 2z 2 2π 1 1 1 = √ z− 2 e− 2 z 2π √ for z > 0, and h(z) = 0 elsewhere. Observe that since ( 12 ) = π, the distribution we have arrived at for Z is a chi-square distribution with ν = 1. 4 Transformation Technique: Several Variables The method of the preceding section can also be used to find the distribution of a random variable that is a function of two or more random variables. Suppose, for instance, that we are given the joint distribution of two random variables X1 and X2 and that we want to determine the probability distribution or the probability density 217 Functions of Random Variables of the random variable Y = u(X1 , X2 ). If the relationship between y and x1 with x2 held constant or the relationship between y and x2 with x1 held constant permits, we can proceed in the discrete case as in Example 4 to find the joint distribution of Y and X2 or that of X1 and Y and then sum on the values of the other random variable to get the marginal distribution of Y. 
In the continuous case, we first use Theorem 1 with the transformation formula written as ⭸x1 g(y, x2 ) = f (x1 , x2 ) · ⭸y or as ⭸x2 g(x1 , y) = f (x1 , x2 ) · ⭸y where f (x1 , x2 ) and the partial derivative must be expressed in terms of y and x2 or x1 and y. Then we integrate out the other variable to get the marginal density of Y. EXAMPLE 10 If X1 and X2 are independent random variables having Poisson distributions with the parameters λ1 and λ2 , find the probability distribution of the random variable Y = X1 + X2 . Solution Since X1 and X2 are independent, their joint distribution is given by e−λ1 (λ1 )x1 e−λ2 (λ2 )x2 f (x1 , x2 ) = · x1 ! x2 ! e−(λ1 +λ2 ) (λ1 )x1 (λ2 )x2 = x1 !x2 ! for x1 = 0, 1, 2, . . . and x2 = 0, 1, 2, . . .. Since y = x1 + x2 and hence x1 = y − x2 , we can substitute y − x2 for x1 , getting e−(λ1 +λ2 ) (λ2 )x2 (λ1 )y−x2 g(y, x2 ) = x2 !(y − x2 )! for y = 0, 1, 2, . . . and x2 = 0, 1, . . . , y, for the joint distribution of Y and X2 . Then, summing on x2 from 0 to y, we get  y e−(λ1 +λ2 ) (λ2 )x2 (λ1 )y−x2 h(y) = x2 !(y − x2 )! x2 =0 e−(λ1 +λ2 )  y y! = · (λ2 )x2 (λ1 )y−x2 y! x2 !(y − x2 )! x2 =0 218 Functions of Random Variables after factoring out e−(λ1 +λ2 ) and multiplying and dividing by y!. Identifying the sum- mation at which we arrived as the binomial expansion of (λ1 + λ2 )y , we finally get e−(λ1 +λ2 ) (λ1 + λ2 )y h(y) = for y = 0, 1, 2, . . . y! and we have thus shown that the sum of two independent random variables having Poisson distributions with the parameters λ1 and λ2 has a Poisson distribution with the parameter λ = λ1 + λ2 . EXAMPLE 11 If the joint probability density of X1 and X2 is given by  e−(x1 +x2 ) for x1 > 0, x2 > 0 f (x1 , x2 ) = 0 elsewhere X1 find the probability density of Y = . X1 + X2 Solution Since y decreases when x2 increases and x1 is held constant, we can use Theorem 1 to x1 1−y find the joint density of X1 and Y. Since y = yields x2 = x1 · and hence x1 + x2 y ⭸x2 x1 =− 2 ⭸y y it follows that x x −x1 /y 1 − 2 = 2 · e−x1 /y 1 g(x1 , y) = e y y for x1 > 0 and 0 < y < 1. Finally, integrating out x1 and changing the variable of inte- gration to u = x1 /y, we get  q x1 −x1 /y h(y) = 2 ·e dx1 0 y  q = u · e−u du 0 = (2) =1 for 0 < y < 1, and h(y) = 0 elsewhere. Thus, the random variable Y has the uniform density with α = 0 and β = 1. (Note that in Exercise 7 the reader was asked to show this by the distribution function technique.) The preceding example could also have been worked by a general method where we begin with the joint distribution of two random variables X1 and X2 and determine 219 Functions of Random Variables the joint distribution of two new random variables Y1 = u1 (X1 , X2 ) and Y2 = u2 (X1 , X2 ). Then we can find the marginal distribution of Y1 or Y2 by summation or integration. This method is used mainly in the continuous case, where we need the following theorem, which is a direct generalization of Theorem 1. THEOREM 2. Let f (x1 , x2 ) be the value of the joint probability density of the continuous random variables X1 and X2 at (x1 , x2 ). 
If the functions given by y1 = u1 (x1 , x2 ) and y2 = u2 (x1 , x2 ) are partially differentiable with respect to both x1 and x2 and represent a one-to-one transformation for all values within the range of X1 and X2 for which f (x1 , x2 ) Z 0, then, for these values of x1 and x2 , the equations y1 = u1 (x1 , x2 ) and y2 = u2 (x1 , x2 ) can be uniquely solved for x1 and x2 to give x1 = w1 (y1 , y2 ) and x2 = w2 (y1 , y2 ), and for the corresponding values of y1 and y2 , the joint probability density of Y1 = u1 (X1 , X2 ) and Y2 = u2 (X1 , X2 ) is given by g(y1 , y2 ) = f [w1 (y1 , y2 ), w2 (y1 , y2 )] · |J| Here, J, called the Jacobian of the transformation, is the determinant ⭸x1 ⭸x1 ⭸y ⭸y 1 2 J= ⭸x2 ⭸x2 ⭸y1 ⭸y2 Elsewhere, g(y1 , y2 ) = 0. We shall not prove this theorem, but information about Jacobians and their applications can be found in most textbooks on advanced calculus. There they are used mainly in connection with multiple integrals, say, when we want to change from rectangular coordinates to polar coordinates or from rectangular coordinates to spherical coordinates. EXAMPLE 12 With reference to the random variables X1 and X2 of Example 11, find X1 (a) the joint density of Y1 = X1 + X2 and Y2 = ; X1 + X2 (b) the marginal density of Y2 . Solution x1 (a) Solving y1 = x1 + x2 and y2 = for x1 and x2 , we get x1 = y1 y2 and x1 + x2 x2 = y1 (1 − y2 ), and it follows that y2 y1 J= = −y1 1 − y2 −y1 220 x2 ) = 0 elsewhere find (a) the joint density of Y = X1 + X2 and Z = X2 . we can use Theorem 2 and we get g(y. (b) the marginal density of Y. EXAMPLE 13 If the joint density of X1 and X2 is given by  1 for 0 < x1 < 1. so that 1 −1 J= =1 0 1 Because this transformation is one-to-one. 0 < x2 < 1 f (x1 . (b) Using the joint density obtained in part (a) and integrating out y1 . z) = 1 · |1| = 1 for z < y < z + 1 and 0 < z < 1. Solution (a) Solving y = x1 + x2 and z = x2 for x1 and x2 . mapping the region x1 > 0 and x2 > 0 in the x1 x2 -plane into the region y1 > 0 and 0 < y2 < 1 in the y1 y2 -plane. elsewhere. we can use Theorem 2 and it follows that g(y1 . h(y2 ) = 0. y2 ) = 0. y2 ) dy1 0  q = y1 e−y1 dy1 0 = (2) =1 for 0 < y2 < 1. z) = 0. elsewhere. Note that in Exercise 6 the reader was asked to work the same problem by the dis- tribution function technique. 221 . Functions of Random Variables Since the transformation is one-to-one. we get  q h(y2 ) = g(y1 . y2 ) = e−y1 | − y1 | = y1 e−y1 for y1 > 0 and 0 < y2 < 1. mapping the region 0 < x1 < 1 and 0 < x2 < 1 in the x1 x2 -plane into the region z < y < z + 1 and 0 < z < 1 in the yz-plane (see Figure 7). we get x1 = y − z and x2 = z. g(y1 . elsewhere. g(y. For instance. and y G 2. we let h(1) = 1. (b) Integrating out z separately for y F 0. X3 ). We have thus shown that the sum of the given random variables has the triangular probabil- ity density whose graph is shown in Figure 8.X2 . 0 < y < 1. So far we have considered here only functions of two random variables. X2 . we get ⎧ ⎪ ⎪ 0 for y F 0 ⎪ ⎪ ⎪ ⎪  y ⎪ ⎪ ⎪ ⎨ 1 · dz = y for 0 < y < 1 h(y) = 0 ⎪ 1 ⎪ ⎪ ⎪ 1 · dz = 2 − y for 1 < y < 2 ⎪ ⎪ ⎪ ⎪ y−1 ⎪ ⎩ 0 for y G 2 and to make the density function continuous. Y2 = u2 (X1 . if we are given the joint probability density of three random variables X1 . Functions of Random Variables z z1 1 x zy zy1 y 0 z0 1 2 Figure 7. and 222 . 1 < y < 2. X2 . but the method based on Theorem 2 can easily be generalized to functions of three or more random variables. X3 ). and X3 and we want to find the joint probabil- ity density of the random variables Y1 = u1 (X1 . 
h(y) 1 h(y)  y h(y)  2  y y 0 1 2 Figure 8. Transformed sample space for Example 13. Triangular probability density. EXAMPLE 14 If the joint probability density of X1 . y3 ) = 0. but the Jacobian is now the 3 * 3 determinant ⭸x1 ⭸x1 ⭸x1 ⭸y ⭸y ⭸y 1 2 3 ⭸x2 ⭸x2 ⭸x2 J = ⭸y1 ⭸y2 ⭸y3 ⭸x3 ⭸x3 ⭸x3 ⭸y1 ⭸y2 ⭸y3 Once we have determined the joint probability density of the three new random variables. X2 . y2 . g(y1 . and X3 is given by ⎧ ⎨e−(x1 +x2 +x3 ) for x1 > 0. elsewhere. x3 ) = ⎩0 elsewhere find (a) the joint density of Y1 = X1 + X2 + X3 . x2 . and y3 = x3 for x1 . h(y1 ) = 0 elsewhere. x2 . since the transformation is one-to-one. Functions of Random Variables Y3 = u3 (X1 . we get x1 = y1 − y2 − y3 . by integration. X2 . y2 . x2 = y2 . y3 ) = e−y1 · |1| = e−y1 for y2 > 0. we can find the marginal density of any two of the random variables. Observe that we have shown that the sum of three independent random variables having the gamma distribution with α = 1 and β = 1 is a random variable having the gamma distribution with α = 3 and β = 1. y2 = x2 . or any one. 223 . (b) Integrating out y2 and y3 . Solution (a) Solving the system of equations y1 = x1 + x2 + x3 . X3 ). (b) the marginal density of Y1 . and y1 > y2 + y3 . It follows that 1 −1 −1 J = 0 1 0 = 1 0 0 1 and. and x3 . that g(y1 . and x3 = y3 . we get  y1  y1 −y3 h(y1 ) = e−y1 dy2 dy3 0 0 1 2 −y1 = y ·e 2 1 for y1 > 0. and Y3 = X3 . Y2 = X2 . y3 > 0. the general approach is the same. x3 > 0 f (x1 . x2 > 0. probability distribution of Y1 = X1 + X2 . find (a) the joint distribution of U = X + Y and V = X − Y. (b) Find the probability density of Z = X 4 (= Y 4 ). 224 . the number of successes minus the number of failures. edition. This question has been intentionally omitted for this density having α = 1 and β = 3. If X has a uniform density with α = 0 and β = 1. 26. Consider the random variable X with the probabil- 10. X2 . 19. This question has been intentionally omitted for this density of Y = |X|. This question has been intentionally omitted for this tribution. y) = ⎨ for 0 < x < 2 7 f (x) = 2 ⎪ ⎪0 ⎩ elsewhere for x = 1. In X has a gamma dis. find the probability distribution of Y. edition. show that the random variable Y = −2. This question has been intentionally omitted for this N = 6. Also. Exercises 9. ity density tribution of the random variable Z = (X − 1)2 . plot the graphs of the probability densities of X and Y and indi. find where k is an appropriate constant. If X has a binomial distribution with n = 3 and θ = 13 . 22. 20. 2. find the probability (a) the joint distribution of Y1 = X1 + X2 and Y2 = 2X X1 − X2 . (a) Use the result of Example 2 to find the probability 14. edition. given by x1 x2 16. Functions of Random Variables As the reader will find in Exercise 39. 21. density of the random variable Y = . distribution of Y. ⎪ ⎪ ⎩0 elsewhere (b) the probability distribution of X1 /X2 . 2 and y = 1. the formula for the probability distribution of the random (b) Find the probability density of Z = X 2 (= Y 2 ). If X has a geometric distribution with θ = 13 . find ⎨ for x > 0 f (x) = (1 + 2x)6 (a) the probability distribution of X1 X2 . find the probability dis. find density of Y = |X|. θ1 = 14 . 24. ⎩0 elsewhere 1+X (b) U = (X − 1)4 . θ2 = 13 . If the probability density of X is given by f (x1 . and thus determine the value of k. 2. and X3 have the multinomial distribution cate the respective areas under the curves that represent with n = 2. ⎨ for − 1 < x < 1 find the probability distributions of f (x) = 2 X ⎪ ⎪ (a) Y = . 
and n = 2. x2 ) = 36 ⎧ ⎪ ⎪ kx3 for x1 = 1. If X1 . With reference to Exercise 9. If the joint probability distribution of X1 and X2 is cise 2. With reference to Exercise 22. 23. (b) the marginal distribution of U. 3. 25. 2. Y2 = X1 − X2 . and Y3 = X3 . If the probability density of X is given by given by ⎧ (x − y)2 ⎪ ⎪ x f (x. it would have been easier to obtain the result of part (b) of Example 14 by using the method based on Theorem 1. ⎧ ⎪ ⎪ 3x2 11. Identify the 1 + 2X (b) the marginal distribution of Y1 . 15. variable Y = 4 − 5X. 3. What are its parameters? edition. (a) Use the result of Example 2 to find the probability 12. and θ3 = 12 5 . find the probability density of Y = X 3 . Consider the random variable X with the uniform 13. If the joint probability distribution of X and Y is 17. Use the transformation technique to rework Exer. 3 and x2 = 1. 18. find the joint P( 12 < X < 1) and P( 18 < Y < 1). If X has a hypergeometric distribution with M = 3. 33. then finding the marginal density of Z. The method of transformation based on Theorem 1 can be generalized so that it applies also to random vari- 1 ables that are functions of two or more random variables. Let X and Y be two independent random variables Y and Z and then integrating out y. 31. having the joint probability density ing the geometric distribution with the parameter θ . x + y < 1 ⎨12xy(1 − y) f (x. X +Y (b) Find and identify the marginal density of U. y) = 2 − y for Regions III and IV of Figure 9 Find the probability density of U = Y − X by using ⎪ ⎪ ⎪ ⎩0 Theorem 1. Find the probability density of Z = XY 2 by using Theorem 1 to determine the joint probability density of 38. 35. x2 ) = 0 elsewhere 29. binomial distribution with the parameters n1 + n2 and θ . Find the probability density of Y1 = X1 + X2 by using we introduce the new random variable in place of one of Theorem 1 to determine the joint probability density of the original random variables. and then we eliminate (by X1 and Y1 and then integrating out x1 . If X1 and X2 are independent random variables hav. Let X1 and X2 be two continuous random variables 28. Let X and Y be two continuous random variables hav- 30. the sum of two independent random variables having the uniform density with α = 0 and β = 1. elsewhere 225 . f (x1 . Use this method to rework Example 14. 0 < y < 1 0 elsewhere f (x. show that ⎧ if U = Y + X3 = X1 + X2 + X3 . Rework Exercise 30 by using Theorem 2 to determine (a) Find the joint probability density of the random vari- X the joint probability density of Z = XY 2 and U = Y and ables U = and V = X + Y. then ⎪ ⎨1 (a) the joint probability density of U and Y is given by for x > 0. having identical gamma distributions. x + y < 2 f (x. Consider two random variables X and Y with the ing the joint probability density joint probability density  ⎧ 24xy for 0 < x < 1. X1 X2 . If X and Y are independent random variables hav- ing the standard normal distribution. Functions of Random Variables 27. y) = 2 ⎧ ⎪ ⎩0 elsewhere ⎪ ⎪y for Regions I and II of Figure 9 ⎪ ⎨ g(u. show that Y = X1 + X2 has the and then finding the marginal density of U. which has the same uniform probability density is given by density and is independent of both X1 and X2 . Rework Exercise 32 by using Theorem 2 to determine the joint probability density of Y1 = X1 + X2 and Y2 = 40. (Hint: Complete the square in the exponent. Also. Consider two random variables X and Y whose joint third random variable X3 . show that Y = X1 + X2 is a random variable having the neg. 
In Example 13 we found the probability density of X1 − X2 and then finding the marginal density of Y1 . 0 < x2 < 1 k = 2. for example. y) = ⎩0 elsewhere Find the joint probability density of Z = X + Y and W = X. identify the summation or integration) the other two random vari- distribution of Y1 . If X1 and X2 are independent random variables hav. show that the ran- Find the joint probability density of Y1 = X12 and Y2 = dom variable Z = X + Y is also normally distributed. 0 < y < 1. y) = for 0 < x < 1. Rework Exercise 34 by using Theorem 2 to determine ing binomial distributions with the respective parameters the joint probability density of U = Y − X and V = X n1 and θ and n2 and θ . f (x) = for − q < x < q π(1 + x2 ) So far we have used this method only for functions of two random variables. Consider two independent random variables X1 and X2 having the same Cauchy distribution 39. 36. ables with which we began.  ative binomial distribution with the parameters θ and 4x1 x2 for 0 < x1 < 1. but when there are three. 32.) What are the mean and the variance of this normal distribution? 37. Given a 34. y > 0. . 2. 5 Moment-Generating Function Technique Moment-generating functions can play an important role in determining the prob- ability distribution or density of a function of random variables when the func- tion is a linear combination of n independent random variables. Functions of Random Variables y (b) the probability density of U is given by ⎧ ⎪ ⎪ 0 for u F 0 ⎪ ⎪ 2 ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ 1 2 IV ⎪ ⎪ for 0 < u < 1 ⎪ ⎪ u ⎪ ⎪ 2 ⎪ ⎨ yu III h(u) = 1 u2 − 3 (u − 1)2 for 1 < u < 2 1 yu1 ⎪ ⎪ 2 2 ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ 1 2 3 3 II ⎪ ⎪ u − (u − 1)2 + (u − 2)2 for 2 < u < 3 ⎪ ⎪ ⎪ ⎪ 2 2 2 ⎪ ⎪ I ⎪ ⎩ 0 for u G 3 u 1 2 3 Note that if we let h(1) = h(2) = 12 . The method is based on the following theorem that the moment-generating function of the sum of n independent random variables equals the product of their moment-generating functions. the sum of n inde- pendent random variables. . . We shall illustrate this technique here when such a linear combination is. . . we can write 226 . . then the n random variables are independent if and only if f (x1 . . . . If f(x1 . . . . then  n MY (t) = MXi (t) i=1 where MXi (t) is the value of the moment-generating function of Xi at t. . x2 . X2 . xn ) is the value of the joint proba- bility distribution of the discrete random variables X1 . . . . xn ) and fi (xi ) is the value of the marginal distribution of Xi at xi for i = 1. in fact. xn ) = f1 (x1 ) · f2 (x2 ) · . . Proof Making use of the fact that the random variables are independent and hence f (x1 . Xn at (x1 . . xn ) within their range”. x2 . · fn (xn ) according to the following definition “INDEPENDENCE OF DISCRETE RANDOM VARIABLES . .. · fn (xn ) for all (x1 . X2 . probability density of U continuous. . . x2 . . . Diagram for Exercise 40. . . xn ) = f1 (x1 ) · f2 (x2 ) · . x2 . this will make the Figure 9. . . . . If X1 . n. x2 . . . . leaving it to the reader to generalize it in Exercises 45 and 46. . . and Xn are independent random variables and Y = X1 + X2 + · · · + Xn . . THEOREM 3. . . the distribution of the sum of n independent random variables having Poisson distributions with the param- eters λi is a Poisson distribution with the parameter λ = λ1 + λ2 + · · · + λn . Note that in Example 10 we proved this for n = 2. we have only to replace all the integrals by sums. Thus. . x2 . λn . X2 . for Y = X1 + X2 + · · · + Xn . To prove it for the discrete case. 
Solution Since the exponential distribution is a gamma distribution with α = 1 and β = θ . . find the probability density of the random variable Y = X1 + X2 + · · · + Xn . λ2 . . . . . X2 . . . we must be able to identify whatever probability distribution or density corresponds to MY (t). Note that if we want to use Theorem 3 to find the probability distribution or the probability density of the random variable Y = X1 + X2 + · · · + Xn . . Xn are independent random variables having exponential distributions with the same parameter θ . . Xn having Poisson distributions with the respective parameters λ1 . we have MXi (t) = (1 − θ t)−1 227 . . we obtain  n eλi (e −1) = e(λ1 +λ2 + ··· +λn )(e −1) t t MY (t) = i=1 which can readily be identified as the moment-generating function of the Poisson distribution with the parameter λ = λ1 + λ2 + · · · + λn . . Solution By the theorem “The moment-generating function of the Poisson distribution is given by MX (t) = eλ(e −1) ” we have t MXi (t) = eλi (e −1) t hence. EXAMPLE 15 Find the probability distribution of the sum of n independent random variables X1 . xn ) dx1 dx2 · · · dxn −q −q  q  q  q x1 t x2 t = e f1 (x1 ) dx1 · e f2 (x2 ) dx2 · · · exn t fn (xn ) dxn −q −q −q  n = MXi (t) i=1 which proves the theorem for the continuous case. . EXAMPLE 16 If X1 . . Functions of Random Variables MY (t) = E(eYt )   = E e(X1 +X2 + ··· +Xn )t  q  q = ··· e(x1 +x2 + ··· +xn )t f (x1 . Exercises 41. Use the moment-generating function technique to and identify the corresponding distribution. and Xn is the num- ber of successes on the nth trial. Xn are independent random variables having the same Bernoulli distribution f (x. θ ) = θ x (1 − θ )1−x for x = 0. 44. Suppose that X1 . If n independent random variables have the same where MXi (t) is the value of the moment-generating func- gamma distribution with the parameters α and β. find the moment-generating function of their sum distribution? 228 . What are the mean and the variance of this tions σi . find the tion of Xi at t. X2 is the number of successes on the second trial. if n identify its distribution. then independent random variables have geometric distribu- tions with the same parameter θ . . Prove the following generalization of Theorem 3: If 42. its mean. Use the result of Exercise 45 to show that. X2 . Functions of Random Variables and hence  n MY (t) = (1 − θ t)−1 = (1 − θ t)−n i=1 Identifying the moment-generating function of Y as that of a gamma distribution with α = n and β = θ . If n independent random variables Xi have normal then Y = a1 X1 + a2 X2 + · · · + an Xn has a normal dis- distributions with the means μi and the standard devia. . tribution. their sum is a random  n variable having the negative binomial distribution with MY (t) = MXi (ai t) the parameters θ and k. Y = X1 + X2 + · · · + Xn is the total number of successes in n trials. i=1 43. since X1 is the number of successes on the first trial. independent random variables Xi have normal distribu- tions with the means μi and the standard deviations σi .. 45.. its variance. . moment-generating function of their sum and. . Find the moment-generating function of the negative X1 . and Xn are independent random variables and binomial distribution by making use of the fact that if k Y = a1 X1 + a2 X2 + · · · + an Xn . Of course. where we showed that the sum of three independent random variables having exponential distributions with the parameter θ = 1 has a gamma distribution with α = 3 and β = 1. . 46. 
we conclude that the distribution of the sum of n independent random variables having exponential distributions with the same parameter θ is a gamma distribution with the parameters α = n and β = θ . . We have MXi (t) = e0·t (1 − θ ) + e1·t θ = 1 + θ (et − 1) so that Theorem 3 yields  n MY (t) = [1 + θ (et − 1)] = [1 + θ (et − 1)]n i=1 This moment-generating function is readily identified as that of the binomial dis- tribution with the parameters n and θ . . Theorem 3 also provides an easy and elegant way of deriving the moment- generating function of the binomial distribution. X2 . 1. if possible. . This is a fruitful way of looking at the binomial distribution. Note that this agrees with the result of Example 14. and rework Exercise 27. . . that is. we must remember that the differential element.) Solution A simple alternate way to use the distribution-function technique is to write down the differential element of the density function. of the transformed obser- vations. y. some other distribution might have been selected to impart better properties to this estimate. f (x) dx. and to substitute x2 for y.) We obtain 1 1 · 2x · e− 2 (x 2 −μ)2 /σ 2 f (x) dx = √ dx 2π σ The required density function is given by  2 1 2 xe− 2 (x −μ) /σ 2 2 f (x) = πσ 2 This distribution is not immediately recognizable. The probability R 1 E distribution of R is given by f (R) = for 0 < R … A. we have I = u(R) = . EXAMPLE 18 What underlying distribution of the data is assumed when the square-root transfor- mation is used to obtain approximately normally distributed data? (Assume the data are nonnegative. (When we do this. with respect to this example. but it can be graphed quickly using appropriate computer software. 229 . EXAMPLE 17 Suppose the resistance in a simple circuit varies randomly in response to environ- mental conditions. Find the distribution of the random variable I. Thus. If the nominal value of R is to be the mean of this distribution. must be changed to dx = 2x dx. To illustrate these applications. the current flowing through the circuit. To determine the effect of this variation on the current flowing through the circuit. the probability of a negative observation is zero. The first example illustrates an application of the transformation technique to a simple problem in electrical engineering. dy. an experiment was performed in which the resistance (R) was varied with equal probabilities on the interval 0 < R … A and the ensuing voltage (E) was measured. Functions of Random Variables 6 The Theory in Application Examples of the need for transformations in solving practical problems abound. The next example illustrates transformations of data to normality. we give three examples. Solution E Using the well-known relation E = IR. w(I) = . and the A I probability density of I is given by  1 E E g(I) = f (R) · |w (I)| = − 2 = R>0 A R AR2 It should be noted. that this is a designed experi- ment in as much as the distribution of R was preselected as a uniform distribution. find the probability that a given substance will emit 2 particles in less than or equal to 3 seconds. n − 1. Then the total time for n emissions to take place is the sum T = x0 + x1 + · · · + xn−1 . . 1. 2. Thus. It can be shown that such a process has no memory. . If the parameter λ equals 5. The required probability is given by . the time between successive emissions also can be described by this distribution. that is. that is. 
The moment-generating function of this sum is given in Example 16 to be MT (t) = (1 − t/λ)−n This can be recognized as the moment-generating function of the gamma distribu- tion with parameters α = n = 2 and β = 1/λ = 1/5. it follows that successive emissions of α particles are independent. for i = 0. the time for the nucleus to emit the first α particle is x (in seconds). . . Functions of Random Variables The final example illustrates an application to waiting-time problems. EXAMPLE 19 Let us assume that the decay of a radioactive element is exponentially distributed. so that f (x) = λe−λx for λ > 0 and x > 0. Solution Let xi be the waiting time between emissions i and i + 1. Let X be the amount of premium gasoline (in 1.  3 1 1 P T … 3. it is clear that this event is virtually certain to occur. ⎧ 50. ability density of the amount that the service station has left in its tanks at the end of the day.000 ⎪ ⎪ 3 ⎪ gallons) that a service station has in its tanks at the ⎨ 11 (5x1 + x2 ) ⎪ for x1 > 0. This question has been intentionally omitted for this edition. the ⎪ ⎩0 expected total percentage of copper and iron in the ore. This question has been intentionally omitted for this use the distribution function technique to find the prob- edition. Also find E(Y). f (x1 . β = = 1 x e−5x dx 5 (2) 0 5 Integrating by parts. 1–2 47. the integral becomes 3  3 1 1 P(T … 3) = − xe−5x − − e−5x dx = 1 − 1. beginning of a day. X1 and X2 . If the joint density of 49.6e−15 5 0 0 5 Without further evaluation. x2 ) = and x1 + 2x2 < 2 ⎪ ⎪ tion sells during that day. 51. respectively. x2 > 0. y) = 200 ability density of Y = X1 + X2 . This question has been intentionally omitted for this these two random variables is given by edition. and Y the amount that the service sta. α = 10. elsewhere 230 . 48. The percentages of copper and iron in a certain kind of ore are. If the joint density of X and Y ⎪ ⎪ ⎩0 elsewhere is given by ⎧ ⎪ ⎨ 1 use the distribution function technique to find the prob- for 0 < y < x < 20 f (x. Applied Exercises SECS. If X has the exponential distribution given by f (x) = numbers of calls that she receives on these phones are 0. what are the probabili- ical physics. 5 65. x > 0. find the probability that x > 1. This question has been intentionally omitted for this attendant to balance a tire is a random variable having an edition. d. 63. A lawyer has an unlisted number on which she of dice three times. what are the probabili- might have been used by the writers of the software that ties that it will take the doctor at least 20 minutes to treat you used to produce the result of Exercise 56.1 calls every half-hour and exceed 2. is (a) 2 complaints on any given day. the velocity of a ties that it will receive gas molecule. ⎧ (b) 5 complaints altogether on any two given days. If. edition.6.5x .9 calls every half-hour. If the number of minutes it takes a service station 54. Poisson distribution with λ = 3. a listed number on which she receives on the average 10. This question has been intentionally omitted for this (c) at most three fish in 4 hours? edition. find the probability that Y = X 2 will receives on the average 2. from the density function (b) at most 6 calls? . where m the mass ities that a person fishing there will catch of the molecule is a random variable having a gamma (a) four fish in 2 hours. The number of fish that a person catches per hour where β depends on its mass and the absolute tem. 
at Woods Canyon Lake is a random variable having the perature and k is an appropriate constant. (b) at least 12 minutes to balance three tires? 56. (c) three patients? SEC.5 e−0. According to the Maxwell–Boltzmann law of theoret. Functions of Random Variables SECS. 53. 6 will receive altogether 67. Show that Poisson distribution with λ = 1. distribution. (b) at least two fish in 3 hours. 3–4 61. (a) one patient. This question has been intentionally omitted for this (a) less than 8 minutes to balance two tires. the diameter of a circle is selected at random (a) 14 calls. Describe how the probability integral transformation bution with the parameter θ = 9. the probability density of V. What are the probabil- the kinetic energy E = 12 mV 2 . (b) two patients.3. what are the probabilities that in half an hour she SEC. If X is the number of 7’s obtained when rolling a pair 58. ⎨kv2 e−βv2 for v > 0 (c) at least 12 complaints altogether on any three given f (v) = ⎩0 elsewhere days? 62. If it can be assumed that the 66. If the number of minutes that a doctor spends with a patient is a random variable having an exponential distri- 57. Use a computer program to generate 10 “pseudoran- dom” numbers having the standard normal distribution. exponential distribution with the parameter θ = 5. If the number of complaints a dry-cleaning establish- ment receives per day is a random variable having the 52. independent random variables having Poisson distribu- tions. 64. what are the probabilities that the attendant will take 55. 68. a probability distribution. In a newspaper ad. the parameters λ1 = 3. a car dealer lists a 2001 Chrysler. 69.6. 60. If the numbers of inquiries 5 he will get about these cars may be regarded as indepen- dent random variables having Poisson distributions with (a) find the value of k so that f(d) is a probability density. indeed. what are (b) find the density function of the areas of the circles so the probabilities that altogether he will receive selected. Show that the underlying distribution function of (b) anywhere from 15 to 20 inquiries about these cars. Example 18 is. a d f (d) = k 1 − . find the probability density 2010 Ford and eight inquiries about the other two cars? of Y which is said to have the log-normal distribution. and use (c) at least 18 inquiries about these cars? a computer program to graph the density function. (a) fewer than 10 inquiries about these cars. With reference to Exercise 59.8. 59. 231 . and a 2008 Buick.6. If X = ln Y has a normal distribution with the mean μ ity that the car dealer will receive six inquiries about the and the standard deviation σ . 2010 Ford. λ2 = 5. what is the probabil. and λ3 = 4. 0 < d < 5. −1.2919. 0) = 9 . w) = 5e−v for 0. v) = 12 over the region bounded by v = 0.475. T. g( 3 ) = 1 . 1) = 2 . New York: Macmillan Publishing Company.. S. −1) = 6 . 1973. of Theorem 1 apply separately for each of the subin- Wilks. and g(y) = 0 elsewhere. E. z = 1.. 2) = 3 . R. Mass. Upper Saddle River. 9 h(−2) = 15 . h(v) = e−v 1 · z−3/4 for 1 < z < 81 and h(z) = 0 elsewhere. 23 (a) f (2. 1) = 18 and g(A) = 0 elsewhere. 4th ed. ical Statistics. A First Course in Mathematical Statis- interval within the range of X for which f (x) Z 0 can be tics. y < 3.: Addison-Wesley Publishing Com- partitioned into k subintervals so that the conditions pany. g(1) = 14 . Miller and Freund’s Probability and texts on mathematical statistics. and g(2.. Mathematical Statistics. 
33 The marginal distribution is the Cauchy distribution 1 2 3 g(y) = 2y for 0 < y < 1 and g(y) = 0 elsewhere. g(u) = 14 (2 − u) for 0 < u < 2 and g(u) = 0 f (y) = 0 elsewhere.4 and v > 0. Answers to Odd-Numbered Exercises 1 g(y) = θ1 ey e−(1/θ) ey for −q < y < q. π 4 + y2 1 5 (a) f (y) = · (e−y/θ1 − e−y/θ2 ) for y > 0 and 35 f (u. and g(z. probability that the current gain will exceed the required rent gain measurements made on a certain transistor are minimum value of 6. g(u) = 14 (2 + u) f (y) = 0 elsewhere. 6th ed. 36 36 = 363 .1728. 36 36 36 6 .: Prentice Hall. (b) 0. and Myers. −1) = 2 . 1 . g(y) = · for −q < y < q. and 65 27 (b) g(2) = 36 36 36 36 1 67 (a) 25 . u) = 0 elsewhere. g(1) = 13 . g(16) = 1 . Reading. 1) = 61 (a) 0. 1 < y < 2. 0. A generalization of Theorem 1. g( 1 ) = 12 . 0) = 9 6 16 1 1 − 12 ln y−μ 2 σ 29 μ = 0 and σ 2 = 2.. −2. 69 g(y) = √ · ·e for y > 0 and g(y) = 0 2π σ y 31 g(z. 1978. 2. 13 g(0) = 13 . g(1.05. 1989.1420. R. may be found in Wiley & Sons. and h(2) = 15 ..8 and σ = 0. (c) 0. 0) = 1 .. f (4. 4th ed. 0.. f (4. If cur. g(4) = 10 . Inc. 2 27 3 27 4 27 (b) g(0) = 12 .2008. 6 21 (a) g(y) = 18 y−3/4 for 0 < y < 1 and g(y) = 14 for 1 < 53 h(r) = 2r for 0 < r < 1 and h(r) = 0 elsewhere. 63 (a) 0. V. f (5. A.570.J. The logarithm of the ratio of the output to the input normally distributed with μ = 1.2 < w < 0. and f (u. g(5) = 12 . g(3) = 4 .0. Functions of Random Variables 70. (b) 0. Hogg. −2) 59 (a) 0. u = −v. π 4 π 25 (b) g(0. 37 g(w. g(2. g( 2 ) = 6 . z = 0. and z = u2 . θ1 − θ2 1 and 2v + u = 2. 43 It is a gamma distribution with the parameters αn 27 27 27 and β. which applies when the Roussas. (b) g(A) = 25 √ A−1/2 − 1 for 0 < A < 25 9 g(6) = 36 . 2000. and z = w. z) = 24w(z − w) over the region bounded by w = 11 (a) g(0) = 278 . and f (6. 2) = 14425 . 5 1 . Inc. Inc. elsewhere..   24 . h(0) = 35 . g(y) = 3(2 − y)(7y − 4) for 1 −1 11 17 g(y) = y 3 . 0) = 1 . R. New York: Macmillan Publishing Company. g(2) = 13 . References The use of the probability integral transformation in More detailed and more advanced treatments of the problems of simulation is discussed in material in this chapter are given in many advanced Johnson. and Craig. R. 55 g(v. (b) f (y) = · ye−y/θ for y > 0 and θ2 for −2 < u … 0. Walpole. A. New York: John tervals..3817. G. (b) h(z) = 16 for v > 0 and h(v) = 0 elsewhere. g(1. 51 g(y) = 11 9 · y2 for 0 < y … 1. in Statistics for Engineers. f (3. 1. v) = 0 elsewhere. u = 1. G. f (4. √ h(z) = 6z + 6 − 12 z for 0 < z < 1 and h(z) = 0 elsewhere. 0) = 361 . (b) 0. (c) 0. 36 36 2 ..1093. f (3. Introduction to Mathemat- N. H. 0) = 4 . g(2. z) = 0 elsewhere. 0. u) = 12z(u−3 − u−2 ) over the region bounded by elsewhere. S. f (5. Inc. find the current of a transistor is called its current gain. 1) = 5 .. 1962. for instance. 232 . g(w. Probability and Statistics for Engineers and Scientists. not all samples lend themselves to valid generalizations about the populations from which they came. and so forth. suppose a scientist must choose and then weigh 5 of 40 guinea pigs as part of an experiment. To illustrate. most of the methods of infer- ence discussed in this chapter are based on the assumption that we are dealing with From Chapter 8 of John E. In fact. which consists of the weights of all 40 guinea pigs. Marylees Miller. an engineer selects 10 of these transistors. In this way. Inc. suppose that. Copyright © 2014 by Pearson Education. Also. called statistics. 
The theory to be given in this chapter forms an important foundation for the theory of statistical inference. A set of numbers from which a sample is drawn is referred to as a population. we say that they constitute a sample from this exponential population. its mean. Inasmuch as statistical inference can be loosely defined as a process of drawing conclusions from a sample about the population from which it is drawn.) The properties of these distributions then allow us to make probability statements about the resulting inferences drawn from the sample about the population. All rights reserved. and records for each one the time to failure. In statistics. it is useful to have the following definition. If these times to failure are values of independent random variables having an exponential distribution with the parameter θ . it is preferable to look upon the weights of the 5 guinea pigs as a sample from the popu- lation. its variance. Eighth Edition. As can well be imagined. 233 . This is how the term “sample” is used in everyday language. Freund’s Mathematical Statistics with Applications. POPULATION.Sampling Distributions 1 Introduction 5 The t Distribution 2 The Sampling Distribution of the Mean 6 The F Distribution 3 The Sampling Distribution of the Mean: 7 Order Statistics Finite Populations 8 The Theory in Practice 4 The Chi-Square Distribution 1 Introduction Statistics concerns itself mainly with conclusions and predictions resulting from chance outcomes that occur in carefully planned experiments or investigations. Irwin Miller. tests them over a period of time. the population as well as the sample consists of numbers. to estimate the average useful life of a certain kind of transistor. a layman might say that the ones she selects constitute the sample. The distribution of the numbers constituting a popu- lation is called the population distribution. Drawing such conclusions usually involves taking sample observations from a given population and using the results of the sample to make inferences about the popu- lation itself. To do this requires that we first find the distributions of certain functions of the random variables whose values make up the sample. (An example of such a statistic is the sample mean. DEFINITION 1. . Here. . x2 . but large enough to be treated as if they were infinite. these definitions apply only to random samples. . . X2 . . be defined for any set of random variables X1 . xn ) = f (xi ) i=1 where f (xi ) is the value of the population distribution at xi . In practice. . sampling without replacement from finite populations is discussed in section 3. Xn are independent and identically distributed random variables. . and s2 are values of the corresponding random † The note has been intentionally omitted for this edition. constituting a random sample. . then the sample mean is given by  n Xi i=1 X= n and the sample variance is given by  n (Xi − X)2 i=1 S2 = n−1 As they are given here. DEFINITION 3. Thus. . x2 . . Thus. . . Intuitively. . . . this makes more sense and it con- forms with colloquial usage. but the sam- ple mean and the sample variance can. we might calculate  n  n xi (xi − x)2 i=1 i=1 x= and s2 = n n−1 for observed sample data and refer to these statistics as the sample mean and the sample variance. Xn . that is. .” “statistic. we often deal with random samples from populations that are finite. most statistical theory and most of the methods we shall discuss apply to samples from infinite populations. the xi . 
Sampling Distributions random samples. . X2 . . DEFINITION 2. we say that they constitute a random sample from the infinite population given by their common distribution. . . . Observe that Definition 2 and the subsequent discussion apply also to sampling with replacement from finite populations. . If X1 . .” “sample mean. 234 . . . . Typical of what we mean by “statistic” are the sample mean and the sample variance. RANDOM SAMPLE. Statistical inferences are usually based on statistics. Xn constitute a random sample. . It is common practice also to apply the terms “random sample. on random variables that are functions of a set of random variables X1 .” and “sample variance” to the values of the random variables instead of the random variables themselves. Random samples from finite populations will be treated later in Section 3. Xn . xn ). SAMPLE MEAN AND SAMPLE VARIANCE. X2 . x. . If f (x1 . X2 . and we shall begin here with a definition of random samples from infinite populations. similarly. x2 . xn ) is the value of the joint distribution of such a set of random vari- ables at (x1 . by virtue of independence we can write  n f (x1 . If X1 . it is necessary that we find the distribution of such statistics. X2 . n the sample size. . The formula for the standard error of the mean. X2 . . we can expect values of X to be closer to μ. Then. we conclude that i=1  n   1 2 1 2 σ2 var(X) = · σ = n · σ = n2 n2 n i=1 It is customary to write E(X) as μX and var(X) as σ 2 and refer to σX as the X standard error of the mean. the formulas for x and s2 are used even when we deal with any kind of data. This means that when n becomes larger and we actually have more information (the values of more random variables). this probability approaches 1. not necessarily sample data. Sample statistics such as the sample mean and sample variance play an important role in estimating the parameters of the popula- tion from which the corresponding random samples were drawn. and S2 . First let us study some theory about the sampling distribution of the mean. . 235 . . by the corollary of a theorem “If the random  n variables X1 . We call these distributions sampling distributions. 2 The Sampling Distribution of the Mean Inasmuch as the values of sampling statistics can be expected to vary from sam- ple to sample. . mak- ing only some very general assumptions about the nature of the populations sampled. . is increased. . These. we get n n   1 1 E(X) = ·μ = n ·μ = μ n n i=1 since E(Xi ) = μ. in which case we refer to x and s2 simply as the mean and the variance. shows that the standard deviation of the distribution of X decreases when n. Xn are independent and Y = ai Xi . X. THEOREM 2. For any positive constant c. Indeed. THEOREM 1. Xn constitute a random sample from an infinite population with the mean μ and the variance σ 2 . the probability that X will take on a value between μ − c and μ + c is at least σ2 1− nc2 When n → q. and other statistics that will be introduced in this chapter. then σ2 E(X) = μ and var(X) = n 1 Proof Letting Y = X and hence setting ai = . σX = σ √ . then var(Y) = i=1  n a2i · var(Xi )”. and we make important use of them in determining the properties of the inferences we draw from the sample about the parameters of the population from which it is drawn. Sampling Distributions variables Xi . are those mainly used in statistical inference. If X1 . . the quantity that they are intended to estimate. CENTRAL LIMIT THEOREM. 
which concerns the limiting distribution of the standardized mean of n random variables when n→q. 2. More general conditions under which the theorem holds are given in Exercises 7 and 9. b b we get √ √ nt − n μt/σ MZ (t) = M X−μ (t) = e · MX √ σ/ n σ √ t = e− n μt/σ · MnX √ σ n Since nX = X1 + X2 + · · · + Xn . is primarily of theoretical interest. 3. . we obtain σ n √   n μt  t  t 2  t3 ln MZ (t) = − + n · ln 1 + μ1 √ + μ2 2 + μ3 3 √ + · · · σ σ n 2σ n 6σ n n where μ1 . μ2 . Of much more practical value is the central limit theorem.     X X+a t a t t MbX (t) = E(e ) = MX (bt). one of the most important theorems of statistics. X2 . If X1 . . then 1. Proof First using the third part and then the second of the given theorem “If a and b are constants. and the most general conditions under which it holds are referred to at the end of this chapter. Sampling Distributions This result. 236 . THEOREM 3. those of the original random variables Xi . We shall prove this theorem here only for the case where the n random variables are a random sample from a population whose moment-generating function exists. called a law of large numbers. . M X+a (t) = E[e bXt b ] = e b · MX ”. and the moment-generating function MX (t). the variance σ 2 . and μ3 are the moments about the origin of the population distribution. then the limiting distri- bution of X −μ Z= √ σ/ n as n→q is the standard normal distribution. MX+a (t) = E[e (X+a)t ] = eat · M (t). Xn constitute a ran- dom sample from an infinite population with the mean μ. ⎡ ⎤n √ t MZ (t) = e − n μt/σ · ⎣MX √ ⎦ σ n and hence that √ n μt t ln MZ (t) = − + n · ln MX √ σ σ n t Expanding MX √ as a power series in t. . that is. the coefficient of tr is a constant times √ . This is incorrect because var(X) → 0 when n → q. we get nr−2 1 2 lim ln MZ (t) = t n→q 2 and hence 1 2 lim MZ (t) = e 2 t n→q since the limit of a logarithm equals the logarithm of the limit (provided these limits exist). this approximation is used n when n G 30 regardless of the actual shape of the population sampled. For smaller values of n the approximation is questionable. getting ⎧  √ ⎨ n μt t t 2 t 3 ln MZ (t) = − +n μ √ + μ2 2 + μ3 3 √ + · · · σ ⎩ 1σ n 2σ n 6σ n n  2 1 t 2 t3   t  − μ1 √ + μ2 2 + μ3 3 √ + · · · 2 σ n 2σ n 6σ n n  3 ⎫ 1 t t 2 t 3 ⎬ + μ1 √ + μ2 2 + μ3 3 √ + · · · − · · · 3 σ n 2σ n 6σ n n ⎭ Then. we can use the expansion of ln(1 + x) as a power series in x. for r G 2. 237 . Sampling Distributions If n is sufficiently large. this reduces to ⎛ ⎞  1 2 ⎝ μ3 μ1 μ2 μ13 ⎠ t3 ln MZ (t) = t + − + √ +··· 2 6 2 6 σ3 n 1 Finally. but see Theorem 4. on the other hand. In practice. the central limit theorem is interpreted incorrectly as implying that the distribution of X approaches a normal distribution when n → q. we obtain √ √   nμ n μ1 μ2 μ12 2 ln MZ (t) = − + t+ − t σ σ 2σ 2 2σ 2 ⎛ ⎞  μ3 μ1 · μ2 μ13 + ⎝ 3 √ − 3 √ + 3 √ ⎠ t3 + · · · 6σ n 2σ n 3σ n and since μ1 = μ and μ2 − (μ1 )2 = σ 2 . Sometimes. An illustration of this theorem is given in Exercise 13 and 14. collecting powers of t. observing that the coefficient of t3 is a constant times √ and in n 1 general. the central limit theorem does justify approximating the distribution of X with a normal distribution having the σ2 mean μ and the variance when n is large. MX+a (t) = E[e(X+a)t ] = eat · MX (t). . .5000 − 0. we get 2 2  n t 1 t 2 2 MX (t) = eμ· n + 2 ( n ) σ 1 2 σ2 = eμt+ 2 t ( n ) This moment-generating function is readily seen to be that of a normal distribution with the mean μ and the variance σ 2 /n. . 
we can write   n t MX (t) = MX n and since the moment-generating function of a normal distribution with the mean μ and the variance σ 2 is given by 1 MX (t) = eμt+ 2 σ 2 t2 1 according to the theorem MX (t) = eμt+ 2 σ t . the distribution of X has the mean μX = 200 and the 15 standard deviation σX = √ = 2. then 1. Sampling Distributions EXAMPLE 1 A soft-drink vending machine is set so that the amount of drink dispensed is a ran- dom variable with a mean of 200 milliliters and a standard deviation of 15 milliliters. it follows from 2. THEOREM 4. 3. If X1 .0548. and according to the central limit theorem. 238 . . cN }. Since z = = 1.4452 = 0.6) = 0. c2 . its sampling distribution is a normal distribution with the mean μ and the variance σ 2 /n. . If X is the mean of a random sample of size n from a normal population with the mean μ and the variance σ 2 . 2. 36 204 − 200 this distribution is approximately normal. and Xn are independent ran- b  dom variables and Y = X1 + X2 + · · · + Xn . Proof According to Theorems “If a and b are constants.6. then MY (t) = ni=1 MXi (t) where MXi (t) is the value of the moment-generating function of Xi at t”. ..5 Table III of “Statistical Tables” that P(X G 204) = P(Z G 1. .5. X2 . the distribution of X is a normal distribution regardless of the size of n. this set is referred to as a finite population of size N. In the definition that follows. M X+a (t) =     b X+a t a t t E[e b ] = e · MX b . MbX (t) = E(ebXt ) = MX (bt). It is of interest to note that when the population we are sampling is normal. 3 The Sampling Distribution of the Mean: Finite Populations If an experiment consists of selecting one or more values from a finite set of numbers {c1 . it will be assumed that we are sampling without replacement from a finite population of size N. What is the probability that the average (mean) amount dispensed in a random sam- ple of size 36 is at least 204 milliliters? Solution According to Theorem 1. Therefore. . . . . . and we refer to the mean and the variance of this discrete uniform distribution as the mean and the variance of the finite population. From the joint probability distribution of Definition 4. Thus. X2 is the second value drawn. that is. SAMPLE MEAN AND VARIANCE—FINITE POPULATION. n. . xs ) = N(N − 1) for each ordered pair of elements of the finite population. . then X1 . cN N for r = 1. c2 . Xn is given by 1 g(xr . As in Definition 2. . . . If X1 is the first value drawn from a finite population of size N. . 239 . . . . xn ) = N(N − 1) · . . . . . X2 . . . . Xn are said to constitute a random sample from the given finite population. X2 . x2 . Sampling Distributions DEFINITION 4. Xn is the nth value drawn. 2. . . it follows from the joint probability distribution of Definition 4 that the joint marginal distribution of any two of the random variables X1 . . it follows that the prob- ability for each subset of n of the N elements of the finite population (regardless of the order in which the values are drawn) is n! 1 = N(N − 1) · . to the actual numbers drawn. . . c2 . but here again it is common practice also to apply the term “random sample” to the values of the random variables. . · (N − n + 1) N n This is often given as an alternative definition or as a criterion for the selection of a   N random sample of size n from a finite population of size N: Each of the n possible samples must have the same probability. 
It also follows from the joint probability distribution of Definition 4 that the marginal distribution of Xr is given by 1 f (xr ) = for xr = c1 . and the joint probability distribution of these n random variables is given by 1 f (x1 . cN } are  N 1  N 1 μ= ci · and σ2 = (ci −μ)2 · N N i=1 i=1 Finally. . . RANDOM SAMPLE—FINITE POPULATION. . The sample mean and the sample variance of the finite population {c1 . . DEFINITION 5. · (N − n + 1) for each ordered n-tuple of values of these random variables. the random sample is a set of random variables. . we can prove the following theorem. ␮1. THEOREM 6. . Xs ) = (ci − μ)(cj − μ) i=1 j=1 N(N − 1) iZj ⎡ ⎤ 1  N ⎢ N ⎥ = · (ci − μ) ⎢ ⎣ (cj − μ)⎥ ⎦ N(N − 1) i=1 j=1 jZi  N  N and since (cj − μ) = (cj − μ) − (ci − μ) = −(ci − μ).1 is called the covariance of X and Y. for random samples from finite populations. Sampling Distributions THEOREM 5. cN }. Xj ) = − into n N N −1 the formula E(Y) = i=1 ai E(Xi ). or C(X. . Xs ) = − · (ci − μ)2 N(N − 1) i=1 1 =− ·σ2 N −1 Making use of all these results. then σ2 cov(Xr . c2 . we get j=1 j=1 jZi 1 N cov(Xr . we get  n 1 E(X) = ·μ = μ n i=1 and n 1  1 σ2 2 var(X) = ·σ +2· − n2 n2 N −1 i=1 i<j σ2 n(n − 1) 1 σ2 = +2· · 2 − n 2 n N −1 σ2 N −n = · n N −1 240 . If Xr and Xs are the rth and sth random variables of a random sample of size n drawn from the finite population {c1 . and cov(Xi . which. cov(X. var(Xi ) = σ 2 . and it is denoted by ␴XY . Y). . let us now prove the following theorem. If X is the mean of a random sample of size n taken without replacement from a finite population of size N with the mean μ and the variance σ 2 . Y)”. N  N 1 cov(Xr . . then σ2 N −n E(X) = μ and var(X) = · n N −1 1 σ2 Proof Substituting ai = . corresponds to Theorem 1. Xs ) = − N −1 Proof According to the definition given here “COVARIANCE. the difference between the two formulas for var(X) is σ usually negligible. having the uniform densities 4. N−1 N−1 241 . ci = E(|Xi − μi |3 ) θ (1 − θ1 ) θ2 (1 − θ2 ) (b) var( ˆ 2) = 1 ˆ 1 − + . holds for a sequence of independent random variables Xi having the respective probability distributions 2. With reference to Exercise 2. when N −1 N is large compared to n. lim [var(Yn )]− 2 · ci = 0 n→q i=1 7. If the first n1 random variables of Exercise 2 have Bernoulli distributions with the parameter θ1 and the 9. 2 fi (xi ) = ples come from normal populations. for the central limit theorem: If with the parameter θ2 . is a sequence of independent random vari- cise 4. . X3 .) ables X1 . . 1 i fi (xi ) = 2 − which we denote by . This question has been intentionally omitted for this normal distribution. . † Since there are many problems in which we are interested in the standard deviation rather than the variance. ˆ Verify that ⎪ ⎪ i ⎪ ⎩0 (a) E() = θ . X3 . ˆ elsewhere θ (1 − θ ) (b) var() ˆ = . the distribution of the tral limit theorem holds for the sequence of random vari- standardized mean of the Xi approaches the standard ables of Exercise 8. then X 1 − X 2 is a ⎪ ⎪ 1 ⎪ ⎩ for xi = ( 12 )i − 1 random variable having a normal distribution with the 2 σ2 σ2 mean μ1 − μ2 and the variance 1 + 2 . If X1 . . then if the variance of sequence of random variables of Exercise 7. Use the sufficient condition of Exercise 7 to show that the n central limit theorem holds. . Xn are where Yn = X1 + X2 + · · · + Xn . X1 . and the formula σX = √ is often used as an approximation n when we are sampling from a large finite population. The following is a sufficient condition. Show that this sufficient condition edition. 
. This question has been intentionally omitted for this ⎧ edition. in the notation of Exer. Indeed. This question has been intentionally omitted for this 3  n edition. Xn are independent random variables ⎧ having identical Bernoulli distributions with the param. show that. (Hint: Proceed n1 n2 8. as long as the usage is clearly understood. Use the condition of Exercise 9 to show that the cen- becomes infinite when n → q. then the distribution independent and uniformly bounded (that is. X2 . show that if the two sam. The following is a sufficient condition for the central limit theorem: If the random variables X1 . . the term “finite population correction ! N−n N−n factor” often refers to instead of . Sampling Distributions It is of interest to note that the formulas we obtained for var(X) in Theorems 1 N −n † and 6 differ only by the finite population correction factor . Exercises 1. X2 . . then X is the proportion of successes in n trials. A general rule of thumb is to use this approximation when the sample does not constitute more than 5 percent of the population. . ⎪ 1 ⎪ ⎪ for xi = 1 − ( 12 )i ⎨ 3. X2 . ⎪ ⎪ 1 1 ⎪ ⎨ for 0 < xi < 2 − eter θ . . each having an absolute third moment (a) E( ˆ 1 − ˆ 2 ) = θ1 − θ2 . ables. . X2 . 5. This does not matter. n1 n2 and if 6. there exists of the standardized mean of the Xi approaches the stan- a positive constant k such that the probability is zero that dard normal distribution when n → q. the Laplace– other n2 random variables have Bernoulli distributions Liapounoff condition. Yn = X1 + X2 + · · · + Xn 10. . Consider the sequence of independent random vari- as in the proof of Theorem 4. of course. . Use this condi- any one of the random variables Xi will take on a value tion to show that the central limit theorem holds for the greater than k or less than −k). . N togram of these sample means. (a) Does this histogram more closely resemble that of a  n normal distribution than that of Exercise 13? Why? Xi2 2 i=1 nX (b) Which resembles it more closely? S2 = − n−1 n−1 (c) Calculate the mean and the variance of the 20 sample means. N. and 9. Theorem 11 will show the importance of this distribution in making inferences about sample variances. 7. . Sampling Distributions 11. Referring to Exercise 13. 0 ≤ x ≤ 1. the results of Theorem 1 apply rather tion that consists of the 10 numbers 15. now change the sample size the formula for the sample variance can be written as to 30. 21. and 11. 10. If a random variable X has the chi-square distribution with ν degrees of freedom if its probability density is given by ⎧ ⎪ ⎪ 1 ν−2 ⎨ ν/2 x 2 e−x/2 for x > 0 f (x) = 2 (ν/2) ⎪ ⎪ ⎩0 elsewhere 242 . 14. Show that the variance of the finite population edition. Also. 15. analogous to the formula of Exercise 17. 17. We also use χ 2 for values of random variables having chi-square distributions. . . cN } can be written as 13. Explain why. use this formula to recalculate the variance of the E(Y) = and var(Y) = 2 12 sample data of Exercise 18. c2 . means. when we sample with replacement from 16. Also. 18. If a random sample of size n is selected without received by a tow truck operator on eight consecutive replacement from the finite population that consists of working days: 13. 4 The Chi-Square Distribution If X has the standard normal distribution. n⎝ Xi ⎠ − ⎝ 2 Xi ⎠ i=1 i=1 (c) the mean and the variance of Y = n · X are S2 = n(n − 1) n(N + 1) n(N + 1)(N − n) Also. . This question has been intentionally omitted for this 17. 
Use MINITAB or some other statistical computer  N program to generate 20 samples of size 10 each from the c2i uniform density function f (x) = 1. written as 2 ⎛ ⎞ ⎛ ⎞2 (N + 1)(N − n)  n n (b) the variance of X is 12n . 14. i=1 σ2 = − μ2 (a) Find the mean of each sample and construct a his. 13. . then X 2 has the special gamma distri- bution. 6. . but we shall refrain from denoting the corresponding ran- dom variables by X2 .” where χ is the lowercase Greek letter chi. 18. 11. 13. Find the mean and the variance of the finite popula- a finite population. where X is the capital Greek letter chi. Show that. 2. . . the integers 1. The chi-square distribution is often denoted by “χ 2 distribution. 20. This avoids having to reiterate in each case whether X is a random variable with values x or a random variable with values χ . and this accounts for the important role that the chi-square distribution plays in problems of sampling from normal populations. 12. which is referred to as the chi-square distribution. use this formula to recalculate the variance of the (b) Calculate the mean and the variance of the 20 sample finite population of Exercise 16. show that N+1 19. 14. {c1 . Show that the formula for the sample variance can be (a) the mean of X is . use this formula to calculate the variance of the following sample data on the number of service calls 15. 11. than those of Theorem 6. ν2 . . Proof Using the moment-generating function given previously with ν = 1 and Theorem 7. then X 2 has the chi-square distribution with ν = 1 degree of freedom. . If X has the standard normal distribution. respectively. . 243 . . X2 . X1 has a chi-square distribution with ν1 degrees of freedom. THEOREM 7. then X2 has a chi-square distribution with ν − ν1 degrees of freedom. Xn are independent random variables having standard normal distributions. then  n Y= Xi2 i=1 has the chi-square distribution with ν = n degrees of freedom. . . . and X1 + X2 has a chi-square distribution with ν > ν1 degrees of freedom. then  n Y= Xi i=1 has the chi-square distribution with ν1 + ν2 + · · · + νn degrees of freedom. Two further properties of the chi-square distribution are given in the two theo- rems that follow. If X1 and X2 are independent random variables. More generally. . νn degrees of freedom. we find that 1 MX 2 (t) = (1 − 2t)− 2 i n and it follows the theorem “MY (t) = i=1 MXi (t)” that  n 1 n MY (t) = (1 − 2t)− 2 = (1 − 2t)− 2 i=1 This moment-generating function is readily identified as that of the chi- square distribution with ν = n degrees of freedom. which are given in Theorems 7 through 10. If X1 . THEOREM 8. . THEOREM 10. . and its moment-generating function is given by MX (t) = (1 − 2t)−ν/2 The chi-square distribution has several important mathematical properties. . let us prove the following theorem. THEOREM 9. . Xn are independent random variables having chi-square distributions with ν1 . If X1 . the reader will be asked to prove them in Exercises 20 and 21. Sampling Distributions The mean and the variance of the chi-square distribution with ν degrees of free- dom are ν and 2ν. X2 . ν )=α 244 . Since the chi-square distribution arises in many important applications. . it follows that i=1 n   2  Xi − μ 2 (n − 1)S2 X −μ = + √ σ σ2 σ/ n i=1 With regard to the three terms of this identity. Table V of “Statistical Tables” con- tains values of χα. Now. it follows that the two terms on the right-hand side of the equation are (n − 1)S2 independent.025. .05. 
If X and S2 are the mean and the variance of a random sam- ple of size n from a normal population with the mean μ and the standard deviation σ .01. 0. we begin with the identity  n  n (Xi − μ)2 = (Xi − X)2 + n(X − μ)2 i=1 i=1 which the reader will be asked to verify in Exercise 22. 0. then 2 P(X G χα.95. To prove part 2. then 1.ν 2 is such that if X is a random variable having a chi-square distribution with ν degrees of freedom. 0.ν 2 for α = 0. χα. In addition to the references to proofs of part 1 at the end of this chapter. 0.99. on the following theorem. since X and S2 are assumed to be independent. Foremost are those based. 30.975. Now. .995. THEOREM 11. . if we divide n each term by σ 2 and substitute (n − 1)S2 for (Xi − X)2 . 2. (n − 1)S2 2. we know from The- orem 8 that the one on the left-hand side of the equation is a random variable having a chi-square distribution with n degrees of freedom. Also. the second term on the right-hand side of the equation is a random variable having a chi-square distribution with 1 degree of freedom. X and S2 are independent. Exercise 31 outlines the major steps of a somewhat simpler proof based on the idea of a conditional moment-generating function. That is. directly or indirectly. according to Theorems 4 and 7. and we conclude that is a random variable having σ2 a chi-square distribution with n − 1 degrees of freedom. 0. and ν = 1. where χα. Proof Since a detailed proof of part 1 would go beyond the scope of this chapter we shall assume the independence of X and S2 in our proof of part 2. Sampling Distributions The chi-square distribution has many important applications.ν 2 is such that the area to its right under the chi-square curve with ν degrees of freedom (see Figure 1) is equal to α. 0.005. the random variable has a chi-square distribution with n − σ2 1 degrees of freedom. and in Exercise 30 the reader will be asked to prove the independence of X and S2 for the special case where n = 2. integrals of its density have been extensively tabulated. 0. 60)2 exceeds 36. To keep a check on the process. n X −μ √ σ/ n 245 . Chi-square distribution. What can one conclude about the process if the standard deviation of such a periodic random sample is s = 0.191. and it is regarded to be “out of control” if the probability that S2 will take on a value greater than or equal to the observed sample value is 0.19 2 = 36.24 σ 2 (0. 5 The t Distribution In Theorem 4 we showed that for random samples from a normal population with the mean μ and the variance σ 2 .84 thousandth of an inch? Solution (n − 1)s2 The process will be declared “out of control” if with n = 20 and σ = 0.01.60 σ2 exceeds χ0. EXAMPLE 2 Suppose that the thickness of a part used in a semiconductor is its critical dimension and that the process of manufacturing these parts is considered to be under control if the true variation among the thicknesses of the parts is given by a standard deviation not greater than σ = 0. When ν is greater than 30. it is assumed here that the sample may be regarded as a random sample from a normal population.60 thousandth of an inch.60). in other words. Of course. v Figure 1. as in Exercise 25 or 28.191. random samples of size n = 20 are taken periodically. the process is declared out of control. Since (n − 1)s2 19(0. the random variable X has a normal distribution σ2 with the mean μ and the variance .84)2 = = 37. 
Table V of “Statistical Tables” cannot be used and prob- abilities related to chi-square distributions are usually approximated with normal distributions. Sampling Distributions a x2 0 x2a.01 or less (even though σ = 0. If Y and Z are independent random variables. z) = √ e− 2 z ·   y 2 −1 e− 2 2π ν ν  22 2 for y > 0 and −q < z < q. we solve t = √ for z. Y has a chi- square distribution with ν degrees of freedom. Thus. the theory that X −μ follows leads to the exact distribution of √ for random samples from normal S/ n populations. THEOREM 12. the joint density of Y and T is given by ⭸t ⎧   ⎪ ⎪ y t2 ν−1 − 2 1+ ν ⎪ ⎪ 1 ⎨   y 2 e for y > 0 and − q < t < q √ ν ν g(y. Then. This makes it necessary to replace σ with an estimate. getting z = t y/ν y/ν ⭸z " and hence = y/ν. This is an important result. usually with the value of the sample standard deviation S. their joint probability density is given by 1 1 2 1 ν y f (y. we 2 ν finally get   ν +1 − ν+1  2 t2 2 f (t) =   · 1+ for −q < t < q √ ν ν π ν 2 246 . and f (y. Sampling Distributions has the standard normal distribution. Proof Since Y and Z are independent. to use the z √ change-of-variable technique. To derive this sampling distribution. then the distribution of Z T=√ Y/ν is given by   ν +1 − ν+1  2 t2 2 f (t) =   · 1+ for −q < t < q √ ν ν π ν 2 and it is called the t distribution with ν degrees of freedom. z) = 0 elsewhere. and Z has the standard normal distribution. Thus. t) = 2π ν 2 2 ⎪ ⎪ 2 ⎪ ⎪ ⎩0 elsewhere y t2 and. integrating out y with the aid of the substitution w = 1+ . let us first study the more general situation treated in the following theorem. but the major dif- ficulty in applying it is that in most realistic applications the population standard deviation σ is unknown. 1) f (t. 2. 0.ν .ν = −tα. who published his scientific writings under the pen name “Student. Comparison of t distributions and standard normal distribution. When ν is 30 or more. Thus.ν ) = α The table does not contain values of tα. for example. t distribution. . Sampling Distributions The t distribution was introduced originally by W. . a brewery. Table IV of “Statistical Tables”.ν is such that the area to its right under the curve of the t distribution with ν degrees of freedom (see Figure 3) is equal to α.” since the company for which he worked. tα. the t distribution has been tabulated extensively. In fact. . where tα.ν for α > 0.005 and ν = 1. then P(T G tα. probabilities related to the t distribution are usually approximated with the use of normal distributions (see Exercise 35).01. 247 .50. Among the many applications of the t distribution.025. or Student’s t distribution. but have larger variances.05. graphs of t distributions having different numbers of degrees of free- dom resemble that of the standard normal distribution. the t distribution is also known as the Student t distribution. That is. its major application (for which it was originally developed) is based on the following theorem. In view of its importance. S. n (0. the t distribution approaches the standard normal distribution. .10. since the density is symmetrical about t = 0 and hence t1−α. 0. 0.ν is such that if T is a random variable having a t distribution with ν degrees of freedom. a t 0 ta. for large values of υ. did not permit publication by employees. 0. 2) −2 −1 0 1 2 Figure 2. Gosset. contains values of tα. v Figure 3.ν for α = 0. 29. As shown in Figure 2. 10) f (t. 1 gallons.0 t= √ = √ = 8. 
it was studied as the sampling distribution of the ratio of two independent random variables with chi-square distributions. used in conjunction with experimental designs. Test the claim that the average gasoline consumption of this engine is 12. Sampling Distributions THEOREM 13. and this is how we shall present it here.1/ 16 Since Table IV of “Statistical Tables” shows that for ν = 15 the probability of getting a value of T greater than 2.0. μ = 12. Thus. each divided by its respective degrees of freedom. Since they are also independent by part 1 of Theorem 11. EXAMPLE 3 In 16 one-hour test runs.4 gallons with a standard deviation of 2. 6 The F Distribution Another distribution that plays an important role in connection with sampling from normal populations is the F distribution. the gasoline consumption of an engine averaged 16. 248 . then X −μ T= √ S/ n has the t distribution with n − 1 degrees of freedom. we get x−μ 16. the probability of getting a value greater than 8 must be negligible.4 − 12. Fisher’s F distribution is used to draw statistical inferences about the ratio of two sample variances. Fisher. named after Sir Ronald A.947 is 0.38 s/ n 2. a chi-square distribution with n − 1 degrees of freedom and the standard normal distribution. one of the most prominent statisticians of the last century. Proof By Theorems 11 and 4. the random variables (n − 1)S2 X −μ Y= and Z= √ σ2 σ/ n have. it would seem reasonable to conclude that the true average hourly gasoline consumption of the engine exceeds 12. respectively. substitution into the formula for T of Theo- rem 12 yields X −μ √ σ/ n X −μ T=" = √ S2 /σ 2 S/ n and this completes the proof. it plays a key role in the analysis of variance. and s = 2. x = 16. As such.4.005.1 into the formula for t in Theorem 13. Originally. If X and S2 are the mean and the variance of a random sam- ple of size n from a normal population with the mean μ and the variance σ 2 .0 gallons.0 gallons per hour. Solution Substituting n = 16. Sampling Distributions THEOREM 14. and f (u. we solve u/ν1 f = v/ν2 ν1 ⭸u ν1 for u. v) = 0 elsewhere. Then. to use the change-of- variable technique. the joint density of U and V is given by 1 ν1 u 1 ν2 v f (u. that is. we finally get 2 ν2   ν1 + ν2    ν1   1 2 ν1 2 ν1 −1 ν1 − 2 (ν1 +ν2 ) g(f ) =     ·f 2 1+ f ν1 ν2 ν2 ν2   2 2 for f > 0. getting u = · vf and hence = · v. Proof By virtue of independence. v) = ν ν 2(ν1 +ν2 )/2  1 2  2 2 for f > 0 and v > 0. then U/ν1 F= V/ν2 is a random variable having an F distribution. If U and V are independent random variables having chi-square distributions with ν1 and ν2 degrees of freedom. and g(f ) = 0 elsewhere. integrating out v by v ν1 f making the substitution w = + 1 . 249 . a random variable whose probability density is given by   ν1 + ν2    ν1   1 2 ν1 2 ν1 −1 ν1 − 2 (ν1 +ν2 ) g(f ) =     · f 2 1+ f ν1 ν2 ν2 ν2   2 2 for f > 0 and g(f ) = 0 elsewhere. v)  = 0 elsewhere. Thus. and g(f . the joint density ν2 ⭸f ν2 of F and V is given by  ν1 /2 ν1   v ν1 f ν2 ν ν +ν − 2 ν2 +1     · f 2 −1 v 2 −1 e 1 1 2 g(f .  Now. v) =   · u 2 −1 e− 2 ·   · v 2 −1 e− 2 ν ν 2ν1 /2  2ν2 /2  1 2 2 2 1 ν1 ν2 μ+v =     · u 2 −1 v 2 −1 e− 2 ν ν 2(ν1 +ν2 )/2  1 2  2 2 for u > 0 and v > 0. 05 and 0. for example.ν2 is such that P(F G fα. In view of its importance. then S2 /σ 2 σ 2 S2 F = 12 12 = 22 12 S2 /σ2 σ1 S2 is a random variable having an F distribution with n1 − 1 and n2 − 1 degrees of freedom. THEOREM 15. Table VI of “Statistical Tables”.ν1 . fα. That is. 
in problems σ2 in which we want to estimate the ratio 12 or perhaps to test whether σ12 = σ22 .ν2 for α = 0.ν1 . the F distribution has been tabulated extensively. If S21 and S22 are the variances of independent random samples of sizes n1 and n2 from normal populations with the variances σ12 and σ22 .ν1 . according to which (n1 − 1)s21 (n2 − 1)s22 χ12 = and χ22 = σ12 σ22 are values of random variables having chi-square distributions with n1 − 1 and n2 − 1 degrees of freedom. The F distribution is also known as the variance-ratio distribution. By “independent random samples. F distribution.ν1 . where fα. 250 . v1. We σ2 base such inferences on independent random samples of sizes n1 and n2 from the two populations and Theorem 11. v2 Figure 4. Sampling Distributions a f 0 fa. for instance.” we mean that the n1 + n2 random variables constituting the two random samples are all independent. contains values of fα. so that the two chi-square random variables are independent and the substitution of their values for U and V in Theorem 14 yields the following result.ν2 is such that the area to its right under the curve of the F distribution with ν1 and ν2 degrees of freedom (see Figure 4) is equal to α.ν2 ) = α Applications of Theorem 14 arise in problems in which we are interested in com- paring the variances σ12 and σ22 of two normal populations.01 and for various values of ν1 and ν2 . 32. . . Xn constitute a random sample from a normal 2/n population with the mean μ and the variance σ 2 . . g(x2 . it also shows that has a chi-square be approximated with the standard normal distribution. Find the percentage errors of the approximations of ences at the end of this chapter. X2 . . X2 . Prove Theorem 10. Xn . 30. for random samples of size n from a normal population with the variance σ 2 . . . the distribution of (b) find the joint density of X. use this method of approximation to rework Exer. .04596. Xn are independent ran. . . X3 = as n→q is the standard normal distribution. 2 −(x2 +u2 ) g(u. Use Theorem 11 to show that. . (Proof of the independence of X and S2 for n = 2) If 21. ·t # 1 E ⎣e σ 2 #x⎦ = (1 − 2t)− 2 n−1 for t < 2ν 2 2ν # 2 28. . . Xn = xn . 251 . x) = ·e the sampling distribution of S2 has the mean σ 2 and the π 2σ 4 variance . Prove Theorem 9. . . cise 26. and show that tribution. n−1 √ 1 − (n−1)s 2 26. . and then set X1 = nX − X2 − · · · − Xn 25. 2 S for random samples from any population with finite second and fourth moments may be found in the book (c) S2 = 2(X1 − X)2 = 2U 2 . (b) the joint density of U = |X1 − X| and X is given by 23. since f (x1 . x3 . . is listed among the refer- 29. . Exercises 26 and 28. distribution with n − 1 degrees of freedom. Cramér listed among the references at the end of (d) the joint density of X and S2 is given by this chapter. Xn by multiply- X −ν ing the conditional density of X obtained in part (a) by √ can be approximated with the standard normal dis- 2ν the joint density of X2 . x) = √ e−x · √ (s2 )− 2 e− 2 s 2 24. . (Proof of the independence of X and S2 ) If −1 Z= " n X1 . it follows that X and S2 are dom. xn |x) = n √ e 2σ 2 mate value of the probability that a random variable hav. x3 . . (c) show that the conditional moment-generating func- 27. . σ 2π ing a chi-square distribution with ν = 50 will take on a value greater than 68. Yn 31. x) = ·e e i=1 i=1 π which we used in the proof of Theorem 11. by H. . Sampling Distributions Exercises 20. If the range of X is the set of all positive √ real num- √ (n − 1)S2 bers. This proof. 
This question has been intentionally omitted for this edition. X2 . . for −q < x1 < q and −q < x < q. . X3 . n. show that for k > 0 the probability that 2X − 2ν tion of given X = x is σ2 will take on a value less than k equals the probability that ⎡ # ⎤ X −ν k2 (n−1)S2 # √ will take on a value less than k + √ . show that 22. σ2 Also. for −q < xi < q. x) is symmetrical n−1 about x for fixed x. Verify the identity (a) the joint density of X1 and X is given by  n  n 1 −x−2 −(x1 −x)2 (Xi − μ)2 = (Xi − X)2 + n(X − μ)2 f (x1 . X1 and X2 are independent random variables having the standard normal distribution. . 3. X3 = x3 . . ν degrees of freedom and ν is large.0. . due to J. π 2π dom variables having the chi-square distribution with ν = 1 and Yn = X1 + X2 + · · · + Xn .) 1 1 1 1 2 h(s2 . . Use the method of Exercise 25 to find the approxi. . then the limiting for s2 > 0 and −q < x < q. . Show that if X1 . demonstrating that X and S2 distribution of are independent. show that if X is and use the transformation technique to find the condi- a random variable having a chi-square distribution with tional density of X given X2 = x2 . . i = 2. Use the results of Exercises 25 and 27 to show that if X has a chi-square distribution with ν √ degrees√of free. . Xn = xn . (a) find the conditional density of X1 given X2 = x2 . Since this result is free of x. Based on the result of Exercise 24. then for large ν the distribution of 2X − 2ν can (n − 1)S2 independent. given that the actual value of the probability (rounded to five decimals) is 0. . (A general formula for the variance of for u > 0 and −q < x < q. X3 . Shuster. 42. Verify that if Y has a beta distribution with α = 35. However. Show that for the t distribution with ν > 4 degrees freedom. In an effort to deal with the problem of small samples in cases where it may be unreasonable to assume a normal population. edition. . the next largest after that as a 252 .ν2 . Show that the F distribution with 4 and 4 degrees of ν2 freedom is given by is . .ν2 = ν −4 fα. show that Y = has an F distribution with ν2 X of freedom and ν1 degrees of freedom. and β = . V ν2 − 2 39. small samples sometimes must be used in practice. Show that for ν > 2 the variance of the t distribution 40. This question has been intentionally omitted for this has an F distribution with ν1 and ν2 degrees of freedom. and Xn according to size. If we look upon the smallest of the x’s as a value of the random variable Y1 . then X = T 2 has an F distribution with ν1 = 1 with ν degrees of freedom is .ν1 . This question has been intentionally omitted for this ν2 2 edition. By what name did we refer to the t distribution with ν2 Y X= ν = 1 degree of freedom? ν1 (1 − Y) 37. f1−α. If X has an F distribution with ν1 and ν2 degrees of ν u 1 34.. 7 Order Statistics The sampling distributions presented so far in this chapter depend on the assumption that the population from which the sample was taken has the normal distribution. Verify that if X has an F distribution with ν1 and and use this density to find the probability that for inde- ν2 degrees of freedom and ν2 → q. This assumption often is satisfied.) ν u ν1 43. . for example in statistical quality control or where taking and measuring a sample is very expensive. making use of the definition of F in Theo- ν2 − 2 rem 14 and the fact that for a random variable V having $ 6f (1 + f )−4 for f > 0 the chi-square  distribution with ν2 degrees of freedom. 
the distribution of pendent random samples of size n = 5 from normal pop- Y = ν1 X approaches the chi-square distribution with ν1 ulations with the same variance. (Hint: Make the sub- ν −2 and ν2 = ν degrees of freedom. less than 12 or greater than 2. 38. and suppose that we arrange the values of X1 . Show that for ν2 > 2 the mean of the F distribution 44. S21 /S22 will take on a value degrees of freedom. g(f ) = 1 1 0 elsewhere E = . X2 .ν1 t2 1 (Hint: Make the substitution 1 + = . Statistical inferences based upon such statistics are called nonparametric inference. We will identify a class of nonparametric statistics called order statistics and discuss their statistical properties. the next largest as a value of the random variable Y2 .) 41. Use the result of Exercise 41 to show that (ν − 2)(ν − 4) 6 1 (b) α4 = 3 + . statisti- cians have developed nonparametric statistics. Consider a random sample of size n from an infinite population with a continu- ous density. at least approximately for large samples. Sampling Distributions 33. Verify that if T has a t distribution with ν degrees of ν freedom. as illus- trated by the central limit theorem. 3ν 2 (a) μ4 = . t 2 1 stitution 1 + = . whose sampling distributions do not depend upon any assumptions about the population from which the sample is taken. then 2 36. consider the case where n = 2 and the relationship between the values of the X’s and the Y’s is y1 = x1 and y2 = x2 when x1 < x2 y1 = x2 and y2 = x1 when x2 < x1 Similarly. 2. y2 = x2 .. the probability that r − 1 of the sample values fall into the first interval... we refer to these Y’s as order statistics. when x1 < x3 < x2 ........ Using the mean- value theorem for integrals from calculus. Y1 is the first order statistic. For random samples of size n from an infinite population that has the value f (x) at x. and y3 = x2 .... THEOREM 16.. (We are limiting this discussion to infinite populations with continuous densities so that there is zero probability that any two of the x’s will be alike. In particular. Proof Suppose that the real axis is divided into three intervals... y2 = x2 .. and y3 = x1 .) To be more explicit.. . . n. and the largest as a value of the random vari- able Yn .. one from −q to yr . 1 falls into the second interval. we have % yr +h f (x) dx = f (ξ ) · h where yr F ξ F yr + h yr 253 . . and n − r fall into the third interval is % r−1 %  % n−r n! yr yr +h q f (x) dx f (x) dx f (x) dx (r − 1)!1!(n − r)! −q yr yr +h according to the formula for the multinomial distribution... . . the probability density of the r th order statistic Yr is given by % r−1 % n−r n! yr q gr (yr ) = f (x) dx f (yr ) f (x) dx (r − 1)!(n − r)! −q yr for −q < yr < q.. a second from yr to yr + h (where h is a positive constant)... . and the third from yr + h to q.. Sampling Distributions value of the random variable Y3 .. y1 = x3 .. y2 = x3 . for n = 3 the relationship between the values of the respective random variables is y1 = x1 . Y3 is the third order statistic... and so on.... and y3 = x3 ..... Y2 is the second order statistic..... when x3 < x2 < x1 Let us now derive a formula for the probability density of the rth order statistic for r = 1. Since the population we are sampling has the value f (x) at x. ... when x1 < x2 < x3 y1 = x1 .. the sampling distributions of Y1 and Yn are given by ⎧ ⎪ n ⎨ · e−ny1 /θ for y1 > 0 g1 (y1 ) = θ ⎪ ⎩0 elsewhere and ⎧ ⎪ n ⎨ · e−yn /θ [1 − e−yn /θ ]n−1 for yn > 0 gn (yn ) = θ ⎪ ⎩0 elsewhere and that. 
the sampling distribution of the median is given by ⎧ ⎪ ⎪ (2m + 1)! −x̃(m+1)/θ ⎨ ·e [1 − e−x̃/θ ]m for x̃ > 0 h(x̃) = m!m!θ ⎪ ⎪ ⎩0 elsewhere 254 . EXAMPLE 4 Show that for random samples of size n from an exponential population with the parameter θ . the largest value in a random sample of size n. is given by % n−1 yn gn (yn ) = n · f (yn ) f (x) dx for −q < yn < q −q Also. the median is defined as 12 (Ym + Ym+1 ). In particular. for random samples of size n = 2m + 1 from this kind of population. Sampling Distributions and if we let h → 0.] In some instances it is possible to perform the integrations required to obtain the densities of the various order statistics. in a random sample of size n = 2m + 1 the sample median X̃ is Ym+1 . is given by % n−1 q g1 (y1 ) = n · f (y1 ) f (x) dx for −q < y1 < q y1 while the sampling distribution of Yn . whose sampling distribution is given by % m % q m (2m + 1)! x̃ h(x̃) = f (x) dx f (x̃) f (x) dx for − q < x̃ < q m!m! −q x̃ [For random samples of size n = 2m. the smallest value in a random sample of size n. we finally get % r−1 % n−r n! yr q gr (yr ) = f (x) dx f (yr ) f (x) dx (r − 1)!(n − r)! −q yr for −q < yr < q for the probability density of the rth order statistic. for other populations there may be no choice but to approximate these integrals by using numerical methods. the sampling distribution of Y1 . the sampling distribution of the median for ran- dom samples of size 2n + 1 is approximately normal with the mean μ̃ and 1 the variance . Verify the results of Example 4. and X̃ shown there for random (Hint: Enumerate all possibilities. consists of the first five positive integers. dom samples of size 2m + 1 from the population of Exer- cise 49. Use the formula for the joint density of Y1 and Yn 51. Yn for random samples of size n from an exponen- tial population. THEOREM 17. which holds when the population density is continuous and nonzero at the & μ̃ population median μ̃. rem 16 to show that the joint density of Y1 and Yn is dom samples of size n from a continuous uniform popu. If we compare this with the 4n variance of the mean. The following is an interesting result about the sampling distribution of the median. Duplicate the method used in the proof of Theo- 46. g(y1 . 54. find the 50. Find the sampling distribution of the median for ran. With reference to part (b) of Exercise 52. 8[f (μ̃)]2 n Note that for random samples of size 2n + 1 from a normal population we have μ = μ̃. Yn . we find that for large samples from normal populations the mean 2n + 1 is more reliable than the median. so 1 f (μ̃) = f (μ) = √ σ 2π πσ2 and the variance of the median is approximately . 49. the mean is subject to smaller chance fluc- tuations than the median. the sampling (b) with replacement from the same population. (b) Use this result to find the joint density of Y1 and Yn dom samples of size n from a population having the beta for the population of Exercise 46. Find the sampling distribution of Y1 for random sam. For large n. 53. covariance of Y1 and Yn . that is. Find the sampling distributions of Y1 and Yn for ran. Sampling Distributions Solution The integrations required to obtain these results are straightforward. Find the sampling distribution of the median for ran. and g(y1 . 52. % n−2 yn 47. given by lation with α = 0 and β = 1. yn ) = n(n − 1)f (y1 )f (yn ) f (x) dx y1 dom samples of size 2m + 1 from the population of Exer- for −q < y1 < yn < q cise 46. 48. 255 .) samples from an exponential population. 
Find the sampling distributions of Y1 and Yn for ran. distributions of Y1 . which is such that −q f (x) dx = 12 . and they will be left to the reader in Exercise 45. yn ) = 0 elsewhere. that is. which for random samples of size 2n + 1 from an infinite pop- σ2 ulation is . Exercises 45. shown in Exercise 52 and the transformation technique of ples of size n = 2 taken several variables to find an expression for the joint den- (a) without replacement from the finite population that sity of Y1 and the sample range R = Yn − Y1 . distribution with α = 3 and β = 2. Find the mean and the variance of the sampling dis- tribution of Y1 for random samples of size n from the (a) Use this result to find the joint density of Y1 and population of Exercise 46. % yn n−1 2(n − 1) p= f (x) dx E(P) = and var(P) = y1 n+1 (n + 1)2 (n + 2) is What can we conclude from this about the distribution of h(y1 . Thus. The following steps lead to 0 elsewhere the sampling distribution of the statistic P. such as a set of ran- dom numbers generated by a computer. whose values are given by % y1 56. Use the result of Exercise 54 to find the sampling dis- tribution of R for random samples of size n from the w= f (x) dx −q continuous uniform population of Exercise 46. Sampling Distributions 55. and ϕ(w. There are many other methods. Use the result of Exercise 54 and that of part (a) of (b) Use the result of part (a) and the transformation tech- Exercise 52 to find the sampling distribution of R for ran. There are many problems. that can be used to aid in selecting a random sample. for the ran- and P. g(p) = its are called tolerance limits. of a random sample of size n. which is the proportion of a population (having a continuous density) This is the desired density of the proportion of the popu- that lies between the smallest and the largest values of a lation that lies between the smallest and the largest values random sample of size n. to choose the containers. systematic sampling can be used to select units at evenly spaced periods of time or having evenly spaced run numbers. of Exercise 46. the production-line sample should not be taken from a “chunk” of products produced at 256 . Thus. if a sample of product is meant to represent an entire production line. but it always requires care. there are several methods that can be employed to assure that a sample is close enough to random- ness to be useful in representing the distribution from which it came. Care should be taken to assure independence of the observations. p > 0. employing mechanical devices or computer-generated random numbers. and it is of interest to note (a) Use the formula for the joint density of Y1 and Yn that it does not depend on the form of the population shown in Exercise 52 and the transformation technique distribution. In selecting a random sample from products in a warehouse. numbering the containers and using a random device. in which we are interested in the proportion $ n(n − 1)pn−2 (1 − p) for 0 < p < 1 of a population that lies between certain limits. particularly in industrial applications. Then. w + p < 1. a second set of random numbers can be used to select the unit or units in each container to be included in the sample. it should not be taken from the first shift only. p) = 0 elsewhere. Use the result of Exercise 58 to show that. p) = n(n − 1)f (y1 )pn−2 P when n is large? 8 The Theory in Practice More on Random Samples While it is practically impossible to take a purely random sample. a two-stage sampling process can be used. Such lim. 
Selection of a sample that reasonably can be regarded as random sometimes requires ingenuity. p) = n(n − 1)pn−2 the variance of the sampling distribution of R for random samples of size n from the continuous uniform population for w > 0. is 57. (c) Use the result of part (b) to show that the marginal density of P is given by 58. In selecting a sample from a production line. of several variables to show that the joint density of Y1 59. Care should be taken to assure that only the specified distribution is represented. nique of several variables to show that the joint density of dom samples of size n from an exponential population. Use the result of Exercise 56 to find the mean and ϕ(w. whose values are given by dom variable P defined there. P and W. 257 . Sampling Distributions about the same time. It is this conclusion that underscores the importance of the chi-square. the variance of the sum of the random errors is nσ 2 . are additive. Whenever possible. however. at least to a good approxi- mation. In fact this is usually the case. Random error. Let us assume. and bias. such as not obtaining a representative sample in a survey. that the many individual sources of random error. we are assuming that the random errors have a mean of zero. parallax (not viewing readings straight on) errors in setting up apparatus. The central limit theorem given by Theorem 3 allows us to conclude that X −μ Z= √ σ n is a random variable whose distribution as n → q is the standard normal distribution. The Assumption of Normality It is not unusual to expect that errors are made in taking and recording observa- tions. minor changes in ambient conditions. En . known or unknown. and the Ei are the n random errors that affect the value of the observation. Bias occurs when there is a relatively consistent error. they represent the same set of conditions and settings. often unconscious. where E is the sample mean of the errors E1 . among these imperfections are imprecise markings on measurement scales. It also demonstrates why the normal distribution is of major importance in statistics. Observational error can arise from one or both of two sources. and so forth. μ is the “true” value of the obser- vation. We also can write var(X) = (μ + E1 + E2 + · · · + En ) = nσ 2 In other words. Human judgment in select- ing samples usually includes personal bias. random error. as no human endeavor can be made perfect in applications. Random errors occur as the result of many imper- fections of measurement. t. It is not difficult to see from this argument that most repeated measurements of the same phenomenon are. using a measuring instrument that is not properly calibrated. E2 . normally distributed. . at least approximately. . Errors involving bias can be corrected by discerning the source of the error and making appropriate “fixes” to eliminate the bias. however. at least in the long run. slight differences in materials. and the resulting observations are closely related to each other. . This phenomenon was described by early nineteenth-century astronomers who noted that different observers obtained somewhat different results when determin- ing the location of a star. and F distributions in applications that are based on the assumption of data from normally distributed populations. the use of mechanical devices or random numbers is preferable to methods involving personal choice. We shall assume that E(X) = μ + E(E1 ) + E(E2 ) + · · · + E(En ) = μ In other words. or statistical error. and such judgments should be avoided. 
Then we can write X = μ + E1 + E2 + · · · + En where the random variable X is an observed value. and σ 2 X = σ 2 /n. It follows that X = μ + E. is some- thing we must live with. and recording errors. . expansion and contraction. ) N−n tor for 73. 1–3 In the following exercises it is assumed that all samples are 68. Independent random samples of sizes 400 are taken (a) increased from 30 to 120. A random sample of size 100 is taken from a normal population with σ = 25. using the result of Exercise 4 and and 129. The actual proportion of men who favor a certain 67.64 and 0.4 and σ = 6. How many different samples of size n = 3 can be of the sample will exceed 4. Using Cheby- (c) decreased from 450 to 50. rather than rent.8. What is the prob- finite population of size N = 22? ability that the mean of the sample will (a) exceed 52. standard deviations σ1 = 20 and σ2 = 30. Based on the cen- tral limit theorem.70.5? drawn from a finite population of size (a) N = 12. what can we (d) decreased from 250 to 40? assert with a probability of at least 0. n1 = 500 men and n2 = 400 258 . Sampling Distributions Applied Exercises SECS. that the samples satisfy the conditions of Exercise 2.70. A random sample of size 64 is taken from a normal (b) a random sample of size n = 5 is to be drawn from a population with μ = 51. Independent random samples of sizes n1 = 30 and 65. If a random sample of size n = 3 is drawn from a finite (b) fall between 50. ing the means μ1 = 78 and μ2 = 75 and the variances ance σ 2 = 256. an exponential population with θ = 4. their home is 0.9. for women is 0. σ12 = 150 and σ22 = 200. population of size N = 50. normal and use the result of Exercise 3 to find k such that (b) n = 50 and N = 300. Use the results of Exercise 3 to (a) Based on Chebyshev’s theorem. Based on 61. A random sample of size n = 81 is taken from an infi. what is the probability that a (c) be less than 50. 62. (b) the central limit theorem? (b) the central limit theorem? 76.25. The actual proportion of families in a certain city who (b) Based on the central limit theorem. find the probability that the mean of the first sample will ity can we assert that the value we obtain for X will fall exceed that of the second sample by at least 4. Find the value of the finite population correction fac.5 and 52. what mean of the sample will differ from the mean of the pop- happens to the standard error of the mean if the sample ulation by 3 or more either way? size is 72. shev’s theorem and the result of Exercise 2. For random samples from an infinite population.6 between 0. What is the probability that the 63. What is the probability of each possible sample if the central limit theorem. 70. (a) Chebyshev’s theorem. what is the probability that the (a) a random sample of size n = 4 is to be drawn from a mean of the sample will be less than 35? finite population of size N = 12. (c) N = 50? 69. P(−k < X 1 − X 2 < k) = 0. Rework part (b) of Exercise 66. between 67 and 83? 75. own. with what proba. A random sample of size n = 225 is to be taken from drawn without replacement unless otherwise specified. (b) N = 20. assuming that the tax proposal is 0.8. ing identical Bernoulli distributions with the parameter nite population with the mean μ = 128 and the standard θ = 0. with what probabil. 74. from each of two populations having equal means and the (b) increased from 80 to 180. what is the probability that the mean 60.3.76.99 (c) n = 200 and N = 800. 
Assume that the two populations of Exercise 72 are N −1 (a) n = 5 and N = 200.4 if we use (a) Chebyshev’s theorem.99 about the value we will get for X 1 − X 2 ? (By “independent” we mean 64. A random sample of size n = 100 is taken from an n2 = 50 are taken from two normal populations hav- infinite population with the mean μ = 75 and the vari.40 and the corresponding proportion population is not infinite but finite and of size N = 400. A random sample of size n = 200 is to be taken from a uniform population with α = 24 and β = 48.6? particular element of the population will be included in the sample? 71.3. If 84 families in bility can we assert that the value we obtain for X will fall this city are interviewed at random and their responses to between 67 and 83? the question of whether they own their home are looked upon as values of independent random variables hav- 66. With what probability can we assert value we obtain for the sample proportion  ˆ will fall that the value we obtain for X will not fall between 126. with what probability can we assert that the deviation σ = 6. 75. 4–6 89. . ple with the use of a table of random digits. If S21 and S22 are the variances of independent random why a sample that includes only the top can in each stack samples of sizes n1 = 10 and n2 = 15 from normal popu. are sampled to determine the proportion of damaged cans. sample of size 16 exceeds 54.16). An inspector chooses a sample of parts coming from 84. Explain 83. Cans of food. can where χα. 17. the largest (In Exercises 78 through 83. (b) Of what population can this be considered to be a ran- dom sample? 86.01 in Table VI of “Statistical Tables” corresponding to random variables having Bernoulli distributions with the ν1 = 15 and ν2 = 12. but it can be rem 13.05 in Table VI of “Statistical Tables” corresponding to 95. Use this method to find an approximate solution the mean of the population is μ = 28. refer to Tables IV. Find the probability that in a random sample of size n = 3 from the beta population of Exercise 77. 13.03).4 2 must be looked up in Table V of “Statistical we say that the given information supports the claim that Tables”. and in Table IV of “Statistical Tables” corresponding to 11 then including 10 percent of the “good” parts in the sam- degrees of freedom.668 or is less than 12.90 and α = 0. can we say that the given information supports shown that an approximate solution for n is given by the conjecture that the mean of the population is μ = 42? 1 1 1+p 2 81. Sampling Distributions women are interviewed at random. 7 and 6 to 10 degrees of freedom. and their individual 87. V. according to Chebyshev’s theorem. 94. + · ·χ ulation has the mean x = 27. 2 4 1 − p α. would not be a random sample. 8 P(S21 /S22 > 1. 1 (0. . the smallest value will be at least 0. respective parameters θ1 = 0.”) 90. If we base our decision on the statistic of Theo- This kind of equation is difficult to solve. and VI value will be less than 0. 7 a probability of at least 0. are lined up 259 .90)n−1 = ulation has the mean x = 47 and the standard deviation 2n + 18 s = 7. If S1 and S2 are the standard deviations of indepen- dent random samples of sizes n1 = 61 and n2 = 31 from normal populations with σ12 = 12 and σ22 = 18. Find the probability that in a random sample of size proportions of favorable responses? Use the result of n = 4 from the continuous uniform population of Exer- Exercise 5.8 and the variance s2 = 3. Show that ple of size 9 exceeds 7. A random sample of size n = 12 from a normal pop. 
The claim that the variance of a normal population is is required to be able to assert with probability 1 − α that the proportion of the population contained between the σ 2 = 4 is to be rejected if the variance of a random sam- smallest and largest sample values is at least p. . Use the result of Exercise 56 to find the probability 77. cise 46. SECS. 85. A random sample of size n = 25 from a normal pop.9375 about the value we will get for  ˆ 1 −ˆ 2 .7535.20. What is the probability that this claim will be rejected 92.05 this equation can be written as this claim will be rejected even though σ 2 = 4? 80. Integrate the appropriate chi-square density to find that the range of a random sample of size n = 5 from the the probability that the variance of a random sample of given uniform population will be at least 0.24. Use the result of part (c) of Exercise 58 to find the probability that in a random sample of size n = 10 at least 78.4 If we base our decision on the statistic of Theorem 13. lations with equal variances. used for construction of airplane fuselages.102. of “Statistical Tables. size 5 from a normal population with σ 2 = 25 will fall between 20 and 30. What can we assert.40 and θ2 = 0. Use a computer program to verify the six values of responses are looked upon as the values of independent f0. What is the probability that for p = 0. The claim that the variance of a normal population 80 percent of the population will lie between the smallest is σ 2 = 25 is to be rejected if the variance of a random and largest values.90 and α = 0. 82. find SEC. the difference between the two sample 88. Use the result of part (c) of Exercise 58 to set up an even though σ 2 = 25? equation in n whose solution will give the sample size that 79. Use a computer program to verify the five values of f0. find P(S21 /S22 < 4. 91. Use a computer program to verify the eight entries (a) Why does this method not produce a random sample in Table V of “Statistical Tables” corresponding to 21 of the production of the lathe? degrees of freedom. Sections of aluminum sheet metal of various lengths.5? of the equation for p = 0.90.05. Use a computer program to verify the five entries an automated lathe by visually inspecting all parts. 93.25. with SEC. stacked in a warehouse. . A process error may cause the oxide thicknesses on sample is selected by taking whatever section is passing the surface of a silicon wafer to be “wavy. W.: Princeton University Press. (d) It is multiplied by 2. 1. E. 1 . 3rd ed.5. New York: John Wiley & Sons. it is not an accu.6. and Hartley. H. I. 73 4. Mathematical Statistics.63.σ = . “A Simple Method of Teaching the Indepen- fers from S2 only insofar as we divide by n instead of dence of X and S2 . E.. h(x̃) = 0 else. dom samples from normal populations are given in many advanced texts on mathematical statistics.53%. 55 f (R) = e [1 − e−R/θ ]n−2 for R > 0... 1962. (b) 0. the random variables are inde.0207. F. Inc. Vol. Feller. 69 0. are necessary in taking a random sample of oxide thick- rate representation of the population of all aluminum nesses at various positions on the wafer to assure that the sections. 53 .5.851. S. 1968. A 96. n−1 2 2(n − 1) 57 E(R) = . Sampling Distributions on a conveyor belt that moves at a constant speed.J.. may be found in for Statisticians. H. lation.. N. a proof based on moment-generating func- Extensive tables of the normal. I.. 61 (a) 495 77 pendent and identically distributed. 65 (a) 0. 
n − 1) is derived in No.2302.” with a constant in front of a station at five-minute intervals. Introduction to Statistical Inference..347. J.6242.0. Pearson. 1968. n − 1 −R/θ 91 0. Shuster. (2m + 1)! 47 h(x̃) = x̃(1 − x̃)m for 0 < x < 1. 19 s2 = 4. 51 (a) 4 3 2 1 g1 (y1 ) 10 10 10 10 77 0. S. Van Nostrand Co.. y1 1 2 3 4 5 79 0. 1950. 49 g1 (y1 ) = 12ny21 (1 − y1 )(1 − 4y1 )3 . 27. Answers to Odd-Numbered Exercises 11 When we sample with replacement from a finite popu. Inc.. chi-square. (b) 9 g1 (y1 ) 25 7 5 3 1 81 t = −1.3056. Vol.5. 67 0. 63 (a) It is divided by 2.99. New York: John Wiley & Proofs of the independence of X and S2 for ran- Sons. (b) It is divided by 1. For as well as in other advanced texts on probability theory. tions may be found in the above-mentioned book by butions may be found in S.9999994.9% and 5. instance. Princeton. Mathematical Methods of Statistics. we satisfy the conditions for random sampling from n+1 (n + 1)2 (n + 2) an infinite population. are given in Wilks. 1973. S. the Lindeberg–Feller conditions. 1962. (n + 1)2 (n + 2) 89 0. the data support the claim. S. Vol. and a somewhat more elementary proof.. (b) 1 . f (R) = 0 θ elsewhere.96. S. that is. An Introduction to Probability Theory and Its Applications. 25 25 25 25 1 83 0. y1 1 2 3 4 75 (a) 0. that is..” The American Statistician. N. Keeping. New York: John Wiley & Sons. Inc. σ 2 = 25. Explain why difference between the wave heights. Inc. What precautions this sample may not be random.J.216.: D.0250. multiplied by 3. Wilks. m!m! where. O. 260 . 29 21.7698. The proof outlined in Exercise 48 is given in tribution of the second sample moment M2 (which dif.. A general formula for the variance of the sampling dis. Biometrika Tables illustrated for n = 3. (c) It is 17 μ = 13. 71 0. and t distri. Cramér. (b) 0. observations are independent? References Necessary and sufficient conditions for the strongest form and a proof of Theorem 17 is given in of the central limit theorem for independent random variables. Prince- ton. if not impossible. His advisors tell him that if he expands now and economic conditions remain good. Eighth Edition. What should the manufacturer decide to do if he wants to minimize the expected loss during the next fiscal year and he feels that the odds are 2 to 1 that there will be a recession? Solution Schematically.000. Copyright © 2014 by Pearson Education.000. hence. to assign numerical values to the consequences of one’s actions and to the probabilities of all eventualities. gains are represented by negative numbers: † Although the material in this chapter is basic to an understanding of the foundations of statistics. maximize expected sales.000 during the next fiscal year. because it is generally considered rational to select alternatives with the “most promising” mathematical expectations— the ones that maximize expected profits. Inc. Although this approach to decision making has great intuitive appeal. there will be a small profit of $8. From Chapter 9 of John E. there will be a profit of $80. All rights reserved. and so on. if he expands now and there is a recession. there will be a loss of $40. minimize expected losses.000. and if he waits at least another year and there is a recession. 261 . Marylees Miller. it is often omitted in a first course in mathematical statistics. minimize expected costs. all these “payoffs” can be represented as in the following table. Freund’s Mathematical Statistics with Applications. in making decisions. there will be a profit of $164. 
EXAMPLE 1 A manufacturer of leather goods must decide whether to expand his plant capacity now or wait at least another year. if he waits at least another year and economic conditions remain good. Irwin Miller. mathematical expectations are often used as a guide in choos- ing among alternatives. that is. it is not without complications. where the entries are the losses that correspond to the various possibilities and.Decision Theory† 1 Introduction 5 The Minimax Criterion 2 The Theory of Games 6 The Bayes Criterion 3 Statistical Games 7 The Theory in Practice 4 Decision Criteria 1 Introduction In applied situations. for there are many problems in which it is difficult. Since an expected profit (negative expected loss) of $32. is referred to in Exercise 15. and it is only one of many different criteria that can be used in this kind of situation.000 3 3 if he waits at least another year.” is referred to in Exercise 16. he might argue that if he expands his plant capacity now he could lose $40.000) · = −32.000 · + (−8. 2 The Theory of Games The examples of the preceding section may well have given the impression that the manufacturer is playing a game—a game between him and Nature (or call it fate or whatever “controls” whether there will be a recession).000 and.000 There is a recession 40. What should he decide to do if he is a con- firmed pessimist? Solution Being the kind of person who always expects the worst to happen. hence. and another. Each of the “players” has the choice of two moves: The manufacturer has the choice between actions a1 and 262 . Decision Theory Expand now Delay expansion Economic conditions remain good −164.000 We are working with losses here rather than profits to make this example fit the general scheme that we shall present in Sections 2 and 3. As the reader will be asked to show in Exercises 10 and 11. One such criterion. that he will minimize the maximum loss (or maximize the minimum profit) if he waits at least another year. if he delays expan- sion there would be a profit of at least $8. respectively. EXAMPLE 2 With reference to Example 1. based on the fear of “losing out on a good deal. based on optimism rather than pessimism.000.000 3 3 if he expands his plant capacity now. The criterion used in this example is called the minimax criterion.000. and 1 2 −80.000 −80. it follows that the manufacturer should delay expanding the capacity of his plant. Since the probabilities that economic conditions will remain good and that there will be a recession are.000 · + 40.000 −8.000 is preferable to an expected profit (negative expected loss) of $28. the manufacturer’s expected loss for the next fiscal year is 1 2 −164. suppose that the manufacturer has no idea about the odds that there will be a recession. The result at which we arrived in this example assumes that the values given in the table and also the odds for a recession are properly assessed.000 · = −28. changes in these quantities can easily lead to different results. 13 and 23 . and “zero-sum” means that whatever one player loses the other player wins. For instance. but it applies to any kind of competitive situation and. or alternatives) that each player has at his disposal. θ2 ) The amounts L(a1 . θ3 . or fixed. positive payoffs represent losses of Player A and negative payoffs represent losses of Player B. III. that is. and those of Player B are usually labeled 1. The payoffs. . let us begin by explaining what we mean by a zero-sum two-person game. the game is 3 * 4 or 4 * 3. 
such games are generally much more complicated. as can well be imagined. . θ2 ) L(a2 .) Let us also add that it is always assumed in the theory of games that each player must choose a strategy without knowing what the opponent is going to do and that once a player has made a choice it cannot be changed. Although it does not really matter. if each player has to choose one of two alternatives (as in Example 1). and even in terms of life or death (as in Russian roulette or in the conduct of a war). . . Thus. To introduce some of the basic concepts of the theory of games. In this section we shall consider only finite games. . . a rela- tively new branch of mathematics that has stimulated considerable interest in recent years. . The analogy we have drawn here is not really farfetched. if one player has 3 possible moves while the other has 4. 2. but the moves (choices or alternatives) of Player A are usually labeled I. 3. a3 . (As before. the amounts of money or other considerations that change hands when the players choose their respective strategies. θj ) is the loss of Player A (the amount he has to pay Player B) when he chooses alternative ai and Player B chooses alternative θj . as its name might suggest. as the case may be. the theory of games also includes games that are neither zero-sum nor limited to two players. . choices. This theory is not limited to parlor games. . in units of utility (desirability or satisfaction). instead of a1 . . . games in which each player has only a finite. a2 . are usually shown in a table like that on this page. Games are also classified according to the number of strategies (moves. θ1 ). instead of θ1 . we shall assume here that these amounts are in dollars. . . θ1 ). they can also be expressed in terms of any goods or services. as we shall see. Decision Theory a2 (to expand his plant capacity now or to delay expansion for at least a year). . It is customary in the theory of games to refer to the two players as Player A and Player B as we did in the preceding table. in a zero-sum game there is no “cut for the house” as in profes- sional gambling. L(a2 . . two parties with con- flicting interests). In actual practice. In this term. more generally. L(ai . and Nature controls the choice between θ1 and θ2 (whether economic conditions are to remain good or whether there is to be a recession). are referred to as the values of the loss function that characterizes the particular “game”. which is referred to as a payoff matrix in the theory of games. . and no capital is created or destroyed during the course of play. . it has led to a unified approach to solving problems of statistical inference. Of course. number of possible moves. θ1 ) L(a2 . II. “two- person” means that there are two players (or. Depending on the choice of their moves. we say that it is a 2 * 2 game. but later we shall consider also games where each player has infinitely many moves. 263 . in other words. the problem of Example 2 is typical of the kind of situation treated in the theory of games. . . . Exercise 27 is an example of a game that is not zero-sum. θ2 . θ1 ) (Nature) θ2 L(a1 . but. there are the “payoffs” shown in the following table: Player A (The Manufacturer) a1 a2 Player B θ1 L(a1 . which is called the value of the game. but the third strategy of Player A is dominated by each of the other two. PAYOFF MATRIX. since Strategy 2 will yield more than Strategy 1 regardless of the choice made by Player A. 
EXAMPLE 4 Given the 3 * 2 zero-sum two-person game Player A I II III 1 −4 1 7 Player B 2 4 3 5 find the optimum strategies of Players A and B and the value of the game. A payoff in game theory is the amount of money (or other numerical consideration) that changes hands when the players choose their respective strategies. Solution In this game neither strategy of Player B dominates the other. the payoff corresponding to Strategies I and 2. the value of the game. the only one left. and it stands to reason that any strategy that is dominated by another should be discarded. and the Player A’s optimum strategy is Strategy I. In a situation like this we say that Strategy 1 is dominated by Strategy 2 (or that Strategy 2 dominates Strategy 1). we find that Player B’s optimum strategy is Strategy 2. and a loss of $4 or a loss of 264 . strategies that are most profitable to the respective players) and the corresponding payoff. Expressing the units as dollars. EXAMPLE 3 Given the 2 * 2 zero-sum two-person game Player A I II 1 7 −4 Player B 2 8 10 find the optimum strategies of Players A and B and the value of the game. Decision Theory DEFINITION 1. a profit of $4 or a loss of $1 is preferable to a loss of $7. The matrix giving the payoff to a given player for each choice of strategy by both players is called the payoff matrix. since a loss of 8 units is obviously preferable to a loss of 10 units. The objectives of the theory of games are to determine optimum strategies (that is. is 8 units. Positive payoffs represent losses of Player A and negative payoffs represent losses of player B. A strategy is a choice of actions by either player. Solution As can be seen by inspection. Also. it would be foolish for Player B to choose Strategy 1. If we do this here. we can discard the third column of the payoff matrix and study the 2 * 2 game Player A I II 1 −4 1 Player B 2 4 3 where now Strategy 2 of Player B dominates Strategy 1. the worst that can happen is that he loses $2. Player B makes sure that she will actually win this amount. the minimax strategies. which means that the game favors Player B. which is the same) by choosing Strategy 2. as is illustrated by the following 3 * 3 zero-sum two-person game: Player A I II III 1 −1 6 −2 Player B 2 2 4 6 3 −2 −6 12 So. Applying the same kind of argument to select a strategy for Player B. but we could make it equitable by charging Player B $2 for the privilege of playing the game and giving the $2 to Player A. Thus. the optimum choice of Player B is Strategy 2. and if Player B announced publicly that she will choose Strategy 2. and by choosing Strategy 2. Unfortunately. The choice of a minimax strategy to make a decision is called the minimax criterion. The process of discarding dominated strategies can be of great help in the solu- tion of a game (that is. she could minimize the maximum loss (or maximize the minimum gain. we might argue as follows: If he chooses Strategy I. MINIMAX STRATEGY. and if she chooses Strategy 3. the worst that can happen is that she loses $6. Decision Theory $3 is preferable to a loss of $5. Thus $2 is the value of the game. Thus. DEFINITION 2. we must look for other ways of arriving at optimum strategies. he could minimize the maximum loss by choosing Strategy I. 265 . In our example. The selection of Strategies I and 2. A strategy that minimizes the maximum loss of a player is called a minimax strategy. the worst that can happen is that he loses $12. 
the optimum choice of Player A is Strategy II (since a loss of $3 is preferable to a loss of $4). and if he chooses Strategy III. Dominances may not even exist. we find that if she chooses Strategy 1. in finding optimum strategies and the value of the game). From the point of view of Player A. Thus. even if Player A announced publicly that he will choose Strategy I. Player A makes sure that his opponent can win at most $2. if she chooses Strategy 2. if he chooses Strategy II. the worst that can happen is that she wins $2. not all games are spyproof. A very important aspect of the minimax strategies I and 2 of this example is that they are completely “spyproof” in the sense that neither player can profit from knowing the other’s choice. but it is the exception rather than the rule that it will lead to a complete solution. it would still be best for Player A to choose Strategy I. the worst that can happen is that he loses $6. Thus. it would still be best for Player B to choose Strategy 2. the worst that can happen is that she loses $2. By choosing Strategy I. and the value of the game is $3. is really quite rea- sonable. In general. if Player B dis- covered that Player A would try to outsmart her in this way. However. he could switch to Strategy I and thus reduce his loss from $6 to $2. and Player B can minimize her maximum loss by choosing Strategy 2. To avoid this possibility. since the smallest value of each row is also the smallest value of its column. pairs of strategies for which the corresponding entry in the payoff matrix is the smallest value of its row and the greatest value of its column. On the other hand. she could in turn switch to Strategy 1 and increase her gain to $8. DEFINITION 3. SADDLE POINT. the small- est value of the second row. and the best way of doing this is by introducing an element of chance into the selection of strategies. Decision Theory EXAMPLE 5 Show that the minimax strategies of Players A and B are not spyproof in the follow- ing game: Player A I II 1 8 −5 Player B 2 2 6 Solution Player A can minimize his maximum loss by choosing Strategy II. the minimax strate- gies of the two players are not spyproof. and the 3 * 3 game on the previous page has a saddle point corresponding to Strategies I and 2 since 2. each player should somehow mix up his or her behavior patterns intentionally. If a game does not have a saddle point. A saddle point of a game is a pair of strategies for which the corresponding entry in the payoff matrix is the smallest value of its row and the greatest value of its column. Also. it also follows from this exercise that it does not matter in that case which of the saddle points is used to determine the optimum strategies of the two players. in the game of Example 3 there is a saddle point corresponding to Strategies I and 2 since 8. In any case. is the greatest value of the second column. if Player A knew that Player B was going to base her choice on the minimax criterion. Of course. A game that has a saddle point is said to be strictly determined. that is. What we have to look for are saddle points. thus leaving room for all sorts of trickery or deception. the smallest value of the second row. minimax strategies are not spyproof. if a game has a saddle point. and the strategies corresponding to the saddle point are spyproof (and hence optimum) min- imax strategies. In Example 5 there is no saddle point. the 3 * 2 game of Example 4 has a saddle point corresponding to Strategies II and 2 since 3. is the greatest value of the first column. 
the smallest value of the second row. it is said to be strictly determined. and each player can outsmart the other if he or she knows how the opponent will react in a given situation. is the greatest value of the first column. There exists an easy way of determining for any given game whether minimax strategies are spyproof. The fact that there can be more than one saddle point in a game is illustrated in Exercise 2. 266 . Graphically. this assumes. numbered slips of paper. Player A can expect to lose E = 8x − 5(1 − x) dollars. If we apply these methods to Example 1. if Player A uses 11 slips of paper numbered I and 6 slips of paper numbered II. and then acts according to which kind he randomly draws. Find the value of x that will minimize Player A’s maximum expected loss.41 to the nearest cent. and to find the corresponding value of x. or $3. questionably so. suppose that Player A uses a gambling device (dice. shuffles them thoroughly. The examples of this section were all given without any “physical” interpreta- tion because we were interested only in introducing some of the basic concepts of the theory of games. Player A can expect to lose E = 2x + 6(1 − x) dollars. Solution If Player B chooses Strategy 1. he will be holding his maximum expected loss down to 17 − 5 · 17 = 3 17 . a table of random numbers) that leads to the choice of Strategy I with probability x and to the choice of Strategy II with probability 1 − x. Thus. If a player’s choice of strategy is left to chance. that Nature (which controls whether there is going to be a recession) is a malevolent opponent. the $3. 267 . RANDOMIZED STRATEGY. By con- trast. cards. Of course. we find from Figure 1 that the greater of the two values of E for any given value of x is smallest where the two lines intersect.41 to which Player A can hold down his expected loss and Player B can raise her expected gain is called the value of this game. in Exercise 22 the reader will be asked to use a similar argument to show that Player B will maximize her minimum gain (which is the same as minimizing her maximum loss) by choosing 4 between Strategies 1 and 2 with respective probabilities of 17 and 13 17 and that she 7 will thus assure for herself an expected gain of 3 17 . or $3. or a mixed strategy. and if Player B chooses Strategy 2. Applying the minimax criterion to the expected losses of Player A. we have only to solve the equation 8x − 5(1 − x) = 2x + 6(1 − x) which yields x = 11 17 . we find that the “game” has a saddle point and that the manufacturer’s minimax strategy is to delay expand- ing the capacity of his plant. each strategy is called a pure strategy.41 to the nearest cent. 8 · 11 6 7 As far as Player B of the preceding example is concerned. in a game where each player makes a definite choice of a given strategy. Decision Theory EXAMPLE 6 With reference to the game of Example 5. where we have plotted the lines whose equations are E = 8x − 5(1 − x) and E = 2x + 6(1 − x) for values of x from 0 to 1. the overall strategy is called a randomized strategy. Incidentally. this situation is described in Figure 1. DEFINITION 4. n. matrix and another corresponding to the kth row and the   lth column. . Decision Theory E 9 8 7 ) x 5(1 6  E 8x 2x   5 6(1 E x ) 4 3 2 1 x 0 11 1 17 1 2 3 4 5 6 Figure 1. responding to the ith row and the jth column of the payoff ing is an example of a 3 * 3 Latin square. The follow. player in a game whose payoff matrix is an n * n Latin square. 
show that 1 2 3   (a) there are also saddle points corresponding to the ith   2 3 1 row and the lth column of the payoff matrix and the kth   3 1 2 row and the jth column. . What is the value of the game? 268 . Diagram for Example 6. If a zero-sum two-person game has a saddle point cor- each column contains the integers 1. Exercises 1. . . it would seem that in a situation like this the manufacturer ought to have some idea about the chances for a recession and hence that the problem should be solved by the first method of Section 1. An n * n matrix is called a Latin square if each row and 2. 2. Also. (b) the payoff must be the same for all four saddle Show that any strategy is the minimax strategy for either points. Decision Theory 3 Statistical Games In statistical inference we base decisions about populations on sample data, and it is by no means farfetched to look upon such an inference as a game between Nature, which controls the relevant feature (or features) of the population, and the person (scientist or statistician) who must arrive at some decision about Nature’s choice. For instance, if we want to estimate the mean μ of a normal population on the basis of a random sample of size n, we could say that Nature has control over the “true” value of μ. On the other hand, we might estimate μ in terms of the value of the sample mean or that of the sample median, and presumably there is some penalty or reward that depends on the size of our error. In spite of the obvious similarity between this problem and the ones of the preceding section, there are essentially two features in which statistical games are different. First, there is the question that we already met when we tried to apply the theory of games to the decision problem of Example 1, that is, the question of whether it is reasonable to treat Nature as a malevolent opponent. Obviously not, but this does not simplify matters; if we could treat Nature as a rational opponent, we would know, at least, what to expect. The other distinction is that in the games of Section 2 each player had to choose his strategy without any knowledge of what his opponent had done or was planning to do, whereas in a statistical game the statistician is supplied with sample data that provide him with some information about Nature’s choice. This also complicates matters, but it merely amounts to the fact that we are dealing with more complicated kinds of games. To illustrate, let us consider the following decision problem: We are told that a coin is either balanced with heads on one side and tails on the other or two-headed. We cannot inspect the coin, but we can flip it once and observe whether it comes up heads or tails. Then we must decide whether or not it is two-headed, keeping in mind that there is a penalty of $1 if our decision is wrong and no penalty (or reward) if our decision is right. If we ignored the fact that we can observe one flip of the coin, we could treat the problem as the following game: Player A (The Statistician) a1 a2 Player B θ1 L(a1 , θ1 ) = 0 L(a2 , θ1 ) = 1 (Nature) θ2 L(a1 , θ2 ) = 1 L(a2 , θ2 ) = 0 which should remind the reader of the scheme in Section 2. Now, θ1 is the “state of Nature” that the coin is two-headed, θ2 is the “state of Nature” that the coin is bal- anced with heads on one side and tails on the other, a1 is the statistician’s decision that the coin is two-headed, and a2 is the statistician’s decision that the coin is bal- anced with heads on one side and tails on the other. 
The entries in the table are the corresponding values of the given loss function. Now let us consider also the fact that we (Player A, or the statistician) know what happened in the flip of the coin; that is, we know whether a random variable X has taken on the value x = 0 (heads) or x = 1 (tails). Since we shall want to make use of this information in choosing between a1 and a2 , we need a function, a decision function, that tells us what action to take when x = 0 and what action to take when x = 1. 269 Decision Theory DEFINITION 5. DECISION FUNCTION. The function that tells the statistician which decision to make for each action of Nature is called the decision function of a statistical game. The values of this function are given by di (x), where di refers to the ith decision made by the statistician and x is a value of the random variable X whose values give the actions that can be taken by Nature. One possibility is to choose a1 when x = 0 and a2 when x = 1, and we can express this symbolically by writing  a1 when x = 0 d1 (x) = a2 when x = 1 or, more simply, d1 (0) = a1 and d1 (1) = a2 . The purpose of the subscript is to distinguish this decision function from others, for instance, from d2 (0) = a1 and d2 (1) = a1 which tells us to choose a1 regardless of the outcome of the experiment, from d3 (0) = a2 and d3 (1) = a2 which tells us to choose a2 regardless of the outcome of the experiment, and from d4 (0) = a2 and d4 (1) = a1 which tells us to choose a2 when x = 0 and a1 when x = 1. To compare the merits of all these decision functions, let us first determine the expected losses to which they lead for the various strategies of Nature. DEFINITION 6. RISK FUNCTION. The function that gives the expected loss to which each value of the decision function leads for each action of Nature is called the risk function. This function is given by R(di , θj ) = E{L[di (X), θj ]} where the expectation is taken with respect to the random variable X . Since the probabilities for x = 0 and x = 1 are, respectively, 1 and 0 for θ1 , and 1 2 and 12 for θ2 , we get R(d1 , θ1 ) = 1 · L(a1 , θ1 ) + 0 · L(a2 , θ1 ) = 1 · 0 + 0 · 1 = 0 1 1 1 1 1 R(d1 , θ2 ) = · L(a1 , θ2 ) + · L(a2 , θ2 ) = · 1 + · 0 = 2 2 2 2 2 R(d2 , θ1 ) = 1 · L(a1 , θ1 ) + 0 · L(a1 , θ1 ) = 1 · 0 + 0 · 0 = 0 1 1 1 1 R(d2 , θ2 ) = · L(a1 , θ2 ) + · L(a1 , θ2 ) = · 1 + · 1 = 1 2 2 2 2 R(d3 , θ1 ) = 1 · L(a2 , θ1 ) + 0 · L(a2 , θ1 ) = 1 · 1 + 0 · 1 = 1 270 Decision Theory 1 1 1 1 R(d3 , θ2 ) = · L(a2 , θ2 ) + · L(a2 , θ2 ) = · 0 + · 0 = 0 2 2 2 2 R(d4 , θ1 ) = 1 · L(a2 , θ1 ) + 0 · L(a1 , θ1 ) = 1 · 1 + 0 · 0 = 1 1 1 1 1 1 R(d4 , θ2 ) = · L(a2 , θ2 ) + · L(a1 , θ2 ) = · 0 + · 1 = 2 2 2 2 2 where the values of the loss function were obtained from the table under Section 3. We have thus arrived at the following 4 * 2 zero-sum two-person game, in which the payoffs are the corresponding values of the risk function: Player A (The Statistician) d1 d2 d3 d4 Player B θ1 0 0 1 1 (Nature) 1 1 θ2 1 0 2 2 As can be seen by inspection, d2 is dominated by d1 and d4 is dominated by d3 , so that d2 and d4 can be discarded; in decision theory we say that they are inadmissi- ble. Actually, this should not come as a surprise, since in d2 as well as d4 we accept alternative a1 (that the coin is two-headed) even though it came up tails. This leaves us with the 2 * 2 zero-sum two-person game in which Player A has to choose between d1 and d3 . 
It can easily be verified that if Nature is looked upon as a malevolent opponent, the optimum strategy is to randomize between d1 and d3 with respective probabilities of 23 and 13 , and the value of the game (the expected risk) is 13 of a dollar. If Nature is not looked upon as a malevolent opponent, some other criterion will have to be used for choosing between d1 and d3 , and this will be discussed in the sections that follow. Incidentally, we formulated this problem with reference to a two-headed coin and an ordinary coin, but we could just as well have formulated it more abstractly as a decision problem in which we must decide on the basis of a single observation whether a random variable has the Bernoulli distribution with the parameter θ = 0 or the parameter θ = 12 . To illustrate further the concepts of a loss function and a risk function, let us consider the following example, in which Nature as well as the statistician has a continuum of strategies. EXAMPLE 7 A random variable has the uniform density ⎧ ⎪ ⎨1 for 0 < x < θ f (x) = θ ⎪ ⎩0 elsewhere and we want to estimate the parameter θ (the “move” of Nature) on the basis of a single observation. If the decision function is to be of the form d(x) = kx, where k G 1, and the losses are proportional to the absolute value of the errors, that is, L(kx, θ ) = c|kx − θ | where c is a positive constant, find the value of k that will minimize the risk. 271 Decision Theory Solution For the risk function we get  θ/k  θ 1 1 R(d, θ ) = c(θ − kx) · dx + c(kx − θ ) · dx 0 θ θ/k θ k 1 = cθ −1+ 2 k and there is nothing we can do about the factor θ ; but it can easily be verified that √ k 1 k = 2 will minimize − 1 + . Thus, if we actually took the observation and got 2 k √ x = 5, our estimate of θ would be 5 2, or approximately 7.07. 4 Decision Criteria In Example 7 we were able to find a decision function that minimized the risk regard- less of the true state of Nature (that is, regardless of the true value of the parameter θ ), but this is the exception rather than the rule. Had we not limited ourselves to deci- sion functions of the form d(x) = kx, then the decision function given by d(x) = θ1 would be best when θ happens to equal θ1 , the one given by d(x) = θ2 would be best when θ happens to equal θ2 , . . . , and it is obvious that there can be no decision function that is best for all values of θ . In general, we thus have to be satisfied with decision functions that are best only with respect to some criterion, and the two criteria that we shall study in this chapter are (1) the minimax criterion, according to which we choose the decision function d for which R(d, θ ), maximized with respect to θ , is a minimum; and (2) the Bayes criterion. DEFINITION 7. BAYES RISK. If  is assumed to be a random variable having a given distribution, the quantity E[R(d, )] where the expectation is taken with respect to , is called the Bayes risk. Choosing the decision function d for which the Bayes risk is a minimum is called the Bayes criterion. It is of interest to note that in the example of Section 1 we used both of these criteria. When we quoted odds for a recession, we assigned probabilities to the two states of Nature, θ1 and θ2 , and when we suggested that the manufacturer minimize his expected loss, we suggested, in fact, that he use the Bayes criterion. 
Also, when we asked in Section 2 what the manufacturer might do if he were a confirmed pes- simist, we suggested that he would protect himself against the worst that can happen by using the minimax criterion. 5 The Minimax Criterion If we apply the minimax criterion to the illustration of Section 3, dealing with the coin that is either two-headed or balanced with heads on one side and tails on the other, we find from the table on the previous page with d2 and d4 deleted that for d1 the maximum risk is 12 , for d3 the maximum risk is 1, and, hence, the one that minimizes the maximum risk is d1 . 272 Decision Theory EXAMPLE 8 Use the minimax criterion to estimate the parameter θ of a binomial distribution on the basis of the random variable X, the observed number of successes in n trials, when the decision function is of the form x+a d(x) = n+b where a and b are constants, and the loss function is given by 2 x+a x+a L ,θ =c −θ n+b n+b where c is a positive constant. Solution The problem is to find the values of a and b that will minimize the corresponding risk function after it has been maximized with respect to θ . After all, we have control over the choice of a and b, while Nature (our presumed opponent) has control over the choice of θ . Since E(X) = nθ and E(X 2 ) = nθ (1 − θ + nθ ) it follows that 2 X +a R(d, θ ) = E c −θ n+b c = [θ 2 (b2 − n) + θ (n − 2ab) + a2 ] (n + b)2 and, using calculus, we could find the value of θ that maximizes this expression and then minimize R(d, θ ) for this value of θ with respect to a and b. This is not partic- ularly difficult, but it is left to the reader in Exercise 6 as it involves some tedious algebraic detail. To simplify the work in a problem of this kind, we can often use the equal- izer principle, according to which (under fairly general conditions) the risk function of a minimax decision rule is a constant; for instance, it tells us that in Example 8 the risk function should not depend on the value of θ .† To justify this principle, at least intuitively, observe that in Example 6 the minimax strategy of Player A leads to an expected loss of $3.41 regardless of whether Player B chooses Strategy 1 or Strategy 2. To make the risk function of Example 8 independent of θ , the coefficients of θ and θ 2 must both equal 0 in the expression for R(d, θ ). This yields b2 − n = 0 and √ √ n − 2ab = 0, and, hence, a = 12 n and b = n. Thus, the minimax decision function is given by 1√ x+ n d(x) = 2√ n+ n † The exact conditions under which the equalizer principle holds are given in the book by T. S. Ferguson listed among the references at the end of this chapter. 273 Decision Theory and if we actually obtained 39 successes in 100 trials, we would estimate the param- eter θ of this binomial distribution as 1√ 39 +100 d(39) = 2√ = 0.40 100 + 100 6 The Bayes Criterion To apply the Bayes criterion in the illustration of Section 3, the one dealing with the coin that is either two-headed or balanced with heads on one side and tails on the other, we will have to assign probabilities to the two strategies of Nature, θ1 and θ2 . 
If we assign θ1 and θ2 , respectively, the probabilities p and 1 − p, it can be seen from the second table in Section 3 that for d1 the Bayes risk is 1 1 0·p+ · (1 − p) = · (1 − p) 2 2 and that for d3 the Bayes risk is 1 · p + 0 · (1 − p) = p It follows that the Bayes risk of d1 is less than that of d3 (and d1 is to be preferred to d3 ) when p > 13 and that the Bayes risk of d3 is less than that of d1 (and d3 is to be preferred to d1 ) when p < 13 . When p = 13 , the two Bayes risks are equal, and we can use either d1 or d3 . EXAMPLE 9 With reference to Example 7, suppose that the parameter of the uniform density is looked upon as a random variable with the probability density  θ · e−θ for θ > 0 h(θ ) = 0 elsewhere If there is no restriction on the form of the decision function and the loss function is quadratic, that is, its values are given by L[d(x), θ ] = c{d(x) − θ }2 find the decision function that minimizes the Bayes risk. Solution Since  is now a random variable, we look upon the original probability density as the conditional density ⎧ ⎪ ⎨1 for 0 < x < θ f (x|θ ) = θ ⎪ ⎩0 elsewhere 274 Decision Theory and, letting f (x, θ ) = f (x|θ ) · h(θ ) we get  e−θ for 0 < x < θ f (x, θ ) = 0 elsewhere As the reader will be asked to verify in Exercise 8, this yields  e−x for x > 0 g(x) = 0 elsewhere for the marginal density of X and  ex−θ θ >x ϕ(θ |x) = 0 elsewhere for the conditional density of  given X = x. Now, the Bayes risk E[R(d, )] that we shall want to minimize is given by the double integral  q  θ 2 c[d(x) − θ ] f (x|θ ) dx h(θ ) dθ 0 0 which can also be written as  q  q  2 c[d(x) − θ ] ϕ(θ |x) dθ g(x) dx 0 0 making use of the fact that f (x|θ ) · h(θ ) = ϕ(θ |x) · g(x) and changing the order of integration. To minimize this double integral, we must choose d(x) for each x so that the integral   q q c[d(x) − θ ]2 ϕ(θ |x) dθ = c[d(x) − θ ]2 ex−θ dθ x x is as small as possible. Differentiating with respect to d(x) and putting the derivative equal to 0, we get  q 2cex · [d(x) − θ ]e−θ dθ = 0 x This yields   q q d(x) · e−θ dθ − θ e−θ dθ = 0 x x and, finally,  q θ e−θ dθ (x + 1)e−x d(x) = x q = = x+1 e−x e−θ dθ x Thus, if the observation we get is x = 5, this decision function gives the Bayes esti- mate 5 + 1 = 6 for the parameter of the original uniform density. 275 Decision Theory Exercises   3. With reference to the illustration in Section 3, show c 1 2 1 that even if the coin is flipped n times, there are only (b − n) + (n − 2ab) + a2 (n + b)2 3 2 two admissible decision functions. Also, construct a table showing the values of the risk function corresponding to Also show that this Bayes risk is a minimum when a = 1 these two decision functions and the two states of Nature. and b = 2, so that the optimum Bayes decision rule is x+1 4. With reference to Example 7, show that if the losses given by d(x) = . are proportional to the squared errors instead of their n+2 absolute values, the risk function becomes 8. Verify the results given on the previous page for the marginal density of X and the conditional density of  cθ 2 2 given R(d, θ ) = (k − 3k + 3) 3 X = x. and its minimum is at k = 32 . 9. Suppose that we want to estimate the parameter θ of the geometric distribution on the basis of a single obser- 5. A statistician has to decide on the basis of a single vation. 
If the loss function is given by observation whether the parameter θ of the density ⎧ L[d(x), θ ] = c{d(x) − θ }2 ⎨ 2x for 0 < x < θ f (x) = θ 2 and  is looked upon as a random variable having the ⎩0 elsewhere uniform density h(θ ) = 1 for 0 < θ < 1 and h(θ ) = 0 else- where, duplicate the steps in Example 9 to show that equals θ1 or θ2 , where θ1 < θ2 . If he decides on θ1 when the (a) the conditional density of  given X = x is observed value is less than the constant k, on θ2 when the observed value is greater than or equal to the constant k,  x(x + 1)θ (1 − θ )x−1 for 0 < θ < 1 and he is fined C dollars for making the wrong decision, ϕ(θ|x) = which value of k will minimize the maximum risk? 0 elsewhere 6. Find the value of θ that maximizes the risk function of (b) the Bayes risk is minimized by the decision function Example 8, and then find the values of a and b that min- imize the risk function for that value of θ . Compare the 2 results with those given in Section 6. d(x) = x+2 7. If we assume in Example 8 that  is a random variable having a uniform density with α = 0 and β = 1, show that (Hint: Make use of the fact that the integral of any beta the Bayes risk is given by density is equal to 1.) 7 The Theory in Practice When Prof. A. Wald (1902–1950) first developed the ideas of decision theory, it was intended to deal with the assumption of normality and the arbitrariness of the choice of levels of significance in statistical testing of hypotheses. However, statistical deci- sion theory requires the choice of a loss function as well as a decision criterion, and sometimes the mathematics can be cumbersome. Perhaps it is for these reasons that decision theory is not often employed in applications. However, this theory is a remarkable contribution to statistical thinking and, in the opinion of the authors, it should be used more often. In this section we offer an example of how some of the ideas of decision the- ory can be used in acceptance sampling. Acceptance sampling is a process whereby a random sample is taken from a lot of manufactured product, and the units in the sample are inspected to make a decision whether to accept or reject the lot. If the number of defective units in the sample exceeds a certain limit (the “accep- tance number”), the entire lot is rejected, otherwise it is accepted and sent to the warehouse, or to a distributor for eventual sale. If the lot is “rejected,” it is rarely 276 and to make the decision to accept or reject on the basis of the number of defective units found in the sample. we need to max- imize the risk functions with respect to θ and then minimize the results. Thus. On the other hand. θ ) + Cd · [1 − B(0.” that is. and we shall not attempt it here. obtaining R(d1 . n. θ ) = Cw · x · P(x = 0. n. θ ) + Cd · [1 − B(2. and 0 under the second. n. n. n. n. n. the number of defective units found in the sampling inspection. θ ) + Cd · [1 − B(0. θ ) = Cw · nθ · B(0. This is a somewhat daunting task for this example. Decision Theory scrapped. n. it is inspected further and efforts are made to cull out the defective units. θ )] L(d2 . θ ) = Cw · x · P(x = 0|θ ) + Cd · P(x > 0|θ ) = Cw · x · B(0. θ ) represents the cumulative binomial distribution having the para- meters n and θ . The following example shows how elements of deci- sion theory can be applied to such a process. if we use the minimax criterion. EXAMPLE 10 Suppose a manufacturer incurs warranty costs of Cw for every defective unit shipped and it costs Cd to detail an entire lot. 
Two strategies are to be compared. as follows: Number of Sample Defectives. however. However. the loss functions are L(d1 . The decision function d2 accepts the lot if x = 0 and rejects it otherwise. instead it is “detailed. It is not too dif- ficult. The sampling inspection procedure is to inspect n items chosen at random from a lot containing N units. thus introducing a new assumption that may not be warranted. x Strategy 1 Strategy 2 0 Accept Accept 1 Accept Reject 2 Accept Reject 3 or more Reject Reject In other words. θ ) = Cw · nθ · B(2. θ )] R(d2 . (a) Find the risk function for these two strategies. 2|θ ) + Cd · P(x > 2|θ ) = Cw · x · B(2. does not exceed 2. The corresponding risk functions are found by taking the expected values of the loss functions with respect to x. θ ) + Cd · [1 − B(2. (b) Under what conditions is either strategy preferable? Solution The decision function d1 accepts the lot if x. the acceptance number is 2 under the first strategy. and rejects the lot otherwise. θ )] Either the minimax or the Bayes criterion could be used to choose between the two decision functions. use of the Bayes criterion requires that we assume a prior distribution for θ . n. to examine the difference between the two risk functions as a func- tion of θ and to determine for which values of θ one is associated with less risk than 277 . 1. θ )] where B(x. respectively. Thus. the telephone at the lumberyard is out of order.000 (the difference between the $164.000 profit is replaced by a $200. θ ) Ú B(0. θ ) in the first equation and B(2. we choose Strat- egy 1. sion she wants to attend will be held at Hotel X or Hotel (b) the truck driver of Exercise 13 go first? Y. which would cost her $66. θ ) + 2. what wrong hotel. For instance. δ(θ ) is never positive and. Also.000 loss and the construction site that is 33 miles from the lumberyard. 10. θ ) = (1. 10. Applied Exercises SECS. The risk functions become R(d1 . 10.000 · [1 − B(0. 10. Ms. Cooper of Exercise 12. to compli- (b) the odds for a recession are 7 to 4? cate matters. 27 and profit that he would have made if he had decided to 278 . 10. session she wants to attend will be held at Hotel X. and the cost of detailing a rejected lot is Cd = $2. it is straight forward to show that B(2. θ )] Collecting coefficients of B(2. θ )] R(d2 . θ ) + 2. Cooper feels that the odds are 3 to 1 that the (a) the manufacturer of Example 1. θ ) in the second.00 at Hotel X and $62. Basing their decisions on pessimism as in Example 2.40 at Hotel Y. and she must send in her room reservation immediately. would the manufac. where (b) Ms. for which the acceptance number is 2.000 loss is replaced by a $60. Cooper of Exercise 12 make her reservation. decisions should be reached by (a) If Ms.000) … 0. he will 13. With reference to Example 1. Cooper does not know whether the particular ses. Basing their decisions on optimism (that is. With reference to Example 1. Cooper feels that the odds are 5 to 1 that the session she wants to attend will be held at Hotel X. 10.000 · [1 − B(2. (b) the odds are 2 to 1 that the lumber should go to the (b) the $40. should she make her reservation so as to minimize her expected cost? (c) the truck driver of Exercise 13? (b) If Ms.000 · θ · B(0. two construction sites are 12 miles apart.000)[B(2. the distance he can expect to drive and he feels that turer’s decision remain the same if (a) the odds are 5 to 1 that the lumber should go to the (a) the $164.000. suppose the sample size is chosen to be n = 10. 
are held partly in Hotel X and partly in Hotel Y. θ ) = 1. θ ) = 1. θ ). θ )] Since θ … 1.000 · θ · B(2. 10. since the risk for Strat- egy 1 is less than or equal to that for Strategy 2 for all values of θ . and where should Ms. (a) Ms.000 profit and construction site that is 33 miles from the lumberyard. what decision would 33 miles from the lumberyard. the warranty cost per defective unit shipped is Cw = $100. he finds that if he delays expansion and economic conditions remain good. the quantity (1.00 for cab fare if she stays at the ing maximum gains or minimizing minimum losses). where 16.000θ − 2. A truck driver has to deliver a load of lumber to one lose out by $84. The convention is so large that the activities 14. and it 15. Decision Theory the other. 1–2 10.000 of two construction sites. but he has misplaced the minimize the manufacturer’s expected loss if he felt that order telling him where the load of lumber should go. θ ) − R(d2 . then subtracting. 10. odds are 3 to 2 that there will be a recession? (c) the odds are 3 to 1 that the lumber should go to the 12. Experience with the proportions of defectives in prior lots can guide us in determining for which “reasonable” values of θ we should compare the two risks. 10.000θ − 2. the odds are 2 to 1 that there will be a recession. which are. and. She is planning to stay only one night. Where should he go first if he wants to minimize 11. θ ) − B(0. 10. Cooper is planning to attend a convention in construction site that is 33 miles from the lumberyard? Honolulu. we obtain δ(θ ) = R(d1 . The (a) the odds for a recession are 3 to 2. Suppose that the manufacturer of Example 1 is the should she make her reservation so as to minimize her kind of person who always worries about losing out on expected cost? a good deal. maximiz- will cost her an extra $6. To illustrate. Eliminate all dominated strategies and (c) What is the value of the game? determine the optimum strategy for each player as well as the value of the game: 24. he can expect a net profit of $140. the first wins −2 −4 5 5 6 5 6 this amount in dollars.” points) and the value of each game: (a) (b) 26. With reference to Exercise 12. suppose that the man- ufacturer has the option of hiring an infallible forecaster 3 −4 for $15. find 22. in Section 2. he can 279 . 20. They know (from similar situations elsewhere) that if Station A gives away free glasses and Station B 27. and at the same time the second writes either 0 or 3 on another slip of 0 3 1 4 4 4 3 paper. With reference to Example 1. he can expect a net profit of $100 on any away free steak knives and Station A does not give away given day. Each of the following is the payoff matrix of a zero. the second wins $2. loss. A small town has two service stations. Station B’s share of the market will increase does not. Verify the two probabilities 17 and 13 17 . would it be worthwhile for the manufac- (a) What randomized strategy should Player A use so as turer to spend this $15. items. (a) Construct the payoff matrix in which the payoffs are 5 7 5 9 the first person’s losses.000 profit that he will actu. otherwise.000. If the sum of the two numbers is odd. With reference to the definition of Exercise 16. or regret. nity loss of (a) Ms. (b) What randomized strategy should Player B use so as ments Player A makes to Player B) for a zero-sum to maximize her minimum expected gain? two-person game. If he lowers his prices while the other station free glasses. 
and the owner ond person use so as to maximize his minimum expected of Station B is debating whether to give away free steak gain? knives. for the randomized strategy of Player B. respectively. Cooper of Exercise 12. Cooper’s maximum expected (a) (b) 3 −2 14 11 cost? 5 7 16 −2 25. 23. (b) the decision that would minimize the manufacturer’s (b) Find optimum strategies for the owners of the maximum loss of opportunity. if he does by 8 percent. (c) What randomized decision procedure should the sec- tomers as part of a promotional scheme. Decision Theory expand right away and the $80. find the optimum strategy of the country as well as that sum two-person game.000? to minimize his maximum expected loss? 19. two stations.000. and if both stations give away the respective not lower his prices but the other station does.000 and $10. There are two gas stations in a certain block. 21. two-person game: 18. Each of the following is the payoff matrix (the pay. the market. 4 17. what randomized strategy will minimize Ms. Referring to this quantity as an opportunity 3 percent. find (a) Present this information in the form of a payoff table (a) the opportunity losses that correspond to the other in which the entries are Station A’s losses in its share of three possibilities. if Station B gives lowers its prices.000 to find out for certain whether there will be a −3 1 recession. of which it can −5 0 3 7 10 8 defend only one against an attack by its enemy. and the does not give away free steak knives. Find the saddle point (or saddle of its enemy and the value of the “game. Two persons agree to play the following game: The −1 5 −2 3 2 4 9 first writes either 1 or 4 on a slip of paper. which share (b) What randomized decision procedure should the first the town’s market for gasoline. The owner of Station A person use so as to minimize her maximum expected loss? is debating whether to give away free glasses to her cus. Considering the “payoff” to the country to be the total value of the installations it holds after the attack. Station B’s share of the market will increase by ally make). Based on the original 2 to 1 odds that there will be a recession. A country has two airfields with installations worth (c) (d) $2. can attack only one of these airfields and take it successfully only if it is left unde- −12 −1 1 7 5 9 fended. The following is the payoff matrix of a 2 * 2 zero-sum (b) the truck driver of Exercise 13. which we gave the decisions that will minimize the maximum opportu.000. The −6 −3 −3 8 8 11 enemy. on the other hand. Station A’s share of owner of the first station knows that if neither station the market will increase by 6 percent. .00.: Harcourt Brace Jovanovich. With reference to Example 10. The Compleat Strategyst. 1968. and 2 defective 29. A statistician has to decide on the basis of one obser. She can ship vation whether the parameter θ of a Bernoulli distribu. or 2 of the (c) Show that five of the decision functions are not admis. and she feels that the probabilities for 0. Game Theory. and more (Republication of 1959 edition). (c) What should the manufacturer do to minimize her eter θ are regarded as equally likely? Bayes risk if α = $10. J. (a) Construct a table showing the manufacturer’s struct a table showing all the values of the corresponding expected losses corresponding to her three “strategies” risk function. she can inspect both components deducted from her fee) is 100 times the absolute value and repair them if necessary. S. G. so that the possi. 
one of the components and ship the item with the origi- (a) Construct a table showing the nine possible values of nal guarantee if it works.” he can expect a net profit of $80. and Moses. New York: John New York: McGraw-Hill Book Company. sible and that. 1954. instead of 2. and she wants to minimize her maximum (d) Which decision function is best. Rework Example 10. Basic Ideas and Selected Topics.: Prentice Hall. or she can randomly select of her error. 0. Games of Strategy: Theory and Appli.. if the three possible values of the param. and the three “states” of Nature that 0. remaining decision functions are all equally good.. Decision Theory expect a net profit of $70. The struct a table showing all the values of the corresponding owners of the two gas stations decide independently what risk function. 280 . P. respectively. Orlando. J. Upper Saddle River. tion apply also to the second gas station. A manufacturer produces an item consisting of two Note that this “game” is not zero-sum. 12 . each item without inspection with the guarantee that it will be put into perfect working condition at her factory tion is 0. N. A. and if both stations participate (b) List the eight possible decision functions and con- in this “price war.. M. Wiley & Sons. and Cd will Strategy 2 be preferred? References Some fairly elementary material on the theory of games Owen. Ferguson. N. tion properly. Statistical Analysis for Decision Making.00.: Prentice Hall. Bickel. or 1. Inc.Y. Mathematical Statistics: cations. her loss in dollars (a penalty that is in case it does not work.. respectively. 7 expect a net profit of $105? 30. 1967. A. his loss (a penalty that is deducted from his acceptance number of 1. Chernoff. E. Hamburg.00.00. Statistical Decision Functions. advanced treatments in. New York: Theory. C. for what values of Cw the loss function. and 0. 3–6 ing one of the components is β dollars.10? vations whether the parameter θ of a binomial distribu- 31.20. New York: Academic Press. 1. B. The cost of returning one of the items to the manufacturer for repairs is α dollars. and the cost of 28.. Inc. A statistician has to decide on the basis of two obser. 1977. 1952. components are. ϕ = $30. repairing a faulty component is ϕ dollars. D.J. McGraw-Hill Book Company. Introduction to the Theory of Games.. according to the expected losses? Bayes criterion. 1. fee) is $160 if he is wrong. prices to charge on any given day. Inc. ϕ = $10. McKinsey. Mathematical Statistics: A Decision The- 4th ed. (b) List the nine possible decision functions and con. oretic Approach. 1961.00. according to the minimax criterion. or repair it and also check the the loss function. components do not work. the cost of inspect- SECS.. how might the owners of the gas stations collude so that each could SEC.. L. Saunders and decision theory can be found in Company. other component. components.70.. K. C. Upper Saddle River. 23 and 13 . J. charged by the other. 1950. β = $12. which must both work for the item to func- bility of collusion opens entirely new possibilities.. the (b) What should the manufacturer do if α = $25. and it is assumed that (c) Show that three of the decision functions are not they cannot change their prices after they discover those admissible.: Dover Publications. Fla. M. T. regular prices or should he lower them if he wants to (e) Find the decision function that is best according to the maximize his minimum net profit? 
Bayes criterion if the probabilities assigned to θ = 14 and (b) Assuming that the profit figures for the first gas sta- θ = 12 are. Wald. N.. changing the first strategy to an tion is 14 or 12 . 1988. H. (a) Construct a table showing the four possible values of 32. (d) Find the decision function that is best according to the (a) Should the owner of the first gas station charge his minimax criterion.. and Doksum. Dresher. Elementary Decision Williams. 0.J. Mineola. Philadelphia: W. (b) 4 and 7 . (b) He should go to the d4 (0) = . d8 (2) = . matter. 1 23 (a) 115 and 6 . 27 (a) He should lower the prices. d1 (1) = . d4 (2) = . d2 (2) = . d8 (1) = . (b) The optimum strategies are for Station A to give away the glasses and for θ1 0 1 Station B to give away the knives. would be the same. Decision Theory Answers to Odd-Numbered Exercises 1 n. d6 (0) = . 21 (a) The payoffs are 0 and −6 for the first row of the table 3 d 1 d2 and 8 and 3 for the second row of the table.333. d3 (2) = . (c) It does not second row are 160 and 0. those of the site that is 27 miles from the lumberyard. construction site that is 27 miles from the lumberyard. d1 (2) = . d7 (0) = . d4 (1) = . (c) He should go to the construction d2 (1) = . (c) − 9 . d7 (2) = . (b) He should go to the construction 29 (a) The values of the first row are 0 and 160. (e) d2 . d6 (2) = . d5 (0) = . 2 2 4 2 2 the game is 5. (b) The decision $10. the value is 8. θ12 + θ22 ize its strategies with probabilities 56 and 16 . d3 (1) = . (c) The optimum strategies are I and 1 and d7 (1) = . 4 2 4 2 4 1 1 1 1 1 17 (a) She should choose Hotel Y. 2 2 2 2 2 the value is −5. from the lumberyard. 15 (a) He should expand his plant capacity now. (b) They could accom- 13 (a) He should go to the construction site that is 33 miles plish this by lowering their prices on alternate days. (b) The optimum strategies are II and 1 and 1 1 1 1 1 the value is 11. (b) She 4 4 4 4 1 1 1 1 1 should choose Hotel Y. the value is 11 (a) The decision would be reversed. 4 2 2 2 4 1 1 1 1 1 19 (a) The optimum strategies are I and 2 and the value of d5 (2) = . site that is 27 miles from the lumberyard. 281 . d2 (0) = .333. 1 1 1 1 (b) d1 (0) = . and the enemy should random- 5  . d5 (1) = . d6 (1) = . d3 (0) = . d8 (0) = . (d) The optimum strategies are I and 2 and (d) d4 . θ2 0 11 11 11 11 2n 25 The defending country should randomize its strategies θ1 θ2 with probabilities 16 and 56 . This page intentionally left blank . whether X or X  is more likely to yield a value that is actually close. S2 may be used as a point estimator of σ 2 . Also. we are in each case using a point estimate of the parameter in question. hence. it would be important to know. or a single point on the real axis. DEFINITION 1. X may be used as a point estimator of μ. Similarly. Marylees Miller. in which case x is a point estimate of this parameter. POINT ESTIMATION. if we must decide whether to use a sample mean or a sample median to estimate the mean of a population. Freund’s Mathematical Statistics with Applications. if we use a value of X to estimate the mean of a population. in which case s2 is a point estimate of this parameter. Copyright © 2014 by Pearson Education. Eighth Edition. From Chapter 10 of John E. is used to estimate the parameter. whereas in tests of hypotheses we must decide whether to accept or reject a specific value or a set of specific values of a parameter (or those of sev- eral parameters). among other things. 
when we estimate the variance of a population on the basis of a random sample. though actually they are all decision problems and. one of the key problems of point esti- mation is to study their sampling distributions. could be handled by the unified approach. These estimates are called point estimates because in each case a single number. All rights reserved. For instance. problems of statistical inference are divided into problems of estima- tion and tests of hypotheses. to know whether we can expect it to be close. Here we used the word “point” to distinguish between these estimators and estimates and the interval estimators and interval estimates. at least. Inc. we can hardly expect that the value of S2 we get will actually equal σ 2 . For example. 283 . or a value of S2 to estimate a population variance. we refer to the statistics themselves as point estimators. The main difference between the two kinds of problems is that in problems of estimation we must determine the value of a parameter (or the values of several parameters) from a possible continuum of alternatives. Using the value of a sample statistic to estimate the value of a population parameter is called point estimation. Since estimators are random variables. We refer to the value of the statistic as a point estimate.Point Estimation 1 Introduction 6 Robustness 2 Unbiased Estimators 7 The Method of Moments 3 Efficiency 8 The Method of Maximum Likelihood 4 Consistency 9 Bayesian Estimation 5 Sufficiency 10 The Theory in Practice 1 Introduction Traditionally. Correspondingly. an observed sample proportion to estimate the parameter θ of a binomial population. Irwin Miller. but it would be reassuring. For instance. Solution Since E(X) = nθ . DEFINITION 2. and robustness. is an unbiased estimator of θ . the minimax estimator of the binomial parameter θ is biased. If this is the case. n Solution Since E(X) = nθ . 2 Unbiased Estimators Perfect decision functions do not exist. the estimator is said to be unbiased. minimum variance. Thus. its expected value should equal the parameter that it is supposed to estimate. it follows that X 1 1 E = · E(X) = · nθ = θ n n n X and hence that is an unbiased estimator of θ . it is said to be biased. show that unless θ = 12 . Point Estimation Various statistical properties of estimators can thus be used to decide which esti- mator is most appropriate in a given situation. efficiency. show that the sample X proportion. which will expose us to the smallest risk. sufficiency. and in connection with problems of estima- tion this means that there are no perfect estimators that always give the right answer. To illustrate why this statement is necessary. that is. The par- ticular properties of estimators that we shall discuss in Sections 2 through 6 are unbi- asedness. EXAMPLE 2 If X has the binomial distribution with the parameters n and θ . consistency. this concept is expressed by means of the following definition. it follows that ⎛ ⎞ 1√ 1√ 1√ E X+ n ⎜X + 2 n⎟ 2 nθ + n E⎜ √ ⎝ n+ n ⎠ ⎟= √ = 2 √ n+ n n+ n and it can easily be seen that this quantity does not equal θ unless θ = 12 . A statistic  ˆ is an unbiased estimator of the parameter ␪ of a given distribution if and only if E() ˆ = ␪ for all possible values of ␪. The following are some examples of unbiased and biased estimators. UNBIASED ESTIMATOR. which will give us the most information at the lowest cost. . it would seem reasonable that an estimator should do so at least on the aver- age. n 284 . Formally. otherwise. and so forth. 
EXAMPLE 1 Definition 2 requires that E() = θ for all possible values of θ . . . . Point Estimation EXAMPLE 3 If X1 . . Solution Since the mean of the population is . Xn constitute a random sample from the population given by e−(x−δ) for x > δ f (x) = 0 elsewhere show that X is a biased estimator of δ. X2 . but here there is something we can do about it. it may be of interest to know the extent of the bias. When . given by bn (θ ) = E() ˆ −θ Thus. ASYMPTOTICALLY UNBIASED ESTIMATOR. ˆ based on a sample of size n from a given population. for Example 1 the bias is 1√ 1 nθ + n −θ 2 √ − θ = √2 n+ n n+1 and it can be seen that it tends to be small when θ is close to 1 2 and also when n is large. is a biased esti- mator of θ . then E(X) = μ and var(X) = σn · N−n 2 N−1 ” that E(X) = 1 + δ Z δ and hence that X is a biased estimator of δ. Xn constitute a random sample from a uniform population with α = 0. we say that  ˆ is an asymptotically unbiased estimator of ␪ if and only if lim bn (θ ) = 0 n→q As far as Example 3 is concerned. DEFINITION 3. . the nth order statistic. EXAMPLE 4 If X1 . Also. The following is another example where a minor modification of an estimator leads to an estimator that is unbiased. Letting bn (␪) = E() ˆ −␪ express the bias of an estimator  based on a random sample of size n from a ˆ given distribution. 285 . . the bias is (1 + δ) − δ = 1. q μ= x · e−(x−δ) dx = 1 + δ δ it follows from the theorem “If X is the mean of a random sample of size n taken without replacement from a finite population of size N with the mean μ and the variance σ 2 . it follows that E(X − 1) = δ and hence that X − 1 is an unbiased estimator of δ. X2 . show that the largest sample value (that is. modify this estimator of β to make it unbiased. Since E(X) = 1 + δ. . . Yn ) is a biased estimator of the parameter β. 0 elsewhere we find that the sampling distribution of Yn is given by . Point Estimation Solution n θ · e−yn /θ [1 − e−yn /θ ]n−1 for yn > 0 Substituting into the formula for gn (yn ) = . yn n−1 1 1 gn (yn ) = n · · dx β 0 β n = n · yn−1 n β for 0 < yn < β and gn (yn ) = 0 elsewhere. and hence that . we can explain why we divided by n − 1 and not by n when we defined the sample variance: It makes S2 an unbiased estimator of σ 2 for random samples from infinite populations. E(Yn ) Z β and the nth order statistic is a biased estimator of the parameter β. If S2 is the variance of a random sample from an infinite pop- ulation with the finite variance σ 2 . β n E(Yn ) = n · ynn dyn β 0 n = ·β n+1 Thus. ⎡ ⎤ 1 n E(S2 ) = E ⎣ · (Xi − X)2 ⎦ n−1 i=1 ⎡ ⎤ 1 n = · E ⎣ {(Xi − μ) − (X − μ)} ⎦ 2 n−1 i=1 ⎡ ⎤ 1  n = ·⎣ E{(Xi − μ)2 } − n · E{(X − μ)2 }⎦ n−1 i=1 σ2 Then. then E(S2 ) = σ 2 . since E{(Xi − μ)2 } = σ 2 and E{(X − μ)2 } = . since n+1 n+1 n E · Yn = · ·β n n n+1 =β n+1 it follows that times the largest sample value is an unbiased estimator of the n parameter β. Proof By definition of sample mean and sample variance. However. THEOREM 1. it follows that n ⎡ ⎤ 1  n σ 2 E(S2 ) = ·⎣ σ2 −n· ⎦ = σ2 n−1 n i=1 286 . As unbiasedness is a desirable property of an estimator. the greater the information. it can be shown under very general conditions (referred to in the references at the end of the chapter) that the variance of ˆ must satisfy the inequality 1 var() ˆ G  2  ⭸ ln f (X) n·E ⭸θ where f (x) is the value of the population density at x and n is the size of the random sample. that is. S. 
the quantity in the denominator is referred to as the information about θ that is supplied by the sample (see also Exercise 19). If  ˆ is an unbiased estimator of θ and 1 var() ˆ =  2  ⭸ ln f (X) n·E ⭸θ then  ˆ is a minimum variance unbiased estimator of θ . or the best unbiased estimator for ␪. It may not be retained under functional transformations. THEOREM 2. the Cramér–Rao inequality. The bias of S as an estimator of σ is discussed. Another difficulty associated with the concept of unbiasedness is that unbiased estimators are not necessarily unique. Point Estimation Although S2 is an unbiased estimator of the variance of an infinite population. it does not necessarily fol- low that ω() ˆ is an unbiased estimator of ω(θ ). The esti- mator with the smaller variance is “more reliable. leads to the following result. This inequality. n+1 For instance. The estimator for the parameter ␪ of a given distribution that has the smallest variance of all unbiased estimators for ␪ is called the minimum variance unbiased estimator.” DEFINITION 4. in Example 6 we shall see that · Yn is not the only unbiased esti- n mator of the parameter β of Example 4. 3 Efficiency If we have to choose one of several unbiased estimators of a given parameter. 287 . If  ˆ is an unbiased estimator of θ . Thus. we usually take the one whose sampling distribution has the smallest variance. and in neither case is S an unbiased estimator of σ . among others. and in Exercise 8 we shall see that X − 1 is not the only unbiased estimator of the parameter δ of Example 3. MINIMUM VARIANCE UNBIASED ESTIMATOR. it is not an unbiased estimator of the variance of a finite population. Here. Keeping listed among the references at the end of this chapter. if  ˆ is an unbiased estimator of θ . The discussion of the preceding paragraph illustrates one of the difficulties asso- ciated with the concept of unbiasedness. the smaller the variance is. in the book by E. 288 . Indeed. Solution Since  2 x−μ 1 − 12 f (x) = √ ·e σ for − q < x < q σ 2π it follows that 2 1 √ x−μ ln f (x) = − ln σ 2π − 2 σ so that ⭸ ln f (x) 1 x−μ = ⭸μ σ σ and hence  2   2  ⭸ ln f (X) 1 X −μ 1 1 E = 2 ·E = ·1 = 2 ⭸μ σ σ σ2 σ Thus. 1 1 σ2  2  = = ⭸ ln f (X) 1 n n·E n· ⭸μ σ2 σ2 and since X is unbiased and var(X) = . It would be erroneous to conclude from this example that X is a minimum vari- ance unbiased estimator of the mean of any population. we use the ratio var( ˆ 1) var( ˆ 2) as a measure of the efficiency of  ˆ 2 relative to  ˆ 1. in Exercise 3 the reader will be asked to verify that this is not so for random samples of size n = 3 from the continuous uniform population with α = θ − 12 and β = θ + 12 . unbiased estimators of one and the same parameter are usually compared in terms of the size of their variances. it follows that X is a minimum variance n unbiased estimator of μ. we say that  less than the variance of  ˆ 1 is relatively more efficient than ˆ 2 . Point Estimation EXAMPLE 5 Show that X is a minimum variance unbiased estimator of the mean μ of a nor- mal population. If  ˆ 1 and ˆ 2 are two unbi- ased estimators of the parameter θ of a given population and the variance of  ˆ 1 is ˆ 2 . Also. As we have indicated. 2X is an unbiased estimator of β. X2 . Solution β (a) Since the mean of the population is μ = according to the theorem “The 2 mean and the variance of the uniform distribution are given by μ = α+β 2 and σ = 12 (β − α) ” it follows from the theorem “If X1 . . n (a) Show that 2X is also an unbiased estimator of β. . 
then E(X) = μ and var(X) = σn ” that E(X) = β2 and hence that 2 E(2X) = β. . Xn constitute 2 1 2 a random sample from an infinite population with the mean μ and the vari- ance σ 2 . . . we get . Point Estimation EXAMPLE 6 In Example 4 we showed that if X1 . then · Yn is an unbiased estimator of β. Thus. (b) First we must find the variances of the two estimators. X2 . (b) Compare the efficiency of these two estimators of β. . Xn constitute a random sample from a n+1 uniform population with α = 0. . . Using the sampling dis- tribution of Yn and the expression for E(Yn ) given in Example 4. it can be shown that n+1 β2 var · Yn = n n(n + 2) β2 Since the variance of the population is σ 2 = according to the first stated 12 theorem in the example. for example. the relative efficiency is only 25 percent. and for n = 25 it is only 11 percent. it follows from the above (second) theorem that β2 var(X) = and hence that 12n β2 var(2X) = 4 · var(X) = 3n n+1 Therefore. For n = 10. the efficiency of 2X relative to · Yn is given by n n+1 β2 var · Yn n n(n + 2) 3 = = var(2X) β 2 n + 2 3n and it can be seen that for n > 1 the estimator based on the nth order statistic is much more efficient than the other one. 289 . β n n E(Yn2 ) = n· yn+1 n dyn = · β2 β 0 n+2 and 2 n n var(Yn ) = · β2 − ·β n+2 n+1 If we leave the details to the reader in Exercise 27. It is important to note that we have limited our discussion of relative efficiency to unbiased estimators. ˆ Exercises 1. a2 . we judge its merits and make efficiency comparisons on the basis of the mean square error E[( ˆ − θ )2 ] instead of the variance of . Given a random sample of size n from a population is an unbiased estimator of μ? that has the known mean μ and the finite variance σ 2 . . . . an so that 4. what condition must be imposed on the con. Xn constitute a random sample from a 3. what condition must be edition. If X1 . 290 . and for large samples  = πσ2 var(X) 4n Thus. Point Estimation EXAMPLE 7 When the mean of a normal population is estimated on the basis of a random sample of size 2n + 1. it is unbiased by virtue of the symmetry of the normal distribution about its mean. what is the efficiency of the median relative to the mean? Solution From the theorem on the previous page we know that X is unbiased and that σ2 var(X) = 2n + 1 As far as X is concerned. for large samples. 5. X2 . ˆ 1 + k2  k1  ˆ2 6. · (Xi − μ)2 stants k1 and k2 so that n i=1 is an unbiased estimator of σ 2 . the efficiency of the median relative to the mean is approxi- mately σ2 = 2n +21 = var(X) 4n var(X) πσ π(2n + 1) 4n and the asymptotic efficiency of the median with respect to the mean is 4n 2 lim = n→ q π(2n + 1) π or about 64 percent. if  ˆ is not an unbiased estima- tor of a given parameter θ . the mean requires only 64 percent as many observations as the median to estimate μ with the same reliability. . This question has been intentionally omitted for this is also an unbiased estimator of θ ? edition. If  ˆ 1 and ˆ 2 are unbiased estimators of the same 1  n parameter θ . This question has been intentionally omitted for this a1 X1 + a2 X2 + · · · + an Xn edition. . The result of the preceding example may be interpreted as follows: For large samples. This question has been intentionally omitted for this population with the mean μ. . imposed on the constants a1 . If we included biased estimators. show that 2. Therefore. 
we could always assure our- selves of an estimator with zero variance by letting its values equal the same constant regardless of the data that we may obtain. . . on is also given by the first order statistic. Xn constitute a random sample from a ⭸2 ln f (X) normal population with μ = 0. . X2 . 8. If X is a random variable having the binomial dis. The derivation of this formula takes the following steps: is an unbiased estimator of σ 2 . . . With reference to Example 3. show that −n · E ⭸θ 2  n X2 where f (x) is the value of the population density at x. 18. Point Estimation X +1 17. Y1 ). Y1 ). (a) Differentiating the expressions on both sides of 11. With reference to Example 4. . on the n+1 ple 4. the Cramér–Rao inequality is not first order statistic. n satisfied. Show that is a biased estimator of the binomial an exponential population is a minimum variance unbi- n+2 parameter θ . Show that the mean of a random sample of size n from 7. Is this estimator asymptotically unbiased? ased estimator of the parameter θ . . · Yn .   10. The information about θ in a random sample of size n tor of β based on the smallest sample value (that is. 9. find an unbiased estima- 19. Show that for the unbiased estimator of Exam- tor of δ based on the smallest sample value (that is. pro- i n vided that the extremes of the region for which f (x) Z 0 do i=1 not depend on θ . find an unbiased estima. If X1 . show that n . X f (x) dx = 1 tribution with the parameters n and θ . with respect to θ . show that n · · n X 1− is a biased estimator of the variance of X. See also Exercise 80. . . 21. find the the estimator of part (a) with ω = 12 relative to this esti- constants a1 and a2 such that a1 ˆ 1 + a2  ˆ 2 is an unbiased mator with estimator with minimum variance for such a linear com. n the information given in Exercise 19. Show that the sample proportion is a minimum and the two samples are independent. Show that if  ˆ is an unbiased estimator of θ and a normal population with the mean μ and the variance var() ˆ 2 is not an unbiased estimator of θ 2 . ⭸ ln f (x) · f (x) dx = 0 ment from the finite population that consists of the posi. With reference to Exercise 21. is an unbiased X estimator of μ. . (Hint: Treat as the mean of a random sample of size n (b) the variance of this estimator is a minimum when n from a Bernoulli population with the parameter θ . where 0 F ω F 1. Yn . If  ˆ 2 are independent unbiased estimators of 22. ˆ Z 0. X 2 is the mean of a random sample of size n from a X normal population with the mean μ and the variance σ22 . . by interchanging the order of integration and differen- is given by tiation. If a random sample of size n is taken without replace. n+1 20. Show that the mean of a random sample of size n is a σ22 ω= minimum variance unbiased estimator of the parameter σ12 + σ22 λ of a Poisson population. σ1 + σ2 291 . θ . Rework Example 5 using the alternative formula for (b) · Yn − 1 is an unbiased estimator of k. . find the efficiency of a given parameter θ and var( ˆ 1 ) = 3 · var( ˆ 2 ). ⭸θ tive integers 1. show that n variance unbiased estimator of the binomial parameter (a) ω · X 1 + (1 − ω) · X 2 . (b) Differentiating again with respect to θ . 2. then  σ12 . k. σ2 ω= 2 2 2 bination. k.) 15. show that (a) the sampling distribution of the nth order statistic. . show that yn − 1 n−1  2    f (yn ) = ⭸ ln f (X) ⭸2 ln f (X) k E = −E n ⭸θ ⭸θ 2 for yn = n. 12. 14. ˆ 1 and  16. . . If X 1 is the mean of a random sample of size n from 13. 
we can determine these variances for ran- n1 dom samples of size 3 from the uniform population is a minimum when ω = . relative to X. ˆ 29. ω = 1 − 10−100 . we showed that X − 1 is 32. and ˆ 3 = are estimators n n+2 3 an unbiased estimator of δ. and σ is very small.ˆ2 = . of the unbiased estimator 30. and var(X) = 12 1 26. 292 . if ω is very close to 1. or its mean square error. was asked to find another unbiased estimator of δ based for what values of n is on the smallest sample value. Verify the result given for var · Yn in Ex- n ample 6. (b) n = 3. is a good indication of its chance fluctuations. the probability that a random variable having this distribution will take on a value that is very close to θ . n1 + n2 by referring instead to the uniform population 25. since the variance of the Cauchy distribution does not exist. Find the efficiency of the (a) the mean square error of  ˆ 2 less than the variance first of these two estimators relative to the second. Show that if  ˆ is a biased estimator of θ . Evidently. X X +1 1 28. The fact that these mea- sures may not provide good criteria for this purpose is illustrated by the following example: Suppose that we want to estimate on the basis of one observation the parameter θ of the population given by  2 1 −1 x−θ 1 1 f (x) = ω · √ · e 2 σ + (1 − ω) · · σ 2π π 1 + (x − θ )2 for −q < x < q and 0 < ω < 1. With reference to Exercise 12. If X 1 and X 2 are the means of independent random of this estimator relative to the one of part (b) of Exer- samples of sizes n1 and n2 from a normal population with cise 12 for the mean μ and the variance σ 2 . If  ˆ1 = . this population is a combination of a nor- mal population with the mean θ and the variance σ 2 and a Cauchy population with α = θ and β = 1. find the efficiency of 2Y1 n = 3. say. show that the variance (a) n = 2. 3 Show that E(X) = 12 . var(X) = 361 . and find the efficiency of 1 ? ˆ 4 Consistency In the preceding section we assumed that the variance of an estimator. and X3 constitute a random sample of size n = 3 from a normal population with the mean μ and the X1 + 2X2 + X3 1 for 0 < x < 1 variance σ 2 . If X1 and X2 constitute a random sample of size n = 2 for this population so that for a random sample of size from an exponential population. Since the variances of the mean and the midrange ω · X 1 + (1 − ω) · X 2 are not affected if the same constant is added to each observation. Point Estimation 23. say. find the efficiency of relative f (x) = 4 0 elsewhere X1 + X2 + X3 to as estimates of μ. Yet. of 1 . E(X 2 ) = 13 . and in Exercise 8 the reader of the parameter θ of a binomial population and θ = 12 . σ = 10−100 . show that 2X − 1 is (b) the mean square error of  ˆ 3 less than the variance also an unbiased estimator of k. and hence is a very good estimate of θ . where Y1 is the first order statistic and 2Y1 and X are both unbiased estimators of the parameter θ . n1 + n2 ⎧ ⎨ 1 1 24. With reference to Example 3. 31. neither will the variance of this estimator. then n+1 ˆ − θ )2 ] = var() E[( ˆ + [b(θ )]2 27. find the efficiency of 1 for θ − <x<θ + f (x) = 2 2 the estimator with ω = 12 relative to the estimator with ⎩0 elsewhere n1 ω= . X2 . With reference to Exercise 23. is prac- tically 1. Now. If X1 . is an immediate consequence of Chebyshev’s theorem. the estimators will take on values that are very close to the respective parameters. Point Estimation The example of the preceding paragraph is a bit farfetched. DEFINITION 5. a limiting property of an estimator. 
but it suggests that we pay more attention to the probabilities with which estimators will take on values that are close to the parameters that they are supposed to estimate. for large n. The kind of convergence expressed by the limit in Definition 5 is generally called convergence in probability. In practice. The statistic  ˆ is a consistent estimator of the parameter ␪ of a given distribution if and only if for each c > 0 lim P(| ˆ − θ | < c) = 1 n→q Note that consistency is an asymptotic property. we see that when n→q the probability approaches 1 that X will take on a value that differs from the mean of the population sampled by less than any arbitrary constant c > 0. Solution Since S2 is an unbiased estimator of σ 2 in accordance with Theorem 3. then  ˆ is a consistent estimator of θ . In both of these examples we are practically assured that. EXAMPLE 8 Show that for a random sample from a normal population. Definition 5 says that when n is sufficiently large. we find that for a random sample from a normal population 2σ 4 var(S2 ) = n−1 293 . it remains to be shown that var(S2 )→0 as n→q. X Based on Chebyshev’s theorem. is a consistent estimator of the binomial n parameter θ and X is a consistent estimator of the mean of a population with a finite variance. we can be practically certain that the error made with a consistent estimator will be less than any small preassigned positive constant. Formally. the sample variance S2 is a consistent estimator of σ 2 . Basing our argu- ment on Chebyshev’s theorem. this concept of “closeness” is expressed by means of the following definition of consistency. THEOREM 3. If  ˆ is an unbiased estimator of the parameter θ and var()→0 ˆ as n→q. CONSISTENT ESTIMATOR. which. that is. in fact. Informally. Also using Chebyshev’s theorem. Referring to the theorem “the random variable (n−1)S2 σ2 has a chi-square distribution with n − 1 degrees of freedom”. when n→q the probability approaches 1 that the X sample proportion will take on a value that differs from the binomial parameter n θ by less than any arbitrary constant c > 0. we can often judge whether an estimator is consistent by using the following sufficient condition. Point Estimation It follows that var(S2 )→0 as n→q. we find that the sampling distribution of Y1 is given by . EXAMPLE 9 With reference to Example 3. Solution Substituting into the formula for g1 (y1 ). and we have thus shown that S2 is a consistent estimator of the variance of a normal population. the first order statistic Y1 ) is a consistent estimator of the parameter δ.” This is illustrated by the following example. It is of interest to note that Theorem 3 also holds if we substitute “asymptotically unbiased” for “unbiased. show that the smallest sample value (that is. n Furthermore. Based on this result. n−1 q −(y1 −δ) −(x−δ) g1 (y1 ) = n · e · e dx y1 = n · e−n(y1 −δ) for y1 > δ and g1 (y1 ) = 0 elsewhere. it can easily be shown 1 that E(Y1 ) = δ + and hence that Y1 is an asymptotically unbiased estimator of δ. P(|Y1 − δ| < c) = P(δ < Y1 < δ + c) . or even asymptotically unbiased. x2 . x2 . if all the knowledge about θ that can be gained from the individual sample values and their order can just as well be gained from the value of ˆ alone. 5 Sufficiency An estimator  ˆ is said to be sufficient if it utilizes all the information in a sample relevant to the estimation of θ . . Theorem 3 provides a sufficient condition for the consistency of an estimator. . it follows from Definition 5 that Y1 is a consistent estima- n→q tor of δ. . 
xn ) f (x1 . . we can describe this property of an estimator by referring to the con- ditional probability distribution or density of the sample values given  ˆ = θ̂. . θ̂ ) f (x1 . . which is given by f (x1 . . . δ+c = n · e−n(y1 −δ) dy1 δ = 1 − e−nc Since lim (1 − e−nc ) = 1. xn |θ̂ ) = = g(θ̂ ) g(θ̂ ) 294 . xn . . . . . that is. This is illustrated by Exercise 41. It is not a necessary condition because consistent estimators need not be unbiased. Formally. x2 . . f (xi . Xn yielding ˆ = θ̂ will be just as likely for any value of θ . . . . . . and the knowledge of these sample values will help in the estimation of θ . Xn yielding ˆ = θ̂ will be more probable for some values of θ than for others. . n. θ ) = θ (1 − θ )n−x x and the transformation-of-variable technique yields   n 1 g(θ̂ ) = θ nθ̂ (1 − θ )n−nθ̂ for θ̂ = 0. . . . θ ) = θ xi (1 − θ )1−xi for xi = 0. . On the other hand. x2 . 2. since X = X1 + X2 + · · · + Xn is a binomial random variable with the parameters θ and n. and the knowledge of these sample values will be of no help in the estimation of θ . Point Estimation If it depends on θ . EXAMPLE 10 If X1 . . its distribution is given by   n x b(x. 1 nθ̂ n 295 . if it does not depend on θ . then particular values of X1 . DEFINITION 6. Xn . Xn constitute a random sample of size n from a Bernoulli population. . A random variable X has a Bernoulli distribution and it is referred to as a Bernoulli random variable if and only if its probability distribution is given by f (x. SUFFICIENT ESTIMATOR. . . X2 . . Solution By the definition “BERNOULLI DISTRIBUTION. X2 . . Also. then particular values of X1 . θ ) = θ x (1 − θ )1−x for x = 0. . show that X + X2 + · · · + Xn ˆ = 1 n is a sufficient estimator of the parameter θ . . . . n. . X2 . 1”. 1 so that  n f (x1 . . . X2 . . . . . . The statistic  ˆ is a sufficient estimator of the parameter ␪ of a given distribution if and only if for each value of  ˆ the condi- tional probability distribution or density of the random sample X1 . given ˆ = ␪. xn ) = θ xi (1 − θ )1−xi i=1  n  n xi n− xi =θ i=1 (1 − θ ) i=1 = θ x (1 − θ )n−x = θ nθ̂ (1 − θ )n−nθ̂ for xi = 0 or 1 and i = 1. is independent of ␪. x2 . 2. x3 . . x3 |y) = g(y) is not independent of θ for some values of X1 . θ̂ ) f (x1 . xn . x2 . . that ˆ = is a sufficient estimator of θ . . x2 . we get f (x1 . . 1) where f (x1 . 1. y = 16 (1 + 2 · 1 + 3 · 0) = 12 and 1 P X1 = 1. 0) + f (0. . x2 = 1. Thus. . and x3 = 0. . 0) = f (1. 0. x3 ) = θ x1 +x2 +x3 (1 − θ )3−(x1 +x2 +x3 ) 296 . this does not depend on θ and we have X shown. . and X3 . X3 = 0. therefore. x2 . X2 = 1. . let us consider the case where x1 = 1. 0|Y = = 2 1 P Y= 2 f (1. . x2 . n. . xn |θ̂ ) on the previous page. n EXAMPLE 11 Show that Y = 6 (X1 + 2X2 + 3X3 ) 1 is not a sufficient estimator of the Bernoulli parameter θ . Y = 1 2 f 1. 1. y) f (x1 . Point Estimation Now. xn ) = g(θ̂ ) g(θ̂ ) θ nθ̂ (1 − θ )n−nθ̂ =  n θ nθ̂ (1 − θ )n−nθ̂ nθ̂ 1 =  n nθ̂ 1 =  n x 1 =  n x1 + x2 + · · · + xn for xi = 0 or 1 and i = 1. substituting into the formula for f (x1 . X2 . Solution Since we must show that f (x1 . . . . 1. x2 . . . Evidently. . xn ) does not depend on θ . xn . it is usually easier to base it instead on the following factorization theorem. for instance. . . and h(x1 . . 0) = θ 2 (1 − θ ) and f (0. x2 . θ ) = g(θ̂ . Point Estimation for x1 = 0 or 1 and i = 1. 3. the book by Hogg and Tanis listed among the references at the end of this chapter. . 2. . A proof of this theorem may be found in more advanced texts. 
EXAMPLE 12 Show that X is a sufficient estimator of the mean μ of a normal population with the known variance σ 2 . x2 . xn . Here. . Solution Making use of the fact that  n n  x −μ 2  1 − 12 · i σ f (x1 . 0. . 0|Y = = 2 =θ 2 θ (1 − θ ) + θ (1 − θ )2 and it can be seen that this conditional probability depends on θ . . 1. . xn ) where g(θ̂ . θ ) depends only on θ̂ and θ . The statistic  ˆ is a sufficient estimator of the parameter θ if and only if the joint probability distribution or density of the random sample can be factored so that f (x1 . . 1. μ) = √ ·e i=1 σ 2π and that  n  n 2 (xi − μ) = [(xi − x) − (μ − x)]2 i=1 i=1  n  n = (xi − x)2 + (x − μ)2 i=1 i=1  n = (xi − x)2 + n(x − μ)2 i=1 297 . . x2 . . Since f (1. Because it can be very tedious to check whether a statistic is a sufficient estima- tor of a given parameter based directly on Definition 6. x2 . . it follows that 1 θ 2 (1 − θ ) f 1. see. 1) = θ (1 − θ )2 . . . let us illustrate the use of Theorem 4 by means of the following example. THEOREM 4. We have thus shown that Y = 16 (X1 + 2X2 + 3X3 ) is not a sufficient estimator of the parameter θ of a Bernoulli population. θ ) · h(x1 . when estimating the difference between the average weights of two kinds of frogs. but if we want to show that a statistic  ˆ is not a sufficient estimator of a given parame- ter θ . θ ] depends only on y and θ . . Point Estimation we get ⎧ 2 ⎫ ⎪ ⎨ √n ⎪ − 2 σ/√n ⎬ 1 x−μ f (x1 . it follows that X = X1 + X2 + · · · + Xn is also a sufficient estimator of the mean μ = nθ of a binomial population. . other methods of inference) are adversely affected by violations of underlying assumptions. as we shall see later. This was illustrated by Example 11. xn . it follows that X is a sufficient estimator of the mean μ of a normal population with the known variance σ 2 . X where we showed that  ˆ = is a sufficient estimator of the Bernoulli parameter n θ . provided y = u(θ̂ ) can be solved to give the single-valued inverse θ̂ = w(y). . . or when estimating the average income of a certain age group. Let us also mention the following important property of sufficient estimators. an estimator is said to be robust if its sampling dis- tribution is not seriously affected by violations of assumptions. whereas actually the population (income distribution) is highly skewed. respectively. xn ) where g[w(y). . As we already said. we may be 298 . and in general the difference μ1 − μ2 between the means of two populations.’s of two ethnic groups. special attention has been paid to a statistical property called robust- ness.Q. then any single-valued function Y = u(). we may use a method based on the assumption that we are sampling a normal population. the factorization theorem usually leads to easier solutions. In other words. According to Theorem 4. we have presented two ways of checking whether a statistic  ˆ is a sufficient estimator of a given parameter θ . Based on Definition 6 and Theorem 4. since we can write f (x1 . the difference between the mean I. If ˆ is a sufficient estimator of θ . . Also. and therefore of u(θ ). For instance. . ˆ not involving θ . xn . They may also pertain to the nature of the populations sampled or their parameters. This follows directly from Theorem 4. . x2 . If we apply this result to Example 10. . whereas actually we are sam- pling a Weibull population. Such violations are often due to outliers caused by outright errors made. . and the second factor does not involve μ. . 
μ) = √ ·e ⎪ ⎩ σ 2π ⎪ ⎭ ⎧ ⎫ ⎪  n−1 n  1  xi −x 2 ⎪ ⎨ 1 1 −2· σ ⎬ * √ √ · e i=1 ⎪ n σ 2π ⎩ ⎪ ⎭ where the first factor on the right-hand side depends only on the estimate x and the population mean μ. θ ] · h(x1 . 6 Robustness In recent years. θ ) = g[w(y). x2 . . It is indicative of the extent to which estimation procedures (and. say. we may think that we are sampling an exponential population. is also a sufficient estimator of θ . when estimating the average useful life of a certain electronic component. it is nearly always easier to proceed with Definition 6. x2 . in reading instruments or recording the data or by mistakes in experimental procedures. Substituting “asymptotically unbiased” for “unbi. we first take a random sample of size n. Xn constitute a random sample of size n being unbiased or even asymptotically unbiased. Xn constitute a random sample of size n of θ . . 47. When it comes to questions of robustness. and if the estimator of the variance of a normal population with the number we draw is 2. If X1 . 7 The Method of Moments As we have seen in this chapter. estimator of θ ? mator of the parameter θ ? 45. Show that the estimator of Exercise 5 is a sufficient of n slips of paper numbered from 1 through n. Therefore.. If X1 and X2 are independent random variables hav- the nth order statistic. With reference to the uniform population of Exam- ple 4. or methods. use this theorem to rework X1 + 2X2 + X3 is not a sufficient estimator of θ . show that the mean of the ased” in Theorem 3. Xn constitute a random sample of size n+1 n from an exponential population. . indeed. or n. show that Y = X1 + X2 + the following estimation procedure: To estimate the mean · · · + Xn is a sufficient estimator of the parameter θ . Use Definition 5 to show that Y1 . 44. we use as our estimator known mean μ. Show that this estimation procedure is form population with β = α + 1. If X1 and X2 constitute a random sample of size n = 2 X +1 from a Poisson population. . (Hint: Exercise 35. X1 + 2X2 tent estimator of the parameter θ . In reference to Exercise 43. when we speak of violations of underlying assumptions. Consider special values of X1 . Yn . After all. we are thus faced with all sorts of difficulties. . is the nth order statistic. show that is a sufficient estimator n1 + n2 36. otherwise. . . If X1 . 43. of a population with the finite variance σ 2 . X1 + X2 and θ and n2 . and X3 . what do we mean by “not seriously affected”? Furthermore. show that X is a consis. . and X3 constitute a random sample of size 40. To show that an estimator can be consistent without 48. consider from a geometric population. If X1 . is Xn a consistent esti. 3. mator of the binomial parameter θ . is a consistent estimator of the parameter α of a uni. 38. . show that is a consistent esti- n+2 sample is a sufficient estimator of the parameter λ. With reference to Exercise 36. Exercises 33. and for the most part they can be resolved only by computer simulations. . Point Estimation assuming that the two populations have the same variance σ 2 . it would seem desirable to have some general method. mathematically and otherwise. (a) consistent. there can be many different estimators of one and the same parameter of a population. use the definition of consistency to show that Yn . 46. cient estimator of the parameter θ . estimate n2 . the first order statis. X2 . 1 that Y1 − is a consistent estimator of the parameter 42. it should be clear that some violations are more serious than others. . 
a sufficient estimator of the parameter β? 39. . As should be apparent. n = 3 from a Bernoulli population. . from an exponential population. whereas in reality σ12 Z σ22 .) 41. the mean of the random sample. After referring to Example 4. most questions of robustness are difficult to answer. 34. show that X is a suffi- α. show that Y = ased” in Theorem 3. is a sufficient n1 + 2n2 37. X2 . we use the tic. Then we randomly draw one 49. . . Substituting “asymptotically unbiased” for “unbi. X2 . If X1 . that yield estimators with as many desirable 299 . . much of the language used in the preceding paragraph is relatively impre- cise. X2 . With reference to Exercise 33. Show that the estimator of Exercise 21 is consistent. 35. is a consistent estimator of the ing binomial distributions with the parameters θ and n1 parameter β. use Theorem 3 to show (b) neither unbiased nor asymptotically unbiased. X2 . which is historically one of the oldest methods. . . the method of moments. if a population has r parameters. EXAMPLE 13 Given a random sample of size n from a uniform population with β = 1. xn is the mean of their kth powers and it is denoted by m k . 2. In this section and in Section 8 we shall present two such methods. 2 2 α+1 x= 2 and we can write the estimate of α as α̂ = 2x − 1 EXAMPLE 14 Given a random sample of size n from a gamma population. . the method of moments consists of solving the system of equations mk = μk k = 1. use the method of moments to obtain a formula for estimating the parameter α. where m1 = x and μ1 = α+β α+1 = . The method of moments consists of equating the first few moments of a popu- lation to the corresponding moments of a sample. x2 . . thus getting as many equations as are needed to solve for the unknown parameters of the population. Solution The system of equations that we shall have to solve is m1 = μ1 and m2 = μ2 where μ1 = αβ and μ2 = α(α + 1)β 2 . SAMPLE MOMENTS. DEFINITION 7. Thus.  n xki mk i=1 = n Thus. Thus. Furthermore. m1 = αβ and m2 = α(α + 1)β 2 300 . The kth sample moment of a set of observations x1 . . use the method of moments to obtain formulas for estimating the parameters α and β. Point Estimation properties as possible. . and the method of maximum likelihood. Bayesian estimation will be treated briefly in Section 9. . r for the r parameters. . symbolically. Solution The equation that we shall have to solve is m1 = μ1 . solving for α and β. suppose that four letters arrive in somebody’s morning mail. we get the following formulas for estimating the two param- eters of the gamma distribution: (m1 )2 m2 − (m1 )2 α̂ = and β̂ = m2 − (m1 )2 m1  n  n xi x2i Since m1 = = x and m2 = i=1 i=1 . however. Fisher proposed a general method of estimation called the method of maximum likelihood. or functional form. of the population. we can write n n  n 2 (xi − x)2 nx i=1 α̂ = and β̂ =  n nx (xi − x)2 i=1 in terms of the original observations. and if we assume that each letter had the same chance of being misplaced. Point Estimation and. the total number of credit-card billings among the four letters received? Clearly. It is important to note. k must be two or three. A. we find that the probability of the observed data (two of the three remaining letters contain credit- card billings) is    2 2 2 1 1   = 4 2 3 for k = 2 and    3 1 2 1 3   = 4 4 3 301 . but unfortunately one of them is misplaced before the recipient has a chance to open it. two contain credit-card billings and the other one does not. 
In these examples we were concerned with the parameters of a specific popula- tion. If. that when the parameters to be estimated are the moments of the population. He also demon- strated the advantages of this method by showing that it yields sufficient estimators whenever they exist and that maximum likelihood estimators are asymptotically minimum variance unbiased estimators. R. 8 The Method of Maximum Likelihood In two papers published early in the last century. among the remaining three letters. To help to understand the principle on which the method of maximum like- lihood is based. then the method of moments can be used without any knowledge about the nature. what might be a good estimate of k. if we choose as our estimate of k the value that maximizes the probability of getting the observed data. . . Xn = xn . Xn at X1 = x1 . . Point Estimation for k = 3. In the discrete case. . xn . . . . the general idea applies also when there are several unknown parameters. f(x1 . EXAMPLE 15 Given x “successes” in n trials. . x2 . . x2 . . . . the essential feature of the method of maximum likelihood is that we look at the sample values and then choose as our estimates of the unknown parameters the values for which the probability or probability density of getting the sample val- ues is a maximum. and we refer to this function as the likelihood function. as we shall see in Example 18. but in that case f (x1 . x2 . xn . . . . . θ ) which is just the value of the joint probability distribution of the random variables X1 . . X2 . x2 . and the method by which it was obtained is called the method of maximum likelihood. . xn . Therefore. the likelihood function of the sample is given by L(θ ) = f (x1 . . . xn . . . . . X2 . An analogous definition applies when the random sample comes from a continuous population. . we regard f (x1 . X2 = x2 . . . Xn = xn ) = f (x1 . We call this estimate a maximum likelihood estimate. . but. Since the sample values have been observed and are therefore fixed numbers. . Solution To find the value of θ that maximizes   n x L(θ ) = θ (1 − θ )n−x x it will be convenient to make use of the fact that the value of θ that maximizes L(θ ) will also maximize   n ln L(θ ) = ln + x · ln θ + (n − x) · ln(1 − θ ) x 302 . . . Xn at X1 = x1 . If x1 . . We refer to the value of ␪ that maximizes L(␪) as the maximum likelihood estimator of ␪. find the maximum likelihood estimate of the param- eter θ of the corresponding binomial distribution. . . . . . θ ) for values of ␪ within a given domain. we shall limit ourselves to the one-parameter case. X2 = x2 . . if the observed sample values are x1 . we obtain k = 3. Xn = xn . MAXIMUM LIKELIHOOD ESTIMATOR. . xn are the values of a random sample from a population with the parameter ␪. xn . X2 = x2 . θ ) as a value of a function of θ . DEFINITION 8. . xn . X2 = x2 . . X2 . ␪) is the value of the joint probability distribution or the joint probability density of the random variables X1 . the probability of getting them is P(X1 = x1 . Xn at X1 = x1 . . . In what follows. . x2 . . Xn = xn . x2 . . . x2 . Here. . . . . . . . Thus. . θ ) is the value of the joint probability density of the random variables X1 . . . . x2 . we get the maximum likelihood estimate 1  n θ̂ = · xi = x n i=1 Hence. . find the maximum likelihood estimator of its parameter θ . x2 . . θ )  n = f (xi . EXAMPLE 16 If x1 . . we get d[ln L(θ )] x n−x = − dθ θ 1−θ and. xn are the values of a random sample of size n from a uniform popula- tion with α = 0 (as in Example 4). 
we find that the likelihood func- x tion has a maximum at θ = . . . β) = β i=1 303 . Now let us consider an example in which straightforward differentiation cannot be used to find the maximum value of the likelihood function. EXAMPLE 17 If x1 . Solution The likelihood function is given by  n n 1 L(β) = f (xi . This is the maximum likelihood estimate of the n X binomial parameter θ . Point Estimation Thus. . Solution Since the likelihood function is given by L(θ ) = f (x1 . the maximum likelihood estimator is  ˆ = X. . . . x2 . θ ) i=1   n − 1  n 1 θ xi = ·e i=1 θ differentiation of ln L(θ ) with respect to θ yields 1  n d[ln L(θ )] n =− + 2· xi dθ θ θ i=1 Equating this derivative to zero and solving for θ . xn . xn are the values of a random sample from an exponential population. . find the maximum likelihood estimator of β. and we refer to  ˆ = as the corresponding maximum like- n lihood estimator. equating this derivative to 0 and solving for θ . μ. σ ) = n(xi . X2 . we get 1  n μ̂ = · xi = x n i=1 and equating the second of these partial derivatives to zero and solving for σ 2 after substituting μ = x. and it follows that the maximum likelihood estimator of β is Yn . . Xn constitute a random sample of size n from a normal population with the mean μ and the variance σ 2 . σ 2 ) with respect to μ and σ 2 yields 1  n ⭸[ln L(μ. Point Estimation for β greater than or equal to the largest of the x’s and 0 otherwise. σ 2 )] = 2· (xi − μ) ⭸μ σ i=1 and 1  n ⭸[ln L(μ. the ones of Examples 15 and 16 were unbiased. σ ) i=1  n 1  n 1 − · (xi −μ)2 2σ 2 i=1 = √ ·e σ 2π partial differentiation of ln L(μ. Since the value of this likelihood function increases as β decreases. σ 2 )] n = − + · (xi − μ)2 ⭸σ 2 2σ 2 2σ 4 i=1 Equating the first of these two partial derivatives to zero and solving for μ. we get 1  n σ̂ 2 = · (xi − x)2 n i=1 304 . we find that maximum likelihood estimators need not be unbiased. However. In that case we must find the values of the parameters that jointly maximize the likelihood function. Comparing the result of this example with that of Example 4. . . find joint maximum likelihood estimates of these two parameters. . the nth order statistic. Solution Since the likelihood function is given by  n 2 L(μ. EXAMPLE 18 If X1 . The method of maximum likelihood can also be used for the simultaneous esti- mation of several parameters of a given population. we must make β as small as possible. find formulas for estimating ⎧ ⎪ 1 its parameter θ by using ⎪ x−δ ⎨ · e− θ for x > δ (a) the method of moments. X2 . and 18 we maximized the logarithm of the likelihood func- tion instead of the likelihood function itself. Use the method of maximum likelihood to rework lation with β = 1. . . Given a random sample of size n from a beta popu- 59. . θ . . Given a random sample of size n from a Poisson popu. X2 . parameter exponential distribution. . Given a random sample of size n from a normal pop- ulation with the known mean μ. . . from a population given by ⎧ 61. If n0 of them take on the value 0. It just so happened that it was convenient in each case. X2 . only that σ̂ 2 is a maximum likelihood estimate of σ 2 . use the method of moments to find a Exercise 53. Xn constitute a random sample of size n from a population given by 63. If X1 . X2 . population with the mean μ and the variance σ 2 . and for θ = 1 it is the distribution of Example 3. is a maximum likelihood estimate of σ . ⎩0 elsewhere 62. In Examples 15. θ ) = θ2 maximum likelihood to find a formula for estimating β. . It follows that ! 
" "1  n σ̂ = # · (xi − x)2 n i=1 which differs from s in that we divide by n instead of n − 1. estimating θ . 56. on the value 1. Xn constitute a random sample of size n Exercise 54. hood estimator for σ . . identical binomial distributions with the parameters θ and n = 3. 60. use the method of moments to obtain an estimator value 3. use the method of moments to find a formula for for the parameter λ. 54. g(x. If X1 . If X1 . find the maximum likeli- find an estimator for θ by the method of moments. Xn constitute a random sample of size n from a geometric population. . Consider N independent random variables having of moments. n2 take on the value 2. δ) = θ (b) the method of maximum likelihood. . Exercises 50. . use the This distribution is sometimes referred to as the two- method of moments to find estimators for μ and σ 2 . formula for estimating the parameter α. . uniform population. use the method of moments to find an 57. If X1 . 52. then g() ˆ is also a maximum likelihood estimator of g(θ ). Given a random sample of size n from a continuous estimator of the parameter θ . . Point Estimation It should be observed that we did not show that σ̂ is a maximum likelihood esti- mate of σ . Xn constitute a random sample from a find estimators for δ and θ by the method of moments. Xn constitute a random sample of size n ⎨ 2(θ − x) from a gamma population with α = 2. and n3 take on the lation. ⎪ ⎪ 0 elsewhere ⎩ 305 . n1 take 53. it can be shown (see reference at the end of this chapter) that maximum likelihood estima- tors have the invariance property that if  ˆ is a maximum likelihood estimator of θ and the function given by g(θ ) is continuous. use the method of for 0 < x < θ f (x. X2 . . Use the method of maximum likelihood to rework 55. use the method of moments to find formulas for estimating the parameters α and β. but this is by no means necessary. . find an estimator for β by the method 58. 51. If X1 . . Given a random sample of size n from an exponen- tial population. . Given a random sample of size n from a uniform pop- ulation with α = 0. 16. . However. . we determine the posterior distribution of  by means of the formula f (θ . if h(θ ) is the value of the prior distribution of  at θ and we want to combine the information that it conveys with direct sample evidence about . and this is accomplished by deter- mining ϕ(θ |x). Let X1 . of θ : (a) 12 (Y1 + Yn ). Wn are independent be unique. 72. W2 . Given a random sample of size n from a Pareto pop. Use the method of maximum likelihood to rework Exercise 56. find an estimator for its parameter α by the pendent random samples of sizes n1 and n2 from method of maximum likelihood. ⎧ ⎨ 1 1 1 for θ − <x<θ + 67. . . and σ 2 . 68. . 9 Bayesian Estimation† So far we have assumed in this chapter that the parameters that we want to estimate are unknown constants. this conditional distribution (which also reflects the direct sample evidence) is called the posterior distribution of . use the method of maximum likelihood to find a formula for estimating its parameter α. random samples of size n from normal populations with the means μ1 = α + β and μ2 = α − β and the common 73. Use the method of maximum likelihood to rework f (x. . find the maximum 1 1 Yn − F  ˆ F Y1 + likelihood estimator for 2 2 (a) β. Note that the preceding formula for ϕ(θ |w) is. ulation. . . Vn and W1 . find maximum likelihood estimators for lowing estimators are maximum likelihood estimators α and β. . μ2 . . . an extension of Bayes’ theorem to the continuous case. Vn1 and W1 . 
Show that if Y1 and Yn are the first and nth order statistic. in fact.” † This section may be omitted with no loss of continuity. The main problem of Bayesian estimation is that of combining prior feelings about a parameter with direct sample evidence. Hence. . Use the method of maximum likelihood to rework Exercise 58. f (θ . (b) τ = (2β − 1)2 . X2 . . usually reflecting the strength of one’s belief about the possible values that they can assume. the conditional density of  given X = x. check whether the fol- variance σ 2 = 1. w) h(θ ) · f (w|θ ) ϕ(θ |w) = = g(w) g(w) Here f (w|θ ) is the value of the sampling distribution of W given  = θ at w. . Given a random sample of size n from a gamma pop- ulation with the known parameter α. Point Estimation 64. find maximum likelihood estima- 65. Xn be a random sample of size n from the uniform population given by 66. . X2 . . In general. 306 . V2 . Wn2 are inde- population. any estimator ˆ such that 69. Given a random sample of size n from a Rayleigh 71. . With reference to Exercise 72. If V1 . and g(w) is the value of the marginal distribution of W at w. . . . . Xn ). tors for μ1 . . This shows that maximum likelihood estimators need not 70. in Bayesian estimation the parameters are looked upon as random variables having prior distributions. (b) 13 (Y1 + 2Y2 ). θ ) = 2 2 ⎩0 elsewhere Exercise 57. V2 . the term “Bayesian estimation. . w) is the value of the joint distribution of  and W at θ and w. the value of a statistic W = u(X1 . If V1 . normal populations with the means μ1 and μ2 and the common variance σ 2 . can serve as a maximum likelihood estimator of θ . In contrast to the prior distribution of . . . for instance. W2 . . If X is a binomial random variable and the prior distribution of  is a beta distribution with the parameters α and β. 1. Proof For  = θ we have   n x f (x|θ ) = θ (1 − θ )n−x for x = 0. . . we shall limit our discussion here to infer- ences about the parameter  of a binomial population and the mean of a normal population. and f (θ . 2. . THEOREM 5. . To obtain the marginal density of X. inferences about the parameter of a Poisson population are treated in Exercise 77. then the posterior distribution of  given X = x is a beta distribution with the parameters x + α and n − x + β. . let us make use of the fact that the integral of the beta density from 0 to 1 equals 1. x) = 0 elsewhere. or it can be used to make probability statements about the parameter. . as will be illustrated in Example 20. x) = · θ α−1 (1 − θ )β−1 * θ (1 − θ )n−x (α) · (β) x   n (α + β) = · · θ x+α−1 (1 − θ )n−x+β−1 x (α) · (β) for 0 < θ < 1 and x = 0. it can be used to make estimates. n x ⎧ ⎨ (α + β) · θ α−1 (1 − θ )β−1 for 0 < θ < 1 h(θ ) = (α) · (β) ⎩ 0 elsewhere and hence   (α + β) n x f (θ . 2. . . . 1. Although the method we have described has extensive applications. that is. n. Point Estimation Once the posterior distribution of a parameter has been obtained. and hence (n + α + β) ϕ(θ |x) = · θ x+α−1 (1 − θ )n−x+β−1 (α + x) · (n − x + β) for 0 < θ < 1. 1. this is a beta density with the parameters x + α and n − x + β. . . n. . 307 . and ϕ(θ |x) = 0 elsewhere. . As can be seen by inspection. 1 (α) · (β) xα−1 (1 − x)β−1 dx = 0 (α + β) Thus. we get   n (α + β) (α + x) · (n − x + β) g(x) = · · x (α) · (β) (n + α + β) for x = 0. when the loss function is given by L[d(x). where nxσ02 + μ0 σ 2 1 n 1 μ1 = and = + nσ02 + σ 2 σ12 σ 2 σ02 Proof For M = μ we have 2 √ x−μ n − 12 σ/√ f (x|μ) = √ · e n for −q < x < q σ 2π 308 . 
the minimum variance unbiased estimate of θ (see Exercise 14) would be the sample proportion x 42 θ̂ = = = 0. Solution Substituting x = 42.41 40 + 40 + 120 Note that without knowledge of the prior distribution of . then the posterior distribution of M given X = x is a normal distribution with the mean μ1 and the variance σ12 . that is. we get 42 + 40 E(|42) = = 0. α = 40.35 n 120 THEOREM 6. If X is the mean of a random sample of size n from a normal population with the known variance σ 2 and the prior distribution of M (capital Greek mu) is a normal distribution with the mean μ0 and the variance σ02 . EXAMPLE 19 Find the mean of the posterior distribution as an estimate of the “true” probability of a success if 42 successes are obtained in 120 binomial trials and the prior distribution of  is a beta distribution with α = β = 40. θ ] = c[d(x) − θ ]2 where c is a positive constant. Since the posterior distribution of  is a beta distribu- tion with parameters x + α and n − x + β. and β = 40 into the formula for E(|x). Point Estimation To make use of this theorem. it follows from the theorem “The mean and the variance of the beta distribution are given by μ = α+β α and σ 2 = (α+β)2αβ(α+β+1) ” that x+α E(|x) = α+β +n is a value of an estimator of θ that minimizes the Bayes risk when the loss function is quadratic and the prior distribution of  is of the given form. let us refer to the result that (under very general conditions) the mean of the posterior distribution minimizes the Bayes risk when the loss function is quadratic. n = 120. it can be written as   μ−μ1 2 1 − 12 ϕ(μ|x) = √ ·e σ1 for −q < μ < q σ1 2π where μ1 and σ1 are defined above. the posterior distri- bution of M becomes √ − 2 (μ−μ1 ) 1 n · eR 2 ϕ(μ|x) = · e 2σ1 for −q < μ < q 2π σ σ0 g(x) which is easily identified as a normal distribution with the mean μ1 and the variance σ12 . Point Estimation and   μ−μ0 2 1 − 12 h(μ) = √ ·e σ0 for −q < μ < q σ0 2π so that h(μ) · f (x|μ) ϕ(μ|x) = g(x) 2   √ x−μ μ−μ0 2 n − 12 √ σ/ n − 12 σ0 = ·e for − q < μ < q 2π σ σ0 g(x) Now. the mean will vary somewhat from market to market. and complete the square. Note that we did not have to deter- mine g(x) as it was absorbed in the constant in the final result. x. the number of drinks sold will vary from week to 309 .4. Hence. Thus. EXAMPLE 20 A distributor of soft-drink vending machines feels that in a supermarket one of his machines will sell on the average μ0 = 738 drinks per week. but not μ. if we collect powers of μ in the exponent of e. μ0 . we get       1 n 1 2 nx μ0 1 nx2 μ20 − + μ + + μ− + 2 2 σ 2 σ02 σ 2 σ02 2 σ2 σ0 and if we let 1 n 1 nxσ02 + μ0 σ 2 = + and μ1 = σ12 σ 2 σ02 nσ02 + σ 2 1 factor out − . and σ0 . the exponent of e in the expres- 2σ12 sion for ϕ(μ|x) becomes 1 − 2 (μ − μ1 )2 + R 2σ1 where R involves n. As far as a machine placed in a particular market is concerned. σ . and the distributor feels that this variation is measured by the standard deviation σ0 = 13. Of course. 645. This question has been intentionally omitted for this 76. or approximately 0.692(13.53 9. that is.4429 + 0.2019 = 0.6448.2019 0.5 Thus.5)2 (13.53 Figure 1. week.58 z  0.4)2 + (42.0111 σ12 (42.4429 m 700 715 720 z  1. Point Estimation 0.4)2 so that σ12 = 90.4. 310 .5)2 μ1 = = 715 10(13. we find that substitution into the two formulas of Theorem 6 yields 10. the answer to our question is given by the area of the shaded region of Figure 1. given in Theorem 6 can be written as 75. Show that the mean of the posterior distribution of M edition.5 9. 
and this variation is measured by the standard deviation σ = 42.5)2 and 1 10 1 = + = 0.4)2 + 738(42. If one of the distributor’s machines put into a new supermarket averaged x = 692 during the first 10 weeks. the probability that the value of M is between 700 and 720 is 0.5. Exercises 74. Now. This question has been intentionally omitted for this μ1 = w · x + (1 − w) · μ0 edition. what is the probability (the distributor’s personal probability) that for this market the value of M is actually between 700 and 720? Solution Assuming that the population sampled is approximately normal and that it is reason- able to treat the prior distribution of M as a normal distribution with the mean μ0 and the standard deviation σ0 = 13. Diagram for Example 20. the area under the standard normal curve between 700 − 715 720 − 715 z= = −1.0 and σ1 = 9.5.58 and z= = 0. As per the theorem “If X has a binomial distribution with the parameters n and θ and Y = Xn . n+ β +1 σ02 (b) the mean of the posterior distribution of  is β(α + x) 77. n.03. then E(Y) = θ and σY2 = θ(1−θ) n ” we know θ (1 − θ ) that σX/n = 2 . assuming that n will turn out to be large. If the sample size. Point Estimation that is. In spite of these desirable properties of the sample mean as an estimator for a population mean. show that n (a) the posterior distribution of  given X = x is a w= β σ2 gamma distribution with the parameters α + x and . E = |x − μ|. How many registered voters should she interview? Solution We shall use the normal approximation to the binomial distribution. x. is large. It has been shown to be the minimum variance unbiased estimator as well as a sufficient estimator for the mean of a normal distribution. the maximum value of σ is √ . She wishes to be sure with a probability of 0. It is at least asymptotically unbiased as an estimator for the mean of most frequently encountered distributions.95 that the error in the resulting estimate will not exceed 3 percent. we can state with probability 1 − α that |x − μ| √ … zα/2 σ/ n or σ E … zα/2 √ n EXAMPLE 21 A pollster wishes to estimate the percentage of voters who favor a certain candidate. as a weighted mean of x and μ0 . we know that the sample mean will never equal the population mean. Thus. Since the 2 2 n maximum error is to be 0. is most frequently used to estimate the mean of a distribution from a random sample taken from that distribution. the quantity x−μ √ σ/ n is a value of a random variable having approximately the standard normal distribu- tion. the inequality for E can be written as 1 E … zα/2 · √ 2 n 311 . If X has a Poisson distribution and the prior dis. μ1 = tribution of its parameter  (capital Greek lambda) β +1 10 The Theory in Practice The sample mean. where is a gamma distribution with the parameters α and β. Since n 1 1 this quantity is maximized when θ = . Let us examine the error we make when using x to estimate μ. where θ is the parameter of the binomial distribution. we always round up to the nearest integer.000. the difference between the expected value of the estimate of the parameter θ and its true value. the sam- pling bias usually is unknown. This can be done by carefully calibrating all instruments to be used in measuring the sample units. and solving this inequality for n. such as temperature and humid- ity. Thus. ˆ We can write MSE() ˆ − θ )2 ˆ = E( = E[ ˆ − E() ˆ − θ ]2 ˆ + E() = E[ ˆ 2 + [E() ˆ − E()] ˆ − θ ]2 + 2{E[ ˆ − E()][E( ˆ ) ˆ − θ ]} The first term of the cross product. 
The mean square error defined above can be viewed as the expected squared error loss encountered when we estimate the parameter θ with the point estimator . that the error in the resulting estimate will not exceed 3 percent z2 α/2 (1. with probability 0.) It should not be surprising in view of this result that most such polls use sample sizes of about 1. we can write ˆ = σ 2ˆ + [Bias]2 MSE()  While it is possible to estimate the variance of  ˆ in most applications. because it omits people who do not own cars. Point Estimation Noting that zα/2 = z. For example. in performing this calculation. and we are left with MSE() ˆ = E[ ˆ 2 + [E() ˆ − E()] ˆ − θ ]2 The first term is readily seen to be the variance of ˆ and the second term is the square of the bias. Great care should be taken to avoid. we obtain for the sample size that assures. for it can be much greater than the sampling variance σ2ˆ . Such people may well have different opinions than those who do. or at least minimize sampling bias. Sampling bias occurs when a sample is chosen that does not accurately represent the population from which it is taken.96.03)2 (Note that. referenced at the end of this chapter.025 = 1.95. a national poll based on automobile registrations in each of the 50 states proba- bly is biased. E[ ˆ − E()] ˆ = E() ˆ − E() ˆ = 0. 312 . and by assuring that the method of sampling is appropriately randomized over the entire popula- tion for which sampling estimates are to be made.068 4E2 4(. These and other related issues are more thoroughly discussed in the book by Hogg and Tanis. by eliminating human subjectivity as much as possible. A sample of product stored on shelves in a warehouse is likely to be biased if all units of product in the sample were selected from the bottom shelf. Another consideration related to the accuracy of a sample estimate deals with the concept of sampling bias. Ambient conditions.96)2 n… = = 1. may well have affected the top-shelf units differently than those on the bottom shelf. The output of a certain integrated-circuit production tain light bulbs had useful lives of 415. x1 = 27. a farmer got her tractor started mated by the capture–recapture method. and 5. ation of 0. and 155. Assuming that Over a long period of time. $31. first. 479. estimate these 0. If on a certain day the sample contains 313 . 7. use the estimators obtained eter α. fourth.8.700. a city’s consumption $22. 38. 105. $23. 20 percent.200. three races. Assuming that these data race track. Assuming that these data can be parameters by means of the maximum likelihood estima- looked upon as a random sample from the population tors obtained in Exercise 71. 10.07. Point Estimation Applied Exercises SECS. in Example 14 to estimate the parameters α and β. 16. 83. 11. third. 99. won once 7 times.200. A country’s military intelligence knows that an enemy cise 57 to estimate the parameters α and β. use the estimators obtained in Exer- 80.300. μ1 and μ2 and the common variance σ 2 . $26. Assum- ing that these data can be looked upon as independent 84. whereas those of 6 teenagers belonging to another Try N = 9.3.04. Not counting the ones that failed immediately. 89. 11. 8. n2 of the animals ond. estimate μ using be looked upon as a random sample from a continuous the estimator of part (b) of Exercise 21. If 88. 4–8 district. sec- sideration. the size of the error was 0.5. 531. If θ is the probability that nential population. SEC. maximum likelihood estimator obtained in Exercise 68.1.3. and x2 = 38.600. 
Assuming that these data are captured. 79. estimate it by using the cise 51 to estimate the parameter θ . Assuming that these data can be looked upon 7. variance σ 2 . use the estimator obtained there by the method of moments to estimate the parameter θ . first. 0. 466.900. sixth. 10. $21. $33. and 439 hours.04. 87. seventh. second. X of them are found to be tagged.100.3.800.800.8. 117. of Exercise 55. $28. eighth. third.5. On 12 days selected at random. Random samples of size n are taken from normal she got a busy signal 6.5. 106. In this method.500. fourth. and 108. 92. 10. 4. Rework Exercise 85 using the estimators obtained in taken from a normal population with the mean μ and the Exercise 66 by the method of maximum likelihood.6. 422.1. 489. If x1 = 26.08. that is. $22. and populations with the mean μ and the variances σ12 = 4 5. $24. use the estimator obtained in Exer- he will win any one of his bets. 562. fifth. 410. 12. If n1 = 25. Assuming that as a random sample from a Pareto population.400. 3. 114.’s of 10 teenagers belonging to one ethnic are captured and only one of them is found to be tagged. cer. and released.500 miles.600.Q. $26. use the estimator of part (b) of Exercise 12 to estimate k. 93.000. In a random sample of 20 of his visits to the 44. tion. estimate its parameter θ by either of the animals of the given kind in the area under considera- methods of Exercise 63.1. he lost all his bets 11 times. the process has maintained these data can be looked upon as a random sample from a yield of 80 percent. Rework Exercise 87 using the estimators obtained in three of these tanks are captured and their serial num- Exercise 67. use the estima.700. Jones goes to the race track he bets on 82. the total number of ric population.400. If n1 = 3 rare owls are captured in a section of a forest. third. 4. and the variation of the proportion defec- tors obtained in Exercise 56 to estimate the parameters δ tive from day to day is measured by a standard devi- and θ . 9 85. 9. 114. $24.4. sixth. Data collected over a number of years show that when a broker called a random sample of eight of her clients. Later. Among six measurements of the boiling point of a sil- random samples from normal populations with the means icon compound. use the these data may be looked upon as a random sample estimator obtained in Exercise 65 to estimate the param- from a gamma population.500. 126. line is checked daily by inspecting a sample of 100 units. estimate μ using the estimator of Exercise 23. 1–3 78.8.5. and released. and 14. first. Assuming that these figures can and σ22 = 9. 8.03.7 percent of the time. 101. 90. and 41. Independent random samples of sizes n1 and n2 are 86. their annual salaries were $23. 4.0. 110. 0. 403. 6. n2 = 50. 475.500. first. and second try. and 0. tagged.400.8. fifth. and estimate N by the method of maximum likelihood.5.200.6. 41. The size of an animal population is sometimes esti- 91. $29.03◦ C. 4. 38. built certain new tanks numbered serially from 1 to k. and later n2 = 4 such owls 92. 105. On 20 very cold days. uniform population. and of electricity was 6. group are 98. n1 of the animals are captured in the area under con- second. $21. 0. 433. 81.0 and x2 = 32. on the first. tagged.9. $36. can be looked upon as a random sample from an expo- and won twice on 2 occasions. bers are 210.4 million kilowatt-hours. (Hint: 108.2.) ethnic group are 122. Every time Mr. Certain radial tires had useful lives of 35. 
and this can be looked upon as a random sample from a geomet- information is used to estimate N. The I. 7. 123. a proportion defective of a two-parameter exponential population. 13. In a random sample of the teachers in a large school SECS.14. : Prentice Research. References Various properties of sufficient estimators are dis. Assuming a standard deviation of 1 ohm... with 95 percent probability. as well Wiley & Sons. Sections of sheet metal of various lengths are lined expressed subjectively by a normal distribution with the up on a conveyor belt that moves at a constant speed. John Wiley & Sons.99? (c) the result of Exercise 74 to combine the prior infor- mation with the direct information.Q. find P(712 < M < 40 ohms.4? Use s = 7.’s of at least 115. and Tanis. parameter has a prior gamma distribution with α = 50 tive. If the purpose is to from 63. explain how this event if the examination is tried on a random sample of 40 sampling procedure could be biased.2 so that the sample (a) only the prior information.. With reference to Example 20. is this 725|x = 692). find the mean of the posterior distribu. 6th ed. E.J.4 as an estimate of σ .Q. New York: John and a derivation of the Cramér–Rao inequality. whose tion of  as an estimate of that day’s proportion defec. Inc. S. Hall. estimate of the mean will have an error of at most 0. A history professor is making up a final examination of the population of resistors being produced? that is to be given to a very large group of students. N. taken by asking how people will vote in an election if the 97. R. what would be her esti- mate of that particular business’s average daily number 94.Q..0 to 68. Advanced Statistical Methods in Biometric Inference. An office manager feels that for a certain kind of sample is confined to the person claiming to be the head business the daily number of incoming telephone calls is of household. V. New York: Keeping. If a sam.2 and the standard deviation σ0 = 1.. sample of these sections is taken for inspection by tak- (a) What prior probability does the professor assign to ing whatever section is passing in front of the inspection the actual average grade being somewhere on the interval station at each five-minute interval. 99. sample adequate to ensure. S. Inc. the percentage (b) only the direct information.. and a proof of Theorem 4 may be found in may be found in Hogg. Wilks. Van Nostrand Co. N. (c) both kinds of information and the theory of Exer- ple check of 30 freshmen entering the university in 2003 cise 77? showed that only 18 of them have I. Inc. Introduction to Statistical Inference.’s of at least 115..: D. A random sample of 36 resistors is taken from a pro- duction line manufacturing resistors to a specification of 95. L. as the most general conditions under which it applies. a probability of 0. 1983.. Princeton. R..5 with (b) only the direct information. 1962. 1952.’s of at 98. and this variation is measured by a standard deviation of 3 percent. Inc. Assume that the prior distribution of Θ is a beta and β = 2. 314 . SEC.9 and a standard 101. a random variable having a Poisson distribution. Theory of Point Estimation.5. Mathematical Statistics. How large a random sample is required from a popu- least 115 in that freshman class using lation whose standard deviation is 4. A mean μ0 = 65. Comment on the sampling bias (if any) of a poll deviation of 7. men have I. His feelings about the average grade that they should get is 100. 10 estimate the true proportion of students with I. 
that the sample mean will be within 1.5 ohms of the mean 96. Upper Saddle River. S.0? estimate the number of defects per section in the popu- (b) What posterior probability would he assign to this lation of all such manufactured sections. Important properties of maximum likelihood estimators cussed in are discussed in Lehmann. Of course. incoming calls on a given day. A. Records of a university (collected over many years) of incoming calls if she considers show that on the average 74 percent of all incoming fresh. Point Estimation 38 defectives. E. 1995. varies somewhat from year to year. Being told that one such business had 112 distribution. Probability and Statistical Rao. E. C. (a) only the prior information. New York: John Wiley & Sons...J. students whose grades have a mean of 72. 1962. 4786.  ln xi i=1 315 . $ 57 β̂ = m1 + 3[m2 − (m1 )2 ]. (b) θ̂ = . (c) μ̂ = 108. 81 α̂ = 4. 37 Yes. 85 θ̂ = 47. 2 45 Yes. (b) No. 83 N = 11 or 12. 93 E(Θ|38) = 0. 2 x 2x i=1 69 (a) β̂ = . τ̂ = −1 . 87 α̂ = 3. σ 2 = 181 .627 and β̂ = 1. 65 α̂ = n . 97 (a) μ̂ = 100. x x n 99 Yes.556. μ2 = v. σ̂ = . (b) 35 . α α   25 89 .69 and δ̂ = 412. 79 μ̂ = 28. 75 μ = 12 . 91 θ̂ = 0.64. 73 (a) Yes. n1 + n 2 29 (a) 34 . 1 ai = 1. 59 λ̂ = x. 55 θ̂ = 3m1 .29. Point Estimation Answers to Odd-Numbered Exercises  n 67 α̂ = y1 . (b) μ̂ = 112. 51 θ̂ = m1 . 1 1 95 0. 61 β̂ = x2 . β̂ = yn .95. 9 (n + 1)Y1 . 63 (a) θ̂ = .30.   2 (v − v)2 + (w − w)2 71 μ1 = v. 53 λ̂ = m1 .83 and β̂ = 11. symmetrical about x = 1 . This page intentionally left blank . For instance. we might have to supplement a point estimate θ̂ of θ with the size of the sample and the value of var() ˆ or with some other information about the sam- pling distribution of . there are various confidence intervals for μ. where θ̂1 and θ̂2 are values of appropriate random variables ˆ 1 and  ˆ 2. it does not tell us on how much infor- mation the estimate is based. If ␪ˆ 1 and ␪ˆ 2 are values of the random vari- ˆ 1 and  ables  ˆ 2 such that P( ˆ 1 <θ < ˆ 2) = 1 − α for some specified probability 1 − ␣. Thus. based on a single random sample. Alternatively. It should be understood that. when α = 0. the degree of confidence is 0.ˆ As we shall see. CONFIDENCE INTERVAL. Irwin Miller. Eighth Edition. another desirable property is to have the expected length. and the endpoints of the interval are called the lower and upper confidence limits. From Chapter 11 of John E. Copyright © 2014 by Pearson Education.95 and we get a 95% confidence interval. we might use interval estimation. methods of interval estimation are judged by their various sta- tistical properties. All rights reserved. interval estimates of a given parameter are not unique. this will enable us to appraise the possible size of the error. like point estimates. 317 . The probability 1 − ␣ is called the degree of confidence. As was the case in point estimation. we refer to the interval θ̂1 < θ < θ̂2 as a (1 − ␣)100% confidence interval for ␪. it leaves room for many questions. where we show that. An interval estimate of θ is an interval of the form θ̂1 < θ < θ̂2 . E( ˆ 2 − ˆ 1 ) as small as possible.Interval Estimation 1 Introduction 5 The Estimation of Differences Between 2 The Estimation of Means Proportions 3 The Estimation of Differences Between Means 6 The Estimation of Variances 4 The Estimation of Proportions 7 The Estimation of the Ratio of Two Variances 8 The Theory in Practice 1 Introduction Although point estimation is a common way in which estimates are expressed. 
all having the same degree of confidence 1 − α. Inc. nor does it tell us anything about the possible size of the error. Freund’s Mathematical Statistics with Applications. Marylees Miller.05. For instance. DEFINITION 1. one desirable property is to have the length of a (1 − α)100% confidence interval as short as possible. For instance. This is illustrated by Exercises 2 and 3 and also in Section 2. It follows that     σ   P X − μ < zα/2 · √ = 1−α n or. If.575 into the expression for the maxi- mum error. Interval Estimation 2 The Estimation of Means To illustrate how the possible size of errors can be appraised in point estimation. THEOREM 1.2. By the theorem.2 for such data. the mean of a random sample of size n from a normal population with the known variance σ 2 . suppose that the mean of a random sample is to be used to estimate the mean of a normal population with the known variance σ 2 . its sampling distribution is a normal distribution with the mean μ and the variance σ 2 /n”. the probability is 1 − α that the error will be σ less than zα/2 · √ . If X. we get 6. the efficiency experts can assert with probability 0. based on experience.2 2. the sampling distribution of X for random samples of size n from a normal population with the mean μ and the variance σ 2 is a normal distribution with σ2 μx = μ and σx2 = n Thus.575 · √ = 1. the efficiency experts can assume that σ = 6. we have the following theorem.005 = 2. we can write P(|Z| < zα/2 ) = 1 − α where X −μ Z= √ σ/ n and zα/2 is such that the integral of the standard normal density from zα/2 to q equals α/2.99 about the maximum error of their estimate? Solution Substituting n = 150.99 that their error will be less than 1.30. in words. and z0.30 150 Thus. σ = 6. what can they assert with probability 0. “If χ is the mean of a random sample of size n from a normal population with the mean μ and the vari- ance σ 2 . is to be used as an estimator of the mean of the population. n EXAMPLE 1 A team of efficiency experts intends to use the mean of a random sample of size n = 150 to estimate the average mechanical aptitude of assembly-line workers in a large industry (as measured by a certain standardized test). 318 . is less than 1.” In general. x = 69.30. Accordingly.99 probability applies to the method that they used to get their estimate and calculate the maximum error (collecting the sample data. and using the formula of Theorem 1) and not directly to the parameter that they are trying to estimate. EXAMPLE 2 If a random sample of size n = 20 from a normal population with the variance σ 2 = 225 has the mean x = 64. but it must be understood that the 0. is less than 1. x = 64.99 that the error of their estimate.30? After all.96 · √ < μ < 64. Interval Estimation Suppose now that these efficiency experts actually collect the necessary data and get x = 69.3.3 + 1. we return to the probability   σ P |X − μ| < zα/2 · √ = 1−α n the previous page. we have the following theorem. THEOREM 2. x = 69.96 · √ 20 20 which reduces to 57.3. we get 15 15 64.30 or it does not. x = 69. Can they still assert with probability 0.5. To clarify this distinction. Actually.7 < μ < 70. construct a 95% confidence interval for the popu- lation mean μ. If x is the value of the mean of a random sample of size n from a normal population with the known variance σ 2 .9 319 .96 into the confidence-interval formula of Theorem 2. we should have said in our example that the efficiency experts can be 99% confident that the error of their estimate. determining the value of x. 
σ = 15.5 differs from the true (population) mean by less than 1.5. then σ σ x − zα/2 · √ < μ < x + zα/2 · √ n n is a (1 − α)100% confidence interval for the mean of the population. it has become the custom to use the word “confidence” here instead of “probability. the potential error of an estimate) and confidence statements once the data have been obtained. we make probability statements about future values of random variables (say.3 − 1. and they have no way of knowing whether it is one or the other. and z0.5. which we now write as   σ σ P X − zα/2 · √ < μ < X + zα/2 · √ = 1−α n n From this result. To construct a confidence-interval formula for estimating the mean of a normal population with the known variance σ 2 . Solution Substituting n = 20.025 = 1. they can. Interval Estimation As we pointed out earlier. say. This may be seen by changing the confidence-interval formula of Theorem 2 to σ σ x − z2α/3 · √ < μ < x + zα/3 · √ n n or to the one-sided (1 − α)100% confidence-interval formula σ μ < x + zα · √ n Alternatively.73 19. we get 5.92. Use the following data (in minutes). n < 30. Instead. n−1 < T < tα/2. 320 . a random sample. z0.96 · √ < μ < 19.92 − 1. x = 19. and s = 5. Substi- X −μ tuting √ for T in S/ n P(−tα/2. Strictly speaking.79 minutes. these results can also be used for random samples from nonnormal populations provided that n is sufficiently large.92 + 1.05 and 21. EXAMPLE 3 An industrial designer wants to determine the average amount of time it takes an adult to assemble an “easy-to-assemble” toy. to construct a 95% confidence interval for the mean of the popula- tion sampled: 17 13 18 19 17 21 29 22 16 28 21 15 26 23 24 20 8 17 17 21 32 18 25 22 16 10 20 22 19 14 30 22 12 24 28 11 Solution Substituting n = 36. that is. the 95% confidence limits are 18. When we are dealing with a random sample from a normal population. Theorems 1 and 2 cannot be used. n G 30. n−1 ) = 1 − α we get the following confidence interval for μ. However.73 5. by virtue of the central limit theorem. confidence-interval formulas are not unique. we could base a confidence interval for μ on the sample median or. and σ is unknown.96 · √ 36 36 Thus. the midrange. we may also substitute for σ the value of the sample standard deviation. we make use of the fact that X −μ T= √ S/ n is a random variable having the t distribution with n − 1 degrees of freedom.96.73 for σ into the confidence- interval formula of Theorem 2.025 = 1. Theorems 1 and 2 require that we are dealing with a random sample from a normal population with the known variance σ 2 . In that case. 025. the standard normal distribution. Graybill. but whose distribution for random samples from normal populations. This was the case. we refer to it as a small-sample confidence interval for μ. 3 The Estimation of Differences Between Means For independent random samples from normal populations 321 . Since this confidence-interval formula is used mainly when n is small.3 minutes and a standard deviation of 8. If for 12 test areas of equal size he obtained a mean drying time of 66. but there exist more general methods. the 95% confidence interval for μ becomes 8.4 8. n−1 · √ < μ < x + tα/2.201 (from Table IV of “Statistical Tables”). such as the one discussed in the book by Mood.3 − 2. for example.3. This method of confidence-interval construction is called the pivotal method and it is widely used.6 This means that we can assert with 95% confidence that the interval from 61. construct a 95% confidence interval for the true mean μ. 
n−1 · √ n n is a (1 − α)100% confidence interval for the mean of the population.4. s = 8. and Boes referred to at the end of this chapter. Interval Estimation THEOREM 3. when we used the random variable X −μ Z= √ σ/ n whose values cannot be calculated without knowledge of μ.11 = 2.201 · √ 12 12 or simply 61.4 minutes.4 66.0 min- utes to 71. does not involve μ.6 minutes contains the true average drying time of the paint. Solution Substituting x = 66. less than 30.201 · √ < μ < 66. yet whose distribution does not involve the parameter we are trying to estimate.3 + 2. The method by which we constructed confidence intervals in this section con- sisted essentially of finding a suitable random variable whose values are determined by the sample data as well as the population parameters. EXAMPLE 4 A paint manufacturer wants to determine the average drying time of a new interior wall paint. then s s x − tα/2. and t0.0 < μ < 71. If x and s are the values of the mean and the standard devia- tion of a random sample of size n from a normal population. Interval Estimation (X 1 − X 2 ) − (μ1 − μ2 ) Z=  σ12 σ22 + n1 n2 has the standard normal distribution. then   σ12 σ22 σ12 σ22 (x1 − x2 ) − zα/2 · + < μ1 − μ2 < (x1 − x2 ) + zα/2 · + n1 n2 n1 n2 is a (1 − α)100% confidence interval for the difference between the two population means. we find from Table III of “Statistical Tables” that z0. we simply substitute 322 .3 < μ1 − μ2 < 25.88 · + < μ1 − μ2 < (418 − 402) + 1. given that a random sample of 40 light bulbs of the first kind lasted on the average 418 hours of continuous use and 50 light bulbs of the sec- ond kind lasted on the average 402 hours of continuous use. when n1 G 30 and n2 G 30. the 94% confidence interval for μ1 − μ2 is   262 222 262 222 (418 − 402) − 1. n2 G 30. but σ1 and σ2 are unknown. Solution For α = 0. If x1 and x2 are the values of the means of independent ran- dom samples of sizes n1 and n2 from normal populations with the known variances σ12 and σ22 .88 · + 40 50 40 50 which reduces to 6. To construct a (1 − α)100% confidence interval for the difference between two means when n1 G 30.03 = 1. EXAMPLE 5 Construct a 94% confidence interval for the difference between the mean lifetimes of two kinds of light bulbs. By virtue of the central limit theorem.88.7 Hence.7 hours contains the actual difference between the mean lifetimes of the two kinds of light bulbs.06. this confidence-interval formula can also be used for independent random samples from nonnormal populations with known variances when n1 and n2 are large. we are 94% confident that the interval from 6. The fact that both confidence limits are positive suggests that on the average the first kind of light bulb is superior to the second kind. If we substitute this expression for Z into P(−zα/2 < Z < zα/2 ) = 1 − α the pivotal method yields the following confidence-interval formula for μ1 − μ2 . There- fore. that is. THEOREM 4. The population standard deviations are known to be σ1 = 26 and σ2 = 22.3 to 25. Substituting this expression for T into P(−tα/2. In Exercise 9 the reader will be asked to verify that the resulting pooled estimator (n1 − 1)S21 + (n2 − 1)S22 S2p = n1 + n2 − 2 is. then Y = ni=1 Xi has the chi-square distribution with ν1 + ν2 + · · · + νn degrees of freedom” the independent random variables (n1 − 1)S21 (n2 − 1)S22 and σ2 σ2 have chi-square distributions with n1 − 1 and n2 − 1 degrees of freedom. ν2 . When σ1 and σ2 are unknown and either or both of the samples are small. 
and their sum (n1 − 1)S21 (n2 − 1)S22 (n1 + n2 − 2)S2p Y= + = σ2 σ2 σ2 has a chi-square distribution with n1 + n2 − 2 degrees of freedom. indeed. then 1. νn degrees of freedom. . . the procedure for estimating the difference between the means of two normal populations is not straightforward unless it can be assumed that σ1 = σ2 . Now. . n−1 < T < tα/2. the random variable has a chi-square distribution with n − 1 σ2 degrees of freedom. Interval Estimation s1 and s2 for σ1 and σ2 and proceed as before. If X1 . 323 . . . an unbiased estimator of σ 2 . n−1 ) = 1 − α we arrive at the following (1 − α)100% confidence interval for μ1 − μ2 . then (X 1 − X 2 ) − (μ1 − μ2 ) Z=  1 1 σ + n1 n2 is a random variable having the standard normal distribution. Since it can be shown that the above random variables Z and Y are independent (see references at the end of this chapter) Z T= Y n1 + n2 − 2 (X 1 − X 2 ) − (μ1 − μ2 ) =  1 1 Sp + n1 n2 has a t distribution with n1 + n2 − 2 degrees of freedom. . . Xn are independent random variables  having chi- square distributions with ν1 . . If σ1 = σ2 = σ . X and S2 are inde- (n − 1)S2 pendent. 2. and σ 2 can be esti- mated by pooling the squared deviations from the means of the two samples. X2 . “If X and S2 are the mean and the variance of a random sample of size n from a normal pop- ulation with the mean μ and the standard deviation σ . by the two theorems. Since this confidence-interval formula is used mainly when n1 and/or n2 are small.120(0. we refer to it as a small-sample confidence interval for μ1 − μ2 .596) + 10 8 which reduces to −0.20 < μ1 − μ2 < 1. n2 = 8. n1 + n2 −2 · sp + n1 n2 is a (1 − α)100% confidence interval for the difference between the two population means. If x1 . substituting this value together with n1 = 10.7.7) + 2.5.00 milligrams. construct a 95% confidence interval for the differ- ence between the mean nicotine contents of the two brands of cigarettes.1. and s2 are the values of the means and the standard deviations of independent random samples of sizes n1 and n2 from normal populations with equal variances. 324 . but observe that since this includes μ1 − μ2 = 0. n2 = 8.49) sp = = 0.7 milligrams with a standard deviation of 0. less than 30. while eight cigarettes of Brand B had an aver- age nicotine content of 2.120 (from Table IV of “Statistical Tables”) into the confidence-interval formula of Theorem 5. then  1 1 (x1 − x2 ) − tα/2. we find that the required 95% confidence interval is  1 1 (3. x1 = 3.596) + < μ1 − μ2 10 8  1 1 < (3.7 into the formula for sp . x2 .25) + 7(0. and s2 = 0.1 milligrams with a standard deviation of 0. the 95% confidence limits are −0. x2 = 2. Solution First we substitute n1 = 10. EXAMPLE 6 A study has been made to compare the nicotine contents of two brands of cigarettes.025.16 = 2. s1 = 0.1 − 2. Interval Estimation THEOREM 5.00 Thus.1 − 2.596 16 Then. Ten cigarettes of Brand A had an average nicotine content of 3. we cannot conclude that there is a real difference between the average nicotine contents of the two brands of cigarettes.20 and 1. s1 .5 milligram. n1 + n2 −2 · sp + < μ1 − μ2 n1 n2  1 1 < (x1 − x2 ) + tα/2. and t0. and we get  9(0.7 milligram. Assuming that the two sets of data are independent random samples from normal populations with equal variances.120(0.7) − 2. If x1 and x2 are the values of a random sample of size 2 from a population having a uniform density with α = 0 6.) edition. 
which enables us to appraise the maximum error in using is shorter than the (1 − α)100% confidence interval x1 − x2 as an estimate of μ1 − μ2 under the conditions of σ σ Theorem 4. X − nθ Z= √ nθ (1 − θ ) can be treated as a random variable having approximately the standard normal dis- tribution. State a theorem analogous to Theorem 1. that is. Show that the (1 − α)100% confidence interval the maximum error when σ 2 is unknown. the abso- lute value of our error. or rates.) x − zα/2 · √ < μ < x + zα/2 · √ n n 8. this formula cannot be used unless it is reasonable to assume that we are sampling a 3. Thus. Interval Estimation Exercises 1. 2. This question has been intentionally omitted for this normal population. hence. x − z2α/3 · √ < μ < x + zα/3 · √ n n 9. In many of these it is reasonable to assume that we are sampling a binomial population and. Show that if x is used as a point estimate of μ and σ and β = θ . (b) α > 12 . such as the proportion of defectives in a large shipment of transistors. 4 The Estimation of Proportions In many problems we must estimate proportions. 7. This question has been intentionally omitted for this of the form edition.Q. the probability that a car stopped at a road block will have faulty lights. or the mortality rate of a disease. find k so that is known. find k so that the interval from 0 to kx is n n a (1 − α)100% confidence interval for the parameter θ .’s over 115. Show that S2p is an unbiased estimator of σ 2 and find its variance under the conditions of Theorem 5. the one with k = 0. the percentage of schoolchil- dren with I. Modify Theorem 1 so that it can be used to appraise 4. that our problem is to estimate the binomial parameter θ . we can make use of the fact that for large n the binomial distribution can be approximated with a normal distribution. the probability is 1 − α that |x − μ|. (Note that this method can be used only after the data have σ σ been obtained. percentages. Substituting this expression for Z into P(−zα/2 < Z < zα/2 ) = 1 − α we get   X − nθ P −zα/2 < √ < zα/2 = 1−α nθ (1 − θ ) 325 . σ σ x − zkα · √ < μ < x + z(1−k)α · √ tial distribution. If x is a value of a random variable having an exponen. will not exceed a specified amount 0 < θ < k(x1 + x2 ) e when  σ 2 n = zα/2 · is a (1 − α)100% confidence interval for θ when e (a) α F 12 . (If it turns out that n < 30. Show that among all (1 − α)100% confidence intervals 10. 5. probabilities.5 is the shortest. then n   θ̂(1 − θ̂ ) θ̂(1 − θ̂ ) θ̂ − zα/2 · < θ < θ̂ + zα/2 · n n is an approximate (1 − α)100% confidence interval for θ .39. let us give here instead a large-sample approx- X − nθ imation by rewriting P(−zα/2 < Z < zα/2 ) = 1 − α. we get   (0.34. 0. and z0.29 < θ < 0. If X is a binomial random variable with the parameters n and x θ .34)(0.96 400 400 0. θ̂ = 136 400 = 0.96 into the confidence-interval formula of Theorem 6. Then. Leaving the details of this to the reader in Exercise 11.294 < θ < 0. Solution Substituting n = 400. which is a further n approximation. with √ substituted for nθ (1 − θ ) Z.34)(0. Construct a 95% confidence interval for the true proportion of persons who will experience some discomfort from the vaccine. THEOREM 6.025 = 1. n is large. we obtain the following theorem.34 + 1. if we substitute θ̂ for θ inside the radicals. as     θ (1 − θ ) θ (1 − θ ) P  ˆ − zα/2 · <θ < ˆ + zα/2 · = 1−α n n X where ˆ = . 326 . we can also obtain the fol- lowing theorem. and θ̂ = .386 or. EXAMPLE 7 In a random sample. 
136 of 400 persons given a flu vaccine experienced some dis- comfort.66) (0.66) 0. rounding to two decimals. Interval Estimation and the two inequalities x − nθ x − nθ −zα/2 < √ and √ < zα/2 nθ (1 − θ ) nθ (1 − θ ) whose solution will yield (1 − α)100% confidence limits for θ .34 − 1. Using the same approximations that led to Theorem 6.96 < θ < 0. If the respective numbers of successes are X1 and X2 and the corresponding X X sample proportions are denoted by  ˆ 1 = 1 and  ˆ 2 = 2 .061 400 or 0.575 into the formula of Theorem 7. 5 The Estimation of Differences Between Proportions In many problems we must estimate the difference between the binomial parameters θ1 and θ2 on the basis of independent random samples of sizes n1 and n2 from two binomial populations.06 rounded to two decimals.06.35 as an estimate of the actual proportion of voters in the community who favor the project. Let’s take E( ˆ 1 − ˆ 2 ) = θ1 − θ2 and θ1 (1 − θ1 ) θ2 (1 − θ2 ) ˆ 1 − var( ˆ 2) = + n1 n2 327 .35)(0. we can assert with 99% confidence that the error is less than 0.005 = 2. If θ̂ = is used as an estimate of θ .35. we can assert with (1 − n α)100% confidence that the error is less than  θ̂(1 − θ̂ ) zα/2 · n EXAMPLE 8 A study is made to determine the proportion of voters in a sizable community who favor the construction of a nuclear power plant. If 140 of 400 voters selected at random favor the project and we use θ̂ = 140 400 = 0.35 as an estimate of the actual proportion of all voters in the community who favor the project. let us investigate the n1 n2 ˆ 1 − sampling distribution of  ˆ 2 . which is an obvious estimator of θ1 − θ2 . if we use θ̂ = 0.65) 2. and z0. θ̂ = 0. we get  (0.575 · = 0. Thus. if we want to estimate the difference between the proportions of male and female voters who favor a certain candidate for governor of Illinois. for example. Interval Estimation x THEOREM 7. what can we say with 99% confidence about the maximum error? Solution Substituting n = 400. This would be the case. 575 into the confidence- 90 interval formula of Theorem 8. find a 99% confidence interval for the difference between the actual proportions of male and female voters who favor the candidate.66.575 + 200 150 which reduces to −0.66 − 0.66 − 0.575 + < θ1 − θ2 200 150  (0. Solution Substituting θ̂1 = 132 200 = 0. EXAMPLE 9 If 132 of 200 male voters and 90 of 150 female voters favor a certain candidate running for governor of Illinois.074 to 0. θ̂2 = 150 = 0. X2 is a binomial random variable with the parameters n2 and θ2 . n1 x1 x2 and n2 are large. it follows that (ˆ 1 − ˆ 2 ) − (θ1 − θ2 ) Z=  θ1 (1 − θ1 ) θ2 (1 − θ2 ) + n1 n2 is a random variable having approximately the standard normal distribution.60) − 2.34) (0.194 Thus.194 contains the dif- ference between the actual proportions of male and female voters who favor the candidate. Interval Estimation and since. and z0. and hence also their difference.40) (0. we get  (0.34) (0. we arrive at the following result. can be approximated with normal distributions.66)(0.60) + 2.60)(0. X1 and X2 . THEOREM 8.40) < (0.60. If X1 is a binomial random variable with the parameters n1 and θ1 . then n1 n2  θ̂1 (1 − θ̂1 ) θ̂2 (1 − θ̂2 ) (θ̂1 − θ̂2 ) − zα/2 · + < θ1 − θ2 n1 n2  θ̂1 (1 − θ̂1 ) θ̂2 (1 − θ̂2 ) < (θ̂1 − θ̂2 ) + zα/2 · + n1 n2 is an approximate (1 − α)100% confidence interval for θ1 − θ2 .66)(0. for large samples. Observe that this includes the possibility of a zero difference between the two proportions. 328 . we are 99% confident that the interval from −0. 
and θ̂1 = and θ̂2 = .60)(0. Sub- stituting this expression for Z into P(−zα/2 < Z < zα/2 ) = 1 − α.074 < θ1 − θ2 < 0.005 = 2. Interval Estimation Exercises 11. Use the formula of Theorem 7 to demonstrate that we can be at least (1 − α)100% confident that the error 16. 2 n 4 n + z2α/2 15. 2 n−1 χ1−α/2. we can obtain a (1 − α)100% confidence interval for σ 2 by making use of the theorem referred under Section 3. By solving 13. n−1 = 1 − α σ2 ⎡ ⎤ ⎣ (n − 1)S (n − 2 1)S2 P <σ2 < 2 ⎦ = 1−α χα/2. θ1 − θ2 . 12. we can be at least (1 − α)100% confident that the x error that we make when using θ̂1 − θ̂2 as an estimate of θ̂ = with n θ1 − θ2 is less than e when z2α/2 n= 4e2 z2α/2 as an estimate of θ . Find a formula for the maximum error analogous to that of Theorem 7 when we use θ̂1 − θ̂2 as an estimate of are (1 − α)100% confidence limits for θ . n−1 < < χα/2. If s2 is the value of the variance of a random sample of size n from a normal population. to the confidence-interval formula of 1 2 x(n − x) 1 2 x + · zα/2 . we obtain the following theorem. Fill in the details that led from the Z statistic for θ . show that on the previous page. 329 . 2 (n − 1)S2 2 P χ1−α/2. 2 n−1 χ1−α/2. n= 2e2 6 The Estimation of Variances Given a random sample of size n from a normal population. THEOREM 9. Thus. substituted into P(−zα/2 < Z <  zα/2 ) = 1 − α. Find a formula for n analogous to that of Exercise 12 when it is known that θ must lie on the interval from θ  x − nθ x − nθ to θ  . according to which (n − 1)S2 σ2 is a random variable having a chi-square distribution with n − 1 degrees of free- dom. n−1 Thus. Use the result of Exercise 15 to show that when n1 = we make is less than e when we use a sample proportion n2 = n. then (n − 1)s2 (n − 1)s2 <σ2 < χα/2. −zα/2 = √ and √ = zα/2 nθ (1 − θ ) nθ (1 − θ ) 14. 2 n−1 is a (1 − α)100% confidence interval for σ 2 . zα/2 + · zα/2 Theorem 8. then F = 12 12 = 22 12 is a random variable having an F S2 /σ2 σ1 S2 distribution with n1 − 1 and n2 − 1 degrees of freedom”. n1 −1. 7 The Estimation of the Ratio of Two Variances If S21 and S22 are the variances of independent random samples of sizes n1 and n2 from normal populations. we can write   σ22 S21 P f1−α/2. EXAMPLE 10 In 16 test runs the gasoline consumption of an experimental engine had a standard deviation of 2.995. “If S21 and S22 are the variances of independent random samples of sizes n1 and n2 from normal populations with S2 /σ 2 σ 2 S2 the variances σ12 and σ22 . according to the theorem.801 and χ0.78 To get a corresponding 99% confidence interval for σ . along with χ0. n1 −1. Interval Estimation Corresponding (1 − α)100% confidence limits for σ can be obtained by taking the square roots of the confidence limits for σ 2 . n2 −1 = 1−α σ12 S22 Since 1 f1−α/2. Thus. Construct a 99% confidence interval for σ 2 .601.2)2 <σ2 < 32. n1 −1 we have the following result.97. n2 −1 < < fα/2. we substitute n = 16 and s = 2. into the 2 confidence-interval formula of Theorem 9.601 or 2.152 = 32. 330 . we take square roots and get 1. and we get 15(2.49 < σ < 3. which measures the true variability of the gasoline consumption of the engine.005. σ22 S21 F= σ12 S22 is a random variable having an F distribution with n1 − 1 and n2 − 1 degrees of free- dom. obtained from Table V of “Statistical Tables”. then. Solution Assuming that the observed data can be looked upon as a random sample from a normal population.2. n2 −1 = fα/2. n1 −1.2)2 15(2.2 gallons.801 4.21 < σ 2 < 15.15 = 4. n2 −1. 7. n1 −1 s22 fα/2.72 and f0.01. For large n. s1 = 0. 
Use this formula to find a 99% upper confi- dence limit for the proportion of defectives produced s s by a process if a sample of 200 units contains three zα/2 < σ < zα/2 1+ √ 1− √ defectives.7 = 6. 2n 2n 331 . Fill in the details that led from the probabil- assumes a value close to zero. s2 = 0.2(x+1) 2n . the sampling distribution of S is some- times approximated with a normal distribution having the 1 2 σ2 θ< χ mean σ and the variance 2n α.49 6. If it can be assumed that the binomial parameter θ 18.25 1 σ 2 0.5. n2 −1. upper confidence limits of ity in Section 6 to the confidence-interval formula of the form θ < C are often useful. size n.862 σ22 Since the interval obtained here includes the possibility that the ratio is 1.25 · < 12 < · 5.49 or σ12 0. n2 −1 σ22 s22 σ12 is a (1 − α)100% confidence interval for . the one-sided interval 19. there is no real evidence against the assumption of equal population variances in Example 6.7.076 < < 2.01. find a 98% confidence interval for . Exercises 17.61 from Table VI of “Statistical Tables”. σ22 EXAMPLE 11 σ12 With reference to Example 6. σ22 Solution Substituting n1 = 10. and f0.9 = 5. n2 = 8. Show that this approxi- mation leads to the following (1 − α)100% large-sample has a confidence level closely approximating (1 − α) confidence interval for σ : 100%. If s21 and s22 are the values of the variances of independent random samples of sizes n1 and n2 from normal populations. n1 −1.61 0. Interval Estimation THEOREM 10.72 σ2 0. σ22 σ1 Corresponding (1 − α)100% confidence limits for can be obtained by taking the σ2 σ12 square roots of the confidence limits for . then s21 1 σ12 s21 · < < · fα/2. we get 0.9. For a random sample of Theorem 10. sampling all kinds of populations) when a 332 . because there is an abundance of software that requires only that we enter the original raw (untreated) data into our computer together with the appro- priate commands. 162.800. In practice. Solution The computer printout of Figure 1 shows that the desired confidence interval is 124. which is given s by √ . Also important is the use of computers in simu- lating values of random variables (that is.500. 108. Construct a 95% confidence interval for the average amount of traffic (car crossings) that this paint can withstand before it deteriorates.000.600. All this is important. EXAMPLE 12 To study the durability of a new paint for white center lines.400. computers enable us to do more efficiently—faster. a highway department painted test strips across heavily traveled roads in eight different locations. n Figure 1. the example cannot very well do justice to the power of com- puters to handle enormous sets of data and perform calculations not even deemed possible until recent years. and almost automatically—what was done previously by means of desk calculators. 133.917 car crossings. computers can be used to tabulate or graph functions (say. their standard deviation. and the estimated standard error of the mean. However. which allow for methods of analysis that were not available in the past. the t. 136.300. and elec- tronic counters showed that they deteriorated after having been crossed by (to the nearest hundred) 142. or χ 2 distributions) and thus give the inves- tigator a clear understanding of underlying models and make it possible to study the effects of violations of assumptions. consider the following example.400 cars. Among other things. To illustrate. 126. hand-held calculators. or even by hand. SE MEAN. more cheaply. the mean of the data.758 < μ < 156. 
Computer printout for Example 12.700. 167. none of this is really necessary. our example does not show how computers can summarize the output as well the input and the results as well as the original data in various kinds of graphs and charts. Interval Estimation 8 The Theory in Practice In the examples of this chapter we showed a number of details about substitutions into the various formulas and subsequent calculations. dealing with a sample of size n = 8. and 149. but it does not do justice to the phenomenal impact that computers have had on statistics. As used in this example. F. Also. It also shows the sample size. studies that σ = 12.8 mm of 18 of these records shows average sales of 63. mean blood pressure of women in their fifties. A district official intends to use the mean of a ran. 1–3 20. by less than 2. 30. shows that 61 failures of the first kind of equipment 27.4.2 hours.9. This question has been intentionally omitted for this 34. 33.05 minutes as an estimate of the σ2 = 3. Use sample mean is off by less than 20 minutes. In the applied exercises that follow.1 minutes to repair with a standard deviation of 18.8 and error if he uses x = 24.4. 32. what can the manufac.95 about the maximum to estimate the average number of hours that teenagers error? spend watching per week. An efficiency expert wants to determine the average minutes. 2.80 mm with a standard deviation of peanut butter. Applied Exercises SECS.2 and x2 = 23. Interval Estimation formal mathematical approach is not feasible.29 cm.2 seconds.2. 2. a estimate of the average percentage of impurities in this random sample.5 mm. suppose that the dis. This provides an important tool when we study the appropriateness of statistical models.8. 2. examining 12 jars of a certain brand grew on the average 52. A medical research worker intends to use the mean dard deviation of 0.8 28.5 mm of mercury.75 gallons.8. Use the formula for n in Exercise 6 takes to repair failures of the two kinds of photocopying to determine the sample size that is needed so that the equipment.95 that the dom sample of 150 sixth graders from a very large school sample mean will differ from μ. tain arithmetic achievement test. what can of bird. the official knows that σ = 9. (Hint: Refer all the given information to construct a 99% confidence to Exercise 6.9. construct a 99% confidence interval for the mean of the population sampled. an automobile manufacturer had 40 mechanics. Based on the modification of Theorem 1 of Exer- cise 7. If it is reasonable to assume that σ = 3. the reader is encouraged to use a statistical computer program as much as possible. If.3. 2. 3.0. what 29. find a 95% confidence blood pressure of women in their fifties. If. given that there are 900 sixth graders dard deviation of 19. suppose that the ious transactions with its customers. efficiency expert can assert with probability 0. 26. interval for the mean length of the skulls of this species rience. the quantity to be esti- district to estimate the mean score that all the sixth mated. it will be possible to assert with 95% confidence that the trict official takes her sample and gets x = 61. the second kind of equipment took on the average 88. To estimate the average time required for certain maximum error if she uses the mean of this sample as an repairs. 24. Use the modification suggested in Exercise 26 to took on the average 80.7 minutes to repair with a stan- rework Exercise 21. based on experience.4 minutes. 1. 
Find a 99% confidence interval for the dif- amount of time it takes a pit crew to change a set of four ference between the true average amounts of time it tires on a race car.) interval for the mean score of all the sixth graders in the district. selected at random in a desert region. 2. It is known from previous graders in the district would get if they took a cer. With reference to Exercise 22. timed in the performance of this task.8. A study of two kinds of photocopying equipment edition.3. With reference to Exercise 20.99 about the maximum error? 31. and 1.4 for such data. Find a actual mean time required to perform the given repairs? 90% confidence interval for μ1 − μ2 . true average annual growth of the given kind of cactus. A study of the annual growth of certain cacti showed that 64 of them. Independent random samples of sizes n1 = 16 and turer assert with 95% confidence about the maximum n2 = 25 from normal populations with σ1 = 4. The length of the skulls of 10 fossil skeletons of an extinct species of bird has a mean of 5.6.68 minutes. he knows that σ = 10. If brand of peanut butter? it took them on the average 24. what can she assert with 95% confidence about the 25. whereas 61 failures of in the school district.5 have the means x1 = 18. A major truck stop has kept extensive records on var- 23. 3.5 seconds. obtained the following percentages of of 4.1. he assert with probability 0. A food inspector. 333 . If a random sample research worker takes his sample and gets x = 141. it is desired can she assert with probability 0. In a study of television viewing habits.1.68 cm and a stan- 22. 1. based on expe. Construct a 99% confidence interval for the impurities: 2. Construct a 98% confidence interval for the of diesel fuel with a standard deviation of 2.05 minutes with a stan- dard deviation of 2. how large a sample is needed so that 21.84 gallons of mercury. 1. Assuming that such measure- of a random sample of size n = 120 to estimate the mean ments are normally distributed. 95% confidence about the maximum error if we use the observed sample proportion as an estimate of the corre. 8. 39. later. If it can be assumed that σ1 = 0.480. electric wires. attraction. random in a given year. To study the effect of alloying on the resistance of fident that the sample proportion is off by less than 0. 8. With struct a 95% confidence interval for the difference 90% confidence.04. Use the result of Exercise 13 to rework Exercise 47. were inedible as a result of chemical pollution. Among 500 marriage license applications chosen at Construct a 99% confidence interval for the correspond. Use the SECS. only 102 had dessert. In a random sample of 120 cheerleaders.) all drivers exceed the legal speed limit on a certain stretch of road between Los Angeles and Bakersfield.8 feet with a stan.9 feet with a standard deviation of 1. interval for the difference between the corresponding true proportions of marriage license applications in which 41. 40.005 ohm for such data. height of 12. A private opinion poll is engaged by a politician to construct a 99% confidence interval for the difference estimate what proportion of her constituents favor the between the true average heat-producing capacities of decriminalization of certain minor narcotics violations. and among 400 (a) the large-sample confidence-interval formula of marriage license applications chosen at random six years Theorem 6. A sample survey at a supermarket showed that 204 resulting estimate. 
an engineer plans to measure the resis- tance of n1 = 35 standard wires and n2 = 45 alloyed 46. The following are the heat-producing capacities of coal from two mines (in millions of calories per ton): 44. what can we say with 99% confidence about the maximum error if we use the 49. 7. than 0. 18 variety have a mean height of 13. sample confidence-interval formula of Theorem 6 to con- struct a 95% confidence interval for the corresponding 48. Assuming that the random samples were selected 43.05? Assuming that the data constitute independent random samples from normal populations with equal variances. If we Mine A: 8. 54 had suf- from normal populations with equal variances. 7. Use the large.02. is off by less of 300 shoppers regularly use coupons. there were 48 in which the woman ing true proportion using was at least one year older than the man. With reference to Exercise 38.004 ohm and given that the poll has reason to believe that the true pro- σ2 = 0. what can we say with the woman was at least one year older than the man.920. 84 of 250 men and 156 of 250 women bought tion of all shoppers in the population sampled who use souvenirs. 98% confidence about the maximum error if she uses x1 − x2 as an estimate of μ1 − μ2 ? (Hint: Use the result of 47. Con- dard deviation of 1.710. 7. In a random sample of visitors to a famous tourist observed sample proportion as an estimate of the propor. fered moderate to severe damage to their voices.30.030 300 = 0. 8. true proportion. wires. given that we have good reason to believe that the pro- portion we are trying to estimate is at least 0. 190 had seen a certain controversial program. Interval Estimation 35.5 feet. there were 68 in which the woman was at least one year older than the man. 4–5 formula of Exercise 12 to determine how large a sam- ple we will need to be at least 99% confident that the 38.65.330. Construct a 99% confidence (b) the confidence limits of Exercise 11.45 as an rus trees.270. Use the formula of Exercise 12 to determine how large a sample the poll will have to take to be at least 95% con- 37. con. In a random sample of 300 persons eating lunch at a department store cafeteria. Suppose that we want to estimate what proportions of Exercise 8. coal from the two mines. estimate of the true proportion of cheerleaders who are afflicted in this way? 36. 51.860 proportion. the sample proportion.500.34 as an estimate of the corresponding true use 102 Mine B: 7. 7. With reference to Exercise 50.890. with what confidence can we assert that our error is less than 0. Twelve randomly selected mature citrus trees of one 42.960.2 feet. what can she assert with portion does not exceed 0. Among 100 fish caught in a certain lake. With reference to Exercise 40. 45. what can we say about the maximum between the true average heights of the two kinds of cit- 54 error if we use the sample proportion 120 = 0. 8. In a random sample of 250 television viewers in a large city. what can we say with sponding true proportion? 98% confidence about the maximum error if we use the 334 . and 15 randomly selected struct a 99% confidence interval for the corresponding mature citrus trees of another variety have a mean true proportion. 50. Use the result of Exercise 13 to rework Exercise 45. Construct a 95% confidence interval for the coupons? difference between the true proportions of men and women who buy souvenirs at this tourist attraction. New York: Biometrika Tables.4 6. Lexington. 3rd ed. With reference to Exercise 32. 9 . 
construct a 90% con- mula of Exercise 16 to determine the size of the samples fidence interval for the ratio of the variances of the two that are needed to be at least 95% confident that the dif- populations sampled.8 6. With reference to Exercise 35. construct a 98% con- an estimate of the difference between the corresponding fidence interval for the ratio of the variances of the two true proportions? (Hint: Use the result of Exercise 15. Graybill. New York: John Wiley & Sons.8 4.9 6. With reference to Exercise 36. H. References A general method for obtaining confidence intervals is and in other advanced texts on mathematical statistics. construct a 95% confi. use the large-sample the nearest 10 psi) of 30 concrete samples. α 13 n = θ ∗ (1 − θ ∗ ) . E. L. where θ ∗ is the value on the interval s σ e2 7 Substitute tα/2.. construct a 98% con- between the proportions of the customers of a donut fidence interval for the ratio of the variances of the two chain in North Carolina and Vermont who prefer the populations sampled. An Introduction to Mathematical Statistics. Use a computer program to find a 95% confidence inter- tion sampled. Suppose that we want to determine the difference 58.2 5.: Xerox Publishing Co.5 7. see Brunk.9 7. 61. A.8 5. ference between the two sample proportions is off by less than 0. 1974. and further criteria for judging the relative merits of con..0 54. Testing Statistical Hypotheses.. and the 53. from θ  to θ n closest to 12 .) populations sampled. C. With reference to Exercise 34. ln(1 − α) (n1 + n2 − λ) √ 1. 6–7 60. Introduc.6 6. Inc. and Boes. that is. sive strengths. A. n n 335 . time for each to complete a certain corrective action was dence interval for the true variance of the skull length of measured in seconds.. 1959. Use the for- 59. Interval Estimation difference between the observed sample proportions as 57. 1−α z2α/2 3 c= .. fidence intervals may be found in Lehmann. with the following results: the given species of bird. SEC.7 9. fidence intervals for proportions are given in the tion to the Theory of Statistics. confidence-interval formula of Exercise 19 to construct a 99% confidence interval for the standard deviation of the 4890 4830 5490 4820 5230 4960 5040 5060 4500 5260 annual growth of the given kind of cactus. With reference to Exercise 25. Mass. D.0 4. use the large-sample confidence-interval formula of Exercise 19 to construct a Use a computer program to find a 90% confidence 98% confidence interval for the standard deviation of the interval for the standard deviation of these compres- time it takes a mechanic to perform the given task. M. 52. 5.. With reference to Exercise 24. 3rd ed. F. construct a 90% con- fidence interval for the standard deviation of the popula. for the percentage of impurities in val for the mean time to take corrective action.6 7. For a proof of the independence of McGraw-Hill Book Company. given in Special tables for constructing 95% and 99% con- Mood. Answers to Odd-Numbered Exercises −1 2σ 4 1 k= . D. With reference to Exercise 30.n−1 √ for zα/2 √ .. 1975. 8 SECS.3 8.0 6. the random variables Z and Y in Section 3.05.6 6.8 4.4 4.5 3. Twenty pilots were tested in a flight simulator. the given brand of peanut butter. chain’s donuts to those of all its competitors.1 5. The following are the compressive strengths (given to 55. 4600 4630 5330 5160 4950 4480 5310 4730 4710 4390 4820 4550 4970 4740 4840 4910 4880 5200 5150 4890 56. 61. 401. 25 0. 47 n = 1.053. σ2 31 61. Interval Estimation  θ̂1 (1 − θ̂1 ) θ̂2 (1 − θ̂2 ) 41 0. 
27 59. 57 0.57 < μ < 144.3.998 feet. 29 355.04 < σ 2 < 0. σ22 39 0.915.204.72 gallons.28.83 minute.372 < θ1 − θ2 < −0.82 < μ < 63.96 < μ < 65.198 < μ1 − μ2 < 1.0023 ohm. 43 0.069.053. 51 0. 37 0. 21 59. σ2 59 0. 55 3. 336 .67 < σ < 5.78.99 < μ < 63.7 < σ < 352. 61 227. n1 n2 17 0.506. σ22 35 −1. 15 E < zα/2 + .03. 45 n = 2. 49 −0.233 < 1 < 9.050.83. 33 −7.96. 037.58 < 1 < 1. 23 139. 53 0.075.485 < μ1 − μ2 < −2. Marylees Miller. but sometimes they also concern the type. of course.Hypothesis Testing 1 Introduction 5 The Power Function of a Test 2 Testing a Statistical Hypothesis 6 Likelihood Ratio Tests 3 Losses and Risks 7 The Theory in Practice 4 The Neyman–Pearson Lemma 1 Introduction If an engineer has to decide on the basis of sample data whether the true average life- time of a certain kind of tire is at least 42. As in the preceding examples. the parameter of a binomial population. and if a manufacturer of pharmaceutical products has to decide on the basis of samples whether 90 percent of all patients given a new med- ication will recover from a certain disease. in the first of our three examples the engi- neer may also have to decide whether he is actually dealing with a sample from an exponential population or whether his data are values of random variables having. STATISTICAL HYPOTHESIS. if not. it is referred to as a composite hypothesis. In the first case we might say that the engineer has to test the hypothesis that θ . the one dealing with the effectiveness of the new medication. the hypothesis θ = 0. assuming. in the first of the preceding examples the hypothesis is composite since θ G 42. Irwin Miller.000. it is called a simple hypothesis. but also the values of all parameters. DEFINITION 1. A simple hypothesis must therefore specify not only the functional form of the underlying distribution. if an agronomist has to decide on the basis of experiments whether one kind of fertilizer produces a higher yield of soybeans than another. Inc. the parameter of an exponential popula- tion. An assertion or conjecture about the dis- tribution of one or more random variables is called a statistical hypothesis. Thus. In each case it must be assumed. All rights reserved. Copyright © 2014 by Pearson Education. the distribution provides the correct statistical model. that is. that we specify the sample size and that the population is binomial. and in the third case we might say that the manufacturer has to decide whether θ . For instance. is at least 42. equals 0. in the second case we might say that the agronomist has to decide whether μ1 > μ2 . these problems can all be translated into the language of statistical tests of hypotheses. say. If a statistical hypothesis completely specifies the distribution. However. Freund’s Mathematical Statistics with Applications. of the distributions themselves. most tests of statistical hypotheses concern the parameters of distributions.90. in the third of the above examples.000 does not assign a specific value to the parameter θ . the Weibull distribution.000 miles. Eighth Edition. or nature.90 is simple. 337 . that the chosen distribution correctly describes the experimental conditions. where μ1 and μ2 are the means of two normal populations. of course. From Chapter 12 of John E. in the example dealing with the lifetimes of the tires. Symbolically. he is committing a second kind of error referred to as a type II error. where μ1 and μ2 are the means of two normal populations. 
which will tell him what action to take for each possible outcome of the sample space.000. and if we want to show that there is a greater vari- ability in the quality of one product than there is in the quality of another. Frequently.60. For instance. Similarly. where θ is the parameter of an exponential population. In view of the assumptions of “no difference. it is nec- essary that we also formulate alternative hypotheses. if the true value of the parameter θ is θ1 and the statistician incorrectly concludes that θ = θ0 .Q. we might formulate the alternative hypoth- esis that the parameter θ of the exponential population is less than 42. we shall use the symbol H0 for the null hypothesis that we want to test and H1 or HA for the alternative hypothesis. Problems involving more than two hypotheses. we might formulate the alternative hypothesis μ1 = μ2 . statisticians formulate as their hypotheses the exact opposite of what they may want to show. he will generate sample data by conducting an experiment and then compute the value of a test statistic. we might formulate the alternative hypothesis that the parameter θ of the given binomial pop- ulation is only 0.000 against the composite alternative θ < 42. than those in another school. 338 . The concept of simple and composite hypotheses applies also to alternative hypotheses.” hypotheses such as these led to the term null hypoth- esis. With this hypoth- esis we know what to expect. if we want to show that the students in one school have a higher average I. Suppose. therefore. For instance. if the true value of the parameter θ is θ0 and the statistician incorrectly concludes that θ = θ1 . Similarly. we might formulate the hypothesis that the two percentages are the same. if we want to show that one kind of ore has a higher percentage con- tent of uranium than another kind of ore. that is. we might formulate the hypothesis that there is no difference. σ1 = σ2 . In order to make a choice. On the other hand. and in the example dealing with the new medication. The procedure just described can lead to two kinds of errors. for example.90 against the simple alternative θ = 0. he is committing an error referred to as a type I error.60. in the example dealing with the two kinds of fertilizer. that is. that a statistician wants to test the null hypothesis θ = θ0 against the alternative hypoth- esis θ = θ1 . tend to be quite complicated. which is the disease’s recovery rate without the new medication. and in the third example we are testing the simple hypothesis θ = 0. 2 Testing a Statistical Hypothesis The testing of a statistical hypothesis is the application of an explicit set of rules for deciding on the basis of a random sample whether to accept the null hypothesis or to reject it in favor of the alternative hypothesis. problems involving several alternative hypotheses. and in the first example we can now say that we are testing the com- posite hypothesis θ G 42. we might formulate the hypothesis that there is no difference: the hypothesis μ1 = μ2 . The test proce- dure. at least not unless we specify the actual difference between μ1 and μ2 . Hypothesis Testing To be able to construct suitable criteria for testing statistical hypotheses. where θ is the parameter of a binomial population for which n is given. partitions the possible values of the test statistic into two subsets: an acceptance region for H0 and a rejection region for H0 .000. For instance. 
in the second example we are testing the composite hypothesis μ1 > μ2 against the composite alternative μ1 = μ2 . but this would not be the case if we formulated the hypothesis μ1 > μ2 . but nowadays this term is applied to any hypothesis that we may want to test. . For instance. 16.60) = 0.0114 and β = P(X > 14. . θ = 0. This probability is also called the level of significance of the test (see the last part of Section 5). Rejection of a null hypothesis when it is true is called a type I error. the probability of a type I error has become larger. Find α and β. from the Binomial Probabilities table of “Statistical Tables”. the rejection region (or critical region) is x = 0. It is customary to refer to the rejection region for H0 as the critical region of a test. 339 .0433 and β = 0.60. The probability of obtaining a value of the test statistic inside the critical region when H0 is true is called the size of the critical region. if we use the acceptance region x > 15 in this example so that the critical region is x F 15. 2. 14. and. EXAMPLE 1 Suppose that the manufacturer of a new medication wants to test the null hypothesis θ = 0. thereby giving us a good chance of making the correct decision.90 against the alternative hypothesis θ = 0. Thus. There- fore. he will reject it. The probability of a type II error in Example 1 is rather high. Definition 2 is more readily visualized with the aid for the following table: H0 is true H0 is false Accept H0 No error Type II error probability = β Reject H0 Type I error probability = α No error DEFINITION 3. if the probability of one type of error is reduced. Thus. TYPE I AND TYPE II ERRORS. 1. 2. 17. Solution The acceptance region for the null hypothesis is x = 15. Hypothesis Testing DEFINITION 2. although the probability of a type II error is reduced. 19. The probability of committing a type I error is denoted by ␣.90) = 0.1255 A good test procedure is one in which both α and β are small. Acceptance of the null hypothesis when it is false is called a type II error. and he will accept the null hypothesis if x > 14. and 20. The only way in which we can reduce the probabilities of both types of errors is to increase the size of the sample. the size of the critical region is just the probability ␣ of committing a type I error. but this can be reduced by appropriately changing the critical region. . In other words. . but as long as n is held fixed. 18. 1. correspondingly.0509. CRITICAL REGION. His test statistic is X. otherwise. The probability of committing a type II error is denoted by ␤. the observed number of successes (recoveries) in 20 trials. it can easily be checked that this would make α = 0. that of the other type of error is increased. this inverse relationship between the probabil- ities of type I and type II errors is typical of statistical decision procedures. θ = 0. α = P(X F 14. Find the value of K such that x > K provides a critical region of size α = 0.645) and since z = 1.4400 in the Stan- dard Normal Distribution√ table of “Statistical Tables”.555. we find that z = 1. Hypothesis Testing EXAMPLE 2 Suppose that we want to test the null hypothesis that the mean of a normal popula- tion with σ 2 = 1 is μ0 against the alternative hypothesis that it is μ1 .645 + 1.555 = 3. Solution Since β is given by the area of the ruled region of Figure 1. EXAMPLE 3 With reference to Example 2.645 β = P X < 10 + √ . 340 .645 corresponds to an entry of 0. we get   1.24.5000 − 0.200 and n = 10.645 K = μ0 + √ n b a  0. where μ1 > μ0 .645 equal to −1. 
or 11 rounded up to the nearest integer.05 for a random sample of size n. Solution Referring to Figure 1 and the Standard Normal Distribution table of “Statistical Tables”. Diagram for Examples 2 and 3.645 ⎢ 10 + √ − 11 ⎥ ⎢ n ⎥ ⎢ ⎥ = P ⎢Z < √ ⎥ ⎢ 1/ n ⎥ ⎣ ⎦ √ = P(Z < − n + 1. It follows that n = 1.05  x m0 K m1 Figure 1.555 corresponds to an entry of 0. we set − n + 1.06 =√0. μ = 11 n ⎡   ⎤ 1.4500 and hence that K − μ0 1. determine the minimum sample size needed to test the null hypothesis μ0 = 10 against the alternative hypothesis μ1 = 11 with β F 0.645 = √ 1/ n It follows that 1.06. in some way. θi ). but this is not a very realistic approach in most practical situations. in either case the right decision is more profitable than the wrong one. The values of the risk function are thus R(d. keeps the probabilities of both types of errors as small as possible. θ0 ) and L(a1 . θ0 ) L(a1 . we could use the minimax criterion and choose the decision function that minimizes the maximum risk. Hypothesis Testing 3 Losses and Risks† The concepts of loss functions and risk functions also play an important part in the theory of hypothesis testing. 341 . and the only condition that we shall impose is that L(a0 . θ1 ) that is. correspondingly. Depending on the true “state of Nature” and the action that she takes. In the decision theory approach to testing the null hypothesis that a population parameter θ equals θ0 against the alternative that it equals θ1 . θ0 )] and R(d. θ0 ) = [1 − α(d)]L(a0 . θ0 ) < L(a1 . θ1 )] where. the quantities in brackets are both positive. For the decision function d. θ1 ) < L(a0 . θ1 ) + [1 − β(d)]L(a1 . perhaps. θ1 ) + β(d)[L(a0 . by assumption. if we looked upon Nature as a malev- olent opponent. θ0 ) = L(a0 . if the value of the parameter is θ1 and the statistician takes action a0 . θ0 ) − L(a0 . θ0 ) + α(d)L(a1 . she commits a type I error. † This section may be omitted. θ1 ) These losses can be positive or negative (reflecting penalties or rewards). that is. have been obvious from the beginning) that to min- imize the risks the statistician must choose a decision function that. Alternatively. if the value of the parameter is θ0 and the statistician takes action a1 . which tells her for each possible outcome what action to take. θ1 ) L(a1 . her losses are shown in the following table: Statistician a0 a1 θ0 L(a0 . we shall let α(d) denote the probability of committing a type I error and β(d) the probability of committing a type II error. θ0 ) Nature θ1 L(a0 . we could calculate the Bayes risk and look for the decision function that minimizes this risk. θ1 ) = β(d)L(a0 . The statistician’s choice will depend on the outcome of an experiment and the decision function d. θ1 ) = L(a1 . It is apparent from this (and should. If the null hypothesis is true and the statistician accepts the alternative hypothesis. she commits a type II error. If we could assign prior probabilities to θ0 and θ1 and if we knew the exact values of all the losses L(aj . or takes the action a1 and accepts the alternative hypothesis. the statistician either takes the action a0 and accepts the null hypothe- sis. θ0 ) + α(d)[L(a1 . θ1 ) − L(a1 . Denoting these likelihoods by L0 and L1 . we restrict ourselves to critical regions of size less than or equal to α. THE POWER OF A TEST. which lead to correct decisions when θ = θ0 and type II errors when θ = θ1 . we refer to the likelihoods of a random sample of size n from the population under consid- eration when θ = θ0 and θ = θ1 . equivalently. 
we circumvent the dependence between probabilities of type I and type II errors by limiting ourselves to test statistics for which the probability of a type I error is less than or equal to some constant α. (We must allow for the critical region to be of size less than α to take care of discrete random variables. we hold the probability of a type I error fixed and look for the test statistic that minimizes the probability of a type II error or. it stands to reason that should be large L1 for sample points outside the critical region. indeed. In other words.” the Neyman–Pearson theory. the quantity 1 − ␤ is referred to as the power of the test at ␪ = ␪1 . which lead to type I errors when θ = θ0 and to L0 correct decisions when θ = θ1 .) For all practical purposes. A critical region for testing a simple null hypothesis H0 : ␪ = ␪0 against a simple alternative hypothesis H1 : ␪ = ␪1 is said to be a best critical region or a most powerful critical region if the power of the test is a maximum at ␪ = ␪1 . When testing the null hypothesis H0 : ␪ = ␪0 against the alternative hypothesis H1 : ␪ = ␪1 . guarantee a most powerful critical region is proved by the following theorem. To construct a most powerful critical region in this kind of situation. then. where it may be impossible to find a test statistic for which the size of the critical region is exactly equal to α. it stands to reason that should be small for sample L1 points inside the critical region. DEFINITION 4. The fact that this argument does. (Neyman–Pearson Lemma) If C is a critical region of size α and k is a constant such that L0 Fk inside C L1 and L0 G k outside C L1 then C is a most powerful critical region of size α for testing θ = θ0 against θ = θ1 . we thus have n n L0 = f (xi . that maximizes the quantity 1 − β. 342 . θ0 ) and L1 = f (xi . θ1 ) i=1 i=1 L0 Intuitively speaking. THEOREM 1. Hypothesis Testing 4 The Neyman–Pearson Lemma In the theory of hypothesis testing that is nowadays referred to as “classical” or “traditional. similarly. .) 343 . it follows that L0 L0 ··· L1 dx G ··· dx = ··· dx G ··· L1 dx k k C∩D C∩D C ∩D C ∩D and hence that ··· L1 dx G ··· L1 dx C∩D C ∩D Finally. we can write ··· L0 dx + ··· L0 dx = ··· L0 dx + ··· L0 dx = α C∩D C∩D C∩D C ∩D and hence ··· L0 dx = ··· L0 dx C∩D C ∩D Then. and the two multiple integrals are taken over the respective n-dimensional regions C and D. Hypothesis Testing Proof Suppose that C is a critical region satisfying the conditions of the theorem and that D is some other critical region of size α. (For the discrete case the proof is the same. ··· L1 dx = ··· L1 dx + ··· L1 dx C C∩D C∩D G ··· L1 dx + ··· L1 dx = ··· L1 dx C∩D C ∩D D so that ··· L1 dx G ··· L1 dx C D and this completes the proof of Theorem 1. dx2 . . Now. dxn . ··· L0 dx = ··· L0 dx = α C D where dx stands for dx1 . The final inequality states that for the critical region C the proba- bility of not committing a type II error is greater than or equal to the corresponding probability for any other critical region of size α. Thus. . with summations taking the place of integrals. making use of the fact that C is the union of the disjoint sets C ∩ D and C ∩ D . . while D is the union of the disjoint sets C ∩ D and C ∩ D. since L1 G L0 /k inside C and L1 F L0 /k outside C. subtracting (μ21 − μ20 ). Thus. these two terms. we must find a constant k and a region C of the sample space such that n e 2 (μ1 −μ0 )+(μ0 −μ1 )·xi F k 2 2 inside C n 2 (μ1 −μ0 )+(μ0 −μ1 )·xi 2 2 e G k outside C n and after taking logarithms. 
In our case (see Example 2) we obtain K = μ0 + zα · √1n . This is an important property. and vice versa. and μ1 . n. Note that we derived the critical region here without first mentioning that the test statistic is to be X. “critical region” and “test statistic. to which we shall refer again in Section 5. and dividing by the negative 2 quantity n(μ0 − μ1 ). Solution The two likelihoods are  n  n 1 − 12 (xi −μ0 )2 1 1 · e− 2 (xi −μ1 ) 2 L0 = √ ·e and L1 = √ 2π 2π where the summations extend from i = 1 to i = n. where μ1 > μ0 . μ0 . the most powerful critical region of size α for testing the null hypothesis μ = μ0 against the alternative μ = μ1 (with μ1 > μ0 ) for the given normal population is 1 x G μ0 + zα · √ n and it should be noted that it does not depend on μ1 . constants like K are determined by making use of the size of the critical region and appropriate statistical theory. In actual practice. Hypothesis Testing EXAMPLE 4 A random sample of size n from a normal population with σ 2 = 1 is to be used to test the null hypothesis μ = μ0 against the alternative hypothesis μ = μ1 . Use the Neyman–Pearson lemma to find the most powerful critical region of size α. and after some simplification their ratio becomes L0 n = e 2 (μ1 −μ0 )+(μ0 −μ1 )·xi 2 2 L1 Thus. 344 . Since the specification of a critical region thus defines the corresponding test statistic. these two inequalities become x G K inside C x F K outside C where K is an expression in k.” are often used interchangeably in the language of statistics. Decide in each case whether the hypothesis is 8.50. 9. Hypothesis Testing Exercises 1. distribution with α = 3 and β = 2. Let X1 and X2 constitute a random sample of size 2 nential density. from the population given by (d) the hypothesis that a random variable has a beta dis- tribution with the mean μ = 0. A single observation of a random variable having a uni- simple or composite: form density with α = 0 is used to test the null hypothesis (a) the hypothesis that a random variable has a gamma β = β0 against the alternative hypothesis β = β0 + 2. If the null hypothesis is rejected if and only if the ran- (b) the hypothesis that a random variable has a gamma dom variable takes on a value greater than β0 + 1. (c) the hypothesis that a random variable has an expo. . find distribution with α = 3 and β Z 2. the probabilities of type I and type II errors. where θ is the parameter of a rejection region had been x F 16? binomial distribution with a given value of n.25. A single observation of a random variable having a geometric distribution is used to test the null hypothesis 13. ting a type II error. ties of type I and type II errors.60. Pearson lemma to find the most powerful critical region abilities of type I and type II errors. Show that if μ1 < μ0 in Example 4. With reference to Exercise 12. If the null hypothesis μ = μ0 is to be rejected in favor of the alternative hypothesis 16. to construct the most powerful critical region of size α to test the null hypothesis σ = σ0 against the alterna- 7. A single observation of a random variable having a geometric distribution is to be used to test the null 6. If 0. Decide in each case whether the hypothesis is simple 0 elsewhere or composite: (a) the hypothesis that a random variable has a Poisson distribution with λ = 1.90 345 . 4. if n = 100. θ1 = 0.30. 
A single observation of a random variable having an hypothesis that its parameter equals θ0 against the alter- exponential distribution is used to test the null hypoth- native that it equals θ1 > θ0 . A random sample of size n from an exponential pop- to test the null hypothesis k = 2 against the alternative ulation is used to test the null hypothesis θ = θ0 against hypothesis k = 4. 1 x F μ0 − zα · √ n 3. use the Neyman–Pearson lemma type II errors.05. Let X1 and X2 constitute a random sample from a nor. θ0 = θ = θ0 against the alternative hypothesis θ = θ1 > θ0 . Pearson lemma yields the critical region (d) the hypothesis that a random variable has a negative binomial distribution with k = 3 and θ < 0. 5. the Neyman– distribution with the mean μ = 100. distribution with λ > 1. use the normal approximation to the value of the random variable is greater than or equal to binomial distribution to find the probability of commit- the positive integer k. θ xθ −1 for 0 < x < 1 f (x. Suppose that in Example 1 the manufacturer of the μ = μ1 > μ0 when x > μ0 + 1. the alternative that it is θ = 5. Use the Neyman–Pearson lemma to indicate how to the probabilities of type I and type II errors if the accep. what is the size of the criti. tive σ = σ1 > σ0 . new medication feels that the odds are 4 to 1 that with cal region? this medication the recovery rate from the disease is 0. mal population with σ 2 = 1. find the probabilities of type I and ulation with μ = 0. θ ) = 2. Given a random sample of size n from a normal pop- variable is less than 3. Use the Neyman–Pearson esis that the mean of the distribution is θ = 2 against lemma to find the best critical region of size α. If the critical region x1 x2 G 34 is used to test the null (b) the hypothesis that a random variable has a Poisson hypothesis θ = 1 against the alternative hypothesis θ = 2. and α is as large as possible with- the null hypothesis is rejected if and only if the observed out exceeding 0. find expressions for the probabili. of size α. If the null hypothesis is rejected if and the alternative hypothesis θ = θ1 > θ0 . A single observation of a random variable having a hypergeometric distribution with N = 7 and n = 2 is used 11. If the null hypothesis is accepted if and only if the observed value of the random 15. 14.40. what would have been 12. Use the Neyman– only if the value of the random variable is 2. against the alternative hypothesis θ = θ1 < θ0 . what is the power of this test at θ = 2? (c) the hypothesis that a random variable has a normal 10. With reference to Example 1. construct the most powerful critical region of size α to test tance region had been x > 16 and the corresponding the null hypothesis θ = θ0 .25. find the prob. Hypothesis Testing . With these odds.60. a0 for x > 15 bilities that he will make a wrong decision if he uses the (b) d2 (x) = a1 for x F 15 decision function . what are the proba. rather than 0. . a0 for x > 14 a0 for x > 16 (a) d1 (x) = (c) d3 (x) = a1 for x F 14 a1 for x F 16 5 The Power Function of a Test In Example 1 we were able to give unique values for the probabilities of committing type I and type II errors because we were testing a simple hypothesis against a simple alternative. It is customary to combine the two sets of probabilities in the following way.90 against the alternative hypothesis θ < 0. the alternative hypothesis that the new medication is not as effective as claimed. however. DEFINITION 5. 
The power function of a test of a statistical hypoth- esis H0 against an alternative hypothesis H1 is given by . that simple hypotheses are tested against simple alternatives. in Example 1 it might well have been more realistic to test the null hypothesis that the recovery rate from the disease is θ G 0. are compos- ite. becomes more involved. usually one or the other. For instance. or critical region. or both. When we deal with composite hypotheses. POWER FUNCTION. In that case we have to consider the probabilities α(θ ) of committing a type I error for all values of θ within the domain specified under the null hypothesis H0 and the probabilities β(θ ) of committing a type II error for all values of θ within the domain specified under the alternative hypothesis H1 . that is.90. In actual practice. it is relatively rare. the problem of evaluating the merits of a test criterion. the power function gives the probability of committing a type I error. As before. we find the 346 . the values of the power function are the probabilities of rejecting the null hypothesis H0 for various values of the parameter θ . and for values of θ assumed under H1 . Observe also that for values of θ assumed under H0 . α(θ ) for values of θ assumed under H0 π(θ ) = 1 − β(θ ) for values of θ assumed under H1 Thus. Investigate the power function corresponding to the same test criterion as in Exercises 3 and 4.90 against the alternative hypothesis θ < 0. α(θ ) or β(θ ). where we accept the null hypothesis if x > 14 and reject it if x F 14. are avail- able from the Binomial Probabilities table of “Statistical Tables”. EXAMPLE 5 With reference to Example 1. x is the observed number of successes (recoveries) in n = 20 trials.90. it gives the probability of not commit- ting a type II error. suppose that we had wanted to test the null hypothesis θ G 0. Solution Choosing values of θ for which the respective probabilities. 1958 0. but it is of interest to note how it compares with the power function of a corresponding ideal (infallible) test criterion. OC-curve. the critical region x F 14. These are shown in the following table.2455 0. In other words.5837 0.0003 0. the probability of a type I error.9 0.2 0.5 0. the values of the operating characteristic function.90 and 0. fixed. used mainly in industrial applications.6 0. 0. Incidentally.9447 0.95 0.9326 0.65 0. given by the dashed lines of Figure 2. .0 0.0207 0.1 u 0. Of course.6 0.80. π(θ ): Probability of Probability of Probability of type I error type II error rejecting H0 θ α(θ ) β(θ ) π(θ ) 0. it applies only to the decision criterion of Example 1.8 0. .8042 0.3 0.85 0.85. In Section 4 we indicated that in the Neyman–Pearson theory of testing hypo- theses we hold α.3 0. Diagram for Example 5.7 0. . if we had plot- ted in Figure 2 the probabilities of accepting H0 (instead of those of rejecting H0 ).4163 0. and this requires that the 347 . 0.8 0. together with the corresponding values of the power function.70 0.9793 The graph of this power function is shown in Figure 2.0553 0. particularly in the comparison of several critical regions that might all be used to test a given null hypothesis against a given alternative.50.95 and the prob- abilities β(θ ) of getting more than 14 successes for θ = 0. are given by 1 − π(θ ).55 0.0 Figure 2. .0114 0. of the given critical region.1 0. we would have obtained the operating characteristic curve.0674 0.75 0.60 0.7545 0.7 0. Hypothesis Testing probabilities α(θ ) of getting at most 14 successes for θ = 0.8745 0.90 0.1255 0. 
Power functions play a very important role in the evaluation of statistical tests.4 0.0003 0.50 0.4 0.6171 0.2 0.0114 0.5 0. p(u) 1.9 1.3829 0.80 0. Since for each value of θ .6 0. when we test a simple hypothesis against a composite alternative.0 0. for a given problem. while the other is prefer- able for θ > θ0 . designed for this purpose. and refer to one critical region of size α as uniformly more powerful than another if the values of its power function are always greater than or equal to those of the other. p(u) 1. the only point at which the value of a power function is the probability of making an error. the probability of a type I error. the values of power functions are probabilities of making correct decisions. and we say that the first critical region is uniformly more powerful than the second. To illustrate.2 0. Note that if the alternative hypothesis had been θ > θ0 . say. it is said to be a uniformly most powerful critical region.7 0. Thus. except θ0 . or a uniformly most powerful test. This facilitates the comparison of the power functions of several critical regions.5 0. In general.4 0. or test criteria. for instance that of Exercise 27. the second critical region is said to be inadmissible. 348 . In situations like this we need further criteria for comparing power functions. it can be seen by inspection that the critical region whose power function is given by the dotted curve of Figure 3 is preferable to the critical region whose power function is given by the curve that is dashed. a critical region of size ␣ is uniformly more powerful than any other critical region of size ␣. with the strict inequality holding for at least one value of the parameter under consideration. we specify α.9 0. the critical region whose power function is given by the solid curve would have been uniformly more powerful than the critical region whose power function is given by the dotted curve. Power functions. The same clear-cut distinction is not possible if we attempt to compare the critical regions whose power functions are given by the dotted and solid curves of Figure 3. θ = θ0 . If. which are all designed to test the simple null hypothesis θ = θ0 against a composite alter- native. Hypothesis Testing null hypothesis H0 be a simple hypothesis. in this case the first one is preferable for θ < θ0 . the power function of any test of this null hypothesis will pass through the point (θ0 . UNIFORMLY MOST POWERFUL CRITICAL REGION (TEST). say. giving the power functions of three different critical regions.1 u u0 Figure 3. consider Figure 3. The probability of not committing a type II error with the first of these critical regions always exceeds that of the second. α). As a result.3 0.8 0. also. DEFINITION 6. it is desirable to have them as close to 1 as possible. the alternative hypothesis θ Z θ0 . For instance. 349 . but it does not always apply to composite hypotheses. as defined in Section 4. It is mainly in connection with tests of this kind that we refer to the probability of a type I error as the level of significance. this will not enable us to reject the null hypothesis when α = 0. or we can say that this difference is not large enough to reject the null hypothesis. and 100 tosses yield 57 heads and 43 tails. let us suppose that X1 . in fact. We shall discuss this method here with reference to tests concerning one parameter θ and continuous populations. The resulting tests. or a discrete set of real numbers. we can say that the difference between 50 and 57. 
may reasonably be attributed to chance. called likeli- hood ratio tests. 6 Likelihood Ratio Tests The Neyman–Pearson lemma provides a means of constructing most powerful crit- ical regions for testing a simple null hypothesis against a simple alternative hypoth- esis. It is also not the case in tests of significance. are based on a generalization of the method of Section 4. and as long as we do not actually accept the null hypothesis. we may well be reluctant to accept the null hypothesis as true. . where the alternative to rejecting H0 is reserving judgment instead of accepting H0 . . for example. Thus. the parameter space for θ is partitioned into the disjoint sets ω and ω . in multistage or sequential tests. and vice versa. To illustrate the likelihood ratio technique. but this is not the case. We often refer to as the parameter space for θ . Of course. Xn constitute a random sample of size n from a population whose density at x is f (x. is.05 (see Exercise 42). but they are not necessarily uniformly most powerful. θ ) and that is the set of values that can be taken on by the parameter θ . when we test a simple hypothesis against a simple alternative. θ is an element of the first set. but all our arguments can easily be extended to the multiparameter case and to discrete populations. However. where the alternatives are to accept H0 . uniformly most powerful. we do not really commit ourselves one way or the other. We shall now present a general method for constructing critical regions for tests of composite hypotheses that in most cases have very satisfactory properties. it is an element of the second set. Hypothesis Testing Unfortunately. Until now we have always assumed that the acceptance of H0 is equivalent to the rejection of H1 . X2 . if we want to test the null hypothesis that a coin is per- fectly balanced against the alternative that this is not the case. and according to the alternative hypothesis. To avoid this. a most powerful critical region of size α. In either case. since we obtained quite a few more heads than the 50 that we can expect for a balanced coin. . or to defer the decision until more data have been obtained. some interval of real numbers. the set of all positive real numbers. In most problems is either the set of all real numbers. according to the null hypothesis. to accept H1 . the number of heads that we expected and the number of heads that we obtained. uniformly most powerful critical regions rarely exist when we test a simple hypothesis against a composite alternative. we cannot commit a type II error. The null hypothesis we shall want to test is H0 : θ ∈ω and the alternative hypothesis is H1 : θ ∈ ω where ω is a subset of and ω is the complement of ω with respect to . . . since they depend on the observed values x1 . To summarize. since ω is a subset of the parameter space . and max L is the maximum value of the likelihood function for all values of θ in . in which case λ would be close to 1. we have the following definition. θ̂ ) i=1 and n max L = f (xi . . where max L0 is the maximum value of the likelihood function for all values of θ in ω. LIKELIHOOD RATIO TEST. it follows that λ G 0. when the null hypothesis is true and θ ∈ ω. we would expect max L0 to be small compared to max L. xn . defines a likelihood ratio test of the null hypothesis ␪ ∈ ␻ against the alternative hypothesis ␪ ∈ ␻ . x2 . and their ratio max L0 λ= max L is referred to as a value of the likelihood ratio statistic (capital Greek lambda). we would expect max L0 to be close to max L. 
we com- pare instead the two quantities max L0 and max L. and θ̂ˆ is the maximum likelihood estimate of θ for all values of θ in . If H0 is a simple hypothesis. if we have a random sample of size n from a population whose density at x is f (x. where at least one of the two hypotheses is composite. respectively. in which case λ would be close to zero. When the null hypothesis is false. it follows that λ F 1. Hypothesis Testing When H0 and H1 are both simple hypotheses. θ̂ˆ ) i=1 These quantities are both values of random variables. θ̂ is the max- imum likelihood estimate of θ subject to the restriction that θ must be an element of ω. and in Section 4 we constructed tests by comparing the likelihoods L0 and L1 . . if possible. then the critical region λ … k where 0 < k < 1. also. for at least one value 350 . A likelihood ratio test states. . ω and ω each have only one ele- ment. In other words. In the general case. If ␻ and ␻ are complementary subsets of the parameter space and if the likelihood ratio statistic max L0 λ= max L where max L0 and max L are the maximum values of the likelihood function for all values of ␪ in ␻ and . therefore. k is chosen so that the size of the critical region equals α. On the other hand. Since max L0 and max L are both values of a likelihood function and therefore are never negative. where 0 < k < 1. if H0 is composite. that the null hypothesis H0 is rejected if and only if λ falls in a critical region of the form λ F k. DEFINITION 7. then n max L0 = f (xi . and equal to α. k is chosen so that the probability of a type I error is less than or equal to α for all θ in ω. θ ). Hypothesis Testing of θ in ω. the integral is replaced by a summation. after taking logarithms and dividing by − . the critical region of the likelihood ratio test is n − (x−μ0 )2 e 2σ 2 Fk n and. it becomes 2σ2 2σ 2 (x − μ0 )2 G − · ln k n 351 . Solution Since ω contains only μ0 .  n 1 − 1 ·(xi −μ0 )2 max L0 = √ · e 2σ 2 σ 2π and  n 1 − 1 ·(xi −x)2 max L = √ · e 2σ 2 σ 2π where the summations extend from i = 1 to i = n. Thus. then k must be such that k P( F k) = g(λ)dλ = α 0 In the discrete case. if H0 is a simple hypothesis and g(λ) is the density of at λ when H0 is true. it follows that μ̂ˆ = x. and the value of the likelihood ratio statistic becomes 1 − ·(xi −μ0 )2 e 2σ 2 λ= 1 − ·(xi −x)2 e 2σ 2 n − (x−μ0 )2 =e 2σ 2 after suitable simplifications. and k is taken to be the largest value for which the sum is less than or equal to α. and since is the set of all real numbers. Thus. which the reader will be asked to verify in Exercise 19. EXAMPLE 6 Find the critical region of the likelihood ratio test for testing the null hypothesis H0 : μ = μ0 against the composite alternative H1 : μ Z μ0 on the basis of a random sample of size n from a normal population with the known variance σ 2 . Hence. it follows that μ̂ = μ0 . For instance. because we were able to refer to the known distribution of X and did not have to derive the distribution of the likelihood ratio statistic itself. Hypothesis Testing or |x − μ0 | G K where K will have to be determined so that the size of the critical region is α. if the population has more than one unknown parameter upon which the null hypothesis imposes r restrictions. under very general conditions. In the preceding example it was easy to find the constant that made the size of the critical region equal to α. whose proof is referred to at the end of this chapter. 
if we want to test the null hypothesis that the unknown mean and variance of a normal population are μ0 and σ02 against the alternative hypothesis that μ Z μ0 and σ 2 Z σ02 . THEOREM 2. it is often preferable to use the following approximation. Since small values of λ correspond to large values of −2 · ln λ. equivalently. the number of degrees of freedom in the chi-square approximation to the distribution of −2 · ln would be 2. the two restrictions are μ = μ0 and σ 2 = σ02 . see the references at the end of this chapter. Since the distribution of is usually quite complicated.1 In connection with Example 6 we find that  2 n x − μ0 −2 · ln λ = 2 (x − μ0 )2 = √ σ σ/ n † For a statement of the conditions under which Theorem 2 is true and for a proof of this theorem. the distribution of −2 · ln approaches. the null hypothesis must be rejected when Z takes on a value greater than or equal to zα/2 or a value less than or equal to −zα/2 .† For large n. the chi-square distribution with 1 degree of freedom. we can use Theo- rem 2 to write the critical region of this approximate likelihood ratio test as 2 −2 · ln λ G χα. We should add that this theorem applies only to the one-parameter case. Note that ln k is negative in view of the fact that 0 < k < 1. |z| G zα/2 where x − μ0 z= √ σ/ n In other words. we n find that the critical region of this likelihood ratio test is σ |x − μ0 | G zα/2 · √ n or. the number of degrees of freedom in the chi-square approx- imation to the distribution of −2 · ln is equal to r. which makes it difficult to evaluate k. 352 . σ2 Since X has a normal distribution with the mean μ0 and the variance . or x = 3. b. and c. which is somewhat out of the ordinary. The corresponding probability of a type II error is given by g(4) + g(5) + g(6) + g(7). we want to test the simple null hypothesis that the probability distribution of X is x 1 2 3 4 5 6 7 1 1 1 1 1 1 1 f (x) 12 12 12 4 6 6 6 against the composite alternative that the probability distribution is x 1 2 3 4 5 6 7 a b c 2 g(x) 0 0 0 3 3 3 3 where a + b + c = 1. Hypothesis Testing which actually is a value of a random variable having the chi-square distribution with 1 degree of freedom. clearly. and hence it equals 23 . max L = 13 (corresponding to a = 1).25. 353 . and hence λ = 14 . that is. we first let x = 1.25 since f (4) = 14 . the critical region obtained by means of the likelihood ratio technique is inadmissible. As we indicated in Section 6. EXAMPLE 7 On the basis of a single observation. To determine λ for each value of x. Now let us consider the critical region for which the null hypothesis is rejected only when x = 4. we get the results shown in the following table: x 1 2 3 4 5 6 7 1 1 1 3 λ 1 1 1 4 4 4 8 If the size of the critical region is to be α = 0. x = 2. subject only to the restriction that a + b + c = 1. Its size is also α = 0. For this value we get max L0 = 12 1 . we find that the likelihood ratio technique yields the critical region for which the null hypothesis is rejected when λ = 14 . Show that the critical region obtained by means of the likeli- hood ratio technique is inadmissible. f (1) + f (2) + f (3) = 12 1 + 12 1 + 12 1 = 0. That this is not always the case is illustrated by the following example. Solution The composite alternative hypothesis includes all the probability distributions that we get by assigning different values from 0 to 1 to a. the likelihood ratio technique will generally pro- duce satisfactory results. 
but the corresponding probability of a type II error is a b c g(1) + g(2) + g(3) + g(5) + g(6) + g(7) = + + +0+0+0 3 3 3 1 = 3 Since this is less than 23 . when x = 1.25. Determining λ for the other values of x in the same way.  k (ni − 1)s2 i 19. . ⎨ 1 f (x) = 1 + θ2 −x for 0 < x < 1 ⎩ 2 23. show that the critical region x F α 24. and 2. In the solution of Example 6. variances are to be used to test the null hypothesis σ12 = Also plot the graph of the corresponding power function. suppose that we had hypothesis σ = σ0 against the alternative hypothesis wanted to test the null hypothesis k F 2 against the alter. Given a single x · e−x/θ0 F K observation of the random variable X having the density 22. Cal. Find the probabilities of (a) type I errors for k = 0. . Exercise 25 can be expressed in terms of the ratio of the ical region. i=1 λ= ⎡ ⎤n/2 in particular its minimum and its symmetry. (b) Use the result of part (a) to show that the critical (b) Using the results of part (a). With reference to Exercise 3. σ22 = · · · = σk2 against the alternative that these variances 18. Show that for k = 2 the likelihood ratio statistic of where K is a constant that depends on the size of the crit. 21. a critical region is unbi- region of the likelihood ratio test can be written as ased if the probability of rejecting the null hypothesis is least when the null hypothesis is true. find an expres. Hypothesis Testing Exercises 17. 5. [Hint: Use the infi- nite series for ln(1 + x). a critical region is said to be unbiased if not equal θ0 . This question has been intentionally omitted for this ⎧   edition. n2 . σ Z σ0 . The number of successes in n trials is to be used to likelihood estimates of the means μi and the variances test the null hypothesis that the parameter θ of a bino. In other words. show that the likelihood region of the likelihood ratio test can be written as ratio statistic can be written as x · ln x + (n − x) · ln(n − x) G K  ni /2 k (ni − 1)s2i where x is the observed number of successes. When we test a simple null hypothesis against a com- population equals θ0 against the alternative that it does posite alternative. . 25. μ̂ˆ i = xi and σ̂ˆ i2 = ni (a) Find an expression for the likelihood ratio statistic. ni (c) Study the graph of f (x) = x · ln x + (n − x) · ln(n − x). σi2 are mial population equals 12 against the alternative that it (ni − 1)s2i does not equal 12 . verify the step that μ̂i = xi and σ̂i2 = n led to i=1 − n (x−μ0 )2  k λ=e 2σ 2 where n = ni . 1. For the likelihood ratio statistic of Exercise 22.] where −1 F θ F 1. the null hypothesis if x F 15 and accept it if x > 15. With reference to Example 5. suppose that we reject are not all equal. native hypothesis k > 2. and 7. show 0 elsewhere that −2 · ln λ approaches t2 as n → q. the corresponding power function takes on its minimum (a) Find an expression for the likelihood ratio statistic. while without restrictions the maximum i=1 20. to show that k (ni − 1)s2i the critical region of this likelihood ratio test can also be ⎣ ⎦ written as n i=1 x − n G K 2 26. (a) Show that under the null hypothesis the maximum culate π(θ ) for the same values of θ as in the table in likelihood estimates of the means μ1 and the variances Section 5 and plot the graph of the power function of this σi2 are test criterion. two sample variances and that the likelihood ratio test can. the null hypothesis θ = 0 against the alternative hypoth- sion for the likelihood ratio statistic for testing the null esis θ Z 0. 
provides an unbiased critical region of size α for testing ulation with unknown mean and variance. therefore. value at the value of the parameter assumed under the (b) Use the result of part (a) to show that the critical null hypothesis. Independent random samples of sizes n1 . 354 . 6. A random sample of size n is to be used to test the null hypothesis that the parameter θ of an exponential 27. be based on the F distribution. Given a random sample of size n from a normal pop. . and nk from k normal populations with unknown means and (b) type II errors for k = 4. where μ is the average dry.6 cm. (a) Explain under what conditions we would commit a (a) What alternative hypothesis should the department type I error and under what conditions we would commit use if it does not want to use the new tires unless they a type II error. and vice versa. (b) μ = 12. takes a random sample and decides to accept the null hypothesis if and only if the mean of the sample falls (c) What alternative hypothesis should the department between 12. school year. the burden of proof is put on the new tires.3 mm burden of proof is on the old tires. 34. A biologist wants to test the null hypothesis that the give poorer mileage than the old tires? Note that now the mean wingspan of a certain kind of insect is 12. what conditions the doctor would be committing a type I (b) he gets a sample mean of 10.6 cm. If μ1 is the average number of miles that the old tires last and μ2 is the aver. Suppose that we want to test the null hypothesis that age number of miles that the new tires will last. and assume that use if she does not want to make the modification in the high scores are desirable. A botanist wishes to test the null hypothesis that the percent of its passengers object to smoking inside the average diameter of the flowers of a particular plant is plane. ple falls between 9.3 mm.2 cm and μ = 9.6 mm. What decision will he make and will it be in physical checkup to test the null hypothesis that he will be error if able to take on additional responsibilities.2 cm and μ = 9.5 mm. (b) What alternative hypothesis should the department use if it is anxious to get the new tires unless they actually 35. He decides to take a random sample of size n = 80 mitting a type I error and under what conditions they and accept the null hypothesis if the mean of the sam- would be committing a type II error. becomes a type II error. hypothesis to be tested is μ1 = μ2 .8 cm? 30. (d) he gets a sample mean of 9. and error depends on how we formulate the null hypothe- the old tires are to be kept unless the null hypothesis can sis. the null an antipollution device for cars is effective. Rephrase the null hypothesis so that the type I error be rejected. Explain under (a) he gets a sample mean of 10. he will reject the null 29. tests after many years of experience. and μ1 is the average score obtained on these ing time of the modified paint.9 cm. Investigating the effectiveness of a modifica. A city police department is considering replacing the the standardized test scores? tires on its cars with a new brand tires. chemical composition of the paint unless it decreases the (a) What null hypothesis should the education special- drying time? ist use? 
(b) What alternative hypothesis should the manufacturer (b) What alternative hypothesis should be used if the spe- use if the new process is actually cheaper and she wants to cialist does not want to adopt the new discs unless they make the modification unless it increases the drying time improve the standardized test scores? of the paint? (c) What alternative hypothesis should be used if the spe- cialist wants to adopt the new discs unless they worsen 31.2 cm and μ = 9. of third-grade students with reading disabilities. A doctor is asked to give an executive a thorough hypothesis.2 cm and μ = 9. the manu. error and under what conditions he would be committing (c) he gets a sample mean of 9. Applied Exercises SECS. Let μ2 be the aver- (a) What alternative hypothesis should the manufacturer age score for students using the discs. a type II error. Students facturer wants to test the null hypothesis μ = 20 minutes in this class are given a standardized test in May of the against a suitable alternative. An airline wants to test the null hypothesis that 60 32. 9.3 cm and 9. An education specialist is considering the use of 20 minutes.8 cm. The average drying time of a manufacturer’s paint is 33.6 cm. Hypothesis Testing 7 The Theory in Practice The applied exercises that follow are intended to give the reader some practical experience with the theory of this chapter. what decision will she use so that rejection of the null hypothesis can lead make if she gets x = 12. which will be kept only against the alternative that it is not 12. if the mean of this sample falls outside this interval. instructional material on compact discs for a special class tion in the chemical composition of her paint. Explain under what conditions they would be com.0 mm and 12.9 mm and will it be in error if either to keeping the old tires or to buying the new ones? (a) μ = 12. If she if the null hypothesis can be rejected. 1–4 28.3 mm? 355 . are definitely proved to give better mileage? In other (b) Whether an error is a type I error or a type II words. 5. cance.0. 1986. 28. 19.0. n3 = 6.0 against an exponential population. and 3. 33.0. Wald. 47. 38.5.0. 1.2.) References Discussions of various properties of likelihood ratio Wilks. 1957. s23 = 12. S. esis that on the average the bank cashes 10 bad checks per 39. An employee of a bank wants to test the null hypoth. find (b) of Exercise 25 to calculate −2 · ln λ and test the null (a) the probability of a type I error. 42. 12.. and will it be in error if 41. A single observation is to be used to test the null hypothesis that the mean waiting time between tremors 43. for duced in example. hypothesis σ12 = σ22 = σ32 = σ42 at the 0. 12. 45.05 level of significance. week at a certain intersection (that λ > 2 for this Poisson population) against the alternative hypothesis that on the 37.0.0. Explain why the number of degrees of freedom for 12. 65.: Stanford University Press. 30.. Testing Statistical Hypotheses. If (b) the probabilities of type II errors when μ = 41.4.0. and 0. a random variable having an exponential distribution. and s24 = 24. find 38. 16. L. day against the alternative that this figure is too small. A random sample of size 64 is to be used to test the in accelerate environment tests are 15.570. 2nd ed. 4. particularly their large-sample properties. use the results of Exercise 21 the alternative that it is greater than 40. 62. 
Verify the statement in Section 5 that 57 heads and 43 we change the alternative hypothesis to tails in 100 flips of a coin do not enable us to reject the null (a) θ1 = 50. hypothesis is to be rejected if and only if the sum of the observations is five or less.000 miles. 3. (Use ln 1. s22 = 25.0. and 48. a certain kind of tire will last.8. (b) λ = 10. and 20.0. 46. The times to failure of certain electronic components 40. (Hint: Use the normal approxima- tion to the binomial distribution. researchers took independent random samples nential population) is θ = 10 hours against the alternative of sizes n1 = 8.0. error and use the Neyman–Pearson lemma to construct a critical region.0. and Wiley & Sons. If the null and Theorem 2 to test the null hypothesis θ = 15 min- hypothesis is to be rejected if and only if the mean of the utes against the alternative hypothesis θ Z 15 minutes at random sample exceeds 43.01.000 miles 2. (a) the probabilities of type I errors when μ = 37. and got that θ Z 10 hours. Assuming that the and only if the observed value is less than 8 or greater populations sampled are normal. on the average. 2. 35. 44.5.05 level of signifi- (b) the probabilities of type II errors when θ = 2. and 40. Assuming that we are dealing with 1. 5–6 level of significance. Inc. and 28 min- score on an achievement test (the mean of a normal pop. a proof of Theorem 2 may be found in most Much of the original research done in this area is repro- advanced textbooks on the theory of statistics. find the 0. 25. 2. Looking upon these data as a random sample from ulation with σ 2 = 256) is less than or equal to 40. Inc.. The sum of the values obtained in a random sample (a) λ = 11.0? of size n = 5 is to be used to test the null hypothesis Here λ is the mean of the Poisson population being that on the average there are more than two accidents per sampled. New York: John tests. Calif. in Selected Papers in Statistics and Probability by Abraham Lehmann. 44. against the alternative hypothesis that it will last. If the null hypothesis is to be rejected if s21 = 16. 16. 1962. Mathematical Statistics. 8. 12. 18. average. S. To compare the variations in weight of four breeds recorded at a seismological station (the mean of an expo.) 39.0. hypothesis if and only if the mean of the sample exceeds Also plot the power function of this test criterion. Would we get the same critical region if 42.0.0. utes. plot the graph of the power function of this test specify the sample size and the probability of a type I criterion. (b) β = 0.763 = 0. Suppose that we want to test the null hypothesis that (a) the probabilities of type I errors when λ = 2. use the formula of part than 12. 356 . n2 = 10. 4. 44. he takes a random sample and decides to reject the null 43. Stanford. Rework Example 3 with average the number of accidents is two or less. what decision will he make if he gets x = 11. null hypothesis that for a certain age group the mean 20. E. Also plot the power function of this test criterion.000 miles? hypothesis that the coin is perfectly balanced (against the alternative that it is not perfectly balanced) at the 0. 51.05 SECS. 45. If the null (a) β = 0. 2..5.5. (b) θ1 > 35. on the (b) the probabilities of type II errors when λ = 2.0.000 miles. Hypothesis Testing 36. 6. New York: John Wiley & Sons. and n4 = 8.03.6. 42. this approximate chi-square test is 3.0. of dogs.2. we Also. 37. 0. 0. (c) The alternative hypothesis 9 1 − β = 0. (b) The alterna- 11 xi Ú K.0027. 
3 α = 21 7 21 (a) λ = e−(nx/θ0 +n) θ0 5 α = (1 − θ0 )k−1 and β = 1 − (1 − θ1 )k−1 . 2 . 357 .  n 7 21 7 x 1 and β = 5 .0420. Hypothesis Testing Answers to Odd-Numbered Exercises 1 (a) Simple. the fact that Xi has the gamma distribution with α = n 35 (a) Correctly reject the null hypothesis.122.424. and 13 β = 0. (b) composite. (b) 0. and 0. (b) Erro- i=1 neously reject the null hypothesis. 11 . the null hypothesis cannot be rejected. is μ1 Z μ2 .3840.0107. 39 (a) 0. 15 x2i Ú K. 0.08.129.0055. i=1 the formula for the sum of n terms of a geometric distribu. 0. (b) 5 . 0. n 41 (a) 0. where K can be determined by making use of tive hypothesis is μ2 > μ1 .9329.086.852. (b) The alter- 7 α = 0.016. 0. 21 (d) composite. where K can be determined by making use of (b) 0. 17 (a) 0. 0. n 33 (a) The null hypothesis is μ1 = μ2 .0375.0203. 0. 0. 43 −2 · ln λ = 1.145. and β = θ0 . (c) The alternative hypothesis is i=1  n μ2 < μ1 . 0. 0.134. 0. 1 . 0. (c) composite. native hypothesis is μ1 > μ2 .114. 31 (a) The alternative hypothesis is μ2 > μ1 . tion. and 0.7585.144. This page intentionally left blank . ˆ Such a test is referred to as a two-tailed test. Therefore. Marylees Miller. On the other hand. Copyright © 2014 by Pearson Education. For instance. it would seem reasonable to reject H0 only when θ̂ is much smaller than θ0 . we reject H0 only for large values of θ̂ . in this case it would be logical to let the critical region consist only of the left-hand tail of the sampling distribution of . in testing H0 : θ = θ0 against the one-sided alternative H1 : θ > θ0 . To explain the terminology we shall use.ˆ Any test where the critical region consists only of one tail of the sampling distribution of the test statistic is called a one-tailed test. TEST OF SIGNIFICANCE. the likelihood ratio technique leads to a two-tailed test with the critical region σ |x − μ0 | G zα/2 · √ n From Chapter 13 of John E. the size of the critical region. for the two-sided alternative μ Z μ0 . ␣. Inc. A statistical test which specifies a simple null hypothesis. and a composite alternative hypoth- esis is called a test of significance. In such a test. Eighth Edition.Tests of Hypothesis Involving Means. can be obtained by the likelihood ratio technique. and Proportions 1 Introduction 6 Tests Concerning Differences Among k 2 Tests Concerning Means Proportions 3 Tests Concerning Differences Between 7 The Analysis of an r * c Table Means 8 Goodness of Fit 4 Tests Concerning Variances 9 The Theory in Practice 5 Tests Concerning Proportions 1 Introduction In this chapter we shall present some of the standard tests that are most widely used in applications. Irwin Miller. and the critical region consists only of the right tail of the sampling distribution of . 359 . Variances. Since it appears reasonable to accept the null hypothesis when our point estimate θ̂ of θ is close to θ0 and to reject it when θ̂ is much larger or much smaller than θ0 . let us first consider a situation in which we want to test the null hypothesis H0 : θ = θ0 against the two-sided alternative hypothesis H1 : θ Z θ0 . ␣ is referred to as the level of significance. if we are testing the null hypothesis H0 : θ = θ0 against the one-sided alternative H1 : θ < θ0 . it would be logical to let the critical region consist of both tails of the sampling distribution of our test statistic . at least those based on known population distributions. All rights reserved. Most of these tests. Freund’s Mathematical Statistics with Applications. DEFINITION 1. ˆ Likewise. 
and Proportions Reject H0 Accept H0 Reject H0 a/2 a/2 x s m0 s m0 za/2 m0 za/2 n n Figure 1. the null hypothesis μ = μ0 is rejected if X takes on a value falling in either tail of its sampling distribution. where x − μ0 z= √ σ/ n Had we used the one-sided alternative μ > μ0 . or σ σ x F μ0 − zα/2 · √ and x G μ0 + zα/2 · √ n n As pictured in Figure 1. Critical region for one-tailed test (H1 : μ > μ0 ). this critical region can be written as z F −zα/2 or z G zα/2 . Critical region for two-tailed test. Variances. Symbolically. 360 . It stands to reason that in the first case we would reject the null hypothesis only for values of X falling into the right-hand tail of its sampling distribution. and in the second case we would reject the null hypothesis only for values of X falling into the left-hand Accept H0 Reject H0 a x m0 s m0 za n Figure 2. Tests of Hypothesis Involving Means. the likelihood ratio technique would have led to the one-tailed test whose critical region is pictured in Figure 3. the likelihood ratio technique would have led to the one-tailed test whose critical region is pictured in Figure 2. and if we had used the one-sided alternative μ < μ0 . or the critical values) require knowledge of zα or zα/2 . With reference to the test for which the critical region is shown in Figure 2. the boundaries of the critical regions. or an F distribution. and 3. 2. However. or reserve judgment. “there are many problems in which it is difficult. it has been the custom to base tests of statistical hypotheses almost exclusively on the level of significance α = 0. determine a critical region of size α.05 or α = 0. and Proportions Reject H0 Accept H0 a x s m0 m0 za n Figure 3. Variances. it has been the custom to outline tests of hypotheses by means of the following steps: 1. is not specified in a test of significance. and of course it is. Check whether the value of the test statistic falls into the critical region and. we could use a decision-theory approach and thus take into account the consequences of all possible actions. 2.Tests of Hypothesis Involving Means. 3. but the problem is not always this simple. F . 4. Determine the value of the test statistic from the sample data.” With the advent of computers and the general availability of statistical software. and specify α. χα/2 2 . two-sided alternatives usually lead to two-tailed tests and one-sided alternatives usually lead to one-tailed tests. or F α α/2 . where z is as defined before.) In Figures 1.01. Although there are exceptions to this rule (see Exercise 1). χα2 . Traditionally. but only for a few values of α. Using the sampling distribution of an appropriate test statistic. Critical region for one-tailed test (H1 : μ < μ0 ). tα/2 . the usual tables will provide the necessary values of tα . For instance. if not impossible. to assign numerical values to the consequences of one’s actions and to the probabilities of all eventualities. Alternatively. we compare the shaded region of Figure 4 with 361 . Formulate H 0 and H 1 . if the sampling distribution of the test statistic happens to be a t distribution. a chi-square distribution. and this accounts for the current preference for using P-values (see Definition 2). accordingly. the corresponding critical regions can be written as z G zα and as z F −zα . This may seem very arbitrary. These values are readily available from Table III of “Statistical Tables” (or more detailed tables of the standard normal distribution) for any level of significance α. Mainly for this reason. 
the dividing lines of the test criteria (that is. the four steps outlined on this page may be modified to allow for more freedom in the choice of the level of significance α. (Note that we do not accept the null hypothesis because β. Symbolically. the probability of false acceptance. reject the null hypothesis. tail of its sampling distribution. say. or we withhold judgment. Diagram for definition of P-values. we define P-values as follows. Corresponding to an observed value of a test statistic. Determine the value of the test statistic and the corresponding P-value from the sample data. when the alternative hypothesis is μ < μ0 and the critical region is the one of Figure 3. we reject the null hypothesis if the shaded region of Figure 4 is less than or equal to α. Variances. More generally. the P-value is 2P(X G x) or 2P(X F x). α instead of comparing the observed value of X with the boundary of the critical region or the value of X − μ0 Z= √ σ/ n with zα/2 . it is the probability P(X G x) when the null hypothesis is true. this allows for more freedom in the choice of the level of significance. DEFINITION 2. This shaded region is referred to as the P-value. the first of the four steps on the previous page remains unchanged. Check whether the P-value is less than or equal to α and. Tests of Hypothesis Involving Means. the prob-value. the second step becomes 2 .05 or α = 0. the observed value of X. In fact. P-VALUE. but it is difficult to conceive of situations in which we could justify using. and the fourth step becomes 4 . In 362 .01. As we pointed out earlier. Here again we act as if the null hypothesis is true. or reserve judgment. accordingly. the P-value is the lowest level of significance at which the null hypothesis could have been rejected.015 rather than α = 0. and Proportions P-value ⫺ x Figure 4. or the observed level of significance correspond- ing to x. depending on whether x falls into the right-hand tail or the left-hand tail of the sampling distribution of X. Correspondingly. In other words.04 rather than α = 0. the tail probability. α = 0. and when the alternative hypothesis is μ Z μ0 and the critical region is the one of Figure 1. reject the null hypothesis. Specify the test statistic. the P-value is the probability P(X F x) when the null hypoth- esis is true. With regard to this alternative approach to testing hypotheses. the third step becomes 3 . scientists get P-values of 0. and in Section 3 we shall discuss the corresponding tests concern- ing the means of two populations. consider the temptation one might be exposed to when choosing α after having seen the P-value with which it is to be compared. which do not require knowledge about the population or populations from which the samples are obtained. In any case.0021 for the effectiveness of these drugs in reducing the size of tumors. Limiting their role to data analysis.05.33. where we are not really concerned with making inferences. 363 . The critical regions for the respec- tive alternatives are |z| G zα/2 . it would be tempting to choose α = 0.01 = 2. z G zα . Variances.05 or α = 0. If we are anxious to reject the null hypothesis and thus prove our point. at least in part. or reserving judgment—will be the same. when a great deal is at stake and it is practical. P-values can be used as measures of the strength of evi- dence.0735 and 0. we might use a level of significance much smaller than α = 0. or μ < μ0 on the basis of a random sample of size n from a normal population with the known variance σ 2 . 
There are statisticians who prefer to avoid all problems relating to the choice of the level of significance.645. This means that no matter which method we use. μ > μ0 . it is virtually impossible to avoid some element of arbitrariness. All the tests in this section are based on normal distribution theory.036.05 and 0.01 reflects acceptable risks.575. that in cancer research with two drugs. and the nature of the problem (see.01. Tests of Hypothesis Involving Means. Of course. for instance. the cor- responding values of zα and zα/2 are z0. Suppose that we want to test the null hypothesis μ = μ0 against one of the alter- natives μ Z μ0 . In practice. Nevertheless. Suppose. in exploratory data analysis. and in most cases we judge subjectively. we use whichever method is most convenient.01. it is always desirable to have input from others (research workers or management) in formulating hypotheses and specifying α. the ultimate decision—rejecting the null hypothesis.025 = 1. for instance. the most commonly used levels of significance are 0. This suggests that there is more supporting evidence for the effectiveness of the second drug. and Proportions practice.005 = 2. for instance. are equivalent. Of course.05 = 1. Example 8 and Exercise 57). To compound the difficulties. there are also some nonparametric alternatives to these tests. assuming either that the samples come from normal populations or that they are large enough to justify normal approximations. z0. but it would hardly seem reasonable to dump P-values into the laps of persons without adequate training in statistics and let them take it from there. it would be tempting to choose α = 0. they do not specify α and omit step 4 .” 2 Tests Concerning Means In this section we shall discuss the most widely used tests concerning the mean of a population. and as the reader can verify from Table III of “Statistical Tables”. if we are anxious to accept the null hypothesis and thus prove our point. where x − μ0 z= √ σ/ n As we indicated in Section 1. and z0. it should be understood that the two methods of testing hypotheses. and z F −zα . and this may depend on the sampling distribution of the test statistic. the four steps given earlier and the four steps described here.01. z0.96. or that the second drug “looks much more promising. Suppose. whether α = 0. that an experi- ment yields a P-value of 0. the availability of statistical tables or computer software. EXAMPLE 2 Suppose that 100 high-performance tires made by a certain manufacturer lasted on the average 21.16 ounce. Solution 1.05 level of significance. When we are dealing with a large sample of size n G 30 from a population that need not be normal but has a finite variance.575 or z G 2. consider the following example. Substituting x = 8. Test the null hypothesis μ = 22. To illustrate the use of such an approximate large-sample test. and in the second case α would be the maximum probability of committing a type I error for any value of μ assumed under the null hypothesis.091 − 8 z= √ = 2.01. employees select a random sample of 25 packages and find that their mean weight is x = 8.16/ 25 4. σ = 0. where x − μ0 z= √ σ/ n 3.000 miles against the alternative hypothesis μ < 22. we get 8. Had we used the alternative approach described in Section 1.16. It should be noted that the critical region z G zα can also be used to test the null hypothesis μ = μ0 against the simple alternative μ = μ1 > μ0 or the composite null hypothesis μ F μ0 against the composite alternative μ > μ0 . Of course. 
and since 0.819 miles with a standard deviation of 1. Variances. and n = 25. 364 . and Proportions EXAMPLE 1 Suppose that it is known from experience that the standard deviation of the weight of 8-ounce packages of cookies made by a certain bakery is 0.091 ounces. the null hypothesis must be rejected and suitable adjustments should be made in the production process.0046 is less than 0. and even when σ 2 is unknown we can approximate its value with s2 in the computation of the test statistic.0046 (see Exercise 21).575. similar arguments apply to the critical region z F −zα .575. μ0 = 8.01 level of significance. we would have obtained a P-value of 0. to check whether the true average weight of the packages is 8 ounces. that is. we can use the central limit theo- rem to justify using the test for normal populations. In the first case we would be testing a simple hypothesis against a simple alternative. To check whether its production is under control on a given day.091. Tests of Hypothesis Involving Means.84 0. Reject the null hypothesis if z F −2. H0 : μ = 8 H1 : μ Z 8 α = 0. Since the bakery stands to lose money when μ > 8 and the customer loses out when μ < 8. test the null hypothesis μ = 8 against the alternative hypothesis μ Z 8 at the 0. the conclusion would have been the same.01 2.84 exceeds 2.295 miles.000 miles at the 0. Since z = 2. If five pieces randomly selected from different rolls have breaking strengths of 171. To illustrate this one-sample t test. Note that the comments made on the previous page in connection with the alternative hypothesis μ1 > μ0 and the test of the null hypothesis μ F μ0 against the alternative μ > μ0 apply also in this case. H0 : μ = 22.9. the likeli- hood ratio technique yields a corresponding test based on x − μ0 t= √ s/ n which is a value of a random variable having the t distribution with n − 1 degrees of freedom. 191. we get 21. where x − μ0 z= √ σ/ n 3. the test we have been discussing in this section cannot be used. or μ < μ0 are. When n < 30 and σ 2 is unknown. the null hypothesis cannot be rejected.819. |t| G tα/2. 178.000 H1 : μ < 22.05 365 . Substituting x = 21. μ > μ0 . Had we used the alternative approach described in Section 1. consider the follow- ing example. and 189. critical regions of size α for testing the null hypothesis μ = μ0 against the alternatives μ Z μ0 .40 is greater than −1.05 2.1 pounds. for random samples from normal populations. there is no convincing evidence that the tires are not as good as assumed under the null hypothesis. Thus.645.05 level of significance. As should have been expected.8.6.295 for σ . and t F −tα.819 − 22.3. Reject the null hypothesis if z F −1.645.000 z= √ = −1. and n = 100. H0 : μ = 185 H1 : μ < 185 α = 0. Solution 1.05. test the null hypothesis μ = 185 pounds against the alternative hypothesis μ < 185 pounds at the 0. 184. n−1 . t G tα.40 1.295/ 100 4. and Proportions Solution 1. EXAMPLE 3 The specifications for a certain kind of ribbon call for a mean breaking strength of 185 pounds. Since z = −1. respectively.000. Variances. However.n−1 . which exceeds 0. as it is usually called.Tests of Hypothesis Involving Means. we would have btained a P-value of 0. the conclusion is the same: The null hypothesis cannot be rejected.000 α = 0.0808 (see Exercise 22). n−1 . μ0 = 22. s = 1. 49 is greater than −2. 3. First we calculate the mean and the standard deviation. or μ1 − μ2 < δ.20 against the alternative hypothesis μ1 − μ2 Z 0.132. 
we can still use the test that we have just described with s1 substituted for σ1 and s2 substituted for σ2 as long as both samples are large enough to invoke the central limit theorem.05.2/ 5 4.61 milligrams with a standard deviation of s1 = 0. where t is determined by means of the formula given above and 2. Applying the likelihood ratio technique.14 milligram. or we may want to decide on the basis of an appropriate sample survey whether the average weekly food expenditures of families in one city exceed those of families in another city by at least $10. we may want to decide upon the basis of suitable samples whether men can perform a certain task as fast as women. 366 . Then. test the null hypothesis μ1 − μ2 = 0.1 − 185 t= √ = −0. Let us suppose that we are dealing with independent random samples of sizes n1 and n2 from two normal populations having the means μ1 and μ2 and the known variances σ12 and σ22 and that we want to test the null hypothesis μ1 − μ2 = δ.00. Since t = −0.51 8. If we went beyond this and concluded that the rolls of ribbon from which the sample was selected meet specifications. Reject the null hypothesis if t F −2. Variances.1 and s = 8. we will arrive at a test based on x1 − x2 . EXAMPLE 4 An experiment is performed to determine whether the average nicotine content of one kind of cigarette exceeds that of another kind by 0.05 level of significance. the null hypothesis cannot be rejected.132. If n1 = 50 cigarettes of the first kind had an average nicotine content of x1 = 2. we would. 3 Tests Concerning Differences Between Means In many problems in applied research. where δ is a given constant.20 at the 0. μ1 − μ2 > δ. and the respective critical regions can be written as |z| G zα/2 .38 milligrams with a standard deviation of s2 = 0. and z F −zα . For instance.4 . Base the decision on the P-value corresponding to the value of the appropriate test statistic. and Proportions 2.132 is the value of t0. against one of the alternatives μ1 − μ2 Z δ. we are interested in hypotheses concerning differences between the means of two populations. of course. where x1 − x 2 − δ z=  σ12 σ22 + n1 n2 When we deal with independent random samples from populations with unknown variances that may not even be normal. getting x = 183.2. be exposed to the unknown risk of committing a type II error.20 milligram. Tests of Hypothesis Involving Means. we get 183. substituting these values together with μ0 = 185 and n = 5 into the formula for t.12 milligram. whereas n2 = 40 cigarettes of the other kind had an average nicotine content of x2 = 2. z G zα . n1 +n2 −2 . However. s2 = 0. Use the test statistic Z.20 H1 : μ1 − μ2 Z 0.20 z=  = 1. respectively. Assuming 367 .05. When n1 and n2 are small and σ1 and σ2 are unknown. and n2 = 40 into this formula. This means that the difference may well be attributed to chance. t G tα.38 = 0. EXAMPLE 5 In the comparison of two kinds of paint.3599 is the entry in Table III of “Statistical Tables” for z = 1. and t F −tα. To illustrate this two-sample t test.61. or μ1 − μ2 < δ under the given assumptions are. for independent random samples from two normal populations having the same unknown variance σ 2 . |t| G tα/2.08.38. Substituting x1 = 2.2802. the likelihood ratio tech- nique yields a test based on x1 − x 2 − δ t=  1 1 sp + n1 n2 where (n1 − 1)s21 + (n2 − 1)s22 s2p = n1 + n2 − 2 Under the given assumptions and the null hypothesis μ1 − μ2 = δ. where x1 − x2 − δ z=  σ12 σ22 + n1 n2 3 . 
whereas four 1-gallon cans of another brand cover on the average 492 square feet with a standard deviation of 26 square feet. a consumer testing service finds that four 1-gallon cans of one brand cover on the average 546 square feet with a standard deviation of 31 square feet.14)2 + 50 40 The corresponding P-value is 2(0. μ1 − μ2 > δ.20 α = 0.61 − 2.2802 exceeds 0. and Proportions Solution 1.61 − 2.20.20 is not significant.12 for σ1 .Tests of Hypothesis Involving Means.38 − 0. n1 +n2 −2 . δ = 0. H0 : μ1 − μ2 = 0. x2 = 2. we say that the difference between 2. consider the following problem.08 (0. we get 2.12)2 (0. s1 = 0. Since 0. this expression for t is a value of a random variable having the t distribution with n1 + n2 − 2 degrees of freedom. where 0. n1 = 50. Thus.23 and 0. the null hypothesis cannot be rejected.14 for σ2 . 4 .05  2 .5000 − 0. Variances.3599) = 0. the appropriate critical regions of size α for testing the null hypoth- esis μ1 − μ2 = δ against the alternatives μ1 − μ2 Z δ. the test we have been discussing cannot be used. n1 +n2 −2 . 0185 in this example. Solution 1.609 + 4 4 4. there are several possibilities.943. In Exercise 41 the reader will be asked to use suitable computer software to show that the P-value would have been 0.05 level of significance. the null hypothesis must be rejected. Tests of Hypothesis Involving Means. and the conclusion would.943 is the value of t0. Variances. we get  3(31)2 + 3(26)2 sp = = 28. to decide on the basis of weights “before and after” whether a certain diet is really effective or whether an observed difference between the average I. so the formula for s2p becomes 1 2 s2p = (s + s22 ) 2 1 Use of this formula would have simplified the calculations in this special case. from a normal population that. has the mean μ = δ.6 .Q. 3. the Smith–Satterthwaite test.609 4+4−2 and then substituting its value together with x1 = 546.67 exceeds 1. Then we test this null hypothesis against the appropriate alternative by means of the methods of Section 2. for example. A relatively simple one consists of randomly pairing the val- ues obtained in the two samples and then looking upon their differences as a random sample of size n1 or n2 . First calculating sp . and Proportions that the two populations sampled are normal and have equal variances.05 2.05. H0 : μ1 − μ2 = 0 H1 : μ1 − μ2 > 0 α = 0. Since t = 2. have been the same. we conclude that on the average the first kind of paint covers a greater area than the second. If the assumption of equal variances is untenable in a problem of this kind. δ = 0. under the null hypothesis. x2 = 492.943. and the methods we have introduced in this section cannot be used. whichever is smaller. Reject the null hypothesis if t G 1. test the null hypothesis μ1 − μ2 = 0 against the alternative hypothesis μ1 − μ2 > 0 at the 0.67 1 1 28. In both of these examples the samples are not independent because the data are actually paired. where t is calculated according to the formula given on the previous page and 1.’s of husbands and their wives is really significant. Note that n1 = n2 in this example. but there exist alternative techniques for handling the case where n1 Z n2 (one of these. and n1 = n2 = 4 into the formula for t. of course. we obtain 546 − 492 t=  = 2. So far we have limited our discussion to random samples that are independent. This is a good rea- son for having n1 = n2 . is mentioned among the references at the end of the chapter). A common way of handling this 368 . 
As far as direct applications are concerned. and that the probabilities of type I and type II errors are sis μ = μ1 . tests about variances are often prerequisites for tests concerning other parameters. Suppose that a random sample from a normal popula. Variances. δ  = 86. where μ1 > μ0 . then X2 has a chi-square distribution with ν − ν1 degrees of freedom”. α = (μ1 − μ0 )2 0. Show that the required size of the sample is (σ12 + σ22 )(zα + zβ )2 given by n= (δ − δ  )2 5. δ = 80. a teacher may want to know whether certain statements are true about the variability that he or she can expect in the performance of a student. μ0 = 15. or σ 2 < σ02 . and Proportions kind of problem is to proceed as in the preceding paragraph. and that the probabilities of to have the preassigned values α and β. and X1 + X2 has a chi-square distribution with ν > ν1 degrees of freedom. α = 0. 3. we can then use the test described in Section 2 to test the null hypothesis μ1 − μ2 = δ against the appropriate alternative. As far as indirect applications are concerned. and β = 0. and a pharmacist may have to check whether the variation in the potency of a medicine is within permissible limits. we shall want to test the null hypothesis σ 2 = σ02 against one of the alternatives σ 2 Z σ02 . 4. σ 2 > σ02 . hypothesis μ Z μ0 with the use of a one-tailed criterion based on the chi-square distribution. and if n is small. and the likelihood ratio technique leads to a test based on s2 .01.05. Based on theorem “If X1 and X2 are independent random vari- ables. and in practice this means that we may have to check on the reasonableness of this assumption before we perform the test concerning the means. and β = 0. μ1 = hypothesis μ = μ0 can be tested against the alternative 20. find the ulation with the known variance σ 2 . we can thus write the critical regions 369 . 4 Tests Concerning Variances There are several reasons why it is important to test hypotheses concerning the vari- ances of populations. Given a random sample of size n from a normal population. the two-sample t test described in Section 3 requires that the two pop- ulation variances be equal. Given a random sample of size n from a normal pop. a manufacturer who has to meet rigid specifications will have to perform tests about the variability of his product. to work with the differences between the paired measurements or observations. we can use the t test described also in Section 2 provided the differences can be looked upon as a random sample from a normal population. Tests of Hypothesis Involving Means. that is. Show that the type I and type II errors are to have the preassigned val. show that the null required size of the sample when σ = 9. With reference to the preceding exercise. If n is large. the value of the sample variance. The tests that we shall study in this section include a test of the null hypothesis that the variance of a normal population equals a given constant and the likelihood ratio test of the equality of the variances of two normal populations. For instance. find the required size of σ 2 (zα + zβ )2 n= the samples when σ1 = 9. With reference to Exercise 4.01. required size of the sample is given by ues α and β. Suppose that independent random samples of size n from two normal populations with the known variances 2. Exercises 1. 
σ12 and σ22 are to be used to test the null hypothesis μ1 − tion with the known variance σ 2 is to be used to test the μ2 = δ against the alternative hypothesis μ1 − μ2 = δ  null hypothesis μ = μ0 against the alternative hypothe.01. σ2 = 13. X1 has a chi-square distribution with ν1 degrees of freedom. Given independent random samples of sizes n1 and n2 from two normal popula- tions with the variances σ12 and σ22 . σ02 = 0.05 2.68) χ2 = = 32.36 4. This serves to indicate again that the choice of the level of significance is something that must always be specified in advance.68. The process is considered to be under control if the variation of the thicknesses is given by a variance not greater than 0. 2 n−1 . since χ 2 = 32. and Proportions for testing the null hypothesis against the two one-sided alternatives as χ 2 G χα. EXAMPLE 6 Suppose that the uniformity of the thickness of a part used in a semiconductor is critical and that measurements of the thickness of a random sample of 18 such parts have the variance s2 = 0. and the size of all these critical regions is.11 exceeds 27.36 against the alternative hypothesis σ 2 > 0.01 in the preceding example.17 2 . we get 17(0.36.11 does not exceed χ0.36 α = 0.587.587 is the value of χ0. Since χ 2 = 32. where the measurements are in thousandths of an inch. test the null hypothesis σ 2 = 0.05 level of significance. equal to α.68.17 2 = 33.05. n−1 .36 at the 0. the null hypothesis must be rejected and the process used in the manufacture of the parts must be adjusted.36. Solution 1.587. Note that if α had been 0. where (n − 1)s2 χ2 = σ02 and 27.11 0. where (n − 1)s2 χ2 = σ02 As far as the two-sided alternative is concerned. Reject the null hypothesis if χ 2 G 27. The likelihood ratio statistic for testing the equality of the variances of two nor- mal populations can be expressed in terms of the ratio of the two sample variances. Tests of Hypothesis Involving Means. 2 n−1 or χ F χ1−α/2. 2 n−1 and χ 2 F χ1−α. so we will be spared the temptation of choosing a value that happens to suit our purpose (see also Section 1). Substituting s2 = 0. and n = 18. Assuming that the mea- surements constitute a random sample from a normal population.409. we thus find from the theorem “If S21 and S22 are the variances of independent random samples of sizes n1 and n2 from normal 370 .01. of 2 2 course.36 H1 : σ 2 > 0. H0 : σ 2 = 0. 3. Variances. we reject the null hypothesis if χ 2 G χα/2. the null hypothesis could not have been rejected. n2 −1 if s21 G s22 s2 and s22 G fα/2.67 is the value of s22 f0. where the units of measurement are 1. then has an F distribution with ν2 and ν1 degrees X of freedom. we get s21 19. H0 : σ12 = σ22 H1 : σ12 Z σ22 α = 0.49 exceeds 3. s21 = 19. test the null hypothesis σ12 = σ22 against the alternative σ12 Z σ22 at the 0.Tests of Hypothesis Involving Means.12. reject the null hypothesis if G 3.5 4. respectively. n2 −1.2.5.5. n2 = 16. EXAMPLE 7 In comparing the variability of the tensile strength of two kinds of structural steel. where 3. s21 s22 G fα. 3. and Proportions S21 /σ12 σ22 S21 populations with the variances σ12 and σ22 . Since s21 G s22 .000 pounds per square inch. an experiment yielded the following results: n1 = 13.01. the null hypothesis must be rejected.02 level of significance. Solution 1. Substituting s21 = 19. Assuming that the measurements constitute independent random samples from two normal popu- lations. 
which is made possible by the fact that if the random variable X has an F distribution with 1 ν1 and ν2 degrees of freedom. n1 −1.49 s22 3.67. then F = = is a random S22 /σ22 σ12 S22 variable having an F distribution with n1 − 1 and n2 − 1 degrees of freedom” that corresponding critical regions of size α for testing the null hypothesis σ12 = σ22 against the one-sided alternatives σ12 > σ22 or σ12 < σ22 are. 371 . n1 −1 s22 s21 The appropriate critical region for testing the null hypothesis against the two-sided alternative σ12 Z σ22 is s21 2 G fα/2.15 .67.2 = = 5. n2 −1. n1 −1. and s22 = 3. n2 −1 and G fα.2 and s22 = 3. n1 −1 if s21 < s22 s21 Note that this test is based entirely on the right-hand tail of the F distribution. Since f = 5. we con- clude that the variability of the tensile strength of the two kinds of steel is not the same. Variances.02 s21 2. and Proportions Exercises 6. This question has been intentionally omitted for this n−1 edition. as well as the ones that follow. show that for Also construct corresponding critical regions for testing large samples from normal populations this null hypothesis against the alternatives σ 2 < σ02 and ⎡  ⎤ σ 2 Z σ02 . n. the number of children who are absent from school on a given day. we might test on the basis of a sample whether the true proportion of cures from a certain disease is 0. the critical region of size α of the likelihood ratio criterion is x G kα where kα is the smallest integer for which  n b(y. In fact. a test concerning the parameter θ of the binomial distribution. Let’s take that the most powerful critical region for testing the null hypothesis θ = θ0 against the alternative hypothesis θ = θ1 < θ0 . the number of imperfections found in a piece of cloth. is large. Variances. if we want to test the null hypothesis θ = θ0 against the one-sided alternative θ > θ0 . 5 Tests Concerning Proportions If an outcome of an experiment is the number of votes that a candidate receives in a poll. θ0 ) F α y=0 372 . Thus. we refer to such data as count data. n. .. is thus as close as possible to α without exceeding it.02. n. When it comes to composite alternatives. Appropriate models for the analysis of count data are the binomial distribution. The corresponding critical region for testing the null hypothesis θ = θ0 against the one-sided alternative θ < θ0 is x F kα where kα is the largest integer for which kα  b(y. θ0 ) F α y=kα and b(y. the multinomial distribution. the number of degrees of freedom. . where θ is the parameter of a binomial population. θ0 ) is the probability of getting y successes in n binomial trials when θ = θ0 . is based on the value of X. The size of this critical region. and some of the other discrete distributions. Making use of the fact that the chi-square distribution is an approximate critical region of size α for testing the can be approximated with a normal distribution when ν. the likelihood ratio technique also yields tests based on the observed number of successes. the Poisson distribution.90 or whether the true proportion of defectives coming off an assembly line is 0. the number of “successes” obtained in n trials. In this section we shall present one of the most common tests based on count data. . Tests of Hypothesis Involving Means. 2 s2 G σ02 ⎣1 + zα ⎦ 7. null hypothesis σ 2 = σ02 against the alternative σ 2 > σ02 . 373 . H0 : θ = 0.50 at the 0.05 level of significance. 
for large values of n we can use the normal approximation to the binomial distribution and treat x − nθ z= √ nθ (1 − θ ) as a value of a random variable having the standard normal distribution.0118. For large n.50 H1 : θ Z 0. the null hypothesis must be rejected. θ > θ0 . 12 − nθ0 z= √ nθ0 (1 − θ0 ) if we use the continuity correction.50. the observed number of successes. z G zα . we conclude that θ Z 0. respectively.0059) = 0. For n F 20 we can use Table I of “Statistical Tables”. and z F −zα . 3 .0059. Alter- natively. Solution 1. x = 4. and Proportions and.05  2 . We use the minus sign when x exceeds nθ0 and the plus sign when x is less than nθ0 . the critical region for testing the null hypothesis θ = θ0 against the two-sided alternative θ Z θ0 is x G kα/2 or x F kα/2 We shall not illustrate this method of determining critical regions for tests concerning the binomial parameter θ because. regardless of whether we use the four steps discussed in Section 1.Tests of Hypothesis Involving Means. 0. where x − nθ0 z= √ nθ0 (1 − θ0 ) or x . Use the test statistic X. or θ < θ0 using. Since the P-value. the critical regions |z| G zα/2 . it is much less tedious to base the decisions on P-values. EXAMPLE 8 If x = 4 of n = 20 patients suffered serious side effects from a new medication. the P-value is 2(0. finally.50 α = 0. is less than 0. 4 . we can thus test the null hypothesis θ = θ0 against the alternatives θ Z θ0 .05. The tests we have described require the use of a table of binomial probabilities. and since P(X F 4) = 0. Here θ is the true proportion of patients suffering serious side effects from the new medication.0118. and for values of n up to 100 we can use the tables in Tables of the Binomial Probability Distribution and Binomial Tables. test the null hypothesis θ = 0.50 against the alternative hypothesis θ Z 0. Variances. in actual practice. k ni θi (1 − θi ) with standard normal distributions. we must decide whether observed differences among sample proportions. as claimed. we may want to investigate whether the difference between these two percentages is significant. we would have obtained z = −3. Xn are n independent random variables having standard normal distributions. according to the theorem “If X1 .80) 4. . . Note that if we had used the continuity correction in the preceding example. X2 . . the null hypothesis must be rejected. .18 200(0. . . Variances. where (without the continuity correction) x − nθ0 z= √ nθ0 (1 − θ0 ) 3. suppose that x1 . Test this claim at the 0. Similarly. . . less than 20 percent of all car owners have not tried the oil company’s gasoline. are significant or whether they can be attributed to chance. . we conclude that. . xk are observed values of k independent random variables X1 . we can then 2 look upon 374 . To indicate a general method for handling problems of this kind. Since z = −3. and. then Y = i=1 Xi has the chi-square distribution with ν = n degrees of freedom”. H0 : θ = 0. . For instance. we get 22 − 200(0.33. . and Proportions EXAMPLE 9 An oil company claims that less than 20 percent of all car owners have not tried its gasoline.01 level of significance if a random check reveals that 22 of 200 car owners have not tried the oil company’s gasoline. we can approximate the distributions of the inde- pendent random variables Xi − ni θi Zi = √ for i = 1. nk and θk .09 and the conclusion would have been the same. 
6 Tests Concerning Differences Among k Proportions

In many problems in applied research, we must decide whether observed differences among sample proportions, or percentages, are significant or whether they can be attributed to chance. For instance, if 6 percent of the frozen chickens in a sample from one supplier fail to meet certain standards and only 4 percent in a sample from another supplier fail to meet the standards, we may want to investigate whether the difference between these two percentages is significant. Similarly, we may want to judge on the basis of sample data whether equal proportions of voters in four different cities favor a certain candidate for governor.

To indicate a general method for handling problems of this kind, suppose that x1, x2, ..., xk are observed values of k independent random variables X1, X2, ..., Xk having binomial distributions with the parameters n1 and θ1, n2 and θ2, ..., nk and θk. If the n's are sufficiently large, we can approximate the distributions of the independent random variables

Zi = (Xi − niθi)/√(niθi(1 − θi))   for i = 1, 2, ..., k

with standard normal distributions, and, according to the theorem "If X1, X2, ..., Xn are n independent random variables having standard normal distributions, then Y = Σ_{i=1}^{n} Xi² has the chi-square distribution with ν = n degrees of freedom", we can then look upon

χ² = Σ_{i=1}^{k} (xi − niθi)²/(niθi(1 − θi))

as a value of a random variable having the chi-square distribution with k degrees of freedom. Thus, to test the null hypothesis θ1 = θ2 = ··· = θk = θ0 (against the alternative that at least one of the θ's does not equal θ0), we can use the critical region χ² ≥ χ²_{α,k}, where

χ² = Σ_{i=1}^{k} (xi − niθ0)²/(niθ0(1 − θ0))

When θ0 is not specified, that is, when we are interested only in the null hypothesis θ1 = θ2 = ··· = θk, we substitute for θ the pooled estimate

θ̂ = (x1 + x2 + ··· + xk)/(n1 + n2 + ··· + nk)

and the critical region becomes χ² ≥ χ²_{α,k−1}, where

χ² = Σ_{i=1}^{k} (xi − niθ̂)²/(niθ̂(1 − θ̂))

The loss of 1 degree of freedom, that is, the change in the critical region from χ²_{α,k} to χ²_{α,k−1}, is due to the fact that an estimate is substituted for the unknown parameter θ.

Let us now present an alternative formula for the chi-square statistic immediately above, which, as we shall see in Section 7, lends itself more readily to other applications. If we arrange the data as in the following table, let us refer to its entries as the observed cell frequencies fij, where the first subscript indicates the row and the second subscript indicates the column of this k × 2 table.

             Successes    Failures
Sample 1        x1         n1 − x1
Sample 2        x2         n2 − x2
  ···
Sample k        xk         nk − xk

Under the null hypothesis θ1 = θ2 = ··· = θk = θ0, the expected cell frequencies for the first column are niθ0 for i = 1, 2, ..., k, and those for the second column are ni(1 − θ0). When θ0 is not known, we substitute for it, as before, the pooled estimate θ̂, and estimate the expected cell frequencies as ei1 = niθ̂ and ei2 = ni(1 − θ̂) for i = 1, 2, ..., k. It will be left to the reader to show in Exercise 8 that the chi-square statistic

χ² = Σ_{i=1}^{k} (xi − niθ̂)²/(niθ̂(1 − θ̂))

can also be written as

χ² = Σ_{i=1}^{k} Σ_{j=1}^{2} (fij − eij)²/eij
EXAMPLE 10
Determine, on the basis of the sample data shown in the following table, whether the true proportion of shoppers favoring detergent A over detergent B is the same in all three cities:

               Number favoring    Number favoring
               detergent A        detergent B
Los Angeles         232                168          400
San Diego           260                240          500
Fresno              197                203          400

Use the 0.05 level of significance.

Solution
1. H0: θ1 = θ2 = θ3; H1: θ1, θ2, and θ3 are not all equal; α = 0.05.
2. Reject the null hypothesis if χ² ≥ 5.991, where 5.991 is the value of χ²_{0.05,2} and

χ² = Σ_{i=1}^{3} Σ_{j=1}^{2} (fij − eij)²/eij

3. Since the pooled estimate of θ is

θ̂ = (232 + 260 + 197)/(400 + 500 + 400) = 689/1,300 = 0.53

the expected cell frequencies are

e11 = 400(0.53) = 212   and   e12 = 400(0.47) = 188
e21 = 500(0.53) = 265   and   e22 = 500(0.47) = 235
e31 = 400(0.53) = 212   and   e32 = 400(0.47) = 188

and substitution into the formula for χ² given previously yields

χ² = (232 − 212)²/212 + (260 − 265)²/265 + (197 − 212)²/212 + (168 − 188)²/188 + (240 − 235)²/235 + (203 − 188)²/188 = 6.48

4. Since χ² = 6.48 exceeds 5.991, the null hypothesis must be rejected; in other words, the true proportions of shoppers favoring detergent A over detergent B in the three cities are not the same.
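The arithmetic of Example 10 is easy to reproduce in software. The following minimal sketch, assuming Python (the computation itself is ours, not part of the text), evaluates the pooled estimate and the chi-square statistic:

successes = [232, 260, 197]
sizes = [400, 500, 400]
theta_hat = sum(successes) / sum(sizes)   # 689/1300 = 0.53
chi2 = sum((x - n * theta_hat) ** 2 / (n * theta_hat * (1 - theta_hat))
           for x, n in zip(successes, sizes))
print(round(chi2, 2))   # about 6.47; the text's 6.48 reflects intermediate rounding

Since the statistic exceeds the critical value 5.991 either way, the conclusion is unchanged.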
Exercises

8. Show that the two formulas for χ² in Section 6 are equivalent.

9. Modify the critical regions in Section 5 so that they can be used to test the null hypothesis λ = λ0 against the alternative hypotheses λ > λ0, λ < λ0, and λ ≠ λ0, where λ is the parameter of the Poisson distribution, on the basis of n observations.

10. With reference to Exercise 9, use Table II of "Statistical Tables" to find values corresponding to k_{0.025} and k′_{0.025} to test the null hypothesis λ = 3.6 against the alternative hypothesis λ ≠ 3.6 on the basis of five observations.

11. Given large random samples from two binomial populations, show that the null hypothesis θ1 = θ2 can be tested on the basis of the statistic

z = (x1/n1 − x2/n2)/√(θ̂(1 − θ̂)(1/n1 + 1/n2))   where θ̂ = (x1 + x2)/(n1 + n2)

12. For k = 2, show that the χ² formula in Section 6, with the pooled estimate θ̂, can be written as

χ² = (n1 + n2)(n2x1 − n1x2)²/[n1n2(x1 + x2)((n1 + n2) − (x1 + x2))]

13. Show that the square of the expression for z in Exercise 11 equals the value obtained for χ² in Exercise 12, so that the two tests are actually equivalent when the alternative hypothesis is θ1 ≠ θ2. Note that the test described in Exercise 11, but not the one based on the χ² statistic, can be used when the alternative hypothesis is θ1 < θ2 or θ1 > θ2.

7 The Analysis of an r × c Table

The method we shall describe in this section applies to two kinds of problems, which differ conceptually but are analyzed in the same way. In the first kind of problem we deal with samples from r multinomial populations, with each trial permitting c possible outcomes. This would be the case, for instance, when persons interviewed in five different precincts are asked whether they are for a candidate, against her, or undecided. Here r = 5 and c = 3. It would also have been the case in Example 10 if each shopper had been asked whether he or she favors detergent A, favors detergent B, or does not care one way or the other. We might thus have obtained the results shown in the following 3 × 3 table:

               Number favoring    Number favoring    Number
               detergent A        detergent B        indifferent
Los Angeles         174                 93               133        400
San Diego           196                124               180        500
Fresno              148                105               147        400

The null hypothesis we would want to test in a problem like this is that we are sampling r identical multinomial populations. Symbolically, if θij is the probability of the jth outcome for the ith population, we would want to test the null hypothesis

θ1j = θ2j = ··· = θrj   for j = 1, 2, ..., c

The alternative hypothesis would be that θ1j, θ2j, ..., and θrj are not all equal for at least one value of j. In the preceding example we dealt with three samples, whose fixed sizes were given by the row totals, 400, 500, and 400, and the column totals were left to chance.

In the other kind of problem where the method of this section applies, we are dealing with one sample, and the row totals as well as the column totals are left to chance. To give an example, let us consider the following table, obtained in a study of the relationship, if any, between the I.Q.'s of persons who have gone through a large company's job-training program and their subsequent performance on the job:

                            Performance
                      Poor    Fair    Good
      Below average    67      64      25      156
I.Q.  Average          42      76      56      174
      Above average    10      23      37       70
                      119     163     118      400

Here there is one sample of size 400, and the row totals as well as the column totals are left to chance. Such a table is assembled for the purpose of testing whether the row variable and the column variable are independent; the null hypothesis we shall want to test by means of the preceding table is that the on-the-job performance of persons who have gone through the training program is independent of their I.Q.

DEFINITION 3. CONTINGENCY TABLE. A table having r rows and c columns, where each row represents c values of a nonnumerical variable and each column represents r values of a different nonnumerical variable, is called a contingency table. In such a table, the entries are count data (positive integers), and both the row and the column totals are left to chance.

Symbolically, if θij is the probability that an item will fall into the cell belonging to the ith row and the jth column, θi· is the probability that an item will fall into the ith row, and θ·j is the probability that an item will fall into the jth column, the null hypothesis we want to test is

θij = θi· · θ·j   for i = 1, 2, ..., r and j = 1, 2, ..., c

Correspondingly, the alternative hypothesis is θij ≠ θi· · θ·j for at least one pair of values of i and j. Since the method by which we analyze an r × c table is the same regardless of whether we are dealing with r samples from multinomial populations with c different outcomes or one sample from a multinomial population with rc different outcomes, let us discuss it here with regard to the latter. In Exercise 15 the reader will be asked to parallel the work for the first kind of problem.

In what follows, we shall denote the observed frequency for the cell in the ith row and the jth column by fij, the row totals by fi·, the column totals by f·j, and the grand total, the sum of all the cell frequencies, by f. To test the null hypothesis, we estimate the probabilities θi· and θ·j as

θ̂i· = fi·/f   and   θ̂·j = f·j/f

and under the null hypothesis of independence we get

eij = θ̂i· · θ̂·j · f = (fi·/f) · (f·j/f) · f = (fi· · f·j)/f

for the expected frequency for the cell in the ith row and the jth column. Note that eij is thus obtained by multiplying the total of the row to which the cell belongs by the total of the column to which it belongs and then dividing by the grand total.

Once we have calculated the eij, we base our decision on the value of

χ² = Σ_{i=1}^{r} Σ_{j=1}^{c} (fij − eij)²/eij

and reject the null hypothesis if it exceeds χ²_{α,(r−1)(c−1)}. The number of degrees of freedom is (r − 1)(c − 1), and in connection with this let us make the following observation: Whenever expected cell frequencies in chi-square formulas are estimated on the basis of sample count data, the number of degrees of freedom is s − t − 1, where s is the number of terms in the summation and t is the number of independent parameters replaced by estimates. When testing for differences among k proportions with the chi-square statistic of Section 6, we had s = 2k and t = k, since we had to estimate the k parameters θ1, θ2, ..., θk, and the number of degrees of freedom was 2k − k − 1 = k − 1. When testing for independence with an r × c contingency table, we have s = rc and t = r + c − 2, since the r parameters θi· and the c parameters θ·j are not all independent: their respective sums must equal 1. Thus, we get s − t − 1 = rc − (r + c − 2) − 1 = (r − 1)(c − 1).

Since the test statistic that we have described has only approximately a chi-square distribution with (r − 1)(c − 1) degrees of freedom, it is customary to use this test only when none of the eij is less than 5; sometimes this requires that we combine some of the cells, with a corresponding loss in the number of degrees of freedom.
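The rule eij = (row total)(column total)/(grand total) is conveniently expressed as an outer product. The following is a small sketch, assuming Python with NumPy, applied to the I.Q./performance table above; it is ours, not part of the text:

import numpy as np

observed = np.array([[67, 64, 25],
                     [42, 76, 56],
                     [10, 23, 37]])
row_totals = observed.sum(axis=1)    # 156, 174, 70
col_totals = observed.sum(axis=0)    # 119, 163, 118
grand_total = observed.sum()         # 400
expected = np.outer(row_totals, col_totals) / grand_total
dof = (observed.shape[0] - 1) * (observed.shape[1] - 1)   # (r - 1)(c - 1) = 4
print(expected.round(2))
print(dof)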
EXAMPLE 11
Use the data shown in the following table to test at the 0.01 level of significance whether a person's ability in mathematics is independent of his or her interest in statistics:

                                Ability in mathematics
                               Low    Average    High
              Low               63       42        15
Interest in   Average           58       61        31
statistics    High              14       47        29

Solution
1. H0: Ability in mathematics and interest in statistics are independent; H1: Ability in mathematics and interest in statistics are not independent; α = 0.01.
2. Reject the null hypothesis if χ² ≥ 13.277, where 13.277 is the value of χ²_{0.01,4} and

χ² = Σ_{i=1}^{3} Σ_{j=1}^{3} (fij − eij)²/eij

3. The expected frequencies for the first row are (120 · 135)/360 = 45.0, (120 · 150)/360 = 50.0, and 120 − 45.0 − 50.0 = 25.0, where we made use of the fact that for each row or column the sum of the expected cell frequencies equals the sum of the corresponding observed frequencies (see Exercise 14). Similarly, the expected frequencies for the second row are 56.25, 62.5, and 31.25, and those for the third row (all obtained by subtraction from the column totals) are 33.75, 37.5, and 18.75. Then, substituting into the formula for χ² yields

χ² = (63 − 45.0)²/45.0 + (42 − 50.0)²/50.0 + ··· + (29 − 18.75)²/18.75 = 32.14

4. Since χ² = 32.14 exceeds 13.277, the null hypothesis must be rejected; we conclude that there is a relationship between a person's ability in mathematics and his or her interest in statistics.

A shortcoming of the chi-square analysis of an r × c table is that it does not take into account a possible ordering of the rows and/or columns; the value we get for χ² would remain the same if the rows and/or columns were interchanged among themselves. For instance, in Example 11, ability in mathematics, as well as interest in statistics, is ordered from low to average to high. Similarly, the columns of the table in Section 7 reflect a definite ordering from favoring B (not favoring A) to being indifferent to favoring A, but in this case there is no specific ordering of the rows.
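Example 11 can be cross-checked with standard software. The following minimal sketch assumes Python with SciPy and NumPy installed:

from scipy.stats import chi2_contingency
import numpy as np

table = np.array([[63, 42, 15],
                  [58, 61, 31],
                  [14, 47, 29]])
chi2, p, dof, expected = chi2_contingency(table)
print(round(chi2, 2), dof)   # about 32.14 with 4 degrees of freedom
print(p < 0.01)              # True: reject independence at the 0.01 level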
8 Goodness of Fit

The goodness-of-fit test considered here applies to situations in which we want to determine whether a set of data may be looked upon as a random sample from a population having a given distribution. To illustrate, suppose that we want to decide on the basis of the data (observed frequencies) shown in the following table whether the number of errors a compositor makes in setting a galley of type is a random variable having a Poisson distribution:

Number of    Observed           Poisson probabilities    Expected
errors       frequencies fi     with λ = 3               frequencies ei
   0             18                 0.0498                   21.9
   1             53                 0.1494                   65.7
   2            103                 0.2240                   98.6
   3            107                 0.2240                   98.6
   4             82                 0.1680                   73.9
   5             46                 0.1008                   44.4
   6             18                 0.0504                   22.2
   7             10                 0.0216                    9.5
   8              2                 0.0081                    3.6
   9              1                 0.0038                    1.7

To determine a corresponding set of expected frequencies for a random sample from a Poisson population, we first use the mean of the observed distribution to estimate the Poisson parameter λ, getting λ̂ = 1,341/440 ≈ 3. Then, copying the Poisson probabilities for λ = 3 from Table II of "Statistical Tables" (with the probability of 9 or more used instead of the probability of 9) and multiplying by 440, the total frequency, we get the expected frequencies shown in the right-hand column of the table. Note that the last two classes of this table are combined to create a new class with an expected frequency greater than 5.

In general, to test the null hypothesis H0 that a set of observed data comes from a population having a specified distribution against the alternative that the population has some other distribution, we compute

χ² = Σ_{i=1}^{m} (fi − ei)²/ei

and reject H0 at the level of significance α if χ² ≥ χ²_{α,m−t−1}, where m is the number of terms in the summation and t is the number of independent parameters estimated on the basis of the sample data (see the discussion in Section 7). In the above illustration, t = 1, since only one parameter, λ, is estimated on the basis of the data, and the number of degrees of freedom is m − 2.
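The whole computation, which the example that follows carries out by hand, can be sketched in a few lines. The following assumes Python with NumPy and SciPy; it is an illustration of the procedure, not part of the text:

import numpy as np
from scipy.stats import poisson, chi2

observed = np.array([18, 53, 103, 107, 82, 46, 18, 10, 2, 1])
errors = np.arange(10)
lam_hat = (errors * observed).sum() / observed.sum()   # 1341/440, about 3.05; rounded to 3
probs = poisson.pmf(np.arange(9), 3.0)                 # P(X = 0), ..., P(X = 8)
probs = np.append(probs, 1 - probs.sum())              # P(X >= 9) for the last class
expected = 440 * probs
# Combine the last two classes so every expected frequency exceeds 5:
obs = np.append(observed[:8], observed[8:].sum())
exp = np.append(expected[:8], expected[8:].sum())
stat = ((obs - exp) ** 2 / exp).sum()
print(round(stat, 2))                                  # about 6.8; the text's 6.83 uses table-rounded probabilities
print(stat >= chi2.ppf(0.95, df=len(obs) - 2))         # False: cannot reject H0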
EXAMPLE 12
For the data in the table on this page, test at the 0.05 level of significance whether the number of errors the compositor makes in setting a galley of type is a random variable having a Poisson distribution.

Solution
(Since the expected frequencies corresponding to eight and nine errors are less than 5, the two classes are combined.)
1. H0: Number of errors is a Poisson random variable; H1: Number of errors is not a Poisson random variable; α = 0.05.
2. Reject the null hypothesis if χ² ≥ 14.067, where 14.067 is the value of χ²_{0.05,7} and

χ² = Σ_{i=1}^{m} (fi − ei)²/ei

Here m = 9 and, since t = 1, the number of degrees of freedom is 9 − 1 − 1 = 7.
3. Substituting into the formula for χ², we get

χ² = (18 − 21.9)²/21.9 + (53 − 65.7)²/65.7 + ··· + (3 − 5.3)²/5.3 = 6.83

4. Since χ² = 6.83 is less than 14.067, the null hypothesis cannot be rejected; indeed, the close agreement between the observed and expected frequencies suggests that the Poisson distribution provides a "good fit."

Exercises

14. Verify that if the expected cell frequencies are calculated in accordance with the rule in Section 7, their sum for any row or column equals the sum of the corresponding observed frequencies.

15. Show that the rule in Section 7 for calculating the expected cell frequencies applies also when we test the null hypothesis that we are sampling r populations with identical multinomial distributions.

16. Show that the following computing formula for χ² is equivalent to the formula in Section 7:

χ² = [Σ_{i=1}^{r} Σ_{j=1}^{c} fij²/eij] − f

17. If the analysis of a contingency table shows that there is a relationship between the two variables under consideration, the strength of this relationship may be measured by means of the contingency coefficient

C = √(χ²/(χ² + f))

where χ² is the value obtained for the test statistic and f is the grand total as defined in Section 7. Show that
(a) for a 2 × 2 contingency table the maximum value of C is (1/2)√2;
(b) for a 3 × 3 contingency table the maximum value of C is (1/3)√6.

18. Use the formula of Exercise 16 to recalculate χ² for Example 10.

9 The Theory in Practice

Computer software exists for all the tests that we have discussed; we have only to enter the original raw (untreated) data into our computer together with the appropriate command. To illustrate, consider the following example.

EXAMPLE 13
The following random samples are measurements of the heat-producing capacity (in millions of calories per ton) of specimens of coal from two mines:

Mine 1:  8.690   7.510   7.070   7.380   7.930
Mine 2:  7.720   8.230   8.400   8.860   7.660
what is the probability that at least one of these experiments will come up “significant” even if 40. each with a different group test statistic.4. basing the decision on the fessor of biology. Sample surveys conducted in a large county in a cer- (a) If the professor conducts this experiment once with tain year and again 20 years later showed that originally several mice (using the 0. ing eating habits. whereas result even if the color of the mouse does not affect its 20 years later the average height of 500 ten-year-old boys speed in running the maze? was 54. even though there may be evidence to the contrary. it is no longer samples and yielded the following results: significant. in favor of the alternative μ Z 14. An epidemiologist is trying to discover the cause of a mine whether this is sufficient evidence to reject the null certain kind of cancer.05 level of significance.9 32. ran the maze faster than the norm established by many previous tests involving various colors of mice.5 against the alternative hypothesis μ1 − if the color of a mouse does not affect its maze-running μ2 < −0.6 seconds with a standard devia. Rework Exercise 36. deter. what is the probability that at least one the 0.05 level of terion if significance the null hypothesis μ = 14. A study of the number of business lunches that exec- 14. use the four steps in the find the probabilities of type II errors with the given cri- initial part of Section 1 to show that at the 0. Use the four steps in the ini. so on. factors will be associated with the cancer. Tests of Hypothesis Involving Means.3 seconds. (b) μ1 − μ2 = 0. Assuming that the data are a random sam. 14. test statistic. basing the decision on the those who developed the given cancer and those who did P-value corresponding to the observed value of the not.05 level of significance to test the null hypothesis of the experiments will yield a “significant” result even μ1 − μ2 = −0. attempting to demonstrate this fact. With reference to Exercise 30. tion of 2. measuring 48 different “factors” involv- hypothesis μ Z 30 minutes. n1 = 40 x1 = 9. for what values of x1 − mg/cigarette.28. In 12 test runs over a marked course.0 must be rejected (a) μ1 − μ2 = 0. Explain the appar. what is the average height of 400 ten-year-old boys was 53. likely to be rejected at least once.6 35. even if none of them is a use the four steps in the initial part of Section 1 to test causative factor? the null hypothesis μ = 35 against the alternative μ < 35 (b) What is the probability that more than one of these at the 0.2. 31.1 cal software to find the P-value that corresponds to the observed value of the test statistic.16. His object is to determine if there are any differ- ences in the means of these factors (variables) between 28.4 inches. x2 would the null hypothesis have been rejected? Also ple from a normal population. Assuming that it is reasonable to treat (a) What is the probability that one of these factors will the data as a random sample from a normal population. 38. use suitable statisti- n2 = 50 x2 = 8. level of significance in all his statistical tests. show that if the first measurement is recorded incorrectly as 16.000 peo- hypothesis μ = 30 minutes in favor of the alternative ple for five years. and 14.5 inches with a standard deviation of 2.05 level of significance). a newly designed an effort to be cautiously conservative.24.5 inches. he uses the 0. be “associated with” the cancer. 
basing the decision on the (c) If the professor has 30 of his students independently P-value corresponding to the observed value of the run the same experiment. exercise. even if none of them is a causative factor? 30. even if it is true.01 level of significance. utives in the insurance and banking industries claim as ent paradox that even though the difference between deductible expenses per month was based on random the sample mean and μ0 has increased. 14. Five measurements of the tar content of a certain kind of cigarette yielded 14. He assumes that these variables are independent.0 s2 = 2.12.5.0. drinking habits.05 level of significance to test the null hypothesis μ1 − 33. A pro- 37.01 motorboat averaged 33. (b) If the professor repeats the experiment with a new set Use the four steps in the initial part of Section 1 and of white mice. If the same hypothesis is tested often enough. this will reverse the result. ran P-value corresponding to the observed value of the white mice through a maze to determine if white mice test statistic.0 instead of 36. He studies a group of 10.5. To find out whether the inhabitants of two South mouse color plays no role in their maze-running speed? Pacific islands may be regarded as having the same 384 . it is μ2 = 0 against the alternative hypothesis μ1 − μ2 Z 0. With reference to Exercise 30.5. Rework Exercise 38.1 s1 = 1. Rework Exercise 27. 14. With reference to Example 4. Use this P-value to Use the four steps in the initial part of Section 1 and the rework the exercise.5 minutes. and tial part of Section 1 and the 0. smoking. (c) μ1 − μ2 = 0. of white mice. and Proportions minutes with a standard deviation of 1.8 the probability that he will come up with a “significant” inches with a standard deviation of 2. 34. speed? 39. Variances. 0. In 29. (d) μ1 − μ2 = 0. and 17 and 11 ances.41 square inch at the 0. Variances.01 level of sig. described in the text. effective in weight reduction. use suitable statisti.0086. Use the method of Exercise 7 to rework Exercise 49. SEC. (Use the method rework the exercise. that the two populations sampled have equal vari- 57 and 51. 46 and 44. 26 and 24. In a study of the effectiveness of certain exercises in 50. test at the 0. standard deviation of 0. a group of 16 persons engaged in these exercises for one month and showed the following results: 51. for their driver’s licenses. Assuming that the weights constitute a random Design 2: 154 135 132 171 153 149 sample from a normal population. 33 and 35. With reference to Exercise 40. 124 and 119.0100 at the 0. In a random sample. find the P-value cor- 194 192 181 175 responding to the observed value of the test statistic and 160 161 245 233 use it to decide whether the null hypothesis could have 182 182 146 142 been rejected at the 0.85 minutes. To compare two kinds of front-end designs.05 level of Then each car was run into a concrete wall at 5 miles significance. Use the 0.10 level of 45. and thus judge whether the exercises are at the 0. 54.4 and x2 = 72. an anthropologist determines the cephalic Use the four steps in the initial part of Section 1 and the indices of six adult males from each island. weight reduction.01 level of significance whether the difference between the means of these two samples is significant. six of each population. Past data indicate that the standard deviation of measurements made on sheet metal stampings by expe- Weight Weight Weight Weight rienced inspectors is 0. 
Assuming that these deter- minations constitute a random sample from a normal 42. With reference to Example 5. With reference to Example 5. 73 and 60. Tests of Hypothesis Involving Means. If a new inspector before after before after measures 50 stampings with a standard deviation of 0. test the null hypothesis σ = 250 pounds against the two-sided alternative σ Z 250 Use the four steps in the initial part of Section 1 to test pounds at the 0. 4 41.49 211 198 172 166 square inch. after a certain safety program was put into operation: 55. 53. and Proportions racial ancestry. test the null hypothesis σ = 0. getting 0. use suitable statisti- nificance to see whether the difference between the two cal software to find the P-value that corresponds to the sample means can reasonably be attributed to chance. test the null hypothesis esis μ1 − μ2 = 0 against the alternative hypothesis σ1 − σ2 = 0 against the alternative hypothesis σ1 − σ2 > 0 μ1 − μ2 > 0. 182 179 203 201 52.05 level of significance to test the null hypoth.85 minutes against the observed value of the test statistic. With reference to Exercise 42.53 minutes for the amount of time that 30 women took to complete the written test 43.05 level of 214 209 167 164 significance.015 level of significance.05 level of significance to test whether the safety pro- x1 = 77. Use the four steps in the initial part of Section 1 and the 0. significance whether it is reasonable to assume that the hours due to accidents in 10 industrial plants before and two populations sampled have equal variances.41 square inch against the alterna- 171 172 185 181 tive hypothesis σ > 0. the alternative hypothesis σ < 0. use suitable statistical software to show that the P-value corresponding to t = 47. 34 and 29. At the 0. per hour.01 level of significance. gram is effective. 385 . With reference to Exercise 45.2 and the corresponding stan.05 level of significance.02 level of significance whether it is reasonable to assume 45 and 36. Nine determinations of the specific heat of iron had a 2. at the 0. With reference to Exercise 42. test at the 0. the weights of 24 Black Angus steers of a certain age have a standard deviation of 238 Design 1: 127 168 143 165 122 139 pounds. 83 and 77.05 level of significance.0100 against kind were installed on a certain make of compact car. With reference to Exercise 51. and the following are the costs of the repairs (in dollars): 48.67 is 0. cal software to find the P-value corresponding to the test the null hypothesis σ = 2. dard deviations s1 = 3.41 square inch. s = 2. Use this P-value to alternative hypothesis σ < 2. observed value of the test statistic.) 44.3 and s2 = 2.0185. Use this P-value to Assume that the populations sampled are normal and rework the exercise. have equal variances. In a random sample. 46. The following are the average weekly losses of work. use the method of Exercise 7 to test the 180 173 155 154 null hypothesis σ = 0. 49.1. Use the method of Exercise 12 and the 0. 7 of 18 customers prefer the new kind of packaging θ2 = θ3 (that the proportion of persons favoring the to the old kind. obtained for z equals the value obtained for χ 2 . Use the statistic of Exercise 12 to rework significance to test the null hypothesis θ = 0. 82 reported that they experienced midmorning critical region. test the null hypothesis θ = 0.40 against Exercise 70. and verify that the square of the value the alternative hypothesis θ Z 0. A doctor claims that less than 30 percent of all per. With reference to Exercise 57. sis θ < 0. 
respectively. 62. level of significance to test the null hypothesis θ1 = ple. 200 persons with average incomes. of significance. find the critical region nursery failed to bloom and 18 of 200 tulip bulbs from and the actual level of significance corresponding to this another nursery failed to bloom. 59.90 at the 0. and D were 23. In random samples of 250 persons with low incomes.40 can be rejected against the alternative hypothesis θ < 0. 5–6 66. In a random sample of 200 persons who skipped and the actual level of significance corresponding to this breakfast. 74 of 250 persons who watched and the actual level of significance corresponding to this a certain television program on a small TV set and 92 of critical region. sons exposed to such radiation felt any ill effects. test the null hypothesis θ = 0. test the null hypothesis θ1 = θ2 against the alternative hypothesis θ1 Z θ2 at the 0.05 level of significance. in a random sam.01 level of significance to test B. 70. 20 percent of sis that there is no difference in the durability of the four all undergraduate business students will take advanced kinds of tires at the 0. students. 12 of 14 industrial accidents were due to unsafe working conditions. households is at least 35 percent? (Use the 0. 155. The manufacturer of a spot remover claims that his shoppers can identify a highly advertised trademark. 58. In a random sample. If. only 1 of 19 per. level of significance to test the null hypothesis that there tained at least one member with a college degree.60 that a customer will prefer a new who favor a certain piece of legislation. and 150 persons with 65. that is. all equal. level of significance. show that the critical Use the 0. Variances. C. sample.000 miles.05 level of significance to test the null hypoth- region is x F 5 or x G 15 and that. find the critical region 72. Use the χ 2 statistic to critical region. 69. Does is no difference between the corresponding population this finding refute the statement that the proportion of all proportions against the alternative hypothesis that mid- such U.40.05 States. In a random sample of 600 cars making a right turn at a certain intersection. In a random survey of 1. In random samples.20. and Proportions SECS. 56.60 against legislation is the same for all three income groups) against the alternative hypothesis θ Z 0.30. If. Use the statistic of Exercise 12 to rework Exercise 68. 250 persons who watched the same program on a large set remembered 2 hours later what products were advertised. corresponding to esis that the actual proportion of drivers who make this this critical region.30 at the 0.60 at the 0. against the alternative hypothesis θ > 0. It has been claimed that more than 40 percent of all 67. and 87 ability is really 0. alternative hypothesis θ Z 0.01 level of 71. 46 of 400 tulip bulbs from one 60. Tests of Hypothesis Involving Means.0414.) skip breakfast. With reference to Example 8. If 26 of 200 tires of brand A failed to last 30. Use the 0. 10 of 18 shoppers were able to iden.05 level of significance ufacturer’s product. fatigue. 74. high incomes. In a random sample of 12 undergraduate business 73. 15. 57. Use the 0. If. test at the 0.30 against the alternative hypothe. With reference to Exercise 61. work in accounting. and in a random sample of 300 persons who ate breakfast. 6 said that they will take advanced work in whereas the corresponding figures for 200 tires of brands accounting.S.05 against the alternative hypothesis θ > 0.05 level of significance. 
157 pulled into the wrong lane. Use the χ 2 statistic to test the null hypothesis θ1 = θ2 sons exposed to a certain amount of radiation will feel against the alternative hypothesis θ1 Z θ2 at the 0. With reference to Exercise 59. If. find the critical region 68. only 174 of 200 spots were removed with the man- tify the trademark.90 whether the null hypothesis θ = 0.05 level of significance. 64. in a random sample. 386 . it is found that 29 percent of the households con.01 level any ill effects. 87 reported that they experienced midmorn- 63.20. there were. 61.40. Use the 0.30 against the 0. 118. in a random in a random sample.000 households in the United ing fatigue. In random samples.05 level of the alternative hypothesis that the three θ ’s are not significance.05 kind of packaging to the old kind. the level of significance is actually mistake at the given intersection is θ = 0.05 morning fatigue is more prevalent among persons who level of significance. product removes 90 percent of all spots. and 32. test the null hypothesis θ = 0. test the null hypothe- the null hypothesis θ = 0. A food processor wants to know whether the prob. Use the following data obtained for 300 1-second 76. Tests of the fidelity and the selectivity of 190 radios significance whether they may be looked upon as values produced the results shown in the following table: of a binomial random variable: Fidelity Number of Number of Low Average High cakes sold days Low 7 12 31 0 1 1 16 Selectivity Average 35 59 18 2 55 3 228 High 15 13 0 Use the 0. and 6 times. test at the 0 19 0. Tests of Hypothesis Involving Means. 80. Four coins were tossed 160 times and 0. adequate. The following sample data pertain to the shipments 40-second intervals: received by a large firm from three different vendors: Number of Number Number imperfect Number particles Frequency rejected but acceptable perfect 5–9 1 Vendor A 12 23 89 10–14 10 15–19 37 Vendor B 8 12 62 20–24 36 25–29 13 Vendor C 21 30 119 30–34 2 Test at the 0. Variances. 3. 1. 54. two.05 level of in sex education.05 level of significance to test whether it is reasonable to suppose that the coins are balanced and Process A Process B Process C randomly tossed. which pertains to the responses of shoppers in three dif- 75. pliance to a strength standard. 19. and Proportions SECS. a baker bakes Good 57 46 20 three large chocolate cakes.05 level of 77.01 level of signifi. or three Number of or more children in the school system and also whether gamma rays Frequency they feel that the course is poor. 387 . In a study of parents’ feelings about a required course intervals to test this null hypothesis at the 0. The following is the distribution of the readings sis that fidelity is independent of selectivity. 2.01 level of significance whether the three 35–39 1 vendors ship products of equal quality.4. 7–8 79. Monday through Saturday. Use the data shown in the following table to test at the 0. or 4 cance that the three processes have the same probability heads showed. It is desired to test whether the number of gamma rays emitted per second by a certain radioactive substance is Number failing test 21 15 35 a random variable having the Poisson distribution with λ = 2. If the tests showed the following results. significance: sified according to whether they have one. level of significance. are clas. Each day. of passing this strength standard? Use the 0.01 level of significance to test the null hypothe.05 three different prototype processes and tested for com. 23. 
Samples of an experimental material are produced by ferent cities with regard to two detergents. or good. 83. can it be said at the 0. 58. Based on the results shown in the following table. Use the 0. respectively. Number passing test 45 58 49 81. a random sample. Analyze the 3 * 3 table in the initial part of Section 1. obtained with a Geiger counter of the number of parti- cles emitted by a radioactive substance in 100 successive 78.05 level of significance whether there is a relationship 1 48 between parents’ reaction to the course and the number 2 66 of children that they have in the school system: 3 74 Number of children 4 44 1 2 3 or more 5 35 6 10 Poor 48 40 12 7 or more 4 Adequate 55 53 29 82. 360 parents. and those not sold on the same day are given away to a food bank. and greater than 34.8 55. between 14. 5th ed.3 60.5 52. Additional comments relative to using the continuity Agresti. York: John Wiley & Sons. Inc. between 19.5. research has been done on the analysis of The Smith–Satterthwaite test of the null hypothesis that r * c tables.. R..9 51. Mathematical Methods of Statistics. where the categories represented by the two normal populations with unequal variances have rows and/or columns are ordered. a normal distribution with μ = 20 and σ = 5 will take on Condition 1: 55.. A. parameters can be found in Goodman.. A.8 (c) Find the expected normal curve frequencies for the various classes by multiplying the probabilities obtained Use a suitable statistical computer program to test in part (b) by the total frequency.5 51.J. Condition 2: 55.: Princeton University Press. In recent years.0 55. Use a suitable statistical computer program to test at the nificantly less than 300 hours..2 50. 9 the housing of machinery on a seagoing vessel are tested by means of a salt-spray test. Tests of Hypothesis Involving Means. 2nd ed. York: John Wiley & Sons.5 and 24. 1984. the 0. Use the 0. Mass. Miller and Freund’s Probability and rial may be found in Statistics for Engineers.0 53. Samples of three materials under consideration for SEC.7 52. B.5 60.5.. 1984.: Prentice Hall.8 54. The Analysis of Contingency Tables. 1990. Statistical Theory and Methodology in Having Ordered Categories. A.J. sheets coated with polyurethane under two different (b) Find the probabilities that a random variable having ambient conditions.5 55.9 61.2 54. The following are the drying times (minutes) of 40 this distribution are x = 20 and s = 5.5 and 14.8 51. The Analysis of Cross-Classified Data Brownlee. Variances. 86. S.2 46. and Proportions (a) Verify that the mean and the standard deviation of 85.5. This work is beyond the same mean is given in the level of this chapter. 388 . The 38 light bulbs.9 54. Any sample that leaks when 84. H. Cambridge..5 and 29. Upper Saddle River.0 between 29.5 and 34. L. K.05 level of significance.3 60. 1977. ton. Prince. 1946.5.1 57. A.2 56. New N. Inc. between 9.5. Categorical Data Analysis. between 24.5.6 42.5. Inc.05 whether there is a significant difference between the level of significance whether the data may be looked upon mean drying times under the two ambient conditions.6 59.6 56.5 63.8 and 19. Wiley & Sons.8 a value less than 9.3 62..05 level of significance if the three materials have the icance. New York: John correction for testing hypotheses concerning binomial Wiley & Sons.1 57. The following are the hours of operation to failure of subject to a power spray is considered to have failed. but some introductory mate- Johnson. New Cramér. Inc. 1965. 
Use as a random sample from a normal population.1 61.1 43. following are the test results: 150 389 345 310 20 310 175 376 334 340 Material A Material B Material C 332 331 327 344 328 341 325 2 311 320 256 315 55 345 111 349 245 367 81 327 Number leaked 36 22 18 355 309 375 316 336 278 396 287 Number not leaked 63 45 29 Use a suitable statistical computer program to test whether the mean failure time of such light bulbs is sig.6 53.1 62. and then test at the 0.5 58. same probability of leaking in this test. N. References The problem of determining the appropriate number of Details about the analysis of contingency tables may be degrees of freedom for various uses of the chi-square found in statistic is discussed in Everitt. A.1 47.5 53..9 52. 1994. 53. Analysis of Ordinal Categorical Data. 0. Agresti.4 59..8 58.01 level of signif. New York: John vard University Press..1 57.: Har- Science and Engineering. 43 The P-value is 0. the null hypothesis must be rejected. 81 χ 2 = 28. the null hypothesis cannot be rejected.0104. 0. the null hypothesis cannot be 31 s has also increased to 0.2.98. 37 The P-value is 0. 5 n = 151. 3 n = 52. 33 (a) P(reject H0 |H0 is true) = 0.1554. the null hypothesis must be rejected. 0. 0. Tests of Hypothesis Involving Means. (d) β = 0.7. χ 2 = 1.93.18. 32.03. reject the null hypoth. 77 χ 2 = 52. nificant at the 0.61. the null hypothesis must be 23 P-value = 0. the null hypothesis must be rejected. the null hypothesis cannot be rejected.42. the null hypothesis cannot be q  rejected. (b) yes. the null rejected. 0.3557. λ0 ) … α.0009. 83 (b) The probabilities are 0.4.742.79. 1 Use the critical region 2 Ú χα. the null hypothesis must be rejected. 49 χ 2 = 22.0179. the null hypothesis cannot be rejected. the P-value = 0.0268. 389 . 19 (a) No. 63 z = −3. thus. Variances. thus the 27 z = 3.80. the null hypothesis cannot be are 1. the null hypothesis must be y=kα rejected.8. hypothesis cannot be rejected. and 0.3249. (c) P(reject H0 on one or more of 30 experiments 75 χ 2 = 8.0012. |H0 is true) = 0.71.0975. and Proportions Answers to Odd-Numbered Exercises n(x − μ0 )2 45 t = 4.10.05.11.9. the null hypothesis cannot be rejected. 29 t = −2. the null hypothesis must be rejected. 53 f = 1. the null hypothesis cannot be rejected.1154. rejected. 2. (b) β = 0. i=1 57 The P-value is 0.92. 0.1112. 61 The P-value is 0. 79 χ 2 = 3. the null hypothesis cannot be rejected.6.  n esis if xi Ú kα where kα is the smallest integer for which 55 f = 1. (b) P(reject H0 69 z = −1. on experiment 1 or experiment 2 or both |H0 is true) = 73 χ 2 = 7.73.1 σ2 47 χ 2 = 5. 51 z = 1. the null hypothesis must be rejected. 0. the null hypothesis cannot be rejected. 25 z = 2. the null hypothesis cannot be rejected.18. the null hypothesis must be rejected. the null hypothesis must be rejected.0019. 9 The alternative hypothesis is λ > λ0 . (c) The expected frequencies 39 The P-value is 0.61. and 0. 11. the difference is sig- rejected.71.1348. p(y.7. the null hypothesis cannot be rejected. 35 (a) β = 0. 65 The P-value is 0. 59 The P-value is 0. the null hypothesis cannot be 85 t = 3. statement is refuted.5. (c) β = 0.0094. rejected. n.03.71.8.3245.005 level of significance. the null hypothesis must be rejected.46. 35.85.02.1178.71. 15. This page intentionally left blank . that is. the “average” value of Y for the given value of X. x3 . and so on. we are concerned with quantities such as μZ|x. dates back to Francis Galton. x2 . bivariate regression consists of determining the conditional density of Y. μX4 |x1 . 
x3, that is, the mean of X4 for given values of X1, X2, and X3, and so on.

DEFINITION 1. BIVARIATE REGRESSION; REGRESSION EQUATION. If f(x, y) is the value of the joint density of two random variables X and Y at (x, y), and X is known to take on the value x, bivariate regression consists of determining the conditional density of Y given X = x and then evaluating the integral

μY|x = E(Y|x) = ∫_{−∞}^{∞} y · w(y|x) dy
The resulting equation is called the regression equation of Y on X. Similarly, the regression equation of X on Y is given by

μX|y = E(X|y) = ∫_{−∞}^{∞} x · f(x|y) dx

In the discrete case, when we are dealing with probability distributions instead of probability densities, the integrals in the two regression equations given in Definition 1 are simply replaced by sums. When we do not know the joint probability density or distribution of the two random variables, or at least not all its parameters, the determination of μY|x or μX|y becomes a problem of estimation based on sample data; this is an entirely different problem, which we shall discuss in Sections 3 and 4.

EXAMPLE 1
Given the two random variables X and Y that have the joint density

f(x, y) = x · e^{−x(1+y)}  for x > 0 and y > 0;  0 elsewhere

find the regression equation of Y on X and sketch the regression curve.

Solution
Integrating out y, we find that the marginal density of X is given by

g(x) = e^{−x}  for x > 0;  0 elsewhere

and hence the conditional density of Y given X = x is given by

w(y|x) = f(x, y)/g(x) = x · e^{−x(1+y)}/e^{−x} = x · e^{−xy}

for y > 0 and w(y|x) = 0 elsewhere, which we recognize as an exponential density with θ = 1/x. Hence, by evaluating

μY|x = ∫_{0}^{∞} y · x · e^{−xy} dy

or by referring to the corollary of a theorem given here, "The mean and the variance of the exponential distribution are given by μ = θ and σ² = θ²", we find that the regression equation of Y on X is given by

μY|x = 1/x

The corresponding regression curve is shown in Figure 1.

Figure 1. Regression curve of Example 1 (the curve μY|x = 1/x).

EXAMPLE 2
If X and Y have the multinomial distribution

f(x, y) = [n!/(x! y! (n − x − y)!)] · θ1^x θ2^y (1 − θ1 − θ2)^{n−x−y}

for x = 0, 1, 2, ..., n and y = 0, 1, 2, ..., n, with x + y ≤ n, find the regression equation of Y on X.

Solution
The marginal distribution of X is given by

g(x) = Σ_{y=0}^{n−x} [n!/(x! y! (n − x − y)!)] · θ1^x θ2^y (1 − θ1 − θ2)^{n−x−y} = C(n, x) · θ1^x (1 − θ1)^{n−x}

for x = 0, 1, 2, ..., n, which we recognize as a binomial distribution with the parameters n and θ1. Hence,

w(y|x) = f(x, y)/g(x) = C(n − x, y) · θ2^y (1 − θ1 − θ2)^{n−x−y}/(1 − θ1)^{n−x}

for y = 0, 1, 2, ..., n − x, and, rewriting this formula as

w(y|x) = C(n − x, y) · [θ2/(1 − θ1)]^y [(1 − θ1 − θ2)/(1 − θ1)]^{n−x−y}

we find by inspection that the conditional distribution of Y given X = x is a binomial distribution with the parameters n − x and θ2/(1 − θ1), so that the regression equation of Y on X is

μY|x = (n − x)θ2/(1 − θ1)

With reference to the preceding example, if we let X be the number of times that an even number comes up in 30 rolls of a balanced die and Y be the number of times that the result is a 5, then the regression equation becomes

μY|x = (30 − x)(1/6)/(1 − 1/2) = (1/3)(30 − x)

This stands to reason, because there are three equally likely possibilities, 1, 3, or 5, for each of the 30 − x outcomes that are not even.

EXAMPLE 3
If the joint density of X1, X2, and X3 is given by

f(x1, x2, x3) = (x1 + x2)e^{−x3}  for 0 < x1 < 1, 0 < x2 < 1, x3 > 0;  0 elsewhere

find the regression equation of X2 on X1 and X3.

Solution
The joint marginal density of X1 and X3 is given by

m(x1, x3) = (x1 + 1/2)e^{−x3}  for 0 < x1 < 1, x3 > 0;  0 elsewhere

Therefore,

μ_{X2|x1,x3} = ∫_{0}^{1} x2 · [f(x1, x2, x3)/m(x1, x3)] dx2 = [∫_{0}^{1} x2(x1 + x2) dx2]/(x1 + 1/2) = (x1/2 + 1/3)/(x1 + 1/2) = (x1 + 2/3)/(2x1 + 1)

Note that the conditional expectation obtained in the preceding example depends on x1 but not on x3. This could have been expected, since there is a pairwise independence between X2 and X3.
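Before moving on, the regression equation of Example 2 can be checked by simulation. The following sketch assumes Python with NumPy; the sample size, seed, and variable names are ours, chosen for illustration:

import numpy as np

rng = np.random.default_rng(0)
n, theta1, theta2 = 30, 1/2, 1/6   # the die illustration: X counts evens, Y counts fives
draws = rng.multinomial(n, [theta1, theta2, 1 - theta1 - theta2], size=100_000)
x_sim, y_sim = draws[:, 0], draws[:, 1]
for x in (12, 15, 18):
    # simulated E(Y | X = x) versus the formula (n - x) * theta2 / (1 - theta1)
    print(x, round(y_sim[x_sim == x].mean(), 1), round((n - x) * theta2 / (1 - theta1), 1))

The simulated conditional means agree with (1/3)(30 − x) up to sampling noise.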
2 Linear Regression

An important feature of Example 2 is that the regression equation is linear; that is, it is of the form

μY|x = α + βx

where α and β are constants, called the regression coefficients. There are several reasons why linear regression equations are of special interest: First, they often provide good approximations to otherwise complicated regression equations; also, they lend themselves readily to further mathematical treatment; and, in the case of the bivariate normal distribution, the regression equations are, in fact, linear.

To simplify the study of linear regression equations, let us express the regression coefficients α and β in terms of some of the lower moments of the joint distribution of X and Y, that is, in terms of E(X) = μ1, E(Y) = μ2, var(X) = σ1², var(Y) = σ2², and cov(X, Y) = σ12, also using the correlation coefficient

ρ = σ12/(σ1σ2)

With this notation, we can prove the following results.

THEOREM 1. If the regression of Y on X is linear, then

μY|x = μ2 + ρ(σ2/σ1)(x − μ1)

and if the regression of X on Y is linear, then

μX|y = μ1 + ρ(σ1/σ2)(y − μ2)

Proof. Since μY|x = α + βx, it follows that

∫ y · w(y|x) dy = α + βx

and if we multiply the expression on both sides of this equation by g(x), the corresponding value of the marginal density of X, and integrate on x, we obtain

∫∫ y · w(y|x)g(x) dy dx = α ∫ g(x) dx + β ∫ x · g(x) dx

or

μ2 = α + βμ1

since w(y|x)g(x) = f(x, y). If we had multiplied the equation for μY|x on both sides by x · g(x) before integrating on x, we would have obtained

∫∫ xy · f(x, y) dy dx = α ∫ x · g(x) dx + β ∫ x² · g(x) dx

or

E(XY) = αμ1 + βE(X²)

Solving μ2 = α + βμ1 and E(XY) = αμ1 + βE(X²) for α and β and making use of the fact that E(XY) = σ12 + μ1μ2 and E(X²) = σ1² + μ1², we find that

α = μ2 − (σ12/σ1²)μ1 = μ2 − ρ(σ2/σ1)μ1   and   β = σ12/σ1² = ρ(σ2/σ1)

This enables us to write the linear regression equation of Y on X as

μY|x = μ2 + ρ(σ2/σ1)(x − μ1)

When the regression of X on Y is linear, similar steps lead to the equation

μX|y = μ1 + ρ(σ1/σ2)(y − μ2)

It follows from Theorem 1 that if the regression equation is linear and ρ = 0, then μY|x does not depend on x (or μX|y does not depend on y). When ρ = 0, and hence σ12 = 0, the two random variables X and Y are uncorrelated, and we can say that if two random variables are independent, they are also uncorrelated; but if two random variables are uncorrelated, they are not necessarily independent. The latter is again illustrated in Exercise 9. The correlation coefficient and its estimates are of importance in many statistical investigations, and they will be discussed in some detail in Section 5. At this time, let us again point out that −1 ≤ ρ ≤ +1.

3 The Method of Least Squares

In the preceding sections we have discussed the problem of regression only in connection with random variables having known joint distributions. In actual practice, there are many problems where a set of paired data gives the indication that the regression is linear, where we do not know the joint distribution of the random variables under consideration but, nevertheless, want to estimate the regression coefficients α and β. Problems of this kind are usually handled by the method of least squares, a method of curve fitting suggested early in the nineteenth century by the French mathematician Adrien Legendre.

To illustrate this technique, let us consider the following data on the number of hours that 10 persons studied for a French test and their scores on the test:

Hours studied    Test score
      x               y
      4              31
      9              58
     10              65
     14              73
      4              37
      7              44
     12              60
     22              91
      1              21
     17              84

Plotting these data as in Figure 2, we get the impression that a straight line provides a reasonably good fit. Although the points do not all fall exactly on a straight line, the overall pattern suggests that the average test score for a given number of hours studied may well be related to the number of hours studied by means of an equation of the form μY|x = α + βx.

Figure 2. Data on hours studied and test scores (scatterplot of the 10 points).
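A scatterplot such as Figure 2 takes only a few lines to produce. The following sketch assumes Python with matplotlib installed:

import matplotlib.pyplot as plt

hours = [4, 9, 10, 14, 4, 7, 12, 22, 1, 17]
scores = [31, 58, 65, 73, 37, 44, 60, 91, 21, 84]
plt.scatter(hours, scores)
plt.xlabel("Hours studied")
plt.ylabel("Test score")
plt.show()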
Once we have decided in a given problem that the regression is approximately linear and the joint density of X and Y is unknown, we face the problem of estimating the coefficients α and β from the sample data. In other words, we face the problem of obtaining estimates α̂ and β̂ such that the estimated regression line ŷ = α̂ + β̂x in some sense provides the best possible fit to the given data. Denoting the vertical deviation from a point to the estimated regression line by ei, as indicated in Figure 3, the least squares criterion on which we shall base this "goodness of fit" is defined as follows:

DEFINITION 2. LEAST SQUARES ESTIMATE. If we are given a set of paired data {(xi, yi); i = 1, 2, ..., n}, the least squares estimates of the regression coefficients in bivariate linear regression are those that make the quantity

q = Σ_{i=1}^{n} ei² = Σ_{i=1}^{n} [yi − (α̂ + β̂xi)]²

a minimum with respect to α̂ and β̂.

Figure 3. Least squares criterion (the vertical deviations ei from the points to the line ŷ = α̂ + β̂x).

Finding the minimum by differentiating partially with respect to α̂ and β̂ and equating these partial derivatives to zero, we obtain

∂q/∂α̂ = Σ_{i=1}^{n} (−2)[yi − (α̂ + β̂xi)] = 0

and

∂q/∂β̂ = Σ_{i=1}^{n} (−2)xi[yi − (α̂ + β̂xi)] = 0

which yield the system of normal equations

Σ yi = α̂n + β̂ · Σ xi
Σ xiyi = α̂ · Σ xi + β̂ · Σ xi²

Solving this system of equations by using determinants or the method of elimination, we find that the least squares estimate of β is

β̂ = [n(Σ xiyi) − (Σ xi)(Σ yi)]/[n(Σ xi²) − (Σ xi)²]

Then we can write the least squares estimate of α as

α̂ = (Σ yi − β̂ · Σ xi)/n

by solving the first of the two normal equations for α̂. This formula for α̂ can be simplified as

α̂ = ȳ − β̂ · x̄

To simplify the formula for β̂ as well as some of the formulas we shall meet in Sections 4 and 5, let us introduce the following notation:

Sxx = Σ(xi − x̄)² = Σ xi² − (1/n)(Σ xi)²
Syy = Σ(yi − ȳ)² = Σ yi² − (1/n)(Σ yi)²
Sxy = Σ(xi − x̄)(yi − ȳ) = Σ xiyi − (1/n)(Σ xi)(Σ yi)

We can thus write the following theorem.

THEOREM 2. Given the sample data {(xi, yi); i = 1, 2, ..., n}, the coefficients of the least squares line ŷ = α̂ + β̂x are

β̂ = Sxy/Sxx   and   α̂ = ȳ − β̂ · x̄

EXAMPLE 4
With reference to the data in the table in Section 3,
(a) find the equation of the least squares line that approximates the regression of the test scores on the number of hours studied;
(b) predict the average test score of a person who studied 14 hours for the test.

Solution
(a) Omitting the limits of summation for simplicity, we get n = 10, Σx = 100, Σx² = 1,376, Σy = 564, and Σxy = 6,945 from the data. Thus

Sxx = 1,376 − (1/10)(100)² = 376   and   Sxy = 6,945 − (1/10)(100)(564) = 1,305

so that β̂ = 1,305/376 = 3.471 and α̂ = 564/10 − 3.471 · (100/10) = 21.69, and the equation of the least squares line is

ŷ = 21.69 + 3.471x

(b) Substituting x = 14 into the equation obtained in part (a), we get ŷ = 21.69 + 3.471(14) = 70.284, or ŷ = 70 rounded to the nearest unit.

Since we did not make any assumptions about the joint distribution of the random variables with which we were concerned in the preceding example, we cannot judge the "goodness" of the estimates α̂ = 21.69 and β̂ = 3.471 obtained in part (a); also, we cannot judge the "goodness" of the prediction obtained in part (b). Problems like this will be discussed in Section 4.
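Theorem 2 translates directly into code. The following is a short sketch, assuming Python with NumPy, applied to the hours-studied data; the numerical checks reproduce Example 4:

import numpy as np

x = np.array([4, 9, 10, 14, 4, 7, 12, 22, 1, 17])
y = np.array([31, 58, 65, 73, 37, 44, 60, 91, 21, 84])
Sxx = ((x - x.mean()) ** 2).sum()                 # 376
Sxy = ((x - x.mean()) * (y - y.mean())).sum()     # 1305
beta_hat = Sxy / Sxx                              # about 3.471
alpha_hat = y.mean() - beta_hat * x.mean()        # about 21.69
print(round(alpha_hat, 2), round(beta_hat, 3))
print(round(alpha_hat + beta_hat * 14, 1))        # predicted average score for x = 14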
and

    S_xy = Σ(x_i − x̄)(y_i − ȳ) = Σx_iy_i − (1/n)(Σx_i)(Σy_i)

We can thus write the following theorem.

THEOREM 2. Given the sample data {(x_i, y_i); i = 1, 2, ..., n}, the coefficients of the least squares line ŷ = α̂ + β̂x are

    β̂ = S_xy / S_xx  and  α̂ = ȳ − β̂·x̄

EXAMPLE 4
With reference to the data in the table in Section 3,
(a) find the equation of the least squares line that approximates the regression of the test scores on the number of hours studied;
(b) predict the average test score of a person who studied 14 hours for the test.

Solution
(a) Omitting the limits of summation for simplicity, we get n = 10, Σx = 100, Σy = 564, Σx² = 1,376, and Σxy = 6,945 from the data. Thus

    S_xx = 1,376 − (1/10)(100)² = 376
    S_xy = 6,945 − (1/10)(100)(564) = 1,305

so that β̂ = 1,305/376 = 3.471 and α̂ = 564/10 − 3.471·(100/10) = 21.69, and the equation of the least squares line is

    ŷ = 21.69 + 3.471x

(b) Substituting x = 14 into the equation obtained in part (a), we get ŷ = 21.69 + 3.471(14) = 70.284, or ŷ = 70 rounded to the nearest unit.

Since we did not make any assumptions about the joint distribution of the random variables with which we were concerned in the preceding example, we cannot judge the "goodness" of the estimates α̂ = 21.69 and β̂ = 3.471 obtained in part (a); also, we cannot judge the "goodness" of the prediction obtained in part (b). Problems like this will be discussed in Section 4.

The least squares criterion, or the method of least squares, is used in many problems of curve fitting that are more general than the one treated in this section. Above all, it will be used in Sections 6 and 7 to estimate the coefficients of multiple regression equations of the form

    μ_{Y|x₁,x₂,...,x_k} = β₀ + β₁x₁ + β₂x₂ + ··· + β_k x_k
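Theorem 2 translates directly into a few lines of code. Here is a minimal sketch, not part of the original text, assuming NumPy is available; it reproduces the fit ŷ = 21.69 + 3.471x of Example 4 from the data in Section 3.

    import numpy as np

    x = np.array([4, 9, 10, 14, 4, 7, 12, 22, 1, 17], dtype=float)   # hours studied
    y = np.array([31, 58, 65, 73, 37, 44, 60, 91, 21, 84], dtype=float)  # test scores

    Sxx = np.sum((x - x.mean()) ** 2)                  # = 376
    Sxy = np.sum((x - x.mean()) * (y - y.mean()))      # = 1,305

    beta_hat = Sxy / Sxx                               # about 3.471
    alpha_hat = y.mean() - beta_hat * x.mean()         # about 21.69
    print(alpha_hat, beta_hat)
    print(alpha_hat + beta_hat * 14)                   # predicted score after 14 hours, about 70.3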
Exercises

1. With reference to Example 1, show that
(a) μ_{Y|x} = … and μ_{X|y} = …;
(b) E(X^m Y^n) = … / [(n + 1)(m + n + 2)].

2. Given the joint density f(x, y) = 24xy for x > 0, y > 0, and x + y < 1, and f(x, y) = 0 elsewhere, show that μ_{Y|x} = (2/3)(1 − x) and verify this result by determining the values of μ₁, μ₂, σ₁, σ₂, and ρ and by substituting them into the first formula of Theorem 1.

3. This question has been intentionally omitted for this edition.

4. Given the joint density f(x, y) = 2 for 0 < y < x < 1 and f(x, y) = 0 elsewhere, find μ_{Y|x} and μ_{X|y}. Also sketch the regression curve.

5. Given the joint density f(x, y) = 6x for 0 < x < y < 1 and f(x, y) = 0 elsewhere, find μ_{Y|x} and μ_{X|y}.

6. Given the joint density f(x, y) = 2x·… for x > 0 and y > 0, and f(x, y) = 0 elsewhere, show that μ_{Y|x} = 1 + … and that var(Y|x) does not exist.

7. Given the joint density f(x, y) = (2/5)(2x + 3y) for 0 < x < 1 and 0 < y < 1, and f(x, y) = 0 elsewhere, find μ_{Y|x} and μ_{X|y}.

8. Show that if μ_{Y|x} is linear in x and var(Y|x) is constant, then var(Y|x) = σ₂²(1 − ρ²).

9. Given the joint density f(x, y) = 1 for −y < x < y and 0 < y < 1, and f(x, y) = 0 elsewhere, show that the random variables X and Y are uncorrelated but not independent.

10. This question has been intentionally omitted for this edition.

11. Show that −1 ≤ ρ ≤ +1.

12. Given the random variables X₁, X₂, and X₃ having the joint density f(x₁, x₂, x₃), show that if the regression of X₃ on X₁ and X₂ is linear and written as

    μ_{X₃|x₁,x₂} = α + β₁(x₁ − μ₁) + β₂(x₂ − μ₂)

then α = μ₃, β₁ = (σ₁₃σ₂² − σ₁₂σ₂₃)/(σ₁²σ₂² − σ₁₂²), and β₂ = (σ₂₃σ₁² − σ₁₂σ₁₃)/(σ₁²σ₂² − σ₁₂²), where μ_i = E(X_i), σ_i² = var(X_i), and σ_ij = cov(X_i, X_j). [Hint: Proceed as in Section 2, multiplying by (x₁ − μ₁) and (x₂ − μ₂), respectively, to obtain the second and third equations.]

13. The method of least squares can be used to fit curves to data. Using the method of least squares, find the normal equations that provide least squares estimates of α, β, and γ when fitting a quadratic curve of the form y = α + βx + γx² to paired data; that is, differentiate q partially with respect to the estimates, equate the expressions to zero, and then solve the resulting system of equations.

14. Find the least squares estimate of the parameter β in the regression equation μ_{Y|x} = βx.

15. When the x's are equally spaced, the calculation of α̂ and β̂ can be simplified by coding the x's by assigning them the values ..., −2, −1, 0, 1, 2, ... when n is odd, or the values ..., −5, −3, −1, 1, 3, 5, ... when n is even. Show that with this coding the formulas for α̂ and β̂ become

    α̂ = Σy_i / n  and  β̂ = Σx_iy_i / Σx_i²

16. Solve the normal equations in Section 3 simultaneously to show that

    β̂ = [n·Σx_iy_i − (Σx_i)(Σy_i)] / [n·Σx_i² − (Σx_i)²]

and

    α̂ = [(Σx_i²)(Σy_i) − (Σx_i)(Σx_iy_i)] / [n·Σx_i² − (Σx_i)²]

4 Normal Regression Analysis

When we analyze a set of paired data {(x_i, y_i); i = 1, 2, ..., n} by regression analysis, we look upon the x_i as constants and the y_i as values of corresponding independent random variables Y_i. This clearly differs from correlation analysis, which we shall take up in Section 5, where we look upon the x_i and the y_i as values of corresponding random variables X_i and Y_i. For example, if we want to analyze data on the ages and prices of used cars, treating the ages as known constants and the prices as values of random variables, this is a problem of regression analysis. On the other hand, if we want to analyze data on the height and weight of certain animals, and height and weight are both looked upon as random variables, this is a problem of correlation analysis.

This section will be devoted to some of the basic problems of normal regression analysis, where it is assumed that for each fixed x_i the conditional density of the corresponding random variable Y_i is the normal density

    w(y_i|x_i) = (1/(σ√(2π)))·e^{−(1/2)[(y_i − (α + βx_i))/σ]²},  −∞ < y_i < ∞

where α, β, and σ are the same for each i. Normal regression analysis concerns itself mainly with the estimation of σ and the regression coefficients α and β, with tests of hypotheses concerning these three parameters, and with predictions based on the estimated regression equation ŷ = α̂ + β̂x, where α̂ and β̂ are estimates of α and β.

To obtain maximum likelihood estimates of the parameters α, β, and σ, given a random sample of such paired data, we partially differentiate the likelihood function (or its logarithm, which is easier) with respect to α, β, and σ, and equate the expressions that we obtain to zero. Differentiating

    ln L = −n·ln σ − (n/2)·ln 2π − (1/(2σ²))·Σ[y_i − (α + βx_i)]²

partially with respect to α, β, and σ, and equating the expressions that we obtain to zero, we get

    ∂ln L/∂α = (1/σ²)·Σ[y_i − (α + βx_i)] = 0
    ∂ln L/∂β = (1/σ²)·Σx_i[y_i − (α + βx_i)] = 0
    ∂ln L/∂σ = −n/σ + (1/σ³)·Σ[y_i − (α + βx_i)]² = 0

Since the first two equations are equivalent to the two normal equations on an earlier page, the maximum likelihood estimates of α and β are identical with the least squares estimates of Theorem 2. Also, if we substitute these estimates of α and β into the equation obtained by equating ∂ln L/∂σ to zero,
it follows immediately that the maximum likelihood estimate of σ is given by

    σ̂ = √[(1/n)·Σ(y_i − (α̂ + β̂x_i))²]

This can also be written as

    σ̂ = √[(S_yy − β̂·S_xy)/n]

as the reader will be asked to verify in Exercise 17.

Having obtained maximum likelihood estimators of the regression coefficients, let us now investigate their use in testing hypotheses concerning α and β and in constructing confidence intervals for these two parameters. Since problems concerning β are usually of more immediate interest than problems concerning α (β is the slope of the regression line, whereas α is merely the y-intercept; also, in the case of the bivariate normal distribution, the null hypothesis β = 0 is equivalent to the null hypothesis ρ = 0), we shall discuss here some of the sampling theory relating to B̂, where B is the capital Greek letter beta. Corresponding theory relating to Â, where A is the capital Greek letter alpha, will be treated in Exercises 20 and 22.

To study the sampling distribution of B̂, let us write

    B̂ = S_xY/S_xx = Σ(x_i − x̄)(Y_i − Ȳ)/S_xx = Σ[(x_i − x̄)/S_xx]·Y_i

which is seen to be a linear combination of the n independent normal random variables Y_i. B̂ itself therefore has a normal distribution with the mean

    E(B̂) = Σ[(x_i − x̄)/S_xx]·E(Y_i|x_i) = Σ[(x_i − x̄)/S_xx]·(α + βx_i) = β

and the variance

    var(B̂) = Σ[(x_i − x̄)/S_xx]²·var(Y_i|x_i) = Σ[(x_i − x̄)/S_xx]²·σ² = σ²/S_xx

In order to apply this theory to test hypotheses about β or construct confidence intervals for β, we shall have to use the following theorem, a proof of which is referred to at the end of this chapter.

THEOREM 3. Under the assumptions of normal regression analysis, nσ̂²/σ² is a value of a random variable having the chi-square distribution with n − 2 degrees of freedom. Furthermore, this random variable and B̂ are independent.

Making use of this theorem as well as the result proved above that B̂ has a normal distribution with the mean β and the variance σ²/S_xx, we find that the definition of the t distribution leads to the following theorem.

THEOREM 4. Under the assumptions of normal regression analysis,

    t = [(β̂ − β)/(σ/√S_xx)] / √[nσ̂²/((n − 2)σ²)] = [(β̂ − β)/σ̂]·√[(n − 2)S_xx/n]

is a value of a random variable having the t distribution with n − 2 degrees of freedom.

Based on this statistic, let us now test a hypothesis about the regression coefficient β.

EXAMPLE 5
With reference to the data in the table in Section 3 pertaining to the amount of time that 10 persons studied for a certain test and the scores that they obtained, test the null hypothesis β = 3 against the alternative hypothesis β > 3 at the 0.01 level of significance.

Solution
1. H₀: β = 3; H₁: β > 3; α = 0.01.
2. Reject the null hypothesis if t ≥ 2.896, where t is determined in accordance with Theorem 4 and 2.896 is the value of t₀.₀₁,₈ obtained from Table IV of "Statistical Tables."
3. Calculating Σy² = 36,562 from the original data and copying the other quantities from Section 3, we get

    S_yy = 36,562 − (1/10)(564)² = 4,752.4

and

    σ̂ = √[(1/10)(4,752.4 − (3.471)(1,305))] = 4.72

so that

    t = [(3.471 − 3)/4.72]·√(8·376/10) = 1.73

4. Since t = 1.73 is less than 2.896, the null hypothesis cannot be rejected; we cannot conclude that on the average an extra hour of study will increase the score by more than 3 points.

Letting Σ̂ be the random variable whose values are σ̂, we have, according to Theorem 4,

    P(−t_{α/2,n−2} < [(B̂ − β)/Σ̂]·√[(n − 2)S_xx/n] < t_{α/2,n−2}) = 1 − α

Writing this as

    P(B̂ − t_{α/2,n−2}·Σ̂·√[n/((n − 2)S_xx)] < β < B̂ + t_{α/2,n−2}·Σ̂·√[n/((n − 2)S_xx)]) = 1 − α

we arrive at the following confidence interval formula.

THEOREM 5. Under the assumptions of normal regression analysis,

    β̂ − t_{α/2,n−2}·σ̂·√[n/((n − 2)S_xx)] < β < β̂ + t_{α/2,n−2}·σ̂·√[n/((n − 2)S_xx)]

is a (1 − α)100% confidence interval for the parameter β.

EXAMPLE 6
With reference to the same data as in Example 5, construct a 95% confidence interval for β.

Solution
Copying the various quantities from Examples 4 and 5 and substituting them together with t₀.₀₂₅,₈ = 2.306 into the confidence interval formula of Theorem 5, we get

    3.471 − (2.306)(4.72)·√[10/(8·376)] < β < 3.471 + (2.306)(4.72)·√[10/(8·376)]

or

    2.84 < β < 4.10
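The test of Example 5 and the interval of Example 6 can be reproduced with a short sketch. This is not part of the original text; it assumes NumPy and SciPy are available, and SciPy's t quantile stands in for the printed table.

    import numpy as np
    from scipy import stats

    x = np.array([4, 9, 10, 14, 4, 7, 12, 22, 1, 17], dtype=float)
    y = np.array([31, 58, 65, 73, 37, 44, 60, 91, 21, 84], dtype=float)
    n = len(x)

    Sxx = np.sum((x - x.mean()) ** 2)
    Sxy = np.sum((x - x.mean()) * (y - y.mean()))
    Syy = np.sum((y - y.mean()) ** 2)

    beta_hat = Sxy / Sxx
    sigma_hat = np.sqrt((Syy - beta_hat * Sxy) / n)    # maximum likelihood estimate of sigma

    # Theorem 4: test beta = 3 against beta > 3 (Example 5)
    t = (beta_hat - 3) / sigma_hat * np.sqrt((n - 2) * Sxx / n)
    print(t)   # about 1.73, below t_.01,8 = 2.896, so H0 is not rejected

    # Theorem 5: 95% confidence interval for beta (Example 6)
    half = stats.t.ppf(0.975, n - 2) * sigma_hat * np.sqrt(n / ((n - 2) * Sxx))
    print(beta_hat - half, beta_hat + half)            # about 2.84 and 4.10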
Since most realistically complex regression problems require fairly extensive calculations, they are virtually always done nowadays by using appropriate computer software. A printout obtained for our illustration using MINITAB software is shown in Figure 4. As can be seen, it provides not only the values of α̂ and β̂ in the column headed COEFFICIENT, but also estimates of the standard deviations of the sampling distributions of Â and B̂ in the column headed ST. DEV. OF COEF. Had we used this printout in Example 5, we could have written the value of the t statistic directly as

    t = (3.471 − 3)/0.2723 = 1.73

and in Example 6 we could have written the confidence limits directly as 3.471 ± (2.306)(0.2723).

    MTB > NAME C1 = 'X'
    MTB > SET C1
    DATA > 4 9 10 14 4 7 12 22 1 17
    MTB > NAME C2 = 'Y'
    MTB > SET C2
    DATA > 31 58 65 73 37 44 60 91 21 84
    MTB > REGR C2 1 C1

    THE REGRESSION EQUATION IS
    Y = 21.7 + 3.47 X

                           ST. DEV.   T-RATIO =
    COLUMN   COEFFICIENT   OF COEF.   COEF/S.D.
               21.693       3.194       6.79
    X           3.471       0.2723     12.74

Figure 4. Computer printout for Examples 4, 5, and 6.

Exercises

17. Making use of the fact that α̂ = ȳ − β̂x̄ and β̂ = S_xy/S_xx, show that

    Σ[y_i − (α̂ + β̂x_i)]² = S_yy − β̂·S_xy

18. Show that
(a) Σ̂², the random variable corresponding to σ̂², is not an unbiased estimator of σ²;
(b) S_e² = n·Σ̂²/(n − 2) is an unbiased estimator of σ². The quantity s_e is often referred to as the standard error of estimate.

19. Using s_e (see Exercise 18) instead of σ̂, rewrite
(a) the expression for t in Theorem 4;
(b) the confidence interval formula of Theorem 5.

20. Show that
(a) the least squares estimate of α in Theorem 2 can be written in the form

    α̂ = Σ[(S_xx + nx̄² − nx̄x_i)/(nS_xx)]·y_i

(b) Â has a normal distribution with

    E(Â) = α  and  var(Â) = (S_xx + nx̄²)σ²/(nS_xx)

21. Use the result of part (b) of Exercise 20 to show that

    z = (α̂ − α)·√(nS_xx) / (σ·√(S_xx + nx̄²))

is a value of a random variable having the standard normal distribution.

22. Use the results of Exercises 20 and 21, the first part of Theorem 3, and the fact that Â and nΣ̂²/σ² are independent to show that
    t = (α̂ − α)·√[(n − 2)S_xx] / (σ̂·√(S_xx + nx̄²))

is a value of a random variable having the t distribution with n − 2 degrees of freedom.

23. Use the results of Exercises 20 and 21 and the fact that E(B̂) = β and var(B̂) = σ²/S_xx to show that Ŷ₀ = Â + B̂x₀ is a random variable having a normal distribution with the mean α + βx₀ (= μ_{Y|x₀}) and the variance

    σ²[1/n + (x₀ − x̄)²/S_xx]

Also, use the first part of Theorem 3 as well as the fact that Ŷ₀ and nΣ̂²/σ² are independent to show that

    t = (ŷ₀ − μ_{Y|x₀})·√(n − 2) / (σ̂·√[1 + n(x₀ − x̄)²/S_xx])

is a value of a random variable having the t distribution with n − 2 degrees of freedom.

24. Derive a (1 − α)100% confidence interval for μ_{Y|x₀}, the mean of Y at x = x₀, by solving the double inequality −t_{α/2,n−2} < t < t_{α/2,n−2} with t given by the formula of Exercise 23.

25. Use the results of Exercises 20 and 21 and the fact that E(B̂) = β and var(B̂) = σ²/S_xx to show that Y₀ − (Â + B̂x₀) is a random variable having a normal distribution with zero mean and the variance

    σ²[1 + 1/n + (x₀ − x̄)²/S_xx]

Here Y₀ has a normal distribution with the mean α + βx₀ and the variance σ²; that is, Y₀ is a future observation of Y corresponding to x = x₀. Also, use the first part of Theorem 3 as well as the fact that Y₀ − (Â + B̂x₀) and nΣ̂²/σ² are independent to show that

    t = [y₀ − (α̂ + β̂x₀)]·√(n − 2) / (σ̂·√[1 + n + n(x₀ − x̄)²/S_xx])

is a value of a random variable having the t distribution with n − 2 degrees of freedom.

26. Solve the double inequality −t_{α/2,n−2} < t < t_{α/2,n−2}, with t given by the formula of Exercise 25, so that the middle term is y₀ and the two limits can be calculated without knowledge of y₀. Note that although the resulting double inequality may be interpreted like a confidence interval, it is not designed to estimate a parameter; instead, it provides limits of prediction for a future observation of Y that corresponds to the (given or observed) value x₀.

5 Normal Correlation Analysis

In normal correlation analysis we drop the assumption that the x_i are fixed constants; that is, we are analyzing the set of paired data {(x_i, y_i); i = 1, 2, ..., n}, where the x_i's and y_i's are values of a random sample from a bivariate normal population with the parameters μ₁, μ₂, σ₁, σ₂, and ρ. To estimate these parameters by the method of maximum likelihood, we shall have to maximize the likelihood

    L = Π f(x_i, y_i)

and to this end we shall have to differentiate L, or ln L, partially with respect to μ₁, μ₂, σ₁, σ₂, and ρ, equate the resulting expressions to zero, and then solve the resulting system of equations for the five parameters. Leaving the details to the reader, let us merely state that when ∂ln L/∂μ₁ and ∂ln L/∂μ₂ are equated to zero, we get
    −Σ(x_i − μ₁)/σ₁² + ρ·Σ(y_i − μ₂)/(σ₁σ₂) = 0

and

    ρ·Σ(x_i − μ₁)/(σ₁σ₂) − Σ(y_i − μ₂)/σ₂² = 0

Solving these two equations for μ₁ and μ₂, we find that the maximum likelihood estimates of these two parameters are

    μ̂₁ = x̄  and  μ̂₂ = ȳ

that is, the respective sample means. Subsequently, equating ∂ln L/∂σ₁, ∂ln L/∂σ₂, and ∂ln L/∂ρ to zero and substituting x̄ and ȳ for μ₁ and μ₂, we obtain a system of equations whose solution is

    σ̂₁ = √[Σ(x_i − x̄)²/n],  σ̂₂ = √[Σ(y_i − ȳ)²/n]

and

    ρ̂ = Σ(x_i − x̄)(y_i − ȳ) / √[Σ(x_i − x̄)²·Σ(y_i − ȳ)²]

(A detailed derivation of these maximum likelihood estimates is referred to at the end of this chapter.) It is of interest to note that the maximum likelihood estimates of σ₁ and σ₂ are identical with the one obtained for the standard deviation of the univariate normal distribution; they differ from the respective sample standard deviations s₁ and s₂ only by the factor √[(n − 1)/n].

The estimate ρ̂, called the sample correlation coefficient, is usually denoted by the letter r, and its calculation is facilitated by using the following alternative, but equivalent, computing formula.

THEOREM 6. If {(x_i, y_i); i = 1, 2, ..., n} are the values of a random sample from a bivariate population, then

    r = S_xy / √(S_xx·S_yy)

Since ρ measures the strength of the linear relationship between X and Y, there are many problems in which the estimation of ρ and tests concerning ρ are of special interest. When ρ = 0, the two random variables are uncorrelated, and, as we have already seen, in the case of the bivariate normal distribution this means that they are also independent. When ρ equals +1 or −1, it follows from the relationship

    σ²_{Y|x} = σ² = σ₂²(1 − ρ²)

that σ = 0, and this means that there is a perfect linear relationship between X and Y.

In order to interpret values of r between 0 and +1 or 0 and −1, we use the invariance property of maximum likelihood estimators and write

    σ̂² = σ̂₂²(1 − r²)

which not only provides an alternative computing formula for finding σ̂², but also serves to tie together the concepts of regression and correlation. From this formula for σ̂² it is clear that σ̂² = 0, and hence r = +1 or −1, when the set of data points {(x_i, y_i); i = 1, 2, ..., n} fall on a straight line; we take r = +1 when the line has a positive slope and r = −1 when it has a negative slope.

Solving the preceding equation for r² and multiplying by 100, we get

    100r² = [(σ̂₂² − σ̂²)/σ̂₂²]·100

where σ̂₂² measures the total variation of the y's, σ̂² measures the conditional variation of the y's for fixed values of x, and hence σ̂₂² − σ̂² measures that part of the total variation of the y's that is accounted for by the relationship with x. Thus, 100r² is the percentage of the total variation of the y's that is accounted for by the relationship with x. For instance, when r = 0.5, then 25 percent of the variation of the y's is accounted for by the relationship with x, and when r = 0.7, then 49 percent of the variation of the y's is accounted for by the relationship with x. Accordingly, we might say that a correlation of r = 0.7 is almost "twice as strong" as a correlation of r = 0.5, and that a correlation of r = 0.6 is "nine times as strong" as a correlation of r = 0.2.
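Theorem 6 and the 100r² interpretation take only a few lines of code. The following minimal sketch is not part of the original text and assumes NumPy is available; it is applied here to the hours-studied data of Section 3, for which the quantities S_xx, S_yy, and S_xy were already computed in Examples 4 and 5.

    import numpy as np

    x = np.array([4, 9, 10, 14, 4, 7, 12, 22, 1, 17], dtype=float)
    y = np.array([31, 58, 65, 73, 37, 44, 60, 91, 21, 84], dtype=float)

    Sxx = np.sum((x - x.mean()) ** 2)
    Syy = np.sum((y - y.mean()) ** 2)
    Sxy = np.sum((x - x.mean()) * (y - y.mean()))

    r = Sxy / np.sqrt(Sxx * Syy)     # Theorem 6
    print(r, 100 * r**2)             # r is about 0.976, so about 95 percent of the
                                     # variation of the y's is accounted for by x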
EXAMPLE 7
Suppose that we want to determine on the basis of the following data whether there is a relationship between the time, in minutes, it takes a secretary to complete a certain form in the morning and in the late afternoon:

    Morning   Afternoon
       x          y
      8.2        8.7
      9.6        9.6
      5.9        6.8
      8.1        8.4
      9.5       10.6
      6.6        9.2
      9.2        9.2
      9.9       10.1
      7.3        8.2
      6.3        6.3

(The ten pairs above are reproduced from the source as closely as its layout allows.) Compute and interpret the sample correlation coefficient.

Solution
From the data we get n = 10, Σx = 86.7, Σy = 88.8, Σx² = 771.35, Σy² = 819.34, and Σxy = 792.92, so

    S_xx = 771.35 − (1/10)(86.7)² = 19.661
    S_yy = 819.34 − (1/10)(88.8)² = 30.796
    S_xy = 792.92 − (1/10)(86.7)(88.8) = 23.024

and

    r = 23.024 / √[(19.661)(30.796)] = 0.936

This is indicative of a positive association between the time it takes a secretary to perform the given task in the morning and in the late afternoon, and this is also apparent from the scattergram of Figure 5. Since 100r² = 100(0.936)² = 87.6, we can say that almost 88 percent of the variation of the y's is accounted for by the implicit linear relationship with x.

[Figure 5. Scattergram of the data of Example 7.]

Since the sampling distribution of R for random samples from bivariate normal populations is rather complicated, it is common practice to base confidence intervals for ρ and tests concerning ρ on the statistic

    (1/2)·ln[(1 + R)/(1 − R)]

whose distribution can be shown to be approximately normal with the mean (1/2)·ln[(1 + ρ)/(1 − ρ)] and the variance 1/(n − 3). Thus,

    z = [(1/2)·ln((1 + r)/(1 − r)) − (1/2)·ln((1 + ρ)/(1 − ρ))] / (1/√(n − 3))
      = [√(n − 3)/2]·ln[(1 + r)(1 − ρ) / ((1 − r)(1 + ρ))]

can be looked upon as a value of a random variable having approximately the standard normal distribution. Using this approximation, we can test the null hypothesis ρ = ρ₀ against an appropriate alternative, as illustrated in Example 8, or calculate confidence intervals for ρ by the method suggested in Exercise 31.

EXAMPLE 8
With reference to Example 7, test the null hypothesis ρ = 0 against the alternative hypothesis ρ ≠ 0 at the 0.01 level of significance.

Solution
1. H₀: ρ = 0; H₁: ρ ≠ 0; α = 0.01.
2. Reject the null hypothesis if z ≤ −2.575 or z ≥ 2.575, where

    z = [√(n − 3)/2]·ln[(1 + r)/(1 − r)]

3. Substituting n = 10 and r = 0.936, we get

    z = (√7/2)·ln(1.936/0.064) = 4.5

4. Since z = 4.5 exceeds 2.575, the null hypothesis must be rejected; we conclude that there is a linear relationship between the time it takes a secretary to complete the form in the morning and in the late afternoon.

Exercises

27. Verify the maximum likelihood estimates of μ₁, μ₂, σ₁, σ₂, and ρ given in Section 5.

28. Verify that the formula for t of Theorem 4 can be written as

    t = (1 − β/β̂)·r·√(n − 2) / √(1 − r²)

29. Use the formula for t of Exercise 28 to derive the following (1 − α)100% confidence limits for β:

    β̂·[1 ± t_{α/2,n−2}·√(1 − r²)/(r·√(n − 2))]

30. Use the formula for t of Exercise 28 to show that if the assumptions underlying normal regression analysis are met and β = 0, then R² has a beta distribution with the mean 1/(n − 1).

31. By solving the double inequality −z_{α/2} ≤ z ≤ z_{α/2} (with z given by the formula on the previous page) for ρ, derive a (1 − α)100% confidence interval formula for ρ.

32. In a random sample of n pairs of values of X and Y, (x_i, y_j) occurs f_ij times for i = 1, 2, ..., r and j = 1, 2, ..., c. Letting f_i· denote the number of pairs where X takes on the value x_i and f_·j the number of pairs where Y takes on the value y_j, write a formula for the coefficient of correlation.
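Before moving on, here is the approximate z test of Example 8 as a sketch. It is not part of the original text; it uses only the Python standard library, and rho0 is the hypothesized value of ρ.

    import math

    n, r = 10, 0.936      # sample size and r from Example 7
    rho0 = 0.0            # value of rho under the null hypothesis

    z = (math.sqrt(n - 3) / 2) * math.log((1 + r) * (1 - rho0) / ((1 - r) * (1 + rho0)))
    print(z)   # about 4.5; |z| exceeds 2.575, so reject rho = 0 at the 0.01 level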
6 Multiple Linear Regression

Although there are many problems in which one variable can be predicted quite accurately in terms of another, it stands to reason that predictions should improve if one considers additional relevant information. For instance, we should be able to make better predictions of the performance of newly hired teachers if we consider not only their education, but also their years of experience and their personality; and we should be able to make better predictions of a new textbook's success if we consider not only the quality of the work, but also the potential demand and the competition.

Although many different formulas can be used to express regression relationships among more than two variables (see, for instance, Example 3), the most widely used are linear equations of the form

    μ_{Y|x₁,x₂,...,x_k} = β₀ + β₁x₁ + β₂x₂ + ··· + β_k x_k

This is partly a matter of mathematical convenience and partly due to the fact that many relationships are actually of this form or can be approximated closely by linear equations. In the preceding equation, Y is the random variable whose values we want to predict in terms of given values of the independent variables x₁, x₂, ..., and x_k, and the multiple regression coefficients β₀, β₁, β₂, ..., and β_k are numerical constants that must be determined from observed data.

To illustrate, consider the following equation, which was obtained in a study of the demand for different meats:

    ŷ = 3.489 − 0.090x₁ + 0.064x₂ + 0.019x₃

Here ŷ denotes the estimated family consumption of federally inspected beef and pork in millions of pounds, x₁ denotes a composite retail price of beef in cents per pound, x₂ denotes a composite retail price of pork in cents per pound, and x₃ denotes family income as measured by a certain payroll index.

As in Section 3, multiple regression coefficients are usually estimated by the method of least squares. For n data points

    {(x_i1, x_i2, ..., x_ik, y_i); i = 1, 2, ..., n}

the least squares estimates of the β's are the values β̂₀, β̂₁, β̂₂, ..., and β̂_k for which the quantity

    q = Σ [y_i − (β̂₀ + β̂₁x_i1 + β̂₂x_i2 + ··· + β̂_k x_ik)]²

is a minimum. In this notation, x_i1 is the ith value of the variable x₁, x_i2 is the ith value of the variable x₂, and so on. So, differentiating partially with respect to the β̂'s and equating these partial derivatives to zero, we get

    ∂q/∂β̂₀ = Σ(−2)[y_i − (β̂₀ + β̂₁x_i1 + β̂₂x_i2 + ··· + β̂_k x_ik)] = 0
    ∂q/∂β̂₁ = Σ(−2)x_i1[y_i − (β̂₀ + β̂₁x_i1 + β̂₂x_i2 + ··· + β̂_k x_ik)] = 0
    ∂q/∂β̂₂ = Σ(−2)x_i2[y_i − (β̂₀ + β̂₁x_i1 + β̂₂x_i2 + ··· + β̂_k x_ik)] = 0
        ···
    ∂q/∂β̂_k = Σ(−2)x_ik[y_i − (β̂₀ + β̂₁x_i1 + β̂₂x_i2 + ··· + β̂_k x_ik)] = 0

and finally the k + 1 normal equations

    Σy   = β̂₀·n    + β̂₁·Σx₁    + β̂₂·Σx₂    + ··· + β̂_k·Σx_k
    Σx₁y = β̂₀·Σx₁  + β̂₁·Σx₁²   + β̂₂·Σx₁x₂  + ··· + β̂_k·Σx₁x_k
    Σx₂y = β̂₀·Σx₂  + β̂₁·Σx₂x₁  + β̂₂·Σx₂²   + ··· + β̂_k·Σx₂x_k
        ···
    Σx_ky = β̂₀·Σx_k + β̂₁·Σx_kx₁ + β̂₂·Σx_kx₂ + ··· + β̂_k·Σx_k²

Here we abbreviated our notation by writing Σx_i1 as Σx₁, Σx_i1x_i2 as Σx₁x₂, and so on. A sketch of how this system can be set up and solved numerically is given below, and a worked hand calculation follows in Example 9.
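The k + 1 normal equations form a (k + 1) × (k + 1) linear system. The following sketch is not part of the original text; it assumes NumPy is available, and the function name is ours. It builds and solves the system for any set of predictor columns.

    import numpy as np

    def fit_multiple_regression(x_cols, y):
        """Least squares estimates b0, b1, ..., bk from the k + 1 normal equations.
        x_cols is a list of the k predictor columns; y is the response column."""
        X = np.column_stack([np.ones(len(y))] + [np.asarray(c, float) for c in x_cols])
        # X.T @ X and X.T @ y contain exactly the sums appearing in the normal equations
        return np.linalg.solve(X.T @ X, X.T @ np.asarray(y, float))

    # For the bedrooms/baths data of Example 9 below, this returns approximately
    # [224929, 15314, 10957].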
EXAMPLE 9
The following data show the number of bedrooms, the number of baths, and the prices at which a random sample of eight one-family houses sold in a certain large housing development:

    Number of     Number of     Price
    bedrooms      baths         (dollars)
       x₁            x₂            y
        3             2         292,000
        2             1         264,600
        4             3         317,500
        2             1         265,500
        3             2         302,000
        2             2         275,500
        5             3         333,000
        4             2         307,500

Use the method of least squares to fit a linear equation that will enable us to predict the average sales price of a one-family house in the given housing development in terms of the number of bedrooms and the number of baths.

Solution
The quantities we need to substitute into the three normal equations are n = 8, Σx₁ = 25, Σx₂ = 16, Σx₁² = 87, Σx₂² = 36, Σx₁x₂ = 55, Σy = 2,357,600, Σx₁y = 7,558,200, and Σx₂y = 4,835,600, and we get

    2,357,600 = 8β̂₀ + 25β̂₁ + 16β̂₂
    7,558,200 = 25β̂₀ + 87β̂₁ + 55β̂₂
    4,835,600 = 16β̂₀ + 55β̂₁ + 36β̂₂

We could solve these equations by the method of elimination or by using determinants, but in view of the rather tedious calculations, such work is usually left to computers. Let us refer to the printout of Figure 6, which shows in the column headed "Coef" that β̂₀ = 224,929, β̂₁ = 15,314, and β̂₂ = 10,957, where the β̂'s are rounded to the nearest integer. The least squares equation becomes

    ŷ = 224,929 + 15,314x₁ + 10,957x₂

and this tells us that (in the given housing development and at the time of this study) each bedroom adds on the average $15,314 and each bath adds $10,957 to the sales price of a house.

    Regression Analysis: C3 versus C1, C2
    The regression equation is
    C3 = 224929 + 15314 C1 + 10957 C2

    Predictor      Coef   SE Coef      T      P
    Constant     224929      5016  44.84  0.000
    C1            15314      2743   5.58  0.003
    C2            10957      4086   2.68  0.044

    S = 4444.45   R-Sq = 97.7%   R-Sq(adj) = 96.8%

Figure 6. Computer printout for Example 9.

EXAMPLE 10
Based on the result obtained in Example 9, predict the sales price of a three-bedroom house with two baths in the subject housing development.

Solution
Substituting x₁ = 3 and x₂ = 2 into the least squares equation obtained in the preceding example, we get

    ŷ = 224,929 + 15,314·3 + 10,957·2 = $292,785

Printouts like that of Figure 6 also provide information that is needed to make inferences about the multiple regression coefficients and to judge the merits of estimates or predictions based on the least squares equations. This corresponds to the work of Section 4, but we shall defer it until Section 7, where we shall study the whole problem of multiple linear regression in a much more compact notation.

7 Multiple Linear Regression (Matrix Notation)†

The model we are using in multiple linear regression lends itself uniquely to a unified treatment in matrix notation. This notation makes it possible to state general results in compact form and to utilize many results of matrix theory to great advantage. As is customary, we shall denote matrices by capital letters in boldface type.

† It is assumed for this section that the reader is familiar with the material ordinarily covered in a first course on matrix algebra.

To express the normal equations in matrix notation, let us define the following three matrices:

    X = ( 1  x₁₁  x₁₂  ···  x₁ₖ )
        ( 1  x₂₁  x₂₂  ···  x₂ₖ )
        ( ···                   )
        ( 1  x_n1 x_n2 ···  x_nk )

    Y = (y₁, y₂, ..., y_n)′   and   B = (β̂₀, β̂₁, ..., β̂ₖ)′

The first one, X, is an n × (k + 1) matrix consisting essentially of the given values of the x's, with the column of 1's appended to accommodate the constant terms; Y is an n × 1 matrix (or column vector) consisting of the observed values of Y; and B is a (k + 1) × 1 matrix (or column vector) consisting of the least squares estimates of the regression coefficients. We could introduce the matrix approach by expressing the sum of squares q (which we minimized in the preceding section by differentiating partially with respect to the β̂'s) in matrix notation and take it from there, but leaving this to the reader in Exercise 33, let us begin here with the normal equations given earlier. Using these matrices, we can now write the following symbolic solution of the normal equations.

THEOREM 7. The least squares estimates of the multiple regression coefficients are given by

    B = (X′X)⁻¹X′Y

where X′ is the transpose of X and (X′X)⁻¹ is the inverse of X′X.
Proof. First we determine X′X, X′XB, and X′Y, getting

    X′X = ( n     Σx₁     Σx₂    ···  Σx_k   )
          ( Σx₁   Σx₁²    Σx₁x₂  ···  Σx₁x_k )
          ( Σx₂   Σx₂x₁   Σx₂²   ···  Σx₂x_k )
          ( ···                              )
          ( Σx_k  Σx_kx₁  Σx_kx₂ ···  Σx_k²  )

    X′XB = ( β̂₀·n    + β̂₁·Σx₁    + β̂₂·Σx₂    + ··· + β̂_k·Σx_k   )
           ( β̂₀·Σx₁  + β̂₁·Σx₁²   + β̂₂·Σx₁x₂  + ··· + β̂_k·Σx₁x_k )
           ( β̂₀·Σx₂  + β̂₁·Σx₂x₁  + β̂₂·Σx₂²   + ··· + β̂_k·Σx₂x_k )
           ( ···                                                 )
           ( β̂₀·Σx_k + β̂₁·Σx_kx₁ + β̂₂·Σx_kx₂ + ··· + β̂_k·Σx_k²  )

    X′Y = (Σy, Σx₁y, Σx₂y, ..., Σx_ky)′

Identifying the elements of X′XB as the expressions on the right-hand side of the normal equations given on an earlier page and those of X′Y as the expressions on the left-hand side, we can write

    X′XB = X′Y

Multiplying on the left by (X′X)⁻¹, we get

    (X′X)⁻¹X′XB = (X′X)⁻¹X′Y

and finally

    B = (X′X)⁻¹X′Y

since (X′X)⁻¹X′X equals the (k + 1) × (k + 1) identity matrix I and by definition IB = B. We have assumed here that X′X is nonsingular so that its inverse exists.

EXAMPLE 11
With reference to Example 9, use Theorem 7 to determine the least squares estimates of the multiple regression coefficients.

Solution
Substituting Σx₁ = 25, Σx₂ = 16, Σx₁² = 87, Σx₂² = 36, Σx₁x₂ = 55, and n = 8 from Example 9 into the preceding expression for X′X, we get

    X′X = (  8  25  16 )
          ( 25  87  55 )
          ( 16  55  36 )

Then, the inverse of this matrix can be obtained by any one of a number of different techniques; using the one based on cofactors, we find that

    (X′X)⁻¹ = (1/84)·( 107  −20  −17 )
                     ( −20   32  −40 )
                     ( −17  −40   71 )

where 84 is the value of |X′X|, the determinant of X′X. Substituting Σy = 2,357,600, Σx₁y = 7,558,200, and Σx₂y = 4,835,600 from Example 9 into the expression for X′Y, we then get

    X′Y = (2,357,600; 7,558,200; 4,835,600)′

and finally

    B = (X′X)⁻¹X′Y = (1/84)·(18,894,000; 1,286,400; 920,400)′ = (224,929; 15,314; 10,957)′

where the β̂'s are rounded to the nearest integer. Note that the results obtained here are identical with those shown on the computer printout of Figure 6.

Next, to generalize the work of Section 4, we assume that for i = 1, 2, ..., n, the Y_i are independent random variables having normal distributions with the means β₀ + β₁x_i1 + β₂x_i2 + ··· + β_k x_ik and the common standard deviation σ. Based on n data points (x_i1, x_i2, ..., x_ik, y_i), we can then make all sorts of inferences about the parameters of our model, the β's and σ, and judge the merits of estimates and predictions based on the estimated multiple regression equation.

Finding maximum likelihood estimates of the β's and σ is straightforward, and it will be left to the reader in Exercise 34. The results are as follows: The maximum likelihood estimates of the β's equal the corresponding least squares estimates, so they are given by the elements of the (k + 1) × 1 column matrix

    B = (X′X)⁻¹X′Y

The maximum likelihood estimate of σ is given by

    σ̂ = √[(1/n)·Σ[y_i − (β̂₀ + β̂₁x_i1 + β̂₂x_i2 + ··· + β̂_k x_ik)]²]

where the β̂'s are the maximum likelihood estimates of the β's; as the reader will be asked to verify in Exercise 35, this estimator can also be written as

    σ̂ = √[(Y′Y − B′X′Y)/n]

in matrix notation.

EXAMPLE 12
Use the results of Example 11 to determine the value of σ̂ for the data of Example 9.

Solution
First let us calculate Y′Y, which is simply Σy_i², obtaining

    Y′Y = (292,000)² + (264,600)² + ··· + (307,500)² = 699,123,160,000

Then, copying B and X′Y from Example 11, we get

    B′X′Y = (1/84)·(18,894,000  1,286,400  920,400)·(2,357,600; 7,558,200; 4,835,600)′
          = 699,024,394,286

It follows that

    σ̂ = √[(699,123,160,000 − 699,024,394,286)/8] = 3,514

rounded to the nearest integer. It is of interest to note that the estimate that we obtained here does not equal the one shown in the computer printout of Figure 6. The estimate shown there, S = 4,444.45, is such that S² is an unbiased estimate of σ²; analogous to the standard error of estimate that we defined earlier, it differs from σ̂ in that we divide by n − k − 1 instead of n. If we had done so in our example, we would have obtained

    s_e = √[(699,123,160,000 − 699,024,394,286)/(8 − 2 − 1)] = 4,444
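The matrix formula of Theorem 7, together with the two estimates of σ just discussed, can be checked numerically. This sketch is not part of the original text; it assumes NumPy is available and uses the data of Example 9.

    import numpy as np

    x1 = np.array([3, 2, 4, 2, 3, 2, 5, 4], dtype=float)   # bedrooms
    x2 = np.array([2, 1, 3, 1, 2, 2, 3, 2], dtype=float)   # baths
    y = np.array([292000, 264600, 317500, 265500,
                  302000, 275500, 333000, 307500], dtype=float)

    X = np.column_stack([np.ones(8), x1, x2])
    B = np.linalg.inv(X.T @ X) @ (X.T @ y)        # Theorem 7; about (224929, 15314, 10957)

    resid = y - X @ B
    sigma_hat = np.sqrt(np.sum(resid ** 2) / 8)            # ML estimate, about 3,514
    s_e = np.sqrt(np.sum(resid ** 2) / (8 - 2 - 1))        # unbiased version, about 4,444
    print(B, sigma_hat, s_e)

In larger problems one would solve the linear system rather than form the inverse explicitly, but the explicit inverse mirrors the hand calculation of Example 11.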
Proceeding as in Section 4, we could investigate the sampling distributions of the estimators of the multiple regression coefficients. Leaving the details to the reader, let us merely point out that arguments similar to those in Section 4 lead to the results that the B̂_i are linear combinations of the n independent random variables Y_i, so that the B̂_i themselves have normal distributions. Furthermore, they are unbiased estimators, that is,

    E(B̂_i) = β_i  for i = 0, 1, ..., k

and their variances are given by

    var(B̂_i) = c_ii·σ²  for i = 0, 1, ..., k

Here c_ij is the element in the ith row and the jth column of the matrix (X′X)⁻¹, with i and j taking on the values 0, 1, ..., k. Let us also state the result that, analogous to Theorem 3, the sampling distribution of nΣ̂²/σ² is the chi-square distribution with n − k − 1 degrees of freedom and that nΣ̂²/σ² and B̂_i are independent for i = 0, 1, ..., k. Combining all these results, we find that the definition of the t distribution leads to the following theorem.

THEOREM 8. Under the assumptions of normal multiple regression analysis,

    t = (β̂_i − β_i) / (σ̂·√[n·c_ii/(n − k − 1)])  for i = 0, 1, ..., k

are values of random variables having the t distribution with n − k − 1 degrees of freedom.

Based on this theorem, let us now test a hypothesis about one of the multiple regression coefficients.

EXAMPLE 13
With reference to Example 9, test the null hypothesis β₁ = $9,500 against the alternative hypothesis β₁ > $9,500 at the 0.05 level of significance.

Solution
1. H₀: β₁ = 9,500; H₁: β₁ > 9,500; α = 0.05.
2. Reject the null hypothesis if t ≥ 2.015, where t is determined in accordance with Theorem 8 and 2.015 is the value of t₀.₀₅,₅ obtained from the table of Values of t_{α,ν} of "Statistical Tables."
3. Substituting n = 8, k = 2, β̂₁ = 15,314, c₁₁ = 32/84 from Example 11, and σ̂ = 3,514 from Example 12 into the formula for t, we get
x0k and X0 as defined in Exercise 39 (a) E(B̂i ) = βi for i = 0. . 39..) Analogous to Theorem 5. . 420 . . . . It is tempting to conclude that the coefficients in this.64 240 0.884 0.7 259 0.71 265 0.71 228 0.68 217 0.119 9 6. Suppose 5 major variables involved in the machine setup are measured for each run. let us consider the following example.870 0.105 14 6.62 269 0.053 8 6.4 254 0.77 229 0.64 266 0.7 238 0. x1 x2 x3 x4 x5 y 1 6.8 258 0.306 19 6. with the following results: Solder Flux Preheat Faults per Conveyor tempe.872 0.100 13 5.862 0. Specifically. and all solder joints are made.73 246 0. In this section.877 0. an entire circuit board is run through the wave-soldering machine.201 2 5. (Each board contains 460 solder joints.70 267 0.860 0.76 238 0.075 6 5.286 18 6.77 270 0.7 264 0.869 0.881 0. Conveyor tempe. and ways to deal with them.69 250 0. Regression and Correlation 8 The Theory in Practice Multiple linear regression is used (and misused) widely in applications. and the number of defective solder joints per 100 joints inspected is recorded.7 255 0.8 262 0. and y in C6.4 260 0.860 0. A total of 25 separate runs of 5 boards each are made. the “regress” command produces the results shown in Figure 7. we shall examine the problem of multicollinearity.863 0.5 258 0.870 0.76 234 0.888 0. In wave soldering of circuit boards.69 249 0.3 256 0.102 Using MINITAB software to perform a linear multiple regression analysis.65 250 0.3 264 0. In addition. concen.008 25 5.868 0.6 250 0.76 233 0. . .196 15 5.126 16 6.876 0.8 254 0.287 23 5.7 260 0.4 259 0. 100 solder Run angle rature tration speed rature joints No. .854 0.) The soldered boards are subjected to visual and electrical inspection.853 0. To begin. Then.3 260 0.855 0. .872 0.5 250 0.8 244 0. x5 in C5.879 0. multiple- regression analysis represent the “effects” of the corresponding predictor variables 421 . x2 in C2.132 7 5.239 4 6.053 3 6.81 246 0.403 20 6.1 241 0.092 24 5.891 0.4 239 0.1 260 0.6 249 0.214 22 5.876 0. we shall introduce methods for examin- ing the residuals in a multiple regression analysis to check on the assumption of normality and other characteristics of the data.888 0. we set the values of x1 in column C1.242 5 5.162 21 6.64 276 0.216 17 6.369 12 5.75 230 0.79 252 0.81 262 0.80 261 0. or any other.80 245 0.68 251 0.854 0.171 11 6.891 0.74 245 0. we shall discuss some of the pitfalls presented by indiscriminate use of multiple regression analysis. in the same order as the run numbers shown in the data table.172 10 6.8 247 0.875 0.1 256 0.2 241 0. the number of faults per 100 solder joints.281 . and x5 only.207 (Only a portion of the full matrix is shown here. and the correlation of any column with itself equals 1. is the estimated effect Y of increasing x1 by 1 unit. it is not possible to separate their effects on the dependent variable. Thus.030 . the random error is relatively easily quantified.00617x2 + 1.214 when x1 . the following correlation matrix of pairwise correlation coefficients has been computed for the wave-solder data by giving the MINITAB command CORRELATE C1-C5: C1 C2 C3 C4 C2 -. the correlation of C1 with C2 equals the correlation of C2 with C1. Regression and Correlation Figure 7. A much more serious source of error in interpreting the coefficients of a multiple regression equation arises from multicollinearity among the independent variables in the multiple regression equation. will increase by 0. is increased by 1 unit. The resulting multiple regression equation is ŷ = 0.039 . 
Computer printout for the example above. it appears that the coefficient of x1 . but it often plays only a small role relative to other sources of error.) It can be seen that several of the data columns involving independent variables show evidence of multicollinearity.251 .23 − 0.214.402 . on the dependent variable. x4 . But it probably is not true that Y. a confidence interval can be found for such a coefficient when it can be assumed that the residuals are approximately normally distributed. since the matrix is symmetrical. For example.215 C5 . for example. having the value 0. x3 .117 -.150x4 + 0. Any estimate of a coefficient in a regression analysis is subject to random error.328 C3 -.174 C4 -. the multiple regression equation previously obtained when all five independent variables were used was 422 . by omitting x1 from the regression equation. Using Theorem 8. In such cases we say that the effects of the independent variables are confounded with each other. that is.18x3 − 0. When at least some of the independent variables are highly correlated with each other. There are several reasons for making this statement.00238x5 By comparison. the conveyor angle. The effect of multicollinearity in this example can be observed directly by performing a multiple linear-regression analysis of y on x2 . To investigate the degree of correlation among the independent variables. 122x4 + 0. the coefficient of x2 . such as x2 .00617. (There appears to be one “outlying” observation. PUT RESIDUALS IN C7. or by the variables included in the equation. To illustrate these methods for checking residuals.214x1 − 0. are examples of such effects. However. the residuals were computed for the wave-solder regression analysis. the residuals will show a curvilinear trend.) A time trend in the residuals may suggest that these (and possibly other) variables should be controlled or their values measured and included in the regression equation when performing further research. a plot of the residuals against integers reflecting the order of taking the observations (or “run number”) or the time each observation was taken also should show a random pattern. When normal multiple regression analysis is to be used. and the coefficient of x4 actually changes sign. or at least minimized.0096x2 + 0. run 22.90x3 + 0. The normal-scores plot of the raw residuals is shown in Figure 8. gross departures will inval- idate the significance tests associated with the regression. even when standardization is employed. Regression and Correlation ŷ = −1. there is a risk of introducing further multicollinear- ity. The quantity ei = yi − ŷi is called the ith residual in the multiple regression. this method often cre- ates highly correlated independent variables. thereby making the problems of multicollinearity even worse.79 + 0. not included in the regression analysis. a plot of the residuals against the predicted values of y can reveal errors in the assumptions leading to the form of the fitted equation. For example. the regression equation remains useful for estimating values of the coefficients and for obtaining ŷ.) In addition. such as temperature and humidity.) The use of large multiple regression equations. x3 . x3 .000169x5 It is readily seen that the coefficients of x2 . If significance 423 . by standardizing the variables used in the regression analysis. (However. Often in practice. Standardized residuals can be found directly with MINITAB software by giving the command REGRESS C6 ON 5 PREDICTORS C1-C5. 
if a linear equation is fitted to data that are highly nonlinear. This difficulty may be avoided. and estimates of the coefficients of the independent variables will be relatively meaningless. excessive errors in prediction will result. x1 x2 . are intro- duced into a multiple regression equation to fit curved surfaces to data. x4 . If the chosen equation adequately describes the data. A trend in such a plot can be caused by the presence of one or more variables. and x5 have changed by more than trivial amounts when the independent variable x1 has been omitted from the analysis. containing many variables in both linear and nonlinear forms. nonlinear terms. however. and dividing the result by its standard deviation. On the other hand.0096 when x1 was included in the regression equation. for example. an increase of 36%. An analysis of the residuals is useful in checking if the data are ade- quately described by the form of the fitted equation. becomes−0. such as between x and x2 . When non- linear terms are added. such a plot will show a “random” pattern without trend or obvious curvilinearity. which was −0. (Ambient variables. in this case. consists of subtracting the mean of each variable from each value of that variable. whose values have a measurable influence on the value of Y over the time period of the experiment. without trends. the residuals should be examined carefully. (Standard- ization. While the t-tests associated with regression anal- ysis are not highly sensitive to departures from normality. The graph shows reasonable agreement with the assumption that the residuals are normally distributed. and so forth. Finally. A normal-scores plot is used to check the assumption that the residuals are approximately normally distributed. when x1 is not included. can produce an equation with better predictive power than one containing only a few linear terms. When the data depart seriously enough from the assumed relationship. a predicted value of y. 10 .07 . it is recommended that the outlying observation be discarded and that the regression be rerun. This graph.20 .00 2 2 2 ⫺.24 .10 Residuals . Thus. .50 2 2 ⫺2. These residuals are plotted against the run numbers in Figure 10.32 . Regression and Correlation 2.08 .00 Run number Figure 10.40 Predicted values of y Figure 9.00 12.00 18. .00 24.16 . 424 .14 .10 Residuals .00 ⫺.00 . tests are to be performed for the coefficients of the regression. likewise.00 ⫺.21 .00 .00 2 ⫺.50 Normal scores 1.07 . Plot of residuals against run numbers. this graph shows a random pattern with no obvious trends or curvilinearity.10 .) A plot of the residuals against ŷ is shown in Figure 9. Plot of residuals against ŷ.28 Figure 8. it appears that the linear multiple regression equation was adequate to describe the relationship between the dependent variable and the five independent variables over the range of observations.20 . with no linear or curvilinear trends.00 30. Normal-scores plot of wave-solder regression residuals. It appears that no time-dependent extraneous variable has materially affected the value of y during the experiment.00 6. shows a random pattern. Ignoring the outlier. 425 . The following are the scores that 12 students obtained a swimming pool at various times after it has been treated on the midterm and final examinations in a course in with chemicals: statistics: Midterm examination Final examination Number of Chlorine residual x y hours (parts per million) 71 83 2 1. the moisture content when the relative humidity is 38 percent. 
Regression and Correlation Applied Exercises SECS. Various doses of a poisonous substance were given 37 11 to groups of 25 mice and the following results were 42 13 observed: 34 10 Dose (mg) Number of deaths 29 8 x y 60 17 44 12 4 1 41 10 6 3 48 15 8 6 33 9 10 8 40 13 12 14 14 16 16 20 (a) Fit a least squares line that will enable us to predict the moisture content in terms of the relative humidity.4 91.00 2.4 73 77 8 1.1 85 74 12 0. 46 12 53 14 42.1 93 89 10 1. Use the coding of Exercise 15 to rework both parts of examination. 1–3 41. (a) Find the equation of the least squares line fit to (b) Use the result of part (a) to estimate (predict) these data. Exercise 42.10 1. The following data give the diffusion time (hours) of (b) Predict the final examination score of a student who a silicon wafer used in manufacturing integrated circuits received an 84 on the midterm examination. enable us to predict a student’s final examination score in this course on the basis of his or her score on the midterm 46.8 49 62 4 1. Raw material used in the production of a synthetic Diffusion time.2 92.56 1.9 58 48 82 78 (a) Fit a least squares line from which we can predict the 64 76 chlorine residual in terms of the number of hours since 32 51 the pool has been treated with chemicals.6 content of samples of the raw material (both in percent- ages) on 12 days yielded the following results: (a) Find the equation of the least squares line fit to these data. y 83.5 80 76 6 1.45 fiber is stored in a place that has no humidity control.3 hours.58 2. (b) Estimate the number of deaths in a group of 25 mice that receive a 7-milligram dose of this poison. Humidity Moisture content (b) Predict the sheet resistance when the diffusion time is 1. x 0.7 90. 87 73 80 89 (b) Use the equation of the least squares line to estimate the chlorine residual in the pool five hours after it has (a) Find the equation of the least squares line that will been treated with chemicals. The following data pertain to the chlorine residual in 43. and the resulting sheet resistance of transfer: 44.0 90. 45. Measurements of the relative humidity and the moisture Sheet resistance. (a) Use this technique to fit a power function of the form 57. (expressed as a percentage of total expenses) and the ducing certain electronic components and the number of net operating profits (expressed as a percentage of total units produced: sales) in a random sample of six drugstores: Lot size Unit cost Advertising Net operating x y expenses profits 50 $108 1.5 3. 49.0 157.0 4.50 at the 0.000 $5 1. 2. of least squares. If a set of paired data gives the indication that the regression equation is of the form μY|x = α · xβ . 2.30 at the 0. 3.3 6 9.0 2.3 (b) Use the result of part (a) to estimate the unit cost for (a) Fit a least squares line that will enable us to predict a lot of 300 components. Regression and Correlation 47.2 x y 188.8 250 $24 2. With reference to Exercise 43.6. Use the coding of Exercise 15 to fit a least squares line and. 50. construct a 98% confi- tomary to estimate α and β by fitting the line dence interval for the regression coefficient β. net operating profits in terms of advertising expenses.8 5.01 level of significance.4 the selling price of a house in that metropolitan area in 4 5. . . 51. i = 1.5.1 terms of its assessed value.0 (a) Fit a least squares line that will enable us to predict 2 2. 
The following table shows the assessed values and the regression equation is of the form μY|x = α · β x .0 269.1.350 against the alternative hypothesis β < 0.9 199. During its first five years of operation. With reference to Exercise 45.4 274.5 1 2. and 3.8 Weeks after Height 181.05 level of significance.7 the 0. .01 level of signif- to the points {(log xi . 56. 426 . . . a company’s sis β = 1. With reference to Exercise 44. construct a 99% confi- dence interval for the regression coefficient β. log yi ). predict the company’s gross income from sales β = 0. million dollars. Use this technique to fit an exponential 170.3 tive hypothesis β > 1. test the null hypothe- 48.5 206.4 curve of the form ŷ = α̂ · β̂ x to the following data on the 202. 55. selling prices of eight houses.25 against the alternative hypothesis β > 1. constituting a random sam- tomary to estimate α and β by fitting the line ple of all the houses sold recently in a metropolitan area: Assessed value Selling price log ŷ = log α̂ + x · log β̂ (thousands) (thousands) (of dollars) (of dollars) to the points {(xi . use the theory of Exer- log ŷ = log α̂ + β̂ · log x cise 22 to test the null hypothesis α = 21.3 2.0 243. If a set of paired data gives the indication that the 53.2 conditions: 174.350 at during its sixth year of operation. it is cus. the 0. 54.9 1.30 against the alterna- 8 18.4 500 $9 0.9 2. assuming that the same linear trend 52.3 growth of cactus grafts under controlled environmental 162. .50 against the alternative hypothesis α Z 21.25 at gross income from sales was 1. n} by the method icance.1 grafting (inches) 210. log yi ).8 225. i = 1. 5 7. The following data show the advertising expenses ŷ = α̂ · xβ̂ to the following data on the unit cost of pro. . With reference to Exercise 42.6 232. 2.4. Use the coding of Exercise 15 to rework both parts of SEC. test the null hypothesis continues.6 100 $53 1.3 214. 2. it is cus. . With reference to Example 4.4 1.05 level of significance. 4 Exercise 45.4 (b) Test the null hypothesis β = 1. n} by the method of least squares. that 20 students of prediction. dent who takes the test several times will consistently (b) 99% limits of prediction of the number of deaths in a get high (or low) scores. yield of wheat (in bushels per acre): 427 . x y cise 22 to construct a 95% confidence interval for α.9 65. 25 1.58 59. and observe the correlation between the scores that Note the greatly increased width of the confidence lim.2 Calculate r for these data and test its significance.01 against the alternative that 61. and y. Redo Exercise 61 when the dosage is 20 milligrams. x and y. who has studied 14 hours for the test. An achievement test is said to be reliable if a stu- 9 milligrams. Regression and Correlation (b) Test the null hypothesis α = 0. 5 (a) a 99% confidence interval for the expected number of deaths in a group of 25 mice when the dosage is 65.8 66.6 99.01 level of significance.41 40 1. Use the theory of Exercises 24 and 26. obtained for the even-numbered problems and the odd- estimating a value of Y for observations outside the range numbered problems of a new objective test designed to of the data. usually the even-numbered problems and the odd-numbered prob- 62.5 7.05 level of significance.1 39.0 27 38 39 43 2.65 (a) a 95% confidence interval for the mean test score of 60 1.6 36.2 34. that is. test eighth grade achievement in general science: x y x y 63. (a) Use appropriate computer software to fit a straight line to these data. 
58. With reference to Exercise 42, use the theory of Exercise 22 to construct a 95% confidence interval for α; with reference to Exercise 43, use the same theory to construct a 99% confidence interval for α.

59. Use the theory of Exercises 24 and 26, as well as the quantities already calculated in Examples 4 and 5, to find
(a) a 95% confidence interval for the mean test score of persons who have studied 14 hours for the test;
(b) 95% limits of prediction for the test score of a person who has studied 14 hours for the test.

60. Use the theory of Exercises 24 and 26, as well as the quantities already calculated in Exercise 53 for the data of Exercise 43, to find
(a) a 99% confidence interval for the expected number of deaths in a group of 25 mice when the dosage is 9 milligrams;
(b) 99% limits of prediction of the number of deaths in a group of 25 mice when the dosage is 9 milligrams.

61. Redo Exercise 60 when the dosage is 20 milligrams. Note the greatly increased width of the confidence limits for the expected number of deaths and of the limits of prediction. This example illustrates that extrapolation, that is, estimating a value of Y for observations outside the range of the data, usually results in a highly inaccurate estimate.

62. The following table shows the elongation (in thousandths of an inch) of steel rods of nominally the same composition and diameter when subjected to various tensile forces (in thousands of pounds):

    Force x        1.2   2.3   3.1   4.2   5.3   6.4   7.3   8.2
    Elongation y  15.6  35.8  44.1  58.3  80.8  88.5  99.3  111.3

(The pairings above are reproduced from the source as closely as its layout allows.)
(a) Use an appropriate computer program to fit a straight line to these data.
(b) Construct 99% confidence limits for the slope of the fitted line.

63. The following are loads (grams) put on the centers of plastic-like rods with the resulting deflections (cm):

    Load x        25    30    35    40    45    50    55    60
    Deflection y  1.58  1.39  1.41  1.60  1.65  1.78  1.94  1.81

(a) Use an appropriate computer program to fit a straight line to these data.
(b) Test the null hypothesis that β = 0.01 against the alternative that β > 0.01 at the 0.10 level.

SEC. 5

64. An achievement test is said to be reliable if a student who takes the test several times will consistently get high (or low) scores. One way of checking the reliability of a test is to divide it into two parts, usually the even-numbered problems and the odd-numbered problems, and observe the correlation between the scores that students get in both halves of the test. Thus, the following data represent the grades, x and y, that 20 students obtained for the even-numbered problems and the odd-numbered problems of a new objective test designed to test eighth grade achievement in general science:

    x   y     x   y
    27  35   33  42
    33  34   30  27
    41  33   34  32
    32  37   36  44
    38  29   39  31
    37  38   44  40
    44  49   33  35
    38  38   27  38
    32  27   39  43
    24  22   27  29

Calculate r for these data and test its significance.

65. With reference to Exercise 62, calculate r and test the null hypothesis ρ = 0 against the alternative hypothesis ρ ≠ 0 at the 0.05 level of significance.

66. With reference to Exercise 65, use the formula obtained in Exercise 31 to construct a 95% confidence interval for ρ.
67. The following are 33 pairs of observed values of two variables x and y:

    x    y     x    y     x    y
    112  33    88   24    37   27
    92   28    44   17    23   9
    72   38    132  36    77   32
    66   17    23   14    142  38
    112  35    57   25    37   13
    88   31    111  40    127  23
    42   8     69   29    88   31
    126  37    19   12    48   37
    72   32    103  27    61   25
    52   20    141  40    71   14
    28   17    77   26    113  26

Draw a scattergram of these paired data and judge whether the assumption of linearity seems reasonable; then calculate r for these data and test its significance at the 0.01 level of significance.

With reference to Exercise 67, use the formula obtained in Exercise 31 to construct a 99% confidence interval for ρ.

70. The calculation of r can often be simplified by adding the same constant to each x, adding the same constant to each y, or multiplying each x and/or y by the same positive constants. Show that this kind of coding will not affect the value of r.

Recalculate r for the data of Example 7 by first multiplying each x and each y by 10 and then subtracting 70 from each x and 60 from each y. (It follows from Exercise 70 that this kind of coding will not affect the value of r.)

71. The table at the bottom of the page shows how the history and economics scores of 25 students are distributed:

                              History scores
    Economics     21-25   26-30   31-35   36-40   41-45
    scores
    21-25           1
    26-30           3       1
    31-35                   2       5       2
    36-40                           1       4       1
    41-45                                   1       3
    46-50                                           1

Assuming that these data can be looked upon as a random sample from a bivariate normal population, use the method of Exercise 32 to determine the value of r, and use this value of r to test at the 0.05 level of significance whether there is a relationship between the scores in the two subjects.

72. Rework Exercise 71, replacing the row headings by the corresponding class marks (midpoints) 23, 28, 33, 38, 43, and 48 and the column headings by the corresponding class marks 23, 28, 33, 38, and 43.

73. Rework Exercise 71, coding the class marks of the history scores −2, −1, 0, 1, and 2 and the class marks of the economics scores −2, −1, 0, 1, 2, and 3. (It follows from Exercise 70 that this kind of coding will not affect the value of r.)

(a) Use an appropriate computer program to obtain the sample correlation coefficient for the data of Exercise 63.
(b) Test whether this coefficient is significant using the 0.05 level.

(a) Use an appropriate computer program to obtain the sample correlation coefficient for the data of Exercise 64.
(b) Test whether r is significantly different from 0 using the 0.05 level.

Use the formula of Exercise 29 to calculate a 95% confidence interval for β for the numbers of hours studied and the test scores in the table in Section 3, and compare this interval with the one obtained in Example 6.

This question has been intentionally omitted for this edition.

77. The following are sample data provided by a moving company on the weights of six shipments (in 1,000 pounds), the distances they were moved (in 1,000 miles), and the damage that was incurred (in dollars):

    Weight   Distance   Damage
    x1       x2         y
    4.0      1.5        160
    3.0      2.2        112
    1.6      1.0        69
    1.2      2.0        90
    3.4      0.8        123
    4.8      1.6        186

(a) Assuming that the regression is linear, estimate β0, β1, and β2.
(b) Use the results of part (a) to estimate the damage when a shipment weighing 2,400 pounds is moved 1,200 miles.
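A sketch of the least-squares calculation asked for in Exercise 77 follows. The data array is our reading of the scrambled table above and should be checked against the original before being relied on.

```python
import numpy as np

# Least-squares plane y-hat = b0 + b1*x1 + b2*x2 for the shipment data
# (x1 = weight, x2 = distance, y = damage; figures as reconstructed above).
x1 = np.array([4.0, 3.0, 1.6, 1.2, 3.4, 4.8])
x2 = np.array([1.5, 2.2, 1.0, 2.0, 0.8, 1.6])
y = np.array([160.0, 112.0, 69.0, 90.0, 123.0, 186.0])

X = np.column_stack([np.ones_like(x1), x1, x2])   # design matrix [1, x1, x2]
b, *_ = np.linalg.lstsq(X, y, rcond=None)
print(b)                                  # b0-hat, b1-hat, b2-hat
print(b[0] + b[1] * 2.4 + b[2] * 1.2)     # estimated damage at 2,400 lb, 1,200 mi
```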
78. The following are data on the average weekly net profits (in $1,000) of five restaurants, their seating capacities, and the average daily traffic (in thousands of cars) that passes their locations:

    Seating    Traffic    Weekly net
    capacity   count      profit
    x1         x2         y
    120        19         23.8
    200         8         24. . . .
    150        12         22. . . .
    180        15         26. . . .
    240        16         33. . . .

(a) Assuming that the regression is linear, estimate β0, β1, and β2.
(b) Use the results of part (a) to predict the average weekly net profit of a restaurant with a seating capacity of 210 at a location where the daily traffic count averages 14,000 cars.

79. The following data consist of the scores that 10 students obtained in an examination, their I.Q.'s, and the numbers of hours they spent studying for the examination:

    I.Q.   Number of hours   Score
    x1     studied, x2       y
    112     5                79
    126    13                97
    100     3                51
    114     7                65
    112    11                82
    121     9                93
    110     8                81
    103     4                38
    111     6                60
    124     2                86

(a) Assuming that the regression is linear, estimate β0, β1, and β2.
(b) Predict the score of a student with an I.Q. of 108 who studied 6 hours for the examination.

80. The following data were collected to determine the relationship between two processing variables and the hardness of a certain kind of steel:

    Hardness          Copper content   Annealing temperature
    (Rockwell 30-T)   (percent)        (degrees F)
    y                 x1               x2
    . . .             0.02             1,000
    . . .             0.02             1,200
    . . .             0.10             1,000
    . . .             0.10             1,200
    . . .             0.18             1,000
    . . .             0.18             1,200

Fit a plane by the method of least squares, and use it to estimate the average hardness of this kind of steel when the copper content is 0.14 percent and the annealing temperature is 1,100 degrees Fahrenheit.

81. Rework Exercise 80, coding the x1-values −1, 0, and 1 and the x2-values −1 and 1. (Note that for the coded x1's and x2's, call them z1's and z2's, we have not only Σz1 = 0 and Σz2 = 0 but also Σz1z2 = 0.)

82. The following are data on the percent effectiveness of a pain reliever and the amounts of three different medications (in milligrams) present in each capsule:

    Medication A   Medication B   Medication C   Percent
    x1             x2             x3             effective, y
    15             20             10             47
    15             20             20             54
    15             30             10             58
    15             30             20             66
    30             20             10             59
    30             20             20             67
    30             30             10             71
    30             30             20             83
    45             20             10             72
    45             20             20             82
    45             30             10             85
    45             30             20             94

Assuming that the regression is linear, estimate the regression coefficients after suitably coding each of the x's, and express the estimated regression equation in terms of the original variables.

The regression models that we introduced in Sections 2 and 6 are linear in the x's, but, more important, they are also linear in the β's. Indeed, they can be used in some problems where the relationship between the x's and y is not linear. For instance, when the regression is parabolic and of the form

    μY|x = β0 + β1x + β2x²

we simply use the regression equation μY|x = β0 + β1x1 + β2x2 with x1 = x and x2 = x². Also, when the x1's, x2's, . . . , and/or the xk's are equally spaced, the calculation of the β̂'s can be simplified by using the coding suggested in Exercise 15.

83. Use this method to fit a parabola to the following data on the drying time of a varnish and the amount of a certain chemical that has been added:

    Amount of additive   Drying time
    (grams)              (hours)
    x                    y
    1                    8.5
    2                    8.0
    3                    6.0
    4                    5.0
    5                    6.0
    6                    5.5
    7                    6.5
    8                    7.0
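The substitution x1 = x, x2 = x² described above reduces the parabola to a linear model. The sketch below applies it to the drying-time data as read here; the y values are our reconstruction of the scrambled table and should be checked against the original.

```python
import numpy as np

# Fit mu(Y|x) = b0 + b1*x + b2*x^2 as a linear model with x1 = x, x2 = x^2.
x = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)      # grams of additive
y = np.array([8.5, 8.0, 6.0, 5.0, 6.0, 5.5, 6.5, 7.0])   # drying times (hours)

X = np.column_stack([np.ones_like(x), x, x ** 2])        # design matrix [1, x, x^2]
b, *_ = np.linalg.lstsq(X, y, rcond=None)                # least-squares coefficients
print(f"y-hat = {b[0]:.3f} + {b[1]:.3f} x + {b[2]:.3f} x^2")
```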
84. The following data pertain to the demand for a product (in thousands of units) and its price (in cents) charged in five different market areas:

    Price   Demand
    x       y
    20      22
    16      41
    10      120
    11      89
    14      56

Fit a parabola to these data by the method suggested in Exercise 83.

85. To judge whether it was worthwhile to fit a parabola in Exercise 84 and not just a straight line, test the null hypothesis β2 = 0 against the alternative hypothesis β2 ≠ 0 at the 0.05 level of significance.

With reference to Exercise 83, predict the drying time when 6.5 grams of the chemical is added.

With reference to Exercise 77, test the null hypothesis β1 = 0.12 against the alternative hypothesis β1 < 0.12 at the 0.05 level of significance.

With reference to Exercise 78, test the null hypothesis β2 = 10.0 against the alternative hypothesis β2 ≠ 10.0 at the 0.05 level of significance.

With reference to Exercise 77, construct a 98% confidence interval for the regression coefficient β1.

With reference to Exercise 78, construct a 95% confidence interval for the regression coefficient β2.

With reference to Exercise 77, use the result of part (b) of Exercise 39 to construct a 98% confidence interval for the mean damage of 2,400-pound shipments that are moved 1,200 miles.

With reference to Exercise 77, use the result of part (b) of Exercise 40 to construct 95% limits of prediction for the damage that will be incurred by a 2,400-pound shipment that is moved 1,200 miles.

With reference to Exercise 78, use the result of part (b) of Exercise 39 to construct a 99% confidence interval for the mean weekly net profit of restaurants with a seating capacity of 210 at a location where the daily traffic count averages 14,000 cars.

With reference to Exercise 78, use the result of part (b) of Exercise 40 to construct 98% limits of prediction for the average weekly net profit of a restaurant with a seating capacity of 210 at a location where the daily traffic count averages 14,000 cars.

Use the results obtained for the data of Example 9 to construct a 90% confidence interval for the regression coefficient β2 (see Exercise 38).

Use the results obtained for the data of Example 9 and the result of part (b) of Exercise 39 to construct a 95% confidence interval for the mean sales price of a three-bedroom house with two baths in the given housing development.

Use the results obtained for the data of Example 9 and the result of part (b) of Exercise 40 to construct 99% limits of prediction for the sales price of a three-bedroom house with two baths in the given housing development.

98. (a) Use an appropriate computer program to fit a plane relating the monthly water usage of a production plant (in gallons) to its monthly production (in tons), the mean monthly ambient temperature (in degrees Fahrenheit), and the number of days of plant operation during the month, using the twelve months of data given.
(b) Estimate the water usage of the plant during a month when its production is 90.0 tons, the mean ambient temperature is 65 degrees Fahrenheit, and it operates for 20 days.
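The limits-of-prediction exercises above all evaluate the same expression. A generic sketch for the simple-regression case follows, assuming scipy is available; the function name and interface are ours, not the text's.

```python
import numpy as np
from scipy import stats

def prediction_limits(x, y, x0, conf=0.95):
    """Limits of prediction for a new observation at x0 under the
    simple linear regression model (a minimal sketch)."""
    n = len(x)
    b1, b0 = np.polyfit(x, y, 1)
    resid = y - (b0 + b1 * x)
    se = np.sqrt(np.sum(resid ** 2) / (n - 2))     # residual standard error
    sxx = np.sum((x - x.mean()) ** 2)
    half = stats.t.ppf(1 - (1 - conf) / 2, n - 2) * se * np.sqrt(
        1 + 1 / n + (x0 - x.mean()) ** 2 / sxx)
    y0 = b0 + b1 * x0
    return y0 - half, y0 + half
```

Confidence limits for the mean response are obtained the same way, with the leading 1 dropped from the square root.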
99. (a) Use an appropriate computer program to fit a linear surface to the given data on a dependent variable y and independent variables x1, x2, and x3.
(b) How good a fit is obtained?
(c) Plot the residuals against ŷ and determine whether the pattern is random.
(d) Check for multicollinearity among the independent variables.

100. The following data represent more extended measurements of monthly water usage at the plant referred to in Exercise 98, over a period of 20 months:

    Water usage   Production   Mean temperature   Days of operation
    y             x1           x2                 x3
    2,228          97           68                 19
    2,609         108           70                 20
    2,717          70           64                 21
    2,723         144           58                 19
    2,088         109           82                 21
    2,980          84           64                 19
    1,902          97           36                 17
    2,031          84           58                 20
    2,254          82           44                 18
    1,522          64           44                 19
    2,378         101           69                 21
    2,721         132           49                 23
    2,460         111           62                 21
    2,012          91           61                 20
    2,357          96           85                 19
    2,436         126           59                 21
    2,559         113           66                 19
    2,251          98           56                 22
    2,147          85           54                 18
    2,117         107           51                 22

(a) Use an appropriate computer program to fit a linear surface to these data.
(b) How good a fit is obtained?
(c) Plot the residuals against ŷ and determine whether the pattern is random.
(d) Check for excessive multicollinearity among the independent variables.
(e) Use a computer program to make a normal-scores plot of the residuals. Does the assumption of normality appear to be satisfied at least approximately?

101. Using the data of Exercise 99,
(a) create a new variable that is the product x1x2;
(b) fit a surface of the form

    y = b0 + b1x1 + b2x2 + b3x3 + b4x1x2

and compare the goodness of fit of this surface to that of the linear surface fitted in Exercise 99;
(c) find the correlation matrix of the four independent variables x1, x2, x3, and x1x2. Is there evidence of multicollinearity?
(d) standardize each of the independent variables, and create a new variable that is the product of the standardized values of x1 and x2;
(e) fit a surface of the same form as in part (b) to the standardized variables;
(f) find the correlation matrix of the four standardized independent variables and compare with the results of part (c).

102. Using the data of Exercise 100,
(a) create a new variable that is the square of x2;
(b) fit a surface of the form

    y = b0 + b1x1 + b2x2 + b3x2²

and compare the goodness of fit of this surface to that of the linear surface fitted in Exercise 100;
(c) find the correlation matrix of the three independent variables x1, x2, and x2². Is there evidence of multicollinearity?
(d) standardize each of the independent variables, and create a new variable that is the square of the standardized value of x2;
(e) fit a curved surface of the same form to the standardized variables and compare the goodness of fit to that obtained in part (b);
(f) plot the residuals of this regression analysis against the values of ŷ and compare this plot to the one obtained in Exercise 100.
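A sketch of the standardization device used in the two preceding exercises follows; the predictor values are randomly generated stand-ins, chosen only to show how the near-collinearity of x2 and x2² is reduced.

```python
import numpy as np

# Hypothetical predictor columns; x2sq plays the role of the x2^2 term.
rng = np.random.default_rng(0)
x1 = rng.uniform(60, 140, 20)
x2 = rng.uniform(35, 85, 20)
x2sq = x2 ** 2

print(np.corrcoef([x1, x2, x2sq]))        # raw matrix: x2 and x2^2 highly correlated

z2 = (x2 - x2.mean()) / x2.std(ddof=1)    # standardize x2 ...
z2sq = z2 ** 2                            # ... then square the standardized values
print(np.corrcoef([x1, z2, z2sq]))        # correlation of z2 with z2^2 is much reduced
```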
References

A proof of Theorem 3 and other mathematical details left out in the chapter may be found in Wilks, S. S., Mathematical Statistics. New York: John Wiley & Sons, Inc., 1962, and information about the distribution of (1/2) · ln[(1 + R)/(1 − R)] may be found in Kendall, M. G., and Stuart, A., The Advanced Theory of Statistics, 4th ed. New York: Macmillan Publishing Co., Inc., 1977.

A derivation of the maximum likelihood estimates of σ1, σ2, and ρ is given in the third edition (but not in the fourth edition) of Hoel, P. G., Introduction to Mathematical Statistics, 3rd ed. New York: John Wiley & Sons, Inc., 1962.

More detailed treatments of multiple regression may be found in numerous more advanced books, for instance, in Morrison, D. F., Applied Linear Statistical Methods. Upper Saddle River, N.J.: Prentice Hall, 1983; Weisberg, S., Applied Linear Regression, 2nd ed. New York: John Wiley & Sons, Inc., 1985; and Wonnacott, T. H., and Wonnacott, R. J., Regression: A Second Course in Statistics. New York: John Wiley & Sons, Inc., 1981.

Answers to Odd-Numbered Exercises

3  μY|x = (1 + x)/2 and μX|y = 2y/3.
5  μX|1 = 4/7 and μY|0 = 9/8.
13 β̂ = (Σ_{i=1}^{n} x_i y_i)/(Σ_{i=1}^{n} x_i²).
19 (a) t = (β̂ − β)/(s_e/√S_xx);
   (b) β̂ − t_{α/2, n−2} · s_e/√S_xx < β < β̂ + t_{α/2, n−2} · s_e/√S_xx.
31 [1 + r − (1 − r)e^{2z_{α/2}/√(n−3)}] / [1 + r + (1 − r)e^{2z_{α/2}/√(n−3)}] < ρ
   < [1 + r − (1 − r)e^{−2z_{α/2}/√(n−3)}] / [1 + r + (1 − r)e^{−2z_{α/2}/√(n−3)}].

Appendix: Sums and Products

1 Rules for Sums and Products

To simplify expressions involving sums and products, the Σ and Π notations are widely used in statistics. In the usual notation we write

    Σ_{i=a}^{b} x_i = x_a + x_{a+1} + x_{a+2} + · · · + x_b

and

    Π_{i=a}^{b} x_i = x_a · x_{a+1} · x_{a+2} · . . . · x_b

for any nonnegative integers a and b with a ≤ b.

When working with sums or products, it is often helpful to apply the following rules, which can all be verified by writing the respective expressions in full, that is, without the Σ or Π notation.

THEOREM 1.
1. Σ_{i=1}^{n} k·x_i = k · Σ_{i=1}^{n} x_i
2. Σ_{i=1}^{n} k = n·k
3. Σ_{i=1}^{n} (x_i + y_i) = Σ_{i=1}^{n} x_i + Σ_{i=1}^{n} y_i
4. Π_{i=1}^{n} k·x_i = k^n · Π_{i=1}^{n} x_i
5. Π_{i=1}^{n} k = k^n
6. Π_{i=1}^{n} x_i y_i = (Π_{i=1}^{n} x_i)(Π_{i=1}^{n} y_i)
7. ln Π_{i=1}^{n} x_i = Σ_{i=1}^{n} ln x_i

Double sums, triple sums, and so forth are also widely used in statistics, and if we repeatedly apply the definition of Σ given above, we have, for example,

    Σ_{i=1}^{m} Σ_{j=1}^{n} x_ij = Σ_{i=1}^{m} (x_i1 + x_i2 + · · · + x_in)
        = (x_11 + x_12 + · · · + x_1n) + (x_21 + x_22 + · · · + x_2n) + . . . + (x_m1 + x_m2 + · · · + x_mn)

Note that when the x_ij are thus arranged in a rectangular array, the first subscript denotes the row to which a particular element belongs, and the second subscript denotes the column.

When we work with double sums, the following theorem is of special interest; it is an immediate consequence of the multinomial expansion of (x_1 + x_2 + · · · + x_n)².

THEOREM 2.
    Σ_{i<j} x_i x_j = (1/2) [ (Σ_{i=1}^{n} x_i)² − Σ_{i=1}^{n} x_i² ]
where
    Σ_{i<j} x_i x_j = Σ_{i=1}^{n−1} Σ_{j=i+1}^{n} x_i x_j

2 Special Sums

In the theory of nonparametric statistics, particularly when we deal with rank sums, we often need expressions for the sums of powers of the first n positive integers, that is, expressions for

    S(n, r) = 1^r + 2^r + 3^r + · · · + n^r

for r = 0, 1, 2, . . . . The following theorem, which the reader will be asked to prove in Exercise 1, provides a convenient way of obtaining these sums.

THEOREM 3.
    Σ_{r=0}^{k−1} C(k, r) · S(n, r) = (n + 1)^k − 1
for any positive integers n and k, where C(k, r) denotes the binomial coefficient.

A disadvantage of this theorem is that we have to find the sums S(n, r) one at a time, first for r = 0, then for r = 1, then for r = 2, and so forth. For k = 1 we get

    C(1, 0) · S(n, 0) = (n + 1) − 1 = n

and hence S(n, 0) = 1^0 + 2^0 + · · · + n^0 = n. Similarly, for k = 2 we get

    C(2, 0) · S(n, 0) + C(2, 1) · S(n, 1) = (n + 1)² − 1

that is, n + 2S(n, 1) = n² + 2n, and hence

    S(n, 1) = 1^1 + 2^1 + · · · + n^1 = (1/2) n(n + 1)

Using the same technique, the reader will be asked to show in Exercise 2 that

    S(n, 2) = 1² + 2² + · · · + n² = (1/6) n(n + 1)(2n + 1)

and

    S(n, 3) = 1³ + 2³ + · · · + n³ = (1/4) n²(n + 1)²

Exercises

1. Prove Theorem 3 by making use of the fact that

    (m + 1)^k − m^k = Σ_{r=0}^{k−1} C(k, r) m^r

which follows from the binomial expansion of (m + 1)^k.

2. Verify the formulas for S(n, 2) and S(n, 3) given previously, and find an expression for S(n, 4).

3. Given a set of values x_1, x_2, . . . , x_8, find (a) Σ_{i=1}^{8} x_i; (b) Σ_{i=1}^{8} x_i².

4. Given values x_1, x_2, . . . , x_5 with corresponding frequencies f_1 = 3, f_2 = 7, f_3 = 10, f_4 = 5, and f_5 = 2, find (a) Σ_{i=1}^{5} x_i; (b) Σ_{i=1}^{5} f_i; (c) Σ_{i=1}^{5} x_i f_i; (d) Σ_{i=1}^{5} x_i² f_i.

5. Given x_1 = 2, x_2 = −3, x_3 = 4, x_4 = −2, y_1 = 5, y_2 = −3, y_3 = 2, and y_4 = −1, find (a) Σ_{i=1}^{4} x_i; (b) Σ_{i=1}^{4} y_i; (c) Σ_{i=1}^{4} x_i²; (d) Σ_{i=1}^{4} y_i²; (e) Σ_{i=1}^{4} x_i y_i.

6. Given x_11 = 3, x_12 = 1, x_13 = −2, x_14 = 2, x_21 = 1, x_22 = 4, x_23 = −2, x_24 = 5, x_31 = 3, x_32 = −1, x_33 = 2, and x_34 = 3, find (a) Σ_{i=1}^{3} x_ij separately for j = 1, 2, 3, and 4; (b) Σ_{j=1}^{4} x_ij separately for i = 1, 2, and 3.

7. With reference to Exercise 6, evaluate the double summation Σ_{i=1}^{3} Σ_{j=1}^{4} x_ij using (a) the results of part (a) of that exercise; (b) the results of part (b) of that exercise.

Answers to Odd-Numbered Exercises
3 (a) 10; (b) 40.
5 (a) 1; (b) 3; (c) 33; (d) 39; (e) 29.
7 (a) 19; (b) 19.

From Appendix A of John E. Freund's Mathematical Statistics with Applications, Eighth Edition. Irwin Miller, Marylees Miller. Copyright © 2014 by Pearson Education, Inc. All rights reserved.
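The special sums above, and Theorem 3 itself, are easily checked numerically; a short sketch:

```python
from math import comb

def S(n, r):
    """S(n, r) = 1**r + 2**r + ... + n**r."""
    return sum(i ** r for i in range(1, n + 1))

n, k = 10, 4
lhs = sum(comb(k, r) * S(n, r) for r in range(k))   # left-hand side of Theorem 3
print(lhs == (n + 1) ** k - 1)                      # True

# Closed forms derived above:
print(S(n, 1) == n * (n + 1) // 2)
print(S(n, 2) == n * (n + 1) * (2 * n + 1) // 6)
print(S(n, 3) == n * n * (n + 1) ** 2 // 4)
```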
Appendix: Special Probability Distributions

1 Bernoulli Distribution        4 Geometric Distribution
2 Binomial Distribution         5 Hypergeometric Distribution
3 Discrete Uniform Distribution 6 Negative Binomial Distribution
  (Special Case)                7 Poisson Distribution

1 Bernoulli Distribution

    f(x; θ) = θ^x (1 − θ)^{1−x}   for x = 0, 1

Parameter: 0 < θ < 1
Mean and variance: μ = θ and σ² = θ(1 − θ)

2 Binomial Distribution

    b(x; n, θ) = C(n, x) θ^x (1 − θ)^{n−x}   for x = 0, 1, 2, . . . , n

Parameters: n is a positive integer and 0 < θ < 1
Mean and variance: μ = nθ and σ² = nθ(1 − θ)

3 Discrete Uniform Distribution (Special Case)

    f(x; k) = 1/k   for x = 1, 2, . . . , k

Parameter: k is a positive integer
Mean and variance: μ = (k + 1)/2 and σ² = (k² − 1)/12

4 Geometric Distribution

    g(x; θ) = θ(1 − θ)^{x−1}   for x = 1, 2, 3, . . .

Parameter: 0 < θ < 1
Mean and variance: μ = 1/θ and σ² = (1 − θ)/θ²

5 Hypergeometric Distribution

    h(x; n, N, M) = C(M, x) · C(N − M, n − x) / C(N, n)
        for x = 0, 1, 2, . . . , n, with x ≤ M and n − x ≤ N − M

Parameters: n and N are positive integers with n ≤ N, and M is a nonnegative integer with M ≤ N
Mean and variance: μ = nM/N and σ² = nM(N − M)(N − n) / [N²(N − 1)]

6 Negative Binomial Distribution

    b*(x; k, θ) = C(x − 1, k − 1) θ^k (1 − θ)^{x−k}   for x = k, k + 1, k + 2, . . .

Parameters: k is a positive integer and 0 < θ < 1
Mean and variance: μ = k/θ and σ² = k(1 − θ)/θ²

7 Poisson Distribution

    p(x; λ) = λ^x e^{−λ} / x!   for x = 0, 1, 2, . . .

Parameter: λ > 0
Mean and variance: μ = λ and σ² = λ

From Appendix B of John E. Freund's Mathematical Statistics with Applications, Eighth Edition. Irwin Miller, Marylees Miller. Copyright © 2014 by Pearson Education, Inc. All rights reserved.
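As a quick numerical check of the mean and variance formulas above, the geometric case can be summed directly, truncating the series at a point where the tail is negligible:

```python
# Check mu = 1/theta and sigma^2 = (1 - theta)/theta^2 for the geometric distribution.
theta = 0.3
xs = range(1, 2000)                      # tail beyond this is negligibly small
mean = sum(x * theta * (1 - theta) ** (x - 1) for x in xs)
second = sum(x * x * theta * (1 - theta) ** (x - 1) for x in xs)
print(mean, 1 / theta)                               # both approximately 3.3333
print(second - mean ** 2, (1 - theta) / theta ** 2)  # both approximately 7.7778
```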
Appendix: Special Probability Densities

1 Beta Distribution       6 Gamma Distribution
2 Cauchy Distribution     7 Normal Distribution
3 Chi-Square Distribution 8 t Distribution (Student's t Distribution)
4 Exponential Distribution 9 Uniform Distribution (Rectangular Distribution)
5 F Distribution

1 Beta Distribution

    f(x; α, β) = [Γ(α + β) / (Γ(α) · Γ(β))] · x^{α−1} (1 − x)^{β−1}   for 0 < x < 1
    f(x; α, β) = 0 elsewhere

Parameters: α > 0 and β > 0
Mean and variance: μ = α/(α + β) and σ² = αβ / [(α + β)²(α + β + 1)]

2 Cauchy Distribution

    p(x; α, β) = (β/π) / [(x − α)² + β²]   for −∞ < x < ∞

Parameters: −∞ < α < ∞ and β > 0
Mean and variance: do not exist

3 Chi-Square Distribution

    f(x; ν) = [1 / (2^{ν/2} Γ(ν/2))] · x^{(ν−2)/2} e^{−x/2}   for x > 0
    f(x; ν) = 0 elsewhere

Parameter: ν is a positive integer
Mean and variance: μ = ν and σ² = 2ν

4 Exponential Distribution

    g(x; θ) = (1/θ) e^{−x/θ}   for x > 0
    g(x; θ) = 0 elsewhere

Parameter: θ > 0
Mean and variance: μ = θ and σ² = θ²

5 F Distribution

    g(f) = [Γ((ν1 + ν2)/2) / (Γ(ν1/2) · Γ(ν2/2))] · (ν1/ν2)^{ν1/2} · f^{ν1/2 − 1} · [1 + (ν1/ν2)f]^{−(ν1+ν2)/2}   for f > 0
    g(f) = 0 elsewhere

Parameters: ν1 > 0 and ν2 > 0
Mean: μ = ν2/(ν2 − 2) for ν2 > 2

6 Gamma Distribution

    f(x) = [1 / (β^α Γ(α))] · x^{α−1} e^{−x/β}   for x > 0
    f(x) = 0 elsewhere

Parameters: α > 0 and β > 0
Mean and variance: μ = αβ and σ² = αβ²

7 Normal Distribution

    n(x; μ, σ) = [1 / (σ√(2π))] · e^{−(1/2)((x−μ)/σ)²}   for −∞ < x < ∞

Parameters: μ and σ > 0
Mean and variance: μ = μ and σ² = σ²

8 t Distribution (Student's t Distribution)

    f(t; ν) = [Γ((ν + 1)/2) / (√(πν) · Γ(ν/2))] · (1 + t²/ν)^{−(ν+1)/2}   for −∞ < t < ∞

Parameter: ν is a positive integer
Mean and variance: μ = 0 and σ² = ν/(ν − 2) for ν > 2

9 Uniform Distribution (Rectangular Distribution)

    u(x; α, β) = 1/(β − α)   for α < x < β
    u(x; α, β) = 0 elsewhere

Parameters: −∞ < α < β < ∞
Mean and variance: μ = (α + β)/2 and σ² = (1/12)(β − α)²

From Appendix C of John E. Freund's Mathematical Statistics with Applications, Eighth Edition. Irwin Miller, Marylees Miller. Copyright © 2014 by Pearson Education, Inc. All rights reserved.
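Any of these densities can be checked numerically. For instance, integrating the gamma density as defined above (assuming scipy is available):

```python
import math
from scipy import integrate

# Gamma density with the appendix's parameters alpha and beta.
def gamma_pdf(x, alpha, beta):
    return x ** (alpha - 1) * math.exp(-x / beta) / (beta ** alpha * math.gamma(alpha))

alpha, beta = 3.0, 2.0
mean, _ = integrate.quad(lambda x: x * gamma_pdf(x, alpha, beta), 0, math.inf)
second, _ = integrate.quad(lambda x: x * x * gamma_pdf(x, alpha, beta), 0, math.inf)
print(mean, alpha * beta)                        # mu = alpha * beta
print(second - mean ** 2, alpha * beta ** 2)     # sigma^2 = alpha * beta^2
```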
Statistical Tables

I.   Binomial Probabilities
II.  Poisson Probabilities
III. Standard Normal Distribution
IV.  Values of t_{α,ν}
V.   Values of χ²_{α,ν}
VI.  Values of f_{0.05; ν1,ν2} and f_{0.01; ν1,ν2}
VII. Factorials and Binomial Coefficients
VIII. Values of e^x and e^{−x}
IX.  Values of r_p
X.   Critical Values for the Signed-Rank Test
XI.  Critical Values for the U Test
XII. Critical Values for the Runs Test

From Statistical Tables of John E. Freund's Mathematical Statistics with Applications, Eighth Edition. Irwin Miller, Marylees Miller. Copyright © 2014 by Pearson Education, Inc. All rights reserved.

Table I: Binomial Probabilities†

The entries are the values of b(x; n, θ), given to four decimal places, for n = 1, 2, . . . , 20, x = 0, 1, . . . , n, and θ = .05, .10, .15, .20, .25, .30, .35, .40, .45, and .50.

† Based on Tables of the Binomial Probability Distribution, National Bureau of Standards Applied Mathematics Series No. 6, Washington, D.C.: U.S. Government Printing Office, 1950.
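The entries of Table I can be regenerated exactly; for example, the n = 2 block:

```python
from math import comb

def b(x, n, theta):
    """Binomial probability b(x; n, theta), as tabulated in Table I."""
    return comb(n, x) * theta ** x * (1 - theta) ** (n - x)

# Rows x = 0, 1, 2 for n = 2 and theta = .05, .10, ..., .50
for x in range(3):
    row = [b(x, 2, t / 100) for t in range(5, 55, 5)]
    print(x, " ".join(f"{p:.4f}" for p in row))   # first row begins .9025 .8100 ...
```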
Table II: Poisson Probabilities†

The entries are the values of p(x; λ), given to four decimal places, for λ = 0.1, 0.2, . . . , 10.0 and for λ = 11, 12, . . . , 20.

† Based on E. C. Molina, Poisson's Exponential Binomial Limit, Melbourne, Fla.: Robert E. Krieger Publishing Company, 1973 Reprint, by permission of the publisher.
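Likewise for Table II:

```python
from math import exp, factorial

def p(x, lam):
    """Poisson probability p(x; lambda), as tabulated in Table II."""
    return lam ** x * exp(-lam) / factorial(x)

print(f"{p(2, 0.5):.4f}")   # 0.0758, the entry for x = 2 in the lambda = 0.5 column
```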
Table III: Standard Normal Distribution

The entries are the probabilities that a random variable having the standard normal distribution takes on a value between 0 and z, given to four decimal places for z = 0.00, 0.01, 0.02, . . . , 3.09.

Also, for z = 4.0, z = 5.0, and z = 6.0, the probabilities are 0.49997, 0.4999997, and 0.499999999.
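The entries of Table III, and the critical values and factorials in the tables that follow, can be regenerated with standard library routines (assuming scipy is available for the distribution quantiles):

```python
import math
from scipy import stats

# Area under the standard normal curve between 0 and z (the Table III entries).
def table_entry(z):
    return 0.5 * math.erf(z / math.sqrt(2.0))

print(f"{table_entry(1.96):.4f}")             # 0.4750
print(f"{table_entry(4.0):.5f}")              # 0.49997, as in the note above

# Right-tail critical values for Tables IV, V, and VI:
print(f"{stats.t.ppf(1 - 0.025, 15):.3f}")    # t_{.025,15}   = 2.131
print(f"{stats.chi2.ppf(1 - 0.05, 10):.3f}")  # chi2_{.05,10} = 18.307
print(f"{stats.f.ppf(1 - 0.05, 4, 10):.2f}")  # f_{0.05;4,10} = 3.48

# Factorials and their common logarithms, as in Table VII:
print(math.factorial(10), f"{math.log10(math.factorial(10)):.4f}")  # 3628800 6.5598
```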
1.247 3.852 36.180 2.700 3.672 30 † Based on Table 8 of Biometrika Tables for Statisticians.685 26.995 α = .475 20.0506 .566 39.851 31.638 44.869 31.239 1.172 38.611 37.209 25.319 14 15 4.076 41.117 30.997 20 21 8.338 33.928 25 26 11.412 .845 32.379 38.524 13.262 7.548 6 7 .336 29 30 13.277 14.571 4.672 27.919 19.688 29.217 28.962 26.591 10.034 8.00393 3.0100 .805 37.216 .009 5.558 24 25 10.408 7.557 45.188 10 11 2.196 11.645 27 28 12.170 37.996 27.290 26 27 11.831 1.488 11.409 35.075 4.015 8.143 13.796 22 23 9.815 9.207 .260 9.191 38.267 16 17 5.652 40.722 49.434 8.897 10.210 10.488 30.364 42.013 18.892 53.297 .844 7.588 52.000157 .378 9.633 8.690 2.260 10.067 16.955 8 9 1.194 46.801 15 16 5.337 26.024 6.601 5.0717 .892 22.991 7.493 43.145 11.000 34.256 16.300 12 13 3.047 17.95 α = .819 13 14 4.708 42.646 2.725 26.141 31.953 16.635 12.461 13.675 21.907 10. 1954.587 30.404 5.736 27.879 14. 456 .879 1 2 .296 28.980 45.344 1.103 5.390 28.838 3 4 .0201 .787 14.856 12.144 32.535 20.086 16.507 17.565 15.023 21.01 α = .924 36.345 12.791 18.592 14.314 46.812 18.982 12.860 4 5 .629 6.ν ν α = .773 46.940 18.812 6.181 23 24 9.160 12.697 6.483 23.283 11.993 28 29 13.689 13.963 49.307 20.671 35.226 21.575 19.191 33.660 5.666 23.832 15.088 2.733 15.415 39.582 19 20 7.05 α = .589 9 10 2.074 3.121 14.886 10.156 18 19 6.920 24.461 48.025 α = . by permission of the Biometrika trustees.923 45.198 13.676 .120 14.571 23.410 34.646 44.231 9.578 32.070 12.564 8.026 23.151 40. Statistical Tables 2 † Table V: Values of χα.718 17 18 6.99 α = .348 11.885 41.091 35.848 36.479 38.113 43.484 .401 21 22 8.308 16.872 1.229 6.401 13.0000393 .053 3.237 1.808 12.908 7.781 40.565 4.278 7 8 1.352 7. Vol.167 14.156 2.554 .558 3.757 11 12 3.841 5.635 7.989 1.005 ν 1 .090 21.573 16.979 50.642 48. Cambridge University Press.289 42.591 32.362 24. 70 8.4 19.07 3.79 8.71 6. 33 (1943).57 3.46 2.92 2.59 4.19 5.1 9.4 19.65 2. Merrington and C.97 3.81 3.23 8 5.27 2.” Biometrika.74 4.18 2.33 3.05.55 9.59 2.85 2.3 19.4 19.5 19.69 2.83 2.11 2.96 2.88 4.94 6.58 2.69 3.49 2.76 2.41 3.21 4.84 3.87 3.62 8.18 3.44 3.14 3.60 3. Vol.34 3.91 5.40 4.91 2.75 2.46 4.30 2.72 5.39 2.79 3.34 3.86 2.74 3.22 2.22 3.95 4.74 3.50 3.72 2.70 2.68 3.90 2.53 4.28 3.77 5.09 6. Thompson.51 3.47 2.ν2 † ν1 = Degrees of freedom for numerator Statistical Tables 1 2 3 4 5 6 7 8 9 10 12 15 20 24 30 40 60 120 q ν2 = Degrees of freedom for denominator 1 161 200 216 225 230 234 237 239 241 242 244 246 248 249 250 251 252 253 254 2 18.90 2.79 2.07 3.53 4 7.36 3.42 2.09 3.40 2.5 19.80 2.12 9.15 3.31 2.56 4.71 2.94 2.38 3.67 7 5.84 3. M.75 3.ν1 .77 4.16 2.57 2.12 3.62 2.39 3.54 2. by permission of the Biometrika trustees.29 3.39 4.66 5.61 2.4 19.48 3.30 3.94 8.67 3.82 4.59 8.4 19.65 2.53 2.58 3.10 3.53 4.4 19.11 3. Table VI: Values of f0.96 5.49 3.55 8.4 19.13 15 4.01 2.34 2.54 11 4.91 2.63 5 6.81 8.0 19.85 8.39 6.66 2.25 2.74 8.28 9.64 8. 457 .26 3.77 2.86 5.93 9 5.70 2.85 2.14 4.54 2.33 2.79 2.62 2.48 2.29 2.5 19.00 2.63 3. 
Table VI: Values of f0.05,ν1,ν2 and f0.01,ν1,ν2†
The entries are the values of F for which the F distribution with ν1 degrees of freedom for the numerator and ν2 degrees of freedom for the denominator has right-tail probability 0.05 (first part of the table) or 0.01 (second part), for ν1 = 1, 2, . . . , 10, 12, 15, 20, 24, 30, 40, 60, 120, ∞ and ν2 = 1, 2, . . . , 30, 40, 60, 120, ∞.
[The 0.05 part of this table was scrambled in extraction and is not reproduced here.]
† Reproduced from M. Merrington and C. M. Thompson, “Tables of percentage points of the inverted beta (F) distribution,” Biometrika, Vol. 33 (1943), by permission of the Biometrika trustees.
Table VI: (continued) Values of f0.47 8.28 4.000 5.95 4.40 4.85 4.02 3.3 13.94 4.55 8.28 5.4 26.43 3.00 3.5 99.20 5.95 5.21 6.02 6 13.18 6.54 4.10 4.7 13.51 3.313 6.36 13 9.73 4.80 3.07 4.03 4.99 6.91 11 9.70 3.96 2.2 99.57 4.06 4.0 10.982 6.2 27.2 18.4 99.86 9 10.47 9.3 26.60 12 9.98 7.35 5.2 27.14 4.51 3.7 28.16 6.40 4.26 5.38 9.366 2 98.84 6.96 4.859 5.31 7.7 10.06 4.7 13.0 7.87 459 .22 5.14 7.5 99.89 9.5 99.0 15.52 3.45 3.67 3.29 3.17 14 8.52 5.99 5.5 99.82 4.93 5.86 6.00 15 8.5 99.17 4.5 3 34.3 99.15 8.89 3.37 3.35 3.5 28.51 5.80 5.45 7.87 7.4 14.56 4.59 7.052 5.50 4.3 99.1 26.4 99.056 6.86 4.5 99.29 9.06 5.86 3.44 4.625 5.7 27.72 6.5 10.55 9.56 6.25 4.1 4 21.2 14.00 3.19 4.64 5.5 15.66 3.21 4.39 5.8 29.157 6.20 5.5 26.72 7.01.65 8 11.94 3.42 6.23 7.4 99.02 6.6 26.56 5.261 6.2 9.89 4.85 7.40 7.62 3.7 16.82 5.8 14.71 4.16 4.ν2 Statistical Tables ν1 = Degrees of freedom for numerator 1 2 3 4 5 6 7 8 9 10 12 15 20 24 30 40 60 120 q ν2 = Degrees of freedom for denominator 1 4.4 99.56 4.9 9.94 3.0 13.36 5.31 10 10.34 3.11 4.32 4.10 3.54 3.33 6.36 5.66 3.42 4.235 6.37 6.81 5.1 9. 95 5.34 3.70 1.82 5.50 2.51 2.45 2.61 3.66 2.74 2.53 2.66 4.66 2.94 2.17 3.38 q 6.59 3.84 2.41 3.18 4.95 2.78 2.52 2.19 5.65 18 8.48 3.09 2.94 3.75 2.96 2.80 2.52 2.88 5.31 23 7.94 4.56 3.18 3.92 1.76 3.70 3.64 2.94 3.47 2.49 20 8.03 2.46 3.69 3.68 3.78 4.82 4.02 1.98 2.85 4.36 3.89 2.03 1.39 4.30 3.58 4.78 3.54 3.31 3.53 6.26 3.47 2.41 2.26 3.93 5.66 1.57 4.62 2.22 3.99 2.83 2.31 2.35 2.09 4.51 4.37 2.45 2.95 3.46 3.11 5.29 6.98 4.37 3.12 2.86 2.54 2.12 2.72 2.03 3.00 2.35 3.60 120 6.17 3.71 3.66 2.50 4.460 Table VI: (continued) Values of f0.30 3.63 2.58 2.01 4.ν2 ν1 = Degrees of freedom for numerator Statistical Tables 1 2 3 4 5 6 7 8 9 10 12 15 20 24 30 40 60 120 q ν2 = Degrees of freedom for denominator 16 8.79 1.44 4.98 2.78 2.84 2.03 2.18 3.31 3.68 4.34 2.13 3.57 19 8.66 2.92 2.18 2.79 3.63 3.72 2.70 2.82 2.56 2.70 3.23 3.53 1.15 3.94 1.49 2.63 4.92 2.02 2.45 3.10 3.23 5.76 4.20 4.84 3.13 2.62 2.40 3.93 2.21 25 7.89 2.00 2.59 1.26 3.47 3.64 3.32 2.29 3.17 30 7.79 2.61 4.67 2.02 2.39 2.21 2.01.19 4.70 2.69 2.75 2.88 2.25 4.55 3.80 2.08 3.67 2.83 2.47 1.85 2.32 1.04 3.17 3.72 4.12 2.43 4.95 1.77 3.29 4.30 2.56 5.72 4.03 1.86 1.40 2.08 3.32 3.88 1.73 1.78 3.08 4.67 4.61 2.17 2.02 3.11 2.70 2.83 3.84 2.40 6.84 1.71 3.12 2.20 2.86 3.31 3.87 3.92 2.17 3.84 2.ν1 .80 2.26 3.42 21 8.41 3.79 3.75 17 8.75 2.51 3.50 3.10 3.35 2.76 1.51 3.00 2.65 3.55 2.58 2.21 3.89 3.34 4.40 2.10 5.29 2.90 3.31 5.99 3.77 4.76 2.80 60 7.30 3.04 1.60 3.64 2.99 2.36 2.27 2.07 2.77 5.67 3.87 4.01 40 7.10 3.11 2.37 3.93 3.22 3.02 5.52 3.59 3.19 2.26 24 7.46 3.00 .55 2.46 2.63 3.32 3.85 4.81 3.51 3.23 3.43 3.01 3.01 5.93 2.50 2.36 22 7.16 3.58 2.20 2.31 3.37 4.07 2. 040 3.1165 Binomial Coefficients .880 5.0000 1 1 0.6012 12 479.368.227.001.8573 7 5.0000 2 2 0.7024 8 40.200 10.600 8.6055 9 362.674.800 6.7943 14 87. Statistical Tables Table VII: Factorials and Binomial Coefficients Factorials n n! log n! 0 1 0.0792 6 720 2.3802 5 120 2.7782 4 24 1.307.9404 15 1.628.178.000 12.800 7.3010 3 6 0.6803 13 6.5598 11 39.916.320 4.020.5598 10 3.800 9.291. . . . . . . . . . . 
Table VII: Factorials and Binomial Coefficients

Factorials
n     n!                  log n!
0     1                   0.0000
1     1                   0.0000
2     2                   0.3010
3     6                   0.7782
4     24                  1.3802
5     120                 2.0792
6     720                 2.8573
7     5,040               3.7024
8     40,320              4.6055
9     362,880             5.5598
10    3,628,800           6.5598
11    39,916,800          7.6012
12    479,001,600         8.6803
13    6,227,020,800       9.7943
14    87,178,291,200      10.9404
15    1,307,674,368,000   12.1165

Binomial Coefficients
The entries are the binomial coefficients (n r) for n = 0, 1, . . . , 20 and r = 0, 1, . . . , 10; for r > 10, use the identity (n r) = (n n−r).

n     r = 0  1    2     3     4      5      6      7       8       9       10
0     1
1     1      1
2     1      2    1
3     1      3    3     1
4     1      4    6     4     1
5     1      5    10    10    5      1
6     1      6    15    20    15     6      1
7     1      7    21    35    35     21     7      1
8     1      8    28    56    70     56     28     8       1
9     1      9    36    84    126    126    84     36      9       1
10    1      10   45    120   210    252    210    120     45      10      1
11    1      11   55    165   330    462    462    330     165     55      11
12    1      12   66    220   495    792    924    792     495     220     66
13    1      13   78    286   715    1287   1716   1716    1287    715     286
14    1      14   91    364   1001   2002   3003   3432    3003    2002    1001
15    1      15   105   455   1365   3003   5005   6435    6435    5005    3003
16    1      16   120   560   1820   4368   8008   11440   12870   11440   8008
17    1      17   136   680   2380   6188   12376  19448   24310   24310   19448
18    1      18   153   816   3060   8568   18564  31824   43758   48620   43758
19    1      19   171   969   3876   11628  27132  50388   75582   92378   92378
20    1      20   190   1140  4845   15504  38760  77520   125970  167960  184756

Table VIII: Values of e^x and e^−x
The entries give e^x and e^−x for a grid of values of x in steps of 0.1.
[The body of this table was scrambled in extraction and is not reproduced here.]

Table IX: Values of rp for α = 0.01 and α = 0.05†
The entries are the least significant studentized ranges rp for Duncan's multiple range test, for p = 2, 3, . . . , 10 and 1, 2, . . . , 30, 40, 60, 120, ∞ degrees of freedom.
[The body of this table was scrambled in extraction and is not reproduced here.]
† Reproduced from H. L. Harter, “Critical Values for Duncan's New Multiple Range Test,” Biometrics; it contains some corrected values to replace those given by D. B. Duncan in “Multiple Range and Multiple F Tests,” Biometrics, Vol. 11 (1955). The above table is reproduced with the permission of the author and the Biometric Society.
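A quick numerical check of the entries of Tables VII and VIII above, using only the Python standard library:

    from math import comb, exp, factorial, log10

    # Table VII checks: factorials and entries of Pascal's triangle.
    print(factorial(15))                   # 1307674368000
    print(round(log10(factorial(15)), 4))  # 12.1165
    print(comb(20, 10))                    # 184756
    print(comb(17, 8))                     # 24310

    # Table VIII entries for x = 1.0: e^x and e^(-x).
    print(round(exp(1.0), 3))   # 2.718
    print(round(exp(-1.0), 3))  # 0.368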
Table X: Critical Values for the Signed-Rank Test†
The entries are the critical values Tα of the signed-rank statistic for sample size n; a dash means that no critical value exists at that level for that sample size.

n     T0.10   T0.05   T0.02   T0.01
4     —       —       —       —
5     1       —       —       —
6     2       1       —       —
7     4       2       0       —
8     6       4       2       0
9     8       6       3       2
10    11      8       5       3
11    14      11      7       5
12    17      14      10      7
13    21      17      13      10
14    26      21      16      13
15    30      25      20      16
16    36      30      24      19
17    41      35      28      23
18    47      40      33      28
19    54      46      38      32
20    60      52      43      37
21    68      59      49      43
22    75      66      56      49
23    83      73      62      55
24    92      81      69      61
25    101     90      77      68

† From F. Wilcoxon and R. A. Wilcox, Some Rapid Approximate Statistical Procedures, Pearl River, N.Y.: American Cyanamid Company, 1964. Reproduced with permission of American Cyanamid Company.
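Critical values of this kind can also be regenerated by enumerating the exact null distribution of the signed-rank statistic T. A minimal sketch using only the standard library; this is not the authors' construction, and published tables built with slightly different rounding conventions can differ by 1 in places. The tabled subscript is a two-sided level, so T0.05, for example, corresponds to a one-tail probability of 0.025:

    def signed_rank_critical(n, one_tail_alpha):
        """Largest c with P(T <= c) <= one_tail_alpha under the null."""
        # counts[s] = number of sign assignments of ranks 1..n whose
        # positive-rank sum equals s
        counts = [1]
        for rank in range(1, n + 1):
            new = counts + [0] * rank      # rank carries a minus sign
            for s, c in enumerate(counts):
                new[s + rank] += c         # rank carries a plus sign
            counts = new
        total = 2 ** n
        cum, critical = 0, None
        for s, c in enumerate(counts):
            cum += c
            if cum / total <= one_tail_alpha:
                critical = s
            else:
                break
        return critical  # None: no critical value exists at this level

    print(signed_rank_critical(10, 0.025))  # 8, the T0.05 entry for n = 10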
Table XI: Critical Values for the U Test†
The entries are the critical values U0.10, U0.05, U0.02, and U0.01 of the Mann–Whitney statistic for sample sizes n1, n2 up through 15.
[The body of this table was scrambled in extraction and is not reproduced here.]
† This table is based on D. Auble, “Extended Tables for the Mann–Whitney Statistics,” Bulletin of the Institute of Educational Research at Indiana University, Vol. 1, 1953. By permission of the author.

Table XII: Critical Values for the Runs Test†
The entries are the lower and upper critical values of u, the total number of runs, for α = 0.025 and α = 0.005, for sample sizes n1, n2 up through 15.
[The body of this table was scrambled in extraction and is not reproduced here.]
† This table is adapted, by permission, from F. S. Swed and C. Eisenhart, “Tables for testing randomness of grouping in a sequence of alternatives,” Annals of Mathematical Statistics, Vol. 14.
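The same enumeration idea regenerates U-test critical values like those of Table XI above, via the classic Mann–Whitney counting recurrence; a minimal sketch with the standard library, again reading the tabled U0.05 as a two-sided level (one-tail probability 0.025):

    from functools import lru_cache
    from math import comb

    @lru_cache(maxsize=None)
    def ways(u, m, n):
        """Number of rank orderings of m X's and n Y's whose U statistic
        (the number of pairs with X above Y) equals u."""
        if u < 0:
            return 0
        if m == 0 or n == 0:
            return 1 if u == 0 else 0
        # Condition on whether the largest observation is an X or a Y.
        return ways(u - n, m - 1, n) + ways(u, m, n - 1)

    def u_critical(m, n, one_tail_alpha):
        """Largest c with P(U <= c) <= one_tail_alpha under the null."""
        total = comb(m + n, m)
        cum, critical = 0, None
        for u in range(m * n + 1):
            cum += ways(u, m, n)
            if cum / total <= one_tail_alpha:
                critical = u
            else:
                break
        return critical

    print(u_critical(8, 8, 0.025))  # 13, the U0.05 entry for n1 = n2 = 8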
Index
Page references followed by “f” indicate illustrated figures or photographs; page references followed by “t” indicate a table.
[The alphabetical index was scrambled in extraction, with its multi-column layout interleaved, and is not reproduced here.]