Jay L. Devore-Probability and Statistics for Engineering and the Sciences, Jay L Devore Solutions Manual-Duxbury Press (2007)

March 26, 2018 | Author: Nicodemus Sigit Sutanto | Category: Quartile, Median, Outlier, Mean, Histogram



Chapter 1: Overview and Descriptive Statistics

Section 1.1

1.
a. Houston Chronicle, Des Moines Register, Chicago Tribune, Washington Post
b. Capital One, Campbell Soup, Merrill Lynch, Pulitzer
c. Bill Jasper, Kay Reinke, Helen Ford, David Menedez
d. 1.78, 2.44, 3.5, 3.04

2.
a. 29.1 yd., 28.3 yd., 24.7 yd., 31.0 yd.
b. 432, 196, 184, 321
c. 2.1, 4.0, 3.2, 6.3
d. 0.07 g, 1.58 g, 7.1 g, 27.2 g

3.
a. In a sample of 100 VCRs, what are the chances that more than 20 need service while under warranty? What are the chances that none need service while still under warranty?
b. What proportion of all VCRs of this brand and model will need service within the warranty period?

4.
a. Concrete: all living U.S. citizens, all mutual funds marketed in the U.S., all books published in 1980. Hypothetical: all grade point averages for University of California undergraduates during the next academic year; page lengths for all books published during the next calendar year; batting averages for all major league players during the next baseball season.
b. Concrete: Probability: In a sample of 5 mutual funds, what is the chance that all 5 have rates of return which exceeded 10% last year? Statistics: If previous year rates of return for 5 mutual funds were 9.6, 14.5, 8.3, 9.9 and 10.2, can we conclude that the average rate for all funds was below 10%? Conceptual: Probability: In a sample of 10 books to be published next year, how likely is it that the average number of pages for the 10 is between 200 and 250? Statistics: If the sample average number of pages for 10 books is 227, can we be highly confident that the average for all books is between 200 and 245?

5.
a. No, the relevant conceptual population is all scores of all students who participate in the SI in conjunction with this particular statistics course.
b.
The advantage to randomly choosing students to participate in the two groups is that we are more likely to get a sample representative of the population at large. If it were left to students to choose, there may be a division of abilities between the two groups which could unnecessarily affect the outcome of the experiment.

c. If all students were put in the treatment group, there would be no results with which to compare the treatments.

6. One could take a simple random sample of students from all students in the California State University system and ask each student in the sample to report the distance from their hometown to campus. Alternatively, the sample could be generated by taking a stratified random sample, i.e., by taking a simple random sample from each of the 23 campuses and again asking each student in the sample to report the distance from their hometown to campus. Certain problems might arise with self-reporting of distances, such as recording error or poor recall. This study is enumerative because there exists a finite, identifiable population of objects from which to sample.

7. One could generate a simple random sample of all single-family homes in the city, or a stratified random sample by taking a simple random sample from each of the 10 district neighborhoods. From each of the homes in the sample, the necessary variables would be collected. This would be an enumerative study because there exists a finite, identifiable population of objects from which to sample.

8.
a. The number of observations equals 2 × 2 × 2 = 8.
b. This could be called an analytic study because the data would be collected on an existing process. There is no sampling frame.

9.
a. There could be several explanations for the variability of the measurements. Among them could be measuring error (due to mechanical or technical changes across measurements), recording error, differences in weather conditions at the time of measurement, etc.
b. This could be called an analytic study because there is no sampling frame.

Section 1.2

10.
a. Minitab generates the following stem-and-leaf display of this data:

     5 | 9
     6 | 33588
     7 | 00234677889      stem: ones
     8 | 127              leaf: tenths
     9 | 077
    10 | 7
    11 | 368

What constitutes large or small variation usually depends on the application at hand, but an often-used rule of thumb is: the variation tends to be large whenever the spread of the data (the difference between the largest and smallest observations) is large compared to a representative value. Here, 'large' means that the percentage is closer to 100% than it is to 0%. For this data, the spread is 11 − 5 = 6, which constitutes 6/8 = .75, or 75%, of the typical data value of 8. Most researchers would call this a large amount of variation.

b. The data display is not perfectly symmetric around some middle/representative value. There tends to be some positive skewness in this data.

c. In Chapter 1, outliers are data points that appear to be very different from the pack. Looking at the stem-and-leaf display in part (a), there appear to be no outliers in this data. (Chapter 2 gives a more precise definition of what constitutes an outlier.)

d. From the stem-and-leaf display in part (a), there are 4 values greater than 10. Therefore, the proportion of data values that exceed 10 is 4/27 = .148, or about 15%.

11.

    6l | 034
    6h | 667899
    7l | 00122244
    7h |                  Stem = tens
    8l | 001111122344     Leaf = ones
    8h | 5557899
    9l | 03
    9h | 58

This display brings out the gap in the data: there are no scores in the high 70's.

12. One method of denoting the pairs of stems having equal values is to denote the first stem by L, for 'low', and the second stem by H, for 'high'.
Using this notation, the stem-and-leaf display would appear as follows:

    3L | 1
    3H | 56678
    4L | 000112222234
    4H | 5667888
    5L | 144              stem: tenths
    5H | 58               leaf: hundredths
    6L | 2
    6H | 6678
    7L |
    7H | 5

The stem-and-leaf display above shows that .45 is a good representative value for the data. In addition, the display is not symmetric and appears to be positively skewed. The spread of the data is .75 − .31 = .44, which is .44/.45 = .978, or about 98%, of the typical value of .45. This constitutes a reasonably large amount of variation in the data. The data value .75 is a possible outlier.

13.
a.

    12 | 2
    12 | 445
    12 | 6667777
    12 | 889999
    13 | 00011111111
    13 | 2222222222333333333333333                stem: tens
    13 | 44444444444444444455555555555555555555   leaf: ones
    13 | 6666666666667777777777
    13 | 888888888888999999
    14 | 0000001111
    14 | 2333333
    14 | 444
    14 | 77

The observations are highly concentrated at 134 – 135, where the display suggests the typical value falls.

b. [Histogram: frequency vs. strength, 122 to 148.] The histogram is symmetric and unimodal, with the point of symmetry at approximately 135.

14.
a.

     2 | 23                           stem units: 1.0
     3 | 2344567789                   leaf units: .10
     4 | 01356889
     5 | 00001114455666789
     6 | 0000122223344456667789999
     7 | 00012233455555668
     8 | 02233448
     9 | 012233335666788
    10 | 2344455688
    11 | 2335999
    12 | 37
    13 | 8
    14 | 36
    15 | 0035
    16 |
    17 |
    18 | 9

b. A representative value could be the median, 7.0.
c. The data appear to be highly concentrated, except for a few values on the positive side.
d. No, the data is skewed to the right, or positively skewed.
e. The value 18.9 appears to be an outlier, being more than two stem units from the previous value.

15.

    Crunchy |   | Creamy
        644 | 2 | 2
      77220 | 3 | 69
       6320 | 4 | 145
        222 | 5 | 3666
         55 | 6 | 258
          0 | 7 |
            | 8 |

Both sets of scores are reasonably spread out. There appear to be no outliers. The three highest scores are for the crunchy peanut butter, the three lowest for the creamy peanut butter.
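Displays like the ones in Exercises 10–15 can also be generated programmatically. The following Python sketch is illustrative only (the function name and layout are our own, not from the text); it groups each observation into a stem and a single leaf digit, and prints empty stems so that gaps in the data remain visible:

```python
from collections import defaultdict

def stem_and_leaf(data, leaf_unit=1.0):
    """Return a stem-and-leaf display as a string.

    leaf_unit is the place value of one leaf digit: with leaf_unit=1,
    stems are tens and leaves are ones; with leaf_unit=0.1, stems are
    ones and leaves are tenths (as in Exercise 10).
    """
    groups = defaultdict(list)
    for x in sorted(data):
        scaled = int(round(x / leaf_unit))   # express x in whole leaf units
        stem, leaf = divmod(scaled, 10)
        groups[stem].append(leaf)
    lines = []
    for stem in range(min(groups), max(groups) + 1):
        leaves = "".join(str(leaf) for leaf in groups.get(stem, []))
        lines.append(f"{stem:>3} | {leaves}")  # empty stems reveal gaps
    return "\n".join(lines)

print(stem_and_leaf([5.9, 6.3, 6.3, 6.5, 6.8, 7.0, 7.2, 8.1], leaf_unit=0.1))
```

Because empty stems are still printed, a run of stems with no leaves shows up as a visible gap, which is exactly how the display in Exercise 11 reveals that there are no scores in the high 70's.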
16.
a.

          beams |    | cylinders
              9 |  5 | 8
          88533 |  6 | 16
    98877643200 |  7 | 012488
            721 |  8 | 13359
            770 |  9 | 278
              7 | 10 | 863
                | 11 | 2
                | 12 | 6
                | 13 |
                | 14 | 1

The data appears to be slightly skewed to the right, or positively skewed. The value of 14.1 appears to be an outlier. Three out of the twenty, 3/20 or .15, of the observations exceed 10 MPa.

b. The majority of observations are between 5 and 9 MPa for both beams and cylinders, with the modal class in the 7 MPa range. The observations for cylinders are more variable, or spread out, and the maximum value of the cylinder observations is higher.

c. [Dot plot of the cylinder data, from 6.0 to 13.5 MPa.]

17.
a.

    Number nonconforming | Frequency | Relative frequency (Freq/60)
            0            |     7     |   0.117
            1            |    12     |   0.200
            2            |    13     |   0.217
            3            |    14     |   0.233
            4            |     6     |   0.100
            5            |     3     |   0.050
            6            |     3     |   0.050
            7            |     1     |   0.017
            8            |     1     |   0.017

The relative frequencies don't add exactly to 1 (the sum is 1.001) because they have been rounded.

b. The number of batches with at most 5 nonconforming items is 7 + 12 + 13 + 14 + 6 + 3 = 55, which is a proportion of 55/60 = .917. The proportion of batches with (strictly) fewer than 5 nonconforming items is 52/60 = .867. Notice that these proportions could also have been computed by using the relative frequencies: e.g., proportion of batches with 5 or fewer nonconforming items = 1 − (.05 + .017 + .017) = .916; proportion of batches with fewer than 5 nonconforming items = 1 − (.05 + .05 + .017 + .017) = .866.

c. The following is a Minitab histogram of this data. The center of the histogram is somewhere around 2 or 3 and it shows that there is some positive skewness in the data. Using the rule of thumb in Exercise 1, the histogram also shows that there is a lot of spread/variation in this data.

[Histogram: relative frequency vs. number nonconforming, 0 to 8.]

18.
a.
The following histogram was constructed using Minitab:

[Histogram: frequency vs. number of papers, 0 to 18.]

The most interesting feature of the histogram is the heavy positive skewness of the data.

Note: One way to have Minitab automatically construct a histogram from grouped data such as this is to use Minitab's ability to enter multiple copies of the same number by typing, for example, 784(1) to enter 784 copies of the number 1. The frequency data in this exercise was entered using the following Minitab commands:

    MTB > set c1
    DATA> 784(1) 204(2) 127(3) 50(4) 33(5) 28(6) 19(7) 19(8)
    DATA> 6(9) 7(10) 6(11) 7(12) 4(13) 4(14) 5(15) 3(16) 3(17)
    DATA> end

b. From the frequency distribution (or from the histogram), the number of authors who published at least 5 papers is 33 + 28 + 19 + … + 5 + 3 + 3 = 144, so the proportion who published 5 or more papers is 144/1309 = .11, or 11%. Similarly, by adding frequencies and dividing by n = 1309, the proportion who published 10 or more papers is 39/1309 = .0298, or about 3%. The proportion who published more than 10 papers (i.e., 11 or more) is 32/1309 = .0245, or about 2.5%.

c. No. Strictly speaking, the class described by '≥ 15' has no upper boundary, so it is impossible to draw a rectangle above it having finite area (i.e., frequency).

d. The category 15–17 does have a finite width of 2, so the cumulated frequency of 11 can be plotted as a rectangle of height 5.5 over this interval. The basic rule is to make the area of the bar equal to the class frequency, so area = 11 = (width)(height) = 2(height) yields a height of 5.5.

19.
a. From this frequency distribution, the proportion of wafers that contained at least one particle is (100 − 1)/100 = .99, or 99%. Note that it is much easier to subtract 1 (which is the number of wafers that contain 0 particles) from 100 than it would be to add all the frequencies for 1, 2, 3, … particles.
In a similar fashion, the proportion containing at least 5 particles is (100 − 1 − 2 − 3 − 12 − 11)/100 = 71/100 = .71, or 71%.

b. The proportion containing between 5 and 10 particles is (15 + 18 + 10 + 12 + 4 + 5)/100 = 64/100 = .64, or 64%. The proportion that contain strictly between 5 and 10 (meaning strictly more than 5 and strictly less than 10) is (18 + 10 + 12 + 4)/100 = 44/100 = .44, or 44%.

c. The following histogram was constructed using Minitab. The data was entered using the same technique mentioned in the answer to exercise 18(a). The histogram is almost symmetric and unimodal; however, it has a few relative maxima (i.e., modes) and has a very slight positive skew.

[Histogram: relative frequency vs. number of particles, 0 to 15.]

20.
a. The following stem-and-leaf display was constructed:

    0 | 123334555599
    1 | 00122234688
    2 | 1112344477       stem: thousands
    3 | 0113338          leaf: hundreds
    4 | 37
    5 | 23778

A typical data value is somewhere in the low 2000's. The display is almost unimodal (the stem at 5 would be considered a mode, the stem at 0 another) and has a positive skew.

b. A histogram of this data, using classes of width 1000 centered at 0, 1000, …, 6000, is shown below. The proportion of subdivisions with total length less than 2000 is (12 + 11)/47 = .489, or 48.9%. Between 2000 and 4000, the proportion is (7 + 2)/47 = .191, or 19.1%. The histogram shows the same general shape as depicted by the stem-and-leaf in part (a).

[Histogram: frequency vs. length, 0 to 6000.]

21.
a. A histogram of the y data appears below. From this histogram, the proportion of subdivisions having no cul-de-sacs (i.e., y = 0) is 17/47 = .362, or 36.2%. The proportion having at least one cul-de-sac (y ≥ 1) is (47 − 17)/47 = 30/47 = .638, or 63.8%. Note that subtracting the number of subdivisions with y = 0 from the total, 47, is an easy way to find the number of subdivisions with y ≥ 1.

[Histogram: frequency vs. y, 0 to 5.]

b. A histogram of the z data appears below. From this histogram, the proportion of subdivisions with at most 5 intersections (i.e., z ≤ 5) is 42/47 = .894, or 89.4%. The proportion having fewer than 5 intersections (z < 5) is 39/47 = .830, or 83.0%.

[Histogram: frequency vs. z, 0 to 8.]

22. A very large percentage of the data values are greater than 0, which indicates that most, but not all, runners do slow down at the end of the race. The histogram is also positively skewed, which means that some runners slow down a lot compared to the others. A typical value for this data would be in the neighborhood of 200 seconds. The proportion of the runners who ran the last 5 km faster than they did the first 5 km is very small, about 1% or so.

23.
a. [Histogram: percent vs. brkstgth, 0 to 900.] The histogram is skewed right, with a majority of observations between 0 and 300 cycles. The class holding the most observations is between 100 and 200 cycles.

b. [Density histogram: density vs. brkstgth, with class boundaries 0, 50, 100, 150, 200, 300, 400, 500, 600, 900.]

c. [proportion ≥ 100] = 1 − [proportion < 100] = 1 − .21 = .79

24. [Histogram: percent vs. weldstrn, 4000 to 6000.]

25. [Histograms of the original IDT data, 10 to 80, and of the transformed log(IDT) data, 1.1 to 1.9.] The transformation creates a much more symmetric, mound-shaped histogram.
26.
a. [Density histogram: clearness index, .15 to .75.]

    Class interval | Frequency | Relative frequency
    .15 - < .25    |     8     |   0.02192
    .25 - < .35    |    14     |   0.03836
    .35 - < .45    |    28     |   0.07671
    .45 - < .50    |    24     |   0.06575
    .50 - < .55    |    39     |   0.10685
    .55 - < .60    |    51     |   0.13973
    .60 - < .65    |   106     |   0.29041
    .65 - < .70    |    84     |   0.23014
    .70 - < .75    |    11     |   0.03014
    n = 365; the relative frequencies sum to 1.00001 due to rounding.

b. The proportion of days with a clearness index smaller than .35 is (8 + 14)/365 = .06, or 6%.

c. The proportion of days with a clearness index of at least .65 is (84 + 11)/365 = .26, or 26%.

27.
a. The endpoints of the class intervals overlap. For example, the value 50 falls in both of the intervals '0 – 50' and '50 – 100'.

b.

    Class interval | Frequency | Relative frequency
    0 - < 50       |     9     |   0.18
    50 - < 100     |    19     |   0.38
    100 - < 150    |    11     |   0.22
    150 - < 200    |     4     |   0.08
    200 - < 250    |     2     |   0.04
    250 - < 300    |     2     |   0.04
    300 - < 350    |     1     |   0.02
    350 - < 400    |     1     |   0.02
    ≥ 400          |     1     |   0.02
    Total          |    50     |   1.00

[Histogram: frequency vs. lifetime, 0 to 600.] The distribution is skewed to the right, or positively skewed. There is a gap in the histogram, and what appears to be an outlier in the '500 – 550' interval.

c.

    Class interval | Frequency | Relative frequency
    2.25 - < 2.75  |     2     |   0.04
    2.75 - < 3.25  |     2     |   0.04
    3.25 - < 3.75  |     3     |   0.06
    3.75 - < 4.25  |     8     |   0.16
    4.25 - < 4.75  |    18     |   0.36
    4.75 - < 5.25  |    10     |   0.20
    5.25 - < 5.75  |     4     |   0.08
    5.75 - < 6.25  |     3     |   0.06

[Histogram: frequency vs. ln(lifetime), 2.25 to 6.25.] The distribution of the natural logs of the original data is much more symmetric than the original.

d. The proportion of lifetime observations in this sample that are less than 100 is .18 + .38 = .56, and the proportion that is at least 200 is .04 + .04 + .02 + .02 + .02 = .14.

28. [Time series plot: radtn vs. Index.] There are seasonal trends with lows and highs 12 months apart.

29.

    Complaint | Frequency | Relative frequency
        B     |     7     |   0.1167
        C     |     3     |   0.0500
        F     |     9     |   0.1500
        J     |    10     |   0.1667
        M     |     4     |   0.0667
        N     |     6     |   0.1000
        O     |    21     |   0.3500
      Total   |    60     |   1.0000

[Bar chart: count of complaint by category.]

30. [Bar chart: count of prodprob by problem category.] The problem categories are:
1. incorrect component
2. missing component
3. failed component
4. insufficient solder
5. excess solder

31.

    Class            | Frequency | Cumulative frequency | Cumulative relative frequency
    0.0 - under 4.0  |     2     |          2           |   0.050
    4.0 - under 8.0  |    14     |         16           |   0.400
    8.0 - under 12.0 |    11     |         27           |   0.675
    12.0 - under 16.0|     8     |         35           |   0.875
    16.0 - under 20.0|     4     |         39           |   0.975
    20.0 - under 24.0|     0     |         39           |   0.975
    24.0 - under 28.0|     1     |         40           |   1.000

32.
a. The frequency distribution is:

    Class          | Relative frequency
    0 - < 150      |   .193
    150 - < 300    |   .183
    300 - < 450    |   .251
    450 - < 600    |   .148
    600 - < 750    |   .097
    750 - < 900    |   .066
    900 - < 1050   |   .019
    1050 - < 1200  |   .029
    1200 - < 1350  |   .005
    1350 - < 1500  |   .004
    1500 - < 1650  |   .001
    1650 - < 1800  |   .002
    1800 - < 1950  |   .002

The relative frequency distribution is almost unimodal and exhibits a large positive skew. The typical middle value is somewhere between 400 and 450, although the skewness makes it difficult to pinpoint more exactly than this.

b. The proportion of the fire loads less than 600 is .193 + .183 + .251 + .148 = .775. The proportion of loads that are at least 1200 is .005 + .004 + .001 + .002 + .002 = .014.

c. The proportion of loads between 600 and 1200 is 1 − .775 − .014 = .211.

Section 1.3

33.
a. x̄ = 192.57, x̃ = 189. The mean is larger than the median, but they are still fairly close together.
b. Changing the one value: x̄ = 189.71, x̃ = 189. The mean is lowered; the median stays the same.
c. x̄tr = 191.0. Here 1/14 = .07, so 7% is trimmed from each tail.
d. For n = 13, Σx = (119.7692)(13) = 1,557. For n = 14, Σx = 1,557 + 159 = 1,716, so x̄ = 1716/14 = 122.5714, or 122.6.

34.
a. The sum of the n = 11 data points is 514.90, so x̄ = 514.90/11 = 46.81.
b. The sample size (n = 11) is odd, so there will be a middle value. Sorting from smallest to largest: 4.4 16.4 22.2 30.0 33.1 36.6 40.4 66.7 73.7 81.5 109.9. The sixth value, 36.6, is the middle, or median, value.
The mean differs from the median because the largest sample observations are much further from the median than are the smallest values.

c. Deleting the smallest (x = 4.4) and largest (x = 109.9) values, the sum of the remaining 9 observations is 400.6. The trimmed mean x̄tr is 400.6/9 = 44.51. The trimming percentage is 100(1/11) ≈ 9.1%. x̄tr lies between the mean and median.

35.
a. The sample mean is x̄ = 100.4/8 = 12.55. The sample size (n = 8) is even, so the sample median is the average of the (n/2) and (n/2) + 1 values. Sorting the 8 values from smallest to largest: 8.0 8.9 11.0 12.0 13.0 14.5 15.0 18.0; the fourth and fifth values are 12 and 13, so the sample median is (12.0 + 13.0)/2 = 12.5. The 12.5% trimmed mean requires that we first trim (.125)(n) = 1 value from each end of the ordered data set, then average the remaining 6 values. The 12.5% trimmed mean x̄tr(12.5) is 74.4/6 = 12.4. All three measures of center are similar, indicating little skewness to the data set.

b. The smallest value (8.0) could be increased to any number below 12.0 (a change of less than 4.0) without affecting the value of the sample median.

c. The values obtained in part (a) can be used directly. For example, the sample mean of 12.55 psi could be re-expressed as (12.55 psi) × (1 ksi / 2.2 psi) = 5.70 ksi.

36.
a. A stem-and-leaf display of this data appears below:

    32 | 55
    33 | 49
    34 |
    35 | 6699        stem: tens
    36 | 34469       leaf: ones
    37 | 03345
    38 | 9
    39 | 2347
    40 | 23
    41 |
    42 | 4

The display is reasonably symmetric, so the mean and median will be close.

b. The sample mean is x̄ = 9638/26 = 370.7. The sample median is x̃ = (369 + 370)/2 = 369.50.

c. The largest value (currently 424) could be increased by any amount. Doing so will not change the fact that the middle two observations are 369 and 370, and hence the median will not change.
However, the value x = 424 can not be changed to a number less than 370 (a change of 424 − 370 = 54) since that would lower the value(s) of the two middle observations.

d. Expressed in minutes, the mean is (370.7 sec)(1 min/60 sec) = 6.18 min; the median is 6.16 min.

37. x̄ = 12.01, x̃ = 11.35, x̄tr(10) = 11.46. The median or the trimmed mean would be good choices because of the outlier 21.9.

38.
a. The reported values are (in increasing order) 110, 115, 120, 120, 125, 130, 130, 135, and 140. Thus the median of the reported values is 125.
b. 127.6 is reported as 130, so the median is now 130, a very substantial change. When there is rounding or grouping, the median can be highly sensitive to a small change.

39.
a. Σxᵢ = 16.475, so x̄ = 16.475/16 = 1.0297; x̃ = (1.007 + 1.011)/2 = 1.009.
b. 1.394 can be decreased until it reaches 1.011 (the larger of the two middle values), i.e., by 1.394 − 1.011 = .383. If it is decreased by more than .383, the median will change.

40. x̃ = 60.8, x̄tr(25) = 59.3083, x̄tr(10) = 58.3475, x̄ = 58.54. All four measures of center have about the same value.

41.
a. 7/10 = .70.
b. x̄ = .70 = proportion of successes.
c. Let s denote the number of successes among the 25 cars; s/25 = .80 gives s = (0.80)(25) = 20, for a total of 20 successes. So 20 − 7 = 13 of the new cars would have to be successes.

42.
a. ȳ = Σyᵢ/n = Σ(xᵢ + c)/n = Σxᵢ/n + nc/n = x̄ + c;
ỹ = median of (x₁ + c, x₂ + c, …, xₙ + c) = median of (x₁, x₂, …, xₙ) + c = x̃ + c.
b. ȳ = Σyᵢ/n = Σ(cxᵢ)/n = cΣxᵢ/n = cx̄;
ỹ = median of (cx₁, cx₂, …, cxₙ) = c · median(x₁, x₂, …, xₙ) = cx̃.

43. median = (57 + 79)/2 = 68.0, 20% trimmed mean = 66.2, 30% trimmed mean = 67.5.

Section 1.4

44.
a. range = 49.3 − 23.5 = 25.8
b.
      xᵢ  | (xᵢ − x̄) | (xᵢ − x̄)² |    xᵢ²
    29.5  |  -1.53   |   2.3409  |   870.25
    49.3  |  18.27   | 333.7929  |  2430.49
    30.6  |  -0.43   |   0.1849  |   936.36
    28.2  |  -2.83   |   8.0089  |   795.24
    28.0  |  -3.03   |   9.1809  |   784.00
    26.3  |  -4.73   |  22.3729  |   691.69
    33.9  |   2.87   |   8.2369  |  1149.21
    29.4  |  -1.63   |   2.6569  |   864.36
    23.5  |  -7.53   |  56.7009  |   552.25
    31.6  |   0.57   |   0.3249  |   998.56

Σxᵢ = 310.3, Σ(xᵢ − x̄) = 0, Σ(xᵢ − x̄)² = 443.801, Σxᵢ² = 10,072.41; x̄ = 31.03.

s² = Σ(xᵢ − x̄)²/(n − 1) = 443.801/9 = 49.3112

c. s = √49.3112 = 7.0222

d. s² = [Σxᵢ² − (Σxᵢ)²/n]/(n − 1) = [10,072.41 − (310.3)²/10]/9 = 49.3112

45.
a. x̄ = (1/n)Σxᵢ = 577.9/5 = 115.58. Deviations from the mean: 116.4 − 115.58 = .82, 115.9 − 115.58 = .32, 114.6 − 115.58 = −.98, 115.2 − 115.58 = −.38, and 115.8 − 115.58 = .22.
b. s² = [(.82)² + (.32)² + (−.98)² + (−.38)² + (.22)²]/(5 − 1) = 1.928/4 = .482, so s = .694.
c. Σxᵢ² = 66,795.61, so s² = (1/(n − 1))[Σxᵢ² − (Σxᵢ)²/n] = [66,795.61 − (577.9)²/5]/4 = 1.928/4 = .482.
d. Subtracting 100 from all values gives x̄ = 15.58; all deviations are the same as in part (b), and the transformed variance is identical to that of part (b).

46.
a. x̄ = (1/n)Σxᵢ = 14438/5 = 2887.6. The sorted data is: 2781 2856 2888 2900 3013, so the sample median is x̃ = 2888.
b. Subtracting a constant from each observation shifts the data, but does not change its sample variance (Exercise 16). For example, by subtracting 2700 from each observation we get the values 81, 200, 313, 156, and 188, which are smaller (fewer digits) and easier to work with. The sum of squares of this transformed data is 204210 and its sum is 938, so the computational formula for the variance gives s² = [204210 − (938)²/5]/(5 − 1) = 7060.3.

47. The sample mean, x̄ = (1/n)Σxᵢ = (1/10)(1,162) = 116.2. The sample standard deviation,
s = √([Σxᵢ² − (Σxᵢ)²/n]/(n − 1)) = √([140,992 − (1,162)²/10]/9) = 25.75.
On average, we would expect a fracture strength of 116.2.
In general, the size of a typical deviation from the sample mean (116.2) is about 25.75. Some observations may deviate from 116.2 by more than this and some by less.

48. Using the computational formula, s² = [Σxᵢ² − (Σxᵢ)²/n]/(n − 1) = [3,587,566 − (9638)²/26]/(26 − 1) = 593.3415, so s = 24.36. In general, the size of a typical deviation from the sample mean (370.7) is about 24.4. Some observations may deviate from 370.7 by a little more than this, some by less.

49.
a. Σx = 2.75 + … + 3.01 = 56.80, Σx² = (2.75)² + … + (3.01)² = 197.8040
b. s² = [197.8040 − (56.80)²/17]/16 = 8.0252/16 = .5016, s = .708

50. First, we need x̄ = (1/27)(20,179) = 747.37. Then we need the sample standard deviation,
s = √([24,657,511 − (20,179)²/27]/26) = 606.89.
The maximum award should be x̄ + 2s = 747.37 + 2(606.89) = 1961.16, or in dollar units, $1,961,160. This is quite a bit less than the $3.5 million that was awarded originally.

51.
a. Σx = 2563 and Σx² = 368,501, so s² = [368,501 − (2563)²/19]/18 = 1264.766 and s = 35.564.
b. If y = time in minutes, then y = cx where c = 1/60, so s²ᵧ = c²s²ₓ = 1264.766/3600 = .351 and sᵧ = csₓ = 35.564/60 = .593.

52. Let d denote the fifth deviation. Then .3 + .9 + 1.0 + 1.3 + d = 0, or 3.5 + d = 0, so d = −3.5. One sample for which these are the deviations is x₁ = 3.8, x₂ = 4.4, x₃ = 4.5, x₄ = 4.8, x₅ = 0 (obtained by adding 3.5 to each deviation; adding any other number will produce a different sample with the desired property).

53.
a. lower half: 2.34 2.43 2.62 2.74 2.74 2.75 2.78 3.01 3.46
upper half: 3.46 3.56 3.65 3.85 3.88 3.93 4.21 4.33 4.52
Thus the lower fourth is 2.74 and the upper fourth is 3.88.
b. f_s = 3.88 − 2.74 = 1.14
c. f_s wouldn't change, since increasing the two largest values does not affect the upper fourth.
d. By at most .40 (that is, to anything not exceeding 2.74), since then it will not change the lower fourth.
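The fourths computed in parts (a)–(d) above, and the fourth spread f_s used for the outlier cutoffs in the exercises that follow, can be checked numerically. Below is a minimal Python sketch, assuming the convention used here: each half is the lower/upper half of the sorted data, with the middle observation included in both halves when n is odd (the function names are illustrative, not from the text):

```python
def median(values):
    """Median of a list (sorts a copy first)."""
    v = sorted(values)
    n = len(v)
    return (v[(n - 1) // 2] + v[n // 2]) / 2

def fourths(data):
    """Lower and upper fourths: medians of the two halves of the data.
    When n is odd, the middle observation belongs to both halves."""
    xs = sorted(data)
    n = len(xs)
    half = (n + 1) // 2
    return median(xs[:half]), median(xs[n - half:])

def fourth_spread(data):
    lf, uf = fourths(data)
    return uf - lf

# Exercise 53 data (n = 17; 3.46 is the shared middle observation):
data = [2.34, 2.43, 2.62, 2.74, 2.74, 2.75, 2.78, 3.01, 3.46,
        3.56, 3.65, 3.85, 3.88, 3.93, 4.21, 4.33, 4.52]
print(fourths(data))                  # lower fourth 2.74, upper fourth 3.88
print(round(fourth_spread(data), 2))  # f_s = 1.14
```

Mild and extreme outlier cutoffs then follow as (lower fourth − 1.5·f_s, upper fourth + 1.5·f_s) and (lower fourth − 3·f_s, upper fourth + 3·f_s), the fences applied in Exercises 54 and 55.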
e. Since n is now even, the lower half consists of the smallest 9 observations and the upper half consists of the largest 9. With the lower fourth = 2.74 and the upper fourth = 3.93, f s = 1.19 . 25 the lower quartile. 2 So.5(44.15 = − 40.6 40.7 73.7 would there be any change in the value of the upper quartile.3 or more units below the lower. or above the upper quartile.7) = 36.2 + 66.35 units] to be classified as a mild outlier. of either type.15 units below the lower quartile 26. we conclude that there are no outliers. and therefore. c. An extreme outlier would fall (3)44. The lower half of the data set: 4.9 – 73. That is. The variation seems quite large.7 + 73. in this data set. the IQR = (70. 2 The top half of the data set: 36. in this case.9 is lowered below 73.2 30.7 81.6. Not until the value x = 109. An observation would need to be further than 1. 26 .9 respectively.1 36.1.2 units. Since the minimum and maximum observations in the data are 4. an outlier on the lower side would not be possible since the sheer strength variable cannot have a negative value.4 66.4 22.15) =136.Chapter 1: Overview and Descriptive Statistics 54. a. d.2 – 26.4 and 109.1 − 66.1 b. is (66. There are no outliers.1) = 66. the upper quartile.9. and therefore. Notice that. whose median. the value x = 109. whose median.2 . A boxplot (created in Minitab) of this data appears below: 0 50 100 sheer strength There is a slight positive skew to the data.1) = 44.7) = 70.1) = 132.9 could not be decreased by more than (109.2 + 30.0 ) + 26. is ( 22.0 33.05 units or above the upper quartile [( ] ) [(70.4 16.5 109. The top half of the data is 370 373 373 374 375 389 392 393 394 397 402 403 424. Lower half of the data set: 325 325 334 339 356 356 359 359 363 364 364 366 369. 27 . but it is not far from being symmetric. a. That is. and therefore the upper quartile is 392.5 = 309. the value x = 424 could not be decreased by more than 424-392 = 32 units. There is a slight positive skew to the data. 1.. 
Observations that are further than 49.5 or less) or more than 49. So.Chapter 1: Overview and Descriptive Statistics 55. or above the upper.5 = 441. is 359 (the 7th observation in the sorted list).5 units above the upper quartile (greater than 392+49. seems large (the spread 424-325 = 99 is a large percentage of the median/typical value) 320 370 420 Escape time d. whose median. Not until the value x = 424 is lowered below the upper quartile value of 392 would there be any change in the value of the upper quartile. The variation.5(33) = 49. A boxplot (created by Minitab) of this data appears below. and therefore the lower quartile.5 and 3(IQR) = 3(33) = 99. quartile.5(IQR) = 1. however. b. 359-49. no 'extreme' outliers either). the IQR = 392 359 = 33.e. Since the minimum and maximum observations in the data are 325 and 424. we conclude that there are no mild outliers in this data (and therefore.5) are classified as 'mild' outliers.5 below the lower quartile (i. c. 'Extreme' outliers would fall 99 or more units below the lower. whose median. 2 = 164.2. However. There is one extreme outler (x=511). except for the two outliers identified in part (a). * 120 58. A boxplot of this data appears below. a typical value.5(216.2 = 248. as measured by the median. 28 . 125.5(IQR) = 1.8-196. 0 100 200 300 400 500 aluminum There is a slight positive skew to this data. The only outlier that exists is from machine 1.8+31. the variation in the data is relatively small.6 or above 216.2 and 3(IQR) = 3(216. A boxplot (created in Minitab) of this data appears below. the variation is still moderately large. a. Mild outliers: observations below 196-31.4.4 = 133. 57.8-196. 1.8 is an extreme outlier and 250.6 or above 216.2 is a mild outlier.Chapter 1: Overview and Descriptive Statistics 56. Even when removing the outlier.4 = 279. Extreme outliers: observations below 196-62. Of the observations given.0) = 31.8+62.0) = 62. There is a bit of positive skew to the data but. b. 
seems to be about the same for the two machines. * 140 160 180 200 220 240 260 x The most noticeable feature of the comparative boxplots is that machine 2’s sample values have considerably more variation than does machine 1’s sample values. A comparative boxplot appears below. the Non-Ed data has more variability then the Ed data.65) = 6. Therefore.3. 9. ED: mild outliers are less than .65) = -3. the two largest observations (11. Extreme outliers are less than . Non-ED: mild outliers are less than . Note that there are no mild outliers in the data.Chapter 1: Overview and Descriptive Statistics 59.6. IQR = 2.7)/2 = 1. the upper quartile (median of the upper half of the data) is 7.5(2.75 + 1.3.9.1.725.3(2.75 .65) = -7..9. The outliers in the ED data are clearly visible.65.1 = 2.1+.1.9 .1 )/2 = .4 (the 14th value in the sorted list of data). The lower quartile (median of the lower half of the data.1 .3 .1 or greater than 7. ED: median = . IQR = 7.75 + 3(2.5+1.6) = 19.75.65) = 10.2) are mild outliers. Non-ED ED 0 10 Concentration (mg/L) 29 20 . So. 21. The upper quartile is (2.85 or greater than 2.8)/2 = 2. Therefore. including the median. hence there can not be any extreme outliers either.875 or greater than 2.5(7. There is noticeable positive skewness in both samples. the typical values of the ED data tend to be smaller than those for the Non-ED data.6.1 .6) = -11. c.0) are extreme outliers and the next two largest values (8.9 + 1. There are no outliers at the lower end of the data.7.5(7. The lower quartile (median of the lower 25 observations) is .1. since n is odd) is ( ..5(2.3 = 7.7+2. a. Non-ED: median = (1. b.7. 61. The production welds data does contain 2 outliers. 30 . Variability and the 'typical' values in the data increase a little at the 12 noon and 2 p. The distributions at the other times are fairly symmetric.Chapter 1: Overview and Descriptive Statistics 60. 
61. A comparative boxplot appears below. [Comparative boxplot: burst strength, 5000 to 8000, type: test vs cannister] The burst strengths for the test nozzle closure welds are quite different from the burst strengths of the production canister nozzle welds. The test welds have much higher burst strengths, and their burst strengths are much more variable. The production welds have more consistent burst strength and are consistently lower than the test welds. The production welds data does contain 2 outliers.

Supplementary Exercises

62. To somewhat simplify the algebra, begin by subtracting 76,000 from each observation in the original data. This transformation will affect each data value and the mean; it will not affect the standard deviation. After the transformation, y-bar = 831, so n(y-bar) = (4)(831) = 3324 = y1 + y2 + y3 + y4, and therefore y2 + y3 = 3324 - y1 - y4 = 3324 - 683 - 1048 = 1593. Also, s = 180, so

(n - 1)s^2 = 3(180)^2 = 97,200 = sum(yi^2) - (3324)^2/4,

giving sum(yi^2) = 2,859,444. Next,

y2^2 + y3^2 = 2,859,444 - y1^2 - y4^2 = 2,859,444 - (683)^2 - (1048)^2 = 1,294,651.

By substituting y3 = (1593 - y2) we obtain the equation y2^2 + (1593 - y2)^2 = 1,294,651, i.e., y2^2 - 1593y2 + 621,499 = 0. Solving this quadratic, y2 = 682.8635 and y3 = 1593 - 682.8635 = 910.1365. Adding 76,000 back, x2 = 76,683 and x3 = 76,910 (to the nearest integer).

63. Summary quantities for the three data sets:

Flow rate   Lower quartile   Median   Upper quartile   IQR   1.5(IQR)   3(IQR)
125         2.9              3.8      4.1              1.2   1.80       3.6
160         3.4              4.4      4.5              1.1   1.65       3.3
200         3.7              4.2      4.4              0.7   1.05       2.1

There are no outliers in the three data sets. However, as the comparative boxplot below shows, the three data sets differ with respect to their central values (the medians are different), and the data for flow rate 160 is somewhat less variable than the other data sets. Flow rates 125 and 200 also exhibit a small degree of positive skewness. [Comparative boxplot: Uniformity (%), 3 to 5, by flow rate 125/160/200]
64. A stem-and-leaf display of this data appears below (stem: ones, leaf: tenths):

6 | 7
7 | 34
8 | 4589
9 | 1
10 | 12667789
11 | 122499
12 | 2
13 | 1

Here n = 27, x-bar = 9.9556, the median is 10.6, and s = 1.7594. The lower fourth = 8.85 and the upper fourth = 11.15, so fs = 2.3. Since 8.85 - (1.5)(2.3) = 5.4 and 11.15 + (1.5)(2.3) = 14.6, there are no outliers. [Boxplot: Radiation, 6 to 13] The distribution is skewed to the left (the mean is below the median).

65.
a. A representative value for this data would be x = 90. The histogram appears below. [Histogram: Fracture strength (MPa), 81 to 99, relative frequency up to .25] The variation in the data is not small, since the spread of the data (99 - 81 = 18) constitutes about 20% of the typical value of 90.
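Several exercises nearby (Exercise 66 in particular) recover the mean, standard deviation, and coefficient of variation from the summary sums sum(x) and sum(x^2) alone. A minimal sketch of that computation (mine, not the manual's):

```python
# Coefficient of variation from summary sums, via the shortcut variance formula
# s^2 = [sum(x^2) - (sum x)^2 / n] / (n - 1).
from math import sqrt

def cv_from_sums(sum_x, sum_x2, n):
    mean = sum_x / n
    s2 = (sum_x2 - sum_x ** 2 / n) / (n - 1)
    return sqrt(s2) / mean           # coefficient of variation s / x-bar

cv_co = cv_from_sums(735, 145645, 4)      # CO data: 59.41 / 183.75
cv_hc = cv_from_sums(96.8, 2618.42, 4)    # HC data: 9.59 / 24.2
```

Despite the CO data's much larger standard deviation, its CV is the smaller of the two, which is the point of Exercise 66(b).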
The histogram is reasonably symmetric, unimodal, and somewhat bell-shaped.

b. The proportion of the observations that are at least 85 is 1 - (6 + 7)/169 = .9231. The proportion less than 95 is 1 - (22 + 13 + 3)/169 = .7751.

c. x = 90 is the midpoint of the class 89-<91, which contains 43 observations (a relative frequency of 43/169 = .2544). Therefore about half of this frequency, .1272, should be added to the relative frequencies for the classes to the left of x = 90. That is, the approximate proportion of observations that are less than 90 is .0355 + .0414 + .1006 + .1775 + .1272 = .4822.

66.
a. CO data: sum(xi^2) = 145,645 and sum(xi) = 735, so the mean of the CO data is 735/4 = 183.75, s^2 = [145,645 - (735)^2/4]/3 = 3529.58, and the sample standard deviation is s = 59.41.
HC data: sum(xi^2) = 2618.42 and sum(xi) = 96.8, so the mean of the HC data is 96.8/4 = 24.2, s^2 = [2618.42 - (96.8)^2/4]/3 = 91.95, and the sample standard deviation is s = 9.59.
b. The coefficient of variation of the CO data is 59.41/183.75 = .3233, or 32.33%. The coefficient of variation of the HC data is 9.59/24.2 = .3963, or 39.63%. Thus, even though the CO data has a larger standard deviation than does the HC data, it actually exhibits less variability (in percentage terms) around its average than does the HC data.

67.
a. y-bar = (1/n) sum(yi) = (1/n) sum(a xi + b) = a x-bar + b, and
s_y^2 = sum(yi - y-bar)^2 / (n-1) = sum(a xi + b - (a x-bar + b))^2 / (n-1) = sum(a xi - a x-bar)^2 / (n-1) = a^2 sum(xi - x-bar)^2 / (n-1) = a^2 s_x^2.
b. With x in degrees C and y in degrees F, y = (9/5)x + 32, so y-bar = (9/5)(87.3) + 32 = 189.14 and s_y^2 = (9/5)^2 (1.04)^2 = 3.5044, so s_y = 1.872.

68.
a. (d/dc) sum(xi - c)^2 = -2 sum(xi - c) = 0 implies sum(xi) - nc = 0, so nc = sum(xi) and c = x-bar.
b. Since c = x-bar minimizes sum(xi - c)^2, sum(xi - x-bar)^2 is smaller than sum(xi - mu)^2.

69.
a. With sum(xi) = 163.2, deleting the smallest and largest observations gives the (1/15)(100)% = 6.7% trimmed mean: (163.2 - 8.5 - 15.6)/13 = 10.70. Deleting the two smallest and two largest observations gives the (2/15)(100)% = 13.3% trimmed mean: 116.6/11 = 10.60.
b. Since 10% trimming lies midway between these two trimming percentages, interpolate: the 10% trimmed mean = (1/2)(10.70) + (1/2)(10.60) = 10.65.

70.
a. A comparative boxplot appears below. [Comparative boxplot: Oxygen Consumption, 0 to 25, Treadmill vs Weight] There is a significant difference in the variability of the two samples. The weight training produced much higher oxygen consumption, on average, than the treadmill exercise, with the median consumptions being approximately 20 and 11 liters, respectively.
b. Subtracting the y from the x for each subject, the majority of the differences are positive, which suggests that the weight training produced higher oxygen consumption for most subjects. The median difference is about 6 liters. [Dotplot of the differences, 0 to 15]

71.
a. The mean, median, and trimmed mean are virtually identical, which suggests symmetry. If there are outliers, they are balanced. The range of values is only 25.5,
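Exercise 67's identity (y = ax + b implies y-bar = a x-bar + b and s_y = |a| s_x) is easy to verify numerically. A quick check using hypothetical temperature data (the values below are assumed, not the exercise's):

```python
# Numerical check of the linear-transformation rules for the mean and
# standard deviation, using the Celsius-to-Fahrenheit map of Exercise 67(b).
from statistics import mean, stdev

celsius = [85.2, 86.4, 87.1, 87.9, 88.1, 89.1]   # hypothetical sample
fahrenheit = [9 / 5 * c + 32 for c in celsius]

assert abs(mean(fahrenheit) - (9 / 5 * mean(celsius) + 32)) < 1e-9
assert abs(stdev(fahrenheit) - 9 / 5 * stdev(celsius)) < 1e-9
```

The additive constant b shifts the mean but cancels out of every deviation (xi - x-bar), which is why it never reaches the standard deviation.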
but half of the values are between 132.95 and 138.25.
b. The boxplot also displays the symmetry, and adds a visual of the outliers: two on the lower end, and one on the upper. [Boxplot: strength, 120 to 150]

72. A table of summary statistics, a stem-and-leaf display, and a comparative boxplot are below. The healthy individuals have higher receptor binding measure on average than the individuals with PTSD. There is also more variation in the healthy individuals' values. The distribution of values for the healthy is reasonably symmetric, while the distribution for the PTSD individuals is negatively skewed. The box plot indicates that there are no outliers, and confirms the above comments regarding symmetry and skewness.

          Mean    Median   Std Dev   Min   Max
PTSD      32.92   37       9.93      10    46
Healthy   52.23   51       14.86     23    72

[Back-to-back stem-and-leaf display: stem = tens, leaf = ones, PTSD individuals vs Healthy individuals]
[Comparative boxplot: Receptor Binding, 10 to 70, Healthy vs PTSD]

73.
a. x-bar = .9255, s = .0809; the median is .93, the lower fourth = .855, and the upper fourth = .96. A stem-and-leaf display (stem = tenths, leaf = hundredths):

0.8 | 11556
0.9 | 2233335566
1.0 | 0566

The data appears to be a bit skewed toward smaller values (negatively skewed). The mean and the median are close in value.
b. There are no outliers. [Boxplot: Cadence, 0.8 to 1.0]

74.
a. Mode = .93. It occurs four times in the data set.
b. The Modal Category is the one in which the most observations occur.

75.
a. The median is the same (371) in each plot and all three data sets are very symmetric. Moreover, all three have the same minimum value (350) and the same maximum value (392).
In addition, all three data sets have the same lower (364) and upper (378) quartiles. So all three boxplots will be identical.

b. A comparative dotplot is shown below. [Comparative dotplots: Type 1, Type 2, Type 3; scale 352.0 to 392.0] These graphs show that there are differences in the variability of the three data sets. They also show differences in the way the values are distributed in the three data sets.

c. The boxplot in (a) is not capable of detecting the differences among the data sets. The primary reason is that boxplots give up some detail in describing data because they use only 5 summary numbers for comparing data sets.

Note: The definition of lower and upper quartile used in this text is slightly different than the one used by some other authors (and software packages). Technically speaking, the median of the lower half of the data is not really the first quartile, although it is generally very close. Instead, the medians of the lower and upper halves of the data are often called the lower and upper hinges. Our boxplots use the lower and upper hinges to define the spread of the middle 50% of the data, but other authors sometimes use the actual quartiles for this purpose. The difference is usually very slight, usually unnoticeable, but not always. For example, in the data sets of this exercise, a comparative boxplot based on the actual quartiles (as computed by Minitab) is shown below. The graph shows substantially the same type of information as those described in (a), except the graphs based on quartiles are able to detect the slight differences in variation between the three data sets. [Comparative boxplot: Type of wire 1 to 3, MPa 350 to 390]

76. The measures that are sensitive to outliers are the mean and the midrange. The mean is sensitive because all values are used in computing it. The midrange is sensitive because it uses only the most extreme values in its computation. The median, the trimmed mean, and the midhinge are not sensitive to outliers. The median is the most resistant to outliers because it uses only the middle value (or values) in its computation. The trimmed mean is somewhat resistant to outliers; the larger the trimming percentage, the more resistant the trimmed mean becomes. The midhinge, which uses the quartiles,
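The note above, that "quartile" is not defined the same way everywhere, can be seen directly in the standard library, which exposes two common conventions. On the same data they can disagree, just as the hinges and Minitab's quartiles do:

```python
# Two quartile conventions from the Python standard library. "exclusive"
# interpolates at positions i*(n+1)/4 (roughly the Minitab convention);
# "inclusive" interpolates within the observed range (hinge-like on this data).
from statistics import quantiles

data = [1, 2, 3, 4, 5, 6, 7, 8, 9]
q_exclusive = quantiles(data, n=4, method="exclusive")
q_inclusive = quantiles(data, n=4, method="inclusive")
```

Here the two methods agree on the median but give different lower and upper quartiles, which is exactly the "usually very slight" discrepancy the manual describes.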
is reasonably resistant to outliers because both quartiles are resistant to outliers.

77.
a. A stem-and-leaf display (stem: ones, leaf: tenths):

0 | 2355566777888
1 | 0000135555
2 | 00257
3 | 0033
4 | 0057
5 | 044
6 |
7 | 05
8 | 8
9 | 0
10 | 3
HI  22.0, 24.5

b.
Interval   Frequency   Rel. Freq.   Density
0 -< 2     23          .500         .250
2 -< 4     9           .196         .098
4 -< 6     7           .152         .076
6 -< 10    4           .087         .022
10 -< 20   1           .022         .002
20 -< 30   2           .043         .004

[Density histogram: Repair Time, 0 to 30]

78.
a. Since the constant x-bar is subtracted from each x value to obtain each y value, and addition or subtraction of a constant doesn't affect variability, s_y^2 = s_x^2 and s_y = s_x.
b. Let c = 1/s, where s is the sample standard deviation of the x's and also (by part a) of the y's. Then s_z = c s_y = (1/s)s = 1, and s_z^2 = 1. That is, the "standardized" quantities z1, ..., zn have a sample variance and standard deviation of 1.

79.
a. sum_{i=1}^{n+1} xi = sum_{i=1}^{n} xi + x_{n+1} = n x-bar_n + x_{n+1}, so x-bar_{n+1} = [n x-bar_n + x_{n+1}] / (n+1).
b. n s_{n+1}^2 = sum_{i=1}^{n+1} (xi - x-bar_{n+1})^2 = sum_{i=1}^{n+1} xi^2 - (n+1) x-bar_{n+1}^2
= {sum_{i=1}^{n} xi^2 - n x-bar_n^2} + {x_{n+1}^2 + n x-bar_n^2 - (n+1) x-bar_{n+1}^2}
= (n-1) s_n^2 + {x_{n+1}^2 + n x-bar_n^2 - (n+1) x-bar_{n+1}^2}.
When the expression for x-bar_{n+1} from part (a) is substituted, the expression in braces simplifies to n(x_{n+1} - x-bar_n)^2 / (n+1), as desired.
c. x-bar_16 = [15(12.58) + 11.8]/16 = 200.5/16 = 12.53, and
s_16^2 = [14(.512)^2 + (15/16)(11.8 - 12.58)^2]/15 = .245 + .038 = .283,
so the standard deviation is s_16 = sqrt(.283) = .532.
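Exercise 79's updating formulas can be implemented directly. A sketch (mine, not the manual's) that folds one new observation into a running mean and sample variance without revisiting the first n values:

```python
# One-step update of the sample mean and sample variance (Exercise 79):
# xbar_{n+1} = (n*xbar_n + x) / (n+1)
# s^2_{n+1}  = [(n-1)*s^2_n + n*(x - xbar_n)^2 / (n+1)] / n
def update_mean_var(n, mean_n, var_n, x):
    mean_new = (n * mean_n + x) / (n + 1)
    var_new = ((n - 1) * var_n + n * (x - mean_n) ** 2 / (n + 1)) / n
    return mean_new, var_new

# Part (c): n = 15, xbar = 12.58, s = .512, new observation 11.8.
m, v = update_mean_var(15, 12.58, 0.512 ** 2, 11.8)   # m = 12.53, sqrt(v) = .532
```

Note that the deviation is taken from the old mean x-bar_n, with the n/(n+1) factor compensating, which is what part (b)'s algebra establishes.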
80.
a. A density histogram of this data appears below. [Density histogram: Bus Route Length, 5 to 45, density 0.00 to 0.06]

b. First compute (.50)(391 + 1) = 196. Thus the median (50th percentile) should be the 196th ordered value. The 174th ordered value lies in the interval 16 -< 18, and ordered observations 175 to 216 lie in the interval 18 -< 20. The 196th observation is about in the middle of these. Thus, we would say the median is roughly 19.

c. First compute (.90)(391 + 1) = 352.8. Thus, the 90th percentile should be about the 352nd ordered value. The 351st ordered value lies in the interval 28 -< 30. The 352nd ordered value lies in the interval 30 -< 35. There are 27 values in the interval 30 -< 35. We do not know how these values are distributed; however, the smallest value (i.e., the 352nd value in the data set) cannot be smaller than 30. So, the 90th percentile is roughly 30.

d.
Proportion less than 20: 216/391 = .552
Proportion at least 30: 40/391 = .102

81. Assuming that the histogram is unimodal, then there is evidence of positive skewness in the data, since the median lies to the left of the mean (for a symmetric distribution, the mean and median would coincide). For more evidence of skewness, compare the distances of the 5th and 95th percentiles from the median: median - 5th percentile = 500 - 400 = 100, while 95th percentile - median = 720 - 500 = 220. Thus, the largest 5% of the values (above the 95th percentile) are further from the median than are the lowest 5%. The same skewness is evident when comparing the 10th and 90th percentiles to the median: median - 10th percentile = 500 - 430 = 70, while 90th percentile - median = 640 - 500 = 140. Finally, note that the largest value (925) is much further from the median (925 - 500 = 425) than is the smallest value (500 - 220 = 280), again an indication of positive skewness.
82.
a. A time series plot of the temperatures shows some evidence of a cyclical pattern. [Time series plot: Temperature vs Index, 40 to 60]

b. With alpha = .1: x-bar_1 = x_1 = 47.0; x-bar_2 = (.1)(54) + (.9)(47) = 47.7; x-bar_3 = (.1)(53) + (.9)(47.7) = 48.23, which is approximately 48.2; and so on. Repeating the recursion with alpha = .5 gives, e.g., x-bar_2 = (.5)(54) + (.5)(47) = 50.5. Comparing the two smoothed series over all 14 time points, alpha = .1 gives a smoother series.

c. x-bar_t = alpha x_t + (1 - alpha) x-bar_{t-1}
= alpha x_t + (1 - alpha)[alpha x_{t-1} + (1 - alpha) x-bar_{t-2}]
= alpha x_t + alpha(1 - alpha) x_{t-1} + (1 - alpha)^2 [alpha x_{t-2} + (1 - alpha) x-bar_{t-3}]
= ...
= alpha x_t + alpha(1 - alpha) x_{t-1} + alpha(1 - alpha)^2 x_{t-2} + ... + alpha(1 - alpha)^{t-2} x_2 + (1 - alpha)^{t-1} x_1.
Thus, x-bar_t depends on x_t and all previous values. As k increases, the coefficient on x_{t-k} decreases (further back in time implies less weight).

d. Not very sensitive, since (1 - alpha)^{t-1} will be very small.
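The exponential smoothing recursion of Exercise 82 is a three-line function. A minimal sketch (the full 14-point temperature series is not reproduced here, only the first three values used in part (b)):

```python
# Exponential smoothing: xbar_t = alpha*x_t + (1 - alpha)*xbar_{t-1},
# seeded with xbar_1 = x_1.
def smooth(series, alpha):
    out = [series[0]]
    for x in series[1:]:
        out.append(alpha * x + (1 - alpha) * out[-1])
    return out

s = smooth([47, 54, 53], alpha=0.1)    # approximately [47, 47.7, 48.23]
```

Expanding the recursion, as part (c) does, shows each output is a weighted average of all past observations with geometrically decaying weights, so a smaller alpha spreads the weight further back and smooths more.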
RRS. RLR. RSS. RRS. 2341. SRR } 47 . LLR. SRL. Event A = { RRR. LLS. Event D = { RRL. RLR. RRS. 1432 } c.CHAPTER 2 Section 2. LSS } Using similar reasoning. Event D′ contains outcomes where all cars go the same direction. 2431. 1423. a. 1432. 3214. RLR. 1423. 3124. 1342. RSR. SLL. SSL. 2413. SLR } c. LRL. RRS. RSL. LLL. RSR. 4231 } d. Event C = { RRL. LRR. we see that the compound event C∩D = C: C∩D = { RRL. RLS. 3214. 2314. SSR. SSR. RLR. LSL. 4213. Event B contains the outcomes where 2 is first or second: B = { 2314. LSL. SRR } d. S = { 1324. the compound event C∪D = D: C∪D = { RRL. SLL. Event B = { RLS. The compound event A∪B contains the outcomes in A or B or both: A∪B = {1324. SLR } 2. 3241. SRS. 4213. or they all go different directions: D′ = { RRR. 4123. 1342. 4132. 3214. 2413. Because Event D totally encloses Event C. SLS. Event A contains the outcomes where 1 is first in the list: A = { 1324. 2341. 4231 } b. 4213. 2431. LRR. LRS. LRL. 4231 } a. 2341.1 1. SRS. RSL. RSR. 1423. SLS. 2431. SRL. LLS. 2314. SRR. SSS } b. RSS. SRR. RSR. 3241. LRR. 2413. LLR. RLL. SSS. LSS } e. SSL. LSR. RLL. LRR. 1342. LLL. 3142. 1432. FFF } Event A∪C = { SSS. a. then at least one of the other two components must work (at least one S in the 2nd and 3rd positions: Event C = { SSS. 9 e. FSS } c. or that at most all of them are variable: outcomes 1. 5.Chapter 2: Probability 3. Outcome numbers 1. This cannot happen. SFS. 3. 3. FSF. SSF. SFS } Event B∪C = { SSS.9. 5 . The UNION described is the event that either exactly three are fixed. 2. 16. SFS } 4.5. SFS. FFS. SSF. 48 .2. Event A = { SSF. SSF. Event C′ = { SFF. f. FSS. 2. SFS. FSS } Event B∩C = { SSS SSF. 16 d.16.3. For Event C. In words. 3. a. or that all four are the same: outcomes 1. FSS } b. the UNION described is the event that either all of the mortgages are variable. Event B = { SSS. Outcome numbers 1. FSS } Event A∩C = { SSF. 5. SFS. SFS } d. Outcome numbers 2.9 c. 
The INTERSECTION in words is the event that exactly three are fixed AND that all four are the same. SSF. Outcome 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 Home Mortgage Number 1 2 3 4 F F F F F F F V F F V F F F V V F V F F F V F V F V V F F V V V V F F F V F F V V F V F V F V V V V F F V V F V V V V F V V V V b. The INTERSECTION described is the event that all of the mortgages are fixed: outcome 1. (There are no outcomes in common) : b ∩ c = ∅. 9. the system must have component 1 working ( S in the first position). 9. Outcome Numbers 1.Chapter 2: Probability 5. Outcome Numbers 6. 22 d. Outcome Number 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 Outcome 111 112 113 121 122 123 131 132 133 211 212 213 221 222 223 231 232 233 311 312 313 321 322 323 331 332 333 b. Outcome Numbers 1. 8. 20. 27 c. 25. 27 49 . 12. 19. 16. 14. 21. 3. 7. a. BAABABA. AABABBA. BAAABBA. BABAAAB. 15 d. BABABAA. S = {BBBAAAA. Outcomes 3. AAAABBB} b. AABBABA. 14. AABAABB. 9. BABBAAA. ABABBAA. AAABABB. ABBAAAB. AABAABB. AAABBAB. 12. Outcomes 13. 15 a. Outcome Number 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 Outcome 123 124 125 213 214 215 13 14 15 23 24 25 3 4 5 b. ABAAABB. ABAABAB. BBABAAA. AABBAAB. 15 c. 11. AAABBBA. ABAABBA. AABABAB} 7. 13. ABBAABA. AABBBAA. a. BAABBAA. BAAABAB. ABABABA. BAABAAB.Chapter 2: Probability 6. BBAAABA. Outcomes 10. 6. {AAAABBB. BBAABAA. 14. BAAAABB. ABABAAB. AABABAB. BABAABA. ABBBAAA. BBAAAAB. 12. AAABABB. 50 . AAABBAB. ABBABAA. A1 ∪ A2 ∪ A3 b. A1 ∩ A2′ ∩ A3′ 51 . a. A1 ∩ A2 ∩ A3 c.Chapter 2: Probability 8. (A 1 ∩ A 2 ′∩ A 3 ′) ∪ (A 1 ′ ∩ A 2 ∩ A 3 ′) ∪ (A 1 ′∩ A 2 ′∩ A 3 ) e.Chapter 2: Probability d. A 1 ∪ (A 2 ∩ A 3 ) 52 . Using the diagram on the right above. Chrys} are three mutually exclusive events. the union of A′ and B′ is represented by the areas that have either shading or stripes or both. a. 53 . Buick}. Pont}. Merc}. On the right. In the diagram on the left. yet there is no outcome common to all three events. In the diagram below. 10. 
the shaded area is (A∪B)′. Pont. the shaded area is A′. F = {Pont. Ford}. C = {Plym. the striped area is B′.g. let E = {Chev. the shaded area represents (A∩B)′. G = {Buick. These events are not mutually exclusive (e. a. A = {Chev. E and F have an outcome in common). Both of the diagrams display the same area. b.Chapter 2: Probability 9. No. Buick}. B = {Ford. b. These two diagrams display the same area. and the intersection A′ ∩ B′ occurs where there is BOTH shading and stripes. P(A ∪ B)′ = 1 .57.64 c.22 +.07 b. #3: P(A 1 ∪ A 2 ∪ A 3 ) = P(A 1 ) + P(A 2 ) + P(A 3 ) .17 54 .53 = .11 = . e.65 = .53 awarded none of the three projects: P( A 1 ′ ∩ A 2 ′ ∩ A 3 ′ ) = 1 – P(awarded at least one) = 1 .47.. .. #2.25 a.15 + . ..30 c.01 = . awarded neither #1 or #2: P(A 1 ′ ∩ A 2 ′) = P[(A 1 ∪ A 2 ) ′] = 1 .36 b..2 11. P(A ∪ B) = .50 + .25 + .P(A 1 ∪ A 2 ) = 1 .65 b.22 + ...Chapter 2: Probability Section 2.25 = .50 .25 = . P(A ∩ B′) = P(A) – P(A ∩ B) = . d.P(A 1 ∩ A 2 ) = . a. Let event A = selected customer owns stocks..18 + .05 = . awarded either #1 or #2 (or both): P(A 1 ∪ A 2 ) = P(A 1 ) + P(A 2 ) .25 .35 c.28 .05 . awarded #3 but neither #1 nor #2: P( A 1 ′ ∩ A 2 ′ ∩ A 3 ) = P(A 3 ) . A ∩ B′ .40 . Then the probability that a selected customer does not own a stock can be represented by P(A′) = 1 .. a.25) = 1 ..01 = . awarded at least one of #1.P(A 1 ∩ A 2 ) . This could also have been done easily by adding the probabilities of the funds that are not stocks.10 + .. 13.43 = .28 .P(A 1 ∩ A 3 ) – P(A 2 ∩ A 3 ) + P(A 1 ∩ A 2 ∩ A 3 ) = .11 -.05 .36 = .07+ .P(A 1 ∩ A 3 ) – P(A 2 ∩ A 3 ) + P(A 1 ∩ A 2 ∩ A 3 ) = .P(A) = 1 – (..07 + . 12. 47 + . either (neither #1 nor #2) or #3: P[( A 1 ′ ∩ A 2 ′ ) ∪ A 3 ] = P(shaded region) = P(awarded none) + P(A 3 ) = .75 Alternatively.28 = .Chapter 2: Probability f. answers to a – f can be obtained from probabilities on the accompanying Venn diagram 55 . 005 = .Chapter 2: Probability 14. and PDC.. P( C ranked first and D last) = P({CPD}) = 15. 
corresponding to the outcomes CDP. There are six simple events. Let event E be the event that at most one purchases an electric dryer. the desired probability = 1 – P(A) – P(B) = 1 .3 Shaded region = event of interest = (A ∩ B′) ∪ (A′ ∩ B) a. DCP. CPD.P(A ∪ B) = . All other possible outcomes are those in which at least one of each type is purchased... Let event A be the event that all five purchase gas. P(E′) = 1 – P(E) = 1 . The probability assigned to each is 16 .879 a. P(shaded region) = P(A ∪ B) . so P(A ∩ B) = P(A) + P(B) . DPC.572 b. P( C ranked first) = P( {CPD.116 .6 b. Then E′ is the event that at least two purchase electric dryers. Thus. b. CDP} ) = c.P(A ∩ B) = .9 = . a. Let event B be the event that all five purchase electric..9 . PCD..7 .6 = .333 . P(A ∪ B) = P(A) + P(B) . 16.8 +. 1 6 56 + 16 = 1 6 2 6 = .428 = .P(A ∩ B). P(A ∪ B) = P(A) + P(B) = .10 + . P(B ∩ A′) = P(B) .70 c. .Chapter 2: Probability 17. {S2. P(A′) = 1 – P(A) = 1 .{S3.60 Let event A be that the selected joint was found defective by inspector A. respectively.0751 . Let event A be that a 75 W bulb is selected first..0724 + . 000 . . The only way for the desired event NOT to happen is if a 75 W bulb is selected first. respectively.80 (since A and B are mutually exclusive events) d.{S3.C1}.30 + . = . {S3.1159 = .P(A ∪ B) =1 .C2}) = 1 – ( . Let C1 and C2 represent the unsafe conditions and unrelated to conditions. P(A∪B) = a.05 = .55 57 .C1}.30 = .23 c. P(A′ ∩ B′) = P[(A ∪ B) ′] (De Morgan’s law) = 1 .C2}. 000 = .C2}. S2 and S3 represent the swing and night shifts.08 + .8841 The desired event is B ∩ A′..P({S1.0435 Let S1. 6 15 = 159 = . So P(A′) = 1 – P(A) = 1 − 19. 20. 18.{S2. The simple events are {S1. Compound event A∪B is the event that the selected joint was found defective by at least one of the two inspectors. P(A) = event B be analogous for inspector B.35) = . 000 = 8841 10.C1}.. 
The probabilities do not add to 1 because there are other software packages besides SPSS and SAS for which requests could be made. 000 724 10.C1})= . 000 The desired event is (A∪B)′. Then the desired event is event A′.C1}. b.P(A ∩ B). P(B) = 751 10.0316 So P(B ∩ A′) = P(B) .C1}.0316 = .20 This situation requires the complement concept. Let 1159 10.P(A ∩ B) = . a.P(A∪B).C2}. P({S1}′) = 1 . a. 1159 10.. {S1. P(A ∩ B) = P(A) + P(B) . {S1. {S2.P(A∪B) = 1 - b. 6 and P(A) = 15 . P({C1})= P({S1.50 = . b.10 + .0751 .C1}. so we use the complement rule: P(A∪B)′ = 1 .80 = . Chapter 2: Probability 21. a. P({M,H}) = .10 b. P(low auto) = P[{(L,N}, (L,L), (L,M), (L,H)}] = .04 + .06 + .05 + .03 = .18 Following a similar pattern, P(low homeowner’s) = .06 + .10 + .03 = .19 c. P(same deductible for both) = P[{ LL, MM, HH }] = .06 + .20 + .15 = .41 d. P(deductibles are different) = 1 – P(same deductibles) = 1 - .41 = .59 e. P(at least one low deductible) = P[{ LN, LL, LM, LH, ML, HL }] = .04 + .06 + .05 + .03 + .10 + .03 = .31 f. P(neither low) = 1 – P(at least one low) = 1 - .31 = .69 a. P(A 1 ∩ A 2 ) = P(A 1 ) + P(A 2 ) - P(A 1 ∪ A 2 ) = .4 + .5 - .6 = .3 b. P(A 1 ∩ A 2 ′) = P(A 1 ) - P(A 1 ∩ A 2 ) = .4 - .3 = .1 c. P(exactly one) = P(A 1 ∪ A 2 ) - P(A 1 ∩ A 2 ) = .6 - .3 = .3 22. 23. Assume that the computers are numbered 1 – 6 as described. Also assume that computers 1 and 2 are the laptops. Possible outcomes are (1,2) (1,3) (1,4) (1,5) (1,6) (2,3) (2,4) (2,5) (2,6) (3,4) (3,5) (3,6) (4,5) (4,6) and (5,6). 1 15 a. P(both are laptops) = P[{ (1,2)}] = =.067 b. P(both are desktops) = P[{(3,4) (3,5) (3,6) (4,5) (4,6) (5,6)}] = c. P(at least one desktop) = 1 – P(no desktops) = 1 – P(both are laptops) = 1 – .067 = .933 d. P(at least one of each type) = 1 – P(both are the same) = 1 – P(both laptops) – P(both desktops) = 1 - .067 - .40 = .533 58 6 15 = .40 Chapter 2: Probability 24. 
Since A is contained in B, then B can be written as the union of A and (B ∩ A′), two mutually exclusive events. (See diagram). From Axiom 3, P[A ∪ (B ∩ A′)] = P(A) + P(B ∩ A′). Substituting P(B), P(B) = P(A) + P(B ∩ A′) or P(B) - P(A) = P(B ∩ A′) . From Axiom 1, P(B ∩ A′) ≥ 0, so P(B) ≥ P(A) or P(A) ≤ P(B). For general events A and B, P(A ∩ B) ≤ P(A), and P(A ∪ B) ≥ P(A). 25. P(A ∩ B) = P(A) + P(B) - P(A∪B) = .65 P(A ∩ C) = .55, P(B ∩ C) = .60 P(A ∩ B ∩ C) = P(A ∪ B ∪ C) – P(A) – P(B) – P(C) + P(A ∩ B) + P(A ∩ C) + P(B ∩ C) = .98 - .7 - .8 - .75 + .65 + .55 + .60 = .53 a. P(A ∪ B ∪ C) = .98, as given. b. P(none selected) = 1 - P(A ∪ B ∪ C) = 1 - .98 = .02 c. P(only automatic transmission selected) = .03 from the Venn Diagram d. P(exactly one of the three) = .03 + .08 + .13 = .24 59 Chapter 2: Probability 26. 27. 28. a. P(A 1 ′) = 1 – P(A 1 ) = 1 - .12 = .88 b. P(A 1 ∩ A 2 ) = P(A 1 ) + P(A 2 ) - P(A 1 ∪ A 2 ) = .12 + .07 - .13 = .06 c. P(A 1 ∩ A 2 ∩ A 3 ′) = P(A 1 ∩ A 2 ) - P(A 1 ∩ A 2 ∩ A 3 ) = .06 - .01 = .05 d. P(at most two errors) = 1 – P(all three types) = 1 - P(A 1 ∩ A 2 ∩ A 3 ) = 1 - .01 = .99 Outcomes: a. (A,B) (A,C1 ) (A,C2 ) (A,F) (B,A) (B,C1 ) (B,C2 ) (B,F) (C1 ,A) (C1 ,B) (C1 ,C2 ) (C1 ,F) (C2 ,A) (C2 ,B) (C2 ,C1 ) (C2 ,F) (F,A) (F,B) (F,C1 ) (F,C2 ) 2 P[(A,B) or (B,A)] = 20 = 101 = .1 b. P(at least one C) = c. P(at least 15 years) = 1 – P(at most 14 years) = 1 – P[(3,6) or (6,3) or (3,7) or (7,3) or (3,10) or (10,3) or (6,7) or (7,6)] 8 = 1 − 20 = 1 − .4 = .6 14 20 = 107 = .7 There are 27 equally likely outcomes. a. P(all the same) = P[(1,1,1) or (2,2,2) or (3,3,3)] = 3 27 = 1 9 b. P(at most 2 are assigned to the same station) = 1 – P(all 3 are the same) 3 24 = 1 − 27 = 27 = 89 c. P(all different) = [{(1,2,3) (1,3,2) (2,1,3) (2,3,1) (3,1,2) (3,2,1)}] = 6 27 = 2 9 60 Chapter 2: Probability Section 2.3 29. a. (5)(4) = 20 (5 choices for president, 4 remain for vice president) b. (5)(4)(3) = 60 c.  5  5! 
  = = 10 (No ordering is implied in the choice)  2  2!3! a. Because order is important, we’ll use P8,3 = 8(7)(6) = 336. b. Order doesn’t matter here, so we use C30,6 = 593,775. c. From each group we choose 2: d. The numerator comes from part c and the denominator from part b: e. We use the same denominator as in part d. We can have all zinfandel, all merlot, or all cabernet, so P(all same) = P(all z) + P(all m) + P(all c) = 30.  8  10  12    •   •   = 83,160  2  2   2  83,160 = .14 593,775  8  10  12    +   +    6   6   6  = 1162 = .002  30  593,775   6 31. a. (n 1 )(n 2 ) = (9)(27) = 243 b. (n 1 )(n 2 )(n 3 ) = (9)(27)(15) = 3645, so such a policy could be carried out for 3645 successive nights, or approximately 10 years, without repeating exactly the same program. 61 Chapter 2: Probability 32. a. 5×4×3×4 = 240 b. 1×1×3×4 = 12 c. 4×3×3×3 = 108 d. # with at least on Sony = total # - # with no Sony = 240 – 108 = 132 e. P(at least one Sony) = 132 240 = .55 P(exactly one Sony) = P(only Sony is receiver) + P(only Sony is CD player) + P(only Sony is deck) 1 × 3 × 3 × 3 4 × 1 × 3 × 3 4 × 3 × 3 ×1 27 + 36 + 36 + + = 240 240 240 240 99 = = .413 240 = 33. a.  25  25!   = = 53,130  5  5!20! b.  8  17    •   = 1190  4  1   8  17      4  1  = 1190 = .022 c. P(exactly 4 have cracks) = 53,130  25    5 d. P(at least 4) = P(exactly 4) + P(exactly 5)  8  17   8 17         4  1  +  5  0  = .022 + .001 = .023 =  25   25      5 5 62 Chapter 2: Probability 34. a.  20  25     6 0 38,760  20    = 38, 760. P(all from day shift) =    = = .0048 8,145,060  45  6   6  20  25  15  30  10  35           6 0 6 0 6 0 b. P(all from same shift) =    +    +     45   45   45        6 6 6 = .0048 + .0006 + .0000 = .0054 c. 
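The counts in Section 2.3 can be checked with the standard library (a convenience, not the manual's method): math.perm counts ordered selections and math.comb counts unordered ones.

```python
# Spot checks of the permutation/combination counts in Exercises 29-30.
import math

president_vp = math.perm(5, 2)             # 5*4 = 20 ordered choices
wine_samples = math.comb(30, 6)            # 593,775 six-bottle subsets
two_of_each = (math.comb(8, 2) * math.comb(10, 2) * math.comb(12, 2))
p_two_each = two_of_each / math.comb(30, 6)    # approximately .14
```

The product rule in the third line (two bottles from each of the three varieties) is the same multiplication carried out by hand in Exercise 30(c).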
P(at least two shifts represented) = 1 – P(all from same shift) = 1 - .0054 = .9946 d. Let A 1 = day shift unrepresented, A 2 = swing shift unrepresented, and A 3 = graveyard shift unrepresented. Then we wish P(A 1 ∪ A 2 ∪ A 3 ). P(A 1 ) = P(day unrepresented) = P(all from swing and graveyard)  25    6, P(A 1 ) =  45    6  30    6, P(A 2 ) =  45    6 10    6 P(A 1 ∩ A 2 ) = P(all from graveyard) =  45    6 15   20      6  6, P(A 1 ∩ A 3 ) = , P(A 2 ∩ A 3 ) =  45   45      6 6  25   30   35  10          6+6+ 6- 6 So P(A 1 ∪ A 2 ∪ A 3 ) =  45   45   45   45          6 6 6 6 = .2939 - .0054 = .2885 63  35    6, P(A 3 ) =  45    6 P(A 1 ∩ A 2 ∩ A 3 ) = 0, 15    6  45    6  20    6  45    6 Chapter 2: Probability 35. There are 10 possible outcomes --  5   ways to select the positions for B’s votes: BBAAA,  2 BABAA, BAABA, BAAAB, ABBAA, ABABA, ABAAB, AABBA, AABAB, and AAABB. Only the last two have A ahead of B throughout the vote count. Since the outcomes are equally likely, the desired probability is 2 10 = .20 . 36. 37. a. n 1 = 3, n 2 = 4, n 3 = 5, so n 1 × n 2 × n 3 = 60 runs b. n 1 = 1, (just one temperature), n 2 = 2, n 3 = 5 implies that there are 10 such runs. There are  60    ways to select the 5 runs. Each catalyst is used in 12 different runs, so the 5 number of ways of selecting one run from each of these 5 groups is 125 . Thus the desired probability is 12 5 = .0456 .  60    5 38.  6  9      2  1  = 15 ⋅ 9 = .2967 a. P(selecting 2 - 75 watt bulbs) = 15  455   3  4  5 6   +   +    3   3   3 = 4 + 10 + 20 = .0747 b. P(all three are the same) = 15  455   3 c.  4  5  6  120     = = .2637  1  1  1  455 64 Chapter 2: Probability d. 
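Exercise 34's shift-selection probabilities follow the same equally-likely-subsets pattern, and can be sketched in a few lines (my sketch, using the exercise's 20 day, 15 swing, and 10 graveyard workers):

```python
# Exercise 34: choose 6 workers at random from 45 (20 day / 15 swing / 10 grave).
from math import comb

p_all_day = comb(20, 6) / comb(45, 6)                       # approximately .0048
p_all_same = (comb(20, 6) + comb(15, 6) + comb(10, 6)) / comb(45, 6)
p_two_plus_shifts = 1 - p_all_same                          # approximately .9946
```

The complement in the last line mirrors part (c): "at least two shifts represented" is far easier to reach as 1 minus "all from the same shift" than by direct enumeration.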
To examine exactly one, a 75 watt bulb must be chosen first. (6 ways to accomplish this). To examine exactly two, we must choose another wattage first, then a 75 watt. ( 9 × 6 ways). Following the pattern, for exactly three, 9 × 8 × 6 ways; for four, 9 × 8 × 7 × 6; for five, 9 × 8 × 7 × 6 × 6. P(examine at least 6 bulbs) = 1 – P(examine 5 or less) = 1 – P( examine exactly 1 or 2 or 3 or 4 or 5) = 1 – [P(one) + P(two) + … + P(five)] 9×6 9 ×8 × 6 9 ×8 × 7 × 6 9×8×7×6× 6  6 = 1−  + + + +  15 15 ×14 15 × 14 × 13 15 × 14 × 13 × 12 15 × 14 ×13 × 12 × 11 = 1 – [.4 + .2571 + .1582 + .0923 + .0503] = 1 - .9579 = .0421 39. a. We want to choose all of the 5 cordless, and 5 of the 10 others, to be among the first 10  5 10      5 5  = 252 = .0839 serviced, so the desired probability is  15  3003    10  b. Isolating one group, say the cordless phones, we want the other two groups represented in the last 5 serviced. So we choose 5 of the 10 others, except that we don’t want to include the outcomes where the last five are all the same.  10    − 2 5 So we have . But we have three groups of phones, so the desired probability is 15    5 10   3 ⋅   − 2   5   = 3( 250) = .2498 . 3003 15    5 c. We want to choose 2 of the 5 cordless, 2 of the 5 cellular, and 2 of the corded phones:  5  5  5       2  2  2  = 1000 = .1998  15 5005   6 65 Chapter 2: Probability 40. a. If the A’s are distinguishable from one another, and similarly for the B’s, C’s and D’s, then there are 12! Possible chain molecules. Six of these are: A 1 A 2 A 3 B2 C3 C1 D3 C2 D1 D2 B3 B1 , A 1 A 3 A 2 B2 C3 C1 D3 C2 D1 D2 B3 B1 A 2 A 1 A 3 B2 C3 C1 D3 C2 D1 D2 B3 B1 , A 2 A 3 A 1 B2 C3 C1 D3 C2 D1 D2 B3 B1 A 3 A 1 A 2 B2 C3 C1 D3 C2 D1 D2 B3 B1 , A 3 A 2 A 1 B2 C3 C1 D3 C2 D1 D2 B3 B1 These 6 (=3!) differ only with respect to ordering of the 3 A’s. 
In general, groups of 6 chain molecules can be created such that within each group only the ordering of the A’s is different. When the A subscripts are suppressed, each group of 6 “collapses” into a single molecule (B’s, C’s and D’s are still distinguishable). At this point there are 12! molecules. Now suppressing subscripts on the B’s, C’s and D’s in turn gives 3! ultimately b. 12! (3 !) 4 = 369,600 chain molecules. Think of the group of 3 A’s as a single entity, and similarly for the B’s, C’s, and D’s. Then there are 4! Ways to order these entities, and thus 4! Molecules in which the A’s are contiguous, the B’s, C’s, and D’s are also. Thus, P(all together) = 4! 369.600 = .00006494 . 41. a. P(at least one F among 1st 3) = 1 – P(no F’s among 1st 3) =1- 4 × 3× 2 24 =1− = 1 − .0714 = .9286 8×7× 6 336 An alternative method to calculate P(no F’s among 1st 3) would be to choose none of the females and 3 of the 4 males, as follows:  4  4      0  3  = 4 = .0714 , obviously producing the same result.  8 56    3  4  4      4  1  = 4 = .0714 st b. P(all F’s among 1 5) =  8 56    5 c. P(orderings are different) = 1 – P(orderings are the same for both semesters) = 1 – (# orderings such that the orders are the same each semester)/(total # of possible orderings for 2 semesters) = 1− 8 × 7 × 6 × 5 × 4× 3 × 2 ×1 = .99997520 (8 × 7 × 6 × 5 × 4 × 3 × 2 × 1×) × (8 × 7 × 6 × 5 × 4 × 3 × 2 × 1) 66 Chapter 2: Probability 42.960   5 P(straight) = 10 × 45 = . # of orderings without a H-W pair in seats #1 and 3. etc.0667 6 × 5 × 4 × 3 × 2 ×1 15 P(J&P next to each other) = P(J&P in 1&2) + … + P(J&P in 5&6) = 5× 1 1 = = . etc) 45 1024 P(10 high straight) = = = . # =can’t put the mate of seat #2 here or else a H-W pair would be in #5 and 6. Seats: P(J&P in 1&2) = 2 ×1 × 4 × 3 × 2 ×1 1 = = . and no H next to his W = 6 × 4 × 2# × 2 × 2 × 1 = 192 # = can’t be mate of person in seat #1 or #2.598.00001539  52    5 67 . 
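The molecule count in exercise 40 and the probability that each letter group stays contiguous can be confirmed directly; a quick check:

```python
from math import factorial

# 12 letters with four indistinguishable groups of 3: 12!/(3!)^4 molecules
n_molecules = factorial(12) // factorial(3) ** 4
print(n_molecules)  # 369600

# all A's contiguous, all B's contiguous, etc.: 4! orderings of the four blocks
p_all_together = factorial(4) / n_molecules
print(round(p_all_together, 8))  # ≈ 6.494e-05
```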
= # of 10 high straights = 4×4×4×4×4 ( 4 – 10’s.333 15 3 P(at least one H next to his W) = 1 – P( no H next to his W) We count the # of ways of no H next to his W as follows: # if orderings without a H-W pair in seats #1 and 3 and no H next to his W = 6* × 4 × 1* × 2# × 1 × 1 = 48 *= pair. so P(straight flush) = 40 = . # of seating arrangements with no H next to W = 48 + 192 = 240 240 1 = . so 6 × 5 × 4 × 3 × 2 ×1 3 1 2 P(at least one H next to his W) = 1 .000394  52  2.= 3 3 And P(no H next to his W) = 43. So. 4 – 9’s .) There are only 40 straight flushes (10 in each suit).003940 (Multiply by 10 because there are 10 different card  52     5 values that could be high: Ace. King. P(A|C) = P ( A ∩ C) .400 .4 45.192. P(C|A) = P ( A ∩ C ) . P(C) =. the probability that he is P( A) . and P (BA) = the probability of the individual being a professional basketball player. If we know that the individual came from ethnic P(C ) .500 P(A ∩ C) = . Let event A be that the individual is more than 6 feet tall.106 + .082 + . Define event D = {ethnic group 1 selected}. The number of individuals that are pro BB players is small in relation to the # of males more than 6 feet tall. so the probability of an individual in that reduced sample space being more than 6 feet tall is very large.141 + .065] = . because to each subset of size k there corresponds exactly one subset of size n-k (the n-k objects not in the subset of size k).200 + .215 + .40.004 = . If a person has type A blood.200 b.447.106 + .200 = . Section 2.200 = = .500 group 3.500 1 – [.447 . 68 . Let event B be that the individual is a professional basketball player.Chapter 2: Probability  n  n  n! n!  = = =    k  k! (n − k )! (n − k )! k!  n − k  44. Then P (AB) = the probability of the individual being more than 6 feet tall. knowing that the individual is more than 6 feet tall.  The number of subsets of size k = the number of subsets of size n-k. P(A) = .200 = = .200 = = .020 = . P(D∩B′)=.065 + . 
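The straight and straight-flush probabilities in exercise 43 are easy to reproduce numerically; a sketch using `math.comb`:

```python
from math import comb

hands = comb(52, 5)  # 2,598,960 equally likely five-card hands

p_ten_high = 4 ** 5 / hands    # each of the 5 ranks in a 10-high straight can come from any of 4 suits
p_straight = 10 * p_ten_high   # 10 possible high-card values for a straight
p_straight_flush = 40 / hands  # 10 straights per suit, times 4 suits

print(round(p_straight, 6))        # ≈ 0.00394
print(round(p_straight_flush, 8))  # ≈ 1.539e-05
```

As in the manual's count, these straights include the straight flushes.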
P (AB) will be larger.909 46.400 .447 from ethnic group 3 is . We are asked for P(D|B′) = P ( D ∩ B ′) .018 + .008 + . knowing that the individual is a professional basketball player.447 c. Most professional BB players are tall. P(B′) = 1 – P(B) = P ( B ′) . a. the probability that he has type A blood is . 14 Also notice that the intersection of the two events is just the 1st event.50 P( A) .06 = = .6125 P( B) .. since “exactly one” is totally contained in “at least one.50 P( A1 ) .25 = = .833 P( A1 ∩ A2 ) . P ( A1 ∩ A2 ) . We want P[(exactly one)  (at least one)]. P(BA) = b. P(A 2 A 1 ) = b.0833 ..3875 P( B) .14 The pieces of this equation can be found in your answers to exercise 26 (section 2. P(AB) = P ( A ∩ B) .07 + .2): P( A3′ | A1 ∩ A2 ) = P( A1 ∩ A2 ∩ A3′ ) .” 48.06 .Chapter 2: Probability 47.05 = = .12 .05 .25 = = .25 = = . P(A′B) = a.02 + .01 = ..04 + .7692 P( A ∪ B ) . P(B′A) = P ( A ∩ B′) .01 = .40 P ( A′ ∩ B) .40 P[ A ∩ ( A ∪ B)] .50 c.03 . P(A 1 ∩ A 2 ∩ A 3 A 1 ) = c.50 P( A) .06 69 .65 d. .15 = = . P ( A ∩ B) .3571 . P(AA∪B) = = = .50 a.12 + .12 So P[(exactly one)  (at least one)]= d. P(at least one) = P(A 1 ∪ A 2 ∪ A 3 ) = .50 e.01 = . 44 d. P(at least one is 75 watt) = 1 – P(none are 75 watt)  9    2  = 1 − 36 = 69 . P(M ∩ Pr) = P(M.07+.05+. P(M ∩ LS ∩ PR) = .  4  5   +    2   2  = 16 P(both 40 watt or both 60 watt) = 15  105   2 16 105 = 16 = .05+. =1 15 105 105   2 Notice that P[(both are 75 watt)∩(at least one is 75 watt)]  6    2  = 15 .25 70 .07 = .05. P(M) = . directly from the table of probabilities b.12 c.SS) = . we want P(same rating at least one NOT 75 watt).07=. The first desired probability is P(both bulbs are 75 wattat least one is 75 watt).Pr.05+.07+. P[(same rating)∩(at least one not 75 watt)] = P(both 40 watt or both 60 watt).2174 So P(both bulbs are 75 wattat least one is 75 watt) = 69 69 105 Second.10+. 
P(at least one NOT 75 watt) = 1 – P(both are 75 watt) =1- 15 90 = .LS) + P(M.02+.12+. a. = P(both are 75 watt) =  15 105   2 15 105 = 15 = .1778 Now. 105 105 Now.07+. P(LS) = 1 = .02+. the desired conditional probability is 90 90 105 50.08+. P(SS) = sum of 9 probabilities in SS table = 56.Chapter 2: Probability 49.56 = .Pr.02 = .49 P(Pr) = . 08 = = . A ∩ B = B) P( A) P( A) . implying t = .444 = .01 + t = .04 and r = .07. P(SS|M ∩ Pl) = P ( SS ∩ M ∩ Pl) .556 51. Then one approach is as follows: P(A 1 ∩ A 2 ) = P(A 2 | A 1 ) • P(A 1 ) = rq = . t + . P( A ∩ B ) P( B) = (since B is contained in A.07 These two equations give 2q . with t = P(A 1 ′∩ A 2 ) = P(A 1 ∩ A 2 ′) .0833 . 53.533 P ( SS ∩ Pl) .Chapter 2: Probability e. from which q = ..04 + . P(R from 1st ∩ R from 2nd ) = P(R from 2nd | R from 1st ) • P(R from 1st ) = b.10 P(LS|M Pl) = 1 . a.581 11 10 Let A 1 be the event that #1 fails and A 2 be the event that #2 fails.01 P(A 1 ∪ A 2 ) = P(A 1 ∩ A 2 ) + P(A 1 ′∩ A 2 ) + P(A 1 ∩ A 2 ′) = rq + 2(1-r)q = . P(same numbers) 8 6 • = .05 = = . Alternatively.08 = = . P(M|SS ∩ Pl) = P ( M ∩ SS ∩ Pl) .436 + 4 4 • = .08 + .07.P(SS|M Pl) = 1 .03 f. We assume that P(A 1 ) = P(A 2 ) = q and that P(A 1 | A 2 ) = P(A 2 | A 1 ) = r.436 11 10 = P(both selected balls are the same color) = P(both red) + P(both green) = 52.04 without reference to conditional probability. .60 P(BA) = 71 ..03 and thus q = .25.08 + .444 P( M ∩ Pl) .01 = . 28.or. P(two other H’s next to their wives | J and M together in the middle) P[( H − W .22 a.53 This is the probability of being awarded all three projects given that at least one project was awarded.or .682 P ( A1 ) . 48 3 72 .the.50 P( A1 ) .11 = = .01 P ( A1 ∩ A2 ) . P(A 2 ) = .07. P(A 2 A 1 ) = b. P(A 2 ∩ A 3 ) = .22 P[ A1 ∩ ( A2 ∪ A3 )] P[( A1 ∩ A2 ) ∪ ( A1 ∩ A3 )] = P( A1 ) P( A1 ) P ( A1 ∩ A2 ) + P( A1 ∩ A3 ) − P ( A1 ∩ A2 ∩ A3 ) .W − H )] P( J − M .01 = = . 
P(A 1 ∩ A 2 ∩ A 3 ) = .middle) 4 × 1× 2 × 1× 2 × 1 16 numerator = = 6 × 5 × 4 × 3 × 2 × 1 6! 4 × 3 × 2 ×1 × 2 ×1 48 denominator = = 6 × 5 × 4 × 3 × 2 ×1 6! 16 1 so the desired probability = = . P(A 1 ) = . P(A B) = P(B|A)•P(A) = b.22. P(A 1 ∩ A 2 ) = .or.05.Chapter 2: Probability 54.0455 P( A1 ) .15 = = . P(A 2 ∩ A 3 A 1 ) = c.11.01 = = . 55.W − H ) and ( J − M .0111 4×3 6×5 a. P(A 3 ) = .25.M − J . P(A 1 ∩ A 3 ) = .22 P( A1 ∩ A2 ∩ A3 | A1 ∪ A2 ∪ A3 ) = P ( A1 ∩ A2 ∩ A3 ) .M − J ) and ( H − W . P ( A1 ∩ A2 ∩ A3 ) . 2 ×1 2 ×1 × = .0189 P ( A1 ∪ A2 ∪ A3 ) . P( A2 ∪ A3 | A1 ) = = d.or.in. Chapter 2: Probability c. 58. therefore P(B’|A) < P(B’). If P(B|A) > P(B). . Proof by contradiction. This contradicts the initial condition. Assume P(B’|A) ≥ P(B’). P(B|A) ≤ P(B). Then 1 – P(B|A) ≥ 1 – P(B). P( A | B) + P ( A′ | B ) = P( A ∩ B ) P ( A′ ∩ B) P( A ∩ B) + P ( A′ ∩ B) P( B) + = = =1 P ( B) P( B ) P ( B) P( B) P[( A ∪ B ) ∩ C) P[( A ∩ C) ∪ ( B ∩ C )] = P(C ) P (C ) P( A ∩ C ) + P ( B ∩ C ) − P ( A ∩ B ∩ C ) = P(C ) P( A ∪ B | C ) = = P(A|C) + P(B|C) – P(A ∩ B | C) 73 . P(all H’s next to W’s | J & M together) = P(all H’s next to W’s – including J&M)/P(J&M together) 6 × 1 × 4 × 1 × 2 ×1 48 6! = = = .P(B|A) ≥ – P(B).2 5 × 2 × 1× 4 × 3 × 2 × 1 240 6! 56. then P(B’|A) < P(B’). 57. P(not disc | has loc) = b.loc ) .21 P(A 2 |B) = = .264 .462 = . 3 = .455 c.loc) .455 . 4 × . P(B) = P(A 1 ∩ B) + P(A 2 ∩ B) + P(A 3 ∩ B) = .274 .03 + .55 74 .42 P ( disc ∩ no.12 = P( A1 ∩ B) = P( A1 ) • P( B | A) . P(disc | no loc) = P( A1 ∩ B ) .509 P (no.loc) .. P( not.Chapter 2: Probability 59.125 = P ( A3 ∩ B ) a. 6 = .28 = = .462 .455 60.067 P(has.disc ∩ has. 21 = P( A2 ∩ B) .03 = = . 35 × . 25 × .21 b.5 = .264 P( B) .12 = = . P(A 3 |B) = 1 .. . P(A 2 ∩ B) = .loc ) . P(A 1 |B) = a. 578 . 622 10    2  2  8     P(1 def in sample | 2 def in batch) =  1  1  = .356 10    2 P(2 def in sample | 2 def in batch) = a.144 .5 + .24 + .Chapter 2: Probability 61. 
022 10    2 .5 + .24 + .24 + .5 + .24 P(1 def in batch | 0 def in sample) = = .1244 .200 10    2 8    P(0 def in sample | 2 def in batch) =  2  = . 1 = .1244 .1244 P(2 def in batch | 0 def in sample) = = .278 .5 = . P(0 def in sample | 0 def in batch) = 1  9   P(0 def in sample | 1 def in batch) =  2  = .1244 P(0 def in batch | 0 def in sample) = 75 .800 10    2  9   P(1 def in sample | 1 def in batch) =  1 = . 12 = P(B ∩ W ) .0712 .42 76 . 5 = .28 = P(B ∩ W ′) .0712 P(1 def in batch | 1 def in sample) = 62. 5 = . 6 × . P(0 def in batch | 1 def in sample) = 0 .12 . 6 × .457 .Chapter 2: Probability b.06 + .543 . 4 × . W’ = no warranty .0712 P(2 def in batch | 1 def in sample) = = . 7 = . 4 × . B = basic. W = warranty purchase.06 + . 3 = .12 . 30 = P( D ∩W ) . Using a tree diagram.2857 P(W ) .30 + . 30 = P( D ∩ W ′) We want P(B|W) = P (B ∩ W ) . D = deluxe.12 = = = .06 = . 7941 P( B ∩ C ) .25×.14+.74 e.5400 c.75 × .6800 d.7 = .015 = . P(B ∩ C) = P(A ∩ B ∩ C) + P(A′ ∩ B ∩ C) =.68 77 . P(A|B ∩ C) = P ( A ∩ B ∩ C ) .8 = . P(C) = P(A ∩ B ∩ C)+P(A′ ∩ B ∩ C) + P(A ∩ B′ ∩ C) + P(A′ ∩ B′ ∩ C) = .9 × .54 = = .54+.Chapter 2: Probability 63.045+.5400+. a. P(A ∩ B ∩ C) = . b.8×. P(+) = .3137 So Mean (and not Mode!) is the most likely author.0588 . P(has d | +) = c.) = .0396 = .9412 65.0588 b. 78 .9408 = .3922 . a. P(satis) = .2941 P(mode | satis) = .9996 .Chapter 2: Probability 64. while Median is least.6735 . P(doesn’t have d | .2 = .51 P(median | satis) = .51 P(mean | satis) = . P(1) .34. P(2|A1) = P[DC∩LA] = P[DC] ⋅ P[LA] = (.247 .27 = . the probabilities of the second generation branches are calculated as follows: For the A1 branch.03 Follow a similar pattern for A2 and A3. and A3 as flying with airline 1. and P(A2|1). 1.365 P(A1/1) = 79 .7)(. and 3. Define events A1. we know that P(1) = P(A1∩1) + P(A2∩1) + P(A2∩1) = (from tree diagram below) .170 + . 1. P(1) .63. 
Events 0.07 + .090 P(A3/1) = = = .1) = .466 .Chapter 2: Probability 66.09 = . From the law of total probability. respectively. 2.365 P ( A2 ∩ 1) . P(A2|1). Creating a tree diagram as described in the hint. P(1|A1) = P[(DC′∩LA) ∪ (DC∩LA′)] = (.9) = .365 P ( A3 ∩ 1) . P (1) .105 + .9) = .170 = = . and event LA = the event that the flight to LA is late.3)(. P ( A1 ∩ 1) . and 2 are 0. P(0|A1) = P[DC′∩LA′] = P[DC′] ⋅ P[LA′] = (.1) + (.3)(. A2.365. Event DC = the event that the flight to DC is late.288 .105 P(A2/1) = = = . respectively. and 2 flights are late.7)(. We wish to find P(A1|1). Chapter 2: Probability 67. a. P(Pr ∩ NF ∩ Cr) = .05 c.05 = .1125 d.1125 = = . P(F ∩ Cr) = .2725 e.5325 80 .0625 + .1260 + . P(Cr) = .2113 P( Cr ) .5325 f. P(PR | Cr) = P (Pr ∩ Cr ) . P(U ∩ F ∩ Cr) = . P(Pr ∩ Cr) = .0625 = .1260 b.0840 + . 95]10 = .146 P( A ∪ B ) P( A ∪ B) .5)(. P(B′) = 1 .05 = . P( B ) Using subscripts to differentiate between the selected individuals..07. the events are independent if P(A ∩ B)=P(A)• P(B). (see paragraph below equation 2. P(A) • P(B) = (.25.50. P(A 2 ∩ A 3 ) = . . .Chapter 2: Probability Section 2. P(A ∩ B) = . P(O1 ∩ O2 ) = P(O1 )•P(O2 ) = (.11.7 + (. P(A 1 ) • P(A 2 ) = . P( A′ | B) = P ( A′ ∩ B) P( B) − P( A ∩ B ) = P( B) P( B) P( B) − P( A) ⋅ P ( B) = 1 − P ( A) = P( A′).P(A) • P(B) = [1 – P(A)] • P(B) = P(A′)• P(B).7 = .12 = = = . P(A|B) = . P(A ∪ B)=P(A) + P(B) – P(A)⋅P(B) = .102 + .05.4)(. P(A 1 ) • P(A 3 ) = .4) = . a. = 72.442 = .401.7) = .. P(B′|A′) = .3816 73.2. Using the definition. P(A 1 ) • P(A 3 ) = .723 81 . the desired probability is =1 [P(E′)]25 =1 . We want P(at least one signaled incorrectly) = P(E1 ∪ E2 ∪ …∪ E10 ) = 1 . two events A and B are independent if P(A|B) = P(A). so A and B are dependent. Since the events are independent. for 25 points.2.3 b.042 + . Using the multiplication rule. Alternatively.44)(.055. so A and B are dependent.(. A 1 and A 2 are not independent. P(E′) =1 .82 c. 69.6125 ≠ . 
Let event E be the event that an error was signaled incorrectly.P(E1 ′ ∩ E2 ′ ∩ …∩ E10 ′) . A 2 and A 3 are independent.82 70.6125.5 68. too. then A′ and B′ are independent. P(A 1 ∩ A 3 ) = .07.0616.50. A 1 and A 3 are not independent. P(A) = . P(E1 ′ ∩ E2 ′ ∩ …∩ E10 ′) = P(E1 ′ )P(E2 ′ )…P(E10 ′) so = P(E1 ∪ E2 ∪ …∪ E10 ) = 1 .7. Similarly.1936 P(two individuals match) = P(A 1 ∩A 2 )+P(B1 ∩B2 ) + P(AB1 ∩AB2 ) + P(O1 ∩O2 ) = .44) = .[.95)25 =.4 + .95. For 10 independent points. P(AB′| A∪ B) = P( AB ′ ∩ ( A ∪ B )) P( AB ′) . P(A′ ∩ B) = P(B) – P(A ∩ B) = P(B) .422 + . P(A 1 ∩ A 2 ) = .25 ≠ . 71. so .0059 and x = .005 = 0. P(seam need rework) = .10 + x. Let A 1 = older pump fails. P(A 2 ) = .P( 1 – 2 works ∩ 3 – 4 works) = P(1 works ∪ 2 works) + P(3 works ∩ 4 works) – P( 1 – 2 ) • P(3 – 4) = ( .85x + .99999969 P(at least one fails to open) = 1 = P(all open) = 1 – (.9-. Then P(at least one error) = 1 – (.80)1/25 .8019 = .9.99111 = .05 + x) ..9+.. 76.9-. The desired condition is .10 + x)( .00421.00889. For p replacing .90.9981 82 . b.9)(. and x = P(A 1 ∩ A 2 ) = P(A 1 ) •P(A 2 ) = (.6513.9)10 = . The resulting quadratic equation. Then P(A 1 ) = .99579 = . has roots x = . P(system works) = P( 1 – 2 works ∪ 3 – 4 works) = P( 1 – 2 works) + P( 3 – 4 works) . (1 – q)25 = .10 = 1 – (1 – q)25 . 75. i. Let q denote the probability that a rivet is defective.81)(.Chapter 2: Probability 74. Hopefully the smaller root is the actual probability of system failure. and thus q = 1 .9)(. the two probabilities are (1-p)n and 1 – (1-p)n . A 2 = newer pump fails.05 + x. a.81 .80 = (1 – q)25 . P(at least one opens) = 1 – P(none open) = 1 – (.8441.05)5 = . and x = P(A 1 ∩ A 2 ).20 = 1 – P(seam doesn’t need rework) = 1 – P(no rivets are defective) = 1 – P(1st isn’t def ∩ … ∩ 25th isn’t def) = 1 – (1 – q)25 . from which q = 1 . 78.3487. 1 – q = (. x2 .9) = . so P(no error on any of the 10 questions) =(.e.9+..2262 77.9) – (. 
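The two complement calculations above (1 − .95¹⁰ for 10 points, 1 − .95²⁵ for 25 points) can be reproduced for any number of independently monitored points; a sketch:

```python
p_err = 0.05  # each point independently signals incorrectly with probability .05

def p_at_least_one(n):
    # complement of "no incorrect signals among the n points"
    return 1 - (1 - p_err) ** n

print(round(p_at_least_one(10), 3))  # 0.401
print(round(p_at_least_one(25), 3))  # 0.723
```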
P(no error on any particular question) = .81) + (.99 + .1.95)5 = .9)10 = . 4) }.4) }. Event A∩B: { (3.6)(2.1) }.1 = .4)(4.4)(2. Use the quadratic formula to solve for x: = 2 ± 4 − ( 4)(.2)(6.4)(5. The events are pairwise independent.4) }. P(A∩B) = 1 36 . . P(A)⋅P(B) ⋅P(C)= 1 6 1 ⋅ 16 ⋅ 16 = 216 ≠ 1 36 = P(A∩B∩C) The events are not mutually independent 83 . Event C: { (1. set this equal to .4) }. Now.5)(3. and x = p 2 . P(A) = 1 6 Event B: { (1. Event A∩C: { (3. P(A∩B∩C) = P(A)⋅P(B)= 1 6 ⋅ 16 = 1 36 =P(A∩B) P(A)⋅P(C)= 1 6 ⋅ 16 = 1 36 =P(A∩C) P(B)⋅P(C)= 1 6 ⋅ 16 = 1 36 =P(B∩C) 1 36 .2)(3.4)(3. 80. Event A∩B∩C: { (3.4)(3.99 or 1.4)(6. then P(system lifetime exceeds t 0 ) = p 2 + p 2 – p 4 = 2p 2 – p 4 = 2x – x2 . P(A∩C) = 1 36 .99. Using the hints. we use the value of .99 ⇒ x2 – 2x + .2 = 1 ± . Event B∩C: { (3.4)(4. or 2x – x2 = . Event A: { (3.3)(3.3)(5.01 2 Since the value we want is a probability. P(A∩C) = 1 36 .1)(3. and has to be = 1.99.Chapter 2: Probability 79.6) }. let P(A i ) = p.99) 2 = 2 ± .99 = 0. P(C) = 1 6 .4) }.5)(3. P(B) = 1 6 . P(at most n fixations) = 1 – P(at least n+1 are req’d) = 1 – P(no detection in 1st n fixations) = 1 – P(D1 ′ ∩ D2 ′ ∩ … ∩ Dn ′ ) = 1 – (1 – p)n c. P(# pass ≤ 1) = P(0 pass) + P(exactly one passes) = (.30)(. P(1st doesn’t ∩ 2nd does) = . (. D2 = detection on 2nd fixation.189 d. P(pass) = . pass ∩ ≥ 1. P(detection in at most 2 fixations) = P(D1 ) + P(D1 ′ ∩ D2 ) = P(D1 ) + P(D2 | D1′ )P(D1 ) = p + p(1 – p) = p(2 – p). … . pass ) .1.30) + (.30)(.70 a.. pass ) P(3.70) = .2 b. P(no detection in 3 fixations) = (1 – p)3 84 .70)(.70)(. 82.1+. pass ) P( ≥ 1.70) = .30)(.343 = ..2 = .9 .3)3 + .1 Similarly.189 = . pass ) .8 a.30)(. b.657 c..2] = 0 so P(all 3 escape) = (0)(0)(0) = 0.216 e.1= . P(1st detects ∩ 2nd doesn’t) = P(1st detects) – P(1st does ∩ 2nd does) = .70)(.8 = . 1 – P(all pass) = 1 . a. Let D1 = detection on 1st fixation. P(both detect the defect) = 1 – P(at least one doesn’t) = 1 . Define D1 .30) + (.343 = = = .70)(. 
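The .9981 system-reliability answer above (components 1 and 2 in parallel, 3 and 4 in series, the two branches in parallel, each component working independently with probability .9) can be sketched as:

```python
p = 0.9  # each component functions independently with probability .9

p12 = p + p - p * p               # components 1 and 2 in parallel: .99
p34 = p * p                       # components 3 and 4 in series: .81
p_system = p12 + p34 - p12 * p34  # the two branches in parallel
print(round(p_system, 4))  # 0.9981
```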
Dn as in a. so P(exactly one does)= . P(exactly one passes) = (.Chapter 2: Probability 81.353 P (≥ 1. D2 .973 83. P(3 pass | 1 or more pass) = = P (3.8+. P(neither detects a defect) = 1 – [P(both do) + P(exactly 1 does)] = 1 – [.343 b. Then P(at most n fixations) = P(D1 ) + P(D1 ′ ∩ D2 ) + P(D1 ′ ∩ D2 ′ ∩ D3 ) + …+ P(D1 ′ ∩ D2 ′ ∩ … ∩ Dn-1 ′ ∩ Dn ) = p + p(1 – p) + p(1 – p)2 + … + p(1 – p)n-1 = p [ 1 + (1 – p) + (1 – p)2 + … + (1 – p)n-1 ] = p• 1 − (1 − p ) n = 1 − (1 − p ) n 1 − (1 − p) Alternatively. 9897 85 . the sample size is small relative to the “population” size.8588)(. b. P(system works) = P( 1 – 2 works ∩ 3 – 4 – 5 – 6 works ∩ 7 works) = P( 1 – 2 works) • P( 3 – 4 – 5 – 6 works) •P( 7 works) = (.927 – (. while here it is not.1) e.5.0222 .9 + .927) = . P(B) = P(A ∩ B) + P(A′ ∩ B) 10.2) + • (.02 . but P(A ∩ B) = P(B|A)P(A) = 19 ⋅ 10 = . so the two numbers are quite different.8588+.2.9639) (.04. c.99) (.8588 With the subsystem in figure 2. 8) = . P(A) = 2000 = .1(1 − p ) 3 = P( passed ) .Chapter 2: Probability d.5) 3 84. Very little difference. a.04. P(flawed | passed) = . P(passes inspection) = P({not flawed} ∪ {flawed and passes}) = P(not flawed) + P(flawed and passes) = . the events are not independent. P(flawed | passed) = P ( flawed ∩ passed ) . P(A ∩ B) = .039984.9 + P(passes | flawed)• P(flawed) = .9) = . since P(A ∩ B) ≠ P(A)P(B). 2 P(A) = P(B) = .000 = P(B|A) P(A) + P(B|A′) P(A′) = 1999 2000 • (.1(.1(. 85. P(system works) = .2 9999 9999 P(A ∩ B) = . P(A)P(B) = . Yes.5) 3 = .9 + .14 connected in parallel to this subsystem. In a.9+(1 – p)3 (.0137 .1(1 − p ) 3 For p = . P(late) = P(stopped at 1 or 2 crossings) = 1 – P(stopped at none) = 1 .81 = .19) 87.5)(. For route #1.0523) + (.most1) 1− π 2 86 .216 (. b. π2 π π (1 − π ) π 1− π 1− π π (1 − π )π (1 − π ) 2 1 −π P(at most 1 is lost) = 1 – P(both lost) = 1 – π2 P(exactly 1 lost) = 2π(1 .Chapter 2: Probability 86. 
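The geometric-series identity used above, P(at most n fixations) = p[1 + (1 − p) + … + (1 − p)^(n−1)] = 1 − (1 − p)^n, can be spot-checked for any p; a sketch with an illustrative p = .3 (assumed for the check, not from the exercise):

```python
p = 0.3  # illustrative per-fixation detection probability (assumed)

for n in (1, 4, 10):
    # sum of P(detection occurs on fixation k) versus the closed form
    direct = sum(p * (1 - p) ** (k - 1) for k in range(1, n + 1))
    closed = 1 - (1 - p) ** n
    assert abs(direct - closed) < 1e-12
print("identity verified")
```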
P(late) = P(stopped at 2 or 3 or 4 crossings) = 1 – P(stopped at 0 or 1) = 1 – [.0523 For route #2.. a. 0523) = .1)] = .(.9)3 (.94 + 4(.5)(.19 thus route #1 should be taken.5)(. P(4 crossing route | late) = = P (4cros sin g ∩ late) P(late) .π) P(exactly 1 | at most 1 ) = P (exactly1) 2π (1 − π ) = P (at .  20    = 1140 3 19  b.# of crews having none of 10 best = 1140 - 10    = 1140 . The desired probability is 720/5040 = .10(500 ) + . c. 500 = .15 c. The total # ways to arrange the four forms: 10 × 9 × 8 × 7 = 5040. P(best will not work) = = . a.40(600 ) 666 P(Crack) = = = .50(500 ) + .1429 87 .291 90.120 = 1020 3 969 d.444 1500 1500 a. He must choose all 6 of the withdrawal forms to pass to a subordinate.Chapter 2: Probability Supplementary Exercises 88. 1500 . P(Surface Defect) = .15(600 ) 172 = 1500 1500 . The only way he will have one type of forms left is if they are all course substitution forms. The  6   desired probability is  6  = .333 .44(400) + .00476 10    6 b.10(500 ) 50 P(line 1 and Surface Defect) = = 1500 1500 So P(line 1 | Surface Defect) = = 50 1500 172 1500 = .08(400 ) + . P(Blemish | line 1) = . P(line 1) = b. He can start with the wd forms: W-C-W-C or with the cs forms: C-W-C-W: # of ways: 6 × 4 × 5 × 3 + 4 × 6 × 3 × 5 = 2(360) = 720. # having at least 1 of the 10 best = 1140 .   = 969 3 a.85 1140 89. .512 b. Use the quadratic formula to solve: .023 = .512+..32 92.8)(. P(A∪B) = P(A) + P(B) – P(A)P(B) .770 and P(A)P(B) = .608 c.77 – x ) = .Chapter 2: Probability 91.45 2 So P(A) = . and substituting this into the second equation.77 ± .032+. we get x ( .144.144 or x2 . then using the first equation.32 or .77 ± . P(1 sent | 1 received) = P (1sent ∩ 1received ) .4256 = = . (.8) = .144) 2 = . y = .626 = P(A) + P(B) . a.5432 88 .45 and P(B) = .13 = .144 So P(A) + P(B) = .144 = 0. .77 2 − ( 4)(. Let x = P(A) and y = P(B).7835 P(1received .8)(.023+.77x + .77 – x. 
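The route comparison and the posterior P(4-crossing route | late) = .216 above can be reproduced; a sketch, taking (per the solution) the four-crossing route as late when stopped at two or more crossings, the alternative route's late probability as 1 − .9² = .19, independent stops with probability .1 each, and the route chosen by a fair coin flip:

```python
from math import comb

p_stop = 0.1  # stopped at each crossing independently

# four-crossing route: late iff stopped at 2, 3, or 4 crossings
p_at_most_1 = sum(comb(4, k) * p_stop**k * (1 - p_stop) ** (4 - k) for k in (0, 1))
p_late_4cross = 1 - p_at_most_1   # ≈ .0523

p_late_other = 1 - 0.9 ** 2       # .19, as in the solution

# posterior that the 4-crossing route was taken, given the commuter is late
posterior = 0.5 * p_late_4cross / (0.5 * p_late_4cross + 0.5 * p_late_other)
print(round(posterior, 3))  # ≈ 0.216
```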
2 ∩ present ) P(det ected.0083 a.5 6× 5 × 4 2 89 .1074 When three experiments are performed. 94.2)(.different ) P( all .4) = = .2)(.905 (.384. If the impurity is present.are.Chapter 2: Probability 93. # orderings in which F is 3rd = 4×3×1*×2×1 = 24.1)(.8) + (.9) = . there are 3 different ways in which detection can occur on exactly 2 of the experiments: (i) #1 and #2 and not #3 (ii) #1 and not #2 and #3. (iii) not#1 and #2 and #3. the probability of exactly 2 detections in three (independent) experiments is (. 6) 96.384)(.027)(.exactly. the analogous probability is 3(.8)(.384)(. P(F last) = 4 × 3 × 2 × 1× 1 = .027. 1 120 = .2) (. There are 5×4×3×2×1 = 120 possible orderings. so 24 P(F 3rd) = 120 = .in.2) + (.8)(.different ) 6 × 5× 4 5 Denominator = = = .selects#1 ∩ all . P(exactly 1 selects category #1 | all 3 are different) P( exactly.exactly. = .5556 6× 6 × 6 9 = Numerator = 3 P(contestant #1 selects category #1 and the other two select two different categories) = 3× 1× 5 × 4 5 × 4 × 3 = 6×6× 6 6×6×6 The desired probability is then 5× 4 × 3 1 = = .1)(.1.are. Thus P(present | detected in exactly 2 out of 3) = P(det ected.in.8)(.8)(.4) + (.2 c. If the impurity is absent. ( * because F must be here).8) = . so P(BCDEF) = b.2 120 P(F hasn’t heard after 10 times) = P(not on #1 ∩ not on #2 ∩…∩ not on #10) 10 4 =  5 95. inspection) = . P(carrier | both + ve) = Let A = 1st functions.80)(.75 = .01) = = .05)2 (.02)(.89358 = .9754 .96.Chapter 2: Probability 97. P (carrier ∩ both.99) = .60) (following path “bad-good-good” on tree diagram) = .01) + (.01058 + .7656 P (both.96. P(needed no recrimping | passed inspection) = P( passed . P(A ∪ B) = .P(A ∩ B) = P(A) + . This gives P(B | A) = 100.95)2 (. P(A ∪ B) = P(A) + P(B) . 99. a. a. P(A ∩ B)=.10)2 (. so P(B) = . P(both + ) = P(carrier ∩ both + ) + P(not a carrier ∩ both + ) =P(both + | carrier) x P(carrier) + P(both + | not a carrier) x P(not a carrier) = (.initially ) P( passed .008 90 . positive) (.974 b.95 + (. 
positive) .90416 b. B = 2nd functions.01058 P ( B ∩ A) .81 P(E1 ∩ late) = P( late | E1 )P(E1 ) = (.9.89358 P(tests agree) = . P(pass inspection) = P(pass initially ∪ passes after recrimping) = P(pass initially) + P( fails initially ∩ goes to recrimping ∩ is corrected after recrimping) = .40) = .95 = .974 98.926 P( A) . Thus.01) + (.01058 P(both – ) = (.99) = .9 .81.90) 2 (.75 = = .05)(.. implying P(A) = .75.90)2 (. 10)(.05)(.50) + (.( 356) = .time) ..05)(.08)(.01)(.time) (.982 Let B denote the event that a component needs rework. P(E1 ′ | on time) = 1 – P(E1 | on time) = 1− 102.069 Thus P(A 1 | B) = 103..044 – (.117 b.069 (.069 i i i =1 (.( 991) (1000)10 = 1 .05)(. 4) =1− = .30) + (. P( E1 ∩ on.883 = ..883 ( 365) 10 P(at least two the same) = 1 .362 .507 for k=23 c.956 = .044 Thus P(at least one “coincidence”) = P(BD coincidence ∪ SS coincidence) = . P(all different) = (365)(364). P(at least two have the same SS number) = 1 – P(all different) = 1− (1000)(999). The law of total probability gives 3 P(late) = ∑ P(late | E ) ⋅ P (E ) i i i =1 = (.50) = . P(at least two the same) = .08)(.02)(.069 (.476 for k=22.20) P(A 3 | B) = = ..30) P(A 2 | B) = = .018 b. Then 3 P(B) = ∑ P( B| A ) ⋅ P( A ) = (.10)(.044) = .117)(.290 .. a.98)(.40) + (.20) = .601 P (on.Chapter 2: Probability 101..117 + . and = .10) = .50) + (.156 91 .348 . a. 33.10)(. .2941 < . .15)(.0625 = .2125 .0667 .67 .175 .075 a. P(B | R1 < R2 < R3 ) = . so classify as basalt. .25) = .25 + . P(G | R1 < R2 < R3 ) = b.5625 92 .75) + (.05.15 + . P(G | R1 < R3 < R2 ) = c. classify as granite. .25) = (.0375 P(G | R3 < R1 < R2 ) = = . P(erroneous classif) = P(B classif as G) + P(G classif as B) = P(classif as G | B)P(B) + P(classif as B | G)P(G) = P(R1 < R2 < R3 | B)(.75) + P(R1 < R3 < R2 or R3 < R1 < R2 | G)(. so classify as basalt.Chapter 2: Probability 104.15 = . 375)4 = .15625 = . a.5)4 = .25 p + .0625 b. 
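The carrier-screening posterior above (P(carrier) = .01, P(+ | carrier) = .90, P(+ | not a carrier) = .05, with the two tests conditionally independent) is a one-line Bayes computation:

```python
p_carrier = 0.01
sens = 0.90  # P(test + | carrier)
fpr = 0.05   # P(test + | not a carrier)

# both tests positive, conditioning on carrier status
p_both_pos = sens**2 * p_carrier + fpr**2 * (1 - p_carrier)   # ≈ .0106
p_carrier_given_both_pos = sens**2 * p_carrier / p_both_pos
print(round(p_carrier_given_both_pos, 3))  # ≈ 0.766
```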
17 P(detection by the end of the nth glimpse) = 1 – P(not detected in 1st n) = 1 – P(G1 ′ ∩ G2 ′ ∩ … ∩ Gn ′ ) = 1 .6 p . P(all in correct room) = b.0625 + . P(walks on 6th ) = P(2 of the 1st 5 are strikes. P(G | R1 < R3 < R2 ) > .0417 4 × 3 × 2 × 1 24 a.5 iff p > (most restrictive) .15 p + .7 (1 − p ) 17 P(G | R1 < R2 < R3 ) = If 105.25 p 4 P(G | R1 < R3 < R2 ) = > .5. 2341.1 + . 107. 4321. The 9 outcomes which yield incorrect assignments are: 2143.P(G1 ′)P(G2 ′) … P(Gn ′) n = 1 – (1 – p 1 )(1 – p 2 ) … (1 – p n ) = 1 - π (1 − p i ) i =1 106.6 p + .375 24 . P(G | R3 < R1 < R2 ) > .15 p 14 P(G | R3 < R1 < R2 ) = > . 3412.5)5 ](. 1 1 = = .6 p 1 = > .15625 c. P(Batter walks) = P(walks on 4th ) + P(walks on 5th ) + P(walks on 6th ) = .5. and 4312.Chapter 2: Probability d. #6 is a ball) = P(2 of the 1st 5 are strikes)P(#6 is a ball) = [10(.5 iff p > .5 p 7 .5 iff p > .15625 + . 3421.5) = . 2413. P(walks on 4th pitch) = P(1st 4 pitches are balls) = (.375 P(first batter scores while no one is out) = P(first 4 batters walk) =(. For what values of p will P(G | R1 <R2 <R3 ) > .1(1 − p ) .0198 d. 4123. so P(all incorrect) = 93 9 = . p> 14 always classify as granite.5? .2(1 − p ) 9 . 3142. P(A 1 ∩ A 3 ) = P(A 1 )P(A 3 ) = ¼. 94 .6)(. P(A 1 ) = P(draw slip 1 or 4) = ½. P(only NY is full) = P(A ∩ B′ ∩ C′) = P(A)P(B′)P(C′) = .. P(A 1 ∩ A 3 ) = P(draw slip 4) = ¼ Hence P(A 1 ∩ A 2 ) = P(A 1 )P(A 2 ) = ¼. P(only Atlanta is full) = .38 Note: s = 0 means that the very first candidate interviewed is hired. Each entry below is the candidate hired for the given policy and outcome. P(at least one occurs) = 1 – P(none occur) = 1 – (1 – p 1 ) (1 – p 2 ) (1 – p 3 ) (1 – p 4 ) = p 1 p 2 (1 – p 3 ) (1 – p 4 ) + …+ (1 – p 1 ) (1 – p 2 )p 3 p 4 + (1 – p 1 ) p 2 p 3 p 4 + … + p 1 p 2 p 3 (1 – p 4 ) + p 1 p 2 p 3 p 4 111. P(A 2 ∩ A 3 ) = P(A 2 )P(A 3 ) = ¼.88 b.Chapter 2: Probability 108.12 P(at least one isn’t full) = 1 – P(all full) = 1 . 
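The birthday computation above (.476 for k = 22, .507 for k = 23, so 23 people give a better-than-even chance of a shared birthday) can be reproduced directly:

```python
def p_shared_birthday(k):
    # complement of all k birthdays distinct, with 365 equally likely days
    p_all_diff = 1.0
    for i in range(k):
        p_all_diff *= (365 - i) / 365
    return 1 - p_all_diff

print(round(p_shared_birthday(22), 3))  # 0.476
print(round(p_shared_birthday(23), 3))  # 0.507
```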
P(A 1 ∩ A 2 ) = P(draw slip 4) = ¼.08 So P(exactly one full) = . thus there exists pairwise independence P(A 1 ∩ A 2 ∩ A 3 ) = P(draw slip 4) = ¼ ≠ 1/8 = P(A 1 )p(A 2 )P(A 3 ).12 and P(only LA is full) = . 109.18 Similarly. a. P(A 2 ∩ A 3 ) = P(draw slip 4) = ¼. P(A 3 ) = P(draw slip 3 or 4) = ½.18 + . P(A 2 ) = P(draw slip 2 or 4) = ½. 110.4) = .5)(.12 = .08 = . so the events are not mutually independent. Outcome s=0 s=1 s=2 s=3 Outcome s=0 s=1 s=2 s=3 1234 1 4 4 4 3124 3 1 4 4 1243 1 3 3 3 3142 3 1 4 2 1324 1 4 4 4 3214 3 2 1 4 1342 1 2 2 2 3241 3 2 1 1 1423 1 3 3 3 3412 3 1 1 2 1432 1 2 2 2 3421 3 2 2 1 2134 2 1 4 4 4123 4 1 3 3 2143 2 1 3 3 4132 4 1 2 2 2314 2 1 1 4 4213 4 2 1 3 2341 2 1 1 1 4231 4 2 1 1 2413 2 1 1 3 4312 4 3 1 2 2431 2 1 1 1 4321 4 3 2 1 s P(hire#1) 0 1 2 3 6 24 11 24 10 24 6 24 So s = 1 is best.12 + . P(all full) = P(A ∩ B ∩ C) = (. In the experiment in which a coin is tossed repeatedly until a H results. a Bernoulli random variable.CHAPTER 3 Section 3. 2. 2. Possible X values are1. W = 1 if the sum of the two resulting numbers is even and W = 0 otherwise. X = 1 if a randomly selected book is non-fiction and X = 0 otherwise X = 1 if a randomly selected executive is a female and X = 0 otherwise X = 1 if a randomly selected driver has automobile insurance and X = 0 otherwise 3. … (all positive integers) Outcome: X: RL AL RAARL RRRRL AARRL 2 2 5 5 5 95 . 1.1 1. In my perusal of a zip code directory. X = 4 for the outcome 44074. 5 (and not 0 or 1). 3. S: FFF SFF FSF FFS FSS SFS SSF SSS X: 0 1 1 1 2 2 2 3 2. 3. 4. I found no 00000. The sample space is infinite. 4. Thus possible X values are 2. M = the difference between the large and the smaller outcome with possible values 0. 3. 5. and X = 3 for 94322. X = 5 for the outcome 15213. 4. 6. yet Y has only two possible values. or 5. No. 4. nor did I find any zip codes with four zeros. let Y = 1 if the experiment terminates with at most 5 tosses and Y = 0 otherwise. a fact which was not obvious. 
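The three-hotel computation above uses independence; taking P(NY full) = .6, P(Atlanta full) = .5, P(LA full) = .4 (the .6 is inferred so as to match the .12, .18, .12, and .08 values in the solution), a sketch:

```python
# assumed from the solution's numbers: P(NY full)=.6, P(Atlanta full)=.5, P(LA full)=.4
pA, pB, pC = 0.6, 0.5, 0.4

p_all_full = pA * pB * pC                        # .12
p_only_A = pA * (1 - pB) * (1 - pC)              # only NY full: .18
p_only_B = (1 - pA) * pB * (1 - pC)              # only Atlanta full: .12
p_only_C = (1 - pA) * (1 - pB) * pC              # only LA full: .08
p_exactly_one = p_only_A + p_only_B + p_only_C   # .38
p_at_least_one_not_full = 1 - p_all_full         # .88

print(round(p_all_full, 2), round(p_exactly_one, 2), round(p_at_least_one_not_full, 2))
```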
respectively. 4. values are 0. 4. 5. 3. discrete f. …. 9. Y = 4: FSSS. FSSFSSS. discrete b.Chapter 3: Discrete Random Variables and Probability Distributions 7. not discrete g. 2(2). not discrete e. … . { x: 0< x < ∞ } if we assume that a rattlesnake can be arbitrarily short or long. 8. 3. 3c. possible values are { x: m < x < M }. 4. 2(3).000c. FSFSSS. c. 1. so x is discrete. 10 b.. 96 . 1+3. FFFSSS.e. FFFFSSS 9. 2. b. 4. With N = # on the list. 1. SFSSS. 5. 3(2). 6 c. Y = 6: SSFSSS. With c = amount earned per book sold. a. 2. …. 3. a. 2. 2. -1. 1. Y = 5: FFSSS. SFFFSSS. SFFSSS. X: -4. 12. 3. -3. 8. 8. FFSFSSS. Now a return to 0 is possible after any number of tosses greater than 1. 2. 5.…) an infinite sequence. 12. 1. 9. an infinite sequence) and X is discrete a. N. …giving a first element. 6. so possible values are 2. not discrete h. … -. Z: 0. 0. Possible values are 3. 2(1). { y: 0 < y < 14} since 0 is the smallest possible pH and 14 is the largest possible pH. -2. Possible values are 0. T = total number of pumps in use at both stations. 1+4. …(i.i. Returns to 0 can occur only after an even number of tosses. 3. possible S values are 2. 4. possible values are 0. 7. 2c. 4. … . 2(4). U: 0. 2. discrete c. 6.e. 15. 5.1+2. discrete Y = 3 : SSS. 2 10. 1. Possible values are 1. SFSFSSS. 3(4). With m and M denoting the minimum and maximum possible tension. 10. 6. 1. discrete d. 3(1). Possible values: 0. etc. FSFFSSS. 3(3). 6 d. … . … (1+1. Y = 7: SSFFS. 05 + .05 + .5 0 R el ative Freq ue ncy .10 + .25 = . no more than 50 can show up. P(y = 49) = . P(y > 50) = 1 .P(y = 50) = 1 .10 + .Chapter 3: Discrete Random Variables and Probability Distributions Section 3.15 a.3 0 .15 b.66.05 + . above. at most 49 of the ticketed passengers must show up. P(y = 50) = .14 + . For the 3rd person on the standby list.4 0 .1 0 0 4 5 6 7 8 x c.55 P(x > 6) = .45 .10 + .83 b. a. 97 . Using the information in a. 
In order for the flight to accommodate all the ticketed passengers who show up.12 + .14 + .27 12.17 = .2 11.12 + . at most 47 of the ticketed passengers must show up.12 = . We need y = 50.25 + .17 c.83 = .. x 4 6 8 P(x) . P(x = 6) = .2 0 . For you to get on the flight. .40 + .40 . P(y = 44) = .15 = . P( 2 ≤X≤ 5) = p(2) + p(3) + p(4) + p(5) = .30 F(1) = P(X ≤ 1) = P(X = 0 or 1) = .4) (1.20 = . i.5) (3.45 a.60.5) (4. P(X < 3) = P(X ≤ 2) = p(0) + p(1) + p(2) = .2) }] = 1 10 3 10 = .3) (1. 2 c.25 = .d.90 F(2) = P(X ≤ 2) = 1 The c. 1. a. P(3 ≤ X) = p(3) + p(4) + p(5) + p(6) = . P(Y ≤ 3) = p(1) + p(2) + p(3) = c.2) (1.4) (3. is 0 .Chapter 3: Discrete Random Variables and Probability Distributions 13. or X ≤ 2.10+. 2 ≥ X.15+.71 e.3) (2. Thus we desire P( 2 ≤X≤ 4) = p(2) + p(3) + p(4) = . P(X = 0) = p(0) = P[{ (3. and 6 – X = 4 to X = 2.70 b. F(0) = P(X ≤ 0) = P(X = 0) = .30  F(x) =  .55 d. P( 2 ≤Y≤ 4) = p(2) + p(3) + p(4) = 6 15 1 15 = .90  1 x<0 0 ≤ x <1 1≤ x < 2 2≤ x 98 .45 c. (1.10+. 6 – X ≥ 4 if 6 – 4 ≥ X. a.f. No 50 50 y =1  50  5 15.5) b.65 f. and P(X ≤ 2) = . ∑  ≠ 1 .5)}] = P(X = 2) = p(2) = P[{ (1. ∑ p( y) = K[1 + 2 + 3 + 4 + 5] = 15K = 1 ⇒ K = 14.20+. 6 – X = 3 to X = 3. 5 y =1 b.e. and p(x) = 0 if x ≠ 0.1 P(X = 1) = p(1) = 1 – [p(0) + p(2)] = . P(X ≤ 3) = p(0) + p(1) + p(2) + p(3) = .4 9 15 = .3 = . The number of lines not in use is 6 – X .15+.6  y2  1 55  = [1 + 4 + 9 + 16 + 25] = d.5) (4.4) (2. so 6 – X = 2 is equivalent to X = 4.5) (2.4) (3. 9)2 ] = . SFSS.3)3 ] =.9)2 + (.SFFS.3)] =.4116 2 FFSS. a.30 .162 c.81 b. Thus.40 . 99 .SSSF 4[(.SFFF 4[(.4.5. The fifth battery must be an A.20 .SSFF 6[(.7)4 =.… 17.00324 d.FSFS. P(2) = P(Y = 2) = P(1st 2 batteries are acceptable) = P(AA) = (.9)2 = 2[(.SSFS.7)(.3. Relative Freq ue ncy . x Outcomes p(x) 0 FFFF (.0756 4 SSSS (.1)(.2646 3 FSSS.9)2 .1)(. P(X ≥ 2) = p(2) + p(3) + p(4) = . y = 2.2401 1 FFFS.FFSF.FSSF.3)2 ] =.10 0 0 1 2 3 4 In sure d c. p(3) = P(Y = 3) = P(UAA or AUA) = (.7)3 (.0081 b. 
P(Y = y) = p(y) = P(the y th is an A and so is exactly one of the first y – 1) =(y – 1)(.Chapter 3: Discrete Random Variables and Probability Distributions 16.3483 This could also be done using the complement.1)y-2 (.9)2 ] = . and one of the first four must also be an A.0756+. a.1)(.9)(.SFSF.FSFF.0081 = . p(x) is largest for X = 1 d. p(5) = P(AUUUA or UAUUA or UUAUA or UUUAA) = 4[(.9) = .3)4 =.7)2 (.2646+.1)3 (. P(0) = P(Y = 0) = P(both arrive on Wed. the other 3 individuals.0 0 19.09 P(1) = P(Y = 1) = P[(W.25 So p(3) = 1 – (. D.25) = .Chapter 3: Discrete Random Variables and Probability Distributions 18.25 p(2) = P(Y = 2) = P(B. F(m) = 7 36 .1 0.40 P(2) = P(Y = 2) = P[(W.25 20. p(1) = P(M = 1 ) = P[(1. 1 2 3 4 5 6 7 8 Let A denote the type O+ individual ( type O positive blood) and B. or D first and A next) = p(4) = P(Y = 3) = P(A last) = 3 4 ⋅ 23 ⋅ 12 = 1 4 3 4 ⋅ 13 = 1 4 = .25 = .9 0.3) + (.7 0.5 0.25+.Th) or (F.1)] = 1 36 p(2) = P(M = 2 ) = P[(1.2 0.2) or (2.2) or (3. p(4) = b.F)or(Th.4 0.6 0. C.W) or (F.32] = .3) = .W)or(Th. C.32 P(3) = 1 – [.F)or(F.40 + .4) + (.Th)] = (.3) or (3.4)(.1) or (2.Th)or(Th.19 100 .4)(.3) or (2.) = (.3)(.F)] = .1) or (3. a.4) = . p(5) = 0 for m < 1.3 0.0 0.3)] = Similarly. and p(6) = 5 36 11 36 for 1 ≤ m < 2. 9 36 .25+.3)(.09 + .8 0. Then p(1) – P(Y = 1) = P(A first) = 14 = . 1 36 0 1  36  364 9 F(m) =  36  16  36  25 36 1  m <1 1≤ m < 2 2≤ m <3 3≤ m < 4 4≤m<5 5≤m<6 m≥6 1.2)] = 3 36 p(3) = P(M = 3 ) = P[(1. 39 . P(X = 2) = ..53 a.10 F(1) = P(X ≤ 1) = p(0) + p(1) = ..55.10   .45 = .00  .25   .1. 24. 1. and P(2 ≤ X ≤ 5) = F(5) – F(1) = .70. x 1 3 4 6 12 p(x) . P(X > 3) = 1 . 3.92 .2.70  .90   .70.92 .39 = .15 .. P(X < 3) = P(X ≤ 2) = F(2) = .20 b. then B) = P(GB) = (1 – p)p P(2) = P(Y = 2) = P(GGB) = (1 – p)2 p Continuing. and the probability of any particular value is the size of the jump at that value. 4. P(2 < X < 5) = . and 6.96 . and F(6) = 1.71 22.3.67 = .45 F(3) = . a. 5.25 = . 
b.60 .78 d.40 P(3 ≤ X ≤ 6) = F(6) – F(3-) = .… 101 .30 .19 = . P(3 ≤ X) = 1 – P(X ≤ 2) = 1 – F(2) = 1 .10 . Thus we have: 23..45.d.25 F(2) = P(X ≤ 2) = p(0) + p(1) + p(2) = .60 P(0) = P(Y = 0) = P(B first) = p P(1) = P(Y = 1) = P(G first. P(2 ≤ X ≤ 5) = . Possible X values are those values at which F(x) jumps.45 F(x) =   .40 = .. 2...00  x<0 0 ≤ x <1 1≤ x < 2 2≤ x< 3 3≤ x< 4 4≤ x<5 5≤x<6 6≤ x Then P(X ≤ 3) = F(3) = .05 . so we first calculate F( ) at each of these values: F(0) = P(X ≤ 0) = P(X = 0) = . is  .96 1. F(5) = . p(y) = P(Y=y) = P(y G’s and then a B) = (1 – p)y p for y = 0. The jumps in F(x) occur at x = 0..19 = . F(4) = .Chapter 3: Discrete Random Variables and Probability Distributions 21. The c.30 = .96.33 c.30 P(4 ≤ X) = 1 – P(X < 4) = 1 – F(4-) = 1 .90.f. then home) + P(F. The sample space consists of all possible permutations of the four numbers 1. home) + P(M. 3. p(1) = P(Y = 1) = p(3) = P(Y = 3) = 0. 1. 3 . ( 23 )x −1 ( 13 ) for x = 1. home) + P(M. Possible X values are 1. … P(1) = P(X = 1 ) = P(return home after just one visit) = 1 3 P(2) = P(X = 2) = P(second visit and then return home) = P(3) = P(X = 3) = P(three visits and then return home) = In general p(x) = b. M. … p(0) = P(male first and then home) = 1 2 ⋅ 13 = 16 . p(2) = ( 12 )( 23 )2 ( 53 )(13 ) + ( 12 )( 23 )2 ( 53 )( 13 ) . y value 4 2 2 1 1 2 2 0 Thus p(0) = P(Y = 0) = 9 24 outcome 2314 2341 2413 2431 3124 3142 3214 3241 . 3. a. F. 2. so as in a. … Possible Z values are 0. 3. … ⋅ 13 ( 23 )2 ⋅ 13 The number of straight line segments is Y = 1 + X (since the last segment traversed returns Alvie to O). 2. p(y) = c. 2. . 2 3 ( 23 ) y − 2 ( 13 ) for y = 2. a. 4: outcome 1234 1243 1324 1342 1423 1432 2134 2143 b. 102 y value 1 0 0 1 1 0 2 1 8 24 outcome 3412 3421 4132 4123 4213 4231 4312 4321 . p(z) = ( 12 )( 23 )2 z −2 ( 53 )( 13 ) + ( 12 )( 23 )2 z −2 ( 53 )(13 ) = ( 2454 )( 23 ) 2z −2 for z = 1. … 26.Chapter 3: Discrete Random Variables and Probability Distributions 25. In general. 
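The cdf manipulations used repeatedly above — building F(x) as a running sum of the pmf and computing interval probabilities as differences of F — can be sketched in a few lines of Python. The pmf below is hypothetical, chosen only to illustrate the mechanics, not taken from any exercise.

```python
# F(x) = P(X <= x) is the running sum of the pmf, and for integer-valued X,
# P(a <= X <= b) = F(b) - F(a - 1): subtract the cdf just below the lower limit.
pmf = {0: 0.10, 1: 0.15, 2: 0.20, 3: 0.25, 4: 0.20, 5: 0.10}  # hypothetical

def F(x):
    """Cumulative distribution function: sum of p(k) over k <= x."""
    return sum(p for k, p in pmf.items() if k <= x)

# Example: P(2 <= X <= 5) = F(5) - F(1)
prob = F(5) - F(1)
print(round(prob, 2))  # 0.75
```

The same subtraction pattern underlies expressions such as P(2 ≤ X ≤ 5) = F(5) − F(1) in the exercises above.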
3. 3. Similarly. p(3) = P(Y = 3) = 1 24 . M. 2. p(1) = P(exactly one visit to a female) = P(female 1st . F. 2. home) = 12 13 + 12 23 13 + 12 23 13 + 12 23 23 13 ( )( ) ( )( )( ) ( )( )( ) ( )( )( )( ) = ( 12 )(1 + 23 )( 13 ) + ( 12 )( 23 )( 23 + 1)( 13 ) = ( 12 )( 53 )(13 ) + ( 12 )( 23 )( 53 )(13 ) where the first term corresponds to initially visiting a female and the second term corresponds to initially visiting a male. p(2) = P(Y = 2) = y value 0 0 1 0 1 2 0 0 6 24 . 60 x= 0 4 b.2436 = .4602) or ( 0.001620+.168540+.74 σy = .9364 = .1 V(Y) = E(Y2 ) – [E(Y)]2 = 1.08) + …+ (4 – 2.05) = 110 30. E (Y) = . If x1 < x2 .60) + (1)(. E (Y2 ) = 1.1 – (.08) + (1)(.2602.25) + (2)(.60) + (100)(.8602 = (-.27) + (4)(.74 = . V(X) = ∑ ( x − 2.238572+.05) x= 0 = .9364 . E (Y) = 4 2  2 ∑ x ⋅ p ( x ) − ( 2. F(x1 ) = F(x2 ) when P( x1 < X ≤ x2 ) = 0.05) = 2. F(x2 ) = P(X ≤ x2 ) = P( {X ≤ x1 } ∪ { x1 < X ≤ x2 }) = P( X ≤ x1 ) + P( x1 < X ≤ x2 ) ≥ P( X ≤ x1 ) = F(x1 ). P(Y = 0) + P(Y =1) = .85 103 . 1).05) = .06 4 b.9677 c. 4 a. V(X) = a.45) + (3)(.06)2 (.8602 E (Y) ± σy = .06) = 5.10) + (900)(.3 28.06) 2 ⋅ p ( x ) = (0 – 2.06)2 (.339488+.60)2 = .Chapter 3: Discrete Random Variables and Probability Distributions 27. E (X) = ∑ x ⋅ p( x) x= 0 = (0)(.25) x= 0 + (400)(.60.15) + (2)(.1800 – 4. E (100Y2 ) = ∑ 100 y 2 ⋅ p ( y ) = (0)(. 4 ∑ y ⋅ p ( y ) = (0)(.9364 x=0  29. σx= d. Section 3.10) + (3)(. 1.60 ± .188180 = . 2) + (15. on the average. x 1 2 3 4 5 6 h 3 (x) -1 1 3 3 3 3 h 4 (x) -2 0 2 4 4 4 p(x) 1 15 2 15 3 15 4 15 3 15 2 15 6 E[h 3 (X)] = ∑ h (x ) ⋅ p( x) = (-1)( 3 x =1 1 15 2 ) + … + (3)( 15 ) = 2. E(X2 ) = 32. V(X) = 272.1)2 (. h 3 (X) = 2x – 3. h 4 (X) = 2x – 4 for x=1. V(25X – 8. but at x = 3. so E(X) is finite.01X2 ] = E(X) .5)2 (. E(X) = ∞ x =1 x =1 ∞ infinite series that ∑ x =1 34. x2 Let h(X) denote the net revenue (sales revenue – order cost) as a function of X.5) = V(25X) = (25)2 V(X) = (625)(3. E(x79 ) = (079 )(1 – p) + (179 )(p) = p ∞ 33.38)2 = 3. 
but it is a well-known result from the theory of x2 1 < ∞.66 a. E (X) = (13.5 = 401 c.01E(X2 ) = 16.5) + (19. Then h 3 (X) and h 4 (X) are the net revenue for 3 and 4 copies purchased.9936 b.5 = (25)(16.5) + (19. E (25X – 8. but plateaus at 4 for x = 4.3) = 16.Chapter 3: Discrete Random Variables and Probability Distributions 31.6 the revenue plateaus.9936) = 2496 d. a.2.38.298 – (16. For x = 1 or 2 . E[h(X)] = E[X .4.9)2 (.6667 Ordering 4 copies gives slightly higher revenue.6. 1 ∑x ⋅ p( x) = (02 )((1 – p) + (12 )(p) = (1)(p) = p 2 x= 0 b.4667 6 Similarly. E (X2 ) = (13.2) + (15.38 – 2.5.9)(..298.72 = 13.3. Following similar reasoning. c ∑ x ⋅ p( x ) = ∑ x ⋅ x 3 ∞ = c∑ x =1 1 .5. respectively.38) – 8.3) = 272.5)(.5) = 25 E (X) – 8. 104 .1)(.. E[h 4 (X)] = ∑ h ( x) ⋅ p(x ) = (-2)( 4 x =1 1 15 2 ) + … + (4)( 15 ) = 2. V(X) = E(X2 ) – [E(X)]2 = p – p 2 = p(1 – p) c. 81 x =1 Each lot weighs 5 lbs. and the variance of the weight left is V(100 – 5X) = V(-5X) = 25V(x) = 20.000 H(x) 0 500 4. Thus the expected weight left is 100 – 5E(X) = 88.m.f.5. so weight left = 100 – 5x.000 5. whereas 6 x=1 x 3 .500 E[h(X)] = 600. With a = -1. 6 1 6 1 1 1 1 = . Premium should be $100 plus expected value of damage minus deductible or $700. b. of X reflected about zero.02 x 0 1.8 .1. E(X) = ∑ x ⋅ p( x) = 2. 4 38. P(x) .1 – (2. but both have the same degree of spread about their respective means.3. E(X) = (n + 1)( 2n + 1)  n + 1  n2 −1 So V(X) = −  = 6 12  2  2 37. n 1  n( n + 1)  n + 1 1 1 x ⋅ =     ∑  n   n ∑ x = n  2  = 2 x =1 x =1 n n 1  n( n + 1)( 2 n + 1)  ( n + 1)( 2 n + 1) 1 2 1 2 E(X2 ) = ∑ x ⋅   =  ∑ x = = n  6 6  n   n  x =1  x =1 n 36. 39. of –X is just the line graph of the p.f. V(aX + b) = V(-X) = a2 V(X).25. The line graph of the p. a. ∑ [aX + b − E (aX + b)] ⋅ p( x) = ∑ [aX + b − (aµ + b)] = ∑ [ aX − ( aµ )] p ( x ) = a ∑ [ X −µ ] p ( x ) = a V ( X ). 40. so you  = ∑   ⋅ p ( x ) = ∑ = .08 . 
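The expected-value exercises in this section all follow one recipe: E(X) = Σ x·p(x), E(X²) = Σ x²·p(x), and the shortcut V(X) = E(X²) − [E(X)]². A short sketch (the pmf here is hypothetical, used only to show the arithmetic):

```python
# Mean and variance of a discrete random variable from its pmf,
# using the shortcut formula V(X) = E(X^2) - [E(X)]^2.
pmf = {0: 0.10, 1: 0.20, 2: 0.30, 3: 0.25, 4: 0.15}  # hypothetical pmf

mean = sum(x * p for x, p in pmf.items())              # E(X)
second_moment = sum(x**2 * p for x, p in pmf.items())  # E(X^2)
variance = second_moment - mean**2                     # V(X)
sd = variance ** 0.5                                   # sigma_X

print(round(mean, 4), round(variance, 4))  # 2.15 1.4275
```

Summing (x − µ)²·p(x) directly gives the same variance; the shortcut form is what the worked solutions above use.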
suggesting V(-X) = V(X).500 9.5  X  x=1  x  E[h(X)] = E  expect to win more if you gamble.m. b = 0.408 .000 10.Chapter 3: Discrete Random Variables and Probability Distributions 35. so V(X) = 6.3) 2 2 = . 2 V(aX + b) = x 2 x x 2 2 x 105 2 2 p( x) . E(X ) = 6.286 .1 . so P(|x-µ| ≥ 2σ) = P(X is lat least 2 s. E[X(X-1)] = E(X2 ) – E(X).µ) = E(X) . identical to the upper bound. and 10. Let p(-1) = = 13 . . x= 0  6  σ 2 = ∑ x 2 ⋅ p( x)  − µ 2 = 2. a.11 .04 .µ = 0. so P(|x-µ| ≥ 3σ) = P(| X | ≥ 1) 1 = P(X = -1 or +1) = 18 + 181 = 19 . E(X . For K = 3.72) = P(X = 6) = . 42. so the expected deviation from the mean is zero. Chebyshev’s bound of .06 .4.44.72. µ = 0 and σ d. ⇒E(X2 ) = E[X(X-1)] + E(X) = 32.64 . 43.01 6 b. b. and µ + 2σ = 5.5 a. k 2 3 4 5 10 1 k2 . When c = µ.025 is much too conservative.54  x =0  Thus µ . V(X) = 32.25 . µ = ∑ x ⋅ p( x) = 2.44 or ≥ 5. p (+1) = 1 50 .5 c. p( 0) = 24 25 106 .’s from µ) = P(x is either ≤-.µ = µ . P(|x-µ| ≥ kσ) = 0.5 – (5)2 = 7.04. 1 50 .2σ = -. E(X – c) = E(aX + b) = aE(X) + b = E(X) – c. σ = 1.d. k2 c.37. V(X) = E[X(X-1)] + E(X) – [E(X)]2 With a = 1 and b = c.Chapter 3: Discrete Random Variables and Probability Distributions 41. here again pointing to the very conservative nature of the bound 1 .5. 7) .05) = .6) = 8 5  (.7) = .635 d.P(X ≤ 4) = 1 – B(4.00497664) = .8.10.10..279 5 c. b(3.10.9)12 = . B(4.3) = .3) .993 .Chapter 3: Discrete Random Variables and Probability Distributions Section 3..05) = 1.10.9) 12 = 1 – (.3) .007 c.6) + b(4.1 . 6) (.124 3 b.1875 σx = 1.B(1.3) = . b(4.3) = ..6) = 8 3  (.3) = B(4.10.. b(6.25 V(X) = np(1 – p) = (25)(.716 d.05)(..3) = .10.570 12  0  (.P(X ≤ 1) = 1 ..851 f.873 b...8.4 44.850 b.6) (. P( 3 ≤ X ≤ 5) = b(3.95) =1.10.7) = .. P(X ≤ 2) = B(2.8.3) = .05) a.10..05) = ..718 0 45.10... P( 2 ≤ X ≤ 4) = B(4. P(X ≤ 1) = B(1. E(X) = np = (25)(.200 c.B(2... .25. P( 1 ≤ X ≤ 4) = P(X ≤ 4) – P(X ≤ 0) = . 
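The Chebyshev exercises above can be checked exactly with rational arithmetic. The two-point-plus-atom distribution p(−1) = p(1) = 1/18, p(0) = 8/9 (as in the exercise) attains the bound P(|X − µ| ≥ kσ) ≤ 1/k² with equality at k = 3, which is why the bound cannot be improved in general:

```python
from fractions import Fraction

# Distribution from the exercise: p(-1) = p(1) = 1/18, p(0) = 8/9.
pmf = {-1: Fraction(1, 18), 0: Fraction(8, 9), 1: Fraction(1, 18)}

mu = sum(x * p for x, p in pmf.items())               # 0
var = sum((x - mu) ** 2 * p for x, p in pmf.items())  # 1/9
sigma = Fraction(1, 3)                                # sqrt(1/9)

k = 3
# Exact probability of being at least k standard deviations from the mean:
prob = sum(p for x, p in pmf.items() if abs(x - mu) >= k * sigma)
print(prob, Fraction(1, k**2))  # both print as 1/9: Chebyshev's bound is attained
```

For well-behaved distributions the actual tail probability is usually far below 1/k², which is the "very conservative" behavior noted in the solution.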
P(2 < X < 6) = P( 3 ≤ X ≤ 5) = B(5.B(5..277 = .B(3.4) 5 = (56)(...993 = .0000 g. P(X = 0) = P(X ≤ 0) = . 1) (.200 d.B(1.701 e.8.00221184) = .6) + b(5. a. P(1 ≤ X) = 1 – P(X = 0 ) = 1 - a.4) 3 = (56)(.10.10.. P(X ≥ 5) = 1 ..0897 107 .25. P(2 < X) = 1 .6) = .10.10. b(5.7) = B(6.8.277 e.3) ... X ~ Bin(25. 46. 5314] = .3543..786 ..25) = .3543  x  1 a. If at least 10 have no citations (Failure). P( 5 ≤ X ≤ 10) = P(X ≤ 10) – P(X ≤ 4) = .403 b. 49...25 . n = 20 a.1) (.  0 ii) Select 4 goblets.1143 c.25) = 5.9) 4 = .786 b. Let S = has at least one citation.1) (. we know P(X = 1) = .774 108 .3543 + .91854 48.15.6561 .10)  n x  6  ( p ) (1 − p) n− x =  (.9)  × . one of which has a defect.  6 0 (.B(5. and the 5th is good:  4  1 3  (.B(5.. Let S = comes to a complete stop.  0 From a .25) ..20.20.25) = .383 d. E(X) = (20)(.20) = B(6. P(X ≤ 6) = B(6.4.5314 .40) = . We expect 5 of the next 20 to stop. X ~ Bin(6.217 = .991 . 9) 6 = .40) = . P(X ≤ 7) = B(7.26244 = .6561 + .25) = 1 . so p = . Then p = .9) 5 = ... P(X = 6) = b(6.26244  1   So the desired probability is .9 = .617 = .787 c.617 = .20.20.1)1 (. Either 4 or 5 goblets must be selected i) Select 4 goblets with zero defects: P(X = 0) =  4 0  (. n = 15 a. 1) (. P(X ≥ 2) = 1 – [P(X = 0) + P(X = 1)]. .15.169 c..Chapter 3: Discrete Random Variables and Probability Distributions 47. and P(X = 0) =  Hence P(X ≥ 2) = 1 – [. P(X ≥ 6) = 1 – P(X ≤ 5) = 1 . then at most 5 have had at least one (Success): P(X ≤ 5) = B(5. P(X = 1) = b..20. P( 3 ≤ X ≤ 7) = P(X ≤ 7) – P(X ≤ 2) = ..9 = P(X=1) = .4) = 2.6)(.19) +  (.397 c. Let event B = {battery has acceptable voltage}.08.705 .02) = .02)(. We desire P( 5 ≤ X ≤ 7) = P(X ≤ 7) – P(X ≤ 4) = . 7.4 = 1.122 = . X∼ Bin (10. Then p = P(S) = P(replaced | submitted)⋅P(submitted) = (.81)9 (. has a binomial distribution with n = 10.5) + 24. e. V(X) = np(1 – p) = (10)(.7 x + 2σ = .667 c.9)(.9 So P(0 = X = 1.92) 8 = .45. σx = 1. . .49 = ..166 = . 
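The binomial quantities used throughout Section 3.4 can be computed directly rather than read from the cumulative binomial table; a minimal sketch:

```python
from math import comb

def b(x, n, p):
    """Binomial pmf: b(x; n, p) = C(n, x) p^x (1-p)^(n-x)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

def B(x, n, p):
    """Binomial cdf: B(x; n, p) = P(X <= x)."""
    return sum(b(k, n, p) for k in range(x + 1))

# Example: for X ~ Bin(25, .05), P(X <= 2) = B(2; 25, .05)
print(round(B(2, 25, 0.05), 3))  # 0.873, matching the tabled value used above
```

Interval and tail probabilities then follow as differences, e.g. P(a ≤ X ≤ b) = B(b; n, p) − B(a − 1; n, p) and P(X ≥ c) = 1 − B(c − 1; n, p), which is exactly how the solutions above are assembled.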
the number among the company’s 10 phones that must be replaced.. 51. 53.02)(.03 hours 25 X = the number of flashlights that work. 08) 2 (.5(3) = 3.98) = .98)24 = .308 b. p = .012 = .B(5.603 = . 10   (. P(X ≥ 6) = 1 – P(X ≤ 5) = 1 .407 9 10  109 .Chapter 3: Discrete Random Variables and Probability Distributions 50.20.4.5(4. Then P(flashlight works) = P(both batteries work) = P(B)P(B) = (.6) = 6.55 E(X) ± σx = ( 4. Thus X.285 + . P(X=1) = 25(. P(X=1) = 1 – P(X=0) = 1 – (.367 = ..1478 2 x = 25(.20) = .08.9) = . P(X=2) = 1 – P(X=1) = 1 – [. P(X=9) = P(X=9) + P(X=10) 10  10    (.81 We must assume that the batteries’ voltage levels are independent.81).833 .81)10 = . .821 Let S represent a telephone that is submitted for service while under warranty and must be replaced.02) a. so p(2) = P(X=2) = 52. X ∼ Bin (25.397] d.833 .40)(. X ~ Bin(10.98)25 = 1 .55 ).5 + 1.60) = 1 . σ = npq = 25(.5 .308 + ..633 b.60) a. E(X) = np = (10)(. 5 0.05 .01 .01 .7 0.2 0. a.05 .0 p c.0 P(acc ept) 54.25 P(accept) 1. and X denote the number of defectives in the sample.678 .4 0.5 0.10.526 b.9 1.p) p .988 .00 .930 .0 0.914 .00 .15.01 .816 .1 and low for p > .20 . P(the batch is accepted) = P(X ≤ 1) = B(1.244 P(the batch is accepted) = P(X ≤ 2) = B(2. P(the batch is accepted) = P(X ≤ 2) = B(2.3 0.25 P(accept) .376 .236 We want a plan for which P(accept) is high for p ≤ .1 0.6 0.996 . 0. e.398 .10 . 110 .0 0.p) p .20 .20 .8 0.964 .10 .1 The plan in d seems most satisfactory in these respects.10. 1.05 .p) p . d.Chapter 3: Discrete Random Variables and Probability Distributions Let p denote the actual proportion of defectives in the batch.10 .25 P(accept) 1.736 . 1)2 = . If p = .5..9) = .8 becomes B(14.5. P(rejecting claim when p = . b(x.8) = . so there is no variability in X) or if p = 1 (whence every trial is a success and again there is no variability in X) d [np (1 − p) ] = n[(1 – p) + p(-1)] = n[1 – 2p = 0 dp ⇒ p = .5 – (1.9963 Thus topic B should be chosen. 
so now A should be chosen. 58. a.5)(25)(. n..5X. However. 56.7) = 1 . p) x n − x     Alternatively.B(15. Whenever p > . smaller than in a above.6) = 1 .5 – 1. respectively.B(15. 59.8) = B(15.5 – 1.99 If B is chosen. the probabilities are .5.5E(x) = 62.7) = 1 .017 b. P(not rejecting claim when p = . (1 – p) < .5 so probabilities involving X can be calculated using the results a and b in combination with tables giving probabilities only for p ≤ . b. so E(h(X)) = 62... yielding P(n-x S’s when P(S) = 1 – p) as desired.75 for A and ..586. since the two events are identical). B(x. the probabilities of b above increase to . P(x S’s when P(S) = 1 – p) = P(n-x F’s when P(F) = p). The probability of rejecting the claim when p = .006.Chapter 3: Discrete Random Variables and Probability Distributions 55.n. this probability is = 1 .. b.25.25.189 = .n.25.6) = $40.1)3 (. c. 1 – p) =  n  n  n− x  (1 − p) x ( p) n− x =   ( p ) (1 − p ) x = b(n-x. np(1 – p) = 0 if either p = 0 (whence every trial is a failure. which is easily seen to correspond to a maximum value of V(X).6.7) = P(X ≥ 16 when p = .425. a.00 57. P(at least half received) = P(X ≥ 2) = 1 – P(X ≤ 1) = 1 – (0.575 = .25.1)4 – 4(. but the labels S and F are arbitrary so can be interchanged (if P(S) and P(F) are also interchanged).1 – p) = P(at most x S’s when P(S) = 1 – p) = P(at least n-x F’s when P(F) = p) = P(at least n-x S’s when P(S) = p) = 1 – P(at most n-x-1 S’s when P(S) = p) = 1 – B(n-x-1. h(x) = 1 ⋅ X + 2. If topic A is chosen. for p = . when n = 2.5np – 62. P(at least half received) = P(X ≥ 1) = 1 – P(X = 0) = 1 – (.6875 for B.902 and .5 111 .5 – 1. n.p) c. when n = 4. a.25(25 – X) = 62.8) = .811. 0080(.6.8)(.. Although there are three payment methods. a binomial r. so equals 1.2) + .6.4) = .6.1) + .2) + b(0. leaving only np.277 x= 0 c. n-1 of a binomial p.0003(. and V(X) = 21. X = 0 if there are 3 reservations and none show or …or 6 reservations and none show. a. 
P(X = 3) = .Chapter 3: Discrete Random Variables and Probability Distributions 60. The desired probability is P(X = 5 or 6) = b(5. 2. and 4.0001(.1) + b(0. 2.8) = .5) = 50. 1.7) = 70.3. With S = doesn’t pay with cash. 61. we are only concerned with S = uses a debit card and F = does not use a debit card.3932 + .0016(..3) + .5.002) = 2(.8) = 4(.f. Proof of E(X) = np:  n n E(X) = ∑ x ⋅  x  p n x x= 0 n = n! (1 − p ) n− x = ∑ x ⋅ ∑ (x − 1)!(n − x )!p x=1 x (1 − p ) n− x x =1 n! p x (1 − p) n− x x! (n − x )! n (n − 1)! x −1 n −x = np∑ p (1 − p ) x =1 ( x − 1)!( n − x )! (n − 1)! p y (1 − p ) n−1− y (y replaces x-1) ( y )! ( n − 1 − y )! y =0 n = np ∑ =  n−1  n − 1 y   p (1 − p ) n −1− y  np ∑  y=0  y   The expression in braces is the sum over all possible values y = 0..8)(.4) = . Then When x is: 0 1 2 3 4 H(x) is: 4 3 2 1 0 62.8)(.3. Possible X values are 0.0013 P(X = 1) = b(1. so P(X = 0) = b(0.5.6.4) = .082) = ..8)(.6. b.2273 ] = . n = 100 and p = .0172 P(X = 2) = . E(X) = np = 100(.m.8)(.4.000) + 3(.. P(X = 4) = 1 – [ . with n = 6 and p = .8)(.1) + … + b(1...0906. So n = 100 and p = . E(X) = np = 100(.. based on n-1 trials.3) + b(0. 1. 5 0 6 0 6 E[h(X)] = ∑ h(x ) ⋅ b( x.015 + 3(. as desired.0013 + … + .2621 = . Let X = the number with reservations who show. a..8.8) + b(6. and V(X) = 25. 3. Thus we can use the binomial distribution.2273.v.7. Let h(X) = the number of available spaces.6553 b. … .6636 112 . 041 + .Chapter 3: Discrete Random Variables and Probability Distributions 63.021 = . 6  6  15 − 5  6  5  = 2 . a. M=6  6  9      2  3  = 840 = .811.  15   14   15   15  σ = V ( X ) = . µ = 15 and σ = 1.024 = .926 E(X) = 113 .15| ≥ 5.021 + .573  15  15  3003 3003 3003     5 5 126 + 756 P(X=2) = 1 – P(X=1) = 1 – [P(X=0) + P(X=1)] = 1 − = .5 64. When p = . Section 3.857 .280 b.15| ≥ 3.874) = P(X ≤ 11 or X ≥ 19) = . or P(|X .236. P(|X . µ = 10 and σ = 2.065. 
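The identities E(X) = np and V(X) = np(1 − p) derived above can also be verified numerically by summing directly over the binomial pmf; a small check with arbitrary illustrative parameters n = 6, p = .4:

```python
from math import comb

# Numerical check of E(X) = np and V(X) = np(1 - p) for X ~ Bin(n, p).
n, p = 6, 0.4  # illustrative values, not from any particular exercise
pmf = [comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(n + 1)]

mean = sum(x * pmf[x] for x in range(n + 1))
var = sum((x - mean) ** 2 * pmf[x] for x in range(n + 1))

print(round(mean, 10), round(var, 10))  # 2.4 and 1.44, i.e. np and np(1-p)
```

The direct sums agree with the closed forms to floating-point precision, as the algebraic proof guarantees.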
All these probabilities are considerably less than the upper bounds .472 and 3σ = 6. X ∼ Hypergeometric N=15. n=5. V(X) =   ⋅ 5 ⋅   ⋅ 1 −  = . so 2σ = 3.811) = P(X ≤ 9) = .004.µ| ≥ 2σ) = P(X ≤ 5 or X ≥ 15) = .5.472 is satisfied if either X ≤ 5 or X ≥ 15. The inequality |X – 10| ≥ 4.874 and 3σ = 5. P(X=2) = 15  3003   5 P(X=2) = P(X=0) + P(X=1) + P(X=2)  9   6  9       5 1 4 840 126 + 756 + 840 1722 =   +    + = = = .042.75.708. whereas P(|X .25(for k = 2) and .706 3003 c.937. so 2σ = 4. In the case p = .11 (for k = 3) given by Chebyshev. 000150+.10).50) + h(1. P(X ≥ 10) = h(10.15.1176+.15.3799 c.892 P(X > 3.40.0001 = .0101+.0438+.50) = 11697+.114 a.01412 So the desired probability = P(x ≥ 10) + P(X ≤ 5) = .Chapter 3: Discrete Random Variables and Probability Distributions 65.  30  20      10  5  = .795 = .15.879  12  924  12          6     6  c.000001+. 15.30.15. P(X=4) = 1 – P(X=5) = 1 – [P(X=5) + P(X=6)] =   7  5   7          5 1 6 105 + 7 1 −     +    = 1 − = 1 − .30.01412 = .10) from the binomial tables = . 7)  7  5      5  1  = 105 = .2070 a.002045+.50) + … + h(5.50) =  50     15  b. E(X) =  6⋅ 7   = 3 .15.121 = . But “at least 10 from 1st class” is the same as “at most 5 from the second” or P(X ≤ 5).30.30.15. σ =  12  (116 )(6)(127 )(125 ) = .121 (see part b) d. So P(X=5) ˜ B(5.30.000227+. X∼h(x. P(X ≤ 5) = h(0.15. We can approximate the hypergeometric distribution with the binomial if the population size and the number of successes are large: h(x.892) = P(X > 4.3799 + .392) = P(X=5) = .5 + .50) = .50) + h(11.50) + … + h(15. . 12.30. P(at least 10 from the same class) = P(at least 10 from second class [answer from b]) + P(at least 10 from first class).2070+. 6.0013+. P(X = 10) = h(10.5 ..15.000000 = . P(X=5) =  12  924   6 b.998 66.39402 114 .15.400) approaches b(x.30. 3483 = . 9. 6.11) b. 
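The hypergeometric probabilities in Section 3.5 — a sample of size n drawn without replacement from N items of which M are successes — can be evaluated directly; a sketch using the parameters N = 15, M = 6, n = 5 from the first exercise of the section:

```python
from math import comb

def h(x, n, M, N):
    """Hypergeometric pmf: h(x; n, M, N) = C(M, x) C(N-M, n-x) / C(N, n)."""
    return comb(M, x) * comb(N - M, n - x) / comb(N, n)

# Example with N = 15, M = 6, n = 5:
print(round(h(2, 5, 6, 15), 4))
print(5 * 6 / 15)  # E(X) = n * M / N = 2.0
```

The mean n·M/N and variance ((N − n)/(N − 1))·n·(M/N)·(1 − M/N) used in the solutions follow the same parameterization.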
 10 10      5  10  = .6036 a.Chapter 3: Discrete Random Variables and Probability Distributions d. so σY = 1. (In order to have less than 5 of the granite. 4 6 ⋅   = 2. 10.5 ± .1354 .5 . 15. E(X) = n⋅ M 10 5  10  = 15 ⋅ = 7. 8.3483 . a. V(X) =  (7. P(X = 5) = h(5.0163 + . P(all 10 of one kind or the other) = P(X = 5) + P(X = 10) = .5066. Let Y = 15 – X.5714  49   50  E(X) = n⋅ σx = 1. we arrive at the pmf. 8.6036 e.4934).0163 .4. x 5 6 7 8 9 10 p(x) . 6. Possible values of X are 5.3483 .0326 c.0163 b. 67.5714. in table form below.18  11  115 . M 30 = 15 ⋅ =9 N 50  35   30  V(X) =  (9 )1 −  = 2.20) =  20     15  Following the same pattern for the other values. 7.3483 + . so we want P(X = 7) + P(X = 8) = . N 20 20   19   σx = .9934 µ ± σ = 7.9868 .10.9934 = (6.6966 68. there would have to be more than 10 of the basaltic).0163 .1354 .5)1 −  = .0163 = . h(x. Then E(Y) = 15 – E(X) = 15 – 9 = 6 V(Y) = V(15 – X) – V(X) = 2. h(x. N   150  10 ⋅   = 3 and  500  490 (10)(.150.10. n. 2n 2 V(X) = n  n   n  n  n   n  n 1  2n − n  ⋅ 1 −  =    ⋅n ⋅  ⋅ ⋅ 1 −  =   ⋅ ⋅  2n  2n   2n − 1  2  2 n   2 n − 1  2  2   2n − 1  70.3)(.n. h(x. h(x.3) Using the hypergeometric model. Then P(all of top 5 play the same direction) = P(X = 5) + P(X = 0) = h(5. n.7 ) = .5.Chapter 3: Discrete Random Variables and Probability Distributions 69.M.982( 2.20) (the successes here are the top 10 pairs..n.20)  15 15       5  + 10  = .50) b.033 =  20   20       10   10  c. When N is large relative to n. n = n h(x. N = 2n.10.7) = 2. E(X) = V(X) = M  =&b x.3)(.N) so h(x.10.15.1 116 . M = n.2n) E(X) = n⋅ n 1 = n. 10.10.06 499 Using the binomial model.20) + h(5. a. a.3) = 3.5. E(X) = (10)(.1) = 2.10. and V(X) = 10(. =&b( x. and a sample of 10 pairs is drawn from among the 20) b. .10.500) c. Let X = the number among the top 5 who play E-W. 5) = . 
and 5.Chapter 3: Discrete Random Variables and Probability Distributions 71.250 p(4) = P(two among the 1st three are B’s and the 4th is a B) + P(two among the 1st three are G’s and the 4th is a G) =  3 2 ⋅  (.0625) = .5) b. so the expected number of children = E(X + 2) . The P(x rolls) = P(9 doubles among the first x – 1 rolls and a double on the xth roll = x −10 9 x −10 10  x − 1 5   1   1   x − 1 5   1       ⋅   =       9  6   6   6   9  6   6  r (1 − p ) 10( 56 ) r (1 − p ) 10( 56 ) E(X) = = 1 = 10(5) = 50 V(X) = = 1 2 = 10(5)( 6) = 300 p p2 (6 ) 6 117 .25+2(. so p(x) = nb(x. 5) = 2 .5 for each of the three families).0625) = .2..5) 4 = . a.5)3 = . p = 1 6 . Then P(X = x) = nb(x. Furthermore.5) = (3)(. With S = a female child and F = a male child.) 74. p(3) = P(X = 3) = P(first 3 are B’s or first 3 are G’s) = 2(.688 x= 0 d.2.188 c. the sum of the expected number of males born to each one. With S = doubles on a particular roll.5) + 3(. A and B are really identical (each die is fair). P(exactly 4 children) = P(exactly 2 males) = nb(2.. The only possible values of X are 3.5) and E(X) = 6 ( = 2+2+2. E(X) = (2)(.2. . P(at most 4 children) = P(X ≤ 2) 2 = ∑ nb(x.375  2 p(5) = 1 – p(3) – p(4) = .5 = E(X) + 2 = 4 72. let X = the number of F’s before the 2nd S.375 73.25)(. so we can equivalently imagine A rolling until 10 doubles appear.6. This is identical to an experiment in which a single family has children until exactly 6 females have been born( since p = . The interpretation of “roll” here is a pair of tosses of a single player’s die(two tosses by A or two by B). 4.. 20)] = . so P(X > 12.616=. P(1st doesn’t ∩ 2nd doesn’t) = P(1st doesn’t) ⋅ P(2nd doesn’t) = (.F(7.021 = .966 .20) – F(9. P(X > 20) = 1 – F(20.932 b.982 .819)(.P(X ≤ 11) = F(28.867-.944 < X < 20 + 8.459 d.5) = .F(12. P(X = 1) = F(1. P(5 ≤ X ≤ 8) = F(8.251 a. a.163 b.83 .2) = 1 .5) – F(5.5) = .20) = . P(X ≤ 8) = F(8.472 = P(20 – 8.8) = . 
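The negative binomial computations above (e.g. children born until the second girl, with r = 2 and p = .5) use the pmf nb(x; r, p) = C(x + r − 1, r − 1) p^r (1 − p)^x for the number of failures x before the r-th success; a sketch:

```python
from math import comb

def nb(x, r, p):
    """Negative binomial pmf: x failures before the r-th success."""
    return comb(x + r - 1, r - 1) * p**r * (1 - p) ** x

# Example: r = 2, p = .5 (two girls), P(at most 4 children) = P(X <= 2)
prob = sum(nb(x, 2, 0.5) for x in range(3))
print(round(prob, 3))  # 0.688, matching the value computed above
```

The mean r(1 − p)/p and variance r(1 − p)/p² used in the dice-rolling exercise (r = 10, p = 1/6, giving 50 and 300) follow from the same parameterization.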
P(X ≥ 2) = 1 – P(X ≤ 1) = 1 – F(1.011 b..283 d..5) = . = 4.064 a.191 b. σX = λ P(µ .5) = .819 = . P(5 < X < 8) = F(7.671 118 .2σ < X < µ + 2σ ) 76.5) – F(4.065 c.20) = .441 c..819) = .559 . E(X) = λ= 20..5) .Chapter 3: Discrete Random Variables and Probability Distributions Section 3. P(X ≤ 10) = F(10. P(X = 8) = F(8.492 e.945 78.559 = .944) = P(11.011 = . P(X ≤ 5) = F(5. P(6 ≤ X ≤ 9) = F(9..8) = . σX = .20) . P(10 ≤ X ≤ 20) = F(20.6 75.554 P(10 < X < 20) = F(19.056 < X < 28. λ = 2.018 c. P(X ≥ 9) = 1 – P(X ≤ 8) = . E(X) = λ= 10.936 = .982 = .. P(X ≥ 10) = 1 – P(X ≤ 9) = .470 .944) = P(X ≤ 28) .20) – F(10.2) – F(0.2) = . a.068 d.005 = .20) = .20) = 1 .83) = P(X ≥ 13) = 1 – P(X ≤ 12) =1 - 77.8) – F(5.526 c. P(5 ≤ X ≤ 8) = F(8. which is also the expected number of arrivals in a 45 minute period. λ = np = 5 200 a.265 = .492 b..8) =.Chapter 3: Discrete Random Variables and Probability Distributions 79. The experiment is binomial with n = 10.5 hours implies that λ = 20.F(5. so for a 45 minute period the rate is λ = (5)(. P(X = 0) ˜ 0 a.5) = .75) = 3. a.000 and p = .417 c.464 c. a. For a two hour period the parameter of the distribution is λt = (4)(2) = 8.8) – F(9. P(X ≥ 4) = 1 . P(X = 4) = F(4. so P(X = 10) = F(10.122.735 c.011. 80.2) = . P(X ≥ 8) = 1 – P(X ≤ 7) = 1 . so P(X > 10) ˜ 1 – F(10.8) – F(5. 83. so µ = np = 10 and σ = npq = 3.5) = 2.135 c.20) = .161 . λ = 8 when t = 1. t = 2. p= 1 . so λ = 12.001.265 = .283 b.440 . so P(X = 6) = F(6.8) = . thus the expected number of arrivals is 12 and the SD 81.5) – F(4.191 = .. P(X ≥ 6) = 1 .099. and P(X ≥ 10) = 1 . in this case. P(X ≥ 20) = 1 – F(19.20) = . E(X) = λt = 2 82.8) = . X has approximately a Poisson distribution with λ = 10.313 . so P(X = 0) = F(0.. t = 90 min = 1. For a 30 minute period.5) – F(3..F(9.5) = . n = 1000.P(X ≤ 3) = 1 . 119 . Arrivals occur at the rate of 5 per hour.133 a.867 = . = 12 = 3.5 hours.75.. b.583 = .809. b. λt = (4)(.530 and P(X ≤ 10) = F(10.8) = .10) = 1 .175 b. 
3026 ≈ 1.106 120 .5 α = 1/(mean time between occurrences) = E(X) = ∑x x= 0 2. so P(X ≤ 16) = F(16. The area of the circle is πr2 = .865) 0 = .407 b.215 c. a binomial r. so P(X ≥ 4) = 1 – P(X ≤ 3) = 1 – F(3.221 b. X has approximately a Poisson distribution with λ = np = 2. a..2) = .000.15 years 2 ∞ ∞ ∞ e−λ λ x e−λ λ x e−λ λ x e −λ λ y =∑x = λ∑ x = λ∑ x =λ x! x! x! y! x=1 x=1 y =0 87. the parameter is (80)(. with n = 5 and p = . 135) 5 (.01)(.v.031416 sq.857 = .25) = 20. E(X) = np = (200)(.135 Let Y = the number among the five boards that work.98. Let X = the number of diodes on a board that fail.00148  4 5 85.135) 4 (.785 = . The expected number of trees is λ⋅(area) = 80(85. Then P(Y ≥ 4) = P(Y = 4 ) + P(Y = 5) =  5 5  (.00004 = . a. σX = 1. For a one-quarter acre plot. given α = 2: . V(X) = npq = (200)(.000) = 6.20) = . Solve for t .01) = 2.143 c.Chapter 3: Discrete Random Variables and Probability Distributions 84.1) = -αt t= ∞ 86. miles or 20. Thus X has a Poisson distribution with parameter 20.1 = e-αt ln(. c. 1 =2 .800..2) = 1 . a.99) = 1. αt = (2)(2) = 4 b.00144 + . P(board works properly) = P(all diodes work) = P(X = 0) = F(0.865) +  (.135. P(X > 5 ) 1 – P(X ≤ 5) = 1 .106 acres. 000122 b. t+∆t) if and only if no events in (o.5).λ ⋅ ∆t – o(∆t)] b.5) ⋅ =  y  10  (.5). P(X = 10 and no violations) = P(no violations | X = 10) ⋅ P(X = 10) = (. a.10) – F(9.5) y −10 e −10 10  e −10 e −10 (5) y ∑ y =10 10! ( y − 10)! ∞ c. a.10)] = (. P(exactly 10 without a violation) = ( 5) y −10 e −10 ⋅ 510 = ∑ 10! y =10 ( y − 10)! = e −10 ⋅ 510 10! = e −5 ⋅ 510 = p(10. P(y arrive and exactly 10 have no violations) = P(exactly 10 have no violations | y arrive) ⋅ P(y arrive) (10) y y! y −10 y (10) e (5) = y! 10!( y − 10)! = P(10 successes in y trials when p = . the 5 results from λp = 10(. 10! ∞ (5) u e −10 ⋅ 510 5 = ⋅e ∑ 10! u =0 (u )! ∞ In fact.125) = . t) and no events in (t. d − λt e = -λe-λt = -λP0 (t) . t+∆t). P0 (t+∆t) = P0 (t) ⋅P(no events in (t. k! 
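The Poisson pmf p(x; λ) = e^(−λ)λ^x/x! and cdf F(x; λ) used throughout Section 3.6 can be computed directly instead of read from the cumulative Poisson table; a minimal sketch:

```python
from math import exp, factorial

def pois(x, lam):
    """Poisson pmf: p(x; lam) = e^(-lam) lam^x / x!."""
    return exp(-lam) * lam**x / factorial(x)

def F(x, lam):
    """Poisson cdf: F(x; lam) = P(X <= x)."""
    return sum(pois(k, lam) for k in range(x + 1))

# Example: for lam = 2, P(X <= 5) = F(5; 2)
print(round(F(5, 2), 3))  # 0.983
```

Tail and interval probabilities then follow as in the solutions, e.g. P(X ≥ c) = 1 − F(c − 1; λ); for events occurring at rate α per unit time, the count in an interval of length t is Poisson with parameter λ = αt.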
( k − 1)! 121 . as desired. d  e − λt ( λt ) k  − λe − λt ( λt ) k kλe − λt ( λt ) k −1 = + dt  k!  k! k! [ ] = −λ e −λ t ( λt ) k e −λt (λt ) k −1 +λ = -λPk (t) + λPk-1 (t) as desired. 5) (. No events in (0.000977)(. t+∆t)) = P0 (t)[1 .Chapter 3: Discrete Random Variables and Probability Distributions 88.5)10 ⋅ [F(10. P0 (t + ∆t ) − P0 (t ) ∆ ′t o( ∆t ) = −λP0 ( t ) − P0 (t ) ⋅ ∆t ∆ ′t ∆t c. dt d. generalizing this argument shows that the number of “no-violation” arrivals within the hour has a Poisson distribution with parameter 5. 89. Thus. 1 d. ….6. Continuing in this manner yields the following distribution: W 6 7 8 9 10 11 12 13 14 15 16 17 18 P(W) 1 35 1 35 2 35 3 35 4 35 4 35 5 35 4 35 4 35 3 35 2 35 1 35 1 35 Since the distribution is symmetric about 12.598.7). so p(9) = 1 35 3 35 .26375  52    5 p(3) = 1 – [p(1) + p(2) + p(4)] = .3)(1.4) (1.14592  2. Each having probability 1 35 . p(1) = P(exactly one suit) = P(all spades) + P(all hearts) + P(all diamonds) 13     5  = . and σ 18 2 = ∑ ( w − 12) p ( w) 2 w =6 = 1 35 [(6)2 (1) + (5)2 (1) + … + (5)2 (1) + (6)2 (1) = 8 91. ∑ ∑ x ⋅ p( x)  − (3.114) = .2.3.2. Since there is just one outcome with W value 6.4)].616      = 6 ⋅ 2 ⋅ + 2 ⋅     = 6 = .6) (1.5) … (5.636 x =1  x =1  4 b. µ = 12.Chapter 3: Discrete Random Variables and Probability Distributions Supplementary Exercises 90. Outcomes are(1. The W values for these outcomes are 6 (=1+2+3).58835  4 2  2 2 σ = x ⋅ p ( x ) = 3 .3. there are 35 such outcomes. 18. 114 . 1 c) = = .405. 1 h.2.2.590 + 44. p(6) = P(W = 6) = with W value 9 [(1.5) and 2. µ= 122 . 8. 7. 
Similarly.00198 + P(all clubs) = 4P(all spades) = 4 ⋅  52    5 p(2) = P(all hearts and spades with at least one of each) + …+ P(all diamonds and clubs with at least one of each) = 6 P(all hearts and spades with at least one of each) = 6 [ P( 1 h and 4 s) + P( 2 h and 3 s) + P( 3 h and 2 s) + P( 4 h and 1 s)]  13 13  13 13           4 1 3 2 18. σ = .960   52   52           5  5   13  4 ⋅  (13)(13)(13) 2 p(4) = 4P(2 spades. there are three outcomes . a. .75) b. 123 =&p(x... P(D light works) = P(at least 2 d batteries work) = 1 – P(at most 1 D battery works) = 1 – [(1 – p)4 + 4(1 – p)3 ]. Let X ~ Bin(5.5). P(X ≥ 5) = 1 ... 95. B(10.005) b.8912 = . σ2 = (15)(.B(9. so np = 2.5 and b(x. …  r − 1 = nb(y – r. 15.75.. b(x.75) .05) = . Requests can all be met if and only if X ≤ 10. a Poisson p.m. P(X = 5) = p(5. a.15.2.p) ⇒ 0 ≤ 2p – 3p 3 ⇒ p ≤ 2 3 .p(4.310 P( 6-v light works) = P(at least one 6-v battery works) = 1 – P(neither works) = 1 –(1 – p)2 . . The 6-v should be taken if 1 –(1 – p)2 ≥ 1 – [(1 – p)4 + 4(1 – p)3 ]. if 7 ≤ X ≤ 10.5) = 1 .313 d.579 d.75) ..5) = .5.007 b.75) = . a. . which is bad if the % defective is large and good if the % is small.001 = .991 96. 2.15. 15. and 15 – X ≤ 8.9) = .25.25.005.B(6.. µ = (15)(. N = 500. . P(X > 10) = 1 .75) = ..81 e.25) = 2.0668 c.e.Chapter 3: Discrete Random Variables and Probability Distributions 92.10) = . Simplifying.25. 2..75) = 11. 1 ≤ (1 – p)2 + 4p(1. r+2.15. 500. r+1.75) = 1 .B(4.098 c.148 c.20) = . Then P(X ≥ 3) = 1 – P(X ≤ 2) = 1 – B(2. 2.8912 = .r. i. p(y) = P(Y = y) = P(y trials to achieve r S’s) = P(y-r F’s before rth S)  y − 1 r  p (1 − p) y − r . P(X ≥ 5) = 1 – p(4. . p = ..9). 94. P(X ≥ 5) = 1 . y = r. so P(all requests met) = B(10.f. .B(4.314 .p) =  93.75)(..B(5.9580 .B(4.15.5) .1088 97. All would decrease. a. P(X ≥ 5) = 1 . For n = 3. Using the decision rule “reject if X = 6 or X ≥ 19” gives the probability . 100.25. 
P(rejecting the claim) = P(X ≤ 7) + P(X ≥ 18) = .8) – B(6.986 b. .1) = 1 – B(6.002 = .271 = 720 b. X ~ B(x.8) = .5) = . The block of n symbols gives a binomial experiment with n trials and p = 1 – p 1 + p 1 p 2 . With p = .3)](1 – p)p 2 For p = .1) = 2. B(18.6) – B(7.. P(Y = 1) = P(no one has the disease) = (. 25.009 d.8) = .2) = B(6.25.5.044 d. Then p = P(S) = P(symbol is sent correctly or is sent incorrectly and subsequently corrected) = 1 – p 1 + p 1 p 2 . x 2 3 4 5 6 7 8 p(x) .. 25.Chapter 3: Discrete Random Variables and Probability Distributions 98. e. 99.0010 So P(X ≤ 8) = p(2) + … + p(8) = . p). The claim will not be rejected when 8 ≤ X ≤ 17. a.6.25. P(8 ≤ X ≤ 17) = B(17..780.… – p(x . for x = 5..022 = .8) – B(7.991 = . 6. which is quite large npq = 25(.6) = . 7..001 = . P(8 ≤ X ≤ 17) = B(17.. so E(Y) = (1)(. With p=. which is too large.846 .271) = 1.845.8.729) + (4)(. Regard any particular symbol being received as constituting a trial. as contrasted with the 3 tests necessary without group testing.081 .813. B(18.5.002 + .0023 .25.1 – B(1.109...0088 .022 + [1 .729 and P(Y = 4) = . P(X ≥ 7 when p = .004. With p=. p(2) = P(X = 2) = P(S on #1 and S on #2) = p 2 p(3) = P(S on #3 and S on #2 and F on #1) = (1 – p)p 2 p(4) = P(S on #4 and S on #3 and F on #2) = (1 – p)p 2 p(5) = P(S on #5 and S on #4 and F on #3 and no 2 consecutive S’s on trials prior to #3) = [ 1 – p(2) ](1 – p)p 2 p(6) = P(S on #6 and S on #5 and F on #4 and no 2 consecutive S’s on trials prior to #4) = [ 1 – p(2) – p(3)](1 – p)p 2 In general.9.220 c. 101.. 25.25 = 1. We want P(rejecting the claim) = . With X ~ Bin(25. P(X ≤6 when p = .25..81 .9) = 2. .1).. .014.. Let Y denote the number of tests carried out.2) = = . 25.022 + . σX = c. a.1)(.0154 . 25.109 . …: p(x) = [ 1 – p(2) .01. 50 124 . We should use “reject if X = 5 or X ≥ 20” which yields P(rejecting the claim) = . .25.271. possible Y values are 1 and 4. .000 = .978] = .1) = .1) = 1 ..9995 102.25.081 .5) – B(6.991 . 
E(X) = np = 25(.25.9)3 = .P(2 ≤ X ≤ 6) = B(6. 104.12).4) + 5∑ p( x.m.e = .m. . 5).5. Let X = the number of seeds out of the 10 selected that meet the condition P ∩ C.01 0! − 1n(. Let S = an operator who receives no requests.135 b. and event P = seed produces ears with single spikelets.116)..4) + (4) p(4. the event of interest is P = seed produces ears with single spikelets.97024 With S = favored acquittal.f. E(X) = 4 4 ⋅   = 1.. P(all receive x) = P(first receives x) ⋅ … ⋅ P(fifth receives x) =   . Let Y = the number out of the 10 seeds that meet condition P.4) ∞ = (0)p(0. Let event C = seed carries single spikelets.116) 5 (. P(X = 5) = b. the sample size is n = 4. a.272.4) + (1) p(1.272) + … + b(5.735 + 5[1 – F(4. Then p = .7329 ⇒ R = .041813 + … + . P(at least one) = 1 – P(none) = 1 ⇒ R2 = ( λπR 2 ) 0 −λπ R 2 −λπ R 2 =1. Then P( P ∩ C) = P(P | C) ⋅ P(C) = .0767. Then Y ~ Bin(10.116. . e −λπ R ⋅ 2 106.. 01) = .10.884) 1 = . a.59 125 . P(Y ≤ 5) = b(0.4.10. 10    (. P(P) = P( P ∩ C) + P( P ∩ C′) = . h(x. The number sold is min (X.002857 5 For 1 seed.135) =  5  (. and P(all  x!  receive the same number ) is the sum from x = 0 to ∞.40) = . P(X = 0) = F(0.Chapter 3: Discrete Random Variables and Probability Distributions 103. of the number of interviewed jurors who favor acquittal is the hypergeometric p. Then X ~ Bin(10.99 ⇒ e = .116 + (.884) 5 = .272).33  12  105.4.29 (.8561 λπ ∞ 107. the population size is N = 12.00144  4 5  e −2 2 x  c.4) x =5 = 1.4) + (3) p(3. 135) 4 (. 5)] = ∑ min( x..5) p (x.272) = .4)] = 3.4) + (2) p(2.2) 0. the number of population S’s is M = 4.40) = .f.26)(. and P(Y = 5) = .116 (from a) + P(P | C′) ⋅ P(C′) = . so E[ min(x. and the p.135 and we wish P(4 S’s in 5 trials) = b(4.076719 = . 0729) = . There are four ways exactly three could have positive results. 2) (. probability of success is not the same for all tests b. 
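Many parts of this group reduce to evaluating the binomial pmf b(x; n, p) and cdf B(x; n, p) from Table A.1. Both are easy to compute directly as a cross-check; the sketch below uses the illustrative values n = 25, p = .25 (one of the cases treated above), and the function names are ours, not the text's.

```python
from math import comb, sqrt

def binom_pmf(x, n, p):
    # b(x; n, p) = C(n, x) p^x (1 - p)^(n - x)
    return comb(n, x) * p**x * (1 - p)**(n - x)

def binom_cdf(x, n, p):
    # B(x; n, p), the quantity tabulated in Table A.1
    return sum(binom_pmf(k, n, p) for k in range(x + 1))

n, p = 25, 0.25
mu = n * p                      # E(X) = np = 6.25
sigma = sqrt(n * p * (1 - p))   # sqrt(npq) ≈ 2.17
tail = 1 - binom_cdf(6, n, p)   # P(X >= 7) = 1 - B(6; n, p)
```

Summing x·b(x; n, p) over all x recovers np, a quick consistency check on the pmf.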
P(X = x) = P(A wins in x games) + P(B wins in x games) = P(9 S’s in 1st x-1 ∩ S on the xth ) + P(9 F’s in 1st x-1 ∩ F on the xth )  x − 1 9  x − 1   p (1 − p) x −10 p +  (1 − p ) 9 p x −10 (1 − p )  9   9   x − 1 10 =   p (1 − p ) x−10 + (1 − p) 10 p x−10  9  = [ b.4096)(.32768)(. 126 .1)   2    1   =(.02389 1 2  5  1 4   5  3  (.  9  [ ] 19 So P(X ≥ 20) = 1 – P(X < 20) and P(X < 20) = ∑ P ( X = x) x =10 109.00009216 3 0  5  3 2   5  0 5  (. 11.0273. 9) (.1)   1    2   =(. Combination D D′ 0 3 Probability  5  0 5   5  3 2   (.0081) = . ] Possible values of X are now 10.8)  ⋅  (. 8)  ⋅  (. Let D represent those with the disease and D′ represent those without the disease.9) (.2) (. 12.00001) = .0512)(.2) (. a.00045) = .00332 2 1  5  2 3   5  1 4  (. a.2) (.9) 2(.Chapter 3: Discrete Random Variables and Probability Distributions 108.8)  ⋅  (.1)   3    0   =(. 9) (. Now P(X = x) =  x − 1 10   p (1 − p ) x−10 + q 10 (1 − q ) x−10 for x = 10. … .2048)(. 19.8)  ⋅  (. …( all positive integers ≥ 10). No.000000512 Adding up the probabilities associated with the four combinations yields 0.1)   0    3   =(. Chapter 3: Discrete Random Variables and Probability Distributions 110.. 4 p ( x.µ) = 1 2 p ( x. but p(λ. λ ) + 12 p( x. λ. p ) ( x + 1) (1 − p) conclusion follows.5 (. b ( x + 1.5)( 3. [ ] 1 2 λ +µ λ + µ  λ −µ  so V(X) = λ +λ + µ2 +µ − =  +  2 2  2   2  2 2 112. λ ) + 2 ∑ x p (x. a. p) ( n − x) p = ⋅ > 1 if np – (1 – p) > x. ( x + r − 1)( x + r − 2). E(X2 ) = ∞ 1 1 1 ∞ 1 ∞ ∑ x[ 2 p (x. from which the stated b ( x. 6 p ( x.9507 111. n.1 is a mode.λ.v. λ ) + p( x. µ ) = (λ + λ ) + ( µ + µ ) (since for a ∑ 2 x=0 2 x =0 2 2 Poisson r.m. µ )] = 2 ∑ x p( x.0) = 1. µ ) x= 0 x= 0 x=0 1 1 λ +µ = λ+ µ = 2 2 2 1 ∞ 2 1 ∞ 2 1 2 1 2 x p( x.λ.λ) = p(1 . µ) are Poisson p. p( x + 1. µ ) c. n.( x + r − x) x! (5. λ ) λ = > 1 if x < λ . 
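The recursion above for the first trial at which two consecutive successes have occurred, p(2) = p^2, p(3) = (1 - p)p^2, and p(x) = [1 - p(2) - ... - p(x - 3)](1 - p)p^2 for larger x, is easy to check numerically. A sketch with p = .9, matching the tabulated p(x) values and the total P(X ≤ 8) = .9995 (the function name is illustrative):

```python
def first_double_success_pmf(p, x_max):
    # p(x) = P(the first pair of consecutive S's ends at trial x)
    pmf = {2: p * p, 3: (1 - p) * p * p}
    for x in range(4, x_max + 1):
        # subtract p(2) + ... + p(x - 3); the sum is empty for x = 4
        prior = sum(pmf[k] for k in range(2, x - 2))
        pmf[x] = (1 - prior) * (1 - p) * p * p
    return pmf

pmf = first_double_success_pmf(0.9, 8)
total = sum(pmf.values())   # P(X <= 8), which should be about .9995
```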
∞ ∑ p( x. λ) so λ is also a mode[p(x. λ ) + .1 . so p(x. λ ) + ∑ x p ( x. λ ) + 2 p(x. P(X ≥ 1) = 1 – p(0) = 1 – (. then λ . λ ) ( x + 1) λ is an integer. 127 .1068 4! k(r.’s and thus ≥ 0..5 = . a.f.3) 2.3)2.µ) ≥ 0.7 ) 4 = .1 and x = λ. Further.5)( 2. µ ) = + = 1 ∑ ∑ 2 x= 0 2 x =0 2 2 b.5 and p = .x) = Using k(r.5) With r = 2. λ. .3.. If p( x. p(x.5)( 4.λ) and p(x. E(X2 ) = V(X) + [E(X)]2 = λ + λ2 ). b. p(4) = (. µ ) where both p(x. from which the stated conclusion follows. λ)] achieves its maximum for both x = λ . E(X) = d. µ ) = x= 0 1 ∞ 1 ∞ 1 1 p ( x . λ = b. For [0.44. 6t dt = 123. M − 1. 128 . a.82 e 2+. 114. as desired. 6t dt = 409.Chapter 3: Discrete Random Variables and Probability Distributions 10 113. λ= ∫ 0. M N n −1 M ∑ h( y. Then σ2 = ∑ ( x − µ) A 2 p ( x ) ≥ (kσ ) 2 ∑ p( x) . 10) = . so the desired probability is F(15. 2 2 2 A 116.951.9996 ≈ 10.6]. N − 1) = n ⋅ N y =0 Let A = {x: |x .µ| ≥ kσ).4]. But A ∑ p( x) = P(X is in A) = P(|X . λ = ∫ 6 2 e 2+. n − 1. whereas for [2.  M  N − M  M! N − M      ⋅    x  n − x  n n ( x − 1)! ( M − x )!  n − x      E(X) = ∑ x ⋅ =∑ N N x= 0 x =1     n  n N − M  N − 1 − ( M − 1)      M n  M − 1  n − x  M n−1  M − 1  n − 1 − y  n ⋅ ∑   = n ⋅ ∑   N x =1  x − 1   N − 1 N y =0  y   N − 1      n −1  n −1  n⋅ 115.µ| ≥ kσ).µ| ≥ kσ}. so σ ≥ k σ ⋅ P(|X . 6t dt = 9. 9907 0 ∫ 4 0 e 2+. ∑ P(X = j) = 10 P (arm on track i ∩ X = j) = i =1 10 = ∑ P (X = j | arm on i ) ⋅ p i i =1 10 ∑ P (next seek at I+j+1 or I-j-1) ⋅ pi = ∑ ( pi + j +1 + Pi − j −1 ) p i i =1 i =1 where p k = 0 if k < 0 or k > 10. 1 1. P(X < 0) = b. f(x1) 0.5 1 −2 . 5 ] 1 .5 1 a.25 = .5) = c.5 k+4 k dx = .5) = c. P(x > 1.09375(4 – x2 ) −5 10 ∫ ∫ 2 .438 for –5 ≤ x ≤ 5.5 0.5) = F(x) = 1 10 −∞ ∫ 0 ∞ ∫ 1 . 2. P(.5 < X < 2. ∫ 1 f ( x) dx = ∫ 12 xdx = 14 x 2 1 a.0 -0.5 . and = 0 otherwise ∫ 0 dx = .5 = 167 ≈ .5 10 3 1 −2 10 ∫ dx = . Graph of f(x) = .4 3. P(-2 ≤ X ≤ 3) = d. 
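For the fifty-fifty Poisson mixture p(x; λ, µ) = ½p(x; λ) + ½p(x; µ), the derivation above gives E(X) = (λ + µ)/2 and V(X) = (λ + µ)/2 + [(λ - µ)/2]^2. A numeric sketch with the illustrative choice λ = 2, µ = 6 (not values from the text), so the mean should be 4 and the variance 4 + 4 = 8:

```python
from math import exp, factorial

def pois_pmf(x, lam):
    # Poisson pmf p(x; λ) = e^{-λ} λ^x / x!
    return exp(-lam) * lam**x / factorial(x)

def mix_pmf(x, lam, mu):
    # equal-weight mixture of Poisson(λ) and Poisson(µ)
    return 0.5 * pois_pmf(x, lam) + 0.5 * pois_pmf(x, mu)

lam, mu = 2.0, 6.0
xs = range(60)   # truncation; the neglected tail is negligible here
mean = sum(x * mix_pmf(x, lam, mu) for x in xs)
var = sum(x * x * mix_pmf(x, lam, mu) for x in xs) - mean**2
```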
P(-2.5 1 2 xdx = 14 x 2 f ( x )dx = ∫ 2 1 1.5 .5 ] 1 0 = . P( k < X < k + 4) = a.CHAPTER 4 Section 4.5 ≤ X ≤ 1.5 xdx = 14 x 2 ] 2 1 . 5 2 1.5 1 10 dx = 10x ]k k+4 = 101 [( k + 4) − k ] = . P(x ≤ 1) = b.5 -3 -2 -1 0 1 x1 129 2 3 . 09375( 4 − x 2 ) dx = . a. θ ) dx = ∫ −∞ 200 0 = − e −x 2 ∞ = 0 − ( −1) = 1 0 x − x 2 / 2θ 2 e dx θ2 ] 200 / 2θ 2 ] ≈ −..5 −.09375(4 − x )dx = . P(X > 0) = ∫ .P(X ≤ 200) ≈ .8647.5 0 130 () 1 3 3 8 2 − 18 (1)3 = 19 ≈ . For x > 0.θ ) dx = ∫ 200 ∞ 0 x − x 2 / 2θ 2 − x2 / 2θ 2 e dx = − e θ2 f ( x. θ ) dx = − e − x 2 ∫ ∞ f ( x )dx = ∫ kx 2 dx = k 2 1= b. P(0 ≤ X ≤ 1) = c.6328 4. P(X ≤ x) = ∫ x −∞ 200 100 ∫ f ( y. P(X ≤ 200) = −∞ ∫ f ( x.5 1 1 .1353 + 1 = .5 0 3 0 2 2 ∫ 1 .5 0 3 8 3 8 ( )] x3 3 ] 1 0 200 ] x 0 ≈ .5) = 1 – P(-. P(100 ≤ X ≤ 200) = d. ∫ b. P(1 ≤ X ≤ 1.θ ) dy = x 0 f ( x. since x is continuous.5 .6875 c.5) = ∫ d.125 0 ] = ] =1− x 2 dx = 18 x 3 x 2 dx = 18 x 3 2 ] 100 2 2 y − y 2 / 2θ 2 e dx = − e − y / 2θ 2 e 5.09375(4 x − )  = . ∞ a.5 1 1 .5) = 1 - ∫ −∞ 0 ∫ 1 3 0 8 x 2 dx = 18 x 3 1 .2969 64 [ ( ) − 0] = 1 − 1 3 3 8 2 27 64 = 37 64 ≈ .5) = 1 - −1 ∫ . P(-1 < X < 1) = d.5 OR x > .Chapter 4: Continuous Random Variables and Probability Distributions 2 x3  b.5781 .5 ≤ X ≤ .3672 = . P(x < -. 000 1 .4712 = 1 − e −x = k ( 83 ) ⇒ k = 2 / 2θ 2 3 8 = 18 = . / 20 .09375( 4 − x 2 ) dx = 1 . P(X ≥ 200) = 1 .8647 0 P(X < 200) = P(X ≤ 200) ≈ . P(X ≥ 1.1353 ∫ c. 5 ≤ X ≤ 3.2 . a.5 by symmetry of the p. P(28 < X < 32) = ∫ P( a ≤ x ≤ a+2) = ∫ 32 1 28 10 a+ 2 a dx = 101 x ]28 = .4 1 10 32 dx = . 1= c. f(x) 2 1 0 0 1 2 3 4 5 x ∫ 4 4 3 ⇒k= 3 4 1 k[1 − ( x − 3) 2 ]dx = ∫ k[1 − u 2 ]du = b.367 128 5 ≈ .5) 2 −1 ∫ 4 3 3 4 [1 − ( x − 3) 2 ]dx = . P( |X-3| > . f(x) = b.2 1 33 10 35 x2  c. since the interval has length 2.313 16 . P(114 ≤ X ≤ 134 ) = ∫ e. E(X) = ∫ x ⋅ dx =  = 30 25 20  25 35 1 10 30 ± 2 is from 28 to 32 minutes: d.f 13 / 4 3 11 / 4 4 [1 − ( x − 3) 2 ]dx = 3 4 . 131 47 ≈ . P(X > 3) = d. 
1 10 for 25 ≤ x ≤ 35 and = 0 otherwise a.5 4 −1 / 4 [1 − (u) 2 ]du = [1 − (u ) 2 ]du = 7.5 =1- ∫ 1/4 ∫ 3 −. P(X > 33) = ∫ 35 dx = .d.5) = 1 – P( 2.5) = 1 – P( |X-3| ≤ .Chapter 4: Continuous Random Variables and Probability Distributions 6. 438 c.18  = 50  0 50 c.15( x −5) dx = .Chapter 4: Continuous Random Variables and Probability Distributions 8.491 = .4 5 9.071 132 .15∫ e −. 825 10 y2   2 1 2  y ) dy = y   + y− 50  0  5 50   5 ≈ .P(Y ≤ 5) ≈ .74 50 50 50 10 ydy + ∫ ( 25 − 251 y ) dy = 6 2 = . P(Y ≤ 8) = = e.. P(Y < 2 or Y > 6) = a.438. 0. P(X ≤ 6) = 3 1 0 25 ∫ 5 1 0 25 8 ydy + ∫ ( 25 − 5 = ∫ 3 1 0 25 1 25 y ) dy = 23 ≈ .5 0.4 f(x) 0.92 25 46 9 37 − = = . = ∫ 6 ..562 b.562 = .5) 5 .5 = .5 0 0 =1− e −. 1 . P( 5 ≤ Y ≤ 6) = P(Y ≤ 6) ..P(Y < 3) = f. 15u du (after u = x .15 u 5 .2 0. a.0 0 5 10 x b.5 e ] −.562 . ∫ ∞ −∞ = 5 f ( y )dy = ∫ 5 1 0 25 ydy + ∫ ( − 10 5 2 5 1 25 1  1  1 1 + ( 4 − 2) − (2 − )  = + = 1 2  2  2 2 5 y2  9 ydy = ≈ .3 0.15e −. P( 3 ≤ Y ≤ 8) = P(Y ≤ 8) . P(Y ≤ 3) = ∫ d. .1 0. σx ≈ .1875 ~2 µ ~ = 2 ≈ 1.5) = 1 – P(X ≤ .5 ≤ X ≤ 1) = F(1) – F(. 1 1 2 2 x3  8 E(X) = ∫ x ⋅ f ( x) dx =∫ x ⋅ xdx = ∫ x dx = = ≈ 1. . = .5) = d.333  −∞ 0 2 2 0 6 0 6 1 4 x 2 15 16 = . P(X ≤ 1) = F(1) = b.2 11.25 a. E(X2 ) = 2 133 8 36 ≈ .9375 for 0 ≤ x < 2. and = 0 otherwise 2 2 E(X ) = ∫ ∞ −∞ 2 2 x f ( x ) dx = ∫ 2 2 0 So Var(X) = E(X2 ) – [E(X)]2 = h. = ∫ ∞ −∞ f ( x. 2 2 0 8 0 2 2 − ( 86 ) = 2 From g . P(X ≤ b) = ∫ k +1 dx = θ ⋅  − k   = 1 −   θ x  x θ b b d. ∫ P(a ≤ X ≤ b) = b a kθ k  1 dx = θ k ⋅  − k k +1 x  x b ∞ θk   = k = 1 θ θ k k  θ  θ   =   −    a  a   b  k Section 4.222 . 3 16 1 1 2 x4  x xdx = ∫ x 3 dx =  = 2. a.θ )dx = ∫ ∞ θ kθ k  1 dx = θ k ⋅  − k k +1 x  x b kθ k 1  θ  k  c.Chapter 4: Continuous Random Variables and Probability Distributions 10.5) = 1 – F(.5) = c. f(x) = F′(x) = f. = .5 = e. P(X > . k .414 F ( µ~) = ⇒ µ~ 2 = 2 ⇒ µ 4 ∞ g.471 . P(. θ b. 
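For the uniform density f(x) = 1/10 on [25, 35], each probability above is just an interval length divided by 10, and E(X) is the midpoint. A minimal sketch (names illustrative):

```python
def unif_cdf(x, a, b):
    # F(x) = (x - a)/(b - a) for a <= x <= b
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    return (x - a) / (b - a)

A, B = 25.0, 35.0
p_gt_33 = 1 - unif_cdf(33, A, B)                    # .2
p_28_32 = unif_cdf(32, A, B) - unif_cdf(28, A, B)   # .4
mean = (A + B) / 2                                  # E(X) = 30
# any length-2 window inside [25, 35] has probability 2/10 = .2
p_window = unif_cdf(27 + 2, A, B) - unif_cdf(27, A, B)
```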
12. a. P(X < 0) = F(0) = .5, by the symmetry of the density about 0 (and P(X < 0) = P(X ≤ 0), since for any continuous distribution P(X = a) = 0).
b. P(-1 ≤ X ≤ 1) = F(1) - F(-1) = 11/16 = .6875
c. P(X > .5) = 1 - P(X ≤ .5) = 1 - F(.5) = 1 - .6836 = .3164
d. f(x) = F'(x) = d/dx[.5 + (3/32)(4x - x^3/3)] = (3/32)(4 - x^2) = .09375(4 - x^2) for -2 ≤ x ≤ 2, which is as desired.
e. F(µ~) = .5 ⇒ µ~ = 0, again by symmetry.

13. a. 1 = ∫_1^∞ (k/x^4) dx = -k/(3x^3)]_1^∞ = k/3, so k = 3.
b. For x > 1, F(x) = ∫_1^x 3y^-4 dy = -y^-3]_1^x = 1 - 1/x^3; F(x) = 0 for x ≤ 1.
c. P(X > 2) = 1 - F(2) = 1/8 = .125, and P(2 < X < 3) = F(3) - F(2) = (1 - 1/27) - (1 - 1/8) = .088.
d. E(X) = ∫_1^∞ x(3/x^4) dx = ∫_1^∞ 3/x^3 dx = 3/2 = 1.5; E(X^2) = ∫_1^∞ 3/x^2 dx = 3, so V(X) = 3 - (3/2)^2 = 3/4 = .75 and σ = .866.
e. µ ± σ = (.634, 2.366); since f(x) = 0 for x ≤ 1, P(µ - σ < X < µ + σ) = F(2.366) = 1 - (2.366)^-3 = .9245.
14. a. With A = 7.5 and B = 20: F(x) = 0 for x < 7.5, F(x) = (x - 7.5)/12.5 for 7.5 ≤ x < 20, and F(x) = 1 for x ≥ 20.
b. P(X ≤ 10) = F(10) = .2
c. P(10 ≤ X ≤ 15) = F(15) - F(10) = .6 - .2 = .4
d. Since X is uniform on (A, B), E(X) = (A + B)/2 = 13.75 and V(X) = (B - A)^2/12 = 13.02, so σ = 3.61 and µ ± σ = (10.14, 17.36). Thus P(µ - σ ≤ X ≤ µ + σ) = F(17.36) - F(10.14) = .7888 - .2112 = .5776. Similarly µ ± 2σ = (6.53, 20.97), so P(µ - 2σ ≤ X ≤ µ + 2σ) = 1.

15. a. F(x) = 0 for x ≤ 0 and F(x) = 1 for x ≥ 1; for 0 < x < 1, F(x) = ∫_0^x 90y^8(1 - y) dy = 90 ∫_0^x (y^8 - y^9) dy = 90[y^9/9 - y^10/10]_0^x = 10x^9 - 9x^10.
b. P(X ≤ .5) = F(.5) = 10(.5)^9 - 9(.5)^10 ≈ .0107
c. P(.25 ≤ X ≤ .5) = F(.5) - F(.25) ≈ .0107 - [10(.25)^9 - 9(.25)^10] ≈ .0107 - .0000 = .0107
d. The 75th percentile is the value of x for which F(x) = .75, i.e. .75 = 10x^9 - 9x^10, giving x ≈ .9036.
Thus −1 4 3  −1 4 3 3  0 x<2  1 3 F(x) =  4 [3 x − 7 − ( x − 3) ] 2≤x≤4  1 x>4  x−3 2 3 4 b. F( X ) = For 2 ≤ X ≤ 4. a. By symmetry of f(x). E( X n ) = ∫ x n ⋅ A 1 B n +1 − An +1 dx = B− A ( n + 1)( B − A) 137 .Chapter 4: Continuous Random Variables and Probability Distributions 17. ∫ x −∞ f ( y) dy = ∫ x 3 2 4 [1 − ( y − 3) 2 ]dy (let u = y-3) x −3 3 u3  3 7 ( x − 3) 3  =∫ [1 − u ]du = u −  =  x − −  . 597 ≈ .25[1 + ln(4)] ≈ . For the waiting time X for a single bus. ∫ 1 y2 udu = 25 50 y 0 For 5 ≤ y ≤ 10.25 ln(4) . F(y) = ∫ y 0 = f (u ) du = ∫ f ( u) du + ∫ f ( u) du 5 y 0 5 y 2 1 u  2 y2 + ∫  − du = y − −1 2 0  5 25  5 50 F(x1) 1. F(y) = 20.0 0 5 10 x1 y 2p ⇒ y p = (50 p )1/ 2 b. For 0 ≤ y ≤ 5. P(1 ≤ X ≤ 3) = F(3) – F(1) ≈ . a.966 .1667 .. E(Y) = 5 by straightforward integration (or by symmetry of f(y)).. For 0 < p ≤ .5 < p ≤ 1. f(x) = F′(x) = . and similarly V(Y)= 50 y 2p 2 For .5 and V(X) = 12 21.597 b. ( ) 2 3  2 πr   1 − (10 − r ) dr  4 3 11  3  11 2 2 2 3 4 =  π ∫ r 1 − (100 − 20r + r ) dr = π ∫ − 99r + 20r − r dr = 100 ⋅ 2π 9 9 4 4   E(area) = E(πR2 ) = ( ∫ ∞ −∞ πr f ( r ) dr = 2 ∫ 11 9 ) 138 . 12 25 E(X) = 2.5 0.369 c.Chapter 4: Continuous Random Variables and Probability Distributions 19.0 0. p = F(y p ) = c.25 ln(x) for o < x < 4 a. p = yp − − 1 ⇒ y p = 10 − 5 2(1 − p ) 5 50 50 = 4. P(X ≤ 1) = F(1) = .5. 5  5 so σ = 3. so 5 2 9   9 Var  X + 32 =   ⋅ ( 2) 2 = 12. F(x) = ∫ 21 − 2 dy = 2 y +   = 2 x +  − 4.6 139 . temperature in °F = 9  9 E  X + 32  = (120) + 32 = 248. x  1   1  1  a.64 find µ p 2 + 8 p ] To 2 c. 0).5 1   max( 1. so 1 y  y 1 x    0 x <1   F(x) = 2( x + 1x ) − 4 1≤ x ≤ 2  1 x>2  x b.5 − x )1 − 2 dx = . E(X) = 2 E(X ) = d. 2  x3  8 x − 1 dx = 2 − x   = ⇒ Var(X) = .5 − x.0) f ( x ) dx =2 ∫ (1. 5   5 With X = temperature in °C.Chapter 4: Continuous Random Variables and Probability Distributions 22.061 1 x   9 X + 32.0626  3 1 3 2 ∫ 2 1 1 . 
 1   − 4 = p ⇒ 2xp 2 – (4 – p)xp + 2 = 0 ⇒ xp = 14 [ 4 + p + 2 x p +  x p   ~ .5 ⇒ µ~ = 1.5 – X.614 1 x  x    2  1 2∫ 2 1 ( ) Amount left = max(1.96 . ∫ 2 1 2  x2  1  1  x ⋅ 21 − 2 dx = 2∫  x −  dx =2 − ln( x)  = 1. set p = . For 1 ≤ x ≤ 2. so E(amount left) = 23. 9 as desired. Var(x) = ∞. (i.5 b.e. a linear transformation of X). θ µ~ ) = .9) ) = . 90th for Y = 1. When Y = aX + b. i. ∞ kθ k ∫ x n− ( k +1) dx .8 µ 25.Chapter 4: Continuous Random Variables and Probability Distributions 24. e. ∫ a. E(X ) = kθ ∫ dx = . c. if n<k.9) + 32) = P(1. E(X) = ∞ ∞ θ ∞ ∞ 1 kθ k kθ k x − k +1  kθ x ⋅ k +1 dx = kθ k ∫ k dx = =  θ x − k + 1 θ k − 1 x kθ 2 c.8X + 32 ≤ 1. The (100p)th percentile for Y is 1. E(Xn ) = a.8η(.8η(p) + 32. so θ x k −1 k −2 2  kθ 2   kθ  kθ 2  −  Var(X) =  =  (k − 2)(k − 1) 2  k − 2   k −1 2 k ∞ 1 d.8η(. verified by substituting p for .9 in the argument of b.8η(. which will be finite if n – (k+1) < -1.9) + 32) = (X ≤ η(. since P(Y ≤ 1.9) + 32 where η(. then the corresponding (100p)th percentile of the Y distribution is a⋅η(p) + b. and the (100p)th percentile of the X distribution is η(p). (same linear transformation applied to X’s percentile) 140 . E(X) = b. since E(X2 ) = ∞. ~ + 32) = P(1.9) is the 90th percentile for X.8 µ~ + 32) = P( X ≤ P(Y ≤ 1.8X + 32 ≤ 1.e. 3 26.17) = Φ(2.17 d. P( -1.81 c.Φ(-1..121 ⇒ 1 . Φ(2) .Φ(c)) = 2Φ(c) – 1 ⇒ Φ(c) = .50 ) = P( -2.50 ≤ Z ≤ 2.016 ⇒ 1 .41 27.Φ(-1.50) = . a.8790 ⇒ c = 1. P(0 ≤ Z ≤ 2.50) .50) .Φ(-2.37) = .14.3413 c.Φ(1.Φ(-2.Φ(-c) = Φ(c) – (1 .50) = Φ(2..0791 i.9599 g. 141 . Φ(1) .75)] = 1 .4938 d. .Φ(-2. P( c ≤ | Z | ) = .75) = .7910 ⇒ c = .121 = . P(-c ≤ Z ≤ c) = Φ(c) .50) .9876 a.97 e.50) = . 1 .50) = .1 row and the .37) = .Φ(-c) = 2Φ(c) – 1 ⇒ Φ(c) = .9840 = 1 – P(c ≤ | Z | ) = P( | Z | < c ) = P(-c < Z < c) = Φ(c) .75 < Z) + [1 – P(Z < -1. Φ(0) .50) = .0668 j.9838 is found in the 2.9920 ⇒ c = 2.50) = .4850 b. Φ(1. b.Φ(1.9876 e. 
P( |Z| ≤ 2.9147 f.17) .Chapter 4: Continuous Random Variables and Probability Distributions Section 4.P(c ≤ Z) = P(Z < c) = Φ(c) = 1 . P(0 ≤ Z ≤ c) = .016 = .291 ⇒ Φ(c) = .9104 h. P(c ≤ Z) = .Φ(0) = . Φ(2.9920 ⇒ c = .04 column of the standard normal table so c = 2. Φ(2.Φ(0) = . -1.34 c. P( |X – 80 | ≤ 10) = P(-10 ≤ X . 25th = -75th = -.3370 ⇒ z.7500 ⇒ c ≈ .0055 is .00) = .34 entry). Φ( z. Φ( z. .0055 = 2. 30.50) .68 entries.55 entries.09) = .80 ≤ 10) = P(70 ≤ X ≤ 90) P(-1.9104 d.633 ≈ -. a. so z.00) = .34 (.8413 e. P(70 ≤ X) = P(-1. Φ(c) = .04 column) b.7517 are in the .555 (both .Chapter 4: Continuous Random Variables and Probability Distributions 28. which implies that Φ( z. Φ(c) = .633 = .9100 ⇒ z = 1.42 a. Area under Z curve above z. Φ(c) = .00) = .5 10   100 − 80   65 − 80 P ≤z≤  = P(-1.00) = .9099 is the entry in the 1.9099 appears as the 1.34 (since .00 ≤ Z ≤ 1.67 and .06 ⇒ c ≈ .50) = Φ(1. 9th percentile = -91st percentile = -1.. P(65 ≤ X ≤ 100) = 29.7486 and . P(X ≤ 100) = b.0055.50 ≤ Z ≤ 1.50) = .0055) = 1 .633) = area below z.. P(85 ≤ X ≤ 95) = P(. d.0594 and .50 ≤ Z ≤ 2) 10   10 = Φ(2.9100 ⇒ c ≈ 1.50) = .0606 appear as the –1.54 b.Φ(.00 ≤ Z) = 1 . respectively).0055 = .675 since . c.00) .2417 f. 100 − 80   P z ≤  = P(Z ≤ 2) = Φ(2.Φ(-1.6826 142 .3 row. P(X ≤ 80) = c.9945.675 e.56 and –1.Φ(-1. respectively.9772 10   80 − 80   P z ≤  = P(Z ≤ 0) = Φ(0.0668 = .9772 . a. P(X > .10) = Φ(-3.Chapter 4: Continuous Random Variables and Probability Distributions 31.40) = . P(2. 33. P(x = a) = 0.06) = .4) = Φ(2..7967 b. that is the 95th percentile (5% of the values are higher).8 – c and 8. 4 = 1 − (1 − .0004 c.8 – c = µ + (-2.43) = 1 . a.83) = 1 .8) = 6. b.8) ⇒ c = 2.00 ≤ Z ≤ 3.00 ≤ Z ≤ 1.2033 = .8 + c) = . and let Y be defined analogously for the second machine. 1 – P(no A i ) = 1 − P( A ′) 34.36 ≤ Z ≤ .40) ≈ P(Z ≤ -2. P(10 ≤ X ≤ 12) = P(-4.3987. then P(at least one A i ) = 32. P(X ≤ . The 95th percentile of the standard normal distribution = 1. 
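The Φ lookups in this section can be reproduced from the error function, since Φ(z) = ½[1 + erf(z/√2)], and standardizing extends this to any normal distribution. A sketch checking values from above: Φ(2.17) - Φ(0) = .4850, P(|Z| ≤ 2.50) = .9876, and, with µ = 80 and σ = 10, P(65 ≤ X ≤ 100) = .9104:

```python
from math import erf, sqrt

def phi(z):
    # standard normal cdf: Φ(z) = (1 + erf(z/√2))/2
    return 0.5 * (1 + erf(z / sqrt(2)))

def norm_cdf(x, mu, sigma):
    # P(X <= x) for X ~ N(mu, sigma), by standardizing
    return phi((x - mu) / sigma)

p1 = phi(2.17) - phi(0)                             # ≈ .4850
p2 = phi(2.50) - phi(-2.50)                         # ≈ .9876
p3 = norm_cdf(100, 80, 10) - norm_cdf(65, 80, 10)   # Φ(2) - Φ(-1.5) ≈ .9104
```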
143 .9452 1.0082 c.43) .8 – c ≤ X ≤ 8.5 ≤ X ≤ 17.3987 mg/cm3 . So c = . P(5 ≤ X ≤ 10) = P(-1.6664 . so 8.4) = .8028 Let X denote the diameter of a randomly selected cork made by the first machine.33. Define event A as {diameter > 10}.50 ≤ X-15 ≤ 2.33(2.Φ(. P( |X – 10| ≤ 2(1.50) = P(12.645. P(X ≥ 10) = P(Z ≥ .36) = . We want the value of the distribution. c.00) = .1) = P(-7.1) = P(-1. P(8.25) ) = P(-2. The largest 5% of all concentration values are above .25   a. 18 − 15   P z ≤  = P(Z ≤ 2.5795 d.3336.43) = 1 .6826 P(2.9544 a.0869 = .33(2. P( X ≤ 18) = b.00 ≤ Z ≤ -2.Φ(-1.8 + c are at the 1st and the 99th percentile of the given distribution.33) = .524. e.8 – 2. respectively.3336) 4 = 1 − . The 1st percentile of the standard normal distribution has the value –2.645)(. P(x > 10) = . From a. so 8. P(X > 10) = P(X ≥ 10) = . P(X > 20) = P(Z > 4) ≈ 0 c.33)σ = 8.00) = . since for any continuous distribution.9987 So the second machine wins handily.9 ≤ Y ≤ 3..1972 = .6664 = .3336.30 + (1.3336.98.43) = Φ(..25) = P(Z > -.5) P(-2.00 ≤ Z ≤ 2.9 ≤ X ≤ 3.00) = .40) = Φ(-2. Then P(µ .140.1  .Φ(-1.645 is the 5th z percentile (z.5σ or X > µ + 2.998 = . P(-1.256 and µ + σ(-1.67)(4.667) = .P(-2.671.50) = .5) = 39. We desire the 90th percentile: 30 + 1.σ or µ + σ ≤ X ≤ µ + 2σ) = P(within 2 sd’s) – P(within 1 sd) = P(µ 2σ ≤ X ≤ µ + 2σ) . σ = .002 From Table A..645).5 ≤ Z ≤ 1. P( X < µ .Chapter 4: Continuous Random Variables and Probability Distributions 35.9996)5 = 1 .0510 P < z <  implies that = 1.95.985 P(damage) = P(X < 100) = 100 − 200   P z <  = P(Z < -3.6826 = . σ = 0.5σ) = 1 .96  σ 39.2σ ≤ X ≤ µ .2000.28(0.5) = 1 .5σ) = P(-1.P(µ .28) = 10. Since 1.1 ≤ X ≤ µ + .28) and –1.05 = 1. a.5   60 − 43   P(X > 60) = P z >  = P(Z > 3.σ ≤ X ≤ µ + σ) = . P(µ .925) = -.96) = .585. 43 + (-0.0124 c. 36.000 µm.1 = . 30 + 5( -1. = 1 – P(none damaged) = 1 – (. and thus that σ = σ σ 1.. σ = 4.96 ≤ Z ≤ 1.3. 
from which σ(-2.7 b.645) = 9.8664 b.9544 .P(µ .9876 = .50) ...5σ) = 1 .2.1 = 1.2514 4 .96. the given information implies that µ + σ(1.5 40 − 43   P z ≤  = P(Z < -0.0004 300   P(at least one among five is damaged) 38.1.5 ≤ Z ≤ 2.2718 144 .5) = Φ(1. 37. 40.14) = 3. a. µ + σ⋅(91st percentile from std normal) = 30 + 5(1. µ = 3.179 µ = 43.5   a.225 c.778) ≈ 0 4 .33) = .1  − .1 .34) = 36. P(µ . P(X < 40) = b.1) = .2.5σ ≤ X ≤ µ + 2. and µ = 10.5σ ≤ X ≤ µ + 1.28 is the 90th z percentile (z.555) = 22. 05 d.500 inches.002 .499 and σ =.72)] No. 4000 − 3432   P( x > 4000) = P Z >  = P( z > 1.Φ(-1..8810 = .55) b.3% of the bearings will be unacceptable.002     Φ ( −1.0015 + .7938 b. so c – 1 is the 99th percentile of the distribution.55 ≤ Z ≤ 1. 10⋅P(a single one is acceptable) = 9.496 and . The new distribution has µ = .499  . P(67 ≤ X ≤ 75) = P(-1.Φ(.10.1190 4000 − 3432   3000 − 3432 P(3000 < x < 4000) = P <Z<  482 482   = Φ (1.00 ≤ Z ≤ 1.499    P z <  + P z >  = P ( z < − 1 .18) − Φ (− .496 or larger than .9750 3 3 3  3 c = 1.6969 b. c.72) = Φ(.504.504 − .1841 = .18) 482   = 1 − Φ (1. P(x < .33) = 20. so P(Y ≤ 8) = B(8.9) = . 44.496 − . 2000 − 3432  5000 − 3432    P( x < 2000orx > 5000) = P Z <  + P Z >  482 482     = Φ (− 2.0006 = . or 7. X ∼N(3432.96 ⇒ c = 5.155. P(-1.88 3 43.504 inches. 45.5 ) . and c = 21.67) = . so unacceptable bearings will have diameters smaller than .Chapter 4: Continuous Random Variables and Probability Distributions 41.25)] = .0021 145 .002.073 .504) = . a.5) + (1 − Φ (2.84) = P(Z < 1.28) = .55) = Φ(. the acceptable range for the diameter is between .72) = Φ(1.9.72 ≤ Z ≤ . 42.18) = 1 − .155.5 ) + P ( z > 2 . symmetry of the Z curve about 0. a.264 The stated condition implies that 99% of the area under the normal curve with µ = 10 and σ = 2 is to the left of c – 1.Φ(1.55) – [1 .95 ⇒ Φ( ) = .0062 = . P(-1. With µ = . By symmetry.72) .8810 − .0068 + .55) . 
P(70 – c ≤ X ≤ 70 + c) = c c c −c P ≤ Z ≤  = 2Φ( ) − 1 = . Thus c – 1 = µ + σ(2.90) = .72 ≤ Z ≤ -. 482) a. p = P(X < 73.55) = P(.5)) = .496 or x >.97 ) + [1 − Φ (3. P(at most 30) = P(X ≤ 30 + .µ | ≥ 2σ ) = 1 – P(-2 ≤ Z ≤ 2) = .394   = 1 − Φ (1.295 = 1844 and 5020.µ | ≥ σ ) = P( X ≤ µ .7019 This yields the same 1. P(less than 30) = P(X < 30 .0392 5% of 1000 = 50: 50.. 46.5) = P(Z < . Using the Z table.5. and we wish to find 3178 − 3432   P( x > 3178) = P Z >  = 1 − Φ (−.µ | ≥ 3σ ) = 1 – P(-3 ≤ Z ≤ 3) = .0005 and the bottom . 39 .5595   P( x > 7 ) = P Z >  = 1 − Φ ( −.. We need the top .σ or X ≥ µ + σ ) = 1 – P(µ . Converting to lbs yields mean 7. a.. ±3. so we will use a middle value. 1.5 − 30   P( x ≥ 40) = 1 − P( x ≤ 39 ) = 1 − P Z ≤  5.5 ≤ X ≤ 30 + .7286 b.7019 482   d.00 5. both .9995 and .1000. Then 3432±(482)3. P(20 .0608. then 7 lbs = 3178 grams.53) = .5) = P(-1.53) = .295.5595 and s.5) = P(Z ≤ 1.03) ˜ N(30.76) = 1 − .1 ≤ Z ≤ 1.σ ≤ X ≤ µ + σ) = 1 – P(-1 ≤ Z ≤ 1) = .8159 146 .0026 48.8643.Chapter 4: Continuous Random Variables and Probability Distributions c.0608   answer as in part c.5 ≤ X ≤ 30. e.394   P( |X .0005 have multiple z values. 47.0005 of the distribution.394) a.9608 = . or the most extreme . We will use the conversion 1 lb = 454 g.3174 Similarly. We use a Normal approximation to the Binomial distribution: X ∼ b(x.0456 And P( |X .9) = . b. Then 7 − 7.80) ≈ 1. P( |X .1) = .d.5 − 30   P( x ≤ 50 ) = P Z ≤  = Φ( 3.1) = .1% of all birth weights are less than 1844 g and more than 5020 g.5) = P(19. 0968 = ..5) .33) = .212 P(. P(X ≤ 30) = b.885 P(Z ≤ 1. P(20 ≤X) P(19.75 ≤ Z ≤ .8 20 2.5 12. P(15 ≤ X ≤ 25) = P(X ≤ 25) – P(X ≤ 14) =  29 + .80 ≤ Z ≤ 3.30) .47) = .20) = .017 P( Z ≤ -2.78) = .6 15 2.573 P(-2.5793 .5 ≤ normal) .5957 b. P(X < 30) =P(X ≤ 29) = c.24) = .5) = P(Z ≤ -2.9545 a.5) .002 .5 − 20  Φ   = Φ(2. µ = 200.87 ≤ Z ≤ 2.10.25) = . 50. σ = 10.5 ≤ normal ≤ 20.9032 . P(180 ≤ X ≤ 230) = P(179.0122 c.029 . 
P(X < 175) = P(X ≤ 174) = P(normal ≤ 174.0026 .5 − 20  Φ   = Φ(2.0329 .6 .8 .30) = .575 P(Z ≤ . P: µ: σ: a.45 .2112 .4.Φ(-1.9875 18   Φ(1.0099 147 .24) = .  25 + .20) = . P(X ≤15) P(normal ≤ 15.617 .5668 .50 .20) = . npq = 18  30 + .20 ≤ Z ≤ 2.5 − 20   14 + .5 2. n = 200.9932 18   a. p = .8849 .Chapter 4: Continuous Random Variables and Probability Distributions 49.5 − 20  Φ   − Φ   18 18     N = 500.00 P(15≤ X ≤20) P(14.5 . .8064 51. np = 20.25) = .577 P(-.9666 b.5) = P(-1.5987 P = .5 ≤ normal ≤ 230. 5 ⋅ exp  b. a. P(Z ≥ 1) ≈ .999937 d.  83 + 351 + 562   = . P(Z > 4) ≈ . 9 5 (115) + 32 = 239 .6  P(Z > 5) ≈ .Chapter 4: Continuous Random Variables and Probability Distributions 52.00000029  305.  − 4392   = . variance = 12. Normal.5 ⋅ exp  148 .1587  703 + 165   − 2362   = . a   Now differentiate with respect to y to obtain fy (y) = ′ 1 Fy ( y ) = 2π aσ e − 1 2a2 σ 2 [ y −( a µ +b )] 2 so Y is normal with mean aµ + b and variance a2 σ2 . so  340.0013  399. P(Z > 3) ≈ .] = P(Z ≤ […]) = p as desired  σ  53.0000317 . P(X ≤ µ + σ[(100p)th percentile for std normal])  X −µ  P ≤ [..5 ⋅ exp  54.75  P(-4 < Z < 4) ≈ 1 – 2(.0000317) = . Fy (y) = P(Y ≤ y) = P(aX + b ≤ y) = ( y − b)   P X ≤  (for a > 0).3333   − 3294   = . mean a..96 b.5 ⋅ exp  c. 424 b.α=5  24  F  . (P(X ≤ µ~ ) = .Chapter 4: Continuous Random Variables and Probability Distributions Section 4.  5  3  1  3 1  1   3 Γ  = Γ  = ⋅ ⋅ Γ  =   π ≈ 1.567.4) = . F(0. so while the mean is 24. a. α = 4 a.715  4  µ = 24. αβ2 = 144 ⇒ β = 6.5) = .7) = .735 e.4) = . P( 3 < X < 8 ) =. P(20 ≤ X ≤ 40 ) = F(10.411 56.4) – F(2.5  = F(6. This is a result of the positive skew of the gamma distribution.238 b. column 5 of Table A.4 55. σ2 = 80 ⇒ αβ = 20. 80 20 . P(X ≤ 24) = c.4) = P(X ≤ 0.7) – F(3. α= 4) = 0 a. the median is less than 24. 149 .653 e.653 f.4) = .5) = . P( 3 ≤ X ≤ 8 ) = F(8.5) – F(5.5) = . P(12 ≤ X ≤ 24 ) = F(4. 58. 57. 
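The normal approximations in this group follow the continuity-corrected pattern P(X ≤ x) ≈ Φ((x + .5 - np)/√npq). A sketch taking n = 200 and p = .10 (so np = 20, npq = 18), as read from the surrounding parts; treat these values as illustrative:

```python
from math import erf, sqrt

def phi(z):
    return 0.5 * (1 + erf(z / sqrt(2)))

def binom_normal_cdf(x, n, p):
    # continuity correction: P(X <= x) ≈ Φ((x + .5 - np)/sqrt(npq))
    mu, sigma = n * p, sqrt(n * p * (1 - p))
    return phi((x + 0.5 - mu) / sigma)

n, p = 200, 0.10
p_le_30 = binom_normal_cdf(30, n, p)                              # ≈ .993
p_15_25 = binom_normal_cdf(25, n, p) - binom_normal_cdf(14, n, p)  # ≈ .806
```

The small differences from the tabled answers come from rounding z to two decimals in the table; erf carries full precision.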
P(X < 5) = P(X ≤ 5) = .7) – F(4. µ = 20.329  2  2  2  2 2  2   4 c.5).7)] = .7) = . P(X > 8) = 1 – P(X < 8) = 1 – F(8.238 c. P(X < 4 or X > 6) = 1 – P(4 ≤ X ≤ 6 ) = 1 – [F(6. Γ(6) = 5! = 120 b. P(X ≤ 5) = F(5.371 from row 4. αβ2 = 80 ⇒ β = b.713 a. σ2 = 144 ⇒ αβ = 24.4 d. P(X ≤ 24 ) = F(4.313 d. F(5. F(4.7) = . 01386) = 1 − e −1.000) = 1 – F(20. only .4. 59. Similarly.982 −(1 )(5 ) [ ] − 1 − e − (1)( 2) = e −2 − e −5 = .000 implies λ = .9375 . P(X ≤ 4 ) = 1 − e d.000) = .000 .4)=. the 99th percentile = 6(10)=60. −(100 )(. We want a value of X for which F(X. so t = 6(11)=66.00004) − 1. σ = 72.9999 = .45)(. so P(X > µ + 2σ) = P( x > 75. 01386) = .000.000) = .5 = P(X ≤ Mean = a. P(X > µ + 3σ) = P( x > 100. In table A.00004) = .000 ≤ X ≤ 30. 772 = .7499 = 1 − e −2.5% of all transistors would still be operating.995. In the table.699 P(20.15 . 5) = ..01386 P(X > µ + 2σ) = P(X > 72. At 66 weeks.000. 000) = .01386) = e −2. .Chapter 4: Continuous Random Variables and Probability Distributions c. σ = c.15 + 2(72.129 60.P(X ≤ 100 ) = . 00004)(20.4)=.15)) = P(X > 216. we see F(10.449 . . We want a value of X for which F(X. σ = 1 = 25.000.148 b.01386) = .551 = . 2 P(X ≤ 30..01386) −( 200 )(. ~ ~ µ~ ) ⇒ 1 − e −( µ )(.5 ⇒ e − ( µ )(.7499 = . P(2 ≤ X ≤ 5) = 1 − e a.4)=. P(X ≤ 100 ) = 1 − e 1 =1 λ −(1 )(4 ) = 1 − e −4 = .01386) = ln(. 61. E(X) = b.000) = 1 – P(X ≤ 20.00004) = e = .000) = λ 1 – F(75. d.693 ⇒ µ~ = 50 .45) = [ ] 1 − 1 − e − ( 216. F(11.1876 b. So with β = 6.018 150 = e −(.05.00004 λ P(X > 20.0498 c. 386 = .4)=.9375 P(X ≤ 200 ) = 1 − e P(100 ≤ X ≤ 200) = P(X ≤ 200 ) .995.000) = F(30.5 − µ~ (. µ= 1 = 72. 1 = 25. 1 =1 λ a.990..699 .99.15 . so Fx(t) = P(X ≤ for t ≥ 0.d. fx(t) = . Now differentiate with respect to y to obtain the chi2π squared p.05 t (e ) −λ t 5 = e −. e −nλt . a. P(X ≤ t) = P(at least n events in time t) = P( Y ≥ n) when Y ∼ Poisson with parameter λt .f. E(X) = 20 λ λ a. for λ = .e . For p = . 
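The exponential computations in this group all evaluate F(x; λ) = 1 - e^{-λx}. A quick numeric check with λ = 1, where P(2 ≤ X ≤ 5) = e^-2 - e^-5 = .129, and of the median, which solves F(µ~) = .5 and so equals ln(2)/λ:

```python
from math import exp, log

def expo_cdf(x, lam):
    # F(x; λ) = 1 - e^{-λx} for x >= 0
    return 1.0 - exp(-lam * x)

lam = 1.0
p_2_5 = expo_cdf(5, lam) - expo_cdf(2, lam)   # e^-2 - e^-5 ≈ .129
median = log(2) / lam                          # ≈ .693
```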
{X2 ≤ y} = {− b.5 = µ λ λ With xp = (100p)th percentile. 151 . E(X) = αβ = n b. so X has an exponential distribution with −λx − λx e p ⇒ e p = 1− p .05. −. c. 1 n = .10) = .5.5.693 . e − λt (λt )k . k! k =0 n −1 Thus P(X ≤ t) = 1 – P( Y < n) = 1 – P( Y ≤ n – 1) = 1−∑ 63. P(X ≤ 30) =  30  F  .Chapter 4: Continuous Random Variables and Probability Distributions 62. − [ln( 1 − p )] ~ = .10  = F(15. p = F(xp ) = 1 - 65. ⇒ −λx p = ln( 1 − p) ⇒ x p = . n = 10. {X ≥ t} = A 1 ∩ A 2 ∩ A 3 ∩ A 4 ∩ A 5 b.05e with parameter λ = . 05t . but By the same reasoning. P(X ≤ t) = 1 parameter nλ. P(X2 ≤ y) = ∫ y≤X≤ y − y y } 1 −z 2 / 2 e dz . Thus X also ha an exponential distribution . x. P(X ≥ t) =P( A 1 ) ⋅ P( A 2 ) ⋅ P( A 3 ) ⋅ P( A 4 ) ⋅ P( A 5 ) = −. a. 64. with ν = 1. 05t t) = 1 .930  2  c. 5 reduces to ~ 2.5 ) 2 1.6931) (200) = 172.982 −( 6 / 3) 2 2 [ ] − 1 − e − (1. e [ − ] ( x − 3 .5 / 3) = e −.e. E(X – 3. For x > 3. P(X > 5) = 1 – P(X ≤ 5) = 1 − d.0001 = . The equation F( µ~ ) = .2.Chapter 4: Continuous Random Variables and Probability Distributions Section 4. i.727.0636 2. P(5 ≤ X ≤ 8) = 1 − e µ~ is requested.5) = 1.5 )  Γ(2) − Γ   = . F(x) = P( X ≤ x) = P(X – 3.329 so E(X) = 4.3678 −1 −1 −1 −1 152 −9 .5 ≤ x – 3. 5 = 1 − e −1. 5 P(X > 300) = 1 – F(300.5 = e .75 ≈ .2. P(1.4 . 2. P(X ≤ 6) = 1 − e c.5Γ c. 2 2 2   1  2 Var(X) = 9 Γ(1 + 1) − Γ 1 +  = 1.8257 ..162 = .2.5  µ  −(µ~ / 200 )2 .6637 c.5 66.829  2 2 2  3  Var(X) = Var(X – 3. 200) = 1 − e P(X < 250) = P(X ≤ 250) ≈ .3679 − .66 .5.8257 e −(1. so µ = (.5. P(X ≤ 250) = F(250.368 − [1 − e ] = e − e = .5.926  2   a.483  2   −9 [1 − e ] = e = .F(100.5. The median a. ln(. 1 1  1 3Γ1 +  = 3 ⋅ ⋅ Γ  = 2. 5  3  = 1.5) = 1 - b. − ( 250 / 200) 2.8257 −( 6 / β )α = 1 − e −( 6 / 3) = 1 − e −4 = . E(X) = b.5) = ..760 2 67. 200) ≈ .5 ~ .5) = (1.5.  200  68.5) ≈ −   .5 ≤ X ≤ 6) = 1 − e a. P(100 ≤ X ≤ 250) = F(250. 200) = b.25 − e −4 = . 200) . 930 − 1 − e − 1 = .β=100 () x α β ( ) a. 
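The series-system argument above says the system lifetime is the minimum of the component lifetimes, and the minimum of n independent exponential(λ) lifetimes is again exponential, with parameter nλ. The sketch below assumes five components with λ = .01 each, giving the system rate .05 that appears above; these particular numbers should be treated as illustrative:

```python
from math import exp

def series_survival(t, lam, n):
    # P(X >= t) = P(every component lasts past t) = (e^{-λt})^n = e^{-nλt}
    return exp(-n * lam * t)

surv_10 = series_survival(10, 0.01, 5)   # e^{-0.5} ≈ .6065
```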
a.5 = F ( µ ~ 2 = −9 ln(.34 )(. F (x .97 )( ) 2 V ( X ) = e (2( 4. 5) = 6. ( ) P = F(xp ) = 1 .5 ⇒ c.25 ln(.632 = . dy = dx ) β βα β ∞ 1 1  −y β ∫ y α e dy = β ⋅ Γ1 +  by definition of the gamma function.1587 = P( x > 200 ) . so t = 3. c. 50) ⇒ x = 98.13) = .8   153 . 2 − xp ( µ~ − 3. F (105) − F (100 ) = .53 σ = 117.930 ) x = .8   ln( 200 ) − 4.8964 ) = 13. 8 ) ⋅ e −. 50) ⇒ − x = 100 20 ln(.1)]1/2 = 2. β ) = 1 − e b. a.Chapter 4: Continuous Random Variables and Probability Distributions 69.5] = .20.5596 ⇒ µ~ = 4.50 ⇒ −(100 ) ( 20 = ln(.50 e − µ / 9 = .5   P( x ≤ 100) = P z ≤  = Φ(0.298 c.5 ⇒ µ b.8413 = . From c.2383 ⇒ µ~ = 2.2761 = 5. 5) /1.5) = 1.5[ -ln(. 1 − e −[( µ− 3. b. 82 = 123 . µ=∫ ∞ 0 70.5   P( x ≥ 200 ) = P z ≥  = 1 − Φ(1 .00 ) = 1 − .5 is 1.5 + 2. 50) )  − x  20   = ln(.18  100  72.5517 .373 ln( 100) − 4 . 0  α ~ ) = 1 − e −( µ / 3)2 ⇒ .27661.5) +.7761 X ∼ Weibull: α=20. 8 − 1 = (15.930 − . E( X ) = e (  µ +σ 2   2    = e 4. .50 = 1 − e − = 1− e − 105 20 100 ( − ( ) x 20 100 ⇒e − ( ) x 20 100 = 1 − .75 α β ⇒ (xp /β)α = -ln(1 – p) ⇒ xp = β[ -ln(1-p)]1/α The desired value of t is the 90th percentile (since 90% will not be refused and 10% will be). 71. α α α −1 −( x β )α x αx α −1 x⋅ α x e dx = (after y =   .5) 2 = -2. the 90th percentile of the distribution of X – 3.070 = .e ~ d.776 .367. Chapter 4: Continuous Random Variables and Probability Distributions 73. ~)= . 72) = . so ln( µ σ ~ = e 3.34) = . P(any particular one has X > 125) = . 1) = 125. 1 . For the power distribution. 01) / 2 = e 5.2 ) 2 a. 2 = 68. Since the median of the standard normal distribution is 0.α)th percentile is e µ+σzα . (where µ σ   refers to the lognormal distribution and µ and σ to the normal distribution). E(X) = /2 σx = 122. 01) ⋅ (e .0335.474 = 238.41 µ e.2 ) − 1 = 14907.0335) − 3 .5 )+ (1.68) – P(Z ≤ . 005 = 149. P(X > 125) = 1 – P(X ≤ 125) = ln(125) − 5   = 1 − P z ≤  = 1 − Φ(− 1. 
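The Weibull parts use F(x; α, β) = 1 - exp[-(x/β)^α] and its inverse x_p = β[-ln(1 - p)]^{1/α}. With α = 2 and β = 3 as above, P(X ≤ 6) = 1 - e^-4 = .982. A sketch (function names illustrative):

```python
from math import exp, log

def weibull_cdf(x, alpha, beta):
    # F(x; α, β) = 1 - exp(-(x/β)^α) for x >= 0
    return 1.0 - exp(-((x / beta) ** alpha))

def weibull_percentile(p, alpha, beta):
    # solve F(x) = p for x: x_p = β(-ln(1 - p))^{1/α}
    return beta * (-log(1.0 - p)) ** (1.0 / alpha)

p6 = weibull_cdf(6, 2, 3)             # 1 - e^{-4} ≈ .982
med = weibull_percentile(0.5, 2, 3)   # median ≈ 2.50
```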
 ln( X ) − µ  ≤ zα  = P(ln( X ) ≤ µ + σz α )  σ   µ +σ zα = P( X ≤ e ) .168. a.9573 ⇒ expected # = 10(.0949 b.. 5+ (1. 74. P(50 ≤ X ≤ 250) = ( ) e 2(3.9573 . ln(µ~ ) − µ ~)=µ = 0 .5   P z ≤  = P(Z ≤ .2   The lognormal distribution is not a symmetric distribution.9573) = 9.1   c. 1.6331 = . which is (continued) e 5+( − 1. e 3. 645)(.1    ln(110) − 5  = Φ(− 1. Var(X) = e10+(. 5 = 33.12 µ b.2     P(Z ≤ 1. ⇒ ~ =eµ . P(110 ≤ X ≤ 125) d. so the 100(1 .573 f. E(X) = b.60) = . e 3.41 e 5+(. 2) = e 5.α = Φ(zα) = P(Z ≤ zα) = the 95th percentile is 75.72) − Φ  = .5 = F( µ ~ −µ   ln( µ) ~ Φ  . 5+ (1.5    P z ≤  − P z ≤  1.9535 .90 154 .01 − 1) = 223. ~ = e 5 = 148. 645)(1.2 ) ⋅ e (1.0335) = ln(68.0414 . µ For the power distribution. We wish the 5th percentile.5  ln(50) − 3 .594 a.0427 − .2 1.157 . P(X ≤ 68.7257.0013 = . c. V(X) = 2 ln(250) − 3.3204. 2 0 ∫ 30(x . σx = 11. P(..45) = .2) = c.3026) = P(-. e 1.4 4 . V(X) = = . α +β E[(1 – X)m] = ∫ (1 − x) 1 0 m ⋅ 155 .2991 1 2 The point of symmetry must be .4) = ∫ 30(x .20 a.6094 ≤ ln(X) ≤2.0016 ) − x 5 dx = . i.714 .9+ 9 2 /2 = 10. E(1 – X) = 1 – E(X) = 1 - a. 78. ( 12 − µ )α −1 ( 12 + µ )β −1 = ( 12 + µ )α −1 ( 12 − µ ) β −1 . Γ(5 )Γ(2 ) ( so P(X ≤ .6736 . ) − x 5 dx = . so we require that f ( 12 − µ ) = f ( 12 + µ ) .0255 (5 + 2 ) 7 (49 )(8 ) Γ( 7 ) ⋅ x4 ⋅ (1 − x) = 30 x 4 − x 5 for 0 ≤ X ≤ 1.. a. 77. 4 ) β −1 dx = Γ(α + β ) α −1 β −1 x (1 − x ) dx Γ(α )Γ (β ) Γ(α + β ) 1 α −1 Γ(α + β ) ⋅ Γ(m + β ) m + β −1 = x (1 − x ) dx = ∫ 0 Γ(α )Γ( β ) Γ (α + β + m )Γ(β ) β For m = 1.32 ≤ Z ≤ . which in turn implies that α = β.e.Chapter 4: Continuous Random Variables and Probability Distributions 76.395 .3026) = P(Z ≤ .286 7 7 Γ(α + β ) ∫ x ⋅ Γ(α )Γ (β ) x (1 − x) Γ(α + β ) 1 α β −1 x (1 − x ) dx ∫ 0 Γ(α )Γ (β ) Γ(α + β ) Γ(α + 1)Γ( β ) αΓ (α ) Γ(α + β ) α = ⋅ ⋅ = Γ (α )Γ(β ) Γ(α + β + 1) Γ (α )Γ(β ) (α + β )Γ(α + β ) α + β 1 α −1 0 b. P(X ≤ 10) = P(ln(X) ≤ 2. 81) ⋅ (e . f(x) = 10 5 5 = = .024 . 
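The lognormal facts used here — median e^μ, mean e^(μ+σ²/2), and P(X ≤ x) = Φ((ln x − μ)/σ) — can be checked with the standard normal cdf from Python's stdlib; the μ = 5, σ = .1 values below are the ones in the P(X > 125) calculation above:

```python
import math
from statistics import NormalDist

Phi = NormalDist().cdf  # standard normal cdf

def lognormal_cdf(x, mu, sigma):
    """P(X <= x) = Phi((ln x - mu)/sigma) for a lognormal X."""
    return Phi((math.log(x) - mu) / sigma)

mu, sigma = 5.0, 0.1                            # parameters of ln(X), from the solution
p_gt_125 = 1 - lognormal_cdf(125, mu, sigma)    # ~ .957, as in the solution
median = math.exp(mu)                           # e^5 ~ 148.41
mean = math.exp(mu + sigma ** 2 / 2)            # e^(mu + sigma^2/2)
```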
E(1 – X) = .6736 P(5 ≤ X ≤ 10) = P(1.2 d.3745 = . E(X) = b. E(X) = 79.2 ≤ X ≤ .45) = .03936 5 2 = = .81 − 1) = 125.8+(. E(X) = b. Var(X) = e 3. 0 172.390 -0.365 .3 = F(. Var(Y) = ⇒ Var   = = = = 7  20  2 α + β  20  2800 28 E(Y) = 10 ⇒ E  a. The given probability plot is quite linear.5 173.670 -0.4 2 2 dy = .Chapter 4: Continuous Random Variables and Probability Distributions 80. αβ (α + β ) (α + β + 1) 2 b.P(8 ≤ X ≤ 12) = 1 . so P( Y < 8 or Y > 12) = 1 .0 204.365 = .6 422.3. Section 4. β = 3 .  12   8  . ∫ 30 y (1 − y ) . This observation is clearly much larger than what would be expected in a normal random sample. ⇒ α = 3.6.3) – F(.6 400 lifetime percentile -1.3 − F  .7 216.6 81.3.9 262.7 172..6 .3). 3.390 0.645 300 200 -2 -1 0 1 2 z %ile The accompanying plot is quite straight except for the point corresponding to the largest observation.130 0.  20   20  P(8 ≤ X ≤ 12) = F  The standard density function here is 30y 2 (1 – y)2 . after some algebra.040 1. 82. it would be inadvisable to analyze the data using any inferential method that depended on assuming a normal population distribution. We expect it to snap at 10. so P(8 ≤ X ≤ 12) = c. and thus it is quite plausible that the tension distribution is normal.3.670 1.040 -0.665.4.130 0. The z percentiles and observations are as follows: observation 152. Because of this outlier. 156 . α 100 1 Y  1  Y  100 .5 234.645 -1.3 193. Chapter 4: Continuous Random Variables and Probability Distributions 83.2 ln(x) 84.30. The z percentile values are as follows: -1. and thus it would be reasonable to use estimating methods that assume a normal population distribution.8 -2 -1 0 1 2 z %ile The Weibull plot uses ln(observations) and the z percentiles of the p i values given.40. -1. thickness 1. -0. 0.-0.1 -0. 0. 0. -0. 0.78.58.5 -0.8 -2 -1 0 z %ile 157 1 2 .6 -0.86. -0.0 -0. 1.4 -0. 0. 
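The beta-distribution identities in this passage, E(X) = α/(α+β) and E[(1−X)^m] = Γ(α+β)Γ(m+β)/[Γ(α+β+m)Γ(β)], are easy to verify numerically; α = 5, β = 2 matches the standard beta example with density 30(x⁴ − x⁵):

```python
from math import gamma

def beta_mean(a, b):
    """E(X) = a/(a + b) for a standard beta distribution."""
    return a / (a + b)

def beta_one_minus_x_moment(a, b, m):
    """E[(1-X)^m] = Gamma(a+b)*Gamma(m+b) / (Gamma(a+b+m)*Gamma(b))."""
    return gamma(a + b) * gamma(m + b) / (gamma(a + b + m) * gamma(b))

m1 = beta_mean(5, 2)                       # 5/7 ~ .714
e1mx = beta_one_minus_x_moment(5, 2, 1)    # 2/7 ~ .286, equal to 1 - E(X)
```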
The accompanying probability plot appears sufficiently straight to lead us to agree with the argument that the distribution of fracture toughness in concrete specimens could well be modeled by a Weibull distribution. 0. 1.24.3 0.7 -0.58. -0.86.01.32. -1.40.8 1.24.01. and 1. -0.08.3 -0.78.08. The accompanying probability plot is reasonably straight. 01.132).9 0.109). The 10 largest z percentiles are 1.2 1.0 0. An assumption of population distribution normality is plausible.915).01.3 obsvn 1. 300 200 100 0 -2 -1 0 z %ile 158 1 2 .08. (.140). .736). .15. .064). (1. (. (-. 1. . . (.86. 500 400 load life a. (-.4 1.865). (. (1. .24. observation) pairs are (-1. 1.19 and . 1. 1. 1.007). .153).32.937). (-.Chapter 4: Continuous Random Variables and Probability Distributions 85.8 0. the remaining 10 are the negatives of these values. 1. suggesting that an assumption of population normality is extremely plausible. .78. .60. (-1. (-. The accompanying normal probability plot is reasonably straight.1 1.93.58.08.011).58.394). . .40.24. 1. (.40.7 -2 -1 0 1 2 z %ile 86.44. . 1. The accompanying probability plot is very straight. 1. (-1.76.78. 1.32.863).913).253). The (z percentile. 1.06.96.45.983).66. 1. (1. . (-.32. The accompanying probability plot is roughly as straight as the one for checking normality (a plot of ln(x) versus the z percentiles.93. . This plot and a normal probability plot for the original data appear below. 5 4 3 1000 2 1 0 -2 -2 -1 0 1 -1 0 z %ile 2 z % ile 159 1 2 . .13. take the natural logs and construct a normal probability plot. -2.76. appropriate for checking the plausibility of a lognormal distribution.59. -.95. -. these percentiles are -3. and 1.02.26.56. . -1. -. .65.73.12.55.any of 3 different families of population distributions seems plausible. so lognormality is plausible.37.44.16. the natural logs of the observations are plotted against extreme value percentiles.Chapter 4: Continuous Random Variables and Probability Distributions b. 
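These probability-plot solutions pair the ordered sample with the z percentiles Φ⁻¹((i − .5)/n) and judge normality by how straight the plot is. A sketch that quantifies straightness by the correlation of the two columns (the sample values below are illustrative, echoing the lifetime data with one large outlier):

```python
from statistics import NormalDist, mean

def z_percentiles(n):
    """z percentiles Phi^{-1}((i - 0.5)/n) for i = 1..n."""
    inv = NormalDist().inv_cdf
    return [inv((i - 0.5) / n) for i in range(1, n + 1)]

def plot_correlation(sample):
    """Correlation of (sorted sample, z percentiles); near 1 suggests normality."""
    xs, zs = sorted(sample), z_percentiles(len(sample))
    mx, mz = mean(xs), mean(zs)
    num = sum((x - mx) * (z - mz) for x, z in zip(xs, zs))
    den = (sum((x - mx) ** 2 for x in xs) * sum((z - mz) ** 2 for z in zs)) ** 0.5
    return num / den

# a big outlier pulls the plot (and the correlation) away from a straight line:
r = plot_correlation([152.7, 172.0, 172.5, 173.3, 193.0,
                      204.7, 216.5, 234.9, 262.6, 422.6])
```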
The curvature in the plot for the original data implies a positively skewed population distribution . . -1. -. -1. -.) ln(loadlif e) 6 5 4 -4 -3 -2 -1 0 1 W %ile To check for plausibility of a lognormal population distribution for the rainfall data of Exercise 81 in Chapter 1. -. . is also reasonably straight .like the lognormal distribution.31.68. Clearly the log transformation gives quite a straight plot. 8 3000 7 6 ln(rainfall) 2000 rainfall 87.30.40. . For a Weibull probability plot. -2.01. 1 0.0 0. It is very reasonable that the distribution is normally distributed.Chapter 4: Continuous Random Variables and Probability Distributions 88. The cube root transformation also results in a very straight plot. a.0 sqrt 1.5 1.6 -2 -1 0 z %iles 160 1 2 .6 1. The square root transformation results in a very straight plot. 2. 5 precip 4 3 2 1 0 -2 -1 0 1 2 z %iles b. It is reasonable that this distribution is normally distributed. cubert 1.5 -2 -1 0 1 2 z %iles c. The plot of the original (untransformed) data appears somewhat curved. 935 1.755 0.6 0.44 2.008 1 0 0 5 10 15 20 25 30 35 40 45 wi This half-normal plot reveals some extreme values.875 0. generated by Minitab) is quite linear.625 0.454 0. Anderson-Darling Normality Test A-Squared: 1.775 0. Normal Probability Plot .19 0.063 0.05 .01 .525 0. without which the distribution may appear to be normal.675 0. ordered absolute values (w's) 0.96 2 z values 90.001 125 135 145 strength Average: 134.38 30.44 1.32 0.89 1.54186 N: 153 We use the data (table below) to create the desired plot. The pattern in the plot (below.80 .Chapter 4: Continuous Random Variables and Probability Distributions 89.84 43.20 .999 .725 0.065 P-Value: 0. It is very plausible that strength is normally distributed. 161 .825 0.99 Probability .78 3.575 0.4 probabilities 0.15 1.96 12.975 z values 0.34 3.902 StDev: 4.27 1.50 .15 1.925 0.95 . dy = 25 25 ( A + B ) (0 + 25) E(X) = 2 = 2 15 = .. F(x)=0 for x < 0 and = 1 for x > 25. . .5 3.. ∫ x 0 1 x . 
λ = 1 can be used to assess the plausibility of the entire exponential family. 600 500 failtime 400 300 200 100 0 0.5 .4 25 a.. The (100p)th percentile η(p) for the exponential distribution with λ = 1 satisfies F(η(p)) = 1 – 5 1. 1.269. P(X ≥ 10) = P(10 ≤ X ≤ 25) = c.466. 1.0 0. For 0 ≤ X ≤ 25.Chapter 4: Continuous Random Variables and Probability Distributions 91.5 .421.856.0 2.520. . F(x) = d. .247. 1.633.398.758. 1516. 3.. .e.5 1. Because λ is a scale parameter (as is σ for the normal family).367. 16 .0 1.330. 10 = . 1. 2.5 2. . With n = 16. These are . we need η(p) for p = 16 .5 percentile Supplementary Exercises 92. casting doubt on the assumption of an exponential population distribution. Var(X) = 12 162 = 625 = 52.068.032. .0 3. . . η(p) = -ln(1 – p). P(10 ≤ X ≤ 20) = b.901.521.170. this plot exhibits substantial curvature.083 12 ..5 exp[-η(p)] = p. i.6 25 2 ( B − A) = 12. a.12 − y) ⋅ f ( y )dy 6 0 + ∫ min( y.d. so V(Y) = 43..3. for x > 0. x F ( x) = ∫ f ( y) dy = ∫ −∞ x 0 x 1 32  16 dy = − ⋅ =1− 3 2  2 ( y + 4)  0 ( y + 4) (x + 4 )2 32 ( F(x) = 0 for x ≤ 0.12 − y) ⋅ f ( y)dy = ∫ min( y.5 P(4 ≤ X ≤ 6) = F(6) – F(4) = .f. Thus 24 ∫0  12  24  2 36  0  0  1  y3  F(y) =   y 2 −  18   48   1 b. y<0 0 ≤ y ≤ 12 y > 12 P(Y ≤ 4) = F(4) = . f(x) is a legitimate pdf.241 12 1 12 2  y 1  y3 y 4  c. 12 – Y) so ∫ 12 E[min(Y.) Since F(∞) = ∞ ∫ −∞ f ( y )dy = 1.518 e. For 0 ≤ Y ≤ 25. Clearly f(x) ≥ 0. E(Y) = y 1 − dy = − =6   24 ∫0 24  3 48  0  12  1 12 3  y E(Y2 ) = y 1 − dy = 43.Chapter 4: Continuous Random Variables and Probability Distributions 93.2 .75 24 . the shorter segment has length min(Y. y 1 y u 2  1  u 2 u 3  a. is . The c.P(4 ≤ X ≤ 8) = . P(Y > 6) = 1 – F(6) = .2 – 36 = 7. 12 – Y)] = 0 min( y.2 ∫ 24 0  12  d. b. P(Y < 4 or Y > 8) = 1 .259.12 − y) ⋅ f ( y)dy = ∫ y ⋅ f ( y )dy + ∫ (12 − y) ⋅ f ( y) dy = 12 6 6 0 12 6 94. See above c. P(2 ≤ X ≤ 5) = F(5) – F(2) = 1 − 16  16  − 1 −  = . 
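The uniform-distribution facts invoked in the supplementary exercise here — E(X) = (A+B)/2, V(X) = (B−A)²/12, and interval probabilities as length/(B−A) — for the A = 0, B = 25 case:

```python
def unif_mean(a, b):
    """E(X) = (a + b)/2 for X uniform on [a, b]."""
    return (a + b) / 2

def unif_var(a, b):
    """V(X) = (b - a)^2 / 12."""
    return (b - a) ** 2 / 12

def unif_prob(lo, hi, a, b):
    """P(lo <= X <= hi), clipping the interval to the support [a, b]."""
    lo, hi = max(lo, a), min(hi, b)
    return max(hi - lo, 0) / (b - a)

m = unif_mean(0, 25)           # 12.5
v = unif_var(0, 25)            # 625/12 ~ 52.083
p = unif_prob(10, 20, 0, 25)   # .4
```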
F(y) =  =  u − −  .259 = .5 .247 81  36  (continued) 163 90 = . Φ(-. a.9082 ) =1 . −∞ 32 (x + 4 ) 32 ∞ E ( x) = ∫ x ⋅ f ( x)dx = ∫ x ⋅ 2 dx − 4 ∫ ∞ 0 E(salvage value) = = ∫ ∞ 0 (x + 4 ) 32 ∞ 3 dx = ∫ ( x + 4 − 4 ) ⋅ 0 32 (x + 4 )3 dx dx = 8 − 4 = 4 (x + 4 )3 ∞ 100 32 1 3200 ⋅ dx = 3200 ∫ dx = = 16 .0918 ) (. P(X > 42) = 1 – P(X ≤ 42) = 1 − Φ  42 − 40  = 1 .5   1.Chapter 4: Continuous Random Variables and Probability Distributions d.67 3 4 0 x + 4 ( y + 4) (3)(64 ) ( y + 4) 95..56 c.  4  0 P(D ≥ 1 ) = 1 – P(D = 0) = 1 −  (.5 V a.Φ(1. By differentiation.5) = 1 − c.3197 0 164 4 .5) = = .0918   1 .5  Let D represent the number of diodes out of 4 with voltage exceeding 42.  42 − 40   39 − 40   − Φ   1.213 1 µ = 40 V.2514 = .33) . E(X) = 7 131 3 7 3  2 x ⋅ x dx + ∫0 ∫1 x ⋅  4 − 4 x  dx = 108 = 1.6803 = .5  P(39 < X < 42) = Φ  = Φ(1.33) = .5) = 41. σ = 1.. b.67) = . P(.  x2 7 3 f(x) =  − x 4 4  0 0 ≤ x <1 7 1≤ y ≤ 3 otherwise 1 7 11  7 3  (.04)(1. We desire the 85th percentile: 40 + (1.917  − 2  − ⋅ 2  − 2 3 3 12  4 4  3 96. ∞ −∞ =∫ ∞ 0 e.5 ≤ X ≤ 2) = F(2) – F(.9082 .6568 b. σ = 3.5.0005 = .3859  14   80 − 96   50 − 96   − Φ   14   14  = Φ(-1. 10. so P(Y ≥ 8) = b(8.2. .  24. and nq = 237. X ~ Binomial.1266.5   = 1 − Φ(3. .9162  1 .03. 10.33 σ 98.38) = 1 . .  100 − 96   = 1 − Φ(.9162.5 − 12.2 oz. The interval (72.6   a. Then p = P(S) = .97. 119. The random variable X = the number of defectives in the batch of 250. c.5 ≥ 10.Φ(-1... .Chapter 4: Continuous Random Variables and Probability Distributions 97.2810 − . With Y = the number among ten that contain more than 135 oz.9997 = . n = 250 ⇒ µ = np = 12.645)(14) = 119.87) = . 165 .1922 = .P(Xnorm ≤ 9.446.0888 a.5) = Φ(− . P(X > 100) = 1 − Φ b.645)(14) = 72.9162) + b(10. 10.Φ(-3.0003  3. a. Since np = 12.48) = 1 − .97. c.5) . P(Xbin = 10) ≈ P(Xnorm ≤ 10.29 ) = 1 − .5 ≥ 10. a = 5th percentile = 96 + (-1. Let S = defective. 
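With the cdf derived above, F(x) = 1 − 16/(x + 4)² for x ≥ 0 (from the density f(x) = 32/(x + 4)³), interval probabilities are one-liners:

```python
def cdf(x):
    """F(x) = 1 - 16/(x + 4)^2 for x > 0, and 0 otherwise."""
    return 1.0 - 16.0 / (x + 4.0) ** 2 if x > 0 else 0.0

p_2_to_5 = cdf(5) - cdf(2)   # 16/36 - 16/81 ~ .247, as in the solution
```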
we can use the normal approximation. b = 95th percentile = 96 + (1.2  = 1 . µ = 137. P(50 < X < 80) = Φ  99.03) contains the central 90% of all grain sizes.446  P(Xbin ≥ 25) ≈ 1 − Φ b.1271 .58) − Φ(− .9549.9162) =..6 oz  135 − 137 .65 ⇒ σ = 1. Y ~ Bin(10.5) .2 = −1.05.6141 = .9162) + b(9.29) = . σ = 1. 135 − 137 .0838 = . µ = 137. P(X > 135) = 1 − Φ b. 267 1. 5 3 3 1 3 1 so E[h(X)] = = ∫ (x − 1 .5  1  1. P(1.9. 2 x 2 1 1 ≤ x ≤ 1.5.5 ≤ x ≤ 2. 5 2 x 2 x 101. 1 1 x 3  11 4 − y 2 dy =  4 x −  + −1 9 9 3  27 F ( x) = ∫ c. ∫ ∫ 3 1 3 31 3 ⋅ 2 dx = ∫ dx = 1. Since F(0) = 11 .5(1 .5 ln( x )]1 = 1. F(x) = 0 for x < -1 or == 1 for x > 2.Chapter 4: Continuous Random Variables and Probability Distributions 100.5.5 ≤ x ≤ 3 2.1 0.5) – F(1. Y is a binomial r. the . For –1 ≤ x ≤ 2.5 2 .5) = .5) = F(2. F ( x) = 1 = ∫ 0dy + ∫ −∞ 3 1  ⋅ dy = 1 .5) = 1.0 -2 -1 0 1 2 3 x b.4) = .5  0  e.5) ⋅ ⋅ 2 dx + ∫ 1 ⋅ ⋅ 2 dx = .284. a. F(X) = 0 for x < 1 and = 1 for x > 3. a. so V(X) = E(X2 ) – [E(X)]2 = . 3 x⋅ 1 E(X2 ) = = ∫ 3 1 x −∞ f ( y) dy 1  x b. E(X) = = d.648 2 x 2 1 x x2 ⋅ σ =. x ( ) The median is 0 iff F(0) = . with n = 10 and p = P(X > 1) = 1 – F(1) = 166 5 27 11 < 27 . Because median must be greater than 0. h(x) =  x − 1 . 0. d.4 f (x) 0..v.511 − 2 y2  x 1 P(X ≤ 2.553 3 1 3 3 ⋅ 2 dx = ∫ dx = 3 .5 2 .3 0.5 ≤ x ≤ 2.5) = F(2.4 c. 27 this is not the case. For 1 ≤ x ≤ 3.2 0. whence c = ln(. so exp  = 0 .0) = 3-. which implies that x = α.0) = 1 – F(3. Thus the mode is α. b..250367) + 150 = 352.9 = F(c) = 1 – e-(. where 90    exp(u) = eu . Taking the natural log of each side twice in succession 90    − (c − 150) yields ln[ ln(.53. 1 1 = 1. 90 .93) 103.828 . 1) = 2.0) = . P(3.0 ≤ X ≤ 3. σ = = 1.93)c. so c = 90(2. c λ ∫ c(1 − . E(X) = b.95.Chapter 4: Continuous Random Variables and Probability Distributions 102.99.5e )⋅ λ e ∞ 0 ax −λx dx = 167 c[.9 = exp − exp   − ( x − α )   − (x − α )  1 ⋅ exp − exp   ⋅ exp   β β β      c.6667 )] = . 
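The normal approximation used here for X ~ Bin(250, .05) — so μ = np = 12.5 and σ = √(np(1−p)) ≈ 3.446 — applies a continuity correction of ±.5. A sketch with Python's stdlib normal cdf:

```python
import math
from statistics import NormalDist

def binom_normal_approx(n, p):
    """Approximating normal for a Bin(n, p) count (reasonable when np >= 10)."""
    return NormalDist(mu=n * p, sigma=math.sqrt(n * p * (1 - p)))

nd = binom_normal_approx(250, 0.05)
p_ge_25 = 1 - nd.cdf(24.5)             # P(X >= 25), continuity-corrected
p_eq_10 = nd.cdf(10.5) - nd.cdf(9.5)   # P(X = 10) ~ .0888, as in the solution
```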
The 90th percentile is requested.828 .075 λ λ a.368 = . so c satisfies   − (c − 150)   . E[c(1 . we have .333 c.5e ax)] = 104.0) = . The distribution is positively skewed. this is the same as the value of x for which ln[f(x)] is a maximum.5772β + α = 201. denoting it by c. The desired value c is the 90th percentile. a. E(cX) = cE(X) = b.5λ − a] λ−a .0) – F(1.460. P(X ≤ 150) = exp − exp and P(150 ≤ X ≤ 300) = . f(x) = F′(X) = d. P(X ≤ 300) = exp[ − exp( − 1.476 ( −. E(X) = .9)] = . whereas the mode is 150 and the median is –(90)ln[-ln(.0614 P(1. β β   e. a.5)] + 150 = 182.. The equation of d[ln ( f ( x) )] = 0 gives dx  − (x − α )  − (x − α )  = 1 .93(3.0 < X) = 1 – P(X ≤ 3.368 . We wish the value of x for which f(x) is a maximum.0) = F(3.075.   − (150 − 150)   = exp[ − exp(0)] = exp( −1) = . 1e .α . 2 P(-1 ≤ X ≤ 2) – F(2) – F(-1) = .256.Chapter 4: Continuous Random Variables and Probability Distributions 105. 1 .10 0.1e −. α . ∫ 106.2 x dx = .1e . c.09 0. 2  e.08 0. σ) or by differentiation.00 -2 -1 0 1 2 x b.5e -. . c. β )] = − ln β α − ln (Γ (α )) + (α − 1) ln( x) − d α −1 1 ln[ f (x.5 = 1 0 −∞ 0 0. F(x) = ∫ For x ≥ 0.670 168 x β.5 . a.2 x e . 1 .(-2 ≤ X ≤ 2) = .02 0. µ.665.2 y dy = 1 − e −. F(x) = x 1 1 + ∫ . ( ) ln[ f ( x. No. 2 0 2 x −∞ P(X < 0) = F(0) = .5 + .03 0. β )] = − ⇒ x = x* = (α − 1) β dx x β ν  − 1(2) = ν − 2. F(x. b. the density function has constant height for A ≤ X ≤ B.4 = . From a graph of f(x.06 0. P(X < 2) = F(2) = 1 . x* = µ.2 x dx + ∫ .1e −.λ) is largest for x = 0 (the derivative at 0 does not exist since f is not continuous there) so x* = 0.05 0.04 0.2 y dy = 1 = .2 x . From d x* =  a.. ∞ −∞ ∞ f ( x)dx = ∫ .01 0. 2 For x < 0. d.07 fx 0. λ1 . so that CV > 1. and = ∫ [p λ e ∞ − λ1 x 1 0 ] ∞ ∫ −∞ f ( x. For x > 0. p) ≥ 0 for all x.Chapter 4: Continuous Random Variables and Probability Distributions 107. . λ 2 . p)dx ∞ ∞ + (1 − p)λ 2 e − λ2 x dx = p∫ λ1 e − λ1x dx + (1 − p )∫ λ 2 e −λ2 x dx 0 0 = p + (1 – p) = 1 b. 
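The extreme-value cdf F(x; α, β) = exp(−e^(−(x−α)/β)) inverts in closed form, which is how the 90th percentile c = α − β ln(−ln .9) ≈ 352.53 arises for α = 150, β = 90:

```python
import math

def gumbel_cdf(x, alpha, beta):
    """Extreme-value cdf F(x; alpha, beta) = exp(-exp(-(x - alpha)/beta))."""
    return math.exp(-math.exp(-(x - alpha) / beta))

def gumbel_percentile(p, alpha, beta):
    """Invert the cdf: x_p = alpha - beta * ln(-ln p)."""
    return alpha - beta * math.log(-math.log(p))

c90 = gumbel_percentile(0.9, 150, 90)   # ~ 352.53
med = gumbel_percentile(0.5, 150, 90)   # ~ 182.99
```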
λ1 . λ2 . λ1 . 2 ( )  2 p λ 22 + (1 − p )λ12  =  − 1 2  ( p λ 2 + (1 − p )λ1 )  ( pλ 2 2 + (1 − p)λ12 1 2 ) . Clearly f(x. a. so Var(X) = + + − +  λ21 λ22 λ21 λ22 λ2   λ1 2 2 e. c. p) = E(X) = ∞ ∫ 0 ∫ x 0 f ( y. λ2 so σ = 169 n λ and CV = 1 n < 1 if n > 1. λ1 . For an exponential r. E(X ) = . [ ] x ⋅ pλ1 e −λ1 x ) + (1 − p)λ 2 e −λ2 x ) dx ∞ ∞ 0 0 = p∫ xλ1 e −λ1 x dx + (1 − p) ∫ xλ 2 e −λ2 x dx = p (1 − p) + λ1 λ2 2 p 2(1 − p ) 2 p 2(1 − p )  p (1 − p)  d. F(x. CV = 1 λ 1 λ    2 p + 2 (1− p )   λ2  λ22  1  − 1 2 CV =     p ( 1 − p )  +     λ1  λ2  = [2r – 1]1/2 where r = 1 = 1. µ= n . λ σ2 = n . But straightforward algebra shows that r > ( pλ 2 + (1 − p) λ1 )2 1 provided λ1 ≠ λ 2 . λ2 . λ2 .v. f. p)dy = p(1 − e −λ1 x ) + (1 − p)(1 − e − λ2 x ). For X hyperexponential.. Var o  = e 2 +.72. 1= ∫ ∞ 5 b. the cdf of an exponential r.v.v. k 51−α dx = k ⋅ ⇒ k = (α − 1)51−α where we must have α > 1. α x α −1 For x ≥ 5.05  I E  o  Ii  1+.0025 − 1 = . with parameter α . ( ) ( ) α −1 109. I   i  I    I   I  P(I o > 2I i ) = P o > 2 = P  ln  o  > ln 2  = 1 − P ln  o  ≤ ln 2      Ii    Ii     Ii    ln 2 − 1  1 − Φ  = 1 − Φ(− 6. F(x) = E(X) = ∞ ∫ 5 d. I  A lognormal distribution. a. b.14 ) = 1  .0025 ⋅ e .1. c.0185   Ii  ( 170 ) . dx = x ⋅ α −1 dx = α −2 α ∫ 5 x x 5 ⋅ (α − 2)  X  X   5  P ln   ≤ y  = P ≤ e y  = P X ≤ 5e y = F 5e y = 1 −  y   5   5e    5  − (α −1 ) y 1− e . since ln  o  is a normal r.0025/ 2 I   = e = 2. provided α > 2. c. a. x⋅ ∫ x 5 k 1   1  5 dy = 51 −α  1−α − α −1  = 1 −   α y x  5 x α −1 .Chapter 4: Continuous Random Variables and Probability Distributions 108. ∞ k k k . Chapter 4: Continuous Random Variables and Probability Distributions 110.178 MPa. b. α  ⋅ = α  60 β  60 β (60β ) Γ(α ) = f  with parameters α and 60β.5950 d. α . − (175 )9 P(X > 175) = 1 – F(175.0 0 50 100 150 200 250 C1 b. a. 
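The claim in this passage that the hyperexponential (mixed exponential) distribution has CV > 1 whenever λ1 ≠ λ2 follows from E(X) = p/λ1 + (1−p)/λ2 and E(X²) = 2p/λ1² + 2(1−p)/λ2², and is easy to spot-check (the parameter values below are illustrative):

```python
import math

def hyperexp_cv(p, lam1, lam2):
    """Coefficient of variation sigma/mu for f = p*lam1*e^(-lam1 x) + (1-p)*lam2*e^(-lam2 x)."""
    m1 = p / lam1 + (1 - p) / lam2
    m2 = 2 * p / lam1 ** 2 + 2 * (1 - p) / lam2 ** 2
    return math.sqrt(m2 - m1 ** 2) / m1

cv_mixed = hyperexp_cv(0.3, 1.0, 5.0)   # exceeds 1 since lam1 != lam2
cv_plain = hyperexp_cv(0.3, 2.0, 2.0)   # collapses to one exponential: CV = 1
```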
A small bit of algebra leads us to x = 140.3636 c..5398 . 180) = 1 − e 180 . 180) = e 180 = . 9.  FY (y) = P(Y ≤ y) = P(60X ≤ y) = P X ≤  y  y  . P(at least one) = 1 – P(none) = 1 – (1 .1762 = . 171 .F(150. 112. which shows that Y has a gamma distribution . 9. Thus 10% of all tensile strengths will be less than 140. 111.10 = F( x.3636)2 = .. Thus fY (y)  = F  60   60 β  −y α −1 60 β  y  1 y e .178. 9.5 0. 180) = . a. C2 1. We want the 10th percentile: . −   F(y) = P(Y ≤ y) = P(σZ + µ ≤ y) = P Z ≤ (y − µ) = σ   ∫ ( ) ( y− µ ) σ −∞ x 9 1 2π e − 12 z 2 dz . Now differentiate with respect to y to obtain a normal pdf with parameters µ and σ. the same argument shows that cX has a gamma distribution with parameters α and cβ. With c replacing 60 in a.0 0.4602 P(150 ≤ X ≤ 175) = F(175. 9. 180) . 00002145 d. we could make similar calculations as in part c for each possible day.5   − Φ  = Φ(− 1. and P(x = -24) ˜ . Thus since f(x) = 1.542 115. = ν e = ν e = ν e 2 α 2σ 2 β 2σ ( ) which is in the Weibull family of distributions.Chapter 4: Continuous Random Variables and Probability Distributions 113. To calculate the probability of the three births happening on any day. the baby due on April 1 was 21 days early.0097 Again.03) − Φ(− 1. ( 3651)3 (365) = .4286 − .4090 = . cdf: F ν .0000073 c. If we let α = 2 and β = y 1 and k′(y) = . b. so y has an exponential distribution with parameter λ = 1.88  Similarly. 2 = 1 − e 800 = 1 − .0196)(. P(all 3 births occur on March 11) = b.88   19. and then add the probabilities.5  = Φ  − Φ  = Φ(− .08) = .0114)(. from which the result follows easily.  19. Then the baby due on March 15 was 4 days early. 88    19. then we can manipulate f(v) as follows: 2 2 2 α α −1 − (ν β )2 −ν 2 / 2σ 2 2 −1 −(ν / 2σ ) . P(x = -4) ˜ P(-4.0114 . 172 . Assuming independence.5)  − 3 . from which the σ result follows easily. (3651)3 = . y = σZ + µ ⇒ y = h(z) = σZ + µ ⇒ z = k(y) = (y − µ) σ and k′(y) = 1 . a. 
f (ν ) = ν σ 2 e −ν 2 / 2σ 2 2σ .88).00000002 Let X = deviation from due date. 19.0097) = .458 = . assuming independence. P( all 3 births occurred on March 11) = (.5 < x < -3. c. so − 625 F 25. X∼N(0.5   − 4.0196 . Y = -ln(X) ⇒ x = e-y = k(y).5   − 21.1515 − . y = h(x) = cx ⇒ x = k(y) = a. so k′(y) = -e-y .18 ) − Φ(− . 19 . F (ν ) = ∫ 25 0 ( ( ) −ν −  ν 800 e dν . b.2. 2σ = 1 − e  400 ) ν 2σ   − v2 = 1 − e 800 . c c 114.2.88  ˜ Φ The baby due on April 4 was 24 days early.1401 = . a.237 ) = . and P(x = -21)  − 20. g(y) = 1 ⋅ | -e-y | = e-y for 0 < y < ∞. giving E(g(X)) ≈ g(µ).µ) ≤ g(X) implies that E[g(µ) + g′(µ)(X . 173 . …and computing xi = − 1 ln (1 − ui ) .µ)] = E(g(µ)) + g′(µ)⋅E(X . while for α < 1 it is a decreasing function.e. v −v v v g ( I ) = . for α > 1 this is increasing.µ) = (g′(µ))2 ⋅V(X). ( )  1  ln (1 − U ) ≤ x  = P(ln(1 − U ) ≥ −λ x) = P 1 − U ≥ e −λx  λ  −λx − λx = P U ≤1 − e = 1 − e since FU (u) = u (U is uniform on [0. 118. By taking successive random numbers u 1 . so E (g ( I ) ) = µ R ≈ = I I µ I 20 2  −v v v V ( g( I ) ) ≈  2  ⋅ V (I ). g ′(I ) = 2 . Thus X has an FX(x) = P − ( ) exponential distribution with parameter λ.µ = 0 and E(g(µ)) = g(µ) ( since g(µ) is constant). that g(E(X)) ≤ E(g(X)). E(g(X)) ≈ E[g(µ) + g′(µ)(X .µ)] = (g′(µ))2 ⋅V(X . b. b. ln(1 – F(x)) = − ∫ x2   − α  x−  2β   x  x2   .  α βα  r(x) =   α −1  x . u 2 .µ)] = V[g′(µ)(X . a. 10 … we obtain a sequence of values generated from an exponential distribution with parameter λ = 10. a constant (independent of X). so r(x) = λ e −λx = λ .µ). e −λx this is consistent with the memoryless property of the exponential distribution. V(g(X)) ≈ V[g(µ) + g′(µ)(X . F(x) = λe − λx and F(x) = 1 − e − λx . b. σ g ( I ) ≈ 2 ⋅ σ I = 20 800  µI  119.Chapter 4: Continuous Random Variables and Probability Distributions 116. a.   c.µ)] = E(g(µ)) = g(µ) ≤ E(g(X)). u 3 . g(µ) + g′(µ)(X . 
α 1 − dx = −α  x − ⇒ F ( x ) = 1 − e  β 2 β      x2   x  −α  x− 2 β  f(x) = α 1 − e   β 0≤x≤ β 117. i. a. 1]). but E(X) . and then differentiate with respect to y to .  2X 2 For y > 0. replace x by  β y   β 2y  . F ( y ) = P(Y ≤ y) = P  2  β take the cdf of X (Weibull). 174 . Now ≤ y  = P  X 2 ≤  = P  X ≤   2  2     β y 2 obtain the desired result fY (y).Chapter 4: Continuous Random Variables and Probability Distributions 120. a. P(0. By summing row probabilities. 1.2) + p(3.2) + p(2. p(1.0) = . Y = 1) = p(1.30 .1 .3) + p(2.34.20 b.0) + p(1.1 1.05 . a.10+.0) + p(3.6 .1) + p(4.025 .8)(.15+.0384 ≠ .y) 0 1 y 2 3 4 0 1 2 . 175 . P(X ≤ 1 and Y ≤ 1) = p(0.1) + p(1.0) + p(0.12 .50 for x = 0. 2.0) = .1) = . P( X1 = X2 ) = p(0.24.3) + p(1. . 2.3)=. the entry in the 1st row and 1st column of the joint probability table. .1) + p(4.015 .5 .03 . At least one hose is in use at both islands.1) + p(4.70 d.30 d. but p x(0) ⋅ p y (0) = (.24) = .0) + p(3.1) + p(2. and by summing column probabilities.02 . P( X + Y = 0) = P(X = 0 and Y = 0) = p(0.07 = .22 d.10. x2 ): x1 ≥ 2 + x2 } ∪ { (x1 .0) + p(1.2) + p(0.015 .38.01 . 2.18 .38 for y = 0.3) = . P(X = 1. . A = { (x1 .7) = P(X ≤ 1) ⋅ P(Y ≤ 1) c.2) + p(3. P(X + Y ≤ 1) = p(0.01 .1) = .2 .2) + p(4.2 b. so X and Y are not independent.1) = .53 a.0) + p(0.56 = (. p x(x) = .04 .06 .50 e.3) =.16.17 P(at least 4) = P(exactly 4) + p(4.3 .1) + p(1.3) + p(3.2) + p(0. P( exactly 4) = p(1. x p(x.10 .0) = .42 c.1) + p(2. p y (y) = . P(X ≤ 1 and Y ≤ 1) = p(0.2) = . x2 ): x2 ≥ 2 + x1 } P(A) = p(2.15.05 .0) = .16)(.2) + p(3.10. .3) + p(2.1) = . b.1) + p(1.CHAPTER 5 Section 5.40 c.0) + p(4.025 .1) + p(1.46 3.1) + p(4. P(X ≠ 0 and Y ≠ 0) = p(1.0) + p(1.08+. 1.05 .0) + p(0. P(X ≤ 1) = p x(0) + p x(1) = . 19 . 3.3 or 2.6) (.1) + p(1.0) + p(1. P(X = 4.0) = 0.6)3 + (.0) + p(1.25)(.19 P1 (1) = P(X1 = 1) = p(1.2) + p(1.2)(. 
3 .0) = .14 .30 .0518  2   176 .6) + (.Chapter 5: Joint Probability Distributions and Random Samples 4. x2 ) ≠ p 1 (x1 ) ⋅ p 2 (x2 ) for every (x1 . 6.  4  2 2  (.3) + p(4. 2.3) = . each with 1 package) = P( each has 1 package | 3 customers) ⋅ P(3 customers) = (.3). p(4. 3.3)(.25) = .2) = P( Y = 2 | X = 4) ⋅ P(X = 4) = b. there are 4 different ways to have a total of 11 packages: 3.15) = . 3.4) = .6)3 ⋅ (.6)2 + (.0) + p(3.3) = .4014 5.1)3 (.28 . a. 2 or 3.3)(.15)(.1+(. Each way has probability (. 3 or 3. etc x2 0 1 2 3 p 2 (x2 ) .25 .12 P2 (0) = P(X2 = 0) = p(0.0) + p(2. P1 (0) = P(X1 = 0) = p(0.19 > 0 . etc.30 .12 > 0 and p 2 (0) = . P(X = Y) = p(0. and the two variables are not independent.0) + p(1.1) + p(2.0) + p(0. p(4.054 b. yet p 1 (4) = .2) + p(0.1)3 (. so p(x1 .23 c.30. 2.19 .4)  ⋅ (. Y = 3) = P(3 customers. x1 0 1 2 3 4 p 1 (x1 ) . P(X = 3. a. 3.1) + p(0.19. 11) = 4(.15) = . Y = 11) = P(total of 11 packages | 4 customers) ⋅ P(4 customers) Given that there are 4 customers.0) + p(4. b. so p(4.00018 a.6)4 = .2) + p(3. 3. 3. x2 ). 2.05.10 . 4) 3 (.100.0194  4 (.15) = .20. p x(1) = . 2.1) = .620 = .y) = 0 unless y = 0.1) or (2.3590 1 1  p y (3) = p(3.15) = .25.1058  3  3 2 2 p y (2) = p(2.y) = p x(x) ⋅ p y (y).3) = p y (0) = 1 – [. p(x.25) +  (.2) + p(3.4)(. 177 . p y (2) = .0) + p(1.Y)=(0.1) + p(1. …. p x(5) = . 3.6)(.1) + p(1.1) + p(3. 2) +   (.15) = .0194] = .4) = (. x = 0. y = 4) = p(4.1) or (1.2) = (. 3)  1 3  4  (.1) + p(2.1) + p(4. 6) 2 (. P(Y = 1) = p(0. p x(2) = .6)(. P(overflow) = P(X + 3Y > 5) = 1 – P(X + 3Y ≤ 5) = 1 – P[(X.10.6) (.Chapter 5: Joint Probability Distributions and Random Samples c.3. P(X = 1) = p(1.4) x − y ⋅ p x ( x )  y p y (4) = p(y = 4) = p(x = 4. so X and Y are independent.0) or …or (5.1058+. p(1. 6) y (. p x(4) = .15) = . 25) +  (.2678+. P(X ≤ 1 and Y ≤ 1 = p(0.030 b.1) + … + p(5. 1. 4.4)(.2) + p(4. 4) 2 (.3) + p(4. 1.1) = (.300 d. 4) 2 (.6)(.6)4 ⋅(. 
It is now easily verified that for every (x.6) (.6) 3 (.6) 3 (.1)] = 1 .3) +  (. p(x.0) or (0.1) = .2480 7.120 c. p(x. x.3590+. For any such pair. 4)(.2) = . a.30. p x(3) = .y).y) = P(Y = y | X = x) ⋅ P(X = x) =  x  (.1) = . those for Y (column sums) are p y (0) = . p y (1) = ..6)(.0) + p(0. The marginal probabilities for X (row sums from the joint probability table) are p x(0) = .5.0) + p(1.380 e. 25)  2  4 +  (.2678  2  2 p y (1) = p(1. 775 . y )dydx − ∫ 30 x − 2 ∫ 22 20 = (after much algebra) . y ) dxdy − ∫∫ f ( x. a.y) =   30     6   0 otherwise 9.240 denominator =   = 593.Chapter 5: Joint Probability Distributions and Random Samples 8. a.000  3  ∫∫ K ( x 2 + y 2 ) dxdy = 12 K ∫ x 2 dx 26 26 b. 1= ∞ ∫ ∫ ∞ f ( x.3593 178 f ( x. 30 I y = x+ 2 y = x− 2 III II 20 20 P( | X – Y | ≤ 2 ) = 30 ∫∫ f (x .2) = = .304K = .0509 593. y ) dxdy I 28 30 1− ∫ ∫ 20 x +2 II f ( x.240  3  2  1   30  30. p(x. y ) dydx . y ) dxdy = ∫ ∫ = K ∫ ∫ x dydx + K ∫ ∫ −∞ −∞ 30 30 30 30 K ( x 2 + y 2 ) dxdy 20 20 30 30 2 20 20 20 20 y 2 dxdy = 10 K ∫ x 2 dx + 10 K ∫ y 2 dy 30 30 20 20 3  19. p(3.775 6 numerator = x. P(X < 26 and Y < 26) = 20 20 26 20 26 4 Kx 3 20 = 38. y )dxdy region III 1 − ∫∫ f ( x.3024 c.  8 10 12      = (56)(45 )(12) = 30. y _ are _ non − negative int egers _ such _ that 0 ≤ x+ y ≤6   8 10  12         x  y  6 − ( x + y )  b.000  = 20 K ⋅  ⇒K = 380. e m ∑  k λ µ k m −k = e 179 .25 ≤ Y ≤ 5.25 ≤ X ≤ 5. 6 I y = x −1/ 6 y = x +1/ 6 II 5 P((X.1) + p(1. a.75) ⋅ P(5.306 36 36 11. p(x. y) dy = ∫ K ( x + y ) dy = 10 Kx + K 20 3 ∞ 30 −∞ 2 2 30 2 20 20 ≤ x ≤ 30 = 10Kx2 + .25 ≤ X ≤ 5.Chapter 5: Joint Probability Distributions and Random Samples d. fy (y) = 1 for 5 ≤ x ≤ 6. P( X+Y=m ) = e −λ −µ [1 + λ + µ ] m m ∑ P( X = k . …. 5 ≤ y ≤ 6 b. P(5. 5. so X and Y are not independent.y) = b. fx(x) = ∫ y3 f ( x. y = 0. 1.y) ≠ fx(x) ⋅ fy (y). clearly f(x. 2. 
fy (y) is obtained by substituting y for x in (d). 1  0 5 ≤ x ≤ 6 .0) = c.25 c.25 ≤ Y ≤ 5. Y = m − k ) = ∑ e − λ − µ k =0 − (λ + µ ) m − (λ + µ ) k=0 m λk µ m =k k ! ( m − k )! (λ + µ ) . … x! y! a.5) = . f(x. so the total # of errors X+Y also has a m! k = 0 m! Poisson distribution with parameter λ + µ .Y) ∈ A) = 5 6 ∫∫1dxdy A = area of A = 1 – (area of I + area of II ) = 1− 25 11 = = . 1. e.y) = 10.05.75) = P(5.75.75) = (by independence) (. 2.5 ≤ y ≤ 6 otherwise since fx(x) = 1.5)(.0) + p(0. p(0. e−λλ x e−µ µ y ⋅ for x = 0. 25 − .25e −12 = . P(X1 < t.300 13. and other 5 don’t) = 9  9 5   1 − e − λt (e −λt ) 4 e − µt +   1 − e −λt  5  4 ( ) ( ) 180 ( ) (1 − e )(e 4 −µ t − λt 5 ) . P(X ≤ 1 and Y ≤ 1) = P(X ≤ 1) ⋅ P(Y ≤ 1) = (1 – e-1 ) (1 – e-1 ) = . … .264 . If “success” = {fail before t}. so P( 1 ≤ X + Y ≤ 2 ) = P(X + Y ≤ 2) – P(X + Y ≤ 1) = . 10  k  1 − e −λt ( e −λt ) 10− k k P(exactly 5 fail) = P( 5 of λ’s fail and other 5 don’t) + P(4 of λ’s fail.264 = . P(X + Y ≤ 2) = 2 2− x ∫∫ =∫ 0 0 2 0 d.050 xe− x (1+ y ) dy = e − x for 0 ≤ x. a. P(X + Y ≤ 1) = [ ] e − x − y dydx = ∫ e − x 1 − e −( 2− x ) dx 2 0 (e − x − e − 2 ) dx = 1 − e − 2 − 2e − 2 = . so the two r.594 ∫ e [1 − e 1 otherwise −x 0 −(1 − x) ]dx = 1 − 2e −1 = . P( at least one exceeds 3) = 1 – P(X ≤ 3 and Y ≤ 3) ∫∫ =1 − ∫ e =1 − 3 3 xe− x (1+ y ) dydx = 1 − ∫ 0 0 3 −x 0 ∫ 3 3 0 0 xe− x e − xy dy (1 − e −3 x ) dx = e −3 + . ∞ ∞ ∫∫ xe − x (1+ y ) dydx = a.400 c. (1 − e − λt )10 1 − e −λt . f(x. y ≥ 0 a.Chapter 5: Joint Probability Distributions and Random Samples 12. P(X> 3) = b. µ fails.v’s are not independent. then p = P(success) = and P(k successes among 10 trials) = c.y) = fx(x) ⋅ fy (y) = b. that of Y is 1 for 0 ≤ y. X2 < t. c.330 14. X10 < t) = P(X1 < t) … P( X10 < t) = b. The marginal pdf of X is ∫ ∞ 3 3 0 ∫ ∞ 0 ∫ ∞ 3 e − x dx = . It is now clear that f(x.. e − x− y   0 x ≥ 0.y) is not the product of (1 + y ) 2 xe− x (1+ y ) dx = the marginal pdf’s.594 . 5 . 
P(X1 + X3 ≤ . 5 − x1 ∫∫ 0 0 72 x1 (1 − x3 )(1 − x1 − x 3 ) 2 dx 2 dx1 = (after much algebra) . P(( X . x 3 )dx 2 = ∫ 0 kx1 x 2 (1 − x 3 )dx 2 72 x1 (1 − x3 )(1 − x1 − x 3 ) 0 ≤ x1 . x 3 ) dx 3 = −∞ ∫ 72 x (1 − x )(1 − x 1 18x1 − 48 x12 + 36 x13 − 6 x15 17. A 1 ∫∫A dxdy = πR 2 = 4 = . Y ) within a circle of radius R 2 3 1 2 0 ≤ x1 ≤ 1 ) = P( A) = ∫∫ f ( x.− ≤ Y ≤  = 2 = 2 2 2  πR π  2 181 . x 2 . a.25 b.P[(X1 ≤ y) ∩ (X2 ≤ y) ∩ (X3 ≤ y)] = b. f(x1 . ∞ f x1 ( x1 ) = ∫ f ( x1 .of .Chapter 5: Joint Probability Distributions and Random Samples 15. F(y) = P( Y ≤ y ) = P [(X1 ≤y) ∪ ((X2 ≤ y) ∩ (X3 ≤ y))] = P (X1 ≤ y) + P[(X2 ≤ y) ∩ (X3 ≤ y)] .53125 c. a. y)dxdy A 1 = πR 2 − x3 ) dx3 area. x1 + x3 ≤ 1 2 b. a. R R R  R2 1  R P − ≤ X ≤ . x3 ) = ∫ ∞ 1 − x1 − x3 −∞ f ( x1 .5) = . 0 ≤ x3 . (1 − e − λy ) + (1 − e − λy ) 2 − (1 − e − λy ) 3 for y ≥ 0 ( E(Y) = ∫ ∞ 0 ) ( λe − λy + 2(1 − e − λy ) λe − λy − 3(1 − e − λy ) 2 λe − λy −2 λy = 4λ e − 3λe −3λy for y ≥ 0 f(y) = F′(y) = ( y ⋅ 4λ e − 2 λy − 3λ e −3 λy ) )dy = 2 1  1 2 = −  2λ  3λ 3λ 16. X and Y are not independent since e.34 .1765 .60 c.9R) > 0. f x (x) = ∫ ∞ −∞ f ( x.9R.50: y Py|X(y|2) 0 1 2 . yet f(. PX|Y(x|2) results from dividing each entry in the y = 2 column by p y (2) = .28 = . fx(. P( Y ≤ 1 | x = 2) = Py|X(0|2) + Py|X(1|2) = .2353 .40 d.9R) is outside the circle of radius R.38: x Px|y(x|2) 0 1 2 .1579 .9R) = fY (. Py|X(x|2) is requested.12 .34 .0526 .5882 . 18.9R.  R R R R  2R 2 2 P − ≤X≤ .20 Py| x (1 | 1) = = .12 + .34: .− ≤Y ≤ = = 2 2 2 2  πR 2 π  d. .9R) = 0 since (.g. y )dy = ∫ − 1 2 R2 − x 2 dy = for –R ≤ x ≤ R and R 2 − x 2 πR 2 πR 2 R2 − x 2 similarly for fY (y). .08 = .28 .34 Py| x ( 0 | 1) = b. Py|X(y|1) results from dividing each entry in x = 1 row of the joint probability table by p x(1) = .06 Py| x ( 2 | 1) = = .7895 182 .Chapter 5: Joint Probability Distributions and Random Samples c. a. to obtain this divide each entry in the y = 2 row by p x(2) = . 05) dy = . 
x 3 ) dx 2 dx3 For every x and y. x 2 . x 2 . x 2 ( x1 . X2 ) = b. a. x2 (x 3 | x1 .Chapter 5: Joint Probability Distributions and Random Samples 19. a. fY|X(y|x) = fy (y). x3 ) where f x1 ( x1 ) f ( x1 .243976 20.372912 ∫ E( Y2 | X=22 ) = 30 20 y2 ⋅ k (( 22) 2 + y 2 ) dy = 652. x3 | x1 (x 2 . f ( x. P( Y ≥ 25 | X = 22 ) = ∫ 30 25 20 ≤ y ≤ 30 20 ≤ x ≤ 30  3  k =  380. x 2 . f x3 |x1 .05 b.05 = 25. ∫ f Y ( y ) dy = ∫ (10ky 2 + .05 V(Y| X = 22 ) = E( Y2 | X=22 ) – [E( Y | X=22 )]2 = 8. x3 ) where f x1 . x 2 ) = ∫ of (X1 . x2 .05 k (x 2 + y 2 ) f X |Y ( x | y) = 10 ky2 + . x3 )dx 3 f x2 . since then f(x. x2 ( x1 .05 dy = .75 30 30 25 25 ∫ E( Y | X=22 ) = ∞ −∞ y ⋅ f Y | X ( y | 22) dy = ∫ 30 20 y⋅ k (( 22) 2 + y 2 ) dy 10k ( 22) 2 + . 183 .028640 10k (22) 2 + .y) = fY|X(y|x) ⋅ fX(x) = fY (y) ⋅ fX(x). as required.000   f Y | X ( y | 22) dy k (( 22) 2 + y 2 ) ∫25 10k (22) 2 + . ∞ −∞ f ( x1 .783 30 = P( Y ≥ 25 ) = c. x 2 ) f ( x1 . x 2 ) = the marginal joint pdf f x1 . y ) k ( x 2 + y 2 ) f Y | X ( y | x) = = f X ( x) 10kx 2 + . x 3 | x1 ) = f x1 ( x1 ) = ∫ ∞ ∫ ∞ −∞ − ∞ 21. f ( x1 . 0) + . E[h(X.. 06) + .02) + (5)(.2 22.07) + … + (4 – 3)(. y ) x y = ( 0)(. E[max (X.y) 1 2 3 4 5 6 1 - 2 3 4 3 2 2 2 - 2 3 4 3 3 3 2 - 2 3 4 4 4 3 2 - 2 3 5 3 4 3 2 - 2 6 2 3 4 3 2 - for each possible (x.06) + .15 (which also equals E(X1 ) – E(X2 ) = 1. E(X1 – X2 ) = 3 ∑ ∑ (x x1 =0 x2 = 0 1 − x 2 ) ⋅ p ( x1 . + 35 ⋅ p(5. + (15)(. y ) = 0 ⋅ p( 0.y). Revenue = 3X + 10Y.60 4 23. y ) ⋅ x 25. y x Since p(x.02) E( X + Y ) = x y + ( 0 + 5)(.Y)] = ∑∑ h( x .08) + (0 – 1)(. + (10 + 15)(. a. so E (revenue) = E (3X + 10Y) 5 2 1 30 = y = ∑∑ ( 3x + 10 y ) ⋅ p ( x. ∑∑ ( x + y ) p( x..70 – 1.Y) = # of individuals who handle the message.55) 24. 01) = 9.y) = 1 30 h(x.10 b.. E(XY) = E(X) ⋅ E(Y) = L ⋅ L = L2 26..2) = 15.80 . x2 ) = (0 – 0)(.Chapter 5: Joint Probability Distributions and Random Samples Section 5.Y)] = ∑∑ max( x + y) ⋅ p( x. Let h(X.4 x= 0 y =0 184 84 30 = 2..01) = 14. 
Section 5.2

22.
a. E(X + Y) = ΣΣ(x + y)p(x, y) = (0 + 0)(.02) + (0 + 5)(.06) + … + (10 + 15)(.01) = 14.10.
b. E[max(X, Y)] = ΣΣ max(x, y)·p(x, y) = 9.60.

23. E(X1 − X2) = ΣΣ(x1 − x2)·p(x1, x2) = (0 − 0)(.08) + (0 − 1)(.07) + … + (4 − 3)(.06) = .15
(which also equals E(X1) − E(X2) = 1.70 − 1.55).

24. Let h(X, Y) = # of individuals who handle the message. The values h(x, y) for each possible (x, y) are

      y: 1  2  3  4  5  6
 x = 1:  -  2  3  4  3  2
 x = 2:  2  -  2  3  4  3
 x = 3:  3  2  -  2  3  4
 x = 4:  4  3  2  -  2  3
 x = 5:  3  4  3  2  -  2
 x = 6:  2  3  4  3  2  -

Since p(x, y) = 1/30 for each possible (x, y), E[h(X, Y)] = ΣΣ h(x, y)·p(x, y) = 84/30 = 2.80.

25. Revenue = 3X + 10Y, so E(revenue) = E(3X + 10Y)
= Σ_{x=0}^{5} Σ_{y=0}^{2} (3x + 10y)·p(x, y) = 0·p(0, 0) + … + 35·p(5, 2) = 15.4.

26. E(XY) = E(X)·E(Y) = L·L = L².
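The message-handling expectation above can be reproduced exactly; the rule below (pass the shorter way around a circle of six, with h = number of steps plus one) is a modeling assumption chosen to match the table of h values in the solution.

```python
# Sketch of the message-handling expectation: six people in a circle,
# a message from x to y (x != y) travels the shorter way around, and
# h(x, y) = steps + 1 counts everyone who handles it.  Each of the 30
# ordered pairs has probability 1/30.
from fractions import Fraction

def h(x, y):
    steps = min((y - x) % 6, (x - y) % 6)
    return steps + 1

pairs = [(x, y) for x in range(1, 7) for y in range(1, 7) if x != y]
expected = sum(Fraction(h(x, y), 30) for x, y in pairs)
```

Using `Fraction` keeps the answer exact: the 30 table entries sum to 84, so E[h] = 84/30 = 2.80.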
27. E[h(X, Y)] = ∫₀¹∫₀¹ |x − y|·6x²y dy dx. Splitting the inner integral at y = x,
∫₀^x (x − y)6x²y dy = x⁵ and ∫_x^1 (y − x)6x²y dy = 2x² − 3x³ + x⁵, so
E|X − Y| = ∫₀¹ (2x² − 3x³ + 2x⁵)dx = 2/3 − 3/4 + 1/3 = .25.

28. With f(x, y) = 24xy for 0 ≤ x, 0 ≤ y, x + y ≤ 1: f_X(x) = 12x(1 − x)², so
µx = ∫₀¹ 12x²(1 − x)²dx = 2/5 = µy, and E(X²) = ∫₀¹ 12x³(1 − x)²dx = 1/5, so
Var(X) = 1/5 − 4/25 = 1/25 = Var(Y). E(XY) = ∫₀¹∫₀^(1−x) xy·24xy dy dx = 8∫₀¹ x²(1 − x)³dx = 2/15,
so Cov(X, Y) = 2/15 − (2/5)(2/5) = −2/75 and ρ = (−2/75)/√((1/25)(1/25)) = −2/3 = −.667.

29. E(XY) = ΣΣ xy·p(x)·p(y) = [Σ_x x·p_X(x)]·[Σ_y y·p_Y(y)] = E(X)·E(Y)
(replace Σ with ∫ in the continuous case).

30. E(X) = 5.55, E(Y) = 8.55, and E(XY) = ΣΣ xy·p(x, y) = (0)(.02) + … + (150)(.01) = 44.25,
so Cov(X, Y) = 44.25 − (5.55)(8.55) = −3.20. With σx² = 12.45 and σy² = 19.15,
ρ = −3.20/√((12.45)(19.15)) = −.207.

31. E(X) = ∫₂₀³⁰ x(10Kx² + .05)dx = 25.329 = E(Y), and
E(X²) = ∫₂₀³⁰ x²(10Kx² + .05)dx = 649.8246 = E(Y²), so
Var(X) = Var(Y) = 649.8246 − (25.329)² = 8.2664 = σx² = σy².
E(XY) = ∫₂₀³⁰∫₂₀³⁰ xy·K(x² + y²)dx dy = 641.447, so
Cov(X, Y) = 641.447 − (25.329)² = −.111 and ρ = −.111/8.2664 = −.0134.

32.
a. Cov(aX + b, cY + d) = E[(aX + b)(cY + d)] − E(aX + b)·E(cY + d)
   = E[acXY + adX + bcY + bd] − (aE(X) + b)(cE(Y) + d) = acE(XY) − acE(X)E(Y) = acCov(X, Y).
b. Corr(aX + b, cY + d) = Cov(aX + b, cY + d)/[√Var(aX + b)·√Var(cY + d)]
   = acCov(X, Y)/(|a|·|c|·√Var(X)·√Var(Y)) = Corr(X, Y) when a and c have the same sign.
c. When a and c differ in sign, Corr(aX + b, cY + d) = −Corr(X, Y).

33. Since E(XY) = E(X)·E(Y), Cov(X, Y) = E(XY) − E(X)·E(Y) = 0, and since
Corr(X, Y) = Cov(X, Y)/(σxσy), Corr(X, Y) = 0.

34. There is a difficulty here. Existence of ρ requires that both X and Y have finite means and
variances. Yet since the marginal pdf of Y is 1/(1 + y)² for y ≥ 0,
E(Y) = ∫₀^∞ y/(1 + y)² dy = ∫₀^∞ (1 + y − 1)/(1 + y)² dy = ∫₀^∞ 1/(1 + y) dy − ∫₀^∞ 1/(1 + y)² dy,
and the first integral is not finite. Thus ρ itself is undefined.

35.
a. Cov(X, aX + b) = E[X·(aX + b)] − E(X)·E(aX + b) = aVar(X).
b. Corr(X, aX + b) = aVar(X)/√(Var(X)·a²Var(X)) = 1 if a > 0, and −1 if a < 0.

36.
a. Var[h(X, Y)] = E{[h(X, Y) − E(h(X, Y))]²} = ΣΣ[h(x, y) − E(h(X, Y))]²·p(x, y)
   = ΣΣ[h(x, y)]²·p(x, y) − [E(h(X, Y))]², with ∫∫ replacing ΣΣ in the continuous case.
b. E[h(X, Y)] = E[max(X, Y)] = 9.60, and
   E[h²(X, Y)] = E[(max(X, Y))²] = (0)²(.02) + (5)²(.06) + … + (15)²(.01) = 105.5,
   so Var[max(X, Y)] = 105.5 − (9.60)² = 13.34.

Section 5.3

37.
a. The sampling distribution of x̄:
   x̄      25   32.5  40   45   52.5  65
   p(x̄)   .04  .20   .25  .12  .30   .09
   E(x̄) = (25)(.04) + (32.5)(.20) + … + 65(.09) = 44.5 = µ.
b. The sampling distribution of s²:
   s²      0    112.5  312.5  800
   p(s²)   .38  .20    .30    .12
   E(s²) = 212.25 = σ².

38.
a. The sampling distribution of T₀ = X1 + X2:
   T₀      0    1    2    3    4
   p(T₀)   .04  .20  .37  .30  .09
b. µ_T₀ = E(T₀) = 2.2 = 2µ.
c. σ²_T₀ = E(T₀²) − [E(T₀)]² = 5.82 − (2.2)² = .98 = 2σ².

39. X is a binomial random variable with p = .8 and n = 10, so the sampling distribution of the
sample proportion x/n is
   x/n     0     .1    .2    .3    .4    .5    .6    .7    .8    .9    1.0
   p(x/n)  .000  .000  .000  .001  .005  .027  .088  .201  .302  .269  .107

40.
a. Possible values of M are 0, 5, and 10. M = 0 when all 3 envelopes contain 0 money, hence
   p(M = 0) = (.5)³ = .125. M = 10 when there is at least a single envelope with $10, hence
   p(M = 10) = 1 − p(no envelopes with $10) = 1 − (.8)³ = .488. Thus
   p(M = 5) = 1 − [.125 + .488] = .387:
   M      0     5     10
   p(M)   .125  .387  .488
   An alternative solution would be to list all 27 possible combinations using a tree diagram
   and compute the probabilities directly from the tree.
b. Write a computer program to generate the digits 0–9 from a uniform distribution. Assign a
   value of 0 to the digits 0–4, a value of 5 to digits 5–7, and a value of 10 to digits 8 and 9,
   so that the population distribution is
   x      0    5     10
   p(x)   1/2  3/10  1/5
   The statistic of interest is M, the maximum of x1, x2, or x3, so that M = 0, 5, or 10.
   Generate samples of increasing sizes, keeping the number of replications constant, and
   compute M from each sample. As n, the sample size, increases, p(M = 0) goes to zero and
   p(M = 10) goes to one. Furthermore, p(M = 5) goes to zero, but at a slower rate than p(M = 0).

41.
a. For samples of size n = 2 from the population x = 1, 2, 3, 4 with probabilities .4, .3, .2, .1,
   listing the outcomes (1,1), (1,2), …, (4,4) with their probabilities .16, .12, .08, .04, …
   gives the sampling distributions of x̄ and of the sample range r:
   x̄      1    1.5  2    2.5  3    3.5  4
   p(x̄)   .16  .24  .25  .20  .10  .04  .01
   r      0    1    2    3
   p(r)   .30  .40  .22  .08
b. P(x̄ ≤ 1.5) = .16 + .24 = .40.
c. For n = 4, P(x̄ ≤ 1.5) = P(1,1,1,1) + P(2,1,1,1) + … + P(1,1,2,2)
   = (.4)⁴ + 4(.4)³(.3) + 6(.4)²(.3)² + 4(.4)³(.2) = .2400.

42.
a. The sampling distribution of x̄:
   x̄      27.75  28.0  29.7  29.95  31.65  31.75  33.9
   p(x̄)   4/30   2/30  6/30  4/30   8/30   4/30   2/30
b. The sampling distribution of the sample median x̃ places probability 1/3 on each of its
   three possible values.
c. All three values (the population mean and the means of the two sampling distributions) are
   the same: 30.4333.
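The exact pmf of M in the envelope problem follows from P(M ≤ m) = F(m)³ for the maximum of three independent draws; a short sketch using the population {0: .5, 5: .3, 10: .2} reproduces the values .125, .387, .488.

```python
# Exact pmf of M = max of three independent draws from {0: .5, 5: .3, 10: .2},
# via P(M <= m) = F(m)^3 and successive differences of the cdf.
pop = {0: 0.5, 5: 0.3, 10: 0.2}

def cdf(m):
    return sum(p for v, p in pop.items() if v <= m)

pmf = {}
prev = 0.0
for v in sorted(pop):
    cur = cdf(v) ** 3
    pmf[v] = cur - prev
    prev = cur
```

The same cube-the-cdf device gives the limiting behavior noted in the solution: as the sample size n grows, F(m)^n collapses onto the largest population value.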
44.001 0 10 20 30 40 50 60 70 80 90 5 15 25 35 45 n=1 0 55 65 75 85 Anderson-D arling N ormality Tes t A-Squared: 7. 42 8 P-Va lue : 0 .Sq ua re d : 4.000 n = 50 Normal Probability Plot . 10. compute the upper and lower fourth.50 . For each sample.99 . both generated by Minitab.80 . we can see that as n increases.50 . even the sampling distribution for n = 50 is not yet approximately normal. 20.0 00 190 . and 30 from a Weibull distribution with parameters as given. Keep the number of replications the same (say 500.01 20 . 10.95 60 Probability Frequency 90 50 40 30 20 . This sampling distribution appears to be normal. 10. the distribution slowly moves toward normality.406 P-Value: 0.Chapter 5: Joint Probability Distributions and Random Samples 43. as in problem 43 above. and a normal probability plot for the sampling distribution of x for n = 5.Da rl ing No r mali ty Te s t A. calculate the mean.001 10 20 0 15 25 35 45 55 65 30 40 50 60 An de r so n. 20. keeping the number of replications the same. 45.999 70 .99 Probability 60 Frequency 50 40 30 .80 .20 . However. and 30 from a uniform distribution with A = 8 and B = 10.20 .95 . Use a computer to generate samples of sizes n = 5. For each sample.999 70 . for example).01 0 . the others will also appear normal. Using Minitab to generate the necessary sampling distribution. then compute the difference. so since larger sample sizes will produce distributions that are closer to normal. µ = 12 cm a. n = 25 P( X > 12.01 − 12   P Z >  = P( Z > 1. n = 16 σ = ..Chapter 5: Joint Probability Distributions and Random Samples Section 5.25) = 1 .99 ≤ 11.10 .04 cm E ( X ) = µ = 12cm σ x = n = 64 n = E ( X ) = µ = 12cm σ x = .10 .75 ≤ X ≤ 50. σx σ = .4 46.5 ≤ Z ≤ 2. a.. This is due to the decreased variability of X with a larger sample size. µ = 12 cm a. larger.01 cm of the mean (12 cm) with the second.1056 48.75 ≤ 49.10 100 50. σ x = σx 1 = .25 − 49.75 − 50 P( 49.Φ(-1) =.6915 191 .9876 b.04 = .5 ≤ Z ≤ 4.25) . 47. 
sample.75 − 49.04 cm n = 16 P( 11. b. c.10   n = = P(-2.01   = P(-1 ≤ Z ≤ 1) = Φ(1) . µ X = µ = 50 .Φ(1.8  X ≤ 50.01) = P ≤Z ≤  .10   = P(-.1587 =.01) = 12.8944 =. P( 49.25 − 50   49.8 50.25) ≈ P ≤Z≤  .5) = .01cm 4 σx n = .25) = P ≤Z≤  .01 − 12  X ≤ 12.005cm 8 X is more likely to be within .04 / 5   = 1 .8413 .99 − 12 12.01 .5) = .6826 b.04 = . 95   µ = 10. 200 − 10.4).6026 37.900 − 10.2981 37.29 192 .1038 = .000  X ≤ 10. – 6:50 P.95. For day 1.000 psi a.53) = Φ(2. = 250 minutes.7720 52.8686)(. n =4 µ T0 = nµ = ( 4)(10) = 40 and σ T0 = σ n = ( 2)(1) = 2.26 ) = . so P( T0 ≤ 250) ≈ 250 − 240   P Z ≤  = P( Z ≤ . With T0 = X1 + … + X40 = total grading time. n = 40 σ = 500 psi  9. µ T0 = nµ = ( 40)( 6) = 240 and σ T0 = σ n = 37.8888 2 / 6  For day 2.9943 . thus using the same procedure for n = 15 as was used for n = 40 would not be appropriate.8905 According to the Rule of Thumb given in Section 5.. n = 5  P( X ≤ 11)= P Z  ≤ 11 − 10   = P ( Z ≤ 1.26 ≤ Z ≤ 2.200) ≈ P ≤Z≤  500 / 40   500 / 40 P( 9. 51. We desire the 95th percentile: 40 + (1. a. = P(-1.645)(2) = 43. X ~ N(10. 11 P.4.8686 2 / 5  ≤ 11 − 10   = P ( Z ≤ 1.. P( X ≤ 11)= (.Chapter 5: Joint Probability Distributions and Random Samples 49.53) . n = 6  P( X ≤ 11)= P Z  For both days. 260 − 240   P(T0 > 260) = P Z >  = P( Z > .12) = .Φ(-1. n should be greater than 30 in order to apply the C.53) = .M.8888) = .95   b.22) = .26) = .000 10.T.900 ≤ b. X ~ N(10). 50.L.M. 02.811 .2 / 9  ≥ 51 − 50   = P( Z ≥ 5. Thus n = 33 will suffice.61 ≤ Z ≤ 1.5 − 50  σ = λ = 7. n = 9  P( X ≥ 51) = P Z  b.5 − 20   14.811   15.65 ≤ X ≤ 3.9803 .30 ≤ Z ) = .65   P( X ≤ 3.00)= P Z ≤  = P( Z ≤ 2.17 5 n 3.464 = P( −1.4803 µ X = µ = 2.99 implies that = 2.19 7.071 ≤ Z ≤ 2.8882 b. ≥ 51 − 50   = P( Z ≥ 2. µ = 250 . σ = 1.071 . σ 2 = 250.59 ≤ Z ≤ 1. 55. Y has approximately a normal distribution with 56.00)= P Z  ≤ = 3.17   P(2.464  24. so P( 225 ≤ Y ≤ 275) ≈ 275. 
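The X̄ probabilities in this section all reduce to standard-normal cdf evaluations; a minimal sketch using `math.erf` reproduces the .6826 computation for n = 16 (the tabulated .6826 is the rounded value of the exact .68269).

```python
# P(11.99 <= X-bar <= 12.01) for n = 16, mu = 12, sigma = .04, so that
# sigma_xbar = .01 and the probability is Phi(1) - Phi(-1).
from math import erf, sqrt

def phi(z):                      # standard normal cdf
    return 0.5 * (1 + erf(z / sqrt(2)))

se = 0.04 / sqrt(16)
lo = (11.99 - 12) / se           # -1.0
hi = (12.01 - 12) / se           #  1.0
prob = phi(hi) - phi(lo)
```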
49. X ~ N(10, 1), n = 4: µ_T₀ = nµ = (4)(10) = 40 and σ_T₀ = σ√n = (1)(2) = 2.
We desire the 95th percentile: 40 + (1.645)(2) = 43.29.

50.
a. With T₀ = X1 + … + X40 = total grading time, µ_T₀ = nµ = (40)(6) = 240 and
   σ_T₀ = σ√n = 37.95. The time available between 6:50 P.M. and 11 P.M. is 250 minutes, so
   P(T₀ ≤ 250) ≈ P(Z ≤ (250 − 240)/37.95) = P(Z ≤ .26) = .6026.
b. P(T₀ > 260) = P(Z > (260 − 240)/37.95) = P(Z > .53) = .2981.

51. µ = 10,000 psi, σ = 500 psi.
a. n = 40: P(9,900 ≤ X̄ ≤ 10,200) ≈ P((9,900 − 10,000)/(500/√40) ≤ Z ≤ (10,200 − 10,000)/(500/√40))
   = P(−1.26 ≤ Z ≤ 2.53) = .9943 − .1038 = .8905.
b. According to the Rule of Thumb given in Section 5.4, n should be greater than 30 in order to
   apply the C.L.T., thus using the same procedure for n = 15 as was used for n = 40 would not
   be appropriate.

52. X ~ N(10, 2). For day 1, n = 5: P(X̄ ≤ 11) = P(Z ≤ (11 − 10)/(2/√5)) = P(Z ≤ 1.12) = .8686.
For day 2, n = 6: P(X̄ ≤ 11) = P(Z ≤ (11 − 10)/(2/√6)) = P(Z ≤ 1.22) = .8888.
For both days: P(X̄ ≤ 11 on both) = (.8686)(.8888) = .7720.

53. µ = 50, σ = 1.2.
a. n = 9: P(X̄ ≥ 51) = P(Z ≥ (51 − 50)/(1.2/√9)) = P(Z ≥ 2.5) = .0062.
b. n = 40: P(X̄ ≥ 51) = P(Z ≥ (51 − 50)/(1.2/√40)) = P(Z ≥ 5.27) ≈ 0.

54. Y is Poisson with λ = 50, so µ = λ = 50 and σ = √λ = 7.071. Then
P(35 ≤ Y ≤ 70) ≈ P((34.5 − 50)/7.071 ≤ Z ≤ (70.5 − 50)/7.071) = P(−2.19 ≤ Z ≤ 2.90) = .9838.

55. With Y = # of tickets, Y has approximately a normal distribution with µ = λ = 250 and
σ = √λ = 15.811, so P(225 ≤ Y ≤ 275) ≈ P((224.5 − 250)/15.811 ≤ Z ≤ (275.5 − 250)/15.811)
= P(−1.61 ≤ Z ≤ 1.61) = .8926.

56. With µ = np = 20 and σ = √npq = 3.464:
a. P(15 ≤ X ≤ 25) ≈ P((14.5 − 20)/3.464 ≤ Z ≤ (25.5 − 20)/3.464) = P(−1.59 ≤ Z ≤ 1.59) = .8882.
b. P(25 ≤ X) ≈ P((24.5 − 20)/3.464 ≤ Z) = P(1.30 ≤ Z) = .0968.

57. E(X) = 100, Var(X) = 200, σ_X = 14.14, so
P(X ≤ 125) ≈ P(Z ≤ (125 − 100)/14.14) = P(Z ≤ 1.77) = .9616.

Section 5.5

58.
a. E(27X1 + 125X2 + 512X3) = 27E(X1) + 125E(X2) + 512E(X3)
   = 27(200) + 125(250) + 512(100) = 87,850.
   V(27X1 + 125X2 + 512X3) = 27²V(X1) + 125²V(X2) + 512²V(X3)
   = 27²(10)² + 125²(12)² + 512²(8)² = 19,100,116.
b. The expected value is still correct, but the variance is not, because the covariances now
   also contribute to the variance.

59.
a. E(X1 + X2 + X3) = 180, V(X1 + X2 + X3) = 45, σ_{X1+X2+X3} = 6.708, so
   P(X1 + X2 + X3 ≤ 200) = P(Z ≤ (200 − 180)/6.708) = P(Z ≤ 2.98) = .9986, and
   P(150 ≤ X1 + X2 + X3 ≤ 200) = P(−4.47 ≤ Z ≤ 2.98) = .9986.
b. µ_X̄ = µ = 60, σ_X̄ = √(45/9) = 2.236, so
   P(58 ≤ X̄ ≤ 62) = P(−.89 ≤ Z ≤ .89) = .6266.
c. P(X̄ ≥ 55) = P(Z ≥ (55 − 60)/2.236) = P(Z ≥ −2.24) = .9875.
d. E(X1 − .5X2 − .5X3) = 0, and V(X1 − .5X2 − .5X3) = σ1² + .25σ2² + .25σ3² = 22.5,
   sd = 4.7434, so P(−10 ≤ X1 − .5X2 − .5X3 ≤ 5)
   = P((−10 − 0)/4.7434 ≤ Z ≤ (5 − 0)/4.7434) = P(−2.11 ≤ Z ≤ 1.05) = .8531 − .0174 = .8357.

60. E(X1 + X2 + X3) = 15 + 30 + 20 = 65 min, V(X1 + X2 + X3) = 1² + 2² + 1.5² = 7.25,
σ = 2.6926, so P(X1 + X2 + X3 ≤ 60) = P(Z ≤ (60 − 65)/2.6926) = P(Z ≤ −1.86) = .0314.

61. E(X1 + X2 − 2X3) = 40 + 50 − 2(60) = −30, and
V(X1 + X2 − 2X3) = σ1² + σ2² + 4σ3² = 78, sd = 8.832. We want P(X1 + X2 ≥ 2X3), or written
another way, P(X1 + X2 − 2X3 ≥ 0) = P(Z ≥ (0 − (−30))/8.832) = P(Z ≥ 3.40) = .0003.

62. Y = (1/2)(X1 + X2) − (1/3)(X3 + X4 + X5) is normally distributed with
µ_Y = (1/2)(µ1 + µ2) − (1/3)(µ3 + µ4 + µ5) = −1 and
σ_Y² = (1/4)σ1² + (1/4)σ2² + (1/9)(σ3² + σ4² + σ5²) = 3.167, σ_Y = 1.7795. Thus
P(0 ≤ Y) = P(.56 ≤ Z) = .2877, and
P(−1 ≤ Y ≤ 1) = P(0 ≤ Z ≤ 2/1.7795) = P(0 ≤ Z ≤ 1.12) = .3686.

63.
a. The marginal pmf's of X and Y are given in the solution to Exercise 7, from which
   E(X) = 2.8, E(Y) = .7, V(X) = 1.66, and V(Y) = .61. Thus E(X + Y) = E(X) + E(Y) = 3.5,
   V(X + Y) = V(X) + V(Y) = 2.27, and the standard deviation of X + Y is 1.51.
b. E(3X + 10Y) = 3E(X) + 10E(Y) = 15.4, V(3X + 10Y) = 9V(X) + 100V(Y) = 75.94, and the
   standard deviation of revenue is 8.71.

64. Let X1, …, X5 denote the morning times and X6, …, X10 the evening times.
a. E(X1 + … + X10) = E(X1) + … + E(X10) = 5E(X1) + 5E(X6) = 5(4) + 5(5) = 45, and
   Var(X1 + … + X10) = 5Var(X1) + 5Var(X6) = 5(64/12 + 100/12) = 820/12 = 68.33.
b. E(X1 − X6) = E(X1) − E(X6) = 4 − 5 = −1, and
   Var(X1 − X6) = Var(X1) + Var(X6) = 64/12 + 100/12 = 164/12 = 13.67.
c. E[(X1 + … + X5) − (X6 + … + X10)] = 5(4) − 5(5) = −5, and
   Var[(X1 + … + X5) − (X6 + … + X10)] = Var(X1 + … + X5) + Var(X6 + … + X10) = 68.33.

65. M = A1X1 + A2X2 with the Ai's and Xi's all independent.
a. E(M) = E(A1X1) + E(A2X2) = E(A1)E(X1) + E(A2)E(X2) = 50.
b. Recall that for any r.v. Y, E(Y²) = Var(Y) + [E(Y)]². Then Var(M) = E(M²) − [E(M)]², where
   E(M²) = E(A1²X1² + 2A1X1A2X2 + A2²X2²)
   = E(A1²)E(X1²) + 2E(A1)E(X1)E(A2)E(X2) + E(A2²)E(X2²) (by independence)
   = (.25 + 25)(.25 + 4) + 2(5)(2)(10)(4) + (.25 + 100)(1 + 16) = 2611.5625,
   so Var(M) = 2611.5625 − (50)² = 111.5625.

66. µ = 5, σ = .2.
a. E(X̄ − Ȳ) = 0, and V(X̄ − Ȳ) = σ²/25 + σ²/25 = .0032, σ_{X̄−Ȳ} = .0566, so
   P(−.1 ≤ X̄ − Ȳ ≤ .1) ≈ P(−1.77 ≤ Z ≤ 1.77) = .9232 (by the CLT).
b. With samples of size 36, V(X̄ − Ȳ) = σ²/36 + σ²/36 = .0022222, σ_{X̄−Ȳ} = .0471, so
   P(−.1 ≤ X̄ − Ȳ ≤ .1) ≈ P(−2.12 ≤ Z ≤ 2.12) = .9660.
68.
a. Assuming independence of X1, X2, X3, E(X1 + X2 + X3) = 800 + 1000 + 600 = 2400, and
   Var(X1 + X2 + X3) = (16)² + (25)² + (18)² = 1205, with sd = 34.71.
b. E(X1 + X2 + X3) = 2400 as before, but now Var(X1 + X2 + X3) = Var(X1) + Var(X2) + Var(X3)
   + 2Cov(X1, X2) + 2Cov(X1, X3) + 2Cov(X2, X3) = 1745, with sd = 41.77.

69. Let X1 and X2 denote the (constant) speeds of the two planes.
a. After two hours, the planes have traveled 2X1 km and 2X2 km, respectively, so #1 will be
   10 + 2X1 km from where #2 started, whereas #2 will be 2X2 from where it started. The second
   will not have caught the first if 2X1 + 10 > 2X2, i.e. if X2 − X1 < 5. X2 − X1 has mean
   500 − 520 = −20, variance 100 + 100 = 200, and standard deviation 14.14, so
   P(X2 − X1 < 5) = P(Z < (5 − (−20))/14.14) = P(Z < 1.77) = .9616.
b. After two hours, the separation distance will be at most 10 if |2X2 − 10 − 2X1| ≤ 10, i.e.
   −10 ≤ 2X2 − 10 − 2X1 ≤ 10, i.e. 0 ≤ X2 − X1 ≤ 10. The corresponding probability is
   P(0 ≤ X2 − X1 ≤ 10) = P(1.41 ≤ Z ≤ 2.12) = .9830 − .9207 = .0623.

70.
a. E(W) = Σ i·E(Yi) = .5 Σ i = n(n + 1)/4.
b. Var(W) = Σ i²·Var(Yi) = .25 Σ i² = n(n + 1)(2n + 1)/24.

71.
a. M = a1X1 + a2X2 + W∫₅¹³ x dx = a1X1 + a2X2 + 72W, so
   E(M) = (5)(2) + (10)(4) + (72)(1.5) = 158 and
   σ_M² = (5)²(.5)² + (10)²(1)² + (72)²(.25)² = 430.25, σ_M = 20.74. Thus
   P(M ≤ 200) = P(Z ≤ (200 − 158)/20.74) = P(Z ≤ 2.03) = .9788.
b. E(M) = 158 still, but now
   Var(M) = a1²Var(X1) + 2a1a2Cov(X1, X2) + a2²Var(X2) = 6.25 + 2(5)(10)(−.25) + 100 = 81.25.

72. The total elapsed time between leaving and returning is T₀ = X1 + X2 + X3 + X4. Since a
linear combination of normal r.v.'s has a normal distribution, T₀ is normally distributed with
E(T₀) = 40 and σ_T₀ = 5.477. The desired value t is the 99th percentile of the elapsed time
distribution added to 10 A.M.: 10:00 + [40 + (5.477)(2.33)] = 10:52.8.

73.
a. Both X̄ and Ȳ are approximately normal by the C.L.T., and a linear combination of normal
   r.v.'s has a normal distribution, so X̄ − Ȳ is approximately normal with µ_{X̄−Ȳ} = 5 and
   σ²_{X̄−Ȳ} = 8²/40 + 6²/35 = 2.629, σ_{X̄−Ȳ} = 1.6213.
b. P(−1 ≤ X̄ − Ȳ ≤ 1) ≈ P((−1 − 5)/1.6213 ≤ Z ≤ (1 − 5)/1.6213) = P(−3.70 ≤ Z ≤ −2.47) = .0068,
   so such an occurrence is unlikely if µ1 − µ2 = 5, and we would thus doubt this claim.
c. P(X̄ − Ȳ ≥ 10) ≈ P(Z ≥ (10 − 5)/1.6213) = P(Z ≥ 3.08) = .0010.

74. X is approximately normal with µ1 = (50)(.7) = 35 and σ1² = (50)(.7)(.3) = 10.5, as is Y
with µ2 = 30 and σ2² = 12. The difference of two normal r.v.'s is just a special linear
combination, so X − Y is approximately normal with µ_{X−Y} = 5 and σ²_{X−Y} = 22.5,
σ_{X−Y} = 4.7434. Then P(−5 ≤ X − Y ≤ 5) ≈ P((−5 − 5)/4.7434 ≤ Z ≤ (5 − 5)/4.7434)
= P(−2.11 ≤ Z ≤ 0) = .4826.

Supplementary Exercises

75.
a. p_X(x) is obtained by adding joint probabilities across the row labeled x, resulting in
   p_X(x) = .2, .5, .3 for x = 12, 15, 20 respectively. Similarly, from column sums
   p_Y(y) = .1, .35, .55 for y = 12, 15, 20 respectively.
b. P(X ≤ 15 and Y ≤ 15) = p(12,12) + p(12,15) + p(15,12) + p(15,15) = .25.
c. p_X(12)·p_Y(12) = (.2)(.1) = .02 ≠ .05 = p(12,12), so X and Y are not independent. (Almost
   any other (x, y) pair yields the same conclusion.)
d. E(X + Y) = ΣΣ(x + y)p(x, y) = 33.35 (or = E(X) + E(Y) = 33.35).
e. E(|X − Y|) = ΣΣ|x − y|·p(x, y) = 3.85.

76. The region of positive density lies between the lines x + y = 20 and x + y = 30.
a. 1 = ∫∫ f(x, y)dy dx = ∫₀²⁰∫_{20−x}^{30−x} kxy dy dx + ∫₂₀³⁰∫₀^{30−x} kxy dy dx ⇒ k = 3/81,250.
b. f_X(x) = ∫_{20−x}^{30−x} kxy dy = k(250x − 10x²) for 0 ≤ x ≤ 20, and
   f_X(x) = ∫₀^{30−x} kxy dy = k(450x − 30x² + x³/2) for 20 ≤ x ≤ 30;
   by symmetry f_Y(y) is obtained by substituting y for x in f_X(x).
c. P(X + Y ≤ 25) = ∫₀²⁰∫_{20−x}^{25−x} kxy dy dx + ∫₂₀²⁵∫₀^{25−x} kxy dy dx = .355.
d. E(X + Y) = E(X) + E(Y) = 2{∫₀²⁰ x·k(250x − 10x²)dx + ∫₂₀³⁰ x·k(450x − 30x² + x³/2)dx} = 25.969.
e. E(XY) = ∫∫ xy·kxy dy dx = 136.4103, so Cov(X, Y) = 136.4103 − (12.9845)² = −32.19, and
   E(X²) = E(Y²) = 204.6154, so σ_X² = σ_Y² = 204.6154 − (12.9845)² = 36.0182 and
   ρ = −32.19/36.0182 = −.894.
f. Var(X + Y) = Var(X) + Var(Y) + 2Cov(X, Y) = 7.66, and the standard deviation of X + Y is 2.77.

77. Sum of percentiles: µ1 + (Z)σ1 + µ2 + (Z)σ2 = µ1 + µ2 + (Z)(σ1 + σ2).
Percentile of sums: µ1 + µ2 + (Z)√(σ1² + σ2²).
These are equal when Z = 0 (i.e., for the median) or in the unusual case when
σ1 + σ2 = √(σ1² + σ2²), which happens when σ1 = 0 or σ2 = 0 or both. Thus the roll-up procedure
is not valid for the 75th percentile unless σ1 = 0 or σ2 = 0 or both σ1 and σ2 = 0.

78. F_Y(y) = P(max(X1, …, Xn) ≤ y) = P(X1 ≤ y, …, Xn ≤ y) = [P(X1 ≤ y)]^n = [(y − 100)/100]^n
for 100 ≤ y ≤ 200. Thus f_Y(y) = (n/100)[(y − 100)/100]^(n−1) for 100 ≤ y ≤ 200, and
E(Y) = ∫₁₀₀²⁰⁰ y·(n/100)[(y − 100)/100]^(n−1)dy = (n/100^n)∫₀¹⁰⁰(u + 100)u^(n−1)du
= 100 + [n/(n + 1)]·100 = [(2n + 1)/(n + 1)]·100.

79. X ~ Bin(200, .45) and Y ~ Bin(300, .6). Because both n's are large, both X and Y are
approximately normal, so X + Y is approximately normal with mean (200)(.45) + (300)(.6) = 270
and variance 200(.45)(.55) + 300(.6)(.4) = 121.5. Thus
P(X + Y ≥ 250) ≈ P(Z ≥ (249.5 − 270)/11.02) = P(Z ≥ −1.86) = .9686.

80. Let X1, …, X12 denote the weights for the business-class passengers and Y1, …, Y50 the
tourist-class weights. Then T = total weight = X1 + … + X12 + Y1 + … + Y50 = X + Y.
E(X) = 12E(X1) = 12(30) = 360; V(X) = 12V(X1) = 12(36) = 432.
E(Y) = 50E(Y1) = 50(40) = 2000; V(Y) = 50V(Y1) = 50(100) = 5000.
Thus E(T) = E(X) + E(Y) = 360 + 2000 = 2360 and V(T) = V(X) + V(Y) = 432 + 5000 = 5432,
sd = 73.7. So P(T ≤ 2500) = P(Z ≤ (2500 − 2360)/73.7) = P(Z ≤ 1.90) = .9713.

81.
a. E(N)·µ = (10)(40) = 400 minutes.
b. We expect 20 components to come in for repair during a 4 hour period, so
   E(N)·µ = (20)(3.5) = 70.

82. E(X + Y + Z) = 500 + 900 + 2000 = 3400, and
V(X + Y + Z) = 50²/365 + 100²/365 + 180²/365 = 123.014, sd = 11.09. Thus
P(X + Y + Z ≤ 3500) = P(Z ≤ (3500 − 3400)/11.09) = P(Z ≤ 9.0) ≈ 1.

83.
a. P(µ − .02 ≤ X̄ ≤ µ + .02) ≈ P(−.02/(σ/√n) ≤ Z ≤ .02/(σ/√n)) = .95 requires
   .02/(.1/√n) = .2√n = 1.96, from which n = 96.04, so n = 97 will suffice.
b. With σ = .85 and confidence level .99, 2.58 replaces 1.96; solving the corresponding
   equation gives n = 32.02. Thus n = 33 will suffice.

84. I have 192 oz. The amount which I would consume if there were no limit is
T₀ = X1 + … + X14, where each Xi is normally distributed with µ = 13 and σ = 2. Thus T₀ is
normal with µ_T₀ = 182 and σ_T₀ = 7.483, so P(T₀ < 192) = P(Z < 1.34) = .9099.

85. The expected value and standard deviation of volume are 87,850 and 4370.37, respectively,
so P(volume ≤ 100,000) = P(Z ≤ (100,000 − 87,850)/4370.37) = P(Z ≤ 2.78) = .9973.

86. The student will not be late if X1 + X3 ≤ X2, i.e. if X1 − X2 + X3 ≤ 0. This linear
combination has mean −2 and standard deviation √(σ1² + σ2² + σ3²), so standardizing gives
P(X1 − X2 + X3 ≤ 0) = P(Z ≤ (0 − (−2))/√(σ1² + σ2² + σ3²)).
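The total-weight computation above (12 business-class plus 50 tourist-class passengers) is a direct application of the sum rules for independent r.v.'s; a short sketch reproduces E(T) = 2360, V(T) = 5432, and P(T ≤ 2500) ≈ .9713.

```python
# Total weight T = sum of 12 business-class (mean 30, var 36) and
# 50 tourist-class (mean 40, var 100) passenger weights, all independent.
from math import erf, sqrt

mean_T = 12 * 30 + 50 * 40            # 2360
var_T = 12 * 36 + 50 * 100            # 5432
z = (2500 - mean_T) / sqrt(var_T)
prob = 0.5 * (1 + erf(z / sqrt(2)))   # P(T <= 2500), normal approximation
```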
87.
a. Var(aX + Y) = a²σ_X² + 2aCov(X, Y) + σ_Y² ≥ 0. Substituting a = σ_Y/σ_X yields
   σ_Y² + 2σ_Y²ρ + σ_Y² = 2σ_Y²(1 + ρ) ≥ 0, so ρ ≥ −1.
b. The same argument applied to Var(aX − Y) yields 2σ_Y²(1 − ρ) ≥ 0, so ρ ≤ 1.
c. Suppose ρ = 1. Then Var(aX − Y) = 2σ_Y²(1 − ρ) = 0, which implies that aX − Y = k (a
   constant), so Y = aX − k, which is of the form aX + b.

88. E(X + Y − t)² = ∫₀¹∫₀¹(x + y − t)²·f(x, y)dx dy. To find the minimizing value of t, take
the derivative with respect to t and equate it to 0:
0 = ∫₀¹∫₀¹ 2(x + y − t)(−1)f(x, y)dx dy ⇒ t·∫₀¹∫₀¹ f(x, y)dx dy = t = ∫₀¹∫₀¹(x + y)f(x, y)dx dy
= E(X + Y), so the best prediction is the individual's expected score ( = 1.167).

89.
a. With Y = X1 + X2,
   F_Y(y) = ∫₀^y [∫₀^(y−x1) (1/(2^(ν1/2)Γ(ν1/2)))x1^(ν1/2−1)e^(−x1/2)
   · (1/(2^(ν2/2)Γ(ν2/2)))x2^(ν2/2−1)e^(−x2/2) dx2] dx1.
   But the inner integral can be shown to be equal to
   (1/(2^((ν1+ν2)/2)Γ((ν1 + ν2)/2)))·y^([(ν1+ν2)/2]−1)e^(−y/2),
   from which the result follows: X1 + X2 is chi-squared with ν1 + ν2 degrees of freedom.
b. By a, Z1² + Z2² is chi-squared with ν = 2, so (Z1² + Z2²) + Z3² is chi-squared with ν = 3,
   etc., until Z1² + … + Zn² is chi-squared with ν = n.
c. (Xi − µ)/σ is standard normal, so [(Xi − µ)/σ]² is chi-squared with ν = 1, so the sum
   Σ[(Xi − µ)/σ]² is chi-squared with ν = n.

90.
a. Cov(X, Y + Z) = E[X(Y + Z)] − E(X)·E(Y + Z)
   = E(XY) + E(XZ) − E(X)·E(Y) − E(X)·E(Z)
   = E(XY) − E(X)·E(Y) + E(XZ) − E(X)·E(Z) = Cov(X, Y) + Cov(X, Z).
b. Cov(X1 + X2, Y1 + Y2) = Cov(X1, Y1) + Cov(X1, Y2) + Cov(X2, Y1) + Cov(X2, Y2)
   (apply a twice) = 16.

91.
a. With X = A + D and Y = B + E,
   Cov(X, Y) = Cov(A + D, B + E) = Cov(A, B) + Cov(A, E) + Cov(D, B) + Cov(D, E) = Cov(A, B).
   Thus Corr(X, Y) = Cov(A, B)/(√(σ_A² + σ_D²)·√(σ_B² + σ_E²))
   = [Cov(A, B)/(σ_Aσ_B)]·[σ_A/√(σ_A² + σ_D²)]·[σ_B/√(σ_B² + σ_E²)].
   The first factor in this expression is Corr(A, B), and (by the result of exercise 70a) the
   second and third factors are the square roots of Corr(X1, X2) and Corr(Y1, Y2), respectively.
   Clearly, measurement error reduces the correlation, since both square-root factors are
   between 0 and 1.
b. Corr(X, Y) = √.8100·√.9025 = (.90)(.95) = .855. This is disturbing, because measurement
   error substantially reduces the correlation.

92.
a. Cov(X1, X2) = Cov(W + E1, W + E2)
   = Cov(W, W) + Cov(W, E2) + Cov(E1, W) + Cov(E1, E2) = Cov(W, W) = V(W) = σ_W². Also
   V(X1) = V(W + E1) = σ_W² + σ_E² = V(W + E2) = V(X2), so
   ρ = σ_W²/√((σ_W² + σ_E²)(σ_W² + σ_E²)) = σ_W²/(σ_W² + σ_E²).
b. ρ = 1/(1 + .0001) = .9999.

93. E(Y) ≈ h(µ1, µ2, µ3, µ4) = 120[1/10 + 1/15 + 1/20] = 26.
The partial derivatives of h(x1, x2, x3, x4) = x4(1/x1 + 1/x2 + 1/x3) with respect to x1, x2,
x3, and x4 are −x4/x1², −x4/x2², −x4/x3², and 1/x1 + 1/x2 + 1/x3, respectively. Substituting
x1 = 10, x2 = 15, x3 = 20, and x4 = 120 gives −1.2, −.5333, −.3000, and .2167, respectively, so
V(Y) ≈ σ1²(−1.2)² + σ2²(−.5333)² + σ3²(−.3000)² + σ4²(.2167)² = 2.6783, and the approximate sd
of Y is 1.64. The second order partials with respect to x1, x2, and x3 are 2x4/x1³, 2x4/x2³,
and 2x4/x3³, and 0 with respect to x4, so the corrected approximation is
E(Y) ≈ 26 + .1200 + .0356 + .0338 = 26.1894.
so X is an unbiased estimator for the Poisson parameter λ. 2 2 2 2 2 11. 317 λˆ = x = = 2. so the estimated standard error is n b. ( pˆ 1 − pˆ 2 ) = 127 − 176 = .119 150 10. qˆ 2 = 1 − pˆ 2 . and the standard error is the square 2 n1 n2 n1 n2 root of this quantity. E ( X ) = µ = E ( X ) = λ .Chapter 6: Point Estimation 9.880 = −. With pˆ 1 = x1 x ˆ 1 . so with k = . X X  1 1 1 1 E 1 − 2  = E ( X 1 ) − E ( X 2 ) = ( n1 p1 ) − (n 2 p2 ) = p1 − p 2 . σ2 1 E ( X − kS ) = E( X ) − kE( S ) = µ + − kσ 2 ..11 .635 − . σx = a. thus b. qˆ1 = 1 − p n1 n2 pˆ 1 qˆ1 pˆ 2 qˆ 2 + . c. so the bias of the estimator X 2 is . the estimated standard error is .. ∑x = (0)(18) + (1)( 37) + . so θˆ = = 74.Chapter 6: Point Estimation e.min(xi ) + 1 = 525 – 202 + 1 = 324. ∑x 2 i ( ) 1490. b. 12.α + 1. (. i. Similarly. The estimate will equal the true number of planes manufactured iff min(xi ) = α and max(xi ) = β..880)(.120) + = . 1 1 2 1 E( X ) = θ 3 −1 1 = θ 3 1 E( X ) = θ 3  1 θˆ = 3 X ⇒ E (θˆ ) = E (3 X ) = 3 E( X ) = 3 θ = θ  3 14. 15.α + 1. so the estimate of the number of planes manufactured is max(xi ) . so that E[max(xi )] < β.1058 = 1490. E[min(xi )] > α . min(xi ) = 202 and max(xi ) = 525.min(xi )] < β . iff the smallest serial number in the population and the largest serial number in the population both appear in the sample. 635)(. a.041 200 200  (n1 − 1)S12 + (n2 − 1)S 22  (n1 − 1) (n 2 − 1) E = E ( S12 ) + E ( S 22 )  n1 + n2 − 2 n1 + n2 − 2   n1 + n2 − 2 (n1 − 1) 2 (n2 − 1) 2 2 = σ + σ =σ . Then E ( X 2 ) = 2θ implies that E 2n  2   ∑ X i2  ∑ E X i2 ∑ 2θ = 2nθ = θ . implying that θˆ is an = E θˆ = E =  2n  2n 2n 2n   unbiased estimator for θ . a. X2 X2  = θ .1058 . () b. Consider θˆ = ∑ i . n1 + n2 − 2 n1 + n 2 − 2 x 2 θx 3 E ( X ) = ∫ x ⋅ (1 + θx )dx = + −1 4 6 1 13.so E[max(xi ) . This is because max(xi ) never overestimates β and will usually underestimate it ( unless max(xi ) = β) .505 20 209 . and can never exceed it. 
The estimator is not unbiased. The estimate will usually be smaller than β .e. 365) (. a. b. m n 2δσ 2 8(1 − δ )σ 2 Setting the derivative with respect to δ equal to 0 yields + = 0. µ . σ ) = and 2π σ 2π σ 1 2πσ 2 π σ 2 π ~ = = ⋅ .  ( x − µ )2 a. r − 1. p ) = p .  −  1 1 2 σ 2  2 f ( x. E ( pˆ ) = ∑ ∞ = p ∑ nb ( x. E[δX + (1 − δ )Y ] = δE( X ) + (1 − δ ) E (Y ) = δµ + (1 − δ ) µ = µ δ 2σ 2 4(1 − δ ) 2 σ 2 + . so 5 −1 4 = = . x =0 b. σ ) = e . so f ( µ .Chapter 6: Point Estimation 16. since > 1. π 4n n 210 . x = 5. 4m + n Var [δX + (1 − δ )Y ] = δ 2Var ( X ) + (1 − δ ) 2 Var (Y ) = 17. For the given sequence. b. Var ( X ) > Var ( X ). m n 4m from which δ = . r − 1  x + r − 1 r ⋅   ⋅ p ⋅ (1 − p )x x  x= 0 x + r − 1  ∞ ( x + r − 2 )! ⋅ p r −1 ⋅ (1 − p ) x = p ∞  x + r − 2  p r −1 (1 − p )x = p∑ ∑  x  x = 0 x!( r − 2)! x =0  ∞ a.467 . µ . 2 4n[[ f ( µ )] 4n 2 n 2 2 f (µ) = 1 ~ π 2 2. so Var ( X ) ≈ = .444 5 + 5 −1 9 pˆ = 18.  − 7 70 7  n  70 Section 6.2 20.3 = 2  − . For n = 20 and x = 3.3 and pˆ = 2λˆ − . Y  λ = . 1 X 1 E( pˆ ) = E   = E( X ) = ( np ) = p .4437 211 . n n n c.  80  ( ) () b. n  20  the estimate is 2  − .3 = 2λ − . (1 − .Chapter 6: Point Estimation 19. set it equal to zero and solve  x    x n− x d   n for p.3 = .2 .7 p + (.15)5 = . thus pˆ is an unbiased estimator of p. pˆ = = . Here λ = . E ( pˆ ) = E 2λˆ − .  n  n −x  ln   p x (1 − p )  . so p = 10 9 10  Y  9 λ− and pˆ = . so p = 2λ − . c.3.3 = 2 E λˆ − .15 n 20 a. .5 p + . We wish to take the derivative of b.3 = p .3)(.15 ⇒ 2λ = p + . setting this equal to ln   + x ln ( p ) + (n − x ) ln (1 − p ) = − dp   x   p 1− p x 3 zero and solving for p yields pˆ = . as desired. a.3).3 . so the moment estimator θˆ is the 0 θ +2 θ +2 1 1 solution to X = 1 − . θ ) = (θ + 1) ( x1 x2 .2 ⇒ αˆ = 5 . so = .500  1 αˆ  αˆ  b.2) Γ(1.0  1 1. xn ) .θˆ = 5 − 2 = 3. a..05 =  . f (x1 . From a...  αˆ   αˆ  2  Γ1 +  2 1 Xi αˆ  =  .. 
22.
a. E(X) = ∫₀¹ x(θ+1)x^θ dx = (θ+1)/(θ+2) = 1 − 1/(θ+2), so the moment estimator θ̂ is the solution to X̄ = 1 − 1/(θ̂+2), yielding θ̂ = 1/(1−X̄) − 2. Since x̄ = .80, θ̂ = 1/.2 − 2 = 5 − 2 = 3.
b. f(x₁, …, xₙ; θ) = (θ+1)ⁿ(x₁x₂⋯xₙ)^θ, so the log likelihood is n ln(θ+1) + θΣln(xᵢ). Taking d/dθ and equating to 0 yields n/(θ+1) = −Σln(xᵢ), so θ̂ = −n/Σln(Xᵢ) − 1. Taking ln(xᵢ) for each given xᵢ yields ultimately θ̂ = 3.12.

23.
a. E(X) = β·Γ(1+1/α) and E(X²) = V(X) + [E(X)]² = β²Γ(1+2/α), so the moment estimators α̂ and β̂ are the solution to X̄ = β̂·Γ(1+1/α̂) and (1/n)ΣXᵢ² = β̂²Γ(1+2/α̂). Thus β̂ = X̄/Γ(1+1/α̂), so once α̂ has been determined, Γ(1+1/α̂) is evaluated and β̂ then computed. Since X̄² = β̂²Γ²(1+1/α̂), dividing gives (1/n)ΣXᵢ²/X̄² = Γ(1+2/α̂)/Γ²(1+1/α̂), and this equation must be solved to obtain α̂.
b. From a, the ratio equals 1.05, which from the hint gives 1/α̂ = .2 ⇒ α̂ = 5; then, with x̄ = 28.0, β̂ = x̄/Γ(1+1/α̂) = 28.0/Γ(1.2).
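Both estimators in Exercise 22 are one-line formulas once the data are in hand. The sketch below compares them on a small *hypothetical* sample (these five values are stand-ins for illustration, not the textbook data).

```python
import math

# Moment vs. maximum likelihood estimates of theta for the density
# f(x; theta) = (theta + 1) * x**theta on (0, 1)  (Exercise 22).
data = [0.92, 0.79, 0.90, 0.65, 0.86]  # hypothetical observations
n = len(data)
xbar = sum(data) / n
theta_mm = 1 / (1 - xbar) - 2                        # from xbar = 1 - 1/(theta+2)
theta_ml = -n / sum(math.log(x) for x in data) - 1   # from n/(theta+1) + sum(ln x) = 0
```

As in the exercise, the two estimates are close but not identical.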
ln  dp   x   p 1− p r Setting this equal to zero and solving for p yields pˆ = .7881  σˆ   18.. so and b.64 = 18. a. λ 2 ) is a product of the x sample likelihood and the y sample likelihood. 1 ( xi − x )2 = σˆ 2 = 9 (395..64 ∑ n 10 σˆ = 355... λ )]] = −n + ∑ i = 0 ⇒ λˆ = ∑ i = x . This is the number of r+x We wish to take the derivative of successes over the total number of trials.. y1 .20.. so the mle of this is (by the invariance principle) µˆ + 1..80) = . For x > 0 the cdf of X if F ( x.Chapter 6: Point Estimation 27. so the log likelihood is − nα ln ( β ) − n ln Γ(α ) . Taking the 2θ Σ xi2 n Σx i2 derivative wrt θ and equating to 0 gives − + = 0 .. 28.. n exp − xn2 / 2θ  = ( x1. b.38630θˆ 2 . (α − 1)∑ ln ( x i ) − ∑ β 0 yields xi . β ) = β nα Γ n (α ) i a...5 ) = . Equating  2θ  − x 2  this to .. which is identical to the unbiased 2n 2n [ ] [ ] estimator suggested in Exercise 15. α −1 − Σ x / β ( x1 x2 . ( ) 214 . From the second equation in a.5 = exp   implies  2θ  − x2 ~ = 1.x n ) θn θ  θ  Σxi2 natural log of the likelihood function is ln ( xi .38630 .5 and solving for x gives the median in terms of θ : . so the mle of µ is µˆ = X . The mle is therefore θ = . The  exp − x12 / 2θ . ∑x i β = nα ⇒ x = αβ = µ . a..x n ) − n ln (θ ) − .x n ) e f (x1 . x n . The mle of µ~ is therefore that ln (. a very β difficult system of equations to solve.. so x = µ 2θ 1 1 .. − x2  b. so n θ = and θ 2θ 2 2 Σx i2 ΣX i2 ˆ θ = .. [ ] exp − Σx i2 / 2θ  x1  x  . α . Equating both ∑ ln ( x ) − n ln ( β ) − n dα Γ(α ) = 0 and d i ∑x i β 2 = d d and to dα dβ nα = 0 . θ ) = P ( X ≤ x ) is equal to 1 − exp   .... n 24 Supplementary Exercises 31.4  n f ( y. The joint pdf (likelihood function) is λn e −λΣ( xi −θ ) x1 ≥ θ . Because the exponent nλθ is positive. x n ≥ θ iff min ( xi ) ≥ θ . 30. if we make θ larger than min (x ) . We know pˆ = 24 0 [ ( )] y . . n. a... 
27.
a. For x > 0 the cdf of X is F(x; θ) = P(X ≤ x) = 1 − exp(−x²/2θ), so the pdf is f(x; θ) = (x/θ)exp(−x²/2θ). The likelihood is f(x₁, …, xₙ; θ) = [(x₁⋯xₙ)/θⁿ]exp(−Σxᵢ²/2θ), and the natural log of the likelihood function is Σln(xᵢ) − n ln(θ) − Σxᵢ²/2θ. Taking the derivative wrt θ and equating to 0 gives −n/θ + Σxᵢ²/2θ² = 0, so nθ = Σxᵢ²/2 and θ̂ = ΣXᵢ²/2n, which is identical to the unbiased estimator suggested in Exercise 15.
b. For the median, .5 = exp(−x̃²/2θ) implies that ln(.5) = −x̃²/2θ, so x̃² = 1.38630θ. The mle of the median is therefore (by the invariance principle) √(1.38630·θ̂).

28.
a. The likelihood is (x₁x₂⋯xₙ)^{α−1}e^{−Σxᵢ/β}/(β^{nα}Γⁿ(α)), so the log likelihood is (α−1)Σln(xᵢ) − Σxᵢ/β − nα ln(β) − n ln Γ(α). Equating both d/dα and d/dβ to 0 yields Σln(xᵢ) − n ln(β) − n(d/dα)ln Γ(α) = 0 and Σxᵢ/β² − nα/β = 0, a very difficult system of equations to solve.
b. From the second equation in a, Σxᵢ/β = nα ⇒ x̄ = αβ = μ, so the mle of μ is μ̂ = X̄.

29.
a. The joint pdf (likelihood function) is f(x₁, …, xₙ; λ, θ) = λⁿe^{−λΣ(xᵢ−θ)} for x₁ ≥ θ, …, xₙ ≥ θ, and 0 otherwise. Notice that x₁ ≥ θ, …, xₙ ≥ θ iff min(xᵢ) ≥ θ, and that −λΣ(xᵢ−θ) = −λΣxᵢ + nλθ. Thus the likelihood is λⁿexp(−λΣxᵢ)exp(nλθ) if min(xᵢ) ≥ θ, and 0 if min(xᵢ) < θ. Because the exponent nλθ is positive, increasing θ will increase the likelihood provided that min(xᵢ) ≥ θ; if we make θ larger than min(xᵢ), the likelihood drops to 0. This implies that the mle of θ is θ̂ = min(xᵢ). The log likelihood is now n ln(λ) − λΣ(xᵢ − θ̂); equating the derivative wrt λ to 0 and solving yields λ̂ = n/Σ(xᵢ − θ̂) = n/(Σxᵢ − nθ̂).
b. θ̂ = min(xᵢ) = .64 and Σxᵢ = 55.80, so λ̂ = 10/(55.80 − 6.40) = .202.

30. Let y = the number of xᵢ that are at least 24. Then f(y; p) = C(n, y)·p^y(1−p)^{n−y}, where p = P(X ≥ 24) = 1 − ∫₀²⁴ λe^{−λx} dx = e^{−24λ}. We know the mle of p is p̂ = y/n, so by the invariance principle λ̂ = −ln(y/n)/24 = −ln(15/20)/24 = .0120 for n = 20, y = 15.

Supplementary Exercises

31. P(|X̄ − μ| > ε) = P(X̄ − μ > ε) + P(X̄ − μ < −ε)
= P((X̄−μ)/(σ/√n) > ε/(σ/√n)) + P((X̄−μ)/(σ/√n) < −ε/(σ/√n))
= P(Z > √n·ε/σ) + P(Z < −√n·ε/σ)
= ∫_{√n ε/σ}^∞ (1/√(2π))e^{−z²/2} dz + ∫_{−∞}^{−√n ε/σ} (1/√(2π))e^{−z²/2} dz.
As n → ∞, both integrals → 0, since lim_{c→∞} ∫_c^∞ (1/√(2π))e^{−z²/2} dz = 0.
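The invariance step in Exercise 30 is a one-liner; a minimal sketch with the exercise's values n = 20, y = 15:

```python
import math

# With p = P(X >= 24) = exp(-24*lam) and p-hat = y/n, invariance gives
# lam-hat = -ln(y/n)/24  (Exercise 30).
y, n = 15, 20
lam_hat = -math.log(y / n) / 24
```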
32. MSE(KS²) = Var(KS²) + [Bias(KS²)]². Bias(KS²) = E(KS²) − σ² = Kσ² − σ² = σ²(K−1), and
Var(KS²) = K²Var(S²) = K²(E[(S²)²] − [E(S²)]²) = K²[(n+1)σ⁴/(n−1) − σ⁴] = 2K²σ⁴/(n−1),
so MSE(KS²) = [2K²/(n−1) + (K−1)²]σ⁴. To find the minimizing value of K, take d/dK and equate to 0: [4K/(n−1) + 2(K−1)]σ⁴ = 0; the result is K = (n−1)/(n+1). Thus the estimator which minimizes MSE is neither the unbiased estimator (K = 1) nor the mle (K = (n−1)/n).

33. Let x₁ = the time until the first birth, x₂ = the elapsed time between the first and second births, and so on. Then f(x₁, …, xₙ; λ) = λe^{−λx₁}·(2λ)e^{−2λx₂}⋯(nλ)e^{−nλxₙ} = n!·λⁿe^{−λΣk·x_k}. Thus the log likelihood is ln(n!) + n ln(λ) − λΣk·x_k. Taking d/dλ and equating to 0 yields λ̂ = n/Σk·x_k. For the given sample, n = 6 and Σk·x_k = (1)(25.2) + (2)(16.5) + … + (6)(2.3) = 137.7, so λ̂ = 6/137.7 = .0436.

34.
a. F_Y(y) = P(Y ≤ y) = P(X₁ ≤ y, …, Xₙ ≤ y) = P(X₁ ≤ y)⋯P(Xₙ ≤ y) = (y/θ)ⁿ for 0 ≤ y ≤ θ, so f_Y(y) = ny^{n−1}/θⁿ.
b. E(Y) = ∫₀^θ y·(ny^{n−1}/θⁿ) dy = nθ/(n+1). While θ̂ = Y is not unbiased, (n+1)Y/n is, since E[(n+1)Y/n] = [(n+1)/n]·E(Y) = [(n+1)/n]·[n/(n+1)]θ = θ; the factor (n+1)/n does the trick.
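The closed-form minimizer K = (n−1)/(n+1) in Exercise 32 can be confirmed numerically; the following grid-search sketch (with n = 10 and σ⁴ factored out, an assumption made only for illustration) does so:

```python
# MSE(K*S^2) = [2*K^2/(n-1) + (K-1)^2] * sigma^4  (Exercise 32).
# A grid search over K should land on the closed-form minimizer (n-1)/(n+1).
n = 10

def mse(k):
    return 2 * k**2 / (n - 1) + (k - 1) ** 2  # sigma^4 factored out

grid = [i / 10000 for i in range(1, 20000)]
k_best = min(grid, key=mse)
```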
35. Let c = √((n−1)/2)·Γ((n−1)/2)/Γ(n/2). Then E(cS) = cE(S), and c cancels with the two Γ factors and the square root in E(S), leaving just σ. When n = 20, c = √(19/2)·Γ(9.5)/Γ(10), where Γ(10) = 9! and Γ(9.5) = (8.5)(7.5)⋯(1.5)(.5)·Γ(.5), with Γ(.5) = √π. Straightforward calculation gives c = 1.0132.

36. There are 55 pairwise averages (xᵢ + xⱼ)/2 [the table of all 55 values, roughly 23 to 50, is omitted here]. Listing them in increasing order, the median is the 28th in order of increasing magnitude, giving μ̂ = 29.5.

37. The xᵢ − x̃ values are listed in increasing order of |xᵢ − x̃|; the median of these absolute deviations, divided by .6745, gives the resistant estimate 1.490. This estimate is in reasonably close agreement with s.

38.
a. The likelihood is
Π_{i=1}ⁿ (1/√(2πσ²))e^{−(xᵢ−μᵢ)²/2σ²}·(1/√(2πσ²))e^{−(yᵢ−μᵢ)²/2σ²} = [1/(2πσ²)ⁿ]·e^{−[Σ(xᵢ−μᵢ)² + Σ(yᵢ−μᵢ)²]/2σ²}.
The log likelihood is thus −n ln(2πσ²) − [Σ(xᵢ−μᵢ)² + Σ(yᵢ−μᵢ)²]/2σ². Taking d/dμᵢ and equating to zero gives μ̂ᵢ = (xᵢ + yᵢ)/2. Substituting these estimates of the μᵢ's into the log likelihood gives
−n ln(2πσ²) − (1/2σ²)[Σ(xᵢ − (xᵢ+yᵢ)/2)² + Σ(yᵢ − (xᵢ+yᵢ)/2)²] = −n ln(2πσ²) − (1/2σ²)·½Σ(xᵢ−yᵢ)².
Now taking d/dσ², equating to zero, and solving for σ² gives the desired result σ̂² = Σ(Xᵢ−Yᵢ)²/4n.
b. E(σ̂²) = (1/4n)E[Σ(Xᵢ−Yᵢ)²] = (1/4n)ΣE[(Xᵢ−Yᵢ)²], but E[(Xᵢ−Yᵢ)²] = V(Xᵢ−Yᵢ) + [E(Xᵢ−Yᵢ)]² = 2σ² + 0 = 2σ². Thus E(σ̂²) = (1/4n)Σ2σ² = 2nσ²/4n = σ²/2, so the mle is definitely not unbiased; the expected value of the estimator is only half the value of what is being estimated!
CHAPTER 7

Section 7.1

1.
a. x̄ = (114.4 + 115.6)/2 = 115.0. The sample mean is the center of the interval.
b. The interval (114.4, 115.6) has the 90% confidence level. A 90% confidence interval will be narrower (see 2b, above); the higher confidence level produces a wider interval.

2.
a. For a 95% confidence level, z_{α/2} = 1.96.
b. The z critical value for a 90% confidence level is 1.645.
c. 99.7% implies that α = .003, α/2 = .0015, and z.0015 = 2.96.
d. 75% implies α = .25, α/2 = .125, and z.125 = 1.15.
e. z_{α/2} = 2.81 implies that α/2 = 1 − Φ(2.81) = .0025, so α = .005 and the confidence level is 100(1−α)% = 99.5%.
f. z_{α/2} = 1.44 implies α = 2[1 − Φ(1.44)] = .15, and 100(1−α)% = 85%.

3.
a. A narrower interval: the z of 1.645 is smaller than the z of 1.96 for the 95% confidence level, thus producing a narrower interval.
b. Not a correct statement. Once an interval has been created from a sample, the mean μ is either enclosed by it, or not. The 95% confidence is in the general procedure, for repeated sampling.
c. Not a correct statement. The interval is an estimate for the population mean, not a boundary for population values.
d. Not a correct statement. In theory, if the process were repeated an infinite number of times, 95% of the intervals would contain the population mean μ.
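The z critical values and confidence levels in Exercise 2 can be reproduced from the standard normal distribution instead of Table A.3. A stdlib-only sketch:

```python
from statistics import NormalDist

# z critical values and implied confidence levels (Exercise 2).
z = NormalDist()
z_90 = z.inv_cdf(1 - 0.10 / 2)           # 90% two-sided: z_.05 = 1.645
level_281 = 100 * (2 * z.cdf(2.81) - 1)  # z = 2.81 corresponds to 99.5%
level_144 = 100 * (2 * z.cdf(1.44) - 1)  # z = 1.44 corresponds to 85%
```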
4.
a. 58.3 ± 1.96(3/√25) = 58.3 ± 1.18 = (57.1, 59.5).
b. 58.3 ± 1.96(3/√100) = 58.3 ± .59 = (57.7, 58.9).
c. 58.3 ± 2.58(3/√100) = 58.3 ± .77 = (57.5, 59.1).
d. 82% confidence ⇒ 1 − α = .82 ⇒ α/2 = .09, and z.09 = 1.34, so the interval is 58.3 ± 1.34(3/√100) = 58.3 ± .40 = (57.9, 58.7).
e. n = [2(2.58)(3)/1]² = 239.62, so n = 240.

5.
a. 4.85 ± 1.96(.75/√20) = 4.85 ± .33 = (4.52, 5.18).
b. 1 − α = .98 ⇒ z_{α/2} = z.01 = 2.33, so the interval is 4.56 ± 2.33(.75/√16) = 4.56 ± .44 = (4.12, 5.00).
c. n = [2(1.96)(.75)/.40]² = 54.02, so n = 55.
d. n = [2(2.58)(.75)/.40]² = 93.61, so n = 94.

6.
a. (x̄ − z_α·σ/√n, ∞) is a one-sided interval giving a lower confidence bound, and (−∞, x̄ + z_α·σ/√n) gives an upper confidence bound.
b. 4.85 − 1.645(.75/√20) = 4.85 − .28 = 4.5741, so the interval is (4.5741, ∞).
c. 58.3 + 2.33(3/√100) = 58.3 + .70 = 59.0, so the interval is (−∞, 59.0).

7. 8439 ± 1.645(100/√25) = 8439 ± 32.9 = (8406.1, 8471.9).

8. If L = 2z_{α/2}·σ/√n and n′ = 25n, then L′ = 2z_{α/2}·σ/√(25n) = L/5, so the length is decreased by a factor of 5. Halving the length requires n to be increased fourfold, since L′ = 2z_{α/2}·σ/√(4n) = L/2.

9.
a. With probability 1 − α, z_{α₁} ≤ (X̄ − μ)/(σ/√n) ≤ z_{α₂}. These inequalities can be manipulated exactly as was done in the text to isolate μ; the result is X̄ − z_{α₂}·σ/√n ≤ μ ≤ X̄ + z_{α₁}·σ/√n, so a 100(1−α)% interval is (X̄ − z_{α₂}·σ/√n, X̄ + z_{α₁}·σ/√n).
b. The usual 95% interval has length 3.92σ/√n, while this interval will have length (z_{α₁} + z_{α₂})σ/√n. With α₁ = .0125 and α₂ = .0375, z_{α₁} = 2.24 and z_{α₂} = 1.78, so the length is (2.24 + 1.78)σ/√n = 4.02σ/√n, which is longer.

10.
a. For the exponential distribution, V(X) = 1/λ², so σ = 1/λ, the same as the mean. An argument parallel to that given in Example 7.5 shows that 2λΣXᵢ has a chi-squared distribution with 2n d.f. From the 30 d.f. row of Table A.7, the critical values that capture lower and upper tail areas of .025 (and thus a central area of .95) are 16.791 and 46.979, so P(16.791/2Σxᵢ ≤ λ ≤ 46.979/2Σxᵢ) = .95. Taking the reciprocal of each endpoint and interchanging gives (2Σxᵢ/46.979, 2Σxᵢ/16.791) as a 95% C.I. for μ = 1/λ.
b. Since the standard deviation equals the mean, the interval of a is also a 95% C.I. for the standard deviation of the lifetime distribution.
c. A 99% confidence level requires using critical values that capture area .005 in each tail of the chi-squared curve with 30 d.f.; these are 13.787 and 53.672, which replace 16.791 and 46.979 in a.

11. Y is a binomial r.v. with n = 1000 and p = .95, so E(Y) = np = 950, the expected number of intervals that capture μ, and σ_Y = √(npq) = 6.892. Using the normal approximation to the binomial distribution, P(940 ≤ Y ≤ 960) = P(939.5 ≤ Y_normal ≤ 960.5) = P(−1.52 ≤ Z ≤ 1.52) = .9357 − .0643 = .8714.

Section 7.2

12.
a. The large-sample interval is x̄ ± 1.96·s/√n; the resulting interval is quite narrow, so the information about μ appears quite precise.
b. The required sample size satisfies n = (2·1.96·s/w)² = (12.544)² ≈ 158.
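The sample-size calculations in 4e and 5c–d all use the same formula, n = (2zσ/w)² rounded up. A short sketch with the exercise's numbers:

```python
import math

# Sample size for a confidence interval of total width w: n = (2*z*sigma/w)^2,
# rounded up  (as in Exercises 4e and 5c-d).
def n_for_width(z, sigma, w):
    return math.ceil((2 * z * sigma / w) ** 2)

n_99 = n_for_width(2.58, 0.75, 0.40)  # 99% confidence
n_95 = n_for_width(1.96, 0.75, 0.40)  # 95% confidence
```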
13. We calculate a 95% confidence interval for the proportion of all dies that pass the probe. With p̂ = 201/356 = .5646, the score interval is
[.5646 + 1.96²/2(356) ± 1.96√((.5646)(.4354)/356 + 1.96²/4(356)²)]/[1 + 1.96²/356] = (.5700 ± .0518)/1.0108 = (.513, .615).

14. The confidence level is 100Φ(z)% for a one-sided bound: a. Φ(.84) = .7995 ≈ .80, so the confidence level is 80%. b. Φ(2.05) = .9798 ≈ .98, so the confidence level is 98%. c. Φ(.67) = .7486 ≈ .75, so the confidence level is 75%.

15. The 95% upper confidence bound is x̄ + 1.645·s/√n = 382.1 + 1.645(31.3/√56) = 382.1 + 6.9 = 389.0.

16. The 99% lower confidence bound is x̄ − 2.33·s/√153 = 134.53. With a confidence level of 99%, the true average ultimate tensile strength exceeds 134.53.

17. p̂ = 133/539 = .2468. The 95% lower confidence bound for p is, with z = 1.645,
[p̂ + z²/2n − z√(p̂q̂/n + z²/4n²)]/[1 + z²/n] = (.2493 − .0307)/1.0050 = .2175.

18.–19. The 99% confidence interval for p with x = 37 successes, and the upper confidence bound with p̂ = .072 (n = 487), are computed from the same score formula with z = 2.58 and z = 2.33 respectively.

20. Because the sample size is so large, the simpler formula (7.11) for the confidence interval for p is sufficient: with p̂ = .15 and n = 4722, .15 ± 2.58√((.15)(.85)/4722) = .15 ± .013 = (.137, .163).
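The large-sample ("score") interval used throughout this section is easy to package as a function; the sketch below checks it against the dies computation (x = 201 passing out of n = 356 probed, p̂ = .5646).

```python
import math

# Score confidence interval for a proportion:
# (p + z^2/2n +/- z*sqrt(p*q/n + z^2/4n^2)) / (1 + z^2/n).
def score_interval(x, n, z):
    p = x / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half

lo, hi = score_interval(201, 356, 1.96)
print(round(lo, 3), round(hi, 3))  # 0.513 0.615
```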
24. For an interval of width .10: substituting p̂ = 1/3, so that p̂q̂ = (1/3)(2/3), into the sample-size formula gives n ≈ 339, while the conservative choice p̂q̂ = 1/4 gives n ≈ 381.

25. For clarity, let x* = x + 2 and n* = n + 4; then p̂* = x*/n* = (x+2)/(n+4). The variance of this quantity is approximately p̂*(1 − p̂*)/(n+4), i.e. [(x+2)/(n+4)][1 − (x+2)/(n+4)]/(n+4), so the formula reduces to p̂* ± z_{α/2}√(p̂*q̂*/n*). Note that the midpoint of the score interval, (x + z²/2)/(n + z²), is roughly (x+2)/(n+4) when the confidence level is 95%, approximating 1.96 ≈ 2. For further discussion, see the Agresti article.

26.
a. θ = λ for the Poisson distribution, so θ̂ = X̄ and σ_θ̂ = √(λ/n), estimated by σ̂_θ̂ = √(X̄/n).
b. We calculate Σxᵢ = 203, so λ̂ = x̄ = 203/50 = 4.06, and a 95% interval for λ is 4.06 ± 1.96√(4.06/50) = 4.06 ± .56 = (3.50, 4.62).

Section 7.3
With  14  95% confidence. If this interval is calculated for sample after sample.I.48 . in the long run 95% of these intervals will include the true mean proportional limit stress of all such joints. it is reasonable to assume the sample observations came from a normal distribution. x = 8. The data appears to center near 438. a. is not.79. however. With d.771 A 95% lower confidence bound:  . the critical value for a 95% C. the value of the true mean proportional limit stress of all such joints lies ( ) in the interval 8.771(.51. ∞ .Chapter 7: Statistical Intervals Based on a Single Sample 33.785 = (430. 34. n = 14.45 = 7.08 ) .025.48 − . 420 4 30 4 40 450 460 470 polymer b. = n – 1 = 16.120 . a. 450.446. A 95% lower prediction bound: 8. Based on a normal probability plot.f.48 − 1. We must assume that the sample observations were taken from a normally distributed population.79  8.16 = 2.  17  Since 440 is within the interval. and the  15.03 . The boxplot indicates a very slight positive skew.11 . b.29 ± (2. c.79 ) 1 + 1 = 8.37 = 8. 227 .48 − 1.14  438. t . 440 is a plausible value for the true mean. 228 .69 ± 1..2180 ) . s = 24.8876.85  26  b. t .615(.05.16 = 378.14 26 Following a similar argument as that on p.69 + 42.1) .615.9255 ± .I. A 95% upper prediction bound: 370.36 ) 1 + c.9255 ± .84.9634 ) a.69 + 1. s = .0809 ) ⇒ (.708 A 95% upper confidence bound:  24. about 2.776 )  ⇒ (2783.f.708(24.0990) A tolerance interval is requested.3. is considerably larger than the C. A 95% C.47. n = 26. confidence level 95%.6 .6330.1.0181 b. 36. The tolerance critical value. We eventually arrive at T = ~t n 4 4 s 12 + 1n  2 n distribution with n – 1 d.53) 2 26 ( ) = .0. for the mean: b.69 ± 30.36  370. The 5 P.776  84  2887.I. so the new prediction interval is x ± t α / 2.093 . is 3.9 )  5 a.6 ± 2. 25 = 1. The interval is .776(84) 1 + 1 ⇒ (2632.36 ) 37.69 + (1.1.36.2991.3143. n−1 ⋅ s 1 2 + 1n .7520.I.9255 ± 2.9255 ± 2. with k = 99. n = 5.69 + 8. t .I. 
33.
a. With d.f. = n − 1 = 13, t.05,13 = 1.771. A 95% lower confidence bound: x̄ − 1.771·s/√n = 8.48 − 1.771(.79/√14) = 8.48 − .37 = 8.11. With 95% confidence, the value of the true mean proportional limit stress of all such joints lies in the interval (8.11, ∞). If this bound is calculated for sample after sample, in the long run 95% of these bounds will provide a lower bound for the true mean proportional limit stress of all such joints.
b. A 95% lower prediction bound: 8.48 − 1.771(.79)√(1 + 1/14) = 8.48 − 1.45 = 7.03. If this bound is calculated for sample after sample, in the long run 95% of these bounds will provide a lower bound for the corresponding future values of the proportional limit stress of a single joint of this type.
c. We must assume that the sample observations were taken from a normally distributed population.

34.
a. [A boxplot of the polymer data, on a 420–470 scale, appeared here.] The data appears to center near 438. The boxplot indicates a very slight positive skew, with no outliers.
b. Based on a normal probability plot, it is reasonable to assume the sample observations came from a normal distribution.
c. With d.f. = n − 1 = 16, t.025,16 = 2.120, and the 95% C.I. is 438.29 ± 2.120(15.14/√17) = 438.29 ± 7.785 = (430.51, 446.08). Since 440 is within the interval, 440 is a plausible value for the true mean; 450, however, is not, since it lies outside the interval.
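The two bounds in Exercise 33 differ only in the standard-error factor: s/√n for the mean versus s√(1 + 1/n) for a single future observation. A sketch with the exercise's values:

```python
import math

# 95% lower confidence bound for mu vs. lower prediction bound for a single
# future observation (Exercise 33: n = 14, xbar = 8.48, s = .79, t_.05,13 = 1.771).
n, xbar, s, t = 14, 8.48, 0.79, 1.771
lcb = xbar - t * s / math.sqrt(n)          # lower confidence bound for mu
lpb = xbar - t * s * math.sqrt(1 + 1 / n)  # lower prediction bound
print(round(lcb, 2), round(lpb, 2))  # 8.11 7.03
```

The prediction bound is much lower, reflecting the extra variability of a single observation.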
row of Table A.99 Probability . a.80 .001 125 135 145 strength Average: 134.008 The very small p-value indicates that the population distribution from which this data was taken is most likely not normal.25 + .05 .824 ) The 20 d.54186 N: 153 Anderson-Darling Normality Test A-Squared: 1.20 + .Chapter 7: Statistical Intervals Based on a Single Sample 40.725) . Thus each interval has confidence level 70%.70 and 1 – (.20 . 41.128 respectively. b. The third interval.95 .902 StDev: 4.f.70. 230 .325 captures uppertail area .10 The confidence level for each interval is 100(central area)%. central area = 1 – sum of tail areas = 1 – (. and for the second and third intervals the central areas are 1 – (. with symmetrically placed critical values. 95% lower prediction bound: 52. whereas the widths of the second and third intervals are 2.10) = . generated by Minitab: Normal Probability Plot .638. The 95% interval for σ is  17. d. 25 and 37. The interval for σ 2 is  21(25.5.65 = χ . b.295.8. Σ xi = 1701. 28.2025.313 f. row) b. 28.987 = χ . χ .368)  .523 (from .6. so s 2 = 25.f.7 give the necessary chi-squared critical values as 8. n – 1=8. P χ .180  3. 25 d.214 23.38 ) .   = (3. ( ( ) ) χ .519 a.14 = 23 . χ . 2 .3 and Σ xi2 = 132.f.90 .1. 25 = 11.60 .60.66. 44. χ . 25 = 44.307 b.98) . 25 ≤ χ 2 ≤ χ .543 2. ) n = 22 implies that d. Using a normal probability plot.78 = χ .368) 21(25.317 ) and that for σ is (3. 685 the 95% upper confidence bound for 14(1. 25 = 10.90.2975. 25 .368 . χ .f.2 42. χ c. 22 and 36. P χ .381 column.995 and .90) 8(7. 22 = . 22 .2005.2995.95 . row) 43.399. χ .033   41. 15 d.10 = 3. 25 = 46. ( 45.307 (.21.2975.940 c.15 = 22.925 e. Since 14.201.90)  .8 = 17.097.8 = 2.685 2 is 231 σ .205. χ .2025.10 = 18.1 column. 25 = 34. so the .033 and 41.61 = χ . we ascertain that it is plausible that this sample was taken from a normal population distribution.205. Since 10. = n – 1 = 21.1) Validity of 8. 22 ≤ χ 2 ≤ χ . 
  = (12.399 this interval requires that fracture toughness be (at least approximately) normally distributed.295.35 .579 . 25 = .543 .295.2975. n = 15.005 columns of Table A.98 = (1.2025.Chapter 7: Statistical Intervals Based on a Single Sample Section 7.579) = 1.205 .99 d. 46. and χ . χ . a. With s = 1.180 .299.868.205. so the 95% interval for σ 2 is  8( 7. χ . a. 53.1319 = (.0.0 ± 2. s 2 = 23.Chapter 7: Statistical Intervals Based on a Single Sample Supplementary Exercises 47. 9 49.868. The extent of variability is rather substantial.96 2 .96 = 8. and s = 4. n = 18.I.66 ± 2.96 48 2 = .13 = (33. A 95% C. x = 38.868 = 8.586 s = 8.166. requires 188. yes.473 = 38. 38.96 b.I.96 2 2(48) 48 4(48) pˆ = 1+ 48. x = 8. for µ = the true average strength is x ± 1.896 .I.896 1.17 = 2. is 48 1.473. c. s 4.66 .2708 .410) 1.2708 + ± 1..9.079 ± 1. A 98% t C.079 . 20 30 40 50 %porevolume b.7017.2 = 188.96 2 (.0 ± 7.195.586 .0800 t α / 2. normality is plausible. The interval is 7.079 ± 1. and t .2708)(.0 = (181. so.43. There appears to be a slight positive skew in the middle half of the sample.7292) + 1. A 95% C.456 ) n 48 13 = .377 = (6.0) . n = 48.79 ) . 18 232 .01.3108 ± .01. The 98% confidence interval is 8. a.66 ± 5.702. but the lower whisker is much longer than the upper whisker. 8 = 2. a. The pattern of points in a normal probability plot is reasonably linear.n −1 = t . although there are no outliers. 753 t .25)(.10 . A 95% prediction interval. with χ 2 . t .214 ± .5055 . 2 2 2 4 4 .3530 = 1079. 234. 036 ) 1+ 1 16 = .533.5055 231.645) (.198.776 .738) limit – lower limit = 3.0145 = . 4 = 4.025. and solve for s.230 ) 16 σ 15(. .016 = (. and width = upper  n s 5 (3. 214 ± 2 .01353 2(1. 131 . 341 .7 ⇒ use n = 1080 .120 = 1 .502 = 231.. for µ is .15 b. is 200 1. a.645 2 (.604 = 213. 214 ± .645) 2 b.776) 5 a 99% C.738.0025 1.15 = 2 . with t .0025 c.Chapter 7: Statistical Intervals Based on a Single Sample 50.0025) + . Assuming normality.633 ± 4.25 − ..645) (. a. 52. 5 51. 
4 = 2.645 2 . Here. it gives a 95% upper bound. So for 2(2. 131 (. 2931 ) . and the interval is 1. x = the middle of the interval = 229.645) (.0547 = (.624 .I. No. do s 95% C.633 ± 3. n = 5.341 2 .214 ± 1.025 . is 1.604 .15 = 1.732 ) 1. 4 )  . A 90% upper bound for c. is .764 + 233.6868 ± . To find s we use 2  s  width = 2(t .I.025.25) − (1.036 = . t . 0791 = (. 136 = . 233 .05) ± 4(1.005.680 )(.05.633.320 ) + 1.645 2 2(200 ) 200 4(200 ) pˆ = 1+ 1. 1349 . 3.I.680 ⇒ a 90% C.052 (1.3462 ± 1.753 .680 + ± 1.733) .100 = (228.036) = ..738 = 2(2776) ⇒s= = 1.645 200 n= = = .. 75 ± 2.I. Proceeding as in Example 7. and the sample contains no outliers.2246 ± .62 ) .5).125 = (139.0492  2(1. = 20. 2r     where t r = y1 + . For the given data. t .I. a. 57.. the necessary critical values are 9. 56. 232.2   2 55.625.3 and 228. + y r + n − r y r .88..16) . With θˆ = 1 3 (X 1 + X 2 + X 3 ) − X 4 .Chapter 7: Statistical Intervals Based on a Single Sample 53.2. Note also that the lower and upper fourths are 192. is 209. the C. The large-sample interval for θ is then 1 3 (x1 + x 2 + x 3 ) − x 4 ± zα / 2 1 9 2  s 12 s2 s  + 2 + 3 n2 n3  n1  s 42 .6452 (. A normal probability plot lends support to the assumption that pulmonary compliance is normally distributed.75 ± 70. and t r = 1115.156) = 209.86 . r = 10.. The censored experiment provides less information about 1λ than would an uncensored experiment with n = 20. With d. so the interval is − .170.156 = 209. 2r χ α 2 . 1718 .645 55 2 = .96(.87 = (196 . 50 .5 with Tr replacing ( ) Σ X i .1295.2)(.f.222. 1. is 55 .75 ± 12.0887 = (.15 = 2. so the fourth spread is 35.3.025. n = 20.50 ± 1. so . and the tolerance critical value is 2. σ θ2ˆ = 19 Var (X 1 + X 2 + X 3 ) + Var (X 4 )= 1  σ 12 σ 22 σ 32  σ 42 .1718) = (− . 16 K = 95. for 1  2t r 2t is .2986 ) .75 ± 2. 279. In Example 6. b.903. 54.875) . 234 . n = 16.96)(.591 and 34. 2r 2 λ  χ1−α 2 .2 + 1.1.645 2(55 ) 55 4(55)2 1+ 1.2 ⇒ a 90% C.8) + 1.131 . 
 +  n4 σˆ θˆ = .903(24. 24.I.131 c.−. so the 95% tolerance interval is 209.7. giving the interval (65. so the C. so n =   = 245. This is obviously an extremely wide interval.84. ˆ θ = − .8.645 2 ± 1. pˆ = 11 = . ˆ is obtained by replacing each ˆ 2 by 2 and taking the σ θˆ σi si  + + + 9  n 1 n2 n 3  n 4 square root.8)  The specified condition is that the interval be length . n = 246 should be used. 44. n max ( X i )   1 − α .65). c. min( xi ) = 1.5)1 (.54. 7. a.05. is (1..5) − (. a.54). 235 . ~.5) n −1 )% confidence interval. µ~ < X n ) − P( X 1 < µ~. With α = . X n < µ~) = 1 − (. max(xI)=4. From the probability 2 2 (α 2 ) ≤ 1 ≤ (1 − α 2 ) with probability 1 − α . which leads to the C. n −1 n . of b..5) = 1 − 2(. 1 α n   It is easily verified that the interval of b is shorter – draw a graph of f U (u ) and verify that the shortest interval which captures area 1 − α under the curve is the rightmost such interval.Chapter 7: Statistical Intervals Based on a Single Sample 58..I.I. for θ. max ( X i ) θ max ( X i )  max ( X i ) max ( X i )   reciprocal of each endpoint and interchanging gives the C. 59.44 and max( xi ) = 3. 1 1 n n −1 = 1 − 2(n + 1)(. α n 1 ≤ 1 n max( X i ) θ 1 ≤ 1 with probability 1 − α .. a nu n−1du = u n ](( 1−α / 2 ) 1/ n α /2) 1/ n =1− α α − = 1 − α .5 ) .. (α )   2 2  1 n 1 n 1 b. < min( X i )) − P(max( X i ) < µ ) = 1 − P( µ~ < X 1 .5 ) n b.5 )n−1 .2.. from which the confidence interval follows. so 1 ≤ ≤ 1 with θ max ( X i ) α n probability c. Since P( X ( 2) ≤ µ~ ≤ X ( n−1) ) = 1 − P( µ~.   (1 − α ) . ∫( (1−α / 2 )1 / n α / 2) 1/ n (n + 1)(. so taking the statement. the C.I..5) = 1 − (n + 1)(.5) n −  (. 3. . n = 5.5)n −1 (.5) n −  (.5 ) Thus the confidence coefficient is 1 − ( 100 1 − (n + 1)(. < X ( 2) ) − P ( X ( n−1) < µ~ ) ~ ) – P(at most one X exceeds µ~ ) = 1 – P( at most one X is below µ I I n n 1 − (. this yields (4. 
< min( X ) or max( X ) < µ~) P(min( X i ) ≤ µ~ ≤ max( X i )) = 1 − P ( µ i i ~ ~ = 1 − P( µ . or in another way. which yields the interval  max ( X i ).2..5)n−1 − (. 295.f.2 ± 2.05. 2n 2Σ X i ⇒e  − tχ . so the t interval is robust interval is  5.080)  = 77.205. 2 n  < exp  . 2(550.2.2 ± (1.2  76.95 . The following inequalities are equivalent to the one in parentheses: λ< χ .205. Arguing as in a.205. This implies is a lower confidence bound for λ with confidence coefficient 95%. Table A. which is minimized when z γ + zα −γ is n d −1 −1 minimized.6 = (73.78. respectively.7 gives the chi-squared critical value for 20 d. 2n is .7. when Φ (1 − γ ) + Φ (1 − α + γ ) is minimized.295. ( ) P 2λΣX i < χ .33 ± (2. the lower and upper fourths are 73. and the area under this chi- squared curve to the right of that χ .205. and t .87 ) b.1. P(χ . 236 .410 2 Σ xi = 550.8) .79.f.95 . We can be 95% confident that λ exceeds . Taking and dγ 1 1 equating to 0 yields = where Φ (• ) is the standard normal p. The length of the interval is 61.025.295. Since 2 λΣ X i has a chi-squared distribution with 2n d.2.e. which is pulled out to the right of ~ x by the single mild outlier 93. 2n 2Σ X i χ . a.851. ~ x = 76.33 ± 2. and f s = 6.95.  22  x = 77. 21 = 2.5 and 79. the interval widths are comparable.Chapter 7: Statistical Intervals Based on a Single Sample (z + zα −γ ) s .058 as the lower bound for the probability that time until breakdown exceeds 100 minutes.037  77. as 10.0098.d. 2n = . The t interval is centered at  22  x .0098 .6 ) . so the bound is 10.  2ΣX i  Σ X i by Σ xi in the expression on the right hand side of the last inequality gives a 95% lower confidence bound for and − λt e −λt ..93)  = 76. χ .87 gives .037.f.7.851 = . Φ(1 − γ ) Φ (1 − α + γ ) α whence γ = .080 . 2n 2ΣX i Replacing the ⇒ −λt < − tχ .6. i. 2 60. Substituting t = 100.23 = (75. s = 5. 20 = 31.33 . 62. The γ  6.2 n < 2λΣX i ) = . 
Using Ha: µ < 100 results in the welds being believed in conformance unless provided otherwise. so these are not legitimate. a. so these hypotheses are not in compliance. f. 237 . so cannot appear in a hypothesis. so our In this formulation. c. The sample standard deviation s is not a parameter. H is an assertion about the value of a parameter. This assertion will not be rejected unless there is strong evidence to the contrary. whereas Ha does here. d. h.CHAPTER 8 Section 8. d. σ = 20 ). Yes. b. Yes. 2.g. so does not belong in a hypothesis. No. Yes. so the burden of proof is on the non-conformance claim. − µ 2 in Ho should also appear in Ha. ~ X is not a parameter. f. It does not here. a. These hypotheses comply with our rules. No. Ho is not an equality claim (e. Ho states the welds do not conform to specification. X and Y are statistics rather than parameters. These hypotheses comply with our rules. No. We are not allowing both Ho and Ha to be equality claims (though this is allowed in more comprehensive treatments of hypothesis testing). The assertion is that the standard deviation of population #2 exceeds that of population #1 e. The sample median c. It is an assertion about the value of a parameter. b. Each S2 is a statistic.1 1. Ho should contain the equality claim. g. e. These hypotheses are in compliance. 3. Thus the burden of proof is on those who wish to assert that the specification is satisfied. The asserted value of µ1 conditions are not met. With this formulation. so that the probability of this error can be explicitly controlled. Type I error: Conclude that the standard deviation is < . Type II error: Conclude that there is no difference in the two laminates when in reality. Type II error: Conclude that the standard deviation is . Notice that in this formulation.05 mm when it is really equal to . The alternative reflects the fact that a departure from µ = 40 in either direction is of concern. 
Chapter 8: Tests of Hypotheses Based on a Single Sample

4.
a. A type I error involves deciding that the water is safe (rejecting Ho) when it isn't (Ho is true). This is a very serious error, so a test which ensures that this error is highly unlikely is desirable. A type II error involves judging the water unsafe when it is actually safe. Though a serious error, this is less so than the type I error. Notice that in this formulation the water is believed unsafe until proved otherwise.
b. A type I error here involves saying that the plant is not in compliance when in fact it is. A type II error occurs when we conclude that the plant is in compliance when in fact it isn't.
c. Reasonable people may disagree as to which of the two errors is more serious. It is generally desirable to formulate so that the type I error is more serious. If in your judgement it is the type II error, then the reformulation Ho: µ = 150 vs Ha: µ < 150 makes the type I error more serious; the type II error (now stating that the water is safe when it isn't) is then the more serious of the two errors.

5. Let σ denote the population standard deviation. The appropriate hypotheses are Ho: σ = .05 vs Ha: σ < .05. Type I error: Conclude that the standard deviation is < .05 mm when it is really equal to .05 mm. Type II error: Conclude that the standard deviation is .05 mm when it is really < .05 mm.

6. Ho: µ = 40 vs Ha: µ ≠ 40, where µ is the true average burn-out amperage for this type of fuse. The alternative reflects the fact that a departure from µ = 40 in either direction is of concern. Notice that in this formulation it is initially believed that the value of µ is the design value of 40.

7. Using Ha: µ > 5, the burden of proof is on the data to show that the requirement has been met (the sheaths will not be used unless Ho can be rejected in favor of Ha). When the alternative is Ha: µ < 5, the sheaths are believed to meet the requirement unless proved otherwise.

8. Let µ1 = the average amount of warpage for the regular laminate, and µ2 = the analogous value for the special laminate. Then the hypotheses are Ho: µ1 = µ2 vs Ha: µ1 > µ2. In this formulation: Type I error: Conclude that the special laminate produces less warpage than the regular laminate, when it really does not. Type II error: Conclude that there is no difference in the two laminates when in reality, the special one produces less warpage.
9.
a. R1 is most appropriate, because an x either too large or too small contradicts p = .5 and supports p ≠ .5.
b. A type I error consists of judging one of the two candidates favored over the other when in fact there is a 50-50 split in the population. A type II error involves judging the split to be 50-50 when it is not.
c. X has a binomial distribution with n = 25 and p = 0.5, so α = P(type I error) = P(X ≤ 7 or X ≥ 18 when X ~ Bin(25, .5)) = B(7; 25, .5) + 1 − B(17; 25, .5) = .044.
d. β(.4) = P(8 ≤ X ≤ 17 when p = .4) = B(17; 25, .4) − B(7; 25, .4) = .845, and β(.6) = .845 also. β(.3) = B(17; 25, .3) − B(7; 25, .3) = .488 = β(.7).
e. x = 6 is in the rejection region R1, so Ho is rejected in favor of Ha.

10. Ho: µ = 1300 vs Ha: µ > 1300.
a. When Ho is true, E(x̄) = 1300, and x̄ is normally distributed with mean µ and standard deviation σ/√n = 60/√20 = 13.416.
b. x̄ ≥ 1331.26 iff z = (x̄ − 1300)/13.416 ≥ 2.33, so α = P(x̄ ≥ 1331.26 when Ho is true) = P(z ≥ 2.33) = .01.
c. When µ = 1350, x̄ has a normal distribution with mean 1350 and standard deviation 13.416, so β(1350) = P(x̄ < 1331.26 when µ = 1350) = P(z ≤ (1331.26 − 1350)/13.416) = P(z ≤ −1.40) = .0808.
d. Replace 1331.26 by c, where c satisfies (c − 1300)/13.416 = 1.645. Thus c = 1322.07, and α = P(z ≥ 1.645) = .05. Now β(1350) = P(z ≤ (1322.07 − 1350)/13.416) = P(z ≤ −2.08) = .0188. Increasing α gives a decrease in β.
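The binomial error probabilities for the rejection region {x ≤ 7 or x ≥ 18} above can be checked directly. This is a minimal sketch (the helper `binom_cdf` is ours, standing in for the cumulative binomial table):

```python
from math import comb

# Hedged sketch of the error probabilities for the rejection region
# R1 = {x: x <= 7 or x >= 18} with X ~ Bin(25, p), as in the exercise above.
def binom_cdf(k, n, p):
    """B(k; n, p) = P(X <= k), summed directly from the binomial pmf."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

n = 25
alpha = binom_cdf(7, n, 0.5) + (1 - binom_cdf(17, n, 0.5))  # P(type I error)
beta_4 = binom_cdf(17, n, 0.4) - binom_cdf(7, n, 0.4)       # beta(.4)
beta_3 = binom_cdf(17, n, 0.3) - binom_cdf(7, n, 0.3)       # beta(.3)

# alpha ~ .043 exactly (the manual's .044 comes from rounded table entries);
# beta_4 ~ .845 and beta_3 ~ .488 match the table-based values.
print(round(alpha, 3), round(beta_4, 3), round(beta_3, 3))
```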
Similarly.58or ≤ −2.1) = P( −5.Chapter 8: Tests of Hypotheses Based on a Single Sample 11. E (x ) = 10. x ≥ 10.8968 is replaced by 9.58) = . 162 and so c = 10. Ho is not rejected. ≤ 9.2 c − 10 = = . H o : µ = 10 vs H a : µ ≠ 10 α = P( rejecting Ho when Ho is true) = P( x ≥ 10.8968 < x < 10.1.876.01 n 5 µ = 10.020 . Since x is neither ≥ 10.005 = .1032 or ≤ 9.58 or ≤ −2.96 . where = 1.58 e. 241 . ( E c.33) = . b.33 1   when µ = 98) = P( z ≥ 4. replace 1.33) = 0 . 120 when Ha is true). R2 should be used. x value substantially (x ) = 120 when Ho is true and .33 whenµ = µ o ) = P z ≥   σ n     n   = P( z ≥ 2. α (98 ) = P( x ≥ 102.   σ    µ o + 2. Similarly.33 when µ = 99) 102 − 99   = P z ≥  = P( z ≥ 3. where Z is a standard normal r. σx = d.4522 e. In general.87 .20 − 120   P z ≤  = P (z ≤ −2. α = P ( z ≤ −2.33) = . P( z ≤ −2. because when Ho is true Z has a standard normal distribution ( X has been standardized using 120). we have P(type I error) < .87 when µ = 120) = P( z ≤ −3. P(rejecting Ho when µ = 99) = P( x ≥ 102. To obtain α = .6667 ) = 114.01 when this probability is calculated for a value of µ less than 100.2 when µ = 115) = P( z > . a.v.6667 . β (115 ) = P ( x > 115. Similarly so this second rejection region is equivalent to R2 .6667   115. so α = P ( x ≥ 115.20 when µ = 120) = 6 n 115.001 .001 . The hypotheses are H o : µ = 120 vs H a : µ < 120 .33   σ  n  P( x ≥ µ o + 2.0004 .20 by c = 120 − 3. µ = true average braking distance for the new design at 40 mph.01 .88) = . The boundary value µ = 100 yields the largest α . Let b. σ 10 = = 1. since support for Ha is provided only by an smaller than 120.08) = .01 . a.88) = . so that P( x ≤ 114.002 .002 .Chapter 8: Tests of Hypotheses Based on a Single Sample 12.08(1. 13.33) = .12) = . whereas β (9. row of Table A. ( ) ( ) Section 8.65) = . Since µ = 9.f.1 represent equally serious departures from Ho .15 < z < .8940 when µ = 10) = P( z ≥ 2.8940 < x < 10.500) = . 
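The two-sided z-test quantities above can be sketched with the standard normal cdf built from `math.erf` (a stand-in for Table A.3), using the cutoffs 10 ± .1032 and σ_x̄ = .2/√25 = .04 from the exercise:

```python
from math import erf, sqrt

# Sketch of the two-sided test Ho: mu = 10 vs Ha: mu != 10, rejecting when
# xbar >= 10.1032 or xbar <= 9.8968, with sigma_xbar = .2/sqrt(25) = .04.
def Phi(z):
    """Standard normal cdf, via the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

se = 0.2 / sqrt(25)                     # 0.04
alpha = Phi(-2.58) + (1 - Phi(2.58))    # ~ .005 + .005 = .01
# beta(10.1): xbar lands between the two cutoffs when mu = 10.1
beta = Phi((10.1032 - 10.1) / se) - Phi((9.8968 - 10.1) / se)
print(round(alpha, 4), round(beta, 4))  # 0.0099 0.5319
```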
because the 15 d.88 when z has a standard normal distribution) = 1 − Φ (1.75 when z ~ N(0.5 shows that t .05 = .697 ) = . α = P ( z ≤ −2.5596. σ x = .9) = P( −.004 = .15 = . α = P( t ≥ 3. α = P ( z ≥ 1. 1) = Φ (− 2. d.001.) =. = n – 1 = 23.733 when t has a t distribution with 15 d.15 < z < 5. a.05 + .2 15.3733 b. d.9 and µ = 10.003 c. a.006 + . b.88 )) = . and = P (t ≤ −2.75) = .1) = P( 9.9 = β 10.01) = .5040 .1004or ≤ 9.f. A similar result and comment apply to any other pair of alternative values symmetrically placed about 10. = 30. so P( x ≥ 10.Chapter 8: Tests of Hypotheses Based on a Single Sample 14.001.10 242 .004 16.1 .04 . so α c.0301 b.51or ≤ −2.88) = .01 β (10.f.697 ) + P(t ≤ −1.88) + (1 − Φ (2.f.1) = P (−5.01) = . one would probably want to use a test procedure for which β 9.1004 when µ = 10. a. α = Φ (− 2.01 α = P (t ≥ 1. 500  β (20.3 is 1.33 so reject Ho . 75 − 70   Φ  − 2.000 = 2. since z = −1.0052 a.1) = .500) : Φ  2.56 > 2. 1 .8 2 18. 1500 16 a.8413 1500 / 16   c.88 + 2.5 is not ≤ −2.05 : n =   = 142. Ho is rejected if c.3 − 75 = −1.88 +  = Φ (− .2 .33 +  = Φ (1.9 when µ = 76)  72.Chapter 8: Tests of Hypotheses Based on a Single Sample 17.33 .5 so 72.00 ) = .5398 9 /5   e.56) = .33)  n=  = 87.5 SD’s (of x ) below 75. 20.88 = Φ (− 2. so use n = 88  75 − 70  area under standard normal curve below –2.33 when µ = 76) = P( X < 72.33 .000 − 20.0020 2 f.645)  β (20. so use n = 143  20.500) = .44) = .0003 .000 − 20.33 + 1.  20.9 − 76  = Φ  = Φ(− 3. 72. z ≤ −2. α (76) = P( Z < −2.9   243 .4602 so β (70) = .95 .500  d. α = 1 − Φ (2.960 − 20. don’t reject Ho . 1500(2.  9(2. α= d.88) = . z= b. b. 27 . 23. 25 = 1.58.645.58 + 1. so reject Ho .32 − 95 = −2.5 .3    c.91) = . Reject Ho if either z ≥ 2. so don’t reject Ho b. t= 206. n−1 a.179.025.14 < -1. 21.14 > -2.699 . With Ho : µ = .46 .05. 244 . don’t reject Ho . 0 .005.645.5 we reject Ho if t > t α / 2. 
so we don’t reject the null hypothesis and thus continue with the purchase.27 is not < -2.6 < t . 29 = 1. It appears that the true average weight could be more than the production specification of 200 lb per pipe. so we reject Ho in favor of Ha. Ho : µ = 200 . – 2.025. 1.3 .3  0 . we reject Ho if z < -2.01.24 = -2.16 6 . 1. and Ha: µ > 200 we reject Ho if t > t . a. –3. x − 360 .58 −  = Φ (− . 95 − 94   2 20.73 − 200 6. z = -2.73 = = 5.58 −  − Φ − 2. a.6 > -t.179. There appears to be a 24. 1  1    β (94 ) = Φ 2.69 − 360 = 2.6 > -t. σ = 0. At a significance level of .80 > 1.58 or z ≤ −2. The test appears to 1. With Ho : µ = 750 .36 / 26 contradiction of the prior belief. so don’t reject Ho d. so use n = 22. t = t= 370.708 . Since –2. so we reject the null hypothesis and do not continue with the purchase.28)  n=  = 21. and Ha: µ < 750 and a significance level of .35 / 30 22. -1.708 . z = -2.12 = -2. and Ha: µ ≠ . reject Ho if t > t .24 > 1.05 .699 .20(2.2266 0 . n−1 or t < −t α / 2. substantiate the statement in part a.75) − Φ (− 5.58 .12 = 2.797. so n 94.3 z= b. b. we reject Ho if z < 1.33.9 < the negative of all t values in the df = 24 row. Thus Ho should be rejected.Chapter 8: Tests of Hypotheses Based on a Single Sample 19.33. s/ n Ho : µ = 360 vs. so don’t reject Ho c. Ha: µ > 360 .05. Using α = .423 (from the df 40 row of the t-table).77 is .25) + Φ (− 3. 245 .99 < −2.423 . so we reject Ho .075 Ho : (− . Ho is rejected if 73. a.28 . reject Ho if t > t . Ho is not rejected.05 and conclude that true average penetration exceeds 50 mils. The alloy is not suitable.58 + 2.01 .25 − 5. This requirement is not 84 / 5 satisfied.33 ≤ −2. so use n = 217. Ha: µ ≠ 3000 . We wish to test Ho : 28.33)  n=  = 216.86 − 20 Since z = = −1.58 −  .7155.58 +  + Φ − 2.5 .01 test.58 . reject Ho .776 . Since 5.9 / 42 which is not ≤ −2. which is not ≤ −1. − . Ho is not rejected.6 / 73 data does not strongly suggest that true average t ime is less than 20.645 .3(2. 25. 
41 ≈ −2.97 . 025. With µ = true average recumbency time. s 52.91) = .5 z= = −3.075    = 1 − Φ (1. the hypotheses are Ho : µ = 20 vs Ha: µ < 20 . 27.09 . The sample 8. .1 − 75 t ≤ −t . Ha: µ < 75 .6) = 1 − Φ  2.28 s/ n 18. Reject Ho if z ≥ 1. Since t = = −2.075  .13 . Since 3. t = t= 2887. 10 = −1.77 .776 . 26. so z = = 3. reject Ho at level .01. s/ n Ho : µ = 3000 vs. µ = 75 vs.58 or z ≤ −2. 4 = 2.7 − 50 = . reject Ho if either z ≥ 2. Ha: µ ≠ 5.7155 n ≥ 1.6 − 3000 = −2.Chapter 8: Tests of Hypotheses Based on a Single Sample 24. for a level . x − 20 The test statistic value is z = .105  .58 . µ = 5.1)   (− .645 .1   2 c.1)   1 − β (5. and Ho should be rejected if z ≤ − z.5 vs. x − 3000 . b. 5. (not specified in the problem description). 30. z= 11. so .Chapter 8: Tests of Hypotheses Based on a Single Sample 29. 246 .442 n 8 Ho is not rejected. t = = . 7 = 1.32 − 7 = −1.05 if t ≥ 1. s = 6.25 3. so from table A. n – 1 = 7.8 µ = 7 vs Ha: µ < 7 .17.3 .72 − 3. such as α= .1. Ho (prior belief) is not rejected (contradicted) at level . and n = 8. so a lower-tailed test is appropriate. Because -1. 6 7 31.43 1 Parameter of Interest: – 74 years.397 . The data does support the claim that average daily intake of zinc for males aged 65 .895 . b.397 . t = = −1. 6.50 − 4.05.24 .0 ) ≈ .645 significance. this does not exceed 1.00 1.442 . so Ho is rejected at level . β (4. s 1.65 / 9 −1. The hypotheses of interest are Ho : Ho should be rejected if not ≤ t ≤ −t .498 .17 < -1. and Since t .72 µ = true average dietary intake of zinc among males aged 65 µ = 15 Alternative Hypothesis: Ha: µ < 15 x − µ o x − 15 z= = s/ n s/ n Rejection Region: No value of α was given.01.05.645.3 − µ o 6. a.43 / 115 = −6. so select a reasonable level of z ≤ zα or z ≤ −1. For n = 8.50 = = . x = 11.895. d= µo − µ σ = 3.17 –6.25 n = 115.895 . so reject Ho . 2 Null Hypothesis: Ho : 3 4 5 = .40 .24 is 1.74 years falls below the recommended daily allowance of 15 mg/day. 
2 Null Hypothesis: Ho : µ = 100 3 Alternative Hypothesis: Ha: 4 t= = x − 100 s/ n s/ n t ≤ −2. σ = 7.375 − 100 t= = −. µ o − µ is negative so n (µ o − µ ) / σ → −∞ as n → ∞ . 34.70(.96 6 z= 7 pˆ − po p o (1 − p o ) / n = pˆ − .30 ) / 200 Reject Ho .201 or t ≥ 2. The data does not indicate that these readings differ significantly from 100. For an upper-tailed test. The arguments for a lower-tailed and towconsidering tailed test are similar. The desired conclusion follows since Φ (− ∞ ) = 0 .9213 6.375 .Chapter 8: Tests of Hypotheses Based on a Single Sample 32. β = 0.96 or z ≤ -1. Section 8.3 35.5.70 4 z= 5 either z ≥ 1.201 98. x = 98. n = 12. thus n ≈30.70 .1095 a. Ho : p = . From table A. ( ) ( ) β (µ o − ∆ ) = Φ zα / 2 + ∆ n / σ − Φ − z α / 2 − ∆ n / σ = 1 − Φ (− zα / 2 − ∆ n / σ ) + Φ (zα / 2 − ∆ n / σ ) = β ( µ o + ∆) 33.70 = −2.17. x − µo µ ≠ 100 Fail to reject Ho .469 . s = 6.70 Ha: p ≠ . 1 2 3 Parameter of interest: p = true proportion of cars in this particular county passing emissions testing on the first try.30) / n 124 / 200 − . 247 . df ≈ 29. The data indicates that the proportion of cars passing the first time on emission testing or this county differs from the proportion of cars passing statewide.Φ(c) = Φ(-c) ). Since in this case we are µ > µ o . 1 Parameter of Interest: µ = true average reading of this type of radon detector when exposed to 100 pCi/L of radon.1095 / 12 5 6 7 b.10. ( ] ) = β (µ ) = Φ zα + n (µ o − µ ) / σ . [ (since 1 .70(. 15(.10   2 c.147 = = 3.645 .4 .15) = Φ   = Φ (− .01 significance level). Since the z critical value for a significance level of .60 ) / 150 Reject Ho .01 = 361.667 . 248 .85 ) / 100   .645 .15 − . the conclusion would not change.60) / n 82 / 150 − .10 Ha: p > .04 . β (. 1 2 3 p = true proportion of all nickel plates that blister under the given circumstances.85)  2 n=  = 19.01. (at the .28 .40 .10 = 1.02 ) = .40 .15(.10(. 
37.40(.33 .58 6 z= 7 pˆ − po p o (1 − p o ) / n = pˆ − .40(.90) + 1.10(.90) / n 14 / 100 − .645 .40 4 z= 5 Reject Ho if z ≥ 2. 1 2 3 p = true proportion of the population with type A blood Ho : p = .90) / 100  β (.10 4 z= 5 Reject Ho if z ≥ 1.10 . so use n = 362 .40 Ha: p ≠ . When n = .90 ) / 200  200.60) = .05 is less than that of .15 + 1.10 − .10(. The data does not give compelling evidence for concluding that more than 10% of all plates blister under the circumstances. 15 + 1.85 ) / 200   1.58 or z ≤ -2.Chapter 8: Tests of Hypotheses Based on a Single Sample 36. Ho : p = .10 − .4920 .645 6 z= 7 pˆ − po p o (1 − p o ) / n = pˆ − .10(.90 ) / 100 Fail to Reject Ho .10(.2743 . . The possible error we could have made is a Type II error: Failing to reject the null hypothesis when it is actually true.15(. a. b. The data does suggest that the percentage of the population with type A blood differs from 40%.15) = Φ   = Φ (− . so .25 .05 .05) = Φ   = Φ (− 3.85) = . so Ho is rejected. 40. Ho cannot be rejected.95) / 500  β (. .0332 . Ho : p = .05(.02.58 .10 ) = Φ   = Φ(− 1.  . which is not ≤ -1. Let p denote the true proportion of those called to appear for service who are black.02 − . P = true proportion of current customers who qualify.02(.07 ≥ 2. only if Ho can be rejected will the inventory be postponed. so is p = .58 .95) / n .98 ) / 1000  β (.49 ) ≈ 1 .01(. . a.01 = -2.02(.33. with the rejection region z ≤ . b.25(. z = 1000 -1.25.99 ) / 1000   c.90 ) / 500   249 .05 it is highly unlikely that Ho will be rejected and the inventory will almost surely be carried out.58. so the inventory should be carried out.01 + 1. pˆ = .10(.05(.30 ) = . Because – 1050 .05. Ho is rejected. and z = = −6. We wish to pˆ − .645 . We use z= 6.03 z= = 3.08 .95 ) / 1000   .25 z.01) = Φ   = Φ (5.05 vs Ha: p ≠ . With pˆ = 15 = . 
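The large-sample proportion test for the blistering example above, together with β(.15) and the sample size giving β(.15) = .10, can be sketched as follows. The value z_β = 1.2816 is taken from the standard normal table (the manual rounds it to 1.28; either way the result is n = 362):

```python
from math import ceil, erf, sqrt

# Sketch of Ho: p = .10 vs Ha: p > .10 with phat = 14/100, using the
# standard large-sample formulas for z, beta(p'), and the required n.
def Phi(z):
    return 0.5 * (1 + erf(z / sqrt(2)))

p0, n = 0.10, 100
se0 = sqrt(p0 * (1 - p0) / n)
z = (14 / 100 - p0) / se0               # 1.33: don't reject at level .05

pa = 0.15
beta = Phi((p0 + 1.645 * se0 - pa) / sqrt(pa * (1 - pa) / n))  # ~ .49

n_req = ceil(((1.645 * sqrt(p0 * (1 - p0)) + 1.2816 * sqrt(pa * (1 - pa)))
              / (p0 - pa)) ** 2)
print(round(z, 2), round(beta, 3), n_req)  # 1.33 0.493 362
```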
The lower-tailed test rejects Ho if z ≤ -1.645 .1686 .33.05(.25 vs Ha: p < .02 − .Chapter 8: Tests of Hypotheses Based on a Single Sample 38.01.05 − .1 < -2. The company’s premise is not correct.1 . Thus.0005 . A conclusion that discrimination exists is very compelling. a.1686 − .05 + 1. 39.00975 z= b.0134 test Ho : p = . reject Ho if z ≥ 2. We calculate pˆ = = .645.10 + 2.75) / n 177 .58 or z ≤ -2. We wish to test Ho : p = . .645.015 .98) / 1000  β (.02 vs Ha: p < . pˆ − . . d. The hypotheses are Ho : p = .07 while no larger R has α ≤ . Ha: p > .035(.05 along with smallest possible β). 10}) yields α = 1 – B(2.035.e. 20.61 . 20}. so n = 25 should be used. 20.035 vs Ha: p < .05. 17. Robots have not demonstrated their superiority. 42. Thus (15. so Ho can’t be rejected at level .10.1) = .3) = . c. and β(.01 = -2. . 10. Section 8.0082 z= is not rejected. .021 < . R = {3. ….05.05 test. Ho should be rejected whenever p-value < .196. .078 is not < . 250 .05) = .3) = B(2.05.50 (which states that more than 50% of all enthusiasts prefer gut). c = 5 yields α = 1 – B(4. For R = {14. α = P (15 ≤ X when p = . Ho 500 .965) / n 15 − . Because -. 20.05.05. (α ≤ . .6) = B(14. 20.052) Since 13 is not in R. so reject Ho b. 16. For n = 25. n}.33. 20) is the appropriate region. 25. so reject Ho .1) = .05 test and the region of a is the best level . .5) = 1 – B(14.874. P-value = .005 z. so this R does not specify a level . . 20.03 .10 test is specified by R = (14.058. z = = −.047 < . 20} (with α = . 25. With pˆ = = . but again β(. . The alternative of interest here is Ha: p > .10 vs.10.7) = B(4.33. The best level . 4.148 > . b. 18. 19. .043. Using α = .3) = B(4. .098 while β(.035 . with the rejection region z ≤ . .4 44. 43. α = . Ho is not rejected at this level. .05 test.8) = .3) = .238.Chapter 8: Tests of Hypotheses Based on a Single Sample 41. so the rejection region should consist of large values of X (an upper-tailed test). c = 3 (i. …. so reject Ho ( a close call).05.3) = .05. 
a.05.6) = . d. a. β(. c = 5 yields α = 1 – B(4. . so this is a level .8) = B(14. For n = 10. ….61 isn’t ≤ -2.090 ≤ . 10. …. c. Ho : p = . so R has the form {c. 15.383.1) = . so don’t reject Ho . For n = 20.001 < . We use pˆ − .10.021. e. however β(. 753. so Ho can’t be rejected at level . Thus a two-tailed p-value: 2(. . b. .4.5824 a.025 and . so .039 is not < .10. b.1586 e.032 < t=5 < 5.0778 1 − Φ( z) d.01. or .01 < p-value < .0358 b.4 | < 2.0005.201 < | -2.10. f.20 d. . 251 .6 | < 1.8 |.341 < | -1. c.001 < p-value < . p-value = . .0250 a. the p-value = P( t > -.Chapter 8: Tests of Hypotheses Based on a Single Sample 45.0802 c.0066 e. In each case the p-value = a. 4. . . p-value = . 47.8)] < 2(.05 d. A two-tailed p-value = 2[ P(t < -4.10 e. 084 < .6) < .10 < p-value < .10).05: . In the df = 8 row of table A.498 >> . so . so P(t < -4. 2.1841 c. . so don’t reject Ho .025. 3. 46. .005 f.6) < . 48.4) > .001. With an upper-tailed test and t = -. so reject Ho at level .05. a. e.05.551 < | -4. 0 b.306.05 < P( t < -1.860 and 2. so Ho cannot be rejected. 1.084 > . .8) < .893. so . so don’t reject Ho .4562 d.001 = α.003 < . .025 < p-value < .5.05 < P( t < -1. or p-value < .05 = α.218 > . t = 2.0005).0 is between 1.718. p-value = . .50. so the p-value is between . so reject Ho .10. c. 035 n ≤ −2. so p-value = 1 − Φ .761 : 3.8 is not > 2.0002) = .79) =.63 .33(. The p-value is greater than the level of significance α = . For this lower (. We can use a large sample test if both np 0 ≥ 10 and n (1 − p0 ) ≥ 10 .896. At level .2 > 1. which is not < .5. an upper-tailed test is appropriate. α=.035 . Since s − . An upper-tailed test a.18 c.50) .51.0516)) / . reject Ho if either z ≥ 2. so the relevant hypotheses are Ho : µ = 5. Df = 14.49). Because the p-value is rather large. so reject Ho . With p = . so Ho should be rejected.50) = . Here we might be concerned with departures above as well as below the specified weight of 5. 
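The "best level .05 test" search above can be automated: scan the possible upper-tailed regions {c, …, 20} for X ~ Bin(20, .5) and keep the smallest c whose type I error probability stays at or below .05, then evaluate β(.6) there. A minimal sketch:

```python
from math import comb

# Sketch of choosing the best level-.05 upper-tailed rejection region
# {c, ..., 20} for X ~ Bin(20, .5), as in the exercise above.
def binom_cdf(k, n, p):
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

n = 20
c = min(x for x in range(n + 1) if 1 - binom_cdf(x - 1, n, 0.5) <= 0.05)
alpha = 1 - binom_cdf(c - 1, n, 0.5)   # attained significance level
beta_6 = binom_cdf(c - 1, n, 0.6)      # beta(.6) = B(c-1; 20, .6)
print(c, round(alpha, 3), round(beta_6, 3))  # 15 0.021 0.874
```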
the p-value = Φ(z) = Φ(-.01.9974 [ ] p = proportion of all physicians that know the generic name for methadone.71 . so fail to reject Ho at any significance level. The data does not indicate a difference in average serum receptor concentration between pregnant women and all other women.79) = . For testing Ho : p = .0 . so don’t reject Ho . = 2. therefore fail to reject Ho that µ = 5.166 .039 z = 102 = = −.050 47 102 102 tailed test.Chapter 8: Tests of Hypotheses Based on a Single Sample 49.0002 corresponds to z = -3. 51.01.3 + 2. 252 .13 = .58 . t . pˆ = . t . 102(. so no modification appears necessary.50 vs Ha: p < .50. p-value < 2(. p-value > .01.761.0645 = 1 − Φ(− 2. 53. so we can proceed.0.71 is “off” the z-table. The computed Z is z = .5) = 1 − Φ (− . = 1.01.05. a. which is .2148.05 . 1.79 . Df = 23. so 47 − .2.0004 (.58 .97 = .01.97.01. Ho : p = . z = = −3. 52.14 b. 50. so we do not reject H0 at significance level . We will reject H0 if the p-value < . Ho would ( ) not be rejected at any reasonable α (it can’t be rejected for any α < .50 − .50.166).58 or z ≤ −2.2 vs Ha: p > .896 .0 vs Ha: µ ≠ 5. 1 − β (. Because 3. 50)(. b. The p-value = 2[P( t > 1.295 .0711. µ = the true average percentage of organic matter in this type of soil.6) ˜ . The computed summary statistics are x = 27.8.Chapter 8: Tests of Hypotheses Based on a Single Sample 54.01 and . and Ho should be rejected if = 1. 55.8. there is not enough evidence to conclude that the spectrophotometer needs recalibrating. which is less 1.5 − 70 5. P-value = P( t > 3. µ = true average reading.05.10. which is = . d. 56. At significance level . we would not have rejected H0 .5 t= = = = 1.05. so we would reject Ho . P( t > 1. so we would not reject Ho .759 )] = 2( . 57.082. we use the t test: x − 3 2.and a two-tailed p-value of . we fail to reject Ho at levels . which is not = .60 . With Ho : µ = .8) = . From table A.92 )] 2.92 .3) = .045.001.05. and the hypotheses are Ho : µ = 3 vs Ha: µ ≠ 3 .10.12 s 2.759 . 
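A t-based p-value like the one bracketed from Table A.8 above (t = 1.92 with 5 df, two-tailed p ≈ .116) can instead be approximated numerically. This is a hedged sketch: the density and Simpson's-rule integrator are ours, not anything in the manual:

```python
from math import gamma, pi, sqrt

# Approximating P(T > t) for a t distribution by integrating its density,
# illustrated with t = 1.92 and df = 5 from the two-tailed example above.
def t_pdf(x, df):
    coef = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def t_tail(t, df, upper=60.0, steps=20000):
    """P(T > t) by Simpson's rule on [t, upper]; the tail beyond is negligible."""
    h = (upper - t) / steps
    s = t_pdf(t, df) + t_pdf(upper, df)
    for i in range(1, steps):
        s += (4 if i % 2 else 2) * t_pdf(t + i * h, df)
    return s * h / 3

p_one = t_tail(1.92, 5)   # between .05 and .10, as Table A.8 indicates
p_two = 2 * p_one         # two-tailed p-value
print(round(p_one, 3), round(p_two, 3))
```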
df = 5.559 n than .60 vs Ha: µ ≠ . With n = 30. p-value = 2[P(t> 1. 253 . P-value = P( t > 2.017.01. Ho : µ = 70 vs Ha: µ ≠ 70 . since .082 = . At significance level . There is not enough evidence to say that the pens don’t satisfy the design specifications. c.041 ) . At significance level .559 and t = = 1. and x − 70 75.05.05 ( thus concluding that the amount of impurities need not be adjusted) .05.295 s/ n = . so Ho is rejected at level .86 s/ n 7/ 6 ˜ 2(.116.88 .058) = . µ = 10 vs Ha: µ < 10 a. but we would reject Ho at level .619 . which gives strong evidence to support the alternative hypothesis. µ = 25 vs Ha: µ > 25 .88) ˜ .782 . The data indicates that the pens do not meet the design specifications.05.481 − 3 − . P-value = P( t > 1.041. 58. so The hypotheses to be tested are Ho : t ≥ t . we would reject H0 and conclude that the true average percentage of organic matter in this type of soil is something other than 3.10 (and conclude that the amount of impurities does need adjusting).519 t= = = = −1. The appropriate hypotheses are Ho : b. and assuming normality.923 = 1.923 . s = 5. From table A. 2 z ≥ z. a statistically significant result is highly likely to appear. the test is too likely to detect small departures from Ho . which is “off the z table. this value of z is quite statistically significant. and 5 for the four n’s.9793. so n = 50 is too large.2 in favor of Ha: µ ≠ 3. . . and . a.2 − 3 . 1.1049 for β is 1 − Φ  − 2.4073 . (  − .5 59. 10. .8554.4   n = 900. Because n = 50 is large. .4325. No. The formula for  n  . 2500. respectively. By . 40.3.0002.33 +  9. Even when the departure from Ho is insignificant from a practical point of view.95 when d = inspection.12 is ≤ −1. and use Table A.   . so that for any n a t test is appropriate. Z = -5. c. Here z = . 2.12 . Supplementary Exercises 61.20 z= = −3. The computed z value is 3. n = 20 satisfies this requirement.01 + .4213. we use a z test here.667.25. 
rejecting Ho : µ = 3.96 .25.Chapter 8: Tests of Hypotheses Based on a Single Sample Section 8. We wish π (3) = . the reasoning is the same as in 54 (c).0944.000. Ho should be rejected in favor of Ha.5.” so p-value < .000.01 n + .96 or z ≤ −1. .96 . and 0 for n = 100. .17 to determine n.8980 for n = 100. a. No. Here 60. c. b. 254 3 .0000. whence p-value = . and 90. respectively.000. which gives .34 / 50 if either 62.0014 for n = 2500. .3 = .05 − 3. Here we assume that thickness is normally distributed.9320 / n   − . 025 = 1. b. Since –3.025 n which equals .9320  = Φ β = Φ   .1056.4073 / n    )  = .0062. The data does not indicate the mean heat-flux for pots covered with coal dust is greater than for plots covered with grass.9 .5300 Parameter of interest: µ = true average heat-flux of plots covered with coal dust 2 Ho : µ = 29.10 (actually ≈ . we would reject the null hypothesis at any reasonable significance level.10 ) 65. 255 .05. The population sampled was normal or approximately normal. Ho will be = 2. a.0 = . b. The relevant hypotheses are Ho : 64.5 30 / 16 d. n = 8.33 7. n −1 or t ≥ 1.10. The test statistic 587 − 548 39 = = 12.895 1 6 7 t= 30. so Ho should be rejected at level .05 and .10. This clearly falls into the upper tail of the 3.0 x − 29. which includes both .15 = 1. 66.10 µ = 548 vs Ha: µ ≠ 548 .05.2 vs Ha: µ ≠ 3.228 .228 or t ≤ −t .15) a. a. With a p-value of .Chapter 8: Tests of Hypotheses Based on a Single Sample 63. At level .05. t= 2160 − 2150 10 = = 1. Ho : µ = 2150 vs Ha: µ > 2150 b. p-value > .7875.2 gives a p-value of roughly .7742 6. or any other reasonable level). so Ho cannot be rejected at this significance level.0 3 Ha: µ > 29. µ = 3.02 10 / 11 two-tailed rejection region.025. x = 30.10 = −2. t= x − 2150 s/ n c.7875 − 29.30.53 / 8 Fail to reject Ho . Since e. From d. t . rejected if either value is t= t ≥ t . Ho : b.0 4 t= s/ n 5 RR: t ≥ tα .2 (Because Ha: µ > 3.025. s = 6.341 . p-value > . p-value = 1 − Φ (. 69.5 .10. d = . 
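The point made above, that β shrinks as n grows so even practically insignificant departures from Ho become "statistically significant", can be sketched with the two-tailed β formula β(µ0 + Δ) = Φ(z_{α/2} − Δ√n/σ) − Φ(−z_{α/2} − Δ√n/σ). Here Δ = .05 and σ = .30 are illustrative values only, not taken from the text:

```python
from math import erf, sqrt

# Sketch of the two-tailed beta expression as a function of n.
# Delta and sigma below are illustrative assumptions, not from the manual.
def Phi(z):
    return 0.5 * (1 + erf(z / sqrt(2)))

def beta(n, delta=0.05, sigma=0.30, z_half=1.96):
    lam = delta * sqrt(n) / sigma
    return Phi(z_half - lam) - Phi(-z_half - lam)

# beta falls toward 0 as n grows: with huge n, even a tiny departure
# from Ho is almost certain to produce a significant result.
for n in (100, 2500, 10000):
    print(n, round(beta(n), 4))
```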
1 2 Parameter of interest: women.043 > .33 > . Range 5 mg to 1. a. when µ = 9.0 . From Table A. it appears to be skewed to the right. The relevant hypotheses are Ho : µ = 9.75 .17.05.33 235 / 47 Fail to reject Ho . fail to reject Ho because . reject Ho because . The condition is not met. Summary statistics are n = 20.25 . but not at the .17. At the . and s = .8525 − 9. Thus the data contradicts the design specification that sprinkler activation is less than 25 seconds at the level .01.625.25. when b.20 .44) = . Ho : µ = 200 3 Ha: 4 5 6 7 68. from which the p-value = . the distribution does not appear to be normal. because .60 .0965 / 20 Ha: output). d = . x = 9. With such a small p-value. d = 1.Chapter 8: Tests of Hypotheses Based on a Single Sample 67.0965. µ = true daily caffeine consumption of adult µ = 9.282 or if p-value ≤ . b.8525 . This sample size is large enough so we can conduct a hypothesis test about the mean.05 significance level. From table A.10 215 − 200 z= = . n ≈ 28 A normality plot reveals that these observations could have come from a normally distributed population. therefore a t-test is appropriate. and β ≈ .75 .176 mg. 70. which leads to a 9.0001.75 test statistic t = = 4.01 level.043 < . µ > 200 x − 200 z= s/ n RR: z ≥ 1. (From MINITAB . It is not necessary to assume normality if the sample size is large enough due to the central limit theorem.05. s = 235 mg. df = 9.625. df = 9. N = 47. At the level .75 vs µ > 9. and β ≈ .44 . The data does not indicate that daily consump tion of all adult women exceeds 200 mg. β = . 256 .01. x = 215 mg. No. a. the data strongly supports the alternative hypothesis. Since 1.70 =&1. With p o = .10 (since for a two-tailed test.96 . we could reject Ho .75. Then the appropriate hypotheses are Ho : p = . so z = = 1.08 . do not reject Ho . 02 vs Ha: λ>4 using the test statistic z= x −4 . Let p = the true proportion of mechanics who could identify the problem.28) = .02 .001. z = -3. 75.01333 16 = 1.08 .70 .75 >. 
z 800 . We assume that the population from which the sample was taken was approximately normally distributed. Yes.02 − .01333(.89 − 1.05.05 = α / 2 . For the given 4/n 160 4.33 is not ≥ 2. the data does A t test is appropriate. the data argues strongly against Ho . . so a lower-tailed test should be used. µ = 1. a.0202 ). we reject 36 4 / 36 =&2. Even though the underlying distribution may not be normal.05] = .0005 .645)] = 2[. At level .75 if the p-value 1. The computed z is z = 3107 − 3200 = −3.10 . Ho : not contradict prior research.96 or z ≤ −1.33 .05) = . There is not evidence that the incidence rate among prisoners differs from that of the adult population. we fail to reject the null hypothesis. 72.75 and pˆ = 42 = . P-value = 2[1 − Φ (1.75 vs Ha: p < . Because this 72 p-value is so small.05 (since 1 − Φ (2. The computed t is t = = 1.02. n = 36 and Ho if λ=4 x= should not be rejected at this level. Thus.444 . 05) = . we reject Ho if either z ≥ 1. Ho sample. Ho : µ = 3200 should be rejected in favor of Ha: µ < 3200 if z ≤ − z.32 ≤ −3. b. a z test can be used because n is large.Chapter 8: Tests of Hypotheses Based on a Single Sample 71.708 = t .05 .10 < . so Ho 188 / 45 should be rejected at level . 257 . since .20. which is not in either With pˆ = = . 73.). 74. We wish to test Ho : z ≥ z.583 .28 and P = Φ (− 3.75 is rejected in favor of Ha: µ ≠ 1. 25 . 001 = −3.025. Because 1.645 . . .444 − 4 = 4. The possible error we could have made is a type II. so we reject it in favor of Ha.42 / 26 P =&2(. 1 75 vs Ho : p ≠ 75 .98667 ) 800 With Ho : p = 1 rejection region. 390 column of Table A. χ . so Ho is rejected in favor of the t= conclusion that the true average time exceeds 15 minutes. Ho : σ 2 = .5 − 15 2. so θˆ = X + 2. that captures 9(. The estimated n 2n s standard error (standard deviation) is 1.4289 . and this exceeds . More than 99% of all soil samples have pH less than 6. The test statistic value is not 78. 
and we can assume the distribution is approximately normal. E (X + 2.33 V ( S ) = + 5.11 is .33E ( S ) = µ + 2. 20 (Ho rejected at level . Since z= − . Thus .75 iff the 95th percentile is less than 6. Ho cannot be rejected.591 = χ . The 20 df row of Table A.25 2 upper-tail area .33σ < 6.8.927 .f.25 vs Ha: σ 2 > .58 (Ho not rejected at level .75. a.01 < p-value < . Because 12.025 and Ho cannot be rejected at level .25 . Thus the p-value is “off the chart” in the 20 df s / n 2. V ( X + 2. Ho clearly cannot .01 (the p-value is the smallest alpha at which rejection can take place.665 .33σ = 6.75 . n 2 c.58) = 12. Because the sample size is less than 40. 79.75. Ho : µ = 15 vs Ha: µ > 15 .7 shows that and The uniformity specification is not contradicted. The chi-squared critical value for 9 d.01 is 21. ≥ 21. The 95th percentile does not appear to exceed 6.2975.0385 be rejected.33S is approximately unbiased.2 / 32 .33σ .047 < 0 .26 < 8.58 < 9.33 . and so is approximately 0 < .33S ) = V ( X ) + 2. Ho will be rejected at level .33S ) = E ( X ) + 2 .4 .299.665. 258 .025).75 vs Ha: µ + 2. the appropriate statistic is x − 15 17.5 = = = 6. Thus we wish to test Ho : µ + 2. 77.01) 8. 20 = 8.01 if z ≤ 2.01).05 .11 . σ2 σ2 b.Chapter 8: Tests of Hypotheses Based on a Single Sample 76. This happens when γ > α − γ . is the area under that the Z curve above the interval (z α −γ (z γ + λ . 2n or ≤ χ12−α / 2. since x is large) Xi suggest that H o be rejected in favor of Ha.v. then we wish to know when π (µ o + ∆ ) = 1 − Φ( zγ − λ ) σ + Φ (− z α −γ − λ ) > 1 − Φ( zγ + λ ) + Φ (− z α −γ + λ ) = π ( µ o − ∆ ) .. 2n gives a µo test with significance level α . β (µ ) = P ( X ≥ µ o + Z ≥ z γ or Z ≤ z α −γ ) (when Z is a standard normal r. this inequality becomes Φ (z γ + λ ) − Φ(z γ − λ ) > Φ(z α −γ + λ ) − Φ (zα −γ − λ ) . The test statistic value is Ho is rejected if 2∑ 2(737 ) = 19. a.Chapter 8: Tests of Hypotheses Based on a Single Sample 80. Using the fact Φ (− c ) = 1 − Φ (c ) .e. µo a. 
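The chi-squared test statistic for a variance from the uniformity example above is quick to verify. The critical value 21.665 is χ²₍.01,9₎ as read from Table A.7:

```python
# Sketch of the chi-squared variance test from the uniformity example:
# Ho: sigma^2 = .25 vs Ha: sigma^2 > .25 with n = 10 and s = .58.
n, s, sigma0_sq = 10, 0.58, 0.25
chi_sq = (n - 1) * s**2 / sigma0_sq      # (n-1)s^2 / sigma_0^2
crit = 21.665                            # chi-squared_{.01, 9}, Table A.7
print(round(chi_sq, 2), chi_sq >= crit)  # 12.11 False -> Ho not rejected
```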
large test statistic values (large Σ xi . The sample data does not suggest that true average lifetime is less than the previously claimed value. σzγ n orX ≤ µ o − σzα −γ n when the true value is µ) = µ −µ µ −µ    Φ  z γ + o  − Φ − zα −γ + o  σ/ n  σ / n    ∆ c. but when z γ < zα − γ .s.260 .) = Φ (− zα −γ ) + 1 − Φ (z γ ) = α − γ + γ = α . P(type I error) = P(either b. 20 = 8. 81.h. b.01. i. Xi has a chi-squared distribution with df = 2n.s.65 is not in the rejection µo region.299. 2n . If the alternative is Ha: µ < µ o . 75 Xi ≤ χ . 259 . rejecting when X 2 ∑ i ≤ χ 12−α . is the area above − λ . When Ho is true.h. If µo the alternative is Ha: µ > µ o . when γ > α / 2 . The rejection region for Ha: µ ≠ µ o is either µo X 2 ∑ i ≥ χ α2 / 2. Both intervals have width 2 λ . the first interval is closer to 0 (and thus corresponds to the large area) than is the second. Let λ = n . z γ − λ ) .65 . while the r. so Ho should not be rejected. The l. so rejecting when 2 ∑ ≥ χ α2. 2n gives a level α test. Clearly 19. z α −γ + λ ) . Ho : 2λ oΣX i = 2∑ µ = 75 vs Ha: µ < 75 . At level . 5 0.30 0. as well as the associated ß(p’) are in the table below.0 0. 1. so the region (0. P’ 0.20 0.70 0.9) = . ß(p’) seems to be quite large for a great range of p’ values. so whenever 10 is in the rejection region.99 Beta 0. b.9900 1.9 1.349 .4 0. 5) does specify a level .1 0. and the sketch follows.7 p prime 260 0. α ≥ . 341.3 0.01 test. The first value to be placed in the upper-tailed part of a two tailed region would be 10.6206 0. α = P( X ≤ 5 when p = .0 .002.0 0.8696 0.0000 0.0071 0.6 0.10 0. but P(X = 10 when p = . .5 0.0 0. Using the two-tailed formula for ß(p’) on p.50 0.3594 0.80 0.0505 0.9) = .9) = B(5.8 0.1635 0. The values of p’ we chose.40 0.0000 Beta 1.60 0. 10.90 0.0000 0.01 0. c. a.349.2 0.Chapter 8: Tests of Hypotheses Based on a Single Sample 82.0000 0. …. we calculate the value for the range of possible p’ values. 
82. a. The first value that could be placed in the upper-tailed part of a two-tailed rejection region is 10, but P(X = 10 when p = .9) = .349, so any region containing 10 has α ≥ .349. The lower-tailed region {0, 1, …, 5} has α = P(X ≤ 5 when p = .9) = B(5; 10, .9) = .002, so the region (0, …, 5) does specify a level .01 test.
b.–c. Using the two-tailed formula for β(p′) on p. 341, we calculate β(p′) for p′ = .01, .10, .20, …, .80, .99. The computed values (among them .0071, .0505, .1635, .3594, .6206, and .8696) climb steadily toward 1 as p′ approaches .9, and the sketch of β(p′) vs. p′ shows that β(p′) is quite large for a great range of p′ values.

CHAPTER 9

Section 9.1

1. a. E(X̄ − Ȳ) = E(X̄) − E(Ȳ) = 4.1 − 4.5 = −.4, irrespective of sample sizes.
b. V(X̄ − Ȳ) = V(X̄) + V(Ȳ) = σ1²/m + σ2²/n = (1.8)²/100 + (2.0)²/100 = .0724, so the s.d. of X̄ − Ȳ is √.0724 = .2691.
c. A normal curve with mean and s.d. as given in a and b (because m = n = 100, the CLT implies that both X̄ and Ȳ have approximately normal distributions, so X̄ − Ȳ does also). The shape is not necessarily that of a normal curve when m = n = 10, because the CLT cannot be invoked with such small samples. So if the two lifetime population distributions are not normal, the distribution of X̄ − Ȳ will typically be quite complicated.

2. The test statistic value is z = (x̄ − ȳ)/√(s1²/m + s2²/n), and Ho will be rejected if either z ≥ 1.96 or z ≤ −1.96. We compute z = (42,500 − 40,400)/√(2200²/45 + 1900²/45) = 2100/433.33 = 4.85. Since 4.85 > 1.96, reject Ho and conclude that the two brands differ with respect to true average tread lives.

3. The test statistic value is z = ((x̄ − ȳ) − 5000)/√(s1²/m + s2²/n), and Ho will be rejected at level .01 if z ≥ 2.33. We compute z = ((42,500 − 36,800) − 5000)/√(2200²/45 + 1500²/45) = 700/396.93 = 1.76, which is not ≥ 2.33, so we don't reject Ho and conclude that the true average life for radials does not exceed that for the economy brand by more than 500.
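The large-sample z statistic used in Exercises 2 and 3 above is easy to verify numerically. A minimal sketch in Python, using the Exercise 2 data (the function and variable names are our own, not from the text):

```python
import math

def two_sample_z(xbar, ybar, s1, s2, m, n, delta0=0.0):
    """Large-sample z statistic for Ho: mu1 - mu2 = delta0."""
    se = math.sqrt(s1**2 / m + s2**2 / n)
    return ((xbar - ybar) - delta0) / se

# Exercise 2: radial vs. economy-brand tread lives, m = n = 45
z = two_sample_z(42500, 40400, 2200, 1900, 45, 45)
print(round(z, 2))  # 4.85, well beyond +/-1.96, so Ho is rejected
```

Passing `delta0=5000` with the Exercise 3 data reproduces z = 1.76 the same way.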
4. a. From Exercise 2, the 95% C.I. is (x̄ − ȳ) ± 1.96√(s1²/m + s2²/n) = 2100 ± 1.96(433.33) = 2100 ± 849.33. In the context of this problem situation, the interval is moderately wide (a consequence of the standard deviations being large), so the information about μ1 and μ2 is not as precise as might be desirable.
b. From Exercise 3, the 95% upper bound is 5700 + 1.645(396.93) = 5700 + 652.95 = 6352.95.

5. a. Ha says that the average calorie output for sufferers is more than 1 cal/cm²/min below that for nonsufferers. With √(σ1²/m + σ2²/n) = √((.2)²/10 + (.4)²/10) = .1414, the test statistic is z = ((x̄ − ȳ) − (−1))/.1414 = −2.90. At level .01, Ho is rejected if z ≤ −2.33; since −2.90 ≤ −2.33, reject Ho.
b. P = Φ(−2.90) = .0019.
c. β = 1 − Φ(−2.33 + (−1 − (−1.2))/.1414) = 1 − Φ(−.92) = .8212.
d. m = n = (2.33 + 1.28)²(.04 + .16)/(−1.2 − (−1))² = 65.15, so use 66.

6. a. 1. Parameter of interest: μ1 − μ2 = the true difference of means for males and females on the Boredom Proneness Rating; let μ1 = men's average and μ2 = women's average. 2.–4. Ho: μ1 − μ2 = 0 vs. Ha: μ1 − μ2 > 0. 5. z = ((x̄ − ȳ) − 0)/√(s1²/m + s2²/n). 6. RR: z ≥ 1.645. 7. z = (10.40 − 9.26)/√(4.83²/97 + 4.68²/148) = 1.83 ≥ 1.645, so reject Ho. The data indicates the Boredom Proneness Rating is higher for males than for females.
b. If n were only 32 instead of a large sample, it would no longer be appropriate to use the large-sample test; a small-sample t procedure should be used (Section 9.2).

7. a. With standard error .3539, z = (18.12 − 16.87)/.3539 = 3.53 ≥ 2.33, so Ho is rejected at level .01.
b. β(1) = Φ(2.33 − 1/.3539) = Φ(−.50) = .3085.
c. Solving the sample-size requirement yields n = 37.06, so n = 38 should be used.

8. a. The distribution should be symmetric if it were normal, and we would expect most of the data within 2 standard deviations of the mean; but 2 s.d.'s above the mean is roughly 98, while the distribution stops at zero on the left. The distribution is positively skewed.
b. Normality is not required here: both samples are large, so the CLT justifies the z procedures. We will calculate a 95% confidence interval for μ: x̄ ± 1.96(s/√n) = 19.9 ± 1.96(39.1/√60) = 19.9 ± 9.9 = (10.0, 29.8).
c. x̄ − ȳ = 19.9 − 13.7 = 6.2, and z = 6.2/√(39.1²/60 + 15.8²/60) = 1.14. The p-value = 2[P(Z > 1.14)] = 2(.1271) = .2542, which is larger than any reasonable α, so we do not reject H0. There is no significant difference.

9. 1. Parameter of interest: μ1 − μ2 = the true difference of mean tensile strength of the 1064 grade and the 1078 grade wire rod; let μ1 = 1064 grade average and μ2 = 1078 grade average. 2.–4. Ho: μ1 − μ2 = −10 vs. Ha: μ1 − μ2 < −10. 5. z = ((x̄ − ȳ) − (−10))/√(s1²/m + s2²/n). 6. RR: p-value < α. 7. z = ((107.6 − 123.6) − (−10))/√(1.3²/129 + 2.0²/129) = −6/.210 = −28.57. For a lower-tailed test, the p-value = Φ(−28.57) ≈ 0, which is less than any α, so reject Ho. There is very compelling evidence that the mean tensile strength of the 1078 grade exceeds that of the 1064 grade by more than 10.
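The striking z value in Exercise 9 can be reproduced directly; a short Python check (variable names are ours):

```python
import math

# Exercise 9: test Ho: mu1 - mu2 = -10 vs. Ha: mu1 - mu2 < -10
xbar, ybar, s1, s2, m, n = 107.6, 123.6, 1.3, 2.0, 129, 129
se = math.sqrt(s1**2 / m + s2**2 / n)   # about .210
z = ((xbar - ybar) - (-10)) / se
print(round(z, 2))  # -28.57; the lower-tail p-value is essentially 0
```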
10. The interval is fairly narrow, indicating precision of the estimate. We can state that when using this method with repeated sampling, the interval calculated will bracket the true difference 95% of the time.

11. The 99% C.I. is (x̄ − ȳ) ± 2.58√(s1²/m + s2²/n) = −8.77 ± 2.46 = (−11.23, −6.31). With 99% confidence we may say that the true difference between the average 7-day and 28-day strengths is between −11.23 and −6.31 N/mm².

12. With σ1 = σ2 = .05 and d = .04, the required common sample size is m = n = (σ1² + σ2²)(zα + zβ)²/d² = (.0025 + .0025)(zα + zβ)²/.0016, rounded up to the next integer.

13. The appropriate hypotheses are Ho: θ = 0 vs. Ha: θ < 0, where θ = 2μ1 − μ2. (θ < 0 is equivalent to 2μ1 < μ2, i.e. the second mean is more than twice the first.) The estimator of θ is θ̂ = 2X̄ − Ȳ, with Var(θ̂) = 4Var(X̄) + Var(Ȳ) = 4σ1²/m + σ2²/n; σθ̂ is the square root of this, and σ̂θ̂ is obtained by replacing each σi² with si². The test statistic is then z = θ̂/σ̂θ̂ (since θo = 0). With θ̂ = 2(2.69) − 6.35 = −.97 and σ̂θ̂ = .9236 (m = 43, n = 45), z = −1.05. Because −1.05 > −2.33, Ho is not rejected at level .01.

14. The standard error of each sample mean is si/√ni, so σ1²/m + σ2²/n is estimated by (SE1)² + (SE2)². Substitution into (X̄ − Ȳ) ± zα/2√(s1²/m + s2²/n) yields (x̄ − ȳ) ± zα/2√((SE1)² + (SE2)²).

15. a. For n = 100, z = 1.41 and the p-value = 2[1 − Φ(1.41)] = .1586, so Ho is not rejected. For n = 400, the same sample difference gives z = 2.83 and p-value = .0046, so Ho is rejected. The very small difference in sample averages has been magnified by the large sample sizes — statistical rather than practical significance.
b. The p-value by itself would not have conveyed this message.

16. β = Φ(zα − (μ1 − μ2 − Δo)/σ), where σ = √(σ1²/m + σ2²/n). As either m or n increases, σ decreases, so (μ1 − μ2 − Δo)/σ increases (the numerator is positive), so zα − (μ1 − μ2 − Δo)/σ decreases, and thus β decreases. As β decreases, zβ increases, and since zβ appears in the numerator of the sample-size formula, n increases also.

17. Using ν = (s1²/m + s2²/n)² / [(s1²/m)²/(m − 1) + (s2²/n)²/(n − 1)]:
a. ν = (5²/10 + 6²/15)² / [(5²/10)²/9 + (6²/15)²/14] = 24.01/1.106 = 21.7, so use ν = 21.
b.–d. Similar computations give ν ≈ 26, ν ≈ 18, and ν ≈ 17 for the other three cases, each time rounding the computed value down.

Section 9.2
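The Welch–Satterthwaite degrees-of-freedom formula used in Exercise 17 (and throughout Section 9.2) is mechanical enough that a small helper is worth having. A sketch in Python, checked against part (a); the function name is our own:

```python
def welch_df(s1, s2, m, n):
    """Welch-Satterthwaite approximate df for the two-sample t procedures."""
    v1, v2 = s1**2 / m, s2**2 / n
    return (v1 + v2) ** 2 / (v1**2 / (m - 1) + v2**2 / (n - 1))

# Exercise 17(a): s1 = 5, m = 10; s2 = 6, n = 15
nu = welch_df(5, 6, 10, 15)
print(round(nu, 1))  # 21.7, rounded down to 21 before entering the t table
```

As the text notes, ν is always rounded *down*, which keeps the procedure conservative.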
18. With Ho: μ1 − μ2 = 0 vs. Ha: μ1 − μ2 ≠ 0, the computed t = 6.17 leads to a p-value of 2[P(t > 6.17)] < 2(.0005) = .001, so Ho is rejected at any reasonable level; there does appear to be a difference.

19. Let μ1 = the true average gap detection threshold for normal subjects and μ2 = the corresponding value for CTS subjects. With Ho: μ1 − μ2 = 0 vs. Ha: μ1 − μ2 < 0, we will reject Ho if t ≤ −t.01,10 = −2.764. The test statistic is t = (1.71 − 2.53)/se = −2.46; since −2.46 is not ≤ −2.764, we fail to reject Ho. We have insufficient evidence to claim that the true average gap detection threshold for CTS subjects exceeds that for normal subjects.

20. The computed t falls in the rejection region, so we reject Ho and conclude that there is a difference in the densities of the two brick types.

21. Let μ1 = the true average strength for wire-brushing preparation and let μ2 = the average strength for hand-chisel preparation. Since we are concerned about any possible difference between the two means, a two-sided test is appropriate: Ho: μ1 − μ2 = 0 vs. Ha: μ1 − μ2 ≠ 0. We need the degrees of freedom to find the rejection region: ν = (1.58²/12 + 4.01²/12)² / [(1.58²/12)²/11 + (4.01²/12)²/11] = 2.3964/.1672 = 14.33, which we round down to 14, so we reject Ho if |t| ≥ t.025,14 = 2.145. The test statistic is t = (19.20 − 23.13)/√(1.58²/12 + 4.01²/12) = −3.93/1.2442 = −3.159, which is ≤ −2.145, so we reject Ho and conclude that there does appear to be a difference between the two population average strengths.

22. a. Using Minitab to generate normal probability plots, we see that both plots illustrate sufficient linearity; therefore, it is plausible that both samples have been selected from normal population distributions. [Minitab output: high-quality fabric — x̄ = 1.5083, s = .4442, n = 24; poor-quality fabric — x̄ = 1.5875, s = .5303, n = 24; the Anderson–Darling tests are consistent with normality.]
b. [Comparative boxplot of extensibility (%) for the high-quality and poor-quality fabric samples.] The comparative boxplot does not suggest a difference between average extensibility for the two types of fabrics.
c. We test Ho: μ1 − μ2 = 0 vs. Ha: μ1 − μ2 ≠ 0. With s1²/m + s2²/n = .0433265, the degrees of freedom are ν = (.0433265)²/(.00017906) = 10.5, which we round down to 10, so we reject Ho if |t| ≥ t.025,10 = 2.228. The test statistic is t = −.08/√.0433265 = −.38, which is not ≥ 2.228 in absolute value, so we cannot reject Ho. There is insufficient evidence to claim that the true average extensibility differs for the two types of fabrics.

23. We want a 95% confidence interval for μ1 − μ2. Because the resulting interval is so wide, it does not appear that precise information about the difference is available.

24. A 95% confidence interval for the difference between the true firmness of zero-day apples and the true firmness of 20-day apples is (8.74 − 4.96) ± t.025,ν √(.66²/20 + .39²/20). We calculate ν = 30.83, so we use 30 df, and t.025,30 = 2.042. The interval is then 3.78 ± 2.042(.17142) = (3.43, 4.13). Thus, with 95% confidence, we can say that the true average firmness for zero-day apples exceeds that of 20-day apples by between 3.43 and 4.13 N.
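The Exercise 24 interval can be checked end to end in a few lines of Python (a minimal sketch; the critical value 2.042 is read from the t table for 30 df, and the variable names are ours):

```python
import math

# Exercise 24: 95% CI for (zero-day firmness) - (20-day firmness)
xbar, s1, m = 8.74, 0.66, 20
ybar, s2, n = 4.96, 0.39, 20
se = math.sqrt(s1**2 / m + s2**2 / n)   # 0.17142
t_crit = 2.042                           # t_.025 with 30 df, from Table A.5
lo = (xbar - ybar) - t_crit * se
hi = (xbar - ybar) + t_crit * se
print(round(lo, 2), round(hi, 2))  # 3.43 4.13
```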
25. Let μ1 = the true average potential drop for alloy connections and let μ2 = the true average potential drop for EC connections. Since we are interested in whether the potential drop is higher for alloy connections, an upper-tailed test is appropriate: Ho: μ1 − μ2 = 0 vs. Ha: μ1 − μ2 > 0. Using the SAS output provided, the test statistic, when assuming unequal variances, is t = 3.6362, with df = 37.5, and the p-value for our upper-tailed test is ½(two-tailed p-value) = ½(.0008) = .0004. Our p-value of .0004 is less than the significance level of .01, so we reject Ho. We have sufficient evidence to claim that the true average potential drop for alloy connections is higher than that for EC connections.

26. The approximate degrees of freedom are ν = (5.5²/28 + 7.8²/31)² / [(5.5²/28)²/27 + (7.8²/31)²/30] = 53.95, or about 54 (normally we would round down to 53, but using either 53 or 54 won't make much difference in the critical t value). The desired confidence interval is (91.5 − 88.3) ± t.025 √(5.5²/28 + 7.8²/31) = 3.2 ± 3.5. Since this interval contains 0, we can no longer conclude that the means are different if we use a 95% confidence interval, even though the sample means differ. From a practical point of view, the closeness of x̄ and ȳ suggests that there is essentially no difference between true average fracture toughness for type I and type II steels.

27. The desired interval is 18.9 ± 2.306(5.4674) = 18.9 ± 12.607 = (6.3, 31.5). Because 0 does not lie inside this interval, we can be reasonably certain that the true difference μ1 − μ2 is not 0 and, therefore, that the two population means are not equal. Calculating a confidence interval for μ2 − μ1 would change only the order of subtraction of the sample means; the standard error calculation would give the same result as before. The 95% interval estimate of μ2 − μ1 would be (−31.5, −6.3), just the negatives of the endpoints of the original interval. Since 0 is not in this interval, we reach exactly the same conclusion as before: the population means are not equal.
28. Let μ1 = the true average compression strength for strawberry drink and let μ2 = the true average compression strength for cola. A lower-tailed test is appropriate: Ho: μ1 − μ2 = 0 vs. Ha: μ1 − μ2 < 0. The test statistic is t = −2.10; the approximate degrees of freedom work out to 25.3, so use df = 25, and the p-value ≈ P(t < −2.10) = .023. This p-value indicates strong support for the alternative hypothesis. The data does suggest that the extra carbonation of cola results in a higher average compression strength.

29. We test Ho: μ1 − μ2 = 10 vs. Ha: μ1 − μ2 > 10. The test statistic is t = ((x̄ − ȳ) − 10)/√(s1²/m + s2²/n) = 2.08; the p-value from Table A.8 is small enough that we reject H0 and conclude that the true average lean angle for older females is more than 10 degrees smaller than that of younger females.

30. a. Using 36 degrees of freedom (a more conservative choice than the computed ν, since there is no df = 37 row in the table), t.005,36 = 2.719, and the resulting 99% C.I. runs from 6.83 to 11.09. We are very confident that the true average load for carbon beams exceeds that for fiberglass beams by between 6.83 and 11.09 kN.
b. The upper limit of the interval in part a does not give a 99% upper confidence bound. A one-sided 99% bound uses t.01,36 = 2.434 instead, and shows that the true average load for carbon beams exceeds that for fiberglass beams by at least 7.98 kN.
31. a. [Comparative boxplot for the mid-range and high-range data.] The most notable feature of these boxplots is the larger amount of variation present in the mid-range data compared to the high-range data. Otherwise, both look reasonably symmetric with no outliers present.
b. Using df = 23, a 95% confidence interval for μmid-range − μhigh-range is (438.3 − 437.45) ± 2.069·se = .85 ± 8.69 = (−7.84, 9.54). Since plausible values for μmid-range − μhigh-range are both positive and negative (i.e., the interval spans zero), we would conclude that there is not sufficient evidence to suggest that the average value for mid-range and the average value for high-range differ.

32. Let μ1 = the true average proportional stress limit for red oak and let μ2 = the true average proportional stress limit for Douglas fir. We test Ho: μ1 − μ2 = 1 vs. Ha: μ1 − μ2 > 1. The test statistic is t = ((8.48 − 6.65) − 1)/√(.79²/14 + 1.28²/10) = .83/.4565 = 1.818, with approximate degrees of freedom ν = 13.85 ≈ 14, so the p-value ≈ P(t > 1.8) = .046. This p-value indicates support for the alternative hypothesis since we would reject Ho at significance levels greater than .046. There is sufficient evidence to claim that the true average proportional stress limit for red oak exceeds that of Douglas fir by more than 1 MPa.
33. Let μ1 = the true average weight gain for steroid treatment and let μ2 = the true average weight gain for the population not treated with steroids. The exercise asks if we can conclude that μ2 exceeds μ1 by more than 5 g, which we can restate in the equivalent form μ1 − μ2 < −5. Therefore, we conduct a lower-tailed test of Ho: μ1 − μ2 = −5 vs. Ha: μ1 − μ2 < −5. The test statistic is t = ((x̄ − ȳ) − Δo)/√(s1²/m + s2²/n) = ((32.8 − 40.5) − (−5))/√(2.6²/8 + 2.5²/10) = −2.7/1.2124 = −2.2, with approximate df ν = 2.1609/.1454 = 14.86, which we round down to 14. The p-value = P(t < −2.2), which is larger than the specified significance level .01, so we cannot reject Ho. This data does not support the belief that average weight gain for the control group exceeds that of the steroid group by more than 5 g.

34. a. Following the usual format for most confidence intervals, statistic ± (critical value)(standard error), a pooled variance confidence interval for the difference between two means is (x̄ − ȳ) ± tα/2,m+n−2 · sp √(1/m + 1/n).
b. The sample means and standard deviations of the two samples are x̄ = 13.90, s1 = 1.225, ȳ = 12.20, s2 = 1.010, with m = n = 4. The pooled variance estimate is sp² = ((m − 1)/(m + n − 2))s1² + ((n − 1)/(m + n − 2))s2² = (3/6)(1.225)² + (3/6)(1.010)² = 1.260, so sp = 1.1227. With t.025,6 = 2.447, the desired interval is 1.70 ± (2.447)(1.1227)√(1/4 + 1/4) = 1.70 ± 1.943 = (−.24, 3.64). This interval contains 0, so it does not support the conclusion that the two population means are different.
c. This interval is slightly smaller than the comparable unpooled interval, but it still supports the same conclusion.
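The pooled-variance interval of Exercise 34(b) is a good candidate for a numerical check. A minimal Python sketch (variable names are ours; 2.447 is t.025 for 6 df from the table):

```python
import math

# Exercise 34(b): pooled-variance 95% CI, m = n = 4
xbar, s1, ybar, s2, m, n = 13.90, 1.225, 12.20, 1.010, 4, 4
sp2 = ((m - 1) * s1**2 + (n - 1) * s2**2) / (m + n - 2)
sp = math.sqrt(sp2)                       # about 1.1227
t_crit = 2.447                            # t_.025 with m + n - 2 = 6 df
half = t_crit * sp * math.sqrt(1 / m + 1 / n)
lo, hi = (xbar - ybar) - half, (xbar - ybar) + half
print(round(lo, 2), round(hi, 2))  # -0.24 3.64
```

Since the interval covers 0, the code confirms the text's conclusion that the data do not establish a difference in means.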
Section 9.3

35. 1. Parameter of interest: μD = the true average difference of breaking load for fabric in unabraded versus abraded condition. 2.–4. Ho: μD = 0 vs. Ha: μD > 0. 5. t = (d̄ − μD)/(sD/√n) = (d̄ − 0)/(sD/√n). 6. RR: t ≥ t.01,7 = 2.998. 7. t = (7.25 − 0)/(11.8628/√8) = 1.73, which is not ≥ 2.998, so we fail to reject Ho. The data does not indicate a difference in breaking load for the two fabric load conditions.

36. There are two changes that must be made to the procedure we currently use. First, the equation used to compute the value of the t test statistic becomes t = ((x̄ − ȳ) − Δ)/(sp √(1/m + 1/n)), where sp is defined as in Exercise 34. Second, the degrees of freedom = m + n − 2. Assuming equal variances in the situation from Exercise 33, we calculate sp as follows: sp = √((7/16)(2.6)² + (9/16)(2.5)²) = 2.544. The value of the test statistic is then t = ((32.8 − 40.5) − (−5))/(2.544√(1/8 + 1/10)) = −2.24 ≈ −2.2. With degrees of freedom = 16, the p-value = P(t < −2.2) = .021. Since .021 > .01, we fail to reject Ho. This is the same conclusion reached in Exercise 33.
37. This exercise calls for paired analysis. First, compute the difference between indoor and outdoor concentrations of hexavalent chromium for each of the 33 houses (indoor value − outdoor value). These 33 differences are summarized as follows: n = 33, d̄ = −.4239, sD = .3868. Then t.025,32 = 2.037, and a 95% confidence interval for the population mean difference between indoor and outdoor concentration is −.4239 ± (2.037)(.3868/√33) = −.4239 ± .13715 = (−.5611, −.2868). We can be highly confident, at the 95% confidence level, that the true average concentration of hexavalent chromium outdoors exceeds the true average concentration indoors by between .2868 and .5611 nanograms/m³.
b. A 95% prediction interval for the difference in concentration for the 34th house is d̄ ± t.025,32 · sD √(1 + 1/33) = −.4239 ± (2.037)(.3868)√(1 + 1/33) = (−1.224, .3758). This prediction interval means that the indoor concentration may exceed the outdoor concentration by as much as .3758 nanograms/m³ and that the outdoor concentration may exceed the indoor concentration by as much as 1.224 nanograms/m³, for the 34th house. Clearly, this is a wide prediction interval, largely because of the amount of variation in the differences.

38. a. The median of the "Normal" data is 46.80, and the upper and lower quartiles are 49.55 and 45.55, which yields an IQR of 49.55 − 45.55 = 4.00. The median of the "High" data is 90.10, and the upper and lower quartiles are 90.95 and 88.55, which yields an IQR of 90.95 − 88.55 = 2.40. [Comparative boxplots for the normal-strength and high-strength concrete mix data.] The most significant feature of these boxplots is the fact that their locations (medians) are far apart.
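The confidence and prediction intervals in Exercise 37 differ only in the standard-error factor, which the following Python sketch makes explicit (variable names are ours; 2.037 is t.025 for 32 df):

```python
import math

# Exercise 37: paired differences (indoor - outdoor), n = 33 houses
dbar, sd, n = -0.4239, 0.3868, 33
t_crit = 2.037                                  # t_.025 with 32 df

# 95% CI for the mean difference: se = sd/sqrt(n)
half_ci = t_crit * sd / math.sqrt(n)
lo, hi = dbar - half_ci, dbar + half_ci
print(round(lo, 3), round(hi, 3))  # -0.561 -0.287

# 95% prediction interval for one new difference: extra factor sqrt(1 + 1/n)
half_pi = t_crit * sd * math.sqrt(1 + 1 / n)
```

The prediction half-width is larger by the factor √(n + 1), which is why the interval for the 34th house is so much wider.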
b. This data is paired because the two measurements are taken for each of 15 test conditions. Therefore, we have to work with the differences of the two samples. A normal probability (quantile) plot of the 15 differences shows that the data follows (approximately) a straight line, indicating that it is reasonable to assume that the differences follow a normal distribution. Taking differences in the order "Normal" − "High", we find d̄ = −42.23 and sD = 4.34. With t.025,14 = 2.145, a 95% confidence interval for the difference between the population means is −42.23 ± (2.145)(4.34/√15) = −42.23 ± 2.404 = (−44.63, −39.83). Because 0 is not contained in this interval, we can conclude that the difference between the population means is not 0, i.e. we reject Ho: μd = 0 in favor of Ha: μd ≠ 0 at the typical significance level of .05.

39. With d̄ = 7.600 and sD = 4.178, we test Ho: μd = 5 vs. Ha: μd > 5 with test statistic t = (7.600 − 5)/(4.178/√9) = 1.87. With degrees of freedom n − 1 = 8, the corresponding p-value is P(t > 1.87) ≈ .047. We would reject Ho at any alpha level greater than .047, so at the typical level .05 we would (barely) reject Ho and conclude that the higher level of illumination yields a decrease of more than 5 seconds in true average task completion time.

40. a. We test Ho: μd = 0 vs. Ha: μd ≠ 0, with test statistic t = (d̄ − 0)/(sD/√n) = 167.2/(228/√14) = 2.74. The two-tailed p-value is 2[P(t > 2.74)] = 2(.009) = .018. Since .018 < .05, we can reject Ho. There is strong evidence to support the claim that the true average difference between intake values measured by the two methods is not 0; there is a difference between them.
b. Analyzing this paired data (incorrectly) as two independent samples yields a far smaller, insignificant t. The incorrect analysis yields an inappropriate conclusion, because the pairing removed substantial condition-to-condition variability.

41. A normal probability plot shows that the data could easily follow a normal distribution, so the paired t procedure is valid. The computed t falls in the rejection region, and Ho is rejected in favor of Ha.
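The one-sample t statistic on the differences, as in Exercise 39, is worth a quick check. A Python sketch (our own names; the data summary is from the exercise):

```python
import math

# Exercise 39: test Ho: mu_d = 5 vs. Ha: mu_d > 5 on n = 9 differences
dbar, sd, n = 7.600, 4.178, 9
t = (dbar - 5) / (sd / math.sqrt(n))
print(round(t, 2))  # 1.87; with 8 df the upper-tail p-value is just under .05
```

Since t.05,8 = 1.860, the computed 1.87 barely clears the .05 cutoff, matching the "(barely) reject" language in the solution.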
42. 1. Parameter of interest: μd denotes the true average difference of spatial ability in brothers exposed to DES and brothers not exposed to DES (let μd = μexposed − μunexposed). 2.–4. Ho: μd = 0 vs. Ha: μd < 0. 5. t = (d̄ − 0)/(sD/√n), with df = 8. 6. RR: p-value < .05. 7. The computed t yields a p-value of .029 (from Table A.8), so Ho is rejected. The data supports the idea that exposure to DES reduces spatial ability.

43. The differences (white − black) are all negative, ranging from about −16 to −2. Although there is a "jump" in the middle of the normal probability plot, the data follow a reasonably straight path, so there is no strong reason for doubting the normality of the population of differences. A 95% lower confidence bound for the population mean difference is d̄ − t·sD/√n = −38.60 − 10.54 = −49.14; that is, with 95% confidence the population mean difference is above −49.14. Reversing the order of subtraction gives the corresponding 95% upper confidence bound, 38.60 + 10.54 = 49.14.

44. A 95% upper confidence bound for the mean difference is d̄ + t.05,15(sD/√16) = 2635.63 + (1.753)(sD/4) = 2635.63 + 222.91 = 2858.54. A probability plot of the differences validates our use of the t distribution. (The confidence level is not specified in the problem description; 95% is used here.)

46. Here d̄ = 1 and sD = 0 (the di's are 1, 1, …, and 1). Because sD = 0, the paired t ratio is unbounded: Ho: μd = 0 would be rejected at any significance level, even though a common difference of 1 may be of no practical importance.
Section 9.4

47. 1. Parameter of interest: p1 − p2 = the true difference in proportions of those responding to the two different survey covers; let p1 = Plain and p2 = Picture. 2.–4. Ho: p1 − p2 = 0 vs. Ha: p1 − p2 < 0. 5. z = (p̂1 − p̂2)/√(p̂q̂(1/m + 1/n)). 6. Reject Ho if p-value < .10. 7. z = (104/207 − 109/213)/√((213/420)(207/420)(1/207 + 1/213)) = −.19, and the p-value = Φ(−.19) = .4247. Fail to reject Ho. The data does not indicate that plain cover surveys have a lower response rate.

48. Using p1 = q1 = p2 = q2 = .5 (the most conservative choice), the interval width is L = 2(1.96)√(p1q1/n + p2q2/n); setting L = .1 and solving requires n = 769.

49. With p̂1 = 63/300 = .2100 and p̂2 = 75/180 = .4167, the pooled proportion is p̂ = (63 + 75)/(300 + 180) = 138/480 = .2875, and z = (.2100 − .4167)/√((.2875)(.7125)(1/300 + 1/180)) = −.2067/.0427 = −4.84. Ho will be rejected at level .01 if z ≤ −z.01 = −2.33. Because −4.84 ≤ −2.33, Ho is rejected: the proportion of those who repeat after inducement appears lower than the proportion who repeat after no inducement.
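The pooled two-proportion z test of Exercise 49 is easy to verify. A minimal Python sketch (variable names are ours):

```python
import math

# Exercise 49: pooled two-proportion z test
x1, m = 63, 300     # successes / trials, inducement group
x2, n = 75, 180     # successes / trials, no-inducement group
p1, p2 = x1 / m, x2 / n
p = (x1 + x2) / (m + n)                        # pooled estimate, .2875
se = math.sqrt(p * (1 - p) * (1 / m + 1 / n))
z = (p1 - p2) / se
print(round(z, 2))  # -4.84, so Ho is rejected at any reasonable level
```

Note the pooled p̂ is used only for testing Ho: p1 = p2; the confidence intervals in the following exercises use the unpooled standard error instead.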
which is estimated by = n n   X3 − X2 pˆ 2 + pˆ 3 X3 − X2 n .68) = .0037 . = .690 ) ± 1.82 .106 ) = −.25 .690 .9. F. The “after” success probability is p 1 + p 3 while the “before” probability is p 1 + p 2 . 5 = 4. so P = 1 − Φ (2. n pˆ 2 + pˆ 3 X 2 + X3 n 200 − 150 = 2.207 .01.01. F.35. b. 15 + 7 29 = . 8 = 1 F.96(. so p 1 + p 3 > p 1 + p 2 becomes p 3 > p 2 . a. 5 F. From Table A..7719 . When Ho is true.1 requires n=769.8. a. is 40 42 (. thus we wish to test H 0 : p 3 = p 2 versus H a : p3 > p2 . pˆ 2 = = . pˆ 1 =  .5.8 = 3.001 Ho would not be rejected. The estimator of (p 1 + p 3 ) – (p 1 + p 2 ) is c. At 200 + 150 level .25  2 . + = n   n n Using p 1 = q 1 = p 2 = q 2 = . From column 8. 01. 5 F. 5 = 58. the p-value > .10 = 1 = .01.99.8.177 ≤ F ≤ 4.01 = . Since . F. For a lower tailed test. e. 01) = . F.74 ) − P( F ≤ .12.05. Since the given f value of 4. 4.10. 33 and F.64) = 2(. 4 = 6.12 = g.90.Chapter 9: Inferences Based on Two Samples 1 d.02 . we see that for denominator df = 20.f.95.94 . b. F. 5.75 falls between F. F.10. 10. of 35 in Table A.64 P(.9.12 = 4.01 < p-value < .10 = 1 = .2110. F.5. 5 obviously closer to . our f value is between F.177 . 5. Since the given f of 2.71 1 = .001< p-value < .05 (but F.64 .74) = P( F ≤ 4.10 1 2 P( F ≥ 5.10 = 2.95 .05.10.5.05). 281 .10.10 = F.10 = 5.01.52 .16 ) = .95 − . F. F.01.10.5 1 = = . we must first use formula 9.3030 .8 = .212 .99. so P( F ≤ 6.95.05.271 1 F.177 ) = .10.10.01 and .5. 5. however looking at both df = 30 and df = 40 columns.30 f. 01.01 and F. F.05.9 to find the critical values: F. h. The two tailed p-value = d. 5.10 = 3.0995 < f = . F.99.00 is less than c.16 .001.5.05. we can say that the upper-tailed p-value is between .2110 .200 < . So we can say . . Since a.10. There is no column for numerator d. = . 5 = e.6.0995 . so we cannot reject Ho .9) 2 = 1.f. = m – 1 = 10 – 1 = 9. 44 ≈ .3)2 (205. f = 1.814 < 2. 4. we reject H0 if f ≤ F. 
There is not sufficient evidence that the standard deviation of the strength distribution for fused specimens is smaller than that of not-fused specimens.384 is in neither rejection region. 4 = 1 F. = m – 1 = 10 – 1 = 9.75)2 (4. = n – 1 = 5 – 1 = 4. 282 . The calculated test statistic is (2.22 . we test H 0 : σ 1 = σ 2 vs.975. H 0 : σ 1 = σ 2 will be rejected in favor of H a : σ 1 ≠ σ 2 if either f ≤ F. 62. The calculated With test statistic is f = (277.8 . 9. and denominator d.05. σ 1 = true standard deviation for not-fused specimens and σ 2 = true standard deviation for fused specimens.05 if f > F. and we reject Ho at level . Ho is not rejected. We test f = H 0 : σ 21 = σ 22 vs.8 .384 . The data does suggest that there is more variability in the low-dose weight gains.44)2 = . 60. With numerator d. H a : σ 12 ≠ σ 22 .95. The test 2 s12 statistic is f = 2 .Chapter 9: Inferences Based on Two Samples 59.f. 61.85 ≥ 20. H a : σ 1 > σ 2 . Since . With numerator d. 9.72 = F. 47.814 . 4 = 6. The data does not suggest a difference in the two variances. so reject Ho at level . and σ 22 = variance in weight 2 2 2 2 gain for control condition.f. H a : σ 1 > σ 2 . Let σ 1 = variance in weight gain for low-dose treatment.08 . 22 ≈ 2. We can say that the p - value > .01.275 . We wish to test H 0 : σ 1 = σ 2 vs. 44 ≈ 1. 9 f ≥ F.05.19. Because f = 1. s2 f 2 ( 54 ) = (32 )2 = 2 .05 .63 = . = n – 1 = 8 – 1 = 7. and denominator d.56 or if f ≥ F. 7 .025.10. 47. which is obviously > .f.00 or =1 3.05.10.9. we do not reject H0 and conclude that there is no significant difference between the two standard deviations. The data indicates a significant difference.95. 3 = 9.m −1. We are (.n −1 σ 22 S 22 Fα / 2.10 . σ1 F. 9.59 )2 (3.9 )2 + (168. The set of inequalities inside the S2 / σ 2   2 S 2 F1−α / 2. 3.6 .99). The p-value for a two- 9 tailed test is approximately 2P( t > 3. the C.074. 283 . σ 22 σ 12 σ A 95% upper bound for 2 is σ1 s 22 F. so we need σ1 1 = .18) = 8.n −1  = 1 − α . 
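The variance-ratio statistic used throughout these exercises is just f = s1²/s2², referred to an F distribution with (m − 1, n − 1) degrees of freedom. A sketch with the values from one of the exercises above (s1 = 2.75, s2 = 4.44, m = 10, n = 5):

```python
s1, s2 = 2.75, 4.44
m, n = 10, 5
f = s1**2 / s2**2   # ≈ .384, referred to F with m-1 = 9 and n-1 = 4 df

# For the two-tailed test at level .10, H0 is rejected only if f falls
# at or beyond the tabled upper or lower F critical values; .384 falls
# in neither rejection region, so H0 is not rejected.
```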
Supplementary Exercises 65. 3 = for 64.006.28 σ is (.f.05. is 241 15. and for 2 is (.41). H a : µ 1 − µ 2 ≠ 0 .28 and F. This small of a p-value gives strong support for the alternative hypothesis.1)2 9 = 15. H 0 : µ1 − µ 2 ( x − y ) − (∆ ) = t= s12 s22 + m n We test ν = = 0 vs.023. which we round down to 15. Substituting S12 σ1 S12 σ 22 the sample values s and s yields the confidence interval for 2 .9 s12 = (3.524 27 2 412 + 10 10 (241) 2 (72. 1. m = n = 4. The approximate d. n−1 ≤ 12 12 ≤ Fα / 2.   S 2 /σ 2 P F1−α / 2.108 .15.003) = .22) = 2( .79)2 confident that the ratio of the standard deviation of triacetate porosity distribution to that of the cotton porosity distribution is at most 8. Then with s 1 = . m−1. and taking the square σ1 2 1 2 2 root of each endpoint yields the confidence interval for σ2 . I.Chapter 9: Inferences Based on Two Samples 63.n −1 parentheses is clearly equivalent to ≤ 2 ≤ .22 . The test statistic is 807 − 757 50 50 = = = 3.10. 9.m −1.160 and s 2 = .3.m −1.05. 1. which further supports the conclusion based on the p -value. We would fail to reject Ho .85.673 . This data does not suggest that including an incentive increases the likelihood of a response. 0 is a plausible value for the difference. At this point we notice that since pˆ 1 > pˆ 2 . 67. Since this interval contains 0. H a : µ 1 − µ 2 ≠ 0 yields a t value of -.f.Chapter 9: Inferences Based on Two Samples 66. Comparative Boxplot of Tree Density Between Fertilizer Plots and Control Plots 1400 Fertiliz 1300 1200 1100 1000 Fe rtiliz Co ntrol Although the median of the fertilizer plot is higher than that of the control plots. (d. Let p 1 = true proportion of returned questionnaires that included no incentive. We fail to reject Ho .5. b.20. = 13). A test of H 0 : µ1 − µ 2 = 0 vs. The hypotheses are H 0 : p1 − p 2 = 0 vs. and a two- tailed p-value of . the 110 98 numerator of the z statistic will be > 0. c. and pˆ 2 = = .682 . 
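The two-sample t statistic and its estimated (Welch) degrees of freedom in Exercise 65 can be reproduced from the summary values printed there (means 807 and 757, standard deviations 27 and 41, m = n = 10); the df estimate is rounded down before consulting the t table, as the solution does:

```python
from math import sqrt

def welch(mean1, s1, n1, mean2, s2, n2):
    # Two-sample t statistic with the Satterthwaite df approximation
    v1, v2 = s1**2 / n1, s2**2 / n2
    t = (mean1 - mean2) / sqrt(v1 + v2)
    df = (v1 + v2)**2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, df

t, df = welch(807, 27, 10, 757, 41, 10)   # t ≈ 3.22, df ≈ 15.6, use 15
```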
With 95% confidence we can say that the true average difference between the tree density of the fertilizer plots and that of the control plots is somewhere between –144 and 120. H 0 : p1 − p 2 < 0 . the fertilizer plot data appears negatively skewed. p 2 = true proportion of returned questionnaires that included an incentive. the data does not indicate a significant difference in the means. and since we have a lower tailed test. while the opposite is true for the control plot data. 284 . a. the p-value will be > . 75 66 = . The test statistic is z = pˆ 1 = pˆ 1 − pˆ 2 pˆ qˆ ( m1 + 1 n ) . 14 = 43. A 95% lower prediction bound for the strength of a single joint with a side coating is ( x − t .025. With s p = 13.35) = 609.37.66 . the average strength of joints with a side coating is at least 59.f.28.3 ± 908 . We use the pooled t interval based on 24 + 11 – 2 = 33 d.645)(552.96 the 90% confidence interval is then = (− 299.77.73 = (− .70) 1 24 + 111 = 2. a.55 ± 2.18. s 1 = 3.9) . For a 90% interval.78 (Note: this bound is valid only if the distribution of joint strength is normal.3 + 1691. Because the interval contains 0.. A 95% lower confidence bound for the true average strength of joints with a side coating is  s   5. ( )( ) ( 285 ) . For a confidence level of 95%.28) . That is. We are confident that the difference between true average dry densities for the two sampling methods is between -.379 5. 9   = 63.23 ± 20. 025.35 .77 .55 ± (2.833)  = 63.3) = 1082. the confidence interval is 2 2. 33 = 2. n = 11.96 1 + 101 ) = 63. Equating this value to the expression on the right of the 2 x1 − x 2 = 95% confidence interval formula.3 . s 2 = 3.Chapter 9: Inferences Based on Two Samples 68.09 and 83.23 − 11. That is. the associated z value is 1. a two-sided tolerance interval for capturing at least 95% of the strength values of joints with side coating is x ± (tolerance critical value)s. we find n1 n2 1082.96) s12 s 22 1082.6 70. so n1 n2 1. 
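The incentive/no-incentive comparison can be checked numerically. Reading the counts as 75 returns out of 110 with no incentive and 66 out of 98 with an incentive (as the fragments of the solution suggest), the lower-tailed test cannot come close to rejecting, since the sample proportions point the wrong way:

```python
from math import sqrt, erf

def phi(z):
    # Standard normal CDF via the error function
    return 0.5 * (1 + erf(z / sqrt(2)))

x1, n1 = 75, 110   # returned, no incentive
x2, n2 = 66, 98    # returned, incentive included
p1, p2 = x1 / n1, x2 / n2          # .682 and .673
p = (x1 + x2) / (n1 + n2)          # pooled estimate under H0
z = (p1 - p2) / sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
p_value = phi(z)   # lower-tailed; since p1 > p2, z > 0 and the p-value exceeds .5
```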
and n = 10.23 − 3.03 .11 . the interval is 63.09. Summary quantities are m = 24.1517.23 − (1. y = 101. we cannot say that there is a significant difference between them.833)(5.96 = 63.45 = 59.74. That is. The tolerance critical value is obtained from Table A.3.9 = 609. the strength of a single joint with a side coating would be at least 51.645. c. with a confidence level of 95%.9 − (− 473. Thus.23 ± 3. x = 103. so − 473.18 and 5. 95% confidence requires t. 69. Furthermore.37 .23 − (1.6 = (1. 9 s 1 + 1 n ) = 63. half of the width of this interval is 2 1691.70 .03)(3. s12 s 22 + .60.) b. 609.46 = 51.  n  10  with a confidence level of 95%.6 + = = 552. k = 95%.5.83. we can be highly confident that at least 95% of all joints with side coatings have strength values between 43.68 and s p = 3.78 .96  x − t .6 . The center of any confidence interval for µ1 − µ 2 is always x1 − x 2 .025.3 ± (1.6 with 95% confidence. where d = t .61 = (10. 72..58 245.f.33 lb-in. y = 2795. In addition. Because of this.11 and 25.105216 )2 d. The = 15. The value 0 is not contained in this interval so we can state that.025.96)2 10 ( 91.109681 + 35.72 ± (2.ν (9. with very high confidence.131)(3.43 = (− 22. the value of µ1 − µ 2 is not 0.120 )  = −4. s 1 = 245.025.7. 71.1.59)2 + (5.Chapter 9: Inferences Based on Two Samples d. 286 . First compute the difference between the amount of cone penetration for commutator and pinion bearings for each of the 17 motors.18 . Then d = −4. The bound on the error of estimation is quite large.85 .105216 )2 approximate degrees of freedom is ν = ( 91.0 . which is equivalent to concluding that the population means are not equal. the confidence interval spans zero. we have insufficient evidence to claim that the population mean penetration differs for the two types of bearings. This exercise calls for a paired analysis.57) = 17.18 ± 18.131 . 10 ./in. We would have to say  17  that the population mean difference has not been precisely estimated.11.025. 
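The contrast between a lower confidence bound for the mean and a lower prediction bound for a single future joint is the factor under the square root. With x̄ = 63.23, s = 5.96, n = 10, and t.05,9 = 1.833 from the exercise, both bounds can be sketched:

```python
from math import sqrt

xbar, s, n = 63.23, 5.96, 10
t05 = 1.833   # t(.05, 9) from the t table

conf_bound = xbar - t05 * s / sqrt(n)          # ≈ 59.78, bound on the mean
pred_bound = xbar - t05 * s * sqrt(1 + 1 / n)  # ≈ 51.77, bound on one new joint
```

The prediction bound is lower because it must absorb the variability of a single observation in addition to the uncertainty in x̄.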
x = 3975. and the 95% confidence interval for the population mean difference between penetration for the commutator armature bearing and penetration for the pinion bearing is:  35. then.120 . confidence interval for The large sample 99% µ1 − µ 2 is (3975.14.7 2 + 40 40 (1180.33) .0 .85  − 4.1336 ) .18 ± (2.16 = 2.0 − 2795.12 293.25.5 ≈ (1024.25) . With 95% confidence. and 9 t .72 ± 7.15 = 2. (commutator value – pinion value). A 95% confidence interval for the difference between the true average strengths for the two types of joints is (80.05 . so we use 15 9 17. These 17 differences are summarized as follows: n = 17.95 − 63.0) ± 2. we can say that the true average strength for joints without side coating exceeds that of joints with side coating by between 10.0 ) ± 1560.109681)2 + ( 35.61.23) ± t. s 2 = 293. m = n = 40. The interval is . s d = 35. s 1 = 14. steel specimens and The data supports the article’s authors’ claim. a.2. the p-value = P( t > 2. s 2 = 12.6) = . From Table A.5. s 1 = 14.047203)2 29 29 ≈ 2(.6 . 8 2 16 2 ) 2 ( ) +( ) 15 74.2 .f.028 .086083 (. m = 16. t= 287 .0 . y = 129. x = 74.007. 5 2 14 2 = 27.03888 + .9207 120. y = 61. s 2 = 39. The test statistic is − . which is less than the specified significance level.66 − . n = 15.3309)2 + (101. the results are different. the two-sample t test is appropriate. Since we can assume that the distributions from which the samples were taken are normal. is ν statistic is t= = ( 14 .05. The relevant hypotheses are H 0 : µ1 − µ 2 = 0 vs. we use the two-sample t test.7. 006) = .8 2 16 2 14.9207 )2 10 14 the two-tailed p-value ≈ 2(.1 . Assuming both populations have normal distributions. The two-tailed p-value (.1. Then the hypotheses are H 0 : µ1 − µ 2 = 0 vs.66 t= = = −2. 014) = .0 . µ 1 denote the true average tear length for Brand A and let µ 2 denote the true average tear length for Brand B.047203 .03888 )2 + (.0 − 61. The relevant hypotheses are H 0 : µ1 − µ 2 = 0 vs. 
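The paired-t interval for the bearing-penetration differences follows directly from the summary values d̄ = −4.18, sd = 35.85, n = 17, with t.025,16 = 2.120; a sketch:

```python
from math import sqrt

dbar, sd, n = -4.18, 35.85, 17
t025 = 2.120                  # t(.025, 16) from the t table
hw = t025 * sd / sqrt(n)      # half-width ≈ 18.43
ci = (dbar - hw, dbar + hw)   # ≈ (-22.61, 14.25); the interval spans zero
```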
Assuming both populations have normal distributions. the two-sample t test is appropriate.252 2 ( 120.7. Let µ 1 denote the true mean headability rating for aluminum killed µ 2 denote the true mean headability rating for silicon killed steel. m = 11.5 2 12. H a : µ 1 − µ 2 ≠ 0 .012 . 8 2 16 + 1214.8. so the Let approximate d.97 .1 = = −2.3309 + 101. 75.0 14. H a : µ 1 − µ 2 > 0 . The test statistic is − 31.5 . Ho is rejected and we conclude that the average tear length for Brand A is larger than that of Brand B.252) freedom ν = = 18. The test 13 ≈ 2.64 . n = 14. so we use 18.1 − 31. so we use 57. The approximate degrees of freedom . obviously. From Table A. so we would reject Ho . x = 98. 74. which we round down to 27. No.Chapter 9: Inferences Based on Two Samples 73.5 + 1214.84 .25 .086083)2 ν = = 57. H a : µ 1 − µ 2 ≠ 0 . The approximate degrees of 18. (18. At a significance level of . The relevant hypotheses are H 0 : µ1∗ − µ ∗2 = 0 (which is equivalent to saying µ1 − µ 2 = 0 ) versus H a : µ 1∗ − µ 2∗ ≠ 0 (which is the same as saying µ1 − µ 2 ≠ 0 ). the p-value s p m1 + 1n 4.1 − (− 25) changes to t = = −. which is equivalent to saying that there is a difference between µ 1 and µ 2 .8) = P( t > 1. with d. then even if µ1 = µ 2 . 77.05. 0.024 . The relevant hypotheses are H 0 : µ d = 0 vs. However. 76. rejected and we conclude that there is a difference between b. the p-value 120. a.278 . this data does not show that the heated average strength exceeds the average strength for the control population.8 ) = .8 .7. From Table A.6 ) = 22.0 ) = 2(. The observed differences (control – heated) are: -.76 ≈ −1.0305 d = −. = m + n – 2 = 8 + 9 – 2 = 15. the lower tailed p- 5 value associated with t = -1. we For the hypotheses fail to reject Ho . H a : µ d < 0 . where µ d denotes the difference between the population average control strength minus the population average heated strength. The test statistic is = −1.008.742 18 + 19 associated with t = 3. 
H a : µ 1 − µ 2 < −25 .9 ) +  (4. There is insufficient evidence that the true average strength for males exceeds that for females by more than 25N.e. No. The test statistic  8+ 9 − 2 8 + 9 − 2 x *−y * 18. = 5 – 1 = 4. where µ and σ are the parameters of the lognormal distribution (i.8 is P( t < -1.004) = .04 ≈ 3.06. Since the p-value is greater than any sensible choice of α . From Table A. The sample mean and standard deviation of the differences are t= − . The pooled t test is based on d.f. the two means µ 1 and µ 2 (given by the ln(x)). and -. So when σ 1 formula above) would not be equal. Ho is µ1∗ and µ ∗2 . Ho should not be rejected.742 . At significance level . the mean and standard deviation of ∗ = σ 2∗ .7.6) = . so the paired t test is employed. 288 .Chapter 9: Inferences Based on Two Samples b. H 0 : µ1 − µ 2 = −25 vs. . so s p = 4.252 ≈ P(t < −. This is paired data.05.0 is 2P( t > 3.05. The 2  m − 1  2  n −1  2 pooled variance is s p =   s1 +  s2  m + n − 2  m+ n − 2  8 −1  2  9 −1  2  (4.. ∗ ∗ ∗ ∗ when σ 1 ≠ σ 2 .0 is t = = = 3.01.073.0 .f. At significance level .0305 . Therefore. then µ1∗ = µ ∗2 would imply that µ1 = µ 2 . the test statistic − 31. -.556 . With degrees of freedom 18.49 .02.0 − 11.024 and s d = . The mean of a lognormal distribution is ∗ ∗ 2 ∗ ∗ µ = e µ + (σ ) / 2 . 05.f. we reject Ho .5) ≈ 0 . Let µ 1 denote the true average ratio for young men and µ 2 denote the true average ratio for elderly men. This indicates a lack of statistical significance in the difference of delight index scores for men and women. The reported p-value for the test of mean differences on the distress index was less than 0. the relevant hypotheses are H 0 : µ1 − µ 2 = 0 vs. The d. Assuming both populations from which these samples were taken are normally distributed. The value of the test statistic is t= (7. The reported p-value for the test of mean differences on the delight index was > 0. 80.47 − 6. 
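The paired t statistic for the control-minus-heated differences follows from the summary values d̄ = −.024, sd = .0305, n = 5 given in the solution:

```python
from math import sqrt

dbar, sd, n = -0.024, 0.0305, 5
t = dbar / (sd / sqrt(n))   # ≈ -1.76, referred to t with n - 1 = 4 df
```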
79. The relevant hypotheses would be H0: µM = µF versus Ha: µM ≠ µF for both the distress and delight indices. The reported p-value for the test of mean differences on the distress index was less than 0.001. This indicates a statistically significant difference in the mean scores, with the mean score for women being higher. The reported p-value for the test of mean differences on the delight index was > 0.05. This indicates a lack of statistical significance in the difference of delight index scores for men and women.

80. Let µ1 denote the true average ratio for young men and µ2 denote the true average ratio for elderly men. Assuming both populations from which these samples were taken are normally distributed, the relevant hypotheses are H0: µ1 − µ2 = 0 vs. Ha: µ1 − µ2 > 0. The value of the test statistic is t = (7.47 − 6.71)/√((.22)²/13 + (.28)²/12) = 7.5. The d.f. = 20 and the p-value is P(t > 7.5) ≈ 0. Since the p-value is < α = .05, we reject Ho. We have sufficient evidence to claim that the true average ratio for young men exceeds that for elderly men.

[Normal probability plots of the good-visibility and poor-visibility samples plotted against normal scores.] A normal probability plot indicates the data for good visibility does not follow a normal distribution; thus a t-test is not appropriate for this small a sample size.

Taking the derivative with respect to n and equating to 0 yields 900/(400 − n)² − 400/n² = 0.
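The two-sample t statistic for the young-men versus elderly-men comparison can be reproduced from the summary values that appear in the solution (difference of means 7.47 − 6.71, standard deviations .22 and .28, sample sizes 13 and 12):

```python
from math import sqrt

t = (7.47 - 6.71) / sqrt(0.22**2 / 13 + 0.28**2 / 12)   # ≈ 7.5
```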
However.062 Pooled: ν = m = n − 2 = 14 + 12 − 2 = 24 and the pooled variance  13  2  11  2 is  (.4869 + 14 12 ≈ 2(. 290 .197 = 3. .Chapter 9: Inferences Based on Two Samples 81.52) = 1. we will use a paired t test. We then wish 20 = 2(2.05 or . whence 9n = 4(400 − n ) .48 − 9. or 5n 2 + 3200n − 640. or equivalently.000 = 0 . a.001.031) ≈ .88 ) ˜ . µ1 = µ 2 versus Ha: µ1 ≠ µ 2 We wish to test H0 : Unpooled: With Ho : ν = ( µ1 − µ 2 = 0 vs. The p-value = 2[ P( t 24 > 2. Because of the nature of the data. Ha: µ1 − µ 2 ≠ 0 .79 ) +   (1.465 1. we would not reject. 802 .437 . At significance .0003 .437 or t ≤ −4. 86.004) = . the data does not indicate a difference in true average insulin-binding capacity due to the dosage level.125 z= = = −3.129)( 91 + 110 ) . H a : p1 − p 2 ≠ 0 . so reject Ho at any reasonable level.0320 pˆ 1 = Φ (− 3.042721) = (. so we reject Ho if t ≥ 4.3) = 2 (.437 The test statistic is . We test H 0 : µ1 − µ 2 = 0 vs. so we cannot reject Ho . ο C . P-value = 2P( t > 3. With pˆ qˆ ( m1 + 1n ) Let p 1 = true survival rate at 11 73 102 175 = .927 − .91) < Φ (− 3. Assuming both populations have normal distributions. [(n = ] − 1)S12 + (n 2 − 1)S 22 + (n 3 − 1)S 32 + (n4 − 1)S 42 σˆ n1 + n2 + n3 + n 4 − 4 2 ( n1 − 1)σ 1 + (n2 − 1)σ 22 + (n3 − 1)σ 32 + (n 4 − 1)σ 42 2 E σˆ = = σ 2 .49) = .6561) + 7 (.4096) + 17(.008 which is > .0102083)2 = 11. qˆ = . The estimate for n1 + n 2 + n3 + n4 − 4 [15(.91 .871)(.0005. The approximate degrees of freedom ν 2 ( .68 ≈ 3. so we use df = 11. a.871 .129 . pˆ = = .11 = 4. H a : µ 1 − µ 2 ≠ 0 .001.Chapter 9: Inferences Based on Two Samples 84.2601) + 11(. The two survival rates appear to differ. 85.001.927 .0325125) 2 + (. which is not ≥ 4. 91 110 201 .042721 t= level . The hypotheses are pˆ 1 − pˆ 2 H 0 : p1 − p 2 = 0 vs. b. The p-value = 1 1 (. 7 11 t .409 the given data is = 50 2 1 ( ) [ ] 291 .437 . and pˆ 2 = = . The test statistic is z = . the two-sample t test is appropriate. 
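The allocation problem in Exercise 83(b) minimizes 900/(400 − n) + 400/n, the variance of the estimated difference as a function of the second sample size n (with m = 400 − n). Setting the derivative to zero gives 900/(400 − n)² = 400/n², i.e. 3n = 2(400 − n), so n = 160 and m = 240. A brute-force check over the integer sample sizes:

```python
# Minimize 900/(400 - n) + 400/n over the second sample size n, with m = 400 - n
best_n = min(range(1, 400), key=lambda n: 900 / (400 - n) + 400 / n)
best_m = 400 - best_n   # optimal split: n = 160, m = 240
```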
p 2 = true survival rate at 30 ο C .3 .1225 )] = .4 .802 − . 79 . With pˆ 1 = 2500 = . and t = .34 ) = . so the p-value = Φ (− 1.0332 z= = 4. If µ i ' s referred to true average IQ’s resulting from two different conditions. 2500. It appears that a response is more likely for a white . the effect of carpeting would be mixed up with any effects due to the different types of hospitals.15 = 2. and pˆ = . 05 = 1. ∆ 0 = 0 . 89.142 n  . 025. In the situation described.10 .96. s 2 = 3. 100.0834 . 142 n   β = .645 . . The experiment should have been designed so that a separate assessment could be obtained (e.05 . σ 1 = σ 2 = 10 . so β = Φ 1. µ1 − µ 2 = 1 would have little practical significance. With x = 11.0079 name than for a black name.20 . and . a randomized block design). d = 1. 34 − 46 = −1. H 0 : p1 = p 2 will be rejected at level α in favor of H a : p1 > p2 if either 250 167 z ≥ z.g.21 . yet very large sample sizes would yield statistical significance in this situation. = .131 or if t ≤ −2.Chapter 9: Inferences Based on Two Samples 87. so Ho is rejected . . so no separate assessment could be made. so we would not judge the drug to The computed value of Z is z= be effective. y = 9. 292 .05 if either t ≥ t.000 respectively.9015.68 ..2 . s p = 2. s1 = 2.95. 90. and 10.34 . H 0 : µ1 − µ 2 = 0 is tested against H a : µ 1 − µ 2 ≠ 0 using the two-sample t test.8264.0000 for n = 25. rejecting Ho at level . and m = n = 8. so Ho is not rejected.0294.0668 .131 . 88. σ = giving the  200 14. A lower tailed test would be 34 + 46 appropriate.645 −   n 14 . pˆ 2 = 2500 = .0901 > . . Chapter 9: Inferences Based on Two Samples 91. 93. z = -5.616 and y = 2. µ 1 and µ 2 denote the true average weights for operations 1 and 2.011363 + 3. 293 . The test statistic value is t = 1402.59) λˆ1 = x = 1.000 .94 ± (1.30672 = − 17. H a : µ 1 − µ 2 ≠ 0 .39 4. λˆ = .56 . 
The data indicates that there is a significant difference between the true mean weights of the packages for the two operations.35 = (− 1. λˆ2 = y = 2.00 Because 1. The value of the Let test statistic is (1402.1 .97 = 30 2.3)) < . so we would certainly reject H 0 : λ1 = λ2 in favor of H a : λ1 ≠ λ2 .30672)2 29 29 so we can reject Ho at level . The relevant hypotheses are H 0 : µ1 − µ 2 = 0 vs. 92. respectively.43 . and the confidence interval is m n − .97 )2 + (9.96)(1.24 − 1400 10.05. so use df = 57.0006 .699. t .24 = 1.3 and p-value = 2(Φ (− 5.−.318083) The d. b. a. 29 = 1. Var ( X − Y ) = Z= X −Y λˆ m + λˆ n λ1 λ 2 mX + nY + and λˆ1 = X .63) (10.77 .025.96) 2 t= = − 17.39 7.29.557 . H 0 : µ1 = 1400 will be tested against H a : µ 1 > 1400 using a one-sample t test x − 1400 with test statistic t = . λˆ2 = Y .24 − 1419. With x = 1.1 < 1.318083 = −6.77) = −.5 .f. (4.94 ± .05. Ho is not rejected. With degrees of freedom = 29. giving m n m+n . we reject Ho if s1 m t > t . 2. 30 30 2 ( 7. λˆ1 λˆ2 + = 1. True average weight does not appear to exceed 1400.011363) 2 + (3. 57 ≈ 2.62 . ν = = 57.699 . Chapter 9: Inferences Based on Two Samples 294 . 295 . Ho is not rejected.20 ) + (39. b.00 46.34 3 698.20 4 682. it can be said that . Ho will be rejected if f ≥ F. a.9188 F. so reject Ho . The computed value of F is f = MSTr 2673.07 − 712.678 > 3.36 .05.0604 f = = = 3.3 = = 2.93 − 712. Type of Box x s 1 713.10.15 = 3.223. 2.06 .55 ) + (40.691. 4.44 .10. 4.1 1.691.02 39.02 − 712.9188 4 MSTr 6. and our computed value of 2.51) = 6.34) + (37.06 (since I – 1 = 4.51)2 + (698.51 [ 6 (713. The data does not indicate a difference in the mean tensile strengths of the different types of copper wires.93 40.51)2 4 −1 2 + (682.07 37.05 < p-value < .87 Grand mean = 712.CHAPTER 10 Section 10.87) = 1.05.10 MSTr = [ ] ] 3. Since 2.2 ≥ 3.10. There is a difference in compression strengths among the four box types.44 is between those values.00 − 712.06 and F.0604 1 2 2 2 2 MSE = (46.678 MSE 1. 4. 
F.05.223.15 = 2.44 is not MSE 1094.55 2 756.3. 20 = 3.15 = 3.51) 2 + (756. and I ( J – 1 ) = (5)(3) = 15 ). µ i = true average lumen output for brand i bulbs.01.38 = 29. The comp uted test statistic value of 10.51.1143 2 1 MSTr .Chapter 10: The Analysis of Variance 3. 05. 3). we wish to test H 0 : µ1 = µ 2 = µ 3 versus H a : at least two µ i ' s are unequal.56 − 1.112 169. H 0 : µ1 = µ 2 = µ 3 = µ 4 is rejected at level . f = = = 1.256 F. 2 ( 166.85 exceeds 4.49 .643 Total 39 1.30 With 2 and I ( J – 1) = 21.5367 )2 + (1.5367 )2 = .38 .134 15.39 − 5.95 . 21 = 2.. we cannot reject Ho .85 Error 36 563..01.707 10. Source Df SS MS F Treatments 3 509. 6. 30 = 4.10.60 . so reject Ho in favor of Ha: at least two of the four means differ. [ ] 10 (1. There are no differences in the average lumen outputs among the three brands of bulbs.36 − 5. 3.24 ) + (.05.26) ] = .10.30 For finding the p-value. 4. H a : at least two µ i ' s are unequal. 3. we see that 1.51 .57 .0660 . MSE = σˆ W2 = = 227.38 29. Fail to 3 MSE . + (6.01.5367) 2 + (1.072. Since .9. 2. Reject Ho if f ≥ F. we need degrees of freedom I – 1 = MSE 227.19 ) = 166.3 MSTr = σˆ B2 = = 295.08 . so the p-value > .73 . We test H 0 : µ1 = µ 2 = µ 3 vs. so 2 2 SSE = 49. 27 = 5. so SST = 911.19) = 20. The three grades do not appear to differ.10 is not < . The grand mean = 1. 296 .95 − 20. 2.0660 MSTr = [ reject Ho .1143 2 2 2 MSE = (.43 ≥ F. 57 3 = 6.43 .63 − 1. so 2 21 MSTr 295.19 ) + .08) x •• = IJx •• = 32(5. µ i = true mean modulus of elasticity for grade i (i = 1.42 − 1. 28 = 2. 2.5367. Since 28 6. 2 . In the 2nd row and 21st column of Table A. SSTr = 8 (4. 591.05 .30 < F. Then f = 20 .27 ) + (.2 4773.30 . There are differences between at least two average flight times for the four treatments.91 − [ 32 ] = 49.57 .60 f = = = 1.95 . 5. 36 ≈ F. 95 Error 20 15.68 .02 . 3. 01.05 and Ho is rejected at level .55. The summary quantities are H 0 : µ1 = µ 2 = µ 3 = µ 4 = µ 5 is rejected. ΣΣx 2 ij x1• = 34.10. SST = 946.14 .419. 
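One of the F computations on this page can be verified directly: with MSTr = 2673.3 and MSE = 1094.87 as printed, f = 2.44, which falls short of the tabled F.05,4,15 = 3.06 cited in the solution, so H0 is not rejected (no difference among the mean tensile strengths of the copper wires). A sketch:

```python
mstr, mse = 2673.3, 1094.87   # mean squares as printed in the solution
f = mstr / mse                # ≈ 2.44
f_crit = 3.06                 # tabled F(.05; 4, 15)
reject = f >= f_crit          # False, so H0 is not rejected at level .05
```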
4 .04 14. x 2• = 39.98 .992. x 4• = 41.467. and we fail to reject Ho.3.12 .. (These values should be 30 1049.9 .757 Total 23 24. = 2.94 = F.5 . x1• = 2332. 05. so CF = 5.0 .72 25. so CF = 24 = 922.12 − 8.24 1. H a : at least two µ i ' s are unequal. The summary quantities are x •• = 148. x 5• = 3060. 6 Source Since Df SS MS F Treatments 3 8.05.69 Total 19 310. 2 ( 148.998. The hypotheses are 1. 4 31.98 = 15. There are differences in the true average axial stiffness for the different plate lengths. x 4• = 2851.03 10.21.17 displayed in an ANOVA table as requested.56 = 8.70 < F.953.70 Error 16 235.46 . x •• = 13. 20 .48 . SST = 75.12 3.95 < 4.9 . Source Df SS MS F Treatments 3 75.713. 2 2 ( 34.4 .992.30 = 4.56 = 24.081. 3.56 .3) + .01 < p − value < . SSE = 24. 20 < 3.98 2.5 . x 2• = 2576.475.5 . 9.6 . SSE = 31.446 . 43.500.8) = 946.10.165.998.10 = F. x 3• = 2625.14 MSE = = 1049.03.99 3.475. 297 .027.58.14 .48 ≥ F.3 .68 − 922.Chapter 10: The Analysis of Variance 7. .2 .8 .) Since 10. x 3• = 33.55 SSTr = 43. so p-value > . MSTr = = 10. + (41.9 ) SSTr = − 922..16 8.76 H 0 : µ1 = µ 2 = µ 3 = µ 4 vs. 01.17 and f = = 10.14 . overestimates ] 2 µ1 = .09 . 3. I I b.1 The brands seem to divide into two groups: 1. Q. a. J = σ2 + µ2.5 462.3 512.8 = 36.2 11.0 469. 298 .05. 5.8 532.37 272.  σ2   σ2     E (SSTr ) = E JΣ X i2• − IJX •2• = J ∑  − IJ 2  2   J + µ IJ + µ  i    [ 2 2 = σ2 + µ i2 . with no significant differences within each group but all between group differences are significant. E X •2• = Var X • • + E X •• ( ) ( ) [ ( )] d. E X i2• = Var X i• + E X i• ( ) ( ) [ ( )] c. so Σ(µ i − µ ) = 0 and E (MSTr ) = σ 2 . MSTr 2 σ 2 ). 4 3 1 4 2 5 437. w = 4. Section 10. and 2 and 5. When Ho is false. When Ho is true. so E (MSTr ) > σ 2 (on average. = µ i = µ ..37 . so 2 E (SSTr ) (µ − µ ) . E (MSTr ) = = E JΣX i2• − IJX •2• = σ 2 + J ∑ i I −1 I −1 [ e. and 4.15 = 4.Chapter 10: The Analysis of Variance 10. E (X • • ) = ΣE (X i• ) Σµ i = =µ. 
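The computing-formula chain used in these one-way ANOVA exercises (correction factor, SSTr from the treatment totals, SSE by subtraction) can be sketched with the summary quantities as best they can be read here; treat the totals 34.3, 39.6, 33.0, 41.9 (J = 6 observations each) and ΣΣx² = 946.68 as assumed inputs:

```python
totals = [34.3, 39.6, 33.0, 41.9]   # x_i. for I = 4 treatments (assumed values)
J = 6                               # observations per treatment
sum_sq = 946.68                     # ΣΣ x_ij² (assumed value)

n = J * len(totals)
grand = sum(totals)                                # x.. = 148.8
cf = grand**2 / n                                  # correction factor, 922.56
sstr = sum(t**2 for t in totals) / J - cf          # ≈ 8.98
sst = sum_sq - cf                                  # 24.12
sse = sst - sstr                                   # ≈ 15.14
mstr = sstr / (len(totals) - 1)
mse = sse / (n - len(totals))                      # ≈ .757
f = mstr / mse                                     # ≈ 3.96
```

The computed f exceeds the standard tabled F.05,3,20 ≈ 3.10, so H0 would be rejected at level .05 for these inputs.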
IJ ] = Iσ 2 + JΣµ i2 − σ 2 − IJµ 2 = (I − 1)σ 2 + JΣ( µ i − µ ) . 2 Σ(µ i − µ ) > 0 .. 5 1 462.06 = 1.95 33. 2 does not differ significantly from 4 or 5. 5 does not differ significantly from 2. w = 4.75 15.g. 05. 13.87 1.84 Treatment 4 appears to differ significantly from both 1 and 2.Chapter 10: The Analysis of Variance 12.75 . etc.0 469. 01. 3 427.36 = 4. While brands 3 and 4 do differ significantly. 4. 8 1 2 3 4 4. 3. 4 does not differ significantly from 1 or 2. so Q.3 512.8 5 532.1 Brand 1 does not differ significantly from 3 or 4.0 4 469.52 5.) do appear to be significant. J = 8. but both differ significantly from brands 1.94 . 3 1 4 2 5 437. 299 .08 29. w = 3. 10 2 1 3 4 24. 2 with 3.1 Brands 2 and 5 do not differ significantly from one another.3 2 502.5 462. there is not enough evident to indicate a significant difference between 1 and 3 or 1 and 4.64 = 5. 1 with 2 and 5..41 .39 4.69 26. 14.8 532. Q.87 .49 6. 3 does not differ significantly from1. I = 4. but there are no other significant differences. but there are no other significant differences. 4. 15. but all other differences (e. and 4.36 Treatment 4 appears to differ significantly from both 1 and 2. 28 ≈ 3. MSE = = 14.20 and ΣΣx ij = 4280 . and SSE = 415.50 ) 3 = −. yet 4 and 10 differ significantly. θ = Σ ci µ i where c1 = c2 = . IJ 20 2 2 2 2 2 2 Σxi• (51) + (71) + (70 ) + (46 ) + (40) = = 4064. a. the CI is (from 10. b.20 = 200. 300 . Length 12 differs from 4.396 ± .13 407. Because 3.396 and Σ ci2 = 1.51) is only slightly more than twice the smallest (s 3 = 20.06 375. SST = 415. and for lengths 6.80 – 200. Thus 200.075 .83) it is plausible that the population variances are equal (see text p.06 .−. 6 = 2. With Let x•2• (278) 2 = = 3864. µ i = true average growth when hormone #i is applied.447 and MSE = . c.447 ) (. 4 . 18. H 0 : µ1 = .17 There is no significant difference in the axial stiffness for lengths 4.49 . so θˆ = .5 on page 418) − . 5x1• + .30 = 215. but does not differ from 10. 
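Tukey's procedure in this section computes w = Q·√(MSE/J) and declares any pair of sample means farther apart than w significantly different. Using the brand means 437.5, 462.0, 469.3, 512.8, 532.1 with Q.05,5,15 = 4.37, and taking MSE = 272.8 and J = 4 as assumptions where the printed values are unclear, the two-group split of the brands ({3, 1, 4} versus {2, 5}) can be reproduced:

```python
from math import sqrt

means = {3: 437.5, 1: 462.0, 4: 469.3, 2: 512.8, 5: 532.1}  # brand sample means
q, mse, J = 4.37, 272.8, 4   # Q(.05; 5, 15); MSE and J are assumed values
w = q * sqrt(mse / J)        # ≈ 36.09

# Pairs of brands whose sample means differ by more than w
sig = sorted((a, b) for a in means for b in means
             if a < b and abs(means[a] - means[b]) > w)
# Brands 3, 1, 4 do not differ among themselves, nor do 2 and 5,
# but every between-group difference exceeds w.
```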
With t .36 437.025.21 368.50 – J 4 2 3864. H a : at least two µ i ' s differ.50.3 215.15 = 3. a.80. Since the largest standard deviation (s 4 = 44. 17.48 and associated p-value of 0. 406). = µ 5 will be rejected in favor of H a : at least two µ i ' s differ if f ≥ F.000.5 = 50. There appears to be a 14.Chapter 10: The Analysis of Variance 16. 6.091) .49 ≥ 3.075 f = = 3. so SSTr = 4064. and 10. reject Ho .3667 MSTr = difference in the average growth with the application of the different growth hormones. 4 6 8 10 12 333. and 8..06 .. The relevant hypotheses are H 0 : µ1 = µ 2 = µ 3 = µ 4 = µ 5 vs.5 and c 3 = −1 .05. 8.03106 )(1. and 8. 6.50 .50 . we can reject Ho and conclude that there is a difference in axial stiffness for the different plate lengths.701.305 = (− . With the given f of 10.5 x2• − x 3• = −.3. and 4 15 50.3667 .396 ± (2.03106. 21.75 which doesn’t exceed 8. and 17. and f = 22.645) 1. so no differences are judged significant.f.37 14.05 . Reject Ho if f ≥ F.6. The sample means are.4867 SSE > 10 ( = 20 – 10. Thus we wish > 3.34.16 . MSE = 1.167. error d.005. = 12. H a : at least two µ i ' s differ .94.5.15 = 4.681. These become 431.00 = 7. in increasing 4 order. b.6 ≥ 3. 2 . SSE / 12 SSE MSE SSE 1680 w = Q.645 i) Confidence interval for µ1 Σ ci xi ± t α / 2.681.12 = 3.50. 140 1680 = and F. w = .05.681. so f = 1500 . = µ 6 vs.1333.75 – 10.4867 SSE . The data indicates there is a dependence on injection regimen.00.so no significant differences are identified).89 J 60 SSE (significance of f) and . we reject Ho . 12.89 . Grand mean = 222. w = 4. 17. 3.34 ) With 22.25 ) = (29.Chapter 10: The Analysis of Variance b. a. 01.75.64 ) .8333(1. The hypotheses are H 0 : µ1 = . The most extreme difference is 17. 10. 14 Confidence interval for 14 (µ 2 + µ 3 + µ 4 + µ 5 ) − µ 6 : = −67.645) ii) = 61. the difference between the extreme x i• ' s .75. 5.01.8333.60 = 3. Clearly no value of SSE can satisfy both f = inequalities.4 ± (2. Assume t . 
so SSE = 425 will work.05.75 ± (2. MSTr = 38.015.34 .2 ) = (− 99. Tukey’s method and the F test are at odds. MSTr = 140. use f ≥ F.12 = 3.37 .28. −35. 19.60 > SSE and SSE > 422. 78 ( but since there is no table value for ν 2 = 78 . 5. and the inequalities SSE become 385.4867 SSE as before.16.88 > SSE and SSE > 422. 78 ≈ 2.77 = .16 . Q.16) 14 301 . Now MSTr = 125. so 20. I ( J −1) − 15 (µ 2 + µ 3 + µ 4 + µ 5 + µ 6 ) : ( ) MSE Σ ci2 J 1.3667 = 8.28 ... 11.8333(1.50. 07 .3.07 = 456.56 Error 74-3=71 970. x 4• = 227.14 5. x 3• = 50. J3 = 4.05. 24.94 .12.12 ≥ F. x 2• = 221.85 .078. x 3• − x 4• ± W 34 = (5.07 5 4 4 5 = 49. x 2• − x 4• ± W24 = (9.05.56 ≥ F. 23.953. 4. ΣΣx ij2 = 50.14 ⋅ significantly different. 2.35 ) ± (5. = µ 4 is rejected at level .28 .497.89  1 1  + = 4 .96 13.81) *. 4 3 2 1 This underscoring pattern does not have a very straightforward interpretation.88) ± (5..89 .6 .05.497. corresponding to µ ' s that are judged MSE = 8.78) ± (5.4) 2 + (221.48) *.497. 2  J i J j  2  J i J j  x1• − x 2• ± W12 = (2. x 2• − x3• ± W 23 = (4. and f = 17. 71 ≈ 4.01.50 = 152.07 .90 ) ± (5. J4 = 5. 302 . MSE  1 1  8. x 4• = 45.81) . There is a difference MSTr = in yield of tomatoes for the four different levels of salinity. MSE = = 8. x1• = 58.50 124. x 2• = 55.01.4) 2 + (227. reject H 0 : µ1 = µ 2 = µ 3 at level ..34 . With Wij = Q.09 5. Since Source Df SS MS F Groups 3-1=2 152. SSTr = (291.14 = 3. Summary quantities are x •• = 943.Chapter 10: The Analysis of Variance Section 10. x1• − x 3• ± W13 = (7.43) ± (5. x1• = 291.17 .50 . J2 = 4.40 . x1• − x 4• ± W14 = (12. H 0 : µ1 = .81) .89. x 3• = 203.55) ± (6. CF = 49.13) . from which SST = 581 .6 )2 + (203. Because 3 18 − 4 17. and SSE = 124.50 .57 − 49.5) 2 − 49. *Indicates an interval that doesn’t include zero.9 . Thus 456. 11 + .4 . J1 = 5.3 22.81) *.5 .50 .4 .68 Total 74-1=73 1123.18 76. 29 * − 5.273.04 ± 1. a.017 ) + 13(42.04 ± 1.20 .37 * − 5.10 12.37 − 4.30 ± 1.10.5 . 
We do not reject the MSE = null hypothesis at significance level .37 . 20 = 4 .6 . = µ 6 is rejected.45. so we conclude that the data suggests no difference in the percentages for the different regimens. and we need the grand mean: 8(43.4 5.0 − 43.017) = 8.6 4. f = 79.4 − 43.621 1. = + 17(43.017) + 13(43.714 Since 48 MSE 1.0 55.5 − 43.017 ) X .7 72.4 2.334 2 2 8.9 = = 43. i: 1 2 3 4 5 6 JI: 4 5 4 4 5 4 xi • : 56.3) + 16(1.2) + 13(1. ' s .4) + 17(43. The distributions of the polyunsaturated fat percentages for each of the four regimens must be normal with equal variances.5 2. Then f = = = 1.3.03 ± 1.5 ) + 12(1.2) = 77. MSTr = 21.27 ± 1.6 Interval − 3.79 MSTr 2.) + 2  J i J j  Interval 1.44 * Pair 2.6 Interval − 1.4 85.5 3.3.10 ΣΣx 2j = 5850.10 17.017 52 52 2 2 2 SSTr = ∑ J i (x i.3 52.4 64.3 2. 26.83 13.2 xi • : 14.64.10 .37 Asterisks identify pairs of means that are judged significantly different from one another.1) + 14(43.44 1. MSE = .6 3. The modified Tukey intervals are as follows: (The first number is second is Wij = Q.4 1.334 = 2.0 ) + 13(42.3 ≥ F. We have all the X i.2 1.80 13.44 * 1.37 * − 4.44 − 3.64.31 ± 1. SSE = 5.3 1.10 (or any smaller).37 * Pair 3. ) = 8(43.Chapter 10: The Analysis of Variance 25.10.00 ± 1.44 * − 4.30 ± 1. 50 = 2. a. b..34 ± 1.01.621 .37 * − 4.5 4.96 ± 1.44 − .01 ⋅ Pair 1.5 x i• − x j • and the MSE  1 1  .00 ± 1.1 − 43.37 − . − x.778 3 2 2 2 2 SSTr = ∑ ( J i − 1)s 2 = 7(1.778 = 1.20 Thus SST = 113.00 ± 1.14 18.37 ± 1. b. H 0 : µ1 = . we can say that the p-value is > .714 < F. Since 79 .5) 2236.27 ± 1. 303 .19.30 ± 1..79 and and MSTr = 77. SSTr = 108...4 x• • = 386. 01. 3.16 ± . 27.27 With numerator df = 3 and denominator = 20.78.61 = 23. 2 i Ji t .27. 20 = 2.94 . MS 7.845 .10 .10 − 1181. The 99% t confidence interval is ( ) MSE Σc i2 . so = + Ji 7 5 6 6 SSTr = 1205. The interval in the answer section is a Scheffe’ interval.9) 2 = 1205.05 .Chapter 10: The Analysis of Variance c.9) (37. so SST = 65. so .75 < F. 
Let x2 (168.16 ± 2. 20 = 3. and is substantially wider than the t interval.4 ) = 1181. 20 = 4.10 < 3. and since the p-value < . The resulting interval is ) ( )( ) ( ) − 4. 2 2 ij Source Df SS Treatments 3 23.005.1) 2 + (34. 3. At least one of the pairs of brands of green tea has different average folacin content. = −4.16 .005.1719 . µ i = true average folacin content for specimens of brand I.62 = − 4. ( 1 4 1 4 1 2 (Σc ) = .05.5) + (38. Ji Σ ci xi • ± t . I ( J −1) Σ ci xi • = x1• + x2• + x 3• + 14 x 4• − 12 x 5• − x6• 1 4 MSE = . H a : at least two µ i ' s differ .−3.49 .61 .75 F.49 Error 20 41. we reject Ho .88 and •• = n 24 2 2 2 Σxi• (57.83 2.273.54 . The hypotheses to be tested are H 0 : µ1 = µ 2 = µ 3 = µ 4 vs.01 < p − value < .78 Total 23 65.1719 = −4.845 . a.09 F 3. 304 . ΣΣx = 1246.05.273 . { } SSTr = Σ Σ(X i• − X •• ) = Σ J i (X i• − X •• ) = Σ J i X i2• − 2 X •• Σ J i X i• + X •2• Σ J i i j 2 i• 2 = Σ J i X − 2 X •• X •• + nX i 2 i 2 •• i = Σ J i X − 2nX + nX i 2 i• 305 2 •• i 2 •• i = Σ J i X − nX i 2 i• 2 •• . so the normality assumption is plausible.27.82 for I = 1. 4.45 ± 2.92 ± 2.4 1. 2.4 .68 ± 2.15 ± 2.4 2.3 1.2 . and 5. . 05. A normal probability plot appears below.Chapter 10: The Analysis of Variance x i• = 8.25 * 3. With that the distribution of residuals could be normal.45 1.25 2.77 ± 2.34 4 3 2 1 Only Brands 1 and 4 are different from each other.53 ± 2.96 ⋅ c.96 and Wij = 3.50. and indicates b. 7. Normal Probability Plot for ANOVA Residuals 2 resids 1 0 -1 -2 -2 -1 0 1 2 prob Q.37 2. 2. we calculate the residuals x ij − x i• for all observations. 28.09  1 1  + . 6. 20 = 3. so the Modified Tukey 2  J i J j  intervals are: Pair Interval Pair Interval 1.3 1. 3.35. 4.45 1. Chapter 10: The Analysis of Variance 29. the ANOVA should be done using 1 = 3 .5).15.5 J . Φ2 = 32.16 = 5. Φ = 1. and y ij = x ij . a. SSTr = 2. ≈ . Ho cannot be rejected. 3 .5 .90 . 2 (20 25 ) α 4 = 54 . J = 9 looks to be sufficient.5). ν 2 = 14 . 
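The t confidence interval for a contrast Σcᵢμᵢ used in these exercises, Σcᵢx̄ᵢ ± t_{α/2, n−I}·√(MSE·Σcᵢ²/Jᵢ), can be sketched as follows (function name ours; the t critical value is supplied from a table):

```python
import math

def contrast_ci(cs, xbars, Js, mse, t_crit):
    """Confidence interval for the contrast sum(c_i * mu_i):
      sum(c_i * xbar_i) +/- t * sqrt( MSE * sum(c_i^2 / J_i) )
    with t_crit = t_{alpha/2, n-I} from a t table."""
    est = sum(c * x for c, x in zip(cs, xbars))
    half = t_crit * math.sqrt(mse * sum(c * c / j for c, j in zip(cs, Js)))
    return est - half, est + half
```

The Scheffé interval mentioned in the text uses the same point estimate but a wider multiplier, √((I−1)·F), in place of t.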
( ) ( ) ( ) = Σ J [Var ( X ) + (E ( X )) ]− n[Var ( X ) + (E ( X )) ] E (SSTr ) = E Σ J i X i2• − nX •2• = ΣJ i E X i2• − nE X •2• i 2 i• i 2 i• •• •• σ 2 (Σ J i µ i )2  σ 2 2 = ΣJ i  + µi  − n +  n  n   Ji  = ( I − 1)σ 2 + ΣJ i (µ + α i ) − [ΣJ i (µ + α i )] 2 2 = ( I − 1)σ 2 + ΣJ i µ 2 + 2µ ΣJ iα i + ΣJ iα i2 − [µ ΣJ i ] which E(MSTr) is obtained through division by ( I − 1) . α 4 = 1 . Φ 2 = = 1 . y 3• = 19. so µ = µ1 + 15 . ν 1 = 4 .67. so Φ = .52. α 3 = −1 . y 4• = 20.79 . c.12 . SST = 6. This gives y1• = 15.58 . By inspection 1 of figure (10. MSTr = . so Φ 2 = and from figure (10.12. power b.01 . α 1 = α 2 = α 3 = α 4 = − 15 . 2 30. 1 2 Φ 2 = . ν With Poisson data. SSE = 4.6).01. 306 . ) = 2. power ≈ . α 2 = α 3 = 0 . y •• = 71. The expected number of flaws per reel does not seem to depend upon the brand of tape. By inspection of figure (10. α 1 = −1 .26 . µ 5 = µ1 + 1 .60 Φ = 1. α 4 = 1 .84. MSE = . f = 3. ΣΣy ij2 = 263. 2 2 2 2 Φ ).15 . = ( I − 1)σ 2 + ΣJ iα i2 .25 5(− 1) + 5(0 ) + 5(0 ) + 5(1) 1 power ≈ .43 . y 2• = 17. CF = 257.62 . µ1 = µ 2 = µ 3 = µ 4 .26. Φ = 2 .71 . α 1 = α 2 = 0 . 31.55 . ν 2 = 45 .707 J and ν 2 = 4(J − 1) . Since F. With σ = 1 (any other σ ( would yield the same . from ( ) 2 0 2 + 0 2 + (− 1) + 12 = 4.29 .23. = µ 5 will be rejected in favor of H a : at least two µ i ' s differ if f ≥ F. 34.94 . With x •• = 30. H 0 : µ1 = µ 2 = µ 3 = µ 4 vs.4591 MSTr = = 55.278 . From a n  n  x  as the appropriate table of integrals. b.65 .65 < 2.112 80. We reject Ho when the p-value < alpha. this gives h ( x ) = arcsin u = arcsin   n   ( ) transformation. E (MSTr ) = σ 2 + 1  IJ 2  2 n− J 2  n − σ A = σ 2 + σ A = σ 2 + Jσ 2A I −1  n  I −1 Supplementary Exercises 35. 3. 01. x x  −1 / 2 g ( x) = x1 −  = nu (1 − u ) where u = . b. thus fail to reject Ho . straightforward calculation yields 221. we still fail to reject Ho .43 ≥ 2. MSE = = 16.Chapter 10: The Analysis of Variance 33.3. 307 . 
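The noncentrality quantity Φ used with the power charts (figure 10.5/10.6 in the text) satisfies Φ² = (1/I)·ΣJᵢαᵢ²/σ², which reduces to (J/I)·Σαᵢ²/σ² when all sample sizes are equal. A minimal check of that arithmetic (function name ours):

```python
def phi_squared(alphas, Js, sigma2):
    """Noncentrality quantity for fixed-effects ANOVA power charts:
      Phi^2 = (1/I) * sum(J_i * alpha_i^2) / sigma^2.
    Power is then read from the Pearson-Hartley charts using Phi,
    nu1 = I - 1, and nu2 = n - I."""
    I = len(alphas)
    return sum(j * a * a for j, a in zip(Js, alphas)) / (I * sigma2)
```

For example, αᵢ = (−1, 0, 0, 1) with Jᵢ = 5 and σ = 1 gives Φ² = 2.5, i.e. Φ ≈ 1.58 as in the solution above.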
The data suggests that 20.029 is not < . The calculated test statistic is f = 33. 4. Since 1.68 is not ≥ F.1098 .61 .61 . a.05. 20 = 4. The means do not appear to differ. Ho is not rejected. a..01. 40 = 2.82 . H 0 : µ1 = .43 . Because 3. There is a difference 16. Ho is rejected. 36. the five methods are not different from one another. and 4 5 55.1098 among the five teaching methods with respect to true mean exam score.109 with respect to true average retention scores. H a : at least two µ i ' s differ .278 f = = 3. The format of this test is identical to that of part a. Since . so h ( x ) = ∫ [u(1 − u )] du ..61 .12 = 1. x 5• = 14.48 . so we reject Ho .44 .78.76 ..76 .4 -1. x 4• = 14. x 3• = 12.855 22.867* 1. 5 38.05. 25 = 6.Chapter 10: The Analysis of Variance 37. 4. which is also < . The Tukey multiple comparison is appropriate. 5.78 .78 = .050 3.93.2 -2.. = µ 5 vs. Using Table A. The ANOVA table Let follows: Source Treatments Error Total Df 4 25 29 SS 30.600 3. Wij = 4. 24 = 4.217 1.15 .78 .49 . 05.4 -1.05.69 3.10.44 > F.714 0. Q. approximate with Q.94 x •• = 73.16 F. SSE = 3. 4.016 2.44 8. µ i = true average amount of motor vibration for each of five bearing brands.46 .001. 5. so p-value < . SST = 3. x 2• = 15.5 0.838 53.15 (from Minitab output.62.267* 2. Then the hypotheses are H 0 : µ1 = . do not reject Ho at level . so CF = 179.694 MS 7. 308 .001.93 = 2.066 1.93 2.233 . Source Treatments Error Total Df 4 25 29 SS . At least two of the means differ from one another. Pair x i• − x j • Pair x i• − x j • 1.3 2.69.3 0.283* 4.62 MS . Since 2.5 0.05.584 2.5 2. H a : at least two µ i ' s differ.62 .4 1. 05. 25 = 2.108 F 2.620 . 25 = 4.. SSTr = 180.650* *Indicates significant pairs. 3 1 4 2 x1• = 15.16 is not ≥ 2.71 – 179.17 ).914 F 8.914 / 6 = 1.5 1. 9897 – 86. H a : at least two µ i ' s differ.165 ± 2.16 = 3. Ho will be rejected in favor of Ha if f ≥ F.7114.49 θˆ = 2.  (204.060 .108.309 = (− . 
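The variance-stabilizing transformation derived above, h(x) = ∫[u(1−u)]^(−1/2) du with u = x/n, integrates (up to a constant factor) to the arcsine-square-root form. A sketch, with the function name ours:

```python
import math

def arcsine_transform(x, n):
    """Variance-stabilizing transformation for binomial-type counts:
      h(x) = arcsin( sqrt(x/n) ),
    obtained from h(x) = integral of [u(1-u)]^(-1/2) du with u = x/n
    (constant factors do not affect the stabilization)."""
    return math.asin(math.sqrt(x / n))
```

For Poisson counts the analogous transformation used earlier in the chapter is simply yᵢⱼ = √xᵢⱼ.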
Variation in laboratories does not appear to be present.31 61.7114 8 = 3. µ1 = µ 2 = µ 3 . so β ≈ . 8 = 4.474 ) . so 0 is a plausible value for 40. and SSE = . J α i2 I ∑ σ2 2 2 2 (− 35 σ )  6  3 ( 25 σ ) =  +  = 1 . ν 1 = 4 .077..48 .25 . a. . which is not ≥ 4. By 5  σ 2 σ 2  inspection of figure (10.25) 6 = .31 MS 11. This is a random effects situation. SST = 86.25) + (− .23 .16 = 6.07 . 42.58 − = . θ.52 .13 + 2.28 . so µ = µ1 − 25 σ .803 < F.025.Chapter 10: The Analysis of Variance 39.6).25 ) + (− .0559 3 . 309 .05.598.165 .7 )2 (134.803 F.05.96 . H 0 : σ A2 = 0 states that variation in laboratories doesn’t contribute to variation in percentage.00 38.01. so a 95% confidence interval for θ is . There are differences in CFF based on iris color.6 )2 (169. SST = 13. Then Φ 2 = 41.0559.659.078.00 The SSTR =  + +  8 5 6   ANOVA table follows: Source Treatments Error Total Because Df 2 16 18 SS 23.25) + (− . 25 = 2.67 – 13.63 < 4. This interval does include zero. power ≈ . µ 4 = µ 5 = µ 1 − σ .0) 2   − 13.36 = 23.07 .7673.05.144. Then the hypotheses are H 0 : µ1 = µ 2 = µ 3 vs. SSTr = 1. 2. Thus f = 1 .165 ± .50 2.41 + 2. MSE = .598.108 )(1. t . α 1 = α 2 = α 3 = 25 σ . ν 2 = 25 .3.01 < p-value < . α 4 = α 5 = − 35 σ . and 4 2 2 2 2 2 Σci2 = (1) + (− .2224 = 1.31.39 F 4.36 = 61.060 (. so we reject Ho . so Ho cannot be rejected at level .25 ) = 1. 2.63 + 2.05. 2. µ i = true average CFF for the three iris colors.632 and Φ = 1. 540 . 3. c1 = 1.39 )(3.70. for µ1 − µ 3 .1.27 1. (I − 1)( MSE )(F.59 – 26.−.83.99.606 )(4 . Similarly. 2.39  1 1  + .498 )(4 .42 Brown 25.92 (− 1.17 = -2. and for i 2 c i2 . 5µ 2 − µ 3 The contrast between ci2 1 1 ∑ J = 5 + 6 = .2 − 1.33 ) ± (.5 2 . 43.0.33 25.27 ) (− 1.58 26.92 = -1.33) (− 1.17 The CFF is only significantly different for Brown and Blue iris color.92 Blue 28. i .166 .92 ) ± (.5 2 (− 1) = + + = .58) ± (. ∑J 8 5 6 i Estimate Interval 25. 
c2 = 1.65 ⋅ intervals are: (x Pair i• − x j• ) ± W ij 1.15) µ1 and µ3 since the calculated interval is the only one that does not contain the value (0).1 . Contrast µ1 − µ 2 µ1 − µ 3 µ2 − µ 3 .166 ) = (− 3. i ci2 1 1 ∑ J = 8 + 6 = .5 µ 2 + .59 Green 26.166 ) = (− 3. for µ 2 − µ 3 .540 )(4.16 = 3.166 ) = (− 3.33 ± 2.498 . I −1.17 = -1.92 – 28.65 and Wij = 3.25 -1.3 − 2.570 .59 – 28.25 ± 2. so ci2 1 1 ∑ J = 8 + 5 = . so the Modified Tukey 2  J i J j  Q.04 ) (− 2.58 ± 2. n− I ) = (2)(2.166 ) = (− 4.63) = 4. 310 .25 ) ± (. and c3 = 0.05.05.5µ 2 − µ 3 .606 .15 * 2.570 )(4.77 . 5µ 2 + .3 − 1. For µ1 − µ 2 .Chapter 10: The Analysis of Variance b. 49 24.8 is rejected.13 .07. . 1.63 59.33. .36.86.54 7. 45. The ordered residuals are –6. .07 Q. The resulting plot of the respective pairs (the Normal Probability Plot) is reasonably straight. F.51. -.33.8 ≥ 4. The corresponding z percentiles are –1. and F computed from Yij ’s = F computed from Xij ’s. 1. and 1.09. Only x1• − x 4• < 7. and thus there is no reason to doubt the normality assumption.67.86. Source Treatments Error Total Because Df 3 8 11 SS 24. 0.13 .05 4.92 . -. .67.67. 6.96 . -2. -1.33. -.33. -.07 . -5.8 = 4.30 . so all means are judged significantly different from one another except for µ 4 and µ 1 (corresponding to PCM and w = 4.Chapter 10: The Analysis of Variance 44.997.21.67. 0.67. so 7.12 MS 8312.36.07.33. 46. -1. 3 x 3• = 115.91.44 = 7. H 0 : µ1 = µ 2 = µ 3 = µ 4 F 1117. 2. . x1• = 33.51. .91.53 OCM). . Yij − Y• • = c (X ij − X •• ) and Yi• − Y•• = c( X i• − X •• ) .09.33. so each sum of squares involving Y will be the corresponding sum of squares involving X multiplied by c2 .44 1117. -. 1. 4. .38. c2 appears in both the numerator and denominator so cancels. -1.937. -.38. -4. 1. 0. The four sample means are x 4• = 29. 5. Since F is a ratio of two sums of squares. -1. 311 . 4.53 .84 .21. . and x 2• = 129.05. Chapter 10: The Analysis of Variance 312 . SSE = 298. 2.25. x 3• = 142 .2 7.36 B 2 91. b. 
αˆ 1 = x1• − x• • = 4.65 .50 45. x 2• = 152 .93 not ≥ F.93 ≥ F. MSE = = 4. βˆ1 = x•1 − x•• = 3. Since 2. 2.50 . αˆ 2 = .55 is 4 12 4. x •2 = 188 . 05.76 .6 59. x1• = 163 .300. βˆ 2 = −3.55 .86 1. f B = = 2.599 .05.58 .70 . so SST = 12 2 2 2 2 298. µˆ = x •• = 50.92 . αˆ 3 = −2.25 . a. 3. 4 .25 .50.300.25 F.25 − 83.14 .12 = 3.58 . x •1 = 215 .98 is not 3 4.58 27. 05.98 . ΣΣx ij2 = 30.17 . don’t reject HoB.23 Error 6 123.1 1. 6 = 4.58 − 91.08 .3. don’t reject HoA . F.6 = 5. b.75 2.25 − 30.70 = 14. neither HoA nor HoB is rejected. There is no difference in true average tire lifetime MSB = due to different brands of tires. x •3 = 200 . 44. x •• = 603 .49 .75 . Since neither f is greater than the appropriate critical value.05. βˆ 3 = −. 30. Since 1.1 14. There is no difference in true average tire MSA = lifetime due to different makes of cars.50 = 123.CHAPTER 11 Section 11.75 = 83.25 .300. CF = (603)2 = 30.17 20. SSB = 30. x 4• = 146 .75 = 91.42 .12 = 3. f A = = 1. 313 .53 Total 11 298. αˆ 4 = −1.65 = 7. a. SSA = 13 (163) + (152) + (142) + (146) − 30.392.26 .93 . [ ] Source Df SS MS F A 3 83. 248.2.99 . 9 = 6. x •1 = 1347 . w = 95. Source Df SS MS F A 3 324. x •• = 6445 . 248.4 .4 Since b. x •3 = 1677 . x 3• = 1764 . both HoA and HoB are rejected.96 . x •2 = 1529 .082.2 . SSE = 9232.56 . SSA = 324.2 108. ΣΣx ij2 = 2.596. Q. w = 5.4 4 i: 1 2 3 4 xi• : 231. as in b i: 1 2 3 4 x• j : 336.311.969.00 613.Chapter 11: Multifactor Analysis of Variance 3.75 325.934. x •4 = 1892 .0 Error 9 9232.25 419. 2 ( 6445) CF = = 2.375 . x1• = 927 .8 Total 15 373.01. 4.0 1025.75 382. F. 314 .4 13.25 473 Only levels 1 and 4 appear to differ significantly.2 13.8 = 95.96 1025. x 4• = 2453 .0 a.25 441.01. 16 SST = 373. x 2• = 1301 .126.027.4 105. 9 = 5. 3.934. SSB = 39.4 .25 All levels of Factor A (gas rate) differ significantly except for 1 and 2 c.3 B 3 39.082. 25 6. 315 .5565 Connector 4 246. b.3 Brand 1 differs significantly from all other brands. 01.3 41. a. SSB = 38. SSA = 159.7 50.37. 
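The two-way (randomized block) sums of squares computed by hand above can be verified with a short sketch. For one observation per cell, SSA = J·Σ(x̄ᵢ. − x̄..)², SSB = I·Σ(x̄.ⱼ − x̄..)², and SSE = SST − SSA − SSB; the function name is ours:

```python
def two_way_anova(table):
    """Sums of squares for a two-way layout with one observation per cell.
    table[i][j] is the observation for row (factor A) level i and
    column (factor B) level j. Returns (SSA, SSB, SSE) with
      SSA = J * sum_i (xbar_i. - xbar..)^2
      SSB = I * sum_j (xbar_.j - xbar..)^2
      SSE = SST - SSA - SSB."""
    I, J = len(table), len(table[0])
    grand = sum(sum(row) for row in table) / (I * J)
    row_means = [sum(row) / J for row in table]
    col_means = [sum(table[i][j] for i in range(I)) / I for j in range(J)]
    sst = sum((x - grand) ** 2 for row in table for x in row)
    ssa = J * sum((m - grand) ** 2 for m in row_means)
    ssb = I * sum((m - grand) ** 2 for m in col_means)
    return ssa, ssb, sst - ssa - ssb
```

The F ratios are then fA = MSA/MSE with (I − 1, (I − 1)(J − 1)) df and fB = MSB/MSE with (J − 1, (I − 1)(J − 1)) df.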
w = 7.00 396.3. H a : at least one α i is not zero. reject HoA : α 1 = α 2 = α 3 = α 4 = 0 : The amount of coverage depends on the paint brand. x •1 = 183 .13 7. Tukey’s method is used only to identify differences in levels of factor A (brands of paint).16 19.5833 H 0 : α1 = α 2 = α 3 = α 4 = 0 . SSE = 40. f A = 2. d.67 .Chapter 11: Multifactor Analysis of Variance 4.7425 8. x 3• = 125 . 6 = 4.80 is not ≥ 5.58 53. Q.1419 Error Total 12 19 91. The amount of coverage does not depend on the roller brand.78 Since 7.12 = 5.00.76 B 2 38.95 . Source Df SS MS f Angle 3 58.7 45. x 2• = 137 . Since 2.67 238.00 2.98 . c.14 Error Total 6 11 40. i: 4 3 2 1 xi• : 41. x •• = 537 . x1• = 151 .80 5. 5. x •2 = 169 .76 .3867 2. Source Df SS MS f F. SST = 238. so fail to reject Ho .05 A 3 159.5565 < F. 4 .25 .97 61. x 4• = 124 .85 4. After subtracting 400. do not reject HoA : β1 = β 2 = β 3 = 0 .00 19. The data fails to indicate any effect due to the angle of pull.90 .19 7. Because HoB was not rejected.85 ≥ 4. x •3 = 185 .14 .05 . 35. x1• = .454. 3. MSA = 11.295 − 140.05. HoA is rejected. MSTr = 14.9763 = . and suggests substantial variation among subjects. 51 . w = .6 5. which is significant.68 . If there really was a difference between assessors. SSE = 1. MSE = . Q.18 = 3.Chapter 11: Multifactor Analysis of Variance 6.0938 .8906 SSTr = 1.454 = 2977. SST = 3476. a. so both anesthetic 1 and anesthetic 2 appear to be different from anesthetic 3 but not from one another. x 2• = .81. 2 . 7.443 .1 ≥ F.55 . a.05.30 .34 .43 . f = 6. If we had not controlled for such variation.83 .61 . Otherwise extraneous variation associated with houses would tend to interfere with our ability to assess assessor effects. x 3• = . SSE = 469.8217 .853 .1. an observed difference between assessors might have been due just to variation among houses and the manner in which assessors were allocated to homes.85 . 430.53 . CF = 140.39. f Tr = 1.55. SSBl = b. b. Since 6.434 .20 significant at level .20 . 
house variation might have hidden such a difference. 3 MSTr = .78 . a.05. SSBl = − 9. 316 . x1• = 4. MSE = = 3. it might have affected the analysis and conclusions. which is clearly insignificant when compared to F. 32. x •• = 17.9872 . b. Alternatively. f = = 1. SST = 3.6887 . SSTr = (905 )2 + (913) 2 + (936 )2 18 − 140.67 .7 25.85 = 5.5729 . which is not 2 8 3. there appears to be a difference between anesthetics. 8. x 3• = 8. f Bl = 12. x 2• = 4. MSE = 3 13.1458 .05. 2.18 = 3.454 = 28.04 . Source Df SS MS f Treatment 3 81. 317 .22 10. there are differences between batches as well.2106 Total 35 176.18 = 10.001.04 1.56 9. but Method A produces strengths that are different (less) than those of both B and C. 2.01 .43 9 1 4 3 2 8. F.69 Batch 9 86.18 = 3. which is significant.05.3125 6.001 < p-value < .01. 3.3. 2. w = (3.18 = 6. Q.34 = 1.0648 22.32 10 Method A Method B Method C 29.78 12.0556 1.90 ) 1.01 < 8. 4.44 Source Df SS MS f Method 2 23.90 .07 10.69 < F. (With pvalue < . 24 = 3. w = (3.40 Methods B and C produce strengths that are not significantly different.Chapter 11: Multifactor Analysis of Variance 9.2106 = 1.34 Total 29 134.001.49 31. 24 = 3.) Q.01.61 .7500 F.79 9.61) 1.61 8.22 Error 18 24.5000 8. so .64 7.36 Block 8 66. At least two of the curing methods produce differing average compressive strengths.39 . 05.87 Error 24 29.31 31.1944 27.23 11.05. There is an effect due to treatments. Reject Ho . 01 . the conclusions reached from using the Y’s will be identical to those reached using the X’s. when F ratios are formed the c2 factors cancel. a.10).Chapter 11: Multifactor Analysis of Variance 11.55). Yi• = X i• + d . (0. J j I i J j = 318 . etc. 113. -0.32). -1. 12.0875.0750. With Yi• − Y•• = X i• − X •• .0283. E (X i• − X • • ) = E (X i• ) − E ( X •• ) = 1  1 E  Σ X ij  − E Σ Σ X ij   IJ  i j  J j 1 1 Σ (µ + α i + β j ) − Σ Σ (µ + α i + β j ) J j IJ i j 1 1 1 = µ + α i + Σ β j − µ − Σ α i − Σ β j = α i . b. as desired. 1. 
Normal Probability Plot residuals 0.55). If Yij = cX ij + d . (0. Yij = cX ij .10).0825. f B = 8. -1.0992. 14.15).0642. (0. 0.73).73). The residual.1225.87 . Y•• = X •• + d .20 .1 0.81). percentile pairs are (-0.01.0117. 0. (-0. (0. (-0.1575. MSE = = 3. Y• j = X • j + d .87 ≥ 7. (0. With Yij = X ij + d .0758.15). F.).0708. (0. -0. and since 4 8 8. (-0. we reject Ho and conclude that σ B2 > 0 .5) remain unchanged when the Y quantities are substituted for the corresponding X’s (e.1 -2 -1 0 1 2 z-percentile The pattern is sufficiently linear.5 25.6 = 28.01 .81). 1.0 -0. 4. so all quantities inside the parentheses in (11. -0. so normality is plausible. MSB = 13. so all F ratios computed from Y are identical to those computed from X. each sum of squares for Y is the corresponding SS for X multiplied by 2 c .38 .. -0.g. (0. 8 = 7. (0.0350. However. 0.32). 0.  3  24  Σ α i2 = 24. so HoA is rejected at level .43 . Φ = 1. f A = 3. Source Df SS MS f A 2 30. 05.53 1.Chapter 11: Multifactor Analysis of Variance 15.581. 6.87 = 64.6 11.59 .81 which is not ≥ F. and from  4  16  figure 10.763.2 .436.93 .50 3.79 which is not ≥ F.02 4010.395. b.20 2.8 4059. Φ =   ∑ 2 =    = 1.40 . so HoB is not rejected. and J σ  5  16  power ≈ .00 .00 .79 which is ≥ F. 24 = 3.6 f AB = 1. and we conclude that no interaction is present.79 Error 24 97.53 4059. so Φ = 1. 2. a. 24 = 3.2 16.966. Q.51 .125 . Φ = 1. 05. ν 2 = 12. w = 3. a. so HoAB cannot be rejected. 2  1  β j  4  20  b. e. For the second alternative. ν 1 = 3. 24 = 2. 3.05.185. ν 1 = 4.10 Only times 2 and 3 yield significantly different strengths.5. f B = 2. 2 Section 11.53 . d.2 7263.01 .81 AB 6 43.0 15.88 4029. ν 2 = 6.06 . and power ≈ . 3. 24 = 3. so Φ 2 =    = 1.79 B 3 34. 12 3 1 2 3960.381.87 Total 35 205. c.05. 05. 319 .3 . power ≈ . 5 0 0.25 0.5 30 0.50 8. Sand% Fiber% x 0 0 62 15 0 68 30 0 69. c.05 4.33* 4. This agrees with the statistical analysis in part b 75 0.74 Error 9 843 93.0 6.26 3.75 0. 
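The residuals plotted above come from the additive model, whose fitted values are x̂ᵢⱼ = μ̂ + α̂ᵢ + β̂ⱼ = x̄ᵢ. + x̄.ⱼ − x̄.., with residuals xᵢⱼ − x̂ᵢⱼ. A sketch of that computation (function name ours):

```python
def additive_fit(table):
    """Fitted values and residuals under the additive two-way model:
      xhat_ij = xbar_i. + xbar_.j - xbar..
      resid_ij = x_ij - xhat_ij.
    table[i][j] is the single observation in cell (i, j)."""
    I, J = len(table), len(table[0])
    grand = sum(sum(r) for r in table) / (I * J)
    rm = [sum(r) / J for r in table]                              # row means
    cm = [sum(table[i][j] for i in range(I)) / I for j in range(J)]  # col means
    fitted = [[rm[i] + cm[j] - grand for j in range(J)] for i in range(I)]
    resid = [[table[i][j] - fitted[i][j] for j in range(J)] for i in range(I)]
    return fitted, resid
```

Pairing the ordered residuals with z percentiles gives the normal probability plot used to check the normality assumption.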
F.5 3.26 Sand&Fiber 4 8.82* Sand&Fiber 4 279 69.25 71.Chapter 11: Multifactor Analysis of Variance 17.00 0.63 b.56 5.67 Total 17 3.5 30 0.25 69 15 0.278 639.26 4.89 2.5 71. Source Df SS MS f F.05 Sand 2 106.63 Error 9 73. a.5 68 15 0.105 There appears to be an effect due to carbon fiber addition.39 6.54* 4.76 Fiber 2 1.11 43.50 mean 70 65 0 10 20 Sand% 320 30 .78 53. Source Df SS MS f Sand 2 705 352.17 Total 17 276.5 74 The plot below indicates some effect due to sand and fiber addition with no significant interaction.28 There appears to be an effect due to both sand and carbon fiber addition to casting hardness.22 .25 73 0 0.27 3.26 Fiber 2 87. 3 166.33 6.63 -1.1 179.6 190.3 Observed 161.57 -2. For Factor A: µ1• = 187.6 µ13 = 191.19 α 2 = −11. c.27** F.93 12 17 71.3 321 Fitted 161.13 3.73 Residual 0.47 166.97 d.97 0.73 overall mean: µ = 175. Observed 189. 03 µ 33 = 166.47 µ12 = 180.253.01 9.88 For Interaction: µ11 = 189.45 y12 = −1.18 β j = µ• j − µ : β1 = 1.1 165.29 1.03 161.47 189.2 -3.03 191.7 5.83 µ • 2 = 170.1 -0.253.44 115.07 -3.2 166.73 166.4 4.96 y 23 = −.45 y 21 = −.99 β 2 = −5.82 µ •3 = 178. Source Formulation Speed Formulation & Speed Error Total Df 1 2 SS 2.2 µ 22 = 161.89 F. Both formulation and speed appear to have a highly statistically significant effect on yield.41 y 22 = 1.03 191.2** 19.58 9.67 -1.7 159.66 µ •1 = 177.03 µ 2• = 164.3 1.84 α i = µ i• − µ : α 1 = 11.41 f 376.23 0.3 Fitted 189.6 180.57 .9 167.23 -0.55 3.99 a.87 0.44 230.7 188.03 161. Let formulation = Factor A and speed = Factor B.04 For Factor B: y ij = µ ij − (µ + α i + β j ) : y11 = .73 166.8 161.1 163.574.6 Residual 0.5 -1. b.93 2 18.6 180.81 MS 2.47 189.43 -0.03 191.2 180.02 β 3 = 3.4 177.87 2.05 4.03 1.6 170.0 191.6 189. There appears to be no interaction between the two factors.75 3.2 166.89 6.6 185.1 165.39 y13 = .Chapter 11: Multifactor Analysis of Variance 18.03 166.03 µ 21 = 166.0 193. 
07 0.20 -1.556 86.43 -3.778 8.03 -1.36 0.87 -0.07 0.667 47.889 69.30 -0.50 2.91 Normal Probability Plot of ANOVA Residuals 5 4 3 Residual 2 1 0 -1 -2 -3 -4 -2 -1 0 z-percentile The residuals appear to be normally distributed.97 3.40 1.Chapter 11: Multifactor Analysis of Variance e.333 63.63 0.23 0.09 1.30 -2.38 -1.09 -0.57 0.51 0.111 41. i Residual Percentile z-percentile 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 -3.23 -1.36 -0.111 91.667 97.67 0.778 58.57 4.91 -1.86 1.889 19.21 0.10 -0.07 0.67 -0.13 0.222 52.86 -0.222 -1. 322 1 2 .000 30.51 -0.21 -0.444 75.444 25.38 1.000 80.67 1.333 13.556 36. 2942 HoAB cannot be rejected.0622 3.37 x • j• 49. 2 (49.37 )2 − 1238.0024 .20 50.01 A 2 .27 16.2942. 9 = 5.9991 = .33 CF = 1238.43 .Chapter 11: Multifactor Analysis of Variance 19.0036 .607 Coal 2 is judged significantly different from both 1 and 3. Q.15)2 + (50.21 6. w = 5.8583 = 1.81 2 16. so varying levels of NaOH does not have a significant impact on total acidity. HoA cannot be rejected.247 8.0170 = .1243 .80 17.91 49.10 49.1530 .48* 8.035 8.1525 – 1238.44 17.81)2 + (49.0024 SSA = 6 SSE = 1240. b.1530 .64 48.289 6 j: 3 1 2 x • j• 8.42 Error 9 . so no significant interaction. a. 2479.5012 29.8583 Thus SST = 1240.0170 Total 17 1.37 16. HoB is rejected: type of coal does appear to affect total acidity.24 17.15 3 16.8583 = .43 .00 15. 323 .0145 . j i x ij• 1 2 3 x i• • 1 16.1243 .02 B 2 1.01. 3. SSB = 1.1525 − Source Df SS MS f F.48 51.66 8.21 x••• = 149 . but these latter two don’t differ significantly from each other.02 AB 4 . 80 .70 − [22. x• 2 • = 1640 .380.46 .103 − (19. 10 30 SSAB = 64. 2. x13• = 845 .32 is insignificant.05.93 Error 12 833. 324 . CF = 1. Both HoA and HoB are both rejected. x• 3• = 1520 .688.33 69. Both phosphor type and glass type significantly affect the current necessary to produce the desired level of brightness.8 = 3.96* 6.32 6. x 2• • = 2115 .40 ≥ F.80 + 22.38 499.87 499.Chapter 11: Multifactor Analysis of Variance 20.253.40 . 
SSE = 12.50 .87 Source Df SS MS A 2 22. fAB = . 4.50 1016. since 11. x 22• = 735 .44 622. x•1• = 1560 . SSB = 22.253.253. 470.901 (19. Since f 11.103 − 2 2 122.93 B 2 1244.53 5691.338. Source Df SS MS f F.98 = 11.90 B 4 22.23 Error 15 15. HoB is also rejected.280.253.01 A 1 13.45 22.954.23 = 22.38 AB 8 3993. a.80 11.98 ≥ F. We conclude that the different cement factors affect flexural strength differently and that batch variability contributes to variation in flexural strength. since they are both greater than the respective critical values.338.93 AB 2 44. HoA is rejected.150 . 05. 21.70 fAB = .23 .941.237. ΣΣxij2• = 3.84 .143) SSA = − = 22.50] = 3993. x11• = 855 .89 13.22 8. SST = 12.8 = 4.70 .23 5691.49 is clearly not significant.90 Total 29 64.44 Total 17 15. 30 (24.950 .90 499.53 .941. 2 ΣΣΣxijk = 1.470.756. which yields the accompanying ANOVA table.954.941.461.89 192.765. x 23• = 675 . x1•• = 2605 .49 22. x 21• = 705 .143)2 = 64.529.11 Clearly.954.89.765.765.699) = 15. so HoAB is not rejected.280. x12• = 905 .53 + 15.09* 9. b. x• •• = 4720 . 499.598)2 = 20. 325 .591.25 Source Df SS MS A 3 1.67 Total 23 20.100.14 3.598)2 SSB =  = 2888.0 + 1387.591. SST = 11. − 8 24   SSAB = 20.76 5.552) = 8216.00 Interaction between brand and writing surface has no significant effect on the lifetime of the pen.492 − (16.387. and since neither fA nor fB is greater than its respective critical value.5 + 2888.888. − 6 24    (5413)2 + (5621)2 + (5564)2  (16.216.5 B 2 2.598)2 SSA =  = 1387.5 . we can conclude that neither the surface nor the brand of pen has a significant effect on the writing lifetime.07 = 1.350.34 = 1. H 0 AB : σ G2 = 0 .83 − [8216.08] = 8216.83 .591.5 462. H 0B : σ B2 = 0 .04 AB 6 8.08 . 24 (22.Chapter 11: Multifactor Analysis of Variance 22.444.08 1. The relevant null hypotheses are H 0 A : α 1 = α 2 = α 3 = α 4 = 0 .499.25 1.04 Error 12 8.97 F.0 .0 684.982. 
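With K replicates per cell, the interaction sum of squares computed above by subtraction can also be obtained directly from the cell means: SSAB = K·Σᵢⱼ(x̄ᵢⱼ − x̄ᵢ.. − x̄.ⱼ. + x̄...)² and SSE = Σᵢⱼₖ(xᵢⱼₖ − x̄ᵢⱼ)². A sketch for a balanced design (function name ours):

```python
def two_factor_anova(data):
    """Sums of squares for a balanced two-factor design with K replicates.
    data[i][j] is the list of K observations in cell (i, j).
    Returns (SSA, SSB, SSAB, SSE) with
      SSAB = K * sum_ij (xbar_ij - xbar_i.. - xbar_.j. + xbar...)^2
      SSE  = sum_ijk (x_ijk - xbar_ij)^2."""
    I, J, K = len(data), len(data[0]), len(data[0][0])
    cell = [[sum(data[i][j]) / K for j in range(J)] for i in range(I)]
    grand = sum(sum(row) for row in cell) / (I * J)
    a = [sum(cell[i]) / J for i in range(I)]                      # A-level means
    b = [sum(cell[i][j] for i in range(I)) / I for j in range(J)]  # B-level means
    ssa = J * K * sum((m - grand) ** 2 for m in a)
    ssb = I * K * sum((m - grand) ** 2 for m in b)
    ssab = K * sum((cell[i][j] - a[i] - b[j] + grand) ** 2
                   for i in range(I) for j in range(J))
    sse = sum((x - cell[i][j]) ** 2
              for i in range(I) for j in range(J) for x in data[i][j])
    return ssa, ssb, ssab, sse
```

The interaction F ratio fAB = MSAB/MSE is examined first; tests of the main effects are appropriate only when HoAB is not rejected.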
SSE = 11.05 4.83 f MSA MSAB MSB MSAB MSAB MSE = .492 − 2 2 2  (4112) + (4227 ) + (4122)2 + (4137)2  (16. a. x•1• = 5432 . − X .930. x3•• = 9234 .17 216.52 AB 8 1734. Source Df Since SS MS A 2 11. x• 3• = 5177 . and 20.09 4482. ) = E(γˆ ij ) = 1 1 1 1 Σ E( X ijk ) − Σ Σ E (X ijk ) − Σ Σ E (X ijk ) + Σ Σ Σ E (X ijk ) k j k i k K JK IK IJK i j k = µ + α i + β j + γ ij − (µ + α i ) − (µ + β j ) + µ = γ ij 326 . b.01 ..8 = 8. x• 2 • = 5684 . 8.30 = 3. x• •• = 27.38 5786. Σx •2j• = 151. 01.22 Total 44 35.77 Error 30 4716. CF = 16. x• 3• = 5619 .17 . Σx i2•• = 251. x• 4 • = 5567 .65 . 4.. HoG cannot be rejected.Chapter 11: Multifactor Analysis of Variance 23. Both capping material and the different batches affect compressive strength of concrete cylinders.38 MSA MSAB MSB MSAB MSAB MSE 1.8 = 7. and we continue: 26.. resulting in the Summary quantities include x1•• = 9410 .872.479 .01.70 ≥ F.573.68 = 1.70 = 20.67 157. so both HoA and HoB are rejected.01.180. 1 1 Σ Σ E( X ijk ) − Σ Σ Σ E (X ijk ) JK j k IJK i j k 1 1 = Σ Σ(µ + α i + β j + γ ij ) − Σ Σ Σ (µ + α i + β j + γ ij ) = µ + α i − µ = α i JK j k IJK i j k E( X i. 24.69 .459 . 2.779.69 B 4 17.954. x 2• • = 8835 .081 .31 f = 26.38 < F.68 ≥ F. accompanying ANOVA table.898. Chapter 11: Multifactor Analysis of Variance 25.262 .37) ± 2. ) ± tα / 2 .f. and since i ≠ i ′ . so E(MSE ) σ MSE appropriate F ratio. so is the appropriate F ratio. IJ ( K −1) . MSE = . ) = + = (because Var (X i ..15 − 50. ) JK JK JK 2 MSE 2 and Var(ε ijk ) = σ ) so σˆ θˆ = .I.15 . J = 3. = 49. = 50. is ( xi .37 . θˆ = X i.17 = (− 1.I... JK x3.. − X i′. 1 θ = α i − α i′ .262 26.. k..05) .0370 = −1. is (49. MSAB 327 . ) = Var(ε i . − xi ′. x 2. 6 Kσ 2 MSAB E (MSAB) 2 2 is the = 1 + 2G = 1 if σ G = 0 and > 1 if σ G > 0 . ) + Var (X i′. 2 2 2 JKσ A2 E (MSA) σ + Kσ G + JK σ A = = 1 + = 1 if σ A2 = 0 and > 1 if 2 2 2 2 E (MSAB) σ + Kσ G σ + Kσ G MSA σ A2 > 0 . 9 = 2... t .−1. is IJ(K – 1).22 ± .39. For the data of exercise 19. 
JK j k X ijk andX i′jk are independent for every j. = Σ Σ( X ijk − X i′jk ) .025. b.0170.. The appropriate number of d. so JK 2MSE the C. a. . Thus With () σ 2 σ 2 2σ 2 Var θˆ = Var (X i. K = 2. so the C.. 79 5.05.77 135.35 C 2 244.62 1.880.983.3.82 AB 6 53.04 3.81 The statistically significant interactions are AB and BC.64 23.14 2.50 115.06 3.24 3. Factor A appears to be the least significant of all the factors.523.70 4.81 546.3.080.62 267. Source Df SS MS f F.52 66.46 3.73 BC 4 331. Source Df SS MS f F.011.033. All three (3)(3)(2 ) levels differ significantly from each other.294.20 1056.27 4.83 Total 53 270. 24 = 3.73 ABC 8 1.08 .437.940.67 .367.01 A 3 19.50 7.83 = 8.149.67 AC 3 9. d.72 2. It does not have a significant main effect and the significant interaction (AB) is only slightly greater than the tabled value at significance level .73 6.873.75 3.53 115.50 2.72 BC 2 91.73 AC 4 62.61 C 1 157.67 Error 24 56. This indicates that none of the interactions are statistically significant.127.73 3.44 7072.348.17 2.33 b.31 2.22 61.27 2755.35 B 2 5.437.024.01 328 .79 3.05.10 1.95 .46 1. Q.589. c.48 Total 47 2. w = 3.39 122.069.511.40 5.05. use Q.67 82.164.52 157.Chapter 11: Multifactor Analysis of Variance Section 11.819.696. 28.047.093.53 .02 19. All three main effects are statistically significant.238.72 B 2 2.05.04 45.558. The computed f-statistics for all three main effects exceed the tabled value for significance at level .144.35 AB 4 1. The computed f-statistics for all four interaction terms are less than the tabled values for statistical significance at the level .24 2.05 A 2 14. a.41 2.24 1.61 ABC 6 6.383.67 15.21 8.31 Error 27 3.3 27.92 . 27 is not tabled. use Q. For level A: x1.145 21..... b.. k . − x.634 .823 ..215 = 1.. − x.837 1....436 Df SS MS f F.976. 4.. 2 SSC = IJL∑ (x.05. J = 2.215 Total 95 1. x 2..667 5. I = 3.. L = 4. ) .875 x. No interaction effects are significant at level . 4. = 3...08 2. − x. 
) .15 AC 6 71.938 x.1.454 1..037.25 Error 72 447.04 3.74 Machine: Mean: 3 1 4 2 .. 2..500 6. 1.469 x.021 11.417 5.781 For level B: x. = 5. = 3.542 .2..958 2 2 SSB = 99.. 72 is not tabled.907 6. = 4..Chapter 11: Multifactor Analysis of Variance 29.10 2. = .646 .875 3.417 x..514 . Source SSC = 393.05 d.09 4.00 C 3 393.625 x. K = 4. w = 3.. Factor B and C main effects are significant at the level . = 3.. 60 = 3.90 2.667 SSA = 12. SSA = JKL∑ ( xi.4. ) . a.875 x3. 3. = 2.436 131.976 99.15 B 1 99.976 16.74 ..05 c.05.90 ..805 1.833 *use 60 df for denominator of tabled F.76 ABC 6 9. = 4.907. (3)(2 )(4) .05* A 2 12. = 3..875 329 6.979 For level C: x.26 2..76 AB 2 1. = 5. SSB = IKL∑ (x.13 3.25 BC 3 1. j. Q. 05 A 3 .13 C 1 .075417 77. (2 )(2) 1 2 3 4 1.000975 Error -- -- -- Total 15 .000025 .69 10.22 9.22625 .1200 1.002925 = .48 9. 4.82 . w = 6.03 10. Q.28 BC 1 .0036 .3 = 6.004325 .000025 .13 ABC 3 .05 is the factor A main effect: levels of nitrogen.13 AB 3 .35 9.2384 The only statistically significant effect at the level .64 10.28 AC 3 .000625 .00065 . See ANOVA table Source Df SS MS f F.3875 1.3025 1.0036 3. c. b.82 .Chapter 11: Multifactor Analysis of Variance 30.4300 330 .1844 .000217 .05.28 B 1 .000625 .002925 .0014417 1. a. 2j .7 230. = 1.0 209.6 661.. ΣΣΣxijk = 145.157.6 x.5 223. xij.386.6 ΣΣxij2.6 221. 8 = 7..0 B2 214. F.540. CF = 144.56 Also. so the AB and BC interactions are significant (as can be seen from the p-values) and tests for main effects are not appropriate.8 226.0 x.5 221.1 649. from which we obtain the ANOVA table displayed in the problem statement.40 . 652.666.0 684.3 246.74 ΣΣx. 01. = 1.0 202.5 220. 653.k A1 A2 A3 C1 213.8 675.9 218. = 435.7 x.2 675.906.304..304..5 C3 213.0 C2 225..k = 435.1 224.5 218.382.5 A3 217.0 B3 213. j. k 640.81.0 219.01 .26 ΣΣxi2. 331 . 2 x.0 205.305. = 1978.6 x i.6 226.34 Σx.2 224.92 Σx i2.2 x i.2jk = 435.Chapter 11: Multifactor Analysis of Variance 31.2k = 1..36 Σx.156.1 229.774.4 641.4. B1 B2 B3 A1 210. 
jk C1 C2 C3 B1 213.8 222.1 A2 224. 31 98. a.58 721. is the F ratio for testing H 0 : all MSABC MSA γ ijAB = 0 .21 516.12 Error 24 8655.85 = 6.67 At level .24 = 3.78 5.72 472.01 = 19. MSAC Since b. treatment was not effective.48 Total 48 168. 05.61 3.15 MSAB MSABC MSAC MSABC MSBC MSE MSABC MSE = 2.31 AC 2 1442.67 3.29 BC 6 3096.318.06 8. no Ho ’s can be rejected. is MSE MSE MSAB 2 the F ratio for testing H 0 : σ C = 0 .11 AB 3 3408.91 Error 30 44.24 14. 33. Source Df SS MS A 6 67.51 C 6 5.61 .24 B 3 9656.04 ABC 6 2832.60 360.93 1136. 332 f .41 = 2.07 Since . so there appear to be no interaction or main effects present.4 3218.61 < F.43 .22 1135.318.80 C 2 2270. Source Df SS MS A 1 14.43 = 1.00 = 1.26 1.01.42 .78 5. and is the F ratio for testing H 0 : all α i = 0 .50 9. Similarly. 6.32 11.61 9.02 B 6 51. 2 2 E (MSABC ) σ + Lσ ABC 2 2 = = 1 if σ ABC = 0 and > 1 if σ ABC > 0 . 2 E (MSE ) σ MSABC MSC 2 is the appropriate F ratio for testing H 0 : σ ABC = 0 .65 Total 47 f MSA MSAC MSB MSBC MSC MSE F. 30 = 2.Chapter 11: Multifactor Analysis of Variance 32. shelf space does not appear to affect sales.26..89 7. 35.71 5. = 6605.05.50 36.2j .2j .219 . ΣΣxij2( k ) = 1358. 1 2 3 4 5 6 x i.2k = 203.. j.93 8.01 . 36 = 203. 40.. CF = 1297.47 101.745 .30.78 B 4 23.60 Source Df SS MS f A 4 28. = 180.31 40.25 Error 12 8.59 ≥ F. so both factor A (plant) and B(leaf size) appear to affect moisture content.59 36.02 x. 29.30 F4.16 41. k 36..92 x.67 36.423 .61 37. 20 = 2.82 Σ x.69 0.89 C 5 508. ΣΣxij ( k ) = 42.12 = 3.2k = 6489.09 . 5.30 Σ x.71 .. Σx i.428.19 31. 1 2 3 4 5 x i..80 1295.47 105.04 44.23 63.97 Since 1..68 30.17 0.619 Source Df SS MS A 5 6475.22 10.69 Error 20 1277.91 x. CF = (1097)2 2 2 = 33.85 C 4 0.14 33.02 32. 144 205 272 293 85 98 x. j..Chapter 11: Multifactor Analysis of Variance 34.. 171 199 147 221 177 182 x. HoC is not rejected.67 Total 24 61.16 B 5 529. = 1097 . but factor C (time of weighing) does not.. k 180 161 186 171 169 230 x. 333 .21 Σ xi2. 
Thus Σx..59 is not f 1.03 34.03 .. = 6630. Σ x.89 Total 35 8790. = 239. 1382 . Source Df SS MS f F.0635 1.953 1. There are no statistically significant interactions.86 Error 144 115.2387 .96 AC 15 15.79 C (Fabric type) 5 21.19 BC 10 1.01* A (laundry treatment) 3 39.37 1.17 AB 6 1. 334 . At the level .23 3.171 13.35 3.016 .41 4.30 2.17 2.057 16.01. use 120.432 .947 *Because denominator degrees of freedom for 144 is not tabled.3005 .382 .95 B (pen type) 2 .Chapter 11: Multifactor Analysis of Variance 36.508 4.3325 .820 .32 2.8043 Total 215 204.665 .47 ABC 30 9. there are two statistically significant main effects (laundry treatment and fabric type).3016 5. 335 .Chapter 11: Multifactor Analysis of Variance 37. C.93 4.39 B 1 47.05 7.446 281.347 .02 Error 36 . 4 At level .17 5.19 5.714 3.56 CD 2 .37 7.329 2259.39 ABCD 4 .48 5.255 48.39 BC 2 2.29 5.647.02 ABD 2 4.141 2.79 4. Thus SSABCD = 6.783 503. Computing all other sums of squares and adding them up = 6.247 .80 4.280 .470 .091 – 6. Source Df MS f F.56 AB 2 15.091.621) = 6.977 Total 71 *Because denominator d.01* A 2 2207.39 D 1 .303 15. No other interactions are statistically significant.702.647. use d.28 7. for 36 is not tabled.02 AD 2 .f.767 .273 .01 the statistically significant main effects are A. B.645.645.66 5.389 = .39 ABC 4 3.f.072 4.29 5.044 .02 BCD 2 .36 5.389 and MSABCD = 1.25 5.355 4.39 BD 1 .56 C 2 491. The interaction AB and AC are also statistically significant.347 . = 30 SST = (71)(93.702 = 1.39 ACD 4 .39 AC 4 275. 4 80.0 41.0 3697.6 c = x112.6 83.573. Those effects significant at level . 2 ijkl = 882.4 21.2 36. p-value < .4 583.2 21.0 164.4 23.38 − 16 2 = 28.2 ab = x 221.4 38.335. 1 2 Effect Contrast (1) = x111.573.4 bc = x122. 549.6 717.0 a = x211.2 30.e.05) are the three main effects and the speed by distance interaction. 473. indicating statistical significance.3 The important effects are those with small associated p-values.8 1706.6 270.6 ac = x212.8 24.2 839.4 52.8 312.2 abc = x222.6 . 
602.2 1991. 404.0 1151.0 5076.4 39.8 109.6 2.5 ΣΣΣΣx b.8 -2.05 (i.1 b = x121.272. 339. a. 336 . 515.38 ..2 988.8 -41. ) SS = ( contrast 16 Treatment Condition xijk.4 -19. 2 ( 3697) SST = 882.Chapter 11: Multifactor Analysis of Variance Section 11.2 1685. 435.6 -285. 378. . SST = 158.660.337.38 222 1107 370 113 27 ABC = 30.49 . 253. from which SSE = 2608.177. 24 γˆ 21AC = −γˆ11AC = 2.7..38 221 967 1844 627 199 AB = 1650...889 . F.38 a.16 = 4. The ANOVA table appears in the answer section.04 122 737 257 86 57 BC = 135.04 and 24 2 ΣΣΣΣxijkl = 1.96. With CF = 54852 = 1.21 . 05.1. − x.04 212 710 383 681 -53 AC = 117.04 121 584 1163 680 1305 B = 70. b. 337 .551. from which we see that the AB interaction and al the main effects are significant.21.411. Factor SS’s appear above.Chapter 11: Multifactor Analysis of Variance 39.04 112 453 297 624 529 C = 11.38 24 315 − 612 + 584 − 967 − 453 + 710 − 737 + 1107 γˆ11AC = = 2.959. ) SS = ( contrast 24 2 Condition Total 1 2 Contrast 111 315 927 2478 5485 211 612 1551 3007 1307 A = 71.2. = = 54. 584 + 967 + 737 + 1107 − 315 − 612 − 453 − 710 βˆ1 = x. 49 .78 .53 D 1 8.00 <1 Error 16 35.535. The F. 2 ΣΣΣΣΣx ijklm = 3.26 <1 AC 1 . and and SSE is obtained by subtraction.85.42 3. 41. a value exceeded by the F ratios for AB interaction and the four main effects.91 <1 CD 1 .26 .62 .98. a.24 Total 31 F.76 1. 05.956)2 .16 = 4.62 <1 BCD 1 1. x.Chapter 11: Multifactor Analysis of Variance 40.. 338 (11.78 <1 ABC 1 ...979.05.143 .16 − 35.607.15 .91 .42 1.77 6.17 .956 .308.02 . ΣΣΣΣΣxijklm = 4783.08 5.84 BD 1 . Each SS is (effectcontrast )2 48 ANOVA table appears in the answer section. x.85 2..64 AB 1 .00 .78 <1 ABD 1 6.08 13. 2 SST = 4783. so CF = SST = 328.16 3.14 = 72.. effects are listed in the order implied by Yates’ algorithm.78 .56 and SSE = 72. so none of the interaction effects is judged significant. and only the D main effect is significant.17 <1 B 1 1. Source b. = 388.16 8.16 ..94 1.02 ACD 1 .02 .94 <1 C 1 3.1. 
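The sums of squares above come from effect contrasts via SS = (contrast)²/N, where N is the total number of observations, and the contrasts themselves can be generated by Yates' algorithm: starting from the treatment-condition totals in standard order, replace the column by pairwise sums followed by pairwise differences, p times for a 2^p design. A sketch (helper names are mine):

```python
def yates(totals):
    """Yates' algorithm for a 2**p design: treatment totals in standard
    order (1), a, b, ab, c, ac, ... -> [grand total, effect contrasts]."""
    col, n = list(totals), len(totals)
    p = n.bit_length() - 1          # n = 2**p
    for _ in range(p):
        pairs = list(zip(col[0::2], col[1::2]))
        col = [a + b for a, b in pairs] + [b - a for a, b in pairs]
    return col

def effect_ss(contrast, n_total):
    # SS for an effect = (effect contrast)**2 / (total number of observations)
    return contrast ** 2 / n_total
```

For a small 2² check with totals (1)=10, a=14, b=12, ab=20, the A contrast is a+ab-(1)-b = 12, which matches what the algorithm produces in the second slot.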
48 = 2.02 <1 BC 1 13.14 .74 . = 11.1..56 – (sum of all other SS’s) = 32 Df SS MS f A 1 .74 <1 AD 1 .76 <1 ABCD 1 .32 ≈ 4. In the accompanying ANOVA table.. so 2 368.77 3. 1.33 ABD 15.133 + . 01.02 3. so MSE = .020 -- BC 1. 5 = 6.020 + .12 AD . 32 ≈ 7.19 <1 BC 3553.388.190 5.02 9.170.61.62 BCD .1.123 1067.133 -- ABC . Condition/E ffect ) SS = ( contrast 16 Condition/ Effect ) SS = ( contrast 16 (1) -- D 414.19 <1 B 332.167.917.17 CD 4700. = 39.69 70.436 1.69 F.64 AC .69 3.11 ABC 1210.078 <1 ACD 1.055 -- C . 2 ΣΣΣΣΣxijklm = 32.42 BD 3519.f.. 339 2 f F.02 3.099 <1 BD .681 -- 2 f SSE = . d.Chapter 11: Multifactor Analysis of Variance 42.170.52 <1 C 43.017 <1 B .051 -- ABCD . = 5. 48 Effect MS f Effect MS f A 16.460.497 1.02 <1 D 20.456 <1 AB ..055 + 1.817 .69 2.681 = 1. 43.109 <1 CD 2.5 . SS = (contrast )2 .52 <1 Error 4733..229.02 4. only the D main effect is significant.404 3.42 ABCD 1692. 05.28 ABD .33 A . = 32.52 <1 AC 776. and error d.f.02 <1 BCD 10.22 AB 1989.140.371 .19 <1 ACD 1963.940. so only the B and C main effects are judged significant at the 1% level.19 AD 16..354.051 + . x. so . and c.79 AC 1 2.00 3.Chapter 11: Multifactor Analysis of Variance 44. cd.00 -- D 2 930.25 = 51 .00 1.00 -- Total 2 f 1153. 340 . = 4.93 ABD 2 20.71 . ab.25 + 2.25 <1 BC 1 .75 SSE = 9. and abd. ΣΣΣΣxijkl = 105.00 2. = 1290 . d. the other eight conditions are placed in the second block. Condition/Effect Block ) SS = ( contrast 16 (1) 1 -- A 2 25. 4 = 7.00 <1 ACD 2 20. so only the D main effect is significant. so SSBl = 6392 6512 12902 + − = 9.25 -- CD 1 4.00 . The eight treatment conditions which have even number of letters in common with abcd and thus go in the first (principle) block are (1).93 B 2 9. ac.f.9375. ad. bd. 05..25 <1 ABC 2 9.25 -- BCD 2 2.75 .160 ..00 <1 AB 1 12. so MSE = 12.25 -- ABCD=Blocks 1 9. so SST = 1153. bc.75. b.00 1.25 71.0 + 20. a.. 2 x.25 + 20 . The two block totals are 639 and 651.25 <1 C 2 49. 
which is identical (as it must be 8 8 16 here) to SSABCD computed from Yates algorithm. F.78 BD 1 25.90 AD 1 36.1. g. the sum of the four observations in block #1 on replication #1).853. The only other possibilities are for both to be “triples” (e.12 = 9.1. so SSBl = 20912 21222 16. A) or a pair of letters (e.. etc.g.88 . 2092. = 16. 2133. and 2122. With F. The result is clearly true if either defining effect is represented by either a single letter (e...035. 2122.898 .g...054 − 16. AB). See the text’s answer section. But the generalized interaction of ABC and ABD is CD. The eight 32 block × replicatio n totals are 2091 ( = 618 + 421 + 603 + 449.) or one a triple and the other ABCD. The remaining SS’s 4 4 32 as well as all F ratios appear in the ANOVA table in the answer section. 01. b. 47..88 . 341 . x..898 2 + . 46. ABC or ABD. so a two-factor interaction is confounded.33 . 2113. 2080. with block #1 containing all treatments having an even number of letters in common with both ab and cd. + − = 898. and the generalized interaction of ABC and ABCD is D. all of which must have two letters in common.Chapter 11: Multifactor Analysis of Variance 45. so SST = 9. only the A and B main effects are significant. 2145. a. The allocation of treatments to blocks is as given in the answer section. so a main effect is confounded.8982 = 111. ac. 342 SSE=2. BC}.49 1. The treatment conditions in the observed group are (in standard order) (1). {D.26 - + - + - + - Cd = 11. CD}. so only the D main effect is judged significant. ad.47 . {AC. bd.05. The alias pairs are {A.29 + + + + + + + Contrast 5.27 -2.11 + + - - + - - Ac = 21.72 + - - + - - + Bd = 11.87 -1. {C.72 - - + + + - - Abcd = 12.08 f 4. ABD}. BCD}.66 + - + - - + - Bc = 20. {AB.13 .47 130. ABC}.31 -3. ACD}.09 1. b. ab.93 -32.09 - - - - + + + Ab = 20.Chapter 11: Multifactor Analysis of Variance 48.36 . cd. {B.87 . and {AD.770 . BD}.3 = 10.31 MSE=. and abcd.47 F.44 - + + - - - + Ad = 13. A B C D AB AC AD (1) = 19.1.79 SS 3. bc. 
a.55 .69 .51 <1 <1 169. SSDE = 4.563. F.7 ≥ 2.4 − 72.090.67 .240.6 + - + + - - + + - - - + + - - bcd e 66. SSAD = .4 + .8 + + - - + + - - + - - + + - - ace bce 67.696 Error 32 3. SSAC = 7.1. 01. SSCE = . Source Df SS MS f Treatment 4 14. SSBC = 2. SSB = 7.962 3.3 + - + + + - + + - + - + + - + + - + - + + - ade 64.1 − 70.262 Total 44 27..7 Block 8 9.8 + + + - - + + - - + - - - - + d abd 67.0 - + - + - + - + + + + + - + + + + - + + - - abe 67.203.563. MSE Because 36.0 + + - + + - + + + - + + - + - + + - - + + - acd 66.0)2 16 = 2. SSE = 10.4 - + - + - - + + - + + + + - + + + - + - + + abc 73.102 H 0 : α1 = α 2 = α 3 = α 4 = α 5 = 0 will be rejected if f = MSTr ≥ F.05 .840.563.9 65.1 70. SSBD = .10 = 10. A B C D E a 70. Ho is rejected.4 67. Error MS = 2. SSBE = .8 68.063.123.057. so only the D main effect is significant.04 .250 . SSCD = .103.Chapter 11: Multifactor Analysis of Variance 49. SSAE = 4.741 36.67 .010.0 + + + + + + + + + + + + + + + Thus SSA = AB AC AD AE BC BD BE CD CE DE (70. 343 .360. SSAB = 1.568. We conclude that expected smoothness score does depend somehow on the drying method used.. SSD = 52.4 + - - - - - - - - + + + + + + b c 72.0 + - - + + - - + + + - - - - + bde cde 67.840.9 - + - + + + + + + + - - - - + - + - + + + + abcde 68.5 70.32 = 2. SSC = . + 68.920 . 4 . Supplementary Exercises 50. Error SS = sum of two factor SS’s = 20.010. tests for main effects are not appropriate.869.50 Because 2.874 36. I −1. Because 8. Ho will be MSAB ≥ F. j).623 11.329 Total 23 372.667 322.86 MSE Source Df SS MS f Treatment Block 3 3.Chapter 11: Multifactor Analysis of Variance 51.24 .08 AB 3 8.667 980.86 . ( I −1)( J −1 ) = F. Expected accumulation does not appear to depend on sowing rate.67 ≥ 3.266 . 344 . Source Df SS MS f A 1 322.129.5 Total 15 26.0 460. H 0 : α1 = α 2 = α 3 = α 4 = 0 will be rejected if f = MSTr ≥ Fα .751.0 Error 9 4.05.153.28 3 19.752. 
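Each alias pair in the half-replicate above is an effect together with its generalized interaction with the defining effect; assuming defining relation I = ABCD, as the listed pairs suggest, the alias of an effect is the symmetric difference of the two letter sets (squared letters cancel, since A·A = I). A minimal sketch (the function name is mine):

```python
def alias(effect, defining="ABCD"):
    # Generalized interaction with the defining effect: keep the letters
    # appearing in exactly one of the two words (A*A = I cancels pairs)
    return "".join(sorted(set(effect) ^ set(defining)))
```

So each main effect is aliased with a three-factor interaction and each two-factor interaction with another two-factor interaction, exactly the pairing pattern listed above.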
Ho is MSE rejected.141.05 if f AB = : γ ij = 0 for all I.16 = 3.113 We first test the null hypothesis of no interactions ( H 0 rejected at level .17 2.28 < 3. Let X ij = the amount of clover accumulation when the ith sowing rate is used in the jth plot = µ + α i + β j + eij .040. 3 .05. Because we have concluded that interaction is present.550.38 B 3 35. 3 .470.67 Error 16 5.24 . Ho is not rejected.5 1. 52.141.165.9 = 3.557 2.852 8. 25 6.00 3.72 AC 1 1.32 .25 ABC 101 42 75 16 16.00 AB 98 160 9 134 1122.00 .904.25 C 88 -23 31 14 12.50 Total 15 F.05.72 B 1 144. but none of the interactions are significant.00 BC 59 -33 59 -14 12.25 B 62 143 13 48 144.39 C 1 12. Let A = spray volume.00 AB 1 1122.72 ABC 1 16.25 249. 05.8 = 5. SS = (contrast)2 Condition Total 1 2 Contrast (1) 76 129 289 592 21.56 Error 8 4.25 AC 55 36 17 -4 1. C = brand.Chapter 11: Multifactor Analysis of Variance 53.00 32.25 2.00 16 The ANOVA table is as follows: Effect Df MS f A 1 30. B = belt speed.1. 345 .00 A 53 160 303 22 30. so all of the main effects are significant at level .25 2.22 BC 1 12. SS = (contrast)2 Condition Total 1 2 Contrast (1) 23.2 - A 43.3 -118.1 213.8 103.5 ABC 16.2 24.1.645 5. the B and C main effects are significant at the .5 -. and for ease of calculation.32 .6 43.005 B 71. 05.398 BC 1 1740.Chapter 11: Multifactor Analysis of Variance 54.245 8 We assume that there is no three-way interaction.9 81.645 AB 76.0 1740.6 -36.68 BC 17.8 -15.005 1.7 -28.8 1507.000 <1 C 1 1507. both amount of water (B) and the land disposal scenario (C) affect the leaching characteristics under study.2 51.5 -4..05 level.3 18.68 2.005 AC 33.5 40.8 = 5.4 70.5 3.247* Error 1 43.2 4.848* AC 1 103. 346 .179 B 1 248.750* AB 1 18.5 44. as well as the BC interaction.0 -3.3 -109.0 18. we divide each observation by 1000.0 33.6 248.3 -12.7 20.0 19.0 147.245 Total 8 F. so the MSABC becomes the MSE for ANOVA: With Source df MS f A 1 51.4 103.005 34. We use Yates’ method for calculating the sums of squares.1 66. 
and there is some interaction between the two factors.5 317.000 C 37. We conclude that although binder type (A) is not significant. 00 .  11  11  11 We use estimate = b.00 . αβ  = 0 . αγ  = 2.00 .50 .00 . Similarly. βˆ1 = = 2. and  δγ  = 4. D 20 effect C 10 A CD 0 -2 -1 0 1 2 z-percentile The plot suggests main effects A. αδ  = 2. 2 16 16 16 336  ∧   ∧   ∧  γˆ1 = = 21 . and perhaps the interaction CD as well.75 .75 .25 . 16  11   11   11 ∧ ∧ ∧        βδ  = .Chapter 11: Multifactor Analysis of Variance 55. (See answer section for comment.  βγ  = . C. Effect %Iron 7 11 7 12 21 41 27 48 28 51 33 57 70 95 77 99 A B AB C AC BC ABC D AD BD ABD CD ACD BCD ABCD 1 18 19 62 75 79 90 165 176 4 5 20 21 23 24 25 22 2 37 137 169 341 9 41 47 47 1 13 11 11 1 1 1 -3 3 174 510 50 94 14 22 2 -2 100 172 32 0 12 0 0 -4 Effect Contrast 684 144 36 0 272 32 12 -4 336 44 8 0 72 -32 -12 -4 SS 1296 81 0 4624 64 9 1 7056 121 4 0 324 64 9 1 contrast when n = 1 (see p 472 of text) to get 2p 144 144 36 272 αˆ 1 = 4 = = 9. and D are quite important.50 . δˆ1 = = 17 . a.) 347 . 2. and we 27.40 .2 12.8787 Since 3.2 x•• • = 36 .5880 0.5120 .0 2 7.8787 .8 16.56 ≥ F.56 pH 2 0.00 Error 24 0.0) + (20.8 10.2 6.128 Analysis of Variance for Average Bud Rating Source DF SS MS F Health 1 0. 348 .1.00 is not continue: ≥ F.6507 0.3253 15.560 . 24 = 4.560 − = .05.0213 Total 29 1.0 5.26 .6813 . SSB = 10 2 2 and by subtraction.5880 27.5880 . 2.0640 3.6813 = 1.2) − CF = . we fail to reject the no interactions hypothesis.24 (16.2) + (12. and 15. so we conclude that both the health of the seedling and its pH level have an effect on the average rating.05. SSA = 5 15 2 2 2 (13.4 20. ΣΣΣxijk = 45. The summary quantities are: j i CF = (36.2) − CF = .2 x • j• 13.6 6.8) + (10. 24 = 3.40 . SSE = 45.Chapter 11: Multifactor Analysis of Variance 56.5120 0.2 4. 225. so 2 30 SST = 45.25 ≥ F.2 = 43. 05.6507 .1280 0. 24 = 3.25 Interaction 2 0.2 )2 x ij• 1 2 3 x i• • 1 6.560 − 43. SSAB = . 
199 6552.12* 3.350 2.545 64.165 10.04 5.40 AB 3 4.51 ABC 6 4. statistically significant factors at the level . so all effects.39* 2.40 BC 6 15.372 All calculated f values are greater than their respective tabled values. are significant at level .01 C(concen.01 AC 2 7. 58.178 2. 59.26 B(time) 3 5.218 436. 349 .600 Total 47 70.57 4.49 C 2 516.728 1.713 839 21.21 2.05 A(pressure) 1 6.868 1.11 ABC 8 6.717 43.11 BC 4 10.25 3. There are no significant interactions.) 2 12. including the interaction effects.01 are adhesive type and cure time.94 6.51 Error 24 14.33 6.633 4.897 1342.870 3.57 4. Source df SS MS f F.37 .61 1.40 . The conductor material does not have a statistically significant effect on bond strength.436 17.30 3. Based on the p-values in the ANOVA table.10* 3.793 52.398 258.28* 3. The ANOVA table is: Source df SS MS f F.80 2.29 4.940 11.064 39 Total 53 692.731 69.922 2.57* 4. However both AC and BC two-factor interactions appear to be present.01 A 2 34.01.Chapter 11: Multifactor Analysis of Variance 57.49 B 2 105.26 Error 27 1.82 There appear to be no three-factor interactions.11 AC 4 10.05 1.660 6.30 5.49 AB 4 6.32 3.92 5. .. with similar expressions for SSB.497.916 x. 2 1 2 3 4 5 Σx 2 xi.32 B (temp.502* 3.958* 3.291 313. .84 B 4 227. SSA = ∑∑ (X i..170 Also.473 3.84 Error 8 962..k .9* ≈ 3. j .2 with N2 – 1 df. = 2294 .) 2 5. : 470 451 440 482 451 1.594 3.l : 340 417 466 537 534 1. However.. each having N – 1 df. x...32 Interaction 4 1.3 1.76 71. while while HoC and HoD are rejected.44 Source df SS MS f F.0 28.76 56. 350 .4 ≈ 2.348 Interaction appears to be absent.69..94 .. 61. : 482 446 464 468 434 1..080.053.05 A 4 285. ) = 2 i j X2 1 Σ X i2.. − X ... and CF = 210.182 2591. both diet and temperature appear to affect expected energy intake.44 . SST = ∑∑ ( X ij( kl ) − X ..3* ≈ 3.6 Total 44 36.138 9.626 x.737 434.053.94 5..Chapter 11: Multifactor Analysis of Variance 60. : 372 429 484 528 481 1.34 Total 24 HoA and HoB cannot be rejected. 
ΣΣxij2(kl ) = 220..56 1384.378 ..05 A (diet) 2 18.. leaving N N − 1 − 4( N − 1) df for error.. N N and SSD..84 C 4 2867. ) = ∑∑ X ij2( kl ) − 2 i j i j X .14 11.76 716.72 120. SSC. since both main effect f values exceed the corresponding F critical values.0 8..826 x. Source df SS MS f F..066..84 D 4 5536...69 Error 36 11. − . The efficiency ratio is not uniquely determined by temperature since there are several instances in the data of equal temperatures associated with different efficiency ratios.24) is very large compared to the typical value of 1.6 and the distribution appears to be positively skewed. The distribution is reasonably symmetric in appearance and somewhat bell-shaped.6. The variation in the data is fairly small since the range of values ( 188 – 170 = 18) is fairly small compared to the typical value of 180. a typical value is around 1. The two largest values could be outliers..1 1.CHAPTER 12 Section 12. 351 .08 . the five observations with temperatures of 180 each have different efficiency ratios. b.84 = 2. 0 889 1 0000 13 1 4444 1 66 1 8889 2 11 2 25 26 2 3 00 stem = ones leaf = tenths For the ratio data. a. The variation in the data is large since the range of the data (3. For example. Stem and Leaf display of temp: 17 0 17 23 17 445 17 67 17 18 0000011 18 2222 18 445 18 6 18 8 stem = tens leaf = ones 180 appears to be a typical value for this data. 3 2 4 3 2 1 1 0 0 0 5 10 15 0 5 10 15 Age: Age: With this data the relationship between the age of the lawn mower and its NOx emissions seems somewhat dubious. A scatter plot of the data appears below. Age does not seem to be a particularly useful predictor of NOx emission. Ratio: 3 2 1 170 180 190 Temp: Scatter plots for the emissions vs age: 7 5 6 4 5 Reformul Baseline 2. 352 .Chapter 12: Simple Linear Regression and Correlation c. One might have expected to see that as the age of the lawn mower increased the emissions would also increase. We certainly do not see such a pattern. 
The points exhibit quite a bit of variation and do not appear to fall close to any straight line or simple curve.

Chapter 12: Simple Linear Regression and Correlation

3.
a. Box plots of both variables:
[Box plots: BOD mass loading (x), BOD mass removal (y)]
On both the BOD mass loading boxplot and the BOD mass removal boxplot there are 2 outliers. Both variables are positively skewed.
b. Scatter plot of the data:
[Scatter plot: BOD mass loading (x) vs BOD mass removal (y)]
There is a strong linear relationship between BOD mass loading and BOD mass removal. As the loading increases, so does the removal. The two outliers seen on each of the boxplots are seen to be correlated here. There is one observation that appears not to match the linear pattern; this value is (37, 9). One might have expected a larger value for BOD mass removal.

4.
A scatter plot of the data appears below. The points fall very close to a straight line with an intercept of approximately 0 and a slope of about 1. This suggests that the two methods are producing substantially the same concentration measurements.
[Scatter plot of the data]

5.
a. The scatter plot with axes intersecting at (0, 0) is shown below.
[Scatter plot: Temperature (x) vs Elongation (y), axes intersecting at (0, 0)]
There appears to be a linear relationship between racket resonance frequency and sum of peak-to-peak acceleration. Both have very high resonance frequency values. 6. However.3(2500) = 5050 b. Variation does exist. µ Y ⋅ 2500 = 1800 + 1. A parabola appears to provide a good fit to both graphs. expected change = − 100β 1 = −130 β1 = 1. The scatter plot with axes intersecting at (55. 100) is shown below. We expect flow rate to decrease by c.97   2 d.3( x2 − x1 ) . 100 − 650   Thus P(Y2 − Y1 > 0) = P z >  = P (Z > . so Y1 .20 ) = .40 ) = .33 .Y2 has expected value .095   P(Y1 > Y2 ) = P(Y1 − Y2 > 0) = P z >  = P(Z > 2. and s.3446 . respectively.025) 2 + (.3(x 2 − x1 ) − 1. 494.095.025   .4443 b. Thus + . so P(Y > 5000) 5000 − 4400   = P Z >  = P( Z > 1. and σ = 350 .925.830 .305 .0436 350   P(Y > 5000) = P(Z > . and V (Y2 − Y1 ) = V (Y2 ) + V (Y1 ) = (350 ) + (350 ) = 245.3x1 ) = 1.645 = .025   Let Y1 and Y2 denote pressure drops for flow rates of 10 and 11.12 + .12 + .0036 . 2 Y2 − Y1 = 494. .095(15) = 1.095(10) = .71) = .830   P(Y > .000 .97 9. Now E(Y) = 5050.840 − . a.3 x 2 − (1800 + 1.3( x2 − x1 )   P(Y2 > Y1 ) = P(Y2 − Y1 > 0) = P z >  = .69 ) = .97 (from c).. e. β 1 = expected change in flow rate (y) associated with a one inch increase in pressure drop (x) = .83. µ Y ⋅ 2000 = 1800 + 1.925 = -. so the s.4207 . d. (.71) = . E (Y2 − Y1 ) = E (Y2 ) − E (Y1 ) = 5050 − 4400 = 650 .14 ) = .035355 .830   P(Y > .095.97   − 1.2389 494. µ Y ⋅10 = −.475 . and E (Y2 − Y1 ) = 1800 + 1.97 .835 − .3(2000) = 4400 . a. 5β 1 = . of Y2 − Y1 = 494.835) = P Z >  = P(Z > .840 ) = P Z >  = P(Z > .d.d.025 )2 = .035355   356 . so c. Then µ Y ⋅11 = .95 implies that 494. so x 2 − x1 = 626. Thus The standard deviation of − 1. and µ Y ⋅15 = −.Chapter 12: Simple Linear Regression and Correlation 8. b. 00 − . so the two − 8500  − 17.6 − 2 .5 .01 . c.500 = −1.8164.09 = .8164 . Y has expected value 14.10 . 11. − (− . 
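The probability computations in Exercises 8 and 9 all follow the same step: under the model Y = β0 + β1·x + e with e normal, standardize against the mean β0 + β1·x and the given σ. A minimal sketch (the function name is mine, and it uses the complementary error function rather than a z table, so results differ slightly from the rounded table values in the solutions):

```python
import math

def prob_y_greater(c, beta0, beta1, x, sigma):
    # Y = beta0 + beta1*x + e, e ~ N(0, sigma^2):
    # P(Y > c) = P(Z > (c - (beta0 + beta1*x)) / sigma)
    z = (c - beta0 - beta1 * x) / sigma
    return 0.5 * math.erfc(z / math.sqrt(2))
```

With the exercise's model y = 1800 + 1.3x and σ = 350, P(Y > 5000) at x = 2000 standardizes to P(Z > 600/350), close to the .0436 obtained above from a z table with z rounded to 1.71.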
The probability that any particular one of the other four observations is between 2. .000 when x = 2000.5   2 . and µ Y ⋅ 250 = 2.4641 .10607   357 . a σ σ probabilities become contradiction.6) = P ≤Z ≤  . The standard (.8164 )5 = . and 10β 1 = −.33 ≤ Z ≤ 1. The probability that the first observation is between 2.4 and 2. This gives two different values for σ .Y2 has expected value deviation of Y1 .000 when x = 1000 and 24. β 1 = expected change for a one degree increase = -.01(x + 1) − (5.01(200 ) = 3 .6 is also .3627 .075)2 + (.5 P(2. respectively.28 .075 = P(− 1.4 and 2.645 and = −1.6 is d.05 and P z >  = . a.33) = .01.01)  ( ) Thus  P(Y1 − Y2 > 0) = P z >  = P Z > .Y2 is 5.10607 . Let Y1 and Y2 denote the times at the higher and lower temperatures.00 − . b. Then Y1 .Chapter 12: Simple Linear Regression and Correlation 10.00 − . Thus σ  σ    − 8500 − 17. so the probability that all five are between 2.075 )2 = .4 and 2. (.6 is 2 .4 − 2 .1 is the expected change for a 10 degree increase.075   . so the answer to the question posed is no. µ Y ⋅ 200 = 5.4 ≤ Y ≤ 2.01x ) = −.500    P z >  = . 002.454 − (346)2 = 8902. so the equation of the least squares n 14 regression line is y = .095 − = 20.2 12. Without the two upper ext reme observations.743 .626 + .747 =1− = .626 + .652)(517 ) βˆ 0 = = = . and the slope slightly.857 .2891 + .047.79 = .56445 and βˆ 0 = 2.456 = −2. r 2 = 1 − e.333 .714 . SST 8902 . Σ x = 272. c. n−2 12 SSE 395. which is much worse than that of the original set of 998.652 .56445x .652 x .6879 . the new summary values are n = 12. SST = S yy = 8902. The new r 2 =1− 311. Removing the two values changes the p osition of the line considerably. The residual is y − yˆ = 21 − 23. yˆ (35) = . Σ y 2 = 3729. S xy = 25. S xx b.714) = 395.652(35 ) = 23.825 − 1 14 S xx 20. S xy = 1217. σˆ = SSE 395. Σx 2 = 8322.Chapter 12: Simple Linear Regression and Correlation Section 12.2891 .667. a. S yy = 998.747 = = 5. 
βˆ = S xy = 13.857 .929 .747 .652)(13047. Σy = 181. so 14 SSE = 8902.857 d. 2 ( 517) = 39. 358 .456 . The new S xx = 2156. New βˆ1 = .626 .956 .917.917 observations.002.714 = .929 ˆ Σ y − β 1Σx 346 − (. 14 (517 )(346 ) = 13047. which yields the new equation y = 2.456 . S yy = 17. Σxy = 5320 .857 − (. 9977 . Σ xi = 4308 . These residuals do not all have the same sign because in the cases of the first two pairs of observations. Σx 2i = 12.243.972 . S xx 2000 4 4 SSE = S yy − βˆ1S xy = 2. SSE .243 .14085 − (.8997 = 0.5 5. 1.3501 . n = 24. the observed efficiency ratios were smaller than the predicted value of 1. S xx = 12. Σ xi yi = 333 . 14.8823 . c.09092 182 = 1. 1. we would predict an efficiency ratio of 1.0 .8997 = −0.09092)(45. n = 4.8997 = 0.4202 .8246 S xy = 7 . The equation of the estimated regression 24 24 line is yˆ = −14 .09 4308 βˆ0 = − (. This is a very high value of r 2 . SSE = S yy − βˆ1S xy = 9.9153 . 2. S xx = 773 .65 . and (182.6497 + . and 24 (4308 )(40 . (182.37 )2 4 (200) 2 4 = 2000 .37 . (182.7489 r 2 =1− =1− = .060750 r 2 =1− =1− = .7823 .90 − 1.8997.09 ) = 45 .27000 .03225 and βˆ 0 = − (. 2. yˆ = −14.09 )2 = 504 .09092 ) = −14. SSE 5. d. Σ xi = 200 . (42.5 .37) = 64. Whereas.8246 .000 .09 .6497 + .37 200 = .81). The four observations for which temperature is 182 are: (182.Chapter 12: Simple Linear Regression and Correlation 13.14085 βˆ1 = = the authors’ claim that there is a strong linear relationship between the two variables.68 − 1.03225 )(64.790 − S yy = 76.8246 ) = 5. For this data.) 359 .90).9153 efficiency ratio can be attributed to the approximate linear relationship between the efficiency ratio and the tank temperature.02% of the observed variation in SST 9.03225 ) = −. Σ yi = 5. So when the tank temperature is 182. = 2.94 − 1.140875 . 1. 09092 and 24 S xx 504 40. a. .68).09092 x .8997.94).5) = .9153 − (.65 − β1 = = = . ( ) When x = 182.060750 . in the cases of the last two pairs of observations.7489 . b. which confirms SST 2. 
and S xy = 333 − (200 )(5.8823 − (4308 )2 24 (40 . Σx 2i = 773.6497 .790 . Σy 2i = 9.8997 = −0. Σ yi = 40. Σy 2i = 76.8997 . 1.81 − 1. the observed efficiency ratios were larger than the predicted value.000 − S yy = 9. 4 64. Σxi yi = 7. ˆ S xy 45.0877 . Their corresponding residuals are: .0423 .3501 − S xy (5. = 9. and there is a reasonably large amount of variation in the data (e. there is some positive skew in the data.. c. For a beam whose modulus of elasticity is x = 40. note that the two pairs of observations having strength values of 42.10748 x .0-29.605.2 is large compared with the typical values in the low 40’s).2925 + . the spread 80. potentially misleading) to extrapolated the linear relationship that far.0).738 (or 73. so it would be dangerous (i. The value x = 100 isfar beyond the range of the x values in the data. the predicted strength would be yˆ = 3.e.736. the strength values are not uniquely determined by the MoE values.Chapter 12: Simple Linear Regression and Correlation 15.59 .2925 + .10748 40 = 7. and the coefficient of determination is r2 = . The following stem and leaf display shows that: a typical value for this data is a number in the low 40’s.8 = 50. The least squares line is yˆ = 3. ( ) d. For example..8 have different MoE values. There are some potential outliers (79. SSE = 18. From the output.g. which suggests that the linear relationship is a useful approximation to the true relationship between these two variables. SST = 71. 29 3 33 3 5566677889 4 1223 4 56689 51 5 62 69 7 79 80 b. 360 .5 and 80. a. The r2 value is large.8%). stem = tens leaf = ones No. 435.24 .200 .867 . y = 42.07 .7 . SSE = S yy − βˆ 1S xy = 14.2207 . βˆ = S xy = 17. 232 − 1 15 S xx 20.Chapter 12: Simple Linear Regression and Correlation 16.2 = −1.024.1278 . = 14.324.024.867 − (.435.586. runoff volume can be attributed to the simple linear regression relationship between runoff and rainfall.82697 and = 51.4 . 
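The computations in the surrounding exercises all run through the same summary-statistic formulas: Sxx, Syy, Sxy from the raw sums, then the slope, intercept, SSE, and r². A minimal sketch of that pipeline (the function name is mine):

```python
def least_squares_from_sums(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy):
    """Summary-statistic least squares: Sxy = Σxy - (Σx)(Σy)/n, etc.;
    then b1 = Sxy/Sxx, b0 = (Σy - b1·Σx)/n, SSE = Syy - b1·Sxy,
    and r² = 1 - SSE/SST (SST = Syy)."""
    Sxx = sum_x2 - sum_x ** 2 / n
    Syy = sum_y2 - sum_y ** 2 / n          # this is also SST
    Sxy = sum_xy - sum_x * sum_y / n
    b1 = Sxy / Sxx
    b0 = (sum_y - b1 * sum_x) / n
    sse = Syy - b1 * Sxy
    return b1, b0, sse, 1 - sse / Syy
```

Feeding in the six sums reported for any of these exercises reproduces the corresponding fitted line, SSE, and coefficient of determination.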
[Scatter plot: Rainfall volume (x) vs Runoff volume (y)]

16.
a. Yes: the scatterplot shows a strong linear relationship between rainfall volume and runoff volume, thus it supports the use of the simple linear regression model.
b. Sxx = 63040 − (798)²/15 = 20,586.4, Syy = 41,999 − (643)²/15 = 14,435.7, and Sxy = Σxᵢyᵢ − (798)(643)/15 = 17,024.4, so β̂1 = Sxy/Sxx = 17,024.4/20,586.4 = .82697 and β̂0 = ȳ − β̂1·x̄ = 42.867 − (.82697)(53.2) = −1.1278. The equation of the estimated regression line is ŷ = −1.1278 + .82697x.
c. µY·50 = −1.1278 + .82697(50) = 40.22.
d. SSE = Syy − β̂1·Sxy = 14,435.7 − (.82697)(17,024.4) = 357.07, so s = σ̂ = √(SSE/(n − 2)) = √(357.07/13) = 5.24.
e. r² = 1 − SSE/SST = 1 − 357.07/14,435.7 = .9753. So 97.53% of the observed variation in runoff volume can be attributed to the simple linear regression relationship between runoff and rainfall.

17.
a. The estimated regression line is ŷ = 3.678 + .144x.
b. For a one (mg/cm²) increase in dissolved material, one would expect a .144 (g/L) increase in calcium content.
c. µY·50 = 3.678 + .144(50) = 10.878.
d. r² = .86, so 86% of the observed variation in calcium content can be attributed to the simple linear regression relationship between calcium content and dissolved material. Then SSE = (SST)(1 − .86) = (320.398)(.14) = 44.856 and s = √(SSE/(n − 2)) = √(44.856/21) = 1.46. (Note: n = 23 in this study.)

18.
a. β̂1 = [15(987.645) − (1425)(10.68)] / [15Σxᵢ² − (1425)²] = −.00736023 (the numerator is −404.325), and β̂0 = [10.68 − (−.00736023)(1425)]/15 = 1.41122185.
b. Using the equation of a, predicted y = β̂0 + β̂1(200) = 1.41122185 − (.00736023)(200) = −.0608, but the deflection factor cannot be negative.
c. With x now denoting temperature in °C, y = β̂0 + β̂1((9/5)x + 32) = (β̂0 + 32β̂1) + (9/5)β̂1·x = 1.175695 − .0132484x, so the new β̂1 is (9/5)(−.00736023) = −.0132484 and the new β̂0 is 1.175695.
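The SSE computations above use the shortcut SSE = Syy − β̂1·Sxy rather than summing squared residuals directly; the two are algebraically identical, which a small sketch can check numerically (names are mine):

```python
def sse_two_ways(xs, ys):
    """SSE from its definition sum((y - yhat)**2) and from the shortcut
    SSE = Syy - b1*Sxy used throughout these exercises."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    Sxx = sum((x - xbar) ** 2 for x in xs)
    Sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    Syy = sum((y - ybar) ** 2 for y in ys)
    b1 = Sxy / Sxx
    b0 = ybar - b1 * xbar
    by_definition = sum((y - b0 - b1 * x) ** 2 for x, y in zip(xs, ys))
    shortcut = Syy - b1 * Sxy
    return by_definition, shortcut
```

The shortcut is preferred in hand computation because Syy, Sxy, and β̂1 are already available from fitting the line.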
The summary statistics can easily be verified using Minitab or Excel.717 which matches R-squared on output. βˆ 0 = . N = 14. so we use the equation 1. Σ yi = 5010 . the value 500 is outside the range of x values for which observations were available (the danger of extrapolation). βˆ1 = = 1.7% of this variation can be explained by the model.4533 21.4533.Chapter 12: Simple Linear Regression and Correlation 19.000 a. b.7114 x .902.3651 .500 3.256.66034186 . 71.55190543 .5519 + 1.8485 c. etc. a. µˆ Y ⋅ 225 = −45.72 y = βˆ 0 + βˆ 1 (15) = 7. Σy i2 = 2. Σ xi = 3300 . = −50 βˆ1 = −85.750 . Estimated expected change d.1932 d. 049245 = 1 − .8518 − a.6) + .71 24. Using the (10.207.049245 .41122185)(10. 205.235..801 .9.41122185 . The regression equation is y = 138 + 9. a simple linear regression model does appear to be plausible.500 ) = 16. 205.24764 23. 414.00736023)(987. .003788 .8518 − (1.06155 13 b.45 = . 364 r 2 = 99. This new equation appears to differ significantly.55190543)(5010 ) − (1.213.71 so r 2 = 1 − 16.68 )2 15 = . y i ' s given to one decimal place accuracy is the answer to Exercise 19.55 x*. The desired value is the coefficient of determination. and σˆ = s = . y 12 00 7 00 2 00 0 50 100 x According to the scatter plot of the data.413.199 = . a.75 SSE = 7. 2 ( 5010) SST = 2.049245 s2 = = .64 .235.Chapter 12: Simple Linear Regression and Correlation 22. a.933.100 − (− 45. d. where using the original data.5. 0% .45 b.0 ) = 16.961 .207.31 x c.24764 so r 2 = 1 − . 54.654) = .. The computation 2 2 formula gives SSE = 2. βˆ 0 = 1. + (670 − 639. . b.00736023 . The new equation is y* = 190 + 7.325 βˆ1 = = −. If we were to predict a value of y * for x* = 50. SSE = (150 − 125.68) − (− . the value would be 567. SST = 7. the predicted value for x = 50 would be 603.71143233)(1.100 − 14 = 414. − 404. Since Σ ( xi − x ) = 0 . so that ( x. and since ∂f ∂ b0 2 Σ ( xi − x ) y i = Σ( xi − x )( y i − y ) [ because Σ ( xi − x ) y = yΣ ( xi − x ) ]. 
Equating ∂f/∂b0 and ∂f/∂b1 to 0 yields the equations n·b0 + b1·Σ(xᵢ − x̄) = Σyᵢ and b0·Σ(xᵢ − x̄) + b1·Σ(xᵢ − x̄)² = Σ(xᵢ − x̄)yᵢ = Σ(xᵢ − x̄)(yᵢ − ȳ) [because Σ(xᵢ − x̄)ȳ = ȳΣ(xᵢ − x̄) = 0].

25. Substitution of β̂0 = (Σyᵢ − β̂1Σxᵢ)/n and β̂1 for b0 and b1 on the left-hand side of the normal equations yields (Σyᵢ − β̂1Σxᵢ) + β̂1Σxᵢ = Σyᵢ from the first equation, and Σxᵢ(Σyᵢ − β̂1Σxᵢ)/n + β̂1Σxᵢ² = ΣxᵢΣyᵢ/n + β̂1[nΣxᵢ² − (Σxᵢ)²]/n = ΣxᵢΣyᵢ/n + [nΣxᵢyᵢ − ΣxᵢΣyᵢ]/n = Σxᵢyᵢ from the second equation, so the normal equations are satisfied.

26. We show that (x̄, ȳ) is on the line y = β̂0 + β̂1x: β̂0 + β̂1x̄ = (Σyᵢ − β̂1Σxᵢ)/n + β̂1x̄ = ȳ − β̂1x̄ + β̂1x̄ = ȳ.

27. We wish to find b1 to minimize f(b1) = Σ(yᵢ − b1xᵢ)². Equating f′(b1) to 0 yields 2Σ(yᵢ − b1xᵢ)(−xᵢ) = 0, so Σxᵢyᵢ = b1Σxᵢ² and b1 = Σxᵢyᵢ/Σxᵢ². The least squares estimator of β1 is thus β̂1 = ΣxᵢYᵢ/Σxᵢ².

28.
a. We wish b0 and b1 to minimize f(b0, b1) = Σ[yᵢ − (b0 + b1(xᵢ − x̄))]². Equating ∂f/∂b0 and ∂f/∂b1 to 0 yields the two equations above. Since Σ(xᵢ − x̄) = 0, the first gives b0 = ȳ and the second gives b1 = Σ(xᵢ − x̄)yᵢ/Σ(xᵢ − x̄)² = Σ(xᵢ − x̄)(yᵢ − ȳ)/Σ(xᵢ − x̄)² = β̂1. Thus β̂0* = Ȳ and β̂1* = β̂1.
b. Subtracting x̄ from each xᵢ shifts the plot in a rigid fashion x̄ units to the left without otherwise altering its character. The least squares line for the new plot will thus have the same slope as the one for the old plot. Since the new line is x̄ units to the left of the old one, the new y-intercept (the height at x = 0) is the height of the old line at x = x̄, which is β̂0 + β̂1x̄ = ȳ (since, from Exercise 26, (x̄, ȳ) is on the old line). Thus the new y-intercept is ȳ.
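Two of the facts proved above, that the fitted line passes through (x̄, ȳ) and that the residuals sum to zero (the first normal equation), are easy to confirm numerically. A sketch (the helper name is mine):

```python
def fit_line(xs, ys):
    # Least squares fit: returns (intercept, slope), with
    # b1 = sum((x - xbar)(y - ybar)) / sum((x - xbar)**2), b0 = ybar - b1*xbar
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    b1 = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) \
         / sum((x - xbar) ** 2 for x in xs)
    return ybar - b1 * xbar, b1
```

For any data set, b0 + b1·x̄ reproduces ȳ, and the residuals y − (b0 + b1·x) sum to zero up to floating-point error, exactly as the derivations require.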
Simple linear regression would thus seem to be most effective in the third situation.03 . βˆ 0 = 1.00000103 .00516 ) 366 . ( ) the choice of x values in a.41122185 . so SSE = 7. Because this appears in the denominator of V βˆ1 . 3662.68 ) − (− . and .00220 = (− .000 . a. (350 )2 7.04925 .160 )(.43 and σˆ = s = 4. whereas these quantities are .25 P 1.0 ≤ βˆ1 ≤ 1.001017 ) = −. ( ) 1. the variance is smaller for 2 Although n = 11 here and n = 7 in a.5 = P ≤Z ≤  1. c. Section 12.003788 = .00736023 .00736 ± .00000103 = .000 = . Σ( xi − x ) = 1.99 and 4.25 Σ x − (Σ xi ) / n = estimated s.00736 ± (2.25   1.0 − 1. a.323 = P(− 1.000. σˆ β2ˆ = 1 b.000. so V βˆ1 = deviation of b.8518 − (1. r2 = . 1 = .00736023)(987.06155 .003788 .Chapter 12: Simple Linear Regression and Correlation 29.89 ≤ Z ≤ 1. 1 σˆ βˆ = s βˆ s2 2 i 2 − .0175 = . From the printout in Exercise 15. b. = n – 2 = 25. The conclusion is the same as that of part a. for each 1 m3 increase in rainfall.5 . t .58 . H o : β 1 = 1. 0. The data contradict the prior belief. a.01280 t= 1 such as this would not lead to rejecting Ho .13 = 2. we can be confident that the true average change in runoff. b. β 1 denote the true average change in runoff for each 1 m3 increase in rainfall.025.01280 ) = (. H a : β 1 > .025. a.9668 − 1..1 .134 MPa.64 which (from the printout) has an associated p-value of P = s βˆ ..82697 ± (2.081. therefore reject Ho . 33. H a : β 1 ≠ 0 t > t α / 2. Therefore. A confidence interval for β 1 is based on n – 2 = 15 – 2 = 13 degrees of freedom. We wish to test H o : β 1 = . we estimate with a high degree of confidence that the true average change in strength associated with a 1 Gpa increase in modulus of elasticity is between . so the interval estimate is βˆ1 ± t . c. t .000. The calculated t statistic is βˆ1 − .f.718 0. Therefore. To test the hypotheses H o : β1 = 0 vs.01 .134 ) . 34. the error d.03652 ) = (.748 m3 and . n − 2 or t < −2.106 t = 5. At the level α = 0. A large p-value s βˆ . 
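The comparison of Σ(xᵢ − x̄)² values above rests on V(β̂1) = σ²/Sxx: since Sxx sits in the denominator, a wider spread of x values gives a more precise slope estimate. A minimal sketch (the function name is mine):

```python
def var_slope(xs, sigma2):
    # V(beta1_hat) = sigma^2 / Sxx; spreading the x values out
    # (larger Sxx) shrinks the variance of the slope estimator
    xbar = sum(xs) / len(xs)
    return sigma2 / sum((x - xbar) ** 2 for x in xs)
```

Doubling the spacing of five equally spaced x values quadruples Sxx and so cuts the slope variance by a factor of four.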
Chapter 12: Simple Linear Regression and Correlation

32.
a. We test H0: β1 = 0 vs Ha: β1 ≠ 0; the rejection region is t ≥ tα/2,n–2 or t ≤ –tα/2,n–2, or equivalently, reject H0 if the p-value is less than α. Here the reported p-value was 0.000, so H0 is rejected: the slope differs significantly from 0, and the model appears to be useful.
b. We wish to test H0: β1 = 1.5 vs Ha: β1 < 1.5. RR: t ≤ –tα,n–2. The calculated t statistic is t = (.9668 – 1.5)/.1829 = –2.92: Reject H0. The data contradict the prior belief.

33. Let β1 denote the true average change in runoff for each 1 m³ increase in rainfall.
a. To test H0: β1 = 0 vs Ha: β1 ≠ 0, the calculated t statistic is t = β̂1/s_β̂1 = .82697/.03652 = 22.64, which (from the printout) has an associated p-value of P = 0.000. Therefore, since the p-value is so small, H0 is rejected and we conclude that there is a useful linear relationship between runoff and rainfall.
b. A confidence interval for β1 is based on n – 2 = 15 – 2 = 13 degrees of freedom. t.025,13 = 2.160, so the interval estimate is β̂1 ± t.025,13 · s_β̂1 = .82697 ± (2.160)(.03652) = (.748, .906). Therefore, we can be confident that for each 1 m³ increase in rainfall, the true average change in runoff is somewhere between .748 m³ and .906 m³.

34.
a. We wish to test H0: β1 = .1 vs Ha: β1 > .1. The calculated t statistic is t = (β̂1 – .1)/s_β̂1 = (.10748 – .1)/.01280 = .58, which yields a p-value of roughly .28. A large p-value such as this would not lead to rejecting H0 at any reasonable significance level, so there is not enough evidence to contradict the prior belief.
b. With the error d.f. = n – 2 = 25 and t.025,25 = 2.060, the confidence interval is .10748 ± (2.060)(.01280) = (.081, .134). Therefore, we estimate with a high degree of confidence that the true average change in strength associated with a 1 GPa increase in modulus of elasticity is between .081 MPa and .134 MPa.

367
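The slope test and interval in 33 can be replicated in a few lines; a minimal sketch (Python is not part of the original solution, and the critical value 2.160 is t.025,13 read from the t table, as above):

```python
# Model utility test and 95% CI for the slope, using the summary
# numbers from the runoff/rainfall solution (Exercise 33).
b1 = 0.82697       # estimated slope
se_b1 = 0.03652    # estimated standard deviation of the slope
t_crit = 2.160     # t_.025 with n - 2 = 13 df (from the t table)

t_stat = b1 / se_b1                      # test statistic for H0: beta1 = 0
ci = (b1 - t_crit * se_b1, b1 + t_crit * se_b1)

print(round(t_stat, 2))                  # 22.64
print(round(ci[0], 3), round(ci[1], 3))  # 0.748 0.906
```

The same two lines of arithmetic serve for any of the slope tests in this section; only the estimates and the tabled critical value change.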
Chapter 12: Simple Linear Regression and Correlation

35.
a. We want a 95% CI for β1. First, we need our point estimate: using the given summary statistics, Sxx = 3056.69 – (222.1)²/17 = 155.019, Sxy = 2759.6 – (222.1)(193)/17 = 238.112, and β̂1 = Sxy/Sxx = 238.112/155.019 = 1.536, with β̂0 = [193 – (1.536)(222.1)]/17 = –8.715. We use β̂0 and β̂1 to calculate SSE = 2975 – (–8.715)(193) – (1.536)(2759.6) = 418.25, so s = √(418.25/15) = 5.28 and s_β̂1 = s/√Sxx = 5.28/√155.019 = .424. With t.025,15 = 2.131, the CI is 1.536 ± 2.131·(.424) = 1.536 ± .904 = (.632, 2.440). With 95% confidence, we estimate that the change in reported nausea percentage for every one-unit change in motion sickness dose is between .632 and 2.440.
b. We test the hypotheses H0: β1 = 0 vs Ha: β1 ≠ 0; the test statistic is t = 1.536/.424 = 3.6226. With df = 15, the two-tailed p-value = 2P(t > 3.6226) = 2(.001) = .002. With a p-value of .002, we would reject the null hypothesis at most reasonable significance levels. This suggests that there is a useful linear relationship between motion sickness dose and reported nausea.
c. No. A regression model is only useful for estimating values of nausea % when using dosages between 6.0 and 17.6 – the range of values sampled.
d. Removing the point (6.0, 2.50), the new summary stats are: n = 16, Σxi = 216.1, Σyi = 190.5, Σxi² = 3020.69, Σyi² = 2968.75, Σxiyi = 2744.6, which give β̂1 = 1.683, s_β̂1 = .537, and the new CI is 1.683 ± 2.145·(.537), or (.53, 2.83). The interval is a little wider, but removing the one observation did not change it that much. The observation does not seem to be exerting undue influence.

36.
a. A scatter plot of the data (mist droplets vs. fluid flow velocity), generated by Minitab, supports the decision to use linear regression analysis.
b. We are asked for the coefficient of determination, r². From the Minitab output, r² = .931 (which is close to the hand-calculated value, the difference being accounted for by round-off error).
c. Increasing x from 100 to 1000 means an increase of 900. If, as a result, the average y were to increase by .6, the slope would be .6/900 = .0006667. We should test the hypotheses H0: β1 = .0006667 vs Ha: β1 < .0006667. The test statistic is t = (.00062108 – .0006667)/.00007579 = –.60, which is not significant. There is not sufficient evidence that with an increase from 100 to 1000, the true average increase in y is less than .6.
d. We are asked for a confidence interval for β1. Using the values from the Minitab output, we have .00062108 ± 2.776(.00007579) = (.00041069, .00083147).

369
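The summary-statistic computations in 35(a) can be reproduced directly; a sketch using only the sums given in that exercise (t.025,15 = 2.131 is read from the t table):

```python
from math import sqrt

# Summary statistics for Exercise 35 (motion sickness dose x, nausea % y)
n, sx, sy = 17, 222.1, 193.0
sxx_raw, syy_raw, sxy_raw = 3056.69, 2975.0, 2759.6   # sums of x^2, y^2, xy

Sxx = sxx_raw - sx**2 / n
Sxy = sxy_raw - sx * sy / n
Syy = syy_raw - sy**2 / n

b1 = Sxy / Sxx                    # least squares slope
b0 = (sy - b1 * sx) / n           # least squares intercept
sse = Syy - b1 * Sxy              # computational formula for SSE
s = sqrt(sse / (n - 2))
se_b1 = s / sqrt(Sxx)

t_crit = 2.131                    # t_.025 with 15 df
ci = (b1 - t_crit * se_b1, b1 + t_crit * se_b1)
print(round(b1, 3), round(b0, 3))        # 1.536 -8.715
print(round(ci[0], 3), round(ci[1], 3))  # 0.632 2.44
```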
Then F. so βˆ1 = = .05 . Thus 368.Chapter 12: Simple Linear Regression and Correlation 37. Σ yi2 = 161. from 1. so H o : β 1 = .0997 H o : β1 = 0 is rejected at level .958547)(1574.14 .8295 .33041347 . We wish to test a.77 = 1.039. so .9679. Ho is rejected.711 ± (2.00680058 .77 Total 19 39. which also refers to Exercise 19. At level .88) = 7. The C.0005.0997 ) = 1.75 = .1.0 = f . 94.01.525 which SSE = .2 > 4.711 ± . Σ yi = 39. s = .19.8295 model is judged useful.828 Source df SS MS f Regr 1 31.11009492 s = . SSE = 16.968 1.001. Because the p-value < . for 10β 1 is (14.110 σˆ βˆ1 = = .453. β 1 is 1.318 = t .94 . and SST = 39.01 in favor of the conclusion that the model is useful t= ( β1 ≠ 0 ) .0060 t= = 3.0 .001.027 Σ xi yi = 11.04103377)(222657.675 .0 Error 18 7.20 .I.18 = 15. From Exercise 23. βˆ 0 = 2.852 a. n = 10.09696713. Thus the C.2426 )2 = 18. Σ xi2 = 860.000262 176.179 )(.2426 .8 = −1.110 = σˆ . and t 2 = (4. Σ xi = 2615 .75 .0068 − .8) – (. 39. .38 < 18.921.860 18. . s = 36.768. .711 = 17.14164770 .58– (72.860 31. and s βˆ = 1 36.494.921.05 . Since . 370 .0997 .45.001. (Σx i )(Σ Yi )] c Σ x Σx n n = ∑ x i E (Yi ) − i ∑ E(Yi ) = ∑ xi ( β 0 + β1 x i ) − i ∑ ( β 0 + β1 x i ) c c c c β1 nΣx i2 − (Σxi )2 = β1 . E βˆ 0 = n E (Σy i ) E(ΣYi ) = − E βˆ1 x = − β1 x n n Σ(β 0 + β1 x i ) = − β1 x = β 0 + β1 x − β 1 x = β 0 n We use the fact that ) ( ) 41.Chapter 12: Simple Linear Regression and Correlation 40. Then E βˆ1 = E [nΣ xiYi . or not at all. ( ) 1 2 c = nΣx i2 − (Σxi ) .17 as approximately . The approximate power curve is for n – 2 df = 13. 43. 371 . βˆ1 = Σ ( xi − x )(Yi − Y ) = Σ( xi − x )Yi (since c c 1 2 Σ ( xi − x )Y = Y Σ (x i − x ) = Y ⋅ 0 = 0 ). SSE will change by the factor d 2 . so s will change by the factor d.. Thus βˆ 1 will change by the factor d . so 324. 
The numerator of βˆ 1 will be changed by the factor cd (since s both Σ xi y i and (Σx i )(Σ y i ) appear) while the denominator of βˆ 1 will change by the factor 2 2 c2 (since both Σ xi and (Σ xi ) appear). as desired. t itself will change by ⋅ = 1 .. c d The numerator of d is |1 – 2| = 1. ] 1 1 2 c = Σ( xi − x ) .. Because c 2 SSE = Σ( y i − yˆ i ) . and β is read from .831 .831 Table A.1.Yi − (Σx i ). 2 2 2 c Σ ( xi − x ) Σxi − (Σ xi ) / n With ( ) 42. c Let [ b. d c Since • in t changes by the factor c. a.20 .40 1 = 1. and the denominator is d= 4 14 = . Σxi2 − (Σx i )2 / n t = βˆ1 . ( ) ( E Σy i − βˆ1Σ xi βˆ 1 is unbiased for β 1 .. so V βˆ1 = 2 Σ (x i − x ) Var (Yi ) c 2 2 1 σ σ 2 = 2 Σ ( xi − x ) ⋅ σ 2 = = . 592 ± .895 )2 + = . s yˆ values.3349 . the s yˆ s yˆ for x = 60.592 ± 1. 372 . the quantity (40 − x ) 2 x = 45. Therefore.961 MPa.179) 7.821 = (5.4669 ) c.4 44. the term will be larger.11 must be smaller than (60 − x ) 2 .088 ± 1 . We wish to find a 90% CI for 45.15. and 1 (125 − 140 .15 is than is x = 60. Said briefly. that the true average strength for all beams whose MoE is 40 GPa is between 7. since these quantities are the only ones that are different in the two value for x = 40 must necessarily be smaller than the closer x is to b.734 (. 025. c.961 .7091 . From the printout in Exercise 12. a. Note that the prediction interval is almost 5 times as wide as the confidence interval.9.7. the simultaneous confidence level is at least 100(1 – 2(. and thus the width of the interval is wider. 25 = 2.592 ± (2.8657 ) + (.3719 .Putting it together. t .413) . the error degrees of freedom is df = 25.18 = 1. Since x = 40 is closer to 45. From the printout in Exercise 12.3349 ) = (77 .11 . making the standard error larger.025.6687 ) s yˆ = s b. d.921 . The mean of the x data in Exercise 12. t . µ y⋅125 : yˆ 125 = 78. s = .060) (.15. so the 95% prediction interval is yˆ ± t . We estimate.734 . 
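The two facts derived above, E(β̂1) = β1 and V(β̂1) = σ²/Σ(xi − x̄)², can also be checked by simulation. This sketch uses made-up parameter values (β0 = 1, β1 = .5, σ = 2, and five fixed x's), not data from any exercise:

```python
import random
from math import fsum

random.seed(42)
beta0, beta1, sigma = 1.0, 0.5, 2.0
x = [10, 20, 30, 40, 50]
xbar = sum(x) / len(x)
Sxx = sum((xi - xbar) ** 2 for xi in x)      # 1000

slopes = []
for _ in range(5000):
    # Simulate Yi = beta0 + beta1*xi + eps, eps ~ N(0, sigma)
    y = [beta0 + beta1 * xi + random.gauss(0, sigma) for xi in x]
    ybar = sum(y) / len(y)
    slopes.append(sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / Sxx)

mean_b1 = fsum(slopes) / len(slopes)
var_b1 = fsum((b - mean_b1) ** 2 for b in slopes) / (len(slopes) - 1)
print(abs(mean_b1 - beta1) < 0.01)            # True: b1 is (approximately) unbiased
print(abs(var_b1 - sigma**2 / Sxx) < 0.001)   # True: Var(b1) ~ sigma^2/Sxx = .004
```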
Because the x* of 115 is farther away from (x ∗ −x ) 2 x than the previous value.80 . so the PI is 20 18.8295 78 .5073.223 MPa and 7.05)) = 90% a.771. the x .060 . so the interval estimate when x = 40 is : 7. For two 95% intervals.734 (1. We want a 90% PI: Only the standard error changes: 1 (125 − 140 . the smaller the value of s yˆ .223.895 ) s yˆ = s 1 + + = 1 .921.8295 2 78 .592 ± 2.3719 ) = (75 .Chapter 12: Simple Linear Regression and Correlation Section 12. we get 20 18.05.060 ( ( ) )(.8657.369 = 7.088 ± 1. with a high degree of confidence.088 .78 . 25 s 2 + s 2yˆ = 7.179 ) 2 2 = 7. A 95% CI for β1: βˆ1 ± t .48) s yˆ (500) = .2614 and 1 (400 − 471. so we do not reject Ho .11 = 2. The test statistic is t= 78. which is not ≥ 2. We wish to test t= H 0 : y (400) = . 025.709 ..311 + .08 = (.2614 − .23 yˆ (500) ± t .025.088 − 80 = −5.00143(400) = . H 0 : y (400) ≠ . We .00.002223) d.11 ⋅ S yˆ ( 500) = . µ Y ⋅ 500 : yˆ (500) = −.03775) = .519. so the interval is 13 131.201(.210(.25 s yˆ (400) . 373 .201 . and we reject Ho if t ≥ t .32.201 .40 ± . so the calculated 13 131.00143 ± 2.025..000637.25 .3349 would reject Ho .54)2 + = . would the average moisture content of the compressed pellets be less than 80%.25 t= = . c.0445 s yˆ ( 400) = .Chapter 12: Simple Linear Regression and Correlation d. a.2561 .311 + (.00143)(500 ) = . The test statistic is yˆ (400) − .40 ± 2.131 data does not contradict the prior belief.25 vs.40 and A 95% CI for 1 (500 − 471.03775 .0445 .709) ˜ 0. This sample . The width at x = 400 will be wider than that of x = 500 because x = 400 is farther away from the mean ( x = 471. There is significant evidence to prove that the true average moisture content when filtration rate is 125 is less than 80%.54 ).23 .54 )2 + = .11 ⋅ s βˆ1 = .131 b. and with 18 df the p-value is P(t<-5. 46.0003602) = (. We would be testing to see if the filtration rate were 125 kg-DS/m/h.519. yˆ (400) = −. 4 . 
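The contrast between the CI and the much wider PI in 44 comes down to the extra s² under the radical; a small sketch with the numbers quoted above (ŷ = 7.592 and s_ŷ = .179 at x = 40, s = .8657, t.025,25 = 2.060):

```python
from math import sqrt

y_hat, s_yhat = 7.592, 0.179   # point estimate and its estimated SD at x = 40
s = 0.8657                     # residual standard deviation
t_crit = 2.060                 # t_.025 with 25 df

ci_half = t_crit * s_yhat                   # CI for the mean response
pi_half = t_crit * sqrt(s**2 + s_yhat**2)   # PI for a single future y
print(round(y_hat - ci_half, 3), round(y_hat + ci_half, 3))  # 7.223 7.961
print(round(y_hat - pi_half, 3), round(y_hat + pi_half, 3))  # 5.771 9.413
```

Because s dominates s_ŷ here, the prediction interval is nearly five times as wide as the confidence interval, as the solution notes.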
therefore the available information is not very precise.586.69) .095 + 3.5 ) = .445 ± .160 (5. A new PI for y when x = 1.0647) 2 + (.0293 .095 .6) βˆ 0 = = = −2. Σ x 2 = 63.365 (.693(2.68) = 2.0231) = 3. The prediction interval for a future y value is wider than the confidence interval for an average value of y when x is 1. SSE = 8.22 ± 11. A 95% PI for y when x = 1.5 − 1.0231 . a.162 = (3.216 ) = .3448 − 1 9 S xx .968 − (12.74 = (20.92) .693(1.69 = (28. Σ x = 798.6 )(27.358)2 = 40.0647 + = .5) = 3. so the point estimate is n 9 yˆ (1.21.607 ) . b.82697(40 ) = 31.95 ± 2.3.53.025.216 = 3.2 is farther away from the mean x = 1.4 .390.22 ± 2.44 )2 = 31. the resulting interval is very wide.0231)2 = 3. 48. Thus 7 1 (1 .95 ± 11.445 ± 2. The simultaneous prediction level for the two intervals is at least 100(1 − 2α )% = 90% .0647 . c. t . which yields s= SSE = n−2 .95 .68) = 8.5 is similar: 3.213 βˆ = xy = 2.60 Σ y − βˆ1Σx 27.Chapter 12: Simple Linear Regression and Correlation 47.216 .24 )2 + (1.128 + . S xy = 40.5 is 9 . S yy = 93.586. The 95% CI for µ y ⋅1.445 ± .445 ± 2.365(.445 .6) 2 = .693)(12.24 − (12.160 (5.24 1 (50 − 53.60 3.20 )2 + = 1. No.5) = −2. a.5. so the PI for runoff when x = 50 is 15 20.24) 2 + (1. 374 .3.51.693 .2 will be wider since x = 1. yˆ (40) = −1. which in turn gives s yˆ ( 50) = 5.60 .4) s yˆ (1. 9 9 2 S (27.43.358 .13 = 2. 2 b.040 which gives S xx = 20.160 .50 ) .4 40.055 = (3.0293 = .213 − 3. a 95% PI for runoff is 31.68 − (3. S xx = 18.283. (with t .94 ± 17. 9 = 3.101) ⋅ (0.70. is 11 3834. 95% CI: (462. x.94 ± (2. t .1. 8 = 2.I. 9 = 330. the interval is 274. βˆ 0 + βˆ1 (0.402 99% CI: 50.9 + (2.306 .101)(0. For x = 15..059.0311) = (0.53) . 529.5 ) βˆ1 = 18.8104 ± (2. n− 2 ⋅ s 2 + s 2 βˆ 0 + βˆ1 (1.18) b. 1 11(18 − 20.025.250 ).262)(16. 0.20) ± t α / 2. 1 11(18 − 20.36.6206)(.26 330. a.2912 ± (2.97. For x = 18.40.54 = (291. 20) or 0.0516) = 330.0516 .94 ± (2.262 .3. ( midpoint = 529. so the CI is 330. s = 16. 
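The test statistic for the hypothesis about the mean moisture content at x = 125 is just the standardized distance of the estimate from the hypothesized value; sketched numerically with the quantities above:

```python
# t statistic for H0: mu_{Y.125} = 80, using the estimated mean response
# and its standard error from the solution above.
y_hat, se, mu0 = 78.088, 0.3349, 80.0
t_stat = (y_hat - mu0) / se
print(round(t_stat, 3))   # -5.709
```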
For x = 20.3255) 2 βˆ 0 + βˆ 1 (18) = 330.025.628.0352 )2 = (.Chapter 12: Simple Linear Regression and Correlation 49.52 ) .69 ± 0.35 = (251.9 ± (3. 51. βˆ 0 = −8.402) = (431.745. 330.370.24 = (318.67) .9.348.58 = (313. c.26 = 2.77862227 . so the P.876) c.87349841 .94 ± 12.94 .355 )(29.2909 . n− 2 ⋅ sˆ βˆ 0 + βˆ1 ( 0. 368.2909) + = 1.0. we compute each one using confidence level 99%.262)(16. SSE = 2486.32 ± 22.7).209.7 t .6206)(1. 597.306) sˆ βˆ 0 +βˆ1 (15) = 597.94 ± 39.523) 375 .005.1049) 2 + (0.84 = (367. 296. ( ) βˆ 0 + βˆ1 (1.40 ) ± tα / 2. 2 1+ To obtain simultaneous confidence of at least 97% for the three intervals. sˆ βˆ 0 +βˆ1 (15) = 29.85.2909) + = . 40) or 0.3255 .369.6206 a. 11 3834.48 ) . x = 20. ) 529.343.40 is closer to b.  1 9(3.0 − 2. We can be highly confident that when the flow rate is increased by 1 SCCM.546 )(1.0 : 38.365 . The 95% PI is 38.365 2.35805) = 38.0 is not in the range of observed x values. the associated expected change in etch rate will be between 824 and 1296 A/min. or 3185.0.24. We wish to test H o : β1 = 0 vs H a : β 1 ≠ 0 .96 . The test statistic 10.365(2. b.546)(.156 = (36.9985 = 8.44.546 1 + +   9 58.365 2.256 ± 2.12.6026 = 10.0 − 2.655 ) . The intervals for x* = 2. and b and c are both smaller than d.256 ± 2. Nothing can be said about the relationship between b and c.Chapter 12: Simple Linear Regression and Correlation 52.006 ( 2P( t > 4. and Ho is rejected since the p-value is smaller than any reasonable α . or 3610. a.40.0 to 4041. a value of 6. No. A 95% CI for µ Y ⋅3.667 )2   d.256 ± 2.859.398 = (31.62 leads to a p-value of < . 53. ( )( ) ( ) A 95% confidence interval for β 1 : 10. Choice a will be the smallest.6026 ± 2.5 will be narrower than those above because 2. with d being largest.0 ) from the 7 df row of . 376 .50   = 38.8).5 is closer to the mean than is 3.06 ) = 38. f. a is less than b and c (obviously).2 A/min.256 ± 2.50   = 38.  1 9(3.546 +   9 58 .412) .100.5 A/min. 
e.256 ± 6.365(2.256 ± 2.9 to 4465. therefore predicting at that point is meaningless.667 )2   c.9985 table A. The t= data suggests that this model does specify a useful relationship between chlorine flow and etch rate. The simple linear regression model does appear to specify a useful relationship. yet y 2 ≠ y3 b. The interval is centered at βˆ 0 + βˆ 1 (75 ) = 435. A confidence interval for β 0 + β 1 (75 ) is requested.060132 . βˆ0 = 515 . and Σ xi yi = 449. s = 54.83) = (403.Chapter 12: Simple Linear Regression and Correlation 54.179 or s βˆ 1 t ≤ −2.803). 377 .523 . Reject Ho at level .9 ± (2.05 if either t ≥ t .631.12 = 2. SSE = 36. a. t = 1 . The simple linear regression model provides a sensible starting point for a formal analysis.08. There is a linear pattern in the scatter plot.241 rejected. s βˆ = = .060 = −4. n = 141.825. x 2 = x3 = 12 . from which βˆ1 = −1. a simple linear regression model does seem a reasonable way to describe the relationship between the two variables.80 r 2 = . Since − 4.179 . Σ y i = 5960.93. although the pot also shows a reasonable amount of variation about any straight line fit to the data. s 2 = 3003 . c.6. Σx i2 = 151. b. Σy i2 = 2. We calculate t = − 1. 200 .39 . 179)(14.21 βˆ H a : β 1 ≠ 0 . Thus a 95% CI is ˆ ( 75 ) 0 + β1 =s 1 n(75 − x ) 2 + = 14.179 Ho is .036 .241 H o : β 1 = 0 vs 1 51.80. 54. 55.39 ≤ −2.616 .9 .83 (using s = n nΣxi2 − (Σxi )2 435. 100 90 % damage 80 70 60 50 40 30 20 10 10 20 30 age Based on a scatterplot of the data. a. s βˆ 54. Σ xi = 1185.559.7 ) .446887 .025 .850 . 669528)(572 ) − (3.634 − (− 19 . after some algebra. SSE = 35.228 s 2 + s β2ˆ0 + βˆ1 (20) = 46. 03 ± 21. t .03 .92 .025.285 x 699 d. and S xy = 404.5 57. 378 . which is the equation of a straight line.391. y.70188. 12 8388 2 βˆ 0 + βˆ 1 (20) = 46. c.6308 . Summary values: Σx = 44.480. 284. The value of r does no depend on the unit of measure for either variable.Chapter 12: Simple Linear Regression and Correlation c. Thus. 
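The CI for β1 in the chlorine flow / etch rate regression can be reproduced from the printed estimates (taking s_β̂1 = .9985 from the fragments above, and t.025,7 = 2.365 from the 7 df row of the t table; the solution states the endpoints as 824 to 1296 Å/min, a factor of 100 in the units):

```python
# 95% CI for the slope in the chlorine flow / etch rate regression
b1, se_b1 = 10.6026, 0.9985
t_crit = 2.365                     # t_.025 with 7 df
lo, hi = b1 - t_crit * se_b1, b1 + t_crit * se_b1
print(round(lo, 2), round(hi, 2))  # 8.24 12.96
```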
s 2 = 82.284692 . The PI is 46. the majority of people in a sample will have y values that closely follow the line y = x − 16 .500 . yields the desired expression. 56. Using these values we calculate S xx = 4.67 + 3. y = −19. for example.755.03 ± 2.022 ) = 827. the value of r would have remained the same. s βˆ 0 +βˆ1 ( 20) = 9. 58. n = 12 .67 .284692 )(14. is usually related to age by the equation y ≈ x − 16 . then the time since acquiring a license.5) + = 2. s = 9.669528 .094 . S xx S yy b. Thus. Σy 2 = 1. Section 12. 2296 βˆ1 = = 3. the value of r would have remained the same. S yy = 42.94.425 .10 = 2.355. Σy = 3. So S xy r= = . a.860 .09 = (24. 450 . 2 2 n nΣx i − (Σ xi ) ( ) which.816. In other words. (x − x )∑ ( xi − x )Yi 1 βˆ 0 + βˆ 1 x = Y − βˆ1 x + βˆ1 x = Y + ( x − x )βˆ1 = ∑ Yi + = Σd iYi 2 n nΣx i2 − (Σxi ) 1 ( x − x )(x i − x ) 2 2 2 where d i = + .67 .0188 . The value of r does not depend on which of the two variables is labeled as the x variable. βˆ 0 − 19. the minimum age for obtaining a license is 16.094 1 12(20 − 20.9233 . had we expressed RBOT time in hours instead of minutes. Σx 2 = 170.12) . Thus Var βˆ 0 + βˆ1 x = ∑ d i Var (Yi ) = σ Σ d i .615 . Most people acquire a license as soon as they become eligible.228 .67. had we let x = RBOT time and y = TOST time.572. If. Σxy = 14. 80 . t = r n−2 1− r 2 .9 2 StDe v: 6 38.50 .228 or t ≤ −2. 379 .99 Probability .8 56 Normal Probability Plot .80 .50 .05 . Normal Probability Plot .999 .667 StD ev : 62.10 = 2.58.Chapter 12: Simple Linear Regression and Correlation d.3 893 N : 12 Anders on-D arli ng N orma lity Te st A-Squ ared : 0 .01 . r = .20 .2 32 Both TOST time and ROBT time appear to have come from normally distributed populations.446 P-Val ue: 0.05 if either t ≥ t .228 .95 .197 P-Va lue: 0.95 .999 .001 2800 3800 4800 TOST: Av erage : 3717 .01 .025. Reject Ho at level .001 200 300 400 RBOT: Av era ge: 32 1. so Ho should be rejected.05 . The model is useful. 
e.99 Probability .22 0 N: 1 2 And erson-D arli ng N ormal ity Tes t A-Sq uared: 0 .20 . t = 7.923. H o : ρ1 = 0 vs H a : ρ ≠ 0 . 586667 .6074 − = 40.7482)2 = . 18 18 (1950)(47.7482 . so and S xy = 5530.930. We are testing r= H o : ρ = 0 vs H a : ρ > 0 .Chapter 12: Simple Linear Regression and Correlation 59. We 36. 60. t = r n−2 1− r 2 . Reject Ho at level . a.359 1 − .7482 12 = 3.966 16 1 − .720 3.9662 .05 . 380 . d.033711 S xx between the two variables. H o : ρ = 0 vs H a : ρ ≠ 0 .033711 . There appears to be a non-zero correlation in the population. so Ho should be rejected. S yy = 3.92 ) = 339.704 = . It is the same no matter which variable is the predictor.94 ≥ 2.819 . Changing the units of measurement on either (or both) variables will have no effect on the calculated value of r. the specimen with the larger shear force will tend to have a larger percent dry fiber weight. 22 = 2. the coefficient of determination. b.005. The data indicates a positive linear relationship between the two variables.01 if either t ≥ t .7482 2 reject Ho since t = 3.583 .9066 ≥ t .583 . There is a very strong positive correlation 40.32.5778. t = 3. 61.01. There is evidence that a positive correlation exists between maximum lactate level and muscular endurance.586667 r= = .966 2 . 2 ( 1950) = 251. because any change in units will affect both the numerator and denominator of r by exactly the same multiplicative constant.819 or t ≤ −2. r 2 = (.16 = 2.9066 .966 ) = . so Ho should be rejected . t = 2 t ≥ t . r2 = (. a.12 = 1.9839 2. H o : ρ = 0 vs H a : ρ > 0 .970 − 2 ( 47. t = r n−2 1− r 2 .92 − 18 339.92) = 130. r = .628. c.01 if = 14. 7377.720 . We are looking for r2 .782 . and t = .933 e. b. Reject Ho at level .5598. Because the association between the variables is positive. 652 ± 26  1. Reject Ho at level .328 381 .11) is − .645) = (− .724.915 . Σx i2 = 2.7943) − (111.71)(2. so (12. H o : ρ1 = 0 vs H a : ρ ≠ 0 .−.724. (.Chapter 12: Simple Linear Regression and Correlation 62. 
Reject Ho if.436 .  .71. The data does not indicate that the population correlation coefficient differs from 0.318) .7729 ) 4 1 − (. Fail to reject Ho . n = 6.573  and the desired interval for ρ is (− . Σy i2 = 1.328 d. Reject Ho at level .3290 ) .9) 2 = . so Ho cannot be rejected at any reasonable level.179 or t ≤ −2.7729 )2 = 2. Again.5730 (3756. This result may seem surprising due to the relatively large size of r (. 025. the data does not suggest that the population correlation coefficient differs from 0. r= vs (6 )(63. r= a.449 )2 b. t ≥ t . t = r n−2 1− r 2 = (.915) − (111.025. z = (− .12 = 2.976.20 so 20 percent of the observed variation in gas porosity can be accounted for by variation in hydrogen content.776 .751.96 )(465. Σy i = 2. and Σ xi yi = 63. b.6572) − (2. 4 = 2.449) 2 = 1.05 if either a.427  v = . = . c. 64.449) 12 1 − (.73)2 ⋅ (6)(1.7643. H o : ρ1 = 0 H a : ρ ≠ 0 . − 757. r 2 = .652 .49 . 63.5 ln   = −.74 . Fail to reject Ho .7729 .9) (6 )(2.549 ) 23 = −. r 2 = .179 . t= (.05 if t ≥ t . Σ xi = 111.34) (1.652 + .6572 .6423 = −. it can be attributed to a small sample size (6).−. however.9.77). Σ xi2 = 78. 2 = .4080 relationship. r2 = 0. The normal probability plot of the y’s is much straighter.00032 < α = . Σy i2 = 1959. We used Minitab to calculate the ri ’s: r1 = 0. Σ xi = 864.022 .355 . b. r = .4 . a. For all lags. This r 1 suggests only a weak linear relationship between x and y.913(2. b.96 . 66.355 . and a p-value of . Ho should be rejected at this significance level. 382 . which in turn shows an extremely weak linear relationship.142. one that would typically have little practical import.192. Solving 3.01 if t ≥ t .159. For this n. so r = = .60). It appears that the lag 2 correlation is best.025.33 ≥ 3. ri does not fall in the rejection 100 region.3880) . then the simultaneous confidence levels would be approximately (. Because p-value = . a. There is not evidence of theoretical autocorrelation at the first 3 lags. 
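The t statistic for H0: ρ = 0 in 59 depends only on r and n; a quick sketch with r = .7482 and n = 14:

```python
from math import sqrt

# t test statistic for H0: rho = 0 (Exercise 59)
r, n = 0.7482, 14
t_stat = r * sqrt(n - 2) / sqrt(1 - r**2)
print(round(t_stat, 2))   # 3.91
```

Comparing 3.91 to the tabled critical value leads to rejecting H0, as in the solution.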
a.95)(.60 = r 498 − r 2 for r yields r = . but all of them are weak. Not necessarily. 625) does not reject the hypothesis of population normality. so Ho is rejected in favor of Ha.8796)(23.05 significance level for the simultaneous hypotheses.857. The value t = 2. b. based on the definitions given in the text.001. There does appear to be a linear . 9998 = 1.8284 ) t= = 6. so reject Ho . z = 3. c.022 implies ρ = . c. We reject Ho if ri ≥ . and r3 = 0.20 is statistically significant -.382.0.it cannot be attributed just to sampling variability in the case ρ = 0 . If the individual confidence levels were . Σ y i = 138.00032 corresponds to 67.95) = .95. the test of normality presented in section 14. so we cannot reject Ho .2 .2 (p.183.005. the test statistic t has approximately a standard normal distribution when H o : ρ1 = 0 is true. If we want an approximate .1 and 3992 Σ xi y i = 12.2 .322.60 (or –3. such a pattern is not terribly unusual when n is small. But with this n.8 = 3.Chapter 12: Simple Linear Regression and Correlation 65.20 ≥ t . H o : ρ1 = 0 will be rejected in favor of H a : ρ ≠ 0 at level . we would have to use smaller individual significance level.913 and (186. Although the normal probability plot of the x’s appears somewhat curved.95)(. t = 2. 76 and − 1538.875) = 1. Σy i = 621. and sβˆ 0 +βˆ1 ( 25 ) =s 1 n(25 − x )2 + n nΣxi2 − (Σxi )2 1 8(25 − 25. only 2%. Ho will be rejected (and the model judged useful) if either t ≥ t .1333 8.80.896.84 ± 1.543 77.173 − . βˆ 0 + βˆ 1 (25) = 77. 2 Thus the second set of x values is preferable to the first set. At level . x 5 = . b.78.206 + = . which is 1. whereas the 2 x1 = .133258 .03175 ≤ −3. ∑ ( xi − x ) = 1442. which is again preferable to the first set of 2 xi ' s .84 ± (2. For the given proposed x values xi ' s ..447 )(. n = 8 ..707 . This is the case because the predictive value of 25% is very close to the mean of our predictor sample.84. = x 8 = 50 . With just 3 observations at x = 0 and 3 at x = 50. 
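With a sample as large as n = 500, even a very small r is statistically significant; inverting t = r√(n − 2)/√(1 − r²) recovers the r behind the reported t = 3.60:

```python
from math import sqrt

# Solve t = r*sqrt(df)/sqrt(1 - r^2) for r, given t and df = n - 2
t, df = 3.60, 498
r = t / sqrt(t**2 + df)
print(round(r, 3))   # 0.159
```

An r this small explains only about 2.5% of the variation, which is why the text calls the relationship weak despite its statistical significance.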
0 H 0 : β1 = 0 vs H 0 : β1 ≠ 0 .04 = (76. the smaller will be σˆ βˆ1 and the more accurate the 2 i estimate will tend to be. = x 4 = 0 . d. which gives βˆ1 = = −. so the 95% CI is 8 11. ∑ ( xi − x ) = 5000 .. Σ xi2 = 6799.985 .20 Σ xi y i = 15.1333 x as the equation of the estimated line. a. 383 .707 or t ≤ −3.005. and t = = = −4..Chapter 12: Simple Linear Regression and Correlation Supplementary Exercises 68.707 .426 .173051 . s = 1. 6 = 3. and y = 81.01. We wish to test c.206 / 37. ∑ (x i − x ) = 3750 . The larger the value of ∑ (x − x ) . Σ yi2 = 48. The interval is quite 2 narrow. 11.88 ) .8.206.426 ) = 77.363. SSE = − .732664.543 βˆ = 81. so we do reject Ho and find the model useful. Σ xi = 207.8 .2 .88 .1333 − . 71.Chapter 12: Simple Linear Regression and Correlation 69. r 2 = .9138 − 1 t= = −1.0013. d. If it does not.9134 = 0. The test statistic t = 3.457072 . c. 1 .403964 )x . When x = 35. yˆ = 28. and Ho will be rejected if either s βˆ 1 t ≥ t .9557 e. a.025. Σy i2 = 5731 and Σ xi yi = 5805 .613 .18.913819 . for this particular model an x value of 40 yields a g value of –9. Yes. r = + r 2 = .24 .01. therefore we reject Ho and conclude that the model is useful. With Σ xi = 243. Ho cannot .64 .0693 . r= a. r = r 2 = 0.902 = . and s βˆ = . βˆ1 = .24 is neither ≤ −2. yˆ = 326. which is an impossible value for y. the model utility test is statistically significant at the level .5.976038 − (8.93 gives p-value = .01. the given level of significance. βˆ 0 = 1. a. Σ xi2 = 5965.970 (136 )(128. First check to see if the value x = 40 falls within the range of x values used to generate the least-squares regression equation.7122 (positive because βˆ 1 is positive. this equation should not be used.201 .0693 be rejected. It is plausible that β 1 = 1 . We test test H 0 : β1 = 0 vs H 0 : β1 ≠ 0 . Σ yi = 241. b.201 nor ≥ 2.5073 = . Because –1. 16.126 . s = 2.201 . The test statistic value is t= βˆ 1 − 1 .5073 b.11 = 2. 384 . sample size = 8 70.15) b. 
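The design comparison in 68 works through Σ(xi − x̄)², which sits in the denominator of V(β̂1) = σ²/Σ(xi − x̄)²: the larger this sum, the more precise the slope estimate. A short sketch for the two equal-allocation designs mentioned above:

```python
# Compare candidate designs by S_xx = sum((x - xbar)^2); a larger S_xx
# gives a smaller variance for the slope estimator.
def sxx(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs)

print(sxx([0, 0, 0, 0, 50, 50, 50, 50]))  # 5000.0 (four points at each end)
print(sxx([0, 0, 0, 50, 50, 50]))         # 3750.0 (three points at each end)
```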
Furthermore.201 or t ≤ −2. SSE = 75.) c. which is < . 228 . t .0436(400) = 17. it does not appear that ∆NO y is accurately predicted. 72. c. The value of s with the value included is 2. e. 17 17(41. yˆ (50) = .131(.787218 + . So we have sufficient evidence to claim that ∆CO is a good predictor of ∆NO y . Regression Plot 30 Noy: 20 10 0 0 100 200 300 400 500 600 700 CO: The above analysis was created in Minitab.60)2 2 s yˆ ( 5 0) = . While the large ∆CO value appears to be “near” the least squares regression line. The least squares line that is obtained when excluding the value is yˆ = 1. the value has extremely high leverage.2143 .275299 ) .787218 + . yˆ (30) = .72. Since this interval is so wide.1.80 − 1.953.0143 = −. then.020308.Chapter 12: Simple Linear Regression and Correlation d.165718 .025.00 + .051422 ) = 1. s = “Root MSE” = . so 1 17 (50 − 42.131 .109581 = (1.0143.051422 .024. A simple linear regression model seems to fit the data well.20308 1. The interval is . a. 385 . We use a 95% CI for µ Y ⋅50 . yˆ = −.220 + .575) − (719.056137.007570(50 ) = 1.33) + = .15 = 2. and with the value excluded is 1.0436 x . So the large ∆CO value does appear to effect our analysis in a substantial way. The corresponding p-value is extremely small.0346 x . The model utility test obtained from Minitab produces a t test statistic equal to 12. The least squares regression equation is yˆ = −. A 95% prediction interval produced by Minitab is (11.503).007570(30) = 1. The residual is y − yˆ = .165718 ± 2. The r2 value with the value included is 96% and is reduced to 75% when the value is excluded.220 + .96. 22. b.165718 ± . 4608. βˆ 0 = 14.93 Σ xi yi = 2348. 3.10 vs.0805)x . and 1638 the equation yˆ = 14. βˆ 0 = −20. We wish to test H 0 : β 1 = −.190392 .44 ). and s βˆ 1 .984 for either regression. as xi ' s . Even though the sample size for the proposed x values is larger. r = .43 is not 1 . 386 .7 = −1. giving βˆ1 = = −. . (t . Σ xi2 = 7946.0342 182 ≤ −1. 74. 3. 
Ho will be rejected at level βˆ − (− .1489 + 1 = .60.365)(.4046 .025 . Σ yi2 = 982.05 if t = 1 ≤ −t .Chapter 12: Simple Linear Regression and Correlation 73.2254)x .3877 ) = .0342 . 44.40 + (12. a. yˆ = −20. yˆ = 1.2932 and − 243. H 0 : β1 < −.10.10 (the alternative contradicts prior belief). do not reject Ho .2254 . 05.42 . n = 9. 7 )(s ) 1 9(28 − 25.02.43 .5979 βˆ1 = = .15 .2943 c.6939 . Σx i2 = 5958. Because –1.4608 − . β 1 is the expected increase in load associated with a one-day age increase (so a negative value of β 1 corresponds to a decrease). Σ xi = 306.5979 βˆ1 = = 12. Thus t = = −1.713 b. Σy i = 93.02 ± . s = . d.4608)(. βˆ 0 = 1. a.19 − (. so r2 = . so ∑ ( xi contrasted with 182 for the given 9 − x ) = 7946 − 2 (306)2 12 = 143 here.4862. the original set of values is preferable. Σ xi = 228.33) + = (2.1489 )x . With SSE = 1.42 = (9. so the 95% CI is 10.148919 .10) . s βˆ = c.0805 . b.992.895 .76.69 + (. and 9 1638 2 βˆ 0 + βˆ 1 (28) = 10.895 . Σ xi = 890. so SST 93.044649 )x . and r 2 = 1 − SSE = .983 .182.448 .08259 − (. 387 1 56.044649 )(19.044649 vs H 0 : β1 ≠ 0 is t = = 19.001 . s = .002237 ) = .68 − .621 ± 2.5..621 . so with s = . Σ xi2 = 4334. βˆ 0 = −.7702 . t . so .5141 .12 = 2. 76. β 0 + β1 (20) is .365 )(.621 ± 2.28.262453. A 95% CI for g.810 ± (2.22 = (3.026146.0394. the CI is 3.0902 .002237)(.4028 and 299.7702 = −.0499 ) .1479 )(.179(.002237 p − value < 2(.84).04464854 . SSE = 7.005291 = (.578. and the 6717.810 ± . Σ yi2 = 7. SSE = . .4028 − (7. Σ xi = 1797.06112.762.044649 ± .1) = .” yˆ 4 = −. 7 = 5.002237 . a. A 95% CI for s βˆ = β 1 is .026146. = 1.06112 = . n = 14. The plot suggests a strong linear relationship between x and y.52 + 12 148.408 .8. Σx i2 = 67.41.08259 − (. From Table A.858) Substituting x* = 0 gives the CI 12. d.3333356 ) = .4 .683 .365 )(. c.0005. x = 63.Chapter 12: Simple Linear Regression and Correlation 75. SST = 7.8% of the observed variation is “explained.96 .n −2 ⋅ s + n nΣ xi2 − (Σx i )2 βˆ 0 = 3. and f.179(.5714. 
e.977935 = .179 .1479) = 3. From Example βˆ 0 ± t α / 2. n = 9. 0005) = .1479..044649 ± (2. so βˆ1 = = .931 Σ xi y i = 178. t .40.3.6 equation of the estimated line is yˆ = −.08259353 . b.048 = (.62 ± . and y 4 − yˆ 4 = .28) 2 9 = . so the value of t for testing H 0 : β1 = 0 1 746.6815) = 3.025. There is strong evidence for a useful relationship.4028 − (− 601281) − 7. Σy i = 7. 1 nx 2 . 4721 above. note that 2 sy sx s yy = s xx ( since the factor n-1 appears in both the numerator and denominator. sy sx s xy s xx (x − x ) = y + s yy s xx ⋅ s xy s xx s yy (x − x ) ⋅ r ⋅ ( x − x ) .’s above. 79.Chapter 12: Simple Linear Regression and Correlation 77.d. Hence. and the resulting value of the correlation coefficient would decrease.573 s. ) The value of the sample correlation coefficient using the squared y values would not necessarily be approximately 1. s yy = 2 ∑(y i − y ) . (above. Thus y = βˆ 0 + βˆ1 x = y + βˆ1 ( x − x ) = y + = y+ b.3143) an amount 2. With s xx = ∑ ( x i − x ) . as desired. If the y values are greater than 1. so cancels). 388 . as desired. By . a. Substituting βˆ 0 = . since r < 0) or (since s y = 4. SSE becomes n Σy Σy − βˆ1Σx (Σy )2 + βˆ1ΣxΣy − βˆ Σxy SSE = Σ y 2 − − βˆ1Σxy = Σy 2 − 1 n n n  2 (Σy )2  ΣxΣ y  ˆ  = Σy − = S yy − βˆ1S xy . the relationship between x and y 2 would be less like a straight line.  − β1 Σ xy −  n  n    ( 78. Σ y − βˆ1Σx SSE = Σy 2 − βˆ 0 Σy − βˆ1Σxy . then the squared y values would differ from each other by more than the y values differ from one another. SST = s yy and SSE = s yy − βˆ 1s xy Using the notation of the exercise above. A Scatter Plot suggests the linear model is appropriate.g. . 98. 99. s xx s yy s xx = ∑ ( x i − x ) ). s xy given in the text. n−2 s xx Thus the t statistic for H o : βˆ1 = 0 is (s xy / s xx ) ⋅ s xx βˆ1 t= = (s yy − s xy2 / s xx )/ (n − 2) s / ∑ ( xi − x )2 βˆ1 = = 81. r = With s xy s xy (where e.0 removal% a. 
so 1 − = 1− s xx SST r n−2 1− r 2 s 2xy s yy − s xx s yy = s xy2 s xx s yy = r 2 . Also. s xy2 = 10 15 temp 389 . s xy ⋅ n − 2 (s 2 xx s yy − s xy s= ) = (s xy / s xx s yy ) n−2 1 − s xy2 / s xx s yy SSE = s yy − .0 5 as desired. 82. as desired. and 2 SSE 2 and SSE = Σ yi − βˆ 0 Σ yi − βˆ1Σ xi y i = s yy − βˆ1s xy .Chapter 12: Simple Linear Regression and Correlation 80.5 98. 0294.000 R-Sq(adj) = 78.7224 3.74 P 0.000 0.5 + 0.40 P 0.41.1552 StDev 0.090079 ) f. The slope of the regression line is steeper.17 10.061303.4 e.6%.000 Minitab will output all the residual information if the option is chosen.0889 0. 390 . The value of s is almost doubled. and the value of R2 drops to 61.075691 ± 2.0241 F 115.4% T 1096.Chapter 12: Simple Linear Regression and Correlation b. using t .4986 0. the observed value y = 98.2933 .1 d.025.7% Analysis of Variance Source Regression Residual Error Total DF 1 30 31 SS 2.075691 S = 0.0757 temp Predictor Constant temp Coef 97.007046 R-Sq = 79.042 : .042(. so the residual = .. 30 = 2.7786 0. Roughly . c.7786 0.007046) = (. from which you can find the point prediction value yˆ 10. 5 = 98. A 95% CI for β1 . Minitab Output: The regression equation is removal% = 97.5010 MS 2. R2 = 79. 599 and 5. with a p value of . blood glucose level 7 6 5 4 0 10 20 30 40 50 60 time A linear model is reasonable.000 0.353 MS 11. A test of model utility indicates that time is a significant predictor of blood glucose level. with 95% confidence.833%. A point estimate for blood glucose level when time = 30 minutes is 4. We would expect the average blood glucose level at 30 minutes to be between 4. we can do a t test for the difference of two means based on paired data.Chapter 12: Simple Linear Regression and Correlation 83.638 6. a.0). which suggests that the average bf% reading for the two methods is not the same.17.000 The coefficient of determination of 63.54. Using Minitab. (t = 6.70 + 0. 
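The identities derived in 79–81 (SSE = Syy − β̂1Sxy, and the equality of the slope t statistic with r√(n − 2)/√(1 − r²)) can be verified numerically; this sketch uses a small made-up data set, not data from the text:

```python
from math import sqrt, isclose

# Numerical check of the identities derived above
x = [1, 2, 3, 4, 5, 6]
y = [2.1, 3.9, 6.2, 7.8, 9.9, 12.3]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
Sxx = sum((xi - xbar) ** 2 for xi in x)
Syy = sum((yi - ybar) ** 2 for yi in y)
Sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))

b1 = Sxy / Sxx
b0 = ybar - b1 * xbar
sse_direct = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
print(isclose(sse_direct, Syy - b1 * Sxy, rel_tol=1e-6))   # True: SSE = Syy - b1*Sxy

s = sqrt(sse_direct / (n - 2))
t_slope = b1 / (s / sqrt(Sxx))               # slope t statistic
r = Sxy / sqrt(Sxx * Syy)
t_corr = r * sqrt(n - 2) / sqrt(1 - r ** 2)  # correlation t statistic
print(isclose(t_slope, t_corr, rel_tol=1e-6))              # True: the two agree
```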
391 .067.006137 R-Sq = 63.6965 0.4% indicates that only a moderate percentage of the variation in y can be explained by the change in x. Minitab’s paired t test for equality of means gives t = 3.000 R-Sq(adj) = 61.638 0.12 P 0. The Minitab output follows: The regression equation is blood glucose level = 3. 84.002.5525 StDev 0.7% Analysis of Variance Source Regression Residual Error Total DF 1 22 23 SS 11.2159 0.17 P 0. we create a scatterplot to see if a linear regression model is appropriate. Using the techniques from a previous chapter.716 18.037895 S = 0. p = 0.305 F 38.4% T 17.0379 time Predictor Constant time Coef 3. although it appears that the variance in y gets larger as x increases.12 6. 0821 σˆ 2 = = 1.87 MS 252. n = 8.5 . 20 HW 15 10 2 7 12 17 BOD The least squares linear regression equation.2. Σ yi = 472.0 .1333 + .Chapter 12: Simple Linear Regression and Correlation b. DF 1 18 19 SS 252.82 . and the coefficient of determination shows that about 75% of the variation in HW can be explained by the variation in BOD.875 .1333 .215 0. Σ xi2 = 3625 .000 n = 19 .94 P 0. and SSx1 = 1442. We see that we do have significance.286 − . and Σ xi yi = 9749. σˆ = 1. Using linear regression to predict HW from BOD POD seems reasonable after looking at the scatterplot.833 − .26827 . SSx2 = 1020.98 82.743 BOD Predictor Constant BOD Coef 4.94 7. SSE2 = 3.10 = 2.79 + 0.1003 R-Sq = 75.60 F 54. giving γˆ1 = estimated slope − 503 = = −. so . It is plausible that β 1 = γ 1 .228 .733 + 3.89 335.228 and –1.025.0448 Ho is not rejected.14 . = 392 . 6125 For boiler #1. Σ xi = 125 . Σ yi2 = 37.41 P 0.875 + 10201.140. can be found in the Minitab output below.228 nor ≤ −2.0512 = −1.146 StDev 1.3% T 3. For the second boiler. γˆ 0 = 80.833 . The regression equation is HW = 4.001 0. as well as the test statistic and p value for a model utility test.0821224 .377551 .095 14421.095 .98 4.788 0.7432 S = 2. and t = 10 1. t . Thus 8.000 R-Sq(adj) = 73. βˆ1 = −. SSE1 = 8.14 is neither ≥ 2. 
below.9% Analysis of Variance Source Regression Residual Error Total 85.733 . A straight-line regression function is a reasonable choice for a model.83 for i = 1.94.d.37. 3. giving standard deviations 7. 8. 8. 2. the observation (50. so s. and no observation appears unusual. The deviation from the estimated line is likely to be much smaller for the observation made in the experiment of b for x = 50 than for the experiment of a when x = 25. 8.83.32 for i = 1. Y). Y) is more likely to fall close to the least squares line than is (25. Now x = 20 and ∑ (x − x ) = 1250 . 0 -1 100 150 filtration rate 393 200 . b. 2. c. the variance of ε is reasonably constant. x = 15 and ∑ (x − x) 2 j 1 (x i − 15 )2 ˆ = 250 . 5. of Yi − Yi is 10 1 − − = 5 250 6. 3. The pattern gives no cause for questioning the appropriateness of the simple linear regression model. 3. 8. and the ε are normally distributed. a.87.94. 2. 5. 8.1 1.49. This plot indicates there are no outliers. 2 i 8. 4.37.32. and 2. That is.CHAPTER 13 Section 13. 1 residuals a. and 6. 4. 640975 0.4427 1 + 1 (x i − 140 .886 .8295 . which are close to s.64802 0.64218 0.895) + 20 18.65.16776 -1.648257 0.2307 -1.643881 0.565003 0.648002 0. We need Sxx = 2 ( 2817 .1562 -0.15021 0.85 − 2 20 = 18. This plot looks very much the same as the one in part a.644053 0.914.886 . All of the e / e*i ’s range between .6175 0.9 ) ∑ (xi − x ) = 415.640683 0.39034 0.584858 0.73943 e / e*i 0.631795 ei* ˜ e / s.646461 0.647182 0.578669 0.Chapter 13: Nonlinear and Multiple Regression b.09872 -1.65644 -2. 2 standardized residuals c. then e / e*i ˜ s .34881 -0.621857 e / e*i 0.643706 0.50205 0. Then each ei ei* can be calculated as follows: e * = i .09062 1.019 0.96313 0.8295 .633428 0.30593 0. 1 0 -1 -2 100 150 filtration rate 394 200 .57 and .4791 1.15998 Notice that if standardized residuals 0.79038 1.642113 0.647714 0. The table 2 below shows the values: standardized residuals -0.31064 -0.82185 -0.614697 0. 39. -. 
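The standardized residuals used throughout Section 13.1, e*i = ei / (s·sqrt(1 − 1/n − (xi − x̄)2/Sxx)), take only a few lines to compute. A sketch on illustrative data (not the filtration-rate values from the text):

```python
# Standardized residuals for a simple linear fit on made-up data.
import math

x = [100.0, 120.0, 140.0, 160.0, 180.0, 200.0]
y = [0.8, 1.1, 1.7, 2.0, 2.6, 2.9]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
Sxx = sum((xi - xbar) ** 2 for xi in x)
Sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
b1 = Sxy / Sxx
b0 = ybar - b1 * xbar

resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
s = math.sqrt(sum(e ** 2 for e in resid) / (n - 2))

# e*_i = e_i / (s * sqrt(1 - 1/n - (x_i - xbar)^2 / Sxx))
std_resid = [
    e / (s * math.sqrt(1 - 1 / n - (xi - xbar) ** 2 / Sxx))
    for e, xi in zip(resid, x)
]
```

Because the denominator is always less than s, each |e*i| is at least |ei|/s, which is why the text's e*i values are close to, but slightly larger than, ei/s when the leverages are small.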
The standardized residuals (in order corresponding to increasing x) are -. -. Based on this value.14) is extreme. . so the standardized residuals appear to have a normal distribution. The (x. 1. -3. (7. -1. 1. and (285.Chapter 13: Nonlinear and Multiple Regression 4.93.12.051). residual) pairs for the plot are (0. -.341). (142. The residual plot shows a curve in the data.39.508).719).700). A standardized residual plot shows the same pattern as the residual plot discussed in the previous exercise.645.645. The plot follows. -. . b. (133. 97. b.19. -. it appears that a linear model is reasonable.262). .79. a.7% of the variation in ice thickness can be explained by the linear relationship between it and elapsed time. (190. One observation (5. 2 residual 1 0 -1 -2 -3 0 1 2 3 elapsed time 395 4 5 6 .04. (218. . . -. -. .335).46. -.50. (237. .142). Normal Probability Plot for the Standardized Residuals std resid 1 0 -1 -2 -2 -1 0 1 2 percentile 5. a. and -1. (114. The points follow a linear pattern. The z percentiles for the normal probability plot are –1.68.13. -1. 1.04.75. -.679). . so perhaps a non-linear relationship exists. 1.80. .68.5.13. . (17.50. -1.90.592). The plot shows substantial evidence of curvature. . 4 = 2. 3.025.97 . The test statistic is t = βˆ1 .776 .11.91.841 is close to 2 std deviations away from the expected value of 0. s ˆ β t= s 7. and –1.265 = = = . we reject Ho and conclude that the model . and –9. Similarly.0) = 1008.5983 2 = −1.97 ≥ 2. 4. a linear regression function may be inappropriate. .0) = 1051.841. H a : β 1 ≠ 0 .265 1 + + 6 165.0 − 14.0 ) = 1046 − 1051.49 .624. Since 10. H o : β1 = 0 vs.19268(7. and we will reject Ho if s βˆ 1 t ≥ t.208. No values are exceptionally large.565 .776 or if t ≤ −2. The plot of e* vs x follows : 1 SRES1 c. but the e* of –1. 0 -1 -2 10 15 20 x This plot gives the same information as the previous plot.776 . from which the residual is y − yˆ (7. 396 . .49 . and S xx 12.19268 = 10 .587. a. 
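The z percentiles quoted for the n = 10 normal probability plot (±1.645, ±1.04, ±.68, ±.39, ±.13) are Φ⁻¹((i − .5)/10). A sketch that reproduces them by inverting the erf-based normal CDF with bisection (the bisection helper is my own, not anything from the text):

```python
# Normal-probability-plot percentiles: z_i = Phi^{-1}((i - .5)/n) for n = 10.
import math

def phi(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def phi_inv(p, lo=-8.0, hi=8.0):
    """Invert the standard normal CDF by bisection."""
    for _ in range(80):
        mid = (lo + hi) / 2
        if phi(mid) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

n = 10
z_pct = [phi_inv((i - 0.5) / n) for i in range(1, n + 1)]
```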
1.123.58.49 1 (7.48) 7.Chapter 13: Nonlinear and Multiple Regression 6. b.565 is useful.074 . 7.38.73. the other residuals are -.869 1 6. yˆ (7.14 + 6. The standardized residuals are calculated as e1* = − 5. and similarly the others are -. The plot of the residuals vs x follows: RE SI1 10 0 -10 10 15 20 x Because a curved pattern appears.49 = −5. 05 8 62 73. a. The y' ˆ s . b.25 6 86 86.68 4 119 100.3 9.0 1.0 19.7 -.7 . e’s.Chapter 13: Nonlinear and Multiple Regression 7. and e*’s are given below: x y yˆ e e* 0 110 126.55 2 123 113.4 -1.4 -11.06 397 . which suggests that a simple linear model will not provide a good fit.6 -1.6 -16.7 -. 120 dry weight 110 100 90 80 70 60 0 1 2 3 4 5 6 7 8 exposure time There is an obvious curved pattern in the scatter plot. Chapter 13: Nonlinear and Multiple Regression First.52 heart rate response.8% Analysis of Variance Source Regression Residual Error Total DF 1 14 15 SS 2946. since its x value is a distance from the others. 70 oxygen uptake 8.795 0.6580 StDev 9.000 A quick look at the t and p values shows that the model is useful. and get the line: oxygen uptake = .51.4 + 1. We could run the regression again. which is quite linear.69 P 0.2 3470.000 0.000 R-Sq(adj) = 83. we will look at a scatter plot of the data. 398 . without this value.66 x Predictor Constant x S = 6.44.5 524. 72) has large influence.4 F 78. so it seems reasonable to use linear regression.87 P 0.119 Coef -51.24 8.7 MS 2946.1869 R-Sq = 84. and r2 shows a strong relationship between the two variables. The observation (72.8 + 1.9% T -5. 60 50 40 30 20 40 50 60 70 heart rate response The linear regression output (Minitab) follows: The regression equation is y = .5 37.355 1. Both a scatter plot and residual plot ( based on the simple linear regression model) for the first data set suggest that a simple linear regression model is reasonable. 399 .. However. 
so this point is extremely influential (and its x value does lie far from the remaining ones).Chapter 13: Nonlinear and Multiple Regression 9. with no pattern or influential data points which would indicate that the model should be modified. For data set #3. For data set #4 it is clear that the slope of the least squares line has been determined entirely by the outlier. Scatter Plot for Data Set #1 Scatter Plot for Data Set #2 11 9 10 8 9 7 y y 8 7 6 6 5 5 4 4 3 4 9 14 4 9 x Scatter Plot for Data Set #4 Scatter Plot for Data Set #3 13 13 12 12 11 11 10 y 10 y 14 x 9 9 8 8 7 7 6 6 5 5 4 9 10 14 15 20 x x For data set #2.+ -. and a residual plot would reflect this pattern and suggest a careful look at the chosen model... the relationship is perfectly linear except one outlier. The signs of the residuals here (corresponding to increasing x) are + + + + . scatter plots for the other three data sets reveal difficulties. which has obviously greatly influenced the fit even though its x value is not unusually large or small.. a quadratic function would clearly provide a much better fit. This sum differs too much from 0 to be explained by rounding. which sum to -.55. 400 . the residuals cannot be independent. then al least one other eI would have to be negative to preserve Σ ei = 0 . If one eI is large positive.05. c.25.73. The five ei* ' s from Exercise 7 above are –1. but the first term in brackets is the numerator of βˆ . ( i b. and –1. a. so the difference becomes (numerator of βˆ 1 ) – (numerator of βˆ 1 ) = 0. so Σe = Σ( y − y ) − βˆ Σ(x − x ) = 0 + βˆ ⋅ 0 = 0 . (Σxi )(Σyi )  ˆ  2 (Σxi )2   ˆ ( ) Σx i ei = Σ xi y i − Σxi y − β1Σ xi xi − x = Σ xi y i −  − β 1  Σxi − n  n     . . 1. d. There is clearly a linear relationship between the residuals.06.68. This suggests a negative correlation between residuals (for fixed values of any n – 2. In general it is not true that Σ e*i = 0 . -.Chapter 13: Nonlinear and Multiple Regression 10. 
Since i 1 i 1 Σ ei = 0 always. while the second term is the 1 denominator of βˆ 1 . ) ei = y i − βˆ 0 − βˆ 1 xi = yi − y − βˆ1 ( xi − x ) . the other two obey a negative linear relationship). Thus Var Yi − Yˆi = ΣVar (c jY j ) (since the Yj ’s are independent) = σ 2 Σc 2j which. Each x i ei is positive (since x i and e i have the same sign) so Σ xi ei > 0 . Yi − Yˆi = Yi − Y − βˆ 1 ( xi − x ) = Yi − ∑ Y j − n j (x i − x ) Σj (x j − x )Y j Σ(x j − x ) 2 j =∑ c j Y j . which is not = 0. 401 . ( xi − x ) grows larger. c. ) 12. so  (x i − x )2  . b. which contradicts the result of exercise 10c. ( ) ( ) σ 2 = Var (Yi ) = Var (Yˆi + Yi − Yˆi ) = Var (Yˆi ) + Var Yi − Yˆi . 1 a. x i moves further from x .Chapter 13: Nonlinear and Multiple Regression 11.2).2). Σ ei = 34 . which is exactly 2 2 21 ˆ ˆ Var Yi − Yi = σ − Var (Yi ) = σ − σ +  n nΣ (x j − x )2    ( ) (13. after some algebra. but Var Yi − Yˆi decreases (since 2 As ( ( xi − x )2 has a negative sign). so these cannot be the residuals for the given x values. gives equation (13. a. j 1 (x i − x )2 1 ( xi − x )(x j − x ) where c j = 1 − − for j = i and c = 1 − − for j 2 n nΣ(x j − x )2 n Σ(x j − x ) ( ) j ≠ i . b. so these cannot be the residuals. so Var (Yˆi ) increases (since ( xi − x )2 has a positive sign in Var (Yˆi ) ). f. The distribution of any particular standardized residual is also a t distribution with n – 2 d.730 . With the ith standardized residual as a random variable.1 . 2. = 149.0 ) + 3(149. y 4.02 .01. Σ y = 288.50. so [ ] SSPE = 288. With 2 2 2 2 Σ xi = 4480 .50)) = ) ( ) P Ei* ≥ 2.50 + P Ei* ≤ −2. 402 . Σ yi = 1923 . With c – 2 = 2 and n 2880 4361 – c = 10. y 2.0 . 23 = 2..01 = . This formal test procedure does not suggest that a linear model is inappropriate.5 .013 (as above). since ei* is obtained by taking standard normal variable (Y − Yˆ ) and substituting the (σ ) i i Yi −Yˆ estimate of σ in the denominator (exactly as in the predicted value case).0 . n 3 = n 4 = 4 . 
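The variance decomposition in Exercise 12, σ2 = Var(Ŷi) + Var(Yi − Ŷi), rests on the leverage hi = 1/n + (xi − x̄)2/Sxx, with Var(Ŷi) = σ2·hi and Var(Yi − Ŷi) = σ2·(1 − hi). A quick check (arbitrary x values, nothing from the text) that the leverages sum to 2, the number of estimated coefficients:

```python
# Leverages for simple linear regression: h_i = 1/n + (x_i - xbar)^2 / Sxx.
x = [2.0, 5.0, 7.0, 8.0, 11.0, 15.0]
n = len(x)
xbar = sum(x) / n
Sxx = sum((xi - xbar) ** 2 for xi in x)
h = [1 / n + (xi - xbar) ** 2 / Sxx for xi in x]
total_leverage = sum(h)   # should equal 2 exactly
```

As the exercise notes, hi grows as xi moves away from x̄, so Var(Ŷi) increases and Var(Yi − Ŷi) decreases at the extreme x values.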
ΣΣy ij2 = 288.5) + 4(107. when n = 25 d. = 202. space a. 2 i 2 i Σ xi y i = 544.30 . Σ x = 1.013 .50 = .30 is not ≥ 4. = 110.013 − 3(202. n1 = n 2 = 3 (3 observations at 110 and 3 at 230). 14.10 = 4.05 .10 .0) + 4(110. 2 10 1440 so the computed value of F is = 3.Chapter 13: Nonlinear and Multiple Regression 13.0) = 4361 . y1.1 and reject Ho .f. F.500 . The scatter plot clearly reveals a curved pattern which suggests that a nonlinear model would be more reasonable and provide a better fit than a linear model. Since 3. = 107. and ( Ei* denoting Ei* has a t distribution with 23 t . b. SSE = 7241 so SSLF = 7241-4361=2880.10 . we do not 436. MSLF = = 1440 and SSPE = = 436.0 .733. so P( Ei* outside (-2.01 + .50 . 2 . y 3. 1188. The computer generated 95% P. The probabilistic model is then y = αx β ⋅ ε .I.06. The linear pattern in b above would indicate that a transformed regression using the natural log of both x and y would be appropriate.996. 6.8712).6384 − 1. A linear model would not be appropriate. Scatter Plot of ln(Y) vs ln(X) 3 ln(y) 2 1 0 2 3 4 ln(x) c. Scatter Plot of Y vs X 15 y 10 5 0 0 10 20 30 40 50 60 x The points have a definite curved pattern.04920 ln( x ) .996 is (1. (The power function with an error term!) d.Chapter 13: Nonlinear and Multiple Regression Section 13. a. A regression of ln(y) on ln(x) yields the equation ln( y) = 4.I.1. for ln(y) when ln(x) = 2.50) . 403 . Using Minitab we can get a P.8712 = (3. b.2 15. e1. We must now take the antilog to return to the original units of Y: (e 1. In this plot we have a strong linear pattern. 1188 ) . for y when x = 20 by first transforming the x value: ln(20) = 2. 5917 ) .5 1.6917 . and 1 + 1 (. 5 2 No rmal Sco re 4 5 6 7 8 Observati on Nu mbe r Hi stogram of Residual s Residuals vs.5 2.163 .0 -0. A computer generated residual analysis: Residual Model Diagnostics I Chart of Re siduals Normal Plot of Residuals 3 2 3.0SL= -2 . (all from computer printout.927. 
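The lack-of-fit test in Exercise 13 splits SSE = 7241 into pure error (SSPE = 4361, from the replicated x values) and lack of fit. Redoing that arithmetic with c = 4 distinct x values and n = 14 observations:

```python
# Lack-of-fit F statistic from Exercise 13's summary numbers.
sse, sspe = 7241.0, 4361.0
sslf = sse - sspe              # lack-of-fit sum of squares
c, n = 4, 14                   # 4 distinct x values, 14 observations in all
mslf = sslf / (c - 2)          # 1440
mspe = sspe / (n - c)          # 436.1
f_lof = mslf / mspe            # compare with F_{.05,2,10} = 4.10
```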
where y ′i = ln (L178 ) ).6917 (again from computer output).Chapter 13: Nonlinear and Multiple Regression e.5 1.438051363 ×1011 . The patterns in the residual plot and the normal probability plot (upper left) are marginally acceptable.10 . We first predict y ′ using the linear model and then exponentiate: y ′ = 20.I.95 − x ) + 2 = 1. so yˆ = Lˆ178 = e 25. fits (bottom right). We first compute a prediction interval for the transformed data and then exponentiate.5496)(1. There are only two positive standardized residuals. a. 5 0.0976 . for y is then (e 25. 404 .6667(.6667 and αˆ = e β 0 = 968. Thus 0 1 ˆ βˆ = βˆ1 = 6.1. is a bit large.6917 + 6. but two others are essentially 0. c.025.28. 0SL= 2.5 .4334 = (25.228 .956 -3 -2 -1. s = . Σ yi′ 2 = 288. b.0251 ± (2.11 .082 . Σ xi2 = 8. Σ yi′ = 313.4585 ) .5 0.820 2 Resid ual Re sidua l 1 0 -1 1 0 X= -0.0251 ± 1.6667 and βˆ = 20.72 .4585 . one standardized residual.0 0. from which βˆ = 6.5 .6917 = 1.082 ) = 27. corresponding to the third observation.013 .5946. 06771 -1 -2 -3 .0 1. With t . prediction interval for The P.0 1 1. Fits 3 2 2 1 Residua l Freq uen cy 3 1 0 -1 0 -2 -1.228)(.1. 0 0.0 0 1 Resid ual 2 3 Fit Looking at the residual vs. Σ xi y ′i = 255.5917. Σ xi = 9. e 28.0 -0. the 12 Σx − (Σ x )2 / 12 2 y′ is 27.75) = 25.10 = 2. 16. 228 . Minitab output follows: transform The regression equation is y = 0.002633 0.002257.00011943 0. A 95 % prediction interval for y 5000 is 405 (.106 .78 P 0.11 ≤ −3.468 and αˆ = e = .106 .572 . d.51718 . s = . in addition.7% Analysis of Variance Source Regression Residual Error Total DF 1 17 18 SS 0. or that β = 1. from which βˆ1 = 1. b.109 . so only 49.001 R-Sq(adj) = 46. based on a t test.07 is not ≤ −t .30 and . c. A point estimate for y when x =5000 is y = .7. The claim that A scatter plot may point us in the direction of a power function. Σ yi′2 = 16.254 −.Chapter 13: Nonlinear and Multiple Regression 17. . so 1. The plots give strong support to this choice of model. 
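The log-log linearization behind the power model y = αx^β·ε is that regressing ln(y) on ln(x) recovers ln(α) as intercept and β as slope. A sketch with exact power-law data (α = 3.5 and β = 1.4 are arbitrary demonstration values, so the fit must recover them):

```python
# Fitting a power model via the log-log transformation.
import math

alpha, beta = 3.5, 1.4
x = [2.0, 5.0, 10.0, 20.0, 40.0]
y = [alpha * xi ** beta for xi in x]     # exact power-law data, no noise

u = [math.log(xi) for xi in x]           # ln(x)
v = [math.log(yi) for yi in y]           # ln(y)
n = len(u)
ubar, vbar = sum(u) / n, sum(v) / n
b1 = sum((ui - ubar) * (vi - vbar) for ui, vi in zip(u, v)) / \
     sum((ui - ubar) ** 2 for ui in u)
b0 = vbar - b1 * ubar

beta_hat = b1                # estimate of beta
alpha_hat = math.exp(b0)     # estimate of alpha = exp(intercept)
```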
so y = α + β ln( x) .501 .0775 RR − t .019709 -0. 1 − 1.07 .626 .00128 ln( x ) .009906. a.10 P 0. with a p value of .5 is equivalent to α ⋅ 5 β = 2α (2.00024046 MS 0. Ho cannot be .00000712 F 16.0197 − .0197 .0197 − . Σ yi′ = 13.0003126 R-Sq = 49.017555).796 .001.1024.00128 x Predictor Constant x Coef 0. SSE = .001 The model is useful. This transformation yields a linear regression equation y = . Σ x′i y ′i = 18.960 for the transformed data.00012103 0.254 and βˆ 0 = −. Since –1. We x ′ = ln( x) . we need x ′ = ln( 5000) = 8. H a : β1 ≠ 1 .000 0. 18. r2 = .468 so βˆ = βˆ1 = 1.01 since − 4 .0775 rejected in favor of Ha.005.33 = −1.0775.352 .002668 StDev 0.05. Σ x′i = 15.5) β . µ Y ⋅5 = 2 µ Y ⋅ 2.0. Ho is rejected at level . so we try y = αx β . Σ xi′2 = 20 .00011943 0.11536 (computer printout). and the estimated sd of t= βˆ 1 is .33 Thus we wish test H o : β 1 = 1 vs.25 − 1. To estimate y 5000 .7% of the variation in y can be explained by its relationship with ln(x).00128 x ′ or y = .0012805 S = 0.49 -4.30 ≤ −3. With t = = −4.7% T 7.11 = −1. But r2 = 49. Y ′ = β 0 + β 1 ( x′ ) + ε where x ′ = 1 and y ′ = ln( lifetime) .082273 . 21.785 for the transformed data.01. and the estimated 1485 regression function is y = 18.00445) = 6. I felt that the power model Y = αx β ⋅ ε ( y ′ = ln( y ) vs.00037813 . .2045 (values read x ′ = . SSE = 1. Σ yi = 1865. Comparing this to F. No. it is clear that Ho cannot 1. Σ x′i 2 = .36594 / 15 be rejected. there was no indication of any influential observations.17 .1391 − .44695 .01 . there is definite curvature in the plot. Σ xi′ = 4058 . = 8.00445 so yˆ ′ = −10 . y ′2. e ′i * < 1. from which βˆ1 = 3735. temp x′ gives a plot which has a pronounced linear appearance (and in fact r2 = .50 .7748 and thus yˆ = e yˆ ′ = 875. = 5. a.954 for the straight line fit).02993 / 1 f = = .Chapter 13: Nonlinear and Multiple Regression 19. x.57295 .88 . The summary x 2 2 quantities are Σ x′i = 159. x ′ = ln( x )) provided the bet fit.1.33 . 500 406 . 
Plotting y′ vs.68 . The suggested model is Y b. After examining a scatter plot and a residual plot for each of the five suggested models as well as for y vs. Σ yi′ 2 = 879. from which βˆ1 = −. y1′. SSLF = . = 6.36594.2 .65 for every i.4485(.1391 . 20.1485 and βˆ 0 = 18. c. Σ xi′ = .02993. Σ x′i y i = 2281. Σ yi′ = 123. Σ yi = 121.50 . x = β 0 + β 1 ( x′ ) + ε where x ′ = a. d. b.4485 and βˆ 0 = −10. from computer output).32891 . The transformation seemed to remove most of the curvature from the scatter plot.8 . x = 500 ⇒ yˆ = 18. the residual plot appeared quite random. y ′3.1391 − 1485 = 15. from which SSPE = 1. 10 4 .83157 . and r2 = .64 .6 . Σ xi′ y ′i = . and n1 = n 2 = n3 = 6.15 = 8. For the transformed data.2045 + 3735.39857. With x = 220. y ′ = α + βx . Since the p-value is greater than any sensible choice of alpha we do not reject Ho . y 1  1  1 − 1 = e α + βx . The probabilistic model is Y ′ = α α +βx + βx + ε ′ . d. There is insufficient evidence to claim that age has a significant impact on the presence of kyphosis. H 0 : β1 = 0 vs H a : β 1 ≠ 0 .73. y ′ = α + βx . 407 . This function cannot be linearized. α + βx 1+ e ⋅ε ln ( y ) = eα + βx = ln (ln ( y )) = α + βx . Y = e e ⋅ ε where ε′ ε =e . Thus with y ′ = ln  − 1 . or equivalently 1 ε′ Y= where ε = e . y ′ = α + βx . If β > 0 . It is important to realize 2 that a scatter plot of data generated from this model will not spread out uniformly about the exponential regression function throughout the range of x values. so ln  −1 = α + βx .Chapter 13: Nonlinear and Multiple Regression 22. 23. this is an increasing function of x so we expect more spread in y for large x than for small x. with a corresponding p-value of . [ ] Var (Y ) = Var (αe β x ⋅ ε ) = αe βx ⋅ Var (ε ) = α 2 e 2 βx ⋅τ 2 where we have set Var (ε ) = τ 2 . 1 1 = α + βx . The corresponding probabilistic model y y 1 is = α + βx + ε . or equivalently. while the situation is reversed if β < 0 . 
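The x = 220 prediction here back-transforms the reciprocal fit ŷ′ = βˆ0 + βˆ1(1/x), with y′ = ln(y). Reading the printed coefficients as βˆ0 ≈ −10.2045 and βˆ1 ≈ 3735.45 (best-effort readings, since the printout is hard to make out), the arithmetic is:

```python
# Back-transforming the reciprocal-model prediction at x = 220.
import math

b0_hat, b1_hat = -10.2045, 3735.45   # coefficients as read from the output
x = 220.0
y_prime = b0_hat + b1_hat / x        # predicted ln(y)
y_hat = math.exp(y_prime)            # undo the log to predict y itself
```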
y y  y  The corresponding probabilistic model is Y ′ = α + βx + ε ′ . the spread will only be uniform on the transformed scale. The value of the test statistic is z = . a. Thus with y ′ = ln (ln ( y )). 24. c. Similar results hold for the multiplicative power model. so with y ′ = . b.463. 06) = 491. so we reject Ho . t .g.71.025.561. 408 . with a corresponding pvalue of .70 and its corresponding p-value of .3 = 5. which can be rewritten as 491.05. H 0 : β 2 = 0 vs H a : β 2 ≠ 0 .38 . We desire R2 .3 26.1 0. a ˆ one unit increase in x).81.841(12. b. We want a 99% confidence interval..182 491. The test statistic is t = -3.36) for the odds ratio given in the printout.3 0. a.48). when the amount of experience increases by one year (i. which is < . The z value of 2.I. d. The test statistic is MSR f = = 22.9 0.2 0.06 . t .8 0. .6 0.10 ± 38.194.032. the quadratic term appears to be useful in this model.10 ± 5. s yˆ ⋅14 = e.7 p(x) 0. A graph of πˆ appears below.Chapter 13: Nonlinear and Multiple Regression 25.10 ± 70. we estimate that the odds ratio increase by about 1. 529.10.007 imply that the null hypothesis H 0 : β1 = 0 can be rejected at any of the usual significance levels (e.194 .05. Since the p-value < .65.51 .841 . This is consistent with the confidence interval ( 1.005.0 0 10 20 30 experience Section 13.8% c. Therefore. so the 99% C.025. That is . It could be consistent with a quadratic regression.38 = 12. so 38. 3 = 3. and the corresponding p-value is .182 .17772 ≈ 1.4 0. since this interval does not contain the value 1. so the estimate of the odds ratio is e β1 = e .016.45 = (420.05. .17772 . .55) . which means that experience does appear to affect the likelihood of successfully performing the task. which we find in the output: R2 = 93.5 0. The point estimate of β 1 is βˆ1 = . Now.e. There is a slight curve to this scatter plot. 0. MSE we reject Ho and conclude that the model is useful.01). H 0 : β1 = β 2 = 0 vs H a : at least one β i ≠ 0 . is 3. 
but the output gives us a 95% confidence interval of (452. 1.05. there is clear evidence that β 1 is not zero. . 49 .7679(6) = 52.91 1.Chapter 13: Nonlinear and Multiple Regression 27. 586.20 . Otherwise.12.482 − 15. c.77 = . 2 SST = Σ y i2 − (Σy i ) 2 n = 586.66 -.04 -. 75 70 y 65 60 55 50 1 2 3 4 5 6 7 8 x b.88.89 1. d. the standardized residual plot does not exhibit any troublesome features.53 -.16 .88 = . residual = y 6 − yˆ 6 = 53 − 52.90 .875(6) + 1.16 . 2).895 . yˆ = 84. A scatter plot of the data indicated a quadratic regression model might be appropriate.53 (continued) 409 . For the Normal Probability Plot: Residual Zth percentile -1. but they are both within the interval (-2. a.88 The first two residuals are the largest. so R 2 = 1 − 61.49 .88 .25 -.95 -1.89 -.58 . 28.841 or if t ≤ −5. 2 yˆ = βˆ 0 + βˆ1 (60) + βˆ 2 (60) = 24.780) = 217. R2 =1− e.43 − (− 113.22 ) . 5 = 2.88 ± (2.82 = .61 . n− 3 = t . s = 8.69 ) = 52.35 and 12. SSE = 61. f.90) = 52.002) − (− .85. 2 2 1 residual std resid 1 0 -1 0 -1 -2 -2 1 2 3 4 5 6 7 8 -1 x e. is 52.90 .88 (from b) and t .0937 + 3.779 987.005. we reject .01780(75) = 39.77 so s2 = a.91) .93 .419.88 .0937)(210.025.I.82 s2 = = = 72.52 n−3 3 217. SSE = Σyi2 − βˆ 0 Σy i − βˆ1Σx i yi − βˆ 2 Σx i2 y i = 8386.62. and since − 7.35 + (1.88 ± (2. Ho will be rejected in favor of Ha if either computed value of t is t= t ≥ t. 61. so the C.571)(3.3 = 5.41 b.88 ± 10. 0 1 z %ile µˆ Y ⋅ 6 = 52. c. SSE 217.0178)(1.571 .54. 2 2 µˆ Y ⋅75 = βˆ 0 + βˆ1 (75) + βˆ 2 (75) = −113.00226 Ho .841.88 ± 4.571)(1.70) − (3.69) = 3. 410 . The − .03 = (42.34 = (48.01780 = −7.57. The P. is 5 52.35 d.77 2 = 12.82 .88 ≤ −5.025.Chapter 13: Nonlinear and Multiple Regression The normal probability plot does not exhibit any troublesome features.3684)(17.I.841.36684(75) − . 303 or if − 1.I.01) = 114.69 . f. To obtain joint confidence of at least 95%.84 ± (6. + (3.35.34 4.99) ( an extremely wide interval). From computer output: Thus b.89 2.2 = 6. 
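The odds-ratio estimate quoted from the logistic regression output is e^βˆ1; with the printout's slope estimate βˆ1 = .17772, checking that the exponential comes out near 1.19:

```python
# Estimated odds ratio from the logistic slope estimate.
import math

beta1_hat = 0.17772
odds_ratio = math.exp(beta1_hat)   # estimated multiplicative change in odds
                                   # per one-unit increase in experience
```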
is 8.18.08.06 58.71 . 2630 H 0 : β 2 = 0 will be rejected in favor of H a : β 2 ≠ 0 if either t ≥ t. is 114.37 = 51. and for = (− 5. Ho cannot be rejected.19 .71 94. 2 = 2.05.961.06 ± (6.920)(5..303 . For β1 the C. so R 2 = 1 − 103.71 ± 14. we compute a 98% C.480 strongly for the inclusion of the quadratic term.31)2 = 103.480) t.84 t ≤ −4.965)(.129.66 114. the value of x which maximizes βˆ 0 + βˆ1 x + βˆ 2 x 2 would be obtained by setting the derivative of this to 0 and solving: β1 + 2β 2 x = 0 ⇒ x = − β1 βˆ . 2 = 4.29 -8.01.69 y − yˆ : -1. a. yˆ : 111.I.63 = (100.71 ± (2.Chapter 13: Nonlinear and Multiple Regression 29.06 3.01) = (− 19. so the C. The estimate of this is x = − 1 = 2.83 . is − 1.19.50) .I. c.37 = .I. βˆ1 .. for each coefficient using t. β 2 the C. With t = = −3. If we knew βˆ 0 .920 and βˆ 0 + 4βˆ 1 + 16βˆ 2 = 114. the data does not argue . s 2 = SST = Σy − 2 i (Σy i ) 2 n = 2630 .025.34) .87. 2β2 2βˆ 2 411 . e.89)2 + .965)(4. d. s = 7. 2 103. βˆ 2 .965 .89 120.1.31 SSE = (− 1.37 . 3. 1402. s 2 = MSE = 2. which suggests that the model fits the data quite well.201)(41.3% of the variation in wheat yield is accounted for by the model.814 20.Chapter 13: Nonlinear and Multiple Regression 30.4x .72x2.82.327 1.718 When x = 2.201) (136.15 ± (2.15 1.040. This means 85. Using Minitab.0 ) a.01. H 0 : µ y ⋅2.−43. yˆ = 1402.5 < 1500 .786 -.97) = (− 227. and R2 = 94.853. a. using Minitab.173 1.317 1. The quadratic model thus explains 94.587 .83 53.5 = 1500.402.1725. H a : µ y⋅2. R2 = 0. for example. RR : t ≤ −t. b. The data does not indicate µ y⋅2.327 29. 20) has had a great influence on the fit – it is the point which forced the fitted quadratic to have a maximum between 3 and 4 rather than. y − yˆ : -.7%. − 135. From Minitab output.5) 2 = (1081.5)2 + (53.7% of the variation in the observed y’s .44 ± (2. continuing to curve slowly upward to a maximum someplace to the right of x = 6. d.1. the predicted and residual values are: yˆ : 23. 
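Exercises 27-28 fit the quadratic ŷ = βˆ0 + βˆ1x + βˆ2x2 by least squares. A self-contained sketch (my own helper, not from the text) that solves the 3×3 normal equations directly; the data are generated from an exact quadratic, so the fit must recover its coefficients:

```python
# Quadratic least squares via the normal equations (X'X)b = X'y.
def fit_quadratic(x, y):
    """Return (b0, b1, b2) minimizing sum (y - b0 - b1*x - b2*x^2)^2."""
    n = len(x)

    def s(p):
        return sum(xi ** p for xi in x)

    def sy(p):
        return sum((xi ** p) * yi for xi, yi in zip(x, y))

    # Normal equations for the design matrix with columns [1, x, x^2].
    A = [[n, s(1), s(2)], [s(1), s(2), s(3)], [s(2), s(3), s(4)]]
    b = [sy(0), sy(1), sy(2)]
    # Gaussian elimination with partial pivoting.
    for i in range(3):
        p = max(range(i, 3), key=lambda r: abs(A[r][i]))
        A[i], A[p] = A[p], A[i]
        b[i], b[p] = b[p], b[i]
        for r in range(i + 1, 3):
            f = A[r][i] / A[i][i]
            for c in range(i, 3):
                A[r][c] -= f * A[i][c]
            b[r] -= f * b[i]
    # Back substitution.
    coef = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):
        coef[i] = (b[i] - sum(A[i][c] * coef[c] for c in range(i + 1, 3))) / A[i][i]
    return tuple(coef)

# Exact quadratic data: y = 2 + 0.5x - 0.1x^2, so the fit should recover it.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2 + 0.5 * xi - 0.1 * xi ** 2 for xi in xs]
b0, b1, b2 = fit_quadratic(xs, ys)
```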
b.06) c.5 is less than 1500. c.5.587 31. Again.814 31.6 + 11.11 = −2.5 Fail to reject Ho .15 − 1500 t= = −1.327 23.186 Residuals Versus theFitted Values (response i sy) 2 34 32 1 28 y Residual 30 0 26 24 -1 22 20 -2 20 22 24 26 28 30 32 1 2 3 4 5 6 x Fitted Value The residual plot is consistent with a quadratic model (no pattern which would suggest modification). the regression equation is y = 13.317 . 412 . but it is clear from the scatter plot that the point (6.814 31.914 31. 3968. .28.3968( x − x ) .132)(1. substituting into a yields yˆ = .3968( x − 4.196. From a.059 standardized residuals are then computed as -. (Note: Minitab regression output can produce these values.31 .81 ± (2.233 as the estimated std dev’s of the residuals. and –1.4590 the inclusion of the cubic term is not justified. 4 = 2.) The resulting residual plot is virtually identical to the plot of b. Var (Y f ) + Var (Yˆ f ) is estimated by 2.3968 = −. e. .638 = 1. to x2 both from 2 3 2 c. 1.6430 .624 .624 ) = (28. which is not significant ( H 0 : β 3 = 0 cannot be rejected).040 + (.49.132 .638 .28. a.3456) and from − 2 .955)2 = 1.229 ≠ −.05. is 31.3964(x − x ) − 2. b. (and similarly) 1.35. so 2. Expanding these and adding yields 33.10. There sill be a contribution 2. (and similarly for all points) 10.426 yield the correct standardized residuals.327 = −. -1.76.6430 as the coefficient of x2 .777 )2 = 2. none of which are unusually large. With yˆ = 31.27 ) . y − yˆ − . x = 4. 32. 1.1544 .97 . 1. This gives 2. t= − 2.I.196.3456) .327 = = −.2933(x − x ) + 2. so s y f + yˆ f = 2.35. 1. the coefficient of x3 is -2. The − . and . 1. 413 .81 and t . the desired P.31 .040 − (.3968 . ( ) ( ) σ 2 = Var (Yˆi ) + Var Yi − Yˆi suggests that we can estimate Var Yi − Yˆi by s − s and then take the square root to obtain the estimated standard deviation of each 2 2 yˆ residual. so βˆ = 33.59.3463 − 1.Chapter 13: Nonlinear and Multiple Regression d.3964( x − 4.196. 1. so 2 3 βˆ 3 = −2. 
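The maximizing x in Exercise 29(e) comes from setting the derivative of βˆ0 + βˆ1x + βˆ2x2 to zero, giving x = −βˆ1/(2βˆ2). Applying it to the wheat-yield quadratic (reading the garbled fitted equation as ŷ = 13.6 + 11.4x − 1.72x2) places the maximum between 3 and 4, consistent with the remark about the influential point (6, 20):

```python
# Location of the fitted quadratic's maximum: x = -b1/(2*b2).
b1_hat, b2_hat = 11.4, -1.72   # slope-term coefficients as read from the fit
x_max = -b1_hat / (2 * b2_hat)
```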
so standardizing using just s would not s 1.1949 .059.16.236.5 ⇒ x ′ = x − x = . d. Chapter 13: Nonlinear and Multiple Regression 33. a. x − 20 . For x = 20, x ′ = 0 , and 10.8012 yˆ = βˆ 0∗ = .9671 . For x = 25, x ′ = .4629 , so x = 20 and s x = 10.8012 so x ′ = yˆ = .9671 − .0502(.4629 ) − .0176(.4629) + .0062(.4629) = .9407 . 2 3  x − 20   x − 20   x − 20  yˆ = .9671 − .0502  − .0176  + .0062   10.8012   10.8012   10.8012  .00000492 x 3 − .000446058x 2 + .007290688 x + .96034944 . 2 b. c. 3 .0062 = 2.00 . We reject Ho if either t ≥ t .025, n− 4 = t . 025, 3 = 3.182 or if .0031 t ≤ −3.182 . Since 2.00 is neither ≥ 3.182 nor ≤ −3.182 , we cannot reject Ho ; the t= cubic term should be deleted. d. SSE = Σ ( y i − yˆ i ) and the yˆ i ' s are the same from the standardized as from the unstandardized model, so SSE, SST, and R2 will be identical for the two models. e. Σ yi2 = 6.355538 , Σ yi = 6.664 , so SST = .011410. For the quadratic model R2 = .987 and for the cubic mo del, R2 = .994; The two R2 values are very close, suggesting intuitively that the cubic term is relatively unimportant. 34. a. b. c. d. x = 49.9231 and s x = 41.3652 so for x = 50, x ′ = x − 49.9231 = .001859 and 41.3652 2 ˆ ( ) ( µY ⋅50 = .8733 − .3255 .001859 + .0448 .001859 ) = .873 . SST = 1.456923 and SSE = .117521, so R2 = .919.  x − 49 .9231   x − 49 .9231  .8733 − .3255   + .0448    41.3652   41 .3652  1.200887 − .01048314 x + .00002618 x 2 . 2 βˆ ∗ 1 βˆ 2 = 22 so the estimated sd of βˆ 2 is the estimated sd of βˆ 2∗ multiplied by : sx sx 1   sβˆ = (.0319 )  = .00077118 . 2  41 .3652  e. t= .0448 = 1.40 which is not significant (compared to ± t .025, 9 at level .05), so the .0319 quadratic term should not be retained. 414 Chapter 13: Nonlinear and Multiple Regression 35. Y ′ = ln( Y ) = ln α + βx + γ x 2 + ln (ε ) = β 0 + β1 x + β 2 x 2 + ε ′ where ε ′ = ln (ε ) , β 0 = ln (α ) , β1 = β , and β 2 = γ . That is, we should fit a quadratic to ( x, ln ( y )) . 
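Exercise 33 evaluates the standardized cubic ŷ = .9671 − .0502x′ − .0176x′2 + .0062x′3 at x′ = (x − 20)/10.8012. A direct check of the two quoted values, ŷ = .9671 at x = 20 (where x′ = 0) and ŷ ≈ .9407 at x = 25:

```python
# Evaluating the standardized cubic of Exercise 33.
xbar, sx = 20.0, 10.8012
coefs = [0.9671, -0.0502, -0.0176, 0.0062]   # b0*, b1*, b2*, b3*

def yhat(x):
    xp = (x - xbar) / sx                     # standardized predictor x'
    return sum(c * xp ** k for k, c in enumerate(coefs))

y20 = yhat(20.0)   # x' = 0, so this is just b0*
y25 = yhat(25.0)   # x' = .4629, giving about .9407
```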
The 2.00397 + .1799 x − .0022 x 2 , so = 7.6883 . (The ln(y)’s are 3.6136, 4.2499, resulting estimated quadratic (from computer output) is βˆ = .1799, γˆ = −.0022 , and αˆ = e 2.0397 4.6977, 5.1773, and 5.4189, and the summary quantities can then be computed as before.) Section 13.4 36. a. Holding age, time, and heart rate constant, maximum oxygen uptake will increase by .01 L/min for each 1 kg increase in weight. Similarly, holding weight, age, and heart rate constant, the maximum oxygen uptake decreases by .13 L/min with every 1 minute increase in the time necessary to walk 1 mile. b. yˆ 76, 20,12,140 = 5.0 + .01(76) − .05(20) − .13(12) − .01(140) = 1.8 L/min. c. yˆ = 1.8 from b, and σ = .4 , so, assuming y follows a normal distribution, 2.6 − 1.8   1.00 − 1.8 P(1.00 < Y < 2.60) = P <Z <  = P(− 2.0 < Z < 2.0) = .9544 .4 .4   37. a. The mean value of y when x1 = 50 and x2 = 3 is b. When the number of deliveries (x2 ) is held fixed, then average change in travel time associated with a one-mile (i.e. one unit) increase in distance traveled (x1 ) is .060 hours. Similarly, when distance traveled (x1 ) is held fixed, then the average change in travel time associated with on extra delivery (i.e., a one unit increase in x2 ) is .900 hours. c. Under the assumption that y follows a normal distribution, the mean and standard deviation of this distribution are 4.9 (because x1 = 50 and x2 = 3) and σ = .5 ( since the standard deviation is assumed to be constant regardless of the values of x1 and x2 ). µ y ⋅50,3 = −.800 + .060(50 ) + .900(3) = 4.9 hours. Therefore 6 − 4 .9   P ( y ≤ 6) = P  z ≤  = P( z ≤ 2.20 ) = .9861 . That is, in the long .5   run, about 98.6% of all days will result in a travel time of at most 6 hours. 415 Chapter 13: Nonlinear and Multiple Regression 38. = 125 + 7.75(40 ) + .0950(1100 ) − .009(40)(1100) = 143.50 a. mean life b. First, the mean life when x1 = 30 is equal to 125 + 7.75(30) + .0950 x2 − .009(30)x 2 = 357.50 − .175x 2 . 
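Exercises 36(c) and 37(c) both reduce to standard normal probabilities, P(−2.0 < Z < 2.0) = .9544 and P(Z ≤ 2.20) = .9861. Both can be checked with an erf-based normal CDF:

```python
# Normal-probability checks for Exercises 36(c) and 37(c).
import math

def phi(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# Exercise 36(c): Y ~ N(1.8, .4); P(1.00 < Y < 2.60)
p36 = phi((2.60 - 1.8) / 0.4) - phi((1.00 - 1.8) / 0.4)

# Exercise 37(c): Y ~ N(4.9, .5); P(Y <= 6)
p37 = phi((6 - 4.9) / 0.5)
```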
So when the load increases by 1, the mean life decreases by .175. Second, the mean life when x1 =40 is ( ) ( ) equal to 125 + 7.75 40 + .0950 x 2 − .009 40 x 2 load increases by 1, the mean life decreases by .265. = 435 − .265 x 2 . So when the 39. a. For x1 = 2, x2 = 8 (remember the units of x2 are in 1000,s) and x3 = 1 (since the outlet has a drive-up window) the average sales are yˆ = 10.00 − 1.2 2 + 6.8 8 + 15.3 1 = 77.3 (i.e., $77,300 ). ( ) () () b. For x1 = 3, x2 = 5, and x3 = 0 the average sales are c. When the number of competing outlets (x1 ) and the number of people within a 1-mile radius (x2 ) remain fixed, the sales will increase by $15,300 when an outlet has a drive-up window. 40. yˆ = 10.00 − 1.2(3) + 6.8(5) + 15.3(0 ) = 40.4 (i.e., $40,400 ). a. µˆ Y ⋅10,. 5, 50,100 = 1.52 + .02(10 ) − 1.40(.5) + .02(50 ) − .0006(100 ) = 1.96 b. µˆ Y ⋅ 20,. 5, 50, 30 = 1.52 + .02(20) − 1.40(.5) + .02(50) − .0006(30) = 1.40 c. βˆ 4 = −.0006; 100 βˆ 4 = −.06 . d. There are no interaction predictors – e.g., x 5 = x1 x 4 -- in the model. There would be dependence if interaction predictors involving x4 had been included. e. R2 = 1− 20.0 = .490 . For testing H 0 : β1 = β 2 = β 3 = β 4 = 0 vs. Ha: at least 39.2 R one among rejected if 2 β 1 ,..., β 4 is not zero, the test statistic is F = (1− R 2 ) f ≥ F.05, 4, 25 = 2.76 . f = .490 .510 4 k (n− k −1) . Ho will be = 6.0 . Because 6.0 ≥ 2.76 , Ho is 25 rejected and the model is judged useful (this even though the value of R2 is not all that impressive). 416 Chapter 13: Nonlinear and Multiple Regression 41. H 0 : β 1 = β 2 = ... = β 6 = 0 vs. Ha: at least one among β 1 ,..., β 6 is not zero. The test R statistic is f = . 83 k (n− k −1) . Ho will be rejected if f ≥ F.05, 6, 30 = 2.42 . = 24.41 . Because 24.41 ≥ 2.42 , Ho is rejected and the model is judged 6 (1−.83) 2 F = (1− R 2 ) 30 useful. 42. a. H 0 : β1 = β 2 = 0 vs. H a : at least one β i ≠ 0 , the test statistic is MSR f = = 319.31 (from output). 
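The model utility statistic in Exercises 40(e) and 41 is computed from R2 alone, f = (R2/k) / ((1 − R2)/(n − k − 1)). Redoing both calculations:

```python
# R^2 form of the model utility F statistic.
def f_stat(r2, k, df_err):
    """f = (R^2/k) / ((1 - R^2)/(n - k - 1)), with df_err = n - k - 1."""
    return (r2 / k) / ((1 - r2) / df_err)

f40 = f_stat(0.490, 4, 25)   # Exercise 40(e); compare with F_{.05,4,25} = 2.76
f41 = f_stat(0.83, 6, 30)    # Exercise 41;    compare with F_{.05,6,30} = 2.42
```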
The associated p-value is 0, so at any reasonable level of significance, H0 should be rejected. There does appear to be a useful linear relationship between temperature difference and at least one of the two predictors.
b. The degrees of freedom for SSE = n − (k + 1) = 9 − (2 + 1) = 6 (which you could simply read in the DF column of the printout), and t.025,6 = 2.447, so the desired confidence interval is 3.000 ± (2.447)(.4321) = 3.000 ± 1.0573, or about (1.943, 4.057). Holding furnace temperature fixed, we estimate that the average change in temperature difference on the die surface will be somewhere between 1.943 and 4.057.
c. When x1 = 1300 and x2 = 7, the estimated average temperature difference is ŷ = −199.56 + .2100x1 + 3.000x2 = −199.56 + .2100(1300) + 3.000(7) = 94.44. The desired confidence interval is then 94.44 ± (2.447)(.353) = 94.44 ± .864, or (93.58, 95.30).
d. From the printout, s = 1.058, so the prediction interval is 94.44 ± (2.447)√((1.058)² + (.353)²) = 94.44 ± 2.729 = (91.71, 97.17).

43.
a. x1 = 2.6, x2 = 250, and x1x2 = (2.6)(250) = 650, so ŷ = 185.49 − 45.97(2.6) − 0.3015(250) + 0.0888(650) = 48.313.
b. No, it is not legitimate to interpret β1 in this way. It is not possible to increase the cobalt content x1 by 1 unit while holding the interaction predictor x3 fixed; when x1 changes, so does x3, since x3 = x1x2.
c. Yes, there appears to be a useful linear relationship between y and the predictors. We determine this by observing that the p-value corresponding to the model utility test is < .0001 (F test statistic = 18.924).
d. We wish to test H0: β3 = 0 vs. Ha: β3 ≠ 0. The test statistic is t = 3.496, with a corresponding p-value of .0030. Since the p-value is < α = .01, we reject H0 and conclude that the interaction predictor does provide useful information about y.
e. A 95% C.I.
for the mean value of surface area under the stated circumstances requires the following quantities: ŷ = 185.49 − 45.97(2) − 0.3015(500) + 0.0888(2)(500) = 31.598. Next, t.025,16 = 2.120, so the 95% confidence interval is 31.598 ± (2.120)(4.69) = 31.598 ± 9.9428 = (21.6552, 41.5408).

44.
a. Holding starch damage constant, for every 1% increase in flour protein, the absorption rate will increase by 1.44%. Similarly, holding flour protein percentage constant, the absorption rate will increase by .336% for every 1-unit increase in starch damage.
b. R² = .96447, so 96.447% of the observed variation in absorption can be explained by the model relationship.
c. To answer the question, we test H0: β1 = β2 = 0 vs. Ha: at least one βi ≠ 0. The test statistic is f = 339.31092, and has a corresponding p-value of zero, so at any significance level we will reject H0. There is a useful relationship between absorption and at least one of the two predictor variables.
d. We would be testing H0: β2 = 0 vs. Ha: β2 ≠ 0. We could calculate the test statistic t = β̂2/s(β̂2), or we could look at the 95% C.I. given in the output. Since the interval (.29828, .37298) does not contain the value 0, we can reject H0 and conclude that 'starch damage' should not be removed from the model.
e. The 95% C.I. is 42.253 ± (2.060)(.350) = 42.253 ± 0.721 = (41.532, 42.974). The 95% P.I. is 42.253 ± (2.060)√((1.09412)² + (.350)²) = 42.253 ± 2.366 = (39.887, 44.619).
f. We test H0: β3 = 0 vs. Ha: β3 ≠ 0, with t = −.04304/.01773 = −2.428. The p-value is approximately 2(.012) = .024. At significance level .01 we do not reject H0. The interaction term should not be retained.

45.
a. The appropriate hypotheses are H0: β1 = β2 = β3 = β4 = 0 vs. Ha: at least one βi ≠ 0. The test statistic is f = (R²/k)/((1 − R²)/(n − k − 1)).
f = (.946/4)/((1 − .946)/20) = 87.6 ≥ 7.10 = F.001,4,20 (the smallest available significance level from Table A.9), so we can reject H0 at any significance level. We conclude that at least one of the four predictor variables appears to provide useful information about tenacity.
b. The adjusted R² value is 1 − [(n − 1)/(n − (k + 1))](SSE/SST) = 1 − [(n − 1)/(n − (k + 1))](1 − R²) = 1 − (24/20)(1 − .946) = .935, which does not differ much from R² = .946.
c. The estimated average tenacity when x1 = 16.5, x2 = 50, x3 = 3, and x4 = 5 is ŷ = 6.121 − .082x1 + .113x2 + .256x3 − .219x4 = 6.121 − .082(16.5) + .113(50) + .256(3) − .219(5) = 10.091. For a 99% C.I., t.005,20 = 2.845, so the interval is 10.091 ± 2.845(.350) = (9.095, 11.087). Therefore, when the four predictors are as specified in this problem, the true average tenacity is estimated to be between 9.095 and 11.087.

46.
a. Yes, there does appear to be a useful linear relationship between repair time and the two model predictors. We determine this by conducting a model utility test: H0: β1 = β2 = 0 vs. Ha: at least one βi ≠ 0. We reject H0 if f ≥ F.05,2,9 = 4.26. The calculated statistic is f = (SSR/k)/(SSE/(n − k − 1)) = MSR/MSE = (10.63/2)/(2.09/9) = 5.315/.232 = 22.91. Since 22.91 ≥ 4.26, we reject H0 and conclude that at least one of the two predictor variables is useful.
b. We will reject H0: β2 = 0 in favor of Ha: β2 ≠ 0 if |t| ≥ t.005,9 = 3.25. The test statistic is t = 1.250/.312 = 4.01, which is ≥ 3.25, so we reject H0 and conclude that the "type of repair" variable does provide useful information about repair time, given that the "elapsed time since the last service" variable remains in the model.
c. A 95% confidence interval for β2 is 1.250 ± (2.262)(.312) = (.5443, 1.9557).
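The adjusted R² and the 99% interval in Exercise 45 can be reproduced directly; a sketch (n = 25, since the error df is 20 with k = 4, and t.005,20 = 2.845 from the table):

```python
n, k, r2 = 25, 4, 0.946

# 45(b): adjusted R^2 = 1 - [(n - 1)/(n - (k + 1))](1 - R^2)
adj_r2 = 1 - (n - 1) / (n - (k + 1)) * (1 - r2)                  # ~ .935

# 45(c): fitted tenacity at x1 = 16.5, x2 = 50, x3 = 3, x4 = 5
y_hat = 6.121 - .082 * 16.5 + .113 * 50 + .256 * 3 - .219 * 5    # ~ 10.091

t_crit, se = 2.845, 0.350
ci_99 = (y_hat - t_crit * se, y_hat + t_crit * se)               # ~ (9.095, 11.087)
print(round(adj_r2, 3), round(y_hat, 3), ci_99)
```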
We estimate, with a high degree of confidence, that when an electrical repair is required the repair time will be between .54 and 1.96 hours longer than when a mechanical repair is required, while the "elapsed time" predictor remains fixed.
d. ŷ = .950 + .400(6) + 1.250(1) = 4.6, s² = MSE = .2322, and t.005,9 = 3.25, so the 99% P.I. is 4.6 ± (3.25)√(.2322 + (.192)²) = 4.6 ± 1.69 = (2.91, 6.29). The prediction interval is quite wide, suggesting a variable estimate for repair time under these conditions.

47.
a. For a 1% increase in the percentage plastics, we would expect a 28.9 kcal/kg increase in energy content. Also, for a 1% increase in the moisture, we would expect a 37.4 kcal/kg decrease in energy content.
b. The appropriate hypotheses are H0: β1 = β2 = β3 = β4 = 0 vs. Ha: at least one βi ≠ 0. The value of the F test statistic is 167.71, with a corresponding p-value that is extremely small. So we reject H0 and conclude that at least one of the four predictors is useful in predicting energy content, using a linear model.
c. H0: β3 = 0 vs. Ha: β3 ≠ 0. The value of the t test statistic is t = 2.24, with a corresponding p-value of .034, which is less than the significance level of .05. So we can reject H0 and conclude that percentage garbage provides useful information about energy content, given that the other three predictors remain in the model.
d. ŷ = 2244.9 + 28.925(20) + 7.644(25) + 4.297(40) − 37.354(45) = 1505.5, and t.025,25 = 2.060. (Note an error in the text: s(ŷ) = 12.47, not 7.46.) So a 95% C.I. for the true average energy content under these circumstances is 1505.5 ± (2.060)(12.47) = 1505.5 ± 25.69 = (1479.8, 1531.1). Because the interval is reasonably narrow, we would conclude that the mean energy content has been precisely estimated.
e. A 95% prediction interval for the energy content of a waste sample having the specified characteristics is 1505.5 ± (2.060)√((31.48)² + (12.47)²) = 1505.5 ± 69.75 = (1435.7, 1575.2).

48.
a. H0: β1 = β2 = … = β9 = 0 vs. Ha: at least one βi ≠ 0. RR: f ≥ F.01,9,5 = 10.16. f = (R²/k)/((1 − R²)/(n − (k + 1))) = (.938/9)/((1 − .938)/5) = 8.41. Fail to reject H0; the model does not appear to specify a useful relationship.
b. μ̂y = 21.967, t(α/2, n − (k + 1)) = t.025,5 = 2.571, so the C.I. is 21.967 ± (2.571)(1.248) = (18.76, 25.18).
c. s² = SSE/(n − (k + 1)) = 23.379/5 = 4.6758, and the P.I. is 21.967 ± (2.571)√(4.6758 + (1.248)²) = (15.55, 28.39).
d. SSEk = 23.379, SSEl = 203.82. H0: β4 = β5 = … = β9 = 0 vs. Ha: at least one of the above βi ≠ 0. RR: f ≥ F(α, k − l, n − (k + 1)) = F.05,6,5 = 4.95. f = ((203.82 − 23.379)/(9 − 3))/(23.379/5) = 6.43. Reject H0; at least one of the second-order predictors appears useful.

49.
a. μ̂(y·18.9,43) = 96.8303; residual = 91 − 96.8303 = −5.8303.
b. H0: β1 = β2 = 0 vs. Ha: at least one βi ≠ 0. RR: f ≥ F.01,2,9 = 8.02. f = (R²/k)/((1 − R²)/(n − k − 1)) = (.768/2)/((1 − .768)/9) = 14.90. Reject H0; the model appears useful.
c. 96.8303 ± (2.262)(8.20) = (78.28, 115.38).
d. 96.8303 ± (2.262)√((24.45)² + (8.20)²) = (38.50, 155.16).
e. We find the center of the given 95% interval, 93.785, and half of the width, 57.845. This latter value is equal to t.025,9(s(ŷ)) = 2.262(s(ŷ)), so s(ŷ) = 25.5725. Then the 90% interval is 93.785 ± (1.833)(25.5725) = (46.911, 140.659).
f. With the p-value for Ha: β1 ≠ 0 being 0.208 (from the given output), we would fail to reject H0. This factor is not significant given that x2 is in the model.
g. With Rk² = .768 (full model) and Rl² = .721 (reduced model), we can use an alternative f statistic (compare formulas 13.19 and 13.20): f = ((Rk² − Rl²)/(k − l))/((1 − Rk²)/(n − (k + 1))). With n = 12, k = 2, and l = 1, f = (.768 − .721)/((1 − .768)/9) = .047/.0257 = 1.83. Also t² = (−1.36)² = 1.85; the discrepancy can be attributed to rounding error.

50.
a.
Here k = 5, n – (k+1) = 6, so Ho will be rejected in favor of Ha at level .05 if either t ≥ t .025, 6 = 2.447 or t ≤ −2.447 . The computed value of t is t = .557 = .59 , so .94 Ho cannot be rejected and inclusion of x1 x2 as a carrier in the model is not justified. b. c. No, in the presence of the other four carriers, any particular carrier is relatively unimportant, but this is not equivalent to the statement that all carriers are unimportant. ( ) SSEk = SST 1 − R 2 = 3224.65, so f = not (5384.18 −3224.65 ) (3224.65 ) 3 = 1.34 , and since 1.34 is 6 ≥ F. 05, 3,6 = 4.76 , Ho cannot be rejected; the data does not argue for the inclusion of any second order terms. 51. a. No, there is no pattern in the plots which would indicate that a transformation or the inclusion of other terms in the model would produce a substantially better fit. b. k = 5, n – (k+1) = 8, so H 0 : β 1 = ... = β 5 = 0 will be rejected if f ≥ F.05,5, 8 = 3.69; f = (.759 ) (. 241) 5 = 5.04 ≥ 3.69 , so we reject Ho . At least one of the 8 coefficients is not equal to zero. 422 6279. yˆ = 84.I. 53. 8 = 2. the model is useful.88 ) 3 = 3.306 . The complete 2nd order model obviously provides a better fit.050 = 85.44 is not 8 ≥ 4. and since 3..000.67 + .1x1 + .5. With SSEk = 8. The single y corresponding to these x i values is 85.88.01785) = (. and x ′4 = 15 x 4 + 160 . x ′3 = x 3 + 2.71 and the standardized residual is e* = .31.390 .046.98 with a p-value of 0. a.. d. 88 ) (390.046. so H 0 : β 3 = β 4 = β 5 = 0 will be rejected if f ≥ 4. x1 = x 2 = x 3 = x 4 = +1 . + . 15 423 . Is x1 a significant predictor of y while holding x2 constant? A test of H 0 : β1 = 0 vs the two-tailed alternative gives us a t = 6.0 and x2 = 33.304. Thus.4 − 85.07 . x ′ + 160 x 2 = 10 x ′2 − 3. sd of residual = e/e* = 6. so y − yˆ = 85.b. and solution for testing x2 as a predictor yields a similar conclusion: With a p-value of 0.650 − . 
and x 4 = 4 yields the uncoded function.1 the residual is e = 2.: this is not a modification which would be suggested by a residual plot.1x2 + .010 . For b. Ho cannot be rejected and the quadratic terms should all be deleted. we would accept this predictor as significant if our significance level were anything larger than 0.258 + . F.07 .44 . A 95% CI for y when x1 =x2 =30 and x3 =10 is .025.3.4. A similar question.915 .29 and t ..39. so the estimated sd is 3.3..05. with a p-value of 0. (n.Chapter 13: Nonlinear and Multiple Regression c. and f = (894... 54. s 2 = 390.120(.16) = 10. 52.390 = . Thus the estimated 2 2 Yˆ is (6. the desired C.67.95 −390. so this predictor is significant.000. x1′ = . Substitution of x1 = 10 x1′ − 3. Since yˆ = 24. Letting x1′ .3. When x1 = 8.16.29 ± 2. x ′4 denote the uncoded variables. x 3 = x ′3 − 2.44.306(3. a. since e* = e/(sd of the residual). 8 = 4.07 . x ′2 = ...5. so there is a need to account for interaction between the three predictors. b.91) . is variance of 24.99) − (6.66573 ± 2.7036 ) Some possible questions might be: Is this model useful in predicting deposition of poly-aromatic hydrocarbons? A test of model utility gives us an F = 84.304 ) = (16. 755 = (1.05. from which yˆ = 2. giving f = (4 . β1 = β .9450 .28 . β 2 = γ and ε ′ = ln( ε ) . so Ho is not rejected and prior belief is not contradicted by the data.5652 .27 . . 0. so to fit this model it is necessary to take the natural log of each Q value (and not transform a or b) before using multiple regression analysis. x 2 ) = (ln (Q). β 0 = ln( α ). Ho cannot be rejected.. ln( b) ) (take the natural log of the values of each variable) and do a multiple linear regression.49 . x 2 = ln( b ). βˆ1 = . b. For a = 10 and b = . 0 = 85. Thus we transform to ( y. x1 . 0.217 ) . Section 13. . 0.9845) 10 = 2.. Since 2.01. Again taking the natural log.5 55. t = = 7. x1 = ln(10) = 2 .28 is not ≥ 2.10.706 .0 if 85.9845) (1 . c. Thus H 0 : β 5 = .8146−1 . 
For the full model k = 14 and for the reduced model l – 4. and βˆ 2 = . while n – (k + 1) = 16. 0.01) = -4. ln( a). We simply exponentiate each endpoint: 424 (e . a.78) . so all 16 higher order terms should be deleted.Chapter 13: Nonlinear and Multiple Regression c. 0 < 85.49 . H 0 : µ Y ⋅0. ( ) SSE = 1 − R 2 SST so SSEk = 1.3026 and x2 = ln(.6052.9845 and SSEl = 4.16 = 2.5.5548 .706 .5548 − 85 t ≤ −t . = β14 = 0 will be rejected if f ≥ F. ln( Q) = Y = ln( α ) + β ln( a) + γ ln( b) + ln( ε ) = β 0 + β 1 x1 + β 2 x 2 + ε ′ where x1 = ln( a).0 will be rejected in favor of H a : µ Y ⋅0.1815 . With µˆ = βˆ 0 = 85. e 1.8146 .9053 2.9053 and Qˆ = e = 18. d. 26 = −1. Y = ln( Q ) = ln( α ) + βa + γ b + ln( ε ) .19 .24.05.0772 which is certainly not ≤ −1. A computer analysis gave βˆ 0 = 1. so 5. is .532 ± . 425 . so f = (..004334 . = 15 SSEk = (. and SSEl = (. From a.000845 .532 and s 2 + s βˆ0 + βˆ 3 x′3 + βˆ5 x ′5 = .01.447 yˆ = . so βˆ 3 for the  5. 002261) (.0196610) = . t .489.05.0236 ± 2.110(.0046) = (− .I. For the full model. 5.32 is not ≥ 3.69 .2196 ..0236 )(− ..02058) = .0333. adjusted model. g. is f. 769 ) (.17 = 2.025.447 .575) . if f ≥ F.195 = −.231) 5 = 9.3.110 )(.14 = 3..004542 .0236 unstandardized model = = −.540 x − 89.4447   3.110 (error df = n – (k+1) = 20 – (2+1) = 17 for the two carrier model).  x − 52. R2 R2 = (19)(.Chapter 13: Nonlinear and Multiple Regression 56..6660 yˆ = .5255 − .004542) 3 = 2.043 = (. x 3 − 52. Wood specific gravity appears to be 14 linearly related to at lest one of the five carriers. x ′3 = e. n = 20. k = 5.0097 5  .346 )(.769 ) − 4 = .687 .532 ± (2.4665) + (.34 and 2.195  y = .006803 .−. a. − .707 . the adjusted c.14 = 4..0097 )(. while for the reduced 14 (19)(.5386 . so H 0 : β 1 = .0139) .0046 unstandardized βˆ 3 is = = −.. b. = β 5 = 0 will be rejected in favor of H a : at least one among β1 . so Ho is rejected.6660  − .I. β 5 ≠ 0 . so the P.69 . The estimated sd of the 5.540   x − 89. Since 14 F.5255 − (.34 . 
With f = (.0236 3  + .02058 . 5 . n − (k + 1) = 14 . so the desired C.231)(.32 .32 ≥ 4. we conclude that β1 = β 2 = β 4 = 0 .4665 and x ′5 = 5 = .4447 3.2196 ) = . d.0196610) = .769 ) − 5 = . including the 5 variable model. associated with variable x2 (summerwood fiber %).2 4 . 58. The t ratio for x4 .9824 4 Where s 2 = 5.12. Both t-ratios have magnitudes that exceed 2. was the smallest in absolute magnitude and 1. -1. R2 is as large as any of the models. one might consider the model with k = 3 which excludes the summerwood fiber and springwood % variables. k R2 Adj. 60. At step #1 (in which the model with all 4 predictors was fit).83 was the t ratio smallest in absolute magnitude.12 < 2.2 2 . When the model with predictors x1 and x2 only was fit.76). The model with 4 variables including all but the summerwood fiber variable would seem bests. No. the smallest t-ratio is t = .676 . The corresponding predictor x3 was then dropped from the model. As a second choice. Backwards Stepping: Step 1: A model with all 5 variables is fit. whose magnitude is smaller than 2.00. so the predictor x4 was deleted. Therefore.976 3. Since t = . and a model with predictors x1 . and x4 was fit. so both variables are kept and the backwards stepping procedure stops at this step. t = . The final model identified by the backwards stepping method is the one containing x3 and x5 . Step 2: A model with all variables except x2 was fit. so no further deletion is necessary.53 < 2.647 138. x4 is the next variable to be eliminated. 59. both t ratios considerably exceeded 2 in absolute value. the variable x2 was eliminated. Clearly the model with k = 2 is recommended on all counts.53. b. Variable x4 (springwood light absorption) has the smallest t-ratio (t = -1. (continued) 426 . R2 adjusted is at its maximum and CP is at its minimum .979 . Forward selection would let x4 enter first and would not delete it at the next stage. x2 .975 2.9825 a.Chapter 13: Nonlinear and Multiple Regression 57. 
Step 3: A model with variables x3 and x5 is fit. The choice of a “best” model seems reasonably clear–cut.7 3 .9819 . R2 Ck = SSE k + 2(k + 1) − n s2 1 . so no more variables are added. but removing #67 should be considered as well.516298. Both the forwards and backwards stepping methods arrived at the same final model. Note.8. so the results of these tests are not shown. There are cases when the different stepwise methods will arrive at slightly different collections of predictor variables. so βˆ − βˆ (2 ) =  1   . 41. 15.6.920 ) . x5 is the next variable to enter the mo del. n 10 (Note: Formulas involving matrix algebra appear in the first edition.1156   − . Therefore.0974 . {x3 . {x3 . Step 2: All 4 two-variable models that include x3 were fit.180  and similar 1 − .030    427 . This often happens. If multicollinearity were present. a. since h 44 > . None of the t-ratios for the added variables have absolute values that exceed 2.6. x3 was the first variable to enter the model. 63. Because this t-ratio exceeds 2.127         . For data point #2. x5 } were all fit. at least one of the four R2 values would be very close to 1. Looking at the h ii column and using 2 (k + 1) 8 = = . observations 14.712933. correspond to response (y) values 22. and 48. With h ii values of .  . three observations n 19 appear to have large influence. which is not the case. 62. 16. x ′(2 ) = (1 3. data point #4 would appear to have large influence. Step 3: (not printed): All possible tree-variable models involving x3 and x5 and another predictor. the t-ratio 2.12 (for variable x5 ) was largest in absolute value. the model with x3 had the t-ratio with the largest magnitude (t = -4.766 −1  ( X ′X )  3. That is.040  .453  = −1.3032   − . but not always.333       − .453 − 4. {x3 .920   . we conclude that multicollinearity is not a problem in this data. We would need to investigate further the impact these two observations have on the equation. 
Removing observation #7 is reasonable. the models {x3 . x2 }. x5 }. before regressing again. Of all 4 models.82). and . Because the absolute value of this t-ratio exceeds 2.421 as the criteria. There is no need to print anything in this case.6. x4 }.1644  =  − .8. 2(k + 1) 6 = = .) b. 64.302  − 4.Chapter 13: Nonlinear and Multiple Regression Forward Stepping: Step 1: After fitting all 5 one-variable models.106    calculations yield βˆ − βˆ ( 4) =  − . in this problem.513214. x1 }. {x3 . 61. . the actual influence does not appear to be great. a. none of the changes is all that substantial (the largest is 1. 428 557) . generated by Minitab: Two sample T for ppv prism qu cracked not cracke N 12 18 Mean 827 483 95% CI for mu (cracked StDev 295 234 SE Mean 85 55 ) .Chapter 13: Nonlinear and Multiple Regression c. Comparing the changes in the βˆ i ' s to the s βˆi ' s . Thus although h 44 is large. Boxplots of ppv by prism quality (means are indicated by solid circles) ppv 1200 700 200 cracked not cracked prism qualilty A two-sample t confidence interval. Supplementary Exercises 65.2sd’s for the change in βˆ 1 when point #2 is deleted). indicating a potential high influence of point #4 on the fit.mu (not cracke): ( 132. .00000295 R-Sq = 57.000 0. 98% of the 2 observed variation in steady-state permeation flux can be explained by its relationship to inverse foil thickness.004892 StDev 0.00 -0. Also.000 Residual -0. c.Chapter 13: Nonlinear and Multiple Regression b.00002393 Fit 0. 6 = 2. a 95% Prediction interval for Y(45) is 11. To test model usefulness.18 -6.577.11.253) 2 = 11. but we have an extreme observation. For every 1 cm-1 increase in inverse foil thickness (x).722 µA / cm 2 . a.5 can be found in the Observation 3 row of the Minitab output: yˆ = 5. The simple linear regression results in a significant model.000018 ppv Predictor Coef Constant 1.585 429 µA / cm 2 . r2 is . so we reject Ho and conclude that the model is useful. 
with associated p-value of .447 . The regression equation is ratio = 1.00204 0.057 and 12.26 StDev Fit 0. which is less than any significance level. and for a model with the indicator and an interaction term.447 . H a : β 1 ≠ 0 .203 + (.00091571 0.00067016 0. we are confident that when inverse foil thickness is 45 cm-1 . d. we estimate that we would expect steady-state permeation flux to increase by . but not included here was a model with an indicator for cracked/ not cracked.000 R-Sq(adj) = 56. Minitab output is below. A point estimate of flux when inverse foil thickness is 23.025.980704 F 38.264 = (10. Neither improved the fit significantly. b.018704 St Resid -4.2% Analysis of Variance Source Regression Residual Error Total DF 1 28 29 SS 0.962000 MS 0.7% T 491.000. That is.19 P 0.057. with std resid = -4. we test the hypotheses H 0 : β1 = 0 vs.00158587 Unusual Observations Obs ppv ratio 29 1144 0.00161 ppv -0. The test statistic is t = 17034. With t .11R R denotes an observation with a large standardized residual 66.00001827 S = 0. a predicted value of steadystate flux will be between 10. Also run.12.585) .321 ± 1.26042 µA / cm .001786 P 0.00091571 0.321 ± 2. and the residual plots against both x and y (only vs x shown) show no detectable pattern. while keeping the other predictor variables fixed.6% of the observed variations in SST 102.1033 =1− = .0996(11) − .67 ) = −. so Ho 15 is rejected. 430 ..0096(170 ) − . so we judge the model adequate..25 .15 = 8.Chapter 13: Nonlinear and Multiple Regression e. It appears that the model specifies a useful relationship between VO 2 max and at least one of the other predictors..0880(140) = 3.15 − 3. For a one-minute increase in the 1-mile walk time. yˆ = 3. b. With f = (.05. We would expect male to have an increase of . we would expect the VO 2 max to decrease by .706 ) = 9. c. 4. 
H 0 : β1 = β 2 = β 3 = β 4 = 0 will be rejected in favor of H a : at least one among β1 .3922 VO2 max can be attributed to the model relationship.52 . The residual is yˆ = (3.6566(1) + . d.67 . 706) 4 (1 −..25 . 67. R2 = 1− SSE 30. or 70.0996. e. if f ≥ F. β 4 ≠ 0 . a.6566 in VO 2 max over females. while keeping the other predictor variables fixed.005 ≥ 8.5959 + . Standard Residuals vs x 2 2 1 1 stdresid std res Normal Plot of Standard Residuals 0 -1 0 -1 -1 0 1 20 30 %iles 40 50 x The normal plot gives no indication to question the normality assumption.706. 4)(21. Therefore βˆ = S x′y ′ = 11. 7984 16. the scatter plot of the two transformed variables appears quite linear.5 seconds.1737(300 ) .4) 2 = 13. a.69  42. the regression model for the variables y′ and x ′ is log 10 ( y ) = y ′ = α + βx′ + ε ′ . Scatter Plot of Log(edges) vs Log(time) 3 Log(time 2 1 0 1 2 3 4 Log(edge Yes. 431 . 76011 = .502 . b. y′ and x ′ .1737 .7984 and γ 0 = 10α = 10 −.79839 S x ′x′ = 126.e. the predicted value of y is yˆ = .1737 x .4  and βˆ 0 = − (.640 − 16 (42. Letting y denote the variable ‘time’. or about 16. c. and thus suggests a linear relationship between the two. For x = 300.1615 and S x ′y′ = 68.76011 . The estimated power function model Using the transformed variables is then y = .. the model is y = γ 0 x ⋅ ε where γ 0 = α and γ 1 = β . i. The estimate of γ 1 is 16  16  γˆ1 = .34 − 1 16 S x′x ′ 13.98 21.1615 = .79839)  = −. Exponentiating (taking the antilogs of ) both sides gives ( )( ) y = 10α + β log( x )+ε ′ = 10 α x β 10 ε ′ = γ 0 x γ 1 ⋅ ε .69 ) = 11.7984 . This model is often called a “power γ1 function” regression model. the necessary sums of squares are (42.98 .Chapter 13: Nonlinear and Multiple Regression 68. 432 R2 = 1 − 3.25. 350 Temperat 250 150 50 0 100 200 300 400 Pressure b. R2 = 1− 5. including the interaction term increases the amount of variation in lift/drag ratio that can be explained by the model to 64. a. or 39. 
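For Exercise 67, R² and the model utility statistic follow directly from SSE and SST; a sketch (k = 4 predictors and error df 15, so n = 20, with the sums of squares quoted in that solution):

```python
sse, sst = 30.1033, 102.3922
k, err_df = 4, 15

r2 = 1 - sse / sst                    # ~ .706: about 70.6% of the variation explained
f = (r2 / k) / ((1 - r2) / err_df)    # ~ 9.0; compare with the tabled F critical value
print(round(r2, 3), round(f, 3))
```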
For the model excluding the interaction term. Minitab will also generate a 95% prediction interval of (256.Chapter 13: Nonlinear and Multiple Regression 69. or 8.1%. Using a quadratic model. we are confident that when pressure is 200 psi.7191x − .0024753x 2 .423 + 1. but obvious curvature.4% of the 8. and a point estimate of temperature when pressure is 200 is yˆ = 280.07 = .23 .55 observed variation in lift/drag ratio can be explained by the model without the interaction accounted for.394 .55 . 304.25 and 304. a Minitab generated regression equation is yˆ = 35. a quadratic model would probably be more appropriate. However. a.22 ο F. Because of the slight. Based on a scatter plot (below).18 = .641 . a single value of temperature will be between 256. That is. 70.22). a simple linear regression model would not be appropriate. 355 pallcont . H 0 : β1 = β 2 = 0 vs. c. 2.05.305 + 0..3.004 0.07055 0. This model is not useful. With rejection region f ≥ F. we fail to reject Ho and conclude that all the quadratic and interaction terms should not be included in the model.85 0.000.002.6 = 5.07 .993 currdens + 0.05.29 2. f ≥ F.16134 0.293 433 T -3. this model is also useful at any reasonable significance level. Since 1. H a : at least one of the listed β i ' s ≠ 0 . Testing H 0 : β1 = .161 temp + 0.10 −290.031 0.03 and a p-value of .96 0. the test statistic is f = (SSEl − SSEk ) (SSEk ) k −l n − k −1 the rejection region would be = ( 716. They do not add enough information to make this model significantly better than the simple first order model. and p-value .Chapter 13: Nonlinear and Multiple Regression b. H a : either β 1 or β 2 ≠ 0 . there is not enough of a significant relationship between lift/drag ratio and the two predictor variables to make the model useful (a bit of a surprise!) 71.29 3. we are testing H 0 : β 1 = β 2 = β 3 = 0 vs.29 . we 3 (1−. which shows all predictors as significant at level .27 ( 20− 5 ) (32 − 20 −1 ) = 1. 
we are testing R The test statistic is f = (1− R 2 ) the calculated statistic is f = 2 k ..40484 69.41 and calculated statistic f = . and 6 region. Looking at the individual predictors.0.05.. Partial output from Minitab follows. H a : at least one of the β i ' s ≠ 0 . With calculated statistic f = 6.49 -3. = β 20 = 0 vs.138 StDev 93.05. Even with the interaction term. The rejection region is (n −k −1) . we determine that at least one of the predictors is useful in predicting palladium content. With the interaction term. 27 ) 290. 5 = 5.3570 0. which does not fall in the rejection 2 (1−.4. and with f = 21.000 0.35460 -4. the p-value associated with the pH predictor has value .05: The regression equation is pdconc = .11 = 2.15. Without interaction.641 = 2.98 0.27 -0.03381 1.004 .3 pH .394) f ≥ F.169. b.95 . we test the model utility (to see if any of the predictors are useful). a. so we fail to reject Ho .9929 0.405 niconc + 69.07 < 2. Using Minitab to generate the first order regression model. d.641) 5 still fail to reject the null hypothesis. = β 20 = 0 vs.003 0.78 10.14 pHsq Predictor Constant niconc pH temp currdens pallcont pHsq Coef -304.98 .010 0.09432 21. Testing H 0 : β 6 = . H a : at least one of the β i ' s ≠ 0 .14 .20 P 0.000 0.72..15 -2. which would indicate that this predictor is unimportant in the presence of the others. Using significance level .72 .394 = 1.24 4. 9506) 27 The complete second order model is H 0 : β 7 = 0 vs H a : β 7 ≠ 0 (the coefficient corresponding to the wc*wt To test predictor).222(12) + . H a : at least one of the R β i ' s ≠ 0 . ..I. so it can be eliminated.05. f = 2. a.869 (df s βˆ 2 = n – 3 = 5).06% of the observed variation in SST 16. H a : either β 1 or β 2 ≠ 0 .The rejection (n −k −1) f ≥ F.52 .18555 weld strength can be attributed to the given model.68 9 (1−. b.808.869 . 2.00003391 quadratic predictor should be retained.9986 .25 .116 ) .297(6) − . The test statistic value is βˆ 2 . 27 = 2. Ho is rejected. 
it appears that the value of strength can be accurately predicted. The relevant hypotheses are t= H 0 : β 2 = 0 vs. 434 .32 = 1. The hypotheses are H 0 : β 1 = .1 ≤ −6. so we reject the null hypothesis.9506 = 57.0102 10 2 − . The . the p-value ≈ 2(.146 t= (from Table A. = β 9 = 0 vs. giving f = 1783. H a : β 2 ≠ 0 . R 2 = 1 − . The calculated statistic is f = ≥ 2. R2 = 1− SSE .052(. With df = 27.962 ± 2.8).001 if either t ≥ 6. where k = 9. Because of the narrowness of the interval.01.9506 . 073) = . or 95. The test statistic 2 k ..k . c. 27 = 2. 9. where k = 2 for the quadratic model. this predictor is not useful in the presence of all the others. We wish to test R is f = (1− R 2 ) H 0 : β1 = β 2 = 0 vs. folks – the quadratic model is useful! b.n −k −1 = F.0750 ) = 7.0128(10 )(12 ) = 7. The complete second order model consists of nine predictors and nine corresponding coefficients.88 doubt about it.352 + . a. The test statistic is f = (1− R 2 ) region is which is useful.80017 =1− = .052 . ( ) yˆ = 3. 73. Since t= − .Chapter 13: Nonlinear and Multiple Regression 72. and n = 37. No 202. the 95% P. and Ho will be rejected at level .962 .154 = (7.962 ± . With such a large p-value.025.25 . d.00163141 = −48.037 6 2 + .29 = .869 or t ≤ −6. The rejection region is (n −k −1) f ≥ Fα . 5 = 13. would be The point estimate is ( ) 7.8. With t .098(10) + . 2 k .27 . 91) . d.05) e.55 Ho is rejected at significance level . 435 .22.031 ) = (44 . 895 .0219 . SST = Σy 2 − 26 .571)(.3) giving βˆ1 = = . b.36 ± (2.36 .3621 = = −7.01 and the quadratic model is judged useful. 7 = 5. t.3.8 . the 95% P.293.69 . so the C.67.058 . c.98 = .5 .12196058 + . 0005 .I. 45 .e. and t. The quadratic predictor should not be eliminated. a. is ) 21. k.067 = (20. Σ yi′2 = 47.96 ± (1 . The βx necessary summary quantities are Σ xi = 397.081. n − k− 1 = F. 
so the marginal benefit of including the cubic predictor would be essentially nil – and a scatter plot doesn’t show the type of curvature associated with a cubic model. 263. Y ′ = log (α ) + βx + log (ε ) = β 0 + β1 x + ε ′ .Chapter 13: Nonlinear and Multiple Regression c. Σ xi2 = 14. The hypotheses are t= H 0 : β 2 = 0 vs.898 2 (.058 + . i.8 ≥ 9.0219 .01.895 )(1 .7 = 9.001 and ps βˆ2 .69 = (20.001.263) − (397) −9 . 3073 value < .12196058 Thus βˆ = . 2 t . so n = 30. Because 30.I.I.427 ) 74. H a : β 2 ≠ 0 .36 ± 2.36 ± . Then. 12(− 2358. Σ yi′ = −74.29 5 = . 2 12(14.025.12196058 . suggesting the = α ⋅ (10 ) ⋅ ε .1 . and f = 264 .2 .571 .08857312(35) = −6. A scatter plot of model Y y ′ = log 10 ( y ) vs. No. and Σ xi yi′ = −2358.08857312 and α = 10 . 75. we need to figure out s 2 based on the information we have been given.025.47 .1141) = 21. and βˆ 0 + βˆ 1 (100) + βˆ 2 (100 ) = 21. First. so Ho is rejected at level . is 21.571 .36 ± 1. The test statistic value is βˆ2 − 2.5 . The predicted value of y′ when x = 35 is − 9. and µˆ Y ⋅1 = βˆ0 + βˆ1 (1) + βˆ 2 (1) = 45 . x shows a substantial linear pattern. so yˆ = 10 −6. 2 x = 1 here.22. s 2 = MSE = SSE df ( = . giving the C.08857312 and βˆ 0 = −9.102) 7 (Σy) = 264.5 = 2.1141 = 21. R2 is extremely high for the quadratic model.7 = 1.96 .01.1) − (397 )(− 74. H 0 : β1 = β 2 = 0 will be rejected in favor of H a : either β 1 or β 2 ≠ 0 if R2 f = (1−R 2 ) k ( n − k −1) R2 = 1 − ≥ Fα .55 .898 .408 . x12 .5 ) = 180 + (1)(15 ) + (10 .4851)2 + (. The models with 7. R2 = 1 − c. s + (est.903 1210 . No.2. 238 . is 12 − 3 229 .10 = 3.01.st. β 6 . 2 Estimate = βˆ 0 + βˆ1 (15 ) + βˆ 2 (3. the interval is 6.0609 . 436 .007 which is less than . b.30 b.5 ) = 231 . the alternative that at least one of these six corresponding coefficients β i ' s is not zero.6. x2 . p-value = .75 117 . 1) (9 −3 ) (20 −10) = 530. The test statistic value is f = 1.56 ) 77.3067 . β 7 . f = . and CK meet the variable selection criteria given in Section 13. 
76.
a. The second-order model has predictors x1, x2, x3, x1², x2², x3², x1x2, x1x3, x2x3, with corresponding coefficients β1, …, β9. Testing H0: β1 = β2 = 0 vs. Ha: either β1 or β2 ≠ 0 (or both), the calculated f greatly exceeds the F critical value, so there appears to be a useful linear relationship.
b. Estimate = β̂0 + β̂1(15) + β̂2(3.5) = 180 + (1)(15) + (10.5)(3.5) = 231.75.
c. With t.025,9 = 2.262, the 95% P.I. is 231.75 ± (2.262)(4.806) = (220.9, 242.6).
d. We wish to test H0: β4 = β5 = β6 = β7 = β8 = β9 = 0 vs. the alternative that at least one of these six corresponding coefficients βi is not zero. The calculated f is small, so Ho cannot be rejected; it doesn't appear as though any of the quadratic or interaction carriers should be included in the model.

77. With t.025,6 = 2.447, the interval is 6.14167 ± (2.447)(.2224) = (5.60, 6.69).

78.
a. There are obviously several reasonable choices in each case. The models with 7, 8, or 9 carriers here merit serious consideration, because their Rk², adjusted Rk², and Ck values meet the variable selection criteria given in Section 13.5.
b. The model with 6 carriers is a defensible choice on all three grounds, as are those with 7 and 8 carriers.

79.
a. Since the p-value = .007 is less than .01, Ho is rejected at level .01.
b. Since the p-value = .043 is less than .05, Ho is rejected at level .05, though not at level .01.

80.
a. f = (.90/15)/(.10/4) = 2.4. Because 2.4 < 5.86, H0: β1 = … = β15 = 0 cannot be rejected. There does not appear to be a useful linear relationship.
b. The high R² value resulted from saturating the model with predictors. In general, one would be suspicious of a model yielding a high R² value when k is large relative to n.
c. (R²/15)/[(1 − R²)/4] ≥ 5.86 iff R²/(1 − R²) ≥ 21.975 iff R² ≥ 21.975/22.975 = .9565.

81.
a. The relevant hypotheses are H0: β1 = … = β5 = 0 vs. Ha: at least one among β1, …, β5 is not 0. F.05,5,111 = 2.29 and f = (.827/5)/[(.173)/111] = 106.1. Because 106.1 ≥ 2.29, Ho is rejected in favor of the conclusion that there is a useful linear relationship between Y and at least one of the predictors.
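The model-utility F statistic in Exercise 81a is computed entirely from R², k, and n, and can be verified directly (the function name below is ours, not the text's):

```python
# Global model-utility F test from summary R^2, as in Exercise 81a:
# f = (R^2 / k) / ((1 - R^2) / (n - k - 1)), with R^2 = .827, k = 5, n = 117.
def model_utility_f(r2, k, n):
    return (r2 / k) / ((1 - r2) / (n - k - 1))

f = model_utility_f(0.827, 5, 117)
print(round(f, 1))  # 106.1, which far exceeds F.05,5,111 = 2.29
```

The same function reproduces the threshold logic of Exercise 80c: the test rejects exactly when R²/(1 − R²) is large enough relative to the tabled F value.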
b. t.05,111 = 1.66, so the C.I. is .041 ± (1.66)(.016) = .041 ± .027 = (.014, .068). β1 is the expected change in mortality rate associated with a one-unit increase in the particle reading when the other four predictors are held fixed; we can be 90% confident that .014 < β1 < .068.
c. H0: β4 = 0 will be rejected in favor of Ha: β4 ≠ 0 if t = β̂4/s_β̂4 is either ≥ 2.62 or ≤ −2.62. t = .041/.007 = 5.9 ≥ 2.62, so Ho is rejected and this predictor is judged important.
d. ŷ = 19.607 + .041(166) + .071(60) + .001(788) + .041(68) + .687(.95) = 99.514, and the corresponding residual is 103 − 99.514 = 3.486.

82.
a. The set x1, x3, x4, x5, x6, x8 includes both x1, x4, x5, x8 and x1, x3, x5, x6, so R²(1,3,4,5,6,8) ≥ max(R²(1,4,5,8), R²(1,3,5,6)) = .723.
b. R²(1,4) ≤ R²(1,4,5,8) = .723, but it is not necessarily ≤ .689, since x1, x4 is not a subset of x1, x3, x5, x6.

CHAPTER 14

Section 14.1

1.
a. We reject Ho if the calculated χ² value is greater than or equal to the tabled value of χ²(α, k−1) from Table A.7. Since 12.25 ≥ χ²(.05, 4) = 9.488, we would reject Ho.
b. Since 8.54 is not ≥ χ²(.01, 3) = 11.344, we would fail to reject Ho.
c. Since 4.36 is not ≥ χ²(.10, 2) = 4.605, we would fail to reject Ho.
d. Since 10.20 is not ≥ χ²(.01, 5) = 15.085, we would fail to reject Ho.

2.
a. In the d.f. = 2 row of Table A.7, our χ² value of 7.5 falls between χ²(.025, 2) = 7.378 and χ²(.01, 2) = 9.210, so the p-value is between .01 and .025, or .01 < p-value < .025.
b. With d.f. = 6, our χ² value of 13.00 falls between χ²(.05, 6) = 12.592 and χ²(.025, 6) = 14.440, so .025 < p-value < .05.
c. With d.f. = 9, our χ² value of 18.00 falls between χ²(.05, 9) = 16.919 and χ²(.025, 9) = 19.022, so .025 < p-value < .05.
d. With k = 5 and d.f. = k − 1 = 4, our χ² value of 21.3 exceeds χ²(.005, 4) = 14.860, so the p-value < .005.
e. The d.f.
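The decision rule used throughout Exercise 1 can be written out mechanically; the critical values below are the Table A.7 entries quoted in the solutions.

```python
# Reject Ho exactly when the computed chi-squared statistic meets or
# exceeds the tabled critical value chi^2(alpha, k-1), as in Exercise 1.
cases = [
    (12.25, 9.488),   # a: compare with chi^2(.05, 4)
    (8.54, 11.344),   # b: compare with chi^2(.01, 3)
    (4.36, 4.605),    # c: compare with chi^2(.10, 2)
    (10.20, 15.085),  # d: compare with chi^2(.01, 5)
]
decisions = [stat >= crit for stat, crit in cases]
print(decisions)  # [True, False, False, False]: reject only in part a
```

Only part a rejects, matching the four conclusions above.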
= k – 1 = 4 – 1 = 3; χ² = 5.0 is less than χ²(.10, 3) = 6.251, so p-value > .10.

3. Using the number 1 for business, 2 for engineering, 3 for social science, and 4 for agriculture, let pi = the true proportion of all clients from discipline i. If the Statistics department's expectations are correct, then the relevant null hypothesis is Ho: p1 = .40, p2 = .30, p3 = .20, p4 = .10, versus Ha: the Statistics department's expectations are not correct. With d.f. = k – 1 = 4 – 1 = 3, we reject Ho if χ² ≥ χ²(.05, 3) = 7.815. Using the proportions in Ho, the expected numbers of clients are:

Client's Discipline: Business, Engineering, Social Science, Agriculture
Expected Number: (120)(.40) = 48, (120)(.30) = 36, (120)(.20) = 24, (120)(.10) = 12

Since all the expected counts are at least 5, the chi-squared test can be used. The value of the test statistic is
χ² = Σ (ni − npi)²/(npi) = Σ over all cells of (observed − expected)²/expected
= (52 − 48)²/48 + (38 − 36)²/36 + (21 − 24)²/24 + (9 − 12)²/12 = 1.57,
which is not ≥ 7.815, so we fail to reject Ho. (Alternatively, p-value = P(χ² ≥ 1.57), which is > .10; since the p-value is not < .05, we fail to reject Ho.) Thus we have no evidence to suggest that the Statistics department's expectations are incorrect.

4. The uniform hypothesis implies that pi0 = 1/8 = .125 for i = 1, …, 8, so Ho: p10 = p20 = … = p80 = .125 will be rejected in favor of Ha if χ² ≥ χ²(.10, 7) = 12.017. Each expected count is npi0 = 120(.125) = 15, so χ² = (12 − 15)²/15 + … + (10 − 15)²/15 = 4.80. Because 4.80 is not ≥ 12.017, we fail to reject Ho. There is not enough evidence to disprove the claim.

5. We will reject Ho if the p-value < .10.
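The statistic computed by hand in Exercise 3 can be verified with a one-line sum over the cells:

```python
# Chi-squared goodness-of-fit statistic for Exercise 3: observed client
# counts vs. counts expected under Ho with n = 120.
observed = [52, 38, 21, 9]
expected = [48, 36, 24, 12]   # 120 * (.40, .30, .20, .10)
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(round(chi2, 2))  # 1.57, well below the critical value 7.815
```

The same sum, with different observed/expected lists, reproduces every goodness-of-fit statistic in this section.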
The observed values, expected values, and corresponding χ² terms are:

Obs: 4, 15, 23, 25, 38, 21, 32, 14, 10, 8
Exp: 6.67, 13.33, 20, 26.67, 33.33, 33.33, 26.67, 20, 13.33, 6.67
χ² term: 1.069, .209, .450, .105, .654, .163, 1.065, 1.800, .832, .265

χ² = 1.069 + … + .265 = 6.612. With d.f. = 10 − 1 = 9, our χ² value of 6.612 is less than χ²(.10, 9) = 14.684, so the p-value > .10, which is not < .10, so we cannot reject Ho. There is no evidence that the data is not consistent with the previously determined proportions.

6. A 9:3:4 ratio implies that p10 = 9/16 = .5625, p20 = 3/16 = .1875, and p30 = 4/16 = .2500. With n = 195 + 73 + 100 = 368, the expected counts are 207.000, 69.000, and 92.000, so
χ² = (195 − 207)²/207 + (73 − 69)²/69 + (100 − 92)²/92 = 1.623.
With d.f. = 3 − 1 = 2, our χ² value of 1.623 is less than χ²(.10, 2) = 4.605, so the p-value > .10, which is not < .05, so we cannot reject Ho. The data does confirm the 9:3:4 theory.

7. We test Ho: p1 = p2 = p3 = p4 = .25 vs. Ha: at least one proportion ≠ .25, with d.f. = 3. We will reject Ho if the p-value < .01.

Cell: 1, 2, 3, 4
Observed: 328, 334, 372, 327
Expected: 340.25, 340.25, 340.25, 340.25
χ² term: .4410, .1148, 2.9627, .5160

χ² = 4.0345, and with 3 d.f., p-value > .10, so we fail to reject Ho. The data fails to indicate a seasonal relationship with incidence of violent crime.

8. Ho: p1 = 15/365, p2 = 46/365, p3 = 120/365, p4 = 184/365, versus Ha: at least one proportion is not as stated in Ho. The degrees of freedom = 3, and the rejection region is χ² ≥ χ²(.01, 3) = 11.344.

Cell: 1, 2, 3, 4
Observed: 11, 24, 69, 96
Expected: 8.22, 25.21, 65.75, 100.82
χ² term: .9402, .0581, .1606, .2304

χ² = Σ(obs − exp)²/exp = 1.3893, which is not ≥ 11.344, so Ho is not rejected. The data does not indicate a relationship between patients' admission date and birthday.

9. a.
Denoting the 5 intervals by [0, c1), [c1, c2), …, [c4, ∞), we wish the c1 for which .2 = P(0 ≤ X ≤ c1) = ∫ from 0 to c1 of e^(−x) dx = 1 − e^(−c1), so c1 = −ln(.8) = .2231. Then .2 = P(c1 ≤ X ≤ c2) ⇒ .4 = P(0 ≤ X ≤ c2) = 1 − e^(−c2), so c2 = −ln(.6) = .5108. Similarly, c3 = −ln(.4) = .9163 and c4 = −ln(.2) = 1.6094. The resulting intervals are [0, .2231), [.2231, .5108), [.5108, .9163), [.9163, 1.6094), and [1.6094, ∞).
b. Each expected cell count is 40(.2) = 8, and the observed cell counts are 6, 8, 10, 7, and 9, so χ² = (6 − 8)²/8 + … + (9 − 8)²/8 = 1.25. Because 1.25 is not ≥ χ²(.10, 4) = 7.779, Ho cannot be rejected even at level .10; the data is quite consistent with the specified exponential distribution.

10.
a. χ² = Σ (Ni − npi0)²/(npi0) = Σ [Ni² − 2npi0·Ni + n²pi0²]/(npi0) = Σ Ni²/(npi0) − 2ΣNi + nΣpi0 = Σ Ni²/(npi0) − 2n + n(1) = Σ Ni²/(npi0) − n, as desired. This formula involves only one subtraction, and that at the end of the calculation, so it is analogous to the shortcut formula for s².
b. When each pi0 = 1/k, χ² = (k/n)ΣNi² − n. For the pigeon data, k = 8, n = 120, and ΣNi² = 1872, so χ² = 8(1872)/120 − 120 = 124.8 − 120 = 4.8, as before.

11.
a. The six intervals must be symmetric about 0, so denote the 4th, 5th and 6th intervals by [0, a), [a, b), [b, ∞). a must be such that Φ(a) = .6667 (= 1/2 + 1/6), which from Table A.3 gives a ≈ .43. Similarly Φ(b) = .8333 implies b ≈ .97, so the six intervals are (−∞, −.97), [−.97, −.43), [−.43, 0), [0, .43), [.43, .97), and [.97, ∞).
b. The six intervals are symmetric about the mean of .5. From a, the fourth interval should extend from the mean to .43 standard deviations above the mean, i.e., from .5 to .5 + .43(.002), which gives [.5, .50086). Thus the third interval is [.5 − .00086, .5) = [.49914, .5).
Similarly, the upper endpoint of the fifth interval is .5 + .97(.002) = .50194, and the lower endpoint of the second interval is .5 − .00194 = .49806. The resulting intervals are (−∞, .49806), [.49806, .49914), [.49914, .5), [.5, .50086), [.50086, .50194), and [.50194, ∞).
c. Each expected count is 45(1/6) = 7.5, and the observed counts are 13, 6, 6, 8, 7, and 5, so χ² = 5.53. With 5 d.f., the p-value > .10, so we would fail to reject Ho at any of the usual levels of significance. There is no evidence to suggest that the bolt diameters are not normally distributed.

Section 14.2

12.
a. Let θ denote the probability of a male (as opposed to female) birth under the binomial model. The four cell probabilities (corresponding to x = 0, 1, 2, 3) are π1(θ) = (1 − θ)³, π2(θ) = 3θ(1 − θ)², π3(θ) = 3θ²(1 − θ), and π4(θ) = θ³. The likelihood is proportional to (1 − θ)^(3n1 + 2n2 + n3) · θ^(n2 + 2n3 + 3n4). Forming the log likelihood, taking the derivative with respect to θ, equating to 0, and solving yields
θ̂ = (n2 + 2n3 + 3n4)/(3n) = (66 + 128 + 48)/480 = .504.
The estimated expected counts are 160(1 − .504)³ = 19.52, 480(.504)(.496)² = 59.52, 60.48, and 20.48, so
χ² = (14 − 19.52)²/19.52 + … + (16 − 20.48)²/20.48 = 1.56 + .71 + .20 + .98 = 3.45.
The number of degrees of freedom for the test is 4 − 1 − 1 = 2. Ho of a binomial distribution will be rejected using significance level .05 if χ² ≥ χ²(.05, 2) = 5.992. Because 3.45 < 5.992, Ho is not rejected, and the binomial model is judged to be quite plausible.
b. Now θ̂ = 53/150 = .353, and the estimated expected counts are 13.54, 22.17, 12.09, and 2.20. The last estimated expected count is much less than 5, so the chi-squared test based on 2 d.f. should not be used.

13.
According to the stated model, the three cell probabilities are (1 − p)², 2p(1 − p), and p², so we wish the value of p which maximizes (1 − p)^(2n1) · [2p(1 − p)]^(n2) · p^(2n3). Proceeding as in Example 14.6 gives p̂ = (n2 + 2n3)/(2n) = 234/2776 = .0843. The estimated expected cell counts are then n(1 − p̂)² = 1163.85, n[2p̂(1 − p̂)] = 214.29, and np̂² = 9.86. This gives
χ² = (1212 − 1163.85)²/1163.85 + (118 − 214.29)²/214.29 + (58 − 9.86)²/9.86 = 280.3.
According to (14.15), Ho will be rejected if χ² ≥ χ²(α, 2), and since χ²(.01, 2) = 9.210, Ho is soundly rejected; the stated model is strongly contradicted by the data.

14.
a. We wish to maximize p^(Σxi − n)·(1 − p)^n, or equivalently (Σxi − n) ln p + n ln(1 − p). Equating the derivative with respect to p to 0 yields (Σxi − n)/p = n/(1 − p), whence p = (Σxi − n)/Σxi. For the given data, Σxi = (1)(1) + (2)(31) + … + (12)(1) = 363, so p̂ = (363 − 130)/363 = .642, and q̂ = .358.
b. Each estimated expected cell count is p̂ times the previous count, giving nq̂ = 130(.358) = 46.54, nq̂p̂ = 46.54(.642) = 29.88, then 19.18, 12.31, 7.91, 5.08, 3.26, … . Grouping all values ≥ 7 into a single category gives 7 cells with estimated expected counts 46.54, 29.88, 19.18, 12.31, 7.91, 5.08 (sum = 120.9), and 130 − 120.9 = 9.1. The corresponding observed counts are 48, 31, 20, 9, 6, 5, and 11, giving χ² = 1.87. With k = 7 and m = 1 (p was estimated), from (14.15) we need χ²(.10, 5) = 9.236. Since 1.87 is not ≥ 9.236, we don't reject Ho.

15. The part of the likelihood involving θ is
[(1 − θ)⁴]^n1 · [θ(1 − θ)³]^n2 · [θ²(1 − θ)²]^n3 · [θ³(1 − θ)]^n4 · [θ⁴]^n5 = θ^(n2 + 2n3 + 3n4 + 4n5) · (1 − θ)^(4n1 + 3n2 + 2n3 + n4) = θ^233 · (1 − θ)^367,
so ln(likelihood) = 233 ln θ + 367 ln(1 − θ).
Differentiating and equating to 0 yields θ̂ = 233/600 = .3883, and 1 − θ̂ = .6117 [note that the exponent on θ is simply the total number of successes (defectives here) in the n = 4(150) = 600 trials]. Substituting this θ̂ into the formula for the pi yields estimated cell probabilities .1400, .3555, .3385, .1433, and .0227. Multiplication by 150 yields the estimated expected cell counts 21.00, 53.33, 50.78, 21.50, and 3.41. The last estimated expected cell count is less than 5, so we combine the last two categories into a single one (≥ 3 defectives), yielding estimated counts 21.00, 53.33, 50.78, 24.91, observed counts 26, 51, 47, 26, and χ² = 1.62. With d.f. = 4 − 1 − 1 = 2, since 1.62 < χ²(.10, 2) = 4.605, the p-value > .10, and we do not reject Ho. The data suggests that the stated binomial distribution is plausible.

16. λ̂ = x̄ = [(0)(6) + (1)(24) + (2)(42) + … + (8)(6) + (9)(2)]/300 = 1163/300 = 3.88, so the estimated cell probabilities are computed from p̂(x) = e^(−3.88)·(3.88)^x/x!.

x: 0, 1, 2, 3, 4, 5, 6, 7, ≥8
np̂(x): 6.2, 24.0, 46.6, 60.3, 58.5, 45.4, 29.4, 16.3, 13.3
obs: 6, 24, 42, 59, 62, 44, 41, 14, 8

This gives χ² = 7.789. To see whether the Poisson model provides a good fit, we need χ²(.10, 9−1−1) = χ²(.10, 7) = 12.017. Since 7.789 < 12.017, the Poisson model does provide a good fit.

17. λ̂ = 380/120 = 3.167, so p̂(x) = e^(−3.167)·(3.167)^x/x!.

x: 0, 1, 2, 3, 4, 5, 6, ≥7
p̂: .0421, .1334, .2113, .2230, .1766, .1119, .0590, .0427
np̂: 5.05, 16.00, 25.36, 26.76, 21.19, 13.43, 7.08, 5.12
obs: 24, 16, 16, 18, 15, 9, 6, 16

The resulting value of χ² = 103.98, and when compared to χ²(.01, 7) = 18.474, it is obvious that the Poisson model fits very poorly.
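The Poisson fit of Exercise 16 can be rebuilt from scratch: estimate λ by the sample mean, compute the cell probabilities from the pmf, and let the final "≥ 8" cell absorb the leftover probability. (Small differences from the hand calculation come from rounding the tabled expected counts.)

```python
import math

# Poisson goodness-of-fit computation for Exercise 16.
observed = [6, 24, 42, 59, 62, 44, 41, 14, 8]   # counts for x = 0..7 and x >= 8
n = sum(observed)                                # 300
lam = 1163 / n                                   # lambda-hat, approximately 3.88

probs = [math.exp(-lam) * lam ** x / math.factorial(x) for x in range(8)]
probs.append(1 - sum(probs))                     # tail cell P(X >= 8)
expected = [n * p for p in probs]

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(round(chi2, 1))  # about 7.8, below chi^2(.10, 7) = 12.017
```

Because λ was estimated, the comparison uses 9 − 1 − 1 = 7 degrees of freedom, as in the text.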
The estimated expected counts are then (multiply pˆ i by n = 83) 11. the likelihood is proportional θ 1Aθ 2B (1 − θ 1 − θ 2 ) . Taking the natural log and equating both ∂ ∂ A C B C and to zero gives = and = . a conclusion that is also supported by the small p-value ( < . 24.344 .235 . 446 .205. pˆ 3 = P(− . 6−1− 2 = χ .8 32.35 ≤ Z ≤ . pˆ 4 = 2(. Comparing this with χ . Thus θˆ1 = . θ2 = .066   pˆ 2 = P(. and 1 − θˆ1 − θˆ2 = . pˆ 4 = P(.2 17.25. and pˆ 5 = .41 ≤ Z ≤ 1. Substituting this into the first equation gives θ 1 = .164 . It would be dangerous to use the one-sample t interval as a basis for inference. from which χ 2 = 1.173   pˆ 1 = P( X < . 20.4275 . and 10. B = 2n 2 + n 4 + n 6 . and C = 2n 3 + n 5 + n 6 . from 400 400 2 which pˆ 1 = (.04. and then A A + B +C 2n1 + n 4 + n5 ˆ 2n 2 + n4 + n 6 B θ2 = .4275)(.085 . pˆ 6 = .0 50. whence ∂θ 1 ∂θ 2 θ1 1 − θ1 − θ 2 θ 2 1 − θ1 − θ 2 Bθ A θ 2 = 1 .201.7 is a clear outlier.085 . 3 = 11. the observation 116.205. θˆ2 = = . 19.67 .Chapter 14: The Analysis of Categorical Data 18. 5 = 15. 6−1 = χ .100 ≤ X ≤ .2297 . Category 1 2 3 4 5 6 np 36.2959 . Therefore.35) = . With χ .1 ≥ 15.150 ) = P(− 1.01000 ) of the Ryan-Joiner test.089 . 19. 18.2750 . and A+ B+C 2n 2n 2n + n 5 + n6 1 − θˆ1 − θˆ2 = 3 . The pattern of points in the plot appear to deviate from a straight line.1 . .100 − .076 . where A + B + C = 2n. 5−1−2 = χ . and χ .15) Ho must be rejected since 29 . With A = 2n 1 + n 4 + n 5 . the hypothesis of normality cannot be rejected.275 ) = .100) = P Z <  = Φ(− 1 . pˆ 3 = . C to ( ) ( ) pˆ 5 = .8 47.56.201. 2 = 5.41) = .183 .6 15. In particular.1335 .2975 . 1 74.9408. even at the very smallest significance level of .1 76.259 1.6 73.871 .8 76.6 . Minitab uses the same y I for each x(i) in a group of tied values.0 77.517 79.b.063 1.761 -.5 71.10.967.05 = 9639.407 .523 ) = .161 With Fail to reject Ho . Since c.: Minitab was used to calculate the y I’s. Σ yi = 0 .03 .967 n.901 1.7 . 
Σ x(i ) y i = 103.3 24.259 -1.Chapter 14: The Analysis of Categorical Data 21. so .901 -.000 .523 . xi yi xi yi xi yi 69.9408. Section 14. though the hand calculated value may be slightly different because when there are ties among the x(i)’s.871 ) − (1925 .634 -. 23. and .05 .2 83.815 2 Computed χ = 6. Σx(i ) = 148.1 82.1 -.761 .3 -1. and c.199 .301 . RR: χ 2 ≥ 7.1 73. Σ x(i ) = 1925. one would have to consider population normality plausible.5 75. This data could reasonably have come from a normal population.7 75.10. The Ryan-Joiner test p-value is larger than . This means that it would be legitimate to use a one-sample t test to test hypotheses about the true average ratio.301 -. 2 Σ yi2 = 22.063 -.7 79. so we conclude that the null hypothesis of normality cannot be rejected.7 93.6 ) 2 25 (25 .2 76.9707.199 -.01 = .9 78. At the 5% significance level. 22.923 .407 75.9 72.03 ) 25(148 .520 1.967 -1.923 < .099 . Minitab gives r = . Ho : TV watching and physical fitness are independent of each other Ha: the two variables are not independent Df = (4 – 1)(2 – 1) = 3 α = . so r= 25 (103 .05 < p-value < .2 75. 447 .3 73.01.634 . the null hypothesis of population normality must be rejected (the largest observation appears to be the primary culprit). The data fail to indicate an association between daily TV viewing habits and physical fitness.099 .5 74.9 80.9 77.6 79. C10 = .517 -.520 -1. 5 102 238 508 746 Thus χ2 = (141 − 110.5 69.65 726 2 471.77. 5 will be rejected at level .597 . Then Ho : p i1 = p i2 for i = 1.82. 448 . 4 = 13. j = 1.35 761 921 15 36 18 497 1487 χ 2 = 23. With i = 1 identified with men and i = 2 identified with women. + (82 − 69. we wish to test Ho : p 1j = p 2j for j = 1.34 7. 28. 2.5 = 24. which is ≥ 13.32 17. 2 = 10. 3. 8. Observed Estimated Expected Matured Aborted Matured Aborted ni 141 206 110.201. 2.42 9.05.7 98 24 78 32.79 242.98 . 4. Ho is rejected.98 > χ . resulting in freedom.01 if χ 2 ≥ χ .7 )2 110. Since 26.2005.5) 2 69. 
Let Pij = the proportion of white clover in area of type i which has a type j mark (i = 1.18 23.66 7. 27. ….( 2−1)( 5−1) = χ .3 347 28 69 30.7 236.68 18. 2.18. so Ho is rejected at level .Chapter 14: The Analysis of Categorical Data 25. and j = 1.277 .18 ≥ 13. L=R. 5 will be rejected if χ 2 ≥ χ .9 66.01. 3 vs. 19. The hypothesis Ho : p 1j = p 2j for j = 1. 4. 2. Let p i1 = the probability that a fruit given treatment i matures and p i2 = the probability that a fruit given treatment i aborts.23 and for women are 39.1 97 25 73 31. 5). 3. 2. 3 denoting the 3 categories L>R. L<R. Eˆ ij 1 2 3 4 5 1 449. p-value < . With (2 – 1)(3 – 1) = 2 degrees of 44.58 8.7 + .277 . Ha: p 1j not equal to p 2j for at least one j.277 . and 13.95. which strongly suggests that Ho should be rejected. since χ 2 = 44.82 .5 102 20 82 32. 4 = 13.21 254.. The estimated cell counts for men are 17.201.005.5 69.201..3 66.277 . 49 ).5 27.1 10.06 .26 5 5.6 42 .44 38.19 6 26. With p ij denoting the probability of a type j response when treatment i is applied.1 35. Eˆ ij 1 2 3 1 35. Ho will be rejected at level .587 .2 115 . Ho cannot be rejected at level .02 .1 10. 4 will be rejected at level .6 40.01 81.7 170 .587 .7 9. 9 = 23.3 3 26.8 23.13 . Ho : p 1j = …= p 6j for j = 1.47 152 353 149 654 6.40 . 449 .8 22.1 26.5 χ 2 = 27.5 34.4 43.4 151 .4 2 25.1 4 χ2 1 2 3 154 .10.66 ≥ 23.06 .00 .46 (carrying 2 decimal places in Eˆ ij yields χ 2 = 6.5 91.005 29. 3.210.0 50.005 if χ 2 ≥ χ .Chapter 14: The Analysis of Categorical Data 28.32 .7 23.7 62. where p ij is the proportion of the jth sex combination resulting from the ith genotype.987.1 2 39.0 21. so reject Ho at level .2005.8 10. 2.66 1.1 11. Ho : p 1j = p 2j = p 3j =p 4j for j = 1.12 .46 < 15.8 83.10 = 15. 3 is the hypothesis of interest.8 4 30.34 9.987 .37 . Eˆ ij 1 2 3 4 1 24.49 .14 1. 2.9 5.1 43.8 3 35. Since 6.10 if χ 2 ≥ χ .1 12.0 22 . 21 = 12.592 . we reject Ho .05 (but not at level .74 18. 
p .5 )2 (119 − 125 .3) ( 85 − 49 .205.30 61 3 19.11)2 (5 − 8.05.592 .00 12. so .40 49.0 ) = + + + = 64 .2 2 2 2 2 ( 15 − 44 .37 8.19 26.6) ( 45 − 59 . a different conclusion would be reached.15 ≥ 12.592 < 13.205.65 ≥ χ .96 30.11 43.) Configuration appears to have an effect on type of failure.74 29.0 so the independence hypothesis is rejected in favor of the conclusion that political views and level of marijuana usage are related.j (type of car and commuting distance are independent) will be rejected at level .4 151.025!) 32.2)2 (214 − 177 .4 )2 (173 − 151 . 4) and j the jth category of commuting distance. 450 .201.6 59 . With 6 df. Ho : p ij = p i. Since the p-value is < .21)2 + .8) ( 172 − 193 .253 < χ .2 177 .05 12. 6 = 12.40 38 49 126 75 250 χ 2 = 14.05 if χ 2 ≥ χ .16 19.05. 6 Eˆ ij 1 2 3 4 1 16.00 5.90 29.8 193 .21 15.11 8.58 18.32 90 2 7.. Ho : the design configurations are homogeneous with respect to type of failure vs.2025.47 40 3 10.70 99 4 7. 31. 16.025 < p-value < . so the independence hypothesis Ho is rejected at level .253 .15 11.60 52 2 11.0 )2 (47 − 54 .45 19. Ha: the design configurations are not homogeneous with respect to type of failure.00 8. 3. + = 13.440 .277 44 . 6 = 14. With I denoting the Ith type of car (I = 1.. χ2 = χ . χ2 = (479 − 494.0 54. Eˆ ij 1 2 3 1 10.3 49.21 60 34 92 38 26 190 (20 − 16. 4 = 13 .2)2 + + + + 494 .5 125 . (If a smaller significance level were chosen. 2.Chapter 14: The Analysis of Categorical Data 30. . + 20 = .11n. With the much larger sample size. so there are 26 freely determined cell counts. j .. Eˆ ijk = n k N ij n pˆ ij = N ij n and . With p ij denoting the common value of p ij1 . the departure from what is expected under Ho . giving ΣΣp ij = 1 so χ 2 df = 32 – 8 = 24. and the same is true of the estimated expected counts so now χ = 6. .13n.e. = Σp. 13 7 20 χ 2 Observed 19 11 30 2 ( 13 − 12) = 12 28 22 50 Estimated Expected 12 18 30 8 12 20 60 40 100 2 ( 22 − 20) + . 
29. Summing (obs − exp)²/exp over all cells, e.g. (15 − 44.6)²/44.6 + (85 − 49.3)²/49.3 + …, gives a very large χ², so the independence hypothesis is rejected in favor of the conclusion that political views and level of marijuana usage are related.

30. With i denoting the ith type of car (i = 1, 2, 3, 4) and j the jth category of commuting distance, Ho: pij = pi.·p.j (type of car and commuting distance are independent) will be rejected at level .05 if χ² ≥ χ²(.05, 6) = 12.592. The computed χ² = 14.15 ≥ 12.592, so the independence hypothesis Ho is rejected at level .05 (but not at level .025, since .025 < p-value < .05).

31. Ho: the design configurations are homogeneous with respect to type of failure vs. Ha: the design configurations are not homogeneous with respect to type of failure.
χ² = (20 − 16.11)²/16.11 + … + (5 − 8.11)²/8.11 = 13.253 ≥ χ²(.05, 6) = 12.592,
so Ho is rejected at level .05. (If a smaller significance level were chosen, a different conclusion would be reached.) Configuration appears to have an effect on type of failure.

32. With pij denoting the common value of pij1, pij2, pij3, pij4 (under Ho), and four different tables (one for each region), there are 8 + 8 + 8 + 8 = 32 freely determined cell counts. Under Ho, p11, …, p33 must be estimated, but since ΣΣpij = 1, only 8 independent parameters are estimated, so χ² df = 32 − 8 = 24.

33.
a. χ² = ΣΣ(Nij − Êij)²/Êij = ΣΣ[Nij² − 2ÊijNij + Êij²]/Êij = ΣΣNij²/Êij − 2ΣΣNij + ΣΣÊij = ΣΣNij²/Êij − n, since ΣΣÊij = ΣΣNij = n. This formula is computationally efficient because there is only one subtraction to be performed, which can be done as the last step in the calculation.

34.
a. Observed: (13, 7), (19, 11), (28, 22) with row totals 20, 30, 50 and column totals 60, 40 (n = 100); estimated expected: (12, 8), (18, 12), (30, 20). χ² = (13 − 12)²/12 + … + (22 − 20)²/20 = .6806. Because .6806 < χ²(.10, 2) = 4.605, Ho, the independence hypothesis, is not rejected.
b. Each observation count here is 10 times what it was in a, and the same is true of the estimated expected counts, so now χ² = 6.806 ≥ 4.605, and Ho is rejected. With the much larger sample size, the departure from what is expected under Ho, the independence hypothesis, is statistically significant – it cannot be explained just by random variation.
c. The observed counts are .13n, .07n, .19n, .11n, .28n, .22n, whereas the estimated expected counts are (.60n)(.20n)/n = .12n, then .08n, .18n, .12n, .30n, .20n, yielding χ² = .006806n. Ho will be rejected at level .10 iff .006806n ≥ 4.605, i.e., iff n ≥ 676.6, so the minimum n = 677.

35. This is a 3 × 3 × 3 situation, so there are 27 cells. Only the total sample size n is fixed in advance of the experiment, so there are 26 freely determined cell counts. We must estimate the three sets of marginal probabilities, but since each set sums to 1, only 6 independent parameters are estimated, giving χ² df = 26 − 6 = 20.

36.
Supplementary Exercises

37. Let pi1 = the proportion of fish receiving treatment i (i = 1, 2, 3) who are parasitized. We wish to test Ho: p1j = p2j = p3j for j = 1, 2. With df = (2 − 1)(3 − 1) = 2, Ho will be rejected at level .01 if χ² ≥ χ²(.01, 2) = 9.210.

Observed: (30, 3), (16, 8), (16, 16); row totals 33, 24, 32; column totals 62, 27; n = 89
Estimated expected: (22.99, 10.01), (16.72, 7.28), (22.29, 9.71)

This gives χ² = 13.1. Because 13.1 ≥ 9.210, Ho should be rejected. The proportion of fish that are parasitized does appear to depend on which treatment is used.

38. There are 3 categories here – firstborn, middleborn (2nd or 3rd born), and lastborn. With p1, p2, p3 denoting the category probabilities, we wish to test Ho: p1 = .25, p2 = .50 (p2 = P(2nd or 3rd born) = .25 + .25 = .50), p3 = .25. Ho will be rejected at significance level .05 if χ² ≥ χ²(.05, 2) = 5.992. The expected counts are (31)(.25) = 7.75, (31)(.50) = 15.5, and 7.75, so
χ² = (12 − 7.75)²/7.75 + (11 − 15.5)²/15.5 + (8 − 7.75)²/7.75 = 2.331 + 1.306 + .008 = 3.65.
Because 3.65 < 5.992, Ho is not rejected. The hypothesis of equiprobable birth order appears quite plausible.

39. Ho: gender and years of experience are independent; Ha: gender and years of experience are not independent. Df = (2 − 1)(5 − 1) = 4.

Years of Experience: 1–3, 4–6, 7–9, 10–12, 13+
Male observed: 202, 369, 482, 361, 811; expected: 285.6, 409.8, 475.9, 347.0, 706.6
Female observed: 230, 251, 238, 164, 258; expected: 146.4, 210.2, 244.1, 178.0, 362.4

χ² = Σ(O − E)²/E = 131.6; reject Ho. The two variables do not appear to be independent. In particular, women have higher than expected counts in the beginning category (1–3 years) and lower than expected counts in the more experienced category (13+ years).
40. The null hypothesis Ho: pij = pi.·p.j states that level of parental use and level of student use are independent in the population of interest. The test is based on (3 − 1)(3 − 1) = 4 df, with the estimated expected counts computed from the marginal totals (226, 109, 110; n = 445). The calculated value of χ² = 22.4. Since 22.4 > χ²(.005, 4) = 14.860, the p-value < .005, so Ho should be rejected at any significance level greater than .005. Parental and student use level do not appear to be independent.

41. Ho: The probability of a late-game leader winning is independent of the sport played; Ha: the two variables are not independent. With 3 df, the computed χ² = 10.518, and the p-value < .015 is also < .05, so we would reject Ho. There appears to be a relationship between the late-game leader winning and the sport played. Quite possibly: baseball had many fewer than expected late-game leader losses.

42. For the age group (15–54, 55–64, 65–74, > 74) by care type (Home, Acute, Chronic) table, with row totals 674, 535, 846, 934 and n = 2989, the estimated expected counts give χ² = 197.6. A glance at the 6 df row of Table A.7 shows that this test statistic value is highly significant – the hypothesis of independence is clearly implausible.
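All of these two-way tests use Êij = (row total)(column total)/n. A minimal sketch of that computation, with made-up 2×2 counts (not data from any exercise here):

```python
# Expected counts and chi-squared statistic for a two-way contingency table.
# The counts below are illustrative only.
table = [[20, 30],
         [30, 20]]
n = sum(sum(row) for row in table)
row_tot = [sum(row) for row in table]
col_tot = [sum(col) for col in zip(*table)]

chi2 = 0.0
for i, row in enumerate(table):
    for j, obs in enumerate(row):
        exp = row_tot[i] * col_tot[j] / n   # E_ij = (row total)(col total)/n
        chi2 += (obs - exp) ** 2 / exp
print(chi2)  # 4.0 with these counts (every expected count is 25)
```

With (I − 1)(J − 1) = 1 df here, 4.0 would exceed χ²(.05, 1) = 3.843 and independence would be rejected.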
43. The accompanying table contains both observed and estimated expected counts, the latter in parentheses:

Want: 127 (131.1), 118 (123.3), 77 (71.7), 61 (55.1), 41 (42.8); total 424
Don't: 23 (18.9), 23 (17.7), 5 (10.3), 2 (7.9), 8 (6.2); total 61
Column totals: 150, 141, 82, 63, 49; n = 485

This gives χ² = 11.60 ≥ χ²(.05, 4) = 9.488. At level .05, the null hypothesis of independence is rejected, though it would not be rejected at level .01 (.01 < p-value < .025).

44. The given SPSS output reports the calculated χ² = 70.64156 and accompanying p-value (significance) of .0000. We reject Ho at any significance level. The data strongly supports that there are differences in perception of odors among the three areas.

45. (n1 − np10)² = (np10 − n1)² = (n − n1 − n(1 − p10))² = (n2 − np20)². Therefore
χ² = (n1 − np10)²/(np10) + (n2 − np20)²/(np20) = (n1 − np10)² · [1/(np10) + 1/(np20)] = n(p̂1 − p10)²/(p10·p20) = z².

46.
a. H0: the probabilities are as specified; Ha: the probabilities are not as specified. Test statistic:
χ² = (22 − 13.189)²/13.189 + (10 − 10)²/10 + (5 − 7.406)²/7.406 + (11 − 17.295)²/17.295 + … = 5.886 + 0 + .782 + 2.291 + … = 9.025.
Rejection region: χ² ≥ χ²(.05, 2) = 5.99. Since 9.025 > 5.99, we reject H0. The model postulated in the exercise is not a good fit.
b. We are testing the hypothesis H0: ρ = 0 vs. Ha: ρ ≠ 0. The test statistic is t = r√(n − 2)/√(1 − r²) = −.22√89 / √(1 − .22²) = −2.13. Even where H0 is rejected, r = −.22 means that, at best, there is evidence of only weak correlation; for the other reported correlations we would fail to reject and conclude that there is no evidence of non-zero correlation in the population.

47. Our hypotheses are H0: no difference in proportion of concussions among the three groups vs. Ha: there is a difference.

Observed (Concussion, No Concussion; total): Soccer 45, 46 (91); Non-soccer 28, 68 (96); Control 8, 45 (53); column totals 81, 159; n = 240
Expected: Soccer 30.7125, 60.2875; Non-soccer 32.4, 63.6; Control 17.8875, 35.1125

χ² = (45 − 30.7125)²/30.7125 + (46 − 60.2875)²/60.2875 + (28 − 32.4)²/32.4 + (68 − 63.6)²/63.6 + (8 − 17.8875)²/17.8875 + (45 − 35.1125)²/35.1125 = 19.1842.
The df for this test is (I − 1)(J − 1) = 2, so we reject H0 if χ² > χ²(.05, 2) = 5.99. Since 19.1842 > 5.99, we reject H0. There is a difference in the proportion of concussions based on whether a person plays soccer.

48.
a. We will test whether the average score on a controlled word association test is the same for soccer and non-soccer athletes: H0: µ1 = µ2 vs. Ha: µ1 ≠ µ2, using t = (x̄1 − x̄2)/√(s1²/m + s2²/n) with s1 = 3.206 and s2 = 1.854. The resulting p-value is large (> .10), so we do not reject H0 and conclude that there is no difference in the average score on the test for the two groups of athletes.
b. Our hypotheses for ANOVA are H0: all means are equal vs. Ha: not all means are equal. The summary computations are
SSTr = 91(.30 − .35)² + 96(.49 − .35)² + 53(.19 − .35)² = 3.4659, MSTr = 3.4659/2 = 1.73295,
SSE = 90s1² + 95s2² + 52s3² = 124.2873, MSE = 124.2873/237 = .5244,
and f = MSTr/MSE = 1.73295/.5244 = 3.30. The p-value is between .01 and .05, so at significance level .05 we reject the null hypothesis. There is sufficient evidence to conclude that there is a difference in the average number of prior non-soccer concussions between the three groups.
c. Ho: p0 = p1 = … = p9 = .10 vs. Ha: at least one pi ≠ .10, using only the first 100,000 digits in the expansion of π. Based on the resulting p-values, we could conclude that the digits of π behave as though they were randomly generated.
d. For groups of 5 digits, the number of p's in the hypothesis would be 10⁵ = 100,000 (the number of possible combinations of 5 digits), while the number of non-overlapping groups of 5 among the first 100,000 digits is only 20,000. We need a much larger sample size!
The test statistic is s + = sum of the ranks associated with the positive values of ( xi − 100) .3 -7. 2.2 2.2 3* xi 457 .4 7.8 1* 36. With n = 5 and α ≈ .4 5. 2 ranks xi ( xi − 100 ) We test 105.7 3.7 103.3 2* 21.13.3 92. H a : µ ≠ 100 .1 105 99.8 0.3 1. reject Ho if s + ≥ 15 .6 5* 26. or if s + ≤ − 64 = 78 − 64 = 14 .1 1. 025). H 0 : µ = 100 vs. and since 27 is neither ≥ 64 nor ≤ 14 .7 0. (from Table A.8 -3.6 -9.6 107. which is close to the desired 12(13) value of .03 .2 96.3 100. There is not enough evidence to suggest that the mean is something other than 100. H 0 : µ = 25 vs.1 5 -0.CHAPTER 15 Section 15. We test It is still plausible that the mean = 25. we do not reject Ho . From the table below we arrive at s + =1+5+2+3 = 11.8 -3.026 .9 96.05 if s + ≥ 64 . which is not ≥ 15 .6 7* 12 11 3 5 10 1* 6* 2 9* 4* 8 S+ = 27. with α / 2 = .6 90.5 91.6 11.9 91. 8 3* 1* 12 7 6 8 9* 4* 31. H a : µ ≠ 7.39 vs.7 27.22. We test .8 -21.3 2. -. Since s + = 18 ≤ 21 . -.6 30.1 25.6 26.01 Ho would be rejected. and 13.05. H a : µ D ≠ 0 . With n = 12 and α = .3 -2.9 2. Table A. H 0 : µ D = 0 vs.9 30. 4.07.05 .01.3 . so Ho cannot be rejected. -.5 -6.1 15. The ( xi − 7. The appropriate test is Ho if s+ ≤ H 0 : µ = 30 vs. With n = 15.8 0.17.01 .025 .04. and α = .2 -17. The data is paired. -. 5.25.11.2 8.1 2. and -. -.6 5 0. Ho is rejected at level .30. . .4 -3. 458 α = .06.10 .5 rank 1 10* 12* 3* 6* 5 11* 7* 2* 8* 4* 9* s + = 72 .9 -4.2 5* 15* 13 11 14 10 2* S+ = 39. -.9 1.13 indicates that Ho should be rejected if either s + ≥ 84or ≤ 21 . H 0 : µ = 7.38. the critical value .5 2. In fact for is c = 71. 2 xi ( xi − 30) ranks xi ( xi − 30) ranks 30. Ho should be rejected if either s + ≥ 64 or if s + ≤ 14 .Chapter 15: Distribution-Free Procedures 3. from which the ranks of the three positive differences are 1.2 12.9 . so Ho is rejected at level . so even at level .37.39) ’s are -. With n = 14 and α / 2 = .05.6 0.1 -14. 
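The signed-rank statistic s+ used throughout this section can be computed programmatically: rank the |xi − μ0| and sum the ranks attached to positive deviations. A minimal sketch (the data here are hypothetical, chosen only to illustrate the ranking step):

```python
# Wilcoxon signed-rank statistic s+: rank |x_i - mu0|, then sum the ranks
# belonging to observations above mu0.  Assumes no ties in |x_i - mu0|.
def s_plus(xs, mu0):
    diffs = sorted((abs(x - mu0), x - mu0 > 0) for x in xs)
    return sum(rank for rank, (_, pos) in enumerate(diffs, start=1) if pos)

print(s_plus([27.1, 25.4, 23.9, 26.3, 24.8], 25))  # hypothetical sample
```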
There is not enough evidence to prove that diagnostic time is less than 30 minutes at the 10% significance level.9 53.05. which is not ≤ 37 . 4.8 24. -. reject 15(16) − 83 = 37 . and we wish to test di -.1 0. H a : µ < 30 . so a two tailed test is appropriate.39 .6 1.8 3. -. -. and 72 ≥ 64 .2 1.2 -5.2 -1.4 35 30.44.5 23.8 .9 23.29. where µ D = µ outdoor − µ indoor .15 1.03 0.39 2.4 8.5 0.16 0.36 0.25 at significance level .46 23 4 31 1 18 14 30 9. and we conclude that the We wish to test greater illumination does decrease task completion time by more than 5 seconds.17 0.85 0. we can use the large sample test.09 -0. Ho will be rejected if s + ≥ 37 . The test statistic is we reject Ho if Z= s+ − n (n +1 ) 4 n ( n +1 )(2 n +1 ) 24 .37 0. so we can (barely) reject Ho at level approximately .17 0.9665 3132. where µ D = µ black − µ white .56 . 459 . we reject Ho 55.87 0.31 -0.62 3 4.25 -0.83 1.7 0.23 0.38 0.05 1. s + = 37 .22 -0.61 3* 4* 8* 1* 6 16.56 ≥ 1.42 0.15 0.62 8 9.31 0.07 -0.02 -0.96 0.22 0.67 0.89 2. With n = 9 and α ≈ .05 .2 rank di d i − .5 22 12 0.11 0.Chapter 15: Distribution-Free Procedures 6. and because n = 33.17 0.5 32 21 8 15 28 33 25 9.51 -0. α = .96 .2 rank 0.63 1.05.22 0. so z = 424 − 280.63 0.19 0.03 0.65 0.20 vs.5 0.2 rank di d i − .48 0. di d i − .19 0.02 0.89 -2.18 0. which is ≥ 37 .68 0.43 0.05 2 17 16 19 29 3 13 26 27 7 5.07 8.76 0 -0. As given in the table below.31 0.07 3.1 0.48 0.3 0.05 .1 -0.03 0.11 0.96 .06 -3.03 0.5 143.23 0. 7.3 -0.09 6. H 0 : µ D = 5 vs.11 0. di di −5 rank di di −5 rank 7.12 9* 5* 7* 2 H 0 : µ D = .28 -0.45 -0.05.13 0.5 = = 2.20 .5 11 20 24 s + = 434 .11 5.06 1.4 3.26 0. and z ≥ 1. H a : µ D > 5 . Since 2.88 11.01 0.09 1.2 -0.39 0.71 0. H a : µ D > . 14 gives the upper tail critical value for a level .1.7.042 while c = 2 implies α= 4 24 = . -.645 .16 .. . 20. 7.0427 . 1.5.05 test as 36 (reject Ho if W ≥ 36 ).1.5 z= = 1. . . 21.2.9. 24. 37.75 and σ = 37. H 0 : µ = 75 vs.g. -.5 − 162.645 . 
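The large-sample version of the signed-rank test standardizes s+ by its null mean n(n+1)/4 and variance n(n+1)(2n+1)/24. A sketch using s+ = 424 and n = 33 from the exercise above:

```python
import math

# Large-sample Wilcoxon signed-rank test: under H0,
# E(S+) = n(n+1)/4 and V(S+) = n(n+1)(2n+1)/24.
def signed_rank_z(s_plus, n):
    mean = n * (n + 1) / 4
    var = n * (n + 1) * (2 * n + 1) / 24
    return (s_plus - mean) / math.sqrt(var)

z = signed_rank_z(424, 33)
print(round(z, 2))  # about 2.56 >= 1.96, so reject H0 at alpha = .05
```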
r1 r2 r3 r4 D 1 2 3 4 0 1 2 4 3 2 1 3 2 4 2 1 3 4 2 6 1 4 2 3 6 1 4 3 2 8 2 1 3 4 2 2 1 4 3 4 2 3 1 4 6 2 3 4 1 12 2 4 1 3 10 2 4 3 1 14 r1 r2 r3 r4 D 3 1 2 4 6 3 1 4 2 10 3 2 1 4 8 3 2 4 1 14 3 4 1 2 16 3 4 2 1 18 4 1 2 3 12 4 1 3 2 14 4 2 1 3 14 4 2 3 1 18 4 3 1 2 18 4 3 2 1 20 When Ho is true. 3.2 10. Since 1. Section 15. 7. 3. 12. and 25. so Ho will be rejected at level . 4.9. 9.9.16 p − value ≈ 1 − Φ (1. With m = n = 5.5.1. Since n = 25 the large sample approximation is used. so w = 5 + 6 + 8 + 9 + 10 = 38. 286(x). each of the above 24 rank sequences is equally likely. 250(x). 245(x). 229(x). P(D = 2) = P( 1243 1 or 1324 or 2134) = 3/24).7. Ho is rejected. so 25(26 )(51) 4(1)(2)(3) σ s2+ = − = 1381. 4. which yields the distribution of D when Ho is true as described in the answer section (e. 1. Table A. 8.7. 5.4. The ( xi − 75)' s are We wish to test –5. 14.7. 8.72 ) = . H a : µ > 75 . Ho is rejected in favor of Ha.5.05 if z ≥ 1. 179(y).5.50 = 1380. 213(y).5. 23.72 ≥ 1. 2. 1. 17. Since 38 ≥ 36 . . so s + = 226.7. 247(y).167 .5. 2. The data indicates that true average toughness of the 16.5. 1.8.25 − .2.Chapter 15: Distribution-Free Procedures 8.8.2) 4 2 for σ should be used (because of the ties): τ 1 = τ 2 = τ 3 = τ 4 = 2 .2. The ordered combined sample is 163(y). The ranks of the positive differences are 1. and 299(x). 19. n(n + 1) = 162. Then c = 0 yields α = 24 = . 1. 4.1.5 and steel does exceed 75. -1. Thus 24 48 226. 225(y). -3. 8.5 .72 .5. -2.0. -1. 2. Expression (15. and 18.3. 460 .9.6. At level . 18.05. 17.6 21. so 138.58 or z ≤ −2. x 8.6 rank 1 4 5 6 8 10 15 16 y 3.3 5. and 20. Ho is not 174. H 0 : µ1 − µ 2 = 0 will be rejected at level .0 rank 2 3 7 9 11 12 13 14 Since w = 65.0 14.40). With τ1 = 3.8 4.2 9.58 . H a : µ 1 − µ 2 ≠ 0 . we wish to test H 0 : µ1 − µ 2 = 0 vs.4 7.58 nor ≤ −2. 663.4 4.22 .0 11.0219[2(3)(4) + 1(2 )(3) + 1(2 )(3)] = 174.27 .8 6.52). Ho cannot be rejected. x– 1 3.27)) = .Chapter 15: Distribution-Free Procedures 11. 
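The rank-sum statistic w is just the sum of the x ranks in the pooled sample. A sketch using the toughness data above (no ties, so the ranks are unambiguous):

```python
# Wilcoxon rank-sum: rank the combined sample, then sum the ranks of the x's.
# x and y are the two toughness samples from the ordered combined list above.
x = [229, 245, 250, 286, 299]
y = [163, 179, 213, 225, 247]

combined = sorted(x + y)
ranks = {v: i + 1 for i, v in enumerate(combined)}  # no ties in this data
w = sum(ranks[v] for v in x)
print(w)  # 38 >= 36 (upper critical value for m = n = 5), so reject H0
```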
where 1(X) refers to the original process and 2 (Y) to the new process.8 6.54 is neither ≥ 2. 9.98). From Table A. τ 2 = 2 . so w = 135. τ 3 = 2 .0 7. p − value ≈ 2(1 − Φ (2.14 with m = 6 and n = 8. Since 37 is neither ≥ 61 nor ≤ 29 .6).2 5.0 4.1 17.7 5.5 − 105 z= = 2.5 5.27 is neither 12 13. 11. 4 (for .3 9.5 14.5 The denominator of z must now be computed according to (15. Thus 1 must be subtracted from each xI before pooling and ranking. 461 .5 rank 1 2 3 4 5 6 9 12.0 7. The hypotheses of interest are H 0 : µ 1 − µ 2 = 1 vs.5 10. 19.0232.5 11.5 7. 8 (for 1. Here m = n = 10 > 8. 16.20).5 rank 7 9 9 11 12. 12.21 2 rejected. so w = 37. Because 2.58 .5 9. Ho is not rejected. 10. Ho is not rejected.73). so we use the large-sample test statistic from p. 8. 13. 7 (for 1.5 4. Ho should be rejected in favor of Ha if w ≥ 84 .7 10.54 .58 nor ≤ −2.042).6 7.1 4. ≠ 0 if either Identifying X with orange juice.22 ≥ 2. H a : µ 1 − µ 2 > 1 .5 16 17 18 19 20 y 4. With 14.58 . The X ranks are 3 (for .5 14. Ho is rejected in favor of Ha at level .9 5.021) = . 5 (for 1.7 5. m(m + n + 1) = 105 and 2 mn(m + n + 1) 135 − 105 = 175 = 13. Because 2. z = = 2. the X ranks are 7.2 16.33).01 in favor of H a : µ 1 − µ 2 z ≥ 2. With X identified with pine (corresponding to the smaller sample size) and Y with oak.5 9. and 10 (for 1.5 15.21 .2 5.05 if either w ≥ 61 or if w ≤ 90 − 61 = 29 (the actual α is 2(. σ = 175 − . 7.75 rank 2 8 1 3 6 5 4 12 Y 1.0 + 11.I.8 ). H a : µ 1 − µ 2 < −25 . 23.25 3. Ho will be rejected at level .6.58 1.0 + 12.8 5. and 136. 39. = 8. 68. b.22 rank 9 7 11 10 16 15 14 13 We verify that w = sum of the ranks of the x’s = 41. 2 2 and the 5 largest averages are 30. With m = 7. 462 . There is evidence that the distribution of good visibility response time is to the left (or lower than) that response time with poor visibility.2 5.60 . It is easily verified that the 5 smallest pairwise averages are ( ) ( ) 5 .00 .3 11.80.0027. 45. a 95% C. We are testing H 0 : µ1 − µ 2 = 0 vs. 
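For m, n > 8 the rank-sum statistic is standardized by its null moments m(m+n+1)/2 and mn(m+n+1)/12. A sketch without the tie correction applied in the exercise (w = 135.5 and m = n = 10 are the values from above):

```python
import math

# Large-sample Wilcoxon rank-sum test, ignoring ties: under H0,
# E(W) = m(m+n+1)/2 and V(W) = mn(m+n+1)/12.
def rank_sum_z(w, m, n):
    mean = m * (m + n + 1) / 2
    var = m * n * (m + n + 1) / 12
    return (w - mean) / math.sqrt(var)

z = rank_sum_z(135.5, 10, 10)
print(round(z, 2))  # about 2.31 without the tie correction
```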
The hypotheses of interest are H 0 : µ1 − µ 2 = −25 vs. 4. and 2 2 2 2 5. from which w = 1 + 3 + … + 12 = 39. H a : µ 1 − µ 2 < 0 . X 0. and 23. 6.33 4.17 0. giving 33. x( 32) .47 0. 16. -25 is subtracted from each xI (i. The corresponding ranks in the combined set of 15 observations are 1. The true average level for exposed infants appears to exceed that for unexposed infants by more than 25 (note that Ho would not be rejected using level .53 4.80). Before ranking.5%) has the form x (36−32+1) . 8.01 so we reject Ho .43 1.15 (the smallest average not involving 5. x( 32) = x(5 ) .00 .05 if w ≤ 7(7 + 8 + 1) − 71 = 41 .01). Section 15. µ 1 and µ 2 denote true average cotanine levels in unexposed and exposed infants.0 is x(6 ) = = 11. 3.68 0. so the confidence interval is (11.15. respectively. Ho is rejected.95. 25 is Let added to each).8 1.3 17.23 3. The reported p-value (significance) is .Chapter 15: Distribution-Free Procedures 15. (actually 94. which is < .0 + 17.0 + 5 .8 + 11 . so from Table A.8 = 11.e.58 0.0 + 17. 36. 26. 24.0 5.5 2. 23. Because 39 ≤ 41 . = 8. and 12. n = 8. 37.0 = 5.15.40 . n = 8.0. = 11. 5.47 0. a.37 0. 2 x (13) = 19. with n = 5 and n(n + 1) = 10 . d ij( 21) . and -.I. Subtracting 7 from each xI and multiplying by 100 (to simplify the arithmetic) yields the ordered values –5. 123.79. 86 (construct a table like Table 15.13 shows that a two tailed test can be carried out at level . 4. -. x (10) ) = (. The smallest average is clearly = −13 2 The ordered d i ’s are –13. 10. 45. 34. 16 while the five ( ) largest differences are 136. is (-13. 32. 91. 9. 1. With n = 14 and (x ( 13 ) n(n + 1) = 105 . 2 m = n = 5 and from Table A.5%) interval is d ij(5) .250 (or. -7. mn = 48. 2. 9. and 19 ( so 14.86 . the 94% C.79. -.52 . of course even higher levels).85. Table A. 117.095) while the 13 largest sums are 154. n(n + 1) = 15 .67 = . 12. is thus (7. or a 75% C.52 .I.I. -3. 4. 3.16.095.045.92. 463 .85.16 a 99% interval (actually 99. 28.I..67 = . -6). 35. 
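The signed-rank confidence interval is built from the ordered pairwise (Walsh) averages, trimming c − 1 of them from each end. A sketch with hypothetical data; only the counting logic matters here:

```python
from itertools import combinations_with_replacement

# Signed-rank CI for the median: order the n(n+1)/2 Walsh averages
# (x_i + x_j)/2 with i <= j and keep the middle block determined by c.
def walsh_interval(data, c):
    avgs = sorted((a + b) / 2 for a, b in combinations_with_replacement(data, 2))
    return avgs[c - 1], avgs[len(avgs) - c]

data = [5.0, 11.3, 12.2, 16.9, 24.7]   # hypothetical sample, n = 5
lo, hi = walsh_interval(data, 1)       # c = 1 keeps all 15 averages
print(lo, hi)                          # smallest and largest Walsh averages
```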
( 22.33 .82.83.124 or at level . 40.6% interval is (x (1) . . so the desired interval is 16. 14.177 ) .40 . and 86 ( so x (93) = = 7. (d ( ) .48 = . -12.15 shows 2 − 13 − 13 .. ) m = 6. n = 8. 111. 7. 99.5). d ( ) ) 1 15 − 6 −6 = −6 . and from Table A. 2 For n = 4 Table A. 2 14. 11. from Table A. 17. and 1.430) . -6.48 = . 120. The 13 smallest sums are –10.2%) requires c = 44 and the interval is (d ( ) .Chapter 15: Distribution-Free Procedures 18. 112.6% C. -. 12. 5. d ( ) ) .. while the five smallest are –1. -11. as (since c = 1) while the largest is 20.15 we se that c = 93 and the 99% interval is 2 . 122. 22. x(93) ) .73. 18. 4. The five smallest x i − y j differences are –18. 1. so the confidence interval for µ1 − µ 2 (where µ 1 refers to pine and µ 2 refers to oak) is (-..04. -2. 1. so we can obtain either an 87. so the C.48 = 1.73). 17. 107.19 = 7. The desired C.40 . The five largest x ij 5 ij 44 i − y j ’s are 1. c = 21 and the 90% (actually 90.04. 109..430). and 77. 16.86 87..I. With 21.99. the 87. = 176 r4. 25. = 160 r3. = 68 r3. 3 = 7. 20. Below we record in parentheses beside each observation the rank of that observation in the combined sample.251 .5(13) 1. = 85 k ≥ χ .205. 2.587 is not ≥ 7.7(10) 2: 7. 8.9(4) 6. 3 = 6. The 12 104 2 + 160 2 + 176 2 + 226 2  computed k is k =  − 3(37 ) = 7.10 if r1.992 .2(17) 3: 5.3(15) 11.23 . 10. 19.210. After ordering the 9 observation within each sample. 9.6(8) 8. H 0 : µ1 = µ 2 = µ 3 will be rejected at level . Ho cannot be rejected.06 ≥ 6.7(2) 5. and 94. the ranks in the combined sample are 1: 1 2 3 7 8 16 18 22 27 r1. 21. so the rank totals are 69.5(16) 11.992 . 1: 5.0.8(3) 6. The computed value of k is 12  312 + 68 2 + 26 2 + 85 2  k=  − 3(21) = 14.815 .251 . 13. 5. = 104 2: 4 5 6 11 12 21 31 34 36 3: 9 10 13 14 15 19 28 33 35 4: 17 20 23 24 25 26 29 30 32 At level . = 226 H 0 : µ1 = µ 2 = µ 3 = µ 4 will be rejected if k ≥ χ . 16.1(5) 6.4 23. r2.9(14) 10. = 26 r4. 3.05 if k ≥ χ .4(6) 6. 15. 
Since 9.4(20) Ho will be rejected at level . k= 12  69 2 90 2 94 2  + + − 3(23) = 9. 12. 24.23 ≥ 5. Since 36(37 )  5  7. 7.815 . 18 for the second.06 . reject 20(21)  5  Ho . 14 for the first sample.1(19) 12.5(7) 7. 90. = 31 r2.8(12) 9.205. 4. 11. we reject Ho .191) 5. 22 for the third.Chapter 15: Distribution-Free Procedures Section 15.2(11) 4: 9. The ranks are 1. 2 = 5.05. 22(23)  10 6 5  464 .1(9) 8. 17. Since 14. 6.7(18) 12.587 . . 1. Because 24 is not in the rejection region.92 . . so s + = 1 + 2 + 3 + 5 + 6 + 7 = 24 . -. 3 = 11.992 .11 8(9 ) with n = 8. The d i ' s are (in 2 order of magnitude) .56.205. which is not 10(3)(4) ≥ χ .01. H 0 : µ D ≠ 0 . 1 2 3 4 5 6 7 8 9 10 ri ri 2 A 2 2 2 2 2 2 2 2 2 2 20 400 B 1 1 1 1 1 1 1 1 1 1 10 100 C 4 4 4 4 3 4 4 4 4 4 39 1521 D 3 3 3 3 4 3 3 3 3 3 31 961 2982 The computed value of Fr is 12 (2982) − 3(10 )(5) = 28.60 . Supplementary Exercises 28.344 .96. which is 4(10)(5) ≥ χ . and –1. From Table A.60. Ho will be rejected if either s + ≥ 32 or s + ≤ − 32 = 4 . where µ D = the difference between expected rate for a potato diet and a rice diet.16. . 27.25.201. 1 2 3 4 5 6 7 8 9 10 ri ri 2 I 1 2 3 3 2 1 1 3 1 2 19 361 H 2 1 1 2 1 2 2 1 2 3 17 289 C 3 3 2 1 3 3 3 2 3 1 24 576 1226 The computed value of Fr is 12 (1226) − 3(10)(4 ) = 2.18.24. 465 . 2 = 5. The Wilcoxon signed-rank test will be used to test H 0 : µ D = 0 vs.Chapter 15: Distribution-Free Procedures 26. Ho is not rejected. so don’t reject Ho . . so Ho is rejected. = 28 .7. = 29 .344 Treatment k= 12 420 H 0 : µ1 = µ 2 = µ 3 = µ 4 .67 . so mn − c + 1 = 25 − 22 = 1 = 4 . -3.344 . The Kruskal-Wallis test is appropriate for testing rejected at significance level . -3. -3.05. At level . From Table A. Ho will be rejected if f r ≥ χ . from which the defining formula gives f r = 9.I.815 .5. H 0 : α 1 = α 2 = α 3 = α 4 = 0 is rejected. reject   − 63 = 17 . -3. m = n = 5 implies that c = 22 for a confidence level of 95%.815 . 466 .16. -6. 
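The Kruskal-Wallis statistic depends only on the rank totals and group sizes. A sketch using the rank totals 104, 160, 176, and 226 from the four samples of size 9 above:

```python
# Kruskal-Wallis statistic: K = 12/(N(N+1)) * sum(r_i^2 / n_i) - 3(N+1).
def kruskal_wallis_k(rank_sums, group_sizes):
    n = sum(group_sizes)
    s = sum(r * r / m for r, m in zip(rank_sums, group_sizes))
    return 12 / (n * (n + 1)) * s - 3 * (n + 1)

k = kruskal_wallis_k([104, 160, 176, 226], [9, 9, 9, 9])
print(round(k, 3))  # about 7.587, which is not >= 7.815 (df = 3): don't reject
```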
Thus the confidence interval extends from the 4th smallest difference to the 4th largest difference. 3 = 7.201.8).9. = 16 .9. 3 = 11.86 .86 ≥ 11. -5. Because f r ≥ 7.1.8.Chapter 15: Distribution-Free Procedures 29. r4. and we conclude that there are It is easily verified that effects due to different years.01 if k ≥ χ . r2. r1.2. Friedman’s test is appropriate here. so the C. 5 Ho . 30.62 and the computing formula gives f r = 9. is (-5. and the 4 largest are –3. -6.1. = 17 . Ho will be ri ranks I 4 1 2 3 5 15 II 8 7 10 6 9 40 III 11 15 14 12 13 65 IV 16 20 19 17 18 90  225 + 1600 + 4225 + 8100  Because 17.4.205. r3. 31. The 4 smallest differences are –7. 24 1.07 .59 -.54 . a.85 . 50% of the distribution should be above 25.15 -.12 -. From the same binomial table as in a.27 -.51 . . giving (-. H 0 : µ1 − µ 2 = 0 will be rejected in favor of H a : µ 1 − µ 2 ≠ 0 if either w ≥ 56 or w ≤ 6(6 + 7 + 1) − 56 = 28 .45 1.058 (a close as we can get to . There appears to be no difference between b. c = 35 and Lateral Gait 1.Chapter 15: Distribution-Free Procedures 32.02 -.66 1.53 1.14 -.05).57 -.41. a.04 -.66 .09 1.I.64 1. we don’t µ 1 and µ 2 . 467 so .19 -.39 Gait D L L L D D obs 1.64 .35 -.13 -.50.79 . From the Binomial Tables (Table A. Because 43 is neither reject Ho .26 .51 From Table A.31 .85 1.66 1.86 1.942 = .80 -.43 .01 -.82 w = 1 + 4 + 5 + 8 + 12 + 13 = 43 .27 1.96 .31 1. we find that P (Y ≥ 14 ) = 1 − P (Y ≤ 13 ) = 1 − .29 .53 1.29) as the C.979 = .41 -. ≥ 56 nor ≤ 28 .36 -. and since 12 is not ≥ 14 . For this data. With “success” as defined.50.51 1.29 . b. thus p = .37 .31 1.08 -.27 .82 .18 -.46 .68 . 021 . Y = (the number of observations in the sample that exceed 25) = 12. 33.39 1. we see that α = P (Y ≥ 15 ) = 1 − P (Y ≤ 14 ) = 1 − .09 1. Gait D L L D D L L Obs .24 . Differences Diagonal gait .16.27 1. then Y is a binomial with n = 20.06 -.45 1.058 if Y ≥ 14 .86 1. we would reject Ho at level .18 mn − c + 1 = 8 .40 .15 -. c = 14.24 -. 
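Friedman's statistic likewise depends only on the treatment rank totals. A sketch for the 4 treatments and 10 blocks tabulated above (rank totals 20, 10, 39, 31):

```python
# Friedman statistic from within-block rank sums:
# Fr = 12/(n*k*(k+1)) * sum(r_j^2) - 3*n*(k+1), for n blocks and k treatments.
def friedman_fr(rank_sums, n_blocks):
    k = len(rank_sums)
    return 12 / (n_blocks * k * (k + 1)) * sum(r * r for r in rank_sums) \
        - 3 * n_blocks * (k + 1)

fr = friedman_fr([20, 10, 39, 31], 10)
print(round(fr, 2))  # 28.92 >= 7.815, so reject H0
```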
we fail to reject Ho .73 .24 1.15 .06 -.38 -. To determine the binomial proportion “p” we realize that since 25 is the hypothesized median.1) with n = 20 and p = . the critical value for the upper-tailed test is (Table A. 468 . 4 = 4.3 4. Using the same logic as in Exercise 33. The null hypothesis will not be rejected if the median is between the 6th smallest observation in the data set and the 6th largest. (If the median is less than or equal to 14.) Thus with a confidence level of 95. and P (Y ≥ 15 ) = . so the significance level is α b. we have 5 or less observations above. 6 = 2. Sample: Observations: Rank: y x y y x x x y y 3.0 4.021 .4. if any value at least 41. m = 4. 3 = 3. n = 5) c = 27 ( α = .056 ).g.7 4.1 5. P (Y ≤ 5 ) = .Chapter 15: Distribution-Free Procedures 34.042 .021 . X ordering 123 124 125 126 127 134 135 136 137 145 146 147 ranks w’ 123 124 123 122 121 134 133 132 131 143 142 141 6 7 6 5 4 8 7 6 5 8 7 6 X ordering 156 157 167 234 235 236 237 245 246 247 256 257 ranks w’ 132 131 121 234 233 232 231 243 242 241 232 231 66 5 4 9 8 7 6 9 8 7 7 6 X ordering 267 345 346 347 356 357 367 456 457 467 567 ranks w’ 221 343 342 341 332 331 321 432 431 421 321 5 10 9 8 8 7 6 9 8 7 6 Since when Ho is true the probability of any particular ordering is 1/35.05. 5 = 3. Ho cannot be rejected at level . 3. and 4. 7 = 1 (so e. 2 = 2. a. 35. At level . 2. The only possible ranks now are 1. exclusive.9 5.4 and 41. 36. the X ordering 256 corresponds to ranks 2. 2).1 4. then there are at least 15 observations above. and we reject Ho .5 is chosen.14. Similarly.4 4. = .8% the median will fall between 14. we easily obtain the null distribution and critical values given in the answer section.8 4. 3. Since 26 is not ≥ 27 .6 1 3 5 7 9 8 6 4 2 The value of W’ for this data is w′ = 3 + 6 + 8 + 9 = 26 .5. Each rank triple is obtained from the corresponding X ordering by the “code” 1 = 1.05. 6808 = 1 − P − 3 + n < Z < 3 + n = 1 − P(− . P(− c ≤ Z ≤ c ) = .8993. 
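The sign-test critical values come from the Binomial(n, ½) null distribution. A sketch reproducing the α ≈ .058 calculation for n = 20, reject if Y ≥ 14:

```python
from math import comb

# Sign test: under H0 the count of observations above the hypothesized
# median is Binomial(n, 1/2).
def binom_cdf(k, n, p=0.5):
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

alpha = 1 - binom_cdf(13, 20)   # P(Y >= 14) under H0
print(round(alpha, 3))          # about .058, close to the target .05
# The observed y = 12 is below 14, so H0 is not rejected.
```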
All ten values of the quality statistic are between the two control limits. P(point falls outside the limits when µ = µ 0 + . but (.5) than by an ordinary chart.8993 = .882) = 1 − . 3. ( 1− P − 3 − 2 n < Z < 3 − 2 469 .5σ )  3σ 3σ  = 1 − P  µ 0 − < X < µ0 + whenµ = µ 0 + .24) = .5σ  n n   = 1 − P − 3 − .12 < Z < 1.  3σ 3σ  1 − P µ 0 − < X < µ0 + whenµ = µ 0 − σ  n n   ( ) n ) = 1 − P(− 7. taken together. Such a “small” change is more easily detected by a CUSUM procedure (see section 16.3 then gives c = 2. The 2 appropriate control limits are therefore µ ± 2. a standard normal random variable.2 4.2236 c. P(10 successive points inside the limits) = P(1st inside) x P(2nd inside) x…x P(10th inside) = (. it is readily verified that all but one plotted point fall below the center line (at height . b. However.998)10 = . (.81σ . the observed values do suggest that there may be a decrease in the average value of the quality statistic.81.47 < Z < −1. Thus even though no single point generates an out-of-control signal. 2. a.998)25 = . Table A.9802. P(25 successive points inside the limits) = (.1007 > . 5.10.998)52 = .04975)..76 < Z < 5.9512.995 + .998)53 = . so for 53 successive points the probability that at least one will fall outside the control limits when the process is in control is 1 . For Z.0301 .005 = .5 n ( ) = 1 − P(− 4.9011. All ten values are between the two control limits.995 implies that Φ (c ) = P (Z ≤ c ) = . Section 16.9699 = .CHAPTER 16 Section 16.9975 . so no out-of-control signal is generated.47) = .1 1.5 n < Z < 3 − . 526 12. . The value of x on the 22nd day lies .91.20.13.47 ± 3 = 96. 2317.70 .80 .81σ 2.952 6 Now x= between these limits.47 ± 1. 9.00 ± .81 < Z < 2. 5 Every one of the 22 x values is well within these limits. 8.005 and ARL = = 200 . 2. yielding the control limits 1.325 .77 = 12. and a 6 = .17 .264 .  2. so the process appears to be in control with respect to location.95 ± .526 . 2317. 
the control limits are .00 ± (3)(.98.54 ± 1.13. s = 1.005 470 . All 23 remaining x values are .250 96.81σ  P µ 0 − < X < µ0 + whenµ = µ 0  n n   = P(− 2.60 = 96.95 ± 3 = 12.264 96.54 ± 3 = 96.6) = 13.08 . All points are between these limits. giving the control limits 24 1. every point ( x ) is between .336 b5 = 2.98. r = 1. from which LCL = 12. Again.80. so with a 5 = .86.61 = 94. so the process appears to be out of control at that time.Chapter 16: Quality Control Methods 6.940 . so the probability that a point falls outside the limits 1 is .20 and UCL = 13. x = 12.18.95 and s = . 11. so there is no evidence of an out-of-control process. 10.995 .34 − 1.336 12. so no further out-of-control signals are generated.325 5 and so the process again appears to be in control with respect to location. a.34 30.81) = . 7. The limits are 13.75 = 12.95 ± .952 6 x= above the UCL.07 − 98.250 . giving the limits 23 23 1.940 5 these limits.63 = 94.72 .47 and s = = 1.54 .95 ± 3 = 12.07 = 96.952 . 78 .13.20 13 X=12.Chapter 16: Quality Control Methods b.13.0SL=13. No points fall outside the 2-sigma limits.25 = 12. 13).50 = 12.95 .81 − n < Z < 2.20 . .0SL= 12.70.0026 µ = µ 0 + σ . Thus ARL = c.1587 1 .45 = 12.20 12 0 10 20 Sample Number The 3-sigma control limits are from problem 7. P = P(a point is outside the limits)  2. .95 ± .53 . 12.70 Sample Mean 2.209 1 = 385 for an in-control process.0SL= 12.1587 .37.45..0SL= 12.81 < Z < .81 − n ( ) = 1 − P(− 4.70 -2. There are also no runs of eight on the same side of the center line – the longest run on the same side of the center line is four (the points at times 10.81) = 1 − . 13. and the 1-sigma limits are 12.4273. The 2-sigma limits are 12.13. 1 = 4.9974 = .95 ± . and only two points fall outside the 1-sigma limits.0SL=13. the probability of an out-of-control point is 1 − P( −3 − 2 < Z < 1) 1 = 1 − P( Z < 1) = . IQR = .95 -1. 14 3.0SL=13. so ARL = = 6.4273 12. 
No out-of-control signals result from application of the supplemental rules.81σ  = 1 − P µ 0 − < X < µ0 + whenµ = µ 0 + σ  n n   = 1 − P − 2.30 .95 ± 3 = 12.791 = .990 5 471 .209 .990 . and when .0026 so ARL = 12. x = 12.13.45 -3.81σ 2.45 1. k 5 = .45 .45. The control limits are . 11. 48 .65) = 2.880)(2.940 .54 .484 .54) = 3.75)2 = .06 = .940 = .5632 = 1. a.2642 ± 1.2040) 1 − (.60 .434. so all si ' s are between the control limits. b4 = 2.723 . The smallest s I is s 20 = . b8 = 2. the lower control limit is zero 24 3(.5172 + .940 ) 2 and the upper limit is . The largest s I is s 9 = . = 1. so all points fall between the control limits. 39. LCL = 0 (since n = 5) and UCL = 3(. The process appears to be in control with respect to variability.058 r = 3.5172) 1 − (. so every value is between .54 ± = 3.48.2 r = 30 = 2.045 and 2.64 = 6. Every s I is between these limits.Chapter 16: Quality Control Methods Section 16. 15. Σ si2 = 39. With a 5 = .844 .515) = 6. and c 4 = . 2. LCL = 0 and UCL 3(. 18. 2.070 . = 2. s = 1. 16. 2.963. and the control limits are 1. and the largest is s 12 = 1.2040 + . so LCL = 24 5 (1.940 ) 2 .84 .844 s = .5625 and UCL = 20 5 2 2 2 and the largest is s12 = (1.3 14.65.058 . The smallest s 2 value is s 2 = (.84 + b.6664 .4261 .5172 .045.2642 . Σ si = 4.895 and s = 4.84 + 3. so the process appears to be in control with respect to variability.952 2 = 1.940 = .84 ) = 2. 85. 17.2642 ± 3(1.9944 and s 2 = 472 .6.6664)(.895 = .820)(3. a 5 = .75. a 6 = .0804 .2221 = .820 . and c8 = .6664)(20.940 .2040 + .880 .2040 . Since n = 4.952 .837 .210 ) = .5172 + .2642 ) 1 − (.9944 (1.54 ± 3. and the control limits are 3(.2194 = .952) . The smallest x i is 200 7 x 7 = 7 . pˆ i x x x + . which is between the limits.. 13 39 = . Thus k n n n 100 5.e. + x k 578 ˆ i = 1 + .390 .9055) Σ xi 567 = = .231 ± 3 = .0324..4 19. This is the only such signal.0945)(. + k = 1 where Σ p = = 5. p (1 − p ) 2 .1500 < UCL .130 .0621 = . 
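The ARL computations above follow from the geometric distribution of run length: ARL = 1/p, where p is the per-point probability of plotting outside the limits. A sketch (the text rounds Φ(3) to .9987, which gives 385 rather than about 370):

```python
import math

# Average run length for a 3-sigma chart: run length is geometric, ARL = 1/p.
def norm_sf(z):
    """P(Z > z) for a standard normal, via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2))

p0 = 2 * norm_sf(3)                    # in control
p1 = norm_sf(3 - 2) + norm_sf(3 + 2)   # mean shifted by 2 standard errors
print(round(1 / p0, 1), round(1 / p1, 2))  # about 370.4 and 6.30
```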
(after squaring both sides) 50 p > 3 p (1 − p ) . but = .105. This (barely) exceeds the LCL..0945 ± .126 = . The control limits are . Thus pˆ 5 > UCL = . 100 p=Σ b.e.0945 . 20.1566 . with 30 pˆ 25 = = .0566 . from which p = .78 . 200 21.. with pˆ 7 = = . i. n 3 50 p > 3(1 − p ) .0945 ± 3 (.1566 . which exceeds the upper 100 100 control limit and therefore generates an out-of-control signal. i.357 . The largest x i is 200 37 x 5 = 37 .Chapter 16: Quality Control Methods Section 16..e. 53 LCL > 0 when p>3 473 .769) a. 25 (.231)(.. 53 p > 3 ⇒ p = = .231 ± . so an out-of-control 200 signal is generated.78 p= = . since the next largest x i is x 25 = 30 . i. The control limits are nk (200)(30 ) = . with pˆ 5 = = .231 .0350 .185 . Σ xi = 567 . 3218 0. LCL = 0.50.4.3081 ± .3537 0.3081 ± 3 1 800 = . gi 3. the ui ' s are 3. LCL = 0. and 1.0.75. UCL = 13.3047 0.00.3218 0.00. 3. 23. UCL = 14.857 . 2.0933 .67. gi u ±3 u = 5.10.5125 ± 7.5125 .00.4446 0.4142 . xi . 12.. gi ui = u ±3 u = 5.3537 0. Thus LCL = 0 and UCL = 10.3537 ( ) 0.3081 .06 ≈ (− 2.3835 Σ yi = 9.25. 4.3300 0.0436 . so the process is judged to be in control.2367 0.2958 0.3133 0.33.6 .50.2868 0. 30 are The suggested transformation is Y ( ) ( ) 0.2255 (in n = sin 4n −1 xi radians).75.8 . 12.3977 0.3047 0. Several ui ' s are close to the gi corresponding UCL’s but none exceed them.0 . x − 3 x < 0 is equivalent to 25. sin . i.00. the process is judged to be in control.2020. In contrast ot the result of exercise 20.2678 0. giving u = 5. 474 .2475 0.3614 These give y±3 1 4n 0. 2.75.08 ± 6. 3. 8.00. u For g i = . there I snow one point below the LCL (.2774 0.08 .33. ( ) = h ( X ) = sin −1 X n . UCL = 12. 5.33. 5. and x ± 3 x = 4.050 = .2367 0.3381 0. 6. 1. …. 6.2437 and y = .1091 = . Because no x i exceeds 10.6. 24. x < 9 .2475 0. u ± 3 = 5.1.2958 0. 5.67 for I = 1. …. 12.3047 0.2255 0.00. LCL = 0.75. 20. With x < 3 .2020) as well as one point above the UCL. For g i = 1. and the values of y i = sin n for i = 1.6. x = 4. 
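The p-chart limits can be checked directly. A sketch using p̄ = .0945 with subgroup size n = 100 from the exercise above:

```python
import math

# p-chart limits: p_bar +/- 3*sqrt(p_bar*(1 - p_bar)/n), LCL floored at 0.
def p_chart_limits(p_bar, n):
    half = 3 * math.sqrt(p_bar * (1 - p_bar) / n)
    return max(0.0, p_bar - half), p_bar + half

lcl, ucl = p_chart_limits(0.0945, 100)
print(round(lcl, 4), round(ucl, 4))  # about .0067 and .1823
```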
3.2958 ( ) 0. 3.1882 < . 6.2774 0.e. For g i = .5125 ± 7.3906 0. Σ xi = 102 .5125 ± 9.1.67.00.Chapter 16: Quality Control Methods 22.1) .1882 0. 3. with approximate mean value 1 −1 −1 x1 sin −1 p and approximate variance .2868 0. The control limits are = . 46.Chapter 16: Quality Control Methods 26.05 x i − 15 . 4.115 -0. 3.29. d i = max (0.20 or that e r > . and 4. 2. 5.20 .082 0.003 0 0 0 0. 4.934 ± 3 = .024 -0.010 0 0 0 0. 2.934. 5.110 0.038 0. 5.00.020 0. Since every yi is well within these limits it appears that the process is in control.29.6.079 -0. Thus y ± 3 = 3.068 -0.83.124 0. k = 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 -0.00.088 0.080 0.101 0.20 .018 -0.46.46.05 .151 -0.5 27. from which Σ yi = 98.024 0.46.95 )) .138 -0.05)) .00. 4.105 0 0 0 0. 4.00. 4.090 0.66.005 0.016 -0. so no out-of-control signals are 475 . h = .001 0. 2.012 0.116 -0. ∆ = 0. 4. 4.934 .010 -0.021 -0. 2 ei = max (0.054 0.015 0. d i −1 + ( x i − 16. 2.010 0.95 di ei µ 0 = 16 . 4. 3. i x i − 16 . 5. 3.83.058 0. y i = 2 xi and the y i ' s are 3/46.29.00.47.934 .00.032 -0.005 For no time r is it the case that generated.001 0.017 0 0 0. 3. 25. 4. ….83.024 0.054 0 0 0 0. ei−1 + ( xi − 15.90.015 0 0 0 d r > .00.83.35 and y = 3.038 0 0 0 0. 0 0. Section 16.47.90 for I = 1. 2. 0008 -. 2 ei = max (0.0013 -.005  h =  (2.0095 .0039 -.0045 > .0092 .0045* . h = .0009 0 0 0 0 0 0 0 0 .005 / n n = 2.8 ) =  (2.8.0022 -.0013 .0048 -.87.0018 -. ei−1 + ( x i − .0020 .0014 . Then connecting .0006 0 0 0 0 0 0 0 0 0 0 0 .0020 -.0037 .0017 .75.0002 0 0 .0021 -.00626 .0018 . d i −1 + ( x i − .8) = . Connecting 600 on the in-control ARL scale to 4 on the out-of-control scale and extending to the k’ scale gives k’ = .0006 -.0009 -. d i = max (0.0007 -.0000 .0075 .0002 .  n  5  476 .751 di x i − .0067 .003 .0018 -.0002 .0007 -.0038 -.175 ⇒ n = 4.751)) .0019 .001 .0028 -.0028 -.0017 -.0020 . ∆ = 0. 
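The u-chart limits vary with the area of opportunity gi, so each subgroup gets its own limits. A sketch using ū = 5.5125 from the exercise above:

```python
import math

# u-chart limits for rates u_i = x_i / g_i:
# u_bar +/- 3*sqrt(u_bar / g_i), so the band widens as g_i shrinks.
def u_chart_limits(u_bar, g):
    half = 3 * math.sqrt(u_bar / g)
    return max(0.0, u_bar - half), u_bar + half

print(u_chart_limits(5.5125, 1.0))   # +/- 7.0436 about u_bar
print(u_chart_limits(5.5125, 0.6))   # wider limits for the smaller g
```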
so  σ   .0001 -.0003 -.0019 .0120 .0007 .0006 -.0030 -.0008 -.749 ei 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 -.0040 -.0014 .0011 .0050 -. Thus k′ = ∆/2 .0129 e15 = .0003 -. 29.002 = from which σ / n .73 = s . k = Clearly i x i − .003 = h .75 . suggesting that the process mean has shifted to a value smaller than the target of .749)) .0026 -.Chapter 16: Quality Control Methods 28.87 on the k’ scale to 600 on the out-ofcontrol ARL scale and extending to h’ gives h’ = 2.0006 .0026 .0027 -.0012 -.0000 -. µ 0 = .0029 0 0 0 0 0 0 0 0 0 0 . the probability of lot acceptance using this plan is larger than that for the previous plan. .0827 . 33.85 ) = 2.Chapter 16: Quality Control Methods 30.07 .02 .07 .  50   50  P( X ≤ 1) =   p 0 (1 − p )50 +   p1 (1 − p ) 49 = (1 − p )50 + 50 p(1 − p )49 0 1 p .9106 .0153σ .7 = Section 16. ….0048 .08 .06 .1900 .85. 10.0113 .04 . In control ARL = 250.10 P ( X ≤ 1) .7358 .0019 For values of p quite close to 0. out-of-control ARL = 4.08 . whereas for larger p this plan is less likely to result in an “accept the lot” decision (the dividing point between “close to zero” and “larger p” is someplace between . 32.0338 100  0 100  1 100  2 P( X ≤ 2) =   p (1 − p )100 +   p (1 − p )99 +   p (1 − p )98  0   1   2  p .01 .2321 .02).10.4 ⇒ n = 1.03 .0566 .1183 . 15. Then h’ = 2.  n k ′ = . For the binomial calculation. the current plan is better.8.10 P ( X ≤ 2) . n = 50 and we wish  50   50   50  P( X ≤ 2) =   p 0 (1 − p )50 +   p 1 (1 − p )49 +   p 2 (1 − p ) 48 0 1  2 50 49 48 = (1 − p ) + 50 p(1 − p ) + 1225 p 2 (1 − p ) when p = .01 and .01 .01. For the hypergeometric calculation  M  500 − M   M  500 − M   M  500 − M           0  50   1  49   2  48   P ( X ≤ 2) = + + . .9206 .09 .0532 .06 .05 . 50.1265 .05 .09 . So n = 1.03 .04 .02.4198 . 
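The CUSUM recursions above are straightforward to implement. A sketch of the two-sided scheme; the target μ0 = 16, reference offset k = .05, and decision limit h = .20 echo the exercise, but the data series here is hypothetical:

```python
# Two-sided CUSUM: accumulate deviations beyond mu0 +/- k on each side;
# signal when either cumulative sum exceeds the decision limit h.
def cusum(xs, mu0, k, h):
    d = e = 0.0
    for i, x in enumerate(xs, start=1):
        d = max(0.0, d + (x - (mu0 + k)))   # guards against upward drift
        e = max(0.0, e - (x - (mu0 - k)))   # guards against downward drift
        if d > h or e > h:
            return i                         # time of first signal
    return None

# Hypothetical data drifting upward from the target mu0 = 16:
signal = cusum([16.0, 16.1, 16.2, 16.3, 16.4], mu0=16, k=0.05, h=0.20)
print(signal)  # signals at observation 4
```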
to be  500   500   500         50   50   50  calculated for M = 5.6 31. ….02 .6767 . 2 σ/ n σ/ n  σ  giving h =  (2.2794 .5553 .0258 . In this sense. from which ∆/2 σ /2 n = = . 477 .96 ≈ 2 .4005 . The resulting probabilities appear in the answer section in the text. 02 np 2.0059 ) = .0038 478 . X2 = 0) [since c2 = r1 – 1 = 3] = P(X1 = 0 or 1) + P(X1 = 2)P( X2 = 0 or 1) + P(X1 = 3)P(X2 = 0) 1 1  50   50  100  x = ∑   p x (1 − p )50− x +   p 2 (1 − p )48 ⋅ ∑   p (1 − p )100− x x= 0  x  x=0  x  2  50   100  0 =   p 3 (1 − p )47 ⋅   p (1 − p )100 .2503) + (. or 2) = P(X1 = 0 or 1) + P(X1 = 2)P(X2 = 0.5968 p = .02: = . X2 = 0.1386)(. which appears in the 1 column in the c = 5 row.7604) + (.9862 ) = .93)131− x = .05: = .7358 + (.2794 + (.9981 p = .05: = .07 ) x (.55 . P(accepting the lot) = P(X1 = 0 or 1) + P(X1 = 2. 1.10 x x =0   P(accepting the lot) = P(X1 = 0 or 1) + P(X1 = 2.01: 36. X2 = 0 or 1) + P(X1 = 3. X2 = 0. 1.9106 + (. or 3) + P(X1 = 3. 2.0003) + (. or 3) + P(X1 = 3)P( X2 = 0.2794 + (.0779 )(.0688 p = .2199 )(. Then p2 AQL .02) = 1 − ∑  (.0607)(.05 x x= 0   5 131   P( X ≤ 5 when p = .0000) = .2611)(.0487 ≈ .2611)(.2904 p = .1117 ) = .0371) + (.0122)(.02 5 131 P( X > 5 when p = .5 ≈ 3. 35.65 ≈ 131 .0756 )(.0338 + (. p1 .02 )x (.07 = = 3. 1.613 n= 1= = 130.10: = . = .5405) = .2199 )(.8188 p = . 1.0974 ≈ .1858)(.9984 ) + (. 2.07) = ∑  (.1386)(. p LTPD .0779 )(.98)131− x = .1326) = .0338 + (. or 2).10: = .Chapter 16: Quality Control Methods 34.4033) + (. 3  0  p = . 07 .0274 c.2 917.1 1188.06 . 49 48 p .0114 .01 . ATI = 50P(A) + 2000(1 – P(A)) p .0048 .0140 .014 .04 .02 . AOQ = pP( A) = p[(1 − p ) + 50 p(1 − p ) + 1225 p 2 (1 − p ) ] 50 a.07 .3 1838.02 .03 .1 [ ] d d AOQ = pP ( A) = p[ (1 − p )50 + 50 p (1 − p )49 ] = 0 gives the quadratic dp dp 48 + 110.02 .0447.10 ATI 77.5 1753.01 .7 1896.0034 ATI = 50P(A) + 2000(1 – P(A)) p .05 .08 .018 .3 565. 
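The OC-curve entries are binomial tail probabilities. A sketch for the single-sampling plan n = 50, c = 2:

```python
from math import comb

# Operating characteristic: P(accept) = P(X <= c) with X ~ Binomial(n, p).
def accept_prob(p, n=50, c=2):
    return sum(comb(n, x) * p**x * (1 - p)**(n - x) for x in range(c + 1))

for p in (0.01, 0.05, 0.10):
    print(p, round(accept_prob(p), 4))  # .9862, .5405, .1117
```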
49 each entry in the second row by the corresponding entry in the first row gives AOQ: p .2 1219.01 .6 679. AOQL = .08 .10 ATI 224.8 1393.09 .0318P( A) ≈ .011 b.03 .9 945.Chapter 16: Quality Control Methods 37.0167 .07 .3 1686. from which p = = .010 .0 1455.09 .10 AOQ .04 .09 .06 .02 . so multiplying 50 38.022 .3 202.08 .08 .6 1559. 479 . p = .05 .05 .0318 .07 .2 1629.0167 .10 AOQ .0147 .0160 .03 .04 . and 4998 AOQL = .024 .027 .0091 .04 .018 .05 .03 .1 418.06 .027 .0447P(A) = .3 1934.0066 .91 2 equation 2499 p − 48 p − 1 = 0 .01 .025 .6 AOQ = pP( A) = p[(1 − p ) + 50 p (1 − p ) ] .09 .1 1781. Exercise 32 gives P(A).06 .0089 . Σ xi = 10. so LCL = 0. giving LCL = 0 and UCL = 9.44 . Σ si = 402 . and 24 x ± 3 x = 3.31 .31 ± = 41 .7.536 X chart based on s : 422 .833 ± 5.952 6 3(41.31 ± 3(15 .4615 ± 14 .55.874 .3077) X chart based on r : 422.31 ± 41.4615 ± 40. Because x22 = 10 > UCL.9141 ≈ .980 .31) R chart: 41 . 833 . Σ ri = 1074 . k = 26.36. r = 41.Chapter 16: Quality Control Methods Supplementary Exercises 39.37 .4615 .42 . 3(15 .4615 ) 1 − (.26 2. the process appears to have been out of control at the time that the 22nd plate was obtained.442 .20 . A c chart is appropriate here. 480 .30 .536 6 2 S chart: 15 .4615 ) = 402 .442.848 )(41. Σxi = 92 so x = 92 = 3.952 3(. x = 422. UCL = 82.3077 n = 6.75 2.31 ± = 402. s = 15.952 ) = 15 . 4 1.886 3 481 .172 . s = .416 . from which an s chart has LCL = 0 and UCL = .473 .51.8957 . and s 21 = 2. Then Σ si = 16.Chapter 16: Quality Control Methods 41.3 1.5 1.71 .9 . .85 .6 1.20 50.265 .8 1.90 50.794 2.145 .33 49.5 5. and si < 2.839 .7988) s has limits 50.80 50.886 .404 .886) .7 .10 50.83 50.1 1.9 1.854 . Σ xi = 1103.43 1.97 50.0529.775 .136 1.698 .58.2 1. The x chart based on 3(.8 2.886 2 = 2.1 2.931 .00 50.13 49.93 49.8957 + 3(.0529 for every remaining i. s = .87 50.9 Σ si = 19.931 > UCL .6 1. x = 50. 
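The AOQL for the n = 50, c = 1 plan can also be found by a direct grid search rather than the calculus above. A sketch:

```python
# Average outgoing quality for the n = 50, c = 1 plan:
# AOQ(p) = p * P(A) = p * [(1-p)^50 + 50p(1-p)^49]; AOQL is its maximum.
def aoq(p):
    return p * ((1 - p)**50 + 50 * p * (1 - p)**49)

grid = [i / 10000 for i in range(1, 1500)]
p_star = max(grid, key=aoq)
print(round(p_star, 4), round(aoq(p_star), 4))  # about .0318 and .0167
```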
41. n = 3, and the x̄i, si, and ri values for the k = 22 subgroups give Σx̄i = 1103.85 and Σsi = 19.706. Then s̄ = 19.706/22 = .8957, and with a3 = .886 the resulting UCL for an s chart is .8957 + 3(.8957)√(1 − (.886)²)/.886 = 2.3020. Since an assignable cause is assumed to have been identified, we eliminate the 21st group (s21 = 2.931 > UCL). Then s̄ = 16.775/21 = .7988, from which an s chart has LCL = 0 and UCL = .7988 + 3(.7988)√(1 − (.886)²)/.886 = 2.0529, and si < 2.0529 for every remaining i. The x̄ chart based on s̄ has limits 50.145 ± 3(.7988)/(.886√3) = 50.145 ± 1.562, i.e., 48.58 and 51.71. All x̄i values are between these limits.

42. Σni = (4)(16) + (3)(4) = 76 and Σni x̄i = 32,729.4, so x̄ = 32,729.4/76 = 430.65. With Σ(ni − 1) = 76 − 20 = 56, s² = Σ(ni − 1)si²/Σ(ni − 1) = 590.03, so s = 24.2905.
For location: when n = 3, 430.65 ± 3(24.2905)/(.886√3) = 430.65 ± 47.49, i.e., 383.16 and 478.14; when n = 4, 430.65 ± 3(24.2905)/(.921√4) = 430.65 ± 39.56, i.e., 391.09 and 470.21.
For variation: when n = 3, UCL = 24.29 + 3(24.2905)√(1 − (.886)²)/.886 = 24.29 + 38.14 = 62.43; when n = 4, UCL = 24.29 + 3(24.2905)√(1 − (.921)²)/.921 = 24.29 + 30.82 = 55.11; in both cases LCL = 0.

43. The p-chart and np-chart will always give identical results, since
p − 3√(p(1 − p)/n) < p̂i < p + 3√(p(1 − p)/n)   iff   np − 3√(np(1 − p)) < np̂i = xi < np + 3√(np(1 − p)).
Here n = 100 and p = .0608, so UCL = np + 3√(np(1 − p)) = 6.08 + 3√(6.08(.9392)) = 6.08 + 7.17 = 13.25, and since 6.08 − 7.17 < 0, LCL = 0, as was the case for the p-chart.

44.
a. Provided that E(Xi) = μ for each i,
E(Wt) = αE(X̄t) + α(1 − α)E(X̄t−1) + … + α(1 − α)^(t−1)E(X̄1) + (1 − α)^t μ
= μ[α(1 + (1 − α) + … + (1 − α)^(t−1)) + (1 − α)^t]
= μ[α Σ(i=0 to ∞)(1 − α)^i − α Σ(i=t to ∞)(1 − α)^i + (1 − α)^t]
= μ[α ⋅ 1/(1 − (1 − α)) − α(1 − α)^t ⋅ 1/(1 − (1 − α)) + (1 − α)^t] = μ.
b. V(Wt) = α²V(X̄t) + α²(1 − α)²V(X̄t−1) + … + α²(1 − α)^(2(t−1))V(X̄1)
= α²[1 + C + C² + … + C^(t−1)] ⋅ V(X̄1)   (where C = (1 − α)²)
= α² ⋅ (1 − C^t)/(1 − C) ⋅ σ²/n,
which gives the desired expression.
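The closed-form EWMA variance just derived can be checked against the direct geometric-series sum. A minimal sketch, using illustrative values α = .6 and V(X̄) = (.25)² (these particular numbers are my choice for the check, not part of the derivation):

```python
# V(W_t) = [alpha/(2 - alpha)] * (1 - (1 - alpha)^(2t)) * sigma^2/n,
# since 1 - C = 1 - (1 - alpha)^2 = alpha*(2 - alpha).
alpha, var_xbar = .6, .25**2

def ewma_var_closed(t):
    """Closed-form V(W_t) from part (b)."""
    return alpha / (2 - alpha) * (1 - (1 - alpha)**(2 * t)) * var_xbar

def ewma_var_direct(t):
    """Direct sum: alpha^2 * sum_{j=0}^{t-1} (1 - alpha)^(2j) * V(Xbar)."""
    return alpha**2 * sum((1 - alpha)**(2 * j) for j in range(t)) * var_xbar

for t in (1, 2, 3, 6):
    assert abs(ewma_var_closed(t) - ewma_var_direct(t)) < 1e-12
print(round(ewma_var_closed(2)**.5, 4))   # sigma_2 = .1616
```

The two expressions agree term by term, confirming the geometric-series step in part (b).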
c. From the example referenced in the exercise, σ = .5 (or s can be used instead) and n = 4, so σ/√n = .25. Suppose that we use α = .6 (this is not specified in the problem). Then w0 = μ0 = 40 and wt = .6x̄t + .4wt−1:
w1 = .6x̄1 + .4μ0 = .6(40.20) + .4(40) = 40.12, w2 = .6(39.72) + .4(40.12) = 39.88, and so on; in particular, w13 = 40.51.
σt² = [.6(1 − (1 − .6)^(2t))/(2 − .6)](.25)² gives σ1 = .1500, σ2 = .1616, σ3 = .1633, σ4 = .1636, and σ5 = .1637 = σ6 = … = σ16.
Control limits: for t = 1, 40 ± 3(.1500) = 39.55, 40.45; for t = 2, 40 ± 3(.1616) = 39.52, 40.48; for t = 3, 40 ± 3(.1633) = 39.51, 40.49. These last limits are also the limits for t = 4, …, 16.
Because w13 = 40.51 > 40.49 = UCL, an out-of-control signal is generated.
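The EWMA monitoring scheme of part (c) can be sketched as a short loop. The subgroup means below are hypothetical stand-ins (only the first two match values appearing above; the rest are invented for illustration):

```python
# EWMA chart: w_t = alpha*xbar_t + (1 - alpha)*w_{t-1}, with time-varying
# 3-sigma limits based on V(W_t) = [alpha/(2-alpha)]*(1-(1-alpha)^(2t))*sigma^2/n.
mu0, alpha, sigma_xbar = 40.0, .6, .25
xbars = [40.20, 39.72, 40.42, 40.25, 40.35, 39.95, 40.30, 40.60]  # hypothetical data

w = mu0
for t, xbar in enumerate(xbars, start=1):
    w = alpha * xbar + (1 - alpha) * w
    s_t = (alpha / (2 - alpha) * (1 - (1 - alpha)**(2 * t)))**.5 * sigma_xbar
    lcl, ucl = mu0 - 3 * s_t, mu0 + 3 * s_t
    flag = "" if lcl <= w <= ucl else "  <-- out of control"
    print(f"t={t:2d}  w={w:.2f}  limits=({lcl:.2f}, {ucl:.2f}){flag}")
```

With the real data of the exercise, the same loop would flag t = 13, where w13 = 40.51 exceeds the UCL of 40.49.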